
An Epsilon of Room, I: Real Analysis: pages from year three of a mathematical blog

Terence Tao

This is a preliminary version of the book An Epsilon of Room, I: Real Analysis: pages from year three
of a mathematical blog published by the American Mathematical Society (AMS). This preliminary
version is made available with the permission of the AMS and may not be changed, edited, or reposted
at any other website without explicit written permission from the author and the AMS.

Author's preliminary version made available with permission of the publisher, the American Mathematical Society
To Garth Gaudry, who set me on the road;
To my family, for their constant support;
And to the readers of my blog, for their feedback and contributions.

Contents

Preface
A remark on notation
Acknowledgments

Chapter 1. Real analysis

§1.1. A quick review of measure and integration theory
§1.2. Signed measures and the Radon-Nikodym-Lebesgue theorem
§1.3. $L^p$ spaces
§1.4. Hilbert spaces
§1.5. Duality and the Hahn-Banach theorem
§1.6. A quick review of point-set topology
§1.7. The Baire category theorem and its Banach space consequences
§1.8. Compactness in topological spaces
§1.9. The strong and weak topologies
§1.10. Continuous functions on locally compact Hausdorff spaces
§1.11. Interpolation of $L^p$ spaces


§1.12. The Fourier transform
§1.13. Distributions
§1.14. Sobolev spaces
§1.15. Hausdorff dimension

Chapter 2. Related articles

§2.1. An alternate approach to the Carathéodory extension theorem
§2.2. Amenability, the ping-pong lemma, and the Banach-Tarski paradox
§2.3. The Stone and Loomis-Sikorski representation theorems
§2.4. Well-ordered sets, ordinals, and Zorn’s lemma
§2.5. Compactification and metrisation
§2.6. Hardy’s uncertainty principle
§2.7. Create an epsilon of room
§2.8. Amenability
Bibliography
Index

Preface

In February of 2007, I converted my “What’s new” web page of research updates into a blog at terrytao.wordpress.com. This blog has since grown and evolved to cover a wide variety of mathematical topics, ranging from my own research updates, to lectures and guest posts by other mathematicians, to open problems, to class lecture notes, to expository articles at both basic and advanced levels.
With the encouragement of my blog readers, and also of the AMS, I
published many of the mathematical articles from the first two years of the
blog as [Ta2008] and [Ta2009], which will henceforth be referred to as
Structure and Randomness and Poincaré’s Legacies Vols. I, II. This gave
me the opportunity to improve and update these articles to a publishable
(and citeable) standard, and also to record some of the substantive feedback
I had received on these articles by the readers of the blog.
The current text contains many (though not all) of the posts for the third
year (2009) of the blog, focusing primarily on those posts of a mathematical
nature which were not contributed primarily by other authors, and which
are not published elsewhere. It has been split into two volumes.
The current volume consists of lecture notes from my graduate real analysis courses that I taught at UCLA (Chapter 1), together with some related material in Chapter 2. These notes cover the second part of the graduate real analysis sequence here, and therefore assume some familiarity with general measure theory (in particular, the construction of Lebesgue measure and the Lebesgue integral, and more generally the material reviewed in Section 1.1), as well as undergraduate real analysis (e.g., various notions of limits and convergence). The notes then cover more advanced topics in measure theory (notably, the Lebesgue-Radon-Nikodym and Riesz representation theorems) as well as a number of topics in functional analysis, such as the theory of Hilbert and Banach spaces, and the study of key function spaces such as the Lebesgue and Sobolev spaces, or spaces of distributions. The general theory of the Fourier transform is also discussed. In addition, a number of auxiliary (but optional) topics, such as Zorn’s lemma, are discussed in Chapter 2. In my own course, I covered the material in Chapter 1 only and also used Folland’s text [Fo2000] as a secondary source. But I hope that the current text may be useful in other graduate real analysis courses, particularly in conjunction with a secondary text (in particular, one that covers the prerequisite material on measure theory).
The second volume in this series (referred to henceforth as Volume II) consists of sundry articles on a variety of mathematical topics, which are only occasionally related to the above course, and can be read independently.

A remark on notation
For reasons of space, we will not be able to define every single mathematical
term that we use in this book. If a term is italicised for reasons other than
emphasis or for definition, then it denotes a standard mathematical object,
result, or concept, which can be easily looked up in any number of references.
(In the blog version of the book, many of these terms were linked to their
Wikipedia pages, or other online reference pages.)
I will, however, mention a few notational conventions that I will use throughout. The cardinality of a finite set $E$ will be denoted $|E|$. We will use the asymptotic notation $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote the estimate $|X| \le CY$ for some absolute constant $C > 0$. In some cases we will need this constant $C$ to depend on a parameter (e.g., $d$), in which case we shall indicate this dependence by subscripts, e.g., $X = O_d(Y)$ or $X \ll_d Y$. We also sometimes use $X \sim Y$ as a synonym for $X \ll Y \ll X$.
In many situations there will be a large parameter $n$ that goes off to infinity. When that occurs, we also use the notation $o_{n \to \infty}(X)$ or simply $o(X)$ to denote any quantity bounded in magnitude by $c(n)X$, where $c(n)$ is a function depending only on $n$ that goes to zero as $n$ goes to infinity. If we need $c(n)$ to depend on another parameter, e.g., $d$, we indicate this by further subscripts, e.g., $o_{n \to \infty; d}(X)$.
We will occasionally use the averaging notation $\mathbb{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x)$ to denote the average value of a function $f : X \to \mathbb{C}$ on a non-empty finite set $X$.


Acknowledgments
The author is supported by a grant from the MacArthur Foundation, by
NSF grant DMS-0649473, and by the NSF Waterman Award.
Thanks to Kestutis Cesnavicius, Wolfgang M., Daniel Mckenzie, Simion, Snegud, Blake Stacey, Konrad Swanepoel, and anonymous commenters for global corrections to the text, and to Edward Dunne at the American Mathematical Society for encouragement and editing.

Chapter 1

Real analysis

Section 1.1

A quick review of measure and integration theory

In this section we quickly review the basics of abstract measure theory and
integration theory, which was covered in the previous course but will of
course be relied upon in the current course. This is only a brief summary
of the material; certainly, one should consult a real analysis text for the full
details of the theory.

1.1.1. Measurable spaces. Ideally, measure theory on a space $X$ should be able to assign a measure (or volume, or mass, etc.) to every set in $X$. Unfortunately, due to paradoxes such as the Banach-Tarski paradox, many natural notions of measure (e.g., Lebesgue measure) cannot be applied to measure all subsets of $X$; instead, we must restrict our attention to certain measurable subsets of $X$. This turns out to suffice for most applications; for instance, just about any non-pathological subset of Euclidean space that one actually encounters will be Lebesgue measurable (as a general rule of thumb, any set which does not rely on the axiom of choice in its construction will be measurable).
To formalise this abstractly, we use the following definition.
Definition 1.1.1 (Measurable spaces). A measurable space $(X, \mathcal{X})$ is a set $X$, together with a collection $\mathcal{X}$ of subsets of $X$ which form a σ-algebra, thus $\mathcal{X}$ contains the empty set and $X$, and is closed under countable intersections, countable unions, and complements. A subset of $X$ is said to be measurable with respect to the measurable space if it lies in $\mathcal{X}$.
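On a finite set only finitely many distinct subsets exist, so every countable union or intersection is really a finite one and Definition 1.1.1 can be checked mechanically. The following is a minimal sketch, assuming the underlying set is finite; the function name `is_sigma_algebra` is hypothetical and chosen only for illustration. It says nothing about infinite spaces, where the countable closure conditions have real content.

```python
from itertools import combinations


def is_sigma_algebra(X, family):
    """Check Definition 1.1.1 for a family of subsets of a *finite* set X.

    On a finite set, closure under countable unions and intersections
    reduces to closure under pairwise unions and intersections.
    """
    X = frozenset(X)
    F = {frozenset(E) for E in family}
    if any(not E <= X for E in F):
        return False                      # every member must be a subset of X
    if frozenset() not in F or X not in F:
        return False                      # must contain the empty set and X
    if any(X - E not in F for E in F):
        return False                      # closed under complements
    for E, G in combinations(F, 2):
        if E | G not in F or E & G not in F:
            return False                  # closed under unions and intersections
    return True


X = {1, 2, 3, 4}
print(is_sigma_algebra(X, [set(), {1, 2}, {3, 4}, X]))   # True
print(is_sigma_algebra(X, [set(), {1}, X]))              # False: {2, 3, 4} is missing
```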


A function $f : X \to Y$ from one measurable space $(X, \mathcal{X})$ to another $(Y, \mathcal{Y})$ is said to be measurable if $f^{-1}(E) \in \mathcal{X}$ for all $E \in \mathcal{Y}$.
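For maps between finite measurable spaces, measurability can be tested directly from the definition by computing every preimage. This is a small sketch under that finiteness assumption; `is_measurable_map` is a hypothetical helper, and the map is represented as a Python dict.

```python
def is_measurable_map(f, X, sigma_X, sigma_Y):
    """Return True if f^{-1}(E) lies in sigma_X for every E in sigma_Y.

    X is a finite set, f is a dict sending each point of X to a point of Y,
    and sigma_X, sigma_Y are the sigma-algebras, given as iterables of sets.
    """
    sigma_X = {frozenset(E) for E in sigma_X}
    for E in sigma_Y:
        preimage = frozenset(x for x in X if f[x] in E)
        if preimage not in sigma_X:
            return False
    return True


X = {1, 2, 3, 4}
sigma_X = [set(), {1, 2}, {3, 4}, X]
sigma_Y = [set(), {"a"}, {"b"}, {"a", "b"}]
f = {1: "a", 2: "a", 3: "b", 4: "b"}   # constant on each of {1, 2} and {3, 4}
g = {1: "a", 2: "b", 3: "b", 4: "b"}   # g^{-1}({"a"}) = {1} is not in sigma_X
print(is_measurable_map(f, X, sigma_X, sigma_Y))   # True
print(is_measurable_map(g, X, sigma_X, sigma_Y))   # False
```

Making $\mathcal{X}$ finer (adding $\{1\}$ and its complement) would make `g` measurable as well, in line with Example 1.1.4.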
Remark 1.1.2. The class of measurable spaces forms a category, with the measurable functions being the morphisms. The symbol σ stands for countable union; cf. σ-compact, σ-finite, $F_\sigma$ set.
Remark 1.1.3. The notion of a measurable space $(X, \mathcal{X})$ (and of a measurable function) is superficially similar to that of a topological space $(X, \mathcal{F})$ (and of a continuous function); the topology $\mathcal{F}$ contains $\emptyset$ and $X$ just as the σ-algebra $\mathcal{X}$ does, but is now closed under arbitrary unions and finite intersections, rather than countable unions, countable intersections, and complements. The two categories are linked to each other by the Borel algebra construction; see Example 1.1.5 below.
Example 1.1.4. We say that one σ-algebra $\mathcal{X}$ on a set $X$ is coarser than another σ-algebra $\mathcal{X}'$ (or that $\mathcal{X}'$ is finer than $\mathcal{X}$) if $\mathcal{X} \subset \mathcal{X}'$ (or equivalently, if the identity map from $(X, \mathcal{X}')$ to $(X, \mathcal{X})$ is measurable); thus every set which is measurable in the coarse space is also measurable in the fine space. The coarsest σ-algebra on a set $X$ is the trivial σ-algebra $\{\emptyset, X\}$, while the finest is the discrete σ-algebra $2^X := \{E : E \subset X\}$.
Example 1.1.5. The intersection $\bigwedge_{\alpha \in A} \mathcal{X}_\alpha := \bigcap_{\alpha \in A} \mathcal{X}_\alpha$ of an arbitrary family $(\mathcal{X}_\alpha)_{\alpha \in A}$ of σ-algebras on $X$ is another σ-algebra on $X$. Because of this, given any collection $\mathcal{F}$ of sets on $X$ we can define the σ-algebra $\mathcal{B}[\mathcal{F}]$ generated by $\mathcal{F}$, defined to be the intersection of all the σ-algebras containing $\mathcal{F}$, or equivalently the coarsest σ-algebra for which all sets in $\mathcal{F}$ are measurable. (This intersection is non-vacuous, since it will always involve the discrete σ-algebra $2^X$.) In particular, the open sets $\mathcal{F}$ of a topological space $(X, \mathcal{F})$ generate a σ-algebra, known as the Borel σ-algebra of that space.

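We can also define the join $\bigvee_{\alpha \in A} \mathcal{X}_\alpha$ of any family $(\mathcal{X}_\alpha)_{\alpha \in A}$ of σ-algebras on $X$ by the formula
$$\bigvee_{\alpha \in A} \mathcal{X}_\alpha := \mathcal{B}\Big[\bigcup_{\alpha \in A} \mathcal{X}_\alpha\Big]. \tag{1.1}$$
For instance, the Lebesgue σ-algebra $\mathcal{L}$ of Lebesgue measurable sets on a Euclidean space $\mathbb{R}^n$ is the join of the Borel σ-algebra $\mathcal{B}$ and of the algebra of null sets and their complements (also called co-null sets).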
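On a finite set, the generated σ-algebra $\mathcal{B}[\mathcal{F}]$ can be computed by brute force: start from the generators together with $\emptyset$ and $X$, and keep closing under complements and unions until nothing new appears. The sketch below assumes a finite underlying set; `generate_sigma_algebra` is a hypothetical name. Since the join (1.1) is itself a generated σ-algebra, the same routine computes joins of finitely many σ-algebras by passing their union as the generators.

```python
def generate_sigma_algebra(X, generators):
    """Return B[F], the smallest sigma-algebra on the finite set X containing F.

    Repeatedly close under complements and pairwise unions until no new sets
    appear; closure under intersections then follows from De Morgan's laws.
    """
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(G) for G in generators}
    while True:
        new = {X - E for E in sigma} | {E | G for E in sigma for G in sigma}
        if new <= sigma:
            return sigma
        sigma |= new


print(sorted(map(sorted, generate_sigma_algebra({1, 2, 3, 4}, [{1}]))))
# [[], [1], [1, 2, 3, 4], [2, 3, 4]]

# Join of two sigma-algebras, as in (1.1): generate from their union.
A = generate_sigma_algebra({1, 2, 3, 4}, [{1, 2}])
B = generate_sigma_algebra({1, 2, 3, 4}, [{1, 3}])
join = generate_sigma_algebra({1, 2, 3, 4}, A | B)   # here: all 16 subsets
```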
Exercise 1.1.1. A function $f : X \to Y$ from one topological space to another is said to be Borel measurable if it is measurable once $X$ and $Y$ are equipped with their respective Borel σ-algebras. Show that every continuous function is Borel measurable. (The converse statement, of course, is very far from being true; for instance, the pointwise limit of a sequence of measurable functions, if it exists, is also measurable, whereas the analogous claim for continuous functions is completely false.)
Remark 1.1.6. A function $f : \mathbb{R}^n \to \mathbb{C}$ is said to be Lebesgue measurable if it is measurable from $\mathbb{R}^n$ (with the Lebesgue σ-algebra) to $\mathbb{C}$ (with the Borel σ-algebra), or equivalently if $f^{-1}(B)$ is Lebesgue measurable for every open ball $B$ in $\mathbb{C}$. Note the asymmetry between Lebesgue and Borel here; in particular, the composition of two Lebesgue measurable functions need not be Lebesgue measurable.
Example 1.1.7. Given a function $f : X \to Y$ from a set $X$ to a measurable space $(Y, \mathcal{Y})$, we can define the pullback $f^{-1}(\mathcal{Y})$ of $\mathcal{Y}$ to be the σ-algebra $f^{-1}(\mathcal{Y}) := \{f^{-1}(E) : E \in \mathcal{Y}\}$; this is the coarsest structure on $X$ that makes $f$ measurable. For instance, the pullback of the Borel σ-algebra from $[0,1]$ to $[0,1]^2$ under the map $(x, y) \mapsto x$ consists of all sets of the form $E \times [0,1]$, where $E \subset [0,1]$ is Borel measurable.
More generally, given a family $(f_\alpha : X \to Y_\alpha)_{\alpha \in A}$ of functions into measurable spaces $(Y_\alpha, \mathcal{Y}_\alpha)$, we can form the σ-algebra $\bigvee_{\alpha \in A} f_\alpha^{-1}(\mathcal{Y}_\alpha)$ generated by the $f_\alpha$; this is the coarsest structure on $X$ that makes all the $f_\alpha$ simultaneously measurable.
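Here is a finite analogue of the projection example, replacing $[0,1]$ by the two-point set $\{0, 1\}$ with its discrete σ-algebra, so that the pullback can be listed explicitly. This is an illustrative sketch only; `pullback` is a hypothetical helper, and the map is passed as an ordinary Python function.

```python
from itertools import product


def pullback(f, X, sigma_Y):
    """f^{-1}(Y) := {f^{-1}(E) : E in Y} for a map f on a finite set X."""
    return {frozenset(x for x in X if f(x) in E) for E in sigma_Y}


# Finite analogue of (x, y) -> x from [0, 1]^2 to [0, 1]:
# X = {0, 1} x {0, 1}, and the target {0, 1} has the discrete sigma-algebra.
X = set(product([0, 1], repeat=2))
sigma_Y = [set(), {0}, {1}, {0, 1}]
pb = pullback(lambda p: p[0], X, sigma_Y)
for E in sorted(map(sorted, pb)):
    print(E)   # every set listed has the form E' x {0, 1}
```

As in the continuous case, the pulled-back sets are exactly the "vertical strips" $E' \times \{0, 1\}$.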
Remark 1.1.8. In probability theory and information theory, the functions $f_\alpha : X \to Y_\alpha$ in Example 1.1.7 can be interpreted as observables, and the σ-algebra generated by these observables thus captures mathematically the concept of observable information. For instance, given a time parameter $t$, one might define the σ-algebra $\mathcal{F}_{\le t}$ generated by all observables for some random process (e.g., Brownian motion) that can be made at time $t$ or earlier; this endows the underlying event space $X$ with an uncountable increasing family of σ-algebras.
Example 1.1.9. If $E$ is a subset of a measurable space $(Y, \mathcal{Y})$, the pullback of $\mathcal{Y}$ under the inclusion map $\iota : E \to Y$ is called the restriction of $\mathcal{Y}$ to $E$ and is denoted $\mathcal{Y}|_E$. Thus, for instance, we can restrict the Borel and Lebesgue σ-algebras on a Euclidean space $\mathbb{R}^n$ to any subset of such a space.

Exercise 1.1.2. Let $M$ be an $n$-dimensional manifold, and let $(\pi_\alpha : U_\alpha \to V_\alpha)$ be an atlas of coordinate charts for $M$, where the $U_\alpha$ form an open cover of $M$ and the $V_\alpha$ are open subsets of $\mathbb{R}^n$. Show that the Borel σ-algebra on $M$ is the unique σ-algebra whose restriction to each $U_\alpha$ is the pullback via $\pi_\alpha$ of the restriction of the Borel σ-algebra of $\mathbb{R}^n$ to $V_\alpha$.
Example 1.1.10. A function $f : X \to A$ into some index set $A$ will partition $X$ into level sets $f^{-1}(\{\alpha\})$ for $\alpha \in A$; conversely, every partition $X = \bigcup_{\alpha \in A} E_\alpha$ of $X$ arises from at least one function $f$ in this manner (one can just take $f$ to be the map from points in $X$ to the partition cell in which that point lies). Given such an $f$, we call the σ-algebra $f^{-1}(2^A)$ the σ-algebra generated by the partition; a set is measurable with respect to this structure if and only if it is the union $\bigcup_{\alpha \in B} E_\alpha$ of some subcollection $(E_\alpha)_{\alpha \in B}$, $B \subset A$, of cells of the partition.
Exercise 1.1.3. Show that a σ-algebra on a finite set $X$ necessarily arises from a partition $X = \bigcup_{\alpha \in A} E_\alpha$ as in Example 1.1.10, and furthermore the partition is unique (up to relabeling). Thus, in the finitary world, σ-algebras are essentially the same concept as partitions.
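The partition in Exercise 1.1.3 can be recovered concretely: the cell containing a point $x$ is the intersection of all measurable sets containing $x$ (a finite intersection, hence measurable). The sketch below computes these cells, assuming the input really is a σ-algebra on a finite set; `atoms` is a hypothetical helper name. A σ-algebra with $k$ cells then has exactly $2^k$ members, one for each subcollection of cells as in Example 1.1.10.

```python
def atoms(X, sigma):
    """Cells of the partition associated to a sigma-algebra on a finite set X.

    Assumes sigma really is a sigma-algebra on X. The cell containing x is
    the intersection of all measurable sets that contain x.
    """
    sigma = [frozenset(E) for E in sigma]
    cells = set()
    for x in X:
        cell = frozenset(X)
        for E in sigma:
            if x in E:
                cell &= E
        cells.add(cell)
    return cells


X = {1, 2, 3, 4, 5}
sigma = [set(), {1, 2}, {3, 4, 5}, X]
print(sorted(map(sorted, atoms(X, sigma))))   # [[1, 2], [3, 4, 5]]
```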
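Example 1.1.11. Let $(X_\alpha, \mathcal{X}_\alpha)_{\alpha \in A}$ be a family of measurable spaces. Then the Cartesian product $\prod_{\alpha \in A} X_\alpha$ has canonical projection maps $\pi_\beta : \prod_{\alpha \in A} X_\alpha \to X_\beta$ for each $\beta \in A$. The product σ-algebra $\prod_{\alpha \in A} \mathcal{X}_\alpha$ is defined as the σ-algebra on $\prod_{\alpha \in A} X_\alpha$ generated by the $\pi_\beta$, as in Example 1.1.7.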
Exercise 1.1.4. Let $(X_\alpha)_{\alpha \in A}$ be an at most countable family of second countable topological spaces. Show that the Borel σ-algebra of the product space (with the product topology) is equal to the product of the Borel σ-algebras of the factor spaces. In particular, the Borel σ-algebra on $\mathbb{R}^n$ is the product of $n$ copies of the Borel σ-algebra on $\mathbb{R}$. (The claim can fail when the countability hypotheses are dropped, though in most applications in analysis, these hypotheses are satisfied.) We caution, however, that for $n \ge 2$ the Lebesgue σ-algebra on $\mathbb{R}^n$ is not the product of $n$ copies of the one-dimensional Lebesgue σ-algebra, as it contains some additional null sets; however, it is the completion of that product.
Exercise 1.1.5. Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be measurable spaces. Show that if $E$ is measurable with respect to $\mathcal{X} \times \mathcal{Y}$, then for every $x \in X$, the set $\{y \in Y : (x, y) \in E\}$ is measurable in $\mathcal{Y}$, and similarly for every $y \in Y$, the set $\{x \in X : (x, y) \in E\}$ is measurable in $\mathcal{X}$. Thus, sections of Borel measurable sets are again Borel measurable. (The same is not true for Lebesgue measurable sets.)
1.1.2. Measure spaces. Now we endow measurable spaces with a measure, turning them into measure spaces.
Definition 1.1.12 (Measures). A (non-negative) measure $\mu$ on a measurable space $(X, \mathcal{X})$ is a function $\mu : \mathcal{X} \to [0, +\infty]$ such that $\mu(\emptyset) = 0$, and such that we have the countable additivity property $\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)$ whenever $E_1, E_2, \ldots$ are disjoint measurable sets. We refer to the triplet $(X, \mathcal{X}, \mu)$ as a measure space.
A measure space $(X, \mathcal{X}, \mu)$ is finite if $\mu(X) < \infty$; it is a probability space if $\mu(X) = 1$ (and then we call $\mu$ a probability measure). It is σ-finite if $X$ can be covered by countably many sets of finite measure.


A measurable set $E$ is a null set if $\mu(E) = 0$. A property on points $x$ in $X$ is said to hold for almost every $x \in X$ (or almost surely, for probability spaces) if it holds outside of a null set. We abbreviate “almost every” and “almost surely” as a.e. and a.s., respectively. The complement of a null set is said to be a co-null set or to have full measure.
Example 1.1.13 (Dirac measures). Given any measurable space $(X, \mathcal{X})$ and a point $x \in X$, we can define the Dirac measure (or Dirac mass) $\delta_x$ to be the measure such that $\delta_x(E) = 1$ when $x \in E$ and $\delta_x(E) = 0$ otherwise. This is a probability measure.
Example 1.1.14 (Counting measure). Given any measurable space $(X, \mathcal{X})$, we define counting measure $\#$ by defining $\#(E)$ to be the cardinality $|E|$ of $E$ when $E$ is finite, or $+\infty$ otherwise. This measure is finite when $X$ is finite, and σ-finite when $X$ is at most countable. If $X$ is finite and non-empty, we can define normalised counting measure $\frac{1}{|X|}\#$, so that $E \mapsto |E|/|X|$; this is a probability measure, also known as the uniform probability measure on $X$ (especially if we give $X$ the discrete σ-algebra).
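On a finite set with the discrete σ-algebra, the Dirac, counting, and normalised counting measures of Examples 1.1.13 and 1.1.14 can be written as ordinary functions of a set. This is a minimal sketch with hypothetical names; it only handles finite sets (it does not model the value $+\infty$), and uses exact rational arithmetic for the normalised measure.

```python
from fractions import Fraction


def dirac(x):
    """Dirac measure delta_x: E -> 1 if x lies in E, else 0."""
    return lambda E: 1 if x in E else 0


def counting(E):
    """Counting measure #(E) = |E|; this sketch only handles finite sets E."""
    return len(E)


def normalised_counting(X):
    """Uniform probability measure E -> |E| / |X| on a non-empty finite set X."""
    size = len(X)
    return lambda E: Fraction(len(E), size)


X = {1, 2, 3, 4}
uniform = normalised_counting(X)
print(dirac(2)({1, 2}), dirac(2)({3}))   # 1 0
print(counting({1, 3}), uniform({1}))    # 2 1/4
```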
Example 1.1.15. Any finite non-negative linear combination of measures
is again a measure; any finite convex combination of probability measures
is again a probability measure.
Example 1.1.16. If $f : X \to Y$ is a measurable map from one measurable space $(X, \mathcal{X})$ to another $(Y, \mathcal{Y})$, and $\mu$ is a measure on $\mathcal{X}$, we can define the push-forward $f_*\mu : \mathcal{Y} \to [0, +\infty]$ by the formula $f_*\mu(E) := \mu(f^{-1}(E))$; this is a measure on $(Y, \mathcal{Y})$. Thus, for instance, $f_*\delta_x = \delta_{f(x)}$ for all $x \in X$.
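When $X$ is finite with the discrete σ-algebra, a measure is determined by its point masses $\mu(\{x\})$, and the push-forward simply collects the mass of each fibre $f^{-1}(\{y\})$. The sketch below assumes this finite, discrete setting; `pushforward` and `parity` are hypothetical names used for illustration. The first call checks the identity $f_*\delta_x = \delta_{f(x)}$.

```python
from collections import Counter


def pushforward(f, point_masses):
    """Push-forward f_* mu of a measure on a finite set with the discrete sigma-algebra.

    point_masses maps each x to mu({x}); the result maps each y to
    (f_* mu)({y}) = mu(f^{-1}({y})), which determines f_* mu on every subset.
    """
    out = Counter()
    for x, mass in point_masses.items():
        out[f(x)] += mass
    return dict(out)


def parity(x):
    return x % 2


print(pushforward(parity, {3: 1}))                    # {1: 1}, i.e. f_* delta_3 = delta_1
print(pushforward(parity, {1: 1, 2: 1, 3: 1, 4: 1}))  # {1: 2, 0: 2}
```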

We record some basic properties of measures of sets:


Exercise 1.1.6. Let $(X, \mathcal{X}, \mu)$ be a measure space. Show the following statements:
(i) Monotonicity. If $E \subset F$ are measurable sets, then $\mu(E) \le \mu(F)$. (In particular, any measurable subset of a null set is again a null set.)
(ii) Countable subadditivity. If $E_1, E_2, \ldots$ are a countable sequence of measurable sets, then $\mu(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty \mu(E_n)$. (Of course, one also has subadditivity for finite sequences.) In particular, any countable union of null sets is again a null set.
(iii) Monotone convergence for sets. If $E_1 \subset E_2 \subset \cdots$ are measurable, then $\mu(\bigcup_{n=1}^\infty E_n) = \lim_{n \to \infty} \mu(E_n)$.
(iv) Dominated convergence for sets. If $E_1 \supset E_2 \supset \cdots$ are measurable, and $\mu(E_1)$ is finite, then $\mu(\bigcap_{n=1}^\infty E_n) = \lim_{n \to \infty} \mu(E_n)$. Show that the claim can fail if $\mu(E_1)$ is infinite.
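For part (iv), one standard example (not taken from the text) of the failure when $\mu(E_1) = +\infty$ uses counting measure $\#$ on $\mathbb{N} = \{1, 2, 3, \ldots\}$ with the discrete σ-algebra, and the decreasing "tails" $E_n$:

```latex
\begin{align*}
E_n &:= \{n, n+1, n+2, \ldots\}, \qquad E_1 \supset E_2 \supset \cdots, \qquad \#(E_n) = +\infty \text{ for all } n,\\
\#\Big(\bigcap_{n=1}^{\infty} E_n\Big) &= \#(\emptyset) = 0 \neq +\infty = \lim_{n \to \infty} \#(E_n).
\end{align*}
```

Every point $k$ eventually leaves the sets $E_n$ (it is not in $E_{k+1}$), so the intersection is empty even though each $E_n$ has infinite measure.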


Exercise 1.1.7. A measure space is said to be complete if every subset of a null set is measurable (and is thus again a null set). Show that every measure space $(X, \mathcal{X}, \mu)$ has a unique minimal complete refinement $(X, \overline{\mathcal{X}}, \overline{\mu})$, known as the completion of $(X, \mathcal{X}, \mu)$, and that a set is measurable in $\overline{\mathcal{X}}$ if and only if it is equal almost everywhere to a measurable set in $\mathcal{X}$. (The completion of the Borel σ-algebra with respect to Lebesgue measure is known as the Lebesgue σ-algebra.)

A powerful way to construct measures on σ-algebras $\mathcal{X}$ is to first construct them on a smaller Boolean algebra $\mathcal{A}$ that generates $\mathcal{X}$, and then extend them via the following result:
Theorem 1.1.17 (Carathéodory’s extension theorem, special case). Let $(X, \mathcal{X})$ be a measurable space, and let $\mathcal{A}$ be a Boolean algebra (i.e., closed under finite unions, intersections, and complements) that generates $\mathcal{X}$. Let $\mu : \mathcal{A} \to [0, +\infty]$ be a function such that
(i) $\mu(\emptyset) = 0$;
(ii) If $A_1, A_2, \ldots \in \mathcal{A}$ are disjoint and $\bigcup_{n=1}^\infty A_n \in \mathcal{A}$, then
$$\mu\Big(\bigcup_{n=1}^\infty A_n\Big) = \sum_{n=1}^\infty \mu(A_n).$$
Then $\mu$ can be extended to a measure $\mu : \mathcal{X} \to [0, +\infty]$ on $\mathcal{X}$, which we shall also call $\mu$.
Remark 1.1.18. The conditions (i) and (ii) in the above theorem are clearly necessary if $\mu$ has any chance to be extended to a measure on $\mathcal{X}$. Thus this theorem gives a necessary and sufficient condition for a function on a Boolean algebra to be extended to a measure. The extension can easily be shown to be unique when $X$ is σ-finite.

Proof. (Sketch) Define the outer measure $\mu^*(E)$ of any set $E \subset X$ as the infimum of $\sum_{n=1}^\infty \mu(A_n)$, where $(A_n)_{n=1}^\infty$ ranges over all coverings of $E$ by elements in $\mathcal{A}$. It is not hard to see (using hypothesis (ii)) that $\mu^*$ agrees with $\mu$ on $\mathcal{A}$, so it will suffice to show that $\mu^*$ is a measure on $\mathcal{X}$.
It is easy to check that $\mu^*$ is monotone and countably subadditive (as in parts (i), (ii) of Exercise 1.1.6) on all of $2^X$ and assigns zero to $\emptyset$; thus it is an outer measure in the abstract sense. But we need to show countable additivity on $\mathcal{X}$. The key is to first show the related property
$$\mu^*(A) = \mu^*(A \cap E) + \mu^*(A \backslash E) \tag{1.2}$$
for all $A \subset X$ and $E \in \mathcal{X}$. This can first be shown for $E \in \mathcal{A}$, and then one observes that the class of $E$ that obeys (1.2) for all $A$ is a σ-algebra; we leave this as a (moderately lengthy) exercise.
The identity (1.2) already shows that $\mu^*$ is finitely additive on $\mathcal{X}$; combining this with countable subadditivity and monotonicity, we conclude that $\mu^*$ is countably additive, as required. □
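The construction in the proof can be run by hand in a toy setting. Assume (for illustration only) that $X$ is finite and $\mathcal{A}$ is generated by a finite partition into cells, with $\mu$ given by hypothetical non-negative weights on the cells. Every set in $\mathcal{A}$ is a union of cells, so any cover of $E$ by sets in $\mathcal{A}$ contains every cell meeting $E$, and the infimum defining $\mu^*(E)$ is the total weight of those cells. The sketch then checks the Carathéodory identity (1.2) by brute force over all test sets $A$; all function names are hypothetical.

```python
from itertools import combinations


def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]


def outer_measure(E, cells, weight):
    """mu*(E) when the algebra A is generated by the finite partition `cells`.

    Each set in A is a union of cells, so any cover of E by sets in A must
    contain every cell meeting E; the cheapest cover uses exactly those cells.
    """
    return sum(weight[C] for C in cells if C & E)


def satisfies_1_2(E, X, cells, weight):
    """Check mu*(A) = mu*(A & E) + mu*(A - E) for every test set A of X."""
    E = frozenset(E)
    return all(
        outer_measure(A, cells, weight)
        == outer_measure(A & E, cells, weight) + outer_measure(A - E, cells, weight)
        for A in subsets(X)
    )


X = {1, 2, 3, 4}
cells = [frozenset({1, 2}), frozenset({3, 4})]
weight = {cells[0]: 1, cells[1]: 2}        # hypothetical values of mu on the cells
print(outer_measure(frozenset({1, 3}), cells, weight))   # 3
print(satisfies_1_2({1, 2}, X, cells, weight))           # True: {1, 2} lies in A
print(satisfies_1_2({1}, X, cells, weight))              # False: splits a cell of positive mass
```

The failing set $\{1\}$ shows why (1.2) is the right test: the test set $A = \{1, 2\}$ has $\mu^*(A) = 1$, but $\mu^*(A \cap E) + \mu^*(A \backslash E) = 2$.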
Exercise 1.1.8. Let the notation and hypotheses be as in Theorem 1.1.17. Show that given any $\varepsilon > 0$ and any set $E \in \mathcal{X}$ of finite measure, there exists a set $F \in \mathcal{A}$ which differs from $E$ by a set of measure at most $\varepsilon$. If $X$ is σ-finite, show that the hypothesis that $E$ has finite measure can be removed. (Hint: First reduce to the case when $X$ is finite, then show that the class of all $E$ obeying this property is a σ-algebra.) Thus sets in the σ-algebra $\mathcal{X}$ almost lie in the algebra $\mathcal{A}$; this is an example of Littlewood’s first principle. The same statements of course apply for the completion $\overline{\mathcal{X}}$ of $\mathcal{X}$.

One can use Theorem 1.1.17 to construct Lebesgue measure on $\mathbb{R}$ and on $\mathbb{R}^n$ (taking $\mathcal{A}$ to be, say, the algebra generated by half-open intervals or boxes), although the verification of hypothesis (ii) of Theorem 1.1.17 turns out to be somewhat delicate, even in the one-dimensional case. But one can at least get the higher-dimensional Lebesgue measure from the one-dimensional one by the product measure construction:
Exercise 1.1.9. Let $(X_1, \mathcal{X}_1, \mu_1), \ldots, (X_n, \mathcal{X}_n, \mu_n)$ be a finite collection of σ-finite measure spaces, and let $(\prod_{i=1}^n X_i, \prod_{i=1}^n \mathcal{X}_i)$ be the product measurable space. Show that there exists a unique measure $\mu$ on this space such that $\mu(\prod_{i=1}^n A_i) = \prod_{i=1}^n \mu_i(A_i)$ for all $A_i \in \mathcal{X}_i$. The measure $\mu$ is referred to as the product measure of the $\mu_1, \ldots, \mu_n$ and is denoted $\prod_{i=1}^n \mu_i$.
Exercise 1.1.10. Let $E$ be a Lebesgue measurable subset of $\mathbb{R}^n$, and let $m$ be Lebesgue measure. Establish the inner regularity property
$$m(E) = \sup\{m(K) : K \subset E,\ K \text{ compact}\} \tag{1.3}$$
and the outer regularity property
$$m(E) = \inf\{m(U) : E \subset U,\ U \text{ open}\}. \tag{1.4}$$
Combined with the fact that $m$ is locally finite, this implies that $m$ is a Radon measure; see Definition 1.10.2.

1.1.3. Integration. Now we define integration on a measure space $(X, \mathcal{X}, \mu)$.
Definition 1.1.19 (Integration). Let $(X, \mathcal{X}, \mu)$ be a measure space.
(i) If $f : X \to [0, +\infty]$ is a non-negative simple function (i.e., a measurable function that only takes on finitely many values $a_1, \ldots, a_n$), we define the integral $\int_X f\, d\mu$ of $f$ to be $\int_X f\, d\mu := \sum_{i=1}^n a_i \mu(f^{-1}(\{a_i\}))$ (with the convention that $\infty \cdot 0 = 0$). In particular, if $f = 1_A$ is the indicator function of a measurable set $A$, then $\int_X 1_A\, d\mu = \mu(A)$.
(ii) If $f : X \to [0, +\infty]$ is a non-negative measurable function, we define the integral $\int_X f\, d\mu$ to be the supremum of $\int_X g\, d\mu$, where $g$ ranges over all simple functions bounded between $0$ and $f$.
(iii) If $f : X \to [-\infty, +\infty]$ is a measurable function whose positive and negative parts $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ have finite integral, we say that $f$ is absolutely integrable and define $\int_X f\, d\mu := \int_X f_+\, d\mu - \int_X f_-\, d\mu$.
(iv) If $f : X \to \mathbb{C}$ is a measurable function with real and imaginary parts absolutely integrable, we say that $f$ is absolutely integrable and define $\int_X f\, d\mu := \int_X \operatorname{Re} f\, d\mu + i \int_X \operatorname{Im} f\, d\mu$.

We will sometimes show the variable of integration, e.g., writing $\int_X f(x)\, d\mu(x)$ for $\int_X f\, d\mu$, for the sake of clarity.
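Part (i) of Definition 1.1.19 is a finite sum and can be evaluated directly once the level sets and their measures are known. The sketch below is only an illustration under that assumption: `integrate_simple` is a hypothetical helper, the simple function is passed as a map from each value $a_i$ to its level set, and the measure is any Python function of a set (here counting measure, `len`). The convention $\infty \cdot 0 = 0$ has to be imposed by hand, since in floating-point arithmetic `math.inf * 0` is `nan`.

```python
import math


def integrate_simple(level_sets, mu):
    """Integral of a non-negative simple function, following Definition 1.1.19(i).

    level_sets maps each value a_i (a non-negative number, possibly math.inf)
    to its level set f^{-1}({a_i}); mu returns the measure of a set.
    """
    total = 0
    for a, level_set in level_sets.items():
        m = mu(level_set)
        if a == 0 or m == 0:
            continue                      # convention: inf * 0 = 0
        total += a * m
    return total


# f = 2 on {1, 2}, 5 on {3}, 0 on {4}, integrated against counting measure (mu = len)
print(integrate_simple({2: {1, 2}, 5: {3}, 0: {4}}, len))   # 9
```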

The following results are standard, and the proofs are omitted:

Theorem 1.1.20 (Standard facts about integration). Let $(X, \mathcal{X}, \mu)$ be a measure space.
• All the above integration notions are compatible with each other; for instance, if $f$ is both non-negative and absolutely integrable, then parts (ii) and (iii) (and (iv)) of Definition 1.1.19 agree.
• The functional $f \mapsto \int_X f\, d\mu$ is linear over $\mathbb{R}^+$ for simple functions or non-negative functions, is linear over $\mathbb{R}$ for real-valued absolutely integrable functions, and linear over $\mathbb{C}$ for complex-valued absolutely integrable functions. In particular, the set of (real or complex) absolutely integrable functions on $(X, \mathcal{X}, \mu)$ is a (real or complex) vector space.
• A complex-valued measurable function $f : X \to \mathbb{C}$ is absolutely integrable if and only if $\int_X |f|\, d\mu < \infty$, in which case we have the triangle inequality $|\int_X f\, d\mu| \le \int_X |f|\, d\mu$. Of course, the same claim holds for real-valued measurable functions.
• If $f : X \to [0, +\infty]$ is non-negative, then $\int_X f\, d\mu \ge 0$, with equality holding if and only if $f = 0$ a.e.
• If one modifies an absolutely integrable function on a set of measure zero, then the new function is also absolutely integrable and has the same integral as the original function. Similarly, two non-negative functions that agree a.e. have the same integral. (Because of this, we can meaningfully integrate functions that are only defined almost everywhere.)
• If $f : X \to \mathbb{C}$ is absolutely integrable, then $f$ is finite a.e. and vanishes outside of a σ-finite set.
• If $f : X \to \mathbb{C}$ is absolutely integrable and $\varepsilon > 0$, then there exists a complex-valued simple function $g : X \to \mathbb{C}$ such that $\int_X |f - g|\, d\mu \le \varepsilon$. (This is a manifestation of Littlewood’s second principle.)
• Change of variables formula. If $\phi : X \to Y$ is a measurable map to another measurable space $(Y, \mathcal{Y})$, and $g : Y \to \mathbb{C}$ is measurable, then we have $\int_X g \circ \phi\, d\mu = \int_Y g\, d\phi_*\mu$, in the sense that whenever one of the integrals is well defined, then the other is also and equals the first.
• It is also important to note that the Lebesgue integral on $\mathbb{R}^n$ extends the more classical Riemann integral. As a consequence, many properties of the Riemann integral (e.g., change of variables formula with respect to smooth diffeomorphisms) are inherited by the Lebesgue integral, thanks to various limiting arguments.
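The change of variables formula $\int_X g \circ \phi\, d\mu = \int_Y g\, d\phi_*\mu$ can be sanity-checked on finite sets with the discrete σ-algebra, where both sides are finite sums. This is an illustrative sketch only: the point masses, the map `phi`, and the complex-valued `g` are hypothetical choices, and `integrate`/`pushforward` are helper names introduced here.

```python
from collections import Counter


def integrate(g, point_masses):
    """Integral of g against a measure on a finite set given by point masses."""
    return sum(g(x) * w for x, w in point_masses.items())


def pushforward(phi, point_masses):
    """(phi_* mu)({y}) = mu(phi^{-1}({y})) for each y in the image of phi."""
    out = Counter()
    for x, w in point_masses.items():
        out[phi(x)] += w
    return dict(out)


mu = {1: 1, 2: 2, 3: 3, 4: 4}      # hypothetical point masses on X = {1, 2, 3, 4}


def phi(x):
    return x % 2                   # a map X -> Y = {0, 1}


def g(y):
    return 3 * y + 1j              # a complex-valued function on Y


lhs = integrate(lambda x: g(phi(x)), mu)     # integral of g o phi against mu
rhs = integrate(g, pushforward(phi, mu))     # integral of g against phi_* mu
print(lhs, rhs)                              # (12+10j) (12+10j)
```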

We now recall the fundamental convergence theorems relating limits and integration: the first three are for non-negative functions, the last three are for absolutely integrable functions. They are ultimately derived from their namesakes in Exercise 1.1.6 and an approximation argument by simple functions; again the proofs are omitted. (They are also closely related to each other, and are in fact largely equivalent.)
Theorem 1.1.21 (Convergence theorems). Let $(X, \mathcal{X}, \mu)$ be a measure space.
• Monotone convergence for sequences. If $0 \le f_1 \le f_2 \le \cdots$ are measurable, then $\int_X \lim_{n \to \infty} f_n\, d\mu = \lim_{n \to \infty} \int_X f_n\, d\mu$.
• Monotone convergence for series. If $f_n : X \to [0, +\infty]$ are measurable, then $\int_X \sum_{n=1}^\infty f_n\, d\mu = \sum_{n=1}^\infty \int_X f_n\, d\mu$.
• Fatou’s lemma. If $f_n : X \to [0, +\infty]$ are measurable, then
$$\int_X \liminf_{n \to \infty} f_n\, d\mu \le \liminf_{n \to \infty} \int_X f_n\, d\mu.$$
• Dominated convergence for sequences. If $f_n : X \to \mathbb{C}$ are measurable functions converging pointwise a.e. to a limit $f$ and $|f_n| \le g$ a.e. for some absolutely integrable $g : X \to [0, +\infty]$, then
$$\int_X \lim_{n \to \infty} f_n\, d\mu = \lim_{n \to \infty} \int_X f_n\, d\mu.$$
• Dominated convergence for series. If $f_n : X \to \mathbb{C}$ are measurable functions with $\sum_n \int_X |f_n|\, d\mu < \infty$, then $\sum_n f_n(x)$ is absolutely convergent for a.e. $x$ and $\int_X \sum_{n=1}^\infty f_n\, d\mu = \sum_{n=1}^\infty \int_X f_n\, d\mu$.
• Egorov’s theorem. If $f_n : X \to \mathbb{C}$ are measurable functions converging pointwise a.e. to a limit $f$ on a subset $A$ of $X$ of finite measure and $\varepsilon > 0$, then there exists a set of measure at most $\varepsilon$, outside of which $f_n$ converges uniformly to $f$ in $A$. (This is a manifestation of Littlewood’s third principle.)


Remark 1.1.22. As a rule of thumb, if one does not have exact or approximate monotonicity or domination (where “approximate” means “up to an error $e$ whose $L^1$ norm $\int_X |e|\, d\mu$ goes to zero”), then one should not expect the integral of a limit to equal the limit of the integral in general; there is just too much room for oscillation.
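A standard example (not taken from the text) of this failure: on $\mathbb{R}$ with Lebesgue measure $m$, take $f_n := n 1_{(0,1/n)}$. These functions are not monotone, and they admit no absolutely integrable dominating function (their supremum over $n$ is comparable to $1/x$ on $(0,1)$), so neither monotone nor dominated convergence applies, and indeed all of the mass escapes in the limit:

```latex
\begin{align*}
\int_{\mathbb{R}} f_n \, dm &= n \cdot m\big((0, 1/n)\big) = 1 \quad \text{for every } n, \\
\lim_{n \to \infty} f_n(x) &= 0 \quad \text{for every } x \in \mathbb{R}, \\
\int_{\mathbb{R}} \lim_{n \to \infty} f_n \, dm &= 0 \neq 1 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n \, dm.
\end{align*}
```

This is consistent with Fatou’s lemma, which only guarantees the inequality $0 \le 1$.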
Exercise 1.1.11. Let $f : X \to \mathbb{C}$ be an absolutely integrable function on a measure space $(X, \mathcal{X}, \mu)$. Show that $f$ is uniformly integrable, in the sense that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\int_E |f|\, d\mu \le \varepsilon$ whenever $E$ is a measurable set of measure at most $\delta$. (The property of uniform integrability becomes more interesting, of course, when applied to a family of functions rather than to a single function.)

With regard to product measures and integration, the fundamental theorem in this subject is
Theorem 1.1.23 (Fubini-Tonelli theorem). Let $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ be σ-finite measure spaces, with product space $(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)$.
• Tonelli’s theorem. If $f : X \times Y \to [0, +\infty]$ is measurable, then
$$\int_{X \times Y} f\, d\mu \times \nu = \int_X \Big(\int_Y f(x, y)\, d\nu(y)\Big)\, d\mu(x) = \int_Y \Big(\int_X f(x, y)\, d\mu(x)\Big)\, d\nu(y).$$
• Fubini’s theorem. If $f : X \times Y \to \mathbb{C}$ is absolutely integrable, then we also have
$$\int_{X \times Y} f\, d\mu \times \nu = \int_X \Big(\int_Y f(x, y)\, d\nu(y)\Big)\, d\mu(x) = \int_Y \Big(\int_X f(x, y)\, d\mu(x)\Big)\, d\nu(y),$$
with the inner integrals being absolutely integrable a.e. and the outer integrals all being absolutely integrable.
If $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ are complete measure spaces, then the same claims hold with the product σ-algebra $\mathcal{X} \times \mathcal{Y}$ replaced by its completion.
Remark 1.1.24. The theorem fails for non-σ-finite spaces, but virtually ev-
ery measure space actually encountered in “hard analysis” applications will
be σ-finite. (One should be cautious, however, with any space constructed
using ultrafilters or the first uncountable ordinal.) It is also important that
f obey some measurability in the product space; there exist non-measurable
f for which the iterated integrals exist (and may or may not be equal to
each other, depending on the properties of f and even on which axioms of
set theory one chooses), but the product integral (of course) does not.
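Remark 1.1.24 singles out σ-finiteness and measurability; the absolute integrability hypothesis in Fubini's theorem cannot be dropped either. A standard counterexample (an illustration added here, not taken from the text) uses counting measure on N × N, which is σ-finite, and the array a(m, n) equal to 1 if n = m, to −1 if n = m + 1, and to 0 otherwise: every row sums to 0, while the columns sum to 1, 0, 0, .... The Python sketch below checks this with exact integer arithmetic. Each row and column has finite support, so the inner sums are computed exactly, and since all but the first outer term vanish, the outer partial sums already equal the full iterated sums.

```python
def a(m: int, n: int) -> int:
    """Entry of the array on N x N (counting measure on both factors)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def row_sum(m: int) -> int:
    # Row m is supported on n in {m, m + 1}, so summing over n < m + 2 is exact.
    return sum(a(m, n) for n in range(m + 2))

def column_sum(n: int) -> int:
    # Column n is supported on m in {n - 1, n}, so summing over m <= n is exact.
    return sum(a(m, n) for m in range(n + 1))

N = 50  # number of outer terms; every outer term after the first is zero
sum_over_rows_first = sum(row_sum(m) for m in range(N))        # sum_m sum_n a(m, n)
sum_over_columns_first = sum(column_sum(n) for n in range(N))  # sum_n sum_m a(m, n)
total_absolute_mass = sum(abs(a(m, n)) for m in range(N) for n in range(N))

print(sum_over_rows_first, sum_over_columns_first)  # 0 1
print(total_absolute_mass)  # 99, i.e. 2N - 1, which is unbounded in N
```

The two iterated sums disagree precisely because the sum of |a(m, n)| over N × N diverges.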

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/01.
Thanks to Andy, PDEBeginner, Phil, Sune Kristian Jacobsen, wangtwo,
and an anonymous commenter for corrections.
Several commenters noted Solovay’s theorem, which asserts that there
exist models of set theory without the axiom of choice in which all sets are
measurable. This led to some discussion of the extent to which one could
formalise the claim that any set which could be defined without the axiom
of choice was necessarily measurable, but the discussion was inconclusive.

Section 1.2

Signed measures and the Radon-Nikodym-Lebesgue theorem

In this section, $X = (X, \mathcal{X})$ is a fixed measurable space. We shall often
omit the σ-algebra $\mathcal{X}$ and simply refer to elements of $\mathcal{X}$ as measurable sets.
Unless otherwise indicated, all subsets of X appearing below are restricted
to be measurable, and all functions on X appearing below are also restricted
to be measurable.
We let M+(X) denote the space of measures on X, i.e., functions
$\mu : \mathcal{X} \to [0, +\infty]$ which are countably additive and send ∅ to 0. For reasons that
will be clearer later, we shall refer to such measures as unsigned measures.
In this section we investigate the structure of this space, together with the
closely related spaces of signed measures and finite measures.
Suppose that we have already constructed one unsigned measure m ∈
M+ (X) on X (e.g., think of X as the real line with the Borel σ-algebra,
and let m be the Lebesgue measure). Then we can obtain many further
unsigned measures on X by multiplying m by a function f : X → [0, +∞],
to obtain a new unsigned measure mf , defined by the formula

(1.5) $m_f(E) := \int_X 1_E f \, dm.$

If $f = 1_A$ is an indicator function, we write $m\downharpoonright_A$ for $m_{1_A}$, and refer to
this measure as the restriction of m to A.

Exercise 1.2.1. Show (using the monotone convergence theorems, Theorem
1.1.21) that $m_f$ is indeed an unsigned measure, and for any $g : X \to [0, +\infty]$,
we have $\int_X g \, dm_f = \int_X g f \, dm$. We will express this relationship symbolically as
(1.6) $dm_f = f \, dm$.
Exercise 1.2.2. Let m be σ-finite. Given two functions f, g : X → [0, +∞],
show that mf = mg if and only if f (x) = g(x) for m-almost every x. (Hint:
As usual, first do the case when m is finite. The key point is that if f and
g are not equal m-almost everywhere, then either f > g on a set of positive
measure, or f < g on a set of positive measure.) Give an example to show
that this uniqueness statement can fail if m is not σ-finite. (Hint: The space
X can be very simple.)

In view of Exercises 1.2.1 and 1.2.2, let us temporarily call a measure
μ differentiable with respect to m if dμ = f dm (i.e., μ = m_f) for some
f : X → [0, +∞], and call f the Radon-Nikodym derivative of μ with respect
to m, writing
(1.7) $f = \frac{d\mu}{dm}$;
by Exercise 1.2.2, if m is σ-finite, then this derivative is well defined up to
m-almost everywhere equivalence.
Exercise 1.2.3 (Relationship between Radon-Nikodym derivative and clas-
sical derivative). Let m be the Lebesgue measure on [0, +∞), and let μ be
an unsigned measure that is differentiable with respect to m. If μ has a continuous
Radon-Nikodym derivative $\frac{d\mu}{dm}$, show that the function $x \mapsto \mu([0, x])$
is differentiable and $\frac{d}{dx}\mu([0, x]) = \frac{d\mu}{dm}(x)$ for all x.
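As a numerical sanity check of Exercise 1.2.3 (a sketch, not a proof), one can take the hypothetical continuous density f(t) = e^{−t} on [0, +∞), set μ = m_f, and compare a symmetric difference quotient of x ↦ μ([0, x]) with f(x). The quotient (μ([0, x + h]) − μ([0, x − h]))/(2h) equals μ((x − h, x + h])/(2h), which the code approximates by a composite midpoint rule; the step count and the value of h are arbitrary choices.

```python
import math

def density(t: float) -> float:
    """Hypothetical continuous Radon-Nikodym derivative d(mu)/dm."""
    return math.exp(-t)

def mu_interval(a: float, b: float, steps: int = 2000) -> float:
    """Approximate mu((a, b]), the integral of the density over (a, b], by the midpoint rule."""
    width = (b - a) / steps
    return width * sum(density(a + (i + 0.5) * width) for i in range(steps))

def derivative_of_distribution(x: float, h: float = 1e-3) -> float:
    """Symmetric difference quotient of x -> mu([0, x]) at x (requires x >= h)."""
    return mu_interval(x - h, x + h) / (2 * h)

for x in (0.5, 1.0, 2.0):
    print(x, derivative_of_distribution(x), density(x))
```

The two printed columns agree to roughly h²/6 times the size of f'' near x, as expected for an average of a smooth function over an interval of length 2h.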
Exercise 1.2.4. Let X be at most countable with the discrete σ-algebra.
Show that every measure on X is differentiable with respect to counting
measure #.
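On a finite set with the discrete σ-algebra, the constructions (1.5)-(1.7) reduce to finite sums, so Exercises 1.2.1 and 1.2.4 can be checked mechanically. In the following Python sketch a measure is stored as a dictionary sending each point x to the mass of {x}; this representation, the helper names, and the sample numbers are our own choices. It builds m_f, checks the identity ∫_X g dm_f = ∫_X g f dm, and recovers dμ/d# as x ↦ μ({x}).

```python
from fractions import Fraction

# A measure on a finite set is stored as a dict: point -> mass of {point}.
Measure = dict

def measure_of(mu: Measure, E) -> Fraction:
    """mu(E) for a subset E, by (finite) additivity."""
    return sum((mu[x] for x in E), Fraction(0))

def weighted(m: Measure, f) -> Measure:
    """The measure m_f of (1.5): m_f({x}) = f(x) m({x})."""
    return {x: f(x) * m[x] for x in m}

def integrate(g, m: Measure) -> Fraction:
    """Integral of the function g against m on a finite set."""
    return sum((g(x) * m[x] for x in m), Fraction(0))

X = ["a", "b", "c"]
m = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(1, 2)}
f = {"a": Fraction(3), "b": Fraction(0), "c": Fraction(4)}.__getitem__
g = {"a": Fraction(1, 3), "b": Fraction(5), "c": Fraction(2)}.__getitem__

m_f = weighted(m, f)
# Exercise 1.2.1 in this setting: integral of g dm_f equals integral of g f dm.
assert integrate(g, m_f) == integrate(lambda x: g(x) * f(x), m)

# Exercise 1.2.4: every mu is differentiable w.r.t. counting measure #,
# with d(mu)/d(#)(x) = mu({x}).
counting = {x: Fraction(1) for x in X}
mu = {"a": Fraction(7), "b": Fraction(0), "c": Fraction(1, 5)}
derivative = mu.__getitem__
assert weighted(counting, derivative) == mu
print(measure_of(m_f, ["a", "c"]))  # 5, i.e. 3*1 + 4*(1/2)
```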

If every measure were differentiable with respect to m (as is the case in
Exercise 1.2.4), then we would have completely described the space of measures
on X in terms of the non-negative functions on X (modulo m-almost
everywhere equivalence). Unfortunately, not every measure is differentiable
with respect to every other: for instance, if x is a point in X, then the only
measures that are differentiable with respect to the Dirac measure δx are
the scalar multiples of that measure. We will explore the precise obstruc-
tion that prevents all measures from being differentiable, culminating in the
Radon-Nikodym-Lebesgue theorem that gives a satisfactory understanding
of the situation in the σ-finite case (which is the case of interest for most
applications).

In order to establish this theorem, it will be important to first study
some other basic operations on measures, notably the ability to subtract one
measure from another. This will necessitate the study of signed measures,
to which we now turn.

1.2.1. Signed measures. We have seen that if we fix a reference measure
m, then non-negative functions f : X → [0, +∞] (modulo m-almost everywhere
equivalence) can be identified with unsigned measures $m_f : \mathcal{X} \to [0, +\infty]$.
This motivates various operations on measures that are analogous
to operations on functions (indeed, one could view measures as a kind of
“generalised function” with respect to a fixed reference measure m). For in-
stance, we can define the sum of two unsigned measures μ, ν : X → [0, +∞]
as
(1.8) (μ + ν)(E) := μ(E) + ν(E)
and non-negative scalar multiples cμ for c > 0 by
(1.9) (cμ)(E) := c(μ(E)).
We can also say that one measure μ is less than or equal to another ν, and write μ ≤ ν, if
(1.10) μ(E) ≤ ν(E) for all E ∈ X .
These operations are all consistent with their functional counterparts, e.g.,
$m_{f+g} = m_f + m_g$, etc.
Next, we would like to define the difference μ − ν of two unsigned mea-
sures. The obvious thing to do is to define
(1.11) (μ − ν)(E) := μ(E) − ν(E),
but we have a problem if μ(E) and ν(E) are both infinite: ∞ − ∞ is unde-
fined! To fix this problem, we will only define the difference of two unsigned
measures μ, ν if at least one of them is a finite measure. Observe that in
such a case, μ − ν takes values in (−∞, +∞] or [−∞, +∞), but not both.
Of course, we no longer expect μ − ν to be monotone. However, it is
still finitely additive and even countably additive in the sense that the sum
$\sum_{n=1}^\infty (\mu - \nu)(E_n)$ converges to $(\mu - \nu)(\bigcup_{n=1}^\infty E_n)$ whenever E1, E2, . . . are
disjoint sets. Furthermore, the sum is absolutely convergent when
$(\mu - \nu)(\bigcup_{n=1}^\infty E_n)$ is finite. This motivates

Definition 1.2.1 (Signed measure). A signed measure is a map $\mu : \mathcal{X} \to [-\infty, +\infty]$ such that
(i) μ(∅) = 0;
(ii) μ can take either the value +∞ or −∞, but not both;
(iii) If E1, E2, . . . ⊂ X are disjoint, then $\sum_{n=1}^\infty \mu(E_n)$ converges to
$\mu(\bigcup_{n=1}^\infty E_n)$, with the former sum being absolutely convergent¹ if
the latter expression is finite.
Thus every unsigned measure is a signed measure, and the difference of
two unsigned measures is a signed measure if at least one of the unsigned
measures is finite; we will see shortly that the converse statement is also true;
i.e., every signed measure is the difference of two unsigned measures (with
one of the unsigned measures being finite). Further examples of signed
measures are the measures m_f defined by (1.5), where f : X → [−∞, +∞]
is now signed rather than unsigned, but with the assumption that at least
one of the signed parts f+ := max(f, 0), f− := max(−f, 0) of f is absolutely
integrable.
We also observe that a signed measure μ is unsigned if and only if μ ≥ 0
(where we use (1.10) to define order on measures).
Given a function f : X → [−∞, +∞], we can partition X into one
set X+ := {x : f(x) ≥ 0} on which f is non-negative and another set
X− := {x : f(x) < 0} on which f is negative; thus $f\downharpoonright_{X_+} \ge 0$ and $f\downharpoonright_{X_-} \le 0$.
It turns out that the same is true for signed measures:
Theorem 1.2.2 (Hahn decomposition theorem). Let μ be a signed measure.
Then one can find a partition X = X+ ∪ X− such that $\mu\downharpoonright_{X_+} \ge 0$ and
$\mu\downharpoonright_{X_-} \le 0$.

Proof. By replacing μ with −μ if necessary, we may assume that μ avoids
the value +∞.
Call a set E totally positive if $\mu\downharpoonright_E \ge 0$ and totally negative if $\mu\downharpoonright_E \le 0$.
The idea is to pick X+ to be the totally positive set of maximal measure,
a kind of “greedy algorithm”, if you will. More precisely, define m+ to
be the supremum of μ(E), where E ranges over all totally positive sets.
(The supremum is non-vacuous, since the empty set is totally positive.) We
claim that the supremum is actually attained. Indeed, we can always find a
maximising sequence E1, E2, . . . of totally positive sets with μ(En) → m+.
It is not hard to see that the union $X_+ := \bigcup_{n=1}^\infty E_n$ is also totally positive,
and μ(X+) = m+ as required. Since μ avoids +∞, we see in particular that
m+ is finite.
Set X− := X\X+. We claim that X− is totally negative. Suppose for
contradiction that X− is not totally negative; then there exists a set E1 in X−
of strictly positive measure. If E1 is totally positive, then X+ ∪ E1 is a totally
positive set having measure strictly greater than m+, a contradiction. Thus E1
contains a subset F of negative measure, and then E1\F is a subset of E1 of
strictly larger measure. Let us pick a subset E2 of E1 so that μ(E2) ≥ μ(E1) + 1/n1,
where n1 is the smallest positive integer for which such an E2 exists. If E2 is
totally positive, then we are again done, so we can find a subset E3 of E2 with
μ(E3) ≥ μ(E2) + 1/n2, where n2 is the smallest positive integer for which such an E3
exists. Continuing in this fashion, we either stop and get a contradiction or obtain
a nested sequence of sets E1 ⊃ E2 ⊃ · · · in X− of increasing positive measure (with
μ(Ej+1) ≥ μ(Ej) + 1/nj). The intersection $E := \bigcap_j E_j$ then also has positive
measure, which is finite since μ avoids +∞; as μ(Ej) increases to μ(E), the series
$\sum_j 1/n_j$ converges, which implies that the nj go to infinity. It is then not
difficult to see that E itself cannot contain any subsets of strictly larger
measure, and so E is a totally positive set of positive measure in X−. But then
X+ ∪ E is a totally positive set of measure strictly greater than m+, and we
again obtain a contradiction. □

¹ Actually, the absolute convergence is automatic from the Riemann rearrangement theorem.
Another consequence of (iii) is that any subset of a finite measure set is again of finite measure,
and the finite union of finite measure sets again has finite measure.

Remark 1.2.3. A somewhat simpler proof of the Hahn decomposition theorem
is available if we assume μ to have finite positive variation (which means
that μ(E) is bounded above as E varies). For each positive n, let En be a set
whose measure μ(En) is within $2^{-n}$ of $\sup\{\mu(E) : E \in \mathcal{X}\}$. One can easily
show that any subset of $E_n \setminus E_{n-1}$ has measure $O(2^{-n})$, and in particular
that $E_n \setminus \bigcup_{n'=n_0}^{n-1} E_{n'}$ has measure $O(2^{-n})$ for any $n_0 \le n$. This allows one
to control the unions $\bigcup_{n=n_0}^\infty E_n$, and thence the lim sup X+ of the En, which
one can then show to have the required properties. One can in fact show
that any signed measure that avoids +∞ must have finite positive variation,
but this turns out to require a certain amount of work.

Let us say that a set E is null for a signed measure μ if $\mu\downharpoonright_E = 0$. (This
implies that μ(E) = 0, but the converse is not true, since a set E of signed
measure zero could contain subsets of non-zero measure.) It is easy to see
that the sets X−, X+ given by the Hahn decomposition theorem are unique
modulo null sets.
Let us say that a signed measure μ is supported on E if the complement
of E is null (or equivalently, if $\mu\downharpoonright_E = \mu$). If two signed measures μ, ν can be
supported on disjoint sets, we say that they are mutually singular (or that
μ is singular with respect to ν) and write μ ⊥ ν. If we write $\mu_+ := \mu\downharpoonright_{X_+}$
and $\mu_- := -\mu\downharpoonright_{X_-}$, we thus soon establish

Exercise 1.2.5 (Jordan decomposition theorem). Every signed measure μ
can be uniquely decomposed as μ = μ+ − μ−, where μ+, μ− are mutually
singular unsigned measures. (The only claim not already established is
the uniqueness.) We refer to μ+ , μ− as the positive and negative parts (or
positive and negative variations) of μ.
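When X is finite and every point is measurable, the Hahn decomposition can be written down directly, exactly as in the function case discussed before Theorem 1.2.2: take X+ := {x : μ({x}) ≥ 0} and X− := {x : μ({x}) < 0}. (Points of zero mass could be placed on either side, reflecting the uniqueness only modulo null sets.) The sketch below relies on this finiteness assumption rather than on the greedy argument of the proof; it computes X± and the Jordan parts μ± for a hypothetical four-point signed measure and checks the claims of Theorem 1.2.2 and Exercise 1.2.5 on every subset. The helper names are our own.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(points):
    """All subsets of a finite list of points."""
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))

def measure_of(mu, E):
    return sum((mu[x] for x in E), Fraction(0))

def hahn_decomposition(mu):
    """Return (X_plus, X_minus) for a signed measure on a finite set."""
    X_plus = [x for x in mu if mu[x] >= 0]
    X_minus = [x for x in mu if mu[x] < 0]
    return X_plus, X_minus

def jordan_decomposition(mu):
    """mu_plus = mu restricted to X_plus; mu_minus = -(mu restricted to X_minus)."""
    X_plus, X_minus = hahn_decomposition(mu)
    mu_plus = {x: (mu[x] if x in X_plus else Fraction(0)) for x in mu}
    mu_minus = {x: (-mu[x] if x in X_minus else Fraction(0)) for x in mu}
    return mu_plus, mu_minus

mu = {"a": Fraction(2), "b": Fraction(-3), "c": Fraction(0), "d": Fraction(1, 2)}
X_plus, X_minus = hahn_decomposition(mu)
mu_plus, mu_minus = jordan_decomposition(mu)

for E in subsets(list(mu)):
    # X_plus is totally positive and X_minus is totally negative.
    assert measure_of(mu, [x for x in E if x in X_plus]) >= 0
    assert measure_of(mu, [x for x in E if x in X_minus]) <= 0
    # mu = mu_plus - mu_minus, with both parts unsigned.
    assert measure_of(mu, E) == measure_of(mu_plus, E) - measure_of(mu_minus, E)
    assert measure_of(mu_plus, E) >= 0 and measure_of(mu_minus, E) >= 0

# Mutual singularity: mu_plus lives on X_plus, mu_minus on the disjoint set X_minus.
assert measure_of(mu_plus, X_minus) == 0 and measure_of(mu_minus, X_plus) == 0
print(X_plus, X_minus)  # ['a', 'c', 'd'] ['b']
```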

This is of course analogous to the decomposition f = f+ − f− of a
function into positive and negative parts. Inspired by this, we define the
absolute value (or total variation) |μ| of a signed measure to be |μ| := μ+ + μ−.
Exercise 1.2.6. Show that |μ| is the minimal unsigned measure such that
−|μ| ≤ μ ≤ |μ|. Furthermore, |μ|(E) is equal to the maximum value of
$\sum_{n=1}^\infty |\mu(E_n)|$, where $(E_n)_{n=1}^\infty$ ranges over the partitions of E. (This may
help explain the terminology “total variation”.)
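The partition formula in Exercise 1.2.6 can be tested by brute force on a small finite set: enumerate every partition of E, compute Σ_n |μ(E_n)| for each, and compare the largest value with |μ|(E) = μ+(E) + μ−(E) = Σ_{x∈E} |μ({x})|. On a finite set a countable partition has only finitely many non-empty blocks (and empty blocks contribute nothing), so the supremum is a maximum over finite partitions. A Python sketch with a hypothetical signed measure, using helper names of our own:

```python
from fractions import Fraction

def partitions(points):
    """Yield every partition of the list `points` as a list of blocks."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for partition in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def measure_of(mu, E):
    return sum((mu[x] for x in E), Fraction(0))

def total_variation(mu, E):
    """|mu|(E) = mu_plus(E) + mu_minus(E) on a finite set."""
    return sum((abs(mu[x]) for x in E), Fraction(0))

mu = {"a": Fraction(2), "b": Fraction(-3), "c": Fraction(1, 2), "d": Fraction(-1, 4)}
E = list(mu)
best = max(sum(abs(measure_of(mu, block)) for block in p) for p in partitions(E))
print(best, total_variation(mu, E))  # 23/4 23/4
```

The maximum is attained by the partition into singletons, since |μ(F)| ≤ Σ_{x∈F} |μ({x})| for every block F.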
Exercise 1.2.7. Show that μ(E) is finite for every E if and only if |μ| is a
finite unsigned measure, if and only if μ+ , μ− are finite unsigned measures.
If any of these properties hold, we call μ a finite measure. (In a similar
spirit, we call a signed measure μ σ-finite if |μ| is σ-finite.)
The space of finite measures on X is clearly a real vector space and is
denoted M(X).

1.2.2. The Lebesgue-Radon-Nikodym theorem. Let m be a reference
unsigned measure. We saw at the beginning of this section that the map
f ↦ m_f is an embedding of the space L+(X, dm) of non-negative functions
(modulo m-almost everywhere equivalence) into the space M+(X)
of unsigned measures. The same map is also an embedding of the space
L1 (X, dm) of absolutely integrable functions (again modulo m-almost ev-
erywhere equivalence) into the space M(X) of finite measures. (To verify
this, one first makes the easy observation that the Jordan decomposition
of a measure mf given by an absolutely integrable function f is simply
mf = mf+ − mf− .)
In the converse direction, one can ask if every finite measure μ in M(X)
can be expressed as mf for some absolutely integrable f . Unfortunately,
there are some obstructions to this. First, from (1.5) we see that if μ = mf ,
then any set that has measure zero with respect to m must also have measure
zero with respect to μ. In particular, this implies that a non-trivial measure
that is singular with respect to m cannot be expressed in the form mf .
In the σ-finite case, this turns out to be the only obstruction:
Theorem 1.2.4 (Lebesgue-Radon-Nikodym theorem). Let m be an un-
signed σ-finite measure, and let μ be a signed σ-finite measure. Then there
exists a unique decomposition μ = mf + μs , where f ∈ L1 (X, dm) and
μs ⊥ m. If μ is unsigned, then f and μs are also.

Proof. We prove this only for the case when μ and m are finite rather than
σ-finite, and leave the general case as an exercise. The uniqueness follows
from Exercise 1.2.2 and the previous observation that mf cannot be mutually
singular with m for any non-zero f , so it suffices to prove existence. By the
Jordan decomposition theorem, we may assume that μ is unsigned as well.
(In this case, we expect f and μs to be unsigned also.)

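The idea is to select f “greedily”. More precisely, let M be the supremum
of the quantity $\int_X f \, dm$, where f ranges over all non-negative functions such
that $m_f \le \mu$. Since μ is finite, M is finite. We claim that the supremum is
actually attained for some f. Indeed, if we let fn be a maximising sequence,
thus $m_{f_n} \le \mu$ and $\int_X f_n \, dm \to M$, one easily checks that the function
f = sup_n fn attains the supremum. (The key point is that if $m_g \le \mu$ and $m_h \le \mu$,
then $m_{\max(g,h)} \le \mu$, as one sees by splitting any set E into the parts where
g ≥ h and g < h; one then passes to the limit using monotone convergence.)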
The measure μs := μ − mf is a non-negative finite measure by construc-
tion. To finish the theorem, it suffices to show that μs ⊥ m.
It will suffice to show that $(\mu_s - \varepsilon m)_+ \perp m$ for all ε > 0, as the claim then
easily follows by letting ε be a countable sequence going to zero. But if
$(\mu_s - \varepsilon m)_+$ were not singular with respect to m, we see from the Hahn
decomposition theorem that there is a set E with m(E) > 0 such that
$(\mu_s - \varepsilon m)\downharpoonright_E \ge 0$, and thus $\mu_s \ge \varepsilon m\downharpoonright_E$. But then one could add $\varepsilon 1_E$ to f,
contradicting the construction of f. □
Exercise 1.2.8. Complete the proof of Theorem 1.2.4 for the σ-finite case.
We have the following corollary:
Corollary 1.2.5 (Radon-Nikodym theorem). Let m be an unsigned σ-finite
measure, and let μ be a signed σ-finite measure. Then the following are
equivalent.
(i) μ = mf for some f ∈ L1 (X, dm).
(ii) μ(E) = 0 whenever m(E) = 0.
(iii) For every ε > 0, there exists δ > 0 such that μ(E) < ε whenever
m(E) ≤ δ.
When any of these statements hold, we say that μ is absolutely continuous
with respect to m, and write μ ≪ m. As in the start of this section, we call
f the Radon-Nikodym derivative of μ with respect to m, and write $f = \frac{d\mu}{dm}$.

Proof. The implication of (iii) from (i) is Exercise 1.1.11. The implication
of (ii) from (iii) is trivial. To deduce (i) from (ii), apply Theorem 1.2.4
to μ and observe that μs is supported on a set E of m-measure zero. By (ii),
every subset of E has μ-measure zero, so E is null for m, for m_f, and for μ, and
hence also for μs = μ − m_f; since μs is supported on E, μs is trivial, giving (i). □
Corollary 1.2.6 (Lebesgue decomposition theorem). Let m be an unsigned
σ-finite measure, and let μ be a signed σ-finite measure. Then there is a
unique decomposition μ = μac + μs, where μac ≪ m and μs ⊥ m. (We refer
to μac and μs as the absolutely continuous and singular components of μ
with respect to m.) If μ is unsigned, then μac and μs are also.
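On a finite set with every point measurable, Theorem 1.2.4 and Corollaries 1.2.5 and 1.2.6 become completely explicit: one can take f(x) = μ({x})/m({x}) where m({x}) > 0 and f(x) = 0 elsewhere, with μs the restriction of μ to the m-null points. For unsigned μ this f is also the maximiser in the greedy argument of the proof, since m_f ≤ μ forces f(x) m({x}) ≤ μ({x}) at every point. The Python sketch below uses a hypothetical three-point example; the helper names are our own.

```python
from fractions import Fraction

def lebesgue_decomposition(mu, m):
    """Split mu = m_f + mu_s on a finite set; returns (f, mu_s) as dicts."""
    f = {x: (mu[x] / m[x] if m[x] > 0 else Fraction(0)) for x in m}
    mu_s = {x: (mu[x] if m[x] == 0 else Fraction(0)) for x in m}
    return f, mu_s

def is_absolutely_continuous(mu, m):
    """Corollary 1.2.5 (ii) on a finite set: mu vanishes at every m-null point."""
    return all(mu[x] == 0 for x in m if m[x] == 0)

m = {"p": Fraction(1, 2), "q": Fraction(1, 2), "r": Fraction(0)}
mu = {"p": Fraction(1, 4), "q": Fraction(1), "r": Fraction(3)}

f, mu_s = lebesgue_decomposition(mu, m)
mu_ac = {x: f[x] * m[x] for x in m}            # the measure m_f
assert all(mu_ac[x] + mu_s[x] == mu[x] for x in m)
# mu_s is supported on {r}, which is m-null, so mu_s is singular with respect to m.
assert all(mu_s[x] == 0 for x in m if m[x] > 0)
print(f)  # {'p': Fraction(1, 2), 'q': Fraction(2, 1), 'r': Fraction(0, 1)}
print(is_absolutely_continuous(mu, m), is_absolutely_continuous(mu_ac, m))  # False True
```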
Exercise 1.2.9. If every point in X is measurable, we call a signed measure
μ continuous if μ({x}) = 0 for all x. Let the hypotheses be as in Corollary
1.2.6, but suppose also that every point is measurable and m is continuous.
Show that there is a unique decomposition μ = μac + μsc + μpp , where
μac ≪ m, μpp is supported on an at most countable set, and μsc is both
singular with respect to m and continuous. Furthermore, if μ is unsigned,
then μac , μsc , μpp are also. We call μsc and μpp the singular continuous and
pure point components of μ, respectively.
Example 1.2.7. A Cantor measure is singular continuous with respect to
Lebesgue measure, while Dirac measures are pure point. Lebesgue measure
on a line is singular continuous with respect to Lebesgue measure on a plane
containing that line.
Remark 1.2.8. Suppose one is decomposing a measure μ on a Euclidean
space Rd with respect to Lebesgue measure m on that space. Very roughly
speaking, a measure is pure point if it is supported on a 0-dimensional
subset of Rd , it is absolutely continuous if its support is spread out on a full
dimensional subset, and it is singular continuous if it is supported on some
set of dimension intermediate between 0 and d. For instance, if μ is the sum
of a Dirac mass at (0, 0) ∈ R2 , one-dimensional Lebesgue measure on the
x-axis, and two-dimensional Lebesgue measure on R2 , then these are the
pure point, singular continuous, and absolutely continuous components of
μ, respectively. This heuristic is not completely accurate (in part because we
have left the definition of “dimension” vague) but is not a bad rule of thumb
for a first approximation. We will study analytic concepts of dimension in
more detail in Section 1.15.
To motivate the terminology “continuous” and “absolutely continuous”,
we recall two definitions on an interval I ⊂ R, and make a third:
• A function f : I → R is continuous if for every x ∈ I and every
ε > 0, there exists δ > 0 such that |f (y) − f (x)| ≤ ε whenever y ∈ I
is such that |y − x| ≤ δ.
• A function f : I → R is uniformly continuous if for every ε > 0,
there exists δ > 0 such that |f (y) − f (x)| ≤ ε whenever [x, y] ⊂ I has
length at most δ.
• A function f : I → R is absolutely continuous if for every ε > 0,
there exists δ > 0 such that $\sum_{i=1}^n |f(y_i) - f(x_i)| \le \varepsilon$ whenever
[x1, y1], . . . , [xn, yn] are disjoint intervals in I of total length at most δ.
Clearly, absolute continuity implies uniform continuity, which in turn implies
continuity. The significance of absolute continuity is that it is the largest
class of functions for which the fundamental theorem of calculus holds (using
the classical derivative and the Lebesgue integral), as can be seen in any
introductory graduate real analysis course.
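The gap between uniform and absolute continuity is already visible for the Cantor function F(x) = μ([0, x]), where μ is the Cantor measure of Example 1.2.7 (so, in the light of Exercise 1.2.10, the failure of absolute continuity of F mirrors the singularity of μ). F is continuous on [0, 1], hence uniformly continuous, but after n steps of the middle-thirds construction the 2^n remaining intervals have total length (2/3)^n while the increments of F across them sum to 1, so no δ can work for ε < 1. The Python sketch below checks this in exact rational arithmetic, evaluating F through ternary digits; the helper names are our own.

```python
from fractions import Fraction

def cantor_function(x: Fraction, digits: int) -> Fraction:
    """Cantor function at x in [0, 1]; exact whenever 3**digits * x is an integer."""
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(digits):
        x *= 3
        digit = int(x)          # next ternary digit of x (0, 1 or 2)
        x -= digit
        if digit == 1:          # F is constant on the middle third at this scale
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value

def stage_intervals(n: int):
    """The 2**n closed intervals left after n middle-thirds removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece for (a, b) in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

for n in (1, 5, 10):
    pieces = stage_intervals(n)
    total_length = sum(b - a for a, b in pieces)
    total_increment = sum(cantor_function(b, n) - cantor_function(a, n) for a, b in pieces)
    print(n, total_length, total_increment)  # total_length = (2/3)**n, total_increment = 1
```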

Exercise 1.2.10. Let m be Lebesgue measure on the interval [0, +∞), and
let μ be a finite unsigned measure.
Show that μ is a continuous measure if and only if the function x →
μ([0, x]) is continuous. Show that μ is an absolutely continuous measure
with respect to m if and only if the function x → μ([0, x]) is absolutely
continuous.

1.2.3. A finitary analogue of the Lebesgue decomposition (optional). At first
glance, the above theory is only non-trivial when the underlying set X is
infinite. For instance, if X is finite and m is the uniform
distribution on X, then every other measure on X will be absolutely
continuous with respect to m, making the Lebesgue decomposition trivial.
Nevertheless, there is a non-trivial version of the above theory that can be
applied to finite sets (cf. Section 1.3 of Structure and Randomness). The
cleanest formulation is to apply it to a sequence of (increasingly large) sets
rather than to a single set:
Theorem 1.2.9 (Finitary analogue of the Lebesgue-Radon-Nikodym theorem).
Let Xn be a sequence of finite sets (each with the discrete σ-algebra),
and for each n, let mn be the uniform distribution on Xn , and let μn be
another probability measure on Xn . Then, after passing to a subsequence,
one has a decomposition
(1.12) μn = μn,ac + μn,sc + μn,pp ,
where:
(i) Uniform absolute continuity. For every ε > 0, there exists δ > 0
(independent of n) such that μn,ac (E) ≤ ε whenever mn (E) ≤ δ, for
all n and all E ⊂ Xn .
(ii) Asymptotic singular continuity. μn,sc is supported on a set of mn -
measure o(1), and we have μn,sc ({x}) = o(1) uniformly for all x ∈
Xn , where o(1) denotes an error that goes to zero as n → ∞.
(iii) Uniform pure point. For every ε > 0 there exists N > 0 (independent
of n) such that for each n, there exists a set En ⊂ Xn of cardinality
at most N such that μn,pp (Xn \En ) ≤ ε.

Proof. Using the Radon-Nikodym theorem (or just working by hand, since
everything is finite), we can write dμn = fn dmn for some fn : Xn → [0, +∞)
with average value 1.
For each positive integer k, the sequence μn({fn ≥ k}) is bounded
between 0 and 1, so by the Bolzano-Weierstrass theorem, it has a convergent
subsequence. Applying the usual diagonalisation argument (as in the proof
of the Arzelá-Ascoli theorem, Theorem 1.8.23), we may thus assume (after
passing to a subsequence and relabeling) that μn({fn ≥ k}) converges for
each positive k to some limit ck.
Clearly, the ck are decreasing and range between 0 and 1, and so converge
as k → ∞ to some limit 0 ≤ c ≤ 1.
Since $\lim_{k\to\infty} \lim_{n\to\infty} \mu_n(\{f_n \ge k\}) = c$, we can find a sequence kn going
to infinity such that μn({fn ≥ kn}) → c as n → ∞. We now set μn,ac to be
the restriction of μn to the set {fn < kn}. We claim the absolute continuity
property (i). Indeed, for any ε > 0, we can find a k such that ck ≤ c + ε/10.
For n sufficiently large, we thus have
(1.13) μn({fn ≥ k}) ≤ c + ε/5
and
(1.14) μn({fn ≥ kn}) ≥ c − ε/5,
and hence (since kn > k for n large)
(1.15) μn,ac({fn ≥ k}) = μn({k ≤ fn < kn}) ≤ 2ε/5.
If we take δ < ε/(5k), then μn,ac(E) ≤ k mn(E) + 2ε/5 < ε whenever mn(E) ≤ δ, so
(for n sufficiently large) (i) holds. (For the remaining n, one simply shrinks δ as
much as is necessary.)
Write μn,s := μn − μn,ac; thus μn,s is supported on the set {fn ≥ kn}, which has
size at most |Xn|/kn = o(|Xn|) by Markov’s inequality. It remains to extract out the pure point
components. This we do by a similar procedure as above. Indeed, by arguing
as before we may assume (after passing to a subsequence as necessary) that
the quantities μn({x : μn({x}) ≥ 1/j}) converge to a limit dj for each positive
integer j, that the dj themselves converge to a limit d, and that there exists
a sequence jn → ∞ such that μn({x : μn({x}) ≥ 1/jn}) converges to d. If one
sets μn,sc and μn,pp to be the restrictions of μn,s to the sets {x : μn({x}) < 1/jn}
and {x : μn({x}) ≥ 1/jn}, respectively, one can verify the remaining claims
by arguments similar to those already given. □
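To see what the three components in Theorem 1.2.9 can look like, consider the following hypothetical example (our own, not from the text): Xn = {0, ..., n − 1}, mn uniform, and μn = (1/3)mn + (1/3)δ0 + (1/3)νn, where νn is uniform on {1, ..., ⌊√n⌋}. The Python sketch below splits μn as in the proof, restricting to {fn < kn} for the absolutely continuous part and then separating atoms of mass at least 1/jn, but with the thresholds kn = jn = n^{1/4} picked by hand instead of by the diagonalisation argument. Over growing n, the printed quantities should show an ac density bounded by about 1/3, an sc part whose support has mn-measure ⌊√n⌋/n and whose atoms shrink, and a pp part of mass about 1/3 sitting on the single point 0.

```python
import math

def example_measure(n: int) -> list:
    """Masses mu_n({x}), x in X_n = {0, ..., n-1}, for the hypothetical example."""
    s = math.isqrt(n)
    masses = [1 / (3 * n)] * n          # (1/3) m_n, with m_n uniform on X_n
    masses[0] += 1 / 3                  # (1/3) Dirac mass at 0
    for x in range(1, s + 1):
        masses[x] += 1 / (3 * s)        # (1/3) uniform measure on {1, ..., s}
    return masses

def decompose(masses: list, k: float, j: float):
    """Split mu_n as in the proof: density cut-off k, then atom cut-off 1/j."""
    n = len(masses)
    density = [n * w for w in masses]   # f_n = d(mu_n)/d(m_n)
    ac = [w if d < k else 0.0 for w, d in zip(masses, density)]
    singular = [w - a for w, a in zip(masses, ac)]
    pp = [w if w >= 1 / j else 0.0 for w in singular]
    sc = [w - p for w, p in zip(singular, pp)]
    return ac, sc, pp

for n in (10**2, 10**3, 10**4, 10**5):
    masses = example_measure(n)
    cutoff = n ** 0.25                  # hand-picked k_n = j_n, tending to infinity
    ac, sc, pp = decompose(masses, cutoff, cutoff)
    print(
        n,
        max(n * w for w in ac),              # sup of the ac density: stays bounded
        sum(1 for w in sc if w > 0) / n,     # m_n(supp sc): tends to 0
        max(sc),                             # largest sc atom: tends to 0
        sum(pp),                             # pp mass: stays near 1/3
    )
```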
Exercise 1.2.11. Generalise Theorem 1.2.9 to the setting where the Xn can
be infinite and non-discrete (but we still require every point to be measur-
able), the mn are arbitrary probability measures, and the μn are arbitrary
finite measures of uniformly bounded total variation.
Remark 1.2.10. This result is still not fully finitary because it deals with
a sequence of finite structures, rather than with a single finite structure. It
appears in fact to be quite difficult (and perhaps even impossible) to make
a fully finitary version of the Lebesgue decomposition (in the same way that
the finite convergence principle in Section 1.3 of Structure and Randomness
was a fully finitary analogue of the infinite convergence principle), though
one can certainly form some weaker finitary statements that capture a portion
of the strength of this theorem. For instance, one very cheap thing to
do, given two probability measures μ, m, is to introduce a threshold parameter
k, and partition $\mu = \mu_{\le k} + \mu_{>k}$, where $\mu_{\le k} \le k m$, and $\mu_{>k}$ is supported
on a set of m-measure at most 1/k; such a decomposition is automatic from
Theorem 1.2.4 and Markov’s inequality, and has meaningful content even
when the underlying space X is finite, but this type of decomposition is
not as powerful as the full Lebesgue decompositions (mainly because the
size of the support for $\mu_{>k}$ is relatively large compared to the threshold
k). Using the finite convergence principle, one can do a bit better, writing
$\mu = \mu_{\le k} + \mu_{k < \cdot \le F(k)} + \mu_{>F(k)}$ for any function F and any ε > 0, where
$k = O_{F,\varepsilon}(1)$, $\mu_{\le k} \le k m$, $\mu_{>F(k)}$ is supported on a set of m-measure at most
1/F(k), and $\mu_{k < \cdot \le F(k)}$ has total mass at most ε, but this still fails to
capture the full strength of the infinitary decomposition, because ε needs to
be fixed in advance. I have not been able to find a fully finitary statement
that is equivalent to, say, Theorem 1.2.9; I suspect that if it does exist, it
will have quite a messy formulation.
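The cheap decomposition μ = μ≤k + μ>k mentioned above is easy to carry out on a finite set. In the Python sketch below (our own illustration, with a hypothetical randomly generated μ), m is the uniform probability measure on 200 points, f = dμ/dm, μ≤k is the restriction of μ to {f ≤ k}, and μ>k is its restriction to {f > k}. Then μ≤k ≤ km holds point by point, and Markov's inequality (∫ f dm = 1) gives m({f > k}) ≤ 1/k.

```python
import random
from fractions import Fraction

def threshold_decomposition(mu, m, k):
    """Split mu = mu_le + mu_gt according to whether d(mu)/dm <= k."""
    density = {x: mu[x] / m[x] for x in m}   # valid because m charges every point here
    mu_le = {x: (mu[x] if density[x] <= k else Fraction(0)) for x in m}
    mu_gt = {x: (mu[x] if density[x] > k else Fraction(0)) for x in m}
    return mu_le, mu_gt, density

rng = random.Random(0)
points = range(200)
m = {x: Fraction(1, 200) for x in points}              # uniform probability measure
weights = {x: Fraction(rng.randint(0, 10) ** 3) for x in points}
total = sum(weights.values())
mu = {x: weights[x] / total for x in points}           # a lopsided probability measure

k = 5
mu_le, mu_gt, density = threshold_decomposition(mu, m, k)
assert all(mu_le[x] <= k * m[x] for x in points)       # mu_{<=k} <= k m
support_gt = [x for x in points if mu_gt[x] > 0]
m_support_gt = sum((m[x] for x in support_gt), Fraction(0))
assert m_support_gt <= Fraction(1, k)                  # Markov: m({f > k}) <= 1/k
print(len(support_gt), m_support_gt)
```

As the text notes, the weakness of this decomposition is that the bound 1/k on the support of μ>k is tied to the same parameter k that controls μ≤k.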

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/04.
The material here is largely based on Folland’s text [Fo2000], except for
the last section. Thanks to Ke, Max Baroi, Xiaochuan Liu, and several
anonymous commenters for corrections.

Section 1.3

Lp spaces

Now that we have reviewed the foundations of measure theory, let us
put it to work to set up the basic theory of one of the fundamental families
of function spaces in analysis, namely the Lp spaces (also known as Lebesgue
spaces). These spaces serve as important model examples for the general
theory of topological and normed vector spaces, which we will discuss a little
bit in this lecture and then in much greater detail in later lectures.
Just as scalar quantities live in the space of real or complex numbers, and
vector quantities live in vector spaces, functions f : X → C (or other objects
closely related to functions, such as measures) live in function spaces. Like
other spaces in mathematics (e.g., vector spaces, metric spaces, topological
spaces, etc.), a function space V is not just a mere set of objects (in this
case, the objects are functions); it also comes with various important
structures that allow one to do some useful operations inside these spaces
and from one space to another. For example, function spaces tend to have
several (though usually not all) of the following types of structures, which
are usually related to each other by various compatibility conditions:

• Vector space structure. One can often add two functions f, g in
a function space V and expect to get another function f + g in that
space V ; similarly, one can multiply a function f in V by a scalar c
and get another function cf in V . Usually, these operations obey the
axioms of a vector space, though it is important to caution that the
dimension of a function space is typically infinite. (In some cases, the
space of scalars is a more complicated ring than the real or complex
field, in which case we need the notion of a module rather than a
vector space, but we will not use this more general notion in this
course.) Virtually all of the function spaces we shall encounter in
this course will be vector spaces. Because the field of scalars is real
or complex, vector spaces also come with the notion of convexity,
which turns out to be crucial in many aspects of analysis. As a
consequence (and in marked contrast to algebra or number theory),
much of the theory in real analysis does not seem to extend to other
fields of scalars (in particular, real analysis fails spectacularly in the
finite characteristic setting).
• Algebra structure. Sometimes (though not always) we also wish
to multiply two functions f , g in V and get another function f g in V ;
when combined with the vector space structure and assuming some
compatibility conditions (e.g., the distributive law), this makes V an
algebra. This multiplication operation is often just pointwise multi-
plication, but there are other important multiplication operations on
function spaces too, such as² convolution.
• Norm structure. We often want to distinguish large functions in
V from small ones, especially in analysis, in which small terms in an
expression are routinely discarded or deemed to be acceptable errors.
One way to do this is to assign a magnitude or norm $\|f\|_V$ to each
function that measures its size. Unlike the situation with scalars,
where there is basically a single notion of magnitude, functions have
a wide variety of useful notions of size, each measuring a different
aspect (or combination of aspects) of the function, such as height,
width, oscillation, regularity, decay, and so forth. Typically, each
such norm gives rise to a separate function space (although sometimes
it is useful to consider a single function space with multiple norms
on it). We usually require the norm to be compatible with the vector
space structure (and algebra structure, if present), for instance by
demanding that the triangle inequality hold.
• Metric structure. We also want to tell whether two functions f,
g in a function space V are near together or far apart. A typical
way to do this is to impose a metric d : V × V → R+ on the space
V. If both a norm $\|\cdot\|_V$ and a vector space structure are available,
there is an obvious way to do this: define the distance between two
functions f, g in V to be³ $d(f, g) := \|f - g\|_V$. It is often important
to know if the vector space is complete⁴ with respect to the given
metric; this allows one to take limits of Cauchy sequences, and (with
a norm and vector space structure) sum absolutely convergent series,
as well as use some useful results from point set topology such as the
Baire category theorem; see Section 1.7. All of these operations are
of course vital in analysis.

² One sometimes sees other algebraic structures than multiplication appear in function spaces,
such as commutators and derivations, but again we will not encounter those in this course.
Another common algebraic operation for function spaces is conjugation or adjoint, leading to the
notion of a *-algebra.
³ This will be the only type of metric on function spaces encountered in this course. But there
are some non-linear function spaces of importance in non-linear analysis (e.g., spaces of maps from
one manifold to another) which have no vector space structure or norm, but still have a metric.

• Topological structure. It is often important to know when a sequence (or, occasionally, a net) of functions $f_n$ in $V$ converges in some sense to a limit $f$ (which, hopefully, is still in $V$); there are often many distinct modes of convergence (e.g., pointwise convergence, uniform convergence, etc.) that one wishes to carefully distinguish from each other. Also, in order to apply various powerful topological theorems (or to justify various formal operations involving limits, suprema, etc.), it is important to know when certain subsets of $V$ enjoy key topological properties (most notably compactness and connectedness), and to know which operations on $V$ are continuous. For all of this, one needs a topology on $V$. If one already has a metric, then one of course has a topology generated by the open balls of that metric. But there are many important topologies on function spaces in analysis that do not arise from metrics. We also often require the topology to be compatible with the other structures on the function space; for instance, we usually require the vector space operations of addition and scalar multiplication to be continuous. In some cases, the topology on $V$ extends to some natural superspace $W$ of more general functions that contains $V$. In such cases, it is often important to know whether $V$ is closed in $W$, so that limits of sequences in $V$ stay in $V$.
• Functional structures. Since numbers are easier to understand and deal with than functions, it is not surprising that we often study functions $f$ in a function space $V$ by first applying some functional $\lambda: V \to \mathbf{C}$ to $f$ to identify some key numerical quantity $\lambda(f)$ associated to $f$. Norms $f \mapsto \|f\|_V$ are of course one important example of a functional, integration $f \mapsto \int_X f\, d\mu$ provides another, and evaluation $f \mapsto f(x)$ at a point $x$ provides a third important class. (Note, though, that while evaluation is the fundamental feature of a function in set theory, it is often a quite minor operation in analysis; indeed, in many function spaces, evaluation is not even defined at all, for instance because the functions in the space are only defined almost everywhere!) An inner product $\langle \cdot, \cdot \rangle$ on $V$ (see below) also provides a large family $f \mapsto \langle f, g \rangle$ of useful functionals. It is of particular interest to study functionals that are compatible with the vector space structure (i.e., are linear) and with the topological structure (i.e., are continuous); this will give rise to the important notion of duality on function spaces.

4 Compactness would be an even better property than completeness to have, but function spaces unfortunately tend to be non-compact in various rather nasty ways, although there are useful partial substitutes for compactness that are available; see, e.g., Section 1.6 of Poincaré’s Legacies, Vol. I.

• Inner product structure. One often would like to pair a function $f$ in a function space $V$ with another object $g$ (which is often, though not always, another function in the same function space $V$) and obtain a number $\langle f, g \rangle$ that typically measures the amount of interaction or correlation between $f$ and $g$. Typical examples include inner products arising from integration, such as $\langle f, g \rangle := \int_X f \overline{g}\, d\mu$; integration itself can also be viewed as a pairing, $\langle f, \mu \rangle := \int_X f\, d\mu$. Of course, we usually require such inner products to be compatible with the other structures present on the space (e.g., to be compatible with the vector space structure, we usually require the inner product to be bilinear or sesquilinear). Inner products, when available, are incredibly useful in understanding the metric and norm geometry of a space, due to such fundamental facts as the Cauchy-Schwarz inequality and the parallelogram law. They also give rise to the important notion of orthogonality between functions.
• Group actions. We often expect our function spaces to enjoy various symmetries; we might wish to rotate, reflect, translate, modulate, or dilate our functions and expect to preserve most of the structure of the space when doing so. In modern mathematics, symmetries are usually encoded by group actions (or actions of other group-like objects, such as semigroups or groupoids; one also often upgrades groups to more structured objects such as Lie groups). As usual, we typically require the group action to preserve the other structures present on the space, e.g., one often restricts attention to group actions that are linear (to preserve the vector space structure), continuous (to preserve topological structure), unitary (to preserve inner product structure), isometric (to preserve metric structure), and so forth. Besides giving us useful symmetries to spend, the presence of such group actions allows one to apply the powerful techniques of representation theory, Fourier analysis, and ergodic theory. However, as this is a foundational real analysis class, we will not discuss these important topics much here (and in fact will not deal with group actions much at all).
• Order structure. In some cases, we want to utilise the notion of a function $f$ being non-negative, or dominating another function $g$. One might also want to take the max or supremum of two or more functions in a function space $V$, or split a function into positive and negative components. Such order structures interact with the other structures on a space in many useful ways (e.g., via the Stone-Weierstrass theorem, Theorem 1.10.18). Much like convexity, order structure is specific to the real line and is another reason why much of real analysis breaks down over other fields. (The complex plane is of course an extension of the real line and so is able to exploit the order structure of that line, usually by treating the real and imaginary components separately.)
There are of course many ways to combine various flavours of these structures together, and there are entire subfields of mathematics that are devoted to studying particularly common and useful categories of such combinations (e.g., topological vector spaces, normed vector spaces, Banach spaces, Banach algebras, von Neumann algebras, C*-algebras, Fréchet spaces, Hilbert spaces, group algebras, etc.). The study of these sorts of spaces is known collectively as functional analysis. We will study some (but certainly not all) of these combinations in an abstract and general setting later in this course, but to begin with we will focus on the $L^p$ spaces, which are very good model examples for many of the above general classes of spaces, and also of importance in many applications of analysis (such as probability or PDE).

1.3.1. $L^p$ spaces. In this section, $(X, \mathcal{X}, \mu)$ will be a fixed measure space; notions such as “measurable”, “measure”, “almost everywhere”, etc., will always be with respect to this space, unless otherwise specified. Similarly, unless otherwise specified, all subsets of $X$ mentioned are restricted to be measurable, as are all scalar functions on $X$.
For the sake of concreteness, we shall select the field of scalars to be
the complex numbers C. The theory of real Lebesgue spaces is virtually
identical to that of complex Lebesgue spaces, and the former can largely be
deduced from the latter as a special case.
We already have the notion of an absolutely integrable function on $X$, which is a function $f: X \to \mathbf{C}$ such that $\int_X |f|\, d\mu$ is finite. More generally, given any⁵ exponent $0 < p < \infty$, we can define a $p$th-power integrable function to be a function $f: X \to \mathbf{C}$ such that $\int_X |f|^p\, d\mu$ is finite.
Remark 1.3.1. One can also extend these notions to functions that take values in the extended complex plane $\mathbf{C} \cup \{\infty\}$, but one easily observes that $p$th-power integrable functions must be finite almost everywhere, and so there is essentially no increase in generality afforded by extending the range in this manner.

5 Besides $p = 1$, the case of most interest is the case of square-integrable functions, when $p = 2$. We will also extend this notion later to $p = \infty$, which is also an important special case.

Following the Lebesgue philosophy (that one should ignore whatever is going on on a set of measure zero), let us declare two measurable functions to be equivalent if they agree almost everywhere. This is easily checked to be an equivalence relation, which does not affect the property of being $p$th-power integrable. Thus, we can define the Lebesgue space $L^p(X, \mathcal{X}, \mu)$ to be the space of $p$th-power integrable functions, quotiented out by this equivalence relation. Strictly speaking, then, a typical element of $L^p(X, \mathcal{X}, \mu)$ is not actually a specific function $f$, but is instead an equivalence class $[f]$, consisting of all functions equivalent to a single function $f$. However, we shall abuse notation and speak loosely of a function $f$ “belonging” to $L^p(X, \mathcal{X}, \mu)$, where it is understood that $f$ is only defined up to equivalence, or more imprecisely is “defined almost everywhere”. For the purposes of integration, this equivalence is quite harmless, but this convention does mean that we can no longer evaluate a function $f$ in $L^p(X, \mathcal{X}, \mu)$ at a single point $x$ if that point $x$ has zero measure. It takes a little bit of getting used to the idea of a function that cannot actually be evaluated at any specific point, but with some practice you will find that it will not cause⁶ any significant conceptual difficulty.
Exercise 1.3.1. If $(X, \mathcal{X}, \mu)$ is a measure space and $\overline{\mathcal{X}}$ is the completion of $\mathcal{X}$, show that the spaces $L^p(X, \mathcal{X}, \mu)$ and $L^p(X, \overline{\mathcal{X}}, \mu)$ are isomorphic using the obvious candidate for the isomorphism. Because of this, when dealing with $L^p$ spaces, we will usually not be too concerned with whether the underlying measure space is complete.
Remark 1.3.2. Depending on which of the three structures $X, \mathcal{X}, \mu$ of the measure space one wishes to emphasise, the space $L^p(X, \mathcal{X}, \mu)$ is often abbreviated $L^p(X)$, $L^p(\mathcal{X})$, $L^p(X, \mu)$, or even just $L^p$. Since for this discussion the measure space $(X, \mathcal{X}, \mu)$ will be fixed, we shall usually use the $L^p$ abbreviation in this section. When the space $X$ is discrete (i.e., $\mathcal{X} = 2^X$) and $\mu$ is a counting measure, then $L^p(X, \mathcal{X}, \mu)$ is usually abbreviated $\ell^p(X)$ or just $\ell^p$ (and the almost everywhere equivalence relation trivialises and can thus be completely ignored).

At present, the Lebesgue spaces $L^p$ are just sets. We now begin to place several of the structures mentioned in the introduction on these sets, to upgrade them to richer spaces.

6 One could also take a more abstract view, dispensing with the set $X$ altogether and defining the Lebesgue space $L^p(\mathcal{X}, \mu)$ on abstract measure spaces $(\mathcal{X}, \mu)$, but we will not do so here. Another way to think about elements of $L^p$ is that they are functions which are unreliable on an unknown set of measure zero, but remain reliable almost everywhere.

We begin with vector space structure. Fix $0 < p < \infty$, and let $f, g \in L^p$ be two $p$th-power integrable functions. From the crude pointwise (or more precisely, pointwise almost everywhere) inequality
$$|f(x) + g(x)|^p \le (2 \max(|f(x)|, |g(x)|))^p = 2^p \max(|f(x)|^p, |g(x)|^p) \le 2^p (|f(x)|^p + |g(x)|^p), \qquad (1.16)$$
we see that the sum of two $p$th-power integrable functions is also $p$th-power integrable. It is also easy to see that any scalar multiple of a $p$th-power integrable function is also $p$th-power integrable. These operations respect almost everywhere equivalence, and so $L^p$ becomes a (complex) vector space.
Next, we set up the norm structure. If $f \in L^p$, we define the $L^p$ norm $\|f\|_{L^p}$ of $f$ to be the number
$$\|f\|_{L^p} := \left( \int_X |f|^p\, d\mu \right)^{1/p}. \qquad (1.17)$$
This is a finite non-negative number by definition of $L^p$; in particular, we have the identity
$$\| |f|^r \|_{L^p} = \|f\|_{L^{pr}}^r \qquad (1.18)$$
for all $0 < p, r < \infty$.
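As a quick sanity check on (1.17) and (1.18), the following Python sketch treats the simplest possible case, a finite measure space: we assume (purely for illustration) that $X = \{x_0, \dots, x_{n-1}\}$ with point masses $w_i = \mu(\{x_i\})$, so that the integral in (1.17) becomes a weighted sum. The helper `lp_norm` and the sample data are hypothetical, not part of the text or of any library.

```python
import math


def lp_norm(f, w, p):
    """L^p norm, 0 < p < inf, of f on a finite measure space.

    Illustrative model: X = {x_0, ..., x_{n-1}} with point masses
    w[i] = mu({x_i}) >= 0, so the integral in (1.17) is a weighted sum.
    """
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


f = [1.0, -2.0, 0.5j, 3.0]   # complex scalars are allowed
w = [0.5, 1.0, 2.0, 0.25]
p, r = 1.5, 3.0

lhs = lp_norm([abs(fi) ** r for fi in f], w, p)   # || |f|^r ||_{L^p}
rhs = lp_norm(f, w, p * r) ** r                    # ||f||_{L^{pr}}^r
print(math.isclose(lhs, rhs))  # True
```

Both sides of (1.18) reduce to $(\sum_i w_i |f_i|^{pr})^{1/p}$, so they agree up to floating-point rounding.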
The $L^p$ norm has the following three basic properties:
Lemma 1.3.3. Let $0 < p < \infty$ and $f, g \in L^p$.
(i) Non-degeneracy. $\|f\|_{L^p} = 0$ if and only if $f = 0$.
(ii) Homogeneity. $\|cf\|_{L^p} = |c| \|f\|_{L^p}$ for all complex numbers $c$.
(iii) (Quasi-)triangle inequality. We have $\|f + g\|_{L^p} \le C(\|f\|_{L^p} + \|g\|_{L^p})$ for some constant $C$ depending on $p$. If $p \ge 1$, then we can take $C = 1$ (this fact is also known as Minkowski’s inequality).

Proof. The claims (i) and (ii) are obvious. (Note how important it is that we equate functions that vanish almost everywhere in order to get (i).) The quasi-triangle inequality follows from a variant of the estimates in (1.16) and is left as an exercise. For the triangle inequality, we have to be more efficient than the crude estimate (1.16). By the non-degeneracy property we may take $\|f\|_{L^p}$ and $\|g\|_{L^p}$ to be non-zero. Using homogeneity, we can normalise $\|f\|_{L^p} + \|g\|_{L^p}$ to equal 1, thus (by homogeneity again) we can write $f = (1 - \theta) F$ and $g = \theta G$ for some $0 < \theta < 1$ and $F, G \in L^p$ with $\|F\|_{L^p} = \|G\|_{L^p} = 1$. Our task is now to show that
$$\int_X |(1 - \theta) F(x) + \theta G(x)|^p\, d\mu \le 1. \qquad (1.19)$$
But observe that for $1 \le p < \infty$, the function $x \mapsto |x|^p$ is convex on $\mathbf{C}$, and in particular that
$$|(1 - \theta) F(x) + \theta G(x)|^p \le (1 - \theta) |F(x)|^p + \theta |G(x)|^p. \qquad (1.20)$$
(If one wishes, one can use the complex triangle inequality to first reduce to the case when $F$, $G$ are non-negative, in which case one only needs convexity on $[0, +\infty)$ rather than all of $\mathbf{C}$.) Integrating (1.20) over $X$ and using the normalisations $\|F\|_{L^p} = \|G\|_{L^p} = 1$, we see that the left-hand side of (1.19) is at most $(1 - \theta) + \theta = 1$, which is the claim (1.19). □
Exercise 1.3.2. Let $0 < p \le 1$ and $f, g \in L^p$.
(i) Establish the variant $\|f + g\|_{L^p}^p \le \|f\|_{L^p}^p + \|g\|_{L^p}^p$ of the triangle inequality.
(ii) If furthermore $f$ and $g$ are non-negative (almost everywhere), establish also the reverse triangle inequality $\|f + g\|_{L^p} \ge \|f\|_{L^p} + \|g\|_{L^p}$.
(iii) Show that the best constant $C$ in the quasi-triangle inequality is $2^{\frac{1}{p} - 1}$. In particular, the triangle inequality is false for $p < 1$.
(iv) Now suppose instead that $1 < p < \infty$ or $0 < p < 1$. If $f, g \in L^p$ are such that $\|f + g\|_{L^p} = \|f\|_{L^p} + \|g\|_{L^p}$, show that one of the functions $f$, $g$ is a non-negative scalar multiple of the other (up to equivalence, of course). What happens when $p = 1$?
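Part (iii) of Exercise 1.3.2 can be probed numerically. The sketch below models $X$ as a finite set of atoms with point masses (an assumption made only for illustration) and takes $f$, $g$ to be indicators of two disjoint atoms of equal mass $m$. Then $\|f + g\|_{L^p} = (2m)^{1/p}$ while $\|f\|_{L^p} + \|g\|_{L^p} = 2m^{1/p}$, so the ratio is exactly $2^{1/p - 1}$, which exceeds 1 precisely when $p < 1$. The helper names are hypothetical, and one test case shows only that the constant cannot be smaller than $2^{1/p-1}$, not that it is optimal.

```python
def lp_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p): the L^p (quasi-)norm on a finite measure space."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def triangle_ratio(p, mass=1.0):
    """||f + g|| / (||f|| + ||g||) for f, g the indicators of two disjoint
    atoms of equal mass (a hypothetical test case, not taken from the text)."""
    w = [mass, mass]
    f = [1.0, 0.0]
    g = [0.0, 1.0]
    f_plus_g = [a + b for a, b in zip(f, g)]
    return lp_norm(f_plus_g, w, p) / (lp_norm(f, w, p) + lp_norm(g, w, p))


for p in (0.25, 0.5, 1.0, 2.0):
    # the two printed numbers agree; the ratio is > 1 only for p < 1
    print(p, triangle_ratio(p), 2 ** (1 / p - 1))
```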

A vector space $V$ with a function $\|\cdot\|: V \to [0, +\infty)$ obeying the non-degeneracy, homogeneity, and (quasi-)triangle inequality is known as a (quasi-)normed vector space, and the function $f \mapsto \|f\|$ is then known as a (quasi-)norm; thus $L^p$ is a normed vector space for $1 \le p < \infty$ but only a quasi-normed vector space for $0 < p < 1$. A function $\|\cdot\|: V \to [0, +\infty)$ obeying the homogeneity and triangle inequality, but not necessarily the non-degeneracy property, is known as a seminorm; thus for instance the $L^p$ norms for $1 \le p < \infty$ would have been seminorms if we did not equate functions that agreed almost everywhere. (Conversely, given a seminormed vector space $(V, \|\cdot\|)$, one can convert it into a normed vector space by quotienting out the subspace $\{f \in V: \|f\| = 0\}$. We leave the details as an exercise for the reader.)
Exercise 1.3.3. Let $\|\cdot\|: V \to [0, +\infty)$ be a function on a vector space which obeys the non-degeneracy and homogeneity properties. Show that $\|\cdot\|$ is a norm if and only if the closed unit ball $\{x: \|x\| \le 1\}$ is convex. Show that the same equivalence also holds for the open unit ball. This fact emphasises the geometric nature of the triangle inequality.
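Exercise 1.3.3 can be seen concretely in the plane with counting measure on two points, where the $\ell^p$ quasi-norm is $(|v_1|^p + |v_2|^p)^{1/p}$. The sketch below (hypothetical helper names) asks whether the midpoint of $e_1 = (1, 0)$ and $e_2 = (0, 1)$, both of size 1, lies in the closed unit ball. For $p = 1/2$ the midpoint has quasi-norm $(2 \cdot 2^{-1/2})^2 = 2$, so the ball is not convex and the triangle inequality must fail; for $p \ge 1$ the check passes, although a single pair of points of course does not prove convexity.

```python
def lp_quasinorm(v, p):
    """(|v_1|^p + ... + |v_n|^p)^(1/p): l^p with counting measure on {1, ..., n}."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def midpoint_in_unit_ball(p):
    """Is the midpoint of e1 = (1, 0) and e2 = (0, 1), both of size 1,
    in the closed unit ball?  A False answer witnesses non-convexity."""
    e1, e2 = (1.0, 0.0), (0.0, 1.0)
    mid = tuple((a + b) / 2 for a, b in zip(e1, e2))
    return lp_quasinorm(mid, p) <= 1.0


print(midpoint_in_unit_ball(2.0))  # True
print(midpoint_in_unit_ball(1.0))  # True
print(midpoint_in_unit_ball(0.5))  # False
```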
Exercise 1.3.4. If $f \in L^p$ for some $0 < p < \infty$, show that the support $\{x \in X: f(x) \neq 0\}$ of $f$ (which is defined only up to sets of measure zero) is a σ-finite set. (Because of this, we can reduce from the non-σ-finite case to the σ-finite case in many, though not all, questions concerning $L^p$ spaces.)

We are now able to define $L^p$ norms and spaces in the limit $p = \infty$. We say that a function $f: X \to \mathbf{C}$ is essentially bounded if there exists an $M$ such that $|f(x)| \le M$ for almost every $x \in X$, and define $\|f\|_{L^\infty}$ to be the least $M$ that serves as such a bound. We let $L^\infty$ denote the space of essentially bounded functions, quotiented out by equivalence, and given the norm $\|\cdot\|_{L^\infty}$. It is not hard to see that this is also a normed vector space. Observe that a sequence $f_n \in L^\infty$ converges to a limit $f \in L^\infty$ if and only if $f_n$ converges essentially uniformly to $f$, i.e., it converges uniformly to $f$ outside of a set of measure zero. (Compare with Egorov’s theorem (Theorem 1.1.21), which equates pointwise convergence with uniform convergence outside of a set of arbitrarily small measure.)
Now we explain why we call this norm the $L^\infty$ norm:
Example 1.3.4. Let $f$ be a (generalised) step function, thus $f = A 1_E$ for some amplitude $A > 0$ and some set $E$. Let us assume that $E$ has positive finite measure. Then $\|f\|_{L^p} = A \mu(E)^{1/p}$ for all $0 < p < \infty$, and also $\|f\|_{L^\infty} = A$. Thus in this case, at least, the $L^\infty$ norm is the limit of the $L^p$ norms. This example also illustrates that the $L^p$ norms behave like combinations of the height $A$ of a function and the width $\mu(E)$ of such a function, though of course the concepts of height and width are not formally defined for functions that are not step functions.
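The formula $\|A 1_E\|_{L^p} = A \mu(E)^{1/p}$ from Example 1.3.4 is simple enough to tabulate directly. In the sketch below, the helper name and the sample values $A = 3$, $\mu(E) = 0.01$ are illustrative assumptions; for this tall, thin step the $L^p$ norms are small for small $p$, where the width dominates, and creep up towards the height $A$ as $p$ grows.

```python
def step_lp_norm(A, measure_E, p):
    """L^p norm of the step function A * 1_E, 0 < p < inf: A * mu(E)^(1/p)."""
    return A * measure_E ** (1.0 / p)


A, mu_E = 3.0, 0.01          # tall, thin step: height 3, width 0.01
for p in (1, 2, 10, 100, 1000):
    print(p, step_lp_norm(A, mu_E, p))
# As p grows, mu(E)^(1/p) -> 1, so the values approach the height A = ||A 1_E||_{L^inf}.
```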
Exercise 1.3.5. • If $f \in L^\infty \cap L^{p_0}$ for some $0 < p_0 < \infty$, show that $\|f\|_{L^p} \to \|f\|_{L^\infty}$ as $p \to \infty$. (Hint: Use the monotone convergence theorem, Theorem 1.1.21.)
• If $f \notin L^\infty$, show that $\|f\|_{L^p} \to \infty$ as $p \to \infty$.
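For the first part of Exercise 1.3.5, one can watch $\|f\|_{L^p}$ approach $\|f\|_{L^\infty}$ on a finite measure space (an illustrative model with point masses `w`; the data and helper names are hypothetical). A large value is deliberately placed on an atom of mass zero, to show that the essential supremum ignores it. Factoring out the largest relevant $|f|$ before raising to the power $p$ is our own implementation choice, made only to avoid floating-point overflow for large $p$; it does not change the value.

```python
def lp_norm(f, w, p):
    """L^p norm (0 < p < inf) on a finite measure space with point masses w.

    Atoms of zero mass are skipped (they do not affect the integral), and the
    values are rescaled by the largest remaining |f| so that |f|^p cannot
    overflow for large p.
    """
    vals = [(abs(fi), wi) for fi, wi in zip(f, w) if wi > 0]
    M = max((a for a, _ in vals), default=0.0)
    if M == 0.0:
        return 0.0
    return M * sum(wi * (a / M) ** p for a, wi in vals) ** (1.0 / p)


def linf_norm(f, w):
    """Essential supremum: the largest |f| over atoms of positive mass."""
    return max((abs(fi) for fi, wi in zip(f, w) if wi > 0), default=0.0)


f = [0.5, 2.0, -1.5, 100.0]
w = [3.0, 0.1, 1.0, 0.0]    # the value 100 sits on a null atom
for p in (1, 4, 16, 64, 256, 1024):
    print(p, lp_norm(f, w, p))
print(linf_norm(f, w))   # 2.0
```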

Once one has a vector space structure and a (quasi-)norm structure, we immediately get a (quasi-)metric structure:
Exercise 1.3.6. Let $(V, \|\cdot\|)$ be a normed vector space. Show that the function $d: V \times V \to [0, +\infty)$ defined by $d(f, g) := \|f - g\|$ is a metric on $V$ which is translation invariant (thus $d(f + h, g + h) = d(f, g)$ for all $f, g, h \in V$) and homogeneous (thus $d(cf, cg) = |c| d(f, g)$ for all $f, g \in V$ and scalars $c$). Conversely, show that every translation-invariant homogeneous metric on $V$ arises from precisely one norm in this manner. Establish a similar claim relating quasi-norms with quasi-metrics (which are defined as metrics, but with the triangle inequality replaced by a quasi-triangle inequality), or between seminorms and semimetrics (which are defined as metrics, but where distinct points are allowed to have a zero separation; these are also known as pseudometrics).

The (quasi-)metric structure in turn generates a topological structure in the usual manner using the (quasi-)metric balls as a base for the topology. In particular, a sequence of functions $f_n \in L^p$ converges to a limit $f \in L^p$ if $\|f_n - f\|_{L^p} \to 0$ as $n \to \infty$. We refer to this type of convergence as a convergence in $L^p$ norm or a strong convergence in $L^p$ (we will discuss other modes of convergence in later lectures). As is usual in (quasi-)metric spaces (or more generally for Hausdorff spaces), the limit, if it exists, is unique. (This is however not the case for topological structures induced by seminorms or semimetrics, though we can solve this problem by quotienting out the degenerate elements as discussed earlier.)

Recall that any series $\sum_{n=1}^\infty a_n$ of scalars is convergent if it is absolutely convergent (i.e., if $\sum_{n=1}^\infty |a_n| < \infty$). This fact turns out to be closely related to the fact that the field of scalars $\mathbf{C}$ is complete. This can be seen from the following result:
Exercise 1.3.7. Let $(V, \|\cdot\|)$ be a normed vector space (and hence also a metric space and a topological space). Show that the following are equivalent:
• $V$ is a complete metric space (i.e., every Cauchy sequence converges).
• Every series $\sum_{n=1}^\infty f_n$ with $f_n \in V$ which is absolutely convergent (i.e., $\sum_{n=1}^\infty \|f_n\| < \infty$) is also conditionally convergent (i.e., $\sum_{n=1}^N f_n$ converges to a limit as $N \to \infty$).
Remark 1.3.5. The situation is more complicated for complete quasi-normed vector spaces; not every absolutely convergent series is conditionally convergent. On the other hand, if $\|f_n\|$ decays faster than a sufficiently large negative power of $n$, one recovers conditional convergence; see [Ta].
Remark 1.3.6. Let $X$ be a topological space, and let $BC(X)$ be the space of bounded continuous functions on $X$; this is a vector space. We can place the uniform norm $\|f\|_u := \sup_{x \in X} |f(x)|$ on this space; this makes $BC(X)$ into a normed vector space. It is not hard to verify that this space is complete, and so every absolutely convergent series in $BC(X)$ is conditionally convergent. This fact is better known as the Weierstrass $M$-test.
A space obeying the properties in Exercise 1.3.7 (i.e., a complete normed vector space) is known as a Banach space. We will study Banach spaces in more detail later in this course. For now, we give one of the fundamental examples of Banach spaces.
Proposition 1.3.7. $L^p$ is a Banach space for every $1 \le p \le \infty$.
Proof. By Exercise 1.3.7, it suffices to show that any series $\sum_{n=1}^\infty f_n$ of functions in $L^p$ which is absolutely convergent is also conditionally convergent. This is easy in the case $p = \infty$ and is left as an exercise. In the case $1 \le p < \infty$, we write $M := \sum_{n=1}^\infty \|f_n\|_{L^p}$, which is a finite quantity by hypothesis. By the triangle inequality, we have $\|\sum_{n=1}^N |f_n|\|_{L^p} \le M$ for all $N$. By monotone convergence (Theorem 1.1.21), we conclude $\|\sum_{n=1}^\infty |f_n|\|_{L^p} \le M$. In particular, $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent for almost every $x$. Write the limit of this series as $F(x)$. Since $|F - \sum_{n=1}^N f_n|^p \le (\sum_{n=1}^\infty |f_n|)^p$, which is integrable, dominated convergence (Theorem 1.1.21) shows that $\sum_{n=1}^N f_n$ converges in $L^p$ norm to $F$, and we are done. □

An important fact is that functions in $L^p$ can be approximated by simple functions:
Proposition 1.3.8. If $0 < p < \infty$, then the space of simple functions with finite measure support is a dense subspace of $L^p$.
Remark 1.3.9. The concept of a non-trivial dense subspace is one which
only comes up in infinite dimensions, and it is hard to visualise directly. Very
roughly speaking, the infinite number of degrees of freedom in an infinite
dimensional space gives a subspace an infinite number of “opportunities” to
come as close as one desires to any given point in that space, which is what
allows such spaces to be dense.

Proof. The only non-trivial thing to show is the density. An application of the monotone convergence theorem (Theorem 1.1.21) shows that the space of bounded $L^p$ functions is dense in $L^p$. Another application of monotone convergence (and Exercise 1.3.4) then shows that the space of bounded $L^p$ functions of finite measure support is dense in the space of bounded $L^p$ functions. Finally, by discretising the range of bounded $L^p$ functions, we see that the space of simple functions with finite measure support is dense in the space of bounded $L^p$ functions with finite measure support. □
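The last step of the proof above, discretising the range, can be sketched as follows. We round a real-valued bounded function down to the grid $\frac{1}{k}\mathbf{Z}$ (a complex-valued function can be handled by treating its real and imaginary parts separately); the result $s$ takes finitely many values, vanishes wherever $f$ does, and satisfies $0 \le f - s < 1/k$, so $\|f - s\|_{L^p} \le \frac{1}{k} \mu(\{f \neq 0\})^{1/p}$. On the finite measure space used here (an assumption for illustration) every function is already simple, so the example only checks this error bound; the helper names and data are hypothetical.

```python
import math


def lp_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p) on a finite measure space with point masses w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def discretise_range(f, k):
    """Round each value down to the grid (1/k)Z.

    The result takes only finitely many values, vanishes wherever f does,
    and satisfies 0 <= f - s < 1/k pointwise (up to floating-point rounding).
    """
    return [math.floor(k * v) / k for v in f]


f = [0.0, 0.37, -1.2, 2.718, 0.001]
w = [5.0, 0.3, 0.2, 0.4, 0.1]          # mu(support of f) is (about) 1.0
support_measure = sum(wi for fi, wi in zip(f, w) if fi != 0)
p = 2.0
for k in (1, 10, 100, 1000):
    s = discretise_range(f, k)
    err = lp_norm([a - b for a, b in zip(f, s)], w, p)
    bound = support_measure ** (1 / p) / k
    print(k, err, bound)               # err never exceeds bound
```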
Remark 1.3.10. Since not every function in $L^p$ is a simple function with finite measure support, we thus see that the space of simple functions with finite measure support with the $L^p$ norm is an example of a normed vector space which is not complete.
Exercise 1.3.8. Show that the space of simple functions (not necessarily
with finite measure support) is a dense subspace of L∞ . Is the same true if
one reinstates the finite measure support restriction?
Exercise 1.3.9. Suppose that $\mu$ is σ-finite and $\mathcal{X}$ is separable (i.e., countably generated). Show that $L^p$ is separable (i.e., has a countable dense subset) for all $1 \le p < \infty$. Give a counterexample that shows that $L^\infty$ need not be separable. (Hint: Try using a counting measure.)

Next, we turn to algebra properties of $L^p$ spaces. The key fact here is Hölder’s inequality:
Proposition 1.3.11 (Hölder’s inequality). Let $f \in L^p$ and $g \in L^q$ for some $0 < p, q \le \infty$. Then $fg \in L^r$ and $\|fg\|_{L^r} \le \|f\|_{L^p} \|g\|_{L^q}$, where the exponent $r$ is defined by the formula $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$.

Proof. This will be a variant of the proof of the triangle inequality in Lemma 1.3.3, again relying ultimately on convexity. The claim is easy when $p = \infty$ or $q = \infty$ and is left as an exercise for the reader in this case, so we assume $p, q < \infty$. Raising $f$ and $g$ to the power $r$ using (1.18), we may assume $r = 1$, which makes $1 < p, q < \infty$ dual exponents in the sense that $\frac{1}{p} + \frac{1}{q} = 1$. The claim is obvious if either $\|f\|_{L^p}$ or $\|g\|_{L^q}$ is zero, so we may assume they are non-zero; by homogeneity we may then normalise $\|f\|_{L^p} = \|g\|_{L^q} = 1$. Our task is now to show that
$$\int_X |fg|\, d\mu \le 1. \qquad (1.21)$$
Here, we use the convexity of the exponential function $t \mapsto e^t$ on $\mathbf{R}$, which implies the convexity of the function $t \mapsto |f(x)|^{p(1-t)} |g(x)|^{qt}$ for $t \in [0, 1]$ for any $x$ at which $f(x)$ and $g(x)$ are non-zero (at the remaining points, the inequality below is trivial). Evaluating this convex function at $t = 1/q$, so that $1 - t = 1/p$, we obtain in particular
$$|f(x) g(x)| \le \frac{1}{p} |f(x)|^p + \frac{1}{q} |g(x)|^q, \qquad (1.22)$$
and the claim (1.21) follows by integrating (1.22) and using the normalisations on $p, q, f, g$. □
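Hölder’s inequality and the pointwise bound (1.22) can be spot-checked numerically. The sketch below assumes a finite measure space with random point masses, a random complex-valued $f$ and a random real-valued $g$; the seed, sizes and helper names are arbitrary choices for illustration, and a numerical check is of course no substitute for the proof. The exponent pair $(p, q) = (1/2, 1/2)$, giving $r = 1/4$, is included, since Proposition 1.3.11 allows any $0 < p, q \le \infty$.

```python
import random


def lp_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p) on a finite measure space with point masses w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def holder_sides(f, g, w, p, q):
    """Return (||fg||_{L^r}, ||f||_{L^p} ||g||_{L^q}) with 1/r = 1/p + 1/q."""
    r = 1.0 / (1.0 / p + 1.0 / q)
    fg = [a * b for a, b in zip(f, g)]
    return lp_norm(fg, w, r), lp_norm(f, w, p) * lp_norm(g, w, q)


random.seed(0)
n = 50
w = [random.uniform(0.0, 2.0) for _ in range(n)]
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
g = [random.gauss(0, 3) for _ in range(n)]

for p, q in [(2.0, 2.0), (3.0, 1.5), (2.0, 4.0), (0.5, 0.5)]:
    lhs, rhs = holder_sides(f, g, w, p, q)
    print(p, q, lhs <= rhs)

# Pointwise inequality (1.22) for the dual exponents p = 3, q = 3/2:
a, b = 1.7, 0.4
print(a * b <= a ** 3 / 3 + b ** 1.5 / 1.5)   # True
```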
Remark 1.3.12. For a different proof of this inequality (based on the tensor power trick), see Section 1.9 of Structure and Randomness.
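Remark 1.3.13. One can also use Hölder’s inequality to prove the triangle inequality for $L^p$, $1 \le p < \infty$ (i.e., Minkowski’s inequality). From the complex triangle inequality $|f + g| \le |f| + |g|$, it suffices to check the case when $f$, $g$ are non-negative. In this case we have the identity
$$\|f + g\|_{L^p}^p = \|f |f + g|^{p-1}\|_{L^1} + \|g |f + g|^{p-1}\|_{L^1}, \qquad (1.23)$$
while Hölder’s inequality (applied for $p > 1$ with exponents $p$ and $p/(p-1)$, together with (1.18)) gives $\|f |f + g|^{p-1}\|_{L^1} \le \|f\|_{L^p} \|f + g\|_{L^p}^{p-1}$ and $\|g |f + g|^{p-1}\|_{L^1} \le \|g\|_{L^p} \|f + g\|_{L^p}^{p-1}$. Adding these two bounds and dividing by $\|f + g\|_{L^p}^{p-1}$ then gives the claim, after checking the degenerate cases separately (e.g., when $\|f + g\|_{L^p} = 0$; the case $p = 1$ follows directly by integrating the pointwise triangle inequality).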
Remark 1.3.14. The proofs of Hölder’s inequality and Minkowski’s inequality both relied on convexity of various functions in $\mathbf{C}$ or $[0, +\infty)$. One way to emphasise this is to deduce both inequalities from Jensen’s inequality, which is an inequality that manifestly exploits this convexity. We will not take this approach here, but see for instance [LiLo2000] for a discussion.
Example 1.3.15. It is instructive to test Hölder’s inequality (and also Exercises 1.3.10–1.3.14 below) in the special case when $f$, $g$ are generalised step functions, say $f = A 1_E$ and $g = B 1_F$ with $A, B$ non-zero. The inequality then simplifies to
$$\mu(E \cap F)^{1/r} \le \mu(E)^{1/p} \mu(F)^{1/q}, \qquad (1.24)$$
which can be easily deduced from the hypothesis $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$ (which lets one write $\mu(E \cap F)^{1/r} = \mu(E \cap F)^{1/p} \mu(E \cap F)^{1/q}$) and the trivial inequalities $\mu(E \cap F) \le \mu(E)$ and $\mu(E \cap F) \le \mu(F)$. One then easily sees (when $p, q$ are finite) that equality in (1.24) only holds if $\mu(E \cap F) = \mu(E) = \mu(F)$, or in other words if $E$ and $F$ agree almost everywhere. Note the above computations also explain why the condition $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$ is necessary.
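The step-function case (1.24) is easy to explore with intervals on the real line under Lebesgue measure. In the sketch below (hypothetical helper names, illustrative intervals), the first two calls satisfy $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$ and show a strict inequality and the equality case $E = F$; the third uses the wrong exponent $r = 2$ with $p = q = 2$, and a short interval then violates the inequality, illustrating why the exponent condition is necessary.

```python
def length(a, b):
    """Lebesgue measure of the interval [a, b] (zero if empty)."""
    return max(0.0, b - a)


def step_holder_sides(E, F, p, q, r):
    """Both sides of (1.24) for intervals E = (a, b), F = (c, d)."""
    inter = length(max(E[0], F[0]), min(E[1], F[1]))
    return inter ** (1.0 / r), length(*E) ** (1.0 / p) * length(*F) ** (1.0 / q)


# 1/p + 1/q = 1/r with p = q = 2, r = 1:
print(step_holder_sides((0.0, 3.0), (1.0, 5.0), 2, 2, 1))  # left side smaller
print(step_holder_sides((0.0, 2.0), (0.0, 2.0), 2, 2, 1))  # E = F: equal up to rounding
# Wrong exponent r = 2 (so 1/p + 1/q != 1/r): the left side is now larger
print(step_holder_sides((0.0, 0.25), (0.0, 0.25), 2, 2, 2))
```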

Exercise 1.3.10. Let $0 < p, q < \infty$, and let $f \in L^p$, $g \in L^q$ be such that Hölder’s inequality is obeyed with equality. Show that of the functions $|f|^p, |g|^q$, one of them is a scalar multiple of the other (up to equivalence, of course). What happens if $p$ or $q$ is infinite?

An important corollary of Hölder’s inequality is the Cauchy-Schwarz inequality
$$\left| \int_X f(x) g(x)\, d\mu \right| \le \|f\|_{L^2} \|g\|_{L^2}, \qquad (1.25)$$
which can of course be proven by many other means.
Exercise 1.3.11. If $f \in L^p$ for some $0 < p \le \infty$ and is also supported on a set $E$ of finite measure, show that $f \in L^q$ for all $0 < q \le p$, with $\|f\|_{L^q} \le \mu(E)^{\frac{1}{q} - \frac{1}{p}} \|f\|_{L^p}$. When does equality occur?
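Exercise 1.3.11 can be spot-checked on a finite measure space of finite total mass (an assumed model with random point masses and random data; the helper names are hypothetical). Since the random values are, with probability one, all non-zero, the support $E$ here is the whole space. A function of constant modulus $c$ on $E$ makes both sides equal, as one sees by hand from $\|c 1_E\|_{L^q} = c\, \mu(E)^{1/q}$.

```python
import random


def lp_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p) on a finite measure space with point masses w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


random.seed(1)
n = 20
w = [random.uniform(0.1, 1.0) for _ in range(n)]
f = [random.gauss(0, 2) for _ in range(n)]   # non-zero everywhere (almost surely)
mu_E = sum(w)                                # measure of the support E

p = 3.0
for q in (0.5, 1.0, 2.0, 3.0):
    lhs = lp_norm(f, w, q)
    rhs = mu_E ** (1 / q - 1 / p) * lp_norm(f, w, p)
    print(q, lhs <= rhs)
```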
Exercise 1.3.12. If $f \in L^p$ for some $0 < p < \infty$ and every set of positive measure in $X$ has measure at least $m$, show that $f \in L^q$ for all $p < q \le \infty$, with $\|f\|_{L^q} \le m^{\frac{1}{q} - \frac{1}{p}} \|f\|_{L^p}$. When does equality occur? (This result is especially useful for the $\ell^p$ spaces, in which $\mu$ is a counting measure and $m$ can be taken to be 1.)
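With counting measure (so that $m = 1$ in Exercise 1.3.12), the inequality says that the $\ell^p$ norms of a fixed sequence are non-increasing in $p$, all the way up to $\ell^\infty$. The sketch below lists them in order of increasing exponent for an illustrative finite sequence; the helper names are hypothetical.

```python
def lp_counting(f, p):
    """l^p norm (0 < p < inf) of a finite sequence under counting measure."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


def linf_counting(f):
    """l^inf norm: the largest absolute value."""
    return max(abs(x) for x in f)


f = [3.0, -1.0, 0.5, 0.25, 2.0]
exponents = [0.5, 1.0, 2.0, 4.0, 8.0]
norms = [lp_counting(f, p) for p in exponents] + [linf_counting(f)]
print(norms)   # non-increasing, since every point has mass m = 1
print(all(a >= b for a, b in zip(norms, norms[1:])))  # True
```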
Exercise 1.3.13. If $f \in L^{p_0} \cap L^{p_1}$ for some $0 < p_0 < p_1 \le \infty$, show that $f \in L^p$ for all $p_0 \le p \le p_1$ and that $\|f\|_{L^p} \le \|f\|_{L^{p_0}}^{1-\theta} \|f\|_{L^{p_1}}^{\theta}$, where $0 < \theta < 1$ is such that $\frac{1}{p} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$. Another way of saying this is that the function $\frac{1}{p} \mapsto \log \|f\|_{L^p}$ is convex. When does equality occur? This convexity is a prototypical example of interpolation, about which we shall say more in Section 1.11.
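The interpolation inequality of Exercise 1.3.13 can likewise be spot-checked on a finite measure space (an assumed model; the data and the helper `interpolation_gap` are illustrative). Here $p_0 = 1$, $p_1 = 4$, and $p$ is computed from $\theta$ via $\frac{1}{p} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$.

```python
def lp_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p) on a finite measure space with point masses w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def interpolation_gap(f, w, p0, p1, theta):
    """Return (||f||_p, ||f||_{p0}^(1-theta) ||f||_{p1}^theta), where
    1/p = (1 - theta)/p0 + theta/p1."""
    p = 1.0 / ((1 - theta) / p0 + theta / p1)
    lhs = lp_norm(f, w, p)
    rhs = lp_norm(f, w, p0) ** (1 - theta) * lp_norm(f, w, p1) ** theta
    return lhs, rhs


f = [0.2, 5.0, -1.0, 3.0]
w = [1.0, 0.05, 2.0, 0.3]
for theta in (0.1, 0.25, 0.5, 0.75, 0.9):
    lhs, rhs = interpolation_gap(f, w, 1.0, 4.0, theta)
    print(theta, lhs, rhs, lhs <= rhs)
```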
Exercise 1.3.14. If $f \in L^{p_0}$ for some $0 < p_0 \le \infty$ and its support $E := \{x \in X: f(x) \neq 0\}$ has finite measure, show that $f \in L^p$ for all $0 < p < p_0$ and that $\|f\|_{L^p}^p \to \mu(E)$ as $p \to 0$. (Because of this, the measure of the support of $f$ is sometimes known as the $L^0$ norm of $f$, or more precisely the $L^0$ norm raised to the power 0.)
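Exercise 1.3.14 says that $\|f\|_{L^p}^p = \int_X |f|^p\, d\mu$ tends to $\mu(E)$ as $p \to 0$, because $|f(x)|^p \to 1$ wherever $f(x) \neq 0$. The sketch below illustrates this on a finite measure space (an assumed model with illustrative data and a hypothetical helper name). Points where $f = 0$ are excluded explicitly, mirroring the convention $0^p = 0$ for $p > 0$, so that nothing depends on how $0^0$ would be evaluated.

```python
def lp_norm_pth_power(f, w, p):
    """||f||_{L^p}^p = sum_i w_i |f_i|^p, skipping points where f = 0."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w) if fi != 0)


f = [0.0, 7.0, 0.001, -2.5, 0.0]
w = [4.0, 0.5, 1.5, 0.25, 1.0]
support_measure = sum(wi for fi, wi in zip(f, w) if fi != 0)   # mu(E) = 2.25
for p in (1.0, 0.1, 0.01, 0.001):
    print(p, lp_norm_pth_power(f, w, p), support_measure)
```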

1.3.2. Linear functionals on $L^p$. Given an exponent $1 \le p \le \infty$, define the dual exponent $1 \le p' \le \infty$ by the formula $\frac{1}{p} + \frac{1}{p'} = 1$ (thus $p' = p/(p-1)$ for $1 < p < \infty$, while 1 and ∞ are duals of each other). From Hölder’s inequality, we see that for any $g \in L^{p'}$, the functional $\lambda_g: L^p \to \mathbf{C}$ defined by
$$\lambda_g(f) := \int_X f g\, d\mu \qquad (1.26)$$
is well defined on $L^p$; the functional is also clearly linear. Furthermore, Hölder’s inequality also tells us that this functional is continuous, since $|\lambda_g(f)| \le \|g\|_{L^{p'}} \|f\|_{L^p}$.
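The Hölder bound $|\lambda_g(f)| \le \|f\|_{L^p} \|g\|_{L^{p'}}$ cannot be improved in general. The sketch below checks this on a finite measure space (an assumed model; the data and helper names are hypothetical) with the choice $f = |g|^{p'-2}\, \overline{g}$ on $\{g \neq 0\}$ and $f = 0$ elsewhere, for which $fg = |g|^{p'}$; since $(p' - 1)p = p'$, both sides then equal $\int_X |g|^{p'}\, d\mu$. This extremiser is a standard construction, not something taken from the text above.

```python
def lp_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p) on a finite measure space with point masses w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def pairing(f, g, w):
    """lambda_g(f) = integral of f g d mu, as in (1.26) (no complex conjugate)."""
    return sum(wi * a * b for a, b, wi in zip(f, g, w))


def extremiser(g, p_dual):
    """f = |g|^(p' - 2) * conj(g) where g != 0, and f = 0 elsewhere,
    so that f g = |g|^(p')."""
    return [abs(b) ** (p_dual - 2) * b.conjugate() if b != 0 else 0.0 for b in g]


p = 3.0
p_dual = p / (p - 1)          # dual exponent p' = 3/2
w = [0.4, 1.0, 0.25, 2.0]
g = [1 + 2j, -0.5, 0.0, 3j]

f = extremiser(g, p_dual)
value = pairing(f, g, w)
bound = lp_norm(f, w, p) * lp_norm(g, w, p_dual)
print(abs(value), bound)      # equal up to rounding: the Hölder bound is attained
```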
A deep and important fact about $L^p$ spaces is that, in most cases, the converse is true: the recipe (1.26) is the only way to create continuous linear functionals on $L^p$.
Theorem 1.3.16 (Dual of $L^p$). Let $1 \le p < \infty$, and assume $\mu$ is σ-finite. Let $\lambda: L^p \to \mathbf{C}$ be a continuous linear functional. Then there exists a unique $g \in L^{p'}$ such that $\lambda = \lambda_g$.

This result should be compared with the Radon-Nikodym theorem (Corollary 1.2.5). Both theorems start with an abstract function $\mu: \mathcal{X} \to \mathbf{R}$ or $\lambda: L^p \to \mathbf{C}$, and create a function out of it. Indeed, we shall see shortly that the two theorems are essentially equivalent to each other. We will develop Theorem 1.3.16 further in Section 1.5, once we introduce the notion of a dual space.
To prove Theorem 1.3.16, we first need a simple and useful lemma:
Lemma 1.3.17 (Continuity is equivalent to boundedness for linear operators). Let $T: X \to Y$ be a linear transformation from one normed vector space $(X, \|\cdot\|_X)$ to another $(Y, \|\cdot\|_Y)$. Then the following are equivalent:
(i) $T$ is continuous.
(ii) $T$ is continuous at 0.
(iii) There exists a constant $C$ such that $\|Tx\|_Y \le C \|x\|_X$ for all $x \in X$.

Proof. It is clear that (i) implies (ii), and that (iii) implies (ii). Next, from linearity we have $Tx = Tx_0 + T(x - x_0)$ for any $x, x_0 \in X$, which (together with the continuity of addition, which follows from the triangle inequality) shows that continuity of $T$ at 0 implies continuity of $T$ at any $x_0$, so that (ii) implies (i). The only remaining task is to show that (i) implies (iii). By continuity, the inverse image of the unit ball in $Y$ must be an open neighbourhood of 0 in $X$, thus there exists some radius $r > 0$ such that $\|Tx\|_Y < 1$ whenever $\|x\|_X < r$. The claim then follows (with $C := 1/r$) by homogeneity. (Alternatively, one can deduce (iii) from (ii) by contradiction. If (iii) failed, then there would exist a sequence $x_n$ of non-zero elements of $X$ such that $\|Tx_n\|_Y / \|x_n\|_X$ goes to infinity. By homogeneity, we can arrange matters so that $\|x_n\|_X$ goes to zero, but $\|Tx_n\|_Y$ stays away from zero, thus contradicting continuity at 0.) □

Proof of Theorem 1.3.16. The uniqueness claim is similar to the uniqueness claim in the Radon-Nikodym theorem (Exercise 1.2.2) and is left as an exercise to the reader; the hard part is establishing existence.
Let us first consider the case when $\mu$ is finite. The linear functional $\lambda : L^p \to \mathbf{C}$ induces a functional $\nu : \mathcal{X} \to \mathbf{C}$ on sets $E$ by the formula
$$\nu(E) := \lambda(1_E). \tag{1.27}$$
Since $\lambda$ is linear, $\nu$ is finitely additive (and sends the empty set to zero). Also, if $E_1, E_2, \ldots$ are a sequence of disjoint sets, then $1_{\bigcup_{n=1}^N E_n}$ converges in $L^p$ to $1_{\bigcup_{n=1}^\infty E_n}$ as $N \to \infty$ (by the dominated convergence theorem and the finiteness of $\mu$), and thus (by continuity of $\lambda$ and finite additivity of $\nu$), $\nu$ is countably additive as well. Finally, from (1.27) we also see that $\nu(E) = 0$ whenever $\mu(E) = 0$ (since then $1_E = 0$ in $L^p$), thus $\nu$ is absolutely continuous with respect to $\mu$. Applying the Radon-Nikodym theorem (Corollary 1.2.5) to both the real and imaginary components of $\nu$, we conclude that $\nu = \mu_g$ for some $g \in L^1$. Thus by (1.27) we have
$$\lambda(1_E) = \lambda_g(1_E) \tag{1.28}$$
for all measurable $E$. By linearity, this implies that $\lambda$ and $\lambda_g$ agree on simple functions. Taking uniform limits (using Exercise 1.3.8) and using continuity (and the finiteness of $\mu$), we conclude that $\lambda$ and $\lambda_g$ agree on all bounded functions. Taking monotone limits (working on the positive and negative supports of the real and imaginary parts of $g$ separately), we conclude that $\lambda$ and $\lambda_g$ agree on all functions in $L^p$, and in particular that $\int_X fg \, d\mu$ is absolutely convergent for all $f \in L^p$.

To finish the theorem in this case, we need to establish that $g$ lies in $L^{p'}$. By taking real and imaginary parts, we may assume without loss of generality that $g$ is real; by splitting into the regions where $g$ is positive and negative, we may assume that $g$ is non-negative.
We already know that $\lambda_g = \lambda$ is a continuous functional from $L^p$ to $\mathbf{C}$. By Lemma 1.3.17, this implies a bound of the form $|\lambda_g(f)| \le C\|f\|_{L^p}$ for some $C > 0$.
Suppose first that $p > 1$. Heuristically, we would like to test this inequality with $f := g^{p'-1}$, since we formally have $\lambda_g(f) = \|g\|_{L^{p'}}^{p'}$ and $\|f\|_{L^p} = \|g\|_{L^{p'}}^{p'-1}$. (Not coincidentally, this is also the choice that would make Hölder's inequality an equality; see Exercise 1.3.10.) Cancelling the $\|g\|_{L^{p'}}$ factors would then give the desired finiteness of $\|g\|_{L^{p'}}$.
We cannot quite make that argument work, because it is circular: it assumes $\|g\|_{L^{p'}}$ is finite in order to show that $\|g\|_{L^{p'}}$ is finite! But this can be easily remedied. We test the inequality with $f_N := \min(g, N)^{p'-1}$ for some large $N$; this lies in $L^p$. We have $\lambda_g(f_N) \ge \|\min(g, N)\|_{L^{p'}}^{p'}$ and $\|f_N\|_{L^p} = \|\min(g, N)\|_{L^{p'}}^{p'-1}$, and hence $\|\min(g, N)\|_{L^{p'}} \le C$ for all $N$. Letting $N$ go to infinity and using monotone convergence (Theorem 1.1.21), we obtain the claim.
In the $p = 1$ case, we instead use the test functions $f := 1_{\{g > N\}}$, to conclude that $g$ is bounded almost everywhere by $N$ for every $N > C$, and hence by $C$. We leave the details to the reader.
This handles the case when $\mu$ is finite. When $\mu$ is σ-finite, we can write $X$ as the union of an increasing sequence $E_n$ of sets of finite measure. On each such set, the above arguments let us write the restriction of $\lambda$ to $L^p(E_n)$ as $\lambda_{g_n}$ for some $g_n \in L^{p'}(E_n)$. The uniqueness arguments tell us that the $g_n$ are all compatible with each other; in particular, if $n < m$, then $g_n$ and $g_m$ agree on $E_n$. Thus all the $g_n$ are in fact restrictions of a single function $g$ to $E_n$. The previous arguments also tell us that the $L^{p'}$ norm of $g_n$ is bounded by the same constant $C$ uniformly in $n$, so by monotone convergence (Theorem 1.1.21), $g$ has bounded $L^{p'}$ norm also, and we are done. ∎

Remark 1.3.18. When 1 < p < ∞, the hypothesis that μ is σ-finite can
be dropped, but not when p = 1; see, e.g., [Fo2000, Section 6.2] for further
discussion. In these lectures, though, we will be content with working in
the σ-finite setting. On the other hand, the claim fails when p = ∞ (except
when X is finite); we will see this in Section 1.5, when we discuss the Hahn-
Banach theorem.

Remark 1.3.19. We have seen how the Lebesgue-Radon-Nikodym theorem can be used to establish Theorem 1.3.16. The converse is also true: Theorem 1.3.16 can be used to deduce the Lebesgue-Radon-Nikodym theorem (a fact essentially observed by von Neumann). For simplicity, let us restrict our attention to the unsigned finite case, thus $\mu$ and $m$ are unsigned and finite. This implies that the sum $\mu + m$ is also unsigned and finite. We observe that the linear functional $\lambda : f \mapsto \int_X f \, d\mu$ is continuous on $L^1(\mu + m)$, hence by Theorem 1.3.16, there must exist a function $g \in L^\infty(\mu + m)$ such that
$$\int_X f \, d\mu = \int_X f g \, d(\mu + m) \tag{1.29}$$
for all $f \in L^1(\mu + m)$. It is easy to see that $g$ must be real and non-negative, and also at most 1 almost everywhere. If $E$ is the set where $g = 1$, we see by setting $f = 1_E$ in (1.29) that $E$ has $m$-measure zero, and so $\mu\lfloor_E$ is singular (with respect to $m$). Outside of $E$, we see from (1.29) and some rearrangement that
$$\int_{X \setminus E} (1 - g) f \, d\mu = \int_X f g \, dm \tag{1.30}$$
and one then easily verifies that $\mu$ agrees with $m_{g/(1-g)}$ outside of $E$. This gives the desired Lebesgue-Radon-Nikodym decomposition $\mu = m_{g/(1-g)} + \mu\lfloor_E$.
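On a finite set the whole of Remark 1.3.19 can be carried out by hand, which may help make the roles of $g$ and $E$ concrete. The sketch below assumes, for illustration, that $X = \{0, \ldots, n-1\}$ and that $\mu$, $m$ are given by hypothetical non-negative integer weights; then $g = \mu/(\mu + m)$ pointwise plays the role of the function in (1.29), $E = \{g = 1\}$ is where $m$ vanishes but $\mu$ does not, and `von_neumann_decomposition` is a made-up helper name. Exact rational arithmetic (`fractions.Fraction`) is used so the final identity can be tested with `==`.

```python
from fractions import Fraction

def von_neumann_decomposition(mu, m):
    """Lebesgue decomposition of mu with respect to m on a finite set, following Remark 1.3.19.

    mu, m: lists of non-negative integer (or Fraction) weights on {0, ..., n-1}.
    Returns (g, E, h, singular), where g = d mu / d(mu + m), E is the set where g = 1,
    h = g/(1-g) is the density of the absolutely continuous part, and singular is mu
    restricted to E.
    """
    n = len(mu)
    # Density of mu with respect to mu + m; where mu + m vanishes its value is irrelevant.
    g = [Fraction(mu[i]) / (mu[i] + m[i]) if mu[i] + m[i] > 0 else Fraction(0)
         for i in range(n)]
    E = [i for i in range(n) if g[i] == 1]
    # On E the value of h does not matter since m(E) = 0; set it to 0 there.
    h = [g[i] / (1 - g[i]) if g[i] != 1 else Fraction(0) for i in range(n)]
    singular = [Fraction(mu[i]) if i in E else Fraction(0) for i in range(n)]
    return g, E, h, singular

mu = [3, 0, 2, 5, 0]
m = [1, 4, 0, 5, 0]
g, E, h, singular = von_neumann_decomposition(mu, m)

# (1.29) on indicator functions: mu({i}) = g(i) (mu + m)({i}).
assert all(mu[i] == g[i] * (mu[i] + m[i]) for i in range(len(mu)))
# E has m-measure zero, so mu restricted to E is singular with respect to m.
assert sum(m[i] for i in E) == 0
# mu = m_{g/(1-g)} + mu restricted to E.
assert [h[i] * m[i] + singular[i] for i in range(len(mu))] == mu
```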

Remark 1.3.20. The argument used in Remark 1.3.19 also shows that the
Radon-Nikodym theorem implies the Lebesgue-Radon-Nikodym theorem.
Remark 1.3.21. One can give an alternate proof of Theorem 1.3.16, which relies on the geometry (and in particular, the uniform convexity) of $L^p$ spaces rather than on the Radon-Nikodym theorem, and can thus be viewed as giving an independent proof of that theorem; see Exercise 1.4.14.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/09.
Thanks to Xiaochuan Li for corrections.

Section 1.4

Hilbert spaces

In the next few lectures, we will be studying four major classes of function spaces. In decreasing order of generality, these classes are the topological vector spaces, the normed vector spaces, the Banach spaces, and the Hilbert spaces. In order to motivate the discussion of the more general classes of spaces, we will first focus on the most special class, namely that of (real and complex) Hilbert spaces. These spaces can be viewed as generalisations of (real and complex) Euclidean spaces such as $\mathbf{R}^n$ and $\mathbf{C}^n$ to infinite-dimensional settings, and indeed much of one's Euclidean geometry intuition concerning lengths, angles, orthogonality, subspaces, etc., will transfer readily to arbitrary Hilbert spaces. In contrast, this intuition is not always accurate in the more general vector spaces mentioned above. In addition to Euclidean spaces, another fundamental example$^{7}$ of Hilbert spaces comes from the Lebesgue spaces $L^2(X, \mathcal{X}, \mu)$ of a measure space $(X, \mathcal{X}, \mu)$.
Hilbert spaces are the natural abstract framework in which to study two important (and closely related) concepts, orthogonality and unitarity, allowing us to generalise familiar concepts and facts from Euclidean geometry such as the Cartesian coordinate system, rotations and reflections, and the Pythagorean theorem to Hilbert spaces. (For instance, the Fourier transform (Section 1.12) is a unitary transformation and can thus be viewed as a kind of generalised rotation.) Furthermore, the Hodge duality on Euclidean spaces has a partial analogue for Hilbert spaces, namely the Riesz representation theorem for Hilbert spaces, which makes the theory of duality and adjoints for Hilbert spaces especially simple (when compared with the more subtle theory of duality for, say, Banach spaces; see Section 1.5).

$^{7}$There are of course many other Hilbert spaces of importance in complex analysis, harmonic analysis, and PDE, such as Hardy spaces $H^2$, Sobolev spaces $H^s = W^{s,2}$, and the space $HS$ of Hilbert-Schmidt operators; see for instance Section 1.14 for a discussion of Sobolev spaces. Complex Hilbert spaces also play a fundamental role in the foundations of quantum mechanics, being the natural space to hold all the possible states of a quantum system (possibly after projectivising the Hilbert space), but we will not discuss this subject here.
These notes are only the most basic introduction to the theory of Hilbert
spaces. In particular, the theory of linear transformations between two
Hilbert spaces, which is perhaps the most important aspect of the subject,
is not covered much at all here.

1.4.1. Inner product spaces. The Euclidean norm
$$|(x_1, \ldots, x_n)| := \sqrt{x_1^2 + \cdots + x_n^2} \tag{1.31}$$
in real Euclidean space $\mathbf{R}^n$ can be expressed in terms of the dot product $\cdot : \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R}$, defined as
$$(x_1, \ldots, x_n) \cdot (y_1, \ldots, y_n) := x_1 y_1 + \cdots + x_n y_n \tag{1.32}$$
by the well-known formula
$$|x| = (x \cdot x)^{1/2}. \tag{1.33}$$
In particular, we have the positivity property
$$x \cdot x \ge 0 \tag{1.34}$$
with equality if and only if $x = 0$. One reason why it is more advantageous to work with the dot product than the norm is that while the norm function is only sublinear, the dot product is bilinear, thus
$$(cx + dy) \cdot z = c(x \cdot z) + d(y \cdot z); \quad z \cdot (cx + dy) = c(z \cdot x) + d(z \cdot y) \tag{1.35}$$
for all vectors $x, y, z$ and scalars $c, d$, and also symmetric,
$$x \cdot y = y \cdot x. \tag{1.36}$$
These properties make the inner product easier to manipulate algebraically than the norm.
The above discussion was for the real vector space $\mathbf{R}^n$, but one can develop analogous statements for the complex vector space $\mathbf{C}^n$, in which the norm
$$\|(z_1, \ldots, z_n)\| := \sqrt{|z_1|^2 + \cdots + |z_n|^2} \tag{1.37}$$
can be represented in terms of the complex inner product $\langle \cdot, \cdot \rangle : \mathbf{C}^n \times \mathbf{C}^n \to \mathbf{C}$ defined by the formula
$$\langle (z_1, \ldots, z_n), (w_1, \ldots, w_n) \rangle := z_1 \overline{w_1} + \cdots + z_n \overline{w_n} \tag{1.38}$$
by the analogue of (1.33), namely
$$\|x\| = \langle x, x \rangle^{1/2}. \tag{1.39}$$
In particular, as before with (1.34), we have the positivity property
$$\langle x, x \rangle \ge 0 \tag{1.40}$$
with equality if and only if $x = 0$. The bilinearity property (1.35) is modified to the sesquilinearity property
$$\langle cx + dy, z \rangle = c\langle x, z \rangle + d\langle y, z \rangle, \quad \langle z, cx + dy \rangle = \overline{c}\langle z, x \rangle + \overline{d}\langle z, y \rangle \tag{1.41}$$
while the symmetry property (1.36) needs to be replaced with
$$\langle x, y \rangle = \overline{\langle y, x \rangle} \tag{1.42}$$
in order to be compatible with sesquilinearity.
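The axioms (1.40)-(1.42) for the concrete inner product (1.38) can be spot-checked numerically. The following Python sketch assumes $\mathbf{C}^4$ and randomly chosen vectors and scalars; it is an illustration of the conventions (linear in the first slot, conjugate-linear in the second), not a proof.

```python
import random

def inner(z, w):
    """Complex inner product (1.38): linear in z, conjugate-linear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def random_vector(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

def close(u, v, tol=1e-9):
    return abs(u - v) < tol

random.seed(1)
x, y, z = random_vector(4), random_vector(4), random_vector(4)
c, d = 2 - 1j, 0.5 + 3j
cx_dy = [c * a + d * b for a, b in zip(x, y)]

# Sesquilinearity (1.41): linear in the first slot, conjugate-linear in the second.
assert close(inner(cx_dy, z), c * inner(x, z) + d * inner(y, z))
assert close(inner(z, cx_dy), c.conjugate() * inner(z, x) + d.conjugate() * inner(z, y))
# Conjugate symmetry (1.42).
assert close(inner(x, y), inner(y, x).conjugate())
# Positivity (1.40): <x, x> is real and non-negative.
assert abs(inner(x, x).imag) < 1e-12 and inner(x, x).real >= 0
```

Dropping the conjugation in `inner` would make the last check fail for generic complex vectors, which is one way to see why (1.38) conjugates the second argument.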
We can formalise all these properties axiomatically as follows.
Definition 1.4.1 (Inner product space). A complex inner product space $(V, \langle \cdot, \cdot \rangle)$ is a complex vector space $V$, together with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbf{C}$ which is sesquilinear (i.e., (1.41) holds for all $x, y, z \in V$ and $c, d \in \mathbf{C}$) and symmetric in the sesquilinear sense (i.e., (1.42) holds for all $x, y \in V$), and obeys the positivity property (1.40) for all $x \in V$, with equality if and only if $x = 0$. We will usually abbreviate $(V, \langle \cdot, \cdot \rangle)$ as $V$.
A real inner product space is defined similarly, but with all references to $\mathbf{C}$ replaced by $\mathbf{R}$ (and all references to complex conjugation dropped).
Example 1.4.2. $\mathbf{R}^n$ with the standard dot product (1.32) is a real inner product space, and $\mathbf{C}^n$ with the complex inner product (1.38) is a complex inner product space.
Example 1.4.3. If $(X, \mathcal{X}, \mu)$ is a measure space, then the complex $L^2$ space $L^2(X, \mathcal{X}, \mu) = L^2(X, \mathcal{X}, \mu; \mathbf{C})$ with the complex inner product
$$\langle f, g \rangle := \int_X f \overline{g} \, d\mu \tag{1.43}$$
(which is well defined by the Cauchy-Schwarz inequality) is easily verified to be a complex inner product space, and similarly for the real $L^2$ space (with the complex conjugate signs dropped, of course). Note that the finite-dimensional examples $\mathbf{R}^n$, $\mathbf{C}^n$ can be viewed as the special case of the $L^2$ examples in which $X$ is $\{1, \ldots, n\}$ with the discrete σ-algebra and counting measure.
Example 1.4.4. Any subspace of a (real or complex) inner product space
is again a (real or complex) inner product space, simply by restricting the
inner product to the subspace.
Example 1.4.5. Also, any real inner product space $V$ can be complexified into the complex inner product space $V_{\mathbf{C}}$, defined as the space of formal combinations $x + iy$ of vectors $x, y \in V$ (with the obvious complex vector space structure), and with inner product
$$\langle a + ib, c + id \rangle := \langle a, c \rangle + i\langle b, c \rangle - i\langle a, d \rangle + \langle b, d \rangle. \tag{1.44}$$
Example 1.4.6. Fix a probability space $(X, \mathcal{X}, \mu)$. The space of square-integrable real-valued random variables of mean zero is an inner product space if one uses covariance as the inner product. (What goes wrong if one drops the mean zero assumption?)

Given a (real or complex) inner product space $V$, we can define the norm $\|x\|$ of any vector $x \in V$ by the formula (1.39), which is well defined thanks to the positivity property; in the case of the $L^2$ spaces, this norm of course corresponds to the usual $L^2$ norm. We have the following basic facts:
Lemma 1.4.7. Let $V$ be a real or complex inner product space.
(i) Cauchy-Schwarz inequality. For any $x, y \in V$, we have $|\langle x, y \rangle| \le \|x\|\|y\|$.
(ii) The function $x \mapsto \|x\|$ is a norm on $V$. (Thus every inner product space is a normed vector space.)

Proof. We shall just verify the complex case, as the real case is similar (and slightly easier). The positivity property tells us that the quadratic form $\langle ax + by, ax + by \rangle$ is non-negative for all complex numbers $a, b$. Using sesquilinearity and symmetry, we can expand this form as
$$|a|^2\|x\|^2 + 2\operatorname{Re}(a\overline{b}\langle x, y \rangle) + |b|^2\|y\|^2. \tag{1.45}$$
Optimising in $a, b$ (for instance, taking $a = \|y\|^2$ and $b = -\langle x, y \rangle$ when $y \ne 0$, which turns (1.45) into $\|y\|^2(\|x\|^2\|y\|^2 - |\langle x, y \rangle|^2)$; see also Section 1.10 of Structure and Randomness), we obtain the Cauchy-Schwarz inequality. To verify the norm property, the only non-trivial verification is that of the triangle inequality $\|x + y\| \le \|x\| + \|y\|$. But on expanding $\|x + y\|^2 = \langle x + y, x + y \rangle$, we see that
$$\|x + y\|^2 = \|x\|^2 + 2\operatorname{Re}\langle x, y \rangle + \|y\|^2, \tag{1.46}$$
and the claim then follows from the Cauchy-Schwarz inequality. ∎
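A minimal numerical check of the computations in this proof, assuming (for illustration) $V = \mathbf{C}^3$ with the inner product (1.38). It checks the expansion (1.45) for random $a, b$; then it makes the explicit choice $a = \|y\|^2$, $b = -\langle x, y \rangle$, for which (1.45) becomes $\|y\|^2(\|x\|^2\|y\|^2 - |\langle x, y \rangle|^2)$, whose non-negativity is exactly the Cauchy-Schwarz inequality; finally it checks (1.46) and the triangle inequality. The helper names are hypothetical.

```python
import math
import random

def inner(z, w):
    """Complex inner product (1.38)."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def norm(z):
    """Norm (1.39): ||z|| = <z, z>^(1/2)."""
    return math.sqrt(inner(z, z).real)

def quadratic_form(a, b, x, y):
    """Right-hand side of (1.45)."""
    return (abs(a) ** 2 * norm(x) ** 2
            + 2 * (a * b.conjugate() * inner(x, y)).real
            + abs(b) ** 2 * norm(y) ** 2)

def rand_c():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

random.seed(2)
for _ in range(500):
    x = [rand_c() for _ in range(3)]
    y = [rand_c() for _ in range(3)]
    a, b = rand_c(), rand_c()
    ax_by = [a * u + b * v for u, v in zip(x, y)]
    # (1.45) is the expansion of <ax + by, ax + by>.
    assert abs(norm(ax_by) ** 2 - quadratic_form(a, b, x, y)) < 1e-8
    # The choice a = ||y||^2, b = -<x, y>.
    a0, b0 = complex(norm(y) ** 2), -inner(x, y)
    expected = norm(y) ** 2 * (norm(x) ** 2 * norm(y) ** 2 - abs(inner(x, y)) ** 2)
    scale = 1 + norm(x) ** 2 * norm(y) ** 4
    assert abs(quadratic_form(a0, b0, x, y) - expected) < 1e-9 * scale
    # Cauchy-Schwarz, (1.46), and the triangle inequality.
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
    s = [u + v for u, v in zip(x, y)]
    assert abs(norm(s) ** 2 - (norm(x) ** 2 + 2 * inner(x, y).real + norm(y) ** 2)) < 1e-9
    assert norm(s) <= norm(x) + norm(y) + 1e-12
```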

Observe from the Cauchy-Schwarz inequality that the inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbf{C}$ is continuous.
Exercise 1.4.1. Let $T : V \to W$ be a linear map from one (real or complex) inner product space to another. Show that $T$ preserves the inner product structure (i.e., $\langle Tx, Ty \rangle = \langle x, y \rangle$ for all $x, y \in V$) if and only if $T$ is an isometry (i.e., $\|Tx\| = \|x\|$ for all $x \in V$). (Hint: In the real case, express $\langle x, y \rangle$ in terms of $\|x + y\|^2$ and $\|x - y\|^2$. In the complex case, use $x + y$, $x - y$, $x + iy$, $x - iy$ instead of $x + y$, $x - y$.)
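The hint rests on the polarisation identities. In the sketch below (an illustration assuming $\mathbf{R}^n$ or $\mathbf{C}^n$ with the standard inner products; function names are hypothetical), the real identity $\langle x, y \rangle = \frac{1}{4}(\|x + y\|^2 - \|x - y\|^2)$ and the complex identity $\langle x, y \rangle = \frac{1}{4}\sum_{k=0}^{3} i^k \|x + i^k y\|^2$ (for the convention (1.41), linear in the first slot) recover the inner product from norms alone, which is why a linear isometry must preserve inner products.

```python
import math
import random

def inner(z, w):
    """Complex inner product (1.38); on real vectors its real part is the dot product (1.32)."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def norm(z):
    return math.sqrt(inner(z, z).real)

def real_polarization(x, y):
    """<x, y> = (||x + y||^2 - ||x - y||^2) / 4 for real vectors."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (norm(s) ** 2 - norm(d) ** 2) / 4

def complex_polarization(x, y):
    """<x, y> = (1/4) sum_k i^k ||x + i^k y||^2 over k = 0, 1, 2, 3."""
    total = 0
    for phase in (1, 1j, -1, -1j):   # i^0, i^1, i^2, i^3
        v = [a + phase * b for a, b in zip(x, y)]
        total += phase * norm(v) ** 2
    return total / 4

random.seed(3)
xr = [random.gauss(0, 1) for _ in range(5)]
yr = [random.gauss(0, 1) for _ in range(5)]
assert abs(real_polarization(xr, yr) - inner(xr, yr).real) < 1e-9

xc = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
yc = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
assert abs(complex_polarization(xc, yc) - inner(xc, yc)) < 1e-9
```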


Inspired by the above exercise, we say that two inner product spaces
are isomorphic if there exists an invertible isometry from one space to the
other; such invertible isometries are known as isomorphisms.
Exercise 1.4.2. Let $V$ be a real or complex inner product space. If $x_1, \ldots, x_n$ are a finite collection of vectors in $V$, show that the Gram matrix $(\langle x_i, x_j \rangle)_{1 \le i,j \le n}$ is Hermitian and positive semidefinite, and it is positive definite if and only if the $x_1, \ldots, x_n$ are linearly independent. Conversely, given a Hermitian positive semidefinite matrix $(a_{ij})_{1 \le i,j \le n}$ with real (resp., complex) entries, show that there exists a real (resp., complex) inner product space $V$ and vectors $x_1, \ldots, x_n$ such that $\langle x_i, x_j \rangle = a_{ij}$ for all $1 \le i, j \le n$.

In analogy with the Euclidean case, we say that two vectors $x, y$ in a (real or complex) inner product space are orthogonal if $\langle x, y \rangle = 0$. (With this convention, we see in particular that 0 is orthogonal to every vector, and is the only vector with this property.)
Exercise 1.4.3 (Pythagorean theorem). Let $V$ be a real or complex inner product space. If $x_1, \ldots, x_n$ are a finite set of pairwise orthogonal vectors, show that $\|x_1 + \cdots + x_n\|^2 = \|x_1\|^2 + \cdots + \|x_n\|^2$. In particular, we see that $\|x_1 + x_2\| \ge \|x_1\|$ whenever $x_2$ is orthogonal to $x_1$.

A (possibly infinite) collection $(e_\alpha)_{\alpha \in A}$ of vectors in a (real or complex) inner product space is said to be orthonormal if they are pairwise orthogonal and all of unit length.
Exercise 1.4.4. Let $(e_\alpha)_{\alpha \in A}$ be an orthonormal system of vectors in a real or complex inner product space. Show that this system is (algebraically) linearly independent (thus any non-trivial finite linear combination of vectors in this system is non-zero). If $x$ lies in the algebraic span of this system (i.e., it is a finite linear combination of vectors in the system), establish the inversion formula
$$x = \sum_{\alpha \in A} \langle x, e_\alpha \rangle e_\alpha \tag{1.47}$$
(with only finitely many of the terms non-zero) and the (finite) Plancherel formula
$$\|x\|^2 = \sum_{\alpha \in A} |\langle x, e_\alpha \rangle|^2. \tag{1.48}$$

Exercise 1.4.5 (Gram-Schmidt theorem). Let $e_1, \ldots, e_n$ be a finite orthonormal system in a real or complex inner product space, and let $v$ be a vector not in the span of $e_1, \ldots, e_n$. Show that there exists a vector $e_{n+1}$ with $\operatorname{span}(e_1, \ldots, e_n, e_{n+1}) = \operatorname{span}(e_1, \ldots, e_n, v)$ such that $e_1, \ldots, e_{n+1}$ is an orthonormal system. Conclude that an $n$-dimensional real or complex inner product space is isomorphic to $\mathbf{R}^n$ or $\mathbf{C}^n$, respectively. Thus, any statement about inner product spaces which only involves a finite-dimensional subspace of that space can be verified just by checking it on Euclidean spaces.
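The inductive step in Exercise 1.4.5 (subtract the components along $e_1, \ldots, e_n$, then normalise) can be sketched in Python for $\mathbf{C}^n$ with the inner product (1.38). This is an illustration rather than a canonical implementation: it uses the so-called modified Gram-Schmidt ordering (each coefficient is computed against the partially reduced vector, which agrees with the classical formula in exact arithmetic), and the tolerance `tol` for discarding vectors that are numerically in the span is an arbitrary choice.

```python
import math

def inner(z, w):
    """Complex inner product (1.38)."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise a list of vectors in C^n, in order.

    Vectors that are (numerically) in the span of the earlier ones are skipped,
    so the output spans the same subspace as the input.
    """
    basis = []
    for v in vectors:
        w = [complex(t) for t in v]
        for e in basis:
            # Remove the component of w along e, namely <w, e> e.
            c = inner(w, e)
            w = [a - c * b for a, b in zip(w, e)]
        length = math.sqrt(inner(w, w).real)
        if length > tol:
            basis.append([a / length for a in w])
    return basis

def is_orthonormal(basis, tol=1e-9):
    return all(abs(inner(e, f) - (1 if i == j else 0)) < tol
               for i, e in enumerate(basis) for j, f in enumerate(basis))

es = gram_schmidt([[1, 1, 0], [1, 0, 1j], [0, 1, 1]])
assert len(es) == 3 and is_orthonormal(es)
# A vector already in the span contributes nothing new.
assert len(gram_schmidt([[1, 1, 0], [2, 2, 0], [0, 0, 5]])) == 2
```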
Exercise 1.4.6 (Parallelogram law). For any inner product space $V$, establish the parallelogram law
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2. \tag{1.49}$$
Show that this identity fails for $L^p(X, \mathcal{X}, \mu)$ for $p \ne 2$ as soon as $X$ contains at least two disjoint sets of positive finite measure. On the other hand, establish the Hanner inequalities
$$\|f + g\|_p^p + \|f - g\|_p^p \ge (\|f\|_p + \|g\|_p)^p + \big|\|f\|_p - \|g\|_p\big|^p \tag{1.50}$$
and
$$(\|f + g\|_p + \|f - g\|_p)^p + \big|\|f + g\|_p - \|f - g\|_p\big|^p \le 2^p(\|f\|_p^p + \|g\|_p^p) \tag{1.51}$$
for $1 \le p \le 2$, with the inequalities being reversed for $2 \le p < \infty$. (Hint: (1.51) can be deduced from (1.50) by a simple substitution. For (1.50), reduce to the case when $f, g$ are non-negative, and then exploit the inequality
$$|x + y|^p + |x - y|^p \ge ((1 + r)^{p-1} + (1 - r)^{p-1})x^p + ((1 + r)^{p-1} - (1 - r)^{p-1})r^{1-p}y^p \tag{1.52}$$
for all non-negative $x, y$, $0 < r < 1$, and $1 \le p \le 2$, with the inequality being reversed for $2 \le p < \infty$, and with equality being attained when $y < x$ and $r = y/x$.)
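A quick numerical illustration, under the assumption that $X$ is a finite set with counting measure (so $L^p$ is $\ell^p$ of a finite list). Taking $f, g$ to be indicators of two disjoint points, the defect in (1.49) is $2 \cdot 2^{2/p} - 4$, which vanishes only at $p = 2$; and for random real vectors the Hanner inequality (1.50) holds in the stated direction on either side of $p = 2$. Only sample cases are tested, and the helper names are hypothetical.

```python
import random

def lp_norm(v, p):
    """l^p norm of a finite list (counting measure)."""
    return sum(abs(t) ** p for t in v) ** (1 / p)

def parallelogram_defect(f, g, p):
    """||f+g||^2 + ||f-g||^2 - 2||f||^2 - 2||g||^2 in l^p; zero when p = 2 by (1.49)."""
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    return (lp_norm(s, p) ** 2 + lp_norm(d, p) ** 2
            - 2 * lp_norm(f, p) ** 2 - 2 * lp_norm(g, p) ** 2)

def hanner_gap(f, g, p):
    """Left side minus right side of (1.50)."""
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    nf, ng = lp_norm(f, p), lp_norm(g, p)
    return lp_norm(s, p) ** p + lp_norm(d, p) ** p - (nf + ng) ** p - abs(nf - ng) ** p

# Indicators of two disjoint sets of measure 1.
f, g = [1.0, 0.0], [0.0, 1.0]
for p in (1, 1.5, 2, 3, 4):
    assert abs(parallelogram_defect(f, g, p) - (2 * 2 ** (2 / p) - 4)) < 1e-9

tol = 1e-7
random.seed(4)
for _ in range(300):
    u = [random.gauss(0, 1) for _ in range(6)]
    v = [random.gauss(0, 1) for _ in range(6)]
    for p in (1, 1.25, 1.5, 2):
        assert hanner_gap(u, v, p) >= -tol      # (1.50) for 1 <= p <= 2
    for p in (2, 2.5, 3, 4):
        assert hanner_gap(u, v, p) <= tol       # reversed for 2 <= p < infinity
```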

1.4.2. Hilbert spaces. Thus far, our discussion of inner product spaces
has been largely algebraic in nature; this is because we have not been able
to take limits inside these spaces and do some actual analysis. This can be
rectified by adding an additional axiom:
Definition 1.4.8 (Hilbert spaces). A (real or complex) Hilbert space is a
(real or complex) inner product space which is complete (or equivalently, an
inner product space which is also a Banach space).
Example 1.4.9. From Proposition 1.3.7, (real or complex) $L^2(X, \mathcal{X}, \mu)$ is a Hilbert space for any measure space $(X, \mathcal{X}, \mu)$. In particular, $\mathbf{R}^n$ and $\mathbf{C}^n$ are Hilbert spaces.
Exercise 1.4.7. Show that a subspace of a Hilbert space H will itself be a
Hilbert space if and only if it is closed. (In particular, proper dense subspaces
of Hilbert spaces are not Hilbert spaces.)
Example 1.4.10. By Example 1.4.9, the space $\ell^2(\mathbf{Z})$ of doubly infinite square-summable sequences is a Hilbert space. Inside this space, the space $c_c(\mathbf{Z})$ of sequences of finite support is a proper dense subspace (as can be seen for instance by Proposition 1.3.8, though this can also be seen much more directly), and so cannot be a Hilbert space.
Exercise 1.4.8. Let $V$ be an inner product space. Show that there exists a Hilbert space $\overline{V}$ which contains a dense subspace isomorphic to $V$; we refer to $\overline{V}$ as a completion of $V$. Furthermore, this space is essentially unique in the sense that if $\overline{V}$, $\overline{V}'$ are two such completions, then there exists an isomorphism from $\overline{V}$ to $\overline{V}'$ which is the identity on $V$ (if one identifies $V$ with the dense subspaces of $\overline{V}$ and $\overline{V}'$). Because of this fact, inner product spaces are sometimes known as pre-Hilbert spaces, and can always be identified with dense subspaces of actual Hilbert spaces.
Exercise 1.4.9. Let $H, H'$ be two Hilbert spaces. Define the direct sum $H \oplus H'$ of the two spaces to be the vector space $H \times H'$ with inner product $\langle (x, x'), (y, y') \rangle_{H \oplus H'} := \langle x, y \rangle_H + \langle x', y' \rangle_{H'}$. Show that $H \oplus H'$ is also a Hilbert space.
Example 1.4.11. If $H$ is a complex Hilbert space, one can define the complex conjugate $\overline{H}$ of that space to be the set of formal conjugates $\{\overline{x} : x \in H\}$ of vectors in $H$, with complex vector space structure $\overline{x} + \overline{y} := \overline{x + y}$ and $c\overline{x} := \overline{\overline{c}x}$, and inner product $\langle \overline{x}, \overline{y} \rangle_{\overline{H}} := \langle y, x \rangle_H$. One easily checks that $\overline{H}$ is again a complex Hilbert space. Note the map $x \mapsto \overline{x}$ is not a complex linear isometry; instead, it is a complex antilinear isometry.

A key application of the completeness axiom is to be able to define the nearest point from a vector to a closed convex body.
Proposition 1.4.12 (Existence of minimisers). Let $H$ be a Hilbert space, let $K$ be a non-empty closed convex subset of $H$, and let $x$ be a point in $H$. Then there exists a unique $y$ in $K$ that minimises the distance $\|y - x\|$ to $x$. Furthermore, for any other $z$ in $K$, we have $\operatorname{Re}\langle z - y, y - x \rangle \ge 0$.

Recall that a subset $K$ of a real or complex vector space is convex if $(1 - t)v + tw \in K$ whenever $v, w \in K$ and $0 \le t \le 1$.

Proof. Observe from the parallelogram law (1.49) that we have the (geometrically obvious) fact that if $y$ and $y'$ are distinct and equidistant from $x$, then their midpoint $(y + y')/2$ is strictly closer to $x$ than either of $y$ or $y'$. This (and convexity) ensures that the distance minimiser, if it exists, is unique. Also, if $y$ is the distance minimiser and $z$ is in $K$, then $(1 - \theta)y + \theta z$ is at least as distant from $x$ as $y$ is for any $0 < \theta < 1$, by convexity. Squaring this and rearranging, we conclude that
$$2\operatorname{Re}\langle z - y, y - x \rangle + \theta\|z - y\|^2 \ge 0. \tag{1.53}$$
Letting $\theta \to 0$ we obtain the final claim in the proposition.

It remains to show existence. Write $D := \inf_{y \in K} \|x - y\|$. It is clear that $D$ is finite and non-negative. If the infimum is attained, then we would be done. We cannot conclude immediately that this is the case, but we can certainly find a sequence $y_n \in K$ such that $\|x - y_n\| \to D$. On the other hand, the midpoints $\frac{y_n + y_m}{2}$ lie in $K$ by convexity and so $\|x - \frac{y_n + y_m}{2}\| \ge D$. Applying the parallelogram law (1.49) to $x - y_n$ and $x - y_m$, we obtain
$$\|y_n - y_m\|^2 = 2\|x - y_n\|^2 + 2\|x - y_m\|^2 - 4\Big\|x - \frac{y_n + y_m}{2}\Big\|^2 \le 2\|x - y_n\|^2 + 2\|x - y_m\|^2 - 4D^2,$$
and so $\|y_n - y_m\| \to 0$ as $n, m \to \infty$. Thus $y_n$ is a Cauchy sequence; by completeness, it converges to a limit $y$, which lies in $K$ since $K$ is closed. From the triangle inequality we see that $\|x - y_n\| \to \|x - y\|$, and thus $\|x - y\| = D$, and so $y$ is a distance minimiser. ∎
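Proposition 1.4.12 can be seen concretely in a case where the minimiser has a closed form. The sketch below assumes, purely for illustration, that $H = \mathbf{R}^n$ and that $K$ is the closed convex box $[0,1]^n$; for a box the nearest point is obtained by clipping each coordinate to $[0,1]$ (a special fact about boxes, not a general recipe). The code checks both the distance-minimising property and the inequality $\operatorname{Re}\langle z - y, y - x \rangle \ge 0$ against randomly sampled $z \in K$.

```python
import math
import random

def project_to_box(x, lo=0.0, hi=1.0):
    """Nearest point of the closed convex box [lo, hi]^n to x: clip each coordinate."""
    return [min(max(t, lo), hi) for t in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(5)
for _ in range(100):
    x = [random.uniform(-2, 3) for _ in range(4)]
    y = project_to_box(x)
    for _ in range(100):
        z = [random.uniform(0, 1) for _ in range(4)]
        # y minimises the distance to x over K ...
        assert dist(x, z) >= dist(x, y) - 1e-12
        # ... and satisfies the variational inequality <z - y, y - x> >= 0.
        z_minus_y = [a - b for a, b in zip(z, y)]
        y_minus_x = [a - b for a, b in zip(y, x)]
        assert dot(z_minus_y, y_minus_x) >= -1e-12
```

Geometrically, the variational inequality says that $K$ lies entirely on the far side of the hyperplane through $y$ perpendicular to $y - x$.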
Exercise 1.4.10. Show by constructing counterexamples that the existence of the distance minimiser $y$ can fail if either the closure or convexity hypothesis on $K$ is dropped, or if $H$ is merely an inner product space rather than a Hilbert space. (Hint: For the last case, let $H$ be the inner product space $C([0,1]) \subset L^2([0,1])$, and let $K$ be the subspace of continuous functions supported on $[0, 1/2]$.) On the other hand, show that existence (but not uniqueness) can be recovered if $K$ is assumed to be compact rather than convex.
Exercise 1.4.11. Using the Hanner inequalities (Exercise 1.4.6), show that Proposition 1.4.12 also holds for the $L^p$ spaces as long as $1 < p < \infty$. (The specific feature of the $L^p$ spaces that is allowing this is known as uniform convexity.) Give counterexamples to show that the proposition can fail for $L^1$ and for $L^\infty$.

Proposition 1.4.12 has some importance in calculus of variations, but we will not pursue those applications here.
Since every subspace is necessarily convex, we have a corollary:
Exercise 1.4.12 (Orthogonal projections). Let $V$ be a closed subspace of a Hilbert space $H$. Then for every $x \in H$ there exists a unique decomposition $x = x_V + x_{V^\perp}$, where $x_V \in V$ and $x_{V^\perp}$ is orthogonal to every element of $V$. Furthermore, $x_V$ is the closest element of $V$ to $x$.

Let $\pi_V : H \to V$ be the map $\pi_V : x \mapsto x_V$, where $x_V$ is given by the above exercise; we refer to $\pi_V$ as the orthogonal projection from $H$ onto $V$. It is not hard to see that $\pi_V$ is linear, and from the Pythagorean theorem we see that $\pi_V$ is a contraction (thus $\|\pi_V x\| \le \|x\|$ for all $x \in H$). In particular, $\pi_V$ is continuous.
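For a finite-dimensional subspace $V$ with a known orthonormal basis $e_1, \ldots, e_k$, the projection can be written as $\pi_V x = \sum_j \langle x, e_j \rangle e_j$ (the finite case of Exercise 1.4.18(iv)). The Python sketch below assumes $H = \mathbf{C}^4$ with the inner product (1.38) and a hand-picked orthonormal pair spanning $V$; it checks that $x - \pi_V x$ is orthogonal to $V$, the Pythagorean identity behind the contraction property, and that $\pi_V$ fixes $V$. The helper name `project` is hypothetical.

```python
import math
import random

def inner(z, w):
    """Complex inner product (1.38)."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def norm(z):
    return math.sqrt(inner(z, z).real)

def project(x, basis):
    """pi_V(x) = sum_j <x, e_j> e_j for an orthonormal basis (e_j) of V."""
    out = [0j] * len(x)
    for e in basis:
        c = inner(x, e)
        out = [a + c * b for a, b in zip(out, e)]
    return out

s = 1 / math.sqrt(2)
basis = [[1, 0, 0, 0], [0, s, 1j * s, 0]]   # an orthonormal pair spanning V

random.seed(6)
for _ in range(200):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    xV = project(x, basis)
    x_perp = [a - b for a, b in zip(x, xV)]
    # x - pi_V x is orthogonal to every basis vector, hence to all of V.
    assert all(abs(inner(x_perp, e)) < 1e-12 for e in basis)
    # Pythagoras: ||x||^2 = ||pi_V x||^2 + ||x - pi_V x||^2, so pi_V is a contraction.
    assert abs(norm(x) ** 2 - norm(xV) ** 2 - norm(x_perp) ** 2) < 1e-9
    assert norm(xV) <= norm(x) + 1e-12
    # pi_V is the identity on V.
    assert max(abs(a - b) for a, b in zip(project(xV, basis), xV)) < 1e-12
```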
Exercise 1.4.13 (Orthogonal complement). Given a subspace $V$ of a Hilbert space $H$, define the orthogonal complement $V^\perp$ of $V$ to be the set of all vectors in $H$ that are orthogonal to every element of $V$. Establish the following claims:
• $V^\perp$ is a closed subspace of $H$, and $(V^\perp)^\perp$ is the closure of $V$.
• $V^\perp$ is the trivial subspace $\{0\}$ if and only if $V$ is dense.
• If $V$ is closed, then $H$ is isomorphic to the direct sum of $V$ and $V^\perp$.
• If $V$, $W$ are two closed subspaces of $H$, then $(V + W)^\perp = V^\perp \cap W^\perp$ and $(V \cap W)^\perp = \overline{V^\perp + W^\perp}$.

Every vector $v$ in a Hilbert space gives rise to a continuous linear functional $\lambda_v : H \to \mathbf{C}$, defined by the formula $\lambda_v(w) := \langle w, v \rangle$ (the continuity follows from the Cauchy-Schwarz inequality). The Riesz representation theorem for Hilbert spaces gives a converse:
Theorem 1.4.13 (Riesz representation theorem for Hilbert spaces). Let $H$ be a complex Hilbert space, and let $\lambda : H \to \mathbf{C}$ be a continuous linear functional on $H$. Then there exists a unique $v$ in $H$ such that $\lambda = \lambda_v$. A similar claim holds for real Hilbert spaces (replacing $\mathbf{C}$ by $\mathbf{R}$ throughout).

Proof. We just show the claim for complex Hilbert spaces, since the claim for real Hilbert spaces is very similar. First, we show uniqueness: if $\lambda_v = \lambda_{v'}$, then $\lambda_{v - v'} = 0$, and in particular $\langle v - v', v - v' \rangle = 0$, and so $v = v'$.
Now we show existence. We may assume that $\lambda$ is not identically zero, since the claim is obvious otherwise. Observe that the kernel $V := \{x \in H : \lambda(x) = 0\}$ is then a proper subspace of $H$, which is closed since $\lambda$ is continuous. By Exercise 1.4.13, the orthogonal complement $V^\perp$ must contain at least one non-trivial vector $w$, which we can normalise to have unit magnitude. Since $w$ does not lie in $V$, $\lambda(w)$ is non-zero. Now observe that for any $x$ in $H$, $x - \frac{\lambda(x)}{\lambda(w)} w$ lies in the kernel of $\lambda$, i.e., it lies in $V$. Taking inner products with $w$ (and using $\|w\| = 1$), we conclude that
$$\langle x, w \rangle - \frac{\lambda(x)}{\lambda(w)} = 0, \tag{1.54}$$
and thus
$$\lambda(x) = \langle x, \overline{\lambda(w)} w \rangle. \tag{1.55}$$
Thus we have $\lambda = \lambda_{\overline{\lambda(w)} w}$, and the claim follows. ∎
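In $\mathbf{C}^n$ with the inner product (1.38) everything in this proof can be computed explicitly. The sketch below is an illustration: `lam` is an arbitrarily chosen example functional and `riesz_representer` a hypothetical helper. It finds $v$ from the values of $\lambda$ on the standard basis, using $\lambda(x) = \sum_i x_i \lambda(e_i) = \langle x, v \rangle$ with $v_i = \overline{\lambda(e_i)}$, and then checks the formula from the proof: for the unit vector $w = v/\|v\|$, which is orthogonal to the kernel of $\lambda$, one has $\overline{\lambda(w)}\, w = v$.

```python
import math
import random

def inner(z, w):
    """Complex inner product (1.38)."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def lam(x):
    """An example continuous linear functional on C^3."""
    return (2 - 1j) * x[0] + 3j * x[2]

def riesz_representer(functional, n):
    """Return v in C^n with functional(x) = <x, v>, via v_i = conj(functional(e_i))."""
    basis = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    return [complex(functional(e)).conjugate() for e in basis]

v = riesz_representer(lam, 3)

random.seed(7)
for _ in range(100):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]
    assert abs(lam(x) - inner(x, v)) < 1e-12

# The construction in the proof: w is a unit vector orthogonal to ker(lam),
# and conj(lam(w)) w recovers v.
norm_v = math.sqrt(inner(v, v).real)
w = [a / norm_v for a in v]
recovered = [complex(lam(w)).conjugate() * a for a in w]
assert max(abs(a - b) for a, b in zip(recovered, v)) < 1e-12
```

The conjugate in $v_i = \overline{\lambda(e_i)}$ is forced by the convention that $\langle \cdot, \cdot \rangle$ is conjugate-linear in its second argument.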

Remark 1.4.14. This result gives an alternate proof of the $p = 2$ case of Theorem 1.3.16, and by modifying Remark 1.3.19, it can be used to give an alternate proof of the Lebesgue-Radon-Nikodym theorem; this proof is due to von Neumann.
Remark 1.4.15. In the next set of notes, when we define the notion of a dual space, we can reinterpret the Riesz representation theorem as providing a canonical isomorphism $H^* \equiv \overline{H}$.


Exercise 1.4.14. Using Exercise 1.4.11, give an alternate proof of the 1 <
p < ∞ case of Theorem 1.3.16.

One important consequence of the Riesz representation theorem is the existence of adjoints:
Exercise 1.4.15 (Existence of adjoints). Let $T : H \to H'$ be a continuous linear transformation. Show that there exists a unique continuous linear transformation $T^\dagger : H' \to H$ with the property that $\langle Tx, y \rangle = \langle x, T^\dagger y \rangle$ for all $x \in H$ and $y \in H'$. The transformation $T^\dagger$ is called the (Hilbert space) adjoint of $T$; it is of course compatible with the notion of an adjoint matrix from linear algebra.
Exercise 1.4.16. Let $T : H \to H'$ be a continuous linear transformation.
• Show that $(T^\dagger)^\dagger = T$.
• Show that $T$ is an isometry if and only if $T^\dagger T = \mathrm{id}_H$.
• Show that $T$ is an isomorphism if and only if $T^\dagger T = \mathrm{id}_H$ and $T T^\dagger = \mathrm{id}_{H'}$.
• If $S : H' \to H''$ is another continuous linear transformation, show that $(ST)^\dagger = T^\dagger S^\dagger$.
Remark 1.4.16. An isomorphism of complex Hilbert spaces is also known as a unitary transformation. (For real Hilbert spaces, the term orthogonal transformation is used instead.) Note that unitary and orthogonal $n \times n$ matrices generate unitary and orthogonal transformations on $\mathbf{C}^n$ and $\mathbf{R}^n$, respectively.
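For matrices, the adjoint of Exercise 1.4.15 is the conjugate transpose. The sketch below assumes $H = \mathbf{C}^3$, $H' = \mathbf{C}^2$ with the standard inner products and an arbitrarily chosen example matrix $T$; it checks the defining identity $\langle Tx, y \rangle = \langle x, T^\dagger y \rangle$, and checks that the $2 \times 2$ matrix $U = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$ satisfies $U^\dagger U = \mathrm{id}$, so that it acts as a unitary transformation of $\mathbf{C}^2$. Matrices are plain lists of rows; the helper names are hypothetical.

```python
import math
import random

def inner(z, w):
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def matvec(T, x):
    """Apply an m x n matrix (list of rows) to a vector of length n."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in T]

def adjoint(T):
    """Conjugate transpose of an m x n matrix: an n x m matrix."""
    m, n = len(T), len(T[0])
    return [[complex(T[i][j]).conjugate() for i in range(m)] for j in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

T = [[1, 2j, 0], [3 - 1j, 0, 1]]   # a map from C^3 to C^2
Td = adjoint(T)
random.seed(8)
for _ in range(100):
    x, y = rand_vec(3), rand_vec(2)
    assert abs(inner(matvec(T, x), y) - inner(x, matvec(Td, y))) < 1e-9

s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]
UdU = matmul(adjoint(U), U)
assert all(abs(UdU[i][j] - (1 if i == j else 0)) < 1e-12 for i in range(2) for j in range(2))
x = rand_vec(2)
assert abs(inner(matvec(U, x), matvec(U, x)) - inner(x, x)) < 1e-9   # U is an isometry
```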
Exercise 1.4.17. Show that the projection map $\pi_V : H \to V$ from a Hilbert space to a closed subspace is the adjoint of the inclusion map $\iota_V : V \to H$.

1.4.3. Orthonormal bases. In the section on inner product spaces, we studied finite linear combinations of orthonormal systems. Now that we have completeness, we turn to infinite linear combinations.
We begin with countable linear combinations:
Exercise 1.4.18. Suppose that $e_1, e_2, e_3, \ldots$ is a countable orthonormal system in a complex Hilbert space $H$, and $c_1, c_2, \ldots$ is a sequence of complex numbers. (As usual, similar statements will hold here for real Hilbert spaces and real numbers.)
(i) Show that the series $\sum_{n=1}^\infty c_n e_n$ is conditionally convergent in $H$ if and only if $c_n$ is square-summable.
(ii) If $c_n$ is square-summable, show that $\sum_{n=1}^\infty c_n e_n$ is unconditionally convergent in $H$, i.e., every permutation of the $c_n e_n$ sums to the same value.
(iii) Show that the map $(c_n)_{n=1}^\infty \mapsto \sum_{n=1}^\infty c_n e_n$ is an isometry from the Hilbert space $\ell^2(\mathbf{N})$ to $H$. The image $V$ of this isometry is the smallest closed subspace of $H$ that contains $e_1, e_2, \ldots$, and which we shall therefore call the (Hilbert space) span of $e_1, e_2, \ldots$.
(iv) Take adjoints of (iii) and conclude that for any $x \in H$, we have $\pi_V(x) = \sum_{n=1}^\infty \langle x, e_n \rangle e_n$ and $\|\pi_V(x)\| = \left(\sum_{n=1}^\infty |\langle x, e_n \rangle|^2\right)^{1/2}$. Conclude in particular the Bessel inequality $\sum_{n=1}^\infty |\langle x, e_n \rangle|^2 \le \|x\|^2$.
2

Remark 1.4.17. Note the contrast here between conditional and unconditional summability (which needs only square-summability of the coefficients $c_n$) and absolute summability (which requires the stronger condition that the $c_n$ are absolutely summable). In particular there exist non-absolutely summable series that are still unconditionally summable, in contrast to the situation for scalars, in which one has the Riemann rearrangement theorem.

Now we can handle arbitrary orthonormal systems $(e_\alpha)_{\alpha \in A}$. If $(c_\alpha)_{\alpha \in A}$ is square-summable, then at most countably many of the $c_\alpha$ are non-zero (by Exercise 1.3.4). Using parts (i), (ii) of Exercise 1.4.18, we can then form the sum $\sum_{\alpha \in A} c_\alpha e_\alpha$ in an unambiguous manner. It is not hard to use Exercise 1.4.18 to then conclude that this gives an isometric embedding of $\ell^2(A)$ into $H$. The image of this isometry is the smallest closed subspace of $H$ that contains the orthonormal system, which we call the (Hilbert space) span of that system. (It is the closure of the algebraic span of the system.)
Exercise 1.4.19. Let $(e_\alpha)_{\alpha\in A}$ be an orthonormal system in H. Show that the following statements are equivalent:
(i) The Hilbert space span of $(e_\alpha)_{\alpha\in A}$ is all of H.
(ii) The algebraic span of $(e_\alpha)_{\alpha\in A}$ (i.e., the finite linear combinations of the $e_\alpha$) is dense in H.
(iii) One has the Parseval identity $\|x\|^2 = \sum_{\alpha\in A} |\langle x, e_\alpha\rangle|^2$ for all $x \in H$.
(iv) One has the inversion formula $x = \sum_{\alpha\in A} \langle x, e_\alpha\rangle e_\alpha$ for all $x \in H$ (in particular, the coefficients $\langle x, e_\alpha\rangle$ are square-summable).
(v) The only vector that is orthogonal to all the $e_\alpha$ is the zero vector.
(vi) There is an isomorphism from $\ell^2(A)$ to H that maps $\delta_\alpha$ to $e_\alpha$ for all $\alpha \in A$ (where $\delta_\alpha$ is the Kronecker delta at $\alpha$).
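A finite-dimensional illustration of parts (iii) and (iv) of Exercise 1.4.19 (a sketch, not from the text): in $\mathbf{C}^n$ the normalised discrete Fourier vectors $e_k(j) = e^{2\pi i jk/n}/\sqrt{n}$ form an orthonormal basis, so the Parseval identity and the inversion formula should hold up to rounding error. The choice n = 6 and the test vector x are arbitrary; the inner product is linear in the first slot and conjugate-linear in the second.

```python
import cmath
import math


def dft_basis(n):
    """Orthonormal basis e_k(j) = exp(2*pi*i*j*k/n) / sqrt(n) of C^n."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]


def inner(u, v):
    """<u, v> = sum_j u_j * conj(v_j): linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


n = 6
basis = dft_basis(n)
x = [1.0, 2.0 - 1.0j, 0.5, -3.0, 1.0j, 4.0]

coeffs = [inner(x, e) for e in basis]             # <x, e_k>
norm_sq = inner(x, x).real                        # ||x||^2
parseval_sum = sum(abs(c) ** 2 for c in coeffs)   # sum_k |<x, e_k>|^2
# Inversion formula x = sum_k <x, e_k> e_k, computed coordinate by coordinate.
reconstruction = [sum(c * e[j] for c, e in zip(coeffs, basis)) for j in range(n)]

print(norm_sq, parseval_sum)
print(max(abs(r - xj) for r, xj in zip(reconstruction, x)))
```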

A system $(e_\alpha)_{\alpha\in A}$ obeying any (and hence all) of the properties in Exercise 1.4.19 is known as an orthonormal basis of the Hilbert space H. All Hilbert spaces have such a basis:
Proposition 1.4.18. Every Hilbert space has at least one orthonormal basis.


Proof. We use the standard Zorn’s lemma argument (see Section 2.4). Every Hilbert space has at least one orthonormal system, namely the empty system. We order the orthonormal systems by inclusion, and observe that the union of any totally ordered set of orthonormal systems is again an orthonormal system. By Zorn’s lemma, there must exist a maximal orthonormal system $(e_\alpha)_{\alpha\in A}$. There cannot be any unit vector orthogonal to all the elements of this system, since otherwise one could add that vector to the system and contradict maximality. Normalising, we see that the only vector orthogonal to all the $e_\alpha$ is the zero vector, so by Exercise 1.4.19(v) we obtain an orthonormal basis as claimed. □
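In finite dimensions the Zorn's lemma argument collapses to a finite loop, which makes the key step visible: as long as some vector has a non-zero component orthogonal to the current system, normalising that component gives a unit vector that can be adjoined. The sketch below is only an illustration under stated assumptions: the function name `extend_to_basis` is hypothetical, scalars are real, and a tolerance `tol` stands in for exact arithmetic. It tests the standard basis vectors one at a time.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extend_to_basis(system, dim, tol=1e-12):
    """Enlarge an orthonormal system in R^dim until no further unit vector is orthogonal to it."""
    system = [list(e) for e in system]
    for i in range(dim):
        std = [1.0 if j == i else 0.0 for j in range(dim)]
        # Remove from the i-th standard basis vector its components along the current system.
        w = std[:]
        for e in system:
            c = dot(std, e)
            w = [wj - c * ej for wj, ej in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            # w / |w| is a unit vector orthogonal to the system: adjoin it.
            system.append([wj / norm for wj in w])
    # Every standard basis vector now lies in the span, so the system is maximal.
    return system


start = [[1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]]
basis = extend_to_basis(start, 3)
print(len(basis))
```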
Exercise 1.4.20. Show that every vector space V has at least one algebraic
basis, i.e., a set of basis vectors such that every vector in V can be expressed
uniquely as a finite linear combination of basis vectors. (Such bases are also
known as Hamel bases.)
Corollary 1.4.19. Every Hilbert space is isomorphic to $\ell^2(A)$ for some set A.
Exercise 1.4.21. Let A, B be sets. Show that $\ell^2(A)$ and $\ell^2(B)$ are isomorphic iff A and B have the same cardinality. (Hint: The case when A or B is finite is easy, so suppose A and B are both infinite. If $\ell^2(A)$ and $\ell^2(B)$ are isomorphic, show that B can be covered by a family of at most countable sets indexed by A, and vice versa. Then apply the Schröder-Bernstein theorem (Section 1.13 of Volume II).)

We can now classify Hilbert spaces up to isomorphism by a single cardinal, the dimension of that space:
Exercise 1.4.22. Show that all orthonormal bases of a given Hilbert space
H have the same cardinality. This cardinality is called the (Hilbert space)
dimension of the Hilbert space.
Exercise 1.4.23. Show that a Hilbert space is separable (i.e., has a countable dense subset) if and only if its dimension is at most countable. Conclude in particular that up to isomorphism, there is exactly one separable infinite-dimensional Hilbert space.
Exercise 1.4.24. Let $H, H'$ be complex Hilbert spaces. Show that there exists another Hilbert space $H \otimes H'$, together with a map $\otimes: H \times H' \to H \otimes H'$ with the following properties:
(i) The map $\otimes$ is bilinear, thus $(cx + dy) \otimes x' = c(x \otimes x') + d(y \otimes x')$ and $x \otimes (cx' + dy') = c(x \otimes x') + d(x \otimes y')$ for all $x, y \in H$, $x', y' \in H'$, $c, d \in \mathbf{C}$.
(ii) We have $\langle x \otimes x', y \otimes y'\rangle_{H \otimes H'} = \langle x, y\rangle_H \langle x', y'\rangle_{H'}$ for all $x, y \in H$, $x', y' \in H'$.


(iii) The (algebraic) span of $\{x \otimes x' : x \in H, x' \in H'\}$ is dense in $H \otimes H'$.

Furthermore, show that $H \otimes H'$ and $\otimes$ are unique up to isomorphism in the sense that if $H \tilde\otimes H'$ and $\tilde\otimes: H \times H' \to H \tilde\otimes H'$ are another pair of objects obeying the above properties, then there exists an isomorphism $\Phi: H \otimes H' \to H \tilde\otimes H'$ such that $x \tilde\otimes x' = \Phi(x \otimes x')$ for all $x \in H$, $x' \in H'$. (Hint: To prove existence, create orthonormal bases for H and $H'$ and take formal tensor products of these bases.) The space $H \otimes H'$ is called the (Hilbert space) tensor product of H and $H'$, and $x \otimes x'$ is the tensor product of x and $x'$.
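For $H = \mathbf{C}^n$ and $H' = \mathbf{C}^m$ the construction in the hint can be made completely concrete: taking the standard bases, one may model $H \otimes H'$ as $\mathbf{C}^{nm}$, with $x \otimes x'$ the Kronecker product of coordinate vectors. The sketch below (an illustration under that identification, with arbitrarily chosen vectors and scalars) checks properties (i) and (ii) numerically.

```python
def kron(x, xp):
    """Coordinates of x ⊗ x' in C^(n*m), using the basis e_i ⊗ f_j ordered by (i, j)."""
    return [a * b for a in x for b in xp]


def inner(u, v):
    """<u, v> = sum_k u_k * conj(v_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


x, y = [1 + 1j, 2, -1j], [0.5, 1j, 3]
xp, yp = [2, -1j], [1j, 1 + 2j]

# Property (ii): <x ⊗ x', y ⊗ y'> = <x, y> <x', y'>.
lhs = inner(kron(x, xp), kron(y, yp))
rhs = inner(x, y) * inner(xp, yp)

# Property (i), linearity in the first slot: (cx + dy) ⊗ x' = c(x ⊗ x') + d(y ⊗ x').
c, d = 2 - 1j, 0.5j
combo = [c * a + d * b for a, b in zip(x, y)]
lin_err = max(abs(u - (c * s + d * t))
              for u, s, t in zip(kron(combo, xp), kron(x, xp), kron(y, xp)))
print(lhs, rhs, lin_err)
```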

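Exercise 1.4.25. Let $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ be measure spaces. Show that $L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)$ is the tensor product of $L^2(X, \mathcal{X}, \mu)$ and $L^2(Y, \mathcal{Y}, \nu)$, if one defines the tensor product $f \otimes g$ of $f \in L^2(X, \mathcal{X}, \mu)$ and $g \in L^2(Y, \mathcal{Y}, \nu)$ as $f \otimes g(x, y) := f(x) g(y)$.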

We do not yet have enough theory in other areas to give the really useful applications of Hilbert space theory, but let us just illustrate a simple one, namely the development of Fourier series on the unit circle $\mathbf{R}/\mathbf{Z}$. We can give this space the usual Lebesgue measure (identifying the unit circle with $[0, 1)$, if one wishes), giving rise to the complex Hilbert space $L^2(\mathbf{R}/\mathbf{Z})$. On this space we can form the characters $e_n(x) := e^{2\pi i n x}$ for all integers n; one easily verifies that $(e_n)_{n\in\mathbf{Z}}$ is an orthonormal system. We claim that it is in fact an orthonormal basis. By Exercise 1.4.19, it suffices to show that the algebraic span of the $e_n$, i.e., the space of trigonometric polynomials, is dense in $L^2(\mathbf{R}/\mathbf{Z})$. But⁸ from an explicit computation (e.g., using Fejér kernels) one can show that the indicator function of any interval can be approximated to arbitrary accuracy in the $L^2$ norm by trigonometric polynomials, and is thus in the closure of the trigonometric polynomials. By linearity, the same is then true of an indicator function of a finite union of intervals; since Lebesgue measurable sets in $\mathbf{R}/\mathbf{Z}$ can be approximated to arbitrary accuracy by finite unions of intervals (in the sense that the symmetric difference has arbitrarily small measure), the same is true for indicators of measurable sets. By linearity, the same is true for simple functions, and by density (Proposition 1.3.8) the same is true for arbitrary $L^2$ functions, and the claim follows.
The Fourier transform $\hat f: \mathbf{Z} \to \mathbf{C}$ of a function $f \in L^2(\mathbf{R}/\mathbf{Z})$ is defined as
(1.56) $\hat f(n) := \langle f, e_n\rangle = \int_0^1 f(x) e^{-2\pi i n x}\, dx$.
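To see (1.56) in action, here is a small numerical sketch (not from the text) for the sawtooth $f(x) = x$ on $[0, 1)$, whose coefficients follow from an integration by parts: $\hat f(0) = 1/2$ and $\hat f(n) = i/(2\pi n)$ for $n \ne 0$. The integral is approximated by a midpoint rule; the number of sample points is an arbitrary choice.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=20000):
    """Midpoint-rule approximation of hat f(n) = ∫_0^1 f(x) e^{-2πinx} dx."""
    h = 1.0 / samples
    total = 0j
    for k in range(samples):
        x = (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x)
    return total * h


def sawtooth(x):
    return x


def exact_coefficient(n):
    """Closed form for the sawtooth: 1/2 if n == 0, else i / (2πn)."""
    return 0.5 if n == 0 else 1j / (2 * math.pi * n)


for n in range(-3, 4):
    print(n, fourier_coefficient(sawtooth, n), exact_coefficient(n))
```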

8 One can also use the Stone-Weierstrass theorem here; see Theorem 1.10.18.


From Exercise 1.4.19, we obtain the Parseval identity
$$\sum_{n\in\mathbf{Z}} |\hat f(n)|^2 = \int_{\mathbf{R}/\mathbf{Z}} |f(x)|^2\, dx$$
(in particular, $\hat f \in \ell^2(\mathbf{Z})$) and the inversion formula
$$f = \sum_{n\in\mathbf{Z}} \hat f(n) e_n,$$
where the right-hand side is unconditionally convergent. Indeed, the Fourier transform $f \mapsto \hat f$ is a unitary transformation between $L^2(\mathbf{R}/\mathbf{Z})$ and $\ell^2(\mathbf{Z})$. (These facts are collectively referred to as Plancherel’s theorem for the unit circle.) We will develop Fourier analysis on other spaces than the unit circle in Section 1.12.
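Continuing with the sawtooth $f(x) = x$ on $[0, 1)$, for which $|\hat f(0)|^2 = 1/4$ and $|\hat f(n)|^2 = 1/(4\pi^2 n^2)$ for $n \ne 0$, the Parseval identity predicts that the sums of $|\hat f(n)|^2$ over $|n| \le N$ increase to $\int_0^1 x^2\,dx = 1/3$. The sketch below (an illustration, not from the text) checks this numerically; rearranged, the identity for this f is equivalent to $\sum_{n \ge 1} 1/n^2 = \pi^2/6$.

```python
import math


def parseval_partial_sum(N):
    """Sum of |hat f(n)|^2 over |n| <= N for the sawtooth f(x) = x on [0, 1)."""
    positive_terms = sum(1.0 / (4 * math.pi ** 2 * n ** 2) for n in range(1, N + 1))
    return 0.25 + 2 * positive_terms  # the terms for n and -n are equal


l2_norm_squared = 1.0 / 3.0  # ∫_0^1 x^2 dx
for N in (10, 100, 1000):
    print(N, parseval_partial_sum(N), l2_norm_squared - parseval_partial_sum(N))
```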
Remark 1.4.20. Of course, much of the theory here generalises the corre-
sponding theory in finite-dimensional linear algebra; we will continue this
theme much later in the course when we turn to the spectral theorem. How-
ever, not every aspect of finite-dimensional linear algebra will carry over so
easily. For instance, it turns out to be quite difficult to take the determinant
or trace of a linear transformation from a Hilbert space to itself in general
(unless the transformation is particularly well behaved, e.g., of trace class).
The Jordan normal form also does not translate to the infinite-dimensional
setting, leading to the notorious invariant subspace problem in the subject.
It is also worth cautioning that while the theory of orthonormal bases in
finite-dimensional Euclidean spaces generalises very nicely to the Hilbert
space setting, the more general theory of bases in finite dimensions becomes
much more subtle in infinite-dimensional Hilbert spaces, unless the basis is
“almost orthonormal” in some sense (e.g., if it forms a frame).

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/17.
Thanks to Américo Tavares, S, and Xiaochuan Liu for corrections.
Uhlrich Groh and Dmitriy raised the interesting open problem of whether any closed subset K of H for which distance minimisers to every point x exist and are unique is necessarily convex, thus providing a converse to Proposition 1.4.12. (Sets with this property are known as Chebyshev sets.)

Section 1.5

Duality and the Hahn-Banach theorem

When studying a mathematical space X (e.g., a vector space, a topological space, a manifold, a group, an algebraic variety, etc.), there are two fundamentally basic ways to try to understand the space:

(i) By looking at subobjects in X, or more generally maps $f: Y \to X$ from some other space Y into X. For instance, a point in a space X can be viewed as a map from the abstract point pt to X; a curve in a space X could be thought of as a map from $[0, 1]$ to X; a group G can be studied via its subgroups K, and so forth.
(ii) By looking at objects on X, or more precisely maps f : X → Y from
X into some other space Y . For instance, one can study a topological
space X via the real- or complex-valued continuous functions f ∈
C(X) on X; one can study a group G via its quotient groups π :
G → G/H; one can study an algebraic variety V by studying the
polynomials on V (and in particular, the ideal of polynomials that
vanish identically on V ); and so forth.

(There are also more sophisticated ways to study an object via its maps,
e.g., by studying extensions, joinings, splittings, universal lifts, etc. The
general study of objects via the maps between them is formalised abstractly
in modern mathematics as category theory, and is also closely related to
homological algebra.)
A remarkable phenomenon in many areas of mathematics is that of (contravariant) duality: that the maps into and out of one type of mathematical object X can be naturally associated to the maps out of and into a dual object $X^*$ (note the reversal of arrows here!). In some cases, the dual object
$X^*$ looks quite different from the original object X. (For instance, in Stone duality, discussed in Section 2.3, X would be a Boolean algebra (or some other partially ordered set) and $X^*$ would be a compact totally disconnected Hausdorff space (or some other topological space).) In other cases, most notably with Hilbert spaces as discussed in Section 1.4, the dual object $X^*$ is essentially identical to X itself.
In these notes we discuss a third important case of duality, namely duality of normed vector spaces, which is of an intermediate nature to the previous two examples: the dual $X^*$ of a normed vector space turns out to be another normed vector space, but generally one which is not equivalent to X itself (except in the important special case when X is a Hilbert space, as mentioned above). On the other hand, the double dual $(X^*)^*$ turns out to be closely related to X, and in several (but not all) important cases, is essentially identical to X. One of the most important uses of dual spaces in functional analysis is that they allow one to define the transpose $T^*: Y^* \to X^*$ of a continuous linear operator $T: X \to Y$.
A fundamental tool in understanding duality of normed vector spaces will be the Hahn-Banach theorem, which is indispensable for exploring the dual of a vector space. (Indeed, without this theorem, it is not clear at all that the dual of a non-trivial normed vector space is non-trivial!) Thus, we shall study this theorem in detail in this section concurrently with our discussion of duality.

1.5.1. Duality. In the category of normed vector spaces, the natural notion of a map (or morphism) between two such spaces is that of a continuous linear transformation $T: X \to Y$ between two normed vector spaces X, Y. By Lemma 1.3.17, any such linear transformation is bounded, in the sense that there exists a constant C such that $\|Tx\|_Y \le C\|x\|_X$ for all $x \in X$. The least such constant C is known as the operator norm of T, and is denoted $\|T\|_{op}$ or simply $\|T\|$.
Two normed vector spaces X, Y are equivalent if there is an invertible continuous linear transformation $T: X \to Y$ from X to Y, thus T is bijective and there exist constants $C, c > 0$ such that $c\|x\|_X \le \|Tx\|_Y \le C\|x\|_X$ for all $x \in X$. If one can take $C = c = 1$, then T is an isometry, and X and Y are called isomorphic. When one has two norms $\|\cdot\|_1, \|\cdot\|_2$ on the same vector space X, we say that the norms are equivalent if the identity from $(X, \|\cdot\|_1)$ to $(X, \|\cdot\|_2)$ is an invertible continuous transformation, i.e., that there exist constants $C, c > 0$ such that $c\|x\|_1 \le \|x\|_2 \le C\|x\|_1$ for all $x \in X$.


Exercise 1.5.1. Show that all linear transformations from a finite-dimensional space to a normed vector space are continuous. Conclude that all norms on a finite-dimensional space are equivalent.
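For instance (a small check, not from the text), on $\mathbf{R}^n$ the $\ell^1$ and $\ell^\infty$ norms satisfy $\|x\|_{\ell^\infty} \le \|x\|_{\ell^1} \le n\|x\|_{\ell^\infty}$, so one may take $c = 1$ and $C = n$ in the definition of equivalent norms. Both constants are attained, at $e_1$ and at $(1, \ldots, 1)$ respectively; since C grows with n, this also hints at why the two norms fail to be equivalent on infinite-dimensional spaces such as $c_c(\mathbf{N})$. The sample vectors are arbitrary.

```python
def norm_1(x):
    return sum(abs(t) for t in x)


def norm_inf(x):
    return max(abs(t) for t in x)


n = 5
samples = [[1, 0, 0, 0, 0], [1, 1, 1, 1, 1], [3, -2, 0.5, 0, 7], [-1, -1, 2, 0, 0]]
ratios = []
for x in samples:
    # c ||x||_inf <= ||x||_1 <= C ||x||_inf with c = 1 and C = n.
    assert norm_inf(x) <= norm_1(x) <= n * norm_inf(x)
    ratios.append(norm_1(x) / norm_inf(x))
print(min(ratios), max(ratios))  # extreme cases: e_1 and (1, ..., 1)
```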

Let $B(X \to Y)$ denote the space of all continuous linear transformations from X to Y. (This space is also denoted by many other names, e.g., $L(X, Y)$, $\mathrm{Hom}(X \to Y)$, etc.) This has the structure of a vector space: the sum $S + T: x \mapsto Sx + Tx$ of two continuous linear transformations is another continuous linear transformation, as is the scalar multiple $cT: x \mapsto cTx$ of a linear transformation.
Exercise 1.5.2. Show that B(X → Y ) with the operator norm is a normed
vector space. If Y is complete (i.e., is a Banach space), show that B(X → Y )
is also complete (i.e., is also a Banach space).
Exercise 1.5.3. Let X, Y, Z be Banach spaces. Show that if $T \in B(X \to Y)$ and $S \in B(Y \to Z)$, then the composition $ST: X \to Z$ lies in $B(X \to Z)$ and $\|ST\|_{op} \le \|S\|_{op}\|T\|_{op}$. (As a consequence of this inequality, we see that $B(X \to X)$ is a Banach algebra.)
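In finite dimensions the operator norm is often explicit. For example, it is a standard fact that a real matrix acting between spaces equipped with the $\ell^\infty$ norm has operator norm equal to its largest absolute row sum, the supremum being attained at a vector of signs. The sketch below (matrices chosen arbitrarily, helper names illustrative) uses this to check the inequality $\|ST\|_{op} \le \|S\|_{op}\|T\|_{op}$ of Exercise 1.5.3 in one example.

```python
def op_norm_inf(A):
    """Operator norm of A for the l^infinity norm on both sides: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def matmul(S, T):
    return [[sum(S[i][k] * T[k][j] for k in range(len(T))) for j in range(len(T[0]))]
            for i in range(len(S))]


def apply(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


S = [[1, -2], [0.5, 3]]
T = [[2, 1], [-1, 4]]
ST = matmul(S, T)

# The supremum defining ||ST||_op is attained at the sign vector of the heaviest row.
i_max = max(range(len(ST)), key=lambda i: sum(abs(a) for a in ST[i]))
signs = [1 if a >= 0 else -1 for a in ST[i_max]]
attained = max(abs(t) for t in apply(ST, signs))

print(op_norm_inf(ST), attained, op_norm_inf(S) * op_norm_inf(T))
```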

Now we can define the notion of a dual space.


Definition 1.5.1 (Dual space). Let X be a normed vector space. The (continuous) dual space $X^*$ of X is defined to be $X^* := B(X \to \mathbf{R})$ if X is a real vector space, and $X^* := B(X \to \mathbf{C})$ if X is a complex vector space. Elements of $X^*$ are known as continuous linear functionals (or bounded linear functionals) on X.
Remark 1.5.2. If one drops the requirement that the linear functionals be
continuous, we obtain the algebraic dual space of linear functionals on X.
This space does not play a significant role in functional analysis, though.

From Exercise 1.5.2, we see that the dual of any normed vector space
is a Banach space, and so duality is arguably a Banach space notion rather
than a normed vector space notion. The following exercise reinforces this:
Exercise 1.5.4. We say that a normed vector space X has a completion $\overline{X}$ if $\overline{X}$ is a Banach space and X can be identified with a dense subspace of $\overline{X}$ (cf. Exercise 1.4.8).
(i) Show that every normed vector space X has at least one completion $\overline{X}$, and that any two completions $\overline{X}, \overline{X}'$ are isomorphic in the sense that there exists an isomorphism from $\overline{X}$ to $\overline{X}'$ which is the identity on X.
(ii) Show that the dual spaces $X^*$ and $(\overline{X})^*$ are isomorphic to each other.


The next few exercises are designed to give some intuition as to how
dual spaces work.

Exercise 1.5.5. Let $\mathbf{R}^n$ be given the Euclidean metric. Show that $(\mathbf{R}^n)^*$ is isomorphic to $\mathbf{R}^n$. Establish the corresponding result for the complex spaces $\mathbf{C}^n$.

Exercise 1.5.6. Let $c_c(\mathbf{N})$ be the vector space of sequences $(a_n)_{n\in\mathbf{N}}$ of real or complex numbers which are compactly supported (i.e., at most finitely many of the $a_n$ are non-zero). We give $c_c(\mathbf{N})$ the uniform norm $\|\cdot\|_{\ell^\infty}$.
(i) Show that the dual space $c_c(\mathbf{N})^*$ is isomorphic to $\ell^1(\mathbf{N})$.
(ii) Show that the completion of $c_c(\mathbf{N})$ is isomorphic to $c_0(\mathbf{N})$, the space of sequences on $\mathbf{N}$ that go to zero at infinity (again with the uniform norm); thus, by Exercise 1.5.4, the dual space of $c_0(\mathbf{N})$ is isomorphic to $\ell^1(\mathbf{N})$ also.
(iii) On the other hand, show that the dual of $\ell^1(\mathbf{N})$ is isomorphic to $\ell^\infty(\mathbf{N})$, a space which is strictly larger than $c_c(\mathbf{N})$ or $c_0(\mathbf{N})$. Thus we see that the double dual of a Banach space can be strictly larger than the space itself.
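For part (i), the natural identification sends $b \in \ell^1(\mathbf{N})$ to the functional $\lambda_b(a) := \sum_n a_n b_n$. The bound $|\lambda_b(a)| \le \|a\|_{\ell^\infty}\|b\|_{\ell^1}$ is immediate, and the sketch below (real scalars, a finitely supported b chosen for illustration, helper names hypothetical) shows the reverse inequality being attained by the sign vector of b, which lies in $c_c(\mathbf{N})$ with uniform norm 1. For complex scalars one would use $\overline{b_n}/|b_n|$ in place of the sign.

```python
def pairing(b, a):
    """lambda_b(a) = sum_n a_n b_n for finitely supported real sequences stored as lists."""
    return sum(s * t for s, t in zip(a, b))


def sign(t):
    return (t > 0) - (t < 0)


b = [0.5, -0.25, 0.125, -0.0625]       # a finitely supported element of l^1
norm_b = sum(abs(t) for t in b)        # ||b||_{l^1}

norming = [sign(t) for t in b]         # uniform norm 1, entries in {-1, 0, 1}
print(pairing(b, norming), norm_b)

# Any other a with sup |a_n| <= 1 gives |lambda_b(a)| <= ||b||_{l^1}.
others = [[1, 1, 1, 1], [0.3, -1, 0, 0.9], [-1, 0.5, -0.5, 1]]
assert all(abs(pairing(b, a)) <= norm_b for a in others)
```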

Exercise 1.5.7. Let H be a real or complex Hilbert space. Using the Riesz representation theorem for Hilbert spaces (Theorem 1.4.13), show that the dual space $H^*$ is isomorphic (as a normed vector space) to the conjugate space $\overline{H}$ (see Example 1.4.11), with an element $g \in \overline{H}$ being identified with the linear functional $f \mapsto \langle f, g\rangle$. Thus we see that Hilbert spaces are essentially self-dual (if we ignore the pesky conjugation sign).

Exercise 1.5.8. Let $(X, \mathcal{X}, \mu)$ be a σ-finite measure space, and let $1 \le p < \infty$. Using Theorem 1.3.16, show that the dual space of $L^p(X, \mathcal{X}, \mu)$ is isomorphic to $L^{p'}(X, \mathcal{X}, \mu)$, with an element $g \in L^{p'}(X, \mathcal{X}, \mu)$ being identified with the linear functional $f \mapsto \int_X fg\, d\mu$. (The one tricky thing to verify is that the identification is an isometry, but this can be seen by a closer inspection of the proof of Theorem 1.3.16.) For an additional challenge: remove the σ-finite hypothesis when $p > 1$.

One of the key purposes of introducing the notion of a dual space is that
it allows one to define the notion of a transpose.

Definition 1.5.3 (Transpose). Let $T: X \to Y$ be a continuous linear transformation from one normed vector space X to another Y. The transpose $T^*: Y^* \to X^*$ of T is defined to be the map that sends any continuous linear functional $\lambda \in Y^*$ to the linear functional $T^*\lambda := \lambda \circ T \in X^*$, thus $(T^*\lambda)(x) = \lambda(Tx)$ for all $x \in X$.


Exercise 1.5.9. Show that the transpose $T^*$ of a continuous linear transformation T between normed vector spaces is again a continuous linear transformation with $\|T^*\|_{op} \le \|T\|_{op}$, thus the transpose operation is itself a linear map from $B(X \to Y)$ to $B(Y^* \to X^*)$. (We will improve this result in Theorem 1.5.13 below.)
Exercise 1.5.10. An $n \times m$ matrix A with complex entries can be identified with a linear transformation $L_A: \mathbf{C}^n \to \mathbf{C}^m$. Identifying the dual space of $\mathbf{C}^n$ with itself as in Exercise 1.5.5, show that the transpose $L_A^*: \mathbf{C}^m \to \mathbf{C}^n$ is equal to $L_{A^t}$, where $A^t$ is the transpose matrix of A.
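A quick numerical check of this exercise (a sketch, not from the text). It uses the column-vector convention $x \mapsto Ax$, so that a matrix with two rows and three columns maps $\mathbf{C}^3$ to $\mathbf{C}^2$, and it assumes the bilinear pairing $\lambda_y(x) = \sum_j x_j y_j$ (no complex conjugation) when identifying $\mathbf{C}^k$ with its dual. With the sesquilinear inner product instead, one would obtain the conjugate transpose, in line with Remark 1.5.5.

```python
def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def pair(y, x):
    """Bilinear pairing lambda_y(x) = sum_j x_j y_j, identifying C^k with its dual."""
    return sum(s * t for s, t in zip(x, y))


A = [[1, 2j, 0], [3, -1, 1 + 1j]]      # maps C^3 -> C^2 via x -> A x
x = [1, 1j, -2]                         # a vector in C^3
y = [2 - 1j, 4]                         # represents lambda_y in (C^2)^*

lhs = pair(y, matvec(A, x))             # (L_A^* lambda_y)(x) = lambda_y(A x)
rhs = pair(matvec(transpose(A), y), x)  # lambda_{A^t y}(x)
print(lhs, rhs)
```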
Exercise 1.5.11. Show that the transpose of a surjective continuous linear
transformation between normed vector spaces is injective. Show also that
the condition of surjectivity can be relaxed to that of having a dense image.
Remark 1.5.4. Observe that if $T: X \to Y$ and $S: Y \to Z$ are continuous linear transformations between normed vector spaces, then $(ST)^* = T^*S^*$. In the language of category theory, this means that duality $X \mapsto X^*$ of normed vector spaces, and transpose $T \mapsto T^*$ of continuous linear transformations, form a contravariant functor from the category of normed vector spaces (or Banach spaces) to itself.
Remark 1.5.5. The transpose $T^*: H'^* \to H^*$ of a continuous linear transformation $T: H \to H'$ between complex Hilbert spaces is closely related to the adjoint $T^\dagger: H' \to H$ of that transformation, as defined in Exercise 1.4.15, by using the obvious (antilinear) identifications between $H^*$ and H, and between $H'^*$ and $H'$. This is analogous to the linear algebra fact that the adjoint matrix is the complex conjugate of the transpose matrix. One should note that in the literature, the transpose operator $T^*$ is also (somewhat confusingly) referred to as the adjoint of T. Of course, for real vector spaces, there is no distinction between transpose and adjoint.
1.5.2. The Hahn-Banach theorem. Thus far, we have defined the dual space $X^*$, but apart from some concrete special cases (Hilbert spaces, $L^p$ spaces, etc.), we have not been able to say much about what $X^*$ consists of. It is not even clear yet that $X^*$ is non-trivial whenever X is non-trivial (i.e., not just $\{0\}$); for all one knows, there could be no non-trivial continuous linear functionals on X at all! The Hahn-Banach theorem is used to resolve this, by providing a powerful means to construct continuous linear functionals as needed.
Theorem 1.5.6 (Hahn-Banach theorem). Let X be a normed vector space, and let Y be a subspace of X. Then any continuous linear functional $\lambda \in Y^*$ on Y can be extended to a continuous linear functional $\tilde\lambda \in X^*$ on X with the same operator norm; thus $\tilde\lambda$ agrees with $\lambda$ on Y and $\|\tilde\lambda\|_{X^*} = \|\lambda\|_{Y^*}$. (Note: the extension $\tilde\lambda$ is, in general, not unique.)


We prove this important theorem in stages. We first handle the codimension one real case:
Proposition 1.5.7. The Hahn-Banach theorem is true when X, Y are real
vector spaces, and X is spanned by Y and an additional vector v.

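Proof. We can assume that v lies outside Y, since the claim is trivial otherwise. We can also normalise $\|\lambda\|_{Y^*} = 1$ (the claim is of course trivial if $\|\lambda\|_{Y^*}$ vanishes). To specify the extension $\tilde\lambda$ of $\lambda$, it suffices by linearity to specify the value of $\tilde\lambda(v)$. In order for the extension $\tilde\lambda$ to continue to have operator norm 1, we require that
$$|\tilde\lambda(y + tv)| \le \|y + tv\|_X$$
for all $t \in \mathbf{R}$ and $y \in Y$. This is automatic for $t = 0$, so by homogeneity (replacing y by $y/t$) it suffices to verify this bound for $t = 1$, i.e., to have $-\|y + v\|_X \le \lambda(y) + \tilde\lambda(v) \le \|y + v\|_X$ for all $y \in Y$. Writing $y = -y'$ in the lower bound, we rearrange this as
$$\sup_{y' \in Y} \lambda(y') - \|y' - v\|_X \le \tilde\lambda(v) \le \inf_{y \in Y} \|y + v\|_X - \lambda(y).$$
But as $\lambda$ has operator norm 1, an application of the triangle inequality shows that the infimum on the right-hand side is at least as large as the supremum on the left-hand side: indeed, for any $y, y' \in Y$ we have $\lambda(y) + \lambda(y') = \lambda(y + y') \le \|y + y'\|_X \le \|y + v\|_X + \|y' - v\|_X$. So one can choose $\tilde\lambda(v)$ obeying the required properties. □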
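The interval of admissible values of $\tilde\lambda(v)$ in this proof can be computed in a toy case, which also illustrates the note after Theorem 1.5.6 that the extension need not be unique. In the sketch below (an illustration, not from the text) we take $X = \mathbf{R}^2$, $Y = \mathrm{span}\{(1,0)\}$, $\lambda(t, 0) = t$ and $v = (0, 1)$, and approximate the supremum and infimum by sampling $y = (t, 0)$ on a grid. A grid can only under-estimate the supremum and over-estimate the infimum, but here both are attained at grid points. With the $\ell^1$ norm the admissible interval is $[-1, 1]$; with the $\ell^\infty$ norm it collapses to $\{0\}$.

```python
def norm_l1(p):
    return abs(p[0]) + abs(p[1])


def norm_linf(p):
    return max(abs(p[0]), abs(p[1]))


def admissible_interval(norm, ts):
    """Grid approximation of [sup_{y'} lam(y') - ||y' - v||, inf_y ||y + v|| - lam(y)]
    for X = R^2, Y = span{(1, 0)}, lam(t, 0) = t and v = (0, 1)."""
    lower = max(t - norm((t, -1.0)) for t in ts)  # y' = (t, 0), so y' - v = (t, -1)
    upper = min(norm((t, 1.0)) - t for t in ts)   # y = (t, 0), so y + v = (t, 1)
    return lower, upper


ts = [k / 100 for k in range(-5000, 5001)]
print(admissible_interval(norm_l1, ts))    # approximately (-1, 1)
print(admissible_interval(norm_linf, ts))  # approximately (0, 0)
```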
Corollary 1.5.8. The Hahn-Banach theorem is true when X, Y are real
normed vector spaces.

Proof. This is a standard “Zorn’s lemma argument” (see Section 2.4). Fix Y, X, $\lambda$. Define a partial extension of $\lambda$ to be a pair $(Y', \lambda')$, where $Y'$ is an intermediate subspace between Y and X, and $\lambda'$ is an extension of $\lambda$ with the same operator norm as $\lambda$. The set of all partial extensions is partially ordered by declaring $(Y'', \lambda'') \ge (Y', \lambda')$ if $Y''$ contains $Y'$ and $\lambda''$ extends $\lambda'$. It is easy to see that every chain of partial extensions has an upper bound (take the union of the subspaces in the chain, on which the functionals of the chain fit together into a single extension of the same norm); hence, by Zorn’s lemma, there must be a maximal partial extension $(Y_*, \lambda_*)$. If $Y_* = X$, we are done; otherwise, one can find $v \in X \backslash Y_*$. By Proposition 1.5.7, we can then extend $\lambda_*$ further to the larger space spanned by $Y_*$ and v, a contradiction; and the claim follows. □
Remark 1.5.9. Of course, this proof of the Hahn-Banach theorem relied
on the axiom of choice (via Zorn’s lemma) and is thus non-constructive.
It turns out that this is, to some extent, necessary: it is not possible to
prove the Hahn-Banach theorem if one deletes the axiom of choice from the
axioms of set theory (although it is possible to deduce the theorem from
slightly weaker versions of this axiom, such as the ultrafilter lemma).

Finally, we establish the complex case by leveraging the real case.


Proof of Hahn-Banach theorem (complex case). Let $\lambda: Y \to \mathbf{C}$ be a continuous complex-linear functional, which we can normalise to have operator norm 1. Then the real part $\rho := \mathrm{Re}(\lambda): Y \to \mathbf{R}$ is a continuous real-linear functional on Y (now viewed as a real normed vector space rather than a complex one), which has operator norm at most 1 (in fact, it is equal to 1, though we will not need this). Applying Corollary 1.5.8, we can extend this real-linear functional $\rho$ to a continuous real-linear functional $\tilde\rho: X \to \mathbf{R}$ on X (again viewed now just as a real normed vector space) of norm at most 1.
To finish the job, we have to somehow complexify $\tilde\rho$ to a complex-linear functional $\tilde\lambda: X \to \mathbf{C}$ of norm at most 1 that agrees with $\lambda$ on Y. It is reasonable to expect that $\mathrm{Re}\,\tilde\lambda = \tilde\rho$; since complex linearity gives $\mathrm{Im}\,\tilde\lambda(x) = -\mathrm{Re}(i\tilde\lambda(x)) = -\mathrm{Re}\,\tilde\lambda(ix)$, this forces
(1.57) $\tilde\lambda(x) := \tilde\rho(x) - i\tilde\rho(ix)$.
Accordingly, we shall use (1.57) to define $\tilde\lambda$. It is easy to see that $\tilde\lambda$ is a continuous complex-linear functional agreeing with $\lambda$ on Y. Since $\tilde\rho$ has norm at most 1, we have $|\mathrm{Re}\,\tilde\lambda(x)| \le \|x\|_X$ for all $x \in X$. We can amplify this (cf. Section 1.9 of Structure and Randomness) by exploiting phase rotation symmetry, thus $|\mathrm{Re}\,\tilde\lambda(e^{i\theta}x)| \le \|x\|_X$ for all $\theta \in \mathbf{R}$. Optimising in $\theta$, we see that $\tilde\lambda$ has norm at most 1, as required. □
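Formula (1.57) can be sanity-checked in $\mathbf{C}^3$ (a sketch with an arbitrarily chosen functional; all names are illustrative): starting from a complex-linear $\lambda(z) = \sum_k a_k z_k$, keep only its real part $\rho = \mathrm{Re}\,\lambda$, which is merely real-linear, and check that $\rho(z) - i\rho(iz)$ recovers $\lambda(z)$ and is complex-linear.

```python
a = [2 - 1j, 0.5j, -3]  # coefficients of lambda(z) = sum_k a_k z_k


def lam(z):
    return sum(ak * zk for ak, zk in zip(a, z))


def rho(z):
    """The real part of lambda: a real-linear functional on C^3."""
    return lam(z).real


def complexify(real_functional):
    """Formula (1.57): z -> rho(z) - i rho(iz)."""
    return lambda z: real_functional(z) - 1j * real_functional([1j * zk for zk in z])


lam_tilde = complexify(rho)
z = [1 + 2j, -1j, 0.5]
iz = [1j * zk for zk in z]
print(lam(z), lam_tilde(z))
print(lam_tilde(iz), 1j * lam_tilde(z))  # complex linearity: lam_tilde(iz) = i lam_tilde(z)
```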
Exercise 1.5.12. In the special case when X is a Hilbert space, give an
alternate proof of the Hahn-Banach theorem, using the material from Section
1.4, that avoids Zorn’s lemma or the axiom of choice.

Now we put this Hahn-Banach theorem to work in the study of duality and transposes.
Exercise 1.5.13. Let $T: X \to Y$ be a continuous linear transformation which is bounded from below (i.e., there exists $c > 0$ such that $\|Tx\| \ge c\|x\|$ for all $x \in X$); note that this ensures that X is equivalent to some subspace of Y. Show that the transpose $T^*: Y^* \to X^*$ is surjective. Give an example to show that the claim fails if T is merely assumed to be injective rather than bounded from below. (Hint: Consider the map $(a_n)_{n=1}^\infty \mapsto (a_n/n)_{n=1}^\infty$ on some suitable space of sequences.) This should be compared with Exercise 1.5.11.
Exercise 1.5.14. Let x be an element of a normed vector space X. Show that there exists $\lambda \in X^*$ such that $\|\lambda\|_{X^*} = 1$ and $\lambda(x) = \|x\|_X$. Conclude in particular that the dual of a non-trivial normed vector space is again non-trivial.
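In concrete finite-dimensional spaces the functional in Exercise 1.5.14 can be written down explicitly. The sketch below assumes real scalars and represents functionals by their coefficient vectors under the pairing $\lambda(x) = \sum_j \lambda_j x_j$; the helper names are hypothetical. For $(\mathbf{R}^n, \|\cdot\|_{\ell^1})$ the dual norm is the $\ell^\infty$ norm and a vector of signs works, while for $(\mathbf{R}^n, \|\cdot\|_{\ell^\infty})$ the dual norm is the $\ell^1$ norm and a signed coordinate functional at a largest entry works (even when $x = 0$).

```python
def norming_functional_l1(x):
    """For (R^n, ||.||_1): lambda_j = +1 if x_j >= 0 else -1, so ||lambda||_inf = 1 and lambda(x) = ||x||_1."""
    return [1 if t >= 0 else -1 for t in x]


def norming_functional_linf(x):
    """For (R^n, ||.||_inf): lambda = +/- e_k at a largest entry, so ||lambda||_1 = 1 and lambda(x) = ||x||_inf."""
    k = max(range(len(x)), key=lambda j: abs(x[j]))
    s = 1 if x[k] >= 0 else -1
    return [s if j == k else 0 for j in range(len(x))]


def apply(lam, x):
    return sum(l * t for l, t in zip(lam, x))


x = [3, -4, 0, 1]
lam1 = norming_functional_l1(x)
lam2 = norming_functional_linf(x)
print(apply(lam1, x), sum(abs(t) for t in x))   # lambda(x) and ||x||_1
print(apply(lam2, x), max(abs(t) for t in x))   # lambda(x) and ||x||_inf
```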

Given a normed vector space X, we can form its double dual $(X^*)^*$: the space of continuous linear functionals on $X^*$. There is a very natural map $\iota: X \to (X^*)^*$, defined as
(1.58) $\iota(x)(\lambda) := \lambda(x)$
for all $x \in X$ and $\lambda \in X^*$. (This map is closely related to the Gelfand transform in the theory of operator algebras; see Section 1.10.4.) It is easy to see that $\iota$ is a continuous linear transformation, with operator norm at most 1. But the Hahn-Banach theorem gives a stronger statement:
Theorem 1.5.10. ι is an isometry.

Proof. We need to show that $\|\iota(x)\|_{X^{**}} = \|x\|_X$ for all $x \in X$. The upper bound is clear; the lower bound follows from Exercise 1.5.14. □
Exercise 1.5.15. Let Y be a subspace of a normed vector space X. Define the complement $Y^\perp$ of Y to be the space of all $\lambda \in X^*$ which vanish on Y.
(i) Show that $Y^\perp$ is a closed subspace of $X^*$, and that $\overline{Y} = \{x \in X : \lambda(x) = 0 \text{ for all } \lambda \in Y^\perp\}$ (compare with Exercise 1.4.13). In other words, $\iota(\overline{Y}) = \iota(X) \cap Y^{\perp\perp}$.
(ii) Show that $Y^\perp$ is trivial if and only if Y is dense, and $Y^\perp = X^*$ if and only if Y is trivial.
(iii) Show that $Y^\perp$ is isomorphic to the dual of the quotient space $X/Y$ (which has the norm $\|x + Y\|_{X/Y} := \inf_{y \in Y} \|x + y\|_X$).
(iv) Show that $Y^*$ is isomorphic to $X^*/Y^\perp$.

From Theorem 1.5.10, every normed vector space can be identified with a subspace of its double dual (and every Banach space is identified with a closed subspace of its double dual). If $\iota$ is surjective, then we have an isomorphism $X \equiv X^{**}$, and we say that X is reflexive in this case; since $X^{**}$ is a Banach space, we conclude that only Banach spaces can be reflexive. From linear algebra we see in particular that any finite-dimensional normed vector space is reflexive; from Exercises 1.5.7 and 1.5.8 we see that any Hilbert space and any $L^p$ space with $1 < p < \infty$ on a σ-finite space is also reflexive (and the hypothesis of σ-finiteness can in fact be dropped). On the other hand, from Exercise 1.5.6, we see that the Banach space $c_0(\mathbf{N})$ is not reflexive.
An important fact is that $\ell^1(\mathbf{N})$ is also not reflexive: the dual of $\ell^1(\mathbf{N})$ is equivalent to $\ell^\infty(\mathbf{N})$, but the dual of $\ell^\infty(\mathbf{N})$ is strictly larger than $\ell^1(\mathbf{N})$. Indeed, consider the subspace $c(\mathbf{N})$ of $\ell^\infty(\mathbf{N})$ consisting of bounded convergent sequences (equivalently, this is the space spanned by $c_0(\mathbf{N})$ and the constant sequence $(1)_{n\in\mathbf{N}}$). The limit functional $(a_n)_{n=1}^\infty \mapsto \lim_{n\to\infty} a_n$ is a bounded linear functional on $c(\mathbf{N})$, with operator norm 1, and thus by the Hahn-Banach theorem can be extended to a generalised limit functional $\lambda: \ell^\infty(\mathbf{N}) \to \mathbf{C}$ which is a continuous linear functional of operator norm 1. As such generalised limit functionals annihilate all of $c_0(\mathbf{N})$ but are still non-trivial, they do not correspond to any element of $\ell^1(\mathbf{N}) \equiv c_0(\mathbf{N})^*$.
Exercise 1.5.16. Let $\lambda: \ell^\infty(\mathbf{N}) \to \mathbf{C}$ be a generalised limit functional (i.e., an extension of the limit functional of $c(\mathbf{N})$ of operator norm 1) which is also an algebra homomorphism, i.e., $\lambda((x_n y_n)_{n=1}^\infty) = \lambda((x_n)_{n=1}^\infty)\lambda((y_n)_{n=1}^\infty)$ for all sequences $(x_n)_{n=1}^\infty, (y_n)_{n=1}^\infty \in \ell^\infty(\mathbf{N})$. Show that there exists a unique non-principal ultrafilter $p \in \beta\mathbf{N}\backslash\mathbf{N}$ (as defined for instance in Section 1.5 of Structure and Randomness) such that $\lambda((x_n)_{n=1}^\infty) = \lim_{n\to p} x_n$ for all sequences $(x_n)_{n=1}^\infty \in \ell^\infty(\mathbf{N})$. Conversely, show that every non-principal ultrafilter generates a generalised limit functional that is also an algebra homomorphism. (This exercise may become easier once one is acquainted with the Stone-Čech compactification; see Section 2.5.1. If the algebra homomorphism property is dropped, one has to consider probability measures on the space of non-principal ultrafilters instead.)
Exercise 1.5.17. Show that any closed subspace of a reflexive space is again reflexive. Also show that a Banach space X is reflexive if and only if its dual is reflexive. Conclude that if $(X, \mathcal{X}, \mu)$ is a measure space which contains a countably infinite sequence of disjoint sets of positive measure, then $L^1(X, \mathcal{X}, \mu)$ and $L^\infty(X, \mathcal{X}, \mu)$ are not reflexive. (Hint: Reduce to the σ-finite case. $L^\infty$ will contain an isometric copy of $\ell^\infty(\mathbf{N})$.)

Theorem 1.5.10 gives a classification of sorts for normed vector spaces:


Corollary 1.5.11. Every normed vector space X is isomorphic to a sub-
space of BC(Y ), the space of bounded continuous functions on some bounded
complete metric space Y , with the uniform norm.

Proof. Take Y to be the closed unit ball in $X^*$; then the map $\iota$ identifies X with a subspace of $BC(Y)$. □
Remark 1.5.12. If X is separable, it is known that one can take Y to just
be the unit interval [0, 1]; this is the Banach-Mazur theorem, which we will
not prove here.

Next, we apply the Hahn-Banach theorem to the transpose operation, improving Exercise 1.5.9:
Theorem 1.5.13. Let $T: X \to Y$ be a continuous linear transformation between normed vector spaces. Then $\|T^*\|_{op} = \|T\|_{op}$. Thus the transpose operation is an isometric embedding of $B(X \to Y)$ into $B(Y^* \to X^*)$.

Proof. By Exercise 1.5.9, it suffices to show that $\|T^*\|_{op} \ge \|T\|_{op}$. Accordingly, let $\alpha$ be any number strictly less than $\|T\|_{op}$; then we can find a non-zero $x \in X$ such that $\|Tx\|_Y \ge \alpha\|x\|_X$. By Exercise 1.5.14 we can then find $\lambda \in Y^*$ such that $\|\lambda\|_{Y^*} = 1$ and $\lambda(Tx) = \|Tx\|_Y$, so that $T^*\lambda(x) = \lambda(Tx) = \|Tx\|_Y \ge \alpha\|x\|_X$, and thus $\|T^*\lambda\|_{X^*} \ge \alpha$. This implies that $\|T^*\|_{op} \ge \alpha$; taking suprema over all $\alpha$ strictly less than $\|T\|_{op}$ we obtain the claim. □
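Theorem 1.5.13 can be seen concretely in finite dimensions (a sketch, not from the text). If a real matrix A acts from $(\mathbf{R}^3, \|\cdot\|_{\ell^\infty})$ to $(\mathbf{R}^2, \|\cdot\|_{\ell^\infty})$, then, identifying the duals with $\ell^1$ spaces through the pairing $\sum_j x_j y_j$, the transpose acts from $(\mathbf{R}^2, \|\cdot\|_{\ell^1})$ to $(\mathbf{R}^3, \|\cdot\|_{\ell^1})$ via the matrix $A^t$. The standard formulas for these operator norms (largest absolute row sum for $\ell^\infty$, largest absolute column sum for $\ell^1$) then give the same number, as the theorem predicts. The matrix is arbitrary.

```python
def max_abs_row_sum(M):
    """Operator norm of M for the l^infinity norm on both sides."""
    return max(sum(abs(m) for m in row) for row in M)


def max_abs_col_sum(M):
    """Operator norm of M for the l^1 norm on both sides."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


A = [[1, -2, 0.5], [3, 0, -1]]          # T: (R^3, l^inf) -> (R^2, l^inf)
At = [list(col) for col in zip(*A)]     # T^*: (R^2, l^1) -> (R^3, l^1)

norm_T = max_abs_row_sum(A)
norm_T_star = max_abs_col_sum(At)
print(norm_T, norm_T_star)
```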

If we identify X and Y with subspaces of $X^{**}$ and $Y^{**}$, respectively, we thus see that $T^{**}: X^{**} \to Y^{**}$ is an extension of $T: X \to Y$ with the same operator norm. In particular, if X and Y are reflexive, we see that $T^{**}$ can be identified with T itself (exactly as in the finite-dimensional linear algebra setting).

1.5.3. Variants of the Hahn-Banach theorem (optional). The Hahn-Banach theorem has a number of essentially equivalent variants, which also are of interest for the geometry of normed vector spaces.
Exercise 1.5.18 (Generalised Hahn-Banach theorem). Let Y be a subspace
of a real or complex vector space X, let ρ : X → R be a sublinear functional
on X (thus ρ(cx) = cρ(x) for all non-negative c and all x ∈ X and ρ(x+y) ≤
ρ(x) + ρ(y) for all x, y ∈ X), and let λ : Y → R be a linear functional on
Y such that λ(y) ≤ ρ(y) for all y ∈ Y . Show that λ can be extended to a
linear functional λ̃ on X such that λ̃(x) ≤ ρ(x) for all x ∈ X. Show that
this statement implies the usual Hahn-Banach theorem. (Hint: Adapt the
proof of the Hahn-Banach theorem.)

Call a subset A of a real vector space V algebraically open if the sets $\{t : x + tv \in A\}$ are open in $\mathbf{R}$ for all $x, v \in V$; note that every open set in a normed vector space is algebraically open.
Theorem 1.5.14 (Geometric Hahn-Banach theorem). Let A, B be con-
vex subsets of a real vector space V , with A algebraically open. Then the
following are equivalent:
(i) A and B are disjoint.
(ii) There exists a linear functional λ : V → R and a constant c such that
λ < c on A, and λ ≥ c on B. (Equivalently, there is a hyperplane
separating A and B, with A avoiding the hyperplane entirely.)
If A and B are convex cones (i.e., tx ∈ A whenever x ∈ A and t > 0, and
similarly for B), we may take c = 0.
Remark 1.5.15. In finite dimensions, it is not difficult to drop the algebraic openness hypothesis on A as long as one now replaces the condition λ < c by λ ≤ c. However, in infinite dimensions one cannot do this. Indeed, take $V = c_c(\mathbb{N})$, let A be the set of sequences whose last non-zero element is strictly positive, and let B = −A consist of those sequences whose last non-zero element is strictly negative. Then one can verify that there is no hyperplane separating A from B.
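One way to carry out this verification: a linear functional on c_c(N) is given by an arbitrary sequence of coefficients (λ_n), via λ(x) = Σ_n λ_n x_n (a finite sum). If a non-zero λ satisfied λ ≤ c on A and λ ≥ c on B = −A, then, since A is a cone, λ would have to be ≤ 0 on A (otherwise rescale a point where λ > 0). The Python sketch below uses our own encoding (finitely supported sequences as dicts {index: value}, exact arithmetic via `fractions.Fraction`); given λ and an index n with λ_n ≠ 0, it returns an element of A on which λ is strictly positive, so no such λ exists.

```python
from fractions import Fraction

def evaluate(lam, x):
    """lambda(x) = sum_n lam(n) * x_n for a finitely supported x stored as {n: x_n}."""
    return sum(lam(n) * v for n, v in x.items())

def in_A(x):
    """True if the last non-zero entry of x is strictly positive (membership in A)."""
    support = [n for n, v in x.items() if v != 0]
    return bool(support) and x[max(support)] > 0

def witness(lam, n):
    """Given lam(n) != 0, return some x in A with lambda(x) > 0."""
    a, b = Fraction(lam(n)), Fraction(lam(n + 1))
    if a == 0:
        raise ValueError("need lam(n) != 0")
    if a > 0:
        return {n: Fraction(1)}               # x = e_n lies in A
    # a < 0: x = -M e_n + e_{n+1} lies in A, and lambda(x) = M|a| + b,
    # which for M = |b|/|a| + 1 equals |a| + |b| + b > 0.
    M = abs(b) / abs(a) + 1
    return {n: -M, n + 1: Fraction(1)}

def lam(k):
    """A hypothetical functional with lambda_0 = -1, lambda_1 = 5, all others 0."""
    return {0: -1, 1: 5}.get(k, 0)

x = witness(lam, 0)                 # x = -6 e_0 + e_1
print(in_A(x), evaluate(lam, x))    # True 11
```

Since B = −A, the vector −x then lies in B with λ(−x) < 0, so this λ cannot be ≤ c on A and ≥ c on B for any constant c.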


Proof. Clearly (ii) implies (i); now we show that (i) implies (ii). We first
handle the case when A and B are convex cones.
Define a good pair to be a pair (A, B) where A and B are disjoint convex
cones, with A algebraically open, thus (A, B) is a good pair by hypothesis.
We can order $(A, B) \leq (A', B')$ if $A'$ contains A and $B'$ contains B. A
standard application of Zorn’s lemma (Section 2.4) reveals that any good
pair (A, B) is contained in a maximal good pair, and so without loss of
generality we may assume that (A, B) is a maximal good pair.
We can of course assume that neither A nor B is empty. We now claim
that B is the complement of A. For if not, then there exists v ∈ V which
does not lie in either A or B. By the maximality of (A, B), the convex cone
generated by B ∪ {v} must intersect A at some point, say w. By dilating
w if necessary we may assume that w lies on a line segment between v and
some point b in B. By using the convexity and disjointness of A and B
one can then deduce that for any a ∈ A, the ray {a + t(w − b) : t > 0} is
disjoint from B. Thus one can enlarge A to the convex cone generated by
A and w − b, which is still algebraically open and now strictly larger than
A (because it contains v), a contradiction. Thus B is the complement of A.
Let us call a line in V monochromatic if it is entirely contained in A or entirely contained in B. Note that if a line is not monochromatic, then (because A and B are convex and partition the line, and A is algebraically open) the line splits into an open ray contained in A and a closed ray contained in B. From this we can conclude that if a line is monochromatic, then all parallel lines must also be monochromatic: otherwise we look at the open ray of the parallel line that lies in A, and use the convexity of both A and B to show that this ray is adjacent to a half-plane contained in B, contradicting algebraic openness. Now let W be the space of all vectors w for which there exists a monochromatic line in the direction w (including 0). Then W is easily seen to be a vector space; since A, B are non-empty, W is a proper subspace of V. On the other hand, if w and $w'$ are not in W, some playing around with the property that A and B are convex sets partitioning V shows that the plane spanned by w and $w'$ contains a monochromatic line, and hence some non-trivial linear combination of w and $w'$ lies in W. Thus V/W is precisely one-dimensional. Since every line with direction in W is monochromatic, A and B also have well-defined quotients A/W and B/W in this one-dimensional space, which remain convex (with A/W still algebraically open). But then it is clear that A/W and B/W are an open and a closed ray from the origin in V/W, respectively. It is then a routine matter to construct a linear functional λ : V → R (with null space W) such that A = {λ < 0} and B = {λ ≥ 0}, and the claim follows.


To establish the general case when A, B are not convex cones, we lift to one higher dimension and apply the previous result to the convex cones $A', B' \subset \mathbb{R} \times V$ defined by $A' := \{(t, tx) : t > 0, x \in A\}$, $B' := \{(t, tx) : t > 0, x \in B\}$. We leave the verification that this works as an exercise. □
Exercise 1.5.19. Use the geometric Hahn-Banach theorem to reprove Ex-
ercise 1.5.18, thus providing a slightly different proof of the Hahn-Banach
theorem. (It is possible to reverse these implications and deduce the geomet-
ric Hahn-Banach theorem from the usual Hahn-Banach theorem, but this
is somewhat trickier, requiring one to fashion a norm out of the difference
A − B of two convex cones.)
Exercise 1.5.20 (Algebraic Hahn-Banach theorem). Let V be a vector
space over a field F , let W be a subspace of V , and let λ : W → F be a
linear map. Show that there exists a linear map λ̃ : V → F which extends
λ.
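In finite dimensions, Exercise 1.5.20 is plain linear algebra: extend a basis of W to a basis of V and let λ̃ vanish on the added basis vectors (in general one uses a Hamel basis, and hence Zorn's lemma, in the same way). The sketch below does this for V = Q^n in Python with exact `Fraction` arithmetic. The function names, the choice to pad with standard basis vectors, and the representation of λ̃ by a coefficient vector c (so that λ̃(v) = c · v) are our own assumptions for illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a non-empty list of rational vectors, by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def solve(M, t):
    """Solve M c = t for an invertible square rational matrix M."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, t)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def extend_functional(W_basis, values, n):
    """Given a basis of W in Q^n and the values of lambda on it, return c in Q^n
    with c . w = lambda(w) for every basis vector w of W."""
    basis, targets = list(W_basis), list(values)
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        if rank(basis + [e_j]) > len(basis):   # e_j is not in the current span
            basis.append(e_j)
            targets.append(0)                  # the extension vanishes on e_j
    return solve(basis, targets)

c = extend_functional([[1, 1, 0]], [2], 3)     # W = span{(1,1,0)}, lambda(1,1,0) = 2
print([str(v) for v in c])                     # ['0', '2', '0']
```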

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/26.
Thanks to Eric, Xiaochuan Li, and an anonymous commenter for corrections.
Some further discussion of variants of the Hahn-Banach theorem (in the
finite-dimensional setting) can be found in Section 1.16 of Structure and
Randomness.

Section 1.6

A quick review of point-set topology

To progress further in our study of function spaces, we will need to develop the standard theory of metric spaces, and of the closely related theory of topological spaces (i.e., point-set topology). We will assume that readers have already encountered these concepts in an undergraduate topology or real analysis course, but for the sake of completeness we will briefly review the basics of both theories here.

1.6.1. Metric spaces. In many spaces, one wants a notion of when two
points in the space are near or far. A particularly quantitative and intuitive
way to formalise this notion is via the concept of a metric space.

Definition 1.6.1 (Metric spaces). A metric space X = (X, d) is a set X, together with a distance function $d : X \times X \to \mathbb{R}^+$ which obeys the following properties:
• Non-degeneracy. For any x, y ∈ X, we have d(x, y) ≥ 0, with equality
if and only if x = y.
• Symmetry. For any x, y ∈ X, we have d(x, y) = d(y, x).
• Triangle inequality. For any x, y, z ∈ X, we have d(x, z) ≤ d(x, y) +
d(y, z).
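When the underlying set is finite, the axioms of Definition 1.6.1 can be checked by brute force over all pairs and triples. The Python sketch below (the name `is_metric` is ours) does this for a given distance function; on an infinite space such as R it can only test a finite sample of points, so a True answer there is evidence rather than proof. The optional `tol` allows for floating-point rounding in the triangle inequality.

```python
from itertools import product

def is_metric(points, d, tol=0.0):
    """Brute-force check of non-degeneracy, symmetry and the triangle
    inequality for the distance function d on a finite list of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False        # non-degeneracy fails
        if d(x, y) != d(y, x):
            return False        # symmetry fails
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False        # triangle inequality fails
    return True

print(is_metric([0, 1, 2, 5], lambda x, y: abs(x - y)))   # True
print(is_metric([0, 1, 2], lambda x, y: (x - y) ** 2))    # False: d(0,2) = 4 > 1 + 1
```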

Example 1.6.2. Every normed vector space $(X, \|\cdot\|)$ is a metric space, with distance function $d(x, y) := \|x - y\|$.

Example 1.6.3. Any subset Y of a metric space X = (X, d) is also a metric space $Y = (Y, d|_{Y \times Y})$, where $d|_{Y \times Y} : Y \times Y \to \mathbb{R}^+$ is the restriction of d to
$Y \times Y$. We call the metric space $Y = (Y, d|_{Y \times Y})$ a subspace of the metric space X = (X, d).
Example 1.6.4. Given two metric spaces $X = (X, d_X)$ and $Y = (Y, d_Y)$, we can define the product space $X \times Y = (X \times Y, d_X \times d_Y)$ to be the Cartesian product X × Y with the product metric
$$d_X \times d_Y((x, y), (x', y')) := \max(d_X(x, x'), d_Y(y, y')). \tag{1.59}$$
(One can also pick slightly different metrics here, such as $d_X(x, x') + d_Y(y, y')$, but this metric differs from (1.59) by at most a factor of two, and so the two are equivalent (see Example 1.6.11 below).)
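The factor-of-two claim in Example 1.6.4 comes from the elementary inequality max(a, b) ≤ a + b ≤ 2 max(a, b) for a, b ≥ 0, which shows that the two metrics are equivalent in the sense of (1.60) with c = 1 and C = 2. As a quick numerical check, the Python sketch below takes both factors to be R with the usual metric (an arbitrary choice for illustration) and tests the inequality on random pairs of points.

```python
import random

def d_max(p, q):
    """Product metric (1.59) on R x R, using |x - x'| on each factor."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def d_sum(p, q):
    """The alternative sum metric on R x R."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

random.seed(0)
for _ in range(10_000):
    p = (random.uniform(-10, 10), random.uniform(-10, 10))
    q = (random.uniform(-10, 10), random.uniform(-10, 10))
    # equivalence in the sense of (1.60), with c = 1 and C = 2
    assert d_max(p, q) <= d_sum(p, q) <= 2 * d_max(p, q)
print("max(a, b) <= a + b <= 2 max(a, b) held on all samples")
```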
Example 1.6.5. Any set X can be turned into a metric space by using the discrete metric $d : X \times X \to \mathbb{R}^+$, defined by setting d(x, y) = 0 when x = y and d(x, y) = 1 otherwise.

Given a metric space, one can then define various useful topological
structures. There are two ways to do so. One is via the machinery of
convergent sequences:
Definition 1.6.6 (Topology of a metric space). Let (X, d) be a metric space.
• A sequence $x_n$ of points in X is said to converge to a limit x ∈ X if one has $d(x_n, x) \to 0$ as $n \to \infty$. In this case, we say that $x_n \to x$ in the metric d as $n \to \infty$, and that $\lim_{n \to \infty} x_n = x$ in the metric space X. (It is easy to see that any sequence of points in a metric space has at most one limit.)
• A point x is an adherent point of a set E ⊂ X if it is the limit of some sequence in E. (This is slightly different from being a limit point of E, which is equivalent to being an adherent point of $E \setminus \{x\}$; every adherent point is either a limit point or an isolated point of E.) The set of all adherent points of E is called the closure $\overline{E}$ of E. A set E is closed if it contains all its adherent points, i.e., if $E = \overline{E}$. A set E is dense if every point in X is adherent to E, or equivalently if $\overline{E} = X$.
• Given any x in X and r > 0, define the open ball B(x, r) centred at x with radius r to be the set of all y in X such that d(x, y) < r. Given a set E, we say that x is an interior point of E if there is some open ball centred at x which is contained in E. The set of all interior points is called the interior $E^\circ$ of E. A set is open if every point is an interior point, i.e., if $E = E^\circ$.

There is, however, an alternative approach to defining these concepts, which takes the concept of an open set as a primitive, rather than the distance function, and defines other terms in terms of open sets. For instance:


Exercise 1.6.1. Let (X, d) be a metric space.
(i) Show that a sequence $x_n$ of points in X converges to a limit x ∈
X if and only if every open neighbourhood of x (i.e., an open set
containing x) contains xn for all sufficiently large n.
(ii) Show that a point x is an adherent point of a set E if and only if
every open neighbourhood of x intersects E.
(iii) Show that a set E is closed if and only if its complement is open.
(iv) Show that the closure of a set E is the intersection of all the closed
sets containing E.
(v) Show that a set E is dense if and only if every non-empty open set
intersects E.
(vi) Show that the interior of a set E is the union of all the open sets
contained in E, and that x is an interior point of E if and only if
some neighbourhood of x is contained in E.

In the next section we will adopt this “open sets first” perspective when
defining topological spaces.
On the other hand, there are some other properties of subsets of a metric space which require the metric structure more fully, and cannot be defined purely in terms of open sets (see, e.g., Example 1.6.28), although some of these concepts can still be defined using a structure intermediate between metric spaces and topological spaces, such as a uniform space. For instance:
Definition 1.6.7. Let (X, d) be a metric space.
• A sequence $(x_n)_{n=1}^\infty$ of points in X is a Cauchy sequence if $d(x_n, x_m) \to 0$ as $n, m \to \infty$ (i.e., for every ε > 0 there exists N > 0 such that $d(x_n, x_m) \leq \varepsilon$ for all n, m ≥ N).
• A space X is complete if every Cauchy sequence is convergent.
• A set E in X is bounded if it is contained inside a ball.
• A set E is totally bounded in X if for every ε > 0, E can be covered
by finitely many balls of radius ε.
Exercise 1.6.2. Show that any metric space X can be identified with a dense subspace of a complete metric space $\overline{X}$, known as a metric completion or Cauchy completion of X. (For instance, R is a metric completion of Q.) (Hint: One can define a real number to be an equivalence class of Cauchy sequences of rationals. Once the reals are defined, essentially the same construction works in arbitrary metric spaces.) Furthermore, if $\overline{X}'$ is another metric completion of X, show that there exists an isometry between $\overline{X}$ and $\overline{X}'$ which is the identity on X. Thus, up to isometry, there is a unique metric completion of any metric space.
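To see concretely why Q needs a completion, one can run Newton's iteration x_{n+1} = (x_n + 2/x_n)/2 for √2 in exact rational arithmetic. All iterates are rational and the gaps |x_{n+1} − x_n| shrink extremely fast, so the sequence is Cauchy in Q; but its limit in R is the irrational number √2, so it has no limit in Q. The choice of Newton's method is ours (any rational sequence converging to √2 would do); the sketch uses Python's `fractions.Fraction` so that no rounding is involved.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x_0 = 1, x_{n+1} = (x_n + 2/x_n)/2 for sqrt(2)."""
    xs = [Fraction(1)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs

xs = newton_sqrt2(6)
for n in range(1, 6):
    # the gap d(x_n, x_{n+1}) and the defect x_n^2 - 2 both shrink rapidly
    print(n, xs[n], float(abs(xs[n + 1] - xs[n])), float(xs[n] ** 2 - 2))
```

No term squares to exactly 2, and no rational number could be the limit, which is precisely the gap the completion R fills.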


Exercise 1.6.3. Show that a metric space X is complete if and only if it is closed in every superspace Y of X (i.e., in every metric space Y for which X is a subspace). Thus one can think of completeness as being the property of being absolutely closed.
Exercise 1.6.4. Show that every totally bounded set is also bounded. Conversely, in a Euclidean space $\mathbb{R}^n$ with the usual metric, show that every bounded set is totally bounded. However, give an example of a set in a metric space which is bounded but not totally bounded. (Hint: Use Example 1.6.5.)

Now we come to an important concept.


Theorem 1.6.8 (Heine-Borel theorem for metric spaces). Let (X, d) be a
metric space. Then the following are equivalent:
(i) Sequential compactness. Every sequence in X has a convergent sub-
sequence.
(ii) Compactness. Every open cover $(V_\alpha)_{\alpha \in A}$ of X (i.e., a collection of open sets $V_\alpha$ whose union contains X) has a finite subcover.
(iii) Finite intersection property. If $(F_\alpha)_{\alpha \in A}$ is a collection of closed subsets of X such that any finite subcollection of sets has non-empty intersection, then the entire collection has non-empty intersection.
(iv) X is complete and totally bounded.

Proof. ((ii) =⇒ (i)). If there were an infinite sequence $x_n$ with no convergent subsequence, then given any point x in X there must exist an open ball centred at x which contains $x_n$ for only finitely many n (since otherwise one could easily construct a subsequence of $x_n$ converging to x). By (ii), one can cover X with a finite number of such balls. But then $x_n$ would lie in X for only finitely many n, which is absurd.
((i) =⇒ (iv)). If X were not complete, then there would exist a Cauchy sequence which is not convergent; one easily shows that this sequence cannot have any convergent subsequences either, contradicting (i). If X were not totally bounded, then there would exist ε > 0 such that X cannot be covered by any finite collection of balls of radius ε; a standard greedy algorithm argument then gives a sequence $x_n$ such that $d(x_n, x_m) \geq \varepsilon$ for all distinct n, m. This sequence clearly has no convergent subsequence, again a contradiction.
((ii) ⇐⇒ (iii)). This follows from de Morgan’s laws and Exercise
1.6.1(iii).
((iv) =⇒ (iii)). Let $(F_\alpha)_{\alpha \in A}$ be as in (iii). Call a set E in X rich if it intersects all of the $F_\alpha$. Observe that if one could cover X by a finite number of non-rich sets, then (as each non-rich set is disjoint from at least
one of the $F_\alpha$) there would be a finite number of $F_\alpha$ whose intersection is empty, a contradiction. Thus, whenever we cover X by finitely many sets, at least one of them must be rich.
As X is totally bounded, for each n ≥ 1 we can find a finite set $x_{n,1}, \ldots, x_{n,m_n}$ such that the balls $B(x_{n,1}, 2^{-n}), \ldots, B(x_{n,m_n}, 2^{-n})$ cover X. By the previous discussion, we can then find $1 \leq i_n \leq m_n$ such that $B(x_{n,i_n}, 2^{-n})$ is rich.
Call a ball $B(x_{n,i}, 2^{-n})$ asymptotically rich if it contains infinitely many of the $x_{j,i_j}$. As these balls cover X, we see that for each n, $B(x_{n,i}, 2^{-n})$ is asymptotically rich for at least one i. Furthermore, since each ball of radius $2^{-n}$ can be covered by balls of radius $2^{-n-1}$, we see that if $B(x_{n,j}, 2^{-n})$ is asymptotically rich, then it must intersect an asymptotically rich ball $B(x_{n+1,j'}, 2^{-n-1})$. Iterating this, we can find a sequence $B(x_{n,j_n}, 2^{-n})$ of asymptotically rich balls, each one of which intersects the next one. Consecutive centres are then within $2^{-n} + 2^{-n-1}$ of each other, so $x_{n,j_n}$ is a Cauchy sequence and hence (as X is assumed complete) converges to a limit x. Observe that there exist arbitrarily small rich balls that are arbitrarily close to x, and thus x is adherent to every $F_\alpha$; since the $F_\alpha$ are closed, we see that x lies in every $F_\alpha$, and we are done. □
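The greedy algorithm invoked in the proof of (i) =⇒ (iv) can be run directly on a finite list of points. In the Python sketch below (the name `greedy_net` is ours), a point becomes a new centre whenever it is at distance at least ε from all previously chosen centres; the centres are then pairwise ε-separated, and the open ε-balls around them cover the list. For the discrete metric of Example 1.6.5 and ε ≤ 1, every point becomes a centre, which is why an infinite set with the discrete metric is bounded but not totally bounded.

```python
def greedy_net(points, d, eps):
    """Greedy eps-net of a finite list of points under the distance function d.

    A point becomes a new centre when it is at distance >= eps from every
    centre chosen so far. The centres are pairwise eps-separated, and every
    point lies in the open ball B(c, eps) of some centre c."""
    centres = []
    for p in points:
        if all(d(p, c) >= eps for c in centres):
            centres.append(p)
    return centres

def discrete(a, b):
    """The discrete metric of Example 1.6.5."""
    return 0 if a == b else 1

print(greedy_net(list(range(101)), lambda a, b: abs(a - b), 25))   # [0, 25, 50, 75, 100]
print(len(greedy_net(list(range(50)), discrete, 0.5)))             # 50
```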
Remark 1.6.9. The hard implication (iv) =⇒ (iii) of the Heine-Borel theorem is noticeably more complicated than any of the others. This is unavoidable: this component of the Heine-Borel theorem turns out to be logically equivalent to König’s lemma in the sense of reverse mathematics, and thus cannot be proven in sufficiently weak systems of logical reasoning.

Any space that obeys one of the four equivalent properties in Theorem
1.6.8 is called a compact space; a subset E of a metric space X is said to be
compact if it is a compact space when viewed as a subspace of X. There are
some variants of the notion of compactness which are also of importance for
us:
• A space is σ-compact if it can be expressed as the countable union
of compact sets. (For instance, the real line R with the usual metric
is σ-compact.)
• A space is locally compact if every point is contained in the interior
of a compact set. (For instance, R is locally compact.)
• A subset of a space is precompact or relatively compact if it is con-
tained inside a compact set (or equivalently, if its closure is compact).
Another fundamental notion in the subject is that of a continuous map.
Exercise 1.6.5. Let f : X → Y be a map from one metric space $(X, d_X)$ to another $(Y, d_Y)$. Show that the following are equivalent:


• Metric continuity. For every x ∈ X and ε > 0, there exists δ > 0 such that $d_Y(f(x), f(x')) \leq \varepsilon$ whenever $d_X(x, x') \leq \delta$.
• Sequential continuity. For every sequence $x_n \in X$ that converges to a limit x ∈ X, $f(x_n)$ converges to f(x).
• Topological continuity. The inverse image $f^{-1}(V)$ of every open set V in Y is an open set in X.
• The inverse image $f^{-1}(F)$ of every closed set F in Y is a closed set in X.
A function f obeying any one of the properties in Exercise 1.6.5 is known
as a continuous map.
Exercise 1.6.6. Let X, Y, Z be metric spaces, and let f : X → Y and g : X → Z be maps. Show that the combined map $f \oplus g : X \to Y \times Z$ defined by $(f \oplus g)(x) := (f(x), g(x))$ is continuous if and only if f and g are continuous. Show also that the projection maps $\pi_Y : Y \times Z \to Y$, $\pi_Z : Y \times Z \to Z$ defined by $\pi_Y(y, z) := y$, $\pi_Z(y, z) := z$ are continuous.
Exercise 1.6.7. Show that the image of a compact set under a continuous
map is again compact.
1.6.2. Topological spaces. Metric spaces capture many of the notions of
convergence and continuity that one commonly uses in real analysis, but
there are several such notions (e.g., pointwise convergence, semicontinuity,
or weak convergence) in the subject that turn out to not be modeled by
metric spaces. A very useful framework to handle these more general modes
of convergence and continuity is that of a topological space, which one can
think of as an abstract generalisation of a metric space in which the metric
and balls are forgotten, and the open sets become the central object.9
Definition 1.6.10 (Topological space). A topological space X = (X, F) is
a set X, together with a collection F of subsets of X, known as open sets,
which obey the following axioms:
• ∅ and X are open.
• The intersection of any finite number of open sets is open.
• The union of any arbitrary number of open sets is open.
The collection F is called a topology on X.
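When X is finite, the axioms of Definition 1.6.10 can be checked mechanically: a finite family closed under pairwise unions and intersections is closed under all unions and all finite intersections, while the empty union and the empty intersection are handled by the first axiom. The Python sketch below encodes open sets as frozensets (our own choice; the function name is hypothetical) and implements this check.

```python
from itertools import combinations

def is_topology(X, opens):
    """Check the axioms of Definition 1.6.10 for a family of subsets of a finite set X."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    if frozenset() not in opens or X not in opens:
        return False                      # first axiom fails
    if any(not U <= X for U in opens):
        return False                      # not a family of subsets of X
    # For a finite family, closure under pairwise unions and intersections
    # already gives closure under all unions and all finite intersections.
    return all((U | V) in opens and (U & V) in opens
               for U, V in combinations(opens, 2))

print(is_topology({0, 1}, [set(), {1}, {0, 1}]))              # True
print(is_topology({0, 1, 2}, [set(), {0}, {1}, {0, 1, 2}]))   # False: {0, 1} is missing
```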
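Given two topologies $\mathcal{F}, \mathcal{F}'$ on a space X, we say that $\mathcal{F}$ is a coarser (or weaker) topology than $\mathcal{F}'$ (or equivalently, that $\mathcal{F}'$ is a finer (or stronger) topology than $\mathcal{F}$) if $\mathcal{F} \subset \mathcal{F}'$ (informally, $\mathcal{F}'$ has more open sets than $\mathcal{F}$).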
9 There are even more abstract notions, such as pointless topological spaces, in which the collection of open sets has become an abstract lattice, in the spirit of Section 2.3, but we will not need such notions in this course.


Example 1.6.11. Every metric space (X, d) generates a topology $\mathcal{F}_d$, namely the collection of sets which are open with respect to the metric d. Observe that if two metrics d, d′ on X are equivalent in the sense that
$$c\,d(x, y) \leq d'(x, y) \leq C\,d(x, y) \tag{1.60}$$
for all x, y in X and some constants c, C > 0, then they generate identical topologies.
Example 1.6.12. The finest (or strongest) topology on any set X is the discrete topology $2^X = \{E : E \subset X\}$, in which every set is open; this is the topology generated by the discrete metric (Example 1.6.5). The coarsest (or weakest) topology is the trivial topology {∅, X}, in which only the empty set and the full set are open.
Example 1.6.13. Given any collection A of sets of X, we can define the
topology F[A] generated by A to be the intersection of all the topologies
that contain A; this is easily seen to be the coarsest topology that makes all
the sets in A open. For instance, the topology generated by a metric space
is the same as the topology generated by its open balls.
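On a finite set, the generated topology F[A] of Example 1.6.13 can be computed by brute force: start from A together with ∅ and X, and keep adding pairwise unions and intersections until nothing new appears. The result is a topology containing A, and every topology containing A must contain each set produced along the way, so it is the coarsest one. The Python sketch below (names ours) does this; it is exponential in the worst case and only meant for tiny examples.

```python
def generated_topology(X, family):
    """Coarsest topology on the finite set X making every set in family open."""
    X = frozenset(X)
    opens = {frozenset(), X} | {frozenset(A) for A in family}
    while True:
        # all pairwise unions and intersections of the sets found so far
        new = {op(U, V) for U in opens for V in opens
               for op in (frozenset.union, frozenset.intersection)}
        if new <= opens:
            return opens
        opens |= new

T = generated_topology({1, 2, 3}, [{1, 2}, {2, 3}])
print(sorted(sorted(U) for U in T))   # [[], [1, 2], [1, 2, 3], [2], [2, 3]]
```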
Example 1.6.14. If $(X, \mathcal{F})$ is a topological space, and Y is a subset of X, then we can define the relative topology $\mathcal{F}|_Y := \{E \cap Y : E \in \mathcal{F}\}$ to be the collection of all open sets in X, restricted to Y. This makes $(Y, \mathcal{F}|_Y)$ a topological space, known as a subspace of $(X, \mathcal{F})$.

Any notion in metric space theory which can be defined purely in terms of open sets can now be defined for topological spaces. Thus, for instance:
Definition 1.6.15. Let (X, F) be a topological space.
• A sequence $x_n$ of points in X converges to a limit x ∈ X if and only if every open neighbourhood of x (i.e., an open set containing x) contains $x_n$ for all sufficiently large n. In this case we write $x_n \to x$ in the topological space $(X, \mathcal{F})$, and (if x is unique) we write $x = \lim_{n \to \infty} x_n$.
• A point is a sequentially adherent point of a set E if it is the limit of
some sequence in E.
• A point x is an adherent point of a set E if and only if every open
neighbourhood of x intersects E.
• The set of all adherent points of E is called the closure of E and is denoted $\overline{E}$.
• A set E is closed if and only if its complement is open, or equivalently
if it contains all its adherent points.
• A set E is dense if and only if every non-empty open set intersects
E, or equivalently if its closure is X.

• The interior of a set E is the union of all the open sets contained
in E, and x is called an interior point of E if and only if some
neighbourhood of x is contained in E.
• A space X is sequentially compact if every sequence has a convergent
subsequence.
• A space X is compact if every open cover has a finite subcover.
• The concepts of being σ-compact, locally compact, and precom-
pact can be defined as before. (One could also define sequential
σ-compactness, etc., but these notions are rarely used.)
• A map f : X → Y between topological spaces is sequentially contin-
uous if whenever xn converges to a limit x in X, f (xn ) converges to
a limit f (x) in Y .
• A map f : X → Y between topological spaces is continuous if the
inverse image of every open set is open.
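In a finite topological space the notions in Definition 1.6.15 can be computed straight from the list of open sets. The Python sketch below (frozenset encoding and function names are our own) computes the interior as the union of the open sets inside E, and the closure as the set of points all of whose open neighbourhoods meet E. It is tried on the two-point Sierpinski space X = {0, 1} with open sets ∅, {1}, X (a standard example, not one from the text), where {1} is dense and {0} is closed.

```python
def interior(opens, E):
    """Union of all open sets contained in E."""
    E = frozenset(E)
    result = frozenset()
    for U in map(frozenset, opens):
        if U <= E:
            result |= U
    return result

def closure(X, opens, E):
    """Adherent points of E: points x such that every open set containing x meets E."""
    E = frozenset(E)
    opens = [frozenset(U) for U in opens]
    return frozenset(x for x in X if all(U & E for U in opens if x in U))

X = {0, 1}
opens = [set(), {1}, {0, 1}]          # the Sierpinski space
print(sorted(closure(X, opens, {1})), sorted(interior(opens, {1})))   # [0, 1] [1]
print(sorted(closure(X, opens, {0})), sorted(interior(opens, {0})))   # [0] []
```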
Remark 1.6.16. The stronger a topology becomes, the more open and closed sets it will have, but fewer sequences will converge, there will be fewer (sequentially) adherent points and (sequentially) compact sets, closures become smaller, and interiors become larger. There will be more (sequentially) continuous functions on this space, but fewer (sequentially) continuous functions into the space. Note also that the identity map from a space X with one topology $\mathcal{F}$ to the same space X with a different topology $\mathcal{F}'$ is continuous precisely when $\mathcal{F}$ is stronger than $\mathcal{F}'$.
Example 1.6.17. In a metric space, these topological notions coincide with
their metric counterparts, and sequential compactness and compactness are
equivalent, as are sequential continuity and continuity.
Exercise 1.6.8 (Urysohn’s subsequence principle). Let xn be a sequence
in a topological space X, and let x be another point in X. Show that the
following are equivalent:
• xn converges to x.
• Every subsequence of xn converges to x.
• Every subsequence of xn has a further subsequence that converges to
x.
Exercise 1.6.9. Show that every sequentially adherent point is an adherent
point, and every continuous function is sequentially continuous.
Remark 1.6.18. The converses to Exercise 1.6.9 are unfortunately not
always true in general topological spaces. For instance, if we endow an
uncountable set X with the cocountable topology (so that a set is open if
it is either empty or its complement is at most countable), then we see
that the only convergent sequences are those which are eventually constant.
Thus, every subset of X contains its sequentially adherent points, and every
function from X to another topological space is sequentially continuous,
even though not every set in X is closed and not every function on X is
continuous. An example of a set which is sequentially compact but not
compact is the first uncountable ordinal with the order topology (Exercise
1.6.10). It is trickier to give an example of a compact space which is not
sequentially compact; this will have to wait until we establish Tychonoff’s
theorem (Theorem 1.8.14). However one can “fix” this discrepancy between
the sequential and non-sequential concepts by replacing sequences with the
more general notion of nets; see Section 1.6.3.

Remark 1.6.19. Metric space concepts such as boundedness, completeness, Cauchy sequences, and uniform continuity do not have counterparts for general topological spaces, because they cannot be defined purely in terms of open sets. (They can however be extended to some other types of spaces, such as uniform spaces or coarse spaces.)

Now we give some important topologies that capture certain modes of convergence or continuity that are difficult or impossible to capture using metric spaces alone.

Example 1.6.20 (Zariski topology). This topology is important in algebraic geometry, though it will not be used in this course. If F is an algebraically closed field, we define the Zariski topology on the vector space $F^n$ to be the topology generated by the complements of proper algebraic varieties in $F^n$. Thus a set is Zariski open if it is either empty or is the complement of a finite union of proper algebraic varieties. A set in $F^n$ is then Zariski dense if it is not contained in any proper subvariety, and the Zariski closure of a set is the smallest algebraic variety that contains that set.

Example 1.6.21 (Order topology). Any totally ordered set (X, <) gen-
erates the order topology, defined as the topology generated by the sets
{x ∈ X : x > a} and {x ∈ X : x < a} for all a ∈ X. In particular, the
extended real line [−∞, +∞] can be given the order topology, and the no-
tion of convergence of sequences in this topology to either finite or infinite
limits is identical to the notion one is accustomed to in undergraduate real
analysis. (On the real line, of course, the order topology corresponds to the
usual topology.) Also observe that a function $n \mapsto x_n$ from the extended natural numbers $\mathbb{N} \cup \{+\infty\}$ (with the order topology) into a topological space X is continuous if and only if $x_n \to x_{+\infty}$ as $n \to \infty$, so one can interpret convergence of sequences as a special case of continuity.


Exercise 1.6.10. Let ω be the first uncountable ordinal, endowed with the order topology. Show that ω is sequentially compact (Hint: Every sequence has a lim sup), but not compact (Hint: Every point has a countable neighbourhood).
Example 1.6.22 (Half-open topology). The right half-open topology $\mathcal{F}_r$ on the real line R is the topology generated by the right half-open intervals [a, b) for −∞ < a < b < ∞; this is a bit finer than the usual topology on R. Observe that a sequence $x_n$ converges to a limit x in the right half-open topology if and only if it converges in the ordinary topology $\mathcal{F}$, and also $x_n \geq x$ for all sufficiently large n. Observe that a map f : R → R is right-continuous iff it is a continuous map from $(\mathbb{R}, \mathcal{F}_r)$ to $(\mathbb{R}, \mathcal{F})$. One can of course model left-continuity via a suitable left half-open topology in a similar fashion.
Example 1.6.23 (Upper topology). The upper topology $\mathcal{F}_u$ on the real line is defined as the topology generated by the sets (a, +∞) for all a ∈ R. Observe that (somewhat confusingly) a function f : R → R is lower semicontinuous iff it is continuous from $(\mathbb{R}, \mathcal{F})$ to $(\mathbb{R}, \mathcal{F}_u)$. One can of course model upper semicontinuity via a suitable lower topology in a similar fashion.
Example 1.6.24 (Product topology). Let $Y^X$ be the space of all functions f : X → Y from a set X to a topological space Y. We define the product topology on $Y^X$ to be the topology generated by the sets $\{f \in Y^X : f(x) \in V\}$ for all x ∈ X and all open V ⊂ Y. Observe that a sequence of functions $f_n : X \to Y$ converges pointwise to a limit f : X → Y iff it converges in the product topology. We will study the product topology in more depth in Section 1.8.3.
Example 1.6.25 (Product topology, again). If $(X, \mathcal{F}_X)$ and $(Y, \mathcal{F}_Y)$ are two topological spaces, we can define the product space $(X \times Y, \mathcal{F}_X \times \mathcal{F}_Y)$ to be the Cartesian product X × Y with the topology generated by the product sets U × V, where U and V are open in X and Y, respectively. Observe that two functions f : Z → X, g : Z → Y from a topological space Z are continuous if and only if their direct sum $f \oplus g : Z \to X \times Y$ is continuous in the product topology, and also that the projection maps $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ are continuous (cf. Exercise 1.6.6).
We mention that not every topological space can be generated from a metric (topological spaces that can be so generated are called metrisable). One important obstruction to metrisability arises from the Hausdorff property:
Definition 1.6.26. A topological space X is said to be a Hausdorff space
if for any two distinct points x, y in X there exist disjoint neighbourhoods
Vx , Vy of x and y, respectively.
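For a finite space described by its open sets, Definition 1.6.26 can be tested by searching for disjoint pairs of neighbourhoods. The Python sketch below (names ours) does this. On finite spaces it only ever returns True for the discrete topology, since in a finite Hausdorff space every singleton is closed, and hence every set, being a finite union of singletons, is closed.

```python
from itertools import combinations

def is_hausdorff(X, opens):
    """True if any two distinct points of the finite set X have disjoint open
    neighbourhoods, where opens lists the open sets of the topology."""
    opens = [frozenset(U) for U in opens]
    return all(
        any(x in U and y in V and not (U & V) for U in opens for V in opens)
        for x, y in combinations(X, 2))

print(is_hausdorff({0, 1}, [set(), {1}, {0, 1}]))           # False (Sierpinski space)
print(is_hausdorff({0, 1}, [set(), {0}, {1}, {0, 1}]))      # True (discrete topology)
```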


Example 1.6.27. Every metric space is Hausdorff (one can use the open
balls B(x, d(x, y)/2) and B(y, d(x, y)/2) as the separating neighbourhoods).
On the other hand, the trivial topology (Example 1.6.12) on two or more
points is not Hausdorff, and neither is the cocountable topology (Remark
1.6.18) on an uncountable set, or the upper topology (Example 1.6.23) on
the real line. Thus, these topologies do not arise from a metric.
Exercise 1.6.11. Show that the half-open topology (Example 1.6.22) is
Hausdorff, but does not arise from a metric. (Hint: Assume for contradiction
that the half-open topology did arise from a metric. Then show that for
every real number x there exists a rational number q and a positive integer
n such that the ball of radius 1/n centred at q has infimum x.) Thus there are
more obstructions to metrisability than just the Hausdorff property; a more
complete answer is provided by Urysohn’s metrisation theorem (Theorem
2.5.7).
Exercise 1.6.12. Show that in a Hausdorff space, any sequence can have at
most one limit. (For a more precise statement, see Exercise 1.6.16 below.)

A homeomorphism (or topological isomorphism) between two topological spaces is a continuous invertible map f : X → Y whose inverse $f^{-1} : Y \to X$ is also continuous. Such a map identifies the topology on X with the topology on Y, and so any topological concept of X will be preserved by f to the corresponding topological concept of Y. For instance, X is compact if and only if Y is compact, X is Hausdorff if and only if Y is Hausdorff, x is adherent to E if and only if f(x) is adherent to f(E), and so forth. When there is a homeomorphism between two topological spaces, we say that X and Y are homeomorphic (or topologically isomorphic).
Example 1.6.28. The tangent function is a homeomorphism between
(−π/2, π/2) and R (with the usual topologies), and thus it preserves all
topological structures on these two spaces. Note however that the former
space is bounded as a metric space while the latter is not, and the latter
is complete while the former is not. Thus metric properties such as bound-
edness or completeness are not purely topological properties, since they are
not preserved by homeomorphisms.
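As a small numerical illustration of Example 1.6.28 (a sketch only; the sequence x_n = π/2 − 1/n and the helper name tan_cauchy_demo are choices made here), one can watch a Cauchy sequence in (−π/2, π/2), which has no limit in that space, be carried by the tangent function to a sequence that grows roughly like n, and so is not Cauchy in R:

```python
import math

def tan_cauchy_demo(n_max):
    """Return x_n = pi/2 - 1/n for n = 1..n_max, together with tan(x_n)."""
    xs = [math.pi / 2 - 1 / n for n in range(1, n_max + 1)]
    ys = [math.tan(x) for x in xs]
    return xs, ys

xs, ys = tan_cauchy_demo(1000)
# |x_m - x_n| = |1/n - 1/m| <= 1/min(m, n), so the x_n are Cauchy, but their
# would-be limit pi/2 is not in (-pi/2, pi/2). Meanwhile tan(x_n) = cot(1/n) > n - 1.
print(xs[-1] - xs[499])        # x_1000 - x_500 = 0.001 (up to rounding)
print(ys[9], ys[99], ys[999])  # roughly 10, 100, 1000
```

The same homeomorphism thus sends a bounded, incomplete metric space onto an unbounded, complete one, which is the point of the example.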

1.6.3. Nets (optional). A sequence (x_n)_{n=1}^∞ in a space X can be viewed
as a function from the natural numbers N to X. We can generalise this
concept as follows.
Definition 1.6.29 (Nets). A net in a space X is a tuple (x_α)_{α∈A}, where
A = (A, <) is a directed set (i.e., a partially ordered set such that any two
elements have at least one upper bound) and x_α ∈ X for each α ∈ A. We
say that a statement P(α) holds for sufficiently large α in a directed set A
if there exists β ∈ A such that P(α) holds for all α ≥ β. (Note in particular
that if P(α) and Q(α) separately hold for sufficiently large α, then their
conjunction P(α) ∧ Q(α) also holds for sufficiently large α.)
A net (x_α)_{α∈A} in a topological space X is said to converge to a limit
x ∈ X if for every neighbourhood V of x, we have x_α ∈ V for all sufficiently
large α.
A subnet of a net (x_α)_{α∈A} is a tuple of the form (x_{φ(β)})_{β∈B}, where
(B, <) is another directed set, and φ : B → A is a monotone map (thus
φ(β') ≥ φ(β) whenever β' ≥ β) which also has cofinal image. This means
that for any α ∈ A there exists β ∈ B with φ(β) ≥ α (in particular, if P(α)
is true for sufficiently large α, then P(φ(β)) is true for sufficiently large β).
Remark 1.6.30. Every sequence is a net, but one can create nets that do
not arise from sequences (in particular, one can take A to be uncountable).
Note a subtlety in the definition of a subnet—we do not require φ to be
injective, so B can in fact be larger than A! Thus subnets differ a little bit
from subsequences in that they allow repetitions.
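To make the notion of a directed set in Definition 1.6.29 concrete, here is a small Python sketch (the helper names is_directed and eventually are hypothetical, introduced only for this illustration). It checks on a finite poset that every pair of elements has an upper bound, and it tests the "sufficiently large" condition. The divisors of 360 under divisibility form a directed set that is not totally ordered (8 and 9 are incomparable, with upper bound lcm(8, 9) = 72), whereas {1, ..., 12} under divisibility is not directed. A finite directed set always has a largest element, so this only exercises the definitions; the interesting nets are indexed by infinite directed sets.

```python
def divisors(m):
    """Positive divisors of m, in increasing order."""
    return [d for d in range(1, m + 1) if m % d == 0]

def divides(a, b):
    """The partial order a <= b meaning 'a divides b'."""
    return b % a == 0

def is_directed(elements, leq):
    """True if every pair a, b in `elements` has an upper bound c in `elements`."""
    return all(
        any(leq(a, c) and leq(b, c) for c in elements)
        for a in elements
        for b in elements
    )

def eventually(elements, leq, pred):
    """True if there is some beta with pred(alpha) for every alpha >= beta."""
    return any(
        all(pred(alpha) for alpha in elements if leq(beta, alpha))
        for beta in elements
    )

D = divisors(360)
print(is_directed(D, divides))                       # True
print(is_directed(list(range(1, 13)), divides))      # False: 7 and 11 have no common multiple <= 12
print(eventually(D, divides, lambda a: a % 8 == 0))  # True: take beta = 8
```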
Remark 1.6.31. Given a directed set A, one can endow A∪{+∞} with the
topology generated by the singleton sets {α} with α ∈ A together with the
sets [α, +∞] := {β ∈ A∪{+∞} : β ≥ α} for α ∈ A, with the convention that
+∞ > α for all α ∈ A. The property of being directed is precisely saying
that these sets form a base. A net (x_α)_{α∈A} converges to a limit x_{+∞} if and
only if the function α ↦ x_α is continuous on A∪{+∞} (cf. Example 1.6.21).
Also, if (x_{φ(β)})_{β∈B} is a subnet of (x_α)_{α∈A}, then φ is a continuous map from
B ∪ {+∞} to A ∪ {+∞}, if we adopt the convention that φ(+∞) = +∞.
In particular, a subnet of a convergent net remains convergent to the same
limit.

The point of working with nets instead of sequences is that one no longer
needs to worry about the distinction between sequential and non-sequential
concepts in topology, as the following exercises show.
Exercise 1.6.13. Let X be a topological space, let E be a subset of X, and
let x be an element of X. Show that x is an adherent point of E if and only
if there exists a net (x_α)_{α∈A} in E that converges to x. (Hint: Take A to be
the directed set of neighbourhoods of x, ordered by reverse set inclusion.)
Exercise 1.6.14. Let f : X → Y be a map between two topological spaces.
Show that f is continuous if and only if for every net (x_α)_{α∈A} in X that
converges to a limit x, the net (f(x_α))_{α∈A} converges in Y to f(x).
Exercise 1.6.15. Let X be a topological space. Show that X is compact if
and only if every net has a convergent subnet. (Hint: Equate both properties
of X with the finite intersection property, and review the proof of Theorem
1.6.8.) Similarly, show that a subset E of X is relatively compact if and only
if every net in E has a subnet that converges in X. (Note that as not every
compact space is sequentially compact, this exercise shows that we cannot
enforce injectivity of φ in the definition of a subnet.)
Exercise 1.6.16. Show that a space is Hausdorff if and only if every net
has at most one limit.
Exercise 1.6.17. In the product space Y^X in Example 1.6.24, show that a
net (f_α)_{α∈A} converges in Y^X to f ∈ Y^X if and only if for every x ∈ X, the
net (f_α(x))_{α∈A} converges in Y to f(x).

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/30.
Thanks to Franciscus Rebro, johan, Josh Zahl, Xiaochuan Liu, and anonymous
commenters for corrections.
An anonymous commenter pointed out that while the real line can be
viewed very naturally as the metric completion of the rationals, this cannot
quite be used to give a definition of the real numbers, because the notion of a
metric itself requires the real numbers in its definition! However, K. P. Hart
noted that Bourbaki resolves this problem by defining the reals as the com-
pletion of the rationals as a uniform space rather than as a metric space.

Section 1.7

The Baire category theorem and its Banach space consequences

The notion of what it means for a subset E of a space X to be small
varies from context to context. For instance, in measure theory, when
X = (X, 𝒳, μ) is a measure space, one useful notion of a small set is that of
a null set: a set E of measure zero (or at least contained in a set of measure
zero). By countable additivity, countable unions of null sets are null. Taking
contrapositives, we obtain
Lemma 1.7.1 (Pigeonhole principle for measure spaces). Let E_1, E_2, ...
be an at most countable sequence of measurable subsets of a measure space
X. If ⋃_n E_n has positive measure, then at least one of the E_n has positive
measure.

Now suppose that X was a Euclidean space R^d with Lebesgue measure
m. The Lebesgue differentiation theorem easily implies that having positive
measure is equivalent to being dense in certain balls:
Proposition 1.7.2. Let E be a measurable subset of R^d. Then the following
are equivalent:
• E has positive measure.
• For any ε > 0, there exists a ball B such that m(E ∩ B) ≥
(1 − ε)m(B).

Thus one can think of a null set as a set which is nowhere dense in some
measure-theoretic sense.
It turns out that there are analogues of these results when the measure
space X = (X, 𝒳, μ) is instead replaced by a complete metric space X =
(X, d). Here, the appropriate notion of a small set is not a null set, but
rather that of a nowhere dense set: a set E which is not dense in any ball,
or equivalently a set whose closure has empty interior. (A good example of
a nowhere dense set would be a proper subspace, or a lower-dimensional smooth
submanifold, of R^d, or a Cantor set; on the other hand, the rationals are a dense subset of R
and thus clearly not nowhere dense.) We then have the following important
result:
Theorem 1.7.3 (Baire category theorem). Let E_1, E_2, ... be an at most
countable sequence of subsets of a complete metric space X. If ⋃_n E_n
contains a ball B, then at least one of the E_n is dense in a subball B' of B
(and in particular is not nowhere dense). To put it in the contrapositive:
the countable union of nowhere dense sets cannot contain a ball.
Exercise 1.7.1. Show that the Baire category theorem is equivalent to the
claim that in a complete metric space, the countable intersection of open
dense sets remains dense.
Exercise 1.7.2. Using the Baire category theorem, show that any non-
empty complete metric space without isolated points is uncountable. (In
particular, this shows that the Baire category theorem can fail for incomplete
metric spaces such as the rationals Q.)
To quickly illustrate an application of the Baire category theorem, observe
that it implies that one cannot cover a finite-dimensional real or complex
vector space R^n, C^n by a countable number of proper subspaces. One
can of course also establish this fact by using Lebesgue measure on this
space. However, the advantage of the Baire category approach is that it
also works well in infinite dimensional complete normed vector spaces, i.e.,
Banach spaces, whereas the measure-theoretic approach runs into significant
difficulties in infinite dimensions. This leads to three fundamental equivalences
between the qualitative theory of continuous linear operators on Banach
spaces (e.g., finiteness, surjectivity, etc.) and the quantitative theory
(i.e., estimates):
• The uniform boundedness principle that equates the qualitative
boundedness (or convergence) of a family of continuous operators
with their quantitative boundedness.
• The open mapping theorem that equates the qualitative solvability
of a linear problem Lu = f with the quantitative solvability of that
problem.
• The closed graph theorem that equates the qualitative regularity of
a (weakly continuous) operator T with the quantitative regularity of
that operator.

Strictly speaking, these theorems are not used much directly in practice,
because one usually works in the reverse direction (i.e., first proving quan-
titative bounds, and then deriving qualitative corollaries). But the above
three theorems help explain why we usually approach qualitative problems
in functional analysis via their quantitative counterparts.
Let us first prove the Baire category theorem:

Proof of Baire category theorem. Assume that the Baire category theorem
failed; then it would be possible to cover a ball B(x_0, r_0) in a complete
metric space by a countable family E_1, E_2, E_3, ... of nowhere dense sets.
We now invoke the following easy observation: if E is nowhere dense,
then every ball B contains a subball B' which is disjoint from E. Indeed,
this follows immediately from the definition of a nowhere dense set.
Invoking this observation, we can find a ball B(x_1, r_1) in B(x_0, r_0/10)
(say) which is disjoint from E_1; we may also assume that r_1 ≤ r_0/10 by
shrinking r_1 as necessary. Then, inside B(x_1, r_1/10), we can find a ball
B(x_2, r_2) which is also disjoint from E_2, with r_2 ≤ r_1/10. Continuing this
process, we end up with a nested sequence of balls B(x_n, r_n), each of which
is disjoint from E_1, ..., E_n, and such that B(x_n, r_n) ⊂ B(x_{n−1}, r_{n−1}/10)
and r_n ≤ r_{n−1}/10 for all n = 1, 2, ....
From the triangle inequality we have d(x_n, x_{n−1}) ≤ 2r_{n−1}/10 ≤ 2 × 10^{−n} r_0,
and so the sequence x_n is a Cauchy sequence. As X is complete,
x_n converges to a limit x. Summing the geometric series, one verifies that
x ∈ B(x_{n−1}, r_{n−1}) for all n = 1, 2, ..., and in particular x is an element of
B which avoids all of E1 , E2 , E3 , . . ., a contradiction. 
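The nested-ball construction in the proof can be run by hand in a simple case. The Python sketch below is an illustration under assumptions chosen here, not part of the proof: X = R, the starting ball is B(1/2, 1/2) = (0, 1), and the nowhere dense sets are singletons E_n = {p_n} taken from a finite list of rationals (a truncation of a countable family). At each step it looks inside B(x_{n−1}, r_{n−1}/10) for one of two candidate subballs that avoids p_n; exact rational arithmetic keeps the containment checks honest. The last centre lies in every ball of the chain, and so avoids every p_n.

```python
from fractions import Fraction

def nested_balls(points, x0=Fraction(1, 2), r0=Fraction(1, 2)):
    """Balls B(x_n, r_n) inside B(x_{n-1}, r_{n-1}/10), the n-th one avoiding points[n-1]."""
    balls = [(x0, r0)]
    x, r = x0, r0
    for p in points:
        s = r / 10
        # Two candidate subballs of radius s/4, centred at x - s/2 and x + s/2.
        # Each lies inside B(x, s), since s/2 + s/4 < s; their centres are s apart,
        # so p cannot be within s/4 of both, and one of them avoids p.
        for c in (x - s / 2, x + s / 2):
            if abs(c - p) >= s / 4:  # the open ball B(c, s/4) does not contain p
                x, r = c, s / 4
                break
        balls.append((x, r))
    return balls

# A finite list of rationals in [0, 1] (repeats are harmless).
points = [Fraction(a, q) for q in range(1, 9) for a in range(q + 1)]
balls = nested_balls(points)
x_last = balls[-1][0]
print(float(x_last), x_last in points)  # the final centre is not in the list
```

Each radius here is r_{n−1}/40, which satisfies the requirement r_n ≤ r_{n−1}/10 from the proof; any other admissible choice of subball would do equally well.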

We can illustrate the analogy between the Baire category theorem and
the measure-theoretic analogs by introducing some further definitions. Call
a set E meager or of the first category if it can be expressed as (or covered
by) a countable union of nowhere dense sets, and of the second category if it
is not meager. Thus, the Baire category theorem shows that any subset of
a complete metric space with non-empty interior is of the second category,
which may help explain the name for the property. Call a set co-meager or
residual if its complement is meager, and call a set Baire or almost open if it
differs from an open set by a meager set (note that a Baire set is unrelated to
the Baire σ-algebra). Then we have the following analogy between complete
metric space topology and measure theory:

Complete non-empty metric space X    Measure space X of positive measure
first category (meager)              zero measure (null)
second category                      positive measure
residual (co-meager)                 full measure (co-null)
Baire                                measurable

Nowhere dense sets are meager, and meager sets have empty interior.
Contrapositively, sets with non-empty interior are of the second category,
and sets of the second category are somewhere dense. Taking complements
instead of contrapositives, we see that open dense sets are co-meager, and
co-meager sets are dense.
While there are certainly many analogies between meager sets and null
sets (for instance, both classes are closed under countable unions or under
intersections with arbitrary sets), the two concepts can differ in practice.
For instance, in the real line R with the standard metric and measure space
structures, the set

(1.61)    \bigcup_{n=1}^{\infty} (q_n - 2^{-n}, q_n + 2^{-n}),

where q_1, q_2, ... is an enumeration of the rationals, is open and dense but
has Lebesgue measure at most 2; thus its complement has infinite measure
in R but is nowhere dense (hence meager). As a variant of this, the set

(1.62)    \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} (q_n - 2^{-n}/m, q_n + 2^{-n}/m)

is a null set but is the intersection of countably many open dense sets and
is thus co-meager.
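The measure bounds for (1.61) and (1.62) come from countable subadditivity, and they can be checked exactly on finite truncations. In the Python sketch below, the particular enumeration of the rationals and the helper names are assumptions made for illustration (any enumeration works). The exact length of the union of the first N intervals is computed by merging overlapping intervals with rational arithmetic; it stays below 2, and below 2/m for the m-th set in the intersection (1.62).

```python
from fractions import Fraction

def enumerate_rationals(count):
    """First `count` rationals in one fixed enumeration: by height h, list a/q with q, |a| <= h."""
    out, seen, h = [], set(), 1
    while len(out) < count:
        for q in range(1, h + 1):
            for a in range(-h, h + 1):
                x = Fraction(a, q)
                if x not in seen:
                    seen.add(x)
                    out.append(x)
        h += 1
    return out[:count]

def union_length(intervals):
    """Exact Lebesgue measure of a finite union of intervals (lo, hi)."""
    total, cur_lo, cur_hi = Fraction(0), None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:  # disjoint from the current block
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                              # overlaps: extend the current block
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

def truncated_measure(N, m=1):
    """Measure of the union over n <= N of (q_n - 2^-n/m, q_n + 2^-n/m)."""
    qs = enumerate_rationals(N)
    return union_length([(q - Fraction(1, m * 2**n), q + Fraction(1, m * 2**n))
                         for n, q in enumerate(qs, start=1)])

for m in (1, 2, 10):
    print(m, float(truncated_measure(40, m)))
```

Density of (1.61) and the passage to N = ∞ are of course not something a finite computation can show; the sketch only confirms the arithmetic behind "at most 2" and "null".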
Exercise 1.7.3. A real number x is Diophantine if for every ε > 0 there
exists c_ε > 0 such that |x − a/q| ≥ c_ε/|q|^{2+ε} for every rational number a/q. Show
that the set of Diophantine real numbers has full measure but is meager.
Remark 1.7.4. If one assumes some additional axioms of set theory (e.g.,
the continuum hypothesis), it is possible to show that the collection of mea-
ger subsets of R and the collection of null subsets of R (viewed as σ-ideals of
the collection of all subsets of R) are isomorphic; this is the Sierpinski-Erdős
theorem, which we will not prove here. Roughly speaking, this theorem tells
us that any effective first-order statement which is true about meager sets
will also be true about null sets, and conversely.

1.7.1. The uniform boundedness principle. As mentioned in the introduction,
the Baire category theorem implies various equivalences between
qualitative and quantitative properties of linear transformations between
Banach spaces. Note that Lemma 1.3.17 already gave a prototypical such
equivalence between a qualitative property (continuity) and a quantitative
one (boundedness).

Theorem 1.7.5 (Uniform boundedness principle). Let X be a Banach
space, let Y be a normed vector space, and let (T_α)_{α∈A} be a family of
continuous linear operators T_α : X → Y. Then the following are equivalent:
(i) Pointwise boundedness. For every x ∈ X, the set {T_α x : α ∈ A} is
bounded.
(ii) Uniform boundedness. The operator norms {‖T_α‖_op : α ∈ A} are
bounded.

The uniform boundedness principle is also known as the Banach-Steinhaus
theorem.

Proof. It is clear that (ii) implies (i); now assume (i) holds and let us obtain
(ii).
For each n = 1, 2, ..., let E_n be the set
(1.63) E_n := {x ∈ X : ‖T_α x‖_Y ≤ n for all α ∈ A}.
The hypothesis (i) is nothing more than the assertion that the E_n cover X,
and thus by the Baire category theorem one of the E_n must be dense in a ball.
Since the T_α are continuous, the E_n are closed, and so one of the E_n contains a ball.
Since E_n − E_n ⊂ E_{2n}, we see that one of the E_n contains a ball centred at
the origin. Using homogeneity and enlarging n as necessary, we see that one of the E_n contains the
unit ball B(0, 1). But then all the ‖T_α‖_op are bounded by n, and the claim
follows.

Exercise 1.7.4. Give counterexamples to show that the uniform boundedness
principle fails when one relaxes the assumptions in any of the following
ways:
• X is merely a normed vector space rather than a Banach space (i.e.,
completeness is dropped).
• The Tα are not assumed to be continuous.
• The Tα are allowed to be non-linear rather than linear.
Thus completeness, continuity, and linearity are all essential for the uniform
boundedness principle to apply.

Remark 1.7.6. It is instructive to establish the uniform boundedness principle
more “constructively” without the Baire category theorem (though the
proof of the Baire category theorem is still implicitly present), as follows.
Suppose that (ii) fails, so that the ‖T_α‖_op are unbounded. We can then find a
sequence α_n ∈ A such that ‖T_{α_{n+1}}‖_op > 100^n ‖T_{α_n}‖_op (say) for all n. We can
then find unit vectors x_n such that ‖T_{α_n} x_n‖_Y ≥ (1/2)‖T_{α_n}‖_op.
We can then form the absolutely convergent (and hence conditionally
convergent, by completeness) sum x = Σ_{n=1}^∞ ε_n 10^{−n} x_n for some choice of
signs ε_n = ±1, made recursively as follows: once ε_1, ..., ε_{n−1} have been chosen,
choose the sign ε_n so that

(1.64)    \Big\| \sum_{m=1}^{n} \epsilon_m 10^{-m} T_{\alpha_n} x_m \Big\|_Y \ge 10^{-n} \|T_{\alpha_n} x_n\|_Y \ge \frac{1}{2} 10^{-n} \|T_{\alpha_n}\|_{op}.

From the triangle inequality we soon conclude that

(1.65)    \|T_{\alpha_n} x\|_Y \ge \frac{1}{4} 10^{-n} \|T_{\alpha_n}\|_{op}.
But by hypothesis, the right-hand side of (1.65) is unbounded in n, contra-
dicting (i).
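For readers who want the omitted steps of Remark 1.7.6 spelled out, here is one way to fill them in (a worked derivation under the choices made there: ‖x_m‖_X = 1 and ε_m = ±1). The sign choice rests on the elementary fact that for any vectors u, v in a normed space, 2‖v‖ = ‖(u + v) − (u − v)‖ ≤ ‖u + v‖ + ‖u − v‖, so one of the two signs gives ‖u ± v‖ ≥ ‖v‖; apply this with u = T_{α_n}(Σ_{m<n} ε_m 10^{−m} x_m) and v = 10^{−n} T_{α_n} x_n to get ‖Σ_{m=1}^{n} ε_m 10^{−m} T_{α_n} x_m‖_Y ≥ (1/2) 10^{−n} ‖T_{α_n}‖_op. The tail of the series is then controlled by the operator norm:

```latex
\|T_{\alpha_n} x\|_Y
  \ge \Big\| \sum_{m=1}^{n} \epsilon_m 10^{-m} T_{\alpha_n} x_m \Big\|_Y
      - \sum_{m=n+1}^{\infty} 10^{-m} \|T_{\alpha_n}\|_{op} \|x_m\|_X
  \ge \frac{1}{2} 10^{-n} \|T_{\alpha_n}\|_{op} - \frac{1}{9} 10^{-n} \|T_{\alpha_n}\|_{op}
  \ge \frac{1}{4} 10^{-n} \|T_{\alpha_n}\|_{op},
```

using Σ_{m>n} 10^{−m} = 10^{−n}/9 and 1/2 − 1/9 = 7/18 ≥ 1/4. Finally, the growth condition ‖T_{α_{n+1}}‖_op > 100^n ‖T_{α_n}‖_op gives 10^{−(n+1)}‖T_{α_{n+1}}‖_op > 10^{n−1} · 10^{−n}‖T_{α_n}‖_op, and ‖T_{α_n}‖_op > 0 for n ≥ 2, so the quantities 10^{−n}‖T_{α_n}‖_op are indeed unbounded.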

A common way to apply the uniform boundedness principle is via the
following corollary:
Corollary 1.7.7 (Uniform boundedness principle for norm convergence).
Let X and Y be Banach spaces, and let (T_n)_{n=1}^∞
be a family of continuous linear operators T_n : X → Y. Then the following
are equivalent:
(i) Pointwise convergence. For every x ∈ X, T_n x converges strongly in
Y as n → ∞.
(ii) Pointwise convergence to a continuous limit. There exists a continuous
linear T : X → Y such that for every x ∈ X, T_n x converges
strongly in Y to Tx as n → ∞.
(iii) Uniform boundedness + dense subclass convergence. The operator
norms {‖T_n‖_op : n = 1, 2, ...} are bounded, and for a dense set of x in
X, T_n x converges strongly in Y as n → ∞.

Proof. Clearly (ii) implies (i), and as convergent sequences are bounded,
we see from Theorem 1.7.5 that (i) implies (iii). The implication of (ii) from
(iii) follows by a standard limiting argument and is left as an exercise.
Remark 1.7.8. The same equivalences hold if one replaces the sequence
(T_n)_{n=1}^∞ by a net (T_α)_{α∈A}.

Example 1.7.9 (Fourier inversion formula). For any f ∈ L^2(R) and N > 0,
define the Dirichlet summation operator

(1.66)    S_N f(x) := \int_{-N}^{N} \hat{f}(\xi) e^{2\pi i x \xi} \, d\xi,

where f̂ is the Fourier transform of f, defined on smooth compactly supported
functions f ∈ C_0^∞(R) by the formula f̂(ξ) := ∫_{−∞}^{∞} f(x) e^{−2πixξ} dx
and then extended to L^2 by the Plancherel theorem (see Section 1.12). Using
the Plancherel identity, we can verify that the operator norms ‖S_N‖_op
are uniformly bounded (indeed, they are all 1); also, one can check that for
f ∈ C_0^∞(R), S_N f converges in L^2 norm to f as N → ∞. As C_0^∞(R) is
known to be dense in L^2(R), this implies that S_N f converges in L^2 norm
to f for every f ∈ L^2(R).
This argument only used the “easy” implication of Corollary 1.7.7,
namely the deduction of (ii) from (iii). The “hard” implication, using the
Baire category theorem, was not directly utilised. However, from a
metamathematical standpoint, that implication is important because it tells us
that the above strategy to prove convergence in norm of the Fourier inversion
formula on L^2, i.e., to obtain uniform bounds on the operator norms of the partial
sums and to establish convergence on a dense subclass of “nice” functions,
is in some sense the only strategy available to prove such a result.
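The Plancherel identity used in Example 1.7.9 also gives an explicit formula for the error: since the Fourier transform of S_N f − f is −f̂ restricted to |ξ| > N, one has ‖S_N f − f‖²_{L²} = ∫_{|ξ|>N} |f̂(ξ)|² dξ. As a sketch (the choice f(x) = e^{−πx²}, which is a Schwartz function rather than a compactly supported one, and the helper names are assumptions made for illustration), with the convention f̂(ξ) = ∫ f(x) e^{−2πixξ} dx used above this Gaussian satisfies f̂(ξ) = e^{−πξ²}, so the error equals √(erfc(N√(2π))/√2). The Python below evaluates this closed form and cross-checks it against a midpoint-rule quadrature of the tail integral:

```python
import math

def dirichlet_error(N):
    """||S_N f - f||_{L^2} for f(x) = exp(-pi x^2), via Plancherel.

    f-hat(xi) = exp(-pi xi^2), so the squared error is the integral of
    exp(-2 pi xi^2) over |xi| > N, which equals erfc(N sqrt(2 pi)) / sqrt(2).
    """
    return math.sqrt(math.erfc(N * math.sqrt(2 * math.pi)) / math.sqrt(2))

def dirichlet_error_quadrature(N, cutoff=10.0, steps=100_000):
    """Same quantity, with the tail integral over [N, cutoff] done by the midpoint
    rule and doubled by symmetry; the integrand is negligible beyond the cutoff."""
    h = (cutoff - N) / steps
    tail = sum(math.exp(-2 * math.pi * (N + (k + 0.5) * h) ** 2) for k in range(steps))
    return math.sqrt(2 * h * tail)

for N in (0.0, 0.25, 0.5, 1.0):
    print(N, dirichlet_error(N), dirichlet_error_quadrature(N))
```

At N = 0 the operator S_0 is zero, so the error is ‖f‖_{L²} = 2^{−1/4}, which makes a convenient sanity check; the error then decreases rapidly as N grows.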

Remark 1.7.10. There is a partial analogue of Corollary 1.7.7 for the ques-
tion of pointwise almost everywhere convergence rather than norm conver-
gence, which is known as Stein’s maximal principle (discussed, for instance,
in Section 1.9 of Structure and Randomness). For instance, it reduces Car-
leson’s theorem on the pointwise almost everywhere convergence of Fourier
series to the boundedness of a certain maximal function (the Carleson maxi-
mal operator) related to Fourier summation, although the latter task is again
quite non-trivial. (As in Example 1.7.9, the role of the maximal principle is
meta-mathematical rather than direct.)

Remark 1.7.11. Of course, if we omit some of the hypotheses, it is no
longer true that pointwise boundedness and uniform boundedness are the
same. For instance, if we let c_c(N) be the space of complex sequences
with only finitely many non-zero entries and with the uniform topology,
and let λ_n : c_c(N) → C be the map (a_m)_{m=1}^∞ ↦ n a_n, then the λ_n are
pointwise bounded but not uniformly bounded; thus completeness of X is
important. Also, even in the one-dimensional case X = Y = R, the uniform
boundedness principle can easily be seen to fail if the Tα are non-linear
transformations rather than linear ones.
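The counterexample in Remark 1.7.11 can be checked mechanically. In this Python sketch (an illustration only; representing a finitely supported sequence as a dict {index: value} is a choice made here), the supremum of |λ_n(a)| over n is a maximum over the finite support of a, hence finite for each fixed a, while testing λ_n on the unit vector e_n shows ‖λ_n‖_op ≥ n:

```python
def lam(n, a):
    """lambda_n(a) = n * a_n, for a finitely supported sequence a given as {index: value}."""
    return n * a.get(n, 0)

def sup_over_n(a):
    """sup_n |lambda_n(a)|: only indices in the (finite) support of a contribute."""
    return max((abs(n * v) for n, v in a.items()), default=0)

def uniform_norm(a):
    """The uniform (sup) norm on c_c(N)."""
    return max((abs(v) for v in a.values()), default=0)

a = {1: 1.0, 5: -0.5, 10: 0.25}
print(sup_over_n(a))                                # 2.5, attained at n = 5 and n = 10
for n in (1, 10, 100, 1000):
    e_n = {n: 1.0}                                  # unit vector: uniform norm 1
    print(n, abs(lam(n, e_n)) / uniform_norm(e_n))  # equals n
```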

1.7.2. The open mapping theorem. A map f : X → Y between topological
spaces X and Y is said to be open if it maps open sets to open sets.
This is similar to, but slightly different from, the more familiar property
of being continuous, which is equivalent to the inverse image of open sets
being open. For instance, the map f : R → R defined by f(x) := x² is
continuous but not open; conversely, the function g : R² → R defined by
g(x, y) := sgn(y) + x is discontinuous but open.
We have just seen that it is quite possible for non-linear continuous maps
to fail to be open. But for linear maps between Banach spaces, the situation
is much better:

Theorem 1.7.12 (Open mapping theorem). Let L : X → Y be a continuous
linear transformation between two Banach spaces X and Y. Then the
following are equivalent:

(i) L is surjective.
(ii) L is open.
(iii) Qualitative solvability. For every f ∈ Y there exists a solution u ∈ X
to the equation Lu = f.
(iv) Quantitative solvability. There exists a constant C > 0 such that for
every f ∈ Y there exists a solution u ∈ X to the equation Lu = f,
which obeys the bound ‖u‖_X ≤ C‖f‖_Y.
(v) Quantitative solvability for a dense subclass. There exists a constant
C > 0 such that for a dense set of f in Y, there exists a solution u ∈
X to the equation Lu = f, which obeys the bound ‖u‖_X ≤ C‖f‖_Y.

Proof. Clearly (iv) implies (iii), which is equivalent to (i), and it is easy to
see from linearity that (ii) and (iv) are equivalent (cf. the proof of Lemma
1.3.17). (iv) trivially implies (v), while to conversely obtain (iv) from (v),
observe that if E is any dense subset of the Banach space Y, then any f in Y
can be expressed as an absolutely convergent series f = Σ_n f_n of elements
in E (since one can iteratively approximate the residual f − Σ_{n=1}^{N−1} f_n to
arbitrary accuracy by an element of E for N = 1, 2, 3, ...), and the claim
easily follows. So it suffices to show that (iii) implies (iv).
For each n, let E_n ⊂ Y be the set of all f ∈ Y for which there exists a
solution to Lu = f with ‖u‖_X ≤ n‖f‖_Y. From the hypothesis (iii), we see
that ⋃_n E_n = Y. Since Y is complete, the Baire category theorem implies
that there is some E_n which is dense in some ball B(f_0, r) in Y. In other
words, the problem Lu = f is approximately quantitatively solvable in the
ball B(f_0, r) in the sense that for every ε > 0 and every f ∈ B(f_0, r), there
exists an approximate solution u with ‖Lu − f‖_Y ≤ ε and ‖u‖_X ≤ n‖Lu‖_Y,
and thus ‖u‖_X ≤ n(R + ε), where R := ‖f_0‖_Y + r.
By subtracting two such approximate solutions (writing f ∈ B(0, 2r) as the
difference of f_0 + f/2 and f_0 − f/2, both of which lie in B(f_0, r)), we
conclude that for any f ∈ B(0, 2r) and any ε > 0, there exists u ∈ X with
‖Lu − f‖_Y ≤ 2ε and ‖u‖_X ≤ 2nR + 2nε.

Since L is homogeneous, we can rescale and conclude that for any f ∈ Y
and any ε > 0 there exists u ∈ X with ‖Lu − f‖_Y ≤ 2ε and
‖u‖_X ≤ 2K‖f‖_Y + 2Kε, where K := nR/r ≥ n.
In particular, setting ε = (1/4)‖f‖_Y (treating the case f = 0 separately),
we conclude that for any f ∈ Y, we may write f = Lu + f', where
‖f'‖_Y ≤ (1/2)‖f‖_Y and ‖u‖_X ≤ (5/2)K‖f‖_Y.

We can iterate this procedure and then take limits (now using the completeness
of X rather than Y) to obtain a solution to Lu = f for every
f ∈ Y with ‖u‖_X ≤ 5K‖f‖_Y, and the claim follows.
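The final iteration step of the proof can be sketched concretely. The Python below is an illustration only: the "approximate solver" is a hypothetical stand-in. Here L is a 2 × 2 matrix of the form I + K with ‖K‖ = 1/4 in the sup norm, and the solver simply returns u = g, so ‖Lu − g‖ = ‖Kg‖ ≤ ‖g‖/4 and ‖u‖ = ‖g‖. Summing the corrections u_0 + u_1 + ... while updating the residual f_{k+1} = f_k − Lu_k produces an exact solution in the limit, with ‖u‖ controlled by a geometric series. In infinite dimensions it is exactly this summation that uses the completeness of X.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def sup_norm(x):
    return max(abs(t) for t in x)

def solve_by_iteration(L, approx_solve, f, steps=60):
    """Accumulate u = u_0 + u_1 + ... where u_k = approx_solve(f_k), f_0 = f, f_{k+1} = f_k - L u_k.

    If every call satisfies ||L u_k - f_k|| <= ||f_k|| / 2 and ||u_k|| <= C ||f_k||,
    then ||f_k|| <= 2^{-k} ||f|| and ||u|| <= 2 C ||f||.
    Returns (u, final residual f - L u).
    """
    u = [0.0] * len(f)
    residual = list(f)
    for _ in range(steps):
        du = approx_solve(residual)
        u = [a + b for a, b in zip(u, du)]
        residual = [r - l for r, l in zip(residual, matvec(L, du))]
    return u, residual

# Hypothetical example: L = I + K, where K = [[0, 0.25], [0.1, 0]] has sup-norm operator norm 0.25.
L = [[1.0, 0.25], [0.1, 1.0]]
f = [1.0, -2.0]
u, res = solve_by_iteration(L, lambda g: list(g), f)
print(u, sup_norm(res))
```

With C = 1 in this toy example, the bound ‖u‖ ≤ 2C‖f‖ mirrors the way the constant (5/2)K in the proof doubles to 5K after the iteration.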
Remark 1.7.13. The open mapping theorem provides metamathematical
justification for the method of a priori estimates for solving linear equations
such as Lu = f for a given datum f ∈ Y and for an unknown u ∈ X, which
is of course a familiar problem in linear PDE. The a priori method assumes
that f is in some dense class of nice functions (e.g., smooth functions) in
which solvability of Lu = f is presumably easy, and then proceeds to obtain
the a priori estimate ‖u‖_X ≤ C‖f‖_Y for some constant C. Theorem 1.7.12
then ensures that Lu = f is solvable for all f in Y (with a similar bound).
As before, this implication does not directly use the Baire category theorem,
but that theorem helps explain why this method is not wasteful.
A pleasant corollary of the open mapping theorem is that, as with or-
dinary linear algebra or with arbitrary functions, invertibility is the same
thing as bijectivity:
Corollary 1.7.14. Let T : X → Y be a continuous linear operator between
two Banach spaces X, Y . Then the following are equivalent:
• Qualitative invertibility. T is bijective.
• Quantitative invertibility. T is bijective, and T^{−1} : Y → X is a
continuous (hence bounded) linear transformation.
Remark 1.7.15. The claim fails without the completeness hypotheses on
X and Y. For instance, consider the operator T : c_c(N) → c_c(N) defined
by T(a_n)_{n=1}^∞ := (a_n/n)_{n=1}^∞, where we give c_c(N) the uniform norm. Then T is
continuous and bijective, but T^{−1} is unbounded.
Exercise 1.7.5. Show that Corollary 1.7.14 can still fail if we drop the
completeness hypothesis on just X or just Y .
Exercise 1.7.6. Suppose that L : X → Y is a surjective continuous linear
transformation between Banach spaces. By combining the open mapping
theorem with the Hahn-Banach theorem, show that the transpose map
L^* : Y^* → X^* is bounded from below, i.e., there exists c > 0 such that
‖L^*λ‖_{X^*} ≥ c‖λ‖_{Y^*} for all λ ∈ Y^*. Conclude that L^* is an isomorphism
between Y^* and L^*(Y^*).

Let L be as in Theorem 1.7.12, so that the problem Lu = f is both
qualitatively and quantitatively solvable. A standard application of Zorn’s
lemma (similar to that used to prove the Hahn-Banach theorem) shows that
the problem Lu = f is also qualitatively linearly solvable, in the sense that
there exists a linear transformation S : Y → X such that LSf = f for all
f ∈ Y (i.e., S is a right-inverse of L). In view of the open mapping theorem,
it is then tempting to conjecture that L must also be quantitatively linearly
solvable, in the sense that there exists a continuous linear transformation
S : Y → X such that LSf = f for all f ∈ Y . By Corollary 1.7.14, we see that
this conjecture is true when the problem Lu = f is determined, i.e., there is
exactly one solution u for each datum f . Unfortunately, the conjecture can
fail when Lu = f is underdetermined (more than one solution u for each f );
we discuss this in Section 1.7.4. On the other hand, the situation is much
better for Hilbert spaces:
Exercise 1.7.7. Suppose that L : H → H' is a surjective continuous linear
transformation between Hilbert spaces. Show that there exists a continuous
linear transformation S : H' → H such that LS = I. Furthermore, show
that we can ensure that the range of S is orthogonal to the kernel of L, and
that this condition determines S uniquely.
Remark 1.7.16. In fact, Hilbert spaces are essentially the only type of
Banach space for which we have this nice property, due to the Lindenstrauss-
Tzafriri solution [LiTz1971] of the complemented subspaces problem.
Exercise 1.7.8. Let M and N be closed subspaces of a Banach space X.
Show that the following statements are equivalent:
(i) Qualitative complementation. Every x in X can be expressed in the
form m + n for m ∈ M, n ∈ N in exactly one way.
(ii) Quantitative complementation. Every x in X can be expressed in the
form m + n for m ∈ M, n ∈ N in exactly one way. Furthermore there
exists C > 0 such that ‖m‖_X, ‖n‖_X ≤ C‖x‖_X for all x.
When either of these two properties holds, we say that M (or N) is a
complemented subspace, and that N is a complement of M (or vice versa).
The property of being complemented is closely related to that of quan-
titative linear solvability:
Exercise 1.7.9. Let L : X → Y be a surjective bounded linear map between
Banach spaces. Show that there exists a bounded linear map S : Y → X
such that LSf = f for all f ∈ Y if and only if the kernel {u ∈ X : Lu = 0}
is a complemented subspace of X.
Exercise 1.7.10. Show that any finite-dimensional or closed finite co-di-
mensional subspace of a Banach space is complemented.

Remark 1.7.17. The problem of determining whether a given closed subspace
of a Banach space is complemented or not is, in general, quite difficult.
However, non-complemented subspaces do exist in abundance; some
examples are given in the appendix, and the Lindenstrauss-Tzafriri theorem
[LiTz1971] asserts that any Banach space not isomorphic to a Hilbert
space contains at least one non-complemented subspace. There is also a
remarkable construction of Gowers and Maurey [Go1993] of a Banach space
such that every closed subspace, other than those ruled out by Exercise 1.7.10, is
uncomplemented.

1.7.3. The closed graph theorem. Recall that a map T : X → Y between
two metric spaces is continuous if and only if, whenever x_n converges
to x in X, Tx_n converges to Tx in Y. We can also define the weaker property
of being closed: a map T : X → Y is closed if and only if whenever x_n
converges to x in X and Tx_n converges to a limit y in Y, then y is equal to
Tx; equivalently, T is closed if its graph {(x, Tx) : x ∈ X} is a closed subset
of X × Y. This is weaker than continuity because it additionally assumes
that the sequence Tx_n is already convergent. (Despite the name,
closed operators are not directly related to open operators.)
Example 1.7.18. Let T : c_c(N) → c_c(N) be the transformation
T(a_m)_{m=1}^∞ := (m a_m)_{m=1}^∞, where c_c(N) is given the uniform norm.
This transformation is unbounded and hence discontinuous,
but one easily verifies that it is closed.

As Example 1.7.18 shows, being closed is often a weaker property than
being continuous. However, the remarkable closed graph theorem shows that
as long as the domain and range of the operator are both Banach spaces,
the two statements are equivalent:
Theorem 1.7.19 (Closed graph theorem). Let T : X → Y be a linear
transformation between two Banach spaces. Then the following are equiva-
lent:
(i) T is continuous.
(ii) T is closed.
(iii) Weak continuity. There exists some topology F on Y , weaker than the
norm topology (i.e., containing fewer open sets) but still Hausdorff,
for which T : X → (Y, F ) is continuous.

Proof. It is clear that (i) implies (iii) (just take F to equal the norm topol-
ogy). To see why (iii) implies (ii), observe that if xn → x in X and T xn → y
in norm, then T xn → y in the weaker topology F as well; but by weak
continuity T xn → T x in F . Since Hausdorff topological spaces have unique
limits, we have T x = y and so T is closed.

Now we show that (ii) implies (i). If T is closed, then the graph Γ := {(x, T x) : x ∈ X} is a closed linear subspace of the Banach space X × Y and is thus also a Banach space. On the other hand, the projection map π : (x, T x) ↦ x from Γ to X is clearly a continuous linear bijection. By Corollary 1.7.14, its inverse x ↦ (x, T x) is also continuous, and so T is continuous as desired. □

We can reformulate the closed graph theorem in the following fashion:


Corollary 1.7.20. Let X, Y be Banach spaces, and suppose we have some
continuous inclusion Y ⊂ Z of Y into a Hausdorff topological vector space Z.
Let T : X → Z be a continuous linear transformation. Then the following
are equivalent.
(i) Qualitative regularity. For all x ∈ X, T x ∈ Y .
(ii) Quantitative regularity. For all x ∈ X, T x ∈ Y, and furthermore $\|Tx\|_Y \le C \|x\|_X$ for some C > 0 independent of x.
(iii) Quantitative regularity on a dense subclass. For all x in a dense subset of X, T x ∈ Y, and furthermore $\|Tx\|_Y \le C \|x\|_X$ for some C > 0 independent of x.

Proof. Clearly (ii) implies both (iii) and (i). If we have (iii), then T extends uniquely to a bounded linear map from X to Y, which must agree with the original continuous map from X to Z since limits in the Hausdorff space Z are unique, and so (iii) implies (ii). Finally, if (i) holds, then we can view T as a map from X to Y, which by Theorem 1.7.19 is continuous, and the claim now follows from Lemma 1.3.17. □

In practice, one should think of Z as some sort of low regularity space with a weak topology, and Y as a high regularity subspace with a stronger topology. Corollary 1.7.20 motivates the method of a priori estimates to establish the Y-regularity of some linear transform T x of an arbitrary element x in a Banach space X by first establishing the a priori estimate $\|Tx\|_Y \le C \|x\|_X$ for a dense subclass of nice elements of X, and then using the above corollary (and some weak continuity of T in a low regularity space) to conclude. The closed graph theorem provides the metamathematical explanation as to why this approach is at least as powerful as any other approach to proving regularity.
Example 1.7.21. Let 1 ≤ p ≤ 2, and let p' be the dual exponent of p. To prove that the Fourier transform $\hat f$ of a function $f \in L^p(\mathbb{R})$ necessarily lies in $L^{p'}(\mathbb{R})$, it suffices to prove the Hausdorff-Young inequality
$$\|\hat f\|_{L^{p'}(\mathbb{R})} \le C_p \|f\|_{L^p(\mathbb{R})} \tag{1.67}$$
for some constant $C_p$ and all f in some suitable dense subclass of $L^p(\mathbb{R})$ (e.g., the space $C_0^\infty(\mathbb{R})$ of smooth functions of compact support), together with the soft observation that the Fourier transform is continuous from $L^p(\mathbb{R})$ to the space of tempered distributions, which is a Hausdorff space into which $L^{p'}(\mathbb{R})$ embeds continuously. (We will prove this inequality in (1.103).) One can replace the Hausdorff-Young inequality here by countless other estimates in harmonic analysis to obtain similar qualitative regularity conclusions.
1.7.4. Non-linear solvability (optional). In this section we give an ex-
ample of a linear equation Lu = f which can only be quantitatively solved
in a non-linear fashion. We will use a number of basic tools which we will
only cover later in this course, and so this material is optional reading.
Let $X = \{0,1\}^{\mathbb{N}}$ be the infinite discrete cube with the product topology; by Tychonoff's theorem (Theorem 1.8.14) this is a compact Hausdorff space. The Borel σ-algebra is generated by the cylinder sets
$$E_n := \{(x_m)_{m=1}^\infty \in \{0,1\}^{\mathbb{N}} : x_n = 1\}. \tag{1.68}$$
(From a probabilistic viewpoint, one can think of X as the event space for flipping a countably infinite number of coins, and $E_n$ as the event that the nth coin lands as heads.)
Let M(X) be the space of finite signed Borel measures on X (with the total variation norm); this can be verified to be a Banach space. There is a map $L : M(X) \to \ell^\infty(\mathbb{N})$ defined by
$$L(\mu) := (\mu(E_n))_{n=1}^\infty. \tag{1.69}$$

This is a continuous linear transformation. The equation Lu = f is quantitatively solvable for every $f \in \ell^\infty(\mathbb{N})$, i.e., one can find a solution u with $\|u\|_{M(X)} \le C \|f\|_{\ell^\infty(\mathbb{N})}$. Indeed, if f is an indicator function $f = 1_A$, then $f = L\delta_{x_A}$, where $x_A \in \{0,1\}^{\mathbb{N}}$ is the sequence that equals 1 on A and 0 outside of A, and $\delta_{x_A}$ is the Dirac mass at $x_A$. The general case then follows by expressing a bounded sequence as an integral of indicator functions (e.g., if f takes values in [0, 1], we can write $f = \int_0^1 1_{\{f>t\}}\,dt$). Note however that this is a non-linear operation, since the indicator $1_{\{f>t\}}$ depends non-linearly on f.
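To see the decomposition $f = \int_0^1 1_{\{f>t\}}\,dt$ concretely, here is a small Python sketch (an illustration under our own conventions, with hypothetical helper names) for a finitely supported f with values in [0, 1]. For such f the integral is a finite sum: if $0 = t_0 < t_1 < \cdots < t_k$ are 0 together with the distinct positive values of f, then for $t \in [t_{i-1}, t_i)$ the set $\{f > t\}$ equals $A_i = \{m : f_m \ge t_i\}$, so $f = \sum_i (t_i - t_{i-1}) 1_{A_i}$, and correspondingly $u = \sum_i (t_i - t_{i-1}) \delta_{x_{A_i}}$ solves Lu = f on these coordinates, with total mass $\max_m f_m \le \|f\|_{\ell^\infty}$. The last lines illustrate the non-linearity: the level sets of a sum are not assembled from the level sets of the summands.

```python
from fractions import Fraction

def layer_cake(f):
    """Return pairs (w_i, A_i) with f = sum_i w_i * 1_{A_i}, for f a list of values in [0, 1].

    For t in [t_{i-1}, t_i), where t_0 = 0 < t_1 < ... are the distinct positive
    values of f, the level set {m : f[m] > t} is A_i = {m : f[m] >= t_i}.
    """
    pieces, prev = [], Fraction(0)
    for t in sorted({x for x in f if x > 0}):
        A = frozenset(m for m, x in enumerate(f) if x >= t)
        pieces.append((t - prev, A))
        prev = t
    return pieces

def reconstruct(pieces, length):
    """Evaluate sum_i w_i * 1_{A_i} at the indices 0, ..., length - 1."""
    return [sum((w for w, A in pieces if m in A), Fraction(0)) for m in range(length)]

f = [Fraction(1, 2), Fraction(0), Fraction(3, 4), Fraction(1, 4)]
print(layer_cake(f))
assert reconstruct(layer_cake(f), len(f)) == f

# Non-linearity: g1 and g2 use the sets {0} and {1}, but g1 + g2 uses the set {0, 1}.
g1 = [Fraction(1, 2), Fraction(0)]
g2 = [Fraction(0), Fraction(1, 2)]
g_sum = [x + y for x, y in zip(g1, g2)]
print(layer_cake(g1), layer_cake(g2), layer_cake(g_sum))
```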
We now claim that the equation Lu = f is not quantitatively linearly solvable, i.e., there is no bounded linear map $S : \ell^\infty(\mathbb{N}) \to M(X)$ such that $LSf = f$ for all $f \in \ell^\infty(\mathbb{N})$. This fact was first observed by Banach and Mazur; we shall give two proofs, one of a soft analysis flavour and one of a hard analysis flavour.
We begin with the soft analysis proof, starting with a measure-theoretic
result which is of independent interest.
Theorem 1.7.22 (Nikodym convergence theorem). Let (X, B) be a measurable space, and let σn : B → R be a sequence of signed finite measures
which is weakly convergent in the sense that σn(E) converges to some limit σ(E) for each E ∈ B. Then:
• The σn are uniformly countably additive, which means that for any sequence E1, E2, . . . of disjoint measurable sets, the series $\sum_{m=1}^\infty |\sigma_n(E_m)|$ converges uniformly in n.
• σ is a signed finite measure.

Proof. It suffices to prove the first claim, since this easily implies that σ is also countably additive and is hence a signed finite measure. Suppose for contradiction that the claim failed; then one could find disjoint E1, E2, . . . and ε > 0 such that $\limsup_{n\to\infty} \sum_{m=M}^\infty |\sigma_n(E_m)| > \varepsilon$ for all M. We now construct disjoint sets A1, A2, . . ., each consisting of the union of a finite collection of the Ej, and an increasing sequence n1, n2, . . . of positive integers, by the following recursive procedure:
0. Initialise k = 0.
1. Suppose recursively that $n_1 < \cdots < n_{2k}$ and $A_1, \ldots, A_k$ have already been constructed for some k ≥ 0.
2. Choose $n_{2k+1} > n_{2k}$ so large that for all $n \ge n_{2k+1}$, $\sigma_n(A_1 \cup \cdots \cup A_k)$ differs from $\sigma(A_1 \cup \cdots \cup A_k)$ by at most ε/10.
3. Choose $M_k$ so large that $M_k$ is larger than j for any $E_j \subset A_1 \cup \cdots \cup A_k$, and such that $\sum_{m=M_k}^\infty |\sigma_{n_j}(E_m)| \le \varepsilon/100^{k+1}$ for all $1 \le j \le 2k+1$.
4. Choose $n_{2k+2} > n_{2k+1}$ so that $\sum_{m=M_k}^\infty |\sigma_{n_{2k+2}}(E_m)| > \varepsilon$.
5. Pick $A_{k+1}$ to be a finite union of the $E_j$ with $j \ge M_k$ such that $|\sigma_{n_{2k+2}}(A_{k+1})| > \varepsilon/2$.
6. Increment k to k + 1 and then return to step 2.

It is then a routine matter to show that if $A := \bigcup_{j=1}^\infty A_j$, then $|\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A)| \ge \varepsilon/10$ for all k, contradicting the hypothesis that σn converges weakly to σ. □
Exercise 1.7.11 (Schur's property for $\ell^1$). Show that if a sequence in $\ell^1(\mathbb{N})$ is convergent in the weak topology, then it is convergent in the strong topology.

We return now to the map $S : \ell^\infty(\mathbb{N}) \to M(X)$. Consider the sequence $a_n \in c_0(\mathbb{N}) \subset \ell^\infty(\mathbb{N})$ defined by $a_n := (1_{m \le n})_{m=1}^\infty$, i.e., each $a_n$ is the sequence consisting of n 1's followed by an infinite number of 0's. As the dual of $c_0(\mathbb{N})$ is isomorphic to $\ell^1(\mathbb{N})$, we see from the dominated convergence theorem that $a_n$ is a weakly Cauchy sequence in $c_0(\mathbb{N})$, in the sense that $\lambda(a_n)$ is Cauchy for any $\lambda \in c_0(\mathbb{N})^*$. Applying S, we conclude that $S(a_n)$ is weakly Cauchy in M(X). In particular, using the bounded linear functionals $\mu \mapsto \mu(E)$ on
M(X), we see that $S(a_n)(E)$ converges to some limit μ(E) for all measurable sets E. Applying the Nikodym convergence theorem, we see that μ is also a signed finite measure. We then see that $S(a_n)$ converges in the weak topology to μ. (One way to see this is to define $\nu := \sum_{n=1}^\infty 2^{-n} |S(a_n)| + |\mu|$; then ν is finite and $S(a_n)$, μ are all absolutely continuous with respect to ν. Now use the Radon-Nikodym theorem (see Section 1.2) and the fact that $L^1(\nu)^* \equiv L^\infty(\nu)$.) On the other hand, as LS = I and L and S are both bounded, S is a Banach space isomorphism between $c_0$ and $S(c_0)$. Thus $S(c_0)$ is complete, hence closed, hence weakly closed (by the Hahn-Banach theorem), and so μ = S(a) for some $a \in c_0$. By the Hahn-Banach theorem again, this implies that $a_n$ converges weakly to $a \in c_0$. But this is impossible: testing against the coordinate functionals shows that a would have to be the constant sequence $(1)_{m=1}^\infty$, which does not lie in $c_0$, and the claim follows.
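The weak Cauchy property of $a_n$, and the reason its only candidate weak limit escapes $c_0$, can be seen numerically. In the Python sketch below (an illustration with a hypothetically chosen functional, not part of the proof) all sequences are truncated to finitely many coordinates, and $a_n$ is paired with the element $\lambda = (2^{-m})_{m=1}^\infty$ of $\ell^1(\mathbb{N})$: the values $\lambda(a_n) = 1 - 2^{-n}$ form a Cauchy sequence, while pairing with each coordinate functional eventually gives 1, so any weak limit would have to be the constant sequence $(1)_{m=1}^\infty$.

```python
from fractions import Fraction

LENGTH = 60  # truncation length; an arbitrary choice for illustration

def pair(lam, a):
    """The pairing <lam, a> = sum_m lam_m * a_m of truncated sequences."""
    return sum(l * x for l, x in zip(lam, a))

def a_seq(n):
    """Truncation of a_n = (1_{m <= n})_{m >= 1}: n ones followed by zeros."""
    return [1 if m <= n else 0 for m in range(1, LENGTH + 1)]

def coordinate(k):
    """The coordinate functional e_k, viewed as an element of l^1."""
    return [1 if m == k else 0 for m in range(1, LENGTH + 1)]

lam = [Fraction(1, 2**m) for m in range(1, LENGTH + 1)]
values = [pair(lam, a_seq(n)) for n in range(1, 11)]
print(values[:3])  # [Fraction(1, 2), Fraction(3, 4), Fraction(7, 8)]

# Each coordinate of a_n is eventually 1, so the only candidate weak limit is (1, 1, 1, ...).
print([pair(coordinate(k), a_seq(20)) for k in range(1, 6)])  # [1, 1, 1, 1, 1]
```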
Now we give the hard analysis proof. Let $e_1, e_2, \ldots$ be the standard unit vectors in $\ell^\infty(\mathbb{N})$, let N be a large number, and consider the random sums
$$S(\varepsilon_1 e_1 + \cdots + \varepsilon_N e_N), \tag{1.70}$$
where $\varepsilon_n \in \{-1, 1\}$ are iid random signs. Since the $\ell^\infty$ norm of $\varepsilon_1 e_1 + \cdots + \varepsilon_N e_N$ is 1, we have
$$\|S(\varepsilon_1 e_1 + \cdots + \varepsilon_N e_N)\|_{M(X)} \le C \tag{1.71}$$
for some constant C independent of N. On the other hand, we can write $S(e_n) = f_n \nu$ for some finite measure ν and some $f_n \in L^1(\nu)$ using Radon-Nikodym as in the previous proof, and then
$$\|\varepsilon_1 f_1 + \cdots + \varepsilon_N f_N\|_{L^1(\nu)} \le C. \tag{1.72}$$
Taking expectations and applying Khintchine's inequality, we conclude
$$\Big\| \Big(\sum_{n=1}^N |f_n|^2\Big)^{1/2} \Big\|_{L^1(\nu)} \le C' \tag{1.73}$$
for some constant C' independent of N. By Cauchy-Schwarz, this implies that
$$\Big\| \sum_{n=1}^N |f_n| \Big\|_{L^1(\nu)} \le C' \sqrt{N}. \tag{1.74}$$
But as $\|f_n\|_{L^1(\nu)} = \|S(e_n)\|_{M(X)} \ge c$ for some constant c > 0 independent of N (indeed, $e_n = LS(e_n)$, so $1 \le \|L\| \|S(e_n)\|_{M(X)}$), we obtain a contradiction for N large enough, and the claim follows.
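The contradiction rests on the gap between $\sum_n |c_n|$ and the average $\mathbb{E}|\sum_n \varepsilon_n c_n|$, which by Khintchine's inequality is comparable to $(\sum_n |c_n|^2)^{1/2}$. The Python sketch below (illustrative only) computes this average exactly, by enumerating all $2^N$ equally likely sign patterns, in the extreme case $c_1 = \cdots = c_N = 1$: the average grows like $\sqrt{N}$, whereas $\sum_n |c_n| = N$.

```python
from itertools import product
from math import sqrt

def mean_abs_signed_sum(c):
    """Exact E|eps_1 c_1 + ... + eps_N c_N| over all 2^N equally likely sign patterns."""
    N = len(c)
    total = sum(abs(sum(e * x for e, x in zip(signs, c)))
                for signs in product((-1, 1), repeat=N))
    return total / 2**N

for N in (1, 4, 9, 16):
    avg = mean_abs_signed_sum([1] * N)
    # Compare the average with sqrt(N) (the l^2 norm of c) and with N (the l^1 norm of c).
    print(N, avg, sqrt(N), N)
```

The upper bound $\mathbb{E}|\sum_n \varepsilon_n c_n| \le (\sum_n |c_n|^2)^{1/2}$ is just Cauchy-Schwarz; the lower bound is the non-trivial half of Khintchine's inequality used in the proof.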
Remark 1.7.23. The phenomenon of non-linear quantitative solvability actually comes up in many applications of interest. For instance, consider the Fefferman-Stein decomposition theorem [FeSt1972], which asserts that any f ∈ BMO(R) of bounded mean oscillation can be decomposed as f = g + Hh for some g, h ∈ $L^\infty(\mathbb{R})$, where H is the Hilbert transform. This
theorem was first proven by using the duality between the Hardy space $H^1(\mathbb{R})$ and BMO (and by using Exercise 1.5.13), together with the fact that a function f is in $H^1(\mathbb{R})$ if and only if f and Hf both lie in $L^1(\mathbb{R})$. From the open mapping theorem, we know that we can pick g, h so that the $L^\infty$ norms of g, h are bounded by a multiple of the BMO norm of f. But it turns out not to be possible to pick g and h in a bounded linear manner in terms of f, although this is a little tricky to prove. (Uchiyama [Uc1982] famously gave an explicit construction of g, h in terms of f, but the construction was highly non-linear.)
An example in a similar spirit was given more recently by Bourgain and Brezis [BoBr2003], who considered the problem of solving the equation div u = f on the d-dimensional torus $\mathbb{T}^d$ for some function $f : \mathbb{T}^d \to \mathbb{C}$ on the torus with mean zero and with some unknown vector field $u : \mathbb{T}^d \to \mathbb{C}^d$, where the derivatives are interpreted in the weak sense. They showed that if d ≥ 2 and $f \in L^d(\mathbb{T}^d)$, then there exists a solution u to this problem with $u \in W^{1,d} \cap C^0$, despite the failure of Sobolev embedding at this endpoint. Again, the open mapping theorem allows one to choose u with norm bounded by a multiple of the norm of f, but Bourgain and Brezis also show that one cannot select u in a bounded linear fashion depending on f.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/02/01.
Thanks to Achille Talon, Phi, Isett, Ulrich, Xiaochuan Liu, and anonymous
commenters for corrections.
Let me close with a question. All of the above constructions of non-
complemented closed subspaces or of linear problems that can only be quan-
titatively solved non-linearly were quite involved. Is there a soft or elemen-
tary way to see that closed subspaces of Banach spaces exist which are not
complemented, or (equivalently) that surjective continuous linear maps be-
tween Banach spaces do not always enjoy a continuous linear right-inverse?
I do not have a good answer to this question.

Section 1.8

Compactness in topological spaces

One of the most useful concepts for analysis that arises from topology and
metric spaces is the concept of compactness. Recall (from Section 1.6) that a space X is compact if every open cover of X has a finite subcover, or equivalently if any collection of closed sets whose finite subcollections have non-empty intersection itself has non-empty intersection. (In other words, every family of closed sets with the finite intersection property has non-empty intersection.)
In these notes, we explore how compactness interacts with other key
topological concepts: the Hausdorff property, bases and subbases, product
spaces, and equicontinuity, in particular establishing the useful Tychonoff and Arzelà-Ascoli theorems that give criteria for compactness (or precompactness).
Exercise 1.8.1 (Basic properties of compact sets).
• Show that any finite set is compact.
• Show that any finite union of compact subsets of a topological space
is still compact.
• Show that any image of a compact space under a continuous map is
still compact.
Show that these three statements continue to hold if “compact” is replaced
by “sequentially compact”.

1.8.1. Compactness and the Hausdorff property. Recall from Section 1.6 that a topological space is Hausdorff if every distinct pair x, y of points can be separated by two disjoint open neighbourhoods U, V of x, y,
respectively; every metric space is Hausdorff, but not every topological space
is.
At first glance, the Hausdorff property bears no resemblance to the com-
pactness property. However, they are in some sense dual to each other, as
the following two exercises show:
Exercise 1.8.2. Let X = (X, F ) be a compact topological space.
• Show that every closed subset in X is compact.
• Show that any weaker topology F' ⊂ F on X also yields a compact topological space (X, F').
• Show that the trivial topology on X is always compact.
Exercise 1.8.3. Let X be a Hausdorff topological space.
• Show that every compact subset of X is closed.
• Show that any stronger topology F' ⊃ F on X also yields a Hausdorff topological space (X, F').
• Show that the discrete topology on X is always Hausdorff.

The first exercise asserts that compact topologies tend to be weak, while
the second exercise asserts that Hausdorff topologies tend to be strong. The
next lemma asserts that the two concepts only barely overlap:
Lemma 1.8.1. Let F ⊂ F' be a weak and strong topology, respectively, on a space X. If F' is compact and F is Hausdorff, then F = F'. (In other words, a compact topology cannot be strictly stronger than a Hausdorff one, and a Hausdorff topology cannot be strictly weaker than a compact one.)

Proof. Since F ⊂ F', every set which is closed in (X, F) is closed in (X, F'), and every set which is compact in (X, F') is compact in (X, F). But from Exercises 1.8.2 and 1.8.3, every set which is closed in (X, F') is compact in (X, F'), and every set which is compact in (X, F) is closed in (X, F). Putting all this together, we see that (X, F) and (X, F') have exactly the same closed sets, and thus have exactly the same open sets; in other words, F = F'. □
Corollary 1.8.2. Any continuous bijection f : X → Y from a compact
topological space (X, FX ) to a Hausdorff topological space (Y, FY ) is a home-
omorphism.

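Proof. Consider the pullback $f^\#(\mathcal{F}_Y) := \{f^{-1}(U) : U \in \mathcal{F}_Y\}$ of the topology on Y by f; this is a topology on X, and it is Hausdorff since f is injective and Y is Hausdorff. As f is continuous, this topology is weaker than $\mathcal{F}_X$, and thus by Lemma 1.8.1 is equal to $\mathcal{F}_X$. Thus every open set in X has the form $f^{-1}(U)$ for some open U ⊂ Y; as f is a bijection, its image $f(f^{-1}(U)) = U$ is open, so $f^{-1}$ is continuous, and the claim follows. □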


One may wish to compare this corollary with Corollary 1.7.14.
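On a finite set every topology is compact (there are only finitely many open sets), so Lemma 1.8.1 predicts that no topology on a finite set is strictly stronger than a Hausdorff one. The following brute-force Python sketch (a toy check in one small case, not a proof) enumerates all topologies on a three-point set and confirms this; it also finds that the only Hausdorff topology there is the discrete one.

```python
from itertools import combinations

POINTS = frozenset({0, 1, 2})
SUBSETS = [frozenset(s) for r in range(len(POINTS) + 1)
           for s in combinations(sorted(POINTS), r)]

def is_topology(T):
    """Topology axioms for a family T of subsets of the finite set POINTS."""
    if frozenset() not in T or POINTS not in T:
        return False
    # For finite families, closure under pairwise unions and intersections suffices.
    return all(U | V in T and U & V in T for U in T for V in T)

def is_hausdorff(T):
    """Every pair of distinct points has disjoint open neighbourhoods."""
    return all(any(x in U and y in V and not (U & V) for U in T for V in T)
               for x in POINTS for y in POINTS if x != y)

# Enumerate all 2^8 families of subsets of POINTS and keep the topologies.
topologies = [T for T in (frozenset(S for i, S in enumerate(SUBSETS) if mask >> i & 1)
                          for mask in range(1 << len(SUBSETS)))
              if is_topology(T)]

hausdorff = [T for T in topologies if is_hausdorff(T)]
# Lemma 1.8.1 in this setting: a Hausdorff topology has no strictly stronger topology.
assert all(F == G for F in hausdorff for G in topologies if F <= G)
print(len(topologies), len(hausdorff))  # 29 1
```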


Remark 1.8.3. Spaces which are both compact and Hausdorff (e.g., the
unit interval [0, 1] with the usual topology) have many nice properties and
are moderately common, so much so that the two properties are often con-
catenated as CH. Spaces that are locally compact and Hausdorff (e.g., mani-
folds) are much more common and have nearly as many nice properties, and
so these two properties are often concatenated as LCH. One should caution that (somewhat confusingly) in some older literature (particularly in the French tradition), “compact” is used for “compact Hausdorff”.
(Optional). Another way to contrast compactness and the Hausdorff
property is via the machinery of ultrafilters. Define a filter on a space X to be a collection $p \subset 2^X$ of subsets of X which is closed under finite intersection, is also monotone (i.e., if E ∈ p and E ⊂ F ⊂ X, then F ∈ p), and does not contain the empty set. Define an ultrafilter to be a filter with the additional property that for any E ⊂ X, exactly one of E and X\E lies in p. (See also Section 1.5 of Structure and Randomness.)
Exercise 1.8.4 (Ultrafilter lemma). Show that every filter is contained in
at least one ultrafilter. (Hint: Use Zorn’s lemma; see Section 2.4.)
Exercise 1.8.5. A collection of subsets of X has the finite intersection property if every finite intersection of sets in the collection is non-empty. Show that every filter has the finite intersection property, and
that every collection of sets with the finite intersection property is contained
in a filter (and hence contained in an ultrafilter, by the ultrafilter lemma).
Given a point x ∈ X and an ultrafilter p on X, we say that p converges
to x if every neighbourhood of x belongs to p.
Exercise 1.8.6. Show that a space X is Hausdorff if and only if every
ultrafilter has at most one limit. (Hint: For the “if” part, observe that if x, y
cannot be separated by disjoint neighbourhoods, then the neighbourhoods
of x and y together enjoy the finite intersection property.)
Exercise 1.8.7. Show that a space X is compact if and only if every ul-
trafilter has at least one limit. (Hint: Use the finite intersection property
formulation of compactness and Exercise 1.8.5.)
1.8.2. Compactness and bases. Compactness is the property that every
open cover has a finite subcover. This property can be difficult to verify
in practice, in part because the class of open sets is very large. However,
in many cases one can replace the class of open sets with a much smaller
class of sets. For instance, in metric spaces, a set is open if and only if
it is the union of open balls (note that the union may be infinite or even
uncountable). We can generalise this notion as follows:


Definition 1.8.4 (Base). Let (X, F ) be a topological space. A base for this
space is a collection B of open sets such that every open set in X can be
expressed as the union of sets in the base. The elements of B are referred
to as basic open sets.
Example 1.8.5. The collection of open balls B(x, r) in a metric space forms
a base for the topology of that space. As another (rather trivial) example
of a base: any topology F is a base for itself.

This concept should be compared with that of a basis of a vector space: every vector in that space can be expressed as a linear combination of vectors in a basis. However, one difference between a base and a basis is that the representation of an open set as the union of basic open sets is almost certainly not unique.
Given a base B, define a basic open neighbourhood of a point x ∈ X to
be a basic open set that contains x. Observe that a set U is open if and
only if every point in U has a basic open neighbourhood contained in U .
Exercise 1.8.8. Let B be a collection of subsets of a set X. Show that B is a base for some topology F if and only if it covers X and has the following additional property: given any x ∈ X and any two basic open neighbourhoods U, V of x, there exists another basic open neighbourhood W of x that is contained in U ∩ V. Furthermore, the topology F is uniquely determined by B.
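For finite examples, the criterion of Exercise 1.8.8 can be checked mechanically. The Python sketch below (our own illustration; the function names are hypothetical) tests the covering and intersection conditions and lists the generated topology as all unions of subfamilies of B. In the example, {1, 2}, {2, 3} is not a base on {1, 2, 3}, since the point 2 has no basic neighbourhood inside {1, 2} ∩ {2, 3} = {2}, but it becomes one once {2} is added.

```python
from itertools import combinations

def is_base(B, X):
    """Exercise 1.8.8 criterion for a family B of subsets of a finite set X."""
    if frozenset().union(*B) != frozenset(X):
        return False  # B must cover X
    # Any x in U & V must have some W in B with x in W and W contained in U & V.
    return all(any(x in W and W <= U & V for W in B)
               for U in B for V in B for x in U & V)

def generated_topology(B):
    """All unions of subfamilies of B, the empty union giving the empty set."""
    opens = {frozenset()}
    for r in range(1, len(B) + 1):
        for sub in combinations(B, r):
            opens.add(frozenset().union(*sub))
    return opens

X = {1, 2, 3}
B1 = [frozenset({1, 2}), frozenset({2, 3})]
B2 = B1 + [frozenset({2})]
print(is_base(B1, X), is_base(B2, X))  # False True
print(sorted(map(sorted, generated_topology(B2))))
```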

To verify the compactness property, it suffices to do so for basic open covers (i.e., coverings of the whole space by basic open sets):
Exercise 1.8.9. Let (X, F ) be a topological space with a base B. Then the
following are equivalent:
• Every open cover has a finite subcover (i.e., X is compact);
• Every basic open cover has a finite subcover.

A useful fact about compact metric spaces is that they are in some sense
countably generated.
Lemma 1.8.6. Let X = (X, dX ) be a compact metric space.
(i) X is separable (i.e., it has an at most countably infinite dense sub-
set).
(ii) X is second-countable (i.e., it has an at most countably infinite base).

Proof. By Theorem 1.6.8, X is totally bounded. In particular, for every n ≥ 1, one can cover X by a finite number of balls $B(x_{n,1}, \frac{1}{n}), \ldots, B(x_{n,k_n}, \frac{1}{n})$ of radius 1/n. The set of points $\{x_{n,i} : n \ge 1; 1 \le i \le k_n\}$ is then easily
verified to be dense and at most countable, giving (i). Similarly, the set of balls $\{B(x_{n,i}, \frac{1}{n}) : n \ge 1; 1 \le i \le k_n\}$ can be easily verified to be a base which is at most countable, giving (ii). □
Remark 1.8.7. One can easily generalise compactness here to σ-compact-
ness; thus, for instance, finite-dimensional vector spaces Rn are separa-
ble and second-countable. The properties of separability and second-count-
ability are much weaker than σ-compactness in general, but can still serve
to provide some constraint as to the size or complexity of a metric space or
topological space in many situations.
We now weaken the notion of a base to that of a subbase.
Definition 1.8.8 (Subbase). Let (X, F ) be a topological space. A subbase
for this space is a collection B of subsets of X such that F is the weakest
topology that makes B open (i.e., F is generated by B). Elements of B are
referred to as subbasic open sets.
Observe for instance that every base is a subbase. The converse is not true: for instance, the open half-lines (−∞, a), (a, +∞) for a ∈ R form a subbase for the standard topology on R, but not a base. In contrast to bases, which need to obey the property in Exercise 1.8.8, no property is required on a collection B in order for it to be a subbase; every collection of sets generates a unique topology with respect to which it is a subbase.
The precise relationship between subbases and bases is given by the
following exercise.
Exercise 1.8.10. Let (X, F ) be a topological space, and let B be a collection
of subsets of X. Then the following are equivalent:
• B is a subbase for (X, F ).
• The space B ∗ := {B1 ∩· · ·∩Bk : B1 , . . . , Bk ∈ B} of finite intersections
of B (including the whole space X, which corresponds to the case
k = 0) is a base for (X, F).
Thus a set is open iff it is the union of finite intersections of subbasic
open sets.
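As a finite analogue of the subbase of open half-lines for R, the Python sketch below (an illustration with hypothetical helper names) takes the ordered set {0, 1, 2, 3, 4}, uses the "half-lines" {m : m < a} and {m : m > a} as a subbase S, and forms the collection S* of finite intersections from Exercise 1.8.10. The non-empty members of S* are exactly the runs of consecutive points; in particular every singleton appears, so the generated topology is discrete, even though {1}, {2}, {3} are not themselves subbasic.

```python
from itertools import combinations

def finite_intersections(S, X):
    """S*: intersections of finitely many members of S, the empty intersection being X."""
    base = {frozenset(X)}
    S = list(S)
    for r in range(1, len(S) + 1):
        for sub in combinations(S, r):
            base.add(frozenset(X).intersection(*sub))
    return base

X = range(5)
S = ({frozenset(m for m in X if m < a) for a in X}
     | {frozenset(m for m in X if m > a) for a in X})
S_star = finite_intersections(S, X)

# Every non-empty member of S* is a run of consecutive points.
assert all(B == frozenset(range(min(B), max(B) + 1)) for B in S_star if B)
print(frozenset({2}) in S, frozenset({2}) in S_star)  # False True
```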
Many topological facts involving open sets can often be reduced to veri-
fications on basic or subbasic open sets, as the following exercise illustrates:
Exercise 1.8.11. Let (X, F) be a topological space, let B be a subbase of X, and let B* be a base of X.
• Show that a sequence xn ∈ X converges to a limit x ∈ X if and only if every subbasic open neighbourhood of x contains xn for all sufficiently large n. (Optional: Show that an analogous statement is also true for nets.)
• Show that a point x ∈ X is adherent to a set E if and only if every basic open neighbourhood of x intersects E. Give an example to show that the claim fails for subbasic open sets.
• Show that a point x ∈ X is in the interior of a set U if and only if U
contains a basic open neighbourhood of x. Give an example to show
that the claim fails for subbasic open sets.
• If Y is another topological space, show that a map f : Y → X is
continuous if and only if the inverse image of every subbasic open set
is open.

There is a useful strengthening of Exercise 1.8.9 in the spirit of the above exercise, namely the Alexander subbase theorem:
Theorem 1.8.9 (Alexander subbase theorem). Let (X, F) be a topological
space with a subbase B. Then the following are equivalent:
• Every open cover has a finite subcover (i.e., X is compact);
• Every subbasic open cover has a finite subcover.

Proof. Call an open cover bad if it has no finite subcover, and good otherwise. In view of Exercise 1.8.9, it suffices to show that if every subbasic open cover is good, then every basic open cover is good also, where we use the base B* coming from Exercise 1.8.10.
Suppose for contradiction that every subbasic open cover was good but at least one basic open cover was bad. If we order the bad basic open covers by set inclusion, observe that every chain of bad basic open covers has an upper bound that is also a bad basic open cover, namely the union of all the covers in the chain (any finite subcover of this union would already lie in a single cover of the chain, which is bad). Thus, by Zorn's lemma (Section 2.4), there exists a maximal bad basic open cover C = (Uα)α∈A. Thus this cover has no finite subcover, but if one adds any new basic open set to this cover, then there must now be a finite subcover.
Pick a basic open set Uα in this cover C. Then we can write Uα = B1 ∩ · · · ∩ Bk for some subbasic open sets B1, . . . , Bk. We claim that at least one of the B1, . . . , Bk also lies in the cover C. To see this, suppose for contradiction that none of the B1, . . . , Bk was in C. Then adding any of the Bi to C enlarges the basic open cover and thus creates a finite subcover; thus Bi together with finitely many sets from C cover X, or equivalently one can cover X\Bi with finitely many sets from C. Thus one can also cover $X \setminus U_\alpha = \bigcup_{i=1}^k (X \setminus B_i)$ with finitely many sets from C, and thus (adding Uα itself) X can be covered by finitely many sets from C, a contradiction.
From the above discussion and the axiom of choice, we see that for each basic set Uα in C there exists a subbasic set Bα containing Uα that also lies in C. (Two different basic sets Uα, Uβ could lead to the same subbasic set
Bα = Bβ, but this will not concern us.) Since the Uα cover X, the Bα do also. The Bα thus form a subbasic open cover, which is good by hypothesis, so a finite number of Bα can cover X; as these Bα lie in C, the cover C is good, which gives the desired contradiction. □
Exercise 1.8.12. (Optional) Use Exercise 1.8.7 to give another proof of the
Alexander subbase theorem.
Exercise 1.8.13. Use the Alexander subbase theorem to show that the unit
interval [0, 1] (with the usual topology) is compact, without recourse to the
Heine-Borel or Bolzano-Weierstrass theorems.
Exercise 1.8.14. Let X be a well-ordered set, endowed with the order
topology (Exercise 1.6.10); such a space is known as an ordinal space. Show
that X is Hausdorff, and that X is compact if and only if X has a maximal
element.

One of the major applications of the Alexander subbase theorem is to prove Tychonoff's theorem, which we turn to next.

1.8.3. Compactness and product spaces. Given two topological spaces $X = (X, \mathcal{F}_X)$ and $Y = (Y, \mathcal{F}_Y)$, we can form the product space X × Y, using the cylinder sets $\{U \times Y : U \in \mathcal{F}_X\} \cup \{X \times V : V \in \mathcal{F}_Y\}$ as a subbase, or equivalently using the open boxes $\{U \times V : U \in \mathcal{F}_X, V \in \mathcal{F}_Y\}$ as a base (cf. Example 1.6.25). One easily verifies that the obvious projection maps πX : X × Y → X, πY : X × Y → Y are continuous, and that these maps also provide homeomorphisms between X × {y} and X, or between {x} × Y and Y, for every x ∈ X, y ∈ Y. Also observe that a sequence $(x_n, y_n)_{n=1}^\infty$ (or net $(x_\alpha, y_\alpha)_{\alpha\in A}$) converges to a limit (x, y) in X × Y if and only if $(x_n)_{n=1}^\infty$ and $(y_n)_{n=1}^\infty$ (or $(x_\alpha)_{\alpha\in A}$ and $(y_\alpha)_{\alpha\in A}$) converge in X and Y to x and y, respectively.
This operation preserves a number of useful topological properties, for
instance
Exercise 1.8.15. Prove that the product of two Hausdorff spaces is still
Hausdorff.
Exercise 1.8.16. Prove that the product of two sequentially compact spaces
is still sequentially compact.
Proposition 1.8.10. The product of two compact spaces is compact.

Proof. By Exercise 1.8.9 it suffices to show that any basic open cover of X × Y by boxes $(U_\alpha \times V_\alpha)_{\alpha\in A}$ has a finite subcover. For any x ∈ X, this open cover covers {x} × Y; by the compactness of Y ≡ {x} × Y, we can thus cover {x} × Y by a finite number of open boxes Uα × Vα, each of which we may assume meets {x} × Y, so that x ∈ Uα. Intersecting these finitely many Uα together, we obtain a neighbourhood Ux of x such that Ux × Y is covered
by a finite number of these boxes. But by compactness of X, we can cover X by a finite number of Ux. Thus all of X × Y can be covered by a finite number of boxes in the cover, and the claim follows. □
Exercise 1.8.17. (Optional) Obtain an alternate proof of Proposition 1.8.10
using Exercise 1.6.15.

The above theory for products of two spaces extends without difficulty
to products of finitely many spaces. Now we consider infinite products.
Definition 1.8.11 (Product spaces). Given a family $(X_\alpha, \mathcal{F}_\alpha)_{\alpha\in A}$ of topological spaces, let $X := \prod_{\alpha\in A} X_\alpha$ be the Cartesian product, i.e., the space of tuples $(x_\alpha)_{\alpha\in A}$ with $x_\alpha \in X_\alpha$ for all α ∈ A. For each α ∈ A, we have the obvious projection map $\pi_\alpha : X \to X_\alpha$ that maps $(x_\beta)_{\beta\in A}$ to $x_\alpha$.
• We define the product topology on X to be the topology generated by the cylinder sets $\pi_\alpha^{-1}(U_\alpha)$ for α ∈ A and $U_\alpha \in \mathcal{F}_\alpha$ as a subbase, or equivalently the weakest topology that makes all of the $\pi_\alpha$ continuous.
• We define the box topology on X to be the topology generated by all the boxes $\prod_{\alpha\in A} U_\alpha$, where $U_\alpha \in \mathcal{F}_\alpha$ for all α ∈ A.
Unless otherwise specified, we assume the product space to be endowed with
the product topology rather than the box topology.

When A is finite, the product topology and the box topology coincide.
When A is infinite, the two topologies are usually different (as we shall see),
but the box topology is always at least as strong as the product topology.
Actually, in practice the box topology is too strong to be of much use: there are not enough convergent sequences in it. For instance, in the space $\mathbb{R}^{\mathbb{N}}$ of real-valued sequences $(x_n)_{n=1}^\infty$, even sequences such as $(\frac{1}{m!} e^{-nm})_{n=1}^\infty$ do not converge to the zero sequence as m → ∞ (why?), despite converging in just about every other sense.
Exercise 1.8.18. Show that the arbitrary product of Hausdorff spaces re-
mains Hausdorff in either the product or the box topology.
Exercise 1.8.19. Let $(X_n, d_n)$ be a sequence of metric spaces. Show that the function $d : X \times X \to \mathbb{R}^+$ on the product space $X := \prod_n X_n$ defined by
$$d((x_n)_{n=1}^\infty, (y_n)_{n=1}^\infty) := \sum_{n=1}^\infty 2^{-n} \frac{d_n(x_n, y_n)}{1 + d_n(x_n, y_n)}$$
is a metric on X which generates the product topology on X.
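A truncated version of this metric is easy to experiment with. The Python sketch below (an illustration only) takes every factor to be R with $d_n(s, t) = |s - t|$, an assumption made for concreteness, and stores points as finite lists. It spot-checks the triangle inequality on random triples, and checks the bound behind the comparison with the product topology: two points agreeing in their first N coordinates are at distance less than $2^{-N}$, however far apart their later coordinates are.

```python
import random

def product_metric(x, y):
    """Truncated sum_n 2^{-n} d_n / (1 + d_n), assuming d_n(s, t) = |s - t| on each factor R."""
    total = 0.0
    for n, (s, t) in enumerate(zip(x, y), start=1):
        d = abs(s - t)
        total += 2.0 ** -n * d / (1 + d)
    return total

LENGTH = 30  # truncation length, an arbitrary choice

def random_point():
    """A random point of the truncated product R^LENGTH."""
    return [random.uniform(-100, 100) for _ in range(LENGTH)]

random.seed(0)
for _ in range(1000):
    x, y, z = random_point(), random_point(), random_point()
    # Spot-check the triangle inequality (up to floating-point rounding).
    assert product_metric(x, z) <= product_metric(x, y) + product_metric(y, z) + 1e-12

# Points agreeing in their first N coordinates are within 2^{-N} of each other.
N = 5
p = [0.0] * LENGTH
q = [0.0] * N + [1e6] * (LENGTH - N)
print(product_metric(p, q), 2.0 ** -N)
```

The triangle inequality holds termwise because $t \mapsto t/(1+t)$ is increasing and subadditive on $[0, \infty)$; the random check is only a spot-check of that fact.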
Exercise 1.8.20. Let $X = \prod_{\alpha\in A} X_\alpha$ be a product space with the product topology. Show that a sequence xn in that space converges to a limit x ∈ X if and only if πα(xn) converges in Xα to πα(x) for every α ∈ A. (The same
statement also holds for nets.) Thus convergence in the product topology is
essentially the same concept as pointwise convergence (cf. Example 1.6.24).

The box topology usually does not preserve compactness. For instance, one easily checks that the product of any number of discrete spaces is still discrete in the box topology. On the other hand, a discrete space is compact (or sequentially compact) if and only if it is finite. Thus the product of infinitely many non-trivial (i.e., having at least two elements) compact discrete spaces will be non-compact in the box topology, and similarly for sequential compactness.
The situation improves significantly with the product topology, however
(which is weaker, and thus more likely to be compact). We begin with the
situation for sequential compactness.
Proposition 1.8.12 (Sequential Tychonoff theorem). Any at most count-
able product of sequentially compact topological spaces is sequentially com-
pact.

Proof. We will use the Arzelá-Ascoli diagonalisation argument. The finite case is already handled by Exercise 1.8.16 (and in any event can be easily deduced from the countable case), so suppose we have a countably infinite sequence $(X_n, \mathcal{F}_n)_{n=1}^\infty$ of sequentially compact spaces, and consider the product space $X = \prod_{n=1}^\infty X_n$ with the product topology. Let $x^{(1)}, x^{(2)}, \ldots$ be a sequence in X, thus each $x^{(m)}$ is itself a sequence $x^{(m)} = (x^{(m)}_n)_{n=1}^\infty$ with $x^{(m)}_n \in X_n$ for all n. Our objective is to find a subsequence $x^{(m_j)}$ which converges to some limit $x = (x_n)_{n=1}^\infty$ in the product topology, which by Exercise 1.8.20 is the same as pointwise convergence (i.e., $x^{(m_j)}_n \to x_n$ as $j \to \infty$ for each n).
Consider the first coordinates $x^{(m)}_1 \in X_1$ of the sequence $x^{(m)}$. As $X_1$ is sequentially compact, we can find a subsequence $(x^{(m_{1,j})})_{j=1}^\infty$ in X such that $x^{(m_{1,j})}_1$ converges in $X_1$ to some limit $x_1 \in X_1$.
Now, in this subsequence, consider the second coordinates $x^{(m_{1,j})}_2 \in X_2$. As $X_2$ is sequentially compact, we can find a further subsequence $(x^{(m_{2,j})})_{j=1}^\infty$ in X such that $x^{(m_{2,j})}_2$ converges in $X_2$ to some limit $x_2 \in X_2$. Also, we inherit from the preceding subsequence that $x^{(m_{2,j})}_1$ converges in $X_1$ to $x_1$.
We continue in this vein, creating nested subsequences $(x^{(m_{i,j})})_{j=1}^\infty$ for $i = 1, 2, 3, \ldots$ whose first i components $x^{(m_{i,j})}_1, \ldots, x^{(m_{i,j})}_i$ converge to $x_1 \in X_1, \ldots, x_i \in X_i$, respectively.
None of these subsequences, by themselves, are sufficient to finish the problem. But now we use the diagonalisation trick: we consider the diagonal sequence $(x^{(m_{j,j})})_{j=1}^\infty$. One easily verifies that $x^{(m_{j,j})}_n$ converges in $X_n$ to $x_n$ as $j \to \infty$ for every n (for $j \geq n$, the terms $x^{(m_{j,j})}$ all lie in the n-th subsequence $(x^{(m_{n,j})})_{j=1}^\infty$ and appear there in increasing order), and so we have extracted a sequence that is convergent in the product topology. □
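Sequential compactness is not constructive in general, but the bookkeeping of the diagonal argument can be seen in a finite toy model. The Python sketch below assumes each $X_n = \{0, 1\}$ with the discrete topology, where a convergent subsequence of coordinates can be obtained by keeping a value that is attained infinitely often; on a finite sample we keep the more common value instead, which is only a stand-in for that step. The function name `diagonal_extraction` is hypothetical.

```python
import random
from typing import List


def diagonal_extraction(seqs: List[List[int]], depth: int) -> List[int]:
    """Finite toy version of the diagonal argument for points of {0,1}^N.

    seqs[m][i] plays the role of coordinate i+1 of x^(m+1). Stage i keeps, from
    the previous index list, those m whose coordinate i takes the more common
    value (a stand-in for "a value attained infinitely often"). This produces
    nested lists nested[0] ⊇ nested[1] ⊇ ..., and the diagonal picks the j-th
    entry of the j-th list (0-indexed), stopping once a list becomes too short.
    """
    current = list(range(len(seqs)))
    nested = []
    for i in range(depth):
        ones = [m for m in current if seqs[m][i] == 1]
        zeros = [m for m in current if seqs[m][i] == 0]
        current = ones if len(ones) >= len(zeros) else zeros
        nested.append(current)
    diagonal = []
    for j, indices in enumerate(nested):
        if j >= len(indices):
            break
        diagonal.append(indices[j])
    return diagonal


rng = random.Random(0)
depth = 8
# depth * 2**depth samples guarantee that the j-th list has more than j entries for every j < depth.
seqs = [[rng.randint(0, 1) for _ in range(depth)] for _ in range(depth * 2 ** depth)]
diag = diagonal_extraction(seqs, depth)
print(diag)
```

The diagonal indices are strictly increasing, and from position i onwards they all lie in the i-th list, so coordinate i is constant along that tail: the finite shadow of pointwise convergence.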
Remark 1.8.13. In the converse direction, if a product of spaces is se-
quentially compact, then each of the factor spaces must also be sequentially
compact, since they are continuous images of the product space and one can
apply Exercise 1.8.1.

The sequential Tychonoff theorem breaks down for uncountable products. Consider for instance the product space $X := \{0,1\}^{\{0,1\}^{\mathbf{N}}}$ of functions $f : \{0,1\}^{\mathbf{N}} \to \{0,1\}$. As $\{0, 1\}$ (with the discrete topology) is sequentially compact, this is an (uncountable) product of sequentially compact spaces. On the other hand, for each $n \in \mathbf{N}$ we can define the evaluation function $f_n : \{0,1\}^{\mathbf{N}} \to \{0,1\}$ by $f_n : (a_m)_{m=1}^\infty \mapsto a_n$. This is a sequence in X; we claim that it has no convergent subsequence. Indeed, given any strictly increasing $n_j \to \infty$, we can find $x = (x_m)_{m=1}^\infty \in \{0,1\}^{\mathbf{N}}$ such that $x_{n_j} = f_{n_j}(x)$ does not converge to a limit as $j \to \infty$ (e.g., set $x_{n_j} := 1$ for odd j and $x_{n_j} := 0$ for even j), and so $f_{n_j}$ does not converge pointwise (i.e., does not converge in the product topology).
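The point x in this argument can be written down explicitly. In the small Python sketch below, an element of $\{0,1\}^{\mathbf{N}}$ is represented as a function of the index and only a finite prefix of the indices $n_j$ is listed (both are our representation choices). Setting $x_{n_j} := j \bmod 2$ and $x_m := 0$ elsewhere makes $f_{n_j}(x)$ alternate between 1 and 0.

```python
from typing import Callable, Dict, Sequence

Point = Callable[[int], int]  # an element of {0,1}^N, given as a function of the index


def evaluation(n: int) -> Callable[[Point], int]:
    """The evaluation map f_n : {0,1}^N -> {0,1}, (a_m) -> a_n."""
    return lambda a: a(n)


def bad_point(indices: Sequence[int]) -> Point:
    """x with x_{n_j} = j mod 2 (j = 1, 2, ...) and x_m = 0 off the n_j.

    `indices` is a finite prefix of a strictly increasing sequence n_1 < n_2 < ...
    """
    position: Dict[int, int] = {n: j for j, n in enumerate(indices, start=1)}
    return lambda m: position[m] % 2 if m in position else 0


n_js = [k * k for k in range(1, 21)]  # hypothetical subsequence n_j = j^2
x = bad_point(n_js)
values = [evaluation(n)(x) for n in n_js]  # f_{n_j}(x) = x_{n_j}
print(values)
```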
However, we can recover the result for uncountable products as long as
we work with topological compactness rather than sequential compactness,
leading to Tychonoff ’s theorem:
Theorem 1.8.14 (Tychonoff’s theorem). Any product of compact topolog-
ical spaces is compact.

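Proof. Write $X = \prod_{\alpha \in A} X_\alpha$ for this product of compact topological spaces. By Theorem 1.8.9, it suffices to show that any open cover of X by subbasic open sets $(\pi_{\alpha_\beta}^{-1}(U_\beta))_{\beta \in B}$ has a finite subcover, where B is some index set, and for each $\beta \in B$, $\alpha_\beta \in A$ and $U_\beta$ is open in $X_{\alpha_\beta}$.
For each $\alpha \in A$, consider the subbasic open sets $\pi_\alpha^{-1}(U_\beta)$ that are associated to those $\beta \in B$ with $\alpha_\beta = \alpha$. If, for some $\alpha$, the open sets $U_\beta$ here cover $X_\alpha$, then by compactness of $X_\alpha$, a finite number of the $U_\beta$ already suffice to cover $X_\alpha$, and so a finite number of the $\pi_\alpha^{-1}(U_\beta)$ cover X, and we are done. So we may assume that for every $\alpha \in A$ the $U_\beta$ with $\alpha_\beta = \alpha$ do not cover $X_\alpha$; thus there exists $x_\alpha \in X_\alpha$ that avoids all the $U_\beta$ with $\alpha_\beta = \alpha$. One then sees that the point $(x_\alpha)_{\alpha \in A}$ in X avoids all of the $\pi_{\alpha_\beta}^{-1}(U_\beta)$, contradicting the assumption that these sets cover X. The claim follows. □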
Remark 1.8.15. The axiom of choice was used in several places in the proof (in particular, via the Alexander subbase theorem). This turns out to be necessary, because one can use Tychonoff’s theorem to establish the axiom of choice. This was first observed by Kelley and can be sketched as follows. It suffices to show that the product $\prod_{\alpha \in A} X_\alpha$ of non-empty sets is again non-empty. We can make each $X_\alpha$ compact (e.g., by using the trivial topology). We then adjoin an isolated element ∞ to each $X_\alpha$ to obtain another compact space $X_\alpha \cup \{\infty\}$, with $X_\alpha$ closed in $X_\alpha \cup \{\infty\}$. By Tychonoff’s theorem, the product $X := \prod_{\alpha \in A} (X_\alpha \cup \{\infty\})$ is compact, and thus every collection of closed sets with the finite intersection property has non-empty intersection. But observe that the sets $\pi_\alpha^{-1}(X_\alpha)$ in X, where $\pi_\alpha : X \to X_\alpha \cup \{\infty\}$ is the obvious projection, are closed and have the finite intersection property (a point in finitely many of them is obtained by choosing elements of the corresponding finitely many $X_\alpha$ and setting every other coordinate equal to ∞); thus the intersection of all of these sets is non-empty, and the claim follows.
Remark 1.8.16. From the above discussion, we see that the space $\{0,1\}^{\{0,1\}^{\mathbf{N}}}$ is compact but not sequentially compact; thus compactness does not necessarily imply sequential compactness.
Exercise 1.8.21. Let us call a topological space (X, F ) first-countable if,
for every x ∈ X, there exists a countable family Bx,1 , Bx,2 , . . . of open neigh-
bourhoods of x such that every neighbourhood of x contains at least one of
the Bx,j .
• Show that every metric space is first-countable.
• Show that every second-countable space is first-countable (see
Lemma 1.8.6).
• Show that every separable metric space is second-countable.
• Show that every space which is second-countable is separable.
• (Optional) Show that every net $(x_\alpha)_{\alpha \in A}$ which converges in X to x has a convergent subsequence $(x_{\phi(n)})_{n=1}^\infty$ (i.e., a subnet whose index set is N).
• Show that any compact space which is first-countable is also sequen-
tially compact. (The converse is not true: Exercise 1.6.10 provides a
counterexample.)

(Optional) There is an alternate proof of Tychonoff’s theorem that uses the machinery of universal nets. We sketch this approach in a series of exercises.
Definition 1.8.17. A net (xα )α∈A in a set X is universal if for every func-
tion f : X → {0, 1}, the net (f (xα ))α∈A converges to either 0 or 1.
Exercise 1.8.22. Show that a universal net (xα )α∈A in a compact topologi-
cal space is necessarily convergent. (Hint: Show that the collection of closed
sets which contain xα for sufficiently large α enjoys the finite intersection
property.)
Exercise 1.8.23 (Kelley’s theorem). Every net (xα )α∈A in a set X has
a universal subnet (xφ(β) )β∈B . (Hint: First use Exercise 1.8.5 to find an
ultrafilter p on A that contains the upsets {β ∈ A : β ≥ α} for all α ∈ A.
Now let B be the space of all pairs (U, α), where α ∈ U ∈ p, ordered by requiring $(U, \alpha) \leq (U', \alpha')$ when $U \supset U'$ and $\alpha \leq \alpha'$, and let $\phi : B \to A$ be the map $\phi : (U, \alpha) \mapsto \alpha$.)
Exercise 1.8.24. Use the previous two exercises, together with Exercise
1.8.20, to establish an alternate proof of Tychonoff’s theorem.
Exercise 1.8.25. Establish yet another proof of Tychonoff’s theorem using
Exercise 1.8.7 directly (rather than proceeding via Exercise 1.8.12).

1.8.4. Compactness and equicontinuity. We now pause to give an important application of the (sequential) Tychonoff theorem. We begin with some definitions. If $X = (X, \mathcal{F}_X)$ is a topological space and $Y = (Y, d_Y)$ is a metric space, let BC(X → Y) be the space of bounded continuous functions from X to Y. (If X is compact, this is the same space as C(X → Y), the space of continuous functions from X to Y.) We can give this space the uniform metric
$$d(f, g) := \sup_{x \in X} d_Y(f(x), g(x)).$$

Exercise 1.8.26. If Y is complete, show that BC(X → Y) is a complete metric space. (Note that this implies Exercise 1.5.2.)

Note that f : X → Y is continuous if and only if, for every x ∈ X and ε > 0, there exists a neighbourhood U of x such that $d_Y(f(x'), f(x)) \leq \varepsilon$ for all $x' \in U$. We now generalise this concept to families.
Definition 1.8.18. Let X be a topological space, let Y be a metric space,
and let (fα )α∈A be a family of functions fα ∈ BC(X → Y ).
• We say that this family fα is pointwise bounded if for every x ∈ X,
the set {fα (x) : α ∈ A} is bounded in Y .
• We say that this family fα is pointwise precompact if for every x ∈ X,
the set {fα (x) : α ∈ A} is precompact in Y .
• We say that this family $f_\alpha$ is equicontinuous if for every x ∈ X and ε > 0, there exists a neighbourhood U of x such that $d_Y(f_\alpha(x'), f_\alpha(x)) \leq \varepsilon$ for all $\alpha \in A$ and $x' \in U$.
• If $X = (X, d_X)$ is also a metric space, we say that the family $f_\alpha$ is uniformly equicontinuous if for every ε > 0 there exists a δ > 0 such that $d_Y(f_\alpha(x'), f_\alpha(x)) \leq \varepsilon$ for all $\alpha \in A$ and $x, x' \in X$ with $d_X(x, x') \leq \delta$.
Remark 1.8.19. From the Heine-Borel theorem, the pointwise bounded-
ness and pointwise precompactness properties are equivalent if Y is a subset
of Rn for some n. Any finite collection of continuous functions is automati-
cally an equicontinuous family (why?), and any finite collection of uniformly
continuous functions is automatically a uniformly equicontinuous family. The concept only acquires additional meaning once one considers infinite families of continuous functions.

Example 1.8.20. With X = [0, 1] and Y = R, the family of functions $f_n(x) := x^n$ for n = 1, 2, 3, ... is pointwise bounded (and thus pointwise precompact) but not equicontinuous. The family of functions $g_n(x) := n$ for n = 1, 2, 3, ..., on the other hand, is equicontinuous, but not pointwise bounded or pointwise precompact. The family of functions $h_n(x) := \sin nx$ for n = 1, 2, 3, ... is pointwise bounded (even uniformly bounded) but not equicontinuous.
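These (non-)equicontinuity claims can be seen numerically by measuring how far the family can move between a point $x_0$ and a nearby point $x_1$, namely $\max_{n \leq N} |f_n(x_1) - f_n(x_0)|$. In the Python sketch below (helper names are ours; sampling finitely many n is evidence, not a proof), this quantity stays close to 1 for $x^n$ near $x_0 = 1$ and for $\sin nx$ near $x_0 = 0$ however close $x_1$ is taken, while it vanishes identically for the constants $g_n = n$, which are nonetheless unbounded.

```python
import math
from typing import Callable

Family = Callable[[int, float], float]  # (n, x) -> f_n(x)


def oscillation(family: Family, x0: float, x1: float, n_max: int) -> float:
    """Max over 1 <= n <= n_max of |f_n(x1) - f_n(x0)|: a finite stand-in for the sup over the family."""
    return max(abs(family(n, x1) - family(n, x0)) for n in range(1, n_max + 1))


def power(n: int, x: float) -> float:
    return x ** n  # f_n(x) = x^n


def constant(n: int, x: float) -> float:
    return float(n)  # g_n(x) = n


def sine(n: int, x: float) -> float:
    return math.sin(n * x)  # h_n(x) = sin(nx)


for delta in (1e-1, 1e-2, 1e-3):
    print(delta,
          oscillation(power, 1.0, 1.0 - delta, 10**4),
          oscillation(sine, 0.0, delta, 10**4),
          oscillation(constant, 0.0, delta, 10**4))
```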

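Example 1.8.21. With X = (0, +∞) and Y = R, the functions $f_n(x) = \arctan(nx)$ are pointwise bounded (even uniformly bounded), are equicontinuous, and are each individually uniformly continuous, but are not uniformly equicontinuous. (On the whole real line X = R, equicontinuity would fail at the origin, since $\arctan(nx') \to \pi/2$ as $n \to \infty$ for every fixed $x' > 0$.)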

Exercise 1.8.27. Show that the uniform boundedness principle (Theorem 1.7.5) can be restated as the assertion that any family of bounded linear operators from the unit ball of a Banach space to a normed vector space is pointwise bounded if and only if it is equicontinuous.

Example 1.8.22. A function f : X → Y between two metric spaces is said to be Lipschitz (or Lipschitz continuous) if there exists a constant C such that $d_Y(f(x), f(x')) \leq C d_X(x, x')$ for all $x, x' \in X$; the smallest constant C one can take here is known as the Lipschitz constant of f. Observe that Lipschitz functions are automatically continuous, hence the name. Also observe that a family $(f_\alpha)_{\alpha \in A}$ of Lipschitz functions with uniformly bounded Lipschitz constant is equicontinuous (indeed uniformly equicontinuous: if all the Lipschitz constants are at most C > 0, one can take δ := ε/C).

One nice consequence of equicontinuity is that it equates uniform convergence with pointwise convergence, or even pointwise convergence on a dense subset.

Exercise 1.8.28. Let X be a topological space, let Y be a complete metric space, and let $f_1, f_2, \ldots \in BC(X \to Y)$ be an equicontinuous family of functions. Show that the following are equivalent:
• The sequence fn is pointwise convergent.
• The sequence fn is pointwise convergent on some dense subset of X.
If X is compact, show that the above two statements are also equivalent to
• The sequence fn is uniformly convergent.
(Compare with Corollary 1.7.7.) Show that no two of the three statements
remain equivalent if the hypothesis of equicontinuity is dropped.

We can now use Proposition 1.8.12 to give a useful characterisation of precompactness in C(X → Y) when X is compact, known as the Arzelá-Ascoli theorem:
Theorem 1.8.23 (Arzelá-Ascoli theorem). Let Y be a metric space, let X
be a compact metric space, and let (fα )α∈A be a family of functions fα ∈
BC(X → Y ). Then the following are equivalent:
(i) {fα : α ∈ A} is a precompact subset of BC(X → Y ).
(ii) (fα )α∈A is pointwise precompact and equicontinuous.
(iii) (fα )α∈A is pointwise precompact and uniformly equicontinuous.

Proof. We first show that (i) implies (ii). For any x ∈ X, the evaluation
map f → f (x) is a continuous map from C(X → Y ) to Y , and thus maps
precompact sets to precompact sets. As a consequence, any precompact
family in C(X → Y ) is pointwise precompact. To show equicontinuity,
suppose for contradiction that equicontinuity failed at some point x, thus
there exists ε > 0, a sequence αn ∈ A, and points xn → x such that
dY (fαn (xn ), fαn (x)) > ε for every n. One then verifies that no subsequence
of fαn can converge uniformly to a continuous limit, contradicting precom-
pactness. (Note that in the metric space C(X → Y ), precompactness is
equivalent to sequential precompactness.)
Now we show that (ii) implies (iii). It suffices to show that equicontinuity
implies uniform equicontinuity. This is a straightforward generalisation of
the more familiar argument that continuity implies uniform continuity on
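a compact domain, and we repeat it here. Namely, fix ε > 0. For every x ∈ X, equicontinuity provides a $\delta_x > 0$ such that $d_Y(f_\alpha(x), f_\alpha(x')) \leq \varepsilon$ whenever $x' \in B(x, \delta_x)$ and $\alpha \in A$. The balls $B(x, \delta_x/2)$ cover X, thus by compactness some finite subcollection $B(x_i, \delta_{x_i}/2)$, $i = 1, \ldots, n$, of these balls cover X. One then easily verifies (by comparing both $f_\alpha(x)$ and $f_\alpha(x')$ with $f_\alpha(x_i)$, where $x \in B(x_i, \delta_{x_i}/2)$) that $d_Y(f_\alpha(x), f_\alpha(x')) \leq 2\varepsilon$ whenever $x, x' \in X$ with $d_X(x, x') \leq \min_{1 \leq i \leq n} \delta_{x_i}/2$; since ε > 0 was arbitrary, this gives uniform equicontinuity.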
Finally, we show that (iii) implies (i). It suffices to show that any sequence $f_n \in BC(X \to Y)$, n = 1, 2, ..., which is pointwise precompact and uniformly equicontinuous, has a convergent subsequence. By embedding Y in its metric completion $\overline{Y}$, we may assume without loss of generality that Y is complete. (Note that for every x ∈ X, the set $\{f_n(x) : n = 1, 2, \ldots\}$ is precompact in Y, hence the closure in Y is complete and thus closed in $\overline{Y}$ also. Thus any pointwise limit of the $f_n$ in $\overline{Y}$ will take values in Y.) By Lemma 1.8.6, we can find a countable dense subset $x_1, x_2, \ldots$ of X. For each $x_m$, we can use pointwise precompactness to find a compact set $K_m \subset Y$ such that $f_n(x_m)$ takes values in $K_m$ for all n. For each n, the tuple $F_n := (f_n(x_m))_{m=1}^\infty$ can then be viewed as a point in the product space $\prod_{m=1}^\infty K_m$. By Proposition 1.8.12, this product space is sequentially compact, hence we may find a subsequence $n_j \to \infty$ such that $F_{n_j}$ is convergent in the product topology, or equivalently that $f_{n_j}$ pointwise converges on the countable dense set $\{x_1, x_2, \ldots\}$. The claim now follows from Exercise 1.8.28. □
Remark 1.8.24. The above theorem characterises precompact subsets of
BC(X → Y ) when X is a compact metric space. One can also characterise
compact subsets by observing that a subset of a metric space is compact if
and only if it is both precompact and closed.
There are many variants of the Arzelá-Ascoli theorem with stronger or
weaker hypotheses or conclusions; for instance, we have
Corollary 1.8.25 (Arzelá-Ascoli theorem, special case). Let fn : X → Rm
be a sequence of functions from a compact metric space X to a finite-
dimensional vector space Rm which are equicontinuous and pointwise
bounded. Then there is a subsequence fnj of fn which converges uniformly
to a limit (which is necessarily bounded and continuous).
Thus, for instance, any sequence of uniformly bounded and uniformly
Lipschitz functions fn : [0, 1] → R will have a uniformly convergent subse-
quence. This claim fails without the uniform Lipschitz assumption (consider,
for instance, the functions fn (x) := sin(nx)). Thus one needs a “little bit
extra” uniform regularity in addition to uniform boundedness in order to
force the existence of uniformly convergent subsequences. This is a gen-
eral phenomenon in infinite-dimensional function spaces: compactness in a
strong topology tends to require some sort of uniform control on regularity
or decay in addition to uniform bounds on the norm.
Exercise 1.8.29. Show that the equivalence of (i) and (ii) continues to
hold if X is assumed to be just a compact Hausdorff space rather than
a compact metric space (the statement (iii) no longer makes sense in this
setting). (Hint: X need not be separable any more; however, one can still
adapt the diagonalisation argument used to prove Proposition 1.8.12. The
starting point is the observation that for every ε > 0 and every x ∈ X,
one can find a neighbourhood U of x and some subsequence fnj which only
oscillates by at most ε (or maybe 2ε) on U .)
Exercise 1.8.30 (Locally compact Hausdorff version of Arzelá-Ascoli). Let
X be a locally compact Hausdorff space which is also σ-compact, and let
fn ∈ C(X → R) be an equicontinuous, pointwise bounded sequence of func-
tions. Then there exists a subsequence fnj ∈ C(X → R) which converges
uniformly on compact subsets of X to a limit f ∈ C(X → R). (Hint: Ex-
press X as a countable union of compact sets Kn , each one contained in the
interior of the next. Apply the compact Hausdorff Arzelá-Ascoli theorem on
each compact set (Exercise 1.8.29). Then apply the Arzelá-Ascoli argument
one last time.)

Remark 1.8.26. The Arzelá-Ascoli theorem (and other compactness theorems of this type) is often used in partial differential equations to demonstrate existence of solutions to various equations or variational problems.
For instance, one may wish to solve some equation F (u) = f , for some func-
tion u : X → Rm . One way to do this is to first construct a sequence un
of approximate solutions so that F (un ) → f as n → ∞ in some suitable
sense. If one can also arrange these un to be equicontinuous and pointwise
bounded, then the Arzelá-Ascoli theorem allows one to pass to a subsequence
that converges to a limit u. Given enough continuity (or semi-continuity)
properties on F , one can then show that F (u) = f as required.
More generally, the use of compactness theorems to demonstrate exis-
tence of solutions in PDE is known as the compactness method. It is ap-
plicable in a remarkably broad range of PDE problems, but often has the
drawback that it is difficult to establish uniqueness of the solutions created
by this method (compactness guarantees existence of a limit point, but not
uniqueness). Also, in many cases one can only hope for compactness in
rather weak topologies, and as a consequence it is often difficult to establish
regularity of the solutions obtained via compactness methods.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/02/09.
Thanks to Nate Chandler, Emmanuel Kowalski, Eric, K. P. Hart, Ke, Luca
Trevisan, PDEBeginner, RR, Samir Chomsky, Xiaochuan Liu, and anony-
mous commenters for corrections.
David Speyer and Eric pointed out that the axiom of choice was used
in two different ways in the proof of Tychonoff’s theorem, first to prove the
subbase theorem and second to select an element xα from each Xα . Inter-
estingly, it is the latter use which is the more substantial one; the subbase
theorem can be shown to be equivalent to the ultrafilter lemma, which is
strictly weaker than the axiom of choice. Furthermore, for Hausdorff spaces,
one can establish Tychonoff’s theorem purely using ultralimits, which shows
the strange non-Hausdorff nature of the topology in Remark 1.8.15.

Section 1.9

The strong and weak topologies

A normed vector space $(X, \|\cdot\|_X)$ automatically generates a topology, known as the norm topology or strong topology on X, generated by the open balls $B(x, r) := \{y \in X : \|y - x\|_X < r\}$. A sequence $x_n$ in such a space converges strongly (or converges in norm) to a limit x if and only if $\|x_n - x\|_X \to 0$ as $n \to \infty$. This is the topology we have implicitly been using in our previous discussion of normed vector spaces.
However, in some cases it is useful to work in topologies on vector spaces
that are weaker than a norm topology. One reason for this is that many
important modes of convergence, such as pointwise convergence, conver-
gence in measure, smooth convergence, or convergence on compact subsets,
are not captured by a norm topology, and so it is useful to have a more
general theory of topological vector spaces that contains these modes. An-
other reason (of particular importance in PDE) is that the norm topology
on infinite-dimensional spaces is so strong that very few sets are compact
or precompact in these topologies, making it difficult to apply compactness
methods in these topologies (cf. Section 1.6 of Poincaré’s Legacies, Vol. II).
Instead, one often first works in a weaker topology, in which compactness
is easier to establish, and then somehow upgrades any weakly convergent
sequences obtained via compactness to stronger modes of convergence (or
alternatively, one abandons strong convergence and exploits the weak con-
vergence directly). Two basic weak topologies for this purpose are the weak
topology on a normed vector space X, and the weak* topology on a dual vec-
tor space X ∗ . Compactness in the latter topology is usually obtained from
the Banach-Alaoglu theorem (and its sequential counterpart), which will be a quick consequence of Tychonoff’s theorem (and its sequential counterpart) from the previous section.
The strong and weak topologies on normed vector spaces also have ana-
logues for the space B(X → Y ) of bounded linear operators from X to
Y , thus supplementing the operator norm topology on that space with two
weaker topologies, which (somewhat confusingly) are named the strong op-
erator topology and the weak operator topology.

1.9.1. Topological vector spaces. We begin with the definition of a topological vector space, which is a space with suitably compatible topological and vector space structures on it.
Definition 1.9.1. A topological vector space V = (V, F) is a real or complex
vector space V , together with a topology F such that the addition operation
+ : V × V → V and the scalar multiplication operation · : R × V → V or
· : C × V → V is jointly continuous in both variables (thus, for instance, +
is continuous from V × V with the product topology to V ).

It is an easy consequence of the definitions that the translation maps $x \mapsto x + x_0$ for $x_0 \in V$ and the dilation maps $x \mapsto \lambda \cdot x$ for non-zero scalars λ are homeomorphisms on V; thus for instance the translation or dilation of an open set (or a closed set, a compact set, etc.) is open (resp. closed, compact, etc.). We also have the usual limit laws: if $x_n \to x$ and $y_n \to y$ in a topological vector space, then $x_n + y_n \to x + y$, and if $\lambda_n \to \lambda$ in the field of scalars, then $\lambda_n x_n \to \lambda x$. (Note how we need joint continuity here; if we only had continuity in the individual variables, we could only conclude that $x_n + y_n \to x + y$ (for instance) if one of $x_n$ or $y_n$ was constant.)
We now give some basic examples of topological vector spaces.
Exercise 1.9.1. Show that every normed vector space is a topological vec-
tor space, using the balls B(x, r) as the base for the topology. Show that
the same statement holds if the vector space is quasi-normed rather than
normed.
Exercise 1.9.2. Show that every semi-normed vector space is a topological vector space, again using the balls B(x, r) as a base for the topology, and that this topology is Hausdorff if and only if the seminorm is a norm.
Example 1.9.2. Any linear subspace of a topological vector space is again
a topological vector space (with the induced topology).
Exercise 1.9.3. Let V be a vector space, and let $(\mathcal{F}_\alpha)_{\alpha \in A}$ be a (possibly infinite) family of topologies on V, each of which turns V into a topological vector space. Let $\mathcal{F}$ be the topology generated by $\bigcup_{\alpha \in A} \mathcal{F}_\alpha$ (i.e., it is the weakest topology that contains all of the $\mathcal{F}_\alpha$). Show that $(V, \mathcal{F})$ is also a topological vector space. Also show that a sequence $x_n \in V$ converges to a limit x in $\mathcal{F}$ if and only if $x_n \to x$ in $\mathcal{F}_\alpha$ for all $\alpha \in A$. (The same statement also holds if sequences are replaced by nets.) In particular, by Exercise 1.9.2, we can talk about the topological vector space V generated by a family of seminorms $(\|\cdot\|_\alpha)_{\alpha \in A}$ on V.
Exercise 1.9.4. Let T : V → W be a linear map between vector spaces. Suppose that we give V the topology induced by a family of seminorms $(\|\cdot\|_{V_\alpha})_{\alpha \in A}$, and W the topology induced by a family of seminorms $(\|\cdot\|_{W_\beta})_{\beta \in B}$. Show that T is continuous if and only if, for each $\beta \in B$, there exists a finite subset $A_\beta$ of A and a constant $C_\beta$ such that $\|Tf\|_{W_\beta} \leq C_\beta \sum_{\alpha \in A_\beta} \|f\|_{V_\alpha}$ for all f ∈ V.
Example 1.9.3 (Pointwise convergence). Let X be a set, and let $\mathbf{C}^X$ be the space of complex-valued functions f : X → C; this is a complex vector space. Each point x ∈ X gives rise to a seminorm $\|f\|_x := |f(x)|$. The topology generated by all of these seminorms is the topology of pointwise convergence on $\mathbf{C}^X$ (and is also the product topology on this space); a sequence $f_n \in \mathbf{C}^X$ converges to f in this topology if and only if it converges pointwise. Note that if X has more than one point, then none of the seminorms individually generate a Hausdorff topology, but when combined together, they do.
Example 1.9.4 (Uniform convergence). Let X be a topological space, and
let C(X) be the space of complex-valued continuous functions f : X → C.
If X is not compact, then one does not expect functions in C(X) to be
bounded in general, and so the sup norm does not necessarily make C(X)
into a normed vector space. Nevertheless, one can still define balls B(f, r)
in C(X) by
$$B(f, r) := \{g \in C(X) : \sup_{x \in X} |f(x) - g(x)| \leq r\}$$
and verify that these form a base for a topological vector space. A sequence $f_n \in C(X)$ converges in this topology to a limit $f \in C(X)$ if and only if $f_n$ converges uniformly to f, thus $\sup_{x \in X} |f_n(x) - f(x)|$ is finite for sufficiently large n and converges to zero as $n \to \infty$. More generally, one can make a
topological vector space out of any norm, quasi-norm, or seminorm which
is infinite on some portion of the vector space.
Example 1.9.5 (Uniform convergence on compact sets). Let X and C(X) be as in the previous example. For every compact subset K of X, we can define a seminorm $\|\cdot\|_{C(K)}$ on C(X) by $\|f\|_{C(K)} := \sup_{x \in K} |f(x)|$. The topology generated by all of these seminorms (as K ranges over all compact subsets of X) is called the topology of uniform convergence on compact sets; it is stronger than the topology of pointwise convergence but weaker than the topology of uniform convergence. Indeed, a sequence $f_n \in C(X)$ converges to $f \in C(X)$ in this topology if and only if $f_n$ converges uniformly to f on each compact set.
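As an illustration (the sequence is our choice, not the text's), take X = R and $f_n(x) := x/n$. On each compact set K = [-R, R] the seminorm $\|f_n\|_{C(K)} = R/n$ tends to zero, so $f_n \to 0$ uniformly on compact sets; yet $\sup_{x \in \mathbf{R}} |f_n(x)| = \infty$ for every n, so $f_n$ does not converge to 0 uniformly. The Python sketch below approximates these seminorms on a grid containing the endpoints of K, which is where $|x/n|$ attains its maximum.

```python
from typing import Callable


def sup_on_interval(f: Callable[[float], float], a: float, b: float, samples: int = 10_001) -> float:
    """Grid approximation of the seminorm ||f||_{C([a, b])} = sup_{a <= x <= b} |f(x)|.

    The grid contains both endpoints, which is where the maximum of |x / n| is attained.
    """
    return max(abs(f(a + (b - a) * k / (samples - 1))) for k in range(samples))


def f(n: int) -> Callable[[float], float]:
    """f_n(x) = x / n on X = R (an illustrative sequence, not from the text)."""
    return lambda x: x / n


for R in (1.0, 10.0, 100.0):
    # For fixed K = [-R, R] the seminorm R / n tends to 0 as n grows ...
    print(R, [sup_on_interval(f(n), -R, R) for n in (1, 10, 100, 1000)])
# ... but for fixed n the seminorms are unbounded as K grows.
print([sup_on_interval(f(10), -R, R) for R in (1e2, 1e4, 1e6)])
```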
Exercise 1.9.5. Show that an arbitrary product of topological vector spaces
(endowed with the product topology) is again a topological vector space.¹⁰
Exercise 1.9.6. Show that a topological vector space is Hausdorff if and
only if the origin {0} is closed. (Hint: First use the continuity of addition to
prove the lemma that if V is an open neighbourhood of 0, then there exists
another open neighbourhood U of 0 such that U + U ⊂ V, i.e., $u + u' \in V$ for all $u, u' \in U$.)
Example 1.9.6 (Smooth convergence). Let $C^\infty([0,1])$ be the space of smooth functions f : [0, 1] → C. One can define the $C^k$ norm on this space for any non-negative integer k by the formula
$$\|f\|_{C^k} := \sum_{j=0}^k \sup_{x \in [0,1]} |f^{(j)}(x)|,$$
where $f^{(j)}$ is the j-th derivative of f. The topology generated by all the $C^k$ norms for k = 0, 1, 2, ... is the smooth topology: a sequence $f_n$ converges in this topology to a limit f if and only if $f_n^{(j)}$ converges uniformly to $f^{(j)}$ for each j ≥ 0.
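The $C^k$ norms can be approximated by sampling. In the Python sketch below the derivatives of f are supplied by hand (there is no symbolic differentiation), and each supremum is replaced by a maximum over a grid containing 0 and 1, which can only underestimate it. The family $f_n(x) := \sin(nx)/n$ (our choice) shows the gap between the $C^0$ and smooth topologies: $\|f_n\|_{C^0} \leq 1/n \to 0$, but $\|f_n\|_{C^1} \geq |f_n'(0)| = 1$, so $f_n$ does not converge to 0 in the smooth topology.

```python
import math
from typing import Callable, Sequence


def ck_norm(derivatives: Sequence[Callable[[float], float]], samples: int = 10_001) -> float:
    """Grid approximation of ||f||_{C^k} = sum_{j=0}^{k} sup_{x in [0,1]} |f^(j)(x)|.

    derivatives[j] must be f^(j), supplied by hand, so k = len(derivatives) - 1.
    The grid includes 0 and 1; a grid maximum can only underestimate the true sup.
    """
    grid = [i / (samples - 1) for i in range(samples)]
    return sum(max(abs(d(x)) for x in grid) for d in derivatives)


# f(x) = x^2: the suprema of |f|, |f'|, |f''| on [0, 1] are 1, 2, 2, attained at x = 1.
square = [lambda x: x * x, lambda x: 2 * x, lambda x: 2.0]
print([ck_norm(square[: k + 1]) for k in range(3)])


def sine_family(n: int):
    """f_n(x) = sin(nx) / n together with its first derivative cos(nx)."""
    return [lambda x: math.sin(n * x) / n, lambda x: math.cos(n * x)]


for n in (10, 100, 1000):
    fn = sine_family(n)
    # The C^0 norm is at most 1/n, but the C^1 norm is at least |f_n'(0)| = 1.
    print(n, ck_norm(fn[:1]), ck_norm(fn))
```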
Exercise 1.9.7 (Convergence in measure). Let $(X, \mathcal{X}, \mu)$ be a measure space, and let L(X) be the space of measurable functions f : X → C. Show that the sets
$$B(f, \varepsilon, r) := \{g \in L(X) : \mu(\{x : |f(x) - g(x)| \geq r\}) < \varepsilon\}$$
for f ∈ L(X), ε > 0, r > 0 form the base for a topology that turns L(X) into a topological vector space, and that a sequence $f_n \in L(X)$ converges to a limit f in this topology if and only if it converges in measure.
Exercise 1.9.8. Let [0, 1] be given the usual Lebesgue measure. Show
that the vector space L∞ ([0, 1]) cannot be given a topological vector space
structure in which a sequence fn ∈ L∞ ([0, 1]) converges to f in this topology
if and only if it converges almost everywhere. (Hint: Construct a sequence
fn in L∞ ([0, 1]) which does not converge pointwise a.e. to zero, but such
that every subsequence has a further subsequence that converges a.e. to
zero, and use Exercise 1.6.8.) Thus almost everywhere convergence is not
“topologisable” in general.
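For the hint, one standard choice (ours; the text does not specify one) is the “typewriter” sequence of indicator functions of the dyadic intervals $[j 2^{-k}, (j+1) 2^{-k})$, listed level by level. Each point of [0, 1) lies in exactly one interval per level, so $f_n(x) = 1$ infinitely often and $f_n(x) = 0$ infinitely often, hence $f_n$ converges nowhere on [0, 1). On the other hand the interval lengths tend to zero, so $f_n \to 0$ in measure, and a standard result of F. Riesz then gives, for every subsequence, a further subsequence converging a.e. to zero. The Python sketch below checks the counting claim for the first few levels.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def typewriter(levels: int) -> List[Interval]:
    """Dyadic intervals [j/2^k, (j+1)/2^k), k = 0..levels-1, j = 0..2^k - 1, level by level.

    The n-th function f_n of the sequence is the indicator of the n-th interval.
    """
    return [(j / 2 ** k, (j + 1) / 2 ** k) for k in range(levels) for j in range(2 ** k)]


def hits(x: float, intervals: List[Interval]) -> int:
    """Number of n with f_n(x) = 1, i.e. with x in the n-th (half-open) interval."""
    return sum(1 for a, b in intervals if a <= x < b)


intervals = typewriter(12)
print(len(intervals), [hits(x, intervals) for x in (0.0, 0.3, 0.5, 0.999)])
print(intervals[-1][1] - intervals[-1][0])  # length of the last interval, 2^-11
```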
Exercise 1.9.9 (Algebraic topology). Recall that a subset U of a real vector
space V is algebraically open if the sets {t ∈ R : x + tv ∈ U } are open for
all x, v ∈ V .
¹⁰ I am not sure if the same statement is true for the box topology; I believe it is false.

(i) Show that any set which is open in a topological vector space is also algebraically open.
(ii) Give an example of a set in R2 which is algebraically open, but not
open in the usual topology. (Hint: A line intersects the unit circle in
at most two points.)
(iii) Show that the collection of algebraically open sets in V is a topology.
(iv) Show that the collection of algebraically open sets in R2 does not
give R2 the structure of a topological vector space.
Exercise 1.9.10 (Quotient topology). Let V be a topological vector space,
and let W be a subspace of V . Let V /W := {v + W : v ∈ V } be the space
of cosets of W ; this is a vector space. Let π : V → V /W be the coset
map π(v) := v + W . Show that the collection of sets U ⊂ V /W such that
$\pi^{-1}(U)$ is open gives V/W the structure of a topological vector space. If V
is Hausdorff, show that V /W is Hausdorff if and only if W is closed in V .

Some (but not all) of the concepts that are definable for normed vector
spaces are also definable for the more general category of topological vector
spaces. For instance, even though there is no metric structure, one can still
define the notion of a Cauchy sequence xn ∈ V in a topological vector space:
this is a sequence such that xn −xm → 0 as n, m → ∞ (or more precisely, for
any open neighbourhood U of 0, there exists N > 0 such that xn − xm ∈ U
for all n, m ≥ N ). It is then possible to talk about a topological vector
space being complete (i.e., every Cauchy sequence converges). (From a more
abstract perspective, the reason we can define notions such as completeness
is because a topological vector space has something better than a topological
structure, namely a uniform structure.)
Remark 1.9.7. As we have seen in previous lectures, complete normed
vector spaces (i.e., Banach spaces) enjoy some very nice properties. Some
of these properties (e.g., the uniform boundedness principle and the open
mapping theorem) extend to a slightly larger class of complete topological
vector spaces, namely the Fréchet spaces. A Fréchet space is a complete
Hausdorff topological vector space whose topology is generated by an at
most countable family of seminorms; examples include the space C^∞([0, 1])
from Exercise 1.9.6 or the uniform convergence on compact topology from
Exercise 1.9.5 in the case when X is σ-compact. We will however not study
Fréchet spaces systematically here.

One can also extend the notion of a dual space V* from normed vector
spaces to topological vector spaces in the obvious manner: the dual space
V* of a topological vector space is the space of continuous linear functionals
from V to the field of scalars (either R or C, depending on whether V is a real
or complex vector space). This is clearly a vector space. Unfortunately, in
the absence of a norm on V, one cannot define the analogue of the norm
topology on V*, but as we shall see below, there are some weaker topologies
that one can still place on this dual space.

1.9.2. Compactness in the strong topology. We now return to normed
vector spaces and briefly discuss compactness in the strong (or norm)
topology on such spaces. In finite dimensions, the Heine-Borel theorem tells
us that a set is compact if and only if it is closed and bounded. In infinite
dimensions, this is not enough, for two reasons. Firstly, compact sets need to
be complete, so we are only likely to find many compact sets when the ambient
normed vector space is also complete (i.e., it is a Banach space). Secondly,
compact sets need to be totally bounded rather than merely bounded, and
this is quite a stringent condition. Indeed, it forces compact sets to be almost
finite-dimensional in the following sense:
Exercise 1.9.11. Let K be a subset of a Banach space V . Show that the
following are equivalent:
(i) K is compact.
(ii) K is sequentially compact.
(iii) K is closed and bounded, and for every ε > 0, K lies in the
ε-neighbourhood {x ∈ V : ‖x − y‖ < ε for some y ∈ W} of a
finite-dimensional subspace W of V.

Suppose furthermore that there is a nested sequence V_1 ⊂ V_2 ⊂ · · · of
finite-dimensional subspaces of V such that ⋃_{n=1}^∞ V_n is dense. Show that
the following statement is equivalent to the first three:
(iv) K is closed and bounded, and for every ε > 0, there exists an n such
that K lies in the ε-neighbourhood of V_n.
Example 1.9.8. Let 1 ≤ p < ∞. In order for a set K ⊂ ℓ^p(N) to be
compact in the strong topology, it needs to be closed and bounded, and also
uniformly pth-power integrable at spatial infinity in the sense that for every
ε > 0 there exists n > 0 such that
$$\Big( \sum_{m > n} |f(m)|^p \Big)^{1/p} \le \varepsilon$$
for all f ∈ K. Thus, for instance, the moving bump example {e_1, e_2, e_3, ...},
where e_n is the sequence which equals 1 on n and zero elsewhere, is not
uniformly pth-power integrable and thus not a compact subset of ℓ^p(N),
despite being closed and bounded.
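The criterion in Example 1.9.8 can be illustrated on finite truncations. The truncation length N and the helper names below are illustrative assumptions, and a finite computation of course proves nothing about ℓ^p(N) itself. The sketch computes the worst tail sup_{f ∈ K} (∑_{m>n} |f(m)|^p)^{1/p} for two finite families. For the moving bumps e_1, ..., e_N it equals 1 for every n < N. For the shrinking bumps e_k/k, which together with 0 form a compact set, it equals 1/(n + 1) and so can be made smaller than any ε.

```python
import math

def bump(k, length, height=1.0):
    # Truncation of height * e_k to the coordinates m = 1, ..., length.
    return [height if m == k else 0.0 for m in range(1, length + 1)]

def tail_norm(f, n, p):
    # (sum_{m > n} |f(m)|^p)^(1/p); f[0] stores f(1), so f[n:] holds f(n+1), f(n+2), ...
    return sum(abs(v) ** p for v in f[n:]) ** (1.0 / p)

def worst_tail(family, n, p):
    # Supremum of the tail beyond n over a finite family of sequences.
    return max(tail_norm(f, n, p) for f in family)

N = 200
moving_bumps = [bump(k, N) for k in range(1, N + 1)]              # e_1, ..., e_N
shrinking_bumps = [bump(k, N, 1.0 / k) for k in range(1, N + 1)]  # e_k / k
```

Only the shrinking family meets the uniform tail condition. In the truncated model, the tail of the moving bumps drops to zero only once n reaches the artificial cutoff N.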
For continuous L^p spaces, such as L^p(R), uniform integrability at spatial
infinity is not sufficient to force compactness in the strong topology; one also
needs some uniform integrability at very fine scales, which can be described
using harmonic analysis tools such as the Fourier transform (Section 1.12).
We will not discuss this topic here.
Exercise 1.9.12. Let V be a normed vector space.
• If W is a finite-dimensional subspace of V, and x ∈ V, show that
there exists y ∈ W such that ‖x − y‖ ≤ ‖x − y′‖ for all y′ ∈ W. Give
an example to show that y is not necessarily unique (in contrast to
the situation with Hilbert spaces).
• If W is a finite-dimensional proper subspace of V, show that there
exists x ∈ V with ‖x‖ = 1 such that ‖x − y‖ ≥ 1 for all y ∈ W (cf.
the Riesz lemma).
• Show that the closed unit ball {x ∈ V : ‖x‖ ≤ 1} is compact in the
strong topology if and only if V is finite dimensional.
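For the first bullet of Exercise 1.9.12, one standard example (a hypothetical choice, not taken from the text) is V = R^2 with the sup norm ‖(a, b)‖ = max(|a|, |b|), W = span{(1, 0)} and x = (0, 1). Then ‖x − (t, 0)‖ = max(|t|, 1), so every (t, 0) with |t| ≤ 1 is a nearest point. The sketch below checks this on a grid of sample points. It also contrasts the result with the Euclidean norm, where the nearest point is unique.

```python
import math

# V = R^2, W = span{(1, 0)}, x = (0, 1); compare nearest points in W for two norms.

def sup_norm(v):
    return max(abs(c) for c in v)

def euclidean_norm(v):
    return math.hypot(*v)

def nearest_points(norm, x, candidates):
    # Minimal distance from x to the sample points (t, 0), and every sample t attaining it.
    dist = {t: norm((x[0] - t, x[1])) for t in candidates}
    best = min(dist.values())
    return best, [t for t, d in dist.items() if d == best]

x = (0.0, 1.0)
ts = [k / 10 for k in range(-20, 21)]  # sample points t in [-2, 2]
```

With the sup norm, all 21 samples in [−1, 1] are minimisers. With the Euclidean norm, only t = 0 is.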

1.9.3. The weak and weak* topologies. Let V be a topological vector
space. Then, as discussed above, we have the vector space V* of continuous
linear functionals on V. We can use this dual space to create two useful
topologies: the weak topology on V and the weak* topology on V*.
Definition 1.9.9 (Weak and weak* topologies). Let V be a topological
vector space, and let V ∗ be its dual.
• The weak topology on V is the topology generated by the seminorms
‖x‖_λ := |λ(x)| for all λ ∈ V*.
• The weak* topology on V* is the topology generated by the seminorms
‖λ‖_x := |λ(x)| for all x ∈ V.
Remark 1.9.10. It is possible for two non-isomorphic topological vector
spaces to have isomorphic duals, but with non-isomorphic weak* topologies.
(For instance, ℓ^1(N) has a very large number of preduals, which can generate
a number of different weak* topologies on ℓ^1(N).) So, technically, one cannot
talk about the weak* topology on a dual space V* without specifying exactly
what the predual space V is. However, in practice, the predual space is
usually clear from context.
Exercise 1.9.13. Show that the weak topology on V is a topological vector
space structure on V that is weaker than the strong topology on V . Also,
show that the weak* topology on V ∗ is a topological vector space structure
on V ∗ that is weaker than the weak topology on V ∗ (which is defined using
the double dual (V ∗ )∗ ). When V is reflexive, show that the weak and weak*
topologies on V ∗ are equivalent.

From the definition, we see that a sequence x_n ∈ V converges in the
weak topology, or converges weakly for short, to a limit x ∈ V if and only
if λ(x_n) → λ(x) for all λ ∈ V*. This weak convergence is often denoted
x_n ⇀ x, to distinguish it from strong convergence x_n → x. Similarly, a
sequence λ_n ∈ V* converges in the weak* topology to λ ∈ V* if and only if
λ_n(x) → λ(x) for all x ∈ V (thus λ_n, viewed as a function on V, converges
pointwise to λ).
Remark 1.9.11. If V is a Hilbert space, then from the Riesz representation
theorem for Hilbert spaces (Theorem 1.4.13) we see that a sequence x_n ∈ V
converges weakly (or in the weak* sense) to a limit x ∈ V if and only if
⟨x_n, y⟩ → ⟨x, y⟩ for all y ∈ V.
Exercise 1.9.14. Show that if V is a normed vector space, then the weak
topology on V and the weak* topology on V ∗ are both Hausdorff. (Hint:
You will need the Hahn-Banach theorem.) In particular, we conclude the
important fact that weak and weak* limits, when they exist, are unique.

The following exercise shows that the strong, weak, and weak* topologies
can all differ from each other.
Exercise 1.9.15. Let V := c_0(N), thus V* ≡ ℓ^1(N) and V** ≡ ℓ^∞(N). Let
e_1, e_2, ... be the standard basis of either V, V*, or V**.
• Show that the sequence e_1, e_2, ... converges weakly in V to zero, but
does not converge strongly in V.
• Show that the sequence e_1, e_2, ... converges in the weak* sense in V*
to zero, but does not converge in the weak or strong senses in V*.
• Show that the sequence ∑_{m=n}^∞ e_m for n = 1, 2, ... converges in the
weak* topology of V** to zero, but does not converge in the weak or
strong senses. (Hint: Use a generalised limit functional.)
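The mechanism behind the first bullet of Exercise 1.9.15 can be seen on finite truncations. The truncation is an illustrative assumption, and the sample functional λ_k = 1/k² is likewise an arbitrary choice. For λ ∈ ℓ^1(N) one has λ(e_n) = λ_n, which tends to zero because λ is summable. By contrast, ‖e_n − e_m‖_∞ = 1 whenever n ≠ m (the code checks consecutive pairs), so no subsequence is Cauchy in norm.

```python
# Finite truncations stand in for c_0(N) and l^1(N); this is an illustrative assumption.

def pair(lam, x):
    # lambda(x) = sum_k lam_k * x_k, the usual pairing of l^1(N) with c_0(N)
    return sum(a * b for a, b in zip(lam, x))

def basis(n, length):
    # e_n truncated to `length` coordinates (1-indexed)
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

def sup_norm(x):
    return max(abs(c) for c in x)

L = 1000
lam = [1.0 / k ** 2 for k in range(1, L + 1)]  # one sample functional in l^1(N)

# lambda(e_n) = lam_n, the n-th term of a summable sequence, so it tends to zero ...
pairings = [pair(lam, basis(n, L)) for n in (1, 10, 100, 1000)]
# ... while ||e_n - e_{n+1}||_inf = 1, so (e_n) is not Cauchy in the sup norm.
gaps = [sup_norm([a - b for a, b in zip(basis(n, L), basis(n + 1, L))]) for n in (1, 10, 100)]
```

The weak limit 0 has norm 0 while ‖e_n‖_∞ = 1 for all n, so the same example also exhibits the strict inequality in Exercise 1.9.17.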
Remark 1.9.12. Recall from Exercise 1.7.11 that sequences in V* ≡ ℓ^1(N)
that converge in the weak topology also converge in the strong topology.
We caution however that the two topologies are not quite equivalent; for
instance, the open unit ball in ℓ^1(N) is open in the strong topology but not
in the weak topology.
Exercise 1.9.16. Let V be a normed vector space, and let E be a subset
of V . Show that the following are equivalent:
• E is strongly bounded (i.e., E is contained in a ball).
• E is weakly bounded (i.e., λ(E) is bounded for all λ ∈ V ∗ ).
(Hint: Use the Hahn-Banach theorem and the uniform boundedness princi-
ple.) Similarly, if F is a subset of V ∗ , and V is a Banach space, show that F
is strongly bounded if and only if F is weak* bounded (i.e., {λ(x) : λ ∈ F }
is bounded for each x ∈ V ). Conclude in particular that any sequence which
is weakly convergent in V or weak* convergent in V ∗ is necessarily bounded.


Exercise 1.9.17. Let V be a Banach space, and let x_n ∈ V converge weakly
to a limit x ∈ V. Show that the sequence x_n is bounded, and
$$\|x\|_V \le \liminf_{n \to \infty} \|x_n\|_V.$$
Observe from Exercise 1.9.15 that strict inequality can hold (cf. Fatou’s
lemma, Theorem 1.1.21). Similarly, if λ_n ∈ V* converges in the weak*
topology to a limit λ ∈ V*, show that the sequence λ_n is bounded and that
$$\|\lambda\|_{V^*} \le \liminf_{n \to \infty} \|\lambda_n\|_{V^*}.$$
Again, construct an example to show that strict inequality can hold. Thus
we see that weak or weak* limits can lose mass in the limit, as opposed
to strong limits. (Note from the triangle inequality that if x_n converges
strongly to x, then ‖x_n‖_V converges to ‖x‖_V.)
Exercise 1.9.18. Let H be a Hilbert space, and let x_n ∈ H converge weakly
to a limit x ∈ H. Show that the following statements are equivalent:
• x_n converges strongly to x.
• ‖x_n‖ converges to ‖x‖.
Exercise 1.9.19. Let H be a separable Hilbert space. We say that a
sequence x_n ∈ H converges in the Cesàro sense to a limit x ∈ H if
$$\frac{1}{N} \sum_{n=1}^{N} x_n$$
converges strongly to x as N → ∞.
• Show that if x_n converges strongly to x, then it also converges in the
Cesàro sense to x.
• Give examples to show that weak convergence does not imply
Cesàro convergence, and vice versa. On the other hand, if a
sequence x_n converges both weakly and in the Cesàro sense, show that
the weak limit is necessarily equal to the Cesàro limit.
• Show that if a bounded sequence converges in the Cesàro sense to a
limit x, then some subsequence converges weakly to x.
• Show that a sequence x_n converges weakly to x if and only if every
subsequence has a further subsequence that converges in the Cesàro
sense to x.
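As a small check on the first two bullets, consider an orthonormal sequence e_1, e_2, .... It is modelled below, as an assumption, inside R^N with N large enough. The sequence converges weakly to zero. Its Cesàro averages satisfy ‖(1/N)(e_1 + ... + e_N)‖ = 1/√N by Pythagoras, so it also converges to zero in the Cesàro sense, and the two limits agree as the exercise predicts.

```python
import math

def basis(n, length):
    # e_n in R^length (1-indexed), an orthonormal family for the Euclidean inner product
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

def cesaro_average(vectors):
    # (1/N) * (x_1 + ... + x_N), computed coordinatewise
    N = len(vectors)
    return [sum(col) / N for col in zip(*vectors)]

def l2_norm(x):
    return math.sqrt(sum(c * c for c in x))

def cesaro_norm_of_basis(N):
    # ||(1/N)(e_1 + ... + e_N)||, which equals 1/sqrt(N) by Pythagoras
    return l2_norm(cesaro_average([basis(n, N) for n in range(1, N + 1)]))
```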
Exercise 1.9.20. Let V be a Banach space. Show that the closed unit ball
in V is also closed in the weak topology, and the closed unit ball in V ∗ is
closed in the weak* topology.
Exercise 1.9.21. Let V be a Banach space. Show that the weak* topology
on V ∗ is complete.
Exercise 1.9.22. Let V be a normed vector space, and let W be a subspace
of V which is closed in the strong topology of V.
• Show that W is closed in the weak topology of V.
• If w_n ∈ W is a sequence and w ∈ W, show that w_n converges to w in
the weak topology of W if and only if it converges to w in the weak
topology of V. (Because of this fact, we can often refer to “the weak
topology” without specifying the ambient space precisely.)
Exercise 1.9.23. Let V := c_0(N) with the uniform (i.e., ℓ^∞) norm, and
identify the dual space V* with ℓ^1(N) in the usual manner.
• Show that a sequence x_n ∈ c_0(N) converges weakly to a limit x ∈
c_0(N) if and only if the x_n are bounded in c_0(N) and converge
pointwise to x.
• Show that a sequence λ_n ∈ ℓ^1(N) converges in the weak* topology
to a limit λ ∈ ℓ^1(N) if and only if the λ_n are bounded in ℓ^1(N) and
converge pointwise to λ.
• Show that the weak topology in c_0(N) is not complete.
(More generally, it may help to think of the weak and weak* topologies as
being analogous to pointwise convergence topologies.)

One of the main reasons why we use the weak and weak* topologies in
the first place is that they have much better compactness properties than
the strong topology, thanks to the Banach-Alaoglu theorem:
Theorem 1.9.13 (Banach-Alaoglu theorem). Let V be a normed vector
space. Then the closed unit ball of V ∗ is compact in the weak* topology.

This result should be contrasted with Exercise 1.9.12.

Proof. Let’s say V is a complex vector space (the case of real vector spaces
is of course analogous). Let B* be the closed unit ball of V*; then any
linear functional λ ∈ B* maps the closed unit ball B of V into the disk
D := {z ∈ C : |z| ≤ 1}. Thus one can identify B* with a subset of D^B, the
space of functions from B to D. One easily verifies that the weak* topology
on B* is nothing more than the product topology of D^B restricted to B*.
Also, one easily shows that B* is closed in D^B. But by Tychonoff’s theorem,
D^B is compact, and so B* is compact also. □

One should caution that the Banach-Alaoglu theorem does not imply
that the space V* is locally compact in the weak* topology, because the
norm ball in V* has empty interior in the weak* topology unless V is finite
dimensional. In fact, we have the following result of Riesz:
Exercise 1.9.24. Let V be a locally compact Hausdorff topological vector
space. Show that V is finite dimensional. (Hint: If V is locally compact,
then there exists an open neighbourhood U of the origin whose closure Ū is
compact. Show that U ⊂ W + ½U for some finite-dimensional subspace W,
where W + ½U := {w + ½u : w ∈ W, u ∈ U}. Iterate this to conclude that
U ⊂ W + εU for any ε > 0. On the other hand, use the compactness of Ū
to show that for any point x ∈ V∖W there exists ε > 0 such that x − εU is
disjoint from W. Conclude that U ⊂ W and thence that V = W.)

The sequential version of the Banach-Alaoglu theorem is also of importance
(particularly in PDE):
Theorem 1.9.14 (Sequential Banach-Alaoglu theorem). Let V be a separable
normed vector space. Then the closed unit ball of V* is sequentially
compact in the weak* topology.

Proof. The functionals in B* are uniformly bounded and uniformly
equicontinuous on B, which by hypothesis has a countable dense subset Q. By
the sequential Tychonoff theorem, any sequence in B* then has a subsequence
which converges pointwise on Q, and thus converges pointwise on B by
Exercise 1.8.28, and thus converges in the weak* topology. But as B* is
closed in this topology, we conclude that B* is sequentially compact as
required. □
Remark 1.9.15. One can also deduce the sequential Banach-Alaoglu theorem
from the general Banach-Alaoglu theorem by observing that the weak*
topology on (bounded subsets of) the dual of a separable space is metrisable.
The sequential Banach-Alaoglu theorem can break down for non-separable
spaces. For instance, the closed unit ball in ℓ^∞(N)* is not sequentially
compact in the weak* topology, basically because the space βN of ultrafilters
is not sequentially compact (see Exercise 2.3.12 of Poincaré’s Legacies, Vol.
I).

If V is reflexive, then the weak topology on V is identical to the weak*
topology on (V*)*. We thus have
Corollary 1.9.16. If V is a reflexive normed vector space, then the closed
unit ball in V is weakly compact and (if V* is separable) is also sequentially
weakly compact.
Remark 1.9.17. If V is a normed vector space that is not separable, then
one can show that V* is not separable either. Indeed, using transfinite
induction on the first uncountable ordinal, one can construct an uncountable
proper chain of closed separable subspaces of the inseparable space V, which
by the Hahn-Banach theorem induces an uncountable proper chain of closed
subspaces on V*, which is not compatible with separability. As a consequence,
a reflexive space is separable if and only if its dual is separable.¹¹
¹¹On the other hand, separable spaces can have non-separable duals; consider
ℓ^1(N), for instance.


In particular, any bounded sequence in a reflexive separable normed
vector space has a weakly convergent subsequence. This fact leads to the
very useful weak compactness method in PDE and calculus of variations,
in which a solution to a PDE or variational problem is constructed by first
constructing a bounded sequence of near-solutions or near-extremisers to
the PDE or variational problem and then extracting a weak limit. However,
it is important to caution that weak compactness can fail for non-reflexive
spaces; indeed, for such spaces the closed unit ball in V may not even be
weakly complete, let alone weakly compact, as already seen in Exercise
1.9.23. Thus, one should be cautious when applying the weak compactness
method to a non-reflexive space such as L^1 or L^∞. (On the other hand,
weak* compactness does not need reflexivity, and is thus safer to use in such
cases.)
In later notes we will see that the (sequential) Banach-Alaoglu theorem
will combine very nicely with the Riesz representation theorem for mea-
sures (Section 1.10.2), leading in particular to Prokhorov’s theorem (Exercise
1.10.29).

1.9.4. The strong and weak operator topologies. Now we turn our
attention from function spaces to spaces of operators. Recall that if X
and Y are normed vector spaces, then B(X → Y) is the space of bounded
linear transformations from X to Y. This is a normed vector space with the
operator norm
$$\|T\|_{op} := \sup\{ \|Tx\|_Y : \|x\|_X \le 1 \}.$$
This norm induces the operator norm topology on B(X → Y). Unfortunately,
this topology is so strong that it is difficult for a sequence of operators
T_n ∈ B(X → Y) to converge to a limit; for this reason, we introduce
two weaker topologies.
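In finite dimensions the supremum defining ‖T‖_op can be computed directly. As a hedged illustration, the sketch below takes X = Y = R^2 with the Euclidean norm and T given by a 2×2 matrix; the matrices and the sampling grid are arbitrary choices, not from the text. It approximates ‖T‖_op by sampling unit vectors and compares the result with the largest singular value, computed as the square root of the largest eigenvalue of AᵀA.

```python
import math

# Hypothetical finite-dimensional case: X = Y = R^2 with the Euclidean norm,
# and T given by a 2x2 matrix A (a list of rows).

def apply(A, x):
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])

def op_norm_sampled(A, samples=100_000):
    # sup{ ||Ax|| : ||x|| <= 1 }, approximated from below by unit vectors on a grid of angles
    best = 0.0
    for k in range(samples):
        s = 2 * math.pi * k / samples
        best = max(best, math.hypot(*apply(A, (math.cos(s), math.sin(s)))))
    return best

def op_norm_exact(A):
    # Largest singular value: sqrt of the largest eigenvalue of A^T A = [[a, b], [b, d]]
    a = A[0][0] ** 2 + A[1][0] ** 2
    b = A[0][0] * A[0][1] + A[1][0] * A[1][1]
    d = A[0][1] ** 2 + A[1][1] ** 2
    return math.sqrt((a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2))
```

For example, for the shear matrix [[1, 2], [0, 1]] both functions give approximately 1 + √2.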

Definition 1.9.18 (Strong and weak operator topologies). Let X, Y be
normed vector spaces. The strong operator topology on B(X → Y) is the
topology induced by the seminorms T ↦ ‖Tx‖_Y for all x ∈ X. The weak
operator topology on B(X → Y) is the topology induced by the seminorms
T ↦ |λ(Tx)| for all x ∈ X and λ ∈ Y*.

Note that a sequence T_n ∈ B(X → Y) converges in the strong operator
topology to a limit T ∈ B(X → Y) if and only if T_n x → Tx strongly
in Y for all x ∈ X, and T_n converges to T in the weak operator topology if
and only if T_n x ⇀ Tx weakly in Y for all x ∈ X. (In contrast, T_n converges
to T in the operator norm topology if and only if T_n x converges to Tx
uniformly for x in bounded sets.) One easily sees that the weak operator
topology is weaker than the strong operator topology, which in turn is
(somewhat confusingly) weaker than the operator norm topology.
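A standard example separating the strong operator and operator norm topologies is the coordinate projection P_n on ℓ^2(N) that keeps the first n coordinates. This example is not taken from the text, and the names below are hypothetical. For each fixed x, ‖P_n x − x‖² = ∑_{k>n} |x_k|² → 0, so P_n → I in the strong operator topology. However, (P_n − I)e_{n+1} = −e_{n+1}, so ‖P_n − I‖_op ≥ 1 for every n. The sketch checks both facts on finite truncations of length 10000, which is an illustrative assumption.

```python
import math

# Finite truncation (an assumption): lists of length LENGTH stand in for l^2(N).
LENGTH = 10000

def project(x, n):
    # P_n x: keep the first n coordinates and zero out the rest
    return [c if k < n else 0.0 for k, c in enumerate(x)]

def l2_norm(x):
    return math.sqrt(sum(c * c for c in x))

def residual(x, n):
    # ||P_n x - x||, which tends to zero for each fixed x (strong operator convergence)
    return l2_norm([a - b for a, b in zip(project(x, n), x)])

def basis(k, length=LENGTH):
    # e_k, 1-indexed
    return [1.0 if j == k - 1 else 0.0 for j in range(length)]

x = [1.0 / (k + 1) for k in range(LENGTH)]  # truncation of (1, 1/2, 1/3, ...)
```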


Example 1.9.19. When X is the scalar field, then B(X → Y) is canonically
isomorphic to Y. In this case, the operator norm and strong operator
topologies coincide with the strong topology on Y, and the weak operator
topology coincides with the weak topology on Y. Meanwhile, B(Y → X)
coincides with Y*, and the operator norm topology coincides with the strong
topology on Y*, while the strong and weak operator topologies correspond
with the weak* topology on Y*.

We can rephrase the uniform boundedness principle for convergence
(Corollary 1.7.7) as follows:
Proposition 1.9.20 (Uniform boundedness principle). Let T_n ∈ B(X → Y)
be a sequence of bounded linear operators from a Banach space X to a
normed vector space Y, let T ∈ B(X → Y) be another bounded linear
operator, and let D be a dense subspace of X. Then the following are
equivalent:
• T_n converges in the strong operator topology of B(X → Y) to T.
• T_n is bounded in the operator norm (i.e., ‖T_n‖_op is bounded), and
the restriction of T_n to D converges in the strong operator topology
of B(D → Y) to the restriction of T to D.
Exercise 1.9.25. Let the hypotheses be as in Proposition 1.9.20, but now
assume that Y is also a Banach space. Show that the conclusion of Propo-
sition 1.9.20 continues to hold if “strong operator topology” is replaced by
“weak operator topology”.
Exercise 1.9.26. Show that the operator norm topology, strong operator
topology, and weak operator topology are all Hausdorff. As these topologies
are nested, we thus conclude that it is not possible for a sequence of operators
to converge to one limit in one of these topologies and to converge to a
different limit in another.
Example 1.9.21. Let X = L^2(R), and for each t ∈ R, let T_t : X → X be
the translation operator by t: T_t f(x) := f(x − t). If f is continuous and
compactly supported, then (e.g., from dominated convergence) we see that
T_t f → f in L^2 as t → 0. Since the space of continuous and compactly
supported functions is dense in L^2(R), this implies (from the above
proposition, with some obvious modifications to deal with the continuous
parameter t instead of the discrete parameter n) that T_t converges in the
strong operator topology (and hence weak operator topology) to the identity.
On the other hand, T_t does not converge to the identity in the operator norm
topology. Indeed, observe for any t > 0 that
$$\|(T_t - I) 1_{[0,t]}\|_{L^2(\mathbf{R})} = \sqrt{2}\, \|1_{[0,t]}\|_{L^2(\mathbf{R})},$$
and thus ‖T_t − I‖_op ≥ √2.
In a similar vein, T_t does not converge to anything in the strong operator
topology (and hence does not converge in the operator norm topology either)
in the limit t → ∞, since T_t 1_{[0,1]} (say) does not converge strongly in L^2.
However, one easily verifies that ⟨T_t f, g⟩ → 0 as t → ∞ for any compactly
supported f, g ∈ L^2(R), and hence for all f, g ∈ L^2(R) by the usual limiting
argument, and hence T_t converges in the weak operator topology to zero.
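The computations in Example 1.9.21 can be made fully explicit for indicator functions of intervals, since ⟨1_I, 1_J⟩ = |I ∩ J| and T_t 1_{[a,b]} = 1_{[a+t,b+t]}. The sketch below checks three of the claims for f = 1_{[0,1]}; its helper names are hypothetical. First, the ratio ‖(T_t − I)1_{[0,t]}‖/‖1_{[0,t]}‖ equals √2 for every t > 0. Second, ‖T_t f − f‖ = √(2t) → 0 for 0 < t ≤ 1. Third, for t ≥ 1, ⟨T_t f, f⟩ = 0 while ‖T_t f − T_{t+1} f‖ = √2, so T_t f is not Cauchy as t → ∞.

```python
import math

# Indicator functions of intervals [a, b] in L^2(R), stored as pairs (a, b);
# their inner products and distances can be computed exactly.

def overlap(I, J):
    # <1_I, 1_J> = length of the intersection of the intervals I and J
    return max(0.0, min(I[1], J[1]) - max(I[0], J[0]))

def translate(I, t):
    # T_t 1_[a,b] = 1_[a+t, b+t], since T_t f(x) = f(x - t)
    return (I[0] + t, I[1] + t)

def dist(I, J):
    # ||1_I - 1_J||_{L^2} = sqrt(|I| + |J| - 2 |I ∩ J|)
    return math.sqrt((I[1] - I[0]) + (J[1] - J[0]) - 2 * overlap(I, J))

def norm_ratio(t):
    # ||(T_t - I) 1_[0,t]|| / ||1_[0,t]||, which equals sqrt(2) for every t > 0
    I = (0.0, t)
    return dist(translate(I, t), I) / math.sqrt(t)
```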
The following exercise may help clarify the relationship between the
operator norm, strong operator, and weak operator topologies.
Exercise 1.9.27. Let H be a Hilbert space, and let T_n ∈ B(H → H) be a
sequence of bounded linear operators.
• Show that T_n → 0 in the operator norm topology if and only if
⟨T_n x_n, y_n⟩ → 0 for any bounded sequences x_n, y_n ∈ H.
• Show that T_n → 0 in the strong operator topology if and only if
⟨T_n x_n, y_n⟩ → 0 for any convergent sequence x_n ∈ H and any bounded
sequence y_n ∈ H.
• Show that T_n → 0 in the weak operator topology if and only if
⟨T_n x_n, y_n⟩ → 0 for any convergent sequences x_n, y_n ∈ H.
• Show that T_n → 0 in the operator norm (resp. weak operator) topology
if and only if T_n^† → 0 in the operator norm (resp. weak operator)
topology. Give an example to show that the corresponding claim for
the strong operator topology is false.
There is a counterpart of the Banach-Alaoglu theorem (and its sequential
analogue), at least in the case of Hilbert spaces:
Exercise 1.9.28. Let H, H′ be Hilbert spaces. Show that the closed unit
ball (in the operator norm) in B(H → H′) is compact in the weak operator
topology. If H and H′ are separable, show that this closed unit ball is
sequentially compact in the weak operator topology.
The behaviour of convergence in various topologies with respect to
composition is somewhat complicated, as the following exercise shows.
Exercise 1.9.29. Let H be a Hilbert space, let S_n, T_n ∈ B(H → H) be
sequences of operators, and let S ∈ B(H → H) be another operator.
• If T_n → 0 in the operator norm (resp. strong operator or weak
operator) topology, show that ST_n → 0 and T_n S → 0 in the operator
norm (resp. strong operator or weak operator) topology.
• If T_n → 0 in the operator norm topology and S_n is bounded in the
operator norm topology, show that S_n T_n → 0 and T_n S_n → 0 in the
operator norm topology.
• If T_n → 0 in the strong operator topology and S_n is bounded in the
operator norm topology, show that S_n T_n → 0 in the strong operator
topology.
• Give an example where T_n → 0 in the strong operator topology and
S_n → 0 in the weak operator topology, but T_n S_n does not converge
to zero even in the weak operator topology.
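For the last bullet, one standard choice (not given in the text) uses the shift operators on ℓ^2(N): the left shift L(x_1, x_2, ...) := (x_2, x_3, ...) and the right shift R(x_1, x_2, ...) := (0, x_1, x_2, ...). Take T_n := L^n and S_n := R^n. Then ‖L^n x‖² = ∑_{k>n} |x_k|² → 0, so T_n → 0 strongly. Also ⟨R^n x, y⟩ = ∑_k x_k y_{k+n} → 0 by Cauchy-Schwarz, so S_n → 0 weakly. Yet T_n S_n = L^n R^n = I for every n. Since L^† = R and R^n is an isometry, the same pair also answers the last bullet of Exercise 1.9.27. The sketch below checks these identities on finitely supported sequences stored as Python lists, which is an illustrative assumption.

```python
# Finite truncations (an assumption): lists stand in for finitely supported elements of l^2(N).

def left_shift_power(x, n):
    # L^n x = (x_{n+1}, x_{n+2}, ...): drop the first n coordinates
    return x[n:]

def right_shift_power(x, n):
    # R^n x = (0, ..., 0, x_1, x_2, ...): prepend n zeros; R^n is an isometry
    return [0.0] * n + list(x)

def inner(x, y):
    # <x, y> for real finitely supported sequences (missing coordinates count as zero)
    return sum(a * b for a, b in zip(x, y))

def sq_norm(x):
    return inner(x, x)

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0, 7.0]
```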
Exercise 1.9.30. Let H be a Hilbert space. An operator T ∈ B(H → H)
is said to be finite rank if its image T (H) is finite dimensional. T is said
to be compact if the image of the unit ball is precompact. Let K(H → H)
denote the space of compact operators on H.
• Show that T ∈ B(H → H) is compact if and only if it is the limit
of finite rank operators in the operator norm topology. Conclude in
particular that K(H → H) is a closed subset of B(H → H) in the
operator norm topology.
• Show that an operator T ∈ B(H → H) is compact if and only if T^†
is compact.
• If H is separable, show that every T ∈ B(H → H) is the limit of
finite rank operators in the strong operator topology.
• If T ∈ K(H → H), show that T maps weakly convergent sequences to
strongly convergent sequences. (This property is known as complete
continuity.)
• Show that K(H → H) is a subspace of B(H → H), which is closed
with respect to left and right multiplication by elements of B(H → H).
(In other words, the space of compact operators is a two-sided ideal
in the algebra of bounded operators.)

The weak operator topology plays a particularly important role in the
theory of von Neumann algebras, which we will not discuss here.

Notes. This lecture first appeared at
terrytao.wordpress.com/2009/02/21.
Thanks to Eric, etale, less than epsilon, Matt Daws, PDEBeginner, Sebastian
Scholtes, Xiaochuan Liu, Yasser Taima, and anonymous commenters
for corrections.

Section 1.10

Continuous functions on locally compact Hausdorff spaces

A key theme in real analysis is that of studying general functions f : X → R
or f : X → C by first approximating them by simpler or nicer functions.
But the precise class of simple or nice functions may vary from context
to context. In measure theory, for instance, it is common to approximate
measurable functions by indicator functions or simple functions. But in
other parts of analysis, it is often more convenient to approximate rough
functions by continuous or smooth functions (perhaps with compact support,
or some other decay condition) or by functions in some algebraic class, such
as the class of polynomials or trigonometric polynomials.
In order to approximate rough functions by more continuous ones, one
of course needs tools that can generate continuous functions with some spec-
ified behaviour. The two basic tools for this are Urysohn’s lemma, which
approximates indicator functions by continuous functions, and the Tietze
extension theorem, which extends continuous functions on a subdomain to
continuous functions on a larger domain. An important consequence of these
theorems is the Riesz representation theorem for linear functionals on the
space of compactly supported continuous functions, which describes such
functionals in terms of Radon measures.
Sometimes, approximation by continuous functions is not enough; one
must approximate continuous functions in turn by an even smoother class
of functions. A useful tool in this regard is the Stone-Weierstrass theorem,
which generalises the classical Weierstrass approximation theorem to more
general algebras of functions.
As an application of this theory (and of many of the results accumu-
lated in previous lecture notes), we will present (in an optional section) the
commutative Gelfand-Neimark theorem classifying all commutative unital
C*-algebras.

1.10.1. Urysohn’s lemma. Let X be a topological space. An indicator
function 1_E in this space will not typically be a continuous function
(indeed, if X is connected, this only happens when E is the empty set or the
whole set). Nevertheless, for certain topological spaces, it is possible to
approximate an indicator function by a continuous function, as follows.

Lemma 1.10.1 (Urysohn’s lemma). Let X be a topological space. Then the
following are equivalent:
(i) Every pair of disjoint closed sets K, L in X can be separated by
disjoint open neighbourhoods U ⊃ K, V ⊃ L.
(ii) For every closed set K in X and every open neighbourhood U of K,
there exists an open set V and a closed set L such that K ⊂ V ⊂ L ⊂ U.
(iii) For every pair of disjoint closed sets K, L in X there exists a
continuous function f : X → [0, 1] which equals 1 on K and 0 on L.
(iv) For every closed set K in X and every open neighbourhood U of K,
there exists a continuous function f : X → [0, 1] such that
1_K(x) ≤ f(x) ≤ 1_U(x) for all x ∈ X.

A topological space which obeys any (and hence all) of (i)–(iv) is known
as a normal space; definition (i) is traditionally taken to be the standard
definition of normality. We will give some examples of normal spaces shortly.

Proof. The equivalence of (iii) and (iv) is clear, as the complement of a
closed set is an open set and vice versa. The equivalence of (i) and (ii)
follows similarly.
To deduce (i) from (iii), let K, L be disjoint closed sets, let f be as
in (iii), and let U, V be the open sets U := {x ∈ X : f(x) > 2/3} and
V := {x ∈ X : f(x) < 1/3}.
The only remaining task is to deduce (iv) from (ii). Suppose we have a
closed set K = K_1 and an open set U = U_0 with K_1 ⊂ U_0. Applying (ii),
we can find an open set U_{1/2} and a closed set K_{1/2} such that
$$K_1 \subset U_{1/2} \subset K_{1/2} \subset U_0.$$
Applying (ii) two more times, we can find more open sets U_{1/4}, U_{3/4} and
closed sets K_{1/4}, K_{3/4} such that
$$K_1 \subset U_{3/4} \subset K_{3/4} \subset U_{1/2} \subset K_{1/2} \subset U_{1/4} \subset K_{1/4} \subset U_0.$$
Iterating this process, we can construct open sets U_q and closed sets K_q for
every dyadic rational q = a/2^n in (0, 1) such that U_q ⊂ K_q for all
0 < q < 1, and K_{q′} ⊂ U_q for any 0 ≤ q < q′ ≤ 1.
If we now define f(x) := sup{q : x ∈ U_q} = inf{q : x ∉ K_q}, where
q ranges over dyadic rationals between 0 and 1, and with the convention
that the empty set has sup 0 and inf 1, one easily verifies that the sets
$$\{f > \alpha\} = \bigcup_{q > \alpha} U_q \quad \text{and} \quad \{f < \alpha\} = \bigcup_{q < \alpha} X \setminus K_q$$
are open for every real number α, and so f is continuous as required. □
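To see the dyadic construction in a concrete case, the sketch below works on X = R with hypothetical choices not taken from the text: K = [−1, 1], U = (−3, 3), U_q = (−r(q), r(q)) and K_q = [−r(q), r(q)] with r(q) = 3 − 2q. These satisfy U_0 = U, K_1 = K, U_q ⊂ K_q, and K_{q′} ⊂ U_q for q < q′. Restricting q to dyadics a/2^n of a fixed level gives lower and upper approximations to f(x) = sup{q : x ∈ U_q} = inf{q : x ∉ K_q}. For these sets the limit is the continuous function min(1, max(0, (3 − |x|)/2)). Exact rational arithmetic (Python's `fractions` module) is used to avoid rounding at the boundaries.

```python
from fractions import Fraction

# Hypothetical concrete data on X = R: K = [-1, 1], U = (-3, 3), and for dyadic q
#   U_q = (-r(q), r(q)),  K_q = [-r(q), r(q)],  with r(q) = 3 - 2q.

def r(q):
    return 3 - 2 * q

def in_U(x, q):
    return abs(x) < r(q)

def in_K(x, q):
    return abs(x) <= r(q)

def f_lower(x, n):
    # sup{q : x in U_q} over dyadics a/2^n in [0, 1); the empty sup is taken to be 0
    hits = [q for q in (Fraction(a, 2 ** n) for a in range(2 ** n)) if in_U(x, q)]
    return max(hits, default=Fraction(0))

def f_upper(x, n):
    # inf{q : x not in K_q} over dyadics a/2^n in (0, 1]; the empty inf is taken to be 1
    misses = [q for q in (Fraction(a, 2 ** n) for a in range(1, 2 ** n + 1)) if not in_K(x, q)]
    return min(misses, default=Fraction(1))

def f_exact(x):
    # The limiting Urysohn function for this particular choice of sets
    c = (3 - abs(Fraction(x))) / 2
    return min(Fraction(1), max(Fraction(0), c))
```

At level n the two approximations bracket f(x) and differ by at most 2/2^n. This mirrors how the sup and inf in the proof coincide once all dyadic levels are used.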

The definition of normality is very similar to the Hausdorff property,
which separates pairs of points instead of closed sets. Indeed, if every point
in X is closed (a property known as the T_1 property), then normality clearly
implies the Hausdorff property. The converse is not always true, but (as the
term suggests) in practice most topological spaces one works with in real
analysis are normal. For instance:

Exercise 1.10.1. Show that every metric space is normal.
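For Exercise 1.10.1, a standard explicit construction (not stated in the text) is f(x) := d(x, L)/(d(x, K) + d(x, L)) for disjoint closed sets K, L in a metric space. The denominator never vanishes, because a point at distance zero from a closed set lies in that set. The sketch below evaluates f on the real line with hypothetical intervals K = [0, 1] and L = [2, 5]. The sets {f > 2/3} and {f < 1/3} are then disjoint open neighbourhoods of K and L, as in the proof of Lemma 1.10.1.

```python
def dist_to_interval(x, a, b):
    # d(x, [a, b]) on the real line
    return max(a - x, 0.0, x - b)

def urysohn(x, K, L):
    # f = d(x, L) / (d(x, K) + d(x, L)): equals 1 on K, 0 on L, continuous, values in [0, 1]
    dK = dist_to_interval(x, *K)
    dL = dist_to_interval(x, *L)
    return dL / (dK + dL)

K = (0.0, 1.0)
L = (2.0, 5.0)
```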

Exercise 1.10.2. Let X be a Hausdorff space.
• Show that a compact subset of X and a point outside that set
can always be separated by open neighbourhoods.
• Show that a pair of disjoint compact subsets of X can always be
separated by open neighbourhoods.
• Show that every compact Hausdorff space is normal.

Exercise 1.10.3. Let R be the real line with the usual topology F, and
let F′ be the topology on R generated by F and the rationals. Show that
(R, F′) is Hausdorff, with every point closed, but is not normal.

The above example was a simple but somewhat artificial example of a
non-normal space. One can create more natural examples of non-normal
Hausdorff spaces (with every point closed), but establishing non-normality
becomes more difficult. The following example is due to Stone [St1948].

Exercise 1.10.4. Let NR be the space of natural number-valued tuples


(nx )x∈R endowed with the product topology (i.e., the topology of pointwise
convergence).
• Show that NR is Hausdorff and every point is closed.

• For j = 1, 2, let $K_j$ be the set of all tuples $(n_x)_{x \in \mathbf{R}}$ such that $n_x = j$ for all $x$ outside of a countable set and such that $x \mapsto n_x$ is injective on this countable set (i.e., there do not exist distinct $x, x'$ such that $n_x = n_{x'} \neq j$). Show that $K_1, K_2$ are disjoint and closed.
• Show that given any open neighbourhood $U$ of $K_1$, there exist disjoint finite subsets $A_1, A_2, \ldots$ of $\mathbf{R}$ and an injective function $f: \bigcup_{i=1}^\infty A_i \to \mathbf{N}$ such that for any $j \ge 0$, any $(m_x)_{x \in \mathbf{R}}$ such that $m_x = f(x)$ for all $x \in A_1 \cup \cdots \cup A_j$ and is identically 1 on $A_{j+1}$, lies in $U$.
• Show that any open neighbourhood of $K_1$ and any open neighbourhood of $K_2$ necessarily intersect, and so $\mathbf{N}^{\mathbf{R}}$ is not normal.
• Conclude that $\mathbf{R}^{\mathbf{R}}$ with the product topology is not normal.

The property of being normal is a topological one: if one topological space is normal, then any other topological space homeomorphic to it is also normal. However, unlike (say) the Hausdorff property, the property of being normal is not preserved under passage to subspaces:
Exercise 1.10.5. Give an example of a subspace of a normal space which
is not normal. (Hint: Use Exercise 1.10.4, possibly after replacing R with
a homeomorphic equivalent.)

Let Cc(X → R) be the space of real continuous compactly supported functions on X. Urysohn’s lemma generates a large number of useful elements of Cc(X → R) in the case when X is locally compact Hausdorff (LCH):
Exercise 1.10.6. Let X be a locally compact Hausdorff space, let K be a
compact set, and let U be an open neighbourhood of K. Show that there
exists f ∈ Cc (X → R) such that 1K (x) ≤ f (x) ≤ 1U (x) for all x ∈ X.
(Hint: First use the local compactness of X to find a neighbourhood of K
with compact closure, then restrict U to this neighbourhood. The closure
of U is now a compact set. Restrict everything to this set, at which point
the space becomes normal.)
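To see Exercise 1.10.6 concretely, take $X = \mathbf{R}$, $K = [a,b]$ and $U = (c,d)$ with $c < a \le b < d$ (an illustrative choice, not from the text). One workable recipe is $f(x) := \min(1, d(x, X \backslash U)/d(K, X \backslash U))$; the same formula also works in $\mathbf{R}^n$ for compact $K$ and bounded open $U \supset K$. A minimal Python sketch, with a hypothetical function name:

```python
def bump(x, a, b, c, d):
    """Piecewise-linear f on R with 1_[a,b] <= f <= 1_(c,d), assuming c < a <= b < d.

    f(x) = min(1, dist(x, R minus (c, d)) / delta), where
    delta = dist([a, b], R minus (c, d)) > 0. The support of f is [c, d], a compact set.
    """
    dist_to_complement = max(0.0, min(x - c, d - x))  # distance from x to R minus (c, d)
    delta = min(a - c, d - b)                          # positive since [a, b] sits inside (c, d)
    return min(1.0, dist_to_complement / delta)
```

Note that $f$ equals 1 on the possibly larger interval $[c + \delta, d - \delta]$ with $\delta := \min(a - c, d - b)$; the exercise only asks for $1_K \le f \le 1_U$, so this is harmless.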

One consequence of this exercise is that Cc(X → R) tends to be dense in many other function spaces. We give an important example here:
Definition 1.10.2 (Radon measure). Let X be a locally compact Hausdorff
space that is also σ-compact, and let B be the Borel σ-algebra. An (unsigned)
Radon measure is an unsigned measure μ : B → R+ with the following
properties:
• Local finiteness. For any compact subset K of X, μ(K) is finite.

• Outer regularity. For any Borel set E of X, μ(E) = inf{μ(U) : U ⊃ E; U open}.
• Inner regularity. For any Borel set E of X, μ(E) = sup{μ(K) : K ⊂ E; K compact}.
Example 1.10.3. Lebesgue measure $m$ on $\mathbf{R}^n$ is a Radon measure, as is
any absolutely continuous unsigned measure $m_f$, where $f \in L^1(\mathbf{R}^n, dm)$.
More generally, if μ is Radon and ν is a finite unsigned measure which is
absolutely continuous with respect to μ, then ν is Radon. On the other hand,
counting measure on Rn is not Radon (it is not locally finite). It is possible
to define Radon measures on Hausdorff spaces that are not σ-compact or
locally compact, but the theory is more subtle and will not be considered
here. We will study Radon measures more thoroughly in the next section.
Proposition 1.10.4. Let X be a locally compact Hausdorff space which
is also σ-compact, and let μ be a Radon measure on X. Then for any
0 < p < ∞, Cc (X → R) is a dense subset in (real-valued ) Lp (X, μ). In other
words, every element of Lp (X, μ) can be expressed as a limit (in Lp (X, μ))
of continuous functions of compact support.

Proof. Since continuous functions of compact support are bounded, and compact sets have finite measure, we see that $C_c(X \to \mathbf{R})$ is a subspace of $L^p(X, \mu)$. We need to show that the closure $\overline{C_c(X \to \mathbf{R})}$ of this space contains all of $L^p(X, \mu)$.
Let $K$ be a compact set, and let $E \subset K$ be a Borel set; then $E$ has finite measure. Applying inner and outer regularity, we can find a sequence of compact sets $K_n \subset E$ and open sets $U_n \supset E$ such that $\mu(E \backslash K_n), \mu(U_n \backslash E) \to 0$. Applying Exercise 1.10.6, we can then find $f_n \in C_c(X \to \mathbf{R})$ such that $1_{K_n}(x) \le f_n(x) \le 1_{U_n}(x)$. In particular $|f_n - 1_E| \le 1_{U_n \backslash K_n}$, so that $\|f_n - 1_E\|_{L^p(X,\mu)}^p \le \mu(U_n \backslash K_n) \to 0$ (here we use the finiteness of $p$); thus $f_n$ converges in $L^p(X, \mu)$ to $1_E$. Hence $1_E$ lies in $\overline{C_c(X \to \mathbf{R})}$ for any measurable subset $E$ of $K$. By linearity, all simple functions supported on $K$ also lie in $\overline{C_c(X \to \mathbf{R})}$; taking closures, we see that any $L^p$ function supported in $K$ also lies in $\overline{C_c(X \to \mathbf{R})}$. As $X$ is σ-compact (and hence σ-finite for $\mu$), one can express any non-negative $L^p$ function as a monotone limit of compactly supported functions, and thus every non-negative $L^p$ function lies in $\overline{C_c(X \to \mathbf{R})}$; splitting a general $L^p$ function into its positive and negative parts, all $L^p$ functions lie in this space, and the claim follows. □
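The squeeze in this proof can be watched numerically in the simplest case $X = \mathbf{R}$, $\mu = m$ Lebesgue measure, $E = K_n = [0,1]$, $U_n = (-1/n, 1 + 1/n)$, with $f_n$ the piecewise-linear trapezoid (these choices are ours, for illustration). Here $\|f_n - 1_E\|_{L^p}^p = 2/(n(p+1))$, which tends to zero for each finite $p$, whereas $|f_n - 1_E|$ takes values arbitrarily close to 1 just to the right of $x = 1$, so $f_n$ does not converge to $1_E$ in $L^\infty$. The sketch approximates integrals by a midpoint rule, so it is a numerical check rather than a proof; all names are hypothetical.

```python
def f_n(x, n):
    """Continuous, compactly supported: 1 on [0, 1], 0 outside (-1/n, 1 + 1/n)."""
    if x < 0.0:
        return max(0.0, 1.0 + n * x)
    if x > 1.0:
        return max(0.0, 1.0 - n * (x - 1.0))
    return 1.0

def indicator(x):
    """The indicator function 1_E of E = [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def lp_distance(n, p, steps=100_000):
    """Midpoint-rule approximation of ||f_n - 1_E||_{L^p} over [-1, 2]."""
    left, right = -1.0, 2.0
    h = (right - left) / steps
    total = 0.0
    for k in range(steps):
        x = left + (k + 0.5) * h
        total += abs(f_n(x, n) - indicator(x)) ** p
    return (total * h) ** (1.0 / p)

def exact_lp_distance(n, p):
    """Closed form: the p-th power of the distance is 2 / (n (p + 1))."""
    return (2.0 / (n * (p + 1))) ** (1.0 / p)

def sup_distance_near_one(n):
    """Largest sampled |f_n - 1_E| just right of x = 1, where 1_E = 0 but f_n is near 1."""
    return max(f_n(1.0 + k * 1e-7, n) for k in range(1, 100))
```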

Of course, the real-valued version of the above proposition immediately implies a complex-valued analogue. On the other hand, the claim fails when p = ∞:
Exercise 1.10.7. Let X be a locally compact Hausdorff space that is
σ-compact, and let μ be a Radon measure. Show that the closure of

Cc(X → R) in L∞(X, μ) is C0(X → R), the space of continuous real-valued functions which vanish at infinity (i.e., for every ε > 0 there exists a compact set K such that |f(x)| ≤ ε for all x ∉ K). Thus, in general, Cc(X → R) is not dense in L∞(X, μ).

Thus we see that the $L^\infty$ norm is strong enough to preserve continuity in the limit, whereas the $L^p$ norms are (locally) weaker and permit discontinuous functions to be approximated by continuous ones.
Another important consequence of Urysohn’s lemma is the Tietze extension theorem:

Theorem 1.10.5 (Tietze extension theorem). Let $X$ be a normal topological space, let $[a,b] \subset \mathbf{R}$ be a bounded interval, let $K$ be a closed subset of $X$, and let $f: K \to [a,b]$ be a continuous function. Then there exists a continuous function $\tilde f: X \to [a,b]$ which extends $f$, i.e., $\tilde f(x) = f(x)$ for all $x \in K$.

Proof. It suffices to find a continuous extension $\tilde f: X \to \mathbf{R}$ taking values in the real line rather than in $[a,b]$, since one can then replace $\tilde f$ by $\min(\max(\tilde f, a), b)$ (note that min and max are continuous operations).
Let $T: BC(X \to \mathbf{R}) \to BC(K \to \mathbf{R})$ be the restriction map $Tf := f|_K$. This is clearly a continuous linear map; our task is to show that it is surjective, i.e., to find a solution to the equation $Tg = f$ for each $f \in BC(K \to \mathbf{R})$. We do this by the standard analysis trick of getting an approximate solution to $Tg = f$ first, and then using iteration to boost the approximate solution to an exact solution.
Let $f: K \to \mathbf{R}$ have sup norm 1, thus $f$ takes values in $[-1,1]$. To solve the problem $Tg = f$, we approximate $f$ by $\frac{1}{3} 1_{f \ge 1/3} - \frac{1}{3} 1_{f \le -1/3}$. By Urysohn’s lemma, we can find a continuous function $g: X \to [-1/3, 1/3]$ such that $g = 1/3$ on the closed set $\{x \in K : f(x) \ge 1/3\}$ and $g = -1/3$ on the closed set $\{x \in K : f(x) \le -1/3\}$ (these sets are closed in $X$ because $K$ is). Now, $Tg$ is not quite equal to $f$; but observe from construction that $f - Tg$ has sup norm at most $2/3$.
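Scaling this fact, we conclude that, given any $f \in BC(K \to \mathbf{R})$, we can find a decomposition $f = Tg + f'$, where $\|g\|_{BC(X \to \mathbf{R})} \le \frac{1}{3} \|f\|_{BC(K \to \mathbf{R})}$ and $\|f'\|_{BC(K \to \mathbf{R})} \le \frac{2}{3} \|f\|_{BC(K \to \mathbf{R})}$.
Starting with any $f = f_0 \in BC(K \to \mathbf{R})$, we can now iterate this construction to express $f_n = Tg_n + f_{n+1}$ for all $n = 0, 1, 2, \ldots$, where
$$ \|f_n\|_{BC(K \to \mathbf{R})} \le (\tfrac{2}{3})^n \|f\|_{BC(K \to \mathbf{R})} \quad\text{and}\quad \|g_n\|_{BC(X \to \mathbf{R})} \le \tfrac{1}{3} (\tfrac{2}{3})^n \|f\|_{BC(K \to \mathbf{R})}.$$
As $BC(X \to \mathbf{R})$ is a Banach space, we see that $\sum_{n=0}^\infty g_n$ converges absolutely to some limit $g \in BC(X \to \mathbf{R})$. Since the partial sums telescope, $T(g_0 + \cdots + g_N) = f_0 - f_{N+1} \to f$, and so by continuity of $T$ we have $Tg = f$, as desired. □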
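The proof is effectively an algorithm, and the iteration can be run in a toy setting. The Python sketch below makes simplifying assumptions that are not in the text: $X$ is a finite set of real numbers (so every subset is closed and every function on $X$ is continuous), $K \subset X$, and the role of Urysohn’s lemma is played by an explicit distance-function formula (the helpers `urysohn_third` and `tietze_extend` are hypothetical names). Each pass shrinks the sup norm of the residual $f_n$ by a factor of at least $2/3$, and the accumulated sum of the $g_n$ is the extension.

```python
def dist(x, S):
    """Distance from a real number x to a non-empty finite set S of reals."""
    return min(abs(x - s) for s in S)

def urysohn_third(x, A, B, c):
    """Continuous on R, values in [-c, c], equal to c on A and -c on B.

    A and B are disjoint finite sets, not both empty (stand-in for Urysohn's lemma).
    """
    if not A:
        return -c
    if not B:
        return c
    d_A, d_B = dist(x, A), dist(x, B)
    return c * (d_B - d_A) / (d_A + d_B)

def tietze_extend(f_on_K, X, steps=60):
    """Extend f (a dict from the finite set K to R) to the finite set X containing K.

    Each pass solves T g = f_n up to an error of sup norm at most (2/3)||f_n||,
    as in the proof; the extension is g_0 + g_1 + ... + g_{steps-1}.
    """
    residual = dict(f_on_K)          # f_n, a function on K
    extension = {x: 0.0 for x in X}  # partial sum g_0 + ... + g_{n-1}, a function on X
    for _ in range(steps):
        s = max(abs(v) for v in residual.values())
        if s == 0.0:
            break
        A = [k for k, v in residual.items() if v >= s / 3]
        B = [k for k, v in residual.items() if v <= -s / 3]
        g = {x: urysohn_third(x, A, B, s / 3) for x in X}
        for x in X:
            extension[x] += g[x]
        for k in residual:
            residual[k] -= g[k]      # f_{n+1} = f_n - T g_n
    return extension
```

For example, extending $f$ with $f(0) = 1$, $f(1/2) = -1$, $f(1) = 1/4$ to the grid $\{0, 1/20, \ldots, 1\}$ reproduces $f$ on $K$ up to $(2/3)^{60}$ and, as the bound $\sum_n \frac13 (\frac23)^n = 1$ predicts, never exceeds 1 in absolute value.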

Remark 1.10.6. Observe that Urysohn’s lemma can be viewed as the special case of the Tietze extension theorem when K is the union of two disjoint
closed sets, and when f is equal to 1 on one of these sets and equal to 0 on
the other.

Remark 1.10.7. One can extend the Tietze extension theorem to functions taking values in finite-dimensional vector spaces: if $K$ is a closed subset of a normal topological space $X$ and $f: K \to \mathbf{R}^n$ is bounded and continuous, then one has a bounded continuous extension $\tilde f: X \to \mathbf{R}^n$. Indeed, one simply applies the Tietze extension theorem to each component of $f$ separately. However, if the range space is replaced by a space with a non-trivial topology, then there can be topological obstructions to continuous extension. For instance, a map $f: \{0,1\} \to Y$ from a two-point set into a topological space $Y$ is always continuous, but can be extended to a continuous map $\tilde f: \mathbf{R} \to Y$ if and only if $f(0)$ and $f(1)$ lie in the same path-connected component of $Y$. Similarly, if $f: S^1 \to Y$ is a continuous map from the unit circle into a topological space $Y$, then a continuous extension from $S^1$ to $\mathbf{R}^2$ exists if and only if the closed curve $f: S^1 \to Y$ is contractible to a point in $Y$. These sorts of questions require the machinery of algebraic topology to answer them properly, and are beyond the scope of this course.

There are analogues of the Tietze extension theorem in some other categories of functions. For instance, in the Lipschitz category, we have the following:

Exercise 1.10.8. Let $X$ be a metric space, let $K$ be a subset of $X$, and let $f: K \to \mathbf{R}$ be a Lipschitz continuous map with some Lipschitz constant $A$ (thus $|f(x) - f(y)| \le A d(x,y)$ for all $x, y \in K$). Show that there exists an extension $\tilde f: X \to \mathbf{R}$ of $f$ which is Lipschitz continuous with the same Lipschitz constant $A$. (Hint: A greedy algorithm will work here: pick $\tilde f$ to be as large as one can get away with (or as small as one can get away with).)
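The greedy hint leads to an explicit formula, often called the McShane extension: the largest value $\tilde f(x)$ compatible with $|\tilde f(x) - f(y)| \le A d(x,y)$ for all $y \in K$ is $\inf_{y \in K}(f(y) + A d(x,y))$, and the smallest is $\sup_{y \in K}(f(y) - A d(x,y))$. A minimal Python sketch, assuming for simplicity that $K$ is a finite set of reals; the function name is hypothetical:

```python
def lipschitz_extension(f_on_K, A, largest=True):
    """Extend an A-Lipschitz f: K -> R (K a finite set of reals, given as a dict) to R.

    largest=True:  x -> min over y in K of f(y) + A|x - y|  (largest A-Lipschitz extension)
    largest=False: x -> max over y in K of f(y) - A|x - y|  (smallest A-Lipschitz extension)
    Each formula is a min (or max) of A-Lipschitz functions, hence A-Lipschitz.
    """
    if largest:
        return lambda x: min(fy + A * abs(x - y) for y, fy in f_on_K.items())
    return lambda x: max(fy - A * abs(x - y) for y, fy in f_on_K.items())

f = {0.0: 0.0, 1.0: 1.0, 3.0: 0.0}   # 1-Lipschitz on K = {0, 1, 3}
upper = lipschitz_extension(f, 1.0)
lower = lipschitz_extension(f, 1.0, largest=False)
```

At $x = 2$ the two extensions take the values 1 and 0 respectively, so Lipschitz extensions with the same constant are far from unique; every $A$-Lipschitz extension lies between these two.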

One can also remove the requirement that the function f be bounded in
the Tietze extension theorem:

Exercise 1.10.9. Let $X$ be a normal topological space, let $K$ be a closed subset of $X$, and let $f: K \to \mathbf{R}$ be a continuous map (not necessarily bounded). Then there exists an extension $\tilde f: X \to \mathbf{R}$ of $f$ which is still continuous. (Hint: First compress $f$ to be bounded by working with, say, $\arctan(f)$ (other choices are possible), and apply the usual Tietze extension theorem. There will be some sets in which one cannot invert the compression function, but one can deal with this by a further appeal to Urysohn’s lemma to damp the extension out on such sets.)

There is also a locally compact Hausdorff version of the Tietze extension theorem:

Exercise 1.10.10. Let $X$ be locally compact Hausdorff, let $K$ be compact, and let $f \in C(K \to \mathbf{R})$. Then there exists $\tilde f \in C_c(X \to \mathbf{R})$ which extends $f$.

Proposition 1.10.4 shows that measurable functions in $L^p$ can be approximated by continuous functions of compact support (cf. Littlewood’s second principle). Another approximation result in a similar spirit is Lusin’s theorem:

Theorem 1.10.8 (Lusin’s theorem). Let $X$ be a locally compact Hausdorff space that is σ-compact, and let $\mu$ be a Radon measure. Let $f: X \to \mathbf{R}$ be a measurable function supported on a set of finite measure, and let $\varepsilon > 0$. Then there exists $g \in C_c(X \to \mathbf{R})$ which agrees with $f$ outside of a set of measure at most $\varepsilon$.

Proof. Observe that as $f$ is finite everywhere and supported on a set of finite measure, it is bounded outside of a set of arbitrarily small measure. Thus we may assume without loss of generality that $f$ is bounded. Similarly, as $X$ is σ-compact (or by inner regularity), the support of $f$ differs from a compact set by a set of arbitrarily small measure; so we may assume that $f$ is also supported on a compact set $K$. By Theorem 1.10.5, it then suffices to show that $f$ is continuous on the complement of an open set of arbitrarily small measure; by outer regularity, we may delete the adjective “open” from the preceding sentence.
As $f$ is bounded and compactly supported, $f$ lies in $L^p(X, \mu)$ for every $0 < p < \infty$, and using Proposition 1.10.4 and Chebyshev’s inequality, it is not hard to find, for each $n = 1, 2, \ldots$, a function $f_n \in C_c(X \to \mathbf{R})$ which differs from $f$ by at most $1/2^n$ outside of a set of measure at most $\varepsilon/2^{n+2}$ (say). In particular, $f_n$ converges uniformly to $f$ outside of a set of measure at most $\varepsilon/4$, and $f$ is therefore continuous outside this set. The claim follows. □

Another very useful application of Urysohn’s lemma is to create partitions of unity.

Lemma 1.10.9 (Partitions of unity). Let $X$ be a normal topological space, and let $(K_\alpha)_{\alpha \in A}$ be a collection of closed sets that cover $X$. For each $\alpha \in A$, let $U_\alpha$ be an open neighbourhood of $K_\alpha$, which are finitely overlapping in the sense that each $x \in X$ has a neighbourhood that intersects at most finitely many of the $U_\alpha$. Then there exists a continuous function $f_\alpha: X \to [0,1]$ supported on $U_\alpha$ for each $\alpha \in A$ such that $\sum_{\alpha \in A} f_\alpha(x) = 1$ for all $x \in X$.
If $X$ is locally compact Hausdorff instead of normal, and the $K_\alpha$ are compact, then one can take the $f_\alpha$ to be compactly supported.

Proof. Suppose first that $X$ is normal. By Urysohn’s lemma, one can find a continuous function $g_\alpha: X \to [0,1]$ for each $\alpha \in A$ which is supported on $U_\alpha$ and equals 1 on the closed set $K_\alpha$. Observe that the function $g := \sum_{\alpha \in A} g_\alpha$ is well defined and continuous (near any point, only finitely many of the $g_\alpha$ are non-zero) and bounded below by 1 (since the $K_\alpha$ cover $X$). The claim then follows by setting $f_\alpha := g_\alpha / g$.
The final claim follows by using Exercise 1.10.6 instead of Urysohn’s lemma. □
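Here is the proof of Lemma 1.10.9 carried out by hand on $X = \mathbf{R}$, with $K_j := [j, j+1]$, $U_j := (j - 1/2, j + 3/2)$, and explicit piecewise-linear bumps standing in for the Urysohn functions $g_\alpha$. To keep the code finite we only use $-10 \le j < 10$ (an assumption), so the output is a partition of unity on $[-10, 10]$ only; the names are hypothetical.

```python
def g(j, x):
    """Continuous bump: 1 on K_j = [j, j + 1], 0 outside U_j = (j - 1/2, j + 3/2)."""
    if x < j:
        return max(0.0, 1.0 - 2.0 * (j - x))
    if x > j + 1:
        return max(0.0, 1.0 - 2.0 * (x - j - 1))
    return 1.0

INDICES = range(-10, 10)  # the K_j cover [-10, 10]; each point lies in at most two U_j

def partition_of_unity(x):
    """f_j(x) = g_j(x) / g(x) with g = sum of the g_j, valid for x in [-10, 10]."""
    weights = {j: g(j, x) for j in INDICES}
    total = sum(weights.values())  # at least 1, since x lies in some K_j
    return {j: w / total for j, w in weights.items()}
```

The finite-overlap hypothesis is what makes `total` a finite, continuous sum; the covering hypothesis is what keeps it away from zero.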
Exercise 1.10.11. Let X be a topological space. A function f : X → R is
said to be upper semicontinuous if f −1 ((−∞, a)) is open for all real a and
lower semicontinuous if f −1 ((a, +∞)) is open for all real a.
• Show that an indicator function 1E is upper semicontinuous if and
only if E is closed and lower semicontinuous if and only if E is open.
• If X is normal, show that a function f is upper semi-continuous
if and only if f (x) = inf{g(x) : g ∈ C(X → (−∞, +∞]), g ≥ f }
for all x ∈ X, and lower semi-continuous if and only if f (x) =
sup{g(x) : g ∈ C(X → [−∞, +∞)), g ≤ f } for all x ∈ X, where
we write f ≤ g if f (x) ≤ g(x) for all x ∈ X.
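For the indicator of a closed set, the infimum in the second part of Exercise 1.10.11 is realised by a decreasing sequence of continuous majorants. A minimal sketch for $X = \mathbf{R}$ and $E = [0,1]$ (chosen for illustration): $g_k(x) := \max(0, 1 - k\, d(x, E))$ is continuous, satisfies $g_k \ge 1_E$, and decreases pointwise to $1_E$ because $E$ is closed (so $d(x, E) > 0$ for $x \notin E$). This checks one direction in one example and is not a proof of the exercise; the names are hypothetical.

```python
def dist_to_interval(x, a, b):
    """Distance from x to the closed interval [a, b]."""
    return max(a - x, 0.0, x - b)

def continuous_majorant(x, k, a=0.0, b=1.0):
    """g_k(x) = max(0, 1 - k * dist(x, [a, b])): continuous, and g_k >= 1_[a, b]."""
    return max(0.0, 1.0 - k * dist_to_interval(x, a, b))

def indicator(x, a=0.0, b=1.0):
    """The upper semicontinuous function 1_[a, b]."""
    return 1.0 if a <= x <= b else 0.0
```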

1.10.2. The Riesz representation theorem. Let X be a locally compact Hausdorff space which is also σ-compact. In Definition 1.10.2 we defined the notion of a Radon measure. Such measures are quite common in real analysis. For instance, we have the following result.
Theorem 1.10.10. Let μ be a non-negative finite Borel measure on a com-
pact metric space X. Then μ is a Radon measure.

Proof. As $\mu$ is finite, it is locally finite, so it suffices to show inner and outer regularity. Let $\mathcal{A}$ be the collection of all Borel subsets $E$ of $X$ such that
$$ \sup\{\mu(K) : K \subset E,\ K \text{ closed}\} = \inf\{\mu(U) : U \supset E,\ U \text{ open}\} = \mu(E).$$
It will then suffice to show that every Borel set lies in $\mathcal{A}$ (note that as $X$ is compact, a subset $K$ of $X$ is closed if and only if it is compact).
Clearly, $\mathcal{A}$ contains the empty set and the whole set $X$, and it is closed under complements. It is also closed under finite unions and intersections. Indeed, given two sets $E, F \in \mathcal{A}$, we can find sequences $K_n \subset E \subset U_n$, $L_n \subset F \subset V_n$ of closed sets $K_n, L_n$ and open sets $U_n, V_n$ such that $\mu(K_n), \mu(U_n) \to \mu(E)$ and $\mu(L_n), \mu(V_n) \to \mu(F)$. Since
$$ \begin{aligned} \mu(K_n \cap L_n) + \mu(K_n \cup L_n) &= \mu(K_n) + \mu(L_n) \\ &\to \mu(E) + \mu(F) \\ &= \mu(E \cap F) + \mu(E \cup F), \end{aligned}$$
we have (by monotonicity of $\mu$) that
$$ \mu(K_n \cap L_n) \to \mu(E \cap F), \quad \mu(K_n \cup L_n) \to \mu(E \cup F)$$
and similarly
$$ \mu(U_n \cap V_n) \to \mu(E \cap F), \quad \mu(U_n \cup V_n) \to \mu(E \cup F),$$
and so $E \cap F, E \cup F \in \mathcal{A}$.
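One can also show that $\mathcal{A}$ is closed under countable disjoint unions and is thus a σ-algebra. Indeed, given disjoint sets $E_n \in \mathcal{A}$ and $\varepsilon > 0$, pick a closed $K_n \subset E_n$ and open $U_n \supset E_n$ such that $\mu(E_n \backslash K_n), \mu(U_n \backslash E_n) \le \varepsilon/2^n$; then
$$ \mu\Big(\bigcup_{n=1}^\infty E_n\Big) \le \mu\Big(\bigcup_{n=1}^\infty U_n\Big) \le \sum_{n=1}^\infty \mu(E_n) + \varepsilon$$
and
$$ \mu\Big(\bigcup_{n=1}^\infty E_n\Big) \ge \mu\Big(\bigcup_{n=1}^N K_n\Big) \ge \sum_{n=1}^N \mu(E_n) - \varepsilon$$
for any $N$, and the claim follows from the squeeze test.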
To finish the claim it suffices to show that every open set $V$ lies in $\mathcal{A}$, since the open sets generate the Borel σ-algebra. For this it will suffice to show that $V$ is a countable union of closed sets (finite unions of these closed sets then approximate $V$ from within, by continuity from below). But as $X$ is a compact metric space, it is separable (Lemma 1.8.6), and so $V$ has a countable dense subset $x_1, x_2, \ldots$. One then easily verifies that every point in the open set $V$ is contained in a closed ball of rational radius centred at one of the $x_i$ that is in turn contained in $V$; thus $V$ is the countable union of closed sets as desired. □

This result can be extended to more general spaces than compact metric spaces, for instance to Polish spaces (provided that the measure remains finite). Here is one such extension:
Exercise 1.10.12. Let X be a locally compact metric space which is σ-
compact, and let μ be an unsigned Borel measure which is finite on every
compact set. Show that μ is a Radon measure.

When the assumptions on X are weakened, it is possible to find locally finite Borel measures that are not Radon measures, but they are somewhat pathological in nature.
Exercise 1.10.13. Let X be a locally compact Hausdorff space which is
σ-compact, and let μ be a Radon measure. Define an Fσ set to be a countable
union of closed sets, and a Gδ set to be a countable intersection of open sets.
Show that every Borel set can be expressed as the union of an Fσ set and a
null set, and as a Gδ set with a null subset removed.

If $\mu$ is a Radon measure on $X$, then we can define the integral $I_\mu(f) := \int_X f\, d\mu$ for every $f \in C_c(X \to \mathbf{R})$, since $\mu$ assigns every compact set a finite measure. Furthermore, $I_\mu$ is a linear functional on $C_c(X \to \mathbf{R})$ which is positive in the sense that $I_\mu(f) \ge 0$ whenever $f$ is non-negative. If we place the uniform norm on $C_c(X \to \mathbf{R})$, then $I_\mu$ is continuous if and only if $\mu$ is finite; but we will not use continuity for now, relying instead on positivity.
The fundamentally important Riesz representation theorem for such spaces asserts that this is the only way to generate such linear functionals:
Theorem 1.10.11 (Riesz representation theorem for $C_c(X \to \mathbf{R})$, unsigned version). Let $X$ be a locally compact Hausdorff space which is also σ-compact. Let $I: C_c(X \to \mathbf{R}) \to \mathbf{R}$ be a positive linear functional. Then there exists a unique Radon measure $\mu$ on $X$ such that $I = I_\mu$.
Remark 1.10.12. The σ-compactness hypothesis can be dropped (after
relaxing the inner regularity condition to only apply to open sets, rather
than to all sets); but I will restrict attention here to the σ-compact case
(which already covers a large fraction of the applications of this theorem)
as the argument simplifies slightly.

Proof. We first prove the uniqueness, which is quite easy due to all the properties that Radon measures enjoy. Suppose we had two Radon measures $\mu, \mu'$ such that $I = I_\mu = I_{\mu'}$; in particular, we have
$$ \int_X f\, d\mu = \int_X f\, d\mu' \tag{1.75}$$
for all $f \in C_c(X \to \mathbf{R})$. Now let $K$ be a compact set, and let $U$ be an open neighbourhood of $K$. By Exercise 1.10.6, we can find $f \in C_c(X \to \mathbf{R})$ with $1_K \le f \le 1_U$; applying this to (1.75), we conclude that
$$ \mu(U) \ge \mu'(K).$$
Taking suprema in $K$ and using inner regularity, we conclude that $\mu(U) \ge \mu'(U)$; exchanging $\mu$ and $\mu'$ we conclude that $\mu$ and $\mu'$ agree on open sets; by outer regularity we then conclude that $\mu$ and $\mu'$ agree on all Borel sets.
Now we prove existence, which is significantly trickier. We will initially make the simplifying assumption that $X$ is compact (so in particular $C_c(X \to \mathbf{R}) = C(X \to \mathbf{R}) = BC(X \to \mathbf{R})$), and remove this assumption at the end of the proof.
Observe that $I$ is monotone on $C(X \to \mathbf{R})$, thus $I(f) \le I(g)$ whenever $f \le g$ (apply positivity to $g - f$).
We would like to define the measure $\mu$ on Borel sets $E$ by defining $\mu(E) := I(1_E)$. This does not work directly, because $1_E$ is not continuous in general.

To get around this problem, we shall begin by extending the functional $I$ to the class $BC_{lsc}(X \to \mathbf{R}^+)$ of bounded lower semicontinuous non-negative functions. We define $I(f)$ for such functions by the formula
$$ I(f) := \sup\{ I(g) : g \in C_c(X \to \mathbf{R});\ 0 \le g \le f \}$$
(cf. Exercise 1.10.11). This definition agrees with the existing definition of $I(f)$ in the case when $f$ is continuous. Since $I(1)$ is finite, $I$ is monotone, and $f \le \|f\|_\infty \cdot 1$, one sees that $0 \le I(f) \le \|f\|_\infty I(1) < \infty$ for all $f \in BC_{lsc}(X \to \mathbf{R}^+)$. One also easily sees that $I$ is monotone on $BC_{lsc}(X \to \mathbf{R}^+)$: $I(f) \le I(g)$ whenever $f, g \in BC_{lsc}(X \to \mathbf{R}^+)$ and $f \le g$, and homogeneous in the sense that $I(cf) = cI(f)$ for all $f \in BC_{lsc}(X \to \mathbf{R}^+)$ and $c > 0$. It is also easy to verify the super-additivity property $I(f + f') \ge I(f) + I(f')$ for $f, f' \in BC_{lsc}(X \to \mathbf{R}^+)$; this simply reflects the linearity of $I$ on $C_c(X \to \mathbf{R})$, together with the fact that if $0 \le g \le f$ and $0 \le g' \le f'$, then $0 \le g + g' \le f + f'$.
We now complement the super-additivity property with a countably subadditive one: if $f_n \in BC_{lsc}(X \to \mathbf{R}^+)$ is a sequence, and $f \in BC_{lsc}(X \to \mathbf{R}^+)$ is such that $f(x) \le \sum_{n=1}^\infty f_n(x)$ for all $x \in X$, then $I(f) \le \sum_{n=1}^\infty I(f_n)$.

Pick a small $0 < \varepsilon < 1$. It will suffice to show that $I(g) \le \sum_{n=1}^\infty I(f_n) + O(\varepsilon^{1/2})$ (say) whenever $g \in C_c(X \to \mathbf{R})$ is such that $0 \le g \le f$; here $O(\varepsilon^{1/2})$ denotes a quantity bounded in magnitude by $C\varepsilon^{1/2}$, where $C$ is a quantity that is independent of $\varepsilon$.
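Fix $g$. For every $x \in X$, we can find a neighbourhood $U_x$ of $x$ such that $|g(y) - g(x)| \le \varepsilon$ for all $y \in U_x$; we can also find $N_x > 0$ such that $\sum_{n=1}^{N_x} f_n(x) \ge f(x) - \varepsilon$. By shrinking $U_x$ if necessary, we see from the lower semicontinuity of the $f_n$ and $f$ that we can also ensure that $f_n(y) \ge f_n(x) - \varepsilon/2^n$ for all $1 \le n \le N_x$ and $y \in U_x$.
By normality, we can find open neighbourhoods $V_x$ of $x$ whose closure lies in $U_x$. The $V_x$ form an open cover of $X$. Since we are assuming $X$ to be compact, we can thus find a finite subcover $V_{x_1}, \ldots, V_{x_k}$ of $X$. Applying Lemma 1.10.9 (to the closed sets $\overline{V_{x_j}}$ and their neighbourhoods $U_{x_j}$), we can thus find a partition of unity $1 = \sum_{j=1}^k \psi_j$, where each $\psi_j$ is supported on $U_{x_j}$.

Let $x \in X$ be such that $g(x) \ge \sqrt{\varepsilon}$. Then we can write $g(x) = \sum_{j: x \in U_{x_j}} g(x) \psi_j(x)$. If $j$ is in this sum, then $|g(x_j) - g(x)| \le \varepsilon$, and thus (for $\varepsilon$ small enough) $g(x_j) \ge \sqrt{\varepsilon}/2$, and hence $f(x_j) \ge \sqrt{\varepsilon}/2$. Since $\sum_{n=1}^{N_{x_j}} f_n(x_j) \ge f(x_j) - \varepsilon$ and $\varepsilon/f(x_j) \le 2\sqrt{\varepsilon}$, we can then write
$$ 1 \le \sum_{n=1}^{N_{x_j}} \frac{f_n(x_j)}{f(x_j)} + O(\sqrt{\varepsilon}),$$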
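and thus
$$ g(x) \le \sum_{n=1}^\infty \sum_{j: f(x_j) \ge \sqrt{\varepsilon}/2;\ N_{x_j} \ge n} g(x_j) \frac{f_n(x_j)}{f(x_j)} \psi_j(x) + O(\sqrt{\varepsilon})$$
(here we use the fact that $\sum_j \psi_j(x) = 1$ and that the continuous compactly supported function $g$ is bounded); when $g(x) < \sqrt{\varepsilon}$ this bound holds trivially, as then $g(x) = O(\sqrt{\varepsilon})$. Observe that only finitely many summands are non-zero. We conclude that
$$ I(g) \le \sum_{n=1}^\infty I\Big( \sum_{j: f(x_j) \ge \sqrt{\varepsilon}/2;\ N_{x_j} \ge n} g(x_j) \frac{f_n(x_j)}{f(x_j)} \psi_j \Big) + O(\sqrt{\varepsilon})$$
(here we use that $1 \in C_c(X)$ and so $I(1)$ is finite). On the other hand, for any $x \in X$ and any $n$, the expression
$$ \sum_{j: f(x_j) \ge \sqrt{\varepsilon}/2;\ N_{x_j} \ge n} g(x_j) \frac{f_n(x_j)}{f(x_j)} \psi_j(x)$$
is bounded from above (since $g \le f$) by
$$ \sum_{j: N_{x_j} \ge n} f_n(x_j) \psi_j(x);$$
since $f_n(x) \ge f_n(x_j) - \varepsilon/2^n$ whenever $\psi_j(x) \neq 0$ and $N_{x_j} \ge n$, and $\sum_j \psi_j(x) = 1$, this is bounded above in turn by
$$ \varepsilon/2^n + f_n(x).$$
We conclude that
$$ I(g) \le \sum_{n=1}^\infty \big[ I(f_n) + O(\varepsilon/2^n) \big] + O(\sqrt{\varepsilon})$$
and the subadditivity claim follows.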
Combining subadditivity and superadditivity we see that I is additive:
I(f + g) = I(f ) + I(g) for f, g ∈ BClsc (X → R+ ).
Now that we are able to integrate lower semicontinuous functions, we
can start defining the Radon measure μ. When U is open, we define μ(U )
by
μ(U ) := I(1U ),
which is well defined and non-negative since 1U is bounded, non-negative
and lower semicontinuous. When K is closed we define μ(K) by complementation:
μ(K) := μ(X) − μ(X\K);
this is compatible with the definition of μ on open sets by additivity of I,
and is also non-negative. The monotonicity of I implies monotonicity of μ:
in particular, if a closed set K lies in an open set U , then μ(K) ≤ μ(U ).

Given any set E ⊂ X, define the outer measure
μ+(E) := inf{μ(U) : E ⊂ U, U open}
and the inner measure
μ−(E) := sup{μ(K) : E ⊃ K, K closed};
thus 0 ≤ μ−(E) ≤ μ+(E) ≤ μ(X). We call a set E measurable if μ−(E) = μ+(E). By arguing as in the proof of Theorem 1.10.10, we see that the class of measurable sets is a Boolean algebra. Next, we claim that every open set
U is measurable. Indeed, unwrapping all the definitions, we see that
μ(U ) = sup{I(f ) : f ∈ Cc (X → R); 0 ≤ f ≤ 1U }.
Each f in this supremum is supported in some closed subset K of U , and
from this one easily verifies that μ+ (U ) = μ(U ) = μ− (U ). Similarly, every
closed set K is measurable. We can now extend μ to measurable sets by
declaring μ(E) := μ+ (E) = μ− (E) when E is measurable; this is compatible
with the previous definitions of μ.
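Next, let $E_1, E_2, \ldots$ be a countable sequence of disjoint measurable sets. Then for any $\varepsilon > 0$, we can find open neighbourhoods $U_n$ of $E_n$ and closed sets $K_n$ in $E_n$ such that $\mu(E_n) \le \mu(U_n) \le \mu(E_n) + \varepsilon/2^n$ and $\mu(E_n) - \varepsilon/2^n \le \mu(K_n) \le \mu(E_n)$. Using the subadditivity of $I$ on $BC_{lsc}(X \to \mathbf{R}^+)$, we have $\mu(\bigcup_{n=1}^\infty U_n) \le \sum_{n=1}^\infty \mu(U_n) \le \sum_{n=1}^\infty \mu(E_n) + \varepsilon$. Similarly, from the additivity of $I$ we have $\mu(\bigcup_{n=1}^N K_n) = \sum_{n=1}^N \mu(K_n) \ge \sum_{n=1}^N \mu(E_n) - \varepsilon$. Letting $N \to \infty$ and then $\varepsilon \to 0$, we conclude that $\bigcup_{n=1}^\infty E_n$ is measurable with $\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)$.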
Thus the Boolean algebra of measurable sets is in fact a σ-algebra, and
μ is a countably additive measure on it. From construction we also see
that it is finite, outer regular, and inner regular, and therefore is a Radon
measure. The only remaining thing to check is that I(f ) = Iμ (f ) for all
f ∈ C(X → R). If f is a finite non-negative linear combination of indicator
functions of open sets, the claim is clear from the construction of μ and
the additivity of I on BClsc (X → R+ ); taking uniform limits, we obtain
the claim for non-negative continuous functions, and then by linearity we
obtain it for all functions.
This concludes the proof in the case when $X$ is compact. Now suppose that $X$ is σ-compact. Then we can find a partition of unity $1 = \sum_{n=0}^\infty \psi_n$ into continuous compactly supported functions $\psi_n \in C_c(X \to \mathbf{R}^+)$, with each $x \in X$ being contained in the support of finitely many $\psi_n$. (Indeed, from σ-compactness and the locally compact Hausdorff property one can find a nested sequence $K_0 \subset K_1 \subset \cdots$ of compact sets, with each $K_n$ in the interior of $K_{n+1}$, such that $\bigcup_n K_n = X$. Using Exercise 1.10.6, one can find functions $\eta_n \in C_c(X \to \mathbf{R}^+)$ that equal 1 on $K_n$ and are supported on $K_{n+1}$; now take $\psi_0 := \eta_0$ and $\psi_n := \eta_n - \eta_{n-1}$ for $n \ge 1$.) Observe that $I(f) = \sum_n I(\psi_n f)$
for all $f \in C_c(X \to \mathbf{R})$ (only finitely many terms are non-zero, as $f$ has compact support). From the compact case we see that there exists a finite Radon measure $\mu_n$ such that $I(\psi_n f) = I_{\mu_n}(f)$ for all $f \in C_c(X \to \mathbf{R})$; setting $\mu := \sum_n \mu_n$ one can verify (using the monotone convergence theorem, Theorem 1.1.21) that $\mu$ obeys the required properties. □

Remark 1.10.13. One can also construct the Radon measure μ using
the Carathéodory extension theorem (Theorem 1.1.17); this proof of the
Riesz representation theorem can be found in many real analysis texts. A
third method is to first create the space $L^1$ by taking the completion of $C_c(X \to \mathbf{R})$ with respect to the $L^1$ norm $\|f\|_{L^1} := I(|f|)$, and then define $\mu(E) := \|1_E\|_{L^1}$. It seems to me that all three proofs are almost equally
lengthy and ultimately rely on the same ingredients; they all seem to have
their strengths and weaknesses, and involve at least one tricky computation
somewhere (in the above argument, the most tricky thing is the countable
subadditivity of I on lower semicontinuous functions). I have yet to find
a proof of this theorem which is both clean and conceptual, and would be
happy to learn of other proofs of this theorem.

Remark 1.10.14. One can use the Riesz representation theorem to provide an alternate construction of Lebesgue measure, say on R. Indeed, the Riemann integral already provides a positive linear functional on Cc(R → R), which by the Riesz representation theorem must come from a Radon measure, which can be easily verified to assign the value b − a to every interval [a, b] and thus must agree with Lebesgue measure. The same approach lets one define volume measures on manifolds with a volume form.
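To make this remark concrete: if $f_{\text{up}} \ge 1_{[a,b]}$ and $0 \le f_{\text{low}} \le 1_{(a,b)}$ are continuous and compactly supported, then the Radon measure $\mu$ produced by the Riesz representation theorem satisfies $I(f_{\text{low}}) \le \mu([a,b]) \le I(f_{\text{up}})$. With trapezoids of slope $n$ these bounds are $(b - a) \mp 1/n$, forcing $\mu([a,b]) = b - a$. The Python sketch below stands in for the Riemann integral $I$ with a midpoint Riemann sum (our assumption; it is very accurate for piecewise-linear integrands), and its names are hypothetical.

```python
def trapezoid(x, lo, hi, width):
    """Continuous, compactly supported: 1 on [lo, hi], 0 outside (lo - width, hi + width)."""
    if x < lo:
        return max(0.0, 1.0 - (lo - x) / width)
    if x > hi:
        return max(0.0, 1.0 - (x - hi) / width)
    return 1.0

def riemann_functional(f, left, right, steps=100_000):
    """Midpoint Riemann sum over [left, right]: a stand-in for I(f), the Riemann integral."""
    h = (right - left) / steps
    return h * sum(f(left + (k + 0.5) * h) for k in range(steps))

a, b = 0.0, 2.0
results = []
for n in (1, 10, 100):
    w = 1.0 / n
    # f_up >= 1_[a,b] and f_low <= 1_(a,b), so I(f_low) <= mu([a,b]) <= I(f_up)
    upper = riemann_functional(lambda x: trapezoid(x, a, b, w), a - 2, b + 2)
    lower = riemann_functional(lambda x: trapezoid(x, a + w, b - w, w), a - 2, b + 2)
    results.append((n, lower, upper))
```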

Exercise 1.10.14. Let $X$ be a locally compact Hausdorff space which is σ-compact, and let $\mu$ be a Radon measure. For any non-negative Borel measurable function $f$, show that
$$ \int_X f\, d\mu = \inf\Big\{ \int_X g\, d\mu : g \ge f;\ g \text{ lower semicontinuous} \Big\}$$
and
$$ \int_X f\, d\mu = \sup\Big\{ \int_X g\, d\mu : 0 \le g \le f;\ g \text{ upper semicontinuous} \Big\}.$$
Similarly, for any non-negative lower semicontinuous function $g$, show that
$$ \int_X g\, d\mu = \sup\Big\{ \int_X h\, d\mu : 0 \le h \le g;\ h \in C_c(X \to \mathbf{R}) \Big\}.$$

Now we consider signed functionals on Cc(X → R), which we now turn into a normed vector space using the uniform norm. The key lemma here is the following variant of the Jordan decomposition theorem (Exercise 1.2.5).

Lemma 1.10.15 (Jordan decomposition for functions). Let $I \in C_c(X \to \mathbf{R})^*$ be a (real) continuous linear functional. Then there exist positive linear functionals $I^+, I^- \in C_c(X \to \mathbf{R})^*$ such that $I = I^+ - I^-$.

Proof. For $f \in C_c(X \to \mathbf{R}^+)$, we define
$$ I^+(f) := \sup\{ I(g) : g \in C_c(X \to \mathbf{R});\ 0 \le g \le f \}.$$
Clearly $\max(I(f), 0) \le I^+(f) \le \|I\| \|f\|_\infty$ for $f \in C_c(X \to \mathbf{R}^+)$ (take $g = f$ or $g = 0$ for the lower bound); one also easily verifies the homogeneity property $I^+(cf) = cI^+(f)$ and superadditivity property $I^+(f_1 + f_2) \ge I^+(f_1) + I^+(f_2)$ for $c > 0$ and $f, f_1, f_2 \in C_c(X \to \mathbf{R}^+)$. On the other hand, if $g, f_1, f_2 \in C_c(X \to \mathbf{R}^+)$ are such that $g \le f_1 + f_2$, then we can decompose $g = g_1 + g_2$ for some $g_1, g_2 \in C_c(X \to \mathbf{R}^+)$ with $g_1 \le f_1$ and $g_2 \le f_2$; for instance we can take $g_1 := \min(g, f_1)$ and $g_2 := g - g_1$. From this we can complement superadditivity with subadditivity and conclude that $I^+(f_1 + f_2) = I^+(f_1) + I^+(f_2)$.
Every function in $C_c(X \to \mathbf{R})$ can be expressed as the difference of two functions in $C_c(X \to \mathbf{R}^+)$. From the additivity and homogeneity of $I^+$ on $C_c(X \to \mathbf{R}^+)$, we may thus extend $I^+$ uniquely to be a linear functional on $C_c(X \to \mathbf{R})$. Since $I$ is bounded on $C_c(X \to \mathbf{R})$, we see that $I^+$ is also. If we then define $I^- := I^+ - I$, one quickly verifies all the required properties. □
Exercise 1.10.15. Show that the functionals I + , I − appearing in the above
lemma are unique.

Define a signed Radon measure on a σ-compact, locally compact Hausdorff space $X$ to be a signed Borel measure $\mu$ whose positive and negative variations are both Radon. It is easy to see that a signed Radon measure $\mu$ generates a linear functional $I_\mu$ on $C_c(X \to \mathbf{R})$ as before, and $I_\mu$ is continuous if $\mu$ is finite. We have a converse:
Exercise 1.10.16 (Riesz representation theorem, signed version). Let X
be a locally compact Hausdorff space which is also σ-compact, and let I ∈
Cc (X → R)∗ be a continuous linear functional. Then there exists a unique
signed finite Radon measure μ such that I = Iμ . (Hint: Combine Theorem
1.10.11 with Lemma 1.10.15.)

The space of signed finite Radon measures on X is denoted M(X → R), or M(X) for short.
Exercise 1.10.17. Show that the space $M(X)$, with the total variation norm $\|\mu\|_{M(X)} := |\mu|(X)$, is a real Banach space, which is isomorphic to the dual of both $C_c(X \to \mathbf{R})$ and its completion $C_0(X \to \mathbf{R})$, thus
$$ C_c(X \to \mathbf{R})^* \equiv C_0(X \to \mathbf{R})^* \equiv M(X).$$

Remark 1.10.16. Note that the previous exercise generalises the identifications $c_c(\mathbf{N})^* \equiv c_0(\mathbf{N})^* \equiv \ell^1(\mathbf{N})$ from previous notes. For compact Hausdorff spaces $X$, we have $C(X \to \mathbf{R}) = C_0(X \to \mathbf{R})$, and thus $C(X \to \mathbf{R})^* \equiv M(X)$. For locally compact Hausdorff spaces that are σ-compact but not compact, we instead have $BC(X \to \mathbf{R})^* \equiv M(\beta X)$, where $\beta X$ is the Stone-Čech compactification of $X$, which we will discuss in Section 2.5.
Remark 1.10.17. One can of course also define complex Radon measures
to be those complex finite Borel measures whose real and imaginary parts
are signed Radon measures, and define M (X → C) to be the space of all
such measures; then one has analogues of the above identifications. We omit
the details.
Exercise 1.10.18. Let X, Y be two locally compact Hausdorff spaces that
are also σ-compact, and let f : X → Y be a continuous map. If μ is an
unsigned finite Radon measure on X, show that the pushforward measure
$f_\#\mu$ on Y, defined by $f_\#\mu(E) := \mu(f^{-1}(E))$, is a Radon measure on Y.
Establish the same fact for signed Radon measures.

Let X be locally compact Hausdorff and σ-compact. As M (X) is equivalent
to the dual of the Banach space C0 (X → R), it acquires a weak*
topology (see Section 1.9), known as the vague topology. A sequence of
Radon measures μn ∈ M (X) then converges vaguely to a limit μ ∈ M (X)
if and only if $\int_X f \, d\mu_n \to \int_X f \, d\mu$ for all f ∈ C0 (X → R).
Exercise 1.10.19. Let m be Lebesgue measure on the real line (with the
usual topology).
• Show that the measures $n \, m|_{[0,1/n]}$ converge vaguely as n → ∞ to
the Dirac mass δ0 at the origin 0.
• Show that the measures $\frac{1}{n}\sum_{i=1}^n \delta_{i/n}$ converge vaguely as n → ∞
to the measure $m|_{[0,1]}$. (Hint: Continuous, compactly supported
functions are Riemann integrable.)
• Show that the measures δn converge vaguely as n → ∞ to the zero
measure 0.
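Pairing each of these measures with a single compactly supported test function makes the three limits visible numerically. The Python sketch below is only an illustration: the tent function `tent` (supported on [−5, 5]) and the helper names are hypothetical choices, and one test function cannot establish vague convergence, which requires convergence for every f ∈ C0 (X → R). For this particular f the three limits should be f(0) = 1, the integral of f over [0, 1], which is 0.9, and 0.

```python
def tent(x, width=5.0):
    """Hypothetical test function in Cc(R): a tent of height 1 supported on [-width, width]."""
    return max(0.0, 1.0 - abs(x) / width)


def pair_rescaled_lebesgue(f, n, pieces=1000):
    """Integral of f against n * (Lebesgue measure restricted to [0, 1/n]), by the midpoint rule."""
    h = (1.0 / n) / pieces
    return n * h * sum(f((k + 0.5) * h) for k in range(pieces))


def pair_riemann_measure(f, n):
    """Integral of f against (1/n) * sum_{i=1}^n delta_{i/n}, which is exactly a Riemann sum."""
    return sum(f(i / n) for i in range(1, n + 1)) / n


def pair_dirac(f, n):
    """Integral of f against the Dirac mass delta_n, i.e. the value f(n)."""
    return f(n)


for n in (1, 10, 100, 1000):
    print(n, pair_rescaled_lebesgue(tent, n), pair_riemann_measure(tent, n), pair_dirac(tent, n))
```

The third column is eventually zero because the mass of δn escapes every compact set, which is exactly why the limit is the zero measure rather than a probability measure.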
Exercise 1.10.20. Let X be locally compact Hausdorff and σ-compact.
Show that for every unsigned Radon measure μ, the map ι : L1 (μ) → M (X)
defined by sending f ∈ L1 (μ) to the measure μf is an isometry, thus L1 (μ)
can be identified with a subspace of M (X). Show that this subspace is closed
in the norm topology, but give an example to show that it need not be closed
in the vague topology. Show that $M(X) = \bigcup_\mu L^1(\mu)$, where μ ranges over
all unsigned Radon measures on X; thus one can think of M (X) as many
$L^1$’s “glued together”.

Exercise 1.10.21. Let X be a locally compact Hausdorff space which is
σ-compact. Let fn ∈ C0 (X → R) be a sequence of functions, and let
f ∈ C0 (X → R) be another function. Show that fn converges weakly to f
in C0 (X → R) if and only if the fn are uniformly bounded and converge
pointwise to f .
Exercise 1.10.22. Let X be a locally compact metric space which is σ-
compact.
• Show that the space of finitely supported measures in M (X) is a
dense subset of M (X) in the vague topology.
• Show that a Radon probability measure in M (X) can be expressed
as the vague limit of a sequence of discrete (i.e., finitely supported)
probability measures.
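For the second part, one natural construction in the simplest case X = [0, 1] is to partition X into N cells and move the mass of each cell to its midpoint. The Python sketch below assumes, purely for illustration, the probability measure dμ = 2x dx on [0, 1] and the test function cos; the helper names are hypothetical. Since each point moves by at most half a cell width, the error against a 1-Lipschitz test function such as cos is at most 1/(2N).

```python
import math


def cell_mass(a, b):
    """mu([a, b]) for the assumed probability measure d mu = 2x dx on [0, 1]."""
    return b * b - a * a


def discrete_approximation(n_cells):
    """Finitely supported probability measure: the mass of each cell is placed at its midpoint."""
    atoms = []
    for k in range(n_cells):
        a, b = k / n_cells, (k + 1) / n_cells
        atoms.append(((a + b) / 2, cell_mass(a, b)))
    return atoms  # list of (location, weight) pairs; the weights sum to 1


def pair(f, atoms):
    """Integral of f against a finitely supported measure."""
    return sum(w * f(x) for x, w in atoms)


# Exact integral of cos against 2x dx on [0, 1], by integration by parts.
exact = 2 * (math.sin(1) + math.cos(1) - 1)
for n_cells in (1, 10, 100):
    approx = pair(math.cos, discrete_approximation(n_cells))
    print(n_cells, approx, abs(approx - exact))
```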

1.10.3. The Stone-Weierstrass theorem. We have already seen how
rough functions (e.g., Lp functions) can be approximated by continuous
functions. Now we study in turn how continuous functions can be approx-
imated by even more special functions, such as polynomials. The natural
topology to work with here is the uniform topology (since uniform limits of
continuous functions are continuous).
For non-compact spaces, such as R, it is usually not possible to approx-
imate continuous functions uniformly by a smaller class of functions. For
instance, the function sin(x) cannot be approximated uniformly by polyno-
mials on R, since sin(x) is bounded, the only bounded polynomials are the
constants, and constants cannot converge to anything other than another
constant. On the other hand, on a compact domain such as [−1, 1], one can
easily approximate sin(x) uniformly by polynomials, for instance by using
Taylor series. So we will focus instead on compact Hausdorff spaces X such
as [−1, 1], in which continuous functions are automatically bounded.
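The contrast between [−1, 1] and R can be seen numerically. The Python sketch below (an illustration only; the helper names and sampling grids are arbitrary choices) measures the sup-norm error of a Taylor polynomial of sin on a grid: on [−1, 1] the error is tiny, while on a longer interval the same polynomial is far from sin, as it must be, since any non-constant polynomial is unbounded on R.

```python
import math


def taylor_sin(x, degree):
    """Taylor polynomial of sin at 0 of the given degree (only odd powers occur)."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((degree + 1) // 2))


def sup_error(degree, radius, samples=2001):
    """Approximate sup of |sin(x) - taylor_sin(x, degree)| over [-radius, radius] on a uniform grid."""
    grid = [-radius + 2 * radius * j / (samples - 1) for j in range(samples)]
    return max(abs(math.sin(x) - taylor_sin(x, degree)) for x in grid)


for degree in (1, 3, 5, 7, 9):
    print(degree, sup_error(degree, 1.0), sup_error(degree, 20.0))
```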
The space P([−1, 1]) of (real-valued) polynomials is a subspace of the
Banach space C([−1, 1]). But it is also closed under pointwise multiplication
f, g → f g, making P([−1, 1]) an algebra, and not merely a vector space. We
can then rephrase the classical Weierstrass approximation theorem as the
assertion that P([−1, 1]) is dense in C([−1, 1]).
One can then ask the more general question of when a subalgebra A of
C(X), i.e., a subspace closed under pointwise multiplication, is dense. Not
every subalgebra is dense: the algebra of constants, for instance, will not
be dense in C(X) when X has at least two points. Another example in a
similar spirit: given two distinct points x1 , x2 in X, the space {f ∈ C(X) :
f (x1 ) = f (x2 )} is a subalgebra of C(X), but it is not dense, because it is
already closed, and cannot separate x1 and x2 in the sense that it cannot
produce a function that assigns different values to x1 and x2 .

The remarkable Stone-Weierstrass theorem shows that this inability to
separate points is the only obstruction to density, at least for algebras with
the identity.
Theorem 1.10.18 (Stone-Weierstrass theorem, real version). Let X be a
compact Hausdorff space, and let A be a subalgebra of C(X → R) which
contains the constant function 1 and separates points (i.e., for every distinct
x1 , x2 ∈ X, there exists at least one f in A such that f (x1 ) ≠ f (x2 )). Then
A is dense in C(X → R).
Remark 1.10.19. Observe that this theorem contains the Weierstrass ap-
proximation theorem as a special case, since the algebra of polynomials
clearly separates points. Indeed, we will use a very special case of the
Weierstrass approximation theorem in the proof.

Proof. It suffices to verify the claim for algebras A which are closed in the
C(X → R) topology, since the claim follows in the general case by replacing
A with its closure (note that the closure of an algebra is still an algebra).
Observe from the Weierstrass approximation theorem that on any
bounded interval [−K, K], the function |x| can be expressed as the uni-
form limit of polynomials Pn (x); one can even write down explicit formulae
for such a Pn , though we will not need such formulae here. Since continuous
functions on the compact space X are bounded, this implies that for any
f ∈ A, the function |f | is the uniform limit of polynomial combinations
Pn (f ) of f . As A is an algebra containing the constant function 1, the Pn (f ) lie
in A; as A is closed, we see that |f | lies in A.
Using the identities $\max(f, g) = \frac{f+g}{2} + \left|\frac{f-g}{2}\right|$ and $\min(f, g) = \frac{f+g}{2} - \left|\frac{f-g}{2}\right|$, we
conclude that A is a lattice in the sense that one has max(f, g), min(f, g) ∈ A
whenever f, g ∈ A.
Now let f ∈ C(X → R) and ε > 0. We would like to find g ∈ A such
that |f (x) − g(x)| ≤ ε for all x ∈ X.
Given any two points x, y ∈ X, we can at least find a function gxy ∈ A
such that gxy (x) = f (x) and gxy (y) = f (y). Indeed, if x ≠ y, pick h ∈ A
with h(x) ≠ h(y) (possible since A separates points) and take gxy := a + bh
for suitable constants a, b ∈ R; this lies in A since A is a vector space
containing the constant function 1. If x = y, simply take gxy to be the
constant function f (x). We now use these functions gxy to
build the approximant g. First, observe from continuity that for every x, y ∈
X there exists an open neighbourhood Vxy of y such that gxy (y′) ≥ f (y′) − ε
for all y′ ∈ Vxy . By compactness, for any fixed x we can cover X by a
finite number of these Vxy . Taking the max of all the gxy associated to this
finite subcover, we create another function gx ∈ A such that gx (x) = f (x)
and gx (y) ≥ f (y) − ε for all y ∈ X. By continuity, we can find an open
neighbourhood Ux of x such that gx (x′) ≤ f (x′) + ε for all x′ ∈ Ux . Again
applying compactness, we can cover X by a finite number of the Ux ; taking
the min of all the gx associated to this finite subcover, we obtain g ∈ A with
f (x) − ε ≤ g(x) ≤ f (x) + ε for all x ∈ X, and the claim follows. □
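The proof only needs the existence of polynomials Pn with Pn(x) → |x| uniformly on [−K, K]. One classical explicit choice (shown here purely for illustration; it is not the construction the proof relies on) is the recursion P0 := 0, Pn+1(x) := Pn(x) + (x² − Pn(x)²)/2 on [−1, 1], which increases to |x| and satisfies 0 ≤ |x| − Pn(x) < 2/(n + 1) there; rescaling x ↦ x/K handles [−K, K] at the cost of a factor K in the error. The Python sketch below checks this bound on a grid; the function names and grid size are assumptions.

```python
def abs_approx(x, n, K=1.0):
    """Evaluate K * P_n(x / K), where P_0 = 0 and P_{m+1}(t) = P_m(t) + (t**2 - P_m(t)**2) / 2.

    Each P_m is a polynomial in t, so abs_approx(., n, K) is a polynomial in x;
    it approximates |x| from below on [-K, K].
    """
    t = x / K
    p = 0.0
    for _ in range(n):
        p = p + (t * t - p * p) / 2
    return K * p


def max_error(n, K=1.0, samples=2001):
    """Largest value of |x| - abs_approx(x, n, K) over a uniform grid on [-K, K]."""
    grid = [-K + 2 * K * j / (samples - 1) for j in range(samples)]
    return max(abs(x) - abs_approx(x, n, K) for x in grid)


for n in (1, 10, 100, 1000):
    print(n, max_error(n), 2 / (n + 1))
```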

There is an analogue of the Stone-Weierstrass theorem for algebras that
do not contain the identity:
Exercise 1.10.23. Let X be a compact Hausdorff space, and let A be
a closed subalgebra of C(X → R) which separates points but does not
contain the identity. Show that there exists a unique x0 ∈ X such that
A = {f ∈ C(X → R) : f (x0 ) = 0}.

The Stone-Weierstrass theorem is not true as stated in the complex
case. For instance, the space C(D → C) of complex-valued functions on the
closed unit disk D := {z ∈ C : |z| ≤ 1} has a closed proper subalgebra that
separates points, namely the algebra H(D) of functions in C(D → C) that
are holomorphic on the interior of this disk. Indeed, by Cauchy’s theorem
and its converse (Morera’s theorem), a function f ∈ C(D → C) lies in H(D)
if and only if $\int_\gamma f = 0$ for every closed contour γ in D, and one easily verifies
that this implies that H(D) is closed; meanwhile, the holomorphic function
z → z separates all points. However, the Stone-Weierstrass theorem can be
recovered in the complex case by adding one further axiom, namely that the
algebra be closed under conjugation:
Exercise 1.10.24 (Stone-Weierstrass theorem, complex version). Let X
be a compact Hausdorff space, and let A be a complex subalgebra of
C(X → C) which contains the constant function 1, separates points, and
is closed under the conjugation operation $f \mapsto \overline{f}$. Then A is dense in
C(X → C).
Exercise 1.10.25. Let T ⊂ C(R/Z → C) be the space of trigonometric
polynomials $x \mapsto \sum_{n=-N}^{N} c_n e^{2\pi i n x}$, where N ≥ 0 and the $c_n$ are complex
numbers. Show that T is dense in C(R/Z → C) (with the uniform topology),
and that T is dense in Lp (R/Z → C) (with the Lp topology) for all
0 < p < ∞.
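One standard route to the uniform density claim (assumed here; the exercise does not prescribe a method) is via the Fejér means σN f(x) = Σ_{|n|≤N} (1 − |n|/(N + 1)) f̂(n) e^{2πinx}, which are trigonometric polynomials and converge uniformly to f for every continuous f on the circle R/Z. The Python sketch below approximates the Fourier coefficients by a Riemann sum and reports the sup-norm error on a grid for the hypothetical test function f(x) = |x − 1/2| on [0, 1), extended periodically; the sample sizes are arbitrary.

```python
import cmath


def f(x):
    """Hypothetical test function: |x - 1/2| on [0, 1), extended 1-periodically (continuous on R/Z)."""
    return abs((x % 1.0) - 0.5)


def fourier_coefficient(g, n, samples=2048):
    """Riemann-sum approximation to the integral of g(x) e^{-2 pi i n x} over [0, 1)."""
    return sum(g(k / samples) * cmath.exp(-2j * cmath.pi * n * k / samples)
               for k in range(samples)) / samples


def fejer_mean(coeffs, N, x):
    """Fejer mean: sum over |n| <= N of (1 - |n|/(N + 1)) c_n e^{2 pi i n x}."""
    return sum((1 - abs(n) / (N + 1)) * coeffs[n] * cmath.exp(2j * cmath.pi * n * x)
               for n in range(-N, N + 1))


def sup_error(N, points=400):
    """Largest |fejer_mean - f| over a uniform grid of `points` points in [0, 1)."""
    coeffs = {n: fourier_coefficient(f, n) for n in range(-N, N + 1)}
    grid = [j / points for j in range(points)]
    return max(abs(fejer_mean(coeffs, N, x) - f(x)) for x in grid)


for N in (4, 16, 64):
    print(N, sup_error(N))
```

Fejér means are used rather than plain partial Fourier sums because the latter need not converge uniformly for general continuous f.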
Exercise 1.10.26. Let X be a locally compact Hausdorff space that is σ-
compact, and let A be a subalgebra of C(X → R) which separates points and
contains the constant function 1. Show that for every function f ∈ C(X → R)
there exists a sequence fn ∈ A which converges to f uniformly on compact
subsets of X.
Exercise 1.10.27. Let X, Y be compact Hausdorff spaces. Show that every
function f ∈ C(X × Y → R) can be expressed as the uniform limit of
functions of the form $(x, y) \mapsto \sum_{j=1}^k f_j(x) g_j(y)$, where fj ∈ C(X → R) and
gj ∈ C(Y → R).
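As a concrete instance (an illustration only, with X = Y = [0, 1] and the test function f(x, y) = e^{xy} chosen for convenience), truncating the Taylor series e^{xy} = Σ_j x^j y^j / j! already gives approximants of the required form, with f_j(x) = x^j/j! and g_j(y) = y^j; bounding the tail by Σ_{j≥k} 1/j! ≤ e/k! gives a uniform error of at most e/k! on [0, 1]² when k terms are kept. The Python sketch below (helper names hypothetical) checks this on a grid.

```python
import math


def separable_approx(x, y, k):
    """sum_{j=0}^{k-1} f_j(x) g_j(y), with f_j(x) = x**j / j! and g_j(y) = y**j."""
    return sum((x ** j / math.factorial(j)) * y ** j for j in range(k))


def sup_error(k, samples=51):
    """Largest |exp(xy) - separable_approx(x, y, k)| over a uniform grid on [0, 1] x [0, 1]."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(math.exp(x * y) - separable_approx(x, y, k)) for x in grid for y in grid)


for k in (1, 2, 4, 8):
    print(k, sup_error(k), math.e / math.factorial(k))
```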

Exercise 1.10.28. Let (Xα )α∈A be a family of compact Hausdorff spaces,
and let $X := \prod_{\alpha \in A} X_\alpha$ be the product space (with the product topology).
Let f ∈ C(X → R). Show that f can be expressed as the uniform limit of
continuous functions fn , each of which depends on only finitely many of the
coordinates in A; that is, for each n there exists a finite subset An of A and a
continuous function $g_n \in C(\prod_{\alpha \in A_n} X_\alpha \to \mathbf{R})$ such that
$f_n((x_\alpha)_{\alpha \in A}) = g_n((x_\alpha)_{\alpha \in A_n})$ for all (xα )α∈A ∈ X.

One useful application of the Stone-Weierstrass theorem is to demonstrate
separability of spaces such as C(X).
Proposition 1.10.20. Let X be a compact metric space. Then C(X → C)
and C(X → R) are separable.

Proof. It suffices to show that C(X → R) is separable. By Lemma 1.8.6,
X has a countable dense subset x1 , x2 , . . .. By Urysohn’s lemma, for each
n, m ≥ 1 we can find a function ψn,m ∈ C(X → R) which equals 1 on
B(xn , 1/m) and is supported on B(xn , 2/m). The ψn,m can then easily be
verified to separate points, and so by the Stone-Weierstrass theorem, the
algebra of polynomial combinations of the ψn,m in C(X → R) is dense;
this implies that the countable set of polynomial combinations of the ψn,m
with rational coefficients is dense, and the claim follows. □

Combining this with the Riesz representation theorem and the sequential
Banach-Alaoglu theorem (Theorem 1.9.14), we obtain
Corollary 1.10.21. If X is a compact metric space, then the closed unit
ball in M (X) is sequentially compact in the vague topology.

Combining this with Theorem 1.10.10, we conclude a special case of
Prokhorov’s theorem:
Corollary 1.10.22 (Prokhorov’s theorem, compact case). Let X be a com-
pact metric space, and let μn be a sequence of Borel (hence Radon) probabil-
ity measures on X. Then there exists a subsequence of μn which converges
vaguely to another Borel probability measure μ.
Exercise 1.10.29 (Prokhorov’s theorem, non-compact case). Let X be a
locally compact metric space which is σ-compact, and let μn be a sequence
of Borel probability measures. We assume that the sequence μn is tight,
which means that for every ε > 0 there exists a compact set K such that
μn (X\K) ≤ ε for all n. Show that there is a subsequence of μn which
converges vaguely to another Borel probability measure μ. If tightness is
not assumed, show that there is a subsequence which converges vaguely to
a non-negative Borel measure μ, but give an example to show that this
measure need not be a probability measure.

This theorem can be used to establish Helly’s selection theorem:

Exercise 1.10.30 (Helly’s selection theorem). Let fn : R → R be a sequence
of functions whose total variation is uniformly bounded in n, and
which is bounded at one point x0 ∈ R (i.e., {fn (x0 ) : n = 1, 2, . . .} is
bounded). Show that there exists a subsequence of fn which converges
pointwise a.e. on compact subsets of R. (Hint: One can deduce this from
Prokhorov’s theorem using the fundamental theorem of calculus for functions
of bounded variation.)

1.10.4. The commutative Gelfand-Naimark theorem (optional).


One particularly beautiful application of the machinery developed in the
last few notes is the commutative Gelfand-Naimark theorem, which classifies
commutative C ∗ -algebras and is of importance in spectral theory, operator
algebras, and quantum mechanics.

Definition 1.10.23. A complex Banach algebra is a complex Banach space
A which is also a complex algebra, such that $\|xy\| \le \|x\| \|y\|$ for all x, y ∈ A.
An algebra is unital if it contains a multiplicative identity 1, and commutative
if xy = yx for all x, y ∈ A. A C∗-algebra is a complex Banach algebra
with an antilinear map x → x∗ from A to A which is an isometry (thus
$\|x^*\| = \|x\|$ for all x ∈ A), an involution (thus $(x^*)^* = x$ for all x ∈ A), an
antihomomorphism (thus $(xy)^* = y^* x^*$ for all x, y ∈ A), and obeys the C∗
identity $\|x^* x\| = \|x\|^2$ for all x ∈ A.
A homomorphism φ : A → B between two C∗-algebras is a continuous
algebra homomorphism such that φ(x∗) = φ(x)∗ for all x ∈ A. An
isomorphism is a homomorphism whose inverse exists and is also a
homomorphism; two C∗-algebras are isomorphic if there exists an isomorphism
between them.

Exercise 1.10.31. If H is a Hilbert space, and B(H → H) is the algebra of
bounded linear operators on this space with the adjoint map T → T ∗ and the
operator norm, show that B(H → H) is a unital C ∗ -algebra (not necessarily
commutative). Indeed, one can think of C ∗ -algebras as an abstraction of a
space of bounded linear operators on a Hilbert space (this is basically the
content of the non-commutative Gelfand-Naimark theorem, which we will
not discuss here).

Exercise 1.10.32. If X is a compact Hausdorff space, show that C(X → C)
is a unital commutative C∗-algebra, with involution $f^* := \overline{f}$.

The remarkable (unital commutative) Gelfand-Naimark theorem asserts
the converse statement to Exercise 1.10.32:

Theorem 1.10.24 (Unital commutative Gelfand-Naimark theorem). Every
unital commutative C∗-algebra A is isomorphic to C(X → C) for some
compact Hausdorff space X.

There are analogues of this theorem for non-unital or non-commutative
C∗-algebras, but for simplicity we shall restrict our attention to the unital
commutative case. We first need some spectral theory.
Exercise 1.10.33. Let A be a unital Banach algebra. Show that if x ∈ A
is such that $\|x - 1\| < 1$, then x is invertible. (Hint: Use Neumann series.)
Conclude that the space A× ⊂ A of invertible elements of A is open.
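The Neumann series hint can be checked directly in the unital Banach algebra of 2 × 2 matrices (real entries, for simplicity). The Python sketch below assumes the maximum-row-sum operator norm, which is submultiplicative and gives the identity norm 1; if ‖x − 1‖ < 1 in this norm, the partial sums of Σ_{n≥0} (1 − x)^n converge to x^{−1}. The matrix and helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def row_sum_norm(a):
    """Maximum absolute row sum: the operator norm for the max norm on R^2 (submultiplicative, ||1|| = 1)."""
    return max(abs(a[i][0]) + abs(a[i][1]) for i in range(2))


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def neumann_inverse(x, terms=60):
    """Partial sum sum_{n=0}^{terms-1} (1 - x)^n, which converges to x^{-1} when ||1 - x|| < 1."""
    y = [[IDENTITY[i][j] - x[i][j] for j in range(2)] for i in range(2)]  # y = 1 - x
    total = [row[:] for row in IDENTITY]
    power = [row[:] for row in IDENTITY]
    for _ in range(terms - 1):
        power = mat_mul(power, y)
        total = mat_add(total, power)
    return total


x = [[1.2, 0.3], [-0.1, 0.9]]
y = [[IDENTITY[i][j] - x[i][j] for j in range(2)] for i in range(2)]
print(row_sum_norm(y))   # ||1 - x|| in this norm; below 1, so the series converges
inv = neumann_inverse(x)
print(mat_mul(x, inv))   # close to the identity matrix
```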

Define the spectrum σ(x) of an element x ∈ A to be the set of all z ∈ C
such that x − z1 is not invertible.
Exercise 1.10.34. If A is a unital Banach algebra and x ∈ A, show that
σ(x) is a compact subset of C that is contained inside the disk
$\{z \in \mathbf{C} : |z| \le \|x\|\}$.
Exercise 1.10.35 (Beurling-Gelfand spectral radius formula). If A is a
unital Banach algebra and x ∈ A, show that σ(x) is non-empty with
$\sup\{|z| : z \in \sigma(x)\} = \lim_{n\to\infty} \|x^n\|^{1/n}$. (Hint: To get the upper bound, observe that
if $x^n - z^n 1$ is invertible for some n ≥ 1, then so is x − z1. Then use Exercise
1.10.34. To get the lower bound, first observe that for any λ ∈ A∗, the
function $f_\lambda : z \mapsto \lambda((x - z1)^{-1})$ is holomorphic on the complement of σ(x),
which is already enough (with Liouville’s theorem) to show that σ(x) is non-empty.
Let r > sup{|z| : z ∈ σ(x)} be arbitrary, and use Laurent series to
show that $|\lambda(x^n)| \le C_{\lambda,r} r^n$ for all n and some $C_{\lambda,r}$ independent of n. Then
divide by $r^n$ and use the uniform boundedness principle to conclude.)
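For a matrix algebra the formula can be watched numerically. In the Python sketch below (the matrix and helper names are arbitrary choices for illustration), x is an upper triangular 2 × 2 matrix with eigenvalues 0.5 and 0.3, so σ(x) = {0.5, 0.3}, yet its norm is about 10; the quantities ‖xⁿ‖^{1/n} nevertheless approach the spectral radius 0.5. All norms on a finite-dimensional space are equivalent, and the n-th root removes the comparison constants, so the limit does not depend on the norm; the Frobenius norm is assumed here for simplicity.

```python
import cmath
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def frobenius(a):
    return math.sqrt(sum(a[i][j] ** 2 for i in range(2) for j in range(2)))


def spectral_radius(a):
    """Largest |eigenvalue| of a 2x2 matrix, from the roots of z^2 - tr(a) z + det(a)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def gelfand_estimate(a, n):
    """||a^n||^{1/n}, computed in the Frobenius norm."""
    power = a
    for _ in range(n - 1):
        power = mat_mul(power, a)
    return frobenius(power) ** (1.0 / n)


x = [[0.5, 10.0], [0.0, 0.3]]
print("spectral radius:", spectral_radius(x))
for n in (1, 10, 100, 200):
    print(n, gelfand_estimate(x, n))
```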
Exercise 1.10.36 (C∗-algebra spectral radius formula). Let A be a unital
C∗-algebra. Show that
$$\|x\| = \|(x^* x)^{2^n}\|^{1/2^{n+1}} = \|(x x^*)^{2^n}\|^{1/2^{n+1}}$$
for all n ≥ 1 and x ∈ A. Conclude that any homomorphism between C∗-algebras
has operator norm at most 1. Also conclude that
$\sup\{|z| : z \in \sigma(x)\} = \|x\|$ whenever x is normal (i.e., x∗x = xx∗), and in
particular for every x ∈ A when A is commutative.
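In the C∗-algebra of 2 × 2 matrices (operator norm, conjugate transpose as involution; for real matrices the involution is just the transpose), the identities ‖x‖ = ‖(x∗x)^{2^n}‖^{1/2^{n+1}} = ‖(xx∗)^{2^n}‖^{1/2^{n+1}} can be checked numerically. The Python sketch below (helper names and the sample matrix are assumptions) computes the operator norm of a real 2 × 2 matrix a as the square root of the largest eigenvalue of aᵀa, tests the identities for a non-normal x, and also checks that the self-adjoint element xᵀx has spectral radius equal to its norm.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def largest_symmetric_eigenvalue(s):
    """Largest eigenvalue of a real symmetric 2x2 matrix [[p, q], [q, r]]."""
    p, q, r = s[0][0], s[0][1], s[1][1]
    return (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)


def op_norm(a):
    """Operator norm on (R^2, Euclidean norm): sqrt of the largest eigenvalue of a^T a."""
    return math.sqrt(largest_symmetric_eigenvalue(mat_mul(transpose(a), a)))


def repeated_square(a, n):
    """a^(2^n), computed by squaring n times."""
    for _ in range(n):
        a = mat_mul(a, a)
    return a


x = [[1.0, 2.0], [0.0, 1.0]]  # not normal: x^T x != x x^T
xtx, xxt = mat_mul(transpose(x), x), mat_mul(x, transpose(x))
for n in range(4):
    rhs1 = op_norm(repeated_square(xtx, n)) ** (1.0 / 2 ** (n + 1))
    rhs2 = op_norm(repeated_square(xxt, n)) ** (1.0 / 2 ** (n + 1))
    print(n, op_norm(x), rhs1, rhs2)

# h = x^T x is self-adjoint and positive semi-definite, so its spectral radius is its
# largest eigenvalue; the C*-identity forces this to equal ||h||.
print(largest_symmetric_eigenvalue(xtx), op_norm(xtx))
```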

The next important concept is that of a character.


Definition 1.10.25. Let A be a unital commutative C ∗ -algebra. A char-
acter of A is an element λ ∈ A∗ in the dual Banach space such that
λ(xy) = λ(x)λ(y), λ(1) = 1, and $\lambda(x^*) = \overline{\lambda(x)}$ for all x, y ∈ A; equivalently,
a character is a homomorphism from A to C (viewed as a (unital) C∗-algebra).
We let $\hat A \subset A^*$ be the space of all characters; this space is known
as the spectrum of A.

Exercise 1.10.37. If A is a unital commutative C∗-algebra, show that $\hat A$ is
a compact Hausdorff subset of A∗ in the weak-* topology. (Hint: First use
the spectral radius formula to show that all characters have operator norm
1, then use the Banach-Alaoglu theorem.)
Exercise 1.10.38. Define an ideal of a unital commutative C ∗ -algebra A
to be a proper subspace I of A such that xy, yx ∈ I for all x ∈ I and
y ∈ A. Show that if λ ∈ Â, then the kernel λ−1 ({0}) is a maximal ideal in
A; conversely, if I is a maximal ideal in A, show that I is closed, and there
is exactly one λ ∈ Â such that I = λ−1 ({0}). Thus the spectrum of A can
be canonically identified with the space of maximal ideals in A.
Exercise 1.10.39. Let X be a compact Hausdorff space, and let A be the
C ∗ -algebra A := C(X → C). Show that for each x ∈ X, the operation
λx : f → f (x) is a character of A. Show that the map λ : x → λx is a
homeomorphism from X to Â; thus the spectrum of C(X → C) can be
canonically identified with X. (Hint: Use Exercise 1.10.23 to show the
surjectivity of λ, Urysohn’s lemma to show injectivity, and Corollary 1.8.2
to show the homeomorphism property.)
Inspired by the above exercise, we define the Gelfand representation $x \mapsto \hat x$
from A to $C(\hat A \to \mathbf{C})$ by the formula $\hat x(\lambda) := \lambda(x)$.
Exercise 1.10.40. Show that if A is a unital commutative C ∗ -algebra, then
the Gelfand representation is a homomorphism of C ∗ -algebras.
Exercise 1.10.41. Let x be a non-invertible element of a unital commuta-
tive C ∗ -algebra A. Show that x̂ vanishes at some λ ∈ Â. (Hint: The set
{xy : y ∈ A} is a proper ideal of A, and thus by Zorn’s lemma (Section 2.4)
it is contained in a maximal ideal.)
Exercise 1.10.42. Show that if A is a unital commutative C ∗ -algebra, then
the Gelfand representation is an isometry. (Hint: Use Exercise 1.10.36 and
Exercise 1.10.41.)
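A finite-dimensional sanity check of the isometry property, under assumptions stated here rather than taken from the text: for a real symmetric (hence self-adjoint) 2 × 2 matrix N, the C∗-algebra generated by N and 1 consists of the polynomials p(N), its characters can be identified with evaluation at the eigenvalues of N, and the Gelfand transform of p(N) is the function λ ↦ p(λ) on σ(N). Isometry therefore predicts ‖p(N)‖ = max over λ ∈ σ(N) of |p(λ)|, which the Python sketch below checks for one arbitrarily chosen (hypothetical) N and p.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def poly_of_matrix(coeffs, m):
    """Evaluate p(m) = sum_k coeffs[k] * m^k for a 2x2 matrix m (Horner's rule)."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for c in reversed(coeffs):
        result = mat_mul(result, m)
        result[0][0] += c
        result[1][1] += c
    return result


def poly(coeffs, t):
    return sum(c * t ** k for k, c in enumerate(coeffs))


def op_norm(a):
    """Operator norm of a real 2x2 matrix: sqrt of the largest eigenvalue of a^T a."""
    ata = mat_mul([[a[0][0], a[1][0]], [a[0][1], a[1][1]]], a)
    p, q, r = ata[0][0], ata[0][1], ata[1][1]
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2))


def symmetric_eigenvalues(s):
    """Both eigenvalues of a real symmetric 2x2 matrix."""
    p, q, r = s[0][0], s[0][1], s[1][1]
    root = math.sqrt(((p - r) / 2) ** 2 + q ** 2)
    return ((p + r) / 2 - root, (p + r) / 2 + root)


N = [[2.0, 1.0], [1.0, -1.0]]   # real symmetric, hence self-adjoint
coeffs = [1.0, -3.0, 0.5]       # p(t) = 1 - 3t + 0.5 t^2
lhs = op_norm(poly_of_matrix(coeffs, N))
rhs = max(abs(poly(coeffs, lam)) for lam in symmetric_eigenvalues(N))
print(lhs, rhs)
```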
Exercise 1.10.43. Use the complex Stone-Weierstrass theorem and Exer-
cises 1.10.40, 1.10.42 to conclude the proof of Theorem 1.10.24.
Notes. This lecture first appeared at
terrytao.wordpress.com/2009/03/02.
Thanks to Anush Tserunyan, Haokun Xu, Max Baroi, mmailliw/william,
PDEbeginner, and anonymous commenters for corrections.
Eric noted another example of a locally compact Hausdorff space which
was not normal, namely (ω + 1) × (ω1 + 1)\(ω, ω1 ), where ω is the first
infinite ordinal, and ω1 is the first uncountable ordinal (endowed with the
order topology, of course).

Section 1.11

Interpolation of Lp spaces

In the previous sections, we have been focusing largely on the soft side of
real analysis, which is primarily concerned with qualitative properties such as
convergence, compactness, measurability, and so forth. In contrast, we will
now emphasise the hard side of real analysis, in which we study estimates
and upper and lower bounds of various quantities, such as norms of functions
or operators. (Of course, the two sides of analysis are closely connected to
each other; an understanding of both sides and their interrelationships is
needed in order to get the broadest and most complete perspective for this
subject; see Section 1.3 of Structure and Randomness for more discussion.)
One basic tool in hard analysis is that of interpolation, which allows
one to start with a hypothesis of two (or more) upper bound estimates, e.g.,
A0 ≤ B0 and A1 ≤ B1 , and conclude a family of intermediate estimates
Aθ ≤ Bθ (or maybe Aθ ≤ Cθ Bθ , where Cθ is a constant) for any choice of
parameter 0 < θ < 1. Of course, interpolation is not a magic wand; one
needs various hypotheses (e.g., linearity, sublinearity, convexity, or com-
plexifiability) on Ai , Bi in order for interpolation methods to be applicable.
Nevertheless, these techniques are available for many important classes of
problems, most notably that of establishing boundedness estimates such as
$\|Tf\|_{L^q(Y,\nu)} \le C \|f\|_{L^p(X,\mu)}$ for linear (or linear-like) operators T from one
Lebesgue space Lp (X, μ) to another Lq (Y, ν). (Interpolation can also be
performed for many other normed vector spaces than the Lebesgue spaces,
but we will restrict attention to Lebesgue spaces in these notes to keep the
discussion focused.) Using interpolation, it is possible to reduce the task of proving such
estimates to that of proving various endpoint versions of these estimates.

In some cases, each endpoint only faces a portion of the difficulty that the
interpolated estimate did, and so by using interpolation, one has split the
task of proving the original estimate into two or more simpler subtasks. In
other cases, one of the endpoint estimates is very easy, and the other one
is significantly more difficult than the original estimate. Thus interpolation
does not really simplify the task of proving estimates in this case, but at least
clarifies the relative difficulty between various estimates in a given family.
As is the case with many other tools in analysis, interpolation is not
captured by a single interpolation theorem; instead, there is a family of
such theorems, which can be broadly divided into two major categories,
reflecting the two basic methods that underlie the principle of interpolation.
The real interpolation method is based on a divide-and-conquer strategy: to
understand how to obtain control on some expression such as $\|Tf\|_{L^q(Y,\nu)}$
for some operator T and some function f , one would divide f into two or
more components, e.g., into components where f is large and where f is
small, or where f is oscillating with high frequency or only varying with
low frequency. Each component would be estimated using a carefully cho-
sen combination of the extreme estimates available; optimising over these
choices and summing up (using whatever linearity-type properties on T are
available), one would hope to get a good estimate on the original expres-
sion. The strengths of the real interpolation method are that the linearity
hypotheses on T can be relaxed to weaker hypotheses, such as sublinearity
or quasilinearity; also, the endpoint estimates are allowed to be of a weaker
type than the interpolated estimates. On the other hand, the real interpolation
method often concedes a multiplicative constant in the final estimates obtained,
and one is usually obligated to keep the operator T fixed throughout the
interpolation process. The proofs of real interpolation theorems are also a
little bit messy, though in many cases one can simply invoke a standard
instance of such theorems (e.g., the Marcinkiewicz interpolation theorem)
as a black box in applications.
The complex interpolation method instead proceeds by exploiting the
powerful tools of complex analysis, in particular the maximum modulus prin-
ciple and its relatives (such as the Phragmén-Lindelöf principle). The idea
is to rewrite the estimate to be proven (e.g., $\|Tf\|_{L^q(Y,\nu)} \le C\|f\|_{L^p(X,\mu)}$) in
such a way that it can be embedded into a family of such estimates which
depend holomorphically on a complex parameter s in some domain (e.g.,
the strip {σ + it : t ∈ R, σ ∈ [0, 1]}). One then exploits things like the max-
imum modulus principle to bound an estimate corresponding to an interior
point of this domain by the estimates on the boundary of this domain. The
strengths of the complex interpolation method are that it typically gives
cleaner constants than the real interpolation method, and also allows the
underlying operator T to vary holomorphically with respect to the parameter
s, which can significantly increase the flexibility of the interpolation
technique. The proofs of these methods are also very short (if one takes
the maximum modulus principle and its relatives as a black box), which
makes the method particularly amenable to generalisation to more intricate
settings (e.g., multilinear operators, mixed Lebesgue norms, etc.). On
the other hand, the somewhat rigid requirement of holomorphicity makes
it much more difficult to apply this method to non-linear operators, such
as sublinear or quasi-linear operators; also, the interpolated estimate tends
to be of the same type as the extreme ones, so that one does not enjoy
the upgrading of weak type estimates to strong type estimates that the real
interpolation method typically produces. Also, the complex method runs
into some minor technical problems when the target space Lq (Y, ν) ceases to be
a Banach space (i.e., when q < 1) as this makes it more difficult to exploit
duality.
Despite these differences, the real and complex methods tend to give
broadly similar results in practice, especially if one is willing to ignore con-
stant losses in the estimates or epsilon losses in the exponents.
The theory of both real and complex interpolation can be studied ab-
stractly, in general normed or quasi-normed spaces; see, e.g., [BeLo1976]
for a detailed treatment. However in these notes we shall focus exclusively
on interpolation for Lebesgue spaces Lp (and their cousins, such as the weak
Lebesgue spaces Lp,∞ and the Lorentz spaces Lp,r ).

1.11.1. Interpolation of scalars. As discussed in the introduction, most
of the interesting applications of interpolation occur when the technique is
applied to operators T . However, in order to gain some intuition as to why
interpolation works in the first place, let us first consider the significantly
simpler (though rather trivial) setting of interpolating scalars or
functions.
We begin first with scalars. Suppose that A0 , B0 , A1 , B1 are non-negative
real numbers such that
(1.76) A0 ≤ B0
and
(1.77) A1 ≤ B1 .
Then clearly we will have
(1.78) Aθ ≤ Bθ
for every 0 ≤ θ ≤ 1, where we define
(1.79) $A_\theta := A_0^{1-\theta} A_1^{\theta}$
and
(1.80) $B_\theta := B_0^{1-\theta} B_1^{\theta}$;
indeed one simply raises (1.76) to the power 1−θ, (1.77) to the power θ, and
multiplies the two inequalities together. Thus for instance, when θ = 1/2
one obtains the geometric mean of (1.76) and (1.77):
$A_0^{1/2} A_1^{1/2} \le B_0^{1/2} B_1^{1/2}$.
One can view Aθ and Bθ as the unique log-linear functions of θ (i.e., log Aθ ,
log Bθ are (affine-)linear functions of θ) which equal their boundary values
A0 , A1 and B0 , B1 , respectively, at θ = 0, 1.
Example 1.11.1. If $A_0 = A L^{1/p_0}$ and $A_1 = A L^{1/p_1}$ for some A, L > 0 and
0 < p0 , p1 ≤ ∞, then the log-linear interpolant Aθ is given by $A_\theta = A L^{1/p_\theta}$,
where 0 < pθ ≤ ∞ is the quantity such that $\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$.
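The identity in Example 1.11.1 is easy to confirm numerically, with the convention 1/∞ = 0. A short Python sketch (the values of A, L, p0, p1 are arbitrary, and the helper names are hypothetical):

```python
import math


def inv(p):
    """1/p, with the convention 1/inf = 0."""
    return 0.0 if math.isinf(p) else 1.0 / p


def log_linear(a0, a1, theta):
    """The log-linear interpolant A_theta = A_0^(1 - theta) * A_1^theta."""
    return a0 ** (1 - theta) * a1 ** theta


def p_theta(p0, p1, theta):
    """The exponent p_theta defined by 1/p_theta = (1 - theta)/p0 + theta/p1."""
    s = (1 - theta) * inv(p0) + theta * inv(p1)
    return math.inf if s == 0 else 1.0 / s


A, L = 2.0, 7.0
for p0, p1 in ((1.0, 2.0), (2.0, math.inf), (0.5, 4.0)):
    for theta in (0.0, 0.25, 0.5, 1.0):
        lhs = log_linear(A * L ** inv(p0), A * L ** inv(p1), theta)
        rhs = A * L ** inv(p_theta(p0, p1, theta))
        print(p0, p1, theta, lhs, rhs)
```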

The deduction of (1.78) from (1.76), (1.77) is utterly trivial, but there
are still some useful lessons to be drawn from it. For instance, let us take
A0 = A1 = A for simplicity, so we are interpolating two upper bounds
A ≤ B0 , A ≤ B1 on the same quantity A to give a new bound A ≤ Bθ . But
actually we have a refinement available to this bound, namely
(1.81) $A_\theta \le B_\theta \min\left(\frac{B_0}{B_1}, \frac{B_1}{B_0}\right)^{\varepsilon}$
for any sufficiently small ε > 0 (indeed one can take any ε less than or equal
to min(θ, 1 − θ)). Indeed one sees this simply by applying (1.78) with θ
replaced by θ − ε and by θ + ε, and taking minima. Thus we see that (1.78) is only
sharp when the two original bounds B0 , B1 are comparable; if instead we
have $B_1 \sim 2^n B_0$ for some integer n, then (1.81) tells us that we can improve
(1.78) by an exponentially decaying factor of $2^{-\varepsilon|n|}$. The geometric series
formula tells us that such factors are absolutely summable, and so in practice
it is often a useful heuristic to pretend that the n = O(1) cases dominate so
strongly that the other cases can be viewed as negligible by comparison.
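The refinement (1.81) and the summability of the gains can both be checked with a few lines of arithmetic. In the Python sketch below (values chosen arbitrarily; names hypothetical), B1 = 2ⁿB0; the smaller of B_{θ−ε} and B_{θ+ε} equals Bθ·2^{−ε|n|}, and summing these factors over all integers n gives the geometric series value (1 + r)/(1 − r) with r = 2^{−ε}.

```python
def b(theta, b0, b1):
    """B_theta = B_0^(1 - theta) * B_1^theta, as in (1.80)."""
    return b0 ** (1 - theta) * b1 ** theta


theta, eps, b0 = 0.4, 0.1, 3.0  # eps <= min(theta, 1 - theta), as required for (1.81)
for n in (-6, -1, 0, 2, 10):
    b1 = 2.0 ** n * b0
    best = min(b(theta - eps, b0, b1), b(theta + eps, b0, b1))  # best of the two shifted bounds
    refined = b(theta, b0, b1) * min(b0 / b1, b1 / b0) ** eps   # right-hand side of (1.81)
    print(n, best, refined, b(theta, b0, b1) * 2.0 ** (-eps * abs(n)))

# The gains 2^(-eps |n|) are summable over all integers n: compare a long partial sum
# with the closed form (1 + r) / (1 - r), where r = 2^(-eps).
r = 2.0 ** (-eps)
partial = sum(r ** abs(n) for n in range(-2000, 2001))
print(partial, (1 + r) / (1 - r))
```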
Also, one can trivially extend the deduction of (1.78) from (1.76), (1.77)
as follows: if θ → Aθ is a function from [0, 1] to R+ which is log-convex
(thus θ → log Aθ is a convex function of θ), and (1.76) and (1.77) hold for
some B0 , B1 > 0, then (1.78) holds for all intermediate θ also, where Bθ is
of course defined by (1.80). Thus one can interpolate upper bounds on log-convex
functions. However, one certainly cannot interpolate lower bounds:
lower bounds on a log-convex function θ → Aθ at θ = 0 and θ = 1 yield no
information about the value of, say, A1/2 . Similarly, one cannot extrapolate
upper bounds on log-convex functions: an upper bound on, say, A0 and A1/2
does not give any information about A1 . (However, an upper bound on A0
coupled with a lower bound on A1/2 gives a lower bound on A1 ; this is the
contrapositive of an interpolation statement.)
Exercise 1.11.1. Show that the sum f + g, product f g, or pointwise max-
imum max(f, g) of two log-convex functions f, g : [0, 1] → R+ is log-convex.
Remark 1.11.2. Every non-negative log-convex function θ → Aθ is convex,
thus in particular Aθ ≤ (1 − θ)A0 + θA1 for all 0 ≤ θ ≤ 1 (note that this
generalises the arithmetic mean-geometric mean inequality). Of course, the
converse statement is not true.

Now we turn to the complex version of the interpolation of log-convex
functions, a result known as Lindelöf’s theorem:
Theorem 1.11.3 (Lindelöf’s theorem). Let s → f (s) be a holomorphic
function on the strip S := {σ + it : 0 ≤ σ ≤ 1; t ∈ R}, which obeys the bound
(1.82) $|f(\sigma + it)| \le A \exp(\exp((\pi - \delta)|t|))$
for all σ+it ∈ S and some constants A, δ > 0. Suppose also that |f (0+it)| ≤
B0 and |f (1 + it)| ≤ B1 for all t ∈ R. Then we have |f (θ + it)| ≤ Bθ for all
0 ≤ θ ≤ 1 and t ∈ R, where Bθ is of course defined by (1.80).
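A numerical sanity check of the theorem for one concrete function (chosen here for illustration; it does not come from the text): f(s) = (1 + s)e^{s²} is entire, satisfies |f(σ + it)| = |1 + s|e^{σ² − t²}, and so easily obeys the growth hypothesis. On each edge of the strip the supremum over t is attained at t = 0, giving B0 = 1 and B1 = 2e. The Python sketch below checks |f(θ + it)| ≤ Bθ on a grid; the grid sizes are arbitrary.

```python
import cmath
import math


def f(s):
    """Hypothetical test function (1 + s) * exp(s^2), entire and of modest growth on the strip."""
    return (1 + s) * cmath.exp(s * s)


B0, B1 = 1.0, 2 * math.e  # sup of |f| on Re(s) = 0 and on Re(s) = 1 (both attained at t = 0)
ts = [-5 + 10 * k / 1000 for k in range(1001)]
worst_ratio = 0.0
for j in range(101):
    theta = j / 100
    bound = B0 ** (1 - theta) * B1 ** theta  # B_theta as in (1.80)
    for t in ts:
        worst_ratio = max(worst_ratio, abs(f(complex(theta, t))) / bound)
print(worst_ratio)  # Lindelof's theorem predicts a value of at most 1
```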
Remark 1.11.4. The hypothesis (1.82) is a qualitative hypothesis rather
than a quantitative one, since the exact values of A, δ do not show up in the
conclusion. It is quite a mild condition; any function of exponential growth
in t, or even with such super-exponential growth as $O(|t|^{|t|})$ or $O(e^{|t|^{O(1)}})$,
will obey (1.82). The principle however fails without this hypothesis, as
one can see for instance by considering the holomorphic function f (s) :=
exp(−i exp(πis)).

Proof. Observe that the function $s \mapsto B_0^{1-s}B_1^s$ is holomorphic and non-zero on $S$, and has magnitude exactly $B_\theta$ on the line $\mathrm{Re}(s) = \theta$ for each $0 \leq \theta \leq 1$. Thus, by dividing $f$ by this function (which worsens the qualitative bound (1.82) slightly), we may reduce to the case when $B_\theta = 1$ for all $0 \leq \theta \leq 1$.
Suppose we temporarily assume that $f(\sigma + it) \to 0$ as $|\sigma + it| \to \infty$. Then by the maximum modulus principle (applied to a sufficiently large rectangular portion of the strip), $|f|$ must attain its maximum on one of the two sides of the strip. But $|f| \leq 1$ on these two sides, and so $|f| \leq 1$ on the interior as well.
To remove the assumption that $f$ goes to zero at infinity, we use the trick of giving ourselves an epsilon of room (Section 2.7). Namely, we multiply $f(s)$ by the holomorphic function $g_\varepsilon(s) := \exp(\varepsilon i \exp(i[(\pi - \delta/2)s + \delta/4]))$ for some $\varepsilon > 0$. A little complex arithmetic shows that the function $f(s)g_\varepsilon(s)g_\varepsilon(1-s)$ goes to zero at infinity in $S$ (the $g_\varepsilon(s)$ factor decays fast enough to damp out the growth of $f$ as $\mathrm{Im}(s) \to -\infty$, while the $g_\varepsilon(1-s)$
damps out the growth as $\mathrm{Im}(s) \to +\infty$), and is bounded in magnitude by $1$ on both sides of the strip $S$. Applying the previous case to this function, then taking limits as $\varepsilon \to 0$, we obtain the claim. $\Box$
Exercise 1.11.2. With the notation and hypotheses of Theorem 1.11.3, show that the function $\sigma \mapsto \sup_{t \in \mathbf{R}} |f(\sigma + it)|$ is log-convex on $[0, 1]$.
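The following Python sketch is a numerical illustration of Theorem 1.11.3 and Exercise 1.11.2, not a proof. It uses the sample function $f(s) = \exp(s^2)$ (our own choice), which satisfies $|f(\sigma + it)| = e^{\sigma^2 - t^2} \leq e$ on $S$ and hence (1.82), and approximates each supremum over $t$ by a maximum over a finite grid. For this particular $f$ the maximum on each line occurs at $t = 0$, which lies on the grid, so nothing is lost by the discretisation; the grid sizes are arbitrary.

```python
import cmath
import math


def sup_on_line(f, sigma, ts):
    """Approximate sup_t |f(sigma + i t)| by a maximum over the sample points ts."""
    return max(abs(f(complex(sigma, t))) for t in ts)


def f(s):
    # |exp(s^2)| = exp(sigma^2 - t^2) <= e on the strip, so (1.82) holds.
    return cmath.exp(s * s)


ts = [k / 100 for k in range(-500, 501)]  # grid on [-5, 5]; contains t = 0
sigmas = [j / 20 for j in range(21)]      # grid on [0, 1]
M = [sup_on_line(f, sigma, ts) for sigma in sigmas]

B0, B1 = M[0], M[-1]
for sigma, m in zip(sigmas, M):
    # Lindelof's bound |f(theta + it)| <= B0^(1 - theta) * B1^theta
    assert m <= B0 ** (1 - sigma) * B1 ** sigma * (1 + 1e-12)

# Midpoint log-convexity of sigma -> sup_t |f(sigma + it)| on the grid
logs = [math.log(m) for m in M]
for j in range(1, len(logs) - 1):
    assert logs[j] <= (logs[j - 1] + logs[j + 1]) / 2 + 1e-12
```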
Exercise 1.11.3 (Hadamard three-circles theorem). Let $f$ be a holomorphic function on an annulus $\{z \in \mathbf{C} : R_1 \leq |z| \leq R_2\}$. Show that $\log \sup_{\theta \in [0, 2\pi]} |f(re^{i\theta})|$ is a convex function of $\log r$ for $R_1 \leq r \leq R_2$.
Exercise 1.11.4 (Phragmén-Lindelöf principle). Let $f$ be as in Theorem 1.11.3, but suppose that we have the bounds $|f(0 + it)| \leq C(1 + |t|)^{a_0}$ and $|f(1 + it)| \leq C(1 + |t|)^{a_1}$ for all $t \in \mathbf{R}$ and some exponents $a_0, a_1 \in \mathbf{R}$ and a constant $C > 0$. Show that one has $|f(\sigma + it)| \leq C'(1 + |t|)^{(1-\sigma)a_0 + \sigma a_1}$ for all $\sigma + it \in S$ and some constant $C'$ (which is allowed to depend on the constants $A, \delta$ in (1.82)). (Hint: It is convenient to work first in a half-strip such as $\{\sigma + it \in S : t \geq T\}$ for some large $T$. Then multiply $f$ by something like $\exp(-((1-z)a_0 + za_1)\log(-iz))$ for some suitable branch of the logarithm and apply a variant of Theorem 1.11.3 for the half-strip. A more refined estimate in this regard is due to Rademacher [Ra1959].) This particular version of the principle gives the convexity bound for Dirichlet series such as the Riemann zeta function. Bounds which exploit the deeper properties of these functions to improve upon the convexity bound are known as subconvexity bounds and are of major importance in analytic number theory, which is of course well outside the scope of this course.

1.11.2. Interpolation of functions. We now turn to the interpolation in function spaces, focusing particularly on the Lebesgue spaces $L^p(X)$ and the weak Lebesgue spaces $L^{p,\infty}(X)$. Here, $X = (X, \mathcal{X}, \mu)$ is a fixed measure space. It will not matter much whether we deal with real or complex spaces; for the sake of concreteness we work with complex spaces. Then for $0 < p < \infty$, recall (see Section 1.3) that $L^p(X)$ is the space of all functions $f: X \to \mathbf{C}$ whose $L^p$ norm
$$\|f\|_{L^p(X)} := \left(\int_X |f|^p\, d\mu\right)^{1/p}$$
is finite, modulo almost everywhere equivalence. The space $L^\infty(X)$ is defined similarly, but where $\|f\|_{L^\infty(X)}$ is the essential supremum of $|f|$ on $X$.
A simple test case in which to understand the $L^p$ norms better is that of a step function $f = A1_E$, where $A$ is a non-negative number and $E$ a set of finite measure. Then one has $\|f\|_{L^p(X)} = A\mu(E)^{1/p}$ for $0 < p \leq \infty$. Observe that this is a log-convex function of $1/p$. This is a general phenomenon:
Lemma 1.11.5 (Log-convexity of $L^p$ norms). Let $0 < p_0 < p_1 \leq \infty$ and $f \in L^{p_0}(X) \cap L^{p_1}(X)$. Then $f \in L^p(X)$ for all $p_0 \leq p \leq p_1$, and furthermore
we have
$$\|f\|_{L^{p_\theta}(X)} \leq \|f\|_{L^{p_0}(X)}^{1-\theta}\|f\|_{L^{p_1}(X)}^{\theta}$$
for all $0 \leq \theta \leq 1$, where the exponent $p_\theta$ is defined by $1/p_\theta := (1-\theta)/p_0 + \theta/p_1$.
In particular, we see that the function $1/p \mapsto \|f\|_{L^p(X)}$ is log-convex whenever the right-hand side is finite (and is in fact log-convex for all $0 \leq 1/p < \infty$, if one extends the definition of log-convexity to functions that can take the value $+\infty$). In other words, we can interpolate any two bounds $\|f\|_{L^{p_0}(X)} \leq B_0$ and $\|f\|_{L^{p_1}(X)} \leq B_1$ to obtain $\|f\|_{L^{p_\theta}(X)} \leq B_\theta$ for all $0 \leq \theta \leq 1$.
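Before turning to the proofs, here is a quick numerical sanity check of Lemma 1.11.5 on a finite measure space with randomly chosen point masses. This is a Python sketch; the data, the exponents and the helper name `lp_norm` are our own choices, and only finite exponents are tested.

```python
import random


def lp_norm(values, weights, p):
    """L^p norm of f on a finite set whose points carry the masses `weights` (0 < p < inf)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)


random.seed(0)
values = [random.uniform(-3, 3) for _ in range(50)]
weights = [random.uniform(0.01, 2) for _ in range(50)]
p0, p1 = 0.5, 4.0

for k in range(11):
    theta = k / 10
    p_theta = 1 / ((1 - theta) / p0 + theta / p1)  # 1/p_theta = (1-theta)/p0 + theta/p1
    lhs = lp_norm(values, weights, p_theta)
    rhs = lp_norm(values, weights, p0) ** (1 - theta) * lp_norm(values, weights, p1) ** theta
    assert lhs <= rhs * (1 + 1e-12), (theta, lhs, rhs)
```

The step-function example above is the equality case: for $f = 2 \cdot 1_E$ with $\mu(E) = 3/2$, `lp_norm([2.0, 2.0, 0.0], [1.0, 0.5, 7.0], 3)` agrees with $2 \cdot (3/2)^{1/3}$.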

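Let us give several proofs of this lemma. We will focus on the case $p_1 < \infty$; the endpoint case $p_1 = \infty$ can be proven directly, or by modifying the arguments below, or by using an appropriate limiting argument, and we leave the details to the reader.
The first proof is to use Hölder's inequality
$$\|f\|_{L^{p_\theta}(X)}^{p_\theta} = \int_X |f|^{(1-\theta)p_\theta}|f|^{\theta p_\theta}\, d\mu \leq \big\||f|^{(1-\theta)p_\theta}\big\|_{L^{p_0/((1-\theta)p_\theta)}}\big\||f|^{\theta p_\theta}\big\|_{L^{p_1/(\theta p_\theta)}}$$
when $p_1$ is finite (with some minor modifications in the case $p_1 = \infty$). The two exponents on the right are dual because $(1-\theta)p_\theta/p_0 + \theta p_\theta/p_1 = 1$, and the right-hand side simplifies to $\|f\|_{L^{p_0}(X)}^{(1-\theta)p_\theta}\|f\|_{L^{p_1}(X)}^{\theta p_\theta}$.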
Another (closely related) proof proceeds by using the log-convexity inequality
$$|f(x)|^{p_\theta} \leq (1-\alpha)|f(x)|^{p_0} + \alpha|f(x)|^{p_1}$$
for all $x$, where $0 < \alpha < 1$ is the quantity such that $p_\theta = (1-\alpha)p_0 + \alpha p_1$. If one integrates this inequality in $x$, one already obtains the claim in the normalised case when $\|f\|_{L^{p_0}(X)} = \|f\|_{L^{p_1}(X)} = 1$. To obtain the general case, one can multiply the function $f$ and the measure $\mu$ by appropriately chosen constants to obtain the above normalisation; we leave the details as an exercise to the reader. (The case when $\|f\|_{L^{p_0}(X)}$ or $\|f\|_{L^{p_1}(X)}$ vanishes is of course easy to handle separately.)
A third approach is more in the spirit of the real interpolation method, avoiding the use of convexity arguments. As in the second proof, we can reduce to the normalised case $\|f\|_{L^{p_0}(X)} = \|f\|_{L^{p_1}(X)} = 1$. We then split $f = f1_{|f| \leq 1} + f1_{|f| > 1}$, where $1_{|f| \leq 1}$ is the indicator function of the set $\{x : |f(x)| \leq 1\}$, and similarly for $1_{|f| > 1}$. Observe that, since $p_0 \leq p_\theta \leq p_1$,
$$\|f1_{|f| \leq 1}\|_{L^{p_\theta}(X)}^{p_\theta} = \int_{|f| \leq 1} |f|^{p_\theta}\, d\mu \leq \int_X |f|^{p_0}\, d\mu = 1$$
and similarly
$$\|f1_{|f| > 1}\|_{L^{p_\theta}(X)}^{p_\theta} = \int_{|f| > 1} |f|^{p_\theta}\, d\mu \leq \int_X |f|^{p_1}\, d\mu = 1,$$
and so by the quasi-triangle inequality (or triangle inequality, when $p_\theta \geq 1$)
$$\|f\|_{L^{p_\theta}(X)} \leq C$$
for some constant $C$ depending on $p_\theta$. Note, by the way, that this argument gives the inclusions
$$L^{p_0}(X) \cap L^{p_1}(X) \subset L^{p_\theta}(X) \subset L^{p_0}(X) + L^{p_1}(X). \tag{1.83}$$
This is off by a constant factor from what we want. But one can eliminate this constant by using the tensor power trick (Section 1.9 of Structure and Randomness). Indeed, if one replaces $X$ with a Cartesian power $X^M$ (with the product $\sigma$-algebra $\mathcal{X}^M$ and product measure $\mu^M$), and replaces $f$ by the tensor power $f^{\otimes M}: (x_1, \ldots, x_M) \mapsto f(x_1) \cdots f(x_M)$, we see from many applications of the Fubini-Tonelli theorem that
$$\|f^{\otimes M}\|_{L^p(X^M)} = \|f\|_{L^p(X)}^M$$
for all $p$. In particular, $f^{\otimes M}$ obeys the same normalisation hypotheses as $f$, and thus by applying the previous inequality to $f^{\otimes M}$, we obtain
$$\|f\|_{L^{p_\theta}(X)}^M \leq C$$
for every $M$, where it is key to note that the constant $C$ on the right is independent of $M$. Taking $M$th roots and then sending $M \to \infty$, we obtain the claim.
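The identity $\|f^{\otimes M}\|_{L^p(X^M)} = \|f\|_{L^p(X)}^M$ used in the third proof can be confirmed by brute force on a small finite measure space. The Python sketch below (three points with arbitrary masses; `tensor_power` is a hypothetical helper name) enumerates $X^M$ with the product measure. It checks only this Fubini-Tonelli identity, not the limiting argument.

```python
import itertools
import math


def lp_norm(values, weights, p):
    """L^p norm on a finite set with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)


def tensor_power(values, weights, M):
    """Values of f^{(x)M}(x_1, ..., x_M) = f(x_1)...f(x_M) and the product masses on X^M."""
    vals, wts = [], []
    for idx in itertools.product(range(len(values)), repeat=M):
        vals.append(math.prod(values[i] for i in idx))
        wts.append(math.prod(weights[i] for i in idx))
    return vals, wts


values = [1.5, -0.25, 2.0]
weights = [0.3, 1.2, 0.05]
for M in (1, 2, 3, 4):
    tv, tw = tensor_power(values, weights, M)
    for p in (0.5, 1.0, 2.5):
        assert math.isclose(lp_norm(tv, tw, p), lp_norm(values, weights, p) ** M, rel_tol=1e-9)
```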
Finally, we give a fourth proof in the spirit of the complex interpolation method. By replacing $f$ by $|f|$ we may assume $f$ is non-negative. By expressing non-negative measurable functions as the monotone limit of simple functions and using the monotone convergence theorem (Theorem 1.1.21), we may assume that $f$ is a simple function, which is then necessarily of finite measure support from the $L^p$ finiteness hypotheses. Now consider the function $s \mapsto \int_X |f|^{(1-s)p_0 + sp_1}\, d\mu$. Expanding $f$ out in terms of step functions we see that this is an analytic function of $s$ which grows at most exponentially in $s$; also, by the triangle inequality this function has magnitude at most $\int_X |f|^{p_0}\, d\mu$ when $s = 0 + it$ and magnitude at most $\int_X |f|^{p_1}\, d\mu$ when $s = 1 + it$. Applying Theorem 1.11.3 and specialising to the value of $s$ for which $(1-s)p_0 + sp_1 = p_\theta$, we obtain the claim (this value of $s$ satisfies $(1-s)p_0 = (1-\theta)p_\theta$ and $sp_1 = \theta p_\theta$, so the resulting bound $\int_X |f|^{p_\theta}\, d\mu \leq (\int_X |f|^{p_0}\, d\mu)^{1-s}(\int_X |f|^{p_1}\, d\mu)^s$ is exactly the claimed inequality raised to the power $p_\theta$).
Exercise 1.11.5. If 0 < θ < 1, show that equality holds in Lemma 1.11.5
if and only if |f | is a step function.

Now we consider variants of interpolation in which the strong $L^p$ spaces are replaced by their weak counterparts $L^{p,\infty}$. Given a measurable function $f: X \to \mathbf{C}$, we define the distribution function $\lambda_f: \mathbf{R}^+ \to [0, +\infty]$ by the formula
$$\lambda_f(t) := \mu(\{x \in X : |f(x)| \geq t\}) = \int_X 1_{|f| \geq t}\, d\mu.$$

This distribution function is closely connected to the $L^p$ norms. Indeed, from the calculus identity
$$|f(x)|^p = p\int_0^\infty 1_{|f(x)| \geq t}\, t^p\, \frac{dt}{t}$$
and the Fubini-Tonelli theorem, we obtain the formula
$$\|f\|_{L^p(X)}^p = p\int_0^\infty \lambda_f(t)\, t^p\, \frac{dt}{t} \tag{1.84}$$
for all $0 < p < \infty$, thus the $L^p$ norms are essentially moments of the distribution function. The $L^\infty$ norm is of course related to the distribution function by the formula
$$\|f\|_{L^\infty(X)} = \inf\{t \geq 0 : \lambda_f(t) = 0\}.$$
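For a simple function on a finite measure space, $\lambda_f$ is a step function, so the integral in (1.84) can be evaluated exactly rather than by quadrature: on each interval $(v_{j-1}, v_j]$ between consecutive values of $|f|$ one has $\lambda_f(t) = \lambda_f(v_j)$, and $p\int_{v_{j-1}}^{v_j} t^p\, dt/t = v_j^p - v_{j-1}^p$. The following Python sketch (helper names and sample data are ours) compares the result with the direct sum $\sum_x \mu(\{x\})|f(x)|^p$.

```python
def lp_norm_p(values, weights, p):
    """Direct computation of ||f||_p^p = sum_x mu({x}) |f(x)|^p."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))


def distribution(values, weights, t):
    """lambda_f(t) = mu({x : |f(x)| >= t})."""
    return sum(w for v, w in zip(values, weights) if abs(v) >= t)


def layer_cake(values, weights, p):
    """p * int_0^inf lambda_f(t) t^p dt/t, evaluated exactly for a step function lambda_f."""
    total, prev = 0.0, 0.0
    for v in sorted({abs(v) for v in values if v != 0}):
        # On (prev, v] we have lambda_f(t) = lambda_f(v).
        total += distribution(values, weights, v) * (v ** p - prev ** p)
        prev = v
    return total


vals = [0.0, 1.0, -2.0, 2.0, 0.5]
wts = [3.0, 0.5, 0.25, 1.0, 2.0]
for p in (0.5, 1.0, 3.0):
    assert abs(layer_cake(vals, wts, p) - lp_norm_p(vals, wts, p)) < 1e-9
```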
Exercise 1.11.6. Show that we have the relationship
$$\|f\|_{L^p(X)}^p \sim_p \sum_{n \in \mathbf{Z}} \lambda_f(2^n)2^{np}$$
for any measurable $f: X \to \mathbf{C}$ and $0 < p < \infty$, where we use $X \sim_p Y$ to denote a pair of inequalities of the form $c_pY \leq X \leq C_pY$ for some constants $c_p, C_p > 0$ depending only on $p$. (Hint: $\lambda_f(t)$ is non-increasing in $t$.) Thus we can relate the $L^p$ norms of $f$ to the dyadic values $\lambda_f(2^n)$ of the distribution function; indeed, for any $0 < p \leq \infty$, $\|f\|_{L^p(X)}$ is comparable (up to constant factors depending on $p$) to the $\ell^p(\mathbf{Z})$ norm of the sequence $n \mapsto 2^n\lambda_f(2^n)^{1/p}$.
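A quick numerical check of Exercise 1.11.6 (not a solution of it): grouping the points $x$ according to the dyadic range $2^k \leq |f(x)| < 2^{k+1}$ suggests that one can take $c_p = 2^{-p}/(1-2^{-p})$ and $C_p = 1/(1-2^{-p})$, and the Python sketch below tests these particular constants on sample data. Truncating the sum to $-200 \leq n < 200$ is our own simplification; it is harmless here because $\lambda_f(2^n) = 0$ once $2^n$ exceeds $\max|f|$, and the neglected terms with very negative $n$ are astronomically small.

```python
def dyadic_sum(values, weights, p, n_range=range(-200, 200)):
    """Truncation of sum_{n in Z} lambda_f(2^n) 2^(np) to n in n_range."""
    total = 0.0
    for n in n_range:
        lam = sum(w for v, w in zip(values, weights) if abs(v) >= 2.0 ** n)
        total += lam * 2.0 ** (n * p)
    return total


def lp_norm_p(values, weights, p):
    """||f||_p^p on a finite set with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))


vals = [0.3, -1.0, 2.5, 7.9, 0.0, -0.26, 16.0, 1e-3]
wts = [1.0, 0.5, 2.0, 0.1, 3.0, 0.7, 0.05, 10.0]
for p in (0.5, 1.0, 2.0, 4.0):
    ratio = dyadic_sum(vals, wts, p) / lp_norm_p(vals, wts, p)
    c_p, C_p = 2 ** -p / (1 - 2 ** -p), 1 / (1 - 2 ** -p)
    assert c_p <= ratio <= C_p * (1 + 1e-12), (p, ratio)
```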

Another relationship between the $L^p$ norms and the distribution function is given by observing that
$$\|f\|_{L^p(X)}^p = \int_X |f|^p\, d\mu \geq \int_{|f| \geq t} t^p\, d\mu = t^p\lambda_f(t)$$
for any $t > 0$, leading to Chebyshev's inequality
$$\lambda_f(t) \leq \frac{1}{t^p}\|f\|_{L^p(X)}^p.$$
(The $p = 1$ version of this inequality is also known as Markov's inequality. In probability theory, Chebyshev's inequality is often specialised to the case $p = 2$, and with $f$ replaced by a normalised function $f - \mathbf{E}f$. Note that, as with many other Cyrillic names, there are also a large number of alternative spellings of Chebyshev in the Roman alphabet.)
Chebyshev's inequality motivates one to define the weak $L^p$ norm $\|f\|_{L^{p,\infty}(X)}$ of a measurable function $f: X \to \mathbf{C}$ for $0 < p < \infty$ by the formula
$$\|f\|_{L^{p,\infty}(X)} := \sup_{t > 0} t\lambda_f(t)^{1/p},$$
thus Chebyshev's inequality can be expressed succinctly as
$$\|f\|_{L^{p,\infty}(X)} \leq \|f\|_{L^p(X)}.$$
It is also natural to adopt the convention that $\|f\|_{L^{\infty,\infty}(X)} = \|f\|_{L^\infty(X)}$. If $f, g: X \to \mathbf{C}$ are two functions, we have the inclusion
$$\{|f + g| \geq t\} \subset \{|f| \geq t/2\} \cup \{|g| \geq t/2\},$$
and hence
$$\lambda_{f+g}(t) \leq \lambda_f(t/2) + \lambda_g(t/2);$$
this easily leads to the quasi-triangle inequality
$$\|f + g\|_{L^{p,\infty}(X)} \lesssim_p \|f\|_{L^{p,\infty}(X)} + \|g\|_{L^{p,\infty}(X)},$$
where we use¹² $X \lesssim_p Y$ as shorthand for the inequality $X \leq C_pY$ for some constant $C_p$ depending only on $p$ (it can be a different constant at each use of the $\lesssim_p$ notation).
Let $L^{p,\infty}(X)$ be the space of all $f: X \to \mathbf{C}$ which have finite $L^{p,\infty}(X)$ quasi-norm, modulo almost everywhere equivalence; this space is also known as weak $L^p(X)$. The quasi-triangle inequality soon implies that $L^{p,\infty}(X)$ is a quasi-normed vector space with the $L^{p,\infty}(X)$ quasi-norm, and Chebyshev's inequality asserts that $L^{p,\infty}(X)$ contains $L^p(X)$ as a subspace (though the $L^p$ norm is not a restriction of the $L^{p,\infty}(X)$ norm).
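On a finite measure space, $\lambda_f$ is constant on each interval $(v_{j-1}, v_j]$ between consecutive values of $|f|$, and $t\lambda_f(t)^{1/p}$ increases with $t$ on such an interval, so the supremum defining $\|f\|_{L^{p,\infty}(X)}$ is attained at one of the values $|f(x)|$. The Python sketch below (helper names and random data are ours) uses this to compute the weak norm exactly, and checks Chebyshev's inequality together with the quasi-triangle inequality in the explicit form $\|f+g\|_{L^{p,\infty}} \leq 2\max(1, 2^{1/p-1})(\|f\|_{L^{p,\infty}} + \|g\|_{L^{p,\infty}})$. This value of the constant is our own derivation: it follows from $\lambda_{f+g}(t) \leq \lambda_f(t/2) + \lambda_g(t/2)$ and $\lambda_f(t/2) \leq (2\|f\|_{L^{p,\infty}}/t)^p$; the text only asserts that some constant $C_p$ works.

```python
import random


def lp_norm(values, weights, p):
    """Strong L^p norm on a finite set with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)


def weak_lp_norm(values, weights, p):
    """sup_{t>0} t * lambda_f(t)^(1/p), attained at one of the values |f(x)|."""
    best = 0.0
    for t in {abs(v) for v in values if v != 0}:
        lam = sum(w for u, w in zip(values, weights) if abs(u) >= t)
        best = max(best, t * lam ** (1 / p))
    return best


random.seed(1)
n = 40
weights = [random.uniform(0.1, 1.0) for _ in range(n)]
for _ in range(20):
    f = [random.gauss(0, 1) ** 3 for _ in range(n)]
    g = [random.gauss(0, 1) ** 3 for _ in range(n)]
    h = [a + b for a, b in zip(f, g)]
    for p in (0.5, 1.0, 2.0):
        wf, wg = weak_lp_norm(f, weights, p), weak_lp_norm(g, weights, p)
        assert wf <= lp_norm(f, weights, p) * (1 + 1e-12)  # Chebyshev
        C_p = 2 * max(1.0, 2 ** (1 / p - 1))
        assert weak_lp_norm(h, weights, p) <= C_p * (wf + wg) * (1 + 1e-12)
```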

Example 1.11.6. If $X = \mathbf{R}^n$ with the usual measure and $0 < p < \infty$, then the function $f(x) := |x|^{-n/p}$ is in weak $L^p$, but not strong $L^p$. It is also not in strong or weak $L^q$ for any other $q$. But the local component $|x|^{-n/p}1_{|x| \leq 1}$ of $f$ is in strong and weak $L^q$ for all $q < p$, and the global component $|x|^{-n/p}1_{|x| > 1}$ of $f$ is in strong and weak $L^q$ for all $q > p$.

Exercise 1.11.7. For any $0 < p, q \leq \infty$ and $f: X \to \mathbf{C}$, define the (dyadic) Lorentz norm $\|f\|_{L^{p,q}(X)}$ to be the $\ell^q(\mathbf{Z})$ norm of the sequence $n \mapsto 2^n\lambda_f(2^n)^{1/p}$, and define the Lorentz space $L^{p,q}(X)$ to be the space of functions $f$ with $\|f\|_{L^{p,q}(X)}$ finite, modulo almost everywhere equivalence. Show that $L^{p,q}(X)$ is a quasi-normed space, which is equivalent to $L^{p,\infty}(X)$ when $q = \infty$ and to $L^p(X)$ when $q = p$. Lorentz spaces arise naturally in more refined applications of the real interpolation method, and are useful in certain endpoint estimates that fail for Lebesgue spaces but which can be rescued by using Lorentz spaces instead. However, we will not pursue these applications in detail here.

¹² In analytic number theory, it is more customary to use $\ll_p$ instead of $\lesssim_p$, following Vinogradov. However, in analysis $\ll$ is sometimes used instead to denote “much smaller than”, e.g., $X \ll Y$ denotes the assertion $X \leq cY$ for some sufficiently small constant $c$.

Exercise 1.11.8. Let $X$ be a finite set with counting measure, and let $f: X \to \mathbf{C}$ be a function. For any $0 < p < \infty$, show that
$$\|f\|_{L^{p,\infty}(X)} \leq \|f\|_{L^p(X)} \lesssim_p \log(1 + |X|)^{1/p}\|f\|_{L^{p,\infty}(X)}.$$
(Hint: To prove the second inequality, normalise $\|f\|_{L^{p,\infty}(X)} = 1$, and then manually dispose of the regions of $X$ where $f$ is too large or too small.) Thus, in some sense, weak $L^p$ and strong $L^p$ are equivalent up to logarithmic factors.
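Some logarithmic loss really is needed in Exercise 1.11.8. A standard example (our choice, not taken from the text) is $f(k) = k^{-1/p}$ on $X = \{1, \ldots, N\}$: since $|f| \geq f(k)$ exactly on $\{1, \ldots, k\}$, one has $\lambda_f(f(k)) = k$ and hence $\|f\|_{L^{p,\infty}(X)} = 1$, while $\|f\|_{L^p(X)}^p = 1 + \frac{1}{2} + \cdots + \frac{1}{N}$ grows like $\log N$. The following Python sketch checks this numerically; `weak_and_strong` is a hypothetical helper name.

```python
import math


def weak_and_strong(N, p):
    """Weak and strong L^p norms of f(k) = k^(-1/p) on {1, ..., N} with counting measure."""
    values = [k ** (-1 / p) for k in range(1, N + 1)]
    # f is decreasing, so lambda_f(f(k)) = k and the weak norm is max_k f(k) * k^(1/p).
    weak = max(v * k ** (1 / p) for k, v in enumerate(values, start=1))
    strong = sum(v ** p for v in values) ** (1 / p)  # (1 + 1/2 + ... + 1/N)^(1/p)
    return weak, strong


for p in (0.5, 1.0, 3.0):
    for N in (10, 1000, 100000):
        weak, strong = weak_and_strong(N, p)
        harmonic_ratio = strong ** p / math.log(1 + N)
        assert abs(weak - 1.0) < 1e-9
        # log(1 + N) <= 1 + 1/2 + ... + 1/N <= log(1 + N) / log(2)
        assert 1.0 <= harmonic_ratio <= 1 / math.log(2)
```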

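One can interpolate weak $L^p$ bounds just as one can strong $L^p$ bounds: if $\|f\|_{L^{p_0,\infty}(X)} \leq B_0$ and $\|f\|_{L^{p_1,\infty}(X)} \leq B_1$, then
$$\|f\|_{L^{p_\theta,\infty}(X)} \leq B_\theta \tag{1.85}$$
for all $0 \leq \theta \leq 1$. Indeed, from the hypotheses we have
$$\lambda_f(t) \leq \frac{B_0^{p_0}}{t^{p_0}} \quad\text{and}\quad \lambda_f(t) \leq \frac{B_1^{p_1}}{t^{p_1}}$$
for all $t > 0$, and hence by scalar interpolation (using an interpolation parameter $0 < \alpha < 1$ defined by $p_\theta = (1-\alpha)p_0 + \alpha p_1$, and after doing some algebra) we have
$$\lambda_f(t) \leq \frac{B_\theta^{p_\theta}}{t^{p_\theta}} \tag{1.86}$$
for all $t > 0$ and all $0 < \theta < 1$.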
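The "some algebra" behind (1.86) amounts to the observation that the geometric mean $(B_0^{p_0}t^{-p_0})^{1-\alpha}(B_1^{p_1}t^{-p_1})^{\alpha}$ of the two hypothesised bounds is exactly $B_\theta^{p_\theta}t^{-p_\theta}$, because $(1-\alpha)p_0 = (1-\theta)p_\theta$ and $\alpha p_1 = \theta p_\theta$. A small Python check of this identity for finite exponents (the parameter values and the helper name are arbitrary choices of ours):

```python
import math


def weak_bounds_interpolate(B0, B1, p0, p1, theta, ts):
    """Check that min(B0^p0/t^p0, B1^p1/t^p1) <= B_theta^p_theta / t^p_theta on the grid ts."""
    p_theta = 1 / ((1 - theta) / p0 + theta / p1)
    alpha = (p_theta - p0) / (p1 - p0)  # p_theta = (1 - alpha) p0 + alpha p1
    B_theta = B0 ** (1 - theta) * B1 ** theta
    for t in ts:
        a0 = (B0 / t) ** p0  # bound on lambda_f(t) from the weak L^{p0} hypothesis
        a1 = (B1 / t) ** p1  # bound on lambda_f(t) from the weak L^{p1} hypothesis
        target = (B_theta / t) ** p_theta
        # The geometric mean of the two bounds is exactly the right-hand side of (1.86) ...
        assert math.isclose(a0 ** (1 - alpha) * a1 ** alpha, target, rel_tol=1e-9)
        # ... and the minimum of two numbers never exceeds their geometric mean.
        assert min(a0, a1) <= target * (1 + 1e-9)
    return True


ts = [2.0 ** k for k in range(-20, 21)]
assert weak_bounds_interpolate(3.0, 0.5, 1.0, 4.0, 0.25, ts)
assert weak_bounds_interpolate(1.0, 10.0, 0.5, 2.0, 0.7, ts)
```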
As remarked in the previous section, we can improve upon (1.86); indeed, if we define $t_0$ to be the unique value of $t$ where $B_0^{p_0}/t^{p_0}$ and $B_1^{p_1}/t^{p_1}$ are equal, then we have
$$\lambda_f(t) \leq \frac{B_\theta^{p_\theta}}{t^{p_\theta}}\min(t/t_0, t_0/t)^{\varepsilon}$$
for some $\varepsilon > 0$ depending on $p_0, p_1, \theta$. Inserting this improved bound into (1.84) we see that we can improve the weak-type bound (1.85) to a strong-type bound
$$\|f\|_{L^{p_\theta}(X)} \leq C_{p_0,p_1,\theta}B_\theta \tag{1.87}$$
for some constant $C_{p_0,p_1,\theta}$. Note that one cannot use the tensor power trick this time to eliminate the constant $C_{p_0,p_1,\theta}$, as the weak $L^p$ norms do not behave well with respect to tensor product. Indeed, the constant $C_{p_0,p_1,\theta}$ must diverge to infinity in the limit $\theta \to 0$ if $p_0 \neq \infty$, otherwise it would imply that the $L^{p_0}$ norm is controlled by the $L^{p_0,\infty}$ norm, which is false by Example 1.11.6; similarly one must have a divergence as $\theta \to 1$ if $p_1 \neq \infty$.

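Exercise 1.11.9. Let $0 < p_0 < p_1 \leq \infty$ and $0 < \theta < 1$. Refine the inclusions in (1.83) to
$$L^{p_0}(X) \cap L^{p_1}(X) \subset L^{p_0,\infty}(X) \cap L^{p_1,\infty}(X) \subset L^{p_\theta}(X) \subset L^{p_\theta,\infty}(X) \subset L^{p_0}(X) + L^{p_1}(X) \subset L^{p_0,\infty}(X) + L^{p_1,\infty}(X).$$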

Define the strong type diagram of a function $f: X \to \mathbf{C}$ to be the set of all $1/p$ for which $f$ lies in strong $L^p$, and the weak type diagram to be the set of all $1/p$ for which $f$ lies in weak $L^p$. Then both the strong and weak type diagrams are connected subsets of $[0, +\infty)$, and the strong type diagram is contained in the weak type diagram, and contains in turn the interior of the weak type diagram. By experimenting with linear combinations of the examples in Example 1.11.6 we see that this is basically everything one can say about the strong and weak type diagrams, without further information on $f$ or $X$.
Exercise 1.11.10. Let $f: X \to \mathbf{C}$ be a measurable function which is finite almost everywhere. Show that there exists a unique non-increasing left-continuous function $f^*: \mathbf{R}^+ \to \mathbf{R}^+$ such that $\lambda_{f^*}(t) = \lambda_f(t)$ for all $t > 0$, and in particular $\|f\|_{L^p(X)} = \|f^*\|_{L^p(\mathbf{R}^+)}$ for all $0 < p \leq \infty$, and $\|f\|_{L^{p,\infty}(X)} = \|f^*\|_{L^{p,\infty}(\mathbf{R}^+)}$. (Hint: First look for the formula that describes $f^*(x)$ for some $x > 0$ in terms of $\lambda_f(t)$.) The function $f^*$ is known as the non-increasing rearrangement of $f$, and the spaces $L^p(X)$ and $L^{p,\infty}(X)$ are examples of rearrangement-invariant spaces. There is a class of useful rearrangement inequalities that relate $f$ to its rearrangements, and which can be used to clarify the structure of rearrangement-invariant spaces, but we will not pursue this topic here.
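For a function on a finite set with point masses $w_x > 0$, the rearrangement $f^*$ is a step function: list the values $|f(x)|$ in decreasing order and give each one an interval of length $w_x$. The Python sketch below is our own construction with hypothetical helper names; it evaluates the left-continuous version, which takes the value $v_j$ on $(b_{j-1}, b_j]$, and checks numerically that $f^*$ has the same distribution function and $L^p$ norms as $f$.

```python
import bisect


def rearrangement(values, weights):
    """Breakpoints b_j and levels v_j with f*(s) = v_j on (b_{j-1}, b_j] (and f* = 0 beyond)."""
    pairs = sorted(((abs(v), w) for v, w in zip(values, weights) if w > 0), reverse=True)
    breaks, levels, acc = [], [], 0.0
    for v, w in pairs:
        acc += w
        breaks.append(acc)
        levels.append(v)
    return breaks, levels


def f_star(s, breaks, levels):
    """Evaluate the non-increasing, left-continuous f* at s > 0."""
    j = bisect.bisect_left(breaks, s)  # first j with s <= breaks[j]
    return levels[j] if j < len(levels) else 0.0


def lambda_f_star(t, breaks, levels):
    """Lebesgue measure of {s > 0 : f*(s) >= t}, for t > 0."""
    prev, total = 0.0, 0.0
    for b, v in zip(breaks, levels):
        if v >= t:
            total += b - prev
        prev = b
    return total


def lp_norm_f_star(p, breaks, levels):
    """||f*||_{L^p(R^+)} for 0 < p < inf."""
    prev, total = 0.0, 0.0
    for b, v in zip(breaks, levels):
        total += (b - prev) * v ** p
        prev = b
    return total ** (1 / p)


values = [0.3, -2.0, 1.0, 0.0, 2.0, -0.7]
weights = [1.0, 0.5, 2.0, 3.0, 0.25, 1.5]
breaks, levels = rearrangement(values, weights)
for t in (0.1, 0.3, 0.5, 1.0, 1.5, 2.0, 3.0):
    lam_f = sum(w for v, w in zip(values, weights) if abs(v) >= t)
    assert abs(lambda_f_star(t, breaks, levels) - lam_f) < 1e-12
for p in (0.5, 1.0, 2.0):
    direct = sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)
    assert abs(lp_norm_f_star(p, breaks, levels) - direct) < 1e-12
```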
Exercise 1.11.11. Let $(X, \mathcal{X}, \mu)$ be a $\sigma$-finite measure space, let $1 < p < \infty$, and let $f: X \to \mathbf{C}$ be a measurable function. Show that the following are equivalent:
• $f$ lies in $L^{p,\infty}(X)$, thus $\|f\|_{L^{p,\infty}(X)} \leq C$ for some finite $C$.
• There exists a constant $C'$ such that $|\int_X f1_E\, d\mu| \leq C'\mu(E)^{1/p'}$ for all sets $E$ of finite measure, where $1/p' := 1 - 1/p$.
Furthermore show that the best constants $C, C'$ in the above statements are equivalent up to multiplicative constants depending on $p$, thus $C \sim_p C'$. Conclude that the modified weak norm $\|f\|_{\tilde L^{p,\infty}(X)} := \sup_E \mu(E)^{-1/p'}|\int_X f1_E\, d\mu|$, where $E$ ranges over all sets of positive finite measure, is a genuine norm on $L^{p,\infty}(X)$ which is equivalent to the $L^{p,\infty}(X)$ quasi-norm.
Exercise 1.11.12. Let $n > 1$ be an integer. Find a probability space $(X, \mathcal{X}, \mu)$ and functions $f_1, \ldots, f_n: X \to \mathbf{R}$ with $\|f_j\|_{L^{1,\infty}(X)} \leq 1$ for $j = 1, \ldots, n$ such that $\|\sum_{j=1}^n f_j\|_{L^{1,\infty}(X)} \geq cn\log n$ for some absolute constant $c > 0$. (Hint: Exploit the logarithmic divergence of the harmonic series $\sum_{j=1}^\infty \frac{1}{j}$.) Conclude that there exists a probability space $X$ such that the $L^{1,\infty}(X)$ quasi-norm is not equivalent to an actual norm.
Exercise 1.11.13. Let $(X, \mathcal{X}, \mu)$ be a $\sigma$-finite measure space, let $0 < p < \infty$, and let $f: X \to \mathbf{C}$ be a measurable function. Show that the following are equivalent:
• $f$ lies in $L^{p,\infty}(X)$.
• There exists a constant $C$ such that for every set $E$ of finite measure, there exists a subset $E'$ with $\mu(E') \geq \frac{1}{2}\mu(E)$ such that $|\int_X f1_{E'}\, d\mu| \leq C\mu(E)^{1-1/p}$.
Exercise 1.11.14. Let $(X, \mathcal{X}, \mu)$ be a measure space of finite measure, and $f: X \to \mathbf{C}$ be a measurable function. Show that the following two statements are equivalent:
• There exists a constant $C > 0$ such that $\|f\|_{L^p(X)} \leq Cp$ for all $1 \leq p < \infty$.
• There exists a constant $c > 0$ such that $\int_X e^{c|f|}\, d\mu < \infty$.

1.11.3. Interpolation of operators. We turn at last to the central topic of these notes, which is interpolation of operators $T$ between functions on two fixed measure spaces $X = (X, \mathcal{X}, \mu)$ and $Y = (Y, \mathcal{Y}, \nu)$. To avoid some (very minor) technicalities we will make the mild assumption throughout that $X$ and $Y$ are both $\sigma$-finite, although much of the theory here extends to the non-$\sigma$-finite setting.
A typical situation is that of a linear operator $T$ which maps one $L^{p_0}(X)$ space to another $L^{q_0}(Y)$, and also maps $L^{p_1}(X)$ to $L^{q_1}(Y)$ for some exponents $0 < p_0, p_1, q_0, q_1 \leq \infty$; thus (by linearity) $T$ will map the larger vector space $L^{p_0}(X) + L^{p_1}(X)$ to $L^{q_0}(Y) + L^{q_1}(Y)$, and one has some estimates of the form
$$\|Tf\|_{L^{q_0}(Y)} \leq B_0\|f\|_{L^{p_0}(X)} \tag{1.88}$$
and
$$\|Tf\|_{L^{q_1}(Y)} \leq B_1\|f\|_{L^{p_1}(X)} \tag{1.89}$$
for all $f \in L^{p_0}(X)$, $f \in L^{p_1}(X)$, respectively, and some $B_0, B_1 > 0$. We would like to then interpolate to say something about how $T$ maps $L^{p_\theta}(X)$ to $L^{q_\theta}(Y)$.
The complex interpolation method gives a satisfactory result as long
as the exponents allow one to use duality methods, a result known as the
Riesz-Thorin theorem:

Theorem 1.11.7 (Riesz-Thorin theorem). Let $0 < p_0, p_1 \leq \infty$ and $1 \leq q_0, q_1 \leq \infty$. Let $T: L^{p_0}(X) + L^{p_1}(X) \to L^{q_0}(Y) + L^{q_1}(Y)$ be a linear operator obeying the bounds (1.88), (1.89) for all $f \in L^{p_0}(X)$, $f \in L^{p_1}(X)$, respectively, and some $B_0, B_1 > 0$. Then we have
$$\|Tf\|_{L^{q_\theta}(Y)} \leq B_\theta\|f\|_{L^{p_\theta}(X)}$$
for all $0 < \theta < 1$ and $f \in L^{p_\theta}(X)$, where $1/p_\theta := (1-\theta)/p_0 + \theta/p_1$, $1/q_\theta := (1-\theta)/q_0 + \theta/q_1$, and $B_\theta := B_0^{1-\theta}B_1^\theta$.
Remark 1.11.8. When $X$ is a point, this theorem essentially collapses to Lemma 1.11.5 (and when $Y$ is a point, this is a dual formulation of that lemma); and when $X$ and $Y$ are both points, this collapses to interpolation of scalars.

Proof. If $p_0 = p_1$, then the claim follows from Lemma 1.11.5 (applied to $Tf$), so we may assume $p_0 \neq p_1$, which in particular forces $p_\theta$ to be finite. By symmetry we can take $p_0 < p_1$. By multiplying the measures $\mu$ and $\nu$ (or the operator $T$) by various constants, we can normalise $B_0 = B_1 = 1$ (the case when $B_0 = 0$ or $B_1 = 0$ is trivial). Thus we have $B_\theta = 1$ also.
By Hölder's inequality, the bound (1.88) implies that
$$\left|\int_Y (Tf)g\, d\nu\right| \leq \|f\|_{L^{p_0}(X)}\|g\|_{L^{q_0'}(Y)} \tag{1.90}$$
for all $f \in L^{p_0}(X)$ and $g \in L^{q_0'}(Y)$, where $q_0'$ is the dual exponent of $q_0$. Similarly we have
$$\left|\int_Y (Tf)g\, d\nu\right| \leq \|f\|_{L^{p_1}(X)}\|g\|_{L^{q_1'}(Y)} \tag{1.91}$$
for all $f \in L^{p_1}(X)$ and $g \in L^{q_1'}(Y)$.
We now claim that
$$\left|\int_Y (Tf)g\, d\nu\right| \leq \|f\|_{L^{p_\theta}(X)}\|g\|_{L^{q_\theta'}(Y)} \tag{1.92}$$
for all $f$, $g$ that are simple functions with finite measure support. To see this, we first normalise $\|f\|_{L^{p_\theta}(X)} = \|g\|_{L^{q_\theta'}(Y)} = 1$. Observe that we can write $f = |f|\operatorname{sgn}(f)$, $g = |g|\operatorname{sgn}(g)$ for some functions $\operatorname{sgn}(f)$, $\operatorname{sgn}(g)$ of magnitude at most 1. If we then introduce the quantity
$$F(s) := \int_Y \left(T\left[|f|^{(1-s)p_\theta/p_0 + sp_\theta/p_1}\operatorname{sgn}(f)\right]\right)\left[|g|^{(1-s)q_\theta'/q_0' + sq_\theta'/q_1'}\operatorname{sgn}(g)\right] d\nu$$
(with the conventions that $q_\theta'/q_0' = q_\theta'/q_1' = 1$ in the endpoint case $q_0' = q_1' = q_\theta' = \infty$), we see that $F$ is a holomorphic function of $s$ of at most exponential growth which equals $\int_Y (Tf)g\, d\nu$ when $s = \theta$. When instead $s = 0 + it$, an application of (1.90) shows that $|F(s)| \leq 1$; a similar claim is obtained when $s = 1 + it$ using (1.91). The claim now follows from Theorem 1.11.3.

The estimate (1.92) has currently been established for simple functions $f, g$ with finite measure support. But one can extend the claim to any $f \in L^{p_\theta}(X)$ (keeping $g$ simple with finite measure support) by decomposing $f$ into a bounded function and a function of finite measure support, approximating the former in $L^{p_\theta}(X) \cap L^{p_1}(X)$ by simple functions of finite measure support, and approximating the latter in $L^{p_\theta}(X) \cap L^{p_0}(X)$ by simple functions of finite measure support, and taking limits using (1.90), (1.91) to justify the passage to the limit. One can then also allow arbitrary $g \in L^{q_\theta'}(Y)$ by using the monotone convergence theorem (Theorem 1.1.21). The claim now follows from the duality between $L^{q_\theta}(Y)$ and $L^{q_\theta'}(Y)$. $\Box$
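The simplest discrete instance of Theorem 1.11.7 can be checked numerically. Take $X$ and $Y$ to be finite sets with counting measure and $T$ given by a complex matrix $A$; then $\|A\|_{\ell^1 \to \ell^1}$ is the largest column sum of $|a_{ij}|$, $\|A\|_{\ell^\infty \to \ell^\infty}$ is the largest row sum, and $\theta = 1/2$ gives $p_\theta = q_\theta = 2$, so the theorem predicts $\|A\|_{\ell^2 \to \ell^2} \leq (\|A\|_{\ell^1 \to \ell^1}\|A\|_{\ell^\infty \to \ell^\infty})^{1/2}$. The Python sketch below assumes NumPy is available; the random test matrices and the helper name `riesz_thorin_half` are ours.

```python
import numpy as np


def riesz_thorin_half(A):
    """Return (||A||_{2->2}, sqrt(||A||_{1->1} * ||A||_{inf->inf})) for a matrix A."""
    B0 = np.linalg.norm(A, 1)       # l^1 -> l^1 norm: max absolute column sum
    B1 = np.linalg.norm(A, np.inf)  # l^inf -> l^inf norm: max absolute row sum
    return np.linalg.norm(A, 2), np.sqrt(B0 * B1)  # 2-norm = largest singular value


rng = np.random.default_rng(0)
for _ in range(100):
    A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
    lhs, rhs = riesz_thorin_half(A)
    assert lhs <= rhs * (1 + 1e-12)
```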

Suppose one has a linear operator $T$ that maps simple functions of finite measure support on $X$ to measurable functions on $Y$ (modulo almost everywhere equivalence). We say that such an operator is of strong type $(p, q)$ if it can be extended in a continuous fashion to an operator from $L^p(X)$ to $L^q(Y)$; this is equivalent to having an estimate of the form $\|Tf\|_{L^q(Y)} \leq B\|f\|_{L^p(X)}$ for all simple functions $f$ of finite measure support. (The extension is unique if $p$ is finite or if $X$ has finite measure, due to the density of simple functions of finite measure support in those cases. Annoyingly, uniqueness fails for $L^\infty$ of an infinite measure space, though this turns out not to cause much difficulty in practice, as the conclusions of interpolation methods are usually for finite exponents $p$.) Define the strong type diagram to be the set of all $(1/p, 1/q)$ such that $T$ is of strong type $(p, q)$. The Riesz-Thorin theorem tells us that if $T$ is of strong type $(p_0, q_0)$ and $(p_1, q_1)$ with $0 < p_0, p_1 \leq \infty$ and $1 \leq q_0, q_1 \leq \infty$, then $T$ is also of strong type $(p_\theta, q_\theta)$ for all $0 < \theta < 1$; thus the strong type diagram contains the closed line segment connecting $(1/p_0, 1/q_0)$ with $(1/p_1, 1/q_1)$. Thus the strong type diagram of $T$ is convex in $[0, +\infty) \times [0, 1]$ at least. (As we shall see later, it is in fact convex in all of $[0, +\infty)^2$.) Furthermore, on the intersection of the strong type diagram with $[0, +\infty) \times [0, 1]$, the operator norm $\|T\|_{L^p(X) \to L^q(Y)}$ is a log-convex function of $(1/p, 1/q)$.
Exercise 1.11.15. If $X = Y = [0, 1]$ with the usual measure, show that the strong type diagram of the identity operator is the triangle $\{(1/p, 1/q) \in [0, +\infty) \times [0, +\infty) : 1/p \leq 1/q\}$. If instead $X = Y = \mathbf{Z}$ with the usual counting measure, show that the strong type diagram of the identity operator is the triangle $\{(1/p, 1/q) \in [0, +\infty) \times [0, +\infty) : 1/p \geq 1/q\}$. What is the strong type diagram of the identity when $X = Y = \mathbf{R}$ with the usual measure?
Exercise 1.11.16. Let $T$ (resp. $T^*$) be a linear operator from simple functions of finite measure support on $X$ (resp. $Y$) to measurable functions on $Y$ (resp. $X$) modulo a.e. equivalence that are absolutely integrable on finite measure sets. We say $T, T^*$ are formally adjoint if we have $\int_Y (Tf)g\, d\nu = \int_X f\, T^*g\, d\mu$ for all simple functions $f, g$ of finite measure support on $X, Y$, respectively. If $1 \leq p, q \leq \infty$, show that $T$ is of strong type $(p, q)$ if and only if $T^*$ is of strong type $(q', p')$. Thus, taking formal adjoints reflects the strong type diagram around the line of duality $1/p + 1/q = 1$, at least inside the Banach space region $[0, 1]^2$.

Remark 1.11.9. There is a powerful extension of the Riesz-Thorin theorem known as the Stein interpolation theorem, in which the single operator $T$ is replaced by a family of operators $T_s$ for $s \in S$ that vary holomorphically in $s$ in the sense that $\int_Y (T_s1_E)1_F\, d\nu$ is a holomorphic function of $s$ for any sets $E, F$ of finite measure. Roughly speaking, the Stein interpolation theorem asserts that if $T_{j+it}$ is of strong type $(p_j, q_j)$ for $j = 0, 1$ with a bound growing at most exponentially in $t$, and $T_s$ itself grows at most exponentially in $t$ in some sense, then $T_\theta$ will be of strong type $(p_\theta, q_\theta)$. A precise statement of the theorem and some applications can be found in [St1993].

Now we turn to the real interpolation method. Instead of linear operators, it is now convenient to consider sublinear operators $T$ mapping simple functions $f: X \to \mathbf{C}$ of finite measure support in $X$ to $[0, +\infty]$-valued measurable functions on $Y$ (modulo almost everywhere equivalence, as usual), obeying the homogeneity relationship
$$|T(cf)| = |c||Tf|$$
and the pointwise bound
$$|T(f + g)| \leq |Tf| + |Tg|$$
for all $c \in \mathbf{C}$, and all simple functions $f, g$ of finite measure support.
Every linear operator is sublinear; also, the absolute value $Tf := |Sf|$ of a linear (or sublinear) operator is also sublinear. More generally, any maximal operator of the form $Tf := \sup_{\alpha \in A}|S_\alpha f|$, where $(S_\alpha)_{\alpha \in A}$ is a family of linear operators, is also a non-negative sublinear operator; note that one can also replace the supremum here by any other norm in $\alpha$, e.g., one could take an $\ell^p$ norm $(\sum_{\alpha \in A}|S_\alpha f|^p)^{1/p}$ for any $1 \leq p \leq \infty$. (After $p = \infty$ and $p = 1$, a particularly common case is when $p = 2$, in which case $T$ is known as a square function.)
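As a concrete (discrete, cyclic) example of a maximal operator, one can average a function on $\mathbf{Z}/n\mathbf{Z}$ over windows of a few fixed radii $r$ and form $Tf := \max_r |S_r f|$. The Python sketch below (the choice of radii, sizes and helper names is ours) checks the homogeneity relation and the pointwise bound defining sublinearity on random complex inputs; it is an illustration, not a proof.

```python
import random


def averages(f, r):
    """S_r f(x) = (2r+1)^(-1) * sum_{|y| <= r} f(x + y mod n); each S_r is linear."""
    n = len(f)
    return [sum(f[(x + y) % n] for y in range(-r, r + 1)) / (2 * r + 1) for x in range(n)]


def maximal(f, radii=(0, 1, 2, 4)):
    """T f(x) = max_r |S_r f(x)|, a maximal operator built from the linear S_r."""
    avgs = [averages(f, r) for r in radii]
    return [max(abs(a[x]) for a in avgs) for x in range(len(f))]


random.seed(2)
n = 12
for _ in range(50):
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    c = complex(random.gauss(0, 1), random.gauss(0, 1))
    Tf, Tg = maximal(f), maximal(g)
    T_sum = maximal([a + b for a, b in zip(f, g)])
    T_scaled = maximal([c * a for a in f])
    assert all(h <= a + b + 1e-12 for h, a, b in zip(T_sum, Tf, Tg))       # pointwise bound
    assert all(abs(h - abs(c) * a) <= 1e-9 for h, a in zip(T_scaled, Tf))  # homogeneity
```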
The basic theory of sublinear operators is similar to that of linear oper-
ators in some respects. For instance, continuity is still equivalent to bound-
edness:

Exercise 1.11.17. Let $T$ be a sublinear operator, and let $0 < p, q \leq \infty$. Then the following are equivalent:
• $T$ can be extended to a continuous operator from $L^p(X)$ to $L^q(Y)$.
• There exists a constant $B > 0$ such that $\|Tf\|_{L^q(Y)} \leq B\|f\|_{L^p(X)}$ for all simple functions $f$ of finite measure support.
• $T$ can be extended to an operator from $L^p(X)$ to $L^q(Y)$ such that $\|Tf\|_{L^q(Y)} \leq B\|f\|_{L^p(X)}$ for all $f \in L^p(X)$ and some $B > 0$.
Show that the extension mentioned above is unique if $p$ is finite or if $X$ has finite measure. Finally, show that the same equivalences hold if $L^q(Y)$ is replaced by $L^{q,\infty}(Y)$ throughout.

We say that $T$ is of strong type $(p, q)$ if any of the above equivalent statements (for $L^q(Y)$) hold, and it is of weak type $(p, q)$ if any of the above equivalent statements (for $L^{q,\infty}(Y)$) hold. We say that a linear operator $S$ is of strong or weak type $(p, q)$ if its non-negative counterpart $|S|$ is; note that this is compatible with our previous definition of strong type for such operators. Also, Chebyshev's inequality tells us that strong type $(p, q)$ implies weak type $(p, q)$.
We now give the real interpolation counterpart of the Riesz-Thorin theorem, namely the Marcinkiewicz interpolation theorem:
Theorem 1.11.10 (Marcinkiewicz interpolation theorem). Let $0 < p_0, p_1, q_0, q_1 \leq \infty$ and $0 < \theta < 1$ be such that $q_0 \neq q_1$, and $p_i \leq q_i$ for $i = 0, 1$. Let $T$ be a sublinear operator which is of weak type $(p_0, q_0)$ and of weak type $(p_1, q_1)$. Then $T$ is of strong type $(p_\theta, q_\theta)$.
Remark 1.11.11. Of course, the same claim applies to linear operators S
by setting T := |S|. One can also extend the argument to quasi-linear oper-
ators, in which the pointwise bound |T (f + g)| ≤ |T f | + |T g| is replaced by
|T (f + g)| ≤ C(|T f | + |T g|) for some constant C > 0, but this generalisation
only appears occasionally in applications. The conditions p0 ≤ q0 , p1 ≤ q1
can be replaced by the variant condition pθ ≤ qθ (see Exercises 1.11.19
and 1.11.21), but cannot be eliminated entirely; see Exercise 1.11.20. The
precise hypotheses required on p0 , p1 , q0 , q1 , pθ , qθ are rather technical and I
recommend that they be ignored on a first reading.

Proof. For notational reasons it is convenient to take $q_0, q_1$ finite; however the arguments below can be modified without much difficulty to deal with the infinite case (or one can use a suitable limiting argument). We leave this to the interested reader.
By hypothesis, there exist constants $B_0, B_1 > 0$ such that
$$\lambda_{Tf}(t) \leq B_0^{q_0}\|f\|_{L^{p_0}(X)}^{q_0}/t^{q_0} \tag{1.93}$$
and
$$\lambda_{Tf}(t) \leq B_1^{q_1}\|f\|_{L^{p_1}(X)}^{q_1}/t^{q_1} \tag{1.94}$$
for all simple functions $f$ of finite measure support, and all $t > 0$. Let us write $A \lesssim B$ to denote $A \leq C_{p_0,p_1,q_0,q_1,\theta,B_0,B_1}B$ for some constant $C_{p_0,p_1,q_0,q_1,\theta,B_0,B_1}$ depending on the indicated parameters. By (1.84), it will suffice to show that
$$\int_0^\infty \lambda_{Tf}(t)\, t^{q_\theta}\, \frac{dt}{t} \lesssim \|f\|_{L^{p_\theta}(X)}^{q_\theta}.$$
By homogeneity we can normalise $\|f\|_{L^{p_\theta}(X)} = 1$.
Actually, it will be slightly more convenient to work with the dyadic version of the above estimate, namely
$$\sum_{n \in \mathbf{Z}} \lambda_{Tf}(2^n)2^{q_\theta n} \lesssim 1; \tag{1.95}$$
see Exercise 1.11.6. The hypothesis $\|f\|_{L^{p_\theta}(X)} = 1$ similarly implies that
$$\sum_{m \in \mathbf{Z}} \lambda_f(2^m)2^{p_\theta m} \lesssim 1. \tag{1.96}$$
The basic idea is then to get enough control on the numbers $\lambda_{Tf}(2^n)$ in terms of the numbers $\lambda_f(2^m)$ that one can deduce (1.95) from (1.96).
When $p_0 = p_1$, the claim follows from direct substitution of (1.91), (1.94) (see also the discussion in the previous section about interpolating strong $L^p$ bounds from weak ones), so let us assume $p_0 \neq p_1$. By symmetry we may take $p_0 < p_1$, and thus $p_0 < p_\theta < p_1$. In this case we cannot directly apply (1.91), (1.94) because we only control $f$ in $L^{p_\theta}$, not $L^{p_0}$ or $L^{p_1}$. To get around this, we use the basic real interpolation trick of decomposing $f$ into pieces. There are two basic choices for what decomposition to pick. On one hand, one could adopt a minimalistic approach and just decompose into two pieces
$$f = f_{\ge s} + f_{<s},$$
where $f_{\ge s} := f 1_{|f| \ge s}$ and $f_{<s} := f 1_{|f| < s}$, and the threshold $s$ is a parameter (depending on $n$) to be optimised later. Or we could adopt a maximalistic approach and perform the dyadic decomposition
$$f = \sum_{m \in \mathbb{Z}} f_m,$$
where $f_m := f 1_{2^m \le |f| < 2^{m+1}}$. (Note that only finitely many of the $f_m$ are non-zero, as we are assuming $f$ to be a simple function.) We will adopt the latter approach, in order to illustrate the dyadic decomposition method; the former approach also works, but we leave it as an exercise to the interested reader.
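For readers who like to see the bookkeeping, here is a minimal Python sketch of the dyadic decomposition on a toy finite measure space (the masses, the function values and the exponent are arbitrary choices, not from the text). It assumes the convention $\lambda_f(t) = \mu(\{|f| \ge t\})$ for the distribution function, and checks three facts: $f = \sum_m f_m$; the two-sided comparison $(1 - 2^{-p}) \sum_m 2^{mp} \lambda_f(2^m) \le \|f\|_{L^p}^p < (2^p - 1) \sum_m 2^{mp} \lambda_f(2^m)$, which is one way to see why the normalisation of $f$ gives (1.96); and the bound $\|f_m\|_{L^p} \le 2^{m+1} \lambda_f(2^m)^{1/p}$ for the individual pieces.

```python
import math

# Hypothetical toy measure space: six points with the masses below, so every
# function on it is simple with finite measure support.
mu = [0.5, 1.0, 2.0, 0.25, 3.0, 1.5]
f = [0.3, -5.0, 2.0, 17.5, 0.0, -1.2]
p = 1.5  # any exponent 0 < p < infinity would do


def lp_norm(g, p):
    return sum(m * abs(v) ** p for m, v in zip(mu, g)) ** (1 / p)


def distribution(g, t):
    # lambda_g(t) = mu({x : |g(x)| >= t}); assumed convention for the distribution function
    return sum(m for m, v in zip(mu, g) if abs(v) >= t)


def dyadic_piece(g, m):
    # g_m = g * 1_{2^m <= |g| < 2^{m+1}}
    return [v if 2.0 ** m <= abs(v) < 2.0 ** (m + 1) else 0.0 for v in g]


# Only finitely many m give a non-zero piece, since f takes finitely many values.
levels = sorted({math.floor(math.log2(abs(v))) for v in f if v != 0})
pieces = {m: dyadic_piece(f, m) for m in levels}
reassembled = [sum(pieces[m][x] for m in levels) for x in range(len(f))]

# Dyadic sum from (1.96), truncated far below the smallest level (the tail is geometric).
dyadic_sum = sum(2.0 ** (m * p) * distribution(f, 2.0 ** m)
                 for m in range(levels[0] - 60, levels[-1] + 1))
norm_pp = lp_norm(f, p) ** p

assert reassembled == f                                   # f = sum_m f_m
assert (1 - 2 ** -p) * dyadic_sum <= norm_pp < (2 ** p - 1) * dyadic_sum
for m in levels:                                          # ||f_m||_p <= 2^{m+1} lambda_f(2^m)^{1/p}
    assert lp_norm(pieces[m], p) <= 2 ** (m + 1) * distribution(f, 2.0 ** m) ** (1 / p)
```

The comparison holds because a point where $2^k \le |f| < 2^{k+1}$ contributes $\sum_{m \le k} 2^{mp} = 2^{kp}/(1 - 2^{-p})$ to the dyadic sum.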
From sublinearity we have the pointwise estimate
$$Tf \le \sum_m T f_m,$$


which implies that
$$\lambda_{Tf}(2^n) \le \sum_m \lambda_{Tf_m}(c_{n,m} 2^n)$$
whenever $c_{n,m}$ are positive constants such that $\sum_m c_{n,m} = 1$, but which we are otherwise free to choose. We will set aside the problem of deciding what the optimal choice of $c_{n,m}$ is for now and continue with the proof.
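From (1.91), (1.94), we have two bounds for the quantity $\lambda_{Tf_m}(c_{n,m} 2^n)$, namely
$$\lambda_{Tf_m}(c_{n,m} 2^n) \lesssim c_{n,m}^{-q_0} 2^{-nq_0} \|f_m\|_{L^{p_0}(X)}^{q_0}$$
and
$$\lambda_{Tf_m}(c_{n,m} 2^n) \lesssim c_{n,m}^{-q_1} 2^{-nq_1} \|f_m\|_{L^{p_1}(X)}^{q_1}.$$
From construction of $f_m$ we can bound
$$\|f_m\|_{L^{p_0}(X)} \lesssim 2^m \lambda_f(2^m)^{1/p_0}$$
and similarly for $p_1$, and thus we have
$$\lambda_{Tf_m}(c_{n,m} 2^n) \lesssim c_{n,m}^{-q_i} 2^{-nq_i} 2^{mq_i} \lambda_f(2^m)^{q_i/p_i}$$
for $i = 0, 1$. To prove (1.95), it thus suffices to show that
$$\sum_n \sum_m 2^{nq_\theta} \min_{i=0,1} c_{n,m}^{-q_i} 2^{-nq_i} 2^{mq_i} \lambda_f(2^m)^{q_i/p_i} \lesssim 1.$$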

It is convenient to introduce the quantities $a_m := \lambda_f(2^m) 2^{mp_\theta}$ appearing in (1.96), thus
$$\sum_m a_m \lesssim 1,$$
and our task is to show that
$$\sum_n \sum_m 2^{nq_\theta} \min_{i=0,1} c_{n,m}^{-q_i} 2^{-nq_i} 2^{mq_i - mq_i p_\theta/p_i} a_m^{q_i/p_i} \lesssim 1.$$
Since $p_i \le q_i$ and $a_m \lesssim 1$, we have $a_m^{q_i/p_i} \lesssim a_m$, and so we are reduced to the purely numerical task of locating constants $c_{n,m}$ with $\sum_m c_{n,m} \le 1$ for all $n$ such that
$$\sum_n 2^{nq_\theta} \min_{i=0,1} c_{n,m}^{-q_i} 2^{-nq_i} 2^{mq_i - mq_i p_\theta/p_i} \lesssim 1 \tag{1.97}$$
for all $m$.
We can simplify this expression a bit by collecting terms and making some substitutions. The points $(1/p_0, 1/q_0)$, $(1/p_\theta, 1/q_\theta)$, $(1/p_1, 1/q_1)$ are collinear, and we can capture this by writing
$$\frac{1}{p_i} = \frac{1}{p_\theta} + x_i; \qquad \frac{1}{q_i} = \frac{1}{q_\theta} + \alpha x_i$$
for some $x_0 > 0 > x_1$ and some $\alpha \in \mathbb{R}$. Using $1 - p_\theta/p_i = -p_\theta x_i$ and $q_\theta - q_i = \alpha q_\theta q_i x_i$, we can then simplify the left-hand side of (1.97) to
$$\sum_n \min_{i=0,1} \left(c_{n,m}^{-1} 2^{(n\alpha q_\theta - mp_\theta) x_i}\right)^{q_i}.$$
Note that $q_0 x_0$ is positive and $q_1 x_1$ is negative. If we then pick $c_{n,m}$ to be a suitably normalised multiple of $2^{-|n\alpha q_\theta - mp_\theta| \min(|x_0|,|x_1|)/2}$ (say), we obtain the claim by summing geometric series. □
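The final step can be checked numerically. The following sketch (hypothetical exponents $p_0 = 1$, $q_0 = 2$, $p_1 = 2$, $q_1 = 4$, $\theta = 1/2$, with $\mathbb{Z}$ truncated to $[-60, 60]$) computes $x_0, x_1, \alpha$ from the collinearity relations, confirms that the summand of (1.97) agrees with the simplified form $(c_{n,m}^{-1} 2^{(n\alpha q_\theta - mp_\theta) x_i})^{q_i}$, and evaluates the left-hand side of (1.97) for the choice $c_{n,m} \propto 2^{-|n\alpha q_\theta - mp_\theta| \min(|x_0|,|x_1|)/2}$. Working with base-2 logarithms avoids floating-point overflow. This is an illustration under these assumptions, not part of the proof.

```python
from fractions import Fraction as Fr
from functools import lru_cache
import math

# Hypothetical exponents, chosen only for illustration: p_i <= q_i, p0 != p1, q0 != q1.
p0, q0, p1, q1, theta = Fr(1), Fr(2), Fr(2), Fr(4), Fr(1, 2)
inv_pt = (1 - theta) / p0 + theta / p1      # 1/p_theta
inv_qt = (1 - theta) / q0 + theta / q1      # 1/q_theta
p_t, q_t = 1 / inv_pt, 1 / inv_qt
p, q = (p0, p1), (q0, q1)

# Collinearity: 1/p_i = 1/p_theta + x_i and 1/q_i = 1/q_theta + alpha x_i.
x = tuple(1 / pe - inv_pt for pe in p)
alpha = (1 / q0 - inv_qt) / x[0]
assert x[0] > 0 > x[1] and 1 / q1 - inv_qt == alpha * x[1]

delta = float(min(abs(x[0]), abs(x[1])) / 2)
R = range(-60, 61)  # truncation of Z, used for both n and m


def shift(n, m):
    return float(n * alpha * q_t - m * p_t)


@lru_cache(maxsize=None)
def log2_normaliser(n):
    return math.log2(sum(2.0 ** (-abs(shift(n, k)) * delta) for k in R))


def log2_c(n, m):
    # c_{n,m} proportional to 2^{-|n alpha q_theta - m p_theta| min(|x0|,|x1|)/2},
    # normalised so that sum_m c_{n,m} = 1 over the truncated range
    return -abs(shift(n, m)) * delta - log2_normaliser(n)


def log2_term(i, n, m):
    # log2 of 2^{n q_theta} c^{-q_i} 2^{-n q_i} 2^{m q_i - m q_i p_theta/p_i}, the summand of (1.97)
    e = n * q_t - n * q[i] + m * q[i] - m * q[i] * p_t / p[i]
    return float(e) - float(q[i]) * log2_c(n, m)


def log2_term_simplified(i, n, m):
    # log2 of (c^{-1} 2^{(n alpha q_theta - m p_theta) x_i})^{q_i}
    return float(q[i]) * (-log2_c(n, m) + shift(n, m) * float(x[i]))


def lhs_197(m):
    return sum(2.0 ** min(log2_term(0, n, m), log2_term(1, n, m)) for n in R)


sums = [lhs_197(m) for m in range(-20, 21)]
print(min(sums), max(sums))  # roughly constant in m, i.e. bounded uniformly in m
```

With these parameters the printed sums should be essentially independent of $m$ (up to small truncation effects), which is the uniform bound (1.97) in this example; the actual size of the constant plays no role in the argument.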
Remark 1.11.12. A closer inspection of the proof (or a rescaling argument to reduce to the normalised case $B_0 = B_1 = 1$, as in preceding sections) reveals that one establishes the estimate
$$\|Tf\|_{L^{q_\theta}(Y)} \le C_{p_0,p_1,q_0,q_1,\theta,C} B_0^{1-\theta} B_1^\theta \|f\|_{L^{p_\theta}(X)}$$
for all simple functions $f$ of finite measure support (or for all $f \in L^{p_\theta}(X)$, if one works with the continuous extension of $T$ to such functions), and some constant $C_{p_0,p_1,q_0,q_1,\theta,C} > 0$. Thus the conclusion here is weaker than that in the Riesz-Thorin theorem by a multiplicative constant, but the hypotheses are weaker too (weak type instead of strong type). Indeed, we see that the constant $C_{p_0,p_1,q_0,q_1,\theta}$ must blow up as $\theta \to 0$ or $\theta \to 1$.

The power of the Marcinkiewicz interpolation theorem, as compared to the Riesz-Thorin theorem, is that it allows one to weaken the hypotheses on $T$ from strong type to weak type. Actually, it can be weakened further. We say that a non-negative sublinear operator $T$ is restricted weak-type $(p, q)$ for some $0 < p, q \le \infty$ if there is a constant $B > 0$ such that
$$\|Tf\|_{L^{q,\infty}(Y)} \le B \mu(E)^{1/p}$$
for all sets $E$ of finite measure and all simple functions $f$ with $|f| \le 1_E$. Clearly, restricted weak-type $(p, q)$ is implied by weak-type $(p, q)$, and thus by strong-type $(p, q)$. (One can also define the notion of restricted strong-type $(p, q)$ by replacing $L^{q,\infty}(Y)$ with $L^q(Y)$; this is between strong-type $(p, q)$ and restricted weak-type $(p, q)$, but is incomparable to weak-type $(p, q)$.)
Exercise 1.11.18. Show that the Marcinkiewicz interpolation theorem continues to hold if the weak-type hypotheses are replaced by restricted weak-type hypotheses. (Hint: Where were the weak-type hypotheses used in the proof?)

We thus see that the strong-type diagram of $T$ contains the interior of the restricted weak-type or weak-type diagrams of $T$, at least in the triangular region $\{(1/p, 1/q) \in [0, +\infty)^2 : p \le q\}$.
Exercise 1.11.19. Suppose that $T$ is a sublinear operator of restricted weak-type $(p_0, q_0)$ and $(p_1, q_1)$ for some $0 < p_0, p_1, q_0, q_1 \le \infty$. Show that $T$ is of restricted weak-type $(p_\theta, q_\theta)$ for any $0 < \theta < 1$, or in other words the restricted type diagram is convex in $[0, +\infty)^2$. (This is an easy result requiring only interpolation of scalars.) Conclude that the hypotheses $p_0 \le q_0$, $p_1 \le q_1$ in the Marcinkiewicz interpolation theorem can be replaced by the variant $p_\theta < q_\theta$.

Exercise 1.11.20. For any $\alpha \in \mathbb{R}$, let $X_\alpha$ be the natural numbers $\mathbb{N}$ with the weighted counting measure $\sum_{n \in \mathbb{N}} 2^{\alpha n} \delta_n$, thus each point $n$ has mass $2^{\alpha n}$. Show that if $\alpha > \beta > 0$, then the identity operator from $X_\alpha$ to $X_\beta$ is of weak-type $(p, q)$ but not strong-type $(p, q)$ when $1 < p, q < \infty$ and $\alpha/p = \beta/q$. Conclude that the hypotheses $p_0 \le q_0$, $p_1 \le q_1$ cannot be dropped entirely.

Exercise 1.11.21. Suppose we are in the situation of the Marcinkiewicz interpolation theorem, with the hypotheses $p_0 \le q_0$, $p_1 \le q_1$ replaced by $p_0 \neq p_1$. Show that for all $0 < \theta < 1$ and $1 \le r \le \infty$ there exists a $B > 0$ such that
$$\|Tf\|_{L^{q_\theta,r}(Y)} \le B \|f\|_{L^{p_\theta,r}(X)}$$
for all simple functions $f$ of finite measure support, where the Lorentz norms $L^{p,q}$ were defined in Exercise 1.11.7. (Hint: Repeat the proof of the Marcinkiewicz interpolation theorem. It is convenient to replace the analogues of the quantities $a_m$ in that argument by the slightly larger quantities $b_m := \sup_{m'} 2^{-\varepsilon|m-m'|} a_{m'}$ for some small $\varepsilon > 0$ to obtain a good Lipschitz property on the $b_m$. Now partition the sum $\sum_{n,m}$ into regions of the form $\{n\alpha q_\theta - mp_\theta + \frac{p_\theta - \alpha q_\theta}{r} \log_2 b_m = k + O(1)\}$ for integer $k$ (this choice of partition is dictated by a comparison of the two terms that arise in the minimum). Obtain a bound for each summand which decreases geometrically as $k \to \pm\infty$.) Conclude that the hypotheses $p_0 \le q_0$, $p_1 \le q_1$ in the Marcinkiewicz interpolation theorem can be replaced by $p_\theta \le q_\theta$. This Lorentz space version of the interpolation theorem is in some sense the right version of the theorem, but the Lorentz spaces are slightly more technical to deal with than the Lebesgue spaces, and the Lebesgue space version of Marcinkiewicz interpolation is largely sufficient for most applications.

Exercise 1.11.22. For $i = 1, 2$, let $X_i = (X_i, \mathcal{X}_i, \mu_i)$, $Y_i = (Y_i, \mathcal{Y}_i, \nu_i)$ be σ-finite measure spaces, and let $T_i$ be a linear operator from simple functions of finite measure support on $X_i$ to measurable functions on $Y_i$ (modulo almost everywhere equivalence, as always). Let $X = X_1 \times X_2$, $Y = Y_1 \times Y_2$ be the product spaces (with product σ-algebra and product measure). Show that there exists a unique (modulo a.e. equivalence) linear operator $T$ defined on linear combinations of indicator functions $1_{E_1 \times E_2}$ of product sets of sets $E_1 \subset X_1$, $E_2 \subset X_2$ of finite measure, such that
$$T 1_{E_1 \times E_2}(y_1, y_2) := T_1 1_{E_1}(y_1)\, T_2 1_{E_2}(y_2)$$
for a.e. $(y_1, y_2) \in Y$; we refer to $T$ as the tensor product of $T_1$ and $T_2$ and write $T = T_1 \otimes T_2$. Show that if $T_1, T_2$ are of strong-type $(p, q)$ for some $1 \le p, q < \infty$ with operator norms $B_1, B_2$, respectively, then $T$ can be extended to a bounded linear operator from $L^p(X)$ to $L^q(Y)$ with operator norm exactly equal to $B_1 B_2$, thus
$$\|T_1 \otimes T_2\|_{L^p(X_1 \times X_2) \to L^q(Y_1 \times Y_2)} = \|T_1\|_{L^p(X_1) \to L^q(Y_1)} \|T_2\|_{L^p(X_2) \to L^q(Y_2)}.$$
(Hint: For the lower bound, show that $T_1 \otimes T_2 (f_1 \otimes f_2) = (T_1 f_1) \otimes (T_2 f_2)$ for all simple functions $f_1, f_2$. For the upper bound, express $T_1 \otimes T_2$ as the composition of two other operators $T_1 \otimes I_1$ and $I_2 \otimes T_2$ for some identity operators $I_1, I_2$, and establish operator norm bounds on these two operators separately.) Use this and the tensor power trick to deduce the Riesz-Thorin theorem (in the special case when $1 \le p_i \le q_i < \infty$ for $i = 0, 1$, and $q_0 \neq q_1$) from the Marcinkiewicz interpolation theorem. Thus one can (with some effort) avoid the use of complex variable methods to prove the Riesz-Thorin theorem, at least in some cases.
Exercise 1.11.23 (Hölder’s inequality for Lorentz spaces). Let $f \in L^{p_1,r_1}(X)$ and $g \in L^{p_2,r_2}(X)$ for some $0 < p_1, p_2, r_1, r_2 \le \infty$. Show that $fg \in L^{p_3,r_3}(X)$, where $1/p_3 = 1/p_1 + 1/p_2$ and $1/r_3 = 1/r_1 + 1/r_2$, with the estimate
$$\|fg\|_{L^{p_3,r_3}(X)} \le C_{p_1,p_2,r_1,r_2} \|f\|_{L^{p_1,r_1}(X)} \|g\|_{L^{p_2,r_2}(X)}$$
for some constant $C_{p_1,p_2,r_1,r_2}$. (This estimate is due to O’Neil [ON1963].)
Remark 1.11.13. Just as interpolation of functions can be clarified by using step functions $f = A 1_E$ as a test case, it is instructive to use rank one operators such as
$$Tf := A \langle f, 1_E \rangle 1_F = A \left(\int_E f\, d\mu\right) 1_F,$$
where $E \subset X$, $F \subset Y$ are finite measure sets, as test cases for the real and complex interpolation methods. (After understanding the rank one case, we then recommend looking at the rank two case, e.g., $Tf := A_1 \langle f, 1_{E_1} \rangle 1_{F_1} + A_2 \langle f, 1_{E_2} \rangle 1_{F_2}$, where $E_2, F_2$ could be very different in size from $E_1, F_1$.)
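As a sketch of the rank one test case (assuming $1 \le p, q < \infty$ and real-valued $f$), Hölder’s inequality, with equality at $f = 1_E$, gives $\|T\|_{L^p(X) \to L^q(Y)} = |A| \mu(E)^{1 - 1/p} \nu(F)^{1/q}$. The logarithm of this is affine in $(1/p, 1/q)$, so the Riesz-Thorin bound $B_0^{1-\theta} B_1^\theta$ is attained exactly for such operators. The Python below checks this on a hypothetical pair of finite measure spaces; all masses, sets and exponents are illustrative choices.

```python
import random

# Hypothetical finite measure spaces (point masses) and sets E in X, F in Y.
mu = [0.7, 1.3, 0.2, 2.5]   # masses of the points of X
nu = [0.4, 3.0, 1.1]        # masses of the points of Y
E, F, A = {0, 2, 3}, {1, 2}, 2.0


def lp_norm(g, masses, p):
    return sum(m * abs(v) ** p for m, v in zip(masses, g)) ** (1 / p)


def T(f):
    # T f = A <f, 1_E> 1_F, with <f, 1_E> = sum over x in E of f(x) mu(x) for real f
    pairing = sum(f[x] * mu[x] for x in E)
    return [A * pairing if y in F else 0.0 for y in range(len(nu))]


def rank_one_norm(p, q):
    # Hölder, with equality at f = 1_E: ||T||_{L^p -> L^q} = |A| mu(E)^{1 - 1/p} nu(F)^{1/q}
    mu_E = sum(mu[x] for x in E)
    nu_F = sum(nu[y] for y in F)
    return abs(A) * mu_E ** (1 - 1 / p) * nu_F ** (1 / q)


def ratio(f, p, q):
    return lp_norm(T(f), nu, q) / lp_norm(f, mu, p)


# Two endpoint exponent pairs and the interpolated pair (1/p_theta, 1/q_theta).
(p0, q0), (p1, q1), theta = (1.0, 3.0), (4.0, 1.5), 0.3
p_t = 1 / ((1 - theta) / p0 + theta / p1)
q_t = 1 / ((1 - theta) / q0 + theta / q1)
B0, B1, B_t = rank_one_norm(p0, q0), rank_one_norm(p1, q1), rank_one_norm(p_t, q_t)

indicator_E = [1.0 if x in E else 0.0 for x in range(len(mu))]
random.seed(0)
worst = max(ratio([random.uniform(-1, 1) for _ in mu], p_t, q_t) for _ in range(2000))
```

Here `B_t` should coincide with `B0 ** (1 - theta) * B1 ** theta`, the indicator $1_E$ should attain it, and no random test function should exceed it.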

1.11.4. Some examples of interpolation. Now we apply the interpola-


tion theorems to some classes of operators. An important such class is given
by the integral operators

T f (y) := K(x, y)f (x) dμ(x)


X
from functions f : X → C to functions T f : Y → C, where K : X × Y → C
is a fixed measurable function, known as the kernel of the integral operator

Author's preliminary version made available with permission of the publisher, the American Mathematical Society
1.11. Interpolation of Lp spaces 179

T . Of course, this integral is not necessarily convergent, so we will also need


to study the sublinear analogue

|T |f (y) := |K(x, y)||f (x)| dμ(x),


X
which is well defined (though it may be infinite).
The following useful lemma gives us strong-type bounds on $|T|$ and hence $T$, assuming certain $L^p$ type bounds on the rows and columns of $K$.
Lemma 1.11.14 (Schur’s test). Let $K : X \times Y \to \mathbb{C}$ be a measurable function obeying the bounds
$$\|K(x, \cdot)\|_{L^{q_0}(Y)} \le B_0$$
for almost every $x \in X$, and
$$\|K(\cdot, y)\|_{L^{p_1'}(X)} \le B_1$$
for almost every $y \in Y$, where $1 \le p_1, q_0 \le \infty$ and $B_0, B_1 > 0$. Then for every $0 < \theta < 1$, $|T|$ and $T$ are of strong-type $(p_\theta, q_\theta)$, with $Tf(y)$ well defined for all $f \in L^{p_\theta}(X)$ and almost every $y \in Y$, and furthermore
$$\|Tf\|_{L^{q_\theta}(Y)} \le B_\theta \|f\|_{L^{p_\theta}(X)}.$$
Here we adopt the convention that $p_0 := 1$ and $q_1 := \infty$, thus $q_\theta = q_0/(1 - \theta)$ and $p_\theta' = p_1'/\theta$.

Proof. The hypothesis $\|K(x, \cdot)\|_{L^{q_0}(Y)} \le B_0$, combined with Minkowski’s integral inequality, shows us that
$$\||T|f\|_{L^{q_0}(Y)} \le B_0 \|f\|_{L^1(X)}$$
for all $f \in L^1(X)$. In particular, for such $f$, $Tf$ is well defined almost everywhere, and
$$\|Tf\|_{L^{q_0}(Y)} \le B_0 \|f\|_{L^1(X)}.$$
Similarly, Hölder’s inequality tells us that for $f \in L^{p_1}(X)$, $Tf$ is well defined almost everywhere, and
$$\|Tf\|_{L^\infty(Y)} \le B_1 \|f\|_{L^{p_1}(X)}.$$
Applying the Riesz-Thorin theorem we conclude that
$$\|Tf\|_{L^{q_\theta}(Y)} \le B_\theta \|f\|_{L^{p_\theta}(X)}$$
for all simple functions $f$ with finite measure support; replacing $K$ with $|K|$ we also see that
$$\||T|f\|_{L^{q_\theta}(Y)} \le B_\theta \|f\|_{L^{p_\theta}(X)}$$
for all simple functions $f$ with finite measure support, and thus (by monotone convergence, Theorem 1.1.21) for all $f \in L^{p_\theta}(X)$. The claim then follows. □


Example 1.11.15. Let $A = (a_{ij})_{1 \le i \le n, 1 \le j \le m}$ be a matrix such that the sum of the magnitudes of the entries in every row and column is at most $B$, i.e., $\sum_{i=1}^n |a_{ij}| \le B$ for all $j$ and $\sum_{j=1}^m |a_{ij}| \le B$ for all $i$. Then one has the bound
$$\|Ax\|_{\ell^p_n} \le B \|x\|_{\ell^p_m}$$
for all vectors $x \in \mathbb{C}^m$ and all $1 \le p \le \infty$. Note the extreme cases $p = 1$, $p = \infty$ can be seen directly; the remaining cases then follow from interpolation.
A useful special case arises when $A$ is an $S$-sparse matrix, which means that at most $S$ entries in any row or column are non-zero (e.g., permutation matrices are 1-sparse). We then conclude that the $\ell^p$ operator norm of $A$ is at most $S \sup_{i,j} |a_{ij}|$.
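Example 1.11.15 is easy to test numerically. This sketch (a hypothetical random real $5 \times 7$ matrix, probed with random complex vectors) computes the largest row and column $\ell^1$ sums $B$ and checks that no trial vector $x$ has $\|Ax\|_{\ell^p} > B \|x\|_{\ell^p}$ for a few values of $p$. Random trials only give a lower estimate for the operator norm, so this is a sanity check rather than a proof.

```python
import random

random.seed(1)
n, m = 5, 7
# Hypothetical random n x m matrix A = (a_ij): i indexes rows, j indexes columns.
a = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

col_sum = max(sum(abs(a[i][j]) for i in range(n)) for j in range(m))
row_sum = max(sum(abs(a[i][j]) for j in range(m)) for i in range(n))
B = max(col_sum, row_sum)  # every row and column has l^1 mass at most B


def apply(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def lp(v, p):
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1 / p)


def worst_ratio(p, trials=2000):
    # empirical lower estimate for the l^p operator norm of A
    worst = 0.0
    for _ in range(trials):
        x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(m)]
        worst = max(worst, lp(apply(a, x), p) / lp(x, p))
    return worst


for p in (1, 1.5, 2, 3, float("inf")):
    assert worst_ratio(p) <= B * (1 + 1e-12)
```

At $p = 1$ only the column sums matter and at $p = \infty$ only the row sums; taking the larger of the two is what lets a single $B$ serve at every intermediate $p$.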
Exercise 1.11.24. Establish Schur’s test by more direct means, taking advantage of the duality relationship
$$\|g\|_{L^p(Y)} = \sup\left\{\left|\int_Y gh\right| : \|h\|_{L^{p'}(Y)} \le 1\right\}$$
for $1 \le p \le \infty$, as well as Young’s inequality $xy \le \frac{1}{r} x^r + \frac{1}{r'} y^{r'}$ for $1 < r < \infty$ and $x, y \ge 0$. (You may wish to first work out Example 1.11.15, say with $p = 2$, to figure out the logic.)

A useful corollary of Schur’s test is Young’s convolution inequality for the convolution $f * g$ of two functions $f : \mathbb{R}^n \to \mathbb{C}$, $g : \mathbb{R}^n \to \mathbb{C}$, defined as
$$f * g(x) := \int_{\mathbb{R}^n} f(y) g(x - y)\, dy,$$
provided of course that the integral is absolutely convergent.
Exercise 1.11.25 (Young’s inequality). Let $1 \le p, q, r \le \infty$ be such that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r} + 1$. Show that if $f \in L^p(\mathbb{R}^n)$ and $g \in L^q(\mathbb{R}^n)$, then $f * g$ is well defined almost everywhere and lies in $L^r(\mathbb{R}^n)$, and furthermore that
$$\|f * g\|_{L^r(\mathbb{R}^n)} \le \|f\|_{L^p(\mathbb{R}^n)} \|g\|_{L^q(\mathbb{R}^n)}.$$
(Hint: Apply Schur’s test to the kernel $K(x, y) := g(x - y)$.)
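The same inequality, with constant 1, holds on the integers $\mathbb{Z}$ with counting measure, since the hinted Schur argument applies equally well there. The following Python sketch (finitely supported random sequences, hypothetical exponent pairs) checks $\|f * g\|_{\ell^r} \le \|f\|_{\ell^p} \|g\|_{\ell^q}$ numerically, with $r$ determined by $\frac{1}{p} + \frac{1}{q} = \frac{1}{r} + 1$.

```python
import random


def convolve(f, g):
    # (f * g)(x) = sum_y f(y) g(x - y) on Z, for finitely supported f, g stored as
    # lists whose index k stands for the integer k
    out = [0.0] * (len(f) + len(g) - 1)
    for y, fy in enumerate(f):
        for k, gk in enumerate(g):
            out[y + k] += fy * gk
    return out


def lp(v, p):
    return sum(abs(t) ** p for t in v) ** (1 / p)


def young_ratio(f, g, p, q):
    r = 1 / (1 / p + 1 / q - 1)  # from 1/p + 1/q = 1/r + 1, assuming 1/p + 1/q > 1
    return lp(convolve(f, g), r) / (lp(f, p) * lp(g, q))


random.seed(4)
worst = 0.0
for p, q in [(1, 1), (1.5, 1.5), (1.2, 2), (2, 4 / 3), (1, 3)]:
    for _ in range(300):
        f = [random.uniform(-1, 1) for _ in range(random.randint(1, 8))]
        g = [random.uniform(-1, 1) for _ in range(random.randint(1, 8))]
        worst = max(worst, young_ratio(f, g, p, q))
```

For non-negative sequences and $p = q = r = 1$ the inequality is an identity, since $\sum (f * g) = (\sum f)(\sum g)$.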
Remark 1.11.16. There is nothing special about $\mathbb{R}^n$ here; one could in fact use any locally compact group $G$ with a bi-invariant Haar measure. On the other hand, if one specialises to $\mathbb{R}^n$, then it is possible to improve Young’s inequality slightly to
$$\|f * g\|_{L^r(\mathbb{R}^n)} \le (A_p A_q A_{r'})^{n/2} \|f\|_{L^p(\mathbb{R}^n)} \|g\|_{L^q(\mathbb{R}^n)},$$
where $A_p := p^{1/p}/(p')^{1/p'}$, a celebrated result of Beckner [Be1975]. The constant here is best possible, as can be seen by testing the inequality in the case when $f, g$ are Gaussians.
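A small sketch for evaluating Beckner’s constant in plain Python, assuming $1 \le p, q \le \infty$ with $\frac1p + \frac1q \ge 1$ and reading $A_1 = A_\infty = 1$ as the limiting values. One quick consistency check is that $A_p A_{p'} = 1$, so the constant is exactly 1 when $r = \infty$, i.e. when $q = p'$; for other exponents, such as $p = q = 4/3$, it comes out strictly below 1.

```python
import math


def dual(p):
    # Hölder dual exponent p' = p / (p - 1), with 1' = infinity and infinity' = 1
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1)


def A(p):
    # A_p = p^{1/p} / (p')^{1/p'}; at p = 1 or p = infinity the factor with exponent
    # 1/infinity is read as its limiting value 1, giving A_p = 1
    if p == 1 or p == math.inf:
        return 1.0
    pd = dual(p)
    return p ** (1 / p) / pd ** (1 / pd)


def beckner_constant(p, q, n):
    # (A_p A_q A_{r'})^{n/2}, where 1/p + 1/q = 1/r + 1
    s = 1 / p + 1 / q - 1
    r = math.inf if s == 0 else 1 / s
    return (A(p) * A(q) * A(dual(r))) ** (n / 2)


print(beckner_constant(4 / 3, 4 / 3, 1))  # strictly less than Young's constant 1
```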



Exercise 1.11.26. Let $1 < p < \infty$, and let $f \in L^p(\mathbb{R}^n)$, $g \in L^{p'}(\mathbb{R}^n)$. Young’s inequality tells us that $f * g \in L^\infty(\mathbb{R}^n)$. Refine this further by showing that $f * g \in C_0(\mathbb{R}^n)$, i.e., $f * g$ is continuous and goes to zero at infinity. (Hint: First show this when $f, g \in C_c(\mathbb{R}^n)$, then use a limiting argument.)

We now give a variant of Schur’s test that allows for weak estimates.
Lemma 1.11.17 (Weak-type Schur’s test). Let $K : X \times Y \to \mathbb{C}$ be a measurable function obeying the bounds
$$\|K(x, \cdot)\|_{L^{q_0,\infty}(Y)} \le B_0$$
for almost every $x \in X$, and
$$\|K(\cdot, y)\|_{L^{p_1',\infty}(X)} \le B_1$$
for almost every $y \in Y$, where $1 < p_1, q_0 < \infty$ and $B_0, B_1 > 0$ (note the endpoint exponents $1, \infty$ are now excluded). Then for every $0 < \theta < 1$, $|T|$ and $T$ are of strong-type $(p_\theta, q_\theta)$, with $Tf(y)$ well defined for all $f \in L^{p_\theta}(X)$ and almost every $y \in Y$, and furthermore
$$\|Tf\|_{L^{q_\theta}(Y)} \le C_{p_1,q_0,\theta} B_\theta \|f\|_{L^{p_\theta}(X)}.$$
Here we again adopt the convention that $p_0 := 1$ and $q_1 := \infty$.

Proof. From Exercise 1.11.11 we see that
$$\int_Y |K(x, y)| 1_E(y)\, d\nu(y) \lesssim B_0 \nu(E)^{1/q_0'}$$
for any measurable $E \subset Y$, where we use $A \lesssim B$ to denote $A \le C_{p_1,q_0,\theta} B$ for some $C_{p_1,q_0,\theta}$ depending on the indicated parameters. By the Fubini-Tonelli theorem, we conclude that
$$\int_Y |T|f(y) 1_E(y)\, d\nu(y) \lesssim B_0 \nu(E)^{1/q_0'} \|f\|_{L^1(X)}$$
for any $f \in L^1(X)$; by Exercise 1.11.11 again we conclude that
$$\||T|f\|_{L^{q_0,\infty}(Y)} \lesssim B_0 \|f\|_{L^1(X)},$$
thus $|T|$ is of weak-type $(1, q_0)$. In a similar vein, from yet another application of Exercise 1.11.11, we see that
$$\||T|f\|_{L^\infty(Y)} \lesssim B_1 \mu(F)^{1/p_1}$$
whenever $0 \le f \le 1_F$ and $F \subset X$ has finite measure; thus $|T|$ is of restricted type $(p_1, \infty)$. Applying Exercise 1.11.18, we conclude that $|T|$ is of strong type $(p_\theta, q_\theta)$ (with operator norm $\lesssim B_\theta$), and the claim follows. □

This leads to a weak-type version of Young’s inequality:


Exercise 1.11.27 (Weak-type Young’s inequality). Let $1 < p, q, r < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r} + 1$. Show that if $f \in L^p(\mathbb{R}^n)$ and $g \in L^{q,\infty}(\mathbb{R}^n)$, then $f * g$ is well defined almost everywhere and lies in $L^r(\mathbb{R}^n)$, and furthermore that
$$\|f * g\|_{L^r(\mathbb{R}^n)} \le C_{p,q} \|f\|_{L^p(\mathbb{R}^n)} \|g\|_{L^{q,\infty}(\mathbb{R}^n)}$$
for some constant $C_{p,q} > 0$.
Exercise 1.11.28. Refine the previous exercise by replacing $L^r(\mathbb{R}^n)$ with the Lorentz space $L^{r,p}(\mathbb{R}^n)$ throughout.

Recall that the function $1/|x|^\alpha$ will lie in $L^{n/\alpha,\infty}(\mathbb{R}^n)$ for $\alpha > 0$. We conclude:
Corollary 1.11.18 (Hardy-Littlewood-Sobolev fractional integration inequality). Let $1 < p, r < \infty$ and $0 < \alpha < n$ be such that $\frac{1}{p} + \frac{\alpha}{n} = \frac{1}{r} + 1$. If $f \in L^p(\mathbb{R}^n)$, then the function $I_\alpha f$, defined as
$$I_\alpha f(x) := \int_{\mathbb{R}^n} \frac{f(y)}{|x - y|^\alpha}\, dy,$$
is well defined almost everywhere and lies in $L^r(\mathbb{R}^n)$, and furthermore
$$\|I_\alpha f\|_{L^r(\mathbb{R}^n)} \le C_{p,\alpha,n} \|f\|_{L^p(\mathbb{R}^n)}$$
for some constant $C_{p,\alpha,n} > 0$.
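The exponent relation in Corollary 1.11.18 is the one forced by dilation invariance. With $f_\lambda(x) := f(\lambda x)$ for $\lambda > 0$, the substitution $y = z/\lambda$ gives $I_\alpha f_\lambda(x) = \lambda^{\alpha - n} (I_\alpha f)(\lambda x)$, so $\|I_\alpha f_\lambda\|_{L^r} / \|f_\lambda\|_{L^p} = \lambda^{\alpha - n - n/r + n/p} \|I_\alpha f\|_{L^r} / \|f\|_{L^p}$; a bound uniform in $\lambda$ requires this exponent to vanish, which is exactly $\frac1p + \frac\alpha n = \frac1r + 1$. The exact-arithmetic Python sketch below (function names are illustrative) computes $r$ from $(p, \alpha, n)$ and confirms that the exponent is zero.

```python
from fractions import Fraction as Fr


def hls_target_exponent(p, alpha, n):
    # r determined by 1/p + alpha/n = 1/r + 1; None unless 1 < p, r < infinity and 0 < alpha < n
    inv_r = 1 / Fr(p) + Fr(alpha) / n - 1
    if not (Fr(p) > 1 and 0 < alpha < n and 0 < inv_r < 1):
        return None
    return 1 / inv_r


def dilation_exponent(p, r, alpha, n):
    # ||I_alpha f_lam||_{L^r} / ||f_lam||_{L^p} equals lam to this power times the same ratio for f
    return alpha - n - Fr(n) / r + Fr(n) / p


for p, alpha, n in [(Fr(6, 5), 2, 3), (Fr(3, 2), 1, 2), (Fr(4, 3), Fr(1, 2), 1)]:
    r = hls_target_exponent(p, alpha, n)
    print(p, alpha, n, r, dilation_exponent(p, r, alpha, n))
```

Excluded exponents return `None`; for instance $p = 3/2$, $\alpha = 1$, $n = 3$ would force the endpoint $r = \infty$.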

This inequality is of importance in the theory of Sobolev spaces, which we will discuss in Section 1.14.
Exercise 1.11.29. Show that Corollary 1.11.18 can fail at the endpoints
p = 1, r = ∞, or α = n.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/03/30.
Thanks to PDEbeginner, Samir Chomsky, Spencer, Xiaochuan Liu and
anonymous commenters for corrections.

Section 1.12

The Fourier transform

In these notes we lay out the basic theory of the Fourier transform, which
is of course the most fundamental tool in harmonic analysis and is also of
major importance in related fields (functional analysis, complex analysis,
PDE, number theory, additive combinatorics, representation theory, signal
processing, etc.). The Fourier transform, in conjunction with the Fourier inversion formula, allows one to take essentially arbitrary (complex-valued) functions on a group $G$ (or more generally, a space $X$ that $G$ acts on, e.g., a homogeneous space $G/H$) and decompose them as a (discrete or continuous) superposition of much more symmetric functions on the domain, such as characters $\chi : G \to S^1$. The precise superposition is given by Fourier coefficients $\hat f(\xi)$, which take values in some dual object such as the Pontryagin dual $\hat G$ of $G$. Characters behave in a very simple manner with respect to
translation (indeed, they are eigenfunctions of the translation action), and
so the Fourier transform tends to simplify any mathematical problem which
enjoys a translation invariance symmetry (or an approximation to such a
symmetry) and is somehow linear (i.e., it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution $f, g \mapsto f * g$ and set-theoretic addition $A, B \mapsto A + B$, or the closely related problem of counting solutions to additive problems such as $x = a_1 + a_2 + a_3$ or $x = a_1 - a_2$, where $a_1, a_2, a_3$ are constrained to lie in specific sets $A_1, A_2, A_3$. The Fourier transform is
also a particularly powerful tool for solving constant-coefficient linear ODE
and PDE (because of the translation invariance), and it can also approxi-
mately solve some variable-coefficient (or slightly non-linear) equations if the
coefficients vary smoothly enough and the nonlinear terms are sufficiently
tame.


The Fourier transform $\hat f(\xi)$ also provides an important new way of looking at a function $f(x)$, as it highlights the distribution of $f$ in frequency space (the domain of the frequency variable $\xi$) rather than physical space (the domain of the physical variable $x$). A given property of $f$ in the physical domain may be transformed to a rather different-looking property of $\hat f$ in the frequency domain. For instance:
• Smoothness of $f$ in the physical domain corresponds to decay of $\hat f$ in the Fourier domain, and conversely. (More generally, fine scale properties of $f$ tend to manifest themselves as coarse scale properties of $\hat f$, and conversely.)
• Convolution in the physical domain corresponds to pointwise multi-
plication in the Fourier domain, and conversely.
• Constant coefficient differential operators such as d/dx in the physical
domain correspond to multiplication by polynomials such as 2πiξ in
the Fourier domain, and conversely.
• More generally, translation invariant operators in the physical domain
correspond to multiplication by symbols in the Fourier domain, and
conversely.
• Rescaling in the physical domain by an invertible linear transfor-
mation corresponds to an inverse (adjoint) rescaling in the Fourier
domain.
• Restriction to a subspace (or subgroup) in the physical domain cor-
responds to projection to the dual quotient space (or quotient group)
in the Fourier domain, and conversely.
• Frequency modulation in the physical domain corresponds to trans-
lation in the frequency domain, and conversely.
(We will make these statements more precise below.)
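The following Python sketch checks three of these correspondences numerically on the finite group $G = \mathbb{Z}/N\mathbb{Z}$, where every integral is a finite sum. It assumes counting measure on $G$ and the convention $\hat f(\xi) = \sum_x f(x) e^{-2\pi i x \xi/N}$; the group size and the random test functions are arbitrary choices, and the precise general statements are the ones developed in these notes.

```python
import cmath
import random

N = 12  # hypothetical size of the group G = Z/NZ


def fourier(f):
    # \hat f(xi) = sum_x f(x) e^{-2 pi i x xi / N}  (counting measure on G; assumed convention)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N))
            for xi in range(N)]


def convolve(f, g):
    # (f * g)(x) = sum_y f(y) g(x - y), with arithmetic mod N
    return [sum(f[y] * g[(x - y) % N] for y in range(N)) for x in range(N)]


def translate(f, a):
    # tau_a f(x) = f(x - a)
    return [f[(x - a) % N] for x in range(N)]


def modulate(f, xi0):
    # multiply f by the plane wave x -> e^{2 pi i x xi0 / N}
    return [cmath.exp(2j * cmath.pi * x * xi0 / N) * f[x] for x in range(N)]


def close(u, v, tol=1e-9):
    return all(abs(s - t) < tol for s, t in zip(u, v))


random.seed(2)
f = [complex(random.random(), random.random()) for _ in range(N)]
g = [complex(random.random(), random.random()) for _ in range(N)]
fh, gh = fourier(f), fourier(g)

# convolution <-> pointwise multiplication
conv_ok = close(fourier(convolve(f, g)), [s * t for s, t in zip(fh, gh)])
# frequency modulation <-> translation in frequency
mod_ok = close(fourier(modulate(f, 5)), [fh[(xi - 5) % N] for xi in range(N)])
# translation <-> multiplication by a character in frequency
trans_ok = close(fourier(translate(f, 3)),
                 [cmath.exp(-2j * cmath.pi * 3 * xi / N) * fh[xi] for xi in range(N)])
print(conv_ok, mod_ok, trans_ok)
```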
On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the $L^2$ norm (or energy) of a function $f$ is the same as that of its Fourier transform, and more generally the inner product $\langle f, g \rangle$ of two functions $f, g$ is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on $L^2$ (a fact which is variously known as the Plancherel theorem or the Parseval identity). This makes it easier to pass back and forth between the physical domain and frequency domain, so that one can combine techniques that are easy to execute in the physical domain with other techniques that are easy to execute in the frequency domain. (In fact, one can combine the physical and frequency domains together into a product domain known as phase space, and there are entire fields of mathematics (e.g., microlocal analysis, geometric quantisation, time-frequency analysis) devoted to performing analysis on these sorts of spaces directly, but this is beyond the scope of this course.)
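On a finite group the Plancherel identity is a finite computation. This sketch works on $\mathbb{Z}/N\mathbb{Z}$ and assumes counting measure on $G$ together with normalised counting measure (mass $1/N$ per frequency) on the frequency side; these normalisations are a choice made here, and with them it checks $\langle f, g \rangle = \langle \hat f, \hat g \rangle$ and the equality of energies for random $f, g$.

```python
import cmath
import random

N = 10  # hypothetical size of G = Z/NZ


def fourier(f):
    # \hat f(xi) = sum_x f(x) e^{-2 pi i x xi / N}  (assumed convention)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N))
            for xi in range(N)]


def inner_physical(f, g):
    # <f, g> = sum_x f(x) conj(g(x)): counting measure on G
    return sum(s * t.conjugate() for s, t in zip(f, g))


def inner_frequency(F, H):
    # <F, H> with normalised counting measure (mass 1/N per frequency) on the dual group
    return sum(s * t.conjugate() for s, t in zip(F, H)) / N


random.seed(3)
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
lhs = inner_physical(f, g)
rhs = inner_frequency(fourier(f), fourier(g))
energy_gap = inner_physical(f, f).real - inner_frequency(fourier(f), fourier(f)).real
```

The factor $1/N$ reflects the orthogonality relation $\sum_\xi e^{-2\pi i (x - y)\xi/N} = N 1_{x = y}$; with counting measure on both sides the identity would instead carry a factor of $N$.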
In these notes, we briefly discuss the general theory of the Fourier transform, but will mainly focus on the two classical domains for Fourier analysis: the torus $\mathbb{T}^d := (\mathbb{R}/\mathbb{Z})^d$ and the Euclidean space $\mathbb{R}^d$. For these domains one has the advantage of being able to perform very explicit algebraic calculations, involving concrete functions such as plane waves $x \mapsto e^{2\pi i x \cdot \xi}$ or Gaussians $x \mapsto A^{d/2} e^{-\pi A |x|^2}$.
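As an example of the explicit calculations available on $\mathbb{R}^d$, the Gaussian $x \mapsto A^{d/2} e^{-\pi A |x|^2}$ has Fourier transform $\xi \mapsto e^{-\pi |\xi|^2 / A}$, under the convention $\hat f(\xi) = \int f(x) e^{-2\pi i x \cdot \xi}\, dx$ (assumed here; it pairs with the plane waves $e^{2\pi i x \cdot \xi}$). The Python sketch below checks the case $d = 1$ by a Riemann sum; the step size and truncation length are ad hoc choices.

```python
import math


def gaussian_fourier(A, xi, h=0.01):
    # Riemann-sum approximation to the integral over R of
    #   A^{1/2} e^{-pi A x^2} e^{-2 pi i x xi} dx   (the d = 1 case).
    # The imaginary part of the integrand is odd in x, so only the cosine part is summed.
    L = 12 / math.sqrt(A)          # e^{-pi A L^2} is negligible beyond this point
    steps = int(L / h)
    total = 0.0
    for k in range(-steps, steps + 1):
        x = k * h
        total += math.sqrt(A) * math.exp(-math.pi * A * x * x) * math.cos(2 * math.pi * x * xi)
    return total * h


for A, xi in [(1.0, 0.0), (1.0, 0.8), (2.0, 0.7), (0.5, 1.3)]:
    expected = math.exp(-math.pi * xi * xi / A)   # e^{-pi |xi|^2 / A}
    print(A, xi, gaussian_fourier(A, xi), expected)
```

Riemann sums of Gaussians converge extremely fast, which is why such a crude quadrature already agrees with the closed form to many digits; the case $A = 1$ is the familiar fact that $e^{-\pi x^2}$ is its own Fourier transform.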

1.12.1. Generalities. Let us begin with some generalities. An abelian


topological group is an abelian group G = (G, +) with a topological struc-
ture, such that the group operations of addition + : G×G → G and negation
− : G → G are continuous. (One can of course also consider abelian mul-
tiplicative groups G = (G, ·), but to fix the notation we shall restrict our
attention to additive groups.) For technical reasons (and in particular, in or-
der to apply many of the results from the previous sections) it is convenient
to restrict our attention to abelian topological groups which are locally com-
pact Hausdorff (LCH); these are known as locally compact abelian (LCA)
groups. Some basic examples of locally compact abelian groups are:
• Finite additive groups (with the discrete topology), such as cyclic groups $\mathbb{Z}/N\mathbb{Z}$.
• Finitely generated additive groups (with the discrete topology), such as the standard lattice $\mathbb{Z}^d$.
• Tori, such as the standard $d$-dimensional torus $\mathbb{T}^d := (\mathbb{R}/\mathbb{Z})^d$ with the standard topology.
• Euclidean spaces, such as the standard $d$-dimensional Euclidean space $\mathbb{R}^d$ (with the standard topology, of course).
• The rationals $\mathbb{Q}$ are not locally compact with the usual topology, but if one uses the discrete topology instead, one recovers an LCA group.
• Another example of an LCA group, of importance in number theory, is the adele ring $\mathbb{A}$, discussed in Section 1.5 of Poincaré’s legacies, Vol. I.
Thus we see that locally compact abelian groups can be either discrete or continuous, and either compact or non-compact; all four combinations of these cases are of importance. The topology of course generates a Borel σ-algebra in the usual fashion, as well as a space $C_c(G)$ of continuous, compactly supported complex-valued functions. There is a translation action $x \mapsto \tau_x$ of $G$ on $C_c(G)$, where for every $x \in G$, $\tau_x : C_c(G) \to C_c(G)$ is the translation operation
$$\tau_x f(y) := f(y - x).$$
LCA groups need not be σ-compact (think of the free abelian group on
uncountably many generators, with the discrete topology), but one has the
following useful substitute:
Exercise 1.12.1. Show that every LCA group G contains a σ-compact
open subgroup H, and in particular is the disjoint union of σ-compact sets.
(Hint: Take a compact symmetric neighbourhood K of the identity, and
consider the group H generated by this neighbourhood.)

An important notion for us will be that of a Haar measure: a Radon measure $\mu$ on $G$ which is translation-invariant (i.e., $\mu(E + x) = \mu(E)$ for all Borel sets $E \subset G$ and all $x \in G$, where $E + x := \{y + x : y \in E\}$ is the translation of $E$ by $x$). From this and the definition of integration we see that integration $f \mapsto \int_G f\, d\mu$ against a Haar measure (an operation known as the Haar integral) is also translation-invariant, thus
$$\int_G f(y - x)\, d\mu(y) = \int_G f(y)\, d\mu(y) \tag{1.98}$$
or equivalently
$$\int_G \tau_x f\, d\mu = \int_G f\, d\mu \tag{1.99}$$
for all $f \in C_c(G)$ and $x \in G$. The trivial measure 0 is of course a Haar measure; all other Haar measures are called non-trivial.
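For a finite group such as $\mathbb{Z}/N\mathbb{Z}$ with the discrete topology, every function lies in $C_c(G)$ and the Haar integral is a weighted sum, so (1.98) can be checked directly. The sketch below (hypothetical $N$ and test function) confirms translation invariance for counting measure and normalised counting measure, shows that a non-uniform weighting fails it, and also checks the reflection symmetry $\int_G f(-x)\, d\mu(x) = \int_G f(x)\, d\mu(x)$.

```python
N = 9  # hypothetical choice of the cyclic group Z/NZ


def integral(f, weights):
    # integral of f : Z/NZ -> R against the measure with the given point masses
    return sum(w * f(x) for x, w in enumerate(weights))


counting = [1.0] * N                       # counting measure #
normalised = [1.0 / N] * N                 # normalised counting measure #/#(G)
lopsided = [1.0 + x for x in range(N)]     # a non-uniform measure, for contrast


def f(y):
    # an arbitrary real-valued test function on Z/NZ
    return (y * y + 3 * y + 1) % 7 - 2.5


def translation_invariant(weights, tol=1e-12):
    # (1.98): the integral of y -> f(y - x) equals the integral of f, for every x
    base = integral(f, weights)
    return all(abs(integral(lambda y: f((y - x) % N), weights) - base) < tol for x in range(N))


def reflection_symmetric(weights, tol=1e-12):
    # the integral of y -> f(-y) equals the integral of f
    return abs(integral(lambda y: f((-y) % N), weights) - integral(f, weights)) < tol


print(translation_invariant(counting), translation_invariant(normalised),
      translation_invariant(lopsided))
```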
Let us note some non-trivial Haar measures in the four basic examples
of locally compact abelian groups:
• For a finite additive group $G$, one can take either counting measure $\#$ or normalised counting measure $\#/\#(G)$ as a Haar measure. (The former measure emphasises the discrete nature of $G$; the latter measure emphasises the compact nature of $G$.)
• For finitely generated additive groups such as $\mathbb{Z}^d$, counting measure $\#$ is a Haar measure.
• For the standard torus $(\mathbb{R}/\mathbb{Z})^d$, one can obtain a Haar measure by identifying this torus with $[0, 1)^d$ in the usual manner and then taking Lebesgue measure on the latter space. This Haar measure is a probability measure.
• For the standard Euclidean space $\mathbb{R}^d$, Lebesgue measure is a Haar measure.
Of course, any non-negative constant multiple of a Haar measure is again
a Haar measure. The converse is also true:


Exercise 1.12.2 (Uniqueness of Haar measure up to scalars). Let $\mu, \nu$ be two non-trivial Haar measures on a locally compact abelian group $G$. Show that $\mu, \nu$ are scalar multiples of each other, i.e., there exists a constant $c > 0$ such that $\nu = c\mu$. (Hint: For any $f, g \in C_c(G)$, compute the quantity $\int_G \int_G g(y) f(x + y)\, d\mu(x) d\nu(y)$ in two different ways.)

The above argument also implies a useful symmetry property of Haar measures:
Exercise 1.12.3 (Haar measures are symmetric). Let $\mu$ be a Haar measure on a locally compact abelian group $G$. Show that $\int_G f(-x)\, d\mu(x) = \int_G f(x)\, d\mu(x)$ for all $f \in C_c(G)$. (Hint: Expand $\int_G \int_G f(y) f(x + y)\, d\mu(x) d\mu(y)$ in two different ways.) Conclude that Haar measures on LCA groups are symmetric in the sense that $\mu(-E) = \mu(E)$ for all measurable $E$, where $-E := \{-x : x \in E\}$ is the reflection of $E$.
Exercise 1.12.4 (Open sets have positive measure). Let $\mu$ be a non-trivial Haar measure on a locally compact abelian group $G$. Show that $\mu(U) > 0$ for any non-empty open set $U$. Conclude that if $f \in C_c(G)$ is non-negative and not identically zero, then $\int_G f\, d\mu > 0$.
Exercise 1.12.5. If $G$ is an LCA group with non-trivial Haar measure $\mu$, show that $L^1(G)^*$ is identifiable with $L^\infty(G)$. (Unfortunately, $G$ is not always σ-finite, and so the standard duality theorem from Section 1.3 does not directly apply. However, one can get around this using Exercise 1.12.1.)
It is a (not entirely trivial) theorem, due to André Weil, that all LCA
groups have a non-trivial Haar measure. For discrete groups, one can of
course take counting measure as a Haar measure. For compact groups, the
result is due to Haar, and one can argue as follows:
Exercise 1.12.6 (Existence of Haar measure, compact case). Let $G$ be a compact metrisable abelian group. For any real-valued $f \in C_c(G)$, and any Borel probability measure $\mu$ on $G$, define the oscillation $\operatorname{osc}_f(\mu)$ of $\mu$ with respect to $f$ to be the quantity
$$\operatorname{osc}_f(\mu) := \sup_{y \in G} \int_G \tau_y f\, d\mu - \inf_{y \in G} \int_G \tau_y f\, d\mu.$$
(a) Show that a Borel probability measure $\mu$ is a Haar measure if and only if $\operatorname{osc}_f(\mu) = 0$ for all $f \in C_c(G)$.
(b) If a sequence $\mu_n$ of Borel probability measures converges in the vague topology to another Borel probability measure $\mu$, show that $\operatorname{osc}_f(\mu_n) \to \operatorname{osc}_f(\mu)$ for all $f \in C_c(G)$.
(c) If $\mu$ is a Borel probability measure and $f \in C_c(G)$ is such that $\operatorname{osc}_f(\mu) > 0$, show that there exists a Borel probability measure $\mu'$ such that $\operatorname{osc}_f(\mu') < \operatorname{osc}_f(\mu)$ and $\operatorname{osc}_g(\mu') \le \operatorname{osc}_g(\mu)$ for all $g \in C_c(G)$. (Hint: Take $\mu'$ to be an average of certain translations of $\mu$.)

Author's preliminary version made available with permission of the publisher, the American Mathematical Society
188 1. Real analysis

(d) Given any finite number of functions $f_1, \ldots, f_n \in C_c(G)$, show that there exists a Borel probability measure $\mu$ such that $\operatorname{osc}_{f_i}(\mu) = 0$ for all $i = 1, \ldots, n$. (Hint: Use Prokhorov’s theorem; see Corollary 1.10.22. Try the $n = 1$ case first.)
(e) Show that there exists a unique Haar probability measure on $G$. (Hint: One can identify each probability measure $\mu$ with the element $(\int_G f\, d\mu)_{f \in C_c(G)}$ of the product space
$$\prod_{f \in C_c(G)} \left[-\sup_{x \in G} |f(x)|, \sup_{x \in G} |f(x)|\right],$$
which is compact by Tychonoff’s theorem. Now use (d) and the finite intersection property.)
(The argument can be adapted to the case when $G$ is not metrisable, but one has to replace the sequential compactness given by Prokhorov’s theorem with the topological compactness given by the Banach-Alaoglu theorem.)

For general LCA groups, the proof is more complicated:

Exercise 1.12.7 (Existence of Haar measure, general case). Let $G$ be an LCA group. Let $C_c(G)_+$ denote the space of non-negative functions $f \in C_c(G)$ that are not identically zero. Given two $f, g \in C_c(G)_+$, define a $g$-cover of $f$ to be an expression of the form $a_1 \tau_{x_1} g + \cdots + a_n \tau_{x_n} g$ that pointwise dominates $f$, where $a_1, \ldots, a_n$ are non-negative numbers and $x_1, \ldots, x_n \in G$. Let $(f : g)$ denote the infimum of the quantity $a_1 + \cdots + a_n$ for all $g$-covers of $f$.
(a) Finiteness. Show that $0 < (f : g) < +\infty$ for all $f, g \in C_c(G)_+$.
(b) Let $\mu$ be a Haar measure on $G$. Show that $\int_G f\, d\mu \le (f : g)(\int_G g\, d\mu)$ for all $f, g \in C_c(G)_+$. Conversely, for every $f \in C_c(G)_+$ and $\varepsilon > 0$, show that there exists $g \in C_c(G)_+$ such that $\int_G f\, d\mu \ge (f : g)(\int_G g\, d\mu) - \varepsilon$. (Hint: $f$ is uniformly continuous. Take $g$ to be an approximation to the identity.) Thus Haar integrals are related to certain renormalised versions of the functionals $f \mapsto (f : g)$; this observation underlies the strategy for construction of Haar measure in the rest of this exercise.
(c) Transitivity. Show that $(f : h) \le (f : g)(g : h)$ for all $f, g, h \in C_c(G)_+$.
(d) Translation invariance. Show that $(\tau_x f : g) = (f : g)$ for all $f, g \in C_c(G)_+$ and $x \in G$.
(e) Sublinearity. Show that $(f + g : h) \le (f : h) + (g : h)$ and $(cf : g) = c(f : g)$ for all $f, g, h \in C_c(G)_+$ and $c > 0$.


(f) Approximate superadditivity. If $f, g \in C_c(G)_+$ and $\varepsilon > 0$, show that there exists a neighbourhood $U$ of the identity such that $(f : h) + (g : h) \le (1 + \varepsilon)(f + g : h)$ whenever $h \in C_c(G)_+$ is supported in $U$. (Hint: $f, g, f + g$ are all uniformly continuous. Take an $h$-cover of $f + g$ and multiply the weight $a_i$ at $x_i$ by weights such as $f(x_i)/(f(x_i) + g(x_i) - \varepsilon)$ and $g(x_i)/(f(x_i) + g(x_i) - \varepsilon)$.)
Next, fix a reference function $f_0 \in C_c(G)_+$, and define the functional $I_g : C_c(G)_+ \to \mathbb{R}^+$ for all $g \in C_c(G)_+$ by the formula $I_g(f) := (f : g)/(f_0 : g)$.
(g) Show that for any fixed $f$, $I_g(f)$ ranges in the compact interval $[(f_0 : f)^{-1}, (f : f_0)]$; thus $I_g$ can be viewed as an element of the product space $\prod_{f \in C_c(G)_+} [(f_0 : f)^{-1}, (f : f_0)]$, which is compact by Tychonoff’s theorem.
(h) From (d), (e) we have the translation-invariance property $I_g(\tau_x f) = I_g(f)$, the homogeneity property $I_g(cf) = c I_g(f)$, and the sub-additivity property $I_g(f + f') \le I_g(f) + I_g(f')$ for all $g, f, f' \in C_c(G)_+$, $x \in G$, and $c > 0$; we also have the normalisation $I_g(f_0) = 1$. Now show that for all $f_1, \ldots, f_n, f_1', \ldots, f_n' \in C_c(G)_+$ and $\varepsilon > 0$, there exists $g \in C_c(G)_+$ such that $I_g(f_i + f_i') \ge I_g(f_i) + I_g(f_i') - \varepsilon$ for all $i = 1, \ldots, n$.
(i) Show that there exists a unique Haar measure $\mu$ on $G$ with $\mu(f_0) = 1$. (Hint: Use (h) and the finite intersection property to obtain a translation-invariant positive linear functional on $C_c(G)$, then use the Riesz representation theorem.)

Now we come to a fundamental notion, that of a character.


Definition 1.12.1 (Characters). Let G be an LCA group. A multiplicative
character χ is a continuous function χ : G → S 1 to the unit circle S 1 :=
{z ∈ C : |z| = 1} which is a homomorphism, i.e., χ(x + y) = χ(x)χ(y) for
all x, y ∈ G. An additive character or frequency ξ : x → ξ · x is a continuous
function ξ : G → R/Z which is a homomorphism, thus ξ ·(x+y) = ξ ·x+ξ ·y
for all x, y ∈ G. The set of all frequencies ξ is called the Pontryagin dual of
G and is denoted Ĝ; it is clearly an abelian group. A multiplicative character
is called non-trivial if it is not the constant function 1; an additive character
is called non-trivial if it is not the constant function 0.

Multiplicative characters and additive characters are clearly related: if $\xi \in \hat G$ is an additive character, then the function $x \mapsto e^{2\pi i\xi\cdot x}$ is a multiplicative character, and conversely every multiplicative character arises uniquely from an additive character in this fashion.
Exercise 1.12.8. Let G be an LCA group. We give Ĝ the topology of uniform convergence on compact sets, thus the topology on Ĝ is generated by sets of the form $\{\xi \in \hat G : |\xi\cdot x - \xi_0\cdot x| < \varepsilon \text{ for all } x \in K\}$ for compact $K \subset G$, $\xi_0 \in \hat G$, and ε > 0. Show that this turns Ĝ into an LCA group. (Hint: Show that for any neighbourhood U of the identity in G, the sets $\{\xi \in \hat G : \xi\cdot x \in [-\varepsilon, \varepsilon] \text{ for all } x \in U\}$ for 0 < ε < 1/4 (say) are compact.) Furthermore, if G is discrete, show that Ĝ is compact.

The Pontryagin dual can be computed easily for various classical LCA
groups:

Exercise 1.12.9. Let d ≥ 1 be an integer.

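(a) Show that the Pontryagin dual $\widehat{\mathbb{Z}^d}$ of $\mathbb{Z}^d$ is identifiable as an LCA group with $(\mathbb{R}/\mathbb{Z})^d$ by identifying each $\xi \in (\mathbb{R}/\mathbb{Z})^d$ with the frequency $x \mapsto \xi\cdot x$ given by the dot product.
(b) Show that the Pontryagin dual $\widehat{\mathbb{R}^d}$ of $\mathbb{R}^d$ is identifiable as an LCA group with $\mathbb{R}^d$ by identifying each $\xi \in \mathbb{R}^d$ with the frequency $x \mapsto \xi\cdot x$ given by the dot product.
(c) Show that the Pontryagin dual $\widehat{(\mathbb{R}/\mathbb{Z})^d}$ of $(\mathbb{R}/\mathbb{Z})^d$ is identifiable as an LCA group with $\mathbb{Z}^d$ by identifying each $\xi \in \mathbb{Z}^d$ with the frequency $x \mapsto \xi\cdot x$ given by the dot product.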


(d) Contravariant functoriality. If φ : G → H is a continuous homomor-
phism between LCA groups, show that there is a continuous homo-
morphism φ∗ : Ĥ → Ĝ between their Pontryagin duals, defined by
φ∗ (ξ) · x := ξ · φ(x) for ξ ∈ Ĥ and x ∈ G.
(e) If H is a closed subgroup of an LCA group G (and is thus also LCA), show that $\hat H$ is identifiable with $\hat G/H^\perp$, where $H^\perp$ is the space of all frequencies $\xi \in \hat G$ which annihilate H (i.e., $\xi\cdot x = 0$ for all $x \in H$).
(f) If G, H are LCA groups, show that $\widehat{G \times H}$ is identifiable as an LCA group with $\hat G \times \hat H$.
(g) Show that the Pontryagin dual of a finite abelian group G is identifi-
able with itself. (Hint: First do this for cyclic groups Z/N Z, identi-
fying ξ ∈ Z/N Z with the additive character x → xξ/N , then use the
classification of finite abelian groups.) Note that this identification is
not unique.

Exercise 1.12.10. Let G be an LCA group with non-trivial Haar measure μ, and let $\chi : G \to S^1$ be a measurable function such that $\chi(x)\chi(y) = \chi(x+y)$ for almost every x, y ∈ G. Show that χ is equal almost everywhere to a multiplicative character $\tilde\chi$ of G. (Hint: On the one hand, $\tau_x\chi = \chi(-x)\chi$ a.e. for almost every x. On the other hand, $\tau_x\chi$ depends continuously on x in, say, the local $L^1$ topology.)


In the remainder of this section, G is a fixed LCA group with a non-trivial Haar measure μ.
Given an absolutely integrable function $f \in L^1(G)$, we define the Fourier transform $\hat f : \hat G \to \mathbb{C}$ by the formula
$$\hat f(\xi) := \int_G f(x)e^{-2\pi i\xi\cdot x}\,d\mu(x).$$
This is clearly a linear transformation with the obvious bound
$$\sup_{\xi\in\hat G}|\hat f(\xi)| \le \|f\|_{L^1(G)}.$$

It converts translations into frequency modulations: indeed, one easily verifies that
$$\widehat{\tau_{x_0}f}(\xi) = e^{-2\pi i\xi\cdot x_0}\hat f(\xi) \qquad (1.100)$$
for any $f \in L^1(G)$, $x_0 \in G$, and $\xi \in \hat G$. Conversely, it converts frequency modulations to translations: one has
$$\widehat{\chi_{\xi_0}f}(\xi) = \hat f(\xi - \xi_0) \qquad (1.101)$$
for any $f \in L^1(G)$ and $\xi_0, \xi \in \hat G$, where $\chi_{\xi_0}$ is the multiplicative character $\chi_{\xi_0} : x \mapsto e^{2\pi i\xi_0\cdot x}$.
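As a quick sanity check of (1.100) and (1.101), here is a minimal Python sketch in the simplest compact case $G = \mathbb{Z}/N\mathbb{Z}$, assuming the normalised counting measure as Haar measure and the identification $\xi\cdot x = \xi x/N \bmod 1$ from Exercise 1.12.9(g). The helper names `fourier`, `translate`, `modulate` and `max_error` are our own illustrative choices, not notation from the text.

```python
import cmath
import random


def fourier(f):
    """Fourier transform on Z/NZ with the normalised (probability) Haar measure:
    fhat(xi) = (1/N) * sum_x f(x) * exp(-2*pi*i*xi*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / n) for x in range(n)) / n
            for xi in range(n)]


def translate(f, x0):
    """(tau_{x0} f)(x) = f(x - x0), with indices read mod N."""
    n = len(f)
    return [f[(x - x0) % n] for x in range(n)]


def modulate(f, xi0):
    """(chi_{xi0} f)(x) = exp(2*pi*i*xi0*x/N) * f(x)."""
    n = len(f)
    return [cmath.exp(2j * cmath.pi * xi0 * x / n) * f[x] for x in range(n)]


def max_error(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


random.seed(0)
N, x0, xi0 = 12, 5, 3
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
fhat = fourier(f)

# (1.100): translating f by x0 multiplies fhat(xi) by exp(-2*pi*i*xi*x0/N).
err_100 = max_error(fourier(translate(f, x0)),
                    [cmath.exp(-2j * cmath.pi * xi * x0 / N) * fhat[xi] for xi in range(N)])

# (1.101): modulating f by chi_{xi0} translates fhat by xi0.
err_101 = max_error(fourier(modulate(f, xi0)),
                    [fhat[(xi - xi0) % N] for xi in range(N)])

print(err_100, err_101)  # both tiny (floating-point rounding only)
```

The finite model is chosen only because every integral becomes an exact finite sum; nothing here depends on the particular values of N, x0 or xi0.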
Exercise 1.12.11 (Riemann-Lebesgue lemma). If $f \in L^1(G)$, show that $\hat f : \hat G \to \mathbb{C}$ is continuous. Furthermore, show that $\hat f$ goes to zero at infinity in the sense that for every ε > 0 there exists a compact subset K of Ĝ such that $|\hat f(\xi)| \le \varepsilon$ for $\xi \notin K$. (Hint: First show that there exists a neighbourhood U of the identity in G such that $\|\tau_x f - f\|_{L^1(G)} \le \varepsilon^2$ (say) for all x ∈ U. Now take the Fourier transform of this fact.) Thus the Fourier transform maps $L^1(G)$ continuously to $C_0(\hat G)$, the space of continuous functions on Ĝ which go to zero at infinity; the decay at infinity is known as the Riemann-Lebesgue lemma.
Exercise 1.12.12. Let G be an LCA group with non-trivial Haar measure
μ. Show that the topology of Ĝ is the weakest topology such that fˆ is
continuous for every f ∈ L1 (G).

Given two $f, g \in L^1(G)$, recall that the convolution $f * g : G \to \mathbb{C}$ is defined as
$$f * g(x) := \int_G f(y)g(x-y)\,d\mu(y).$$
From Young’s inequality (Exercise 1.11.25) we know that f ∗ g is defined a.e., and lies in $L^1(G)$; indeed, we have
$$\|f * g\|_{L^1(G)} \le \|f\|_{L^1(G)}\|g\|_{L^1(G)}.$$


Exercise 1.12.13. Show that the operation $f, g \mapsto f * g$ is a bilinear, continuous, commutative, and associative operation on $L^1(G)$. As a consequence, the Banach space $L^1(G)$ with the convolution operation as a “multiplication” operation becomes a commutative Banach algebra. If we also define $f^*(x) := \overline{f(-x)}$ for all $f \in L^1(G)$, this turns $L^1(G)$ into a Banach *-algebra.
For $f, g \in L^1(G)$, show that
$$\widehat{f * g}(\xi) = \hat f(\xi)\hat g(\xi) \qquad (1.102)$$
for all $\xi \in \hat G$; thus the Fourier transform converts convolution to a pointwise product.
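The convolution identity (1.102) can be checked in the same finite model. The sketch below assumes $G = \mathbb{Z}/N\mathbb{Z}$ with the normalised counting measure, so that both the convolution and the Fourier transform carry a factor 1/N; with these normalisations (1.102) holds exactly, up to rounding. The function names are hypothetical.

```python
import cmath
import random


def fourier(f):
    """fhat(xi) = (1/N) * sum_x f(x) * exp(-2*pi*i*xi*x/N) on Z/NZ."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / n) for x in range(n)) / n
            for xi in range(n)]


def convolve(f, g):
    """(f * g)(x) = (1/N) * sum_y f(y) g(x - y): convolution with respect to
    the normalised counting measure on Z/NZ."""
    n = len(f)
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) / n for x in range(n)]


def max_error(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


random.seed(1)
N = 10
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

# (1.102): the transform of f * g is the pointwise product of fhat and ghat.
err_product = max_error(fourier(convolve(f, g)),
                        [a * b for a, b in zip(fourier(f), fourier(g))])

# Commutativity of convolution (first part of Exercise 1.12.13).
err_commute = max_error(convolve(f, g), convolve(g, f))

print(err_product, err_commute)
```

If one instead used unnormalised counting measure for the convolution but kept the 1/N in the transform, the identity would acquire a factor of N; the sketch sidesteps this by using the probability measure throughout.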
Exercise 1.12.14. Let G, H be LCA groups with non-trivial Haar mea-
sures μ, ν, respectively, and let f ∈ L1 (G), g ∈ L1 (H). Show that the tensor
product f ⊗ g ∈ L1 (G × H) (with product Haar measure μ × ν) has a Fourier
transform of $\hat f \otimes \hat g$, where we identify $\widehat{G \times H}$ with $\hat G \times \hat H$ as per Exercise 1.12.9(f). Informally, this exercise asserts that the Fourier transform commutes with tensor products. (Because of this fact, the tensor power trick (see
Section 1.9 of Structure and Randomness) is often available when proving
results about the Fourier transform on general groups.)
Exercise 1.12.15 (Convolution and Fourier transform of measures). If ν ∈ M(G) is a finite Radon measure on an LCA group G with non-trivial Haar measure μ, define the Fourier-Stieltjes transform $\hat\nu : \hat G \to \mathbb{C}$ by the formula $\hat\nu(\xi) := \int_G e^{-2\pi i\xi\cdot x}\,d\nu(x)$ (thus for instance $\widehat{\mu_f} = \hat f$ for any $f \in L^1(G)$, where $d\mu_f := f\,d\mu$). Show that $\hat\nu$ is a bounded continuous function on Ĝ. Given any $f \in L^1(G)$, define the convolution $f * \nu : G \to \mathbb{C}$ to be the function
$$f * \nu(x) := \int_G f(x-y)\,d\nu(y),$$
and given any finite Radon measure ρ, let ν ∗ ρ be the measure
$$\nu * \rho(E) := \int_G\int_G 1_E(x+y)\,d\nu(x)\,d\rho(y).$$
Show that $f * \nu \in L^1(G)$ and $\widehat{f * \nu}(\xi) = \hat f(\xi)\hat\nu(\xi)$ for all $\xi \in \hat G$, and similarly that ν ∗ ρ is a finite measure and $\widehat{\nu * \rho}(\xi) = \hat\nu(\xi)\hat\rho(\xi)$ for all $\xi \in \hat G$. Thus the convolution and Fourier structure on $L^1(G)$ can be extended to the larger space M(G) of finite Radon measures.

1.12.2. The Fourier transform on compact abelian groups. In this section we specialise the Fourier transform to the case when the locally compact group G is in fact compact, thus we now have a compact abelian group G with non-trivial Haar measure μ. This case includes that of finite groups, together with that of the tori $(\mathbb{R}/\mathbb{Z})^d$.


As μ is a Radon measure, compact groups G have finite measure. It is then convenient to normalise the Haar measure μ so that μ(G) = 1, thus μ is now a probability measure. For the remainder of this section, we will assume that G is a compact abelian group and μ is its (unique) Haar probability measure, as given by Exercise 1.12.6.
A key advantage of working in the compact setting is that multiplicative characters $\chi : G \to S^1$ now lie in $L^2(G)$ and $L^1(G)$. In particular, they can be integrated:
Lemma 1.12.2. Let χ be a multiplicative character. Then $\int_G \chi\,d\mu$ equals 1 when χ is trivial and 0 when χ is non-trivial. Equivalently, for $\xi \in \hat G$, we have $\int_G e^{2\pi i\xi\cdot x}\,d\mu(x) = \delta_0(\xi)$, where $\delta_0$ is the Kronecker delta function at 0.

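Proof. The claim is clear when χ is trivial. When χ is non-trivial, there exists x ∈ G such that χ(x) ≠ 1. If one then integrates the identity $\tau_x\chi = \chi(-x)\chi$ using (1.99), one obtains $\int_G\chi\,d\mu = \chi(-x)\int_G\chi\,d\mu$; since $\chi(-x) = \overline{\chi(x)} \neq 1$, this forces $\int_G\chi\,d\mu = 0$, which is the claim. ∎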
Exercise 1.12.16. Show that the Pontryagin dual Ĝ of a compact abelian
group G is discrete (compare with Exercise 1.12.8).
Exercise 1.12.17. Show that the Fourier transform of the constant function 1 is the Kronecker delta function $\delta_0$ at 0. More generally, for any $\xi_0 \in \hat G$, show that the Fourier transform of the multiplicative character $x \mapsto e^{2\pi i\xi_0\cdot x}$ is the Kronecker delta function $\delta_{\xi_0}$ at $\xi_0$.
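For the finite cyclic group $\mathbb{Z}/N\mathbb{Z}$ with normalised counting measure (an assumption made purely for illustration), Lemma 1.12.2 and Exercise 1.12.17 reduce to the vanishing of sums of non-trivial N-th roots of unity. The following sketch checks both statements numerically; `character`, `haar_integral` and `fourier` are illustrative names.

```python
import cmath

N = 8


def character(xi):
    """The multiplicative character x -> exp(2*pi*i*xi*x/N) on Z/NZ, as a list of values."""
    return [cmath.exp(2j * cmath.pi * xi * x / N) for x in range(N)]


def haar_integral(f):
    """Integral against the probability Haar measure: the average over Z/NZ."""
    return sum(f) / len(f)


def fourier(f):
    """fhat(xi) = integral of f(x) * exp(-2*pi*i*xi*x/N) against the Haar measure."""
    return [haar_integral([f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)])
            for xi in range(N)]


# Lemma 1.12.2: the integral of chi_xi is 1 if xi = 0 and 0 otherwise.
integrals = [haar_integral(character(xi)) for xi in range(N)]

# Exercise 1.12.17: the transform of chi_{xi0} is the Kronecker delta at xi0.
xi0 = 3
transform = fourier(character(xi0))
delta = [1.0 if xi == xi0 else 0.0 for xi in range(N)]

print([round(abs(v), 12) for v in integrals])
```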

Since the pointwise product of two multiplicative characters is again a multiplicative character and the conjugate of a multiplicative character is also a multiplicative character, applying Lemma 1.12.2 to $\chi\overline{\chi'}$ for any two multiplicative characters χ, χ′ (this product is trivial precisely when χ = χ′), we obtain
Corollary 1.12.3. The set of multiplicative characters is an orthonormal set in the complex Hilbert space $L^2(G)$.

Actually, one can say more:


Theorem 1.12.4 (Plancherel theorem for compact abelian groups). Let G
be a compact abelian group with probability Haar measure μ. Then the space
of multiplicative characters is an orthonormal basis for the complex Hilbert
space L2 (G).

The full proof of this theorem requires the spectral theorem and is not
given here, though see Exercise 1.12.43 below. However, we can work out
some important special cases here.
• When G is a torus $G = \mathbb{T}^d = (\mathbb{R}/\mathbb{Z})^d$, the multiplicative characters $x \mapsto e^{2\pi i\xi\cdot x}$ separate points (given any two distinct x, y ∈ G, there exists a character which takes different values at x and at y). The space of finite linear combinations of multiplicative characters (i.e., the space of trigonometric polynomials) is then an algebra closed under conjugation that separates points and contains the unit 1, and thus by the Stone-Weierstrass theorem, is dense in C(G) in the uniform topology (and hence in the $L^2$ topology). Since C(G) is itself dense in $L^2(G)$, the trigonometric polynomials are dense in $L^2(G)$ (in the $L^2$ topology) also.
• The same argument works when G is a cyclic group $\mathbb{Z}/N\mathbb{Z}$, using the multiplicative characters $x \mapsto e^{2\pi i\xi x/N}$ for $\xi \in \mathbb{Z}/N\mathbb{Z}$. As every finite abelian group is isomorphic to a product of cyclic groups, we also obtain the claim for finite abelian groups.
• Alternatively, when G is finite, one can argue by viewing the linear
operators τx : Cc (G) → Cc (G) as |G| × |G| unitary matrices (in fact,
they are permutation matrices) for each x ∈ G. The spectral theorem
for unitary matrices allows each of these matrices to be diagonalised;
as G is abelian, the matrices commute and so one can simultaneously
diagonalise these matrices. It is not hard to see that each simulta-
neous eigenvector of these matrices is a multiple of a character, and
so the characters span L2 (G), yielding the claim. (The same argu-
ment will in fact work for arbitrary compact abelian groups, once we
obtain the spectral theorem for unitary operators.)
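In the cyclic case $G = \mathbb{Z}/N\mathbb{Z}$ the finite-group argument above can be made very concrete: every translation is a power of $\tau_1$, and since $\tau_1\chi_\xi(x) = \chi_\xi(x-1)$, each character $\chi_\xi$ is an eigenvector of $\tau_1$ with eigenvalue $\chi_\xi(-1) = e^{-2\pi i\xi/N}$. These N eigenvalues are distinct, so each eigenspace of $\tau_1$ is one-dimensional and the simultaneous eigenvectors are exactly the multiples of characters. A small numerical sketch of this, with hypothetical helper names:

```python
import cmath
import itertools

N = 9


def character(xi):
    """x -> exp(2*pi*i*xi*x/N) on Z/NZ, as a list of values."""
    return [cmath.exp(2j * cmath.pi * xi * x / N) for x in range(N)]


def translate(f, x0):
    """(tau_{x0} f)(x) = f(x - x0) on Z/NZ; tau_1 acts as a cyclic permutation matrix."""
    return [f[(x - x0) % N] for x in range(N)]


residuals = []
eigenvalues = []
for xi in range(N):
    chi = character(xi)
    lam = cmath.exp(-2j * cmath.pi * xi / N)  # predicted eigenvalue chi(-1)
    residuals.append(max(abs(a - lam * b) for a, b in zip(translate(chi, 1), chi)))
    eigenvalues.append(lam)

# Distinct eigenvalues imply one-dimensional eigenspaces for tau_1, so every
# eigenvector of tau_1 (and hence every simultaneous eigenvector) is a multiple of a character.
min_gap = min(abs(a - b) for a, b in itertools.combinations(eigenvalues, 2))

print(max(residuals), min_gap)
```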
If $f \in L^2(G)$, the inner product $\langle f, \chi_\xi\rangle_{L^2(G)}$ of f with any multiplicative character $\chi_\xi : x \mapsto e^{2\pi i\xi\cdot x}$ is just the Fourier coefficient $\hat f(\xi)$ of f at the corresponding frequency. Applying the general theory of orthonormal bases (see Section 1.4), we obtain the following consequences:

Corollary 1.12.5 (Plancherel theorem for compact abelian groups, again). Let G be a compact abelian group with probability Haar measure μ.
• Parseval identity. For any $f \in L^2(G)$, we have $\|f\|_{L^2(G)}^2 = \sum_{\xi\in\hat G}|\hat f(\xi)|^2$.
• Parseval identity, II. For any $f, g \in L^2(G)$, we have $\langle f, g\rangle_{L^2(G)} = \sum_{\xi\in\hat G}\hat f(\xi)\overline{\hat g(\xi)}$.
• Unitarity. Thus the Fourier transform is a unitary transformation from $L^2(G)$ to $\ell^2(\hat G)$.
• Inversion formula. For any $f \in L^2(G)$, the series $x \mapsto \sum_{\xi\in\hat G}\hat f(\xi)e^{2\pi i\xi\cdot x}$ converges unconditionally in $L^2(G)$ to f.
• Inversion formula, II. For any sequence $(c_\xi)_{\xi\in\hat G}$ in $\ell^2(\hat G)$, the series $x \mapsto \sum_{\xi\in\hat G}c_\xi e^{2\pi i\xi\cdot x}$ converges unconditionally in $L^2(G)$ to a function f with $c_\xi$ as its Fourier coefficients.
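The statements of Corollary 1.12.5 can be tested numerically in the model case $G = \mathbb{Z}/N\mathbb{Z}$, assuming the probability Haar measure on G and counting measure on $\hat G \cong \mathbb{Z}/N\mathbb{Z}$ (the normalisations used throughout this subsection). The sketch below checks both Parseval identities and the inversion formula on random data; all names are illustrative.

```python
import cmath
import random

random.seed(2)
N = 16


def fourier(f):
    """fhat(xi) = (1/N) * sum_x f(x) * exp(-2*pi*i*xi*x/N)."""
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]


def inverse(c):
    """Inversion formula: x -> sum_xi c(xi) * exp(2*pi*i*xi*x/N) (counting measure on the dual)."""
    return [sum(c[xi] * cmath.exp(2j * cmath.pi * xi * x / N) for xi in range(N))
            for x in range(N)]


def inner_L2(f, g):
    """<f, g>_{L^2(G)} with the probability Haar measure on G."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / N


def inner_l2(F, G):
    """<F, G>_{l^2} with counting measure on the dual group."""
    return sum(a * b.conjugate() for a, b in zip(F, G))


f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
fhat, ghat = fourier(f), fourier(g)

parseval_gap = abs(inner_L2(f, f) - inner_l2(fhat, fhat))
parseval2_gap = abs(inner_L2(f, g) - inner_l2(fhat, ghat))
inversion_gap = max(abs(a - b) for a, b in zip(inverse(fhat), f))

print(parseval_gap, parseval2_gap, inversion_gap)
```

On a finite group all series are finite sums, so the "unconditional convergence" parts of the corollary are vacuous here; the sketch only illustrates the algebraic identities.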


We can record here a textbook application of the Riesz-Thorin interpolation theorem from Section 1.11. Observe that the Fourier transform map $\mathcal{F} : f \mapsto \hat f$ maps $L^2(G)$ to $\ell^2(\hat G)$ with norm 1, and also trivially maps $L^1(G)$ to $\ell^\infty(\hat G)$ with norm 1. Applying the interpolation theorem, we conclude the Hausdorff-Young inequality
$$\|\hat f\|_{\ell^{p'}(\hat G)} \le \|f\|_{L^p(G)} \qquad (1.103)$$
for all 1 ≤ p ≤ 2 and all $f \in L^p(G)$; in particular, the Fourier transform maps $L^p(G)$ to $\ell^{p'}(\hat G)$, where p′ is the dual exponent of p, thus 1/p + 1/p′ = 1. It is remarkably difficult (though not impossible) to establish the inequality (1.103) without the aid of the Riesz-Thorin theorem. (For instance, one could use the Marcinkiewicz interpolation theorem combined with the tensor power trick.) The constant 1 cannot be improved, as can be seen by testing (1.103) with the function f = 1 and using Exercise 1.12.17. By combining (1.103) with Hölder’s inequality, one concludes that
$$\|\hat f\|_{\ell^q(\hat G)} \le \|f\|_{L^p(G)} \qquad (1.104)$$
whenever 2 ≤ q ≤ ∞ and $\frac{1}{p} + \frac{1}{q} \le 1$. These are the optimal hypotheses on p, q for which (1.104) holds, though we will not establish this fact here.
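The Hausdorff-Young inequality (1.103) is easy to probe numerically on $G = \mathbb{Z}/N\mathbb{Z}$. The sketch below is an illustration only, with hypothetical helper names: it computes $\|f\|_{L^p(G)}$ with the probability Haar measure and $\|\hat f\|_{\ell^{p'}(\hat G)}$ with counting measure for random complex f and several p ∈ [1, 2], and also checks that f = 1 gives equality, as noted above.

```python
import cmath
import math
import random

random.seed(3)
N = 12


def fourier(f):
    """fhat(xi) = (1/N) * sum_x f(x) * exp(-2*pi*i*xi*x/N)."""
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]


def norm_L(f, p):
    """||f||_{L^p(G)} with the probability Haar measure on Z/NZ."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1 / p)


def norm_l(F, q):
    """||F||_{l^q} with counting measure on the dual; q = math.inf gives the sup norm."""
    if q == math.inf:
        return max(abs(v) for v in F)
    return sum(abs(v) ** q for v in F) ** (1 / q)


def dual_exponent(p):
    return math.inf if p == 1 else p / (p - 1)


worst_slack = math.inf  # smallest observed value of ||f||_p - ||fhat||_{p'}
for trial in range(50):
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
    fhat = fourier(f)
    for p in (1.0, 1.2, 1.5, 1.8, 2.0):
        worst_slack = min(worst_slack, norm_L(f, p) - norm_l(fhat, dual_exponent(p)))

# Equality case f = 1: fhat is the Kronecker delta at 0 and both sides equal 1.
ones = [1.0] * N
equality_gap = abs(norm_L(ones, 1.5) - norm_l(fourier(ones), dual_exponent(1.5)))

print(worst_slack, equality_gap)
```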
Exercise 1.12.18. If $f, g \in L^2(G)$, show that the Fourier transform of $fg \in L^1(G)$ is given by the formula
$$\widehat{fg}(\xi) = \sum_{\eta\in\hat G}\hat f(\eta)\hat g(\xi-\eta).$$
Thus multiplication is converted via the Fourier transform to convolution; compare this with (1.102).
Exercise 1.12.19 (Hardy-Littlewood majorant property). Let p ≥ 2 be an
even integer. If f, g ∈ Lp (G) are such that |fˆ(ξ)| ≤ ĝ(ξ) for all ξ ∈ Ĝ (in
particular, ĝ is non-negative), show that f Lp (G) ≤ gLp (G) . (Hint: Use
Exercise 1.12.18 and the Plancherel identity.) The claim fails for all other
values of p, a result of Fournier [Fo1974].
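For p = 4 the majorant property of Exercise 1.12.19 can be tested on $G = \mathbb{Z}/N\mathbb{Z}$ by building f and g from their Fourier coefficients via the inversion formula, choosing $\hat g$ real and non-negative with $\hat g \ge |\hat f|$. This is a numerical sanity check under those assumptions, not a proof; the helper names are ours.

```python
import cmath
import random

random.seed(4)
N = 10


def inverse(c):
    """x -> sum_xi c(xi) * exp(2*pi*i*xi*x/N): the function on Z/NZ whose
    (probability-normalised) Fourier coefficients are c."""
    return [sum(c[xi] * cmath.exp(2j * cmath.pi * xi * x / N) for xi in range(N))
            for x in range(N)]


def norm_L(f, p):
    """||f||_{L^p(G)} with the probability Haar measure."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1 / p)


worst_gap = float("inf")  # smallest observed value of ||g||_4 - ||f||_4
for trial in range(200):
    fhat = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
    # A majorant: ghat(xi) >= |fhat(xi)|, real and non-negative
    # (on even trials ghat = |fhat| exactly).
    ghat = [abs(c) + random.random() * (trial % 2) for c in fhat]
    f, g = inverse(fhat), inverse(ghat)
    worst_gap = min(worst_gap, norm_L(g, 4) - norm_L(f, 4))

print(worst_gap)
```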
Exercise 1.12.20. In this exercise and the next two, we will work on the torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ with the probability Haar measure μ. The Pontryagin dual $\hat{\mathbb{T}}$ is identified with $\mathbb{Z}$ in the usual manner, thus $\hat f(n) = \int_{\mathbb{R}/\mathbb{Z}}f(x)e^{-2\pi inx}\,dx$ for all $f \in L^1(\mathbb{T})$. For every integer N > 0 and $f \in L^1(\mathbb{T})$, define the partial Fourier series $S_N f$ to be the expression
$$S_N f(x) := \sum_{n=-N}^{N}\hat f(n)e^{2\pi inx}.$$


• Show that $S_N f = f * D_N$, where $D_N$ is the Dirichlet kernel $D_N(x) := \sum_{n=-N}^{N}e^{2\pi inx} = \frac{\sin((2N+1)\pi x)}{\sin(\pi x)}$.
• Show that $\|D_N\|_{L^1(\mathbb{T})} \ge c\log N$ for some absolute constant c > 0. Conclude that the operator norm of $S_N$ on $C(\mathbb{T})$ (with the uniform norm) is at least $c\log N$.
• Conclude that there exists a continuous function f such that the partial Fourier series $S_N f$ does not converge uniformly. (Hint: Use the uniform boundedness principle.) This is despite the fact that $S_N f$ must converge to f in $L^2$ norm, by the Plancherel theorem. (Another example of non-uniform convergence of $S_N f$ is given by the Gibbs phenomenon.)
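The logarithmic growth of $\|D_N\|_{L^1(\mathbb{T})}$ can be observed numerically. On $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ the kernel $\sum_{|n|\le N}e^{2\pi inx}$ equals $\sin((2N+1)\pi x)/\sin(\pi x)$, and the sketch below (helper names are hypothetical) estimates its $L^1$ norm with a midpoint rule. The computed values are consistent with the classical asymptotic $\|D_N\|_{L^1(\mathbb{T})} = \frac{4}{\pi^2}\log N + O(1)$.

```python
import math


def dirichlet(N, x):
    """D_N(x) = sum_{|n| <= N} exp(2*pi*i*n*x) = sin((2N+1)*pi*x) / sin(pi*x) on R/Z."""
    return math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x)


def dirichlet_l1_norm(N, samples=100_000):
    """Midpoint-rule approximation of the integral of |D_N| over [0, 1).
    The midpoints (k + 1/2)/samples never hit x = 0, so sin(pi*x) != 0."""
    h = 1.0 / samples
    return h * sum(abs(dirichlet(N, (k + 0.5) * h)) for k in range(samples))


norms = {N: dirichlet_l1_norm(N) for N in (10, 100, 1000)}
for N, value in norms.items():
    print(N, round(value, 3), round(value / math.log(N), 3))
```

Each tenfold increase in N adds roughly $(4/\pi^2)\log 10 \approx 0.93$ to the norm, which is the slow but unbounded growth that the uniform boundedness argument exploits.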
Exercise 1.12.21. We continue the notational conventions of the preceding exercise. For every integer N > 0 and $f \in L^1(\mathbb{T})$, define the Cesàro-summed partial Fourier series $C_N f$ to be the expression
$$C_N f(x) := \frac{1}{N}\sum_{n=0}^{N-1}S_n f(x).$$
• Show that $C_N f = f * F_N$, where $F_N$ is the Fejér kernel $F_N(x) := \frac{1}{N}\left(\frac{\sin(\pi Nx)}{\sin(\pi x)}\right)^2$.
• Show that $\|F_N\|_{L^1(\mathbb{T})} = 1$. (Hint: What is the Fourier coefficient of $F_N$ at zero?)
• Show that $C_N f$ converges uniformly to f for every $f \in C(\mathbb{T})$. (Thus we see that Cesàro averaging improves the convergence properties of Fourier series.)
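The contrast between the Fejér and Dirichlet kernels can also be seen numerically. In the sketch below (assuming $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, with illustrative helper names), $F_N$ is evaluated as $\frac{1}{N}(\sin(\pi Nx)/\sin(\pi x))^2$, its integral over $\mathbb{T}$ is compared with 1, and the identity $F_N = \frac{1}{N}\sum_{n<N}D_n$ is checked at a sample point. Both kernels have integral 1, but only $F_N$ is non-negative, which is why only $F_N$ has $L^1$ norm 1.

```python
import math


def dirichlet(n, x):
    """D_n(x) = sin((2n+1)*pi*x) / sin(pi*x) on R/Z."""
    return math.sin((2 * n + 1) * math.pi * x) / math.sin(math.pi * x)


def fejer(N, x):
    """F_N(x) = (1/N) * (sin(pi*N*x) / sin(pi*x))**2 on R/Z."""
    return (math.sin(math.pi * N * x) / math.sin(math.pi * x)) ** 2 / N


def integral_over_torus(func, samples=2000):
    """Midpoint rule on [0, 1); exact up to rounding for trigonometric
    polynomials of degree less than the number of samples."""
    h = 1.0 / samples
    return h * sum(func((k + 0.5) * h) for k in range(samples))


N = 40
fejer_mass = integral_over_torus(lambda x: fejer(N, x))          # = ||F_N||_{L^1}, since F_N >= 0
dirichlet_mass = integral_over_torus(lambda x: dirichlet(N, x))  # also 1 ...
dirichlet_l1 = integral_over_torus(lambda x: abs(dirichlet(N, x)))  # ... but ||D_N||_{L^1} is larger

x = 0.123
average_gap = abs(fejer(N, x) - sum(dirichlet(n, x) for n in range(N)) / N)

print(fejer_mass, dirichlet_mass, dirichlet_l1, average_gap)
```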
Exercise 1.12.22. Carleson’s inequality asserts that for any $f \in L^2(\mathbb{T})$, one has the weak-type inequality
$$\Big\|\sup_{N>0}|S_N f(x)|\Big\|_{L^{2,\infty}(\mathbb{T})} \le C\|f\|_{L^2(\mathbb{T})}$$
for some absolute constant C. Assuming this (deep) inequality, establish Carleson’s theorem that for any $f \in L^2(\mathbb{T})$, the partial Fourier series $S_N f(x)$ converge for almost every x to f(x). (Conversely, a general principle of Stein [St1961], analogous to the uniform boundedness principle, allows one to deduce Carleson’s inequality from Carleson’s theorem. A later result of Hunt [Hu1968] extends Carleson’s theorem to $L^p(\mathbb{T})$ for any p > 1, but a famous example of Kolmogorov shows that almost everywhere convergence can fail for $L^1(\mathbb{T})$ functions; in fact the series may diverge pointwise everywhere.)

1.12.3. The Fourier transform on Euclidean spaces. We now turn to the Fourier transform on the Euclidean space $\mathbb{R}^d$, where d ≥ 1 is a fixed integer. From Exercise 1.12.9 we can identify the Pontryagin dual of $\mathbb{R}^d$ with itself, and then the Fourier transform $\hat f : \mathbb{R}^d \to \mathbb{C}$ of a function $f \in L^1(\mathbb{R}^d)$ is given by the formula
$$\hat f(\xi) := \int_{\mathbb{R}^d}f(x)e^{-2\pi i\xi\cdot x}\,dx. \qquad (1.105)$$

Remark 1.12.6. One needs the Euclidean inner product structure on $\mathbb{R}^d$ in order to identify $\widehat{\mathbb{R}^d}$ with $\mathbb{R}^d$. Without this structure, it is more natural to identify $\widehat{\mathbb{R}^d}$ with the dual space $(\mathbb{R}^d)^*$ of $\mathbb{R}^d$. (In the language of physics, one should interpret frequency as a covector rather than a vector.) However, we will not need to consider such subtleties here. In areas of mathematics other than harmonic analysis, the normalisation of the Fourier transform (particularly with regard to the placement of the sign and of the factor 2π) is sometimes slightly different from that presented here. For instance, in PDE, the factor of 2π is often omitted from the exponent in order to slightly simplify the behaviour of differential operators under the Fourier transform (at the cost of introducing factors of 2π in various identities, such as the Plancherel formula or inversion formula).

In Exercise 1.12.11 we saw that if f was in $L^1(\mathbb{R}^d)$, then $\hat f$ was continuous and decayed to zero at infinity. One can improve both the regularity and decay on $\hat f$ by strengthening the hypotheses on f. We need two basic facts:
Exercise 1.12.23 (Decay transforms to regularity). Let 1 ≤ j ≤ d, and suppose that f, $x_j f$ both lie in $L^1(\mathbb{R}^d)$, where $x_j$ is the j-th coordinate function. Show that $\hat f$ is continuously differentiable in the $\xi_j$ variable, with
$$\frac{\partial}{\partial\xi_j}\hat f(\xi) = -2\pi i\,\widehat{x_j f}(\xi).$$
(Hint: The main difficulty is to justify differentiation under the integral sign. Use the fact that the function $x \mapsto e^{ix}$ has a derivative of magnitude 1 and is hence Lipschitz by the fundamental theorem of calculus. Alternatively, one can show first that $\hat f$, as a function of $\xi_j$, is the indefinite integral of $-2\pi i\,\widehat{x_j f}$ and then use the fundamental theorem of calculus.)
Exercise 1.12.24 (Regularity transforms to decay). Let 1 ≤ j ≤ d, and suppose that $f \in L^1(\mathbb{R}^d)$ has a derivative $\frac{\partial f}{\partial x_j}$ in $L^1(\mathbb{R}^d)$, for which one has the fundamental theorem of calculus
$$f(x_1, \ldots, x_d) = \int_{-\infty}^{x_j}\frac{\partial f}{\partial x_j}(x_1, \ldots, x_{j-1}, t, x_{j+1}, \ldots, x_d)\,dt$$
for almost every $x_1, \ldots, x_d$. (This is equivalent to f being absolutely continuous in $x_j$ for almost every $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d$.) Show that
$$\widehat{\frac{\partial f}{\partial x_j}}(\xi) = 2\pi i\xi_j\hat f(\xi).$$
In particular, conclude that $|\xi_j|\hat f(\xi)$ goes to zero as $|\xi| \to \infty$.


Remark 1.12.7. Exercise 1.12.24 shows that Fourier transforms diagonalise differentiation: (constant-coefficient) differential operators, such as $\frac{\partial}{\partial x_j}$, when viewed in frequency space, become nothing more than multiplication operators $\hat f(\xi) \mapsto 2\pi i\xi_j\hat f(\xi)$. (Multiplication operators are the continuous analogue of diagonal matrices.) It is because of this fact that the Fourier transform is extremely useful in PDE, particularly in constant-coefficient linear PDE or perturbations thereof.

It is now convenient to work with a class of functions which has an infinite amount of both regularity and decay.
Definition 1.12.8 (Schwartz class). A rapidly decreasing function is a measurable function $f : \mathbb{R}^d \to \mathbb{C}$ such that $|x|^n f(x)$ is bounded for every non-negative integer n. A Schwartz function is a smooth function $f : \mathbb{R}^d \to \mathbb{C}$ such that all derivatives $\partial_{x_1}^{n_1}\cdots\partial_{x_d}^{n_d}f$ are rapidly decreasing. The space of all Schwartz functions is denoted $\mathcal{S}(\mathbb{R}^d)$.
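Example 1.12.9. Any smooth, compactly supported function $f : \mathbb{R}^d \to \mathbb{C}$ is a Schwartz function. The Gaussian functions
$$f(x) = Ae^{2\pi i\theta}e^{2\pi i\xi_0\cdot x}e^{-\pi|x-x_0|^2/R^2} \qquad (1.106)$$
for $A \in \mathbb{R}$, $\theta \in \mathbb{R}/\mathbb{Z}$, $x_0, \xi_0 \in \mathbb{R}^d$, and R > 0 are also Schwartz functions.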
Exercise 1.12.25. Show that the seminorms
$$\|f\|_{k,n} := \sup_{x\in\mathbb{R}^d}|x|^n|\nabla^k f(x)|$$
for k, n ≥ 0, where we think of $\nabla^k f(x)$ as a $d^k$-dimensional vector (or, if one wishes, a rank k d-dimensional tensor), give $\mathcal{S}(\mathbb{R}^d)$ the structure of a Fréchet space. In particular, $\mathcal{S}(\mathbb{R}^d)$ is a topological vector space.

Clearly, every Schwartz function is both smooth and rapidly decreasing. The following exercise explores the converse:
Exercise 1.12.26.
• Give an example to show that not all smooth, rapidly decreasing
functions are Schwartz.
• Show that if f is a smooth, rapidly decreasing function and all deriva-
tives of f are bounded, then f is Schwartz. (Hint: Use Taylor’s
theorem with remainder.)

One of the reasons why the Schwartz space is convenient to work with is that it is closed under a wide variety of operations. For instance, the derivative of a Schwartz function is again a Schwartz function, and the product of a Schwartz function with a polynomial is again a Schwartz function. Here are some further such closure properties:
Exercise 1.12.27. Show that the product of two Schwartz functions is
again a Schwartz function. Moreover, show that the product map f, g → f g
is continuous from S(Rd ) × S(Rd ) to S(Rd ).
Exercise 1.12.28. Show that the convolution of two Schwartz functions
is again a Schwartz function. Moreover, show that the convolution map
f, g → f ∗ g is continuous from S(Rd ) × S(Rd ) to S(Rd ).
Exercise 1.12.29. Show that the Fourier transform of a Schwartz function
is again a Schwartz function. Moreover, show that the Fourier transform
map F : f → fˆ is continuous from S(Rd ) to S(Rd ).

The other important property of the Schwartz class is that it is dense in many other spaces:
Exercise 1.12.30. Show that S(Rd ) is dense in Lp (Rd ) for every 1 ≤ p <
∞, and it is also dense in C0 (Rd ) (with the uniform topology). (Hint: One
can either use the Stone-Weierstrass theorem or convolutions with approxi-
mations to the identity.)

Because of this density property, it becomes possible to establish various estimates and identities in spaces of rough functions (e.g., $L^p$ functions) by first establishing these estimates on Schwartz functions (where it is easy to justify operations such as differentiation under the integral sign) and then taking limits.
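Having defined the Fourier transform $\mathcal{F} : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$, we now introduce the adjoint Fourier transform $\mathcal{F}^* : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$ by the formula
$$\mathcal{F}^*F(x) := \int_{\mathbb{R}^d}e^{2\pi i\xi\cdot x}F(\xi)\,d\xi$$
(note the sign change from (1.105)). We will shortly demonstrate that the adjoint Fourier transform is also the inverse Fourier transform, $\mathcal{F}^* = \mathcal{F}^{-1}$. From the identity
$$\mathcal{F}^*f = \overline{\mathcal{F}\overline{f}} \qquad (1.107)$$
we see that $\mathcal{F}^*$ obeys much the same properties as $\mathcal{F}$; for instance, it is also continuous from $\mathcal{S}(\mathbb{R}^d)$ to $\mathcal{S}(\mathbb{R}^d)$. It is also the adjoint to $\mathcal{F}$ in the sense that
$$\langle\mathcal{F}f, g\rangle_{L^2(\mathbb{R}^d)} = \langle f, \mathcal{F}^*g\rangle_{L^2(\mathbb{R}^d)}$$
for all $f, g \in \mathcal{S}(\mathbb{R}^d)$.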
Now we show that $\mathcal{F}^*$ inverts $\mathcal{F}$. We begin with an easy preliminary result:
Exercise 1.12.31. For any $f, g \in \mathcal{S}(\mathbb{R}^d)$, establish the identity $\mathcal{F}^*\mathcal{F}(f * g) = f * \mathcal{F}^*\mathcal{F}g$.

Next, we perform a computation:
Exercise 1.12.32 (Fourier transform of Gaussians). Let r > 0. Show that the Fourier transform of the Gaussian function $g_r(x) := r^{-d}e^{-\pi|x|^2/r^2}$ is $\hat g_r(\xi) = e^{-\pi r^2|\xi|^2}$. (Hint: Reduce to the case d = 1 and r = 1, then complete the square and use contour integration and the classical identity $\int_{-\infty}^{\infty}e^{-\pi x^2}\,dx = 1$.) Conclude that $\mathcal{F}^*\mathcal{F}g_r = g_r$.
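Exercise 1.12.32 can be sanity-checked in dimension d = 1 by numerical quadrature. The sketch below is an illustration under stated assumptions (truncation of the integral to a finite interval, a uniform trapezoid rule, hypothetical helper names); for Gaussians the trapezoid rule is extremely accurate, so the computed transform agrees with $e^{-\pi r^2\xi^2}$ to near machine precision.

```python
import math


def gaussian(r, x):
    """g_r(x) = r^{-1} * exp(-pi * x^2 / r^2), the d = 1 case of Exercise 1.12.32."""
    return math.exp(-math.pi * x * x / (r * r)) / r


def fourier_transform_even(func, xi, half_width=12.0, step=0.01):
    """Trapezoid-rule approximation of the integral of func(x) * exp(-2*pi*i*x*xi)
    over [-half_width, half_width]. For an even real func the imaginary (sine)
    part vanishes, so only the cosine part is summed."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * func(x) * math.cos(2 * math.pi * x * xi)
    return total * step


worst = 0.0
for r in (0.5, 1.0, 2.0):
    for xi in (0.0, 0.3, 1.0, 2.0):
        approx = fourier_transform_even(lambda x: gaussian(r, x), xi)
        exact = math.exp(-math.pi * r * r * xi * xi)
        worst = max(worst, abs(approx - exact))

print(worst)
```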

Exercise 1.12.33. With $g_r$ as in the previous exercise, show that $f * g_r$ converges in the Schwartz space topology to f as r → 0 for all $f \in \mathcal{S}(\mathbb{R}^d)$. (Hint: First show convergence in the uniform topology, then use the identities $\frac{\partial}{\partial x_j}(f * g) = (\frac{\partial}{\partial x_j}f) * g$ and $x_j(f * g) = (x_j f) * g + f * (x_j g)$ for $f, g \in \mathcal{S}(\mathbb{R}^d)$.)

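From Exercises 1.12.31 and 1.12.32 we see that
$$\mathcal{F}^*\mathcal{F}(f * g_r) = f * g_r$$
for all r > 0 and $f \in \mathcal{S}(\mathbb{R}^d)$. Taking limits as r → 0 using Exercises 1.12.29 and 1.12.33, we conclude that
$$\mathcal{F}^*\mathcal{F}f = f$$
for all $f \in \mathcal{S}(\mathbb{R}^d)$, or in other words we have the Fourier inversion formula
$$f(x) = \int_{\mathbb{R}^d}\hat f(\xi)e^{2\pi i\xi\cdot x}\,d\xi \qquad (1.108)$$
for all $x \in \mathbb{R}^d$. From (1.107) we also have
$$\mathcal{F}\mathcal{F}^*f = f.$$
Taking inner products with another Schwartz function g, we obtain Parseval’s identity
$$\langle\mathcal{F}f, \mathcal{F}g\rangle_{L^2(\mathbb{R}^d)} = \langle f, g\rangle_{L^2(\mathbb{R}^d)}$$
for all $f, g \in \mathcal{S}(\mathbb{R}^d)$, and similarly for $\mathcal{F}^*$. In particular, we obtain Plancherel’s identity
$$\|\mathcal{F}f\|_{L^2(\mathbb{R}^d)} = \|f\|_{L^2(\mathbb{R}^d)} = \|\mathcal{F}^*f\|_{L^2(\mathbb{R}^d)}$$
for all $f \in \mathcal{S}(\mathbb{R}^d)$. We conclude that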

Theorem 1.12.10 (Plancherel’s theorem for $\mathbb{R}^d$). The Fourier transform operator $\mathcal{F} : \mathcal{S} \to \mathcal{S}$ can be uniquely extended to a unitary transformation $\mathcal{F} : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$.


Exercise 1.12.34. Show that the Fourier transform on $L^2(\mathbb{R}^d)$ given by Plancherel’s theorem agrees with the Fourier transform on $L^1(\mathbb{R}^d)$ given by (1.105) on the common domain $L^2(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$. Thus we may define $\hat f$ for $f \in L^1(\mathbb{R}^d)$ or $f \in L^2(\mathbb{R}^d)$ (or even $f \in L^1(\mathbb{R}^d) + L^2(\mathbb{R}^d)$) without any ambiguity (other than the usual identification of any two functions that agree almost everywhere).

Note that it is certainly possible for a function f to lie in $L^2(\mathbb{R}^d)$ but not in $L^1(\mathbb{R}^d)$ (e.g., the function $(1 + |x|)^{-d}$). In such cases, the integrand in (1.105) is not absolutely integrable, and so this formula does not define the Fourier transform of f directly. Nevertheless, one can recover the Fourier transform via a limiting version of (1.105):
Exercise 1.12.35. Let $f \in L^2(\mathbb{R}^d)$. Show that the partial Fourier integrals $\xi \mapsto \int_{|x|\le R}f(x)e^{-2\pi i\xi\cdot x}\,dx$ converge in $L^2(\mathbb{R}^d)$ to $\hat f$ as R → ∞.
Remark 1.12.11. It is a famous open question whether the partial Fourier
integrals of an L2 (Rd ) function also converge pointwise almost everywhere
for d ≥ 2. For d = 1, this is essentially the celebrated theorem of Carleson
mentioned in Exercise 1.12.22.
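Exercise 1.12.36 (Heisenberg uncertainty principle). Let d = 1. Define the position operator $X : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ and momentum operator $D : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ by the formulae
$$Xf(x) := xf(x), \qquad Df(x) := \frac{1}{2\pi i}\frac{d}{dx}f(x).$$
Establish the identities
$$\mathcal{F}D = X\mathcal{F}, \qquad \mathcal{F}X = -D\mathcal{F}, \qquad DX - XD = \frac{1}{2\pi i} \qquad (1.109)$$
and the formal self-adjointness relationships
$$\langle Xf, g\rangle_{L^2(\mathbb{R})} = \langle f, Xg\rangle_{L^2(\mathbb{R})}, \qquad \langle Df, g\rangle_{L^2(\mathbb{R})} = \langle f, Dg\rangle_{L^2(\mathbb{R})},$$
and then establish the inequality
$$\|Xf\|_{L^2(\mathbb{R})}\|Df\|_{L^2(\mathbb{R})} \ge \frac{1}{4\pi}\|f\|_{L^2(\mathbb{R})}^2.$$
(Hint: Start with the obvious inequality $\langle(aX + ibD)f, (aX + ibD)f\rangle_{L^2(\mathbb{R})} \ge 0$ for real numbers a, b, and optimise in a and b.) If $\|f\|_{L^2(\mathbb{R})} = 1$, deduce the Heisenberg uncertainty principle
$$\Big(\int_{\mathbb{R}}(\xi-\xi_0)^2|\hat f(\xi)|^2\,d\xi\Big)^{1/2}\Big(\int_{\mathbb{R}}(x-x_0)^2|f(x)|^2\,dx\Big)^{1/2} \ge \frac{1}{4\pi}$$
for any $x_0, \xi_0 \in \mathbb{R}$. (Hint: One can use the translation and modulation symmetries (1.100), (1.101) of the Fourier transform to reduce to the case $x_0 = \xi_0 = 0$.) Classify precisely the f, $x_0$, $\xi_0$ for which equality occurs.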
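The inequality $\|Xf\|\,\|Df\| \ge \frac{1}{4\pi}\|f\|^2$ can be tested numerically for real-valued f in d = 1. Whatever sign is placed in front of $\frac{1}{2\pi i}\frac{d}{dx}$ in the definition of D, one has $\|Df\|_{L^2} = \|f'\|_{L^2}/(2\pi)$, so no Fourier transform is needed. The sketch below (helper names are hypothetical; integrals are approximated by Riemann sums on a uniform grid) finds that the Gaussian $e^{-\pi x^2}$ attains the bound, while $xe^{-\pi x^2}$ gives three times the bound.

```python
import math


def l2_norm(values, step):
    """Riemann-sum approximation of an L^2(R) norm from samples on a uniform grid."""
    return math.sqrt(step * sum(v * v for v in values))


def uncertainty_ratio(f, fprime, half_width=8.0, step=0.01):
    """Return ||Xf|| * ||Df|| / ||f||^2 for a real-valued f on R, using
    ||Df|| = ||f'|| / (2*pi) since D is d/dx times a constant of modulus 1/(2*pi)."""
    n = int(round(2 * half_width / step))
    xs = [-half_width + k * step for k in range(n + 1)]
    norm_f = l2_norm([f(x) for x in xs], step)
    norm_xf = l2_norm([x * f(x) for x in xs], step)
    norm_df = l2_norm([fprime(x) for x in xs], step) / (2 * math.pi)
    return norm_xf * norm_df / norm_f ** 2


gauss_ratio = uncertainty_ratio(lambda x: math.exp(-math.pi * x * x),
                                lambda x: -2 * math.pi * x * math.exp(-math.pi * x * x))
odd_ratio = uncertainty_ratio(lambda x: x * math.exp(-math.pi * x * x),
                              lambda x: (1 - 2 * math.pi * x * x) * math.exp(-math.pi * x * x))
bound = 1 / (4 * math.pi)

print(gauss_ratio / bound, odd_ratio / bound)  # approximately 1 and 3
```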


Remark 1.12.12. For $x_0, \xi_0 \in \mathbb{R}^d$ and R > 0, define the Gaussian wave packet $g_{x_0,\xi_0,R}$ by the formula
$$g_{x_0,\xi_0,R}(x) := 2^{d/4}R^{-d/2}e^{2\pi i\xi_0\cdot x}e^{-\pi|x-x_0|^2/R^2}.$$
These wave packets are normalised to have $L^2$ norm one, and their Fourier transform is given by
$$\hat g_{x_0,\xi_0,R} = e^{2\pi i\xi_0\cdot x_0}g_{\xi_0,-x_0,1/R}. \qquad (1.110)$$
Informally, $g_{x_0,\xi_0,R}$ is localised to the region $x = x_0 + O(R)$ in physical space, and to the region $\xi = \xi_0 + O(1/R)$ in frequency space; observe that this is consistent with the uncertainty principle. These packets almost diagonalise the position and momentum operators X, D in the sense that (taking d = 1 for simplicity)
$$Xg_{x_0,\xi_0,R} \approx x_0g_{x_0,\xi_0,R}, \qquad Dg_{x_0,\xi_0,R} \approx \xi_0g_{x_0,\xi_0,R},$$
where the error terms are morally of the form $O(Rg_{x_0,\xi_0,R})$ and $O(R^{-1}g_{x_0,\xi_0,R})$ respectively. Of course, the non-commutativity of D and X as evidenced by the last equation in (1.109) shows that exact diagonalisation is impossible. Nevertheless it is useful, at an intuitive level at least, to view these wave packets as a sort of (overdetermined) basis for $L^2(\mathbb{R})$ that approximately diagonalises X and D (as well as other formal combinations a(X, D) of these operators, such as differential operators or pseudodifferential operators). Meanwhile, the Fourier transform morally maps the point $(x_0, \xi_0)$ in phase space to $(\xi_0, -x_0)$, as evidenced by (1.110) or (1.109); it is the model example of the more general class of Fourier integral operators, which morally move points in phase space around by canonical transformations. The study of these types of objects (which are of importance in linear PDE) is known as microlocal analysis, and is beyond the scope of this course.

The proof of the Hausdorff-Young inequality (1.103) carries over to the Euclidean space setting, and gives
$$\|\hat f\|_{L^{p'}(\mathbb{R}^d)} \le \|f\|_{L^p(\mathbb{R}^d)} \qquad (1.111)$$
for all 1 ≤ p ≤ 2 and all $f \in L^p(\mathbb{R}^d)$; in particular the Fourier transform is bounded from $L^p(\mathbb{R}^d)$ to $L^{p'}(\mathbb{R}^d)$. The constant of 1 on the right-hand side of (1.111) turns out not to be optimal in the Euclidean setting, in contrast to the compact setting; the sharp constant is in fact $(p^{1/p}/(p')^{1/p'})^{d/2}$, a result of Beckner [Be1975]. (The fact that this constant cannot be improved can be seen by using the Gaussians from Exercise 1.12.32.)


Exercise 1.12.37 (Entropy uncertainty principle). For any $f \in \mathcal{S}(\mathbb{R}^d)$ with $\|f\|_{L^2(\mathbb{R}^d)} = 1$, show that
$$\int_{\mathbb{R}^d}|f(x)|^2\log\frac{1}{|f(x)|^2}\,dx + \int_{\mathbb{R}^d}|\hat f(\xi)|^2\log\frac{1}{|\hat f(\xi)|^2}\,d\xi \ge 0.$$
(Hint: Differentiate (!) (1.111) in p at p = 2, where one has equality in (1.111).) Using Beckner’s improvement to (1.111), improve the right-hand side to the optimal value of $d\log(e/2)$.
Exercise 1.12.38 (Fourier transform under linear changes of variable). Let $L : \mathbb{R}^d \to \mathbb{R}^d$ be an invertible linear transformation. If $f \in \mathcal{S}(\mathbb{R}^d)$ and $f_L(x) := f(Lx)$, show that the Fourier transform of $f_L$ is given by the formula

$\hat f_L(\xi) = \frac{1}{|\det L|} \hat f((L^*)^{-1}\xi),$

where $L^* : \mathbb{R}^d \to \mathbb{R}^d$ is the adjoint operator to $L$. Verify that this transformation is consistent with (1.104) and indeed shows that the exponent $p'$ on the left-hand side cannot be replaced by any other exponent. (One can also establish this latter claim by dimensional analysis.)
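In dimension $d = 1$ an invertible linear map is multiplication by a non-zero scalar $\lambda$, with $L^* = L$ and $|\det L| = |\lambda|$. The formula therefore reads $\hat f_L(\xi) = |\lambda|^{-1} \hat f(\xi/\lambda)$. The minimal sketch below compares both sides numerically. The test function $f(x) = (1 + x) e^{-\pi x^2}$ (for which $\hat f(\xi) = (1 - i\xi) e^{-\pi \xi^2}$), the value $\lambda = -2$, and the quadrature parameters are illustrative assumptions.

```python
import cmath
import math

def fourier_1d(f, xi, R=10.0, h=1e-3):
    # Riemann-sum approximation of f^(xi) = integral of f(x) e^{-2 pi i x xi} dx over [-R, R].
    n = int(round(2 * R / h))
    s = 0j
    for i in range(n + 1):
        x = -R + i * h
        s += f(x) * cmath.exp(-2j * math.pi * x * xi)
    return s * h

f = lambda x: (1 + x) * math.exp(-math.pi * x * x)  # a Schwartz function
lam = -2.0                                           # L x = lam * x, so L* = L, |det L| = 2
f_L = lambda x: f(lam * x)

xi = 0.7
lhs = fourier_1d(f_L, xi)
rhs = fourier_1d(f, xi / lam) / abs(lam)  # (1/|det L|) f^((L*)^{-1} xi)
print(abs(lhs - rhs))  # should be at the level of quadrature/rounding error
```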
Remark 1.12.13. As a corollary of Exercise 1.12.38, observe that if f ∈
S(Rd ) is spherically symmetric (thus f = f ◦ L for all rotation matrices L),
then fˆ is spherically symmetric also.
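Exercise 1.12.39 (Fourier transform intertwines restriction and projection). Let $1 \le r \le d$, and let $f \in \mathcal{S}(\mathbb{R}^d)$. We express $\mathbb{R}^d$ as $\mathbb{R}^r \times \mathbb{R}^{d-r}$ in the obvious manner.
• Restriction becomes projection. If $g \in \mathcal{S}(\mathbb{R}^r)$ is the restriction $g(x) := f(x, 0)$ of $f$ to $\mathbb{R}^r \equiv \mathbb{R}^r \times \{0\}$, show that $\hat g(\xi) = \int_{\mathbb{R}^{d-r}} \hat f(\xi, \eta)\, d\eta$ for all $\xi \in \mathbb{R}^r$.
• Projection becomes restriction. If $h \in \mathcal{S}(\mathbb{R}^r)$ is the projection $h(x) := \int_{\mathbb{R}^{d-r}} f(x, y)\, dy$ of $f$ to $\mathbb{R}^r \equiv \mathbb{R}^d / \mathbb{R}^{d-r}$, show that $\hat h(\xi) = \hat f(\xi, 0)$ for all $\xi \in \mathbb{R}^r$.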
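A finite analogue can be checked exactly. This is an illustration under assumed conventions, not part of the exercise. Take $G = \mathbb{Z}/N\mathbb{Z} \times \mathbb{Z}/M\mathbb{Z}$ with normalised counting measure, so that $\hat f(\xi, \eta) = \frac{1}{NM} \sum_{x,y} f(x, y) e^{-2\pi i (\xi x/N + \eta y/M)}$, and give the dual group counting measure. Then:
• summing $\hat f$ over $\eta$ recovers the transform of the restriction to $y = 0$;
• averaging $f$ over $y$ produces a function whose transform is $\hat f(\cdot, 0)$.

The sizes and the random data are arbitrary.

```python
import cmath
import math
import random

def dft2(f, N, M):
    # Normalised 2-D DFT on Z/N x Z/M (average of f(x, y) e^{-2 pi i (xi x/N + eta y/M)}).
    return [[sum(f[x][y] * cmath.exp(-2j * math.pi * (xi * x / N + eta * y / M))
                 for x in range(N) for y in range(M)) / (N * M)
             for eta in range(M)] for xi in range(N)]

def dft1(g, N):
    # Normalised 1-D DFT on Z/N.
    return [sum(g[x] * cmath.exp(-2j * math.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]

rng = random.Random(1)
N, M = 5, 4
f = [[complex(rng.random(), rng.random()) for _ in range(M)] for _ in range(N)]
F = dft2(f, N, M)

g = [f[x][0] for x in range(N)]        # restriction to Z/N x {0}
h = [sum(f[x]) / M for x in range(N)]  # projection: average over the second factor

restriction_err = max(abs(a - sum(F[xi])) for xi, a in enumerate(dft1(g, N)))
projection_err = max(abs(a - F[xi][0]) for xi, a in enumerate(dft1(h, N)))
print(restriction_err, projection_err)  # both at rounding-error level
```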

Exercise 1.12.40 (Fourier transform on large tori). Let $L > 0$, and let $(\mathbb{R}/L\mathbb{Z})^d$ be the torus of length $L$ with Lebesgue measure $dx$ (thus the total measure of this torus is $L^d$). We identify the Pontryagin dual of this torus with $\frac{1}{L} \cdot \mathbb{Z}^d$ in the usual manner, thus we have the Fourier coefficients

$\hat f(\xi) := \int_{(\mathbb{R}/L\mathbb{Z})^d} f(x) e^{-2\pi i \xi \cdot x}\, dx$

for all $f \in L^1((\mathbb{R}/L\mathbb{Z})^d)$ and $\xi \in \frac{1}{L} \cdot \mathbb{Z}^d$.
• Show that for any $f \in L^2((\mathbb{R}/L\mathbb{Z})^d)$, the Fourier series $\frac{1}{L^d} \sum_{\xi \in \frac{1}{L} \cdot \mathbb{Z}^d} \hat f(\xi) e^{2\pi i \xi \cdot x}$ converges unconditionally in $L^2((\mathbb{R}/L\mathbb{Z})^d)$.
• Use this to give an alternate proof of the Fourier inversion formula (1.108) in the case where $f$ is smooth and compactly supported.
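Exercise 1.12.41 (Poisson summation formula). Let $f \in \mathcal{S}(\mathbb{R}^d)$. Show that the function $F : (\mathbb{R}/\mathbb{Z})^d \to \mathbb{C}$ defined by $F(x + \mathbb{Z}^d) := \sum_{n \in \mathbb{Z}^d} f(x + n)$ has Fourier transform $\hat F(\xi) = \hat f(\xi)$ for all $\xi \in \mathbb{Z}^d \subset \mathbb{R}^d$ (note the two different Fourier transforms in play here). Conclude the Poisson summation formula

$\sum_{n \in \mathbb{Z}^d} f(n) = \sum_{m \in \mathbb{Z}^d} \hat f(m).$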
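Take $d = 1$ and the Gaussian $f(x) = e^{-\pi t x^2}$ with $t > 0$, for which $\hat f(\xi) = t^{-1/2} e^{-\pi \xi^2/t}$. The Poisson summation formula then becomes the classical theta-function identity $\sum_n e^{-\pi t n^2} = t^{-1/2} \sum_m e^{-\pi m^2/t}$. The quick numerical sketch below checks this, truncating both rapidly convergent sums at $|n| \le 50$ (an arbitrary cutoff).

```python
import math

def poisson_sides(t, cutoff=50):
    # Left: sum of f(n) = e^{-pi t n^2}.  Right: sum of f^(m) = t^{-1/2} e^{-pi m^2 / t}.
    lhs = sum(math.exp(-math.pi * t * n * n) for n in range(-cutoff, cutoff + 1))
    rhs = sum(math.exp(-math.pi * m * m / t) for m in range(-cutoff, cutoff + 1)) / math.sqrt(t)
    return lhs, rhs

for t in (0.3, 1.0, 2.5):
    print(t, *poisson_sides(t))
```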

Exercise 1.12.42. Let $f : \mathbb{R}^d \to \mathbb{C}$ be a compactly supported, absolutely integrable function. Show that the function $\hat f$ is real-analytic. Conclude that it is not possible to find a non-trivial $f \in L^1(\mathbb{R}^d)$ such that $f$ and $\hat f$ are both compactly supported.

1.12.4. The Fourier transform on general groups (optional). The field of abstract harmonic analysis is concerned, among other things, with extensions of the above theory to more general groups, for instance arbitrary LCA groups. One of the ways to proceed is via Gelfand theory, which for instance can be used to show that the Fourier transform is at least injective:
Exercise 1.12.43 (Fourier analysis via Gelfand theory). (Optional) In this exercise we use the Gelfand theory of commutative Banach *-algebras (see Section 1.10.4) to establish some basic facts of Fourier analysis in general groups. Let $G$ be an LCA group. We view $L^1(G)$ as a commutative Banach *-algebra (see Exercise 1.12.13).
(a) If $f \in L^1(G)$ is such that $\liminf_{n \to \infty} \|f^{*n}\|_{L^1(G)}^{1/n} > 0$, where $f^{*n} = f * \cdots * f$ is the convolution of $n$ copies of $f$, show that there exists a non-zero complex number $z$ such that the map $g \mapsto f * g - zg$ is not invertible on $L^1(G)$. (Hint: If $L^1(G)$ contains a unit, one can use Exercise 1.10.36; otherwise, adjoin a unit.)
(b) If $f$ and $z$ are as in (a), show that there exists a character $\lambda : L^1(G) \to \mathbb{C}$ (in the sense of Banach *-algebras, see Definition 1.10.25) such that $f * g - zg$ lies in the kernel of $\lambda$ for all $g \in L^1(G)$. Conclude in particular that $\lambda(f)$ is non-zero.
(c) If $\lambda : L^1(G) \to \mathbb{C}$ is a character, show that there exists a multiplicative character $\chi : G \to S^1$ such that $\lambda(f) = \langle f, \chi \rangle$ for all $f \in L^1(G)$. (You will need Exercise 1.12.5 and Exercise 1.12.10.)
(d) For any $f \in L^1(G)$ and $g \in L^2(G)$, show that $|f * g * g^*(0)| \le |f * f^* * g * g^*(0)|^{1/2} |g * g^*(0)|^{1/2}$, where $0$ is the group identity and $f^*(x) := \overline{f(-x)}$ is the conjugate of $f$. (Hint: The inner product $\langle f_1, f_2 \rangle_g := f_1 * f_2^* * g * g^*(0)$ is positive semidefinite.)


(e) Show that if $f \in L^1(G)$ is not identically zero, then there exists $\xi \in \hat G$ such that $\hat f(\xi) \ne 0$. (Hint: First find $g \in L^2(G)$ such that $f * g * g^*(0) \ne 0$ and $g * g^*(0) \ne 0$, and conclude using (d) repeatedly that $\liminf_{n \to \infty} \|(f * f^*)^{*n}\|_{L^1(G)}^{1/n} > 0$. Then use (a), (b), (c).) Conclude that the Fourier transform is injective on $L^1(G)$. (The image of $L^1(G)$ under the Fourier transform is then a Banach *-algebra known as the Wiener algebra, and is denoted $A(\hat G)$.)
(f) Prove Theorem 1.12.4.
It is possible to use arguments similar to those in Exercise 1.12.43 to
characterise positive measures on Ĝ in terms of continuous functions on G,
leading to Bochner’s theorem:
Theorem 1.12.14 (Bochner's theorem). Let $\phi \in C(G)$ be a continuous function on an LCA group $G$. Then the following are equivalent:
(a) $\sum_{n=1}^N \sum_{m=1}^N c_n \overline{c_m} \phi(x_n - x_m) \ge 0$ for all $x_1, \ldots, x_N \in G$ and $c_1, \ldots, c_N \in \mathbb{C}$.
(b) There exists a non-negative finite Radon measure $\nu$ on $\hat G$ such that $\phi(x) = \int_{\hat G} e^{2\pi i \xi \cdot x}\, d\nu(\xi)$.
Functions obeying either (a) or (b) are known as positive-definite func-
tions. The space of such functions is denoted B(G).
Exercise 1.12.44. Show that (b) implies (a) in Bochner's theorem. (The converse implication is significantly harder, reprising much of the machinery in Exercise 1.12.43, but with $\phi$ taking the place of $g * g^*$; see Rudin [Ru1962] for details.)
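Condition (a) of Bochner's theorem can be probed numerically. In the minimal sketch below, $G = \mathbb{R}$ and the functions, points and coefficients are illustrative choices.
• The Gaussian $\phi(x) = e^{-\pi x^2}$ has the form in (b), with $d\nu(\xi) = e^{-\pi \xi^2}\, d\xi$. Every quadratic form in (a) should therefore be non-negative.
• The indicator of $[-1, 1]$ has Fourier transform $\sin(2\pi\xi)/(\pi\xi)$, which changes sign. With three suitably chosen points it produces a negative value.

```python
import math
import random

def quadratic_form(phi, xs, cs):
    # The double sum over n, m of c_n * conj(c_m) * phi(x_n - x_m) from condition (a).
    return sum(cn * cm.conjugate() * phi(xn - xm)
               for xn, cn in zip(xs, cs) for xm, cm in zip(xs, cs))

gaussian = lambda x: math.exp(-math.pi * x * x)  # transform of the positive density e^{-pi xi^2}
box = lambda x: 1.0 if abs(x) <= 1 else 0.0      # its transform sin(2 pi xi)/(pi xi) changes sign

rng = random.Random(0)
worst = min(
    quadratic_form(gaussian,
                   [rng.uniform(-3, 3) for _ in range(8)],
                   [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]).real
    for _ in range(200))
bad = quadratic_form(box, [0.0, 0.6, 1.2], [1.0, -math.sqrt(2), 1.0])
print(worst, bad)  # worst is >= 0 up to rounding; bad equals 4 - 4*sqrt(2) < 0
```

The three-point example works because the $3 \times 3$ matrix $(\phi(x_n - x_m))$ for the box is $\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$. This matrix has the negative eigenvalue $1 - \sqrt{2}$, with eigenvector $(1, -\sqrt{2}, 1)$.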
Using Bochner's theorem, it is possible to show the following:
Theorem 1.12.15 (Plancherel's theorem for LCA groups). Let $G$ be an LCA group with non-trivial Haar measure $\mu$. Then there exists a non-trivial Haar measure $\nu$ on $\hat G$ such that the Fourier transform on $L^1(G) \cap L^2(G)$ can be extended continuously to a unitary transformation from $L^2(G)$ to $L^2(\hat G)$. In particular we have the Plancherel identity

$\int_G |f(x)|^2\, d\mu(x) = \int_{\hat G} |\hat f(\xi)|^2\, d\nu(\xi)$

for all $f \in L^2(G)$ and the Parseval identity

$\int_G f(x) \overline{g(x)}\, d\mu(x) = \int_{\hat G} \hat f(\xi) \overline{\hat g(\xi)}\, d\nu(\xi)$

for all $f, g \in L^2(G)$. Furthermore, the inversion formula

$f(x) = \int_{\hat G} \hat f(\xi) e^{2\pi i \xi \cdot x}\, d\nu(\xi)$

is valid for $f$ in a dense subclass of $L^2(G)$ (in particular, it is valid for $f \in L^1(G) \cap B(G)$).

Again, see Rudin [Ru1962] for details. A related result is that of Pontryagin duality: if $\hat G$ is the Pontryagin dual of an LCA group $G$, then $G$ is the Pontryagin dual of $\hat G$. (Certainly, every element $x \in G$ defines a character $\hat x : \xi \mapsto \xi \cdot x$ on $\hat G$, thus embedding $G$ into $\hat{\hat G}$ via the Gelfand transform (see Section 1.10.4). The non-trivial fact is that this embedding is in fact surjective.) One can use Pontryagin duality to convert various properties of an LCA group into dual properties of its Pontryagin dual. For instance, we have already seen that $\hat G$ is compact (resp. discrete) if $G$ is discrete (resp. compact); with Pontryagin duality, the implications can now also be reversed. As another example, one can show that if $G$ is discrete (resp. compact), then $\hat G$ is connected (resp. torsion-free) if and only if $G$ is torsion-free (resp. connected). We will not prove these assertions here.
It is natural to ask what happens for non-abelian locally compact groups
G = (G, ·). One can still build non-trivial Haar measures (the proof sketched
out in Exercise 1.12.7 extends without difficulty to the non-abelian setting),
though one must now distinguish between left-invariant and right-invariant
Haar measures. (The two notions are equivalent for some classes of groups,
notably compact groups, but not in general. Groups for which the two
notions of Haar measures coincide are called unimodular.) However, when
G is non-abelian then there are not enough multiplicative characters χ :
G → S 1 to have a satisfactory Fourier analysis. (Indeed, such characters
must annihilate the commutator group [G, G], and it is entirely possible for
this commutator group to be all of G, e.g., if G is simple and non-abelian.)
Instead, one must generalise the notion of a multiplicative character to that
of a unitary representation ρ : G → U (H) from G to the group of unitary
transformations on a complex Hilbert space H; thus the Fourier coefficients
fˆ(ρ) of a function will now be operators on this Hilbert space H, rather than
complex numbers. When G is a compact group, it turns out to be possible
to restrict our attention to finite-dimensional representations (thus one can
replace U (H) by the matrix group U (n) for some n). The analogue of the
Pontryagin dual Ĝ is then the collection of (irreducible) finite-dimensional
unitary representations of G, up to isomorphism. There is an analogue of
the Plancherel theorem in this setting, closely related to the Peter-Weyl
theorem in representation theory. We will not discuss these topics here, but
refer the reader instead to any representation theory text.
The situation for non-compact non-abelian groups (e.g., SL2 (R)) is sig-
nificantly more subtle, as one must now consider infinite-dimensional repre-
sentations as well as finite-dimensional ones, and the inversion formula can
become quite non-trivial (one has to decide what weight each representation should be assigned in that formula). At this point it seems unprofitable to continue working in the full category of locally compact groups, and one instead specialises to a more structured class of groups, e.g., algebraic groups. The representation theory of such groups is a massive subject and well beyond the scope of this course.

1.12.5. Relatives of the Fourier transform (optional). There are a number of other Fourier-like transforms used in mathematics, which we will briefly survey here. First, there are some rather trivial modifications one can make to the definition of Fourier transform, for instance by replacing the complex exponential $e^{2\pi i x}$ by trigonometric functions such as $\sin(x)$ and $\cos(x)$, or moving around the various factors of $2\pi$, $i$, $-1$, etc. in the definition. In this spirit, we have the Laplace transform

(1.112) $\mathcal{L}f(t) := \int_0^\infty f(s) e^{-st}\, ds$

of a measurable function $f : [0, +\infty) \to \mathbb{R}$ with some reasonable growth at infinity, where $t > 0$. Roughly speaking, the Laplace transform is the Fourier transform without the $i$ (cf. Wick rotation), and so has the (mild) advantage of being definable in the realm of real-valued functions rather than complex-valued functions. It is particularly well suited for studying ODE on the half-line $[0, +\infty)$ (e.g., initial value problems for a finite-dimensional system). The Laplace transform and Fourier transform can be unified by allowing the $t$ parameter in (1.112) to vary in the right half-plane $\{t \in \mathbb{C} : \mathrm{Re}(t) \ge 0\}$.
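Integration by parts gives the rule $\mathcal{L}(f')(t) = t\, \mathcal{L}f(t) - f(0)$, valid when $f(s) e^{-st} \to 0$ as $s \to \infty$. This rule is what makes the transform useful for initial value problems: it turns differentiation into multiplication by $t$. The minimal numerical sketch below assumes the test function $f(s) = e^{-2s} \cos(3s)$, for which $\mathcal{L}f(t) = (t + 2)/((t + 2)^2 + 9)$. It truncates the integral to $[0, 40]$ and uses Simpson's rule; both are arbitrary choices.

```python
import math

def laplace(f, t, S=40.0, n=40000):
    # Composite Simpson's rule for the integral of f(s) e^{-ts} over [0, S] (n even);
    # the tail beyond S is negligible for the rapidly decaying integrands used here.
    h = S / n
    total = f(0.0) + f(S) * math.exp(-t * S)
    for k in range(1, n):
        s = k * h
        total += (4 if k % 2 else 2) * f(s) * math.exp(-t * s)
    return total * h / 3

f = lambda s: math.exp(-2 * s) * math.cos(3 * s)
df = lambda s: math.exp(-2 * s) * (-2 * math.cos(3 * s) - 3 * math.sin(3 * s))

t = 1.5
print(laplace(f, t), (t + 2) / ((t + 2) ** 2 + 9))  # numerical transform vs closed form
print(laplace(df, t), t * laplace(f, t) - f(0.0))   # L(f')(t) vs t Lf(t) - f(0)
```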
When the Fourier transform is applied to a spherically symmetric function $f(x) := F(|x|)$ on $\mathbb{R}^d$, then the Fourier transform is also spherically symmetric, given by the formula $\hat f(\xi) = G(|\xi|)$, where $G$ is the Fourier-Bessel transform (or Hankel transform)

$G(r) := 2\pi r^{-(d-2)/2} \int_0^\infty F(s) J_{(d-2)/2}(2\pi r s)\, s^{d/2}\, ds,$

and $J_\nu$ is the Bessel function of the first kind with index $\nu$. In practice, one can then analyse the Fourier-analytic behaviour of spherically symmetric functions in terms of one-dimensional Fourier-like integrals by using various asymptotic expansions of the Bessel function.
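In dimension $d = 3$ the Bessel function is elementary: $J_{1/2}(z) = (2/(\pi z))^{1/2} \sin z$. Substituting this into $G(r) = 2\pi r^{-(d-2)/2} \int_0^\infty F(s) J_{(d-2)/2}(2\pi r s)\, s^{d/2}\, ds$ simplifies it to $G(r) = \frac{2}{r} \int_0^\infty F(s)\, s \sin(2\pi r s)\, ds$. The minimal numerical sketch below assumes the radial Gaussian $F(s) = e^{-\pi s^2}$, whose Fourier transform on $\mathbb{R}^3$ is again $e^{-\pi |\xi|^2}$. It truncates the integral to $[0, 8]$ and checks that $G(r) = e^{-\pi r^2}$.

```python
import math

def simpson(g, a, b, n=8000):
    # Composite Simpson's rule on [a, b] with n (even) subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def hankel_3d(F, r, S=8.0):
    # d = 3 Fourier-Bessel transform: G(r) = (2/r) * integral of F(s) s sin(2 pi r s) over [0, S].
    return 2.0 / r * simpson(lambda s: F(s) * s * math.sin(2 * math.pi * r * s), 0.0, S)

F = lambda s: math.exp(-math.pi * s * s)
for r in (0.25, 0.8, 1.3):
    print(r, hankel_3d(F, r), math.exp(-math.pi * r * r))
```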
There is a relationship between the $d$-dimensional Fourier transform and the one-dimensional Fourier transform, provided by the Radon transform, defined for $f \in \mathcal{S}(\mathbb{R}^d)$ (say) by the formula

$Rf(\omega, t) := \int_{x \cdot \omega = t} f,$

where $\omega \in S^{d-1}$, $t \in \mathbb{R}$, and the integration is with respect to $(d-1)$-dimensional measure. Indeed one checks that the $d$-dimensional Fourier transform of $f$ at $r\omega$ for some $r > 0$ and $\omega \in S^{d-1}$ is nothing more than the one-dimensional Fourier transform of the function $t \mapsto Rf(\omega, t)$, evaluated at $r$. The Radon transform is often used in scattering theory and related areas of analysis, geometry, and physics.
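The relation between the Fourier and Radon transforms is often called the Fourier slice (or projection-slice) theorem. For $f \in \mathcal{S}(\mathbb{R}^d)$ it takes one line:
• decompose $\mathbb{R}^d$ into the hyperplanes $\{x \cdot \omega = t\}$, on each of which the phase $e^{-2\pi i r\, x \cdot \omega} = e^{-2\pi i r t}$ is constant;
• apply Fubini's theorem. Because $\omega$ is a unit vector, Lebesgue measure splits exactly as $dt$ times surface measure on each hyperplane.

```latex
\hat f(r\omega)
  = \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i r\, x\cdot\omega}\, dx
  = \int_{\mathbb{R}} \Big( \int_{x\cdot\omega = t} f \Big)\, e^{-2\pi i r t}\, dt
  = \int_{\mathbb{R}} Rf(\omega, t)\, e^{-2\pi i r t}\, dt .
```

The right-hand side is the one-dimensional Fourier transform of $t \mapsto Rf(\omega, t)$ at $r$.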
In analytic number theory, a multiplicative version of the Fourier-Laplace transform is often used, namely the Mellin transform

$\mathcal{M}f(s) := \int_0^\infty x^s f(x)\, \frac{dx}{x}.$

(Note that $\frac{dx}{x}$ is a Haar measure for the multiplicative group $\mathbb{R}^+ = (0, +\infty)$.) To see the relation with the Fourier-Laplace transform, write $f(x) = F(\log x)$; then the substitution $x = e^t$ turns the Mellin transform into

$\mathcal{M}f(s) = \int_{\mathbb{R}} e^{st} F(t)\, dt.$

Many functions of importance in analytic number theory, such as the Gamma function or the zeta function, can be expressed neatly in terms of Mellin transforms.
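Euler's integral $\Gamma(s) = \int_0^\infty x^s e^{-x} \frac{dx}{x}$ says precisely that $\Gamma$ is the Mellin transform of $f(x) = e^{-x}$. The minimal numerical sketch below checks this through the logarithmic substitution $x = e^t$. In that substitution $f(x) = F(\log x)$ with $F(t) = e^{-e^t}$, and the transform becomes $\int_{\mathbb{R}} e^{st} F(t)\, dt$. The truncation of the $t$-integral to $[-60, 5]$ is an assumption, adequate for the moderate real values of $s$ used here.

```python
import math

def mellin_via_log(f, s, T0=-60.0, T1=5.0, n=65000):
    # M f(s) = integral of x^s f(x) dx/x over (0, inf), computed as the integral of
    # e^{st} f(e^t) dt over [T0, T1] by composite Simpson's rule (n even).
    h = (T1 - T0) / n
    g = lambda t: math.exp(s * t) * f(math.exp(t))
    total = g(T0) + g(T1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(T0 + k * h)
    return total * h / 3

for s in (0.7, 2.5, 4.0):
    print(s, mellin_via_log(lambda x: math.exp(-x), s), math.gamma(s))
```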
In electrical engineering and signal processing, the z-transform is often used, transforming a sequence $c = (c_n)_{n=-\infty}^{\infty}$ of complex numbers to a formal Laurent series

$Zc(z) := \sum_{n=-\infty}^{\infty} c_n z^n$

(some authors use $z^{-n}$ instead of $z^n$ here). If one makes the substitution $z = e^{2\pi i x}$, then this becomes a (formal) Fourier series expansion on the unit circle. If the sequence $c_n$ is restricted to be non-zero only for non-negative $n$ and does not grow too quickly as $n \to \infty$, then the z-transform becomes holomorphic on the unit disk, thus providing a link between Fourier analysis and complex analysis. For instance, the standard formula

$c_n = \frac{1}{2\pi i} \int_{|z|=1} \frac{f(z)}{z^{n+1}}\, dz$

for the Taylor coefficients of a holomorphic function $f(z) = \sum_{n=0}^\infty c_n z^n$ at the origin can be viewed as a version of the Fourier inversion formula for the torus $\mathbb{R}/\mathbb{Z}$. Just as the Fourier or Laplace transforms are useful for analysing differential equations in continuous settings, the z-transform is useful for analysing difference equations in discrete settings. The z-transform is of course also very similar to the method of generating functions in combinatorics and probability.
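The connection with Fourier analysis can be made concrete. With $z = e^{2\pi i x}$ one has $dz = 2\pi i z\, dx$, so $\frac{1}{2\pi i} \int_{|z|=1} f(z) z^{-n-1}\, dz = \int_0^1 f(e^{2\pi i x}) e^{-2\pi i n x}\, dx$. This is the $n$-th Fourier coefficient of $x \mapsto f(e^{2\pi i x})$ on $\mathbb{R}/\mathbb{Z}$. Sampling at the $N$ points $x = k/N$ turns it into a normalised discrete Fourier transform on $\mathbb{Z}/N\mathbb{Z}$.

The minimal sketch below assumes the example $f(z) = 1/(1 - z/2)$, whose Taylor coefficients are $c_n = 2^{-n}$. The sample count $N = 64$ is arbitrary; aliasing makes the discretisation error of order $2^{-N}$ here.

```python
import cmath
import math

def taylor_coefficients(f, N=64, count=8):
    # c_n = (1/(2 pi i)) * contour integral of f(z) z^{-n-1} over |z| = 1
    #     = integral over [0, 1] of f(e^{2 pi i x}) e^{-2 pi i n x} dx,
    # approximated by the N-point Riemann sum (a normalised DFT of the samples).
    samples = [f(cmath.exp(2j * math.pi * k / N)) for k in range(N)]
    return [sum(samples[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N)) / N
            for n in range(count)]

coeffs = taylor_coefficients(lambda z: 1 / (1 - z / 2))
print([round(c.real, 12) for c in coeffs])  # should be the powers 2^{-n}
```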
In probability theory one also considers the characteristic function $\mathbb{E}(e^{itX})$ of a real-valued random variable $X$; this is essentially the Fourier transform of the probability distribution of $X$. Just as the Fourier transform is useful for understanding convolutions $f * g$, the characteristic function is useful for understanding sums $X_1 + X_2$ of independent random variables.
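The minimal sketch below uses two independent fair dice (an illustrative choice). The distribution of $X_1 + X_2$ is the convolution of the two individual distributions, and its characteristic function is the product of the individual characteristic functions.

```python
import cmath

die = {k: 1 / 6 for k in range(1, 7)}  # distribution of a fair die

def convolve(p, q):
    # Distribution of X1 + X2 for independent X1 ~ p and X2 ~ q.
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def char_fn(p, t):
    # E(e^{itX}) for a finitely supported distribution p.
    return sum(pk * cmath.exp(1j * t * k) for k, pk in p.items())

total = convolve(die, die)
for t in (0.3, 1.0, 2.7):
    print(t, abs(char_fn(total, t) - char_fn(die, t) ** 2))  # rounding-level differences
```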
We have briefly touched upon the role of Gelfand theory in the general theory of the Fourier transform. Indeed, one can view the Fourier transform as a special case of the Gelfand transform for Banach *-algebras, which we already discussed in Section 1.10.4.
The Fast Fourier Transform (FFT) is not, strictly speaking, a variant of the Fourier transform, but rather an efficient algorithm for computing the Fourier transform

$\hat f(\xi) = \frac{1}{N} \sum_{x=0}^{N-1} f(x) e^{-2\pi i \xi x/N}$

on a cyclic group $\mathbb{Z}/N\mathbb{Z} \equiv \{0, \ldots, N-1\}$, when $N$ is large but composite. Note that a brute force computation of this transform for all $N$ values of $\xi$ would require about $O(N^2)$ addition and multiplication operations. The FFT algorithm, in contrast, takes only $O(N \log N)$ operations, and is based on reducing the FFT for a large $N$ to the FFT for smaller $N$. For instance, suppose $N$ is even, say $N = 2M$; then observe that

$\hat f(\xi) = \frac{1}{2}\left(\hat f_0(\xi) + e^{-2\pi i \xi/N} \hat f_1(\xi)\right),$

where $f_0, f_1 : \mathbb{Z}/M\mathbb{Z} \to \mathbb{C}$ are the functions $f_j(x) := f(2x + j)$ (and $\xi$ is reduced modulo $M$ when evaluating $\hat f_0, \hat f_1$). Thus one can obtain the Fourier transform of the length $N$ vector $f$ from the Fourier transforms of the two length $M$ vectors $f_0, f_1$ after about $O(N)$ operations. Iterating this, we see that we can indeed compute $\hat f$ in $O(N \log N)$ operations, at least in the model case when $N$ is a power of two; the general case has a similar but more complicated analysis.
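The halving step can be turned into a short recursive program. Below is a minimal radix-2 sketch that uses the same normalisation $\hat f(\xi) = \frac{1}{N} \sum_x f(x) e^{-2\pi i \xi x/N}$ and assumes $N$ is a power of two. It is an illustration, not an optimised implementation, and it is compared against the brute-force $O(N^2)$ sum. The sub-transforms $\hat f_0, \hat f_1$ live on $\mathbb{Z}/M\mathbb{Z}$, so they are evaluated at $\xi \bmod M$.

```python
import cmath
import math
import random

def dft(f):
    # Brute-force O(N^2) normalised DFT, for comparison.
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]

def fft(f):
    # Radix-2 FFT with the same normalisation; len(f) must be a power of two.
    N = len(f)
    if N == 1:
        return [f[0]]
    M = N // 2
    f0_hat = fft(f[0::2])  # transform of f0(x) = f(2x)
    f1_hat = fft(f[1::2])  # transform of f1(x) = f(2x + 1)
    # f^(xi) = (f0^(xi) + e^{-2 pi i xi / N} f1^(xi)) / 2, with f0^, f1^ M-periodic in xi
    return [(f0_hat[xi % M] + cmath.exp(-2j * math.pi * xi / N) * f1_hat[xi % M]) / 2
            for xi in range(N)]

rng = random.Random(0)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(64)]
print(max(abs(a - b) for a, b in zip(fft(f), dft(f))))  # rounding-error level
```

Each level of the recursion does $O(N)$ work and there are $\log_2 N$ levels, which gives the $O(N \log N)$ count.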
In many situations (particularly in ergodic theory), it is desirable not
to perform Fourier analysis on a group G directly, but instead on another
space X that G acts on. Suppose for instance that G is a compact abelian
group, with probability Haar measure dg, which acts in a measure-preserving
(and measurable) fashion on a probability space $(X, \mu)$. Then one can decompose any $f \in L^2(X)$ into Fourier components $f = \sum_{\xi \in \hat G} f_\xi$, where $f_\xi(x) := \int_G e^{-2\pi i \xi \cdot g} f(gx)\, dg$, and where the series is unconditionally convergent in $L^2(X)$. The reason for doing this is that each of the $f_\xi$ behaves in a simple way with respect to the group action; indeed one has $f_\xi(gx) = e^{2\pi i \xi \cdot g} f_\xi(x)$ for (almost) all $g \in G$, $x \in X$. This decomposition is closely related to the decomposition in representation theory of a given representation into irreducible components. Perhaps the most basic example of this type of operation is the decomposition of a function $f : \mathbb{R} \to \mathbb{R}$ into even and odd components $\frac{f(x) + f(-x)}{2}$, $\frac{f(x) - f(-x)}{2}$; here the underlying group is $\mathbb{Z}/2\mathbb{Z}$, which acts on $\mathbb{R}$ by reflections, $gx := (-1)^g x$.
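The decomposition can be tried on a finite example, which is our own illustrative special case. Let $G = \mathbb{Z}/N\mathbb{Z}$, with probability Haar measure given by averaging, act on the complex plane by rotations $g \cdot x := e^{2\pi i g/N} x$. Identify $\hat G$ with $\mathbb{Z}/N\mathbb{Z}$, so that $\xi \cdot g = \xi g / N$. Then $f_\xi(x) = \frac{1}{N} \sum_g e^{-2\pi i \xi g/N} f(g \cdot x)$, and for $N = 2$ this is exactly the even/odd decomposition. The test function $f$ below is arbitrary. The sketch checks two things: the components add back up to $f$, and each component transforms by its character.

```python
import cmath
import math

def rotate(x, g, N):
    # The action of g in Z/NZ on the complex plane: rotation by the angle 2*pi*g/N.
    return x * cmath.exp(2j * math.pi * (g % N) / N)

def component(f, xi, N):
    # f_xi(x) = (1/N) * sum over g of e^{-2 pi i xi g / N} f(g x)  (Haar probability measure).
    return lambda x: sum(cmath.exp(-2j * math.pi * xi * g / N) * f(rotate(x, g, N))
                         for g in range(N)) / N

f = lambda x: x * x + 3 * x.conjugate() + cmath.exp(x.real)  # an arbitrary test function
N, x0 = 4, complex(0.7, -0.4)
parts = [component(f, xi, N) for xi in range(N)]

# The components add back up to f ...
print(abs(sum(p(x0) for p in parts) - f(x0)))
# ... and each one transforms by a character: f_xi(g x) = e^{2 pi i xi g / N} f_xi(x), here g = 1.
for xi, p in enumerate(parts):
    print(xi, abs(p(rotate(x0, 1, N)) - cmath.exp(2j * math.pi * xi / N) * p(x0)))
```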


The operation of converting a square matrix $A = (a_{ij})_{1 \le i,j \le n}$ of numbers into eigenvalues $\lambda_1, \ldots, \lambda_n$ or singular values $\sigma_1, \ldots, \sigma_n$ can be viewed as a sort of non-commutative generalisation of the Fourier transform. (Note that the eigenvalues of a circulant matrix are essentially the Fourier coefficients of the first row of that matrix.) For instance, the identity $\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2 = \sum_{k=1}^n \sigma_k^2$ can be viewed as a variant of the Plancherel identity. More generally, there are close relationships between spectral theory and Fourier analysis (as one can already see from the connection to Gelfand theory). For instance, in $\mathbb{R}^d$ and $\mathbb{T}^d$, one can view Fourier analysis as the spectral theory of the gradient operator $\nabla$ (note that the characters $e^{2\pi i \xi \cdot x}$ are joint eigenfunctions of $\nabla$). As the gradient operator is closely related to the
Laplacian Δ, it is not surprising that Fourier analysis is also closely re-
lated to the spectral theory of the Laplacian, and in particular to various
operators built using the Laplacian (e.g., resolvents, heat kernels, wave op-
erators, Schrödinger operators, Littlewood-Paley projections, etc.) Indeed,
the spectral theory of the Laplacian can serve as a partial substitute for the
Fourier transform in situations in which there is not enough symmetry to
exploit Fourier-analytic techniques (e.g., on a manifold with no translation
symmetries).
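The remarks about circulant matrices and the singular-value identity can be checked directly. In the sketch below, the size $n = 5$ and the random first row are arbitrary. The circulant matrix with first row $(c_0, \ldots, c_{n-1})$ has entries $a_{jk} = c_{(k-j) \bmod n}$. The vectors $v_\xi = (e^{2\pi i \xi k/n})_k$ are eigenvectors, with eigenvalues $\lambda_\xi = \sum_m c_m e^{2\pi i \xi m/n}$; this is $n$ times the normalised Fourier coefficient of the first row at $-\xi$. A circulant matrix is normal, so its singular values are the $|\lambda_\xi|$, and the identity $\sum |a_{jk}|^2 = \sum_\xi |\lambda_\xi|^2$ then reduces to Plancherel on $\mathbb{Z}/n\mathbb{Z}$.

```python
import cmath
import math
import random

rng = random.Random(2)
n = 5
c = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
A = [[c[(k - j) % n] for k in range(n)] for j in range(n)]  # a_jk = c_{(k - j) mod n}

def eigenpair(xi):
    # v_xi = (e^{2 pi i xi k / n})_k and lambda_xi = sum_m c_m e^{2 pi i xi m / n}.
    v = [cmath.exp(2j * math.pi * xi * k / n) for k in range(n)]
    lam = sum(c[m] * cmath.exp(2j * math.pi * xi * m / n) for m in range(n))
    return v, lam

residual = 0.0
for xi in range(n):
    v, lam = eigenpair(xi)
    Av = [sum(A[j][k] * v[k] for k in range(n)) for j in range(n)]
    residual = max(residual, max(abs(Av[j] - lam * v[j]) for j in range(n)))

frobenius = sum(abs(a) ** 2 for row in A for a in row)         # sum of |a_jk|^2
spectral = sum(abs(eigenpair(xi)[1]) ** 2 for xi in range(n))  # sum of |lambda_xi|^2
print(residual, frobenius, spectral)  # residual ~ 0 and the two sums agree
```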
Finally, there is an analogue of the Fourier duality relationship between
an LCA group G and its Pontryagin dual Ĝ in algebraic geometry, known
as the Fourier-Mukai transform, which relates an abelian variety X to its
dual X̂, and transforms coherent sheaves on the former to coherent sheaves
on the latter. This transform obeys many of the algebraic identities that
the Fourier transform does, although it does not seem to have much of the
analytic structure.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/04/06. Thanks to Hunter, Marco Frasca, Max Baroi, PDEbeginner, timur, Xiaochuan Liu, and anonymous commenters for corrections.

Section 1.13

Distributions

In set theory, a function f : X → Y is defined as an object that evaluates every input x to exactly one output f(x). However, in various branches of
mathematics, it has become convenient to generalise this classical concept of
a function to a more abstract one. For instance, in operator algebras, quan-
tum mechanics, or non-commutative geometry, one often replaces commuta-
tive algebras of (real or complex-valued) functions on some space X, such
as C(X) or L∞ (X), with a more general—and possibly non-commutative—
algebra (e.g., a C ∗ -algebra or a von Neumann algebra). Elements in this
more abstract algebra are no longer definable as functions in the classical
sense of assigning a single value f (x) to every point x ∈ X, but one can still
define other operations on these generalised functions (e.g., one can multiply
or take inner products between two such objects).
Generalisations of functions are also very useful in analysis. In our study
of Lp spaces, we have already seen one such generalisation, namely the con-
cept of a function defined up to almost everywhere equivalence. Such a
function f (or more precisely, an equivalence class of classical functions)
cannot be evaluated at any given point x if that point has measure zero.
However, it is still possible to perform algebraic operations on such func-
tions (e.g., multiplying or adding two functions together), and one can also
integrate such functions on measurable sets (provided, of course, that the
function has some suitable integrability condition). We also know that the $L^p$ spaces can usually be described via duality, as the dual space of $L^{p'}$ (except in some endpoint cases, namely when $p = \infty$, or when $p = 1$ and the underlying space is not σ-finite).


We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions $f \in L^1_{\mathrm{loc}}(\mathbb{R})$ on, say, the real line $\mathbb{R}$, can be identified with locally finite absolutely continuous measures $m_f$ on the line,
by multiplying Lebesgue measure m by the function f . So another way
to generalise the concept of a function is to consider arbitrary locally fi-
nite Radon measures μ (not necessarily absolutely continuous), such as the
Dirac measure δ0 . With this concept of generalised function, one can still
add and subtract two measures μ, ν, and integrate any measure μ against a
(bounded) measurable set E to obtain a number μ(E), but one cannot eval-
uate a measure μ (or more precisely, the Radon-Nikodym derivative dμ/dm
of that measure) at a single point x, and one also cannot multiply two mea-
sures together to obtain another measure. From the Riesz representation
theorem, we also know that the space of (finite) Radon measures can be
described via duality, as linear functionals on Cc (R).
There is an even larger class of generalised functions that is very use-
ful, particularly in linear PDE, namely the space of distributions, say on
a Euclidean space Rd . In contrast to Radon measures μ, which can be
defined by how they pair up against continuous, compactly supported test functions $f \in C_c(\mathbb{R}^d)$ to create numbers $\langle f, \mu \rangle := \int_{\mathbb{R}^d} f\, d\mu$, a distribution $\lambda$ is defined by how it pairs up against a smooth compactly supported function $f \in C_c^\infty(\mathbb{R}^d)$ to create a number $\langle f, \lambda \rangle$. As the space Cc∞ (Rd )
of smooth compactly supported functions is smaller than (but dense in)
the space Cc (Rd ) of continuous compactly supported functions (and has a
stronger topology), the space of distributions is larger than that of measures.
But the space Cc∞ (Rd ) is closed under more operations than Cc (Rd ), and in
particular is closed under differential operators (with smooth coefficients).
Because of this, the space of distributions is similarly closed under such op-
erations; in particular, one can differentiate a distribution and get another
distribution, which is something that is not always possible with measures
or Lp functions. But as measures or functions can be interpreted as dis-
tributions, this leads to the notion of a weak derivative for such objects,
which makes sense (but only as a distribution) even for functions that are
not classically differentiable. Thus the theory of distributions can allow one
to rigorously manipulate rough functions as if they were smooth, although
one must still be careful as some operations on distributions are not well de-
fined, most notably the operation of multiplying two distributions together.
Nevertheless one can use this theory to justify many formal computations
involving derivatives, integrals, etc., including several computations used
routinely in physics, that would be difficult to formalise rigorously in a
purely classical framework.
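The Heaviside function gives a standard example of a weak derivative. $H = 1_{[0,\infty)}$ is not classically differentiable at $0$, but its distributional derivative is defined by $\langle f, H' \rangle := -\langle f', H \rangle = -\int_0^\infty f'(x)\, dx$. By the fundamental theorem of calculus this equals $f(0)$, so $H' = \delta_0$.

The minimal numerical sketch below checks this identity. It assumes the test function $f(x) = \phi(x - 0.3)$, built from the bump $\phi(x) = e^{-1/(1-x^2)}$ on $(-1, 1)$ (extended by zero). The shift and the quadrature parameters are arbitrary.

```python
import math

def bump(x):
    # phi(x) = exp(-1/(1 - x^2)) for |x| < 1, and 0 otherwise: a test function on R.
    d = 1.0 - x * x
    return math.exp(-1.0 / d) if d > 0 else 0.0

def bump_prime(x):
    # phi'(x) = phi(x) * (-2x / (1 - x^2)^2) inside (-1, 1), and 0 outside.
    d = 1.0 - x * x
    return bump(x) * (-2.0 * x / d ** 2) if d > 0 else 0.0

def simpson(g, a, b, n=20000):
    # Composite Simpson's rule on [a, b] with n (even) subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

shift = 0.3
f = lambda x: bump(x - shift)
f_prime = lambda x: bump_prime(x - shift)

pairing = -simpson(f_prime, 0.0, 1.0 + shift)  # <f, H'> := -<f', H>; f' vanishes beyond 1.3
print(pairing, f(0.0))                        # both should equal phi(-0.3) = <f, delta_0>
```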
If one shrinks the space of distributions slightly to the space of tempered distributions (which is formed by enlarging the dual class $C_c^\infty(\mathbb{R}^d)$ to the Schwartz class $\mathcal{S}(\mathbb{R}^d)$), then one obtains closure under another important
operation, namely the Fourier transform. This allows one to define vari-
ous Fourier-analytic operations (e.g., pseudodifferential operators) on such
distributions.
Of course, at the end of the day, one is usually not all that interested in
distributions in their own right, but would like to be able to use them as a
tool to study more classical objects, such as smooth functions. Fortunately,
one can recover facts about smooth functions from facts about the (far
rougher) space of distributions in a number of ways. For instance, if one
convolves a distribution with a smooth, compactly supported function, one
gets back a smooth function. This is a particularly useful fact in the theory
of constant-coefficient linear partial differential equations such as Lu = f ,
as it allows one to recover a smooth solution u from smooth, compactly
supported data f by convolving f with a specific distribution G, known as
the fundamental solution of L. We will give some examples of this later in
this section.
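Here is a minimal one-dimensional illustration; the choice of operator and data is ours, not from the text. For $L = d^2/dx^2$ on $\mathbb{R}$, the function $G(x) = |x|/2$ satisfies $G'' = \delta_0$ in the sense of distributions. Hence $u := G * f$, that is $u(x) = \frac12 \int |x - y| f(y)\, dy$, should satisfy $u'' = f$ for smooth compactly supported $f$.

The sketch below assumes $f$ is the bump $e^{-1/(1-x^2)}$ supported in $[-1, 1]$. It evaluates $u$ by splitting the integral at $y = x$, so that each piece has a smooth integrand, and approximates $u''$ by a centred second difference. The step sizes are arbitrary, and the centred difference carries an error of order $h^2$.

```python
import math

def bump(x):
    # Test function exp(-1/(1 - x^2)) on (-1, 1), extended by zero.
    d = 1.0 - x * x
    return math.exp(-1.0 / d) if d > 0 else 0.0

def simpson(g, a, b, n=4000):
    # Composite Simpson's rule on [a, b] with n (even) subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def u(x, f=bump):
    # u = G * f with G(x) = |x|/2, for f supported in [-1, 1] and -1 < x < 1;
    # splitting at y = x keeps each integrand smooth.
    left = simpson(lambda y: (x - y) * f(y), -1.0, x)   # |x - y| = x - y for y < x
    right = simpson(lambda y: (y - x) * f(y), x, 1.0)   # |x - y| = y - x for y > x
    return 0.5 * (left + right)

h = 1e-2
for x in (-0.4, 0.0, 0.25):
    second_diff = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2  # approximates u''(x)
    print(x, second_diff, bump(x))                          # these should agree closely
```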
It is this unusual and useful combination of both being able to pass
from classical functions to generalised functions (e.g., by differentiation)
and then back from generalised functions to classical functions (e.g., by
convolution) that sets the theory of distributions apart from other competing
theories of generalised functions, in particular allowing one to justify many
formal calculations in PDE and Fourier analysis rigorously with relatively
little additional effort. On the other hand, being defined by linear duality,
the theory of distributions becomes somewhat less useful when one moves
to more nonlinear problems, such as nonlinear PDE. However, they still
serve an important supporting role in such problems as an ambient space of
functions, inside of which one carves out more useful function spaces, such
as Sobolev spaces, which we will discuss in Section 1.14.

1.13.1. Smooth functions with compact support. In the rest of the notes we will work on a fixed Euclidean space $\mathbb{R}^d$. (One can also define distributions on other domains related to $\mathbb{R}^d$, such as open subsets of $\mathbb{R}^d$, or $d$-dimensional manifolds, but for simplicity we shall restrict our attention to Euclidean spaces in these notes.)
A test function is any smooth, compactly supported function $f : \mathbb{R}^d \to \mathbb{C}$; the space of such functions is denoted $C_c^\infty(\mathbb{R}^d)$.¹³
From analytic continuation, one sees that there are no real-analytic test
functions other than the zero function. Despite this negative result, test
functions actually exist in abundance:

¹³ In some texts, this space is denoted $C_0^\infty(\mathbb{R}^d)$ instead.


Exercise 1.13.1.
(i) Show that there exists at least one test function that is not identically
zero. (Hint: It suffices to do this for d = 1. One starting point is to
use the fact that the function f : R → R defined by f (x) := e−1/x
for x > 0 and f (x) := 0 otherwise is smooth, even at the origin 0.)
(ii) Show that if f ∈ Cc∞ (Rd ) and g : Rd → R is absolutely inte-
grable and compactly supported, then the convolution f ∗ g is also in
Cc∞ (Rd ). (Hint: First show that f ∗ g is continuously differentiable
with ∇(f ∗ g) = (∇f ) ∗ g.)
(iii) $C^\infty$ Urysohn lemma. Let $K$ be a compact subset of $\mathbb{R}^d$, and let $U$ be an open neighbourhood of $K$. Show that there exists a function $f \in C_c^\infty(\mathbb{R}^d)$ supported in $U$ which equals 1 on $K$. (Hint: Use the
ordinary Urysohn lemma to find a function in Cc (Rd ) that equals 1
on a neighbourhood of K and is supported in a compact subset of U ,
then convolve this function by a suitable test function.)
(iv) Show that Cc∞ (Rd ) is dense in C0 (Rd ) (in the uniform topology),
and dense in Lp (Rd ) (with the Lp topology) for all 0 < p < ∞.
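Parts (i) and (iii) can be made concrete in one dimension. The construction below is a standard alternative to the convolution suggested in the hint. Let $f$ be the function from the hint to (i), and set $\psi(x) := f(x)/(f(x) + f(1 - x))$. Then $\psi$ is smooth, vanishes for $x \le 0$ and equals 1 for $x \ge 1$; the denominator never vanishes. Composing $\psi$ with a linear function of $|x|$ gives a test function that equals 1 on $K = [-a, a]$ and vanishes outside $(-b, b)$, for any $0 < a < b$. The helper names below are ours.

```python
import math

def f(x):
    # The function from the hint to (i): e^{-1/x} for x > 0 and 0 otherwise (smooth on R).
    return math.exp(-1.0 / x) if x > 0 else 0.0

def transition(x):
    # Smooth step: 0 for x <= 0 and 1 for x >= 1. The denominator never vanishes,
    # since at least one of x, 1 - x is positive.
    return f(x) / (f(x) + f(1.0 - x))

def plateau(x, a=1.0, b=2.0):
    # Equals 1 on [-a, a] and vanishes outside (-b, b). The kink of |x| at 0 is harmless:
    # near x = 0 the argument of transition exceeds 1, where transition is constantly 1.
    return transition((b - abs(x)) / (b - a))

print([plateau(x) for x in (0.0, 0.99, 1.5, 1.9, 2.0, 3.0)])
```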

The space $C_c^\infty(\mathbb{R}^d)$ is clearly a vector space. Now we place a (very strong!) topology on it. We first observe that $C_c^\infty(\mathbb{R}^d) = \bigcup_K C_c^\infty(K)$, where $K$ ranges over all compact subsets of $\mathbb{R}^d$ and $C_c^\infty(K)$ consists of those functions $f \in C_c^\infty(\mathbb{R}^d)$ which are supported in $K$. Each $C_c^\infty(K)$ will be given a topology (called the smooth topology) generated by the norms

$\|f\|_{C^k} := \sup_{x \in \mathbb{R}^d} \sum_{j=0}^{k} |\nabla^j f(x)|$

for $k = 0, 1, \ldots$, where we view $\nabla^j f(x)$ as a $d^j$-dimensional vector (or, if one wishes, a $d$-dimensional rank $j$ tensor). Thus a sequence $f_n \in C_c^\infty(K)$
converges to a limit f ∈ Cc∞ (K) if and only if ∇j fn converges uniformly to
∇j f for all j = 0, 1, . . .. (This gives Cc∞ (K) the structure of a Fréchet space,
though we will not use this fact here.)
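Rescaled bumps show how much stronger the smooth topology is than the uniform one; this is a one-dimensional illustration, with the bump $\phi$ and the grid our own choices. Consider $\phi_n(x) := n^{-1} \phi(nx)$, all supported in $K = [-1, 1]$. Since $\phi_n^{(j)}(x) = n^{j-1} \phi^{(j)}(nx)$:
• $\sup|\phi_n| \to 0$, so $\phi_n \to 0$ uniformly;
• $\sup|\phi_n''| \to \infty$, so $\phi_n$ does not converge in $C_c^\infty(K)$.

The sketch below estimates $\sup|\phi_n^{(j)}|$ for $j = 0, 1, 2$ by finite differences.

```python
import math

def bump(x):
    # phi(x) = exp(-1/(1 - x^2)) on (-1, 1), extended by zero.
    d = 1.0 - x * x
    return math.exp(-1.0 / d) if d > 0 else 0.0

def sup_derivatives(g, h=1e-4, L=1.0):
    # Estimates of sup|g|, sup|g'|, sup|g''| on [-L, L] from central differences on a grid.
    xs = [-L + k * h for k in range(int(round(2 * L / h)) + 1)]
    d0 = max(abs(g(x)) for x in xs)
    d1 = max(abs(g(x + h) - g(x - h)) / (2 * h) for x in xs)
    d2 = max(abs(g(x + h) - 2 * g(x) + g(x - h)) / h ** 2 for x in xs)
    return d0, d1, d2

for n in (1, 2, 4):
    phi_n = lambda x, n=n: bump(n * x) / n  # phi_n(x) = phi(n x) / n, supported in [-1/n, 1/n]
    print(n, sup_derivatives(phi_n))        # roughly (c0 / n, c1, c2 * n)
```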
We make the trivial remark that if K ⊂ K′ are compact sets, then
Cc∞(K) is a subspace of Cc∞(K′), and the topology on the former space is
the restriction of the topology of the latter space. Because of this, we are
able to give Cc∞ (Rd ) the final topology induced by the topologies on the
Cc∞ (K), defined as the strongest topology on Cc∞ (Rd ) which restricts to the
topologies on Cc∞ (K) for each K. Equivalently, a set is open in Cc∞ (Rd ) if
and only if its restriction to Cc∞ (K) is open for every compact K.
Exercise 1.13.2. Let $f_n$ be a sequence in $C_c^\infty(\mathbb{R}^d)$, and let $f$ be another function in $C_c^\infty(\mathbb{R}^d)$. Show that $f_n$ converges in the topology of $C_c^\infty(\mathbb{R}^d)$ to $f$ if and only if there exists a compact set $K$ such that $f_n, f$ are all supported in $K$, and $f_n$ converges to $f$ in the smooth topology of $C_c^\infty(K)$.

Exercise 1.13.3.
(i) Show that the topology of Cc∞ (K) is first countable for every compact
K.
(ii) Show that the topology of Cc∞ (Rd ) is not first countable. (Hint:
Given any countable sequence of open neighbourhoods of 0, build a
new open neighbourhood that does not contain any of the previous
ones, using the σ-compact nature of Rd .)
(iii) Despite this, show that an element f ∈ Cc∞ (Rd ) is an adherent point
of a set E ⊂ Cc∞ (Rd ) if and only if there is a sequence fn ∈ E that
converges to f . (Hint: Argue by contradiction.) Conclude in partic-
ular that a subset of Cc∞ (Rd ) is closed if and only if it is sequentially
closed. Thus while first countability fails for Cc∞ (Rd ), we have a
serviceable substitute for this property.

There are plenty of continuous operations on Cc∞ (Rd ):

Exercise 1.13.4.
(i) Let $K$ be a compact set. Show that a linear map $T : C_c^\infty(K) \to X$ into a normed vector space $X$ is continuous if and only if there exists $k \ge 0$ and $C > 0$ such that $\|Tf\|_X \le C \|f\|_{C^k}$ for all $f \in C_c^\infty(K)$.
(ii) Let $K, K'$ be compact sets. Show that a linear map $T : C_c^\infty(K) \to C_c^\infty(K')$ is continuous if and only if for every $k \ge 0$ there exists $k' \ge 0$ and a constant $C_k > 0$ such that $\|Tf\|_{C^k} \le C_k \|f\|_{C^{k'}}$ for all $f \in C_c^\infty(K)$.
(iii) Show that a map T : Cc∞ (Rd ) → X to a topological space is contin-
uous if and only if for every compact set K ⊂ Rd , T maps Cc∞ (K)
continuously to X.
(iv) Show that the inclusion map from Cc∞ (Rd ) to Lp (Rd ) is continuous
for every 0 < p ≤ ∞.
(v) Show that a map $T : C_c^\infty(\mathbb{R}^d) \to C_c^\infty(\mathbb{R}^d)$ is continuous if and only if for every compact set $K \subset \mathbb{R}^d$ there exists a compact set $K'$ such that $T$ maps $C_c^\infty(K)$ continuously to $C_c^\infty(K')$.
(vi) Show that every linear differential operator with smooth coefficients
is a continuous operation on Cc∞ (Rd ).
(vii) Show that convolution with any absolutely integrable, compactly sup-
ported function is a continuous operation on Cc∞ (Rd ).
(viii) Show that Cc∞ (Rd ) is a topological vector space.

(ix) Show that the product operation $(f, g) \mapsto fg$ is continuous from $C_c^\infty(\mathbb{R}^d) \times C_c^\infty(\mathbb{R}^d)$ to $C_c^\infty(\mathbb{R}^d)$.
A sequence $\varphi_n \in C_c(\mathbf{R}^d)$ of continuous, compactly supported functions
is said to be an approximation to the identity if the $\varphi_n$ are non-negative,
have total mass $\int_{\mathbf{R}^d} \varphi_n$ equal to 1, and converge uniformly to zero away
from the origin; thus, $\sup_{|x| \geq r} |\varphi_n(x)| \to 0$ for all $r > 0$. One can generate
such a sequence by starting with a single non-negative, continuous, compactly
supported function $\varphi$ of total mass 1, and then setting $\varphi_n(x) := n^d \varphi(nx)$;
many other constructions are possible also.
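The rescaling construction can be sanity-checked numerically. The following minimal sketch works in one dimension ($d = 1$) and assumes, purely for illustration, that $\varphi$ is the standard bump $e^{-1/(1-x^2)}$ on $(-1, 1)$ normalised to mass 1, with integrals computed by a hand-rolled composite Simpson rule. It checks that each $\varphi_n$ has mass 1 and that $\sup_{|x| \geq r} |\varphi_n(x)|$ vanishes as soon as $n > 1/r$, because $\varphi_n$ is supported in $[-1/n, 1/n]$.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere: smooth, non-negative, compactly supported."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

MASS = simpson(raw_bump, -1.0, 1.0)

def phi(x):
    """The bump normalised to total mass 1."""
    return raw_bump(x) / MASS

def phi_n(n, x):
    """The rescaling phi_n(x) := n^d phi(n x) with d = 1; supported in [-1/n, 1/n]."""
    return n * phi(n * x)

def mass(n):
    """Total mass of phi_n, integrating only over its support."""
    return simpson(lambda x: phi_n(n, x), -1.0 / n, 1.0 / n)

def sup_away_from_origin(n, r, samples=10000):
    """Grid estimate of sup over r <= |x| <= 1 of |phi_n(x)| (phi_n is even and vanishes for |x| >= 1)."""
    return max(abs(phi_n(n, r + (1.0 - r) * i / samples)) for i in range(samples + 1))

for n in (1, 2, 5, 10):
    print(n, mass(n), sup_away_from_origin(n, 0.25))
```

Any other non-negative continuous bump of mass 1 would serve equally well; this one is chosen only because it is also smooth, so the same $\varphi_n$ are approximations to the identity lying in $C_c^\infty$.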
One has the following useful fact:
Exercise 1.13.5. Let $\varphi_n \in C_c^\infty(\mathbf{R}^d)$ be a sequence of approximations to
the identity.
(i) If $f \in C(\mathbf{R}^d)$ is continuous, show that $f * \varphi_n$ converges uniformly on
compact sets to $f$.
(ii) If $f \in L^p(\mathbf{R}^d)$ for some $1 \leq p < \infty$, show that $f * \varphi_n$ converges in
$L^p(\mathbf{R}^d)$ to $f$. (Hint: Use (i), the density of $C_0(\mathbf{R}^d)$ in $L^p(\mathbf{R}^d)$, and
Young’s inequality, Exercise 1.11.25.)
(iii) If $f \in C_c^\infty(\mathbf{R}^d)$, show that $f * \varphi_n$ converges in $C_c^\infty(\mathbf{R}^d)$ to $f$. (Hint:
Use the identity $\nabla(f * \varphi_n) = (\nabla f) * \varphi_n$, cf. Exercise 1.13.1(ii).)
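Part (i) can be illustrated with the continuous but non-smooth function $f(x) = |x|$ on the compact set $[-1, 1]$. In the sketch below (one dimension; the bump, the Simpson rule and the grid sampling of $[-1, 1]$ are illustrative assumptions), the error is concentrated near the kink: since $\varphi_n$ is even with mass 1, $f * \varphi_n(x) = |x|$ whenever $|x| \geq 1/n$, while at the origin the error is $\int |y| \varphi_n(y)\,dy$, which decays like $1/n$.

```python
import math

def simpson(g, a, b, n=400):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

MASS = simpson(raw_bump, -1.0, 1.0)

def phi_n(n, y):
    """Approximation to the identity n * phi(n y) built from the normalised bump (d = 1)."""
    return n * raw_bump(n * y) / MASS

def mollify(f, n, x):
    """(f * phi_n)(x) = integral of f(x - y) phi_n(y) dy over the support [-1/n, 1/n]."""
    return simpson(lambda y: f(x - y) * phi_n(n, y), -1.0 / n, 1.0 / n)

GRID = [-1.0 + 2.0 * i / 200 for i in range(201)]   # samples of the compact set [-1, 1]

# f(x) = |x| is continuous but not differentiable at 0; record sup |f * phi_n - f| over the grid
ERRORS = {n: max(abs(mollify(abs, n, x) - abs(x)) for x in GRID) for n in (1, 2, 5, 10, 20)}
for n, err in ERRORS.items():
    print(n, err)
```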
Exercise 1.13.6. Show that $C_c^\infty(\mathbf{R}^d)$ is separable. (Hint: It suffices to show
that $C_c^\infty(K)$ is separable for each compact $K$. There are several ways to
accomplish this. One is to begin with the Stone-Weierstrass theorem, which
will give a countable set which is dense in the uniform topology, then use the
fundamental theorem of calculus to strengthen the topology. Another is to
use Exercise 1.13.5 and then discretise the convolution. Another is to embed
$K$ into a torus and use Fourier series, noting that the Fourier coefficients $\hat f(n)$
of a smooth function $f: \mathbf{T}^d \to \mathbf{C}$ decay faster than any power of $|n|$.)
1.13.2. Distributions. Now we can define the concept of a distribution.
Definition 1.13.1 (Distribution). A distribution on $\mathbf{R}^d$ is a continuous linear
functional $\lambda: f \mapsto \langle f, \lambda \rangle$ from $C_c^\infty(\mathbf{R}^d)$ to $\mathbf{C}$. The space of such
distributions is denoted $C_c^\infty(\mathbf{R}^d)^*$, and is given the weak-* topology. In particular,
a sequence of distributions $\lambda_n$ converges (in the sense of distributions) to a
limit $\lambda$ if one has $\langle f, \lambda_n \rangle \to \langle f, \lambda \rangle$ for all $f \in C_c^\infty(\mathbf{R}^d)$.
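A technical point: We endow the space $C_c^\infty(\mathbf{R}^d)^*$ with the conjugate
complex structure. Thus, if $\lambda \in C_c^\infty(\mathbf{R}^d)^*$ and $c$ is a complex number,
then $c\lambda$ is the distribution that maps a test function $f$ to $\overline{c}\langle f, \lambda \rangle$ rather
than $c\langle f, \lambda \rangle$; thus $\langle f, c\lambda \rangle = \overline{c}\langle f, \lambda \rangle$. This is to keep the analogy between
the evaluation of a distribution against a function and the usual Hermitian
inner product $\langle f, g \rangle = \int_{\mathbf{R}^d} f \overline{g}$ of two test functions.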

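From Exercise 1.13.4, we see that a linear functional $\lambda: C_c^\infty(\mathbf{R}^d) \to \mathbf{C}$
is a distribution if and only if, for every compact set $K \subset \mathbf{R}^d$, there exist $k \geq 0$ and
$C > 0$ such that
$$|\langle f, \lambda \rangle| \leq C \|f\|_{C^k} \tag{1.113}$$
for all $f \in C_c^\infty(K)$.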
Exercise 1.13.7. Show that $C_c^\infty(\mathbf{R}^d)^*$ is a Hausdorff topological vector
space.

We note two basic examples of distributions:
• Any locally integrable function $g \in L^1_{\mathrm{loc}}(\mathbf{R}^d)$ can be viewed as a
distribution, by writing $\langle f, g \rangle := \int_{\mathbf{R}^d} f(x)\overline{g(x)}\,dx$ for all test functions
$f$.
• Any complex Radon measure $\mu$ can be viewed as a distribution, by
writing $\langle f, \mu \rangle := \int_{\mathbf{R}^d} f(x)\,d\overline{\mu}(x)$, where $\overline{\mu}$ is the complex conjugate of
$\mu$ (thus $\overline{\mu}(E) := \overline{\mu(E)}$). (Note that this example generalises the
preceding one, which corresponds to the case when $\mu$ is absolutely
continuous with respect to Lebesgue measure.) Thus, for instance,
the Dirac measure $\delta$ at the origin is a distribution, with $\langle f, \delta \rangle = f(0)$
for all test functions $f$.
Exercise 1.13.8. Show that the above identifications of locally integrable
functions or complex Radon measures with distributions are injective. (Hint:
Use Exercise 1.13.1(iv).)

From the above exercise, we may view locally integrable functions and locally
finite measures as a special type of distribution. In particular, $C_c^\infty(\mathbf{R}^d)$
and $L^p(\mathbf{R}^d)$ are now contained in $C_c^\infty(\mathbf{R}^d)^*$ for all $1 \leq p \leq \infty$.
Exercise 1.13.9. Show that if a sequence of locally integrable functions
converges in $L^1_{\mathrm{loc}}$ to a limit, then it also converges in the sense of distributions;
similarly, if a sequence of complex Radon measures converges in the
vague topology to a limit, then it also converges in the sense of distributions.

Thus we see that convergence in the sense of distributions is among the
weakest of the notions of convergence used in analysis; however, from the
Hausdorff property, distributional limits are still unique.
Exercise 1.13.10. If $\varphi_n$ is a sequence of approximations to the identity,
show that $\varphi_n$ converges in the sense of distributions to the Dirac distribution $\delta$.

More exotic examples of distributions can be given:

Exercise 1.13.11 (Derivative of the delta function). Let $d = 1$. Show
that the functional $\delta': f \mapsto -f'(0)$ for all test functions $f$ is a distribution
which does not arise from either a locally integrable function or a Radon
measure. (Note how it is important here that $f$ is smooth (and in particular
differentiable) and not merely continuous.) The presence of the minus sign
will be explained shortly.
Exercise 1.13.12 (Principal value of $1/x$). Let $d = 1$. Show that the
functional $\mathrm{p.v.}\frac{1}{x}$ defined by the formula
$$\Big\langle f, \mathrm{p.v.}\frac{1}{x} \Big\rangle := \lim_{\varepsilon \to 0} \int_{|x| > \varepsilon} \frac{f(x)}{x}\,dx$$
is a distribution which does not arise from either a locally integrable function
or a Radon measure. (Note that $1/x$ is not a locally integrable function!)
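The cancellation behind the principal value can be seen numerically. In the sketch below ($d = 1$; the test function $f(x) = e^{-1/(1-x^2)}(1 + x)$ and the Simpson rule are illustrative assumptions), folding $x$ and $-x$ together gives the bounded integrand $(f(x) - f(-x))/x$, so the truncated integrals converge; the one-sided integral $\int_\varepsilon^1 f(x)/x\,dx$ alone grows like $f(0)\log(1/\varepsilon)$, reflecting the failure of $1/x$ to be locally integrable.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def f(x):
    """Test function with f(0) = 1/e != 0 that is not even."""
    return raw_bump(x) * (1.0 + x)

def pv_truncated(eps):
    """Integral of f(x)/x over eps < |x| < 1, folded as int_eps^1 (f(x) - f(-x))/x dx."""
    return simpson(lambda x: (f(x) - f(-x)) / x, eps, 1.0)

def one_sided(eps):
    """Integral of f(x)/x over eps < x < 1 alone, computed via the substitution x = e^s."""
    return simpson(lambda s: f(math.exp(s)), math.log(eps), 0.0)

# (f(x) - f(-x))/x = 2 raw_bump(x), so the principal value is the integral of raw_bump over [-1, 1]
PV_LIMIT = simpson(raw_bump, -1.0, 1.0)

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, pv_truncated(eps) - PV_LIMIT, one_sided(eps))
```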
Exercise 1.13.13 (Distributional interpretations of $1/|x|$). Let $d = 1$. For
any $r > 0$, show that the functional $\lambda_r$ defined by the formula
$$\langle f, \lambda_r \rangle := \int_{|x| < r} \frac{f(x) - f(0)}{|x|}\,dx + \int_{|x| \geq r} \frac{f(x)}{|x|}\,dx$$
is a distribution that does not arise from either a locally integrable function
or a Radon measure. Note that any two such functionals $\lambda_r, \lambda_{r'}$ differ by a
constant multiple of the Dirac delta distribution.
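The closing remark can be made explicit: for $0 < r < r'$, the two formulas differ only on $r \leq |x| < r'$, where the difference is $f(0)\int_{r \leq |x| < r'} dx/|x|$, so $\lambda_r - \lambda_{r'} = 2\log(r'/r)\,\delta$. A numerical check under illustrative assumptions ($d = 1$, a test function supported in $(-1, 1)$, and $r, r' < 1$ so that the outer integral can stop at 1):

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def f(x):
    """Test function supported in (-1, 1) with f(0) = 1/e."""
    return raw_bump(x) * (1.0 + x)

def lam(r):
    """<f, lambda_r> for 0 < r < 1, with x and -x folded together:
    int_0^r (f(x) + f(-x) - 2 f(0)) / x dx + int_r^1 (f(x) + f(-x)) / x dx."""
    f0 = f(0.0)

    def inner_part(x):
        # the folded integrand is O(x) as x -> 0, so its value at x = 0 is taken to be 0
        return 0.0 if x == 0.0 else (f(x) + f(-x) - 2.0 * f0) / x

    return simpson(inner_part, 0.0, r) + simpson(lambda x: (f(x) + f(-x)) / x, r, 1.0)

r, r_prime = 0.1, 0.5
difference = lam(r) - lam(r_prime)
predicted = 2.0 * math.log(r_prime / r) * f(0.0)   # the pairing of f with 2 log(r'/r) delta
print(difference, predicted)
```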
Exercise 1.13.14. A distribution $\lambda$ is said to be real if $\langle f, \lambda \rangle$ is real for every
real-valued test function $f$. Show that every distribution $\lambda$ can be uniquely
expressed as $\mathrm{Re}(\lambda) + i\,\mathrm{Im}(\lambda)$ for some real distributions $\mathrm{Re}(\lambda), \mathrm{Im}(\lambda)$.
Exercise 1.13.15. A distribution $\lambda$ is said to be non-negative if $\langle f, \lambda \rangle$ is
non-negative for every non-negative test function $f$. Show that a distribution
is non-negative if and only if it is a non-negative Radon measure. (Hint: Use
the Riesz representation theorem and Exercise 1.13.1(iv).) Note that this
implies that the analogue of the Jordan decomposition fails for distributions;
any distribution which is not a Radon measure will not be the difference of
two non-negative distributions.
We will now extend various operations on locally integrable functions or
Radon measures to distributions by arguing by analogy. (Shortly, we will
give a more formal approach based on density.)
We begin with the operation of multiplying a distribution $\lambda$ by a smooth
function $h: \mathbf{R}^d \to \mathbf{C}$. Observe that
$$\langle f, gh \rangle = \langle f\overline{h}, g \rangle$$
for all test functions $f, g, h$. Inspired by this formula, we define the product
$\lambda h = h\lambda$ of a distribution with a smooth function by setting
$$\langle f, \lambda h \rangle := \langle f\overline{h}, \lambda \rangle$$
for all test functions $f$. It is easy to see (e.g., using Exercise 1.13.4(vi)) that
this defines a distribution $\lambda h$, and that this operation is compatible with
existing definitions of products between a locally integrable function (or
Radon measure) and a smooth function. It is important that $h$ is smooth
(and not merely, say, continuous) because one needs the product of a test
function $f$ with $h$ to still be a test function.
Exercise 1.13.16. Let $d = 1$. Establish the identity
$$\delta f = f(0)\delta$$
for any smooth function $f$. In particular,
$$\delta x = 0,$$
where we abuse notation slightly and write $x$ for the identity function $x \mapsto x$.
Conversely, if $\lambda$ is a distribution such that
$$\lambda x = 0,$$
show that $\lambda$ is a constant multiple of $\delta$. (Hint: Use the identity
$f(x) = f(0) + x\int_0^1 f'(tx)\,dt$ to write $f(x)$ as the sum of $f(0)\psi$ and $x$ times a test
function for any test function $f$, where $\psi$ is a fixed test function equalling 1
at the origin.)
Remark 1.13.2. Even though distributions are not, strictly speaking, functions,
it is often useful heuristically to view them as such. Thus, for instance,
one might write a distributional identity such as $\delta x = 0$ suggestively as
$\delta(x)x = 0$. Another useful (and rigorous) way to view such identities is to
write distributions such as $\delta$ as a limit of approximations to the identity
$\psi_n$, and show that the relevant identity becomes true in the limit; thus, for
instance, to show that $\delta x = 0$, one can show that $\psi_n x \to 0$ in the sense of
distributions as $n \to \infty$. (In fact, $\psi_n x$ converges to zero in the $L^1$ norm.)
Exercise 1.13.17. Let $d = 1$. With the distribution $\mathrm{p.v.}\frac{1}{x}$ from Exercise
1.13.12, show that $(\mathrm{p.v.}\frac{1}{x})x$ is equal to 1. With the distributions $\lambda_r$ from
Exercise 1.13.13, show that $\lambda_r x = \mathrm{sgn}$, where $\mathrm{sgn}$ is the signum function.

A distribution $\lambda$ is said to be supported in a closed set $K$ if $\langle f, \lambda \rangle = 0$ for
all $f$ that vanish on an open neighbourhood of $K$. The intersection of all $K$
that $\lambda$ is supported on is denoted $\mathrm{supp}(\lambda)$ and is referred to as the support
of the distribution; this is the smallest closed set that $\lambda$ is supported on.
Thus, for instance, the Dirac delta function is supported on $\{0\}$, as are all
derivatives of that function. (Note here that it is important that $f$ vanish on
a neighbourhood of $K$, rather than merely vanishing on $K$ itself; for instance,
in one dimension, there certainly exist test functions $f$ that vanish at 0 but
nevertheless have a non-zero inner product with $\delta'$.)

Exercise 1.13.18. Show that every distribution is the limit of a sequence
of compactly supported distributions (using the weak-* topology, of course).
(Hint: Approximate a distribution $\lambda$ by the truncated distributions $\lambda\eta_n$ for
some smooth cutoff functions $\eta_n$ constructed using Exercise 1.13.1(iii).)

In a similar spirit, we can convolve a distribution $\lambda$ by an absolutely
integrable, compactly supported function $h \in L^1(\mathbf{R}^d)$. From Fubini’s theorem
we observe the formula
$$\langle f, g * h \rangle = \langle f * \tilde{h}, g \rangle$$
for all test functions $f, g, h$, where $\tilde{h}(x) := \overline{h(-x)}$. Inspired by this formula,
we define the convolution $\lambda * h = h * \lambda$ of a distribution with an absolutely
integrable, compactly supported function by the formula
$$\langle f, \lambda * h \rangle := \langle f * \tilde{h}, \lambda \rangle \tag{1.114}$$
for all test functions $f$. This gives a well-defined distribution $\lambda * h$ (thanks to
Exercise 1.13.4(vii)) which is compatible with previous notions of convolution.
Example 1.13.3. One has $\delta * f = f * \delta = f$ for all test functions $f$. In one
dimension, we have $\delta' * f = f'$. Why? Thus differentiation can be viewed
as convolution with a distribution.

A remarkable fact about convolutions $f * g$ of two functions is that they
inherit the regularity of the smoother of the two factors $f, g$ (in contrast
to products $fg$, which tend to inherit the regularity of the rougher of the
two factors). (This disparity can also be seen by contrasting the identity
$\nabla(f * g) = (\nabla f) * g = f * (\nabla g)$ with the identity $\nabla(fg) = (\nabla f)g + f(\nabla g)$.)
In the case of convolving distributions with test functions, this phenomenon
is manifested as follows:
Lemma 1.13.4. Let $\lambda \in C_c^\infty(\mathbf{R}^d)^*$ be a distribution, and let $h \in C_c^\infty(\mathbf{R}^d)$
be a test function. Then $\lambda * h$ is equal to a smooth function.

Proof. If $\lambda$ were itself a smooth function, then one could easily verify the
identity
$$\lambda * h(x) = \overline{\langle h_x, \lambda \rangle}, \tag{1.115}$$
where $h_x(y) := \overline{h(x - y)}$. As $h$ is a test function, it is easy to see that $h_x$
varies smoothly in $x$ in any $C^k$ norm (indeed, it has Taylor expansions to
any order in such norms), and so the right-hand side is a smooth function of
$x$. So it suffices to verify the identity (1.115). As distributions are defined
against test functions $f$, it suffices to show that
$$\langle f, \lambda * h \rangle = \int_{\mathbf{R}^d} f(x) \langle h_x, \lambda \rangle\,dx.$$
On the other hand, we have from (1.114) that
$$\langle f, \lambda * h \rangle = \langle f * \tilde{h}, \lambda \rangle = \Big\langle \int_{\mathbf{R}^d} f(x) h_x\,dx, \lambda \Big\rangle.$$
So the only issue is to justify the interchange of integral and inner product:
$$\int_{\mathbf{R}^d} f(x) \langle h_x, \lambda \rangle\,dx = \Big\langle \int_{\mathbf{R}^d} f(x) h_x\,dx, \lambda \Big\rangle.$$
Certainly (from the compact support of $f$), any Riemann sum can be interchanged
with the inner product:
$$\sum_n f(x_n) \langle h_{x_n}, \lambda \rangle \Delta x = \Big\langle \sum_n f(x_n) h_{x_n} \Delta x, \lambda \Big\rangle,$$
where $x_n$ ranges over some lattice and $\Delta x$ is the volume of the fundamental
domain. A modification of the argument that shows convergence of the
Riemann integral for smooth, compactly supported functions then works
here and allows one to take limits. We omit the details. □

This has an important corollary:

Lemma 1.13.5. Every distribution is the limit of a sequence of test functions.
In particular, $C_c^\infty(\mathbf{R}^d)$ is dense in $C_c^\infty(\mathbf{R}^d)^*$.

Proof. By Exercise 1.13.18, it suffices to verify this for compactly supported
distributions $\lambda$. We let $\varphi_n$ be a sequence of approximations to the identity.
By Exercise 1.13.5(iii) and (1.114), we see that $\lambda * \varphi_n$ converges in the sense
of distributions to $\lambda$. By Lemma 1.13.4, $\lambda * \varphi_n$ is a smooth function; as $\lambda$
and $\varphi_n$ are both compactly supported, $\lambda * \varphi_n$ is compactly supported also.
The claim follows. □

Because of this lemma, we can formalise the previous procedure of extending
operations previously defined on test functions to distributions, provided
that these operations are continuous in the distributional topologies. However,
we shall continue to proceed by analogy, as it requires fewer verifications
in order to motivate the definition.

Exercise 1.13.19. Another consequence of Lemma 1.13.4 is that it allows
one to extend the definition (1.114) of convolution to the case when $h$ is not
an integrable function of compact support, but is instead merely a distribution
of compact support. Adopting this convention, show that convolution
of distributions of compact support is both commutative and associative.
(Hint: This can either be done directly or by carefully taking limits using
Lemma 1.13.5.)

The next operation we will introduce is that of differentiation. An integration
by parts reveals the identity
$$\Big\langle f, \frac{\partial}{\partial x_j} g \Big\rangle = -\Big\langle \frac{\partial}{\partial x_j} f, g \Big\rangle$$
for any test functions $f, g$ and $j = 1, \ldots, d$. Inspired by this, we define the
(distributional) partial derivative $\frac{\partial}{\partial x_j}\lambda$ of a distribution $\lambda$ by the formula
$$\Big\langle f, \frac{\partial}{\partial x_j}\lambda \Big\rangle := -\Big\langle \frac{\partial}{\partial x_j} f, \lambda \Big\rangle.$$
This can be verified to still be a distribution, and by Exercise 1.13.4(vi),
the operation of differentiation is a continuous one on distributions. More
generally, given any linear differential operator $P$ with smooth coefficients,
one can define $P\lambda$ for a distribution $\lambda$ by the formula
$$\langle f, P\lambda \rangle := \langle P^* f, \lambda \rangle,$$
where $P^*$ is the adjoint differential operator of $P$, which can be defined implicitly
by the formula
$$\langle f, Pg \rangle = \langle P^* f, g \rangle$$
for test functions $f, g$, or more explicitly by replacing all coefficients with
complex conjugates, replacing each partial derivative $\frac{\partial}{\partial x_j}$ with its negative,
and reversing the order of operations (thus, for instance, the adjoint of the
first-order operator $a(x)\frac{d}{dx}: f \mapsto af'$ would be $-\frac{d}{dx}\overline{a(x)}: f \mapsto -(\overline{a}f)'$).
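The recipe for $P^*$ can be checked numerically on the example $P = a(x)\frac{d}{dx}$. The sketch below assumes $d = 1$, a hypothetical complex coefficient `coeff(x)` $= x + i(1 + x^2)$, two shifted bumps as test functions, and central finite differences for derivatives; it confirms $\langle f, Pg \rangle = \langle P^* f, g \rangle$ with $P^* f = -(\overline{a} f)'$, and shows that omitting the complex conjugate breaks the identity.

```python
import math

def simpson(g, a, b, n=3000):
    """Composite Simpson rule for the integral of g over [a, b] (g may be complex-valued)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def deriv(u, x, h=1e-5):
    """Central finite-difference approximation to u'(x)."""
    return (u(x + h) - u(x - h)) / (2 * h)

def coeff(x):
    """Hypothetical smooth complex coefficient a(x) = x + i(1 + x^2)."""
    return x + 1j * (1.0 + x * x)

def f(x):
    return raw_bump(x)          # supported in (-1, 1)

def g(x):
    return raw_bump(x - 0.5)    # supported in (-0.5, 1.5)

def inner(u, v):
    """<u, v> = integral of u * conj(v); the overlap of the supports lies inside [-1, 1.5]."""
    return simpson(lambda x: u(x) * complex(v(x)).conjugate(), -1.0, 1.5)

def P(u):
    """P u = a u'."""
    return lambda x: coeff(x) * deriv(u, x)

def P_star(u):
    """P* u = -(conj(a) u)': conjugate the coefficient, negate d/dx, reverse the order."""
    return lambda x: -deriv(lambda y: coeff(y).conjugate() * u(y), x)

def P_star_wrong(u):
    """Same as P_star but forgetting to conjugate the coefficient."""
    return lambda x: -deriv(lambda y: coeff(y) * u(y), x)

lhs = inner(f, P(g))
rhs = inner(P_star(f), g)
rhs_wrong = inner(P_star_wrong(f), g)
print(lhs, rhs, rhs_wrong)
```

The conjugation matters because the pairing $\langle f, g \rangle = \int f\overline{g}$ is conjugate-linear in its second slot; for real coefficients the two versions of $P^*$ coincide.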
Example 1.13.6. The distribution $\delta'$ defined in Exercise 1.13.11 is the
derivative $\frac{d}{dx}\delta$ of $\delta$, as defined by the above formula.

Many of the identities one is used to in classical calculus extend to the
distributional setting (as one would already expect from Lemma 1.13.5). For
instance:
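Exercise 1.13.20 (Product rule). Let $\lambda \in C_c^\infty(\mathbf{R}^d)^*$ be a distribution, and
let $f: \mathbf{R}^d \to \mathbf{C}$ be smooth. Show that
$$\frac{\partial}{\partial x_j}(\lambda f) = \Big(\frac{\partial}{\partial x_j}\lambda\Big) f + \lambda \Big(\frac{\partial}{\partial x_j} f\Big)$$
for all $j = 1, \ldots, d$.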
Exercise 1.13.21. Let $d = 1$. Show that $\delta' x = -\delta$ in three different ways:
• Directly from the definitions;
• Using the product rule;
• Writing $\delta$ as the limit of approximations $\psi_n$ to the identity.
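The third approach can be tested numerically: with $\psi_n(x) = n\psi(nx)$ built from a normalised bump (an illustrative choice), the pairing $\langle f, \psi_n' x \rangle = \int f(x)\,x\,\psi_n'(x)\,dx$ equals $-\int (xf)'\psi_n$ after an integration by parts, and so should approach $-f(0)$. The sketch below is a one-dimensional check under these assumptions, with a hypothetical asymmetric test function, not a proof.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def raw_bump_prime(x):
    """Derivative of exp(-1/(1 - x^2)), namely exp(-1/(1 - x^2)) * (-2x / (1 - x^2)^2)."""
    return raw_bump(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1 else 0.0

MASS = simpson(raw_bump, -1.0, 1.0)

def psi_n_prime(n, x):
    """Derivative of psi_n(x) = n psi(n x), namely n^2 psi'(n x), where psi = raw_bump / MASS."""
    return n * n * raw_bump_prime(n * x) / MASS

def f(x):
    """A test function supported in (-0.2, 0.8), not symmetric about 0."""
    return raw_bump(2.0 * (x - 0.3))

def pairing(n):
    """<f, psi_n' x> = integral of f(x) x psi_n'(x) dx (all functions real-valued)."""
    return simpson(lambda x: f(x) * x * psi_n_prime(n, x), -1.0 / n, 1.0 / n)

for n in (10, 100, 1000):
    print(n, pairing(n), -f(0.0))   # compare with <f, -delta> = -f(0)
```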

Exercise 1.13.22. Let $d = 1$.
(i) Show that if $\lambda$ is a distribution and $n \geq 1$ is an integer, then $\lambda x^n = 0$ if
and only if $\lambda$ is a linear combination of $\delta$ and its first $n - 1$ derivatives
$\delta', \delta'', \ldots, \delta^{(n-1)}$.
(ii) Show that a distribution $\lambda$ is supported on $\{0\}$ if and only if it is a
linear combination of $\delta$ and finitely many of its derivatives.
(iii) Generalise (ii) to the case of general dimension $d$ (where of course
one now uses partial derivatives instead of derivatives).
Exercise 1.13.23. Let $d = 1$.
• Show that the derivative of the Heaviside function $1_{[0,+\infty)}$ is equal
to $\delta$.
• Show that the derivative of the signum function $\mathrm{sgn}(x)$ is equal to
$2\delta$.
• Show that the derivative of the locally integrable function $\log|x|$ is
equal to $\mathrm{p.v.}\frac{1}{x}$.
• Show that the derivative of the locally integrable function
$\log|x|\,\mathrm{sgn}(x)$ is equal to the distribution $\lambda_1$ from Exercise 1.13.13.
• Show that the derivative of the locally integrable function $|x|$ is the
locally integrable function $\mathrm{sgn}(x)$.
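Several of these identities can be tested against a single test function using the definition $\langle f, F' \rangle := -\langle f', F \rangle$. The sketch below assumes $d = 1$, the illustrative test function $f(x) = e^{-1/(1-x^2)}(1 + x)$ (so that $f(0) = 1/e$ and $(f(x) - f(-x))/x = 2e^{-1/(1-x^2)}$), and a Simpson rule; it checks the Heaviside, signum, $|x|$ and $\log|x|$ cases, splitting each integral at the singular point 0.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def raw_bump_prime(x):
    """Derivative of exp(-1/(1 - x^2)), namely exp(-1/(1 - x^2)) * (-2x / (1 - x^2)^2)."""
    return raw_bump(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1 else 0.0

def f(x):
    """Test function raw_bump(x) * (1 + x): supported in (-1, 1), not even, f(0) = 1/e."""
    return raw_bump(x) * (1.0 + x)

def f_prime(x):
    return raw_bump_prime(x) * (1.0 + x) + raw_bump(x)

# <f, F'> := -<f', F> = -(integral of f'(x) F(x) dx) for real-valued F; each integral is split at
# the singular point 0, so the Simpson rule only ever sees smooth integrands.
heaviside_pairing = -simpson(f_prime, 0.0, 1.0)                              # F = 1_[0, +inf)
sgn_pairing = -simpson(f_prime, 0.0, 1.0) + simpson(f_prime, -1.0, 0.0)     # F = sgn
abs_pairing = (-simpson(lambda x: f_prime(x) * x, 0.0, 1.0)
               + simpson(lambda x: f_prime(x) * x, -1.0, 0.0))              # F = |x|
# F = log|x|: fold x and -x together, then substitute x = e^s to remove the log singularity
log_pairing = -simpson(lambda s: (f_prime(math.exp(s)) + f_prime(-math.exp(s))) * s * math.exp(s),
                       -40.0, 0.0, n=40000)

f0 = f(0.0)                                                   # <f, delta>
sgn_against_f = simpson(f, 0.0, 1.0) - simpson(f, -1.0, 0.0)  # <f, sgn>
pv_value = simpson(raw_bump, -1.0, 1.0)  # <f, p.v. 1/x>, since (f(x) - f(-x))/x = 2 raw_bump(x)

print(heaviside_pairing, f0)       # (1_[0,+inf))' = delta
print(sgn_pairing, 2 * f0)         # sgn' = 2 delta
print(abs_pairing, sgn_against_f)  # |x|' = sgn
print(log_pairing, pv_value)       # (log|x|)' = p.v. 1/x
```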

If a locally integrable function has a distributional derivative which is
also a locally integrable function, we refer to the latter as the weak derivative
of the former. Thus, for instance, the weak derivative of $|x|$ is $\mathrm{sgn}(x)$ (as one
would expect), but $\mathrm{sgn}(x)$ does not have a weak derivative (despite being
(classically) differentiable almost everywhere), because the distributional
derivative $2\delta$ of this function is not itself a locally integrable function. Thus
weak derivatives differ in some respects from their classical counterparts,
though of course the two concepts agree for smooth functions.
Exercise 1.13.24. Let $d \geq 1$. Show that for any $1 \leq i, j \leq d$, and any
distribution $\lambda \in C_c^\infty(\mathbf{R}^d)^*$, we have $\frac{\partial}{\partial x_i}\frac{\partial}{\partial x_j}\lambda = \frac{\partial}{\partial x_j}\frac{\partial}{\partial x_i}\lambda$; thus weak derivatives
commute with each other. (This is in contrast to classical derivatives,
which can fail to commute for non-smooth functions; for instance,
$\frac{\partial}{\partial x}\frac{\partial}{\partial y}\frac{xy^3}{x^2+y^2} \neq \frac{\partial}{\partial y}\frac{\partial}{\partial x}\frac{xy^3}{x^2+y^2}$ at the origin $(x, y) = 0$, despite both derivatives
being defined. More generally, weak derivatives tend to be less pathological
than classical derivatives, but of course the downside is that weak derivatives
do not always have a classical interpretation as a limit of a Newton
quotient.)
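The counterexample can be checked with nested difference quotients. For $F(x, y) = xy^3/(x^2 + y^2)$ (extended by 0 at the origin) one has $\frac{\partial F}{\partial x}(0, y) = y$ and $\frac{\partial F}{\partial y}(x, 0) = 0$, so $\frac{\partial}{\partial y}\frac{\partial}{\partial x}F(0, 0) = 1$ while $\frac{\partial}{\partial x}\frac{\partial}{\partial y}F(0, 0) = 0$. The sketch below approximates both orders numerically; the step sizes are illustrative choices, and the inner step has to be much smaller than the outer one, precisely because the two limits do not commute here.

```python
def F(x, y):
    """x y^3 / (x^2 + y^2), extended by 0 at the origin (continuous, but not C^2 there)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y ** 3 / (x * x + y * y)

def d_dx(G, x, y, h):
    """Central difference approximation to dG/dx at (x, y)."""
    return (G(x + h, y) - G(x - h, y)) / (2 * h)

def d_dy(G, x, y, h):
    """Central difference approximation to dG/dy at (x, y)."""
    return (G(x, y + h) - G(x, y - h)) / (2 * h)

INNER, OUTER = 1e-7, 1e-3   # the inner step must be much finer than the outer one

# d/dy of (dF/dx), evaluated at the origin
dy_dx = d_dy(lambda x, y: d_dx(F, x, y, INNER), 0.0, 0.0, OUTER)
# d/dx of (dF/dy), evaluated at the origin
dx_dy = d_dx(lambda x, y: d_dy(F, x, y, INNER), 0.0, 0.0, OUTER)

print(dy_dx, dx_dy)
```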
Exercise 1.13.25. Let $d = 1$, and let $k \geq 0$ be an integer. Let us say that
a compactly supported distribution $\lambda \in C_c^\infty(\mathbf{R})^*$ has order at most $k$ if
the functional $f \mapsto \langle f, \lambda \rangle$ is continuous in the $C^k$ norm. Thus, for instance,
$\delta$ has order at most 0, and $\delta'$ has order at most 1, and every compactly
supported distribution is of order at most $k$ for some sufficiently large $k$.
• Show that if $\lambda$ is a compactly supported distribution of order at most
0, then it is a compactly supported Radon measure.
• Show that if $\lambda$ is a compactly supported distribution of order at most
$k$, then $\lambda'$ has order at most $k + 1$.
• Conversely, if $\lambda$ is a compactly supported distribution of order at most $k + 1$,
show that we can write $\lambda = \rho' + \nu$ for some compactly supported distributions
$\rho, \nu$ of order at most $k$. (Hint: One has to dualise the fundamental theorem
of calculus and then apply smooth cutoffs to recover compact support.)
• Show that every compactly supported distribution can be expressed
as a finite linear combination of (distributional) derivatives of compactly
supported Radon measures.
• Show that every compactly supported distribution can be expressed
as a finite linear combination of (distributional) derivatives of functions
in $C_0^k(\mathbf{R})$, for any fixed $k$.

We now set out some other operations on distributions. If we define
the translation $\tau_x f$ of a test function $f$ by a shift $x \in \mathbf{R}^d$ by the formula
$\tau_x f(y) := f(y - x)$, then we have
$$\langle f, \tau_x g \rangle = \langle \tau_{-x} f, g \rangle$$
for all test functions $f, g$, so it is natural to define the translation $\tau_x\lambda$ of a
distribution $\lambda$ by the formula
$$\langle f, \tau_x\lambda \rangle := \langle \tau_{-x} f, \lambda \rangle.$$

Next, we consider linear changes of variable.

Exercise 1.13.26 (Linear changes of variable). Let $d \geq 1$, and let $L: \mathbf{R}^d \to \mathbf{R}^d$
be an invertible linear transformation. Given a distribution $\lambda \in C_c^\infty(\mathbf{R}^d)^*$, let $\lambda \circ L$
be the distribution given by the formula
$$\langle f, \lambda \circ L \rangle := \frac{1}{|\det L|}\langle f \circ L^{-1}, \lambda \rangle$$
for all test functions $f$. (How would one motivate this formula?)
• Show that $\delta \circ L = \frac{1}{|\det L|}\delta$ for all invertible linear transformations $L$.
• If $d = 1$, show that $(\mathrm{p.v.}\frac{1}{x}) \circ L = \frac{1}{\det L}\,\mathrm{p.v.}\frac{1}{x}$ for all invertible linear
transformations $L$.
• Conversely, if $d = 1$ and $\lambda$ is a distribution such that $\lambda \circ L = \frac{1}{\det L}\lambda$
for all invertible linear transformations $L$, show that $\lambda$ is a constant multiple
of $\mathrm{p.v.}\frac{1}{x}$. (Hint: First show that there exists a constant $c$ such that
$\langle f, \lambda \rangle = c\int_0^\infty \frac{f(x)}{x}\,dx$ whenever $f$ is a bump function supported in
$(0, +\infty)$. To show this, approximate $f$ by the function
$$\int_{-\infty}^{\infty} f(e^t x)\psi_n(t)\,dt = \int_0^\infty \frac{f(y)}{y}\,\psi_n\Big(\log\frac{y}{x}\Big) 1_{x>0}\,dy$$
for $\psi_n$ an approximation to the identity.)
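In one dimension with the dilation $Lx = ax$, the defining formula can be tested on both examples. The sketch below (the test function, the bump used for $\varphi_n$, and the truncation parameters `R` and `eps` are illustrative assumptions) approximates $\langle f, \delta \circ L \rangle$ through $\varphi_n$ and computes $\langle f, (\mathrm{p.v.}\frac{1}{x}) \circ L \rangle$ directly. For $a < 0$ the principal value picks up the sign of $a$, which is why its scaling factor is $1/\det L$ rather than $1/|\det L|$.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

MASS = simpson(raw_bump, -1.0, 1.0)

def phi_n(n, x):
    """Approximation to the identity n * phi(n x) built from the normalised bump."""
    return n * raw_bump(n * x) / MASS

def f(x):
    """Test function supported in (-1, 1), not even."""
    return raw_bump(x) * (1.0 + x)

def compose_pairing(pairing, a):
    """<f, lambda o L> := (1/|det L|) <f o L^{-1}, lambda> for the dilation L x = a x (d = 1)."""
    return pairing(lambda x: f(x / a)) / abs(a)

def delta_approx(g, n=200):
    """<g, phi_n>, an approximation to <g, delta> = g(0) for real g."""
    return simpson(lambda x: g(x) * phi_n(n, x), -1.0 / n, 1.0 / n)

def pv(g, R=10.0, eps=1e-9):
    """<g, p.v. 1/x> for real g supported in (-R, R), folded as int_eps^R (g(x) - g(-x))/x dx."""
    return simpson(lambda x: (g(x) - g(-x)) / x, eps, R, n=20000)

for a in (3.0, -2.0):
    print(a, compose_pairing(delta_approx, a), f(0.0) / abs(a))   # delta o L = delta / |det L|
    print(a, compose_pairing(pv, a), pv(f) / a)                    # (p.v. 1/x) o L = (p.v. 1/x) / det L
```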
Remark 1.13.7. One can also compose distributions with diffeomorphisms.
However, things become much more delicate if the map one is composing
with contains stationary points. For instance, in one dimension, one cannot
meaningfully make sense of $\delta(x^2)$ (the composition of the Dirac delta
distribution with $x \mapsto x^2$). This can be seen by first noting that for an
approximation $\psi_n$ to the identity, $\psi_n(x^2)$ does not converge to a limit in the
distributional sense.
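The divergence of $\psi_n(x^2)$ is quantitative: substituting $x = y/\sqrt{n}$ in $\int f(x)\,n\psi(nx^2)\,dx$ gives $\sqrt{n}\int f(y/\sqrt{n})\psi(y^2)\,dy$, which grows like $\sqrt{n}\,f(0)\int \psi(y^2)\,dy$. A numerical sketch, assuming $\psi_n(x) = n\psi(nx)$ with $\psi$ a normalised bump and an illustrative test function with $f(0) \neq 0$:

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

MASS = simpson(raw_bump, -1.0, 1.0)

def psi_n(n, x):
    """Approximation to the identity n * psi(n x), with psi the normalised bump (d = 1)."""
    return n * raw_bump(n * x) / MASS

def f(x):
    """Test function with f(0) = 1/e != 0."""
    return raw_bump(x) * (1.0 + x)

def pair_composed(n):
    """<f, psi_n(x^2)> = integral of f(x) psi_n(x^2) dx; the integrand vanishes unless n x^2 < 1."""
    w = 1.0 / math.sqrt(n)
    return simpson(lambda x: f(x) * psi_n(n, x * x), -w, w)

for n in (10**2, 10**4, 10**6):
    value = pair_composed(n)
    print(n, value, value / math.sqrt(n))   # the ratio in the last column stabilises
```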
Exercise 1.13.27 (Tensor product of distributions). Let $d, d' \geq 1$ be integers.
If $\lambda \in C_c^\infty(\mathbf{R}^d)^*$ and $\rho \in C_c^\infty(\mathbf{R}^{d'})^*$ are distributions, show that there
is a unique distribution $\lambda \otimes \rho \in C_c^\infty(\mathbf{R}^{d+d'})^*$ with the property that
$$\langle f \otimes g, \lambda \otimes \rho \rangle = \langle f, \lambda \rangle \langle g, \rho \rangle \tag{1.116}$$
for all test functions $f \in C_c^\infty(\mathbf{R}^d)$, $g \in C_c^\infty(\mathbf{R}^{d'})$, where $f \otimes g \in C_c^\infty(\mathbf{R}^{d+d'})$
is the tensor product $f \otimes g(x, x') := f(x)g(x')$ of $f$ and $g$. (Hint: Like many
other constructions of tensor products, this is rather intricate. One way is to
start by fixing two cutoff functions $\psi, \psi'$ on $\mathbf{R}^d, \mathbf{R}^{d'}$, respectively, and define
$\lambda \otimes \rho$ on modulated test functions $e^{2\pi i \xi \cdot x} e^{2\pi i \xi' \cdot x'} \psi(x)\psi'(x')$ for various
frequencies $\xi, \xi'$, and then use Fourier series to define $\lambda \otimes \rho$ on $F(x, x')\psi(x)\psi'(x')$
for smooth $F$. Then show that these definitions of $\lambda \otimes \rho$ are compatible for
different choices of $\psi, \psi'$ and can be glued together to form a distribution;
finally, go back and verify (1.116).)

We close this section with one caveat. Despite the many operations
that one can perform on distributions, there are two types of operations
which cannot, in general, be defined on arbitrary distributions (at least
while remaining in the class of distributions):
• Nonlinear operations (e.g., taking the absolute value of a distribution); or
• Multiplying a distribution by anything rougher than a smooth function.
Thus, for instance, there is no meaningful way to interpret the square
$\delta^2$ of the Dirac delta function as a distribution. This is perhaps easiest to
see using an approximation $\psi_n$ to the identity: $\psi_n$ converges to $\delta$ in the
sense of distributions, but $\psi_n^2$ does not converge to anything (the integral
against a test function that does not vanish at the origin will go to infinity
as $n \to \infty$). For similar reasons, one cannot meaningfully interpret the
absolute value $|\delta'|$ of the derivative of the delta function. (One also cannot
multiply $\delta$ by $\mathrm{sgn}(x)$. Why?)
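The same scaling argument explains the blow-up of $\psi_n^2$: $\int f\psi_n^2 = n\int f(y/n)\psi(y)^2\,dy \approx n f(0)\int \psi^2$. The sketch below (illustrative assumptions: $d = 1$, $\psi_n(x) = n\psi(nx)$ with $\psi$ a normalised bump, and a test function with $f(0) \neq 0$) shows $\langle f, \psi_n \rangle$ settling at $f(0)$ while $\langle f, \psi_n^2 \rangle$ grows linearly in $n$.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

MASS = simpson(raw_bump, -1.0, 1.0)

def psi_n(n, x):
    """Approximation to the identity n * psi(n x), with psi the normalised bump (d = 1)."""
    return n * raw_bump(n * x) / MASS

def f(x):
    """Test function with f(0) = 1/e != 0."""
    return raw_bump(x) * (1.0 + x)

def pair(n):
    """<f, psi_n> = integral of f psi_n, which tends to f(0)."""
    return simpson(lambda x: f(x) * psi_n(n, x), -1.0 / n, 1.0 / n)

def pair_square(n):
    """<f, psi_n^2> = integral of f psi_n^2 (psi_n is real), which grows like n."""
    return simpson(lambda x: f(x) * psi_n(n, x) ** 2, -1.0 / n, 1.0 / n)

for n in (10, 100, 1000):
    print(n, pair(n), pair_square(n), pair_square(n) / n)
```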
Exercise 1.13.28. Let $X$ be a normed vector space which contains $C_c^\infty(\mathbf{R}^d)$
as a dense subspace (and such that the inclusion of $C_c^\infty(\mathbf{R}^d)$ into $X$ is continuous).
The adjoint (or transpose) of this inclusion map is then an injection
from $X^*$ to the space of distributions $C_c^\infty(\mathbf{R}^d)^*$; thus $X^*$ can be viewed as
a subspace of the space of distributions.
• Show that the closed unit ball in $X^*$ is also closed in the space of
distributions.
• Conclude that any distributional limit of a bounded sequence in
$L^p(\mathbf{R}^d)$ for $1 < p \leq \infty$ is still in $L^p(\mathbf{R}^d)$.
• Show that the previous claim fails for $L^1(\mathbf{R}^d)$, but holds for the space
$M(\mathbf{R}^d)$ of finite measures.

1.13.3. Tempered distributions. The list of operations one can define
on distributions has one major omission: the Fourier transform $\mathcal{F}$. Unfortunately,
one cannot easily define the Fourier transform for all distributions.
One can see this as follows. From Plancherel’s theorem one has the identity
$$\langle f, \mathcal{F}g \rangle = \langle \mathcal{F}^* f, g \rangle$$
for test functions $f, g$, so one would like to define the Fourier transform
$\mathcal{F}\lambda = \hat\lambda$ of a distribution $\lambda$ by the formula
$$\langle f, \mathcal{F}\lambda \rangle := \langle \mathcal{F}^* f, \lambda \rangle. \tag{1.117}$$
Unfortunately, this does not quite work because the adjoint Fourier transform
$\mathcal{F}^*$ of a test function is not a test function, but is instead just a Schwartz
function. (Indeed, by Exercise 1.12.42, it is not possible to find a non-trivial
test function whose Fourier transform is again a test function.) To address
this, we need to work with a slightly smaller space than that of all distributions,
namely those of tempered distributions:
Definition 1.13.8 (Tempered distributions). A tempered distribution is a
continuous linear functional $\lambda: f \mapsto \langle f, \lambda \rangle$ on the Schwartz space $\mathcal{S}(\mathbf{R}^d)$
(with the topology given by Exercise 1.12.25), i.e., an element of $\mathcal{S}(\mathbf{R}^d)^*$.

Since $C_c^\infty(\mathbf{R}^d)$ embeds continuously into $\mathcal{S}(\mathbf{R}^d)$ (with a dense image),
we see that the space of tempered distributions can be embedded into the
space of distributions. However, not every distribution is tempered:

Example 1.13.9. The distribution $e^x$ is not tempered. Indeed, if $\psi$ is a
bump function, observe that the sequence of functions $e^{-n}\psi(x - n)$ converges
to zero in the Schwartz space topology, but $\langle e^{-n}\psi(x - n), e^x \rangle$ does not go to
zero, and so this distribution does not correspond to a tempered distribution.
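Both halves of the example can be seen numerically. The pairing equals $\int \psi(y)e^y\,dy$ for every $n$ (substitute $y = x - n$), whereas the weighted suprema $\sup_x |x|^k|e^{-n}\psi(x - n)|$ are at most $e^{-n}(n + 1)^k\sup|\psi|$ and so tend to zero; derivatives of $e^{-n}\psi(x - n)$ behave in the same way. The sketch below takes $\psi$ to be the illustrative bump $e^{-1/(1-x^2)}$ and only checks the seminorm without derivatives.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def raw_bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def shifted(n, x):
    """The function e^{-n} psi(x - n) from the example, taking psi = raw_bump."""
    return math.exp(-n) * raw_bump(x - n)

def pairing_with_exp(n):
    """<e^{-n} psi(x - n), e^x>: the integral of e^{-n} psi(x - n) e^x over the support [n - 1, n + 1]."""
    return simpson(lambda x: shifted(n, x) * math.exp(x), n - 1.0, n + 1.0)

def weighted_sup(n, k, samples=2000):
    """Grid estimate of sup_x |x|^k |e^{-n} psi(x - n)|, one Schwartz seminorm (no derivatives)."""
    grid = (n - 1.0 + 2.0 * i / samples for i in range(samples + 1))
    return max(abs(x) ** k * shifted(n, x) for x in grid)

for n in (1, 10, 30, 50):
    print(n, pairing_with_exp(n), weighted_sup(n, 3))
```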

On the other hand, distributions which avoid this sort of exponential
growth, and instead only grow polynomially, tend to be tempered:
Exercise 1.13.29. Show that any Radon measure $\mu$ which is of polynomial
growth in the sense that $|\mu|(B(0, R)) \leq CR^k$ for all $R \geq 1$ and some constants
$C, k > 0$, where $B(0, R)$ is the ball of radius $R$ centred at the origin
in $\mathbf{R}^d$, is tempered.
Remark 1.13.10. As a zeroth approximation, one can roughly think of
“tempered” as being synonymous with “polynomial growth”. However, this
is not strictly true: for instance, the (weak) derivative of a function of polynomial
growth will still be tempered, but need not be of polynomial growth
(for instance, the derivative $e^x\cos(e^x)$ of $\sin(e^x)$ is a tempered distribution,
despite having exponential growth). While one can eventually describe
which distributions are tempered by measuring their growth in both physical
space and in frequency space, we will not do so here.

Most of the operations that preserve the space of distributions also
preserve the space of tempered distributions. For instance:
Exercise 1.13.30.
• Show that any derivative of a tempered distribution is again a tempered
distribution.
• Show that any convolution of a tempered distribution with a
compactly supported distribution is again a tempered distribution.
• Show that if $f$ is a measurable function which is rapidly decreasing in
the sense that $|x|^k f(x)$ is an $L^\infty(\mathbf{R}^d)$ function for each $k = 0, 1, 2, \ldots$,
then a convolution of a tempered distribution with $f$ can be defined,
and is again a tempered distribution.
• Show that if $f$ is a smooth function such that $f$ and all its derivatives
have at most polynomial growth (thus for each $j \geq 0$ there exist
$C, k \geq 0$ such that $|\nabla^j f(x)| \leq C(1 + |x|)^k$ for all $x \in \mathbf{R}^d$), then
the product of a tempered distribution with $f$ is again a tempered
distribution. Give a counterexample to show that this statement fails
if the polynomial growth hypotheses are dropped.
• Show that the translate of a tempered distribution is again a tempered
distribution.

But we can now add a new operation to this list using the formula
(1.117): as the Fourier transform $\mathcal{F}$ maps Schwartz functions continuously
to Schwartz functions, it also continuously maps the space of tempered distributions
to itself. One can also define the inverse Fourier transform $\mathcal{F}^* = \mathcal{F}^{-1}$
on tempered distributions in a similar manner.
It is not difficult to extend many of the properties of the Fourier transform
from Schwartz functions to tempered distributions. For instance:
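Exercise 1.13.31. Let $\lambda \in \mathcal{S}(\mathbf{R}^d)^*$ be a tempered distribution, and let
$f \in \mathcal{S}(\mathbf{R}^d)$ be a Schwartz function.
• Inversion formula. Show that $\mathcal{F}^*\mathcal{F}\lambda = \mathcal{F}\mathcal{F}^*\lambda = \lambda$.
• Multiplication intertwines with convolution. Show that
$\mathcal{F}(\lambda f) = (\mathcal{F}\lambda) * (\mathcal{F}f)$ and $\mathcal{F}(\lambda * f) = (\mathcal{F}\lambda)(\mathcal{F}f)$.
• Translation intertwines with modulation. For any $x_0 \in \mathbf{R}^d$, show that
$\mathcal{F}(\tau_{x_0}\lambda) = e_{-x_0}\mathcal{F}\lambda$, where $e_{-x_0}(\xi) := e^{-2\pi i \xi \cdot x_0}$. Similarly, show that
for any $\xi_0 \in \mathbf{R}^d$, one has $\mathcal{F}(e_{\xi_0}\lambda) = \tau_{\xi_0}\mathcal{F}\lambda$.
• Linear transformations. For any invertible linear transformation $L: \mathbf{R}^d \to \mathbf{R}^d$,
show that $\mathcal{F}(\lambda \circ L) = \frac{1}{|\det L|}(\mathcal{F}\lambda) \circ (L^*)^{-1}$.
• Differentiation intertwines with polynomial multiplication. For
any $1 \leq j \leq d$, show that $\mathcal{F}(\frac{\partial}{\partial x_j}\lambda) = 2\pi i \xi_j \mathcal{F}\lambda$, where $x_j$ and $\xi_j$
are the $j$th coordinate functions in physical space and frequency space,
respectively, and similarly $\mathcal{F}(-2\pi i x_j \lambda) = \frac{\partial}{\partial \xi_j}\mathcal{F}\lambda$.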
Exercise 1.13.32. Let $d \geq 1$.
• Inversion formula. Show that $\mathcal{F}\delta = 1$ and $\mathcal{F}1 = \delta$.
• Orthogonality. Let $V$ be a subspace of $\mathbf{R}^d$, and let $\mu$ be Lebesgue
measure on $V$. Show that $\mathcal{F}\mu$ is Lebesgue measure on the orthogonal
complement $V^\perp$ of $V$. (Note that this generalises the previous
exercise.)
• Poisson summation formula. Let $\sum_{k \in \mathbf{Z}^d} \tau_k\delta$ be the distribution
$$\Big\langle f, \sum_{k \in \mathbf{Z}^d} \tau_k\delta \Big\rangle := \sum_{k \in \mathbf{Z}^d} f(k).$$
Show that this is a tempered distribution which is equal to its own
Fourier transform.
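Pairing the Poisson summation identity against an even Gaussian $f_t(x) = e^{-\pi t x^2}$ (for which $\mathcal{F}^* f_t = \mathcal{F} f_t$) reduces it, in one dimension, to $\sum_k f_t(k) = \sum_k \hat f_t(k)$. The sketch below assumes the convention $\hat f(\xi) = \int f(x)e^{-2\pi i x\xi}\,dx$, consistent with $e_{-x_0}(\xi) = e^{-2\pi i \xi \cdot x_0}$ in Exercise 1.13.31, under which $\hat f_t(\xi) = t^{-1/2}e^{-\pi\xi^2/t}$; it checks this transform numerically at one frequency and then compares the two lattice sums. This is a check for one family of test functions, not a proof.

```python
import math

def gaussian(t, x):
    """f_t(x) = exp(-pi t x^2), an even Schwartz function (t > 0)."""
    return math.exp(-math.pi * t * x * x)

def gaussian_hat(t, xi):
    """Closed form hat f_t(xi) = t^{-1/2} exp(-pi xi^2 / t) under hat f(xi) = int f(x) e^{-2 pi i x xi} dx."""
    return math.exp(-math.pi * xi * xi / t) / math.sqrt(t)

def numeric_hat(t, xi, R=10.0, n=4000):
    """Simpson approximation of int_{-R}^{R} f_t(x) cos(2 pi x xi) dx; the sine part vanishes since f_t is even."""
    h = 2 * R / n
    total = 0.0
    for i in range(n + 1):
        x = -R + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * gaussian(t, x) * math.cos(2 * math.pi * x * xi)
    return total * h / 3

def lattice_sum(g, K=60):
    """Sum of g(k) over the integers |k| <= K (ample for these rapidly decaying Gaussians)."""
    return sum(g(k) for k in range(-K, K + 1))

for t in (0.3, 1.0, 2.5):
    lhs = lattice_sum(lambda k: gaussian(t, k))       # <f_t, sum_k tau_k delta> = sum_k f_t(k)
    rhs = lattice_sum(lambda k: gaussian_hat(t, k))   # <F* f_t, sum_k tau_k delta> = sum_k hat f_t(k)
    print(t, lhs, rhs, numeric_hat(t, 0.7) - gaussian_hat(t, 0.7))
```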

One can use these properties of tempered distributions to start solving
constant-coefficient PDE. We first illustrate this by an ODE example,
showing how the formal symbolic calculus for solving such ODE, which you
may have seen as an undergraduate, can now (sometimes) be justified using
tempered distributions.
Exercise 1.13.33. Let $d = 1$, let $a, b$ be real numbers, and let $D$ be the
operator $D = \frac{d}{dx}$.
• If $a \neq b$, use the Fourier transform to show that all tempered distribution
solutions to the ODE $(D - ia)(D - ib)\lambda = 0$ are of the form
$\lambda = Ae^{iax} + Be^{ibx}$ for some constants $A, B$.
• If $a = b$, show that all tempered distribution solutions to the ODE
$(D - ia)(D - ib)\lambda = 0$ are of the form $\lambda = Ae^{iax} + Bxe^{iax}$ for some
constants $A, B$.
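For the case $a \neq b$, the Fourier-side computation can be outlined as follows. This is a sketch rather than a complete proof: it uses the differentiation and modulation rules of Exercise 1.13.31, the identity $\mathcal{F}1 = \delta$ from Exercise 1.13.32, and Exercise 1.13.22(i), and it only indicates the localisation near each root.

```latex
% Fourier side of (D - ia)(D - ib)\lambda = 0, using \mathcal{F}(\tfrac{d}{dx}\lambda) = 2\pi i\xi\,\mathcal{F}\lambda:
\mathcal{F}\big((D - ia)(D - ib)\lambda\big)
  = (2\pi i\xi - ia)(2\pi i\xi - ib)\,\hat\lambda
  = -4\pi^2\Big(\xi - \frac{a}{2\pi}\Big)\Big(\xi - \frac{b}{2\pi}\Big)\hat\lambda = 0.
% Away from \xi = a/2\pi, b/2\pi the polynomial factor is invertible, so \hat\lambda is supported on
% \{a/2\pi, b/2\pi\}; localising near each point and applying Exercise 1.13.22(i) with n = 1
% (after a translation) gives
\hat\lambda = A\,\tau_{a/2\pi}\delta + B\,\tau_{b/2\pi}\delta .
% Since \mathcal{F}(e_{\xi_0}) = \mathcal{F}(e_{\xi_0} \cdot 1) = \tau_{\xi_0}\mathcal{F}1 = \tau_{\xi_0}\delta with
% e_{\xi_0}(x) = e^{2\pi i \xi_0 x}, inverting the Fourier transform yields
\lambda = A\,e^{iax} + B\,e^{ibx}.
```

When $a = b$ the double root means $(\xi - \frac{a}{2\pi})^2\hat\lambda = 0$, and Exercise 1.13.22(i) with $n = 2$ then also allows a $\delta'$ term, which is the source of the $xe^{iax}$ solution.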
Remark 1.13.11. More generally, one can solve any homogeneous constant-
coefficient ODE using tempered distributions and the Fourier transform so
long as the roots of the characteristic polynomial are purely imaginary. In
all other cases, solutions can grow exponentially as x → +∞ or x → −∞
and so are not tempered. There are other theories of generalised functions
that can handle these objects (e.g., hyperfunctions) but we will not discuss
them here.

Now we turn to PDE. To illustrate the method, let us focus on solving
Poisson’s equation
$$\Delta u = f \tag{1.118}$$
in $\mathbf{R}^d$, where $f$ is a Schwartz function and $u$ is a distribution, where
$\Delta = \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$ is the Laplacian. (In some texts, particularly those using spectral
analysis, the Laplacian is occasionally defined instead as $-\sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$, to
make it positive semidefinite. But we will eschew that sign convention here,
though of course the theory is only changed in a trivial fashion if one adopts
it.)
We first settle the question of uniqueness:
Exercise 1.13.34. Let $d \geq 1$. Using the Fourier transform, show that the
only tempered distributions $\lambda \in \mathcal{S}(\mathbf{R}^d)^*$ which are harmonic (by which we
mean that $\Delta\lambda = 0$ in the sense of distributions) are the harmonic polynomials.
(Hint: Use Exercise 1.13.22.) Note that this generalises Liouville’s
theorem. There are of course many other harmonic functions than the harmonic
polynomials, e.g., $e^x\cos(y)$, but such functions are not tempered distributions.

From the above exercise, we know that the solution $u$ to (1.118), if tempered, is determined up to harmonic polynomials. To find a solution, we observe that it is enough to find a fundamental solution, i.e., a tempered distribution $K$ solving the equation
$$\Delta K = \delta.$$
Indeed, if one then convolves this equation with the Schwartz function $f$ and uses the identity $(\Delta K) * f = \Delta(K * f)$ (which can either be seen directly or by using Exercise 1.13.31), we see that $u = K * f$ will be a tempered distribution solution to (1.118) (and all the other solutions will equal this solution plus a harmonic polynomial). So, it is enough to locate a fundamental solution $K$. We can take Fourier transforms and rewrite this equation as
$$-4\pi^2 |\xi|^2 \hat{K}(\xi) = 1$$
(here we are treating the tempered distribution $\hat{K}$ as a function to emphasise that the independent variable is now $\xi$). It is then natural to propose solving this equation as
$$\hat{K}(\xi) = \frac{1}{-4\pi^2 |\xi|^2}, \tag{1.119}$$
though this may not be the unique solution (for instance, one is free to modify $\hat{K}$ by a multiple of the Dirac delta function, cf. Exercise 1.13.16).
A short computation in polar coordinates shows that $\frac{1}{-4\pi^2 |\xi|^2}$ is locally integrable in dimensions $d \geq 3$, so the right-hand side of (1.119) makes sense. To then compute $K$ explicitly, we have from the distributional inversion formula that
$$K = -\frac{1}{4\pi^2} \mathcal{F}^* |\xi|^{-2}.$$

So we now need to figure out what the Fourier transform of a negative power of $|x|$ (or the adjoint Fourier transform of a negative power of $|\xi|$) is.
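The polar-coordinate computation mentioned above can be sketched as follows, writing $|S^{d-1}|$ for the surface measure of the unit sphere (a notation introduced only for this display):
$$\int_{|\xi| \leq 1} \frac{d\xi}{|\xi|^2} = |S^{d-1}| \int_0^1 r^{-2}\, r^{d-1}\, dr = |S^{d-1}| \int_0^1 r^{d-3}\, dr = \frac{|S^{d-1}|}{d-2} < \infty \qquad (d \geq 3).$$
For $d = 1, 2$ the radial integral $\int_0^1 r^{d-3}\, dr$ diverges at $r = 0$, so $|\xi|^{-2}$ is not locally integrable there; away from the origin $|\xi|^{-2}$ is bounded, so the origin is the only issue.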
Let us work formally at first and consider the problem of computing the Fourier transform of the function $|x|^{-\alpha}$ in $\mathbb{R}^d$ for some exponent $\alpha$. A direct attack, based on evaluating the (formal) Fourier integral
$$\widehat{|x|^{-\alpha}}(\xi) = \int_{\mathbb{R}^d} |x|^{-\alpha} e^{-2\pi i \xi \cdot x}\, dx, \tag{1.120}$$
does not seem to make much sense (the integral is not absolutely integrable), although a change of variables (or dimensional analysis) heuristic can at least lead to the prediction that the integral (1.120) should be some multiple of $|\xi|^{\alpha - d}$. But which multiple should it be? To continue the formal calculation, we can write the non-integrable function $|x|^{-\alpha}$ as an average of integrable functions whose Fourier transforms are already known. There are many such functions that one could use here, but it is natural to use Gaussians, as they have a particularly pleasant Fourier transform, namely
$$\widehat{e^{-\pi t^2 |x|^2}}(\xi) = t^{-d} e^{-\pi |\xi|^2 / t^2}$$

for $t > 0$ (see Exercise 1.12.32). To get from Gaussians to $|x|^{-\alpha}$, one can observe that $|x|^{-\alpha}$ is invariant under the scaling $f(x) \mapsto t^\alpha f(tx)$ for $t > 0$. Thus, it is natural to average the standard Gaussian $e^{-\pi |x|^2}$ with respect to this scaling, thus producing the function $t^\alpha e^{-\pi t^2 |x|^2}$, then integrate with respect to the multiplicative Haar measure $\frac{dt}{t}$. A straightforward change of variables then gives the identity
$$\int_0^\infty t^\alpha e^{-\pi t^2 |x|^2} \frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} |x|^{-\alpha} \Gamma(\alpha/2),$$
where
$$\Gamma(s) := \int_0^\infty t^s e^{-t} \frac{dt}{t}$$
is the Gamma function. If we formally take Fourier transforms of this identity, we obtain
$$\int_0^\infty t^\alpha t^{-d} e^{-\pi |\xi|^2 / t^2} \frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} \widehat{|x|^{-\alpha}}(\xi) \Gamma(\alpha/2).$$
Another change of variables shows that
$$\int_0^\infty t^\alpha t^{-d} e^{-\pi |\xi|^2 / t^2} \frac{dt}{t} = \frac{1}{2} \pi^{-(d-\alpha)/2} |\xi|^{-(d-\alpha)} \Gamma((d-\alpha)/2),$$
and so we conclude (formally) that
$$\widehat{|x|^{-\alpha}}(\xi) = \frac{\pi^{-(d-\alpha)/2} \Gamma((d-\alpha)/2)}{\pi^{-\alpha/2} \Gamma(\alpha/2)} |\xi|^{-(d-\alpha)}, \tag{1.121}$$
thus solving the problem of what the constant multiple of $|\xi|^{-(d-\alpha)}$ should be.
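The two changes of variables above can be checked numerically. In the following Python sketch (function names, the truncation of the range of $u$ and the number of nodes are illustrative assumptions), we substitute $t = e^u$, so that $\frac{dt}{t} = du$, and apply the trapezoidal rule; working with the logarithm of the integrand avoids overflow at the ends of the range.

```python
import math


def integral_over_u(log_integrand, lo=-40.0, hi=40.0, n=8000):
    """Trapezoidal rule for the integral of exp(log_integrand(u)) over [lo, hi].
    With t = e^u one has dt/t = du, so passing u -> log g(e^u) approximates
    the integral of g(t) dt/t over (0, infinity)."""
    h = (hi - lo) / n
    total = 0.5 * (math.exp(log_integrand(lo)) + math.exp(log_integrand(hi)))
    for k in range(1, n):
        total += math.exp(log_integrand(lo + k * h))
    return total * h


def gaussian_average(alpha, r):
    """Integral of t^alpha exp(-pi t^2 r^2) dt/t over (0, infinity), with r = |x| > 0."""
    return integral_over_u(lambda u: alpha * u - math.pi * r * r * math.exp(2 * u))


def transformed_average(alpha, d, rho):
    """Integral of t^alpha t^{-d} exp(-pi rho^2 / t^2) dt/t, with rho = |xi| > 0."""
    return integral_over_u(lambda u: (alpha - d) * u - math.pi * rho * rho * math.exp(-2 * u))


def gamma_side(s, r):
    """(1/2) pi^{-s/2} r^{-s} Gamma(s/2), the closed form predicted by the change of variables."""
    return 0.5 * math.pi ** (-s / 2) * r ** (-s) * math.gamma(s / 2)


alpha, d = 1.3, 3
print(gaussian_average(alpha, 0.8), gamma_side(alpha, 0.8))
print(transformed_average(alpha, d, 1.7), gamma_side(d - alpha, 1.7))
```

Each printed pair should agree to many digits. This only tests the two classical integrals, not the formal step of taking Fourier transforms inside the $t$-integral, which is what Exercise 1.13.35 asks one to justify.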

Exercise 1.13.35. Give a rigorous proof of (1.121) for $0 < \alpha < d$ (when both sides are locally integrable) in the sense of distributions. (Hint: Basically, one needs to test the entire formal argument against an arbitrary Schwartz function.) The identity (1.121) can in fact be continued meromorphically in $\alpha$, but the interpretation of distributions such as $|x|^{-\alpha}$ when $|x|^{-\alpha}$ is not locally integrable is somewhat complicated (cf. Exercise 1.13.12) and will not be discussed here.

Specialising back to the current situation with $d = 3$, $\alpha = 2$, and using the standard identities
$$\Gamma(n) = (n-1)!, \qquad \Gamma(\tfrac{1}{2}) = \sqrt{\pi},$$
we see that
$$\widehat{\frac{1}{|x|^2}}(\xi) = \pi |\xi|^{-1},$$
and similarly
$$\mathcal{F}^* \frac{1}{|\xi|^2} = \pi |x|^{-1}.$$
So from (1.119) we see that one choice of the fundamental solution $K$ is the Newton potential
$$K = \frac{-1}{4\pi |x|},$$
leading to an explicit (and rigorously derived) solution
$$u(x) := f * K(x) = -\frac{1}{4\pi} \int_{\mathbb{R}^3} \frac{f(y)}{|x-y|}\, dy \tag{1.122}$$
to the Poisson equation (1.118) in $d = 3$ for Schwartz functions $f$. (This is not quite the only fundamental solution $K$ available; one can add a harmonic polynomial to $K$, which will end up adding a harmonic polynomial to $u$, since the convolution of a harmonic polynomial with a Schwartz function is easily seen to still be harmonic.)
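A quick numerical cross-check of the constants in this specialisation: the Python sketch below evaluates the ratio in (1.121) at $d = 3$, $\alpha = 2$, recovers the coefficient $-\frac{1}{4\pi}$ of the Newton potential from $K = -\frac{1}{4\pi^2}\mathcal{F}^*|\xi|^{-2}$, and confirms by central differences that $K$ is harmonic away from the origin. The function names, the sample point and the step size are illustrative; harmonicity away from $0$ is only a necessary condition for $\Delta K = \delta$, since the delta mass sits at the origin.

```python
import math


def fourier_power_constant(d, alpha):
    """Constant c in (1.121): the Fourier transform of |x|^{-alpha} on R^d is c |xi|^{-(d - alpha)}."""
    num = math.pi ** (-(d - alpha) / 2) * math.gamma((d - alpha) / 2)
    den = math.pi ** (-alpha / 2) * math.gamma(alpha / 2)
    return num / den


def newton_potential(x):
    """K(x) = -1 / (4 pi |x|) on R^3 minus the origin."""
    return -1.0 / (4 * math.pi * math.sqrt(sum(c * c for c in x)))


def laplacian_fd(func, x, h=1e-3):
    """Second-order central-difference approximation to the Laplacian of func at x."""
    total = 0.0
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += h
        minus[j] -= h
        total += func(plus) - 2 * func(x) + func(minus)
    return total / (h * h)


c = fourier_power_constant(3, 2)
print(c, math.pi)                       # the constant pi found above
print(-c / (4 * math.pi ** 2))          # coefficient of 1/|x| in K, i.e. -1/(4 pi)
print(laplacian_fd(newton_potential, (1.0, 2.0, 0.5)))  # close to 0 away from the origin
```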
Exercise 1.13.36. Without using the theory of distributions, give an alter-
nate (and still rigorous) proof that the function u defined in (1.122) solves
(1.118) in d = 3.
Exercise 1.13.37. • Show that for any $d \geq 3$, a fundamental solution $K$ to the Poisson equation is given by the locally integrable function
$$K(x) = -\frac{1}{d(d-2)\omega_d} \frac{1}{|x|^{d-2}},$$
where $\omega_d = \pi^{d/2}/\Gamma(\frac{d}{2} + 1)$ is the volume of the unit ball in $d$ dimensions.
• Show that for $d = 1$, a fundamental solution is given by the locally integrable function $K(x) = |x|/2$.
• Show that for $d = 2$, a fundamental solution is given by the locally integrable function $K(x) = \frac{1}{2\pi} \log|x|$.
Thus we see that for the Poisson equation, $d = 2$ is a critical dimension, requiring a logarithmic correction to the usual formula.
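The normalisations in Exercise 1.13.37 can be sanity-checked numerically. Heuristically, by the divergence theorem, $\Delta K = \delta$ for a radial $K$ that is smooth away from $0$ amounts to $K$ being harmonic away from the origin together with unit outward flux of $\nabla K$ through every sphere $|x| = R$, whose surface measure is $d\omega_d R^{d-1}$. The following Python sketch (names, radii and step sizes are illustrative assumptions) tests both properties for $d = 1, \ldots, 5$ using finite differences.

```python
import math


def unit_ball_volume(d):
    """omega_d = pi^{d/2} / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def radial_profile(d):
    """K as a function of r = |x| > 0, following Exercise 1.13.37."""
    if d == 1:
        return lambda r: r / 2
    if d == 2:
        return lambda r: math.log(r) / (2 * math.pi)
    c = 1.0 / (d * (d - 2) * unit_ball_volume(d))
    return lambda r: -c * r ** (2 - d)


def flux(d, R=1.3, h=1e-5):
    """Outward flux of grad K through |x| = R: surface area d * omega_d * R^{d-1} times K'(R)."""
    K = radial_profile(d)
    dK = (K(R + h) - K(R - h)) / (2 * h)
    return d * unit_ball_volume(d) * R ** (d - 1) * dK


def radial_laplacian(d, r=0.9, h=1e-3):
    """K'' + ((d - 1) / r) K', which is the Laplacian of the radial function K away from 0."""
    K = radial_profile(d)
    d1 = (K(r + h) - K(r - h)) / (2 * h)
    d2 = (K(r + h) - 2 * K(r) + K(r - h)) / (h * h)
    return d2 + (d - 1) / r * d1


for d in range(1, 6):
    print(d, flux(d), radial_laplacian(d))  # expect flux close to 1 and Laplacian close to 0
```

For $d = 3$ the normalising constant $d(d-2)\omega_d$ equals $4\pi$, matching the Newton potential $-\frac{1}{4\pi|x|}$ found earlier.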

Similar methods can solve other constant coefficient linear PDE. We give
some standard examples in the exercises below.
Exercise 1.13.38. Let $d \geq 1$. Show that a smooth solution $u: \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{C}$ to the heat equation $\partial_t u = \Delta u$ with initial data $u(0,x) = f(x)$ for some Schwartz function $f$ is given by $u(t) = f * K_t$ for $t > 0$, where $K_t$ is the heat kernel
$$K_t(x) = \frac{1}{(4\pi t)^{d/2}} e^{-|x|^2/4t}.$$
(This solution is unique assuming certain smoothness and decay conditions at infinity, but we will not pursue this issue here.)
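As a numerical illustration in $d = 1$ (the function names, sample points and step sizes below are illustrative choices), one can check with finite differences that the heat kernel solves $\partial_t K_t = \partial_{xx} K_t$, and with the trapezoidal rule that it has total mass 1 for every $t > 0$, which is what makes $f * K_t$ an approximation to the identity as $t \to 0$.

```python
import math


def heat_kernel(t, x):
    """K_t(x) = (4 pi t)^{-1/2} exp(-x^2 / 4t): the heat kernel in dimension d = 1."""
    return math.exp(-x * x / (4 * t)) / math.sqrt(4 * math.pi * t)


def heat_residual(t, x, h=1e-3):
    """Central-difference approximation to (d/dt - d^2/dx^2) K_t(x)."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2 * h)
    dxx = (heat_kernel(t, x + h) - 2 * heat_kernel(t, x) + heat_kernel(t, x - h)) / (h * h)
    return dt - dxx


def total_mass(t, L=40.0, n=8000):
    """Trapezoidal approximation to the integral of K_t over [-L, L]."""
    h = 2 * L / n
    s = 0.5 * (heat_kernel(t, -L) + heat_kernel(t, L))
    s += sum(heat_kernel(t, -L + k * h) for k in range(1, n))
    return s * h


for t in (0.01, 0.5, 2.0):
    print(t, total_mass(t))       # expect values close to 1
print(heat_residual(0.5, 0.7))    # expect a value close to 0
```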


Exercise 1.13.39. Let $d \geq 1$. Show that a smooth solution $u: \mathbb{R} \times \mathbb{R}^d \to \mathbb{C}$ to the Schrödinger equation $\partial_t u = i\Delta u$ with initial data $u(0,x) = f(x)$ for some Schwartz function $f$ is given by $u(t) = f * K_t$ for $t \neq 0$, where $K_t$ is the Schrödinger kernel¹⁴
$$K_t(x) = \frac{1}{(4\pi i t)^{d/2}} e^{i|x|^2/4t},$$
and we use the standard branch of the complex logarithm (with cut on the negative real axis) to define $(4\pi i t)^{d/2}$. (Hint: You may wish to investigate the Fourier transform of $e^{-z|\xi|^2}$, where $z$ is a complex number with positive real part, and then let $z$ approach the imaginary axis.)
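A similar finite-difference check in $d = 1$ applies to the Schrödinger kernel. Python’s `cmath.sqrt` returns the principal square root, whose branch cut lies on the negative real axis, matching the branch convention in the exercise. The names, sample points and step size are illustrative assumptions. Note also that $|K_t(x)| = (4\pi|t|)^{-1/2}$ for every $x$, so unlike the heat kernel this kernel does not decay in $x$.

```python
import cmath


def schrodinger_kernel(t, x):
    """K_t(x) = (4 pi i t)^{-1/2} exp(i x^2 / 4t) for d = 1 and t != 0 (principal branch)."""
    return cmath.exp(1j * x * x / (4 * t)) / cmath.sqrt(4j * cmath.pi * t)


def schrodinger_residual(t, x, h=1e-3):
    """Central-difference approximation to (d/dt - i d^2/dx^2) K_t(x)."""
    dt = (schrodinger_kernel(t + h, x) - schrodinger_kernel(t - h, x)) / (2 * h)
    dxx = (schrodinger_kernel(t, x + h) - 2 * schrodinger_kernel(t, x)
           + schrodinger_kernel(t, x - h)) / (h * h)
    return dt - 1j * dxx


for t in (0.5, -0.5):
    print(t, abs(schrodinger_residual(t, 0.7)))  # expect values close to 0
print(abs(schrodinger_kernel(0.5, 3.0)))         # equals (2 pi)^{-1/2}, independent of x
```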


Exercise 1.13.40. Let $d = 3$. Show that a smooth solution $u: \mathbb{R} \times \mathbb{R}^3 \to \mathbb{C}$ to the wave equation $-\partial_{tt} u + \Delta u = 0$ with initial data $u(0,x) = f(x)$, $\partial_t u(0,x) = g(x)$ for some Schwartz functions $f, g$ is given by the formula
$$u(t) = f * \partial_t K_t + g * K_t$$
for $t \neq 0$, where $K_t$ is the distribution
$$\langle f, K_t \rangle := \frac{t}{4\pi} \int_{S^2} f(t\omega)\, d\omega,$$
where $d\omega$ is surface (Lebesgue) measure on the sphere $S^2$, and the derivative $\partial_t K_t$ is defined in the Newtonian sense $\lim_{dt \to 0} \frac{K_{t+dt} - K_t}{dt}$, with the limit taken in the sense of distributions.
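For a concrete check of the $g * K_t$ term, take $f = 0$ and the radial datum $g(y) = e^{-|y|^2}$ (an illustrative choice, not from the text). For radial solutions in three dimensions, $ru$ solves the one-dimensional wave equation, which leads to the closed form $u(t,r) = \frac{e^{-(r-t)^2} - e^{-(r+t)^2}}{4r}$. The Python sketch below evaluates $g * K_t$ by quadrature over the sphere and compares it with this closed form, and also checks that the closed form satisfies the radial wave equation $-\partial_{tt} u + \partial_{rr} u + \frac{2}{r}\partial_r u = 0$ in $\mathbb{R}^3$. Names, sample points and step sizes are assumptions.

```python
import math


def g_times_kernel(t, r, n=4000):
    """(g * K_t)(x) = (t / 4 pi) * integral over S^2 of g(x - t omega) d omega,
    for g(y) = exp(-|y|^2) and a point x with |x| = r.

    Place x = (r, 0, 0) and let z be the first coordinate of omega. Then
    |x - t omega|^2 = r^2 + t^2 - 2 r t z, and integrating out the azimuth turns
    d omega into 2 pi dz on [-1, 1] (Archimedes), so the pairing equals
    (t / 2) * integral_{-1}^{1} exp(-(r^2 + t^2 - 2 r t z)) dz (midpoint rule here)."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        z = -1.0 + (k + 0.5) * h
        total += math.exp(-(r * r + t * t - 2 * r * t * z))
    return 0.5 * t * total * h


def radial_solution(t, r):
    """Closed form u(t, r) = (exp(-(r - t)^2) - exp(-(r + t)^2)) / (4 r) for f = 0, g as above."""
    return (math.exp(-(r - t) ** 2) - math.exp(-(r + t) ** 2)) / (4 * r)


def wave_residual(t, r, h=1e-3):
    """Central differences for -u_tt + u_rr + (2 / r) u_r applied to radial_solution."""
    u = radial_solution
    utt = (u(t + h, r) - 2 * u(t, r) + u(t - h, r)) / (h * h)
    urr = (u(t, r + h) - 2 * u(t, r) + u(t, r - h)) / (h * h)
    ur = (u(t, r + h) - u(t, r - h)) / (2 * h)
    return -utt + urr + 2 / r * ur


print(g_times_kernel(0.7, 0.3), radial_solution(0.7, 0.3))
print(wave_residual(0.7, 0.3))
```

The closed form also vanishes at $t = 0$ and has $\partial_t u(0, r) = e^{-r^2} = g(r)$, so it carries the stated initial data.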
Remark 1.13.12. The theory of (tempered) distributions is also highly
effective for studying variable coefficient linear PDE, especially if the coeffi-
cients are fairly smooth, and particularly if one is primarily interested in the
singularities of solutions to such PDE and how they propagate. Here the
Fourier transform must be augmented with more general transforms of this
type, such as Fourier integral operators. A classic reference for this topic
is [Ho1990]. For non-linear PDE, subspaces of the space of distributions,
such as Sobolev spaces, tend to be more useful. We will discuss these in the
next section.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/04/19.
Thanks to Dale Roberts, Max Baroi, and an anonymous commenter for corrections.

¹⁴ The close similarity here with the heat kernel is a manifestation of Wick rotation in action. However, from an analytical viewpoint, the two kernels are very different. For instance, the convergence of $f * K_t$ to $f$ as $t \to 0$ follows in the heat kernel case by the theory of approximations to the identity, whereas the convergence in the Schrödinger case is much more subtle and is best seen via Fourier analysis.

Section 1.14

Sobolev spaces

As discussed in previous sections, a function space norm can be viewed as a means to rigorously quantify various statistics of a function $f: X \to \mathbb{C}$. For instance, the height and width can be quantified via the $L^p(X,\mu)$ norms (and their relatives, such as the Lorentz norms $\|f\|_{L^{p,q}(X,\mu)}$). Indeed, if $f$ is a step function $f = A 1_E$, then the $L^p$ norm of $f$ is a combination $\|f\|_{L^p(X,\mu)} = |A| \mu(E)^{1/p}$ of the height (or amplitude) $A$ and the width $\mu(E)$.
However, there are more features of a function $f$ of interest than just its width and height. When the domain $X$ is a Euclidean space $\mathbb{R}^d$ (or a domain related to Euclidean spaces, such as an open subset of $\mathbb{R}^d$, or a manifold), then another important feature of such functions (especially in PDE) is the regularity of a function, as well as the related concept of the frequency scale of a function. These terms are not rigorously defined; but roughly speaking, regularity measures how smooth a function is (or how many times one can differentiate the function before it ceases to be a function), while the frequency scale of a function measures how quickly the function oscillates (and would be inversely proportional to the wavelength). One can illustrate this informal concept with some examples:

• Let $\varphi \in C_c^\infty(\mathbb{R})$ be a test function that equals 1 near the origin, and let $N$ be a large number. Then the function $f(x) := \varphi(x)\sin(Nx)$ oscillates at a wavelength of about $1/N$, and a frequency scale of about $N$. While $f$ is, strictly speaking, a smooth function, it becomes increasingly less smooth in the limit $N \to \infty$; for instance, the derivative $f'(x) = \varphi'(x)\sin(Nx) + N\varphi(x)\cos(Nx)$ grows at a roughly linear rate as $N \to \infty$, and the higher derivatives grow at even faster rates. So this function does not really have any regularity in the limit $N \to \infty$. Note however that the height and width of this function are bounded uniformly in $N$, so regularity and frequency scale are independent of height and width.
• Continuing the previous example, now consider the function $g(x) := N^{-s}\varphi(x)\sin(Nx)$, where $s \geq 0$ is some parameter. This function also has a frequency scale of about $N$. But now it has a certain amount of regularity, even in the limit $N \to \infty$; indeed, one easily checks that the $k$th derivative of $g$ stays bounded in $N$ as long as $k \leq s$. So one could view this function as having “$s$ degrees of regularity” in the limit $N \to \infty$.
• In a similar vein, the function $N^{-s}\varphi(Nx)$ also has a frequency scale of about $N$ and can be viewed as having $s$ degrees of regularity in the limit $N \to \infty$.
• The function $\varphi(x)|x|^s 1_{x>0}$ also has about $s$ degrees of regularity, in the sense that it can be differentiated up to $s$ times before becoming unbounded. By performing a dyadic decomposition of the $x$ variable, one can also decompose this function into components $\psi(2^n x)|x|^s$ for $n \geq 0$, where $\psi(x) := (\varphi(x) - \varphi(2x))1_{x>0}$ is a bump function supported away from the origin; each such component has frequency scale about $2^n$ and $s$ degrees of regularity. Thus we see that the original function $\varphi(x)|x|^s 1_{x>0}$ has a range of frequency scales, ranging from about 1 all the way to $+\infty$.
• One can of course concoct higher-dimensional analogues of these examples. For instance, the localised plane wave $\varphi(x)\sin(\xi \cdot x)$ in $\mathbb{R}^d$, where $\varphi \in C_c^\infty(\mathbb{R}^d)$ is a test function, would have a frequency scale of about $|\xi|$.
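The growth rates in the first two examples can be observed numerically. The following Python sketch uses, as an illustrative assumption, the standard bump $\varphi(x) = e^{1 - 1/(1-x^2)}$ on $(-1,1)$, normalised so that $\varphi(0) = 1$ (it does not equal 1 on a neighbourhood of the origin, but that plays no role here), and estimates $\sup_x |g'(x)|$ on a grid for $g(x) = N^{-s}\varphi(x)\sin(Nx)$. The grid size and function names are also illustrative.

```python
import math


def bump(x):
    """A C_c^infty bump supported on (-1, 1), normalised so that bump(0) = 1."""
    if abs(x) >= 1:
        return 0.0
    return math.exp(1 - 1 / (1 - x * x))


def bump_prime(x):
    """Derivative of bump: bump(x) * (-2x / (1 - x^2)^2)."""
    if abs(x) >= 1:
        return 0.0
    return bump(x) * (-2 * x / (1 - x * x) ** 2)


def sup_derivative(N, s=0.0, samples=20001):
    """Grid estimate (a lower bound) of sup |g'| for g(x) = N^{-s} bump(x) sin(Nx), using
    g'(x) = N^{-s} (bump'(x) sin(Nx) + N bump(x) cos(Nx)); the grid contains x = 0."""
    best = 0.0
    for k in range(samples):
        x = -1 + 2 * k / (samples - 1)
        val = bump_prime(x) * math.sin(N * x) + N * bump(x) * math.cos(N * x)
        best = max(best, abs(val))
    return N ** (-s) * best


for N in (10, 100, 1000):
    print(N, sup_derivative(N), sup_derivative(N, s=0.5), sup_derivative(N, s=1.0))
```

Since $g'(0) = N^{1-s}$ and $|g'(x)| \leq N^{-s}(\sup|\varphi'| + N)$, the printed values grow like $N$ for $s = 0$, like $N^{1/2}$ for $s = 1/2$, and stay close to 1 for $s = 1$, in line with the heuristic that the first derivative stays bounded in $N$ exactly when $1 \leq s$.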

There are a variety of function space norms that can be used to capture frequency scale (or regularity) in addition to height and width. The most common and well-known examples of such spaces are the Sobolev space norms $\|f\|_{W^{s,p}(\mathbb{R}^d)}$, although there are a number of other norms with similar features, such as Hölder norms, Besov norms, and Triebel-Lizorkin norms. Very roughly speaking, the $W^{s,p}$ norm is like the $L^p$ norm, but with “$s$ additional degrees of regularity”. For instance, in one dimension, the function $A\varphi(x/R)\sin(Nx)$, where $\varphi$ is a fixed test function and $R, N$ are large, will have a $W^{s,p}$ norm of about $|A| R^{1/p} N^s$, thus combining the height $|A|$, the width $R$, and the frequency scale $N$ of this function together. (Compare this with the $L^p$ norm of the same function, which is about $|A| R^{1/p}$.)
To a large extent, the theory of the Sobolev spaces $W^{s,p}(\mathbb{R}^d)$ resembles that of their Lebesgue counterparts $L^p(\mathbb{R}^d)$ (which arise as the special case of Sobolev spaces when $s = 0$), but with the additional benefit of being able to interact very nicely with (weak) derivatives: a first derivative $\frac{\partial f}{\partial x_j}$ of a function in an $L^p$ space usually leaves all Lebesgue spaces, but a first derivative of a function in the Sobolev space $W^{s,p}$ will end up in another Sobolev space $W^{s-1,p}$. This compatibility with the differentiation operation begins to explain why Sobolev spaces are so useful in the theory of partial differential equations. Furthermore, the regularity parameter $s$ in Sobolev spaces is not restricted to be a natural number; it can be any real number, and one can use fractional differentiation or integration operators to move from one regularity to another. Despite the fact that most partial differential equations involve differential operators of integer order, fractional spaces are still of importance; for instance, it often turns out that the Sobolev spaces which are critical (scale-invariant) for a certain PDE are of fractional order.
The uncertainty principle in Fourier analysis places a constraint between the width and frequency scale of a function; roughly speaking (and in one dimension for simplicity), the product of the two quantities has to be bounded away from zero (or to put it another way, a wave is always at least as wide as its wavelength). This constraint can be quantified as the very useful Sobolev embedding theorem, which allows one to trade regularity for integrability: a function in a Sobolev space $W^{s,p}$ will automatically lie in a number of other Sobolev spaces $W^{\tilde{s},\tilde{p}}$ with $\tilde{s} < s$ and $\tilde{p} > p$; in particular, one can often embed Sobolev spaces into Lebesgue spaces. The trade is not reversible: one cannot start with a function with a lot of integrability and no regularity, and expect to recover regularity in a space of lower integrability. (One can already see this with the most basic example of Sobolev embedding, coming from the fundamental theorem of calculus. If a (continuously differentiable) function $f: \mathbb{R} \to \mathbb{R}$ has $f'$ in $L^1(\mathbb{R})$, then we of course have $f \in L^\infty(\mathbb{R})$; but the converse is far from true.)
Plancherel’s theorem reveals that Fourier-analytic tools are particularly powerful when applied to $L^2$ spaces. Because of this, the Fourier transform is very effective at dealing with the $L^2$-based Sobolev spaces $W^{s,2}(\mathbb{R}^d)$, often abbreviated $H^s(\mathbb{R}^d)$. Indeed, using the fact that the Fourier transform converts regularity to decay, we will see that the $H^s(\mathbb{R}^d)$ spaces are nothing more than Fourier transforms of weighted $L^2$ spaces, and in particular enjoy a Hilbert space structure. These Sobolev spaces, and in particular the energy space $H^1(\mathbb{R}^d)$, are of particular importance in any PDE that involves some sort of energy functional (this includes large classes of elliptic, parabolic, dispersive, and wave equations, and especially those equations connected to physics and/or geometry).
We will not fully develop the theory of Sobolev spaces here, as this would require the theory of singular integrals, which is beyond the scope of this course. There are of course many references for further reading, such as [St1970].

1.14.1. Hölder spaces. Throughout these notes, $d \geq 1$ is a fixed dimension.
Before we study Sobolev spaces, let us first look at the more elementary theory of Hölder spaces $C^{k,\alpha}(\mathbb{R}^d)$, which resemble Sobolev spaces but with the aspect of width removed (thus Hölder norms only measure a combination of height and frequency scale). One can define these spaces on many domains (for instance, the $C^{0,\alpha}$ norm can be defined on any metric space) but we shall largely restrict our attention to Euclidean spaces $\mathbb{R}^d$ for the sake of concreteness.
We first recall the $C^k(\mathbb{R}^d)$ spaces, which we have already been implicitly using in previous lectures. The space $C^0(\mathbb{R}^d) = BC(\mathbb{R}^d)$ is the space of bounded continuous functions $f: \mathbb{R}^d \to \mathbb{C}$ on $\mathbb{R}^d$, with norm
$$\|f\|_{C^0(\mathbb{R}^d)} := \sup_{x \in \mathbb{R}^d} |f(x)| = \|f\|_{L^\infty(\mathbb{R}^d)}.$$

This norm gives $C^0$ the structure of a Banach space. More generally, one can then define the spaces $C^k(\mathbb{R}^d)$ for any non-negative integer $k$ as the space of all functions which are $k$ times continuously differentiable, with all derivatives of order up to $k$ bounded, and whose norm is given by the formula
$$\|f\|_{C^k(\mathbb{R}^d)} := \sum_{j=0}^k \sup_{x \in \mathbb{R}^d} |\nabla^j f(x)| = \sum_{j=0}^k \|\nabla^j f\|_{L^\infty(\mathbb{R}^d)},$$
where we view $\nabla^j f$ as a rank $j$, dimension $d$ tensor with complex coefficients (or equivalently, as a vector of dimension $d^j$ with complex coefficients), thus
$$|\nabla^j f(x)| = \Big( \sum_{i_1, \ldots, i_j = 1, \ldots, d} \Big| \frac{\partial^j}{\partial x_{i_1} \cdots \partial x_{i_j}} f(x) \Big|^2 \Big)^{1/2}.$$
(One does not have to use the $\ell^2$ norm here, actually; since all norms on a finite-dimensional space are equivalent, any other means of taking norms here will lead to an equivalent definition of the $C^k$ norm. More generally, all the norms discussed here tend to have several definitions which are equivalent up to constants, and in most cases the exact choice of norm one uses is just a matter of personal taste.)
Remark 1.14.1. In some texts, $C^k(\mathbb{R}^d)$ is used to denote the functions which are $k$ times continuously differentiable, but whose derivatives up to $k$th order are allowed to be unbounded, so for instance $e^x$ would lie in $C^k(\mathbb{R})$ for every $k$ under this definition. Here, we will refer to such functions (with unbounded derivatives) as lying in $C^k_{loc}(\mathbb{R}^d)$ (i.e., they are locally in $C^k$), rather than $C^k(\mathbb{R}^d)$. Similarly, we make a distinction between $C^\infty_{loc}(\mathbb{R}^d) = \bigcap_{k=1}^\infty C^k_{loc}(\mathbb{R}^d)$ (smooth functions, with no bounds on derivatives) and $C^\infty(\mathbb{R}^d) = \bigcap_{k=1}^\infty C^k(\mathbb{R}^d)$ (smooth functions, all of whose derivatives are bounded). Thus, for instance, $e^x$ lies in $C^\infty_{loc}(\mathbb{R})$ but not $C^\infty(\mathbb{R})$.

Exercise 1.14.1. Show that $C^k(\mathbb{R}^d)$ is a Banach space.


Exercise 1.14.2. Show that for every $d \geq 1$ and $k \geq 0$, the $C^k(\mathbb{R}^d)$ norm is equivalent to the modified norm
$$\|f\|_{\tilde{C}^k(\mathbb{R}^d)} := \|f\|_{L^\infty(\mathbb{R}^d)} + \|\nabla^k f\|_{L^\infty(\mathbb{R}^d)}$$
in the sense that there exists a constant $C$ (depending on $k$ and $d$) such that
$$C^{-1} \|f\|_{C^k(\mathbb{R}^d)} \leq \|f\|_{\tilde{C}^k(\mathbb{R}^d)} \leq \|f\|_{C^k(\mathbb{R}^d)}$$
for all $f \in C^k(\mathbb{R}^d)$. (Hint: Use Taylor series with remainder.) Thus when defining the $C^k$ norms, one does not really need to bound all the intermediate derivatives $\nabla^j f$ for $0 < j < k$; the two extreme terms $j = 0$, $j = k$ suffice. (This is part of a more general interpolation phenomenon; the extreme terms in a sum often already suffice to control the intermediate terms.)
Exercise 1.14.3. Let $\varphi \in C_c^\infty(\mathbb{R}^d)$ be a bump function, and let $k \geq 0$. Show that if $\xi \in \mathbb{R}^d$ with $|\xi| \geq 1$, $R \geq 1/|\xi|$, and $A > 0$, then the function $A\varphi(x/R)\sin(\xi \cdot x)$ has a $C^k$ norm of at most $CA|\xi|^k$, where $C$ is a constant depending only on $\varphi$, $d$ and $k$. Thus we see how the $C^k$ norm relates to the height $A$, width $R^d$, and frequency scale $|\xi|$ of the function, and in particular how the width $R$ is largely irrelevant. What happens when the condition $R \geq 1/|\xi|$ is dropped?

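We clearly have the inclusions
$$C^0(\mathbb{R}^d) \supset C^1(\mathbb{R}^d) \supset C^2(\mathbb{R}^d) \supset \cdots$$
and for any constant-coefficient partial differential operator
$$L = \sum_{i_1, \ldots, i_d \geq 0:\ i_1 + \cdots + i_d \leq m} c_{i_1, \ldots, i_d} \frac{\partial^{i_1 + \cdots + i_d}}{\partial x_1^{i_1} \cdots \partial x_d^{i_d}}$$
of some order $m \geq 0$, it is easy to see that $L$ is a bounded linear operator from $C^{k+m}(\mathbb{R}^d)$ to $C^k(\mathbb{R}^d)$ for any $k \geq 0$.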
The Hölder spaces $C^{k,\alpha}(\mathbb{R}^d)$ are designed to “fill up the gaps” between the discrete spectrum $C^k(\mathbb{R}^d)$ of the continuously differentiable spaces. For $k = 0$ and $0 \leq \alpha \leq 1$, these spaces are defined as the subspace of functions $f \in C^0(\mathbb{R}^d)$ whose norm
$$\|f\|_{C^{0,\alpha}(\mathbb{R}^d)} := \|f\|_{C^0(\mathbb{R}^d)} + \sup_{x, y \in \mathbb{R}^d:\ x \neq y} \frac{|f(x) - f(y)|}{|x - y|^\alpha}$$
is finite. To put it another way, $f \in C^{0,\alpha}(\mathbb{R}^d)$ if $f$ is bounded and continuous and furthermore obeys the Hölder continuity bound
$$|f(x) - f(y)| \leq C|x - y|^\alpha$$
for some constant $C > 0$ and all $x, y \in \mathbb{R}^d$.
The space $C^{0,0}(\mathbb{R}^d)$ is easily seen to be just $C^0(\mathbb{R}^d)$ (with an equivalent norm). At the other extreme, $C^{0,1}(\mathbb{R}^d)$ is the class of Lipschitz functions, and is also denoted $\mathrm{Lip}(\mathbb{R}^d)$ (and the $C^{0,1}$ norm is also known as the Lipschitz norm).
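To see the Hölder exponent at work in a simple case, the following Python sketch computes the largest difference quotient $|f(x) - f(y)|/|x - y|^\alpha$ over a uniform grid on $[0,1]$ for $f(x) = \sqrt{x}$ (the grid sizes and the function name are illustrative). A grid maximum is only a lower bound for the supremum, but it already separates the exponents: the quotient stays at most 1 for $\alpha \leq 1/2$ (indeed $|\sqrt{x} - \sqrt{y}| \leq |x - y|^{1/2}$), while for $\alpha = 3/4$ it equals $n^{1/4}$ on a grid of spacing $1/n$ and so blows up as the grid is refined. This is the local behaviour at the origin behind Exercise 1.14.10 below with $s = 1/2$.

```python
import math


def holder_quotient_max(f, alpha, n):
    """Max over distinct grid points x, y in [0, 1] (spacing 1/n) of
    |f(x) - f(y)| / |x - y|^alpha: a lower bound for the C^{0,alpha} seminorm on [0, 1]."""
    pts = [k / n for k in range(n + 1)]
    vals = [f(p) for p in pts]
    best = 0.0
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            q = abs(vals[j] - vals[i]) / (pts[j] - pts[i]) ** alpha
            best = max(best, q)
    return best


for alpha in (0.25, 0.5, 0.75):
    print(alpha, [holder_quotient_max(math.sqrt, alpha, n) for n in (100, 400)])
```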
Exercise 1.14.4. Show that $C^{0,\alpha}(\mathbb{R}^d)$ is a Banach space for every $0 \leq \alpha \leq 1$.
Exercise 1.14.5. Show that $C^{0,\alpha}(\mathbb{R}^d) \supset C^{0,\beta}(\mathbb{R}^d)$ for every $0 \leq \alpha \leq \beta \leq 1$, and that the inclusion map is continuous.
Exercise 1.14.6. If $\alpha > 1$, show that the $C^{0,\alpha}(\mathbb{R}^d)$ norm of a function $f$ is finite if and only if $f$ is constant. This explains why we generally restrict the Hölder index $\alpha$ to be less than or equal to 1.
Exercise 1.14.7. Show that $C^1(\mathbb{R}^d)$ is a proper subspace of $C^{0,1}(\mathbb{R}^d)$, and that the restriction of the $C^{0,1}(\mathbb{R}^d)$ norm to $C^1(\mathbb{R}^d)$ is equivalent to the $C^1$ norm. (The relationship between $C^1(\mathbb{R}^d)$ and $C^{0,1}(\mathbb{R}^d)$ is in fact closely analogous to that between $C^0(\mathbb{R}^d)$ and $L^\infty(\mathbb{R}^d)$, as can be seen from the fundamental theorem of calculus.)
Exercise 1.14.8. Let $f \in (C_c^\infty(\mathbb{R}))^*$ be a distribution. Show that $f \in C^{0,1}(\mathbb{R})$ if and only if $f \in L^\infty(\mathbb{R})$, and the distributional derivative $f'$ of $f$ also lies in $L^\infty(\mathbb{R})$. Furthermore, for $f \in C^{0,1}(\mathbb{R})$, show that $\|f\|_{C^{0,1}(\mathbb{R})}$ is comparable to $\|f\|_{L^\infty(\mathbb{R})} + \|f'\|_{L^\infty(\mathbb{R})}$.

We can then define the $C^{k,\alpha}(\mathbb{R}^d)$ spaces for natural numbers $k \geq 0$ and $0 \leq \alpha \leq 1$ to be the subspace of $C^k(\mathbb{R}^d)$ whose norm
$$\|f\|_{C^{k,\alpha}(\mathbb{R}^d)} := \sum_{j=0}^k \|\nabla^j f\|_{C^{0,\alpha}(\mathbb{R}^d)}$$
is finite. (As before, there are a variety of ways to define the $C^{0,\alpha}$ norm of the tensor-valued quantity $\nabla^j f$, but they are all equivalent to each other.)
Exercise 1.14.9. Show that $C^{k,\alpha}(\mathbb{R}^d)$ is a Banach space which contains $C^{k+1}(\mathbb{R}^d)$, and is contained in turn in $C^k(\mathbb{R}^d)$.

As before, $C^{k,0}(\mathbb{R}^d)$ is equal to $C^k(\mathbb{R}^d)$, and $C^{k,\beta}(\mathbb{R}^d)$ is contained in $C^{k,\alpha}(\mathbb{R}^d)$ for $0 \leq \alpha \leq \beta \leq 1$. The space $C^{k,1}(\mathbb{R}^d)$ is slightly larger than $C^{k+1}(\mathbb{R}^d)$, but is fairly close to it, thus providing a near-continuum of spaces between the sequence of spaces $C^k(\mathbb{R}^d)$. The following examples illustrate this.


Exercise 1.14.10. Let $\varphi \in C_c^\infty(\mathbb{R})$ be a test function, let $k \geq 0$ be a natural number, and let $0 \leq \alpha \leq 1$.
• Show that the function $|x|^s \varphi(x)$ lies in $C^{k,\alpha}(\mathbb{R})$ whenever $s \geq k + \alpha$.
• Conversely, if $s$ is not an integer, $\varphi(0) \neq 0$, and $s < k + \alpha$, show that $|x|^s \varphi(x)$ does not lie in $C^{k,\alpha}(\mathbb{R})$.
• Show that $|x|^{k+1} \varphi(x) 1_{x>0}$ lies in $C^{k,1}(\mathbb{R})$, but not in $C^{k+1}(\mathbb{R})$.
This example illustrates that the quantity $k + \alpha$ can be viewed as measuring the total amount of regularity held by functions in $C^{k,\alpha}(\mathbb{R})$: $k$ full derivatives, plus an additional $\alpha$ amount of Hölder continuity.

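Exercise 1.14.11. Let $\varphi \in C_c^\infty(\mathbb{R}^d)$ be a test function, let $k \geq 0$ be a natural number, and let $0 \leq \alpha \leq 1$. Show that for $\xi \in \mathbb{R}^d$ with $|\xi| \geq 1$, the function $\varphi(x)\sin(\xi \cdot x)$ has a $C^{k,\alpha}(\mathbb{R}^d)$ norm of at most $C|\xi|^{k+\alpha}$, for some $C$ depending on $\varphi, d, k, \alpha$.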

By construction, it is clear that constant-coefficient differential operators $L$ of order $m$ will map $C^{k+m,\alpha}(\mathbb{R}^d)$ continuously to $C^{k,\alpha}(\mathbb{R}^d)$.
Now we consider what happens with products.

Exercise 1.14.12. Let $k, l \geq 0$ be natural numbers, and let $0 \leq \alpha, \beta \leq 1$.
• If $f \in C^k(\mathbb{R}^d)$ and $g \in C^l(\mathbb{R}^d)$, show that $fg \in C^{\min(k,l)}(\mathbb{R}^d)$, and that the multiplication map is continuous from $C^k(\mathbb{R}^d) \times C^l(\mathbb{R}^d)$ to $C^{\min(k,l)}(\mathbb{R}^d)$. (Hint: Reduce to the case $k = l$ and use induction.)
• If $f \in C^{k,\alpha}(\mathbb{R}^d)$ and $g \in C^{l,\beta}(\mathbb{R}^d)$, and $k + \alpha \leq l + \beta$, show that $fg \in C^{k,\alpha}(\mathbb{R}^d)$ and that the multiplication map is continuous from $C^{k,\alpha}(\mathbb{R}^d) \times C^{l,\beta}(\mathbb{R}^d)$ to $C^{k,\alpha}(\mathbb{R}^d)$.
It is easy to see that the regularity in these results cannot be improved (just take $g = 1$). This illustrates a general principle, namely that a pointwise product $fg$ tends to acquire the lower of the regularities of the two factors $f, g$.

As one consequence of this exercise, we see that any variable-coefficient differential operator $L$ of order $m$ with $C^\infty(\mathbb{R}^d)$ coefficients will map $C^{m+k,\alpha}(\mathbb{R}^d)$ to $C^{k,\alpha}(\mathbb{R}^d)$ for any $k \geq 0$ and $0 \leq \alpha \leq 1$.
We now briefly remark on Hölder spaces on open domains $\Omega$ in Euclidean space $\mathbb{R}^d$. Here, a new subtlety emerges; instead of having just one space $C^{k,\alpha}$ for each choice of exponents $k, \alpha$, one actually has a range of spaces to choose from, depending on what kind of behaviour one wants to impose at the boundary of the domain. At one extreme, one has the space $C^{k,\alpha}(\Omega)$, defined as the space of $k$ times continuously differentiable functions $f: \Omega \to \mathbb{C}$ whose Hölder norm
$$\|f\|_{C^{k,\alpha}(\Omega)} := \sum_{j=0}^k \Big( \sup_{x \in \Omega} |\nabla^j f(x)| + \sup_{x, y \in \Omega:\ x \neq y} \frac{|\nabla^j f(x) - \nabla^j f(y)|}{|x - y|^\alpha} \Big)$$
is finite; this is the maximal choice for $C^{k,\alpha}(\Omega)$. At the other extreme, one has the space $C^{k,\alpha}_0(\Omega)$, defined as the closure of the compactly supported functions in $C^{k,\alpha}(\Omega)$. This space is smaller than $C^{k,\alpha}(\Omega)$; for instance, functions in $C^{0,\alpha}_0((0,1))$ must converge to zero at the endpoints $0, 1$, while functions in $C^{k,\alpha}((0,1))$ do not need to do so. An intermediate space is $C^{k,\alpha}(\mathbb{R}^d)|_\Omega$, defined as the space of restrictions of functions in $C^{k,\alpha}(\mathbb{R}^d)$ to $\Omega$. For instance, the restriction of $|x|\psi(x)$ to $\mathbb{R} \backslash \{0\}$, where $\psi$ is a cutoff function non-vanishing at the origin, lies in $C^{1,0}(\mathbb{R} \backslash \{0\})$, but is not in $C^{1,0}(\mathbb{R})|_{\mathbb{R} \backslash \{0\}}$ or $C^{1,0}_0(\mathbb{R} \backslash \{0\})$ (note that $|x|\psi(x)$ itself is not in $C^{1,0}(\mathbb{R})$, as it is not continuously differentiable at the origin). It is possible to clarify the exact relationships between the various flavours of Hölder spaces on domains (and similarly for the Sobolev spaces discussed below), but we will not discuss these topics here.
Exercise 1.14.13. Show that $C_c^\infty(\mathbb{R}^d)$ is a dense subset of $C^{k,\alpha}_0(\mathbb{R}^d)$ for any $k \geq 0$ and $0 \leq \alpha \leq 1$. (Hint: To approximate a compactly supported $C^{k,\alpha}$ function by a $C_c^\infty$ one, convolve with a smooth, compactly supported approximation to the identity.)

Hölder spaces are particularly useful in elliptic PDE because tools such as the maximum principle lend themselves well to the suprema that appear inside the definition of the $C^{k,\alpha}$ norms; see [GiTr1998] for a thorough treatment. For simple examples of elliptic PDE, such as the Poisson equation $\Delta u = f$, one can also use the explicit fundamental solution, through lengthy but straightforward computations. We give a typical example here:
Exercise 1.14.14 (Schauder estimate). Let $0 < \alpha < 1$, and let $f \in C^{0,\alpha}(\mathbb{R}^3)$ be a function supported on the unit ball $B(0,1)$. Let $u$ be the unique bounded solution to the Poisson equation $\Delta u = f$ (where $\Delta = \sum_{j=1}^3 \frac{\partial^2}{\partial x_j^2}$ is the Laplacian), given by convolution with the Newton kernel:
$$u(x) := -\frac{1}{4\pi} \int_{\mathbb{R}^3} \frac{f(y)}{|x - y|}\, dy.$$
(i) Show that $u \in C^0(\mathbb{R}^3)$.
(ii) Show that $u \in C^1(\mathbb{R}^3)$, and rigorously establish the formula
$$\frac{\partial u}{\partial x_j}(x) = \frac{1}{4\pi} \int_{\mathbb{R}^3} \frac{x_j - y_j}{|x - y|^3} f(y)\, dy$$
for $j = 1, 2, 3$.
(iii) Show that $u \in C^2(\mathbb{R}^3)$, and rigorously establish the formula
$$\frac{\partial^2 u}{\partial x_i \partial x_j}(x) = -\frac{1}{4\pi} \lim_{\varepsilon \to 0} \int_{|x - y| \geq \varepsilon} \Big[ \frac{3(x_i - y_i)(x_j - y_j)}{|x - y|^5} - \frac{\delta_{ij}}{|x - y|^3} \Big] f(y)\, dy + \frac{\delta_{ij}}{3} f(x)$$
for $i, j = 1, 2, 3$, where $\delta_{ij}$ is the Kronecker delta. (Hint: First establish this in the two model cases when $f(x) = 0$ and when $f$ is constant near $x$.)
(iv) Show that $u \in C^{2,\alpha}(\mathbb{R}^3)$, and establish the Schauder estimate
$$\|u\|_{C^{2,\alpha}(\mathbb{R}^3)} \leq C_\alpha \|f\|_{C^{0,\alpha}(\mathbb{R}^3)},$$
where $C_\alpha$ depends only on $\alpha$.
(v) Show that the Schauder estimate fails when $\alpha = 0$. Using this, conclude that there exists $f \in C^0(\mathbb{R}^3)$ supported in the unit ball such that the function $u$ defined above fails to be in $C^2(\mathbb{R}^3)$. (Hint: Use the closed graph theorem, Theorem 1.7.19.) This failure helps explain why it is necessary to introduce Hölder spaces into elliptic theory in the first place (as opposed to the more intuitive $C^k$ spaces).
Remark 1.14.2. Roughly speaking, the Schauder estimate asserts that if
Δu has C 0,α regularity, then all other second derivatives of u have C 0,α reg-
ularity as well. This phenomenon—that control of a special derivative of u
at some order implies control of all other derivatives of u at that order—is
known as elliptic regularity and relies crucially on Δ being an elliptic differ-
ential operator. We will discuss ellipticity in more detail in Exercise 1.14.36.
The theory of Schauder estimates is by now extremely well developed, and
it applies to large classes of elliptic operators on quite general domains, but
we will not discuss these estimates and their applications to various linear
and nonlinear elliptic PDE here.
Exercise 1.14.15 (Rellich-Kondrakov type embedding theorem for Hölder spaces). Let $0 \leq \alpha < \beta \leq 1$. Show that any bounded sequence of functions $f_n \in C^{0,\beta}(\mathbb{R}^d)$ that are all supported in the same compact subset of $\mathbb{R}^d$ will have a subsequence that converges in $C^{0,\alpha}(\mathbb{R}^d)$. (Hint: Use the Arzelà-Ascoli theorem (Theorem 1.8.23) to first obtain uniform convergence, then upgrade this convergence.) This is part of a more general phenomenon: sequences bounded in a high regularity space, and constrained to lie in a compact domain, will tend to have convergent subsequences in low regularity spaces.

1.14.2. Classical Sobolev spaces. We now turn to the classical Sobolev spaces $W^{k,p}(\mathbb{R}^d)$, which involve only an integral amount $k$ of regularity.
Definition 1.14.3. Let $1 \leq p \leq \infty$, and let $k \geq 0$ be a natural number. A function $f$ is said to lie in $W^{k,p}(\mathbb{R}^d)$ if its weak derivatives $\nabla^j f$ exist and lie in $L^p(\mathbb{R}^d)$ for all $j = 0, \ldots, k$. If $f$ lies in $W^{k,p}(\mathbb{R}^d)$, we define the $W^{k,p}$ norm of $f$ by the formula
$$\|f\|_{W^{k,p}(\mathbb{R}^d)} := \sum_{j=0}^k \|\nabla^j f\|_{L^p(\mathbb{R}^d)}.$$
(As before, the exact choice of convention in which one measures the $L^p$ norm of $\nabla^j f$ is not particularly relevant for most applications, as all such conventions are equivalent up to multiplicative constants.)

The space $W^{k,p}(\mathbb{R}^d)$ is also denoted $L^p_k(\mathbb{R}^d)$ in some texts.

Example 1.14.4. $W^{0,p}(\mathbb{R}^d)$ is of course the same space as $L^p(\mathbb{R}^d)$, thus
the Sobolev spaces generalise the Lebesgue spaces. From Exercise 1.14.8 we
see that $W^{1,\infty}(\mathbb{R})$ is the same space as $C^{0,1}(\mathbb{R})$, with an equivalent norm.
More generally, one can see by induction that $W^{k+1,\infty}(\mathbb{R})$ is the same
space as $C^{k,1}(\mathbb{R})$ for $k \ge 0$, with an equivalent norm. It is also clear that
$W^{k,p}(\mathbb{R}^d)$ contains $W^{k+1,p}(\mathbb{R}^d)$ for any $k, p$.

Example 1.14.5. The function $|\sin x|$ lies in $W^{1,\infty}(\mathbb{R})$ but is not
everywhere differentiable in the classical sense; nevertheless, it has a bounded
weak derivative $\cos x \operatorname{sgn}(\sin x)$. On the other hand, the Cantor function
(a.k.a. the Devil's staircase) is not in $W^{1,\infty}(\mathbb{R})$, despite having a classical
derivative of zero at almost every point; the weak derivative is a Cantor
measure, which does not lie in any $L^p$ space. Thus one really does need to
work with weak derivatives rather than classical derivatives to define Sobolev
spaces properly (in contrast to the $C^{k,\alpha}$ spaces).
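As a quick numerical sanity check of the weak derivative claimed for $|\sin x|$ in Example 1.14.5, the following Python sketch compares $\int |\sin x|\,\phi'(x)\,dx$ with $-\int \cos x \operatorname{sgn}(\sin x)\,\phi(x)\,dx$ for one particular test function $\phi$. The bump centred at $0.3$, the grid size and the helper names (`bump`, `midpoint_integral`, and so on) are illustrative choices that do not come from the text, and agreement for a single test function is evidence rather than a proof.

```python
import math

def bump(x, center=0.3, radius=1.0):
    """Smooth bump exp(-1/(1 - t^2)), t = (x - center)/radius, supported on |t| < 1."""
    t = (x - center) / radius
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

def bump_derivative(x, center=0.3, radius=1.0):
    """Classical derivative of bump(x) with respect to x."""
    t = (x - center) / radius
    if abs(t) >= 1:
        return 0.0
    return bump(x, center, radius) * (-2.0 * t / (1.0 - t * t) ** 2) / radius

def sgn(y):
    return (y > 0) - (y < 0)

def midpoint_integral(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a, b = -0.7, 1.3  # support of the bump; it contains the kink of |sin x| at 0
# g is the weak derivative of u iff  int u phi' = - int g phi  for every test function phi.
lhs = midpoint_integral(lambda x: abs(math.sin(x)) * bump_derivative(x), a, b)
rhs = -midpoint_integral(lambda x: math.cos(x) * sgn(math.sin(x)) * bump(x), a, b)
print(lhs, rhs)
```

The bump is deliberately placed off-centre so that its support contains the kink at the origin without both integrals vanishing by symmetry.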

Exercise 1.14.16. Let $\phi \in C_c^\infty(\mathbb{R}^d)$ be a bump function, $k \ge 0$, and
$1 \le p \le \infty$. Show that if $\xi \in \mathbb{R}^d$ with $|\xi| \ge 1$, $R \ge 1/|\xi|$, and $A > 0$, then
the function $A\phi(x/R)\sin(\xi \cdot x)$ has a $W^{k,p}(\mathbb{R}^d)$ norm of at most $CA|\xi|^k R^{d/p}$,
where $C$ is a constant depending only on $\phi$, $p$ and $k$. (Compare this with
Exercise 1.14.3 and Exercise 1.14.11.) What happens when the condition
$R \ge 1/|\xi|$ is dropped?
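The following Python sketch illustrates the scaling in Exercise 1.14.16 in the simplest case $d = 1$, $k = 1$, $p = 2$, $A = 1$: it approximates the $W^{1,2}(\mathbb{R})$ norm of $\phi(x/R)\sin(\xi x)$ by the midpoint rule and divides by $|\xi| R^{1/2}$. The particular bump $\phi(y) = e^{-1/(1-y^2)}$, the sample values of $\xi$ and $R$, and the function names are assumptions made for illustration; a bounded ratio over a few samples is consistent with, but does not prove, the bound in the exercise.

```python
import math

def phi(y):
    """Bump function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - y * y)) if abs(y) < 1 else 0.0

def dphi(y):
    """Classical derivative of phi."""
    return phi(y) * (-2.0 * y / (1.0 - y * y) ** 2) if abs(y) < 1 else 0.0

def w12_norm(xi, R, points_per_unit=500):
    """||f||_{L^2} + ||f'||_{L^2} for f(x) = phi(x/R) sin(xi x), via the midpoint rule on [-R, R]."""
    n = int(points_per_unit * 2 * R * max(1.0, xi))
    h = 2 * R / n
    sq_f = sq_df = 0.0
    for i in range(n):
        x = -R + (i + 0.5) * h
        f = phi(x / R) * math.sin(xi * x)
        # product rule: d/dx [phi(x/R) sin(xi x)]
        df = dphi(x / R) / R * math.sin(xi * x) + xi * phi(x / R) * math.cos(xi * x)
        sq_f += f * f * h
        sq_df += df * df * h
    return math.sqrt(sq_f) + math.sqrt(sq_df)

# Ratio of the W^{1,2} norm to |xi| R^{1/2} (here d = 1, k = 1, p = 2, A = 1).
ratios = [w12_norm(xi, R) / (xi * math.sqrt(R)) for xi in (5.0, 20.0) for R in (1.0, 4.0)]
print(ratios)
```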

Exercise 1.14.17. Show that $W^{k,p}(\mathbb{R}^d)$ is a Banach space for any $1 \le p \le \infty$ and $k \ge 0$.

The fact that Sobolev spaces are defined using weak derivatives is a tech-
nical nuisance, but in practice one can often end up working with classical
derivatives anyway by means of the following lemma:

Lemma 1.14.6. Let $1 \le p < \infty$ and $k \ge 0$. Then the space $C_c^\infty(\mathbb{R}^d)$ of test
functions is a dense subspace of $W^{k,p}(\mathbb{R}^d)$.

Proof. It is clear that $C_c^\infty(\mathbb{R}^d)$ is a subspace of $W^{k,p}(\mathbb{R}^d)$. We first show
that the smooth functions $C^\infty_{loc}(\mathbb{R}^d) \cap W^{k,p}(\mathbb{R}^d)$ form a dense subspace of
$W^{k,p}(\mathbb{R}^d)$, and we then show that $C_c^\infty(\mathbb{R}^d)$ is dense in $C^\infty_{loc}(\mathbb{R}^d) \cap W^{k,p}(\mathbb{R}^d)$.

We begin with the former claim. Let $f \in W^{k,p}(\mathbb{R}^d)$, and let $\phi_n$ be a
sequence of smooth, compactly supported approximations to the identity.
Since $f \in L^p(\mathbb{R}^d)$, we see that $f * \phi_n$ converges to $f$ in $L^p(\mathbb{R}^d)$. More
generally, since $\nabla^j f$ is in $L^p(\mathbb{R}^d)$ for $0 \le j \le k$, we see that
$(\nabla^j f) * \phi_n = \nabla^j (f * \phi_n)$ converges to $\nabla^j f$ in $L^p(\mathbb{R}^d)$. Thus we see that $f * \phi_n$ converges
to $f$ in $W^{k,p}(\mathbb{R}^d)$. On the other hand, as $\phi_n$ is smooth, $f * \phi_n$ is smooth;
and the claim follows.
Now we prove the latter claim. Let $f$ be a smooth function in $W^{k,p}(\mathbb{R}^d)$,
thus $\nabla^j f \in L^p(\mathbb{R}^d)$ for all $0 \le j \le k$. We let $\eta \in C_c^\infty(\mathbb{R}^d)$ be a compactly
supported function which equals 1 near the origin, and consider the functions
$f_R(x) := f(x)\eta(x/R)$ for $R > 0$. Clearly, each $f_R$ lies in $C_c^\infty(\mathbb{R}^d)$. As
$R \to \infty$, dominated convergence shows that $f_R$ converges to $f$ in $L^p(\mathbb{R}^d)$. An
application of the product rule then lets us write
$$\nabla f_R(x) = (\nabla f)(x)\eta(x/R) + \frac{1}{R} f(x)(\nabla\eta)(x/R).$$
The first term converges to $\nabla f$ in $L^p(\mathbb{R}^d)$ by dominated
convergence, while the second term goes to zero in the same topology (its $L^p$ norm is at most
$\frac{1}{R}\|\nabla\eta\|_{L^\infty(\mathbb{R}^d)}\|f\|_{L^p(\mathbb{R}^d)}$); thus
$\nabla f_R$ converges to $\nabla f$ in $L^p(\mathbb{R}^d)$. A similar argument shows that $\nabla^j f_R$
converges to $\nabla^j f$ in $L^p(\mathbb{R}^d)$ for all $0 \le j \le k$, and so $f_R$ converges to $f$ in
$W^{k,p}(\mathbb{R}^d)$, and the claim follows. □

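As a corollary of this lemma, we also see that the space $\mathcal{S}(\mathbb{R}^d)$ of
Schwartz functions is dense in $W^{k,p}(\mathbb{R}^d)$ for $1 \le p < \infty$ (since it contains
$C_c^\infty(\mathbb{R}^d)$ and is contained in $W^{k,p}(\mathbb{R}^d)$).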

Exercise 1.14.18. Let $k \ge 0$. Show that the closure of $C_c^\infty(\mathbb{R}^d)$ in $W^{k,\infty}(\mathbb{R}^d)$
is $C^k_0(\mathbb{R}^d)$, thus Lemma 1.14.6 fails at the endpoint $p = \infty$.

Now we come to the important Sobolev embedding theorem, which allows
one to trade regularity for integrability. We illustrate this phenomenon
first with some very simple cases. First, we claim that the space $W^{1,1}(\mathbb{R})$
embeds continuously into $W^{0,\infty}(\mathbb{R}) = L^\infty(\mathbb{R})$, thus trading in one degree
of regularity to upgrade $L^1$ integrability to $L^\infty$ integrability. To prove this
claim, it suffices to establish the bound
$$\|f\|_{L^\infty(\mathbb{R})} \le C\|f\|_{W^{1,1}(\mathbb{R})} \tag{1.123}$$
for all test functions $f \in C_c^\infty(\mathbb{R})$ and some constant $C$, as the claim then
follows by taking limits using Lemma 1.14.6. (Note that any limit in either
the $L^\infty$ or $W^{1,1}$ topologies is also a limit in the sense of distributions,
and such limits are necessarily unique. Also, since $L^\infty(\mathbb{R})$ is the dual space
of $L^1(\mathbb{R})$, the distributional limit of any sequence bounded in $L^\infty(\mathbb{R})$
remains in $L^\infty(\mathbb{R})$, by Exercise 1.13.28.) To prove (1.123), observe from the
fundamental theorem of calculus that
$$|f(x) - f(0)| = \Big|\int_0^x f'(t)\,dt\Big| \le \|f'\|_{L^1(\mathbb{R})} \le \|f\|_{W^{1,1}(\mathbb{R})}$$
for all $x$; in particular, from the triangle inequality
$$\|f\|_{L^\infty(\mathbb{R})} \le |f(0)| + \|f\|_{W^{1,1}(\mathbb{R})}.$$
Also, taking $x$ to be sufficiently large, we see (from the compact support of
$f$) that
$$|f(0)| \le \|f\|_{W^{1,1}(\mathbb{R})},$$
and (1.123) follows (with $C = 2$).
Since the closure of $C_c^\infty(\mathbb{R})$ in $L^\infty(\mathbb{R})$ is $C_0(\mathbb{R})$, we actually obtain the
stronger embedding, that $W^{1,1}(\mathbb{R})$ embeds continuously into $C_0(\mathbb{R})$.
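A numerical illustration of the bound behind (1.123), as a Python sketch: for the bump $\phi(x) = e^{-1/(1-x^2)}$ (an illustrative choice), it compares $\|\phi\|_{L^\infty(\mathbb{R})}$ with $\|\phi'\|_{L^1(\mathbb{R})}$. Integrating $f'$ from $-\infty$ rather than from 0 shows directly that $|f(x)| \le \|f'\|_{L^1(\mathbb{R})}$ for any test function $f$, and for this even bump, which increases up to its peak at the origin, one has exactly $\|\phi'\|_{L^1} = 2\|\phi\|_{L^\infty} = 2e^{-1}$.

```python
import math

def phi(x):
    """Bump function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def dphi(x):
    """Classical derivative of phi."""
    return phi(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1 else 0.0

n = 100_000
h = 2.0 / n
grid = [-1.0 + (i + 0.5) * h for i in range(n)]           # midpoints of [-1, 1]
sup_norm = max(phi(x) for x in grid)                      # approximates ||phi||_{L^infty}
l1_of_derivative = h * sum(abs(dphi(x)) for x in grid)    # approximates ||phi'||_{L^1}
print(sup_norm, l1_of_derivative, math.exp(-1), 2 * math.exp(-1))
```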
Exercise 1.14.19. Show that $W^{d,1}(\mathbb{R}^d)$ embeds continuously into $C_0(\mathbb{R}^d)$,
thus there exists a constant $C$ (depending only on $d$) such that
$$\|f\|_{C_0(\mathbb{R}^d)} \le C\|f\|_{W^{d,1}(\mathbb{R}^d)}$$
for all $f \in W^{d,1}(\mathbb{R}^d)$.

Now we turn to Sobolev embedding for exponents other than $p = 1$ and
$p = \infty$.
Theorem 1.14.7 (Sobolev embedding theorem for one derivative). Let
$1 \le p \le q \le \infty$ be such that $\frac{d}{p} - 1 \le \frac{d}{q} \le \frac{d}{p}$, but excluding the endpoint cases
$(p, q) = (d, \infty), (1, \frac{d}{d-1})$. Then $W^{1,p}(\mathbb{R}^d)$ embeds continuously into $L^q(\mathbb{R}^d)$.
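The exponent conditions in Theorem 1.14.7 are easy to mistype, so here is a small Python sketch that encodes them with exact rational arithmetic. The function name `theorem_1_14_7_applies` is hypothetical, $q = \infty$ is represented by `math.inf` with the convention $d/\infty = 0$, and the function only reports whether the hypotheses of the theorem hold; it says nothing about cases the theorem does not cover (for instance the endpoint $(1, \frac{d}{d-1})$, which is treated separately in Lemma 1.14.9).

```python
import math
from fractions import Fraction

def reciprocal(x):
    """1/x as an exact Fraction, with the convention 1/inf = 0."""
    return Fraction(0) if x == math.inf else 1 / Fraction(x)

def theorem_1_14_7_applies(p, q, d):
    """Do (p, q, d) satisfy the hypotheses of Theorem 1.14.7?

    p and q are ints, Fractions or math.inf; d is a positive integer.
    """
    if not (1 <= p <= q):
        return False
    dp, dq = d * reciprocal(p), d * reciprocal(q)
    if not (dp - 1 <= dq <= dp):
        return False
    if p == d and q == math.inf:                      # excluded endpoint (d, infinity)
        return False
    if d > 1 and p == 1 and q == Fraction(d, d - 1):  # excluded endpoint (1, d/(d-1))
        return False
    return True

print(theorem_1_14_7_applies(2, 6, 3))         # True: d/p - 1 = d/q
print(theorem_1_14_7_applies(4, math.inf, 3))  # True: here p > d
print(theorem_1_14_7_applies(3, math.inf, 3))  # False: excluded endpoint
```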

Proof. By Lemma 1.14.6 and the same limiting argument as before, it suffices
to establish the Sobolev embedding inequality
$$\|f\|_{L^q(\mathbb{R}^d)} \le C_{p,q,d}\|f\|_{W^{1,p}(\mathbb{R}^d)}$$
for all test functions $f \in C_c^\infty(\mathbb{R}^d)$, and some constant $C_{p,q,d}$ depending
only on $p, q, d$, as the inequality will then extend to all $f \in W^{1,p}(\mathbb{R}^d)$. To
simplify the notation, we shall use $X \lesssim Y$ to denote an estimate of the form
$X \le C_{p,q,d} Y$, where $C_{p,q,d}$ is a constant depending on $p, q, d$ (the exact value
of this constant may vary from instance to instance).
The case $p = q$ is trivial. Now let us look at another extreme case,
namely when $\frac{d}{p} - 1 = \frac{d}{q}$; by our hypotheses, this forces $1 < p < d$. Here, we
use the fundamental theorem of calculus (and the compact support of $f$) to
write
$$f(x) = -\int_0^\infty \omega \cdot \nabla f(x + r\omega)\,dr$$
for any $x \in \mathbb{R}^d$ and any direction $\omega \in S^{d-1}$. Taking absolute values, we
conclude in particular that
$$|f(x)| \lesssim \int_0^\infty |\nabla f(x + r\omega)|\,dr.$$
We can average this over all directions $\omega$:
$$|f(x)| \lesssim \int_{S^{d-1}} \int_0^\infty |\nabla f(x + r\omega)|\,dr\,d\omega.$$
Switching from polar coordinates back to Cartesian (multiplying and dividing
by $r^{d-1}$), we conclude that
$$|f(x)| \lesssim \int_{\mathbb{R}^d} \frac{|\nabla f(x - y)|}{|y|^{d-1}}\,dy,$$

thus $f$ is pointwise controlled by the convolution of $|\nabla f|$ with the fractional
integration kernel $\frac{1}{|x|^{d-1}}$. By the Hardy-Littlewood-Sobolev theorem on fractional
integration (Corollary 1.11.18) we conclude that
$$\|f\|_{L^q(\mathbb{R}^d)} \lesssim \|\nabla f\|_{L^p(\mathbb{R}^d)},$$
and the claim follows. (Note that the hypotheses $1 < p < d$ are needed here
in order to be able to invoke this theorem.)
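The step "switching from polar coordinates back to Cartesian" can be written out as follows, for any non-negative measurable $g$ (here $g = |\nabla f|$). It uses only the polar coordinates formula $\int_{\mathbb{R}^d} h(y)\,dy = \int_{S^{d-1}} \int_0^\infty h(r\omega)\, r^{d-1}\,dr\,d\omega$, followed by the reflection $y \mapsto -y$.

```latex
\int_{S^{d-1}} \int_0^\infty g(x + r\omega)\,dr\,d\omega
  = \int_{S^{d-1}} \int_0^\infty \frac{g(x + r\omega)}{r^{d-1}}\, r^{d-1}\,dr\,d\omega
  = \int_{\mathbb{R}^d} \frac{g(x + y)}{|y|^{d-1}}\,dy
  = \int_{\mathbb{R}^d} \frac{g(x - y)}{|y|^{d-1}}\,dy .
```

The right-hand side is the convolution of $g$ with $|y|^{-(d-1)}$ evaluated at $x$, which is exactly the fractional integration operator to which the Hardy-Littlewood-Sobolev theorem is applied.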
Now we handle intermediate cases, when $\frac{d}{p} - 1 < \frac{d}{q} < \frac{d}{p}$. (Many of these
cases can be obtained from the endpoints already established by interpolation,
but unfortunately not all such cases can be, so we will treat this case
separately.) Here, the trick is not to integrate out to infinity, but instead to
integrate out to a bounded distance. For instance, the fundamental theorem
of calculus gives
$$f(x) = f(x + R\omega) - \int_0^R \omega \cdot \nabla f(x + r\omega)\,dr$$
for any $R > 0$; hence
$$|f(x)| \lesssim |f(x + R\omega)| + \int_0^R |\nabla f(x + r\omega)|\,dr.$$
What value of $R$ should one pick? If one picks any specific value of $R$,
one would end up with an average of $f$ over spheres, which looks somewhat
unpleasant. But what one can do here is average over a range of $R$'s, for
instance between 1 and 2. This leads to
$$|f(x)| \lesssim \int_1^2 |f(x + R\omega)|\,dR + \int_0^2 |\nabla f(x + r\omega)|\,dr;$$
averaging over all directions $\omega$ and converting back to Cartesian coordinates,
we see that
$$|f(x)| \lesssim \int_{1 \le |y| \le 2} |f(x - y)|\,dy + \int_{|y| \le 2} \frac{|\nabla f(x - y)|}{|y|^{d-1}}\,dy.$$

Thus one is bounding $|f|$ pointwise (up to constants) by the convolution of
$|f|$ with the kernel $K_1(y) := 1_{1 \le |y| \le 2}$, plus the convolution of $|\nabla f|$ with the
kernel $K_2(y) := 1_{|y| \le 2}\frac{1}{|y|^{d-1}}$. A short computation shows that both kernels
lie in $L^r(\mathbb{R}^d)$, where $r$ is the exponent in Young's inequality, more
specifically $\frac{1}{q} + 1 = \frac{1}{p} + \frac{1}{r}$ (and in particular $1 < r < \frac{d}{d-1}$). Applying
Young's inequality (Exercise 1.11.25), we conclude that
$$\|f\|_{L^q(\mathbb{R}^d)} \lesssim \|f\|_{L^p(\mathbb{R}^d)} + \|\nabla f\|_{L^p(\mathbb{R}^d)},$$
and the claim follows. □
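To see concretely why the kernel $K_2$ is admissible in the intermediate range but not at the endpoint $\frac{d}{q} = \frac{d}{p} - 1$, note that $\int_{|y| \le 2} |y|^{-r(d-1)}\,dy$ is finite exactly when $r(d-1) < d$ (while $K_1$ is bounded and compactly supported, so it lies in every $L^r$). The Python sketch below, whose function names are hypothetical, computes the Young exponent $r$ from $\frac{1}{q} + 1 = \frac{1}{p} + \frac{1}{r}$ and tests this condition for $d = 3$, $p = 2$.

```python
from fractions import Fraction

def young_exponent(p, q):
    """The exponent r with 1/q + 1 = 1/p + 1/r, computed exactly (p, q finite)."""
    return 1 / (Fraction(1) / q + 1 - Fraction(1) / p)

def kernel_k2_in_lr(r, d):
    """K_2(y) = 1_{|y| <= 2} |y|^{1-d} lies in L^r(R^d) iff r (d - 1) < d."""
    return r * (d - 1) < d

d, p = 3, 2
for q in (3, 4, 5, 6):        # q = 6 is the endpoint d/q = d/p - 1
    r = young_exponent(p, q)
    print(q, r, kernel_k2_in_lr(r, d))
```

At $q = 6$ one gets $r = 3/2 = \frac{d}{d-1}$ exactly, so $K_2$ just fails to lie in $L^r$; this is why that endpoint case needed the Hardy-Littlewood-Sobolev theorem rather than Young's inequality.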

Remark 1.14.8. It is instructive to insert the example in Exercise 1.14.16
into the Sobolev embedding theorem. By replacing the $W^{1,p}(\mathbb{R}^d)$ norm with
the $L^q(\mathbb{R}^d)$ norm, one trades one factor of the frequency scale $|\xi|$ for $\frac{1}{q} - \frac{1}{p}$
powers of the width $R^d$. This is consistent with the Sobolev embedding
theorem so long as $R^d \gtrsim 1/|\xi|^d$, which is essentially one of the hypotheses in
that exercise. Thus, one can view Sobolev embedding as an assertion that
the width of a function must always be greater than or comparable to the
wavelength scale (the reciprocal of the frequency scale) raised to the power
of the dimension; this is a manifestation of the uncertainty principle (see
Section 2.6 for further discussion).

Exercise 1.14.20. Let $d \ge 2$. Show that the Sobolev endpoint estimate fails
in the case $(p, q) = (d, \infty)$. (Hint: Experiment with functions $f$ of the form
$f(x) := \sum_{n=1}^N \phi(2^n x)$, where $\phi$ is a test function supported on the annulus
$\{1 \le |x| \le 2\}$.) Conclude in particular that $W^{1,d}(\mathbb{R}^d)$ is not a subset
of $L^\infty(\mathbb{R}^d)$. (Hint: Either use the closed graph theorem or some variant
of the function $f$ used in the first part of this exercise.) Note that when
$d = 1$, the Sobolev endpoint theorem for $(p, q) = (1, \infty)$ follows from the
fundamental theorem of calculus, as mentioned earlier. There are substitutes
known for the endpoint Sobolev embedding theorem, but they involve more
sophisticated function spaces, such as the space BMO of functions of bounded
mean oscillation, which we will not discuss here.

The p = 1 case of the Sobolev inequality cannot be proven via the Hardy-
Littlewood-Sobolev inequality; however, there are other proofs available.
One of these (due to Gagliardo and Nirenberg) is based on the following.

Exercise 1.14.21 (Loomis-Whitney inequality). Let $d \ge 1$, let
$f_1, \ldots, f_d \in L^p(\mathbb{R}^{d-1})$ for some $0 < p \le \infty$, and let $F: \mathbb{R}^d \to \mathbb{C}$ be the function
$$F(x_1, \ldots, x_d) := \prod_{i=1}^d f_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d).$$
Show that
$$\|F\|_{L^{p/(d-1)}(\mathbb{R}^d)} \le \prod_{i=1}^d \|f_i\|_{L^p(\mathbb{R}^{d-1})}.$$
(Hint: Induct on $d$, using Hölder's inequality and Fubini's theorem.)
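The Hölder and Fubini argument works on any product of σ-finite measure spaces, so the inequality also holds with counting measure on $\mathbb{Z}^3$. Taking $d = 3$, $p = 2$ and $f_i$ to be the indicator functions of the three coordinate projections $\pi_i(S)$ of a finite set $S \subset \mathbb{Z}^3$ gives $F \ge 1_S$, and hence the discrete Loomis-Whitney inequality $|S|^2 \le |\pi_1(S)|\,|\pi_2(S)|\,|\pi_3(S)|$. The Python sketch below checks this on random sets and exhibits equality for a cube; the helper names and the random sampling are illustrative choices.

```python
import random

def projection_sizes(points):
    """Sizes of the three coordinate projections of a finite set S in Z^3."""
    p1 = {(y, z) for (x, y, z) in points}   # forget the first coordinate
    p2 = {(x, z) for (x, y, z) in points}   # forget the second coordinate
    p3 = {(x, y) for (x, y, z) in points}   # forget the third coordinate
    return len(p1), len(p2), len(p3)

def loomis_whitney_holds(points):
    """Check |S|^2 <= |pi_1(S)| |pi_2(S)| |pi_3(S)|."""
    a, b, c = projection_sizes(points)
    return len(points) ** 2 <= a * b * c

random.seed(0)
random_sets = [{(random.randrange(5), random.randrange(5), random.randrange(5))
                for _ in range(random.randint(1, 60))} for _ in range(200)]
print(all(loomis_whitney_holds(S) for S in random_sets))

cube = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
print(len(cube) ** 2, projection_sizes(cube))   # equality case: 4096 and (16, 16, 16)
```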
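Lemma 1.14.9 (Endpoint Sobolev inequality). $W^{1,1}(\mathbb{R}^d)$ embeds continuously
into $L^{d/(d-1)}(\mathbb{R}^d)$.

Proof. It will suffice to show that
$$\|f\|_{L^{d/(d-1)}(\mathbb{R}^d)} \le \|\nabla f\|_{L^1(\mathbb{R}^d)}$$
for all test functions $f \in C_c^\infty(\mathbb{R}^d)$. From the fundamental theorem of calculus
we see that
$$|f(x_1, \ldots, x_d)| \le \int_{\mathbb{R}} \Big|\frac{\partial f}{\partial x_i}(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_d)\Big|\,dt,$$
and thus
$$|f(x_1, \ldots, x_d)| \le f_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d),$$
where
$$f_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d) := \int_{\mathbb{R}} |\nabla f(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_d)|\,dt.$$
From Fubini's theorem we have
$$\|f_i\|_{L^1(\mathbb{R}^{d-1})} = \|\nabla f\|_{L^1(\mathbb{R}^d)},$$
and hence by the Loomis-Whitney inequality (with $p = 1$)
$$\|f_1 \cdots f_d\|_{L^{1/(d-1)}(\mathbb{R}^d)} \le \|\nabla f\|_{L^1(\mathbb{R}^d)}^d.$$
Since $|f|^{d/(d-1)} \le (f_1 \cdots f_d)^{1/(d-1)}$ pointwise, integrating gives
$\|f\|_{L^{d/(d-1)}(\mathbb{R}^d)}^{d/(d-1)} \le \|f_1 \cdots f_d\|_{L^{1/(d-1)}(\mathbb{R}^d)}^{1/(d-1)} \le \|\nabla f\|_{L^1(\mathbb{R}^d)}^{d/(d-1)}$,
and the claim follows. □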
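A numerical spot check of Lemma 1.14.9 in $d = 2$ (where $\frac{d}{d-1} = 2$), as a Python sketch. For the Gaussian $f(x) = e^{-|x|^2}$, which lies in $W^{1,1}(\mathbb{R}^2)$ although it is not compactly supported, polar coordinates give $\|f\|_{L^2(\mathbb{R}^2)} = \sqrt{\pi/2} \approx 1.2533$ and $\|\nabla f\|_{L^1(\mathbb{R}^2)} = \pi^{3/2} \approx 5.5683$, and the midpoint-rule sums below should reproduce both. The test function, the truncation to $[-6,6]^2$ and the grid spacing are assumptions made for illustration.

```python
import math

# f(x) = exp(-|x|^2) on R^2, with |grad f(x)| = 2 |x| exp(-|x|^2).
h = 0.03
n = 400                       # the grid covers [-6, 6]^2; f is negligible outside
lhs_sq = 0.0                  # approximates the integral of |f|^2
rhs = 0.0                     # approximates the integral of |grad f|
for i in range(n):
    x = -6.0 + (i + 0.5) * h
    for j in range(n):
        y = -6.0 + (j + 0.5) * h
        r2 = x * x + y * y
        lhs_sq += math.exp(-2.0 * r2) * h * h
        rhs += 2.0 * math.sqrt(r2) * math.exp(-r2) * h * h
lhs = math.sqrt(lhs_sq)
print(lhs, rhs)
```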
Exercise 1.14.22 (Connection between Sobolev embedding and isoperimetric
inequality). Let $d \ge 2$, and let $\Omega$ be an open subset of $\mathbb{R}^d$ whose
boundary $\partial\Omega$ is a smooth $(d-1)$-dimensional manifold. Show that the surface
area $|\partial\Omega|$ of $\Omega$ is related to the volume $|\Omega|$ of $\Omega$ by the isoperimetric
inequality
$$|\Omega| \le C_d |\partial\Omega|^{d/(d-1)}$$
for some constant $C_d$ depending only on $d$. (Hint: Apply the endpoint
Sobolev theorem to a suitably smoothed out version of $1_\Omega$.) It is also possible
to reverse this implication and deduce the endpoint Sobolev embedding
theorem from the isoperimetric inequality and the co-area formula, which
we will do in later notes.
Exercise 1.14.23. Use dimensional analysis to argue why the Sobolev embedding
theorem should fail when $\frac{d}{q} < \frac{d}{p} - 1$. Then create a rigorous counterexample
to that theorem in this case.

Exercise 1.14.24. Show that $W^{k,p}(\mathbb{R}^d)$ embeds into $W^{l,q}(\mathbb{R}^d)$ whenever
$k \ge l \ge 0$ and $1 < p < q \le \infty$ are such that $\frac{d}{p} - k \le \frac{d}{q} - l$, and such that at
least one of the two inequalities $q \le \infty$, $\frac{d}{p} - k \le \frac{d}{q} - l$ is strict.
Exercise 1.14.25. Show that the Sobolev embedding theorem fails
whenever $q < p$. (Hint: Experiment with functions of the form
$f(x) = \sum_{j=1}^n \phi(x - x_j)$, where $\phi$ is a test function and the $x_j$ are widely separated
points in space.)
Exercise 1.14.26 (Hölder-Sobolev embedding). Let $d < p < \infty$. Show that
$W^{1,p}(\mathbb{R}^d)$ embeds continuously into $C^{0,\alpha}(\mathbb{R}^d)$, where $0 < \alpha < 1$ is defined
by the scaling relationship $\frac{d}{p} - 1 = -\alpha$. Use dimensional analysis to justify
why one would expect this scaling relationship to arise naturally, and give
an example to show that $\alpha$ cannot be improved to any higher exponent.
More generally, with the same assumptions on $p, \alpha$, show that $W^{k+1,p}(\mathbb{R}^d)$
embeds continuously into $C^{k,\alpha}(\mathbb{R}^d)$ for all natural numbers $k \ge 0$.
Exercise 1.14.27 (Sobolev product theorem, special case). Let $k \ge 1$,
$1 < p, q < d/k$, and $1 < r < \infty$ be such that $\frac{1}{p} + \frac{1}{q} - \frac{k}{d} = \frac{1}{r}$. Show that
whenever $f \in W^{k,p}(\mathbb{R}^d)$ and $g \in W^{k,q}(\mathbb{R}^d)$, then $fg \in W^{k,r}(\mathbb{R}^d)$, and that
$$\|fg\|_{W^{k,r}(\mathbb{R}^d)} \le C_{p,q,k,d,r}\|f\|_{W^{k,p}(\mathbb{R}^d)}\|g\|_{W^{k,q}(\mathbb{R}^d)}$$
for some constant $C_{p,q,k,d,r}$ depending only on the subscripted parameters.
(This is not the most general range of parameters for which this sort of
product theorem holds, but it is an instructive special case.)
Exercise 1.14.28. Let $L$ be a differential operator of order $m$ whose coefficients
lie in $C^\infty(\mathbb{R}^d)$. Show that $L$ maps $W^{k+m,p}(\mathbb{R}^d)$ continuously to
$W^{k,p}(\mathbb{R}^d)$ for all $1 \le p \le \infty$ and all integers $k \ge 0$.

1.14.3. $L^2$-based Sobolev spaces. It is possible to develop more general
Sobolev spaces $W^{s,p}(\mathbb{R}^d)$ than the integer-regularity spaces $W^{k,p}(\mathbb{R}^d)$ defined
above, in which $s$ is allowed to take any real number (including negative
numbers) as a value, although the theory becomes somewhat pathological
unless one restricts attention to the range $1 < p < \infty$, for reasons having to
do with the theory of singular integrals.
As the theory of singular integrals is beyond the scope of this course, we
will illustrate this theory only in the model case $p = 2$, in which Plancherel's
theorem is available; this allows one to avoid dealing with singular integrals
by working purely on the frequency space side.
To explain this, we begin with the Plancherel identity
$$\int_{\mathbb{R}^d} |f(x)|^2\,dx = \int_{\mathbb{R}^d} |\hat f(\xi)|^2\,d\xi,$$
which is valid for all $L^2(\mathbb{R}^d)$ functions and in particular for Schwartz
functions $f \in \mathcal{S}(\mathbb{R}^d)$. Also, we know that the Fourier transform of any derivative
$\frac{\partial f}{\partial x_j}$ of $f$ is $-2\pi i\xi_j \hat f(\xi)$. From this we see that
$$\int_{\mathbb{R}^d} \Big|\frac{\partial f}{\partial x_j}(x)\Big|^2\,dx = \int_{\mathbb{R}^d} (2\pi|\xi_j|)^2|\hat f(\xi)|^2\,d\xi$$
for all $f \in \mathcal{S}(\mathbb{R}^d)$, and so on summing in $j$, we have
$$\int_{\mathbb{R}^d} |\nabla f(x)|^2\,dx = \int_{\mathbb{R}^d} (2\pi|\xi|)^2|\hat f(\xi)|^2\,d\xi.$$
A similar argument then gives
$$\int_{\mathbb{R}^d} |\nabla^j f(x)|^2\,dx = \int_{\mathbb{R}^d} (2\pi|\xi|)^{2j}|\hat f(\xi)|^2\,d\xi,$$
and so on summing in $j$ (measuring the $W^{k,2}$ norm by the equivalent convention
$\|f\|_{W^{k,2}(\mathbb{R}^d)}^2 := \sum_{j=0}^k \|\nabla^j f\|_{L^2(\mathbb{R}^d)}^2$), we have
$$\|f\|_{W^{k,2}(\mathbb{R}^d)}^2 = \int_{\mathbb{R}^d} \sum_{j=0}^k (2\pi|\xi|)^{2j}|\hat f(\xi)|^2\,d\xi$$
for all $k \ge 0$ and all Schwartz functions $f \in \mathcal{S}(\mathbb{R}^d)$. Since the Schwartz
functions are dense in $W^{k,2}(\mathbb{R}^d)$, a limiting argument (using the fact that
$L^2$ is complete) then shows that the above formula also holds for all
$f \in W^{k,2}(\mathbb{R}^d)$.
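The identity $\int |\partial_j f|^2 = \int (2\pi|\xi_j|)^2|\hat f|^2$ has an exact periodic analogue that is easy to test numerically. The Python sketch below works on the circle $[0,1)$ rather than on $\mathbb{R}^d$ (an assumption made so that everything is a finite sum), computes the Fourier coefficients $c_k$ of an illustrative trigonometric polynomial by a plain discrete Fourier transform of its samples, and compares $\int_0^1 |f'|^2$ with $\sum_k (2\pi|k|)^2 |c_k|^2$. For this low-degree trigonometric polynomial both sides equal $6.5\pi^2$ up to rounding error.

```python
import cmath
import math

N = 64
xs = [j / N for j in range(N)]

def f(x):
    return math.sin(2 * math.pi * x) + 0.5 * math.cos(6 * math.pi * x)

def df(x):
    """Exact derivative of f."""
    return 2 * math.pi * math.cos(2 * math.pi * x) - 3 * math.pi * math.sin(6 * math.pi * x)

def fourier_coefficients(samples):
    """c_k = (1/n) sum_j samples[j] e^{-2 pi i k j / n} for k = -n/2, ..., n/2 - 1 (plain DFT)."""
    n = len(samples)
    return {k: sum(s * cmath.exp(-2j * math.pi * k * j / n) for j, s in enumerate(samples)) / n
            for k in range(-n // 2, n // 2)}

coeffs = fourier_coefficients([f(x) for x in xs])
# For trigonometric polynomials of degree < N, the grid average equals the integral over [0, 1).
physical = sum(df(x) ** 2 for x in xs) / N
frequency = sum((2 * math.pi * abs(k)) ** 2 * abs(c) ** 2 for k, c in coeffs.items())
print(physical, frequency, 6.5 * math.pi ** 2)
```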

Now observe that the quantity $\sum_{j=0}^k (2\pi|\xi|)^{2j}$ is comparable (up to constants
depending on $k, d$) to the expression $\langle\xi\rangle^{2k}$, where $\langle x\rangle := (1 + |x|^2)^{1/2}$
(this quantity is sometimes known as the Japanese bracket of $x$). We thus
conclude that
$$\|f\|_{W^{k,2}(\mathbb{R}^d)} \sim \|\langle\xi\rangle^k \hat f(\xi)\|_{L^2(\mathbb{R}^d)},$$
where we use $x \sim y$ here to denote the fact that $x$ and $y$ are comparable up
to constants depending on $d, k$, and $\xi$ denotes the independent
variable on the right-hand side. If we then define, for any real number $s$,
the space $H^s(\mathbb{R}^d)$ to be the space of all tempered distributions $f$ such that
the distribution $\langle\xi\rangle^s \hat f(\xi)$ lies in $L^2$, and give this space the norm
$$\|f\|_{H^s(\mathbb{R}^d)} := \|\langle\xi\rangle^s \hat f(\xi)\|_{L^2(\mathbb{R}^d)},$$
then we see that $W^{k,2}(\mathbb{R}^d)$ embeds into $H^k(\mathbb{R}^d)$ and that the norms are
equivalent.
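The comparability of $\sum_{j=0}^k (2\pi|\xi|)^{2j}$ and $\langle\xi\rangle^{2k}$ can be made explicit. By the binomial theorem, $\langle\xi\rangle^{2k} = \sum_{j=0}^k \binom{k}{j}|\xi|^{2j} \le 2^k \sum_{j=0}^k |\xi|^{2j}$, and since $2\pi > 1$ one obtains $2^{-k}\langle\xi\rangle^{2k} \le \sum_{j=0}^k (2\pi|\xi|)^{2j} \le (2\pi)^{2k}\langle\xi\rangle^{2k}$. These particular constants are just one convenient choice, not taken from the text. The Python sketch below checks the two-sided bound on a grid of values of $|\xi|$ for $k = 3$.

```python
import math

def derivative_weight(t, k):
    """sum_{j=0}^k (2 pi t)^{2j}, the multiplier in the W^{k,2} identity (t = |xi|)."""
    return sum((2 * math.pi * t) ** (2 * j) for j in range(k + 1))

def bracket_weight(t, k):
    """<xi>^{2k} = (1 + |xi|^2)^k, with t = |xi|."""
    return (1 + t * t) ** k

k = 3
ts = [m / 100 for m in range(10_001)]   # |xi| from 0 to 100
ratios = [derivative_weight(t, k) / bracket_weight(t, k) for t in ts]
lower, upper = 2.0 ** (-k), (2 * math.pi) ** (2 * k)
print(min(ratios), max(ratios), lower, upper)
```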
Actually, the two spaces are equal:
Exercise 1.14.29. For any $s \in \mathbb{R}$, show that $\mathcal{S}(\mathbb{R}^d)$ is a dense subspace of
$H^s(\mathbb{R}^d)$. Use this to conclude that $W^{k,2}(\mathbb{R}^d) = H^k(\mathbb{R}^d)$ for all non-negative
integers $k$.

It is clear that $H^0(\mathbb{R}^d) \equiv L^2(\mathbb{R}^d)$, and that $H^s(\mathbb{R}^d) \subset H^{s'}(\mathbb{R}^d)$ whenever
$s > s'$. The spaces $H^s(\mathbb{R}^d)$ are also (complex) Hilbert spaces, with the
Hilbert space inner product
$$\langle f, g\rangle_{H^s(\mathbb{R}^d)} := \int_{\mathbb{R}^d} \langle\xi\rangle^{2s}\hat f(\xi)\overline{\hat g(\xi)}\,d\xi.$$
It is not hard to verify that this inner product does indeed give $H^s(\mathbb{R}^d)$
the structure of a Hilbert space (indeed, it is isomorphic under the Fourier
transform to the Hilbert space $L^2(\langle\xi\rangle^{2s}\,d\xi)$, which is isomorphic in turn under
the map $F(\xi) \mapsto \langle\xi\rangle^s F(\xi)$ to the standard Hilbert space $L^2(\mathbb{R}^d)$).
Being a Hilbert space, $H^s(\mathbb{R}^d)$ is isomorphic to its dual $H^s(\mathbb{R}^d)^*$ (or
more precisely, to the complex conjugate of this dual). There is another
duality relationship which is also useful:
Exercise 1.14.30 (Duality between $H^s$ and $H^{-s}$). Let $s \in \mathbb{R}$. Show that for any
continuous linear functional $\lambda: H^s(\mathbb{R}^d) \to \mathbb{C}$
there exists a unique $g \in H^{-s}(\mathbb{R}^d)$ such that
$$\lambda(f) = \langle f, g\rangle_{L^2(\mathbb{R}^d)}$$
for all $f \in H^s(\mathbb{R}^d)$, where the inner product $\langle f, g\rangle_{L^2(\mathbb{R}^d)}$ is defined via the
Fourier transform as
$$\langle f, g\rangle_{L^2(\mathbb{R}^d)} := \int_{\mathbb{R}^d} \hat f(\xi)\overline{\hat g(\xi)}\,d\xi.$$
Also show that
$$\|f\|_{H^s(\mathbb{R}^d)} = \sup\{|\langle f, g\rangle_{L^2(\mathbb{R}^d)}| : g \in \mathcal{S}(\mathbb{R}^d);\ \|g\|_{H^{-s}(\mathbb{R}^d)} \le 1\}$$
for all $f \in H^s(\mathbb{R}^d)$.

The $H^s$ Sobolev spaces also enjoy the same type of embedding estimates
as their classical counterparts:
Exercise 1.14.31 (Sobolev embedding for $H^s$, I). If $s > d/2$, show
that $H^s(\mathbb{R}^d)$ embeds continuously into $C^{0,\alpha}(\mathbb{R}^d)$ whenever
$0 < \alpha \le \min(s - \frac{d}{2}, 1)$. (Hint: Use the Fourier inversion formula and the Cauchy-Schwarz
inequality.)
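The first step suggested by the hint to Exercise 1.14.31 can be written out as follows for Schwartz $f$. This only controls the $L^\infty$ part of the Hölder norm, and it assumes the normalisation $\hat f(\xi) = \int_{\mathbb{R}^d} f(x)e^{-2\pi i x\cdot\xi}\,dx$ (the sign in the exponent plays no role once absolute values are taken).

```latex
|f(x)| = \Big| \int_{\mathbb{R}^d} \hat f(\xi)\, e^{2\pi i x\cdot\xi}\,d\xi \Big|
  \le \int_{\mathbb{R}^d} \langle\xi\rangle^{-s}\,\langle\xi\rangle^{s} |\hat f(\xi)|\,d\xi
  \le \Big( \int_{\mathbb{R}^d} \langle\xi\rangle^{-2s}\,d\xi \Big)^{1/2} \|f\|_{H^s(\mathbb{R}^d)},

\int_{\mathbb{R}^d} \langle\xi\rangle^{-2s}\,d\xi
  = |S^{d-1}| \int_0^\infty (1 + \rho^2)^{-s} \rho^{d-1}\,d\rho < \infty
  \iff 2s - (d-1) > 1 \iff s > \tfrac{d}{2}.
```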
Exercise 1.14.32 (Sobolev embedding for $H^s$, II). If $0 < s < d/2$, show
that $H^s(\mathbb{R}^d)$ embeds continuously into $L^q(\mathbb{R}^d)$ whenever $\frac{d}{2} - s \le \frac{d}{q} \le \frac{d}{2}$.
(Hint: It suffices to handle the extreme case $\frac{d}{q} = \frac{d}{2} - s$. For this, first
reduce the bound $\|f\|_{L^q(\mathbb{R}^d)} \le C\|f\|_{H^s(\mathbb{R}^d)}$ (with $C$ depending on $s, d, q$) to the
case when $f \in H^s(\mathbb{R}^d)$ is a Schwartz function whose Fourier transform vanishes near
the origin, and write $\hat f(\xi) = \hat g(\xi)/|\xi|^s$ for some
$g$ which is bounded in $L^2(\mathbb{R}^d)$. Then use Exercise 1.13.35 and Corollary
1.11.18.)

Exercise 1.14.33. In this exercise we develop a more elementary variant
of Sobolev spaces, the $L^p$ Hölder spaces. For any $1 \le p \le \infty$ and $0 < \alpha < 1$,
let $\Lambda^p_\alpha(\mathbb{R}^d)$ be the space of functions $f$ whose norm
$$\|f\|_{\Lambda^p_\alpha(\mathbb{R}^d)} := \|f\|_{L^p(\mathbb{R}^d)} + \sup_{x \in \mathbb{R}^d \setminus \{0\}} \frac{\|\tau_x f - f\|_{L^p(\mathbb{R}^d)}}{|x|^\alpha}$$
is finite, where $\tau_x f(y) := f(y - x)$ is the translation of $f$ by $x$. Note that
$\Lambda^\infty_\alpha(\mathbb{R}^d) = C^{0,\alpha}(\mathbb{R}^d)$ (with equivalent norms).

(i) For any $0 < \alpha < 1$, establish the inclusions
$\Lambda^2_{\alpha+\varepsilon}(\mathbb{R}^d) \subset H^\alpha(\mathbb{R}^d) \subset \Lambda^2_\alpha(\mathbb{R}^d)$ for any $0 < \varepsilon < 1 - \alpha$. (Hint: Take Fourier transforms and
work in frequency space.)
(ii) Let $\phi \in C_c^\infty(\mathbb{R}^d)$ be a bump function, and let $\phi_n$ be the approximations
to the identity $\phi_n(x) := 2^{dn}\phi(2^n x)$. If $f \in \Lambda^p_\alpha(\mathbb{R}^d)$, show that
one has the equivalence
$$\|f\|_{\Lambda^p_\alpha(\mathbb{R}^d)} \sim \|f\|_{L^p(\mathbb{R}^d)} + \sup_{n \ge 0} 2^{\alpha n}\|f * \phi_{n+1} - f * \phi_n\|_{L^p(\mathbb{R}^d)},$$
where we use $x \sim y$ to denote the assertion that $x$ and $y$ are comparable
up to constants depending on $p, d, \alpha$. (Hint: To upper bound
$\|\tau_x f - f\|_{L^p(\mathbb{R}^d)}$ for $|x| \le 1$, express $f$ as a telescoping sum of
$f * \phi_{n+1} - f * \phi_n$ for $2^{-n} \le |x|$, plus a final term $f * \phi_{n_0}$ where $2^{-n_0}$
is comparable to $|x|$.)
(iii) If $1 \le p \le q \le \infty$ and $0 < \alpha < 1$ are such that $\frac{d}{p} - \alpha < \frac{d}{q}$, show that
$\Lambda^p_\alpha(\mathbb{R}^d)$ embeds continuously into $L^q(\mathbb{R}^d)$. (Hint: Express $f(x)$ as
$f * \phi_1 * \phi_0$ plus a telescoping series of $f * \phi_{n+1} * \phi_n - f * \phi_n * \phi_{n-1}$,
where $\phi_n$ is as in the previous part. The additional convolution
is in place in order to apply Young's inequality.)
The functions $f * \phi_{n+1} - f * \phi_n$ are crude versions of Littlewood-Paley
projections, which play an important role in harmonic analysis and non-linear
wave and dispersive equations.
Exercise 1.14.34 (Sobolev trace theorem, special case). Let $s > 1/2$. For
any $f \in C_c^\infty(\mathbb{R}^d)$, establish the Sobolev trace inequality
$$\|f|_{\mathbb{R}^{d-1}}\|_{H^{s-1/2}(\mathbb{R}^{d-1})} \le C\|f\|_{H^s(\mathbb{R}^d)},$$
where $C$ depends only on $d$ and $s$, and $f|_{\mathbb{R}^{d-1}}$ is the restriction of $f$ to the
standard hyperplane $\mathbb{R}^{d-1} \equiv \mathbb{R}^{d-1} \times \{0\} \subset \mathbb{R}^d$. (Hint: Convert everything
to $L^2$-based statements involving the Fourier transform of $f$, and use Schur's
test; see Lemma 1.11.14.)
Exercise 1.14.35. (i) Show that if $f \in H^s(\mathbb{R}^d)$ for some $s \in \mathbb{R}$ and
$g \in C^\infty(\mathbb{R}^d)$, then $fg \in H^s(\mathbb{R}^d)$ (note that this product has to be
defined in the sense of tempered distributions if $s$ is negative), and
the map $f \mapsto fg$ is continuous from $H^s(\mathbb{R}^d)$ to $H^s(\mathbb{R}^d)$. (Hint: As
with the previous exercise, convert everything to $L^2$-based statements
involving the Fourier transform of $f$, and use Schur's test.)
(ii) Let $L$ be a partial differential operator of order $m$ with coefficients in
$C^\infty(\mathbb{R}^d)$ for some $m \ge 0$. Show that $L$ maps $H^s(\mathbb{R}^d)$ continuously
to $H^{s-m}(\mathbb{R}^d)$ for all $s \in \mathbb{R}$.

Now we consider a partial converse to Exercise 1.14.35.

Exercise 1.14.36 (Elliptic regularity). Let $m \ge 0$, and let
$$L = \sum_{j_1, \ldots, j_d \ge 0;\ j_1 + \cdots + j_d = m} c_{j_1, \ldots, j_d}\frac{\partial^m}{\partial x_1^{j_1} \cdots \partial x_d^{j_d}}$$
be a constant-coefficient homogeneous differential operator of order $m$. Define
the symbol $l: \mathbb{R}^d \to \mathbb{C}$ of $L$ to be the homogeneous polynomial of degree
$m$, defined by the formula
$$l(\xi_1, \ldots, \xi_d) := \sum_{j_1, \ldots, j_d \ge 0;\ j_1 + \cdots + j_d = m} c_{j_1, \ldots, j_d}\,\xi_1^{j_1} \cdots \xi_d^{j_d}.$$
We say that $L$ is elliptic if one has the lower bound
$$|l(\xi)| \ge c|\xi|^m$$
for all $\xi \in \mathbb{R}^d$ and some constant $c > 0$. Thus, for instance, the Laplacian
is elliptic. Another example of an elliptic operator is the Cauchy-Riemann
operator $\frac{\partial}{\partial x_1} - i\frac{\partial}{\partial x_2}$ in $\mathbb{R}^2$. On the other hand, the heat operator $\frac{\partial}{\partial t} - \Delta$,
the Schrödinger operator $i\frac{\partial}{\partial t} + \Delta$, and the wave operator $-\frac{\partial^2}{\partial t^2} + \Delta$ are not
elliptic on $\mathbb{R}^{1+d}$.
(i) Show that if $L$ is elliptic of order $m$, and $f$ is a tempered distribution
such that $f, Lf \in H^s(\mathbb{R}^d)$, then $f \in H^{s+m}(\mathbb{R}^d)$, and show that one
has the bound
$$\|f\|_{H^{s+m}(\mathbb{R}^d)} \le C(\|f\|_{H^s(\mathbb{R}^d)} + \|Lf\|_{H^s(\mathbb{R}^d)}) \tag{1.124}$$
for some $C$ depending on $s, m, d, L$. (Hint: Once again, rewrite everything
in terms of the Fourier transform $\hat f$ of $f$.)
(ii) Show that if $L$ is a constant-coefficient differential operator of order $m$
which is not elliptic, then the estimate (1.124) fails.
(iii) Let $f \in L^2_{loc}(\mathbb{R}^d)$ be a function which is locally in $L^2$, and let $L$
be an elliptic operator of order $m$. Show that if $Lf = 0$ (in the sense of distributions), then $f$ is
smooth. (Hint: First show inductively that $f\phi \in H^k(\mathbb{R}^d)$ for every
test function $\phi$ and every natural number $k \ge 0$.)
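Since $l$ is homogeneous of degree $m$, one has $|l(\xi)| = |\xi|^m |l(\xi/|\xi|)|$ for $\xi \ne 0$, so $L$ is elliptic exactly when $|l|$ has a positive minimum on the unit sphere. The Python sketch below samples this minimum on the unit circle for three operators in two variables (the wave operator being written on $\mathbb{R}^{1+1}$ in the variables $(t, x)$). The dictionary encoding of coefficients by multi-index is an illustrative convention, and sampling is a sanity check rather than a proof.

```python
import math

def symbol(coeffs, xi):
    """l(xi) = sum over multi-indices (j_1, ..., j_d) of c_j * xi_1^{j_1} * ... * xi_d^{j_d}."""
    total = 0j
    for multi_index, c in coeffs.items():
        term = complex(c)
        for x, power in zip(xi, multi_index):
            term *= x ** power
        total += term
    return total

def min_on_unit_circle(coeffs, samples=360):
    """Smallest |l(xi)| over `samples` equally spaced points of the unit circle in R^2."""
    values = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        values.append(abs(symbol(coeffs, (math.cos(theta), math.sin(theta)))))
    return min(values)

laplacian = {(2, 0): 1, (0, 2): 1}          # symbol xi_1^2 + xi_2^2
cauchy_riemann = {(1, 0): 1, (0, 1): -1j}   # d/dx_1 - i d/dx_2, symbol xi_1 - i xi_2
wave = {(2, 0): -1, (0, 2): 1}              # -d^2/dt^2 + d^2/dx^2, symbol -tau^2 + xi^2

mins = {name: min_on_unit_circle(c) for name, c in
        [("laplacian", laplacian), ("cauchy_riemann", cauchy_riemann), ("wave", wave)]}
print(mins)
```

The wave symbol vanishes on the light cone $|\tau| = |\xi|$, which the sample at angle $\pi/4$ detects.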

Remark 1.14.10. The symbol $l$ of an elliptic operator (with real coefficients)
tends to have level sets that resemble ellipsoids, hence the name.
In contrast, the symbol of parabolic operators, such as the heat operator
$\frac{\partial}{\partial t} - \Delta$, has level sets resembling paraboloids, and the symbol of hyperbolic
operators, such as the wave operator $-\frac{\partial^2}{\partial t^2} + \Delta$, has level sets resembling
hyperboloids. The symbol in fact encodes many important features of linear
differential operators, in particular controlling whether singularities can
form, and how they must propagate in space and/or time; but this topic is
beyond the scope of this course.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/04/30.
Thanks to Antonio, bk, lutfu, PDEbeginner, Polam, timur, and anonymous
commenters for corrections.

Section 1.15

Hausdorff dimension

A fundamental characteristic of many mathematical spaces (e.g., vector
spaces, metric spaces, topological spaces, etc.) is their dimension, which
measures the complexity or degrees of freedom inherent in the space. There
is no single notion of dimension; instead, there are a variety of different
versions of this concept, with different versions being suitable for different
classes of mathematical spaces. Typically, a single mathematical object may
have several subtly different notions of dimension that one can place on it,
which will be related to each other, and which will often agree with each
other in non-pathological cases, but can also deviate from each other in many
other situations. For instance:

• One can define the dimension of a space $X$ by seeing how it compares
to some standard reference spaces, such as $\mathbb{R}^n$ or $\mathbb{C}^n$; one may view a
space as having dimension $n$ if it can be (locally or globally) identified
with a standard $n$-dimensional space. The dimension of a vector
space or a manifold can be defined in this fashion.

• Another way to define dimension of a space $X$ is as the largest number
of independent objects one can place inside that space; this can be
used to give an alternate notion of dimension for a vector space or
of an algebraic variety as well as the closely related notion of the
transcendence degree of a field. The concept of VC dimension in
machine learning also broadly falls into this category.

• One can also try to define dimension inductively, for instance declaring
a space $X$ to be $n$-dimensional if it can be separated somehow
by an $(n-1)$-dimensional object; thus an $n$-dimensional object will
tend to have maximal chains of subobjects of length $n$ (or $n + 1$, depending
on how one initialises the chain and how one defines length).
This can give a notion of dimension for a topological space or of a
commutative ring (Krull dimension).
The notions of dimension as defined above tend to necessarily take values
in the natural numbers (or the cardinal numbers); there is no such space as
$\mathbb{R}^{\sqrt{2}}$, for instance, nor can one talk about a basis consisting of $\pi$ linearly
independent elements, or a chain of maximal ideals of length $e$. There is
however a somewhat different approach to the concept of dimension which
makes no distinction between integer and non-integer dimensions, and is
suitable for studying rough sets such as fractals. The starting point is to
observe that in the $d$-dimensional space $\mathbb{R}^d$, the volume $V$ of a ball of radius
$R$ grows like $R^d$, thus giving the following heuristic relationship
$$\frac{\log V}{\log R} \approx d \tag{1.125}$$
between volume, scale, and dimension. Formalising this heuristic leads to a
number of useful notions of dimension for subsets of $\mathbb{R}^n$ (or more generally,
for metric spaces), including (upper and lower) Minkowski dimension (also
known as the box-packing dimension or Minkowski-Bouligand dimension)
and the Hausdorff dimension.
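The heuristic (1.125) is only asymptotic. For a ball of radius $R$ in $\mathbb{R}^d$ one has $V = \omega_d R^d$ with $\omega_d = \pi^{d/2}/\Gamma(d/2 + 1)$, so $\log V/\log R = d + \log\omega_d/\log R$, and the error decays only logarithmically in $R$. The short Python sketch below (function name illustrative) tabulates this ratio for $d = 1, 2, 3$ and increasing $R$.

```python
import math

def ball_volume(d, R):
    """Volume of a ball of radius R in R^d: pi^(d/2) R^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * R ** d / math.gamma(d / 2 + 1)

for d in (1, 2, 3):
    ratios = [math.log(ball_volume(d, R)) / math.log(R) for R in (1e1, 1e3, 1e6, 1e12)]
    print(d, [round(x, 3) for x in ratios])
```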

Remark 1.15.1. In K-theory, it is also convenient to work with virtual
vector spaces or vector bundles, such as formal differences of such spaces,
and which may therefore have a negative dimension; but as far as I am aware,
there is no connection between this notion of dimension and the metric ones
given here.

Minkowski dimension can either be defined externally (relating the external
volume of $\delta$-neighbourhoods of a set $E$ to the scale $\delta$) or internally
(relating the internal $\delta$-entropy of $E$ to the scale). Hausdorff dimension is
defined internally by first introducing the $d$-dimensional Hausdorff measure
of a set $E$ for any parameter $0 \le d < \infty$, which generalises the familiar
notions of length, area, and volume to non-integer dimensions, or to rough
sets, and is of interest in its own right. Hausdorff dimension has a lengthier
definition than its Minkowski counterpart, but it is more robust with respect
to operations such as countable unions, and is generally accepted as
the standard notion of dimension in metric spaces. We will compare these
concepts against each other later in these notes.
One use of the notion of dimension is to create finer distinctions between
various types of small subsets of spaces such as $\mathbb{R}^n$, beyond what can be
achieved by the usual Lebesgue measure (or Baire category). For instance,
a point, line, and plane in $\mathbb{R}^3$ all have zero measure with respect to
three-dimensional Lebesgue measure (and are nowhere dense), but of course have
different dimensions (0, 1, and 2, respectively). (Another good example is
provided by Kakeya sets.) This can be used to clarify the nature of various
singularities, such as those arising from non-smooth solutions to PDE.
A function which is non-smooth on a set of large Hausdorff dimension can
be considered less smooth than one which is non-smooth on a set of small
Hausdorff dimension, even if both are smooth almost everywhere. While
many properties of the singular set of such a function are worth studying
(e.g., their rectifiability), understanding their dimension is often an important
starting point. The interplay between these types of concepts is the
subject of geometric measure theory.

1.15.1. Minkowski dimension. Before we study the more standard notion
of Hausdorff dimension, we begin with the more elementary concept of
the (upper and lower) Minkowski dimension of a subset $E$ of a Euclidean
space $\mathbb{R}^n$.
There are several equivalent ways to approach Minkowski dimension. We
begin with an external approach, based on a study of the $\delta$-neighbourhoods
$E_\delta := \{x \in \mathbb{R}^n : \operatorname{dist}(x, E) < \delta\}$ of $E$, where $\operatorname{dist}(x, E) := \inf\{|x - y| : y \in E\}$
and we use the Euclidean metric on $\mathbb{R}^n$. These are open sets in $\mathbb{R}^n$ and
therefore have an $n$-dimensional volume (or Lebesgue measure) $\operatorname{vol}^n(E_\delta)$. To
avoid divergences, let us assume for now that $E$ is bounded, so that the $E_\delta$
have finite volume.
Let 0 ≤ d ≤ n. Suppose E is a bounded portion of a k-dimensional
subspace, e.g., E = B d (0, 1) × {0}n−d , where B d (0, 1) ⊂ Rd is the unit ball
in Rd and we identify Rn with Rd × Rn−d in the usual manner. Then we
see from the triangle inequality that

B d (0, 1) × B n−d (0, δ) ⊂ Eδ ⊂ B d (0, 2) × B n−d (0, δ)

for all 0 < δ < 1, which implies that

cδ n−d ≤ voln (Eδ ) ≤ Cδ n−d

for some constants c, C > 0 depending only on n, d. In particular, we have

log voln (Eδ )


lim n − =d
δ→0 log δ

(compare with (1.125)). This motivates our first definition of Minkowski


dimension:

Author's preliminary version made available with permission of the publisher, the American Mathematical Society
260 1. Real analysis

Definition 1.15.2. Let $E$ be a bounded subset of $\mathbb{R}^n$. The upper Minkowski dimension $\overline{\dim}_M(E)$ is defined as
$$\overline{\dim}_M(E) := \limsup_{\delta \to 0} \Big( n - \frac{\log \mathrm{vol}^n(E_\delta)}{\log \delta} \Big),$$
and the lower Minkowski dimension $\underline{\dim}_M(E)$ is defined as
$$\underline{\dim}_M(E) := \liminf_{\delta \to 0} \Big( n - \frac{\log \mathrm{vol}^n(E_\delta)}{\log \delta} \Big).$$
If the upper and lower Minkowski dimensions match, we refer to $\dim_M(E) := \overline{\dim}_M(E) = \underline{\dim}_M(E)$ as the Minkowski dimension of $E$. In particular, the empty set has a Minkowski dimension of $-\infty$.
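As a quick numerical sanity check of Definition 1.15.2 (a sketch; the example and the helper name `minkowski_quantity` are our own choices), take $n = 2$ and the segment $E = B^1(0,1) \times \{0\}$. Its $\delta$-neighbourhood is a "stadium" of area exactly $4\delta + \pi\delta^2$, so the quantity inside the limit can be evaluated in closed form. It tends to $d = 1$, but only with an excess of about $\log 4 / \log(1/\delta)$, which is why the definition takes a limit rather than reading the dimension off a single scale:

```python
import math

def minkowski_quantity(delta):
    """n - log vol^n(E_delta) / log(delta) for the segment E = (-1, 1) x {0} in R^2.

    E_delta is a "stadium": a 2 x (2*delta) rectangle plus two half-discs of
    radius delta, so its area is exactly 4*delta + pi*delta**2."""
    n = 2
    area = 4 * delta + math.pi * delta ** 2
    return n - math.log(area) / math.log(delta)

for k in (2, 4, 8, 16):
    delta = 10.0 ** -k
    # The excess over d = 1 is roughly log(4) / log(1/delta): it decays only logarithmically.
    print(f"delta=1e-{k}: {minkowski_quantity(delta):.4f}")
```

The printed values decrease towards 1 (roughly 1.30, 1.15, 1.08, 1.04), consistent with the constants $c, C$ in the two-sided volume bound contributing a term of order $1/\log(1/\delta)$.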

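Unwrapping all the definitions, we have the following equivalent formulation, where $E$ is a bounded subset of $\mathbb{R}^n$ and $\alpha \in \mathbb{R}$:
• We have $\overline{\dim}_M(E) \le \alpha$ iff for every $\varepsilon > 0$, one has $\mathrm{vol}^n(E_\delta) \le C\delta^{n-\alpha-\varepsilon}$ for all sufficiently small $\delta > 0$ and some $C > 0$.
• We have $\underline{\dim}_M(E) \le \alpha$ iff for every $\varepsilon > 0$, one has $\mathrm{vol}^n(E_\delta) \le C\delta^{n-\alpha-\varepsilon}$ for arbitrarily small $\delta > 0$ and some $C > 0$.
• We have $\overline{\dim}_M(E) \ge \alpha$ iff for every $\varepsilon > 0$, one has $\mathrm{vol}^n(E_\delta) \ge c\delta^{n-\alpha+\varepsilon}$ for arbitrarily small $\delta > 0$ and some $c > 0$.
• We have $\underline{\dim}_M(E) \ge \alpha$ iff for every $\varepsilon > 0$, one has $\mathrm{vol}^n(E_\delta) \ge c\delta^{n-\alpha+\varepsilon}$ for all sufficiently small $\delta > 0$ and some $c > 0$.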

Exercise 1.15.1.
(i) Let $C \subset \mathbb{R}$ be the Cantor set consisting of all base 4 strings $\sum_{i=1}^\infty a_i 4^{-i}$, where each $a_i$ takes values in $\{0, 3\}$. Show that $C$ has Minkowski dimension $1/2$. (Hint: Approximate any small $\delta$ by a negative power of 4.)

(ii) Let $C' \subset \mathbb{R}$ be the Cantor set consisting of all base 4 strings $\sum_{i=1}^\infty a_i 4^{-i}$, where each $a_i$ takes values in $\{0, 3\}$ when $(2k)! \le i < (2k+1)!$ for some integer $k \ge 0$, and $a_i$ is arbitrary for the other values of $i$. Show that $C'$ has a lower Minkowski dimension of $1/2$ and an upper Minkowski dimension of $1$.
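The scale-by-scale behaviour in Exercise 1.15.1 can be explored numerically (a sketch; the helper names are hypothetical). Fixing the first $m$ base-4 digits pins a point down to an interval of length $4^{-m}$, so the number $N_m$ of admissible digit prefixes is comparable, up to a bounded factor, to the covering numbers at scale $\delta = 4^{-m}$ introduced later in this section. A restricted digit has 2 choices and a free digit has 4, so $\log N_m / \log 4^m$ can be computed exactly as a rational number:

```python
from fractions import Fraction
from math import factorial

def restricted_cprime(i):
    """True if digit a_i of C' is restricted to {0, 3}, i.e. (2k)! <= i < (2k+1)! for some k >= 0."""
    k = 0
    while factorial(2 * k) <= i:
        if i < factorial(2 * k + 1):
            return True
        k += 1
    return False

def prefix_ratio(m, restricted):
    """log N_m / log 4^m, where N_m is the number of admissible length-m digit prefixes.

    A restricted digit has 2 choices (contributing 1/2 to log_4 N_m); a free digit
    has 4 choices (contributing 1), so the ratio is an exact Fraction."""
    log4_count = sum(Fraction(1, 2) if restricted(i) else Fraction(1) for i in range(1, m + 1))
    return log4_count / m

# Cantor set C of part (i): every digit is restricted, so the ratio is 1/2 at every scale.
print(prefix_ratio(20, lambda i: True))   # 1/2
# Cantor set C' of part (ii): the ratio oscillates as m passes through the factorials.
for m in (5, 23, 119, 719, 5039):
    print(m, float(prefix_ratio(m, restricted_cprime)))
```

For $C'$ the ratios at $m = 5, 23, 119, 719, 5039$ are $3/5, 21/23, 69/119, 669/719, 2829/5039$: they alternately approach $1$ (at the end of a long free stretch) and $1/2$ (at the end of a long restricted stretch), which is the mechanism behind part (ii).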

Exercise 1.15.2. Suppose that $E \subset \mathbb{R}^n$ is a compact set with the property that there exist $0 < r < 1$ and an integer $k > 1$ such that $E$ is equal to the union of $k$ disjoint translates of $r \cdot E := \{rx : x \in E\}$. (This is a special case of a self-similar fractal; the Cantor set is a typical example.) Show that $E$ has Minkowski dimension $\frac{\log k}{\log 1/r}$.
If the $k$ translates of $r \cdot E$ are allowed to overlap, establish the upper bound $\overline{\dim}_M(E) \le \frac{\log k}{\log 1/r}$.
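A heuristic outline of where the exponent $\frac{\log k}{\log 1/r}$ comes from, phrased with the external covering numbers $N^{ext}_\delta$ defined later in this section (a sketch, not a complete solution): iterating the self-similarity $m$ times writes $E$ as a union of $k^m$ translates of $r^m \cdot E$, whether or not they overlap, and rescaling a cover of $E$ by $N$ balls of radius $\delta$ gives a cover of each translate by $N$ balls of radius $r^m\delta$. Taking $\delta = 1$:

```latex
N^{ext}_{r^m}(E) \le k^m \, N^{ext}_1(E)
\quad\Longrightarrow\quad
\frac{\log N^{ext}_{r^m}(E)}{\log (1/r^m)}
  \le \frac{\log k}{\log (1/r)} + \frac{\log N^{ext}_1(E)}{m \log (1/r)}
  \xrightarrow[m \to \infty]{} \frac{\log k}{\log (1/r)}.
```

Interpolating between consecutive scales $r^{m+1} \le \delta \le r^m$ using the monotonicity of $N^{ext}_\delta$ in $\delta$ then gives the upper bound; the matching lower bound in the disjoint case needs a separate argument.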


It is clear that we have the inequalities
$$0 \le \underline{\dim}_M(E) \le \overline{\dim}_M(E) \le n$$
for non-empty bounded $E \subset \mathbb{R}^n$ and the monotonicity properties
$$\underline{\dim}_M(E) \le \underline{\dim}_M(F), \qquad \overline{\dim}_M(E) \le \overline{\dim}_M(F)$$
whenever $E \subset F \subset \mathbb{R}^n$ are bounded sets. It is thus natural to extend the definitions of lower and upper Minkowski dimension to unbounded sets $E$ by defining
$$\underline{\dim}_M(E) := \sup_{F \subset E,\ F \text{ bounded}} \underline{\dim}_M(F) \tag{1.126}$$
and
$$\overline{\dim}_M(E) := \sup_{F \subset E,\ F \text{ bounded}} \overline{\dim}_M(F). \tag{1.127}$$
In particular, we easily verify that $d$-dimensional subspaces of $\mathbb{R}^n$ have Minkowski dimension $d$.
Exercise 1.15.3. Show that any subset of $\mathbb{R}^n$ with lower Minkowski dimension less than $n$ has Lebesgue measure zero. In particular, any subset $E \subset \mathbb{R}^n$ of positive Lebesgue measure must have full Minkowski dimension $\dim_M(E) = n$.
Now we turn to other formulations of Minkowski dimension. Given a bounded set $E$ and $\delta > 0$, we make the following definitions:
• $N^{ext}_\delta(E)$ (the external $\delta$-covering number of $E$) is the smallest number of open balls of radius $\delta$ with centres in $\mathbb{R}^n$ needed to cover $E$.
• $N^{int}_\delta(E)$ (the internal $\delta$-covering number of $E$) is the smallest number of open balls of radius $\delta$ with centres in $E$ needed to cover $E$.
• $N^{net}_\delta(E)$ (the $\delta$-metric entropy) is the cardinality of the largest $\delta$-net in $E$, i.e., the largest set $x_1, \ldots, x_k$ in $E$ such that $|x_i - x_j| \ge \delta$ for every $1 \le i < j \le k$.
• $N^{pack}_\delta(E)$ (the $\delta$-packing number of $E$) is the largest number of disjoint open balls one can find of radius $\delta$ with centres in $E$.
These four quantities are closely related to each other and to the volumes $\mathrm{vol}^n(E_\delta)$:
Exercise 1.15.4. For any bounded set $E \subset \mathbb{R}^n$ and any $\delta > 0$, show that
$$N^{net}_{2\delta}(E) = N^{pack}_\delta(E) \le \frac{\mathrm{vol}^n(E_\delta)}{\mathrm{vol}^n(B^n(0,\delta))} \le 2^n N^{ext}_\delta(E) \le 2^n N^{int}_\delta(E) \le 2^n N^{net}_\delta(E).$$
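The inequality $N^{int}_\delta(E) \le N^{net}_\delta(E)$ rests on the fact that a $\delta$-net which cannot be enlarged is automatically a cover of $E$ by open $\delta$-balls centred in $E$. The sketch below (our own illustration; a finite sample of points stands in for $E$, and the helper names are hypothetical) builds such a net greedily and checks both properties. A greedy net is maximal with respect to inclusion but need not have the largest possible cardinality, so its size is only a lower bound for $N^{net}_\delta$:

```python
def dist_sq(p, q):
    """Squared Euclidean distance (kept squared so integer inputs compare exactly)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def greedy_maximal_net(points, delta):
    """A maximal delta-net of the finite set `points`: chosen points are pairwise at
    distance >= delta, and no further point of `points` can be added."""
    net = []
    for p in points:
        if all(dist_sq(p, q) >= delta * delta for q in net):
            net.append(p)
    return net

def is_internal_cover(points, centres, delta):
    """Check that the open balls B(c, delta), c in centres, cover every point."""
    return all(any(dist_sq(p, c) < delta * delta for c in centres) for p in points)

segment = [(i, 0) for i in range(101)]
net = greedy_maximal_net(segment, 10)
print(len(net))                              # 11: the points (0,0), (10,0), ..., (100,0)
print(is_internal_cover(segment, net, 10))   # True

grid = [(i, j) for i in range(21) for j in range(21)]
grid_net = greedy_maximal_net(grid, 5)
print(is_internal_cover(grid, grid_net, 5))  # True for any maximal net
```

Every rejected point was rejected because it lay within distance $< \delta$ of an already chosen point, which is exactly the covering property used in the exercise.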


As a consequence of this exercise, we see that
$$\overline{\dim}_M(E) = \limsup_{\delta \to 0} \frac{\log N^*_\delta(E)}{\log 1/\delta} \tag{1.128}$$
and
$$\underline{\dim}_M(E) = \liminf_{\delta \to 0} \frac{\log N^*_\delta(E)}{\log 1/\delta}, \tag{1.129}$$
where $*$ is any of ext, int, net, or pack.
One can now take the formulae (1.128) and (1.129) as the definition of Minkowski dimension for bounded sets (and then use (1.126) and (1.127) to extend to unbounded sets). The formulations (1.128) and (1.129) for $* =$ int, net, pack have the advantage of being intrinsic: they only involve $E$, rather than the ambient space $\mathbb{R}^n$. For metric spaces, one still has a partial analogue of Exercise 1.15.4, namely
$$N^{net}_{2\delta}(E) \le N^{pack}_\delta(E) \le N^{int}_\delta(E) \le N^{net}_\delta(E).$$
As such, these formulations of Minkowski dimension extend without any difficulty to arbitrary bounded metric spaces $(E, d)$ (at least when the spaces are locally compact), and then to unbounded metric spaces by (1.126) and (1.127).
Exercise 1.15.5. If $\phi : (X, d_X) \to (Y, d_Y)$ is a Lipschitz map between metric spaces, show that $\underline{\dim}_M(\phi(E)) \le \underline{\dim}_M(E)$ and $\overline{\dim}_M(\phi(E)) \le \overline{\dim}_M(E)$ for all $E \subset X$. Conclude in particular that the graph $\{(x, \phi(x)) : x \in \mathbb{R}^d\}$ of any Lipschitz function $\phi : \mathbb{R}^d \to \mathbb{R}^{n-d}$ has Minkowski dimension $d$, and the graph of any measurable function $\phi : \mathbb{R}^d \to \mathbb{R}^{n-d}$ has Minkowski dimension at least $d$.

Note however that the dimension of graphs can become larger than that of the base in the non-Lipschitz case:
Exercise 1.15.6. Show that the graph $\{(x, \sin\frac{1}{x}) : 0 < x < 1\}$ has Minkowski dimension $3/2$.
Exercise 1.15.7. Let $(X, d)$ be a bounded metric space. For each $n \ge 0$, let $E_n$ be a $2^{-n}$-net of $X$ of maximal cardinality (thus the cardinality of $E_n$ is $N^{net}_{2^{-n}}(X)$). Show that for any continuous function $f : X \to \mathbb{R}$, one has the inequality
$$\sup_{x \in X} f(x) \le \sup_{x_0 \in E_0} f(x_0) + \sum_{n=0}^\infty \ \sup_{x_n \in E_n,\ x_{n+1} \in E_{n+1} :\ d(x_n, x_{n+1}) \le \frac{3}{2} 2^{-n}} \big( f(x_{n+1}) - f(x_n) \big).$$
(Hint: For any $x \in X$, define $x_n \in E_n$ to be the nearest point in $E_n$ to $x$, and use a telescoping series.) This inequality (and variants thereof), which replaces a continuous supremum of a function $f(x)$ by a sum of discrete suprema of differences $f(x_{n+1}) - f(x_n)$ of that function, is the basis of the generic chaining technique in probability, used to estimate the supremum of a continuous family of random processes. It is particularly effective when combined with bounds on the metric entropy $N^{net}_{2^{-n}}(X)$, which of course is closely related to the Minkowski dimension of $X$, and with large deviation bounds on the differences $f(x_{n+1}) - f(x_n)$. A good reference for generic chaining is [Ta2005].
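A toy check of the chaining inequality (our own illustration; $X$ is a finite sample of $[0,1]$ with the usual metric, $f$ is an arbitrary continuous function, and the helper names are hypothetical). The argument only uses that each $E_n$ is maximal with respect to inclusion, which is what a greedy construction provides. For a finite $X$ the nets eventually contain every point, so the telescoping series terminates and the bound can be checked directly; the step bound $|x_n - x_{n+1}| < 2^{-n} + 2^{-n-1} = \frac32 2^{-n}$ is where the factor $\frac32$ comes from:

```python
import math

def greedy_net(points, delta):
    """Maximal delta-net of a finite subset of R, built greedily."""
    net = []
    for p in points:
        if all(abs(p - q) >= delta for q in net):
            net.append(p)
    return net

def nearest(net, x):
    """Nearest point of `net` to x (ties broken by order in `net`)."""
    return min(net, key=lambda q: abs(q - x))

def chaining_bound(points, f, levels):
    """Right-hand side of the chaining inequality, truncated after `levels` steps.

    E_n is a maximal 2^-n-net; the n-th term is the largest increment f(b) - f(a)
    over a in E_n, b in E_{n+1} with |a - b| <= (3/2) 2^-n."""
    nets = [greedy_net(points, 2.0 ** -n) for n in range(levels + 1)]
    bound = max(f(q) for q in nets[0])
    for n in range(levels):
        radius = 1.5 * 2.0 ** -n
        bound += max(f(b) - f(a) for a in nets[n] for b in nets[n + 1] if abs(a - b) <= radius)
    return nets, bound

def f(x):
    return math.sin(7 * x)

X = [i / 37 for i in range(38)]     # finite sample of [0, 1]; gaps 1/37 exceed 2^-6
nets, bound = chaining_bound(X, f, levels=6)
assert nets[-1] == X                # at scale 2^-6 every point is its own net point
print(max(f(x) for x in X), "<=", bound)
```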
Exercise 1.15.8. If $E \subset \mathbb{R}^n$ and $F \subset \mathbb{R}^m$ are bounded sets, show that
$$\underline{\dim}_M(E) + \underline{\dim}_M(F) \le \underline{\dim}_M(E \times F)$$
and
$$\overline{\dim}_M(E \times F) \le \overline{\dim}_M(E) + \overline{\dim}_M(F).$$
Give a counterexample that shows that either of the inequalities here can be strict. (Hint: There are many possible constructions; one of them is a modification of Exercise 1.15.1(ii).)

It is easy to see that upper Minkowski dimension reacts well to finite unions; more precisely,
$$\overline{\dim}_M(E \cup F) = \max(\overline{\dim}_M(E), \overline{\dim}_M(F))$$
for any $E, F \subset \mathbb{R}^n$, while for the lower Minkowski dimension one only has the inequality
$$\underline{\dim}_M(E \cup F) \ge \max(\underline{\dim}_M(E), \underline{\dim}_M(F)),$$
which can be strict. However, Minkowski dimension does not respect countable unions. For instance, the rationals $\mathbb{Q}$ have Minkowski dimension 1, despite being the countable union of points, which of course have Minkowski dimension 0. More generally, it is not difficult to see that any set $E \subset \mathbb{R}^n$ has the same upper and lower Minkowski dimensions as its topological closure $\overline{E}$, since both sets have the same $\delta$-neighbourhoods. Thus we see that the notion of Minkowski dimension misses some of the fine structure of a set $E$, in particular the presence of holes within the set. We now turn to the notion of Hausdorff dimension, which rectifies some of these defects.

1.15.2. Hausdorff measure. The Hausdorff approach to dimension begins by noting that $d$-dimensional objects in $\mathbb{R}^n$ tend to have a meaningful $d$-dimensional measure to assign to them. For instance, the 1-dimensional boundary of a polygon has a perimeter, the 0-dimensional vertices of that polygon have a cardinality, and the polygon itself has an area. So to define the notion of a set of Hausdorff dimension $d$, we will first define the notion of the $d$-dimensional Hausdorff measure $\mathcal{H}^d(E)$ of a set $E$.


To do this, let us quickly review one of the (many) constructions of $n$-dimensional Lebesgue measure, which we are denoting here by $\mathrm{vol}^n$. One way to build this measure is to work with half-open boxes $B = \prod_{i=1}^n [a_i, b_i)$ in $\mathbb{R}^n$, which we assign a volume of $|B| := \prod_{i=1}^n (b_i - a_i)$. Given this notion of volume for boxes, we can then define the outer Lebesgue measure $(\mathrm{vol}^n)^*(E)$ of any set $E \subset \mathbb{R}^n$ by the formula
$$(\mathrm{vol}^n)^*(E) := \inf\Big\{ \sum_{k=1}^\infty |B_k| : B_k \text{ covers } E \Big\},$$
where the infimum ranges over all at most countable collections $B_1, B_2, \ldots$ of boxes that cover $E$. One easily verifies that $(\mathrm{vol}^n)^*$ is indeed an outer measure (i.e., it is monotone, countably subadditive, and assigns zero to the empty set). We then define a set $A \subset \mathbb{R}^n$ to be $(\mathrm{vol}^n)^*$-measurable if one has the additivity property
$$(\mathrm{vol}^n)^*(E) = (\mathrm{vol}^n)^*(E \cap A) + (\mathrm{vol}^n)^*(E \setminus A)$$
for all $E \subset \mathbb{R}^n$. By Carathéodory's theorem, the space of $(\mathrm{vol}^n)^*$-measurable sets is a $\sigma$-algebra, and outer Lebesgue measure is a countably additive measure on this $\sigma$-algebra, which we denote $\mathrm{vol}^n$. Furthermore, one easily verifies that every box $B$ is $(\mathrm{vol}^n)^*$-measurable, which soon implies that every Borel set is also; thus Lebesgue measure is a Borel measure (though it can also of course measure some non-Borel sets).
Finally, one needs to verify that the Lebesgue measure $\mathrm{vol}^n(B)$ of a box is equal to its classical volume $|B|$; the above construction trivially gives $\mathrm{vol}^n(B) \le |B|$, but the converse is not as obvious. This is in fact a rather delicate matter, relying in particular on the completeness of the reals; if one replaced $\mathbb{R}$ by the rationals $\mathbb{Q}$, for instance, then all the above constructions go through, but now boxes have Lebesgue measure zero (why?). See [Fo2000, Chapter 1], for instance, for details.
Anyway, we can use this construction of Lebesgue measure as a model for building $d$-dimensional Hausdorff measure. Instead of using half-open boxes as the building blocks, we will instead work with the open balls $B(x, r)$. For $d$-dimensional measure, we will assign each ball $B(x, r)$ a measure $r^d$ (cf. (1.125)). We can then define the unlimited Hausdorff content $h_{d,\infty}(E)$ of a set $E \subset \mathbb{R}^n$ by the formula
$$h_{d,\infty}(E) := \inf\Big\{ \sum_{k=1}^\infty r_k^d : B(x_k, r_k) \text{ covers } E \Big\},$$
where the infimum ranges over all at most countable families of balls that cover $E$. (Note that if $E$ is compact, then it would suffice to use finite coverings, since every open cover of $E$ has a finite subcover. But in general, for non-compact $E$ we must allow the use of infinitely many balls.)


As with Lebesgue measure, $h_{d,\infty}$ is easily seen to be an outer measure, and one could define the notion of an $h_{d,\infty}$-measurable set on which Carathéodory's theorem applies to build a countably additive measure. Unfortunately, a key problem arises: once $d$ is less than $n$, most sets cease to be $h_{d,\infty}$-measurable! We illustrate this in the one-dimensional case with $n = 1$ and $d = 1/2$, and consider the problem of computing the unlimited Hausdorff content $h_{1/2,\infty}([a,b])$. On the one hand, this content is at most $|\frac{b-a}{2}|^{1/2}$, since one can cover $[a,b]$ by the ball of radius $|\frac{b-a}{2}| + \varepsilon$ centred at $\frac{a+b}{2}$ for any $\varepsilon > 0$. On the other hand, the content is also at least $|\frac{b-a}{2}|^{1/2}$.

To see this, suppose we cover $[a,b]$ by a finite or countable family of balls $B(x_k, r_k)$ (one can reduce to the finite case by compactness, though it is not necessary to do so here). The total one-dimensional Lebesgue measure $\sum_k 2r_k$ of these balls must equal or exceed the Lebesgue measure $|b - a|$ of the entire interval, thus
$$\sum_k r_k \ge \frac{|b-a|}{2}.$$
From the inequality $\sum_k r_k \le \big(\sum_k r_k^{1/2}\big)^2$ (which is obvious after expanding the right-hand side and discarding cross-terms) we see that
$$\sum_k r_k^{1/2} \ge \Big( \frac{|b-a|}{2} \Big)^{1/2},$$
and the claim follows.


We now see some serious breakdown of additivity: for instance, the unlimited $1/2$-dimensional content of $[0,2]$ is 1, despite $[0,2]$ being the disjoint union of $[0,1]$ and $(1,2]$, which each have an unlimited content of $1/\sqrt{2}$. In particular, this shows that $[0,1]$ (for instance) is not measurable with respect to the unlimited content. The basic problem here is that the most efficient cover of a union such as $[0,1] \cup (1,2]$ for the purposes of unlimited $1/2$-dimensional content does not come from covers of the separate components $[0,1]$ and $(1,2]$ of that union, but instead from one giant ball that covers $[0,2]$ directly.
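The arithmetic behind this example can be tabulated directly (a sketch; `equal_cover_cost` is a hypothetical helper, and it ignores the extra $\varepsilon$ needed when the balls are open). Covering an interval of length $L$ by $m$ equal balls of radius $L/2m$ costs $m(L/2m)^{1/2} = (mL/2)^{1/2}$, which grows with $m$, so for $d = 1/2$ one big ball is the cheapest such cover, and by the argument above no cover does better than $(L/2)^{1/2}$:

```python
import math

def equal_cover_cost(length, m, d):
    """Sum of r_k^d for a cover of an interval of the given length by m balls of
    equal radius length / (2m) (ignoring the epsilon needed for open balls)."""
    return m * (length / (2 * m)) ** d

d = 0.5
costs = [equal_cover_cost(2.0, m, d) for m in range(1, 6)]
print([round(c, 4) for c in costs])   # increasing in m: one big ball is best when d < 1

whole = equal_cover_cost(2.0, 1, d)                                  # [0, 2]
pieces = equal_cover_cost(1.0, 1, d) + equal_cover_cost(1.0, 1, d)   # [0, 1] and (1, 2]
print(whole, pieces)   # 1.0 versus sqrt(2): the unlimited content is not additive
```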
To fix this, we will limit the Hausdorff content by working only with small balls. More precisely, for any $r > 0$, we define the Hausdorff content $h_{d,r}(E)$ of a set $E \subset \mathbb{R}^n$ by the formula
$$h_{d,r}(E) := \inf\Big\{ \sum_{k=1}^\infty r_k^d : B(x_k, r_k) \text{ covers } E;\ r_k \le r \Big\},$$
where the balls $B(x_k, r_k)$ are now restricted to be less than or equal to $r$ in radius. This quantity is non-increasing in $r$ (shrinking $r$ restricts the admissible covers, and so can only increase the infimum), and we then define the Hausdorff outer measure $(\mathcal{H}^d)^*(E)$ by the formula
$$(\mathcal{H}^d)^*(E) := \lim_{r \to 0} h_{d,r}(E) = \sup_{r > 0} h_{d,r}(E).$$
(This is analogous to the Riemann integral approach to volume of sets, covering them by balls, boxes, or rectangles of increasingly smaller size; this latter approach is also closely connected to the Minkowski dimension concept studied earlier. The key difference between the Lebesgue/Hausdorff approach and the Riemann/Minkowski approach is that in the former approach one allows the balls or boxes to be countable in number and variable in size, whereas in the latter approach the cover is finite and uniform in size.)
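To see the effect of the radius restriction on the earlier example (a short supplementary derivation), take $d = 1/2$ and any cover of $[a,b]$ by balls with $r_k \le r$. Since $r_k^{1/2} = r_k / r_k^{1/2} \ge r_k / r^{1/2}$, the bound $\sum_k r_k \ge |b-a|/2$ used above gives

```latex
h_{1/2,r}([a,b]) \;\ge\; \frac{1}{r^{1/2}} \cdot \frac{|b-a|}{2}
\;\xrightarrow[r \to 0]{}\; \infty
\qquad \text{whenever } a < b,
```

so every non-degenerate interval has $(\mathcal{H}^{1/2})^*([a,b]) = \infty$: small balls are expensive for $d < 1$, and the giant ball that broke additivity is no longer admissible.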
Exercise 1.15.9. Show that if $d > n$, then $(\mathcal{H}^d)^*(E) = 0$ for all $E \subset \mathbb{R}^n$. Thus $d$-dimensional Hausdorff measure is only a non-trivial concept for subsets of $\mathbb{R}^n$ in the regime $0 \le d \le n$.
Since each $h_{d,r}$ is an outer measure, so is $(\mathcal{H}^d)^*$. But the key advantage of moving to the Hausdorff measure rather than Hausdorff content is that we obtain a lot more additivity. For instance:
Exercise 1.15.10. Let $E, F$ be subsets of $\mathbb{R}^n$ which have a non-zero separation, i.e., the quantity $\mathrm{dist}(E, F) = \inf\{|x - y| : x \in E, y \in F\}$ is strictly positive. Show that $(\mathcal{H}^d)^*(E \cup F) = (\mathcal{H}^d)^*(E) + (\mathcal{H}^d)^*(F)$. (Hint: One inequality is easy. For the other, observe that any small ball can intersect $E$ or intersect $F$, but not both.)
One consequence of this is that there is a large class of measurable sets:
Proposition 1.15.3. Let $d \ge 0$. Then every Borel subset of $\mathbb{R}^n$ is $(\mathcal{H}^d)^*$-measurable.

Proof. Since the collection of $(\mathcal{H}^d)^*$-measurable sets is a $\sigma$-algebra, it suffices to show the claim for closed sets $A$. (It will be slightly more convenient technically to work with closed sets rather than open ones here.) Thus, we take an arbitrary set $E \subset \mathbb{R}^n$ and seek to show that
$$(\mathcal{H}^d)^*(E) = (\mathcal{H}^d)^*(E \cap A) + (\mathcal{H}^d)^*(E \setminus A).$$
We may assume that $(\mathcal{H}^d)^*(E \cap A)$ and $(\mathcal{H}^d)^*(E \setminus A)$ are both finite, since the claim is obvious otherwise from monotonicity.
From Exercise 1.15.10 and the fact that $(\mathcal{H}^d)^*$ is an outer measure, we already have
$$(\mathcal{H}^d)^*(E \cap A) + (\mathcal{H}^d)^*(E \setminus A_{1/m}) \le (\mathcal{H}^d)^*(E) \le (\mathcal{H}^d)^*(E \cap A) + (\mathcal{H}^d)^*(E \setminus A),$$
where $A_{1/m}$ is the $1/m$-neighbourhood of $A$. So it suffices to show that
$$\lim_{m \to \infty} (\mathcal{H}^d)^*(E \setminus A_{1/m}) = (\mathcal{H}^d)^*(E \setminus A).$$



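For any $m$, we have the decomposition $E \setminus A = (E \setminus A_{1/m}) \cup \bigcup_{l \ge m} F_l$, where $F_l := (E \setminus A_{1/(l+1)}) \cap A_{1/l}$ (here we use that $A$ is closed, so that every point of $E \setminus A$ has positive distance to $A$), and thus by countable subadditivity and monotonicity,
$$(\mathcal{H}^d)^*(E \setminus A_{1/m}) \le (\mathcal{H}^d)^*(E \setminus A) \le (\mathcal{H}^d)^*(E \setminus A_{1/m}) + \sum_{l \ge m} (\mathcal{H}^d)^*(F_l),$$
so it suffices to show that the sum $\sum_{l=1}^\infty (\mathcal{H}^d)^*(F_l)$ is absolutely convergent.
Consider the even-indexed sets $F_2, F_4, F_6, \ldots$. These sets are separated from each other, so by many applications of Exercise 1.15.10 followed by monotonicity we have
$$\sum_{l=1}^L (\mathcal{H}^d)^*(F_{2l}) = (\mathcal{H}^d)^*\Big( \bigcup_{l=1}^L F_{2l} \Big) \le (\mathcal{H}^d)^*(E \setminus A) < \infty$$
for all $L$, and thus $\sum_{l=1}^\infty (\mathcal{H}^d)^*(F_{2l})$ is absolutely convergent. Similarly for $\sum_{l=1}^\infty (\mathcal{H}^d)^*(F_{2l-1})$, and the claim follows. □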

On the $(\mathcal{H}^d)^*$-measurable sets $E$, we write $\mathcal{H}^d(E)$ for $(\mathcal{H}^d)^*(E)$; thus $\mathcal{H}^d$ is a Borel measure on $\mathbb{R}^n$. We now study what this measure looks like for various values of $d$. The case $d = 0$ is easy:
Exercise 1.15.11. Show that every subset of $\mathbb{R}^n$ is $(\mathcal{H}^0)^*$-measurable, and that $\mathcal{H}^0$ is counting measure.

Now we look at the opposite case $d = n$. It is easy to see that any Lebesgue-null set of $\mathbb{R}^n$ has $n$-dimensional Hausdorff measure zero (since it may be covered by balls of arbitrarily small total content). Thus $n$-dimensional Hausdorff measure is absolutely continuous with respect to Lebesgue measure, and we thus have $\frac{d\mathcal{H}^n}{d\mathrm{vol}^n} = c$ for some locally integrable function $c$. As Hausdorff measure and Lebesgue measure are clearly translation-invariant, $c$ must also be translation-invariant and thus constant. We therefore have
$$\mathcal{H}^n = c\,\mathrm{vol}^n$$
for some constant $c \ge 0$.
We now compute what this constant is. If $\omega_n$ denotes the volume of the unit ball $B(0,1)$, then we have
$$\sum_k r_k^n = \frac{1}{\omega_n} \sum_k \mathrm{vol}^n(B(x_k, r_k)) \ge \frac{1}{\omega_n} \mathrm{vol}^n\Big( \bigcup_k B(x_k, r_k) \Big)$$
for any at most countable collection of balls $B(x_k, r_k)$. Taking infima, we conclude that
$$\mathcal{H}^n \ge \frac{1}{\omega_n} \mathrm{vol}^n,$$
and so $c \ge \frac{1}{\omega_n}$.


In the opposite direction, observe from Exercise 1.15.4 that given any $0 < r < 1$, one can cover the unit cube $[0,1]^n$ by at most $C_n r^{-n}$ balls of radius $r$, where $C_n$ depends only on $n$; thus
$$\mathcal{H}^n([0,1]^n) \le C_n$$
and so $c \le C_n$; in particular, $c$ is finite.
We can in fact compute $c$ explicitly (although knowing that $c$ is finite and non-zero already suffices for many applications):
Lemma 1.15.4. We have $c = \frac{1}{\omega_n}$, or in other words $\mathcal{H}^n = \frac{1}{\omega_n} \mathrm{vol}^n$. (In particular, a ball $B^n(x, r)$ has $n$-dimensional Hausdorff measure $r^n$.)

Proof. Let us consider the Hausdorff measure $\mathcal{H}^n([0,1]^n)$ of the unit cube. By definition, for any $\varepsilon > 0$ one can find an $0 < r < 1/2$ such that
$$h_{n,r}([0,1]^n) \ge \mathcal{H}^n([0,1]^n) - \varepsilon.$$
Observe (using Exercise 1.15.4) that we can find at least $c_n r^{-n}$ disjoint balls $B(x_1, r), \ldots, B(x_k, r)$ of radius $r$ inside the unit cube, where $c_n > 0$ depends only on $n$. We then observe that
$$h_{n,r}([0,1]^n) \le k r^n + \mathcal{H}^n\Big( [0,1]^n \setminus \bigcup_{i=1}^k B(x_i, r) \Big).$$
On the other hand,
$$\mathcal{H}^n\Big( [0,1]^n \setminus \bigcup_{i=1}^k B(x_i, r) \Big) = c\,\mathrm{vol}^n\Big( [0,1]^n \setminus \bigcup_{i=1}^k B(x_i, r) \Big) = c(1 - k\omega_n r^n).$$
Putting all this together, we obtain
$$c = \mathcal{H}^n([0,1]^n) \le k r^n + c(1 - k\omega_n r^n) + \varepsilon,$$
which rearranges as
$$1 - c\,\omega_n \ge -\frac{\varepsilon}{k r^n}.$$
Since $k r^n$ is bounded below by $c_n$, we can then send $\varepsilon \to 0$ and conclude that $c \le \frac{1}{\omega_n}$; since we already showed $c \ge \frac{1}{\omega_n}$, the claim follows. □
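Lemma 1.15.4 can be made concrete using the classical formula $\omega_n = \pi^{n/2} / \Gamma(\frac{n}{2} + 1)$ for the volume of the unit ball (a standard fact that this section does not prove). The sketch below (helper name ours) tabulates $\omega_n$ together with the resulting value $\mathcal{H}^n([0,1]^n) = 1/\omega_n$ of the unit cube under the normalization used here:

```python
import math

def unit_ball_volume(n):
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the Lebesgue measure of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in range(1, 6):
    omega = unit_ball_volume(n)
    # With H^n = vol^n / omega_n (Lemma 1.15.4), the unit cube [0,1]^n has
    # H^n-measure 1/omega_n, while every ball B^n(x, r) has H^n-measure r^n.
    print(n, round(omega, 6), round(1 / omega, 6))
```

For example $\omega_1 = 2$, so in this normalization a segment of length $L$ has $\mathcal{H}^1$-measure $L/2$.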

Thus $n$-dimensional Hausdorff measure is an explicit constant multiple of $n$-dimensional Lebesgue measure. The same argument shows that for integers $0 < d < n$, the restriction of $d$-dimensional Hausdorff measure to any $d$-dimensional linear subspace (or affine subspace) $V$ is equal to the constant $\frac{1}{\omega_d}$ times $d$-dimensional Lebesgue measure on $V$. (This shows, by the way, that $\mathcal{H}^d$ is not a $\sigma$-finite measure on $\mathbb{R}^n$ in general, since one can partition $\mathbb{R}^n$ into uncountably many $d$-dimensional affine subspaces. In particular, it is not a Radon measure in general.)


One can then compute $d$-dimensional Hausdorff measure for sets other than subsets of $d$-dimensional affine subspaces by changes of variable. For instance:
Exercise 1.15.12. Let $0 \le d \le n$ be an integer, let $\Omega$ be an open subset of $\mathbb{R}^d$, and let $\phi : \Omega \to \mathbb{R}^n$ be a smooth injective map which is non-degenerate in the sense that the derivative $D\phi$ (which is a $d \times n$ matrix) has full rank at every point of $\Omega$. For any compact subset $E$ of $\Omega$, establish the formula
$$\mathcal{H}^d(\phi(E)) = \int_E J \, d\mathcal{H}^d = \frac{1}{\omega_d} \int_E J \, d\mathrm{vol}^d,$$
where the Jacobian $J$ is the square root of the sum of the squares of all the $d \times d$ minors of the $d \times n$ matrix $D\phi$. (Hint: By working locally, one can assume that $\phi$ is the graph of some map from $\Omega$ to $\mathbb{R}^{n-d}$, and so can be inverted by the projection function; by working even more locally, one can assume that the Jacobian is within an epsilon of being constant. The image of a small ball in $\Omega$ then resembles a small ellipsoid in $\phi(\Omega)$, and conversely the projection of a small ball in $\phi(\Omega)$ is a small ellipsoid in $\Omega$. Use some linear algebra and several variable calculus to relate the content of these ellipsoids to the radius of the ball.) It is possible to extend this formula to Lipschitz maps $\phi : \Omega \to \mathbb{R}^n$ that are not necessarily injective, leading to the area formula
$$\int_{\phi(E)} \#(\phi^{-1}(y) \cap E) \, d\mathcal{H}^d(y) = \frac{1}{\omega_d} \int_E J \, d\mathrm{vol}^d$$
for such maps, but we will not prove this formula here.
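The Jacobian $J$ can be computed straight from its definition. By the Cauchy-Binet formula (a standard identity, not stated in the text), the sum of the squares of the $d \times d$ minors of the $d \times n$ matrix $D\phi$ equals the Gram determinant $\det(D\phi\, D\phi^T)$. The pure-Python sketch below (helper names hypothetical) checks this on a sample $2 \times 3$ matrix, as for a surface in $\mathbb{R}^3$, and on a $1 \times 2$ matrix, where $J$ reduces to the speed $|\gamma'(t)|$ of a curve:

```python
import math
from itertools import combinations, permutations

def det(m):
    """Determinant by the Leibniz formula (adequate for the tiny matrices used here)."""
    size = len(m)
    total = 0
    for perm in permutations(range(size)):
        # Sign of the permutation is (-1)^(number of inversions).
        inversions = sum(1 for i in range(size) for j in range(i + 1, size) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total

def minor_square_sum(a):
    """Sum of the squares of all d x d minors of the d x n matrix a (d <= n)."""
    d, n = len(a), len(a[0])
    return sum(det([[row[c] for c in cols] for row in a]) ** 2
               for cols in combinations(range(n), d))

def gram_det(a):
    """det(a a^T); equal to minor_square_sum(a) by the Cauchy-Binet formula."""
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]
    return det(gram)

def jacobian(a):
    """J = sqrt(sum of squared d x d minors), as in Exercise 1.15.12."""
    return math.sqrt(minor_square_sum(a))

A = [[1, 2, 3],
     [4, 5, 6]]                           # a sample 2 x 3 derivative matrix
print(minor_square_sum(A), gram_det(A))   # 54 54
print(jacobian([[3, 4]]))                 # 5.0: for a curve, J = |gamma'(t)|
```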

From this exercise we see that $d$-dimensional Hausdorff measure does coincide to a large extent with the $d$-dimensional notion of surface area, up to the normalizing factor $\frac{1}{\omega_d}$; for instance, for a simple smooth curve $\gamma : [a,b] \to \mathbb{R}^n$ with everywhere non-vanishing derivative, the $\mathcal{H}^1$ measure of $\gamma([a,b])$ is equal to $\frac{1}{\omega_1}|\gamma| = \frac{1}{2}|\gamma|$, where $|\gamma| = \int_a^b |\gamma'(t)|\,dt$ is its classical length. One can also handle a certain amount of singularity (e.g., piecewise smooth non-degenerate curves rather than everywhere smooth non-degenerate curves) by exploiting the countable additivity of $\mathcal{H}^1$ measure, or by using the area formula alluded to earlier.
Now we see how the Hausdorff measure varies in $d$.
Exercise 1.15.13. Let $0 \le d < d'$, and let $E \subset \mathbb{R}^n$ be a Borel set. Show that if $\mathcal{H}^d(E)$ is finite, then $\mathcal{H}^{d'}(E)$ is zero; equivalently, if $\mathcal{H}^{d'}(E)$ is positive, then $\mathcal{H}^d(E)$ is infinite.
Example 1.15.5. Let $0 \le d \le n$ be integers. The unit ball $B^d(0,1) \subset \mathbb{R}^d \subset \mathbb{R}^n$ has a $d$-dimensional Hausdorff measure of 1 (by Lemma 1.15.4), and so it has zero $d'$-dimensional Hausdorff measure for $d' > d$ and infinite $d'$-dimensional Hausdorff measure for $d' < d$.


On the other hand, we know from Exercise 1.15.11 that $\mathcal{H}^0(E)$ is positive for any non-empty set $E$, and from Exercise 1.15.9 that $\mathcal{H}^d(E) = 0$ for every $d > n$. We conclude (from the least upper bound property of the reals) that for any non-empty Borel set $E \subset \mathbb{R}^n$, there exists a unique number in $[0, n]$, called the Hausdorff dimension $\dim_H(E)$ of $E$, such that $\mathcal{H}^d(E) = 0$ for all $d > \dim_H(E)$ and $\mathcal{H}^d(E) = \infty$ for all $d < \dim_H(E)$. Note that at the critical dimension $d = \dim_H(E)$ itself, we allow $\mathcal{H}^d(E)$ to be zero, finite and positive, or infinite, and we shall shortly see that in fact all three possibilities can occur. By convention, we give the empty set a Hausdorff dimension of $-\infty$. One can also assign Hausdorff dimension to non-Borel sets, but we shall not do so to avoid some (very minor) technicalities.
Example 1.15.6. The unit ball $B^d(0,1) \subset \mathbb{R}^d \subset \mathbb{R}^n$ has Hausdorff dimension $d$, as does $\mathbb{R}^d$ itself. Note that the former set has finite $d$-dimensional Hausdorff measure, while the latter has an infinite measure. More generally, any $d$-dimensional smooth manifold in $\mathbb{R}^n$ has Hausdorff dimension $d$.
Exercise 1.15.14. Show that the graph $\{(x, \sin\frac{1}{x}) : 0 < x < 1\}$ has Hausdorff dimension 1; compare this with Exercise 1.15.6.

It is clear that Hausdorff dimension is monotone: if $E \subset F$ are Borel sets, then $\dim_H(E) \le \dim_H(F)$. Since Hausdorff measure is countably additive, it is also not hard to see that Hausdorff dimension interacts well with countable unions:
$$\dim_H\Big( \bigcup_{i=1}^\infty E_i \Big) = \sup_{1 \le i < \infty} \dim_H(E_i).$$
Thus for instance the rationals, being a countable union of 0-dimensional points, have Hausdorff dimension 0, in contrast to their Minkowski dimension of 1. On the other hand, we at least have an inequality between Hausdorff and Minkowski dimension:
Exercise 1.15.15. For any Borel set $E \subset \mathbb{R}^n$, show that $\dim_H(E) \le \underline{\dim}_M(E) \le \overline{\dim}_M(E)$. (Hint: Use (1.129). Which of the choices of $*$ is most convenient to use here?)

It is instructive to compare Hausdorff dimension and Minkowski dimension as follows.
Exercise 1.15.16. Let $E$ be a bounded Borel subset of $\mathbb{R}^n$, and let $d \ge 0$.
• Show that $\underline{\dim}_M(E) \le d$ if and only if, for every $\varepsilon > 0$ and arbitrarily small $r > 0$, one can cover $E$ by finitely many balls $B(x_1, r_1), \ldots, B(x_k, r_k)$ of radii $r_i = r$ such that $\sum_{i=1}^k r_i^{d+\varepsilon} \le \varepsilon$.
• Show that $\overline{\dim}_M(E) \le d$ if and only if, for every $\varepsilon > 0$ and all sufficiently small $r > 0$, one can cover $E$ by finitely many balls $B(x_1, r_1), \ldots, B(x_k, r_k)$ of radii $r_i = r$ such that $\sum_{i=1}^k r_i^{d+\varepsilon} \le \varepsilon$.
• Show that $\dim_H(E) \le d$ if and only if, for every $\varepsilon > 0$ and $r > 0$, one can cover $E$ by countably many balls $B(x_1, r_1), B(x_2, r_2), \ldots$ of radii $r_i \le r$ such that $\sum_{i=1}^\infty r_i^{d+\varepsilon} \le \varepsilon$.

The previous two exercises give ways to upper-bound the Hausdorff dimension; for instance, we see from Exercise 1.15.2 that self-similar fractals $E$ of the type in that exercise (i.e., $E$ is a union of $k$ translates of $r \cdot E$) have Hausdorff dimension at most $\frac{\log k}{\log 1/r}$. To lower-bound the Hausdorff dimension of a set $E$, one convenient way is to find a measure with a certain dimension property (analogous to (1.125)) that assigns a positive mass to $E$:
Exercise 1.15.17. Let $d \ge 0$. A Borel measure $\mu$ on $\mathbb{R}^n$ is said to be a Frostman measure of dimension at most $d$ if it is compactly supported and there exists a constant $C$ such that $\mu(B(x,r)) \le C r^d$ for all balls $B(x,r)$ of radius $0 < r < 1$. Show that if $\mu$ has dimension at most $d$, then any Borel set $E$ with $\mu(E) > 0$ has positive $d$-dimensional Hausdorff content; in particular, $\dim_H(E) \ge d$.

Note that this gives an alternate way to justify the fact that smooth
d-dimensional manifolds have Hausdorff dimension d, since on the one hand
they have Minkowski dimension d, and on the other hand they support a
non-trivial d-dimensional measure, namely Lebesgue measure.
Exercise 1.15.18. Show that the Cantor set in Exercise 1.15.1(i) has Hausdorff dimension 1/2. More generally, establish the analogue of the first part of Exercise 1.15.2 for Hausdorff measure.
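For the Cantor set $C$ of Exercise 1.15.1(i), a natural candidate for the measure in Exercise 1.15.17 gives each of the $2^m$ level-$m$ intervals (of length $4^{-m}$) mass $2^{-m}$, so that mass is the square root of length. The sketch below (our own illustration; helper names hypothetical) computes an upper estimate for the mass of a closed ball by recursing down the construction, and checks the Frostman bound $\mu(B(x,r)) \le 2r^{1/2}$ at the scales $r = 4^{-k}$ over a grid of centres. The constant 2 reflects the fact that the gaps between level-$k$ intervals have length at least $2 \cdot 4^{-k}$, so such a ball meets at most two of them:

```python
import math

def cantor_mass_upper(a, b, depth, lo=0.0, length=1.0, mass=1.0):
    """Upper estimate for mu([a, b]) for the natural probability measure mu on C.

    The node [lo, lo + length] is a construction interval carrying `mass`; its two
    children are its first and last quarters, each with half the mass.  Nodes still
    only partially overlapping [a, b] at the given depth are counted in full."""
    hi = lo + length
    if hi < a or lo > b:                           # disjoint from [a, b]
        return 0.0
    if (a <= lo and hi <= b) or depth == 0:        # inside, or depth exhausted
        return mass
    quarter = length / 4
    return (cantor_mass_upper(a, b, depth - 1, lo, quarter, mass / 2)
            + cantor_mass_upper(a, b, depth - 1, lo + 3 * quarter, quarter, mass / 2))

def worst_frostman_ratio(max_level=6, depth=10, grid=256):
    """Largest value of mu(B(x, r)) / r^(1/2) over r = 4^-k and centres x = j / grid."""
    worst = 0.0
    for k in range(1, max_level + 1):
        r = 4.0 ** -k
        for j in range(grid + 1):
            x = j / grid
            worst = max(worst, cantor_mass_upper(x - r, x + r, depth) / math.sqrt(r))
    return worst

print(worst_frostman_ratio())   # lies between 1 and 2
```

All the numbers involved are dyadic rationals, so the floating-point arithmetic here is exact. Combined with Exercise 1.15.17, a bound of this type at all scales is what yields the lower bound $\dim_H(C) \ge 1/2$.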
Exercise 1.15.19. Construct a subset of $\mathbb{R}$ of Hausdorff dimension 1 that has zero Lebesgue measure. (Hint: A modified Cantor set, vaguely reminiscent of Exercise 1.15.1(ii), can work here.)

A useful fact is that Exercise 1.15.17 can be reversed:


Lemma 1.15.7 (Frostman's lemma). Let $d \ge 0$, and let $E \subset \mathbb{R}^n$ be a compact set with $\mathcal{H}^d(E) > 0$. Then there exists a non-trivial Frostman measure $\mu$ of dimension at most $d$ (in the sense of Exercise 1.15.17) supported on $E$ (thus $\mu(E) > 0$ and $\mu(\mathbb{R}^n \setminus E) = 0$).

Proof. Without loss of generality we may place the compact set $E$ in the half-open unit cube $[0,1)^n$. It is convenient to work dyadically. For each integer $k \ge 0$, we subdivide $[0,1)^n$ into $2^{kn}$ half-open cubes $Q_{k,1}, \ldots, Q_{k,2^{kn}}$ of side length $\ell(Q_{k,i}) = 2^{-k}$ in the usual manner, and refer to such cubes as dyadic cubes. For each $k$ and any $F \subset [0,1)^n$, we can define the dyadic Hausdorff content $h^\Delta_{d,2^{-k}}(F)$ to be the quantity
$$h^\Delta_{d,2^{-k}}(F) := \inf\Big\{ \sum_j \ell(Q_{k_j,i_j})^d : Q_{k_j,i_j} \text{ cover } F;\ k_j \ge k \Big\},$$
where the $Q_{k_j,i_j}$ range over all at most countable families of dyadic cubes of side length at most $2^{-k}$ that cover $F$. By covering cubes by balls and vice versa, it is not hard to see that
$$c\,h_{d,C2^{-k}}(F) \le h^\Delta_{d,2^{-k}}(F) \le C\,h_{d,c2^{-k}}(F)$$
for some constants $c, C > 0$ depending only on $d, n$. Thus, if we define the dyadic Hausdorff measure
$$(\mathcal{H}^d)^\Delta(F) := \lim_{k \to \infty} h^\Delta_{d,2^{-k}}(F),$$
then we see that the dyadic and non-dyadic Hausdorff measures are comparable:
$$c\,\mathcal{H}^d(F) \le (\mathcal{H}^d)^\Delta(F) \le C\,\mathcal{H}^d(F).$$
In particular, $(\mathcal{H}^d)^\Delta(E)$ is strictly positive.
Given any dyadic cube $Q$ of length $\ell(Q) = 2^{-k}$, define the upper Frostman content $\mu^+(Q)$ to be the quantity
$$\mu^+(Q) := h^\Delta_{d,2^{-k}}(E \cap Q).$$
Then $\sigma := \mu^+([0,1)^n) = h^\Delta_{d,1}(E)$ is strictly positive: if it vanished, then for every $\varepsilon > 0$ one could cover $E$ by dyadic cubes with $\sum \ell(Q)^d < \varepsilon$, all of which would then have side length less than $\varepsilon^{1/d}$ (when $d > 0$; the case $d = 0$ is trivial), forcing $(\mathcal{H}^d)^\Delta(E) = 0$. By covering $E \cap Q$ by $Q$, we also have the bound
$$\mu^+(Q) \le \ell(Q)^d.$$
Finally, by the subadditivity property of Hausdorff content, if we decompose $Q$ into $2^n$ cubes $Q'$ of side length $\ell(Q') = 2^{-k-1}$, we have
$$\mu^+(Q) \le \sum_{Q'} \mu^+(Q').$$

The quantity $\mu^+$ behaves like a measure, but is subadditive rather than additive. Nevertheless, one can easily find another quantity $\mu(Q)$ to assign to each dyadic cube such that
$$\mu([0,1)^n) = \mu^+([0,1)^n)$$
and
$$\mu(Q) \le \mu^+(Q)$$
for all dyadic cubes, and such that
$$\mu(Q) = \sum_{Q'} \mu(Q')$$
whenever a dyadic cube is decomposed into $2^n$ subcubes of half the side length. Indeed, such a $\mu$ can be constructed by a greedy algorithm starting at the largest cube $[0,1)^n$ and working downward; we omit the details. One can then use this measure $\mu$ to integrate any continuous compactly supported function on $\mathbb{R}^n$ (by approximating such a function by one which is constant on dyadic cubes of a certain scale), and so by the Riesz representation theorem, it extends to a Radon measure $\mu$ supported on $[0,1]^n$. (One could also have used the Carathéodory extension theorem at this point.) Since $\mu([0,1)^n) \ge \sigma > 0$, $\mu$ is non-trivial; since $\mu(Q) \le \mu^+(Q) \le \ell(Q)^d$ for all dyadic cubes $Q$, it is not hard to see that $\mu$ is a Frostman measure of dimension at most $d$; and since $\mu(Q) \le \mu^+(Q) = 0$ whenever $Q$ is disjoint from $E$, it is similarly not hard to see that $\mu$ is supported on the compact set $E$, as desired. □
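One possible way to fill in the omitted greedy step is proportional splitting: give each child $Q'$ of $Q$ the share $\mu(Q)\,\mu^+(Q') / \sum_{Q''} \mu^+(Q'')$, which never exceeds $\mu^+(Q')$ because $\mu(Q) \le \mu^+(Q) \le \sum_{Q''} \mu^+(Q'')$. The sketch below is our own choice of splitting rule (not necessarily the author's), works with the dyadic intervals of $[0,1)$ (so $n = 1$ and each cube has 2 children), and uses a synthetic $\mu^+$ built to be subadditive with $\mu^+(Q) \le \ell(Q)^d$; all names are hypothetical:

```python
import random

def build_upper_content(depth, d, seed=0):
    """Synthetic stand-in for mu^+ on the dyadic intervals of [0, 1) down to `depth`.

    Node (k, i) is [i 2^-k, (i+1) 2^-k).  Leaves get values in [0, l^d]; each parent
    gets min(l^d, sum of its two children), so mu^+(Q) <= l(Q)^d and mu^+ is subadditive."""
    rng = random.Random(seed)
    upper = {}
    for i in range(2 ** depth):
        upper[(depth, i)] = rng.choice([0.0, 1.0, rng.random()]) * 2.0 ** (-depth * d)
    for k in range(depth - 1, -1, -1):
        for i in range(2 ** k):
            children = upper[(k + 1, 2 * i)] + upper[(k + 1, 2 * i + 1)]
            upper[(k, i)] = min(2.0 ** (-k * d), children)
    return upper

def split_downward(upper, depth):
    """Top-down construction: mu(root) = mu^+(root), and each node's mass is shared
    among its children in proportion to their mu^+ values (0 if those all vanish)."""
    mu = {(0, 0): upper[(0, 0)]}
    for k in range(depth):
        for i in range(2 ** k):
            kids = [(k + 1, 2 * i), (k + 1, 2 * i + 1)]
            total = sum(upper[c] for c in kids)
            for c in kids:
                mu[c] = mu[(k, i)] * upper[c] / total if total > 0 else 0.0
    return mu

depth, d = 10, 0.5
upper = build_upper_content(depth, d)
mu = split_downward(upper, depth)
print(mu[(0, 0)], sum(mu[(depth, i)] for i in range(2 ** depth)))   # equal up to rounding
```

Proportional splitting is just one convenient rule; any split that respects the caps $\mu(Q') \le \mu^+(Q')$ and conserves mass would serve equally well.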

The study of Hausdorff dimension is then intimately tied to the study of the dimensional properties of various measures. We give some examples in the next few exercises.

Exercise 1.15.20. Let $0 < d \le n$, and let $E \subset \mathbb{R}^n$ be a compact set. Show that $\dim_H(E) \ge d$ if and only if, for every $0 < \varepsilon < d$, there exists a Borel probability measure $\mu$ supported on $E$ with
$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \frac{1}{|x - y|^{d-\varepsilon}} \, d\mu(x) \, d\mu(y) < \infty.$$
Show that this condition is also equivalent to $\mu$ lying in the Sobolev space $H^{-(n-d+\varepsilon)/2}(\mathbb{R}^n)$. Thus we see a link here between Hausdorff dimension and Sobolev norms: the lower the dimension of a set, the rougher the measures that it can support, where the Sobolev scale is used to measure roughness.

Exercise 1.15.21. Let $E$ be a compact subset of $\mathbb{R}^n$, and let $\mu$ be a Borel probability measure supported on $E$. Let $0 \le d \le n$.
• Suppose that for every $\varepsilon > 0$, every $0 < \delta < 1/10$, and every subset $E'$ of $E$ with $\mu(E') \ge \frac{1}{\log^2(1/\delta)}$, one could establish the bound $N^*_\delta(E') \ge c_\varepsilon (\frac{1}{\delta})^{d-\varepsilon}$ for $*$ equal to any of ext, int, net, pack (the exact choice of $*$ is irrelevant thanks to Exercise 1.15.4). Show that $E$ has Hausdorff dimension at least $d$. (Hint: Cover $E$ by small balls, then round the radius of each ball to the nearest power of 2. Now use countable additivity and the observation that the sum $\sum_\delta \frac{1}{\log^2(1/\delta)}$ is small when $\delta$ ranges over sufficiently small powers of 2.)
• Show that one can replace $\mu(E') \ge \frac{1}{\log^2(1/\delta)}$ with $\mu(E') \ge \frac{1}{(\log\log(1/\delta))^2}$ in the previous statement. (Hint: Instead of rounding the radius to the nearest power of 2, round instead to radii of the form $1/2^{2^{\varepsilon m}}$ for integers $m$.) This trick of using a hyper-dyadic range of scales rather than a dyadic range of scales is due to Bourgain [Bo1999]. The exponent 2 in the double logarithm can be replaced by any other exponent strictly greater than 1.
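The summability behind the two hints can be made explicit (a supplementary computation, with natural logarithms). Over the dyadic scales $\delta = 2^{-j}$, $j \ge J$, one has

```latex
\sum_{j \ge J} \frac{1}{\log^2(2^{j})}
  = \frac{1}{(\log 2)^2} \sum_{j \ge J} \frac{1}{j^2}
  \le \frac{1}{(\log 2)^2 \, (J-1)} \qquad (J \ge 2),
```

while over the hyper-dyadic scales $\delta_m = 2^{-2^{\varepsilon m}}$ one has $\log\log(1/\delta_m) = \varepsilon m \log 2 + \log\log 2$, so

```latex
\sum_{m \ge M} \frac{1}{(\log\log(1/\delta_m))^2}
  = \sum_{m \ge M} \frac{1}{(\varepsilon m \log 2 + \log\log 2)^2}
  = O\Big( \frac{1}{\varepsilon^2 M} \Big) \qquad (M \to \infty).
```

Both tails can be made as small as desired by taking $J$ or $M$ large; replacing the exponent 2 by any $p > 1$ keeps both series convergent.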


This should be compared with the task of lower-bounding the lower Minkowski dimension, which only requires control on the entropy of $E$ itself, rather than of large subsets $E'$ of $E$. The results of this exercise are exploited to establish lower bounds on the Hausdorff dimension of Kakeya sets (and in particular, to conclude such bounds from the Kakeya maximal function conjecture).
Exercise 1.15.22. Let $E \subset \mathbb{R}^n$ be a Borel set, and let $\phi: E \to \mathbb{R}^m$ be a locally Lipschitz map. Show that $\dim_H(\phi(E)) \le \dim_H(E)$, and that if $E$ has zero $d$-dimensional Hausdorff measure then so does $\phi(E)$.
Exercise 1.15.23. Let $\phi: \mathbb{R}^n \to \mathbb{R}$ be a smooth function, and let $g: \mathbb{R}^n \to \mathbb{R}$ be a test function such that $|\nabla\phi| > 0$ on the support of $g$. Establish the co-area formula
$$\int_{\mathbb{R}^n} g(x)\,|\nabla\phi(x)|\,dx = \int_{\mathbb{R}} \Big( \int_{\phi^{-1}(t)} g(x)\,d\mathcal{H}^{n-1}(x) \Big)\,dt. \tag{1.130}$$
(Hint: Subdivide $g$ so that its support is small, and then apply a change of variables to make $\phi$ linear, e.g., $\phi(x) = x_1$.) This formula is in fact valid for all absolutely integrable $g$ and Lipschitz $\phi$, but is difficult to prove at this level of generality, requiring a version of Sard’s theorem.
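As a sanity check of (1.130), here is a minimal numerical sketch in $\mathbb{R}^2$ with an example of our own choosing: the Lipschitz function $\phi(x,y) = \sqrt{x^2/4 + y^2}$, whose level sets $\phi^{-1}(t)$ for $t > 0$ are ellipses with semi-axes $2t$ and $t$, and the weight $g(x,y) = e^{-(x^2+y^2)}$. Since $\phi$ is not smooth at the origin and $g$ is not compactly supported, this example relies on the more general version of the formula mentioned above. The left side is computed in polar coordinates, using $\nabla\phi = (x/4, y)/\phi$, and the right side by parametrising each ellipse, so the two sides are evaluated independently. The grid sizes and cutoffs are arbitrary assumptions.

```python
import math

# Assumed example (not from the text): phi(x, y) = sqrt(x^2/4 + y^2), g(x, y) = exp(-(x^2 + y^2)).

def coarea_lhs(nr=1000, nth=200, R=6.0):
    """Midpoint rule for  int_{R^2} g |grad phi| dx  in polar coordinates."""
    dr, dth = R / nr, 2 * math.pi / nth
    total = 0.0
    for j in range(nth):
        c, s = math.cos((j + 0.5) * dth), math.sin((j + 0.5) * dth)
        # grad phi = (x/4, y) / phi, so |grad phi| depends only on the angle.
        grad = math.sqrt(c * c / 16 + s * s) / math.sqrt(c * c / 4 + s * s)
        for i in range(nr):
            r = (i + 0.5) * dr
            total += math.exp(-r * r) * grad * r   # the factor r is the polar Jacobian
    return total * dr * dth

def coarea_rhs(nt=1000, nps=200, T=6.0):
    """Midpoint rule for  int_0^T ( int_{phi = t} g dH^1 ) dt, where the level set
    phi = t is the ellipse (2t cos p, t sin p), 0 <= p < 2 pi."""
    dt, dps = T / nt, 2 * math.pi / nps
    total = 0.0
    for j in range(nps):
        c, s = math.cos((j + 0.5) * dps), math.sin((j + 0.5) * dps)
        speed = math.sqrt(4 * s * s + c * c)   # arclength element is t * speed * dp
        q = 4 * c * c + s * s                  # x^2 + y^2 = t^2 * q on the ellipse
        for i in range(nt):
            t = (i + 0.5) * dt
            total += math.exp(-t * t * q) * t * speed
    return total * dt * dps

lhs, rhs = coarea_lhs(), coarea_rhs()
print(lhs, rhs)   # the two sides agree up to discretisation error
```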

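The co-area formula (1.130) can be used to link geometric inequalities to analytic ones. For instance, the sharp isoperimetric inequality
$$\mathrm{vol}_n(\Omega)^{\frac{n-1}{n}} \le \frac{1}{n\omega_n^{1/n}}\,\mathcal{H}^{n-1}(\partial\Omega),$$
valid for bounded open sets $\Omega$ in $\mathbb{R}^n$ (here $\omega_n$ is the volume of the unit ball in $\mathbb{R}^n$), can be combined with the co-area formula (with $g := 1$) to give the sharp Sobolev inequality
$$\|\phi\|_{L^{\frac{n}{n-1}}(\mathbb{R}^n)} \le \frac{1}{n\omega_n^{1/n}} \int_{\mathbb{R}^n} |\nabla\phi(x)|\,dx$$
for any test function $\phi$, the main point being that $\phi^{-1}(t) \cup \phi^{-1}(-t)$ is the boundary of $\{|\phi| \ge t\}$ (one also needs to do some manipulations relating the volume of those level sets to $\|\phi\|_{L^{\frac{n}{n-1}}(\mathbb{R}^n)}$). We omit the details.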
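The constant in the isoperimetric inequality is attained by balls: a ball of radius $r$ has volume $\omega_n r^n$ and surface measure $n\omega_n r^{n-1}$, and both sides then equal $\omega_n^{(n-1)/n} r^{n-1}$. The Python sketch below (our own check; helper names are illustrative) confirms this using $\omega_n = \pi^{n/2}/\Gamma(n/2+1)$, and shows that the inequality is strict for the unit cube.

```python
import math

def unit_ball_volume(n):
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def isoperimetric_ratio(volume, area, n):
    """vol^((n-1)/n) divided by area / (n * omega_n^(1/n)); the inequality says this is <= 1."""
    return volume ** ((n - 1) / n) * n * unit_ball_volume(n) ** (1 / n) / area

for n in range(2, 6):
    w, r = unit_ball_volume(n), 3.0
    ball = isoperimetric_ratio(w * r ** n, n * w * r ** (n - 1), n)  # equality case
    cube = isoperimetric_ratio(1.0, 2.0 * n, n)                      # unit cube: 2n unit faces
    print(n, ball, cube)
```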

Notes. This lecture first appeared at terrytao.wordpress.com/2009/05/19.
Thanks to Vicky for corrections.
Further discussion of Hausdorff dimension can be found in [Fa2003],
[Ma1995], [Wo2003], as well as in many other places.
There was some interesting discussion online as to whether there could
be an analogue of K-theory for Hausdorff dimension, although the results of
the discussion were inconclusive.

Chapter 2

Related articles

Section 2.1

An alternate approach to the Carathéodory extension theorem

In this section, I would like to give an alternate proof of (a weak form of)
the Carathéodory extension theorem (Theorem 1.1.17). This argument is
restricted to the σ-finite case and does not extend the measure to quite
as large a σ-algebra as is provided by the standard proof of this theorem.
But I find it conceptually clearer (in particular, hewing quite closely to
Littlewood’s principles, and the general Lebesgue philosophy of treating sets
of small measure as negligible), and it suffices for many standard applications
of this theorem, in particular the construction of Lebesgue measure.
Let us first give the precise statement of the theorem:

Theorem 2.1.1 (Weak Carathéodory extension theorem). Let $\mathcal{A}$ be a Boolean algebra of subsets of a set $X$, and let $\mu: \mathcal{A} \to [0,+\infty]$ be a function obeying the following three properties:
(i) $\mu(\emptyset) = 0$.
(ii) Pre-countable additivity. If $A_1, A_2, \ldots \in \mathcal{A}$ are disjoint and such that $\bigcup_{n=1}^\infty A_n$ also lies in $\mathcal{A}$, then $\mu(\bigcup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n)$.
(iii) σ-finiteness. $X$ can be covered by at most countably many sets in $\mathcal{A}$, each of which has finite $\mu$-measure.
Let $\mathcal{X}$ be the σ-algebra generated by $\mathcal{A}$. Then $\mu$ can be uniquely extended to a countably additive measure on $\mathcal{X}$.


We will refer to sets in $\mathcal{A}$ as elementary sets and sets in $\mathcal{X}$ as measurable sets. A typical example is when $X = [0,1]$ and $\mathcal{A}$ is the collection of all sets that are unions of finitely many intervals; in this case, $\mathcal{X}$ consists of the Borel-measurable sets.

2.1.1. Some basics. Let us first observe that the hypotheses on the pre-measure $\mu$ imply some other basic and useful properties:
From properties (i) and (ii) (applying (ii) with all but finitely many of the $A_n$ empty) we see that $\mu$ is finitely additive (thus $\mu(A_1 \cup \cdots \cup A_n) = \mu(A_1) + \cdots + \mu(A_n)$ whenever $A_1, \ldots, A_n$ are disjoint elementary sets).
As particular consequences of finite additivity, we have monotonicity ($\mu(A) \le \mu(B)$ whenever $A \subset B$ are elementary sets) and finite subadditivity ($\mu(A_1 \cup \cdots \cup A_n) \le \mu(A_1) + \cdots + \mu(A_n)$ for all elementary $A_1, \ldots, A_n$, not necessarily disjoint).

We also have precountable subadditivity: $\mu(A) \le \sum_{n=1}^\infty \mu(A_n)$ whenever the elementary sets $A_1, A_2, \ldots$ cover the elementary set $A$. To see this, first observe, by replacing $A_n$ with $A_n \setminus \bigcup_{i=1}^{n-1} A_i$ and using monotonicity, that we may take the $A_i$ to be disjoint. Next, by restricting all the $A_i$ to $A$ and using monotonicity, we may assume that $A$ is the union of the $A_i$. Now the claim is immediate from precountable additivity.
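The disjointification step $A_n \mapsto A_n \setminus \bigcup_{i<n} A_i$ can be seen on a toy example: $X = [0,1)$, with elementary sets the finite unions of half-open intervals and $\mu$ the length (an assumption of this sketch, not a construction from the text). The code below, with exact rational arithmetic and helper names of our own, turns a finite overlapping cover of $A = [0,1)$ into disjoint pieces whose lengths add up to $\mu(A)$, which is at most $\sum_n \mu(A_n)$.

```python
from fractions import Fraction as F

def cells(*sets):
    """Cells [p, q) between consecutive endpoints of all the given interval lists."""
    pts = sorted({p for s in sets for iv in s for p in iv})
    return list(zip(pts, pts[1:]))

def contains(s, cell):
    """A cell never straddles an endpoint of s, so it lies inside s or is disjoint from it."""
    a, b = cell
    return any(lo <= a and b <= hi for lo, hi in s)

def measure(s):
    """Total length of a finite union of half-open intervals (overlaps counted once)."""
    return sum((b - a for a, b in cells(s) if contains(s, (a, b))), F(0))

def disjointify(cover):
    """Replace A_n by A_n minus (A_1 u ... u A_{n-1}); each piece is a list of cells."""
    pieces = []
    for n, s in enumerate(cover):
        earlier = cover[:n]
        pieces.append([c for c in cells(s, *earlier)
                       if contains(s, c) and not any(contains(e, c) for e in earlier)])
    return pieces

A = [(F(0), F(1))]
cover = [[(F(0), F(1, 2))], [(F(1, 4), F(3, 4))], [(F(1, 2), F(1))], [(F(0), F(1))]]
pieces = disjointify(cover)
print([measure(p) for p in pieces])                  # lengths of the disjoint pieces
print(measure(A), sum(measure(s) for s in cover))    # mu(A) <= sum of mu(A_n)
```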

2.1.2. Existence. Let us first verify existence. As is standard in measure-theoretic proofs for σ-finite spaces, we first handle the finite case (when $\mu(X) < \infty$), and then rely on countable additivity or subadditivity to recover the σ-finite case.
The basic idea, following Littlewood’s principles, is to view the measurable sets as lying in the completion of the elementary sets, or in other words to exploit the fact that measurable sets can be approximated to arbitrarily high accuracy by elementary sets.
Define the outer measure $\mu^*(A)$ of a set $A \subset X$ to be the infimum of $\sum_{n=1}^\infty \mu(A_n)$, where $A_1, A_2, \ldots$ range over all at most countable collections of elementary sets that cover $A$. It is clear that the outer measure is monotone and countably subadditive. Also, since $\mu$ is precountably subadditive, we see that $\mu^*(A) \ge \mu(A)$ for all elementary $A$. Since we also have the trivial inequality $\mu^*(A) \le \mu(A)$ (cover $A$ by itself), we conclude that $\mu^*$ and $\mu$ agree on elementary sets.
The outer measure naturally defines a pseudometric¹ (and thus a topology) on the space of subsets of $X$, with the distance between $A$ and $B$ being defined as $\mu^*(A \Delta B)$, where $\Delta$ denotes symmetric difference. (The subadditivity of $\mu^*$ ensures the triangle inequality, since $A \Delta C \subset (A \Delta B) \cup (B \Delta C)$; furthermore, we see that the Boolean operations (union, intersection, complement, etc.) are all continuous with respect to this pseudometric.) With this pseudometric, we claim that the measurable sets lie in the closure of the elementary sets. Indeed, it is not difficult to see (using subadditivity and monotonicity properties of $\mu^*$) that the closure of the elementary sets is closed under finite unions, under complements, and under countable disjoint unions (here we need finiteness of $\mu(X)$ to keep the measure of all the pieces absolutely summable), and thus forms a σ-algebra. Since this σ-algebra clearly contains the elementary sets, it must contain the measurable sets also.

¹A pseudometric is a metric in which distinct objects are allowed to be separated by a zero distance.
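To make the approximation concrete, the following sketch (our own toy example, with illustrative helper names) works in $X = [0,1)$, with elementary sets the finite unions of half-open intervals and $\mu$ the length. The measurable set $A = \bigcup_{k \ge 1} [2^{-k}, 2^{-k} + 4^{-k})$ has infinitely many components and so is not elementary, but its truncations $A_K = \bigcup_{k \le K} [2^{-k}, 2^{-k} + 4^{-k})$ are, and $\mu^*(A \Delta A_K) = \sum_{k > K} 4^{-k} = 4^{-K}/3$. The code computes the exact distances between truncations with rational arithmetic, using $A_{16}$ as a stand-in for $A$.

```python
from fractions import Fraction as F

def cells(*sets):
    """Cells [p, q) between consecutive endpoints of all the given interval lists."""
    pts = sorted({p for s in sets for iv in s for p in iv})
    return list(zip(pts, pts[1:]))

def contains(s, cell):
    """A cell never straddles an endpoint of s, so it lies inside s or is disjoint from it."""
    a, b = cell
    return any(lo <= a and b <= hi for lo, hi in s)

def dist(s, t):
    """d(S, T) = mu(S symmetric-difference T) for finite unions of half-open intervals."""
    return sum((b - a for a, b in cells(s, t) if contains(s, (a, b)) != contains(t, (a, b))), F(0))

def truncation(K):
    """The elementary set A_K, the union of [2^-k, 2^-k + 4^-k) for 1 <= k <= K."""
    return [(F(1, 2**k), F(1, 2**k) + F(1, 4**k)) for k in range(1, K + 1)]

for K in (1, 2, 4, 8):
    # A_16 stands in for the limit set A; the exact limit distance is 4^-K / 3.
    print(K, dist(truncation(K), truncation(16)), F(1, 3 * 4**K))
```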
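By subadditivity of $\mu^*$, we have $|\mu^*(A) - \mu^*(B)| \le \mu^*(A \Delta B)$, so the function $A \mapsto \mu^*(A)$ is Lipschitz continuous with respect to this pseudometric. Since this function is finitely additive on elementary sets, we see on taking limits (using subadditivity to control error terms) that it must be finitely additive on measurable sets also. Since $\mu^*$ is finitely additive, monotone, and countably subadditive on measurable sets, it must be countably additive there: for disjoint measurable $A_1, A_2, \ldots$, finite additivity and monotonicity give $\sum_{n=1}^N \mu^*(A_n) \le \mu^*(\bigcup_{n=1}^\infty A_n)$ for every $N$, and countable subadditivity gives the reverse inequality. Thus $\mu^*$ is the desired extension of $\mu$ to the measurable sets. This completes the proof of the theorem in the finite measure case.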
To handle the σ-finite case, we partition $X$ into countably many elementary sets of finite measure (this is possible by (iii), after disjointifying the cover as before) and use the above argument to extend $\mu$ to measurable subsets of each such elementary set. It is then a routine matter to sum together these localised measures to recover a measure on all measurable sets; the precountable additivity property ensures that this sum still agrees with $\mu$ on elementary sets.

2.1.3. Uniqueness. Now we verify uniqueness. Again, we begin with the finite measure case.
Suppose first that $\mu(X) < \infty$, and that we have two extensions $\mu_1, \mu_2: \mathcal{X} \to [0,+\infty]$ of $\mu$ to $\mathcal{X}$ that are countably additive. Observe that $\mu_1, \mu_2$ must both be continuous with respect to the $\mu^*$ pseudometric used in the existence argument: by countable subadditivity, $\mu_i(E) \le \mu^*(E)$ for every measurable $E$, and hence $|\mu_i(A) - \mu_i(B)| \le \mu_i(A \Delta B) \le \mu^*(A \Delta B)$. Since $\mu_1$ and $\mu_2$ agree on elementary sets, and every measurable set is a limit of elementary sets in this pseudometric, we obtain uniqueness in the finite measure case.
When instead $X$ is σ-finite, we partition $X$ into countably many elementary sets of finite measure. The previous argument shows that any two extensions $\mu_1, \mu_2$ of $\mu$ agree when restricted to each of these sets, and the claim then follows by countable additivity. This proves Theorem 2.1.1.
Remark 2.1.2. The uniqueness claim fails when the σ-finiteness condition is dropped. Consider for instance the rational numbers $X = \mathbb{Q}$, and let the elementary sets be the finite unions of intervals $[a,b) \cap \mathbb{Q}$. Define the measure $\mu(A)$ of an elementary set to be zero if $A$ is empty, and $+\infty$ otherwise. As the rationals are countable, we easily see that every set of rationals is measurable (each singleton $\{q\}$ is the intersection of the elementary sets $[q, q+1/n) \cap \mathbb{Q}$, $n \ge 1$). One easily verifies the precountable additivity condition (though the σ-finiteness condition fails horribly). However, $\mu$ has multiple extensions to the measurable sets; for instance, any positive scalar multiple of counting measure is such an extension.
Remark 2.1.3. It is not difficult to show that the measure completion $\overline{\mathcal{X}}$ of $\mathcal{X}$ with respect to $\mu$ is the same as the topological closure of $\mathcal{X}$ (or of $\mathcal{A}$) with respect to the above pseudometric. Thus, for instance, a subset of $[0,1]$ is Lebesgue measurable if and only if it can be approximated to arbitrary accuracy (with respect to outer measure) by a finite union of intervals.

A particularly simple case of Theorem 2.1.1 occurs when $X$ is a compact Hausdorff totally disconnected space (i.e., a Stone space), such as the infinite discrete cube $\{0,1\}^{\mathbb{N}}$ or any other Cantor space. In this case, the Borel σ-algebra $\mathcal{X}$ is generated by the Boolean algebra $\mathcal{A}$ of clopen sets. Also, as clopen sets here are simultaneously compact and open, we see that any infinite cover of one clopen set by others automatically has a finite subcover. In particular, a clopen set can only be the disjoint union of countably many clopen sets if all but finitely many of them are empty, so pre-countable additivity reduces to finite additivity. From this, we conclude
Corollary 2.1.4. Let $X$ be a compact Hausdorff totally disconnected space. Then any finitely additive σ-finite measure on the clopen sets uniquely extends to a countably additive measure on the Borel sets.

By identifying $\{0,1\}^{\mathbb{N}}$ with $[0,1]$ up to a countable set, this provides one means to construct Lebesgue measure on $[0,1]$; similar constructions are available for $\mathbb{R}$ or $\mathbb{R}^n$.
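For the cube $\{0,1\}^{\mathbb{N}}$, the clopen sets are exactly the finite unions of cylinder sets $[w]$ (the sequences beginning with a finite word $w$), and the fair-coin finitely additive measure gives $[w]$ the mass $2^{-|w|}$, which is the length of the dyadic interval obtained from $[w]$ by binary expansion. The sketch below is our own illustration; representing a clopen set by a list of words is an assumption of the code. It computes this measure by refining all cylinders to a common depth, which is where finite additivity $\mu([w]) = \mu([w0]) + \mu([w1])$ enters.

```python
from fractions import Fraction as F
from itertools import product

def refine(words, depth):
    """Rewrite a finite union of cylinders [w] as a set of cylinders of one common depth."""
    return {w + "".join(tail) for w in words for tail in product("01", repeat=depth - len(w))}

def coin_measure(words):
    """mu([w]) = 2^-len(w), extended finitely additively to finite unions of cylinders."""
    depth = max((len(w) for w in words), default=0)
    return F(len(refine(words, depth)), 2 ** depth)

def dyadic_interval(w):
    """Image of the cylinder [w] under binary expansion: an interval of length 2^-len(w)."""
    left = sum((F(int(bit), 2 ** (i + 1)) for i, bit in enumerate(w)), F(0))
    return left, left + F(1, 2 ** len(w))

U = ["0", "01", "11"]          # a clopen set; [01] is already contained in [0]
print(coin_measure(U))         # prints 3/4, the length of [0, 1/2] u [3/4, 1]
print([dyadic_interval(w) for w in U])
```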

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/03.
Thanks to Américo Tavares, JB, Max Menzies, and mmailliw/william for
corrections.

Section 2.2

Amenability, the ping-pong lemma, and the Banach-Tarski paradox

Notational convention: In this section (and in Section 2.4) only, we will colour a statement red if it assumes the axiom of choice. (For the rest of this text, the axiom of choice will be implicitly assumed throughout.)
The famous Banach-Tarski paradox asserts that one can take the unit ball in three dimensions, divide it up into finitely many pieces, and then translate and rotate each piece so that their union is now two disjoint unit balls. As a consequence of this paradox, it is not possible to create a finitely additive measure on $\mathbb{R}^3$ that is both translation and rotation invariant, which can measure every subset of $\mathbb{R}^3$, and which gives the unit ball a non-zero measure. This paradox helps explain why Lebesgue measure (which is countably additive and both translation and rotation invariant, and gives the unit ball a non-zero measure) cannot measure every set, instead being restricted to measuring sets that are Lebesgue measurable.
On the other hand, it is not possible to replicate the Banach-Tarski paradox in one or two dimensions; the unit interval in $\mathbb{R}$ or unit disk in $\mathbb{R}^2$ cannot be rearranged into two unit intervals or two unit disks using only finitely many pieces, translations, and rotations, and indeed there do exist non-trivial finitely additive measures on these spaces. However, it is possible to obtain a Banach-Tarski type paradox in one or two dimensions using countably many such pieces; this rules out the possibility of extending Lebesgue measure to a countably additive translation invariant measure on all subsets of $\mathbb{R}$ (or any higher-dimensional space).
In this section we will establish all of the above results, and tie them
in with some important concepts and tools in modern group theory, most
notably amenability and the ping-pong lemma.

2.2.1. One-dimensional equidecomposability. Before we study the three-dimensional situation, let us first review the simpler one-dimensional situation. To avoid having to say “X can be cut up into finitely many pieces, which can then be moved around to create Y” all the time, let us make a convenient definition:

Definition 2.2.1 (Equidecomposability). Let $G = (G,\cdot)$ be a group acting on a space $X$, and let $A, B$ be subsets of $X$.
• We say that $A, B$ are finitely $G$-equidecomposable if there exist finite partitions $A = \bigcup_{i=1}^n A_i$ and $B = \bigcup_{i=1}^n B_i$ and group elements $g_1, \ldots, g_n \in G$ such that $B_i = g_i A_i$ for all $1 \le i \le n$.
• We say that $A, B$ are countably $G$-equidecomposable if there exist countable partitions $A = \bigcup_{i=1}^\infty A_i$ and $B = \bigcup_{i=1}^\infty B_i$ and group elements $g_1, g_2, \ldots \in G$ such that $B_i = g_i A_i$ for all $i$.
• We say that $A$ is finitely $G$-paradoxical if it can be partitioned into two subsets, each of which is finitely $G$-equidecomposable with $A$.
• We say that $A$ is countably $G$-paradoxical if it can be partitioned into two subsets, each of which is countably $G$-equidecomposable with $A$.

One can of course make similar definitions when $G = (G,+)$ is an additive group rather than a multiplicative one.
Clearly, finite $G$-equidecomposability implies countable $G$-equidecomposability, but the converse is not true. Observe that any finitely (resp. countably) additive and $G$-invariant measure $\mu$ on $X$ that measures every single subset of $X$ must give either a zero measure or an infinite measure to a finitely (resp. countably) $G$-paradoxical set $A$: if $A$ is partitioned into two sets that are each equidecomposable with $A$, then additivity and invariance give $\mu(A) = 2\mu(A)$. Thus, paradoxical sets provide significant obstructions to constructing additive measures that can measure all sets.

Example 2.2.2. If $\mathbb{R}$ acts on itself by translation, then $[0,2]$ is finitely $\mathbb{R}$-equidecomposable with $[10,11) \cup [21,22]$, and $\mathbb{R}$ is finitely $\mathbb{R}$-equidecomposable with $(-\infty,-10] \cup (10,+\infty)$.
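Both decompositions in Example 2.2.2 use two pieces: cut $[0,2]$ into $[0,1)$ and $[1,2]$ and translate them by 10 and 20, and cut $\mathbb{R}$ into $(-\infty,0]$ and $(0,+\infty)$ and translate them by $-10$ and $+10$. The Python sketch below (our own; it only checks the conditions of Definition 2.2.1 on a finite grid of rational sample points, so it is a sanity check rather than a proof) encodes each piece as a membership predicate paired with its translation.

```python
from fractions import Fraction as F

def interval(lo, hi, lo_closed, hi_closed):
    """Membership test for an interval of R; lo or hi = None means -inf or +inf."""
    def member(x):
        above = lo is None or x > lo or (lo_closed and x == lo)
        below = hi is None or x < hi or (hi_closed and x == hi)
        return above and below
    return member

def union(*parts):
    return lambda x: any(p(x) for p in parts)

def looks_equidecomposable(A, B, pieces, samples):
    """On the samples, check that the pieces partition A and that the translates
    g + A_i partition B (y lies in g + A_i exactly when y - g lies in A_i)."""
    for x in samples:
        if sum(p(x) for p, _ in pieces) != int(A(x)):
            return False
        if sum(p(x - g) for p, g in pieces) != int(B(x)):
            return False
    return True

samples = [F(k, 8) for k in range(-400, 401)]   # grid of step 1/8 on [-50, 50]

# [0, 2] versus [10, 11) u [21, 22]: move [0, 1) by 10 and [1, 2] by 20.
A1 = interval(0, 2, True, True)
B1 = union(interval(10, 11, True, False), interval(21, 22, True, True))
pieces1 = [(interval(0, 1, True, False), 10), (interval(1, 2, True, True), 20)]

# R versus (-inf, -10] u (10, +inf): move (-inf, 0] by -10 and (0, +inf) by +10.
A2 = interval(None, None, False, False)
B2 = union(interval(None, -10, False, True), interval(10, None, False, False))
pieces2 = [(interval(None, 0, False, True), -10), (interval(0, None, False, False), 10)]

print(looks_equidecomposable(A1, B1, pieces1, samples),
      looks_equidecomposable(A2, B2, pieces2, samples))
```

Half-open endpoints matter here: moving $[1,2]$ by 21 instead of 20, for instance, fails the check.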

Example 2.2.3. If $G$ acts transitively on $X$, then any two finite subsets of $X$ are finitely $G$-equidecomposable iff they have the same cardinality, and any two countably infinite subsets of $X$ are countably $G$-equidecomposable (in both cases one can take the pieces to be single points). In particular, any countably infinite subset of $X$ is countably $G$-paradoxical.
Exercise 2.2.1. Show that finite G-equidecomposability and countable
G-equidecomposability are both equivalence relations.
Exercise 2.2.2 (Banach-Schröder-Bernstein theorem). Let G act on X,
and let A, B be subsets of X.
(i) If A is finitely G-equidecomposable with a subset of B, and B is
finitely G-equidecomposable with a subset of A, show that A and B
are finitely G-equidecomposable with each other. (Hint: Adapt the
proof of the Schröder-Bernstein theorem; see Section 1.13 of Volume
II.)
(ii) If A is finitely G-equidecomposable with a superset of B, and B is
finitely G-equidecomposable with a superset of A, show that A and
B are finitely G-equidecomposable with each other. (Hint: Use part
(i).)
Show that claims (i) and (ii) also hold when “finitely” is replaced by “countably”.
Exercise 2.2.3. Show that if G acts on X, A is a subset of X which is
finitely (resp. countably) G-paradoxical, and x ∈ X, then the recurrence
set {g ∈ G : gx ∈ A} is also finitely (resp. countably) G-paradoxical (where
G acts on itself by translation).

Let us first establish countable equidecomposability paradoxes in the reals.
Proposition 2.2.4. Let $\mathbb{R}$ act on itself by translations. Then $[0,1]$ and $\mathbb{R}$ are countably $\mathbb{R}$-equidecomposable.

Proof. By Exercise 2.2.2, it will suffice to show that some set contained in $[0,1]$ is countably $\mathbb{R}$-equidecomposable with $\mathbb{R}$. Consider the space $\mathbb{R}/\mathbb{Q}$ of all cosets $x + \mathbb{Q}$ of the rationals. By the axiom of choice, we can express each such coset as $x + \mathbb{Q}$ for some $x \in [0,1/2]$, thus we can partition $\mathbb{R} = \bigcup_{x \in E} (x + \mathbb{Q})$ for some $E \subset [0,1/2]$. By Example 2.2.3, $\mathbb{Q} \cap [0,1/2]$ is countably $\mathbb{Q}$-equidecomposable with $\mathbb{Q}$, which implies that $\bigcup_{x \in E} (x + (\mathbb{Q} \cap [0,1/2]))$ is countably $\mathbb{R}$-equidecomposable with $\bigcup_{x \in E} (x + \mathbb{Q})$. Since the latter set is $\mathbb{R}$ and the former set is contained in $[0,1]$, the claim follows. □

Of course, the same proposition holds if $[0,1]$ is replaced by any other interval. As a quick consequence of this proposition and Exercise 2.2.2, we see that any subset of $\mathbb{R}$ containing an interval is countably $\mathbb{R}$-equidecomposable with $\mathbb{R}$. In particular, we have


Corollary 2.2.5. Any subset of $\mathbb{R}$ containing an interval is countably $\mathbb{R}$-paradoxical.
In particular, we see that any countably additive translation-invariant measure that measures every subset of $\mathbb{R}$ must assign a zero or infinite measure to any set containing an interval. Consequently, it is not possible to extend Lebesgue measure to a countably additive translation-invariant measure on all subsets of $\mathbb{R}$, since Lebesgue measure assigns the finite non-zero measure 1 to $[0,1]$.
We now turn from countably paradoxical sets to finitely paradoxical sets.
Here, the situation is quite different: we can rule out many sets from being
finitely paradoxical. The simplest example is that of a finite set:

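Proposition 2.2.6. If $G$ acts on $X$, and $A$ is a non-empty finite subset of $X$, then $A$ is not finitely (or countably) $G$-paradoxical.
Proof. One easily sees that any two sets that are finitely or countably $G$-equidecomposable must have the same cardinality, whereas a non-empty finite set cannot be partitioned into two subsets that each have the same cardinality as the whole set. The claim follows. □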

Now we consider the integers.

Proposition 2.2.7. Let the integers $\mathbb{Z}$ act on themselves by translation. Then $\mathbb{Z}$ is not finitely $\mathbb{Z}$-paradoxical.

Proof. The integers are of course infinite, and so Proposition 2.2.6 does not apply directly. However, the key point is that the integers can be efficiently truncated to be finite, and so we will be able to adapt the argument used to prove Proposition 2.2.6 to this setting.
Let’s see how. Suppose for contradiction that we could partition $\mathbb{Z}$ into two sets $A$ and $B$, which are in turn partitioned into finitely many pieces $A = \bigcup_{i=1}^n A_i$ and $B = \bigcup_{j=1}^m B_j$, such that $\mathbb{Z}$ can be partitioned as $\mathbb{Z} = \bigcup_{i=1}^n (A_i + a_i)$ and $\mathbb{Z} = \bigcup_{j=1}^m (B_j + b_j)$ for some integers $a_1, \ldots, a_n, b_1, \ldots, b_m$.
Now let $N$ be a large integer (much larger than $n, m, a_1, \ldots, a_n, b_1, \ldots, b_m$). We truncate $\mathbb{Z}$ to the interval $[-N,N] := \{-N, \ldots, N\}$. Clearly,
$$A \cap [-N,N] = \bigcup_{i=1}^n A_i \cap [-N,N] \tag{2.1}$$
and
$$[-N,N] = \bigcup_{i=1}^n (A_i + a_i) \cap [-N,N]. \tag{2.2}$$
From (2.2) we see that the set $\bigcup_{i=1}^n ((A_i \cap [-N,N]) + a_i)$ differs from $[-N,N]$ by only $O(1)$ elements, where the bound in the $O(1)$ expression can depend on $n, a_1, \ldots, a_n$ but does not depend on $N$. (The point here is that $[-N,N]$ is almost translation-invariant in some sense.) Comparing this with (2.1) we see that
$$|[-N,N]| \le |A \cap [-N,N]| + O(1). \tag{2.3}$$
Similarly with $A$ replaced by $B$. Summing, and using the fact that $A$ and $B$ partition $\mathbb{Z}$, we obtain
$$2|[-N,N]| \le |[-N,N]| + O(1), \tag{2.4}$$
but this is absurd for $N$ sufficiently large, and the claim follows. □
Exercise 2.2.4. Use the above argument to show that in fact no infinite
subset of Z is finitely Z-paradoxical; combining this with Example 2.2.3, we
see that the only finitely Z-paradoxical set of integers is the empty set.

The above argument can be generalised to an important class of groups:
Definition 2.2.8 (Amenability). Let $G = (G,\cdot)$ be a discrete, at most countable group. A Følner sequence is a sequence $F_1, F_2, F_3, \ldots$ of finite subsets of $G$ with $\bigcup_{N=1}^\infty F_N = G$ and with the property that $\lim_{N\to\infty} \frac{|gF_N \Delta F_N|}{|F_N|} = 0$ for all $g \in G$, where $\Delta$ denotes symmetric difference. A discrete, at most countable group $G$ is amenable if it contains at least one Følner sequence.
Of course, one can define the same concept for additive groups G = (G, +).
Remark 2.2.9. One can define amenability for uncountable groups by re-
placing the notion of a Følner sequence with a Følner net. Similarly, one
can define amenability for locally compact Hausdorff groups equipped with a
Haar measure by using that measure in place of cardinality in the above def-
inition. However, we will not need these more general notions of amenability
here. The notion of amenability was first introduced (though not by this
name, or by this definition) by von Neumann, precisely in order to study
these sorts of decomposition paradoxes. We will discuss amenability further
in Section 2.8.
Example 2.2.10. The sequence [−N, N ] for N = 1, 2, 3, . . . is a Følner
sequence for the integers Z, which are hence an amenable group.
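For $F_N = [-N,N]$ and a shift $g \in \mathbb{Z}$, the set $(g + F_N) \Delta F_N$ has exactly $2\min(|g|, 2N+1)$ elements, so the ratio in Definition 2.2.8 equals $2\min(|g|, 2N+1)/(2N+1)$, which tends to 0 as $N \to \infty$. A brute-force Python check of this formula (our own illustration; the helper name is hypothetical):

```python
def folner_ratio(N, g):
    """|(F_N + g) symmetric-difference F_N| / |F_N| for F_N = {-N, ..., N} in Z."""
    F_N = set(range(-N, N + 1))
    shifted = {x + g for x in F_N}
    return len(shifted ^ F_N) / len(F_N)

for N in (10, 100, 1000):
    print(N, folner_ratio(N, 1), folner_ratio(N, 7))
```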
Exercise 2.2.5. Show that any abelian discrete group that is at most count-
able is amenable.
Exercise 2.2.6. Show that any amenable discrete group G that is at most
countable is not finitely G-paradoxical, when acting on itself. Combined
with Exercise 2.2.3, we see that if such a group G acts on a non-empty
space X, then X is not finitely G-paradoxical.
Remark 2.2.11. Exercise 2.2.6 suggests that an amenable group $G$ should be able to support a non-trivial finitely additive measure which is invariant under left-translations, and that it can measure all subsets of $G$. Indeed, one can even create a finitely additive probability measure, for instance by selecting a non-principal ultrafilter $p \in \beta\mathbb{N}$ and a Følner sequence $(F_n)_{n=1}^\infty$ and defining $\mu(A) := \lim_{n \to p} |A \cap F_n|/|F_n|$ for all $A \subset G$.
The reals $\mathbb{R} = (\mathbb{R},+)$ (which we will give the discrete topology!) are uncountable, and thus not amenable by the narrow definition of Definition 2.2.8. However, observe from Exercise 2.2.5 that any finitely generated subgroup of the reals is amenable (or equivalently, that the reals themselves with the discrete topology are amenable, using the Følner net generalisation of Definition 2.2.8). Also, we have the following easy observation:
Exercise 2.2.7. Let G act on X, and let A be a subset of X which is finitely
G-paradoxical. Show that there exists a finitely generated subgroup H of G
such that A is finitely H-paradoxical.
From this we see that R is not finitely R-paradoxical. But we can in
fact say much more:
Proposition 2.2.12. Let A be a non-empty subset of R. Then A is not
finitely R-paradoxical.

Proof. Suppose, for contradiction, that we can partition $A$ into two sets $A = A_1 \cup A_2$ which are both finitely $\mathbb{R}$-equidecomposable with $A$. This gives us two maps $f_1: A \to A_1$, $f_2: A \to A_2$ which are piecewise given by a finite number of translations; thus there exists a finite set $g_1, \ldots, g_d \in \mathbb{R}$ such that $f_i(x) \in x + \{g_1, \ldots, g_d\}$ for all $x \in A$ and $i = 1, 2$.
For any integer $N \ge 1$, consider the $2^N$ composition maps $f_{i_1} \circ \cdots \circ f_{i_N}: A \to A$ for $i_1, \ldots, i_N \in \{1,2\}$. From the disjointness of $A_1, A_2$ (and the injectivity of $f_1, f_2$) and an easy induction, we see that the ranges of all these maps are disjoint, and so for any $x \in A$ the $2^N$ quantities $f_{i_1} \circ \cdots \circ f_{i_N}(x)$ are distinct. On the other hand, we have
$$f_{i_1} \circ \cdots \circ f_{i_N}(x) \in x + \{g_1, \ldots, g_d\} + \cdots + \{g_1, \ldots, g_d\}. \tag{2.5}$$
Simple combinatorics (relying primarily on the abelian nature of $(\mathbb{R},+)$) shows that the number of values on the right-hand side of (2.5) is at most $(N+1)^d$, since such a sum is determined by how many times each $g_j$ occurs in it. But for sufficiently large $N$, we have $2^N > (N+1)^d$, giving the desired contradiction. □
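The counting step can be seen concretely. In an abelian group, the $N$-fold sum $\{g_1,\ldots,g_d\} + \cdots + \{g_1,\ldots,g_d\}$ has at most $(N+1)^d$ elements, while the $2^N$ points $f_{i_1}\circ\cdots\circ f_{i_N}(x)$ would all have to be distinct. The sketch below uses the hypothetical translation set $\{0, 1, 5\} \subset \mathbb{Z}$ (so $d = 3$; integers stand in for reals, since only commutativity matters) to compute the sumset sizes and to find where $2^N$ overtakes the polynomial bound.

```python
def nfold_sumset(S, N):
    """S + S + ... + S (N summands), computed in the abelian group of integers."""
    sums = {0}
    for _ in range(N):
        sums = {x + g for x in sums for g in S}
    return sums

S = [0, 1, 5]   # hypothetical translation set {g_1, g_2, g_3}, so d = 3
d = len(S)
for N in (5, 10, 11, 20):
    print(N, len(nfold_sumset(S, N)), (N + 1) ** d, 2 ** N)

# First N at which 2^N exceeds the polynomial bound (N + 1)^d.
print(min(N for N in range(1, 100) if 2 ** N > (N + 1) ** d))
```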

Let us call a group $G$ supramenable if no non-empty subset of $G$ is finitely $G$-paradoxical; thus $\mathbb{R}$ is supramenable. From Exercise 2.2.3 we see that if a supramenable group $G$ acts on any space $X$, then the only finitely $G$-paradoxical subset of $X$ is the empty set.
Exercise 2.2.8. We say that a group $G = (G,\cdot)$ has subexponential growth if for any finite subset $S$ of $G$, we have $\lim_{n\to\infty} |S^n|^{1/n} = 1$, where $S^n = S \cdot \ldots \cdot S$ is the set of $n$-fold products of elements of $S$. Show that every group of subexponential growth is supramenable.


Exercise 2.2.9. Show that every abelian group has subexponential growth
(and is thus supramenable). More generally, show that every nilpotent group
has subexponential growth and is thus also supramenable.
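For a concrete nilpotent example, one can take the discrete Heisenberg group of upper unitriangular $3 \times 3$ integer matrices, encoded (as an assumption of this sketch) as triples $(a,b,c)$ with product $(a,b,c)(a',b',c') = (a+a', b+b', c+c'+ab')$, and $S = \{e, x, y\}$ with $x = (1,0,0)$, $y = (0,1,0)$. A product of $n$ elements of $S$ containing $a$ copies of $x$ and $b$ copies of $y$ has third coordinate equal to the number of pairs in which an $x$ precedes a $y$, which can be any integer in $[0, ab]$; hence $|S^n| = \sum_{a+b \le n} (ab+1)$ grows like $n^4$ and $|S^n|^{1/n} \to 1$. The Python code below (our own illustration) checks this formula by brute force.

```python
def mul(p, q):
    """Product in the discrete Heisenberg group: the upper unitriangular integer
    matrix [[1, a, c], [0, 1, b], [0, 0, 1]] is encoded as the triple (a, b, c)."""
    a, b, c = p
    a2, b2, c2 = q
    return (a + a2, b + b2, c + c2 + a * b2)

E, X, Y = (0, 0, 0), (1, 0, 0), (0, 1, 0)
S = {E, X, Y}

def growth_sizes(n_max):
    """|S^n| for n = 1, ..., n_max, by brute-force multiplication."""
    sizes, current = [], {E}
    for _ in range(n_max):
        current = {mul(p, s) for p in current for s in S}
        sizes.append(len(current))
    return sizes

def predicted_size(n):
    """Sum of (a*b + 1) over a, b >= 0 with a + b <= n."""
    return sum(a * b + 1 for a in range(n + 1) for b in range(n + 1 - a))

sizes = growth_sizes(12)
for n, size in enumerate(sizes, start=1):
    print(n, size, predicted_size(n), size ** (1 / n))
```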
Exercise 2.2.10. Show that if two finite unions of intervals in R are finitely
R-equidecomposable, then they must have the same total length. (Hint:
Reduce to the case when both sets consist of a single interval. First show
that the lengths of these intervals cannot differ by more than a factor of
two, and then amplify this fact by iteration to conclude the result.)
Remark 2.2.13. We already saw that amenable groups $G$ admit finitely additive translation-invariant probability measures that measure all subsets of $G$ (Remark 2.2.11 can be extended to the uncountable case); in fact, this turns out to be an equivalent definition of amenability. Supramenable groups $G$ enjoy a stronger property, namely that given any non-empty subset $A$ of $G$, there exists a finitely additive translation-invariant measure on $G$ that assigns the measure 1 to $A$; this is basically a deep result of Tarski.

2.2.2. Two-dimensional equidecomposability. Now we turn to equidecomposability on the plane $\mathbb{R}^2$. The nature of equidecomposability depends on what group $G$ of symmetries we wish to act on the plane.
Suppose first that we only allow ourselves to translate various sets in the plane, but not to rotate them; thus $G = \mathbb{R}^2$. As this group is abelian, it is supramenable by Exercise 2.2.9, and so any non-empty subset $A$ of the plane will not be finitely $\mathbb{R}^2$-paradoxical; indeed, by Remark 2.2.13, there exists a finitely additive translation-invariant measure that gives $A$ the measure 1. On the other hand, it is easy to adapt Corollary 2.2.5 to see that any subset of the plane containing a ball will be countably $\mathbb{R}^2$-paradoxical.
Now suppose we allow both translations and rotations, thus $G$ is now the group $SO(2) \ltimes \mathbb{R}^2$ of (orientation-preserving) isometries $x \mapsto e^{i\theta}x + v$ for $v \in \mathbb{R}^2$ and $\theta \in \mathbb{R}/2\pi\mathbb{Z}$, where $e^{i\theta}$ denotes the anticlockwise rotation by $\theta$ around the origin. This group is no longer abelian, or even nilpotent, so Exercise 2.2.9 no longer applies. Indeed, it turns out that $G$ is no longer supramenable. This is a consequence of the following three lemmas.
Lemma 2.2.14. Let $G$ be a group which contains a free semigroup on two generators (in other words, there exist group elements $g, h \in G$ such that all the words involving $g$ and $h$ (but not $g^{-1}$ or $h^{-1}$) are distinct). Then $G$ contains a non-empty finitely $G$-paradoxical set. In other words, $G$ is not supramenable.
Proof. Let $S$ be the semigroup generated by $g$ and $h$ (i.e., the set of all words formed by $g$ and $h$, including the empty word, i.e., the group identity). Observe that $gS$ and $hS$ are disjoint subsets of $S$ that are clearly finitely $G$-equidecomposable with $S$. The claim then follows from Exercise 2.2.2. □
Lemma 2.2.15 (Semigroup ping-pong lemma). Let G act on a space X,
let g, h be elements of G, and suppose that there exists a non-empty subset
A of X such that gA and hA are disjoint subsets of A. Then g, h generate
a free semigroup.

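Proof. As in the proof of Proposition 2.2.12, we see from induction that if $w, w'$ are two words generated by $g, h$, neither of which is an initial segment of the other, then the sets $wA$ and $w'A$ are disjoint. If instead $w' = wu$ for some non-empty word $u$, and $v \in \{g,h\}$ is the generator that $u$ does not begin with, then $w(vA)$ is a non-empty subset of $wA$ that is disjoint from $w'A$. In either case $wA \neq w'A$, so distinct words give distinct group elements, and the claim follows. □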
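Lemma 2.2.16. The group $G = SO(2) \ltimes \mathbb{R}^2$ contains a free semigroup on two generators.
Proof. It is convenient to identify $\mathbb{R}^2$ with the complex plane $\mathbb{C}$. We set $g$ to be the rotation $gx := \omega x$ for some transcendental phase $\omega = e^{2\pi i\theta}$ (such a phase must exist, since the set of algebraic complex numbers is countable), and let $h$ be the translation $hx := x + 1$. Observe that $g$ and $h$ map the set $A$ of polynomials in $\omega$ with non-negative integer coefficients into itself, and that $gA$ and $hA$ are disjoint: elements of $gA$ have zero constant coefficient, while elements of $hA$ have positive constant coefficient, and these coefficients are well defined because $\omega$ is transcendental. The claim now follows from Lemma 2.2.15. □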
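Since $\omega$ is transcendental, the map $x \mapsto \omega^k x + p(\omega)$ given by a word in $g$ and $h$ can be recorded exactly as the pair $(k, p)$, where $p$ is a polynomial with non-negative integer coefficients, and distinct pairs give distinct maps. The sketch below is a symbolic check of our own: the encoding is an assumption of the code, and transcendence is used only to justify comparing pairs instead of complex numbers. It lists every word of length at most $L$, confirms that the words give pairwise distinct maps, and checks the ping-pong dichotomy (composing with $g$ produces a zero constant coefficient, composing with $h$ a positive one).

```python
def normalize(p):
    """Drop trailing zero coefficients so that equal polynomials get equal tuples."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return tuple(p)

def compose_g(m):
    """g o m, where g(x) = omega * x and m(x) = omega^k x + p(omega) is stored as (k, p)."""
    k, p = m
    return (k + 1, normalize((0,) + p))

def compose_h(m):
    """h o m, where h(x) = x + 1: add 1 to the constant coefficient."""
    k, p = m
    q = list(p) or [0]
    q[0] += 1
    return (k, normalize(q))

def word_maps(L):
    """Symbolic affine map of every word of length <= L in g, h ('gh' means g o h)."""
    maps, frontier = {"": (0, ())}, {"": (0, ())}
    for _ in range(L):
        frontier = {**{"g" + w: compose_g(m) for w, m in frontier.items()},
                    **{"h" + w: compose_h(m) for w, m in frontier.items()}}
        maps.update(frontier)
    return maps

def constant_term(m):
    return m[1][0] if m[1] else 0

maps = word_maps(10)
print(len(maps), len(set(maps.values())))   # number of words, number of distinct maps
```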

Combining Lemma 2.2.14 and Lemma 2.2.16 to create a countable, finitely paradoxical subset of $SO(2) \ltimes \mathbb{R}^2$ and then letting that set act on a generic point in the plane (noting that each non-identity group element in $SO(2) \ltimes \mathbb{R}^2$ has at most one fixed point), we obtain
Corollary 2.2.17 (Sierpinski-Mazurkiewicz paradox). There exist non-empty finitely $SO(2) \ltimes \mathbb{R}^2$-paradoxical subsets of the plane.

We have seen that the group of rigid motions is not supramenable. Nevertheless, it is still amenable, thanks to the following lemma.
Lemma 2.2.18. Suppose one has a short exact sequence $0 \to H \to G \to K \to 0$ of discrete, at most countable, groups, and suppose one has a choice function $\varphi: K \to G$ that inverts the projection of $G$ to $K$ (the existence of which is automatic, from the axiom of choice, and also follows if $G$ is finitely generated). If $H$ and $K$ are amenable, then so is $G$.

Proof. Let $(A_n)_{n=1}^\infty$ and $(B_n)_{n=1}^\infty$ be Følner sequences for $H$ and $K$, respectively. Let $f: \mathbb{N} \to \mathbb{N}$ be a rapidly growing function, and let $F_n := \bigcup_{x \in B_n} \varphi(x) \cdot A_{f(n)}$. One easily verifies that $(F_n)_{n=1}^\infty$ is a Følner sequence for $G$ if $f$ is sufficiently rapidly growing. □
Exercise 2.2.11. Show that any finitely generated solvable group is amen-
able. More generally, show that any discrete, at most countable, solvable
group is amenable.


Exercise 2.2.12. Show that any finitely generated subgroup of $SO(2) \ltimes \mathbb{R}^2$ is amenable. (Hint: Use the short exact sequence $0 \to \mathbb{R}^2 \to SO(2) \ltimes \mathbb{R}^2 \to SO(2) \to 0$, which shows that $SO(2) \ltimes \mathbb{R}^2$ is solvable (in fact it is metabelian).) Conclude that $\mathbb{R}^2$ is not finitely $SO(2) \ltimes \mathbb{R}^2$-paradoxical.

Finally, we show a result of Banach.


Proposition 2.2.19. The unit disk D in R2 is not finitely SO(2)  R2 -
paradoxical.

Proof. If the claim failed, then D would be finitely SO(2)  R2 -equidecom-


posable with a disjoint union of two copies of D, say D and D + v for some
vector v of length greater than 2. By Exercise 2.2.7, we can then find a sub-
group G of SO(2) × R2 generated by a finite number of rotations x → eiθj x
for j = 1, . . . , J and translations x → x + vk for k = 1, . . . , K such that D
and D ∪ (D + v) are finitely G-equidecomposable. Indeed, we may assume
that the rigid motions that move pieces of D to pieces of D ∪ (D + v) are of
the form x → eiθj x + vk for some 1 ≤ j ≤ J, 1 ≤ k ≤ K, thus

J K
(2.6) D ∪ (D + v) = eiθj Dj,k + vk
j=1 k=1
J K
for some partition D = j=1 k=1 Dj,k of the disk.
By amenability of the rotation group SO(2), one can find a finite set
Φ ⊂ SO(2) of rotations such that eiθj Φ differs from Φ by at most 0.01|Φ|
elements for all 1 ≤ j ≤ J. Let N be a large integer, and let ΓN ⊂ R2 be
the set of all linear combinations of eiθ vk for θ ∈ Φ and 1 ≤ k ≤ K with
coefficients in {−N, . . . , N }. Observe that ΓN is a finite set whose cardinality
grows at most polynomially in N . Thus, by the pigeonhole principle, one
can find arbitrarily large N such that
(2.7) |D ∩ ΓN +10 | ≤ 1.01|D ∩ ΓN |.
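On the other hand, from (2.6) and the rotation-invariance of the disk, we have

$$\begin{aligned} 2|D \cap \Gamma_N| &= 2|e^{i\theta}(D) \cap \Gamma_N| \\ &\le |e^{i\theta}(D \cup (D + v)) \cap \Gamma_{N+5}| \\ &\le \sum_{j=1}^{J} \sum_{k=1}^{K} |e^{i(\theta + \theta_j)} D_{j,k} \cap \Gamma_{N+10}| \end{aligned} \tag{2.8}$$

for all $\theta \in \Phi$. Averaging this over all $\theta \in \Phi$, we conclude

$$2|D \cap \Gamma_N| \le 1.01 |D \cap \Gamma_{N+10}|, \tag{2.9}$$

contradicting (2.7). □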
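The pigeonhole step behind (2.7) can be illustrated numerically. The sketch below is a toy illustration, not part of the proof: it replaces $|D \cap \Gamma_N|$ by a hypothetical polynomially growing count, namely $(2N+1)^2$ (the number of lattice points in a box), and searches for an $N$ with $a(N+10) \le 1.01\, a(N)$. The function names are ours. The test is done in exact integer arithmetic, as $100\, a(N+10) \le 101\, a(N)$, to avoid floating-point rounding.

```python
from typing import Callable, Optional


def first_good_N(
    count: Callable[[int], int],
    start: int = 1,
    shift: int = 10,
    max_N: int = 10**6,
) -> Optional[int]:
    """Smallest N >= start with count(N + shift) <= 1.01 * count(N).

    The ratio 1.01 is tested exactly as 100 * count(N + shift) <= 101 * count(N).
    Returns None if no such N exists up to max_N.
    """
    for N in range(start, max_N + 1):
        if 100 * count(N + shift) <= 101 * count(N):
            return N
    return None


def box_count(N: int) -> int:
    # Polynomial growth: lattice points of {-N, ..., N}^2 (a hypothetical stand-in for |D ∩ Γ_N|).
    return (2 * N + 1) ** 2


def exp_count(N: int) -> int:
    # Exponential growth: count(N + 10) / count(N) is always 2**10.
    return 2**N


print(first_good_N(box_count))               # 2005
print(first_good_N(box_count, start=10**5))  # 100000: good N keep appearing
print(first_good_N(exp_count, max_N=2000))   # None
```

The contrast between the two growth rates is the whole point: a polynomially bounded count must eventually grow by less than a factor 1.01 over ten steps, however far out one starts, while an exponentially growing count never does.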


Remark 2.2.20. Banach in fact showed the slightly stronger statement that any two finite unions of polygons of differing area were not finitely $SO(2) \times \mathbb{R}^2$-equidecomposable. (The converse is also true and is known as the Bolyai-Gerwien theorem.)

Exercise 2.2.13. Show that all the claims in this section continue to hold if we replace $SO(2) \ltimes \mathbb{R}^2$ by the slightly larger group $\mathrm{Isom}(\mathbb{R}^2) = O(2) \ltimes \mathbb{R}^2$ of isometries (not necessarily orientation-preserving).

Remark 2.2.21. As a consequence of Remark 2.2.20, the unit square is not $SO(2) \times \mathbb{R}^2$-paradoxical. However, it is $SL(2) \times \mathbb{R}^2$-paradoxical; this is known as the von Neumann paradox.

2.2.3. Three-dimensional equidecomposability. We now turn to the three-dimensional setting. The new feature here is that the group $SO(3) \times \mathbb{R}^3$ of rigid motions is no longer abelian (as in one dimension) or solvable (as in two dimensions), but now contains a free group on two generators (not just a free semigroup), as per Lemma 2.2.16. The significance of this fact comes from the following lemma.

Lemma 2.2.22. The free group $F_2$ on two generators is finitely $F_2$-paradoxical.

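Proof. Let $a, b$ be the two generators of $F_2$. We can partition $F_2 = \{1\} \cup W_a \cup W_b \cup W_{a^{-1}} \cup W_{b^{-1}}$, where $W_c$ is the collection of reduced words of $F_2$ that begin with $c$. From the identities

$$W_{a^{-1}} = a^{-1} \cdot (F_2 \setminus W_a), \qquad W_{b^{-1}} = b^{-1} \cdot (F_2 \setminus W_b), \tag{2.10}$$

which give $F_2 = W_a \cup a \cdot W_{a^{-1}}$ and $F_2 = W_b \cup b \cdot W_{b^{-1}}$, we see that $F_2$ is finitely $F_2$-equidecomposable with both $W_a \cup W_{a^{-1}}$ and $W_b \cup W_{b^{-1}}$, and the claim now follows from Exercise 2.2.2. □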
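The identity (2.10) can be sanity-checked on a finite ball of $F_2$. In this sketch (all helper names are ours), reduced words are strings, with `A` and `B` standing for $a^{-1}$ and $b^{-1}$. Since left multiplication by $a^{-1}$ sends words of length at most $n$ to words of length at most $n + 1$, the two sides of (2.10) are compared on balls of radii $n + 1$ and $n$ respectively.

```python
GENS = ("a", "A", "b", "B")  # "A" denotes a^{-1}, "B" denotes b^{-1}
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}


def reduced_words(n: int) -> list[str]:
    """All reduced words of F2 of length at most n (the empty string is the identity)."""
    words, frontier = [""], [""]
    for _ in range(n):
        # Append a generator unless it would cancel the last letter.
        frontier = [w + g for w in frontier for g in GENS if not w or INV[w[-1]] != g]
        words.extend(frontier)
    return words


def left_multiply(g: str, w: str) -> str:
    """Reduced form of the product g * w, for a generator g and a reduced word w."""
    return w[1:] if w and w[0] == INV[g] else g + w


def starts_with(c: str, words: list[str]) -> set[str]:
    """W_c restricted to the given words."""
    return {w for w in words if w.startswith(c)}


def check_2_10(c: str, n: int) -> bool:
    """Check W_{c^{-1}} = c^{-1} (F2 minus W_c), truncated to words of length <= n + 1."""
    c_inv = INV[c]
    lhs = starts_with(c_inv, reduced_words(n + 1))
    rhs = {left_multiply(c_inv, w) for w in reduced_words(n) if not w.startswith(c)}
    return lhs == rhs


ball = reduced_words(4)
pieces = [{""}] + [starts_with(c, ball) for c in GENS]
print(len(ball))                                 # 161
print(sum(len(p) for p in pieces) == len(ball))  # True: the five pieces partition the ball
print(check_2_10("a", 5), check_2_10("b", 5))    # True True
```

A finite check of this kind cannot prove (2.10), but it makes the bookkeeping concrete: every word beginning with $a^{-1}$ arises uniquely as $a^{-1}w$ for a reduced $w$ not beginning with $a$.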

Corollary 2.2.23. Suppose that $F_2$ acts freely on a space $X$ (i.e., $gx \neq x$ whenever $x \in X$ and $g \in F_2$ is not the identity). Then $X$ is finitely $F_2$-paradoxical.

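Proof. Using the axiom of choice, we can partition $X$ into orbits as $X = \bigcup_{x \in \Gamma} F_2 x$ for some subset $\Gamma$ of $X$ containing exactly one point from each orbit. Since the action is free, each orbit $F_2 x$ is identified with $F_2$ via $g \mapsto gx$, compatibly with the action of $F_2$. The claim now follows from Lemma 2.2.22, applied on each orbit simultaneously. □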

Next, we embed the free group inside the rotation group SO(3) using
the following useful lemma (cf. Lemma 2.2.15).

Exercise 2.2.14 (Ping-pong lemma). Let $G$ be a group acting on a set $X$. Suppose that there exist disjoint subsets $A_+, A_-, B_+, B_-$ of $X$, whose union is not all of $X$, and elements $a, b \in G$, such that²

$$\begin{aligned} a(X \setminus A_-) &\subset A_+, & a^{-1}(X \setminus A_+) &\subset A_-, \\ b(X \setminus B_-) &\subset B_+, & b^{-1}(X \setminus B_+) &\subset B_-. \end{aligned} \tag{2.11}$$

Show that $a, b$ generate a free group.
Proposition 2.2.24. $SO(3)$ contains a copy of the free group on two generators.

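Proof. It suffices to find a space $X$ on which two elements of $SO(3)$ act in such a way that Exercise 2.2.14 applies. There are many such constructions. One such construction³ is based on passing from the reals to the 5-adics, where $-1$ has a square root and so $SO(3)$ becomes isomorphic to $PSL(2)$. At the end of the day, one takes

$$a = \begin{pmatrix} 3/5 & 4/5 & 0 \\ -4/5 & 3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3/5 & -4/5 \\ 0 & 4/5 & 3/5 \end{pmatrix} \tag{2.12}$$

and

$$\begin{aligned} A_\pm &:= 5^{\mathbb{Z}} \cdot \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : x, y, z \in \mathbb{Z},\ x \equiv \pm 3y \pmod 5,\ z \equiv 0 \pmod 5 \right\}, \\ B_\pm &:= 5^{\mathbb{Z}} \cdot \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : x, y, z \in \mathbb{Z},\ z \equiv \pm 3y \pmod 5,\ x \equiv 0 \pmod 5 \right\}, \\ X &:= A_- \cup A_+ \cup B_- \cup B_+ \cup \left\{ \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}^T \right\}, \end{aligned} \tag{2.13}$$

where $5^{\mathbb{Z}}$ denotes the integer powers of 5 (which act on column vectors by scalar multiplication). The verification of the ping-pong inclusions (2.11) is a routine application of modular arithmetic. □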
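As a sanity check on (2.12), and not as a substitute for the 5-adic ping-pong argument, one can verify with exact rational arithmetic that $a$ and $b$ lie in $SO(3)$ and that no short non-trivial reduced word in $a^{\pm 1}, b^{\pm 1}$ evaluates to the identity, as freeness predicts. The word-length bound 4 is an arbitrary choice, and the helper names are ours.

```python
from fractions import Fraction as Fr
from functools import reduce

Matrix = tuple[tuple[Fr, ...], ...]


def mat(rows) -> Matrix:
    return tuple(tuple(Fr(x) for x in row) for row in rows)


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return tuple(
        tuple(sum((X[i][k] * Y[k][j] for k in range(3)), Fr(0)) for j in range(3))
        for i in range(3)
    )


def transpose(X: Matrix) -> Matrix:
    return tuple(tuple(X[j][i] for j in range(3)) for i in range(3))


def det(X: Matrix) -> Fr:
    return (X[0][0] * (X[1][1] * X[2][2] - X[1][2] * X[2][1])
            - X[0][1] * (X[1][0] * X[2][2] - X[1][2] * X[2][0])
            + X[0][2] * (X[1][0] * X[2][1] - X[1][1] * X[2][0]))


I3 = mat([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
a = mat([[Fr(3, 5), Fr(4, 5), 0], [Fr(-4, 5), Fr(3, 5), 0], [0, 0, 1]])
b = mat([[1, 0, 0], [0, Fr(3, 5), Fr(-4, 5)], [0, Fr(4, 5), Fr(3, 5)]])

# For a rotation the inverse is the transpose; "A" and "B" denote a^{-1} and b^{-1}.
GEN = {"a": a, "A": transpose(a), "b": b, "B": transpose(b)}
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}


def is_rotation(X: Matrix) -> bool:
    """Orthogonal with determinant 1, i.e. an element of SO(3)."""
    return matmul(X, transpose(X)) == I3 and det(X) == 1


def reduced_words(n: int) -> list[str]:
    """Non-empty reduced words of length 1..n."""
    words, frontier = [], [""]
    for _ in range(n):
        frontier = [w + g for w in frontier for g in GEN if not w or INV[w[-1]] != g]
        words.extend(frontier)
    return words


def evaluate(word: str) -> Matrix:
    """Matrix product of the generators spelled by the word, left to right."""
    return reduce(matmul, (GEN[g] for g in word), I3)


print(is_rotation(a), is_rotation(b))                    # True True
print(any(evaluate(w) == I3 for w in reduced_words(4)))  # False
```

Using `Fraction` rather than floating point matters here: an approximate computation could only show that a product is close to the identity, not that it differs from it.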
Remark 2.2.25. This is a special case of the Tits alternative.
Corollary 2.2.26 (Hausdorff paradox). There exists a countable subset $E$ of the sphere $S^2$ such that $S^2 \setminus E$ is finitely $SO(3)$-paradoxical, where $SO(3)$ of course acts on $S^2$ by rotations.

Proof. Let $F_2 \subset SO(3)$ be a copy of the free group on two generators, as given by Proposition 2.2.24. Each non-identity rotation in $F_2$ fixes exactly two points on the sphere. Let $E$ be the union of all these points; this is countable since $F_2$ is countable. The set $E$ is invariant under $F_2$ (if $h$ fixes $y$, then $ghg^{-1}$ fixes $gy$), and the action of $F_2$ on $S^2 \setminus E$ is free, so the claim now follows from Corollary 2.2.23. □

² If drawn correctly, a diagram of the inclusions in (2.11) resembles a game of doubles ping-pong of $A_+, A_-$ versus $B_+, B_-$; hence the name.

³ See http://sbseminar.wordpress.com/2007/09/17/ for more details and motivation for this construction.
Corollary 2.2.27 (Banach-Tarski paradox on the sphere). $S^2$ is finitely $SO(3)$-paradoxical.

Proof (Sketch). Iterating the Hausdorff paradox, we see that $S^2 \setminus E$ is finitely $SO(3)$-equidecomposable to four copies of $S^2 \setminus E$, which can easily be used to cover two copies of $S^2$ (with some room to spare) by randomly rotating each of the copies. The claim now follows from Exercise 2.2.2. □
Exercise 2.2.15 (Banach-Tarski paradox on $\mathbb{R}^3$). Show that the unit ball in $\mathbb{R}^3$ is finitely $SO(3) \ltimes \mathbb{R}^3$-paradoxical.
Exercise 2.2.16. Extend these three-dimensional paradoxes to higher dimensions.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/08.
Thanks to Harald Helfgott for corrections.

Section 2.3

The Stone and Loomis-Sikorski representation theorems

A (concrete) Boolean algebra is a pair (X, B), where X is a set and B is a collection of subsets of X which contains the empty set ∅ and is closed under unions A, B → A ∪ B, intersections A, B → A ∩ B, and complements A → A^c := X\A. The subset relation ⊂ also gives a relation on B. Because B is concretely represented as subsets of a space X, these operations and relations automatically obey various axioms; in particular, for any A, B, C ∈ B:

(i) ⊂ is a partial ordering on B, and A and B have join A ∪ B and meet A ∩ B.
(ii) We have the distributive laws A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
(iii) ∅ is the minimal element of the partial ordering ⊂ and ∅^c is the maximal element.
(iv) A ∩ A^c = ∅ and A ∪ A^c = ∅^c.

(More succinctly: B is a lattice which is distributive, bounded, and complemented.)
We can then define an abstract Boolean algebra B = (B, ∅, ·^c, ∪, ∩, ⊂) to be an abstract set B with the specified objects, operations, and relations that obey the axioms (i)–(iv). Of course, some of these operations are redundant; for instance, intersection can be defined in terms of complement and union by de Morgan’s laws. In the literature, different authors select different initial operations and axioms when defining an abstract Boolean algebra, but they are all easily seen to be equivalent to each other. To emphasise the abstract nature of these algebras, the symbols ∅, ·^c, ∪, ∩, ⊂ are often replaced with other symbols such as 0, ·, ∨, ∧, <.
Clearly, every concrete Boolean algebra is an abstract Boolean algebra. In the converse direction, we have Stone’s representation theorem (see below), which asserts (among other things) that every abstract Boolean algebra is isomorphic to a concrete one (and even constructs this concrete representation of the abstract Boolean algebra canonically). So, up to (abstract) isomorphism, there is really no difference between a concrete Boolean algebra and an abstract one.
Now let us turn from Boolean algebras to σ-algebras.
A concrete σ-algebra (also known as a measurable space) is a pair (X, B), where X is a set, and B is a collection of subsets of X which contains ∅ and is closed under countable unions, countable intersections, and complements; thus every concrete σ-algebra is a concrete Boolean algebra, but not conversely. As before, concrete σ-algebras come equipped with the structures ∅, ·^c, ∪, ∩, ⊂ which obey axioms (i)–(iv), but they also come with the operations of countable union $(A_n)_{n=1}^\infty \mapsto \bigcup_{n=1}^\infty A_n$ and countable intersection $(A_n)_{n=1}^\infty \mapsto \bigcap_{n=1}^\infty A_n$, which obey an additional axiom:

(v) Any countable family $A_1, A_2, \ldots$ of elements of B has supremum $\bigcup_{n=1}^\infty A_n$ and infimum $\bigcap_{n=1}^\infty A_n$.

As with Boolean algebras, one can now define an abstract σ-algebra to be a set $B = (B, \emptyset, \cdot^c, \cup, \cap, \subset, \bigcup_{n=1}^\infty, \bigcap_{n=1}^\infty)$ with the indicated objects, operations, and relations, which obeys axioms (i)–(v). Again, every concrete σ-algebra is an abstract one; but is it still true that every abstract σ-algebra is representable as a concrete one?
The answer turns out to be no, but the obstruction can be described precisely (namely, one needs to quotient out an ideal of null sets from the concrete σ-algebra), and there is a satisfactory representation theorem, namely the Loomis-Sikorski representation theorem (see below). As a corollary of this representation theorem, one can also represent abstract measure spaces (B, μ) (also known as measure algebras) by concrete measure spaces (X, B, μ), after quotienting out by null sets.
In the rest of this section, I will state and prove these representation theorems. These theorems help explain why it is “safe” to focus attention primarily on concrete σ-algebras and measure spaces when doing measure theory, since the abstract analogues of these mathematical concepts are largely equivalent to their concrete counterparts. (The situation is quite different for non-commutative measure theories, such as quantum probability, in which there is basically no good representation theorem available to equate the abstract with the classically concrete, but I will not discuss these theories here.)

2.3.1. Stone’s representation theorem. We first give the class of Boolean algebras the structure of a category:
Definition 2.3.1 (Boolean algebra morphism). A morphism φ : A → B from one abstract Boolean algebra to another is a map which preserves the empty set, complements, unions, intersections, and the subset relation (e.g., φ(A ∪ B) = φ(A) ∪ φ(B) for all A, B ∈ A). An isomorphism is a morphism φ : A → B which has an inverse morphism φ^{-1} : B → A. Two Boolean algebras are isomorphic if there is an isomorphism between them.

Note that if (X, A), (Y, B) are concrete Boolean algebras, and if f : X → Y is a map which is measurable in the sense that f^{-1}(B) ∈ A for all B ∈ B, then the inverse image map f^{-1} : B → A is a Boolean algebra morphism which goes in the reverse (i.e., contravariant) direction to that of f. To state Stone’s representation theorem we need another definition.
Definition 2.3.2 (Stone space). A Stone space is a topological space X =
(X, F) which is compact, Hausdorff, and totally disconnected. Given a Stone
space, define the clopen algebra Cl(X) of X to be the concrete Boolean
algebra on X consisting of the clopen sets (i.e., sets that are both closed
and open).

It is easy to see that Cl(X) is indeed a concrete Boolean algebra for any
topological space X. The additional properties of being compact, Hausdorff,
and totally disconnected are needed in order to recover the topology F of
X uniquely from the clopen algebra. Indeed, we have
Lemma 2.3.3. If X is a Stone space, then the topology F of X is generated
by the clopen algebra Cl(X). Equivalently, the clopen algebra forms an open
base for the topology.

Proof. Let x ∈ X be a point, and let K be the intersection of all the clopen sets containing x. Clearly, K is closed. We claim that K = {x}. If this is not the case, then (since X is totally disconnected) K must be disconnected, thus K can be separated non-trivially into two closed sets K = K₁ ∪ K₂. Since compact Hausdorff spaces are normal, we can write K₁ = K ∩ U₁ and K₂ = K ∩ U₂ for some disjoint open U₁, U₂. Since the intersection of all the clopen sets containing x with the closed set (U₁ ∪ U₂)^c is empty, we see from the finite intersection property that there must be a finite intersection K′ of clopen sets containing x that is contained inside U₁ ∪ U₂. In particular, K′ ∩ U₁ and K′ ∩ U₂ are clopen (they are open, and closed since K′ ∩ U₁ = K′ \ U₂ and K′ ∩ U₂ = K′ \ U₁) and do not contain K. But this contradicts the definition of K (since x is contained in one of K′ ∩ U₁ and K′ ∩ U₂). Thus K = {x}.
Another application of the finite intersection property then reveals that every open neighbourhood of x contains at least one clopen set containing x, and so the clopen sets form a base as required. □
Exercise 2.3.1. Show that two Stone spaces have isomorphic clopen alge-
bras if and only if they are homeomorphic.

Now we turn to the representation theorem.


Theorem 2.3.4 (Stone representation theorem). Every abstract Boolean algebra B is isomorphic to the clopen algebra Cl(X) of a Stone space X.

Proof. We will need the binary abstract Boolean algebra {0, 1}, with the usual Boolean logic operations. We define X := Hom(B, {0, 1}) to be the space of all morphisms from B to {0, 1}. Observe that each point x ∈ X can be viewed as a finitely additive measure μ_x : B → {0, 1} that takes values in {0, 1}. In particular, this makes X a closed subset of {0, 1}^B (endowed with the product topology). The space {0, 1}^B is Hausdorff, totally disconnected, and (by Tychonoff’s theorem, Theorem 1.8.14) compact, and so X, being a closed subset, is also; in other words, X is a Stone space. Every B ∈ B induces a cylinder set C_B ⊂ {0, 1}^B, consisting of all maps μ : B → {0, 1} that map B to 1. If we define φ(B) := C_B ∩ X, it is not hard to see that φ is a morphism from B to Cl(X). Since the cylinder sets are clopen and generate the topology of {0, 1}^B, we see that the collection φ(B) of clopen sets generates the topology of X. Using compactness, we then conclude that every clopen set is a finite union of finite intersections of elements of φ(B); since φ(B) is an algebra, we thus see that φ is surjective.
The only remaining task is to check that φ is injective. It is sufficient to show that φ(A) is non-empty whenever A ∈ B is not equal to ∅. But by Zorn’s lemma (Section 2.4), we can place A inside a maximal proper filter (i.e., an ultrafilter) p. The indicator 1_p : B → {0, 1} of p can then be verified to be an element of φ(A), and the claim follows. □
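For a finite Boolean algebra, the construction X = Hom(B, {0, 1}) in the proof above can be carried out by brute force. The sketch below uses, as an illustrative choice of ours, the abstract Boolean algebra of divisors of 30 (join = lcm, meet = gcd, complement d ↦ 30/d, bottom 1, top 30), which is not presented as an algebra of sets. It enumerates all maps to {0, 1}, keeps the morphisms, and checks that φ(d) := {x ∈ X : x(d) = 1} is injective and preserves joins and meets. Since X is finite here, its topology is discrete and every subset is clopen. All helper names are hypothetical.

```python
from itertools import product
from math import gcd

TOP = 30  # squarefree, so d -> TOP // d is a lattice complement
ELEMENTS = [d for d in range(1, TOP + 1) if TOP % d == 0]  # [1, 2, 3, 5, 6, 10, 15, 30]


def join(x: int, y: int) -> int:
    return x * y // gcd(x, y)  # least common multiple


def meet(x: int, y: int) -> int:
    return gcd(x, y)


def complement(x: int) -> int:
    return TOP // x


def is_morphism(h: dict[int, int]) -> bool:
    """Does h: ELEMENTS -> {0, 1} preserve bottom, complement, join and meet?"""
    if h[1] != 0:
        return False
    return all(
        h[complement(x)] == 1 - h[x]
        and h[join(x, y)] == max(h[x], h[y])
        and h[meet(x, y)] == min(h[x], h[y])
        for x in ELEMENTS
        for y in ELEMENTS
    )


# The Stone space X = Hom(B, {0, 1}), found by brute force over all 2^8 maps.
candidates = (dict(zip(ELEMENTS, bits)) for bits in product((0, 1), repeat=len(ELEMENTS)))
points = [h for h in candidates if is_morphism(h)]


def phi(x: int) -> frozenset[int]:
    """phi(x) = {points h with h(x) = 1}, recorded by index into `points`."""
    return frozenset(i for i, h in enumerate(points) if h[x] == 1)


print(len(points))                                        # 3 (one point per prime 2, 3, 5)
print(len({phi(x) for x in ELEMENTS}) == len(ELEMENTS))   # True: phi is injective
print(all(phi(join(x, y)) == phi(x) | phi(y) for x in ELEMENTS for y in ELEMENTS))  # True
```

Each surviving morphism sends d to 1 exactly when a fixed prime divides d, so the concrete representation identifies the divisor d with the set of primes dividing it.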
Remark 2.3.5. If B = 2Y is the power set of some set Y , then the Stone
space given by Theorem 2.3.4 is the Stone-Čech compactification of Y (which
we give the discrete topology); see Section 2.5.
Remark 2.3.6. Lemma 2.3.3 and Theorem 2.3.4 can be interpreted as giving a duality between the category of Boolean algebras and the category of Stone spaces, with the duality maps being B → Hom(B, {0, 1}) and X → Cl(X). (The duality maps are (contravariant) functors which are inverses up to natural transformations.) It is the model example of the more general Stone duality between certain partially ordered sets and certain topological spaces. The idea of dualising a space X by considering the space of its morphisms to a fundamental space (in this case, {0, 1}) is a common one in mathematics; for instance, Pontryagin duality in the context of Fourier analysis on locally compact abelian groups provides another example (with the fundamental space in this case being the unit circle R/Z); see Section 1.12. Other examples include the Gelfand representation of C*-algebras (here the fundamental space is the complex numbers C; see Section 1.10.4) and the ideal-variety correspondence that provides the duality between algebraic geometry and commutative algebra (here the fundamental space is the base field k). In fact, there are various connections between all of the dualities mentioned above.
Exercise 2.3.2. Show that any finite Boolean algebra is isomorphic to the power set of a finite set. (This is a special case of Birkhoff’s representation theorem.)

2.3.2. The Loomis-Sikorski representation theorem. Now we turn to abstract σ-algebras. We can of course adapt Definition 2.3.1 to define the notion of a morphism or isomorphism between abstract σ-algebras, and to define when two abstract σ-algebras are isomorphic. Another important notion for us will be that of a quotient σ-algebra.
Definition 2.3.7 (Quotient σ-algebras). Let B be an abstract σ-algebra. A
σ-ideal in B is a subset N of B which contains ∅, is closed under countable
unions, and is downwardly closed (thus if N ∈ N and A ∈ B is such that
A ⊂ N , then A ∈ N ). If N is a σ-ideal, then we say that two elements
of B are equivalent modulo N if their symmetric difference lies in N . The
quotient of B by this equivalence relation is denoted B/N , and can be given
the structure of an abstract σ-algebra in a straightforward manner.
Example 2.3.8. If (X, B, μ) is a measure space, then the collection N
of sets of measure zero is a σ-ideal, so that we can form the abstract σ-
algebra B/N . This freedom to quotient out the null sets is only available
in the abstract setting, not the concrete one, and is perhaps the primary
motivation for introducing abstract σ-algebras into measure theory in the
first place.
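Example 2.3.8 can be made concrete on a finite set. In this toy sketch, the four-point space and its weights are our own hypothetical choice: one point carries zero mass, so the null sets form a σ-ideal with two elements, and the 16 subsets collapse to 8 equivalence classes modulo null sets. The measure is constant on each class, which is what lets μ descend to B/N.

```python
from itertools import combinations

X = (0, 1, 2, 3)
WEIGHT = {0: 2, 1: 1, 2: 1, 3: 0}  # hypothetical point masses; the point 3 is null


def mu(A: frozenset) -> int:
    return sum(WEIGHT[x] for x in A)


subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
null_sets = [A for A in subsets if mu(A) == 0]  # the σ-ideal N


def equivalent(A: frozenset, B: frozenset) -> bool:
    """A ~ B modulo N iff the symmetric difference A ^ B is null."""
    return mu(A ^ B) == 0


# Group subsets into classes. The relation is transitive because A ^ C is contained
# in (A ^ B) | (B ^ C), and N is closed under unions and under taking subsets.
classes: list[list[frozenset]] = []
for A in subsets:
    for cls in classes:
        if equivalent(A, cls[0]):
            cls.append(A)
            break
    else:
        classes.append([A])

print(len(subsets), len(null_sets), len(classes))              # 16 2 8
print(all(len({mu(A) for A in cls}) == 1 for cls in classes))  # True: mu is well defined on B/N
```

Each class here is a pair {A, A ∪ {3}}; the quotient forgets exactly the information that the measure cannot see.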

One might hope that an analogue of Stone’s representation theorem holds for σ-algebras. Unfortunately, this is not the case:
Proposition 2.3.9. Let B be the Borel σ-algebra on [0, 1], and let N be
the σ-ideal consisting of those sets with Lebesgue measure zero. Then the
abstract σ-algebra B/N is not isomorphic to a concrete σ-algebra.

Proof. Suppose for contradiction that we had an isomorphism φ : B/N → A to some concrete σ-algebra (X, A); this induces a map φ : B → A which sends null sets to the empty set. Let x be a point in X. (It is clear that X must be non-empty.) Observe that any Borel set E in [0, 1] can be partitioned into two Borel subsets whose Lebesgue measure is exactly half that of E (for instance, E ∩ [0, t] and E ∩ (t, 1] for a suitable t, which exists by the intermediate value theorem, since the Lebesgue measure of E ∩ [0, t] depends continuously on t). As a consequence, since φ preserves unions, we see that if there exists a Borel set B such that φ(B) contains x, then there exists a Borel subset B′ ⊂ B of half the measure such that φ(B′) contains x. Iterating this (starting with [0, 1]), we obtain a decreasing sequence of Borel sets B of arbitrarily small measure with φ(B) containing x. Taking countable intersections, we conclude that there exists a null set N whose image φ(N) contains x; but φ(N) is empty, a contradiction. □
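The halving step used in the proof above is constructive for simple sets. The sketch below is our own illustration, restricted to a finite union of disjoint subintervals of [0, 1] rather than a general Borel set: it runs bisection on the continuous non-decreasing function t ↦ (Lebesgue measure of E ∩ [0, t]) to locate a cut point t that splits E into two pieces of equal measure. Function names are hypothetical.

```python
Intervals = list[tuple[float, float]]


def measure_up_to(E: Intervals, t: float) -> float:
    """Lebesgue measure of E ∩ [0, t], for E a finite union of disjoint intervals [a, b] in [0, 1]."""
    return sum(max(0.0, min(b, t) - a) for a, b in E)


def halving_point(E: Intervals, tol: float = 1e-12) -> float:
    """Bisection for t with measure(E ∩ [0, t]) = measure(E) / 2, up to tol."""
    half = measure_up_to(E, 1.0) / 2
    lo, hi = 0.0, 1.0  # invariant: measure_up_to(E, lo) < half <= measure_up_to(E, hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if measure_up_to(E, mid) < half:
            lo = mid
        else:
            hi = mid
    return hi


E = [(0.0, 0.2), (0.5, 0.9)]  # total measure 0.6
t = halving_point(E)
left = measure_up_to(E, t)
right = measure_up_to(E, 1.0) - left
print(round(t, 9), round(left, 9), round(right, 9))  # 0.6 0.3 0.3
```

For a general Borel set the same intermediate value argument applies, though the cut point can then only be shown to exist rather than computed in this way.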


However, it turns out that quotienting out by ideals is the only obstruc-
tion to having a Stone-type representation theorem. Namely, we have

Theorem 2.3.10 (Loomis-Sikorski representation theorem). Let B be an abstract σ-algebra. Then there exists a concrete σ-algebra (X, A) and a σ-ideal N of A such that B is isomorphic to A/N.

Proof. We use the argument of Loomis [Lo1946]. Applying Stone’s representation theorem, we can find a Stone space X such that there is a Boolean algebra isomorphism φ : B → Cl(X) from B (viewed now only as a Boolean algebra rather than a σ-algebra) to the clopen algebra of X. Let A be the Baire σ-algebra of X, i.e., the σ-algebra generated by Cl(X). The map φ need not be a σ-algebra isomorphism, being merely a Boolean algebra isomorphism; it preserves finite unions and intersections, but need not preserve countable ones. In particular, if $B_1, B_2, \ldots \in B$ are such that $\bigcap_{n=1}^\infty B_n = \emptyset$, then $\bigcap_{n=1}^\infty \phi(B_n) \in A$ need not be empty.
Let us call sets $\bigcap_{n=1}^\infty \phi(B_n)$ of this form basic null sets, and let N be the collection of sets in A which can be covered by at most countably many basic null sets.
It is not hard to see that N is a σ-ideal in A. The map φ then descends to a map $\tilde\phi : B \to A/N$. It is not hard to see that $\tilde\phi$ is a Boolean algebra morphism. Also, if $B_1, B_2, \ldots \in B$ are such that $\bigcap_{n=1}^\infty B_n = \emptyset$, then from construction we have $\bigcap_{n=1}^\infty \tilde\phi(B_n) = \emptyset$ in A/N. From these two facts one can easily show that $\tilde\phi$ is in fact a σ-algebra morphism. Since φ(B) = Cl(X) generates A, $\tilde\phi(B)$ must generate A/N, and so $\tilde\phi$ is surjective.
The only remaining task is to show that $\tilde\phi$ is injective. As before, it suffices to show that $\tilde\phi(A) \neq \emptyset$ when $A \neq \emptyset$. Suppose for contradiction that $A \neq \emptyset$ and $\tilde\phi(A) = \emptyset$; then φ(A) can be covered by a countable family $\bigcap_{n=1}^\infty \phi(A^{(i)}_n)$, $i = 1, 2, \ldots$, of basic null sets, where $\bigcap_{n=1}^\infty A^{(i)}_n = \emptyset$ for each i. Since $A \neq \emptyset$ and $\bigcap_{n=1}^\infty A^{(1)}_n = \emptyset$, we can find $n_1$ such that $A \setminus A^{(1)}_{n_1} \neq \emptyset$ (where of course $A \setminus B := A \cap B^c$). Iterating this, we can find $n_2, n_3, n_4, \ldots$ such that $A \setminus (A^{(1)}_{n_1} \cup \cdots \cup A^{(k)}_{n_k}) \neq \emptyset$ for all k. Since φ is a Boolean algebra isomorphism, we conclude that φ(A) is not covered by any finite subcollection of the sets $\phi(A^{(1)}_{n_1}), \phi(A^{(2)}_{n_2}), \ldots$. But all of these sets are clopen (as is φ(A)), so by compactness, φ(A) is not covered by the entire collection $\phi(A^{(1)}_{n_1}), \phi(A^{(2)}_{n_2}), \ldots$. But this contradicts the fact that φ(A) is covered by the basic null sets $\bigcap_{n=1}^\infty \phi(A^{(i)}_n)$, each of which is contained in $\phi(A^{(i)}_{n_i})$. □

Remark 2.3.11. The proof above actually gives a little bit more structure
on X, A, namely it gives X the structure of a Stone space, with A being
its Baire σ-algebra. Furthermore, the ideal N constructed in the proof is in
fact the ideal of meager Baire sets. The only difficult step is to show that
every closed Baire set S with empty interior is in N , i.e., it is a countable
intersection of clopen sets. To see this, note that S is generated by a count-
able subalgebra of B which corresponds to a continuous map f from X to
the Cantor set K (since K is dual to the free Boolean algebra on countably
many generators). Then f (S) is closed in K and is hence a countable inter-
section of clopen sets in K, which pull back to countably many clopen sets
on X whose intersection is f^{-1}(f(S)). But the fact that S is generated by the subalgebra defining f can easily be seen to imply that f^{-1}(f(S)) = S.

Remark 2.3.12. The Stone representation theorem relies, in an essential way, on the axiom of choice (or at least the Boolean prime ideal theorem, which is slightly weaker than this axiom). However, it is possible to prove the Loomis-Sikorski representation theorem without choice; see for instance [BudePvaR2008].

Remark 2.3.13. The construction of X, A, N in the above proof was canonical, but it is not unique (in contrast to the situation with the Stone representation theorem, where Lemma 2.3.3 provides uniqueness up to homeomorphisms). Nevertheless, using Remark 2.3.11, one can make the Loomis-Sikorski representation functorial. Let A and B be σ-algebras with Stone
spaces X and Y . A map Y → X induces a σ-homomorphism Bor(X) →
Bor(Y ), and if the inverse image of a Borel meager set is meager then it in-
duces a σ-homomorphism A → B. Conversely, a σ-homomorphism A → B
induces a map Y → X under which the inverse image of a Borel meager set is
meager (using the fact above that Borel meager sets are generated by count-
able intersections of clopen sets). The correspondence is bijective since it is
just a restriction of the correspondence for ordinary Boolean algebras. This
gives a duality between the category of σ-algebras and σ-homomorphisms
and the category of σ-Stone spaces and continuous maps such that the inverse image of a Borel meager set is meager. In fact, σ-Stone spaces can be abstractly characterized as Stone spaces such that the closure of a countable union of clopen sets is clopen.

A (concrete) measure space (X, B, μ) is a concrete σ-algebra (X, B) together with a countably additive measure μ : B → [0, +∞]. One can similarly define an abstract measure space (B, μ) (or measure algebra) to be an abstract σ-algebra B with a countably additive measure μ : B → [0, +∞]. (Note that one does not need the concrete space X in order to define the notion of a countably additive measure.)
One can obtain an abstract measure space from a concrete one by deleting X and then quotienting out by some σ-ideal of null sets, i.e., sets of measure zero with respect to μ. (For instance, one could quotient out the space of all null sets, which is automatically a σ-ideal.) Thanks to the Loomis-Sikorski representation theorem, we have a converse:
Exercise 2.3.3. Show that every abstract measure space is isomorphic to a concrete measure space after quotienting out by a σ-ideal of null sets (where the notion of morphism, isomorphism, etc. on abstract measure spaces is defined in the obvious manner).

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/12.
Thanks to Eric for Remark 2.3.11, and for the functoriality remark in Re-
mark 2.3.13.
Eric and Tom Leinster pointed out a subtlety that two concrete Boolean
algebras which are abstractly isomorphic need not be concretely isomorphic.
In particular, the modifier “abstract” is essential in the statement that “up
to (abstract) isomorphism, there is no difference between a concrete Boolean
algebra and an abstract one.”

Section 2.4

Well-ordered sets, ordinals, and Zorn’s lemma

Notational convention: As in Section 2.2, I will colour a statement red if it assumes the axiom of choice. We will, of course, rely on every other axiom of Zermelo-Fraenkel set theory here (and in the rest of the course).
In analysis, one often needs to iterate some sort of operation infinitely many times (e.g., to create an infinite basis by choosing one basis element at a time). In order to do this rigorously, we will rely on Zorn’s lemma:
Lemma 2.4.1 (Zorn’s lemma). Let (X, ≤) be a non-empty partially ordered set, with the property that every chain (i.e., a totally ordered subset) in X has an upper bound. Then X contains a maximal element (i.e., an element with no larger element).

Indeed, we have used this lemma several times already in previous sec-
tions. Given the other standard axioms of set theory, this lemma is logically
equivalent to
Axiom 2.4.2 (Axiom of choice). Let X be a set, and let F be a collection
of non-empty subsets of X. Then there exists a choice function f : F → X,
i.e., a function such that f (A) ∈ A for all A ∈ F .

One implication is easy:

Proof of axiom of choice using Zorn’s lemma. Define a partial choice function to be a pair (F′, f′), where F′ is a subset of F and f′ : F′ → X is a choice function for F′. We can partially order the collection of partial choice functions by writing (F′, f′) ≤ (F″, f″) if F′ ⊂ F″ and f″ extends f′. The collection of partial choice functions is non-empty (since it contains the pair (∅, ()) consisting of the empty set and the empty function), and it is easy to see that any chain of partial choice functions has an upper bound (formed by gluing all the partial choices together). Hence, by Zorn’s lemma, there is a maximal partial choice function (F∗, f∗). But the domain F∗ of this function must be all of F, since otherwise one could enlarge F∗ by a single set A and extend f∗ to A by choosing a single element of A. (One does not need the axiom of choice to make a single choice, or finitely many choices; it is only when making infinitely many choices that the axiom becomes necessary.) The claim follows. □

In the rest of this section I would like to supply the reverse implication,
using the machinery of well-ordered sets. Instead of giving the shortest or
slickest proof of Zorn’s lemma here, I would like to take the opportunity to
place the lemma in the context of several related topics, such as ordinals and
transfinite induction, noting that much of this material is in fact independent
of the axiom of choice. The material here is standard, but for the purposes
of real analysis, one may simply take Zorn’s lemma as a black box and not
worry about the proof.

2.4.1. Well-ordered sets. To prove Zorn’s lemma, we first need to strengthen the notion of a totally ordered set.
Definition 2.4.3. A well-ordered set is a totally ordered set X = (X, ≤) such that every non-empty subset A of X has a minimal element min(A) ∈ A. Two well-ordered sets X, Y are isomorphic if there is an order isomorphism φ : X → Y between them, i.e., a bijection φ which is monotone (φ(x) < φ(x′) whenever x < x′).
Example 2.4.4. The natural numbers are well ordered (this is the well-
ordering principle), as is any finite totally ordered set (including the empty
set), but the integers, rationals, or reals are not well ordered.
Example 2.4.5. Any subset of a well-ordered set is again well ordered. In
particular, if a, b are two elements of a well-ordered set, then intervals such
as [a, b] := {c ∈ X : a ≤ c ≤ b}, [a, b) := {c ∈ X : a ≤ c < b}, etc., are also
well ordered.
Example 2.4.6. If X is a well-ordered set, then the ordered set X ⊕ {+∞},
defined by adjoining a new element +∞ to X and declaring it to be larger
than all the elements of X, is also well ordered. More generally, if X and
Y are well-ordered sets, then the ordered set X ⊕ Y , defined as the disjoint
union of X and Y , with any element of Y declared to be larger than any

Author's preliminary version made available with permission of the publisher, the American Mathematical Society
2.4. Zorn’s lemma 303

element of X, is also well ordered. Observe that the operation ⊕ is asso-


ciative (up to isomorphism), but not commutative in general: for instance,
N ⊕ {∞} is not isomorphic to {∞} ⊕ N.
Example 2.4.7. If X, Y are well-ordered sets, then the ordered set X ⊗ Y, defined as the Cartesian product X × Y with the lexicographical ordering (thus (x, y) ≤ (x′, y′) if x < x′, or if x = x′ and y ≤ y′), is again a well-ordered set. Again, this operation is associative (up to isomorphism) but not commutative. Note that we have one-sided distributivity: (X ⊕ Y) ⊗ Z is isomorphic to (X ⊗ Z) ⊕ (Y ⊗ Z), but Z ⊗ (X ⊕ Y) is not isomorphic to (Z ⊗ X) ⊕ (Z ⊗ Y) in general.
Remark 2.4.8. The axiom of choice is trivially true in the case when X
is well ordered, since one can take min to be the choice function. Thus,
the axiom of choice follows from the well-ordering theorem (every set has
at least one well-ordering). Conversely, we will be able to deduce the well-
ordering theorem from Zorn’s lemma (and hence from the axiom of choice);
see Exercise 2.4.11 below.

One of the reasons that well-ordered sets are useful is that one can
perform induction on them. This is easiest to describe for the principle of
strong induction:
Exercise 2.4.1 (Strong induction on well-ordered sets). Let X be a well-
ordered set, and let P : X → {true, false} be a property of elements of X.
Suppose that whenever x ∈ X is such that P (y) is true for all y < x, then
P (x) is true. Then P (x) is true for every x ∈ X. This is called the principle
of strong induction. Conversely, show that a totally ordered set X enjoys the
principle of strong induction if and only if it is well ordered. (For partially
ordered sets, the corresponding notion is that of being well founded.)

To describe the analogue of the ordinary principle of induction for well-ordered sets, we need some more notation. Given a subset A of a non-empty well-ordered set X, we define the supremum sup(A) ∈ X ⊕ {+∞} of A to be the least upper bound

$$\sup(A) := \min\{ y \in X \oplus \{+\infty\} : x \le y \text{ for all } x \in A \} \tag{2.14}$$

of A (thus for instance the supremum of the empty set is min(X)). If x ∈ X, we define the successor succ(x) ∈ X ⊕ {+∞} of x by the formula

$$\operatorname{succ}(x) := \min((x, +\infty]). \tag{2.15}$$
We have the following Peano-type axioms:
Exercise 2.4.2. If x is an element of a non-empty well-ordered set X, show
that exactly one of the following statements holds:


• Limit case. x = sup([min(X), x)).
• Successor case. x = succ(y) for some y ∈ X.
In particular, min(X) is not the successor of any element in X.
Exercise 2.4.3. Show that if x, y are elements of a well-ordered set such
that succ(x) = succ(y), then x = y.
Exercise 2.4.4 (Transfinite induction for well-ordered sets). Let X be a
non-empty well-ordered set, and let P : X → {true, false} be a property of
elements of X. Suppose that
• Base case. P (min(X)) is true.
• Successor case. If x ∈ X and P (x) is true, then P (succ(x)) is true.
• Limit case. If x = sup([min(X), x)) and P (y) is true for all y < x,
then P (x) is true. (Note that this subsumes the base case.)
Show that P(x) is true for all x ∈ X.
Remark 2.4.9. The usual Peano axioms for succession are the special case
of Exercises 2.4.2–2.4.4 in which the limit case of Exercise 2.4.2 only occurs
for min(X) (which is denoted 0), and the successor function never attains
+∞. With these additional axioms, X is necessarily isomorphic to N.

Now we introduce two more key concepts.


Definition 2.4.10. An initial segment of a well-ordered set X is a subset
Y of X such that [min(X), y] ⊂ Y for all y ∈ Y (i.e., whenever y lies in Y ,
all elements of X that are less than y also lie in Y ).
A morphism from one well-ordered set X to another Y is a map φ : X →
Y which is strictly monotone (thus φ(x) < φ(x′) whenever x < x′) and such
that φ(X) is an initial segment of Y.
Example 2.4.11. The only morphism from {1, 2, 3} to {1, 2, 3, 4, 5} is the
inclusion map. There is no morphism from {1, 2, 3, 4, 5} to {1, 2, 3}.
Remark 2.4.12. With this notion of a morphism, the class of well-ordered
sets becomes a category.
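For finite well-ordered sets, Example 2.4.11 can be checked by brute force. In this Python sketch (our own illustration; the helper names `morphisms` and `is_initial_segment` are hypothetical) a finite well-ordered set is a sorted list, strictly monotone maps are enumerated as increasing selections via `itertools.combinations`, and we keep those whose image is an initial segment, i.e. a prefix of the target list. It only covers the finite case and is exponential in the size of the target.

```python
from itertools import combinations

def is_initial_segment(image, Y):
    """For a sorted list Y, an initial segment is exactly a prefix of Y."""
    return list(image) == list(Y[:len(image)])

def morphisms(X, Y):
    """All morphisms from sorted list X to sorted list Y, as dicts."""
    found = []
    # combinations() of a sorted list yields increasing tuples, i.e. the
    # images of the strictly monotone maps from X into Y.
    for image in combinations(Y, len(X)):
        if is_initial_segment(image, Y):
            found.append(dict(zip(X, image)))
    return found

print(morphisms([1, 2, 3], [1, 2, 3, 4, 5]))  # only the inclusion map
print(morphisms([1, 2, 3, 4, 5], [1, 2, 3]))  # no morphisms
```

For small finite examples one also sees at most one morphism in each direction, and at least one direction always admits a morphism.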

We can identify the initial segments of X with elements of X ∪ {+∞}:


Exercise 2.4.5. Let X be a non-empty well-ordered set. Show that every
initial segment I of X is of the form I = [min(X), a) for exactly one a ∈
X ∪ {+∞}.
Exercise 2.4.6. Show that an arbitrary union or arbitrary intersection of
initial segments is again an initial segment.


Exercise 2.4.7. Let φ : X → Y be a morphism. Show that φ maps initial
segments of X to initial segments of Y. If x, x′ ∈ X are such that x′ is the
successor of x, show that φ(x′) is the successor of φ(x).

As Example 2.4.11 suggests, there are very few morphisms between well-
ordered sets. Indeed, we have
Proposition 2.4.13 (Uniqueness of morphisms). Given two well-ordered
sets X and Y, there is at most one morphism from X to Y.

Proof. Suppose we have two morphisms φ : X → Y, ψ : X → Y. By using
transfinite induction (Exercise 2.4.4 and Exercise 2.4.7), we see that φ, ψ
agree on [min(X), a) for every a ∈ X ⊕ {+∞}; setting a = +∞ gives the
claim.
Exercise 2.4.8 (Schröder-Bernstein theorem for well-ordered sets). Show
that two well-ordered sets X, Y are isomorphic if and only if there is a
morphism from X to Y , and a morphism from Y to X.

We can complement the uniqueness in Proposition 2.4.13 with existence:


Proposition 2.4.14 (Existence of morphisms). Given two well-ordered sets
X and Y , there is either a morphism from X to Y or a morphism from Y
to X.

Proof. Call an element a ∈ X ⊕ {+∞} good if there is a morphism φ_a
from [min(X), a) to Y; thus min(X) is good. If +∞ is good, then we are
done. From uniqueness we see that if every element in a set A is good, then
the supremum sup(A) is also good. Applying transfinite induction (Exercise
2.4.4), we thus see that we are done unless there exists a good a ∈ X such
that succ(a) is not good. By Exercise 2.4.5, φ_a([min(X), a)) = [min(Y), b)
for some b ∈ Y ⊕ {+∞}. If b ∈ Y, then we could extend the morphism φ_a to
[min(X), a] = [min(X), succ(a)) by mapping a to b, contradicting the fact
that succ(a) is not good; thus b = +∞ and so φ_a is surjective. It is then
easy to check that φ_a^{−1} exists and is a morphism from Y to X, and the
claim follows.
Remark 2.4.15. Formally, Proposition 2.4.13, Exercise 2.4.8, and Proposi-
tion 2.4.14 tell us that the collection of all well-ordered sets, modulo isomor-
phism, is totally ordered by declaring one well-ordered set X to be at least as
large as another Y when there is a morphism from Y to X. However, this is
not quite the case, because the collection of well-ordered sets is only a class
rather than a set. Indeed, as we shall soon see, this is not a technicality, but
is in fact a fundamental fact about well-ordered sets that lies at the heart of
Zorn’s lemma. (From Russell’s paradox we know that the notions of class
and set are necessarily distinct; see Section 1.15 of Volume II.)


2.4.2. Ordinals. As we learn very early on in our mathematics education,
a finite set of a certain cardinality (e.g., a set {a, b, c, d, e}) can be put in
one-to-one correspondence with a standard set of the same cardinality (e.g.,
the set {1, 2, 3, 4, 5}); two finite sets have the same cardinality if and only
if they correspond to the same standard set {1, . . . , N}. (The same fact is
true for infinite sets; see Exercise 2.4.12 below.) Similarly, we would like to
place every well-ordered set in a standard form. This motivates
Definition 2.4.16. A representation ρ of the well-ordered sets is an assign-
ment of a well-ordered set ρ(X) to every well-ordered set X such that
• ρ(X) is isomorphic to X for every well-ordered set X. (In particular,
if ρ(X) and ρ(Y ) are equal, then X and Y are isomorphic.)
• If there exists a morphism from X to Y , then ρ(X) is a subset of
ρ(Y ) and the order structure on ρ(X) is induced from that on ρ(Y ).
(In particular, if X and Y are isomorphic, then ρ(X) and ρ(Y ) are
equal.)
Remark 2.4.17. In the language of category theory, a representation is
a covariant functor from the category of well-ordered sets to itself which
turns all morphisms into inclusions, and which is naturally isomorphic to
the identity functor.
Remark 2.4.18. Because the collection of all well-ordered sets is a class
rather than a set, ρ is not actually a function (it is sometimes referred to as
a class function).

It turns out that several representations of the well-ordered sets exist.
The most commonly used one is that of the ordinals, defined by von Neumann as follows.
Definition 2.4.19 (Ordinals). An ordinal is a well-ordered set α with the
property that x = {y ∈ α : y < x} for all x ∈ α. (In particular, each element
of α is also a subset of α, and the strict order relation < on α is identical
to the set membership relation ∈.)
Example 2.4.20. For each natural number n = 0, 1, 2, . . ., define the ordinal number nth recursively by setting 0th := ∅ and nth := {0th, 1th, . . . , (n − 1)th}
for all n ≥ 1, thus for instance
(2.16)
0th := ∅
1th := {0th} = {∅}
2th := {0th, 1th} = {∅, {∅}}
3th := {0th, 1th, 2th} = {∅, {∅}, {∅, {∅}}},


and so forth. (Of course, to be compatible with the English language con-
ventions for ordinals, we should write 1st instead of 1th , etc., but let us ignore
this discrepancy.) One can easily check by induction that nth is an ordinal
for every n. Furthermore, if we define ω := {nth : n ∈ N}, then ω is also an
ordinal. (In the foundations of set theory, this construction, together with
the axiom of infinity, is sometimes used to define the natural numbers (so
that n = nth for all natural numbers n), although this construction can lead
to some conceptually strange-looking consequences that blur the distinction
between numbers and sets, such as 3 ∈ 5 and 4 = {0, 1, 2, 3}.)
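The finite von Neumann ordinals can be built literally as nested sets. The Python sketch below is our own illustration (`von_neumann` and `satisfies_ordinal_condition` are hypothetical names). It uses hashable `frozenset`s and the successor step α ∪ {α}, which yields the same sets as the recursion nth := {0th, . . . , (n − 1)th}. The check only tests the defining identity x = {y ∈ α : y ∈ x} of Definition 2.4.19 on finite examples; it does not verify well-ordering in general.

```python
def von_neumann(n):
    """The finite ordinal nth as a nested frozenset."""
    alpha = frozenset()                  # 0th := ∅
    for _ in range(n):
        alpha = alpha | frozenset([alpha])   # successor step: α ∪ {α}
    return alpha

def satisfies_ordinal_condition(alpha):
    """Check x == {y in alpha : y in x} for every element x of alpha."""
    return all(x == frozenset(y for y in alpha if y in x) for x in alpha)

ords = [von_neumann(n) for n in range(6)]
print(ords[3] in ords[5])               # "3 ∈ 5"
print(ords[4] == frozenset(ords[:4]))   # "4 = {0, 1, 2, 3}"
```

The set {{∅}} = {1th} fails the check, since its only element {∅} contains ∅, which is not in {{∅}}.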

The fundamental theorem about ordinals is


Theorem 2.4.21. (i) Given any two ordinals α, β, one is a subset of
the other (and the order structure on α is induced from that on β).
(ii) Every well-ordered set X is isomorphic to exactly one ordinal
ord(X).
In particular, ord is a representation of the well-ordered sets.

Proof. We first prove (i). From Proposition 2.4.14 and symmetry, we may
assume that there is a morphism φ from α to β. By strong induction (Ex-
ercise 2.4.1) and Definition 2.4.19, we see that φ(x) = x for all x ∈ α, and
so φ is the inclusion map from α into β. The claim follows.
Now we prove (ii). If uniqueness failed, then we would have two distinct
ordinals that are isomorphic to each other, but as one ordinal is a subset of
the other, this would contradict Proposition 2.4.13 (the inclusion morphism
is not an isomorphism); so it suffices to prove existence.
We use transfinite induction. It suffices to show that for every a ∈
X ⊕ {+∞}, the initial segment [min(X), a) is isomorphic to an ordinal α(a) (which we
know to be unique). This is of course true in the base case a = min(X), with α(a) = ∅. To
handle the successor case a = succ(b), we set α(a) := α(b) ∪ {α(b)}, which
is easily verified to be an ordinal isomorphic to [min(X), a). To handle
the limit case a = sup([min(X), a)), we take all the ordinals associated to
elements in [min(X), a) and take their union (here we rely crucially on the
axiom schema of replacement and the axiom of union); by use of (i) one can
show that this union is an ordinal isomorphic to [min(X), a), as required.
Remark 2.4.22. Operations on well-ordered sets, such as the sum ⊕ and
product ⊗ defined earlier in this section, induce corresponding operations on ordinals, leading to ordinal arithmetic, which we will not discuss
here. (Note that the convention for the order in which multiplication proceeds
is swapped in some of the literature; thus αβ would be the ordinal of β ⊗ α
rather than α ⊗ β.)


Exercise 2.4.9 (Ordinals are themselves well ordered). Let F be a non-empty class of ordinals. Show that there is a least ordinal min(F) in this
class, which is a subset of all the other ordinals in this class. In particular,
this shows that any set of ordinals is well-ordered by set inclusion.
Remark 2.4.23. Because of Exercise 2.4.9, we can meaningfully talk about
“the least ordinal obeying property P ”, as soon as we can exhibit at least
one ordinal with that property P . For instance, once one can demonstrate
the existence of an uncountable ordinal (which follows from Exercise 2.4.11
below⁴), one can talk about the least uncountable ordinal.
Exercise 2.4.10 (Transfinite induction for ordinals). Let P (α) be a prop-
erty pertaining to ordinals α. Suppose that
• Base case. P(∅) is true.
• Successor case. If α = β ∪ {β} for some ordinal β and P(β) is true,
then P(α) is true.
• Limit case. If α = ⋃_{β∈α} β and P(β) is true for all β ∈ α, then P(α)
is true.
Show that P(α) is true for every ordinal α.

Now we show a fundamental fact, that the well-ordered sets are just too
numerous to all fit inside a single set, even modulo isomorphism.
Theorem 2.4.24. There does not exist a set A and a representation ρ of
the well-ordered sets such that ρ(X) ∈ A for all well-ordered sets X.

Proof. By Theorem 2.4.21, any two distinct ordinals are non-isomorphic
and so get mapped under ρ to different elements of A. Thus we can identify
the class of ordinals with a subset of A, and so the class of ordinals is in fact
a set. In particular, by the axiom of union, we may take the union of all the
ordinals, which one can verify to be another ordinal ε0. But then ε0 ∪ {ε0}
is also an ordinal, and hence is contained in the union ε0 of all ordinals; this implies that ε0 ∈ ε0, which contradicts the axiom
of foundation.
Remark 2.4.25. It is also possible to prove Theorem 2.4.24 without the
theory of ordinals or the axiom of foundation. One first observes (by transfinite induction) that given two well-ordered sets X, X′, one of the sets
ρ(X), ρ(X′) is a subset of the other. Because of this, one can show that
the union S of all the ρ(X) (where X ranges over all well-ordered sets) is
well defined (because the ρ(X) form a subset of A) and well ordered. Now
we look at the well-ordered set S ∪ {+∞}; by Proposition 2.4.13, it is not
isomorphic to any subset of S, but ρ(S ∪ {+∞}) is necessarily contained
in S, a contradiction. See also Section 1.15 of Volume II for some related
results and arguments in this spirit.

⁴ One can also create an uncountable ordinal without the axiom of choice by starting with
all the well-orderings of subsets of the natural numbers, and taking the union of their associated
ordinals; this construction is due to Hartogs.
Remark 2.4.26. The same argument also shows that there is no represen-
tation of the ordinals inside a given set; the ordinals are “too big” to be
placed in anything other than a class.
2.4.3. Zorn’s lemma. Now we can prove Zorn’s lemma. The key propo-
sition is
Proposition 2.4.27. Let X be a partially ordered set, and let 𝒞 be the set
of all well-ordered subsets of X. Then there does not exist a function g : 𝒞 → X
such that g(C) is a strict upper bound for C (i.e., g(C) > x for all x ∈ C)
for every C ∈ 𝒞.

Proof. Suppose for contradiction that there existed X and g with the above
properties. Then, given any well-ordered set Y , we claim that there exists
exactly one isomorphism φY : Y → ρ(Y ) from Y to a well-ordered set ρ(Y )
in X such that φY (y) = g(φY ([min(Y ), y))) for all y ∈ Y . Indeed, the
uniqueness and existence can both be established by a transfinite induction
that we leave as an exercise. (Informally, φY is what one gets by “applying
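g Y times, starting with the empty set”.) From uniqueness we see that
ρ(Y) = ρ(Y′) whenever Y and Y′ are isomorphic, and another transfinite
induction shows that ρ(Y) ⊂ ρ(Y′) whenever Y is an initial segment of Y′ (and hence, by the previous observation, whenever there is a morphism from Y to Y′). Thus ρ is
a representation of the well-ordered sets. Since every ρ(Y) is a subset of X, ρ takes values in the set 2^X, which contradicts Theorem 2.4.24.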
Remark 2.4.28. One can use transfinite induction on ordinals rather than
well-ordered sets if one wishes here, using Remark 2.4.26 in place of Theorem
2.4.24.

Proof of Zorn’s lemma. Suppose for contradiction that one had a non-empty partially ordered set X without maximal elements, such that every
chain had an upper bound. As there are no maximal elements, every element
of X is strictly smaller than some other element of X, and so every chain
in fact has a strict upper bound; in particular every well-ordered subset of X has
a strict upper bound. Applying the axiom of choice, we may thus find a
choice function g : 𝒞 → X from the set 𝒞 of well-ordered subsets of X to X
that maps every such set to a strict upper bound. But this contradicts
Proposition 2.4.27.
Remark 2.4.29. It is important for Zorn’s lemma that X is a set, rather
than a class. Consider for instance the class of all ordinals. Every chain
of ordinals has an upper bound (namely, the union of the ordinals in that
chain), and the class is certainly non-empty, but there is no maximal ordinal.
(Compare also Theorem 2.4.21 and Theorem 2.4.24.)


Remark 2.4.30. It is also important that every chain have an upper bound,
and not just countable chains. Indeed, the collection of countable subsets
of an uncountable set (such as R) is non-empty, and every countable chain
has an upper bound, but there is no maximal element.
Remark 2.4.31. The above argument shows that the hypothesis of Zorn’s
lemma can be relaxed slightly: one does not need every chain to have an
upper bound, but merely every well-ordered subset. But I
do not know of any application in which this apparently stronger version
of Zorn’s lemma dramatically simplifies an argument. (In practice, either
Zorn’s lemma can be applied routinely, or it fails utterly to be applicable at
all.)
Exercise 2.4.11. Use Zorn’s lemma to establish the well-ordering theorem
(every set has at least one well-ordering).
Remark 2.4.32. By the above exercise, R can be well-ordered. However,
if one drops the axiom of choice from the axioms of set theory, one can no
longer prove that R is well-ordered. Indeed, given a well-ordering of R, it
is not difficult (using Remark 2.4.8) to remove the axiom of choice from the
Banach-Tarski constructions in Section 2.2, and thus obtain constructions of
non-measurable subsets of R. But a deep theorem of Solovay gives a model
of set theory (without the axiom of choice) in which every set of reals is
measurable.
Exercise 2.4.12. Define a (von Neumann) cardinal to be an ordinal α with
the property that all smaller ordinals have strictly lesser cardinality (i.e.,
cannot be placed in one-to-one correspondence with α). Show that every
set can be placed in one-to-one correspondence with exactly one cardinal.
(This gives a representation of the category of sets, similar to how ord gives
a representation of well-ordered sets.)

It seems appropriate to close these notes with a quote from Jerry Bona:
The Axiom of Choice is obviously true, the well-ordering prin-
ciple obviously false, and who can tell about Zorn’s Lemma?

Notes. This lecture first appeared at terrytao.wordpress.com/2009/01/28.
Thanks to an anonymous commenter for corrections.
Eric remarked that any application of Zorn’s lemma can be equivalently
rephrased as a transfinite induction, after using a choice function to decide
where to go at each limit ordinal.

Section 2.5

Compactification and metrisation

One way to study a general class of mathematical objects is to embed them
into a more structured class of mathematical objects; for instance, one could
study manifolds by embedding them into Euclidean spaces. In these notes
we study two (related) embedding theorems for topological spaces:

• The Stone-Čech compactification, which embeds locally compact
Hausdorff spaces into compact Hausdorff spaces in a universal fashion; and
• The Urysohn metrisation theorem, which shows that every second-countable normal Hausdorff space is metrisable.

2.5.1. The Stone-Čech compactification. Observe that any dense open
subset of a compact Hausdorff space is automatically a locally compact
Hausdorff space. We now study the reverse concept:

Definition 2.5.1. A compactification of a locally compact Hausdorff space
X is an embedding ι : X → X̄ (i.e., a homeomorphism between X and
ι(X)) into a compact Hausdorff space X̄ such that the image ι(X) of X is
an open dense subset of X̄. We will often abuse notation and refer to X̄
as the compactification rather than the embedding ι : X → X̄, when the
embedding is obvious from context.

One compactification ι : X → X̄ is finer than another ι′ : X → X̄′ (or
ι′ : X → X̄′ is coarser than ι : X → X̄) if there exists a continuous map
π : X̄ → X̄′ such that ι′ = π ∘ ι. Notice that this map must be surjective
and unique, by the open dense nature of ι(X): its image π(X̄) is compact, hence closed, and contains the dense set ι′(X), while π is determined by its values on the dense set ι(X). Two compactifications are
equivalent if they are both finer than each other.
Example 2.5.2. Any compact Hausdorff space can be its own compactification. The real
line R can be compactified into [−π/2, π/2] by using the arctan function as
the embedding, or (equivalently) by embedding it into the extended real line
[−∞, ∞]. It can also be compactified into the unit circle {(x, y) ∈ R² : x² + y² = 1} by using the stereographic projection
\[ x \mapsto \Big( \frac{2x}{1+x^2}, \frac{x^2-1}{1+x^2} \Big). \]
Notice that the former embedding is finer than the latter. The plane R² can similarly
be compactified into the unit sphere {(x, y, z) ∈ R³ : x² + y² + z² = 1} by
the stereographic projection
\[ (x, y) \mapsto \Big( \frac{2x}{1+x^2+y^2}, \frac{2y}{1+x^2+y^2}, \frac{x^2+y^2-1}{1+x^2+y^2} \Big). \]

Exercise 2.5.1. Let X be a locally compact Hausdorff space that is not
compact. Define the one-point compactification X ∪ {∞} by adjoining one
point ∞ to X, with the topology generated by the open sets of X and the
complements (in X ∪ {∞}) of the compact sets in X. Show that X ∪ {∞}
(with the obvious embedding map) is a compactification of X. Show that
the one-point compactification is coarser than any other compactification of
X.

We now consider the opposite extreme to the one-point compactification:


Definition 2.5.3. Let X be a locally compact Hausdorff space. A Stone-
Čech compactification βX of X is defined as the finest compactification of
X, i.e., the compactification of X which is finer than every other compacti-
fication of X.

It is clear that the Stone-Čech compactification, if it exists, is unique
up to isomorphism, and so one often abuses notation by referring to the
Stone-Čech compactification. The existence of the compactification can be
established by Zorn’s lemma (see Section 2.3 of Poincaré’s legacies, Vol. I).
We shall shortly give several other constructions of the compactification.
(All constructions, however, rely at some point on the axiom of choice, or a
related axiom.)
The Stone-Čech compactification obeys a useful functorial property:
Exercise 2.5.2. Let X, Y be locally compact Hausdorff spaces, with Stone-Čech compactifications βX, βY. Show that every continuous map f : X →
Y has a unique continuous extension βf : βX → βY. (Hint: Uniqueness
is easy; for existence, look at the closure of the graph {(x, f(x)) : x ∈ X}
in βX × βY, which compactifies X and thus cannot be strictly finer than
βX.) In the converse direction, if X̄ is a compactification of X such that
every continuous map f : X → K into a compact space can be extended
continuously to X̄, show that X̄ is the Stone-Čech compactification.


Example 2.5.4. From the above exercise, we can define limits \( \lim_{x\to p} f(x) := \beta f(p) \) for any bounded continuous function f on X and any p ∈ βX. But
for coarser compactifications, one can only take limits for special types of
bounded continuous functions; for instance, using the one-point compactification of R, \( \lim_{x\to\infty} f(x) \) need not exist for a bounded continuous function
f : R → R, e.g., \( \lim_{x\to\infty} \sin(x) \) or \( \lim_{x\to\infty} \arctan(x) \) do not exist. The
finer the compactification, the more limits can be defined; for instance the
two-point compactification [−∞, +∞] of R allows one to define the limits \( \lim_{x\to+\infty} f(x) \) and \( \lim_{x\to-\infty} f(x) \) for some additional functions f (e.g.,
\( \lim_{x\to\pm\infty} \arctan(x) \) is well defined); and the Stone-Čech compactification is
the only compactification which allows one to take limits for any bounded
continuous function (e.g., \( \lim_{x\to p} \sin(x) \) is well defined for all p ∈ βR).

Now we turn to the issue of actually constructing the Stone-Čech compactifications.
Exercise 2.5.3. Let X be a locally compact Hausdorff space. Let
C(X → [0, 1]) be the space of continuous functions from X to the unit
interval, let \( Q := [0,1]^{C(X\to[0,1])} \) be the space of tuples \( (y_f)_{f\in C(X\to[0,1])} \) taking values in the unit interval with the product topology, let ι : X → Q
be the Gelfand transform \( \iota(x) := (f(x))_{f\in C(X\to[0,1])} \), and let βX be the
closure of ι(X) in Q.
• Show that βX is a compactification of X. (Hint: Use Urysohn’s
lemma and Tychonoff’s theorem.)
• Show that βX is the Stone-Čech compactification of X. (Hint: If
X̄ is any other compactification of X, we can identify C(X̄ → [0, 1])
with a subset of C(X → [0, 1]) by restriction, and then project Q to \( [0,1]^{C(\overline{X}\to[0,1])} \).
Meanwhile, we can embed X̄ inside \( [0,1]^{C(\overline{X}\to[0,1])} \) by the Gelfand
transform.)
Exercise 2.5.4. Let X be a discrete topological space, and let 2^X be the Boolean
algebra of all subsets of X. By Stone’s representation theorem (Theorem
1.2.2), 2^X is isomorphic to the clopen algebra of a Stone space βX.
• Show that βX is a compactification of X.
• Show that βX is the Stone-Čech compactification of X.
• Identify βX with the space of ultrafilters on X. (See Section 1.5 of
Structure and randomness for further discussion of ultrafilters, and
Section 2.3 of Poincaré’s legacies, Vol. I for further discussion of the
relationship of ultrafilters to the Stone-Čech compactification.)
Exercise 2.5.5. Let X be a locally compact Hausdorff space, and let
BC(X → C) be the space of bounded continuous complex-valued functions
on X.


• Show that BC(X → C) is a unital commutative C*-algebra (see
Section 1.10.4).
• By the commutative Gelfand-Naimark theorem (Theorem 1.10.24),
BC(X → C) is isomorphic as a unital C*-algebra to C(βX → C) for
some compact Hausdorff space βX (which is in fact the spectrum of
BC(X → C)). Show that βX is the Stone-Čech compactification of
X.
• More generally, show that given any other compactification X̄ of X,
C(X̄ → C) is isomorphic as a unital C*-algebra to a subalgebra
of BC(X → C) that contains C ⊕ C0(X → C) (the space of continuous functions from X to C that converge to a limit at ∞), with X̄ as
the spectrum of this algebra; thus we have a canonical identification
between compactifications and C*-algebras between BC(X → C)
and C ⊕ C0(X → C), which correspond to the Stone-Čech compactification and the one-point compactification, respectively.
Exercise 2.5.6. Let X be a locally compact Hausdorff space. Show that
the dual BC(X → R)* of BC(X → R) is isomorphic as a Banach space to
the space M(βX) of real signed Radon measures on the Stone-Čech compactification βX, and similarly in the complex case. In particular, conclude
that ℓ^∞(N)* ≡ M(βN).
Remark 2.5.5. The Stone-Čech compactification can be extended from
locally compact Hausdorff spaces to the slightly larger class of Tychonoff
spaces, which are those Hausdorff spaces X with the property that any closed
set K ⊂ X and point x not in K can be separated by a continuous function
f ∈ C(X → R) which equals 1 on K and zero on x. This compactification
can be constructed by a modification of the argument used to establish
Exercise 2.5.3. However, in this case the space X is merely dense in its
compactification βX, rather than open and dense.
Remark 2.5.6. A cautionary note: in general, the Stone-Čech compactifi-
cation is almost never sequentially compact. For instance, it is not hard to
show that N is sequentially closed in βN. In particular, these compactifica-
tions are usually not metrisable.

2.5.2. Urysohn’s metrisation theorem. Recall that a topological space
is metrisable if there exists a metric on that space which generates the topology. There are various necessary conditions for metrisability. For instance,
we have already seen that metric spaces must be normal and Hausdorff. In
the converse direction, we have
Theorem 2.5.7 (Urysohn’s metrisation theorem). Let X be a normal Haus-
dorff space which is second countable. Then X is metrisable.


Proof (Sketch). This will be a variant of the argument in Exercise 2.5.3,
but with a countable family of continuous functions in place of C(X →
[0, 1]).
Let U1, U2, . . . be a countable base for X. If Ui, Uj are in this base with
the closure of Ui contained in Uj, we can apply Urysohn’s lemma and find a continuous function
fij : X → [0, 1] which equals 1 on Ui and vanishes outside of Uj. Let F
be the collection of all such functions; this is a countable family. We can
then embed X in \( [0,1]^F \) using the Gelfand transform \( x \mapsto (f(x))_{f\in F} \). By
modifying the proof of Exercise 2.5.3 one can show that this is an embedding. On the other hand, \( [0,1]^F \) is a countable product of metric spaces
and is thus metrisable (e.g., by enumerating F as f1, f2, . . . and using the
metric \( d((x_n)_{f_n\in F}, (y_n)_{f_n\in F}) := \sum_{n=1}^\infty 2^{-n} |x_n - y_n| \)). Since a subspace of a
metrisable space is clearly also metrisable, the claim follows.
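The metric \( d(x, y) = \sum_{n=1}^\infty 2^{-n} |x_n - y_n| \) on \( [0,1]^{\mathbf{N}} \) can be sketched in Python on finite truncations of sequences. The truncation is our own simplification: since |x_n − y_n| ≤ 1, dropping all coordinates after the N-th changes d by at most \( 2^{-N} \). The function name `product_metric` is hypothetical.

```python
def product_metric(x, y):
    """sum_{n>=1} 2^{-n} |x_n - y_n| over the (finite) common length of x and y."""
    # enumerate() starts at 0, so coordinate n+1 receives weight 2^{-(n+1)}.
    return sum(2.0 ** -(n + 1) * abs(a - b) for n, (a, b) in enumerate(zip(x, y)))

zeros = [0.0] * 10
e10 = [0.0] * 9 + [1.0]                # differs from 0 only in coordinate 10
print(product_metric([0] * 5, [1] * 5))  # 1/2 + 1/4 + ... + 1/32
print(product_metric(e10, zeros))        # 2^{-10}
```

The second example illustrates why d generates the product topology: points that differ only in far-out coordinates are close, exactly as convergence in each coordinate requires.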

Recalling that compact metric spaces are second countable (Lemma
1.8.6), and that compact Hausdorff spaces are normal, we obtain
Corollary 2.5.8. A compact Hausdorff space is metrisable if and only if it
is second countable.

Of course, non-metrisable compact Hausdorff spaces exist; βN is a standard example. Uncountable products of non-trivial compact metric spaces,
such as {0, 1}, are always non-metrisable. Indeed, we already saw in Section 1.8 that {0, 1}^X is compact but not sequentially compact (and thus not
metrisable) when X has the cardinality of the continuum. One can use the
first uncountable ordinal to achieve a similar result for any uncountable X,
and then by embedding one can obtain non-metrisability for any uncountable product of non-trivial compact metric spaces, thus complementing the
metrisability of countable products of such spaces. Conversely, there also
exist metrisable spaces which are not second countable (e.g., uncountable
discrete spaces). So Urysohn’s metrisation theorem does not completely
classify the metrisable spaces; however, it already covers a large number of
interesting cases.

Notes. This lecture first appeared at terrytao.wordpress.com/2009/03/02.
Thanks to Eric, Javier Lopez, Mark Meckes, Max Baroi, Paul Leopardi,
Pete L. Clark, and anonymous commenters for corrections.

Section 2.6

Hardy’s uncertainty principle

Many properties of a (sufficiently nice) function f : R → C are reflected in
its Fourier transform f̂ : R → C, defined by the formula
(2.17) \[ \hat f(\xi) := \int_{-\infty}^{\infty} f(x) e^{-2\pi i x \xi}\, dx. \]
For instance, decay properties of f are reflected in smoothness properties of
f̂, as the following table shows:

If f is. . .                  then f̂ is. . .                    and this relates to. . .
Square-integrable             square-integrable                 Plancherel’s theorem
Absolutely integrable         continuous                        Riemann-Lebesgue lemma
Rapidly decreasing            smooth                            theory of Schwartz functions
Exponentially decreasing      analytic in a strip
Compactly supported           entire, exponential growth        Paley-Wiener theorem

(See Section 1.12 for further discussion of the Fourier transform.)


Another important relationship between a function f and its Fourier
transform f̂ is the uncertainty principle, which roughly asserts that if a
function f is highly localised in space, then its Fourier transform f̂ must
be widely dispersed in frequency, or to put it another way, f and f̂ cannot
both decay too strongly at infinity (except of course in the degenerate case
f = 0). There are many ways to make this intuition precise. One of them
is the Heisenberg uncertainty principle, which asserts that if we normalise
\[ \int_{\mathbf{R}} |f(x)|^2\, dx = \int_{\mathbf{R}} |\hat f(\xi)|^2\, d\xi = 1, \]
then we must have
\[ \Big( \int_{\mathbf{R}} |x|^2 |f(x)|^2\, dx \Big) \cdot \Big( \int_{\mathbf{R}} |\xi|^2 |\hat f(\xi)|^2\, d\xi \Big) \geq \frac{1}{(4\pi)^2}, \]
thus forcing at least one of f or f̂ to not be too concentrated near the origin.
This principle can be proven (for sufficiently nice f, initially) by observing
the integration by parts identity
\[ \operatorname{Re} \langle xf, f' \rangle = \operatorname{Re} \int_{\mathbf{R}} x f(x) \overline{f'(x)}\, dx = -\frac{1}{2} \int_{\mathbf{R}} |f(x)|^2\, dx \]
and then using Cauchy-Schwarz and the Plancherel identity.
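As a numerical sanity check (not a proof), one can test the Heisenberg inequality and the integration by parts identity on the real Gaussian f(x) = 2^{1/4} e^{−πx²}. This test function is our own choice. It is normalised so that ∫|f|² = 1, it equals its own Fourier transform under (2.17), and it makes the Heisenberg inequality an equality. The Python sketch uses a crude midpoint rule on [−8, 8]. The interval, the step count, and the helper name `integrate` are arbitrary assumptions, chosen so that the quadrature error is negligible for this rapidly decaying integrand.

```python
import math

def integrate(g, lo=-8.0, hi=8.0, n=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

C = 2 ** 0.25                                    # makes ∫ |f|^2 = 1

def f(x):
    return C * math.exp(-math.pi * x * x)

def fprime(x):
    return -2 * math.pi * x * f(x)               # exact derivative of f

norm = integrate(lambda x: f(x) ** 2)
second_moment = integrate(lambda x: x * x * f(x) ** 2)
ibp = integrate(lambda x: x * f(x) * fprime(x))  # <xf, f'> for real f
# Since f-hat = f here, the frequency moment equals the space moment.
product = second_moment * second_moment
print(norm, ibp, product, 1 / (4 * math.pi) ** 2)
```

Because f is real, the real part in the identity is not needed here; for complex f only the real part of ⟨xf, f′⟩ is determined by integration by parts.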
Another well-known manifestation of the uncertainty principle is the fact
that it is not possible for f and f̂ to both be compactly supported (unless
of course they vanish entirely). This can in fact be seen from the above
table: if f is compactly supported, then f̂ is an entire function; but the
zeroes of a non-zero entire function are isolated, yielding a contradiction
unless f vanishes. (Indeed, the table also shows that if one of f and f̂ is
compactly supported, then the other cannot have exponential decay.)
On the other hand, we have the example of the Gaussian functions
$f(x) = e^{-\pi a x^2}$, $\hat f(\xi) = \frac{1}{\sqrt a} e^{-\pi \xi^2/a}$, which both decay faster than
exponentially. The classical Hardy uncertainty principle asserts, roughly speaking,
that this is the fastest that f and f̂ can simultaneously decay:
Theorem 2.6.1 (Hardy uncertainty principle). Suppose that f is a (measurable)
function such that $|f(x)| \le C e^{-\pi a x^2}$ and $|\hat f(\xi)| \le C' e^{-\pi \xi^2/a}$ for all
x, ξ and some C, C′, a > 0. Then f(x) is a scalar multiple of the Gaussian
$e^{-\pi a x^2}$.
This theorem is proven by complex-analytic methods, in particular the
Phragmén-Lindelöf principle; for the sake of completeness we give that proof
below. But I was curious to see if there was a real-variable proof of the same
theorem, avoiding the use of complex analysis. I was able to find the proof
of a slightly weaker theorem:
Theorem 2.6.2 (Weak Hardy uncertainty principle). Suppose that f is a
non-zero (measurable) function such that $|f(x)| \le C e^{-\pi a x^2}$ and
$|\hat f(\xi)| \le C' e^{-\pi b \xi^2}$ for all x, ξ and some C, C′, a, b > 0. Then ab ≤ C0 for some
absolute constant C0.

Note that the correct value of C0 should be 1, as is implied by the true
Hardy uncertainty principle. Despite the weaker statement, I thought the
proof might still be of interest as it is a little less “magical” than the
complex-variable one, and so I am giving it below.


2.6.1. The complex-variable proof. We first give the complex-variable
proof. By dilating f by √a (and contracting f̂ by 1/√a) we may normalise
a = 1. By multiplying f by a small constant, we may also normalise
C = C′ = 1.
The super-exponential decay of f allows us to extend the Fourier transform
f̂ to the complex plane, thus
$$\hat f(\xi + i\eta) = \int_{\mathbf R} f(x) e^{-2\pi i x\xi} e^{2\pi \eta x}\,dx$$
for all ξ, η ∈ R. We may differentiate under the integral sign and verify that
f̂ is entire. Taking absolute values, we obtain the upper bound
$$|\hat f(\xi + i\eta)| \le \int_{\mathbf R} e^{-\pi x^2} e^{2\pi \eta x}\,dx;$$
completing the square, we obtain
$$|\hat f(\xi + i\eta)| \le e^{\pi \eta^2} \tag{2.18}$$
for all ξ, η. We conclude that the entire function
$$F(z) := e^{\pi z^2} \hat f(z)$$
is bounded in magnitude by 1 on the imaginary axis; also, by hypothesis
on fˆ, we know that F is bounded in magnitude by 1 on the real axis.
Formally applying the Phragmén-Lindelöf principle (or maximum modulus
principle), we conclude that F is bounded on the entire complex plane, which
by Liouville’s theorem implies that F is constant, and the claim follows.
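For completeness, here is the completing-the-square computation behind (2.18), under the normalisations a = 1 and C = C′ = 1 made above (so |f(x)| ≤ e^{−πx²}), together with the resulting bounds for F on the two axes. It uses only ∫e^{−πy²} dy = 1:

```latex
|\hat f(\xi + i\eta)|
  \le \int_{\mathbf R} e^{-\pi x^2 + 2\pi \eta x}\,dx
  = e^{\pi \eta^2} \int_{\mathbf R} e^{-\pi (x - \eta)^2}\,dx
  = e^{\pi \eta^2},
\qquad
|F(i\eta)| = e^{-\pi \eta^2}\,|\hat f(i\eta)| \le 1,
\qquad
|F(\xi)| = e^{\pi \xi^2}\,|\hat f(\xi)| \le 1 .
```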
Now let’s go back and justify the Phragmén-Lindelöf argument. Strictly
speaking, Phragmén-Lindelöf does not apply, since it requires exponential
growth on the function F , whereas we have quadratic-exponential growth
here. But we can tweak F a bit to solve this problem. Firstly, we pick
0 < θ < π/2 and work on the sector
$$\Gamma_\theta := \{ r e^{i\alpha} : r > 0,\ 0 \le \alpha \le \theta \}.$$
Using (2.18), we have
$$|F(\xi + i\eta)| \le e^{\pi \xi^2}.$$
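Thus, if δ > 0, and θ is sufficiently close to π/2 depending on δ, the function
$e^{i\delta z^2} F(z)$ is bounded in magnitude by 1 on the boundary of Γθ (on the real
axis $|e^{i\delta z^2}| = 1$, while on the ray $z = re^{i\theta}$ one has
$|e^{i\delta z^2}| = e^{-2\delta r^2 \sin\theta\cos\theta}$, which beats the bound
$e^{\pi \xi^2} = e^{\pi r^2 \cos^2\theta}$ once $2\delta\sin\theta \ge \pi\cos\theta$). Then, for
any sufficiently small ε > 0, $e^{i\varepsilon e^{i\varepsilon} z^{2+\varepsilon}} e^{i\delta z^2} F(z)$ (using the standard branch
of $z^{2+\varepsilon}$ on Γθ) is also bounded in magnitude by 1 on this boundary and goes
to zero at infinity in the interior of Γθ, so is bounded by 1 in that interior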
by the maximum modulus principle. Sending ε → 0, and then θ → π/2, and
then δ → 0, we obtain F bounded in magnitude by 1 on the upper right
quadrant. Similar arguments work for the other quadrants, and the claim
follows.


2.6.2. The real-variable proof. Now we turn to the real-variable proof
of Theorem 2.6.2, which is based on the fact that polynomials of controlled
degree do not resemble rapidly decreasing functions.
Rather than use the complex analyticity of f̂, we will rely instead on a
different relationship between the decay of f and the regularity of fˆ, as
follows:
Lemma 2.6.3 (Derivative bound). Suppose that $|f(x)| \le C e^{-\pi a x^2}$ for all
x ∈ R, and for some C, a > 0. Then f̂ is smooth, and furthermore one has
the bound
$$|\partial_\xi^k \hat f(\xi)| \le \frac{C\, k!\, \pi^{k/2}}{(k/2)!\, a^{(k+1)/2}}$$
for all ξ ∈ R and every even integer k.

Proof. The smoothness of f̂ follows from the rapid decrease of f. To get
the bound, we differentiate under the integral sign (one can easily check that
this is justified) to obtain
$$\partial_\xi^k \hat f(\xi) = \int_{\mathbf R} (-2\pi i x)^k f(x) e^{-2\pi i x\xi}\,dx,$$
and thus by the triangle inequality for integrals (and the hypothesis that k
is even)
$$|\partial_\xi^k \hat f(\xi)| \le C \int_{\mathbf R} e^{-\pi a x^2} (2\pi x)^k\,dx.$$
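On the other hand, by differentiating the Fourier analytic identity
$$\frac{1}{\sqrt a} e^{-\pi \xi^2/a} = \int_{\mathbf R} e^{-\pi a x^2} e^{-2\pi i x\xi}\,dx$$
k times at ξ = 0, we obtain
$$\frac{d^k}{d\xi^k}\Big(\frac{1}{\sqrt a} e^{-\pi \xi^2/a}\Big)\Big|_{\xi=0} = \int_{\mathbf R} e^{-\pi a x^2} (2\pi i x)^k\,dx;$$
expanding out $\frac{1}{\sqrt a} e^{-\pi \xi^2/a}$ using Taylor series, we conclude that
$$\frac{k!\,(-\pi/a)^{k/2}}{\sqrt a\,(k/2)!} = \int_{\mathbf R} e^{-\pi a x^2} (2\pi i x)^k\,dx.$$
Since k is even, $(2\pi i x)^k = (-1)^{k/2}(2\pi x)^k$, so this says that
$$\int_{\mathbf R} e^{-\pi a x^2} (2\pi x)^k\,dx = \frac{k!\,\pi^{k/2}}{(k/2)!\,a^{(k+1)/2}},$$
which together with the previous bound gives the lemma.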
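The Gaussian moment identity used at the end of this proof is easy to sanity-check numerically. The sketch below is only an illustration (it plays no role in the argument): it compares a midpoint-rule quadrature, truncated to an interval whose half-width is an arbitrary choice, with the closed form k! π^{k/2}/((k/2)! a^{(k+1)/2}). Both function names are hypothetical helpers introduced here.

```python
import math


def gaussian_moment_numeric(a: float, k: int, half_width: float = 12.0, n: int = 20000) -> float:
    """Midpoint-rule value of the integral over R of exp(-pi a x^2) * (2 pi x)^k.

    The integral is truncated to [-half_width, half_width]; for the moderate
    a and k used here the neglected tails are far below double precision.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n):
        x = -half_width + (j + 0.5) * h
        total += math.exp(-math.pi * a * x * x) * (2.0 * math.pi * x) ** k
    return total * h


def gaussian_moment_closed_form(a: float, k: int) -> float:
    """k! pi^(k/2) / ((k/2)! a^((k+1)/2)); only valid for even k."""
    if k % 2:
        raise ValueError("k must be even")
    return math.factorial(k) * math.pi ** (k / 2) / (math.factorial(k // 2) * a ** ((k + 1) / 2))
```

For example, with a = 1 and k = 4 both functions return 12π² ≈ 118.44, up to rounding.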

Using Stirling’s formula $k! = k^k (e + o(1))^{-k}$, we conclude in particular
that
$$|\partial_\xi^k \hat f(\xi)| \le \Big(\frac{\pi e}{a} + o(1)\Big)^{k/2} k^{k/2} \tag{2.19}$$
for all large even integers k (where the decay of o(1) can depend on a, C).
We can combine (2.19) with Taylor’s theorem with remainder to conclude
that on any interval I ⊂ R, we have an approximation
$$\hat f(\xi) = P_I(\xi) + O\Big(\frac{1}{k!}\Big(\frac{\pi e}{a} + o(1)\Big)^{k/2} k^{k/2} |I|^k\Big),$$


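where |I| is the length of I and P_I is a polynomial of degree less than k.
Using Stirling’s formula again, in the form $1/k! = k^{-k}(e + o(1))^k$, we obtain
$$\hat f(\xi) = P_I(\xi) + O\Big(\Big(\frac{\pi e^3}{a} + o(1)\Big)^{k/2} k^{-k/2} |I|^k\Big). \tag{2.20}$$
(The precise constant inside the k/2 power plays no role below.)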
Now we apply a useful bound.
Lemma 2.6.4 (Doubling bound). Let P be a polynomial of degree at most
k for some k ≥ 1, let I = [x0 − r, x0 + r] be an interval, and suppose that
|P (x)| ≤ A for all x ∈ I and some A > 0. Then for any N ≥ 1 we have the
bound |P (x)| ≤ (CN )k A for all x ∈ N I := [x0 − N r, x0 + N r] and for some
absolute constant C.

Proof. By translating, we may take x0 = 0; by dilating, we may take r = 1.
By dividing P by A, we may normalise A = 1. Thus we have |P(x)| ≤ 1
for all −1 ≤ x ≤ 1, and the aim is now to show that |P(x)| ≤ (CN)^k for all
−N ≤ x ≤ N.
Consider the trigonometric polynomial P(cos θ). By de Moivre’s formula,
this function is a linear combination of cos(jθ) for 0 ≤ j ≤ k. By
Fourier analysis, we can thus write $P(\cos\theta) = \sum_{j=0}^k c_j \cos(j\theta)$, where
$$c_j = \frac{1}{\pi}\int_{-\pi}^{\pi} P(\cos\theta)\cos(j\theta)\,d\theta$$
for j ≥ 1 (with the factor 1/π replaced by 1/(2π) when j = 0).
Since P(cos θ) is bounded in magnitude by 1, we conclude that c_j is bounded
in magnitude by 2. Next, we use de Moivre’s formula again to expand cos(jθ)
as a polynomial in cos(θ) and sin²(θ), with coefficients of size O(1)^k;
expanding sin²(θ) further as 1 − cos²(θ), we see that cos(jθ) is a polynomial
in cos(θ) with coefficients O(1)^k. Putting all this together, we conclude that
the coefficients of P are all of size O(1)^k, so that |P(x)| ≤ (k + 1) O(1)^k N^k
for |x| ≤ N, and the claim follows. □
Remark 2.6.5. One can get slightly sharper results by using the theory of
Chebyshev polynomials.
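Remark 2.6.5 can be illustrated with a small script. The sketch below is an illustration only, not part of the proof: it builds the Chebyshev polynomials T_k from the recurrence T_{j+1} = 2xT_j − T_{j−1} with exact integer coefficients (T_k is the classical extremal example for this kind of growth bound). For k ≤ 20 it can be used to check that |T_k| ≤ 1 on [−1, 1] (since T_k(cos θ) = cos(kθ)), that the absolute values of the coefficients of T_k sum to at most 3^k (the "coefficients of size O(1)^k" step), and that |T_k(x)| ≤ (2N)^k for |x| ≤ N, which is Lemma 2.6.4 with C = 2 for these particular polynomials. The helper names are hypothetical.

```python
def chebyshev_coefficients(k):
    """Coefficient lists (constant term first) of T_0, ..., T_k.

    Uses the recurrence T_{j+1}(x) = 2x T_j(x) - T_{j-1}(x) with integer arithmetic.
    """
    polys = [[1], [0, 1]]
    for _ in range(1, k):
        prev, cur = polys[-2], polys[-1]
        nxt = [0] + [2 * c for c in cur]  # 2x * T_j: shift up one degree and double
        for i, c in enumerate(prev):      # subtract T_{j-1}
            nxt[i] -= c
        polys.append(nxt)
    return polys[: k + 1]


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial with the given coefficient list at x."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result
```

For instance, `chebyshev_coefficients(2)[2]` is `[-1, 0, 2]`, i.e. T_2(x) = 2x² − 1.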

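We return to the proof of Theorem 2.6.2. We pick a large even integer k and
a parameter r > 0 to be chosen later. From (2.20) we have
$$\hat f(\xi) = P_r(\xi) + O\Big(\frac{r^2}{ak}\Big)^{k/2}$$
for ξ ∈ [−r, 2r], and some polynomial P_r of degree less than k. In particular,
since $|\hat f(\xi)| \le C' e^{-b\xi^2} \le C' e^{-br^2}$ for ξ ∈ [r, 2r], we have
$$P_r(\xi) = O(e^{-br^2}) + O\Big(\frac{r^2}{ak}\Big)^{k/2}$$
for ξ ∈ [r, 2r]. Applying Lemma 2.6.4 (with I = [r, 2r] and N = 5, so that
NI = [−r, 4r] contains [−r, r]), we conclude that
$$P_r(\xi) = O(1)^k e^{-br^2} + O\Big(\frac{r^2}{ak}\Big)^{k/2}$$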


for ξ ∈ [−r, r]. Applying (2.20) again, we conclude that
$$\hat f(\xi) = O(1)^k e^{-br^2} + O\Big(\frac{r^2}{ak}\Big)^{k/2}$$
for ξ ∈ [−r, r]. If we pick $r := \sqrt{k/(cb)}$ for a sufficiently small absolute constant
c, we conclude that
$$|\hat f(\xi)| \le 2^{-k} + O\Big(\frac{1}{ab}\Big)^{k/2}$$
(say) for ξ ∈ [−r, r]. If ab ≥ C0 for large enough C0 , the right-hand side
goes to zero as k → ∞ (which also implies r → ∞), and we conclude that fˆ
(and hence f ) vanishes identically.
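The choice of r in the last step can be unpacked as follows. Here c is the small absolute constant from the text, chosen so that O(1)·e^{−1/c} ≤ 1/2, and the factor c^{−k/2} in the second term is absorbed into the O(·), since c is absolute:

```latex
r := \sqrt{\frac{k}{cb}}
\quad\Longrightarrow\quad
b r^2 = \frac{k}{c}, \qquad \frac{r^2}{ak} = \frac{1}{abc},
\qquad
O(1)^k e^{-b r^2} = \big(O(1)\,e^{-1/c}\big)^k \le 2^{-k},
\qquad
O\Big(\frac{r^2}{ak}\Big)^{k/2} = O\Big(\frac{1}{ab}\Big)^{k/2}.
```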

Notes. This article first appeared at
terrytao.wordpress.com/2009/02/18.
Pedro Lauridsen Ribeiro noted an old result of Schrödinger, that the
only minimisers of the Heisenberg uncertainty principle were the Gaussians
(up to scaling, translation, and modulation symmetries).
Fabrice Planchon and Philippe Jaming mentioned several related results
and generalisations, including a recent PDE-based proof of the Hardy un-
certainty principle (with the sharp constant) in [EsKePoVe2008].

Section 2.7

Create an epsilon of room

In this section I would like to discuss a fundamental trick in “soft” analysis,
sometimes known as the limiting argument or epsilon regularisation
argument.
A quick description of the trick is as follows. Suppose one wants to
prove some statement S0 about some object x0 (which could be a number,
a point, a function, a set, etc.). To do so, pick a small ε > 0, and first prove
a weaker statement Sε (which allows for losses which go to zero as ε → 0)
about some perturbed object xε . Then, take limits ε → 0. Provided that the
dependency and continuity of the weaker conclusion Sε on ε are sufficiently
controlled, and xε is converging to x0 in an appropriately strong sense, you
will recover the original statement.
One can of course play a similar game when proving a statement S∞
about some object X∞ , by first proving a weaker statement SN on some
approximation XN to X∞ for some large parameter N, and then sending N → ∞
at the end.
Some typical examples of a target statement S0 and the approximating
statements Sε that would converge to S0 appear in the following table.
| S0 | Sε |
|---|---|
| f(x0) = g(x0) | f(xε) = g(xε) + o(1) |
| f(x0) ≤ g(x0) | f(xε) ≤ g(xε) + o(1) |
| f(x0) > 0 | f(xε) ≥ c − o(1) for some c > 0 independent of ε |
| f(x0) is finite | f(xε) is bounded uniformly in ε |
| f(x0) ≥ f(x) for all x ∈ X (i.e., x0 maximises f) | f(xε) ≥ f(x) − o(1) for all x ∈ X (i.e., xε nearly maximises f) |
| fn(x0) converges as n → ∞ | fn(xε) fluctuates by at most o(1) for sufficiently large n |
| f0 is a measurable function | fε is a measurable function converging pointwise to f0 |
| f0 is a continuous function | fε is an equicontinuous family of functions converging pointwise to f0, OR fε is continuous and converges (locally) uniformly to f0 |
| The event E0 holds a.s. | The event Eε holds with probability 1 − o(1) |
| The statement P0(x) holds for a.e. x | The statement Pε(x) holds for x outside of a set of measure o(1) |

Of course, to justify the convergence of Sε to S0, it is necessary that
xε converge to x0 (or fε converge to f0, etc.) in a suitably strong sense.
(But for the purposes of proving just upper bounds, such as f(x0) ≤ M, one
can often get by with quite weak forms of convergence, thanks to tools such
as Fatou’s lemma or the weak closure of the unit ball.) Similarly, we need
some continuity (or at least semi-continuity) hypotheses on the functions f,
g appearing above.
It is also necessary in many cases that the control Sε on the approximat-
ing object xε is somehow “uniform in ε”, although for “σ-closed” conclusions,
such as measurability, this is not required.5
By giving oneself an epsilon of room, one can evade a lot of famil-
iar issues in soft analysis. For instance, by replacing “rough”, “infinite-
complexity”, “continuous”, “global”, or otherwise “infinitary” objects x0
with “smooth”, “finite-complexity”, “discrete”, “local”, or otherwise “fini-
tary” approximants xε , one can finesse most issues regarding the justifi-
cation of various formal operations (e.g., exchanging limits, sums, deriva-
tives, and integrals).6 Similarly, issues such as whether the supremum
M := sup{f (x) : x ∈ X} of a function on a set is actually attained by
some maximiser x0 become moot if one is willing to settle instead for an
almost-maximiser xε , e.g., one which comes within an epsilon of that supre-
mum M (or which is larger than 1/ε, if M turns out to be infinite). Last,
but not least, one can use the epsilon of room to avoid degenerate solutions,
for instance by perturbing a non-negative function to be strictly positive,
perturbing a non-strictly monotone function to be strictly monotone, and
so forth.

5 It is important to note that it is only the final conclusion Sε on xε that needs to have this
uniformity in ε; one is permitted to have some intermediate stages in the derivation of Sε that
depend on ε in a non-uniform manner, so long as these non-uniformities cancel out or otherwise
disappear at the end of the argument.
6 It is important to be aware, though, that any quantitative measure of how smooth, discrete,
finite, etc., xε is should be expected to degrade in the limit ε → 0, and so one should exercise extreme
caution in using such quantitative measures to derive estimates that are uniform in ε.


To summarise: One can view the epsilon regularisation argument as a
loan in which one borrows an epsilon here and there in order to be able to
ignore soft analysis difficulties. One can also temporarily utilise
estimates which are non-uniform in epsilon, but at the end of the day one
needs to pay back the loan by establishing a final hard analysis estimate
which is uniform in epsilon (or whose error terms decay to zero as epsilon
goes to zero).
A variant: It may seem that the epsilon regularisation trick is useless
if one is already in hard analysis situations when all objects are already
finitary, and all formal computations easily justified. However, there is an
important variant of this trick which applies in this case: namely, instead
of sending the epsilon parameter to zero, choose epsilon to be a sufficiently
small (but not infinitesimally small) quantity, depending on other parame-
ters in the problem, so that one can eventually neglect various error terms
and obtain a useful bound at the end of the day. (For instance, any result
proven using the Szemerédi regularity lemma is likely to be of this type.)
Since one is not sending epsilon to zero, not every term in the final bound
needs to be uniform in epsilon, though for quantitative applications one
still would like the dependencies on such parameters to be as favourable as
possible.

2.7.1. Examples. The soft analysis components of any real analysis text-
book will contain a large number of examples of this trick in action. In
particular, any argument which exploits Littlewood’s three principles of real
analysis is likely to utilise this trick. Of course, this trick also occurs repeat-
edly in Chapter , and thus was chosen as the title of this book.
Example 2.7.1 (Riemann-Lebesgue lemma). Given any absolutely inte-
grable function f ∈ L1 (R), the Fourier transform fˆ : R → C is defined by
the formula
$$\hat f(\xi) := \int_{\mathbf R} f(x) e^{-2\pi i x\xi}\,dx.$$

The Riemann-Lebesgue lemma asserts that f̂(ξ) → 0 as ξ → ∞. It is difficult
to prove this estimate for f directly, because this function is too rough: it
is absolutely integrable (which is enough to ensure that fˆ exists and is
bounded), but need not be continuous, differentiable, compactly supported,
bounded, or otherwise nice. But suppose we give ourselves an epsilon of
room. Then, as the space C0∞ of test functions is dense in L1 (R) (Exercise
1.13.5), we can approximate f to any desired accuracy ε > 0 in the L1 norm
by a smooth, compactly supported function fε : R → C, thus

$$\int_{\mathbf R} |f(x) - f_\varepsilon(x)|\,dx \le \varepsilon. \tag{2.21}$$


The point is that fε is much better behaved than f, and it is not difficult
to show the analogue of the Riemann-Lebesgue lemma for fε. Indeed, since fε is
smooth and compactly supported, we can now justifiably integrate by parts
to obtain
$$\hat f_\varepsilon(\xi) = \frac{1}{2\pi i\xi}\int_{\mathbf R} f_\varepsilon'(x) e^{-2\pi i x\xi}\,dx$$
for any non-zero ξ, and it is now clear (since f′ε is bounded and compactly
supported) that f̂ε(ξ) → 0 as ξ → ∞.
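Now we need to take limits as ε → 0. It will be enough to have f̂ε
converge uniformly to f̂. But from (2.21) and the basic estimate
$$\sup_\xi |\hat g(\xi)| \le \int_{\mathbf R} |g(x)|\,dx \tag{2.22}$$
(which is the single hard analysis ingredient in the proof of the lemma)
applied to g := f − fε, we see (by the linearity of the Fourier transform)
that
$$\sup_\xi |\hat f(\xi) - \hat f_\varepsilon(\xi)| \le \varepsilon,$$
and we obtain the desired uniform convergence. (Indeed, this gives
$\limsup_{\xi\to\infty} |\hat f(\xi)| \le \varepsilon + \limsup_{\xi\to\infty} |\hat f_\varepsilon(\xi)| = \varepsilon$ for every ε > 0.)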
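A quick numerical sanity check of the two ingredients, under illustrative assumptions that are not in the text: take f = 1_{[0,1]} and, in place of a genuinely smooth fε, a continuous trapezoid that ramps linearly from 0 to 1 on [0, ε] and back down on [1 − ε, 1]. This approximant is still Lipschitz and compactly supported, which is all the integration by parts needs. Then ‖f − fε‖_{L¹} = ε exactly, so (2.22) predicts |f̂(ξ) − f̂ε(ξ)| ≤ ε for every ξ, while the integration by parts formula gives |f̂ε(ξ)| ≤ ‖f′ε‖_{L¹}/(2π|ξ|) = 1/(π|ξ|). The transforms are approximated by a midpoint rule, and the function names are hypothetical.

```python
import cmath
import math

EPS = 0.05  # ramp width; also the exact L^1 distance between f and f_eps


def f(x):
    """The rough function: indicator of [0, 1] (integrable but discontinuous)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def f_eps(x, eps=EPS):
    """Continuous trapezoid: zero outside (0, 1), linear ramps of width eps at each end."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return min(1.0, x / eps, (1.0 - x) / eps)


def fourier(g, xi, n=20000):
    """Midpoint-rule approximation of the integral of g(x) exp(-2 pi i x xi) dx,
    for a function g supported in [0, 1]."""
    h = 1.0 / n
    total = 0j
    for j in range(n):
        x = (j + 0.5) * h
        total += g(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * h
```

The difference `fourier(f, xi) - fourier(f_eps, xi)` has modulus exactly ε at ξ = 0 and strictly less elsewhere, in line with (2.22).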
Remark 2.7.2. The same argument also shows that fˆ is continuous; we
leave this as an exercise to the reader. See also Exercise 1.12.11 for the
generalisation of this lemma to other locally compact abelian groups.
Remark 2.7.3. Example 2.7.1 is a model case of a much more general
instance of the limiting argument: in order to prove a convergence or conti-
nuity theorem for all rough functions in a function space, it suffices to first
prove convergence or continuity for a dense subclass of smooth functions,
and combine that with some quantitative estimate in the function space (in
this case, (2.22)) in order to justify the limiting argument. See Corollary
1.7.7 for an important example of this principle.
Example 2.7.4. The limiting argument in Example 2.7.1 relied on the
linearity of the Fourier transform f → fˆ. But, with more effort, it is also
possible to extend this type of argument to non-linear settings. We will
sketch (omitting several technical details, which can be found for instance
in my PDE book [Ta2006]) a very typical instance. Consider a nonlinear
PDE, e.g., the cubic non-linear wave equation
$$-u_{tt} + u_{xx} = u^3, \tag{2.23}$$
where u : R × R → R is some scalar field, and the t and x subscripts denote
differentiation of the field u(t, x). If u is sufficiently smooth and sufficiently
decaying at spatial infinity, one can show that the energy
$$E(u)(t) := \int_{\mathbf R} \frac{1}{2}|u_t(t,x)|^2 + \frac{1}{2}|u_x(t,x)|^2 + \frac{1}{4}|u(t,x)|^4\,dx \tag{2.24}$$


is conserved, thus E(u)(t) = E(u)(0) for all t. Indeed, this can be formally
justified by computing the derivative ∂t E(u)(t) by differentiating under the
integral sign, integrating by parts, and then applying the PDE (2.23); we
leave this as an exercise for the reader.7 However, these justifications do re-
quire a fair amount of regularity on the solution u; for instance, requiring u to
be three times continuously differentiable in space and time, and compactly
supported in space on each bounded time interval, would be sufficient to
make the computations rigorous by applying “off the shelf” theorems about
differentiation under the integral sign, etc.
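As a numerical illustration of the conservation law for smooth, rapidly decaying data (not a proof, and not used anywhere in the text), here is a minimal sketch: a standard leapfrog finite-difference scheme for (2.23) on a large periodic interval. The Gaussian initial data u(0, x) = e^{−x²}, u_t(0, x) = 0, the grid sizes, the periodic truncation and the particular discrete energy are all assumptions of the sketch. The discrete energy should stay constant up to small discretisation errors.

```python
import math


def simulate_energy(half_length=15.0, n=600, dt=0.025, steps=300):
    """Leapfrog scheme for -u_tt + u_xx = u^3 on a periodic grid over [-half_length, half_length).

    Initial data u(0, x) = exp(-x^2), u_t(0, x) = 0.  Returns a discrete version of the
    energy (2.24) at each interior time level, using a centred difference for u_t and a
    forward difference for u_x.
    """
    dx = 2.0 * half_length / n
    xs = [-half_length + j * dx for j in range(n)]

    def accel(u):
        # u_tt = u_xx - u^3 with the three-point Laplacian; u[j - 1] wraps around at j = 0
        return [(u[(j + 1) % n] - 2.0 * u[j] + u[j - 1]) / dx ** 2 - u[j] ** 3 for j in range(n)]

    def energy(u_prev, u, u_next):
        total = 0.0
        for j in range(n):
            ut = (u_next[j] - u_prev[j]) / (2.0 * dt)
            ux = (u[(j + 1) % n] - u[j]) / dx
            total += 0.5 * ut * ut + 0.5 * ux * ux + 0.25 * u[j] ** 4
        return total * dx

    u0 = [math.exp(-x * x) for x in xs]
    a0 = accel(u0)
    u1 = [u0[j] + 0.5 * dt * dt * a0[j] for j in range(n)]  # Taylor step, since u_t(0) = 0
    u_prev, u = u0, u1
    energies = []
    for _ in range(steps):
        a = accel(u)
        u_next = [2.0 * u[j] - u_prev[j] + dt * dt * a[j] for j in range(n)]
        energies.append(energy(u_prev, u, u_next))
        u_prev, u = u, u_next
    return energies
```

The exact energy of this initial data is ∫ ½u_x² + ¼u⁴ dx = √π/2^{3/2} + √π/8 ≈ 0.848. One expects the computed values to stay within about one percent of it up to time steps·dt = 7.5, by which point the two outgoing pulses are still far from the periodic boundary.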
But suppose one only has a much rougher solution, for instance an energy
class solution which has finite energy (2.24), but for which higher derivatives
of u need not exist in the classical sense.8 Then it is difficult to justify the
energy conservation law directly. However, it is still possible to obtain energy
conservation by the limiting argument. Namely, one takes the energy class
solution u at some initial time (e.g., t = 0) and approximates that initial data
(the initial position u(0) and initial velocity $u_t(0)$) by a much smoother (and
compactly supported) choice $(u^{(\varepsilon)}(0), u_t^{(\varepsilon)}(0))$ of initial data, which converges
back to $(u(0), u_t(0))$ in a suitable energy topology related to (2.24), which
we will not define here (it is based on Sobolev spaces, which are discussed
in Section 1.14). It then turns out (from the existence theory of the PDE
(2.23)) that one can extend the smooth initial data $(u^{(\varepsilon)}(0), u_t^{(\varepsilon)}(0))$ to other
times t, providing a smooth solution $u^{(\varepsilon)}$ to that data. For this solution, the
energy conservation law $E(u^{(\varepsilon)})(t) = E(u^{(\varepsilon)})(0)$ can be justified.
Now we take limits as ε → 0 (keeping t fixed). Since $(u^{(\varepsilon)}(0), u_t^{(\varepsilon)}(0))$
converges in the energy topology to $(u(0), u_t(0))$, and the energy functional
E is continuous in this topology, $E(u^{(\varepsilon)})(0)$ converges to E(u)(0). To conclude
the argument, we will also need $E(u^{(\varepsilon)})(t)$ to converge to E(u)(t),
which will be possible if $(u^{(\varepsilon)}(t), u_t^{(\varepsilon)}(t))$ converges in the energy topology
to $(u(t), u_t(t))$. This in turn follows from a fundamental fact (which requires
a certain amount of effort to prove) about the PDE (2.23), namely
that it is well-posed in the energy class. This means that not only do solutions
exist and are they unique for initial data in the energy class, but they also
depend continuously on the initial data in the energy topology; small
perturbations in the data lead to small perturbations in the solution, or more
formally, the map $(u(0), u_t(0)) \mapsto (u(t), u_t(t))$ from data to solution (say,
at some fixed time t) is continuous in the energy topology. This final fact
7 There are also fancier ways to see why the energy is conserved, using Hamiltonian or
Lagrangian mechanics or by the more general theory of stress-energy tensors, but we will not
discuss these here.
8 There is a non-trivial issue regarding how to make sense of the PDE (2.23) when u is only
in the energy class, since the terms $u_{tt}$ and $u_{xx}$ do not then make sense classically, but there are
standard ways to deal with this, e.g., using weak derivatives; see Section 1.13.


concludes the limiting argument and gives us the desired conservation law
E(u)(t) = E(u)(0).
Remark 2.7.5. It is important that one have a suitable well-posedness
theory in order to make the limiting argument work for rough solutions to
a PDE; without such a well-posedness theory, it is possible for quantities
which are formally conserved to cease being conserved when the solutions
become too rough or otherwise weak; energy, for instance, could disappear
into a singularity and not come back.
Example 2.7.6 (Maximum principle). The maximum principle is a funda-
mental tool in elliptic and parabolic PDE (for example, it is used heavily
in the proof of the Poincaré conjecture, discussed extensively in Poincaré’s
legacies, Vol. II ). Here is a model example of this principle:
Proposition 2.7.7. Let u : D → R be a smooth harmonic function on
the closed unit disk $D := \{(x, y) : x^2 + y^2 \le 1\}$. If M is a bound such that
u(x, y) ≤ M on the boundary $\partial D := \{(x, y) : x^2 + y^2 = 1\}$, then u(x, y) ≤ M
on the interior as well.

A naive attempt to prove Proposition 2.7.7 comes very close to working,
and goes like this: Suppose for contradiction that the proposition failed, thus
u exceeds M somewhere in the interior of the disk. Since u is continuous, and
the disk is compact, there must then be a point (x0 , y0 ) in the interior of the
disk where the maximum is attained. Undergraduate calculus then tells us
that uxx (x0 , y0 ) and uyy (x0 , y0 ) are non-positive, which almost contradicts
the harmonicity hypothesis uxx + uyy = 0. However, it is still possible that
uxx and uyy both vanish at (x0 , y0 ), so we do not yet get a contradiction.
But we can finish the proof by giving ourselves an epsilon of room. The
trick is to work not with the function u directly, but with the modified
function $u^{(\varepsilon)}(x, y) := u(x, y) + \varepsilon(x^2 + y^2)$, to boost the harmonicity into
subharmonicity. Indeed, we have $u^{(\varepsilon)}_{xx} + u^{(\varepsilon)}_{yy} = 4\varepsilon > 0$. The preceding
argument now shows that u(ε) cannot attain its maximum in the interior
of the disk; since it is bounded by M + ε on the boundary of the disk, we
conclude that u(ε) is bounded by M + ε on the interior of the disk as well.
Sending ε → 0 we obtain the claim.
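The argument above has a transparent discrete analogue, sketched here under illustrative assumptions that are not in the text: a square grid on [−1, 1]² instead of the disk, and the five-point Laplacian. A grid function is discrete-harmonic when each interior value is the average of its four neighbours. Jacobi iteration replaces interior values by such averages, so every iterate stays between the minimum and maximum of the boundary data (and of the zero initial guess). With boundary data x² − y², which is exactly discrete-harmonic since its second differences are 2 and −2, the iterates also converge to x² − y² itself. The helper names are hypothetical.

```python
def discrete_harmonic(boundary, n=20, sweeps=500):
    """Jacobi iteration for the five-point discrete Laplace equation on [-1, 1]^2.

    Boundary grid values are fixed by boundary(x, y); interior values start at 0 and are
    repeatedly replaced by the average of their four neighbours.
    """
    h = 2.0 / n
    grid = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i in (0, n) or j in (0, n):
                grid[i][j] = boundary(-1.0 + i * h, -1.0 + j * h)
    for _ in range(sweeps):
        new = [row[:] for row in grid]
        for i in range(1, n):
            for j in range(1, n):
                new[i][j] = 0.25 * (grid[i + 1][j] + grid[i - 1][j] + grid[i][j + 1] + grid[i][j - 1])
        grid = new
    return grid


def boundary_and_interior(grid):
    """Split the grid values into boundary values and interior values."""
    n = len(grid) - 1
    bdry = [grid[i][j] for i in range(n + 1) for j in range(n + 1) if i in (0, n) or j in (0, n)]
    inner = [grid[i][j] for i in range(1, n) for j in range(1, n)]
    return bdry, inner
```

In this discrete setting the ε-trick is equally clean. The five-point Laplacian of x² + y² is exactly 4, so u + ε(x² + y²) has strictly positive discrete Laplacian. It therefore cannot have an interior maximum, because at an interior maximum none of the four neighbours is larger than the centre.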
Remark 2.7.8. Of course, Proposition 2.7.7 can also be proven by much
more direct means, for instance via the Green’s function for the disk. How-
ever, the argument given is extremely robust and applies to a large class of
both linear and nonlinear elliptic and parabolic equations, including those
with rough variable coefficients.
Exercise 2.7.1. Use the maximum modulus principle to prove the Phragmén-Lindelöf
principle: if f is complex analytic on the strip {z : 0 ≤ Re(z) ≤ 1},
is bounded in magnitude by 1 on the boundary of this strip, and obeys
a growth condition $|f(z)| \le C e^{|z|^C}$ on the interior of the strip, then show
that f is bounded in magnitude by 1 throughout the strip. (Hint: Multiply
f by $e^{-\varepsilon z^m}$ for some even integer m.) See Section 1.11 for some applications
of this principle to interpolation theory.

Example 2.7.9 (Manipulating generalised functions). In PDE we are primarily
interested in smooth (classical) solutions; but for a variety of reasons
it is useful to also consider rougher solutions. Sometimes, these solutions
are so rough that they are no longer functions, but are measures, distribu-
tions (see Section 1.13), or some other concept of generalised function or
generalised solution. For instance, the fundamental solution to a PDE is
typically just a distribution or measure, rather than a classical function. A
typical example: a (sufficiently smooth) solution to the three-dimensional
wave equation −utt + Δu = 0 with initial position u(0, x) = 0 and initial
velocity ut (0, x) = g(x) is given by the classical formula

$$u(t) = t\, g * \sigma_t$$
for t > 0, where σt is the unique rotation-invariant probability measure
on the sphere $S_t := \{(x, y, z) \in \mathbf R^3 : x^2 + y^2 + z^2 = t^2\}$ of radius t, or
equivalently, the area element dS on that sphere divided by the surface
area 4πt2 of that sphere. (The convolution f ∗ μ of a smooth function f
and a (compactly supported) finite measure μ is defined by $f * \mu(x) := \int f(x - y)\,d\mu(y)$;
one can also use the distributional convolution defined in
Section 1.13.)
For this and many other reasons, it is important to manipulate measures
and distributions in various ways. For instance, in addition to convolving
functions with measures, it is also useful to convolve measures with mea-
sures; the convolution μ ∗ ν of two finite measures on Rn is defined as the
measure which assigns to each measurable set E in Rn , the measure

$$\mu * \nu(E) := \int\int 1_E(x + y)\,d\mu(x)\,d\nu(y). \tag{2.25}$$

For the sake of concreteness, let us focus on a specific question, namely to
compute (or at least estimate) the measure σ ∗ σ, where σ is the normalised
rotation-invariant measure on the unit circle {x ∈ R2 : |x| = 1}. It turns
out that while σ is not absolutely continuous with respect to Lebesgue mea-
sure m, the convolution is: d(σ ∗ σ) = f dm for some absolutely integrable
function f on R2 . But what is this function f ? It certainly is possible to
compute it from the definition (2.25) or by other methods (e.g., the Fourier
transform), but I would like to give one approach to computing these sorts
of expressions involving measures (or other generalised functions) based on


epsilon regularisation, which requires a certain amount of geometric
computation but which I find to be rather visual and conceptual, compared to
more algebraic approaches (e.g., those based on Fourier transforms). The
idea is to approximate a singular object, such as the singular measure σ,
by a smoother object σε , such as an absolutely continuous measure. For
instance, one can approximate σ by
$$d\sigma_\varepsilon := \frac{1}{m(A_\varepsilon)} 1_{A_\varepsilon}\,dm,$$
where Aε := {x ∈ R2 : 1 − ε ≤ |x| ≤ 1 + ε} is a thin annular neighbourhood
of the unit circle. It is clear that σε converges to σ in the vague topology,
which implies that σε ∗ σε converges to σ ∗ σ in the vague topology also.
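Since
$$\sigma_\varepsilon * \sigma_\varepsilon = \frac{1}{m(A_\varepsilon)^2} 1_{A_\varepsilon} * 1_{A_\varepsilon}\,dm,$$
we will be able to understand the limit f by first considering the function
$$f_\varepsilon(x) := \frac{1}{m(A_\varepsilon)^2} 1_{A_\varepsilon} * 1_{A_\varepsilon}(x) = \frac{m(A_\varepsilon \cap (x - A_\varepsilon))}{m(A_\varepsilon)^2}$$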
and then taking (weak) limits as ε → 0 to recover f .
Up to constants, one can compute from elementary geometry that m(Aε)
is comparable to ε, and m(Aε ∩ (x − Aε)) vanishes for |x| ≥ 2 + 2ε, and is
comparable to $\varepsilon^2(2 - |x|)^{-1/2}$ for 1 ≤ |x| ≤ 2 − 2ε (and of size $O(\varepsilon^{3/2})$ in the
transition region |x| = 2 + O(ε)), and is comparable to $\varepsilon^2|x|^{-1}$ for ε ≤ |x| ≤ 1
(and of size about O(ε) when |x| ≤ ε). (This is a good exercise for anyone who
wants practice in quickly computing the orders of magnitude of geometric
quantities such as areas; for such order of magnitude calculations, “quick and
dirty” geometric methods tend to work better here than the more algebraic
calculus methods you would have learned as an undergraduate.) The bounds
here are strong enough to allow one to take limits and conclude what f looks
like: it is comparable to $|x|^{-1}(2 - |x|)^{-1/2} 1_{|x| \le 2}$. And by being more careful
with the computations of area, one can compute the exact formula for f (x),
though I will not do so here.
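By (2.25), when μ and ν are probability measures, μ ∗ ν is the distribution of X + Y for independent random variables X ∼ μ and Y ∼ ν. So σ ∗ σ is the law of X + Y with X, Y independent and uniform on the unit circle. Writing X = e^{is} and Y = e^{it} (identifying R² with C) gives |X + Y| = 2|cos((s − t)/2)|. An elementary computation, which is not carried out in the text, then yields P(|X + Y| ≤ r) = 1 − (2/π) arccos(r/2) for 0 ≤ r ≤ 2. Differentiating in r and dividing by 2πr gives the density f(x) = 1/(π²|x|(4 − |x|²)^{1/2}) for |x| < 2, which is indeed comparable to |x|^{−1}(2 − |x|)^{−1/2} 1_{|x|≤2}. The Monte Carlo sketch below checks that distribution function against simulation. The sample size, the seed and the helper names are arbitrary, hypothetical choices.

```python
import math
import random


def sample_norms(n, seed=0):
    """Samples of |X + Y| for X, Y independent and uniform on the unit circle of R^2."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        s = rng.uniform(0.0, 2.0 * math.pi)
        t = rng.uniform(0.0, 2.0 * math.pi)
        out.append(math.hypot(math.cos(s) + math.cos(t), math.sin(s) + math.sin(t)))
    return out


def radial_cdf(r):
    """P(|X + Y| <= r) = 1 - (2 / pi) * arccos(r / 2), valid for 0 <= r <= 2."""
    return 1.0 - (2.0 / math.pi) * math.acos(r / 2.0)
```

For example, `radial_cdf(1.0)` equals 1/3 up to rounding, and about a third of the samples have norm at most 1.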
Remark 2.7.10. Epsilon regularisation also sheds light on why certain
operations on measures or distributions are not permissible. For instance,
squaring the Dirac delta function δ will not give a measure or distribution,
because if one looks at the squares δε² of some smoothed-out approximations
δε to the Dirac function (i.e., approximations to the identity), one sees that
their masses go to infinity in the limit ε → 0, and so cannot be integrated
against test functions uniformly in ε. On the other hand, derivatives of the
delta function, while no longer measures (the total variation of derivatives
of δε become unbounded), are at least still distributions (the integrals of
derivatives of δε against test functions remain convergent).
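To make the remark concrete, fix (as an illustrative choice) a bump function φ ∈ C_c^∞(R) with ∫φ = 1, and set δε(x) := ε^{−1}φ(x/ε). A change of variables gives, for every test function ψ:

```latex
\int_{\mathbf R} \delta_\varepsilon(x)^2\,dx = \frac{1}{\varepsilon}\int_{\mathbf R} \varphi(y)^2\,dy \to \infty,
\qquad
\int_{\mathbf R} |\delta_\varepsilon'(x)|\,dx = \frac{1}{\varepsilon}\int_{\mathbf R} |\varphi'(y)|\,dy \to \infty,
\qquad
\int_{\mathbf R} \delta_\varepsilon'(x)\,\psi(x)\,dx = -\int_{\mathbf R} \delta_\varepsilon(x)\,\psi'(x)\,dx \to -\psi'(0).
```

So the squares δε² have diverging mass (and their pairing with any ψ with ψ(0) ≠ 0 diverges), while the derivatives δ′ε have unbounded total variation but still converge when paired with test functions.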


Notes. This article first appeared at
terrytao.wordpress.com/2009/02/28.
Thanks to Harald, nicolaennio, and RK for corrections.
The article was a submission to the Tricki (www.tricki.org), an online
repository of mathematical tricks. A version of this article appears on that
site at www.tricki.org/article/Create an epsilon of room.
Dima pointed out that a variant of the epsilon regularisation argument
is used routinely in real algebraic geometry, when the underlying field R
is extended to the field of real Puiseux series in a parameter ε. After
performing computations in this extension, one eventually sets ε to zero to
recover results in the original real field.

Section 2.8

Amenability

Recently, I have been studying the concept of amenability on groups. This
concept can be defined in a combinatorial or finitary fashion, using Følner
sequences, and also in a more functional-analytic or infinitary fashion, using
invariant means. I wanted to get some practice passing back and forth
between these two definitions, so I wrote down some notes on how to do
this, and also how to take some facts about amenability that are usually
proven in one setting, and prove them instead in the other.

2.8.1. Equivalent definitions of amenability. For simplicity we will
restrict our attention to countable groups G. Given any f : G → R and
x ∈ G, we define the left-translation τx f : G → R by the formula $\tau_x f(y) := f(x^{-1}y)$.
Given g : G → R as well, we define the inner product
$\langle f, g\rangle := \sum_{x \in G} f(x) g(x)$ whenever the right-hand side is convergent.
All $\ell^p$ spaces are real-valued. The cardinality of a finite set A is denoted
|A|. The symmetric difference of two sets A, B is denoted AΔB.
A finite mean is a non-negative, finitely supported function μ : G →
R+ such that μ 1 (G) = 1. A mean is a non-negative linear functional
λ : ∞ (G) → R such that λ(1) = 1. Note that every finite mean μ can be
viewed as a mean λμ by the formula λμ (f ) := f, μ.
The following equivalences were established by Følner [Fo1955]:

Theorem 2.8.1. Let $G$ be a countable group. Then the following are equivalent:

(i) There exists a left-invariant mean $\lambda: \ell^\infty(G) \to \mathbb{R}$, i.e., a mean such that $\lambda(\tau_x f) = \lambda(f)$ for all $f \in \ell^\infty(G)$ and $x \in G$.

(ii) For every finite set $S \subset G$ and every $\varepsilon > 0$, there exists a finite mean $\nu$ such that $\|\nu - \tau_x \nu\|_{\ell^1(G)} \leq \varepsilon$ for all $x \in S$.
(iii) For every finite set $S \subset G$ and every $\varepsilon > 0$, there exists a non-empty finite set $A \subset G$ such that $|(x \cdot A) \Delta A| / |A| \leq \varepsilon$ for all $x \in S$.
(iv) There exists a sequence $A_n$ of non-empty finite sets such that $|(x \cdot A_n) \Delta A_n| / |A_n| \to 0$ as $n \to \infty$ for each $x \in G$. (Such a sequence is called a Følner sequence.)

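Proof. We shall use an argument of Namioka [Na1964].

(i) implies (ii): Suppose for contradiction that (ii) failed. Then there exist $S, \varepsilon$ such that for every finite mean $\nu$ one has $\|\nu - \tau_x \nu\|_{\ell^1(G)} > \varepsilon$ for at least one $x \in S$. The set $\{(\nu - \tau_x \nu)_{x \in S} : \nu \text{ a finite mean}\}$ is then a convex subset of $(\ell^1(G))^S$ that is bounded away from zero (in the norm $\max_{x \in S} \|\cdot\|_{\ell^1(G)}$). Applying the Hahn-Banach separation theorem (Theorem 1.5.14), there thus exists a linear functional $\rho \in ((\ell^1(G))^S)^*$ such that $\rho((\nu - \tau_x \nu)_{x \in S}) \geq 1$ for all finite means $\nu$. Since $((\ell^1(G))^S)^* \equiv \ell^\infty(G)^S$, there thus exist $m_x \in \ell^\infty(G)$ for $x \in S$ such that $\sum_{x \in S} \langle \nu - \tau_x \nu, m_x \rangle \geq 1$ for all finite means $\nu$. Since $\langle \tau_x \nu, m \rangle = \langle \nu, \tau_{x^{-1}} m \rangle$, this gives $\langle \nu, \sum_{x \in S} (m_x - \tau_{x^{-1}} m_x) \rangle \geq 1$. Specialising $\nu$ to the Kronecker means $\delta_y$, we see that $\sum_{x \in S} (m_x - \tau_{x^{-1}} m_x) \geq 1$ pointwise. Applying the mean $\lambda$ (which is non-negative with $\lambda(1) = 1$), we conclude that $\sum_{x \in S} (\lambda(m_x) - \lambda(\tau_{x^{-1}} m_x)) \geq 1$. But this contradicts the left-invariance of $\lambda$.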
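(ii) implies (iii): Fix $S$ (which we can take to be non-empty), and let $\varepsilon > 0$. By (ii) we can find a finite mean $\nu$ such that
$$\|\nu - \tau_x \nu\|_{\ell^1(G)} < \varepsilon / |S|$$
for all $x \in S$.
Using the layer-cake decomposition, we can write $\nu = \sum_{i=1}^k c_i 1_{E_i}$ for some nested non-empty sets $E_1 \supset E_2 \supset \cdots \supset E_k$ and some positive constants $c_i$. As $\nu$ is a mean, we have $\sum_{i=1}^k c_i |E_i| = 1$. On the other hand, $\tau_x \nu = \sum_{i=1}^k c_i 1_{x \cdot E_i}$, and since both families $E_i$ and $x \cdot E_i$ are nested, one has the pointwise identity $|\nu - \tau_x \nu| = \sum_{i=1}^k c_i 1_{(x \cdot E_i) \Delta E_i}$. We conclude that
$$\sum_{i=1}^k c_i |(x \cdot E_i) \Delta E_i| \leq \frac{\varepsilon}{|S|} \sum_{i=1}^k c_i |E_i|$$
for all $x \in S$, and thus
$$\sum_{i=1}^k c_i \sum_{x \in S} |(x \cdot E_i) \Delta E_i| \leq \varepsilon \sum_{i=1}^k c_i |E_i|.$$
By the pigeonhole principle, there thus exists $i$ such that
$$\sum_{x \in S} |(x \cdot E_i) \Delta E_i| \leq \varepsilon |E_i|,$$
and the claim follows with $A := E_i$.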

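(iii) implies (iv): Write the countable group $G$ as the increasing union of finite sets $S_n$ and apply (iii) with $\varepsilon := 1/n$ and $S := S_n$ to create the set $A_n$.
(iv) implies (i): Use the Hahn-Banach theorem to select an infinite mean $\rho \in \ell^\infty(\mathbb{N})^* \backslash \ell^1(\mathbb{N})$, that is, a mean on $\ell^\infty(\mathbb{N})$ that vanishes on every sequence tending to zero, and define $\lambda(m) := \rho((\langle m, \frac{1}{|A_n|} 1_{A_n} \rangle)_{n \in \mathbb{N}})$. Left-invariance follows since $|\langle \tau_x m - m, \frac{1}{|A_n|} 1_{A_n} \rangle| \leq \|m\|_{\ell^\infty(G)} |(x^{-1} \cdot A_n) \Delta A_n| / |A_n| \to 0$. (Alternatively, one can define $\lambda(m)$ to be an ultralimit of the $\langle m, \frac{1}{|A_n|} 1_{A_n} \rangle$.)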

Any countable group obeying any (and hence all) of (i)–(iv) is called
amenable.
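Unlike the other implications, the step (ii) implies (iii) is constructive once a finite almost-invariant mean is in hand. The following is a minimal Python sketch of that step, assuming $G = \mathbb{Z}$ written additively (so $\tau_x \nu(y) = \nu(y - x)$) and finite means stored as dictionaries from points of the support to positive `Fraction` weights; the helper names `translate`, `l1_dist`, `layer_cake` and `best_level_set` are hypothetical. It checks the identity $\|\nu - \tau_x \nu\|_{\ell^1} = \sum_i c_i |(x \cdot E_i) \Delta E_i|$ and then selects the level set $E_i$ with the smallest Følner ratio, which by the pigeonhole argument is at most the weighted average of these ratios.

```python
from fractions import Fraction

def translate(nu, x):
    """Left translation on Z (additive): (tau_x nu)(y) = nu(y - x), a shift of the support by x."""
    return {y + x: v for y, v in nu.items()}

def l1_dist(f, g):
    """l^1 distance between two finitely supported functions on Z."""
    return sum(abs(f.get(k, 0) - g.get(k, 0)) for k in set(f) | set(g))

def layer_cake(nu):
    """Return [(c_i, E_i)] with nu = sum_i c_i 1_{E_i}, E_1 ⊃ E_2 ⊃ ... and every c_i > 0."""
    layers, prev = [], 0
    for t in sorted(set(nu.values())):  # distinct (positive) values, increasing
        layers.append((t - prev, frozenset(y for y, v in nu.items() if v >= t)))
        prev = t
    return layers

def best_level_set(nu, S):
    """Pigeonhole step: the level set E_i minimising sum_{x in S} |(x + E_i) Δ E_i| / |E_i|."""
    def ratio(E):
        return Fraction(sum(len({y + x for y in E} ^ E) for x in S), len(E))
    return min((E for _, E in layer_cake(nu)), key=ratio)

# A "tent" finite mean on Z: nu(y) = (50 - |y|) / 2500 for |y| < 50, total mass 1.
nu = {y: Fraction(50 - abs(y), 2500) for y in range(-49, 50)}
A = best_level_set(nu, S=[1, -1])
print(len(A), min(A), max(A))
```

For this tent-shaped mean the level sets are the intervals $\{-j, \ldots, j\}$, and the selected set is the largest one, $\{-49, \ldots, 49\}$, whose ratio is $4/99$. Exact `Fraction` arithmetic is used so that the layer-cake identity can be checked with equality rather than a floating-point tolerance.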
Remark 2.8.2. The above equivalences are proven in a non-constructive manner, due to the use of the Hahn-Banach theorem (as well as the contradiction argument). Thus, for instance, it is not immediately obvious how to convert an invariant mean into a Følner sequence, despite the above equivalences.
2.8.2. Examples of amenable groups. We give some model examples
of amenable and non-amenable groups:
Proposition 2.8.3. Every finite group is amenable.

Proof. Trivial (either using invariant means or Følner sequences). 


Proposition 2.8.4. The integers $\mathbb{Z} = (\mathbb{Z}, +)$ are amenable.

Proof. One can take the sets $A_N = \{1, \ldots, N\}$ as the Følner sequence, or an ultralimit of the averages $\frac{1}{N} \sum_{n=1}^N m(n)$ as an invariant mean $\lambda(m)$.
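For $\mathbb{Z}$ both halves of this proof can be checked numerically. The sketch below is a minimal illustration (the function names are hypothetical): `folner_ratio` computes $|(x + A_N) \Delta A_N| / |A_N|$, which equals $2\min(|x|, N)/N$ and so tends to zero for each fixed $x$, and `folner_average` computes the averages $\langle m, \frac{1}{|A_N|} 1_{A_N} \rangle$ used in the proof of Theorem 2.8.1.

```python
def folner_ratio(x, N):
    """|(x + A_N) Δ A_N| / |A_N| for A_N = {1, ..., N} in the integers."""
    A = set(range(1, N + 1))
    return len({a + x for a in A} ^ A) / len(A)

def folner_average(m, N):
    """<m, 1_{A_N}/|A_N|>: the average of a bounded function m over {1, ..., N}."""
    return sum(m(a) for a in range(1, N + 1)) / N

for N in (10, 100, 1000):
    print(N, folner_ratio(3, N), folner_average(lambda n: n % 2 == 0, N))
```

For a periodic $m$ such as the indicator of the even integers these averages converge, but for a general bounded $m$ they need not, which is why an ultralimit (or another Hahn-Banach type extension) is used to produce the invariant mean.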
Proposition 2.8.5. The free group $F_2$ on two generators $e_1, e_2$ is not amenable.

Proof. We first argue using invariant means. Suppose for contradiction that one had an invariant mean $\lambda$. Let $E_1, E_2, E_{-1}, E_{-2} \subset F_2$ be the sets of all reduced words beginning with $e_1, e_2, e_1^{-1}, e_2^{-1}$, respectively. Observe that $E_2 \subset (e_1^{-1} \cdot E_1) \backslash E_1$, thus $\lambda(1_{E_2}) \leq \lambda(\tau_{e_1^{-1}} 1_{E_1}) - \lambda(1_{E_1})$. By invariance we conclude that $\lambda(1_{E_2}) = 0$; similarly for $1_{E_1}, 1_{E_{-1}}, 1_{E_{-2}}$. Since the identity element clearly must have mean zero, and $F_2$ is the disjoint union of the identity and $E_1, E_2, E_{-1}, E_{-2}$, we conclude that $\lambda(1) = 0$, which is absurd.
Now we argue using Følner sequences. If $F_2$ were amenable, then for any $\varepsilon > 0$ we could find a finite non-empty set $A$ such that $x \cdot A$ differs from $A$ by at most $\varepsilon|A|$ points for $x = e_1, e_2, e_1^{-1}, e_2^{-1}$. The set $e_1 \cdot (A \backslash E_{-1})$ is contained in $e_1 \cdot A$ and in $E_1$, and so
$$|e_1 \cdot (A \backslash E_{-1})| \leq |A \cap E_1| + \varepsilon|A|,$$
and thus
$$|A| - |A \cap E_{-1}| \leq |A \cap E_1| + \varepsilon|A|.$$

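Similarly for the other three generators $e_2, e_1^{-1}, e_2^{-1}$ (permuting the roles of $E_1, E_2, E_{-1}, E_{-2}$ accordingly). Summing up over all four generators, and using that $A \cap E_1, A \cap E_2, A \cap E_{-1}, A \cap E_{-2}$ are disjoint subsets of $A$, we obtain
$$4|A| - |A| \leq |A| + 4\varepsilon|A|,$$
leading to a contradiction for $\varepsilon$ small enough (any $\varepsilon < 1/2$ will do).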
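To see the Følner obstruction concretely, one can test the most natural candidates, the balls $B_r$ of reduced words of length at most $r$. The following Python sketch is illustrative only: the encoding of $e_1, e_2, e_1^{-1}, e_2^{-1}$ as the integers `1, 2, -1, -2` and all function names are assumptions of this example. It computes $|(s \cdot B_r) \backslash B_r| / |B_r|$ for each generator $s$, which works out to $3^r / (2 \cdot 3^r - 1)$, always above $1/2$. Checking balls alone does not prove non-amenability; the argument above handles arbitrary finite sets.

```python
GENERATORS = (1, 2, -1, -2)  # encodes e1, e2, e1^{-1}, e2^{-1}

def left_mul(s, word):
    """Left-multiply a reduced word (a tuple of generators) by s, cancelling s s^{-1}."""
    if word and word[0] == -s:
        return word[1:]
    return (s,) + word

def ball(r):
    """All reduced words of length at most r (the identity is the empty tuple)."""
    seen, frontier = {()}, {()}
    for _ in range(r):
        # Words one letter longer; subtracting `seen` discards the cancelled (shorter) ones.
        frontier = {left_mul(s, w) for w in frontier for s in GENERATORS} - seen
        seen |= frontier
    return seen

def escape_ratio(s, A):
    """Fraction of the points of s · A that lie outside A."""
    return len({left_mul(s, w) for w in A} - A) / len(A)

for r in range(1, 6):
    B = ball(r)
    print(r, len(B), max(escape_ratio(s, B) for s in GENERATORS))
```

The ratios decrease towards $1/2$ but never drop below it, matching the threshold $\varepsilon < 1/2$ in the proof.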
Remark 2.8.6. The non-amenability of the free group is related to the
Banach-Tarski paradox (see Section 2.2).

Now we generate some more amenable groups.


Proposition 2.8.7. Let 0 → H → G → K → 0 be a short exact sequence
of countable groups (thus H can be identified with a normal subgroup of G,
and K can be identified with G/H). If H and K are amenable, then G is
amenable also.

Proof. Using invariant means, there is a very short proof: Given invariant means $\lambda_H, \lambda_K$ for $H, K$, we can build an invariant mean $\lambda_G$ for $G$ by the formula
$$\lambda_G(f) := \lambda_K(F)$$
for any $f \in \ell^\infty(G)$, where $F: K \to \mathbb{R}$ is the function defined as $F(xH) := \lambda_H(f(x\cdot))$ for all cosets $xH$ (note that the left-invariance of $\lambda_H$ shows that the exact choice of coset representative $x$ is irrelevant). (One can view $\lambda_G$ as sort of a “product measure” of the $\lambda_H$ and $\lambda_K$.)
Now we argue using Følner sequences instead. Let $E_n, F_n$ be Følner sequences for $H, K$, respectively. Let $S$ be a finite subset of $G$, and let $\varepsilon > 0$. We would like to find a finite non-empty subset $A \subset G$ such that $|(x \cdot A) \backslash A| \leq \varepsilon|A|$ for all $x \in S$; this will demonstrate amenability. (Note that by taking $S$ to be symmetric, we can replace $|(x \cdot A) \backslash A|$ with $|(x \cdot A) \Delta A|$ without difficulty.)
By taking $n$ large enough, we can find $F_n$ such that $\pi(x) \cdot F_n$ differs from $F_n$ by at most $\varepsilon|F_n|/2$ elements for all $x \in S$, where $\pi: G \to K$ is the projection map. Now, let $\tilde F_n$ be a pre-image of $F_n$ in $G$, that is, a set containing exactly one representative of each coset in $F_n$. Let $T$ be the set of all group elements $t \in H$ such that $S \cdot \tilde F_n$ intersects $\tilde F_n \cdot t$. Observe that $T$ is finite. Thus, by taking $m$ large enough, we can find $E_m$ such that $t \cdot E_m$ differs from $E_m$ by at most $\varepsilon|E_m|/(2|T|)$ points for all $t \in T$.
Now set $A := \tilde F_n \cdot E_m = \{zy : y \in E_m, z \in \tilde F_n\}$. Observe that the sets $z \cdot E_m$ for $z \in \tilde F_n$ lie in disjoint cosets of $H$ and so $|A| = |E_m||\tilde F_n| = |E_m||F_n|$.
Now take $x \in S$, and consider an element of $(x \cdot A) \backslash A$. This element must take the form $xzy$ for some $y \in E_m$ and $z \in \tilde F_n$. The coset of $H$ that $xzy$ lies in is given by $\pi(x)\pi(z)$. Suppose first that $\pi(x)\pi(z)$ lies outside of $F_n$. By construction, this occurs for at most $\varepsilon|F_n|/2$ choices of $z$, leading to at most $\varepsilon|E_m||F_n|/2 = \varepsilon|A|/2$ elements in $(x \cdot A) \backslash A$.

Now suppose instead that $\pi(x)\pi(z)$ lies in $F_n$. Then we have $xz = z't$ for some $z' \in \tilde F_n$ and $t \in T$, by construction of $T$, and so $xzy = z'ty$. But as $xzy$ lies outside of $A$, $ty$ must lie outside of $E_m$. But by construction of $E_m$, there are at most $\varepsilon|E_m|/(2|T|)$ possible choices of $y$ that do this for each fixed $z$ (which determines $z'$ and $t$), leading to at most $|F_n| \cdot \varepsilon|E_m|/(2|T|) \leq \varepsilon|A|/2$ such elements. We thus have $|(x \cdot A) \backslash A| \leq \varepsilon|A|$ as required.
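The construction $A = \tilde F_n \cdot E_m$ can be tried out on a concrete non-abelian extension. In the sketch below, which is an illustration under stated assumptions rather than part of the argument above, $G$ is the discrete Heisenberg group of integer upper unitriangular $3 \times 3$ matrices, encoded as triples $(a, b, c)$ with product $(a, b, c)(a', b', c') = (a + a', b + b', c + c' + ab')$. We take $H$ to be the centre $\{(0, 0, c)\}$, so that $K = G/H \cong \mathbb{Z}^2$, together with the (hypothetically chosen) sets $\tilde F_n = \{(a, b, 0) : 0 \leq a, b < n\}$ and $E_m = \{(0, 0, c) : 0 \leq c < m\}$. Because of the shear term $ab'$, the set $T$ contains the central elements $(0, 0, b)$ for $0 \leq b < n$, so $m$ has to be large compared with $n$, as in the choice of $E_m$ after $T$.

```python
from itertools import product

def heis_mul(g, h):
    """Product in the Heisenberg group: (a,b,c)(a',b',c') = (a+a', b+b', c+c'+a*b')."""
    a, b, c = g
    a2, b2, c2 = h
    return (a + a2, b + b2, c + c2 + a * b2)

def product_set(n, m):
    """A = F~_n · E_m: F~_n lifts an n-by-n box of K = Z^2, E_m is a block of the centre."""
    F_lift = [(a, b, 0) for a, b in product(range(n), repeat=2)]
    E = [(0, 0, c) for c in range(m)]
    return {heis_mul(z, y) for z in F_lift for y in E}

# The standard generators of G together with their inverses.
GENS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def worst_ratio(A):
    """Largest fraction of x · A lying outside A, over the generators x in GENS."""
    return max(len({heis_mul(x, g) for g in A} - A) / len(A) for x in GENS)

for n, m in [(10, 10), (10, 100), (16, 256)]:
    print(n, m, worst_ratio(product_set(n, m)))
```

With $m = n$ the worst ratio stays near one half (it is $0.505$ for $n = 10$), while taking $m = n^2$ makes it shrink as $n$ grows ($0.1405$ for $n = 10$), illustrating why $E_m$ is chosen only after $\tilde F_n$ and $T$.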
Proposition 2.8.8. Let $G_1 \subset G_2 \subset \cdots$ be a sequence of countable amenable groups. Then $G := \bigcup_n G_n$ is amenable.

Proof. We first use invariant means. An invariant mean $\lambda_n$ on $\ell^\infty(G_n)$ induces a mean $f \mapsto \lambda_n(f|_{G_n})$ on $\ell^\infty(G)$ which is invariant with respect to translations by $G_n$. Taking an ultralimit of these means, we obtain the claim.
Now we use Følner sequences. Given any finite set $S \subset G$ and $\varepsilon > 0$, we have $S \subset G_n$ for some $n$. As $G_n$ is amenable, we can find a finite non-empty set $A \subset G_n$ such that $|(x \cdot A) \Delta A| \leq \varepsilon|A|$ for all $x \in S$, and the claim follows.
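As an example of the Følner argument, the dyadic rationals $\mathbb{Z}[1/2] = \bigcup_n 2^{-n}\mathbb{Z}$ are amenable. The following minimal sketch assumes the elements of $S$ are given as Python `Fraction`s whose denominators are powers of two; the function names are hypothetical. It picks $n$ with $S \subset 2^{-n}\mathbb{Z}$ and uses the Følner set $A = \{k / 2^n : 0 \leq k < N\}$ of $2^{-n}\mathbb{Z} \cong \mathbb{Z}$.

```python
from fractions import Fraction

def dyadic_folner_set(S, N):
    """A = {k / 2^n : 0 <= k < N}, where n is chosen so that S ⊂ 2^{-n} Z."""
    # Assumes every denominator is a power of two, so bit_length() - 1 is its base-2 log.
    n = max(x.denominator.bit_length() - 1 for x in S)
    return {Fraction(k, 2 ** n) for k in range(N)}

def folner_ratio(x, A):
    """|(x + A) Δ A| / |A| in the additive group of dyadic rationals."""
    return len({x + a for a in A} ^ A) / len(A)

S = [Fraction(1, 2), Fraction(3, 4), Fraction(-5, 8)]
A = dyadic_folner_set(S, 1000)
print([folner_ratio(x, A) for x in S])
```

Here $n = 3$, each $x \in S$ acts on $A$ as a shift by $8x$ steps, and the ratios are $2 \cdot 8|x| / N$, which can be made as small as desired by enlarging $N$.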
Proposition 2.8.9. Every countable virtually solvable group $G$ is amenable.

Proof. By definition, every virtually solvable group contains a solvable group of finite index, and thus contains a normal solvable subgroup of finite index. (Note that every subgroup $H$ of $G$ of index $I$ contains a normal subgroup of index at most $I!$, namely the kernel of the action of $G$ on $G/H$.) By Proposition 2.8.7 and Proposition 2.8.3, we may thus reduce to the case when $G$ is solvable. By inducting on the derived length of this solvable group using Proposition 2.8.7 again, it suffices to verify this when the group is abelian. By Proposition 2.8.8, it suffices to verify this when the group is abelian and finitely generated. By Proposition 2.8.7 again, it suffices to verify this when the group is cyclic. But this follows from Proposition 2.8.3 and Proposition 2.8.4.

Notes. This article first appeared at terrytao.wordpress.com/2009/04/14.
Thanks to Orr for corrections.
Danny Calegari noted the application of amenability to that of obtaining asymptotic limit objects in dynamics (e.g., via the ergodic theorem for amenable groups). Jason Behrstock mentioned an amusing characterisation of amenability, as those groups which do not admit successful “Ponzi schemes”, schemes in which each group element passes on a bounded amount of money to its neighbours (in a Cayley graph) in such a way that everyone profits. There was some ensuing discussion as to the related question of whether amenable and non-amenable groups admit non-trivial bounded harmonic functions.

Bibliography

[AgKaSa2004] M. Agrawal, N. Kayal, N. Saxena, PRIMES is in P, Annals of Mathematics 160 (2004), no. 2, 781–793.
[AjSz1974] M. Ajtai, E. Szemerédi, Sets of lattice points that form no squares, Stud. Sci. Math. Hungar. 9 (1974), 9–11 (1975).
[AlDuLeRoYu1994] N. Alon, R. Duke, H. Lefmann, Y. Rödl, R. Yuster, The algorithmic aspects of the regularity lemma, J. Algorithms 16 (1994), no. 1, 80–10.
[AlSh2008] N. Alon, A. Shapira, Every monotone graph property is testable, SIAM J. Comput. 38 (2008), no. 2, 505–522.
[AlSp2008] N. Alon, J. Spencer, The Probabilistic Method. Third edition. With an appendix on the life and work of Paul Erdős. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons, Inc., Hoboken, NJ, 2008.
[Au2008] T. Austin, On exchangeable random variables and the statistics of large graphs and hypergraphs, Probab. Surv. 5 (2008), 80–145.
[Au2009] T. Austin, Deducing the multidimensional Szemerédi Theorem from an infinitary removal lemma, preprint.
[Au2009b] T. Austin, Deducing the Density Hales-Jewett Theorem from an infinitary removal lemma, preprint.
[AuTa2010] T. Austin, T. Tao, On the testability and repair of hereditary hypergraph properties, preprint.
[Ax1968] J. Ax, The elementary theory of finite fields, Ann. of Math. 88 (1968), 239–271.
[BaGiSo1975] T. Baker, J. Gill, R. Solovay, Relativizations of the P =? NP question, SIAM J. Comput. 4 (1975), no. 4, 431–442.
[Be1975] W. Beckner, Inequalities in Fourier analysis, Ann. of Math. 102 (1975), no. 1, 159–182.
[BeTaZi2009] V. Bergelson, T. Tao, T. Ziegler, An inverse theorem for the uniformity seminorms associated with the action of $\mathbb{F}^{\omega}$, preprint.
[BeLo1976] J. Bergh, J. Löfström, Interpolation spaces. An introduction. Grundlehren der Mathematischen Wissenschaften, No. 223. Springer-Verlag, Berlin–New York, 1976.
[BiRo1962] A. Bialynicki-Birula, M. Rosenlicht, Injective morphisms of real algebraic varieties, Proc. Amer. Math. Soc. 13 (1962), 200–203.

[BoKe1996] E. Bogomolny, J. Keating, Random matrix theory and the Riemann zeros. II. n-point correlations, Nonlinearity 9 (1996), no. 4, 911–935.
[Bo1969] A. Borel, Injective endomorphisms of algebraic varieties, Arch. Math. (Basel) 20 (1969), 531–537.
[Bo1999] J. Bourgain, On the dimension of Kakeya sets and related maximal inequalities, Geom. Funct. Anal. 9 (1999), no. 2, 256–282.
[BoBr2003] J. Bourgain, H. Brezis, On the equation div Y = f and application to control of phases, J. Amer. Math. Soc. 16 (2003), no. 2, 393–426.
[BudePvaR2008] G. Buskes, B. de Pagter, A. van Rooij, The Loomis-Sikorski theorem revisited, Algebra Universalis 58 (2008), 413–426.
[ChPa2009] T. Chen, N. Pavlovic, The quintic NLS as the mean field limit of a boson gas with three-body interactions, preprint.
[ClEdGuShWe1990] K. Clarkson, H. Edelsbrunner, L. Guibas, M. Sharir, E. Welzl, Combinatorial complexity bounds for arrangements of curves and spheres, Discrete Comput. Geom. 5 (1990), no. 2, 99–160.
[Co1989] J. B. Conrey, More than two fifths of the zeros of the Riemann zeta function are on the critical line, J. Reine Angew. Math. 399 (1989), 1–26.
[Dy1970] F. Dyson, Correlations between eigenvalues of a random matrix, Comm. Math. Phys. 19 (1970), 235–250.
[ElSz2008] G. Elek, B. Szegedy, A measure-theoretic approach to the theory of dense hypergraphs, preprint.
[ElObTa2009] J. Ellenberg, R. Oberlin, T. Tao, The Kakeya set and maximal conjectures for algebraic varieties over finite fields, preprint.
[ElVeWe2009] J. Ellenberg, A. Venkatesh, C. Westerland, Homological stability for Hurwitz spaces and the Cohen-Lenstra conjecture over function fields, preprint.
[ErKa1940] P. Erdős, M. Kac, The Gaussian Law of Errors in the Theory of Additive Number Theoretic Functions, American Journal of Mathematics 62 (1940), no. 1/4, 738–742.
[EsKePoVe2008] L. Escauriaza, C. E. Kenig, G. Ponce, L. Vega, Hardy’s uncertainty principle, convexity and Schrödinger evolutions, J. Eur. Math. Soc. (JEMS) 10 (2008), no. 4, 883–907.
[Fa2003] K. Falconer, Fractal geometry, Mathematical foundations and applications. Second edition. John Wiley & Sons, Inc., Hoboken, NJ, 2003.
[FeSt1972] C. Fefferman, E. M. Stein, $H^p$ spaces of several variables, Acta Math. 129 (1972), no. 3–4, 137–193.
[FiMaSh2007] E. Fischer, A. Matsliach, A. Shapira, Approximate Hypergraph Partitioning and Applications, Proc. of FOCS 2007, 579–589.
[Fo2000] G. Folland, Real Analysis, Modern techniques and their applications. Second edition. Pure and Applied Mathematics (New York). A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York, 1999.
[Fo1955] E. Følner, On groups with full Banach mean value, Math. Scand. 3 (1955), 243–254.
[Fo1974] J. Fournier, Majorants and $L^p$ norms, Israel J. Math. 18 (1974), 157–166.
[Fr1973] G. Freiman, Groups and the inverse problems of additive number theory, Number-theoretic studies in the Markov spectrum and in the structural theory of set addition, pp. 175–183. Kalinin. Gos. Univ., Moscow, 1973.

[Fu1977] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. Analyse Math. 31 (1977), 204–256.
[FuKa1989] H. Furstenberg, Y. Katznelson, A density version of the Hales-Jewett theorem for k = 3, Graph theory and combinatorics (Cambridge, 1988). Discrete Math. 75 (1989), no. 1–3, 227–241.
[FuKa1991] H. Furstenberg, Y. Katznelson, A density version of the Hales-Jewett theorem, J. Anal. Math. 57 (1991), 64–119.
[GiTr1998] D. Gilbarg, N. Trudinger, Elliptic partial differential equations of second order. Reprint of the 1998 edition. Classics in Mathematics. Springer-Verlag, Berlin, 2001.
[GoMo1987] D. Goldston, H. Montgomery, Pair correlation of zeros and primes in short intervals, Analytic number theory and Diophantine problems (Stillwater, OK, 1984), 183–203, Progr. Math., 70, Birkhäuser Boston, Boston, MA, 1987.
[Go1993] W. T. Gowers, B. Maurey, The unconditional basic sequence problem, J. Amer. Math. Soc. 6 (1993), no. 4, 851–874.
[Gr1992] A. Granville, On elementary proofs of the prime number theorem for arithmetic progressions, without characters, Proceedings of the Amalfi Conference on Analytic Number Theory (Maiori, 1989), 157–194, Univ. Salerno, Salerno, 1992.
[Gr2005] A. Granville, It is easy to determine whether a given integer is prime, Bull. Amer. Math. Soc. (N.S.) 42 (2005), no. 1, 3–38.
[GrSo2007] A. Granville, K. Soundararajan, Large character sums: pretentious characters and the Pólya-Vinogradov theorem, J. Amer. Math. Soc. 20 (2007), no. 2, 357–384.
[GrTa2007] B. Green, T. Tao, The distribution of polynomials over finite fields, with applications to the Gowers norms, preprint.
[GrTaZi2009] B. Green, T. Tao, T. Ziegler, An inverse theorem for the Gowers $U^4$ norm, preprint.
[Gr1999] M. Gromov, Endomorphisms of symbolic algebraic varieties, J. Eur. Math. Soc. (JEMS) 1 (1999), no. 2, 109–197.
[Gr1966] A. Grothendieck, Éléments de géométrie algébrique. IV. Étude locale des schémas et des morphismes de schémas. III., Inst. Hautes Études Sci. Publ. Math. No. 28 (1966), 255 pp.
[GyMaRu2008] K. Gyarmati, M. Matolcsi, I. Ruzsa, Plünnecke’s inequality for different summands, Building bridges, 309–320, Bolyai Soc. Math. Stud., 19, Springer, Berlin, 2008.
[Ho1990] L. Hörmander, The analysis of linear partial differential operators. I–IV. Reprint of the second (1990) edition. Classics in Mathematics. Springer-Verlag, Berlin, 2003.
[HoKrPeVi2006] J. Hough, M. Krishnapur, Y. Peres, B. Virág, Determinantal processes and independence, Probab. Surv. 3 (2006), 206–229.
[Hr2009] E. Hrushovski, Stable group theory and approximate subgroups, preprint.
[Hu1968] R. Hunt, On the convergence of Fourier series, Orthogonal Expansions and their Continuous Analogues (Proc. Conf., Edwardsville, Ill., 1967), 235–255, Southern Illinois Univ. Press, Carbondale, Ill., 1968.
[Is2006] Y. Ishigami, A Simple Regularization of Hypergraphs, preprint.
[IwKo2004] H. Iwaniec, E. Kowalski, Analytic number theory. American Mathematical Society Colloquium Publications, 53. American Mathematical Society, Providence, RI, 2004.

[Jo1986] D. Joyner, Distribution theorems of L-functions, Pitman Research Notes in Mathematics Series, 142. Longman Scientific & Technical, Harlow; John Wiley & Sons, Inc., New York, 1986.
[KaVe1983] V. Kaimanovich, A. Vershik, Random walks on discrete groups: boundary and entropy, Ann. Probab. 11 (1983), no. 3, 457–490.
[KiVi2009] R. Killip, M. Visan, Nonlinear Schrödinger Equations at critical regularity, preprint.
[KiScSt2008] K. Kirkpatrick, B. Schlein, G. Staffilani, Derivation of the two dimensional nonlinear Schrödinger equation from many body quantum dynamics, preprint.
[KlMa2008] S. Klainerman, M. Machedon, On the uniqueness of solutions to the Gross-Pitaevskii hierarchy, Comm. Math. Phys. 279 (2008), no. 1, 169–185.
[Ku1999] K. Kurdyka, Injective endomorphisms of real algebraic sets are surjective, Math. Ann. 313 (1999), no. 1, 69–82.
[La2001] I. Łaba, Fuglede’s conjecture for a union of two intervals, Proc. Amer. Math. Soc. 129 (2001), no. 10, 2965–2972.
[LaTa2001] I. Łaba, T. Tao, An x-ray transform estimate in $\mathbb{R}^n$, Rev. Mat. Iberoamericana 17 (2001), no. 2, 375–407.
[La1996] A. Laurincikas, Limit theorems for the Riemann zeta-function. Mathematics and its Applications, 352. Kluwer Academic Publishers Group, Dordrecht, 1996.
[Le2009] A. Leibman, A canonical form and the distribution of values of generalised polynomials, preprint.
[Le2000] V. Lev, Restricted Set Addition in Groups I: The Classical Setting, J. London Math. Soc. 62 (2000), no. 1, 27–40.
[LiLo2000] E. Lieb, E. Loss, Analysis. Second edition. Graduate Studies in Mathematics, 14. American Mathematical Society, Providence, RI, 2001.
[Li1853] J. Liouville, Sur l’equation aux differences partielles, J. Math. Pure et Appl. 18 (1853), 71–74.
[LiTz1971] J. Lindenstrauss, L. Tzafriri, On the complemented subspaces problem, Israel J. Math. 9 (1971), 263–269.
[Lo1946] L. H. Loomis, On the representation of σ-complete Boolean algebras, Bull. Amer. Math. Soc. 53 (1947), 757–760.
[LoSz2007] L. Lovász, B. Szegedy, Szemerédi’s lemma for the analyst, Geom. Funct. Anal. 17 (2007), no. 1, 252–270.
[Ly2003] R. Lyons, Determinantal probability measures, Publ. Math. Inst. Hautes Études Sci. No. 98 (2003), 167–212.
[Ma2008] M. Madiman, On the entropy of sums, preprint.
[MaKaSo2004] W. Magnus, A. Karrass, D. Solitar, Presentations of Groups in Terms of Generators and Relations, Dover Publications, 2004.
[Ma1999] R. Matthews, The power of one, New Scientist, 10 July 1999, p. 26.
[Ma1995] P. Mattila, Geometry of sets and measures in Euclidean spaces. Fractals and rectifiability. Cambridge Studies in Advanced Mathematics, 44. Cambridge University Press, Cambridge, 1995.
[Ma1959] B. Mazur, On embeddings of spheres, Bull. Amer. Math. Soc. 65 (1959), 59–65.
[Mo2009] R. Moser, A constructive proof of the Lovász local lemma, Proceedings of the 41st Annual ACM Symposium on Theory of Computing 2009, 343–350.

[Na1964] I. Namioka, Følner’s conditions for amenable semi-groups, Math. Scand. 15 (1964), 18–28.
[ON1963] R. O’Neil, Convolution operators and L(p, q) spaces, Duke Math. J. 30 (1963), 129–142.
[Po2009] D.H.J. Polymath, A new proof of the density Hales-Jewett theorem, preprint.
[PCM] The Princeton Companion to Mathematics. Edited by Timothy Gowers, June Barrow-Green and Imre Leader. Princeton University Press, Princeton, NJ, 2008.
[Ra1959] H. Rademacher, On the Phragmén-Lindelöf theorem and some applications, Math. Z. 72 (1959/1960), 192–204.
[RaRu1997] A. Razborov, S. Rudich, Natural proofs, 26th Annual ACM Symposium on the Theory of Computing (STOC ’94) (Montreal, PQ, 1994). J. Comput. System Sci. 55 (1997), no. 1, part 1, 24–35.
[Ro1982] J.-P. Rosay, Injective holomorphic mappings, Amer. Math. Monthly 89 (1982), no. 8, 587–588.
[Ro1953] K. Roth, On certain sets of integers, I, J. London Math. Soc. 28 (1953), 104–109.
[Ru1962] W. Rudin, Fourier analysis on groups. Reprint of the 1962 original. Wiley Classics Library. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York, 1990. x+285 pp.
[Ru1995] W. Rudin, Injective polynomial maps are automorphisms, Amer. Math. Monthly 102 (1995), no. 6, 540–543.
[Ru1989] I. Ruzsa, An application of graph theory to additive number theory, Sci. Ser. A Math. Sci. (N.S.) 3 (1989), 97–109.
[RuSz1978] I. Ruzsa, E. Szemerédi, Triple systems with no six points carrying three triangles, Colloq. Math. Soc. J. Bolyai, 18 (1978), 939–945.
[Sc2006] B. Schlein, Dynamics of Bose-Einstein Condensates, preprint.
[Se2009] J. P. Serre, How to use finite fields for problems concerning infinite fields, preprint.
[So2000] A. Soshnikov, Determinantal random point fields, Uspekhi Mat. Nauk 55 (2000), no. 5 (335), 107–160; translation in Russian Math. Surveys 55 (2000), no. 5, 923–975.
[St1961] E. M. Stein, On limits of sequences of operators, Ann. of Math. 74 (1961), 140–170.
[St1970] E. M. Stein, Singular Integrals and Differentiability Properties of Functions. Princeton Mathematical Series, No. 30. Princeton University Press, Princeton, N.J., 1970.
[St1993] E. M. Stein, Harmonic Analysis: Real-variable Methods, Orthogonality, and Oscillatory Integrals. With the assistance of Timothy S. Murphy. Princeton Mathematical Series, 43. Monographs in Harmonic Analysis, III. Princeton University Press, Princeton, NJ, 1993.
[St1969] S. A. Stepanov, The number of points of a hyperelliptic curve over a finite prime field, Izv. Akad. Nauk SSSR Ser. Mat. 33 (1969), 1171–1181.
[St1948] A. H. Stone, Paracompactness and product spaces, Bull. Amer. Math. Soc. 54 (1948), 977–982.
[Sz2009] B. Szegedy, Higher order Fourier analysis as an algebraic theory I, preprint.
[SzTr1983] E. Szemerédi, W. Trotter, Extremal problems in discrete geometry, Combinatorica 3 (1983), no. 3–4, 381–392.
[Ta1996] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, Inst. Hautes Études Sci. Publ. Math. No. 81 (1995), 73–205.

[Ta2005] M. Talagrand, The generic chaining. Upper and lower bounds of stochastic processes. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2005.
[Ta1951] A. Tarski, A decision method for elementary algebra and geometry, 2nd ed. University of California Press, Berkeley and Los Angeles, Calif., 1951.
[Ta] T. Tao, Summability of functions, unpublished preprint.
[Ta2006] T. Tao, Nonlinear dispersive equations. Local and global analysis. CBMS Regional Conference Series in Mathematics, 106. Published for the Conference Board of the Mathematical Sciences, Washington, DC; by the American Mathematical Society, Providence, RI, 2006.
[Ta2006b] T. Tao, A quantitative ergodic theory proof of Szemerédi’s theorem, Electron. J. Combin. 13 (2006), no. 1.
[Ta2006c] T. Tao, Szemerédi’s regularity lemma revisited, Contrib. Discrete Math. 1 (2006), no. 1, 8–28.
[Ta2007] T. Tao, A correspondence principle between (hyper)graph theory and probability theory, and the (hyper)graph removal lemma, J. Anal. Math. 103 (2007), 1–45.
[Ta2007b] T. Tao, Structure and randomness in combinatorics, Proceedings of the 48th Annual Symposium on Foundations of Computer Science (FOCS) 2007, 3–18.
[Ta2008] T. Tao, Structure and Randomness: pages from year one of a mathematical blog, American Mathematical Society, Providence RI, 2008.
[Ta2009] T. Tao, Poincaré’s Legacies: pages from year two of a mathematical blog, Vols. I, II, American Mathematical Society, Providence RI, 2009.
[Ta2010] T. Tao, The high exponent limit $p \to \infty$ for the one-dimensional nonlinear wave equation, preprint.
[Ta2010b] T. Tao, A remark on partial sums involving the Möbius function, preprint.
[Ta2010c] T. Tao, Sumset and inverse sumset theorems for Shannon entropy, preprint.
[TaVu2006] T. Tao, V. Vu, On random ±1 matrices: singularity and determinant, Random Structures Algorithms 28 (2006), no. 1, 1–23.
[TaVu2006b] T. Tao, V. Vu, Additive combinatorics. Cambridge Studies in Advanced Mathematics, 105. Cambridge University Press, Cambridge, 2006.
[TaVu2007] T. Tao, V. Vu, On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc. 20 (2007), 603–628.
[TaWr2003] T. Tao, J. Wright, $L^p$ improving bounds for averages along curves, J. Amer. Math. Soc. 16 (2003), no. 3, 605–638.
[Th1994] W. Thurston, On proof and progress in mathematics, Bull. Amer. Math. Soc. (N.S.) 30 (1994), no. 2, 161–177.
[To2005] C. Toth, The Szemerédi-Trotter Theorem in the Complex Plane, preprint.
[Uc1982] A. Uchiyama, A constructive proof of the Fefferman-Stein decomposition of BMO$(\mathbb{R}^n)$, Acta Math. 148 (1982), 215–241.
[VuWoWo2010] V. Vu, M. Wood, P. Wood, Mapping incidences, preprint.
[Wo1995] T. Wolff, An improved bound for Kakeya type maximal functions, Rev. Mat. Iberoamericana 11 (1995), no. 3, 651–674.
[Wo1998] T. Wolff, A mixed norm estimate for the X-ray transform, Rev. Mat. Iberoamericana 14 (1998), no. 3, 561–600.
[Wo2003] T. Wolff, Lectures on harmonic analysis. With a foreword by Charles Fefferman and preface by Izabella Łaba. Edited by Łaba and Carol Shubin. University Lecture Series, 29. American Mathematical Society, Providence, RI, 2003. x+137 pp.

Index

$B(G)$, 205
$B(X \to Y)$, 128
$BC(X \to Y)$, 112
$C(X \to Y)$, 112
$C^*$ algebra, 154
$C_0^\infty(\mathbb{R}^d)$, 213
$C_c^\infty(\mathbb{R}^d)$, 213
$C^k(\mathbb{R}^d)$, 238
$C^{k,\alpha}(\mathbb{R}^d)$, 240
$C_c(X \to \mathbb{R})$, 136
$H^s(\mathbb{R}^d)$, 251
$L^0$ norm, 39
$M(X)$, 148
$W^{k,p}(\mathbb{R}^d)$, 243
σ-compact, 75, 78
σ-finite, 6
$c_0(\mathbb{N})$, 62
$c_c(\mathbb{N})$, 62
z-transform, 208

Absolute value of a measure, 20
Absolutely continuous component, 21
Absolutely continuous measure, 21
Absolutely integrable function, 10
Abstract Boolean algebra, 293
Adherent point, 72, 77
Adjoints (Hilbert spaces), 54
Alexander subbase theorem, 106
Algebraic Hahn-Banach theorem, 70
Algebraic topology, 120
Almost everywhere, 7
Almost open, 87
Almost surely, 7
Amenable group, 285, 335
Approximation to the identity, 216
Arzelà-Ascoli diagonalisation argument, 109
Arzelà-Ascoli theorem, 114
Axiom of choice, 301

Baire category theorem, 86
Baire set, 87
Banach ∗-algebra, 192
Banach algebra, 154
Banach space, 36
Banach-Alaoglu theorem, 126
Banach-Mazur theorem, 67
Banach-Schröder-Bernstein theorem, 283
Banach-Steinhaus theorem, 89
Banach-Tarski paradox, 292
Base (topology), 104
Basic open neighbourhood, 104
Bessel inequality, 55
Beurling-Gelfand spectral radius formula, 155
Birkhoff’s representation theorem, 297
Bochner’s theorem, 205
Bolyai-Gerwien theorem, 290
Boolean algebra, 293
Boolean algebra morphism, 295
Borel σ-algebra, 4
Borel measurable, 4
Bounded set, 73
Box topology, 108


Cesàro convergence, 125
Cesàro summation, 196
Cantor measure, 22
Cantor set, 260
Carathéodory extension theorem, 8, 277
Carleson’s theorem, 196
Cauchy-Schwarz inequality, 48
CH, 103
Change of variables, 11
Character, 57
Character (Banach algebras), 155
Character (Fourier analysis), 189
Characteristic function (probability), 208
Chebyshev’s inequality, 165
Classical Sobolev space, 243
Closed graph theorem, 95
Closed set, 72, 77
Closure, 72
Co-countable topology, 78
Co-meager set, 87
Co-null set, 7
Coarser (σ-algebra), 4
Coarser topology, 76
Compact, 78
Compact operator, 131
Compactification, 311
Compactness, 74
Complete measure space, 8
Complete metric space, 73
Completion (normed vector space), 61
Completion of an inner product space, 51
Continuous map, 75, 78
Continuous measure, 21
Convergence (ultrafilter), 103
Convergence in measure, 120
Convergence of sequences, 72
Convex cone, 68
Convexity, 51
Convolution, 191
Convolving a distribution with a test function, 220
Counting measure, 7

Defined almost everywhere, 32
Dense set, 77
Dimension (Hilbert space), 56
Diophantine number, 88
Dirac mass, 7
Dirac measure, 7
Dirichlet kernel, 196
Discrete σ-algebra, 7
Discrete topology, 77
Distribution, 216
Dominated convergence (sequences), 11
Dominated convergence (series), 11
Dominated convergence (sets), 7
Dual of $L^p$, 40
Dual space, 61

Egorov’s theorem, 11
Elliptic regularity, 254
Entropy uncertainty principle, 203
Equicontinuous, 112
Equidecomposability, 282
Essentially bounded function, 35
Existence of minimisers, 51

Fast Fourier Transform, 209
Fatou’s lemma, 11
Fejér kernel, 196
Finer (σ-algebra), 4
Finer topology, 76
Finite intersection property, 74, 103
Finite measure, 6, 20
Finite rank operator, 131
Finite variation, 19
First category, 87
First-countable, 111
Formally adjoint, 171
Fourier inversion formula, 49, 55, 57, 90, 194
Fourier series, 57
Fourier transform, 57, 317
Fourier transform of a tempered distribution, 228
Fourier-Bessel transform, 207
Fréchet space, 121
Frostman’s lemma, 271
Fubini’s theorem, 12
Fubini-Tonelli theorem, 12
Følner sequence, 285, 333

Gaussian, 198, 200
Gelfand transform, 66
Gelfand-Naimark theorem, 155
Generalised Hahn-Banach theorem, 68
Generalised limit functional, 66
Generation (σ-algebra), 4
Geometric Hahn-Banach theorem, 68
Gibbs phenomenon, 196
Gram matrix, 49
Gram-Schmidt theorem, 49

Hölder inequality for Lorentz spaces, 178
Hölder spaces, 239, 253
Hölder’s inequality, 38
Hölder-Sobolev embedding, 250
Haar measure, 186
Hadamard three-circles theorem, 162
Hahn decomposition theorem, 18
Hahn-Banach theorem, 63
Half-open topology, 80
Hamel basis, 56
Hankel transform, 207
Hanner inequalities, 50
Hardy uncertainty principle, 318
Hardy-Littlewood majorant property, 195
Hardy-Littlewood-Sobolev fractional integration inequality, 182
Hausdorff content, 264
Hausdorff dimension, 270
Hausdorff measure, 266
Hausdorff paradox, 291
Hausdorff space, 80
Hausdorff-Young inequality, 96, 195, 202
Heat equation, 232
Heaviside function, 223
Height and width of functions, 35
Heine-Borel theorem, 74
Heisenberg uncertainty principle, 201
Helly’s selection theorem, 154
Hilbert space, 50
Homeomorphism, 81

Ideal, 156
Initial segment, 304
Inner product space, 47
Inner regular measure, 137
Inner regularity, 9
Integration, 9
Interior, 72, 78
Isolated point, 72
Isoperimetric inequality, 249

Japanese bracket, 251
Join (σ-algebra), 4
Jordan decomposition theorem, 19, 148

Kelley’s theorem, 111
Khintchine’s inequality, 99
Kronecker delta, 55

Laplace transform, 207
LCA, 185
LCH, 103
Lebesgue σ-algebra, 4, 8
Lebesgue decomposition theorem, 21
Lebesgue differentiation theorem, 85
Lebesgue measurable, 5
Lebesgue philosophy, 32
Lebesgue spaces, 27
Lebesgue-Radon-Nikodym theorem, 20
Lebesgue-Radon-Nikodym theorem (finitary version), 23
Limit point, 72
Lindelöf’s theorem, 161
Lindenstrauss-Tzafriri theorem, 95
Linear functional, 61
Lipschitz continuous, 113
Lipschitz function, 240
Littlewood’s first principle, 9
Littlewood’s principles, 278
Littlewood’s second principle, 11, 140
Littlewood’s third principle, 11
Locally compact, 75, 78
Locally compact abelian groups, 185
Locally finite measure, 9, 136
Log-convexity, 160, 162
Loomis-Sikorski representation theorem, 298
Loomis-Whitney inequality, 248
Lorentz spaces, 166
Lower semicontinuous, 80, 141
Lower topology, 80
Lusin’s theorem, 140

Marcinkiewicz interpolation theorem, 173
Markov’s inequality, 165
Maximal operator, 172
Maximum principle, 328
Meager set, 87
Measurable function, 4
Measurable space, 3, 294
Measure, 6
Measure space, 6
Mellin transform, 208
Metric space, 71
Minkowski dimension, 259
Minkowski’s inequality, 38
Monotone convergence (sequences), 11
Monotone convergence (series), 11
Monotone convergence (sets), 7
Multiplication of a measure by a function, 15
Mutually singular measures, 19

Negative variation of a measure, 19
Net, 81
Nikodym convergence theorem, 97
Norm topology, 117
Normal space (topology), 134
Normalised counting measure, 7
Normed vector space, 34
Notation, x
Nowhere dense, 86
Null set (relative to a signed measure), 19

Open ball, 72
Open mapping theorem, 92
Open set, 72
Order (distributions), 223
Order topology, 79
Ordinal, 306
Orthogonal complement, 52
Orthogonal projection, 52
Orthogonal transformation, 54
Orthonormal basis, 55
Orthonormal system, 49
Outer Lebesgue measure, 264
Outer regular measure, 137
Outer regularity, 9

Parallelogram law, 50
Parseval identity, 55, 58, 194
Partition of unity, 140
Phragmén-Lindelöf principle, 162, 319, 328
Pigeonhole principle (measure theory), 85
Ping-pong lemma, 288, 290
Plancherel formula, 49
Plancherel theorem, 193
Plancherel’s theorem, 58, 200, 205
Pointwise bounded, 112
Pointwise precompact, 112
Poisson summation formula, 204
Poisson’s equation, 229
Pontryagin duality, 206
Positive variation of a measure, 19
Positive-definite function, 205
Pre-Hilbert space, 51
Precompact, 75, 78
Principal value, 218
Probability measure, 6
Probability space, 6
Product measure, 9
Product rule for distributions, 222
Product topology, 80, 108
Prokhorov’s theorem, 153
Pseudometric, 278
Pullback (σ-algebra), 5
Pure point component, 22
Pushforward measure, 7, 149
Pythagorean theorem, 49

Quasi-normed vector space, 34
Quasi-triangle inequality, 34
Quotient σ-algebra, 297
Quotient topology, 121

Radon measure, 9, 136
Radon transform, 207
Radon-Nikodym derivative, 16
Radon-Nikodym theorem, 21
Rapidly decreasing function, 198
Relatively compact, 75
Rellich-Kondrakov embedding theorem, 243
Residual set, 87
Restriction (σ-algebra), 5
Restriction of a measure to a set, 15
Riemann integral, 11
Riemann-Lebesgue lemma, 191, 325
Riesz representation theorem (measure theory), 143
Riesz representation theorem (signed measures), 148
Riesz representation theorem for Hilbert spaces, 53
Riesz-Thorin theorem, 170

Schauder estimate, 242
Schröder-Bernstein theorem, 56
Schrödinger equation, 233
Schröder-Bernstein theorem (well-ordered sets), 305
Schur’s property, 98
Schur’s test, 179
Schwartz function, 198
Second category, 87
Second-countable, 104
Separable, 104
Separable (topology), 56
Separating hyperplane, 68
Sequential Banach-Alaoglu theorem, 127
Sequential compactness, 74
Sequential Tychonoff theorem, 109
Sequentially adherent point, 77
Sequentially compact, 78
Sequentially continuous, 78
Sesquilinear form, 47
Sierpinski-Erdős theorem, 88
Sierpinski-Mazurkiewicz paradox, 288
Signed measure, 17
Signed Radon measure, 148
Simple function, 9, 37
Singular component, 21
Singular continuous component, 22
Sobolev embedding theorem, 246, 249, 252
Sobolev product theorem, 250
Sobolev trace theorem, 253
Spectral radius, 155
Spectrum (Banach algebras), 155
Square function, 172
Stein interpolation theorem, 172
Step function, 162
Stone representation theorem, 296
Stone space, 280, 295
Stone-Čech compactification, 312
Stone-Weierstrass theorem, 151
Strong induction, 303
Strong operator topology, 128
Strong topology, 117
Strong type, 173
Stronger topology, 76
Subbase, 105
Sublinear operator, 172
Subnet, 82
Support of measure, 19
Supramenable group, 286
Supremum (well-ordered set), 303

Tempered distribution, 226
Tensor power trick, 164
Tensor product (Hilbert space), 57
Test function, 213
Tietze extension theorem, 138
Tits alternative, 291
Tonelli’s theorem, 12
Topological space, 76
Topological vector space, 118
Topology of smooth convergence, 120
Topology (metric space), 72
Topology of pointwise convergence, 119
Topology of uniform convergence, 119
Topology of uniform convergence on compacta, 119
Total variation of a measure, 20
Totally bounded set, 73
Transfinite induction, 304, 308
Transpose, 62
Triangle inequality, 34
Trivial topology, 77
Tychonoff’s theorem, 110

Ultrafilter, 103
Uncertainty principle, 317
Uniform boundedness principle, 89, 129
Unimodular group, 206
Unital algebra, 154
Unitary transformation, 54
Unsigned measure, 15
Upper semicontinuous, 80, 141
Upper topology, 80
Urysohn’s lemma, 134, 214
Urysohn’s metrisation theorem, 314
Urysohn’s subsequence principle, 78

von Neumann paradox, 290

Wave equation, 233
Weak L^p, 166
Weak derivative, 223
Weak operator topology, 128
Weak topology, 123
Weak type, 173
Weak* topology, 123
Weak-type Young’s inequality, 182
Weak-type Schur’s test, 181
Weaker topology, 76
Weierstrass M-test, 36
Weierstrass approximation theorem, 150
Well-ordered set, 302

Young’s convolution inequality, 180
Young’s inequality, 180

Zariski topology, 79
Zorn’s lemma, 301, 309