Terence Tao - An Epsilon of Room, I - Real Analysis
Terence Tao
This is a preliminary version of the book An Epsilon of Room, I: Real Analysis: pages from year three
of a mathematical blog published by the American Mathematical Society (AMS). This preliminary
version is made available with the permission of the AMS and may not be changed, edited, or reposted
at any other website without explicit written permission from the author and the AMS.
Author's preliminary version made available with permission of the publisher, the American Mathematical Society
To Garth Gaudry, who set me on the road;
To my family, for their constant support;
And to the readers of my blog, for their feedback and contributions.
Contents
Preface
A remark on notation
Acknowledgments
§1.3. $L^p$ spaces
Preface
A remark on notation
For reasons of space, we will not be able to define every single mathematical
term that we use in this book. If a term is italicised for reasons other than
emphasis or for definition, then it denotes a standard mathematical object,
result, or concept, which can be easily looked up in any number of references.
(In the blog version of the book, many of these terms were linked to their
Wikipedia pages, or other online reference pages.)
I will, however, mention a few notational conventions that I will use
throughout. The cardinality of a finite set $E$ will be denoted $|E|$. We will
use the asymptotic notation $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote the
estimate $|X| \le CY$ for some absolute constant $C > 0$. In some cases we will
need this constant $C$ to depend on a parameter (e.g., $d$), in which case we
shall indicate this dependence by subscripts, e.g., $X = O_d(Y)$ or $X \ll_d Y$.
We also sometimes use $X \sim Y$ as a synonym for $X \ll Y \ll X$.
In many situations there will be a large parameter $n$ that goes off to
infinity. When that occurs, we also use the notation $o_{n \to \infty}(X)$ or simply
$o(X)$ to denote any quantity bounded in magnitude by $c(n)X$, where $c(n)$
is a function depending only on $n$ that goes to zero as $n$ goes to infinity. If
we need $c(n)$ to depend on another parameter, e.g., $d$, we indicate this by
further subscripts, e.g., $o_{n \to \infty; d}(X)$.
We will occasionally use the averaging notation
$\mathbf{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x)$
to denote the average value of a function $f: X \to \mathbf{C}$ on
a non-empty finite set $X$.
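As a small illustration of the averaging notation, here is a minimal Python sketch. The helper name `average` and the sample set and function are assumptions made only for this example; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def average(f, X):
    """Return the average of f over the non-empty finite set X, i.e. (1/|X|) * sum of f(x)."""
    X = list(X)
    if not X:
        raise ValueError("X must be non-empty")
    return sum(f(x) for x in X) / len(X)

# Average of x^2 over X = {1, 2, 3, 4}: (1 + 4 + 9 + 16) / 4.
X = {1, 2, 3, 4}
avg = average(lambda x: Fraction(x * x), X)
```

The restriction to non-empty finite sets is enforced explicitly, since the formula divides by the cardinality of the set.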
Acknowledgments
The author is supported by a grant from the MacArthur Foundation, by
NSF grant DMS-0649473, and by the NSF Waterman Award.
Thanks to Kestutis Cesnavicius, Wolfgang M., Daniel Mckenzie, Simion,
Snegud, Blake Stacey, Konrad Swanepoel, and anonymous commenters for
global corrections to the text, and to Edward Dunne at the American Math-
ematical Society for encouragement and editing.
Chapter 1
Real analysis
Section 1.1
A quick review of measure and integration theory
In this section we quickly review the basics of abstract measure theory and
integration theory, which was covered in the previous course but will of
course be relied upon in the current course. This is only a brief summary
of the material; certainly, one should consult a real analysis text for the full
details of the theory.
just take $f$ to be the map from points in $X$ to the partition cell in which that
point lies). Given such an $f$, we call the σ-algebra $f^{-1}(2^A)$ the σ-algebra
generated by the partition; a set is measurable with respect to this structure
if and only if it is the union $\bigcup_{\alpha \in B} E_\alpha$ of some subcollection of cells of the
partition.
Exercise 1.1.3. Show that a σ-algebra on a finite set $X$ necessarily arises
from a partition $X = \bigcup_{\alpha \in A} E_\alpha$ as in Example 1.1.10, and furthermore the
partition is unique (up to relabeling). Thus, in the finitary world, σ-algebras
are essentially the same concept as partitions.
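The correspondence between partitions and σ-algebras on a finite set can be explored computationally. The following Python sketch is an illustration only, not a proof of the exercise: the function names `sigma_algebra_from_partition` and `atoms` and the sample partition are assumptions made for this example. It builds the σ-algebra generated by a partition as the collection of all unions of cells, then recovers the partition as the minimal non-empty measurable sets.

```python
from itertools import combinations

def sigma_algebra_from_partition(cells):
    """Return all unions of subcollections of the partition cells, as frozensets."""
    cells = [frozenset(c) for c in cells]
    sets = set()
    for r in range(len(cells) + 1):
        for chosen in combinations(cells, r):
            # The empty subcollection gives the empty set.
            sets.add(frozenset().union(*chosen))
    return sets

def atoms(sigma_algebra):
    """Return the minimal non-empty members of a finite sigma-algebra."""
    nonempty = [E for E in sigma_algebra if E]
    # E is an atom if no non-empty member is a proper subset of it.
    return {E for E in nonempty if not any(F < E for F in nonempty)}

partition = [{1, 2}, {3}, {4, 5, 6}]
algebra = sigma_algebra_from_partition(partition)
recovered = atoms(algebra)
```

With three cells the generated σ-algebra has 2^3 = 8 members, and the atoms returned are exactly the original cells.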
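Example 1.1.11. Let $(X_\alpha, \mathcal{X}_\alpha)_{\alpha \in A}$ be a family of measurable spaces.
Then the Cartesian product $\prod_{\alpha \in A} X_\alpha$ has canonical projection maps
$\pi_\beta: \prod_{\alpha \in A} X_\alpha \to X_\beta$ for each $\beta \in A$. The product σ-algebra $\prod_{\alpha \in A} \mathcal{X}_\alpha$
is defined as the σ-algebra on $\prod_{\alpha \in A} X_\alpha$ generated by the $\pi_\alpha$, as in Example
1.1.7.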
Exercise 1.1.4. Let $(X_\alpha)_{\alpha \in A}$ be an at most countable family of second
countable topological spaces. Show that the Borel σ-algebra of the product
space (with the product topology) is equal to the product of the Borel
σ-algebras of the factor spaces. In particular, the Borel σ-algebra on $\mathbf{R}^n$
is the product of $n$ copies of the Borel σ-algebra on $\mathbf{R}$. (The claim can
fail when the countability hypotheses are dropped, though in most applications
in analysis, these hypotheses are satisfied.) We caution however that
the Lebesgue σ-algebra on $\mathbf{R}^n$ is not the product of $n$ copies of the
one-dimensional Lebesgue σ-algebra, as it contains some additional null sets;
however, it is the completion of that product.
Exercise 1.1.5. Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be measurable spaces. Show that
if $E$ is measurable with respect to $\mathcal{X} \times \mathcal{Y}$, then for every $x \in X$, the set
$\{y \in Y : (x, y) \in E\}$ is measurable in $\mathcal{Y}$, and similarly for every $y \in Y$,
the set $\{x \in X : (x, y) \in E\}$ is measurable in $\mathcal{X}$. Thus, sections of Borel
measurable sets are again Borel measurable. (The same is not true for
Lebesgue measurable sets.)
1.1.2. Measure spaces. Now we endow measurable spaces with a mea-
sure, turning them into measure spaces.
Definition 1.1.12 (Measures). A (non-negative) measure $\mu$ on a measurable
space $(X, \mathcal{X})$ is a function $\mu: \mathcal{X} \to [0, +\infty]$ such that $\mu(\emptyset) = 0$, and such
that we have the countable additivity property
$\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)$
whenever $E_1, E_2, \ldots$ are disjoint measurable sets. We refer to the triplet
$(X, \mathcal{X}, \mu)$ as a measure space.
A measure space $(X, \mathcal{X}, \mu)$ is finite if $\mu(X) < \infty$; it is a probability space
if $\mu(X) = 1$ (and then we call $\mu$ a probability measure). It is σ-finite if $X$
can be covered by countably many sets of finite measure.
Proof. (Sketch) Define the outer measure $\mu_*(E)$ of any set $E \subset X$ as the
infimum of $\sum_{n=1}^\infty \mu(A_n)$, where $(A_n)_{n=1}^\infty$ ranges over all coverings of $E$ by
elements in $\mathcal{A}$. It is not hard to see that $\mu_*$ agrees with $\mu$ on $\mathcal{A}$, so it will
suffice to show that it is a measure on $\mathcal{X}$.
It is easy to check that $\mu_*$ is monotone and countably subadditive (as
in parts (i), (ii) of Exercise 1.1.6) on all of $2^X$ and assigns zero to $\emptyset$; thus it
is an outer measure in the abstract sense. But we need to show countable
additivity on $\mathcal{X}$. The key is to first show the related property
(1.2) $\mu_*(A) = \mu_*(A \cap E) + \mu_*(A \setminus E)$
for all $A \subset X$ and $E \in \mathcal{X}$. This can first be shown for $E \in \mathcal{A}$, and then
one observes that the class of $E$ that obeys (1.2) for all $A$ is a σ-algebra; we
leave this as a (moderately lengthy) exercise.
The following results are standard, and the proofs are omitted:
• Dominated convergence for series. If $f_n: X \to \mathbf{C}$ are measurable
functions with $\sum_n \int_X |f_n|\,d\mu < \infty$, then $\sum_n f_n(x)$ is absolutely convergent
for a.e. $x$ and $\int_X \sum_{n=1}^\infty f_n\,d\mu = \sum_{n=1}^\infty \int_X f_n\,d\mu$.
• Egorov’s theorem. If fn : X → C are measurable functions converg-
ing pointwise a.e. to a limit f on a subset A of X of finite measure
and ε > 0, then there exists a set of measure at most ε, outside of
which fn converges uniformly to f in A. (This is a manifestation of
Littlewood’s third principle.)
Remark 1.1.22. As a rule of thumb, if one does not have exact or approximate
monotonicity or domination (where “approximate” means “up to an
error $e$ whose $L^1$ norm $\int_X |e|\,d\mu$ goes to zero”), then one should not expect
the integral of a limit to equal the limit of the integral in general; there is
just too much room for oscillation.
Exercise 1.1.11. Let $f: X \to \mathbf{C}$ be an absolutely integrable function on a
measure space $(X, \mathcal{X}, \mu)$. Show that $f$ is uniformly integrable, in the sense
that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\int_E |f|\,d\mu \le \varepsilon$ whenever
$E$ is a measurable set of measure at most $\delta$. (The property of uniform
integrability becomes more interesting, of course, when applied to a family
of functions rather than to a single function.)
$= \int_Y \left( \int_X f(x, y)\,d\mu(x) \right) d\nu(y).$
• Fubini’s theorem. If $f: X \times Y \to \mathbf{C}$ is absolutely integrable, then we
also have
$$\int_{X \times Y} f\,d\mu \times \nu = \int_X \left( \int_Y f(x, y)\,d\nu(y) \right) d\mu(x) = \int_Y \left( \int_X f(x, y)\,d\mu(x) \right) d\nu(y),$$
with the inner integrals being absolutely integrable a.e. and the outer
integrals all being absolutely integrable.
If $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ are complete measure spaces, then the same claims
hold with the product σ-algebra $\mathcal{X} \times \mathcal{Y}$ replaced by its completion.
Remark 1.1.24. The theorem fails for non-σ-finite spaces, but virtually ev-
ery measure space actually encountered in “hard analysis” applications will
be σ-finite. (One should be cautious, however, with any space constructed
using ultrafilters or the first uncountable ordinal.) It is also important that
f obey some measurability in the product space; there exist non-measurable
f for which the iterated integrals exist (and may or may not be equal to
each other, depending on the properties of f and even on which axioms of
set theory one chooses), but the product integral (of course) does not.
Section 1.2
(iii) If $E_1, E_2, \ldots \subset X$ are disjoint, then $\sum_{n=1}^\infty \mu(E_n)$ converges to
$\mu(\bigcup_{n=1}^\infty E_n)$, with the former sum being absolutely convergent¹ if
the latter expression is finite.
Thus every unsigned measure is a signed measure, and the difference of
two unsigned measures is a signed measure if at least one of the unsigned
measures is finite; we will see shortly that the converse statement is also true;
i.e., every signed measure is the difference of two unsigned measures (with
one of the unsigned measures being finite). Further examples of signed
measures are the measures $m_f$ defined by (1.5), where $f: X \to [-\infty, +\infty]$
is now signed rather than unsigned, but with the assumption that at least
one of the signed parts $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ of $f$ is absolutely
integrable.
We also observe that a signed measure μ is unsigned if and only if μ ≥ 0
(where we use (1.10) to define order on measures).
Given a function $f: X \to [-\infty, +\infty]$, we can partition $X$ into one
set $X_+ := \{x : f(x) \ge 0\}$ on which $f$ is non-negative and another set
$X_- := \{x : f(x) < 0\}$ on which $f$ is negative; thus $f\downharpoonright_{X_+} \ge 0$ and $f\downharpoonright_{X_-} \le 0$.
It turns out that the same is true for signed measures:
Theorem 1.2.2 (Hahn decomposition theorem). Let $\mu$ be a signed measure.
Then one can find a partition $X = X_+ \cup X_-$ such that $\mu\downharpoonright_{X_+} \ge 0$ and
$\mu\downharpoonright_{X_-} \le 0$.
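In the simplest setting, a finite set with a signed measure given by point masses, a Hahn decomposition can be read off directly from the signs of the masses. The Python sketch below illustrates only this special case; the names `hahn_decomposition`, `mu` and `weights` are assumptions made for the example, and the general theorem requires a genuinely different argument.

```python
def mu(weights, E):
    """Signed measure of E, where each point x carries the mass weights[x]."""
    return sum(weights[x] for x in E)

def hahn_decomposition(weights):
    """Split the underlying finite set into X_plus (masses >= 0) and X_minus (masses < 0)."""
    x_plus = {x for x, w in weights.items() if w >= 0}
    x_minus = set(weights) - x_plus
    return x_plus, x_minus

# A signed measure on the four-point set {a, b, c, d}.
weights = {"a": 2, "b": -3, "c": 0, "d": 5}
x_plus, x_minus = hahn_decomposition(weights)
```

Points of mass zero could be placed on either side, which reflects the fact that a Hahn decomposition is in general only unique up to null sets.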
1 Another consequence of (iii) is that any subset of a finite measure set is again of finite measure,
and the finite union of finite measure sets again has finite measure.
Proof. We prove this only for the case when $\mu$, $m$ are finite rather than
σ-finite, and leave the general case as an exercise. The uniqueness follows
from Exercise 1.2.2 and the previous observation that $m_f$ cannot be mutually
singular with $m$ for any non-zero $f$, so it suffices to prove existence. By the
Jordan decomposition theorem, we may assume that $\mu$ is unsigned as well.
(In this case, we expect $f$ and $\mu_s$ to be unsigned also.)
Proof. The implication of (iii) from (i) is Exercise 1.1.11. The implication
of (ii) from (iii) is trivial. To deduce (i) from (ii), apply the
Lebesgue-Radon-Nikodym theorem (Theorem 1.2.4) to $\mu$ and observe that $\mu_s$
is supported on a set $E$ of $m$-measure zero by hypothesis. Since $E$ is null for $m$,
it is null for $m_f$ and $\mu$ also, and so $\mu_s$ is trivial, giving (i).
Corollary 1.2.6 (Lebesgue decomposition theorem). Let $m$ be an unsigned
σ-finite measure, and let $\mu$ be a signed σ-finite measure. Then there is a
unique decomposition $\mu = \mu_{ac} + \mu_s$, where $\mu_{ac} \ll m$ and $\mu_s \perp m$. (We refer
to $\mu_{ac}$ and $\mu_s$ as the absolutely continuous and singular components of $\mu$
with respect to $m$.) If $\mu$ is unsigned, then $\mu_{ac}$ and $\mu_s$ are also.
Exercise 1.2.9. If every point in $X$ is measurable, we call a signed measure
$\mu$ continuous if $\mu(\{x\}) = 0$ for all $x$. Let the hypotheses be as in Corollary
1.2.6, but suppose also that every point is measurable and $m$ is continuous.
Show that there is a unique decomposition $\mu = \mu_{ac} + \mu_{sc} + \mu_{pp}$, where
$\mu_{ac} \ll m$, $\mu_{pp}$ is supported on an at most countable set, and $\mu_{sc}$ is both
singular with respect to $m$ and continuous. Furthermore, if $\mu$ is unsigned,
then $\mu_{ac}$, $\mu_{sc}$, $\mu_{pp}$ are also. We call $\mu_{sc}$ and $\mu_{pp}$ the singular continuous and
pure point components of $\mu$, respectively.
Example 1.2.7. A Cantor measure is singular continuous with respect to
Lebesgue measure, while Dirac measures are pure point. Lebesgue measure
on a line is singular continuous with respect to Lebesgue measure on a plane
containing that line.
Remark 1.2.8. Suppose one is decomposing a measure $\mu$ on a Euclidean
space $\mathbf{R}^d$ with respect to Lebesgue measure $m$ on that space. Very roughly
speaking, a measure is pure point if it is supported on a 0-dimensional
subset of $\mathbf{R}^d$, it is absolutely continuous if its support is spread out on a full
dimensional subset, and it is singular continuous if it is supported on some
set of dimension intermediate between 0 and $d$. For instance, if $\mu$ is the sum
of a Dirac mass at $(0, 0) \in \mathbf{R}^2$, one-dimensional Lebesgue measure on the
$x$-axis, and two-dimensional Lebesgue measure on $\mathbf{R}^2$, then these are the
pure point, singular continuous, and absolutely continuous components of
$\mu$, respectively. This heuristic is not completely accurate (in part because we
have left the definition of “dimension” vague) but is not a bad rule of thumb
for a first approximation. We will study analytic concepts of dimension in
more detail in Section 1.15.
To motivate the terminology “continuous” and “singular continuous”,
we recall two definitions on an interval I ⊂ R, and make a third:
• A function f : I → R is continuous if for every x ∈ I and every
ε > 0, there exists δ > 0 such that |f (y) − f (x)| ≤ ε whenever y ∈ I
is such that |y − x| ≤ δ.
• A function f : I → R is uniformly continuous if for every ε > 0,
there exists δ > 0 such that |f (y) − f (x)| ≤ ε whenever [x, y] ⊂ I has
length at most δ.
• A function $f: I \to \mathbf{R}$ is absolutely continuous if for every $\varepsilon > 0$,
there exists $\delta > 0$ such that $\sum_{i=1}^n |f(y_i) - f(x_i)| \le \varepsilon$ whenever
$[x_1, y_1], \ldots, [x_n, y_n]$ are disjoint intervals in $I$ of total length at most
$\delta$.
Clearly, absolute continuity implies uniform continuity, which in turn implies
continuity. The significance of absolute continuity is that it is the largest
class of functions for which the fundamental theorem of calculus holds (using
the classical derivative and the Lebesgue integral), as can be seen in any
introductory graduate real analysis course.
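The implications above cannot be reversed. A standard example, consistent with the Cantor measure being singular continuous, is the Cantor function on [0, 1], which is uniformly continuous but not absolutely continuous. The Python sketch below illustrates this numerically; the helper names `cantor` and `cantor_intervals` are assumptions made for this example. It covers the Cantor set by the 2^k closed intervals of stage k, whose total length (2/3)^k tends to zero, and sums the increments of the Cantor function over them.

```python
from fractions import Fraction

def cantor(x, depth=40):
    """Cantor function at a rational x in [0, 1]; exact when x is an endpoint of a
    stage-k Cantor interval with k <= depth."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    result, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x lies in a removed middle third (or at its left end)
            return result + scale
        result += scale * (digit // 2)
        scale /= 2
    return result

def cantor_intervals(k):
    """The 2^k closed intervals remaining after k steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt += [(a, a + third), (b - third, b)]
        intervals = nxt
    return intervals

k = 5
intervals = cantor_intervals(k)
total_length = sum(b - a for a, b in intervals)                   # (2/3)^k
variation = sum(cantor(b) - cantor(a) for a, b in intervals)      # stays equal to 1
```

Since the total length can be made smaller than any δ by increasing k while the sum of increments stays at 1, no δ works for ε < 1 in the definition of absolute continuity.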
Exercise 1.2.10. Let $m$ be Lebesgue measure on the interval $[0, +\infty]$, and
let $\mu$ be a finite unsigned measure.
Show that $\mu$ is a continuous measure if and only if the function
$x \mapsto \mu([0, x])$ is continuous. Show that $\mu$ is an absolutely continuous measure
with respect to $m$ if and only if the function $x \mapsto \mu([0, x])$ is absolutely
continuous.
Proof. Using the Radon-Nikodym theorem (or just working by hand, since
everything is finite), we can write $d\mu_n = f_n\,dm_n$ for some $f_n: X_n \to [0, +\infty)$
with average value 1.
For each positive integer $k$, the sequence $\mu_n(\{f_n \ge k\})$ is bounded between
0 and 1, so by the Bolzano-Weierstrass theorem, it has a convergent
subsequence. Applying the usual diagonalisation argument (as in the proof
of the Arzelà-Ascoli theorem, Theorem 1.8.23), we may thus assume (after
Section 1.3
$L^p$ spaces
Now that we have reviewed the foundations of measure theory, let us
put it to work to set up the basic theory of one of the fundamental families
of function spaces in analysis, namely the $L^p$ spaces (also known as Lebesgue
spaces). These spaces serve as important model examples for the general
theory of topological and normed vector spaces, which we will discuss a little
bit in this lecture and then in much greater detail in later lectures.
Just as scalar quantities live in the space of real or complex numbers, and
vector quantities live in vector spaces, functions $f: X \to \mathbf{C}$ (or other objects
closely related to functions, such as measures) live in function spaces. Like
other spaces in mathematics (e.g., vector spaces, metric spaces, topological
spaces, etc.), a function space $V$ is not just a mere set of objects (in this
case, the objects are functions), but also comes with various important
structures that allow one to do some useful operations inside the space
and from one space to another. For example, function spaces tend to have
several (though usually not all) of the following types of structures, which
are usually related to each other by various compatibility conditions:
this course will be vector spaces. Because the field of scalars is real
or complex, vector spaces also come with the notion of convexity,
which turns out to be crucial in many aspects of analysis. As a
consequence (and in marked contrast to algebra or number theory),
much of the theory in real analysis does not seem to extend to other
fields of scalars (in particular, real analysis fails spectacularly in the
finite characteristic setting).
• Algebra structure. Sometimes (though not always) we also wish
to multiply two functions f , g in V and get another function f g in V ;
when combined with the vector space structure and assuming some
compatibility conditions (e.g., the distributive law), this makes V an
algebra. This multiplication operation is often just pointwise multi-
plication, but there are other important multiplication operations on
function spaces too, such as² convolution.
• Norm structure. We often want to distinguish large functions in
V from small ones, especially in analysis, in which small terms in an
expression are routinely discarded or deemed to be acceptable errors.
One way to do this is to assign a magnitude or norm $\|f\|_V$ to each
function that measures its size. Unlike the situation with scalars,
where there is basically a single notion of magnitude, functions have
a wide variety of useful notions of size, each measuring a different
aspect (or combination of aspects) of the function, such as height,
width, oscillation, regularity, decay, and so forth. Typically, each
such norm gives rise to a separate function space (although sometimes
it is useful to consider a single function space with multiple norms
on it). We usually require the norm to be compatible with the vector
space structure (and algebra structure, if present), for instance by
demanding that the triangle inequality hold.
• Metric structure. We also want to tell whether two functions f ,
g in a function space V are near together or far apart. A typical
way to do this is to impose a metric $d: V \times V \to \mathbf{R}^+$ on the space
$V$. If both a norm $\|\cdot\|_V$ and a vector space structure are available,
there is an obvious way to do this: define the distance between two
functions $f, g$ in $V$ to be³ $d(f, g) := \|f - g\|_V$. It is often important
2 One sometimes sees other algebraic structures than multiplication appear in function spaces,
such as commutators and derivations, but again we will not encounter those in this course. An-
other common algebraic operation for function spaces is conjugation or adjoint, leading to the
notion of a *-algebra.
3 This will be the only type of metric on function spaces encountered in this course. But there
are some non-linear function spaces of importance in non-linear analysis (e.g., spaces of maps from
one manifold to another) which have no vector space structure or norm, but still have a metric.
spaces unfortunately tend to be non-compact in various rather nasty ways, although there are useful
partial substitutes for compactness that are available; see, e.g., Section 1.6 of Poincaré’s Legacies,
Vol. I.
p = 2. We will also extend this notion later to p = ∞, which is also an important special case.
At present, the Lebesgue spaces $L^p$ are just sets. We now begin to place
several of the structures mentioned in the introduction to upgrade these sets
to richer spaces.
6 One could also take a more abstract view, dispensing with the set $X$ altogether and defining
the Lebesgue space $L^p(\mathcal{X}, \mu)$ on abstract measure spaces $(\mathcal{X}, \mu)$, but we will not do so here.
Another way to think about elements of $L^p$ is that they are functions which are unreliable on an
unknown set of measure zero, but remain reliable almost everywhere.
We begin with vector space structure. Fix $0 < p < \infty$, and let $f, g \in L^p$
be two $p$th-power integrable functions. From the crude pointwise (or more
precisely, pointwise almost everywhere) inequality
(1.16) $|f(x) + g(x)|^p \le (2 \max(|f(x)|, |g(x)|))^p = 2^p \max(|f(x)|^p, |g(x)|^p) \le 2^p (|f(x)|^p + |g(x)|^p)$,
we see that the sum of two $p$th-power integrable functions is also $p$th-power
integrable. It is also easy to see that any scalar multiple of a $p$th-power
integrable function is also $p$th-power integrable. These operations respect
almost everywhere equivalence, and so $L^p$ becomes a (complex) vector space.
Next, we set up the norm structure. If $f \in L^p$, we define the $L^p$ norm
$\|f\|_{L^p}$ of $f$ to be the number
Proof. The claims (i) and (ii) are obvious. (Note how important it is that
we equate functions that vanish almost everywhere in order to get (i).) The
quasi-triangle inequality follows from a variant of the estimates in (1.16)
and is left as an exercise. For the triangle inequality, we have to be more
efficient than the crude estimate (1.16). By the non-degeneracy property
we may take $\|f\|_{L^p}$ and $\|g\|_{L^p}$ to be non-zero. Using homogeneity, we can
normalise $\|f\|_{L^p} + \|g\|_{L^p}$ to equal 1, thus (by homogeneity again) we can
write $f = (1 - \theta)F$ and $g = \theta G$ for some $0 < \theta < 1$ and $F, G \in L^p$ with
$\|F\|_{L^p} = \|G\|_{L^p} = 1$. Our task is now to show that
But observe that for $1 \le p < \infty$, the function $x \mapsto |x|^p$ is convex on $\mathbf{C}$, and
in particular that
(1.20) $|(1 - \theta)F(x) + \theta G(x)|^p \le (1 - \theta)|F(x)|^p + \theta|G(x)|^p$.
(If one wishes, one can use the complex triangle inequality to first reduce to
the case when $F$, $G$ are non-negative, in which case one only needs convexity
on $[0, +\infty)$ rather than all of $\mathbf{C}$.) The claim (1.19) then follows from (1.20)
and the normalisations of $F$, $G$.
Exercise 1.3.2. Let $0 < p \le 1$ and $f, g \in L^p$.
(i) Establish the variant $\|f + g\|_{L^p}^p \le \|f\|_{L^p}^p + \|g\|_{L^p}^p$ of the triangle
inequality.
(ii) If furthermore $f$ and $g$ are non-negative (almost everywhere), establish
also the reverse triangle inequality $\|f + g\|_{L^p} \ge \|f\|_{L^p} + \|g\|_{L^p}$.
(iii) Show that the best constant $C$ in the quasi-triangle inequality is
$2^{\frac{1}{p} - 1}$. In particular, the triangle inequality is false for $p < 1$.
(iv) Now suppose instead that $1 < p < \infty$ or $0 < p < 1$. If $f, g \in L^p$ are
such that $\|f + g\|_{L^p} = \|f\|_{L^p} + \|g\|_{L^p}$, show that one of the functions
$f$, $g$ is a non-negative scalar multiple of the other (up to equivalence,
of course). What happens when $p = 1$?
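To get a feel for parts (i) to (iii) before proving them, one can experiment on a two-point set with counting measure. The Python sketch below is only a numerical illustration under that assumption; the helper name `lp_norm` and the particular functions are chosen for this example. It takes two indicator functions with disjoint supports and p = 1/2, for which the ratio of the two sides of the triangle inequality equals 2^(1/p - 1).

```python
def lp_norm(f, p):
    """L^p (quasi-)norm of a function on a finite set with counting measure,
    where f is given as a list of values."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)

p = 0.5
f = [1.0, 0.0]                        # indicator of the first point
g = [0.0, 1.0]                        # indicator of the second point
h = [a + b for a, b in zip(f, g)]     # f + g

lhs = lp_norm(h, p)                   # ||f + g||_{L^p}
rhs = lp_norm(f, p) + lp_norm(g, p)   # ||f||_{L^p} + ||g||_{L^p}
ratio = lhs / rhs                     # compare with 2 ** (1/p - 1)
```

Here the left-hand side exceeds the right-hand side, so the triangle inequality fails for this p, while the p-th power variant in (i) holds with equality for this pair.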
case to the σ-finite case in many, though not all, questions concerning $L^p$
spaces.)
$1 \le p < \infty$, we write $M := \sum_{n=1}^\infty \|f_n\|_{L^p}$, which is a finite quantity by
hypothesis. By the triangle inequality, we have $\|\sum_{n=1}^N |f_n|\|_{L^p} \le M$ for all $N$.
By monotone convergence (Theorem 1.1.21), we conclude $\|\sum_{n=1}^\infty |f_n|\|_{L^p} \le M$.
In particular, $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent for almost every $x$.
Write the limit of this series as $F(x)$. By dominated convergence (Theorem
1.1.21), we see that $\sum_{n=1}^N f_n(x)$ converges in $L^p$ norm to $F$, and we are
done.
(1.21) $\int_X |fg|\,d\mu \le 1.$
(and checking the degenerate cases separately, e.g., when $\|f + g\|_{L^p} = 0$).
Remark 1.3.14. The proofs of Hölder’s inequality and Minkowski’s in-
equality both relied on convexity of various functions in C or [0, +∞). One
way to emphasise this is to deduce both inequalities from Jensen’s inequality,
which is an inequality that manifestly exploits this convexity. We will not
take this approach here, but see for instance [LiLo2000] for a discussion.
Example 1.3.15. It is instructive to test Hölder’s inequality (and also Ex-
ercises 1.3.10–1.3.14 below) in the special case when f , g are generalised step
(1.26) $\lambda_g(f) := \int_X fg\,d\mu$
is well defined on $L^p$; the functional is also clearly linear. Furthermore,
Hölder’s inequality also tells us that this functional is continuous.
A deep and important fact about $L^p$ spaces is that, in most cases, the
converse is true: the recipe (1.26) is the only way to create continuous linear
functionals on $L^p$.
Theorem 1.3.16 (Dual of $L^p$). Let $1 \le p < \infty$, and assume $\mu$ is σ-finite.
Let $\lambda: L^p \to \mathbf{C}$ be a continuous linear functional. Then there exists a unique
$g \in L^{p'}$ such that $\lambda = \lambda_g$.
Proof. It is clear that (i) implies (ii), and that (iii) implies (ii). Next, from
linearity we have T x = T x0 + T (x − x0 ) for any x, x0 ∈ X, which (together
with the continuity of addition, which follows from the triangle inequality)
shows that continuity of T at 0 implies continuity of T at any x0 , so that
(ii) implies (i). The only remaining task is to show that (i) implies (iii).
By continuity, the inverse image of the unit ball in Y must be an open
neighbourhood of 0 in X, thus there exists some radius r > 0 such that
$\|Tx\|_Y < 1$ whenever $\|x\|_X < r$. The claim then follows (with $C := 1/r$) by
homogeneity. (Alternatively, one can deduce (iii) from (ii) by contradiction.
If (iii) failed, then there exists a sequence xn of non-zero elements of X
Remark 1.3.18. When 1 < p < ∞, the hypothesis that μ is σ-finite can
be dropped, but not when p = 1; see, e.g., [Fo2000, Section 6.2] for further
discussion. In these lectures, though, we will be content with working in
the σ-finite setting. On the other hand, the claim fails when p = ∞ (except
when X is finite); we will see this in Section 1.5, when we discuss the Hahn-
Banach theorem.
(1.29) $\int_X f\,d\mu = \int_X fg\,d(\mu + m)$
for all $f \in L^1(\mu + m)$. It is easy to see that $g$ must be real and non-negative,
and also at most 1 almost everywhere. If $E$ is the set where $g = 1$, we
see by setting $f = 1_E$ in (1.29) that $E$ has $m$-measure zero, and so $\mu\downharpoonright_E$ is
(1.30) $\int_{X \setminus E} (1 - g) f\,d\mu = \int_X fg\,dm$
and one then easily verifies that $\mu$ agrees with $m_{\frac{g}{1-g}}$ outside of $E$. This gives
the desired Lebesgue-Radon-Nikodym decomposition $\mu = m_{\frac{g}{1-g}} + \mu\downharpoonright_E$.
Remark 1.3.20. The argument used in Remark 1.3.19 also shows that the
Radon-Nikodym theorem implies the Lebesgue-Radon-Nikodym theorem.
Remark 1.3.21. One can give an alternate proof of Theorem 1.3.16, which
relies on the geometry (and in particular, the uniform convexity) of Lp spaces
rather than on the Radon-Nikodym theorem, and can thus be viewed as
giving an independent proof of that theorem; see Exercise 1.4.14.
Section 1.4
Hilbert spaces
In the next few lectures, we will be studying four major classes of function
spaces. In decreasing order of generality, these classes are the topological
vector spaces, the normed vector spaces, the Banach spaces, and the Hilbert
spaces. In order to motivate the discussion of the more general classes of
spaces, we will first focus on the most special class—that of (real and com-
plex) Hilbert spaces. These spaces can be viewed as generalisations of (real
and complex) Euclidean spaces such as Rn and Cn to infinite-dimensional
settings, and indeed much of one’s Euclidean geometry intuition concerning
lengths, angles, orthogonality, subspaces, etc., will transfer readily to arbi-
trary Hilbert spaces. In contrast, this intuition is not always accurate in
the more general vector spaces mentioned above. In addition to Euclidean
spaces, another fundamental example (see footnote 7) of Hilbert spaces comes from the
Lebesgue spaces L²(X, 𝒳, μ) of a measure space (X, 𝒳, μ).
Hilbert spaces are the natural abstract framework in which to study two
important (and closely related) concepts, orthogonality and unitarity, al-
lowing us to generalise familiar concepts and facts from Euclidean geometry
such as the Cartesian coordinate system, rotations and reflections, and the
Pythagorean theorem to Hilbert spaces. (For instance, the Fourier trans-
form (Section 1.12) is a unitary transformation and can thus be viewed as a
kind of generalised rotation.) Furthermore, the Hodge duality on Euclidean
spaces has a partial analogue for Hilbert spaces, namely the Riesz representation
theorem for Hilbert spaces, which makes the theory of duality and
adjoints for Hilbert spaces especially simple (when compared with the more
subtle theory of duality for, say, Banach spaces; see Section 1.5).
7 There are of course many other Hilbert spaces of importance in complex analysis, harmonic
analysis, and PDE, such as Hardy spaces H², Sobolev spaces H^s = W^{s,2}, and the space HS of
Hilbert-Schmidt operators; see for instance Section 1.14 for a discussion of Sobolev spaces. Complex
Hilbert spaces also play a fundamental role in the foundations of quantum mechanics, being
the natural space to hold all the possible states of a quantum system (possibly after projectivising
the Hilbert space), but we will not discuss this subject here.
These notes are only the most basic introduction to the theory of Hilbert
spaces. In particular, the theory of linear transformations between two
Hilbert spaces, which is perhaps the most important aspect of the subject,
is not covered much at all here.
(1.43) $\langle f, g\rangle := \int_X f\,\overline{g}\,d\mu$
Given a (real or complex) inner product space V , we can define the norm
‖x‖ of any vector x ∈ V by the formula (1.39), which is well defined thanks
to the positivity property; in the case of the L² spaces, this norm of course
corresponds to the usual L² norm. We have the following basic facts:
Lemma 1.4.7. Let V be a real or complex inner product space.
(i) Cauchy-Schwarz inequality. For any x, y ∈ V, we have |⟨x, y⟩| ≤
‖x‖‖y‖.
(ii) The function x ↦ ‖x‖ is a norm on V. (Thus every inner product
space is a normed vector space.)
Proof. We shall just verify the complex case, as the real case is similar
(and slightly easier). The positivity property tells us that the quadratic
form ⟨ax + by, ax + by⟩ is non-negative for all complex numbers a, b. Using
sesquilinearity and symmetry, we can expand this form as
(1.45) $|a|^2 \|x\|^2 + 2\,\mathrm{Re}(a\overline{b}\langle x, y\rangle) + |b|^2 \|y\|^2.$
Optimising in a, b (see also Section 1.10 of Structure and Randomness), we
obtain the Cauchy-Schwarz inequality. To verify the norm property, the only
non-trivial verification is that of the triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖.
But on expanding ‖x + y‖² = ⟨x + y, x + y⟩, we see that
(1.46) $\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}(\langle x, y\rangle) + \|y\|^2,$
and the claim then follows from the Cauchy-Schwarz inequality.
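One way to make the optimisation in a, b explicit is sketched below. It uses one particular choice of a and b, and assumes the inner product is linear in the first argument and conjugate-linear in the second; it is offered as an illustration and is not necessarily the route taken in the reference cited above.

```latex
% Choose a := \|y\|^2 and b := -\langle x, y\rangle in the non-negative form (1.45).
\begin{align*}
0 &\le |a|^2\|x\|^2 + 2\,\mathrm{Re}\bigl(a\overline{b}\langle x,y\rangle\bigr) + |b|^2\|y\|^2 \\
  &= \|y\|^4\|x\|^2 - 2\|y\|^2|\langle x,y\rangle|^2 + |\langle x,y\rangle|^2\|y\|^2 \\
  &= \|y\|^2\bigl(\|x\|^2\|y\|^2 - |\langle x,y\rangle|^2\bigr).
\end{align*}
% If y \neq 0, divide by \|y\|^2 > 0 to get |\langle x,y\rangle| \le \|x\|\,\|y\|.
% If y = 0, then \langle x, y\rangle = 0 and the inequality is trivial.
```

The triangle inequality then follows by bounding Re⟨x, y⟩ ≤ ‖x‖‖y‖ in the expansion of ‖x + y‖² and taking square roots.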
Inspired by the above exercise, we say that two inner product spaces
are isomorphic if there exists an invertible isometry from one space to the
other; such invertible isometries are known as isomorphisms.
Exercise 1.4.2. Let V be a real or complex inner product space. If x_1, …,
x_n are a finite collection of vectors in V, show that the Gram matrix
(⟨x_i, x_j⟩)_{1≤i,j≤n} is Hermitian and positive semidefinite, and it is positive
definite if and only if the x_1, …, x_n are linearly independent. Conversely,
given a Hermitian positive semidefinite matrix (a_{ij})_{1≤i,j≤n} with real (resp.,
complex) entries, show that there exists a real (resp., complex) inner product
space V and vectors x_1, …, x_n such that ⟨x_i, x_j⟩ = a_{ij} for all 1 ≤ i, j ≤ n.
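As a numerical illustration of the first half of this exercise (not a proof), the following sketch builds the Gram matrix of three vectors in C³ and checks that it is Hermitian and that its quadratic form equals a squared norm. The helper names `inner`, `gram`, `quadratic_form` and the sample vectors are assumptions made for this example; the inner product is taken to be linear in the first argument and conjugate-linear in the second.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def gram(vectors):
    """Gram matrix G[i][j] = <x_i, x_j>."""
    return [[inner(xi, xj) for xj in vectors] for xi in vectors]


def quadratic_form(G, c):
    """Evaluate sum_{i,j} c_i * conj(c_j) * G[i][j], which equals ||sum_i c_i x_i||^2."""
    n = len(c)
    return sum(c[i] * c[j].conjugate() * G[i][j] for i in range(n) for j in range(n))


# Hypothetical sample data: x3 = x1 + x2, so the family is linearly dependent.
x1 = [1 + 2j, 0j, 1j]
x2 = [2 + 0j, 1 - 1j, 0j]
x3 = [a + b for a, b in zip(x1, x2)]
G = gram([x1, x2, x3])

# Hermitian: G[i][j] equals the conjugate of G[j][i].
is_hermitian = all(
    abs(G[i][j] - G[j][i].conjugate()) < 1e-12 for i in range(3) for j in range(3)
)

# The coefficients (1, 1, -1) produce the zero vector, so the form vanishes there.
degenerate_value = quadratic_form(G, [1 + 0j, 1 + 0j, -1 + 0j])

# A generic coefficient vector gives a non-negative real value.
generic_value = quadratic_form(G, [1 + 1j, -2 + 0j, 0.5j])
```

The vanishing of the form at a non-zero coefficient vector reflects the linear dependence of the family, which is the mechanism behind the "positive definite if and only if linearly independent" part of the exercise.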
(with only finitely many of the terms non-zero) and the (finite) Plancherel
formula
(1.48) $\|x\|^2 = \sum_{\alpha \in A} |\langle x, e_\alpha\rangle|^2.$
1.4.2. Hilbert spaces. Thus far, our discussion of inner product spaces
has been largely algebraic in nature; this is because we have not been able
to take limits inside these spaces and do some actual analysis. This can be
rectified by adding an additional axiom:
Definition 1.4.8 (Hilbert spaces). A (real or complex) Hilbert space is a
(real or complex) inner product space which is complete (or equivalently, an
inner product space which is also a Banach space).
Example 1.4.9. From Proposition 1.3.7, (real or complex) L²(X, 𝒳, μ) is
a Hilbert space for any measure space (X, 𝒳, μ). In particular, R^n and C^n
are Hilbert spaces.
Exercise 1.4.7. Show that a subspace of a Hilbert space H will itself be a
Hilbert space if and only if it is closed. (In particular, proper dense subspaces
of Hilbert spaces are not Hilbert spaces.)
Example 1.4.10. By Example 1.4.9, the space ℓ²(Z) of doubly infinite
square-summable sequences is a Hilbert space. Inside this space, the space
c_c(Z) of sequences of finite support is a proper dense subspace (as can be
seen for instance by Proposition 1.3.8, though this can also be seen much
more directly), and so cannot be a Hilbert space.
Exercise 1.4.8. Let V be an inner product space. Show that there exists a
Hilbert space V̄ which contains a dense subspace isomorphic to V; we refer
to V̄ as a completion of V. Furthermore, this space is essentially unique
in the sense that if V̄, V̄′ are two such completions, then there exists an
isomorphism from V̄ to V̄′ which is the identity on V (if one identifies V with
the dense subspaces of V̄ and V̄′). Because of this fact, inner product spaces
are sometimes known as pre-Hilbert spaces, and can always be identified with
dense subspaces of actual Hilbert spaces.
Exercise 1.4.9. Let H, H′ be two Hilbert spaces. Define the direct sum
H ⊕ H′ of the two spaces to be the vector space H × H′ with inner product
⟨(x, x′), (y, y′)⟩_{H⊕H′} := ⟨x, y⟩_H + ⟨x′, y′⟩_{H′}. Show that H ⊕ H′ is also a
Hilbert space.
Example 1.4.11. If H is a complex Hilbert space, one can define the complex
conjugate $\overline{H}$ of that space to be the set of formal conjugates $\{\overline{x} : x \in H\}$
of vectors in H, with complex vector space structure $\overline{x} + \overline{y} := \overline{x + y}$ and
$c\,\overline{x} := \overline{\overline{c}\,x}$, and inner product $\langle \overline{x}, \overline{y}\rangle_{\overline{H}} := \langle y, x\rangle_H$. One easily checks that $\overline{H}$ is
again a complex Hilbert space. Note the map $x \mapsto \overline{x}$ is not a complex linear
isometry; instead, it is a complex antilinear isometry.
Proof. Observe from the parallelogram law (1.49) that we have the (geometrically
obvious) fact that if y and y′ are distinct and equidistant from
x, then their midpoint (y + y′)/2 is strictly closer to x than either of y or
y′. This (and convexity) ensures that the distance minimiser, if it exists, is
unique. Also, if y is the distance minimiser and z is in K, then (1 − θ)y + θz
is at least as distant from x as y is for any 0 < θ < 1, by convexity. Squaring
this and rearranging, we conclude that
(1.53) $2\,\mathrm{Re}\langle z - y, y - x\rangle + \theta\|z - y\|^2 \ge 0.$
Letting θ → 0 we obtain the final claim in the proposition.
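The "squaring and rearranging" step that produces (1.53) can be spelled out as follows. This is a sketch under the conventions used above (inner product linear in the first argument, y the distance minimiser, z ∈ K, 0 < θ < 1).

```latex
% Write (1-\theta)y + \theta z - x = (y - x) + \theta(z - y) and expand the square.
\begin{align*}
\|y - x\|^2 &\le \|(y - x) + \theta(z - y)\|^2 \\
 &= \|y - x\|^2 + 2\theta\,\mathrm{Re}\langle z - y, y - x\rangle + \theta^2\|z - y\|^2 .
\end{align*}
% Cancel \|y - x\|^2 from both sides and divide by \theta > 0:
\[
2\,\mathrm{Re}\langle z - y, y - x\rangle + \theta\|z - y\|^2 \ge 0 .
\]
```

Sending θ → 0 then leaves Re⟨z − y, y − x⟩ ≥ 0.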
Proof. We just show the claim for complex Hilbert spaces, since the claim
for real Hilbert spaces is very similar. First, we show uniqueness: if λ_v = λ_{v′},
then λ_{v−v′} = 0, and in particular ⟨v − v′, v − v′⟩ = 0, and so v = v′.
Now we show existence. We may assume that λ is not identically zero,
since the claim is obvious otherwise. Observe that the kernel V := {x ∈
H : λ(x) = 0} is then a proper subspace of H, which is closed since λ
is continuous. By Exercise 1.4.13, the orthogonal complement V ⊥ must
contain at least one non-trivial vector w, which we can normalise to have
unit magnitude. Since w does not lie in V, λ(w) is non-zero. Now observe
that for any x in H, $x - \frac{\lambda(x)}{\lambda(w)} w$ lies in the kernel of λ, i.e., it lies in V. Taking
inner products with w, we conclude that
(1.54) $\langle x, w\rangle - \frac{\lambda(x)}{\lambda(w)} = 0,$
and thus
(1.55) $\lambda(x) = \langle x, \overline{\lambda(w)}\,w\rangle.$
Thus we have $\lambda = \lambda_{\overline{\lambda(w)}\,w}$, and the claim follows.
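The construction in this proof can be tried out in the finite-dimensional space C³. The sketch below is only an illustration: the functional `lam`, the hidden vector `v`, the phase `0.7` and the test vector `x` are all hypothetical choices, and the inner product is assumed linear in the first argument. It picks a unit vector w orthogonal to the kernel and checks that conj(λ(w))·w reproduces λ.

```python
import cmath
import math


def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


# Hypothetical functional on C^3: lam(x) = <x, v> for a hidden vector v.
v = [1 + 1j, 2 + 0j, -1j]


def lam(x):
    return inner(x, v)


# The kernel of lam is the orthogonal complement of v, so the orthogonal
# complement of the kernel is spanned by v. Pick a unit vector w in it,
# multiplied by an arbitrary unit phase.
norm_v = math.sqrt(inner(v, v).real)
phase = cmath.exp(0.7j)
w = [phase * a / norm_v for a in v]

# Representing vector suggested by the proof: conj(lam(w)) * w.
coeff = lam(w).conjugate()
representer = [coeff * a for a in w]

# The representer reproduces lam on a test vector and coincides with v.
x = [0.5 + 0j, -1 + 2j, 3 + 0j]
error_on_x = abs(lam(x) - inner(x, representer))
distance_to_v = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(representer, v)))
```

The arbitrary phase on w cancels in the product conj(λ(w))·w, which is consistent with the uniqueness of the representing vector.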
Exercise 1.4.14. Using Exercise 1.4.11, give an alternate proof of the 1 <
p < ∞ case of Theorem 1.3.16.
(iii) Show that the map $(c_n)_{n=1}^\infty \mapsto \sum_{n=1}^\infty c_n e_n$ is an isometry from the
Hilbert space $\ell^2(\mathbf{N})$ to H. The image V of this isometry is the
smallest closed subspace of H that contains e_1, e_2, …, and which we
shall therefore call the (Hilbert space) span of e_1, e_2, ….
(iv) Take adjoints of (ii) and conclude that for any x ∈ H, we have
$\pi_V(x) = \sum_{n=1}^\infty \langle x, e_n\rangle e_n$ and $\|\pi_V(x)\| = \bigl(\sum_{n=1}^\infty |\langle x, e_n\rangle|^2\bigr)^{1/2}$.
Conclude in particular the Bessel inequality $\sum_{n=1}^\infty |\langle x, e_n\rangle|^2 \le \|x\|^2$.
Remark 1.4.17. Note the contrast here between unconditional summability
(which needs only square-summability of the coefficients
c_n) and absolute summability (which requires the stronger condition that
the c_n are absolutely summable). In particular there exist non-absolutely
summable series that are still unconditionally summable, in contrast to the
situation for scalars, in which one has the Riemann rearrangement theorem.
Now we can handle arbitrary orthonormal systems (e_α)_{α∈A}. If (c_α)_{α∈A}
is square-summable, then at most countably many of the c_α are non-zero (by
Exercise 1.3.4). Using parts (i), (ii) of Exercise 1.4.18, we can then form the
sum Σ_{α∈A} c_α e_α in an unambiguous manner. It is not hard to use Exercise
1.4.18 to then conclude that this gives an isometric embedding of ℓ²(A) into
H. The image of this isometry is the smallest closed subspace of H that
contains the orthonormal system, which we call the (Hilbert space) span of
that system. (It is the closure of the algebraic span of the system.)
Exercise 1.4.19. Let (e_α)_{α∈A} be an orthonormal system in H. Show that
the following statements are equivalent:
(i) The Hilbert space span of (e_α)_{α∈A} is all of H.
(ii) The algebraic span of (e_α)_{α∈A} (i.e., the finite linear combinations of
the e_α) is dense in H.
(iii) One has the Parseval identity ‖x‖² = Σ_{α∈A} |⟨x, e_α⟩|² for all x ∈ H.
(iv) One has the inversion formula x = Σ_{α∈A} ⟨x, e_α⟩ e_α for all x ∈ H (in
particular, the coefficients ⟨x, e_α⟩ are square-summable).
(v) The only vector that is orthogonal to all the e_α is the zero vector.
(vi) There is an isomorphism from ℓ²(A) to H that maps δ_α to e_α for all
α ∈ A (where δ_α is the Kronecker delta at α).
A system (eα )α∈A obeying any (and hence all) of the properties in Ex-
ercise 1.4.19 is known as an orthonormal basis of the Hilbert space H. All
Hilbert spaces have such a basis:
Proposition 1.4.18. Every Hilbert space has at least one orthonormal ba-
sis.
Proof. We use the standard Zorn’s lemma argument (see Section 2.4). Ev-
ery Hilbert space has at least one orthonormal system, namely the empty
system. We order the orthonormal systems by inclusion, and observe that
the union of any totally ordered set of orthonormal systems is again an
orthonormal system. By Zorn’s lemma, there must exist a maximal orthonormal
system (e_α)_{α∈A}. There cannot be any unit vector orthogonal to
all the elements of this system, since otherwise one could add that vector
to the system and contradict maximality. Applying Exercise 1.4.19 in the
contrapositive, we obtain an orthonormal basis as claimed.
Exercise 1.4.20. Show that every vector space V has at least one algebraic
basis, i.e., a set of basis vectors such that every vector in V can be expressed
uniquely as a finite linear combination of basis vectors. (Such bases are also
known as Hamel bases.)
Corollary 1.4.19. Every Hilbert space is isomorphic to ℓ²(A) for some set
A.
Exercise 1.4.21. Let A, B be sets. Show that ℓ²(A) and ℓ²(B) are isomorphic
iff A and B have the same cardinality. (Hint: The case when A or B
is finite is easy, so suppose A and B are both infinite. If ℓ²(A) and ℓ²(B)
are isomorphic, show that B can be covered by a family of at most countable
sets indexed by A, and vice versa. Then apply the Schröder-Bernstein
theorem (Section 1.13 of Volume II).)
Exercise 1.4.25. Let (X, 𝒳, μ) and (Y, 𝒴, ν) be measure spaces. Show that
L²(X × Y, 𝒳 × 𝒴, μ × ν) is the tensor product of L²(X, 𝒳, μ) and L²(Y, 𝒴, ν),
if one defines the tensor product f ⊗ g of f ∈ L²(X, 𝒳, μ) and g ∈ L²(Y, 𝒴, ν)
as f ⊗ g(x, y) := f(x)g(y).
We do not yet have enough theory in other areas to give the really
useful applications of Hilbert space theory, but let us just illustrate a
simple one, namely the development of Fourier series on the unit circle
R/Z. We can give this space the usual Lebesgue measure (identifying the
unit circle with [0, 1), if one wishes), giving rise to the complex Hilbert space
L²(R/Z). On this space we can form the characters e_n(x) := e^{2πinx} for all
integers n; one easily verifies that (e_n)_{n∈Z} is an orthonormal system. We
claim that it is in fact an orthonormal basis. By Exercise 1.4.19, it suffices
to show that the algebraic span of the e_n, i.e., the space of trigonometric
polynomials, is dense in L²(R/Z). But (see footnote 8) from an explicit computation (e.g.,
using Fejér kernels) one can show that the indicator function of any interval
can be approximated to arbitrary accuracy in the L² norm by trigonometric
polynomials, and is thus in the closure of the trigonometric polynomials. By
linearity, the same is then true of an indicator function of a finite union of
intervals; since Lebesgue measurable sets in R/Z can be approximated to
arbitrary accuracy by finite unions of intervals, the same is true for indicators
of measurable sets. By linearity, the same is true for simple functions, and
by density (Proposition 1.3.8) the same is true for arbitrary L² functions,
and the claim follows.
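The orthonormality of the characters, which the text says is easily verified, can also be checked numerically. The sketch below approximates the inner product on [0, 1) by an equally spaced Riemann sum; the function names `character`, `inner_riemann`, `gram_entry` and the choice of 64 sample points are assumptions made for this illustration.

```python
import cmath


def character(n, x):
    """The character e_n(x) = exp(2*pi*i*n*x) on R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)


def inner_riemann(f, g, samples=64):
    """Approximate <f, g> = integral over [0, 1) of f * conj(g) by a Riemann sum."""
    total = 0j
    for k in range(samples):
        x = k / samples
        total += f(x) * g(x).conjugate()
    return total / samples


def gram_entry(n, m):
    """Approximate <e_n, e_m>."""
    return inner_riemann(lambda x: character(n, x), lambda x: character(m, x))


# For 0 < |n - m| < samples the sum is a full sum of roots of unity, so the
# Riemann sum agrees with the integral up to rounding error.
diagonal = [abs(gram_entry(n, n) - 1) for n in range(-3, 4)]
off_diagonal = [
    abs(gram_entry(n, m)) for n in range(-3, 4) for m in range(-3, 4) if n != m
]
```

Here `diagonal` and `off_diagonal` collect the deviations from the expected values 1 and 0. This only tests orthonormality for a handful of frequencies; completeness of the system is the density statement argued in the paragraph above.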
The Fourier transform f̂ : Z → C of a function f ∈ L²(R/Z) is defined
as
(1.56) $\hat{f}(n) := \langle f, e_n\rangle = \int_0^1 f(x)\,e^{-2\pi i n x}\,dx.$
8 One can also use the Stone-Weierstrass theorem here; see Theorem 1.10.18.
$\sum_{n \in \mathbf{Z}} |\hat{f}(n)|^2 = \int_{\mathbf{R}/\mathbf{Z}} |f(x)|^2\,dx$
Section 1.5
(There are also more sophisticated ways to study an object via its maps,
e.g., by studying extensions, joinings, splittings, universal lifts, etc. The
general study of objects via the maps between them is formalised abstractly
in modern mathematics as category theory, and is also closely related to
homological algebra.)
A remarkable phenomenon in many areas of mathematics is that of (con-
travariant) duality: that the maps into and out of one type of mathematical
object X can be naturally associated to the maps out of and into a dual
object X* (note the reversal of arrows here!). In some cases, the dual object
X* looks quite different from the original object X. (For instance, in Stone
duality, discussed in Section 2.3, X would be a Boolean algebra (or some
other partially ordered set) and X* would be a compact totally disconnected
Hausdorff space (or some other topological space).) In other cases, most notably
with Hilbert spaces as discussed in Section 1.4, the dual object X* is
essentially identical to X itself.
In these notes we discuss a third important case of duality, namely duality
of normed vector spaces, which is of an intermediate nature to the
previous two examples: The dual X* of a normed vector space turns out to
be another normed vector space, but generally one which is not equivalent
to X itself (except in the important special case when X is a Hilbert space,
as mentioned above). On the other hand, the double dual (X*)* turns out
to be closely related to X, and in several (but not all) important cases, is
essentially identical to X. One of the most important uses of dual spaces in
functional analysis is that they allow one to define the transpose T* : Y* → X*
of a continuous linear operator T : X → Y.
A fundamental tool in understanding duality of normed vector spaces
will be the Hahn-Banach theorem, which is an indispensable tool for ex-
ploring the dual of a vector space. (Indeed, without this theorem, it is not
clear at all that the dual of a non-trivial normed vector space is non-trivial!)
Thus, we shall study this theorem in detail in this section concurrently with
our discussion of duality.
1.5.1. Duality. In the category of normed vector spaces, the natural no-
tion of a map (or morphism) between two such spaces is that of a continuous
linear transformation T : X → Y between two normed vector spaces X, Y .
By Lemma 1.3.17, any such linear transformation is bounded, in the sense
that there exists a constant C such that ‖Tx‖_Y ≤ C‖x‖_X for all x ∈ X. The
least such constant C is known as the operator norm of T, and is denoted
‖T‖_op or simply ‖T‖.
Two normed vector spaces X, Y are equivalent if there is an invertible
continuous linear transformation T : X → Y from X to Y, thus T is bijective
and there exist constants C, c > 0 such that c‖x‖_X ≤ ‖Tx‖_Y ≤ C‖x‖_X for
all x ∈ X. If one can take C = c = 1, then T is an isometry, and X and Y
are called isomorphic. When one has two norms ‖·‖_1, ‖·‖_2 on the same vector
space X, we say that the norms are equivalent if the identity from (X, ‖·‖_1)
to (X, ‖·‖_2) is an invertible continuous transformation, i.e., that there exist
constants C, c > 0 such that c‖x‖_1 ≤ ‖x‖_2 ≤ C‖x‖_1 for all x ∈ X.
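As a concrete instance of equivalent norms, one can compare the supremum norm and the Euclidean norm on R^n. The worked example below is an illustration chosen for this purpose (it is not taken from the text); here the supremum norm plays the role of the first norm and the Euclidean norm that of the second, with constants c = 1 and C = √n.

```latex
% On R^n, compare \|x\|_\infty := \max_i |x_i| with \|x\|_{\ell^2} := (\sum_i |x_i|^2)^{1/2}.
\[
\|x\|_\infty^2 = \max_{1\le i\le n}|x_i|^2 \le \sum_{i=1}^n |x_i|^2
 \le n \max_{1\le i\le n}|x_i|^2 = n\,\|x\|_\infty^2 ,
\]
% and taking square roots,
\[
\|x\|_\infty \le \|x\|_{\ell^2} \le \sqrt{n}\,\|x\|_\infty
 \quad\text{for all } x \in \mathbf{R}^n .
\]
% Both constants are attained: x = (1,0,\dots,0) for the left inequality,
% x = (1,1,\dots,1) for the right one.
```

Since C = √n grows with the dimension, this computation also hints at why such comparisons need more care in infinite-dimensional spaces.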
From Exercise 1.5.2, we see that the dual of any normed vector space
is a Banach space, and so duality is arguably a Banach space notion rather
than a normed vector space notion. The following exercise reinforces this:
Exercise 1.5.4. We say that a normed vector space X has a completion X̄
if X̄ is a Banach space and X can be identified with a dense subspace of X̄
(cf. Exercise 1.4.8).
(i) Show that every normed vector space X has at least one completion
X̄, and that any two completions X̄, X̄′ are isomorphic in the sense
that there exists an isomorphism from X̄ to X̄′ which is the identity
on X.
(ii) Show that the dual spaces X* and (X̄)* are isomorphic to each other.
The next few exercises are designed to give some intuition as to how
dual spaces work.
Exercise 1.5.5. Let R^n be given the Euclidean metric. Show that (R^n)*
is isomorphic to R^n. Establish the corresponding result for the complex
spaces C^n.
Exercise 1.5.6. Let c_c(N) be the vector space of sequences (a_n)_{n∈N} of real
or complex numbers which are compactly supported (i.e., at most finitely
many of the a_n are non-zero). We give c_c(N) the uniform norm ‖·‖_{ℓ^∞}.
(i) Show that the dual space c_c(N)* is isomorphic to ℓ¹(N).
(ii) Show that the completion of c_c(N) is isomorphic to c_0(N), the space
of sequences on N that go to zero at infinity (again with the uniform
norm); thus, by Exercise 1.5.4, the dual space of c_0(N) is isomorphic
to ℓ¹(N) also.
(iii) On the other hand, show that the dual of ℓ¹(N) is isomorphic to
ℓ^∞(N), a space which is strictly larger than c_c(N) or c_0(N). Thus
we see that the double dual of a Banach space can be strictly larger
than the space itself.
Exercise 1.5.7. Let H be a real or complex Hilbert space. Using the Riesz
representation theorem for Hilbert spaces (Theorem 1.4.13), show that the
dual space H* is isomorphic (as a normed vector space) to the conjugate
space H̄ (see Example 1.4.11), with an element ḡ ∈ H̄ being identified
with the linear functional f ↦ ⟨f, g⟩. Thus we see that Hilbert spaces are
essentially self-dual (if we ignore the pesky conjugation sign).
One of the key purposes of introducing the notion of a dual space is that
it allows one to define the notion of a transpose.
Proof. We can assume that v lies outside Y, since the claim is trivial otherwise.
We can also normalise ‖λ‖_{Y*} = 1 (the claim is of course trivial if
‖λ‖_{Y*} vanishes). To specify the extension λ̃ of λ, it suffices by linearity to
specify the value of λ̃(v). In order for the extension λ̃ to continue to have
operator norm 1, we require that
$|\tilde\lambda(y + tv)| \le \|y + tv\|_X$
for all t ∈ R and y ∈ Y. This is automatic for t = 0, so by homogeneity it
suffices to attain this bound for t = 1. We rearrange this a bit as
$\sup_{y' \in Y} \bigl(-\lambda(y') - \|y' + v\|_X\bigr) \le \tilde\lambda(v) \le \inf_{y \in Y} \bigl(\|y + v\|_X - \lambda(y)\bigr).$
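For a value of λ̃(v) to exist between the two bounds, the supremum on the left must not exceed the infimum on the right. A short verification is sketched below, assuming (as in the proof) that the scalars are real, that λ has norm 1 on Y, and writing the two-sided condition as −λ(y′) − ‖y′ + v‖_X ≤ λ̃(v) ≤ ‖y + v‖_X − λ(y).

```latex
% For y, y' \in Y, use \|\lambda\|_{Y^*} = 1 and the triangle inequality:
\begin{align*}
\lambda(y) - \lambda(y') = \lambda(y - y') &\le \|y - y'\|_X \\
 &= \|(y + v) - (y' + v)\|_X \le \|y + v\|_X + \|y' + v\|_X .
\end{align*}
% Rearranging,
\[
-\lambda(y') - \|y' + v\|_X \le \|y + v\|_X - \lambda(y)
 \quad\text{for all } y, y' \in Y,
\]
% so the supremum over y' of the left-hand side is at most the infimum over y
% of the right-hand side, and any real number in between is an admissible
% value for \tilde\lambda(v).
```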
Proof. This is a standard “Zorn’s lemma argument” (see Section 2.4). Fix
Y, X, λ. Define a partial extension of λ to be a pair (Y′, λ′), where Y′ is an
intermediate subspace between Y and X, and λ′ is an extension of λ with
the same operator norm as λ. The set of all partial extensions is partially
ordered by declaring (Y″, λ″) ≥ (Y′, λ′) if Y″ contains Y′ and λ″ extends λ′.
It is easy to see that every chain of partial extensions has an upper bound;
hence, by Zorn’s lemma, there must be a maximal partial extension (Y_*, λ_*).
If Y_* = X, we are done; otherwise, one can find v ∈ X\Y_*. By Proposition
1.5.7, we can then extend λ_* further to the larger space spanned by Y_* and
v, a contradiction; and the claim follows.
Remark 1.5.9. Of course, this proof of the Hahn-Banach theorem relied
on the axiom of choice (via Zorn’s lemma) and is thus non-constructive.
It turns out that this is, to some extent, necessary: it is not possible to
prove the Hahn-Banach theorem if one deletes the axiom of choice from the
axioms of set theory (although it is possible to deduce the theorem from
slightly weaker versions of this axiom, such as the ultrafilter lemma).
Given a normed vector space X, we can form its double dual (X*)*: the
space of linear functionals on X*. There is a very natural map ι : X →
(X*)*, defined as
(1.58) ι(x)(λ) := λ(x)
for all x ∈ X and λ ∈ X*. (This map is closely related to the Gelfand
transform in the theory of operator algebras; see Section 1.10.4.) It is easy
to see that ι is a continuous linear transformation, with operator norm at
most 1. But the Hahn-Banach theorem gives a stronger statement:
Theorem 1.5.10. ι is an isometry.
Proof. We need to show that ‖ι(x)‖_{X**} = ‖x‖_X for all x ∈ X. The upper
bound is clear; the lower bound follows from Exercise 1.5.14.
Exercise 1.5.15. Let Y be a subspace of a normed vector space X. Define
the complement Y⊥ of Y to be the space of all λ ∈ X* which vanish on Y.
(i) Show that Y⊥ is a closed subspace of X*, and that Ȳ = {x ∈ X :
λ(x) = 0 for all λ ∈ Y⊥} (compare with Exercise 1.4.13). In other
words, ι(Ȳ) = ι(X) ∩ Y⊥⊥.
(ii) Show that Y⊥ is trivial if and only if Y is dense, and Y⊥ = X* if
and only if Y is trivial.
(iii) Show that Y⊥ is isomorphic to the dual of the quotient space X/Ȳ
(which has the norm ‖x + Ȳ‖_{X/Ȳ} := inf_{y∈Ȳ} ‖x + y‖_X).
(iv) Show that Y* is isomorphic to X*/Y⊥.
From Theorem 1.5.10, every normed vector space can be identified with
a subspace of its double dual (and every Banach space is identified with
a closed subspace of its double dual). If ι is surjective, then we have an
isomorphism X ≡ X**, and we say that X is reflexive in this case; since X**
is a Banach space, we conclude that only Banach spaces can be reflexive.
From linear algebra we see in particular that any finite-dimensional normed
vector space is reflexive; from Exercises 1.5.7 and 1.5.8 we see that any
Hilbert space and any L^p space with 1 < p < ∞ on a σ-finite space are also
reflexive (and the hypothesis of σ-finiteness can in fact be dropped). On the
other hand, from Exercise 1.5.6, we see that the Banach space c_0(N) is not
reflexive.
An important fact is that ℓ¹(N) is also not reflexive: the dual of ℓ¹(N)
is equivalent to ℓ^∞(N), but the dual of ℓ^∞(N) is strictly larger than
ℓ¹(N). Indeed, consider the subspace c(N) of ℓ^∞(N) consisting of bounded
convergent sequences (equivalently, this is the space spanned by c_0(N) and
the constant sequence (1)_{n∈N}). The limit functional (a_n)_{n=1}^∞ ↦ lim_{n→∞} a_n
is a bounded linear functional on c(N), with operator norm 1, and thus by
the Hahn-Banach theorem can be extended to a generalised limit functional
λ : ℓ^∞(N) → C which is a continuous linear functional of operator norm
1. As such generalised limit functionals annihilate all of c_0(N) but are still
non-trivial, they do not correspond to any element of ℓ¹(N) ≡ c_0(N)*.
Exercise 1.5.16. Let λ : ℓ^∞(N) → C be a generalised limit functional (i.e.,
an extension of the limit functional of c(N) of operator norm 1) which is
also an algebra homomorphism, i.e., $\lambda((x_n y_n)_{n=1}^\infty) = \lambda((x_n)_{n=1}^\infty)\,\lambda((y_n)_{n=1}^\infty)$
for all sequences $(x_n)_{n=1}^\infty, (y_n)_{n=1}^\infty \in \ell^\infty(\mathbf{N})$. Show that there exists a unique
Proof. Take Y to be the unit ball in X*, then the map ι identifies X with
a subspace of BC(Y).
Remark 1.5.12. If X is separable, it is known that one can take Y to just
be the unit interval [0, 1]; this is the Banach-Mazur theorem, which we will
not prove here.
such that ‖λ‖_{Y*} = 1 and λ(Tx) = T*λ(x) = ‖Tx‖_Y ≥ α‖x‖, and thus
‖T*λ‖_{X*} ≥ α. This implies that ‖T*‖_op ≥ α; taking suprema over all α
strictly less than ‖T‖_op we obtain the claim.
Proof. Clearly (ii) implies (i); now we show that (i) implies (ii). We first
handle the case when A and B are convex cones.
Define a good pair to be a pair (A, B) where A and B are disjoint convex
cones, with A algebraically open, thus (A, B) is a good pair by hypothesis.
We can order (A, B) ≤ (A′, B′) if A′ contains A and B′ contains B. A
standard application of Zorn’s lemma (Section 2.4) reveals that any good
pair (A, B) is contained in a maximal good pair, and so without loss of
generality we may assume that (A, B) is a maximal good pair.
We can of course assume that neither A nor B is empty. We now claim
that B is the complement of A. For if not, then there exists v ∈ V which
does not lie in either A or B. By the maximality of (A, B), the convex cone
generated by B ∪ {v} must intersect A at some point, say w. By dilating
w if necessary we may assume that w lies on a line segment between v and
some point b in B. By using the convexity and disjointness of A and B
one can then deduce that for any a ∈ A, the ray {a + t(w − b) : t > 0} is
disjoint from B. Thus one can enlarge A to the convex cone generated by
A and w − b, which is still algebraically open and now strictly larger than
A (because it contains v), a contradiction. Thus B is the complement of A.
Let us call a line in V monochromatic if it is entirely contained in A
or entirely contained in B. Note that if a line is not monochromatic, then
(because A and B are convex and partition the line, and A is algebraically
open) the line splits into an open ray contained in A and a closed ray con-
tained in B. From this we can conclude that if a line is monochromatic,
then all parallel lines must also be monochromatic, because otherwise we
look at the ray in the parallel line which contains A and use convexity of
both A and B to show that this ray is adjacent to a halfplane contained in
B, contradicting algebraic openness. Now let W be the space of all vectors w
for which there exists a monochromatic line in the direction w (including 0).
Then W is easily seen to be a vector space; since A, B are non-empty, W is
a proper subspace of V. On the other hand, if w and w′ are not in W, some
playing around with the property that A and B are convex sets partitioning
V shows that the plane spanned by w and w′ contains a monochromatic
line, and hence some non-trivial linear combination of w and w′ lies in W.
Thus V/W is precisely one dimensional. Since every line with direction in
W is monochromatic, A and B also have well-defined quotients A/W and
B/W on this one-dimensional subspace, which remain convex (with A/W
still algebraically open). But then it is clear that A/W and B/W are an
open and closed ray from the origin in V /W , respectively. It is then a rou-
tine matter to construct a linear functional λ : V → R (with null space W )
such that A = {λ < 0} and B = {λ ≥ 0}, and the claim follows.
To establish the general case when A, B are not convex cones, we lift to
one higher dimension and apply the previous result to convex cones A′, B′ ⊂
R × V defined by A′ := {(t, tx) : t > 0, x ∈ A}, B′ := {(t, tx) : t > 0, x ∈ B}.
We leave the verification that this works as an exercise.
Exercise 1.5.19. Use the geometric Hahn-Banach theorem to reprove Ex-
ercise 1.5.18, thus providing a slightly different proof of the Hahn-Banach
theorem. (It is possible to reverse these implications and deduce the geomet-
ric Hahn-Banach theorem from the usual Hahn-Banach theorem, but this
is somewhat trickier, requiring one to fashion a norm out of the difference
A − B of two convex cones.)
Exercise 1.5.20 (Algebraic Hahn-Banach theorem). Let V be a vector
space over a field F , let W be a subspace of V , and let λ : W → F be a
linear map. Show that there exists a linear map λ̃ : V → F which extends
λ.
Section 1.6
A quick review of point-set topology
1.6.1. Metric spaces. In many spaces, one wants a notion of when two
points in the space are near or far. A particularly quantitative and intuitive
way to formalise this notion is via the concept of a metric space.
Example 1.6.2. Every normed vector space (X, ‖·‖) is a metric space, with
distance function d(x, y) := ‖x − y‖.
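The claim in Example 1.6.2 can be checked axiom by axiom. The following is a short verification, assuming the usual axioms for a norm (positivity, homogeneity, and the triangle inequality), which are not restated in this section:

```latex
\begin{align*}
d(x,y) &= \|x-y\| \ge 0, \quad \text{with equality if and only if } x = y, \\
d(y,x) &= \|(-1)(x-y)\| = |-1|\,\|x-y\| = d(x,y), \\
d(x,z) &= \|(x-y)+(y-z)\| \le \|x-y\| + \|y-z\| = d(x,y) + d(y,z).
\end{align*}
```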
Given a metric space, one can then define various useful topological
structures. There are two ways to do so. One is via the machinery of
convergent sequences:
Definition 1.6.6 (Topology of a metric space). Let (X, d) be a metric space.
• A sequence xn of points in X is said to converge to a limit x ∈ X if
one has d(xn , x) → 0 as n → ∞. In this case, we say that xn → x in
the metric d as n → ∞, and that limn→∞ xn = x in the metric space
X. (It is easy to see that any sequence of points in a metric space
has at most one limit.)
• A point x is an adherent point of a set E ⊂ X if it is the limit of some
sequence in E. (This is slightly different from being a limit point of
E, which is equivalent to being an adherent point of E\{x}; every
adherent point is either a limit point or an isolated point of E.) The
set of all adherent points of E is called the closure Ē of E. A set E
is closed if it contains all its adherent points, i.e., if E = Ē. A set
E is dense if every point in X is adherent to E, or equivalently if
Ē = X.
• Given any x in X and r > 0, define the open ball B(x, r) centred
at x with radius r to be the set of all y in X such that d(x, y) < r.
Given a set E, we say that x is an interior point of E if there is some
open ball centred at x which is contained in E. The set of all interior
points is called the interior E ◦ of E. A set is open if every point is
an interior point, i.e., if E = E ◦ .
In the next section we will adopt this “open sets first” perspective when
defining topological spaces.
On the other hand, there are some other properties of subsets of a metric
space which require the metric structure more fully, and cannot be defined
purely in terms of open sets (see, e.g., Example 1.6.24), although some of
these concepts can still be defined using a structure intermediate to metric
spaces and topological spaces, such as uniform space. For instance:
Definition 1.6.7. Let (X, d) be a metric space.
• A sequence (x_n)_{n=1}^∞ of points in X is a Cauchy sequence if
d(xn , xm ) → 0 as n, m → ∞ (i.e., for every ε > 0 there exists N > 0
such that d(xn , xm ) ≤ ε for all n, m ≥ N ).
• A space X is complete if every Cauchy sequence is convergent.
• A set E in X is bounded if it is contained inside a ball.
• A set E is totally bounded in X if for every ε > 0, E can be covered
by finitely many balls of radius ε.
Exercise 1.6.2. Show that any metric space X can be identified with a
dense subspace of a complete metric space X̄, known as a metric completion
or Cauchy completion of X. (For instance, R is a metric completion of
Q.) (Hint: One can define a real number to be an equivalence class of
Cauchy sequences of rationals. Once the reals are defined, essentially the
same construction works in arbitrary metric spaces.) Furthermore, if X̄′ is
another metric completion of X, show that there exists an isometry between
X̄ and X̄′ which is the identity on X. Thus, up to isometry, there is a unique
metric completion of any metric space.
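To make the incompleteness of Q concrete, here is a minimal Python sketch. It assumes Newton's iteration for x² = 2 as the example (this choice, and the function name `newton_sqrt2`, are illustrative and not from the text). The iterates are rational and form a Cauchy sequence, but none of them squares to 2, and the limit they approach in R is irrational.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Return the rational Newton iterates x_0, ..., x_steps for x^2 = 2."""
    xs = [Fraction(2)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)  # only field operations, so x stays in Q
    return xs

xs = newton_sqrt2(6)
# Distances between consecutive iterates shrink rapidly (Cauchy behaviour).
gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]
for x, gap in zip(xs[1:], gaps):
    # Third column: does the iterate solve x^2 = 2 exactly? (It never does.)
    print(float(x), float(gap), x * x == 2)
```

Exact rational arithmetic is used so that the statement "no iterate satisfies x² = 2" is checked without rounding error.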
Any space that obeys one of the four equivalent properties in Theorem
1.6.8 is called a compact space; a subset E of a metric space X is said to be
compact if it is a compact space when viewed as a subspace of X. There are
some variants of the notion of compactness which are also of importance for
us:
• A space is σ-compact if it can be expressed as the countable union
of compact sets. (For instance, the real line R with the usual metric
is σ-compact.)
• A space is locally compact if every point is contained in the interior
of a compact set. (For instance, R is locally compact.)
• A subset of a space is precompact or relatively compact if it is con-
tained inside a compact set (or equivalently, if its closure is compact).
Another fundamental notion in the subject is that of a continuous map.
Exercise 1.6.5. Let f : X → Y be a map from one metric space (X, dX )
to another (Y, dY ). Then the following are equivalent:
collection of open sets has become an abstract lattice, in the spirit of Section 2.3, but we will not
need such notions in this course.
Any notion in metric space theory which can be defined purely in terms
of open sets, can now be defined for topological spaces. Thus for instance:
Definition 1.6.15. Let (X, F) be a topological space.
• A sequence xn of points in X converges to a limit x ∈ X if and only
if every open neighbourhood of x (i.e., an open set containing x)
contains xn for all sufficiently large n. In this case we write xn → x
in the topological space (X, F ), and (if x is unique) we write x =
limn→∞ xn .
• A point is a sequentially adherent point of a set E if it is the limit of
some sequence in E.
• A point x is an adherent point of a set E if and only if every open
neighbourhood of x intersects E.
• The set of all adherent points of E is called the closure of E and is
denoted Ē.
• A set E is closed if and only if its complement is open, or equivalently
if it contains all its adherent points.
• A set E is dense if and only if every non-empty open set intersects
E, or equivalently if its closure is X.
• The interior of a set E is the union of all the open sets contained
in E, and x is called an interior point of E if and only if some
neighbourhood of x is contained in E.
• A space X is sequentially compact if every sequence has a convergent
subsequence.
• A space X is compact if every open cover has a finite subcover.
• The concepts of being σ-compact, locally compact, and precom-
pact can be defined as before. (One could also define sequential
σ-compactness, etc., but these notions are rarely used.)
• A map f : X → Y between topological spaces is sequentially contin-
uous if whenever xn converges to a limit x in X, f (xn ) converges to
a limit f (x) in Y .
• A map f : X → Y between topological spaces is continuous if the
inverse image of every open set is open.
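Several of these definitions can be tested by brute force on a finite topological space. The sketch below assumes a hypothetical three-point space X = {1, 2, 3} with open sets ∅, {1}, {1, 2}, X (chosen only for illustration), and computes interiors and closures directly from the definitions above.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
# Hypothetical topology on X: a chain of sets, so it is closed under
# unions and intersections.
OPEN = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def interior(E):
    """Union of all open sets contained in E."""
    return frozenset().union(*(U for U in OPEN if U <= E))

def closure(E):
    """All x in X such that every open neighbourhood of x intersects E."""
    return frozenset(x for x in X if all(U & E for U in OPEN if x in U))

def is_closed(E):
    """E is closed when its complement is open."""
    return (X - E) in OPEN

def subsets(S):
    """All subsets of the finite set S."""
    items = sorted(S)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

for E in subsets(X):
    # "Complement is open" agrees with "contains all its adherent points".
    assert is_closed(E) == (closure(E) == E)
    # Closure and interior are dual under complementation.
    assert closure(E) == X - interior(X - E)

print(sorted(closure(frozenset({1}))))      # {1} is dense in this topology
print(sorted(interior(frozenset({1, 3}))))
```

In this space the singleton {1} meets every non-empty open set, so it is dense, while {3} is closed because its complement {1, 2} is open.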
Remark 1.6.16. The stronger a topology becomes, the more open and
closed sets it will have, but fewer sequences will converge, there are fewer
(sequentially) adherent points and (sequentially) compact sets, closures be-
come smaller, and interiors become larger. There will be more (sequentially)
continuous functions on this space, but fewer (sequentially) continuous
functions into the space. Note also that the identity map from a space X with
one topology F to the same space X with a different topology F′ is
continuous precisely when F is stronger than F′.
Example 1.6.17. In a metric space, these topological notions coincide with
their metric counterparts, and sequential compactness and compactness are
equivalent, as are sequential continuity and continuity.
Exercise 1.6.8 (Urysohn’s subsequence principle). Let xn be a sequence
in a topological space X, and let x be another point in X. Show that the
following are equivalent:
• xn converges to x.
• Every subsequence of xn converges to x.
• Every subsequence of xn has a further subsequence that converges to
x.
Exercise 1.6.9. Show that every sequentially adherent point is an adherent
point, and every continuous function is sequentially continuous.
Remark 1.6.18. The converses to Exercise 1.6.9 are unfortunately not
always true in general topological spaces. For instance, if we endow an
uncountable set X with the cocountable topology (so that a set is open if
it is either empty or its complement is at most countable), then we see
that the only convergent sequences are those which are eventually constant.
Thus, every subset of X contains its sequentially adherent points, and every
function from X to another topological space is sequentially continuous,
even though not every set in X is closed and not every function on X is
continuous. An example of a set which is sequentially compact but not
compact is the first uncountable ordinal with the order topology (Exercise
1.6.10). It is trickier to give an example of a compact space which is not
sequentially compact; this will have to wait until we establish Tychonoff’s
theorem (Theorem 1.8.14). However one can “fix” this discrepancy between
the sequential and non-sequential concepts by replacing sequences with the
more general notion of nets; see Section 1.6.3.
Example 1.6.21 (Order topology). Any totally ordered set (X, <) gen-
erates the order topology, defined as the topology generated by the sets
{x ∈ X : x > a} and {x ∈ X : x < a} for all a ∈ X. In particular, the
extended real line [−∞, +∞] can be given the order topology, and the no-
tion of convergence of sequences in this topology to either finite or infinite
limits is identical to the notion one is accustomed to in undergraduate real
analysis. (On the real line, of course, the order topology corresponds to the
usual topology.) Also observe that a function n ↦ xn from the extended
natural numbers N ∪ {+∞} (with the order topology) into a topological
space X is continuous if and only if xn → x+∞ as n → ∞, so one can
interpret convergence of sequences as a special case of continuity.
Example 1.6.27. Every metric space is Hausdorff (one can use the open
balls B(x, d(x, y)/2) and B(y, d(x, y)/2) as the separating neighbourhoods).
On the other hand, the trivial topology (Example 1.6.13) on two or more
points is not Hausdorff, and neither is the cocountable topology (Remark
1.6.18) on an uncountable set, or the upper topology (Example 1.6.23) on
the real line. Thus, these topologies do not arise from a metric.
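The separation claim at the start of Example 1.6.27 rests on a one-line application of the triangle inequality. As a sketch, suppose x ≠ y and that some point z lay in both balls; then

```latex
\[
d(x,y) \le d(x,z) + d(z,y)
< \tfrac{1}{2}\,d(x,y) + \tfrac{1}{2}\,d(x,y) = d(x,y),
\]
```

which is absurd, so the two balls are disjoint open neighbourhoods of x and y respectively.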
Exercise 1.6.11. Show that the half-open topology (Example 1.6.22) is
Hausdorff, but does not arise from a metric. (Hint: Assume for contradiction
that the half-open topology did arise from a metric. Then show that for
every real number x there exists a rational number q and a positive integer
n such that the ball of radius 1/n centred at q has infimum x.) Thus there are
more obstructions to metrisability than just the Hausdorff property; a more
complete answer is provided by Urysohn’s metrisation theorem (Theorem
2.5.7).
Exercise 1.6.12. Show that in a Hausdorff space, any sequence can have at
most one limit. (For a more precise statement, see Exercise 1.6.16 below.)
if there exists β ∈ A such that P (α) holds for all α ≥ β. (Note in particular
that if P (α) and Q(α) separately hold for sufficiently large α, then their
conjunction P (α) ∧ Q(α) also holds for sufficiently large α.)
A net (xα )α∈A in a topological space X is said to converge to a limit
x ∈ X if for every neighbourhood V of x, we have xα ∈ V for all sufficiently
large α.
A subnet of a net (xα )α∈A is a tuple of the form (xφ(β) )β∈B , where
(B, <) is another directed set, and φ : B → A is a monotone map (thus
φ(β′) ≥ φ(β) whenever β′ ≥ β) which also has cofinal image. This means
that for any α ∈ A there exists β ∈ B with φ(β) ≥ α (in particular, if P (α)
is true for sufficiently large α, then P (φ(β)) is true for sufficiently large β).
Remark 1.6.30. Every sequence is a net, but one can create nets that do
not arise from sequences (in particular, one can take A to be uncountable).
Note a subtlety in the definition of a subnet—we do not require φ to be
injective, so B can in fact be larger than A! Thus subnets differ a little bit
from subsequences in that they allow repetitions.
Remark 1.6.31. Given a directed set A, one can endow A∪{+∞} with the
topology generated by the singleton sets {α} with α ∈ A together with the
sets [α, +∞] := {β ∈ A∪{+∞} : β ≥ α} for α ∈ A, with the convention that
+∞ > α for all α ∈ A. The property of being directed is precisely saying
that these sets form a base. A net (xα )α∈A converges to a limit x+∞ if and
only if the function α ↦ xα is continuous on A∪{+∞} (cf. Example 1.6.21).
Also, if (xφ(β) )β∈B is a subnet of (xα )α∈A , then φ is a continuous map from
B ∪ {+∞} to A ∪ {+∞}, if we adopt the convention that φ(+∞) = +∞.
In particular, a subnet of a convergent net remains convergent to the same
limit.
The point of working with nets instead of sequences is that one no longer
needs to worry about the distinction between sequential and non-sequential
concepts in topology, as the following exercises show.
Exercise 1.6.13. Let X be a topological space, let E be a subset of X, and
let x be an element of X. Show that x is an adherent point of E if and only
if there exists a net (xα )α∈A in E that converges to x. (Hint: Take A to be
the directed set of neighbourhoods of x, ordered by reverse set inclusion.)
Exercise 1.6.14. Let f : X → Y be a map between two topological spaces.
Show that f is continuous if and only if for every net (xα )α∈A in X that
converges to a limit x, the net (f (xα ))α∈A converges in Y to f (x).
Exercise 1.6.15. Let X be a topological space. Show that X is compact if
and only if every net has a convergent subnet. (Hint: Equate both properties
of X with the finite intersection property, and review the proof of Theorem
Section 1.7
Thus one can think of a null set as a set which is nowhere dense in some
measure-theoretic sense.
It turns out that there are analogues of these results when the measure
space X = (X, 𝒳, μ) is replaced instead by a complete metric space X =
(X, d). Here, the appropriate notion of a small set is not a null set, but
rather that of a nowhere dense set: a set E which is not dense in any ball,
or equivalently a set whose closure has empty interior. (A good example of
a nowhere dense set would be a proper subspace, or smooth submanifold, of
R^d, or a Cantor set; on the other hand, the rationals are a dense subset of R
and thus clearly not nowhere dense.) We then have the following important
result:
Theorem 1.7.3 (Baire category theorem). Let E1, E2, . . . be an at most
countable sequence of subsets of a complete metric space X. If ⋃_n En
contains a ball B, then at least one of the En is dense in a subball B′ of B
(and in particular is not nowhere dense). To put it in the contrapositive:
the countable union of nowhere dense sets cannot contain a ball.
Exercise 1.7.1. Show that the Baire category theorem is equivalent to the
claim that in a complete metric space, the countable intersection of open
dense sets remains dense.
Exercise 1.7.2. Using the Baire category theorem, show that any non-
empty complete metric space without isolated points is uncountable. (In
particular, this shows that the Baire category theorem can fail for incomplete
metric spaces such as the rationals Q.)
To quickly illustrate an application of the Baire category theorem,
observe that it implies that one cannot cover a finite-dimensional real or
complex vector space R^n, C^n by a countable number of proper subspaces. One
can of course also establish this fact by using Lebesgue measure on this
space. However, the advantage of the Baire category approach is that it
also works well in infinite dimensional complete normed vector spaces, i.e.,
Banach spaces, whereas the measure-theoretic approach runs into significant
difficulties in infinite dimensions. This leads to three fundamental
equivalences between the qualitative theory of continuous linear operators on
Banach spaces (e.g., finiteness, surjectivity, etc.) and the quantitative theory
(i.e., estimates):
• The uniform boundedness principle that equates the qualitative
boundedness (or convergence) of a family of continuous operators
with their quantitative boundedness.
• The open mapping theorem that equates the qualitative solvability
of a linear problem Lu = f with the quantitative solvability of that
problem.
Strictly speaking, these theorems are not used much directly in practice,
because one usually works in the reverse direction (i.e., first proving quan-
titative bounds, and then deriving qualitative corollaries). But the above
three theorems help explain why we usually approach qualitative problems
in functional analysis via their quantitative counterparts.
Let us first prove the Baire category theorem:
Proof of Baire category theorem. Assume that the Baire category the-
orem failed; then it would be possible to cover a ball B(x0 , r0 ) in a complete
metric space by a countable family E1 , E2 , E3 , . . . of nowhere dense sets.
We now invoke the following easy observation: if E is nowhere dense,
then every ball B contains a subball B′ which is disjoint from E. Indeed,
this follows immediately from the definition of a nowhere dense set.
Invoking this observation, we can find a ball B(x1 , r1 ) in B(x0 , r0 /10)
(say) which is disjoint from E1 ; we may also assume that r1 ≤ r0 /10 by
shrinking r1 as necessary. Then, inside B(x1 , r1 /10), we can find a ball
B(x2 , r2 ) which is also disjoint from E2 , with r2 ≤ r1 /10. Continuing this
process, we end up with a nested sequence of balls B(x_n, r_n), each of which
is disjoint from E1, . . . , En, and such that B(x_n, r_n) ⊂ B(x_{n−1}, r_{n−1}/10)
and r_n ≤ r_{n−1}/10 for all n = 1, 2, . . ..
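The nested-ball construction can be run explicitly in the simplest case, where the space is R and each nowhere dense set is a single point. The Python sketch below is only an illustration under those assumptions; the function name `nested_balls` and the particular list of rationals are hypothetical choices, not part of the proof.

```python
from fractions import Fraction

def nested_balls(points, x0=Fraction(0), r0=Fraction(1)):
    """Build balls B(x_n, r_n) in R with B(x_n, r_n) inside B(x_{n-1}, r_{n-1}/10),
    r_n <= r_{n-1}/10, and B(x_n, r_n) disjoint from the singleton {points[n-1]}."""
    balls = [(x0, r0)]
    for p in points:
        x, r = balls[-1]
        new_r = r / 100
        # Two candidate centres inside B(x, r/10), a distance r/10 apart.
        candidates = (x - r / 20, x + r / 20)
        # Keep the candidate farther from p; its distance to p is at least
        # r/20, which exceeds new_r, so the new ball misses p.
        new_x = max(candidates, key=lambda c: abs(c - p))
        balls.append((new_x, new_r))
    return balls

# Hypothetical data: the first few rationals in [-1, 1] in some enumeration.
points = [Fraction(a, b) for b in range(1, 6) for a in range(-b, b + 1)]
balls = nested_balls(points)
x_last, r_last = balls[-1]
print(float(x_last), all(x_last != p for p in points))
```

Only finitely many steps are carried out here, so the last centre plays the role of the limit point; in the proof itself the limit exists by completeness.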
From the triangle inequality we have d(x_n, x_{n−1}) ≤ 2r_{n−1}/10 ≤ 2 × 10^{−n} r_0,
and so the sequence x_n is a Cauchy sequence. As X is complete,
x_n converges to a limit x. Summing the geometric series, one verifies that
x ∈ B(x_{n−1}, r_{n−1}) for all n = 1, 2, . . ., and in particular x is an element of
B which avoids all of E1 , E2 , E3 , . . ., a contradiction.
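The geometric series step can be spelled out as follows. This is a sketch that uses only the bounds stated in the proof, together with the consequence that r_{m−1} ≤ 10^{−(m−n)} r_{n−1} for m ≥ n. For fixed n and any k ≥ n,

```latex
\[
d(x_k, x_{n-1}) \le \sum_{m=n}^{k} d(x_m, x_{m-1})
\le \sum_{m=n}^{k} \frac{2 r_{m-1}}{10}
\le \frac{2 r_{n-1}}{10} \sum_{j=0}^{\infty} 10^{-j}
= \frac{2}{9}\, r_{n-1}.
\]
```

Letting k → ∞ and using the continuity of the metric gives d(x, x_{n−1}) ≤ (2/9) r_{n−1} < r_{n−1}, so x lies in B(x_{n−1}, r_{n−1}), which is disjoint from E_{n−1} for each n ≥ 2.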
We can illustrate the analogy between the Baire category theorem and
the measure-theoretic analogs by introducing some further definitions. Call
a set E meager or of the first category if it can be expressed (or covered)
by a countable union of nowhere dense sets, and of the second category if it
is not meager. Thus, the Baire category theorem shows that any subset of
a complete metric space with non-empty interior is of the second category,
which may help explain the name for the property. Call a set co-meager or
residual if its complement is meager, and call a set Baire or almost open if it
differs from an open set by a meager set (note that a Baire set is unrelated to
the Baire σ-algebra). Then we have the following analogy between complete
metric space topology, and measure theory:
Nowhere dense sets are meager, and meager sets have empty interior.
Contrapositively, sets with dense interior are residual, and residual sets are
somewhere dense. Taking complements instead of contrapositives, we see
that open dense sets are co-meager, and co-meager sets are dense.
While there are certainly many analogies between meager sets and null
sets (for instance, both classes are closed under countable unions or under
intersections with arbitrary sets), the two concepts can differ in practice.
For instance, in the real line R with the standard metric and measure space
structures, the set
(1.61)    ⋃_{n=1}^∞ (q_n − 2^{−n}, q_n + 2^{−n}),
where q1 , q2 , . . . is an enumeration of the rationals, is open and dense but
has Lebesgue measure at most 2; thus its complement has infinite measure
in R but is nowhere dense (hence meager). As a variant of this, the set
(1.62)    ⋂_{m=1}^∞ ⋃_{n=1}^∞ (q_n − 2^{−n}/m, q_n + 2^{−n}/m)
is a null set but is the intersection of countably many open dense sets and
is thus co-meager.
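The two measure bounds follow from countable subadditivity. As a sketch, write m for Lebesgue measure and U_m for the inner union in (1.62) (the name U_m is introduced here only for convenience):

```latex
\begin{align*}
m\Big( \bigcup_{n=1}^{\infty} (q_n - 2^{-n}, q_n + 2^{-n}) \Big)
  &\le \sum_{n=1}^{\infty} 2 \cdot 2^{-n} = 2, \\
m\Big( \bigcap_{m=1}^{\infty} U_m \Big)
  &\le \inf_{m \ge 1} m(U_m)
  \le \inf_{m \ge 1} \sum_{n=1}^{\infty} \frac{2 \cdot 2^{-n}}{m}
  = \inf_{m \ge 1} \frac{2}{m} = 0.
\end{align*}
```

Each U_m is open and contains every rational, hence is open and dense, which is what makes the intersection co-meager.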
Exercise 1.7.3. A real number x is Diophantine if for every ε > 0 there
exists c_ε > 0 such that |x − a/q| ≥ c_ε/|q|^{2+ε} for every rational number a/q. Show
that the set of Diophantine real numbers has full measure but is meager.
Remark 1.7.4. If one assumes some additional axioms of set theory (e.g.,
the continuum hypothesis), it is possible to show that the collection of mea-
ger subsets of R and the collection of null subsets of R (viewed as σ-ideals of
the collection of all subsets of R) are isomorphic; this is the Sierpinski-Erdős
theorem, which we will not prove here. Roughly speaking, this theorem tells
us that any effective first-order statement which is true about meager sets
will also be true about null sets, and conversely.
Banach spaces. Note that Lemma 1.3.17 already gave a prototypical such
equivalence between a qualitative property (continuity) and a quantitative
one (boundedness).
Proof. It is clear that (ii) implies (i); now assume (i) holds and let us obtain
(ii).
For each n = 1, 2, . . ., let En be the set
(1.63)    En := {x ∈ X : ‖Tα x‖_Y ≤ n for all α ∈ A}.
The hypothesis (i) is nothing more than the assertion that the En cover X,
and thus by the Baire category theorem one of the En must be dense in a ball.
Since the Tα are continuous, the En are closed, and so one of the En contains a ball.
Since En − En ⊂ E_{2n}, we see that one of the En contains a ball centred at
the origin. Dilating n as necessary, we see that one of the En contains the
unit ball B(0, 1). But then all the ‖Tα‖_op are bounded by n, and the claim
follows.
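The last two steps of this proof can be made explicit. The following sketch assumes that En contains a ball B(x_0, r); the symbols x_0, r and h are introduced here for illustration.

```latex
\begin{align*}
x, x' \in E_n
  &\implies \|T_\alpha (x - x')\|_Y \le \|T_\alpha x\|_Y + \|T_\alpha x'\|_Y \le 2n
  \quad \text{for all } \alpha \in A, \\
\|h\|_X < r
  &\implies h = (x_0 + h) - x_0 \in E_n - E_n \subset E_{2n}, \\
\|x\|_X < 1
  &\implies \|T_\alpha x\|_Y = \tfrac{1}{r}\, \|T_\alpha (r x)\|_Y \le \tfrac{2n}{r}.
\end{align*}
```

Thus the operator norms of the Tα are bounded by 2n/r, which is the dilation step phrased as an explicit constant.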
Suppose that (ii) fails; then ‖T_α‖_op is unbounded. We can then find a
sequence α_n ∈ A such that ‖T_{α_{n+1}}‖_op > 100^n ‖T_{α_n}‖_op (say) for all n. We can
then find unit vectors x_n such that ‖T_{α_n} x_n‖_Y ≥ (1/2) ‖T_{α_n}‖_op.
We can then form the absolutely convergent (and hence conditionally
convergent, by completeness) sum x = Σ_{n=1}^∞ ε_n 10^{−n} x_n for some choice of
signs ε_n = ±1 recursively as follows: Once ε_1, . . . , ε_{n−1} have been chosen,
choose the sign ε_n so that
(1.64)    ‖Σ_{m=1}^n ε_m 10^{−m} T_{α_n} x_m‖_Y ≥ 10^{−n} ‖T_{α_n} x_n‖_Y ≥ (1/2) 10^{−n} ‖T_{α_n}‖_op.
From the triangle inequality we soon conclude that
(1.65)    ‖T_{α_n} x‖_Y ≥ (1/4) 10^{−n} ‖T_{α_n}‖_op.
But by hypothesis, the right-hand side of (1.65) is unbounded in n, contra-
dicting (i).
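The passage from (1.64) to (1.65) is a tail estimate. As a sketch, using the continuity of T_{α_n} to apply it term by term to the series for x, and the fact that the x_m are unit vectors:

```latex
\begin{align*}
\|T_{\alpha_n} x\|_Y
 &\ge \Big\| \sum_{m=1}^{n} \epsilon_m 10^{-m} T_{\alpha_n} x_m \Big\|_Y
   - \sum_{m=n+1}^{\infty} 10^{-m} \|T_{\alpha_n}\|_{op} \\
 &\ge \tfrac{1}{2}\, 10^{-n} \|T_{\alpha_n}\|_{op}
   - \tfrac{1}{9}\, 10^{-n} \|T_{\alpha_n}\|_{op}
  = \tfrac{7}{18}\, 10^{-n} \|T_{\alpha_n}\|_{op}
  \ge \tfrac{1}{4}\, 10^{-n} \|T_{\alpha_n}\|_{op}.
\end{align*}
```

The sign choice in (1.64) is always possible because, for any vectors y and z in a normed space, at least one of ‖y + z‖ and ‖y − z‖ is at least ‖z‖.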
Proof. Clearly (ii) implies (i), and as convergent sequences are bounded,
we see from Theorem 1.7.3 that (i) implies (iii). The implication of (ii) from
(iii) follows by a standard limiting argument and is left as an exercise.
Remark 1.7.8. The same equivalences hold if one replaces the sequence
(T_n)_{n=1}^∞ by a net (T_α)_{α∈A}.
Example 1.7.9 (Fourier inversion formula). For any f ∈ L^2(R) and N > 0,
define the Dirichlet summation operator
(1.66)    S_N f(x) := ∫_{−N}^{N} f̂(ξ) e^{2πixξ} dξ,
Remark 1.7.10. There is a partial analogue of Corollary 1.7.7 for the ques-
tion of pointwise almost everywhere convergence rather than norm conver-
gence, which is known as Stein’s maximal principle (discussed, for instance,
in Section 1.9 of Structure and Randomness). For instance, it reduces Car-
leson’s theorem on the pointwise almost everywhere convergence of Fourier
series to the boundedness of a certain maximal function (the Carleson maxi-
mal operator) related to Fourier summation, although the latter task is again
quite non-trivial. (As in Example 1.7.9, the role of the maximal principle is
meta-mathematical rather than direct.)
(i) L is surjective.
(ii) L is open.
(iii) Qualitative solvability. For every f ∈ Y there exists a solution u ∈ X
to the equation Lu = f .
(iv) Quantitative solvability. There exists a constant C > 0 such that for
every f ∈ Y there exists a solution u ∈ X to the equation Lu = f,
which obeys the bound ‖u‖_X ≤ C‖f‖_Y.
(v) Quantitative solvability for a dense subclass. There exists a constant
C > 0 such that for a dense set of f in Y, there exists a solution u ∈
X to the equation Lu = f, which obeys the bound ‖u‖_X ≤ C‖f‖_Y.
Proof. Clearly (iv) implies (iii), which is equivalent to (i), and it is easy to
see from linearity that (ii) and (iv) are equivalent (cf. the proof of Lemma
1.3.17). (iv) trivially implies (v), while to conversely obtain (iv) from (v),
observe that if E is any dense subset of the Banach space Y, then any f in Y
can be expressed as an absolutely convergent series f = Σ_n f_n of elements
in E (since one can iteratively approximate the residual f − Σ_{n=1}^{N−1} f_n to
arbitrary accuracy by an element of E for N = 1, 2, 3, . . .), and the claim
easily follows. So it suffices to show that (iii) implies (iv).
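One way to carry out the iterative approximation is sketched below. It assumes f ≠ 0, that L is continuous, and a tolerance of 2^{−N}‖f‖ at step N; the tolerance and the resulting constant 3C are illustrative choices, not taken from the text. Choose f_N in the dense set E so that the first line holds; the remaining lines then follow.

```latex
\begin{align*}
\Big\| f - \sum_{n=1}^{N} f_n \Big\|_Y &\le 2^{-N} \|f\|_Y, \\
\|f_N\|_Y &\le \Big\| f - \sum_{n=1}^{N-1} f_n \Big\|_Y
            + \Big\| f - \sum_{n=1}^{N} f_n \Big\|_Y
            \le 2^{-(N-1)} \|f\|_Y + 2^{-N} \|f\|_Y = 3 \cdot 2^{-N} \|f\|_Y, \\
\sum_{N=1}^{\infty} \|f_N\|_Y &\le 3 \|f\|_Y .
\end{align*}
```

If each f_N admits a solution u_N of L u_N = f_N with ‖u_N‖_X ≤ C‖f_N‖_Y, then u := Σ_N u_N converges absolutely in the complete space X, satisfies ‖u‖_X ≤ 3C‖f‖_Y, and solves Lu = f by continuity of L.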
For each n, let E_n ⊂ Y be the set of all f ∈ Y for which there exists a
solution to Lu = f with ‖u‖_X ≤ n‖f‖_Y. From the hypothesis (iii), we see
that ⋃_n E_n = Y. Since Y is complete, the Baire category theorem implies
that there is some E_n which is dense in some ball B(f_0, r) in Y. In other
words, the problem Lu = f is approximately quantitatively solvable in the
ball B(f_0, r) in the sense that for every ε > 0 and every f ∈ B(f_0, r), there
exists an approximate solution u with ‖Lu − f‖_Y ≤ ε and ‖u‖_X ≤ n‖Lu‖_Y,
and thus ‖u‖_X ≤ nr + nε.
By subtracting two such approximate solutions, we conclude that for
any f ∈ B(0, 2r) and any ε > 0, there exists u ∈ X with ‖Lu − f‖_Y ≤ 2ε
and ‖u‖_X ≤ 2nr + 2nε.
We can iterate this procedure and then take limits (now using the com-
pleteness of X rather than Y ) to obtain a solution to Lu = f for every
f ∈ Y with ‖u‖_X ≤ 5n‖f‖_Y, and the claim follows.
Remark 1.7.13. The open mapping theorem provides metamathematical
justification for the method of a priori estimates for solving linear equations
such as Lu = f for a given datum f ∈ Y and for an unknown u ∈ X, which
is of course a familiar problem in linear PDE. The a priori method assumes
that f is in some dense class of nice functions (e.g., smooth functions) in
which solvability of Lu = f is presumably easy, and then proceeds to obtain
the a priori estimate ‖u‖_X ≤ C‖f‖_Y for some constant C. Theorem 1.7.12
then assures that Lu = f is solvable for all f in Y (with a similar bound).
As before, this implication does not directly use the Baire category theorem,
but that theorem helps explain why this method is not wasteful.
A pleasant corollary of the open mapping theorem is that, as with or-
dinary linear algebra or with arbitrary functions, invertibility is the same
thing as bijectivity:
Corollary 1.7.14. Let T : X → Y be a continuous linear operator between
two Banach spaces X, Y . Then the following are equivalent:
• Qualitative invertibility. T is bijective.
• Quantitative invertibility. T is bijective, and T^{−1} : Y → X is a
continuous (hence bounded) linear transformation.
Remark 1.7.15. The claim fails without the completeness hypotheses on
X and Y. For instance, consider the operator T : c_c(N) → c_c(N) defined
by T(a_n)_{n=1}^∞ := (a_n/n)_{n=1}^∞, where we give c_c(N) the uniform norm. Then T is
continuous and bijective, but T^{−1} is unbounded.
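The failure of boundedness for the inverse can be seen on basis vectors. The Python sketch below models finitely supported sequences as finite lists; the helper names `T`, `T_inv`, `sup_norm` and `basis_vector` are hypothetical and introduced only for this illustration. The n-th basis vector has uniform norm 1, while its image under the inverse map has norm n.

```python
from fractions import Fraction

def T(a):
    """Apply T to a finitely supported sequence stored as the list (a_1, ..., a_k)."""
    return [Fraction(x) / n for n, x in enumerate(a, start=1)]

def T_inv(b):
    """Inverse of T on finitely supported sequences: multiply the n-th entry by n."""
    return [Fraction(x) * n for n, x in enumerate(b, start=1)]

def sup_norm(a):
    """Uniform norm of a finitely supported sequence."""
    return max((abs(x) for x in a), default=Fraction(0))

def basis_vector(n):
    """The sequence with a 1 in position n and zeros elsewhere."""
    return [Fraction(0)] * (n - 1) + [Fraction(1)]

for n in (1, 10, 1000):
    e_n = basis_vector(n)
    assert T(T_inv(e_n)) == e_n
    # The ratio ||T_inv(e_n)|| / ||e_n|| equals n, so no uniform bound exists.
    print(n, sup_norm(e_n), sup_norm(T_inv(e_n)))
```

By contrast T itself never increases the uniform norm, since each entry is divided by an integer n ≥ 1.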
Exercise 1.7.5. Show that Corollary 1.7.14 can still fail if we drop the
completeness hypothesis on just X or just Y .
Exercise 1.7.6. Suppose that L : X → Y is a surjective continuous lin-
ear transformation between Banach spaces. By combining the open map-
ping theorem with the Hahn-Banach theorem, show that the transpose map
L∗ : Y∗ → X∗ is bounded from below, i.e., there exists c > 0 such that
‖L∗λ‖_{X∗} ≥ c‖λ‖_{Y∗} for all λ ∈ Y∗. Conclude that L∗ is an isomorphism
between Y∗ and L∗(Y∗).
Proof. It is clear that (i) implies (iii) (just take F to equal the norm topol-
ogy). To see why (iii) implies (ii), observe that if xn → x in X and T xn → y
in norm, then T xn → y in the weaker topology F as well; but by weak
continuity T xn → T x in F . Since Hausdorff topological spaces have unique
limits, we have T x = y and so T is closed.
Now we show that (ii) implies (i). If T is closed, then the graph Γ :=
{(x, T x) : x ∈ X} is a closed linear subspace of the Banach space X × Y
and is thus also a Banach space. On the other hand, the projection map
π : (x, T x) ↦ x from Γ to X is clearly a continuous linear bijection. By
Corollary 1.7.14, its inverse x ↦ (x, T x) is also continuous, and so T is
continuous as desired.
Proof. Clearly (ii) implies (iii) or (i). If we have (iii), then T extends
uniquely to a bounded linear map from X to Y , which must agree with the
original continuous map from X to Z since limits in the Hausdorff space Z
are unique, and so (iii) implies (ii). Finally, if (i) holds, then we can view
T as a map from X to Y , which by Theorem 1.7.19 is continuous, and the
claim now follows from Lemma 1.3.17.
for some constant $C_p$ and all f in some suitable dense subclass of $L^p(\mathbf{R})$
(e.g., the space $C_0^\infty(\mathbf{R})$ of smooth functions of compact support), together
with the soft observation that the Fourier transform is continuous from
$L^p(\mathbf{R})$ to the space of tempered distributions, which is a Hausdorff space
into which $L^{p'}(\mathbf{R})$ embeds continuously. (We will prove this inequality in
(1.103).) One can replace the Hausdorff-Young inequality here by countless
other estimates in harmonic analysis to obtain similar qualitative regularity
conclusions.
1.7.4. Non-linear solvability (optional). In this section we give an ex-
ample of a linear equation Lu = f which can only be quantitatively solved
in a non-linear fashion. We will use a number of basic tools which we will
only cover later in this course, and so this material is optional reading.
Let $X = \{0,1\}^{\mathbf{N}}$ be the infinite discrete cube with the product topology;
by Tychonoff’s theorem (Theorem 1.8.14) this is a compact Hausdorff space.
The Borel σ-algebra is generated by the cylinder sets
(1.68) $E_n := \{ (x_m)_{m=1}^\infty \in \{0,1\}^{\mathbf{N}} : x_n = 1 \}.$
(From a probabilistic view point, one can think of X as the event space for
flipping a countably infinite number of coins and En as the event that the
nth coin lands as heads.)
Let $M(X)$ be the space of finite Borel measures on X; this can be verified
to be a Banach space. There is a map $L : M(X) \to \ell^\infty(\mathbf{N})$ defined by
(1.69) $L(\mu) := (\mu(E_n))_{n=1}^\infty.$
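To get a concrete feel for the map L in (1.69), one can evaluate it on a finitely supported measure. The following Python sketch is only an illustration under stated assumptions: points of X are truncated to their first few coordinates, and the names `cylinder_values` and `atoms` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction


def cylinder_values(atoms, num_coords):
    """Return [mu(E_1), ..., mu(E_N)] for mu = sum of w * delta_x over atoms.

    atoms: list of (weight, point) pairs, where point is a tuple of 0/1
    entries holding the first num_coords coordinates of a point of {0,1}^N.
    (This truncated representation is an assumption of the sketch.)
    """
    values = []
    for n in range(num_coords):
        # mu(E_n) adds up the weights of the atoms whose n-th coin is heads.
        values.append(sum(w for w, x in atoms if x[n] == 1))
    return values


# Two atoms of mass 1/2: "all heads" and "heads on the first flip only".
atoms = [(Fraction(1, 2), (1, 1, 1, 1)), (Fraction(1, 2), (1, 0, 0, 0))]
print(cylinder_values(atoms, 4))
```

Each entry is bounded in absolute value by the total mass of the measure, which is consistent with L taking values in $\ell^\infty(\mathbf{N})$.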
which is weakly convergent in the sense that σn (E) converges to some limit
σ(E) for each E ∈ B.
• The $\sigma_n$ are uniformly countably additive, which means that for
any sequence $E_1, E_2, \ldots$ of disjoint measurable sets, the series
$\sum_{m=1}^\infty |\sigma_n(E_m)|$ converges uniformly in n.
• σ is a signed finite measure.
Proof. It suffices to prove the first claim, since this easily implies that σ is
also countably additive and is thence a signed finite measure. Suppose for
contradiction that the claim failed; then one could find disjoint $E_1, E_2, \ldots$
and ε > 0 such that $\limsup_{n \to \infty} \sum_{m=M}^\infty |\sigma_n(E_m)| > \varepsilon$ for all M.
We now construct disjoint sets $A_1, A_2, \ldots$, each consisting of the union of a
finite collection of the $E_j$, and an increasing sequence $n_1, n_2, \ldots$ of positive
integers, by the following recursive procedure:
0. Initialise k = 0.
1. Suppose recursively that $n_1 < \cdots < n_{2k}$ and $A_1, \ldots, A_k$ have already
been constructed for some k ≥ 0.
2. Choose $n_{2k+1} > n_{2k}$ so large that for all $n \geq n_{2k+1}$, $\sigma_n(A_1 \cup \cdots \cup A_k)$
differs from $\sigma(A_1 \cup \cdots \cup A_k)$ by at most ε/10.
3. Choose $M_k$ so large that $M_k$ is larger than j for any $E_j \subset A_1 \cup \cdots \cup A_k$,
and such that $\sum_{m=M_k}^\infty |\sigma_{n_j}(E_m)| \leq \varepsilon/100^{k+1}$ for all 1 ≤ j ≤ 2k + 1.
4. Choose $n_{2k+2} > n_{2k+1}$ so that $\sum_{m=M_k}^\infty |\sigma_{n_{2k+2}}(E_m)| > \varepsilon$.
5. Pick $A_{k+1}$ to be a finite union of the $E_j$ with $j \geq M_k$ such that
$|\sigma_{n_{2k+2}}(A_{k+1})| > \varepsilon/2$.
6. Increment k to k + 1 and then return to step 2.
It is then a routine matter to show that if $A := \bigcup_{j=1}^\infty A_j$, then
$|\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A)| \geq \varepsilon/10$ for all k, contradicting the hypothesis that $\sigma_n$ is weakly
convergent to σ.
Exercise 1.7.11 (Schur’s property for $\ell^1$). Show that if a sequence in $\ell^1(\mathbf{N})$
is convergent in the weak topology, then it is convergent in the strong topol-
ogy.
M (X), we see that S(an )(E) converges to some limit μ(E) for all measurable
sets E. Applying the Nikodym convergence theorem, we see that μ is also
a signed finite measure. We then see that $S(a_n)$ converges in the weak
topology to μ. (One way to see this is to define $\nu := \sum_{n=1}^\infty 2^{-n} |S(a_n)| + |\mu|$;
then ν is finite and $S(a_n)$, μ are all absolutely continuous with respect to ν.
Now use the Radon-Nikodym theorem (see Section 1.2) and the fact that
$L^1(\nu)^* \equiv L^\infty(\nu)$.) On the other hand, as LS = I and L and S are both
bounded, S is a Banach space isomorphism between c0 and S(c0 ). Thus
S(c0 ) is complete, hence closed, hence weakly closed (by the Hahn-Banach
theorem), and so μ = S(a) for some a ∈ c0 . By the Hahn-Banach theorem
again, this implies that an converges weakly to a ∈ c0 . But this is easily
seen to be impossible, since the constant sequence $(1)_{m=1}^\infty$ does not lie in $c_0$,
and the claim follows.
Now we give the hard analysis proof. Let $e_1, e_2, \ldots$ be the standard basis
for $\ell^\infty(\mathbf{N})$, let N be a large number, and consider the random sums
(1.70) $S(\varepsilon_1 e_1 + \cdots + \varepsilon_N e_N),$
where $\varepsilon_n \in \{-1, 1\}$ are iid random signs. Since the $\ell^\infty$ norm of
$\varepsilon_1 e_1 + \cdots + \varepsilon_N e_N$ is 1, we have
(1.71) $\|S(\varepsilon_1 e_1 + \cdots + \varepsilon_N e_N)\|_{M(X)} \leq C$
for some constant C independent of N. On the other hand, we can write
$S(e_n) = f_n \nu$ for some finite measure ν and some $f_n \in L^1(\nu)$ using Radon-
Nikodym as in the previous proof, and then
(1.72) $\|\varepsilon_1 f_1 + \cdots + \varepsilon_N f_N\|_{L^1(\nu)} \leq C.$
Taking expectations and applying Khintchine’s inequality, we conclude
(1.73) $\| (\sum_{n=1}^N |f_n|^2)^{1/2} \|_{L^1(\nu)} \leq C$
for some constant C independent of N. By Cauchy-Schwarz, this implies
that
(1.74) $\| \sum_{n=1}^N |f_n| \|_{L^1(\nu)} \leq C \sqrt{N}.$
But as $\|f_n\|_{L^1(\nu)} = \|S(e_n)\|_{M(X)} \geq c$ for some constant c > 0 independent
of N, the left-hand side of (1.74) is at least cN, and so we obtain a
contradiction for N large enough, and the claim follows.
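The passage from (1.72) to (1.73) applies Khintchine's inequality pointwise to the numbers $f_1(x), \ldots, f_N(x)$ before integrating in x. As a sanity check on that scalar inequality, the following Python sketch computes $\mathbf{E}|\varepsilon_1 a_1 + \cdots + \varepsilon_N a_N|$ exactly by enumerating all sign patterns and compares it with the $\ell^2$ norm of the coefficients. The function names are hypothetical, and brute-force enumeration is only feasible for small N; the comparison constant $1/\sqrt{2}$ is the known sharp constant in the $L^1$ form of the inequality.

```python
import itertools
import math


def expected_abs_random_sum(a):
    """Exact value of E|eps_1 a_1 + ... + eps_N a_N| over iid random signs."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(a)):
        total += abs(sum(s * x for s, x in zip(signs, a)))
    return total / 2 ** len(a)


def l2_norm(a):
    """Euclidean norm (sum |a_n|^2)^(1/2)."""
    return math.sqrt(sum(x * x for x in a))


for a in ([1.0, 1.0], [1.0] * 10, [3.0, -1.0, 0.5, 2.0]):
    ratio = expected_abs_random_sum(a) / l2_norm(a)
    # Khintchine (p = 1): 1/sqrt(2) <= ratio <= 1.
    print(len(a), ratio)
```

In the sketch the ratio stays between two absolute constants regardless of N, which is what lets the square function in (1.73) be controlled by the same constant C (up to a fixed factor) as the random sums in (1.72).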
Remark 1.7.23. The phenomenon of non-linear quantitative solvability
actually comes up in many applications of interest. For instance, consider
the Fefferman-Stein decomposition theorem [FeSt1972], which asserts that
any f ∈ BMO(R) of bounded mean oscillation can be decomposed as f =
g + Hh for some g, h ∈ L∞ (R), where H is the Hilbert transform. This
theorem was first proven by using the duality of the Hardy space $H^1(\mathbf{R})$
and BMO (and by using Exercise 1.5.13), and by using the fact that a
function f is in $H^1(\mathbf{R})$ if and only if f and Hf both lie in $L^1(\mathbf{R})$. From
the open mapping theorem, we know that we can pick g, h so that the L∞
norms of g, h are bounded by a multiple of the BMO norm of f . But it turns
out not to be possible to pick g and h in a bounded linear manner in terms
of f , although this is a little tricky to prove. (Uchiyama [Uc1982] famously
gave an explicit construction of g, h in terms of f , but the construction was
highly non-linear.)
An example in a similar spirit was given more recently by Bourgain and
Brezis [BoBr2003], who considered the problem of solving the equation
div u = f on the d-dimensional torus $\mathbf{T}^d$ for some function $f : \mathbf{T}^d \to \mathbf{C}$ on
the torus with mean zero and with some unknown vector field $u : \mathbf{T}^d \to \mathbf{C}^d$,
where the derivatives are interpreted in the weak sense. They showed that if
d ≥ 2 and $f \in L^d(\mathbf{T}^d)$, then there existed a solution u to this problem with
$u \in W^{1,d} \cap C^0$, despite the failure of Sobolev embedding at this endpoint.
Again, the open mapping theorem allows one to choose u with norm bounded
by a multiple of the norm of f , but Bourgain and Brezis also show that one
cannot select u in a bounded linear fashion depending on f .
Section 1.8
Compactness in topological spaces
One of the most useful concepts for analysis that arises from topology and
metric spaces is the concept of compactness. Recall (from Section 1.6) that
a space X is compact if every open cover of X has a finite subcover, or
equivalently if any collection of closed sets whose finite subcollections have
non-empty intersection itself has non-empty intersection. (In other words,
all families of closed sets obey the finite intersection property.)
In these notes, we explore how compactness interacts with other key
topological concepts: the Hausdorff property, bases and subbases, product
spaces, and equicontinuity, in particular establishing the useful Tychonoff
and Arzelà-Ascoli theorems that give criteria for compactness (or
precompactness).
Exercise 1.8.1 (Basic properties of compact sets).
• Show that any finite set is compact.
• Show that any finite union of compact subsets of a topological space
is still compact.
• Show that any image of a compact space under a continuous map is
still compact.
Show that these three statements continue to hold if “compact” is replaced
by “sequentially compact”.
respectively; every metric space is Hausdorff, but not every topological space
is.
At first glance, the Hausdorff property bears no resemblance to the com-
pactness property. However, they are in some sense dual to each other, as
the following two exercises show:
Exercise 1.8.2. Let X = (X, F ) be a compact topological space.
• Show that every closed subset in X is compact.
• Show that any weaker topology F′ ⊂ F on X also yields a compact
topological space (X, F′).
• Show that the trivial topology on X is always compact.
Exercise 1.8.3. Let X be a Hausdorff topological space.
• Show that every compact subset of X is closed.
• Show that any stronger topology F′ ⊃ F on X also yields a Hausdorff
topological space (X, F′).
• Show that the discrete topology on X is always Hausdorff.
The first exercise asserts that compact topologies tend to be weak, while
the second exercise asserts that Hausdorff topologies tend to be strong. The
next lemma asserts that the two concepts only barely overlap:
Lemma 1.8.1. Let F ⊂ F′ be a weak and strong topology, respectively, on
a space X. If F′ is compact and F is Hausdorff, then F = F′. (In other
words, a compact topology cannot be strictly stronger than a Hausdorff one,
and a Hausdorff topology cannot be strictly weaker than a compact one.)
Definition 1.8.4 (Base). Let (X, F ) be a topological space. A base for this
space is a collection B of open sets such that every open set in X can be
expressed as the union of sets in the base. The elements of B are referred
to as basic open sets.
Example 1.8.5. The collection of open balls B(x, r) in a metric space forms
a base for the topology of that space. As another (rather trivial) example
of a base: any topology F is a base for itself.
A useful fact about compact metric spaces is that they are in some sense
countably generated.
Lemma 1.8.6. Let X = (X, dX ) be a compact metric space.
(i) X is separable (i.e., it has an at most countably infinite dense sub-
set).
(ii) X is second-countable (i.e., it has an at most countably infinite base).
verified to be dense and at most countable, giving (i). Similarly, the set of
balls $\{B(x_{n,i}, \frac{1}{n}) : n \geq 1; 1 \leq i \leq k_n\}$ can be easily verified to be a base
which is at most countable, giving (ii).
Remark 1.8.7. One can easily generalise compactness here to σ-compactness;
thus, for instance, finite-dimensional vector spaces $\mathbf{R}^n$ are separable
and second-countable. The properties of separability and second-countability
are much weaker than σ-compactness in general, but can still serve
to provide some constraint as to the size or complexity of a metric space or
topological space in many situations.
We now weaken the notion of a base to that of a subbase.
Definition 1.8.8 (Subbase). Let (X, F ) be a topological space. A subbase
for this space is a collection B of subsets of X such that F is the weakest
topology that makes B open (i.e., F is generated by B). Elements of B are
referred to as subbasic open sets.
Observe for instance that every base is a subbase. The converse is not
true: for instance, the open half-lines (−∞, a), (a, +∞) for a ∈ R form
a subbase for the standard topology on R, but not a base. In contrast to
bases, which need to obey the property in Exercise 1.8.8, no property is
required on a collection B in order for it to be a subbase; every collection of
sets generates a unique topology with respect to which it is a subbase.
The precise relationship between subbases and bases is given by the
following exercise.
Exercise 1.8.10. Let (X, F ) be a topological space, and let B be a collection
of subsets of X. Then the following are equivalent:
• B is a subbase for (X, F ).
• The space B ∗ := {B1 ∩· · ·∩Bk : B1 , . . . , Bk ∈ B} of finite intersections
of B (including the whole space X, which corresponds to the case
k = 0) is a base for (X, F).
Thus a set is open iff it is the union of finite intersections of subbasic
open sets.
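On a finite set, the two-step recipe above (finite intersections first, then unions) can be carried out mechanically. The following Python sketch does so; the function name `generated_topology` and the three-point example are assumptions chosen for illustration.

```python
from itertools import combinations


def generated_topology(points, subbase):
    """Return (base, topology) generated by a subbase on a finite set."""
    whole = frozenset(points)
    subbase = [frozenset(s) for s in subbase]
    # Step 1: all finite intersections of subbasic sets; the empty
    # intersection (k = 0) is the whole space.
    base = {whole}
    for k in range(1, len(subbase) + 1):
        for family in combinations(subbase, k):
            base.add(frozenset.intersection(*family))
    # Step 2: all unions of basic sets; the empty union is the empty set.
    topology = {frozenset()}
    for b in base:
        topology |= {u | b for u in topology}
    return base, topology


base, topology = generated_topology({1, 2, 3}, [{1, 2}, {2, 3}])
print(sorted(sorted(u) for u in topology))
```

In this example the set {2} is open but is not a union of the two subbasic sets, so the subbase is not itself a base; it only becomes one after the finite intersections are added.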
Many topological facts involving open sets can often be reduced to veri-
fications on basic or subbasic open sets, as the following exercise illustrates:
Exercise 1.8.11. Let (X, F ) be a topological space, and B be a subbase of
X, and let B ∗ be a base of X.
• Show that a sequence xn ∈ X converges to a limit x ∈ X if and
only if every subbasic open neighbourhood of x contains xn for all
sufficiently large n. (Optional: Show that an analogous statement
is also true for nets.)
Proof. Call an open cover bad if it has no finite subcover and good
otherwise. In view of Exercise 1.8.9, it suffices to show that if every subbasic
open cover is good, then every basic open cover is good also, where we use
the base B ∗ coming from Exercise 1.8.10.
Suppose for contradiction that every subbasic open cover was good but
at least one basic open cover was bad. If we order the bad basic open covers
by set inclusion, observe that every chain of bad basic open covers has an
upper bound that is also a bad basic open cover, namely the union of all
the covers in the chain. Thus, by Zorn’s lemma (Section 2.4), there exists a
maximal bad basic open cover C = (Uα )α∈A . Thus this cover has no finite
subcover, but if one adds any new basic open set to this cover, then there
must now be a finite subcover.
Pick a basic open set Uα in this cover C. Then we can write Uα =
B1 ∩ · · · ∩ Bk for some subbasic open sets B1 , . . . , Bk . We claim that at
least one of the B1 , . . . , Bk also lies in the cover C. To see this, suppose for
contradiction that none of the B1 , . . . , Bk was in C. Then adding any of the
Bi to C enlarges the basic open cover and thus creates a finite subcover; thus
Bi together with finitely many sets from C cover X, or equivalently, one
can cover X\Bi with finitely many sets from C. Thus one can also cover
$X \setminus U_\alpha = \bigcup_{i=1}^k (X \setminus B_i)$ with finitely many sets from C, and thus X itself can
be covered by finitely many sets from C, a contradiction.
From the above discussion and the axiom of choice, we see that for each
basic set Uα in C there exists a subbasic set Bα containing Uα that also lies
in C. (Two different basic sets Uα , Uβ could lead to the same subbasic set
Bα = Bβ , but this will not concern us.) Since the Uα cover X, the Bα do
also. By hypothesis, a finite number of Bα can cover X, and so C is good,
which gives the desired contradiction.
Exercise 1.8.12. (Optional) Use Exercise 1.8.7 to give another proof of the
Alexander subbase theorem.
Exercise 1.8.13. Use the Alexander subbase theorem to show that the unit
interval [0, 1] (with the usual topology) is compact, without recourse to the
Heine-Borel or Bolzano-Weierstrass theorems.
Exercise 1.8.14. Let X be a well-ordered set, endowed with the order
topology (Exercise 1.6.10); such a space is known as an ordinal space. Show
that X is Hausdorff, and that X is compact if and only if X has a maximal
element.
Proof. By Exercise 1.8.9 it suffices to show that any basic open cover of
X × Y by boxes (Uα × Vα )α∈A has a finite subcover. For any x ∈ X, this
open cover covers {x} × Y ; by the compactness of Y ≡ {x} × Y , we can thus
cover {x} × Y by a finite number of open boxes Uα × Vα . Intersecting the
Uα together, we obtain a neighbourhood Ux of x such that Ux × Y is covered
The above theory for products of two spaces extends without difficulty
to products of finitely many spaces. Now we consider infinite products.
Definition 1.8.11 (Product spaces). Given a family $(X_\alpha, \mathcal{F}_\alpha)_{\alpha \in A}$ of topological
spaces, let $X := \prod_{\alpha \in A} X_\alpha$ be the Cartesian product, i.e., the space
of tuples $(x_\alpha)_{\alpha \in A}$ with $x_\alpha \in X_\alpha$ for all α ∈ A. For each α ∈ A, we have the
obvious projection map $\pi_\alpha : X \to X_\alpha$ that maps $(x_\beta)_{\beta \in A}$ to $x_\alpha$.
• We define the product topology on X to be the topology generated
by the cylinder sets $\pi_\alpha^{-1}(U_\alpha)$ for α ∈ A and $U_\alpha \in \mathcal{F}_\alpha$ as a subbase, or
equivalently the weakest topology that makes all of the $\pi_\alpha$ continuous.
• We define the box topology on X to be the topology generated by all
the boxes $\prod_{\alpha \in A} U_\alpha$, where $U_\alpha \in \mathcal{F}_\alpha$ for all α ∈ A.
Unless otherwise specified, we assume the product space to be endowed with
the product topology rather than the box topology.
When A is finite, the product topology and the box topology coincide.
When A is infinite, the two topologies are usually different (as we shall see),
but the box topology is always at least as strong as the product topology.
Actually, in practice the box topology is too strong to be of much use—there
are not enough convergent sequences in it. For instance, in the space $\mathbf{R}^{\mathbf{N}}$ of
real-valued sequences $(x_n)_{n=1}^\infty$, even sequences such as $(\frac{1}{m!} e^{-nm})_{n=1}^\infty$ do not
converge to the zero sequence as m → ∞ (why?), despite converging in just
about every other sense.
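One possible way to approach the "(why?)" above is to exhibit a single box neighbourhood of the zero sequence that none of the sequences enters. The following Python sketch checks this numerically for the hypothetical choice of radii $\delta_n := \frac{1}{2} \cdot \frac{1}{n!} e^{-n^2}$, which is an assumption of the sketch rather than something taken from the text. Logarithms are used to avoid floating-point underflow.

```python
import math


def log_term(m, n):
    """log of the n-th coordinate (1/m!) * exp(-n*m) of the m-th sequence."""
    return -math.lgamma(m + 1) - n * m


def log_radius(n):
    """log of the assumed radius delta_n = (1/n!) * exp(-n*n) / 2."""
    return -math.lgamma(n + 1) - n * n - math.log(2)


def escapes_box(m):
    """True if the m-th sequence leaves the box prod_n (-delta_n, delta_n)."""
    # The diagonal coordinate n = m already has size 2 * delta_m >= delta_m.
    return log_term(m, m) >= log_radius(m)


print(all(escapes_box(m) for m in range(1, 200)))
```

Since the box $\prod_n (-\delta_n, \delta_n)$ is open in the box topology and contains the zero sequence, a sequence that stays outside it for every m cannot converge to zero in that topology.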
Exercise 1.8.18. Show that the arbitrary product of Hausdorff spaces re-
mains Hausdorff in either the product or the box topology.
Exercise 1.8.19. Let $(X_n, d_n)$ be a sequence of metric spaces. Show that
the function $d : X \times X \to \mathbf{R}^+$ on the product space $X := \prod_n X_n$ defined
by
$d((x_n)_{n=1}^\infty, (y_n)_{n=1}^\infty) := \sum_{n=1}^\infty 2^{-n} \frac{d_n(x_n, y_n)}{1 + d_n(x_n, y_n)}$
is a metric on X which generates the product topology on X.
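To see why this metric is insensitive to distant coordinates (as the product topology should be), one can evaluate a partial sum of the series. The Python sketch below is an illustration only: it truncates to finitely many coordinates, and the names `product_metric` and `usual` are hypothetical.

```python
def product_metric(x, y, metrics):
    """Partial sum of sum_n 2^(-n) d_n(x_n, y_n) / (1 + d_n(x_n, y_n))."""
    total = 0.0
    for n, (xn, yn, dn) in enumerate(zip(x, y, metrics), start=1):
        dist = dn(xn, yn)
        total += 2.0 ** (-n) * dist / (1.0 + dist)
    return total


def usual(s, t):
    """The usual metric on the real line."""
    return abs(s - t)


# Two points agreeing in the first 5 coordinates, far apart afterwards.
x = [0.0] * 20
y = [0.0] * 5 + [1e9] * 15
print(product_metric(x, y, [usual] * 20))
```

Because each summand is at most $2^{-n}$, two points that agree in their first k coordinates are within $2^{-k}$ of each other, no matter how far apart the remaining coordinates are.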
Exercise 1.8.20. Let $X = \prod_{\alpha \in A} X_\alpha$ be a product space with the product
topology. Show that a sequence xn in that space converges to a limit x ∈ X
if and only if πα (xn ) converges in Xα to πα (x) for every α ∈ A. (The same
statement also holds for nets.) Thus convergence in the product topology is
essentially the same concept as pointwise convergence (cf. Example 1.6.24).
The box topology usually does not preserve compactness. For instance,
one easily checks that the product of any number of discrete spaces is still
discrete in the box topology. On the other hand, a discrete space is com-
pact (or sequentially compact) if and only if it is finite. Thus the infinite
product of any number of non-trivial (i.e., having at least two elements)
compact discrete spaces will be non-compact, and similarly for sequential
compactness.
The situation improves significantly with the product topology, however
(which is weaker, and thus more likely to be compact). We begin with the
situation for sequential compactness.
Proposition 1.8.12 (Sequential Tychonoff theorem). Any at most count-
able product of sequentially compact topological spaces is sequentially com-
pact.
sequence $(x^{(m_{j,j})})_{j=1}^\infty$. One easily verifies that $x_n^{(m_{j,j})}$ converges in $X_n$ to $x_n$
as j → ∞ for every n, and so we have extracted a sequence that is convergent
in the product topology.
Remark 1.8.13. In the converse direction, if a product of spaces is se-
quentially compact, then each of the factor spaces must also be sequentially
compact, since they are continuous images of the product space and one can
apply Exercise 1.8.1.
$f : \{0,1\}^{\mathbf{N}} \to \{0,1\}$. As {0, 1} (with the discrete topology) is sequentially
compact, this is an (uncountable) product of sequentially compact spaces.
On the other hand, for each n ∈ N we can define the evaluation function
$f_n : \{0,1\}^{\mathbf{N}} \to \{0,1\}$ by $f_n : (a_m)_{m=1}^\infty \mapsto a_n$. This is a sequence in X; we
claim that it has no convergent subsequence. Indeed, given any $n_j \to \infty$, we
can find $x = (x_m)_{m=1}^\infty \in \{0,1\}^{\mathbf{N}}$ such that $x_{n_j} = f_{n_j}(x)$ does not converge
to a limit as j → ∞, and so $f_{n_j}$ does not converge pointwise (i.e., does not
converge in the product topology).
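The point x in this argument can be written down explicitly: let its coordinates alternate between 0 and 1 along the chosen subsequence. The following Python sketch builds the first few coordinates of such a point; the names `bad_point` and `evaluate` are hypothetical, and only finitely many coordinates are represented.

```python
def bad_point(indices, length):
    """First `length` coordinates of a point x with x_{n_j} alternating in j.

    indices: a strictly increasing list n_1 < n_2 < ... of positive integers.
    """
    x = [0] * length
    for j, n in enumerate(indices):
        if n <= length:
            x[n - 1] = j % 2  # coordinates are 1-indexed in the text
    return x


def evaluate(n, x):
    """The evaluation function f_n applied to x."""
    return x[n - 1]


indices = [2, 3, 5, 8, 13, 21]
x = bad_point(indices, 21)
print([evaluate(n, x) for n in indices])
```

Along the subsequence the values of $f_{n_j}(x)$ oscillate between 0 and 1, so they have no limit in the discrete space {0, 1}.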
However, we can recover the result for uncountable products as long as
we work with topological compactness rather than sequential compactness,
leading to Tychonoff ’s theorem:
Theorem 1.8.14 (Tychonoff’s theorem). Any product of compact topolog-
ical spaces is compact.
The concept only acquires additional meaning once one considers infinite
families of continuous functions.
Proof. We first show that (i) implies (ii). For any x ∈ X, the evaluation
map f → f (x) is a continuous map from C(X → Y ) to Y , and thus maps
precompact sets to precompact sets. As a consequence, any precompact
family in C(X → Y ) is pointwise precompact. To show equicontinuity,
suppose for contradiction that equicontinuity failed at some point x, thus
there exists ε > 0, a sequence αn ∈ A, and points xn → x such that
dY (fαn (xn ), fαn (x)) > ε for every n. One then verifies that no subsequence
of fαn can converge uniformly to a continuous limit, contradicting precom-
pactness. (Note that in the metric space C(X → Y ), precompactness is
equivalent to sequential precompactness.)
Now we show that (ii) implies (iii). It suffices to show that equicontinuity
implies uniform equicontinuity. This is a straightforward generalisation of
the more familiar argument that continuity implies uniform continuity on
a compact domain, and we repeat it here. Namely, fix ε > 0. For every
x ∈ X, equicontinuity provides a $\delta_x > 0$ such that $d_Y(f_\alpha(x), f_\alpha(x')) \leq \varepsilon$
whenever $x' \in B(x, \delta_x)$ and α ∈ A. The balls $B(x, \delta_x/2)$ cover X, thus
by compactness some finite subcollection $B(x_i, \delta_{x_i}/2)$, i = 1, . . . , n, of these
balls cover X. One then easily verifies (using the triangle inequality) that
$d_Y(f_\alpha(x), f_\alpha(x')) \leq 2\varepsilon$ whenever
$x, x' \in X$ with $d_X(x, x') \leq \min_{1 \leq i \leq n} \delta_{x_i}/2$.
Finally, we show that (iii) implies (i). It suffices to show that any se-
quence fn ∈ BC(X → Y ), n = 1, 2, . . ., which is pointwise precompact and
uniformly equicontinuous, has a convergent subsequence. By embedding Y
in its metric completion $\overline{Y}$, we may assume without loss of generality that Y
is complete. (Note that for every x ∈ X, the set {fn (x) : n = 1, 2, . . .} is
precompact in Y , hence the closure in Y is complete and thus closed in $\overline{Y}$ also.
Thus any pointwise limit of the fn in $\overline{Y}$ will take values in Y .) By Lemma
1.8.6, we can find a countable dense subset x1 , x2 , . . . of X. For each xm ,
we can use pointwise precompactness to find a compact set Km ⊂ Y such
that fn (xm ) takes values in Km for all n. For each n, the tuple $F_n := (f_n(x_m))_{m=1}^\infty$
can then be viewed as a point in the product space $\prod_{m=1}^\infty K_m$. By
Proposition 1.8.12, this product space is sequentially compact, hence we may find
Section 1.9
The strong and weak topologies
where $f^{(j)}$ is the jth derivative of f. The topology generated by all the $C^k$
norms for k = 0, 1, 2, . . . is the smooth topology: a sequence $f_n$ converges in
this topology to a limit f if $f_n^{(j)}$ converges uniformly to $f^{(j)}$ for each j ≥ 0.
Exercise 1.9.7 (Convergence in measure). Let (X, X , μ) be a measure
space, and let L(X) be the space of measurable functions f : X → C.
Show that the sets
$B(f, \varepsilon, r) := \{ g \in L(X) : \mu(\{x : |f(x) - g(x)| \geq r\}) < \varepsilon \}$
for f ∈ L(X), ε > 0, r > 0 form the base for a topology that turns L(X)
into a topological vector space, and that a sequence fn ∈ L(X) converges to
a limit f in this topology if and only if it converges in measure.
Exercise 1.9.8. Let [0, 1] be given the usual Lebesgue measure. Show
that the vector space L∞ ([0, 1]) cannot be given a topological vector space
structure in which a sequence fn ∈ L∞ ([0, 1]) converges to f in this topology
if and only if it converges almost everywhere. (Hint: Construct a sequence
fn in L∞ ([0, 1]) which does not converge pointwise a.e. to zero, but such
that every subsequence has a further subsequence that converges a.e. to
zero, and use Exercise 1.6.8.) Thus almost everywhere convergence is not
“topologisable” in general.
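One standard candidate for the sequence in the hint is the "typewriter" sequence of indicator functions of dyadic intervals sweeping repeatedly across [0, 1]; that this is the intended construction is an assumption here. The following Python sketch (with hypothetical names `typewriter_interval` and `typewriter`) uses exact rational arithmetic to show the two competing behaviours: the interval lengths shrink to zero, yet every point is hit in every dyadic generation.

```python
from fractions import Fraction


def typewriter_interval(n):
    """Interval [k/2^j, (k+1)/2^j] for n = 2^j + k with 0 <= k < 2^j, n >= 1."""
    j = n.bit_length() - 1
    k = n - 2 ** j
    return Fraction(k, 2 ** j), Fraction(k + 1, 2 ** j)


def typewriter(n, x):
    """Indicator function of the n-th typewriter interval, evaluated at x."""
    a, b = typewriter_interval(n)
    return 1 if a <= x <= b else 0


x = Fraction(1, 3)
for j in range(1, 5):
    generation = range(2 ** j, 2 ** (j + 1))
    # Within each generation the value 1 and the value 0 both occur at x.
    print(j, [typewriter(n, x) for n in generation])
```

At any fixed point the values 0 and 1 both recur infinitely often, so there is no pointwise limit, while along a sparse subsequence such as $n = 2^j$ the functions converge to zero at every x > 0.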
Exercise 1.9.9 (Algebraic topology). Recall that a subset U of a real vector
space V is algebraically open if the sets {t ∈ R : x + tv ∈ U } are open for
all x, v ∈ V .
10 I am not sure if the same statement is true for the box topology; I believe it is false.
(i) Show that any set which is open in a topological vector space is also
algebraically open.
(ii) Give an example of a set in $\mathbf{R}^2$ which is algebraically open, but not
open in the usual topology. (Hint: A line intersects the unit circle in
at most two points.)
(iii) Show that the collection of algebraically open sets in V is a topology.
(iv) Show that the collection of algebraically open sets in $\mathbf{R}^2$ does not
give $\mathbf{R}^2$ the structure of a topological vector space.
Exercise 1.9.10 (Quotient topology). Let V be a topological vector space,
and let W be a subspace of V . Let V /W := {v + W : v ∈ V } be the space
of cosets of W ; this is a vector space. Let π : V → V /W be the coset
map π(v) := v + W . Show that the collection of sets U ⊂ V /W such that
$\pi^{-1}(U)$ is open gives V /W the structure of a topological vector space. If V
is Hausdorff, show that V /W is Hausdorff if and only if W is closed in V .
Some (but not all) of the concepts that are definable for normed vector
spaces are also definable for the more general category of topological vector
spaces. For instance, even though there is no metric structure, one can still
define the notion of a Cauchy sequence xn ∈ V in a topological vector space:
this is a sequence such that xn −xm → 0 as n, m → ∞ (or more precisely, for
any open neighbourhood U of 0, there exists N > 0 such that xn − xm ∈ U
for all n, m ≥ N ). It is then possible to talk about a topological vector
space being complete (i.e., every Cauchy sequence converges). (From a more
abstract perspective, the reason we can define notions such as completeness
is because a topological vector space has something better than a topological
structure, namely a uniform structure.)
Remark 1.9.7. As we have seen in previous lectures, complete normed
vector spaces (i.e., Banach spaces) enjoy some very nice properties. Some
of these properties (e.g., the uniform boundedness principle and the open
mapping theorem) extend to a slightly larger class of complete topological
vector spaces, namely the Fréchet spaces. A Fréchet space is a complete
Hausdorff topological vector space whose topology is generated by an at
most countable family of seminorms; examples include the space $C^\infty([0,1])$
from Exercise 1.9.6 or the uniform convergence on compact topology from
Exercise 1.9.5 in the case when X is σ-compact. We will however not study
Fréchet spaces systematically here.
One can also extend the notion of a dual space V ∗ from normed vector
spaces to topological vector spaces in the obvious manner: the dual space
V ∗ of a topological vector space is the space of continuous linear functionals from
V to the field of scalars (either R or C, depending on whether V is a real
Suppose furthermore that there is a nested sequence $V_1 \subset V_2 \subset \cdots$ of finite-
dimensional subspaces of V such that $\bigcup_{n=1}^\infty V_n$ is dense. Show that the
following statement is equivalent to the first three:
(iv) K is closed and bounded, and for every ε > 0, there exists an n such
that K lies in the ε-neighbourhood of Vn .
Example 1.9.8. Let 1 ≤ p < ∞. In order for a set $K \subset \ell^p(\mathbf{N})$ to be
compact in the strong topology, it needs to be closed and bounded, and also
uniformly pth-power integrable at spatial infinity in the sense that for every
ε > 0 there exists n > 0 such that
$(\sum_{m > n} |f(m)|^p)^{1/p} \leq \varepsilon$
for all f ∈ K. Thus, for instance, the moving bump example $\{e_1, e_2, e_3, \ldots\}$,
where $e_n$ is the sequence which equals 1 at n and zero elsewhere, is not
uniformly pth-power integrable and thus not a compact subset of $\ell^p(\mathbf{N})$,
despite being closed and bounded.
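The failure of uniform integrability for the moving bump can be checked directly on truncated sequences. In the Python sketch below, the helper names `lp_norm`, `bump`, and `tail_norm` are hypothetical, and sequences are represented by finitely many entries.

```python
def lp_norm(f, p):
    """The l^p norm of a finitely supported sequence given as a list."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)


def bump(k, length):
    """First `length` entries of e_k (1 at position k, 0 elsewhere)."""
    return [1.0 if m == k else 0.0 for m in range(1, length + 1)]


def tail_norm(f, n, p):
    """( sum over m > n of |f(m)|^p )^(1/p); list index 0 holds f(1)."""
    return lp_norm(f[n:], p)


p, length = 2, 50
# For every cutoff n, the bump e_{n+1} still has its full mass beyond n.
print([tail_norm(bump(n + 1, length), n, p) for n in (1, 10, 40)])
# Distinct bumps stay at distance 2^(1/p) from one another.
diff = [a - b for a, b in zip(bump(3, length), bump(7, length))]
print(lp_norm(diff, p))
```

The second computation gives another way to see the non-compactness: the bumps are uniformly separated, so no subsequence of them can be Cauchy in the strong topology.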
For continuous Lp spaces, such as Lp (R), uniform integrability at spatial
infinity is not sufficient to force compactness in the strong topology; one also
needs some uniform integrability at very fine scales, which can be described
using harmonic analysis tools such as the Fourier transform (Section 1.12).
We will not discuss this topic here.
Exercise 1.9.12. Let V be a normed vector space.
• If W is a finite-dimensional subspace of V , and x ∈ V , show that
there exists y ∈ W such that $\|x - y\| \leq \|x - y'\|$ for all y′ ∈ W . Give
an example to show that y is not necessarily unique (in contrast to
the situation with Hilbert spaces).
• If W is a finite-dimensional proper subspace of V , show that there
exists x ∈ V with $\|x\| = 1$ such that $\|x - y\| \geq 1$ for all y ∈ W (cf.
the Riesz lemma).
• Show that the closed unit ball $\{x \in V : \|x\| \leq 1\}$ is compact in the
strong topology if and only if V is finite dimensional.
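For the first two items, it may help to experiment in a small non-Hilbert example. The following Python sketch takes $V = \mathbf{R}^2$ with the sup norm, $W$ the horizontal axis, and $x = (0, 1)$; this particular choice, and the names `sup_norm` and `distance_to_candidate`, are assumptions made for illustration.

```python
def sup_norm(v):
    """The sup norm max_i |v_i| on R^2."""
    return max(abs(c) for c in v)


def distance_to_candidate(t):
    """||x - y|| in (R^2, sup norm) for x = (0, 1) and y = (t, 0) in W."""
    x = (0.0, 1.0)
    y = (t, 0.0)
    return sup_norm((x[0] - y[0], x[1] - y[1]))


# Every y = (t, 0) with |t| <= 1 attains the same distance to x.
print([distance_to_candidate(t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0, 2.0)])
```

Since the second coordinate of x − y is always 1, the distance from x to W is exactly 1 and is attained along a whole segment of W; the same x has norm 1 and stays at distance at least 1 from W, in the spirit of the second item.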
The following exercise shows that the strong, weak, and weak* topologies
can all differ from each other.
Exercise 1.9.15. Let $V := c_0(\mathbf{N})$, thus $V^* \equiv \ell^1(\mathbf{N})$ and $V^{**} \equiv \ell^\infty(\mathbf{N})$. Let
e1 , e2 , . . . be the standard basis of either V , V ∗ , or V ∗∗ .
• Show that the sequence e1 , e2 , . . . converges weakly in V to zero, but
does not converge strongly in V .
• Show that the sequence e1 , e2 , . . . converges in the weak* sense in V ∗
to zero, but does not converge in the weak or strong senses in V ∗ .
• Show that the sequence $\sum_{m=n}^\infty e_m$ for n = 1, 2, . . . converges in the
weak* topology of V ∗∗ to zero, but does not converge in the weak or
strong senses. (Hint: Use a generalised limit functional.)
Remark 1.9.12. Recall from Exercise 1.7.11 that sequences in $V^* \equiv \ell^1(\mathbf{N})$
that converge in the weak topology also converge in the strong topology.
We caution however that the two topologies are not quite equivalent; for
instance, the open unit ball in $\ell^1(\mathbf{N})$ is open in the strong topology but not
in the weak.
Exercise 1.9.16. Let V be a normed vector space, and let E be a subset
of V . Show that the following are equivalent:
• E is strongly bounded (i.e., E is contained in a ball).
• E is weakly bounded (i.e., λ(E) is bounded for all λ ∈ V ∗ ).
(Hint: Use the Hahn-Banach theorem and the uniform boundedness princi-
ple.) Similarly, if F is a subset of V ∗ , and V is a Banach space, show that F
is strongly bounded if and only if F is weak* bounded (i.e., {λ(x) : λ ∈ F }
is bounded for each x ∈ V ). Conclude in particular that any sequence which
is weakly convergent in V or weak* convergent in V ∗ is necessarily bounded.
One of the main reasons why we use the weak and weak* topologies in
the first place is that they have much better compactness properties than
the strong topology, thanks to the Banach-Alaoglu theorem:
Theorem 1.9.13 (Banach-Alaoglu theorem). Let $V$ be a normed vector
space. Then the closed unit ball of $V^*$ is compact in the weak* topology.
Proof. Let’s say $V$ is a complex vector space (the case of real vector spaces
is of course analogous). Let $B^*$ be the closed unit ball of $V^*$; then any
linear functional $\lambda \in B^*$ maps the closed unit ball $B$ of $V$ into the disk
$D := \{z \in \mathbf{C} : |z| \le 1\}$. Thus one can identify $B^*$ with a subset of $D^B$, the
space of functions from $B$ to $D$. One easily verifies that the weak* topology
on $B^*$ is nothing more than the product topology of $D^B$ restricted to $B^*$.
Also, one easily shows that $B^*$ is closed in $D^B$. But by Tychonoff’s theorem,
$D^B$ is compact, and so $B^*$ is compact also.
One should caution that the Banach-Alaoglu theorem does not imply
that the space $V^*$ is locally compact in the weak* topology, because the
norm ball in $V^*$ has empty interior in the weak* topology unless $V$ is finite
dimensional. In fact, we have the following result of Riesz:
Exercise 1.9.24. Let V be a locally compact Hausdorff topological vector
space. Show that V is finite dimensional. (Hint: If V is locally compact,
then there exists an open neighbourhood U of the origin whose closure is
1.9.4. The strong and weak operator topologies. Now we turn our
attention from function spaces to spaces of operators. Recall that if X
and Y are normed vector spaces, then B(X → Y ) is the space of bounded
linear transformations from X to Y . This is a normed vector space with the
operator norm
$$\|T\|_{op} := \sup\{\|Tx\|_Y : \|x\|_X \le 1\}.$$
This norm induces the operator norm topology on $B(X \to Y)$. Unfortunately,
this topology is so strong that it is difficult for a sequence of operators
$T_n \in B(X \to Y)$ to converge to a limit; for this reason, we introduce
two weaker topologies.
Section 1.10
Continuous functions on locally compact Hausdorff spaces
A topological space which obeys any (and hence all) of (i)–(iv) is known
as a normal space; definition (i) is traditionally taken to be the standard
definition of normality. We will give some examples of normal spaces shortly.
$$K_1 \subset U_{1/2} \subset K_{1/2} \subset U_0.$$
Applying (ii) two more times, we can find more open sets $U_{1/4}$, $U_{3/4}$ and
closed sets $K_{1/4}$, $K_{3/4}$ such that
$$K_1 \subset U_{3/4} \subset K_{3/4} \subset U_{1/2} \subset K_{1/2} \subset U_{1/4} \subset K_{1/4} \subset U_0.$$
Iterating this process, we can construct open sets $U_q$ and closed sets $K_q$ for
every dyadic rational $q = a/2^n$ in $(0, 1)$ such that $U_q \subset K_q$ for all $0 < q < 1$,
and $K_{q'} \subset U_q$ for any $0 \le q < q' \le 1$.
If we now define $f(x) := \sup\{q : x \in U_q\} = \inf\{q : x \notin K_q\}$, where
$q$ ranges over dyadic rationals between 0 and 1, and with the convention
that the empty set has sup 0 and inf 1 (so that $f$ vanishes outside $U_0$ and
equals 1 on $K_1$), one easily verifies that the sets
$\{f(x) > \alpha\} = \bigcup_{q > \alpha} U_q$ and $\{f(x) < \alpha\} = \bigcup_{q < \alpha} X \setminus K_q$ are open for every
real number $\alpha$, and so $f$ is continuous as required.
Exercise 1.10.3. Let $\mathbf{R}$ be the real line with the usual topology $\mathcal{F}$, and
let $\mathcal{F}'$ be the topology on $\mathbf{R}$ generated by $\mathcal{F}$ and the rationals. Show that
$(\mathbf{R}, \mathcal{F}')$ is Hausdorff, with every point closed, but is not normal.
• For $j = 1, 2$, let $K_j$ be the set of all tuples $(n_x)_{x \in \mathbf{R}}$ such that $n_x = j$
for all $x$ outside of a countable set and such that $x \mapsto n_x$ is injective
on this countable set (i.e., there do not exist distinct $x, x'$ such that
$n_x = n_{x'} \ne j$). Show that $K_1$, $K_2$ are disjoint and closed.
• Show that given any open neighbourhood $U$ of $K_1$, there exist disjoint
finite subsets $A_1, A_2, \ldots$ of $\mathbf{R}$ and an injective function
$f : \bigcup_{i=1}^\infty A_i \to \mathbf{N}$ such that for any $j \ge 0$, any $(m_x)_{x \in \mathbf{R}}$ such that
$m_x = f(x)$ for all $x \in A_1 \cup \cdots \cup A_j$ and is identically 1 on $A_{j+1}$,
lies in $U$.
• Show that any open neighbourhood of $K_1$ and any open neighbourhood
of $K_2$ necessarily intersect, and so $\mathbf{N}^{\mathbf{R}}$ is not normal.
• Conclude that $\mathbf{R}^{\mathbf{R}}$ with the product topology is not normal.
Remark 1.10.6. Observe that Urysohn’s lemma can be viewed as the special
case of the Tietze extension theorem when K is the union of two disjoint
closed sets, and when f is equal to 1 on one of these sets and equal to 0 on
the other.
Remark 1.10.7. One can extend the Tietze extension theorem to
finite-dimensional vector spaces: if $K$ is a closed subset of a normal vector space
$X$ and $f : K \to \mathbf{R}^n$ is bounded and continuous, then one has a bounded
continuous extension $\tilde f : X \to \mathbf{R}^n$. Indeed, one simply applies the Tietze
extension theorem to each component of $f$ separately. However, if the range
space is replaced by a space with a non-trivial topology, then there can
be topological obstructions to continuous extension. For instance, a map
$f : \{0, 1\} \to Y$ from a two-point set into a topological space $Y$ is always
continuous, but can be extended to a continuous map $\tilde f : \mathbf{R} \to Y$ if and only
if $f(0)$ and $f(1)$ lie in the same path-connected component of $Y$. Similarly,
if $f : S^1 \to Y$ is a map from the unit circle into a topological space $Y$,
then a continuous extension from $S^1$ to $\mathbf{R}^2$ exists if and only if the closed
curve $f : S^1 \to Y$ is contractible to a point in $Y$. These sorts of questions
require the machinery of algebraic topology to answer them properly, and
are beyond the scope of this course.
There are analogues for the Tietze extension theorem in some other
categories of functions. For instance, in the Lipschitz category, we have
One can also remove the requirement that the function f be bounded in
the Tietze extension theorem:
Proof. Suppose first that $X$ is normal. By Urysohn’s lemma, one can find a
continuous function $g_\alpha : X \to [0, 1]$ for each $\alpha \in A$ which is supported
on $U_\alpha$ and equals 1 on the closed set $K_\alpha$. Observe that the function
$g := \sum_{\alpha \in A} g_\alpha$ is well defined, continuous and bounded below by 1.
The claim then follows by setting $f_\alpha := g_\alpha / g$.
The final claim follows by using Exercise 1.10.6 instead of Urysohn’s
lemma.
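The normalisation $f_\alpha := g_\alpha / g$ can be illustrated on a toy example. The following is only a sketch under assumptions of our own choosing: we take $X = [0, 1]$, two open sets $U_1 = [0, 0.6)$ and $U_2 = (0.4, 1]$, closed sets $K_1 = [0, 0.5]$ and $K_2 = [0.5, 1]$ which together cover $X$, and piecewise linear functions in place of the ones supplied by Urysohn’s lemma.

```python
def clamp01(t):
    """Clamp a real number to the interval [0, 1]."""
    return max(0.0, min(1.0, t))


def g1(x):
    # Equals 1 on K1 = [0, 0.5] and vanishes outside U1 = [0, 0.6).
    return clamp01(6.0 - 10.0 * x)


def g2(x):
    # Equals 1 on K2 = [0.5, 1] and vanishes outside U2 = (0.4, 1].
    return clamp01(10.0 * x - 4.0)


def partition(x):
    """Return (f1(x), f2(x)) with f_alpha := g_alpha / g."""
    g = g1(x) + g2(x)  # bounded below by 1, since K1 and K2 cover [0, 1]
    return g1(x) / g, g2(x) / g


grid = [k / 1000 for k in range(1001)]
sums = [sum(partition(x)) for x in grid]
```

On the grid, the two functions returned by `partition` sum to 1, and each vanishes outside its own open set.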
Exercise 1.10.11. Let $X$ be a topological space. A function $f : X \to \mathbf{R}$ is
said to be upper semicontinuous if $f^{-1}((-\infty, a))$ is open for all real $a$ and
lower semicontinuous if $f^{-1}((a, +\infty))$ is open for all real $a$.
• Show that an indicator function $1_E$ is upper semicontinuous if and
only if $E$ is closed and lower semicontinuous if and only if $E$ is open.
• If $X$ is normal, show that a function $f$ is upper semicontinuous
if and only if $f(x) = \inf\{g(x) : g \in C(X \to (-\infty, +\infty]), g \ge f\}$
for all $x \in X$, and lower semicontinuous if and only if $f(x) =
\sup\{g(x) : g \in C(X \to [-\infty, +\infty)), g \le f\}$ for all $x \in X$, where
we write $f \le g$ if $f(x) \le g(x)$ for all $x \in X$.
This result can be extended to more general spaces than compact metric
spaces, for instance to Polish spaces (provided that the measure remains
finite). For instance:
Exercise 1.10.12. Let X be a locally compact metric space which is σ-
compact, and let μ be an unsigned Borel measure which is finite on every
compact set. Show that μ is a Radon measure.
Proof. We first prove the uniqueness, which is quite easy due to all the
properties that Radon measures enjoy. Suppose we had two Radon measures
$\mu, \mu'$ such that $I = I_\mu = I_{\mu'}$; in particular, we have
$$\int_X f \, d\mu = \int_X f \, d\mu' \tag{1.75}$$
for all $f \in C_c(X \to \mathbf{R})$. Now let $K$ be a compact set, and let $U$ be an open
neighbourhood of $K$. By Exercise 1.10.6, we can find $f \in C_c(X \to \mathbf{R})$ with
$1_K \le f \le 1_U$; applying this to (1.75), we conclude that
$$\mu(U) \ge \mu'(K).$$
Taking suprema in $K$ and using inner regularity, we conclude that $\mu(U) \ge
\mu'(U)$; exchanging $\mu$ and $\mu'$ we conclude that $\mu$ and $\mu'$ agree on open sets;
by outer regularity we then conclude that $\mu$ and $\mu'$ agree on all Borel sets.
Now we prove existence, which is significantly trickier. We will initially
make the simplifying assumption that $X$ is compact (so in particular
$C_c(X \to \mathbf{R}) = C(X \to \mathbf{R}) = BC(X \to \mathbf{R})$), and remove this assumption at
the end of the proof.
Observe that $I$ is monotone on $C(X \to \mathbf{R})$, thus $I(f) \le I(g)$ whenever
$f \le g$.
We would like to define the measure $\mu$ on Borel sets $E$ by defining
$\mu(E) := I(1_E)$. This does not work directly, because $1_E$ is not continuous.
(cf. Exercise 1.10.11). This definition agrees with the existing definition of
$I(f)$ in the case when $f$ is continuous. Since $I(1)$ is finite and $I$ is monotone,
one sees that $I(f)$ is finite (and non-negative) for all $f \in BC_{lsc}(X \to \mathbf{R}^+)$.
One also easily sees that $I$ is monotone on $BC_{lsc}(X \to \mathbf{R}^+)$: $I(f) \le I(g)$
whenever $f, g \in BC_{lsc}(X \to \mathbf{R}^+)$ and $f \le g$, and homogeneous in the sense
that $I(cf) = cI(f)$ for all $f \in BC_{lsc}(X \to \mathbf{R}^+)$ and $c > 0$. It is also easy
to verify the super-additivity property $I(f + f') \ge I(f) + I(f')$ for $f, f' \in
BC_{lsc}(X \to \mathbf{R}^+)$; this simply reflects the linearity of $I$ on $C_c(X \to \mathbf{R})$,
together with the fact that if $0 \le g \le f$ and $0 \le g' \le f'$, then $0 \le g + g' \le
f + f'$.
We now complement the super-additivity property with a countably
subadditive one: if $f_n \in BC_{lsc}(X \to \mathbf{R}^+)$ is a sequence, and
$f \in BC_{lsc}(X \to \mathbf{R}^+)$ is such that $f(x) \le \sum_{n=1}^\infty f_n(x)$ for all $x \in X$, then
$I(f) \le \sum_{n=1}^\infty I(f_n)$.
Pick a small $0 < \varepsilon < 1$. It will suffice to show that
$I(g) \le \sum_{n=1}^\infty I(f_n) + O(\varepsilon^{1/2})$ (say) whenever $g \in C_c(X \to \mathbf{R})$ is such that $0 \le g \le
f$, and $O(\varepsilon^{1/2})$ denotes a quantity bounded in magnitude by $C\varepsilon^{1/2}$, where
$C$ is a quantity that is independent of $\varepsilon$.
Fix $g$. For every $x \in X$, we can find a neighbourhood $U_x$ of $x$ such
that $|g(y) - g(x)| \le \varepsilon$ for all $y \in U_x$; we can also find $N_x > 0$ such that
$\sum_{n=1}^{N_x} f_n(x) \ge f(x) - \varepsilon$. By shrinking $U_x$ if necessary, we see from the
lower semicontinuity of the $f_n$ and $f$ that we can also ensure that $f_n(y) \ge
f_n(x) - \varepsilon/2^n$ for all $1 \le n \le N_x$ and $y \in U_x$.
By normality, we can find open neighbourhoods $V_x$ of $x$ whose closure
lies in $U_x$. The $V_x$ form an open cover of $X$. Since we are assuming $X$ to
be compact, we can thus find a finite subcover $V_{x_1}, \ldots, V_{x_k}$ of $X$. Applying
Lemma 1.10.9, we can thus find a partition of unity $1 = \sum_{j=1}^k \psi_j$, where
each $\psi_j$ is supported on $U_{x_j}$.
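Let $x \in X$ be such that $g(x) \ge \sqrt{\varepsilon}$. Then we can write $g(x) =
\sum_{j : x \in U_{x_j}} g(x)\psi_j(x)$. If $j$ is in this sum, then $|g(x_j) - g(x)| \le \varepsilon$, and thus
(for $\varepsilon$ small enough) $g(x_j) \ge \sqrt{\varepsilon}/2$, and hence $f(x_j) \ge \sqrt{\varepsilon}/2$. We can then
write
$$1 \le \sum_{n=1}^{N_{x_j}} \frac{f_n(x_j)}{f(x_j)} + O(\sqrt{\varepsilon}),$$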
and thus
$$g(x) \le \sum_{n=1}^\infty \sum_{j : f(x_j) \ge \sqrt{\varepsilon}/2;\ N_{x_j} \ge n} \frac{f_n(x_j)}{f(x_j)} g(x_j)\psi_j(x) + O(\sqrt{\varepsilon})$$
(here we use the fact that $\sum_j \psi_j(x) = 1$ and that the continuous compactly
supported function $g$ is bounded). Observe that only finitely many summands
are non-zero. We conclude that
$$I(g) \le \sum_{n=1}^\infty I\Big( \sum_{j : f(x_j) \ge \sqrt{\varepsilon}/2;\ N_{x_j} \ge n} \frac{f_n(x_j)}{f(x_j)} g(x_j)\psi_j \Big) + O(\sqrt{\varepsilon})$$
(here we use that $1 \in C_c(X)$ and so $I(1)$ is finite). On the other hand, for
any $x \in X$ and any $n$, the expression
$$\sum_{j : f(x_j) \ge \sqrt{\varepsilon}/2;\ N_{x_j} \ge n} \frac{f_n(x_j)}{f(x_j)} g(x_j)\psi_j(x)$$
for all $f \in C_c(X \to \mathbf{R})$. From the compact case we see that there exists a
finite Radon measure $\mu_n$ such that $I(\psi_n f) = I_{\mu_n}(f)$ for all $f \in C_c(X \to \mathbf{R})$;
setting $\mu := \sum_n \mu_n$ one can verify (using the monotone convergence theorem,
Theorem 1.1.21) that $\mu$ obeys the required properties.
Remark 1.10.13. One can also construct the Radon measure μ using
the Carathéodory extension theorem (Theorem 1.1.17); this proof of the
Riesz representation theorem can be found in many real analysis texts. A
third method is to first create the space $L^1$ by taking the completion of
$C_c(X \to \mathbf{R})$ with respect to the $L^1$ norm $\|f\|_{L^1} := I(|f|)$, and then define
$\mu(E) := \|1_E\|_{L^1}$. It seems to me that all three proofs are almost equally
lengthy and ultimately rely on the same ingredients; they all seem to have
their strengths and weaknesses, and involve at least one tricky computation
somewhere (in the above argument, the most tricky thing is the countable
subadditivity of I on lower semicontinuous functions). I have yet to find
a proof of this theorem which is both clean and conceptual, and would be
happy to learn of other proofs of this theorem.
Remark 1.10.14. One can use the Riesz representation theorem to provide
an alternate construction of Lebesgue measure, say on R. Indeed, the Rie-
mann integral already provides a positive linear functional on Cc (R → R),
which by the Riesz representation theorem must come from a Radon mea-
sure, which can be easily verified to assign the value b − a to every interval
[a, b] and thus must agree with Lebesgue measure. The same approach lets
one define volume measures on manifolds with a volume form.
and
$$\int_X g \, d\mu = \sup\Big\{ \int_X h \, d\mu : 0 \le h \le g;\ h \in C_c(X \to \mathbf{R}) \Big\}.$$
Remark 1.10.16. Note that the previous exercise generalises the identifications
$c_c(\mathbf{N})^* \equiv c_0(\mathbf{N})^* \equiv \ell^1(\mathbf{N})$ from previous notes. For compact Hausdorff
spaces $X$, we have $C(X \to \mathbf{R}) = C_0(X \to \mathbf{R})$, and thus
$C(X \to \mathbf{R})^* \equiv M(X)$. For locally compact Hausdorff spaces that are
σ-compact but not compact, we instead have $C(X \to \mathbf{R})^* \equiv M(\beta X)$, where
$\beta X$ is the Stone-Čech compactification of $X$, which we will discuss in Section
2.5.
Remark 1.10.17. One can of course also define complex Radon measures
to be those complex finite Borel measures whose real and imaginary parts
are signed Radon measures, and define M (X → C) to be the space of all
such measures; then one has analogues of the above identifications. We omit
the details.
Exercise 1.10.18. Let $X$, $Y$ be two locally compact Hausdorff spaces that
are also σ-compact, and let $f : X \to Y$ be a continuous map. If $\mu$ is an
unsigned finite Radon measure on $X$, show that the pushforward measure
$f_\# \mu$ on $Y$, defined by $f_\# \mu(E) := \mu(f^{-1}(E))$, is a Radon measure on $Y$.
Establish the same fact for signed Radon measures.
Proof. It suffices to verify the claim for algebras A which are closed in the
C(X → R) topology, since the claim follows in the general case by replacing
A with its closure (note that the closure of an algebra is still an algebra).
Observe from the Weierstrass approximation theorem that on any
bounded interval $[-K, K]$, the function $|x|$ can be expressed as the uniform
limit of polynomials $P_n(x)$; one can even write down explicit formulae
for such a $P_n$, though we will not need such formulae here. Since continuous
functions on the compact space $X$ are bounded, this implies that for any
$f \in A$, the function $|f|$ is the uniform limit of polynomial combinations
$P_n(f)$ of $f$. As $A$ is an algebra, the $P_n(f)$ lie in $A$; as $A$ is closed, we see
that $|f|$ lies in $A$.
Using the identities $\max(f, g) = \frac{f+g}{2} + \left|\frac{f-g}{2}\right|$ and
$\min(f, g) = \frac{f+g}{2} - \left|\frac{f-g}{2}\right|$, we
conclude that $A$ is a lattice in the sense that one has $\max(f, g), \min(f, g) \in A$
whenever $f, g \in A$.
Now let f ∈ C(X → R) and ε > 0. We would like to find g ∈ A such
that |f (x) − g(x)| ≤ ε for all x ∈ X.
Given any two points $x, y \in X$, we can at least find a function $g_{xy} \in A$
such that $g_{xy}(x) = f(x)$ and $g_{xy}(y) = f(y)$; this follows since the vector
space $A$ separates points and also contains the constant function 1 (the case
$x = y$ needs to be treated separately). We now use these functions $g_{xy}$ to
build the approximant $g$. First, observe from continuity that for every $x, y \in
X$ there exists an open neighbourhood $V_{xy}$ of $y$ such that $g_{xy}(y') \ge f(y') - \varepsilon$
for all $y' \in V_{xy}$. By compactness, for any fixed $x$ we can cover $X$ by a
finite number of these $V_{xy}$. Taking the max of all the $g_{xy}$ associated to this
finite subcover, we create another function $g_x \in A$ such that $g_x(x) = f(x)$
and $g_x(y) \ge f(y) - \varepsilon$ for all $y \in X$. By continuity, we can find an open
neighbourhood $U_x$ of $x$ such that $g_x(x') \le f(x') + \varepsilon$ for all $x' \in U_x$. Again
applying compactness, we can cover $X$ by a finite number of the $U_x$; taking
the min of all the $g_x$ associated to this finite subcover we obtain $g \in A$ with
$f(x) - \varepsilon \le g(x) \le f(x) + \varepsilon$ for all $x \in X$, and the claim follows.
Combining this with the Riesz representation theorem and the sequential
Banach-Alaoglu theorem (Theorem 1.9.14), we obtain
Corollary 1.10.21. If X is a compact metric space, then the closed unit
ball in M (X) is sequentially compact in the vague topology.
Section 1.11
Interpolation of $L^p$ spaces
In the previous sections, we have been focusing largely on the soft side of
real analysis, which is primarily concerned with qualitative properties such as
convergence, compactness, measurability, and so forth. In contrast, we will
now emphasise the hard side of real analysis, in which we study estimates
and upper and lower bounds of various quantities, such as norms of functions
or operators. (Of course, the two sides of analysis are closely connected to
each other; an understanding of both sides and their interrelationships is
needed in order to get the broadest and most complete perspective for this
subject; see Section 1.3 of Structure and Randomness for more discussion.)
One basic tool in hard analysis is that of interpolation, which allows
one to start with a hypothesis of two (or more) upper bound estimates, e.g.,
$A_0 \le B_0$ and $A_1 \le B_1$, and conclude a family of intermediate estimates
$A_\theta \le B_\theta$ (or maybe $A_\theta \le C_\theta B_\theta$, where $C_\theta$ is a constant) for any choice of
parameter $0 < \theta < 1$. Of course, interpolation is not a magic wand; one
needs various hypotheses (e.g., linearity, sublinearity, convexity, or
complexifiability) on $A_i$, $B_i$ in order for interpolation methods to be applicable.
Nevertheless, these techniques are available for many important classes of
problems, most notably that of establishing boundedness estimates such as
$\|Tf\|_{L^q(Y,\nu)} \le C\|f\|_{L^p(X,\mu)}$ for linear (or linear-like) operators $T$ from one
Lebesgue space $L^p(X, \mu)$ to another $L^q(Y, \nu)$. (Interpolation can also be
performed for many other normed vector spaces than the Lebesgue spaces,
but we will restrict attention to Lebesgue spaces in these notes to focus the
discussion.) Using interpolation, it is possible to reduce the task of proving such
estimates to that of proving various endpoint versions of these estimates.
In some cases, each endpoint only faces a portion of the difficulty that the
interpolated estimate did, and so by using interpolation, one has split the
task of proving the original estimate into two or more simpler subtasks. In
other cases, one of the endpoint estimates is very easy, and the other one
is significantly more difficult than the original estimate. Thus interpolation
does not really simplify the task of proving estimates in this case, but at least
clarifies the relative difficulty between various estimates in a given family.
As is the case with many other tools in analysis, interpolation is not
captured by a single interpolation theorem; instead, there is a family of
such theorems, which can be broadly divided into two major categories,
reflecting the two basic methods that underlie the principle of interpolation.
The real interpolation method is based on a divide-and-conquer strategy: to
understand how to obtain control on some expression such as $\|Tf\|_{L^q(Y,\nu)}$
for some operator T and some function f , one would divide f into two or
more components, e.g., into components where f is large and where f is
small, or where f is oscillating with high frequency or only varying with
low frequency. Each component would be estimated using a carefully cho-
sen combination of the extreme estimates available; optimising over these
choices and summing up (using whatever linearity-type properties on T are
available), one would hope to get a good estimate on the original expres-
sion. The strengths of the real interpolation method are that the linearity
hypotheses on T can be relaxed to weaker hypotheses, such as sublinearity
or quasilinearity; also, the endpoint estimates are allowed to be of a weaker
type than the interpolated estimates. On the other hand, the real interpolation
method often concedes a multiplicative constant in the final estimates obtained,
and one is usually obligated to keep the operator T fixed throughout the
interpolation process. The proofs of real interpolation theorems are also a
little bit messy, though in many cases one can simply invoke a standard
instance of such theorems (e.g., the Marcinkiewicz interpolation theorem)
as a black box in applications.
The complex interpolation method instead proceeds by exploiting the
powerful tools of complex analysis, in particular the maximum modulus prin-
ciple and its relatives (such as the Phragmén-Lindelöf principle). The idea
is to rewrite the estimate to be proven (e.g., $\|Tf\|_{L^q(Y,\nu)} \le C\|f\|_{L^p(X,\mu)}$) in
such a way that it can be embedded into a family of such estimates which
depend holomorphically on a complex parameter s in some domain (e.g.,
the strip {σ + it : t ∈ R, σ ∈ [0, 1]}). One then exploits things like the max-
imum modulus principle to bound an estimate corresponding to an interior
point of this domain by the estimates on the boundary of this domain. The
strengths of the complex interpolation method are that it typically gives
cleaner constants than the real interpolation method, and also allows the
and
$$B_\theta := B_0^{1-\theta} B_1^{\theta}; \tag{1.80}$$
indeed one simply raises (1.76) to the power $1 - \theta$, (1.77) to the power $\theta$, and
multiplies the two inequalities together. Thus for instance, when $\theta = 1/2$
one obtains the geometric mean of (1.76) and (1.77):
$$A_0^{1/2} A_1^{1/2} \le B_0^{1/2} B_1^{1/2}.$$
One can view $A_\theta$ and $B_\theta$ as the unique log-linear functions of $\theta$ (i.e., $\log A_\theta$,
$\log B_\theta$ are (affine-)linear functions of $\theta$) which equal their boundary values
$A_0, A_1$ and $B_0, B_1$, respectively, as $\theta = 0, 1$.
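Example 1.11.1. If $A_0 = AL^{1/p_0}$ and $A_1 = AL^{1/p_1}$ for some $A, L > 0$ and
$0 < p_0, p_1 \le \infty$, then the log-linear interpolant $A_\theta$ is given by $A_\theta = AL^{1/p_\theta}$,
where $0 < p_\theta \le \infty$ is the quantity such that $\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$.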
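The log-linear interpolant and the exponent $p_\theta$ of this example are easy to experiment with numerically. The following is a minimal sketch with function names of our own choosing, restricted for simplicity to finite $p_0, p_1$; the specific values of $A$, $L$, $p_0$, $p_1$, $\theta$ are arbitrary.

```python
def interpolate(b0, b1, theta):
    """Log-linear interpolant b0^(1 - theta) * b1^theta of two positive numbers."""
    return b0 ** (1.0 - theta) * b1 ** theta


def p_theta(p0, p1, theta):
    """Exponent defined by 1/p_theta = (1 - theta)/p0 + theta/p1 (finite p0, p1 only)."""
    return 1.0 / ((1.0 - theta) / p0 + theta / p1)


A, L, p0, p1, theta = 2.0, 5.0, 1.0, 4.0, 0.25

# Interpolating A_0 = A L^(1/p0) and A_1 = A L^(1/p1) directly ...
lhs = interpolate(A * L ** (1.0 / p0), A * L ** (1.0 / p1), theta)
# ... should agree with A L^(1/p_theta).
rhs = A * L ** (1.0 / p_theta(p0, p1, theta))

# At theta = 1/2 the interpolant is the geometric mean.
geometric_mean = interpolate(4.0, 9.0, 0.5)
```

Here `lhs` and `rhs` should coincide up to rounding, and `geometric_mean` should be $\sqrt{4 \cdot 9} = 6$.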
The deduction of (1.78) from (1.76), (1.77) is utterly trivial, but there
are still some useful lessons to be drawn from it. For instance, let us take
$A_0 = A_1 = A$ for simplicity, so we are interpolating two upper bounds
$A \le B_0$, $A \le B_1$ on the same quantity $A$ to give a new bound $A \le B_\theta$. But
actually we have a refinement available to this bound, namely
$$A_\theta \le B_\theta \min\Big( \frac{B_0}{B_1}, \frac{B_1}{B_0} \Big)^{\varepsilon} \tag{1.81}$$
for any sufficiently small $\varepsilon > 0$ (indeed one can take any $\varepsilon$ less than or equal
to $\min(\theta, 1 - \theta)$). Indeed one sees this simply by applying (1.78) with $\theta$
replaced by $\theta - \varepsilon$ and $\theta + \varepsilon$ and taking minima. Thus we see that (1.78) is only
sharp when the two original bounds $B_0$, $B_1$ are comparable; if instead we
have $B_1 \sim 2^n B_0$ for some integer $n$, then (1.81) tells us that we can improve
(1.78) by an exponentially decaying factor of $2^{-\varepsilon|n|}$. The geometric series
formula tells us that such factors are absolutely summable, and so in practice
it is often a useful heuristic to pretend that the $n = O(1)$ cases dominate so
strongly that the other cases can be viewed as negligible by comparison.
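The refinement (1.81) can be checked numerically. In the sketch below (the function names and the sample values are ours), we take $B_0 = 1$ and $B_1 = 2^{10}$, so that $n = 10$, and compare the right-hand side of (1.81) with the smaller of the two bounds obtained at $\theta - \varepsilon$ and $\theta + \varepsilon$.

```python
def interpolate(b0, b1, theta):
    """Log-linear interpolant b0^(1 - theta) * b1^theta of two positive numbers."""
    return b0 ** (1.0 - theta) * b1 ** theta


def refined_bound(b0, b1, theta, eps):
    """Right-hand side of (1.81): B_theta * min(B_0/B_1, B_1/B_0)^eps."""
    return interpolate(b0, b1, theta) * min(b0 / b1, b1 / b0) ** eps


def best_shifted_bound(b0, b1, theta, eps):
    """Smaller of the interpolated bounds at theta - eps and theta + eps."""
    return min(interpolate(b0, b1, theta - eps),
               interpolate(b0, b1, theta + eps))


b0, b1, theta, eps, n = 1.0, 2.0 ** 10, 0.5, 0.25, 10
refined = refined_bound(b0, b1, theta, eps)
shifted = best_shifted_bound(b0, b1, theta, eps)
# Gain over the unrefined bound B_theta; should equal 2^(-eps * |n|).
gain = refined / interpolate(b0, b1, theta)
```

With these values `refined` and `shifted` agree, and `gain` is $2^{-2.5}$, illustrating the exponential decay in $|n|$.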
Also, one can trivially extend the deduction of (1.78) from (1.76), (1.77)
as follows: if $\theta \mapsto A_\theta$ is a function from $[0, 1]$ to $\mathbf{R}^+$ which is log-convex
(thus $\theta \mapsto \log A_\theta$ is a convex function of $\theta$), and (1.76) and (1.77) hold for
some $B_0, B_1 > 0$, then (1.78) holds for all intermediate $\theta$ also, where $B_\theta$ is
of course defined by (1.80). Thus one can interpolate upper bounds on
log-convex functions. However, one certainly cannot interpolate lower bounds:
lower bounds on a log-convex function $\theta \mapsto A_\theta$ at $\theta = 0$ and $\theta = 1$ yield no
information about the value of, say, $A_{1/2}$. Similarly, one cannot extrapolate
upper bounds on log-convex functions: an upper bound on, say, $A_0$ and $A_{1/2}$
does not give any information about $A_1$. (However, an upper bound on $A_0$
coupled with a lower bound on $A_{1/2}$ gives a lower bound on $A_1$; this is the
contrapositive of an interpolation statement.)
Exercise 1.11.1. Show that the sum $f + g$, product $fg$, or pointwise maximum
$\max(f, g)$ of two log-convex functions $f, g : [0, 1] \to \mathbf{R}^+$ is log-convex.
Remark 1.11.2. Every non-negative log-convex function $\theta \mapsto A_\theta$ is convex,
thus in particular $A_\theta \le (1 - \theta)A_0 + \theta A_1$ for all $0 \le \theta \le 1$ (note that this
generalises the arithmetic mean-geometric mean inequality). Of course, the
converse statement is not true.
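A numerical sanity check of these statements is sketched below (it is of course no substitute for a proof, and the test functions are our own choices). We test midpoint log-convexity of the sum $2^\theta + 3^{-\theta}$ of two log-linear functions, and observe that the convex function $1 + \theta$ fails the same test, which gives one concrete instance of the failure of the converse.

```python
def is_midpoint_log_convex(h, points, tol=1e-12):
    """Check h((a + b)/2)^2 <= h(a) * h(b) for all pairs a, b taken from `points`."""
    return all(h((a + b) / 2.0) ** 2 <= h(a) * h(b) * (1.0 + tol)
               for a in points for b in points)


def sum_of_log_linear(theta):
    # 2^theta and 3^(-theta) are log-linear, hence log-convex.
    return 2.0 ** theta + 3.0 ** (-theta)


def affine(theta):
    # Convex (indeed affine), but log(1 + theta) is strictly concave.
    return 1.0 + theta


def amgm_gap(a0, a1, theta):
    """(1 - theta) a0 + theta a1 minus the log-linear interpolant; non-negative."""
    return (1.0 - theta) * a0 + theta * a1 - a0 ** (1.0 - theta) * a1 ** theta


points = [k / 10.0 for k in range(11)]
sum_ok = is_midpoint_log_convex(sum_of_log_linear, points)
affine_ok = is_midpoint_log_convex(affine, points)
gaps = [amgm_gap(2.0, 9.0, t) for t in points]
```

One expects `sum_ok` to be true, `affine_ok` to be false, and every entry of `gaps` to be non-negative up to rounding.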
will obey (1.82). The principle however fails without this hypothesis, as
one can see for instance by considering the holomorphic function f (s) :=
exp(−i exp(πis)).
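One can see the failure concretely by evaluating this function numerically. The sketch below (using Python's standard `cmath` module; the sample points are arbitrary) checks that $|f| = 1$ on both edges $\mathrm{Re}(s) = 0$ and $\mathrm{Re}(s) = 1$ of the strip, while on the middle line $\mathrm{Re}(s) = 1/2$ one has $|f(1/2 + it)| = \exp(e^{-\pi t})$, which is unbounded as $t \to -\infty$.

```python
import cmath
import math


def f(s):
    """The holomorphic function f(s) = exp(-i exp(pi i s))."""
    return cmath.exp(-1j * cmath.exp(1j * math.pi * s))


heights = (-1.0, -0.5, 0.0, 0.5, 1.0)

# Moduli on the two edges of the strip: all equal to 1 up to rounding.
edge_values = [abs(f(complex(sigma, t))) for sigma in (0.0, 1.0) for t in heights]

# Moduli on the middle line at t = 0, -0.5, -1: these grow doubly exponentially.
midline = [abs(f(complex(0.5, t))) for t in (0.0, -0.5, -1.0)]
```

Already at $t = -1$ the modulus on the middle line exceeds $10^9$, even though the boundary values never exceed 1.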
Proof. Observe that the function $s \mapsto B_0^{1-s} B_1^{s}$ is holomorphic and non-zero
on $S$, and has magnitude exactly $B_\theta$ on the line $\mathrm{Re}(s) = \theta$ for each $0 \le \theta \le 1$.
Thus, by dividing $f$ by this function (which worsens the qualitative bound
(1.82) slightly), we may reduce to the case when $B_\theta = 1$ for all $0 \le \theta \le 1$.
Suppose we temporarily assume that f (σ + it) → 0 as |σ + it| → ∞.
Then by the maximum modulus principle (applied to a sufficiently large
rectangular portion of the strip), it must then attain a maximum on one of
the two sides of the strip. But |f | ≤ 1 on these two sides, and so |f | ≤ 1 on
the interior as well.
To remove the assumption that $f$ goes to zero at infinity, we use the
trick of giving ourselves an epsilon of room (Section 2.7). Namely, we multiply
$f(s)$ by the holomorphic function $g_\varepsilon(s) := \exp(\varepsilon i \exp(i[(\pi - \delta/2)s +
\delta/4]))$ for some $\varepsilon > 0$. A little complex arithmetic shows that the function
$f(s) g_\varepsilon(s) g_\varepsilon(1 - s)$ goes to zero at infinity in $S$ (the $g_\varepsilon(s)$ factor decays fast
enough to damp out the growth of $f$ as $\mathrm{Im}(s) \to -\infty$, while the $g_\varepsilon(1 - s)$
we have
$$\|f\|_{L^{p_\theta}(X)} \le \|f\|_{L^{p_0}(X)}^{1-\theta} \|f\|_{L^{p_1}(X)}^{\theta}.$$
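Before turning to the proofs, it may help to test this inequality numerically on a finite measure space. The following sketch makes several assumptions of our own: $X$ is a set of 20 points carrying arbitrary positive weights, the exponents $p_0 = 1.5$ and $p_1 = 6$ are finite, and $p_\theta$ is defined by $1/p_\theta = (1-\theta)/p_0 + \theta/p_1$ as in Example 1.11.1.

```python
import random


def lp_norm(values, weights, p):
    """L^p norm on a finite measure space: (sum_x |f(x)|^p mu({x}))^(1/p)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def p_theta(p0, p1, theta):
    """Exponent defined by 1/p_theta = (1 - theta)/p0 + theta/p1 (finite p0, p1 only)."""
    return 1.0 / ((1.0 - theta) / p0 + theta / p1)


def both_sides(values, weights, p0, p1, theta):
    """Return (left-hand side, right-hand side) of the interpolation inequality."""
    lhs = lp_norm(values, weights, p_theta(p0, p1, theta))
    rhs = (lp_norm(values, weights, p0) ** (1.0 - theta)
           * lp_norm(values, weights, p1) ** theta)
    return lhs, rhs


rng = random.Random(0)  # fixed seed, so the experiment is reproducible
values = [rng.uniform(-3.0, 3.0) for _ in range(20)]
weights = [rng.uniform(0.1, 2.0) for _ in range(20)]
results = [both_sides(values, weights, 1.5, 6.0, t) for t in (0.1, 0.5, 0.9)]
```

In each pair in `results` the first entry should not exceed the second; for a constant function on a space of total mass one the two sides coincide.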
Let us give several proofs of this lemma. We will focus on the case
p1 < ∞; the endpoint case p1 = ∞ can be proven directly, or by modifying
the arguments below, or by using an appropriate limiting argument, and we
leave the details to the reader.
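The first proof is to use Hölder’s inequality
$$\|f\|_{L^{p_\theta}(X)}^{p_\theta} = \int_X |f|^{(1-\theta)p_\theta} |f|^{\theta p_\theta} \, d\mu \le \big\| |f|^{(1-\theta)p_\theta} \big\|_{L^{p_0/((1-\theta)p_\theta)}} \big\| |f|^{\theta p_\theta} \big\|_{L^{p_1/(\theta p_\theta)}}$$
when $p_1$ is finite (with some minor modifications in the case $p_1 = \infty$).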
Another (closely related) proof proceeds by using the log-convexity
inequality
$$|f(x)|^{p_\theta} \le (1 - \alpha)|f(x)|^{p_0} + \alpha|f(x)|^{p_1}$$
for all $x$, where $0 < \alpha < 1$ is the quantity such that $p_\theta = (1 - \alpha)p_0 + \alpha p_1$.
If one integrates this inequality in $x$, one already obtains the claim in the
normalised case when $\|f\|_{L^{p_0}(X)} = \|f\|_{L^{p_1}(X)} = 1$. To obtain the general
case, one can multiply the function $f$ and the measure $\mu$ by appropriately
chosen constants to obtain the above normalisation; we leave the details as
an exercise to the reader. (The case when $\|f\|_{L^{p_0}(X)}$ or $\|f\|_{L^{p_1}(X)}$ vanishes
is of course easy to handle separately.)
A third approach is more in the spirit of the real interpolation method,
avoiding the use of convexity arguments. As in the second proof, we can
reduce to the normalised case $\|f\|_{L^{p_0}(X)} = \|f\|_{L^{p_1}(X)} = 1$. We then split $f = f 1_{|f| \le 1} + f 1_{|f| > 1}$, where $1_{|f| \le 1}$ is the indicator function of the set $\{x : |f(x)| \le 1\}$, and similarly for $1_{|f| > 1}$. Observe that
and similarly
Exercise 1.11.8. Let X be a finite set with counting measure, and let
f : X → C be a function. For any 0 < p < ∞, show that
$$\|f\|_{L^{p,\infty}(X)} \le \|f\|_{L^p(X)} \lesssim_p \log(1+|X|)\, \|f\|_{L^{p,\infty}(X)}.$$
(Hint: To prove the second inequality, normalise $\|f\|_{L^{p,\infty}(X)} = 1$, and then
manually dispose of the regions of X where f is too large or too small.)
Thus, in some sense, weak Lp and strong Lp are equivalent up to logarithmic
factors.
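To see the logarithmic factor concretely, here is a small Python sketch on a finite set with counting measure. It assumes the standard definition $\|f\|_{L^{p,\infty}} = \sup_{t>0} t\, \lambda_f(t)^{1/p}$ with $\lambda_f(t) = \#\{x : |f(x)| \ge t\}$, which is not restated in this passage; the function names and the test function $f(j) = j^{-1/p}$ are illustrative choices of ours.

```python
import math


def strong_lp_norm(values, p):
    """L^p norm with respect to counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def weak_lp_norm(values, p):
    """Weak L^p quasi-norm sup_t t * #{|f| >= t}^(1/p) for counting measure.

    The supremum is attained at t equal to one of the values |f(x)|, so it is
    enough to sort the magnitudes in decreasing order and scan them: the k-th
    largest magnitude a_k contributes a_k * k^(1/p).
    """
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    return max((a * k ** (1.0 / p) for k, a in enumerate(magnitudes, start=1)),
               default=0.0)


def extremal_example(size, p):
    """f(j) = j^(-1/p) for j = 1..size: weak norm about 1, strong norm grows."""
    return [j ** (-1.0 / p) for j in range(1, size + 1)]


f = extremal_example(1000, 2.0)
weak = weak_lp_norm(f, 2.0)                  # stays close to 1
strong_p = strong_lp_norm(f, 2.0) ** 2.0     # the harmonic sum 1 + 1/2 + ... + 1/1000
```

For this example the $p$-th power of the strong norm is a partial sum of the harmonic series, so it grows like $\log |X|$ while the weak quasi-norm stays bounded.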
One can interpolate weak Lp bounds just as one can strong Lp bounds:
if $\|f\|_{L^{p_0,\infty}(X)} \le B_0$ and $\|f\|_{L^{p_1,\infty}(X)} \le B_1$, then
(1.85) $\|f\|_{L^{p_\theta,\infty}(X)} \le B_\theta$
for all 0 ≤ θ ≤ 1. Indeed, from the hypotheses we have
$$\lambda_f(t) \le \frac{B_0^{p_0}}{t^{p_0}}$$
and
$$\lambda_f(t) \le \frac{B_1^{p_1}}{t^{p_1}}$$
for all t > 0, and hence by scalar interpolation (using an interpolation
parameter 0 < α < 1 defined by pθ = (1 − α)p0 + αp1 , and after doing some
algebra) we have
(1.86) $$\lambda_f(t) \le \frac{B_\theta^{p_\theta}}{t^{p_\theta}}$$
for all 0 < θ < 1.
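The "some algebra" behind (1.86) can be checked numerically. The sketch below assumes the conventions $1/p_\theta = (1-\theta)/p_0 + \theta/p_1$ and $B_\theta = B_0^{1-\theta} B_1^\theta$ (neither is restated in this passage), and verifies that the weighted geometric mean of the two hypotheses, with weight $\alpha$ defined by $p_\theta = (1-\alpha)p_0 + \alpha p_1$, is exactly the right-hand side of (1.86). The function names are ours.

```python
import math


def interpolation_parameters(p0, p1, theta):
    """Return (p_theta, alpha) with 1/p_theta = (1-theta)/p0 + theta/p1
    and p_theta = (1 - alpha) * p0 + alpha * p1 (requires p0 != p1)."""
    p_theta = 1.0 / ((1.0 - theta) / p0 + theta / p1)
    alpha = (p_theta - p0) / (p1 - p0)
    return p_theta, alpha


def weak_bounds(t, p0, p1, b0, b1, theta):
    """Return (min of the two hypotheses, their geometric mean, bound (1.86))."""
    p_theta, alpha = interpolation_parameters(p0, p1, theta)
    bound0 = (b0 / t) ** p0          # B_0^{p_0} / t^{p_0}
    bound1 = (b1 / t) ** p1          # B_1^{p_1} / t^{p_1}
    b_theta = b0 ** (1.0 - theta) * b1 ** theta
    geometric_mean = bound0 ** (1.0 - alpha) * bound1 ** alpha
    interpolated = (b_theta / t) ** p_theta
    return min(bound0, bound1), geometric_mean, interpolated


# Example values (arbitrary): the last two entries of each triple agree.
samples = [weak_bounds(t, 1.5, 4.0, 2.0, 0.5, 0.3) for t in (0.01, 0.1, 1.0, 10.0, 100.0)]
```

The agreement rests on the identities $(1-\alpha)p_0 = (1-\theta)p_\theta$ and $\alpha p_1 = \theta p_\theta$, which follow from the two defining relations for $\theta$ and $\alpha$.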
As remarked in the previous section, we can improve upon (1.86); indeed,
if we define $t_0$ to be the unique value of $t$ where $B_0^{p_0}/t^{p_0}$ and $B_1^{p_1}/t^{p_1}$ are equal, then we have
$$\lambda_f(t) \le \frac{B_\theta^{p_\theta}}{t^{p_\theta}} \min(t/t_0, t_0/t)^{\varepsilon}$$
for some ε > 0 depending on p0 , p1 , θ. Inserting this improved bound into
(1.84) we see that we can improve the weak-type bound (1.85) to a strong-
type bound
(1.87) $\|f\|_{L^{p_\theta}(X)} \le C_{p_0,p_1,\theta} B_\theta$
for some constant $C_{p_0,p_1,\theta}$. Note that one cannot use the tensor power trick this time to eliminate the constant $C_{p_0,p_1,\theta}$, as the weak $L^p$ norms do not behave well with respect to tensor product. Indeed, the constant $C_{p_0,p_1,\theta}$ must diverge to infinity in the limit $\theta \to 0$ if $p_0 \neq \infty$, otherwise it would imply that the $L^{p_0}$ norm is controlled by the $L^{p_0,\infty}$ norm, which is false by Example 1.11.6; similarly one must have a divergence as $\theta \to 1$ if $p_1 \neq \infty$.
Exercise 1.11.9. Let 0 < p0 < p1 ≤ ∞ and 0 < θ < 1. Refine the
inclusions in (1.83) to
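$$L^{p_0}(X) \cap L^{p_1}(X) \subset L^{p_0,\infty}(X) \cap L^{p_1,\infty}(X) \subset L^{p_\theta}(X) \subset L^{p_\theta,\infty}(X) \subset L^{p_0}(X) + L^{p_1}(X) \subset L^{p_0,\infty}(X) + L^{p_1,\infty}(X).$$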
$1, \dots, n$ such that $\|\sum_{j=1}^n f_j\|_{L^{1,\infty}(X)} \ge c n \log n$ for some absolute constant $c > 0$. (Hint: Exploit the logarithmic divergence of the harmonic series $\sum_{j=1}^\infty \frac{1}{j}$.) Conclude that there exists a probability space $X$ such that the $L^{1,\infty}(X)$ quasi-norm is not equivalent to an actual norm.
Exercise 1.11.13. Let $(X, \mathcal{X}, \mu)$ be a σ-finite measure space, let $0 < p < \infty$, and let $f : X \to \mathbf{C}$ be a measurable function. Show that the following are equivalent:
• $f$ lies in $L^{p,\infty}(X)$.
• There exists a constant $C$ such that for every set $E$ of finite measure, there exists a subset $E'$ with $\mu(E') \ge \frac{1}{2} \mu(E)$ such that $|\int_X f 1_{E'}\, d\mu| \le C \mu(E)^{1/p'}$.
Exercise 1.11.14. Let $(X, \mathcal{X}, \mu)$ be a measure space of finite measure, and let $f : X \to \mathbf{C}$ be a measurable function. Show that the following two statements are equivalent:
• There exists a constant $C > 0$ such that $\|f\|_{L^p(X)} \le Cp$ for all $1 \le p < \infty$.
• There exists a constant $c > 0$ such that $\int_X e^{c|f|}\, d\mu < \infty$.
The estimate (1.92) has currently been established for simple functions $f, g$ with finite measure support. But one can extend the claim to any $f \in L^{p_\theta}(X)$ (keeping $g$ simple with finite measure support) by decomposing $f$ into a bounded function and a function of finite measure support, approximating the former in $L^{p_\theta}(X) \cap L^{p_1}(X)$ by simple functions of finite measure support, and approximating the latter in $L^{p_\theta}(X) \cap L^{p_0}(X)$ by simple functions of finite measure support, and taking limits using (1.90), (1.91) to justify the passage to the limit. One can then also allow arbitrary $g \in L^{q_\theta'}(Y)$ by using the monotone convergence theorem (Theorem 1.1.21). The claim now follows from the duality between $L^{q_\theta}(Y)$ and $L^{q_\theta'}(Y)$.
Suppose one has a linear operator $T$ that maps simple functions of finite measure support on $X$ to measurable functions on $Y$ (modulo almost everywhere equivalence). We say that such an operator is of strong type $(p, q)$ if it can be extended in a continuous fashion to an operator from $L^p(X)$ to $L^q(Y)$; this is equivalent to having an estimate of the form $\|Tf\|_{L^q(Y)} \le B \|f\|_{L^p(X)}$ for all simple functions $f$ of finite measure support. (The extension is unique if $p$ is finite or if $X$ has finite measure, due to the density of simple functions of finite measure support in those cases. Annoyingly, uniqueness fails for $L^\infty$ of an infinite measure space, though this turns out not to cause much difficulty in practice, as the conclusions of interpolation methods are usually for finite exponents $p$.) Define the strong type diagram to be the set of all $(1/p, 1/q)$ such that $T$ is of strong type $(p, q)$. The Riesz-Thorin theorem tells us that if $T$ is of strong type $(p_0, q_0)$ and $(p_1, q_1)$ with $0 < p_0, p_1 \le \infty$ and $1 \le q_0, q_1 \le \infty$, then $T$ is also of strong type $(p_\theta, q_\theta)$ for all $0 < \theta < 1$; thus the strong type diagram contains the closed line segment connecting $(1/p_0, 1/q_0)$ with $(1/p_1, 1/q_1)$. Thus the strong type diagram of $T$ is convex in $[0, +\infty) \times [0, 1]$ at least. (As we shall see later, it is in fact convex in all of $[0, +\infty)^2$.) Furthermore, on the intersection of the strong type diagram with $[0, 1] \times [0, +\infty)$, the operator norm $\|T\|_{L^p(X) \to L^q(Y)}$ is a log-convex function of $(1/p, 1/q)$.
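Since the type diagram is drawn in the reciprocal coordinates $(1/p, 1/q)$, interpolated exponents are simply points on a line segment. The following Python sketch computes such a point exactly with rational arithmetic; the endpoints $(1, \infty)$ and $(2, 2)$ are hypothetical inputs chosen for illustration, and the function names are ours.

```python
from fractions import Fraction


def segment_point(recip0, recip1, theta):
    """Point (1 - theta) * recip0 + theta * recip1 of the type diagram.

    recip0 and recip1 are pairs (1/p, 1/q); an exponent of infinity is
    represented by the reciprocal 0.
    """
    return tuple((1 - theta) * a + theta * b for a, b in zip(recip0, recip1))


def to_exponent(reciprocal):
    """Convert 1/p back to p, with 1/p = 0 meaning p = infinity."""
    return float("inf") if reciprocal == 0 else 1 / reciprocal


# Hypothetical endpoints: strong type (1, infinity) and strong type (2, 2).
endpoint0 = (Fraction(1), Fraction(0))
endpoint1 = (Fraction(1, 2), Fraction(1, 2))
inv_p, inv_q = segment_point(endpoint0, endpoint1, Fraction(1, 2))
p_theta, q_theta = to_exponent(inv_p), to_exponent(inv_q)   # 4/3 and 4
```

Working with reciprocals avoids special-casing the endpoint $p = \infty$ except when converting back to an exponent.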
Exercise 1.11.15. If X = Y = [0, 1] with the usual measure, show that the
strong type diagram of the identity operator is the triangle {(1/p, 1/q) ∈
[0, +∞) × [0, +∞) : 1/p ≤ 1/q}. If instead X = Y = Z with the usual
counting measure, show that the strong type diagram of the identity oper-
ator is the triangle {(1/p, 1/q) ∈ [0, +∞) × [0, +∞) : 1/p ≥ 1/q}. What is
the strong type diagram of the identity when X = Y = R with the usual
measure?
Exercise 1.11.16. Let $T$ (resp. $T^*$) be a linear operator from simple functions of finite measure support on $X$ (resp. $Y$) to measurable functions on $Y$ (resp. $X$) modulo a.e. equivalence that are absolutely integrable on finite measure sets. We say $T, T^*$ are formally adjoint if we have
• There exists a constant $B > 0$ such that $\|Tf\|_{L^q(Y)} \le B \|f\|_{L^p(X)}$ for all simple functions $f$ of finite measure support.
• $T$ can be extended to an operator from $L^p(X)$ to $L^q(Y)$ such that $\|Tf\|_{L^q(Y)} \le B \|f\|_{L^p(X)}$ for all $f \in L^p(X)$ and some $B > 0$.
Show that the extension mentioned above is unique if $p$ is finite or if $X$ has finite measure. Finally, show that the same equivalences hold if $L^q(Y)$ is replaced by $L^{q,\infty}(Y)$ throughout.
for all simple functions $f$ of finite measure support, and all $t > 0$. Let us write $A \lesssim B$ to denote $A \le C_{p_0,p_1,q_0,q_1,\theta,B_0,B_1} B$ for some constant $C_{p_0,p_1,q_0,q_1,\theta,B_0,B_1}$ depending on the indicated parameters. By (1.84), it will suffice to show that
$$\int_0^\infty \lambda_{Tf}(t)\, t^{q_\theta}\, \frac{dt}{t} \lesssim \|f\|_{L^{p_\theta}(X)}^{q_\theta}.$$
By homogeneity we can normalise $\|f\|_{L^{p_\theta}(X)} = 1$.
Actually, it will be slightly more convenient to work with the dyadic version of the above estimate, namely
(1.95) $$\sum_{n \in \mathbf{Z}} \lambda_{Tf}(2^n)\, 2^{q_\theta n} \lesssim 1;$$
see Exercise 1.11.6. The hypothesis $\|f\|_{L^{p_\theta}(X)} = 1$ similarly implies that
(1.96) $$\sum_{m \in \mathbf{Z}} \lambda_f(2^m)\, 2^{p_\theta m} \lesssim 1.$$
The basic idea is then to get enough control on the numbers $\lambda_{Tf}(2^n)$ in terms of the numbers $\lambda_f(2^m)$ that one can deduce (1.95) from (1.96).
When $p_0 = p_1$, the claim follows from direct substitution of (1.91), (1.94) (see also the discussion in the previous section about interpolating strong $L^p$ bounds from weak ones), so let us assume $p_0 \neq p_1$. By symmetry we may take $p_0 < p_1$, and thus $p_0 < p_\theta < p_1$. In this case we cannot directly apply (1.91), (1.94) because we only control $f$ in $L^{p_\theta}$, not $L^{p_0}$ or $L^{p_1}$. To
get around this, we use the basic real interpolation trick of decomposing f
into pieces. There are two basic choices for what decomposition to pick. On
one hand, one could adopt a minimalistic approach and just decompose into
two pieces
$$f = f_{\ge s} + f_{<s},$$
where $f_{\ge s} := f 1_{|f| \ge s}$ and $f_{<s} := f 1_{|f| < s}$, and the threshold $s$ is a parameter
(depending on n) to be optimised later. Or we could adopt a maximalistic
approach and perform the dyadic decomposition
$$f = \sum_{m \in \mathbf{Z}} f_m,$$
where $f_m = f 1_{2^m \le |f| < 2^{m+1}}$. (Note that only finitely many of the $f_m$ are
non-zero, as we are assuming f to be a simple function.) We will adopt the
latter approach, in order to illustrate the dyadic decomposition method; the
former approach also works, but we leave it as an exercise to the interested
reader.
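The dyadic decomposition is easy to carry out explicitly for a simple function. In the Python sketch below, a simple function is modelled (as an assumption of ours, for illustration only) by a dictionary sending finitely many points to real values; `dyadic_index` and `dyadic_decomposition` are hypothetical helper names.

```python
import math
from collections import defaultdict


def dyadic_index(value):
    """Integer m with 2**m <= |value| < 2**(m + 1), for non-zero real value."""
    # math.frexp writes |value| = mantissa * 2**exponent with 0.5 <= mantissa < 1,
    # so 2**(exponent - 1) <= |value| < 2**exponent.
    _, exponent = math.frexp(abs(value))
    return exponent - 1


def dyadic_decomposition(f):
    """Split f (dict: point -> value) into pieces f_m = f * 1_{2^m <= |f| < 2^(m+1)}.

    Returns a dict m -> f_m, listing only the finitely many non-zero pieces;
    points where f vanishes belong to no piece.
    """
    pieces = defaultdict(dict)
    for point, value in f.items():
        if value != 0:
            pieces[dyadic_index(value)][point] = value
    return dict(pieces)


f = {"a": 0.3, "b": -5.0, "c": 4.0, "d": 0.0, "e": 1.0}
pieces = dyadic_decomposition(f)
# pieces == {-2: {"a": 0.3}, 2: {"b": -5.0, "c": 4.0}, 0: {"e": 1.0}}
```

The pieces have disjoint supports, so summing them recovers $f$, and on the support of $f_m$ the function is comparable to $2^m$, which is what makes the pieces easy to estimate in both $L^{p_0}$ and $L^{p_1}$.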
From sublinearity we have the pointwise estimate
$$|Tf| \le \sum_m |T f_m|,$$
for some $x_0 > 0 > x_1$ and some $\alpha \in \mathbf{R}$. We can then simplify the left-hand side of (1.97) to
$$\sum_n \min_{i=0,1} \left( c_{n,m}^{-1} 2^{(n\alpha q_\theta - m p_\theta) x_i} \right)^{q_i}.$$
Note that $q_0 x_0$ is positive and $q_1 x_1$ is negative. If we then pick $c_{n,m}$ to be a suitably normalised multiple of $2^{-|n\alpha q_\theta - m p_\theta| \min(|x_0|,|x_1|)/2}$ (say), we obtain the claim by summing geometric series.
Remark 1.11.12. A closer inspection of the proof (or a rescaling argument
to reduce to the normalised case B0 = B1 = 1, as in preceding sections)
reveals that one establishes the estimate
$$\|Tf\|_{L^{q_\theta}(Y)} \le C_{p_0,p_1,q_0,q_1,\theta,C} B_0^{1-\theta} B_1^{\theta} \|f\|_{L^{p_\theta}(X)}$$
for all simple functions $f$ of finite measure support (or for all $f \in L^{p_\theta}(X)$, if one works with the continuous extension of $T$ to such functions), and some constant $C_{p_0,p_1,q_0,q_1,\theta,C} > 0$. Thus the conclusion here is weaker by a multiplicative constant from that in the Riesz-Thorin theorem, but the hypotheses are weaker too (weak type instead of strong type). Indeed, we see that the constant $C_{p_0,p_1,q_0,q_1,\theta}$ must blow up as $\theta \to 0$ or $\theta \to 1$.
We thus see that the strong-type diagram of T contains the interior of the
restricted weak-type or weak-type diagrams of T , at least in the triangular
region $\{(1/p, 1/q) \in [0, +\infty)^2 : p \le q\}$.
Exercise 1.11.19. Suppose that T is a sublinear operator of restricted
weak-type (p0 , q0 ) and (p1 , q1 ) for some 0 < p0 , p1 , q0 , q1 ≤ ∞. Show that
T is of restricted weak-type (pθ , qθ ) for any 0 < θ < 1, or in other words
the restricted type diagram is convex in $[0, +\infty)^2$. (This is an easy result
requiring only interpolation of scalars.) Conclude that the hypotheses p0 ≤
q0 , p1 ≤ q1 in the Marcinkiewicz interpolation theorem can be replaced by
the variant pθ < qθ .
Exercise 1.11.26. Let $1 < p < \infty$, and let $f \in L^p(\mathbf{R}^n)$, $g \in L^{p'}(\mathbf{R}^n)$. Young’s inequality tells us that $f * g \in L^\infty(\mathbf{R}^n)$. Refine this further by showing that $f * g \in C_0(\mathbf{R}^n)$, i.e., $f * g$ is continuous and goes to zero at infinity. (Hint: First show this when $f, g \in C_c(\mathbf{R}^n)$, then use a limiting argument.)
We now give a variant of Schur’s test that allows for weak estimates.
Lemma 1.11.17 (Weak-type Schur’s test). Let $K : X \times Y \to \mathbf{C}$ be a measurable function obeying the bounds
$$\|K(x, \cdot)\|_{L^{q_0,\infty}(Y)} \le B_0$$
for almost every $x \in X$, and
$$\|K(\cdot, y)\|_{L^{p_1',\infty}(X)} \le B_1$$
for almost every $y \in Y$, where $1 < p_1, q_0 < \infty$ and $B_0, B_1 > 0$ (note the endpoint exponents $1, \infty$ are now excluded). Then for every $0 < \theta < 1$, $|T|$ and $T$ are of strong-type $(p_\theta, q_\theta)$, with $Tf(y)$ well defined for all $f \in L^{p_\theta}(X)$ and almost every $y \in Y$, and furthermore
$$\|Tf\|_{L^{q_\theta}(Y)} \le C_{p_1,q_0,\theta} B_\theta \|f\|_{L^{p_\theta}(X)}.$$
Here we again adopt the convention that $p_0 := 1$ and $q_1 := \infty$.
Recall that the function $1/|x|^\alpha$ will lie in $L^{n/\alpha,\infty}(\mathbf{R}^n)$ for $\alpha > 0$. We conclude
Corollary 1.11.18 (Hardy-Littlewood-Sobolev fractional integration inequality). Let $1 < p, r < \infty$ and $0 < \alpha < n$ be such that $\frac{1}{p} + \frac{\alpha}{n} = \frac{1}{r} + 1$. If $f \in L^p(\mathbf{R}^n)$, then the function $I_\alpha f$, defined as
$$I_\alpha f(x) := \int_{\mathbf{R}^n} \frac{f(y)}{|x - y|^\alpha}\, dy,$$
is well defined almost everywhere and lies in $L^r(\mathbf{R}^n)$, and furthermore
$$\|I_\alpha f\|_{L^r(\mathbf{R}^n)} \le C_{p,\alpha,n} \|f\|_{L^p(\mathbf{R}^n)}$$
for some constant $C_{p,\alpha,n} > 0$.
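The exponent relation in the corollary is a piece of bookkeeping that is easy to get wrong by hand. The following Python sketch solves $\frac{1}{p} + \frac{\alpha}{n} = \frac{1}{r} + 1$ for $r$ in exact rational arithmetic and rejects parameters outside the stated ranges; the function name `hls_target_exponent` is ours and purely illustrative.

```python
from fractions import Fraction


def hls_target_exponent(p, alpha, n):
    """Solve 1/p + alpha/n = 1/r + 1 for r.

    Returns r as a Fraction, or None when the hypotheses 1 < p, r < infinity
    and 0 < alpha < n of the corollary cannot all be met.
    """
    p, alpha, n = Fraction(p), Fraction(alpha), Fraction(n)
    if not (p > 1 and 0 < alpha < n):
        return None
    inv_r = 1 / p + alpha / n - 1
    if not (0 < inv_r < 1):
        # r would be infinite, negative, or not larger than 1
        return None
    return 1 / inv_r


r = hls_target_exponent(Fraction(3, 2), 2, 3)   # 1/r = 2/3 + 2/3 - 1 = 1/3, so r = 3
no_r = hls_target_exponent(2, 1, 3)             # 1/2 + 1/3 - 1 < 0, so no valid r
```

In particular, a valid $r$ exists only when $p < n/(n - \alpha)$, and in that case $r > p$.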
Section 1.12
The Fourier transform
In these notes we lay out the basic theory of the Fourier transform, which
is of course the most fundamental tool in harmonic analysis and is also of
major importance in related fields (functional analysis, complex analysis,
PDE, number theory, additive combinatorics, representation theory, signal
processing, etc.). The Fourier transform, in conjunction with the Fourier
inversion formula, allows one to take essentially arbitrary (complex-valued)
functions on a group G (or more generally, a space X that G acts on, e.g.,
a homogeneous space G/H) and decompose them as a (discrete or continu-
ous) superposition of much more symmetric functions on the domain, such
as characters $\chi : G \to S^1$. The precise superposition is given by Fourier coefficients $\hat{f}(\xi)$, which take values in some dual object such as the Pontryagin dual $\hat{G}$ of $G$. Characters behave in a very simple manner with respect to
translation (indeed, they are eigenfunctions of the translation action), and
so the Fourier transform tends to simplify any mathematical problem which
enjoys a translation invariance symmetry (or an approximation to such a
symmetry) and is somehow linear (i.e., it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution $f, g \mapsto f * g$ and set-theoretic addition $A, B \mapsto A + B$, or the closely related problem of counting solutions to additive problems such as $x = a_1 + a_2 + a_3$ or $x = a_1 - a_2$, where $a_1, a_2, a_3$ are constrained to lie in specific sets $A_1, A_2, A_3$. The Fourier transform is
also a particularly powerful tool for solving constant-coefficient linear ODE
and PDE (because of the translation invariance), and it can also approxi-
mately solve some variable-coefficient (or slightly non-linear) equations if the
coefficients vary smoothly enough and the nonlinear terms are sufficiently
tame.
The Fourier transform $\hat{f}(\xi)$ also provides an important new way of looking at a function $f(x)$, as it highlights the distribution of $f$ in frequency space (the domain of the frequency variable $\xi$) rather than physical space (the domain of the physical variable $x$). A given property of $f$ in the physical domain may be transformed to a rather different-looking property of $\hat{f}$ in the frequency domain. For instance:
• Smoothness of $f$ in the physical domain corresponds to decay of $\hat{f}$ in the Fourier domain, and conversely. (More generally, fine scale properties of $f$ tend to manifest themselves as coarse scale properties of $\hat{f}$, and conversely.)
• Convolution in the physical domain corresponds to pointwise multi-
plication in the Fourier domain, and conversely.
• Constant coefficient differential operators such as d/dx in the physical
domain correspond to multiplication by polynomials such as 2πiξ in
the Fourier domain, and conversely.
• More generally, translation invariant operators in the physical domain
correspond to multiplication by symbols in the Fourier domain, and
conversely.
• Rescaling in the physical domain by an invertible linear transfor-
mation corresponds to an inverse (adjoint) rescaling in the Fourier
domain.
• Restriction to a subspace (or subgroup) in the physical domain cor-
responds to projection to the dual quotient space (or quotient group)
in the Fourier domain, and conversely.
• Frequency modulation in the physical domain corresponds to trans-
lation in the frequency domain, and conversely.
(We will make these statements more precise below.)
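Two of the correspondences above can be tested directly on the simplest LCA group, the cyclic group $\mathbf{Z}/N\mathbf{Z}$. The Python sketch below assumes counting measure as Haar measure and the convention $\hat{f}(\xi) = \sum_x f(x) e^{-2\pi i x \xi / N}$ (a normalisation we choose for illustration); the function names are ours.

```python
import cmath


def dft(f):
    """Fourier transform on Z/NZ: f_hat(xi) = sum_x f(x) exp(-2 pi i x xi / N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n))
            for xi in range(n)]


def convolve(f, g):
    """Cyclic convolution (f * g)(x) = sum_y f(y) g(x - y), indices taken mod N."""
    n = len(f)
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) for x in range(n)]


def modulate(f, xi0):
    """Frequency modulation x -> exp(2 pi i xi0 x / N) f(x)."""
    n = len(f)
    return [f[x] * cmath.exp(2j * cmath.pi * xi0 * x / n) for x in range(n)]


def max_difference(u, v):
    """Largest entrywise distance between two equal-length sequences."""
    return max(abs(a - b) for a, b in zip(u, v))


f = [1, 2, 0, -1, 3]
g = [0.5, -1, 2, 0, 1]
f_hat, g_hat = dft(f), dft(g)

# Convolution in the physical domain versus pointwise product in frequency.
convolution_error = max_difference(dft(convolve(f, g)),
                                   [a * b for a, b in zip(f_hat, g_hat)])

# Modulation by xi0 = 2 versus translation of f_hat by 2.
modulation_error = max_difference(dft(modulate(f, 2)),
                                  [f_hat[(xi - 2) % 5] for xi in range(5)])
```

Both errors are at the level of floating-point rounding, as the exact identities predict.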
On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the $L^2$ norm (or energy) of a function $f$ is the same as that of its Fourier transform, and more generally the inner product $\langle f, g \rangle$ of two functions $f, g$ is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on $L^2$ (a fact which is variously known as the Plancherel theorem or
the Parseval identity). This makes it easier to pass back and forth between
the physical domain and frequency domain, so that one can combine tech-
niques that are easy to execute in the physical domain with other techniques
that are easy to execute in the frequency domain. (In fact, one can combine
the physical and frequency domains together into a product domain known
as phase space, and there are entire fields of mathematics (e.g., microlocal
translation operation
$$\tau_x f(y) := f(y - x).$$
LCA groups need not be σ-compact (think of the free abelian group on
uncountably many generators, with the discrete topology), but one has the
following useful substitute:
Exercise 1.12.1. Show that every LCA group G contains a σ-compact
open subgroup H, and in particular is the disjoint union of σ-compact sets.
(Hint: Take a compact symmetric neighbourhood K of the identity, and
consider the group H generated by this neighbourhood.)
(1.99) $$\int_G \tau_x f\, d\mu = \int_G f\, d\mu$$
for all f ∈ Cc (G) and x ∈ G. The trivial measure 0 is of course a Haar
measure; all other Haar measures are called non-trivial.
Let us note some non-trivial Haar measures in the four basic examples
of locally compact abelian groups:
• For a finite additive group $G$, one can take either counting measure $\#$ or normalised counting measure $\#/\#(G)$ as a Haar measure. (The former measure emphasises the discrete nature of $G$; the latter measure emphasises the compact nature of $G$.)
• For finitely generated additive groups such as $\mathbf{Z}^d$, counting measure $\#$ is a Haar measure.
• For the standard torus $(\mathbf{R}/\mathbf{Z})^d$, one can obtain a Haar measure by identifying this torus with $[0, 1)^d$ in the usual manner and then taking Lebesgue measure on the latter space. This Haar measure is a probability measure.
• For the standard Euclidean space $\mathbf{R}^d$, Lebesgue measure is a Haar measure.
Of course, any non-negative constant multiple of a Haar measure is again
a Haar measure. The converse is also true:
which is compact by Tychonoff’s theorem. Now use (d) and the finite
intersection property.)
(The argument can be adapted to the case when G is not metrisable, but
one has to replace the sequential compactness given by Prokhorov’s theorem
with the topological compactness given by the Banach-Alaoglu theorem.)
The Pontryagin dual can be computed easily for various classical LCA
groups:
for any $f \in L^1(G)$ and $\xi_0, \xi \in \hat{G}$, where $\chi_{\xi_0}$ is the multiplicative character $\chi_{\xi_0} : x \mapsto e^{2\pi i \xi_0 \cdot x}$.
Exercise 1.12.11 (Riemann-Lebesgue lemma). If $f \in L^1(G)$, show that $\hat{f} : \hat{G} \to \mathbf{C}$ is continuous. Furthermore, show that $\hat{f}$ goes to zero at infinity in the sense that for every $\varepsilon > 0$ there exists a compact subset $K$ of $\hat{G}$ such that $|\hat{f}(\xi)| \le \varepsilon$ for $\xi \notin K$. (Hint: First show that there exists a neighbourhood $U$ of the identity in $G$ such that $\|\tau_x f - f\|_{L^1(G)} \le \varepsilon^2$ (say) for all $x \in U$. Now take the Fourier transform of this fact.) Thus the Fourier transform maps $L^1(G)$ continuously to $C_0(\hat{G})$, the space of continuous functions on $\hat{G}$ which go to zero at infinity; the decay at infinity is known as the Riemann-Lebesgue lemma.
Exercise 1.12.12. Let $G$ be an LCA group with non-trivial Haar measure $\mu$. Show that the topology of $\hat{G}$ is the weakest topology such that $\hat{f}$ is continuous for every $f \in L^1(G)$.
(1.102) $$\widehat{f * g}(\xi) = \hat{f}(\xi) \hat{g}(\xi)$$
for all $\xi \in \hat{G}$; thus the Fourier transform converts convolution to a pointwise product.
Exercise 1.12.14. Let $G, H$ be LCA groups with non-trivial Haar measures $\mu, \nu$, respectively, and let $f \in L^1(G)$, $g \in L^1(H)$. Show that the tensor product $f \otimes g \in L^1(G \times H)$ (with product Haar measure $\mu \times \nu$) has a Fourier transform of $\hat{f} \otimes \hat{g}$, where we identify $\widehat{G \times H}$ with $\hat{G} \times \hat{H}$ as per Exercise 1.12.9(f). Informally, this exercise asserts that the Fourier transform commutes with tensor products. (Because of this fact, the tensor power trick (see Section 1.9 of Structure and Randomness) is often available when proving results about the Fourier transform on general groups.)
Exercise 1.12.15 (Convolution and Fourier transform of measures). If $\nu \in M(G)$ is a finite Radon measure on an LCA group $G$ with non-trivial Haar measure $\mu$, define the Fourier-Stieltjes transform $\hat{\nu} : \hat{G} \to \mathbf{C}$ by the formula $\hat{\nu}(\xi) := \int_G e^{-2\pi i \xi \cdot x}\, d\nu(x)$ (thus for instance $\widehat{\mu_f} = \hat{f}$ for any $f \in L^1(G)$). Show that $\hat{\nu}$ is a bounded continuous function on $\hat{G}$. Given any $f \in L^1(G)$, define the convolution $f * \nu : G \to \mathbf{C}$ to be the function
$$f * \nu(x) := \int_G f(x - y)\, d\nu(y),$$
and given any finite Radon measure $\rho$, let $\nu * \rho$ be the measure
$$\nu * \rho(E) := \int_G \int_G 1_E(x + y)\, d\nu(x)\, d\rho(y).$$
Show that $f * \nu \in L^1(G)$ and $\widehat{f * \nu}(\xi) = \hat{f}(\xi) \hat{\nu}(\xi)$ for all $\xi \in \hat{G}$, and similarly that $\nu * \rho$ is a finite measure and $\widehat{\nu * \rho}(\xi) = \hat{\nu}(\xi) \hat{\rho}(\xi)$ for all $\xi \in \hat{G}$. Thus the convolution and Fourier structure on $L^1(G)$ can be extended to the larger space $M(G)$ of finite Radon measures.
The full proof of this theorem requires the spectral theorem and is not
given here, though see Exercise 1.12.43 below. However, we can work out
some important special cases here.
• When $G$ is a torus $G = \mathbf{T}^d = (\mathbf{R}/\mathbf{Z})^d$, the multiplicative characters $x \mapsto e^{2\pi i \xi \cdot x}$ separate points (given any two distinct $x, y \in G$, there exists a character which takes different values at $x$ and at $y$). The space of
• Parseval identity, II. For any $f, g \in L^2(G)$, we have $\langle f, g \rangle_{L^2(G)} = \sum_{\xi \in \hat{G}} \hat{f}(\xi) \overline{\hat{g}(\xi)}$.
• Unitarity. Thus the Fourier transform is a unitary transformation from $L^2(G)$ to $\ell^2(\hat{G})$.
• Inversion formula. For any $f \in L^2(G)$, the series $x \mapsto \sum_{\xi \in \hat{G}} \hat{f}(\xi) e^{2\pi i \xi \cdot x}$ converges unconditionally in $L^2(G)$ to $f$.
• Inversion formula, II. For any sequence $(c_\xi)_{\xi \in \hat{G}}$ in $\ell^2(\hat{G})$, the series $x \mapsto \sum_{\xi \in \hat{G}} c_\xi e^{2\pi i \xi \cdot x}$ converges unconditionally in $L^2(G)$ to a function $f$ with $c_\xi$ as its Fourier coefficients.
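The Parseval identity and the inversion formula can be checked by hand on the simplest compact abelian group, the finite cyclic group $\mathbf{Z}/N\mathbf{Z}$. The Python sketch below assumes normalised counting measure as the Haar probability measure, so that $\hat{f}(\xi) = \frac{1}{N} \sum_x f(x) e^{-2\pi i x \xi / N}$ and $\langle f, g \rangle_{L^2(G)} = \frac{1}{N} \sum_x f(x) \overline{g(x)}$; these normalisations and the function names are our choices for illustration.

```python
import cmath


def fourier_coefficients(f):
    """f_hat(xi) = (1/N) sum_x f(x) exp(-2 pi i x xi / N) on Z/NZ."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
            for xi in range(n)]


def inner_product_physical(f, g):
    """<f, g> in L^2(G) with respect to normalised counting measure."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)


def inner_product_frequency(f_hat, g_hat):
    """<f_hat, g_hat> in l^2 of the dual group (counting measure)."""
    return sum(a * b.conjugate() for a, b in zip(f_hat, g_hat))


def invert(f_hat):
    """Inversion formula: f(x) = sum_xi f_hat(xi) exp(2 pi i x xi / N)."""
    n = len(f_hat)
    return [sum(f_hat[xi] * cmath.exp(2j * cmath.pi * x * xi / n) for xi in range(n))
            for x in range(n)]


f = [1 + 2j, 0.5, -1j, 3, 2 - 1j, 0]
g = [2, 1j, 1, -1, 0.5j, 4]
f_hat, g_hat = fourier_coefficients(f), fourier_coefficients(g)

parseval_error = abs(inner_product_physical(f, g) - inner_product_frequency(f_hat, g_hat))
inversion_error = max(abs(a - b) for a, b in zip(invert(f_hat), f))
```

Here the series in the inversion formula is a finite sum, so convergence is not an issue; the point is only to confirm the normalisations and the complex conjugate in the Parseval identity.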
One of the reasons why the Schwartz space is convenient to work with
is that it is closed under a wide variety of operations. For instance, the
derivative of a Schwartz function is again a Schwartz function, and that
is $\hat{g}_r(\xi) = e^{-\pi r^2 |\xi|^2}$. (Hint: Reduce to the case $d = 1$ and $r = 1$, then complete the square and use contour integration and the classical identity $\int_{-\infty}^\infty e^{-\pi x^2}\, dx = 1$.) Conclude that $\mathcal{F}^* \mathcal{F} g_r = g_r$.
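The formula for the Fourier transform of a Gaussian can be confirmed numerically in dimension $d = 1$. The sketch below assumes the normalisation $g_r(x) = r^{-1} e^{-\pi x^2 / r^2}$ (the definition of $g_r$ is not visible in this passage, so this is our assumption, chosen to be consistent with the stated transform) and approximates the Fourier integral $\int f(x) e^{-2\pi i x \xi}\, dx$ by a Riemann sum; the function names are ours.

```python
import cmath
import math


def gaussian(x, r=1.0):
    """Assumed normalisation g_r(x) = exp(-pi x^2 / r^2) / r in dimension one."""
    return math.exp(-math.pi * x * x / (r * r)) / r


def fourier_transform_numeric(func, xi, half_width=10.0, step=0.01):
    """Riemann-sum approximation of the integral of func(x) exp(-2 pi i x xi)
    over the interval [-half_width, half_width]."""
    count = int(round(2 * half_width / step))
    total = 0j
    for k in range(count + 1):
        x = -half_width + k * step
        total += func(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * step


# Compare with the claimed closed form exp(-pi r^2 xi^2) at a few frequencies.
errors = []
for r in (1.0, 0.5):
    for xi in (0.0, 0.5, 1.5):
        approx = fourier_transform_numeric(lambda x, r=r: gaussian(x, r), xi)
        errors.append(abs(approx - math.exp(-math.pi * r * r * xi * xi)))
```

For rapidly decaying smooth functions such as Gaussians, equally spaced Riemann sums are extremely accurate, so the discrepancies here are dominated by rounding error.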
These wave packets are normalised to have L2 norm one, and their Fourier
transform is given by
where the error terms are morally of the form $O(R g_{x_0,\xi_0,R})$ and $O(R^{-1} g_{x_0,\xi_0,R})$ respectively. Of course, the non-commutativity of $D$ and $X$ as evidenced by the last equation in (1.109) shows that exact diagonalisation is impossible. Nevertheless it is useful, at an intuitive level at least, to view these wave packets as a sort of (overdetermined) basis for $L^2(\mathbf{R})$ that
approximately diagonalises X and D (as well as other formal combinations
a(X, D) of these operators, such as differential operators or pseudodifferen-
tial operators). Meanwhile, the Fourier transform morally maps the point
(x0 , ξ0 ) in phase space to (ξ0 , −x0 ), as evidenced by (1.110) or (1.109); it is
the model example of the more general class of Fourier integral operators,
which morally move points in phase space around by canonical transfor-
mations. The study of these types of objects (which are of importance in
linear PDE) is known as microlocal analysis, and is beyond the scope of this
course.
all ξ ∈ R .
r
Exercise 1.12.40 (Fourier transform on large tori). Let $L > 0$, and let $(\mathbf{R}/L\mathbf{Z})^d$ be the torus of length $L$ with Lebesgue measure $dx$ (thus the total measure of this torus is $L^d$). We identify the Pontryagin dual of this torus with $\frac{1}{L} \cdot \mathbf{Z}^d$ in the usual manner, thus we have the Fourier coefficients
$$\hat{f}(\xi) := \int_{(\mathbf{R}/L\mathbf{Z})^d} f(x) e^{-2\pi i \xi \cdot x}\, dx$$
Fourier transform $\hat{F}(\xi) = \hat{f}(\xi)$ for all $\xi \in \mathbf{Z}^d \subset \mathbf{R}^d$ (note the two different Fourier transforms in play here). Conclude the Poisson summation formula
$$\sum_{n \in \mathbf{Z}^d} f(n) = \sum_{m \in \mathbf{Z}^d} \hat{f}(m).$$
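The Poisson summation formula can be tested numerically on Gaussians in dimension $d = 1$. The Python sketch below assumes the pair $g_r(x) = r^{-1} e^{-\pi x^2 / r^2}$ and $\hat{g}_r(\xi) = e^{-\pi r^2 \xi^2}$ (the normalisation of $g_r$ is our assumption, chosen so that the transform has this form), and truncates both lattice sums at a cutoff beyond which the terms are negligible; the function names are ours.

```python
import math


def gaussian(x, r):
    """Assumed normalisation g_r(x) = exp(-pi x^2 / r^2) / r in dimension one."""
    return math.exp(-math.pi * x * x / (r * r)) / r


def gaussian_hat(xi, r):
    """Fourier transform of g_r under the convention exp(-2 pi i x xi)."""
    return math.exp(-math.pi * r * r * xi * xi)


def lattice_sum(func, cutoff=60):
    """Sum of func(n) over the integers n with |n| <= cutoff."""
    return sum(func(n) for n in range(-cutoff, cutoff + 1))


# Physical-side and frequency-side sums for a few widths r.
pairs = [(lattice_sum(lambda n: gaussian(n, r)), lattice_sum(lambda m: gaussian_hat(m, r)))
         for r in (0.5, 1.0, 2.0)]
```

For small $r$ the physical-side sum is dominated by the single term $n = 0$ while the frequency-side sum has many significant terms, and the roles reverse for large $r$; the two sums nevertheless agree to rounding error.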
(e) Show that if $f \in L^1(G)$ is not identically zero, then there exists $\xi \in \hat{G}$ such that $\hat{f}(\xi) \neq 0$. (Hint: First find $g \in L^2(G)$ such that $f * g * g^*(0) \neq 0$ and $g * g^*(0) \neq 0$, and conclude using (d) repeatedly that $\liminf_{n \to \infty} \|(f * f^*)^{*n}\|_{L^1(G)}^{1/n} > 0$. Then use (a), (b),
Again, see Rudin [Ru1962] for details. A related result is that of Pontryagin duality: if $\hat{G}$ is the Pontryagin dual of an LCA group $G$, then $G$ is the Pontryagin dual of $\hat{G}$. (Certainly, every element $x \in G$ defines a character $\hat{x} : \xi \mapsto \xi \cdot x$ on $\hat{G}$, thus embedding $G$ into $\hat{\hat{G}}$ via the Gelfand transform (see Section 1.10.4). The non-trivial fact is that this embedding is in fact
surjective.) One can use Pontryagin duality to convert various properties
of LCA groups into other properties on LCA groups. For instance, we have
already seen that Ĝ is compact (resp. discrete) if G is discrete (resp. com-
pact); with Pontryagin duality, the implications can now also be reversed.
As another example, one can show that Ĝ is connected (resp. torsion-free)
if and only if G is torsion-free (resp. connected). We will not prove these
assertions here.
It is natural to ask what happens for non-abelian locally compact groups
G = (G, ·). One can still build non-trivial Haar measures (the proof sketched
out in Exercise 1.12.7 extends without difficulty to the non-abelian setting),
though one must now distinguish between left-invariant and right-invariant
Haar measures. (The two notions are equivalent for some classes of groups,
notably compact groups, but not in general. Groups for which the two
notions of Haar measures coincide are called unimodular.) However, when
$G$ is non-abelian then there are not enough multiplicative characters $\chi : G \to S^1$ to have a satisfactory Fourier analysis. (Indeed, such characters
must annihilate the commutator group [G, G], and it is entirely possible for
this commutator group to be all of G, e.g., if G is simple and non-abelian.)
Instead, one must generalise the notion of a multiplicative character to that
of a unitary representation ρ : G → U (H) from G to the group of unitary
transformations on a complex Hilbert space H; thus the Fourier coefficients
$\hat{f}(\rho)$ of a function will now be operators on this Hilbert space $H$, rather than
complex numbers. When G is a compact group, it turns out to be possible
to restrict our attention to finite-dimensional representations (thus one can
replace U (H) by the matrix group U (n) for some n). The analogue of the
Pontryagin dual Ĝ is then the collection of (irreducible) finite-dimensional
unitary representations of G, up to isomorphism. There is an analogue of
the Plancherel theorem in this setting, closely related to the Peter-Weyl
theorem in representation theory. We will not discuss these topics here, but
refer the reader instead to any representation theory text.
The situation for non-compact non-abelian groups (e.g., $SL_2(\mathbf{R})$) is
significantly more subtle, as one must now consider infinite-dimensional
representations as well as finite-dimensional ones, and the inversion formula can
become quite non-trivial (one has to decide what weight each representation
$$Rf(\omega, t) := \int_{x \cdot \omega = t} f,$$
Section 1.13
Distributions
Schwartz class $\mathcal{S}(\mathbf{R}^d)$), then one obtains closure under another important
operation, namely the Fourier transform. This allows one to define vari-
ous Fourier-analytic operations (e.g., pseudodifferential operators) on such
distributions.
Of course, at the end of the day, one is usually not all that interested in
distributions in their own right, but would like to be able to use them as a
tool to study more classical objects, such as smooth functions. Fortunately,
one can recover facts about smooth functions from facts about the (far
rougher) space of distributions in a number of ways. For instance, if one
convolves a distribution with a smooth, compactly supported function, one
gets back a smooth function. This is a particularly useful fact in the theory
of constant-coefficient linear partial differential equations such as Lu = f ,
as it allows one to recover a smooth solution u from smooth, compactly
supported data f by convolving f with a specific distribution G, known as
the fundamental solution of L. We will give some examples of this later in
this section.
It is this unusual and useful combination of both being able to pass
from classical functions to generalised functions (e.g., by differentiation)
and then back from generalised functions to classical functions (e.g., by
convolution) that sets the theory of distributions apart from other competing
theories of generalised functions, in particular allowing one to justify many
formal calculations in PDE and Fourier analysis rigorously with relatively
little additional effort. On the other hand, being defined by linear duality,
the theory of distributions becomes somewhat less useful when one moves
to more nonlinear problems, such as nonlinear PDE. However, they still
serve an important supporting role in such problems as an ambient space of
functions, inside of which one carves out more useful function spaces, such
as Sobolev spaces, which we will discuss in Section 1.14.
Exercise 1.13.1.
(i) Show that there exists at least one test function that is not identically
zero. (Hint: It suffices to do this for $d = 1$. One starting point is to
use the fact that the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := e^{-1/x}$
for $x > 0$ and $f(x) := 0$ otherwise is smooth, even at the origin 0.)
(ii) Show that if $f \in C_c^\infty(\mathbf{R}^d)$ and $g : \mathbf{R}^d \to \mathbf{R}$ is absolutely
integrable and compactly supported, then the convolution $f * g$ is also in
$C_c^\infty(\mathbf{R}^d)$. (Hint: First show that $f * g$ is continuously differentiable
with $\nabla(f * g) = (\nabla f) * g$.)
(iii) $C^\infty$ Urysohn lemma. Let $K$ be a compact subset of $\mathbf{R}^d$, and let $U$
be an open neighbourhood of $K$. Show that there exists a function
$f \in C_c^\infty(\mathbf{R}^d)$ supported in $U$ which equals 1 on $K$. (Hint: Use the
ordinary Urysohn lemma to find a function in $C_c(\mathbf{R}^d)$ that equals 1
on a neighbourhood of $K$ and is supported in a compact subset of $U$,
then convolve this function by a suitable test function.)
(iv) Show that $C_c^\infty(\mathbf{R}^d)$ is dense in $C_0(\mathbf{R}^d)$ (in the uniform topology),
and dense in $L^p(\mathbf{R}^d)$ (with the $L^p$ topology) for all $0 < p < \infty$.
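The hint in part (i) can be illustrated numerically. The following is a minimal sketch, assuming one particular way of combining two copies of $e^{-1/x}$ into a bump on $[-1, 1]$; the function names `smooth_seed` and `bump` are illustrative and do not appear in the text.

```python
import math

def smooth_seed(x):
    """The function from the hint: exp(-1/x) for x > 0, and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(x):
    """One possible test function on R: the product smooth_seed(1 + x) * smooth_seed(1 - x).

    It is a product of smooth functions, vanishes outside (-1, 1),
    and is strictly positive inside (-1, 1).
    """
    return smooth_seed(1.0 + x) * smooth_seed(1.0 - x)

if __name__ == "__main__":
    for x in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
        print(f"bump({x:+.1f}) = {bump(x):.6f}")
```

This only exhibits a candidate; proving that `smooth_seed` is smooth at the origin is still the content of the exercise.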
$f$ if and only if there exists a compact set $K$ such that $f_n$, $f$ are all supported
in $K$, and $f_n$ converges to $f$ in the smooth topology of $C_c^\infty(K)$.
Exercise 1.13.3.
(i) Show that the topology of $C_c^\infty(K)$ is first countable for every compact
$K$.
(ii) Show that the topology of $C_c^\infty(\mathbf{R}^d)$ is not first countable. (Hint:
Given any countable sequence of open neighbourhoods of 0, build a
new open neighbourhood that does not contain any of the previous
ones, using the σ-compact nature of $\mathbf{R}^d$.)
(iii) Despite this, show that an element $f \in C_c^\infty(\mathbf{R}^d)$ is an adherent point
of a set $E \subset C_c^\infty(\mathbf{R}^d)$ if and only if there is a sequence $f_n \in E$ that
converges to $f$. (Hint: Argue by contradiction.) Conclude in particular
that a subset of $C_c^\infty(\mathbf{R}^d)$ is closed if and only if it is sequentially
closed. Thus while first countability fails for $C_c^\infty(\mathbf{R}^d)$, we have a
serviceable substitute for this property.
Exercise 1.13.4.
(i) Let $K$ be a compact set. Show that a linear map $T : C_c^\infty(K) \to X$
into a normed vector space $X$ is continuous if and only if there exist
$k \ge 0$ and $C > 0$ such that $\|Tf\|_X \le C\|f\|_{C^k}$ for all $f \in C_c^\infty(K)$.
(ii) Let $K, K'$ be compact sets. Show that a linear map $T : C_c^\infty(K) \to
C_c^\infty(K')$ is continuous if and only if for every $k \ge 0$ there exist
$k' \ge 0$ and a constant $C_k > 0$ such that $\|Tf\|_{C^k} \le C_k\|f\|_{C^{k'}}$ for all
$f \in C_c^\infty(K)$.
(iii) Show that a map $T : C_c^\infty(\mathbf{R}^d) \to X$ to a topological space is
continuous if and only if for every compact set $K \subset \mathbf{R}^d$, $T$ maps $C_c^\infty(K)$
continuously to $X$.
(iv) Show that the inclusion map from $C_c^\infty(\mathbf{R}^d)$ to $L^p(\mathbf{R}^d)$ is continuous
for every $0 < p \le \infty$.
(v) Show that a map $T : C_c^\infty(\mathbf{R}^d) \to C_c^\infty(\mathbf{R}^d)$ is continuous if and only
if for every compact set $K \subset \mathbf{R}^d$ there exists a compact set $K'$ such
that $T$ maps $C_c^\infty(K)$ continuously to $C_c^\infty(K')$.
(vi) Show that every linear differential operator with smooth coefficients
is a continuous operation on $C_c^\infty(\mathbf{R}^d)$.
(vii) Show that convolution with any absolutely integrable, compactly
supported function is a continuous operation on $C_c^\infty(\mathbf{R}^d)$.
(viii) Show that $C_c^\infty(\mathbf{R}^d)$ is a topological vector space.
(ix) Show that the product operation $(f, g) \mapsto fg$ is continuous from
$C_c^\infty(\mathbf{R}^d) \times C_c^\infty(\mathbf{R}^d)$ to $C_c^\infty(\mathbf{R}^d)$.
A sequence $\phi_n \in C_c(\mathbf{R}^d)$ of continuous, compactly supported functions
is said to be an approximation to the identity if the $\phi_n$ are non-negative,
have total mass $\int_{\mathbf{R}^d} \phi_n$ equal to 1, and converge uniformly to zero away
from the origin; thus, $\sup_{|x| \ge r} |\phi_n(x)| \to 0$ for all $r > 0$. One can generate
such a sequence by starting with a single non-negative continuous compactly
supported function $\phi$ of total mass 1, and then setting $\phi_n(x) := n^d \phi(nx)$;
many other constructions are possible also.
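To see the rescaling $\phi_n(x) := n^d \phi(nx)$ at work, here is a small numerical sketch in dimension $d = 1$. The tent function used for $\phi$, the midpoint-rule quadrature, and the names `tent`, `phi_n` and `mollify` are all assumptions made for this illustration.

```python
def tent(x):
    """A non-negative, continuous, compactly supported function of total mass 1."""
    return max(0.0, 1.0 - abs(x))

def phi_n(n, x):
    """The rescaled function n^d * phi(n x) with d = 1 and phi = tent."""
    return n * tent(n * x)

def mollify(f, n, x, steps=400):
    """Midpoint-rule approximation of (f * phi_n)(x).

    phi_n is supported in [-1/n, 1/n], so only that interval is sampled.
    """
    h = 2.0 / (n * steps)
    total = 0.0
    for i in range(steps):
        y = -1.0 / n + (i + 0.5) * h
        total += f(x - y) * phi_n(n, y) * h
    return total

if __name__ == "__main__":
    # f(x) = |x| is continuous with a kink at 0; the error at 0 shrinks like 1/(3n).
    for n in (1, 10, 100):
        print(n, mollify(abs, n, 0.0))
```

For this choice of $\phi$ one can compute by hand that $(|\cdot| * \phi_n)(0) = 1/(3n)$, which tends to $|0| = 0$ as $n \to \infty$.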
One has the following useful fact:
Exercise 1.13.5. Let $\phi_n \in C_c^\infty(\mathbf{R}^d)$ be a sequence of approximations to
the identity.
(i) If $f \in C(\mathbf{R}^d)$ is continuous, show that $f * \phi_n$ converges uniformly on
compact sets to $f$.
(ii) If $f \in L^p(\mathbf{R}^d)$ for some $1 \le p < \infty$, show that $f * \phi_n$ converges in
$L^p(\mathbf{R}^d)$ to $f$. (Hint: Use (i), the density of $C_0(\mathbf{R}^d)$ in $L^p(\mathbf{R}^d)$, and
Young’s inequality, Exercise 1.11.25.)
(iii) If $f \in C_c^\infty(\mathbf{R}^d)$, show that $f * \phi_n$ converges in $C_c^\infty(\mathbf{R}^d)$ to $f$. (Hint:
Use the identity $\nabla(f * \phi_n) = (\nabla f) * \phi_n$, cf. Exercise 1.13.1(ii).)
Exercise 1.13.6. Show that $C_c^\infty(\mathbf{R}^d)$ is separable. (Hint: It suffices to show
that $C_c^\infty(K)$ is separable for each compact $K$. There are several ways to
accomplish this. One is to begin with the Stone-Weierstrass theorem, which
will give a countable set which is dense in the uniform topology, then use the
fundamental theorem of calculus to strengthen the topology. Another is to
use Exercise 1.13.5 and then discretise the convolution. Another is to embed
$K$ into a torus and use Fourier series, noting that the Fourier coefficients $\hat{f}$
of a smooth function $f : \mathbf{T}^d \to \mathbf{C}$ decay faster than any power of $|n|$.)
1.13.2. Distributions. Now we can define the concept of a distribution.
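Definition 1.13.1 (Distribution). A distribution on $\mathbf{R}^d$ is a continuous linear
functional $\lambda : f \mapsto \langle f, \lambda \rangle$ from $C_c^\infty(\mathbf{R}^d)$ to $\mathbf{C}$. The space of such
distributions is denoted $C_c^\infty(\mathbf{R}^d)^*$, and is given the weak-* topology. In particular,
a sequence of distributions $\lambda_n$ converges (in the sense of distributions) to a
limit $\lambda$ if one has $\langle f, \lambda_n \rangle \to \langle f, \lambda \rangle$ for all $f \in C_c^\infty(\mathbf{R}^d)$.
A technical point: We endow the space $C_c^\infty(\mathbf{R}^d)^*$ with the conjugate
complex structure. Thus, if $\lambda \in C_c^\infty(\mathbf{R}^d)^*$ and $c$ is a complex number,
then $c\lambda$ is the distribution that maps a test function $f$ to $\overline{c}\langle f, \lambda \rangle$ rather
than $c\langle f, \lambda \rangle$; thus $\langle f, c\lambda \rangle = \overline{c}\langle f, \lambda \rangle$. This is to keep the analogy between
the evaluation of a distribution against a function and the usual Hermitian
inner product $\langle f, g \rangle = \int_{\mathbf{R}^d} f \overline{g}$ of two test functions.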
From the above exercise, we may view locally integrable functions and
locally finite measures as a special type of distribution. In particular, $C_c^\infty(\mathbf{R}^d)$
and $L^p(\mathbf{R}^d)$ are now contained in $C_c^\infty(\mathbf{R}^d)^*$ for all $1 \le p \le \infty$.
Exercise 1.13.9. Show that if a sequence of locally integrable functions
converges in $L^1_{\mathrm{loc}}$ to a limit, then it also converges in the sense of
distributions; similarly, if a sequence of complex Radon measures converges in the
vague topology to a limit, then it also converges in the sense of
distributions.
for all test functions f . It is easy to see (e.g., using Exercise 1.13.4(vi)) that
this defines a distribution λh, and that this operation is compatible with
existing definitions of products between a locally integrable function (or
Radon measure) with a smooth function. It is important that h is smooth
(and not merely, say, continuous) because one needs the product of a test
function f with h to still be a test function.
Exercise 1.13.16. Let $d = 1$. Establish the identity
$$\delta f = f(0)\delta$$
for any smooth function $f$. In particular,
$$\delta x = 0,$$
where we abuse notation slightly and write $x$ for the identity function $x \mapsto x$.
Conversely, if $\lambda$ is a distribution such that
$$\lambda x = 0,$$
show that $\lambda$ is a constant multiple of $\delta$. (Hint: Use the identity
$f(x) = f(0) + x \int_0^1 f'(tx)\,dt$ to write $f(x)$ as the sum of $f(0)\psi$ and $x$ times a test
function for any test function $f$, where $\psi$ is a fixed test function equalling 1
at the origin.)
Remark 1.13.2. Even though distributions are not, strictly speaking, func-
tions, it is often useful heuristically to view them as such. Thus, for instance,
one might write a distributional identity such as $\delta x = 0$ suggestively as
$\delta(x)x = 0$. Another useful (and rigorous) way to view such identities is to
write distributions such as $\delta$ as a limit of approximations to the identity
$\psi_n$, and show that the relevant identity becomes true in the limit; thus, for
instance, to show that $\delta x = 0$, one can show that $\psi_n x \to 0$ in the sense of
distributions as $n \to \infty$. (In fact, $\psi_n x$ converges to zero in the $L^1$ norm.)
Exercise 1.13.17. Let $d = 1$. With the distribution $\mathrm{p.v.}\,\frac{1}{x}$ from Exercise
1.13.12, show that $(\mathrm{p.v.}\,\frac{1}{x})x$ is equal to 1. With the distributions $\lambda_r$ from
Exercise 1.13.13, show that $\lambda_r x = \operatorname{sgn}$, where sgn is the signum function.
Proof. If $\lambda$ were itself a smooth function, then one could easily verify the
identity
$$\lambda * h(x) = \langle h_x, \lambda \rangle, \tag{1.115}$$
where $h_x(y) := h(x - y)$. As $h$ is a test function, it is easy to see that $h_x$
varies smoothly in $x$ in any $C^k$ norm (indeed, it has Taylor expansions to
any order in such norms), and so the right-hand side is a smooth function of
$x$. So it suffices to verify the identity (1.115). As distributions are defined
against test functions $f$, it suffices to show that
So the only issue is to justify the interchange of integral and inner product:
Certainly (from the compact support of f ), any Riemann sum can be inter-
changed with the inner product
where xn ranges over some lattice and Δx is the volume of the fundamental
domain. A modification of the argument that shows convergence of the
Riemann integral for smooth, compactly supported functions then works
here and allows one to take limits. We omit the details.
$$\left\langle f, \frac{\partial}{\partial x_j} \lambda \right\rangle := -\left\langle \frac{\partial}{\partial x_j} f, \lambda \right\rangle.$$
This can be verified to still be a distribution, and by Exercise 1.13.4(vi),
the operation of differentiation is a continuous one on distributions. More
generally, given any linear differential operator $P$ with smooth coefficients,
one can define $P\lambda$ for a distribution $\lambda$ by the formula
$$\langle f, P\lambda \rangle := \langle P^* f, \lambda \rangle,$$
where $P^*$ is the adjoint differential operator of $P$, which can be defined
implicitly by the formula
$$\langle f, Pg \rangle = \langle P^* f, g \rangle$$
for test functions $f, g$, or more explicitly by replacing all coefficients with
complex conjugates, replacing each partial derivative $\frac{\partial}{\partial x_j}$ with its negative,
and reversing the order of operations (thus, for instance, the adjoint of the
first-order operator $a(x)\frac{d}{dx} : f \mapsto af'$ would be
$-\frac{d}{dx}\overline{a(x)} : f \mapsto -(\overline{a}f)'$).
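The adjoint rule for $a(x)\frac{d}{dx}$ can be sanity-checked numerically. The sketch below is an illustration only: it assumes the coefficient $a(x) = 1 + ix$ and uses the Gaussian-type functions $f(x) = e^{-x^2}$, $g(x) = xe^{-x^2}$, which are Schwartz functions rather than genuine test functions, but decay fast enough that truncating to $[-8, 8]$ is harmless. The Hermitian pairing $\langle f, g \rangle = \int f \overline{g}$ is the one used in the text; all function names are hypothetical.

```python
import math

def integrate(func, a=-8.0, b=8.0, steps=4000):
    """Trapezoid rule on [a, b]; very accurate for rapidly decaying smooth integrands."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h

def coeff_bar(x):        # complex conjugate of the assumed coefficient a(x) = 1 + i x
    return 1.0 - 1j * x

def f(x):
    return math.exp(-x * x)

def df(x):
    return -2.0 * x * math.exp(-x * x)

def g(x):
    return x * math.exp(-x * x)

def dg(x):
    return (1.0 - 2.0 * x * x) * math.exp(-x * x)

def P_g(x):
    """P g = a(x) g'(x)."""
    return (1.0 + 1j * x) * dg(x)

def P_star_f(x):
    """P* f = -(conj(a) f)' = -(conj(a)' f + conj(a) f'), where conj(a)' = -i."""
    return -((-1j) * f(x) + coeff_bar(x) * df(x))

# <f, P g> and <P* f, g> with the Hermitian pairing <u, v> = integral of u * conj(v).
lhs = integrate(lambda x: f(x) * P_g(x).conjugate())
rhs = integrate(lambda x: P_star_f(x) * g(x))  # g is real, so conj(g) = g

if __name__ == "__main__":
    print(lhs, rhs)
```

Both pairings can also be evaluated by hand for this example and equal $\frac{1}{2}\sqrt{\pi/2}$.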
Example 1.13.6. The distribution $\delta'$ defined in Exercise 1.13.11 is the
derivative $\frac{d}{dx}\delta$ of $\delta$, as defined by the above formula.
We close this section with one caveat. Despite the many operations
that one can perform on distributions, there are two types of operations
which cannot, in general, be defined on arbitrary distributions (at least
while remaining in the class of distributions):
• Nonlinear operations (e.g., taking the absolute value of a distribu-
tion); or
• Multiplying a distribution by anything rougher than a smooth func-
tion.
Thus, for instance, there is no meaningful way to interpret the square
$\delta^2$ of the Dirac delta function as a distribution. This is perhaps easiest to
Since $C_c^\infty(\mathbf{R}^d)$ embeds continuously into $\mathcal{S}(\mathbf{R}^d)$ (with a dense image),
we see that the space of tempered distributions can be embedded into the
space of distributions. However, not every distribution is tempered:
But we can now add a new operation to this list using the formula
(1.117): as the Fourier transform F maps Schwartz functions continuously
$$\widehat{|x|^{-\alpha}}(\xi) = \int_{\mathbf{R}^d} |x|^{-\alpha} e^{-2\pi i \xi \cdot x}\,dx, \tag{1.120}$$
does not seem to make much sense (the integral is not absolutely integrable),
although a change of variables (or dimensional analysis) heuristic can at least
lead to the prediction that the integral (1.120) should be some multiple of
$|\xi|^{\alpha - d}$. But which multiple should it be? To continue the formal calculation,
we can write the non-integrable function $|x|^{-\alpha}$ as an average of integrable
functions whose Fourier transforms are already known. There are many such
functions that one could use here, but it is natural to use Gaussians, as they
have a particularly pleasant Fourier transform, namely
$$\widehat{e^{-\pi t^2 |x|^2}}(\xi) = t^{-d} e^{-\pi |\xi|^2 / t^2}$$
for $t > 0$ (see Exercise 1.12.32). To get from Gaussians to $|x|^{-\alpha}$, one can
observe that $|x|^{-\alpha}$ is invariant under the scaling $f(x) \mapsto t^\alpha f(tx)$ for $t > 0$.
Thus, it is natural to average the standard Gaussian $e^{-\pi |x|^2}$ with respect
to this scaling, thus producing the function $t^\alpha e^{-\pi t^2 |x|^2}$, then integrate with
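respect to the multiplicative Haar measure $\frac{dt}{t}$. A straightforward change of
variables then gives the identity
$$\int_0^\infty t^\alpha e^{-\pi t^2 |x|^2}\,\frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} |x|^{-\alpha} \Gamma(\alpha/2),$$
where
$$\Gamma(s) := \int_0^\infty t^s e^{-t}\,\frac{dt}{t}$$
is the Gamma function. If we formally take Fourier transforms of this
identity, we obtain
$$\int_0^\infty t^\alpha t^{-d} e^{-\pi |\xi|^2 / t^2}\,\frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2}\, \widehat{|x|^{-\alpha}}(\xi)\, \Gamma(\alpha/2).$$
Another change of variables shows that
$$\int_0^\infty t^\alpha t^{-d} e^{-\pi |\xi|^2 / t^2}\,\frac{dt}{t} = \frac{1}{2} \pi^{-(d-\alpha)/2} |\xi|^{-(d-\alpha)} \Gamma((d-\alpha)/2),$$
and so we conclude (formally) that
$$\widehat{|x|^{-\alpha}}(\xi) = \frac{\pi^{-(d-\alpha)/2}\, \Gamma((d-\alpha)/2)}{\pi^{-\alpha/2}\, \Gamma(\alpha/2)}\, |\xi|^{-(d-\alpha)}, \tag{1.121}$$
thus solving the problem of what the constant multiple of $|\xi|^{-(d-\alpha)}$ should
be.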
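The first change of variables and the resulting constant in (1.121) are easy to check numerically. The following sketch assumes the substitution $t = e^u$ (so that $\frac{dt}{t} = du$) and a plain trapezoid rule; the function names and the truncation range are choices made for this illustration, not part of the text.

```python
import math

def scaled_gaussian_average(alpha, r, lo=-60.0, hi=6.0, steps=20000):
    """Trapezoid rule for the integral of t^alpha * exp(-pi t^2 r^2) dt/t over (0, inf).

    Substituting t = e^u turns dt/t into du; the integrand is negligible
    outside [lo, hi] for moderate alpha > 0 and r > 0.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(alpha * u - math.pi * r * r * math.exp(2.0 * u))
    return total * h

def closed_form(alpha, r):
    """Right-hand side (1/2) * pi^(-alpha/2) * r^(-alpha) * Gamma(alpha/2)."""
    return 0.5 * math.pi ** (-alpha / 2.0) * r ** (-alpha) * math.gamma(alpha / 2.0)

def riesz_constant(d, alpha):
    """The multiple of |xi|^(-(d - alpha)) appearing in (1.121), for 0 < alpha < d."""
    numerator = math.pi ** (-(d - alpha) / 2.0) * math.gamma((d - alpha) / 2.0)
    denominator = math.pi ** (-alpha / 2.0) * math.gamma(alpha / 2.0)
    return numerator / denominator

if __name__ == "__main__":
    print(scaled_gaussian_average(1.5, 0.7), closed_form(1.5, 0.7))
    print(riesz_constant(3, 1), 1.0 / math.pi)
```

In $d = 3$ with $\alpha = 1$ the constant works out to $1/\pi$, i.e. $\widehat{|x|^{-1}}(\xi) = \frac{1}{\pi}|\xi|^{-2}$ formally.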
Exercise 1.13.35. Give a rigorous proof of (1.121) for $0 < \alpha < d$ (when
both sides are locally integrable) in the sense of distributions. (Hint:
Basically, one needs to test the entire formal argument against an arbitrary
Schwartz function.) The identity (1.121) can in fact be continued
meromorphically in $\alpha$, but the interpretation of distributions such as $|x|^{-\alpha}$ when
$|x|^{-\alpha}$ is not locally integrable is somewhat complicated (cf. Exercise 1.13.12)
and will not be discussed here.
So from (1.119) we see that one choice of the fundamental solution $K$ is the
Newton potential
$$K = \frac{-1}{4\pi |x|},$$
leading to an explicit (and rigorously derived) solution
$$u(x) := f * K(x) = -\frac{1}{4\pi} \int_{\mathbf{R}^3} \frac{f(y)}{|x - y|}\,dy \tag{1.122}$$
to the Poisson equation (1.118) in $d = 3$ for Schwartz functions $f$. (This is
not quite the only fundamental solution K available; one can add a harmonic
polynomial to K, which will end up adding a harmonic polynomial to u, since
the convolution of a harmonic polynomial with a Schwartz function is easily
seen to still be harmonic.)
Exercise 1.13.36. Without using the theory of distributions, give an alter-
nate (and still rigorous) proof that the function u defined in (1.122) solves
(1.118) in d = 3.
Exercise 1.13.37. • Show that for any $d \ge 3$, a fundamental solution
$K$ to the Poisson equation is given by the locally integrable function
$$K(x) = -\frac{1}{d(d-2)\omega_d} \frac{1}{|x|^{d-2}},$$
where $\omega_d = \pi^{d/2} / \Gamma(\frac{d}{2} + 1)$ is the volume of the unit ball in $d$
dimensions. (For $d = 3$ this recovers the Newton potential $-1/(4\pi|x|)$.)
• Show that for $d = 1$, a fundamental solution is given by the locally
integrable function $K(x) = |x|/2$.
• Show that for $d = 2$, a fundamental solution is given by the locally
integrable function $K(x) = \frac{1}{2\pi} \log |x|$.
Thus we see that for the Poisson equation, $d = 2$ is a critical dimension,
requiring a logarithmic correction to the usual formula.
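One heuristic way to check the normalising constants (including the sign) is via the divergence theorem: if $\Delta K = \delta$, then the outward flux of $\nabla K$ through any sphere centred at the origin should equal 1. The sketch below tests this for the radial profiles above, taking the sign convention $\Delta K = \delta$ as an assumption; the radial derivative is approximated by a central difference, and the function names are illustrative.

```python
import math

def unit_ball_volume(d):
    """omega_d = pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)

def fundamental_solution(d, r):
    """Radial profile K(r), r = |x| > 0, normalised so that Laplacian(K) = delta."""
    if d == 1:
        return r / 2.0
    if d == 2:
        return math.log(r) / (2.0 * math.pi)
    return -1.0 / (d * (d - 2) * unit_ball_volume(d) * r ** (d - 2))

def outward_flux(d, r, h=1e-5):
    """Flux of grad K through the sphere |x| = r: K'(r) times the sphere's surface area."""
    dK_dr = (fundamental_solution(d, r + h) - fundamental_solution(d, r - h)) / (2.0 * h)
    sphere_area = d * unit_ball_volume(d) * r ** (d - 1)
    return dK_dr * sphere_area

if __name__ == "__main__":
    for d in range(1, 7):
        print(d, outward_flux(d, 0.7))
```

This is a consistency check on the constants, not a substitute for the distributional argument the exercise asks for.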
Similar methods can solve other constant coefficient linear PDE. We give
some standard examples in the exercises below.
Exercise 1.13.38. Let $d \ge 1$. Show that a smooth solution
$u : \mathbf{R}^+ \times \mathbf{R}^d \to \mathbf{C}$ to the heat equation $\partial_t u = \Delta u$ with initial data
$u(0, x) = f(x)$ for some Schwartz function $f$ is given by $u(t) = f * K_t$ for
$t > 0$, where $K_t$ is the heat kernel
$$K_t(x) = \frac{1}{(4\pi t)^{d/2}} e^{-|x|^2 / 4t}.$$
(This solution is unique assuming certain smoothness and decay conditions
at infinity, but we will not pursue this issue here.)
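As a quick plausibility check in dimension $d = 1$, one can verify by finite differences that the kernel $K_t(x) = (4\pi t)^{-1/2} e^{-x^2/4t}$ satisfies $\partial_t K = \partial_{xx} K$ and has total mass 1. The step sizes, truncation range, and function names below are assumptions of this sketch.

```python
import math

def heat_kernel(t, x):
    """One-dimensional heat kernel K_t(x) = (4 pi t)^(-1/2) * exp(-x^2 / (4 t))."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def heat_residual(t, x, h=1e-3):
    """Central-difference approximation of (d/dt - d^2/dx^2) K at the point (t, x)."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    dxx = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x) + heat_kernel(t, x - h)) / (h * h)
    return dt - dxx

def total_mass(t, half_width=20.0, steps=4000):
    """Trapezoid approximation of the integral of K_t over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.5 * (heat_kernel(t, -half_width) + heat_kernel(t, half_width))
    for i in range(1, steps):
        total += heat_kernel(t, -half_width + i * h)
    return total * h

if __name__ == "__main__":
    print(heat_residual(0.5, 0.3), total_mass(0.5))
```

The unit mass, together with concentration near the origin as $t \to 0$, is what makes $K_t$ behave like an approximation to the identity.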
14 The close similarity here with the heat kernel is a manifestation of Wick rotation in action.
However, from an analytical viewpoint, the two kernels are very different. For instance, the
convergence of f ∗ Kt to f as t → 0 follows in the heat kernel case by the theory of approximations
to the identity, whereas the convergence in the Schrödinger case is much more subtle and is best
seen via Fourier analysis.
Section 1.14
Sobolev spaces
• Let $\phi \in C_c^\infty(\mathbf{R})$ be a test function that equals 1 near the origin, and
let $N$ be a large number. Then the function $f(x) := \phi(x) \sin(Nx)$
oscillates at a wavelength of about $1/N$, and a frequency scale of
about $N$. While $f$ is, strictly speaking, a smooth function, it
becomes increasingly less smooth in the limit $N \to \infty$; for instance, the
derivative $f'(x) = \phi'(x) \sin(Nx) + N\phi(x) \cos(Nx)$ grows at a roughly
linear rate as $N \to \infty$, and the higher derivatives grow at even faster
rates. So this function does not really have any regularity in the
limit $N \to \infty$. Note however that the height and width of this
function are bounded uniformly in $N$, so regularity and frequency scale are
independent of height and width.
• Continuing the previous example, now consider the function $g(x) :=
N^{-s} \phi(x) \sin(Nx)$, where $s \ge 0$ is some parameter. This function also
has a frequency scale of about $N$. But now it has a certain amount of
regularity, even in the limit $N \to \infty$; indeed, one easily checks that
the $k$th derivative of $g$ stays bounded in $N$ as long as $k \le s$. So one
could view this function as having “$s$ degrees of regularity” in the
limit $N \to \infty$.
• In a similar vein, the function $N^{-s} \phi(Nx)$ also has a frequency scale
of about $N$ and can be viewed as having $s$ degrees of regularity in
the limit $N \to \infty$.
• The function $\phi(x)|x|^s 1_{x>0}$ also has about $s$ degrees of regularity, in
the sense that it can be differentiated up to $s$ times before becoming
unbounded. By performing a dyadic decomposition of the $x$ variable,
one can also decompose this function into components $\psi(2^n x)|x|^s$
for $n \ge 0$, where $\psi(x) := (\phi(x) - \phi(2x))1_{x>0}$ is a bump function
supported away from the origin; each such component has frequency
scale about $2^n$ and $s$ degrees of regularity. Thus we see that the
original function $\phi(x)|x|^s 1_{x>0}$ has a range of frequency scales, ranging
from about 1 all the way to $+\infty$.
• One can of course concoct higher-dimensional analogues of these
examples. For instance, the localised plane wave $\phi(x) \sin(\xi \cdot x)$ in $\mathbf{R}^d$,
where $\phi \in C_c^\infty(\mathbf{R}^d)$ is a test function, would have a frequency scale
of about $|\xi|$.
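The first two examples can be explored numerically. The sketch below estimates $\sup_x |g'(x)|$ for $g(x) = N^{-s}\phi(x)\sin(Nx)$ on a grid. As an assumption made for convenience, $\phi$ is taken to be the bump $e^{1 - 1/(1-x^2)}$ on $(-1, 1)$, which is normalised to equal 1 at the origin but is not identically 1 near it; the names `bump`, `bump_prime` and `sup_first_derivative` are illustrative.

```python
import math

def bump(x):
    """Smooth bump supported in (-1, 1), normalised so that bump(0) = 1."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - x * x))

def bump_prime(x):
    """Derivative of bump: bump(x) * (-2x) / (1 - x^2)^2 inside (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    u = 1.0 - x * x
    return bump(x) * (-2.0 * x) / (u * u)

def sup_first_derivative(N, s, grid=20000):
    """Grid estimate of sup |g'| for g(x) = N^(-s) * bump(x) * sin(N x)."""
    best = 0.0
    for i in range(grid + 1):
        x = -1.0 + 2.0 * i / grid
        value = N ** (-s) * (bump_prime(x) * math.sin(N * x)
                             + N * bump(x) * math.cos(N * x))
        best = max(best, abs(value))
    return best

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, sup_first_derivative(N, 0), sup_first_derivative(N, 1))
```

With $s = 0$ the estimate grows linearly in $N$, while with $s = 1$ it stays close to 1, matching the claim that the $k$th derivative stays bounded when $k \le s$.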
There are a variety of function space norms that can be used to
capture frequency scale (or regularity) in addition to height and width. The
most common and well-known examples of such spaces are the Sobolev space
norms $\|f\|_{W^{s,p}(\mathbf{R}^d)}$, although there are a number of other norms with similar
features, such as Hölder norms, Besov norms, and Triebel-Lizorkin norms.
Very roughly speaking, the $W^{s,p}$ norm is like the $L^p$ norm, but with “$s$
additional degrees of regularity”. For instance, in one dimension, the function
$A\phi(x/R) \sin(Nx)$, where $\phi$ is a fixed test function and $R, N$ are large, will
have a $W^{s,p}$ norm of about $|A|R^{1/p}N^s$, thus combining the height $|A|$, the
width $R$, and the frequency scale $N$ of this function together. (Compare
this with the $L^p$ norm of the same function, which is about $|A|R^{1/p}$.)
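The scaling heuristics $\|f\|_{L^p} \approx |A|R^{1/p}$ and "one derivative costs a factor of $N$" can be tested numerically for $p = 2$. This is a rough sketch under several assumptions: $\phi$ is the normalised bump $e^{1-1/(1-x^2)}$, norms are computed by a Riemann sum, and only the first derivative (the case $s = 1$) is examined. The function names are hypothetical.

```python
import math

def bump(x):
    """Smooth bump supported in (-1, 1), normalised so that bump(0) = 1."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - x * x))

def bump_prime(x):
    """Derivative of bump inside (-1, 1); zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    u = 1.0 - x * x
    return bump(x) * (-2.0 * x) / (u * u)

def l2_norms(A, R, N, points_per_unit=2000):
    """Return (||f||_2, ||f'||_2) for f(x) = A * bump(x / R) * sin(N x)."""
    steps = int(2 * R * points_per_unit)
    h = 2.0 * R / steps
    sum_f = 0.0
    sum_df = 0.0
    for i in range(steps + 1):
        x = -R + i * h
        f = A * bump(x / R) * math.sin(N * x)
        df = A * (bump_prime(x / R) / R * math.sin(N * x)
                  + N * bump(x / R) * math.cos(N * x))
        sum_f += f * f * h
        sum_df += df * df * h
    return math.sqrt(sum_f), math.sqrt(sum_df)

if __name__ == "__main__":
    norm_1, _ = l2_norms(1.0, 1.0, 50)
    norm_4, dnorm_4 = l2_norms(1.0, 4.0, 50)
    print("width R -> 4R scales the L^2 norm by", norm_4 / norm_1)   # about 4^(1/2) = 2
    print("ratio ||f'|| / ||f|| is", dnorm_4 / norm_4)                # about N = 50
```

Quadrupling the width doubles the $L^2$ norm, consistent with the factor $R^{1/p}$ at $p = 2$, and differentiating multiplies the norm by roughly $N$.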
To a large extent, the theory of the Sobolev spaces $W^{s,p}(\mathbf{R}^d)$ resembles
that of their Lebesgue counterparts $L^p(\mathbf{R}^d)$ (which arise as the special case of Sobolev
spaces when $s = 0$), but with the additional benefit of being able to interact
very nicely with (weak) derivatives: a first derivative $\frac{\partial f}{\partial x_j}$ of a function in
an $L^p$ space usually leaves all Lebesgue spaces, but a first derivative of a
function in the Sobolev space $W^{s,p}$ will end up in another Sobolev space
$W^{s-1,p}$. This compatibility with the differentiation operation begins to
explain why Sobolev spaces are so useful in the theory of partial differential
equations. Furthermore, the regularity parameter s in Sobolev spaces is not
restricted to be a natural number; it can be any real number, and one can
use a fractional derivative or integration operators to move from one regu-
larity to another. Despite the fact that most partial differential equations
involve differential operators of integer order, fractional spaces are still of
importance; for instance it often turns out that the Sobolev spaces which
are critical (scale-invariant) for a certain PDE are of fractional order.
The uncertainty principle in Fourier analysis places a constraint between
the width and frequency scale of a function; roughly speaking (and in one di-
mension for simplicity), the product of the two quantities has to be bounded
away from zero (or to put it another way, a wave is always at least as wide as
its wavelength). This constraint can be quantified as the very useful Sobolev
embedding theorem, which allows one to trade regularity for integrability: a
function in a Sobolev space $W^{s,p}$ will automatically lie in a number of other
Sobolev spaces $W^{\tilde{s},\tilde{p}}$ with $\tilde{s} < s$ and $\tilde{p} > p$; in particular, one can often
embed Sobolev spaces into Lebesgue spaces. The trade is not reversible: one
cannot start with a function with a lot of integrability and no regularity,
and expect to recover regularity in a space of lower integrability. (One can
already see this with the most basic example of Sobolev embedding, coming
from the fundamental theorem of calculus. If a (continuously differentiable)
function $f : \mathbf{R} \to \mathbf{R}$ has $f'$ in $L^1(\mathbf{R})$, then we of course have $f \in L^\infty(\mathbf{R})$;
but the converse is far from true.)
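The basic example in the parenthetical remark can be written out explicitly. The following is a short sketch of the standard argument, supplied for illustration; the base point 0 is an arbitrary choice.

```latex
\[
  |f(x) - f(0)| = \left| \int_0^x f'(t)\,dt \right| \le \|f'\|_{L^1(\mathbf{R})}
  \quad\text{for all } x \in \mathbf{R},
  \qquad\text{so}\qquad
  \|f\|_{L^\infty(\mathbf{R})} \le |f(0)| + \|f'\|_{L^1(\mathbf{R})}.
\]
```

For the failure of the converse, one can take $f(x) = \sin x$, which is bounded while its derivative $\cos x$ is not absolutely integrable.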
Plancherel’s theorem reveals that Fourier-analytic tools are particularly
powerful when applied to $L^2$ spaces. Because of this, the Fourier transform is
very effective at dealing with the $L^2$-based Sobolev spaces $W^{s,2}(\mathbf{R}^d)$, often
abbreviated $H^s(\mathbf{R}^d)$. Indeed, using the fact that the Fourier transform
converts regularity to decay, we will see that the $H^s(\mathbf{R}^d)$ spaces are nothing
more than Fourier transforms of weighted $L^2$ spaces, and in particular enjoy
a Hilbert space structure. These Sobolev spaces, and in particular the energy
space $H^1(\mathbf{R}^d)$, are of particular importance in any PDE that involves some
sort of energy functional (this includes large classes of elliptic, parabolic,
dispersive, and wave equations, and especially those equations connected to
physics and/or geometry).
We will not fully develop the theory of Sobolev spaces here, as this would
require the theory of singular integrals, which is beyond the scope of this
course. There are of course many references for further reading, such as
[St1970].
This norm gives $C^0$ the structure of a Banach space. More generally, one
can then define the spaces $C^k(\mathbf{R}^d)$ for any non-negative integer $k$ as the
space of all functions which are $k$ times continuously differentiable, with all
derivatives of order $k$ bounded, and whose norm is given by the formula
$$\|f\|_{C^k(\mathbf{R}^d)} := \sum_{j=0}^k \sup_{x \in \mathbf{R}^d} |\nabla^j f(x)| = \sum_{j=0}^k \|\nabla^j f\|_{L^\infty(\mathbf{R}^d)},$$
(One does not have to use the $\ell^2$ norm here, actually; since all norms on
a finite-dimensional space are equivalent, any other means of taking norms
here will lead to an equivalent definition of the C k norm. More generally, all
the norms discussed here tend to have several definitions which are equiva-
lent up to constants, and in most cases the exact choice of norm one uses is
just a matter of personal taste.)
Remark 1.14.1. In some texts, $C^k(\mathbf{R}^d)$ is used to denote the functions
which are $k$ times continuously differentiable, but whose derivatives up to
$k$th order are allowed to be unbounded, so for instance $e^x$ would lie in
$C^k(\mathbf{R})$ for every $k$ under this definition. Here, we will refer to such
functions (with unbounded derivatives) as lying in $C^k_{\mathrm{loc}}(\mathbf{R}^d)$ (i.e., they are lo-
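$C^\infty_{\mathrm{loc}}(\mathbf{R}^d) = \bigcap_{k=1}^\infty C^k_{\mathrm{loc}}(\mathbf{R}^d)$ (smooth functions, with no bounds on
derivatives) and $C^\infty(\mathbf{R}^d) = \bigcap_{k=1}^\infty C^k(\mathbf{R}^d)$ (smooth functions, all of whose
derivatives are bounded). Thus, for instance, $e^x$ lies in $C^\infty_{\mathrm{loc}}(\mathbf{R})$ but not $C^\infty(\mathbf{R})$.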
in the sense that there exists a constant $C$ (depending on $k$ and $d$) such that
$$C^{-1} \|f\|_{C^k(\mathbf{R}^d)} \le \|f\|_{\tilde{C}^k(\mathbf{R}^d)} \le \|f\|_{C^k(\mathbf{R}^d)}$$
for all $f \in C^k(\mathbf{R}^d)$. (Hint: Use Taylor series with remainder.) Thus when
defining the $C^k$ norms, one does not really need to bound all the intermediate
derivatives $\nabla^j f$ for $0 < j < k$; the two extreme terms $j = 0$, $j = k$ suffice.
(This is part of a more general interpolation phenomenon; the extreme terms
in a sum often already suffice to control the intermediate terms.)
Exercise 1.14.3. Let $\phi \in C_c^\infty(\mathbf{R}^d)$ be a bump function, and let $k \ge 0$.
Show that if $\xi \in \mathbf{R}^d$ with $|\xi| \ge 1$, $R \ge 1/|\xi|$, and $A > 0$, then the function
$A\phi(x/R) \sin(\xi \cdot x)$ has a $C^k$ norm of at most $CA|\xi|^k$, where $C$ is a constant
depending only on $\phi$, $d$ and $k$. Thus we see how the $C^k$ norm relates to the
height $A$, width $R^d$, and frequency scale $|\xi|$ of the function, and in particular
how the width $R$ is largely irrelevant. What happens when the condition
$R \ge 1/|\xi|$ is dropped?
We can then define the $C^{k,\alpha}(\mathbf{R}^d)$ spaces for natural numbers $k \ge 0$ and
$0 \le \alpha \le 1$ to be the subspace of $C^k(\mathbf{R}^d)$ whose norm
$$\|f\|_{C^{k,\alpha}(\mathbf{R}^d)} := \sum_{j=0}^k \|\nabla^j f\|_{C^{0,\alpha}(\mathbf{R}^d)}$$
is finite. (As before, there are a variety of ways to define the $C^{0,\alpha}$ norm of
the tensor-valued quantity $\nabla^j f$, but they are all equivalent to each other.)
Exercise 1.14.9. Show that $C^{k,\alpha}(\mathbf{R}^d)$ is a Banach space which contains
$C^{k+1}(\mathbf{R}^d)$, and is contained in turn in $C^k(\mathbf{R}^d)$.
is finite; this is the maximal choice for the $C^{k,\alpha}(\Omega)$. At the other extreme,
one has the space $C_0^{k,\alpha}(\Omega)$, defined as the closure of the compactly
supported functions in $C^{k,\alpha}(\Omega)$. This space is smaller than $C^{k,\alpha}(\Omega)$; for
instance, functions in $C_0^{0,\alpha}((0, 1))$ must converge to zero at the endpoints 0, 1,
while functions in $C^{k,\alpha}((0, 1))$ do not need to do so. An intermediate space
is $C^{k,\alpha}(\mathbf{R}^d)|_\Omega$, defined as the space of restrictions of functions in $C^{k,\alpha}(\mathbf{R}^d)$
to $\Omega$. For instance, the restriction of $|x|\psi(x)$ to $\mathbf{R} \setminus \{0\}$, where $\psi$ is a
cutoff function non-vanishing at the origin, lies in $C^{1,0}(\mathbf{R} \setminus \{0\})$, but is not in
$C^{1,0}(\mathbf{R})|_{\mathbf{R} \setminus \{0\}}$ or $C_0^{1,0}(\mathbf{R} \setminus \{0\})$ (note that $|x|\psi(x)$ itself is not in $C^{1,0}(\mathbf{R})$,
as it is not continuously differentiable at the origin). It is possible to
clarify the exact relationships between the various flavours of Hölder spaces on
domains (and similarly for the Sobolev spaces discussed below), but we will
not discuss these topics here.
Exercise 1.14.13. Show that Cc∞ (Rd ) is a dense subset of C0k,α (Rd ) for
any k ≥ 0 and 0 ≤ α ≤ 1. (Hint: To approximate a compactly supported
C k,α function by a Cc∞ one, convolve with a smooth, compactly supported
approximation to the identity.)
Hölder spaces are particularly useful in elliptic PDE because tools such
as the maximum principle lend themselves well to the suprema that appear
inside the definition of the C k,α norms; see [GiTr1998] for a thorough
treatment. For simple examples of elliptic PDE, such as the Poisson equation
Δu = f , one can also use the explicit fundamental solution, through lengthy
but straightforward computations. We give a typical example here:
Exercise 1.14.14 (Schauder estimate). Let 0 < α < 1, and let f ∈
C^{0,α}(R^3) be a function supported on the unit ball B(0, 1). Let u be the
unique bounded solution to the Poisson equation Δu = f (where
Δ = Σ_{j=1}^3 ∂²/∂x_j² is the Laplacian), given by convolution with the
Newton kernel:
\[ u(x) := \frac{1}{4\pi} \int_{\mathbf{R}^3} \frac{f(y)}{|x-y|}\, dy. \]
(i) Show that u ∈ C^0(R^3).
(ii) Show that u ∈ C^1(R^3), and rigorously establish the formula
\[ \frac{\partial u}{\partial x_j}(x) = -\frac{1}{4\pi} \int_{\mathbf{R}^3} (x_j - y_j) \frac{f(y)}{|x-y|^3}\, dy \]
for j = 1, 2, 3.
lie in L^p(R^d) for all j = 0, . . . , k. If f lies in W^{k,p}(R^d), we define
the W^{k,p} norm of f by the formula
\[ \|f\|_{W^{k,p}(\mathbf{R}^d)} := \sum_{j=0}^{k} \|\nabla^j f\|_{L^p(\mathbf{R}^d)}. \]
(As before, the exact choice of convention in which one measures the L^p
norm of ∇^j f is not particularly relevant for most applications, as all such
conventions are equivalent up to multiplicative constants.)
The space W^{k,p}(R^d) is also denoted L^p_k(R^d) in some texts.
Example 1.14.4. W 0,p (Rd ) is of course the same space as Lp (Rd ), thus
the Sobolev spaces generalise the Lebesgue spaces. From Exercise 1.14.8 we
see that W 1,∞ (R) is the same space as C 0,1 (R), with an equivalent norm.
More generally, one can see from induction that W k+1,∞ (R) is the same
space as C k,1 (R) for k ≥ 0, with an equivalent norm. It is also clear that
W k,p (Rd ) contains W k+1,p (Rd ) for any k, p.
Example 1.14.5. The function | sin x| lies in W 1,∞ (R) but is not every-
where differentiable in the classical sense; nevertheless, it has a bounded
weak derivative of cos x sgn(sin(x)). On the other hand, the Cantor function
(a.k.a. the Devil’s staircase) is not in W 1,∞ (R), despite having a classical
derivative of zero at almost every point; the weak derivative is a Cantor
measure, which does not lie in any Lp space. Thus one really does need to
work with weak derivatives rather than classical derivatives to define Sobolev
spaces properly (in contrast to the C k,α spaces).
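As a quick check of the first claim in Example 1.14.5, here is a sketch of the
integration by parts behind it. The test function \varphi and the splitting of
R into the intervals [nπ, (n+1)π] are our own notation, not taken from the text;
the sum is finite because \varphi has compact support, and the boundary terms
vanish because sin(nπ) = 0.

```latex
% Sketch: the weak derivative of |sin x| is cos x sgn(sin x).
% \varphi is an arbitrary test function (illustrative notation).
\begin{align*}
\int_{\mathbf{R}} |\sin x|\, \varphi'(x)\, dx
 &= \sum_{n \in \mathbf{Z}} (-1)^n \int_{n\pi}^{(n+1)\pi} \sin x \, \varphi'(x)\, dx \\
 &= \sum_{n \in \mathbf{Z}} (-1)^n \Big( \big[ \sin x\, \varphi(x) \big]_{n\pi}^{(n+1)\pi}
      - \int_{n\pi}^{(n+1)\pi} \cos x\, \varphi(x)\, dx \Big) \\
 &= - \int_{\mathbf{R}} \cos x \, \operatorname{sgn}(\sin x)\, \varphi(x)\, dx .
\end{align*}
```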
Exercise 1.14.17. Show that W k,p (Rd ) is a Banach space for any 1 ≤ p ≤
∞ and k ≥ 0.
The fact that Sobolev spaces are defined using weak derivatives is a tech-
nical nuisance, but in practice one can often end up working with classical
derivatives anyway by means of the following lemma:
Lemma 1.14.6. Let 1 ≤ p < ∞ and k ≥ 0. Then the space Cc∞ (Rd ) of test
functions is a dense subspace of W k,p (Rd ).
Proof. It is clear that C_c^∞(R^d) is a subspace of W^{k,p}(R^d). We first show
that the smooth functions C_loc^∞(R^d) ∩ W^{k,p}(R^d) are a dense subspace of
W^{k,p}(R^d), and then show that C_c^∞(R^d) is dense in
C_loc^∞(R^d) ∩ W^{k,p}(R^d).
We begin with the former claim. Let f ∈ W k,p (Rd ), and let φn be a
sequence of smooth, compactly supported approximations to the identity.
Since f ∈ Lp (Rd ), we see that f ∗ φn converges to f in Lp (Rd ). More
generally, since ∇j f is in Lp (Rd ) for 0 ≤ j ≤ k, we see that (∇j f ) ∗ φn =
∇j (f ∗ φn ) converges to ∇j f in Lp (Rd ). Thus we see that f ∗ φn converges
to f in W k,p (Rd ). On the other hand, as φn is smooth, f ∗ φn is smooth;
and the claim follows.
Now we prove the latter claim. Let f be a smooth function in W k,p (Rd ),
thus ∇j f ∈ Lp (Rd ) for all 0 ≤ j ≤ k. We let η ∈ Cc∞ (Rd ) be a compactly
supported function which equals 1 near the origin, and consider the functions
fR (x) := f (x)η(x/R) for R > 0. Clearly, each fR lies in Cc∞ (Rd ). As
R → ∞, dominated convergence shows that fR converges to f in Lp (Rd ). An
application of the product rule then lets us write
∇f_R(x) = (∇f)(x)η(x/R) + (1/R) f(x)(∇η)(x/R). The first term converges to ∇f
in L^p(R^d) by dominated convergence, while the second term goes to zero in
the same topology; thus
∇fR converges to ∇f in Lp (Rd ). A similar argument shows that ∇j fR
converges to ∇j f in Lp (Rd ) for all 0 ≤ j ≤ k, and so fR converges to f in
W k,p (Rd ), and the claim follows.
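The claim that the second term in the product rule goes to zero can be made
explicit with a one-line bound. The following is a sketch using only the
boundedness of ∇η (which holds because η is a test function); the sup-norm
notation is ours.

```latex
% Sketch: the cutoff error term vanishes in L^p as R -> infinity.
\[
\Big\| \tfrac{1}{R}\, f\, (\nabla \eta)(\cdot / R) \Big\|_{L^p(\mathbf{R}^d)}
 \;\leq\; \frac{1}{R}\, \|\nabla \eta\|_{L^\infty(\mathbf{R}^d)}\, \|f\|_{L^p(\mathbf{R}^d)}
 \;\longrightarrow\; 0 \qquad (R \to \infty).
\]
```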
Exercise 1.14.18. Let k ≥ 0. Show that the closure of Cc∞ (Rd ) in W k,∞ (Rd )
is C k+1 (Rd ), thus Lemma 1.14.6 fails at the endpoint p = ∞.
for all test functions f ∈ Cc∞ (R) and some constant C, as the claim then
follows by taking limits using Lemma 1.14.6. (Note that any limit in ei-
ther the L∞ or W 1,1 topologies is also a limit in the sense of distributions,
and such limits are necessarily unique. Also, since L∞ (R) is the dual space
of L1 (R), the distributional limit of any sequence bounded in L∞ (R) re-
mains in L∞ (R), by Exercise 1.13.28.) To prove (1.123), observe from the
Proof. By Lemma 1.14.6 and the same limiting argument as before, it
suffices to establish the Sobolev embedding inequality
\[ \|f\|_{L^q(\mathbf{R}^d)} \leq C_{p,q,d} \|f\|_{W^{1,p}(\mathbf{R}^d)} \]
for all test functions f ∈ C_c^∞(R^d), and some constant C_{p,q,d} depending
only on p, q, d, as the inequality will then extend to all f ∈ W^{1,p}(R^d). To
simplify the notation, we shall use X ≲ Y to denote an estimate of the form
X ≤ C_{p,q,d} Y, where C_{p,q,d} is a constant depending on p, q, d (the exact
value of this constant may vary from instance to instance).
The case p = q is trivial. Now let us look at another extreme case,
namely when d/p − 1 = d/q; by our hypotheses, this forces 1 < p < d. Here, we
use the fundamental theorem of calculus (and the compact support of f) to
write
\[ f(x) = -\int_0^\infty \omega \cdot \nabla f(x + r\omega)\, dr \]
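For the reader's convenience, here is a sketch of why this identity holds. We
assume (this is our reading, since the surrounding text is not reproduced
here) that ω is a fixed unit vector and that x is a fixed point; the chain
rule and the compact support of f then do the rest.

```latex
% Sketch: radial fundamental theorem of calculus for a test function f.
% Assumption: \omega is a unit vector in R^d.
\begin{align*}
\frac{d}{dr} f(x + r\omega) &= \omega \cdot \nabla f(x + r\omega), \\
\int_0^\infty \omega \cdot \nabla f(x + r\omega)\, dr
 &= \lim_{T \to \infty} f(x + T\omega) - f(x) = -f(x),
\end{align*}
% since f(x + T\omega) = 0 once x + T\omega leaves the support of f.
```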
Exercise 1.14.20. Let d ≥ 2. Show that the Sobolev endpoint estimate fails
in the case (p, q) = (d, ∞). (Hint: Experiment with functions f of the form
f(x) := Σ_{n=1}^N φ(2^n x), where φ is a test function supported on the annulus
{1 ≤ |x| ≤ 2}.) Conclude in particular that W^{1,d}(R^d) is not a subset
of L^∞(R^d). (Hint: Either use the closed graph theorem or some variant
of the function f used in the first part of this exercise.) Note that when
d = 1, the Sobolev endpoint theorem for (p, q) = (1, ∞) follows from the
fundamental theorem of calculus, as mentioned earlier. There are substitutes
known for the endpoint Sobolev embedding theorem, but they involve more
sophisticated function spaces, such as the space BMO of functions of bounded
mean oscillation, which we will not discuss here.
The p = 1 case of the Sobolev inequality cannot be proven via the Hardy-
Littlewood-Sobolev inequality; however, there are other proofs available.
One of these (due to Gagliardo and Nirenberg) is based on the following.
\[ F(x_1,\ldots,x_d) := \prod_{i=1}^{d} f_i(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_d). \]
Show that
\[ \|F\|_{L^{p/(d-1)}(\mathbf{R}^d)} \leq \prod_{i=1}^{d} \|f_i\|_{L^p(\mathbf{R}^{d-1})}. \]
(Hint: Induct on d, using Hölder’s inequality and Fubini’s theorem.)
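The base case of the induction is worth writing out. The following sketch
treats d = 2 with 0 < p < ∞ (our choice of case), where the exponent p/(d − 1)
is just p, each f_i is a function of one variable, and Fubini's theorem gives
equality rather than just an inequality.

```latex
% Sketch: the case d = 2, where F(x_1, x_2) = f_1(x_2) f_2(x_1).
\begin{align*}
\|F\|_{L^p(\mathbf{R}^2)}^p
 &= \int_{\mathbf{R}} \int_{\mathbf{R}} |f_1(x_2)|^p \, |f_2(x_1)|^p \, dx_1\, dx_2 \\
 &= \Big( \int_{\mathbf{R}} |f_1(x_2)|^p \, dx_2 \Big)
    \Big( \int_{\mathbf{R}} |f_2(x_1)|^p \, dx_1 \Big)
  = \|f_1\|_{L^p(\mathbf{R})}^p \, \|f_2\|_{L^p(\mathbf{R})}^p .
\end{align*}
```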
Lemma 1.14.9 (Endpoint Sobolev inequality). W 1,1 (Rd ) embeds continu-
ously into Ld/(d−1) (Rd ).
Exercise 1.14.24. Show that W^{k,p}(R^d) embeds into W^{l,q}(R^d) whenever
k ≥ l ≥ 0 and 1 < p < q ≤ ∞ are such that d/p − k ≤ d/q − l, and such that at
least one of the two inequalities q ≤ ∞, d/p − k ≤ d/q − l is strict.
Exercise 1.14.25. Show that the Sobolev embedding theorem fails whenever
q < p. (Hint: Experiment with functions of the form
f(x) = Σ_{j=1}^n φ(x − x_j), where φ is a test function and the x_j are widely
separated points in space.)
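To see why the hint is promising, here is a sketch of the relevant
computation, under the assumption that the translates φ(· − x_j) have pairwise
disjoint supports and that the W^{1,p} norm is the sum ‖f‖_{L^p} + ‖∇f‖_{L^p}.
It is meant as orientation rather than as a complete solution.

```latex
% Sketch: n widely separated copies of a fixed test function \phi.
% Assumption: the supports of \phi(\cdot - x_j) are pairwise disjoint.
\[
\|f\|_{L^q(\mathbf{R}^d)} = n^{1/q}\, \|\phi\|_{L^q(\mathbf{R}^d)}, \qquad
\|f\|_{W^{1,p}(\mathbf{R}^d)} = n^{1/p}\, \|\phi\|_{W^{1,p}(\mathbf{R}^d)},
\]
\[
\frac{\|f\|_{L^q(\mathbf{R}^d)}}{\|f\|_{W^{1,p}(\mathbf{R}^d)}}
 = n^{\frac{1}{q} - \frac{1}{p}}\,
   \frac{\|\phi\|_{L^q(\mathbf{R}^d)}}{\|\phi\|_{W^{1,p}(\mathbf{R}^d)}}
 \;\longrightarrow\; \infty \quad (n \to \infty) \quad \hbox{when } q < p .
\]
```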
Exercise 1.14.26 (Hölder-Sobolev embedding). Let d < p < ∞. Show that
W^{1,p}(R^d) embeds continuously into C^{0,α}(R^d), where 0 < α < 1 is defined
by the scaling relationship d/p − 1 = −α. Use dimensional analysis to justify
why one would expect this scaling relationship to arise naturally, and give
an example to show that α cannot be improved to any higher exponent.
More generally, with the same assumptions on p, α, show that W^{k+1,p}(R^d)
embeds continuously into C^{k,α}(R^d) for all natural numbers k ≥ 0.
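The dimensional analysis can be sketched as follows. The rescaling f_λ and the
seminorm notation [f]_{C^{0,α}} for the homogeneous part of the Hölder norm are
our own; the point is only that the two sides of a homogeneous estimate must
scale the same way in λ.

```latex
% Sketch: scaling heuristics. Notation f_\lambda and [\,\cdot\,] is illustrative.
\[
f_\lambda(x) := f(x/\lambda), \qquad
\|\nabla f_\lambda\|_{L^p(\mathbf{R}^d)} = \lambda^{\frac{d}{p} - 1}\, \|\nabla f\|_{L^p(\mathbf{R}^d)}, \qquad
[f_\lambda]_{C^{0,\alpha}} := \sup_{x \neq y} \frac{|f_\lambda(x) - f_\lambda(y)|}{|x-y|^\alpha}
 = \lambda^{-\alpha}\, [f]_{C^{0,\alpha}} .
\]
% An estimate [f]_{C^{0,\alpha}} \leq C \|\nabla f\|_{L^p} can hold for all \lambda > 0
% only if the exponents match:
\[
\frac{d}{p} - 1 = -\alpha .
\]
```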
Exercise 1.14.27 (Sobolev product theorem, special case). Let k ≥ 1,
1 < p, q < d/k, and 1 < r < ∞ be such that 1/p + 1/q − k/d = 1/r. Show that
whenever f ∈ W^{k,p}(R^d) and g ∈ W^{k,q}(R^d), then fg ∈ W^{k,r}(R^d), and that
\[ \|fg\|_{W^{k,r}(\mathbf{R}^d)} \leq C_{p,q,k,d,r} \|f\|_{W^{k,p}(\mathbf{R}^d)} \|g\|_{W^{k,q}(\mathbf{R}^d)} \]
for some constant C_{p,q,k,d,r} depending only on the subscripted parameters.
(This is not the most general range of parameters for which this sort of
product theorem holds, but it is an instructive special case.)
Exercise 1.14.28. Let L be a differential operator of order m whose co-
efficients lie in C ∞ (Rd ). Show that L maps W k+m,p (Rd ) continuously to
W k,p (Rd ) for all 1 ≤ p ≤ ∞ and all integers k ≥ 0.
which is valid for all L^2(R^d) functions and in particular for Schwartz
functions f ∈ S(R^d). Also, we know that the Fourier transform of any
derivative ∂f/∂x_j of f is −2πiξ_j f̂(ξ). From this we see that
\[ \int_{\mathbf{R}^d} \Big| \frac{\partial f}{\partial x_j}(x) \Big|^2\, dx = \int_{\mathbf{R}^d} (2\pi|\xi_j|)^2 |\hat f(\xi)|^2\, d\xi, \]
for all k ≥ 0 and all Schwartz functions f ∈ S(Rd ). Since the Schwartz
functions are dense in W k,2 (Rd ), a limiting argument (using the fact that
L2 is complete) then shows that the above formula also holds for all f ∈
W k,2 (Rd ).
Now observe that the quantity Σ_{j=0}^k (2π|ξ|)^{2j} is comparable (up to
constants depending on k, d) to the expression ⟨ξ⟩^{2k}, where
⟨x⟩ := (1 + |x|^2)^{1/2} (this quantity is sometimes known as the Japanese
bracket of x). We thus conclude that
\[ \|f\|_{W^{k,2}(\mathbf{R}^d)} \sim \|\langle\xi\rangle^k \hat f(\xi)\|_{L^2(\mathbf{R}^d)}, \]
where we use x ∼ y here to denote the fact that x and y are comparable up
to constants depending on d, k, and ξ denotes the independent variable on
the right-hand side. If we then define, for any real number s,
the space H^s(R^d) to be the space of all tempered distributions f such that
the distribution ⟨ξ⟩^s f̂(ξ) lies in L^2 and give this space the norm
\[ \|f\|_{H^s(\mathbf{R}^d)} := \|\langle\xi\rangle^s \hat f(\xi)\|_{L^2(\mathbf{R}^d)}, \]
then we see that W^{k,2}(R^d) embeds into H^k(R^d) and that the norms are
equivalent.
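The comparability of the polynomial sum with the Japanese bracket can be
checked directly from the binomial theorem. The explicit constants 2^{-k} and
(2π)^{2k} below are one convenient choice of ours and are not claimed to be
sharp.

```latex
% Sketch: two-sided comparison, using 1 <= \binom{k}{j} <= 2^k and 1 <= (2\pi)^{2j} <= (2\pi)^{2k}.
\[
\langle \xi \rangle^{2k} = (1 + |\xi|^2)^k = \sum_{j=0}^{k} \binom{k}{j} |\xi|^{2j},
\]
\[
2^{-k} \langle \xi \rangle^{2k}
 \;\leq\; \sum_{j=0}^{k} |\xi|^{2j}
 \;\leq\; \sum_{j=0}^{k} (2\pi|\xi|)^{2j}
 \;\leq\; (2\pi)^{2k} \sum_{j=0}^{k} \binom{k}{j} |\xi|^{2j}
 \;=\; (2\pi)^{2k} \langle \xi \rangle^{2k} .
\]
```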
Actually, the two spaces are equal:
Exercise 1.14.29. For any s ∈ R, show that S(Rd ) is a dense subspace of
H s (Rd ). Use this to conclude that W k,2 (Rd ) = H k (Rd ) for all non-negative
integers k.
It is clear that H^0(R^d) ≡ L^2(R^d), and that H^s(R^d) ⊂ H^{s'}(R^d)
whenever s > s'. The spaces H^s(R^d) are also (complex) Hilbert spaces, with
the Hilbert space inner product
The H s Sobolev spaces also enjoy the same type of embedding estimates
as their classical counterparts:
Exercise 1.14.31 (Sobolev embedding for H^s, I). If s > d/2, show
that H^s(R^d) embeds continuously into C^{0,α}(R^d) whenever
0 < α ≤ min(s − d/2, 1). (Hint: Use the Fourier inversion formula and the
Cauchy-Schwarz inequality.)
Exercise 1.14.32 (Sobolev embedding for H^s, II). If 0 < s < d/2, show
that H^s(R^d) embeds continuously into L^q(R^d) whenever d/2 − s ≤ d/q ≤ d/2.
(Hint: It suffices to handle the extreme case d/q = d/2 − s. For this, first
reduce the task of establishing the bound ‖f‖_{L^q(R^d)} ≤ C‖f‖_{H^s(R^d)} to
the case when f ∈ H^s(R^d) is a Schwartz function whose Fourier transform
vanishes near the origin (and C depends on s, d, q), and write
f̂(ξ) = ĝ(ξ)/|ξ|^s for some g which is bounded in L^2(R^d). Then use
Exercise 1.13.35 and Corollary 1.11.18.)
(i) For any 0 < α < 1, establish the inclusions
Λ^2_{α+ε}(R^d) ⊂ H^α(R^d) ⊂ Λ^2_α(R^d) for any 0 < ε < 1 − α. (Hint: Take
Fourier transforms and work in frequency space.)
(ii) Let φ ∈ C_c^∞(R^d) be a bump function, and let φ_n be the approximations
to the identity φ_n(x) := 2^{dn} φ(2^n x). If f ∈ Λ^p_α(R^d), show that
one has the equivalence
\[ \|f\|_{\Lambda^p_\alpha(\mathbf{R}^d)} \sim \|f\|_{L^p(\mathbf{R}^d)} + \sup_{n \geq 0} 2^{\alpha n} \|f * \phi_{n+1} - f * \phi_n\|_{L^p(\mathbf{R}^d)}, \]
where we use x ∼ y to denote the assertion that x and y are comparable
up to constants depending on p, d, α. (Hint: To upper bound
‖τ_x f − f‖_{L^p(R^d)} for |x| ≤ 1, express f as a telescoping sum of
f ∗ φ_{n+1} − f ∗ φ_n for 2^{−n} ≤ |x|, plus a final term f ∗ φ_{n_0} where
2^{−n_0} is comparable to |x|.)
(iii) If 1 ≤ p ≤ q ≤ ∞ and 0 < α < 1 are such that d/p − α < d/q, show that
Λ^p_α(R^d) embeds continuously into L^q(R^d). (Hint: Express f(x) as
f ∗ φ_1 ∗ φ_0 plus a telescoping series of
f ∗ φ_{n+1} ∗ φ_n − f ∗ φ_n ∗ φ_{n−1}, where φ_n is as in the previous
exercise. The additional convolution is in place in order to apply Young’s
inequality.)
The functions f ∗ φn+1 − f ∗ φn are crude versions of Littlewood-Paley pro-
jections, which play an important role in harmonic analysis and non-linear
wave and dispersive equations.
Exercise 1.14.34 (Sobolev trace theorem, special case). Let s > 1/2. For
any f ∈ C_c^∞(R^d), establish the Sobolev trace inequality
\[ \| f|_{\mathbf{R}^{d-1}} \|_{H^{s-1/2}(\mathbf{R}^{d-1})} \leq C \|f\|_{H^s(\mathbf{R}^d)}, \]
where C depends only on d and s, and f|_{R^{d−1}} is the restriction of f to
the standard hyperplane R^{d−1} ≡ R^{d−1} × {0} ⊂ R^d. (Hint: Convert
everything to L^2-based statements involving the Fourier transform of f, and
use Schur’s test; see Lemma 1.11.14.)
Exercise 1.14.35. (i) Show that if f ∈ H^s(R^d) for some s ∈ R and
g ∈ C^∞(R^d), then fg ∈ H^s(R^d) (note that this product has to be
Section 1.15
Hausdorff dimension
• One can also try to define dimension inductively, for instance declar-
ing a space X to be n-dimensional if it can be separated somehow
by an (n − 1)-dimensional object; thus an n-dimensional object will
a point, line, and plane in R3 all have zero measure with respect to three-
dimensional Lebesgue measure (and are nowhere dense), but of course have
different dimensions (0, 1, and 2, respectively). (Another good example is
provided by Kakeya sets.) This can be used to clarify the nature of vari-
ous singularities, such as that arising from non-smooth solutions to PDE.
A function which is non-smooth on a set of large Hausdorff dimension can
be considered less smooth than one which is non-smooth on a set of small
Hausdorff dimension, even if both are smooth almost everywhere. While
many properties of the singular set of such a function are worth studying
(e.g., their rectifiability), understanding their dimension is often an impor-
tant starting point. The interplay between these types of concepts is the
subject of geometric measure theory.
Exercise 1.15.1. (i) Let C ⊂ R be the Cantor set consisting of all base 4
strings Σ_{i=1}^∞ a_i 4^{−i}, where each a_i takes values in {0, 3}. Show that
C has Minkowski dimension 1/2. (Hint: Approximate any small δ
by a negative power of 4.)
(ii) Let C ⊂ R be the Cantor set consisting of all base 4 strings
Σ_{i=1}^∞ a_i 4^{−i}, where each a_i takes values in {0, 3} when
(2k)! ≤ i < (2k + 1)! for some integer k ≥ 0 and a_i is arbitrary for the
other values of i. Show that C has a lower Minkowski dimension of 1/2 and an
upper Minkowski dimension of 1.
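A small numerical experiment can make part (i) plausible before one proves it.
The sketch below is our own illustration, not part of the text: it enumerates
the truncated base 4 expansions with digits in {0, 3} (using exact integer
arithmetic), counts how many intervals of length 4^{-m} they meet, and forms
the ratio log N_δ / log(1/δ) at δ = 4^{-m}. The function names are
hypothetical, and counting boxes hit by truncated expansions only approximates
the true covering number up to a bounded factor, which does not affect the
dimension.

```python
import math


def cantor_points(depth):
    """Return the integers sum(a_i * 4**(depth - i)), i = 1..depth, a_i in {0, 3}.

    Dividing each integer by 4**depth gives the truncated base 4 expansions,
    i.e. the left endpoints of the stage-`depth` intervals of the Cantor set.
    """
    points = [0]
    for _ in range(depth):
        # Append one more base 4 digit, which must be 0 or 3.
        points = [4 * p + a for p in points for a in (0, 3)]
    return points


def box_count(depth, m):
    """Count intervals [j * 4**-m, (j + 1) * 4**-m) hit by the truncated points."""
    if not 0 <= m <= depth:
        raise ValueError("need 0 <= m <= depth")
    scale = 4 ** (depth - m)
    # Integer division keeps only the first m base 4 digits of each point.
    return len({p // scale for p in cantor_points(depth)})


def dimension_estimate(depth, m):
    """Return log(N_delta) / log(1 / delta) at scale delta = 4**-m (m >= 1)."""
    return math.log(box_count(depth, m)) / math.log(4 ** m)


if __name__ == "__main__":
    for m in (2, 4, 6, 8):
        print(m, box_count(10, m), dimension_estimate(10, m))
```

The count doubles each time m increases by one while the scale shrinks by a
factor of four, which is exactly the behaviour the hint in part (i) suggests
exploiting.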
Note however that the dimension of graphs can become larger than that
of the base in the non-Lipschitz case:
Exercise 1.15.6. Show that the graph {(x, sin(1/x)) : 0 < x < 1} has
Minkowski dimension 3/2.
Exercise 1.15.7. Let (X, d) be a bounded metric space. For each n ≥ 0,
let E_n be a maximal 2^{−n}-net of X (thus the cardinality of E_n is
N^{net}_{2^{−n}}(X)). Show that for any continuous function f : X → R and any
x_0 ∈ X, one has the inequality
\[ \sup_{x \in X} f(x) \leq \sup_{x_0 \in E_0} f(x_0) + \sum_{n=0}^{\infty} \sup_{x_n \in E_n,\, x_{n+1} \in E_{n+1}:\, |x_n - x_{n+1}| \leq \frac{3}{2} 2^{-n}} (f(x_n) - f(x_{n+1})). \]
where the balls B(x_k, r_k) are now restricted to be less than or equal to r in
radius. This quantity is non-increasing in r (it can only grow as r
decreases), and we then define the Hausdorff
For any m, we have the telescoping sum E\A = (E\A_{1/m}) ∪ ⋃_{l>m} F_l, where
F_l := (E\A_{1/(l+1)}) ∩ A_{1/l}, and thus by countable subadditivity and
monotonicity,
\[ (\mathcal{H}^d)^*(E \backslash A_{1/m}) \leq (\mathcal{H}^d)^*(E \backslash A) \leq (\mathcal{H}^d)^*(E \backslash A_{1/m}) + \sum_{l > m} (\mathcal{H}^d)^*(F_l), \]
so it suffices to show that the sum Σ_{l=1}^∞ (H^d)^*(F_l) is absolutely
convergent.
Consider the even-indexed sets F_2, F_4, F_6, . . .. These sets are separated
from each other, so by many applications of Exercise 1.15.10 followed by
monotonicity we have
\[ \sum_{l=1}^{L} (\mathcal{H}^d)^*(F_{2l}) = (\mathcal{H}^d)^*\Big( \bigcup_{l=1}^{L} F_{2l} \Big) \leq (\mathcal{H}^d)^*(E \backslash A) < \infty \]
for all L, and thus Σ_{l=1}^∞ (H^d)^*(F_{2l}) is absolutely convergent.
Similarly for Σ_{l=1}^∞ (H^d)^*(F_{2l−1}), and the claim follows.
On the (Hd )∗ -measurable sets E, we write Hd (E) for (Hd )∗ (E), thus Hd
is a Borel measure on Rn . We now study what this measure looks like for
various values of d. The case d = 0 is easy:
Exercise 1.15.11. Show that every subset of Rn is (H0 )∗ -measurable, and
that H0 is counting measure.
In the opposite direction, observe from Exercise 1.15.4 that given any
0 < r < 1, one can cover the unit cube [0, 1]n by at most Cn r−n balls of
radius r, where Cn depends only on n; thus
Hn ([0, 1]n ) ≤ Cn
and so c ≤ Cn ; in particular, c is finite.
We can in fact compute c explicitly (although knowing that c is finite
and non-zero already suffices for many applications):
Lemma 1.15.4. We have c = 1/ω_n, or in other words H^n = (1/ω_n) vol_n. (In
particular, a ball B^n(x, r) has n-dimensional Hausdorff measure r^n.)
Proof. Let us consider the Hausdorff measure H^n([0, 1]^n) of the unit cube.
By definition, for any ε > 0 one can find an 0 < r < 1/2 such that
\[ h_{n,r}([0,1]^n) \geq \mathcal{H}^n([0,1]^n) - \varepsilon. \]
Observe (using Exercise 1.15.4) that we can find at least c_n r^{−n} disjoint
balls B(x_1, r), . . . , B(x_k, r) of radius r inside the unit cube. We then
observe that
\[ h_{n,r}([0,1]^n) \leq k r^n + \mathcal{H}^n\Big( [0,1]^n \backslash \bigcup_{i=1}^{k} B(x_i, r) \Big). \]
One can then compute d-dimensional Hausdorff measure for other sets
than subsets of d-dimensional affine subspaces by changes of variable. For
instance:
Exercise 1.15.12. Let 0 ≤ d ≤ n be an integer, let Ω be an open subset of
R^d, and let φ : Ω → R^n be a smooth injective map which is non-degenerate
in the sense that the derivative Dφ (which is a d × n matrix) has full rank at
every point of Ω. For any compact subset E of Ω, establish the formula
\[ \mathcal{H}^d(\phi(E)) = \int_E J\, d\mathcal{H}^d = \frac{1}{\omega_d} \int_E J\, d\mathrm{vol}_d, \]
where the Jacobian J is the square root of the sum of squares of all the
determinants of the d × d minors of the d × n matrix Dφ. (Hint: By working
locally, one can assume that φ is the graph of some map from Ω to R^{n−d}, and
so can be inverted by the projection function; by working even more locally,
one can assume that the Jacobian is within an epsilon of being constant.
The image of a small ball in Ω then resembles a small ellipsoid in φ(Ω), and
conversely the projection of a small ball in φ(Ω) is a small ellipsoid in Ω.
Use some linear algebra and several variable calculus to relate the content
of these ellipsoids to the radius of the ball.) It is possible to extend this
formula to Lipschitz maps φ : Ω → R^n that are not necessarily injective,
leading to the area formula
\[ \int_{\phi(E)} \#(\phi^{-1}(y))\, d\mathcal{H}^d(y) = \frac{1}{\omega_d} \int_E J\, d\mathrm{vol}_d \]
for such maps, but we will not prove this formula here.
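To make the definition of the Jacobian J concrete, here is a sketch of the
special case d = 2, n = 3 in which φ is the graph of a smooth function
g : Ω → R. The function g and the labelling of the minors by pairs of columns
are our own illustrative choices.

```latex
% Sketch: Jacobian of a graph in R^3 (d = 2, n = 3); g is a hypothetical smooth function.
\[
\phi(x_1, x_2) := (x_1, x_2, g(x_1, x_2)), \qquad
D\phi = \begin{pmatrix} 1 & 0 & \partial_1 g \\ 0 & 1 & \partial_2 g \end{pmatrix}.
\]
% The three 2 x 2 minors, indexed by the pair of columns used:
\[
\det{}_{(1,2)} = 1, \qquad \det{}_{(1,3)} = \partial_2 g, \qquad \det{}_{(2,3)} = -\partial_1 g,
\]
\[
J = \sqrt{1 + (\partial_1 g)^2 + (\partial_2 g)^2} = \sqrt{1 + |\nabla g|^2},
\qquad
\mathcal{H}^2(\phi(E)) = \frac{1}{\omega_2} \int_E \sqrt{1 + |\nabla g|^2}\; d\mathrm{vol}_2 .
\]
```

Up to the normalising factor 1/ω_2, this is the familiar surface area formula
for a graph from several variable calculus.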
On the other hand, we know from Exercise 1.15.11 that H0 (E) is positive
for any non-empty set E, and that Hd (E) = 0 for every d > n. We conclude
(from the least upper bound property of the reals) that for any Borel set E ⊂
Rn , there exists a unique number in [0, n], called the Hausdorff dimension
dimH (E) of E, such that Hd (E) = 0 for all d > dimH (E) and Hd (E) =
∞ for all d < dimH (E). Note that at the critical dimension d = dimH
itself, we allow Hd (E) to be zero, finite, or infinite, and we shall shortly
see in fact that all three possibilities can occur. By convention, we give the
empty set a Hausdorff dimension of −∞. One can also assign Hausdorff
dimension to non-Borel sets, but we shall not do so to avoid some (very
minor) technicalities.
Example 1.15.6. The unit ball B d (0, 1) ⊂ Rd ⊂ Rn has Hausdorff dimen-
sion d, as does Rd itself. Note that the former set has finite d-dimensional
Hausdorff measure, while the latter has an infinite measure. More generally,
any d-dimensional smooth manifold in Rn has Hausdorff dimension d.
Exercise 1.15.14. Show that the graph {(x, sin(1/x)) : 0 < x < 1} has
Hausdorff dimension 1; compare this with Exercise 1.15.6.
• Show that dim_M(E) ≤ d if and only if, for every ε > 0 and all
sufficiently small r > 0, one can cover E by finitely many balls
B(x_1, r_1), . . . , B(x_k, r_k) of radii r_i = r equal to r such that
Σ_{i=1}^k r_i^{d+ε} ≤ ε.
• Show that dim_H(E) ≤ d if and only if, for every ε > 0 and r > 0, one
can cover E by countably many balls B(x_1, r_1), . . . of radii r_i ≤ r at
most r such that Σ_{i=1}^∞ r_i^{d+ε} ≤ ε.
The previous two exercises give ways to upper-bound the Hausdorff
dimension; for instance, we see from Exercise 1.15.2 that self-similar fractals
E of the type in that exercise (i.e., E is k translates of r · E) have Hausdorff
dimension at most (log k)/(log 1/r). To lower-bound the Hausdorff dimension of
a set E, one convenient way to do so is to find a measure with a certain
dimension property (analogous to (1.125)) that assigns a positive mass to E:
Exercise 1.15.17. Let d ≥ 0. A Borel measure μ on R^n is said to be a
Frostman measure of dimension at most d if it is compactly supported and there
exists a constant C such that μ(B(x, r)) ≤ Cr^d for all balls B(x, r) of radius
0 < r < 1. Show that if μ has dimension at most d, then any Borel set E
with μ(E) > 0 has positive d-dimensional Hausdorff content; in particular,
dim_H(E) ≥ d.
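The mechanism behind this exercise is a single chain of inequalities, sketched
below for a cover of E by countably many balls of radii r_i < 1 (our
simplifying assumption; covers using larger balls need a separate, easy
argument). It is intended as orientation, not as a full solution.

```latex
% Sketch: a Frostman measure forces every cover of E to be "expensive".
% Assumption: E \subset \bigcup_i B(x_i, r_i) with 0 < r_i < 1.
\[
0 < \mu(E) \;\leq\; \sum_{i} \mu(B(x_i, r_i)) \;\leq\; C \sum_{i} r_i^{d},
\qquad \hbox{so} \qquad
\sum_{i} r_i^{d} \;\geq\; \frac{\mu(E)}{C} > 0 .
\]
```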
Note that this gives an alternate way to justify the fact that smooth
d-dimensional manifolds have Hausdorff dimension d, since on the one hand
they have Minkowski dimension d, and on the other hand they support a
non-trivial d-dimensional measure, namely Lebesgue measure.
Exercise 1.15.18. Show that the Cantor set in Exercise 1.15.1(i) has Haus-
dorff dimension 1/2. More generally, establish the analogue of the first part
of Exercise 1.15.2 for Hausdorff measure.
Exercise 1.15.19. Construct a subset of R of Hausdorff dimension 1 that
has zero Lebesgue measure. (Hint: A modified Cantor set, vaguely reminis-
cent of Exercise 1.15.1(ii), can work here.)
Proof. Without loss of generality we may place the compact set E in the
half-open unit cube [0, 1)^n. It is convenient to work dyadically. For each
integer k ≥ 0, we subdivide [0, 1)^n into 2^{kn} half-open cubes
Q_{k,1}, . . . , Q_{k,2^{nk}} of side length ℓ(Q_{k,i}) = 2^{−k} in the usual
manner, and refer to such cubes
as dyadic cubes. For each k and any F ⊂ [0, 1)^n, we can define the dyadic
Hausdorff content h^Δ_{d,2^{−k}}(F) to be the quantity
\[ h^\Delta_{d,2^{-k}}(F) := \inf\Big\{ \sum_j \ell(Q_{k_j,i_j})^d : Q_{k_j,i_j} \hbox{ cover } F;\ k_j \geq k \Big\}, \]
where the Q_{k_j,i_j} range over all at most countable families of dyadic
cubes of side length at most 2^{−k} that cover F. By covering cubes by balls
and vice versa, it is not hard to see that
\[ c\, h_{d,C2^{-k}}(F) \leq h^\Delta_{d,2^{-k}}(F) \leq C\, h_{d,c2^{-k}}(F) \]
for some constants c, C > 0 depending only on d, n. Thus, if we define
the dyadic Hausdorff measure
\[ (\mathcal{H}^d)^\Delta(F) := \lim_{k \to \infty} h^\Delta_{d,2^{-k}}(F), \]
then we see that the dyadic and non-dyadic Hausdorff measures are
comparable:
\[ c\, \mathcal{H}^d(F) \leq (\mathcal{H}^d)^\Delta(F) \leq C\, \mathcal{H}^d(F). \]
In particular, the quantity σ := (H^d)^Δ(E) is strictly positive.
Given any dyadic cube Q of length ℓ(Q) = 2^{−k}, define the upper Frostman
content μ^+(Q) to be the quantity
\[ \mu^+(Q) := h^\Delta_{d,2^{-k}}(E \cap Q). \]
Then μ^+([0, 1)^n) ≥ σ. By covering E ∩ Q by Q, we also have the bound
\[ \mu^+(Q) \leq \ell(Q)^d. \]
Finally, by the subadditivity property of Hausdorff content, if we decompose
Q into 2^n cubes Q' of side length ℓ(Q') = 2^{−k−1}, we have
\[ \mu^+(Q) \leq \sum_{Q'} \mu^+(Q'). \]
at the largest cube [0, 1)n and working downward; we omit the details. One
can then use this measure μ to integrate any continuous compactly sup-
ported function on Rn (by approximating such a function by one which is
constant on dyadic cubes of a certain scale), and so by the Riesz
representation theorem, it extends to a Radon measure μ supported on [0, 1]^n.
(One could also have used the Carathéodory extension theorem at this point.)
Since μ([0, 1)^n) ≥ σ, μ is non-trivial; since μ(Q) ≤ μ^+(Q) ≤ ℓ(Q)^d for
all dyadic cubes Q, it is not hard to see that μ is a Frostman measure of
dimension at most d, as desired.
This should be compared with the task of lower-bounding the lower
Minkowski dimension, which only requires control on the entropy of E itself,
rather than of large subsets E' of E. The results of this exercise are exploited
to establish lower bounds on the Hausdorff dimension of Kakeya sets (and
in particular, to conclude such bounds from the Kakeya maximal function
conjecture).
Exercise 1.15.22. Let E ⊂ Rn be a Borel set, and let φ : E → Rm be a
locally Lipschitz map. Show that dimH (φ(E)) ≤ dimH (E), and that if E
has zero d-dimensional Hausdorff measure then so does φ(E).
Exercise 1.15.23. Let φ : Rn → R be a smooth function, and let g : Rn →
R be a test function such that |∇φ| > 0 on the support of g. Establish the
co-area formula
for any test function φ, the main point being that φ^{−1}(t) ∪ φ^{−1}(−t) is the
boundary of {|φ| ≥ t} (one also needs to do some manipulations relating the
volume of those level sets to ‖φ‖_{L^{n/(n−1)}(R^n)}). We omit the details.
Chapter 2
Related articles
Section 2.1
An alternate approach
to the Carathéodory
extension theorem
In this section, I would like to give an alternate proof of (a weak form of)
the Carathéodory extension theorem (Theorem 1.1.17). This argument is
restricted to the σ-finite case and does not extend the measure to quite
as large a σ-algebra as is provided by the standard proof of this theorem.
But I find it conceptually clearer (in particular, hewing quite closely to
Littlewood’s principles, and the general Lebesgue philosophy of treating sets
of small measure as negligible), and it suffices for many standard applications
of this theorem, in particular the construction of Lebesgue measure.
Let us first give the precise statement of the theorem:
(i) μ(∅) = 0.
(ii) Pre-countable additivity. If A_1, A_2, · · · ∈ A are disjoint and such that
⋃_{n=1}^∞ A_n also lies in A, then μ(⋃_{n=1}^∞ A_n) = Σ_{n=1}^∞ μ(A_n).
(iii) σ-finiteness. X can be covered by at most countably many sets in A,
each of which has finite μ-measure.
2.1.1. Some basics. Let us first observe that the hypotheses on the pre-
measure μ imply some other basic and useful properties:
From properties (i) and (ii) we see that μ is finitely additive (thus
μ(A1 ∪ · · · ∪ An ) = μ(A1 ) + · · · + μ(An ) whenever A1 , . . . , An are disjoint
elementary sets).
As particular consequences of finite additivity, we have monotonicity
(μ(A) ≤ μ(B) whenever A ⊂ B are elementary sets) and finite subadditivity
(μ(A1 ∪ · · · ∪ An ) ≤ μ(A1 ) + · · · + μ(An ) for all elementary A1 , . . . , An , not
necessarily disjoint).
We also have precountable subadditivity: μ(A) ≤ Σ_{n=1}^∞ μ(A_n) whenever
the elementary sets A_1, A_2, . . . cover the elementary set A. To see this,
first observe, by replacing A_n with A_n \ ⋃_{i=1}^{n−1} A_i and using
monotonicity, that we may take the A_i to be disjoint. Next, by restricting
all the A_i to A and using monotonicity, we may assume that A is the union of
the A_i. Now the claim is immediate from precountable additivity.
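The two reductions in this argument can be combined into a single formula.
The sets B_n below are our own notation; we are assuming (as the surrounding
discussion suggests) that the elementary sets are closed under finite unions,
intersections and differences, so that each B_n is again elementary.

```latex
% Sketch: disjointification. The sets B_n are illustrative notation.
\[
B_n := A \cap \Big( A_n \backslash \bigcup_{i=1}^{n-1} A_i \Big), \qquad
B_n \cap B_m = \emptyset \ (n \neq m), \qquad
\bigcup_{n=1}^{\infty} B_n = A,
\]
\[
\mu(A) = \sum_{n=1}^{\infty} \mu(B_n) \;\leq\; \sum_{n=1}^{\infty} \mu(A_n),
\]
% using precountable additivity for the equality and monotonicity (B_n \subset A_n) for the inequality.
```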
distance.
the rationals are countable, we easily see that every set of rationals is mea-
surable. One easily verifies the precountable additivity condition (though
the σ-finiteness condition fails horribly). However, μ has multiple extensions
to the measurable sets; for instance, any positive scalar multiple of counting
measure is such an extension.
Remark 2.1.3. It is not difficult to show that the measure completion $\overline{X}$
of X with respect to μ is the same as the topological closure of X (or of A)
with respect to the above pseudometric. Thus, for instance, a subset of [0, 1]
is Lebesgue measurable if and only if it can be approximated to arbitrary
accuracy (with respect to outer measure) by a finite union of intervals.
Section 2.2
Amenability, the
ping-pong lemma, and
the Banach-Tarski
paradox
using countably many such pieces; this rules out the possibility of extending
Lebesgue measure to a countably additive translation invariant measure on
all subsets of R (or any higher-dimensional space).
In this section we will establish all of the above results, and tie them
in with some important concepts and tools in modern group theory, most
notably amenability and the ping-pong lemma.
Proof. By Exercise 2.2.2, it will suffice to show that some set contained in
[0, 1] is countably R-equidecomposable with R. Consider the space R/Q of
all cosets x + Q of the rationals. By the axiom of choice, we can express
each such coset as x + Q for some x ∈ [0, 1/2], thus we can partition
$R = \bigcup_{x \in E} x + Q$ for some E ⊂ [0, 1/2]. By Example 2.2.3, Q ∩ [0, 1/2]
is countably Q-equidecomposable with Q, which implies that
$\bigcup_{x \in E} x + (Q \cap [0, 1/2])$ is countably R-equidecomposable with
$\bigcup_{x \in E} x + Q$. Since the latter set is R
and the former set is contained in [0, 1], the claim follows.
Proof. One easily sees that any two sets that are finitely or countably
G-equidecomposable must have the same cardinality. The claim follows.
Proof. The integers are of course infinite, and so Proposition 2.2.6 does not
apply directly. However, the key point is that the integers can be efficiently
truncated to be finite, and so we will be able to adapt the argument used
to prove Proposition 2.2.6 to this setting.
Let’s see how. Suppose for contradiction that we could partition Z into
two sets A and B, which are in turn partitioned into finitely many pieces
$A = \bigcup_{i=1}^n A_i$ and $B = \bigcup_{j=1}^m B_j$, such that Z can be
partitioned as $Z = \bigcup_{i=1}^n (A_i + a_i)$ and $Z = \bigcup_{j=1}^m (B_j + b_j)$
for some integers $a_1, \dots, a_n, b_1, \dots, b_m$.
Now let N be a large integer (much larger than n, m, a1 , . . . , an , b1 , . . .,
bm ). We truncate Z to the interval [−N, N ] := {−N, . . . , N }. Clearly,
(2.1) $A \cap [-N, N] = \bigcup_{i=1}^n (A_i \cap [-N, N])$
and
(2.2) $[-N, N] = \bigcup_{i=1}^n (A_i + a_i) \cap [-N, N]$.
From (2.2) we see that the set $\bigcup_{i=1}^n ((A_i \cap [-N, N]) + a_i)$ differs from [−N, N ]
by only O(1) elements, where the bound in the O(1) expression can depend
on n, a1 , . . . , an but does not depend on N . (The point here is that [−N, N ]
Proof. Suppose, for contradiction, that we can partition A into two sets
A = A1 ∪ A2 which are both finitely R-equidecomposable with A. This
gives us two maps f1 : A → A1 , f2 : A → A2 which are piecewise given by
a finite number of translations; thus there exists a finite set g1 , . . . , gd ∈ R
such that fi (x) ∈ x + {g1 , . . . , gd } for all x ∈ A and i = 1, 2.
For any integer N ≥ 1, consider the $2^N$ composition maps
$f_{i_1} \circ \cdots \circ f_{i_N} : A \to A$ for $i_1, \dots, i_N \in \{1, 2\}$.
From the disjointness of A1 , A2 and an easy induction, we see that the ranges
of all these maps are disjoint, and so for any x ∈ A the $2^N$ quantities
$f_{i_1} \circ \cdots \circ f_{i_N}(x)$ are distinct. On the other hand, we have
(2.5) $f_{i_1} \circ \cdots \circ f_{i_N}(x) \in x + \{g_1, \dots, g_d\} + \cdots + \{g_1, \dots, g_d\}$.
Simple combinatorics (relying primarily on the abelian nature of (R, +))
shows that the number of values on the right-hand side of (2.5) is at most
$N^d$. But for sufficiently large N , we have $2^N > N^d$, giving the desired
contradiction.
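The counting step can be checked by brute force on small cases. The sketch below is an illustration only: the function name `count_nfold_sums` and the integer generators are hypothetical choices. Because addition is commutative, an N-fold sum depends only on how many times each generator is used, so the number of distinct values is at most the number of multisets of size N drawn from d generators, which grows polynomially in N.

```python
from itertools import combinations_with_replacement
from math import comb


def count_nfold_sums(generators, n):
    """Count the distinct values g_{j_1} + ... + g_{j_n} with every g_{j_k} in generators.

    Commutativity means the order of the summands is irrelevant, so it is
    enough to range over multisets (combinations with replacement).
    """
    return len({sum(c) for c in combinations_with_replacement(generators, n)})


def multiset_bound(d, n):
    """Number of multisets of size n from d generators: an upper bound on the count."""
    return comb(n + d - 1, d - 1)
```

For example, with three generators and N = 12 the bound is comb(14, 2) = 91, far below the 2^12 = 4096 distinct values that the disjointness argument would require.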
Exercise 2.2.9. Show that every abelian group has subexponential growth
(and is thus supramenable). More generally, show that every nilpotent group
has subexponential growth and is thus also supramenable.
Exercise 2.2.10. Show that if two finite unions of intervals in R are finitely
R-equidecomposable, then they must have the same total length. (Hint:
Reduce to the case when both sets consist of a single interval. First show
that the lengths of these intervals cannot differ by more than a factor of
two, and then amplify this fact by iteration to conclude the result.)
Remark 2.2.13. We already saw that amenable groups G admit finitely
additive translation-invariant probability measures that measure all subsets
of G (Remark 2.2.11 can be extended to the uncountable case); in fact, this
turns out to be an equivalent definition of amenability. It turns out that
supramenable groups G enjoy a stronger property, namely that given any
non-empty subset A of G, there exists a finitely additive translation-invariant
measure on G that assigns the measure 1 to A; this is basically a deep result
of Tarski.
Proof. Let S be the semigroup generated by g and h (i.e., the set of all words
formed by g and h, including the empty word, i.e., the group identity). Observe
We have seen that the group of rigid motions is not supramenable. Nev-
ertheless, it is still amenable, thanks to the following lemma.
Lemma 2.2.18. Suppose one has a short exact sequence 0 → H → G →
K → 0 of discrete, at most countable, groups, and suppose one has a choice
function φ : K → G that inverts the projection of G to K (the existence
of which is automatic, from the axiom of choice, and also follows if G is
finitely generated). If H and K are amenable, then so is G.
Exercise 2.2.13. Show that all the claims in this section continue to hold
if we replace $SO(2) \ltimes R^2$ by the slightly larger group
$\mathrm{Isom}(R^2) = O(2) \ltimes R^2$ of isometries (not necessarily
orientation-preserving).
Next, we embed the free group inside the rotation group SO(3) using
the following useful lemma (cf. Lemma 2.2.15).
construction.
Section 2.3
Note that if $(X, \mathcal{A})$, $(Y, \mathcal{B})$ are concrete Boolean algebras, and if f : X →
Y is a map which is measurable in the sense that $f^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}$,
then the inverse of f is a Boolean algebra morphism $f^{-1} : \mathcal{B} \to \mathcal{A}$ which goes
in the reverse (i.e., contravariant) direction to that of f . To state Stone’s
representation theorem we need another definition.
Definition 2.3.2 (Stone space). A Stone space is a topological space X =
(X, F) which is compact, Hausdorff, and totally disconnected. Given a Stone
space, define the clopen algebra Cl(X) of X to be the concrete Boolean
algebra on X consisting of the clopen sets (i.e., sets that are both closed
and open).
It is easy to see that Cl(X) is indeed a concrete Boolean algebra for any
topological space X. The additional properties of being compact, Hausdorff,
and totally disconnected are needed in order to recover the topology F of
X uniquely from the clopen algebra. Indeed, we have
Lemma 2.3.3. If X is a Stone space, then the topology F of X is generated
by the clopen algebra Cl(X). Equivalently, the clopen algebra forms an open
base for the topology.
Proof. Let x ∈ X be a point, and let K be the intersection of all the clopen
sets containing x. Clearly, K is closed. We claim that K = {x}. If this is
not the case, then (since X is totally disconnected) K must be disconnected,
thus K can be separated non-trivially into two closed sets K = K1 ∪ K2 .
Since compact Hausdorff spaces are normal, we can write K1 = K ∩ U1 and
K2 = K ∩ U2 for some disjoint open U1 , U2 . Since the intersection of all the
clopen sets containing x with the closed set $(U_1 \cup U_2)^c$ is empty, we see from
the finite intersection property that there must be a finite intersection K
Proof. We will need the binary abstract Boolean algebra {0, 1}, with the
usual Boolean logic operations. We define $X := \mathrm{Hom}(\mathcal{B}, \{0, 1\})$ to be the
space of all morphisms from $\mathcal{B}$ to {0, 1}. Observe that each point x ∈ X can
be viewed as a finitely additive measure $\mu_x : \mathcal{B} \to \{0, 1\}$. In particular,
this makes X a closed subset of $\{0, 1\}^{\mathcal{B}}$ (endowed with the product
topology). The space $\{0, 1\}^{\mathcal{B}}$ is Hausdorff, totally disconnected,
and (by Tychonoff’s theorem, Theorem 1.8.14) compact, and so X is also;
in other words, X is a Stone space. Every $B \in \mathcal{B}$ induces a cylinder set
$C_B \subset \{0, 1\}^{\mathcal{B}}$, consisting of all maps $\mu : \mathcal{B} \to \{0, 1\}$ that map B to 1.
If we define $\phi(B) := C_B \cap X$, it is not hard to see that φ is a morphism
from $\mathcal{B}$ to Cl(X). Since the cylinder sets are clopen and generate the
topology of $\{0, 1\}^{\mathcal{B}}$, we see that the collection $\phi(\mathcal{B})$ of clopen sets
generates the topology of X. Using compactness, we then conclude that every
clopen set is the finite union of finite intersections of elements of
$\phi(\mathcal{B})$; since $\phi(\mathcal{B})$ is an algebra, we thus see that φ is surjective.
The only remaining task is to check that φ is injective. It is sufficient
to show that φ(A) is non-empty whenever A ∈ B is not equal to ∅. But
by Zorn’s lemma (Section 2.4), we can place A inside a maximal proper
filter (i.e., an ultrafilter) p. The indicator $1_p : \mathcal{B} \to \{0, 1\}$ of p can then be
verified to be an element of φ(A), and the claim follows.
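For a finite Boolean algebra the construction in this proof can be carried out by exhaustive search. The sketch below is an illustration only: it takes the power set of a hypothetical three-point set Y as the algebra, enumerates every map into {0, 1}, keeps those that are morphisms, and builds the map φ. All names (`Y`, `ALGEBRA`, `is_morphism`, `stone_space`, `phi`) are assumptions made for this example.

```python
from itertools import combinations, product

Y = (0, 1, 2)
FULL = frozenset(Y)
# The Boolean algebra under study: all subsets of Y.
ALGEBRA = [frozenset(c) for r in range(len(Y) + 1) for c in combinations(Y, r)]


def is_morphism(mu):
    """Check that mu : ALGEBRA -> {0, 1} respects empty set, complement, union, intersection."""
    if mu[frozenset()] != 0 or mu[FULL] != 1:
        return False
    for a in ALGEBRA:
        if mu[FULL - a] != 1 - mu[a]:
            return False
        for b in ALGEBRA:
            if mu[a | b] != max(mu[a], mu[b]) or mu[a & b] != min(mu[a], mu[b]):
                return False
    return True


def stone_space():
    """Enumerate Hom(ALGEBRA, {0, 1}) by brute force over all 2^8 maps."""
    points = []
    for values in product((0, 1), repeat=len(ALGEBRA)):
        mu = dict(zip(ALGEBRA, values))
        if is_morphism(mu):
            points.append(mu)
    return points


def phi(b, points):
    """phi(B): the set of (indices of) points mu with mu(B) = 1."""
    return frozenset(i for i, mu in enumerate(points) if mu[b] == 1)
```

In this finite case the morphisms are exactly the point evaluations, so the space has three points and φ is a bijection from the eight subsets of Y onto the eight subsets of the space; no appeal to Zorn’s lemma is needed when the algebra is finite.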
Remark 2.3.5. If $\mathcal{B} = 2^Y$ is the power set of some set Y , then the Stone
space given by Theorem 2.3.4 is the Stone-Čech compactification of Y (which
we give the discrete topology); see Section 2.5.
Remark 2.3.6. Lemma 2.3.3 and Theorem 2.3.4 can be interpreted as
giving a duality between the category of Boolean algebras and the cate-
gory of Stone spaces, with the duality maps being B → Hom(B, {0, 1}) and
X → Cl(X). (The duality maps are (contravariant) functors which are
However, it turns out that quotienting out by ideals is the only obstruc-
tion to having a Stone-type representation theorem. Namely, we have
∅ and $\bigcap_{n=1}^\infty A^{(1)}_n = \emptyset$, we can find $n_1$ such that
$A \backslash A^{(1)}_{n_1} \neq \emptyset$ (where of course $A \backslash B := A \cap B^c$).
Iterating this, we can find $n_2, n_3, n_4, \dots$ such that
$A \backslash (A^{(1)}_{n_1} \cup \cdots \cup A^{(k)}_{n_k}) \neq \emptyset$ for all k.
Since φ is a Boolean space isomorphism, we conclude that φ(A) is not covered
by any finite subcollection of the $\phi(A^{(1)}_{n_1}), \phi(A^{(2)}_{n_2}), \dots$.
But all of these sets are clopen, so by compactness, φ(A) is not covered by
the entire collection $\phi(A^{(1)}_{n_1}), \phi(A^{(2)}_{n_2}), \dots$. But this
contradicts the fact that φ(A) is covered by the $\bigcap_{n=1}^\infty \phi(A^{(i)}_n)$.
Remark 2.3.11. The proof above actually gives a little bit more structure
on X, A, namely it gives X the structure of a Stone space, with A being
its Baire σ-algebra. Furthermore, the ideal N constructed in the proof is in
fact the ideal of meager Baire sets. The only difficult step is to show that
every closed Baire set S with empty interior is in N , i.e., it is a countable
intersection of clopen sets. To see this, note that S is generated by a count-
able subalgebra of B which corresponds to a continuous map f from X to
the Cantor set K (since K is dual to the free Boolean algebra on countably
many generators). Then f (S) is closed in K and is hence a countable inter-
section of clopen sets in K, which pull back to countably many clopen sets
on X whose intersection is $f^{-1}(f(S))$. But the fact that S is generated by
the subalgebra defining f can easily be seen to imply that $f^{-1}(f(S)) = S$.
Section 2.4
Well-ordered sets,
ordinals, and Zorn’s
lemma
Indeed, we have used this lemma several times already in previous sec-
tions. Given the other standard axioms of set theory, this lemma is logically
equivalent to
Axiom 2.4.2 (Axiom of choice). Let X be a set, and let F be a collection
of non-empty subsets of X. Then there exists a choice function f : F → X,
i.e., a function such that f (A) ∈ A for all A ∈ F .
In the rest of this section I would like to supply the reverse implication,
using the machinery of well-ordered sets. Instead of giving the shortest or
slickest proof of Zorn’s lemma here, I would like to take the opportunity to
place the lemma in the context of several related topics, such as ordinals and
transfinite induction, noting that much of this material is in fact independent
of the axiom of choice. The material here is standard, but for the purposes
of real analysis, one may simply take Zorn’s lemma as a black box and not
worry about the proof.
One of the reasons that well-ordered sets are useful is that one can
perform induction on them. This is easiest to describe for the principle of
strong induction:
Exercise 2.4.1 (Strong induction on well-ordered sets). Let X be a well-
ordered set, and let P : X → {true, false} be a property of elements of X.
Suppose that whenever x ∈ X is such that P (y) is true for all y < x, then
P (x) is true. Then P (x) is true for every x ∈ X. This is called the principle
of strong induction. Conversely, show that a totally ordered set X enjoys the
principle of strong induction if and only if it is well ordered. (For partially
ordered sets, the corresponding notion is that of being well founded.)
As Example 2.4.11 suggests, there are very few morphisms between well-
ordered sets. Indeed, we have
Proposition 2.4.13 (Uniqueness of morphisms). Given two well-ordered
sets X and Y , there is at most one morphism from X to Y .
and so forth. (Of course, to be compatible with the English language
conventions for ordinals, we should write $1^{st}$ instead of $1^{th}$, etc., but let
us ignore this discrepancy.) One can easily check by induction that $n^{th}$ is
an ordinal for every n. Furthermore, if we define $\omega := \{n^{th} : n \in N\}$,
then ω is also an ordinal. (In the foundations of set theory, this
construction, together with the axiom of infinity, is sometimes used to define
the natural numbers (so that $n = n^{th}$ for all natural numbers n), although
this construction can lead to some conceptually strange-looking consequences
that blur the distinction between numbers and sets, such as 3 ∈ 5 and
4 = {0, 1, 2, 3}.)
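These "strange-looking consequences" are easy to reproduce with nested frozensets. The sketch below assumes the usual von Neumann construction, in which the zeroth ordinal is the empty set and the successor of α is α ∪ {α}; the function name `von_neumann` is a hypothetical choice for this illustration.

```python
def von_neumann(n):
    """Return the n-th finite von Neumann ordinal as a frozenset of frozensets."""
    ordinal = frozenset()              # the zeroth ordinal is the empty set
    for _ in range(n):
        ordinal = ordinal | {ordinal}  # successor step: alpha ∪ {alpha}
    return ordinal
```

With this encoding, `von_neumann(3) in von_neumann(5)` holds, and `von_neumann(4)` is literally the set whose elements are the ordinals 0, 1, 2, 3.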
Proof. We first prove (i). From Proposition 2.4.14 and symmetry, we may
assume that there is a morphism φ from α to β. By strong induction (Ex-
ercise 2.4.1) and Definition 2.4.19, we see that φ(x) = x for all x ∈ α, and
so φ is the inclusion map from α into β. The claim follows.
Now we prove (ii). If uniqueness failed, then we would have two distinct
ordinals that are isomorphic to each other, but as one ordinal is a subset of
the other, this would contradict Proposition 2.4.13 (the inclusion morphism
is not an isomorphism); so it suffices to prove existence.
We use transfinite induction. It suffices to show that for every a ∈
X ⊕ {+∞}, the set [min(X), a) is isomorphic to an ordinal α(a) (which we
know to be unique). This is of course true in the base case a = min(X). To
handle the successor case a = succ(b), we set α(a) := α(b) ∪ {α(b)}, which
is easily verified to be an ordinal isomorphic to [min(X), a). To handle
the limit case a = sup([min(X), a)), we take all the ordinals associated to
elements in [min(X), a) and take their union (here we rely crucially on the
axiom schema of replacement and the axiom of union); by use of (i) one can
show that this union is an ordinal isomorphic to [min(X), a), as required.
Remark 2.4.22. Operations on well-ordered sets, such as the sum ⊕ and
product ⊗ defined in Exercises 2.4.3 and 2.4.4, induce corresponding oper-
ations on ordinals, leading to ordinal arithmetic, which we will not discuss
here. (Note that the convention for which order multiplication proceeds in
is swapped in some of the literature, thus αβ would be the ordinal of β ⊗ α
rather than α ⊗ β.)
Now we show a fundamental fact, that the well-ordered sets are just too
numerous to all fit inside a single set, even modulo isomorphism.
Theorem 2.4.24. There does not exist a set A and a representation ρ of
the well-ordered sets such that ρ(X) ∈ A for all well-ordered sets X.
all the well-orderings of subsets of the natural numbers, and taking the union of their associated
ordinals; this construction is due to Hartogs.
Proof. Suppose for contradiction that there existed X and g with the above
properties. Then, given any well-ordered set Y , we claim that there exists
exactly one isomorphism φY : Y → ρ(Y ) from Y to a well-ordered set ρ(Y )
in X such that φY (y) = g(φY ([min(Y ), y))) for all y ∈ Y . Indeed, the
uniqueness and existence can both be established by a transfinite induction
that we leave as an exercise. (Informally, φY is what one gets by “applying
g Y times, starting with the empty set”.) From uniqueness we see that
ρ(Y ) = ρ(Y′) whenever Y and Y′ are isomorphic, and another transfinite
induction shows that ρ(Y ) ⊂ ρ(Y′) whenever Y is a subset of Y′. Thus ρ is
a representation of the ordinals. But this contradicts Theorem 2.4.24.
Remark 2.4.28. One can use transfinite induction on ordinals rather than
well-ordered sets if one wishes here, using Remark 2.4.26 in place of Theorem
2.4.24.
Proof of Zorn’s lemma. Suppose for contradiction that one had a non-
empty partially ordered set X without maximal elements, such that every
chain had an upper bound. As there are no maximal elements, every element
in X must be bounded by a strictly larger element in X, and so every chain
in fact has a strict upper bound; in particular every well-ordered set has
a strict upper bound. Applying the axiom of choice, we may thus find a
choice function g : C → X from the space of well-ordered sets in X to X,
that maps every such set to a strict upper bound. But this contradicts
Proposition 2.4.27.
Remark 2.4.29. It is important for Zorn’s lemma that X is a set, rather
than a class. Consider for instance the class of all ordinals. Every chain
of ordinals has an upper bound (namely, the union of the ordinals in that
chain), and the class is certainly non-empty, but there is no maximal ordinal.
(Compare also Theorem 2.4.21 and Theorem 2.4.24.)
Remark 2.4.30. It is also important that every chain have an upper bound,
and not just countable chains. Indeed, the collection of countable subsets
of an uncountable set (such as R) is non-empty, and every countable chain
has an upper bound, but there is no maximal element.
Remark 2.4.31. The above argument shows that the hypothesis of Zorn’s
lemma can be relaxed slightly; one does not need every chain to have an
upper bound, merely every well-ordered set needs to have one. But I
do not know of any application in which this apparently stronger version
of Zorn’s lemma dramatically simplifies an argument. (In practice, either
Zorn’s lemma can be applied routinely, or it fails utterly to be applicable at
all.)
Exercise 2.4.11. Use Zorn’s lemma to establish the well-ordering theorem
(every set has at least one well-ordering).
Remark 2.4.32. By the above exercise, R can be well-ordered. However,
if one drops the axiom of choice from the axioms of set theory, one can no
longer prove that R is well-ordered. Indeed, given a well-ordering of R, it
is not difficult (using Remark 2.4.8) to remove the axiom of choice from the
Banach-Tarski constructions in Section 2.2, and thus obtain constructions of
non-measurable subsets of R. But a deep theorem of Solovay gives a model
of set theory (without the axiom of choice) in which every set of reals is
measurable.
Exercise 2.4.12. Define a (von Neumann) cardinal to be an ordinal α with
the property that all smaller ordinals have strictly lesser cardinality (i.e.,
cannot be placed in one-to-one correspondence with α). Show that every
set can be placed in one-to-one correspondence with exactly one cardinal.
(This gives a representation of the category of sets, similar to how ord gives
a representation of well-ordered sets.)
It seems appropriate to close these notes with a quote from Jerry Bona:
The Axiom of Choice is obviously true, the well-ordering prin-
ciple obviously false, and who can tell about Zorn’s Lemma?
Section 2.5
Compactification and
metrisation
and unique, by the open dense nature of ι(X). Two compactifications are
equivalent if they are both finer than each other.
Example 2.5.2. Any compact set can be its own compactification. The real
line R can be compactified into [−π/2, π/2] by using the arctan function as
the embedding, or (equivalently) by embedding it into the extended real line
[−∞, ∞]. It can also be compactified into the unit circle
$\{(x, y) \in R^2 : x^2 + y^2 = 1\}$ by using the stereographic projection
$x \mapsto (\frac{2x}{1+x^2}, \frac{x^2-1}{1+x^2})$. Notice that
the former embedding is finer than the latter. The plane $R^2$ can similarly
be compactified into the unit sphere $\{(x, y, z) \in R^3 : x^2 + y^2 + z^2 = 1\}$ by
the stereographic projection
$(x, y) \mapsto (\frac{2x}{1+x^2+y^2}, \frac{2y}{1+x^2+y^2}, \frac{x^2+y^2-1}{1+x^2+y^2})$.
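A quick numerical check of these embeddings may help. The sketch below is an illustration only, with hypothetical function names `circle_embedding` and `sphere_embedding`: it evaluates the two stereographic formulas and lets one confirm that the images lie on the unit circle and unit sphere, and that x → +∞ and x → −∞ are sent to the same point (0, 1) of the circle, whereas arctan keeps the two ends apart.

```python
import math


def circle_embedding(x):
    """Stereographic embedding of the real line into the unit circle."""
    d = 1.0 + x * x
    return (2.0 * x / d, (x * x - 1.0) / d)


def sphere_embedding(x, y):
    """Stereographic embedding of the plane into the unit sphere."""
    d = 1.0 + x * x + y * y
    return (2.0 * x / d, 2.0 * y / d, (x * x + y * y - 1.0) / d)
```

The fact that both ends of the line approach the single point (0, 1) is the concrete reason the circle is the coarser of the two compactifications of R described in this example.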
Example 2.5.4. From the above exercise, we can define limits
$\lim_{x \to p} f(x) := \beta f(p)$ for any bounded continuous function f on X and
any p ∈ βX. But for coarser compactifications, one can only take limits for
special types of bounded continuous functions; for instance, using the
one-point compactification of R, $\lim_{x \to \infty} f(x)$ need not exist for a
bounded continuous function f : R → R, e.g., $\lim_{x \to \infty} \sin(x)$ or
$\lim_{x \to \infty} \arctan(x)$ do not exist. The finer the compactification, the
more limits can be defined; for instance the two-point compactification
[−∞, +∞] of R allows one to define the limits $\lim_{x \to +\infty} f(x)$ and
$\lim_{x \to -\infty} f(x)$ for some additional functions f (e.g.,
$\lim_{x \to \pm\infty} \arctan(x)$ is well defined); and the Stone-Čech
compactification is the only compactification which allows one to take limits
for any bounded continuous function (e.g., $\lim_{x \to p} \sin(x)$ is well
defined for all p ∈ βR).
Section 2.6
Hardy’s uncertainty
principle
$\int_R |f(x)|^2\, dx = \int_R |\hat f(\xi)|^2\, d\xi = 1,$
absolute constant C0 .
x ∈ R, and for some C, a > 0. Then $\hat f$ is smooth, and furthermore one has
the bound $|\partial_\xi^k \hat f(\xi)| \le \frac{C}{\sqrt{a}} \frac{k!\, \pi^{k/2}}{(k/2)!\, a^{k/2}}$
for all ξ ∈ R and every even integer k.
a R
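k times at ξ = 0, we obtain
$\frac{d^k}{d\xi^k} \left( \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a} \right) \Big|_{\xi=0} = \int_R e^{-\pi a x^2} (2\pi i x)^k\, dx;$
expanding out $\frac{1}{\sqrt{a}} e^{-\pi \xi^2/a}$ using Taylor series, we conclude that
$\frac{k!}{\sqrt{a}} \frac{(-\pi/a)^{k/2}}{(k/2)!} = \int_R e^{-\pi a x^2} (2\pi i x)^k\, dx.$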
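The Gaussian moment identity can be sanity-checked numerically. The sketch below is an illustration only, with hypothetical function names: for even k it compares a trapezoid-rule approximation of the integral of e^{-πax²}(2πx)^k over R against the closed form k! (π/a)^{k/2} / ((k/2)! √a), which is the absolute value of both sides of the identity above. The truncation width and step count are ad hoc choices.

```python
import math


def gaussian_moment_numeric(a, k, steps=20000):
    """Trapezoid-rule approximation of the integral of exp(-pi a x^2) (2 pi x)^k over R."""
    half_width = 12.0 / math.sqrt(a)   # the integrand is negligible beyond this range
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-math.pi * a * x * x) * (2.0 * math.pi * x) ** k
    return total * h


def gaussian_moment_closed_form(a, k):
    """k! (pi/a)^(k/2) / ((k/2)! sqrt(a)), valid for even k and a > 0."""
    if k % 2 != 0:
        raise ValueError("k must be even")
    return (math.factorial(k) * (math.pi / a) ** (k / 2)
            / (math.factorial(k // 2) * math.sqrt(a)))
```

For a = 1 and k = 2 both expressions reduce to 2π, which gives a quick hand check of the formula.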
ak
for ξ ∈ [−r, r]. If we pick $r := \sqrt{\frac{k}{cb}}$ for a sufficiently small
absolute constant c, we conclude that
$|\hat f(\xi)| \le 2^{-k} + O(\frac{1}{ab})^{k/2}$
(say) for ξ ∈ [−r, r]. If ab ≥ C0 for large enough C0 , the right-hand side
goes to zero as k → ∞ (which also implies r → ∞), and we conclude that fˆ
(and hence f ) vanishes identically.
Section 2.7
Create an epsilon of
room
S_0 | S_ε
f(x_0) = g(x_0) | f(x_ε) = g(x_ε) + o(1)
f(x_0) ≤ g(x_0) | f(x_ε) ≤ g(x_ε) + o(1)
f(x_0) > 0 | f(x_ε) ≥ c − o(1) for some c > 0 independent of ε
f(x_0) is finite | f(x_ε) is bounded uniformly in ε
f(x_0) ≥ f(x) for all x ∈ X (i.e., x_0 maximises f) | f(x_ε) ≥ f(x) − o(1) for all x ∈ X (i.e., x_ε nearly maximises f)
f_n(x_0) converges as n → ∞ | f_n(x_ε) fluctuates by at most o(1) for sufficiently large n
f_0 is a measurable function | f_ε is a measurable function converging pointwise to f_0
f_0 is a continuous function | f_ε is an equicontinuous family of functions converging pointwise to f_0, OR f_ε is continuous and converges (locally) uniformly to f_0
The event E_0 holds a.s. | The event E_ε holds with probability 1 − o(1)
The statement P_0(x) holds for a.e. x | The statement P_ε(x) holds for x outside of a set of measure o(1)
5 It is important to note that it is only the final conclusion S_ε on x_ε that needs to have this
uniformity in ε; one is permitted to have some intermediate stages in the derivation of S_ε that
depend on ε in a non-uniform manner, so long as these non-uniformities cancel out or otherwise
disappear at the end of the argument.
6 It is important to be aware, though, that any quantitative measure of how smooth, discrete,
finite, etc., x_ε is should be expected to degrade in the limit ε → 0, and so one should take extreme
caution in using such quantitative measures to derive estimates that are uniform in ε.
2.7.1. Examples. The soft analysis components of any real analysis text-
book will contain a large number of examples of this trick in action. In
particular, any argument which exploits Littlewood’s three principles of real
analysis is likely to utilise this trick. Of course, this trick also occurs repeat-
edly in Chapter , and thus was chosen as the title of this book.
Example 2.7.1 (Riemann-Lebesgue lemma). Given any absolutely integrable
function $f \in L^1(R)$, the Fourier transform $\hat f : R \to C$ is defined by
the formula
$\hat f(\xi) := \int_R f(x) e^{-2\pi i x \xi}\, dx.$
The point is that fε is much better behaved than f , and it is not difficult
to show the analogue of the Riemann-Lebesgue lemma for fε . Indeed, being
smooth and compactly supported, we can now justifiably integrate by parts
to obtain
$\hat f_\varepsilon(\xi) = \frac{1}{2\pi i \xi} \int_R f'_\varepsilon(x) e^{-2\pi i x \xi}\, dx$
for any non-zero ξ, and it is now clear (since $f'_\varepsilon$ is bounded and compactly
supported) that $\hat f_\varepsilon(\xi) \to 0$ as ξ → ∞.
Now we need to take limits as ε → 0. It will be enough to have fˆε
converge uniformly to $\hat f$. But from (2.21) and the basic estimate
(2.22) $\sup_\xi |\hat g(\xi)| \le \int_R |g(x)|\, dx$
(which is the single hard analysis ingredient in the proof of the lemma)
applied to g := f − f_ε, we see (by the linearity of the Fourier transform)
that
$\sup_\xi |\hat f(\xi) - \hat f_\varepsilon(\xi)| \le \varepsilon,$
and we obtain the desired uniform convergence.
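The decay obtained from integration by parts can be observed numerically for a concrete smooth, compactly supported function. The sketch below is an illustration only: the bump function exp(−1/(1 − x²)) on (−1, 1) and all function names are assumptions made for this example, not taken from the text. Since this bump rises monotonically to its maximum at 0 and then falls, the L¹ norm of its derivative is twice its maximum, which gives the explicit bound |f̂(ξ)| ≤ (2/e) / (2π|ξ|).

```python
import cmath
import math


def bump(x):
    """A smooth function supported in [-1, 1] (an illustrative choice of f_eps)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def fourier_transform(f, xi, lo=-1.0, hi=1.0, steps=20000):
    """Trapezoid-rule approximation of the integral of f(x) exp(-2 pi i x xi) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for j in range(steps + 1):
        x = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * h


def decay_bound(xi):
    """(L^1 norm of bump') / (2 pi |xi|), using that the norm equals 2 * bump(0) = 2 / e."""
    return (2.0 / math.e) / (2.0 * math.pi * abs(xi))
```

Evaluating `abs(fourier_transform(bump, xi))` for increasing `xi` and comparing with `decay_bound(xi)` shows the 1/|ξ| envelope at work; for a function this smooth the actual decay is far faster than the bound.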
Remark 2.7.2. The same argument also shows that fˆ is continuous; we
leave this as an exercise to the reader. See also Exercise 1.12.11 for the
generalisation of this lemma to other locally compact abelian groups.
Remark 2.7.3. Example 2.7.1 is a model case of a much more general
instance of the limiting argument: in order to prove a convergence or conti-
nuity theorem for all rough functions in a function space, it suffices to first
prove convergence or continuity for a dense subclass of smooth functions,
and combine that with some quantitative estimate in the function space (in
this case, (2.22)) in order to justify the limiting argument. See Corollary
1.7.7 for an important example of this principle.
Example 2.7.4. The limiting argument in Example 2.7.1 relied on the
linearity of the Fourier transform f → fˆ. But, with more effort, it is also
possible to extend this type of argument to non-linear settings. We will
sketch (omitting several technical details, which can be found for instance
in my PDE book [Ta2006]) a very typical instance. Consider a nonlinear
PDE, e.g., the cubic non-linear wave equation
(2.23) $-u_{tt} + u_{xx} = u^3,$
where $u : \mathbf{R} \times \mathbf{R} \to \mathbf{R}$ is some scalar field, and the t and x subscripts denote
differentiation of the field u(t, x). If u is sufficiently smooth and sufficiently
decaying at spatial infinity, one can show that the energy
(2.24) $E(u)(t) := \int_{\mathbf{R}} \frac{1}{2} |u_t(t, x)|^2 + \frac{1}{2} |u_x(t, x)|^2 + \frac{1}{4} |u(t, x)|^4 \,dx$
is conserved, thus E(u)(t) = E(u)(0) for all t. Indeed, this can be formally
justified by computing the derivative ∂t E(u)(t) by differentiating under the
integral sign, integrating by parts, and then applying the PDE (2.23); we
leave this as an exercise for the reader.⁷ However, these justifications do require
a fair amount of regularity on the solution u; for instance, requiring u to
be three times continuously differentiable in space and time, and compactly
supported in space on each bounded time interval, would be sufficient to
make the computations rigorous by applying “off the shelf” theorems about
differentiation under the integration sign, etc.
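For reference, the formal computation runs as follows. This is a sketch under the assumption that u is smooth and compactly supported in space, so that differentiation under the integral sign is legitimate and the boundary terms from integration by parts vanish.

```latex
\begin{align*}
\partial_t E(u)(t)
  &= \int_{\mathbf{R}} u_t u_{tt} + u_x u_{xt} + u^3 u_t \, dx
     && \text{(differentiate under the integral)} \\
  &= \int_{\mathbf{R}} u_t u_{tt} - u_{xx} u_t + u^3 u_t \, dx
     && \text{(integrate the middle term by parts in } x\text{)} \\
  &= \int_{\mathbf{R}} u_t \left( u_{tt} - u_{xx} + u^3 \right) dx = 0
     && \text{(apply (2.23))}.
\end{align*}
```

The last step uses (2.23) in the rearranged form $u_{tt} - u_{xx} + u^3 = 0$.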
But suppose one only has a much rougher solution, for instance an energy
class solution which has finite energy (2.24), but for which higher derivatives
of u need not exist in the classical sense.⁸ Then it is difficult to justify the
energy conservation law directly. However, it is still possible to obtain energy
conservation by the limiting argument. Namely, one takes the energy class
solution u at some initial time (e.g., t = 0) and approximates that initial data
(the initial position $u(0)$ and initial velocity $u_t(0)$) by a much smoother (and
compactly supported) choice $(u^{(\varepsilon)}(0), u_t^{(\varepsilon)}(0))$ of initial data, which converges
back to $(u(0), u_t(0))$ in a suitable energy topology related to (2.24), which
we will not define here (it is based on Sobolev spaces, which are discussed
in Section 1.14). It then turns out (from the existence theory of the PDE
(2.23)) that one can extend the smooth initial data $(u^{(\varepsilon)}(0), u_t^{(\varepsilon)}(0))$ to other
times t, providing a smooth solution $u^{(\varepsilon)}$ to that data. For this solution, the
energy conservation law $E(u^{(\varepsilon)})(t) = E(u^{(\varepsilon)})(0)$ can be justified.
Now we take limits as $\varepsilon \to 0$ (keeping t fixed). Since $(u^{(\varepsilon)}(0), u_t^{(\varepsilon)}(0))$
converges in the energy topology to $(u(0), u_t(0))$, and the energy functional
E is continuous in this topology, $E(u^{(\varepsilon)})(0)$ converges to $E(u)(0)$. To conclude
the argument, we will also need $E(u^{(\varepsilon)})(t)$ to converge to $E(u)(t)$,
which will be possible if $(u^{(\varepsilon)}(t), u_t^{(\varepsilon)}(t))$ converges in the energy topology
to $(u(t), u_t(t))$. This in turn follows from a fundamental fact (which requires
a certain amount of effort to prove) about the PDE (2.23), namely
that it is well-posed in the energy class. This means that not only do solutions
exist uniquely for initial data in the energy class, but they also
depend continuously on the initial data in the energy topology; small perturbations
in the data lead to small perturbations in the solution, or more
formally, the map $(u(0), u_t(0)) \mapsto (u(t), u_t(t))$ from data to solution (say,
at some fixed time t) is continuous in the energy topology. This final fact
concludes the limiting argument and gives us the desired conservation law
$E(u)(t) = E(u)(0)$.
⁷ There are also more fancy ways to see why the energy is conserved, using Hamiltonian or
Lagrangian mechanics or by the more general theory of stress-energy tensors, but we will not
discuss these here.
⁸ There is a non-trivial issue regarding how to make sense of the PDE (2.23) when u is only
in the energy class, since the terms $u_{tt}$ and $u_{xx}$ do not then make sense classically, but there are
standard ways to deal with this, e.g., using weak derivatives; see Section 1.13.
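The whole limiting argument can be condensed into one chain of equalities. This is only a summary sketch; each equality relies on a fact quoted in the text, not on anything proven here.

```latex
\[
  E(u)(t)
  = \lim_{\varepsilon \to 0} E(u^{(\varepsilon)})(t)
  = \lim_{\varepsilon \to 0} E(u^{(\varepsilon)})(0)
  = E(u)(0).
\]
```

The first equality uses well-posedness (continuous dependence on the data at time t) together with continuity of E in the energy topology, the second is energy conservation for the smooth solutions $u^{(\varepsilon)}$, and the third uses convergence of the initial data.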
Remark 2.7.5. It is important that one have a suitable well-posedness
theory in order to make the limiting argument work for rough solutions to
a PDE; without such a well-posedness theory, it is possible for quantities
which are formally conserved to cease being conserved when the solutions
become too rough or otherwise weak; energy, for instance, could disappear
into a singularity and not come back.
Example 2.7.6 (Maximum principle). The maximum principle is a fundamental
tool in elliptic and parabolic PDE (for example, it is used heavily
in the proof of the Poincaré conjecture, discussed extensively in Poincaré’s
legacies, Vol. II). Here is a model example of this principle:
Proposition 2.7.7. Let $u : D \to \mathbf{R}$ be a smooth harmonic function on
the closed unit disk $D := \{(x, y) : x^2 + y^2 \le 1\}$. If M is a bound such that
$u(x, y) \le M$ on the boundary $\partial D := \{(x, y) : x^2 + y^2 = 1\}$, then $u(x, y) \le M$
on the interior as well.
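As a numerical sanity check of the statement (not a proof), one can sample a concrete harmonic function on the disk and compare its boundary and interior maxima. The choice $u(x, y) = x^2 - y^2$ and the helper names `u`, `boundary_max` and `interior_max` are assumptions made purely for this illustration.

```python
import math


def u(x, y):
    # A sample harmonic function: u_xx + u_yy = 2 - 2 = 0.
    return x * x - y * y


def boundary_max(num_points=3600):
    # Largest value of u over equally spaced sample points of the unit circle.
    best = -math.inf
    for k in range(num_points):
        theta = 2 * math.pi * k / num_points
        best = max(best, u(math.cos(theta), math.sin(theta)))
    return best


def interior_max(grid=200):
    # Largest value of u over grid points strictly inside the unit disk.
    best = -math.inf
    for i in range(-grid, grid + 1):
        for j in range(-grid, grid + 1):
            x, y = i / grid, j / grid
            if x * x + y * y < 1.0:
                best = max(best, u(x, y))
    return best


if __name__ == "__main__":
    print("boundary max:", boundary_max())
    print("interior max:", interior_max())
```

On this sample the interior maximum stays below the boundary maximum, as the proposition predicts with M taken to be the boundary maximum.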
u(t) = tg ∗ σt
Section 2.8
Amenability
Theorem 2.8.1. Let G be a countable group. Then the following are equiv-
alent:
(ii) For every finite set S ⊂ G and every ε > 0, there exists a finite mean
ν such that $\|\nu - \tau_x \nu\|_{\ell^1(G)} \le \varepsilon$ for all x ∈ S.
(iii) For every finite set S ⊂ G and every ε > 0, there exists a non-empty
finite set A ⊂ G such that |(x · A)ΔA|/|A| ≤ ε for all x ∈ S.
(iv) There exists a sequence An of non-empty finite sets such that
|x · An ΔAn |/|An | → 0 as n → ∞ for each x ∈ G. (Such a sequence
is called a Følner sequence.)
(iii) implies (iv): Write the countable group G as the increasing union
of finite sets Sn and apply (iii) with ε := 1/n and S := Sn to create the set
An .
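(iv) implies (i): Use the Hahn-Banach theorem to select an infinite mean
$\rho \in \ell^\infty(\mathbf{N})^* \setminus \ell^1(\mathbf{N})$, and define $\lambda(m) = \rho\bigl((\langle m, \frac{1}{|A_n|} 1_{A_n} \rangle)_{n \in \mathbf{N}}\bigr)$. (Alternatively,
one can define $\lambda(m)$ to be an ultralimit of the $\langle m, \frac{1}{|A_n|} 1_{A_n} \rangle$.)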
Any countable group obeying any (and hence all) of (i)–(iv) is called
amenable.
Remark 2.8.2. The above equivalences are proven in a non-constructive
manner, due to the use of the Hahn-Banach theorem (as well as the con-
tradiction argument). Thus, for instance, it is not immediately obvious
how to convert an invariant mean into a Følner sequence, despite the above
equivalences.
2.8.2. Examples of amenable groups. We give some model examples
of amenable and non-amenable groups:
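Proposition 2.8.3. Every finite group is amenable.
Proof. Trivial (either using invariant means or Følner sequences).
Proposition 2.8.4. The integers Z = (Z, +) are amenable.
Proof. One can take the sets $A_N = \{1, \ldots, N\}$ as the Følner sequence, or
an ultralimit as an invariant mean.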
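The Følner property of the sets $A_N = \{1, \ldots, N\}$, viewed as subsets of the integers under addition, can be checked directly: translating by x changes at most 2|x| elements, so the ratio $|(x + A_N) \Delta A_N| / |A_N|$ tends to zero. The following is a minimal sketch; the function name `folner_ratio` is a hypothetical helper introduced for this example.

```python
def folner_ratio(shift, n):
    # A_N = {1, ..., N} as a finite subset of the integers.
    a = set(range(1, n + 1))
    # The translate x + A_N (the group operation on Z is addition).
    shifted = {shift + k for k in a}
    # |(x + A_N) symmetric-difference A_N| / |A_N|
    return len(a ^ shifted) / len(a)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, folner_ratio(3, n))
```

For a fixed shift x and N ≥ |x| the ratio equals 2|x|/N, which is the quantitative form of the Følner condition in this case.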
Proposition 2.8.5. The free group F2 on two generators e1 , e2 is not
amenable.
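As a purely illustrative computation (it does not prove the proposition, since non-amenability requires ruling out every candidate Følner sequence and not just balls), one can check that the word-metric balls in $F_2$ are very far from being Følner sets. In the sketch below, elements of $F_2$ are encoded as tuples of the letters a, A, b, B, with A and B standing for the inverses of the two generators; this encoding and the helper names `ball`, `left_multiply` and `ball_ratio` are assumptions made for the example.

```python
# Each letter maps to its inverse; A = a^{-1}, B = b^{-1}.
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}


def ball(radius):
    # All reduced words of length <= radius, built one letter at a time.
    words = {()}
    frontier = {()}
    for _ in range(radius):
        next_frontier = set()
        for w in frontier:
            for letter in INVERSE:
                # Appending a letter keeps the word reduced unless it
                # cancels the current last letter.
                if not w or INVERSE[w[-1]] != letter:
                    next_frontier.add(w + (letter,))
        words |= next_frontier
        frontier = next_frontier
    return words


def left_multiply(letter, word):
    # Reduced form of the product letter * word.
    if word and word[0] == INVERSE[letter]:
        return word[1:]
    return (letter,) + word


def ball_ratio(radius, letter="a"):
    # |(x . B_n) symmetric-difference B_n| / |B_n| for the generator x = letter.
    b = ball(radius)
    moved = {left_multiply(letter, w) for w in b}
    return len(moved ^ b) / len(b)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, len(ball(n)), ball_ratio(n))
```

The ratio stays above 1 for every radius, in contrast with the integers, where the analogous ratio for intervals tends to zero.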
Proof. Using invariant means, there is a very short proof: Given invariant
means $\lambda_H$, $\lambda_K$ for H, K, we can build an invariant mean $\lambda_G$ for G by the
formula
$$\lambda_G(f) := \lambda_K(F)$$
for any $f \in \ell^\infty(G)$, where $F : K \to \mathbf{R}$ is the function defined as $F(xH) :=
\lambda_H(f(x\cdot))$ for all cosets xH (note that the left-invariance of $\lambda_H$ shows that
the exact choice of coset representative x is irrelevant). (One can view $\lambda_G$
as sort of a “product measure” of the $\lambda_H$ and $\lambda_K$.)
Now we argue using Følner sequences instead. Let En , Fn be Følner
sequences for H, K, respectively. Let S be a finite subset of G, and let
ε > 0. We would like to find a finite non-empty subset A ⊂ G such that
|(x · A)\A| ≤ ε|A| for all x ∈ S; this will demonstrate amenability. (Note
that by taking S to be symmetric, we can replace |(x·A)\A| with |(x·A)ΔA|
without difficulty.)
By taking n large enough, we can find $F_n$ such that $\pi(x) \cdot F_n$ differs
from $F_n$ by at most $\varepsilon |F_n| / 2$ elements for all x ∈ S, where π : G → K is the
projection map. Now, let $\tilde F_n$ be a pre-image of $F_n$ in G (one representative
for each element of $F_n$). Let T be the set of
all group elements t ∈ H such that $S \cdot \tilde F_n$ intersects $\tilde F_n \cdot t$. Observe that T
is finite. Thus, by taking m large enough, we can find $E_m$ such that $t \cdot E_m$
differs from $E_m$ by at most $\varepsilon |E_m| / (2|T|)$ points for all t ∈ T.
Now set $A := \tilde F_n \cdot E_m = \{zy : y \in E_m, z \in \tilde F_n\}$. Observe that the sets
$z \cdot E_m$ for $z \in \tilde F_n$ lie in disjoint cosets of H and so $|A| = |E_m| |\tilde F_n| = |E_m| |F_n|$.
Now take x ∈ S, and consider an element of (x · A)\A. This element must
take the form xzy for some $y \in E_m$ and $z \in \tilde F_n$. The coset of H that xzy
lies in is given by π(x)π(z). Suppose first that π(x)π(z) lies outside of $F_n$.
By construction, this occurs for at most $\varepsilon |F_n| / 2$ choices of z, leading to at
most $\varepsilon |E_m| |F_n| / 2 = \varepsilon |A| / 2$ elements in (x · A)\A.
Bibliography
[BoKe1996] E. Bogomolny, J. Keating, Random matrix theory and the Riemann zeros. II.
n-point correlations, Nonlinearity 9 (1996), no. 4, 911–935.
[Bo1969] A. Borel, Injective endomorphisms of algebraic varieties, Arch. Math. (Basel)
20 (1969), 531–537.
[Bo1999] J. Bourgain, On the dimension of Kakeya sets and related maximal inequalities,
Geom. Funct. Anal. 9 (1999), no. 2, 256–282.
[BoBr2003] J. Bourgain, H. Brezis, On the equation div Y = f and application to control
of phases, J. Amer. Math. Soc. 16 (2003), no. 2, 393–426.
[BudePvaR2008] G. Buskes, B. de Pagter, A. van Rooij, The Loomis-Sikorski theorem
revisited, Algebra Universalis 58 (2008), 413–426.
[ChPa2009] T. Chen, N. Pavlovic, The quintic NLS as the mean field limit of a boson gas
with three-body interactions, preprint.
[ClEdGuShWe1990] K. Clarkson, H. Edelsbrunner, L. Guibas, M. Sharir, E. Welzl, Com-
binatorial complexity bounds for arrangements of curves and spheres, Discrete Com-
put. Geom. 5 (1990), no. 2, 99–160.
[Co1989] J. B. Conrey, More than two fifths of the zeros of the Riemann zeta function are
on the critical line, J. Reine Angew. Math. 399 (1989), 1–26.
[Dy1970] F. Dyson, Correlations between eigenvalues of a random matrix, Comm. Math.
Phys. 19 (1970), 235–250.
[ElSz2008] G. Elek, B. Szegedy, A measure-theoretic approach to the theory of dense hy-
pergraphs, preprint.
[ElObTa2009] J. Ellenberg, R. Oberlin, T. Tao, The Kakeya set and maximal conjectures
for algebraic varieties over finite fields, preprint.
[ElVeWe2009] J. Ellenberg, A. Venkatesh, C. Westerland, Homological stability for Hur-
witz spaces and the Cohen-Lenstra conjecture over function fields, preprint.
[ErKa1940] P. Erdős, M. Kac, The Gaussian law of errors in the theory of additive
number theoretic functions, Amer. J. Math. 62 (1940), no. 1/4, 738–742.
[EsKePoVe2008] L. Escauriaza, C. E. Kenig, G. Ponce, L. Vega, Hardy’s uncertainty
principle, convexity and Schrödinger evolutions, J. Eur. Math. Soc. (JEMS) 10 (2008),
no. 4, 883–907.
[Fa2003] K. Falconer, Fractal geometry, Mathematical foundations and applications. Sec-
ond edition. John Wiley & Sons, Inc., Hoboken, NJ, 2003.
[FeSt1972] C. Fefferman, E. M. Stein, $H^p$ spaces of several variables, Acta Math. 129
(1972), no. 3–4, 137–193.
[FiMaSh2007] E. Fischer, A. Matsliach, A. Shapira, Approximate Hypergraph Partitioning
and Applications, Proc. of FOCS 2007, 579–589.
[Fo2000] G. Folland, Real Analysis, Modern techniques and their applications. Second edi-
tion. Pure and Applied Mathematics (New York). A Wiley-Interscience Publication.
John Wiley & Sons, Inc., New York, 1999.
[Fo1955] E. Følner, On groups with full Banach mean value, Math. Scand. 3 (1955), 243–
254.
[Fo1974] J. Fournier, Majorants and Lp norms, Israel J. Math. 18 (1974), 157–166.
[Fr1973] G. Freiman, Groups and the inverse problems of additive number theory, Number-
theoretic studies in the Markov spectrum and in the structural theory of set addition,
pp. 175–183. Kalinin. Gos. Univ., Moscow, 1973.
[Ta2005] M. Talagrand, The generic chaining. Upper and lower bounds of stochastic pro-
cesses. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2005.
[Ta1951] A. Tarski, A decision method for elementary algebra and geometry, 2nd ed.
University of California Press, Berkeley and Los Angeles, Calif., 1951.
[Ta] T. Tao, Summability of functions, unpublished preprint.
[Ta2006] T. Tao, Nonlinear dispersive equations. Local and global analysis. CBMS Re-
gional Conference Series in Mathematics, 106. Published for the Conference Board of
the Mathematical Sciences, Washington, DC; by the American Mathematical Society,
Providence, RI, 2006.
[Ta2006b] T. Tao, A quantitative ergodic theory proof of Szemerédi’s theorem, Electron.
J. Combin. 13 (2006), no. 1.
[Ta2006c] T. Tao, Szemerédi’s regularity lemma revisited, Contrib. Discrete Math. 1
(2006), no. 1, 8–28.
[Ta2007] T. Tao, A correspondence principle between (hyper)graph theory and probability
theory, and the (hyper)graph removal lemma, J. Anal. Math. 103 (2007), 1–45.
[Ta2007b] T. Tao, Structure and randomness in combinatorics, Proceedings of the 48th
Annual Symposium on Foundations of Computer Science (FOCS) 2007, 3-18.
[Ta2008] T. Tao, Structure and Randomness: pages from year one of a mathematical blog,
American Mathematical Society, Providence RI, 2008.
[Ta2009] T. Tao, Poincaré’s Legacies: pages from year two of a mathematical blog, Vols.
I, II, American Mathematical Society, Providence RI, 2009.
[Ta2010] T. Tao, The high exponent limit p → ∞ for the one-dimensional nonlinear wave
equation, preprint.
[Ta2010b] T. Tao, A remark on partial sums involving the Möbius function, preprint.
[Ta2010c] T. Tao, Sumset and inverse sumset theorems for Shannon entropy, preprint.
[TaVu2006] T. Tao, V. Vu, On random ±1 matrices: singularity and determinant, Random
Structures Algorithms 28 (2006), no. 1, 1–23.
[TaVu2006b] T. Tao, V. Vu, Additive combinatorics. Cambridge Studies in Advanced
Mathematics, 105. Cambridge University Press, Cambridge, 2006.
[TaVu2007] T. Tao, V. Vu, On the singularity probability of random Bernoulli matrices,
J. Amer. Math. Soc. 20 (2007), 603–628.
[TaWr2003] T. Tao, J. Wright, $L^p$ improving bounds for averages along curves, J. Amer.
Math. Soc. 16 (2003), no. 3, 605–638.
[Th1994] W. Thurston, On proof and progress in mathematics, Bull. Amer. Math. Soc.
(N.S.) 30 (1994), no. 2, 161–177.
[To2005] C. Toth, The Szemerédi-Trotter Theorem in the Complex Plane, preprint.
[Uc1982] A. Uchiyama, A constructive proof of the Fefferman-Stein decomposition of
BMO($\mathbf{R}^n$), Acta Math. 148 (1982), 215–241.
[VuWoWo2010] V. Vu, M. Wood, P. Wood, Mapping incidences, preprint.
[Wo1995] T. Wolff, An improved bound for Kakeya type maximal functions, Rev. Mat.
Iberoamericana 11 (1995), no. 3, 651–674.
[Wo1998] T. Wolff, A mixed norm estimate for the X-ray transform, Rev. Mat. Iberoamer-
icana 14 (1998), no. 3, 561–600.
[Wo2003] T. Wolff, Lectures on harmonic analysis. With a foreword by Charles Fefferman
and preface by Izabella Laba. Edited by Laba and Carol Shubin. University Lecture
Series, 29. American Mathematical Society, Providence, RI, 2003. x+137 pp.