Integration 2

Lecture notes on integration theory

Uploaded by

Oliver R Diaz

Copyright © 2010 Oliver Díaz Espinosa

Permission is granted to copy and distribute this document for academic purposes.
Integration and Measure Theory

Oliver R. Díaz–Espinosa

SAMSI–Duke University
Current address: Precision Health Economics
Email address: [email protected]
2010 Mathematics Subject Classification. 28-02
Contents

Preface xv
Chapter 1. Elements of set theory 1
§1.1. Naive set theory 1
§1.2. Order sets and transfinite induction 4
§1.3. The Axiom of choice 9
§1.4. Cardinality 11
§1.5. Simple algebraic structures 14
§1.6. Exercises 16
Chapter 2. Elements of point set Topology 19
§2.1. General definitions 19
§2.2. Connected spaces 23
§2.3. Convergence 27
§2.4. Compactness 31
§2.5. Metric spaces 33
§2.6. Banach fixed point theorem 38
§2.7. Uniformities 39
§2.8. Product topology 40
§2.9. Urysohn metrization 44
§2.10. Arzelà–Ascoli theorem 51
§2.11. Locally compact Hausdorff spaces 53
§2.12. Exercises 56
Chapter 3. Basic measure theory 61

§3.1. Measurable spaces 61
§3.2. Measure spaces 64
§3.3. Construction of measures 65
§3.4. Two examples of construction by outer measures. 70
§3.5. Uniqueness of measures 76
§3.6. Measurable functions and random variables 78
§3.7. Universal completion 79
§3.8. Suslin operation and projection of measurable sets* 80
§3.9. Measurable Isomorphism Theorem* 85
§3.10. Exercises 92
Chapter 4. Integration: measure theoretic approach 97
§4.1. Simple functions and integration 97
§4.2. Lebesgue Integration 100
§4.3. Monotone Convergence 102
§4.4. Lebesgue Dominated Convergence 105
§4.5. Riemann integral and Lebesgue integral on R. 107
§4.6. Integration under measurable transformations 109
§4.7. Exercises 111
Chapter 5. Baire Category and Stone–Weierstrass theorem 117
§5.1. Baire category 117
§5.2. Order on Vector spaces 118
§5.3. Stone–Weierstrass Theorem 120
§5.4. General Stone–Weierstrass Theorem 126
§5.5. Monotone classes of functions 127
§5.6. Sequential closure and Baire functions 128
§5.7. Measurable selection theorem 131
§5.8. Exercises 132
Chapter 6. Integration: functional approach 135
§6.1. The Riemann integral revisited 135
§6.2. The Elementary integral 139
§6.3. Daniell’s mean 140
§6.4. Daniell convergence theorems 146
§6.5. Extension of the Integral 149
§6.6. Alternative extension of the Daniell integral 151
§6.7. Order continuous Integrals 152
§6.8. Exercises 157
Chapter 7. Daniell Measurability 159
§7.1. Littlewood’s Principles and Measurability 159
§7.2. Localization 162
§7.3. Integrability criteria 163
§7.4. Absolute continuity 165
§7.5. Daniell–Stone representation 167
§7.6. Maximality 169
§7.7. Integration on locally compact Hausdorff spaces 172
§7.8. Exercises 176
Chapter 8. Lp spaces 177
§8.1. Convex functions on the real line 177
§8.2. Jensen’s Inequality 179
§8.3. Lp spaces 180
§8.4. Riesz representation. 185
§8.5. Reverse Borel–Cantelli theorem 189
§8.6. L0 and convergence in Measure. 190
§8.7. Uniform Integrability 194
§8.8. Lyapunov’s convexity theorem 198
§8.9. Exercises 204
Chapter 9. Finite product of elementary integrals 211
§9.1. The iterated mean 211
§9.2. Fubini and Tonelli’s theorems 214
§9.3. A few applications of Fubini’s theorem 218
§9.4. The product σ–algebra 220
§9.5. Image of elementary integrals 221
§9.6. Change of variables formula in (Rn , B(Rn ), λ). 222
§9.7. Applications of change of variables in integration 234
§9.8. Isodiametric inequality 238
§9.9. Laplace’s method 241
§9.10. Exercises 242
Chapter 10. Signed and Complex measures 247
§10.1. Real valued elementary integrals 247
§10.2. Extension of elementary integrals of finite variation 250
§10.3. Signed measures 251
§10.4. The space of elementary integrals 256
§10.5. Radon–Nikodym Theorem 262
§10.6. Some applications of the Radon–Nikodym theorem 264
§10.7. Uniformly continuous families of measures 268
§10.8. Exercises 270
Chapter 11. Differentiation 273
§11.1. Derivatives of Measures in Rd . 273
§11.2. The fundamental theorem of Calculus 277
§11.3. Integration by parts in R 279
§11.4. Analytic functions 285
§11.5. Cauchy formula 289
§11.6. Singularities 302
§11.7. Zeroes of analytic functions 309
§11.8. Entire functions 313
§11.9. Exercises 318
Chapter 12. Some Elements of Functional Analysis 325
§12.1. Topological Vector Spaces 325
§12.2. Quotient topology 331
§12.3. Locally convex spaces 333
§12.4. Inductive limit topology 340
§12.5. Continuous linear transformations 345
§12.6. Banach algebra of linear operators on a Banach space 348
§12.7. Finite dimensional spaces 351
§12.8. Fixed point theorems 354
§12.9. Uniform boundedness 355
§12.10. Duality and separation theorems 358
§12.11. Weak topology 365
§12.12. Some compactness theorems in linear spaces 368
§12.13. The open map theorem 374
§12.14. Spectrum of linear operators on Banach spaces 380
§12.15. Compact operators 383
§12.16. Hilbert Spaces 389
§12.17. Exercises 406
Chapter 13. More results on duality 413
§13.1. Dunford–Pettis Theorem 413
§13.2. The dual of L∞ 415
§13.3. Lp –Interpolation Theorems 416
§13.4. Localization of distributions 422
§13.5. Riesz duality between C0 (X) and M (X) 424
§13.6. An application: Runge’s theorem. 426
§13.7. Exercises 428
Chapter 14. Calculus on Banach spaces 431
§14.1. Measurability and uniformity 431
§14.2. Banach valued integral 433
§14.3. Extension of Bochner’s Integral 436
§14.4. Other vector valued integrals 439
§14.5. Symbolic calculus in Banach algebras 441
§14.6. Differentiation on Banach spaces 443
§14.7. Implicit Function Theorem 446
§14.8. Existence and uniqueness of solutions to differential equations 449
§14.9. Optimization and Lagrange Multipliers 452
§14.10. Exercises 454
Chapter 15. Fourier transform and Convolution on Rn 459
§15.1. Fourier transform 459
§15.2. Convolution 466
§15.3. Approximation to the identity 473
§15.4. Fourier series 480
§15.5. Inversion of the Fourier transform in L1 (Rn ) 488
§15.6. L2 Theory and Plancherel’s Theorem 492
§15.7. Schwartz functions 494
§15.8. Harmonic functions 498
§15.9. Exercises 501
Chapter 16. Countable product of probability spaces 505
§16.1. Product of measurable spaces 505
§16.2. Independence 506
§16.3. Ionescu Tulcea’s Theorem 508
§16.4. 0–1 laws. 509
§16.5. Canonical space 511
§16.6. Symmetrization 513
§16.7. Series of independent random variables 516
§16.8. The law of large numbers of independent variables 519
§16.9. Random Walks 523
§16.10. Exercises 527
Chapter 17. Weak convergence of measures 529
§17.1. The weak topology for measures of finite variation 529
§17.2. Weak convergence of measures on metric spaces 531
§17.3. Weak convergence under continuous transformations 537
§17.4. Tightness and Prohorov’s theorem 539
§17.5. Vague convergence for σ–finite measures 543
§17.6. Converging determining classes 544
§17.7. Uniform integrability and weak convergence of measures 545
§17.8. Weak convergence on probability spaces 546
§17.9. Exercises 549
Chapter 18. Weak convergence in Euclidean spaces 551
§18.1. Weak convergence and distribution functions 551
§18.2. Tightness and weak convergence of positive measures in Rn 554
§18.3. Random series with independent terms 555
§18.4. Characteristic functions and weak convergence 555
§18.5. Positive definite functions 557
§18.6. Classical Central Limit Theorem 558
§18.7. Poisson approximation 562
§18.8. Exercises 563
Chapter 19. Conditioning and disintegration 565
§19.1. Conditional expectation 565
§19.2. Conditional Independence 567
§19.3. Regular conditional probabilities 569
§19.4. Disintegration 572
§19.5. Kolmogorov’s extension theorem 574
§19.6. Sufficient statistics 577
§19.7. Bayes model and conjugate priors 583
§19.8. Information inequality 587
§19.9. Exercises 588
Chapter 20. Martingales 591
§20.1. Measurability concepts for stochastic processes 591
§20.2. Stopping times 594
§20.3. Martingales and Stopping times 599
§20.4. Martingale convergence theorem 602
§20.5. Optional stopping time theorems 607
§20.6. Doob’s decomposition 610
§20.7. Doob’s maximal function 612
§20.8. Exercises 614
Chapter 21. Applications of Martingale theory 615
§21.1. Differentiation 615
§21.2. Disintegration of Stochastic kernels 619
§21.3. Exchangeability 620
§21.4. Exercises 623
Appendix A. Infinite series on Banach spaces 625
§A.1. Properties of absolutely convergent series 625
§A.2. Double series 628
§A.3. Exercises 630
Appendix B. Lower semicontinuous and convex functions 633
§B.1. Lower semicontinuous functions 633
§B.2. Convex functions 635
§B.3. Asymptotic Cones and Functions in Rn 641
§B.4. Exercises 646
Index 649
Preface

These notes originated during a short summer course on Topics in Probability for senior
undergraduate students at the University of Toronto. The original goal was to introduce
Lebesgue integration theory geared towards Probability. Over the course of three years,
mostly from my interactions with first year graduate students preparing for their qualifying
exams at the University of Toronto and at Duke University, these notes grew considerably
into what is now a full course on integration theory. Several topics in Probability
(independence, conditioning and martingales) are included. This is intended to preserve the
initial spirit of these notes: to teach some topics in Probability.
The selection of topics and their order of appearance are based on my attempt to
provide a self-contained presentation of the subject. In particular, the first two chapters
are included as a reference to the elements of set theory and point set topology which are
used later in the notes to construct examples or to lay the groundwork for new material.
In preparing these notes, I have borrowed from the work of several authors from whom
I learned Integration and Probability theory: W. Rudin’s Real and Complex Analysis; D.
Cohn’s Measure Theory; V. I. Bogachev’s Measure Theory; O. Kallenberg’s Foundations of
Modern Probability; and K. Bichteler’s Integration: A Functional Approach.
There are several methods to introduce modern integration theory. We present two: the
classic method of Lebesgue, and the functional approach of Daniell. We will see that both
methods produce the same class of integrable functions. Paraphrasing Klaus Bichteler,
“Lebesgue’s method is based on ingenuity, Daniell’s approach is based on hindsight: to be
integrable, a function must not be too big and must be measurable”.
I hope that these notes assist other graduate students who are learning Integration
theory and the foundations of Probability. I apologize for any typos that might appear
here and there.

Durham NC, 2010.

Chapter 1

Elements of set theory 1

In this chapter we give a naive presentation of set theory and the real numbers, that is,
we provide neither a rigorous axiomatic presentation of set theory nor a set theoretic
construction of the real numbers. Instead, we take the notion and existence of sets for granted
and assume that the reader is familiar with set operations such as union, intersection and
complementation, and with relations and functions. Although we assume the existence of the
sets of natural numbers N, the integers Z, the rational numbers Q and the real numbers R, we
indicate in the exercises at the end of this chapter how to rigorously construct the integers
from the natural numbers and zero, and the rational numbers from the integers. The
problem of constructing the real numbers from the rationals (achieved by Dedekind and
Cantor at the end of the 1800’s) is not discussed in these notes.
We will give a rather detailed presentation of the notions of order and cardinality, the Axiom
of choice and some of its equivalent forms. The Axiom of choice is used in these notes to prove
fundamental existence results in analysis such as the Hahn–Banach extension theorem,
Vitali’s covering Lemma, Alexander’s covering theorem for compact sets, and Parseval’s
theorem on the existence of maximal orthonormal families in Hilbert spaces.

1.1. Naive set theory

The concept of a set is one of the basic primitive mathematical concepts which does not
lend itself to an accurate definition, in the same way as the concept of a point in
elementary Geometry. In Set Theory, sets are formally described by a definite collection of
axioms. Properties of sets and statements about them are deduced by logic.
Informally, a set is the name for a collection of objects (elements): the set of students
enrolled in a class, the set of chairs in a theater, the set of planets in the solar system.
Given a set A and an object x we can determine whether x is an element of the set A or
not. The notion of belonging or being an element of is denoted by the symbol ∈; thus, we
use x ∈ A to indicate that x is an element of A, and x ∉ A to indicate that x is not an
element of A. To avoid logical contradictions, it is convenient to postulate that no set is
an element of itself, that is, for any set A, A ∉ A. One such contradiction is Russell’s
paradox, which considers R := {x : x ∉ x}. This is not a set, for if it were a set then
R ∈ R iff R ∉ R. Another example is the set of all sets. There is no such set, for if there
were a set of all sets, which we then denote by U, then U ∈ U, contrary to the postulate.

(Footnote 1) This chapter may be skipped and used only as a reference.

Definition 1.1.1. Given two sets A and B, we say that A = B if for any x, x ∈ A iff x ∈ B.
That is, two sets are equal iff they have the same elements. We say that A is contained in
B (denoted by A ⊂ B) iff for any x, x ∈ A implies that x ∈ B.
Definition 1.1.2. Given a set A, there is a set called the power set of A and denoted by P(A)
such that Y ∈ P(A) iff Y ⊂ A.

A set may have only one element x. Such a set, denoted by {x}, is called the singleton x. A
set with two distinct elements x, y is denoted by {x, y}. The ordered pair (x, y) is the set
defined by {{x}, {x, y}}.
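
As an illustration that is not part of the original notes, the set-theoretic definition of the ordered pair can be modelled in Python with frozenset. The function name kuratowski_pair is our own choice. The sketch only checks, on small examples, that the encoded pair remembers the order of its entries while a two-element set does not.

```python
def kuratowski_pair(x, y):
    """Return the set {{x}, {x, y}} that encodes the ordered pair (x, y)."""
    return frozenset({frozenset({x}), frozenset({x, y})})

# A plain set forgets the order in which its elements are listed...
unordered_equal = frozenset({1, 2}) == frozenset({2, 1})
# ...but the encoded pairs (1, 2) and (2, 1) are different sets.
ordered_differ = kuratowski_pair(1, 2) != kuratowski_pair(2, 1)
# When x = y the encoding collapses to {{x}}.
diagonal = kuratowski_pair(3, 3) == frozenset({frozenset({3})})

print(unordered_equal, ordered_differ, diagonal)
```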
A property P is a proposition that for a given object x can be determined to be true
or false. Very often in Mathematics, given a set A and a property P, we define the set of
objects that belong to A for which the property P holds true. This set is denoted as
{x ∈ A : P(x)}
Remark 1.1.3. In defining sets through properties, we always restrict the objects to be
elements of an a priori established set. When the a priori established set is clear from the
context we often omit it and write instead {x : P(x)}.
Example 1.1.4. If N is the set of natural numbers 1, 2, 3, . . ., then
{x ∈ N : x² − 3x + 2 = 0} = {1, 2}.
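
The set-builder notation {x ∈ A : P(x)} has a direct counterpart in Python set comprehensions. The following minimal sketch recomputes the example above; since a program cannot scan all of N, the ambient set is cut off at an arbitrary bound (50 here), which is an assumption of the illustration and not part of the example.

```python
# Finite stand-in for the ambient set N; the bound 50 is arbitrary.
A = range(1, 51)

def P(x):
    """The property 'x^2 - 3x + 2 = 0'."""
    return x * x - 3 * x + 2 == 0

# {x in A : P(x)}
solutions = {x for x in A if P(x)}
print(solutions)
```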

There is a set, the empty set or void set, denoted by ∅, that has no elements.
This can be expressed as the set of objects that are not equal to themselves, {x : x ≠ x}.
Since sets are fully characterized by their elements, there is only one empty set.
Throughout these notes, we will use the term collection or family to denote sets whose
elements are sets. Given a collection of sets 𝒜, we define its union as the set
⋃𝒜 := {x : for some A ∈ 𝒜, x ∈ A}
Similarly, the intersection of all elements of 𝒜 is defined as
⋂𝒜 := {x : for all A ∈ 𝒜, x ∈ A}
In particular, if A and B are two sets, then
A ∪ B := {x : x ∈ A, or x ∈ B}
A ∩ B := {x : x ∈ A and x ∈ B}
If A and B are two sets, then the difference A \ B is defined as
A \ B := {x : x ∈ A and x ∉ B}
If A is a subset of a set X, then the set X \ A is called the complement of A. This set is
often denoted by A^c.
It is easy to check that if A is a subset of a set X and ℬ = {B_i : i ∈ I} is a set of subsets
of X, then the following distributive formulas hold:
A ∩ ⋃_{i∈I} B_i = ⋃_{i∈I} (A ∩ B_i)
A ∪ ⋂_{i∈I} B_i = ⋂_{i∈I} (A ∪ B_i)
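
The two distributive formulas can be checked mechanically on a small example. The sets below are arbitrary choices made for this sketch (they do not come from the notes), and a finite check is of course only a sanity test, not a proof.

```python
X = set(range(10))
A = {0, 1, 2, 3, 4}
# An indexed family {B_i : i in I} of subsets of X; index names are arbitrary.
B = {"i1": {1, 2, 7}, "i2": {2, 4, 8}, "i3": {2, 3, 9}}

union_B = set().union(*B.values())          # union of all B_i
inter_B = set.intersection(*B.values())     # intersection of all B_i

# A ∩ (⋃ B_i) versus ⋃ (A ∩ B_i)
lhs_union = A & union_B
rhs_union = set().union(*(A & Bi for Bi in B.values()))

# A ∪ (⋂ B_i) versus ⋂ (A ∪ B_i)
lhs_inter = A | inter_B
rhs_inter = set.intersection(*(A | Bi for Bi in B.values()))

print(lhs_union == rhs_union, lhs_inter == rhs_inter)
```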

Given sets A and B, the Cartesian product of A and B is defined as

A × B = {(x, y) : x ∈ A, y ∈ B}

We recall first the following concepts from the theory of sets.

Definition 1.1.5. Let X and Y be sets. A relation R from X to Y is a subset of X × Y.
For each x ∈ X and y ∈ Y we say that x is related to y, denoted by xRy, if (x, y) ∈ R.
The domain of R, denoted by dom(R), is the set of all x ∈ X for which there is y ∈ Y
such that xRy. The range or image of R, denoted by Range(R), is the set of all y ∈ Y for
which there is x ∈ X such that xRy. The inverse of R is the relation from Y to X defined
by R⁻¹ := {(y, x) : (x, y) ∈ R}.
Definition 1.1.6. Given sets X and Y , a function f from X to Y is a relation with
dom(f ) = X such that if (x, y) and (x, z) belong to f , then y = z.

We use the notation f : X → Y to indicate that f is a function from X to Y, and
y = f(x) to denote that (x, y) ∈ f.
(a) If the range of f is the whole set Y , then we say that f is surjective (or onto).
(b) If for all x, z ∈ X and y ∈ Y , (x, y) ∈ f and (z, y) ∈ f implies that x = z, we say
that f is injective.
(c) If f is both injective and surjective, then we say that f is bijective.
Remark 1.1.7. A function f from X to Y is bijective iff f⁻¹ is a function from Y to X.
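
For finite sets, the definitions of relation, domain, range, inverse and function translate almost word for word into code. The sketch below represents a relation as a Python set of pairs. The helper names (dom, rng, inverse, is_function, is_bijective) are ours, and the last one uses the characterization of bijectivity through the inverse relation stated in the remark above.

```python
def dom(R):
    """Domain: all x that appear as a first coordinate."""
    return {x for (x, _) in R}

def rng(R):
    """Range: all y that appear as a second coordinate."""
    return {y for (_, y) in R}

def inverse(R):
    """Inverse relation {(y, x) : (x, y) in R}."""
    return {(y, x) for (x, y) in R}

def is_function(R, X):
    # dom(R) must be all of X, and no x may have two different images.
    return dom(R) == set(X) and len(dom(R)) == len(R)

def is_bijective(f, X, Y):
    # f is bijective iff f is a function on X and its inverse is a function on Y.
    return is_function(f, X) and is_function(inverse(f), Y)

X = {1, 2, 3}
Y = {"a", "b", "c"}
f = {(1, "a"), (2, "b"), (3, "c")}
g = {(1, "a"), (2, "a"), (3, "c")}   # a function, but neither injective nor onto

print(is_bijective(f, X, Y), is_function(g, X), is_bijective(g, X, Y))
```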

When proving existence results using set theory it is often the case that one has a collection
of functions in which, of any two, one is an extension of the other, and from this collection
one constructs a function that extends every function in the collection. The following
elementary result summarizes this type of argument.
Lemma 1.1.8. Given sets A and B, assume that C is a collection of functions with domains
contained in A and images in B such that for any f, g ∈ C either f ⊂ g or g ⊂ f. Then
F := ⋃C is a function from ⋃{dom(f) : f ∈ C} to B.

Proof. We first show that F is a relation with dom(F) = ⋃{dom(f) : f ∈ C}. For any
x ∈ dom(F) there is y ∈ B such that (x, y) ∈ F. Thus (x, y) ∈ f for some f ∈ C, and so
x ∈ ⋃{dom(f) : f ∈ C}. Conversely, for any x ∈ ⋃{dom(f) : f ∈ C} there is f ∈ C such
that x ∈ dom(f). Hence there is y ∈ B for which (x, y) ∈ f. This means that (x, y) ∈ F,
that is, x ∈ dom(F).

Now we show that F is a function. Suppose (x, y) and (x, z) are elements of F. Then, there
are f, g ∈ C such that (x, y) ∈ f and (x, z) ∈ g. Without loss of generality assume that
f ⊂ g. Then (x, y) ∈ g as well and, as g is a function, y = z. This shows that F is a function.
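
The lemma can be tried out on finite examples by treating a Python dict as the set of pairs (x, f(x)). The helper names below are our own. The first collection is a chain under inclusion, so its union is again a function; the second collection is not a chain, and its union fails to be a function, which shows that the chain hypothesis cannot simply be dropped.

```python
def graph(f):
    """The set of pairs (x, f(x)) represented by a dict."""
    return set(f.items())

def is_chain(C):
    # For any f, g in C, either f ⊂ g or g ⊂ f (as sets of pairs).
    graphs = [graph(f) for f in C]
    return all(g <= h or h <= g for g in graphs for h in graphs)

def union_of_functions(C):
    """The union F of all the graphs in C, as a set of pairs."""
    return set().union(*(graph(f) for f in C))

def is_function(R):
    # No first coordinate appears with two different second coordinates.
    return len({x for (x, _) in R}) == len(R)

C = [{0: "a"}, {0: "a", 1: "b"}, {0: "a", 1: "b", 2: "c"}]
F = union_of_functions(C)

D = [{0: "a"}, {0: "b"}]          # neither function extends the other
G = union_of_functions(D)

print(is_chain(C), is_function(F), is_chain(D), is_function(G))
```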

1.2. Order sets and transfinite induction

Definition 1.2.1. An equivalence relation R on X is a relation from X to X that
satisfies the following conditions:
(a) (Reflexivity) For any x ∈ X, (x, x) ∈ R.
(b) (Symmetry) For any x, y ∈ X, (x, y) ∈ R iff (y, x) ∈ R.
(c) (Transitivity) For any x, y, z ∈ X, if (x, y) ∈ R and (y, z) ∈ R then (x, z) ∈ R.

The simplest example of an equivalence relation on X is equality.

Definition 1.2.2. A pre–order R on a set X is a reflexive and transitive relation from X
to X. A partial order R on a set X is a pre–order on X that satisfies
(d) (Antisymmetry) For any x, y ∈ X, if (x, y) ∈ R and (y, x) ∈ R then x = y.
A total order on X is a partial order R on X such that
(e) For all x, y ∈ X, either (x, y) ∈ R or (y, x) ∈ R.
If ≤ is a partial (total) order on A, we say that (A, ≤) is a partially (totally) ordered set.
Very often, we only say that A is an ordered set to mean that A is a totally ordered set.

Some simplifying notation is in order. Suppose (X, ≤) is a partially ordered set. For
any x, y ∈ X, we will use the notation x < y to mean that x ≤ y but y ≰ x; also, we will
use y ≥ x to mean that x ≤ y.

Example 1.2.3. Here are some common examples of ordered sets:

(a) The integers Z with the usual order . . . < −1 < 0 < 1 < . . . form a totally ordered set.
(b) More generally, the set of real numbers R with x ≤ y iff 0 ≤ y − x (the usual order)
is totally ordered.
(c) For any set X, its power set P(X) can be partially ordered by inclusion, that is,
for any subsets A and B of X, we declare A ≼ B iff A ⊂ B.
(d) The set R^Ω of functions from a nonempty set Ω to R is partially ordered with
the pointwise order defined as f ≤ g iff f(x) ≤ g(x) for all x ∈ Ω.
(e) (lexicographic order) Suppose (A, ≤) is totally ordered. We define an order on
A × A by declaring, for any (x_1, x_2) and (y_1, y_2) in A × A, (x_1, x_2) ≺ (y_1, y_2) when
x_1 < y_1 or when x_1 = y_1 and x_2 < y_2.
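
The lexicographic order of item (e) is the order Python itself uses to compare tuples. The short sketch below spells out the definition in a function of our own (lex_less) and checks on a few pairs of integers that it agrees with the built-in comparison.

```python
def lex_less(p, q):
    """(x1, x2) precedes (y1, y2) iff x1 < y1, or x1 == y1 and x2 < y2."""
    (x1, x2), (y1, y2) = p, q
    return x1 < y1 or (x1 == y1 and x2 < y2)

pairs = [(2, 1), (1, 3), (1, 2), (2, 0)]

# The hand-written order agrees with Python's tuple comparison.
agrees = all(lex_less(p, q) == (p < q) for p in pairs for q in pairs)

# Sorting therefore lists the pairs in lexicographic order.
ordered = sorted(pairs)
print(agrees, ordered)
```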
Definition 1.2.4. Suppose (X, ≤) is a partially ordered set.
(1) An element m ∈ X is called maximal if for every x ∈ X, m ≤ x implies that
m = x.
(2) An element u ∈ X is an upper bound of a set A ⊂ X if for any x ∈ X, x ∈ A
implies that x ≤ u. Similarly, an element v ∈ X is a lower bound of A ⊂ X if
for any x ∈ X, x ∈ A implies that v ≤ x.
(3) A nonempty set P ⊂ X is a chain in X if (P, ≤) is totally ordered.
Suppose (X, ≤) is a totally ordered set.
(4) A set A ⊂ X is said to be bounded above in X if A has an upper bound u ∈ X.
An element b ∈ X is called the supremum of A if b is an upper bound of A and,
if u ∈ X is any other upper bound of A, then b ≤ u. That is, b := sup(A) is the
least upper bound of A.
(5) Similarly, A is said to be bounded below in X if A has a lower bound in X. The
infimum of A is defined as the greatest lower bound of A.
(6) A set that is both bounded above and below is simply said to be bounded.
(7) A totally ordered set X is order complete if any nonempty set A that is bounded
above has a supremum α ∈ X.
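
Items (1) to (3) can be explored on a small partially ordered set. The sketch below uses divisibility on the set {2, 3, 4, 6, 8, 12}; both the example and the function names are our own choices and not taken from the notes. It shows in particular that a partially ordered set may have several maximal elements and that a set may have no upper bound at all.

```python
def divides(a, b):
    """The partial order used below: a <= b iff a divides b."""
    return b % a == 0

def maximal_elements(X, leq):
    # m is maximal iff m <= x forces x == m (item (1)).
    return {m for m in X if all(x == m for x in X if leq(m, x))}

def upper_bounds(A, X, leq):
    # u is an upper bound of A iff x <= u for every x in A (item (2)).
    return {u for u in X if all(leq(x, u) for x in A)}

def is_chain(P, leq):
    # P is a chain iff it is nonempty and any two elements are comparable (item (3)).
    return len(P) > 0 and all(leq(a, b) or leq(b, a) for a in P for b in P)

X = {2, 3, 4, 6, 8, 12}
print(maximal_elements(X, divides))        # the elements 8 and 12
print(upper_bounds({2, 3}, X, divides))    # the elements 6 and 12
print(upper_bounds({8, 12}, X, divides))   # no upper bound in X
print(is_chain({2, 4, 8}, divides), is_chain({4, 6}, divides))
```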
Definition 1.2.5. A totally ordered set (X, ≤) is said to be well–ordered if for any
nonempty subset A of X, there is a_0 ∈ A such that for any x ∈ X, if x ∈ A then a_0 ≤ x.
That is, (X, ≤) is a well–ordered set iff any nonempty subset A is bounded below and contains
its infimum. The infimum of a nonempty subset A of a well ordered set X is called the first
element of A.
Example 1.2.6. The simplest example of a well ordered set is the set of nonnegative
integers Z+ with the usual order. Other well orders can be defined on Z+. For instance:
(a) Consider the usual order on N = Z+ \ {0} and declare that nR0 for any n ∈ N.
Clearly (Z+, R) is a well ordered set and, under R, Z+ is bounded above and has
a last element, namely 0.
(b) A different well order on Z+ can be obtained by letting ≼ be the usual order
on the set E of nonnegative even numbers and on the set E^c of nonnegative odd
numbers and then declaring n ≼ m whenever n ∈ E and m ∈ E^c. With this order,
E is an infinite set bounded above.
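
Both well orders of the example can be realized on a computer through sort keys: comparing the keys in the usual way reproduces the new order. The key functions below are our own encoding of (a) and (b), and the checks are made on finite samples only.

```python
def key_a(n):
    # Order (a): the usual order on 1, 2, 3, ... with 0 placed after everything.
    return (n == 0, n)

def key_b(n):
    # Order (b): even numbers first, then odd numbers, each in the usual order.
    return (n % 2, n)

def first_element(A, key):
    """First element of a nonempty finite set A for the order encoded by key."""
    return min(A, key=key)

sample = range(8)
order_a = sorted(sample, key=key_a)
order_b = sorted(sample, key=key_b)
print(order_a)
print(order_b)
print(first_element({7, 4, 10, 3}, key_b), first_element({0, 5, 9}, key_a))
```

In order (b) every odd number, for instance 1, is an upper bound of the set E of even numbers, which is why E is bounded above.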
Definition 1.2.7. Let (A, ≤) and (B, ≼) be two totally ordered sets. A function f : A → B
is order preserving if for any x, y ∈ A, x ≤ y implies that f(x) ≼ f(y). A and B are said to
be order-isomorphic (or have the same order type) if there exists an order-preserving
bijection between A and B.

The order type of the set ∅ (with the empty order) is denoted by 0. For any integer n ≥ 1,
the order type of Z_n := {0, . . . , n − 1} with the usual order is denoted by the integer
n. The order type of (Z+, ≤) is denoted by ω.

Definition 1.2.8. Suppose (A, ≤) is a totally ordered set. For any x ∈ A, the set A_x :=
{y ∈ A : y < x} is called the initial segment of (A, ≤) at x. A subset S ⊂ A is an order
ideal of A if for any x, y ∈ A, if x ∈ S and y ≤ x, then y ∈ S.

When the order ≤ is clear from the context, we will omit explicit reference to it.

Remark 1.2.9. The empty set is trivially an ideal of any totally ordered set. Evidently,
any initial segment of a totally ordered set is an ideal. The converse is not necessarily true.
For instance, if A has no last element, i.e. if A is not bounded above, then A is an ideal
but not an initial segment of A.

Lemma 1.2.10. Suppose (A, ≤) is a totally ordered set. The union of an arbitrary family
of ideals is an ideal. The intersection of an arbitrary collection of ideals is an ideal.

Proof. Suppose 𝒜 is a family of ideals in A. Let x ∈ ⋃𝒜 and assume y < x. There is
S ∈ 𝒜 such that x ∈ S and, as S is an ideal, we have that y ∈ S. Therefore, y ∈ ⋃𝒜. For
intersections, the proof is similar.

Theorem 1.2.11. Suppose f : A → B is an order isomorphism between two totally ordered
sets (A, ≤) and (B, ≼). If S is an order ideal of A, then f(S) is an order ideal of B.
Moreover, for any x ∈ A, f(A_x) = B_{f(x)}.

Proof. Suppose S ⊂ A is an ideal of A. Let z ∈ f(S) and suppose w ∈ B satisfies w ≺ z.
Let x, y ∈ A be such that z = f(x) and w = f(y). Since f is bijective, x ∈ S, and since
f preserves order it follows that y < x (otherwise x ≤ y would give z ≼ w). Since S is an
ideal, we have that y ∈ S, and so w = f(y) ∈ f(S). This shows that f(S) is an ideal in B.

Let x ∈ A. As f is an order isomorphism, it is clear that f(A_x) ⊂ B_{f(x)}. If y ≺ f(x), then
f⁻¹(y) < f⁻¹(f(x)) = x. Hence, f⁻¹(y) ∈ A_x, and so y = f(f⁻¹(y)) ∈ f(A_x).

Theorem 1.2.12. Suppose (A, ≤) is a well ordered set. If S is an ideal of A, then either
A = S or there exists a unique x ∈ A such that S = A_x.

Proof. Suppose S ≠ A. Let x be the first element of A \ S. Then A_x ⊂ S. If y ∈ S then
y < x, for if x ≤ y then we would have that x ∈ S since S is an ideal. Therefore S = A_x.
Uniqueness holds because x < x′ implies that x ∈ A_{x′} \ A_x.

Remark 1.2.13. The well–ordered assumption is needed in the Theorem above. Consider
for instance the set of real numbers R with the usual order. Any interval of the form (−∞, a]
where a ∈ R is an order ideal; however, it is not an initial segment in the sense of
Definition 1.2.8.

Theorem 1.2.14. Suppose (A, ≤) is a well–ordered set. If f : (A, ≤) → (A, ≤) is an
injective order preserving function, then for any x ∈ A, x ≤ f(x).

Proof. Suppose the contrary, that is, the set B := {x ∈ A : f(x) < x} ≠ ∅. Let x_0 be
the first element of B. Then f(x_0) < x_0 and, since f is injective and order preserving,
f(f(x_0)) < f(x_0). This means that f(x_0) ∈ B, which contradicts the choice of x_0.
Corollary 1.2.15. Suppose (A, ≤) is a well–ordered set. Then for any x ∈ A, the initial
segment A_x with the order inherited from (A, ≤) is not order–isomorphic to A.

Proof. Suppose for some x ∈ A there is an order isomorphism f : (A, ≤) → (A_x, ≤).
Then f, as a function from A into itself, satisfies the conditions of Theorem 1.2.14, and so
y ≤ f(y) for all y ∈ A. However, since f(A) = A_x we have in particular that f(x) < x.
This is a contradiction.
Theorem 1.2.16. Suppose (A, ≤) is a well–ordered set. For any two ideals S and T of A,
S and T are order–isomorphic iff either S = T = A or there is a unique x ∈ A such that
S = A_x = T.

Proof. Let S and T be order isomorphic ideals of A. Either S = A or there is x ∈ A such
that S = A_x. Since no initial segment of A is order isomorphic to A, in the former case we
must have that T = A = S. In the latter case, it follows that T equals some initial segment
A_y, y ∈ A. If x < y then A_x is an initial segment of A_y, since A_x ⊂ A_y and x ∈ A_y \ A_x. By
Corollary 1.2.15, S = A_x cannot be order isomorphic to A_y = T, contradicting the hypothesis
of the Theorem. The case y < x is symmetric; therefore x = y.
Theorem 1.2.17. If two well ordered sets (A, ≤) and (B, ≼) are order isomorphic, then
there exists a unique order isomorphism from A to B.

Proof. Suppose g and h are two order isomorphisms from A to B. We will show that
g(x) = h(x) for all x ∈ A. Indeed, h⁻¹ ∘ g is an order isomorphism from A to itself.
Consequently, x ≤ h⁻¹(g(x)) for all x ∈ A. As h is an order isomorphism, we get that
h(x) ≼ h(h⁻¹(g(x))) = g(x) for all x ∈ A. The converse inequality is obtained by reversing
the roles of g and h.
Theorem 1.2.18. Let (A, ≤) and (B, ≼) be two well ordered sets. One and only one of the
following possibilities holds:
(i) A and B are order isomorphic.
(ii) There exists a unique x ∈ A such that A_x is order isomorphic to B.
(iii) There exists a unique y ∈ B such that A and B_y are order isomorphic.

Proof. Let a_0 and b_0 be the first elements of A and B respectively. Let E be the collection
of all ideals of A that are order isomorphic to some ideal of B. This collection is nonempty
since A_{a_0} = ∅ is order isomorphic to B_{b_0} = ∅. The set S := ⋃E is an ideal of A and we
will show that S ∈ E.
Suppose I_j (j = 1, 2) are ideals in E and let J_j (j = 1, 2) be ideals in B for which there are
(unique) order isomorphisms f_j : I_j → J_j. Clearly I_1 ∩ I_2 is an order ideal of I_1 and of I_2
which is order isomorphic to the order ideals f_1(I_1 ∩ I_2) and f_2(I_1 ∩ I_2) of B. By
Theorem 1.2.16, f_1(I_1 ∩ I_2) = f_2(I_1 ∩ I_2) and by uniqueness of order isomorphisms, we have
that f_1 and f_2 coincide on I_1 ∩ I_2. For each x ∈ S, there is I ∈ E such that x ∈ I. Thus,
there is a unique ideal J_I in B and a unique order isomorphism f_I : I → J_I. It follows that
we can define an order preserving injective function f : S → B by setting f(x) := f_I(x).
As f(S) = ⋃{f(I) : I ∈ E} = ⋃{J_I : I ∈ E} is an ideal, we have that S and T = f(S)
are order isomorphic. The conclusion of the Theorem follows by another application of
Theorem 1.2.16.

Theorem 1.2.18 allows the introduction of a total order on order types. Suppose α and
β are two order types, and let (A, w) and (B, r) be well–ordered sets whose order types are α
and β respectively. Then α ≤ β iff (A, w) is order isomorphic to an ideal of (B, r), and α < β
iff (A, w) is order isomorphic to an initial segment of (B, r). Order types are also called
ordinal numbers.
Theorem 1.2.19. Let α be an order type larger than 0. Let P_α be the set of all order types
that are less than α. Then P_α is well–ordered and it is of order type α.

Proof. Let β be an order type with β < α. Let B and A be sets of order types β and α
respectively. Then, there is a unique x ∈ A such that B is order isomorphic to A_x. Setting
g(β) = x we define a function on P_α which is clearly an order isomorphism between P_α and
A. Therefore P_α is well ordered and has order type α.

We conclude this section with two results that generalize Mathematical induction.
Theorem 1.2.20. (Transfinite induction) Let (W, ≤) be a well ordered set. Suppose Q ⊂ W
is a set that satisfies the following condition: for any x ∈ W, W_x ⊂ Q implies x ∈ Q.
Then, Q = W.

Proof. Let 0 denote the first element of W. Since W_0 = ∅ ⊂ Q, it follows that 0 ∈ Q.
Hence Q ≠ ∅. Suppose W \ Q ≠ ∅ and let x_0 be its first element. Then y < x_0 implies that
y ∈ Q, that is, W_{x_0} ⊂ Q. Then, by hypothesis, x_0 ∈ Q, which is a contradiction. Therefore
Q = W.
Theorem 1.2.21. (Transfinite construction) Let (W, ≤) be a well ordered set and E an
arbitrary class. Assume that for each x ∈ W there is a given rule R_x that associates to each
function φ : W_x → E a unique R_x(φ) ∈ E. Then, there exists a unique function F : W → E
such that F(x) = R_x(F|_{W_x}) for each x ∈ W.

Proof. Let 0 denote the first element of W and for each x ∈ W, set W̄_x := W_x ∪ {x}. Let
T be the set of all x ∈ W for which there is a function f_x : W̄_x → E such that
(1.1) f_x(u) = R_u(f_x|_{W_u}), u ∈ W̄_x.
We claim that 0 ∈ T. Since W_0 = ∅, the only function φ : W_0 → E is φ = ∅. Thus,
f_0 : W̄_0 → E given by f_0(0) = R_0(∅) satisfies condition (1.1).
For any x, y ∈ T, if x ≤ y then f_x = f_y|_{W̄_x}. Suppose the opposite, that is, there are
x, y ∈ T with x < y such that f_x ≠ f_y|_{W̄_x}. Let x_0 be the first element of the set
{u ∈ W̄_x : f_x(u) ≠ f_y(u)}. Clearly 0 < x_0 since f_x(0) = R_0(∅) = f_y(0). Hence, W_{x_0} ≠ ∅
and f_x|_{W_{x_0}} = f_y|_{W_{x_0}}. Consequently
f_x(x_0) = R_{x_0}(f_x|_{W_{x_0}}) = R_{x_0}(f_y|_{W_{x_0}}) = f_y(x_0)
which is a contradiction.

This shows that the function f : T → E given by f(x) := f_x(x) is well defined and satisfies
f(x) = R_x(f|_{W_x}). We claim that T = W. Otherwise, let t_0 be the first element of W \ T.
Then W_{t_0} ⊂ T and f satisfies f(u) = R_u(f|_{W_u}) for all u < t_0. Hence f|_{W_{t_0}} can be
extended uniquely to t_0 by setting f(t_0) := R_{t_0}(f|_{W_{t_0}}) so that f(u) = R_u(f|_{W_u}) for
all u ∈ W̄_{t_0}. This implies that t_0 ∈ T, which is a contradiction. Therefore T = W.
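
When W is finite, the construction in the theorem is ordinary recursion in which the value at x may depend on all earlier values. The sketch below is only an illustration under that finiteness assumption; the function transfinite_construct and the two sample rules are hypothetical names of our own. A rule receives x together with the restriction of F to the initial segment W_x, passed as a dict.

```python
def transfinite_construct(W, rule):
    """Build F on a finite well ordered set W, listed in increasing order.

    rule(x, phi) plays the role of R_x(phi), where phi is F restricted to
    the initial segment W_x = {y in W : y < x}.
    """
    F = {}
    for x in W:
        phi = dict(F)          # at this point F is defined exactly on W_x
        F[x] = rule(x, phi)
    return F

def fibonacci_rule(x, phi):
    # Uses only the two previous values of the restriction.
    return x if x < 2 else phi[x - 1] + phi[x - 2]

def total_rule(x, phi):
    # Uses the whole restriction: one plus the sum of all earlier values.
    return 1 + sum(phi.values())

W = list(range(10))
F = transfinite_construct(W, fibonacci_rule)
G = transfinite_construct(W, total_rule)
print(F[9], G[5])
```

For the first element the restriction phi is the empty function, which matches the step f_0(0) = R_0(∅) in the proof.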

1.3. The Axiom of choice

The following axiom of set theory allows one to prove the existence of structures that are
not entirely concrete or accessible to intuition. This Axiom has been proven to be logically
independent of the other fundamental axioms of set theory, Zermelo–Fraenkel (ZF) and/or von
Neumann–Bernays–Gödel set theory (NBG), from which modern mathematics can be
constructed. Also, it has been proven that the axiom AC is consistent with ZF and/or NBG,
that is, if a contradiction can be found under ZF plus AC then a contradiction can already be
found under ZF alone.
Axiom 1.3.1. (Axiom of choice (AC)) If I is a nonempty set and A is a function from I to a collection of sets such that A(x) ≠ ∅ for all x ∈ I, then there exists a function f : I → ⋃_{x∈I} A(x) such that f(x) ∈ A(x) for all x ∈ I.

A function f described in the Axiom of choice is called a choice function. Another interpretation of the Axiom of choice is that for any nonempty indexed family of nonempty sets {Ax : x ∈ I} there exists a set S consisting of exactly one element of each Ax.
Definition 1.3.2. Let I be a non–empty set, and assume that each α ∈ I has a non–empty set Aα associated to it. The Cartesian product of the collection {Aα : α ∈ I}, denoted by ∏_{α∈I} Aα, is defined as the set of all functions x : I → ⋃_{α∈I} Aα such that x(α) ∈ Aα for all α ∈ I.
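For a finite index set and finite sets, the elements of the Cartesian product can be listed explicitly and no appeal to AC is needed. The sketch below is an illustration only; the function name cartesian_product and the sample family are assumptions made for the example. Each element of the product is represented as a dictionary α ↦ x(α).

```python
from itertools import product


def cartesian_product(family):
    """Return all choice functions x with x[i] in family[i] for every index i."""
    indices = list(family)
    return [
        dict(zip(indices, values))
        for values in product(*(family[i] for i in indices))
    ]


# Assumed sample family {A_a, A_b, A_c} of nonempty finite sets.
family = {"a": {0, 1}, "b": {2}, "c": {3, 4, 5}}
choices = cartesian_product(family)
print(len(choices))
```

Every dictionary in choices selects exactly one element from each set of the family, that is, it is a choice function in the sense of Axiom 1.3.1.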

Notice that the elements of the product of sets are in fact choice functions. The Axiom of Choice states that the product of a non–empty collection of non–empty sets has at least one element. The axiom of choice is used under other equivalent forms. In these notes we will only make use of the following equivalences:
Well–ordering (WO): Every set admits a well–order.
Hausdorff’s maximal principle (HMP): For every partially ordered set (X, ≤) there is a maximal chain (P, ≤).
Zorn’s lemma (ZL): Suppose (X, ≤) is a partially ordered set. If any chain P in X is bounded above in X then X has a maximal element.

The following quote, attributed to Jerry Bona, states how surprising those equivalences are: "The Axiom of Choice is obviously true, the well–ordering principle is obviously false, and who can tell about Zorn’s lemma." In other words, although the statement of the axiom of choice seems to be intuitive and to a certain degree uncontroversial, the well–ordering principle is rather difficult to accept, as trying to find an explicit well–order for the set of real numbers R demonstrates, and Zorn’s lemma is not intuitive at all.
Theorem 1.3.3. AC, WO, HMP and ZL are equivalent.

Proof. AC implies WO. Let X be a nonempty set. The Axiom of choice implies that there is a function c : P(X) \ {∅} → X such that c(A) ∈ A for all A ∈ P(X) \ {∅}. For convenience we define f : P(X) \ {X} → X as f(B) = c(X \ B) for all B ∈ P(X) \ {X}. Set x0 := f(∅) and (A, ≤) = ({x0}, {(x0, x0)}). Then A is a well ordered set and, as Ax0 = {x ∈ A : x < x0} = ∅, x0 = f(Ax0). In general, we say that a well ordered set (W, ≤) with W ⊂ X is an f–string if for any x ∈ W, x = f({y ∈ W : y < x}). Clearly the first element of any f–string is x0 = f(∅).

If (W, ≤) is an f–string and x ∈ W then, for any z ∈ Wx, Wz = {y ∈ W : y < z} = Wz ∩ Wx. Hence z = f(Wz) = f(Wz ∩ Wx) = f({y ∈ Wx : y < z}). This shows that initial segments of f–strings are f–strings themselves.
Suppose (A, ≤) and (B, ⪯) are f–strings. By Theorem 1.2.18, interchanging the roles of A and B if necessary, (A, ≤) is order–isomorphic to an ideal (S, ⪯) of (B, ⪯). We claim that (A, ≤) equals (S, ⪯). Let h be the unique order–isomorphism from A onto the ideal S of B. As x0 is the first element of both A and S, x0 = h(x0). Suppose {z ∈ A : h(z) ≠ z} ≠ ∅ and let z0 be its first element. Then Az0 ≠ ∅ and h(Az0) = Sh(z0) = Az0. Hence z0 = f(Az0) = f(Sh(z0)) = h(z0), which is a contradiction to the choice of z0. Consequently h(x) = x for all x ∈ A and (A, ≤) = (S, ⪯), and the claim follows.
Let F be the set of all f–strings in X. The claim above shows that S = ⋃F together with the union, ≤, of the well–orderings of each f–string in F forms a totally ordered set (S, ≤), and for any B ∈ F, (B, ≤) is a well–ordered subset of (S, ≤). Let C ⊂ S be a nonempty set. Then for some B ∈ F, C ∩ B ≠ ∅. Let b0 be the first element of B ∩ C as a subset of (B, ≤). For any x ∈ C, if x < b0 then it follows that x ∈ B and so x ∈ C ∩ B, contradicting the choice of b0. Hence, for any x ∈ C, b0 ≤ x. This shows that (S, ≤) is a well–ordered set. We now show that (S, ≤) is an f–string. For x ∈ S there is B ∈ F such that x ∈ B. Then x = f(Bx) = f(Sx).

Finally, we claim that S = X. Suppose S ≠ X and let y = f(S). Then R = S ∪ {y} with ⪯ := ≤ ∪ {(x, y) : x ∈ R} is an f–string, and so R ∈ F; but this is a contradiction since y = f(S) ∈ X \ S.

WO implies HMP. Suppose (X, R) is a partially ordered set and let ⪯ be a well–order on X. Let x0 ∈ X be the first element of (X, ⪯). By transfinite construction (Theorem 1.2.21), there is a unique function f : X → X such that f(x0) = x0 and for any other x ∈ X

f(x) = x if {x} ∪ {f(y) : y ≺ x} is a chain in (X, R),
f(x) = x0 otherwise.

We claim that P := f(X) is a chain in (X, R). Notice that for any x ∈ X, f(x) ∈ {x0, x}. Let x, y ∈ P and assume that y ≺ x. Then f(x) = x and x0 ≺ x. Thus, either (a) x0 ≺ y and f(y) = y or (b) y = x0. In either case, y = f(y) belongs to the chain {x} ∪ {f(t) : t ≺ x}, and so xRy or yRx.
We now prove that P is a maximal chain. Suppose z ∈ X \ P. Then f(z) = x0, and so {z} ∪ {f(t) : t ≺ z} is not a chain in (X, R). Hence, there is t0 ≺ z such that neither f(t0) R z nor z R f(t0). Consequently P ∪ {z} is not a chain in (X, R). This shows that P is a maximal chain.
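On a finite partially ordered set the function f of the proof can be computed by a single pass along the well–order. The sketch below is illustrative only: the names is_chain and maximal_chain, the divisibility order, and the sample set are assumptions chosen for the example, and the well–order is the order in which the list X is written.

```python
def is_chain(elements, leq):
    """True if any two elements are comparable under the partial order leq."""
    return all(leq(a, b) or leq(b, a) for a in elements for b in elements)


def maximal_chain(X, leq):
    """Follow the proof: f(x) = x if {x} together with the earlier values
    of f is a chain, and f(x) = x0 otherwise; return the image P = f(X)."""
    x0 = X[0]
    image = []  # values f(y) for the y that precede the current x
    for x in X:
        candidate = set(image) | {x}
        image.append(x if is_chain(candidate, leq) else x0)
    return set(image)


def divides(a, b):
    return b % a == 0


X = [2, 3, 4, 6, 8, 12]  # assumed example, well ordered as listed
P = maximal_chain(X, divides)
print(P)
```

In this example no element of X outside P can be added to P without destroying the chain property, in agreement with the maximality argument above.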

HMP implies ZL. Suppose (X, R) is a partially ordered set in which any chain is bounded above. Let P be a maximal chain and let m ∈ X be an upper bound of P. Then, as P ∪ {m} is a chain that contains the maximal chain P, we conclude that m ∈ P. If x ∈ X is such that m R x, then x is also an upper bound of P; hence x ∈ P, so that x R m and therefore x = m. Thus m is a maximal element of (X, R).

ZL implies AC. Suppose I is a nonempty set and for each i ∈ I, A(i) is a nonempty set. Let C be the set of all functions f such that dom(f) ⊂ I and f(i) ∈ A(i) for each i ∈ dom(f). As I is not empty, there is i ∈ I and, as A(i) is not empty, there is ai ∈ A(i). Thus f = {(i, ai)} ∈ C, so C is nonempty. We partially order C by inclusion. By Lemma 1.1.8, for any chain P in (C, ⊂) we have that ⋃P ∈ C. Hence the conditions of ZL hold and C has a maximal element F. We claim that dom(F) = I; otherwise there exists x ∈ I \ dom(F) and, as A(x) ≠ ∅, there is ax ∈ A(x). Then F ∪ {(x, ax)} ∈ C, contradicting the maximality of F.

1.4. Cardinality
An important concept in the theory of sets is the notion of cardinality. Two sets A and B
have the same cardinality or power if there is a bijective function f : A → B. In this case,
we also say that the sets A and B are equivalent, which is denoted by A ∼ B.
(a) When A = {1, . . . , n} := Zn , with n ∈ N, then the set B is finite and its cardinality
is denoted by the integer n.
(b) If there is no bijection f : Zn → B for any n ∈ N, then the set B is said to be
infinite.
(c) When A = N, then we say that the set B is infinite countable, and its cardinality
is denoted by ω.

Example 1.4.1. The set of all integers Z is countable. The function

f(n) = −2n if n < 0,
f(n) = 2n + 1 if n ≥ 0

is an explicit bijection f : Z → N.
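The bijection can be checked numerically on a finite window of integers. The sketch below is only an illustration; the inverse f_inverse is not given in the notes and is written here as an assumption derived from the formula for f.

```python
def f(n):
    """The map of Example 1.4.1 from Z to N = {1, 2, 3, ...}."""
    return -2 * n if n < 0 else 2 * n + 1


def f_inverse(m):
    """Candidate inverse: even m come from negative n, odd m from n >= 0."""
    return -(m // 2) if m % 2 == 0 else (m - 1) // 2


# The integers -50, ..., 49 are mapped onto 1, ..., 100 without repetition.
values = sorted(f(n) for n in range(-50, 50))
print(values[:5], values[-1])
```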
Example 1.4.2. If An, n ∈ N, is a sequence of countable sets, then A = ⋃_{n∈N} An is also countable. Form an infinite rectangular array by listing the elements of each set Ai = {ai1, ai2, . . .} in a row. The following ordering depicted by arrows gives an implicit bijection between N and A:

    a11 → a12    a13 → a14   ...
        ↙     ↗     ↙     ↗
    a21    a22    a23    a24   ...
     ↓  ↗     ↙     ↗
    a31    a32    a33    a34   ...
        ↙     ↗
    a41    a42    a43    a44   ...
     ↓  ↗
     ⋮      ⋮      ⋮      ⋮

That is, the elements are listed in the order a11, a12, a21, a31, a22, a13, a14, a23, a32, a41, . . ., skipping any element that has already been listed.
This is not the only way of producing a bijection between A and N.
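The zigzag ordering of the array can be generated by running through the anti–diagonals i + j = d and alternating the direction of travel. The sketch below is illustrative; the names zigzag_pairs and enumerate_union, and the sample sets Ai = {i, 2i, 3i, . . .}, are assumptions made for the example.

```python
from itertools import count, islice


def zigzag_pairs():
    """Yield index pairs (i, j) in the order indicated by the arrows."""
    for d in count(2):          # d = i + j labels an anti-diagonal
        rows = range(1, d)      # i = 1, ..., d - 1 and j = d - i
        if d % 2 == 0:
            rows = reversed(rows)
        for i in rows:
            yield (i, d - i)


def enumerate_union(a, limit):
    """List the first `limit` distinct elements a(i, j), skipping repeats."""
    seen, listed = set(), []
    for i, j in zigzag_pairs():
        x = a(i, j)
        if x not in seen:
            seen.add(x)
            listed.append(x)
            if len(listed) == limit:
                return listed


first_pairs = list(islice(zigzag_pairs(), 10))
# Assumed example: a(i, j) = i * j, the j-th element of A_i = {i, 2i, 3i, ...}.
first_elements = enumerate_union(lambda i, j: i * j, 8)
print(first_pairs)
print(first_elements)
```

Skipping repeated elements is what turns the listing of the array into a bijection when the sets An are not pairwise disjoint.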

The following example shows that not every set is countable.

Example 1.4.3. Consider the set M = {0, 1}^N of all infinite sequences of 0s and 1s, that is, an element x ∈ M is of the form x = a1 a2 . . ., with an ∈ {0, 1} for each n ∈ N. The set M is uncountable. Indeed, if it were countable, then we could write all terms of M in a list xn = an1 an2 . . ., n ∈ N. Let x = a1 a2 . . . with an ≠ ann. Although x ∈ M, it does not appear in the list. (why?)
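The diagonal construction can be tried on a finite truncation of such a list. The sketch below is only an illustration on an assumed 4 × 4 array of digits; the name diagonal_flip is hypothetical.

```python
def diagonal_flip(rows):
    """Return a sequence whose n-th digit differs from the n-th digit of rows[n]."""
    return [1 - rows[n][n] for n in range(len(rows))]


# Assumed sample: the first four digits of four listed sequences.
rows = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
x = diagonal_flip(rows)
print(x)
```

The sequence x differs from the n-th row in the n-th position, so it cannot coincide with any row of the list.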
Theorem 1.4.4. If X is an infinite set then there exists a set C ⊂ X that is infinite countable.

Proof. By the Axiom of choice there exists a choice function f on P(X) \ {∅}. Set h(0) := f(X); by induction, there exists a unique function h : Z+ → X such that for all n ≥ 1, h(n) = f(X \ {h(0), . . . , h(n − 1)}). The function h : Z+ → h(Z+) is the desired bijection.
Example 1.4.5. The set of all real numbers in the interval [0, 1] is uncountable. One can use the fact that every real number in (0, 1] admits a unique binary expansion with an infinite number of 1s, combined with the result in the previous example. A mild modification of this method, based on decimal expansions, can be used as well.
Theorem 1.4.6. A set A is infinite iff there is a proper subset B ⊂ A such that A ∼ B.

Proof. Clearly no finite set is equivalent to any of its proper subsets. Thus, only necessity needs a proof. If A is infinite countable then for any bijection f : N → A, let B = {f(2n) : n ∈ N}. Clearly A ∼ B. Assume now that A is infinite and uncountable. There exists an infinite countable set C ⊂ A. Since A is not countable, neither is A \ C. Hence there is an infinite countable set D ⊂ A \ C. There exists a bijection g : C ∪ D → D since D ∪ C is an infinite countable set. The function f(x) = x if x ∈ A \ (D ∪ C) and f(x) = g(x) if x ∈ C ∪ D is a bijection from A onto B := A \ C = (A \ (C ∪ D)) ∪ D.

The following result provides a link between cardinality and well–ordering, and it is a direct consequence of AC.
Lemma 1.4.7. For any cardinal number A there exists a well order type α such that Pα has cardinality A.

Proof. Let A be a set with cardinality A. By WO there is a well–order on A. Let α denote the order type of A. The conclusion follows from Theorem 1.2.19.
Definition 1.4.8. Let A and B be two cardinal numbers and let A and B be sets of cardinality A and B respectively. We say that A ⪯ B if there exists a set C ⊂ B such that A ∼ C. We say that A ≺ B if A ⪯ B but A ≠ B.

Using properties of composition of functions it is easy to check that for any cardinal numbers A, B and C, if A ⪯ B and B ⪯ C then A ⪯ C, and that A ⪯ A. The following result implies that for any pair of cardinal numbers A and B, only one of the following alternatives may occur: (a) A = B, (b) A ≺ B, (c) B ≺ A.
Theorem 1.4.9. (Bernstein–Schröder) If A ⪯ B and B ⪯ A then A = B.

Proof. We present a proof that uses AC. Let A and B be sets with cardinality A and B respectively. Suppose C ⊂ A and D ⊂ B are such that A ∼ D and B ∼ C. We well–order the sets (A, ≤) and (B, ⪯). There exists a unique order isomorphism f from A to an order ideal SB of B and, since order isomorphisms are bijections, A ∼ D ∼ SB. Similarly, there is an order isomorphism g from B to an order ideal SA of A and B ∼ C ∼ SA. Hence, f(SA) is an order ideal of B contained in the ideal f(A) = SB of B. Since f is order preserving, f(SA) is order isomorphic to B. By Corollary 1.2.15 and Theorem 1.2.16 we conclude that f(SA) = SB = B. This shows that A ∼ B.
Corollary 1.4.10. For any pair of cardinal numbers A, B one and only one of the following alternatives holds:
(i) A = B
(ii) A ≺ B
(iii) B ≺ A

Proof. By Lemma 1.4.7 there are well ordered sets (A, ≤) and (B, ⪯) with cardinalities A and B respectively. By Theorem 1.2.18 one and only one of the following holds: (a) A is order isomorphic to B, (b) A is order isomorphic to an initial segment of B, or (c) B is order isomorphic to an initial segment of A. In case (a) we obtain (i); cases (b) and (c), in combination with Bernstein’s theorem, imply (ii) and (iii) respectively.
Theorem 1.4.11. There is a unique ordinal type Ω such that PΩ = {α : α < Ω} is uncountable and for any β ∈ PΩ, β is countable. Furthermore, if C ⊂ PΩ and C is countable then there is β ∈ PΩ such that C ⊂ {α : α < β}.

Proof. Let γ be an ordinal type such that Pγ has the same cardinality as that of R. If for any β ∈ Pγ, {α ∈ Pγ : α < β} is countable then set Ω := γ; otherwise, as Pγ is well ordered, let Ω be the smallest ordinal type in Pγ that is uncountable. Let C ⊂ PΩ be countable and set D := ⋃{Pα : α ∈ C}. Being the countable union of countable sets, D is countable. Let β be the first element of PΩ \ D. As each Pα with α ∈ C is an initial segment of PΩ, D is the initial segment Pβ = {α ∈ PΩ : α < β}.

Remark 1.4.12. The cardinality of the set N is denoted by ℵ0. We have that Pω has cardinality ℵ0. The cardinality of PΩ is denoted as ℵ1. The ordinal type Ω is the smallest one such that PΩ is uncountable. It follows that the cardinality of PΩ is at most the cardinality c of the set R. The continuum hypothesis (CH) is the assertion that ℵ1 = c, that is, there is no set whose cardinality is strictly between ℵ0 and c. Two important results of set theory proven by Cohen state that CH is independent of the axioms of ZF theory plus AC and that AC is independent of the axioms of ZF theory.

For any set X, we will use #(X) to denote its cardinality. The following result due to Cantor states that there is no largest cardinal number (or order type).
Theorem 1.4.13. (Cantor) For any set X, #(X) ≺ #(P(X)).

Proof. If X = ∅ then P(X) = {∅} and so #(∅) = 0 < 1 = #(P(∅)). For any set X ≠ ∅ consider the function h : X → P(X) given by x ↦ {x}. This is an injective function and so #(X) ⪯ #(P(X)). Suppose U is a set whose cardinality is the same as that of P(U). Then U ≠ ∅ and there exists a bijection f : U → P(U). Consider the set
S := {u ∈ U : u ∉ f(u)}.
Then S ∈ P(U) since S ⊂ U. By assumption there exists a ∈ U such that f(a) = S; however, we have that a ∈ S iff a ∉ S, which is a contradiction. This shows that no such set U exists.
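For a small finite set the argument can be verified exhaustively: for every function f : U → P(U), the set S = {u ∈ U : u ∉ f(u)} is not in the image of f. The sketch below is illustrative only; the names power_set and diagonal_set and the choice U = {0, 1, 2} are assumptions made for the example.

```python
from itertools import combinations, product


def power_set(U):
    """All subsets of the finite set U, as frozensets."""
    U = list(U)
    return [frozenset(c) for r in range(len(U) + 1) for c in combinations(U, r)]


def diagonal_set(U, f):
    """The set S = {u in U : u not in f(u)} used in the proof."""
    return frozenset(u for u in U if u not in f[u])


U = [0, 1, 2]
subsets = power_set(U)

# Run through all 8**3 functions f : U -> P(U), encoded by their tuples of images.
never_attained = all(
    diagonal_set(U, dict(zip(U, images))) not in images
    for images in product(subsets, repeat=len(U))
)
print(never_attained)
```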

1.5. Simple algebraic structures

The real numbers and the algebraic operations of addition and multiplication can be un-
derstood in terms of basic algebraic structures.
Definition 1.5.1. A set G and a binary operation · (a map from G × G to G) is said to be a group if
(a) (Associative property) for any a, b, c ∈ G,
a · (b · c) = (a · b) · c
(b) (Unit element) There exists e ∈ G such that for any g ∈ G
e · g = g · e = g
(c) (Inverse property) For any g ∈ G, there is f ∈ G such that
f · g = g · f = e
The group G is said to be Abelian or commutative if
(d) for any a, b ∈ G
a · b = b · a
Remark 1.5.2. It is left as an exercise to show that the element e satisfying (b) is unique, and that for any g ∈ G, there is only one element f such that f · g = e = g · f. Such an element f is denoted as g⁻¹.
The simplest example of a commutative group is (Z, +), the integers with the usual addition from grade school.
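On a finite set the axioms of Definition 1.5.1 can be checked by brute force. The sketch below is an illustration under the assumption that the set is finite and the operation is given as a Python function; the name is_group and the examples modulo 5 are not part of the notes.

```python
def is_group(G, op):
    """Check closure and axioms (a), (b), (c) of Definition 1.5.1 on a finite set."""
    G = list(G)
    closed = all(op(a, b) in G for a in G for b in G)
    associative = all(
        op(a, op(b, c)) == op(op(a, b), c) for a in G for b in G for c in G
    )
    units = [e for e in G if all(op(e, g) == g == op(g, e) for g in G)]
    if not (closed and associative and units):
        return False
    e = units[0]
    # Every element must have a two-sided inverse.
    return all(any(op(f, g) == e == op(g, f) for f in G) for g in G)


def add_mod5(a, b):
    return (a + b) % 5


def mul_mod5(a, b):
    return (a * b) % 5


print(is_group(range(5), add_mod5))     # integers modulo 5 under addition
print(is_group(range(5), mul_mod5))     # 0 has no multiplicative inverse
print(is_group(range(1, 5), mul_mod5))  # nonzero residues under multiplication
```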
Definition 1.5.3. A set R with two binary operations + and · is said to be a ring if
(a) (R, +) is a commutative group.
(b) For any a, b, c ∈ R,
a · (b · c) = (a · b) · c
(c) (distribution property) For any a, b, c ∈ R
a · (b + c) = a · b + a · c
(a + b) · c = a · c + b · c
The additive unit element in (R, +) is denoted as 0. The ring (R, +, ·) is a commutative
ring if
(d) for all a, b ∈ R
a·b=b·a
The ring (R, +, ·) is unital if
(e) there is e ∈ R such that for all a ∈ R
a·e=e·a=a
The commutative ring (R, +, ·) is an integral domain if
(f) For any a, b ∈ R, a 6= 0 and ab = 0 implies that b = 0.

The simplest example of a commutative ring with unit is (Z, +, ·), the set of integer numbers with the operations of addition and multiplication studied in grade school. In fact, Z is an integral domain with unit 1.
Definition 1.5.4. A commutative ring (F, +, ·) is a field if (F \ {0}, ·) is a group. A field F is an ordered field if it has a total order < satisfying
(a) If a < b, then a + c < b + c for all c ∈ F.
(b) If a < b and c > 0, then a · c < b · c.
The set {x ∈ F : x > 0} is the set of all positive elements of F. An ordered field is an Archimedean field if for any a, b ∈ F with a > 0, there is an integer n ∈ N such that na > b (here na is defined as 0a = 0, and for n ∈ N, na = a + (n − 1)a).

The rational numbers Q and the real numbers R with the usual sum and product from grade school are Archimedean fields. The field Q however is not order complete, whereas R is. Cantor and Dedekind showed that the real numbers with the usual arithmetic operations of addition and multiplication and order (R, +, ·, <) is the only (up to isomorphism) Archimedean field that is order complete.

1.6. Exercises
Exercise 1.6.1. Given two ordered pairs (x, y) and (u, v) show that (x, y) = (u, v) iff x = u and y = v.
Exercise 1.6.2. Suppose (A, ≤), (B, ⪯) and (C, R) are totally ordered sets.
(a) If A and B are order isomorphic, show that B and A are order isomorphic. (Hint: If f : A → B is an order–isomorphism from (A, ≤) to (B, ⪯), show that f⁻¹ is an order isomorphism from B to A.)
(b) If f is an order isomorphism from A to B, show that for any x, y ∈ A, x < y iff f(x) ≺ f(y).
(c) If A and B are order isomorphic and B and C are order isomorphic, show that A and C are order isomorphic.
(d) If A and B are order isomorphic and A is well ordered, show that B is also well ordered.
Exercise 1.6.3. Suppose (A, ≤) is a well ordered set. Show that for any x ∈ A, the set Āx := Ax ∪ {x} = {y ∈ A : y ≤ x} is an ideal. If A is not bounded above (in A), show that Āx is an initial segment.
Exercise 1.6.4. Show that AC is equivalent to each of the following statements:
Kuratowski’s lemma: If (X, ≤) is a partially ordered set and P is a chain in (X, ≤) then there exists a maximal chain Q in (X, ≤) that contains P.
Zermelo’s principle: If 𝒜 is a family of nonempty pairwise disjoint sets then there exists a set C such that for each A ∈ 𝒜, A ∩ C contains exactly one element.
Exercise 1.6.5. Prove the following statements:
1. Any subset C ⊂ N is either finite or countable.
2. Suppose that A is a set for which there is a function f : N → A that is onto. Then A is either finite or countable.
Exercise 1.6.6. Show that the set of all rational numbers Q is countable.
Exercise 1.6.7. For any set X, let {0, 1}^X be the set of all functions from X into {0, 1}. Show that #({0, 1}^X) = #(P(X)). (Hint: for each set A ⊂ X, consider the indicator function 1A(x) = 0 if x ∈ X \ A and 1A(x) = 1 if x ∈ A.)
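The correspondence suggested in the hint can be explored on a small finite set before attempting the proof. The sketch below is an illustration only; the names indicator and support and the three–element set X are assumptions made for the example, and functions X → {0, 1} are encoded as tuples of values.

```python
from itertools import product


def indicator(A, X):
    """The indicator function of A, encoded as the tuple of its values on X."""
    return tuple(1 if x in A else 0 for x in X)


def support(phi, X):
    """The set {x in X : phi(x) = 1} associated with a function phi : X -> {0, 1}."""
    return frozenset(x for x, value in zip(X, phi) if value == 1)


X = ["a", "b", "c"]
functions = list(product((0, 1), repeat=len(X)))   # all of {0, 1}^X
subsets = {support(phi, X) for phi in functions}   # the sets they determine
print(len(functions), len(subsets))
```

In this finite example the maps A ↦ 1A and φ ↦ {x : φ(x) = 1} are inverse to each other, which is the idea behind the exercise.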
Exercise 1.6.8. Show that if M is an infinite set and A is a countable set, then M and
M ∪ A are equivalent (that is, they have the same cardinality). Show that [0, 1], (0, 1], [0, 1),
(0, 1) and R have the same cardinality.
Exercise 1.6.9. (Distributive formula) Let J be a non–empty set and {Ij : j ∈ J} be a collection of nonempty sets. Assume that for each j ∈ J and i ∈ Ij there is associated a set A^j_i. Let I = ∏_{j∈J} Ij. Prove that

(1.2) ⋃_{j∈J} ⋂_{i∈Ij} A^j_i = ⋂_{α∈I} ⋃_{j∈J} A^j_{α(j)}

(Hint: For each j ∈ J let Îj := {i ∈ Ij : x ∉ A^j_i}. If the right hand side of (1.2) is not empty and x is one of its elements, show that there exists j for which Îj = ∅.)
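Formula (1.2) can be tested on a small finite family before proving it. The sketch below is only an illustration; the nested dictionary A, with A[j][i] standing for the set indexed by j ∈ J and i ∈ Ij, is an assumed example, and the names lhs and rhs are hypothetical.

```python
from itertools import product


def lhs(A):
    """Union over j of the intersection over i in I_j of A[j][i]."""
    return set().union(*(set.intersection(*A[j].values()) for j in A))


def rhs(A):
    """Intersection over all choice functions alpha of the union over j of A[j][alpha(j)]."""
    J = list(A)
    result = None
    for alpha in product(*(A[j].keys() for j in J)):
        union = set().union(*(A[j][i] for j, i in zip(J, alpha)))
        result = union if result is None else result & union
    return result


# Assumed example with J = {0, 1}, I_0 = {0, 1} and I_1 = {0, 1, 2}.
A = {
    0: {0: {1, 2, 3}, 1: {2, 3, 4}},
    1: {0: {3, 5}, 1: {5, 6}, 2: {5, 7}},
}
print(lhs(A), rhs(A))
```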
Exercise 1.6.10. Starting from Z+ = {0} ∪ N one can construct the integers Z by means of an equivalence relation on Z+ × Z+ by setting
(m, n) ∼ (p, q) iff m + q = p + n

Let Z := (Z+ × Z+)/∼. For any (m, n) ∈ Z+ × Z+ we use [(m, n)] to denote the class of equivalence of (m, n). We define the following operations on Z:
[(m, n)] + [(p, q)] = [(m + p, n + q)]
[(m, n)] · [(p, q)] = [(mp + nq, mq + np)]
(a) Show that these operations are well defined, that is, if (m, n) ∼ (m′, n′) and (p, q) ∼ (p′, q′) then (m + p, n + q) ∼ (m′ + p′, n′ + q′) and (mp + nq, mq + np) ∼ (m′p′ + n′q′, m′q′ + n′p′). (Hint: for multiplication, show first that (mp + nq, mq + np) ∼ (mp′ + nq′, mq′ + np′).)
(b) Show that (Z, +, ·) is a commutative ring with additive unit 0∗ := [(0, 0)] and multiplicative unit 1∗ := [(1, 0)]. Furthermore, show that a ∈ Z \ {0∗} and a · b = 0∗ imply that b = 0∗. For any [(m, n)] ∈ Z, we have that [(m, n)] + [(n, m)] = [(0, 0)]. In particular, if n ∈ Z+ and n∗ = [(n, 0)], we set −n∗ := [(0, n)].
(c) Show that the map τ : Z+ → Z given by n ↦ [(n, 0)] preserves the addition and product operations in Z+, that is τ(n + m) = τ(n) + τ(m) and τ(mn) = τ(m) · τ(n).
(d) We define a total order on Z by saying that a < b iff b − a > 0∗, where [(m, n)] > 0∗ iff m > n in the order in Z+. The order defined here is the usual order . . . < −n − 1 < −n < . . . < −1 < 0 < 1 < . . . < m < m + 1 < . . . learned in grade school.
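The construction of this exercise can be tried out numerically. In the sketch below, which is only an illustration, a class [(m, n)] is represented by the pair obtained after subtracting min(m, n) from both entries; this choice of representative and the names normalize, equivalent, add and mul are assumptions made for the example.

```python
def normalize(m, n):
    """Representative of [(m, n)] with at least one entry equal to 0."""
    k = min(m, n)
    return (m - k, n - k)


def equivalent(a, b):
    """(m, n) ~ (p, q) iff m + q = p + n."""
    return a[0] + b[1] == b[0] + a[1]


def add(a, b):
    return normalize(a[0] + b[0], a[1] + b[1])


def mul(a, b):
    (m, n), (p, q) = a, b
    return normalize(m * p + n * q, m * q + n * p)


# (2, 5) and (0, 3) both represent -3; (4, 1) and (3, 0) both represent 3.
print(equivalent((2, 5), (0, 3)), equivalent((4, 1), (3, 0)))
print(add((2, 5), (4, 1)), add((0, 3), (3, 0)))  # both represent 0
print(mul((2, 5), (4, 1)), mul((0, 3), (3, 0)))  # both represent -9
```

Equivalent pairs produce the same sum and the same product, which is the well–definedness asserted in part (a).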
Exercise 1.6.11. The rational numbers Q can be constructed from the integers Z by an equivalence relation on Z × (Z \ {0}) given by
(a, b) Q (c, d) iff ad = cb
Set Q := (Z × (Z \ {0}))/Q, and for any (a, b) ∈ Z × (Z \ {0}) we use a/b, or [(a, b)]Q, to denote the equivalence class of (a, b) in Q. We define the following algebraic operations on Q:
[(a, b)]Q + [(c, d)]Q = [(ad + cb, db)]Q
[(a, b)]Q · [(c, d)]Q = [(ac, bd)]Q
(a) Show that (Q, +, ·) is a field with additive unit 0Q := [(0, 1)]Q and multiplicative unit [(1, 1)]Q. Moreover, show that if p = [(a, b)]Q, then the additive inverse of p is −p := [(−a, b)]Q = [(a, −b)]Q; if p ≠ 0Q, then the multiplicative inverse of p is p⁻¹ = [(b, a)]Q.
(b) Show that the map n ↦ [(n, 1)]Q preserves the ring operations + and · on Z.
(d) We define a total order on Q by setting p < q iff q − p ∈ P := {[(a, b)]Q : a, b ∈ N}. Show that if p, q ∈ P then p + q ∈ P and p · q ∈ P. Conversely, show that q ∉ P and q ≠ 0 iff −q ∈ P. Moreover, if p ∈ P and p · q ∈ P, then q ∈ P.
(e) Show that n ↦ [(n, 1)]Q, n ∈ Z, preserves the usual order in Z.
Exercise 1.6.12. Let (G, +) be a group. Suppose U and V are two nonempty subsets of G such that (a) V ⊂ U and (b) (x, y) ∈ U × V implies y − x ∈ V. Show that V = U and that V is a subgroup of G, that is, (V, +) is itself a group.
Chapter 2

Elements of point set Topology¹

In this Section we give a brief presentation of topics on point set Topology we use in these
notes. In particular, we discuss convergence over nets and uniformities. Convergence of nets
will make discussion of continuity on topological vector spaces much simpler. Uniformities
will be useful to extend the notion of measurability in metric spaces. Our presentation is
not exhaustive; however, we tried to make this section as self–contained as possible.

2.1. General definitions

Definition 2.1.1. Let X and τ be a nonempty set and a collection of subsets of X respectively. Then, (X, τ) is a topological space if
(a) X ∈ τ.
(b) If {Ui}i∈I is an arbitrary collection of sets in τ, then ⋃_{i∈I} Ui ∈ τ.
(c) If U and V are elements of τ, then U ∩ V ∈ τ.
In such case, τ is called a topology for X; a set in τ is called open; and a set F ⊂ X with F^c := X \ F ∈ τ is called a closed set.

If τ1 and τ2 are topologies on X and τ1 ⊂ τ2, we say that τ1 is weaker or coarser than τ2, or equivalently, that τ2 is finer than τ1.
The following concepts describe how to build a topology out of a collection of sets.

Definition 2.1.2. Let (X, τ) be a topological space.
1. A collection B of subsets of X is a base for τ if any open set U ∈ τ is the union of sets in B. A collection S of subsets of X is a subbase for τ if the collection of finite intersections of elements in S is a base for τ.
2. A local base at a point x ∈ X is a collection Vx of open sets that contain x such that if U ∈ τ contains x, then there is V ∈ Vx with V ⊂ U.

¹ This chapter may be skipped and used only as reference.
Example 2.1.3. The usual topology on the real line R is the one that has open intervals
(a, b), where −∞ < a < b < ∞, as a basis.
Example 2.1.4. Suppose X has a total order <. For any x ∈ X define Ax = {y : y < x} and Bx = {y : x < y}. The collection S = {Ax, Bx : x ∈ X} is a subbasis of a topology τo called the order topology. For any x < y, the set Bx ∩ Ay = {z : x < z < y} is open.

The next definition relates arbitrary sets within a topological space.

Definition 2.1.5. Let (X, τ) be a topological space.
3. The interior E◦ of a set E ⊂ X is the largest open set contained in E, that is, E◦ = ⋃{U ∈ τ : U ⊂ E}.
4. The closure Ē of a set E ⊂ X is the smallest closed set that contains E; that is, Ē = ⋂{F : E ⊂ F, (X \ F) ∈ τ}.
5. A point x ∈ X is an accumulation point of a set A ⊂ X iff for any V ∈ τ with x ∈ V, (A ∩ V) \ {x} ≠ ∅. We will use A′ to denote the set of accumulation points of A. Points in A′ are also called limit points of A.
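In a finite topological space the interior, the closure and the set of accumulation points can be computed directly from these definitions. The sketch below is only an illustration; the three–point space and the function names are assumptions made for the example.

```python
def interior(E, tau):
    """Union of all open sets contained in E."""
    return frozenset().union(*(U for U in tau if U <= E))


def closure(E, X, tau):
    """Intersection of all closed sets (complements of open sets) containing E."""
    result = X
    for U in tau:
        F = X - U
        if E <= F:
            result = result & F
    return result


def accumulation_points(A, X, tau):
    """Points x such that every open V containing x meets A outside of x."""
    return frozenset(
        x for x in X if all((A & V) - {x} for V in tau if x in V)
    )


# Assumed example: X = {1, 2, 3} with the topology {∅, {1}, {1, 2}, X}.
X = frozenset({1, 2, 3})
tau = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
A = frozenset({2})

print(interior(A, tau), closure(A, X, tau), accumulation_points(A, X, tau))
```

In this example the closure of A coincides with the union of A and its accumulation points.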

We now describe properties of topological spaces that concern the relationship between points, closed sets, and open sets.
Definition 2.1.6. Let (X, τ) be a topological space.
6. τ is T1 if {x} is closed for every x ∈ X.
7. τ on X is Hausdorff if for any x1, x2 ∈ X, x1 ≠ x2, there are U1, U2 ∈ τ such that xi ∈ Ui and U1 ∩ U2 = ∅.
8. A topology τ on X is regular if for any x ∈ X and closed set F ⊂ X, x ∉ F implies that there are sets U, V ∈ τ such that x ∈ U, F ⊂ V, and U ∩ V = ∅.
9. A topology τ on X is normal if for any closed subsets A1 and A2, A1 ∩ A2 = ∅ implies that there are sets U1, U2 ∈ τ such that Ai ⊂ Ui and U1 ∩ U2 = ∅.
10. (X, τ) is first countable if every point x ∈ X has a countable local base, and (X, τ) is second countable if there is a countable base for the topology τ.
Example 2.1.7. The real line R with its usual topology is Hausdorff. A totally ordered set with the order topology is also Hausdorff. To see this, suppose x < y. If there is z such that x < z < y, then Az and Bz are disjoint open sets containing x and y respectively. If no such z exists, then Ay and Bx are disjoint open sets containing x and y respectively.
Theorem 2.1.8. For any topological space (X, τ) and any A ⊂ X, Ā = A ∪ A′.

Proof. Suppose x ∈ Ā. Let Vx be the set of all open sets that contain x. If V ∈ Vx then A ∩ V ≠ ∅; otherwise, A ⊂ X \ V, which means that x ∈ Ā ⊂ X \ V, a contradiction. If x ∈ A there is nothing else to do. If x ∉ A, then (V \ {x}) ∩ A ≠ ∅ for any V ∈ Vx, and so x ∈ A′. Therefore, Ā ⊂ A ∪ A′. Conversely, if x ∈ A′ and F is a closed set that contains A, then (X \ F) ∩ A = ∅, and so x ∈ F (otherwise X \ F would be an open set containing x that does not meet A). Therefore, A ∪ A′ ⊂ Ā.
Definition 2.1.9. In any topological space (X, τ), the boundary of a set E ⊂ X is defined as ∂E := Ē ∩ \overline{X \ E}, the intersection of the closure of E and the closure of X \ E.
Theorem 2.1.10. Let (X, τ) be a topological space. Then
(2.1) ∂E = Ē \ E◦ = ∂(X \ E)
for all E ⊂ X.

Proof. The result follows from the identity (X \ A)◦ = X \ Ā, which holds for any A ⊂ X. In particular, A = X \ E gives the first identity.
Definition 2.1.11. Suppose (X, τ) is a topological space and ∅ ≠ Y ⊂ X. The collection τY = {Y ∩ U : U ∈ τ} defines a topology on Y. This topology is called the relative topology induced by τ, and (Y, τY) is said to be a subspace of (X, τ).
Definition 2.1.12. A function f between topological spaces (X, τX) and (Y, τY) is said to be continuous at a point x ∈ X if for any U ∈ τY containing f(x), there is V ∈ τX containing x such that f(V) ⊂ U; f is continuous on X if it is continuous at every point of X.

Continuity of functions is completely characterized in terms of preimages of open and closed sets, see Exercise 2.12.4. The collection of all continuous functions from X to Y is denoted by C((X, τX), (Y, τY)), or simply C(X, Y) when there is no ambiguity about the topologies τX and τY.
Definition 2.1.13. Let (X, τX) and (Y, τY) be topological spaces. X and Y are said to be homeomorphic if there exists a bijective function f from X to Y such that both f and f⁻¹ are continuous. Such a function f is called a homeomorphism between X and Y.

Suppose X is a non–empty set, {(Yα, τα) : α ∈ A} a collection of topological spaces and F = {fα : X → Yα : α ∈ A} a collection of functions. The topology on X generated by F is the minimal (coarsest) topology on X that makes each fα continuous. This topology has subbase {fα⁻¹(V) : V ∈ τα, α ∈ A}. When all the Yα coincide with either the real or the complex space with the Euclidean topology, this topology is said to be the weak topology on X generated by F and is denoted by σ(X, F).
Theorem 2.1.14. Let F be a collection of real (or complex) valued functions on X. For
any non–empty subset A of X, let FA denote the restrictions of F to A. Then, the relative
topology on A induced by σ(X, F) and the weak topology σ(A, FA ) coincide.

Proof. For any f ∈ F, let f|A be its restriction to A. As A ∩ f⁻¹(V) = (f|A)⁻¹(V) for all V ⊂ C, we have that σ(A, FA) = {A ∩ U : U ∈ σ(X, F)}.

Definition 2.1.15. Let (X, τ) be a topological space. D ⊂ X is dense in X if D̄ = X. X is separable if it has a countable dense subset.
Example 2.1.16. Suppose (X, <) is totally ordered. A set Z ⊂ X is order–dense if for any x < y, there is z ∈ Z such that x ≤ z ≤ y. If x < y and {u : x < u < y} = ∅, we say that x is a predecessor of y and that y is a successor of x. Let us denote by Z′′ the collection of predecessors and successors in X.

If τ is a topology on X which is finer than the order topology τo, and Z′ is dense in (X, τ), then Z = Z′ ∪ Z′′ is order–dense; furthermore, Z is dense in (X, τo). Indeed, suppose x < y and let Ay = {u : u < y} and Bx = {u : x < u}. If Ux,y := Ay ∩ Bx = ∅, then (x, y) is a predecessor–successor pair and so x, y ∈ Z′′ ⊂ Z. Hence, for z ∈ {x, y}, x ≤ z ≤ y. If Ux,y ≠ ∅, then Ux,y ∈ τo ⊂ τ and so there is z ∈ Z′ ⊂ Z such that x < z < y.
Theorem 2.1.17. If (X, τ) is a topological space that admits a countable base for τ, then X is separable.

Proof. Suppose {Bn : n ∈ N} is a base for τ; discarding the empty set if necessary, we may assume that each Bn is nonempty. For each n ∈ N choose xn ∈ Bn. We claim that D = {xn : n ∈ N} is dense. If U := X \ D̄ ≠ ∅ then U is a nonempty open set and so Bn ⊂ U for some n. This means that xn ∈ U, which is a contradiction since xn ∈ D.
Lemma 2.1.18. (X, τ) is a normal space iff whenever A and U are, respectively, closed and open subsets of X with A ⊂ U, there is V ∈ τ such that A ⊂ V ⊂ V̄ ⊂ U.

Proof. Suppose X is normal. Since A and X \ U are disjoint closed sets, there are disjoint open sets V, W such that A ⊂ V and (X \ U) ⊂ W. As V ⊂ X \ W and X \ W is closed, we have that V̄ ⊂ X \ W ⊂ U. Consequently A ⊂ V ⊂ V̄ ⊂ U.

Conversely, suppose A and B are disjoint closed sets. Then A ⊂ U := X \ B. Choose V ∈ τ with A ⊂ V ⊂ V̄ ⊂ U. Then B = X \ U ⊂ X \ V̄ = (X \ V)◦ and V ∩ (X \ V)◦ = ∅.
Theorem 2.1.19. If (X, τ) is regular and second countable, then it is normal.

Proof. Suppose A and B are nonempty disjoint closed sets. For each x ∈ A there is a set Vx ∈ τ around x such that V̄x ∩ B = ∅. Similarly, for any y ∈ B there is a set Uy ∈ τ around y such that Ūy ∩ A = ∅. Since τ has a countable base, there are countable collections {Pn : n ∈ N} and {Qn : n ∈ N} of open sets, each Pn contained in some Vx and each Qn contained in some Uy, such that A ⊂ ⋃_{x∈A} Vx = ⋃_n Pn and B ⊂ ⋃_{y∈B} Uy = ⋃_n Qn. In particular P̄n ∩ B = ∅ and Q̄n ∩ A = ∅ for all n. Let P∗n := Pn \ ⋃_{j=1}^{n} Q̄j and Q∗n := Qn \ ⋃_{j=1}^{n} P̄j. Clearly P∗n and Q∗n are open sets, P∗n ∩ Q∗m = ∅, and from

Pn ∩ A ⊂ Pn \ ⋃_j Q̄j ⊂ Pn \ ⋃_{j=1}^{n} Q̄j
Qn ∩ B ⊂ Qn \ ⋃_j P̄j ⊂ Qn \ ⋃_{j=1}^{n} P̄j

we have that A ⊂ P := ⋃_n P∗n, B ⊂ Q := ⋃_n Q∗n, and P ∩ Q = ∅.

Lemma 2.1.20. (Urysohn’s separation lemma) Suppose X is a normal topological space. If A and B are two non–empty disjoint closed sets in X, there exists a continuous function 0 ≤ f ≤ 1 such that f(A) = {1} and f(B) = {0}.

Proof. Let U = X \ B so that A ⊂ U. Let D0 = {0, 1} and for each n ∈ N define Dn = {a/2^n : 0 < a < 2^n, a ≡ 1 mod 2}. The set D = ⋃_n Dn is the set of dyadic rational numbers in [0, 1]. We will define a chain {Ut}t∈D of subsets of X as follows. First, U1 = A and U0 = U. For n = 1, by Lemma 2.1.18 there is an open set U_{1/2} such that U1 ⊂ U_{1/2} ⊂ Ū_{1/2} ⊂ U0. Suppose that n ≥ 2 and that we have defined open sets Ut with t ∈ ⋃_{k=0}^{n−1} Dk in such a way that Ūt ⊂ Us whenever s, t ∈ ⋃_{k=0}^{n−1} Dk and s < t. For each u = a/2^n ∈ Dn, s = (a − 1)/2^n and t = (a + 1)/2^n belong to ⋃_{k=0}^{n−1} Dk, and so the sets Us and Ut are already defined. Choose an open set Uu so that Ūt ⊂ Uu ⊂ Ūu ⊂ Us. This procedure defines a chain {Ut}t∈D of open sets satisfying Ūt ⊂ Us whenever s, t ∈ D and s < t.

Define the function f : X → [0, 1] as

0 if x ∈ X \ U
f (x) =
sup{t ∈ D : x ∈ Ut } if x∈U
Clearly f = 1 on A, and f = 0 on B. To show that f is continuous it is enough to show
that $f^{-1}((\alpha, 1])$ and $f^{-1}([0, \beta))$ are open sets for any 0 ≤ α < 1 and 0 < β ≤ 1. Notice that
f(x) > α if and only if $x \in \bigcup\{U_t : t \in D,\ t > \alpha\}$. Hence $f^{-1}((\alpha, 1]) = \bigcup\{U_t : t \in D,\ t > \alpha\}$
is open. Similarly, $\{U_t\}_t$ being a chain, we have that f(x) ≥ β if and only if $x \in \bigcap\{U_s : s \in D,\ s < \beta\}$.
Hence, $f^{-1}([\beta, 1]) = \bigcap\{U_s : s \in D,\ s < \beta\} = \bigcap\{\overline{U_t} : t \in D,\ t < \beta\}$ is closed, and
so $f^{-1}([0, \beta))$ is open.
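
The inductive step of the construction relies on the fact that each new dyadic index u = a/2^n has both neighbours (a − 1)/2^n and (a + 1)/2^n among the indices of earlier levels. The following Python sketch checks this bookkeeping for the first few levels; the function names `dyadic_level` and `neighbours` are illustrative choices, not notation from the text.

```python
from fractions import Fraction


def dyadic_level(n):
    """Return D_n: {0, 1} for n = 0, else {a / 2^n : 0 < a < 2^n, a odd}."""
    if n == 0:
        return {Fraction(0), Fraction(1)}
    return {Fraction(a, 2 ** n) for a in range(1, 2 ** n, 2)}


def neighbours(u, n):
    """For u = a / 2^n in D_n, return (s, t) = ((a - 1)/2^n, (a + 1)/2^n)."""
    step = Fraction(1, 2 ** n)
    return u - step, u + step


# Each new index of level n lies strictly between two indices that were
# introduced at earlier levels, so U_s and U_t exist when U_u is chosen.
earlier = set(dyadic_level(0))
for n in range(1, 6):
    for u in dyadic_level(n):
        s, t = neighbours(u, n)
        assert s in earlier and t in earlier and s < u < t
    earlier |= dyadic_level(n)
```

Exact rational arithmetic (`Fraction`) is used so that membership tests are not affected by rounding.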

2.2. Connected spaces

A topological space X is said to be connected if whenever A ⊂ X is closed and open in X
(clopen), then either A = ∅ or A = X. A set B ⊂ X is connected, if B is connected in the
relative topology on B inherited from X.
Remark 2.2.1. Two sets A, B in a topological space are said to be separated iff
$\overline{A} \cap B = A \cap \overline{B} = \emptyset$. From the definition of connectedness, it follows that a topological
space X is connected iff it is not the union of two nonempty separated subsets.
Example 2.2.2. For any a, b ∈ R, a < b, the interval I = [a, b] is connected in the usual
topology in R. Indeed, suppose A ⊂ [a, b] is open and closed in I. Without loss of generality
suppose a ∈ A (otherwise consider I ∩ A^c instead of A). Set B = I \ A. If B ≠ ∅ then, as
B is closed, β := inf(B) ∈ B. Thus, β > a, and as B is open in I, (β − ε, β + ε) ∩ I ⊂ B for
some ε > 0. This means that there are points d ∈ B with β − ε < d < β, contradicting the
definition of β; hence, I = A.
Theorem 2.2.3. If Y is a connected subset of a topological space X, then $\overline{Y}$ is connected.

Proof. Suppose $\overline{Y}$ is the union of two disjoint clopen sets A and B in $\overline{Y}$. Then A ∩ Y and
B ∩ Y are clopen in Y. Hence, either A ∩ Y = ∅ or B ∩ Y = ∅. Suppose Y ∩ B = ∅. Then
Y ⊂ A and so, $\overline{Y} = \overline{A} = A$ since A is closed in $\overline{Y}$. Thus, B = ∅.
Theorem 2.2.4. Let $\mathcal{A}$ be a family of connected subsets of a topological space X. Suppose
that no two members of $\mathcal{A}$ are separated. Then, $Y = \bigcup\{A : A \in \mathcal{A}\}$ is connected.

Proof. Suppose D ⊂ Y is clopen in Y. Then A ∩ D is clopen in A for all $A \in \mathcal{A}$. Since each
set in $\mathcal{A}$ is connected, either A ⊂ D or A ⊂ Y \ D. No pair $(A, B) \in \mathcal{A} \times \mathcal{A}$ satisfies
A ⊂ D and B ⊂ Y \ D; otherwise, for such a pair, $(\overline{A} \cap B) \cup (A \cap \overline{B}) \subset D \cap (Y \setminus D) = \emptyset$,
contradicting the hypothesis. Consequently, either all members of $\mathcal{A}$ are contained in D (in
which case Y = D), or all members of $\mathcal{A}$ are contained in Y \ D (in which case Y = Y \ D).
Corollary 2.2.5. Suppose X is a topological space such that for any two points p, q ∈ X,
there is a connected set Cp,q that contains them. Then, X is connected.

Proof. Fix x ∈ X. For any y ∈ X choose a connected set $C_y$ such that $\{x, y\} \subset C_y$.
Then, $\{C_y : y \in X\}$ satisfies the conditions of Theorem 2.2.4 and so, $X = \bigcup_{y \in X} C_y$ is
connected.
Example 2.2.6. As in Example 2.1.16, suppose (X, <, τ ) is a totally ordered set with a
topology τ stronger than the order topology τo. If (X, τ) is connected, then the collection of
predecessor–successor pairs in X is empty, for if x < y, then x ∈ Ay = {u : u < y},
y ∈ Bx = {u : x < u} and X = Ay ∪ Bx. Since X is connected, and Ay and Bx are
nonempty open sets, Ay ∩ Bx ≠ ∅. Hence, any dense set Z in τ is not only order-dense, but
also dense in τo.
Example 2.2.7. Suppose (X, <) is order complete (see Definition 1.2.4[7]), and that there
are no predecessor–successor pairs. Then (X, τo ) is connected. To see this, suppose X =
A ∪ B where A and B are nonempty disjoint open (and hence clopen) sets. Let a ∈ A and
b ∈ B and without loss of generality assume a < b. Then Ia,b = A0 ∪ B0 where A0 = A ∩ Ia,b
and B0 = B ∩ Ia,b . Since X \ Ia,b = {u : u < a} ∪ {u : b < u}, Ia,b is closed in X; hence,
A0 and B0 are closed in X. As A0 is bounded by b, c = sup A0 exists. Since any open
neighborhood of c contains points of A0, and b ∉ A, c ∈ A0 and a ≤ c < b. Since A0 is
open in Ia,b there is an open neighborhood V of c in X such that V ∩ Ia,b ⊂ A0. Hence, for
some d > c, ∅ ≠ {u : c ≤ u < d} ⊂ A0. Since there are no predecessor–successor pairs, this
means that there is z ∈ A0 such that c < z, contradicting the definition of c. Therefore, X is connected.
Example 2.2.8. Theorem 2.2.4 implies that, for any point x in a topological space X, the
union C(x) of all connected subsets of X that contain x is connected. This set is closed in
X since $\overline{C(x)}$ is connected (Theorem 2.2.3) and so, contained in C(x). The sets C(x), called the connected
components of X, form a partition of X. Indeed, if C(x) ∩ C(y) ≠ ∅, then C(x) ∪ C(y)
is connected and contains x and y. Then, by definition of C(x) and C(y), it follows that
C(x) = C(x) ∪ C(y) = C(y).
Theorem 2.2.9. Suppose X is a connected space. If M ⊂ X is connected and X \ M =
A ∪ B, where A and B are separated, then A ∪ M and B ∪ M are connected.

Proof. Suppose A ∪ M = U ∪ V where U and V are non–empty and separated. Since

M = (M ∩ U ) ∪ (M ∩ V ) is connected, either M ⊂ U or M ⊂ V . Without loss of generality,
assume that M ⊂ V . Then
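U = (A ∪ M) \ V ⊂ (A ∪ M) \ M = A.
Moreover, X = U ∪ (V ∪ B), and
\[
  \overline{U} \cap (V \cup B) = (\overline{U} \cap V) \cup (\overline{U} \cap B) \subset (\overline{U} \cap V) \cup (\overline{A} \cap B) = \emptyset,
\]
\[
  U \cap \overline{V \cup B} = (U \cap \overline{V}) \cup (U \cap \overline{B}) \subset (U \cap \overline{V}) \cup (A \cap \overline{B}) = \emptyset,
\]
so X is the union of the nonempty separated sets U and V ∪ B.
This contradicts the assumption that X is connected. Hence A ∪ M is connected. A similar
proof shows that B ∪ M is connected too.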

Theorem 2.2.10. Suppose X and Y are topological spaces, and f : X → Y is continuous.

If X is connected, then f (X) is connected.

Proof. Suppose B ⊂ f(X) is clopen in f(X). Then, by continuity, $A = f^{-1}(B)$ is clopen
in X. Thus, either A = ∅ or A = X. Consequently, either B = ∅ or B = f(X).

Theorem 2.2.11. Suppose f : X → Y is a continuous function. If C is a connected

component of Y , then f −1 (C) is a union of components of X.

Proof. If $f^{-1}(C) = \emptyset$, take an empty union of components of X. Suppose $x \in f^{-1}(C)$,
and let D(x) be the connected component of X that contains x. Then f(x) ∈ f(D(x)) ∩ C,
and since f(D(x)) is connected in Y, it follows that f(D(x)) ⊂ C; hence $D(x) \subset f^{-1}(C)$.
It follows that $f^{-1}(C) = \bigcup\{D(x) : x \in f^{-1}(C)\}$.

The following result is useful to determine connectedness of a set.

Theorem 2.2.12. A topological space (X, τ ) is connected iff no continuous function f :

X → {0, 1}, where {0, 1} has the discrete topology, is surjective.

Proof. Suppose f : (X, τ ) → ({0, 1}, τd ) is continuous. Then A = f −1 ({0}) and B =

f −1 ({1}) are clopen subsets of X.

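If f is surjective, then A is a proper nonempty clopen subset of X, and so X is not connected.
Equivalently, if X is connected, then either A = ∅ or B = ∅; thus, f is not surjective.

Conversely, if X is not connected, there is a proper nonempty clopen set A ⊂ X, and the
function that equals 0 on A and 1 on X \ A is a continuous surjection onto {0, 1}.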
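
For a finite topological space the criterion of Theorem 2.2.12 can be checked by brute force. The sketch below is only an illustration under the assumption that the topology is given explicitly as a list of open sets (and is a valid topology); the helper names are hypothetical.

```python
from itertools import product


def clopen_sets(points, topology):
    """Return the members of `topology` whose complement is also open."""
    points = frozenset(points)
    opens = {frozenset(u) for u in topology}
    return {u for u in opens if points - u in opens}


def is_connected(points, topology):
    """Connected iff the only clopen sets are the empty set and X."""
    points = frozenset(points)
    return clopen_sets(points, topology) <= {frozenset(), points}


def continuous_surjections(points, topology):
    """List all continuous surjections X -> {0, 1} (discrete), as dicts."""
    pts = sorted(points)
    opens = {frozenset(u) for u in topology}
    found = []
    for values in product((0, 1), repeat=len(pts)):
        f = dict(zip(pts, values))
        zero = frozenset(p for p in pts if f[p] == 0)
        one = frozenset(p for p in pts if f[p] == 1)
        # f is continuous iff both fibres are open; surjective iff both are nonempty.
        if zero and one and zero in opens and one in opens:
            found.append(f)
    return found


sierpinski = [set(), {1}, {0, 1}]       # two points, connected
discrete = [set(), {0}, {1}, {0, 1}]    # two points, not connected
```

On these two examples, `is_connected` and the absence of continuous surjections agree, as the theorem predicts.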

Definition 2.2.13. A topological space (X, τ ) is locally connected if τ admits a basis of

connected sets.

All Euclidean spaces, and any normed space in general, are locally connected.

Lemma 2.2.14. A space is locally connected iff the connected components of each open set
are open.

Proof. Suppose X is locally connected and U is open in X. Let C be a connected component
of U and x ∈ C. Since U is open, there is a connected open neighborhood Vx of x
that is contained in U. Hence, Vx ⊂ C, and so C is open.

Conversely, suppose the connected components of each open set are open. Then, the
collection of all connected components of open sets in X forms a basis of connected sets.
Corollary 2.2.15. If X is locally connected and f : X −→ Y is a continuous closed
function, then f (X) is locally connected.

Proof. Without loss of generality, we may assume that f(X) = Y. It suffices to show that
a component C of an open set U in Y is open. By continuity, $f^{-1}(U)$ is open in X, and the
restriction $f|_{f^{-1}(U)} : f^{-1}(U) \to U$ is continuous. By Theorem 2.2.11 and Lemma 2.2.14,
$f^{-1}(C)$ is the union of open components of the open set $f^{-1}(U)$; hence $f^{-1}(C)$ is open in
X. Consequently, $X \setminus f^{-1}(C) = f^{-1}(Y \setminus C)$ is closed in X. Since f is a closed function,
$Y \setminus C = f(f^{-1}(Y \setminus C))$ is closed in Y and so, C is open in Y.

A path in a topological space X is a continuous function γ : [0, 1] → X.

Definition 2.2.16. A topological space X is pathwise connected if for any two points
x, y ∈ X, there is a path γ in X such that γ(0) = x and γ(1) = y; X is arcwise connected
if for any two points x, y ∈ X there is an injective path γ in X such that γ(0) = x and
γ(1) = y.
Remark 2.2.17. Clearly an arcwise connected space is path connected. The opposite
however may not be true. Consider X = {0, 1} with the trivial topology τ = {∅, X}. This
space is pathwise connected, for the map γ(x) = 0 for x ∈ [0, 1) and γ(1) = 1 is continuous.
X is not arcwise connected however, since injectivity is not possible.
Theorem 2.2.18. If X is a path connected space, then X is connected.

Proof. Fix $x_0 \in X$. For any y ∈ X, let $\gamma_y$ be a path joining $x_0$ to y. Since the sets
in $\mathcal{A} = \{\gamma_y([0, 1]) : y \in X\}$ are connected and each of them contains $x_0$, $X = \bigcup \mathcal{A}$ is
connected.

Given a topological space X and points x, y ∈ X, a simple chain from x to y is a
finite collection of open sets U0, . . . , Un such that x ∈ Uj iff j = 0, y ∈ Uj iff j = n, and
Uj ∩ Ui ≠ ∅ iff |j − i| ≤ 1.
Lemma 2.2.19. Suppose X is a connected set and let U be an open cover of X. For any
pair of points (x, y) ∈ X × X, there is a simple chain from x to y whose links are in U .

Proof. Fix x0 ∈ X. Let A be the collection of all x ∈ X for which there is a simple chain
from x0 to x with links in $\mathcal{U}$. We claim that A is open. Indeed, if x ∈ A, there is a simple
chain $\{U_0, \dots, U_n\} \subset \mathcal{U}$ from x0 to x. For any y ∈ Un, either U0, . . . , Un is a simple chain
from x0 to y, or U0, . . . , Un−1 is a simple chain from x0 to y. The latter occurs when
y ∈ Un−1 ∩ Un. Hence Un ⊂ A, and so A is open.

We now show that A is closed. Let $y \in \overline{A}$ and let $U \in \mathcal{U}$ be a set that contains y. Hence there exists
x ∈ A ∩ U. Let U0, . . . , Un be a simple chain from x0 to x with links in $\mathcal{U}$. Let k be the smallest
integer between 0 and n such that Uk ∩ U ≠ ∅. Then, either U0, . . . , Uk, U is a simple chain
from x0 to y, or U0, . . . , Uk is a simple chain from x0 to y. Hence y ∈ A. Since X is connected
and A is a nonempty clopen set, A = X.
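
For a finite cover of a subset of the real line by open intervals, a simple chain can be found by a breadth-first search in the intersection graph of the cover: a shortest chain from a set containing x to a set containing y is automatically simple. This is a hedged sketch of that idea, not the argument used in the proof; intervals are assumed to be nonempty pairs `(lo, hi)` and all function names are illustrative.

```python
from collections import deque


def meets(u, v):
    """Two open intervals (lo, hi) intersect iff max(lo) < min(hi)."""
    return max(u[0], v[0]) < min(u[1], v[1])


def contains(u, p):
    """True iff the point p lies in the open interval u."""
    return u[0] < p < u[1]


def simple_chain(cover, x, y):
    """Return a shortest chain of intervals of `cover` from x to y, or None."""
    starts = [i for i, u in enumerate(cover) if contains(u, x)]
    parent = {i: None for i in starts}
    queue = deque(starts)
    while queue:
        i = queue.popleft()
        if contains(cover[i], y):
            chain = []
            while i is not None:          # walk back to a starting interval
                chain.append(cover[i])
                i = parent[i]
            return chain[::-1]
        for j, v in enumerate(cover):
            if j not in parent and meets(cover[i], v):
                parent[j] = i
                queue.append(j)
    return None


def is_simple_chain(chain, x, y):
    """Check the three conditions in the definition of a simple chain."""
    n = len(chain) - 1
    ends_ok = all(contains(u, x) == (j == 0) and contains(u, y) == (j == n)
                  for j, u in enumerate(chain))
    links_ok = all(meets(chain[i], chain[j]) == (abs(i - j) <= 1)
                   for i in range(len(chain)) for j in range(len(chain)))
    return ends_ok and links_ok


cover = [(0, 2), (1, 3), (2.5, 5), (4, 6), (1.5, 4.5)]
chain = simple_chain(cover, 0.5, 5.5)
```

The search returns `None` when the cover does not connect the two points, which is the situation excluded by the connectedness hypothesis of Lemma 2.2.19.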
Corollary 2.2.20. Any open connected set in the Euclidean space Rn is arcwise connected.

Proof. Let U be an open connected subset of $\mathbb{R}^n$ and let $\mathcal{B}$ be the collection of all open balls
contained in U. Lemma 2.2.19 implies that for any pair of points x, y ∈ U there is a simple chain
$\{B_0, \dots, B_n\} \subset \mathcal{B}$ from x to y. Let $x_j$ be the center of the ball $B_j$ for j = 0, . . . , n, and set
$x_{-1} = x$ and $x_{n+1} = y$. Define the straight line segments
\[
  \gamma_{j,j+1}(\lambda) = \lambda x_{j+1} + (1 - \lambda) x_j, \qquad 0 \le \lambda \le 1, \quad j = -1, \dots, n.
\]
The polygonal path $\gamma = \gamma_{-1,0} \cdot \gamma_{0,1} \cdots \gamma_{n-1,n} \cdot \gamma_{n,n+1}$ contains a simple path in $\bigcup_{j=0}^{n} B_j \subset U$
joining x to y.

2.3. Convergence
Suppose D is a nonempty set and let ⪯ be a relation from D to D. From Definition 1.2.2,
we recall that ⪯ is a pre–order if it is reflexive and transitive; ⪯ is a partial order if it is
an antisymmetric pre–order; ⪯ is a total order (or simply an order) if it is a partial order
and for each pair (x, y) ∈ D × D, either x ⪯ y or y ⪯ x. We introduce another related type of
pre–order.
Definition 2.3.1. A direction on D is a pre–order ⪯ on D such that for any n and m in
D there is k ∈ D such that n ⪯ k and m ⪯ k.
Example 2.3.2. (a) Any nonempty subset of R is an ordered and a directed set with
respect to the natural order ≤ of numbers.
(b) A collection $\mathcal{D}$ of subsets of a given set X is partially ordered by inverse inclusion;
that is, A ⪯ B iff A ⊃ B. If in addition, for any A and B in $\mathcal{D}$, there is $C \in \mathcal{D}$
such that C ⊂ A ∩ B, then $\mathcal{D}$ is also directed by inverse inclusion.
(c) A collection $\mathcal{A} \subset \mathbb{R}^S$ is partially ordered by f ≤ g iff f(x) ≤ g(x) for all x ∈ S. $\mathcal{A}$
is directed if for any f and g in $\mathcal{A}$, there is $h \in \mathcal{A}$ such that max{f, g} ≤ h.
(d) If $(D, \preceq_1)$ and $(E, \preceq_2)$ are directed sets, then D × E is a directed set with respect
to the Cartesian direction: (n, a) ⪯ (m, b) iff $n \preceq_1 m$ and $a \preceq_2 b$.

A net on a set X indexed by a directed set D is a function x : D → X. Nets are

typically denoted as {xn : n ∈ D}.
Example 2.3.3. (a) Any sequence {xn : n ∈ Z+ } is a net indexed by Z+ .
(b) Any function f on R with values in an arbitrary set X is a net indexed by R.
(c) Let (X, τ ) be a topological space and Vx a local base of open sets at x ∈ X. For each
V ∈ Vx , pick xV ∈ V . If Vx is directed by inverse inclusion, then {xV : V ∈ Vx } is
a net indexed by Vx .

The following concept generalizes the notion of convergence of sequences.

Definition 2.3.4. A net {xn : n ∈ D} in a topological space X converges to a point x ∈ X,
denoted by xn → x, if for any V ∈ τ with x ∈ V , there exists m ∈ D such that xn ∈ V
whenever n ≥ m. In such case x is said to be a limit of {xn : n ∈ D}.
Theorem 2.3.5. A topological space X is Hausdorff iff any convergent net has a unique
limit.

Proof. Suppose X is Hausdorff and {xn : n ∈ D} is a net converging to x and y. If x ≠ y,
let Vx and Vy be disjoint open neighborhoods of x and y respectively. There is m ∈ D such
that xn ∈ Vx and xn ∈ Vy for all n ≥ m. This is a contradiction to Vx ∩ Vy = ∅.

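Conversely, suppose X is a topological space where each convergent net has a unique limit.
If X were not Hausdorff, then there would exist a pair of distinct points x and y such that for any
open sets $V \in \mathcal{V}_x$ and $U \in \mathcal{V}_y$ there is $x_{V,U} \in V \cap U$. Then $\{x_{V,U} : (V, U) \in \mathcal{V}_x \times \mathcal{V}_y\}$, with
$\mathcal{V}_x \times \mathcal{V}_y$ directed by the Cartesian direction of inverse inclusion, is a
net in X that converges to both x and y, which is a contradiction.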
Definition 2.3.6. A net {ym : m ∈ E} is a subnet of the net {xn : n ∈ D} if there is a
function g : E → D such that ym = xg(m) , and for any n ∈ D, there is M ∈ E such that
m ≥ M implies that g(m) ≥ n.
Example 2.3.7. Suppose {xn : n ∈ N} is a sequence. Then
1. A subsequence of a sequence is clearly a subnet.
2. Let N × N be directed by the Cartesian direction: (n, m) ⪯ (k, ℓ) iff n ≤ k and
m ≤ ℓ. Then $y_{n,m} = x_{n+m}$ is a subnet of $x_n$.
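
The second item can be verified directly against Definition 2.3.6. The index map g and the threshold (N, N) below are one convenient choice made for this check, not the only possible one:

```latex
\[
  y_{n,m} = x_{g(n,m)}, \qquad g(n,m) := n + m .
\]
Given $N \in \mathbb{N}$, take $M := (N, N) \in \mathbb{N} \times \mathbb{N}$.
If $(n,m) \succeq (N,N)$, that is, $n \ge N$ and $m \ge N$, then
\[
  g(n,m) = n + m \ge N ,
\]
which is the cofinality condition required of a subnet.
```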
Example 2.3.8. Let x : (1, ∞) → (0, 1) be x(α) = 1/α and define y : (0, 1) → (0, 1) as
y(α) = α. If (1, ∞) is ordered by the natural order in the real line and (0, 1) is ordered as
t ⪯ s iff 1/t ≤ 1/s, then x and y are subnets of each other.
Definition 2.3.9. Let {xn : n ∈ D} be a net in a topological space X. A point x ∈ X is
a cluster point of {xn : n ∈ D} if for any V ∈ Vx and m ∈ D, there is n ≥ m such that
xn ∈ V .

Although a cluster point of a net {xn : n ∈ D} is clearly a point in the closure of

{xn : n ∈ D}, the converse is not true in general. The following result holds however.
Theorem 2.3.10. Let {xn : n ∈ D} be a net in X and set $A_n = \{x_m : m \in D,\ m \ge n\}$ for
each n ∈ D. A point x ∈ X is a cluster point of the net {xn : n ∈ D} iff $x \in \bigcap_{n \in D} \overline{A_n}$.

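Proof. If x is a cluster point of {xn : n ∈ D}, then for any $V \in \mathcal{V}_x$ and n ∈ D there is
m ≥ n such that $x_m \in V$, that is, $V \cap A_n \neq \emptyset$. Therefore $x \in \overline{A_n}$ for all n ∈ D.

If x is not a cluster point of {xn : n ∈ D} then there exist $V \in \mathcal{V}_x$ and N ∈ D such that
$x_n \in X \setminus V$ whenever n ≥ N. Thus, $x \notin \overline{A_N}$ and so
$x \in \bigcup_{n \in D} (X \setminus \overline{A_n}) = X \setminus \bigcap_{n \in D} \overline{A_n}$.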

Theorem 2.3.11. A point x is a cluster point of a net {xn : n ∈ D} iff there is a subnet
that converges to x.

Proof. Suppose that {ym : m ∈ E} is a subnet of {xn : n ∈ D} that converges to x. Let

g : E → D as in the definition of subnet. For any n ∈ D, there is M ′ ∈ E such that m ≥ M ′
implies that g(m) ≥ n. Also, for any V ∈ Vx , there is M ∈ E such that m ≥ M implies
that ym ∈ V . Since E is a directed set, there is m′ ∈ E with m′ ≥ M ′ and m′ ≥ M ; hence,
g(m′ ) ≥ n and xg(m′ ) = ym′ ∈ V . This shows that x is a cluster point of {xn : n ∈ D}.

Let x be a cluster point of a net {xn : n ∈ D}. Let Vx be the collection of open neighborhoods
of x directed by inverse inclusion. For any V ∈ Vx and n ∈ D, choose g(n, V ) ≥ n with
xg(n,V ) ∈ V . Setting yn,V := xg(n,V ) , we have that {yn,V : (n, V ) ∈ D × Vx }, where D × Vx
is directed by the Cartesian direction, is a subnet of {xn : n ∈ D} that converges to x, for if
U ∈ Vx and m ∈ D, then (n, V) ≥ (m, U) implies that yn,V ∈ V ⊂ U.
Theorem 2.3.12. A net {xn : n ∈ D} in a topological space X converges to x ∈ X iff
every subnet converges to x.

Proof. Suppose {xn : n ∈ D} converges to x and let {ym : m ∈ E} be any subnet. For
any V ∈ Vx there is N ∈ D such that n ≥ N implies that xn ∈ V . Let g : E → D as in the
definition of subnet. There is M ∈ E such that m ≥ M implies that g(m) ≥ N . Therefore,
m ≥ M implies that ym = xg(m) ∈ V .

Suppose {xn : n ∈ D} is a net for which any subnet converges to x. We will show that xn
converges to x by way of contradiction. If xn ↛ x, then there is V ∈ Vx such that for any
n ∈ D, there is g(n) ≥ n with xg(n) ∉ V. Hence, {yn = xg(n) : n ∈ D} is a subnet that does
not converge to x, which is a contradiction.
Example 2.3.13. Let Q ∩ [0, 1] be directed by the natural order in the real line. Then
x(α) = α is a net in [0, 1] which converges to 1. Any number 0 ≤ a < 1 is an accumulation
point of xα but not a cluster point.
Theorem 2.3.14. A point $x \in \overline{A}$ iff there is a net {xn : n ∈ D} ⊂ A such that xn → x.

Proof. Suppose a net {xn : n ∈ D} ⊂ A converges to x. Then for any V ∈ Vx, there exists
N ∈ D such that n ≥ N implies that xn ∈ V. Thus, V ∩ A ≠ ∅ for all V ∈ Vx, that is, $x \in \overline{A}$.

Conversely, suppose that $x \in \overline{A}$. Then, for any V ∈ Vx there is xV ∈ V ∩ A. Hence,
{xV : V ∈ Vx}, where Vx is directed by inverse inclusion, is a net in A converging to x.
Theorem 2.3.15. A function f between two topological spaces X and Y is continuous at
a point x ∈ X iff for any net xα → x, f (xα ) → f (x).

Proof. If f is continuous at x, for any U ∈ Uf (x) , there is V ∈ Vx such that f (V ) ⊂ U .

Since xα → x, there is α0 ∈ D such that α ≥ α0 implies that xα ∈ V. Therefore, α ≥ α0
implies that f(xα) ∈ U.

Conversely, suppose that f(xα) → f(x) for any net xα → x. If f were not continuous at
x, there would be U ∈ Uf(x) such that any V ∈ Vx contains a point xV with f(xV) ∉ U.
The net {xV : V ∈ Vx} converges to x; however, f(xV) fails to converge to f(x). This is a
contradiction.

The next result states that in first countable Hausdorff topological spaces, it is enough
to consider convergence of sequences to determine closure of sets and continuity of functions.
Theorem 2.3.16. If (X, τ ) is first countable, then:
(i) X is Hausdorff iff any convergent sequence in X has a unique limit.
(ii) A point x ∈ X is a cluster point of a sequence {xn : n ∈ Z+ } iff there exists a
subsequence that converges to x.
(iii) A sequence xn converges to x iff every subsequence converges to x.
(iv) $x \in \overline{A}$ iff there is a sequence xn ∈ A that converges to x.
(v) For any topological space (Y, τ′) and function f : X → Y, f is continuous at x iff
$f(x_n) \xrightarrow{n \to \infty} f(x)$ for any sequence $x_n \xrightarrow{n \to \infty} x$.

Proof. By hypothesis, any point x ∈ X has a countable local base $\mathcal{V}_x = \{V_n : n \in \mathbb{N}\}$ and,
by replacing $V_n$ with $\bigcap_{j=1}^{n} V_j$ if necessary, we may assume that $V_{n+1} \subset V_n$ for all n ∈ N.

(i) Since any sequence is a net, only sufficiency remains to be proved. Suppose any convergent
sequence in X has a unique limit. Let x and y be points in X and let {Vn : n ∈ N}
and {Un : n ∈ N} be decreasing local bases at x and y respectively. If Vn ∩ Un ≠ ∅
for all n ∈ N then we can choose xn ∈ Vn ∩ Un. The sequence {xn : n ∈ N} converges to
both x and y. Therefore, x = y.

(ii) Since a subsequence of a sequence is a subnet of the sequence, only necessity remains
to be proved. Suppose x is a cluster point of the sequence {xn : n ∈ N}. There is n1 ≥ 1
such that xn1 ∈ V1 ∈ Vx . Having found xn1 , . . . , xnk such that n1 < . . . < nk and xnj ∈ Vj
we choose xnk+1 ∈ Vk+1 such that nk+1 ≥ nk + 1, which is possible since x is a cluster point
of {xn : n ∈ N}. Therefore, {xnk : k ∈ N} is a subsequence that converges to x.

(iii) This statement is trivial and is left as an exercise.

(iv) Since any sequence is a net, only necessity remains to be proved. If $x \in \overline{A}$ then
$V_n \cap A \neq \emptyset$ for each $V_n \in \mathcal{V}_x$. Choosing $x_n \in V_n \cap A$ for each n ∈ N, we obtain a sequence
$x_n \xrightarrow{n \to \infty} x$.
(v) Since any sequence is a net, only sufficiency remains to be proved. Suppose
$f(x_n) \xrightarrow{n \to \infty} f(x)$ whenever $x_n$ is a sequence with $x_n \xrightarrow{n \to \infty} x$. If f fails to be continuous at x, then
there is a neighborhood $U \in \mathcal{V}_{f(x)}$ such that for any n ∈ N there is $x_n \in V_n$, $V_n \in \mathcal{V}_x$,
with $f(x_n) \notin U$. Then $x_n$ is a sequence converging to x for which $f(x_n) \not\to f(x)$. This is a
contradiction.

2.4. Compactness
Definition 2.4.1. A subset K of a topological space (X, τ ) is compact if every open cover
{Ui : i ∈ I} ⊂ τ of K admits a subcover {Ui : i ∈ J} with J ⊂ I finite.

A collection G of subsets of X has the finite intersection property if any finite

subcollection of G has non–empty intersection.
Theorem 2.4.2. A topological space X is compact iff any collection of closed sets F that
has the finite intersection property has non–empty intersection.

Proof. Suppose X is compact and let $\mathcal{F}$ be a collection of closed sets with the finite
intersection property. If $\bigcap\{F : F \in \mathcal{F}\} = \emptyset$, then $\{X \setminus F : F \in \mathcal{F}\}$ is an open cover of X,
and so there is a finite subcover $\{X \setminus F_j : F_j \in \mathcal{F},\ j = 1, \dots, N\}$. Consequently, $\bigcap_{j=1}^{N} F_j = \emptyset$,
which is a contradiction.
Suppose that every collection of closed subsets of X with the finite intersection property
has nonempty intersection. Let $\mathcal{U}$ be an open cover of X. Then, $\mathcal{F} = \{X \setminus U : U \in \mathcal{U}\}$ is a
collection of closed sets with empty intersection. Consequently, there exists a finite subcollection
$\mathcal{F}' \subset \mathcal{F}$ with empty intersection. The collection $\{U : X \setminus U \in \mathcal{F}'\}$ is a finite open subcover.
Example 2.4.3. A closed and bounded interval [a, b] in (R, | |) is compact. More generally,
in $(\mathbb{R}^n, \|\cdot\|_2)$, any closed bounded box $I = [a_1, b_1] \times \dots \times [a_n, b_n]$ is compact.

Proof. Suppose $I_0$ is a closed bounded box that has an open cover $\mathcal{U}$ from which no finite
subcover can be extracted. Halve each side of $I_0$ to obtain $2^n$ boxes whose sides have the
same length. One of those boxes, say $I_1$, cannot be covered by a finite subcollection of
$\mathcal{U}$. By induction, we obtain a sequence of nested closed boxes $I_{k+1} \subset I_k$, each of which is
obtained by halving the sides of the previous one, and none of which can be covered by a finite
collection of sets in $\mathcal{U}$. For each j = 1, . . . , n, let $a_k^j < b_k^j$ be the endpoints of the j-th
side of the box $I_k$. Then $a_k^j \le a_{k+1}^j \le b_{k+1}^j \le b_k^j$ and, as $b_k^j - a_k^j = 2^{-k}(b_0^j - a_0^j)$, there
is $x_* \in I_0$ such that $\lim_k a_k^j = x_*^j = \lim_k b_k^j$ for each j = 1, . . . , n. Let $U \in \mathcal{U}$ be an open set
that contains $x_*$. For some ε > 0 the ball $B(x_*; \varepsilon)$ is fully contained in U. Then for
all k ∈ N large enough, $I_k \subset B(x_*; \varepsilon) \subset U$. This is a contradiction since no $I_k$ can
be covered by a finite collection of sets in $\mathcal{U}$.
Theorem 2.4.4. A topological space X is compact iff any net {xn : n ∈ D} in X has a
cluster point. That is, X is compact iff any net in X has a convergent subnet.

Proof. Suppose X is compact and let {xn : n ∈ D} be a net in X. For each n ∈ D set
$A_n = \{x_m : m \in D,\ m \ge n\}$. Since D is a directed set, the collection of all sets $A_n$ has the
finite intersection property; hence, $\{\overline{A_n} : n \in D\}$ also has the finite intersection property.
By compactness, there exists $x \in \bigcap_{n \in D} \overline{A_n}$. From Theorem 2.3.10, it follows that x is a
cluster point of {xn : n ∈ D}.
Conversely, assume that any net in X has a cluster point or, equivalently, that every net in X
has a convergent subnet. Suppose $\mathcal{F}$ is a collection of closed sets with the finite intersection
property. Let $\mathcal{G}$ be the collection of all finite intersections of sets in $\mathcal{F}$ and direct it by
inverse inclusion. Since $\mathcal{F} \subset \mathcal{G}$, it is enough to show that $\bigcap\{G : G \in \mathcal{G}\} \neq \emptyset$. For any
$G \in \mathcal{G}$ choose $x_G \in G$. Then, the net $\{x_G : G \in \mathcal{G}\}$ has a cluster point x ∈ X. We claim
that $x \in \bigcap\{G : G \in \mathcal{G}\}$. Indeed, if $G \in \mathcal{G}$ then, for any $V \in \mathcal{V}_x$, there is $H \in \mathcal{G}$ with $H \succeq G$
(H ⊂ G) such that $x_H \in H \cap V \subset G \cap V$. Therefore, $x \in \overline{G} = G$.

The next result shows that in second countable Hausdorff spaces, sequences are enough
to determine compactness.
Theorem 2.4.5. Suppose X is a second countable Hausdorff space. Then X is compact iff
any sequence in X has a convergent subsequence.

Proof. Sufficiency. Since X is second countable, every open cover of X has a countable subcover,
so it is enough to show that any countable open cover $\mathcal{G} = \{G_n : n \in \mathbb{Z}_+\}$
of X admits a finite subcover. Substituting $G_n$ with $\bigcup_{k=0}^{n} G_k$, we may assume that the
cover is increasing. Let $U_0 = G_0$. If $G_0$ does not cover X, let $G_{n_1} \in \mathcal{G}$ be the first set such
that $G_{n_1} \setminus G_0 \neq \emptyset$, and set $U_1 = G_{n_1}$. Suppose a strictly monotone family $\{U_0, \dots, U_{m-1}\} \subset \mathcal{G}$ has been
defined. If $\bigcup_{j=0}^{m-1} U_j = U_{m-1} \neq X$, let $G_{n_m}$ be the first set in $\mathcal{G}$ such that $G_{n_m} \setminus U_{m-1} \neq \emptyset$,
and set $U_m = G_{n_m}$. If no finite subcover exists, then $\{U_m : m \in \mathbb{Z}_+\}$ is an infinite strictly monotone subcover of
$\mathcal{G}$. Let $x_m \in U_m \setminus U_{m-1}$. By assumption $\{x_m : m \in \mathbb{Z}_+\}$ admits a convergent subsequence
$\{x_{m_k} : k \in \mathbb{Z}_+\}$. Let x be the limit of this subsequence. Then $x \in U_\ell$ for some ℓ. This
means that all but finitely many elements in $\{x_{m_k} : k \in \mathbb{Z}_+\}$ are contained in $U_\ell$. This
contradicts the fact that $x_m \in X \setminus U_\ell$ for all m > ℓ. Hence a finite subcover exists.

Necessity. Suppose X is compact. Let {xn : n ∈ N} be a sequence. Since a sequence
is a net, it admits a cluster point, say x. Let {Vk : k ∈ N} be a decreasing local system
of neighborhoods at x. Choose n1 such that xn1 ∈ V1. By induction, for k > 1, there is
nk ≥ nk−1 + 1 with xnk ∈ Vk. The subsequence {xnk} converges to x.
Theorem 2.4.6. Let f : X −→ Y be a continuous function between topological spaces X
and Y . If X is compact, then f (X) is a compact subset of Y .

Proof. Let $\{V_i : i \in I\}$ be an open cover of f(X) in Y. Then $\{f^{-1}(V_i) : i \in I\}$ is an
open cover of X. Hence, there exists a finite subset J ⊂ I such that $X = \bigcup_{j \in J} f^{-1}(V_j)$.
Therefore,
\[
  f(X) = f\Big(\bigcup_{j \in J} f^{-1}(V_j)\Big) = \bigcup_{j \in J} f\big(f^{-1}(V_j)\big) \subset \bigcup_{j \in J} V_j .
\]

In many applications, compactness comes along with the Hausdorff separation property.
In that case, compact sets are also closed sets. The following result offers another link
between these properties.
Lemma 2.4.7. Suppose τ1 ⊂ τ2 are topologies on X. If τ1 is Hausdorff and τ2 is compact,
then τ1 = τ2

Proof. Suppose X \F ∈ τ2 . Since X is τ2 –compact then F is compact. Since τ1 ⊂ τ2 , every

τ1 –open cover of F is a τ2 –open cover of F ; hence, F is τ1 –compact. Since τ1 is Hausdorff,
it follows that F is τ1 –closed; consequently, X \ F ∈ τ1 .

Theorem 2.4.8. (Alexander) Let (X, τ ) be a topological space and let S be a subbase for
τ . X is compact iff every cover of X by sets in S admits a finite subcover.

Proof. Only sufficiency needs be proved. Suppose that every subbasic cover of X admits
a finite subcover. If X is not compact then the collection $\mathfrak{X}$ of all open covers of X that
do not admit a finite subcover is nonempty. $\mathfrak{X}$ is partially ordered by inclusion. Observe
that the union of a nonvoid chain in $\mathfrak{X}$ is also an open cover of X in $\mathfrak{X}$. By Zorn’s lemma,
$\mathfrak{X}$ contains a maximal cover $\mathcal{V}$. Hence, $X = \bigcup \mathcal{V}$ and if $U \in \tau \setminus \mathcal{V}$ then $\mathcal{V} \cup \{U\}$ admits a
finite subcover. Let $\mathcal{W} = \mathcal{V} \cap \mathcal{S}$. Since $\mathcal{W} \subset \mathcal{V}$, no finite subfamily of $\mathcal{W}$ covers X.
Consequently, since $\mathcal{W} \subset \mathcal{S}$, $\mathcal{W}$ does not cover X.
Let $x \in X \setminus \bigcup \mathcal{W}$ and choose $V \in \mathcal{V}$ such that x ∈ V. Since $\mathcal{S}$ is a subbase, there are
$S_1, \dots, S_n \in \mathcal{S}$ such that
\[
  x \in \bigcap_{j=1}^{n} S_j \subset V.
\]
Since $x \in X \setminus \bigcup \mathcal{W}$, we conclude that $S_j \notin \mathcal{V}$ for all j = 1, . . . , n. The maximality of $\mathcal{V}$
implies that for each 1 ≤ j ≤ n, there is a set $A_j$ which is a finite union of sets in $\mathcal{V}$ such
that $S_j \cup A_j = X$. Hence,
\[
  V \cup \bigcup_{j=1}^{n} A_j \supset \Big(\bigcap_{j=1}^{n} S_j\Big) \cup \Big(\bigcup_{j=1}^{n} A_j\Big) \supset \bigcap_{j=1}^{n} (S_j \cup A_j) = X.
\]

Thus X is the union of a finite collection of sets in $\mathcal{V}$, in contradiction to the choice of $\mathcal{V}$.

2.5. Metric spaces

We recall the following concepts:
Definition 2.5.1. A metric on X is a function d : X × X → [0, ∞) such that
(i) d(x, y) = 0 if and only if x = y
(ii) d(x, y) = d(y, x) for any x, y ∈ X
(iii) d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z ∈ X.
Remark 2.5.2. If d satisfies (ii)–(iii) and d(x, x) = 0 for all x ∈ X, but d(x, y) = 0 is allowed
for some x ≠ y, then d is called a pseudo–metric on X.

The pair (X, d) is called a metric space. The metric d induces a topology τd with a base
given by the open balls Br(x) = {y ∈ X : d(x, y) < r}. A topological space (X, τ) is
metrizable if there is a metric d on X such that τ = τd.
(iv) A sequence {xn } ⊂ X is convergent if there is x ∈ X such that, for any ε > 0,
there is N ∈ N so that, d(xn , x) < ε whenever n ≥ N .
(v) A sequence {xn : n ∈ N} ⊂ (X, d) is Cauchy if for any ε > 0, there exists N ∈ N
such that d(xn , xm ) < ε whenever n, m ≥ N .
(vi) (X, d) is said to be complete if any Cauchy sequence is convergent.

(vii) The diameter of a set A ⊂ X is defined by
diam(A) = sup{d(x, y) : x, y ∈ A}
if A ≠ ∅, and zero otherwise.
(viii) The distance from a point x ∈ X to a set A ⊂ X is defined by
d(x, A) := inf{d(x, a) : a ∈ A}.

Important examples of metric spaces are normed spaces.

Example 2.5.3. Suppose X is a vector space over a field F = R or F = C. A norm on X
is a function ‖·‖ : X → R+ such that for any x, y ∈ X and λ ∈ F
(a) ‖x‖ = 0 iff x = 0
(b) ‖λx‖ = |λ|‖x‖
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖
The pair (X, ‖·‖) is referred to as a normed space. The norm ‖·‖ induces a metric on X
given by $d_{\|\cdot\|}(x, y) = \|x - y\|$. A Banach space is a normed space (X, ‖·‖) for which the
metric $d_{\|\cdot\|}$ is complete.
Example 2.5.4. Suppose (X, d) is a complete metric space. For any points x, y ∈ X define
\[
  \rho(x, y) := \frac{d(x, y)}{1 + d(x, y)}, \qquad d_1(x, y) := 1 \wedge d(x, y).
\]
It is easy to check that ρ and $d_1$ are metrics on X and that they are equivalent to d in the
sense that the topologies $\tau_\rho$, $\tau_{d_1}$ and $\tau_d$ are the same. See Exercise 2.12.7.
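
The metric axioms for ρ and d1 can be sanity-checked numerically. The sketch below does this on a finite grid of rational points of the real line with d(x, y) = |x − y|; it is an illustration on one assumed example, not a proof, and the helper name `is_metric_on` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def d(x, y):
    """Usual metric on the real line."""
    return abs(x - y)


def rho(x, y):
    """Bounded metric d / (1 + d)."""
    return d(x, y) / (1 + d(x, y))


def d1(x, y):
    """Truncated metric 1 ∧ d."""
    return min(1, d(x, y))


def is_metric_on(points, dist):
    """Check the three metric axioms for `dist` on a finite set of points."""
    identity = all((dist(x, y) == 0) == (x == y)
                   for x, y in product(points, repeat=2))
    symmetric = all(dist(x, y) == dist(y, x)
                    for x, y in product(points, repeat=2))
    triangle = all(dist(x, y) <= dist(x, z) + dist(z, y)
                   for x, y, z in product(points, repeat=3))
    return identity and symmetric and triangle


points = [Fraction(k, 4) for k in range(-12, 13)]
```

Both `rho` and `d1` take values in [0, 1], which is the reason they are often used to replace an unbounded metric by a bounded equivalent one.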
Definition 2.5.5. (X, τ ) is said to be a Polish space if it is separable and admits a metric
d such that (X, d) is complete and τ = τd .
Example 2.5.6. (Euclidean spaces) Let F denote either the set of real or complex numbers.
F with ‖x‖ := |x|, and more generally the space $F^n$ with
\[
  \|x\|_2 := \Big( \sum_{j=1}^{n} |x_j|^2 \Big)^{1/2}
\]
are normed spaces. Moreover, $(\mathbb{R}^n, \|\cdot\|_2)$ is complete and separable and hence, it is a Polish space.
Lemma 2.5.7. Let (X, d) be a metric space.
(i) For any x, y ∈ X and A ⊂ X,
|d(x, A) − d(y, A)| ≤ d(x, y).
(ii) If A ⊂ B ⊂ X, then d(x, B) ≤ d(x, A) for all x ∈ X.
(iii) $d(x, A) = d(x, \overline{A})$. Furthermore, d(x, A) = 0 if and only if $x \in \overline{A}$.

Proof. (i) For any x, y ∈ X and a ∈ A we have that d(x, a) ≤ d(x, y) + d(a, y); thus,
d(x, A)−d(y, A) ≤ d(x, y). Changing the rôles of x and y we obtain that |d(x, A)−d(y, A)| ≤
d(x, y).
(ii) The set that defines d(x, A) is contained in the one that defines d(x, B).
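(iii) Since $A \subset \overline{A}$, then by (ii) $d(x, \overline{A}) \le d(x, A)$. For any $a \in \overline{A}$, let $\{a_n\} \subset A$ be a sequence
that converges to a. Then, by (i),
\[
  d(x, a) = \lim_n d(x, a_n) \ge d(x, A);
\]
taking the infimum over $a \in \overline{A}$ gives $d(x, \overline{A}) \ge d(x, A)$. Also, if d(x, A) = 0 and $a_n \in A$ is
chosen so that $\lim_n d(x, a_n) = d(x, A)$, we conclude that $x \in \overline{A}$; conversely, if $x \in \overline{A}$, then
$d(x, A) = d(x, \overline{A}) = 0$.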
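
Parts (i) and (ii) of Lemma 2.5.7 are easy to test on a concrete example. The sketch below assumes X is the real line with d(x, y) = |x − y| and that A is a finite nonempty set, so that the infimum is a minimum; the sets `A`, `B` and the sampling grid are arbitrary choices made for illustration.

```python
from fractions import Fraction


def dist_to_set(x, A):
    """d(x, A) = inf{|x - a| : a in A} for a finite nonempty subset A of R."""
    return min(abs(x - a) for a in A)


A = [Fraction(0), Fraction(1), Fraction(5, 2)]
B = A + [Fraction(4)]                      # A is a subset of B
grid = [Fraction(k, 8) for k in range(-16, 49)]

# (i): x -> d(x, A) is 1-Lipschitz.
lipschitz = all(abs(dist_to_set(x, A) - dist_to_set(y, A)) <= abs(x - y)
                for x in grid for y in grid)

# (ii): enlarging the set can only decrease the distance.
monotone = all(dist_to_set(x, B) <= dist_to_set(x, A) for x in grid)
```

Since a finite set is closed, the last statement of (iii) reduces here to: the distance vanishes exactly at the points of A.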

Given ε > 0 and A ⊂ X, the open and closed ε–neighborhoods of A, denoted by Aε

and Aε respectively, are defined as
Aε = {x ∈ X : d(x, A) < ε}
Aε = {x ∈ X : d(x, A) ≤ ε}
Lemma 2.5.8. Let δ, ε > 0. Then, (Aδ )ε ⊂ Aδ+ε .

Proof. Let x ∈ (Aδ )ε , and for any r > 0, let a′ ∈ Aδ such that d(x, a′ ) < d(x, Aδ ) + r.
Then,
d(x, A) ≤ d(x, a′ ) + d(a′ , A) < d(x, Aδ ) + r + δ ≤ δ + ε + r.
Letting r ↘ 0 implies that x ∈ Aδ+ε .

In any given topological space X, countable intersections of open sets are called Gδ sets,
and countable unions of closed sets are called Fσ sets.
Lemma 2.5.9. (Alexandroff) Let X be a subspace of a metric space (Y, d). If X admits
a complete metric ρ compatible with the subspace topology, then X is a Gδ subset of Y.
Conversely, if X is a Gδ set in a complete metric space (Y, d), then X admits a complete
metric compatible with the subspace topology.

Proof. Suppose that X has a complete metric ρ that generates the subspace topology. We
use $\mathrm{diam}_\rho$ and $\mathrm{diam}_d$ to denote diameters with respect to ρ and d respectively. For each
n ∈ N let $\mathcal{G}_n$ be the collection of open sets V in X with $\mathrm{diam}_\rho(V) < \frac{1}{n}$. Each $V \in \mathcal{G}_n$ is of
the form $V = U_V \cap X$ for some open set $U_V$ in Y. Let $W_n = \bigcup_{V \in \mathcal{G}_n} U_V$. Then, $W_n$ is an
open subset of Y and, as $X = \bigcup \mathcal{G}_n$, $X \subset W_n$. Notice that $X^{1/n} = \{y \in Y : d(y, X) < \frac{1}{n}\}$ is
an open set in Y that contains X. We claim that
\[
  X = \bigcap_n \big( X^{1/n} \cap W_n \big).
\]
The inclusion $X \subset \bigcap_n (X^{1/n} \cap W_n)$ is obvious. Suppose $x \in \bigcap_n (X^{1/n} \cap W_n)$. Then d(x, X) = 0,
and so $x \in \overline{X}$. Let $\{x_m : m \in \mathbb{N}\} \subset X$ be a sequence such that $d(x_m, x) \to 0$ as m → ∞. For each
n choose $V_n \in \mathcal{G}_n$ such that $x \in U_{V_n}$ and $\mathrm{diam}_\rho(U_{V_n} \cap X) < \frac{1}{n}$. There is an integer $N_n$ such
that m ≥ $N_n$ implies that $x_m \in U_{V_n}$. It follows that $\{x_m : m \in \mathbb{N}\}$ is a Cauchy sequence
in (X, ρ), and by hypothesis, $x_m$ converges to some z ∈ X in (X, ρ). Since the topology induced
by ρ coincides with the topology of X as a subspace of (Y, d), we conclude that z = x, and
the reverse inclusion follows.
Conversely, suppose that $X = \bigcap_n G_n$ for some open subsets $G_n$ of (Y, d). Then
\[ \rho(x, y) = d(x, y) + \sum_{n} \left( \frac{1}{2^n} \wedge \left| \frac{1}{d(x, G_n^c)} - \frac{1}{d(y, G_n^c)} \right| \right) \]
defines a metric on X. The continuity of the maps $y \mapsto d(y, G_n^c)$ together with the uniform
convergence of the series defining ρ implies that d and ρ are equivalent metrics on X. We
claim that ρ is complete. Suppose {xm : m ∈ N} ⊂ X is a Cauchy sequence for ρ. Then it
is also a Cauchy sequence for d and so xm converges to some point x∗ ∈ Y. It follows that
$\lim_m d(x_m, G_n^c) = d(x_*, G_n^c)$ for each n. Since
\[ \lim_{k,m\to\infty} \left| \frac{1}{d(x_k, G_n^c)} - \frac{1}{d(x_m, G_n^c)} \right| = 0, \]
the sequence $1/d(x_m, G_n^c)$ is bounded, and it follows that $d(x_*, G_n^c) > 0$ for each n. Therefore $x_* \in \bigcap_n G_n = X$ and, since d and ρ are equivalent on X, xm converges to x∗ in (X, ρ).
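The converse construction can be checked numerically in a simple case. The following is a minimal sketch, not part of the original notes, assuming Y = R with d(x, y) = |x − y| and X = (0, 1), so that one may take $G_n = (0, 1)$ for every n. The function names and the truncation level `terms` are illustrative choices.

```python
def dist_to_complement(x: float) -> float:
    """d(x, G^c) for G = (0, 1) in R: distance from x to R \\ (0, 1)."""
    return min(x, 1.0 - x)


def rho(x: float, y: float, terms: int = 60) -> float:
    """Truncation of rho(x, y) = d(x, y) + sum_n min(2^-n, |1/d(x,G^c) - 1/d(y,G^c)|)."""
    gap = abs(1.0 / dist_to_complement(x) - 1.0 / dist_to_complement(y))
    series = sum(min(2.0 ** -n, gap) for n in range(1, terms + 1))
    return abs(x - y) + series


# The sequence 1/m is Cauchy for d but has no limit in (0, 1).
# Under rho, consecutive terms stay far apart, so it is not rho-Cauchy.
far_apart = [rho(1.0 / m, 1.0 / (m + 1)) for m in range(2, 50)]
```

In this example the reciprocal distances 1/d(1/m, G^c) equal m, so the capped series contributes essentially 1 to ρ(1/m, 1/(m+1)); this is how ρ rules out sequences that escape to the boundary of X.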
Definition 2.5.10. A function f between metric spaces (X, d) and (T, e) is an isometry
if d(x, y) = e(f (x), f (y)) for all x, y ∈ X. If, in addition, f is bijective, then clearly f −1 is
an isometry from (T, e) to (X, d). In such case, we say that the spaces (X, d) and (T, e) are
isometric.
Definition 2.5.11. A function f : (X, d) −→ (S, ρ) is uniformly continuous if for any
ε > 0, there exists δ > 0 such that d(x, y) < δ implies that ρ(f (x), f (y)) < ε.

Notice that the notion of uniform continuity is metric dependent. For S = R it is natural
to consider ρ(x, y) = |x − y|. Let Ub (X, d) be the space of bounded real valued uniformly
continuous functions on (X, d).
Theorem 2.5.12. Let (X, d) be any metric space. There exists a complete metric space
(T, ρ) and an isometry f from X to T such that f (X) is dense in T . If (T ′ , ρ′ ) and f ′
satisfy the properties described above, then (T, ρ) and (T ′ , ρ′ ) are isometric.

Proof. Fix x0 ∈ X, and for each x ∈ X define $\varphi_x(y) = d(x, y) - d(x_0, y)$. From the triangle
inequality we obtain that $\max_{y\in X} |\varphi_x(y)| = d(x, x_0)$, and $\max_{y\in X} |\varphi_x(y) - \varphi_z(y)| = d(x, z)$.
Then, the map $x \mapsto \varphi_x$ defines an isometry f from (X, d) into the space (Bb (X), ρ) of
bounded real functions on X equipped with the metric $\rho(h, h') = \sup_{y\in X} |h(y) - h'(y)|$,
which is complete. This proves that (X, d) is isometric to f(X), which is dense in the complete metric space
$(T, \rho) := (\overline{f(X)}, \rho)$.

Suppose (T′, ρ′) is another complete metric space for which there is an isometry f′ from X
to T′ with $\overline{f'(X)} = T'$. Then, the map $\xi = f' \circ f^{-1} : f(X) \to f'(X)$ is clearly an isometry
with respect to the metrics ρ and ρ′. It is easy to extend ξ to an isometry from (T, ρ) onto
(T′, ρ′) by setting $\xi(y) = \lim_n \xi(y_n)$ for any sequence yn in f(X) that converges to y; the limit exists because $(\xi(y_n))$ is Cauchy and T′ is complete.
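The isometry x ↦ φx used in the proof can be verified directly on a finite metric space. The sketch below is an illustration only; the sample points, the base point and the helper names are assumptions, not part of the notes.

```python
import itertools

points = [0.0, 1.5, 4.0, -2.0]           # a finite subset of R (hypothetical sample)
x0 = points[0]                           # base point used to define phi_x


def d(x: float, y: float) -> float:
    """The metric on X (here the usual distance on R)."""
    return abs(x - y)


def phi(x: float) -> tuple:
    """phi_x as the tuple (phi_x(y) : y in X), where phi_x(y) = d(x, y) - d(x0, y)."""
    return tuple(d(x, y) - d(x0, y) for y in points)


def sup_dist(h: tuple, g: tuple) -> float:
    """rho(h, g) = sup_y |h(y) - g(y)| for functions on the finite set X."""
    return max(abs(a - b) for a, b in zip(h, g))


# x -> phi_x preserves distances: rho(phi_x, phi_z) = d(x, z) for all x, z.
isometry_ok = all(
    abs(sup_dist(phi(x), phi(z)) - d(x, z)) < 1e-12
    for x, z in itertools.product(points, repeat=2)
)
```

The supremum is attained at y = z (or y = x), which is exactly the triangle-inequality argument in the proof.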

Remark 2.5.13. A metric space (T, ρ) satisfying the conditions of Theorem 2.5.12 is called
a metric completion of (X, d). Observe that if (X, d) is complete, then X is isometric to
its completion, that is, no new points are added to the metric completion.
Lemma 2.5.14. Let (X, d) be a metric space for which any sequence (xn ) ⊂ X has a
convergent subsequence. For any open cover A of X, there exists a number δ > 0 such that
if C ⊂ X and diam(C) < δ, then C ⊂ A for some A ∈ A .

Proof. We argue by contradiction. Suppose A is an open cover of X for which no such

δ > 0 exists. Then, for any n ∈ N, there exists Cn with diam(Cn ) < 1/n contained in
no element of A . For each n, let xn ∈ Cn . By assumption, there exists x ∈ X and a
subsequence {xnk } such that d(xnk , x) → 0. Let A ∈ A be such that x ∈ A and choose
ε > 0 small enough so that B(x; ε) ⊂ A. If k is large enough so that 1/nk < ε/2 and
d(xnk , x) < ε/2, then Cnk ⊂ B(x; ε) ⊂ A. This contradicts the choice of {Cn }.
Definition 2.5.15. A subset F of a topological space X is relatively compact if its closure $\overline{F}$ is
a compact subset of X. A Hausdorff topological space X is sequentially compact if every
sequence in X has a convergent subsequence. A metric space (X, d) is totally bounded if
for any ε > 0, X admits a finite cover by open balls of radius ε.
Lemma 2.5.16. Let (X, d) be a metric space. If X is sequentially compact then X is totally
bounded.

Proof. Suppose that any sequence in (X, d) admits a convergent subsequence. If X is not
totally bounded, then there exists ε > 0 such that every finite collection of balls of
radius ε fails to cover X. Let x1 ∈ X be arbitrary. As B(x1 ; ε) does not cover X, there
is $x_2 \in X \setminus B(x_1; \varepsilon)$. By induction, we can construct a sequence {xn : n ∈ N} such that
$x_n \in X \setminus \bigcup_{k=1}^{n-1} B(x_k; \varepsilon)$. Since d(xn , xm ) ≥ ε for all m ≠ n, the sequence {xn } has no
convergent subsequence. This is a contradiction.
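The selection used in this proof (keep adding a point at distance at least ε from all points chosen so far) is easy to run on a finite sample, where it must stop and then produces the centers of a finite cover by ε-balls. The following is a minimal sketch under that assumption; the function name and the sample grid are hypothetical.

```python
def greedy_eps_net(points, dist, eps):
    """Select centers greedily: a point is kept when it is at distance >= eps
    from every center chosen so far. On termination every point lies within
    eps of some center, and distinct centers are at distance >= eps."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


sample = [k / 100 for k in range(101)]            # grid in [0, 1]
net = greedy_eps_net(sample, lambda a, b: abs(a - b), 0.25)
```

In a space that is not totally bounded the same loop never terminates, and the selected points form a sequence without convergent subsequences, as in the proof.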
Theorem 2.5.17. Let (X, d) be a metric space. X is compact if and only if any sequence
{xn } ⊂ X has a convergent subsequence.

Proof. Assume that X is compact. If the sequence (xn ) ⊂ X has a finite number of
elements the conclusion follows easily. Assume that (xn ) has an infinite number of elements
and that it does not admit a convergent subsequence. Then any x ∈ X has an open
neighborhood Ux which contains at most one element of {xn }. By compactness, there is a
finite subcover of {Ux : x ∈ X}. This implies that {xn } is finite, contradiction.

Conversely, if (X, d) is sequentially compact then, by Lemma 2.5.16, X is totally bounded.

Let A be an open cover of X and let δ > 0 be as in Lemma 2.5.14. For ε = δ/3, choose a
finite collection of balls of radius ε covering X. Each of these balls has diameter at most 2δ/3 < δ, so there is a
set in A that contains it. The sets chosen in this way form a finite subcover of A .
Theorem 2.5.18. (X, d) is compact iff it is complete and either totally bounded or sequentially compact.

Proof. Necessity follows from Lemma 2.5.16 and Theorem 2.5.17, together with the fact that a Cauchy sequence with a convergent subsequence converges. To show sufficiency, let
(xn ) be a sequence in X. Total boundedness implies that there is a ball B1 of radius
one which contains xn for infinitely many n. Let $x_{n_1}$ be one such element.
Proceeding by induction, we obtain a strictly increasing sequence nj of integers and balls
Bj of radius 1/j such that $x_{n_j} \in B_1 \cap \cdots \cap B_j$, and B1 ∩ · · · ∩ Bj contains xn for infinitely
many n. Clearly $(x_{n_j})$ is a Cauchy sequence which, by completeness of X,
converges. This shows that every sequence in X has a convergent subsequence, and so X is compact by Theorem 2.5.17.

2.6. Banach fixed point theorem

Definition 2.6.1. A function f : (X, d) → (X, d) is a contraction if there is 0 < θ < 1
such that d(f (x), f (y)) ≤ θd(x, y) for all x, y ∈ X.

The following result has wide theoretical and practical applications in many areas.

Theorem 2.6.2. (Banach’s fixed point theorem) Suppose f is a contraction on a complete
metric space (X, d) with contraction constant 0 < θ < 1. Then, there exists a unique x∗ ∈ X
such that f (x∗ ) = x∗ , and for all x ∈ X and n ≥ 1
\[ d(f^n(x), x_*) \le \frac{\theta^n}{1-\theta}\, d(f(x), x). \]

Proof. First we show uniqueness. Suppose x∗ and y∗ are fixed points of f . Then d(x∗ , y∗ ) =
d(f (x∗ ), f (y∗ )) ≤ θd(x∗ , y∗ ). Since 0 < θ < 1, it follows that d(x∗ , y∗ ) = 0.

If f is a contraction, then it is continuous. For any x, y ∈ X and m ≥ 1

d(f m (x), f m (y)) ≤ θd(f m−1 (x), f m−1 (y)) ≤ . . . ≤ θm d(x, y)

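Fix x ∈ X and set $x_n = f^n(x)$ for all n ≥ 1. Then, for n > m,
\[ d(f^n(x), f^m(x)) \le \sum_{k=m}^{n-1} d(f^{k+1}(x), f^k(x)) \le \sum_{k=m}^{n-1} \theta^k d(f(x), x) \le \frac{\theta^m}{1-\theta}\, d(f(x), x). \]

Consequently, {xn : n ∈ Z+ } is a Cauchy sequence in X and convergence to some x∗ ∈ X
ensues. The continuity of f implies that f (x∗ ) = limn f (xn ) = limn xn+1 = x∗ . Letting n → ∞ in the estimate above yields $d(f^m(x), x_*) \le \frac{\theta^m}{1-\theta}\, d(f(x), x)$.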
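The iteration in the proof and the a priori estimate $\frac{\theta^n}{1-\theta} d(f(x), x)$ can be observed numerically. The sketch below is an illustration under stated assumptions: it takes X = [0, 1], f = cos (which maps [0, 1] into itself) and θ = sin 1, a valid contraction constant there since |f′(t)| = |sin t| ≤ sin 1 < 1. The helper names are hypothetical.

```python
import math


def iterate(f, x, n):
    """Return the n-th iterate f^n(x)."""
    for _ in range(n):
        x = f(x)
    return x


f = math.cos                  # maps [0, 1] into [cos 1, 1], a subset of [0, 1]
theta = math.sin(1.0)         # contraction constant of cos on [0, 1]
x = 1.0                       # starting point
x_star = iterate(f, x, 200)   # numerical stand-in for the fixed point


def a_priori_bound(n):
    """theta^n / (1 - theta) * d(f(x), x), the bound obtained in the proof."""
    return theta ** n / (1.0 - theta) * abs(f(x) - x)


# Observed errors d(f^n(x), x_star) for n = 1, ..., 29.
errors = [abs(iterate(f, x, n) - x_star) for n in range(1, 30)]
```

The bound depends only on the first step d(f(x), x) and on θ, so it can be used to choose the number of iterations before running them.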

The following result is a slight generalization of Banach’s fixed point theorem.

Theorem 2.6.3. (Capaccioli fixed point theorem) Let (X, d) be a complete metric space.
Suppose f : X → X has the property that for each n ∈ N, there exists cn > 0 such that
(a) $d(f^n(x), f^n(y)) \le c_n d(x, y)$ for all x, y ∈ X.
(b) $\sum_n c_n < \infty$.

Then, there is a unique point x∗ ∈ X such that f (x∗ ) = x∗ .


Proof. Fix x0 ∈ X and set $x_n = f(x_{n-1})$ for all n ≥ 1. From (a), f is Lipschitz continuous
and for all n > m
\[ (2.2)\qquad d(x_n, x_m) \le \sum_{j=m}^{n-1} d(x_{j+1}, x_j) = \sum_{j=m}^{n-1} d\big(f^j(x_1), f^j(x_0)\big) \le d(x_1, x_0) \sum_{j=m}^{n-1} c_j. \]
Condition (b) implies that the sequence of sums $s_n := \sum_{j=1}^{n} c_j$ is Cauchy. By (2.2) we
conclude that {xn : n ∈ N} is a Cauchy sequence in X, and so it converges to some point
x∗ ∈ X. By continuity f (x∗ ) = limn f (xn ) = limn xn+1 = x∗ .
Uniqueness follows from (a), for if f (y) = y then $d(x_*, y) = d(f^n(x_*), f^n(y)) \le c_n d(x_*, y)$.
Since cn → 0 by (b), letting n → ∞ shows that x∗ = y.
Remark 2.6.4. Banach’s contraction principle follows from Capaccioli’s theorem by taking
$c_n = \theta^n$, where θ is the contraction coefficient.

2.7. Uniformities
Definition 2.7.1. Let X be a non empty set and D a collection of pseudo-metrics on X.
The collection of sets of the form B(x; d, ε) = {y ∈ X : d(x, y) < ε} with d ∈ D and ε > 0,
defines a subbase for a topology τ (D), which we call the D–uniform topology .
If (Y, ρ) is a metric or pseudo-metric space, then a function f : X → Y is said to be
D–uniformly continuous if for any ε > 0 there are pseudo-metrics d1 , . . . , dn ∈ D and
δ > 0 such that max1≤j≤n dj (x, z) < δ implies ρ(f (x), f (z)) < ε.
Remarks 2.7.2. Suppose D is a collection of pseudo-metrics on X.
(i) If D separates points, that is, if $\sup_{d\in D} d(x, y) = 0$ implies x = y, then τ (D) is
Hausdorff. Indeed, if x ≠ y then r := d(x, y) > 0 for some d ∈ D. The sets
B(x; d, r/2) and B(y; d, r/2) are disjoint neighborhoods of x and y respectively.
(ii) If f is D–uniformly continuous, then f is continuous on (X, τ (D)). Indeed, let
x0 ∈ X and set y0 = f (x0 ). For any ε > 0 there is δ > 0 and a pseudo-metric
$d' = \max_{1\le j\le n} d_j$, where dj ∈ D, such that ρ(f (x), f (z)) < ε whenever d′ (x, z) < δ.
Hence f (B(x0 ; d′ , δ)) ⊂ B(y0 ; ρ, ε) and, since B(x0 ; d′ , δ) is a neighborhood of x0
in τ (D), the continuity of f follows.

A net {xα : α ∈ A} ⊂ X is said to be a Cauchy net with respect to τ (D) iff for any
ε > 0 and pseudo-metrics d1 , . . . , dn ∈ D, there is α0 ∈ A such that α ≥ α0 and α′ ≥ α0
imply that max1≤j≤n dj (xα , xα′ ) < ε. The space (X, τ (D)) is complete iff any Cauchy net
is convergent.
Lemma 2.7.3. Suppose (Y, ρ) is a metric space. Y is complete iff any Cauchy net in Y
converges.

Proof. Since any sequence is a net, it only remains to show that completeness implies the convergence of Cauchy nets. Suppose (Y, ρ) is
complete and let {yα : α ∈ A} be a Cauchy net. For each α ∈ A let $A_\alpha = \{y_\beta : \beta \in A, \beta \ge \alpha\}$.
For any n ∈ N there is αn ∈ A such that $\operatorname{diam}(A_{\alpha_n}) < 1/n$. Let $B_n = \bigcap_{k=1}^{n} A_{\alpha_k}$. Since
A is a directed set, there exists ŷn ∈ Bn for each n. Consequently, {ŷn : n ∈ N} is a Cauchy
sequence in Y , and so it converges to a point y ∈ Y . For any ε > 0 let N > 2/ε be so large that
ρ(ŷn , y) < ε/2 for all n ≥ N . If α ∈ A with α ≥ αN ,
\[ \rho(y_\alpha, y) \le \rho(y_\alpha, \hat{y}_N) + \rho(\hat{y}_N, y) \le \frac{1}{N} + \frac{\varepsilon}{2} < \varepsilon, \]
which shows that yα → y.
Theorem 2.7.4. Suppose (X, τ (D)) is a D–uniform space, (E, ρ) is a complete metric
space and S ⊂ X is dense in X. If f : S → E is D–uniformly continuous, then there exists
a unique continuous extension fˆ of f to X.

Proof. For any x ∈ X let {xα : α ∈ A} ⊂ S be a net that converges to x. By uniform
continuity, for any ε > 0 there is δ > 0 and pseudo-metrics d1 , . . . , dn ∈ D such that
\[ (2.3)\qquad \max_{1\le j\le n} d_j(x', z') < \delta \quad\text{implies}\quad \rho(f(x'), f(z')) < \varepsilon \]
for all x′ , z′ ∈ S.

There is α0 ∈ A for which α ≥ α0 implies $\max_{1\le j\le n} d_j(x_\alpha, x) < \delta/2$. Hence {f (xα ) : α ∈ A}
is a Cauchy net in (E, ρ) and since E is complete, there is a unique y ∈ E such that
f (xα ) → y. If {yβ : β ∈ B} ⊂ S is another net converging to x, then there is β0 ∈ B
for which β ≥ β0 implies $\max_{1\le j\le n} d_j(y_\beta, x) < \delta/2$. Hence ρ(f (xα ), f (yβ )) < ε whenever α ≥ α0 and
β ≥ β0 , which leads to limα f (xα ) = y = limβ f (yβ ). Consequently, $\hat f(x) = \lim_\alpha f(x_\alpha)$ for
any net xα → x in S is a well defined function which extends f to all of X and which is D–uniformly
continuous. Indeed, given ε > 0, there are δ > 0 and pseudo-metrics d1 , . . . , dn ∈ D such
that (2.3) holds with ε/3 in place of ε. Write $d = \max_{1\le j\le n} d_j$. If x, y ∈ X satisfy d(x, y) < δ/3, and xα → x and yβ → y are nets in S, then for some
α and β, $d(x, x_\alpha) \vee d(y_\beta, y) < \delta/3$ and $\rho(\hat f(x), f(x_\alpha)) \vee \rho(\hat f(y), f(y_\beta)) < \varepsilon/3$. Hence
\[ d(x_\alpha, y_\beta) \le d(x_\alpha, x) + d(x, y) + d(y, y_\beta) < \delta, \]
which implies
\[ \rho(\hat f(x), \hat f(y)) \le \rho(\hat f(x), f(x_\alpha)) + \rho(f(x_\alpha), f(y_\beta)) + \rho(f(y_\beta), \hat f(y)) < \varepsilon. \]
It follows that $\hat f$ is D–uniformly continuous.

If F is another continuous extension of f to X then for any x ∈ X and net in S with xα → x
we have $F(x) = \lim_\alpha F(x_\alpha) = \lim_\alpha f(x_\alpha) = \hat f(x)$.

2.8. Product topology

Definition 2.8.1. Given a nonempty collection {(Xi , τi ) : i ∈ I} of topological spaces, the
product topology on $X = \prod_{i\in I} X_i$ is the topology with subbase $S = \{p_i^{-1}(U_i) : U_i \in \tau_i, i \in I\}$.
That is, it is the minimal topology that makes each projection $p_i : x \mapsto x_i := x(i) \in X_i$ continuous.
Theorem 2.8.2. Fix $x_0 \in \prod_{i\in I} X_i = X$ and let D be the set of all points in X that differ
from x0 in at most finitely many components. Then D is dense in X with respect to the product
topology.

Proof. It is enough to show that any basic open set $\bigcap_{j=1}^{n} p_{i_j}^{-1}(U_{i_j})$, where $U_{i_j}$ is a nonempty open set in
$X_{i_j}$, contains an element of D. Choose $y_{i_j} \in U_{i_j}$ for j = 1, . . . , n, and let x be the point defined by
$x(i_j) = y_{i_j}$ for j = 1, . . . , n, and x(i) = x0 (i) otherwise. Clearly x ∈ D, and x belongs to the given basic open set.
Theorem 2.8.3. Let (X, τX ) and (Y, τY ) be topological spaces. Suppose that τY is Hausdorff.
If f : X → Y is continuous then, Graphf := {(x, f (x)) : x ∈ X} is closed in
(X, τX ) × (Y, τY ).

Proof. Let (x, y) ∈ (X × Y ) \ Graphf . Then y ≠ f (x) and there are open sets U, V ∈ τY
such that y ∈ U , f (x) ∈ V and U ∩ V = ∅. By continuity there is W ∈ τX with x ∈ W such
that f (W ) ⊂ V . It follows that W × U is an open neighborhood of (x, y) in (X, τX ) × (Y, τY )
such that (W × U ) ∩ Graphf = ∅.
Lemma 2.8.4. Let {(Xi , τi ) : i ∈ I} be a collection of topological spaces and let (X, τ ) be its
product space. Given a topological space (Y, τY ) and a function $f : Y \to \prod_{i\in I} X_i$, we have
that f is continuous iff $p_i \circ f : Y \to X_i$ is continuous for each i ∈ I.

Proof. If f is continuous then each function pi ◦ f , i ∈ I, is continuous, for each pi is
continuous by definition of τ .
Conversely, suppose that pi ◦ f is continuous for each i ∈ I. Then, for any i ∈ I and
U ∈ τi , $f^{-1}(p_i^{-1}(U)) = (p_i \circ f)^{-1}(U) \in \tau_Y$. The continuity of f follows from the fact that
$\{p_i^{-1}(U) : i \in I, U \in \tau_i\}$ is a subbasis for τ , and the fact that taking preimages commutes with unions and
finite intersections of sets.
Remark 2.8.5. If J is a non–empty subset of I then, there is a natural map $p_J : \prod_{i\in I} X_i \to \prod_{j\in J} X_j$
given by x = (x(i) : i ∈ I) ↦ (x(j) : j ∈ J), that is, the restriction of the choice
function x to J. It is easy to check that pJ is continuous.
Remark 2.8.6. Let {(Xi , τi ) : i ∈ I} be a collection of topological spaces and let (X, τ ) be its
product space. Each space Xi can be embedded homeomorphically into X. Fix i ∈ I, and
for each j ∈ I \ {i} choose $x^0_j \in X_j$. We claim that the slice $S_i := \{x \in X : x_j = x^0_j, j \ne i\}$,
as a subspace of X, is homeomorphic to (Xi , τi ). Consider the map $h : x_i \mapsto x^*$, where
$x^*(i) = x_i$ and $x^*(j) = x^0_j$ for j ≠ i. h is clearly a bijection from Xi to Si ; since pj ◦ h is
the constant function $x^0_j$ for j ≠ i, and pi ◦ h is the identity map on Xi , we have
that h is continuous. For any U ∈ τi , $h(U) = S_i \cap p_i^{-1}(U)$, which is open in Si , and so $h^{-1}$
is continuous.
Example 2.8.7. The set N, as a subspace of the Euclidean space R, has the discrete
topology, where any subset of N is open in N. Recall that every positive integer n admits a
unique decomposition as $n = 2^{\alpha-1}(2\beta - 1)$, where α, β ∈ N. Then, the integer-valued maps
α(n) = α and β(n) = β are clearly continuous. Similarly, the map $\phi(\alpha, \beta) = 2^{\alpha-1}(2\beta - 1)$
on $\mathbb{N}^2$ is continuous. The map $\Phi : \mathbb{N} \to \mathbb{N}^{\mathbb{N}}$ given by
\[ \Phi(n) = \phi(n, \cdot) : m \mapsto \phi(n, m) \]
is continuous. Indeed, if pm is the projection map in $\mathbb{N}^{\mathbb{N}}$ onto the m–th component, we have
that $(p_m \circ \Phi)(n) = 2^{n-1}(2m - 1)$ is continuous.
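The bijection between N × N and N given by the decomposition $n = 2^{\alpha-1}(2\beta-1)$ can be computed explicitly. The sketch below is illustrative; the function names `phi` and `phi_inverse` are assumptions, and N is taken to start at 1 as in the example.

```python
def phi(alpha: int, beta: int) -> int:
    """phi(alpha, beta) = 2^(alpha - 1) * (2 * beta - 1) for alpha, beta >= 1."""
    return 2 ** (alpha - 1) * (2 * beta - 1)


def phi_inverse(n: int) -> tuple:
    """Recover (alpha, beta) from n >= 1 by stripping the factors of 2 from n."""
    alpha = 1
    while n % 2 == 0:
        n //= 2
        alpha += 1
    # What remains is the odd part 2 * beta - 1.
    return alpha, (n + 1) // 2
```

For instance, 12 = 2^2 · 3 corresponds to α = 3 and β = 2, and each set {φ(n, m) : m ∈ N} collects the integers whose largest power of 2 is exactly 2^(n−1).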

Lemma 2.8.8. For any topological space X, the product spaces $X^{\mathbb{N}}$ and $(X^{\mathbb{N}})^{\mathbb{N}}$ are homeomorphic.

Proof. Let φ be as in Example 2.8.7. Define the function $G : X^{\mathbb{N}} \to (X^{\mathbb{N}})^{\mathbb{N}}$ by
\[ x = (x(n) : n \in \mathbb{N}) \mapsto ((x \circ \phi)(n, \cdot) : n \in \mathbb{N}). \]
It can be seen that G is a bijection with inverse given by
\[ (\xi(n, \cdot) : n \in \mathbb{N}) \mapsto ((\xi \circ \phi^{-1})(n) : n \in \mathbb{N}). \]
For any n ∈ N denote by pn and πn the projections in $X^{\mathbb{N}}$ and $(X^{\mathbb{N}})^{\mathbb{N}}$ respectively, onto
the corresponding n–th component. Let Mn := {φ(n, m) : m ∈ N}. Notice that πn ◦ G
is the projection from $X^{\mathbb{N}}$ onto $X^{M_n}$ (identified with $X^{\mathbb{N}}$ through m ↦ φ(n, m)), and so πn ◦ G is continuous. Conversely, suppose
n = φ(αn , βn ). Then $p_n \circ G^{-1} = p_{\beta_n} \circ \pi_{\alpha_n}$, which is continuous. The conclusion follows.
Example 2.8.9. Consider the spaces N and {0, 1} as subspaces of R. By Lemma 2.8.8,
the product spaces $\{0, 1\}^{\mathbb{N}}$ and $(\{0, 1\}^{\mathbb{N}})^{\mathbb{N}}$ are homeomorphic. Notice that although both
N and {0, 1} have the discrete topology, the product spaces $\mathbb{N}^{\mathbb{N}}$ and $\{0, 1\}^{\mathbb{N}}$ are not discrete.
Theorem 2.8.10. If {(Xi , τi ) : i ∈ I} is a family of connected topological spaces, then the
product space $X := \prod_{i\in I} X_i$ is connected.

Proof. Fix $y^0 = (y^0_i : i \in I) \in X$. By induction, we prove that if a point $y^{(n)}$ differs from $y^0$
in exactly n components, then there is a connected set that contains both $y^0$ and $y^{(n)}$. For
n = 1, suppose $y^0_{i_1} \ne y^{(1)}_{i_1}$ and $y^0_i = y^{(1)}_i$ otherwise. The slice $S_{i_1} = \{x \in X : x_i = y^0_i, i \ne i_1\}$
contains $y^0$ and $y^{(1)}$, and being homeomorphic to $X_{i_1}$, it is connected by Theorem 2.2.10.
Suppose the claim is valid for k = 1, . . . , n − 1. Suppose $y^{(n)}$ differs from $y^0$ in exactly
n components. Let $y^{(n-1)}$ be a point that differs from $y^0$ in n − 1 components, and from $y^{(n)}$
in only one component. By the induction hypothesis, there are connected sets A0 and A1
that contain $y^{(n-1)}$ and $y^0$, and $y^{(n-1)}$ and $y^{(n)}$ respectively (for A1 , apply the case n = 1 with base point $y^{(n-1)}$). Hence, A0 ∪ A1 is connected
and contains $y^0$, $y^{(n-1)}$, and $y^{(n)}$. This completes the proof of the claim.
Let D be the set of all points in X that differ from $y^0$ in only finitely many components.
The claim along with Theorem 2.2.4 implies that D is connected. As D is dense by Theorem 2.8.2, $X = \overline{D}$ is
connected.
Example 2.8.11. Let [0, 1] be the unit interval with the topology inherited from the Euclidean
topology on R. For any set I, $[0, 1]^I$ with the product topology is connected.
Theorem 2.8.12. Let {(Xi , τi ) : i ∈ I} be a collection of topological spaces.
(i) The product topology τ on $X = \prod_{i\in I} X_i$ is Hausdorff iff each τi is a Hausdorff topology.
(ii) If I is countable and each (Xi , τi ) is second countable, then (X, τ ) is second countable.

Proof. (i) Suppose each (Xi , τi ) is Hausdorff. Let $x, y \in X = \prod_{i\in I} X_i$, and assume x ≠ y.
Then xi ≠ yi for some i ∈ I. There are open neighborhoods U, V ∈ τi of xi and yi
respectively such that U ∩ V = ∅. Then $x \in p_i^{-1}(U)$, $y \in p_i^{-1}(V)$ and $p_i^{-1}(U) \cap p_i^{-1}(V) = p_i^{-1}(U \cap V) = \emptyset$.
Conversely, suppose (X, τ ) is Hausdorff. For each i ∈ I choose a slice Si as in Remark 2.8.6.
Being a subspace of X, Si is Hausdorff. Since Xi and Si are homeomorphic, we conclude
that Xi is Hausdorff.

(ii) For each i ∈ I let $\mathcal{B}_i$ be a countable basis of τi . Then $\mathcal{B} := \{p_i^{-1}(B) : i \in I, B \in \mathcal{B}_i\}$ is a
subbasis for τ . If I is countable, then so is $\mathcal{B}$. Since finite intersections of elements in $\mathcal{B}$ form
a basis for τ , and there are only countably many such intersections, it follows that τ has a countable basis.
Theorem 2.8.13. Let {(Xn , dn ) : n ∈ N} be a sequence of metric spaces. Then
\[ (2.4)\qquad \rho(x, y) := \sum_{n=1}^{\infty} \frac{d_n(x_n, y_n) \wedge 1}{2^n} \]
defines a metric on $X := \prod_{n\in\mathbb{N}} X_n$ that is compatible with the product topology. Moreover,
(X, ρ) is complete iff each (Xn , dn ) is complete.

Proof. As dn and dn ∧ 1 generate the same topology on Xn , we will assume without loss
of generality that dn ≤ 1. It is easy to check that ρ is a metric on X. To check that ρ is
compatible with the product topology τ we first show that any open ball B(x0 ; r) belongs
to τ . Suppose ρ(x0 , x) < r and set $r^* = r - \rho(x, x_0) > 0$. Let N ∈ N be large enough
so that $\frac{1}{2^N} < \frac{r^*}{2}$. Then, the set
\[ U = \Big\{y \in X : d_n(x_n, y_n) < \frac{r^*}{2},\ n = 1, \ldots, N\Big\} \]
is open in τ , contains x, and for any y ∈ U
\[ \rho(x, y) = \sum_{n=1}^{N} \frac{d_n(x_n, y_n)}{2^n} + \sum_{n>N} \frac{d_n(x_n, y_n)}{2^n} \le \sum_{n=1}^{N} \frac{d_n(x_n, y_n)}{2^n} + \frac{1}{2^N} < \frac{r^*}{2} + \frac{r^*}{2} = r^*. \]
This shows that U ⊂ B(x; r∗ ) ⊂ B(x0 ; r).

Now we show that for any N ∈ N and open set V in XN , W = {y ∈ X : yN ∈ V } is
open in (X, ρ). Let x ∈ W and let r > 0 be such that {y ∈ XN : dN (y, xN ) < r} ⊂ V . If
$\rho(x, y) < \frac{r}{2^N}$ then,
\[ d_N(x_N, y_N) \le 2^N \rho(x, y) < r, \]
whence it follows that y ∈ W . Therefore, $B\big(x; \frac{r}{2^N}\big) \subset W$. We conclude that ρ is a metric
on X that generates τ .

Suppose that each (Xn , dn ) is complete and suppose $\{x^n = (x^n_m : m \in \mathbb{N}) : n \in \mathbb{N}\}$ is
a Cauchy sequence in (X, ρ). Since $d_m \le 2^m \rho$, for each m the sequence $\{x^n_m : n \in \mathbb{N}\}$ is Cauchy in (Xm , dm ). Thus, there
is $x^* = (x^*_m : m \in \mathbb{N})$ such that $d_m(x^n_m, x^*_m) \to 0$ as n → ∞ for each m. The uniform convergence of (2.4)
implies that $\rho(x^n, x^*) \to 0$ as n → ∞.

Conversely, suppose (X, ρ) is complete. Fix m ∈ N. Assume $(x^n_m : n \in \mathbb{N})$ is a Cauchy
sequence in (Xm , dm ). For each k ∈ N \ {m}, choose xk ∈ Xk and define $x^n$ by setting
$x^n(m) = x^n_m$ and $x^n(k) = x_k$ for all k ≠ m. Then, the sequence $(x^n : n \in \mathbb{N})$ is a Cauchy
sequence in (X, ρ), and so it converges to some $x^*$. Since $d_m \le 2^m \rho$, we conclude that
$d_m(x^n_m, x^*_m) \to 0$ as n → ∞.
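The metric (2.4) can be approximated by truncating the series, since the neglected tail is at most $2^{-N}$ after N terms. The following is a minimal sketch under that assumption; points of the product are represented as functions n ↦ n-th coordinate, and all names are hypothetical.

```python
def product_metric(x, y, metrics, terms=50):
    """Truncation of rho(x, y) = sum_{n>=1} min(d_n(x_n, y_n), 1) / 2^n.

    x, y    : functions n -> n-th coordinate (n = 1, 2, ...)
    metrics : function n -> the metric d_n on the n-th factor
    The neglected tail of the series is at most 2^-terms.
    """
    return sum(
        min(metrics(n)(x(n), y(n)), 1.0) / 2 ** n
        for n in range(1, terms + 1)
    )


def euclid(n):
    """Every factor is R with the usual distance (an illustrative choice)."""
    return lambda a, b: abs(a - b)


zero = lambda n: 0.0                              # the point (0, 0, 0, ...)
e3 = lambda n: 1.0 if n == 3 else 0.0             # differs from zero only in coordinate 3
small = lambda n: 0.01                            # the point (0.01, 0.01, ...)
```

A difference in the n-th coordinate alone contributes at most $2^{-n}$, which is why ρ-closeness only constrains finitely many coordinates, in agreement with the product topology.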
Theorem 2.8.14. (Tihonov) Let {Xi : i ∈ I} be an arbitrary collection of compact topological
spaces. The product $X = \prod_{i\in I} X_i$ with the product topology is compact.

Proof. By Alexander’s theorem, it is enough to show that every subbasic cover $\mathcal{U} \subset S$ of
X has a finite subcover. For each i ∈ I, let $\mathcal{U}_i = \{U \in \tau_i : p_i^{-1}(U) \in \mathcal{U}\}$. We claim that
there is i0 ∈ I for which $X_{i_0} = \bigcup \mathcal{U}_{i_0}$. Otherwise, there is x ∈ X such that $p_i(x) = x_i \in X_i \setminus \bigcup \mathcal{U}_i$
for all i ∈ I. Consequently, $x \notin p_i^{-1}(U)$ for all i ∈ I and $U \in \mathcal{U}_i$. Therefore, $\mathcal{U}$ is not a
cover of X, in contradiction to the choice of $\mathcal{U}$. The compactness of $X_{i_0}$ implies that there is
a finite subcover $\{U_1, \ldots, U_n\} \subset \mathcal{U}_{i_0}$ of $X_{i_0}$. Therefore, $\{p_{i_0}^{-1}(U_j) : 1 \le j \le n\} \subset \mathcal{U}$ is a
finite subcover of X.
Example 2.8.15. Consider the unit interval [0, 1] with the subspace topology inherited
from the Euclidean topology in R, and {0, 1} with the discrete topology. The corresponding
product topological spaces will be referred to simply as $[0, 1]^{\mathbb{N}}$ and $\{0, 1\}^{\mathbb{N}}$ respectively.

2.9. Urysohn metrization

In this section we discuss the simplest result on metrization of a topological space. We start
with a simple result to motivate the main ideas.
Theorem 2.9.1. Suppose (X, τ ) is a compact topological space. If there is a sequence {fn }
of real–valued continuous functions separating points of X, then X is metrizable.

Proof. Define
\[ (2.5)\qquad d(x, y) = \sum_{n\ge 0} \frac{|f_n(x) - f_n(y)| \wedge 1}{2^n}, \qquad x, y \in X. \]

Since {fn } separates points in X, d is a metric on X. Since each fn is continuous and the
sum in (2.5) is uniformly convergent on X × X, the metric d : X × X → [0, ∞) is continuous.
Therefore, B(x; r) = {y ∈ X : d(x, y) < r} ∈ τ for all x ∈ X and r > 0. Consequently
τd ⊂ τ . Since τd is Hausdorff and τ is compact, from Lemma 2.4.7 we conclude that τd = τ .
Theorem 2.9.2. (Urysohn metrization theorem) Let (X, τ ) be a Hausdorff topological
space. X is metrizable and separable iff X is regular and second countable. In either
case, X is homeomorphic to a subset of $[0, 1]^{\mathbb{N}}$.

Proof. Necessity: If X is metrizable then it is clearly regular. In addition, if X is separable,
taking balls of rational radii around points in a countable dense set gives a countable basis
for the topology of X.

Sufficiency: Suppose X is regular and second countable. By Theorem 2.1.17 X is separable,
and by Theorem 2.1.19 X is normal. Let $\mathcal{B}$ be a countable basis for the topology.
Then $Q = \{(U, V) \in \mathcal{B} \times \mathcal{B} : \overline{U} \subset V\}$ is a nonempty countable collection, and for each
x ∈ X, there is (U, V ) ∈ Q such that $x \in U \subset \overline{U} \subset V$. Let {(Un , Vn ) : n ∈ N} be an
enumeration of Q. By Urysohn’s separation lemma, for each n ∈ N there is fn ∈ C(X, [0, 1])
such that $f_n(\overline{U_n}) = \{1\}$ and fn (X \ Vn ) = {0}. Hence {fn : n ∈ N} ⊂ C(X, [0, 1]) is a
sequence of functions that separates points of X. Let $F : X \to [0, 1]^{\mathbb{N}}$ be the function
given by x ↦ (fn (x) : n ∈ N). Clearly F is injective, and as pn ◦ F = fn ∈ C(X, [0, 1]),
F is continuous. We consider F (X) with the subspace topology inherited from $[0, 1]^{\mathbb{N}}$. It
remains to show that $F^{-1} : F(X) \to X$ is continuous; since F (X) is metrizable, it suffices to consider sequences. Let (F (xm ) : m ∈ N) be a sequence
that converges to F (x) in $[0, 1]^{\mathbb{N}}$ for some x ∈ X. For any W ∈ τ with x ∈ W , choose
(UN , VN ) ∈ Q such that $x \in U_N \subset \overline{U_N} \subset V_N \subset W$. Then fN (x) = 1, and since
$(p_N \circ F)(x_m) = f_N(x_m) \to f_N(x)$ as m → ∞, it follows that fN (xm ) > 0 for all m large enough,
that is, xm ∈ VN ⊂ W for all m large enough. This shows that xm → x. Therefore F : X → F (X)
is a homeomorphism between X and F (X).

Finally, let d be a metric on $[0, 1]^{\mathbb{N}}$ compatible with the product topology. Then ρ(x, y) :=
d(F (x), F (y)) metrizes (X, τ ).

Corollary 2.9.3. Let (X, ρ) be a separable metric space. Then, there is an equivalent metric
ρ̃ on X and an isometry $h : (X, \tilde\rho) \to [0, 1]^{\mathbb{N}}$. Moreover, the spaces $(U_b(X, \tilde\rho), \|\cdot\|_u)$ and
$(C_b(\overline{h(X)}), \|\cdot\|_u)$ are isometric.

Proof. Let d be a metric on $[0, 1]^{\mathbb{N}}$ compatible with the product topology. Then $([0, 1]^{\mathbb{N}}, d)$ is a
compact Polish space, and by Urysohn’s metrization theorem, (X, ρ) is homeomorphic to
some subset U of $[0, 1]^{\mathbb{N}}$. Let h be such a homeomorphism. Then ρ̃(x, y) := d(h(x), h(y))
defines a metric on X equivalent to ρ, $h : (X, \tilde\rho) \to ([0, 1]^{\mathbb{N}}, d)$ is an isometry, and (X, ρ̃) and
(h(X), d) are isometric.

Let $\tilde{X} = \overline{h(X)}$, where the closure is taken with respect to the product topology on $[0, 1]^{\mathbb{N}}$. Since $\tilde X$ is compact, it
follows that $C_b(\tilde X) \equiv U_b(\tilde X)$, and the map $\Phi : U_b(\tilde X, d) \to U_b(X, \tilde\rho)$ given by $f' \mapsto f' \circ h \in U_b(X, \tilde\rho)$
satisfies $\|f'\|_u = \|\Phi(f')\|_u$. If $f \in U_b(X, \tilde\rho)$, then $f' = f \circ h^{-1} \in U_b(h(X), d)$, and f′
has a unique extension to a function $F' \in U_b(\tilde X, d) = C_b(\tilde X)$ with $\|F'\|_u = \|f'\|_u$. Therefore,
$(U_b(X, \tilde\rho), \|\cdot\|_u)$ and $(C_b(\tilde X), \|\cdot\|_u)$ are isometric.

Corollary 2.9.4. Every Polish space (X, ρ) is homeomorphic to a Gδ subset of $[0, 1]^{\mathbb{N}}$.

Proof. Let d be a metric on $[0, 1]^{\mathbb{N}}$ that metrizes the product topology. Then $([0, 1]^{\mathbb{N}}, d)$ is
a compact Polish space. By Urysohn’s metrization theorem X is homeomorphic to a subset
U of $[0, 1]^{\mathbb{N}}$. Let h be such a homeomorphism. Then $\tilde d(h(x), h(y)) := \rho(x, y)$ is a complete metric that metrizes
U = h(X) as a subspace of $[0, 1]^{\mathbb{N}}$. By Alexandroff’s lemma, U is a Gδ subset of $[0, 1]^{\mathbb{N}}$.

Theorem 2.9.5. The continuous image of a compact metric space into a Hausdorff space
is compact and metrizable.

Proof. Suppose X is a compact metric space, Y is a Hausdorff space and f : X → Y is continuous. Then
f (X) is compact and Hausdorff and f : X → f (X) is a closed function. Thus, we may assume
without loss of generality that f (X) = Y . It is easy to check that Y is regular (in fact it
is even normal). By Urysohn’s metrization theorem, it is enough to show that Y is second
countable. Since X is a compact metric space, it is second countable. Let $\mathcal{B}$ be a countable basis for
X, and $\mathcal{B}^*$ be the collection of all finite unions of elements of $\mathcal{B}$. Clearly $\mathcal{B}^*$ is countable. Let
G be an open set in Y and y ∈ G. Then $f^{-1}(\{y\})$ is a closed (in fact compact) subset of
the open set $f^{-1}(G)$. By compactness, there is $U \in \mathcal{B}^*$ such that $f^{-1}(\{y\}) \subset U \subset f^{-1}(G)$. It follows that
\[ f^{-1}(Y \setminus G) \subset X \setminus U \subset f^{-1}(Y \setminus \{y\}). \]
Since f is surjective, Y \ G ⊂ f (X \ U ) ⊂ Y \ {y}, that is, {y} ⊂ Y \ f (X \ U ) ⊂ G. Since
f is a closed function, each set Y \ f (X \ U ) is open, and we have that $\{Y \setminus f(X \setminus U) : U \in \mathcal{B}^*\}$ is a countable basis for Y .

2.9.0.1. Characterization of I = [0, 1]. A compact connected Hausdorff space is called a
continuum. The unit interval [0, 1] with its usual topology
is a prototypical example of a nice continuum. Theorem 2.9.7 below shows that in compact Hausdorff spaces, the intersection of a
family of continua that is totally ordered by inclusion is also a continuum.
Theorem 2.9.6. The continuous image of the closed unit interval I = [0, 1] into a Hausdorff
space is a compact, connected, locally connected, and metrizable space.

Proof. This is a consequence of Theorems 2.4.6, 2.2.10, 2.9.5, and Corollary 2.2.15.
Theorem 2.9.7. Suppose X is a compact Hausdorff space. If $\mathcal{C} = \{C_\alpha : \alpha \in I\}$ is
a collection of continua contained in X that is totally ordered by inclusion, then
$\bigcap_{\alpha\in I} C_\alpha$ is a nonempty continuum.

Proof. Since X is compact, each Cα is closed, and $\mathcal{C}$ has the finite intersection property, $C = \bigcap_{\alpha\in I} C_\alpha$ is
non–empty and compact. Suppose C = A ∪ B, where A and B are nonempty disjoint closed
sets in C. Then A and B are disjoint compact subsets of X; consequently, there are disjoint
open sets U and V in X such that A ⊂ U and B ⊂ V . It follows that for any α ∈ I,
Cα ∩ U and Cα ∩ V are disjoint nonempty open sets in Cα . Since each Cα is connected,
$K_\alpha := C_\alpha \cap \big(X \setminus (U \cup V)\big) \ne \emptyset$. Clearly {Kα : α ∈ I} is a collection of compact subsets
of X which is totally ordered by inclusion. Hence $\bigcap_{\alpha\in I} K_\alpha = C \cap \big(X \setminus (U \cup V)\big) \ne \emptyset$;
however, C ⊂ U ∪ V and we reach a contradiction. Therefore, C is connected.
Definition 2.9.8. Suppose X is a T1 connected space. A point p ∈ X is called a cut point
of X if X \ {p} = A ∪ B where A and B are nonempty separated sets. All other points
are called noncut points.
Example 2.9.9. Every point in the unit interval [0, 1], with the exception of 0 and 1, is a
cut point. No point in the circle $S^1$ is a cut point.
Lemma 2.9.10. If X is a T1 compact connected space, p ∈ X is a cut point and X \ {p} = A ∪ B
where A and B are separated, then A and B each contain a noncut point. In particular, if
X has more than one element, then it has at least two noncut points.

Proof. Suppose that each point x ∈ A is a cut point and induces the separation X \ {x} =
Ax ∪ Bx with p ∈ Bx . Since X is T1 , both Ax and Bx are open in X. The set B ∪ {p} is
connected by Theorem 2.2.9 and intersects Bx at p; hence
(2.6) B ∪ {p} ⊂ Bx , Ax ∪ {x} ⊂ A
If x, y ∈ A and y ∈ Ax , then x ≠ y and Bx ∪ {x} ⊂ X \ {y} = Ay ∪ By . Since p ∈
(Bx ∪ {x}) ∩ By and Bx ∪ {x} is connected,
(2.7) Bx ∪ {x} ⊂ By , Ay ∪ {y} ⊂ Ax
The collection {Ax ∪ {x} : x ∈ A} is partially ordered by inclusion and by Hausdorff’s
maximal principle, it contains a maximal chain $\mathcal{L}$. Since X is compact and $\mathcal{L}$ is a collection
of closed subsets that has the finite intersection property, $K = \bigcap \mathcal{L}$ is nonempty. If q ∈ K,
then Aq ⊂ A as in (2.6). If r ∈ Aq , then Ar ∪ {r} ⊂ Aq but q ∉ Ar ∪ {r} as in (2.7). This
implies that $A_r \cup \{r\} \notin \mathcal{L}$. On the other hand, if $A_x \cup \{x\} \in \mathcal{L}$ and x ≠ q, then q ∈ Ax in
which case Aq ∪ {q} ⊂ Ax ∪ {x}, that is
Ar ∪ {r} ⊂ Ax ∪ {x}
for any $A_x \cup \{x\} \in \mathcal{L}$. Consequently, $\{A_r \cup \{r\}\} \cup \mathcal{L}$ is a chain that contains $\mathcal{L}$ properly,
contradicting the maximality of $\mathcal{L}$. The contradiction arose from assuming that all points
in A were cut points; therefore, A contains a noncut point. Applying a similar argument
to B shows that B contains a noncut point too.

Suppose X has more than one element. If X has no cut points, then all its elements are
noncut points. If X has a cut point p and X \ {p} = A ∪ B is a separation, then by the
first part of the lemma, each of A and B has a noncut point. Since A and B are disjoint, X has
at least two noncut points.

A continuum X is said to be irreducible about a set A ⊂ X if for any subcontinuum

Y of X, A ⊂ Y implies Y = X. For example, the interval [0, 1] with the usual topology is
irreducible about {0, 1}.
Theorem 2.9.11. Every continuum X is irreducible about its noncut points.

Proof. Let N be the set of noncut points of a continuum X. Suppose there is a proper subcontinuum
K such that N ⊂ K. Let x ∈ X \ K. Then, x is a cut point of X and X \ {x} = A ∪ B
for some nonempty separated sets A and B. Since K ⊂ X \ {x} and K is connected, K must be contained in A
or in B. Without loss of generality, suppose K ⊂ A. Since B ∪ {x} is closed and connected,
B ∪ {x} is also a proper subcontinuum; hence, it has at least two noncut points, one of
which, say y, is different from x. Then (B ∪ {x}) \ {y} and A ∪ {x} are connected and
contain x; thus,
\[ \big((B \cup \{x\}) \setminus \{y\}\big) \cup \big(A \cup \{x\}\big) = X \setminus \{y\} \]
is connected. This means that y is a noncut point of X, but y ∈ B and B ⊂ X \ K ⊂ X \ N ,
which is a contradiction.

Given a connected set X and points a, b ∈ X, we define the set E(a, b) as the set
consisting of a and b, and all the cut points x ∈ X for which there is a separation X \ {x} =
A ∪ B where a ∈ A and b ∈ B. The latter points are said to separate a and b.
Lemma 2.9.12. Suppose (X, τ ) is a T1 connected set and a, b ∈ X. On E(a, b) define
x < y iff either x = a and x ≠ y, or x separates a and y. Then,
(i) (E(a, b), <) is totally ordered.
(ii) The order topology τo on E(a, b) is weaker than the subspace topology on E(a, b)
induced by τ .

Proof. (i) For each point p ∈ E(a, b) \ {a, b}, we will use the notation Ap and Bp for
separated sets such that X \ {p} = Ap ∪ Bp and a ∈ Ap and b ∈ Bp .
If x and y are distinct points of E(a, b) \ {a, b}, then either x ∈ Ay or x ∈ By . If the former,
x ∈ Ay , then By ∪ {y} is a connected subset of X \ {x} = Ax ∪ Bx . Since b ∈ (By ∪ {y}) ∩ Bx ,
By ∪ {y} ⊂ Bx . Then y ∈ Bx and so, x separates a and y, that is, x < y. If the latter,
x ∈ By , then y separates a and x and so, y < x.
Suppose that x, y, z ∈ E(a, b) and x < y and y < z. Then y ∉ {a, b}, and x ∈ Ay . Hence
By ∪ {y} is a connected subset of X \ {x}. Consequently By ∪ {y} ⊂ Bx . A similar argument
shows that Bz ∪ {z} ⊂ By and so, Bz ∪ {z} ⊂ Bx . Hence z ∈ Bx which means that x < z.
Finally, since no point x separates a from itself, (E(a, b), <) is a totally ordered space.

(ii) From part (i), it follows that for y ∈ E(a, b) \ {a, b}

Ay ∩ E(a, b) = {x ∈ E(a, b) : x < y}
By ∩ E(a, b) = {x ∈ E(a, b) : y < x}

Since {y} is closed, both Ay and By are open in X. This is enough to conclude that τo is
weaker than the subspace topology on E(a, b) induced by τ .
Theorem 2.9.13. If (X, τ ) is a continuum with exactly two noncut points, say a and b,
then X = E(a, b) and the subspace topology coincides with the order topology.

Proof. If x ∉ {a, b}, then x is a cut point and there is a separation X \ {x} = U ∪ V . By
Lemma 2.9.10, each of the open sets U and V contains a noncut point. Hence, either a ∈ U
and b ∈ V or a ∈ V and b ∈ U . In either case, x separates a and b, i.e., x ∈ E(a, b). Hence
X = E(a, b). By Lemma 2.9.12, the order topology τo is weaker than τ . Since (X, τ ) is
compact and τo is Hausdorff, τo = τ by Lemma 2.4.7.
Lemma 2.9.14. (Debreu) Suppose (X, τ ) is a topological space, and that X is totally
ordered and that the order topology τo ⊂ τ .
(i) (Open gap lemma) If X is second countable, then there exists a bounded strictly in-
creasing continuous function f : X → R. Furthermore, each connected component
of R \ f (X) is a singleton, a bounded open interval, or an infinite interval.

(ii) If (X, τ ) is connected and has a countable dense set, then X is homeomorphic to an
interval in R. If X has a first element (last element), then the interval is left–closed
(resp. right–closed).

Proof. (Ouwehand) (i) Suppose B is a countable basis for (X, τ ). For each V ∈ B choose
zV ∈ V and set Z ′ = {zV : V ∈ B}. Z ′ is dense in τ and also in τo since τo ⊂ τ .
Consider the collection C of all pairs (a, b) ∈ X × X for which a < b and {z : a < z < b} = ∅.
This collection is countable. To check this, for each (a, b) ∈ C choose Va,b ∈ B such that
a ∈ Va,b ⊂ {x : x < b}. If (a, b) ≠ (a′ , b′ ) then either a < b ≤ a′ < b′ in which case a ∈ Va,b
but a′ ∉ Va,b , or a′ < b′ ≤ a < b in which case a′ ∈ Va′ ,b′ but a ∉ Va′ ,b′ . Hence, there is
an injective map from C into B and so, C is countable. Let Z ′′ = {a, b : (a, b) ∈ C} and
Z = Z ′ ∪ Z ′′ . If X has a minimum, say a, and/or a maximum, say b, we assume that they are
included in Z. The set Z thus constructed is dense in the order topology τo .

Let {zn : n ∈ Z+ } be an enumeration of Z. Define a function f : Z → R as follows.
Set f (z0 ) = 1/2. Once f (z0 ), . . . , f (zn ) have been defined, we define f at zn+1 as follows:
If zj < zn+1 < zk for some 0 ≤ j, k ≤ n,
\[ f(z_{n+1}) = \frac{\max\{f(z_j) : 0 \le j \le n,\ z_j < z_{n+1}\} + \min\{f(z_j) : 0 \le j \le n,\ z_{n+1} < z_j\}}{2}; \]
if zj < zn+1 for all 0 ≤ j ≤ n,
\[ f(z_{n+1}) = \frac{\max\{f(z_j) : 0 \le j \le n\} + 1}{2}; \]
if zn+1 < zj for all 0 ≤ j ≤ n,
\[ f(z_{n+1}) = \frac{\min\{f(z_j) : 0 \le j \le n\}}{2}. \]
Continuing by induction, we obtain a strictly monotone function f : Z → (0, 1) taking
values in the dyadic numbers in (0, 1). It is clear that the function
\[ f(x) = \sup\{f(z) : z \in Z,\ z \le x\} \]
extends f to X, and that f is a strictly monotone bounded function.
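The inductive definition of f on the enumeration of Z can be tried out on a finite totally ordered set. The following is only a sketch under that assumption: the name `dyadic_embedding` and the use of Python numbers as the ordered set are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def dyadic_embedding(enumeration):
    """Assign dyadic values in (0, 1) to distinct, comparable items,
    following the three cases of the inductive definition of f."""
    values = {}
    for z in enumeration:
        below = [v for w, v in values.items() if w < z]
        above = [v for w, v in values.items() if z < w]
        if not values:
            values[z] = Fraction(1, 2)                      # f(z_0) = 1/2
        elif below and above:
            values[z] = (max(below) + min(above)) / 2       # z lies between earlier points
        elif below:
            values[z] = (max(below) + 1) / 2                # z is above all earlier points
        else:
            values[z] = min(above) / 2                      # z is below all earlier points
    return values

values = dyadic_embedding([3, 1, 4, 2, 5])
for z in sorted(values):
    print(z, values[z])
```

On any finite enumeration the resulting map is strictly increasing, which is the property used in the rest of the proof.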

Claim I: If Z = A ∪ B where A, B are nonempty, (a, b) ∈ A × B implies a ≤ b, and either
A has no maximum element or B has no minimum element, then sup_A f = inf_B f . For if
sup_A f < inf_B f , then there is 0 < ε < inf_B f − sup_A f . Let a0 ∈ A and b0 ∈ B be such that
sup_A f − ε < f (a0 ) ≤ sup_A f
inf_B f ≤ f (b0 ) < inf_B f + ε
There are n, m ∈ Z+ such that a0 = zn and b0 = zm . If A has no maximum, {z ∈ A : zn < z}
has infinitely many elements. Similarly, if B has no minimum, {z ∈ B : z < zm } has
infinitely many elements. In either case, there are infinitely many k > n, m such that
a0 = zn < zk < zm = b0 ; let k be the smallest of them. Since f (zk ) = (f (a0 ) + f (b0 ))/2, we have
that
\[ \sup_A f < \frac{\sup_A f - \varepsilon + \inf_B f}{2} < f(z_k) < \frac{\sup_A f + \inf_B f + \varepsilon}{2} < \inf_B f \]

This is a contradiction for if zk ∈ A, f (zk ) ≤ sup_A f ; and if zk ∈ B, inf_B f ≤ f (zk ).

Claim II: f = g where
\[ g(x) = \inf\{f(z) : z \in Z,\ x \le z\} \]

Suppose f (x) < g(x) for some x ∈ X. Define A = {z ∈ Z : z ≤ x} and B = {z ∈ Z : x ≤ z}.
Then, Z = A ∪ B and a ≤ b for all a ∈ A, b ∈ B. By Claim I, A must have a maximum,
say a0 , and B must have a minimum, say b0 . Since f (x) < g(x) by assumption, a0 < b0
and (a0 , b0 ) ∈ C. Since a0 ≤ x ≤ b0 , x ∈ {a0 , b0 } ⊂ Z and by definition
sup{f (z) : z ∈ Z, z ≤ x} = f (x) = inf{f (z) : z ∈ Z, x ≤ z} = g(x), which is a contradiction.

To prove that f is continuous, it suffices to show that f −1 (−∞, p) and f −1 (p, ∞) are open
in (X, τo ) for any p ∈ R. Fix p ∈ R and set A = f −1 (−∞, p) and B = f −1 (p, ∞). If p = f (x)
for some x ∈ X, then A = {u : u < x} and B = {u : x < u} since f is strictly increasing.
Suppose p ∉ f (X). If A is empty, B = X. Similarly, if B is empty, A = X. If A and B are
not empty, then A′ = A ∩ Z and B ′ = B ∩ Z are nonempty since f = g, Z = A′ ∪ B ′ , and
a < b whenever (a, b) ∈ A′ × B ′ . In view of Claim I, it must be the case that either A′ has
a maximum, say a0 , and B ′ has a minimum, say b0 , or A has no maximum and B has no
minimum. In the former case, A = {u : u < b0 } and B = {u : a0 < u} since p ∉ f (X); in
the latter case, sup_A f = sup_A′ f = p = inf_B′ f = inf_B f and so, A = ⋃_{x∈A} {u : u < x} and
B = ⋃_{y∈B} {u : y < u}.

To conclude the proof of (i), it is enough to show that any bounded component G of
R \ f (X) is an open interval. Let ℓ = inf G and u = sup G and assume ℓ < u. Define
C = {z ∈ Z : f (z) ≤ ℓ} and D = {z ∈ Z : u ≤ f (z)}. Clearly C and D are nonempty,
Z = C ∪ D, and for any (c, d) ∈ C × D, c ≤ d. Since sup_C f ≤ ℓ < u ≤ inf_D f , by Claim
I we have that C has a maximum element c0 and D has a minimum element d0 . Hence
f (c0 ) = ℓ < u = f (d0 ). Consequently ℓ, u ∈ f (X) which means that G = (ℓ, u).

(ii) If X is connected, then the collection of predecessor–successor pairs is empty. Constructing
f as in part (i) we obtain a strictly increasing continuous function f with values in [0, 1];
hence f (X) is connected in R which means that f (X) is a subinterval of [0, 1]. Since f
is injective, f {u : u < x} = f (X) ∩ (−∞, f (x)) and f {u : x < u} = f (X) ∩ (f (x), ∞)
are open intervals in f (X); hence f : X → f (X) is an open map and so, X and f (X) are
homeomorphic.

The next result characterizes all metric continua with exactly two noncut points.

Theorem 2.9.15. If X is a metric continuum with just two noncut points, then X is
homeomorphic to the unit interval I = [0, 1] with the usual topology.

Proof. Suppose a and b are the only two noncut points. Then X = E(a, b), and the order
topology on E(a, b) coincides with the metric topology by Theorem 2.9.13. With respect to
the order, a is the minimum element and b is the maximum element of X. As X is a compact
metric space, it has a countable base. The conclusion follows from Debreu’s lemma.

Corollary 2.9.16. If X is a compact, connected, locally connected metric space, then X is

arcwise connected.

Proof. Suppose a and b are distinct points in X. By Lemma 2.2.19, there exists a simple
chain C1 = {U1,0 , . . . , U1,n } from a to b made of open connected sets of diameter < 1. Let
V be the collection of all open connected sets V of X that have diameter < 1/2 and whose
closure V̄ is contained in one of the links in C1 . For each j = 1, . . . , n choose
xj ∈ U1,j−1 ∩ U1,j , and set x0 = a and xn+1 = b. Another application of Lemma 2.2.19 shows
that for each j = 0, . . . , n, there is a chain C(xj , xj+1 ) from xj to xj+1 of sets in V that
are fully contained in U1,j . We will construct a simple chain from a to b as follows. From
the simple chain C(a, x1 ) from a to x1 , let V ′ be the first link that intersects some link
in the chain C(x1 , x2 ). Set C0′ to be the chain obtained by removing from C(a, x1 ) all the
links succeeding V ′ . Let V ′′ be the last link in C(x1 , x2 ) that intersects V ′ and set C1′
to be the chain obtained by removing from C(x1 , x2 ) the links that precede V ′′ . Clearly
C0′ ∪ C1′ is a simple chain from a to x2 . Repeating this construction at all remaining
intersections results in a simple chain C2 = {U2,0 , . . . , U2,n2 } from a to b such that
diam(U2,j ) < 1/2, and for each j, Ū2,j ⊂ U1,k for some k. Continuing this process, we obtain
for each n ∈ N, a simple chain Cn from a to b with open connected links of diameter < 2^{−n}
and whose closures are contained in links of the chain Cn−1 . Let Kn be the union of the
closures of the links in Cn . Clearly {Kn : n ∈ N} is a decreasing sequence of subcontinua
of X. Hence C = ⋂n Kn is a nonempty subcontinuum in X and contains {a, b}.
Claim: C has only two noncut points, a and b. If x ∈ C \ {a, b}, for any n ∈ N, at most
two links in the simple chain Cn contain x. Let An be the union of the links preceding
them, and let Bn be the union of all the links succeeding them. It follows that
\[ C \setminus \{x\} = \Big(\bigcup_{n=1}^{\infty} A_n \cap C\Big) \cup \Big(\bigcup_{n=1}^{\infty} B_n \cap C\Big), \]
where A = ⋃n (An ∩ C) and B = ⋃n (Bn ∩ C) are nonempty disjoint open sets in C, that is,
x is a cut point. Hence, by Lemma 2.9.10, C has exactly two noncut points, namely a and b.
It follows that C is homeomorphic to the unit interval [0, 1] and so, C is an arc from a to b.

2.10. Arzelà–Ascoli theorem

A family F of functions from a topological space (X, τ ) to a metric space (Y, d) is said to
be equicontinuous at a point x ∈ X if for any ε > 0, there is an open neighborhood
U containing x such that sup{d(f (x), f (y)) : y ∈ U, f ∈ F} < ε. F is equicontinuous if it is
equicontinuous at every point in X.
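A standard family that fails this definition is {x ↦ x^n : n ≥ 1} on [0, 1], which is not equicontinuous at x = 1: for every δ > 0 the quantity |1 − (1 − δ)^n| approaches 1 as n grows. The sketch below, with the illustrative helper name `oscillation_at_one`, evaluates that quantity numerically.

```python
def oscillation_at_one(delta, max_power):
    """Largest value of |f_n(1) - f_n(1 - delta)| over n = 1..max_power,
    where f_n(x) = x**n."""
    y = 1.0 - delta
    return max(abs(1.0 - y ** n) for n in range(1, max_power + 1))

# However small the neighborhood of 1, the family as a whole oscillates by almost 1.
for delta in (0.1, 0.01, 0.001):
    print(delta, oscillation_at_one(delta, 10_000))
```

A single member of the family (max_power = 1) oscillates only by δ; the failure comes from taking the supremum over the whole family.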

Lemma 2.10.1. Let (X, d) be a metric space. Suppose that for any ε > 0, there exist some
δ > 0, some metric space (W, ρ) and a map Φ : X → W such that Φ(X) is totally bounded,
and d(x, y) < ε whenever ρ(Φ(x), Φ(y)) < δ. Then, X is totally bounded.

Proof. Given ε > 0, choose δ > 0, W and Φ as in the statement of the Lemma. Then, there
exists a finite collection {V1 , . . . , Vn } of balls of diameter less than δ covering Φ(X).
Consequently {Φ−1 (V1 ), . . . , Φ−1 (Vn )} covers X, and diam(Φ−1 (Vj )) ≤ ε for each
j = 1, . . . , n. This shows that X is indeed totally bounded.
Theorem 2.10.2. (Arzelà–Ascoli) Let (X, τ ) be a compact topological space and let (S, d)
be a complete metric space. F ⊂ C(X, S) is relatively compact iff F is equicontinuous and
{f (x) : f ∈ F} is relatively compact in S for each x ∈ X.

Proof. If F is relatively compact in C(X, S) then it is totally bounded, and so for any ε > 0
there are fj ∈ C(X, S), j = 1, . . . , n such that F ⊂ ⋃_{j=1}^n B(fj ; ε/3). For fixed x ∈ X,
\[ F_x := \{f(x) : f \in F\} \subset \bigcup_{j=1}^{n} B\big(f_j(x); \tfrac{\varepsilon}{3}\big) \tag{2.8} \]
which means that Fx is totally bounded in S. Since S is complete, Fx is relatively compact
in S.
There exists an open neighborhood U of x such that max_{1≤j≤n} d(fj (x), fj (y)) < ε/3 whenever
y ∈ U . By the choice of f1 , . . . , fn , for any f ∈ F there is fj with
sup_{z∈X} d(f (z), fj (z)) < ε/3. Hence

d(f (x), f (y)) ≤ d(f (x), fj (x)) + d(fj (x), fj (y)) + d(fj (y), f (y)) < ε, y ∈ U.
Therefore, F is equicontinuous at any point x ∈ X.

Conversely, suppose F ⊂ C(X, S) is equicontinuous and that Fx = {f (x) : f ∈ F} is totally
bounded in S for each x ∈ X. Fix ε > 0. The compactness of X implies that there exist a
finite set of points {xℓ : ℓ = 1, . . . , n} ⊂ X and open neighborhoods Uℓ with xℓ ∈ Uℓ such
that X = ⋃_{ℓ=1}^n Uℓ , and for all f ∈ F and x ∈ Uℓ , d(f (x), f (xℓ )) < ε/3. As each Fxℓ
(ℓ = 1, . . . , n) is totally bounded and S is complete, its closure F̄xℓ is compact in (S, d);
hence W := ∏_{ℓ=1}^n F̄xℓ with metric ρ(w, z) = max_{1≤ℓ≤n} d(wℓ , zℓ ) is
a compact metric space. Define the map Φ : F → W by
Φ(f ) = (f (x1 ), . . . , f (xn ))
It follows that Φ(F) is relatively compact in W and hence, totally bounded. Suppose
f, f ′ ∈ F satisfy ρ(Φ(f ), Φ(f ′ )) < ε/3. For any x ∈ X let ℓ be such that x ∈ Uℓ . Then
d(f (x), f ′ (x)) ≤ d(f (x), f (xℓ )) + d(f (xℓ ), f ′ (xℓ )) + d(f ′ (xℓ ), f ′ (x)) < ε
This shows that sup_{x∈X} d(f (x), f ′ (x)) < ε. The conditions of Lemma 2.10.1 hold; therefore,
F is totally bounded and, since C(X, S) is complete, F is a relatively compact subset of C(X, S).
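The role of the sampling map Φ can be illustrated numerically. The sketch below assumes X = [0, 1], a family of 1-Lipschitz functions sin(x + c), and a uniform net of n + 1 points; the helper names and the fine grid standing in for X are illustrative assumptions. Every point of [0, 1] is within 1/(2n) of a net point, so the triangle inequality in the proof gives the bound sup |f − g| ≤ ρ(Φ(f), Φ(g)) + M/n.

```python
import math

def sup_distance(f, g, points):
    """Largest value of |f(x) - g(x)| over the given points."""
    return max(abs(f(x) - g(x)) for x in points)

n = 20                                    # number of net intervals (illustrative)
net = [l / n for l in range(n + 1)]       # the points x_1, ..., x_n of the proof
fine = [i / 1000 for i in range(1001)]    # fine grid standing in for all of X
M = 1.0                                   # common Lipschitz constant of the family

def member(c):
    """One element of the equicontinuous family: x -> sin(x + c)."""
    return lambda x: math.sin(x + c)

f, g = member(0.3), member(0.35)
rho = sup_distance(f, g, net)             # distance between Phi(f) and Phi(g)
uniform = sup_distance(f, g, fine)        # approximate uniform distance
bound = rho + M / n
print(rho, uniform, bound)
```

Closeness of finitely many sampled values therefore controls the uniform distance, which is what allows Lemma 2.10.1 to be applied.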
Corollary 2.10.3. Suppose (X, τ ) is a compact space and F ⊂ C(X, Rd ).
(i) F is totally bounded iff F is bounded and equicontinuous.

(ii) F is totally bounded iff F is equicontinuous and {f (x) : f ∈ F} is bounded for

every x ∈ X.

Proof. (i) If F is totally bounded, then it is relatively compact in C(X, Rd ) and
F ⊂ ⋃_{j=1}^N B(fj ; 1) for some fj ∈ C(X, Rd ). Therefore, ‖f ‖u ≤ 1 + max_{1≤j≤N} ‖fj ‖u =: M
for all f ∈ F, and thus, F is bounded. Conversely, if F is bounded, then Fx = {f (x) : f ∈ F}
is relatively compact in Rd for each x ∈ X. The conclusion follows from Theorem 2.10.2.

(ii) follows directly from Theorem 2.10.2 by noticing that a set in Rd is relatively compact
iff it is bounded.
Example 2.10.4. Let (X, d) be a compact metric space. Suppose F is a bounded collection
of functions f on X such that |f (x) − f (y)| ≤ M d(x, y) for all x, y ∈ X and f ∈ F. Then,
F is totally bounded in C(X, R). In particular, the collection of all Lipschitz functions such
that ‖f ‖u + ‖f ‖L ≤ M , where
\[ \|f\|_L = \sup_{x \ne y} \frac{|f(x) - f(y)|}{d(x, y)}, \]
is compact in C(X, R).

2.11. Locally compact Hausdorff spaces

Definition 2.11.1. A Hausdorff topological space (X, τ ) is locally compact (l.c.H.) if each
point x has an open neighborhood U such that the closure Ū is compact.
Lemma 2.11.2. Suppose that X is l.c.H. If U is open in X and x ∈ U , there exists an open
set V with compact closure such that x ∈ V ⊂ V̄ ⊂ U .

Proof. Let W be an open neighborhood of x with compact closure. Since W ∩ U also has
compact closure and contains x, we can assume without loss of generality that W ⊂ U .
If W = W̄ , there is nothing else to prove; otherwise, {x} and ∂W = W̄ \ W are disjoint
nonempty compact sets. For any y ∈ ∂W , there are disjoint open sets Vy and Hy such
that x ∈ Vy and y ∈ Hy . By compactness, there are finitely many Hy1 , . . . , Hyn such that
∂W ⊂ ⋃_{j=1}^n Hyj =: H. Define V := W ∩ ⋂_{j=1}^n Vyj . Clearly x ∈ V , V ∩ H = ∅, V̄ ⊂ W̄ , and
V̄ ⊂ X \ H. Hence, V̄ is compact and

x ∈ V ⊂ V̄ ⊂ W̄ ∩ (X \ H) ⊂ W̄ ∩ (W ∪ (X \ W̄ )) = W ⊂ U
Lemma 2.11.3. If (X, τ ) is a l.c.H. space then any basis B has a subset C whose elements
have compact closures, and which is itself a basis.

Proof. Let B be a basis for the topology. Let C be the collection of all sets in B
with compact closures. We prove now that C ≠ ∅ and C is a basis. Suppose U is an open
set and let x ∈ U . For some B ∈ B, x ∈ B ⊂ U . Let V be an open set with compact closure
such that x ∈ V ⊂ V̄ ⊂ B. Then, for some B ′ ∈ B, x ∈ B ′ ⊂ V and B̄ ′ is compact.

The support of a function f : X → C, denoted by supp(f ), is the closure of the set
{f ≠ 0}. Given two (complex or real) functions f, g, we use the notation f ≺ g to say that
f ≤ g and supp(f ) ⊂ {g ≠ 0}. We will denote by C00 (X) the space of (complex or real)
continuous functions on X whose support is a compact subset of X.

Theorem 2.11.4. Let X be a l.c.H. topological space. Suppose that A ⊂ U ⊂ X with A
compact and U open. There exists an open set V with compact closure such that
A ⊂ V ⊂ V̄ ⊂ U .

Proof. Each point of A has an open neighborhood with compact closure contained in U .
By compactness, A can be covered with a finite collection of such neighborhoods. The union
V of the open sets in such a finite collection is the required set.

Theorem 2.11.5. (Urysohn’s Lemma) Let X be a l.c.H. space, let A ⊂ X be compact and
U ⊂ X be open with A ⊂ U . There exists f ∈ C00 (X) such that A ≺ f ≺ U .

Proof. The proof is just as the one for Lemma 2.1.20 with a few slight modifications that
we indicate in what follows. As before, let D0 = {0, 1}, Dn = {a/2^n : 0 < a < 2^n , a ≡ 1
mod 2} (n ≥ 1), and D = ⋃n Dn the set of dyadic rational numbers. Define a chain {Ut }t∈D
of subsets of X progressively by first setting U1 = A and U0 = U . For n = 1, choose an
open set U_{1/2} with compact closure such that U1 ⊂ U_{1/2} ⊂ Ū_{1/2} ⊂ U0 . Suppose open sets Ut ,
with t ∈ ⋃_{k=0}^{n−1} Dk and n ≥ 2 have been defined in such a way that Ūt ⊂ Us whenever s < t.
For u = a/2^n ∈ Dn , s = (a − 1)/2^n and t = (a + 1)/2^n belong to ⋃_{k=0}^{n−1} Dk , and so sets Us
and Ut are already defined. Choose an open set Uu with compact closure such that
Ūt ⊂ Uu ⊂ Ūu ⊂ Us . This procedure defines a chain {Ut }t∈D of open sets satisfying
Ūt ⊂ Us whenever s, t ∈ D and s < t.

Define f (x) = 0 for x ∈ X \ U and f (x) = sup{t ∈ D : x ∈ Ut } elsewhere. That f satisfies
the desired properties follows just as in the proof of Lemma 2.1.20.

Lemma 2.11.6. Let X be a l.c.H space with a countable basis. There exists a countable
cover {Kn : n ∈ N} of X by compact sets such that Kn ⊂ Int(Kn+1 ).

Proof. Let B = {Bn : n ∈ N} be a countable basis of relatively compact sets. Set K1 = B̄1 .
Proceeding by induction, suppose we have defined compact sets K1 , . . . , Kn . Let mn be the
smallest positive integer such that
Kn ⊂ B1 ∪ . . . ∪ Bmn
Define
Kn+1 := B̄1 ∪ . . . ∪ B̄mn ∪ B̄mn +1
Clearly {Kn : n ∈ N} satisfies the conditions in the present Lemma.

Theorem 2.11.7. (One point compactification) Suppose (X, τ ) is a l.c.H space, and let
X̂ = X ∪ {∆} where ∆ is a point not in X. Define τ̂ as the collection of arbitrary unions of
sets in τ ∪ {X̂ \ K : K compact in (X, τ )}. Then (X̂, τ̂ ) is a Hausdorff compact space. If
(X, τ ) is not compact, then X is an open dense set in X̂. If (X, τ ) is compact, then X is
an open and closed compact subset of (X̂, τ̂ ).

Proof. If U ∈ τ and K ⊂ X is compact, then V = U ∩ (X̂ \ K) = U \ K. As X is Hausdorff,
K is closed in (X, τ ); hence, V ∈ τ ⊂ τ̂ . This shows that τ̂ is indeed a topology on X̂.

Suppose U = {Uα : α ∈ A } ⊂ τ̂ covers X̂. Then at least one member, say Uα0 , contains ∆
and hence contains a set of the form X̂ \ K, where K is compact in (X, τ ). Moreover,
{Uα ∩ X : α ∈ A } is an open cover (in τ ) of K. Hence there exists a finite collection of
sets Uα1 , . . . , Uαn in U such that K ⊂ ⋃_{j=1}^n X ∩ Uαj . It follows that X̂ = ⋃_{j=0}^n Uαj ,
whence we conclude that X̂ is compact. Clearly X is an open subspace of (X̂, τ̂ ) since X ∈ τ .
If x ∈ X then there are open neighborhoods V ⊂ V̄ ⊂ U of x such that V̄ and Ū are compact
subsets in (X, τ ). Then V and X̂ \ Ū are disjoint neighborhoods of x and ∆ showing that X̂
is Hausdorff.

For any set K that is compact in (X, τ ), X ∩ (X̂ \ K) = X \ K. If (X, τ ) is not compact,
then X \ K ≠ ∅; hence X is dense in (X̂, τ̂ ). If (X, τ ) is compact, then it is also compact,
and hence closed, in (X̂, τ̂ ).
Lemma 2.11.8. (Partition of unity) Let X be a l.c.H. space. For any open cover G1 , . . . , Gn
of a compact set K ⊂ X, there are functions f1 , . . . , fn ∈ C00 (X) such that 0 ≤ fj ≺ Gj ,
and f1 (x) + · · · + fn (x) = 1 for all x ∈ K.

Proof. Every x ∈ K has a neighborhood Vx whose closure is compact and contained in some
Gj . By compactness, K is covered by a finite collection V1 , . . . , Vk of such neighborhoods.
For each j = 1, . . . , n, let Hj be the union of those V̄ℓ that lie in Gj . Then, there are
functions gj ∈ C00 (X) such that Hj ≺ gj ≺ Gj . Define h1 = g1 and
hj = (1 − g1 ) · . . . · (1 − gj−1 )gj for 2 ≤ j ≤ n. Then 0 ≤ hj ≺ Gj . It is easy to verify by
induction that
\[ h = \sum_{j=1}^{n} h_j = 1 - (1 - g_1) \cdots (1 - g_n) \]

If x ∈ K then x ∈ Hj for some j, so that gj (x) = 1; hence, h = 1 on K and the functions
fj := hj have the required properties.
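The product identity used in this proof is purely algebraic and can be checked at a single point x by treating the values g_1(x), . . . , g_n(x) as numbers in [0, 1]. The sketch below uses exact rational arithmetic; the function name `partition_terms` and the sample values are illustrative.

```python
from fractions import Fraction
from math import prod

def partition_terms(g):
    """Return h_1 = g_1 and h_j = (1 - g_1)...(1 - g_{j-1}) g_j for a list of values g."""
    terms = []
    remaining = Fraction(1)          # running product (1 - g_1)...(1 - g_{j-1})
    for value in g:
        terms.append(remaining * value)
        remaining *= 1 - value
    return terms

g = [Fraction(1, 3), Fraction(0), Fraction(3, 4), Fraction(1, 2)]
h = partition_terms(g)
print(sum(h), 1 - prod(1 - v for v in g))   # both sides of the identity
```

As soon as one of the values equals 1, the product vanishes and the terms sum to exactly 1, which is the situation at points of K.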

Theorem 2.11.9. Suppose that (X, τ ) is a l.c.H. space with countable base. Then there
is a metric d on X that generates the topology τ , under which (X, d) is a complete and
separable metric space. Moreover, C00 (X) is separable in the uniform norm.

Proof. Since X has a countable base and is Hausdorff, its one point compactification
X̂ = X ∪ {∆}, of which X is an open subset, is also separable and Hausdorff. Indeed, there
exists a countable basis B = {Un ⊂ X} for X such that each Ūn is compact in X. The family
of finite intersections of sets of the form X̂ \ Ū , U ∈ B, is a basis of open neighborhoods of
∆.

We will show that the space X̂ is metrizable by embedding it homeomorphically into the
cube [0, 1]^N . Let C be the family of pairs (V, U ) with V, U ∈ B such that V̄ ⊂ U . By
Urysohn’s lemma, for each (V, U ) ∈ C there is fV,U ∈ C00 (X) such that V̄ ≺ fV,U ≺ U . As X is
Hausdorff and f (∆) = 0 for all f ∈ C00 (X), the collection F = {fV,U : (V, U ) ∈ C} separates
points in X̂. Let {(Vn , Un ) : n ∈ N} be an enumeration of C. The map e : x ↦ (fVn ,Un (x))
embeds continuously X̂ into the cube [0, 1]^N . As F separates points of X̂, the map e is
injective. The compactness of X̂ implies that e is a homeomorphism between X̂ and
e(X̂) ⊂ [0, 1]^N . Hence, X is homeomorphic to the set e(X) ⊂ [0, 1]^N , which is open in e(X̂).

To show that C00 (X) is separable in the uniform norm, observe that the collection
E ⊂ C00 (X) of polynomials in F with rational coefficients is a ring that separates points in X̂.
Since C00 (X) ⊂ {f ∈ C(X̂) : f (∆) = 0}, the Stone–Weierstrass theorem 5.3.10 implies that
C00 (X) ⊂ Ē^u = {f ∈ C(X̂) : f (∆) = 0} = C0 (X),
where Ē^u denotes the closure of E in the uniform norm.
Corollary 2.11.10. If X is a compact metric space, then X is separable and C(X) is
separable in the topology of uniform convergence.
Proof. For any n ∈ N, there is a finite set Fn ⊂ X such that X = ⋃_{x∈Fn} B(x; 1/n). The
set F = ⋃n Fn is a countable dense set in X and thus, X is separable.

The second assertion is a direct consequence of Theorem 2.11.9.

2.12. Exercises
Exercise 2.12.1. Suppose {(Xα , τα ) : α ∈ A } is a family of topological spaces. For each
α ∈ A , Xα′ := Xα × {α} is considered an exact copy of Xα in the sense that U × {α} is
declared open in Xα′ iff U ∈ τα . Define the disjoint union X = ⨆α Xα := ⋃α Xα′ . Let τ be
the collection of all U ⊂ X such that U ∩ Xα′ is open in Xα′ for every α ∈ A . Show that τ is
a topology on X, that Xα′ is an open and closed subset of X and that
{U ∩ Xα′ : U ∈ τ } = {V × {α} : V ∈ τα } for all α ∈ A .
Exercise 2.12.2. In any topological space (X, τ ) show that

∂(A ∪ B) ⊂ ∂A ∪ ∂B

∂(A ∩ B) ⊂ ∂A ∪ ∂B
for all subsets A, B of X. (Hint: \overline{A ∪ B} ⊂ Ā ∪ B̄ and A° ∪ B° ⊂ (A ∪ B)°)
Exercise 2.12.3. Let (X, τ ) be a topological space. Suppose there exists a family {(Yi , τi ) :
i ∈ I} of Hausdorff spaces and a collection F = {fi : X −→ Yi }i∈I of continuous functions
which separates points in X; i.e., for any x1 , x2 ∈ X, x1 ≠ x2 , there is f ∈ F such that
f (x1 ) ≠ f (x2 ). Show that X is Hausdorff.
Exercise 2.12.4. Let X and Y be topological spaces and suppose that f : X → Y . Show
that the following statements are equivalent.
(a) f is continuous.

(b) f −1 (F ) is closed in X whenever F is closed in Y .
(c) For any x ∈ X and U ∈ Uf (x) , there is V ∈ Vx such that f (V ) ⊂ U .
(d) For any A ⊂ X, f (Ā) ⊂ \overline{f (A)}.
(e) \overline{f −1 (B)} ⊂ f −1 (B̄) for any B ⊂ Y .
(f) f −1 (B°) ⊂ (f −1 (B))° for any B ⊂ Y .
Exercise 2.12.5. Let (T, τ ) be a topological space and let Y ⊂ T be endowed with the
subspace topology τY . Show that G ⊂ Y is closed relative to τY iff G = Y ∩ F where F is
closed in τ . Show that the closure of a set B ⊂ Y in Y , denoted by B̄^Y , is given by B̄ ∩ Y .
Exercise 2.12.6. If each (Yα , τα ) is a Hausdorff topological space and the family of func-
tions F = {fα : X −→ Yα } separates points, show that the topology induced by F on X is
Hausdorff.
Exercise 2.12.7. Let (X, d) be a metric space. Show that the metrics ρ and d1 in
Example 2.5.4 are indeed equivalent. Furthermore, show that if d is a complete metric, then so
are ρ and d1 .
Exercise 2.12.8. Suppose f : (X, d) → (Y, ρ) is a continuous function between metric
spaces. Show that there is an equivalent metric d̂ on X such that f : (X, d̂) → (Y, ρ)
is uniformly continuous. Furthermore, if d is complete, show that d̂ can be chosen to be
complete. (Hint: consider d̂(x, y) := d(x, y) + ρ(f (x), f (y)).)
Exercise 2.12.9. Suppose X is Hausdorff and Y is second countable Hausdorff. A function
f : X → Y is proper if f −1 (K) is compact in X whenever K is compact in Y . Show that
an injective proper map f maps open sets in X to open sets in f (X). (Hint: Suppose W
is open in X and f (W ) is not open in f (X). There is a convergent sequence {yn : n ∈ N} ⊂
f (X) \ f (W ) such that y = limn yn ∈ f (W ). {yn , y : n ∈ N} is compact in Y .)
Exercise 2.12.10. If X is locally connected, show that every connected component of X
is clopen (closed and open in X). If F is a proper closed subset of X, show that each
connected component of Y = X \ F is open in Y (and so in X).
Exercise 2.12.11. (X, τ ) is locally path–connected if τ admits a basis of path–connected
sets. If X is locally path connected, show that every connected component of X is path
connected.
Exercise 2.12.12. (Riemann sphere) Show that the sphere S2 = {(x1 , x2 , x3 ) ∈ R3 :
x1² + x2² + x3² = 1} with the induced Euclidean topology is homeomorphic to the one point
compactification R2 ∪ {∞}. (Hint: given the north pole (point e3 = (0, 0, 1)) and any point
P ∈ R2 × {0}, the line from e3 to P intersects S2 at one and only one point Q ≠ e3 .)
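The map described in the hint is the stereographic projection. As a sketch only (the function names are illustrative, and this does not replace the continuity argument asked for in the exercise), the point Q and the inverse correspondence can be written explicitly: the line e3 + t(P − e3) with P = (u, v, 0) meets the sphere again at t = 2/(u² + v² + 1).

```python
def to_sphere(u, v):
    """Point Q != e3 where the line from e3 = (0, 0, 1) through (u, v, 0) meets S^2."""
    s = u * u + v * v
    return (2 * u / (s + 1), 2 * v / (s + 1), (s - 1) / (s + 1))

def to_plane(x1, x2, x3):
    """Inverse correspondence, defined for points of S^2 other than e3."""
    return (x1 / (1 - x3), x2 / (1 - x3))

Q = to_sphere(3.0, -4.0)
print(Q, sum(c * c for c in Q))   # Q and its squared norm
print(to_plane(*Q))               # recovers (3.0, -4.0) up to rounding
```

Points far from the origin are sent close to e3, which is the point that plays the role of ∞ in the compactification.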
Exercise 2.12.13. It is a well known result in Analysis that (Fn , ‖ · ‖2 ) is complete. In this
Exercise we outline a proof of this fact. It is enough to consider the case F = R. Suppose
A := {xm : m ∈ N} ⊂ Rn is a Cauchy sequence.

(a) Show that there is a box I := [a1 , b1 ] × . . . × [an , bn ], where −∞ < aj < bj < ∞
for each j = 1, . . . , n, such that A ⊂ I. Based on the previous statement, by
considering the sequence of numbers on each component of the vectors xm , it will
be enough to consider the case n = 1.
(b) Denote by ℓ(I) the length of the interval I. Divide I into two subintervals of the same
length. Choose one subinterval I1 that contains an infinite number of elements
of the sequence A to obtain a subsequence A1 ⊂ A ∩ I1 . Arguing by induction,
obtain a sequence of nested subintervals Ik+1 ⊂ Ik ⊂ I with ℓ(Ik ) = 2^{−k} ℓ(I), and
subsequences Ak+1 ⊂ Ak ⊂ A such that Ak+1 ⊂ Ik+1 ∩ Ak .
(c) Let αk and βk be the left and right end points of Ik . Show that αk ≤ αk+1 ≤
βk+1 ≤ βk for all k, and conclude that limk αk = limk βk =: x∗ .
(d) Construct a subsequence {xmk : k ∈ N} ⊂ A so that limk xmk = x∗ .
Exercise 2.12.14. Let (X, d) be a metric space and let (xn : n ∈ Z+ ) be a sequence in X.
For each n ∈ Z+ define An = {xm : m ≥ n}. Show that
(a) (xn ) is Cauchy if and only if for any ε > 0 there exists an integer N > 0 such that
d(xn , xN ) < ε whenever n ≥ N .
(b) (xn ) is Cauchy iff limn→∞ diam(An ) = 0.
Exercise 2.12.15. Suppose {(Xα , dα ) : α ∈ A } is a pairwise–disjoint family of metric
spaces. Show that
\[ \rho(x, y) = \begin{cases} d_\alpha(x, y) \wedge 1 & \text{if } (x, y) \in X_\alpha \times X_\alpha \\ 2 & \text{if } (x, y) \in X_\alpha \times X_\beta,\ \alpha \ne \beta \end{cases} \tag{2.9} \]
is a metric compatible with the disjoint union topology on ⨆α Xα .
Exercise 2.12.16. Suppose (X, d) and (Y, ρ) are metric spaces, and that d is complete. If
E is closed in X, f : E −→ Y is continuous, and
ρ(f (x), f (x′ )) ≥ d(x, x′ )
for all x and x′ in E, show that f (E) is closed.
Exercise 2.12.17. Show that if (X, d) and f are as in Capaccioli’s theorem, then there is
N ≥ 1 such that f N is a contraction.
Exercise 2.12.18. If A ⊂ B, show that A^ε ⊂ B^ε and A_ε ⊂ B_ε . Show that A^ε = ⋃_{a∈A} {a}^ε
and ⋃_{a∈A} {a}_ε ⊂ A_ε , where {a}^ε and {a}_ε are the open and closed balls of radius ε centered
at a respectively. In addition, if A is compact, then ⋃_{a∈A} {a}_ε = A_ε . Show that
⋂_{ε>0} A^ε = Ā = ⋂_{ε>0} A_ε .
Exercise 2.12.19. Let B(X) denote the set of all real valued bounded functions on a set
X. For f ∈ B(X) define its uniform norm by ‖f ‖u = sup_{x∈X} |f (x)|. Show that ‖ · ‖u is
a norm on B(X). If (X, τ ) is a topological space and Cb (X) is the space of real bounded
continuous functions on X, show that (Cb (X), ‖ · ‖u ) is a complete metric space.

Exercise 2.12.20. Let X be a compact set. Show that if F ⊂ X is closed, then F is

compact. In addition, if X is Hausdorff, show that F ⊂ X is compact iff F is closed.
Exercise 2.12.21. Prove the following statement:
For any set K in a Euclidean space (Rn , ‖ · ‖2 ), K is compact iff K is closed and bounded.
This is a well known result referred to as the Heine–Borel theorem.
Exercise 2.12.22. (Hilbert’s cube) Let ℓ2 := {x ∈ R^N : Σn |x(n)|² < ∞} and define
\[ \|x\|_2 := \Big(\sum_n |x(n)|^2\Big)^{1/2} \]
Show that ‖ · ‖2 is a norm on ℓ2 . Let K = {x ∈ ℓ2 : |x(n)| ≤ 1/n, n ∈ N}. Show that K is a
compact subset of ℓ2 .
Exercise 2.12.23. Show that the collection of functions F in C 1 ([0, 1]) such that ‖f ‖u ≤ 1
and ‖f ′ ‖u ≤ 1 is relatively compact in C([0, 1]).
Exercise 2.12.24. Suppose X is a locally compact Hausdorff (l.c.H.) space. Show that if
K and L are disjoint compact subsets of X, then there are disjoint open sets U and V such
that K ⊂ U and L ⊂ V . (Hint: Assume first that L is a single point set)
Chapter 3

Basic measure theory

3.1. Measurable spaces

Consider a nonempty set Ω, which we refer to as the sampling space. If we think of
elements ω ∈ Ω as the outcomes of a certain experiment, then subsets of Ω are events, that
is, if A ⊂ Ω and the outcome ω ∈ A, then we say that the event A has occurred. A
probability measure is a function on a collection F of events in Ω which measures how
likely an event is to happen; for instance, the probability of occurrence of the void event
is P[∅] = 0, whereas the probability of the sure event is P[Ω] = 1.

Example 3.1.1. Consider the experiment of casting a regular dice. The sampling space Ω
is described by the number of points on the side facing up once the dice comes to a rest.
Then, Ω = {1, 2, 3, 4, 5, 6} and there are up to 2^6 different events; for instance, the event A
described by all the outcomes which have an odd number of points is A = {1, 3, 5}; the event
B of outcomes with less than five points is B = {1, 2, 3, 4}. In this case, one can assign
probabilities to individual outcomes and then define probabilities to all events by assigning
them the sum of probabilities of their elements. For instance, P[{1}] = . . . = P[{6}] = 1/6
corresponds to the ideal fair dice. In this case, P[A] = P[{1}] + P[{3}] + P[{5}] = 1/2;
similarly, P[B] = 2/3. The event B is more likely to happen than the event A.
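The computations in this example can be reproduced with exact fractions. The snippet below is a small sketch; the names `omega`, `weight` and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

omega = [1, 2, 3, 4, 5, 6]
weight = {outcome: Fraction(1, 6) for outcome in omega}   # ideal fair dice

def prob(event):
    """Probability of an event as the sum of the probabilities of its outcomes."""
    return sum((weight[o] for o in event), Fraction(0))

A = {1, 3, 5}        # odd number of points
B = {1, 2, 3, 4}     # less than five points
events = [set(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]
print(prob(A), prob(B), len(events))   # prints: 1/2 2/3 64
```

Summing over outcomes in this way is only possible because Ω is finite.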

In the example of the dice, probabilities are assigned to events by adding the probabilities
of their individual outcomes. This procedure however does not provide a good way to measure
the probabilities of events when the sample space Ω is not countable.

Example 3.1.2. Consider the angle registered between a fixed reference axis through the center
of a roulette and a marked point in the circumference of the roulette after one spins it. Then
Ω = [0, 2π). An ideal roulette has the property that it assigns the same probability to arcs
that have the same length. That is, if [a, b] ⊂ [0, 2π), then P[[a, b]] = (b − a)/(2π). Observe that
this probability measure assigns probability zero to each individual outcome. P[Ω] = 1 but
there is no reasonable way of adding up an uncountable set of numbers, each of which is
zero.

The example of the roulette suggests that it is not always possible to start from indi-
vidual probabilities to construct a meaningful notion of probability of events. It is then
reasonable to assume that probabilities have been assigned to all events.
In order to determine probabilities of events, it seems reasonable to establish some ideal
structure on the collection of events, if any. We first introduce some structures that appear
often in the theory of integration.

Definition 3.1.3. A collection E of subsets of Ω is a semiring if

(i) ∅ ∈ E ,
(ii) and for any I, J ∈ E , I ∩ J ∈ E and I \ J is the finite union of disjoint sets in E .

Example 3.1.4. The collection E of intervals of the form (a, b] with −∞ < a ≤ b < ∞ is a
semiring in R (note that ∅ = (a, a]).
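
The semiring property of half-open intervals can be checked concretely. The following Python sketch is only an illustration: it represents (a, b] by the pair `(a, b)`, encodes the empty set as `None`, and the helper names `intersect` and `difference` are introduced here.

```python
def intersect(i, j):
    """Intersection of (a, b] and (c, d]; the empty set is encoded as None."""
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo < hi else None

def difference(i, j):
    """I minus J as a list of at most two disjoint half-open intervals."""
    k = intersect(i, j)
    if k is None:
        return [i]
    pieces = [(i[0], k[0]), (k[1], i[1])]
    return [(a, b) for a, b in pieces if a < b]

I, J = (0, 5), (1, 3)
print(intersect(I, J))    # (1, 3)
print(difference(I, J))   # [(0, 1), (3, 5)]
```

The difference of two intervals is thus a union of at most two disjoint intervals of the same form, which is what the definition of a semiring requires.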

Lemma 3.1.5. Let R be a semiring of subsets of a set Ω.

(i) If $A, A_1, \ldots, A_n$ are sets in R then there is a finite collection C ⊂ R of pairwise
disjoint sets such that
\[ A \setminus \bigcup_{j=1}^{n} A_j = \bigcup \{C : C \in \mathcal{C}\}. \]

(ii) If {An : n ∈ N} ⊂ R then there is a countable pairwise disjoint collection D ⊂ R
such that
\[ \bigcup_{n \in \mathbb{N}} A_n = \bigcup \{D : D \in \mathcal{D}\}. \]

Proof. (i) We prove the first statement by induction. The statement holds for n = 1 by
definition of a semiring. Suppose the statement holds for some n ≥ 1. Then there are
pairwise disjoint sets $C_1, \ldots, C_k$ in R such that $A \setminus \bigcup_{j=1}^{n} A_j = \bigcup_{\ell=1}^{k} C_\ell$. From
\[ A \setminus \bigcup_{j=1}^{n+1} A_j = \Big(A \setminus \bigcup_{j=1}^{n} A_j\Big) \setminus A_{n+1} = \bigcup_{\ell=1}^{k} (C_\ell \setminus A_{n+1}) \]
and the definition of a semiring it follows that for each ℓ = 1, . . . , k there are pairwise
disjoint sets $D^\ell_1, \ldots, D^\ell_{s_\ell}$ in R such that $C_\ell \setminus A_{n+1} = \bigcup_{m=1}^{s_\ell} D^\ell_m$.
Clearly, $\{D^\ell_m : \ell = 1, \ldots, k,\ m = 1, \ldots, s_\ell\}$ is a collection of
pairwise disjoint sets in R and
\[ A \setminus \bigcup_{j=1}^{n+1} A_j = \bigcup_{\ell=1}^{k} \bigcup_{m=1}^{s_\ell} D^\ell_m. \]

(ii) Let $B_1 = A_1$ and $B_n = A_n \setminus \bigcup_{j=1}^{n-1} A_j$ for n ≥ 2. By (i) each set $B_n$ is the union of a finite
collection of pairwise disjoint sets in R, and {Bn : n ∈ N} is a pairwise disjoint collection. (ii) follows from
$\bigcup_n A_n = \bigcup_n B_n$.
Definition 3.1.6. A collection R of subsets of Ω is a ring if
(i) A ∪ B ∈ R,
(ii) and A \ B ∈ R whenever A, B ∈ R.
A ring R that is closed under countable unions is called a σ–ring. A ring R is called a δ–ring
if it is closed under countable intersections. A ring A is an algebra if

(iii) Ω ∈ A .
An algebra F which is also a σ–ring is called σ–algebra.
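Example 3.1.7. For any set Ω, if R is a σ–ring of subsets of Ω, then R is a δ–ring. To check
this, let {An : n ∈ N} ⊂ R. Then $A_1 \setminus A_n \in \mathcal{R}$ for any n ∈ N, and so
$A_1 \setminus \bigcap_n A_n = \bigcup_n (A_1 \setminus A_n) \in \mathcal{R}$. Consequently,
$\bigcap_n A_n = A_1 \setminus \big(A_1 \setminus \bigcap_n A_n\big) \in \mathcal{R}$.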
Example 3.1.8. The collection {∅, Ω} is a σ–algebra in Ω. It is the smallest one, and thus
it is called the trivial σ–algebra.
Example 3.1.9. The collection P(Ω) of all subsets of Ω, that is, P(Ω) = {A : A ⊂ Ω} is
clearly a σ–algebra in Ω. It is the largest one and it is called the power set.
Definition 3.1.10. Given a collection C of subsets of Ω, the σ–algebra generated by C,
denoted by σ(C), is the intersection of all σ–algebras containing C. If A and F are σ–
algebras in Ω and A ⊂ F , then A is said to be a sub σ–algebra of F .
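
On a finite set Ω the generated σ–algebra can be computed by brute force. The sketch below is only an illustration under that finiteness assumption (closure under complements and pairwise unions is then enough); the function name `generated_sigma_algebra` is introduced here.

```python
from itertools import combinations

def generated_sigma_algebra(omega, collection):
    """Smallest family containing `collection` that is closed under
    complements and pairwise unions (this suffices on a finite set)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if omega - a not in family:
                family.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:
                family.add(a | b)
                changed = True
    return family

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(s) for s in sigma))
```

For this choice of C the atoms of σ(C) are {1}, {2} and {3, 4}, so σ(C) has 8 elements and does not contain {3}.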
Definition 3.1.11. Let (X, G) be a topological space. The σ–algebra generated by the collection G
of all open sets, denoted by B(X), is called the Borel σ–algebra.
Example 3.1.12. Consider the Euclidean space $(\mathbb{R}^n, |\cdot|)$. $\mathcal{B}(\mathbb{R}^n)$ is generated by the
countable collection of open balls $\{B_r(x) : r \in \mathbb{Q}_+,\ x \in \mathbb{Q}^n\}$.

The following result gives an alternative characterization of the Borel σ–algebra of a
topological space.
Theorem 3.1.13. Let (X, G) be a topological space. The Borel σ–algebra is the minimal
collection of sets containing the closed and open sets that is closed under countable inter-
sections and countable disjoint unions.

Proof. Let S be the minimal collection of sets of X containing the open and closed sets,
and which is closed under countable intersections and countable disjoint unions. Clearly
S ⊂ B(X). Consider
S0 := {A ∈ S : Ac ∈ S }
We will show that S0 is a σ–algebra. Clearly S0 contains the closed and open sets, and
it is closed under complementation. In particular, X ∈ S0 . If {An : n ∈ N} ⊂ S0
then, by definition, it follows that $\{A_n^c : n \in \mathbb{N}\} \subset \mathcal{S}$. Since S is closed under countable
intersections, we have that $\{A_1,\ A_n \cap \bigcap_{j=1}^{n-1} A_j^c : n \ge 2\} \subset \mathcal{S}$, and $\bigcap_n A_n^c \in \mathcal{S}$. As S is closed
under countable disjoint unions,
\[ A_1 \cup \bigcup_{n \ge 2} \Big(A_n \cap \bigcap_{j=1}^{n-1} A_j^c\Big) = \bigcup_n A_n \in \mathcal{S}. \]
Thus $\bigcup_n A_n \in \mathcal{S}_0$. Hence S0 is a σ–algebra containing G, and since S0 ⊂ S ⊂ B(X), we conclude that
S0 = S = σ(G) = B(X).

3.2. Measure spaces

The starting point of axiomatic probability theory is a sample space Ω together with a
σ–algebra of events F . The pair (Ω, F ) is called a measurable space. We will not make any
attempt to justify the appropriateness of the structure imposed on F , but it seems to be
natural.

Definition 3.2.1. Suppose E is a semiring in a set Ω and let µ : E → R+ .

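(a) µ is said to be finitely additive (resp. countably additive) if
\[ \mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \mu(A_i) \]
for any finite (resp. countable) family $\{A_i : i \in I\} \subset \mathcal{E}$ of pairwise disjoint sets
with $\bigcup_{i \in I} A_i \in \mathcal{E}$.
(b) µ is said to be finitely (resp. countably) subadditive if
\[ \mu\Big(\bigcup_{j \in J} B_j\Big) \le \sum_{j \in J} \mu(B_j) \]
for any finite (resp. countable) family $\{B_j : j \in J\} \subset \mathcal{E}$ with $\bigcup_{j \in J} B_j \in \mathcal{E}$.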
(c) If E is an algebra and µ is finitely additive with µ(∅) = 0, then µ is called a charge.
(d) If E is a σ–algebra and µ is countably additive with µ(∅) = 0, then µ is said to be
a measure and (Ω, E , µ) is called a measure space.
(e) A measure µ is semifinite if any A ∈ E with µ(A) > 0 has a subset B ∈ E such
that 0 < µ(B) < ∞.
(f) A measure µ is called a probability measure if µ(Ω) = 1. In this case, the triplet
(Ω, E , µ) is called a probability space.

Example 3.2.2. The set function µ on the semiring E = {(a, b] : −∞ < a ≤ b < ∞} of
sets in R given by µ((a, b]) = b − a is σ–additive.

Remark 3.2.3. Clearly, if (Ω, F , µ) is a measure space, then µ(∅) = 0. Also, the order
in which we take the union of the sets $A_n$ in the definition of a measure is not relevant.
Indeed, if f : N → N is any bijection, then $\sum_n \mu[A_{f(n)}] = \sum_n \mu[A_n]$ by Lemma A.1.1.

Example 3.2.4. (Counting measure) Let Ω be any set with σ–algebra P(Ω). For finite
sets let µ(A) = card(A) the cardinality of A; and µ(A) = ∞ otherwise. It is easy to check
that (Ω, P(Ω), µ) is a measure space.
Example 3.2.5. For any measure space (Ω, F , µ), the collection of sets R = {A ∈ F :
µ(A) < ∞} is a ring on Ω.
Theorem 3.2.6. A nonnegative charge µ on a measurable space (Ω, F ) is a measure iff
$\lim_n \mu(B_n) = \mu\big(\bigcup_m B_m\big)$ for every nondecreasing sequence {Bn : n ∈ N} ⊂ F .

Proof. Suppose first that µ is a measure and let {Bn : n ∈ N} ⊂ F be nondecreasing. Let
$A_1 = B_1$ and $A_n = B_n \setminus B_{n-1}$ for n ≥ 2. Then {An : n ∈ N} is a pairwise disjoint sequence
of measurable sets and $\bigcup_m A_m = \bigcup_m B_m$. Thus,
\[ \mu\Big(\bigcup_m B_m\Big) = \sum_m \mu(A_m) = \lim_n \sum_{m=1}^{n} \mu(A_m) = \lim_n \mu\Big(\bigcup_{m=1}^{n} A_m\Big) = \lim_n \mu(B_n). \]

Conversely, let {An : n ∈ N} ⊂ F be a pairwise disjoint sequence and set
$B_n = \bigcup_{m=1}^{n} A_m$. Then {Bn : n ∈ N} ⊂ F increases to $\bigcup_m A_m$. Since µ is a charge,
$\mu(B_n) = \sum_{m=1}^{n} \mu(A_m)$. Hence
\[ \sum_m \mu(A_m) = \lim_n \mu(B_n) = \mu\Big(\bigcup_m A_m\Big), \]
which shows that µ is a measure.
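
As a small numerical illustration of continuity from below, consider the measure on the positive integers with point masses $2^{-k}$ (a choice made here only for the example, not taken from the text) and the nondecreasing sets $B_n = \{1, \ldots, n\}$.

```python
from fractions import Fraction

def mu(finite_set):
    """Measure with point masses mu({k}) = 2**(-k) on k = 1, 2, 3, ..."""
    return sum((Fraction(1, 2 ** k) for k in finite_set), Fraction(0))

for n in (1, 2, 5, 10, 20):
    B_n = range(1, n + 1)          # B_n = {1, ..., n}, nondecreasing in n
    print(n, float(mu(B_n)))       # equals 1 - 2**(-n), increasing to 1
```

The values increase to 1, which is the measure of the union of the sets $B_n$.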
Remark 3.2.7. The assumption µ[A1 ] < ∞ in Exercise 3.10.5 iii) cannot be dropped, as the next
example shows. Consider Ω = R and B any σ–algebra that contains all intervals of the
form [a, ∞). Let λ be the measure that assigns to each interval (finite or infinite) its length.
Clearly $\bigcap_n [n, \infty) = \emptyset$; however, λ([n, ∞)) = ∞ for each n.

Let (Ω, F , µ) be a measure space. A set A ⊂ Ω is called µ–negligible if there is E ∈ F
such that A ⊂ E and µ(E) = 0. We denote by $\mathcal{N}_\mu$ the collection of all µ–negligible sets.
The measure space (Ω, F , µ) is complete if $\mathcal{N}_\mu \subset \mathcal{F}$. The completion of F (with
respect to µ) is the σ–algebra $\mathcal{F}^\mu := \sigma(\mathcal{F} \cup \mathcal{N}_\mu)$. Equivalent characterizations of $\mathcal{F}^\mu$ are
given in Exercise 3.10.6.
Example 3.2.8. Consider the Borel space ([0, 1], B([0, 1])). The completion of B([0, 1])
with respect to the measure δ0 (the unit point mass at 0) is the whole power set P([0, 1]). We will see
in Section 3.4.1 that the completion of B([0, 1]) with respect to Lebesgue measure λ is the collection
of all Lebesgue sets $\mathcal{M}_\lambda$ contained in [0, 1].

3.3. Construction of measures

As we observed before, the probabilities of the individual outcomes in an event may not add up to
the probability of the event that contains them. In this section we show that one can assign
probabilities to a certain basic class C of events in Ω first, and then extend the probability
measure to all events in σ(C) consistently.

Definition 3.3.1. Let Ω be a sample space. An outer measure on Ω is a function
µ∗ : P(Ω) → [0, ∞] such that
(i) µ∗ [∅] = 0
(ii) If A ⊂ B ⊂ Ω, then µ∗ [A] ≤ µ∗ [B] (monotonicity)
(iii) $\mu^*\big[\bigcup_n A_n\big] \le \sum_n \mu^*[A_n]$ for any sequence of sets $A_n \subset \Omega$ (countable subadditivity).

The following result describes a general procedure to construct outer measures.

Theorem 3.3.2. Let Ω be a nonempty set. Given a collection E ⊂ P(Ω) with
∅ ∈ E , and a function h : E → R+ with h(∅) = 0, define
\[ \mu^*(A) = \inf\Big\{ \sum_{n \in \mathbb{N}} h(A_n) : A \subset \bigcup_{n \in \mathbb{N}} A_n,\ A_n \in \mathcal{E} \Big\}. \tag{3.1} \]
Then, µ∗ is an outer measure (inf ∅ := ∞). We say that µ∗ is the outer measure associated
to the pair (E , h).
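
When Ω and E are finite, the infimum in (3.1) can be computed by enumerating subfamilies of E (repeating a set in a cover never helps because h ≥ 0). The following Python sketch is only an illustration under that finiteness assumption; the function `outer_measure` and the particular pair `(E, h)` are made up for the example.

```python
from itertools import combinations

def outer_measure(A, E, h):
    """Infimum of sum h(A_n) over subfamilies of E covering A.
    Returns float('inf') when no cover exists (inf of the empty set)."""
    A = frozenset(A)
    best = float("inf")
    for r in range(len(E) + 1):
        for family in combinations(E, r):
            covered = frozenset().union(*family)
            if A <= covered:
                best = min(best, sum(h[S] for S in family))
    return best

# Hypothetical pair (E, h) on Omega = {1, 2, 3}; note h(Omega) = 3.
E = [frozenset(), frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 2, 3})]
h = {E[0]: 0, E[1]: 1, E[2]: 1, E[3]: 3}

print(outer_measure({1, 2, 3}, E, h))   # 2, cheaper than h(Omega) = 3
print(outer_measure({2}, E, h))         # 1
print(outer_measure({4}, E, h))         # inf, no cover exists
```

The first value shows that µ∗ need not agree with h on E unless h has additional structure, which is the point of the extension theorem later in this section.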

Proof. Clearly µ∗ (∅) = 0 and µ∗ (A) ≤ µ∗ (B) whenever A ⊂ B. To check subadditivity
consider a countable sequence of subsets $A_n$ and let $A = \bigcup_n A_n$. If $\mu^*(A_n) = \infty$ for some n,
then $\sum_n \mu^*(A_n) = \infty$ and there is nothing to prove; thus, it is enough to assume that
$\mu^*(A_n) < \infty$ for all n. Let ε > 0 and for each n ∈ N, let $\{B^n_m : m \in \mathbb{N}\} \subset \mathcal{E}$ be a cover of $A_n$
so that $\sum_m h(B^n_m) < \mu^*(A_n) + 2^{-n}\varepsilon$.
Then, $\{B^n_m : n, m \in \mathbb{N}\}$ is a countable cover of A and
\[ \mu^*(A) \le \sum_{n,m} h(B^n_m) \le \sum_n \mu^*(A_n) + \varepsilon. \tag{3.2} \]

The conclusion follows by letting ε ց 0.

Outer measures are interesting since they can be used to extend and/or construct measures,
as we will demonstrate below.
Definition 3.3.3. Let µ∗ be an outer measure on Ω and let E ⊂ Ω.
(a) If E satisfies
(3.3) µ∗ (A) = µ∗ (A ∩ E) + µ∗ (A ∩ E c ) for all A⊂Ω
then we say that E is µ∗ –measurable.
(b) If µ∗ (E) = 0, then we say that E is µ∗ –negligible.
The collection of all µ∗ –measurable subsets of Ω is denoted by Mµ∗ .
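
Condition (3.3) can be tested exhaustively on a small set. The sketch below uses a toy outer measure on Ω = {1, 2, 3} chosen here for illustration (it is not taken from the text): µ∗ is 0 on the empty set, 2 on Ω and 1 on every other subset, which is monotone and subadditive.

```python
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3})

def subsets(s):
    """All subsets of s, ordered by increasing cardinality."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def mu_star(A):
    """Toy outer measure: 0 on the empty set, 2 on OMEGA, 1 otherwise."""
    if not A:
        return 0
    return 2 if A == OMEGA else 1

def is_measurable(E):
    """Check condition (3.3) against every test set A."""
    return all(mu_star(A) == mu_star(A & E) + mu_star(A - E)
               for A in subsets(OMEGA))

measurable = [set(E) for E in subsets(OMEGA) if is_measurable(E)]
print(measurable)   # only the empty set and OMEGA pass the test
```

For instance, E = {1} fails the test with A = {1, 2}, since µ∗(A) = 1 while µ∗(A ∩ E) + µ∗(A \ E) = 2. In this example the collection of µ∗–measurable sets is the trivial σ–algebra.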
Theorem 3.3.4. If µ∗ is an outer measure on Ω then Mµ∗ is a σ–algebra and contains
all µ∗ –negligible sets. Moreover, (Ω, Mµ∗ , µ∗ ) is a complete measure space.

Proof. If µ∗ (E) = 0 then µ∗ (B) = 0 for any B ⊂ E. Thus, by monotonicity and subadditivity,
\[ \mu^*(A) \ge \mu^*(A \setminus E) = \mu^*(A \cap E) + \mu^*(A \setminus E) \ge \mu^*(A) \]
for any A ⊂ Ω. Therefore E is µ∗ –measurable and in particular ∅ is µ∗ –measurable.
By definition, it is clear that A is µ∗ –measurable if and only if $A^c$ is µ∗ –measurable.
It remains to show that the collection of µ∗ –measurable sets is closed under countable unions.
Since a countable union of sets can be expressed as a countable union of pairwise disjoint
sets, $\bigcup_n A_n = \bigcup_n B_n$ where $B_1 = A_1$ and $B_n = A_n \setminus \big(\bigcup_{k=1}^{n-1} A_k\big)$, it suffices to consider a
pairwise disjoint sequence {An : n ∈ N} of µ∗ –measurable sets. We first prove by induction that
\[ \mu^*(E) = \sum_{k=1}^{n} \mu^*(E \cap A_k) + \mu^*\Big(E \cap \Big(\bigcup_{k=1}^{n} A_k\Big)^c\Big) \tag{3.4} \]
for any E ⊂ Ω. For n = 1 this is just the definition. Assume the statement is true for n. Since
$A_{n+1}$ is µ∗ –measurable and disjoint from $A_1, \ldots, A_n$, we have that
\begin{align*}
\mu^*\Big(E \cap \bigcap_{k=1}^{n} A_k^c\Big) &= \mu^*\Big(E \cap \bigcap_{k=1}^{n} A_k^c \cap A_{n+1}\Big) + \mu^*\Big(E \cap \bigcap_{k=1}^{n} A_k^c \cap A_{n+1}^c\Big) \\
&= \mu^*(E \cap A_{n+1}) + \mu^*\Big(E \cap \bigcap_{k=1}^{n+1} A_k^c\Big).
\end{align*}

Thus (3.4) follows.

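The monotonicity of µ∗ implies $\mu^*\big(E \cap (\bigcup_{k=1}^{n} A_k)^c\big) \ge \mu^*\big(E \cap (\bigcup_{k=1}^{\infty} A_k)^c\big)$. Hence by (3.4)
and the subadditivity of µ∗ we have that
\begin{align*}
\mu^*(E) &= \lim_{n\to\infty} \Big( \sum_{k=1}^{n} \mu^*(E \cap A_k) + \mu^*\Big(E \cap \Big(\bigcup_{k=1}^{n} A_k\Big)^c\Big) \Big) \\
&\ge \sum_{k=1}^{\infty} \mu^*(E \cap A_k) + \mu^*\Big(E \cap \Big(\bigcup_{k=1}^{\infty} A_k\Big)^c\Big) \\
&\ge \mu^*\Big(E \cap \bigcup_{k=1}^{\infty} A_k\Big) + \mu^*\Big(E \cap \Big(\bigcup_{k=1}^{\infty} A_k\Big)^c\Big) \ge \mu^*(E).
\end{align*}
This shows that $\bigcup_{k=1}^{\infty} A_k$ is µ∗ –measurable. The choice of $E = \bigcup_{n=1}^{\infty} A_n$ gives
$\mu^*\big(\bigcup_n A_n\big) = \sum_{n=1}^{\infty} \mu^*(A_n)$. Therefore, $(\Omega, \mathcal{M}_{\mu^*}, \mu^*)$ is a measure space.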

The outer measure µ∗ associated to a pair (E , µ) as in Theorem 3.3.2 is more interesting
in applications when E and µ satisfy some basic algebraic properties.
Theorem 3.3.5. (Carathéodory’s extension) Suppose that µ is a nonnegative, finitely additive and
countably subadditive function over a semiring E of subsets of Ω, and let µ∗ be the outer
measure associated to the pair (E , µ). Then, σ(E ) ⊂ Mµ∗ and µ∗ extends µ as a complete
measure on Mµ∗ .

Proof. As ∅ ∈ E , finite additivity implies that µ(∅) = 0. Theorem 3.3.2, with h = µ, shows
that the set function µ∗ given by (3.1) is an outer measure while Theorem 3.3.4 shows that
(Ω, Mµ∗ , µ∗ ) is a complete measure space. We will show that (i) µ∗ and µ coincide on E
and (ii) Mµ∗ contains σ(E ).
(i) Suppose I ∈ E , and let {Ik : k ∈ N} be a countable cover of I in E . Then {I, ∅} and
{I ∩ Ik : k ∈ N} are also covers of I in E . By the definition of µ∗ and the countable subadditivity
and finite additivity of µ, it follows that
\[ \mu^*(I) \le \mu(I) \le \sum_k \mu(I \cap I_k) \le \sum_k \big(\mu(I_k \cap I) + \mu(I_k \setminus I)\big) = \sum_k \mu(I_k), \]
where $\mu(I_k \setminus I)$ stands for the sum of µ over a finite decomposition of $I_k \setminus I$ into disjoint sets in E .
Taking the infimum over all countable covers of I in E leads to µ∗ (I) = µ(I).
(ii) Let I ∈ E and let A ⊂ Ω. Given ε > 0 let {Ik : k ∈ N} ⊂ E be a cover of A with
\[ \sum_k \mu(I_k) \le \mu^*(A) + \varepsilon. \]
Since $I_k = (I \cap I_k) \cup (I_k \cap I^c)$, and $I_k \cap I^c$ is a finite union of disjoint sets in E , say
$I_k \cap I^c = \bigcup_{j=1}^{N_k} I_{k,j}$, it follows that
\[ \mu(I_k) = \mu(I_k \cap I) + \mu(I_k \cap I^c) = \mu(I_k \cap I) + \sum_{j=1}^{N_k} \mu(I_{k,j}). \]

Therefore,
\begin{align*}
\mu^*(A) + \varepsilon &\ge \sum_k \mu(I_k) = \sum_k \big(\mu(I_k \cap I) + \mu(I_k \cap I^c)\big) \\
&= \sum_k \mu(I_k \cap I) + \sum_{k,j} \mu(I_{k,j}) \\
&\ge \mu^*(A \cap I) + \mu^*(A \cap I^c).
\end{align*}
Letting ε → 0 leads to $\mu^*(A) \ge \mu^*(A \cap I) + \mu^*(A \cap I^c)$. This combined with the subadditivity
of the outer measure µ∗ shows that I is µ∗ –measurable. Hence E ⊂ Mµ∗ and, since Mµ∗ is a
σ–algebra, σ(E ) ⊂ Mµ∗ .
Corollary 3.3.6. Let (E , µ) be as in Theorem 3.3.5, and let E ↑ denote the collection of
countable unions of sets in E .
(i) For any E ⊂ Ω, $\mu^*(E) = \inf\{\mu(C) : E \subset C \in \mathcal{E}^\uparrow\}$. Moreover, there is B ∈ σ(E )
with E ⊂ B such that
\[ \mu^*(E) = \mu(B). \tag{3.5} \]
(ii) For any increasing sequence {An : n ∈ N} of subsets of Ω, $\mu^*\big(\bigcup_n A_n\big) = \lim_n \mu^*(A_n)$.
(iii) If $E = \bigcup_n E_n$, where {En : n ∈ N} ⊂ σ(E ) with µ(En ) < ∞, then for any ε > 0
there exists a cover of E by pairwise disjoint sets {An : n ∈ N} ⊂ E such that
\[ \mu\Big(\bigcup_n A_n \setminus E\Big) < \varepsilon. \]
(iv) If E ∈ σ(E ) and µ(E) < ∞, then for any ε > 0 there exists a finite collection of pairwise
disjoint sets {Aj : j = 1, . . . , K} ⊂ E such that
\[ \mu\Big(E \,\triangle\, \bigcup_{j=1}^{K} A_j\Big) < \varepsilon. \]

If η is any other extension of µ as a measure on (Ω, σ(E )) then η ≤ µ. In addition, if E is
a ring, then η(E) = µ(E) for all E ∈ σ(E ) with µ(E) < ∞.

Proof. Clearly µ∗ (E) ≤ µ∗ (B) = µ(B) for all B ∈ E ↑ with B ⊃ E. Thus, it suffices to
assume that µ∗ (E) < ∞.
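(i) If r > µ∗ (E), then there are $C_n \in \mathcal{E}$ so that $E \subset C = \bigcup_n C_n \in \mathcal{E}^\uparrow$ and $\sum_n \mu(C_n) < r$.
As $\mathcal{E}^\uparrow \subset \sigma(\mathcal{E})$, $\mu(C) = \mu^*(C) \le \sum_n \mu(C_n) < r$. The first statement follows by letting
r ց µ∗ (E).

To obtain (3.5), for each n ∈ N we choose $B_n \in \mathcal{E}^\uparrow$ with $E \subset B_n$ and $\mu(B_n) < \mu^*(E) + \frac{1}{n}$. The set
$B = \bigcap_n B_n$ has the desired property.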

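(ii) By part (i), for each n there is $B_n \in \sigma(\mathcal{E})$ such that $A_n \subset B_n$ and $\mu^*(A_n) = \mu^*(B_n)$.
Let $E_n := \bigcap_{m \ge n} B_m$. Then $A_n \subset E_n \subset B_n \cap E_{n+1}$ and $E_n \in \sigma(\mathcal{E})$, whence it follows that
$\mu^*(A_n) = \mu^*(E_n)$. Consequently,
\[ \mu^*\Big(\bigcup_n A_n\Big) \le \mu^*\Big(\bigcup_n E_n\Big) = \lim_n \mu^*(E_n) = \lim_n \mu^*(A_n) \le \mu^*\Big(\bigcup_n A_n\Big), \]
where the first equality follows from the fact that µ∗ is a measure on σ(E ) and the sequence $\{E_n\}$ is increasing.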

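(iv) Suppose that E ∈ σ(E ). If µ(E) < ∞, then for any ε > 0 there is a cover
{Cn : n ∈ N} ⊂ E of E such that $\mu\big(\bigcup_n C_n\big) \le \sum_n \mu(C_n) < \mu(E) + \frac{\varepsilon}{2}$. For N large enough,
$\mu\big(\bigcup_n C_n \setminus \bigcup_{j=1}^{N} C_j\big) < \frac{\varepsilon}{2}$. Consequently,
\[ \mu\Big(\Big(\bigcup_{j=1}^{N} C_j\Big) \triangle E\Big) < \varepsilon. \]
The sets $B_1 = C_1$ and $B_n = C_n \setminus \bigcup_{j=1}^{n-1} C_j$ for 2 ≤ n ≤ N are pairwise disjoint, have the same
union as $C_1, \ldots, C_N$, and by Lemma 3.1.5(i) each one of them is a finite union of disjoint sets in E .
This proves (iv).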

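(iii) Without loss of generality we may assume that the sets in {En : n ∈ N} are pairwise
disjoint. For each n ∈ N there is a cover $\{B_{n,m} : m \in \mathbb{N}\} \subset \mathcal{E}$ of $E_n$ such that
\[ \mu\Big(\bigcup_m B_{n,m} \setminus E_n\Big) < \frac{\varepsilon}{2^n}. \]
Let {An : n ∈ N} be an enumeration of the countable collection $\{B_{n,m} : n \in \mathbb{N}, m \in \mathbb{N}\}$.
(iii) is then a consequence of Lemma 3.1.5(ii).

To prove the last statements, let η be another extension of µ to (Ω, σ(E )). For any
E ∈ σ(E ), choose $B_n \in \mathcal{E}$ so that $E \subset \bigcup_n B_n = B$. Then
$\eta(E) \le \eta(B) \le \sum_n \eta(B_n) = \sum_n \mu(B_n)$. Taking infima over all possible covers gives
$\eta(E) \le \mu^*(E) = \mu(E)$.

Assume that $\mathcal{E}$ is a ring and suppose that E ∈ σ(E ) with $\mu^*(E) = \mu(E) < \infty$. For any
ε > 0 choose $B_n \in \mathcal{E}$ so that $E \subset \bigcup_n B_n = B$ and $\mu(B) < \mu(E) + \frac{\varepsilon}{2}$. Hence, $\mu(B \setminus E) < \frac{\varepsilon}{2}$.
Since $A_k := \bigcup_{j=1}^{k} B_j \nearrow B$, we can choose k so that $\mu(B) - \mu(A_k) = \mu(B \setminus A_k) < \frac{\varepsilon}{2}$. Since
$\eta(B) = \eta(E) + \eta(B \setminus E)$, η ≤ µ and η = µ on $\mathcal{E}$ (note that $A_k \in \mathcal{E}$ because $\mathcal{E}$ is a ring), it follows that
\begin{align*}
\eta(E) &= \eta(B) - \eta(B \setminus E) \ge \eta(A_k) - \eta(B \setminus E) \\
&\ge \mu(A_k) - \mu(B \setminus E) > \mu(B) - \varepsilon \ge \mu(E) - \varepsilon.
\end{align*}

Letting ε ց 0 we obtain, together with η ≤ µ, that η(E) = µ(E).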

Example 3.3.7. (Relative measure) Suppose (Ω, F , µ) is a measure space and let C ⊂ Ω
be an arbitrary nonempty subset. The collection $\mathcal{F}_C := \{C \cap A : A \in \mathcal{F}\}$, called the trace of F on C,
is clearly a σ–algebra on C. Let µ∗ be the outer measure associated to the pair (F , µ). Carathéodory’s
theorem extends µ to a σ–algebra Mµ containing F . If µ∗ (C) < ∞ then there is C ′ ∈ F
such that C ⊂ C ′ and µ∗ (C) = µ(C ′ ). For any A ∈ F there are sets D, E ∈ F such that
C ∩ A ⊂ D, C \ A ⊂ E, and
\[ \mu^*(C \cap A) = \mu(D), \qquad \mu^*(C \setminus A) = \mu(E). \]

As $C \cap A \subset (D \cap C') \cap (D \cap A)$ and $C \setminus A \subset (E \cap C') \cap (E \setminus A)$, it follows that
\begin{align*}
\mu^*(C \cap A) &= \mu(D \cap A) = \mu(D \cap C') = \mu(D), \\
\mu^*(C \setminus A) &= \mu(E \setminus A) = \mu(E \cap C') = \mu(E).
\end{align*}

Hence µ(D \ A) = 0 = µ(E ∩ A), and
\begin{align*}
\mu^*(C) &\le \mu((D \cup E) \cap C') \le \mu(D \cup E) \le \mu(D) + \mu(E) \\
&= \mu(D \cap C') + \mu(E \cap C') \\
&= \mu^*(C \cap A) + \mu^*(C \setminus A) = \mu^*(C) = \mu(C').
\end{align*}

Consequently, µ(D ∩ E) = 0 and $\mu\big(C' \triangle (D \triangle E)\big) = 0$, and so
\begin{align*}
\mu^*(C \cap A) &= \mu(C' \cap A) = \mu(D \cap A), \\
\mu^*(C \setminus A) &= \mu(C' \setminus A) = \mu(E \setminus A).
\end{align*}

Therefore, $\nu_C(C \cap A) := \mu^*(C \cap A) = \mu(C' \cap A)$ defines a measure on $(C, \mathcal{F}_C)$ and it is
independent of the selection of C ′ .

3.4. Two examples of construction by outer measures.

In this section we present a construction of the Lebesgue and Lebesgue–Stieltjes measures
in Euclidean space $\mathbb{R}^n$ and the Hausdorff measure in metric spaces. The Lebesgue measure
will be obtained as a direct application of Theorem 3.3.5, while the Hausdorff measure will make
use of the metric structure of the base space.

3.4.1. Lebesgue and Lebesgue–Stieltjes measures. Consider measures on the Borel
space $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. For $x, y \in \mathbb{R}^d$, we use the notation x ≤ y and x < y to indicate that
$x_k \le y_k$ and $x_k < y_k$ for all k, respectively, and we denote $e = (1, \ldots, 1)^\top$. Finally, consider
the collection E of all d–dimensional intervals $\prod_{k=1}^{d} (a_k, b_k] = (a, b]$ with $a_k \le b_k$, which is
clearly a semiring.
Let $F : \mathbb{R}^d \to \mathbb{R}$ be right–continuous, i.e., $\lim_{x \searrow a} F(x) = F(a)$. For real numbers a ≤ b and
1 < j < d denote
\[ \Delta_j(a, b)F(s) = F(s_1, \ldots, s_{j-1}, b, s_{j+1}, \ldots, s_d) - F(s_1, \ldots, s_{j-1}, a, s_{j+1}, \ldots, s_d), \]
with the obvious convention for j = 1 and j = d.
Theorem 3.4.1. Suppose that F is right–continuous and has nonnegative increments, i.e.,
$\mu((a, b]) := \prod_{j=1}^{d} \Delta_j(a_j, b_j)F \ge 0$ for any d–dimensional interval (a, b], where the product
denotes the composition of the difference operators. Then µ admits an
extension to a measure on a σ–algebra $\mathcal{M}_\mu \supset \mathcal{B}(\mathbb{R}^d)$.

Proof. Clearly µ(∅) = 0 and µ is finitely additive on E . We now prove that µ is countably
subadditive on E . If $(a, b] = \bigcup_{m=1}^{\infty} (a(m), b(m)]$, the right–continuity and positivity of the
increments of F imply that for any ε > 0, there are $a_\varepsilon > a$ and $b_\varepsilon(m) > b(m)$ such that
\[ \mu((a, b]) < \mu((a_\varepsilon, b]) + \frac{\varepsilon}{2}; \qquad \mu((a(m), b_\varepsilon(m)]) < \mu((a(m), b(m)]) + \frac{\varepsilon}{2^{m+1}}. \]
Since the closed box $[a_\varepsilon, b]$ is compact and
\[ [a_\varepsilon, b] \subset (a, b] \subset \bigcup_{m=1}^{\infty} (a(m), b(m)] \subset \bigcup_{m=1}^{\infty} (a(m), b_\varepsilon(m)), \tag{3.6} \]
there is $N_0 \in \mathbb{N}$ such that $(a_\varepsilon, b] \subset [a_\varepsilon, b] \subset \bigcup_{m=1}^{N_0} (a(m), b_\varepsilon(m))$. Finite additivity implies
finite subadditivity on the semiring E , so
\begin{align*}
\mu((a, b]) < \mu((a_\varepsilon, b]) + \frac{\varepsilon}{2} &\le \sum_{m=1}^{N_0} \mu((a(m), b_\varepsilon(m)]) + \frac{\varepsilon}{2} \\
&\le \sum_{m=1}^{\infty} \mu((a(m), b(m)]) + \varepsilon.
\end{align*}
Countable subadditivity of µ on E follows by letting ε ց 0. The conclusion follows from
Carathéodory’s extension theorem.

Lebesgue measure λ corresponds to the particular instance where $F(s) = \prod_{j=1}^{d} s_j$, in
which case $\lambda((a, b]) = \prod_{j=1}^{d} (b_j - a_j)$, and $\mathcal{M}_\lambda$ is the Lebesgue σ–algebra.
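
Composing the difference operators $\Delta_j(a_j, b_j)$ amounts to a signed sum of F over the $2^d$ corners of the box, with one minus sign for each lower endpoint used. The Python sketch below illustrates this for the choice $F(s) = \prod_j s_j$; the function names `box_measure` and `F_lebesgue` are introduced here for the example.

```python
from itertools import product

def box_measure(F, a, b):
    """mu((a, b]) via the iterated differences Delta_j(a_j, b_j) applied to F:
    a signed sum of F over the 2**d corners of the box."""
    d = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=d):
        corner = [b[j] if choice[j] else a[j] for j in range(d)]
        sign = (-1) ** (d - sum(choice))   # one minus sign per lower endpoint
        total += sign * F(corner)
    return total

def F_lebesgue(s):
    """F(s) = s_1 * ... * s_d, the choice that yields Lebesgue measure."""
    result = 1.0
    for coordinate in s:
        result *= coordinate
    return result

# Box (0, 2] x (1, 4]: the increment equals (2 - 0) * (4 - 1).
print(box_measure(F_lebesgue, (0, 1), (2, 4)))
```

Any other right–continuous F with nonnegative increments can be passed in place of `F_lebesgue` to evaluate the corresponding Lebesgue–Stieltjes measure of a box.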

Theorem 3.4.2. Let $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu)$ be a finite Borel measure space, and define the distribution
function of µ by $F(x) := \mu(\{y : y \le x\})$. Then
(i) F has nonnegative increments.
(ii) F is proper, i.e., $\lim_{\min_k x_k \nearrow \infty} F(x) = \mu(\mathbb{R}^d)$ and $\lim_{\min_k x_k \searrow -\infty} F(x) = 0$.
(iii) F is right continuous.
Conversely, if F satisfies (i)–(iii) then there is a finite measure µ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ with
distribution function F .

A natural question is whether every subset of Rn is Lebesgue measurable. The following

example answers this question in the negative.
Example 3.4.3. (Existence of a non–Lebesgue measurable set) In R the relation x ∼ y
if x − y ∈ Q defines an equivalence relation, that is, x ∼ x for all x ∈ R, x ∼ y implies
y ∼ x, and x ∼ y and y ∼ z imply that x ∼ z. Thus, we can decompose R into disjoint equivalence
classes; in fact, for any x ∈ R, the equivalence class containing x is x + Q and thus it is
dense in R. Using the axiom of choice, we construct a set E ⊂ (0, 1) by selecting one element
in (0, 1) from each of the different equivalence classes. Let {rn } be an enumeration of all
rational numbers in (−1, 1) and consider the sets $E_n = E + r_n$. It is clear that the sets $E_n$
are pairwise disjoint, $E_n \subset (-1, 2)$, and $(0, 1) \subset \bigcup_n E_n$. If E is Lebesgue measurable, then
so is each $E_n$, with $\lambda(E_n) = \lambda(E)$ by translation invariance, and
\[ \lambda\Big(\bigcup_n E_n\Big) = \sum_n \lambda(E_n). \]
If λ(E) = 0 then $\lambda\big(\bigcup_n E_n\big) = 0$, contradicting the fact that $(0, 1) \subset \bigcup_n E_n$. If λ(E) > 0, then
$\lambda\big(\bigcup_n E_n\big) = \infty$, contradicting the fact that $\bigcup_n E_n \subset (-1, 2)$.
Example 3.4.4. (Devil’s stair function) Consider the Cantor set C1/3 . Define the function
F : R → [0, 1] by letting F (x) = 0 if x ≤ 0 and F (x) = 1 if x ≥ 1, F (x) = 1/2 if
x ∈ [1/3, 2/3], F (x) = 1/4 if x ∈ [1/9, 2/9], F (x) = 3/4 if x ∈ [7/9, 8/9], . . .. To extend F
to all x ∈ [0, 1], let F (x) = inf{F (t) : x ≤ t, t ∈ C1/3 }. It is not difficult to check that F
is a nondecreasing function that is continuous everywhere. The measure µ whose distribution function
is F is a probability measure that is continuous, that is, µ({x}) = 0 for every x ∈ R;
and more importantly, it is singular with respect to Lebesgue measure in the sense that
µ(R \ C1/3 ) = 0 while λ(C1/3 ) = 0.
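
The devil's stair function can be evaluated numerically from the ternary expansion of x: digits are read until the first digit 1 appears, and each digit 2 read before that contributes a binary digit 1. The Python sketch below follows this standard digit rule rather than the infimum formula of the example; the function name `cantor` and the truncation parameter `depth` are introduced here.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Approximate the devil's staircase F at x from the ternary digits of x."""
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    x = Fraction(x)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:            # x lies in a closed middle third: F is constant there
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value                  # truncated after `depth` digits

for x in (Fraction(1, 9), Fraction(1, 2), Fraction(8, 9)):
    print(x, cantor(x))
```

The printed values agree with the values 1/4, 1/2 and 3/4 prescribed in the example on the intervals [1/9, 2/9], [1/3, 2/3] and [7/9, 8/9].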

The following example shows that not every Lebesgue set is a Borel set.
Example 3.4.5. (Existence of a Lebesgue set that is not a Borel set.) Define the function
G : [0, 1] −→ [0, 1] by
G(y) = inf{x ∈ [0, 1] : F (x) = y}
where F is the devil’s stair function. It is easy to check that G takes values in the Cantor set
C1/3 . The continuity of F implies that F (G(y)) = y for all y ∈ [0, 1]. Thus, G is injective
and since F is nondecreasing, so is G. Hence, G is Borel measurable, for $G^{-1}([0, t))$ is an interval
for all t ∈ [0, 1]. Moreover, B = G(E) is Lebesgue measurable for any E ⊂ [0, 1], since
G(E) ⊂ C1/3 , λ(C1/3 ) = 0 and Lebesgue measure is complete. Let E
be any non–Lebesgue measurable subset of [0, 1]. If B were a Borel set, then $G^{-1}(B)$ would
be Borel measurable, but $G^{-1}(B) = E$ by the injectivity of G, contradicting the choice of E.

3.4.2. Hausdorff measure on metric spaces. Suppose (X, d) is a metric space and let
g : R+ → R+ be a nondecreasing function with g(0) = 0. For each δ > 0 let $\mathcal{E}_\delta$ be the collection
of sets of diameter at most δ. It is easy to check that the set function defined by
\[ H^g_\delta(A) := \inf\Big\{ \sum_{n \in G} g(\operatorname{diam}(A_n)) : G \text{ is at most countable},\ A \subset \bigcup_{n \in G} A_n,\ A_n \in \mathcal{E}_\delta \Big\} \]
on P(X) is an outer measure. Since $\mathcal{E}_\delta \subset \mathcal{E}_{\delta'}$ for δ < δ′, $H^g_\delta(A)$ is nonincreasing in δ, and it
follows that $A \mapsto H^g(A) := \sup_{\delta > 0} H^g_\delta(A)$ is also an outer measure.
Lemma 3.4.6. If A, B ⊂ X and d(A, B) := inf{d(x, y) : x ∈ A, y ∈ B} > 0, then
H g (A ∪ B) = H g (A) + H g (B).

Proof. Suppose 0 < δ < d(A, B) and let $\{C_n : n \in \mathbb{N}\} \subset \mathcal{E}_\delta$ be a cover of A ∪ B. Each
$C_n$ intersects at most one of the sets A or B. Hence, we can split the cover {Cn } in two
according to whether A ∩ Cn = ∅ or B ∩ Cn = ∅. Consequently,
\[ \sum_n g(\operatorname{diam}(C_n)) \ge H^g_\delta(A) + H^g_\delta(B), \]
whence we conclude that $H^g_\delta(A \cup B) \ge H^g_\delta(A) + H^g_\delta(B)$. The opposite inequality holds by
the subadditivity of $H^g_\delta$. The conclusion follows by letting δ → 0.
Definition 3.4.7. An outer measure µ∗ on a metric space that satisfies
µ∗ (A ∪ B) = µ∗ (A) + µ∗ (B) if d(A, B) > 0
is said to be a metric outer measure.
Theorem 3.4.8. ( Carathéodory) If µ∗ is a metric outer measure, then every Borel set is
µ∗ –measurable.

Proof. It is enough to show that any closed set F is µ∗ –measurable and to that end, we
will show that
\[ \mu^*(E) \ge \mu^*(E \cap F) + \mu^*(E \setminus F) \tag{3.7} \]
for any subset E with µ∗ (E) < ∞. For any set B and ε > 0, let $B^\varepsilon = \{x : d(x, B) < \varepsilon\}$.
Since F is closed, the sequence $E_n = E \setminus F^{1/n} = \{x \in E : d(x, F) \ge 1/n\}$ increases to
E \ F . Since $d(E_n, E \cap F) \ge 1/n$,
\[ \mu^*(E) \ge \mu^*((E \cap F) \cup E_n) = \mu^*(E \cap F) + \mu^*(E_n) \]
and for any n,
\[ \mu^*(E \setminus F) = \mu^*\Big(E_n \cup \bigcup_{k > n} (E_k \setminus E_{k-1})\Big) \le \mu^*(E_n) + \sum_{k > n} \mu^*(E_k \setminus E_{k-1}). \]
Observe that $d(E_k \setminus E_{k-1}, E_{k+1+j} \setminus E_{k+j}) \ge \frac{j}{k(k+j)}$ for j ≥ 1. Indeed, for any $x \in E_k \setminus E_{k-1}$
and $y \in E_{k+j+1} \setminus E_{k+j}$ we have
\[ d(x, y) \ge d(x, F) - d(y, F) > \frac{1}{k} - \frac{1}{k+j} = \frac{j}{k(k+j)}. \]
The metric property of µ∗ implies that
\begin{align*}
\sum_{k=1}^{m} \mu^*(E_{2k} \setminus E_{2k-1}) &= \mu^*\Big(\bigcup_{k=1}^{m} (E_{2k} \setminus E_{2k-1})\Big) \le \mu^*(E) < \infty, \\
\sum_{k=1}^{m} \mu^*(E_{2k+1} \setminus E_{2k}) &= \mu^*\Big(\bigcup_{k=1}^{m} (E_{2k+1} \setminus E_{2k})\Big) \le \mu^*(E) < \infty.
\end{align*}
Thus, $\sum_k \mu^*(E_k \setminus E_{k-1}) < \infty$, $\mu^*(E_n) \to \mu^*(E \setminus F)$ and (3.7) follows.

Lemma 3.4.6 and Theorem 3.4.8 imply that the collection $\mathcal{M}^g(X)$ of $H^g$–measurable sets contains the Borel
sets of (X, d). By Theorem 3.3.4, $H^g$ restricts to a complete measure on $(X, \mathcal{M}^g(X))$. For
each $g_p(t) = t^p$, p ≥ 0, the measure $H^p := H^{g_p}$ is called the p–th Hausdorff measure on X.
Notice that $H^0$ is the counting measure.

Theorem 3.4.9. If H p (A) < ∞, then H q (A) = 0 for all q > p. If H q (A) > 0, then
H p (A) = ∞ for all p < q.

Proof. It suffices to prove the first statement, as the second statement is its contrapositive.
For any δ > 0, let $\{A_n : n \in \mathbb{N}\} \subset \mathcal{E}_\delta$ be a cover of A such that
\[ \sum_n (\operatorname{diam}(A_n))^p < H^p_\delta(A) + 1. \]

For any q > p, we have that
\[ \sum_n (\operatorname{diam}(A_n))^q \le \delta^{q-p} \sum_n (\operatorname{diam}(A_n))^p. \]

Therefore, $H^q_\delta(A) \le \delta^{q-p}(H^p_\delta(A) + 1) \le \delta^{q-p}(H^p(A) + 1)$. Letting δ ց 0 we obtain that $H^q(A) = 0$.
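
The dichotomy can be observed on A = [0, 1] ⊂ R with the cover by n intervals of length 1/n. The sum of the p–th powers of the diameters is $n^{1-p}$, which is an upper bound for $H^p_\delta([0, 1])$ with δ = 1/n. The Python sketch below only evaluates these particular covering sums (it does not compute the infimum over all covers); the function name `covering_sum` is introduced here.

```python
def covering_sum(p, n):
    """Sum of diam**p over the cover of [0, 1] by n intervals of length 1/n."""
    return n * (1.0 / n) ** p

for p in (0.5, 1.0, 2.0):
    print(p, [covering_sum(p, n) for n in (10, 100, 1000)])
```

For p = 1 the sums stay equal to 1, so $H^1([0, 1]) \le 1 < \infty$, and for p = 2 they tend to 0, in agreement with the theorem. For p = 1/2 the sums of this particular cover grow without bound, which is consistent with, but does not by itself prove, $H^{1/2}([0, 1]) = \infty$.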

A function f between metric spaces (X, d) and (Y, ρ) is called Lipschitz of degree α > 0
if for some constant L ≥ 0
\[ \rho(f(x_1), f(x_2)) \le L\, d^\alpha(x_1, x_2) \]
for all x1 , x2 ∈ X. Lipschitz functions of degree one are typically referred to simply as Lipschitz
functions and
\[ \operatorname{Lip}(f) := \sup_{x_1 \ne x_2} \frac{\rho(f(x_1), f(x_2))}{d(x_1, x_2)} \]
is called the Lipschitz coefficient of f .

Theorem 3.4.10. Let f be a Lipschitz function of degree α between metric spaces (X, d)
and (Y, ρ). For any s ≥ 0 and any A ⊂ X,
\[ H^{s/\alpha}(f(A)) \le L^{s/\alpha} H^s(A). \]

Proof. Notice that $\operatorname{diam}(f(A)) \le L\,(\operatorname{diam}(A))^\alpha$. Given δ > 0 let $\delta^* = L\delta^\alpha$. If $\{A_n\} \subset \mathcal{E}_\delta$
is a countable cover of A, then $\{f(A_n)\} \subset \mathcal{E}_{\delta^*}$ is a countable cover of f (A). Hence
\[ H^{s/\alpha}_{\delta^*}(f(A)) \le \sum_n \operatorname{diam}(f(A_n))^{s/\alpha} \le L^{s/\alpha} \sum_n (\operatorname{diam}(A_n))^s. \]

Taking the infimum over such covers and letting δ → 0, we obtain $H^{s/\alpha}(f(A)) \le L^{s/\alpha} H^s(A)$ for all A ⊂ X.

Corollary 3.4.11. Suppose f : (X, d) → (Y, ρ) satisfies
\[ a\, d(x_1, x_2) \le \rho(f(x_1), f(x_2)) \le b\, d(x_1, x_2) \]
for all x1 , x2 ∈ X and some constants 0 < a ≤ b. Then, for any s > 0 and any A ⊂ X
\[ a^s H^s(A) \le H^s(f(A)) \le b^s H^s(A). \tag{3.8} \]
In particular, if f is an isometry from X onto f (X) then $H^s(A) = H^s(f(A))$.

Proof. It follows from Theorem 3.4.10 that $H^s(f(A)) \le b^s H^s(A)$. To obtain the inequality
on the left-hand side of (3.8), fix δ > 0 and consider any countable covering {Bn } of f (A) with
diam(Bn ) ≤ δ. Then, $\{f^{-1}(B_n \cap f(X))\}$ is a countable covering of A and
\[ \operatorname{diam}(f^{-1}(B_n \cap f(X))) \le \frac{1}{a} \operatorname{diam}(B_n \cap f(X)) \le \frac{1}{a}\,\delta. \]
Hence
\[ H^s_{a^{-1}\delta}(A) \le a^{-s} \sum_n (\operatorname{diam}(B_n))^s, \]
whence we conclude that $H^s_{a^{-1}\delta}(A) \le a^{-s} H^s_\delta(f(A))$. The first conclusion follows by letting
δ → 0. For the last statement, set a = 1 = b.

There is a close connection between the Lebesgue measure $\lambda_d$ and the d–th Hausdorff
measure $H^d$ on $\mathbb{R}^d$. Each Hausdorff measure $H^p$, p ≥ 0, is translation invariant. Let $Q = (0, 1]^d$
and let δ > 0. Divide Q into $n^d$ non–overlapping cubes of side 1/n, with n large enough that the
diameter $\sqrt{d}/n$ of each cube is less than δ. It
follows that $H^d_\delta(Q) \le n^d\, n^{-d}\, d^{d/2}$ and thus, $H^d(Q) \le d^{d/2} < \infty$. On the other hand, if
$\{A_n : n \in \mathbb{N}\} \subset \mathcal{E}_\delta$ covers Q, then each $A_n$ is contained in a closed ball of radius $\operatorname{diam}(A_n)$;
thus,
\[ \lambda_d(Q) \le \sum_n \lambda_d(A_n) \le \omega_d \sum_n (\operatorname{diam}(A_n))^d, \]
where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$. Consequently, $\omega_d^{-1} \le H^d(Q)$. Therefore,
by translation invariance, there is a constant $\omega_d^{-1} \le a_d \le d^{d/2}$ such that $H^d = a_d \lambda_d$
(see Example 3.5.8). We defer until Section 9.8 the determination of the constant $a_d$.

3.5. Uniqueness of measures

Outer measures and measurable transformations on measure spaces allow us to construct
a measure µ on measurable spaces (Ω, F ) where F = σ(C) and C is a class where µ is
previously defined. We now show that the extensions of measures thus obtained are in fact
unique.
Definition 3.5.1. (Classes of sets) Let Ω be a nonempty set.
(a) A collection P of subsets of Ω is called a π–system if A ∩ B ∈ P whenever A, B
are in P.
(b) A collection D is called a d–system (Dynkin system) if Ω ∈ D; if A, B are sets
in D with B ⊂ A, then A \ B ∈ D; and if {An : n ∈ N} ⊂ D is a nondecreasing sequence,
then $\bigcup_n A_n \in \mathcal{D}$.
(c) A collection M is called a monotone class if for any nondecreasing (resp. nonincreasing)
sequence {An : n ∈ N} ⊂ M, $\bigcup_n A_n \in \mathcal{M}$ (resp. $\bigcap_n A_n \in \mathcal{M}$).
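
A d–system need not be a π–system. A classical illustration (chosen here, not taken from the text) is the family of subsets of {1, 2, 3, 4} with an even number of elements. The Python sketch below checks both properties by brute force; on a finite set, closure under increasing unions is automatic because increasing sequences are eventually constant, so only the first two conditions of a d–system are tested.

```python
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3, 4})

def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_pi_system(family):
    """Closed under pairwise intersections."""
    return all(a & b in family for a in family for b in family)

def is_d_system(family):
    """Contains OMEGA and is closed under differences A minus B with B inside A."""
    return (OMEGA in family
            and all(a - b in family for a in family for b in family if b <= a))

even_sets = {s for s in subsets(OMEGA) if len(s) % 2 == 0}
print(is_d_system(even_sets), is_pi_system(even_sets))   # True False
```

The intersection {1, 2} ∩ {2, 3} = {2} has odd cardinality, which is why the family fails to be a π–system and hence is not a σ–algebra.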
Theorem 3.5.2. If A is an algebra of sets, then the intersection of all monotone classes
that contain A is σ(A).

Proof. The intersection M of all monotone classes that contain A is also
a monotone class. Clearly M ⊂ σ(A). Define
M0 = {B ∈ M : X \ B ∈ M}
Clearly A ⊂ M0 . If {Bn : n ∈ N} ⊂ M0 is a monotone sequence, then {X \ Bn : n ∈ N} ⊂
M is also a monotone sequence. Thus limn Bn ∈ M, and X \ limn Bn = limn (X \ Bn ) ∈ M.
It follows that M0 is a monotone class, and so M = M0 .

Define
M1 = {B ∈ M : A ∈ A implies A ∪ B ∈ M}
Clearly A ⊂ M1 . If {Bn : n ∈ N} ⊂ M1 is a monotone sequence and A ∈ A then,
{Bn ∪ A : n ∈ N} is a monotone sequence in M. Thus limn Bn ∈ M, and A ∪ limn Bn =
limn (A ∪ Bn ) ∈ M. It follows that M1 is a monotone class, and so M1 = M.

Finally, define
M2 = {B ∈ M : A ∈ M implies A ∪ B ∈ M}
As M1 = M, we have that A ⊂ M2 . If {Bn : n ∈ N} ⊂ M2 is a monotone sequence, and
A ∈ M, then {A ∪ Bn : n ∈ N} is a monotone sequence in M. Thus limn Bn ∈ M, and
A ∪ limn Bn = limn (A ∪ Bn ) ∈ M. It follows that M2 is a monotone class, and so M2 = M.

We have shown that M is also an algebra of sets. Now let {Bn : n ∈ N} ⊂ M. Then
$\{D_n = \bigcup_{j=1}^{n} B_j : n \in \mathbb{N}\} \subset \mathcal{M}$ is a monotone sequence, and so $\lim_n D_n = \bigcup_n B_n \in \mathcal{M}$.
Therefore M is a σ–algebra containing A, so σ(A) ⊂ M, and hence M = σ(A).

Theorem 3.5.3. (Sierpinski’s monotone class Theorem) If P is a π–system and D is a

d–system that contains P, then σ(P) ⊂ D.

Proof. Let d(P) be the intersection of all d–systems that contain P; it is a d–system and
d(P) ⊂ D. Clearly σ(P) is a d–system that contains P,
thus d(P) ⊂ σ(P). It remains to prove that d(P) ⊃ σ(P), and by Exercise 3.10.12 it suffices
to show that d(P) is itself a π–system. For that purpose, consider
H = {D ∈ d(P) : D ∩ B ∈ d(P), ∀B ∈ P}
Clearly, P ⊂ H and H is a d–system. So d(P) ⊂ H, that is, H = d(P). Similarly, let
A = {A ∈ d(P) : A ∩ D ∈ d(P), ∀D ∈ d(P)}
Then, P ⊂ A because H = d(P), and A is a d–system, so A ⊃ d(P). This shows that d(P) is a π–system. It
follows that d(P) is a σ–algebra, and therefore σ(P) ⊂ d(P) ⊂ D.

A measure µ on (Ω, F ) is called σ–finite if there is a countable partition {An : n ∈

N} ⊂ F of Ω such that µ(An ) < ∞ for all n ∈ N.
Example 3.5.4. Suppose (S, d) is a metric space and µ and ν are two σ–finite measures
on the Borel σ–algebra B(S). Then µ = ν iff µ(U ) = ν(U ) for any open set U .
Theorem 3.5.5. (uniqueness) Let (Ω, F ) be a measurable space such that F = σ(C) where
C is a π–system. Suppose that µ and ν are two measures on F that coincide on C. If there
is an increasing sequence of sets Cn ∈ C such that $\Omega = \bigcup_n C_n$ and µ(Cn ), ν(Cn ) < ∞, then
µ = ν.

Proof. Let µn and νn be the finite measures on F defined by µn (A) := µ(A ∩ Cn ) and
νn (A) := ν(A ∩ Cn ). Then, since C is a π–system, it is easy to check that D = {D ∈ F :
µn (D) = νn (D)} is a d–system that contains C. Therefore, µn = νn for each n. For any
A ∈ F we have µ(A) = limn→∞ µn (A) = limn→∞ νn (A) = ν(A).
Theorem 3.5.6. Suppose E is a semiring on Ω, and µ is additive and countably subadditive
on E . If the Carathéodory extension is σ–finite on σ(E ), then Mµ = σ(E ) and the extension
is unique.

Proof. Suppose that σ(E ) ∋ En ր Ω with µ(En ) < ∞, and consider the finite measures
µn (·) = µ(· ∩ En ). To show that Mµ ⊂ σ(E ) it is enough to show that {E ∩ En : E ∈
Mµ } ⊂ σ(E ) for each n, and to that purpose, it suffices to assume that µ is finite. Let
E ∈ Mµ , and as in Corollary 3.3.6 let B ∈ σ(E ) be such that E ⊂ B and µ∗ (E) = µ(B).
Notice that µ(B) = µ∗ (B) = µ∗ (B ∩ E) + µ∗ (B \ E) and µ∗ (B \ E) = 0. Therefore,
E = B \ (B \ E) ∈ σ(E ).
Example 3.5.7. The Lebesgue measure λ and the Lebesgue–Stieltjes measure µF associated to a right–continuous function F with nonnegative increments are the only measures
that assign $\prod_{k=1}^d (b_k - a_k)$ and $\prod_{k=1}^d \Delta_k(a_k, b_k)F$ respectively, to each d–dimensional interval
$(a, b] = \prod_{k=1}^d (a_k, b_k]$ where $a_k \le b_k$.
Example 3.5.8. A measure µ on (Rd , B(Rd )) is translation invariant if µ(A − x) = µ(A)
for all A ∈ B(Rd ) and x ∈ Rd , where A − x := {a − x : a ∈ A}. If µ is translation invariant and c = µ((0, 1]d ) < ∞,
then µ = c λd , where λd stands for Lebesgue measure. That is, λd is the unique translation
invariant measure on B(Rd ) that assigns mass one to the unit cube.

3.6. Measurable functions and random variables

In this section we develop the notion of random variable. Intuitively, a random variable
is an observable quantity measured upon the realization of a particular outcome of an
experiment.
Definition 3.6.1. Let (Ω, F ) and (R, B) be two measurable spaces. A function f : Ω → R
is F –B measurable if f −1 (B) := {ω ∈ Ω : f (ω) ∈ B} ∈ F whenever B ∈ B. That is, the
preimages under f of all B measurable sets are F –measurable.
Example 3.6.2. When (R, τ ) is a topological space, it is of special interest to consider the
Borel σ–algebra defined by B(R) = σ(τ ). Given a measurable space (Ω, F ), a measurable
function f : (Ω, F ) −→ (R, B(R)) is called Borel measurable or R–valued random variable.
Example 3.6.3. If both (Ω, F ) and (R, B) are Borel measurable spaces, that is, they
are topological spaces and F , B are the corresponding Borel σ–algebras, then any continuous
function f : Ω → R is measurable.
Remark 3.6.4. In many instances it is convenient to consider the set $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty\} \cup \{\infty\}$
of extended real numbers equipped with the σ–algebra $\sigma(\mathcal{B}(\mathbb{R}) \cup \{\{-\infty\}, \{\infty\}\})$. By
convention 0 · ∞ = ∞ · 0 = 0; a + ∞ = ∞ + a = ∞ for any a ∈ R; c · ∞ = ∞ if c > 0 and
c · ∞ = −∞ if c < 0.
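
The convention 0 · ∞ = 0 differs from floating point arithmetic, where the same product is undefined. The short Python sketch below illustrates the difference; `ext_mul` is a hypothetical helper written for this remark, not a library function.

```python
import math


def ext_mul(a, b):
    """Product on the extended reals with the convention 0 * (+-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b


print(math.isnan(0 * math.inf))   # IEEE 754 arithmetic gives NaN here
print(ext_mul(0, math.inf))       # the convention of Remark 3.6.4
print(ext_mul(-2.0, math.inf))    # c * inf = -inf when c < 0
```

This matters when integrals of functions that vanish on sets of infinite measure are evaluated numerically.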
Lemma 3.6.5. Let $f_n : \Omega \to \overline{\mathbb{R}}$ be a sequence of measurable functions. Then, $f_* := \inf_n f_n$
and $f^* := \sup_n f_n$ are measurable. If $f(\omega) = \lim_{n\to\infty} f_n(\omega)$ exists for every ω ∈ Ω, then f
is also measurable.

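Proof. For any α ∈ R,
\[ (f_*)^{-1}([\alpha, \infty]) = \bigcap_n f_n^{-1}([\alpha, \infty]), \qquad (f^*)^{-1}([-\infty, \alpha]) = \bigcap_n f_n^{-1}([-\infty, \alpha]). \]
The measurability of $f_*$ and $f^*$ follows from that of the $f_n$. The last statement follows from
the measurability of the functions
\[ \limsup_n f_n = \inf_{m \ge 1} \sup_{n \ge m} f_n, \qquad \liminf_n f_n = \sup_{m \ge 1} \inf_{n \ge m} f_n, \]
both of which coincide with f when the limit exists.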

Lemma 3.6.6. If f : (T, T ) → (S, S) and g : (S, S) → (U, U ) are measurable functions,
then g ◦ f : (T, T ) → (U, U ) is measurable.
Proof. If $A \in \mathcal{U}$, then $g^{-1}(A) \in \mathcal{S}$, and so $f^{-1}(g^{-1}(A)) \in \mathcal{T}$. Therefore, $(g \circ f)^{-1}(A) = f^{-1}(g^{-1}(A)) \in \mathcal{T}$.
Lemma 3.6.7. Let (Ω, F ) be a measurable space. A function f on Ω with values in a metric
space (S, d) is measurable iff g ◦ f : Ω −→ R is measurable for any real valued continuous
function g on S.

Proof. Necessity is a direct consequence of Lemma 3.6.6.

Sufficiency: for any open set U ⊂ S, let $g_U : S \to \mathbb{R}$ be the map $x \mapsto d(x, U^c)$ if $U \ne S$,
or $g_U \equiv 1$ otherwise. Notice that $g_U$ is continuous and that $U = \{g_U > 0\}$. The
measurability of $g_U \circ f$ implies that $f^{-1}(U) = (g_U \circ f)^{-1}((0, \infty)) \in \mathcal{F}$; therefore, f is
Borel–measurable.
Theorem 3.6.8. Let (Ω, F ) be a measurable space and (S, d) be a metric space. If $\{f_n\} \subset S^{\Omega}$
is a pointwise convergent sequence of measurable functions, then $f = \lim_n f_n$ is measurable.

Proof. For any real continuous function g on S, $g \circ f = \lim_n (g \circ f_n)$. The conclusion follows from Lemmas 3.6.5 and 3.6.7.

3.7. Universal completion

Suppose that (Ω, F ) is a measurable space and let B be a collection of finite measures on
F . For each µ ∈ B, denote by $\overline{\mathcal{F}}^{\mu}$ the completion of F with respect to µ. The completion
of F with respect to B is defined as
\[ \overline{\mathcal{F}}^{B} = \bigcap_{\mu \in B} \overline{\mathcal{F}}^{\mu} \]

The universal completion of F , denoted by F∗ , corresponds to the case where B is the collection of all finite measures on F . An alternative description is given in Exercise 3.10.22.
Theorem 3.7.1. Let (X, A ) and (Y, B) be measurable spaces. If f : X −→ Y is A –B
measurable, then it is also A∗ –B∗ measurable.

Proof. Fix a finite measure µ on $\mathcal{A}$. The push–forward $\mu \circ f^{-1}$ is a finite measure on $\mathcal{B}$.
Hence, if $B \in \mathcal{B}_*$, there exist sets E and F in $\mathcal{B}$ such that E ⊂ B ⊂ F and
\[ \mu \circ f^{-1}(F \setminus E) = \mu\big(f^{-1}(F) \setminus f^{-1}(E)\big) = 0. \tag{3.9} \]
The measurability of f implies that $f^{-1}(E)$ and $f^{-1}(F)$ are in $\mathcal{A}$. Since $f^{-1}(E) \subset f^{-1}(B) \subset f^{-1}(F)$,
from (3.9) we conclude that $f^{-1}(B) \in \overline{\mathcal{A}}^{\mu}$. As µ is an arbitrary finite measure on $\mathcal{A}$, it follows that
$f^{-1}(B) \in \mathcal{A}_*$.

Using Example 3.4.5 we will show that not every Lebesgue measurable set is universally
measurable.
Example 3.7.2. (Existence of a Lebesgue set that is not universally measurable). Let G be
as in Example 3.4.5. As any non–Lebesgue measurable set E is not universally measurable,
then B = G(E) is Lebesgue measurable but not universally measurable. Indeed, if B
were universally measurable, then the measurability of G would imply that $E = G^{-1}(B)$ is
universally measurable, a contradiction.

3.8. Suslin operation and projection of measurable sets*

Given a Borel set $B \subset \mathbb{R}^2$, its projection $\pi_x(B)$ onto the x–axis is not in general a Borel
set. We will prove that, for any finite measure µ on (R, B(R)), $\pi_x(B)$ belongs to the
completion $\overline{\mathcal{B}(\mathbb{R})}^{\mu}$.
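We define $\mathbb{N}^{<} := \bigcup_n \mathbb{N}^n$; that is, $\mathbb{N}^{<}$ is the set of nonempty ordered finite strings of
integers. For any $f \in \mathbb{N}^{\mathbb{N}}$ and $k \in \mathbb{N}$, we use $f|k$ to denote the string $(f(1), \ldots, f(k))$. Let
X be a nonempty set and suppose $\mathcal{E}$ is a nonempty collection of subsets of X. A Suslin
scheme or table with values on $\mathcal{E}$ is a function
\[ E : \mathbb{N}^{<} \to \mathcal{E} \]
Given a Suslin scheme $E$ on $\mathcal{E}$, the set
\[ A(E) := \bigcup_{f \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} E_{f|k} = \bigcup_{f \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} E_{(f(1),\ldots,f(k))} \]
is said to be $\mathcal{E}$–analytic or $\mathcal{E}$–Suslin. $A$, as a function on $\mathcal{E}$–schemes, is called the A–operation. The collection of all $\mathcal{E}$–Suslin sets will be denoted by $S(\mathcal{E})$.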
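Example 3.8.1. Countable unions and countable intersections of sets in $\mathcal{E}$ are $\mathcal{E}$–Suslin
sets. Let $\{E_n : n \in \mathbb{N}\} \subset \mathcal{E}$ and define the schemes $I$ and $J$ as
\[ I(\alpha_1, \ldots, \alpha_k) = E_{\alpha_1}, \qquad J(\alpha_1, \ldots, \alpha_k) = E_k \]
for all $(\alpha_1, \ldots, \alpha_k) \in \mathbb{N}^k$, $k \in \mathbb{N}$. Then $A(I) = \bigcup_{i=1}^{\infty} E_i$ and $A(J) = \bigcap_{j=1}^{\infty} E_j$.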

We will show that the Suslin operation on a collection $\mathcal{E}$ is exhaustive in the sense that
$S(S(\mathcal{E})) = S(\mathcal{E})$. First we need a technical result about sequences of integers.
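Lemma 3.8.2. The function $\beta : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ given by
\[ \beta(m, n) = 2^{m-1}(2n - 1) \tag{3.10} \]
is a bijection. Let $\varphi : \mathbb{N} \to \mathbb{N}$ and $\psi : \mathbb{N} \to \mathbb{N}$ be given by
\[ \varphi(l) = m, \quad \psi(l) = n \quad \text{if } l = 2^{m-1}(2n - 1). \]
Then $\beta \circ (\varphi, \psi)$ is the identity map on $\mathbb{N}$, and $(\varphi, \psi) \circ \beta$ is the identity map on $\mathbb{N} \times \mathbb{N}$.
The map $\Psi : \mathbb{N}^{\mathbb{N}} \times \mathbb{N}^{\mathbb{N} \times \mathbb{N}} \to \mathbb{N}^{\mathbb{N}}$ defined by
\[ \Psi(\sigma, \tau) = \beta \circ (\sigma, \tau \circ (\varphi, \psi)) \]
is a bijection. Moreover, if the first $l = \beta(m, n)$ terms of $\Psi(\sigma, \tau)$ are known, then the first
m terms of σ and the first n terms of $\tau(m, \cdot)$ are uniquely determined.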
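Proof. It is clear that every positive integer l can be uniquely expressed as in (3.10). From this it
follows that β is bijective and $\beta^{-1} = (\varphi, \psi)$; in particular,
\[ \varphi(\beta(m, n)) = m, \qquad \psi(\beta(m, n)) = n. \]
Let $(\sigma, \tau) \in \mathbb{N}^{\mathbb{N}} \times \mathbb{N}^{\mathbb{N} \times \mathbb{N}}$. The equation $\eta = \Psi(\sigma, \tau)$ has solution
\[ \sigma = \varphi \circ \eta \tag{3.11} \]
\[ \tau = \psi \circ \eta \circ \beta \tag{3.12} \]
which shows that Ψ is indeed a bijection. Finally, if the first $l = \beta(m, n)$ terms of $\Psi(\sigma, \tau)$ are known then, as
$m \le \beta(m, n)$, the values $\sigma(1), \ldots, \sigma(m)$ are obtained by using (3.11). As $\beta(m, k) \le \beta(m, n)$
for all $1 \le k \le n$, the values of $\tau(m, 1), \ldots, \tau(m, n)$ are obtained by using (3.12).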
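
The pairing in Lemma 3.8.2 is easy to experiment with. The following Python sketch implements β, its inverse (ϕ, ψ), and the map Ψ, and then recovers σ and τ from η = Ψ(σ, τ) through (3.11) and (3.12). The function names and the sample choices of `sigma` and `tau` are illustrative assumptions, not part of the text.

```python
def beta(m, n):
    """beta(m, n) = 2^(m-1) * (2n - 1) for integers m, n >= 1."""
    return 2 ** (m - 1) * (2 * n - 1)


def beta_inverse(l):
    """Return (phi(l), psi(l)): write l >= 1 as 2^(m-1) * (2n - 1)."""
    m = 1
    while l % 2 == 0:
        l //= 2
        m += 1
    return m, (l + 1) // 2


def phi(l):
    return beta_inverse(l)[0]


def psi(l):
    return beta_inverse(l)[1]


def Psi(sigma, tau):
    """Psi(sigma, tau)(l) = beta(sigma(l), tau(phi(l), psi(l)))."""
    return lambda l: beta(sigma(l), tau(phi(l), psi(l)))


# Arbitrary sample sequences with values in the positive integers.
sigma = lambda l: l % 3 + 1
tau = lambda m, n: m + 2 * n
eta = Psi(sigma, tau)

# (3.11): sigma = phi o eta, and (3.12): tau = psi o eta o beta.
print(all(phi(eta(l)) == sigma(l) for l in range(1, 50)))
print(all(psi(eta(beta(m, n))) == tau(m, n)
          for m in range(1, 6) for n in range(1, 6)))
```

Both checks only use values of η at indices up to β(m, n), which is the finite determinacy property stated at the end of the lemma.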
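Theorem 3.8.3. Let $\mathcal{E}$ be a nonempty collection of subsets of a nonempty set X.
(i) $S(S(\mathcal{E})) = S(\mathcal{E})$; in particular, $S(\mathcal{E})$ is closed under countable unions and countable
intersections.
(ii) If $\emptyset \in \mathcal{E}$ and $X \setminus A \in S(\mathcal{E})$ for each $A \in \mathcal{E}$, then the σ–algebra $\sigma(\mathcal{E})$ generated by
$\mathcal{E}$ is contained in $S(\mathcal{E})$.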

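Proof. (i) Suppose
\[ A = \bigcup_{g \in \mathbb{N}^{\mathbb{N}}} \bigcap_{j=1}^{\infty} A_{(g(1),\ldots,g(j))}, \quad \text{where} \quad A_{(g(1),\ldots,g(j))} = \bigcup_{f \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} A^{(f(1),\ldots,f(k))}_{(g(1),\ldots,g(j))} \]
and each set $A^{(f(1),\ldots,f(k))}_{(g(1),\ldots,g(j))}$ belongs to $\mathcal{E}$.

Let Ψ, β, ϕ, and ψ be as in Lemma 3.8.2. For any k–tuple $(\eta_1, \ldots, \eta_k)$, choose any
$\eta \in \mathbb{N}^{\mathbb{N}}$ such that $\eta|k = (\eta_1, \ldots, \eta_k)$ and choose functions $\sigma \in \mathbb{N}^{\mathbb{N}}$ and $\tau \in \mathbb{N}^{\mathbb{N} \times \mathbb{N}}$ so that
$\eta = \Psi(\sigma, \tau)$ as in Lemma 3.8.2. Then $\eta_\ell = \Psi(\sigma, \tau)(\ell)$ for $1 \le \ell \le k$ and, although the
functions σ and τ are not uniquely determined by $(\eta_1, \ldots, \eta_k)$, the tuples
\[ (\sigma(1), \ldots, \sigma(\varphi(k))), \qquad (\tau(\varphi(k), 1), \ldots, \tau(\varphi(k), \psi(k))) \]
are uniquely determined by $(\eta_1, \ldots, \eta_k)$. Hence, we may define
\[ B(\eta_1, \ldots, \eta_k) = A^{(\tau(\varphi(k),1),\ldots,\tau(\varphi(k),\psi(k)))}_{(\sigma(1),\ldots,\sigma(\varphi(k)))} \in \mathcal{E} \]
unambiguously. Since $k \mapsto (\varphi(k), \psi(k))$ is a bijection from $\mathbb{N}$ onto $\mathbb{N} \times \mathbb{N}$, it follows that
\[
\begin{aligned}
\bigcup_{\eta \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} B_{\eta|k}
&= \bigcup_{\sigma \in \mathbb{N}^{\mathbb{N}},\, \tau \in \mathbb{N}^{\mathbb{N} \times \mathbb{N}}} \bigcap_{k=1}^{\infty} B\big(\Psi(\sigma, \tau)(1), \ldots, \Psi(\sigma, \tau)(k)\big) \\
&= \bigcup_{\sigma \in \mathbb{N}^{\mathbb{N}},\, \tau \in \mathbb{N}^{\mathbb{N} \times \mathbb{N}}} \bigcap_{k=1}^{\infty} A^{(\tau(\varphi(k),1),\ldots,\tau(\varphi(k),\psi(k)))}_{(\sigma(1),\ldots,\sigma(\varphi(k)))}
= \bigcup_{\sigma \in \mathbb{N}^{\mathbb{N}}} \bigcup_{\tau \in \mathbb{N}^{\mathbb{N} \times \mathbb{N}}} \bigcap_{m=1}^{\infty} \bigcap_{n=1}^{\infty} A^{(\tau(m,1),\ldots,\tau(m,n))}_{(\sigma(1),\ldots,\sigma(m))} \\
&= \bigcup_{\sigma \in \mathbb{N}^{\mathbb{N}}} \bigcap_{m=1}^{\infty} \bigcup_{g \in \mathbb{N}^{\mathbb{N}}} \bigcap_{n=1}^{\infty} A^{(g(1),\ldots,g(n))}_{(\sigma(1),\ldots,\sigma(m))}
= \bigcup_{\sigma \in \mathbb{N}^{\mathbb{N}}} \bigcap_{m=1}^{\infty} A_{(\sigma(1),\ldots,\sigma(m))} = A
\end{aligned}
\]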
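This shows that $S(S(\mathcal{E})) \subset S(\mathcal{E})$. The converse inclusion is obvious. The last statement is
discussed in Example 3.8.1.

(ii) Let $\mathcal{F} = \{B \in S(\mathcal{E}) : X \setminus B \in S(\mathcal{E})\}$. We claim that $\mathcal{F}$ is a σ–algebra. Indeed, by
definition $\mathcal{F}$ is closed under complementation. By assumption $\emptyset \in \mathcal{E}$ and $X = X \setminus \emptyset \in S(\mathcal{E})$; hence,
$X \in \mathcal{F}$. If $\{F_n : n \in \mathbb{N}\} \subset \mathcal{F}$ then, by part (i), $\bigcup_n F_n \in S(\mathcal{E})$. By part (i) again, we have
that $X \setminus \bigcup_n F_n = \bigcap_n (X \setminus F_n) \in S(\mathcal{E})$, which shows that $\bigcup_n F_n \in \mathcal{F}$. Hence $\mathcal{F}$ is a
σ–algebra. As $\mathcal{E} \subset \mathcal{F}$ by assumption, we conclude that $\sigma(\mathcal{E}) \subset \mathcal{F} \subset S(\mathcal{E})$.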

Although the Suslin A–operation involves an uncountable union of sets, it turns out, quite
surprisingly, that the A–operation preserves measurability. Before we state
and prove this fact, we make a few observations related to the A–operation.
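For any $\alpha \in \mathbb{N}^n$, let
\[ S^{\alpha} := \bigcup_{\substack{f \in \mathbb{N}^{\mathbb{N}} \\ f|n \le \alpha}} \bigcap_{k=1}^{\infty} E_{f|k}, \qquad S_{\alpha} := \bigcup_{\substack{\beta \in \mathbb{N}^n \\ \beta \le \alpha}} \bigcap_{k=1}^{n} E_{\beta|k} \]
where $\beta \le \alpha$ means $\beta_k \le \alpha_k$ for all $1 \le k \le n$. It is left as an exercise (see Exercise 3.10.23)
to show that
\[ S^{\alpha} \subset S_{\alpha} \tag{3.13} \]
Moreover,
\[ S^{(n)} \nearrow A(E), \qquad S^{(\alpha,n)} \nearrow S^{\alpha} \quad \text{as } n \to \infty, \tag{3.14} \]
and for any $g \in \mathbb{N}^{\mathbb{N}}$,
\[ S_{g|n} \searrow S_g := \bigcup_{\substack{f \in \mathbb{N}^{\mathbb{N}} \\ f \le g}} \bigcap_{k=1}^{\infty} E_{f|k} \quad \text{as } n \to \infty. \tag{3.15} \]
Observe that $S_g \subset A(E)$ for any $g \in \mathbb{N}^{\mathbb{N}}$.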

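Theorem 3.8.4. Suppose (Ω, F , µ) is a measure space and let $\overline{\mathcal{F}}^{\mu}$ be the completion of $\mathcal{F}$ with respect to µ. Let
$\mathcal{E} \subset \overline{\mathcal{F}}^{\mu}$ be a nonempty collection of sets which is closed under finite unions and countable
intersections. Then $S(\mathcal{E}) \subset \overline{\mathcal{F}}^{\mu}$. If $A(E) \in S(\mathcal{E})$ and $\mu^*(A(E)) < \infty$, then
$\mu^*(A(E)) = \sup\{\mu^*(S) : S \subset A(E),\ S \in \mathcal{E}\}$.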
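Proof. Carathéodory’s extension theorem implies that $\overline{\mathcal{F}}^{\mu}$ is the collection of all $\mu^*$–measurable
sets (in the sense of (3.3)) where $\mu^*$ is the outer measure defined by
\[ \mu^*(A) := \inf\{\mu(F) : A \subset F \in \mathcal{F}\}, \qquad A \subset \Omega. \]
Let $A(E) = \bigcup_{f \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} E_{f|k}$ where $E_{(f(1),\ldots,f(k))} \in \mathcal{E}$ for all $f \in \mathbb{N}^{\mathbb{N}}$ and $k \in \mathbb{N}$. It suffices
to show that for any $F \subset \Omega$
\[ \mu^*(F) \ge \mu^*(F \cap A(E)) + \mu^*(F \setminus A(E)) \]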
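If $\mu^*(F) = \infty$ there is nothing to prove. So we may assume that $\mu^*(F) < \infty$. Let $G \in \mathcal{F}$ be
such that $F \subset G$ and $\mu^*(F) = \mu(G)$ (this is possible due to Corollary 3.3.6(i)). As
$S^{(n)} \cap G \nearrow A(E) \cap G$, by Corollary 3.3.6(ii), $\mu^*(S^{(n)} \cap G) \nearrow \mu^*(A(E) \cap G)$. Thus, for
ε > 0 there is $n_1 \in \mathbb{N}$ such that
\[ \mu^*(A(E) \cap G) - \frac{\varepsilon}{2} < \mu^*(S^{(n_1)} \cap G) \]
Inductively, once $n_1, \ldots, n_k$ have been chosen, since $S^{(n_1,\ldots,n_k,n)} \cap G \nearrow S^{(n_1,\ldots,n_k)} \cap G$,
we can find $n_{k+1} \in \mathbb{N}$ such that
\[ \mu^*(S^{(n_1,\ldots,n_k)} \cap G) - \frac{\varepsilon}{2^{k+1}} < \mu^*(S^{(n_1,\ldots,n_k,n_{k+1})} \cap G) \tag{3.16} \]
Set $g(k) := n_k$. As $S^{g|k} \subset S_{g|k}$, by adding (3.16) over k we obtain
\[ \mu^*(A(E) \cap G) - \varepsilon < \mu^*(A(E) \cap G) - \sum_{j=1}^{k} \frac{\varepsilon}{2^j} < \mu^*(S^{g|k} \cap G) \le \mu^*(S_{g|k} \cap G). \tag{3.17} \]
As $S_{g|k} \searrow S_g$ and $S_g \subset A(E)$, we have that $G \setminus S_{g|k} \nearrow G \setminus S_g \supset G \setminus A(E)$. Since $\mathcal{E}$ is closed
under finite unions and countable intersections, $S_{g|k} \in \mathcal{E} \subset \overline{\mathcal{F}}^{\mu}$. Thus
\[
\begin{aligned}
\mu^*(F) = \mu(G) &= \mu^*(S_{g|k} \cap G) + \mu^*(G \setminus S_{g|k}) \\
&\ge \mu^*(A(E) \cap G) - \varepsilon + \mu^*(G \setminus S_{g|k}) \nearrow \mu^*(A(E) \cap G) + \mu^*(G \setminus S_g) - \varepsilon \\
&\ge \mu^*(A(E) \cap G) + \mu^*(G \setminus A(E)) - \varepsilon \ge \mu^*(A(E) \cap F) + \mu^*(F \setminus A(E)) - \varepsilon,
\end{aligned}
\]
where the last inequality holds because $F \subset G$.
As ε can be made arbitrarily small, we conclude that $A(E) \in \overline{\mathcal{F}}^{\mu}$.

The last statement follows from (3.17) in the case $G = \Omega$, and $S_g \subset A(E)$.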

We now consider the problem of measurability of projections of sets onto coordinate
subspaces. To be precise, suppose (Ω, F ) and (X, B) are measurable spaces. Consider the
product space Ω × X and the projections πΩ : (ω, x) ↦ ω and πX : (ω, x) ↦ x. The σ–algebra
F ⊗ B generated by πΩ and πX is called the product σ–algebra of (Ω, F ) and (X, B);
it coincides with the σ–algebra in Ω × X generated by the boxes {A × B : A ∈ F , B ∈ B}.
The question is whether πΩ (E) is F –measurable whenever E ∈ F ⊗ B. The answer is in
general no; however, under certain regularity conditions the answer is almost affirmative: πΩ (E) is measurable with respect to the completion of F .
For any topological space X we denote by F(X) the collection of all closed sets in X.
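Theorem 3.8.5. Let (Ω, F , µ) be a measure space and X a Polish space. Let
\[ \mathcal{R} = \{A \times C : A \in \mathcal{F},\ C \in \mathbf{F}(X)\}. \]
If $E \in S(\mathcal{R})$ then $\pi_\Omega(E) \in S(\mathcal{F}) \subset \overline{\mathcal{F}}^{\mu}$.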

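Proof. Let d be a complete metric in X consistent with the topology in X. Let $D \subset X$ be a
countable dense set in X and $\{U_n : n \in \mathbb{N}\}$ be a countable arrangement of all the closed balls
of radius $2^{-1}$ with centers in D. For each $n \in \mathbb{N}$, let $\{U_{n,k} : k \in \mathbb{N}\}$ be the collection of all
closed balls of radius $2^{-2}$, with centers in D, that intersect $U_n$. By induction, for each
$(n_1, \ldots, n_k) \in \mathbb{N}^k$, let $\{U_{n_1,\ldots,n_k,n} : n \in \mathbb{N}\}$ be the collection of all closed balls of radius $2^{-(k+1)}$,
with centers in D, that intersect $U_{n_1,\ldots,n_k}$. It is easy to check that
\[ X = \bigcup_n U_n, \qquad U_{n_1,\ldots,n_k} \subset \bigcup_n U_{n_1,\ldots,n_k,n} \]
and
\[ X = \bigcup_{\alpha \in \mathbb{N}^k} U_\alpha, \qquad \operatorname{diam}(U_\alpha) \le 2^{-k+1} \]
Consequently, for each $x \in X$ there is $g \in \mathbb{N}^{\mathbb{N}}$ such that $x \in U_{g|k}$ for each
$k \in \mathbb{N}$. Hence
\[ X = \bigcup_{g \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} U_{g|k}, \qquad \Omega \times X = \bigcup_{g \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} (\Omega \times U_{g|k}) \]
Let E be a Suslin scheme in $\mathcal{R}$ so that
\[ A(E) = \bigcup_{f \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} (A_{f|k} \times C_{f|k}) \]
Then
\[
\begin{aligned}
A(E) = (\Omega \times X) \cap A(E) &= \bigcup_{g \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} (\Omega \times U_{g|k}) \cap A(E) \\
&= \bigcup_{g, f \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} A_{f|k} \times (C_{f|k} \cap U_{g|k})
\end{aligned}
\]
For any $h \in \mathbb{N}^{\mathbb{N}}$ and $k \in \mathbb{N}$ set
\[
\begin{aligned}
\hat{A}_{h(1),\ldots,h(2k-1)} \times \hat{C}_{h(1),\ldots,h(2k-1)} &= A_{h(1),\ldots,h(k)} \times \big(C_{h(1),\ldots,h(k)} \cap U_{h(k+1),\ldots,h(2k)}\big) \\
\hat{A}_{h(1),\ldots,h(2k)} \times \hat{C}_{h(1),\ldots,h(2k)} &= \hat{A}_{h(1),\ldots,h(2k-1)} \times \hat{C}_{h(1),\ldots,h(2k-1)}
\end{aligned}
\]
This way we obtain that
\[ A(E) = \bigcup_{h \in \mathbb{N}^{\mathbb{N}}} \bigcap_{k=1}^{\infty} \hat{A}_{h|k} \times \hat{C}_{h|k} \tag{3.18} \]
Notice that each $\hat{C}_{h|k}$ is closed and $\operatorname{diam}(\hat{C}_{h|k}) \le 2^{-[k/2]+1} \to 0$ as $k \to \infty$. As the class of all
closed sets is closed under countable intersections, substituting $\hat{C}_{h|k}$ by
\[ \hat{C}_{h(1)} \cap \ldots \cap \hat{C}_{h(1),\ldots,h(k)}, \]
we may assume without loss of generality that $\hat{C}_{h|k+1} \subset \hat{C}_{h|k}$ for all $h \in \mathbb{N}^{\mathbb{N}}$ and $k \in \mathbb{N}$. As
X is complete, we have that $\hat{C}_h := \bigcap_{k=1}^{\infty} \hat{C}_{h|k} \ne \emptyset$ iff $\hat{C}_{h|k} \ne \emptyset$ for all k. When $\hat{C}_{h|k} = \emptyset$ we
may redefine $\hat{A}_{h|k} = \emptyset$ without altering (3.18). Hence, if $\hat{A}_h := \bigcap_{k=1}^{\infty} \hat{A}_{h|k} \ne \emptyset$ then $\hat{C}_h \ne \emptyset$.
It follows that
\[ \pi_\Omega(A(E)) = \bigcup_{h \in \mathbb{N}^{\mathbb{N}}} \pi_\Omega(\hat{A}_h \times \hat{C}_h) = \bigcup_{h \in \mathbb{N}^{\mathbb{N}}} \hat{A}_h \]
By Theorem 3.8.4, $\pi_\Omega(A(E)) \in S(\mathcal{F}) \subset \overline{\mathcal{F}}^{\mu}$.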

Now we are ready to prove the measurability of projection of measurable sets.

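Theorem 3.8.6. (Projection Theorem) Let (Ω, F , µ) be a measure space and let (X, B)
be a Polish space with its Borel σ–algebra. For any $E \in \overline{\mathcal{F}}^{\mu} \otimes \mathcal{B}$, $\pi_\Omega(E) \in \overline{\mathcal{F}}^{\mu}$.

Proof. Let $\mathcal{R}$ be as in Theorem 3.8.5 with $(\Omega, \overline{\mathcal{F}}^{\mu}, \mu^*)$ in place of (Ω, F , µ). As any open
set in X is a countable union of closed subsets of X, the complement of any set in $\mathcal{R}$ belongs
to $S(\mathcal{R})$. As $\emptyset \in \mathcal{R}$, Theorem 3.8.3 implies that $\sigma(\mathcal{R}) \subset S(\mathcal{R})$. The conclusion follows from
Theorem 3.8.5 and the fact that $\sigma(\mathcal{R}) = \overline{\mathcal{F}}^{\mu} \otimes \mathcal{B}$.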

3.9. Measurable Isomorphism Theorem*

In this section we present a theoretical result that states that Borel sets of Polish
spaces that have the same cardinality are isomorphic. In particular, every uncountable Borel
set of a Polish space is measurable isomorphic to (R, B(R)). This result provides a useful
tool in probability theory by allowing the use of (R, B(R)) as a canonical space while studying
probability measures on Polish spaces. We will give a proof of the isomorphism theorem
through a sequence of general lemmas and some technical results about Polish spaces.
Definition 3.9.1. Two measurable spaces (S, S ) and (T, T ) are measurable isomor-
phic, or simply isomorphic, whenever there is a bijection φ : S → T such that φ is S –T
measurable, and φ−1 is T –S measurable.

The next two general results show that measurable isomorphic partitions lead to isomorphic
spaces, and that isomorphic spaces can be partitioned into isomorphic pieces.
Lemma 3.9.2. Suppose $\{E_n : n \in \mathbb{N}\}$ and $\{F_n : n \in \mathbb{N}\}$ are sequences of measurable sets
in (S, S ) and (T, T ) respectively. If $E_i \cap E_j = F_i \cap F_j = \emptyset$ for any $i \ne j$, and $E_n$ and $F_n$
are measurable isomorphic for each n, then $\bigcup_n E_n$ and $\bigcup_n F_n$ are measurable isomorphic.

Proof. For each $n \in \mathbb{N}$ let $f_n$ be a measurable isomorphism between $E_n$ and $F_n$. Define
$F : \bigcup_n E_n \to \bigcup_n F_n$ as $x \mapsto f_n(x)$ if $x \in E_n$. This is a well defined bijective function. To
prove measurability, notice that any measurable set $B \subset \bigcup_n F_n$ is of the form $B = \bigcup_n B_n$, where
$B_n$ is a measurable subset of $F_n$. Hence $F^{-1}(B) = F^{-1}(\bigcup_n B_n) = \bigcup_n f_n^{-1}(B_n)$, which implies
that F is measurable. A similar argument proves that $F^{-1}$ is measurable.
Lemma 3.9.3. Suppose E and F are measurable sets in (S, S ), and suppose E and F are
measurable isomorphic. If E = E1 ∪ E2 and E1 ∩ E2 = ∅ then, there are disjoint sets F1
and F2 such that F = F1 ∪ F2 and Ei and Fi , i = 1, 2, are isomorphic.
Proof. Let g be an isomorphism between E and F . Let Fi = g(Ei ). Clearly Ei and Fi are
isomorphic, F1 ∩ F2 = ∅, and F1 ∪ F2 = F .
Theorem 3.9.4. Let A, B, and C be measurable sets in (S, S ), and suppose A ⊂ B ⊂ C. If
A and C are isomorphic, then B and C are isomorphic.

Proof. Let $E_0 = C$, and $E_1 = A$. Define $D_1 = E_0 \setminus E_1$ so that $E_0 = E_1 \cup D_1$. By
Lemma 3.9.3 there are disjoint measurable sets $E_2 \subset E_1$ and $D_2 \subset E_1$ such that $E_1 = E_2 \cup D_2$, $E_1$
is isomorphic to $E_2$, and $D_1$ is isomorphic to $D_2$. Suppose measurable sets $\{E_j : 1 \le j \le n\}$
and $\{D_j : 1 \le j \le n\}$ have been constructed such that
\[ E_{n-1} = E_n \cup D_n, \qquad E_n \subset E_{n-1}, \qquad D_n \subset E_{n-1}, \]
$E_{n-1}$ is isomorphic to $E_n$, and $D_n$ is isomorphic to $D_{n-1}$. Applying Lemma 3.9.3 once
more, we obtain disjoint measurable sets $E_{n+1} \subset E_n$ and $D_{n+1} \subset E_n$ such that $E_n = E_{n+1} \cup D_{n+1}$,
$E_n$ is isomorphic to $E_{n+1}$ and $D_n$ is isomorphic to $D_{n+1}$.
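By construction the sets $\{D_n : n \in \mathbb{N}\}$ are pairwise disjoint measurable sets, $\{E_n : n \in \mathbb{Z}_+\}$ is a
decreasing sequence of measurable sets, and
\[ E_0 = \bigcap_{n=1}^{\infty} E_n \cup \bigcup_{n=1}^{\infty} D_n \]
Now we partition the sets $D_n$ as follows. Set $F_1 = B \setminus A = B \setminus E_1$ and $G_1 = C \setminus B = E_0 \setminus B$.
Then $D_1 = F_1 \cup G_1$. Since all sets $D_n$ are isomorphic, for each $n \in \mathbb{N}$ there are disjoint measurable
sets $F_n \subset D_n$ and $G_n \subset D_n$ such that $D_n = F_n \cup G_n$, $F_n$ is isomorphic to $F_1$, and $G_n$ is
isomorphic to $G_1$. Notice that
\[
\begin{aligned}
E_0 &= \bigcap_{n=1}^{\infty} E_n \cup \bigcup_{n=1}^{\infty} (F_n \cup G_n) = \bigcap_{n=1}^{\infty} E_n \cup \bigcup_{n=1}^{\infty} F_n \cup \bigcup_{n=1}^{\infty} G_n \\
B = E_1 \cup F_1 &= \bigcap_{n=1}^{\infty} E_n \cup F_1 \cup \bigcup_{n=2}^{\infty} D_n = \bigcap_{n=1}^{\infty} E_n \cup \bigcup_{n=1}^{\infty} F_n \cup \bigcup_{n=2}^{\infty} G_n
\end{aligned}
\]
As $G_n$ is isomorphic to $G_{n+1}$ for each n, an application of Lemma 3.9.2 implies that C and B are isomorphic.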

Let $\{(X_i, \mathcal{F}_i) : i \in I\}$ be a collection of measurable spaces. As in the case of a product
of topological spaces, we define the product σ–algebra on $X = \prod_{i \in I} X_i$ as the minimal
σ–algebra for which the projections $p_i$ are measurable. More precisely,
\[ \bigotimes_{i \in I} \mathcal{F}_i := \sigma\big(p_i^{-1}(B) : i \in I,\ B \in \mathcal{F}_i\big). \]

When each $(X_i, \tau_i)$ is a topological space and $\mathcal{F}_i$ is the Borel σ–algebra $\mathcal{B}(X_i)$ then, the
product σ–algebra $\bigotimes_i \mathcal{B}(X_i)$ is contained in the Borel σ–algebra $\mathcal{B}(\prod_i X_i)$ generated by
the product topology τ. To check this is the case, observe that for any $i \in I$ and open set
$U \in \tau_i$, $p_i^{-1}(U) \in \tau \subset \mathcal{B}(\prod_{i \in I} X_i)$. It follows that $\{p_i^{-1}(B) : i \in I,\ B \in \mathcal{B}(X_i)\} \subset \mathcal{B}(\prod_{i \in I} X_i)$,
and so $\bigotimes_i \mathcal{B}(X_i) \subset \mathcal{B}(\prod_{i \in I} X_i)$. When I is countable and each topological space $X_i$ is
nice (for instance, second countable, as in the next theorem), the product σ–algebra and the Borel σ–algebra generated by the product
topology coincide.
Theorem 3.9.5. Let $\{(X_n, \mathcal{B}(X_n)) : n \in \mathbb{N}\}$ be a sequence of second countable topological
spaces with the corresponding Borel σ–algebras. Then, the product σ–algebra $\bigotimes_n \mathcal{B}(X_n)$ and
the Borel σ–algebra $\mathcal{B}(\prod_n X_n)$ generated by the product topology coincide.

Proof. It is enough to prove that $\mathcal{B}(\prod_n X_n) \subset \bigotimes_n \mathcal{B}(X_n)$. As each $(X_n, \tau_n)$ is second
countable, the product topology τ is second countable. Moreover, if $B_n$ is a countable basis
for $\tau_n$, then $A = \{p_n^{-1}(U) : n \in \mathbb{N},\ U \in B_n\}$ is a countable subbasis for τ, and the collection
of finite intersections in A forms a countable basis B for τ. Notice that $B \subset \bigotimes_n \mathcal{B}(X_n)$.
As each open set in τ is a countable union of sets in B, it follows that $\tau \subset \bigotimes_n \mathcal{B}(X_n)$.
Therefore $\mathcal{B}(\prod_n X_n) \subset \bigotimes_n \mathcal{B}(X_n)$.
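Lemma 3.9.6. Suppose $\{f_n : (A_n, \mathcal{A}_n) \to (B_n, \mathcal{B}_n)\}$ is a sequence of measurable functions.
Then the function $F : (\prod_n A_n, \bigotimes_n \mathcal{A}_n) \to (\prod_n B_n, \bigotimes_n \mathcal{B}_n)$ given by
\[ (x_n : n \in \mathbb{N}) \mapsto (f_n(x_n) : n \in \mathbb{N}) \]
is measurable.

Proof. For any n and any $B \in \mathcal{B}_n$ denote by $\langle B \rangle_n = \{y \in \prod_m B_m : y_n \in B\}$, and define $\langle A \rangle_n \subset \prod_m A_m$ similarly for $A \in \mathcal{A}_n$. It suffices to
show that $F^{-1}(\langle B \rangle_n) \in \bigotimes_n \mathcal{A}_n$. It is easy to check that $F^{-1}(\langle B \rangle_n) = \langle f_n^{-1}(B) \rangle_n$. Therefore
F is measurable.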
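Lemma 3.9.7. There is a Borel subset $E \subset \{0, 1\}^{\mathbb{N}}$ that is isomorphic to [0, 1].

Proof. The function $d(x, y) = \sum_n 2^{-n} |x_n - y_n|$ on $\{0, 1\}^{\mathbb{N}}$ is a metric compatible with the
product topology on $\{0, 1\}^{\mathbb{N}}$. As any number in [0, 1] has a binary expansion, the map
$\tau : \{0, 1\}^{\mathbb{N}} \to [0, 1]$ given by
\[ x \mapsto \sum_{n=1}^{\infty} \frac{x_n}{2^n} \]
is surjective. τ is continuous since $|\tau(x) - \tau(y)| \le d(x, y)$. The set $E = \{x \in \{0, 1\}^{\mathbb{N}} : x_n = 0 \text{ i.o.}\} \cup \{\mathbf{1}\}$,
where $\mathbf{1} = (1, 1, \ldots)$, is a Borel set in $\{0, 1\}^{\mathbb{N}}$, and the restriction of τ to E is a bijection
since every number in [0, 1) has a unique binary expansion with an infinite number of 0 bits.
It remains to show that $\tau^{-1} : [0, 1] \to E$ is measurable. Let $B_j = \{x \in \{0, 1\}^{\mathbb{N}} : x_j = 1\}$.
Then
\[ (\tau^{-1})^{-1}(B_j \cap E) = \tau(B_j \cap E) = \{1\} \cup \bigcup_{k=1}^{2^{j-1}} \Big[\frac{2k - 1}{2^j}, \frac{2k}{2^j}\Big) \]
is a Borel subset of [0, 1]. Since the sets $B_j \cap E$ generate the Borel σ–algebra of E, $\tau^{-1}$ is measurable.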
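
The identification of τ(Bj ∩ E) with a union of dyadic intervals can be checked numerically. The following Python sketch, using exact rational arithmetic, computes the binary expansion with infinitely many 0 bits of a number in [0, 1) and tests when its j-th bit equals 1. The function names are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction


def digits(y, n):
    """First n binary digits of y in [0, 1), taking the expansion with
    infinitely many 0 bits (the greedy expansion)."""
    y = Fraction(y)
    out = []
    for _ in range(n):
        y *= 2
        bit = int(y >= 1)
        out.append(bit)
        y -= bit
    return out


def tau_partial(bits):
    """Partial sum of tau(x) = sum_n x_n / 2^n over the given leading bits."""
    return sum(Fraction(b, 2 ** k) for k, b in enumerate(bits, start=1))


def in_dyadic_union(y, j):
    """True if y lies in [(2k-1)/2^j, 2k/2^j) for some k = 1, ..., 2^(j-1)."""
    return any(Fraction(2 * k - 1, 2 ** j) <= y < Fraction(2 * k, 2 ** j)
               for k in range(1, 2 ** (j - 1) + 1))


grid = [Fraction(m, 64) for m in range(64)]
print(digits(Fraction(1, 3), 4))
print(all((digits(y, j)[j - 1] == 1) == in_dyadic_union(y, j)
          for y in grid for j in range(1, 6)))
```

The second check confirms, on a dyadic grid, that the j-th bit of the chosen expansion is 1 exactly on the intervals appearing in the proof.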
Lemma 3.9.8. There is a Borel set $E_1 \subset \{0, 1\}^{\mathbb{N}}$ that is isomorphic to $[0, 1]^{\mathbb{N}}$.

Proof. Let E be as in Lemma 3.9.7 and define the map $\tau' : (\{0, 1\}^{\mathbb{N}})^{\mathbb{N}} \to [0, 1]^{\mathbb{N}}$ by
\[ (g(n, \cdot) : n \in \mathbb{N}) \mapsto (\tau(g(n, \cdot)) : n \in \mathbb{N}). \]
Clearly τ′ is surjective and continuous, and its restriction to $E^{\mathbb{N}}$ is a bijection. By Lemma 3.9.6
and Lemma 3.9.7, the restriction of τ′ to $E^{\mathbb{N}}$ is an isomorphism.
The conclusion of the Lemma follows from the fact that $(\{0, 1\}^{\mathbb{N}})^{\mathbb{N}}$ and $\{0, 1\}^{\mathbb{N}}$ are homeomorphic
(see Example 2.8.9).
Theorem 3.9.9. Let X be a Polish space, and let B ∈ B(X). There exists a Borel set
$E_B \subset \{0, 1\}^{\mathbb{N}}$ that is isomorphic to B.

Proof. By Corollary 2.9.4, X is homeomorphic to a $G_\delta$ set $U \subset [0, 1]^{\mathbb{N}}$. By Lemma 3.9.8
there is a Borel set $E_1 \subset \{0, 1\}^{\mathbb{N}}$ to which $[0, 1]^{\mathbb{N}}$ is isomorphic. Hence, the image of B in U is isomorphic to
a Borel set $E_B$ contained in $E_1$, and so B is isomorphic to $E_B$.
Theorem 3.9.10. For any Polish space X, there exists a continuous surjection $\varphi : \mathbb{N}^{\mathbb{N}} \to X$.

Proof. Let d be a complete metric for X. Let $\mathbb{N}^{<} = \bigcup_{k \in \mathbb{N}} \mathbb{N}^k$. We will construct a family
$\{C(\mathbf{n}) : \mathbf{n} \in \mathbb{N}^{<}\}$ of subsets of X such that for any $k \in \mathbb{N}$ and $\mathbf{n} = (n_1, \ldots, n_k) \in \mathbb{N}^k$,
(a) $C(\mathbf{n})$ is a non–empty closed set.
(b) $\operatorname{diam} C(\mathbf{n}) \le \frac{1}{k}$.
(c) $C(n_1, \ldots, n_k) = \bigcup_{n_{k+1} \in \mathbb{N}} C(n_1, \ldots, n_k, n_{k+1})$.
(d) $X = \bigcup_{n_1 \in \mathbb{N}} C(n_1)$.
We proceed by induction. Let $D = \{x_n : n \in \mathbb{N}\}$ be a dense sequence in X, and for each
r > 0 let $B(x_n; r)$ be the closed ball around $x_n$ with radius r. For k = 1, we can write
$X = \bigcup_{n_1 \in \mathbb{N}} C(n_1)$ where $C(n_1) = B(x_{n_1}; \frac{1}{2})$, so (a), (b) and (d) hold.
Suppose that for some $k \ge 1$, sets $C(\mathbf{n})$ with $\mathbf{n} \in \mathbb{N}^k$ satisfying (a) and (b) have been defined.
Taking intersections of $C(\mathbf{n})$ with the balls $B(x_j; \frac{1}{2(k+1)})$, $j \in \mathbb{N}$, and replacing empty intersections with a
common non–empty closed subset of $C(\mathbf{n})$ of diameter at most $\frac{1}{k+1}$ (for instance, one of the non–empty intersections), we can write
\[ C(n_1, \ldots, n_k) = \bigcup_{n_{k+1} \in \mathbb{N}} C(n_1, \ldots, n_k, n_{k+1}) \]
with $\operatorname{diam} C(n_1, \ldots, n_k, n_{k+1}) \le \frac{1}{k+1}$. This completes the inductive step in our construction.

For any $\mathbf{n} = (n_k : k \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}}$ consider the collection $\{C(n_1, \ldots, n_k) : k \in \mathbb{N}\}$. This is
a decreasing family of non–empty closed sets with $\operatorname{diam} C(n_1, \ldots, n_k) \to 0$ as $k \to \infty$. Hence, by completeness,
$\bigcap_k C(n_1, \ldots, n_k)$ has exactly one point $\varphi(\mathbf{n})$. This defines a function $\varphi : \mathbb{N}^{\mathbb{N}} \to X$. Properties
(c) and (d) imply that ϕ is surjective.

It remains to show that ϕ is continuous. Notice that if $\mathbf{m}, \mathbf{n} \in \mathbb{N}^{\mathbb{N}}$ are such that $m_1 = n_1, \ldots, m_k = n_k$
for some $k \in \mathbb{N}$, then $\varphi(\mathbf{m})$ and $\varphi(\mathbf{n})$ belong to $C(n_1, \ldots, n_k)$, and so
$d(\varphi(\mathbf{m}), \varphi(\mathbf{n})) \le \frac{1}{k}$. This implies that ϕ is continuous.
Corollary 3.9.11. Suppose (X, τ ) is a Polish space. The collection S of sets B ⊂ X for
which there exists a continuous surjection $\varphi_B : \mathbb{N}^{\mathbb{N}} \to B$ contains the closed sets, the open
sets, and it is closed under countable intersections and countable unions of disjoint sets. In
particular, B(X) ⊂ S .

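Proof. S contains the closed and open sets: Every closed subset of X is a Polish space,
and by Alexandroff’s lemma, every open set is a Polish subspace of X. From Theorem 3.9.10
it follows that S contains all closed and open sets.
S is closed under countable disjoint unions: Suppose $\{A_n : n \in \mathbb{N}\} \subset$ S . For each $n \in \mathbb{N}$,
there exists a continuous function $\varphi_n : \{n\} \times \mathbb{N}^{\mathbb{N}} \to X$ with $\varphi_n(\{n\} \times \mathbb{N}^{\mathbb{N}}) = A_n$. Since the
sets $\{n\} \times \mathbb{N}^{\mathbb{N}}$, $n \in \mathbb{N}$, form a partition of $\mathbb{N} \times \mathbb{N}^{\mathbb{N}} = \mathbb{N}^{\mathbb{N}}$ into open sets, we have that the function $\varphi : \mathbb{N}^{\mathbb{N}} \to \bigcup_n A_n$
defined by $\mathbf{n} \mapsto \varphi_m(\mathbf{n})$ if $\mathbf{n} \in \{m\} \times \mathbb{N}^{\mathbb{N}}$ is continuous and $\varphi(\mathbb{N}^{\mathbb{N}}) = \bigcup_n A_n$, which shows
that $\bigcup_n A_n \in$ S .
S is closed under countable intersections: Each subspace $A_n$ is a separable metric space, and so
$\mathcal{B}(\prod_n A_n) = \bigotimes_n \mathcal{B}(A_n)$. Notice that $\Delta = \{x \in \prod_n A_n : x_n = x_1,\ n \in \mathbb{N}\}$ is a closed subset
of $\prod_n A_n$. The function $\Phi : (\mathbb{N}^{\mathbb{N}})^{\mathbb{N}} \to \prod_n A_n$ given by $(\mathbf{n}_k : k \in \mathbb{N}) \mapsto (\varphi_k(\mathbf{n}_k) : k \in \mathbb{N})$
is continuous; hence, $D := \Phi^{-1}(\Delta)$ is a closed subset of the Polish space $(\mathbb{N}^{\mathbb{N}})^{\mathbb{N}}$. Then,
there is a continuous surjection $G : \mathbb{N}^{\mathbb{N}} \to D$. Consequently, $p_1 \circ \Phi \circ G : \mathbb{N}^{\mathbb{N}} \to \bigcap_n A_n$ is a
continuous surjection.
The conclusion follows from Theorem 3.1.13.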
Lemma 3.9.12. Suppose X is a separable metric space. There exists a countable set N ⊂ X
such that for any x ∈ X \ N and open set Ux containing x, Ux ∩ (X \ N ) is uncountable.
Remark 3.9.13. Points in X \ N are called condensation points of X.

Proof. Let N be the set of points x in X which have a neighborhood $N_x$ that is at most
countable. Since X is separable, there is a set $\{x_k\} \subset N$ (at most countable) such that
$N = \bigcup_k N_{x_k}$. Hence N is countable and satisfies the conditions in the Lemma.

Theorem 3.9.14. Let X be a Polish space. For any uncountable Borel set B ⊂ X, there
is a compact set K ⊂ B that is isomorphic to $\{0, 1\}^{\mathbb{N}}$.

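Proof. Let φ be a continuous surjection from $\mathbb{N}^{\mathbb{N}}$ onto B. For each x ∈ B, choose
$\mathbf{n}_x \in \varphi^{-1}(\{x\})$. Let $A = \{\mathbf{n}_x : x \in B\}$, so that φ is injective on A. As a subspace of $\mathbb{N}^{\mathbb{N}}$, A is an uncountable separable metric
space, and so by Lemma 3.9.12 there is a countable set N ⊂ A such that if x ∈ D := A \ N and $U_x$ is an open
neighborhood of x, $U_x \cap D$ is uncountable.
We will construct by induction a family of closed sets $\{A_{\mathbf{n}} : k \in \mathbb{N},\ \mathbf{n} \in \{0, 1\}^k\}$ in $\mathbb{N}^{\mathbb{N}}$ such that
(a) $A_{(\mathbf{n}:n_{k+1})} \subset A_{\mathbf{n}}$ for all $\mathbf{n} \in \{0, 1\}^k$, $k \in \mathbb{N}$, and $n_{k+1} \in \{0, 1\}$.
(b) $\varphi(A_{\mathbf{n}}) \cap \varphi(A_{\mathbf{m}}) = \emptyset$, for all distinct $\mathbf{n}, \mathbf{m} \in \{0, 1\}^k$, $k \in \mathbb{N}$.
(c) $\operatorname{diam}(A_{\mathbf{n}}) \le \frac{1}{k}$ for all $\mathbf{n} \in \{0, 1\}^k$, $k \in \mathbb{N}$.
(d) $\operatorname{int}(A_{\mathbf{n}}) \cap D$ is uncountable for all $\mathbf{n} \in \{0, 1\}^k$, $k \in \mathbb{N}$.

As D is uncountable, we can choose $x_0$, $x_1$ in D such that $x_0 \ne x_1$. Then $\varphi(x_0) \ne \varphi(x_1)$ and
there are two disjoint open sets $U_j$ with $\varphi(x_j) \in U_j$. By continuity, there are closed
balls $A_j$ in $\mathbb{N}^{\mathbb{N}}$ such that $\operatorname{diam}(A_j) \le 1$, $x_j \in A_j$ and $\varphi(A_j) \subset U_j$. The collection $\{A_j : j \in \{0, 1\}\}$
satisfies (b)–(d); in particular, $\operatorname{int}(A_j) \cap D$ is uncountable.
Suppose that for $k \ge 1$ sets $\{A_{\mathbf{n}} : \mathbf{n} \in \{0, 1\}^\ell,\ 1 \le \ell \le k\}$ satisfying (a)–(d) have been
defined. Let $\mathbf{n} \in \{0, 1\}^k$. As $\operatorname{int}(A_{\mathbf{n}}) \cap D$ is uncountable, we can choose $x_{(\mathbf{n}:0)}$ and $x_{(\mathbf{n}:1)}$
in $\operatorname{int}(A_{\mathbf{n}}) \cap D$ with $x_{(\mathbf{n}:0)} \ne x_{(\mathbf{n}:1)}$. Repeating the argument used above, there are closed
balls $A_{(\mathbf{n}:j)}$ in $\mathbb{N}^{\mathbb{N}}$ such that $A_{(\mathbf{n}:j)} \subset \operatorname{int}(A_{\mathbf{n}})$, $\operatorname{diam}(A_{(\mathbf{n}:j)}) \le \frac{1}{k+1}$,
$x_{(\mathbf{n}:j)} \in A_{(\mathbf{n}:j)}$, $\varphi(A_{(\mathbf{n}:0)}) \cap \varphi(A_{(\mathbf{n}:1)}) = \emptyset$, and $\operatorname{int}(A_{(\mathbf{n}:j)}) \cap D$ is uncountable. This concludes
our inductive construction.
By design, for each $\mathbf{n} = (n_k : k \in \mathbb{N}) \in \{0, 1\}^{\mathbb{N}}$, $\bigcap_k A_{(n_1,\ldots,n_k)}$ has exactly one point $g(\mathbf{n})$.
Thus $g : \{0, 1\}^{\mathbb{N}} \to \mathbb{N}^{\mathbb{N}}$ is a well defined function. We will show that g is injective
and continuous. For any two distinct points $\mathbf{n}, \mathbf{m}$ in $\{0, 1\}^{\mathbb{N}}$, there is $k \in \mathbb{N}$ such that
$(n_1, \ldots, n_k) \ne (m_1, \ldots, m_k)$. It follows from (b) that $A_{(n_1,\ldots,n_k)} \cap A_{(m_1,\ldots,m_k)} = \emptyset$, and thus
$\{g(\mathbf{n})\} = \bigcap_j A_{(n_1,\ldots,n_j)} \ne \bigcap_j A_{(m_1,\ldots,m_j)} = \{g(\mathbf{m})\}$.
If $\mathbf{n}$ and $\mathbf{m}$ are points in $\{0, 1\}^{\mathbb{N}}$ with $(n_1, \ldots, n_k) = (m_1, \ldots, m_k)$, then $g(\mathbf{n})$ and $g(\mathbf{m})$
belong to $A_{(n_1,\ldots,n_k)}$. Hence $d(g(\mathbf{n}), g(\mathbf{m})) \le \frac{1}{k}$. This shows that g is continuous; hence,
$C := g(\{0, 1\}^{\mathbb{N}})$ is a compact subset of $\mathbb{N}^{\mathbb{N}}$ which is homeomorphic to $\{0, 1\}^{\mathbb{N}}$.
We now show that the restriction of φ to C is injective. The continuity of φ and the
compactness of C will then imply that C and K := φ(C) are homeomorphic. If x, y are
distinct points in C then, $x = g(\mathbf{n})$ and $y = g(\mathbf{m})$ for distinct points $\mathbf{n}, \mathbf{m}$ in $\{0, 1\}^{\mathbb{N}}$. Then
$(n_1, \ldots, n_k) \ne (m_1, \ldots, m_k)$ for some $k \in \mathbb{N}$, and so $x \in A_{(n_1,\ldots,n_k)}$, $y \in A_{(m_1,\ldots,m_k)}$. It
follows from (b) that $\varphi(A_{(n_1,\ldots,n_k)}) \cap \varphi(A_{(m_1,\ldots,m_k)}) = \emptyset$. Therefore $\varphi(x) \ne \varphi(y)$.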
Theorem 3.9.15. (Isomorphism theorem) Let X and Y be Polish spaces, and let A ∈ B(X)
and B ∈ B(Y ). A and B are isomorphic iff A and B have the same cardinality.

Proof. Necessity is obvious.

Sufficiency: If A and B are finite, or if A and B are countable then, the result is obvious.
Suppose both A and B are uncountable. We will show that both A and B are isomorphic to $\{0, 1\}^{\mathbb{N}}$. By
Theorem 3.9.9 there is a Borel set $E_A \subset \{0, 1\}^{\mathbb{N}}$ which is measurable isomorphic to A.
By Theorem 3.9.14, there are isomorphic compact sets $C \subset E_A$ and $K \subset A$ such that C
and $\{0, 1\}^{\mathbb{N}}$ are isomorphic. As $C \subset E_A \subset \{0, 1\}^{\mathbb{N}}$, we conclude from Theorem 3.9.4 that
$E_A$, and thus A, is measurable isomorphic to $\{0, 1\}^{\mathbb{N}}$. Similar arguments show that B is
isomorphic to $\{0, 1\}^{\mathbb{N}}$.

A set A ⊂ X for which there is a continuous map $\varphi_A : \mathbb{N}^{\mathbb{N}} \to X$ with $\varphi_A(\mathbb{N}^{\mathbb{N}}) = A$
is called an analytic set. Corollary 3.9.11 states that each Borel subset of a Polish space
(X, τ ) is analytic. The following result makes the link between analytic sets and the Suslin
operation.
Theorem 3.9.16. Let (X, τ ) be a Polish space. A set A ⊂ X is analytic iff A = A(I)
where I is a Suslin scheme $\{E_{f|k} : f \in \mathbb{N}^{\mathbb{N}}\}$ of closed sets such that for any $f \in \mathbb{N}^{\mathbb{N}}$
(i) $E_{(f(1),\ldots,f(k+1))} \subset E_{(f(1),\ldots,f(k))}$.
(ii) $\operatorname{diam}(E_{(f(1),\ldots,f(k))}) \to 0$ as $k \to \infty$.

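Proof. Let ρ be a Polish metric on (X, τ ). Consider the function on $\mathbb{N}^{\mathbb{N}} \times \mathbb{N}^{\mathbb{N}}$ defined by
\[ t(f, g) = \begin{cases} 0 & \text{if } f = g \\ \frac{1}{k} & \text{if } k = \min\{j : f(j) \ne g(j)\} \end{cases} \]
It is left as an exercise to check that $t(f, g) \le t(f, h) \vee t(h, g)$ for all $f, g, h \in \mathbb{N}^{\mathbb{N}}$, whence
it follows that t is a metric. In this metric, $B(f, \frac{1}{m}) = \{f(1)\} \times \ldots \times \{f(m)\} \times \mathbb{N}^{\mathbb{N}}$ and so
t generates the product topology on $\mathbb{N}^{\mathbb{N}}$. Moreover, comparing t with the product metric
$d(f, g) = \sum_n \frac{|f(n) - g(n)| \wedge 1}{2^n}$, we have that $d(f, g) \le t(f, g)$ and so t is a Polish metric on $\mathbb{N}^{\mathbb{N}}$.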
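
The two inequalities used for t can be checked by brute force on short prefixes. The Python sketch below treats finite tuples as stand-ins for elements of the sequence space (an assumption made for illustration, so the product metric is truncated) and verifies the ultrametric inequality together with the comparison d ≤ t.

```python
from fractions import Fraction
from itertools import product


def t(f, g):
    """t(f, g) = 1/k, with k the first (1-based) index where f and g differ;
    0 if the tuples agree."""
    for k, (a, b) in enumerate(zip(f, g), start=1):
        if a != b:
            return Fraction(1, k)
    return Fraction(0)


def d(f, g):
    """Truncated product metric: sum over n of min(|f(n) - g(n)|, 1) / 2^n."""
    return sum(Fraction(min(abs(a - b), 1), 2 ** n)
               for n, (a, b) in enumerate(zip(f, g), start=1))


prefixes = list(product(range(3), repeat=3))
ultrametric = all(t(f, g) <= max(t(f, h), t(h, g))
                  for f in prefixes for g in prefixes for h in prefixes)
dominated = all(d(f, g) <= t(f, g) for f in prefixes for g in prefixes)
print(ultrametric, dominated)
```

The ultrametric inequality is stronger than the triangle inequality, which is why the balls of t are exactly the cylinder sets described above.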

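Suppose A is the image of a continuous function $\varphi : \mathbb{N}^{\mathbb{N}} \to X$. Define the
Suslin scheme I by setting, for each $f \in \mathbb{N}^{\mathbb{N}}$ and $k \in \mathbb{N}$,
\[ E_{f|k} = \varphi(\{f(1)\} \times \ldots \times \{f(k)\} \times \mathbb{N}^{\mathbb{N}}). \]
Clearly the scheme thus defined satisfies (i). For any $f \in \mathbb{N}^{\mathbb{N}}$ we have that $\varphi(f) \in \bigcap_k E_{f|k}$.
By continuity, given ε > 0, there exists k > 0 such that if $t(f, g) < \frac{1}{k}$ then, $\rho(\varphi(f), \varphi(g)) < \varepsilon$.
It follows that for all $m \ge k$, $\operatorname{diam}_\rho(E_{f|m}) \le 2\varepsilon$. This shows that (ii) holds. Consequently
$\{\varphi(f)\} = \bigcap_k E_{f|k}$, and so $A = \bigcup_{f \in \mathbb{N}^{\mathbb{N}}} \bigcap_k E_{f|k}$.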

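Conversely, suppose $I = \{E_{f|k} : f \in \mathbb{N}^{\mathbb{N}}\}$ is a Suslin scheme of closed sets satisfying (i) and
(ii), and that A = A(I). As (X, ρ) is complete, $E_f^* := \bigcap_k E_{f|k}$ admits at most one point,
and $E_f^*$ is empty only when there exists $k \in \mathbb{N}$ for which $E_{f|k} = \emptyset$. Consider the set
\[ M = \{f \in \mathbb{N}^{\mathbb{N}} : E_{f|k} \ne \emptyset \text{ for all } k \in \mathbb{N}\} \]
Then $A = \bigcup_{f \in M} \bigcap_k E_{f|k}$, and there is a map $\varphi_A : M \to X$ that sends f to the unique point of $\bigcap_k E_{f|k}$.
Notice that if $t(f, g) < \frac{1}{k}$ then $\varphi_A(g) \in E_{g|k} = E_{f|k}$. Since $\operatorname{diam}_\rho(E_{f|k}) \to 0$ as $k \to \infty$, continuity of
$\varphi_A$ follows.
We claim that M is a closed subset of $\mathbb{N}^{\mathbb{N}}$. Let $f \in M^c$, so that $E_{f|k} = \emptyset$ for some $k \in \mathbb{N}$.
If $t(f, g) < \frac{1}{k}$ then $E_{g|k} = E_{f|k} = \emptyset$. Hence $B_t(f; \frac{1}{k}) \subset M^c$. Being a closed set, M is itself a
Polish space, and so by Theorem 3.9.10 there exists a continuous surjection $G : \mathbb{N}^{\mathbb{N}} \to M$.
The map $\varphi_A \circ G$ is a continuous map with $\varphi_A \circ G(\mathbb{N}^{\mathbb{N}}) = A$.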
Remark 3.9.17. Theorem 3.8.4 and the regular Suslin representation of analytic sets
imply that for any Polish space (X, τ ) and measure µ on (X, B(X)), the analytic sets are
included in the µ–completion $\overline{\mathcal{B}(X)}^{\mu}$.

3.10. Exercises
Exercise 3.10.1. Suppose that {Fi }i∈I is an arbitrary collection of algebras (or σ–algebras),
show that ∩i∈I Fi is also an algebra (respectively a σ–algebra).
Exercise 3.10.2. Let Ω be an uncountable set. Consider the collection A of all subsets
A ⊂ Ω such that either A or Ω \ A is countable. Is A a σ–algebra? (Here by countable we
mean either finite or countably infinite).
Exercise 3.10.3. Let C be a collection of subsets of Ω. Show that for each A ∈ σ(C) there
is a countable sub-family C0 of C such that A ∈ σ(C0 ). (Hint: Let F be the union of all
σ–algebras σ(L) where L runs over all the countable sub-families of C, and show that F is
a σ–algebra that satisfies C ⊂ F ⊂ σ(C).)
Exercise 3.10.4. Show that any positive finitely (countably) additive set function µ on a
semiring E of Ω is finitely (countably) subadditive.
Exercise 3.10.5. Suppose (Ω, F , µ) is a measure space. Show that
(a) If A, B ∈ F and B ⊂ A, then µ[B] ≤ µ[A]. If in addition µ[B] < ∞, then
µ[A \ B] = µ[A] − µ[B].
(b) For any {An } ⊂ F , $\mu\big[\bigcup_n A_n\big] \le \sum_n \mu[A_n]$.
Continuity properties. Let {An } ⊂ F .
(c) If An+1 ⊂ An for all n and µ[A1 ] < ∞, then $\mu\big[\bigcap_n A_n\big] = \lim_{n\to\infty} \mu[A_n]$. (Hint:
observe that $A_1 = D \cup \bigcup_n (A_n \setminus A_{n+1})$ where $D = \bigcap_n A_n$.)
(d) If An ⊂ An+1 for all n, then $\mu\big[\bigcup_n A_n\big] = \lim_{n\to\infty} \mu[A_n]$.
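Exercise 3.10.6. For any measure space (Ω, F , µ),
(a) Show that
\[ \overline{\mathcal{F}}^{\mu} = \{E \subset \Omega : \exists A, B \in \mathcal{F} \text{ with } A \subset E \subset B \text{ and } \mu^*(B \setminus A) = 0\} = \{E \subset \Omega : \exists B \in \mathcal{F} \text{ with } E \triangle B \in \mathcal{N}_\mu\} \]
where µ∗ is the outer measure induced by µ.
(b) Show that the measure µ on F extends uniquely to $\overline{\mathcal{F}}^{\mu}$ by setting µ(E) := µ(A)
where A ∈ F and µ∗ (A△E) = 0. In fact, µ(E) = sup{µ(A) : A ∈ F , A ⊂ E}.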
Exercise 3.10.7. Let (E , µ) be as in Theorem 3.3.5. Show that for any increasing sequence
of subsets En ⊂ Ω, $\mu^*\big(\bigcup_n E_n\big) = \lim_n \mu^*(E_n)$.

Exercise 3.10.8. (Cantor sets). Consider the space ([0, 1], B([0, 1]), λ), where λ is the
Lebesgue measure. Let 0 < β ≤ 1/3, and set F0 = I0 = [0, 1]. Remove from F0 the middle
open interval of size β. This leaves two disjoint closed subintervals $I_{1,1}, I_{1,2}$ of the same size
whose union we denote by F1 . Suppose that the set Fn has been constructed so that it is the
union of $2^n$ closed subintervals $\{I_{n,k}\}_k$ of the same size. From each subinterval $I_{n,k}$, remove
the middle open interval of size $\beta^{n+1}$, leaving $2^{n+1}$ disjoint closed subintervals $\{I_{n+1,k}\}_k$ of
the same size whose union we denote by $F_{n+1}$. Let $C := \bigcap_n F_n$.

(i) Show that C is a nonempty closed set.

(ii) Show that C does not contain any open interval.
(iii) Show that $\lambda(C) = \frac{1-3\beta}{1-2\beta}$. The case β = 1/3 corresponds to the so
called middle–thirds Cantor set.
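A quick numerical sanity check of the formula in (iii) can be made by adding up the lengths removed at each stage. The sketch below is only an illustration, not a solution of the exercise; the function names `measure_of_stage` and `limit_measure` are hypothetical helpers, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def measure_of_stage(beta, n):
    """Lebesgue measure of F_n: starting from [0, 1], stage j = 0, ..., n-1
    removes 2**j open middle intervals, each of length beta**(j + 1)."""
    removed = sum(2**j * beta**(j + 1) for j in range(n))
    return 1 - removed

def limit_measure(beta):
    """Closed form (1 - 3*beta) / (1 - 2*beta) stated in part (iii)."""
    return (1 - 3 * beta) / (1 - 2 * beta)

beta = Fraction(1, 4)
for n in (1, 5, 20, 60):
    # the stage measures decrease towards the closed form
    print(n, float(measure_of_stage(beta, n)), float(limit_measure(beta)))
```

For β = 1/3 the stage measures are exactly (2/3)^n, which is consistent with the middle–thirds Cantor set having measure zero.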
Exercise 3.10.9. This is another construction of Cantor sets. Let {βn } be a sequence of
numbers in (0, 1). Let J0 = [0, 1] and remove the middle open interval whose size is β1 –
proportional to the length λ(J0 ). This produces a set J1 which is the union of the remaining
closed intervals. The inductive construction continues as follows: from each closed subinterval $I_{n,k}$
in Jn remove the open middle interval which is $\beta_{n+1}$–proportional to the length $\lambda(I_{n,k})$. The
remaining set is $J_{n+1}$. Let $C = \bigcap_n J_n$. Show that C is a nonempty closed set that contains
no open interval. Find λ(C).
Exercise 3.10.10. (Lipschitz extension) Let A be a non empty subset of a metric space
(X, d). If f : A → Rn is Lipschitz, show that there is a Lipschitz function g : X → Rn such
that g = f in A. (Hint: consider the case n = 1 and set g(x) := inf{f (y) + Lip(f ) d(x, y) :
y ∈ A}.)
Exercise 3.10.11. Suppose C is a countable collection of subsets of some set Ω. Show that
the ring generated by C is countable. (Hint: without loss of generality assume ∅ ∈ C. For
any countable collection of sets D in Ω define D∗ as the set of finite unions and differences
of sets in D. Setting C0 = C and $C_{n+1} = C_n^*$, show that $C_n \subset C_{n+1}$ and that $\bigcup_n C_n$ is a ring
containing C.)
Exercise 3.10.12. Suppose that D is a d–system. Show that, if in addition, D is a π–system
then D is in fact a σ–algebra.
Exercise 3.10.13. Let R be a ring of subsets in Ω. Show that if M is a monotone class
that contains R, then the σ–ring generated by R is contained in M.
Exercise 3.10.14. A σ–algebra F on Ω is countably generated if there is a countable
collection C such that σ(C) = F . Show that F is countably generated iff there is a
countable algebra A such that σ(A) = F .
Exercise 3.10.15. Suppose (Ω, F ) is a measurable space and let E be an arbitrary
nonempty set and let f : Ω → E be a function. Consider the collection B of all subsets
A ⊂ E such that f −1 (A) ∈ F . Show that B is a σ–algebra in E.
Exercise 3.10.16. Suppose that f : (Ω, F ) → (R, B) is a measurable function. Consider
the collection A = {f −1 (B) : B ∈ B} of subsets in Ω. Show that A is a sub σ–algebra of
F . This is the σ–algebra generated by f and it is denoted by σ(f ).
Exercise 3.10.17. Let (Ω, F ) and (R, B) be two measurable spaces and suppose that B = σ(C)
for some collection C of subsets in R. Suppose that f : Ω → R is a function such that
f −1 (C) ∈ F for each C ∈ C. Show that f is F –B measurable.
Exercise 3.10.18. Suppose that (Ω, F ) is a measurable space and let f : Ω → R. Show
that f is a random variable if and only if any one of the following conditions holds:

(i) f −1 ((−∞, r)) ∈ F for any r ∈ Q.

(ii) f −1 ((−∞, r]) ∈ F for any r ∈ Q.
(iii) f −1 ((r, ∞)) ∈ F for any r ∈ Q.
(iv) f −1 ([r, ∞)) ∈ F for any r ∈ Q
Exercise 3.10.19. Suppose that a ∈ R is a constant, and g, f : Ω → R are measurable
functions which do not attain the values ±∞ and ∓∞ at the same time. Show that
(i) The map ω 7→ f (ω) + ag(ω) is measurable,
(ii) The map ω 7→ f (ω)g(ω) is measurable,
(iii) The maps ω 7→ |f (ω)|, ω 7→ f (ω) ∨ g(ω) and ω 7→ f (ω) ∧ g(ω) are measurable,
where a ∨ b = max(a, b), a ∧ b = min(a, b).
If f and g never take the values ±∞ at the same time, show that
(iv) the map ω 7→ f (ω) − g(ω) is measurable.
Exercise 3.10.20. Let (fn ) be a sequence of R–valued measurable functions defined on
a common measurable space (Ω, F ). Show that A0 = {ω ∈ Ω : limn fn (ω) exists} is
measurable and deduce that the map ω 7→ limn fn (ω) on A0 is measurable.
Exercise 3.10.21. Let f : Ω → C and r : Ω → Rd be complex valued and vector valued
functions respectively. Then f (ω) = u(ω) + iv(ω) where u and v are the real and imaginary
parts of f respectively; r(ω) = [r1 (ω), . . . , rd (ω)] where rk is the k–th component of r. Show
that f is measurable if and only if u, v are measurable. Similarly, show that r is measurable
if and only if each rk , k = 1, . . . , d, is measurable.
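Exercise 3.10.22. Let B be a family of probability measures on (Ω, F ). Define
$\mathcal{N}^{B} = \{M \subset \Omega : \mu^*(M) = 0 \text{ for all } \mu \in B\}$. This is the collection of B–null sets.
(a) Show that $\widetilde{\mathcal{F}}^{B} := \sigma(\mathcal{F}, \mathcal{N}^{B}) \subset \mathcal{F}^{B}$ and that in general, the former is smaller
than the latter.
(b) Show that
\[ \widetilde{\mathcal{F}}^{B} = \{A \subset \Omega : \exists A' \in \mathcal{F},\ A \triangle A' \in \mathcal{N}^{B}\}. \tag{3.19} \]
$\widetilde{\mathcal{F}}^{B}$ will be called the subcompletion of F with respect to B.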
Exercise 3.10.23. For any α ∈ Nℓ and n ∈ N, let (α, n) := (α1 , . . . , αℓ , n) ∈ Nℓ+1 , and let (n) be
the 1–tuple whose only element is n. Show that (3.13), (3.14), and (3.15) hold.
Exercise 3.10.24. Let X, Y be locally compact second countable Hausdorff spaces. Let
KX , KY be the collections of all compact subsets of X and of Y respectively. If f : X → Y
is continuous, show that $f(A(E)) \in S(\mathcal{K}_Y)$ for any $A(E) \in S(\mathcal{K}_X)$. (Hint: If $(K_n) \subset \mathcal{K}_X$ is a
decreasing collection of nonempty compact sets, show that $f\big(\bigcap_n K_n\big) = \bigcap_n f(K_n)$.)
Exercise 3.10.25. Let A and B be sets in B(Rn ).
(i) Show that B(Rn ) ⊂ S(KRn )

(ii) Show that A + B = {a + b : a ∈ A, b ∈ B} is KRn –Suslin.

(iii) Show that the convex hull
\[ \operatorname{co}(A) = \Big\{ \sum_{k=1}^{N} \alpha_k x_k : 0 \le \alpha_k,\ x_k \in A,\ \sum_{k=1}^{N} \alpha_k = 1 \Big\} \]
is $\mathcal{K}_{\mathbb{R}^n}$–Suslin.
Chapter 4

Integration: measure theoretic approach

The moment, or mean, of a random variable is the average value of the observable obtained after
replicating the experiment a large number of times under the same conditions.

Example 4.0.1. (Fair die) The set Ω = {1, 2, 3, 4, 5, 6} contains all the possible outcomes
of rolling a die. Let X denote twice the number of dots facing up after the die
comes to rest. This is a random variable X(ω) = 2ω, ω ∈ Ω. If the die is fair, the mean
value of X is $2 \cdot \tfrac16 + 4 \cdot \tfrac16 + 6 \cdot \tfrac16 + 8 \cdot \tfrac16 + 10 \cdot \tfrac16 + 12 \cdot \tfrac16 = 7$.
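The computation in Example 4.0.1 is a finite weighted sum and can be reproduced mechanically. The snippet below is a small illustration under the assumption of a fair die; the names `outcomes`, `prob` and `mean_X` are chosen here for the sketch and do not appear in the notes.

```python
from fractions import Fraction

outcomes = range(1, 7)        # Omega = {1, ..., 6}
prob = Fraction(1, 6)         # fair die: every outcome has mass 1/6

def X(omega):
    """Twice the number of dots facing up."""
    return 2 * omega

# mean of X = sum over omega of X(omega) * P({omega})
mean_X = sum(X(w) * prob for w in outcomes)
print(mean_X)                 # prints 7
```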

Example 4.0.2. In Example 3.1.2, corresponding to the roulette spun around its center,
let Y = (cos ω, sin ω), where ω ∈ [0, 2π) is the angle observed after spinning the roulette
once. Y is a random variable with values on the unit circle. If the roulette is such that the
probability is uniformly distributed on [0, 2π), then the mean of Y is (0, 0).

4.1. Simple functions and integration

Suppose Ω is a fixed set. For any A ⊂ Ω, the real valued function defined by $1_A(\omega) = 1$ if
ω ∈ A and zero otherwise is called the indicator function of A. A function $s : \Omega \to \mathbb{R}$ is
called simple if it takes only a finite number of values.
Suppose $\mathcal{R}$ is a ring of subsets of Ω. For the purpose of integration theory, we consider
the set $\mathcal{E}(\mathcal{R})$ of all simple functions whose nonzero values are taken on sets in $\mathcal{R}$. Each
function $s \in \mathcal{E}(\mathcal{R})$ admits the simple expression
\[ s = \sum_{r \in \mathbb{R}} r\, 1_{\{s = r\}} \]
where each level set $\{s = r\} \in \mathcal{R}$ for $r \neq 0$, and all but finitely many of the sets $\{s = r\}$ are empty. As we
will see, $\mathcal{E}(\mathcal{R})$ is a linear space.


Lemma 4.1.1. For any finite collection I = {A1 , . . . , An } of sets in a semiring R, there
exists another finite collection C = {C1 , . . . , Cm } of pairwise disjoint sets in R such that
(i) For each Cj ∈ C, there is Aℓ ∈ I with Cj ⊂ Aℓ .
(ii) For each Aℓ ∈ I, $A_\ell = \bigcup \{C_j \in \mathcal{C} : C_j \subset A_\ell\}$.

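Proof. We proceed by induction on the number of elements of I. For n = 1 this is obvious.

Suppose the result is true for n. Let J = {A1 , . . . , An , An+1 } and I = {A1 , . . . , An }. Let
C ′ = {C1 , . . . , Cm } be a finite collection of pairwise disjoint sets in R for which (i) and (ii) hold for I. Set
\[ \mathcal{C} = \{C_j \cap A_{n+1} : 1 \le j \le m\} \cup \{C_j \setminus A_{n+1} : 1 \le j \le m\} \cup \Big\{A_{n+1} \setminus \bigcup_{j=1}^{m} C_j\Big\}. \]
The sets in C are pairwise disjoint and, as R is a semiring, each of them is a finite disjoint union
of sets in R. Replacing each set by these pieces, we may take C to be a finite pairwise disjoint
collection of sets in R. It is easy to check that C satisfies (i) and (ii) for J.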
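For finite sets, a collection C as in Lemma 4.1.1 can be computed directly. The sketch below does not follow the induction of the proof: it assumes the sets are finite Python sets (so that every subset belongs to the underlying ring) and groups points by the list of sets A_ℓ that contain them. The function name `disjoint_refinement` is a hypothetical helper introduced for this illustration.

```python
def disjoint_refinement(sets):
    """Return pairwise disjoint nonempty sets C_j such that every C_j lies
    inside some A_l and every A_l is the union of the C_j it contains."""
    groups = {}
    for point in set().union(*sets):
        # signature = indices of the sets A_l that contain this point
        signature = frozenset(i for i, a in enumerate(sets) if point in a)
        groups.setdefault(signature, set()).add(point)
    return [frozenset(c) for c in groups.values()]

A = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
C = disjoint_refinement(A)
print(sorted(sorted(c) for c in C))   # [[1, 2], [3], [4], [5, 6]]
```

Two points fall in the same piece exactly when they belong to the same sets A_ℓ, which is why properties (i) and (ii) hold for the output.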
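Theorem 4.1.2. If µ is a real–valued additive function on a ring $\mathcal{R}$ of subsets of Ω, then
there exists a unique linear extension of µ to the space $\mathcal{E}(\mathcal{R})$ of simple functions over $\mathcal{R}$,
namely
\[ \mu(\phi) = \sum_{r \in \mathbb{R} \setminus \{0\}} r\, \mu(\{\phi = r\}) \tag{4.1} \]
If $\phi = \sum_{k=1}^{p} b_k 1_{B_k}$, where $b_k \neq 0$ and $B_k \in \mathcal{R}$, then $\mu(\phi) = \sum_{k=1}^{p} b_k \mu(B_k)$.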

Proof. Suppose $\{a_1, \ldots, a_n\}$ are all the non–zero values that φ takes. Each $A_j := \{\phi = a_j\} \in \mathcal{R}$.
Suppose that φ has another representation
\[ \phi = \sum_{k=1}^{m} b_k 1_{B_k} \]
where $\{B_1, \ldots, B_m\} \subset \mathcal{R}$ are pairwise disjoint and $b_k \neq 0$ for all k. We show that
\[ \sum_{j=1}^{n} a_j \mu(A_j) = \sum_{k=1}^{m} b_k \mu(B_k). \]
First, notice that $\bigcup_{j=1}^{n} A_j = \bigcup_{k=1}^{m} B_k$, and that if $A_j \cap B_k \neq \emptyset$, then $a_j = b_k$. Hence
$a_j \mu(A_j \cap B_k) = b_k \mu(A_j \cap B_k)$ for all 1 ≤ j ≤ n and 1 ≤ k ≤ m. This shows that
\[ \sum_{j=1}^{n} a_j \mu(A_j) = \sum_{j=1}^{n} \sum_{k=1}^{m} a_j \mu(A_j \cap B_k) = \sum_{k=1}^{m} \sum_{j=1}^{n} b_k \mu(A_j \cap B_k) = \sum_{k=1}^{m} b_k \mu(B_k). \]

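To show that (4.1) is linear on $\mathcal{E}(\mathcal{R})$, consider two simple functions $\phi_1$ and $\phi_2$ in $\mathcal{E}(\mathcal{R})$.
Let $\mathcal{I}$ be the collection of all non–void level sets $\{\phi_i = r\}$, i = 1, 2, with $r \neq 0$, and let $\mathcal{C}$ be a
disjoint collection of sets in $\mathcal{R}$ as in Lemma 4.1.1. Each $\phi_i$ is constant on every $C \in \mathcal{C}$; denote this
constant value by $\phi_i(C)$. Then
\[ \phi_i = \sum_{r \in \mathbb{R} \setminus \{0\}} r\, 1_{\{\phi_i = r\}} = \sum_{r \in \mathbb{R} \setminus \{0\}} r \sum_{C \subset \{\phi_i = r\}} 1_C = \sum_{r \in \mathbb{R} \setminus \{0\}} \ \sum_{C \subset \{\phi_i = r\}} \phi_i(C)\, 1_C = \sum_{C \in \mathcal{C}} \phi_i(C)\, 1_C. \]
Hence
\[ \phi_1 + \phi_2 = \sum_{C \in \mathcal{C}} (\phi_1(C) + \phi_2(C))\, 1_C. \]
As $\mathcal{C}$ is pairwise disjoint,
\[ \sum_{r \in \mathbb{R} \setminus \{0\}} r\, \mu(\{\phi_i = r\}) = \sum_{C \in \mathcal{C},\ \phi_i(C) \neq 0} \phi_i(C)\, \mu(C) \]
and so
\[ \sum_{r \in \mathbb{R} \setminus \{0\}} r\, \mu(\{\phi_1 + \phi_2 = r\}) = \sum_{C \in \mathcal{C},\ \phi_1(C) + \phi_2(C) \neq 0} (\phi_1(C) + \phi_2(C))\, \mu(C) = \sum_{r \in \mathbb{R} \setminus \{0\}} r\, \mu(\{\phi_1 = r\}) + \sum_{r \in \mathbb{R} \setminus \{0\}} r\, \mu(\{\phi_2 = r\}). \]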

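Finally, we show that the extension µ does not depend on how $\phi \in \mathcal{E}(\mathcal{R})$ is represented as a
linear combination of indicator functions of sets in $\mathcal{R}$. Suppose $\phi = \sum_{k=1}^{p} b_k 1_{B_k}$ and let $\mathcal{C}$ be as in
Lemma 4.1.1 for $\{B_1, \ldots, B_p\}$. Then
\[ \sum_{k=1}^{p} b_k 1_{B_k} = \sum_{k=1}^{p} b_k \sum \{1_C : C \in \mathcal{C},\ C \subset B_k\} = \sum_{C \in \mathcal{C}} \Big( \sum \{b_k : C \subset B_k\} \Big) 1_C \]
and
\[ \sum_{k=1}^{p} b_k \mu(B_k) = \sum_{k=1}^{p} b_k \sum \{\mu(C) : C \in \mathcal{C},\ C \subset B_k\} = \sum_{C \in \mathcal{C}} \Big( \sum \{b_k : C \subset B_k\} \Big) \mu(C) = \mu(\phi). \]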
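The independence of µ(φ) from the chosen representation can be checked on a small example. The sketch below assumes a finite Ω = {1, ..., 6} with hypothetical point masses `weights`, so that µ is additive on all subsets; the function names are illustrative only. It evaluates one overlapping representation of φ both as Σ b_k µ(B_k) and through the level sets as in (4.1).

```python
from collections import defaultdict
from fractions import Fraction

weights = {w: Fraction(1, w) for w in range(1, 7)}   # hypothetical point masses

def mu(subset):
    """Additive set function: the sum of the point masses in the subset."""
    return sum(weights[w] for w in subset)

def integral_from_representation(terms):
    """sum_k b_k * mu(B_k) for phi = sum_k b_k 1_{B_k}; the B_k may overlap."""
    return sum(b * mu(B) for b, B in terms)

def integral_from_level_sets(terms):
    """Formula (4.1): sum over nonzero values r of r * mu({phi = r})."""
    values = defaultdict(int)
    for b, B in terms:
        for w in B:
            values[w] += b                   # phi(w) = sum of b_k with w in B_k
    level_sets = defaultdict(set)
    for w, r in values.items():
        if r != 0:
            level_sets[r].add(w)
    return sum(r * mu(S) for r, S in level_sets.items())

phi = [(2, {1, 2, 3}), (-1, {3, 4}), (5, {4, 5, 6})]
print(integral_from_representation(phi), integral_from_level_sets(phi))
```

Both evaluations give 37/6 for this choice of weights.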

Remark 4.1.3. If µ is nonnegative extended real valued, then the conclusion of Theorem 4.1.2
holds for all simple functions φ ≥ 0. The proof is exactly as before since only finite summations
of nonnegative extended real numbers are involved.

Lemma 4.1.4. Let (Ω, F , µ) be a measure space. Suppose φ is a nonnegative real–valued
simple function. The function $\nu(E) := \mu(\phi 1_E)$ defines a measure on (Ω, F ).

Proof. Suppose $\phi = \sum_{k=1}^{n} b_k 1_{A_k}$ where $b_k \ge 0$ are the distinct values of φ. For any pairwise
disjoint sequence $\{E_j : j \in \mathbb{N}\} \subset \mathcal{F}$ with $E = \bigcup_j E_j$,
\[ \nu(E) = \sum_{k=1}^{n} b_k \mu(A_k \cap E) = \sum_{k=1}^{n} b_k \sum_{j=1}^{\infty} \mu(A_k \cap E_j) = \sum_{j=1}^{\infty} \sum_{k=1}^{n} b_k \mu(A_k \cap E_j) = \sum_{j=1}^{\infty} \nu(E_j). \]
This means that ν is countably additive. Clearly ν(∅) = 0 and ν(E) ≥ 0 for all E ∈ F , and the
proof is complete.

The goal of integration theory is to extend the linear functional µ on E(R), called the
integral, to a larger class of functions. Caratheodory’s extension theorem allows us to extend
first a measure over a semiring to a σ–algebra of sets. The collection of sets with finite
measure forms a ring of sets and Theorem 4.1.2 allows us to extend the measure linearly to
simple functions.

4.2. Lebesgue Integration

In this section, we will assume that (Ω, F , µ) is a measure space. We will first restrict our
attention to the collection E+ (F ) of nonnegative F –measurable simple functions.
Definition 4.2.1. For any nonnegative F –measurable simple function $s = \sum_{k=1}^{n} a_k 1_{A_k}$
and any E ∈ F , the integral of s over E with respect to µ is defined by
\[ \int_E s \, d\mu := \mu(s\, 1_E) = \sum_{k=1}^{n} a_k \mu(E \cap A_k) \tag{4.2} \]

Theorem 4.1.2 and Lemma 4.1.4 imply that (4.2) is well defined and

(i) For any s ∈ E+ (F ), $\nu : E \mapsto \int_E s \, d\mu$ is a measure on F .
(ii) $\int_E (s + t) \, d\mu = \int_E s \, d\mu + \int_E t \, d\mu$ for all s, t ∈ E+ (F ).
(iii) $\int_\Omega s \, d\mu \le \int_\Omega t \, d\mu$ for all s, t ∈ E+ (F ) with s ≤ t.
To extend the integral to more general functions we first show that nonnegative measurable
functions are limits of increasing sequences of nonnegative simple functions.
Lemma 4.2.2. Let f : (Ω, F ) → [0, ∞] be a Borel measurable function. Then,
(i) There is a sequence of simple functions $s_n$ such that $0 \le s_n \le s_{n+1} < \infty$ for all n ∈ Z+
and $\lim_{n\to\infty} s_n(\omega) = f(\omega)$ for all ω ∈ Ω.
(ii) There is a sequence of sets An ∈ F and a sequence of constants αn ≥ 0 such that
$f = \sum_{n=1}^{\infty} \alpha_n 1_{A_n}$.

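Proof. (i) Let $\phi_0 \equiv 0$ and, for n ≥ 1, define $\phi_n : [0, \infty] \to [0, \infty)$ by
$\phi_n(x) = 2^{-n} \lfloor 2^n x \rfloor 1_{[0,n)}(x) + n\, 1_{[n,\infty]}(x)$. Then $\phi_n \le \phi_{n+1}$; if 0 ≤ x < n then $0 \le x - \phi_n(x) \le 2^{-n}$, and
$\phi_n(\infty) = n$. Thus $\lim_n \phi_n(x) = x$ for all x ∈ [0, ∞]. The
sequence $s_n := \phi_n \circ f$ has the desired properties.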

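(ii) Notice that $f = \sum_{n=1}^{\infty} (s_n - s_{n-1})$, where each $s_n - s_{n-1}$ is a nonnegative F –measurable simple
function, hence a finite linear combination, with nonnegative coefficients, of indicator functions of sets in F .
Relabeling the terms gives the desired representation.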
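The dyadic approximation used in part (i) is easy to experiment with. The following sketch implements the maps φ_n from the proof on a single value x = f(ω); the function name `phi` and the test value π are choices made here for illustration.

```python
import math

def phi(n, x):
    """phi_n from the proof of Lemma 4.2.2: round x down to a multiple of
    2**-n when x < n, and cap the value at n otherwise."""
    if n == 0:
        return 0.0
    if x >= n:                        # also covers x = math.inf
        return float(n)
    return math.floor(2**n * x) / 2**n

x = math.pi                           # plays the role of f(omega)
for n in range(0, 8):
    print(n, phi(n, x), x - phi(n, x))
```

The printed values increase with n, and once n exceeds x the error x − φ_n(x) stays below 2^(−n), in line with the estimate in the proof.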
Definition 4.2.3. For any measurable function f : Ω → [0, ∞], the integral of f over
E ∈ F is defined by
\[ \int_E f \, d\mu := \sup\Big\{ \int_E s \, d\mu : 0 \le s \le f,\ s \text{ is simple} \Big\} \tag{4.3} \]

Since 0 ≡ s ≤ f , the supremum in (4.3) is well defined as a nonnegative extended real
number. Also, it follows by definition that
(i) $\int_E f \, d\mu = \int_\Omega 1_E f \, d\mu$
(ii) $\int_\Omega f \, d\mu \le \int_\Omega g \, d\mu$
for any measurable functions 0 ≤ f ≤ g ≤ ∞, and E ∈ F .
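Theorem 4.2.4. (Chebyshev–Markov) Let f : Ω → [0, ∞] be a measurable function. Then
\[ t\, \mu(\{\omega : f(\omega) > t\}) \le \int_{\{\omega : f(\omega) > t\}} f \, d\mu \le \int_\Omega f \, d\mu \tag{4.4} \]
for all t ≥ 0.

Proof. Observe that the function $g_t(\omega) := t\, 1_{f^{-1}((t,\infty])}(\omega)$ is simple and measurable, and
$0 \le g_t \le f\, 1_{f^{-1}((t,\infty])} \le f$. Integrating these inequalities gives (4.4).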

Corollary 4.2.5. Let f : Ω → [0, ∞] be a measurable function and suppose that
$\mu f := \int_\Omega f \, d\mu < \infty$. Then
(i) µ({ω ∈ Ω : f (ω) = ∞}) = 0
(ii) If µf = 0 then µ({ω ∈ Ω : f (ω) > 0}) = 0

Proof. (i) Let $A_n = f^{-1}((n, \infty])$ and observe that $A_n \searrow f^{-1}(\{\infty\})$. By the Chebyshev–Markov
inequality,
\[ \mu(A_n) \le \frac{1}{n} \int_\Omega f \, d\mu. \]
The conclusion follows by letting n → ∞ since µ(A1 ) < ∞.
(ii) Let $B_n = f^{-1}((\tfrac{1}{n}, \infty])$ and observe that $B_n \nearrow f^{-1}((0, \infty])$. By the Chebyshev–Markov inequality,
\[ \mu(B_n) \le n \int_\Omega f \, d\mu = 0. \]
The conclusion follows immediately.

A property P about Ω occurs almost surely if µ({ω ∈ Ω : P (ω) is false}) = 0. This
is commonly denoted by P occurs µ–a.s. In this context Corollary 4.2.5 states that (i) if
f ≥ 0 and $\int_\Omega f \, d\mu < \infty$ then f is finite µ–a.s.; (ii) if in addition the integral is zero, then f
is zero µ–a.s.
Example 4.2.6. In the Steinhaus space ([0, 1], B, λ), the functions 1[0,1]\Q and 1[0,1] are
equal λ–a.s; also, almost surely every ω ∈ [0, 1] has a binary expansion with an infinite
number of ones.

4.3. Monotone Convergence

The next result is one of the most important in the theory of integration.

Theorem 4.3.1. (Monotone convergence) Let {fn } be a sequence of measurable functions
such that
(i) 0 ≤ . . . ≤ fn (ω) ≤ fn+1 (ω) ≤ . . . ≤ ∞ for any ω ∈ Ω
(ii) limn→∞ fn (ω) = f (ω) for all ω ∈ Ω.
Then f is measurable and
\[ \lim_{n \to \infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu \tag{4.5} \]

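Proof. The measurability of f follows from exercise 3.6.5. The monotonicity of fn implies
that $\int f_n \, d\mu \le \int f_{n+1} \, d\mu$ for all n. Thus, there is α ∈ [0, ∞] such that $\alpha = \lim_n \int f_n \, d\mu$.
Since $f_n \le \sup_n f_n = f$, it follows that
\[ \alpha \le \int_\Omega f \, d\mu \tag{4.6} \]

Let s be a simple function with 0 ≤ s ≤ f and let 0 < c < 1. Consider the sets
$E_n = \{\omega \in \Omega : c\, s(\omega) \le f_n(\omega)\}$. Observe that $E_n \subset E_{n+1}$ for all n and that $\Omega = \bigcup_n E_n$. Indeed, if
f (ω) = 0 then ω ∈ E1 ; whereas if f (ω) > 0 then c s(ω) < f (ω), since s is finite valued and c < 1, and so ω ∈ En for
some n. Consequently,
\[ \int_\Omega f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge c \int_{E_n} s \, d\mu. \]
Letting n → ∞ we obtain that $\alpha \ge c \int_\Omega s \, d\mu$ by Lemma 4.1.4 and the continuity of measures along
increasing sequences. Letting $c \nearrow 1$ we obtain
\[ \alpha \ge \int_\Omega s \, d\mu. \tag{4.7} \]

Since (4.7) holds for any simple function 0 ≤ s ≤ f , we obtain that
\[ \alpha \ge \int_\Omega f \, d\mu, \]
which together with (4.6) gives (4.5).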

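Example 4.3.2. On ((0, 1), B((0, 1)), λ) the function $f(x) = \frac{1}{x^p}$ is integrable if and only if p < 1.
Indeed, for $p \neq 1$, monotone convergence gives
\[ \int_{(0,1)} f \, d\lambda = \lim_{n \to \infty} \int_{[n^{-1},1)} x^{-p} \, dx = \lim_{n \to \infty} \frac{1}{1-p} \Big( 1 - \frac{1}{n^{1-p}} \Big). \]
The limit is finite (1/(1 − p)) when p < 1 and infinite when p > 1. When p = 1,
\[ \int_{(0,1)} f \, d\lambda = \lim_{n \to \infty} \int_{[n^{-1},1)} \frac{1}{x} \, dx = \lim_{n \to \infty} \log(n) = \infty. \]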
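The behaviour of the truncated integrals in Example 4.3.2 can be observed numerically. The sketch below evaluates the closed form over [1/n, 1) and compares it with a plain midpoint rule; both function names are hypothetical helpers, and the midpoint rule is only a rough cross-check, not part of the argument.

```python
import math

def truncated_integral(p, n):
    """Integral of x**(-p) over [1/n, 1), computed from the antiderivative."""
    if p == 1:
        return math.log(n)
    return (1.0 - n ** (p - 1.0)) / (1.0 - p)

def midpoint_rule(p, n, steps=100_000):
    """Crude numerical approximation of the same integral."""
    a, b = 1.0 / n, 1.0
    h = (b - a) / steps
    return h * sum((a + (i + 0.5) * h) ** (-p) for i in range(steps))

for n in (10, 10**3, 10**6):
    # p = 1/2 stabilises near 1 / (1 - p) = 2; p = 1 and p = 2 keep growing
    print(n, truncated_integral(0.5, n), truncated_integral(1, n), truncated_integral(2, n))
```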

Corollary 4.3.3. (Beppo–Levi) Let fn : Ω → [0, ∞] be a sequence of measurable functions,
then
\[ \int_\Omega \sum_{n=1}^{\infty} f_n \, d\mu = \sum_{n=1}^{\infty} \int_\Omega f_n \, d\mu \tag{4.8} \]

Proof. Let $F_n = \sum_{k=1}^{n} f_k$; then $0 \le F_n \le F_{n+1} \le \infty$ and $\lim_n F_n(x) = \sum_{k=1}^{\infty} f_k(x)$. The
statement of the result will follow directly from the Monotone Convergence Theorem once
we prove that $\int_\Omega \sum_{k=1}^{n} f_k \, d\mu = \sum_{k=1}^{n} \int_\Omega f_k \, d\mu$ for each n. It suffices to consider only the
case n = 2. By Lemma 4.2.2 there are nondecreasing sequences of simple functions
$0 \le s_n^i \le f_i$ (i = 1, 2) such that $\lim_n s_n^i(x) = f_i(x)$. Thus, $0 \le s_n^1 + s_n^2 \le f_1 + f_2$ is
a nondecreasing sequence of simple functions converging to $f_1 + f_2$. Theorem 4.1.2 and the
MCT imply that $\int_\Omega (f_1 + f_2) \, d\mu = \int_\Omega f_1 \, d\mu + \int_\Omega f_2 \, d\mu$.
Theorem 4.3.4. (Borel–Cantelli I). Suppose An ∈ F and that $\sum_n \mu(A_n) < \infty$. Then,
$\mu(\limsup_n A_n) = 0$, where $\limsup_n A_n = \bigcap_n \bigcup_{k \ge n} A_k$.

Proof. Consider $f = \sum_n 1_{A_n}$. Since $\int_\Omega f \, d\mu = \sum_n \mu(A_n) < \infty$ by Corollary 4.3.3, it follows
from Corollary 4.2.5 that µ({f = ∞}) = 0. The conclusion follows by noticing that $\{f = \infty\} = \bigcap_n \bigcup_{k \ge n} A_k$.

The set lim supn An , usually denoted by {An i.o}, is the set in which the events An occur
infinitely often.

Corollary 4.3.5. Suppose f : Ω → [0, ∞] is a measurable function and let
\[ \mu_f(E) = \int_E f \, d\mu, \qquad E \in \mathcal{F} \tag{4.9} \]

Then, µf is a measure on F and for any measurable function g : Ω → [0, ∞] we have that
\[ \int_\Omega g \, d\mu_f = \int_\Omega g f \, d\mu \tag{4.10} \]

Proof. It is clear by definition (4.3) that µf (∅) = 0. It remains to verify that µf is
countably additive. Let An be a sequence of pairwise disjoint measurable sets with union
A. Notice that $f 1_A = \sum_{n=1}^{\infty} f 1_{A_n}$. Since $\int_E f \, d\mu = \int_\Omega f 1_E \, d\mu$ for any E ∈ F , the countable
additivity follows from Corollary 4.3.3.

Observe that (4.10) holds if g = 1E with E ∈ F and so it holds also for any nonnegative
simple function. The result follows by monotone convergence and Lemma 4.2.2.

Remark 4.3.6. If two measurable functions f, g : Ω → [0, ∞] are equal µ–a.s. then µg (E) =
µf (E) for all E ∈ F . Indeed, if $A = \{\omega \in \Omega : f(\omega) \neq g(\omega)\}$ then $\int_A f \, d\mu = 0 = \int_A g \, d\mu$.
Since µf (E) = µf (E ∩ A) + µf (E \ A), it follows that µf (E) = µg (E). This shows that the
MCT and its equivalents can be restated by assuming that the hypotheses hold µ–almost
surely.

Theorem 4.3.7. (Fatou’s Lemma) If fn : Ω → [0, ∞] is a sequence of measurable functions,
then
\[ \int_\Omega \liminf_n f_n \, d\mu \le \liminf_n \int_\Omega f_n \, d\mu \tag{4.11} \]

Proof. Consider the sequence $g_n(\omega) := \inf_{k \ge n} f_k(\omega)$. Observe that each gn is measurable,
$0 \le g_n \le g_{n+1}$, $g_n \le f_n$ and $\lim_{n\to\infty} g_n = \liminf_{n\to\infty} f_n$. Thus,
\[ \int_\Omega g_n \, d\mu \le \int_\Omega f_n \, d\mu. \tag{4.12} \]
Letting n → ∞, the conclusion of the statement follows from the MCT.
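The inequality in Fatou's Lemma can be strict. A standard example, not taken from these notes, on the space ((0, 1), B((0, 1)), λ) is the following:

```latex
\[
f_n = n\,1_{(0,1/n)}, \qquad \int_{(0,1)} f_n \, d\lambda = n \cdot \tfrac{1}{n} = 1
\quad \text{for all } n,
\]
\[
\liminf_n f_n(x) = 0 \ \text{ for every } x \in (0,1), \qquad \text{so} \qquad
\int_{(0,1)} \liminf_n f_n \, d\lambda = 0 < 1 = \liminf_n \int_{(0,1)} f_n \, d\lambda .
\]
```

Here each x in (0, 1) satisfies f_n(x) = 0 as soon as n ≥ 1/x, so the mass of f_n escapes towards 0 without being seen by the pointwise limit.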

Any extended real valued function f on Ω can be decomposed as the difference of two nonnegative
functions $f(\omega) = f_+(\omega) - f_-(\omega)$, where $f_+(\omega) := f(\omega) \vee 0$ and $f_-(\omega) := (-f(\omega)) \vee 0$. Clearly,
f is measurable if and only if each function f+ and f− is measurable. Similarly, a complex
valued function g is measurable if and only if each of u = Re(g) and v = Im(g) is measurable.
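Definition 4.3.8. A complex or extended real valued measurable function f on Ω is
Lebesgue integrable if $\int_\Omega |f| \, d\mu < \infty$. The set of all (extended real or complex) integrable
functions is denoted by $L^1(\Omega, \mathcal{F}, \mu)$. Suppose that f is extended real valued and that either
$\int_\Omega f_+ \, d\mu$ or $\int_\Omega f_- \, d\mu$ is finite. The Lebesgue integral of f , $\int_\Omega f \, d\mu$, is defined by
\[ \int_\Omega f \, d\mu := \int_\Omega f_+ \, d\mu - \int_\Omega f_- \, d\mu \tag{4.13} \]
If g is complex valued, $u = \mathrm{Re}(g) \in L^1$ and $v = \mathrm{Im}(g) \in L^1$, then define $\int_\Omega g \, d\mu$ by
\[ \int_\Omega g \, d\mu := \int_\Omega u_+ \, d\mu - \int_\Omega u_- \, d\mu + i \Big( \int_\Omega v_+ \, d\mu - \int_\Omega v_- \, d\mu \Big) \tag{4.14} \]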
Remark 4.3.9. Since $f_+ \vee f_- \le |f|$, it follows that $f_\pm \in L^1$ if and only if $\int_\Omega f \, d\mu \in \mathbb{R}$ if
and only if $f \in L^1$. Similarly, since $|u| \vee |v| \le |g|$, then $u, v \in L^1$ if and only if $\int_\Omega g \, d\mu \in \mathbb{C}$
if and only if $g \in L^1$.
Theorem 4.3.10. Suppose that f and g ∈ L1 (Ω, F , µ). Then
\[ \int_\Omega (a f + b g) \, d\mu = a \int_\Omega f \, d\mu + b \int_\Omega g \, d\mu \tag{4.15} \]
where a and b ∈ R if f and g are real valued, or a and b ∈ C if f and g are complex valued.

Proof. Measurability of a f + b g follows from exercise 3.10.19. Since |a f + b g| ≤ |a| |f | +
|b| |g| then a f + b g ∈ L1 whenever f and g ∈ L1 .
If f and g are real valued, let h = f + g; then $h_+ - h_- = (f_+ - f_-) + (g_+ - g_-)$. Equivalently,
$h_+ + f_- + g_- = f_+ + g_+ + h_-$. By the Monotone convergence Theorem or Corollary 4.3.3
we have
\[ \int_\Omega h_+ \, d\mu + \int_\Omega f_- \, d\mu + \int_\Omega g_- \, d\mu = \int_\Omega f_+ \, d\mu + \int_\Omega g_+ \, d\mu + \int_\Omega h_- \, d\mu, \]
from which it follows immediately that $\int_\Omega h \, d\mu = \int_\Omega f \, d\mu + \int_\Omega g \, d\mu$.

If a > 0, then $af = (af)_+ - (af)_- = a f_+ - a f_-$; whereas if a < 0 then $af = (af)_+ - (af)_- =
-a f_- - (-a f_+)$. It follows immediately that $\int_\Omega a f \, d\mu = a \int_\Omega f \, d\mu$.
The complex valued case follows from the real one by considering the real and imaginary parts u = Re(f ) and
v = Im(f ) separately, and from i f = i (u + i v) = −v + i u.
Theorem 4.3.11. If f ∈ L1 (Ω, F , µ) then
\[ \Big| \int_\Omega f \, d\mu \Big| \le \int_\Omega |f| \, d\mu. \tag{4.16} \]
Equality in (4.16) holds iff there is a constant α ∈ C with |α| = 1 such that αf = |f | µ–a.s.

Proof. For an extended–real valued function f the result follows from −|f | ≤ f ≤ |f |. For
the complex valued case, denote $z = \int_\Omega f \, d\mu \in \mathbb{C}$ and let $\alpha \in S^1$ be such that αz = |z|.
Then by Theorem 4.3.10
\[ \Big| \int_\Omega f \, d\mu \Big| = \alpha \int_\Omega f \, d\mu = \int_\Omega \alpha f \, d\mu = \int_\Omega \mathrm{Re}(\alpha f) \, d\mu \le \int_\Omega |f| \, d\mu \tag{4.17} \]
where the last two relations in (4.17) follow from the fact that $|\int_\Omega f \, d\mu|$ is real and from $\mathrm{Re}(\alpha f) \le |\alpha f| = |f|$
respectively.
If there is equality in (4.16) then, from |f | − Re(αf ) ≥ 0 and Corollary 4.2.5 we conclude
that |αf | = Re(αf ) a.s., that is, αf = Re(αf ) = |f | a.s.

Remark 4.3.12. If (Ω, F , µ) is a probability space and f ∈ L1 (µ), then the integral
$\int_\Omega f \, d\mu$, commonly denoted by $E_\mu[f]$, is called the expectation or expected value of f
under µ. The mention of µ is omitted when µ is clear from the context.
Lemma 4.3.13. Suppose f ∈ L1 , then for any ε > 0 there is δ > 0 such that for any
A ∈ F , if µ(A) < δ, then $|\int_A f \, d\mu| < \varepsilon$.

Proof. Consider $F_n = |f| \wedge n$ so that $F_n \nearrow |f|$. By monotone convergence, given ε > 0
there is N large enough so that $\int_\Omega (|f| - F_N) \, d\mu < \frac{\varepsilon}{2}$. Let $\delta = \frac{\varepsilon}{2N}$. Then, if µ(A) < δ we
have $|\int_A f \, d\mu| \le \int_A |f| \, d\mu \le \int_\Omega (|f| - F_N) \, d\mu + N \mu(A) < \varepsilon$.

4.4. Lebesgue Dominated Convergence

The following result is one of the most useful and important in the theory of integration.
It is equivalent to monotone convergence and Fatou’s Lemma.
Theorem 4.4.1. (Lebesgue’s dominated convergence) Let {fn }n and {gn } be µ–a.s. pointwise
convergent sequences of measurable functions (real or complex) such that f = limn fn
µ–a.s., g = limn gn µ–a.s., and
\[ |f_n| \le g_n \quad \text{a.s.} \tag{4.18} \]
If
\[ \lim_n \int_\Omega g_n \, d\mu = \int_\Omega g \, d\mu < \infty, \tag{4.19} \]
then f ∈ L1 and
\[ \lim_{n \to \infty} \int_\Omega |f_n - f| \, d\mu = 0, \qquad \lim_{n \to \infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu \tag{4.20} \]

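Proof. Without loss of generality, we can assume that pointwise convergence and (4.18)
hold everywhere.

Clearly |f | ≤ g and so, f ∈ L1 . Since $g_n + g - |f_n - f| \ge 0$, from Fatou’s lemma and (4.19)
we obtain
\[ \int_\Omega 2g \, d\mu \le \liminf_n \int_\Omega (g_n + g - |f - f_n|) \, d\mu = 2 \int_\Omega g \, d\mu + \liminf_n \Big( - \int_\Omega |f - f_n| \, d\mu \Big) = 2 \int_\Omega g \, d\mu - \limsup_n \int_\Omega |f - f_n| \, d\mu. \]
Since $\int_\Omega g \, d\mu < \infty$ and $\int_\Omega |f_n - f| \, d\mu \ge 0$, it follows that $\limsup_n \int_\Omega |f - f_n| \, d\mu = 0$. To conclude, notice that
$|\int_\Omega (f_n - f) \, d\mu| \le \int_\Omega |f_n - f| \, d\mu$.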
Theorem 4.4.2. If {fn : n ∈ N} ⊂ L1 is a Cauchy sequence, then there is f ∈ L1 such
that $\lim_n \|f_n - f\| = 0$. If $\tilde f$ is any other such function, then $f = \tilde f$ µ–a.s.
Remark 4.4.3. This result says that after identifying all integrable functions that differ on
sets of measure zero, the resulting space L1 is a Banach space with norm $\|f\| := \int |f| \, d\mu$.

Proof. Since {fn } is a Cauchy sequence, there is a subsequence $\{f_{n_k} : k \in \mathbb{N}\}$ such that
$\|f_{n_{k+1}} - f_{n_k}\| < 2^{-k}$. Let
\[ g_k = f_{n_1} + \sum_{j=1}^{k} (f_{n_{j+1}} - f_{n_j}), \qquad G_k = |f_{n_1}| + \sum_{j=1}^{k} |f_{n_{j+1}} - f_{n_j}|. \]
By monotone convergence $G_k$ converges pointwise to some function $G \in L^1$ and $\lim_k \|G_k - G\| = 0$.
On $\{G \neq \infty\}$ the series defining $g_k$ converges absolutely, thus $g_k$ converges µ–a.s. to some function f .
As $|g_k| \le G_k \le G$, we have by dominated convergence that $f \in L^1$ and $\lim_k \|g_k - f\| = 0$.
Since $g_k = f_{n_{k+1}}$, the subsequence $f_{n_k}$ converges to f µ–a.s. and, {fn } being Cauchy, $\lim_n \|f_n - f\| = 0$.
The last statement follows from $\|f - \tilde f\| \le \|f - f_n\| + \|f_n - \tilde f\|$.

The following application of dominated convergence will help illustrate the strength of
the monotone class theorem.
Theorem 4.4.4. Suppose µ and ν are finite measures on ([0, 1], B([0, 1])). If $\int x^n \, \mu(dx) =
\int x^n \, \nu(dx)$ for all n ∈ Z+ then, µ = ν.

Proof. The family of functions $\{p_n(x) = x^n : n \in \mathbb{Z}_+\}$ is a multiplicative family and it
generates B([0, 1]). The collection V of real bounded measurable functions f such that
$\int f \, d\mu = \int f \, d\nu$ is a real vector space, it contains the constants, and by dominated
convergence, it is also a bounded monotone class. The real monotone class theorem implies that
V contains all bounded B([0, 1]) measurable functions, in particular all indicator functions of
Borel sets, whence µ = ν.

4.5. Riemann integral and Lebesgue integral on R.

Consider the measure space ([a, b], B([a, b]), λ). A partition of [a, b] is a finite set P = {a =
t0 < . . . < tn = b}. For a bounded function f : [a, b] → R, define $m_k = \inf\{f(t) : t \in [t_{k-1}, t_k]\}$ and
$M_k = \sup\{f(t) : t \in [t_{k-1}, t_k]\}$.
The lower and upper Riemann–Darboux sums are defined by
\[ L(f, P) = \sum_{k=1}^{n} m_k (t_k - t_{k-1}) \tag{4.21} \]
\[ U(f, P) = \sum_{k=1}^{n} M_k (t_k - t_{k-1}) \tag{4.22} \]

Let $\mathcal{P}$ be the collection of all partitions of [a, b].

Definition 4.5.1. A function f : [a, b] → R is Riemann integrable if
\[ \sup_{P \in \mathcal{P}} L(f, P) = \inf_{P \in \mathcal{P}} U(f, P) \tag{4.23} \]

The common value A(f ) in (4.23) is called the Riemann integral of f over [a, b].

It is easy to see that for any partitions P1 and P2 of [a, b]
\[ L(f, P_1) \le L(f, P_1 \cup P_2) \le U(f, P_1 \cup P_2) \le U(f, P_2). \]
It follows that f is Riemann integrable over [a, b] if and only if f is bounded and for any
ε > 0 there is a partition Pε such that
\[ U(f, P_\varepsilon) - L(f, P_\varepsilon) < \varepsilon. \tag{4.24} \]
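Criterion (4.24) can be watched at work on a concrete function. The sketch below assumes that f is nondecreasing on [a, b], so that the infimum and supremum on each subinterval are attained at its endpoints, and it uses uniform partitions only; the function name `darboux_sums` is a hypothetical helper.

```python
from fractions import Fraction

def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of a nondecreasing f on the uniform
    partition of [a, b] into n pieces: m_k = f(t_{k-1}) and M_k = f(t_k)."""
    t = [a + (b - a) * Fraction(k, n) for k in range(n + 1)]
    lower = sum(f(t[k - 1]) * (t[k] - t[k - 1]) for k in range(1, n + 1))
    upper = sum(f(t[k]) * (t[k] - t[k - 1]) for k in range(1, n + 1))
    return lower, upper

def square(x):
    return x * x

for n in (1, 10, 100):
    lower, upper = darboux_sums(square, 0, 1, n)
    print(n, lower, upper, upper - lower)
```

For f(x) = x² on [0, 1] the gap U − L equals exactly 1/n, so (4.24) holds as soon as n > 1/ε, and both sums squeeze the value 1/3.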
Theorem 4.5.2. Suppose that f is Riemann–integrable in [a, b], and let M ([a, b]) be the
Lebesgue σ–algebra. Then, f ∈ L1 ([a, b], M ([a, b]), λ) and f is continuous λ–a.s. Moreover,
$A(f) = \int_{[a,b]} f \, d\lambda$.

Proof. Choose partitions $P_n \subset P_{n+1}$ such that $U(f, P_n) - L(f, P_n) < 1/n$. For each
partition Pn , let $m_{n,k} = \inf\{f(t) : t \in [t_{n,k-1}, t_{n,k}]\}$ and $M_{n,k} = \sup\{f(t) : t \in [t_{n,k-1}, t_{n,k}]\}$.
Let gn and hn be defined by $g_n(a) = h_n(a) = f(a)$; and $g_n(t) = m_{n,k}$, $h_n(t) = M_{n,k}$ on
$t \in (t_{n,k-1}, t_{n,k}]$. Clearly, $g_n \le g_{n+1} \le f \le h_{n+1} \le h_n$ on $[a, b] \setminus P_n$, and
$\int_{[a,b]} g_n \, d\lambda = L(f, P_n) \le U(f, P_n) = \int_{[a,b]} h_n \, d\lambda$.
Dominated convergence implies $\int_{[a,b]} g \, d\lambda = \int_{[a,b]} h \, d\lambda = A(f)$, where $g = \lim_n g_n$ and $h = \lim_n h_n$.
Thus, since g ≤ f ≤ h, then g = f = h a.s. Let D = {t ∈ [a, b] : g(t) < h(t)}.
Then, f is continuous at every point $x \notin \bigcup_n P_n \cup D$.

Example 4.5.3. The function $f = 1_{[0,1] \setminus \mathbb{Q}} \in L^1([0, 1])$ and $\int_{[0,1]} f \, d\lambda = 1$; however, f is not
Riemann integrable in [0, 1] since U (f, P ) − L(f, P ) = 1 for any partition P of [0, 1].

Let f be a real valued function defined on an interval [a, b]. The modulus of continuity
of f on a set T ⊂ [a, b] is defined as
\[ \Omega_f(T) := \sup\{f(x) - f(y) : x, y \in T\}. \]
For x ∈ [a, b], the modulus of continuity of f at x is defined as
\[ \omega_f(x) = \lim_{\delta \searrow 0} \Omega_f(B(x; \delta) \cap [a, b]) = \inf_{\delta > 0} \Omega_f(B(x; \delta) \cap [a, b]) \]

Lemma 4.5.4. If ωf (x) < ε for all x ∈ [c, d] ⊂ [a, b], then there exists δ > 0 such that Ωf (T ) < ε
for all T ⊂ [c, d] with diam(T ) < δ.

Proof. For any x ∈ [c, d] there is δx > 0 such that Ωf (B(x; δx ) ∩ [c, d]) < ε. The collection
of all B(x; δx /2) forms an open cover of [c, d]. By compactness, there are x1 , . . . , xk with
$[c, d] \subset \bigcup_{j=1}^{k} B(x_j; \delta_j/2)$, where $\delta_j := \delta_{x_j}$. Let δ = min{δj /2}. If T ⊂ [c, d] with diam(T ) < δ, then T is fully
contained in at least one B(xj ; δj ) so Ωf (T ) < ε.

The following result gives a full characterization of Riemann integrable functions.

Theorem 4.5.5. (Lebesgue) A function f is Riemann–integrable in [a, b] iff f is bounded
and continuous λ–a.s. in [a, b].

Proof. Only sufficiency remains to be proved. Let m ≤ f ≤ M on [a, b]. For each r > 0, define Jr = {x ∈ [a, b] :
ωf (x) ≥ r}. Each Jr is a closed subset in [a, b], see Lemma 17.3.2, and the set of discontinuities
of f is $J = \bigcup_{k \in \mathbb{N}} J_{1/k}$. Then, each $J_{1/k}$ is a compact subset of measure zero; thus, for
each k, there is a finite collection Ak of open (w.r.t. [a, b]) intervals covering $J_{1/k}$ whose lengths
add up to less than 1/k. The complement of the union of the intervals in Ak is a finite collection Bk of
closed subintervals. By Lemma 4.5.4, there is δk > 0 such that if T is contained in one of the intervals of Bk and
diam(T ) < δk , then $\Omega_f(T) < \tfrac{1}{k}$. Let Pk be the partition formed by the endpoints of the intervals in Ak ,
and by those of subintervals contained in the intervals of Bk whose lengths are less than δk . Then,
\[ U(f, P_k) - L(f, P_k) = S_1 + S_2 \]
where S1 is formed by the subintervals in Ak and S2 by the subintervals contained in Bk . Then
S1 ≤ (M − m)/k and S2 ≤ (b − a)/k; hence, given ε > 0, for k large enough we have that
U (f, Pk ) − L(f, Pk ) < ε.

An important class of Riemann integrable functions is that of the so called piecewise
continuous functions. A function f on [a, b] is piecewise continuous if there exists a finite set
C ⊂ [a, b] such that f is continuous on [a, b] \ C, and f admits finite left limits and right
limits at every point in (a, b] and [a, b) respectively. A piecewise continuous function f on
[a, b] is piecewise differentiable if f is continuously differentiable outside of a countable
set D ⊂ [a, b], and its derivative f ′ , defined in [a, b] \ D, admits left and right limits at every
point of (a, b] and [a, b) respectively. A function is piecewise continuous (differentiable) in
R if it is piecewise continuous (differentiable) on any finite interval [a, b].

4.6. Integration under measurable transformations

We show here that a measurable function T from a measure space (Ω, F , µ) to a measurable
space (R, R) induces a measure on R. That is, T pushes forward a measure µ in the domain
to a measure µT on the image.
Lemma 4.6.1. (Factorization). Let f : (Ω, F ) → (Ω′ , F ′ ) be a measurable function, and let
(R, R) be either R or Rd with the Borel σ–algebra. If g : (Ω, σ(f )) → (R, R) is measurable,
then there exists a measurable function h : (Ω′ , F ′ ) → (R, R) such that g = h ◦ f .

Proof. It suffices to consider the case g ≥ 0 and then apply the conclusion of this case
to the real part u and the imaginary part v of g, and to the positive and negative parts of u and v. By
Lemma 4.2.2(ii) we have that g = \sum_{n=1}^{\infty} α_n 1_{A_n}, where α_n ≥ 0 are constants and A_n ∈ σ(f ).
Thus A_n = f^{-1}(B_n) for some B_n ∈ F ′ . Thus g = h ◦ f with h := \sum_{n=1}^{\infty} α_n 1_{B_n}.
Definition 4.6.2. Let T : (Ω, F , µ) → (R, R) be measurable. We define the induced measure
µT on R by
(4.25) µT (B) := µ(T^{-1}(B))
for all B ∈ R. The measure µT := µ ◦ T^{-1} is called the push forward of µ by T . When µ
is a probability measure, the induced measure is called the law or distribution of T under
µ.
Theorem 4.6.3. Consider T : (Ω, F , µ) → (R, R) and the induced measure µ ◦ T^{-1} on
(R, R). Suppose that f is an extended real or complex valued function defined on (R, R). Then,
f ◦ T ∈ L1 (Ω, F , µ) if and only if f ∈ L1 (R, R, µ ◦ T^{-1}). Furthermore,
(4.26) \int_{\Omega} f \circ T \, d\mu = \int_{R} f \, d(\mu \circ T^{-1})

Proof. The statement holds for indicator functions by (4.25) and thus, by linearity it holds
for simple functions. The extension to nonnegative real valued measurable functions follows
by monotone convergence. For general f , the conclusion follows by applying (4.26) to
ℜ(f )+ , ℜ(f )− , ℑ(f )+ and ℑ(f )− separately.
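The change of variables formula (4.26) can be checked by hand on a finite measure space, where both integrals are finite sums. The following is a minimal sketch under assumptions that are not in the text: the space Ω = {0, ..., 5}, the weights `mu`, the map `T` and the integrand `f` are all hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space: Omega = {0,...,5} with point masses mu.
mu = {w: Fraction(w + 1, 21) for w in range(6)}

def T(w):
    """A map from Omega into R (illustrative choice: w -> (w - 2)^2)."""
    return (w - 2) ** 2

def pushforward(mu, T):
    """Return mu o T^{-1} as a dict {value: mass}, following (4.25)."""
    law = {}
    for w, mass in mu.items():
        law[T(w)] = law.get(T(w), Fraction(0)) + mass
    return law

def f(y):
    """An integrand defined on the image space (illustrative choice)."""
    return 3 * y - 1

law = pushforward(mu, T)
lhs = sum(f(T(w)) * mass for w, mass in mu.items())  # integral of f o T with respect to mu
rhs = sum(f(y) * mass for y, mass in law.items())    # integral of f with respect to mu o T^{-1}
print(lhs == rhs)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` instead of a floating point tolerance.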
Theorem 4.6.4. Suppose F : R −→ R is a nondecreasing right–continuous function
(F (t+) := limx→t+ F (x) = F (t)). Then, there exists a unique measure µ on (R, B(R))
such that µ((a, b]) = F (b) − F (a) for all a < b.

Proof. For α = inf_{x∈R} F (x) and β = sup_{x∈R} F (x) let ((α, β), B((α, β)), λ) be a standard
Lebesgue space. Define the map X : (α, β) −→ R by
X(t) = inf{x ∈ R : F (x) ≥ t}
Monotonicity and right–continuity of F imply that X(t) ≤ x if and only if
t ≤ F (x). Hence X is measurable, and the induced measure µ = λ ◦ X^{-1} on B(R) satisfies
µ((a, b]) = λ(X^{-1}((a, b])) = λ((F (a), F (b)]) = F (b) − F (a).

It follows that µ is a σ–finite measure on (R, B(R)) such that µ((a, b]) = F (b) − F (a). Since the
collection of intervals {(a, b] : a < b} is a π–system that generates B(R), uniqueness follows
by Sierpinski’s theorem.
Lemma 4.6.5. Suppose H : (a, b) → R, −∞ ≤ a < b ≤ ∞, is a non–decreasing function,
and define
G(x) = H(x−) = \sup_{y<x} H(y)   (a < x < b)
F (x) = H(x+) = \inf_{x<z} H(z)   (a < x < b).

Then, G and F are non–decreasing functions such that G ≤ H ≤ F , and

F (x) = F (x+) = G(x+)
G(x) = G(x−) = F (x−).

Proof. For any a < x < y < z < b, the monotonicity of H implies that
(4.27) G(x) ≤ H(x) ≤ F (x) ≤ G(y) ≤ H(y) ≤ F (y) ≤ G(z) ≤ H(z) ≤ F (z).
Letting y ↗ z we obtain
G(x) ≤ H(x) ≤ F (x) ≤ G(z−) ≤ G(z) ≤ F (z−) ≤ G(z).
Thus, G(z) = F (z−). Letting x ↗ z gives
G(z−) ≤ G(z) ≤ G(z−).
Therefore, G(z) = G(z−) = F (z−). Similarly, by letting first y ↘ x, and then letting
z ↘ x in (4.27), we obtain that F (x) = F (x+) = G(x+).
Lemma 4.6.6. Given a right–continuous non–decreasing function F : (a, b) → R, let α =
inf a<x<b F (x), β = supa<x<b F (x) and G(x) = F (x−). For any α < q < β define
(4.28) Q(q) = inf{x ∈ (a, b) : F (x) ≥ q}
(4.29) Q+ (q) = sup{x ∈ (a, b) : G(x) ≤ q}.
Then, Q is a non–decreasing left continuous function, Q+ is a non–decreasing right contin-
uous function,
(4.30) F (Q(q)) ≥ q
(4.31) G(Q+ (q)) ≤ q,
Q(q) = Q+ (q−), and Q+ (q) = Q(q+).

Proof. The definition of Q and the monotonicity of F imply that (Q(q), ∞) ⊂ {x : F (x) ≥
q}. If Q(q) < zn ↘ Q(q) then F (zn ) ≥ q, and by the right–continuity of F , F (Q(q)) ≥ q.
Consequently,
(4.32) F (x) ≥ q iff Q(q) ≤ x.

Similarly, the monotonicity of G implies that (−∞, Q+ (q)) ⊂ {x : G(x) ≤ q}. If Q+ (q) >
xn ↗ Q+ (q), then G(xn ) ≤ q and, by left–continuity of G, G(Q+ (q)) ≤ q. Consequently,
(4.33) G(x) ≤ q iff x ≤ Q+ (q).
We claim that Q+ (q) ≤ Q(p) whenever q < p. Indeed, if x < Q+ (q), choose y with
x < y < Q+ (q); then, by (4.33), F (x) ≤ G(y) ≤ q < p. From (4.32) we get that x < Q(p).
Consequently, for any α < q < p < β,
(4.34) Q(q) ≤ Q+ (q) ≤ Q(q+) ≤ Q+ (q+) ≤ Q(p) ≤ Q+ (p)
(4.35) Q(q) ≤ Q+ (q) ≤ Q(p−) ≤ Q+ (p−) ≤ Q(p) ≤ Q+ (p).
Applying G to each side of (4.34) leads to
G(Q+ (q)) ≤ G(Q(q+)) ≤ G(Q+ (q+)) ≤ p.
By letting p ↘ q we obtain that
G(Q+ (q)) ≤ G(Q(q+)) ≤ G(Q+ (q+)) ≤ q.
Therefore, Q+ (q) ≤ Q(q+) ≤ Q+ (q+) ≤ Q+ (q).

Similarly, by applying F to each side of (4.35) and then letting q ↗ p, we obtain Q(p) ≤
Q(p−) ≤ Q+ (p−) ≤ Q(p).
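For a purely atomic distribution the functions Q and Q+ of Lemma 4.6.6 can be computed by inspecting the atoms, which makes (4.30), (4.31) and the ordering Q ≤ Q+ easy to check. The sketch below is illustrative only: the atoms and weights are hypothetical, F(x) = P[X ≤ x] and G(x) = P[X < x], and the infimum and supremum in (4.28) and (4.29) are taken over atoms, which suffices for a step function of this kind and for 0 < q < 1.

```python
from fractions import Fraction

# Hypothetical discrete law: atoms and their probabilities.
atoms = [-1, 0, 2, 5]
probs = [Fraction(1, 4)] * 4

def F(x):
    """Right-continuous distribution function F(x) = P[X <= x]."""
    return sum(p for a, p in zip(atoms, probs) if a <= x)

def G(x):
    """Left-continuous version G(x) = F(x-) = P[X < x]."""
    return sum(p for a, p in zip(atoms, probs) if a < x)

def Q(q):
    """Q(q) = inf{x : F(x) >= q}; for a step function the infimum is an atom."""
    return min(a for a in atoms if F(a) >= q)

def Q_plus(q):
    """Q+(q) = sup{x : G(x) <= q}; for a step function the supremum is an atom."""
    return max(a for a in atoms if G(a) <= q)

for q in [Fraction(k, 16) for k in range(1, 16)]:
    print(q, Q(q), Q_plus(q), F(Q(q)) >= q, G(Q_plus(q)) <= q)
```

The two functions differ exactly at the levels q attained by F on an interval between atoms (here q = 1/4, 1/2, 3/4), which is where Q jumps.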
Example 4.6.7. If µ is a finite measure on (R, B(R)) then F (x) = µ(−∞, x] and G(x) =
µ(−∞, x) are non–decreasing functions such that F (x) = F (x+) = G(x+) and G(x) =
G(x−) = F (x−).
Example 4.6.8. The standard normal distribution µ on B(R) is the probability mea-
sure defined by
µ(dx) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx
Let T (x) = x^2 ; then, for t > 0,
µ(T^{-1}(−∞, t]) = µ([−\sqrt{t}, \sqrt{t}]) = 2 \int_0^{\sqrt{t}} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx
= \frac{1}{\sqrt{2\pi}} \int_0^t s^{-1/2} e^{-s/2} \, ds = \frac{1/2}{\Gamma(1/2)} \int_0^t (s/2)^{\frac{1}{2}-1} e^{-s/2} \, ds
Thus, the law of T is given by µT (dt) = \frac{1/2}{\Gamma(1/2)} (t/2)^{\frac{1}{2}-1} e^{-t/2} 1_{(0,\infty)}(t) \, dt which, in Statistics,
is known as the χ²₁–distribution.
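The identity between the law of T and the stated density can be checked numerically. The sketch below is an illustration and not part of the original notes: it uses the standard fact that the standard normal measure of [−√t, √t] equals erf(√(t/2)), and integrates the density with a midpoint rule after the substitution s = u², a choice made here only to remove the integrable singularity at 0.

```python
import math

def law_of_square(t):
    """mu(T^{-1}(-inf, t]) = mu([-sqrt(t), sqrt(t)]) for the standard normal mu."""
    return math.erf(math.sqrt(t / 2.0)) if t > 0 else 0.0

def chi2_density(s):
    """Density (1/2)/Gamma(1/2) * (s/2)^(1/2 - 1) * exp(-s/2) on (0, inf)."""
    return 0.5 / math.gamma(0.5) * (s / 2.0) ** (-0.5) * math.exp(-s / 2.0)

def chi2_cdf(t, n=20000):
    """Integral of the density over (0, t], midpoint rule in the variable u = sqrt(s)."""
    h = math.sqrt(t) / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += chi2_density(u * u) * 2.0 * u * h  # ds = 2u du
    return total

for t in (0.5, 1.0, 4.0):
    print(t, law_of_square(t), chi2_cdf(t))
```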

4.7. Exercises
Exercise 4.7.1. In Steinhaus space ([0, 1], B([0, 1]), λ), give sequences fn and gn of non-
negative measurable functions that converge to zero and such that
(a) \lim_n \int_{[0,1]} f_n(t) \, dt = 1
(b) \lim_n \int_{[0,1]} g_n(t) \, dt = \infty
Give sequences hn , pn of nonnegative measurable functions such that 0 = lim inf_n hn <
lim sup_n hn = ∞, and 0 = lim inf_n pn < lim sup_n pn = ∞ such that
(c) 1 = \liminf_n \int_{[0,1]} h_n(t) \, dt < \limsup_n \int_{[0,1]} h_n(t) \, dt = 2
(d) \lim_n \int_{[0,1]} p_n(t) \, dt = 0
Exercise 4.7.2. Suppose that fn is a sequence of integrable functions such that fn ≥ fn+1
for all n, and let f = lim_n fn . Show that
\lim_{n\to\infty} \int_{\Omega} f_n \, d\mu = \int_{\Omega} f \, d\mu
Exercise 4.7.3. Suppose f is a continuous function in an interval [A, B]. For any A ≤ a <
b < B, show that
\lim_{r\to\infty} \int_a^b r \big( f(t + \tfrac{1}{r}) − f(t) \big) \, dt = f(b) − f(a)
Exercise 4.7.4. Suppose f ∈ L1 ([0, ∞), B([0, ∞)), λ1 ). Find
\lim_{r\to\infty} \frac{1}{r} \int_0^r x f(x) \, dx

(Hint: For g_r(x) := \frac{1}{r} x f(x) 1_{[0,r]}(x), r > 0, we have |g_r| ≤ |f |)

Exercise 4.7.5. Suppose that K : [a, b] × Ω → R, and for each t ∈ [a, b] fixed let Kt : ω ↦
K(t, ω). Suppose that
(i) for each t, Kt ∈ L1 (Ω, F , µ)
(ii) for each t, the map ω ↦ \frac{\partial}{\partial t} K(t, ω) exists and is measurable
(iii) |\frac{\partial}{\partial t} K(t, ω)| ≤ g(ω) for some g ∈ L1 .
Show that f : t ↦ \int_{\Omega} K(t, ω) µ(dω) is differentiable and that f ′ (t) = \int_{\Omega} \frac{\partial}{\partial t} K(t, ω) µ(dω).
Exercise 4.7.6. Let (S, d) be a metric space, and denote by τ and Cb (S) the set of all open
sets and the space of all real bounded continuous functions on S. For any measures µ and ν
on B(S), show that µ(U ) = ν(U ) for all U ∈ τ iff \int f \, dµ = \int f \, dν for all f ∈ Cb (S). (Hint:
For any U ∈ τ , let gn (x) = 1 ∧ (n d(x, U^c )); then, gn ↗ 1U . Conversely, for any f ∈ Cb (S)
with 0 ≤ f ≤ 1, let f_n = 2^{-n} \sum_{k=1}^{2^n − 1} 1_{\{f > 2^{-n} k\}} ; then, fn ↗ f .)
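The dyadic approximation in the hint of Exercise 4.7.6 depends on f only through its value at each point, so its monotonicity and the error bound f − fn ≤ 2^{-n} can be explored pointwise. The helper name `dyadic_lower` below is a hypothetical label introduced for this sketch; it evaluates fn at a point where f takes a given value in [0, 1].

```python
def dyadic_lower(value, n):
    """Value of f_n = 2^{-n} * sum_{k=1}^{2^n - 1} 1{f > k 2^{-n}} where f equals `value`."""
    count = sum(1 for k in range(1, 2 ** n) if value > k / 2 ** n)
    return count / 2 ** n

for v in (0.0, 0.3, 0.5, 1.0):
    print(v, [dyadic_lower(v, n) for n in range(1, 6)])
```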

Exercise 4.7.7. Let µ be a measure on the Borel space (R, B) such that the integral
\int e^{tx} µ(dx) < ∞ for all t in an open interval I. Show that:
(a) If A is a compact subset of R, then µ(A) < ∞.
(b) For any t ∈ I the integral \int |x| e^{tx} µ(dx) < ∞.
(c) The map φ given by t ↦ \int e^{tx} µ(dx) is differentiable on I and
φ′(t) = \int x e^{tx} µ(dx)

(Hint: Use the mean value theorem to show that |e^u − 1| ≤ |u|(1 ∨ e^u ))

Exercise 4.7.8. Show that the sequence a_n = \int_{-\sqrt{n}}^{\sqrt{n}} \big(1 − \tfrac{x^2}{2n}\big)^n \, dx converges and identify the limit.

Exercise 4.7.9. (Gamma function) For any r > 0 define
(4.36) Γ(r) = \int_0^{\infty} t^{r-1} e^{-t} \, dt
(a) Show that Γ(r) < ∞ for all r > 0.
(b) Show that Γ(r + 1) = rΓ(r). In particular, for r ∈ N,
Γ(r + 1) = r!,   Γ(r + \tfrac{1}{2}) = \frac{(2r − 1)!}{2^{2r-1} (r − 1)!} Γ(\tfrac{1}{2}).
(Hint: Integrate by parts).
(c) Show that Γ is differentiable and that Γ′(r) = \int_0^{\infty} t^{r-1} \log(t) e^{-t} \, dt.
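Before proving the identities in part (b) it can be reassuring to test them numerically. The sketch below compares both formulas against `math.gamma` from the Python standard library for small r; the function name `gamma_half_integer` is a hypothetical helper for the right-hand side, and agreement up to rounding is of course not a proof.

```python
import math

def gamma_half_integer(r):
    """Right-hand side of (b): (2r-1)! / (2^(2r-1) (r-1)!) * Gamma(1/2), for r in N."""
    numerator = math.factorial(2 * r - 1)
    denominator = 2 ** (2 * r - 1) * math.factorial(r - 1)
    return numerator / denominator * math.gamma(0.5)

for r in range(1, 8):
    print(r, math.gamma(r + 0.5), gamma_half_integer(r), math.gamma(r + 1), math.factorial(r))
```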
Exercise 4.7.10. (Beta function)
(a) Show that the integral
(4.37) B(a, b) = \int_0^1 x^{a-1} (1 − x)^{b-1} \, dx < ∞
for all 0 < a, b < ∞. This is the beta function.
(b) Letting x = sin^2 θ, 0 < θ < π/2, show that
B(a, b) = 2 \int_0^{\pi/2} \sin^{2a-1} θ \, \cos^{2b-1} θ \, dθ
Exercise 4.7.11. Define f_p(x) = \frac{1}{(1 − x^2)^p} 1_{(−1,1)}(x).
(a) Show that x^n f_p(x) ∈ L1 (λ) for all p < 1 and n ∈ Z+ .
(b) Justify the following
\int_{[−1,1]} f_p(x) e^{-ixt} \, dx = \sum_{n \ge 0} \frac{(−1)^n t^{2n}}{(2n)!} \int_{[−1,1]} \frac{x^{2n}}{(1 − x^2)^p} \, dx
(Hint: \big| f_p(x) \sum_{n=0}^{N} \frac{(−ixt)^n}{n!} \big| ≤ e^{|t|} f_p(x)).
(c) Show that
\int_{[−1,1]} f_p(x) e^{-ixt} \, dx = \sum_{n \ge 0} \frac{(−1)^n t^{2n}}{(2n)!} B(n + \tfrac{1}{2}, 1 − p)

Exercise 4.7.12. Let
f(x) = \Big( \int_0^x e^{-t^2} \, dt \Big)^2 ,   g(x) = \int_0^1 \frac{e^{-x^2 (1+t^2)}}{1 + t^2} \, dt
(a) Show that g′(x) + f ′(x) = 0 for all x and deduce that g(x) + f (x) = π/4.
(b) Conclude from (a) that
\int_0^{\infty} e^{-t^2} \, dt = \frac{\sqrt{\pi}}{2} = \frac{1}{2} Γ(\tfrac{1}{2}).
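The conservation law g + f = π/4 in part (a) lends itself to a quick numerical sanity check. The sketch below approximates both integrals with a composite Simpson rule written from scratch; the helper `simpson`, the number of subintervals and the sample points are assumptions made for this illustration.

```python
import math

def simpson(fun, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = fun(a) + fun(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fun(a + i * h)
    return total * h / 3.0

def f(x):
    """f(x) = (integral of exp(-t^2) over [0, x])^2."""
    return simpson(lambda t: math.exp(-t * t), 0.0, x) ** 2

def g(x):
    """g(x) = integral over [0, 1] of exp(-x^2 (1 + t^2)) / (1 + t^2)."""
    return simpson(lambda t: math.exp(-x * x * (1 + t * t)) / (1 + t * t), 0.0, 1.0)

for x in (0.0, 0.5, 1.0, 3.0):
    print(x, f(x) + g(x), math.pi / 4)
```

Since g(x) tends to 0 as x grows, the square root of f(x) for large x approximates the Gaussian integral in part (b).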
Exercise 4.7.13. (a) Show directly from Definition 4.5.1 that if f is monotone on [a, b],
then f is Riemann integrable. (b) Suppose g is Riemann integrable over [a, b] and define
G(x) = \int_{[a,x]} g(t) \, dt, for all a ≤ x ≤ b. Show that G is continuous on [a, b] and, if g is
continuous at x0 , show that G is differentiable at x0 and that G′ (x0 ) = g(x0 ).

Exercise 4.7.14. (Direct Riemann integrable functions) Let f : [0, ∞) → [0, ∞). For any
h > 0 let M_n^h(f ) = sup{f (x) : x ∈ [(n − 1)h, nh)}, and similarly m_n^h(f ) = inf{f (x) : x ∈
[(n − 1)h, nh)}. Define
\overline{f}^{h}(x) = \sum_{n=1}^{\infty} M_n^h(f) 1_{[(n−1)h,nh)}(x),   U_h(f) = \int_{[0,\infty)} \overline{f}^{h}(x) \, dx
\underline{f}_{h}(x) = \sum_{n=1}^{\infty} m_n^h(f) 1_{[(n−1)h,nh)}(x),   L_h(f) = \int_{[0,\infty)} \underline{f}_{h}(x) \, dx

The function f is said to be direct Riemann integrable (d.R.i.) if U_h(f ) < ∞ for all h > 0,
and lim_{h→0} (U_h(f ) − L_h(f )) = 0. Show that
(a) If U_{h0}(f ) < ∞ for some h0 > 0, then U_h(f ) < ∞ for all h > 0.
(b) f is d.R.i. iff f is bounded, continuous a.s. and U_{h0}(f ) < ∞ for some h0 > 0.
(c) If f is d.R.i., then f ∈ L1 ([0, ∞), B([0, ∞)), λ).
(d) Suppose that f is bounded and continuous a.s. and g is a d.R.i. function. If f ≤ g,
then f is also d.R.i.

Exercise 4.7.15. (Quantile function) Let (Ω, F , P) and X be a probability space and a
real–valued measurable function on (Ω, F ) respectively. For any q ∈ (0, 1), a number zq
such that
P[X < zq ] ≤ q and P[X ≤ zq ] ≥ q,
is called a q–quantile of X. The functions Q and Q+ on (0, 1) defined by
Q(q) = inf{x ∈ R : P[X ≤ x] ≥ q}
Q+ (q) = sup{x ∈ R : P[X < x] ≤ q}
are non–decreasing left–continuous and right–continuous respectively. Show that
1. zq is a q–quantile of X iff −zq is a (1 − q)–quantile of −X.
2. Q(q) is the smallest q–quantile of X and Q+ (q) is the largest q–quantile of X.

3. Show that λ ◦ Q^{-1}(−∞, x] = P[X ≤ x]. Thus, Q and X have the same law and, for
any bounded measurable function φ on R, \int_{(0,1)} φ(Q(p)) \, dp = \int_{\Omega} φ(X(ω)) P(dω).
Exercise 4.7.16. Let µ be a measure on R and let f , g ∈ L+ (µ) such that \int_R f \, dµ = \int_R g \, dµ.
If there is c ∈ R such that f (x) ≤ g(x) for almost all x < c and f (x) ≥ g(x) for almost all
x > c, show that
µ · f ((a, ∞)) ≤ µ · g ((a, ∞)),   a ∈ R
where d(µ · f ) = f (x) dµ and d(µ · g) = g(x) dµ.
Exercise 4.7.17. (Generalized Chebyshev–Markov inequality) Let (Ω, F , µ) be a measure
space, f : Ω −→ R measurable and suppose φ : R −→ [0, ∞) is a nondecreasing function
such that φ ◦ f ∈ L1 (µ).
(a) For any function g : (R2 , B(R2 )) → [0, ∞) such that g(x, a) ≥ 1_{\{x>a\}} , show that
(4.38) µ({f > a}) ≤ \frac{1}{\int_{\Omega} φ \circ f \, dµ} \int_{\Omega} g(f(ω), a) φ(f(ω)) \, µ(dω),   a ∈ R
(Hint: Consider the induced measure µf = µ ◦ f^{-1} on B(R) and Exercise 4.7.16
with µf · 1 and \frac{1}{µ_f φ} µf · φ).
(b) Show that the Chebyshev–Markov inequality (4.4) follows as a particular example
of (4.38) (Hint: consider φ ≡ 1, f ≥ 0 and g(x, a) = \frac{x}{a} 1_{\{x>a\}} , where x, a ≥ 0).
(c) For any nonnegative nondecreasing function v, show that
µ(f > a) ≤ \frac{1}{v(a)} \int v \circ f \, dµ,   a ∈ R.
(Hint: this can be proved directly as in the proof of the Chebyshev–Markov inequal-
ity or from (4.38) with φ ≡ 1 and g(x, a) = \frac{v(x)}{v(a)} ).

Exercise 4.7.18. Let (Ω, F , µ) and (Ψ, H, ν) be finite measure spaces with µ(Ω) = ν(Ψ).
Assume that (R, σ(C)) is a measurable space where C is a π–system. If X, Y are measurable
functions in R defined on Ω and Ψ respectively, show that the induced measures µX and
νY on (R, σ(C)) are the same if and only if they coincide on C.
Exercise 4.7.19. (Decreasing rearrangement). Suppose f is a real–valued measurable
function on (Ω, F , µ). Let δf (t) = µ[|f | > t] and define
f ∗ (s) = inf{t : δf (t) ≤ s}
(a) Show that δf (t) ≤ s iff f ∗ (s) ≤ t. (b) Show that f ∗ is nonincreasing and right–
continuous. (c) Suppose that δf (s) < ∞ for all s > 0. Show that f ∗ (δf (s)) ≤ s and
δf (f ∗ (s)) ≤ s. (d) Let λ be Lebesgue measure on ([0, ∞), B([0, ∞))). Show that
λ[f ∗ > t] = µ[|f | > t]
Thus f ∗ and |f | have the same law. In particular, for any measurable function φ : [0, ∞] →
R, we have that \int φ(|f |(x)) \, µ(dx) = \int_0^{\infty} φ(f ∗ (s)) \, ds.
Exercise 4.7.20. Suppose that (Ω, F , µ) is a finite measure space.
(i) Show that (F∗ )∗ = F∗ .
(ii) Show that \overline{F^{*}}^{\mu} = \overline{F}^{\mu}.
Chapter 5

Baire Category and Stone–Weierstrass theorem

In this section we discuss two useful results from point set topology. The first result, known
as Baire’s Category Theorem, describes fat sets in topological spaces. The second result,
known as the Stone–Weierstrass Theorem, is one of the most important results in basic
Analysis. In its classical form, it states that continuous functions on a compact set can be
uniformly approximated by polynomials. This result will be very useful in Chapter 6 where
we discuss a functional approach to integration theory.
We will also develop in this section the functional counterpart of a monotone class of
sets. Monotone classes are useful to determine when two measures are equal.

5.1. Baire category

Definition 5.1.1. A set E ⊂ X is nowhere dense if (\overline{E})° = ∅, that is, if its closure has
empty interior. A set F is of first category if it is the countable union of nowhere dense sets.
If a set U is not of first category then it is said to be of second category .

Observe that if E is a closed set which is nowhere dense, then V = X \ E is an open
dense set in X, for
X \ \overline{(X \ E)} = (X \ (X \ E))° = E° = ∅

The following result, known as the category theorem, has many theoretical applications in
mathematics.

Theorem 5.1.2. (Baire) If X is either
(a) a complete metric space, or
(b) a locally compact Hausdorff space,
then the intersection of a countable family of open dense sets in X is also dense in X.

Proof. Let {Un } be a sequence of dense open sets in X. Let B0 be a nonempty open set
in X. We will show that B0 ∩ ⋂n Un ≠ ∅. Given an integer n ≥ 1, suppose we have chosen
a nonempty open set Bn−1 . Since Un is open and dense, there is a nonempty open set
Bn with \overline{B_n} ⊂ Un ∩ Bn−1 . In case (a) we take Bn to be a ball of diameter less than 1/n; in
case (b) we take Bn to be an open set with compact closure, as in Lemma 2.11.2. Let
K = \bigcap_{n \ge 1} \overline{B_n} .

In case (a), the centers of the balls Bn form a Cauchy sequence, which, by completeness,
implies that K ≠ ∅. In case (b), \overline{B_n} is a decreasing sequence of nonempty compact sets and
its intersection K is therefore nonempty. In either case, ∅ ≠ K ⊂ B0 ∩ ⋂n Un .

A space X where the intersection of any sequence of open dense sets is dense is called a
Baire space. Equivalently, X is a Baire space iff any sequence of nowhere dense closed sets
has union with empty interior. Indeed, {Un : n ∈ N} is a sequence of open dense subsets of
X iff {X \ Un : n ∈ N} is a sequence of nowhere dense closed sets. Thus, ⋂n Un is dense in
X iff ⋃n (X \ Un ) has empty interior. From this observation, it follows that if X is a Baire
space, then X is of second category.
Example 5.1.3. The set Q is not a Gδ set. If it were, then Q = ⋂n Un for some sequence
{Un } of open dense subsets of R. Since R = ⋃n (R \ Un ) ∪ Q and Q is countable, it would follow
that R is of first category, which is false since R is complete.
Example 5.1.4. Let F : X → S be any function from a topological space X into a metric
space S. For any ε > 0 let Uε be the union of all open sets U ⊂ X such that diam(F (U )) < ε.
Clearly, F is continuous at x iff x ∈ G = ⋂n U1/n . This shows that the set of continuity
points of a function F is a Gδ set. Consequently, by Example 5.1.3, there is no function from R
into a metric space S whose set of continuity points is exactly Q.
Example 5.1.5. Given a Gδ set G ⊂ R, there exists a function f on R that is continuous
at G and discontinuous anywhere else. Indeed, let Gn ⊂ R be a decreasing sequence of
open sets with G = ⋂n Gn and ψn = 1_{G_n} + \tfrac{1}{2} 1_{Q \setminus G_n} − \tfrac{1}{2} 1_{Q^c \setminus G_n}. Then, Ψ = \sum_n 2^{-n} ψ_n is continuous at G and
discontinuous anywhere else.

5.2. Order on Vector spaces

In this section we recall the concept of vector space and introduce the concept of an order
that is compatible with the algebraic structure of linear spaces. These concepts will be used in
our discussion of the Stone–Weierstrass theorem in the following section. Order structures will
appear in our discussion of elementary integrals and Daniell integration.

Definition 5.2.1. A vector space over a field F is a non empty set V with two operations:
addition that maps (x, y) ∈ V × V to an element x + y ∈ V , a scalar product that maps
(λ, x) ∈ F × V to an element λx ∈ V . These operations satisfy
(a) x + y = y + x.
(b) x + (y + z) = (x + y) + z.
(c) There is 0 ∈ V such that 0 + x = x + 0 = x.
(d) For each x ∈ V there is −x ∈ V such that x + (−x) = (−x) + x = 0.
(e) λ(γ x) = (λγ) x
(f) For all x ∈ V , 1 x = x, and (−1)x = −x
(h) λ(x + y) = λ x + λ y.
(i) (λ + γ)x = λ x + γ x.
A vector ring (simply ring if the context is clear) is a vector space R with an additional
operation (product) mapping each (x, y) ∈ R × R to an element xy ∈ R satisfying the
following properties:
(j) x(yz) = (xy)z
(k) xy = yx, and (λx)y = x(λy) for all x, y ∈ R and λ ∈ F.
An algebra A is a ring that has an element e ∈ A such that ex = xe = x for all x ∈ A.
Example 5.2.2. Consider FΩ , where F is either the set of real numbers R or the set of complex
numbers C. We define sum, scalar multiplication and product of functions pointwise, i.e.
(f + g)(x) = f (x) + g(x), (af )(x) = af (x), and (f g)(x) = f (x)g(x) for all x ∈ Ω and a ∈ F.
V ⊂ FΩ is a vector space of functions if af + g ∈ V for all f, g ∈ V and a ∈ F.
(a) A vector space of functions V is a ring if f · g ∈ V for all f, g ∈ V.
(b) A ring of functions V is an algebra if it contains the constant function 1.
(c) A vector space of functions V is a vector lattice if f ∧ g ∈ V, and hence f ∨ g ∈ V,
for any f, g ∈ V.
Definition 5.2.3. Suppose V is a vector space over R. A partial order ≤ on V is compatible
with the linear structure of V if for any α > 0 and x, y, z ∈ V we have
x ≤ y =⇒ αx ≤ αy, x ≤ y =⇒ x + z ≤ y + z
In such case (V, ≤) is said to be a partially ordered vector space.
A partially ordered vector space V is a vector lattice if for any x, y ∈ V there is z ∈ V ,
denoted as x ∨ y, such that x ≤ z, y ≤ z and z ≤ u whenever x ≤ u and y ≤ u.
Example 5.2.4. A vector subspace V ⊂ RΩ is a vector lattice if f ∧ g := min{f, g} ∈ V (or
equivalently f ∨ g := max{f, g} ∈ V) for all f, g ∈ V.
A set M ⊂ V is a linear subspace of V if M is also a linear space, that is, αx + y ∈ M
for all α ∈ F and x, y ∈ M . A linear subspace M of a partially ordered vector space V
majorizes V if for each x ∈ V , there is y ∈ M with x ≤ y.

Example 5.2.5. Let ℓ∞ denote the space of all bounded functions in RN. The subspace c
of convergent sequences on R majorizes ℓ∞ .

5.3. Stone–Weierstrass Theorem

A collection of real valued functions {φα : α ∈ D} is said to be increasingly directed if
(D, ⪯) is a directed set and for any α, β ∈ D, α ⪯ β implies that φα ≤ φβ . The following
Lemma is a classical result from the theory of continuous functions.
Lemma 5.3.1. (Dini’s Theorem) Let S be a compact set and let {φα : α ∈ D} be an
increasing directed family of continuous functions for which φ = supα φα is continuous.
Then, φα converges uniformly to φ.

Proof. Since φα (x) → φ(x) for each x ∈ S and φ − φα ∈ C(S) for each α ∈ D, the sets
Uα = {φ − φα < ε}, with ε > 0 fixed and α ∈ D, form an increasing directed open cover of
S. Hence, by compactness and since the cover is increasingly directed, S = Uα0 for some α0 ∈ D. Consequently, for any x ∈ S and any α ∈ D that follows α0 ,
|φ(x) − φα (x)| = φ(x) − φα (x) ≤ φ(x) − φα0 (x) < ε.
This shows that {φα : α ∈ D} converges to φ uniformly.

Dini’s lemma is typically applied in the context of monotone sequences of continuous
functions on a compact set for which pointwise limits are also continuous.
Lemma 5.3.2. Consider the sequence of functions (pn (t))n defined on [−1, 1] by
p_0(t) ≡ 0,   p_{n+1}(t) = \tfrac{1}{2} \big( t^2 + 2 p_n(t) − (p_n(t))^2 \big).
Then, 0 ≤ pn (t) ≤ pn+1 (t) and pn (t) → |t| uniformly.

Proof. A simple computation shows that

(5.1) 2(pn+1 (t) − |t|) = (2 − pn (t))pn (t) − (2 − |t|)|t|
(5.2) 2(pn+1 (t) − pn (t)) = t2 − (pn (t))2
Since the function ϕ(x) = (2 − x)x is increasing on [0, 1], it follows from (5.1) that 0 ≤
pn+1 (t) ≤ |t| whenever 0 ≤ pn (t) ≤ |t|; as this holds for p0 (t) = 0, it also holds for all pn (t)
by induction. Having proved that 0 ≤ pn (t) ≤ |t| for all n and t ∈ [−1, 1], it follows from
(5.2) that 0 ≤ pn (t) ≤ pn+1 (t) ≤ |t|. From (5.2) we obtain that limn pn (t) = |t|. Uniform
convergence follows from Dini’s theorem.
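The recursion of Lemma 5.3.2 is easy to run numerically, which gives a feel for how the polynomials pn climb towards |t|. The sketch below evaluates the iterates on a uniform grid of [−1, 1] and records the largest gap |t| − pn(t) over the grid; the grid size and the helper names `p_next` and `sup_errors` are assumptions made for this illustration, and a grid maximum only approximates the true supremum.

```python
def p_next(p, t):
    """One step of the recursion p_{n+1}(t) = (t^2 + 2 p_n(t) - p_n(t)^2) / 2."""
    return 0.5 * (t * t + 2.0 * p - p * p)

def sup_errors(n_steps, grid_size=401):
    """Return [max over the grid of |t| - p_n(t)] for n = 0, ..., n_steps."""
    grid = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    values = [0.0] * grid_size  # p_0 = 0
    errors = []
    for _ in range(n_steps + 1):
        errors.append(max(abs(t) - p for t, p in zip(grid, values)))
        values = [p_next(p, t) for p, t in zip(values, grid)]
    return errors

errs = sup_errors(200)
print(errs[0], errs[10], errs[50], errs[200])
```

The identity |t| − p_{n+1}(t) = (|t| − p_n(t))(1 − (|t| + p_n(t))/2), which follows from (5.1), explains the observed behaviour: the gap shrinks quickly where |t| is large and slowly near t = 0, so the uniform error decays only at a rate of order 1/n.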
Theorem 5.3.3. Let [−M, M ] be a symmetric interval with M > 0. There are sequences
of polynomials Pn (t) and Qn (t) that vanish at t = 0 which converge uniformly to |t| and
t ∧ 1 over [−M, M ] respectively.

Proof. Let pn (t) be as in Lemma 5.3.2 and consider the sequences
P_n(t) = (M + 1) \, p_n\big( \tfrac{t}{M+1} \big)
\tilde{Q}_n(t) = \tfrac{1}{2} \big( t + 1 − P_n(t − 1) \big)
Clearly Pn (t) converges to |t| uniformly over [−M − 1, M + 1] and Pn (0) = 0; while \tilde{Q}_n(t)
converges to \tfrac{1}{2}(t + 1 − |t − 1|) = t ∧ 1 uniformly over [−M, M + 2]. Since \tilde{Q}_n(0) → 0 as
n → ∞, the sequence Q_n(t) = \tilde{Q}_n(t) − \tilde{Q}_n(0) satisfies the conditions of the result.
Lemma 5.3.4. For each n ∈ Z+ let gn be the function on [−1, 1] given by
g_n(t) = \bigvee \Big\{ \frac{2kt}{2^n} − \frac{k^2}{2^{2n}} : k ∈ Z, |k| ≤ 2^n \Big\}
(5.3)  = \bigvee \Big\{ \frac{2kt}{2^n} − \Big( \frac{2kt}{2^n} ∧ \frac{k^2}{2^{2n}} \Big) : k ∈ Z, |k| ≤ 2^n \Big\}
Then 0 ≤ g_{n−1}(t) ≤ g_n(t) ≤ t^2 and t^2 − g_n(t) ≤ \frac{1}{4^n} for all n ∈ N and t ∈ [−1, 1].

Proof. For each n ∈ N and k ∈ Z with |k| ≤ 2^n let g_{n,k}(t) = \frac{2kt}{2^n} − \frac{k^2}{2^{2n}}. Since g_{n,0}(t) = 0 and
g_{n,k}(t) ∨ 0 = \frac{2kt}{2^n} − \big( \frac{2kt}{2^n} ∧ \frac{k^2}{2^{2n}} \big), we have g_n(t) = \bigvee \{ g_{n,k}(t) ∨ 0 : k ∈ Z, |k| ≤ 2^n \} and g_n(t) ≤ g_{n+1}(t)
for all n ∈ Z+ and |t| ≤ 1. If |t − \frac{k}{2^n}| ≤ \frac{1}{2^n}, then
0 ≤ \big( t − \tfrac{k}{2^n} \big)^2 = t^2 − g_{n,k}(t) ≤ \frac{1}{4^n}.
As [−1, 1] = \bigcup \big\{ [\tfrac{k−1}{2^n}, \tfrac{k+1}{2^n}] : k ∈ Z, |k| < 2^n \big\}, the conclusion of the Lemma follows.
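The functions gn of Lemma 5.3.4 are finite maxima of affine functions, so the three claims of the lemma (nonnegativity, monotonicity in n, and the bound t² − gn(t) ≤ 4^{-n}) can be probed directly on a grid. The code below is a small sketch with a hypothetical grid; it relies on the identity g_{n,k}(t) = t² − (t − k/2^n)² used in the proof.

```python
def g(n, t):
    """g_n(t) = max over integers |k| <= 2^n of 2kt/2^n - k^2/2^(2n)."""
    scale = 2 ** n
    return max(2.0 * k * t / scale - (k / scale) ** 2 for k in range(-scale, scale + 1))

grid = [-1.0 + i / 50.0 for i in range(101)]
for n in range(1, 6):
    worst_gap = max(t * t - g(n, t) for t in grid)
    print(n, worst_gap, 4.0 ** -n)
```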

We will use Bb (Ω) to denote the collection of all bounded real–valued functions on Ω.
A subset E ⊂ RΩ is closed under chopping if f ∧ 1 ∈ E for any f ∈ E. E is called a
Stone lattice if it is a vector lattice that is closed under chopping. E is called a ring lattice
closed under chopping if it is a ring and a Stone lattice.
Definition 5.3.5. Suppose E ⊂ RΩ is a vector space. A function f ∈ RΩ is E–confined if
there is ψ ∈ E such that 1{f ≠0} ≤ ψ. In such case we say that ψ confines f . The set of all
functions in E that are E–confined is denoted by E00 . E is self–confined if E00 = E.
Example 5.3.6. The spaces C00 (Rn ) (real continuous compactly supported functions in
Rn ) and Cb (Rn ) (real bounded continuous functions in Rn ) are self–confined. The uniform
closure of C00 (Rn ), denoted by C0 (Rn ), is not self–confined.
Remark 5.3.7. If f1 and f2 are E–confined, then there are ψj ∈ E for j = 1, 2 such that
1{fj ≠0} ≤ ψj . For any a ∈ R, {af1 + f2 ≠ 0} ⊂ {f1 ≠ 0} ∪ {f2 ≠ 0}. Therefore af1 + f2 is
confined by ψ1 + ψ2 . Hence, if E is a vector space, so is E00 . Since for any function f ∈ RΩ ,
f is E–confined iff |f | is E–confined, if E is a Stone lattice, so is E00 .
Lemma 5.3.8. If E ⊂ Bb (Ω) is a Stone lattice, then E00 is dense in E with the uniform
topology.

Proof. Let φ ∈ E+ . For any a > 0, φa := φ − φ ∧ a = (φ − a)+ ∈ E. Clearly φa is confined
by φ/a and so, φa ∈ E00 . The conclusion follows from |φ − φa | ≤ a.
Theorem 5.3.9. If E ⊂ Bb (Ω) is a Stone lattice or a ring then, the uniform closure Ē of E
is a ring lattice closed under chopping.

Proof. It is easy to check that Ē is a vector space whenever E is a vector space. Indeed,
let φ, ψ ∈ Ē and φn , ψn ∈ E such that ‖φ − φn‖u ∨ ‖ψ − ψn‖u < 1/n. Then for any scalars a,
b,
‖aφ + bψ − aφn − bψn‖u ≤ |a| ‖φ − φn‖u + |b| ‖ψ − ψn‖u < \frac{|a| + |b|}{n}

Suppose E is a Stone lattice. Then |φn | and φn ∧ 1 belong to E for each n ∈ N. Since
‖ |φ| − |φn | ‖u ≤ ‖φ − φn‖u < 1/n
‖φ ∧ 1 − φn ∧ 1‖u ≤ ‖φ − φn‖u < 1/n,
it follows that Ē is also a Stone lattice. To show that Ē is a ring it is enough to show that
φ^2 ∈ Ē whenever φ ∈ Ē, for
(5.4) φψ = \tfrac{1}{4} \big( (φ + ψ)^2 − (φ − ψ)^2 \big)

Let M = sup_n ‖φn‖u ∨ ‖φ‖u . Let gn (t) be as in Lemma 5.3.4 and define G_n = M^2 g_n(φ_n / M).
Then (Gn : n ∈ Z+ ) ⊂ E, ‖Gn‖u ≤ M^2 and
‖φ^2 − G_n‖u ≤ ‖φ^2 − φ_n^2‖u + ‖φ_n^2 − G_n‖u
≤ 2M ‖φ − φn‖u + \frac{M^2}{4^n}.
Therefore φ^2 ∈ Ē. The polarization identity (5.4) implies that Ē is a ring.

Suppose E is a ring. If M = sup_n ‖φn‖u ∨ ‖φ‖u , then
‖φ^2 − φ_n^2‖u ≤ 2M ‖φ − φn‖u ≤ \frac{2M}{n}.
Thus, by polarization, it follows that Ē is also a ring.

Let Pn (t) and Qn (t) be sequences of polynomials in [−M, M ] that vanish at t = 0 such that
| |t| − Pn (t)| ∨ |t ∧ 1 − Qn (t)| ≤ 1/n for all |t| ≤ M . Then Pn (φn ), Qn (φn ) ∈ E and
‖ |φ| − Pn (φn )‖u ≤ ‖ |φ| − |φn | ‖u + ‖ |φn | − Pn (φn )‖u
≤ ‖φ − φn‖u + ‖ |φn | − Pn (φn )‖u ≤ 2/n
‖φ ∧ 1 − Qn (φn )‖u ≤ ‖φ ∧ 1 − φn ∧ 1‖u + ‖φn ∧ 1 − Qn (φn )‖u
≤ ‖φ − φn‖u + ‖φn ∧ 1 − Qn (φn )‖u ≤ 2/n.

It follows that Ē is also a Stone lattice.

Theorem 5.3.10. (Stone–Weierstrass Theorem) Suppose S is a compact Hausdorff space
and E ⊂ C(S : R) is either a Stone lattice or a ring. Let ZE := {x ∈ S : φ(x) = 0, ∀φ ∈ E}.
If E separates points of S0 = S \ ZE , i.e. for any points s ≠ t in S0 there is φ ∈ E such
that φ(s) ≠ φ(t), then Ē = CZE (S) := {φ ∈ C(S) : φ(x) = 0, ∀x ∈ ZE } (when ZE = ∅,
CZE (S) = C(S)).

Proof. As C(S) is a closed subspace of Bb (S) under the uniform norm, it follows from Theorem
5.3.9 that Ē is a ring lattice closed under chopping contained in CZE (S). The space Ē ⊕ R :=
{φ + r : φ ∈ Ē, r ∈ R} is a ring lattice closed under chopping of continuous functions on S
and contains the constant functions. Indeed, for any φ, ψ ∈ Ē and r, s ∈ R with r ≤ s,
(φ + r) ∧ (ψ + s) = (φ − ψ) ∧ (s − r) + ψ + r ∈ Ē ⊕ R

Claim: Ē ⊕ R = CZE (S) ⊕ R. Let f ∈ CZE (S) ⊕ R. We will show that for any ε > 0 there
is φε ∈ Ē ⊕ R such that ‖φε − f‖u < ε. For any (t, s) ∈ S × S define a function φt,s ∈ C(S) as
follows:
(a) If t ≠ s and either t ∈ S0 or s ∈ S0 , choose ψt,s ∈ E such that ψt,s (t) ≠ ψt,s (s). Let
φ_{t,s}(x) := f(s) + \frac{f(t) − f(s)}{ψ_{t,s}(t) − ψ_{t,s}(s)} \big( ψ_{t,s}(x) − ψ_{t,s}(s) \big)
(b) If t = s, or if both t and s belong to ZE , define φt,s (x) ≡ f (t).
Clearly φt,s ∈ Ē ⊕ R, and
φt,s (t) = f (t), φt,s (s) = f (s).
For each t, the sets U_s^t = {φt,s > f − ε}, s ∈ S, form an open cover of S, for s ∈ U_s^t
for each s ∈ S. Hence, there exists a finite subcover {U_{s_k}^t : k = 1, . . . , n}. The function
f^t := \bigvee_{k=1}^{n} φ_{t,s_k} belongs to Ē ⊕ R, f^t > f − ε, and f^t (t) = f (t). It follows that the
sets Vt = {f^t < f + ε}, t ∈ S, form an open cover of S. Hence, there is a finite subcover
{V_{t_j} : j = 1, . . . , m}, and the function φ_ε := \bigwedge_{j=1}^{m} f^{t_j} belongs to Ē ⊕ R, and |f (x) − φε (x)| < ε
for all x ∈ S. This shows that f belongs to the uniform closure of Ē ⊕ R, which is Ē ⊕ R itself
since Ē is closed, and completes the proof of the claim.

To conclude the proof, suppose f ∈ CZE (S). If ZE ≠ ∅ and φn = ψn + rn ∈ Ē ⊕ R converges
to f uniformly then, for any z ∈ ZE , φn (z) = ψn (z) + rn = rn → f (z) = 0 as n → ∞. Consequently,
{ψn : n ∈ N} ⊂ Ē converges uniformly to f . If ZE = ∅, then for any s ∈ S there is fs ∈ E
such that fs (s) > 1. The sets {fs > 1}, s ∈ S, form an open cover of S and thus, there is
a finite subcover {f_{s_j} > 1}, j = 1, . . . , m. The function φ^* := \bigvee_{j=1}^{m} f_{s_j} ∈ Ē and φ∗ > 1.
Hence φ∗ ∧ 1 ≡ 1 ∈ Ē. This shows that when ZE = ∅, Ē ⊕ R = Ē and so, f ∈ Ē.
Therefore, Ē = CZE (S).
Corollary 5.3.11. Suppose E is a ring of real bounded functions on some set. Let φ ∈ Ē
and let M ≥ ‖φ‖u . If f ∈ C([−M, M ] : R) and f (0) = 0 then, f ◦ φ ∈ Ē.

Proof. By redefining f as f (−M ) on [−M − 1, −M ] and as f (M ) on [M, M + 1], we may
assume that M ≥ ‖φ‖u + 1. The Stone–Weierstrass theorem implies that the set of polynomials
P with P (0) = 0 is dense in {g ∈ C([−M, M ]) : g(0) = 0}. As f is uniformly continuous
on [−M, M ], for any ε > 0 there is 0 < δ < 1 such that if −M ≤ t, s ≤ M and |t − s| < δ
then, |f (t) − f (s)| < ε/2. Let ψ ∈ E be such that ‖φ − ψ‖u < δ. Let P be a polynomial with
P (0) = 0 such that sup{|f (t) − P (t)| : t ∈ [−M, M ]} < ε/2. Since P ◦ ψ ∈ E and
|f (φ(x)) − P (ψ(x))| ≤ |f (φ(x)) − f (ψ(x))| + |f (ψ(x)) − P (ψ(x))| < ε
for all x, we conclude that f ◦ φ ∈ Ē.

Corollary 5.3.12. Suppose E is a ring of bounded functions on some set. If ψ ∈ Ē+ then,
there exists a sequence (φn ) ⊂ E+ that converges to ψ uniformly.

Proof. The map f : t ↦ \sqrt{|t|} is continuous and f (0) = 0. Hence \sqrt{ψ} ∈ Ē. As a result,
there exists a sequence (ψn ) ⊂ E that converges to \sqrt{ψ} uniformly. The sequence (ψ_n^2 ) ⊂ E+
converges to ψ uniformly.
Theorem 5.3.13. Any real continuous function on an interval [a, b] can be approximated
uniformly by a monotone sequence of polynomials. If E is a ring of bounded functions on a
set S and 1 ∈ Ē, then any function in Ē can be approximated uniformly by a monotone
sequence in E ⊕ 1.

Proof. The space of polynomials E is a ring and separates points. Thus, if f is a continuous
function on [a, b], there is a sequence of polynomials pn such that ‖f − pn‖u ≤ 4^{-n} . The
sequence of polynomials P_n = p_n − a 2^{-n} , where a > 0 is to be determined, converges
uniformly to f . Since ‖p_{n+1} − p_n‖u ≤ \frac{5}{4^{n+1}},
P_{n+1}(t) − P_n(t) ≥ \frac{a}{2^{n+1}} − \frac{5}{2^{2n+2}} = \frac{1}{2^{n+1}} \Big( a − \frac{5}{2^{n+1}} \Big)
For a = 5/2, we obtain P_{n+1} − P_n ≥ \frac{5}{2^{n+2}} \big( 1 − \frac{1}{2^n} \big) ≥ 0. Similarly, the sequence of polynomials
Q_n = p_n + \frac{5}{2^{n+1}} decreases uniformly to f .
If φ ∈ Ē let φn ∈ E be a sequence such that ‖φ − φn‖u ≤ 4^{-n} . The sequences Φ_n = φ_n − \frac{5}{2^{n+1}}
and Ψ_n = φ_n + \frac{5}{2^{n+1}} uniformly increase and decrease to φ respectively.
Example 5.3.14. We show that the function t ↦ t ∧ 1 can be uniformly approximated on any
interval [0, M ] by an increasing sequence of nonnegative polynomials gn (t) that vanish
only at 0. Indeed, Theorem 5.3.13 provides a sequence of polynomials g_n^0(t) that increase
uniformly to G(t) = 1 ∧ \frac{1}{t} on [0, M ]. For some n0 ∈ N large enough, the sequence {g_n(t) =
t g_n^0(t) : n ≥ n0 } satisfies the conclusion of the statement. Therefore, if E is a ring of
bounded functions and φ ∈ E+ , there is a nondecreasing sequence of functions ψn ∈ E+
which increases uniformly to 1 ∧ φ.
Example 5.3.15. For any interval IM = [−M, M ], M > 0, there exists a sequence of
polynomials qn (t) with |qn (t)| ≤ |qn+1 (t)| ≤ |t ∧ 1| such that qn converges to t ∧ 1 uniformly on
IM . Indeed, for H(t) = 1 ∧ \frac{1}{t_+}, there is a sequence of polynomials h_n^0 such that 0 ≤ h_n^0 ↗ H
uniformly on IM . Then q_n(t) = t h_n^0(t) converges uniformly to t ∧ 1 and |qn (t)| ≤ |qn+1 (t)| ≤
|1 ∧ t| for all t ∈ IM .
Example 5.3.16. Consider the continuous functions ψ(t) = t − t ∧ 1 = (t − 1)+ and
φ(t) = 1 ∧ a(t − 1)+ over [−M, M ]. The functions
ψ^0(t) = \frac{1}{t^2} ψ(t) = \Big( \frac{1}{t} − \frac{1}{t^2} \Big) 1_{[1,M]}(t)
φ^0(t) = \frac{1}{t^2} φ(t) = a \Big( \frac{1}{t} − \frac{1}{t^2} \Big) 1_{[1, 1+\frac{1}{a}]}(t) + \frac{1}{t^2} 1_{(1+\frac{1}{a}, M]}(t),
being continuous on [−M, M ], are the uniform limit of nondecreasing and of nonincreasing
sequences of polynomials. Suppose the sequences of polynomials q_n^0(t) and p_n^0(t) decrease
uniformly to ψ^0 and φ^0 on [−M, M ] respectively. Then ψ and φ are the uniform limits
of the nonincreasing sequences Q_n^0(t) = t^2 q_n^0(t) and P_n^0(t) = t^2 p_n^0(t) on [−M, M ] respectively.
Similarly, if the sequences of polynomials q_n^1(t) and p_n^1(t) increase uniformly to ψ^0 and φ^0
on [−M, M ] respectively, then Q_n^1(t) = t^2 q_n^1(t) and P_n^1(t) = t^2 p_n^1(t) increase uniformly to ψ
and φ on [−M, M ] respectively.

The Stone–Weierstrass theorem can be easily extended to the setting of locally compact
Hausdorff topologies.
Theorem 5.3.17. Suppose $(X, \tau)$ is a locally compact Hausdorff space and let $E \subset C_0(X)$
be a Stone lattice or ring. Define $Z_E = \{x \in X : \phi(x) = 0, \ \forall \phi \in E\}$. Then,
$\overline{E}^{u} = \{\phi \in C_0(X) : \phi(z) = 0, \ \forall z \in Z_E\}$.

Proof. The one point compactification $(S, \hat\tau) = (X \cup \{\Delta\}, \hat\tau)$ of $X$ is a compact Hausdorff
space, and $(X, \tau)$ is an open dense set in $(S, \hat\tau)$. Furthermore, $C_0(X)$ can be identified with
the collection of all functions in $C(S)$ that vanish at $\Delta$. The extension $\hat{E}$ of $E$ to $S$ is a Stone lattice
or a ring of functions in $C(S)$ whose common zeroes form the set $\hat{Z}_E = Z_E \cup \{\Delta\}$. By the
Stone–Weierstrass theorem, the uniform closure of $\hat{E}$ is the collection of all continuous functions on $S$
that vanish on $Z_E \cup \{\Delta\}$.
Corollary 5.3.18. If Ω is an open set in Rn , then C0 (Ω, R) is separable.

Proof. Let $\mathcal{B}$ be the collection of all closed balls contained in $\Omega$ that have rational
centers and radii. For each $B \in \mathcal{B}$ let $\phi_B$ be a continuous function in $\mathbb{R}^n$ supported in
$B$ with $\phi_B(B) = [0, 1]$. The collection of polynomials on $\{\phi_B : B \in \mathcal{B}\}$ with rational
coefficients is countable, separates points of $\Omega$ and is a ring $E \subset C_0(\Omega, \mathbb{R})$.
Theorem 5.3.19. (Weierstrass extension) Suppose $E$ is a collection of bounded functions
on a set $S$, and that $E$ is either a Stone lattice or a ring. Let $S_0 \subset S$. A real function $f$ on
$S_0$ can be approximated uniformly on $S_0$ by functions in $E$ if and only if $f$ is the restriction
to $S_0$ of some function $\tilde{f} \in \overline{E}^{u}$.

Proof. Sufficiency is clear. To show necessity, suppose that $\phi_n \in E$ converges uniformly to $f$
in $S_0$. By taking a subsequence, we can assume without loss of generality that $\|f - \phi_n\|_{S_0,u} < 2^{-(n+1)}$,
so that $\|\phi_{n+1} - \phi_n\|_{S_0,u} < 2^{-n}$. Clearly, the function
$\psi_n = (-2^{-n}) \vee \big((\phi_{n+1} - \phi_n) \wedge 2^{-n}\big) \in \overline{E}^{u}$
coincides with $\phi_{n+1} - \phi_n$ in $S_0$. Since $|\psi_n| \le 2^{-n}$, the series
\[
\tilde{f} = \phi_1 + \sum_{n=1}^{\infty} \psi_n
\]
converges uniformly on $S$; therefore, $\tilde{f} \in \overline{E}^{u}$, and clearly $\tilde{f} = f$ on $S_0$.

Theorem 5.3.20. Let $S$ be a compact set and let $E$ be a complex ring of bounded complex–
valued continuous functions on $S$ that is closed under complex conjugation, i.e., $f \in E$
implies that $\overline{f} \in E$. If $E$ separates points, then either $\overline{E}^{u} = C(S, \mathbb{C})$, or
$\overline{E}^{u} = \{f \in C(S, \mathbb{C}) : f(z) = 0\}$ when there is $z \in S$ such that $g(z) = 0$ for all $g \in E$.

Proof. For any $f \in E$, its real and imaginary parts $\mathrm{Re}(f) = \frac{1}{2}(f + \overline{f})$, $\mathrm{Im}(f) = \frac{1}{2i}(f - \overline{f})$
are real functions in $E$. The set $E_{\mathbb{R}}$ of real functions in $E$ is a ring of real bounded functions
which separates points. By the Stone–Weierstrass theorem, $\overline{E_{\mathbb{R}}}^{u} = \{f \in C(S, \mathbb{R}) : f(z) = 0\}$ if
$z$ is the common zero of $E$, or $\overline{E_{\mathbb{R}}}^{u} = C(S, \mathbb{R})$ otherwise. In any case, one can approximate
the real and imaginary parts of an arbitrary complex continuous function separately.

5.4. General Stone–Weierstrass Theorem

For a given collection E of bounded functions on a set S, the collection of pseudometrics
D(E) = {dφ : φ ∈ E} given by
dφ (x, y) = |φ(x) − φ(y)|
defines a D(E)–uniformity on S (See Appendix 2.7 for relevant results). A function f : S →
R is E–uniformly continuous iff it is D(E)–uniformly continuous (it is assumed that R has
the usual metric ρ(x, y) = |x − y|).
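As a small numerical illustration of the pseudometrics $d_\phi$, the following sketch checks symmetry and the triangle inequality on a finite sample, and shows that $d_\phi$ need not separate points. The function `phi` and the sample points are hypothetical choices made only for this illustration.

```python
# Sketch: the pseudometric d_phi(x, y) = |phi(x) - phi(y)| induced by a bounded function phi.
# The choice phi(t) = t^2 on integer points of [-3, 3] is an illustrative assumption.

def d(phi, x, y):
    """Pseudometric induced by the function phi."""
    return abs(phi(x) - phi(y))

def phi(t):
    return t * t

sample = range(-3, 4)

# Symmetry and the triangle inequality hold on the sample.
triangle_ok = all(d(phi, x, z) <= d(phi, x, y) + d(phi, y, z)
                  for x in sample for y in sample for z in sample)
symmetric_ok = all(d(phi, x, y) == d(phi, y, x) for x in sample for y in sample)

# d_phi is only a pseudometric: the distinct points 1 and -1 are at distance zero.
print(triangle_ok, symmetric_ok, d(phi, 1, -1))
```

Points identified by every $d_\phi$, $\phi \in E$, are exactly those with the same image under the map $J$ used in the proof of Theorem 5.4.1.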
Theorem 5.4.1. (General Stone–Weierstrass theorem) Let $E$ be either a Stone lattice or
a ring of real bounded functions on $S$. A real–valued function $f$ is $E$–uniformly continuous
iff $f$ is the sum of a constant and a function in $\overline{E}^{u}$.

Proof. Consider the product space
\[
\Pi = \prod_{\phi \in E} \big[-\|\phi\|_u, \|\phi\|_u\big].
\]
This space is compact Hausdorff and the projections $P_E = \{p_\phi : \phi \in E\}$ define the uniformity
\[
\tilde{d}_\psi(\{x_\phi : \phi \in E\}, \{y_\phi : \phi \in E\}) = |x_\psi - y_\psi|.
\]
The topology associated with this uniformity is the same as the product topology. The map
$J : S \to \Pi$ given by $J : x \mapsto \{\phi(x) : \phi \in E\}$ is continuous on $(S, \tau(D(E)))$ and $K_S = \overline{J(S)}$
(the closure of $J(S)$ in $\Pi$), being a closed subset of a compact set, is compact.

Since $J(x) = J(y)$ iff $d_\phi(x, y) = 0$ for all $\phi \in E$, if $f$ is an $E$–uniformly continuous function
and $J(x) = J(y)$, then $f(x) = f(y)$. Hence, there is a unique map $f' : J(S) \to \mathbb{R}$ such
that $f = f' \circ J$. Moreover, the $E$–uniform continuity of $f$ implies the $P_E$–uniform continuity
of $f'$ and, by Theorem 2.7.4, $f'$ admits a unique continuous extension $\hat{f}$ to $K_S$. For each
$\phi \in E$ let $\hat\phi$ be the extension of $\phi'$ to $K_S$ (notice that $\phi'$ is the projection $p_\phi$). The collection
$\hat{E} = \{\hat\phi : \phi \in E\}$ is a Stone lattice or a ring of continuous functions on $K_S$, as the case
might be, which separates points of $K_S$.

If there is $z \in K_S$ at which all $\hat\phi$ vanish, then the Stone–Weierstrass theorem shows that
$\hat{f} - \hat{f}(z) \in \overline{\hat{E}}^{u}$. Hence $f = \hat{f} \circ J$ is the sum of the constant $\hat{f}(z)$ and a function in $\overline{E}^{u}$. If there
is no such $z$, then the Stone–Weierstrass theorem shows that $\hat{f} \in \overline{\hat{E}}^{u}$, so that $f = \hat{f} \circ J \in \overline{E}^{u}$.
5.5. Monotone classes of functions

Definition 5.5.1. Let $\Omega$ be an arbitrary nonempty set.
(i) A collection $V \subset \mathbb{R}^{\Omega}$ is a monotone class (resp. bounded monotone class)
if it is closed under pointwise limits of monotone convergent (resp. monotone
bounded) sequences.
(ii) A collection $V$ of bounded complex or real valued functions is a bounded class
if it is closed under pointwise limits of bounded convergent sequences, that is,
whenever $\{f_n\} \subset V$, $\sup_n \|f_n\|_u < \infty$ and $f(x) = \lim_n f_n(x)$ exists for all $x$, then
$f \in V$.
(iii) A collection $M \subset \mathbb{R}^{\Omega}$ is a real multiplicative class if it is closed under finite
multiplication.
(iv) A collection $M \subset \mathbb{C}^{\Omega}$ of complex valued functions is a complex multiplicative
class if it is closed under finite multiplication and under complex conjugation.
Theorem 5.5.2. (Real monotone class theorem) Suppose V is a real vector space of func-
tions (resp. bounded functions) containing the constant functions and that V is also a
monotone (resp. a bounded monotone) class. If M ⊂ V is a multiplicative class of bounded
functions, then V contains all real valued σ(M)–measurable functions.

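Proof. The collection $A$ of all linear combinations of functions in $M \cup \{1\}$ is an algebra of
bounded functions contained in $V$. By Theorem 5.3.9, its uniform closure $\overline{A}$ is an algebra
lattice.

We claim that $\overline{A} \subset V$. Indeed, for $\phi \in \overline{A}$, let $(\phi_n : n \in \mathbb{N}) \subset A$ be such that
\[
\|\phi_n - \phi\|_u \to 0, \qquad \|\phi_{n+1} - \phi_n\|_u < 2^{-n-1}.
\]
Then, $\tilde\phi_n = \phi_n - \frac{1}{2^n} \in A$, $\tilde\phi_n \nearrow \phi$, $\check\phi_n = \phi_n + \frac{1}{2^n} \in A$, and $\check\phi_n \searrow \phi$. The intersection $\overline{A}^{m}$ of
all monotone (resp. bounded monotone) classes containing $\overline{A}$ is again a monotone (resp. a
bounded monotone) class and clearly $\overline{A}^{m} \subset V$.

We claim that $\overline{A}^{m}$ is a real algebra lattice. Denote by $\diamond$ any of the operations $+, \vee, \wedge$ and
let
\[
E^{\diamond} = \{f \in \overline{A}^{m} : f \diamond g \in \overline{A}^{m}, \ \forall g \in \overline{A}\}.
\]
As $\overline{A}$ is an algebra lattice, $\overline{A} \subset E^{\diamond}$. It is straightforward to check that $E^{\diamond}$ is a monotone
(resp. bounded monotone) class, and so $E^{\diamond} = \overline{A}^{m}$. Similarly, let
\[
E^{\star} = \{f \in \overline{A}^{m} : f \diamond g \in \overline{A}^{m}, \ \forall g \in \overline{A}^{m}\}.
\]
Clearly, $\overline{A} \subset E^{\star}$. It is easy to check that $E^{\star}$ is a monotone (resp. bounded monotone) class.
We conclude that $E^{\star} = \overline{A}^{m}$.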
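It remains to show that $\overline{A}^{m}$ is closed under multiplication. Since $f \cdot g = f \cdot g_+ - f \cdot g_-$, it
is enough to show that $f \cdot g \in \overline{A}^{m}$ whenever $f \in \overline{A}^{m}$ and $0 \le g \in \overline{A}^{m}$. Define
\[
E^{*} = \{f \in \overline{A}^{m} : f \cdot g \in \overline{A}^{m}, \ \forall\, 0 \le g \in \overline{A}\}.
\]
Clearly, $\overline{A} \subset E^{*}$ and $E^{*}$ is a monotone (resp. bounded monotone) class. Hence, $E^{*} = \overline{A}^{m}$
and $f \cdot g = f \cdot g_+ - f \cdot g_- \in \overline{A}^{m}$ for all $f \in \overline{A}^{m}$ and $g \in \overline{A}$. Let
\[
E^{\bullet} = \{f \in \overline{A}^{m} : f \cdot g \in \overline{A}^{m}, \ \forall\, 0 \le g \in \overline{A}^{m}\}.
\]
As $\overline{A} \subset E^{\bullet}$, and $E^{\bullet}$ is a monotone (resp. bounded monotone) class, we have that $E^{\bullet} = \overline{A}^{m}$.

Notice that $\overline{A}^{m}$ is also closed under taking limits of convergent (resp. bounded convergent)
sequences, for if $f_n \in \overline{A}^{m}$ converges to $f$, then $f = \bigvee_{n} \bigwedge_{m \ge n} f_m \in \overline{A}^{m}$.

Since $\overline{A}^{m}$ is an algebra closed under limits of convergent sequences, the collection of sets $A$ with
$1_A \in \overline{A}^{m}$ forms a σ–algebra. For any $f \in M$, $n \in \mathbb{N}$, and $r \in \mathbb{R}$, we have that
$h_n = (n(f - r)_+) \wedge 1 \in \overline{A}^{m}$ and $h_n \le h_{n+1}$. Hence, $\lim_n h_n = 1_{\{f > r\}} \in \overline{A}^{m}$, from which it follows
that $1_B \in \overline{A}^{m}$ for every $B \in \sigma(M)$. Therefore, the family of all real valued $\sigma(M)$–measurable functions is
contained in $\overline{A}^{m} \subset V$.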
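The approximation of indicators used at the end of the proof can be checked numerically. The sketch below evaluates $h_n = (n(f - r)_+) \wedge 1$ at points where $f$ takes a given value, showing that $h_n$ is nondecreasing in $n$ and tends to $1_{\{f > r\}}$. The helper name `h` and the sample values are hypothetical, chosen only for illustration.

```python
# Sketch: h_n = (n * (f - r)_+) ∧ 1 increases to the indicator of {f > r}.
# Only the value y = f(x) at a point matters, so h takes y directly.

def h(n, y, r):
    """Value of h_n at a point where f takes the value y."""
    return min(n * max(y - r, 0.0), 1.0)

r = 0.5
for y in (0.25, 0.5, 0.75, 2.0):
    values = [h(n, y, r) for n in (1, 2, 4, 8, 1000)]
    # The sequence is nondecreasing in n; its limit is 1 if y > r and 0 otherwise.
    print(y, values)
```

At the boundary value $y = r$ every $h_n$ vanishes, which is why the limit is the indicator of the strict inequality $\{f > r\}$.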
Theorem 5.5.3. (Complex bounded class theorem) Suppose V is a complex vector space
of complex valued functions containing the constants, and that V is also a complex bounded
class. If M ⊂ V is a complex multiplicative class, then V contains the collection of all bounded
complex–valued σ(M)–measurable functions.

Proof. The family of all complex linear combinations of functions in M ∪ {1} is a complex
algebra A of bounded functions in V which is closed under complex conjugation. Hence,
the real valued functions in A form a real algebra Ar of bounded functions contained in the
collection Vr of real valued bounded functions in V. Clearly, Vr is a real vector space and a
bounded monotone class. As in the real monotone class theorem, we conclude that the space
of bounded real valued σ(M)–measurable functions is contained in Vr . The conclusion of
the Theorem follows immediately.

One important application of the functional monotone class theorems is to the problem of
determining whether two finite measures on $B(\mathbb{R}^d)$ are the same; see Theorem 15.1.5.

5.6. Sequential closure and Baire functions

Consider a metric space $(S, d)$. A family $E \subset S^{\Omega}$ is sequentially closed if it contains
the pointwise limit of any pointwise convergent sequence in $E$. The intersection of any
collection of sequentially closed families is clearly sequentially closed.
Definition 5.6.1. For any $E \subset S^{\Omega}$, the intersection of all sequentially closed families in
$S^{\Omega}$ containing $E$ is called the sequential closure of $E$, and will be denoted by $E^{\Sigma}_{S}$. Any
function $f \in E^{\Sigma}_{\mathbb{R}}$ is said to be an $E$–Baire function, and any set $A \subset \Omega$ with $1_A \in E^{\Sigma}_{\mathbb{R}}$ is
said to be an $E$–Baire set.
Remark 5.6.2. Recall that $\overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$ is metrizable by the distance
$d(x, y) = |\arctan(x) - \arctan(y)|$, where $\arctan(\pm\infty) = \pm\frac{\pi}{2}$. In this way, we consider $E^{\Sigma}_{\overline{\mathbb{R}}}$.
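The metric of Remark 5.6.2 is easy to experiment with. In the sketch below, which is only an illustration and whose function name `d` is an assumption, Python's `math.atan` already returns $\pm\pi/2$ at $\pm\infty$, so the points at infinity lie at finite distance and sequences diverging to $+\infty$ converge in this metric.

```python
import math

def d(x, y):
    """Distance |arctan(x) - arctan(y)| on the extended real line."""
    return abs(math.atan(x) - math.atan(y))

# The whole extended line has finite diameter pi.
print(d(math.inf, -math.inf))

# The sequence 10, 100, 1000, ... converges to +infinity in this metric.
for n in (10.0, 1e3, 1e6):
    print(n, d(n, math.inf))
```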

Example 5.6.3. The support of a real valued function $f$ on a topological space $(S, \tau)$ is defined
as $\mathrm{supp}(f) = \overline{\{f \neq 0\}}$. The space of all real continuous functions with compact support
on $S$ is denoted by $C_{00}(S)$. A continuous real function $f$ is said to vanish at infinity if
$|f|^{-1}([\varepsilon, \infty))$ is compact in $S$ for all $\varepsilon > 0$. The space of all real continuous functions on
$S$ that vanish at infinity is denoted by $C_0(S)$. The space of all real bounded continuous
functions on $S$ is denoted as $C_b(S)$. Evidently,
\[
C_{00}(S) \subset C_0(S) \subset C_b(S).
\]
Moreover, under the uniform norm topology on $C_b(S)$, $\overline{C_{00}(S)}^{u} = C_0(S)$. Let $M(S)$ denote
the space of real valued Borel measurable functions in $S$. In general,
(5.5)    $\overline{C_{00}(S)}^{\Sigma} \subset \overline{C_0(S)}^{\Sigma} \subset \overline{C_b(S)}^{\Sigma} \subset \overline{C(S)}^{\Sigma} \subset M(S)$.
The family $\overline{C_b(S)}^{\Sigma}$ is known as the space of Baire functions and its sets are called
Baire sets. If $S$ is locally compact, second countable Hausdorff, the families of sequential
limits in (5.5) coincide; if $S$ is a metric space, $\overline{C_b(S)}^{\Sigma} = M(S)$. (See Exercise 5.8.7.)

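Lemma 5.6.4. Suppose $(S, d)$ is a metric space and let $p \in S$ be fixed. For any nonempty
collection $E \subset S^{\Omega}$
(5.6)    $E^{\Sigma}_{S} = \{f \in E^{\Sigma}_{S} : \exists\, E_f \subset E \text{ countable with } f \in (E_f)^{\Sigma}_{S}\}$
(5.7)    $E^{\Sigma}_{S} = \{f \in E^{\Sigma}_{S} : \exists\, (\phi_n : n \in \mathbb{N}) \subset E \text{ with } \{f \neq p\} \subset \bigcup_{n \in \mathbb{N}} \{\phi_n \neq p\}\}$.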

Proof. Let $A$ and $B$ be the sets on the right hand side of (5.6) and (5.7) respectively. Clearly
$E \subset A \cap B$. Suppose the sequences $(f_n) \subset A$, $(g_n) \subset B$ converge pointwise to $f$ and $g$
respectively.
For each $n \in \mathbb{N}$ let $E_n \subset E$ be a countable collection with $f_n \in (E_n)^{\Sigma}_{S}$. Then $E_* = \bigcup_n E_n$
is countable and $(f_n) \subset (E_*)^{\Sigma}_{S}$. Hence $f \in (E_*)^{\Sigma}_{S}$, and so $f \in A$. This shows that $A$ is
sequentially closed.
As $S \setminus \{p\}$ is open in $S$, $\{g \neq p\} \subset \bigcup_n \{g_n \neq p\}$. For each $n \in \mathbb{N}$ there is a sequence
$(\phi_{n,m} : m \in \mathbb{N}) \subset E$ with $\{g_n \neq p\} \subset \bigcup_m \{\phi_{n,m} \neq p\}$. Then $g \in B$, for $\{g \neq p\} \subset \bigcup_{n,m} \{\phi_{n,m} \neq p\}$.
This shows that $B$ is sequentially closed.
Lemma 5.6.5. Let $E \subset \mathbb{R}^{\Omega}$.
(i) If $E$ is closed under $+$, $-$, $\cdot$, $\vee$, $\wedge$, $\wedge 1$ or $|\ |$, then so is $E^{\Sigma}_{\mathbb{R}}$.
If $E \subset B_b(\Omega)$ is a Stone lattice or a ring then,
(ii) $E^{\Sigma}_{\mathbb{R}}$ is a ring lattice closed under chopping.
(iii) The collection $R(E)$ of sets in $E^{\Sigma}_{\mathbb{R}}$ is the same as the σ–ring, $R_\sigma(E)$, generated by
all sets of the form $\phi^{-1}(I)$ where $\phi \in E$ and $I$ is any interval in $\mathbb{R} \setminus \{0\}$.

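Proof. (i) Let $\diamond$ denote any of the operations in $\{+, -, \cdot, \vee, \wedge\}$ and define
\[
E^{\diamond} = \{f \in E^{\Sigma}_{\mathbb{R}} : f \diamond g \in E^{\Sigma}_{\mathbb{R}}, \ g \diamond f \in E^{\Sigma}_{\mathbb{R}}, \ \forall g \in E\}.
\]
If $E$ is closed under $\diamond$ then $E \subset E^{\diamond}$. It is easy to check that $E^{\diamond}$ is sequentially closed. Hence
$E^{\diamond} = E^{\Sigma}_{\mathbb{R}}$. Define
\[
E^{\diamond\diamond} = \{f \in E^{\Sigma}_{\mathbb{R}} : f \diamond g \in E^{\Sigma}_{\mathbb{R}}, \ g \diamond f \in E^{\Sigma}_{\mathbb{R}}, \ \forall g \in E^{\Sigma}_{\mathbb{R}}\}.
\]
Then, $E \subset E^{\diamond\diamond}$. It is easy to check that $E^{\diamond\diamond}$ is sequentially closed. Hence $E^{\diamond\diamond} = E^{\Sigma}_{\mathbb{R}}$. A similar
proof shows that $E^{\Sigma}_{\mathbb{R}}$ is closed under $\wedge 1$ or $|\ |$, when $E$ is closed under one or the other
operation respectively.
(ii) As $E \subset \overline{E}^{u} \subset E^{\Sigma}_{\mathbb{R}}$, $(\overline{E}^{u})^{\Sigma}_{\mathbb{R}} = E^{\Sigma}_{\mathbb{R}}$. By Theorem 5.3.9, $\overline{E}^{u}$ is a ring lattice closed under chopping.
The conclusion follows from (i).
(iii) As $1_{A \setminus B} = 1_A - 1_A \wedge 1_B$ and $1_{\bigcup_n A_n} = \bigvee_n 1_{A_n}$, $R(E)$ is closed under proper differences
and countable unions, and so is a σ–ring. Since $1_{\{f > 1\}} = \lim_n 1 \wedge (n(f - f \wedge 1))$, $\{f > r\} \in R(E)$
for any $f \in E^{\Sigma}_{\mathbb{R}}$ and any $r > 0$. Thus $\{f \ge r\} = \bigcap_n \{f > r(1 - \frac{1}{n})\}$ and $\{f > 0\} = \bigcup_n \{f > \frac{1}{n}\}$
belong to $R(E)$. Replacing $f$ by $-f$ shows that $\{f < -r\}$, $\{f \le -r\}$ and $\{f < 0\}$ also
belong to $R(E)$. Consequently $f^{-1}(I) \in R(E)$ for any $f \in E^{\Sigma}_{\mathbb{R}}$ and any interval $I \subset \mathbb{R} \setminus \{0\}$;
therefore $R_\sigma(E) \subset R(E)$.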

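Let $E^{*}$ denote the collection of real–valued functions $f$ such that $\{f > r\} \in R_\sigma(E)$ for all
$r > 0$. It follows that for any $f \in E^{*}$ and any interval $I$ contained in $\mathbb{R} \setminus \{0\}$, $f^{-1}(I) \in R_\sigma(E)$.
Thus, for any $f \in E^{*}$ the sequences $(s_n^{+})$ and $(s_n^{-})$ defined by
\[
s_n^{+} = \sum_{k=0}^{\infty} \frac{k}{2^n}\, 1_{\{k < 2^n f \le k+1\}}, \qquad
s_n^{-} = \sum_{k=0}^{\infty} \frac{k}{2^n}\, 1_{\{k < -2^n f \le k+1\}}
\]
belong to $E^{\Sigma}_{\mathbb{R}}$.
As $s_n^{+} \to f_+$ and $s_n^{-} \to f_-$ pointwise, $f \in E^{\Sigma}_{\mathbb{R}}$. We claim that $E^{*}$ is sequentially
closed. Indeed, if $E^{*} \ni f_n \to f$ pointwise then, as $R_\sigma(E)$ is closed under countable unions
and intersections,
\[
\{f > r\} = \bigcup_{k} \bigcup_{N} \bigcap_{n \ge N} \{f_n > r + \tfrac{1}{k}\} \in R_\sigma(E).
\]
Therefore, $E^{\Sigma}_{\mathbb{R}} \subset E^{*} \subset E^{\Sigma}_{\mathbb{R}}$.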
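The dyadic approximation $s_n^{+}$ used above can be evaluated pointwise. In the sketch below (the helper name `s_plus` is hypothetical, and only the value $y = f(x)$ at a point is used), the unique integer $k \ge 0$ with $k < 2^n y \le k + 1$ is $\lceil 2^n y \rceil - 1$, which gives $0 < f_+ - s_n^{+} \le 2^{-n}$ wherever $f > 0$.

```python
import math

def s_plus(n, y):
    """Value of s_n^+ at a point where f = y.

    On {k < 2^n f <= k + 1} the function equals k / 2^n; where f <= 0 it is 0.
    """
    if y <= 0:
        return 0.0
    k = math.ceil(2**n * y) - 1  # unique integer k >= 0 with k < 2^n y <= k + 1
    return k / 2**n

# The approximation error is at most 2^-n, so s_n^+ converges pointwise to f_+.
for n in (1, 3, 10):
    print(n, s_plus(n, 0.3), 0.3 - s_plus(n, 0.3))
```

The same computation applied to $-f$ yields $s_n^{-}$.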

Theorem 5.6.6. Suppose $E \subset B_b(\Omega)$ is a Stone lattice or a ring. Then, $E^{\Sigma}_{\mathbb{R}}$ is an algebra
iff there is a sequence $\{\phi_n\} \subset E$ such that $\sup_n \phi_n > 0$ on $\Omega$. In either case, $R(E) = \sigma(E)$,
and $E^{\Sigma}_{\mathbb{R}}$ coincides with the collection $M_{\mathbb{R}}(E)$ of all real $\sigma(E)$–measurable functions.
Proof. Suppose $\{\phi_n\} \subset E$ satisfies $\psi = \sup_n \phi_n > 0$. Then, $1 = 1_{\{\psi > 0\}} = \bigvee_n 1_{\{\phi_n > 0\}} \in E^{\Sigma}_{\mathbb{R}}$.
Hence $E^{\Sigma}_{\mathbb{R}}$ is an algebra; consequently, $R(E)$ is a σ–algebra.
If $E^{\Sigma}_{\mathbb{R}}$ is an algebra, then $1 \in E^{\Sigma}_{\mathbb{R}}$. Hence there is a sequence $\{\phi_n\} \subset E$ such that
$\bigcup_n \{\phi_n \neq 0\} \supset \{1 \neq 0\} = \Omega$. If $E$ is a vector lattice then $|\phi_n| \in E$, and $\sup_n |\phi_n| > 0$ on $\Omega$. If $E$
is merely a ring then, for each $n$ there is a sequence $(\psi_{m,n}) \subset E_+$ such that $\psi_{m,n} \nearrow |\phi_n|$
uniformly. Therefore $\psi = \sup_{m,n} \psi_{m,n} > 0$ on $\Omega$. The last statement follows directly from
Lemma 5.6.5(iii).
Example 5.6.7. Suppose $S$ is a topological space. The collection of Baire sets (sets in
$\overline{C_b(S)}^{\Sigma}$) is the σ–algebra generated by $C_b(S)$, and we will refer to it as the Baire
σ–algebra. If $S$ is metrizable, then the family of Baire sets coincides with the Borel σ–algebra.

5.7. Measurable selection theorem

A multivalued function F from X to Y is a relation F ⊂ X × Y such that for any x ∈ X,
there is y ∈ Y with (x, y) ∈ F. F induces a function, which we also denote by F, from X
to P(Y) \ {∅} given by x ↦ F(x) = {y ∈ Y : (x, y) ∈ F}.
Definition 5.7.1. Let (X, A) be a measurable space and (Y, B(Y)) be a metric space with
the Borel σ–algebra. Suppose F is a multivalued function from X to Y such that F(x) is
a nonempty subset in Y.
(i) F is called weakly measurable, or simply measurable, if
(5.8) {x ∈ X : F(x) ∩ U ≠ ∅} ∈ A
for every open set U ⊂ Y.
(ii) F is called strongly measurable if
(5.9) {x ∈ X : F(x) ∩ C ≠ ∅} ∈ A
for every closed set C ⊂ Y.
Remark 5.7.2. Since each open subset in a metric space is an Fσ set (countable union of
closed sets), any strongly measurable relation is weakly measurable.
Observe that if F is a function, then weak and strong measurability coincide with the usual
notion of measurability of functions.
Definition 5.7.3. Let (X, A ) and (Y, B) be measurable spaces and suppose F ⊂ X × Y is
a multivalued function. A measurable function f : X −→ Y such that f (x) ∈ F (x) is said
to be a measurable selection or a selector .
Theorem 5.7.4. (Kuratowski–Ryll–Nardzewski) Let (X, A) be a measurable space and Y
be a complete separable metric space. For any weakly measurable closed valued multivalued function
F ⊂ X × Y, i.e. F(x) closed in Y for all x ∈ X, there exists a measurable selection f of F.

Proof. Let $d$ be a complete metric with $d < 1$ that generates the topology in $Y$ and let
$D = (y_n) \subset Y$ be a dense sequence. We will show there is a sequence of measurable
functions $f_n : X \to Y$ such that
(i) $d(f_n(x), F(x)) < 2^{-n}$
(ii) $d(f_n(x), f_{n+1}(x)) < 2^{-n}$.
Assuming this, we have from (ii) that $(f_n)$ is a uniformly Cauchy sequence; hence, it converges
uniformly to a measurable function $f$; by (i) and the fact that $F(x)$ is closed, $f(x) \in F(x)$ for all $x \in X$.

We start by defining $f_0(x) \equiv y_0$ so that (i) holds. Proceeding by induction, assume that
$f_n$ has been defined so that (i) holds. Since $F$ is weakly measurable and $f_n$ is measurable, it
follows that
\[
A_k = f_n^{-1}\big(B(y_k; 2^{-n})\big) \cap \{x \in X : F(x) \cap B(y_k; 2^{-n-1}) \neq \emptyset\} \in \mathcal{A}
\]
for each $y_k \in D$. Given $x \in X$, let $s \in F(x)$ be such that $d(s, f_n(x)) < 2^{-n}$. Since $D$ is
dense, there is $y_k \in D$ such that $d(s, y_k) < \min\{2^{-n-1}, 2^{-n} - d(s, f_n(x))\}$; consequently,
$d(y_k, F(x)) < 2^{-n-1}$, $x \in A_k$ and $\bigcup_k A_k = X$. Let $\{B_k\} \subset \mathcal{A}$ be a sequence of pairwise
disjoint sets such that $B_k \subset A_k$ and $\bigcup_k B_k = \bigcup_k A_k = X$. By letting $f_{n+1}(x) = y_k$ whenever
$x \in B_k$, we obtain a measurable function $f_{n+1}$ satisfying (i) and (ii).
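The pairwise disjoint sets $B_k$ in the induction step can be obtained by the usual disjointification $B_k = A_k \setminus \bigcup_{j<k} A_j$, which keeps measurability and the union. A minimal sketch on finite sets follows; the function name `disjointify` and the sample sets are hypothetical and serve only as illustration.

```python
def disjointify(sets):
    """Return B_k = A_k minus the union of A_0, ..., A_{k-1}.

    The B_k are pairwise disjoint, B_k is a subset of A_k, and the unions agree.
    """
    seen = set()
    result = []
    for a in sets:
        b = set(a) - seen  # points of A_k not already covered by earlier sets
        result.append(b)
        seen |= b
    return result

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
B = disjointify(A)
print(B)
```

Each point $x$ then lies in exactly one $B_k$, so the rule $f_{n+1}(x) = y_k$ for $x \in B_k$ is well defined.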

5.8. Exercises
Exercise 5.8.1. Let V be a vector space over R.
(i) If (V, ≤) is a partially ordered vector space, show that C = {x ∈ V : x ≥ 0} is a
convex pointed cone, i.e.
(a) αx ∈ C for all α ≥ 0 and x ∈ C,
(b) αx + (1 − α)y ∈ C for all 0 ≤ α ≤ 1 and x, y ∈ C,
(c) C ∩ (−C) = {0}.
(ii) Conversely, if C is a convex pointed cone then the relation x ≤ y iff y − x ∈ C
defines a vector order on V with {x ≥ 0} = C.
(iii) Show that a partially ordered vector space V is a vector lattice iff for any x, y ∈ V
there exists w ∈ V, denoted by x ∧ y, such that w ≤ x, w ≤ y and v ≤ w whenever
v ≤ x and v ≤ y.
Exercise 5.8.2. Show that M majorizes V iff M minorizes V , i.e., for any x ∈ V , there is
y ∈ M with y ≤ x.
Exercise 5.8.3. Let $X$ and $Y$ be locally compact Hausdorff topological spaces. Show that
the ring $E \subset C_{00}(X \times Y)$ of all functions of the form $f(x, y) = \sum_{k=1}^{n} \phi_k(x)\psi_k(y)$ where
$\phi_k \in C_{00}(X)$, $\psi_k \in C_{00}(Y)$, and $n \in \mathbb{N}$, is dense in $(C_0(X \times Y), \|\ \|_u)$. (Hint: $\overline{C_{00}(X \times Y)}^{u} =
C_0(X \times Y)$. Show that any $g \in C_{00}(X \times Y)$ can be approximated uniformly by functions in
$E$.)
Exercise 5.8.4. Show that the collection of trigonometric polynomials
\[
p(\theta) = \sum_{k=-m}^{n} c_k e^{i\theta k}
\]
is uniformly dense in the set of complex periodic continuous functions in $[-\pi, \pi]$. Show that
the set of real trigonometric functions
\[
g(\theta) = \sum_{k=0}^{n} a_k \cos(k\theta) + b_k \sin(k\theta)
\]
is uniformly dense in the set of real periodic continuous functions in $[-\pi, \pi]$.

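Exercise 5.8.5. Show that if $E' \subset E \subset S^{\Omega}$ then, $(E')^{\Sigma}_{S} \subset E^{\Sigma}_{S}$ and $(E^{\Sigma}_{S})^{\Sigma}_{S} = E^{\Sigma}_{S}$.

Exercise 5.8.6. Let $E \subset S^{\Omega}$. Show that
(a) If $d_1$ and $d_2$ are two equivalent metrics in $S$, then $E^{\Sigma}_{(S,d_1)} = E^{\Sigma}_{(S,d_2)}$.
(b) If $S$ is a nonempty subspace of a metric space $(T, d)$, then $E^{\Sigma}_{S} \subset E^{\Sigma}_{T} \cap S^{\Omega}$.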
Exercise 5.8.7. Let $(S, \tau)$ be a topological space.
(a) Show that (5.5) holds.
(b) If $S$ is metrizable, show that $\overline{C_b(S)}^{\Sigma} = M(S)$.
(c) If $S$ is a locally compact, second countable Hausdorff space, show all classes in (5.5)
coincide. (Hint: Show that for any compact set $K$ and open set $U$, there are
sequences $f_n, g_n \in C_{00}(S)$ such that $f_n \searrow 1_K$ and $g_n \nearrow 1_U$.)
(d) For the Euclidean space $\mathbb{R}^d$, show that the sequential closure of the set of polynomials
in $\mathbb{R}^d$ is $M(\mathbb{R}^d)$.
Exercise 5.8.8. Suppose E ⊂ Bb (Ω) is a Stone lattice. Let f ∈ ERΣ . Show that
(a) (f ∧ r) ∨ (−r) ∈ ERΣ for any r > 0.
(b) The sets {f > r} and {|f | = ∞} belong to ERΣ for all r > 0.
(c) For any set A ∈ ERΣ , f 1Ac ∈ ERΣ , and so f 1{|f |≠∞} ∈ ERΣ .
Chapter 6

Integration: functional
approach

In this chapter we discuss an approach to integration (Daniell integration) that does not use
any measure theoretic considerations. Daniell’s direct and elegant approach to integration
exploits the continuity properties of a linear functional (elementary integral) defined on a
set of integrands which has a minimal required algebraic and/or order structure. Then,
through the introduction of a seminorm, it extends the elementary integral to the largest
possible space of functions so that linearity and dominated convergence hold. Measurability
is in turn defined in terms of local properties of integrable functions. Carathéodory’s
cut condition (3.3) of measurability is obtained as a consequence of the extension, and a
measure theoretic representation follows as a result.

6.1. The Riemann integral revisited

To motivate our discussion we consider an alternative construction of the Riemann integral
that is equivalent to the one discussed in Section 4.5, but based on simple properties of
seminormed spaces. This example contains the main ideas of the functional approach to
integration.
Let $E(\mathbb{R})$ be the collection of step functions on the real line and let $I$ be the Riemann
integral on $E(\mathbb{R})$, that is, if $\phi = \sum_{j=1}^{n} \alpha_j 1_{(a_j, b_j]}$ where $\alpha_j \in \mathbb{R}$ and $-\infty < a_j < b_j < \infty$, then
\[
I(\phi) = \sum_{j=1}^{n} \alpha_j (b_j - a_j).
\]
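The elementary integral on step functions is simple enough to compute by hand. As a sketch, a step function $\phi = \sum_j \alpha_j 1_{(a_j, b_j]}$ can be stored as a list of triples $(\alpha_j, a_j, b_j)$; this representation and the names `integral` and `evaluate` are assumptions made for the illustration. The intervals may overlap, and linearity of $I$ corresponds to concatenating lists.

```python
def integral(step):
    """I(phi) for phi = sum_j alpha_j * 1_{(a_j, b_j]}, given as triples (alpha, a, b)."""
    return sum(alpha * (b - a) for alpha, a, b in step)

def evaluate(step, x):
    """Value phi(x); each interval (a, b] is open on the left and closed on the right."""
    return sum(alpha for alpha, a, b in step if a < x <= b)

phi = [(2.0, 0.0, 1.0), (-1.0, 0.5, 3.0)]
psi = [(1.0, -1.0, 0.0)]

print(integral(phi))                           # 2*1 + (-1)*2.5 = -0.5
print(evaluate(phi, 0.75))                     # 2 - 1 = 1
print(integral(phi + psi), integral(phi) + integral(psi))  # additivity of I
```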

We first make the following observations.

(a) The space $E(\mathbb{R}) \subset B_b(\mathbb{R})$ is a ring lattice closed under chopping and self–confined,
that is, for any $\phi \in E(\mathbb{R})$ there is $\psi \in E(\mathbb{R})$ with $1_{\{\phi \neq 0\}} \le \psi$.

(b) I is a positive linear functional on E(R).

For any numeric function $f$ on $\mathbb{R}$ define the lower Riemann–Jordan and the upper
Riemann–Jordan integrals as
(6.1)    $I_{\#}(f) := \sup\{I(\phi) : \phi \in E(\mathbb{R}), \ \phi \le f\}$
(6.2)    $I^{\#}(f) := \inf\{I(\phi) : \phi \in E(\mathbb{R}), \ f \le \phi\}$
respectively.
Definition 6.1.1. We say that $f$ is Riemann–integrable if
(6.3)    $I_{\#}(f) = I^{\#}(f)$
The collection of all Riemann–integrable functions is denoted by $L^{\#}$.
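To see definitions (6.1)–(6.3) at work, consider the hypothetical example $f(x) = x^2 1_{(0,1]}(x)$, which is not taken from the text. On the uniform partition of $(0, 1]$ into $n$ cells, the step functions equal to the infimum and to the supremum of $f$ on each cell satisfy $\phi_n \le f \le \psi_n$, so $I(\phi_n) \le I_{\#}(f) \le I^{\#}(f) \le I(\psi_n)$. The sketch below computes both values exactly with rational arithmetic.

```python
from fractions import Fraction

def lower_upper(n):
    """Return (I(phi_n), I(psi_n)) for f(x) = x^2 on (0, 1].

    phi_n and psi_n are the step functions equal to the infimum and the supremum
    of f on each cell ((k-1)/n, k/n], so that phi_n <= f <= psi_n.
    """
    h = Fraction(1, n)
    lower = sum(((k - 1) * h) ** 2 * h for k in range(1, n + 1))
    upper = sum((k * h) ** 2 * h for k in range(1, n + 1))
    return lower, upper

lo, up = lower_upper(4)
print(lo, up, up - lo)  # 7/32 15/32 1/4
```

Since $I(\psi_n) - I(\phi_n) = 1/n$, the lower and upper integrals coincide, so this $f$ is Riemann–integrable in the sense of Definition 6.1.1, with common value $1/3$.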

The following result summarizes the properties of the upper Riemann–Jordan integral.
Theorem 6.1.2. The upper integral $I^{\#}$ satisfies the following properties:
(i) (positive homogeneity) $I^{\#}(rf) = r I^{\#}(f)$ for any scalar $r \ge 0$ and any $f \in F^{\#}$.
(ii) (subadditivity) $I^{\#}(f + g) \le I^{\#}(f) + I^{\#}(g)$ for any $f, g \in F^{\#}$.
(iii) (increasing monotonicity) If $f, g \in F^{\#}$ and $f \le g$, then $I^{\#}(f) \le I^{\#}(g)$.
(iv) (majorization) For any $\phi \in E(\mathbb{R})$, $|I(\phi)| \le I^{\#}(|\phi|)$.
The lower integral $I_{\#}$ is positive homogeneous and monotone increasing and satisfies
(ii)’ (superadditivity) $I_{\#}(f + g) \ge I_{\#}(f) + I_{\#}(g)$ for any $f, g \in F^{\#}$.

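Proof. (i) is obvious by definition of $I^{\#}$.

(ii) For any $\varepsilon > 0$ there exist $\phi, \psi \in E(\mathbb{R})$ such that $f \le \phi$, $g \le \psi$ and
\[
I(\phi) < I^{\#}(f) + \frac{\varepsilon}{2}, \qquad I(\psi) < I^{\#}(g) + \frac{\varepsilon}{2}.
\]
Hence $I^{\#}(f + g) \le I(\phi + \psi) < I^{\#}(f) + I^{\#}(g) + \varepsilon$. As $\varepsilon > 0$ is arbitrary, (ii) follows.

(iii) If $f \le g$, then $\{\phi \in E(\mathbb{R}) : g \le \phi\} \subset \{\phi \in E(\mathbb{R}) : f \le \phi\}$. The result follows
immediately.

(iv) As $E(\mathbb{R})$ is a lattice, if $\phi \in E(\mathbb{R})$ then $\pm|\phi| \in E(\mathbb{R})$. As $I$ is positive, the result follows
from $I(|\phi| \pm \phi) \ge 0$.

For the last statement we can follow similar arguments as above. A more direct proof,
however, can be obtained by noticing that $I_{\#}(f) = -I^{\#}(-f)$.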
Corollary 6.1.3. For any $f, g \in F^{\#}$, $|I^{\#}(f) - I^{\#}(g)| \le I^{\#}(|f - g|)$.

Proof. Subadditivity and increasing monotonicity imply that $I^{\#}(f) \le I^{\#}(g + |f - g|) \le
I^{\#}(g) + I^{\#}(|f - g|)$. Exchanging the roles of $f$ and $g$ gives $I^{\#}(g) \le I^{\#}(f) + I^{\#}(|f - g|)$.
By putting these inequalities together we obtain the desired result.
Definition 6.1.4. The map $\|\ \|_{\#} : F^{\#} \to \mathbb{R}_+$ given by $\|f\|_{\#} := I^{\#}(|f|)$ is called the
Jordan–seminorm with respect to $(E(\mathbb{R}), I)$.

Theorem 6.1.2 shows that $\|\ \|_{\#}$ is a solid seminorm on $F^{\#}$, that is, $\|\ \|_{\#}$ is a seminorm on
$F^{\#}$ and, if $f, g \in F^{\#}$ satisfy $|f| \le |g|$, then $\|f\|_{\#} \le \|g\|_{\#}$. The next result shows that $L^{\#}$
is the closure of $E(\mathbb{R})$ under the seminorm $\|\ \|_{\#}$.
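Theorem 6.1.5. $f \in L^{\#}$ iff there is a sequence $(\phi_n : n \in \mathbb{N}) \subset E(\mathbb{R})$ such that
\[
\lim_{n \to \infty} \|f - \phi_n\|_{\#} = 0.
\]
Moreover, if $(\psi_n : n \in \mathbb{N}) \subset E(\mathbb{R})$ converges to $f$ in $\|\ \|_{\#}$, then $I^{\#}(f) = \lim_n I(\psi_n)$.

Proof. Assume $f \in L^{\#}$. Then, for any $n \in \mathbb{N}$ there are $\phi_n, \psi_n \in E(\mathbb{R})$ such that $\phi_n \le f \le \psi_n$
and
\[
I_{\#}(f) - \frac{1}{n} < I(\phi_n) \le I_{\#}(f) = I^{\#}(f) \le I(\psi_n) < I^{\#}(f) + \frac{1}{n}.
\]
As a consequence,
\[
\|f - \phi_n\|_{\#} = I^{\#}(f - \phi_n) \le I(\psi_n - \phi_n) < \frac{2}{n} \to 0.
\]

Conversely, suppose $(\phi_n : n \in \mathbb{N}) \subset E(\mathbb{R})$ converges to $f$ in $\|\ \|_{\#}$. For any $\varepsilon > 0$, there
is $N$ large enough so that $\|f - \phi_N\|_{\#} < \frac{\varepsilon}{2}$. Thus, there is $\psi \in E(\mathbb{R})$ such that $|f - \phi_N| \le \psi$
and $I(\psi) < \frac{\varepsilon}{2}$. Since
\[
\phi_N - \psi \le f \le \phi_N + \psi
\]
and $\phi_N \pm \psi \in E(\mathbb{R})$ we have that
\[
I(\phi_N - \psi) \le I_{\#}(f) \le I^{\#}(f) \le I(\phi_N + \psi).
\]
As a consequence, $0 \le I^{\#}(f) - I_{\#}(f) \le 2I(\psi) < \varepsilon$. As $\varepsilon > 0$ is arbitrary, we conclude that
$I_{\#}(f) = I^{\#}(f)$, that is, $f \in L^{\#}$.
To prove the last statement notice that
\[
|I^{\#}(f) - I(\psi_n)| = |I^{\#}(f) - I^{\#}(\psi_n)| \le I^{\#}(|f - \psi_n|) = \|f - \psi_n\|_{\#}.
\]
Therefore $\lim_n I(\psi_n) = I^{\#}(f)$.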

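The following is a general result for linear functionals on seminormed spaces.

Lemma 6.1.6. Suppose $(X, \|\ \|)$ is a seminormed space and let $V$ be a linear subspace of $X$.
Suppose $\Lambda$ is a linear functional on $V$ such that $|\Lambda v| \le \|v\|$ for all $v \in V$. Then, there is a
unique linear extension $\overline{\Lambda}$ of $\Lambda$ to the closure $\overline{V}$ of $V$ in $(X, \|\ \|)$ such that $|\overline{\Lambda} v| \le \|v\|$ for
all $v \in \overline{V}$.

Proof. For any $v \in \overline{V}$ there exists a sequence $(v_n : n \in \mathbb{N}) \subset V$ such that $\lim_n \|v - v_n\| = 0$.
As $|\Lambda(v_n) - \Lambda(v_m)| \le \|v_n - v_m\|$ it follows that $\Lambda v_n$ converges. If $(u_n : n \in \mathbb{N}) \subset V$ also
converges to $v$ in $\|\ \|$, then $|\Lambda v_n - \Lambda u_n| \le \|v_n - u_n\| \le \|v_n - v\| + \|u_n - v\|$. This shows
that $\Lambda u_n$ converges and that $\lim_n \Lambda u_n = \lim_n \Lambda v_n$. We define $\overline{\Lambda} v := \lim_n \Lambda v_n$. Clearly $\overline{\Lambda}$ is
a linear extension of $\Lambda$ to $\overline{V}$ and $|\overline{\Lambda} v| = \lim_n |\Lambda v_n| \le \lim_n \|v_n\| = \|v\|$.

To prove uniqueness, suppose $\tilde{\Lambda}$ is a linear extension of $\Lambda$ to $\overline{V}$ dominated by $\|\ \|$. For any
$v \in \overline{V}$ let $(v_n : n \in \mathbb{N}) \subset V$ with $\|v_n - v\| \to 0$. Then $|\tilde{\Lambda} v - \Lambda v_n| = |\tilde{\Lambda}(v - v_n)| \le \|v - v_n\| \to 0$.
This shows that $\tilde{\Lambda} v = \lim_n \Lambda v_n = \overline{\Lambda} v$.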
Corollary 6.1.7. $L^{\#}$ is the closure of $E(\mathbb{R})$ in $(F^{\#}, \|\ \|_{\#})$, and it is also a ring lattice
closed under chopping. There exists a unique linear extension of $I$ onto $L^{\#}$ and it is given
by $I(f) := \lim_n I(\phi_n)$, where $(\phi_n : n \in \mathbb{N}) \subset E(\mathbb{R})$ converges to $f$ in $\|\ \|_{\#}$.

Proof. The first claim is a restatement of Theorem 6.1.5.

That $L^{\#}$ is a ring lattice closed under chopping follows from the solidity of the
seminorm $\|\ \|_{\#}$ and the inequalities
\[
\big||f| - |\phi|\big| \le |f - \phi|,
\]
\[
|f \wedge 1 - \phi \wedge 1| \le |f - \phi|,
\]
\[
|fg - \phi\psi| \le \|f\|_u |g - \psi| + \|\psi\|_u |f - \phi|.
\]

The last statement follows from Lemma 6.1.6.

The observation that $E(\mathbb{R})$ is self–confined has no bearing on the algebraic and order
structure of $L^{\#}$. It has an effect in estimating the limit of the integral of sequences of Riemann
integrable functions that converge uniformly.
Theorem 6.1.8. (Uniform dominated convergence theorem) Suppose the sequence $(f_n :
n \in \mathbb{N}) \subset L^{\#}$ converges uniformly to some function $f$. If $|f_n| \le g$ for all $n \in \mathbb{N}$ and some
function $g \in L^{\#}$, then $f \in L^{\#}$, $\|f_n - f\|_{\#} \to 0$ and $\lim_n I(f_n) = I(f)$.

Proof. As $g \in L^{\#}$, $g$ is dominated above by a step function $\phi$. Let $\psi$ be a step function
such that $1_{\{\phi \neq 0\}} \le \psi$. Hence, $|f - f_n| \le \|f - f_n\|_u\, \psi$. The conclusion follows from
$|I(f) - I(f_n)| \le \|f - f_n\|_{\#} \le \|f - f_n\|_u\, I(\psi) \to 0$.

The collection $L^{\#}$ of integrable functions obtained through the Jordan seminorm is
quite limited since, by Theorem 4.5.5, it only contains functions that are bounded with
compact support and continuous Lebesgue–a.s. The extension of the functional $I$ relied
completely on the algebraic and order structure of the set of step functions $E(\mathbb{R})$. In fact,
the Riemann integral procedure can be put in a more general setting.
The extension of $I$ to a larger class of functions (the space of Lebesgue integrable
functions $L_1(\mathbb{R}, \lambda)$) in which dominated convergence holds will depend on an additional
property of $I$. Without using the results of Chapter 4, we prove the following result.
Lemma 6.1.9. If (φn : n ∈ N) ⊂ E and φn ց 0, then I(φn ) ց 0.

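Proof. Suppose $[-m, m]$ contains the support of $\phi_1$, and hence of all $\phi_n$. For each $n$ let
$\{x_j^n : 1 \le j \le k_n\}$ be the points of discontinuity of $\phi_n$. For each $\varepsilon > 0$ and $n \in \mathbb{N}$ let
\[
B_n = \bigcup_{\ell=1}^{n} \bigcup_{j=1}^{k_\ell} \big(x_j^\ell - \varepsilon 2^{-j-\ell-1}, \ x_j^\ell + \varepsilon 2^{-j-\ell-1}\big),
\]
\[
\tilde{B}_n = \bigcup_{\ell=1}^{n} \bigcup_{j=1}^{k_\ell} \big(x_j^\ell - \varepsilon 2^{-j-\ell-1}, \ x_j^\ell + \varepsilon 2^{-j-\ell-1}\big].
\]
$B_n$ is an open set containing the points of discontinuity of $\{\phi_j : 1 \le j \le n\}$ and $B_n \subset B_{n+1}$.

It follows that $U_n = B_n \cup \{\phi_n < \varepsilon\}$ is open, for if $x \in \{\phi_n < \varepsilon\} \setminus B_n$, then $\phi_n$ is continuous
at $x$ and thus constant in an open neighborhood of $x$; while if $x \in B_n$, then $B_n \subset U_n$ is itself
an open neighborhood of $x$. Since $\phi_n \searrow 0$ pointwise, $\mathbb{R} = \bigcup_n U_n$ and $U_n \subset U_{n+1}$. By compactness, there
exists $N \in \mathbb{N}$ such that
\[
[-m, m] \subset U_n \subset \{\phi_n < \varepsilon\} \cup \tilde{B}_n, \qquad n \ge N.
\]
As $\mathbb{R} \setminus \tilde{B}_n \subset \big(\{\phi_n < \varepsilon\} \setminus \tilde{B}_n\big) \cup \big(\mathbb{R} \setminus [-m, m]\big)$ for all $n \ge N$, $\phi_n < \varepsilon$ in $\mathbb{R} \setminus \tilde{B}_n$ whenever $n \ge N$.
$1_{\tilde{B}_n} \in E$ and
\[
I\big(1_{\tilde{B}_n}\big) \le \sum_{\ell=1}^{n} \sum_{j=1}^{k_\ell} 2^{-\ell-j} \varepsilon \le \varepsilon \sum_{\ell=1}^{\infty} 2^{-\ell} \sum_{j=1}^{\infty} 2^{-j} = \varepsilon.
\]
If $G_n = (-m, m] \setminus \tilde{B}_n$, then $1_{G_n} \in E$ and $\phi_n = 1_{G_n}\phi_n + 1_{\tilde{B}_n}\phi_n$; therefore,
\[
I(\phi_n) = I\big(1_{G_n}\phi_n\big) + I\big(1_{\tilde{B}_n}\phi_n\big)
\le \varepsilon I(1_{G_n}) + \|\phi_n\|_u I(1_{\tilde{B}_n}) \le \varepsilon(2m + \|\phi_1\|_u).
\]
This shows that $I(\phi_n) \searrow 0$.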

Lemma 6.1.9 is a modest version of monotone convergence. Not only does it make use of
the algebraic structure of the space of step functions, but it also takes advantage of the
topological properties of the real line.

6.2. The Elementary integral

The following definition captures the most important aspects of the construction of an
integral.
Definition 6.2.1. Suppose $E \subset B_b(\Omega)$ is a Stone lattice or a ring. A real valued linear
functional $I$ on $E$ is said to be an elementary integral.
(i) $I$ is δ–continuous if $\lim_n I(\phi_n) = 0$ for any sequence $(\phi_n) \subset E$ with $\phi_n \searrow 0$.
(ii) $I$ is positive if $I(\phi) \ge 0$ whenever $\phi \in E_+$.
We often use the symbol $\int \phi\, dI$ to denote $I(\phi)$.

It can be shown (Exercise 6.8.3) that δ–continuity is equivalent to the following properties:
(a) (σ–continuity) If $\phi_n \le \phi_{n+1} \in E$ and $\sup_n \phi_n \in E$, then $\lim_n I(\phi_n) = I(\sup_n \phi_n)$.
(b) (σ–additivity) If $0 \le \varphi_n \in E$ and $\sum_n \varphi_n \in E$, then $I(\sum_n \varphi_n) = \sum_n I(\varphi_n)$.

Example 6.2.2. (1) Suppose that $R$ is a ring of subsets of $\Omega$ and $\mu : R \to [0, \infty)$ is a
σ–additive function. Let $E$ be the collection of all real simple functions. The functional
$I : \phi = \sum_{k=1}^{n} a_k 1_{\{\phi = a_k\}} \mapsto \sum_{k=1}^{n} a_k \mu(\{\phi = a_k\})$ is a positive δ–continuous elementary
integral.
(2) Suppose $\Omega = \mathbb{R}$ and $E$ is the set of step functions. For each $f \in E$, let $I(f)$ be its
Riemann integral. $I$ is a positive σ–continuous elementary integral.
(3) Suppose $\Omega$ is a l.c.H. space, $E = C_{00}(\Omega)$. Any positive linear functional $I$ on $E$ is a
positive σ–continuous elementary integral.

Remark 6.2.3. Not all elementary integrals are σ–continuous. Let $\Omega = \mathbb{N}$. The space $c$ of
all convergent sequences in $\mathbb{R}$ is an algebra lattice. The positive linear functional
\[
I(\phi) = \lim_n \phi(n), \qquad \phi \in c,
\]
defines a positive elementary integral on $E = c$ which is not σ–additive. To check the last
statement, consider the sequence $\{\varphi_m = 1_{\{1,\dots,m\}} : m \in \mathbb{N}\}$. Then $\varphi_m \nearrow 1 \in E$, however
$0 = \lim_m I(\varphi_m) < 1 = I(1)$.

Remark 6.2.4. Exercise 6.8.4 shows why measure theory considers rings of sets as those
that can be measured so that the measure is additive.

6.3. Daniell’s mean

The up-and-down procedure used to build the Riemann integral produces a small and limited
class of integrable functions since it only relies on the algebraic structure of the space of
elementary functions $E$ and the additivity of the elementary integral $I$. In this section, we
will consider elementary integrals $I$ on $E$ that are positive and σ–continuous. We will introduce
a modified up–and–down procedure that first extends the integral to increasing limits of
elementary functions, and then to any function by going down over countable suprema of
elementary functions. This approach produces a large class of integrable functions which
contains not only bounded functions. The success of this approach depends entirely on
the σ–continuity of the elementary integral.
Let $E^{\uparrow}$ denote the collection of all extended real functions $h$ that are suprema of
sequences in $E$, that is, $h \in E^{\uparrow}$ if $h = \sup_n \phi_n$ for some $\{\phi_n\} \subset E$. If $E$ is a lattice, we can
replace $\phi_n$ by the increasing sequence $\bigvee_{k=1}^{n} \phi_k$.

Lemma 6.3.1. For any vector space E, E ↑ is closed under addition, multiplication by
nonnegative scalars and taking countable suprema. If in addition E is a vector lattice, then
E ↑ is also closed under taking finite infima.

Proof. Suppose E ↑ ∋ hn and h = supn hn . For each n ∈ N let {φn,k } ⊂ E be such that hn = supk φn,k . Then h1 + h2 = supn,m (φ1,n + φ2,m ), rh1 = supn rφ1,n for any r ≥ 0, and h = supn,k φn,k . The first statement follows as each of the collections {φ1,n + φ2,m : n, m ∈ N}, {rφ1,n : n ∈ N} and {φn,k : n, k ∈ N} is countable.
If E is a vector lattice, then ψn,m = φ1,n ∧ φ2,m ∈ E and h1 ∧ h2 = supn,m ψn,m . The second statement follows as {ψn,m } is countable.
Example 6.3.2. Suppose E is a ring. Then |φ|, (φ − 1)+ , 1 ∧ φ and 1 ∧ a(φ − 1)+ are elements of E ↑ for any φ ∈ E and a > 0. Indeed, let M = ‖φ‖u . By Lemma 5.3.2 and Example 5.3.16 the maps t ↦ |t|, t ↦ (t − 1)+ and t ↦ 1 ∧ a(t − 1)+ are the uniform limits on [−M, M ] of monotone increasing and monotone decreasing sequences of polynomials that vanish at t = 0. Consequently, |φ|, (φ − 1)+ and 1 ∧ a(φ − 1)+ are uniform limits of monotone decreasing and monotone increasing sequences of elements in E. As φ ∧ 1 = φ − (φ − 1)+ , φ ∧ 1 is the uniform limit of an increasing sequence in E.
Definition 6.3.3. Suppose I is a positive σ–continuous elementary integral on a vector
lattice E ⊂ Bb (Ω). The Daniell upper integral of a function h ∈ E ↑ is defined by
(6.4) $\int^{*} h\,dI = I^{*}(h) = \sup\{I(φ) : φ ∈ E,\ φ ≤ h\}$

The upper integral of any extended real function f on Ω is defined by

(6.5) $\int^{*} f\,dI = I^{*}(f) = \inf\{I^{*}(h) : h ∈ E^{\uparrow},\ f ≤ h\}$

It is clear from the definition above that I ∗ (φ) = I(φ) for all φ ∈ E, and that expressions (6.4) and (6.5) coincide on E ↑ . The following result summarizes the properties of I ∗ .
Theorem 6.3.4. Suppose I is a positive σ–continuous elementary integral on a vector lattice E ⊂ Bb (Ω). Then Daniell’s upper integral I ∗ has the following properties:
(i) I ∗ is nondecreasing and positive homogeneous.
(ii) If {hn } ⊂ E ↑ is a nondecreasing sequence, then I ∗ (hn ) ↗ I ∗ (supn hn ).
(iii) I ∗ is additive on E ↑ .
(iv) I ∗ is countably subadditive, i.e., if fn ≥ 0 then $I^{*}(\sum_n f_n) ≤ \sum_n I^{*}(f_n)$.

Proof. (i) Increasing monotonicity follows directly from (6.4) and (6.5). Positive homogeneity is a consequence of Lemma 6.3.1 and linearity of I on E.
(ii) Suppose E ↑ ∋ hn ↗ h. Then supn I ∗ (hn ) ≤ I ∗ (h) by the increasing monotonicity of I ∗ . For each n let {φn,m : m ∈ N} ⊂ E be such that φn,m ↗ hn and define the sequence $ψ_k = \max_{n,m ≤ k} φ_{n,m}$. If a < I ∗ (h), let E ∋ φ ≤ h so that a < I(φ). Then, E ∋ ϕk = ψk ∧ φ ≤ hk and ϕk ↗ φ. Since (E, I) is σ–continuous, we have that
$a < I(φ) = \lim_k I(ϕ_k) ≤ \lim_k I^{*}(h_k)$.

Hence I ∗ (h) ≤ limk I ∗ (hk ). We conclude that I ∗ (h) = limk I ∗ (hk ).

(iii) Suppose hi ∈ E ↑ , i = 1, 2. If {φn,i } ⊂ E and φn,i ↗ hi , then E ∋ φn,1 + φn,2 ↗ h1 + h2 .

Since E ⊂ E ↑ and I ∗ = I on E, it follows from (ii) that

$I^{*}(h_1 + h_2) = \lim_n I(φ_{n,1} + φ_{n,2}) = \lim_n \big(I(φ_{n,1}) + I(φ_{n,2})\big) = I^{*}(h_1) + I^{*}(h_2)$.

(iv) It is enough to assume that $\sum_n I^{*}(f_n) < ∞$. For ε > 0 and each n, let E ↑ ∋ hn ≥ fn so that I ∗ (hn ) < I ∗ (fn ) + 2^{−n} ε. Parts (ii) and (iii) and Lemma 6.3.1 imply
$I^{*}\big(\sum_n f_n\big) ≤ I^{*}\big(\sum_n h_n\big) = \lim_n I^{*}\big(\sum_{k=1}^{n} h_k\big) = \lim_n \sum_{k=1}^{n} I^{*}(h_k) = \sum_n I^{*}(h_n) ≤ \sum_n I^{*}(f_n) + ε$.

Subadditivity follows by letting ε ↘ 0.

If I is a positive σ–continuous elementary integral on a vector lattice E ⊂ Bb (Ω), then the map ‖·‖∗ : $\overline{\mathbb{R}}^{\Omega}$ → [0, ∞] given by f ↦ I ∗ (|f |) is called the Daniell mean of the elementary integral (E, I).
Theorem 6.3.5. Suppose E ⊂ Bb (Ω) is a vector lattice. If ‖·‖∗ is the Daniell mean of the elementary integral (E, I), then ‖·‖∗ is finite on E and:
(i) (Absolute–homogeneity) For every a ∈ R and f ∈ $\overline{\mathbb{R}}^{\Omega}$, ‖af ‖∗ = |a|‖f ‖∗ .
(ii) (Solidity) If |f | ≤ |g|, then ‖f ‖∗ ≤ ‖g‖∗ .
(iii) (Countable subadditivity) If {fn } is a sequence of nonnegative real–extended functions, then $\|\sum_n f_n\|^{*} ≤ \sum_n \|f_n\|^{*}$.
(iv) (Continuity) If {φn : n ∈ N} ⊂ E+ and $\sup_n \|\sum_{k=1}^{n} φ_k\|^{*} < ∞$, then $\lim_n \|φ_n\|^{*} = 0$.
(v) For any φ ∈ E, I(φ) ≤ ‖φ‖∗ .

Proof. (i)–(iii) are direct consequences of Theorem 6.3.4.

As E is a vector lattice, φ ∈ E implies |φ| ∈ E; hence, ‖φ‖∗ = I(|φ|) < ∞ for all φ ∈ E.

(v) Since I is positive and −|φ| ≤ φ ≤ |φ|, |I(φ)| ≤ I(|φ|) = ‖φ‖∗ .

(iv) If φn ≥ 0 and φn ∈ E then, as $\|\sum_{k=1}^{n} φ_k\|^{*} = \sum_{k=1}^{n} I(φ_k)$, $\sum_{n=1}^{∞} I(φ_n) < ∞$ by hypothesis. The conclusion follows immediately, since ‖φn ‖∗ = I(φn ) → 0.
Remark 6.3.6. The Jordan seminorm ‖·‖# on (E(R), I) is not countably subadditive. To see this, we consider the following counterexample. Let fn = 2^{−n} 1(n,n+1] , n ∈ Z+ . Since $\sum_n f_n$ has unbounded domain, $\|\sum_{n=0}^{∞} f_n\|^{\#} = ∞$. On the other hand, $\sum_n \|f_n\|^{\#} = \sum_{n=0}^{∞} 2^{-n} = 2$.

For a comparison between a mean (Daniell mean) and the Jordan seminorm, see Exercise 6.8.7.
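The arithmetic behind Remark 6.3.6 can be checked with a short Python sketch. It assumes, for illustration only, that the Jordan seminorm of a single step fn = 2^{−n} 1(n,n+1] equals its Riemann integral 2^{−n}; the function names are hypothetical and not part of these notes.

```python
from fractions import Fraction

def partial_sum_of_seminorms(N):
    """Sum over n = 0..N of 2^{-n} * length((n, n+1]), the assumed seminorm of f_n."""
    return sum(Fraction(1, 2**n) for n in range(N + 1))

def support_right_endpoint(N):
    """Right endpoint of the support of f_0 + ... + f_N, which grows without bound."""
    return N + 1

# The partial sums of the seminorms stay below 2, while the support of the
# partial sums of the functions is eventually outside any bounded interval.
for N in (1, 5, 10):
    print(N, partial_sum_of_seminorms(N), support_right_endpoint(N))
```

The partial sums equal 2 − 2^{−N}, so the right-hand side of the subadditivity inequality stays bounded by 2 even though the sum of the functions is not supported on any bounded interval.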
Definition 6.3.7. Let E ⊂ Bb (Ω) be a vector space. A functional ‖·‖ on $\overline{\mathbb{R}}^{\Omega}$ that is finite on E and satisfies (i)–(iv) in Theorem 6.3.5 is called a mean for E. A mean is said to dominate the elementary integral (E, I) if (v) holds.
Remark 6.3.8. Notice that solidity of a mean ‖·‖ implies that ‖|f |‖ = ‖f ‖. When E is a vector lattice, the Daniell mean ‖f ‖∗ = I ∗ (|f |) dominates the elementary integral.
Theorem 6.3.9. (Chebyshev’s inequality.) If ‖·‖ is a mean for E, then

(6.6) $\|\{f > λ\}\| ≤ \|\{|f| > λ\}\| ≤ \frac{1}{λ}\|f\|$

for any f ∈ $\overline{\mathbb{R}}^{\Omega}$ and λ > 0. Here ‖A‖ abbreviates ‖1A ‖ for A ⊂ Ω.

Proof. (6.6) is a consequence of the absolute–homogeneity, the solidity of ‖·‖ and the inequalities λ1{f >λ} ≤ λ1{|f |>λ} ≤ |f |.
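As a sanity check of (6.6), the following Python sketch works on a finite set Ω with the weighted mean ‖f ‖ = Σ w_i |f_i|. This particular mean, the weights and the function names are assumptions made for illustration; they are one simple instance of a mean, not the general object of Definition 6.3.7.

```python
from fractions import Fraction

def mean(f, w):
    """Weighted mean ||f|| = sum_i w_i * |f_i| on a finite set (illustrative choice)."""
    return sum(wi * abs(fi) for fi, wi in zip(f, w))

def indicator(cond, f):
    """Values of the indicator function 1_{cond(f)} at each point."""
    return [1 if cond(x) else 0 for x in f]

def chebyshev_terms(f, w, lam):
    """Return the three terms of (6.6): ||{f > lam}||, ||{|f| > lam}||, ||f|| / lam."""
    lhs = mean(indicator(lambda x: x > lam, f), w)
    mid = mean(indicator(lambda x: abs(x) > lam, f), w)
    rhs = mean(f, w) / lam
    return lhs, mid, rhs

f = [Fraction(-3), Fraction(1, 2), Fraction(2), Fraction(4)]
w = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
lhs, mid, rhs = chebyshev_terms(f, w, Fraction(1))
print(lhs, mid, rhs)
```

Exact rational arithmetic is used so that the comparison of the three terms is not affected by rounding.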
Definition 6.3.10. A function f ∈ $\overline{\mathbb{R}}^{\Omega}$ is called ‖·‖–negligible if ‖f ‖ = 0; a set A ⊂ Ω is called negligible if 1A is negligible; a property P on Ω is said to hold ‖·‖–almost surely if the set {ω ∈ Ω : P (ω) is false} is negligible.
Lemma 6.3.11. Suppose that ‖·‖ is a mean for E.
(i) The sum of countably many ‖·‖–negligible functions is ‖·‖–negligible; the countable union of ‖·‖–negligible sets is ‖·‖–negligible.
(ii) f is ‖·‖–negligible if and only if {f ≠ 0} is ‖·‖–negligible.
(iii) If ‖f ‖ < ∞, then f is finite ‖·‖–almost everywhere.
(iv) If f = f ′ ‖·‖–a.s., then ‖f ‖ = ‖f ′ ‖.

Proof. (i) Since $\sup_n |f_n| ≤ \sum_n |f_n|$, solidity and countable subadditivity of the mean show that $\|\sup_n |f_n|\| ≤ \|\sum_n |f_n|\| ≤ \sum_n \|f_n\|$. Then (i) follows immediately.

(ii) Since $1_{\{f ≠ 0\}} ≤ \sum_n |f|$ and $|f| ≤ \sum_n 1_{\{f ≠ 0\}}$, (i) implies (ii).

(iii) Since n1{|f |=∞} ≤ |f |, we have that ‖{|f | = ∞}‖ ≤ (1/n)‖f ‖ → 0.

(iv) If f = f ′ almost surely, then f = f ′ 1{f =f ′ } + f 1{f ≠f ′ } . As ‖{f ≠ f ′ }‖ = 0, ‖f 1{f ≠f ′ } ‖ = 0; thus, ‖f ‖ ≤ ‖f ′ ‖. Applying the same argument to f ′ , we conclude that ‖f ′ ‖ ≤ ‖f ‖. Therefore ‖f ‖ = ‖f ′ ‖.

A function f is defined ‖·‖–almost everywhere if Ω \ dom(f ) is ‖·‖–negligible. By Lemma 6.3.11, if g, g ′ ∈ $\overline{\mathbb{R}}^{\Omega}$ coincide with f on dom(f ), then ‖1{g≠g′ } ‖ = 0 and so ‖g‖ = ‖g ′ ‖. Therefore, we can define ‖f ‖ := ‖g‖.
Theorem 6.3.12. Suppose E ⊂ Bb (Ω) is a vector space. If ‖·‖ is a mean for E, then
(i) F = F(E, ‖·‖) := {f ∈ $\overline{\mathbb{R}}^{\Omega}$ : ‖f ‖ < ∞} is a Stone lattice.
(ii) (F, ‖·‖) forms a complete seminormed space.
(iii) If {fn } ⊂ F and limn ‖fn − f ‖ = 0, then there is a subsequence fnk that converges to f pointwise almost surely.
Functions in the closure of E in F are called ‖·‖–integrable. The collection of all such functions will be denoted by L1 (‖·‖). If ‖·‖∗ is the Daniell mean associated to an elementary integral (E, I), the functions in L1 (‖·‖∗ ) are called Daniell integrable.

Proof. Statement (i) follows from solidity, absolute homogeneity and countable subadditivity of the mean, and from the inequalities |a f + g| ≤ |a||f | + |g|, |f ∨ g| ≤ |f | + |g|, |f ∧ g| ≤ |f | + |g|, and |f ∧ 1| ≤ |f |.

(ii) Suppose that {fn } ⊂ F is a Cauchy sequence. By Lemma 6.3.11, we can assume without loss of generality that |fn (ω)| < ∞ for all n and all ω ∈ Ω. Choose a subsequence {fnk } such that $\sup_{n ≥ n_k} \|f_n − f_{n_k}\| < 2^{-k}$. Then $g = \sum_k |f_{n_{k+1}} − f_{n_k}| ∈ F$. Hence B = {g = ∞} is negligible,
$f(x) = f_{n_1}(x) + \sum_{k=1}^{∞} \big(f_{n_{k+1}}(x) − f_{n_k}(x)\big) = \lim_k f_{n_k}(x)$
absolutely on B^c , and ‖f ‖ < ‖fn1 ‖ + 1 < ∞. For each k, if n ≥ nk then
$\|f − f_n\| ≤ \|f_n − f_{n_k}\| + \|f − f_{n_k}\| ≤ 2^{-k} + \|1_{B^c}(f − f_{n_k})\| ≤ 2^{-k} + \big\|\sum_{m ≥ k} 1_{B^c}(f_{n_{m+1}} − f_{n_m})\big\| ≤ 3 \cdot 2^{-k} → 0$.

Therefore, limn ‖f − fn ‖ = 0 and the subsequence fnk converges to f almost surely.

(iii) If fn converges to f in mean, then (fn ) is a Cauchy sequence in mean. By part (ii) there is a subsequence {fnk } and a function f ′ ∈ F to which fnk converges in mean and almost surely. It follows that f and f ′ are finite ‖·‖–a.s., and f = f ′ ‖·‖–a.s.

The following result is a simple version of monotone convergence for pointwise limits of
elementary functions.
Lemma 6.3.13. Suppose (φn ) ⊂ E is a monotone increasing sequence with supn ‖φn ‖ < ∞. Then supm φm ∈ L1 and limn ‖ supm φm − φn ‖ = 0.

Proof. We claim that (φn ) is a Cauchy sequence in L1 ; otherwise, there is ε > 0 and a subsequence φnk such that ‖φnk − φnk−1 ‖ ≥ ε. However, as (φnk − φnk−1 ) ⊂ E+ and
$\sup_K \big\|\sum_{k=1}^{K} (φ_{n_k} − φ_{n_{k-1}})\big\| = \sup_K \|φ_{n_K} − φ_{n_0}\| ≤ 2 \sup_m \|φ_m\| < ∞$,
the continuity property (iv) of a mean gives limk ‖φnk − φnk−1 ‖ = 0, which is a contradiction. Therefore, ‖ supm φm − φn ‖ → 0 by Theorem 6.3.12(ii, iii).
Lemma 6.3.14. Assume E ⊂ Bb (Ω) is a Stone lattice or a ring. Let ‖·‖ be a mean for E.
(i) For any φ ∈ E, |φ|, φ² ∈ $\overline{E}^{u}$ ∩ L1 (‖·‖).
(ii) If φ ∈ E+ then φ ∧ 1 ∈ L1 .
(iii) If f ∈ $L_1^{+}$(‖·‖), then there exists a sequence {ψn } ⊂ E+ such that ‖f − ψn ‖ → 0.

Proof. It is clear by definition that E ⊂ L1 .

(i)&(ii) Suppose E is a Stone lattice and let φ ∈ E. Then |φ|, φ ∧ 1 ∈ E ⊂ L1 (‖·‖). By Lemma 5.3.4 there exists a piecewise linear function gn such that gn (φ) ∈ E+ and gn (φ) ↗ φ² uniformly. Therefore, by Lemma 6.3.13, φ² ∈ L1 .
Suppose E is merely a ring and let φ ∈ E. Then φ² ∈ E ⊂ L1 . For any φ ∈ E there is a sequence of polynomials Pn (t) in t² such that 0 ≤ Pn (φ) ↗ |φ| uniformly. As (Pn (φ)) ⊂ E, |φ| ∈ L1 by Lemma 6.3.13. Example 5.3.14 shows that there exists a sequence (τn ) ⊂ E+ that increases uniformly to φ ∧ 1. By Lemma 6.3.13, φ ∧ 1 ∈ L1 since ‖τn ∧ 1‖ ≤ ‖φ ∧ 1‖ < ∞.
(iii) Suppose f ∈ $L_1^{+}$ and let {φn } be a sequence in E that converges to f in ‖·‖. By part (i), for each n ∈ N there is ψn ∈ E+ such that ‖ψn − |φn |‖ < 1/n. By solidity, ‖f − ψn ‖ ≤ ‖f − |φn |‖ + ‖|φn | − ψn ‖ ≤ ‖f − φn ‖ + 1/n → 0.
Theorem 6.3.15. Suppose E ⊂ Bb (Ω) is either a Stone lattice or a ring, and let ‖·‖ be a mean for E. Then,
(i) L1 (‖·‖) is a closed linear subspace of F and a Stone lattice.
(ii) If g ∈ L1 (‖·‖) is bounded or if g ∈ $\overline{E}^{u}$, then f g ∈ L1 for all f ∈ L1 .

Proof. (i) Suppose limn ‖fn − f ‖ = 0 where {fn : n ∈ N} ⊂ L1 . Then, for any fn there exists φn ∈ E such that ‖fn − φn ‖ < 1/n. Consequently
‖f − φn ‖ ≤ ‖f − fn ‖ + ‖fn − φn ‖ → 0
as n → ∞. Therefore L1 is closed in F.
Suppose f, g ∈ L1 , a ∈ R and let (φn : n ∈ N) and (ψn : n ∈ N) be sequences in E such that limn ‖φn − f ‖ = limn ‖ψn − g‖ = 0. Then
(6.7) |a f + g − (a φn + ψn )| ≤ |a||f − φn | + |g − ψn |
(6.8) | |f | − |φn | | ≤ |f − φn |
(6.9) |f ∧ 1 − φn ∧ 1| ≤ |f − φn |.
Solidity, absolute homogeneity and subadditivity of ‖·‖ imply that af + g ∈ L1 , so L1 is a linear subspace of F.
If E is a lattice then (6.8) and (6.9) imply that |f |, f ∧ 1 ∈ L1 .
If E is merely a ring then (|φn | : n ∈ N) ⊂ L1 by Lemma 6.3.14(i), and so |f | ∈ L1 by (6.8). Consequently f+ , f− ∈ L1 whenever f ∈ L1 . Since f ∧ 1 = f+ ∧ 1 − f− , to show that f ∧ 1 ∈ L1 it is enough to assume that f ≥ 0. In such case, there is a sequence (φn : n ∈ N) ⊂ E+ such that ‖f − φn ‖ → 0 by Lemma 6.3.14. Since (φn ∧ 1 : n ∈ N) ⊂ L1 and ‖f ∧ 1 − φn ∧ 1‖ → 0 by (6.9), f ∧ 1 ∈ L1 .
(ii) We first show that gφ ∈ L1 whenever φ ∈ E. If g ∈ $\overline{E}^{u}$ and (φn ) ⊂ E converges uniformly to g, then φφn ∈ L1 (‖·‖) by Lemma 6.3.14(i). As ‖φφn − φg‖ ≤ ‖φn − g‖u ‖φ‖ → 0, φg ∈ L1 .
If g is integrable and bounded and (φn ) ⊂ E is such that ‖g − φn ‖ → 0, then φφn ∈ L1 by Lemma 6.3.14(i). As ‖φg − φφn ‖ ≤ ‖φ‖u ‖g − φn ‖ → 0, φg ∈ L1 .

For a general f ∈ L1 , let (ψn ) ⊂ E be such that ‖f − ψn ‖ → 0. Then gψn ∈ L1 for all n and, since ‖f g − ψn g‖ ≤ ‖g‖u ‖f − ψn ‖ → 0, f g ∈ L1 .
Remark 6.3.16. Statement (ii) in Theorem 6.3.15 says that the collection of bounded integrable functions is an algebraic ring contained in L1 (‖·‖).

6.4. Daniell convergence theorems

The following results present the analogs of the monotone convergence and dominated
convergence for (L1 , k k).
Theorem 6.4.1. (Daniell’s monotone convergence theorem) Suppose that {fn } ⊂ L1 is either an increasing or a decreasing sequence and let f be its pointwise limit. If supn ‖fn ‖ < ∞, then f ∈ L1 and limn ‖fn − f ‖ = 0.

Proof. We first show that if {fn } ⊂ $L_1^{+}$ and $\sup_n \|\sum_{k=1}^{n} f_k\| < ∞$, then limn ‖fn ‖ = 0. For each n we can choose φn ∈ E+ such that ‖φn − fn ‖ ≤ 2^{−n} . Then
$\sup_n \big\|\sum_{k=1}^{n} φ_k\big\| ≤ \sup_n \big\|\sum_{k=1}^{n} f_k\big\| + 1$.

Thus, ‖fn ‖ ≤ ‖fn − φn ‖ + ‖φn ‖ → 0.

Without loss of generality we may assume that |fn | < ∞ on Ω for all n. It is enough to consider the case when fn ↗ f pointwise everywhere, for if fn ↘ f then f1 − fn ↗ f1 − f . We claim that (fn ) is a Cauchy sequence in L1 ; otherwise, for some ε > 0 there would be a subsequence {fnk } such that ‖fnk+1 − fnk ‖ > ε for all k. As (fnk+1 − fnk ) ⊂ $L_1^{+}$ and
$\sup_K \big\|\sum_{k=1}^{K} (f_{n_{k+1}} − f_{n_k})\big\| ≤ \sup_n \|f_n\| + \|f_{n_1}\| < ∞$,
the first part of the proof gives limk ‖fnk+1 − fnk ‖ = 0, which is a contradiction. Therefore, by Theorem 6.3.12, f ∈ L1 and ‖fn − f ‖ → 0.

A direct consequence of Daniell’s monotone convergence is that if (fn ) ⊂ L1 (‖·‖) is a monotone sequence of nonnegative functions, then ‖ supn fn ‖ = supn ‖fn ‖.
Corollary 6.4.2. If E is a vector lattice, then E ↑ ∩ F ⊂ L1 .

Proof. Let h ∈ E ↑ ∩ F and choose a nondecreasing sequence (φn ) ⊂ E converging to h. Since ψn = φn − φ1 ∈ E+ , ψn ↗ h − φ1 and supn ‖ψn ‖ ≤ ‖h‖ + ‖φ1 ‖, we conclude that ‖φn − h‖ → 0 by monotone convergence.

Theorem 6.4.3. (Daniell–Fatou lemma.) If 0 ≤ fn ∈ L1 , n ∈ N, then
$\|\liminf_n f_n\| ≤ \liminf_n \|f_n\|$.
If lim inf n ‖fn ‖ < ∞, then lim inf n fn ∈ L1 .

Proof. It is enough to consider the case lim inf n ‖fn ‖ < ∞. For each n ∈ N let gn = inf m≥n fm . For each pair of integers (n, m), $g_{n,m} := \bigwedge_{k=0}^{m} f_{n+k} ∈ L_1$ and 0 ≤ gn,m ↘ gn as m → ∞. Since supm ‖gn,m ‖ ≤ ‖fn ‖ < ∞, gn ∈ $L_1^{+}$ by Daniell’s monotone convergence. For all n ∈ Z+ , 0 ≤ gn−1 ≤ gn ≤ fn and gn ↗ lim inf n fn . Another application of Daniell’s monotone convergence and
$\|\liminf_n f_n\| = \|\sup_n g_n\| = \sup_n \|g_n\| ≤ \liminf_n \|f_n\| < ∞$
imply that lim inf n fn ∈ L1 .
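The inequality in the Daniell–Fatou lemma can be strict. The Python sketch below illustrates this on the two-point set Ω = {0, 1} with the mean ‖f ‖ = |f (0)| + |f (1)|; the space, the mean and the alternating sequence are assumptions chosen for illustration. Because the sequence is periodic, tail infima are computed exactly over one period.

```python
def f(n):
    """f_n is the indicator of the point (n mod 2), stored as (f_n(0), f_n(1))."""
    return (1, 0) if n % 2 == 0 else (0, 1)

def mean(values, w=(1, 1)):
    """Illustrative mean on a two-point set: weighted sum of absolute values."""
    return sum(wi * abs(v) for v, wi in zip(values, w))

def tail_inf_function(n, period=2):
    """Pointwise infimum g_n = inf_{m >= n} f_m (one period suffices here)."""
    tail = [f(m) for m in range(n, n + period)]
    return tuple(min(column) for column in zip(*tail))

def tail_inf_of_means(n, period=2):
    """inf_{m >= n} ||f_m|| for the periodic sequence."""
    return min(mean(f(m)) for m in range(n, n + period))

# g_n is identically 0 for every n, so ||liminf f_n|| = 0,
# while ||f_n|| = 1 for every n, so liminf ||f_n|| = 1.
print(mean(tail_inf_function(0)), tail_inf_of_means(0))
```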

Theorem 6.4.4. (Daniell–Lebesgue dominated convergence.) Suppose {fn } ⊂ L1 converges almost surely to f . Suppose there is g ∈ F such that |fn | ≤ g almost surely for all n. Then f ∈ L1 and limn ‖fn − f ‖ = 0.

Proof. Without loss of generality we may assume that all conditions hold everywhere. By the lattice property of L1 (Theorem 6.3.15(i)) and Daniell’s monotone convergence,
gn = sup{|fk − fm | : k, m ≥ n} ∈ L1 , n ∈ N.
Since gn ↘ 0 and 0 ≤ gn ≤ 2g for all n, ‖gn ‖ → 0 by monotone convergence. Since ‖fk − fm ‖ ≤ ‖gn ‖ for all k, m ≥ n, (fn ) is a Cauchy sequence in L1 . By Theorem 6.3.12(iii), fn converges in mean to f .

A set A ⊂ Ω is said to be ‖·‖–integrable iff 1A ∈ L1 (‖·‖).

Example 6.4.5. Suppose f ∈ L1 . For any integrable subset A we have that g = f · 1A ∈ L1 by Theorem 6.3.15. Therefore, f − g = f · 1Ac ∈ L1 .
Example 6.4.6. Dominated convergence implies that the collection of all integrable sets is a δ–ring. This collection in general fails to be a σ–ring.

The following result derives integrable sets from integrable functions.

Lemma 6.4.7. Suppose f ∈ L1 and a ∈ (0, ∞). Then 1{f >a} , 1{f ≥a} , 1{f <−a} and
1{f ≤−a} are integrable.

Proof. From Theorem 6.3.15, hn = 1∧ n(f −f ∧1) ∈ L1 for each n. Since 0 ≤ hn ≤ |f | and
hn → 1{f >1} , we conclude that 1{f >1} ∈ L1 by dominated convergence. Since {f > a} =
{f /a > 1}, it follows that 1{f >a} ∈ L1 . Using −f instead of f we get that 1{f <−a} ∈ L1 .
For any sequence {an } ⊂ (0, a) with an ↗ a, we have that 1{f >an } ↘ 1{f ≥a} . By dominated convergence, 1{f ≥a} and 1{f ≤−a} are in L1 .
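The truncation used in the proof of Lemma 6.4.7 can be explored numerically. The Python sketch below evaluates hn = 1 ∧ n(f − f ∧ 1) at a single real value x = f (ω); the function name h is an illustrative choice, and floating-point values stand in for real numbers.

```python
def h(n, x):
    """Pointwise value of h_n = 1 ∧ n(f − f ∧ 1) at a point where f equals x."""
    return min(1.0, n * (x - min(x, 1.0)))

# For x <= 1 the value is 0 for every n; for x > 1 it increases to 1 as n grows,
# so h_n converges pointwise to the indicator of {f > 1}.
for x in (0.5, 1.0, 1.25, 3.0):
    print(x, [h(n, x) for n in (1, 2, 4, 8)])
```

Since hn vanishes where f ≤ 1 and is bounded by 1 < f elsewhere, the bound 0 ≤ hn ≤ |f | used for dominated convergence is visible pointwise.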
Example 6.4.8. If f ∈ $L_1^{+}$, then 1(a,b] ◦ f and 1[a,b) ◦ f are integrable for all 0 < a < b. Consequently, $f_n := 2^{-n}[2^n f]\,1_{\{f ≤ n\}}$ is integrable for all n ∈ Z+ .

Lemma 6.4.9. If E is a Stone lattice or a ring then, for any a > 0 and h ∈ E ↑ , the function
1{h>a} ∈ E ↑ .

Proof. For any φ ∈ E let φn = 1 ∧ n(φ − φ ∧ 1). If E is a Stone lattice then φn ∈ E. If E is merely a ring, then φn ∈ E ↑ as in Example 6.3.2. Thus $1_{\{φ>1\}} = \sup_n φ_n ∈ E^{\uparrow}$ by Lemma 6.3.1. Therefore, if h = supn ϕn where (ϕn ) ⊂ E, then $\{h > a\} = \bigcup_n \{ϕ_n > a\} ∈ E^{\uparrow\uparrow} = E^{\uparrow}$.
Theorem 6.4.10. Let I be a positive σ–continuous elementary integral on a Stone lattice E ⊂ Bb (Ω), and let ‖·‖∗ be its Daniell mean. Then
(6.10) ‖A‖∗ = inf{‖B‖∗ : A ⊂ B ∈ E ↑ } = inf{‖B‖∗ : A ⊂ B ∈ L1 }
for all A ∈ F. Moreover, for any A ∈ F there is a set B ∈ E ↑↓ ∩ L1 such that A ⊂ B and ‖A‖∗ = ‖B‖∗ .

Proof. If ‖A‖∗ < ∞, then there is a sequence (hn ) ⊂ E ↑ ∩ F such that 1A ≤ hn and limn ‖hn ‖∗ = ‖A‖∗ . For any 0 < ε < 1,
$1_A ≤ 1_{\{h_n > 1−ε\}} ≤ \frac{h_n}{1−ε}$;
whence we obtain that
$\|A\|^{*} ≤ \|\{h_N > 1 − ε\}\|^{*} ≤ \frac{1}{1−ε}(\|A\|^{*} + ε)$
for N large enough. By Corollary 6.4.2, (hn ) ⊂ L1 and, by Lemmas 6.4.7 and 6.4.9, {hn > 1 − ε} ∈ E ↑ ∩ L1 . Consequently
(6.11) $\inf\{\|B\|^{*} : A ⊂ B ∈ E^{\uparrow}\} ∨ \inf\{\|B\|^{*} : A ⊂ B ∈ L_1\} ≤ \frac{1}{1−ε}(\|A\|^{*} + ε)$.
(6.10) follows by letting ε → 0.

To prove the last statement, suppose A ∈ F and choose (Bn ) ⊂ E ↑ so that A ⊂ Bn and ‖A‖∗ = limn ‖Bn ‖∗ . The set $B = \bigcap_n B_n$ has the desired property.

An integrable simple function is a finite linear combination of functions 1A ∈ L1 . The following result shows that simple functions are dense in L1 .
Theorem 6.4.11. For any f ∈ L1 , there is a sequence {sn } of simple functions such that |sn | ≤ |f | almost surely and ‖f − sn ‖ → 0.

Proof. Express f = f+ − f− and let
$s_n^{+} = 2^{-n}[2^n f_+]\,1_{\{f_+ ≤ n\}}$,
$s_n^{-} = 2^{-n}[2^n f_-]\,1_{\{f_- ≤ n\}}$.

Then $\{s_n = s_n^{+} − s_n^{-}\}$ is a sequence of integrable simple functions that converges to f on {|f | ≠ ∞} and such that |sn | ≤ |f |. By dominated convergence we conclude that ‖sn − f ‖ → 0.
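The dyadic approximation in the proof of Theorem 6.4.11 can be sketched pointwise in Python. The code below assumes that [·] denotes the integer part (floor) and treats a single value x = f (ω); the names dyadic_part and s are hypothetical helpers introduced for this illustration.

```python
import math

def dyadic_part(x, n):
    """2^{-n} * floor(2^n * x) on {0 <= x <= n}, and 0 elsewhere (x is nonnegative)."""
    if x < 0 or x > n:
        return 0.0
    return math.floor(x * 2**n) / 2**n

def s(x, n):
    """Pointwise value of s_n = s_n^+ − s_n^− at a point where f equals x."""
    positive_part = max(x, 0.0)
    negative_part = max(-x, 0.0)
    return dyadic_part(positive_part, n) - dyadic_part(negative_part, n)

# |s_n| <= |f| always, and |f − s_n| <= 2^{-n} as soon as |f| <= n.
for n in (1, 2, 4, 8):
    print(n, s(math.pi, n), s(-0.3, n))
```

The cutoff at {f± ≤ n} is what keeps each sn bounded and finitely valued; the error bound 2^{−n} holds only where |f | ≤ n.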
A function f ∈ $\overline{\mathbb{R}}^{\Omega}$ is called σ–finite with respect to a mean ‖·‖ if {f ≠ 0} is covered by a sequence of ‖·‖–integrable sets. A mean ‖·‖ is said to be σ–finite if the constant function 1 is σ–finite with respect to ‖·‖. Any f ∈ L1 (‖·‖) is σ–finite, for $\{|f| > 0\} = \bigcup_n \{|f| > \frac{1}{n}\}$, and {|f | > 1/n} ∈ L1 (‖·‖) for all n.
Theorem 6.4.12. If the Daniell mean ‖·‖∗ of a positive σ–continuous elementary integral I on a Stone lattice E ⊂ Bb (Ω) is σ–finite, then there exists a sequence (φn ) ⊂ E such that supn φn ≡ 1.

Conversely, suppose E ⊂ Bb (Ω) is a Stone lattice or a ring. If there is a sequence (φn ) ⊂ E such that supn φn ≡ 1, then any mean ‖·‖ on E is σ–finite.

Proof. If ‖·‖∗ is σ–finite then there is a sequence (1An ) ⊂ L1 (‖·‖∗ ) such that $1 = \bigvee_n 1_{A_n}$. The definition of Daniell’s mean implies that for each n there is a sequence (φn,k : k ∈ N) ⊂ E such that 1An ≤ hn := supk φn,k . As E is a lattice closed under chopping, φ′n,k = (φn,k ∧ 1)+ ∈ E for each n and k in N. It is easy to check that $1 = \sup_{(n,k) ∈ \mathbb{N}^2} φ'_{n,k}$.

Conversely, if there is a sequence (ψn ) ⊂ E such that 1 = supn ψn , then $1 = \bigvee_{n,m} 1_{\{ψ_n > \frac{1}{m}\}}$ and $1_{\{ψ_n > \frac{1}{m}\}} ∈ L_1(\|\cdot\|)$ for all n and m in N.

6.5. Extension of the Integral

Consider (E, I) where I is a positive σ–continuous elementary integral on a Stone lattice or a ring E ⊂ Bb (Ω). Suppose ‖·‖ is a mean for E dominating I, that is, |I(φ)| ≤ ‖φ‖ for all φ ∈ E. If f ∈ L1 and {φn } ⊂ E converges in mean to f , then
(6.12) |I(φn ) − I(φm )| = |I(φn − φm )| ≤ ‖φn − φm ‖.

Thus {I(φn ) : n ∈ N} is a Cauchy sequence in R, and so it converges. If {ϕn } ⊂ E is another sequence that converges in mean to f , then
|I(φn ) − I(ϕn )| = |I(φn − ϕn )| ≤ ‖φn − ϕn ‖ ≤ ‖f − φn ‖ + ‖f − ϕn ‖ → 0.
This shows that I admits a unique extension to (L1 , ‖·‖) by setting
(6.13) $I(f) := \lim_n I(φ_n)$

for any f ∈ L1 (‖·‖) and {φn } ⊂ E with ‖f − φn ‖ → 0. For any f ∈ L1 (‖·‖), I(f ) is the Daniell integral of f .
Theorem 6.5.1. (Integral Extension) Let I be a positive σ–continuous elementary integral on a Stone lattice or a ring E ⊂ Bb (Ω). If ‖·‖ is a mean for E that dominates I, then
(i) I has a unique extension as a positive linear functional on L1 (‖·‖). The extension, denoted also by I or $\int \cdot\, dI$, satisfies |I(f )| ≤ I(|f |) ≤ ‖f ‖ for all f ∈ L1 (‖·‖).
(ii) If (fn : n ∈ N) ⊂ L1 (‖·‖) converges to f in ‖·‖–mean, then $\int f\,dI = I(f) = \lim_n I(f_n)$.
(iii) (Monotone convergence) If {fn } is a monotone sequence of nonnegative ‖·‖–integrable functions and supn ‖fn ‖ < ∞, then limn I(|fn − supm fm |) = 0.
If E is a vector lattice closed under chopping and ‖·‖∗ is the Daniell mean associated to I, then
(iv) I(f ) = I ∗ (f ) for all f ∈ $L_1^{+}$(‖·‖∗ ) and I(h) = I ∗ (h) for all h ∈ E ↑ ∩ L1 (‖·‖∗ ).

Proof. (i) Suppose f, g ∈ L1 , a ∈ R and let {φn } and {ψn } be sequences in E which converge in mean to f and g respectively. As ‖af + g − (aφn + ψn )‖ → 0, the linearity of I on E implies that
$I(af + g) = \lim_n I(aφ_n + ψ_n) = aI(f) + I(g)$.

If g ∈ $L_1^{+}$(‖·‖) then, by Lemma 6.3.14(iii), there is a sequence {ψn } ⊂ E+ such that limn ‖g − ψn ‖ = 0. Then, 0 ≤ I(g) = limn I(ψn ) ≤ limn ‖ψn ‖ = ‖g‖, and so I is positive. Consequently, |I(f )| ≤ I(|f |) ≤ ‖|f |‖ = ‖f ‖ for all f ∈ L1 (‖·‖).

(ii) follows from (i), since |I(f − fn )| ≤ ‖f − fn ‖.

(iii) is a direct consequence of Daniell’s monotone convergence theorem and the inequality |I(supm fm ) − I(fn )| ≤ ‖ supm fm − fn ‖.
(iv) Suppose E is a Stone lattice and ‖·‖∗ is Daniell’s mean. For any f ∈ L1 (‖·‖∗ ) and sequence (φn ) ⊂ E converging to f in L1 (‖·‖∗ ) we have
$|I(f)| = \lim_n |I(φ_n)| ≤ \lim_n I(|φ_n|) = \lim_n \|φ_n\|^{*} = \|f\|^{*} = I^{*}(|f|)$.
Equality follows if f ≥ 0.
Suppose h ∈ E ↑ ∩ L1 (‖·‖∗ ). Fix φ ∈ E so that φ ≤ h. Then h = (h − φ) + φ and (h − φ) ∈ $E^{\uparrow}_{+}$ ∩ L1 (‖·‖∗ ). Part (i) and Theorem 6.3.4(iii) imply that I ∗ (h) = I ∗ (h − φ) + I ∗ (φ) = I(h − φ) + I(φ) = I(h).
Example 6.5.2. (Lebesgue integral on R) Consider the set E of step functions $φ = \sum_{k=1}^{n} c_k 1_{(a_k, b_k]}$, where n ∈ N, ck ∈ R and −∞ < ak ≤ bk < ∞. This is a ring lattice closed under chopping. The functional $λ(φ) = \sum_{k=1}^{n} c_k (b_k − a_k)$ is a positive σ–continuous elementary integral on E. The Daniell extension of λ gives the Lebesgue integral on R.
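The elementary integral λ of Example 6.5.2 is simple enough to compute directly. The Python sketch below represents a step function as a list of triples (c, a, b), each standing for c·1(a,b]; this representation and the names evaluate and elementary_integral are assumptions made for illustration, and only the elementary integral (not its Daniell extension) is computed.

```python
from fractions import Fraction

def evaluate(phi, x):
    """Value at x of the step function phi = sum of c * 1_{(a, b]} over its triples."""
    return sum(c for (c, a, b) in phi if a < x <= b)

def elementary_integral(phi):
    """lambda(phi) = sum of c * (b - a) over the triples (c, a, b) of phi."""
    return sum(c * (b - a) for (c, a, b) in phi)

phi = [(Fraction(2), Fraction(0), Fraction(1)),
       (Fraction(-1), Fraction(1, 2), Fraction(2))]
psi = [(Fraction(3), Fraction(1), Fraction(3))]

# Concatenating the lists represents the sum phi + psi, so linearity of lambda
# on step functions is immediate in this representation.
print(elementary_integral(phi), elementary_integral(psi), elementary_integral(phi + psi))
```

Overlapping intervals are allowed in this representation, which is why the value of λ does not depend on how a given step function is written as a sum.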
Example 6.5.3. (Abstract Lebesgue integral) More generally, consider a measure space (Ω, F , m). The collection E of simple functions $φ = \sum_{k=1}^{n} a_k 1_{\{φ = a_k\}}$, where ak ∈ R \ {0}, {φ = ak } ∈ F and |ak | m({φ = ak }) < ∞, is a ring lattice closed under chopping. The linear extension of m to E defined by $m(φ) = \sum_{k=1}^{n} a_k\, m(\{φ = a_k\})$ is a positive σ–continuous elementary integral. The Daniell extension of m produces the abstract Lebesgue integral.

We conclude this section by extending the set of integrable functions to include complex–valued functions.
Example 6.5.4. Let E ⊗ C = {φ + i ψ : φ, ψ ∈ E} be the complex linear span of E. For any f ∈ C^Ω , it is natural to define its seminorm as
$\|f\|^{*}_{\mathbb{C}} = \|\,|f|\,\|^{*}$.

It is obvious that the family ($F^{*}_{\mathbb{C}}$, $\|\cdot\|^{*}_{\mathbb{C}}$) of complex–valued functions f ∈ C^Ω with $\|f\|^{*}_{\mathbb{C}} < ∞$ is a complete complex seminormed space. The space L1 (C, ‖·‖∗ ) of complex–valued integrable functions is then defined as the closure of E ⊗ C in $F^{*}_{\mathbb{C}}$. It is easy to check that f = u + iv ∈ L1 (C, ‖·‖∗ ) iff u and v are in L1 (‖·‖∗ ); furthermore, if {gn := φn + i ψn : n ∈ N} ⊂ E ⊗ C is a sequence such that $\lim_{n→∞} \|f − g_n\|^{*}_{\mathbb{C}} = 0$, then limn→∞ (‖φn − u‖∗ + ‖ψn − v‖∗ ) = 0. This means that I can be uniquely extended to L1 (C, ‖·‖∗ ) by setting I(f ) := I(u) + i I(v).

6.6. Alternative extension of the Daniell integral

When ‖·‖ is the Daniell mean of a positive σ–continuous elementary integral I on a vector lattice E ⊂ Bb (Ω), there is a more direct approach to define the space L1 (‖·‖). A down–and–up procedure is developed and then matched to the up–and–down procedure developed above.
Let E ↓ denote the space of functions that are countable infima of functions in E, that is, g ∈ E ↓ iff g = inf n φn for some sequence {φn } ⊂ E.
Definition 6.6.1. The lower integral of a function g ∈ E ↓ is defined as
(6.14) $I_{*}(g) = \inf\{I(φ) : φ ∈ E,\ g ≤ φ\}$
The lower integral of any extended real function f on Ω is defined as
(6.15) $I_{*}(f) = \sup\{I_{*}(g) : g ∈ E^{\downarrow},\ g ≤ f\}$

It is easy to check that $I_{*}(φ) = I(φ)$ for all φ ∈ E and that (6.14) and (6.15) coincide on E ↓ .
Theorem 6.6.2. Let E be a vector lattice. Then
(i) E ↓ is closed under addition, multiplication by nonnegative scalars, countable infima and finite suprema.
(ii) $I_{*}$ is nondecreasing and positive homogeneous.
(iii) $I_{*}(f) ≤ I^{*}(f)$ for any numerical function f .
(iv) If {gn } ⊂ E ↓ is a nonincreasing sequence, then $I_{*}(g_n) ↘ I_{*}(\inf_n g_n)$.
(v) $I_{*}$ is additive on E ↓ .
(vi) $I_{*}$ is σ–superadditive, i.e., if fn ≥ 0 then $I_{*}(\sum_n f_n) ≥ \sum_n I_{*}(f_n)$.

Proof. Observe that E ↓ = −E ↑ . (i) follows directly from Lemma 6.3.1. (ii) follows directly from the definition of $I_{*}$. The observation above implies that
(6.16) $I_{*}(f) = −I^{*}(−f)$
for any numerical function f . (iv) and (v) are consequences of (6.16) and Theorem 6.3.4(i, ii, iii).

To prove (iii), consider g ∈ E ↓ and h ∈ E ↑ so that g ≤ f ≤ h. (i) and (6.16) imply that 0 ≤ f − g ≤ h − g ∈ $E^{\uparrow}_{+}$ and $0 ≤ I^{*}(h − g) = I^{*}(h) + I^{*}(−g) = I^{*}(h) − I_{*}(g)$. Taking the supremum over g and the infimum over h gives (iii).

To prove superadditivity, let fn ≥ gn ∈ $E^{\downarrow}_{+}$ be such that $I_{*}(g_n) > I_{*}(f_n) − 2^{-n} ε$. For any fixed integer N ,
$I_{*}\big(\sum_n f_n\big) ≥ I_{*}\big(\sum_n g_n\big) ≥ \sum_{n=1}^{N} I_{*}(g_n) > \sum_{n=1}^{N} \big(I_{*}(f_n) − 2^{-n} ε\big)$.

(vi) follows by letting N ↗ ∞ and then ε ↘ 0.

Theorem 6.6.3. Suppose ‖·‖∗ is the Daniell mean of a positive σ–continuous elementary integral I on a Stone lattice E ⊂ Bb (Ω). A function f ∈ $\overline{\mathbb{R}}^{\Omega}$ is Daniell–integrable if and only if
$−∞ < I_{*}(f) = I^{*}(f) < ∞$.
If {φn : n ∈ N} ⊂ E and $\lim_n I^{*}(|f − φ_n|) = 0$, then $\lim_n I(φ_n) = I^{*}(f)$.

Proof. Suppose f is Daniell–integrable. Then, $|I^{*}(f)| ≤ I^{*}(|f|) < ∞$ and there are φn ∈ E such that $I^{*}(|f − φ_n|) < \frac{1}{n}$. Thus, there are hn ∈ E ↑ such that |f − φn | ≤ hn and $I^{*}(h_n) < \frac{1}{n}$. It follows that φn − hn ≤ f ≤ φn + hn and
$I(φ_n) + I_{*}(−h_n) = I(φ_n) − I^{*}(h_n) = I_{*}(φ_n − h_n) ≤ I_{*}(f) ≤ I^{*}(f) ≤ I^{*}(φ_n + h_n) = I(φ_n) + I^{*}(h_n)$,
for $I_{*}(φ) = I(φ) = I^{*}(φ)$ for all φ ∈ E and $I_{*}(u) = −I^{*}(−u)$ for all u ∈ $\overline{\mathbb{R}}^{\Omega}$. Consequently
$|I^{*}(f) − I(φ_n)| ≤ I^{*}(h_n) ≤ \frac{1}{n}$,
$|I^{*}(f) − I_{*}(f)| ≤ 2 I^{*}(h_n) ≤ \frac{2}{n}$,
whence we conclude that $\lim_n |I^{*}(f) − I(φ_n)| = 0$ and $I_{*}(f) = I^{*}(f)$.

Conversely, suppose $−∞ < I_{*}(f) = I^{*}(f) < ∞$. Given ε > 0 there exist g ∈ E ↓ and h ∈ E ↑ such that g ≤ f ≤ h and $I^{*}(h) − I_{*}(g) = I^{*}(h − g) < ε$. By definition of $I_{*}$, there is φ ∈ E such that g ≤ φ and $I(φ) < I_{*}(g) + ε$. Hence $0 ≤ I^{*}(φ − g) = I(φ) − I_{*}(g) < ε$. As a result
$I^{*}(|f − φ|) = \|f − φ\|^{*} ≤ \|f − g\|^{*} + \|g − φ\|^{*} < 2ε$,
and so, f is Daniell–integrable.

Another version of dominated convergence is discussed in Exercise 6.8.9.

6.7. Order continuous Integrals

Suppose X is a locally compact Hausdorff (l.c.H for short) topological space. The following
result shows that a positive elementary integral on C00 (X) satisfies a stronger property than
σ-continuity.
Lemma 6.7.1. Suppose I is a positive linear functional on C00 (X). If Φ ⊂ $C_{00}^{+}(X)$ is an increasing directed subset and supφ∈Φ φ ∈ C00 (X), then

(6.17) $I\big(\sup_{φ ∈ Φ} φ\big) = \sup_{φ ∈ Φ} I(φ)$.

Proof. Since Φ ⊂ $C_{00}^{+}(X)$ is an increasing directed family and by hypothesis g := supφ∈Φ φ ∈ C00 (X), we have that supp(g) is compact and supp(φ) ⊂ supp(g) for all φ ∈ Φ. By Dini’s lemma, ‖g − φ‖u → 0 along the directed set Φ. Urysohn’s lemma provides a function ψ ∈ C00 (X) such that supp(g) ≺ ψ, that is, 0 ≤ ψ ≤ 1 and ψ ≡ 1 on supp(g). Hence, g − φ = |g − φ| ≤ ‖g − φ‖u · ψ for all φ ∈ Φ. Therefore
|I(g) − I(φ)| = I(g) − I(φ) = I(g − φ) ≤ ‖g − φ‖u I(ψ) → 0
along the directed set Φ.

Lemma 6.7.1 shows that any positive linear functional I on C00 (X) leads to an elementary integral with a stronger version of σ–continuity. Positive linear functionals on C00 (X) are called positive Radon measures. A more detailed description of Radon measures is developed in Section 7.7.
Definition 6.7.2. Suppose E ⊂ Bb (Ω). Let E ⇑ denote the collection of extended real–valued functions that are the pointwise suprema of arbitrary collections in E.
(a) An elementary integral (E, I) is order continuous if (6.17) holds for every increasingly directed collection Φ ⊂ E with supφ∈Φ φ ∈ E.
(b) A mean ‖·‖ for E is said to be order continuous if suph∈H ‖h‖ = ‖ suph∈H h‖ for any increasingly directed family H ⊂ $E^{\Uparrow}_{+}$.

As we will see, order continuous means admit a richer integration theory.

Definition 6.7.3. Suppose (E, I), E ⊂ Bb (Ω), is a positive order continuous elementary integral. Define E ⇑ as the collection of functions h ∈ $\overline{\mathbb{R}}^{\Omega}$ such that h = supα φα for some {φα : α ∈ A} ⊂ E. For functions h ∈ E ⇑ the Daniell–Stone upper integral is defined as
(6.18) $I^{\bullet}(h) = \sup\{I(φ) : φ ∈ E,\ φ ≤ h\}$,
and for any f ∈ $\overline{\mathbb{R}}^{\Omega}$,
(6.19) $I^{\bullet}(f) = \inf\{I^{\bullet}(h) : h ∈ E^{\Uparrow},\ f ≤ h\}$.
The Daniell–Stone mean is defined as
$\|f\|^{\bullet} = I^{\bullet}(|f|)$
for all f ∈ $\overline{\mathbb{R}}^{\Omega}$.
Remark 6.7.4. A function h belongs to E ⇑ iff h = sup{φ ∈ E : φ ≤ h}. Thus, when E is a vector lattice, h ∈ E ⇑ iff it is the limit of an increasingly directed net Φ ⊂ E. A comparison between E ↑ and E ⇑ is discussed in Exercise 6.8.10.

Theorem 6.7.5. Assume E ⊂ Bb (Ω) is a Stone lattice, I is a positive order continuous elementary integral and let I ∗ and I • be the associated Daniell and Daniell–Stone upper integrals respectively. Then
(i) E ⇑ is closed under addition, multiplication by nonnegative scalars, and under taking finite infima and arbitrary suprema.
(ii) If φ ∈ E, h ∈ E ↑ and f ∈ $\overline{\mathbb{R}}^{\Omega}$, then I • (φ) = I(φ) = I ∗ (φ), I ∗ (h) = I • (h), and I • (f ) ≤ I ∗ (f ).
(iii) The Daniell–Stone upper integral I • is additive and positive homogeneous in E ⇑ . Moreover, I • (supα hα ) = supα I • (hα ) for any increasingly directed collection {hα : α ∈ D} ⊂ E ⇑ .
(iv) I • is increasing, positive–homogeneous, and countably subadditive.
(v) The Daniell–Stone mean ‖·‖• is an order continuous mean, |I(φ)| ≤ ‖φ‖• for all φ ∈ E, and ‖·‖• ≤ ‖·‖∗ .

Proof. (i) If {hα : α ∈ A} ⊂ E ⇑ , then for each α ∈ A there is a collection {φα,β : β ∈ Bα } ⊂

E such that hα = sup{φα,β : β ∈ Bα }. Let α1 , α2 be fixed elements of the index set A, and
let r ≥ 0 be fixed. As E is a linear vector space, positive–homogeneity and additivity on E ⇑
follow from
rhα1 = r sup{φα1 ,β : β ∈ Bα1 } = sup{rφα1 ,β : β ∈ Bα1 },
and

hα1 + hα2 = sup φα1 ,β + φα2 ,β ′ : (β, β ′ ) ∈ Bα1 × Bα2 .
As E is a vector lattice, closure with respect taking finite infima follows from

hα1 ∧ hα2 = sup φα1 ,β ∧ φα2 ,β ′ : (β, β ′ ) ∈ Bα1 × Bα2 .
Closure with respect to taking arbitrary suprema follows from
n [ o
sup sup{φα,β : β ∈ Bα } = sup φα,β : (α, β) ∈ {α} × Bα .
α
α∈A

(ii) is a direct consequence of E ⊂ E ↑ ⊂ E ⇑ .

(iii) Suppose H is an increasingly directed net in E ⇑ . Clearly g = sup H ∈ E ⇑ and

suph∈H I•(h) ≤ I•(g). To obtain the reverse inequality, let r < I•(g) and choose φ ∈ E with φ ≤ g so that r < I(φ). For each h ∈ H we have that h = sup Φh where Φh = {φ ∈ E : φ ≤ h}. Let J be the collection of all finite subsets of ⋃h∈H Φh. The collection of all functions of the form ψJ = ⋁{φ : φ ∈ J}, J ∈ J, is an increasingly directed net in E with g = supJ∈J ψJ. As H is increasingly directed, for any J ∈ J there exists hJ ∈ H such that ψJ ≤ hJ. It follows that {φ ∧ ψJ : J ∈ J} ⊂ E is an increasingly directed net which converges to φ. Since I is positive and order continuous,
\[
r < I(\phi) = \sup_{J\in\mathcal{J}} I(\phi\wedge\psi_J) \le \sup_{J\in\mathcal{J}} I^{\bullet}(h_J) \le \sup_{h\in H} I^{\bullet}(h).
\]

Therefore I • (sup H) = suph∈H I • (h).

Positive homogeneity is obvious. Additivity follows from (ii) and the continuity of I• along increasingly directed sets. If h1, h2 are elements in E⇑, then there are increasingly directed nets {φβ : β ∈ B1} and {ψβ′ : β′ ∈ B2} in E such that φβ ↗ h1 and ψβ′ ↗ h2. As B1 × B2 is a directed set with respect to the Cartesian order, φβ + ψβ′ ↗ h1 + h2. Therefore
\[
\begin{aligned}
I^{\bullet}(h_1+h_2) &= I^{\bullet}\Big(\sup_{\beta,\beta'}(\phi_\beta+\psi_{\beta'})\Big)
 = \sup_{\beta,\beta'} I^{\bullet}(\phi_\beta+\psi_{\beta'})
 = \sup_{\beta} I(\phi_\beta) + \sup_{\beta'} I(\psi_{\beta'}) \\
&= I^{\bullet}\Big(\sup_{\beta}\phi_\beta\Big) + I^{\bullet}\Big(\sup_{\beta'}\psi_{\beta'}\Big)
 = I^{\bullet}(h_1) + I^{\bullet}(h_2).
\end{aligned}
\]

(iv) The increasing and positive homogeneity properties of I• are obvious. The subadditivity of I• follows in the same way as that of I∗. Let {fn : n ∈ N} be a sequence of nonnegative extended real–valued functions. It is enough to consider the case Σn I•(fn) < ∞. For any ε > 0 and each n, there exists hn ∈ E⇑ such that hn ≥ fn and I•(hn) < I•(fn) + 2^{−n}ε. By parts (i) and (iii),
\[
\begin{aligned}
I^{\bullet}\Big(\sum_n f_n\Big) &\le I^{\bullet}\Big(\sum_n h_n\Big) = \lim_n I^{\bullet}\Big(\sum_{k=1}^{n} h_k\Big) \\
&= \lim_n \sum_{k=1}^{n} I^{\bullet}(h_k) = \sum_n I^{\bullet}(h_n) \le \sum_n I^{\bullet}(f_n) + \varepsilon .
\end{aligned}
\]
Subadditivity follows by letting ε ↘ 0.

(v) Parts (ii), (iii) and (iv) imply that k k• is an order continuous mean dominating the
elementary integral. The last statement follows from E ↑ ⊂ E ⇑ .

If E ⊂ Bb(Ω) is a Stone lattice, I is a positive order continuous elementary integral, and ‖ ‖∗ and ‖ ‖• are the Daniell mean and the Daniell–Stone mean respectively, then from Theorem 6.7.5(v) we have L1(‖ ‖∗) ⊂ L1(‖ ‖•) and ‖f‖∗ = ‖f‖• for all f ∈ L1(‖ ‖∗). The Daniell–Stone mean provides a way, not restricted to countable operations, of obtaining integrable functions which could be difficult to identify through countable procedures. This will be more evident when we discuss integration on locally compact Hausdorff spaces in Section 7.7.
Theorem 6.7.6. Assume E is a ring lattice closed under chopping and let (E, I) be a
positive order continuous elementary integral whose Daniell–Stone mean is k k• . If h ∈ E ⇑
and khk• < ∞, then h ∈ L1 (k k• ) and 1{h>r} ∈ E ⇑ ∩ L1 (k k• ) for any r > 0.
Proof. Choose φ ∈ E with φ ≤ h. Since h = (h − φ) + φ and h − φ ∈ E⇑+, it is enough to assume that h ≥ 0. There exists a sequence {φn} in E+ such that φn ≤ h and ‖h‖• < ‖φn‖• + 1/n. By Theorem 6.7.5(ii),(iii), I•(h − φn) = I•(h) − I•(φn) ≤ 1/n. Therefore h ∈ L1(‖ ‖•).
If h = sup Φ for some collection Φ ⊂ E and r > 0, then {h > r} = ⋃φ∈Φ {φ > r}. Since {φ > r} ∈ E↑ ⊂ E⇑, {h > r} ∈ E⇑. The first part of the proof implies that {h > r} ∈ L1(‖ ‖•), for ‖{h > r}‖• ≤ ‖{|h| > r}‖• ≤ ‖h‖•/r < ∞.
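The final estimate in the proof is a Chebyshev-type bound. The following is a short sketch of why it holds, assuming (as for the Daniell mean) that the Daniell–Stone mean of f is I• applied to |f|, and using the text's convention of writing a set for its indicator function; only monotonicity and positive homogeneity from Theorem 6.7.5(iv) are needed.

```latex
% Pointwise, for every r > 0:
%   1_{\{h > r\}} \le 1_{\{|h| > r\}} \le |h| / r .
% Applying the increasing, positive-homogeneous upper integral I^\bullet:
\[
\|\{h>r\}\|^{\bullet} \;\le\; \|\{|h|>r\}\|^{\bullet}
  \;\le\; \Big\|\frac{|h|}{r}\Big\|^{\bullet}
  \;=\; \frac{1}{r}\,\|h\|^{\bullet} \;<\; \infty .
\]
```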

Example 6.7.7. If X is l.c.H. and I is a positive linear functional on E = C00(X), then the functions in C00⇑(X) are lower semicontinuous. Moreover, by Theorem B.1.5, every nonnegative lower semicontinuous function f belongs to E⇑. For any lower semicontinuous function f ≥ 0,

I • (f ) = sup{I(φ) : φ ∈ C00 (X), 0 ≤ φ ≤ f }

by definition. If I•(f) < ∞, then Theorem 6.7.6 implies f ∈ L1(‖ ‖•) and, for any r > 0, the open sets {f > r} are also integrable.

The following result is an order continuous version of monotone convergence.

Theorem 6.7.8. Let k k be an order continuous mean for a Stone lattice or a ring E ⊂
Bb (Ω). Let Φ ⊂ E with supφ∈Φ kφk < ∞. If Φ is increasingly directed or decreasingly
directed then, sup Φ ∈ L1 or inf Φ ∈ L1 , respectively, and Φ → lim Φ in L1 (k k). In
particular, if E is a vector lattice, then E ⇑ ∩ F(k k) ⊂ L1 (k k).

Proof. If Φ is decreasingly directed and φ0 ∈ Φ then, Ψ := {φ0 − φ : φ ≤ φ0 , φ ∈ Φ} is

increasingly directed and sup Ψ = φ0 − inf Φ. Thus, it is enough to consider the case where Φ is increasingly directed. For φ0 ∈ Φ fixed, {φ − φ0 : φ ∈ Φ, φ ≥ φ0} ⊂ E+ is increasingly directed and sup Φ = φ0 + sup{φ − φ0 : φ ∈ Φ, φ ≥ φ0}. By order continuity of the mean ‖ ‖ we have that
\[
\Big\|\sup_{\phi\in\Phi,\ \phi\ge\phi_0}(\phi-\phi_0)\Big\|
 = \sup_{\phi\in\Phi,\ \phi\ge\phi_0}\|\phi-\phi_0\|
 \le 2\sup_{\phi\in\Phi}\|\phi\| < \infty .
\]

We claim that Φ is a Cauchy net in L1(‖ ‖). If that were not the case, then for some ε > 0 there would exist an increasing sequence (φn) ⊂ Φ such that ‖φn+1 − φn‖ > ε. However, as {φn+1 − φn : n ∈ N} ⊂ E+ and
\[
\sup_{N}\Big\|\sum_{n=1}^{N}(\phi_{n+1}-\phi_n)\Big\| \le 2\sup_{\phi\in\Phi}\|\phi\| < \infty ,
\]
limn ‖φn+1 − φn‖ = 0 by virtue of ‖ ‖ being a mean. This is a contradiction.

Since Φ is a Cauchy net, given ε > 0 there exists φ0 ∈ Φ such that ‖φ − φ′‖ < ε whenever φ, φ′ ∈ Φ and φ, φ′ ≥ φ0. As ‖ ‖ is order continuous, for all φ ≥ φ0
\[
\|\sup\Phi-\phi\| = \Big\|\sup_{\phi'\in\Phi,\ \phi'\ge\phi}(\phi'-\phi)\Big\|
 = \sup_{\phi'\in\Phi,\ \phi'\ge\phi}\|\phi'-\phi\| \le \varepsilon .
\]

Therefore Φ → sup Φ in L1 (k k).

Remark 6.7.9. When E is a ring lattice closed under chopping, the conclusion of Theorem 6.7.8 still holds when Φ is an increasingly directed set in E⇑ with supφ∈Φ ‖φ‖ < ∞ (Exercise 6.8.11).

6.8. Exercises
Exercise 6.8.1. This exercise studies further properties of the Jordan–Riemann seminorm.
(a) Show that I# (φ) = I(φ) = I # (φ) for all φ ∈ E(R).
(b) Show that I# (f ), and I # (f ) coincide with the Riemann–Darboux lower and upper
integrals (4.21), (4.22) introduced in Section 4.5.
(c) Let F # denote the set of functions on R for which −∞ < I# (f ) ≤ I # (f ) < ∞.
Show that F # ⊂ Bb (R).
(d) If f ∈ F # , show that |f | ∈ F #

Exercise 6.8.2. Let Ω be any nonempty set, E ⊂ Bb(Ω) a ring lattice closed under chopping, and I : E → R a positive elementary integral on Ω.
(i) Develop the Riemann integral of (E, I).
(ii) As an example, treat the Riemann integral on R2 and on Rn .
Suppose, in addition, that E is self–confined, that is, for each φ ∈ E, there is ψ ∈ E such that 1{φ≠0} ≤ ψ.
(iii) Show that the uniform dominated convergence theorem holds.
Exercise 6.8.3. Show that δ–continuity is equivalent to σ–continuity and σ–additivity.

Exercise 6.8.4. Let Ω be a nonempty set and B a nonempty collection of subsets of Ω. Show that the collection E(B) of simple functions over B forms a vector space iff B is a ring of sets. In such case, show that E(B) is a lattice ring. Show that if 1 ∈ E(B), then E(B) is an algebra lattice of functions and that B is an algebra of sets.

Exercise 6.8.5. In each of the examples above show that E is a ring lattice closed under
chopping and that (E, I) is a positive σ–continuous elementary integral.

Exercise 6.8.6. Suppose E1 ⊂ Bb(Ω1) and E2 ⊂ Bb(Ω2) are ring lattices closed under chopping. Define E ⊂ Bb(Ω1 × Ω2) as the collection of all functions of the form
\[
\varphi(x,y) = \sum_{j=1}^{N}\phi_j(x)\,\psi_j(y)
\]
where N ∈ N, φj ∈ E1 and ψj ∈ E2. Show that E is a ring of bounded functions on Ω1 × Ω2 but not necessarily a Stone lattice. If I1 and I2 are elementary integrals on E1 and E2 respectively, show that the map
\[
I(\varphi) = \sum_{j=1}^{N} I_1(\phi_j)\,I_2(\psi_j)
\]
on E is an elementary integral. If I1 and I2 are positive, so is I; and if I1 and I2 are σ–continuous, so is I.

Exercise 6.8.7. Let E ⊂ Bb(Ω) be a self–confined ring lattice closed under chopping. Suppose I is a positive σ–continuous elementary integral on E. Let I ∗ be the upper Daniell integral of I and ‖ ‖∗ the corresponding Daniell mean. Similarly, let I # be the Jordan upper integral of I and ‖ ‖# the corresponding Jordan seminorm. Both I ∗ and I # coincide on E. Show that
\[
I^{*}(f) \le I^{\#}(f), \qquad \|f\|^{*} \le \|f\|^{\#}
\]
for all f ∈ R̄Ω.
Exercise 6.8.8. If f ∈ L1⁺ is bounded above by a > 0, show that there exists a sequence (φn) ⊂ E with 0 ≤ φn ≤ a such that limn ‖f − φn‖ = 0. (Hint: without loss of generality assume a = 1 and use Example 5.3.14 together with Lemma 6.3.13.)
Exercise 6.8.9. Suppose ‖ ‖∗ is the Daniell mean of some positive σ–continuous elementary integral I on a vector lattice E ⊂ Bb(Ω).
(a) For any f ∈ L1(‖ ‖∗) and g ∈ R̄Ω, show that I(f) + I∗(g) = I∗(f + g).
(b) (Generalized Lebesgue dominated convergence) Suppose (fn), (gn) are sequences in L1(‖ ‖∗) such that |fn| ≤ gn. Assume that gn converges ‖ ‖∗–a.s. to some g ∈ L1(‖ ‖∗) and that limn I(gn) = I(g). If fn converges ‖ ‖∗–a.s. to some function f, show that f ∈ L1(‖ ‖∗) and ‖f − fn‖∗ → 0.
Exercise 6.8.10. Show that if E contains a countable dense subset in Ēᵘ, then E↑ = E⇑.
Exercise 6.8.11. Suppose E ⊂ Bb(Ω) is a ring lattice closed under chopping and that ‖ ‖ is an order continuous mean for E. If H ⊂ E⇑ is an increasingly directed net and suph∈H ‖h‖ < ∞, show that sup H ∈ L1 and H → sup H in mean. (Hint: For each h ∈ H let Φh = {φ ∈ E : φ ≤ h} and define Φ = ⋃h∈H Φh. Then Φ → sup H in L1.)
Chapter 7

Daniell Measurability

7.1. Littlewood’s Principles and Measurability

In the extension of the Riemann integral on the real line (the Lebesgue integral), Littlewood made the following insightful observations: integrable functions are nearly elementary functions (continuous); integrable sets are nearly elementary sets (union of bounded open intervals); measurable sets are locally nearly elementary. We will use Littlewood’s observations as the basis for the notion of measurability.
Throughout this section we assume that E ⊂ Bb (Ω) is a Stone lattice or a ring, and that
k k is a mean for E.

Theorem 7.1.1. Let f ∈ L1 and ε > 0. There exists a set U ∈ E↑ with ‖U‖ < ε and a function h ∈ Ēᵘ (uniform closure of E) such that f = h on Uᶜ.

Proof. As in the proof of Theorem 6.3.12, let {φn} ⊂ E be such that ‖f − φn‖ → 0. Passing through a subsequence if necessary, we may assume that ‖φn − φn−1‖ ≤ 2^{−n−1} for all n ≥ 1. Let ψ0 = φ0 and ψn = φn − φn−1 so that f = Σn≥0 ψn in mean and almost surely. Define f′ = Σn≥0 ψn where the series is defined and zero otherwise. The functions
\[
g_n = \sum_{k=1}^{n} k|\psi_k|, \qquad g = \sum_{k=1}^{\infty} k|\psi_k|
\]
belong to E↑ ∩ L1(‖ ‖), ‖gn − g‖ → 0 by monotone convergence, and 1{g>M} ∈ E↑ ∩ L1 for any M > 0 by Lemmas 6.4.7 and 6.4.9. Let M be large enough so that ‖{g > M}‖ ≤ (1/M)‖g‖ < ε and set U = {g > M} ∪ {f ≠ f′}. On Uᶜ the series Σn ψn converges absolutely and
\[
\Big| f' - \sum_{k=0}^{n}\psi_k \Big| \le \sum_{k>n}|\psi_k| \le \frac{1}{n}\sum_{k>n}k|\psi_k| \le \frac{g}{n} \le \frac{M}{n}.
\]
Hence Σ_{k=0}^{n} ψk converges to f uniformly on Uᶜ. By Weierstrass extension (Corollary 5.3.19), there is h ∈ Ēᵘ such that f = h on Uᶜ.
Theorem 7.1.2. Let {fn } ⊂ L1 and assume that fn converges to f almost surely on a set
A ∈ L1 . Then, for any ε > 0 there is an integrable set A0 ⊂ A on which fn converges
uniformly to f , and such that kA \ A0 k < ε.
Proof. For each n, k ≥ 1 the set S(n, k) = A ∩ ⋂i,j≥n {|fi − fj| ≤ 1/k} is integrable. For k fixed, S(n, k) ↗ A almost surely as n ↗ ∞. Hence by Daniell’s monotone convergence theorem, there is a sequence of integers nk < nk+1 such that ‖A \ S(nk, k)‖ < ε2^{−k}. Again, by Daniell’s monotone convergence, A′0 = ⋂k S(nk, k) ∈ L1 and ‖A \ A′0‖ < ε. By hypothesis, the complement of the set U where (fn) converges to f is ‖ ‖–negligible. It follows that A0 = A′0 ∩ U is an integrable set with ‖A \ A0‖ < ε on which fn converges to f uniformly.
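The two estimates used in this proof can be written out explicitly. This is a sketch that relies only on the countable subadditivity of the mean and on the choice of the integers n_k made above; as in the text, the mean of a set stands for the mean of its indicator.

```latex
% Size of the exceptional set: A \setminus A_0' = \bigcup_k (A \setminus S(n_k,k)), so
\[
\|A\setminus A_0'\| \;\le\; \sum_{k\ge 1}\|A\setminus S(n_k,k)\|
 \;<\; \sum_{k\ge 1}\varepsilon\,2^{-k} \;=\; \varepsilon .
\]
% Uniform Cauchy property: every \omega \in A_0' lies in S(n_k,k) for each k, hence
\[
\sup_{\omega\in A_0'}|f_i(\omega)-f_j(\omega)| \;\le\; \frac{1}{k}
 \qquad\text{for all } i,j\ge n_k .
\]
% On A_0 = A_0' \cap U we may let j \to \infty, since f_j \to f there:
\[
\sup_{\omega\in A_0}|f_i(\omega)-f(\omega)| \;\le\; \frac{1}{k}
 \qquad\text{for all } i\ge n_k .
\]
```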

We now use Littlewood’s principles to define measurability.

Definition 7.1.3. A function f ∈ RΩ is ‖ ‖–measurable on A ∈ L1 if for any ε > 0, there are L1 ∋ A0 ⊂ A and g ∈ Ēᵘ such that ‖A \ A0‖ < ε and f = g on A0. The function f is ‖ ‖–measurable if it is measurable on every integrable set. The collection of all (real) ‖ ‖–measurable functions will be denoted by MR(‖ ‖). A set B ⊂ Ω is measurable whenever 1B ∈ MR(‖ ‖).

Definition 7.1.3 and Littlewood’s principles imply that L1(‖ ‖) ⊂ MR(‖ ‖). Measurability of constant functions follows from the measurability of 1A for any A ∈ L1. As we will see in Theorem 7.1.6, MR(‖ ‖) contains σ(E).
Lemma 7.1.4. Suppose (fn : n ∈ N) ⊂ MR(‖ ‖). Then, for any A ∈ L1 and ε > 0, there exist an integrable set B ⊂ A and a sequence (gn : n ∈ N) ⊂ Ēᵘ such that ‖A \ B‖ < ε and each fn = gn on B.
Proof. Set A−1 = A. Let L1 ∋ A0 ⊂ A−1 and g0 ∈ Ēᵘ be such that ‖A−1 \ A0‖ < ε/2 and f0 = g0 on A0. Suppose that sets Ak ⊂ Ak−1 in L1, k = 0, . . . , n, and functions g0, . . . , gn ∈ Ēᵘ have been chosen so that ‖Ak−1 \ Ak‖ < ε2^{−k−1} and fk = gk on Ak. Choose L1 ∋ An+1 ⊂ An and gn+1 ∈ Ēᵘ so that ‖An \ An+1‖ < 2^{−n−2}ε and fn+1 = gn+1 on An+1. Monotone convergence implies that B := ⋂n An ∈ L1. Moreover, the subadditivity of the mean shows that
\[
\|A\setminus B\| = \Big\|\bigcup_{n\ge 0}(A_{n-1}\setminus A_n)\Big\|
 \le \sum_{n\ge 0}\big\|1_{A_{n-1}}-1_{A_n}\big\|
 < \sum_{n\ge 0}\varepsilon\,2^{-n-1} = \varepsilon .
\]
Clearly, fn = gn on B for each n.
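The set identity behind the first equality in the display can be checked directly. A brief sketch, using only the nesting A_{-1} = A ⊃ A_0 ⊃ A_1 ⊃ ⋯ and B = ⋂_n A_n from the construction above (the index n(ω) is introduced here for illustration):

```latex
% For \omega \in A \setminus B put n(\omega) = \min\{ n \ge 0 : \omega \notin A_n \}.
% The minimum exists because \omega \notin B, and \omega \in A_{n(\omega)-1}
% by minimality (recall A_{-1} = A). Hence the sets A_{n-1} \setminus A_n are
% pairwise disjoint and
\[
A\setminus B \;=\; \bigcup_{n\ge 0}\big(A_{n-1}\setminus A_n\big),
\qquad
1_{A\setminus B} \;=\; \sum_{n\ge 0}\big(1_{A_{n-1}}-1_{A_n}\big).
\]
```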

Theorem 7.1.5. (Egorov’s theorem.) If (fn : n ∈ N) ⊂ MR(k k) converges almost surely
to f , then f is measurable. Moreover, for any A ∈ L1 and ε > 0, there is an integrable set
B ⊂ A with kA \ Bk < ε on which fn converges to f uniformly.
Proof. By Lemma 7.1.4, there are L1 ∋ B0 ⊂ A and {gn} ⊂ Ēᵘ such that ‖A \ B0‖ < ε/2 and fn = gn on B0. Each fn1B0 ∈ L1 since it is the product of an integrable function and a function in Ēᵘ. Also, fn1B0 converges to f1B0 almost surely. By Theorem 7.1.2 there is L1 ∋ B ⊂ B0 such that ‖B0 \ B‖ < ε/2 and ‖fn − f‖B,u = ‖gn − f‖B,u → 0. We conclude that f is the restriction to B of a function g ∈ Ēᵘ. Therefore, f is measurable.

Recall from Lemma 5.6.5 that if E is either a Stone lattice or a ring, its sequential closure ERΣ is a ring lattice closed under chopping, and the σ–ring generated by the collection of sets φ⁻¹(I), where φ ∈ E and I is an interval in R \ {0}, coincides with the collection of sets in ERΣ. The following theorem makes the connection between Daniell measurable functions and the collection of measurable functions generated by E in terms of algebraic and order permanence properties.

Theorem 7.1.6. (i) The collection MR(k k) of real–valued k k–measurable functions is an

algebra lattice which contains ERΣ . (ii) If f ∈ MR(k k) and ϕ : R → R is Borel measurable,
then ϕ ◦ f ∈ MR(k k). (iii) the collection M (k k) of measurable subsets of Ω is a σ–algebra.

Proof. (i) Let f and f′ be ‖ ‖–measurable functions, and r ∈ R. Given A ∈ L1 and ε > 0, there are L1 ∋ A0 ⊂ A and functions ϕ, ϕ′ ∈ Ēᵘ such that ‖A \ A0‖ < ε and |f − ϕ| = 0 = |f′ − ϕ′| on A0. Then |f| = |ϕ|, rf + f′ = rϕ + ϕ′, ff′ = ϕϕ′ and |f| ∧ 1 = |ϕ| ∧ 1 on A0. As Ēᵘ is a ring lattice closed under chopping, so is MR(‖ ‖).

We now show that 1Ω ∈ MR(‖ ‖). Let A ∈ L1 and ε > 0. By Theorem 7.1.1 there are a set U ∈ L1 and g ∈ Ēᵘ such that ‖U‖ < ε and 1A = g on Uᶜ. If A0 = A \ U, then ‖A \ A0‖ < ε and g = 1Ω on A0.

Finally, MR(k k) is sequentially closed by Egorov’s theorem. Since E ⊂ MR(k k), we conclude
that ERΣ ⊂ MR(k k).

(ii) Let f ∈ MR(‖ ‖). As MR(‖ ‖) is a ring, p ◦ f ∈ MR(‖ ‖) for any polynomial p. The collection G of all real–valued functions g on R for which g ◦ f ∈ MR(‖ ‖) is sequentially closed and contains all polynomials. Therefore G contains the sequential closure of all polynomials, i.e., the collection of all real–valued Borel functions.

(iii) Since 1Ω\A = 1 − 1A, we conclude that M(‖ ‖) is closed under complementation. Since 1⋃nAn = limn ⋁{1Ak : 1 ≤ k ≤ n}, we conclude from the first part and Egorov’s theorem that M(‖ ‖) is closed under countable unions.

Corollary 7.1.7. Suppose {fn} ⊂ MR(‖ ‖). Let F1 = infn fn, F2 = supn fn, F3 = lim infn fn, F4 = lim supn fn. For each k = 1, . . . , 4, Fk is measurable if it is finite almost surely.

Proof. By Theorem 7.1.6(i), gn = ⋀_{j=1}^{n} fj and hn = ⋁_{j=1}^{n} fj are sequences in MR(‖ ‖). By Egorov’s theorem F1 = limn gn, F2 = limn hn, F3 = supn infm≥n fm and F4 = infn supm≥n fm belong to MR(‖ ‖).

Corollary 7.1.8. Suppose D ⊂ R is dense. For any f ∈ RΩ , f ∈ MR(k k) iff {f > d} ∈

M (k k) for all d ∈ D.

Proof. If f ∈ MR(‖ ‖), then fn = 1 ∧ n(f − f ∧ 1) ∈ MR(‖ ‖). Hence, limn fn = 1{f>1} ∈ MR(‖ ‖). For any r > 0 let 0 < dn < r be such that dn ↗ r. Then {f > r} = {f/r > 1} ∈ M(‖ ‖) and {f ≥ r} = ⋂n {f > dn} ∈ M(‖ ‖). For 0 < dn ↘ 0, {f > 0} = ⋃n {f > dn} ∈ M(‖ ‖). By using −f instead of f we obtain that {f < −r}, {f ≤ −r}, {f < 0} ∈ M(‖ ‖). Since −f ∈ MR(‖ ‖), we have 1{f>−r} = 1 − 1{−f≥r} ∈ MR(‖ ‖). Similarly, 1{f≥−r} = 1 − 1{−f>r}, 1{f≤0} = 1 − 1{f>0} and 1{f≥0} = 1 − 1{f<0} belong to MR(‖ ‖).

Conversely, suppose that f is real valued and {f > d} ∈ M(‖ ‖) for all d ∈ D. For any r ∈ R let D ∋ dn ↘ r. Then {f > r} = ⋃n {f > dn} ∈ M(‖ ‖). Similar arguments show that {f ≥ r}, {f < r}, and {f ≤ r} are in M(‖ ‖) for all r ∈ R. Hence
\[
f_n = 2^{-n}\,[2^{n}f]\,1_{\{|f|\le n\}}
 = \sum_{k=-n2^{n}}^{n2^{n}} \frac{k}{2^{n}}\,1_{\{k\le 2^{n}f<k+1\}} \in \mathcal{M}_{\mathbb{R}}(\|\ \|).
\]
As fn → f pointwise everywhere, we conclude that f ∈ MR(‖ ‖).
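The pointwise convergence of the dyadic approximations can be verified directly. The following sketch fixes a point ω at which f is finite and reads [x] as the integer part (floor) of x, which is an assumption about the notation used in the display above.

```latex
% Fix \omega and n \ge |f(\omega)|, and put k = \lfloor 2^{n} f(\omega) \rfloor.
% Then -n2^{n} \le k \le n2^{n}, so k is one of the indices in the sum, and it is
% the only index with k \le 2^{n} f(\omega) < k+1. Consequently
\[
f_n(\omega) = \frac{k}{2^{n}},
\qquad
0 \;\le\; f(\omega)-f_n(\omega) \;=\; \frac{2^{n}f(\omega)-k}{2^{n}} \;<\; 2^{-n}.
\]
% Thus |f(\omega) - f_n(\omega)| < 2^{-n} for every n \ge |f(\omega)|,
% and f_n(\omega) \to f(\omega) as n \to \infty.
```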

The following result is a consequence of Egorov’s theorem and states that measurable functions are uniform limits of integrable functions on large integrable sets.
Theorem 7.1.9. Suppose D is a dense set in L1 . A real–valued function f is measurable
iff for every set A ∈ L1 and ε > 0, there is L1 ∋ A0 ⊂ A with kA \ A0 k < ε on which f is
the uniform limit of a sequence in D.
Proof. Suppose f ∈ MR(‖ ‖). For A ∈ L1 and ε > 0 there are L1 ∋ A′0 ⊂ A and g ∈ Ēᵘ such that ‖A \ A′0‖ ≤ ε/2 and f = g on A′0. Since g1A′0 ∈ L1, there is a sequence {dn} ⊂ D that converges in mean and almost surely to g1A′0. By Egorov’s theorem there is L1 ∋ A0 ⊂ A′0 with ‖A′0 \ A0‖ < ε/2 on which dn converges uniformly to f.

Conversely, let A ∈ L1, ε > 0 and suppose there is an integrable set A′0 ⊂ A with ‖A \ A′0‖ < ε/2 on which f is the uniform limit of a sequence (dn : n ∈ N) ⊂ D. For some integer N, n ≥ N implies ‖dn − f‖u,A′0 < ε/2. As |dn1A′0| ≤ ε1A′0 + |dN| for all n ≥ N, f1A′0 ∈ L1 by dominated convergence. Consequently, there is an integrable set A0 ⊂ A′0 and g ∈ Ēᵘ such that ‖A′0 \ A0‖ < ε/2 and f1A0 = g1A0.

7.2. Localization
The following result shows that a function that is measurable on each set of a countable collection G of integrable sets is also measurable on any integrable piece of ⋃G.
Theorem 7.2.1. (Localization) Suppose (An : n ∈ N) is a sequence of integrable sets. If f
is a measurable function in each An , then
(i) f is measurable on each integrable subset of A1 .
(ii) f is measurable on A1 ∪ A2 .
(iii) f is measurable on any integrable subset of A = ⋃n An.

Proof. (i) is obvious and it is left as an exercise.

(ii) As A1 ∪ A2 = (A1 \ A2) ∪ A2, we may assume without loss of generality that A1 ∩ A2 = ∅. Given ε > 0, there exist functions φ1 and φ2 in Ēᵘ and an integrable set B ⊂ A1 ∪ A2 with ‖(A1 ∪ A2) \ B‖ < ε/4 such that 1Aj1B = φj1B, j = 1, 2. As f is measurable on B ∩ A1 and on B ∩ A2, there exist functions g1 and g2 in Ēᵘ and integrable sets Bj ⊂ B ∩ Aj with ‖(B ∩ Aj) \ Bj‖ < ε/4, j = 1, 2, such that f1Bj = gj1Bj. Then B1 ∪ B2 is an integrable subset of A1 ∪ A2 with
‖(A1 ∪ A2) \ (B1 ∪ B2)‖ ≤ ‖(A1 ∪ A2) \ B‖ + ‖B \ (B1 ∪ B2)‖ < ε
on which
f = φ1g1 + φ2g2 ∈ Ēᵘ.
This shows that f is measurable on A1 ∪ A2.
(iii) Let N ∈ N be large enough so that ‖A \ ⋃_{j=1}^{N} Aj‖ < ε/2. By part (ii) f is measurable on BN = ⋃_{j=1}^{N} Aj, and so there exist a function g ∈ Ēᵘ and an integrable set B ⊂ BN with ‖BN \ B‖ < ε/2 such that f1B = g1B.

To determine whether a function is measurable or not it is enough to study its local

properties in a smaller class of integrable functions as the following result shows.
Corollary 7.2.2. A function f ∈ RΩ is measurable iff it is measurable on every set of the
form {φ > r}, where φ ∈ E and r > 0.

Proof. Only sufficiency requires a proof. Assume A ∈ L1 and let (φn) ⊂ E+ be a sequence converging to 1A in mean and ‖ ‖–a.s. Then A ⊂ ⋃n {φn > 1/2} ‖ ‖–a.s. The conclusion follows from Theorem 7.2.1(iii).

7.3. Integrability criteria

In this section we will derive conditions of integrability in terms of measurability for the
space of functions of finite mean F.
Lemma 7.3.1. For any g ∈ Bb (Ω), g ∈ MR(k k) iff g1B ∈ L1 for all 1B ∈ L1 .

Proof. Suppose g is bounded and g ∈ MR(‖ ‖). Then, for any B ∈ L1 and each k ∈ N, there are an integrable set Bk ⊂ B and a function φk ∈ Ēᵘ such that ‖B \ Bk‖ < 2^{−k} and g1Bk = φk1Bk. Clearly the sequence gk = φk1Bk ∈ L1 converges to g pointwise on B′ = ⋃k ⋂m≥k Bm. Since ‖B \ B′‖ ≤ ‖⋃m≥k (B \ Bm)‖ ≤ 2^{−k+1} for all k, we conclude that gk converges almost surely to g1B. As |gk| ≤ ‖g‖u1B, g1B ∈ L1 by dominated convergence.

Conversely, suppose g1B ∈ L1 for all 1B ∈ L1. Fix 1A ∈ L1. Then, for any ε > 0 there is an integrable set A0 ⊂ A and a function φ ∈ Ēᵘ such that ‖A \ A0‖ < ε and g1A1A0 = g1A0 = φ1A0. This shows that g is measurable on every integrable set.
Corollary 7.3.2. If g ∈ MR(k k) is bounded, then gf ∈ L1 for all f ∈ L1 .

Proof. If f ∈ L1, then hn = g1{|f|>1/n} f ∈ L1 for each n. As |hn| ≤ ‖g‖u|f| and hn → gf, we obtain that gf ∈ L1 by dominated convergence.

Theorem 7.3.3. (i) If f ∈ L1 (k k) then there exists f ′ ∈ ERΣ such that kf − f ′ k = 0.

(ii) A function f ∈ RΩ is integrable if and only if f ∈ F ∩ MR(‖ ‖) and {f ≠ 0} is σ–finite.

Hence, ERΣ ∩ F(k k) ⊂ L1 .

(iii) Suppose E is a Stone lattice, I is positive σ–continuous elementary integral on E and

let k k∗ be its Daniell’s mean. Then, L1 (k k∗ ) ∩ RΩ = F(k k∗ ) ∩ MR(k k∗ ).

Proof. (i) If f ∈ L1 then f ∈ F, ‖{|f| = ∞}‖ = 0, and there exists a sequence (φn) ⊂ E converging to f in mean and pointwise almost surely. The set C of all points where (φn) converges is given by
\[
C = \bigcap_{k}\,\bigcup_{N}\,\bigcap_{n,m\ge N}\Big\{|\phi_n-\phi_m|\le\frac{1}{k}\Big\}.
\]
By Lemma 5.6.5(ii), each set {|φn − φm| > 1/k} belongs to ERΣ; hence, by Lemma 5.6.5(ii),(iii), Ω \ C ∈ ERΣ and 1Cφn = (φn − 1Cᶜφn) ∈ ERΣ. Consequently h = lim supn 1Cφn ∈ ERΣ ⊂ MR(‖ ‖) and f = h almost surely.

(ii) (Necessity) If f ∈ L1(‖ ‖) then f is measurable. For each n ∈ N, An = {|f| > 1/n} ∈ L1, and since {f ≠ 0} = ⋃n An, the set {f ≠ 0} is σ–finite.

(Sufficiency) Suppose f ∈ F ∩ MR(‖ ‖). We claim that f1A ∈ L1 for any A ∈ L1. Indeed, for each k ∈ N there is an integrable set Ak ⊂ A and a function gk ∈ Ēᵘ such that ‖A \ Ak‖ < 2^{−k} and f = gk on Ak. By Theorem 6.3.15(ii) each fk := f1Ak is integrable. Clearly (fk) converges to f pointwise on A′ = ⋃k ⋂m≥k Am. Since
\[
\|A\setminus A'\| \le \sum_{m\ge k}\|A\setminus A_m\| \le \sum_{m\ge k}2^{-m} = 2^{-k+1}\to 0,
\]
we conclude that fk converges to f1A almost surely. Since |fk| ≤ |f| ∈ F, it follows from Daniell’s dominated convergence that f1A ∈ L1.

If {f ≠ 0} is σ–finite, then there is an increasing sequence {An} of integrable sets such that 1An ↗ 1A ≥ 1{f≠0}. As (f1An) ⊂ L1 is dominated by |f| ∈ F and f1An → f pointwise, we have that f ∈ L1 by dominated convergence.
To prove the last assertion, suppose first that f ∈ ERΣ ∩ F(‖ ‖). By Lemma 5.6.4 there exists a sequence {φn} ⊂ E such that {f ≠ 0} ⊂ ⋃n {φn ≠ 0}. As each {φn ≠ 0} is σ–finite, so is {f ≠ 0}. Therefore f ∈ L1 by the first statement. If f ∈ ERΣ ∩ F, then fm = (−m) ∨ (f ∧ m) ∈ ERΣ ∩ F and so fm ∈ L1. As fm → f everywhere, f ∈ L1 by dominated convergence.

(iii) If f ∈ MR(‖ ‖∗) ∩ F, then An = {|f| > 1/n} ∈ M(‖ ‖∗) ∩ F by Chebyshev’s inequality. By Theorem 6.4.10, for each n there is Bn ∈ E↑↓ ∩ L1 such that An ⊂ Bn and ‖An‖∗ = ‖Bn‖∗. By Lemma 7.3.1 An = An ∩ Bn ∈ L1, and so {f ≠ 0} = ⋃n An is σ–finite.

Corollary 7.3.4. Suppose f ∈ L1(‖ ‖) and |γ ◦ f| ≤ h for some real–valued Borel function γ with γ(0) = 0, and h ∈ F. Then γ ◦ f ∈ L1.

Proof. By assumption and Theorem 7.1.6(ii), γ ◦ f ∈ MR(‖ ‖) ∩ F. It suffices to show that {γ ◦ f ≠ 0} is σ–finite. Since γ(0) = 0, {γ ◦ f ≠ 0} ⊂ {f ≠ 0}, which is σ–finite.

The theory of measurability applies also to order continuous means.

Theorem 7.3.5. Assume E ⊂ Bb (Ω) is a ring lattice closed under chopping and let (E, I)
be a positive order continuous elementary integral whose Daniell–Stone mean is k k• . If
h ∈ E ⇑ is finite, then h is k k• –measurable.

Proof. Suppose h ∈ (E⇑ ∩ RΩ) \ L1, and h = supφ∈Φ φ for some increasingly directed family Φ ⊂ E. Since h = φ0 + sup{φ − φ0 : φ0 ≤ φ ∈ Φ} for all φ0 ∈ Φ, it is enough to consider the case h ≥ 0. As hk := h ∧ k = supφ∈Φ (φ ∧ k) ∈ E⇑ for each k ∈ N and hk ↗ h pointwise, by Egorov’s theorem we can assume without loss of generality that h is bounded. Let A ∈ L1 and choose (φn : n ∈ N) ⊂ E+ so that ‖1A − φn‖• → 0. As E is a ring, hφn ∈ E⇑+ for all n. By Theorem 6.7.6, hφn ∈ L1 for all n since ‖hφn‖• ≤ ‖h‖u‖φn‖• < ∞. From
‖hφn − h1A‖• ≤ ‖h‖u‖φn − 1A‖• → 0
we conclude that h1A ∈ L1. Therefore, h is measurable on any integrable set.

7.4. Absolute continuity

Let ‖ ‖ be a mean for a Stone lattice or a ring E ⊂ Bb(Ω). A function g is said to be locally integrable if φg ∈ L1(‖ ‖) for all φ ∈ E. The collection of all locally integrable functions is denoted by L1^loc(‖ ‖). By Theorem 6.3.15(ii) and Lemma 7.3.1, L1^loc(‖ ‖) contains the spaces L1(‖ ‖) and L∞(‖ ‖).
If g ∈ L1^loc(‖ ‖) then g ∈ MR(‖ ‖). Indeed, as gφ ∈ L1(‖ ‖) ⊂ MR(‖ ‖) for any φ ∈ E, gf ∈ MR(‖ ‖) for all f ∈ ERΣ. In particular, g1{φ>r} is measurable for any φ ∈ E and r > 0. It follows from Corollary 7.2.2 that g ∈ MR(‖ ‖).
Definition 7.4.1. Given two means k k♭ and k k for E, k k♭ is said to be absolutely
continuous with respect to k k, denoted by k k♭ ≪ k k, if k k–negligible sets are also
k k♭ –negligible sets.
Theorem 7.4.2. If k k♭ ≪ k k, then MR(k k) ⊂ MR(k k♭ ).

Proof. Let f ∈ MR(k k). By Corollary 7.2.2 it is enough to show that f is k k♭ –measurable
on any set A of the form {φ > r} where φ ∈ E and r > 0.

We first prove that {G : G ⊂ A, G ∈ L1(‖ ‖)} ⊂ L1(‖ ‖♭). As E ⊂ L1(‖ ‖♭) ∩ L1(‖ ‖), A ∈ L1(‖ ‖♭) ∩ L1(‖ ‖). Given such a set G, let (φn : n ∈ N) ⊂ E be a sequence with 0 ≤ φn ≤ 1 that converges to 1G in L1(‖ ‖) and ‖ ‖–a.s. Since ‖ ‖♭ ≪ ‖ ‖, φn as well as ψn = 1Aφn converge to 1G ‖ ‖♭–a.s. Since (ψn) ⊂ L1(‖ ‖♭) ∩ L1(‖ ‖) and |ψn| ≤ 1A, 1G ∈ L1(‖ ‖♭) by dominated convergence.

We claim that for any ε > 0 there exists δ > 0 such that G ⊂ A, G ∈ L1(‖ ‖) and ‖G‖ < δ imply ‖G‖♭ < ε. Otherwise, there are ε0 > 0 and a sequence of ‖ ‖–integrable sets Gn ⊂ A such that ‖Gn‖ < 2^{−n} but ‖Gn‖♭ ≥ ε0. Setting G = ⋂n ⋃m≥n Gm we obtain that ‖G‖ = 0. Monotone convergence, however, implies that ∞ > ‖A‖♭ ≥ ‖G‖♭ = limn ‖⋃m≥n Gm‖♭ ≥ lim supn ‖Gn‖♭ ≥ ε0. This is a contradiction to ‖ ‖♭ ≪ ‖ ‖.
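The claim that the limsup set G is negligible for the original mean is a Borel–Cantelli type estimate. A sketch, using only monotonicity and countable subadditivity of the mean together with the bound on the mean of each G_n chosen above:

```latex
% G = \bigcap_n \bigcup_{m\ge n} G_m \subset \bigcup_{m\ge n} G_m for every n, hence
\[
\|G\| \;\le\; \Big\|\bigcup_{m\ge n}G_m\Big\| \;\le\; \sum_{m\ge n}\|G_m\|
 \;<\; \sum_{m\ge n}2^{-m} \;=\; 2^{-n+1} \;\longrightarrow\; 0
 \qquad (n\to\infty).
\]
% Since the left-hand side does not depend on n, \|G\| = 0.
```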

Given ε > 0 let δ > 0 be as above. Let A0 ⊂ A in L1(‖ ‖) with ‖A \ A0‖ < δ and g ∈ Ēᵘ be such that f1A0 = g1A0. Then A0 ∈ L1(‖ ‖♭) and ‖A \ A0‖♭ < ε. Therefore, f ∈ MR(‖ ‖♭).
Lemma 7.4.3. Suppose 1A ∈ M (k k) and define kf k◦ = kf 1A k. Then k k◦ ≪ k k.
Moreover, f is k k◦ –measurable iff f 1A is k k–measurable.

Proof. Sufficiency is easy to prove, for if f 1A ∈ MR(k k) then, as k k◦ ≪ k k, f 1A ∈

MR(k k◦ ) by Theorem 7.4.2. Since kf 1Ac k◦ = 0, f 1Ac ∈ MR(k k◦ ); whence, f = f 1A +f 1Ac
is k k◦ –measurable.

For necessity, suppose f ∈ MR(‖ ‖◦). Since A ∈ M(‖ ‖) and ‖ ‖◦ ≪ ‖ ‖, A ∈ M(‖ ‖◦), and so f1A ∈ MR(‖ ‖◦). Hence, for any set B ∈ L1(‖ ‖◦) and any ε > 0, there exist B0 ⊂ B, B0 ∈ L1(‖ ‖◦), and φ ∈ Ēᵘ such that ‖B \ B0‖◦ < ε and f1B0 = φ1B0. As 1Aϕ ∈ L1(‖ ‖) for any ϕ ∈ E and
‖1B − ϕ‖◦ = ‖(1B − ϕ)1A‖ = ‖1A∩B − ϕ1A‖ ≤ ‖1A∩B − ϕ‖,
1B ∈ L1(‖ ‖◦) iff 1A∩B ∈ L1(‖ ‖). In both cases, ‖B‖◦ = ‖A ∩ B‖. Therefore, f is ‖ ‖–measurable on every ‖ ‖–integrable subset of A, i.e., f1A ∈ MR(‖ ‖).
Theorem 7.4.4. Suppose g ∈ L1^loc(‖ ‖) and let ‖f‖♭ := ‖fg‖ for all f ∈ R̄Ω. Then

(i) k k♭ ≪ k k.
(ii) f ∈ L1 (k k♭ ) iff f g ∈ L1 (k k).
(iii) f ∈ MR(k k♭ ) iff f g ∈ MR(k k).

Proof. (i) follows from Lemma 6.3.11(ii).

To prove the next two statements, consider the function ξg := (1/g)1{g≠0}. Since g ∈ L1^loc(‖ ‖) ⊂ MR(‖ ‖), ξg ∈ MR(‖ ‖).

(ii) As φg ∈ L1 (k k) for any φ ∈ E and

kf − φk♭ = kf g − φgk,
f ∈ L1 (k k♭ ) implies that f g ∈ L1 (k k).

Conversely, suppose that fg ∈ L1(‖ ‖). For any φ ∈ E, φξg ∈ MR(‖ ‖) and ‖φξg‖♭ ≤ ‖φ‖ < ∞. Since ‖ ‖♭ ≪ ‖ ‖, φξg ∈ MR(‖ ‖♭). From {φξg ≠ 0} ⊂ {φ ≠ 0}, it follows that {φξg ≠ 0} is σ–finite with respect to ‖ ‖♭. By Theorem 7.3.3(ii), φg⁻¹1{g≠0} ∈ L1(‖ ‖♭). Therefore, as

‖fg − φ‖ = ‖(f − φξg)g‖ = ‖f − φg⁻¹1{g≠0}‖♭,

we conclude that f ∈ L1(‖ ‖♭).

(iii) If fg ∈ MR(‖ ‖), then f1{g≠0} = fgξg ∈ MR(‖ ‖). Since ‖ ‖♭ ≪ ‖ ‖, f1{g≠0} ∈ MR(‖ ‖♭). As ‖f1{g=0}‖♭ = 0, f1{g=0} ∈ L1(‖ ‖♭) ⊂ MR(‖ ‖♭). Therefore f ∈ MR(‖ ‖♭).

Conversely, suppose f ∈ MR(‖ ‖♭). As g ∈ MR(‖ ‖), the map ‖ ‖◦ : h ↦ ‖h1{g≠0}‖ is a mean for E. Furthermore, ‖ ‖♭ ≪ ‖ ‖◦ and ‖ ‖◦ ≪ ‖ ‖♭, and so MR(‖ ‖♭) = MR(‖ ‖◦). By Lemma 7.4.3 f1{g≠0} is ‖ ‖–measurable. Therefore, fg = f1{g≠0}g ∈ MR(‖ ‖).

7.5. Daniell–Stone representation

In Chapter 4 we developed Lebesgue integration, which starts from a measure space (Ω, F, µ). An integral is first defined on simple F–measurable functions and then extended to a larger class of F–measurable functions. In many cases, as in the construction of Lebesgue measure on the real line, the starting point is a ring R of subsets of Ω and a positive σ–continuous function µ on it. Carathéodory proposed a method that extends µ to any subset of Ω by setting
(7.1) µ∗(E) = inf{ Σn I(Rn) : E ⊂ ⋃n Rn, Rn ∈ R }
The extension µ∗ is σ–subadditive on P(Ω). Measurability is defined by a cut condition: A is measurable iff
(7.2) µ∗(E) = µ∗(E ∩ A) + µ∗(E \ A) for all E ⊂ Ω
This procedure produces a measure space (Ω, M(R), µ∗) to which Lebesgue’s method can be applied.
It is natural to ask whether Lebesgue-Carathéodory and Daniell’s methods give different
collections of integrable and/or measurable functions. We show in this Section that both
approaches in fact produce exactly the same integrable and measurable functions.
Theorem 7.5.1. Let E ⊂ Bb(Ω) be a Stone lattice. Suppose that (E, I) is a positive σ–continuous elementary integral and let ‖ ‖∗ be its Daniell mean. Then A ∈ M(‖ ‖∗) iff
(7.3) ‖E‖∗ = ‖E ∩ A‖∗ + ‖E \ A‖∗
for all E ⊂ Ω.

Proof. We show first that any Daniell–measurable set satisfies the cut condition (7.3). Let us denote M := M(‖ ‖∗). Suppose A ∈ M and let E ⊂ Ω. If ‖E‖∗ = ∞, then (7.3) holds by subadditivity. If ‖E‖∗ < ∞, then by Theorem 6.4.10, there is B ∈ L1 such that B ⊃ E and ‖E‖∗ = ‖B‖∗. By Lemma 7.3.1 both B ∩ A and B \ A are integrable. The subadditivity of the mean and Theorem 6.5.1 imply that
‖1E‖∗ ≤ ‖1E∩A‖∗ + ‖1E\A‖∗ ≤ ‖1B∩A‖∗ + ‖1B\A‖∗
= I(1B∩A) + I(1B\A) = I(1B) = ‖1E‖∗.
Therefore A satisfies (7.3).
We claim that the collection M∗ of sets satisfying (7.3) is an algebra. Clearly M∗ is closed under complementation. If A, B belong to M∗ and E ⊂ Ω, then
‖E‖∗ ≤ ‖E ∩ (A ∪ B)‖∗ + ‖E ∩ (Aᶜ ∩ Bᶜ)‖∗
= ‖E ∩ A‖∗ + ‖(E ∩ B) ∩ Aᶜ‖∗ + ‖(E ∩ Aᶜ) ∩ Bᶜ‖∗
= ‖E ∩ A‖∗ + ‖(E ∩ Aᶜ) ∩ B‖∗ + ‖(E ∩ Aᶜ) ∩ Bᶜ‖∗
= ‖E ∩ A‖∗ + ‖E ∩ Aᶜ‖∗ = ‖E‖∗
This shows that M∗ is an algebra.
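Each equality in the chain above comes from one application of the cut condition (7.3) to a suitable test set. The following sketch records the bookkeeping in the notation of the proof; the only inequality in the chain is the initial use of subadditivity.

```latex
% Cut the test set E \cap (A \cup B) by A \in M^{*}:
\[
\|E\cap(A\cup B)\|^{*}
 = \|E\cap(A\cup B)\cap A\|^{*} + \|E\cap(A\cup B)\cap A^{c}\|^{*}
 = \|E\cap A\|^{*} + \|(E\cap A^{c})\cap B\|^{*} .
\]
% Cut the test set E \cap A^{c} by B \in M^{*}:
\[
\|(E\cap A^{c})\cap B\|^{*} + \|(E\cap A^{c})\cap B^{c}\|^{*} = \|E\cap A^{c}\|^{*} .
\]
% Cut the test set E by A \in M^{*}:
\[
\|E\cap A\|^{*} + \|E\cap A^{c}\|^{*} = \|E\|^{*} .
\]
% Combining the three identities with subadditivity forces equality in (7.3) for A \cup B.
```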
To conclude the proof, we now show that M ∗ ⊂ M . Suppose A ∈ M ∗ and let E ∈ L1 . The
first part of the proof shows that M ∗ contains the integrable sets; hence, E ∩ A ∈ M ∗ . By
Theorem 6.4.10, there exists an integrable set B such that A∩E ⊂ B and kE ∩Ak∗ = kBk∗ .
From
kBk∗ = kB ∩ (E ∩ A)k∗ + kB \ (E ∩ A)k∗
= kBk∗ + kB \ (E ∩ A)k∗ ,
we obtain that kB \ (E ∩ A)k∗ = 0. Hence E ∩ A ∈ L1 for any E ∈ L1 and, by Lemma 7.3.1,
A ∈ M . Incidentally, this argument also shows that M ∗ = M is a σ–algebra.
Lemma 7.5.2. Let µ∗ be as in (7.1), and let k k∗ be Daniell’s mean. Then µ∗ = k k∗ .

Proof. Let R be the δ–ring generated by the collection of sets of the form φ⁻¹(I) where φ ∈ E and I is an interval of the form (a, ∞) with a > 0. By definition, µ∗ = ‖ ‖∗ on R. For any A ⊂ Ω, if µ∗(A) < ∞ then for any ε > 0 there exists {An : n ∈ N} ⊂ R such that A ⊂ ⋃n An and Σn I(An) < µ∗(A) + ε. It follows from Daniell’s dominated convergence that B = ⋃n An ∈ R↑ ∩ L1 which, together with Theorem 6.4.10, implies that µ∗(A) = inf{I(B) : A ⊂ B ∈ R↑} = ‖A‖∗.

The following theorem summarizes the results of this section.

Theorem 7.5.3. (Daniell–Stone) Suppose I is a positive σ–continuous elementary integral on a Stone lattice E ⊂ Bb(Ω), and let ‖ ‖∗ be its Daniell mean. Then, σ(E) ⊂ M(‖ ‖∗) and µ = ‖ ‖∗ restricted to M(‖ ‖∗) is a measure satisfying
(7.4) µ(E) = I(E) = ‖E‖∗ for E ∈ M, and I(f) = ∫ f dµ for f ∈ L1(‖ ‖∗).
Moreover, µ is uniquely determined on the σ–ring Rσ(E) generated by {f⁻¹(B) : f ∈ E, B ∈ B(R \ {0})}. If ‖ ‖∗ is σ–finite, then Rσ(E) = σ(E).

Proof. All statements are consequences of the Daniell integral extension theorem, Theorem 7.5.1, and Lemma 7.5.2.
The last statement follows from Corollary 3.3.6, Theorem 3.5.6, Lemma 5.6.5, and Theorem 5.6.6.

7.6. Maximality
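Suppose $\|\cdot\|$ and $\|\cdot\|^{\natural}$ are two means for a Stone lattice or a ring $\mathcal{E} \subset B_b(\Omega)$. If $\|\cdot\| \le \|\cdot\|^{\natural}$ on $\overline{\mathbb{R}}^{\Omega}$, then clearly $L_1(\|\cdot\|^{\natural}) \subset L_1(\|\cdot\|)$. In this section we will show that given a mean $\|\cdot\|$ for $\mathcal{E}$ there exists a maximal mean $\|\cdot\|^{\natural}$ that coincides with $\|\cdot\|$ on $\mathcal{E}$ such that $\|\cdot\| \le \|\cdot\|^{\natural}$ on $\overline{\mathbb{R}}^{\Omega}$. In particular, we will show that the Daniell mean $\|\cdot\|^*$ associated to any positive σ–continuous elementary integral $(\mathcal{E}, I)$ is the maximal mean with $\|\phi\|^* = I(|\phi|)$ for all $\phi \in \mathcal{E}$ for which Cauchy sequences converge and dominated convergence holds.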
Lemma 7.6.1. Suppose that E ⊂ Bb (Ω) is either a Stone lattice or a ring, and let k k be
a mean for E. If k k♮ is another mean for E such that kφk ≤ kφk♮ for all φ ∈ E, then
khk ≤ khk♮ for all h ∈ ERΣ .

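Proof. Suppose that $f \in L_1(\|\cdot\| + \|\cdot\|^{\natural})$. If $\{\phi_n\} \subset \mathcal{E}$ converges to $f$ in $(\|\cdot\| + \|\cdot\|^{\natural})$–mean then it converges both in $\|\cdot\|$–mean and in $\|\cdot\|^{\natural}$–mean. Hence, $\|f\|^{\natural} = \lim_n \|\phi_n\|^{\natural} \ge \lim_n \|\phi_n\| = \|f\|$.

If $h \in (\mathcal{E}^{\Sigma}_{\mathbb{R}})_+$ then $\{h \ne 0\} \subset \bigcup_n \{\phi_n \ne 0\}$ for some $\{\phi_n\} \subset \mathcal{E}$. As $h \wedge n \in \mathcal{E}^{\Sigma}_{\mathbb{R}}$, $h_n = (h \wedge n) \cdot \bigvee_{k=1}^{n} 1_{\{|\phi_k| > 1/n\}} \in L_1(\|\cdot\| + \|\cdot\|^{\natural})$ by Corollary 7.3.1, and so $\|h_n\| \le \|h_n\|^{\natural}$. Since $h_n \nearrow h$, $\|h\| = \sup_n \|h_n\| \le \sup_n \|h_n\|^{\natural} = \|h\|^{\natural}$ by monotone convergence. For arbitrary $h \in \mathcal{E}^{\Sigma}_{\mathbb{R}}$, $\|h\| = \big\| |h| \big\| \le \big\| |h| \big\|^{\natural} = \|h\|^{\natural}$.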

An immediate consequence of Lemma 7.6.1 is that two means that coincide on $\mathcal{E}$ will also do so on $\mathcal{E}^{\Sigma}_{\mathbb{R}}$. The next result shows that among all means that agree with a particular mean on $\mathcal{E}_+$, there exists one, which we call maximal, that dominates the rest of them.
Theorem 7.6.2. For any mean $\|\cdot\|$ for $\mathcal{E}$, there exists a unique maximal mean $\|\cdot\|^{\natural}$ that agrees with $\|\cdot\|$ on $\mathcal{E}$. If $\|\cdot\|$ is order continuous, then there exists a unique maximal order continuous mean $\|\cdot\|^{\vee}$ that agrees with $\|\cdot\|$ on $\mathcal{E}$ and $\|\cdot\|^{\vee} \le \|\cdot\|^{\natural}$.

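Proof. Let $\mathfrak{M}(\|\cdot\|)$ be the collection of all means on $\mathcal{E}$ that agree with $\|\cdot\|$ on $\mathcal{E}_+$. Define
\[
\|f\|^{\natural} = \sup\{\|f\|^{\flat} : \|\cdot\|^{\flat} \in \mathfrak{M}(\|\cdot\|)\}. \tag{7.5}
\]
Clearly $\|\cdot\|^{\natural}$ coincides with $\|\cdot\|$ on $\mathcal{E}_+$, whence continuity on $\mathcal{E}_+$ follows. Absolute homogeneity and solidity are easy to verify. It remains to show that $\|\cdot\|^{\natural}$ is countably subadditive. Let $\{f_n\}$ be a sequence of nonnegative functions. Then, for any $\|\cdot\|^{\flat} \in \mathfrak{M}(\|\cdot\|)$ it follows that
\[
\Big\| \sum_n f_n \Big\|^{\flat} \le \sum_n \|f_n\|^{\flat} \le \sum_n \|f_n\|^{\natural}.
\]
Taking suprema over all $\|\cdot\|^{\flat} \in \mathfrak{M}(\|\cdot\|)$ leads to $\big\| \sum_n f_n \big\|^{\natural} \le \sum_n \|f_n\|^{\natural}$.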

For the second statement consider the collection M• (k k) of all order continuous means that
agree with k k on E. The arguments used above show that kf k∨ = sup{kf k♭ : k k♭ ∈
M• (k k)} is a mean for E which dominates k k and agrees with k k on E. To show that k k∨
⇑
is in fact order continuous, suppose H ⊂ E+ is increasingly directed. For any k k♭ ∈ M• (k k)
we have
\[
\|\sup H\|^{\flat} = \sup_{h \in H} \|h\|^{\flat} \le \sup_{h \in H} \|h\|^{\vee},
\]
whence it follows that $\|\sup H\|^{\vee} = \sup_{h \in H} \|h\|^{\vee}$.

Remark 7.6.3. It is easy to check that the Daniell mean k k∗ (Daniell–Stone mean k k• )
associated with a positive σ–continuous (order continuous) elementary integral I on a Stone
lattice E ⊂ Bb (Ω) is maximal. Indeed, by Daniell’s monotone convergence, for any monotone
nondecreasing sequence Φ ⊂ E+ and any mean k k♭ for E that agrees with k k∗ on E we
have k sup Φk∗ = supφ∈Φ I(φ) = supφ∈Φ kφk♭ = k sup Φk♭ ; hence k k♭ and k k∗ agree on E ↑ .
By definition of Daniell’s mean
kf k∗ = inf{khk∗ : |f | ≤ h ∈ E ↑ } = inf{khk♭ : |f | ≤ h ∈ E ↑ } ≥ kf k♭ .
A similar proof works for the order continuous case since k sup Φk♭ = supφ∈Φ kφk♭ =
supφ∈Φ kφk• = k sup Φk• for any increasingly directed net Φ ⊂ E+ and any order–continuous
mean k k♭ ∈ M• (k k• ).
Theorem 7.6.4. Suppose that k k♮ is a maximal mean for E that agrees with k k on E+ .
Then,
(7.6) kf k♮ = inf{khk : |f | ≤ h ∈ ERΣ }

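Proof. Denote the right hand side of (7.6) by $\|f\|^{\diamond}$. Clearly $\|\cdot\|^{\diamond}$ agrees with $\|\cdot\|$ on $\mathcal{E}$ and thus, $\|\cdot\|^{\diamond}$ is σ–continuous on $\mathcal{E}_+$. It is easy to check that $\|\cdot\|^{\diamond}$ is absolutely homogeneous and solid. To show that $\|\cdot\|^{\diamond}$ is countably subadditive, suppose $\sum_n \|f_n\|^{\diamond} < r < \infty$. There exist functions $h_n \in \mathcal{E}^{\Sigma}_{\mathbb{R}}$ such that $|f_n| \le h_n$ and $\sum_n \|h_n\| < r$. Since $\big|\sum_n f_n\big| \le \sum_n |f_n| \le \sum_n h_n$ and $\sum_n h_n \in \mathcal{E}^{\Sigma}_{\mathbb{R}}$, the subadditivity of $\|\cdot\|$ implies
\[
\Big\| \sum_n f_n \Big\|^{\diamond} \le \Big\| \sum_n h_n \Big\| \le \sum_n \|h_n\| < r.
\]
Therefore $\|\cdot\|^{\diamond}$ is a mean for $\mathcal{E}$.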

Since k k⋄ agrees with k k on E, k k⋄ ≤ k k♮ . To prove the converse inequality assume

kf k⋄ < r. Then there exists h ∈ ERΣ with |f | ≤ h such that khk < r. As khk♮ = khk by
Lemma 7.6.1, it follows that kf k♮ ≤ khk♮ = khk < r. Therefore kf k♮ ≤ kf k⋄ .

The following result generalizes Theorem 6.4.10 to the setting of a maximal mean.
Lemma 7.6.5. Suppose k k♮ is a maximal mean for E. Then, for any f ∈ F(k k♮ )+ , there
exists h ∈ L1 (k k♮ ) ∩ ERΣ with f ≤ h such that kf k♮ = khk♮ . If f is a set, then h can be
chosen to be a set too.
Remark 7.6.6. The function h in Lemma 7.6.5 is called an upper envelope of f .

Proof. By Theorem 7.6.4 there exist functions $h_n \in \mathcal{E}^{\Sigma}_{\mathbb{R}}$ such that $f = |f| \le h_n$ and $\|h_n\|^{\natural} \le \|f\|^{\natural} + 1/n$. Notice that $h_n \in L_1(\|\cdot\|^{\natural})$ by Theorem 7.3.3(ii). An application of monotone convergence shows that $h = \bigwedge_n h_n$ is integrable and satisfies the conditions of the Lemma. If $f = 1_A$, then $1_{\{h \ge 1\}}$ is a smaller upper envelope of $f$.
Theorem 7.6.7. Suppose E ⊂ Bb (Ω).
(i) If E is a Stone lattice, I is a positive σ–continuous elementary integral on E, and
k k∗ is the corresponding Daniell mean then, k k∗ is the maximal mean that agrees
with I on E+ .

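If $\mathcal{E}$ is a Stone lattice or a ring and $\|\cdot\|^{\natural}$ is a maximal mean for $\mathcal{E}$ then,

(ii) For any nondecreasing sequence $\{f_n\} \subset \overline{\mathbb{R}}^{\Omega}_+$,
\[
\sup_n \|f_n\|^{\natural} = \|\sup_n f_n\|^{\natural}. \tag{7.7}
\]

(iii) For any function $f \in L_1(\|\cdot\|^{\natural})$ there are integrable functions $\underline{f}$ and $\overline{f}$ in $\mathcal{E}^{\Sigma}_{\mathbb{R}}$ such that $\underline{f} \le f \le \overline{f}$ and $\|\overline{f} - \underline{f}\|^{\natural} = 0$.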

Proof. (i) This is already proved in Remark 7.6.3. For a different proof, notice that E ↑ ⊂
ERΣ . Then, by Theorem 7.6.4,

kf k∗ = inf{khk : |f | ≤ h ∈ E ↑ } ≥ inf{khk : |f | ≤ h ∈ ERΣ } = kf k♮ .

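(ii) If $\sup_n \|f_n\|^{\natural} = \infty$ there is nothing to prove. Assume $\sup_n \|f_n\|^{\natural} < \infty$. For each $n \in \mathbb{N}$ let $h_n \in \mathcal{E}^{\Sigma}_{\mathbb{R}} \cap L_1(\|\cdot\|^{\natural})$ be an upper envelope of $f_n = |f_n|$. The sequence $\bar{f}_n = \inf_{k \ge n} h_k \in L_1$ is nondecreasing and $f_n \le \bar{f}_n \le h_n$; thus, $\|f_n\|^{\natural} = \|\bar{f}_n\|^{\natural} = \|h_n\|^{\natural}$. By monotone convergence $\bar{f}_n$ converges in mean to $\sup_n \bar{f}_n$. Therefore
\[
\sup_n \|f_n\|^{\natural} = \sup_n \|\bar{f}_n\|^{\natural} = \lim_n \|\bar{f}_n\|^{\natural} = \|\sup_n \bar{f}_n\|^{\natural} \ge \|\sup_n f_n\|^{\natural}.
\]

The converse inequality $\sup_n \|f_n\|^{\natural} \le \|\sup_n f_n\|^{\natural}$ follows from solidity since $f_n \ge 0$.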

(iii) By Theorem 7.3.3(i) there is a function h ∈ L1 (k k♮ ) ∩ ERΣ such that kf − hk♮ = 0.

By Lemma 7.6.5 there exists g ∈ L1 (k k♮ ) ∩ ERΣ such that |f − h| ≤ g and kgk♮ = 0. The
functions $\underline{f} = h - g$ and $\overline{f} = h + g$ satisfy the desired conditions.
Remark 7.6.8. As a consequence of Theorems 7.3.3(i) and 7.6.7(i), if $\|\cdot\|^*$ is the Daniell mean of a positive σ–continuous elementary integral $I$ on a Stone lattice $\mathcal{E} \subset B_b(\Omega)$ and $\|\cdot\|$ is any other mean that coincides with $\|\cdot\|^*$ on $\mathcal{E}$, then both means coincide on $L_1(\|\cdot\|^*)$.

A mean k k for E ⊂ Bb (Ω) is said to be strictly increasing if for any f, g ∈ L1 (k k),

g ≤ f a.s and kgk = kf k imply that f = g a.s.
Theorem 7.6.9. The Daniell mean k k∗ of an elementary integral I on a Stone lattice is
strictly increasing.

Proof. Suppose f, g ∈ L1 , g ≤ f a.s and kgk∗ = kf k∗ . Then by Theorem 6.6.3 0 =

kf k∗ − kgk∗ = I ∗ (f ) − I ∗ (g) = I ∗ (f − g) = kf − gk∗ . Therefore, f = g a.s.

7.7. Integration on locally compact Hausdorff spaces

Unless explicitly stated, we will assume in this section that X is a locally compact Hausdorff
space (l.c.H). We will use K, G and F to denote the collection of compact, open and closed
sets respectively. The space E = C00 (X) is a ring lattice closed under chopping. Lemma 6.7.1
shows that any positive linear functional I on E is an order–continuous elementary integral.
In this section we will study the regularity properties of the Daniell–Stone mean k k•
associated to I and the integral representation of the extension of I in terms of the measure
µI on the collection M (k k• ) of k k• –measurable sets. We will show that all Borel sets are
measurable, that is, B(X) ⊂ M (k k• ).
Theorem 7.7.1. Suppose X is a l.c.H topological space. Let I be a positive linear functional
on C00 (X) and let k k• be the Daniell–Stone mean associated with I. Then,
(i) All Borel sets are k k• –measurable.
(ii) For all K ∈ K, kKk• < ∞ and
(7.8) kKk• = inf{I(φ) : K ≺ φ}
(iii) k k• is finitely additive on G and for any G ∈ G,
(7.9) kGk• = sup{I(φ) : 0 ≤ φ ≺ G} = sup{kKk• : K ∈ K, K ⊂ G}
(iv) For any A ⊂ X,
(7.10) kAk• = inf{kGk• : A ⊂ G ∈ G}.
(v) For any $A \in L_1(\|\cdot\|^{\bullet})$,
(7.11) $\|A\|^{\bullet} = \sup\{\|K\|^{\bullet} : K \in \mathcal{K},\ K \subset A\}$

Proof. Throughout the proof, we denote M := M (k k• ).

(i) Let us denote C00 (X) by E. Notice that all functions in E ⇑ are lower semicontinuous.
Conversely, by Theorem B.1.5, E ⇑ contains all nonnegative lower semicontinuous functions
in X. In particular, G ⊂ E ⇑ and, by Theorem 7.3.5 G ⊂ M . Therefore, σ(G) = B(X) ⊂ M .

(ii) For any $\phi \in \mathcal{E}$ with $1_K \le \phi$ we have that $\|K\|^{\bullet} \le \|\phi\|^{\bullet} = I(\phi) < \infty$. Such functions $\phi$ exist by Urysohn's lemma. It is clear that $\|K\|^{\bullet} \le \inf\{I(\phi) : K \prec \phi\}$. The opposite inequality will follow immediately once we prove (iv).

(iii) Since G ⊂ E ⇑ , by Theorem 6.7.5(iii) we have that I • is finitely additive on G. By

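definition, $\|G\|^{\bullet} = I^{\bullet}(G) = \sup\{I(\phi) : \phi \in \mathcal{E},\ 0 \le \phi \le 1_G\}$. Let $\phi \in \mathcal{E}_+$ be such that $\phi \le 1_G$. Then, for each $n \in \mathbb{N}$,
\[
K_n = \{\phi \ge \tfrac{1}{n}\} \subset \{\phi > \tfrac{1}{n+1}\} = G_n \subset G
\]
and $K_n \in \mathcal{K}$, $G_n \in \mathcal{G}$. By Urysohn's lemma, there exists a sequence $(f_n) \subset \mathcal{E}$ such that $K_n \prec f_n \prec G_n$. Then $\phi_n = f_n \phi \in \mathcal{E}$, $0 \le \phi_n \prec G$, and $\phi_n = f_n \phi \nearrow \phi$. By Dini's lemma, for any $\varepsilon > 0$, there is $N$ large enough such that
\[
I(\phi) < I(\phi_N) + \varepsilon \le \sup\{I(\psi) : 0 \le \psi \prec G\} + \varepsilon.
\]
The first identity in (7.9) follows.

If $0 \le \phi \prec G$, then $K = \operatorname{supp}(\phi) \in \mathcal{K}$ and $K \subset G$. Hence $\|\phi\|^{\bullet} = I(\phi) \le \|K\|^{\bullet} \le \|G\|^{\bullet}$.

The second identity in (7.9) follows.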

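(iv) Only the case $\|A\|^{\bullet} < \infty$ requires a proof. Since all $h \in \mathcal{E}^{\Uparrow}$ are lower semicontinuous, we have that $\{h > r\} \in \mathcal{G}$ for all $r$. For any $0 < \delta < 1$ there is a function $h \in \mathcal{E}^{\Uparrow} \cap \mathcal{F}(\|\cdot\|^{\bullet})$ such that $1_A \le h$ and $\|h\|^{\bullet} \le (1 + \delta)\|A\|^{\bullet}$. Then $1_A \le 1_{\{h \ge 1\}} \le 1_{\{h > 1-\delta\}} \le \frac{h}{1-\delta}$. By Theorem 6.7.6 $h \in L_1(\|\cdot\|^{\bullet})$, and thus, $1_{\{h \ge 1\}} \in L_1(\|\cdot\|^{\bullet})$ and $1_{\{h > 1-\delta\}} \in L_1(\|\cdot\|^{\bullet}) \cap \mathcal{E}^{\Uparrow}$.

From
\[
\|A\|^{\bullet} \le \|\{h \ge 1\}\|^{\bullet} \le \|\{h > 1 - \delta\}\|^{\bullet} \le \frac{\|h\|^{\bullet}}{1-\delta} \le \frac{1+\delta}{1-\delta}\,\|A\|^{\bullet},
\]
we conclude that
\[
\|A\|^{\bullet} = \inf\{\|1_{\{h > r\}}\|^{\bullet} : 1_A \le h \in \mathcal{E}^{\Uparrow},\ 0 < r < 1\} = \inf\{\|G\|^{\bullet} : A \subset G \in \mathcal{G}\}.
\]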

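We now conclude the proof of (7.8). If $K \in \mathcal{K}$ then $\|K\|^{\bullet} < \infty$ and so, for any $\varepsilon > 0$, there is $G \in \mathcal{G}$ such that $K \subset G$ and $\|G\|^{\bullet} < \|K\|^{\bullet} + \varepsilon$. By Urysohn's lemma, there is $\psi \in \mathcal{E}$ such that $K \prec \psi \prec G$. It follows immediately that $\inf\{I(\phi) : K \prec \phi\} \le \|K\|^{\bullet} + \varepsilon$.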

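(v) Let $\mathcal{F}$ be the collection of all subsets of $X$ that have finite $\|\cdot\|^{\bullet}$–mean and which satisfy (7.11). It follows that $\mathcal{F} \subset L_1(\|\cdot\|^{\bullet})$, for if $A \in \mathcal{F}$, then there is a sequence $\{K_n : n \in \mathbb{N}\} \subset \mathcal{K}$ and, by (iv), a sequence $\{G_n : n \in \mathbb{N}\} \subset \mathcal{G} \cap L_1$ such that $K_n \subset A \subset G_n$ with $\|A\|^{\bullet} - \frac{1}{n} < \|K_n\|^{\bullet}$ and $\|G_n\|^{\bullet} < \|A\|^{\bullet} + \frac{1}{n}$. Hence,
\[
\|1_A - 1_{K_n}\|^{\bullet} \le \|1_{G_n} - 1_{K_n}\|^{\bullet} = \|G_n\|^{\bullet} - \|K_n\|^{\bullet} < \frac{2}{n} \to 0,
\]
whence we conclude that $A \in L_1$. Clearly $\mathcal{K} \subset \mathcal{F}$, and by (7.10), $\mathcal{G} \cap L_1 \subset \mathcal{F}$.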
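We claim that $\mathcal{F}$ is a ring which is closed under countable unions of finite $\|\cdot\|^{\bullet}$–mean, that is, if $\{A_n : n \in \mathbb{N}\} \subset \mathcal{F}$ and $\big\|\bigcup_n A_n\big\|^{\bullet} < \infty$ then $\bigcup_n A_n \in \mathcal{F}$. First, suppose that $A_j \in \mathcal{F}$ for $j = 1, 2$. Given $\varepsilon > 0$, choose $K_j \in \mathcal{K}$ and $G_j \in \mathcal{G}$ with $K_j \subset A_j \subset G_j$ such that
\[
\|1_{G_j} - 1_{K_j}\|^{\bullet} < \frac{\varepsilon}{2}.
\]
Then $K_1 \setminus G_2 \in \mathcal{K}$, and since $A_1 \setminus A_2 \in L_1$,
\[
\|1_{A_1 \setminus A_2} - 1_{K_1 \setminus G_2}\|^{\bullet} \le \|1_{A_1} - 1_{K_1}\|^{\bullet} + \|1_{G_2} - 1_{A_2}\|^{\bullet} < \varepsilon.
\]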

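This shows that $A_1 \setminus A_2 \in \mathcal{F}$. Now suppose $\big\|\bigcup_n A_n\big\|^{\bullet} < \infty$ where $\{A_n : n \in \mathbb{N}\} \subset \mathcal{F}$. Given $\varepsilon > 0$ choose $K_n \in \mathcal{K}$ such that $\|1_{A_n} - 1_{K_n}\|^{\bullet} < 2^{-n-1}\varepsilon$. It follows that
\[
\big\|1_{\bigcup_n A_n} - 1_{\bigcup_n K_n}\big\|^{\bullet} \le \sum_n \|1_{A_n} - 1_{K_n}\|^{\bullet} < \frac{\varepsilon}{2}.
\]
Choose $N$ large enough so that $\big\|1_{\bigcup_n K_n} - 1_{\bigcup_{j=1}^{N} K_j}\big\|^{\bullet} < \frac{\varepsilon}{2}$. Then
\[
\big\|1_{\bigcup_n A_n} - 1_{\bigcup_{j=1}^{N} K_j}\big\|^{\bullet} < \varepsilon,
\]
whence we conclude that $\bigcup_n A_n \in \mathcal{F}$.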

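Suppose $A \in L_1$. By (7.10) there is a decreasing sequence $\{G_n : n \in \mathbb{N}\} \subset \mathcal{G}$ such that $\|1_{G_n} - 1_A\|^{\bullet} \to 0$. Hence $U := \bigcap_n G_n \in \mathcal{F}$, and $\|1_U - 1_A\|^{\bullet} = 0$. We conclude that $A = U \setminus (U \setminus A) \in \mathcal{F}$.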
Definition 7.7.2. Let X be any topological space (not necessarily l.c.H.)
(a) A mean for a Stone lattice or a ring of bounded functions $B_b(X)$ (or an outer measure on the Borel σ–algebra $\mathcal{B}(X)$) is said to be regular if both inner regularity (7.9) and outer regularity (7.10) hold.
(b) When X is l.c.H. space, a positive linear functional I on C00 (X) (or a regular
positive measure µ on a σ–algebra M ⊃ B(X) which is finite on compact sets) is
said to be a Radon measure.

Theorem 7.7.1 states that the Daniell–Stone mean k k• for a positive elementary integral
(C00 (X), I) is regular if X is l.c.H. The following result gives a unique integral representation
of the elementary integral in terms of an associated Radon measure.
For the remainder of this section, we will assume that X is a l.c.H. space.
Theorem 7.7.3. (Riesz–Markov representation theorem) If $I$ is a positive Radon measure on $C_{00}(X)$ and $\|\cdot\|^{\bullet}$ is the corresponding Daniell–Stone mean, then the restriction $\mu$ of $\|\cdot\|^{\bullet}$ to $\mathcal{M}(\|\cdot\|^{\bullet})$ is the unique complete Radon measure defined on $\mathcal{M}(\|\cdot\|^{\bullet}) \supset \mathcal{B}(X)$ such that
\[
I(f) = \int_X f \, d\mu, \qquad f \in C_{00}(X).
\]
In addition, if $I$ is bounded, then $\mu$ is finite and $\|I\| = \mu(X) = \|1\|^{\bullet}$.

Proof. Lemma 6.7.1 shows that $(C_{00}(X), I)$ is an order–continuous elementary integral. Theorem 7.7.1 shows that the restriction $\mu$ of $\|\cdot\|^{\bullet}$ to $\mathcal{M}(\|\cdot\|^{\bullet})$ is regular and that $\mathcal{B}(X) \subset \mathcal{M}(\|\cdot\|^{\bullet})$. The conclusion of the first statement follows from Theorem 7.5.3.

If $I$ is bounded then $|I(f)| \le \|I\| \|f\|_u$ for $f \in C_{00}(X)$ and, by regularity, $\mu(X) \le \|I\|$. Conversely, $|I(f)| = \big|\int_X f \, d\mu\big| \le \|f\|_u \, \mu(X)$, and so $\|I\| \le \mu(X)$.
Example 7.7.4. Lebesgue measure on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ is a σ–finite regular Radon measure. More generally, any Lebesgue–Stieltjes measure $\mu$ corresponding to a right–continuous function with nonnegative increments is a σ–finite Radon measure on Borel sets.

Theorem 7.7.5. (Lusin's theorem) Let $(X, \mathcal{M}, \mu)$ be a Radon measure space, and let $f$ be a complex measurable function on $X$. If $A \in \mathcal{M}$, $\mu(A) < \infty$ and $\{f \ne 0\} \subset A$ then, for every $\varepsilon > 0$, there is $g \in C_{00}(X)$ such that

(7.12) $\mu(\{f \ne g\}) < \varepsilon$.

Moreover, if $f$ is bounded, $g$ can be chosen so that $\|g\|_u \le \|f\|_u$.

Proof. For real valued functions, the first part is just a restatement of the definition of Daniell measurability of functions for the Daniell mean induced by µ. For complex functions the result follows by applying the real–valued result to the real and imaginary parts of the function.

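To prove the last statement assume $R = \|f\|_u < \infty$. Define the map $\varphi$ on $\mathbb{C}$ by
\[
\varphi(z) = z \, 1_{\{|z| \le R\}} + R \, \frac{z}{|z|} \, 1_{\{|z| > R\}}.
\]
For $g' \in C_{00}(X)$ with $\mu(\{f \ne g'\}) < \varepsilon$, set $g := \varphi(g')$. Then $\|g\|_u \le \|f\|_u$ and, since $\{f \ne g\} \subset \{f \ne g'\}$, (7.12) holds.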
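The truncation map used in the last step of the proof can be illustrated numerically. The following is only a sketch: the function name `clip_radial` and the sample points are assumptions made for illustration and do not appear in the text. It shows that the map leaves points of modulus at most R unchanged and projects the remaining points radially onto the circle of radius R.

```python
def clip_radial(z: complex, R: float) -> complex:
    """Return z if |z| <= R; otherwise R * z / |z| (same argument, modulus R)."""
    r = abs(z)
    return z if r <= R else R * z / r

# Hypothetical sample values; any complex numbers would do.
samples = [0j, 0.5 + 0.5j, 3 + 4j, -10j]
clipped = [clip_radial(z, 2.0) for z in samples]
```

Points already inside the closed disc are fixed, which is why replacing g' by the composition does not enlarge the set where it differs from f.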

Corollary 7.7.6. Let f and A be as in Lusin’s theorem. There exists a sequence {gn } ⊂
C00 (X) such that kgn ku ≤ kf ku and gn → f µ–a.s.

Proof. For each $n \in \mathbb{N}$, let $g_n \in C_{00}(X)$ be such that $\|g_n\|_u \le \|f\|_u$ and $\mu(E_n) < 2^{-n}$, where $E_n = \{f \ne g_n\}$. By Borel–Cantelli $\mu(E_n \text{ i.o.}) = 0$; so, for µ–almost all $x$, there is $N_x$ such that $f(x) = g_n(x)$ for all $n \ge N_x$.

Theorem 7.7.7. (Vitali–Carathéodory.) Suppose µ is a positive Radon measure on $X$ and let $f \in L_1(\mu) \cap \mathbb{R}^X$. For any $\varepsilon > 0$, there are functions $u \le f \le v$ such that $u$ and $v$ are upper and lower semicontinuous respectively, and $\int (v - u) \, d\mu < \varepsilon$.

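Proof. First we consider the case $f \ge 0$. Let $0 \le s_n \nearrow f$ be as in Lemma 4.2.2 and set $t_n = s_n - s_{n-1}$. Then $2^n t_n = 1_{T_n}$ for some $T_n \in \mathcal{M}_\mu$, and so $f = \sum_{n \ge 1} 2^{-n} 1_{T_n}$. For each $n$, there exist $K_n \in \mathcal{K}$ and $G_n \in \mathcal{G}$ with $K_n \subset T_n \subset G_n$ such that $\mu(G_n \setminus K_n) < \varepsilon/2$. Choose $N$ large enough so that
\[
\sum_{n > N} 2^{-n} \mu(T_n) = \int f \, d\mu - \sum_{n=1}^{N} 2^{-n} \mu(T_n) < \varepsilon/2,
\]
and define $u = \sum_{n=1}^{N} 2^{-n} 1_{K_n}$, $v = \sum_{n=1}^{\infty} 2^{-n} 1_{G_n}$. Then $u$ and $v$ are u.s.c. and l.s.c. respectively, $u \le f \le v$, and
\[
\begin{aligned}
\int (v - u) \, d\mu &= \int \sum_{n=1}^{N} 2^{-n} (1_{G_n} - 1_{K_n}) \, d\mu + \int \sum_{n > N} 2^{-n} 1_{G_n} \, d\mu \\
&\le \int \sum_{n=1}^{\infty} 2^{-n} (1_{G_n} - 1_{K_n}) \, d\mu + \int \sum_{n > N} 2^{-n} 1_{T_n} \, d\mu < \varepsilon.
\end{aligned}
\]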

For general f , apply the previous reasoning to f− and f+ separately.


7.8. Exercises
Exercise 7.8.1. Suppose A is a measurable set. Show that a function f is measurable on
every integrable subset of A iff f 1A is measurable.
Exercise 7.8.2. Suppose $\mathcal{E}$ is a ring lattice closed under chopping and that $\|\cdot\|$ is an order–continuous mean for $\mathcal{E}$. Show that the functions in $\mathcal{E}^{\Uparrow}$ are $\|\cdot\|$–measurable.
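Exercise 7.8.3. If $g \in L_1^{\mathrm{loc}}(\|\cdot\|)$, show that the function $\|\cdot\|^{\flat} : f \mapsto \|f g\|$ on $\overline{\mathbb{R}}^{\Omega}$ is a mean for $\mathcal{E}$. Show that any $\|\cdot\|$–negligible set is a $\|\cdot\|^{\flat}$–negligible set.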
Exercise 7.8.4. Suppose k k♮ is a maximal mean on a Stone lattice or a ring E ⊂ Bb (Ω).
Show that f ∈ L1 (k k♮ ) iff f ∈ MR ∩ F(k k♮ ).
Exercise 7.8.5. If A and B are atoms of k k, show that either kA ∩ Bk = 0 or kAk = kBk.
Exercise 7.8.6. Suppose µ is a nonatomic measure on $(\Omega, \mathcal{F})$ with $\mu(\Omega) = \infty$. Show that for any $0 \le u < \infty$, there is an $A \in \mathcal{F}$ such that $\mu(A) = u$. (Hint: Since µ is nonatomic, the collection $\mathcal{B} := \{B \in \mathcal{F} : 0 < \mu(B) < \infty\}$ is not empty. Show that $a := \sup_{B \in \mathcal{B}} \mu(B) = \infty$.)
Exercise 7.8.7. Suppose X is l.c.H and let I be a positive linear functional on C00 (X).
Suppose that {φn : n ∈ N} ⊂ C00 (X) converges uniformly to some function φ and that there
is a compact set K ⊂ X that contains the support of all functions in the sequence. Show
that φ ∈ C00 (X) and that limn kφn − φk• = 0.
Exercise 7.8.8. (Localization of an elementary integral.) Suppose $X$ is a l.c.H space. Assume there is a collection of pairs $\{(W_\alpha, I_\alpha) : \alpha \in A\}$ such that $\{W_\alpha : \alpha \in A\}$ is an open cover of $X$, $I_\alpha$ is a positive linear functional on $C_{00}(W_\alpha)$, and $I_\alpha$ and $I_\beta$ coincide on $C_{00}(W_\alpha \cap W_\beta)$. Show that there is a unique positive linear functional $I$ on $C_{00}(X)$ such that its restriction to $C_{00}(W_\alpha)$ is $I_\alpha$. (Hint: let $f \in C_{00}(X)$ and $K$ a compact set containing $\operatorname{supp}(f)$. Use a partition of unity $\{\phi_j\}$ of $K$ subordinated to a finite cover $\{W_{\alpha_j} : j = 1, \ldots, n\}$ of $K$ (see Lemma 2.11.8), and define $I(f) = \sum_{j=1}^{n} I_{\alpha_j}(\phi_j f)$ where $K \subset \bigcup_{j=1}^{n} W_{\alpha_j}$ and $\operatorname{supp}(\phi_j) \subset W_{\alpha_j}$.)
Exercise 7.8.9. Suppose µ is a Borel measure on a topological space (X, τ ). The support
of µ is defined as
supp(µ) = {x ∈ X : ∀U ∈ τ, x ∈ U implies µ(U ) > 0}.
(a) Show that supp(µ) is a closed set.
(b) Show that if (X, τ ) has a countable base, then µ(X \ supp(µ)) = 0.

(c) Show that if $X$ is l.c.H and µ is a Radon measure on $X$, then $\mu(X \setminus \operatorname{supp}(\mu)) = 0$.
(Hint: If $G = X \setminus \operatorname{supp}(\mu)$ then $G = \bigcup \{V : V \text{ open},\ \mu(V) = 0\}$.)
Exercise 7.8.10. Suppose µ is a Borel measure on Rn . If f ∈ Cb (Rn ) and f ≡ c µ–a.s.,
show that f (x) = c for all x ∈ supp µ.
Chapter 8

Lp spaces

In this section we develop the theory of p–th integrable functions. Lp spaces are fundamental
objects in applications of integration theory.

8.1. Convex functions on the real line

Definition 8.1.1. A function ϕ : (a, b) → R, −∞ ≤ a < b ≤ ∞, is convex if
(8.1) ϕ((1 − t)x + ty) ≤ (1 − t)ϕ(x) + tϕ(y)
for any a < x < y < b and 0 ≤ t ≤ 1. If strict inequality holds in (8.1) with 0 < t < 1, then
ϕ is strictly convex.

Geometrically, if ϕ is convex and a < x < u < y < b then the point (u, ϕ(u)) on the
graph of ϕ lies below the straight line joining (x, ϕ(x)) and (y, ϕ(y)). It is easy to check
that (8.1) is equivalent to any of the inequalities
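\[
\frac{\varphi(u) - \varphi(x)}{u - x} \le \frac{\varphi(y) - \varphi(x)}{y - x} \le \frac{\varphi(y) - \varphi(u)}{y - u}. \tag{8.2}
\]
For fixed $a < x < b$, inequalities (8.2) show that the map $u \mapsto \frac{\varphi(u) - \varphi(x)}{u - x}$ decreases as $u \searrow x$ and increases as $u \nearrow x$. Consequently, the maps
\[
\alpha(x) := \sup_{a < u < x} \frac{\varphi(u) - \varphi(x)}{u - x}; \qquad \beta(x) := \inf_{x < v < b} \frac{\varphi(v) - \varphi(x)}{v - x} \tag{8.3}
\]
satisfy
\[
\alpha(x) \le \beta(x) \le \alpha(y), \qquad a < x < y < b. \tag{8.4}
\]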

Lemma 8.1.2. The functions α and β are monotone increasing and left continuous and
right continuous respectively. Furthermore, α(x+) = β(x) and α(x) = β(x−).


Proof. Let $x \in (a, b)$ be fixed, and consider the sequence $x_n = x + \frac{1}{n}$. From (8.4), it follows that $\beta(x) \le \alpha(x + \frac{1}{n}) \le \beta(x + \frac{1}{n}) \le n\big(\varphi(x + \frac{2}{n}) - \varphi(x + \frac{1}{n})\big)$. Letting $n \nearrow \infty$, we obtain $\beta(x) \le \alpha(x+) \le \beta(x+) \le \beta(x)$. The corresponding statement for left limits follows by using $x_n = x - \frac{1}{n}$ instead.

Since the functions α and β are nondecreasing, we conclude that, except for a countable
set of common discontinuities where jumps are equal, α = β on (a, b).
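The one-sided slopes α and β of (8.3) can be approximated numerically by taking difference quotients over shrinking steps. The sketch below is illustrative only: the helper name `one_sided_slopes`, the dyadic step sizes, and the two test functions are assumptions, and the maximum and minimum over finitely many steps merely approximate the supremum and infimum in (8.3).

```python
def one_sided_slopes(phi, x, steps=30):
    """Approximate alpha(x) (sup of left slopes) and beta(x) (inf of right slopes)."""
    hs = [2.0 ** (-k) for k in range(1, steps + 1)]
    # For convex phi the left slopes increase and the right slopes decrease as h -> 0.
    alpha = max((phi(x) - phi(x - h)) / h for h in hs)
    beta = min((phi(x + h) - phi(x)) / h for h in hs)
    return alpha, beta

# |x| has a corner at 0, so alpha(0) = -1 < beta(0) = 1.
abs_alpha, abs_beta = one_sided_slopes(abs, 0.0)
# x^2 is differentiable at 1, so both one-sided slopes are close to 2.
sq_alpha, sq_beta = one_sided_slopes(lambda t: t * t, 1.0)
```

The first function exhibits a point of the countable exceptional set where α and β differ; the second shows the generic case α = β.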
Theorem 8.1.3. If $\varphi : (a, b) \to \mathbb{R}$ is convex, then $\varphi$ is continuous; moreover, $\varphi$ is differentiable everywhere, except on a countable set.

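Proof. Suppose $a < x < y < b$ and let $x = x_0 < \ldots < x_n = y$. Then
\[
\beta(x_{m-1})(x_m - x_{m-1}) \le \varphi(x_m) - \varphi(x_{m-1}) \le \alpha(x_m)(x_m - x_{m-1}).
\]
Adding all terms gives
\[
\sum_{m=1}^{n} \beta(x_{m-1})(x_m - x_{m-1}) \le \varphi(y) - \varphi(x) \le \sum_{m=1}^{n} \alpha(x_m)(x_m - x_{m-1}).
\]
The outer sums are Riemann sums of the monotone functions $\beta$ and $\alpha$, which coincide off a countable set. Consequently, $\varphi(y) - \varphi(x) = \int_x^y \beta(t) \, dt = \int_x^y \alpha(s) \, ds$; hence, $\varphi$ is absolutely continuous on any closed interval, and differentiable everywhere except in the countable set $N$ of discontinuities of $\beta$.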
Theorem 8.1.4. (a) If ϕ is convex in (a, b), there is a unique Borel measure µ on B((a, b))
such that
(8.5) µ((x, y]) = βϕ (y) − βϕ (x); µ([x, y)) = αϕ (y) − αϕ (x),
where αϕ and βϕ are the left and right derivatives defined as in (8.3). (b) Conversely, if µ
is a Borel measure on B((a, b)), then there exists a convex function ϕ such that (8.5) holds.
(c) If ψ is another such convex function, then ϕ − ψ is linear.
Proof. (a) By Theorem 8.1.3, $\varphi(y) - \varphi(x) = \int_x^y \beta_\varphi(t) \, dt$. Since $\beta_\varphi$ is nondecreasing and
right continuous, Theorem 4.6.4 shows that there is a unique Borel measure µ on B((a, b))
such that µ((x, y]) = βϕ (y) − βϕ (x) whenever a < x < y < b. The last identity in (8.5)
follows from βϕ (x−) = αϕ (x) for a < x < b.
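(b) Given a Borel measure µ on $\mathcal{B}((a, b))$, the function
\[
g(x) = \begin{cases} \mu((x_0, x]) & \text{if } x_0 \le x, \\ -\mu((x, x_0]) & \text{if } x_0 \ge x, \end{cases} \tag{8.6}
\]
where $a < x_0 < b$ is fixed, is nondecreasing and right continuous. If $\varphi(x) = \int_{x_0}^{x} g(t) \, dt$ then, for any $a < x < y < b$,
\[
g(x)(y - x) \le \int_{[x, y]} g(t) \, dt = \varphi(y) - \varphi(x) \le g(y)(y - x).
\]
Hence,
\[
\frac{\varphi(u) - \varphi(x)}{u - x} \le g(u) \le \frac{\varphi(y) - \varphi(u)}{y - u}
\]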

for all $a < x < u < y < b$; therefore, $\varphi$ is convex. Moreover, $\beta_\varphi(x) = g(x)$, $\alpha_\varphi(x) = g(x-)$, and (8.5) holds.
(c) If $\psi$ is another convex function for which (8.5) holds, then
\[
\beta_\varphi(y) - \beta_\psi(y) = \beta_\varphi(x) - \beta_\psi(x)
\]
for any $a < x < y < b$. Consequently, $\beta_\varphi = \beta_\psi + C$ for some constant $C$ and, for $x_0$ fixed,
\[
\psi(x) = \psi(x_0) - \varphi(x_0) - C(x - x_0) + \varphi(x).
\]
Example 8.1.5. Consider the convex functions $f(x) = \frac{1}{2}|x|$, $g(x) = x_-$, $h(x) = x_+$ and $p(x) = \frac{1}{2}x^2$ on $\mathbb{R}$. Then $\delta_0 = \mu_f = \mu_g = \mu_h$, whereas $\mu_p = \lambda$.

8.2. Jensen’s Inequality

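Theorem 8.2.1. (Jensen's inequality) Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\varphi : (a, b) \to \mathbb{R}$ be a convex function, where $-\infty \le a < b \le \infty$. If $X : (\Omega, \mathcal{F}) \to (a, b)$ is integrable then $\int_\Omega \varphi \circ X \, dP \in \mathbb{R} \cup \{+\infty\}$ and
\[
\varphi\Big(\int_\Omega X \, dP\Big) \le \int_\Omega (\varphi \circ X) \, dP. \tag{8.7}
\]
If $\varphi \circ X \in L_1$, equality in (8.7) holds iff there are constants $\alpha, \beta$ such that $\varphi(X) = \alpha X + \beta$ $P$–a.s. Hence, if $\varphi$ is strictly convex, equality in (8.7) holds iff $X$ is constant $P$–a.s.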

Proof. The continuity of $\varphi$ implies that $\varphi \circ X$ is measurable. Since $a < X < b$, if $x_0 = \int_\Omega X \, dP$ then $a < x_0 < b$ and for any $\rho \in [\alpha(x_0), \beta(x_0)]$,
\[
\varphi(X) \ge \varphi(x_0) + \rho \, (X - x_0). \tag{8.8}
\]
The nonincreasing monotonicity of the map $x \mapsto x_-$ along with $x_- \le |x|$ implies that $(\varphi \circ X)_- \le |\varphi(x_0)| + |\rho|(|X| + |x_0|)$. Whence, $(\varphi \circ X)_- \in L_1$ and the first assertion follows. Inequality (8.7) is obtained by integrating both sides of (8.8). The last two statements are consequences of Corollary 4.2.5(ii).
Example 8.2.2. Suppose $\alpha_1, \ldots, \alpha_n$ are nonnegative numbers with $\sum_{k=1}^{n} \alpha_k = 1$, let $a_1, \ldots, a_n$ be real numbers and let $b_k = e^{a_k}$. Then
\[
\prod_{k=1}^{n} b_k^{\alpha_k} = \exp\Big(\sum_{k=1}^{n} a_k \alpha_k\Big) \le \sum_{k=1}^{n} e^{a_k} \alpha_k = \sum_{k=1}^{n} b_k \alpha_k. \tag{8.9}
\]

More generally, if $(\Omega, \mathcal{F}, P)$ is a probability space and $X : \Omega \to [0, \infty]$ is a measurable function for which $\log \circ X$ is integrable, then
\[
\exp\Big(\int_\Omega (\log \circ X) \, dP\Big) \le \int_\Omega X \, dP. \tag{8.10}
\]
If $X \in L_1$, then equality in (8.10) occurs iff $X$ is a nonnegative constant $P$–a.s. The terms on the left and right of (8.10) are called geometric and arithmetic means respectively.
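The weighted geometric–arithmetic mean inequality (8.9) is easy to check on concrete numbers. The following sketch is illustrative only; the function names and the sample weights and values are assumptions chosen so that the results can be verified by hand.

```python
import math

def geometric_mean(values, weights):
    """Weighted geometric mean prod(b_k ** alpha_k), computed via exp of a weighted log sum."""
    return math.exp(sum(w * math.log(b) for b, w in zip(values, weights)))

def arithmetic_mean(values, weights):
    """Weighted arithmetic mean sum(b_k * alpha_k)."""
    return sum(w * b for b, w in zip(values, weights))

weights = [0.5, 0.25, 0.25]      # nonnegative, summing to 1
values = [1.0, 4.0, 16.0]        # positive numbers b_k = exp(a_k)

gm = geometric_mean(values, weights)    # equals 64 ** 0.25
am = arithmetic_mean(values, weights)   # equals 0.5 + 1 + 4 = 5.5

# Equality case: all b_k equal, so both means coincide.
gm_const = geometric_mean([3.0, 3.0, 3.0], weights)
am_const = arithmetic_mean([3.0, 3.0, 3.0], weights)
```

The second pair of values illustrates the equality statement: the two means agree when the underlying variable is constant.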

8.3. Lp spaces
In this section we will introduce the spaces $L_p$ starting from a mean $\|\cdot\|^*$ on a lattice ring $\mathcal{E}$. We will assume throughout this section that $\|\cdot\|^*$ is continuous along increasing sequences of nonnegative functions, that is,
\[
\|f_n\|^* \nearrow \|\sup_n f_n\|^* \tag{8.11}
\]
for any nondecreasing sequence $\{f_n\} \subset \overline{\mathbb{R}}^{\Omega}_+$. Two important examples of this instance are maximal means and the Daniell mean of a positive σ–continuous elementary integral $(\mathcal{E}, I)$.
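Definition 8.3.1. The p–norm, $0 < p \le \infty$, of a function $f \in \overline{\mathbb{R}}^{\Omega}$ is defined as
\[
\|f\|_p = \begin{cases} \big(\| |f|^p \|^*\big)^{1/p} & \text{if } 0 < p < \infty, \\ \inf\{\alpha > 0 : \|\{|f| > \alpha\}\|^* = 0\} & \text{if } p = \infty. \end{cases} \tag{8.12}
\]
Denote by $\mathcal{F}_p(\|\cdot\|^*)$ the collection of functions $f \in \overline{\mathbb{R}}^{\Omega}$ such that $\|f\|_p < \infty$.
It is clear from the definition above that $\|\cdot\|_1 = \|\cdot\|^*$, $\|f\|_p = 0$ iff $\|f\|^* = 0$, and $\big\|\{|f| > \|f\|_\infty\}\big\|^* = 0$.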
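Theorem 8.3.2. (Hölder's inequality) Suppose that $p, q \ge 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. For any $f, g \in \overline{\mathbb{R}}^{\Omega}$,
\[
\|f g\|_1 \le \|f\|_p \|g\|_q. \tag{8.13}
\]
If $1 < p < \infty$ and $0 < \|f\|_p \|g\|_q < \infty$, then equality in (8.13) occurs iff there is $c > 0$ such that $|f|^p = c|g|^q$ a.s. If $p = 1$ and $\|f\|_1 \wedge \|g\|_\infty < \infty$, then equality in (8.13) occurs iff $|f g| = \|g\|_\infty |f|$ a.s.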

Proof. First assume that $1 < p, q < \infty$. If $\|f\|_p = 0$ or $\|g\|_q = 0$ then $|f g| = 0$ a.s. and (8.13) holds. If $\|f\|_p \|g\|_q > 0$, then by the geometric–arithmetic mean inequality (8.9), applied with weights $\frac{1}{p}$ and $\frac{1}{q}$,
\[
\frac{|f(\omega)|}{\|f\|_p} \, \frac{|g(\omega)|}{\|g\|_q} \le \frac{1}{p} \Big(\frac{|f(\omega)|}{\|f\|_p}\Big)^p + \frac{1}{q} \Big(\frac{|g(\omega)|}{\|g\|_q}\Big)^q. \tag{8.14}
\]
The solidity of the mean $\|\cdot\|^*$ and (8.14) imply (8.13). If $0 < \|f\|_p \|g\|_q < \infty$, then by the geometric–arithmetic mean inequality, equality in (8.13) occurs iff $(|f| / \|f\|_p)^p = (|g| / \|g\|_q)^q$ a.s.
If $p = 1$, then $q = \infty$ and $|g| \le \|g\|_\infty$ a.s. By the solidity of the mean $\|\cdot\|^*$, we obtain that $\|f g\|_1 \le \|g\|_\infty \|f\|_1$. If $0 < \|f\|_1 \|g\|_\infty < \infty$, then equality in (8.13) is possible iff $|f||g| = |f| \|g\|_\infty$ a.s.
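Hölder's inequality and its equality case can be checked in the simplest setting of a finite weighted sum, that is, for the integral with respect to a measure on three points. This is only a sketch under that assumption; the helper names and the data are hypothetical.

```python
def p_norm(values, weights, p):
    """(sum_k w_k |v_k|^p)^(1/p): the p-norm for a finite measure with point masses w_k."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def holder_sides(f, g, weights, p):
    """Return (||f g||_1, ||f||_p ||g||_q) with q the conjugate exponent of p > 1."""
    q = p / (p - 1.0)
    lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))
    rhs = p_norm(f, weights, p) * p_norm(g, weights, q)
    return lhs, rhs

weights = [0.2, 0.3, 0.5]
f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
lhs, rhs = holder_sides(f, g, weights, p=3.0)

# Equality case: |g|^q proportional to |f|^p; with p = 3, q = 3/2 take g = |f|^2.
g_eq = [abs(a) ** 2 for a in f]
lhs_eq, rhs_eq = holder_sides(f, g_eq, weights, p=3.0)
```

In the second computation both sides equal the weighted sum of the cubes of |f|, in agreement with the equality condition stated in the theorem.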
Example 8.3.3. Recall the gamma function $\Gamma(x) = \int_0^\infty e^{-t} t^{x-1} \, dt$, $x > 0$. Notice that $\mu(dt) = t^{-1} e^{-t} \, dt$ is a Borel measure on $(0, \infty)$. Suppose $\frac{1}{p} + \frac{1}{q} = 1$ and define $f(t) = t^{x/p}$, $g(t) = t^{y/q}$. Hölder's inequality implies that $\Gamma\big(\frac{1}{p} x + \frac{1}{q} y\big) \le \Gamma(x)^{1/p} \, \Gamma(y)^{1/q}$. This means that $\log \circ \Gamma$ is convex.
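The log–convexity of the gamma function can be spot-checked with the standard library function `math.lgamma`, which returns the logarithm of the absolute value of Γ. The grid of test points below is an arbitrary assumption made for illustration; the sketch verifies the inequality only at those points.

```python
import math

def log_gamma_convexity_gap(x, y, p):
    """log G(x)/p + log G(y)/q - log G(x/p + y/q), which should be nonnegative."""
    q = p / (p - 1.0)
    return math.lgamma(x) / p + math.lgamma(y) / q - math.lgamma(x / p + y / q)

# Hypothetical sample grid of positive arguments and exponents p > 1.
gaps = [
    log_gamma_convexity_gap(x, y, p)
    for x in (0.5, 1.0, 2.5, 7.0)
    for y in (0.3, 3.0, 10.0)
    for p in (1.5, 2.0, 4.0)
]
```

Every entry of `gaps` is the difference between the right and left sides of the inequality after taking logarithms.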

Theorem 8.3.4. (Minkowski's inequality) Suppose $1 \le p \le \infty$ and let $f, g \in \overline{\mathbb{R}}^{\Omega}$ be such that $\{f = \pm\infty\} \cap \{g = \mp\infty\}$ is negligible. Then
\[
\|f + g\|_p \le \|f\|_p + \|g\|_p. \tag{8.15}
\]

Proof. First assume $1 \le p < \infty$. If $\|f\|_p \vee \|g\|_p = \infty$ or $\|f + g\|_p = 0$, there is nothing to prove. Suppose $\|f + g\|_p > 0$ and $\|f\|_p \vee \|g\|_p < \infty$. Since $|f + g|^p \le 2^p(|f|^p + |g|^p)$, it follows that $\|f + g\|_p < \infty$. Then, since
\[
|f + g|^p = |f + g| |f + g|^{p-1} \le |f| |f + g|^{p-1} + |g| |f + g|^{p-1},
\]
we conclude from Hölder's inequality that
\[
\|f + g\|_p^p \le \|f\|_p \|f + g\|_p^{p-1} + \|g\|_p \|f + g\|_p^{p-1} = \big(\|f\|_p + \|g\|_p\big) \|f + g\|_p^{p-1}.
\]
Dividing both sides by $\|f + g\|_p^{p-1}$ gives (8.15).

For $p = \infty$ notice that $\{|f + g| > a + b\} \subset \{|f| > a\} \cup \{|g| > b\}$. Hence, we have that $\|\{|f + g| > a + b\}\|^* \le \|\{|f| > a\}\|^* + \|\{|g| > b\}\|^*$. The conclusion follows by letting $a = \|f\|_\infty$ and $b = \|g\|_\infty$.
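The role of the hypothesis $p \ge 1$ in Minkowski's inequality can be seen on two-point examples with counting measure. The sketch below is an illustration under that assumption, with hypothetical helper names: the triangle inequality holds for p = 2 and fails for p = 1/2 with the same pair of functions.

```python
def p_norm(values, p):
    """(sum_k |v_k|^p)^(1/p) for counting measure on a finite set."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def minkowski_sides(f, g, p):
    """Return (||f + g||_p, ||f||_p + ||g||_p)."""
    total = [a + b for a, b in zip(f, g)]
    return p_norm(total, p), p_norm(f, p) + p_norm(g, p)

f = [1.0, 0.0]
g = [0.0, 1.0]

lhs_2, rhs_2 = minkowski_sides(f, g, 2.0)        # sqrt(2) versus 2
lhs_half, rhs_half = minkowski_sides(f, g, 0.5)  # (1 + 1)^2 = 4 versus 2
```

For p = 1/2 the left side exceeds the right side, so the expression is not subadditive and does not define a seminorm.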
Theorem 8.3.5. (i) k kp is a mean for E for any 1 ≤ p < ∞. (ii) If k k∗ is maximal,
then so is k kp for any 1 ≤ p < ∞. (iii) If p = ∞, k k∞ is absolute homogeneous, solid,
countably subadditive and continuous along increasing sequences.

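Proof. Suppose $1 \le p < \infty$. (i) Absolute homogeneity and solidity are easy to check. Finite subadditivity follows from Minkowski's inequality. To check countable subadditivity, let $\{f_n\} \subset \overline{\mathbb{R}}^{\Omega}_+$. Then $\big(\sum_{k=1}^{n} f_k\big)^p \nearrow \big(\sum_n f_n\big)^p$ pointwise. Since $\|\cdot\|^*$ is continuous along nonnegative monotone increasing sequences, $\big\| |\sum_{n=1}^{\infty} f_n|^p \big\|^* = \lim_n \big\| |\sum_{k=1}^{n} f_k|^p \big\|^*$. Therefore,
\[
\Big\| \sum_{n=1}^{\infty} f_n \Big\|_p = \lim_n \Big\| \sum_{k=1}^{n} f_k \Big\|_p \le \lim_n \sum_{k=1}^{n} \|f_k\|_p = \sum_{n=1}^{\infty} \|f_n\|_p.
\]
To check $\mathcal{E}$–continuity, let $\{\phi_n\} \subset \mathcal{E}_+$ be such that $\sup_n \big\|\sum_{k=1}^{n} \phi_k\big\|_p < \infty$. If $\psi_n = \sum_{k=1}^{n} \phi_k \in \mathcal{E}_+$, then $\psi_n^p \in \mathcal{E}^{\Sigma}_{\mathbb{R}}$ by Corollary 7.3.4 and, since $\psi_n^p \le \|\psi_n^{p-1}\|_u \, \psi_n$, $\psi_n^p \in L_1(\|\cdot\|^*)$. As $\{\psi_n^p\}$ is an increasing sequence of integrable functions, $\big\|\psi_n^p - \big(\sum_{n=1}^{\infty} \phi_n\big)^p\big\|^* \to 0$. The elementary inequality $1 + t^p \le (1 + t)^p$, where $p \ge 1$ and $t \ge 0$, shows that $\|\phi_n\|_p^p \le \|\psi_n^p\|^* - \|\psi_{n-1}^p\|^* \to 0$. This shows that $\|\cdot\|_p$ is a mean.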

(ii) Suppose that k k∗ is maximal and let k k♮p be the maximal mean that coincides with
k kp on E+ . If kf kp < ∞, then by the maximality of k k∗ , there exists 0 ≤ h ∈ ERΣ such that
|f |p ≤ h and k|f |p k∗ = khk∗ . As h1/p ∈ ERΣ and |f | ≤ h1/p , kf kp = kh1/p kp = kh1/p k♮p ≥
kf k♮p . Therefore, by Theorem 7.6.4, kf k♮p = kf kp .

(iii) Suppose $p = \infty$, and let $0 \le f_n \nearrow f$. Clearly $\sup_n \|f_n\|_\infty \le \|f\|_\infty$. Suppose that $b = \sup_n \|f_n\|_\infty < \infty$; then $\|\{f_n > b\}\|^* \le \big\|\{f_n > \|f_n\|_\infty\}\big\|^* = 0$. Since $\{f > b\} \subset \bigcup_n \{f_n > b\}$, it follows that $\|\{f > b\}\|^* = 0$; consequently $\|f\|_\infty \le b$. This shows that $\|\cdot\|_\infty$ is continuous along nonnegative increasing sequences. Subadditivity follows immediately.

The following result is an immediate consequence of Theorem 8.3.5.

Theorem 8.3.6. Suppose 1 ≤ p < ∞. Then,
(i) (Fp , k kp ) is a complete seminormed space that contains E.
(ii) The closure of E in (Fp , k kp ), denoted by Lp (k k∗ ), is a complete Stone lattice.
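(iii) If $(f_n : n \in \mathbb{N}) \subset \mathcal{F}_p$, $f \in \mathcal{F}_p$ and $\lim_n \|f_n - f\|_p = 0$, then there exists a subsequence $f_{n_k}$ that converges $\|\cdot\|^*$–almost surely to $f$.
(iv) If $(f_n : n \in \mathbb{N}) \subset L_p$ converges to $f$ $\|\cdot\|^*$–almost surely and $|f_n| \le g$ for some $g \in \mathcal{F}_p$, then $f \in L_p$ and $\|f_n - f\|_p \to 0$.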

Proof. Since k kp is a mean for E, statements (i), (ii) and (iii) hold by Theorem 6.3.12.
Statement (iv) is a direct consequence of Daniell–Lebesgue dominated convergence theorem.

Theorem 8.3.7. Let 1 ≤ p, r < ∞. A function f ∈ Lp iff f |f |(p/r)−1 ∈ Lr . In particular,
for all 1 ≤ p < ∞, 1A ∈ Lp iff 1A ∈ L1 .
Proof. If $f \in L_p$, then there is a sequence $\{\psi_n\} \subset \mathcal{E}$ such that $\sum_n \|\psi_n\|_p < \infty$ and $f = \sum_n \psi_n$ almost surely. Let $G(x) = x|x|^{(p/r)-1} 1_{\mathbb{R} \setminus \{0\}}(x)$ and define
\[
\Psi_n = G\Big(\sum_{k=0}^{n} \psi_k\Big).
\]
Clearly $\Psi_n \in \mathcal{E}^{\Sigma}$, $\Psi_n \to G(f) = f|f|^{(p/r)-1}$, and $|\Psi_n| \le h := \big(\sum_n |\psi_n|\big)^{p/r} \in \mathcal{F}_r$. By Corollary 7.3.4 $\Psi_n \in L_r$ and, by dominated convergence, $f|f|^{(p/r)-1} \in L_r$. The converse statement follows by interchanging $p$ with $r$ and $f$ with $f|f|^{(p/r)-1}$.
The last assertion follows from $G(1_A) = 1_A$.
Corollary 8.3.8. Assume 1 ≤ p < ∞. Then MR(k k∗ ) = MR(k kp ). Moreover, for any
real valued function f , f ∈ Lp if f ∈ MR ∩ Fp and {f 6= 0} is σ–finite. If k k∗ is maximal,
then Lp ∩ RΩ = MR ∩ Fp .

Proof. The first statement is a consequence of the fact that the sets in $L_p$ are the same for all $1 \le p < \infty$, and that $\|1_A\|_p^p = \|1_A\|^*$. The second follows from Theorem 7.3.3 and maximality of the mean.
Remark 8.3.9. When $p = \infty$, the closure of $\mathcal{E}$ in $\mathcal{F}_\infty$ is too small to be useful. Instead, we define $L_\infty = \mathcal{M}_{\mathbb{R}}(\|\cdot\|^*) \cap \mathcal{F}_\infty(\|\cdot\|^*)$.
Corollary 8.3.10. If $f \in L_p(\|\cdot\|^*)$, $1 \le p < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, then
\[
\|f\|_p = \max\{\|f g\|^* : g \in L_q,\ \|g\|_q = 1\}. \tag{8.16}
\]
If $p = \infty$ and $\{f \ne 0\}$ is σ–finite, then (8.16) holds with $q = 1$ and sup instead of max.

Proof. It suffices to assume that $\|f\|_p > 0$. If $\|g\|_q = 1$ then $\|f g\|^* \le \|f\|_p \|g\|_q \le \|f\|_p$ by Hölder's inequality.
If $1 \le p < \infty$ consider $g = 1_{\{f \ne 0\}}$ for $p = 1$, and $g = \frac{|f|^{p-1}}{\|f\|_p^{p-1}} 1_{\{f \ne 0\}}$ otherwise. Clearly $g \in L_q$, $\|g\|_q = 1$ and $\|f g\|^* = \|f\|_p$.
Suppose $p = \infty$ and $\{f \ne 0\}$ is σ–finite. For any $\varepsilon > 0$ there exists a set $E \in L_1$ with $E \subset \{|f| > \|f\|_\infty - \varepsilon\}$ and such that $\|E\|^* > 0$. Then $g = \frac{1}{\|E\|^*} 1_E \in L_1$, $\|g\|_1 = 1$ and $\|f g\|^* \ge \|f\|_\infty - \varepsilon$.
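The function g that attains the maximum in (8.16) for $1 < p < \infty$ can be computed explicitly for a finite weighted sum. The sketch below assumes a measure with four point masses; the helper names and data are hypothetical and serve only to illustrate that g has unit q–norm and that the pairing with f recovers the p–norm.

```python
def p_norm(values, weights, p):
    """(sum_k w_k |v_k|^p)^(1/p) for a finite measure with point masses w_k."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def norming_function(f, weights, p):
    """g = |f|^(p-1) / ||f||_p^(p-1) on {f != 0} and 0 elsewhere, for 1 < p < infinity."""
    norm = p_norm(f, weights, p)
    return [abs(v) ** (p - 1.0) / norm ** (p - 1.0) if v != 0 else 0.0 for v in f]

weights = [0.1, 0.2, 0.3, 0.4]
f = [1.0, -2.0, 0.0, 3.0]
p = 3.0
q = p / (p - 1.0)

g = norming_function(f, weights, p)
g_norm = p_norm(g, weights, q)                                   # should be 1
pairing = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))  # should equal ||f||_p
f_norm = p_norm(f, weights, p)
```

The identity $(p - 1) q = p$ is what makes the q–norm of g equal to one.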
Example 8.3.11. Suppose $(\mathcal{E}, I)$ is an elementary integral and $\|\cdot\|^*$ is Daniell's mean. Let $\{f_n\}$ be a sequence in $L_p(\|\cdot\|^*)$ such that $f_n \to f$ $\|\cdot\|^*$–almost surely. Then, $f_n \to f$ in $L_p$ iff $\|f_n\|_p \to \|f\|_p$. Necessity follows from
\[
\big| \|f_n\|_p - \|f\|_p \big| \le \|f_n - f\|_p.
\]
To show sufficiency, let $g_n := 2^{p-1}(|f_n|^p + |f|^p)$ and $g = 2^p |f|^p$. Then $|f_n - f|^p \le g_n$, $g_n \to g$ almost surely, and $\lim_n \|g_n\|^* = \lim_n 2^{p-1}\big(I(|f_n|^p) + I(|f|^p)\big) = \|g\|^*$. The conclusion follows from Lebesgue dominated convergence.

The most important instance of the theory of integration is when $(\mathcal{E}, I)$ is an elementary integral and $\|\cdot\|^*$ is its Daniell mean. Then, by the Stone–Daniell representation theorem, we can associate to the extension $I$ a measure µ so that $I(1_A) = \mu(A)$ for $A \in \mathcal{M}$ and $I(f) = \int f \, d\mu$ for all $f \in L_1(\|\cdot\|^*)$.
The extension to complex–valued functions represents no extra effort in view of Section 6.5.4. Almost by design, we have the following results:
Theorem 8.3.12. Let Ss be the collection of all measurable complex simple functions, and
S = Ss ∩ L1 . Then, S is dense in Lp (Ω, M , µ) for all 1 ≤ p < ∞, and Ss is dense in
L∞ (X).

Proof. It is enough to consider real–valued functions. Clearly $S \subset L_p$. Let $s_n$ be a
sequence of simple functions such that $s_n \to f$ with $|s_n| \nearrow |f|$ as in Lemma 4.2.2. Since
$|f - s_n|^p \le 2^p|f|^p$, by dominated convergence $\|f - s_n\|_p \to 0$ as $n \to \infty$. Thus f is in the
closure of S in $L_p$. If $f \in L_\infty$, let $S_s \ni s_n \to f$ with $|s_n| \nearrow |f|$ as in Lemma 4.2.2. Then
$\|f - s_n\|_\infty \le 2^{-n} \to 0$.
Theorem 8.3.13. Suppose X is a l.c.H. topological space and that µ is a regular Radon–
measure on M ⊃ B(X). Then C00 (X) is dense in Lp (X), (1 ≤ p < ∞).

Proof. Clearly $C_{00}(X) \subset L_p(X)$. Since the space S of integrable simple functions is dense
in $L_p(X)$, it suffices to show that $1_A$ can be approximated in $L_p(X)$ by functions
in $C_{00}(X)$ for any set A with $1_A \in S$. By regularity, for any ε > 0 there are $K \in \mathcal{K}$ and $G \in \mathcal{G}$ with K ⊂ A ⊂ G such that
µ(G \ K) < ε. If $f \in C_{00}(X)$ with $1_K \le f \le 1_G$ (such f exists by Urysohn’s lemma), then $\|1_A - f\|_p \le \|1_G - 1_K\|_p \le \varepsilon^{1/p}$.
Example 8.3.14. Let µ be a regular Radon–measure on $\mathbb{R}^n$. Then $L_p(\mathbb{R}^n, \mu)$ is separable for each
$1 \le p < \infty$. This follows from the density of $C_{00}(\mathbb{R}^n)$ in $L_p(\mathbb{R}^n, \mu)$ and the fact (see
Theorem 5.3.17) that there is a countable collection E of polynomials in $C_{00}(\mathbb{R}^n)$ which is
uniformly dense in $C_0(\mathbb{R}^n)$.
Example 8.3.15. We prove in this example that $L_\infty(\mathbb{R}^n, \lambda_n)$ is not separable. Suppose S
is dense in $L_\infty(\mathbb{R}^n, \lambda_n)$. We will show that S is necessarily uncountable. Fix r > 0 and for
each $x \in \mathbb{R}^n$ define $f_x = 1_{B(x;r)}$. As $\|f_x - f_y\|_\infty = 1$ whenever $x \ne y$, each g ∈ S may be in
at most one ball $B(f_x; \frac12)$. Since S is dense, each of these balls contains an element of S, so S is uncountable.
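To see why $\|f_x - f_y\|_\infty = 1$ for $x \ne y$ in Example 8.3.15, one can argue as follows. This is a sketch; $\triangle$ denotes symmetric difference and $\overline{B}$ a closed ball, symbols introduced here for the illustration.

```latex
% f_x - f_y takes values in {-1, 0, 1}, and |f_x - f_y| is the indicator of a symmetric difference.
\[
|f_x - f_y| = 1_{B(x;r)\,\triangle\,B(y;r)}, \qquad
\lambda_n\big(B(x;r)\setminus \overline{B}(y;r)\big) > 0 \quad (x\neq y).
\]
% The set B(x;r) \setminus \overline{B}(y;r) is open, and it is nonempty when x != y:
% with e = (x-y)/|x-y| and max(0, r-|x-y|) < t < r, the point z = x + t e satisfies
% |z-x| = t < r and |z-y| = |x-y| + t > r.
\[
\text{Hence}\quad \|f_x - f_y\|_\infty
= \operatorname*{ess\,sup}\, 1_{B(x;r)\,\triangle\,B(y;r)} = 1 .
\]
```

If some $g$ belonged to both $B(f_x;\tfrac12)$ and $B(f_y;\tfrac12)$ with $x \ne y$, the triangle inequality would give $\|f_x - f_y\|_\infty < 1$; so the uncountably many balls are pairwise disjoint.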
Example 8.3.16. Clearly $C([-1,1]) \subset L_\infty([-1,1], \lambda_1)$. Let $f = 1_{[0,1]}$. If $0 < \varepsilon < \frac14$, the
ball B(f; ε) in $L_\infty([-1,1])$ does not contain any function in C([−1, 1]). This shows that
C([−1, 1]) is not dense in $(L_\infty([-1,1]), \|\ \|_\infty)$.
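Example 8.3.17. Let $H = \operatorname{span}\{\gamma_t(x) = e^{ixt} : x, t \in \mathbb{R}\}$. If $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ is a finite measure
space, then H is dense in $L_p(\mu)$ for all $1 \le p < \infty$.

Proof. It suffices to show that for any ε > 0 and $f \in C_{00}(\mathbb{R})$, there is g ∈ H such that
$\|f - g\|_p < \varepsilon$. Let A > 0 be large enough so that supp(f) ⊂ [−A, A] and

$\mu([-A,A]^c) < \Big(\dfrac{\varepsilon}{2(\|f\|_u + 1)}\Big)^p$.

By Stone–Weierstrass, $\operatorname{span}\{\gamma_{\pi n/A}(x) = e^{i\pi n x/A} : n \in \mathbb{Z}\}$ is uniformly dense
in the space of continuous periodic functions of period 2A. Therefore, there is a 2A–periodic g ∈ H such that

$\|f - g\|_{[-A,A],u} < 1 \wedge \varepsilon/\big(2\mu(\mathbb{R})^{1/p}\big)$.

In particular, $\|g\|_u \le \|f\|_u + 1$ by periodicity. Then

$\|f - g\|_p \le \|(f-g)1_{[-A,A]}\|_p + \|g 1_{[-A,A]^c}\|_p \le \dfrac{\varepsilon}{2}\Big(\dfrac{\mu([-A,A])}{\mu(\mathbb{R})}\Big)^{1/p} + (\|f\|_u + 1)\,\mu([-A,A]^c)^{1/p} < \varepsilon$.

Therefore, H is dense in $L_p$, $1 \le p < \infty$.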
Theorem 8.3.18. Let (Ω, F, µ) be a measure space and 1 ≤ q ≤ ∞. Suppose f is a
measurable function such that

$M_f = \sup\Big\{\int fg\,d\mu : g \in L_q(\mu, \mathbb{C}),\ \|g\|_q = 1\Big\} < \infty$.

If $\{f \ne 0\}$ is σ–finite then $f \in L_p(\mu, \mathbb{C})$, where $\frac1p + \frac1q = 1$, and $\|f\|_p = M_f$.

Proof. For p = 1 the statement follows immediately by taking $g = \frac{\overline{f}}{|f|}\,1_{\{|f| \ne 0\}}$.
Assume 1 < p < ∞. For any E ∈ F with $E \subset \{|f| \ne 0\}$ and µ(E) < ∞, we will show
that $\|f1_E\|_p \le M_f$. This would imply that $f \in L_p$ and that $\|f\|_p \le M_f$. The reverse
inequality follows from Hölder’s inequality. Let $f_n$ be a sequence of simple functions such
that $|f_n| \le |f|$ and $f_n \to f$. Then $h_n = f_n 1_E$ belongs to $L_s$ for all s > 0, $|h_n| \le |f|1_E$ and
$h_n \to f1_E$. If

$\varphi_n = \dfrac{\overline{f}}{|f|}\,\dfrac{|h_n|^{p-1}}{\|h_n\|_p^{p-1}}\,1_E$,

then $\|\varphi_n\|_q = 1$ and, by Fatou’s lemma,

$\|f1_E\|_p \le \liminf_n \|h_n\|_p = \liminf_n \int |\varphi_n h_n|\,d\mu \le \liminf_n \int \varphi_n f\,d\mu \le M_f$.

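For p = ∞, let ε > 0 and $A_\varepsilon = \{|f| > M_f + \varepsilon\}$. If $\mu(A_\varepsilon) > 0$, then for any $E \subset A_\varepsilon$ with
$0 < \mu(E) < \infty$, let $g = \frac{\overline{f}}{|f|}\,\frac{1_E}{\mu(E)}$ so that $g \in L_1$ and $\|g\|_1 = 1$. Then

$\int fg\,d\mu = \dfrac{1}{\mu(E)}\int_E |f|\,d\mu \ge M_f + \varepsilon$,

contradicting the definition of $M_f$. Therefore $\|f\|_\infty \le M_f$. The inequality $M_f \le \|f\|_\infty$ follows by Hölder’s
inequality.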

8.4. Riesz representation.

Functions that differ on a negligible set are essentially the same for all practical purposes.
Thus, we will identify two functions f and f′ iff $\|\{f \ne f'\}\|^* = 0$. It is straightforward to
check that if $f, f' \in \mathcal{L}_p$ and f = f′ almost surely, then $\|f\|_p = \|f'\|_p$. We define the space
$L_p$ as the collection of equivalence classes in $\mathcal{L}_p$, that is, $L_p = \{\dot f : f \in \mathcal{L}_p\}$.
For the remainder of this section, we will consider only extended–real valued functions.
Given equivalence classes $\dot g$ and $\dot f$, we say $\dot g \le \dot f$ iff g ≤ f ‖ ‖–a.s.; similarly, $\dot g < \dot f$ iff g < f
‖ ‖–a.s. It is clear that $L_p$, with $\|\dot f\|_p = \|f\|_p$ for any $f \in \dot f$, is a Banach lattice closed
under chopping by letting $\dot f \wedge \dot g$, $\dot f \vee \dot g$ and $\dot f \wedge 1$ be the equivalence classes of f ∧ g, f ∨ g
and f ∧ 1 respectively.
A collection $G \subset L_p$ is said to be bounded above in $L_p$ if there is $\dot h \in L_p$ such that
$\dot g \le \dot h$ for all $\dot g \in G$. Such $\dot h$ is called an upper bound for G. If $\dot g^* \in L_p$ is an upper bound of
$G \subset L_p$ such that $\dot g^* \le \dot f$ for any other upper bound $\dot f \in L_p$ of G, then $\dot g^*$ is said to be the
least upper bound or supremum of G. Clearly, if a least upper bound of G exists, then
it is unique. The least upper bound of G is denoted by $\bigvee G$.
The following result shows that $L_p(\|\ \|)$ is actually order complete, that is, any
nonempty family $G \subset L_p(\|\ \|)$ that has an upper bound in $L_p$ has a least upper bound
in $L_p$. We will use the following general result about ordered vector spaces.
Lemma 8.4.1. Suppose (V, ≤) is a real vector space with an order compatible with the
linear structure, that is, for all v, v′, w ∈ V and $\alpha \in \mathbb{R}_+$,

$v \le v' \implies v + w \le v' + w,\quad \alpha v \le \alpha v'$.

Suppose that $\bigvee B$ exists for any nonempty set $B \subset V_+ = \{v \in V : 0 \le v\}$ that is bounded
above and closed under finite suprema. Then, V is order complete.

Proof. Suppose B ⊂ V is nonempty and bounded above, and let $g_0 \in B$. Clearly h is an
upper bound of B iff $h - g_0$ is an upper bound of $B - g_0$. As $0 \in B - g_0$, f is an upper
bound of $B - g_0$ iff f is an upper bound of $(B - g_0)_+ := \{(g - g_0) \vee 0 : g \in B\}$. Hence, in
order to prove that B admits a supremum, it is enough to further assume that $B \subset V_+$. Let
$\hat B$ denote the collection of all finite suprema of elements of B. Clearly $B \subset \hat B \subset V_+$, and for
any h ∈ V, h is an upper bound of $\hat B$ iff h is an upper bound of B. As $\hat B$ is closed under
taking finite suprema, by hypothesis $\hat B$, and so B, has a supremum.

Theorem 8.4.2. Suppose $\|\ \|^*$ is the Daniell mean of an elementary integral (E, I) or a
strictly increasing mean on E. Then $L_p(\|\ \|^*)$ is an order complete Banach lattice for $1 \le p < \infty$.
If p = ∞ and $\|\ \|^*$ is σ–finite, then $L_\infty(\|\ \|^*)$ is also an order complete Banach lattice.

Proof. The solidity of the $\|\ \|_p^*$ seminorm implies that $L_p(\|\ \|^*)$ is a Banach lattice for all
1 ≤ p ≤ ∞. The difficult part is to show order completeness. By Lemma 8.4.1 it is enough
to show that any nonempty set G of positive elements in $L_p$ that is bounded above and
closed under finite suprema has a supremum in $L_p$. Let $r = \sup_{\dot g \in G}\|\dot g\|_p < \infty$.

Suppose 1 ≤ p < ∞. As $\|\ \|^*$ is strictly increasing (see Theorem 7.6.9), so is $\|\ \|_p$. There is
a sequence $(\dot g_n) \subset G$ with $\lim_n \|\dot g_n\|_p = r$ such that $0 \le g_n \le g_{n+1}$ for some choice $g_n \in \dot g_n$.
By Daniell’s monotone convergence, $g_n \to g^* := \sup_n g_n$ in $\|\ \|_p$–mean and a.s.

We claim that ġ ∗ satisfies the conditions of the theorem. First, we show that ġ ∗ is an upper
bound. Since kġn ∨ ġkp ≤ r = kġ ∗ kp for any ġ ∈ G, kġ ∗ ∨ ġkp ≤ kġ ∗ kp . As k kp is strictly
increasing, ġ ≤ ġ ∗ for all ġ ∈ G.

We now show that ġ ∗ is the least upper bound of G. If f˙ is another upper bound of G, then
so is ġ ′ = ġ ∗ ∧ f˙. Then r ≤ kġ ′ kp ≤ kġ ∗ kp = r. Since k kp is strictly increasing, it follows
that ġ ′ = ġ ∗ , i.e., ġ ∗ ≤ f˙.

Suppose p = ∞ and $\|\ \|^*$ is σ–finite. Suppose $h \in L_\infty$ is an upper bound of G. By σ–finiteness,
there exists a countable collection of pairwise disjoint sets $\{A_n\} \subset \mathcal{L}_1$ with $\|A_n\| > 0$ and
$1_{\bigcup_n A_n} = 1$. For each n ∈ N, $G_n = \{\dot g\,\dot 1_{A_n} : \dot g \in G\} \subset L_1$, and thus $G_n$ is bounded above in $L_1$
by $\dot h\,\dot 1_{A_n}$. It follows that $G_n$ admits a least upper bound $\dot g_n^*$ with $\dot g_n^*\,\dot 1_{A_n^c} = 0$. For each n
let $g_n^* \in \dot g_n^*$ with $g_n^* = 0$ on $\Omega \setminus A_n$; then $g^* = \sum_n g_n^* \in$ MR(‖ ‖∗) and, as $|g^*| \le \|h\|_\infty$ a.s.,
$\dot g^* \in L_\infty$. If $\dot f \in L_\infty$ is an upper bound of G in $L_\infty$, then $\dot f\,\dot 1_{A_n}$ is an upper bound of $G_n$ in
$L_1$; hence, $g^* \le f$ a.s. on $A_n$. Consequently, $\dot g^*$ is the least upper bound of G in $L_\infty$.

Suppose m is a positive σ–continuous elementary integral on a Stone lattice E ⊂ Bb (Ω),

and let k k∗ be Daniell’s mean. For any 1 ≤ p ≤ ∞, let L∗p denote the space of continuous
linear functionals on Lp (k k∗ ). It is easy to check that kΛk := supkf kp =1 |Λf |, Λ ∈ L∗p ,
defines a complete norm on L∗p .
Suppose that either 1 < p ≤ ∞, or p = 1 and $\|\ \|^*$ is σ–finite. Let q be such that $\frac1p + \frac1q = 1$.
A function $g \in \mathcal{L}_q(\|\ \|)$ defines a continuous linear functional $\Lambda_g : \dot f \mapsto \int fg\,dm = m(fg)$,
$f \in \dot f \in L_p$, and by Corollary 8.3.10,

$\|\Lambda_g\| := \sup_{\|f\|_p = 1} |\Lambda_g f| = \|g\|_q$.

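The positive part $g_+$ of g can be extracted in terms of the behavior of $\Lambda_g$ on $L_p^+$. Indeed, for
any $\dot f \in L_p^+$ and any $\gamma \in \mathcal{L}_q$ such that $0 \le \gamma \le g_+$,

$\Lambda_\gamma(f) = \int f\gamma\,dm = \int f1_{\{\gamma>0\}}\gamma\,dm \le \int f1_{\{\gamma>0\}}g\,dm = \Lambda_g\big(f1_{\{\gamma>0\}}\big)$.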

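Conversely, if $\gamma \in \mathcal{L}_q^+$ satisfies $\Lambda_\gamma(f) \le \Lambda_g\big(f1_{\{\gamma>0\}}\big)$ for all $f \in \mathcal{L}_p^+$, then $\gamma \le g_+$ $\|\ \|^*$–a.s.
Therefore, $\dot g_+$ is the least upper bound of the family

(8.17) $G = \big\{\dot\gamma \in L_q^+ : \Lambda_\gamma(\dot f) \le \Lambda_g\big(f1_{\{\gamma>0\}}\big) \text{ for all } \dot f \in L_p^+\big\}$.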

The next result shows that any continuous linear functional on Lp is of the form Λg for
some g ∈ Lq .
Theorem 8.4.3. (Riesz–representation theorem) Suppose (E, m) is a positive σ–continuous
elementary integral and let $\|\ \|^*$ be its Daniell mean. If either 1 < p < ∞, or p = 1 and $\|\ \|^*$
is σ–finite, then for any $\Lambda \in L_p^*$ there exists a unique $g \in L_q$ such that $\Lambda = \Lambda_g$.
Remark 8.4.4. Theorem 8.4.3 states that if 1 < p < ∞, or p = 1 and $\|\ \|^*$ is σ–finite, then $L_p^*(\|\ \|^*, \mathbb{R})$ and
$L_q(\|\ \|^*, \mathbb{R})$, where $\frac1p + \frac1q = 1$, are isometrically isomorphic spaces, that is, the map $g \mapsto \Lambda_g$ is a
linear isometry from $L_q$ onto $L_p^*$.

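Proof. Let $G = \big\{\gamma \in L_q^+ : \Lambda_\gamma(f) \le \Lambda\big(f1_{\{\gamma>0\}}\big) \text{ for all } \dot f \in L_p^+\big\}$. We claim that G is
nonempty, order–directed and $\|\ \|_q$–bounded. First notice that $G \ne \emptyset$ as it contains $\dot 0$. If
$\gamma_1, \gamma_2 \in G$, then

$\Lambda_{\gamma_1\vee\gamma_2}(f) = \int f1_{\{\gamma_2<\gamma_1\}}\gamma_1\,dm + \int f1_{\{\gamma_2\ge\gamma_1\}}\gamma_2\,dm$
$\qquad \le \Lambda_{\gamma_1}\big(f1_{\{\gamma_2<\gamma_1\}}1_{\{\gamma_1>0\}}\big) + \Lambda_{\gamma_2}\big(f1_{\{\gamma_2\ge\gamma_1\}}1_{\{\gamma_2>0\}}\big) \le \Lambda\big(f1_{\{\gamma_1\vee\gamma_2>0\}}\big)$,

which shows that $\gamma_1 \vee \gamma_2 \in G$. For any γ ∈ G,

$\|\gamma\|_q = \sup\Big\{\int f\gamma\,dm : f \in L_p^+,\ \|f\|_p = 1\Big\} \le \sup\big\{\Lambda\big(f1_{\{\gamma>0\}}\big) : f \in L_p^+,\ \|f\|_p = 1\big\}$
$\qquad \le \sup\big\{\Lambda(f) : f \in L_p,\ \|f\|_p = 1\big\} = \|\Lambda\| < \infty$.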

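By Theorem 8.4.2(i)&(iii), G admits a least upper bound $\dot u \in L_q^+$. For any $\dot f \in L_p^+$,
$\{\dot f\dot\gamma : \dot\gamma \in G\} \subset L_1$ has least upper bound $\dot f\dot u$ and there is $(\dot\gamma_n) \subset G$, $\gamma_n \le \gamma_{n+1}$, such that
$\|\dot f(\dot\gamma_n - \dot u)\|^* \to 0$. As a consequence, $1_{\{\gamma_n>0\}}f \to 1_{\{u>0\}}f$ $\|\ \|^*$–a.s. and, by dominated
convergence,

$\Lambda_u(f) = \lim_n \Lambda_{\gamma_n}(f) \le \lim_n \Lambda\big(f1_{\{\gamma_n>0\}}\big) = \Lambda\big(f1_{\{u>0\}}\big)$.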

This shows that u̇ ∈ G .

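We will show that $\Lambda_u - \Lambda$ is a positive linear functional on $L_p$ by way of contradiction. If there
is $\dot f \in L_p^+(\|\ \|^*)$ with $\Lambda_u(\dot f) < \Lambda(\dot f)$, then there exists $1_A \in \mathcal{L}_1$ such that $\Lambda_u(1_A) < \Lambda(1_A)$.
Let r > 0 be small enough so that $r\int 1_A\,dm < \Lambda(1_A) - \Lambda_u(1_A)$ and define $\widehat\Lambda = r\,m1_A + \Lambda_u - \Lambda$,
that is, $\widehat\Lambda(f) = r\,m(1_Af) + \Lambda_u(f) - \Lambda(f)$.
We claim that there is an integrable set C ⊂ A such that for any integrable set $B \subset D := A \setminus C$,
$\widehat\Lambda(1_B) \le 0$. Set

$\alpha_0 := \sup\{\widehat\Lambda(1_B) : B \in \mathcal{L}_1,\ B \subset A\} \ge 0$.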

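If $\alpha_0 = 0$ set D = A; otherwise there exists an integrable set $A_1 \subset A$ such that $\widehat\Lambda(1_{A_1}) > \alpha_0/2$.
Proceeding by induction, suppose we have found integrable sets $A_1, \ldots, A_n$ such that

$A_j \subset A \setminus \bigcup_{k=1}^{j-1} A_k$

and $\widehat\Lambda(1_{A_j}) > \alpha_{j-1}/2 > 0$, for j = 1, ..., n, where

$\alpha_m = \sup\Big\{\widehat\Lambda(1_B) : B \in \mathcal{L}_1,\ B \subset A \setminus \bigcup_{k=1}^{m} A_k\Big\}$

for all m ∈ N. If $\alpha_n = 0$ we stop at $A_n$; otherwise, we choose an integrable set
$A_{n+1} \subset A \setminus \bigcup_{k=1}^{n} A_k$ with $\widehat\Lambda(1_{A_{n+1}}) > \alpha_n/2$. By monotone convergence $\bigvee_{k=1}^{n} 1_{A_k}$ converges to
$1_C := \bigvee_n 1_{A_n}$ in $\|\ \|_p$–mean and pointwise. Hence

$0 \le \sum_n \dfrac{\alpha_{n-1}}{2} \le \sum_n \widehat\Lambda(1_{A_n}) = \widehat\Lambda(1_C) < \infty$

and so, $\alpha_n \to 0$. For any integrable set $B \subset D := A \setminus C$, $B \subset A \setminus \bigcup_{k=1}^{n} A_k$ for all n.
Therefore, $\widehat\Lambda(1_B) \le \alpha_n \to 0$.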

For any simple function f ≥ 0 we have that $\widehat\Lambda(f1_D) \le 0$. Since integrable simple functions
are dense in $L_p$,

(8.18) $\int_D (u + r1_D)f\,dm \le \Lambda(1_Df)$

for all $f \in L_p^+$. As $\dot u \in G$,

(8.19) $\int_{D^c} (u + r1_D)f\,dm = \int uf1_{D^c}\,dm \le \Lambda\big(f1_{D^c}1_{\{u>0\}}\big)$.

Combining (8.18) and (8.19) we obtain

$\int (u + r1_D)f\,dm \le \Lambda\big(f1_{\{u+r1_D>0\}}\big)$

for all $f \in L_p^+$. Hence $\dot u + r\dot 1_D \in G$, and we must have that $\|1_D\|^* = 0$. Then $0 \le \widehat\Lambda(1_C) =
\widehat\Lambda(1_A) < 0$, which is a contradiction.

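If Λ is positive, then $\Lambda = \Lambda_u$, for

$\Lambda_u(f) \ge \Lambda(f) \ge \Lambda\big(f1_{\{u>0\}}\big) \ge \Lambda_u(f)$

for all $f \in L_p^+$. For general Λ we consider $\Lambda_u - \Lambda$ and obtain $v \in L_q^+$ such that $\Lambda_u - \Lambda = \Lambda_v$.
Then $\Lambda = \Lambda_g$ with $u = g_+$ and $v = g_-$.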

8.5. Reverse Borel–Cantelli theorem

The following is a very useful result in probability theory. We will assume that (Ω, F, P)
is a probability space and for any $f \in L_1$, we use $E[f] := \int f\,dP$ to denote the integral or
expected value of f under P.
Lemma 8.5.1. If $0 \ne f \in L_2$ and E[f] ≥ 0, then for any 0 < λ < 1

(8.20) $P\big[f > \lambda E[f]\big] \ge (1-\lambda)^2\,\dfrac{(E[f])^2}{E[|f|^2]}$.

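Proof. By Hölder’s inequality

$E[f] = \int_{\{f \le \lambda E[f]\}} f\,dP + \int_{\{f > \lambda E[f]\}} f\,dP \le \lambda E[f] + \|f\|_2\sqrt{P\big[f > \lambda E[f]\big]}$.

Rearranging this inequality gives (8.20).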
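The passage from the Hölder estimate to (8.20) is a two-line rearrangement. A sketch of the algebra, assuming $E[f] > 0$ (when $E[f] = 0$ the right-hand side of (8.20) vanishes and there is nothing to prove):

```latex
% Move \lambda E[f] to the left-hand side, then square both (nonnegative) sides.
% E[|f|^2] > 0 because f is not the zero element of L_2.
\begin{align*}
(1-\lambda)\,E[f] &\le \|f\|_2\,\sqrt{P\big[f>\lambda E[f]\big]},\\
(1-\lambda)^2\,(E[f])^2 &\le E[|f|^2]\;P\big[f>\lambda E[f]\big],\\
P\big[f>\lambda E[f]\big] &\ge (1-\lambda)^2\,\frac{(E[f])^2}{E[|f|^2]} .
\end{align*}
```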
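Lemma 8.5.2. (Kochen–Stone) Let $\{A_n\} \subset \mathcal{F}$. If $\sum_n P[A_n] = \infty$, then

(8.21) $P\Big[\bigcap_{n\ge1}\bigcup_{k\ge n} A_k\Big] \ge \limsup_n \dfrac{\big(\sum_{k=1}^n P[A_k]\big)^2}{\sum_{k=1}^n\sum_{m=1}^n P[A_k \cap A_m]}$.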

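Proof. Without loss of generality, we assume that $P[A_n] > 0$ for all n. Let $f_n = \sum_{k=1}^n 1_{A_k}$,
$f = \sum_{n\ge1} 1_{A_n}$, and for any 0 < λ < 1, define $B_{n,\lambda} = \{f_n > \lambda E[f_n]\}$. Since $E[f_n] \nearrow \infty$, observe that

$A = \bigcap_{n\ge1}\bigcup_{k\ge n} A_k = \{f = \infty\} \supset \bigcap_{n\ge1}\bigcup_{k\ge n} B_{k,\lambda} = B_\lambda$;

then, by (8.20), we obtain

$P[A] \ge P[B_\lambda] \ge \limsup_{n\to\infty} P[B_{n,\lambda}] \ge (1-\lambda)^2 \limsup_n \dfrac{(E[f_n])^2}{E[f_n^2]}$.

Letting $\lambda \searrow 0$ gives (8.21).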

The next result is a partial converse to the Borel-Cantelli theorem discussed in Corol-
lary 4.3.4.
Theorem 8.5.3. (Borel–Cantelli, II) Suppose $\{A_n\} \subset \mathcal{F}$ is such that for any $i \ne j$,
$P[A_i \cap A_j] \le P[A_i]P[A_j]$. If $\sum_n P[A_n] = \infty$, then $P\big[\bigcap_{n\ge1}\bigcup_{k\ge n} A_k\big] = 1$.

Proof. Denote $A = \bigcap_{n\ge1}\bigcup_{k\ge n} A_k$. Let $a_n = \sum_{k=1}^n P[A_k]$, $b_n = \sum_{i\ne j,\ 1\le i,j\le n} P[A_i]P[A_j]$, and
$c_n = \sum_{k=1}^n P^2[A_k]$. By Kochen–Stone’s lemma we have

$P[A] \ge \limsup_n \dfrac{c_n + b_n}{a_n + b_n}$.

From $a_n^2 = c_n + b_n \le a_n + b_n$ and $a_n \nearrow \infty$, it follows that $b_n \nearrow \infty$ and $\lim_n \frac{c_n}{b_n} = 0 = \lim_n \frac{a_n}{b_n}$.
Therefore, P[A] = 1.
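The limits used at the end of the proof of Theorem 8.5.3 can be spelled out. The following sketch uses only $0 \le P[A_k] \le 1$ and $a_n \nearrow \infty$, and is valid for $n$ large enough that $a_n > 1$:

```latex
% c_n <= a_n because P[A_k]^2 <= P[A_k]; b_n = a_n^2 - c_n by expanding the square of a_n.
\begin{align*}
c_n &= \sum_{k=1}^n P[A_k]^2 \le \sum_{k=1}^n P[A_k] = a_n,
& b_n &= a_n^2 - c_n \ge a_n(a_n - 1) \longrightarrow \infty,\\
0 &\le \frac{c_n}{b_n} \le \frac{a_n}{b_n} \le \frac{1}{a_n - 1} \longrightarrow 0,
& \frac{c_n+b_n}{a_n+b_n} &= \frac{c_n/b_n + 1}{a_n/b_n + 1} \longrightarrow 1 .
\end{align*}
```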

8.6. L0 and convergence in Measure.

We will assume throughout this section that k k is a σ–finite mean for a vector lattice
E ⊂ Bb (Ω).
Definition 8.6.1. Let (S, ρ) be a metric space. A sequence {fn : n ∈ N} ⊂ S Ω converges
in measure to f ∈ S Ω if for every δ > 0 and A ∈ L1

(8.22) $\lim_n \big\|\{\rho(f, f_n) > \delta\} \cap A\big\| = 0$.
n

Remark 8.6.2. Unless S is separable, if both fn and f are Borel–measurable, the map
ω 7→ ρ(fn (ω), f (ω)) may fail to be Borel–measurable (see Theorem 9.4.3 in Section 9.4).
However, when {fn : n ∈ N} ⊂ MS we have that {ρ(fn , f ) > δ} ∈ MR, and so the set
in (8.22) is k k–integrable for any A ∈ L1 .
Remark 8.6.3. Convergence in measure is of particular interest when (Ω, M, µ) is a finite
measure space. In this case, $\{f_n : n \in \mathbb{N}\} \subset S^\Omega$ converges in measure to f if and only if

$\lim_n \mu^*\big(\rho(f_n, f) > \delta\big) = 0$

for any δ > 0. Here we use $\mu^*$ to denote the Daniell mean associated to µ which, as we know,
coincides with Carathéodory’s outer measure associated to µ. If (S, ρ) is a separable metric
space and $\{f_n : n \in \mathbb{N}\} \subset$ MS, then $\{\rho(f_n, f) > \delta\} \in \mathcal{L}_1(\mu)$. Consequently
$\mu^*\big(\rho(f_n, f) > \delta\big) = \mu\big(\rho(f_n, f) > \delta\big)$.

Denote by L0 (k k) the space of all almost surely defined real (or complex)–valued
measurable functions on Ω. We will show that there exists a topology that is consistent
with convergence in measure of functions.
Theorem 8.6.4. Let {fn : n ∈ N} ⊂ R^Ω.
(i) If fn converges in mean to f , then fn converges in measure to f .
Suppose {fn : n ∈ N} ⊂ MR.
(ii) If fn converges k k–a.s. to f , then fn converges in measure to f .
(iii) If fn converges in measure to f then, f ∈ MR and there exists a subsequence
{fmj : j ∈ N} that converges to f a.s.

Proof. (i) If $\|f_n - f\| \to 0$, then for any δ > 0, $\big\|\{|f - f_n| > \delta\}\big\| \le \frac{1}{\delta}\|f - f_n\| \to 0$. This
shows that convergence in mean implies convergence in measure.

(ii) If $\{f_n : n \in \mathbb{N}\} \subset$ MR converges pointwise almost surely to f and $A \in \mathcal{L}_1$ then, by
Egorov’s theorem, f ∈ MR and for any ε > 0 there is $\mathcal{L}_1 \ni A_0 \subset A$ with $\|A \setminus A_0\| < \varepsilon$ on
which convergence is uniform. Hence, given δ > 0 there is N ∈ N such that $\|f_n - f\|_{u,A_0} < \delta$
whenever n ≥ N. Therefore

$\big\|\{|f_n - f| > \delta\} \cap A\big\| \le \|A \setminus A_0\| < \varepsilon, \quad (n \ge N)$.

(iii) Let $\{A_k : k \in \mathbb{N}\} \subset \mathcal{L}_1$ be a partition of Ω (here we use the σ–finiteness assumption on
the mean). For each k, there is a sequence $\{m_{k,j} : j \in \mathbb{N}\}$ with $\lim_{j\to\infty} m_{k,j} = \infty$ such that

$\big\|A_k \cap \{|f_n - f| > 2^{-j}\}\big\| < 2^{-j}, \quad (n \ge m_{k,j})$.

Let $m_i = \max\{m_{k,j} : 1 \le k, j \le i\}$; then

$\big\|A_k \cap \{|f_{m_j} - f| > 2^{-j}\}\big\| < 2^{-j}, \quad (j \ge k)$.

It follows that $N_k = A_k \cap \bigcap_{\ell}\bigcup_{j\ge\ell}\{|f_{m_j} - f| > 2^{-j}\}$ is a $\|\cdot\|$–negligible set. If $x \in A_k \setminus N_k$,
then there is an integer $\ell_{x,k}$ such that $|f_{m_j}(x) - f(x)| \le 2^{-j}$ for $j \ge \ell_{x,k}$. Therefore
$\{f_{m_j} : j \in \mathbb{N}\}$ converges pointwise to f on $\bigcup_k (A_k \setminus N_k)$.

For the remainder of this section we will assume that (Ω, M, µ) is a finite measure space.
For any measurable complex (or extended real) valued function f define

$\|f\|_0 = \inf\{\varepsilon > 0 : \mu^*(|f| > \varepsilon) \le \varepsilon\}$,

where $\mu^*$ is the Daniell mean associated to µ. When f ∈ MR, $\mu^*$ may be replaced by µ.
Lemma 8.6.5. Let f and g be elements of $\overline{\mathbb{R}}^{\Omega}$.
(i) $\mu^*(|f| > \|f\|_0) \le \|f\|_0 \le \mu(\Omega)$.
(ii) If $\mu^*(\{f = \pm\infty\} \cap \{g = \mp\infty\}) = 0$, then $\|f + g\|_0 \le \|f\|_0 + \|g\|_0$.
(iii) $\|f\|_0 \le \|g\|_0$ whenever |f| ≤ |g|.
(iv) $\|rf\|_0 \le (r \vee 1)\|f\|_0$ for r > 0.
(v) If f ∈ MR with µ(|f| = ∞) = 0, then $\lim_{r\to0}\|rf\|_0 = 0$.

Proof. (i) Since $\mu^*(|f| > \mu(\Omega)) \le \mu(\Omega)$, $\|f\|_0 \le \mu(\Omega)$. Let $\varepsilon_n \searrow \|f\|_0$ with $\mu^*(|f| >
\varepsilon_n) \le \varepsilon_n$. By Theorem 7.6.7, $\mu^*$ is maximal and continuous along arbitrary nonnegative
nondecreasing sequences. Hence, as $\{|f| > \varepsilon_n\} \nearrow \{|f| > \|f\|_0\}$,

$\mu^*(|f| > \|f\|_0) = \sup_n \mu^*(|f| > \varepsilon_n) \le \|f\|_0$.

Therefore $\mu^*(|f| > \|f\|_0) \le \|f\|_0 \le \mu(\Omega)$.

(ii) From |f + g| ≤ |f | + |g| it follows that

{|f + g| > kf k0 + kgk0 } ⊂ {|f | > kf k0 } ∪ {|g| > kgk0 } .
Consequently
µ∗ (|f + g| > kf k0 + kgk0 ) ≤ µ∗ (|f | > kf k0 ) + µ∗ (|g| > kgk0 ) ≤ kf k0 + kgk0 ,
whence we conclude that kf + gk0 ≤ kf k0 + kgk0 .

(iii) If |f| ≤ |g|, then {|f| > ε} ⊂ {|g| > ε} for any ε > 0. Thus
µ∗ (|f | > kgk0 ) ≤ µ∗ (|g| > kgk0 ) ≤ kgk0 ,
whence we obtain that kf k0 ≤ kgk0 .

(iv) Suppose 0 < r ≤ 1. Whenever µ∗ (|f | > a) ≤ a, µ∗ (r|f | > a) = µ∗ (|f | > a/r) ≤
µ∗ (|f | > a) ≤ a. Hence krf k0 ≤ kf k0 . Suppose r > 1. As µ∗ (r|f | > rkf k0 ) = µ∗ (|f | >
kf k0 ) ≤ kf k0 ≤ rkf k0 , krf k0 ≤ rkf k0 .
(v) Suppose kf k0 6= 0. For any ε > 0 limr→0 µ(|f | > ε/r) = µ(|f | = ∞) = 0. Hence, there
is δ > 0 such that 0 < r < δ implies µ(r|f | > ε) < ε. Therefore, krf k0 ≤ ε whenever
0 < r < δ.

The functional $\|\cdot\|_0$ is not a pseudonorm on $\mathcal{L}_0$; however, $d_0(f, g) = \|f - g\|_0$ defines
a pseudo–metric on the space of all µ–a.s. finite measurable functions. The space $L_0$ is
defined by identifying functions f and f′ in $\mathcal{L}_0$ such that $\mu(f \ne f') = 0$.
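A one-line computation illustrates why $\|\cdot\|_0$ fails to be homogeneous, and hence is not a pseudonorm. The example is hypothetical: $A$ is any measurable set and $c > 0$ any constant, neither taken from the text.

```latex
% For f = c 1_A:  \mu(|f| > \varepsilon) = \mu(A) if \varepsilon < c,  and = 0 if \varepsilon >= c.
\[
\|c\,1_A\|_0
= \inf\{\varepsilon>0 : \mu(|c\,1_A|>\varepsilon)\le\varepsilon\}
= \inf\Big([c,\infty)\cup\{\varepsilon\in(0,c): \mu(A)\le\varepsilon\}\Big)
= c\wedge\mu(A).
\]
% Consequently, if 0 < \mu(A) < c and r > 1, scaling f by r does not scale \|f\|_0:
\[
\|r\,c\,1_A\|_0 = (rc)\wedge\mu(A) = \mu(A) = \|c\,1_A\|_0 \neq r\,\|c\,1_A\|_0 .
\]
```

This is consistent with the one-sided bound $\|rf\|_0 \le (r \vee 1)\|f\|_0$ of Lemma 8.6.5(iv), which in this example is a strict inequality.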
Theorem 8.6.6. The space (L0 , d0 ) is a complete metric linear space; moreover,
(8.23) lim kfn − f k0 = 0 iff lim µ(|fn − f | > δ) = 0
n n
for all δ > 0.

Proof. If $\|f\|_0 = 0$, then µ(|f| > 0) = 0, so f = 0 µ–a.s. Clearly $d_0$ is symmetric, and since
$\|\cdot\|_0$ satisfies the triangle inequality, we conclude that $d_0$ is a metric on $L_0$. The continuity
of $L_0 \times L_0 \to L_0 : (f, g) \mapsto f + g$ is a consequence of the triangle inequality in Lemma 8.6.5(ii).
The continuity of the map $(F \times L_0) \to L_0 : (\lambda, f) \mapsto \lambda f$ follows from

$\|\lambda f - \lambda_0 f_0\|_0 \le \|\lambda(f - f_0)\|_0 + \|(\lambda - \lambda_0)f_0\|_0$

and Lemma 8.6.5(iv)&(v).
Suppose $\|f_n - f\|_0 \to 0$ as n → ∞. Let δ > 0 be fixed. For any 0 < ε ≤ δ, there is an
integer N such that $\|f_n - f\|_0 < \varepsilon$ whenever n ≥ N, and so

$\mu\big(|f_n - f| > \delta\big) \le \mu\big(|f_n - f| > \varepsilon\big) \le \mu\big(|f_n - f| > \|f_n - f\|_0\big) \le \|f_n - f\|_0 < \varepsilon$

for all n ≥ N. This shows that $f_n$ converges to f in µ–measure.
Conversely, suppose $f_n$ converges to f in µ–measure. Then, for any ε > 0 there exists an
integer N such that $\mu(|f_n - f| > \varepsilon) < \varepsilon$ for all n ≥ N. Hence, $\|f_n - f\|_0 \le \varepsilon$ whenever
n ≥ N and (8.23) follows.
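It remains to show that $d_0$ is a complete metric on $L_0$. If $\{f_n : n \in \mathbb{N}\}$ is a Cauchy sequence,
then there is an increasing sequence of integers $n_\ell$ such that
$\sup_{k,m \ge n_\ell} \mu(|f_k - f_m| > 2^{-\ell}) \le 2^{-\ell}$. Hence $\sum_\ell \mu(|f_{n_{\ell+1}} - f_{n_\ell}| > 2^{-\ell}) < \infty$, and by the Borel–Cantelli theorem, for µ–a.a. ω ∈ Ω,
there is an integer N (depending on ω) such that $|f_{n_{\ell+1}}(\omega) - f_{n_\ell}(\omega)| \le 2^{-\ell}$ for all ℓ ≥ N.
Consequently, $f := \lim_{\ell\to\infty} f_{n_\ell} = \sum_k (f_{n_k} - f_{n_{k-1}}) + f_{n_1}$ exists µ–a.s. Therefore, $\{f_n\}$ has a
subsequence $\{f_{n_\ell}\}$ that converges µ–a.s., and so in $L_0$. Being Cauchy, the whole sequence converges to f in $L_0$.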
Lemma 8.6.7. Suppose µ(Ω) < ∞. Let $f_n$, f be measurable functions with values in a separable metric
space (S, ρ) and let F : [0, ∞) → [0, ∞) be a bounded continuous nondecreasing function with
F(t) = 0 iff t = 0. Then, $f_n$ converges in measure to f if and only if $\lim_n \int F(\rho(f_n, f))\,d\mu = 0$.

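Proof. Let ε > 0 be arbitrary and $M := \|F\|_\infty$. Notice that

$F(\varepsilon)1_{\{\rho(f_n,f)>\varepsilon\}} \le F(\rho(f_n, f)) \le F(\varepsilon) + M1_{\{\rho(f_n,f)>\varepsilon\}}$

and denote $D(f_n, f) = \int F(\rho(f_n, f))\,d\mu$. Then

(8.24) $F(\varepsilon)\mu(\{\rho(f_n, f) > \varepsilon\}) \le D(f_n, f) \le F(\varepsilon)\mu(\Omega) + M\mu(\{\rho(f_n, f) > \varepsilon\})$.

Necessity follows by letting $n \to \infty$ and then $\varepsilon \searrow 0$. Sufficiency follows by letting $n \to \infty$.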

Lemma 8.6.7 and Exercise 8.9.19 allow us to put a metric on the space M(S) of all
measurable functions with values in (S, ρ) whose convergence is equivalent to convergence in measure.
Theorem 8.6.8. Suppose that µ(Ω) < ∞. Let (S, ρ) be a separable metric space, and F :
[0, ∞) → [0, ∞) be a bounded nondecreasing continuous subadditive function with F(t) = 0
iff t = 0. For any pair of measurable functions f, g with values in S, define

(8.25) $D_F(f, g) = \int F(\rho(f, g))\,d\mu$.

Then $(M(S), D_F)$ is a metric space and $f_n$ converges in measure to f if and only if
$\lim_n D_F(f_n, f) = 0$. In addition, if ρ is a complete metric, then so is $D_F$.
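Two standard choices of $F$ satisfying the hypotheses of Theorem 8.6.8 are $F(t) = t \wedge 1$ and $F(t) = t/(1+t)$; the second is offered here only as an illustration. Its subadditivity, and the way subadditivity yields the triangle inequality for $D_F$, can be checked as follows (the function $h$ below is a third measurable function introduced for the check):

```latex
% For s, t >= 0, use 1+s+t >= 1+s and 1+s+t >= 1+t in the two denominators.
\[
F(s+t) = \frac{s+t}{1+s+t}
= \frac{s}{1+s+t} + \frac{t}{1+s+t}
\le \frac{s}{1+s} + \frac{t}{1+t} = F(s) + F(t).
\]
% Triangle inequality for D_F: F is nondecreasing and subadditive, so pointwise
\[
F(\rho(f,h)) \le F\big(\rho(f,g)+\rho(g,h)\big) \le F(\rho(f,g)) + F(\rho(g,h)),
\]
% and integrating with respect to \mu gives D_F(f,h) <= D_F(f,g) + D_F(g,h).
```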

Proof. Only the last statement requires a proof. Suppose (S, ρ) is complete and let $(f_n)$ be a
Cauchy sequence in $(M(S), D_F)$. Then by (8.24)

$\limsup_{M\to\infty}\sup_{n,m\ge M} \mu(\rho(f_n, f_m) > \varepsilon) \le \dfrac{1}{F(\varepsilon)}\limsup_{M\to\infty}\sup_{n,m\ge M} D_F(f_n, f_m) = 0$.

Hence, there are integers $n_k < n_{k+1}$ such that $\sup_{n,m\ge n_k}\mu(\rho(f_n, f_m) > 2^{-k}) < 2^{-k}$, so
$\sum_k \mu(\rho(f_{n_{k+1}}, f_{n_k}) > 2^{-k}) < \infty$. By the Borel–Cantelli lemma, the set $A = \{\rho(f_{n_{k+1}}, f_{n_k}) >
2^{-k}, \text{ i.o.}\}$ has µ–measure zero; hence, $\{f_{n_k}\}$ is a Cauchy sequence in (S, ρ) µ–a.s. Completeness
of (S, ρ) implies that $f_{n_k}$ converges µ–a.s. to a measurable function f. Dominated
convergence implies that $\lim_k D_F(f_{n_k}, f) = 0$. Therefore $\lim_n D_F(f_n, f) = 0$, for in any
metric space, a Cauchy sequence that has a convergent subsequence is in fact convergent.
Theorem 8.6.9. Let (Ω, M , µ) be a finite measure space, and (S, ρ) be a separable metric
space.
(i) If fn (ω) converges to f (ω) pointwise µ–a.s. then fn converges to f in measure.
(ii) If fn converges in measure to f , then there is a subsequence fnk such that fnk (ω) →
f (ω) pointwise for µ–a.s. all ω ∈ Ω.

Proof. (i) $f_n \to f$ a.s. is equivalent to $\rho(f_n, f) \wedge 1 \to 0$ a.s. The conclusion follows from
Lemma 8.6.7 with F(t) = t ∧ 1, and dominated convergence.
(ii) Choose a subsequence $n_k < n_{k+1}$ such that $\mu(\{\rho(f_{n_k}, f) > 1/k\}) < 1/2^k$. Then
$\sum_k \mu(\{\rho(f_{n_k}, f) > 1/k\}) < \infty$ and, by Borel–Cantelli, $f_{n_k}$ converges pointwise to f outside
the set $A = \{\rho(f_{n_k}, f) > 1/k, \text{ i.o.}\}$, which has measure zero.

Corollary 8.6.10. Assume µ(Ω) < ∞. Then $f_n$ converges in measure to f if and only if
for any subsequence $f_{n'}$ there is a further subsequence $f_{n'_k} \to f$ pointwise µ–a.s.

Proof. Necessity follows from Theorem 8.6.9(ii).

Conversely, suppose that $f_n$ fails to converge to f in measure. Then there is a subsequence
$f_{n'}$ such that $\inf_{n'}\int \rho(f_{n'}, f) \wedge 1\,d\mu > 0$. By hypothesis there is a further subsequence $f_{n'_k} \to f$
µ–a.s. By dominated convergence $\lim_k \int \rho(f_{n'_k}, f) \wedge 1\,d\mu = 0$, contradicting the choice of
$f_{n'}$.
Theorem 8.6.11. Let (Ω, M , µ) be a measure space and let {fn } be a sequence of measur-
able functions with values in a complete separable metric space (S, ρ). The sequence {fn }
converges µ–a.s. iff for any A ∈ L1 and ε > 0
(8.26) $\lim_n \mu\big[A \cap \{\sup_{k\ge1}\rho(f_{n+k}, f_n) > \varepsilon\}\big] = 0$.

Proof. Let wn = supk,m≥n ρ(fk , fm ). The completeness of (S, ρ) implies that {fn } con-
verges µ–a.s. iff wn converges to zero µ–a.s.

If {fn } converges µ–a.s., then from supk ρ(fn+k , fn ) ≤ wn , we conclude that both wn and
supk ρ(fn+k , fn ) converge to 0 µ–a.s., and so in measure.

Conversely, if (8.26) holds, then from $w_n \le 2\sup_k \rho(f_{n+k}, f_n)$, we get that

$\mu[A \cap \{w_n > 2\varepsilon\}] \le 2\mu\big[A \cap \{\sup_k \rho(f_{n+k}, f_n) > \varepsilon\}\big] \to 0$.

Thus, $w_n \to 0$ in measure. Since $w_n$ is nonnegative and nonincreasing, we have in fact that
$w_n \to 0$ µ–a.s., and so $f_n$ converges µ–a.s.

8.7. Uniform Integrability

It is easy to check that a sequence $\{f_n\} \subset L_1$ may converge in measure and yet fail to
converge in $L_1$. We will show that under a certain uniformity condition both types of
convergence are equivalent.
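A standard example of this failure, on $[0,1]$ with Lebesgue measure $\lambda$ (the specific sequence is an illustration and is not taken from the text):

```latex
% f_n = n 1_{[0,1/n]} converges to 0 in measure, but its L_1 norm stays equal to 1.
\[
f_n = n\,1_{[0,1/n]}:\qquad
\lambda(|f_n| > \delta) \le \tfrac1n \longrightarrow 0 \quad(\delta>0),
\qquad
\|f_n - 0\|_1 = \int_0^{1} n\,1_{[0,1/n]}\,d\lambda = 1 \not\to 0 .
\]
```

Any $L_1$ limit of $f_n$ would also be a limit in measure, hence equal to $0$ almost everywhere, so the sequence has no limit in $L_1$ at all.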
Definition 8.7.1. A family $I \subset L_1(\Omega, \mathcal{F}, \mu)$ is uniformly integrable if for any ε > 0,
there exist functions $g, h \in L_1$ with g ≤ h such that

(8.27) $\sup_{f\in I} d(f, [g, h]) < \varepsilon$.

Given numbers c ≤ a < b ≤ d, it is easy to check that

$|x - x_c^d| \le |x - x_a^b| \quad (x \in \mathbb{R})$.

Therefore, a family $I \subset L_1$ is uniformly integrable iff

(8.28) $\inf_{g\in L_1^+}\sup_{f\in I} d(f, [-g, g]) = 0$.

As $|f - f_{-g}^{g}| = (|f| - g)_+$, it follows that I is uniformly integrable iff

(8.29) $\inf_{0\le g\in L_1}\sup_{f\in I}\int (|f| - g)_+\,d\mu = 0$.
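The identity $|f - f_{-g}^{g}| = (|f| - g)_+$ can be verified pointwise. The sketch below assumes the truncation notation $x_a^b := (x \vee a) \wedge b$ for $a \le b$, which is how the symbols $x_c^d$ and $f_{-g}^{g}$ are read here.

```latex
% Pointwise check for a real number x and a level a >= 0, with x_{-a}^{a} = (x \vee (-a)) \wedge a.
\[
x - x_{-a}^{a} =
\begin{cases}
x - a, & x > a,\\
0, & |x| \le a,\\
x + a, & x < -a,
\end{cases}
\qquad\Longrightarrow\qquad
|x - x_{-a}^{a}| = (|x| - a)_+ .
\]
```

Applying this with $x = f(\omega)$ and $a = g(\omega)$ and integrating gives $d(f, [-g, g]) = \int (|f| - g)_+\,d\mu$, provided $d$ denotes the $L_1$ distance to the truncation, as the passage from (8.28) to (8.29) suggests.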

Theorem 8.7.2. A family $I \subset L_1(\Omega, \mathcal{F}, \mu)$ is uniformly integrable if and only if

(8.30) $\inf_{0\le\tilde g\in L_1}\sup_{f\in I}\int_{\{|f|>\tilde g\}} |f|\,d\mu = 0$.

If in addition µ(Ω) < ∞, then uniform integrability is equivalent to either of the following
conditions:
(i) $\inf_{a>0}\sup_{f\in I}\int (|f| - a)_+\,d\mu = 0$
(ii) $\inf_{a>0}\sup_{f\in I}\int_{\{|f|>a\}} |f|\,d\mu = 0$

Proof. Since $(|f| - g)_+ = (|f| - g)_+1_{\{|f|>g\}} \le |f|1_{\{|f|>g\}}$, (8.29) follows from (8.30).
Suppose that (8.29) holds, and for each ε > 0 choose $0 \le g_\varepsilon \in L_1$ so that

$\sup_{f\in I}\int (|f| - g_\varepsilon)_+\,d\mu < \dfrac{\varepsilon}{2}$.

If $\tilde g_\varepsilon = 2g_{\varepsilon/2}$, then $|f|1_{\{|f|>\tilde g_\varepsilon\}} \le 2(|f| - g_{\varepsilon/2})_+$. Therefore,

$\sup_{f\in I}\int_{\{|f|>\tilde g_\varepsilon\}} |f|\,d\mu < \dfrac{\varepsilon}{2}$

and (8.30) follows.
Assume in addition that µ(Ω) < ∞. Repeating the arguments used in the proof of the
equivalence between (8.30) and (8.29) shows that (i) and (ii) are equivalent. Clearly (i)
implies (8.29), since the infimum in (i) is taken over a smaller set of integrable functions,
namely the set of all constants. It remains to show that (8.29) implies (ii). For ε > 0, let
$g_\varepsilon$ and $\tilde g_\varepsilon$ be as before, and choose $a_\varepsilon > 0$ so that $\int_{\{\tilde g_\varepsilon>a_\varepsilon\}} \tilde g_\varepsilon\,d\mu < \varepsilon/2$. From

$|f|1_{\{|f|>a_\varepsilon\}} \le |f|1_{\{|f|>\tilde g_\varepsilon\}} + \tilde g_\varepsilon 1_{\{\tilde g_\varepsilon>a_\varepsilon\}}$,

it follows that

$\sup_{f\in I}\int_{\{|f|>a_\varepsilon\}} |f|\,d\mu \le \varepsilon$.

Therefore (ii) holds.
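Condition (ii) of Theorem 8.7.2 gives a quick test for uniform integrability. As an illustration (the sequence below is a hypothetical example on $[0,1]$ with Lebesgue measure $\lambda$), $f_n = n\,1_{[0,1/n]}$ is bounded in $L_1$ but not uniformly integrable, whereas any family dominated by a fixed integrable function $g$ is:

```latex
% For a > 0 and every n > a, the set {|f_n| > a} is all of [0, 1/n]; also \|f_n\|_1 = 1 for all n.
\[
\sup_n \int_{\{|f_n|>a\}} |f_n|\,d\lambda
\;\ge\; \int_0^{1/n} n\,d\lambda = 1 \quad (n > a),
\qquad\text{so}\qquad
\inf_{a>0}\,\sup_n \int_{\{|f_n|>a\}} |f_n|\,d\lambda = 1 \neq 0 .
\]
% If |f| <= g for every f in the family, with g integrable, then {|f| > a} \subset {g > a} and,
% by dominated convergence,
\[
\sup_{f}\int_{\{|f|>a\}} |f|\,d\mu \le \int_{\{g>a\}} g\,d\mu \longrightarrow 0
\quad (a\to\infty).
\]
```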
Lemma 8.7.3. If µ is σ–finite, then there exists h > 0 with $h \in L_1(\Omega, \mathcal{F}, \mu)$.

Proof. By assumption, there exists a countable partition $\{A_n : n \in \mathbb{N}\}$ of Ω with $0 <
\mu(A_n) < \infty$ for all n. The function

$h = \sum_{n=1}^{\infty} \dfrac{2^{-n}}{\mu(A_n)}\,1_{A_n}$

satisfies the desired condition.

Theorem 8.7.4. Suppose $I \subset L_1(\Omega, \mathcal{F}, \mu)$. (a) If

(i) $\sup_{f\in I}\|f\|_1 < \infty$, and
(ii) there is $0 \le h \in L_1(\Omega, \mathcal{F}, \mu)$ such that for every ε > 0, there is $\delta_\varepsilon > 0$ so that
$A \in \mathcal{F}$ and $\int_A h\,d\mu < \delta_\varepsilon$ implies that $\sup_{f\in I}\int_A |f|\,d\mu < \varepsilon$,

then I is uniformly integrable. (b) Conversely, if µ is σ–finite and I is uniformly integrable,
then (i) and (ii) hold. (c) If µ(Ω) < ∞, then (ii) is equivalent to

(ii)′ For every ε > 0 there is $\delta_\varepsilon > 0$ with $\sup_{f\in I}\int_A |f|\,d\mu < \varepsilon$ if $\mu(A) < \delta_\varepsilon$.

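Proof. (b) Suppose µ is σ–finite and I is uniformly integrable. For any ε > 0 let $0 \le
\tilde g_\varepsilon \in L_1$ be such that $\sup_{f\in I}\int_{\{|f|>\tilde g_\varepsilon\}} |f|\,d\mu < \varepsilon$. Since $|f| \le |f|1_{\{|f|>\tilde g_1\}} + \tilde g_1$, (i) follows by
integration.

By Lemma 8.7.3 there exists a strictly positive function $h \in L_1(\mu)$. As $1_{\{\tilde g_{\varepsilon/3}>nh\}} \to 0$
pointwise as n → ∞, by dominated convergence there is an integer $n_\varepsilon$ such that

$\int_{\{\tilde g_{\varepsilon/3}>n_\varepsilon h\}} \tilde g_{\varepsilon/3}\,d\mu < \dfrac{\varepsilon}{3}$.

For any $A \in \mathcal{F}$, $|f|1_A \le |f|1_{\{|f|>\tilde g_{\varepsilon/3}\}} + \tilde g_{\varepsilon/3}1_{\{\tilde g_{\varepsilon/3}>n_\varepsilon h\}} + n_\varepsilon h1_A$. Hence, if $\delta_\varepsilon := \frac{\varepsilon}{3n_\varepsilon}$, then
$\int_A h\,d\mu < \delta_\varepsilon$ implies that $\sup_{f\in I}\int_A |f|\,d\mu \le \varepsilon$.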

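(a) Suppose $h \in L_1^+$ satisfies (ii) and let $\alpha = \sup_{f\in I}\int |f|\,d\mu$. For any c > 0

$\int_{\{|f|>ch\}} h\,d\mu \le \dfrac1c\int_{\{|f|>ch\}} |f|\,d\mu \le \dfrac1c\int |f|\,d\mu \le \dfrac1c\,\alpha$.

Consequently, if $c > \alpha/\delta_\varepsilon$ then $\int_{\{|f|>ch\}} h\,d\mu < \delta_\varepsilon$; thus, $\sup_{f\in I}\int_{\{|f|>ch\}} |f|\,d\mu < \varepsilon$.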

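(c) Suppose µ(Ω) < ∞. Assume that (ii) holds. For ε > 0 let $\delta = \delta_\varepsilon > 0$ be as in (ii). Since
$1_{\{h\ge k\}} \searrow 0$ as $k \nearrow \infty$, by monotone convergence we can choose $k_\varepsilon$ so that $\int_{\{h>k_\varepsilon\}} h\,d\mu < \frac{\delta}{2}$.
If $\mu(A) < \frac{\delta}{2k_\varepsilon}$, then $\int_A h\,d\mu < \delta$ and (ii)′ follows.

Assume (ii)′ holds. Then (ii) holds with h ≡ 1.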

Theorem 8.7.5. Suppose that µ is σ–finite and let fn ∈ L1 (Ω, F , µ), n ∈ N. The following
statements are equivalent.
(i) There is f ∈ L1 to which fn converges in L1 .
(ii) fn is a Cauchy sequence in L1 .
(iii) {fn } is uniformly integrable and there is a measurable function f to which fn
converges in measure.

Proof. The equivalence of (i) and (ii) is contained in Theorem 8.3.6.

Suppose (i) holds. The Markov–Chebyshev inequality implies that $\mu(|f_n - f| > \varepsilon) \le
\frac{1}{\varepsilon}\|f_n - f\|_1$. Convergence in measure follows. Given ε > 0, choose $n_\varepsilon$ so that $\|f_n - f_m\|_1 < \varepsilon$
for all $n, m \ge n_\varepsilon$. Since $x \mapsto (a - x)_+$ is nonincreasing, letting $g_\varepsilon = \max\{|f_1|, \ldots, |f_{n_\varepsilon}|\}$,
we obtain

$\int (|f_n| - g_\varepsilon)_+\,d\mu < \varepsilon$

for all n. Therefore, $(f_n)_n$ is uniformly integrable and (iii) holds.

It remains to prove that (iii) implies (i). Suppose the contrary. Then, there is ε > 0 and a
subsequence $(f_{n_k})$ such that

(8.31) $\inf_k \|f_{n_k} - f\|_1 \ge \varepsilon$.

Since $f_n$ converges to f in measure, we may assume without loss of generality that $f_{n_k}$
converges to f µ–a.s. By Fatou’s lemma and Theorem 8.7.4 it follows that $f \in L_1$,
since $\int |f|\,d\mu \le \liminf_k \int |f_{n_k}|\,d\mu < \infty$. Thus, the sequence $\{f_{n_k} - f\}_k$ is also uniformly
integrable. Choose $0 \le g \in L_1$ so that $\sup_n \int_{\{|f_n-f|>g\}} |f_n - f|\,d\mu < \frac{\varepsilon}{2}$. If $g_k = |f_{n_k} - f| \wedge g$,
then $\lim_k g_k = 0$ a.s. Since $g - g_k \ge 0$, Fatou’s lemma gives

(8.32) $0 \le \limsup_k \int g_k\,d\mu = \int g\,d\mu - \liminf_k \int (g - g_k)\,d\mu \le 0$.

Since $\{|f_{n_k} - f| > g_k\} = \{|f_{n_k} - f| > g\}$, we have that

(8.33) $|f_{n_k} - f| \le |f_{n_k} - f|1_{\{|f_{n_k}-f|>g\}} + g_k$.

Integrating both sides of (8.33) and letting $k \to \infty$ gives $\limsup_k \|f_{n_k} - f\|_1 \le \frac{\varepsilon}{2}$, which
contradicts (8.31). Therefore, $\|f_n - f\|_1 \to 0$.

We conclude this section with a well known result that is in fact equivalent to Theo-
rem 8.7.5.
Theorem 8.7.6. (Vitali’s convergence theorem) Suppose 1 ≤ p < ∞, let $\{f_n : n \in
\mathbb{N}\} \subset L_p(\mu)$ and let f be $\mathcal{F}$–measurable. Then, $\|f_n - f\|_p \to 0$ iff $\{f_n\}$ and f satisfy the
following conditions:
(i) $f_n$ converges to f in µ–measure.
(ii) For any ε > 0, there is $E \in \mathcal{F}$ with µ(E) < ∞ such that

$\sup_n \int_{\Omega\setminus E} |f_n|^p\,d\mu < \varepsilon$.

(iii) For any ε > 0, there exists δ > 0 such that

$\sup_n \int_A |f_n|^p\,d\mu < \varepsilon$

whenever µ(A) < δ.

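Proof. Suppose $\|f - f_n\|_p \to 0$. Then (i) holds clearly. For ε > 0, there is $n_\varepsilon \in \mathbb{N}$ such
that $\|f - f_n\|_p < \frac{\varepsilon^{1/p}}{2}$ for all $n \ge n_\varepsilon$. Let $A_\varepsilon$ and $B_\varepsilon$ be measurable sets of finite measure
such that

$\int_{\Omega\setminus A_\varepsilon} |f|^p\,d\mu < \dfrac{\varepsilon}{2^p}, \qquad \max_{1\le j\le n_\varepsilon}\int_{\Omega\setminus B_\varepsilon} |f_j|^p\,d\mu < \varepsilon$.

Set $C = A_\varepsilon \cup B_\varepsilon$. Then, for any $n \ge n_\varepsilon$

$\|1_{\Omega\setminus C}f_n\|_p \le \|f_n - f\|_p + \|1_{\Omega\setminus C}f\|_p < \varepsilon^{1/p}$.

Thus (ii) holds. Similarly, choose $\delta_\varepsilon > 0$ such that $\mu(A) < \delta_\varepsilon$ implies that

$\int_A |f|^p\,d\mu < \dfrac{\varepsilon}{2^p}, \qquad \max_{1\le j\le n_\varepsilon}\int_A |f_j|^p\,d\mu < \varepsilon$.

Then, for $n \ge n_\varepsilon$

$\|1_Af_n\|_p \le \|f_n - f\|_p + \|1_Af\|_p < \varepsilon^{1/p}$.

Thus (iii) holds.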
Conversely, suppose (i)–(iii) hold. We will show that any subsequence of $(f_n)$ has a subsequence
which converges to f in $L_p$. By (i), without loss of generality, suppose $f_n \to f$
µ–a.s.
Given ε > 0, choose $E \in \mathcal{F}$ with µ(E) < ∞ and δ > 0 so that

$\sup_n \int_{\Omega\setminus E} |f_n|^p\,d\mu < \dfrac{\varepsilon}{4^p}, \qquad \sup_n \int_A |f_n|^p\,d\mu < \dfrac{\varepsilon}{4^p}$

whenever µ(A) < δ. By Fatou’s lemma, $\int_{\Omega\setminus E} |f|^p\,d\mu \le \frac{\varepsilon}{4^p}$ and $\int_A |f|^p\,d\mu \le \frac{\varepsilon}{4^p}$ whenever
µ(A) < δ. By Egorov’s theorem, there is a measurable set C ⊂ E with µ(E \ C) < δ such
that $\|f_n - f\|_{u,C} \to 0$. Consequently

$\|f - f_n\|_p \le \|(f - f_n)1_{\Omega\setminus E}\|_p + \|(f - f_n)1_{E\setminus C}\|_p + \|(f - f_n)1_C\|_p$
$\qquad \le \varepsilon^{1/p} + \|f - f_n\|_{u,C}\,\mu(C)^{1/p}$.

It follows that $\limsup_n \|f - f_n\|_p \le \varepsilon^{1/p}$. Therefore, $\|f - f_n\|_p \to 0$.

8.8. Lyapunov’s convexity theorem

Definition 8.8.1. Suppose $\|\cdot\|$ is a mean for a Stone lattice or a ring $\mathcal{E} \subset B_b(S)$. An atom of $\|\cdot\|$ is a set $A \subset S$ with $\|A\| > 0$ such that for any set $B \in \mathcal{M}(\|\cdot\|)$, $B \subset A$ implies that either $\|B\| = 0$ or $\|A\| = \|B\|$. When $\|\cdot\|$ admits no atoms, we say that $\|\cdot\|$ is nonatomic.

The notion of an atom is more relevant in the context of the Daniell mean $\|\cdot\|^*$ of a positive $\sigma$-continuous elementary integral $(\mathcal{E}, I)$, for in this setting $\|\cdot\|^*$ is $\sigma$-additive on the family of measurable sets $\mathcal{M}(\|\cdot\|^*)$.
Theorem 8.8.2. (Saks) Let $\|\cdot\|^*$ be the Daniell mean associated to a positive $\sigma$-continuous elementary integral $(\mathcal{E}, I)$.
(i) If $E \in L_1$ and $\|E\|^* > 0$ then, for any $\varepsilon > 0$ there exists a finite collection of pairwise disjoint measurable sets $E_1, \ldots, E_{n_\varepsilon}$ such that $E = \bigcup_{j=1}^{n_\varepsilon} E_j$ and, for each $j$, either $\|E_j\|^* \le \varepsilon$ or $E_j$ is an atom of $\|\cdot\|^*$ with $\|E_j\|^* > \varepsilon$.

(ii) If $\|\cdot\|^*$ has no atoms and $E \in L_1$ then, for any $0 < \alpha < \|E\|^*$, there exists $D \in L_1$ with $D \subset E$ such that $\|D\|^* = \alpha$.

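Proof. (i) Since $\|E\|^* < \infty$, there are at most a finite number of atoms $E_1, \ldots, E_\ell \subset E$ with $\|E_j\|^* > \varepsilon$. Let $A = E \setminus \bigcup_{j=1}^{\ell} E_j$. If $\|A\|^* = 0$, the desired partition is given by $\{E_j : 1 \le j \le \ell\} \cup \{A\}$. Suppose $\|A\|^* > 0$.

Claim: Any nonnegligible measurable set $B \subset A$ contains a set $F \in L_1$ such that $0 < \|F\|^* \le \varepsilon$. Suppose that is not the case. Then there is an integrable set $B \subset A$ with $\|B\|^* > 0$ whose nonnegligible measurable subsets have Daniell mean larger than $\varepsilon$. In particular, $\|B\|^* > \varepsilon$ and thus, $B$ is not an atom of $\|\cdot\|^*$. Consequently, there is a measurable set $G_1 \subset B$ such that $0 < \|G_1\|^* < \|B\|^*$. It follows that both $\|G_1\|^*$ and $\|B \setminus G_1\|^*$ are larger than $\varepsilon$; thus, $B \setminus G_1$ is not an atom of $\|\cdot\|^*$ and so, there exists $G_2 \subset B \setminus G_1$ such that $0 < \|G_2\|^* < \|B \setminus G_1\|^*$. Proceeding by induction, we obtain a sequence of pairwise disjoint sets $G_n \subset B$ with $\|G_n\|^* > \varepsilon$, which contradicts integrability of $B$.

From the claim, we conclude that for any integrable $B \subset A$ with $\|B\|^* > 0$,
\[ 0 < \beta(B; \varepsilon) := \sup\{\|H\|^* : H \in L_1,\ H \subset B,\ \|H\|^* \le \varepsilon\} \le \varepsilon. \]
Let $H_1$ be an integrable subset of $A$ such that
\[ \frac{\beta(A; \varepsilon)}{2} < \|H_1\|^* \le \varepsilon. \]
Proceeding by induction, we obtain a countable collection (possibly finite) of pairwise disjoint integrable subsets $H_n$ of $A$ such that, if $\|A \setminus \bigcup_{j=1}^{n} H_j\|^* > 0$, then $H_{n+1} \subset A \setminus \bigcup_{j=1}^{n} H_j$ and
\[ \frac{\beta\big(A \setminus \bigcup_{j=1}^{n} H_j; \varepsilon\big)}{2} < \|H_{n+1}\|^* \le \varepsilon. \]
Since $\sum_n \|H_n\|^* = \|\bigcup_n H_n\|^* \le \|A\|^* < \infty$, $\lim_n \|H_n\|^* = 0$. Hence
\[ \beta\Big(A \setminus \bigcup_n H_n; \varepsilon\Big) \le \beta\Big(A \setminus \bigcup_{j=1}^{n} H_j; \varepsilon\Big) \le 2\|H_{n+1}\|^* \to 0 \]
and so, $\|A \setminus \bigcup_n H_n\|^* = 0$. Choose an integer $N_\varepsilon$ large enough so that $\sum_{n > N_\varepsilon} \|H_n\|^* < \varepsilon$. Set $E_{\ell+1} := H_1, \ldots, E_{\ell+N_\varepsilon} := H_{N_\varepsilon}$ and $E_{\ell+N_\varepsilon+1} := (A \setminus \bigcup_n H_n) \cup \bigcup_{j > N_\varepsilon} H_j$. The collection $\{E_j : j = 1, \ldots, \ell + N_\varepsilon + 1\}$ has the desired properties.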

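(ii) Fix a sequence $\varepsilon_n \searrow 0$ with $\varepsilon_1 < \alpha$. By part (i), there exists a measurable set $D_1 \subset E$ such that
\[ \alpha - \varepsilon_1 \le \|D_1\|^* \le \alpha. \]
Proceeding by induction, suppose we have constructed a collection of measurable sets $D_1 \subset \ldots \subset D_n \subset E$ such that
\[ \alpha - \varepsilon_n \le \|D_n\|^* \le \alpha. \]
If $\|D_n\|^* = \alpha$ we are done; otherwise there is a set $B_{n+1} \subset E \setminus D_n$ such that
\[ \alpha - \|D_n\|^* - \Big(\varepsilon_{n+1} \wedge \frac{\alpha - \|D_n\|^*}{2}\Big) \le \|B_{n+1}\|^* \le \alpha - \|D_n\|^*. \]
Setting $D_{n+1} := D_n \cup B_{n+1}$, we obtain a measurable set such that
\[ \alpha - \varepsilon_{n+1} \le \|D_n \cup B_{n+1}\|^* = \|D_n\|^* + \|B_{n+1}\|^* \le \alpha. \]
Let $D = \bigcup_n D_n$. Clearly $D$ is a measurable subset of $E$ with $\|D\|^* = \alpha$.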

We conclude this section with some measure theoretical results concerning the range of certain finite-dimensional vector-valued measures, which extend Saks's theorem 8.8.2(ii).
Theorem 8.8.3. (Lyapunov's convexity theorem) Suppose $\mu_1, \ldots, \mu_n$ are signed measures of finite total variation on a measurable space $(\Omega, \mathcal{F})$. Denote by $M_b(\Omega)$ the space of bounded $\mathcal{F}$-measurable functions on $\Omega$. Then,
(i) the set
\[ K := \Big\{ \Big(\int g \, d\mu_1, \ldots, \int g \, d\mu_n\Big) : g \in M_b(\Omega),\ 0 \le g \le 1 \Big\} \]
is compact and convex in $\mathbb{R}^n$.

(ii) If each $\mu_j$, $j = 1, \ldots, n$, is nonatomic then
\[ K = \big\{ (\mu_1(E), \ldots, \mu_n(E)) : E \in \mathcal{F} \big\}. \]

Proof. (i) Let $\mu := |\mu_1| + \ldots + |\mu_n|$. Then $\mu$ is a finite measure and $\mu_j \ll \mu$ for each $j = 1, \ldots, n$. The Radon-Nikodym theorem implies that there are functions $f_j \in L_1(\mu)$ such that $d\mu_j = f_j \, d\mu$. Since for any $f \in L_\infty(\mu)$ there is a function $f' \in M_b(\Omega)$ such that $f = f'$ $\mu$-a.s., we may consider functions in $L_\infty(\mu)$ instead of $M_b(\Omega)$. Let $\Lambda : L_\infty(\mu) \to \mathbb{R}^n$ be the map
\[ \Lambda(g) := \Big( \int g f_1 \, d\mu, \ldots, \int g f_n \, d\mu \Big). \]
Since $L_1(\mu)^* = L_\infty(\mu)$, $\Lambda$ is a weak$^*$-continuous linear map. Notice that $g \in H := \{h \in L_\infty(\mu) : 0 \le h \le 1\}$ iff $0 \le \int g f \, d\mu \le \int f \, d\mu$ for all $f \in L_1^+(\mu)$; hence, the convex set $H$ is a weak$^*$-closed subset of the unit ball in $L_\infty(\mu)$. By Alaoglu's theorem $H$ is weak$^*$-compact, and so $K = \Lambda(H)$ is compact and convex in $\mathbb{R}^n$.

(ii) Since $1_E \in H$ for every $E \in \mathcal{F}$, we have that $I := \{\Lambda(1_E) : E \in \mathcal{F}\} \subset K = \Lambda(H)$. Fix $g \in H$. By Alaoglu's theorem the set $K_g := \{h \in H : \Lambda(h) = \Lambda(g)\}$ is a nonempty convex weak$^*$-compact set. By Krein-Milman's theorem, $K_g$ has an extreme point $h^*$. We claim that $h^* = 1_A$ $\mu$-a.s. for some $A \in \mathcal{F}$. Suppose the contrary. Then, for some $\varepsilon > 0$, $a := \mu(\{\varepsilon \le h^* \le 1 - \varepsilon\}) > 0$. Since each $\mu_j$ is nonatomic, $\mu$ is also nonatomic, and there is a measurable set $E \subset \{\varepsilon \le h^* \le 1 - \varepsilon\}$ such that $0 < \mu(E) < a$. The linear subspace $Y = \{1_E \phi : \phi \in L_\infty(\mu)\}$ is infinite dimensional, since there is a sequence $\{E_n : n \in \mathbb{N}\}$ of pairwise disjoint measurable subsets of $E$ such that $\mu(E_n) > 0$. Consequently, there is $h \in Y$ such that $\Lambda(h) = 0$ and $0 < \|h\|_\infty \le \varepsilon$. Since $h = 0$ on $\Omega \setminus E$, it follows that $0 \le h^* \pm h \le 1$ and so, $h^* \pm h \in K_g$. However, $h^* = \frac{1}{2}(h^* + h) + \frac{1}{2}(h^* - h)$, which contradicts the fact that $h^*$ is an extreme point of $K_g$. Therefore $h^* = 1_A$ $\mu$-a.s. for some $A \in \mathcal{F}$, so that $\Lambda(g) = \Lambda(h^*) = (\mu_1(A), \ldots, \mu_n(A)) \in I$. Hence $K \subset I$.
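A small worked instance may help fix ideas. It is an illustration under the assumptions $\Omega = [0,1]$, $\mu_1 = \lambda$ (Lebesgue measure) and $\mu_2(dx) = 2x\,dx$, none of which come from the notes: the point of $K$ produced by the constant function $g \equiv t$ is also attained by the indicator of an interval centered at $1/2$.

```latex
\[
g \equiv t \in [0,1]: \qquad
\Big(\int g \, d\mu_1, \int g \, d\mu_2\Big) = (t, t).
\]
\[
E_t := \Big[\tfrac{1-t}{2}, \tfrac{1+t}{2}\Big]: \qquad
\mu_1(E_t) = t, \qquad
\mu_2(E_t) = \Big(\tfrac{1+t}{2}\Big)^2 - \Big(\tfrac{1-t}{2}\Big)^2 = t .
\]
```

For example, $t = 1/2$ gives $E_{1/2} = [1/4, 3/4]$ with $\mu_1(E_{1/2}) = \mu_2(E_{1/2}) = 1/2$.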

Theorem 8.8.4. Suppose $\mu_1, \ldots, \mu_{n+1}$ are signed measures on $(\Omega, \mathcal{F})$ of finite total variation, and let $H := \{g \in M_b(\Omega) : 0 \le g \le 1\}$. Define $\Lambda : g \mapsto (\mu_1 g, \ldots, \mu_n g)$ on $H$ and set $K := \Lambda(H)$.
(i) If $c \in K$, then there exist $\phi^*, \phi_* \in H$ such that
\[ \phi^* = \arg\max\{\mu_{n+1} g : g \in H,\ \Lambda g = c\} \tag{8.34} \]
\[ \phi_* = \arg\min\{\mu_{n+1} g : g \in H,\ \Lambda g = c\} \tag{8.35} \]
Suppose $\mu_j \ll \nu$ and $f_j = \frac{d\mu_j}{d\nu}$ for all $j = 1, \ldots, n+1$ and some $\sigma$-finite measure $\nu$ (e.g. $\nu = |\mu_1| + \ldots + |\mu_{n+1}|$).
(ii) If there exist $g^* \in H$ and $(a_1, \ldots, a_n) \in \mathbb{R}^n$ such that $\Lambda g^* = c$ and
\[ \begin{aligned} g^*(x) &= 1 \quad \text{when } f_{n+1}(x) > a_1 f_1(x) + \ldots + a_n f_n(x) \\ g^*(x) &= 0 \quad \text{when } f_{n+1}(x) < a_1 f_1(x) + \ldots + a_n f_n(x) \end{aligned} \tag{8.36} \]
then $g^*$ solves (8.34). Any other solution $g$ to (8.34) satisfies $g = g^*$ $\nu$-a.s. on $\{f_{n+1} \ne a_1 f_1 + \ldots + a_n f_n\}$.
(iii) If there exist $g_* \in H$ and $(b_1, \ldots, b_n) \in \mathbb{R}^n$ such that $\Lambda g_* = c$ and
\[ \begin{aligned} g_*(x) &= 1 \quad \text{when } f_{n+1}(x) < b_1 f_1(x) + \ldots + b_n f_n(x) \\ g_*(x) &= 0 \quad \text{when } f_{n+1}(x) > b_1 f_1(x) + \ldots + b_n f_n(x) \end{aligned} \tag{8.37} \]
then $g_*$ solves (8.35). Any other solution $g$ to (8.35) satisfies $g = g_*$ $\nu$-a.s. on $\{f_{n+1} \ne b_1 f_1 + \ldots + b_n f_n\}$.

Proof. (i) The first statement follows from the $\sigma(L_\infty(\nu), L_1(\nu))$-continuity of $\Lambda$ and the $\sigma(L_\infty(\nu), L_1(\nu))$-compactness of $H \cap L_\infty(\nu)$, for $\{g \in L_\infty : 0 \le g \le 1,\ \Lambda g = c\} = \Lambda^{-1}(\{c\})$.

(ii) Suppose $g \in H$ and $\Lambda g = c$. If $g^*(x) > g(x)$ then $f_{n+1}(x) \ge a_1 f_1(x) + \ldots + a_n f_n(x)$, whereas if $g^*(x) < g(x)$ then $f_{n+1}(x) \le a_1 f_1(x) + \ldots + a_n f_n(x)$. Consequently
\[ I := \int (g^*(x) - g(x))\big(f_{n+1}(x) - a_1 f_1(x) - \ldots - a_n f_n(x)\big)\, \nu(dx) \ge 0. \]
This implies that
\[ \mu_{n+1}(g^* - g) \ge \int (g^* - g)\, d(a_1 \mu_1 + \ldots + a_n \mu_n) = 0. \]
If $g$ also solves (8.34) then $I = 0$, which means that the set $\{g^* \ne g,\ f_{n+1} \ne a_1 f_1 + \ldots + a_n f_n\}$ is $\nu$-negligible. Therefore $g = g^*$ $\nu$-a.s. on $\{f_{n+1} \ne a_1 f_1 + \ldots + a_n f_n\}$.
(iii) may be obtained from part (ii) applied to $-\mu_j$, $j = 1, \ldots, n+1$, and $-c$ in place of $\mu_j$, $j = 1, \ldots, n+1$, and $c$.
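To see conditions (8.36) and (8.37) at work, here is a hedged worked instance. The choices $n = 1$, $\nu = \lambda$ on $[0,1]$, $f_1 \equiv 1$, $f_2(x) = 2x$ and $c = 1/2$ are assumptions made only for illustration.

```latex
\[
g^* := \mathbf{1}_{(1/2,\,1]}, \quad a_1 = 1: \qquad
f_2(x) > a_1 f_1(x) \iff x > \tfrac12, \qquad
\mu_1 g^* = \tfrac12 = c, \qquad
\mu_2 g^* = \int_{1/2}^{1} 2x \, dx = \tfrac34 .
\]
\[
g_* := \mathbf{1}_{[0,\,1/2)}, \quad b_1 = 1: \qquad
f_2(x) < b_1 f_1(x) \iff x < \tfrac12, \qquad
\mu_1 g_* = \tfrac12 = c, \qquad
\mu_2 g_* = \int_{0}^{1/2} 2x \, dx = \tfrac14 .
\]
```

The constant competitor $g \equiv 1/2$ also satisfies $\mu_1 g = 1/2$ and gives $\mu_2 g = 1/2$, strictly between the two extreme values $1/4$ and $3/4$.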
Theorem 8.8.5. Under the assumptions and notation of Theorem 8.8.4, if $c$ is in the relative interior of $K$, there exist $g_*, g^* \in H$ with $\Lambda g_* = \Lambda g^* = c$ satisfying (8.37), (8.35) and (8.36), (8.34) respectively. Moreover, $\mu_{n+1} g_* < \mu_{n+1} g^*$ unless $\mu_{n+1} = a_1 \mu_1 + \ldots + a_n \mu_n$ for some $(a_1, \ldots, a_n) \in \mathbb{R}^n$.

Proof. The set $L = \{(\Lambda g, \mu_{n+1} g) : g \in H\}$ is a compact convex subset of $\mathbb{R}^{n+1}$. Let $\pi^{(n)} : (x_1, \ldots, x_{n+1}) \mapsto (x_1, \ldots, x_n)$ and $\pi_{n+1} : (x_1, \ldots, x_{n+1}) \mapsto x_{n+1}$. Clearly $K = \pi^{(n)}(L)$, and $\pi_{n+1}\big(L \cap (\pi^{(n)})^{-1}(\{c\})\big)$ is a nonempty compact interval $[c_*, c^*]$. There are two alternatives, either $c_* = c^*$ or $c_* < c^*$.
Case $c_* = c^*$: We claim that $L$ is contained in a nonvertical hyperplane containing the origin, that is, for some $(a_1, \ldots, a_n) \in \mathbb{R}^n$,
\[ u_{n+1} = \sum_{j=1}^{n} a_j u_j, \qquad (u_1, \ldots, u_{n+1}) \in L. \tag{8.38} \]
We show that for any $c' \in K \setminus \{c\}$, there exists a unique $s' \in \mathbb{R}$ such that $(c', s') \in L$. Suppose this is not the case and that for some $c' \in K \setminus \{c\}$ there are $s'_1, s'_2 \in \mathbb{R}$ with $s'_1 < s'_2$ and such that $(c', s'_1), (c', s'_2) \in L$. As $c$ is a relative interior point of $K$, there exists a point $(c'', s'') \in L$ such that $c''$ lies on the line containing $c$ and $c'$ so that $c$ is in the interior of the straight segment from $c''$ to $c'$, that is
\[ c = t c'' + (1 - t) c' \]
for some $0 < t < 1$. This implies that
\[ t(c'', s'') + (1 - t)(c', s'_1) = \big(c,\ t s'' + (1 - t) s'_1\big) \in L \]
\[ t(c'', s'') + (1 - t)(c', s'_2) = \big(c,\ t s'' + (1 - t) s'_2\big) \in L \]
but as $s'_1 < s'_2$ and $0 < t < 1$, this contradicts the fact that $c_* = c^*$. Consequently, $L$ is a convex set that intersects any vertical line in at most one point, i.e., $L$ is contained in a nonvertical hyperplane through the origin and (8.38) holds for some $(a_1, \ldots, a_n) \in \mathbb{R}^n$. This means that for any $g \in H$
\[ \mu_{n+1} g - (a_1 \mu_1 g + \ldots + a_n \mu_n g) = \int g \big(f_{n+1} - (a_1 f_1 + \ldots + a_n f_n)\big)\, d\nu = 0, \]
that is, $f_{n+1} = \sum_{j=1}^{n} a_j f_j$ $\nu$-a.s. Choosing $g^* \in H$ with $(\Lambda g^*, \mu_{n+1} g^*) = (c, c^*)$ and setting $g_* = g^*$ we have that (8.36) and (8.37) hold vacuously.
Case $c_* < c^*$: Choose $g_*, g^* \in H$ so that $(\Lambda g_*, \mu_{n+1} g_*) = (c, c_*)$ and $(\Lambda g^*, \mu_{n+1} g^*) = (c, c^*)$. Since $0 \in L$, the affine space $Y$ generated by $L$ is a linear subspace of $\mathbb{R}^{n+1}$. As $(c, c_*)$ and $(c, c^*)$ are in the boundary of $L$, there exist $x_*, x^* \in Y^*$ such that
\[ x_*(u, s) \ge x_*(c, c_*), \qquad x^*(u, s) \le x^*(c, c^*) \]
for all $(u, s) \in L$. There exist $(a_1, \ldots, a_{n+1})$ and $(b_1, \ldots, b_{n+1})$ in $\mathbb{R}^{n+1}$ such that
\[ x^*(u, s) = a_{n+1} s - (a_1 u_1 + \ldots + a_n u_n) \]
\[ x_*(u, s) = b_{n+1} s - (b_1 u_1 + \ldots + b_n u_n) \]
for all $(u, s) \in L$. As $c$ is a relative interior point of $K$, $(c, s)$ is a relative interior point of $L$ for any $c_* < s < c^*$. Then $x_*(c, s) > x_*(c, c_*)$ and $x^*(c, s) < x^*(c, c^*)$ and so, $a_{n+1}$ and $b_{n+1}$ are positive. Without loss of generality, we may assume that $a_{n+1} = 1 = b_{n+1}$. Hence, for any $g \in H$
\[ \int g \big(f_{n+1} - (a_1 f_1 + \ldots + a_n f_n)\big)\, d\nu \le \int g^* \big(f_{n+1} - (a_1 f_1 + \ldots + a_n f_n)\big)\, d\nu \tag{8.39} \]
\[ \int g \big(f_{n+1} - (b_1 f_1 + \ldots + b_n f_n)\big)\, d\nu \ge \int g_* \big(f_{n+1} - (b_1 f_1 + \ldots + b_n f_n)\big)\, d\nu \tag{8.40} \]
In particular, (8.39) holds for any $g \in H$ that takes value 1 on $\{f_{n+1} - (a_1 f_1 + \ldots + a_n f_n) > 0\}$ and 0 on $\{f_{n+1} - (a_1 f_1 + \ldots + a_n f_n) < 0\}$. This implies that $g^*$ satisfies the desired conditions. Similarly, (8.40) holds for any $g \in H$ taking value 1 on $\{f_{n+1} - (b_1 f_1 + \ldots + b_n f_n) < 0\}$ and 0 on $\{f_{n+1} - (b_1 f_1 + \ldots + b_n f_n) > 0\}$, which implies that $g_*$ satisfies the desired conditions.
Corollary 8.8.6. Suppose $\mu_1, \ldots, \mu_n, \mu_{n+1}$ are probability measures on $(\Omega, \mathcal{F})$. Assume $\mu_j \ll \nu$ for some $\sigma$-finite measure $\nu$ on $(\Omega, \mathcal{F})$ for all $j = 1, \ldots, n+1$. Let $0 < \alpha < 1$ and let $g_*$ and $g^*$ be the solutions to
\[ \begin{aligned} \phi_* &= \arg\min\{\mu_{n+1} g : g \in H,\ \mu_j g = \alpha,\ 1 \le j \le n\} \\ \phi^* &= \arg\max\{\mu_{n+1} g : g \in H,\ \mu_j g = \alpha,\ 1 \le j \le n\} \end{aligned} \tag{8.41} \]
Then, either $\mu_{n+1} g_* < \alpha < \mu_{n+1} g^*$ or $\mu_{n+1} = a_1 \mu_1 + \ldots + a_n \mu_n$ for some $(a_1, \ldots, a_n) \in [0, 1]^n$ with $a_1 + \ldots + a_n = 1$.

Proof. Without loss of generality, we may assume that $\mu_1, \ldots, \mu_n$ are linearly independent. We proceed by induction. If $n = 1$ then, as $\alpha = \alpha \mu_1(\Omega) + (1 - \alpha)\mu_1(\emptyset) \in (0, 1)$, it follows that $\alpha$ is an interior point of $K = \{\mu_1 g : g \in H\}$ and so, the solutions $g_*$ and $g^*$ to (8.41) satisfy $\mu_2 g_* < \mu_2 g^*$ unless $\mu_2 = \mu_1$. When $\mu_2 \ne \mu_1$, it follows from Theorem 8.8.4 that $\mu_2 g_* < \mu_2 \tau = \alpha < \mu_2 g^*$, where $\tau \equiv \alpha$ is the constant function. This proves the statement for $n = 1$. Suppose that the statement of the Corollary holds for $1, \ldots, n$. Then, for each $j = 1, \ldots, n$ there exist $g_{*j}$ and $g_j^*$ such that $\mu_j g_{*j} < \alpha < \mu_j g_j^*$. It follows that the point $(\alpha, \ldots, \alpha) \in \mathbb{R}^n$ is an interior point of $K = \{(\mu_1 g, \ldots, \mu_n g) : g \in H\}$. Consequently, by Theorem 8.8.5 the solutions $g_*$ and $g^*$ to (8.41) satisfy $\mu_{n+1} g_* < \mu_{n+1} g^*$ unless $\mu_{n+1}$ is a convex combination of $\mu_1, \ldots, \mu_n$. If $\mu_{n+1}$ is not in the convex hull of $\{\mu_1, \ldots, \mu_n\}$, it follows from Theorem 8.8.4 that $\mu_{n+1} g_* < \mu_{n+1} \tau = \alpha < \mu_{n+1} g^*$. This concludes the proof by induction.

The following application of the Lyapunov convexity theorem shows the existence of consensus partitions for nonatomic finite measures.
Theorem 8.8.7. (Dubins-Spanier) Let $\mu_1, \ldots, \mu_m$ be nonatomic signed measures of finite variation on a measurable space $(\Omega, \mathcal{F})$. Given $\alpha_1, \ldots, \alpha_n \ge 0$ with $\sum_{j=1}^{n} \alpha_j = 1$, there is a measurable partition $\{A_1, \ldots, A_n\}$ of $\Omega$ such that $\mu_i(A_j) = \alpha_j \mu_i(\Omega)$ for all $i = 1, \ldots, m$, $j = 1, \ldots, n$.

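Proof. As $(1 - \alpha)\mu_i(\emptyset) + \alpha \mu_i(\Omega) = \alpha \mu_i(\Omega)$ for all $0 \le \alpha \le 1$, Lyapunov's convexity theorem 8.8.3(ii) implies that there exists a measurable set $A_1 \subset \Omega$ such that
\[ \mu_i(A_1) = \alpha_1 \mu_i(\Omega), \qquad i = 1, \ldots, m. \]
Similarly, there exists a measurable set $A_2 \subset \Omega \setminus A_1$ such that
\[ \mu_i(A_2) = \frac{\alpha_2}{\alpha_2 + \ldots + \alpha_n}\, \mu_i(\Omega \setminus A_1) = \alpha_2 \mu_i(\Omega), \qquad i = 1, \ldots, m, \]
where we interpret $\alpha_2/(\alpha_2 + \ldots + \alpha_n) = 0$ if $\alpha_2 = \ldots = \alpha_n = 0$. Continuing this way, for any $j = 1, \ldots, n - 1$ there is a measurable set $A_j \subset \Omega \setminus \bigcup_{\ell=1}^{j-1} A_\ell$ such that
\[ \mu_i(A_j) = \alpha_j \mu_i(\Omega), \qquad i = 1, \ldots, m. \]
Let $A_n := \Omega \setminus \bigcup_{j=1}^{n-1} A_j$. Then
\[ \mu_i(A_n) = (1 - \alpha_1 - \ldots - \alpha_{n-1})\, \mu_i(\Omega) = \alpha_n \mu_i(\Omega) \]
for all $i = 1, \ldots, m$. $\{A_j : j = 1, \ldots, n\}$ is the desired partition.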
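The construction in the proof is not explicit, but in simple cases a consensus partition can be written down by hand. The sketch below is only an illustration under assumptions that are not in the notes: $\Omega = [0,1]$, $\mu_1$ is Lebesgue measure and $\mu_2$ has density $2x$. Any set symmetric about $1/2$ then has the same mass under both measures, so nested symmetric shells give a partition with $\mu_i(A_j) = \alpha_j$. The helper names `mu1`, `mu2`, `consensus_partition` and `measure` are hypothetical.

```python
from fractions import Fraction


def mu1(a, b):
    """Lebesgue measure of the interval [a, b] (assumed example measure)."""
    return b - a


def mu2(a, b):
    """Mass of [a, b] under the density 2x on [0, 1] (assumed example measure)."""
    return b * b - a * a


def consensus_partition(alphas):
    """Split [0, 1] into shells symmetric about 1/2.

    Piece j is returned as a pair of intervals (left, right) whose total
    Lebesgue measure is alphas[j]; the weights are assumed to sum to 1.
    """
    half = Fraction(1, 2)
    pieces = []
    inner = Fraction(0)  # half-width already covered by earlier pieces
    for alpha in alphas:
        outer = inner + Fraction(alpha) / 2
        left = (half - outer, half - inner)
        right = (half + inner, half + outer)
        pieces.append((left, right))
        inner = outer
    return pieces


def measure(mu, piece):
    """Total mass of a piece, given as a tuple of disjoint intervals."""
    return sum(mu(a, b) for a, b in piece)
```

For instance, with weights $(1/2, 1/3, 1/6)$ the first piece is $[1/4, 3/4]$ and the last one is $[0, 1/12] \cup [11/12, 1]$. Exact rational arithmetic is used so that the identities $\mu_i(A_j) = \alpha_j$ can be checked without rounding error.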

8.9. Exercises
Exercise 8.9.1. Suppose f is a differentiable function in (a, b). Show that f is convex
if and only if f ′ is nondecreasing. In that case, αf = βf = f ′ . If in addition f is twice
differentiable, show that f is convex if and only if f ′′ (x) ≥ 0 for all a < x < b.
Exercise 8.9.2. (Young's inequality) Suppose that $g : [0, \infty) \to [0, \infty)$ is continuous and strictly increasing with $g(0) = 0$. Let $h = g^{-1}$ be the inverse of $g$. Define $\Phi(x) = \int_0^x g(u)\, du$ and $\Psi(y) = \int_0^y h(u)\, du$. Show that
\[ ab \le \Phi(a) + \Psi(b), \qquad a, b \ge 0. \]
(Hint: Plot a graph of $g$ and compare the area of a rectangle with sides $a$ and $b$ with the area under the graphs of $g$ and $h$.)
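As a sanity check on the statement (not a solution of the exercise), the classical special case is recovered by the choice $g(u) = u^{p-1}$ with $p > 1$, where $q := p/(p-1)$ denotes the conjugate exponent; this particular choice of $g$ is an assumption made for illustration.

```latex
\[
g(u) = u^{p-1}, \qquad h(v) = g^{-1}(v) = v^{1/(p-1)} = v^{q-1}, \qquad \tfrac1p + \tfrac1q = 1,
\]
\[
\Phi(a) = \int_0^a u^{p-1}\, du = \frac{a^p}{p}, \qquad
\Psi(b) = \int_0^b v^{q-1}\, dv = \frac{b^q}{q}, \qquad
ab \le \frac{a^p}{p} + \frac{b^q}{q} .
\]
```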
Exercise 8.9.3. (a) Given a function $\varphi : (a, b) \to (0, \infty)$, show that if $x \mapsto \log(\varphi(x))$ is convex, then so is $\varphi$. (b) Given a function $\psi : (0, \infty) \to \mathbb{R}$, show that $\psi$ is convex iff the function $\psi^*(x) = x \psi\big(\frac{1}{x}\big)$ is convex.
Exercise 8.9.4. The following inequality is a slight generalization of Hölder's inequality. Let $f_j \in \mathbb{R}^\Omega$ and $p_j \in \mathbb{R}_+$ ($j = 1, \ldots, n$) with $\sum_j \frac{1}{p_j} = 1$. Show that
\[ \|f_1 \cdots f_n\|_1 \le \|f_1\|_{p_1} \cdots \|f_n\|_{p_n}. \]

Exercise 8.9.5. Show that $L_\infty$ is an algebra of functions, and that $\|\cdot\|_\infty$ is a complete multiplicative seminorm, i.e., $\|fg\|_\infty \le \|f\|_\infty \|g\|_\infty$. Show that if $g \in L_\infty$ and $f \in L_p$ ($1 \le p \le \infty$), then $gf \in L_p$ and $\|gf\|_p \le \|g\|_\infty \|f\|_p$.
Exercise 8.9.6. (Lp spaces
R for 0 0, B(0, 1) ⊂
r−1/p B(0; r). (Hint: Show that (a + b)p ≤ ap + bp for all a, b ≥ 0.)
Exercise 8.9.7. Show that there are sequences that converge in $L_p$ but do not converge pointwise.
Exercise 8.9.8. Let $\Omega$ be a nonempty set, $\mathcal{C}$ a countable collection of subsets of $\Omega$, and $\mathcal{F} = \sigma(\mathcal{C})$. Suppose $\mu$ is a measure on $(\Omega, \mathcal{F})$. Show that the set $S^*$ of integrable simple functions is dense in $L_p(\mu)$ for all $1 \le p < \infty$.
Exercise 8.9.9. Suppose that $f$ is a complex or extended-real measurable function in $(\Omega, \mathcal{F}, \mu)$ and that $\|f\|_\infty > 0$. Define the map
\[ \varphi(p) := \|f\|_p^p \qquad (0 < p < \infty). \]
Let $E = \{p : \varphi(p) < \infty\}$. Show that
(a) If r 0 : α1 ∈ E}. Show that the map α 7→ log(kf k1/α ) is a convex
function in the interior of E −1 .
(e) If $r < p < s$ show that $\|f\|_p \le \|f\|_r \vee \|f\|_s$, so $L_s \cap L_r \subset L_p$.
(f) Show that $L_r \cap L_s$, with $\|f\| := \|f\|_r \vee \|f\|_s$, is a Banach space and that the inclusion $L_r \cap L_s \hookrightarrow L_p$, $f \mapsto f$, is continuous.
(g) Assume that $\|f\|_r < \infty$ for some $0 < r < \infty$. Show that $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$.
Exercise 8.9.10. In addition to the assumptions in Exercise 8.9.9, assume $\mu(\Omega) < \infty$.
(a) Show that $\|f\|_r \le \|f\|_s\, (\mu(\Omega))^{\frac{1}{r} - \frac{1}{s}}$ whenever $0 < r \le s \le \infty$.
(b) Show that $L_s$ is dense in $L_r$ whenever $1 \le r \le s \le \infty$, that is, for any $f \in L_r$, there is a sequence $f_n \in L_s$ such that $\|f_n - f\|_r \to 0$.
(c) Assume that $\mu(\Omega) = 1$. If $\|f\|_r < \infty$ for some $0 < r$, show that
\[ \lim_{p \to 0} \|f\|_p = \exp\Big( \int_\Omega \log |f| \, d\mu \Big) \]
where $\exp(x) = e^x$ for $x \in \mathbb{R}$ and $\exp(-\infty) = 0$.
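Exercise 8.9.11. Suppose $f, g \in L_p(\Omega, \mathcal{F}, \mu)$, and $f, g \ge 0$. If $0 < \alpha < 1$, show that
\[ \int |f^\alpha - g^\alpha|^{p/\alpha} \, d\mu \le \int |f - g|^p \, d\mu. \]
If $\alpha \ge 1$, show that
\[ \begin{aligned} \int |f^\alpha - g^\alpha|^{p/\alpha} \, d\mu &\le \alpha^{\frac{p}{\alpha}} \int (f \vee g)^{\frac{\alpha - 1}{\alpha} p}\, |f - g|^{\frac{p}{\alpha}} \, d\mu \\ &\le \alpha^{\frac{p}{\alpha}} \Big( \int (f \vee g)^p \, d\mu \Big)^{1 - \frac{1}{\alpha}} \Big( \int |f - g|^p \, d\mu \Big)^{\frac{1}{\alpha}} \\ &\le \alpha^{\frac{p}{\alpha}} \Big( \int (f + g)^p \, d\mu \Big)^{1 - \frac{1}{\alpha}} \Big( \int |f - g|^p \, d\mu \Big)^{\frac{1}{\alpha}}. \end{aligned} \]
Thus, for $\{f_n : n \in \mathbb{N}\} \subset L_p^+$, $\lim_n \|f_n - f\|_p = 0$ iff $\lim_n \|f_n^\alpha - f^\alpha\|_{p/\alpha} = 0$.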

Exercise 8.9.12. If $f \in L_p$ show that $\lambda^p \mu(|f| > \lambda) \le \|f 1_{\{|f| > \lambda\}}\|_p^p$ for all $\lambda > 0$. Conclude that $\lim_{\lambda \to \infty} \lambda^p \mu(|f| > \lambda) = 0$.
Exercise 8.9.13. Suppose 1 ≤ p1 < p2 ≤ ∞.
(a) If p1 1}
and f2 = f − f1 ).
(b) Show that Lp1 + Lp2 with
kf k := inf{kukp1 + kvkp2 : f = u + v, u ∈ Lp1 , v ∈ Lp2 },
is a Banach space.
(c) Show that the inclusion Lp ֒→ Lp1 + Lp2 , f 7→ f , is continuous.
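Exercise 8.9.14. Let $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu)$ be a Borel measure space.
(a) Show that
\[ \Theta = \Big\{ \theta \in \mathbb{R}^d : \int e^{\theta \cdot x} \, \mu(dx) < \infty \Big\} \]
is a convex (possibly empty) subset of $\mathbb{R}^d$.

(b) Suppose that $\Theta \ne \emptyset$. Show that $\mu(A) < \infty$ for any compact set $A \subset \mathbb{R}^d$.
(c) Let $M(\theta) = \log \int e^{\theta \cdot x} \, \mu(dx)$ for all $\theta \in \Theta$. If $0 < \mu(A) < \infty$, show that
\[ e^{-M(\theta)} \le \frac{1}{\mu(A)}\, e^{-x_A \cdot \theta} \]
where $x_A = \frac{1}{\mu(A)} \int_A z \, \mu(dz) \in \mathbb{R}^d$.
(d) Show that the set $D$ of all such $x_A$ is dense in the support $\mathrm{supp}(\mu)$ of $\mu$.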
Exercise 8.9.15. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $C \subset L_p(\mu)$, $p \ge 1$, be a cone. Suppose $Z \in L_q(\mu)$, $\frac{1}{p} + \frac{1}{q} = 1$, and $c \ge 0$ are such that
\[ \alpha := \sup_{W \in C} \int Z W \, d\mu \le c. \]
Show that $\alpha \le 0$. In addition, if $C$ contains $-L_p^+(\mu)$, show that $Z \in L_q^+(\mu)$.

Exercise 8.9.16. Show that $\|0\|_0 = 0$ and $\|1_A\|_0 = 1 \wedge \mu(A)$ for all $A \in \mathcal{M}$.

Exercise 8.9.17. Show that the following statements are equivalent.

(i) $E \subset L_0$ is bounded in $L_0$.
(ii) $\lim_{\lambda \to 0} \sup_{f \in E} \|\lambda f\|_0 = 0$.
(iii) For any $\varepsilon > 0$, there is a constant $C_\varepsilon > 0$ such that $\sup_{f \in E} \mu(|f| > C_\varepsilon) \le \varepsilon$.
Exercise 8.9.18. Suppose $(f_n : n \in \mathbb{N})$ and $(g_n : n \in \mathbb{N})$ are two sequences of real-valued measurable functions such that $(f_n)$ is bounded in $L_0$. If $\mu(\Omega) < \infty$ and
\[ \lim_n \big[ \mu(f_n \le t,\ g_n \ge t + \varepsilon) + \mu(f_n \ge t + \varepsilon,\ g_n \le t) \big] = 0 \]
for all $t \in \mathbb{R}$ and $\varepsilon > 0$, show that $f_n - g_n \to 0$ in $L_0$.
Exercise 8.9.19. Given a metric space $(S, \rho)$, let $F : [0, \infty) \to [0, \infty)$ be a nondecreasing continuous function such that $F(t) = 0$ iff $t = 0$ and $F(s + t) \le F(s) + F(t)$. Show that $d := F \circ \rho$ is also a metric on $S$, and that the identity maps $(S, \rho) \to (S, d)$ and $(S, d) \to (S, \rho)$ are uniformly continuous. In particular, we can choose $F$ to be bounded, for instance $t \wedge 1$, $\frac{t}{1+t}$ or $\arctan(t)$.

Exercise 8.9.20. Suppose that fn converges to f in Lp for some 1 ≤ p < ∞. Show that
fn converges to f in measure.
Exercise 8.9.21. Consider the space $L_\infty([0, 1], \mathcal{B}([0, 1]), \lambda)$. Show that there is a bounded linear functional $\Lambda \ne 0$ on $L_\infty$ that vanishes on $C([0, 1])$. Conclude that there is no $f \in L_1([0, 1], \mathcal{B}([0, 1]), \lambda)$ such that $\Lambda g = \int_{[0,1]} f g \, d\lambda$ for all $g \in L_\infty$. Thus, $(L_\infty)^* \ne L_1$.
Exercise 8.9.22. Let $\mathcal{A}$ be the collection of all subsets $A$ of $[0, 1]$ such that $A$ or $[0, 1] \setminus A$ is at most countable. This is a $\sigma$-algebra. Let $\mu$ be the counting measure on $\mathcal{A}$. Show that $f \in L_1(\mu)$ iff $C(f) := \{f \ne 0\}$ is countable and $\sum_{x \in C(f)} |f(x)| < \infty$. Let $g(x) = x$ for all $x \in [0, 1]$. Show that $g$ is not $\mathcal{A}$-measurable; however, $fg \in L_1(\mu)$ whenever $f \in L_1(\mu)$. Show that the linear functional $\Lambda : L_1(\mu) \to \mathbb{R}$
\[ \Lambda(f) := \int_{[0,1]} f(x) g(x) \, \mu(dx) = \sum_{x \in C(f)} x f(x) \]
is continuous. Conclude that $(L_1(\mu))^* \ne L_\infty(\mu)$ in this situation. (Observe that $\mu$ is not $\sigma$-finite.)
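Exercise 8.9.23. Suppose $(\Omega, \mathcal{M}, \mathbb{P})$ is a probability space. Show that for any $0 \le p \le \infty$, the dimension of the vector space $L_p$ is given by
\[ \dim(L_p) = \max\Big\{ n \in \mathbb{Z}_+ : \exists A_1, \ldots, A_n \in \mathcal{M} \text{ disjoint},\ \Omega = \bigcup_{j=1}^{n} A_j,\ \mathbb{P}[A_j] > 0 \Big\}. \]
(Hint: If $\{A_n\}$ is a finite partition of $\Omega$ with $0 < \mathbb{P}[A_n] < 1$ then $\{1_{A_n}\}$ is a linearly independent set in $L_p$ for all $0 \le p \le \infty$.)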
Exercise 8.9.24. Suppose that $\mu(\Omega) < \infty$. For any measurable functions $f$ and $g$ with values in a separable metric space $(S, \rho)$ define
\[ \alpha(f, g) := \|\rho(f, g)\|_0 = \inf\{\varepsilon > 0 : \mu(\rho(f, g) > \varepsilon) \le \varepsilon\}. \tag{8.42} \]
Show that
(a) $\alpha$ defines a metric on $\mathcal{M}(S)$.
(b) $f_n$ converges to $f$ in measure if and only if $\lim_n \alpha(f_n, f) = 0$.
(Hint: Show that $\frac{D_F(f, g)}{\mu(\Omega) + 1} \le \alpha(f, g) \le \sqrt{D_F(f, g)}$, where $F(t) = t \wedge 1$.)

Exercise 8.9.25. Given a pair of real-valued functions $g \le h$ in $L_1$, define $[g, h] := \{f \in L_1 : g \le f \le h\}$. Show that
\[ d(f, [g, h]) := \inf\{\|f - f'\|_1 : f' \in [g, h]\} = \|f - f_{gh}\|_1 \]
where $f_{gh} := g \vee (f \wedge h)$.
Exercise 8.9.26. Show that
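(a) If $\mathcal{I} \subset L_1(\Omega, \mathcal{F}, \mu)$ is finite, then $\mathcal{I}$ is uniformly integrable.
(b) If $\mathcal{I}$ and $\mathcal{H}$ are two uniformly integrable families then $\mathcal{I} \cup \mathcal{H}$, $|\mathcal{I}| := \{|f| : f \in \mathcal{I}\}$, and $\mathcal{I} + a\mathcal{H} := \{f + ag : f \in \mathcal{I}, g \in \mathcal{H}\}$ are uniformly integrable.
(c) If $\mathcal{I}$ is uniformly integrable and for each $g \in \mathcal{H}$ there is $f \in \mathcal{I}$ such that $|g| \le |f|$, then $\mathcal{H}$ is uniformly integrable.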
Exercise 8.9.27. Suppose that $f \in L_1(\Omega, \mathcal{F}, \mu)$. Show that for any $\varepsilon > 0$ there is $\delta > 0$ such that, if $A \in \mathcal{F}$ and $\mu(A) < \delta$, then $\int_A |f| \, d\mu < \varepsilon$. (Hint: the nondecreasing sequence $g_n = |f| \wedge n$ converges to $|f|$. Hence, by dominated convergence $\|g_n - |f|\|_1 \to 0$.)
Exercise 8.9.28. Suppose $f \in L_1(\Omega, \mathcal{F}, \mu)$. Show that for any $\varepsilon > 0$, there exists $E \in \mathcal{F}$ with $\mu(E) < \infty$ such that
\[ \int_{\Omega \setminus E} |f| \, d\mu < \varepsilon. \]
(Hint: The nondecreasing sequence $h_n = |f| 1_{\{|f| > \frac{1}{n}\}}$ converges to $|f|$. Hence, by dominated convergence $\|h_n - |f|\|_1 \to 0$.)

Exercise 8.9.29. Suppose µ1 , . . . , µn+1 are signed measures on (Ω, F ) of finite total vari-
ation, and let H := {g ∈ Mb (Ω) : 0 ≤ g ≤ 1}. Define Λ : g 7→ (µ1 g, . . . , µn g) on H and set
K := Λ(H). Suppose g ∗ ∈ H and Λg ∗ = c for some c ∈ K. If
g ∗ (x) = 1 when fn+1 (x) > a1 f1 (x) + . . . + an fn (x)
g ∗ (x) = 0 when fn+1 (x) < a1 f1 (x) + . . . + an fn (x)
with aj ≥ 0 for all j = 1, . . . , n, show that g ∗ = arg max{µn+1 g : g ∈ H, µj g ≤ cj , j =
1, . . . , n}.
Exercise 8.9.30. Let $\nu$ be a $\sigma$-finite measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Suppose $f$ is a probability density w.r.t. $\nu$ and that $\int |t| f(t) \, \nu(dt) < \infty$. For any $0 < \alpha < 1$ show that $\big(\alpha, \alpha \int t f(t) \, \nu(dt)\big)$ is an interior point of the compact convex set
\[ K = \Big\{ \Big( \int g(t) f(t) \, \nu(dt), \int g(t)\, t f(t) \, \nu(dt) \Big) : 0 \le g \le 1,\ g \in L_\infty(\nu) \Big\}. \]
(Hint: Set $\mu_1(dt) = f(t) \, \nu(dt)$ and $\mu_2(dt) = t f(t) \, \nu(dt)$. Apply Theorem 8.8.3(i) to show $\alpha$ is an interior point of the image of $g \mapsto \mu_1 g$, $g \in H$. Use Theorems 8.8.4, 8.8.5 and comparison with $g(t) \equiv \alpha$.)
Chapter 9

Finite product of
elementary integrals

9.1. The iterated mean

Suppose mX and mY are elementary integrals on the ring lattices closed under chopping
EX ⊂ Bb (X) and EY ⊂ Bb (Y ) respectively.
Definition 9.1.1. Let $E \subset X \times Y$. For any $(x, y) \in X \times Y$, the $x$-cross section and $y$-cross section of $E$ are given by
\[ E_x = \{y' \in Y : (x, y') \in E\}, \qquad E^y = \{x' \in X : (x', y) \in E\} \]
respectively. Similarly, for any set $R$ and a function $f : X \times Y \to R$, the maps $f_x : Y \to R$ given by $y' \mapsto f(x, y')$ and $f^y : X \to R$ given by $x' \mapsto f(x', y)$ are the $x$ and the $y$ cross sections of $f$ respectively.

The collection $\mathcal{E} = \mathcal{E}_X \otimes \mathcal{E}_Y$ of functions of the form
\[ \phi(x, y) = \sum_{j=1}^{N} \phi_j^X(x)\, \phi_j^Y(y), \qquad N \in \mathbb{N},\ \phi_j^X \in \mathcal{E}_X,\ \phi_j^Y \in \mathcal{E}_Y \tag{9.1} \]
is a ring of bounded functions on $X \times Y$. The map $m = m_X \otimes m_Y$ on $\mathcal{E}$ defined by
\[ \int \phi \, dm = m(\phi) = (m_X \otimes m_Y)(\phi) = \sum_{j=1}^{N} m_X(\phi_j^X)\, m_Y(\phi_j^Y) \]
is a well defined elementary integral. Indeed, if $\phi$ is of the form (9.1), then for each $x \in X$, $\phi_x$ is a function in $\mathcal{E}_Y$. So we can apply $m_Y$ to $\phi_x$ and
\[ \int \phi_x(y) \, m_Y(dy) = m_Y(\phi_x) = \sum_{j=1}^{N} \phi_j^X(x)\, m_Y(\phi_j^Y) \]
is independent of the representation (9.1). Thus, the map $x \mapsto \int \phi_x(y) \, m_Y(dy) = m_Y(\phi_x)$ is a well defined function in $\mathcal{E}_X$, and so we can apply $m_X$ to it and obtain
\[ m_X\Big( \sum_{j=1}^{N} \phi_j^X\, m_Y(\phi_j^Y) \Big) = \sum_{j=1}^{N} m_X(\phi_j^X)\, m_Y(\phi_j^Y) = m(\phi). \]

Notice that the definitions of EX ⊗ EY and mX ⊗ mY are symmetric in X, Y .

Suppose that both $m_X$ and $m_Y$ are positive and $\sigma$-continuous. We will use Daniell's procedure to construct a mean that dominates the elementary integral $m$. However, since $\mathcal{E}$ is not in general a lattice, we cannot introduce the notion of upper integral. Instead, we consider
\[ m^\flat(f) = \int^\flat f \, dm := \int^* \!\! \int^* f(x, y) \, m_Y(dy) \, m_X(dx) \]
and $\|f\|^\flat = m^\flat(|f|)$ for any $f \in \mathbb{R}^{X \times Y}$.
Lemma 9.1.2. $m^\flat$ is a positive $\sigma$-continuous elementary integral on $\mathcal{E}$. $\|\cdot\|^\flat$ is a mean for $\mathcal{E}$ and agrees with $m = m_X \otimes m_Y$ on $\mathcal{E}_+$.

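Proof. Suppose $(\phi_n) \subset \mathcal{E}$ decreases to 0 pointwise. For any $x \in X$, $\{(\phi_n)_x : n \in \mathbb{N}\} \subset \mathcal{E}_Y$ and $(\phi_n)_x \searrow 0$. Thus $\psi_n(x) = m_Y((\phi_n)_x) \in \mathcal{E}_X$ decreases to 0. Consequently $m_X(\psi_n) \searrow 0$. This shows that $m$ is $\sigma$-continuous.
If $\phi(x, y) = \sum_{j=1}^{N} \phi_j^X(x) \phi_j^Y(y)$ with $\phi_j^X \in \mathcal{E}_X$ and $\phi_j^Y \in \mathcal{E}_Y$, then
\[ |\phi(x, y)| \le \sum_{j=1}^{N} |\phi_j^X(x)|\, |\phi_j^Y(y)| \in \mathcal{E}. \]
Hence
\[ \|\phi\|^\flat \le \sum_{j=1}^{N} \|\phi_j^X\|_{m_X}^*\, \|\phi_j^Y\|_{m_Y}^* < \infty \]
and
\[ \begin{aligned} |m(\phi)| = \Big| \int \!\! \int \phi(x, y) \, m_Y(dy) \, m_X(dx) \Big| &\le \int^* \Big| \int \phi(x, y) \, m_Y(dy) \Big| \, m_X(dx) \\ &\le \int^* \!\! \int^* |\phi(x, y)| \, m_Y(dy) \, m_X(dx) = \|\phi\|^\flat. \end{aligned} \]
Equality holds if $\phi \ge 0$. Absolute homogeneity and solidity of $\|\cdot\|^\flat$ follow directly from the absolute homogeneity and solidity of $\|\cdot\|_{m_X}^*$ and $\|\cdot\|_{m_Y}^*$.
The subadditivity of $m_X^*$ and $m_Y^*$ implies that for any pair of functions $f, g \in \mathbb{R}^{X \times Y}$,
\[ \|f + g\|^\flat = \int^* \!\! \int^* |f + g| \, dm_Y \, dm_X \le \int^* \Big( \int^* |f| \, dm_Y + \int^* |g| \, dm_Y \Big) dm_X \le \|f\|^\flat + \|g\|^\flat. \]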
≤ kf k♭ + kgk♭ .

We claim that $\|\cdot\|^\flat$ is continuous along increasing sequences, that is, $\sup_n \|f_n\|^\flat = \|\sup_n f_n\|^\flat$ whenever $0 \le f_n \nearrow f := \sup_n f_n$. Indeed, for any $x \in X$, $0 \le (f_n)_x \nearrow f_x$. By Theorem 7.6.7, $\int^* f_n(x, y) \, m_Y(dy)$ increases to $\int^* f(x, y) \, m_Y(dy)$. By the same token, $\int^* \!\int^* f_n(x, y) \, m_Y(dy) \, m_X(dx)$ increases to $\int^* \!\int^* f(x, y) \, m_Y(dy) \, m_X(dx)$ and the claim follows.
Continuity along nonnegative increasing sequences, combined with subadditivity, implies that $\|\cdot\|^\flat$ is countably subadditive.
Suppose $(\phi_n : n \in \mathbb{N}) \subset \mathcal{E}_+$. Then
\[ \Big\| \sum_{j=1}^{N} \phi_j \Big\|^\flat = m\Big( \sum_{j=1}^{N} \phi_j \Big) = \sum_{j=1}^{N} m(\phi_j) = \sum_{j=1}^{N} \|\phi_j\|^\flat. \]
Hence, $\sup_N \big\| \sum_{j=1}^{N} \phi_j \big\|^\flat < \infty$ implies that $m(\phi_j) = \|\phi_j\|^\flat \to 0$ as $j \to \infty$. Therefore, $\|\cdot\|^\flat$ is a mean for $\mathcal{E}$.

Now that we have a mean k k♭ that dominates the elementary integral m, we can extend
m uniquely to L1 (k k♭ ) as in Theorem 6.5.1 so that all the good properties of integration
such as linearity and dominated convergence hold.
Theorem 9.1.3. If $f \in L_1(\|\cdot\|^\flat)$, then:
(i) For $\|\cdot\|_{m_X}^*$-a.a. $x \in X$, the function $f_x \in L_1(\|\cdot\|_{m_Y}^*)$.
(ii) The $\|\cdot\|_{m_X}^*$-a.s. defined function $F : x \mapsto \int f(x, y) \, m_Y(dy)$ is $\|\cdot\|_{m_X}^*$-integrable.
(iii) The value of $\int F(x) \, m_X(dx)$ is $\int f \, dm$, that is,
\[ \int f \, dm = \int F(x) \, m_X(dx) = \int \!\! \int f(x, y) \, m_Y(dy) \, m_X(dx). \]
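Proof. First notice that if $\|g\|^\flat = 0$, then the function $G : x \mapsto \int^* |g|(x, y) \, m_Y(dy)$ is defined $\|\cdot\|_{m_X}^*$-a.s. and $\|G\|_{m_X}^* = 0$. Consequently, for $\|\cdot\|_{m_X}^*$-almost all $x \in X$, the map $g_x$ is $\|\cdot\|_{m_Y}^*$-negligible.

If $f \in L_1(\|\cdot\|^\flat)$, then there exists a sequence $(\phi^{(n)}) \subset \mathcal{E}$ such that
\[ \sum_n \|\phi^{(n)}\|^\flat < \infty, \quad \text{and} \quad f = \sum_n \phi^{(n)} \quad \|\cdot\|^\flat\text{-a.s.} \tag{9.2} \]
and so, $\int f \, dm = \lim_n \int \sum_{k=1}^{n} \phi^{(k)} \, dm$. Let
\[ N = \Big\{ (x, y) : \sum_n |\phi^{(n)}(x, y)| = \infty \ \text{or}\ f(x, y) \ne \sum_n \phi^{(n)}(x, y) \Big\}. \]
From (9.2), $\|1_N\|^\flat = 0$ and thus, the set
\[ N_1 = \Big\{ x \in X : \int^* 1_N(x, y) \, m_Y(dy) > 0 \Big\} \]
is $\|\cdot\|_{m_X}^*$-negligible, that is, $\|1_{N_1}\|_{m_X}^* = 0$. Again, by (9.2), the set
\[ N_2 = \Big\{ x \in X : \int^* \sum_n |\phi^{(n)}(x, y)| \, m_Y(dy) = \infty \Big\} \]
is $\|\cdot\|_{m_X}^*$-negligible.
Let $s^{(n)} = \sum_{k=1}^{n} \phi^{(k)}$ and $\Phi = \sum_n |\phi^{(n)}|$. For all $x \in X \setminus (N_1 \cup N_2)$, the sequence $(s_x^{(n)}) \subset \mathcal{E}_Y$ converges to $f_x$ $\|\cdot\|_{m_Y}^*$-a.s. and $\|\Phi_x\|_{m_Y}^* < \infty$. Since $|s^{(n)}| \le \Phi$ for all $n \in \mathbb{N}$, $f_x$ is $\|\cdot\|_{m_Y}^*$-integrable and $I_n(x) = \int s^{(n)}(x, y) \, m_Y(dy) \to \int f(x, y) \, m_Y(dy) = F(x)$. This shows that (i) holds. Clearly $(I_n) \subset \mathcal{E}_X$,
\[ |I_n(x)| \le \sum_{k=1}^{n} \int |\phi^{(k)}(x, y)| \, m_Y(dy) \le \int \Phi(x, y) \, m_Y(dy), \]
and
\[ \|\Phi\|^\flat = \int \!\! \int \Phi(x, y) \, m_Y(dy) \, m_X(dx) \le \sum_n \|\phi^{(n)}\|^\flat < \infty. \]
By dominated convergence, $F$ is $\|\cdot\|_{m_X}^*$-integrable and
\[ \begin{aligned} \int F \, dm_X = \int \!\! \int f(x, y) \, m_Y(dy) \, m_X(dx) &= \lim_n \int I_n(x) \, m_X(dx) \\ &= \lim_n \int \!\! \int \sum_{k=1}^{n} \phi^{(k)}(x, y) \, m_Y(dy) \, m_X(dx) = \int f \, dm. \end{aligned} \]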

9.2. Fubini and Tonelli’s theorems

In the previous section, we built a mean k k♭ which dominates the elementary integral m =
mX ⊗ mY on EX ⊗ EY by first integrating with respect to mY and then with respect to mX .
An alternative mean k k† may be obtained by inverting the order of integration. Although
both means coincide on E, there is no guarantee that they are equal. The natural way to
overcome this problem is to consider the maximal mean k k∗m that coincides with m on E+ .
Such maximal mean exists by Theorem 7.6.2; moreover, by Lemma 7.6.1, k k♭ , k k† ≤ k k∗m
with equality on ERΣ . We call k k∗m the Daniell product mean for (EX ⊗ EY , mX ⊗ mY ).
Integration theory shows that L1 (k k∗m ) is a Stone lattice.
The following results show how to evaluate the integral $\int_{X \times Y} f \, dm$ by iterated integration, and give conditions under which a function $f$ is integrable.
Theorem 9.2.1. (Fubini) If $f \in L_1(\|\cdot\|_m^*)$, $m = m_X \otimes m_Y$, then
(i) For $\|\cdot\|_{m_X}^*$-a.a. $x \in X$, the function $f_x$ is $\|\cdot\|_{m_Y}^*$-integrable, and the $\|\cdot\|_{m_X}^*$-a.s. defined function $G : x \mapsto \int f(x, y) \, m_Y(dy)$ is $\|\cdot\|_{m_X}^*$-integrable.
(ii) For $\|\cdot\|_{m_Y}^*$-a.a. $y \in Y$, the function $f^y$ is $\|\cdot\|_{m_X}^*$-integrable, and the $\|\cdot\|_{m_Y}^*$-a.s. defined function $H : y \mapsto \int f(x, y) \, m_X(dx)$ is $\|\cdot\|_{m_Y}^*$-integrable.
(iii) The iterated integrals coincide and
\[ \int f \, dm = \int \!\! \int f(x, y) \, m_Y(dy) \, m_X(dx) = \int \!\! \int f(x, y) \, m_X(dx) \, m_Y(dy). \tag{9.3} \]

Proof. This is a direct consequence of $L_1(\|\cdot\|_m^*) \subset L_1(\|\cdot\|^\flat) \cap L_1(\|\cdot\|^\dagger)$, Theorem 9.1.3, and its analogue for $\|\cdot\|^\dagger$ obtained by interchanging the roles of $X$ and $Y$.
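In the simplest situation the theorem reduces to interchanging finite sums. The sketch below is an illustration only, under the assumption that $X$ and $Y$ are finite sets carrying positive weights (so every function is elementary); the names `iterated_xy`, `iterated_yx`, `product_integral` and the sample data are hypothetical. It computes both iterated integrals and the integral against the product weights with exact rational arithmetic.

```python
from fractions import Fraction


def iterated_xy(f, wx, wy):
    """Integrate in y first (weights wy), then in x (weights wx)."""
    return sum(wx[x] * sum(wy[y] * f(x, y) for y in wy) for x in wx)


def iterated_yx(f, wx, wy):
    """Integrate in x first (weights wx), then in y (weights wy)."""
    return sum(wy[y] * sum(wx[x] * f(x, y) for x in wx) for y in wy)


def product_integral(f, wx, wy):
    """Integrate directly against the product weights wx[x] * wy[y]."""
    return sum(wx[x] * wy[y] * f(x, y) for x in wx for y in wy)


# Assumed sample data: weights play the role of m_X and m_Y on finite sets.
wx = {0: Fraction(1, 2), 1: Fraction(3, 2), 2: Fraction(2)}
wy = {"a": Fraction(1, 3), "b": Fraction(5)}


def f(x, y):
    """A sign-changing test function on X x Y (not a product of two factors)."""
    return Fraction(x * x - 1) if y == "a" else Fraction(2 * x + 1, 7)
```

All three quantities agree for any choice of finite weights and any `f`, since the sums are finite; the content of Theorem 9.2.1 is that integrability in the product mean is what allows the same interchange in general.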

Remark 9.2.2. $\|\cdot\|_m^* \ll \|\cdot\|^\flat$ and vice versa, $\|\cdot\|^\flat \ll \|\cdot\|_m^*$. Indeed, for any set $N \subset X \times Y$, if $\|1_N\|_m^* = 0$, then $1_N \in L_1(\|\cdot\|_m^*) \subset L_1(\|\cdot\|^\flat)$ and by Fubini's theorem, $\|N\|_m^* = \|N\|^\flat = 0$.

Fubini's theorem on its own is not useful unless we know beforehand that the function of interest is already integrable in the product mean. The following result states conditions for integrability in terms of measurability and iterated integration.
Theorem 9.2.3. (Fubini-Tonelli) Suppose $f \in \mathcal{M}_{\mathbb{R}}(\|\cdot\|_m^*)$ is $\sigma$-finite. Then $f \in L_1(\|\cdot\|_m^*)$ iff one of the iterated upper integrals
\[ \int^* \!\! \int^* |f(x, y)| \, m_Y(dy) \, m_X(dx), \quad \text{or} \quad \int^* \!\! \int^* |f(x, y)| \, m_X(dx) \, m_Y(dy) \]
is finite. In either case, both integrals coincide and are equal to $\|f\|_m^*$, and (9.3) holds.

Proof. (Necessity) If $f$ is integrable, so is $|f|$ and the conclusion follows from Fubini's theorem.

(Sufficiency) Let $\|\cdot\|^\flat$ be an iterated mean and assume $\|f\|^\flat < \infty$. We will show that $f 1_A \in L_1(\|\cdot\|_m^*)$ for any $m$-integrable set $A \subset X \times Y$. Indeed, there is a sequence of pairwise disjoint $m$-integrable sets $A_n \subset A$ and a sequence of functions $(\phi_n) \subset \mathcal{E}^u$ such that $f 1_{A_n} = \phi_n 1_{A_n}$ and $\|A \setminus \bigcup_n A_n\|_m^* = 0$. Let $g_n = \sum_{k=1}^{n} \phi_k 1_{A_k}$ and $G_n = \sum_{k=1}^{n} |\phi_k| 1_{A_k}$. Then $|g_n| = G_n \le |f| 1_A$ $m$-a.s. The sequence $\{G_n : n \in \mathbb{N}\} \subset L_1(\|\cdot\|_m^*) \subset L_1(\|\cdot\|^\flat)$ increases $m$-a.s., and hence $\|\cdot\|^\flat$-a.s., to $|f| 1_A$ and $\|f 1_A\|^\flat \le \|f\|^\flat < \infty$; hence $\|G_n - |f| 1_A\|^\flat = \|g_n - f 1_A\|^\flat \to 0$ by dominated convergence. This means that $\sup_n \|G_n\|^\flat = \|f 1_A\|^\flat$ and, as $\|\cdot\|_m^*$ is a maximal mean,
\[ \|f 1_A\|^\flat = \sup_n \|G_n\|^\flat = \sup_n \|G_n\|_m^* = \|f 1_A\|_m^*. \]
By dominated convergence we get that $\|g_n - f 1_A\|_m^* \to 0$ and $f 1_A \in L_1(\|\cdot\|_m^*)$.

To conclude the proof, let $(B_n)$ be a sequence of pairwise disjoint $m$-integrable sets such that $\bigcup_n B_n = \{f \ne 0\}$. It follows that each $f 1_{B_n} \in L_1(\|\cdot\|_m^*) \subset L_1(\|\cdot\|^\flat)$. Since $|\sum_{k=1}^{n} f 1_{B_k}| \nearrow |f|$ and $\|f\|^\flat < \infty$, $f \in L_1(\|\cdot\|^\flat)$ and $\|f - \sum_{k=1}^{n} f 1_{B_k}\|^\flat \to 0$. The same argument used to prove the claim above shows that $\|f - \sum_{k=1}^{n} f 1_{B_k}\|_m^* \to 0$ and that $f \in L_1(\|\cdot\|_m^*)$.
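The finiteness of an iterated integral of $|f|$ cannot be dropped. The following classical example is offered as an illustration; it assumes $X = Y = \mathbb{N}$ with counting measure, which is an assumption about how the framework specializes rather than something stated in the notes.

```latex
\[
f(m, n) :=
\begin{cases}
\ \ 1 & \text{if } m = n, \\
-1 & \text{if } m = n + 1, \\
\ \ 0 & \text{otherwise,}
\end{cases}
\qquad m, n \in \{1, 2, 3, \ldots\}.
\]
\[
\sum_{n} \sum_{m} f(m, n) = \sum_{n} (1 - 1) = 0,
\qquad
\sum_{m} \sum_{n} f(m, n) = 1 + \sum_{m \ge 2} (1 - 1) = 1,
\qquad
\sum_{m, n} |f(m, n)| = \infty .
\]
```

Both iterated sums exist but differ, which is consistent with Theorem 9.2.3 because neither iterated sum of $|f|$ is finite.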
Corollary 9.2.4. Let f ∈ RX and g ∈ RY .
(i) If kf k∗mX = kgk∗mY = 0, then kf gk∗m = 0.
216 9. Finite product of elementary integrals

(ii) If f ∈ L1 (mX ) and g ∈ L1 (mY ), then f g ∈ L1 (mX ⊗ mY ).

(iii) If f ∈ MR(mX ) and g ∈ MR(mY ), then f g ∈ MR(mX ⊗ mY ).

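Proof. (i) Suppose $f$ is $m_X$–negligible and $g$ is $m_Y$–negligible. For any $\varepsilon > 0$ there are $h_X \in (\mathcal{E}_X)^\uparrow_+$ and $h_Y \in (\mathcal{E}_Y)^\uparrow_+$ with $|f| \le h_X$ and $|g| \le h_Y$ such that $\|h_X\|^*_{m_X} < \varepsilon$ and $\|h_Y\|^*_{m_Y} < \varepsilon$. Since $h_X h_Y \in (\mathcal{E}_X \otimes \mathcal{E}_Y)^\uparrow_+$, $\|h_X h_Y\|^*_m = \|h_X h_Y\|^\flat = \|h_X\|^*_{m_X} \|h_Y\|^*_{m_Y} < \varepsilon^2$. Consequently, by solidity, $\|f g\|^*_m = 0$.
(ii) Suppose $f \in L_1(m_X)$ and $g \in L_1(m_Y)$. There are sequences $(\phi^X_n) \subset \mathcal{E}_X$ and $(\phi^Y_n) \subset \mathcal{E}_Y$ such that $\phi^X_n \to f$ in $L_1(m_X)$ and $m_X$–a.s., and $\phi^Y_n \to g$ in $L_1(m_Y)$ and $m_Y$–a.s. By (i), $\phi^X_n \phi^Y_n \to f g$ $m_X \otimes m_Y$–a.s. and
\[ \lim_n \|\phi^X_n \phi^Y_n\|^*_m = \lim_n \|\phi^X_n \phi^Y_n\|^\flat = \|f\|^*_{m_X} \|g\|^*_{m_Y} < \infty . \]
By Daniell–Fatou’s lemma, $f g \in L_1(m)$.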
(iii) Suppose $f$ is $m_X$–measurable and $g$ is $m_Y$–measurable. Then it is clear that $f g$ is measurable on any integrable box, that is, on any set of the form $A_X \times A_Y$ where $A_X \in L_1(m_X)$ and $A_Y \in L_1(m_Y)$. We claim that any integrable set $A \in L_1(m)$ is m–a.s. contained in a countable union of integrable boxes. Observe that if $\phi = \sum_{j=1}^N \phi^X_j \phi^Y_j$, where $\phi^X_j \in \mathcal{E}_X$ and $\phi^Y_j \in \mathcal{E}_Y$, then
\[ \{\phi \neq 0\} \subset \bigcup_{j=1}^N \{\phi^X_j \neq 0\} \times \{\phi^Y_j \neq 0\} . \]
Hence, if $(\phi_n) \subset \mathcal{E}$ is a sequence that converges in $\|\cdot\|^*_m$–mean and $\|\cdot\|^*_m$–a.s. to $1_A$, we have that $A \subset \bigcup_n \{\phi_n \neq 0\}$ m–a.s., showing that the claim holds true. The conclusion then follows directly from localization (Theorem 7.2.1(iii)).
Example 9.2.5. If $f \in M_{\mathbb{R}}(m_X)$ and $g \in M_{\mathbb{R}}(m_Y)$, then $F : (x, y) \mapsto f(x) + g(y)$ is in $M(m_X \otimes m_Y)$ since $F(x, y) = f(x) 1_Y(y) + 1_X(x) g(y)$ is the sum of measurable functions.

Given two positive Radon measures (C00 (X), mX ) and (C00 (Y ), mY ), where X and Y
are l.c.H. spaces, the product mX ⊗ mY constructed from C00 (X) ⊗ C00 (Y ) defines a Radon
measure on C00 (X × Y ).
Theorem 9.2.6. Suppose $(C_{00}(X), m_X)$ and $(C_{00}(Y), m_Y)$ are positive Radon measures on locally compact Hausdorff spaces $X$ and $Y$. If $f \in C_{00}(X \times Y)$, then
(i) $f \in L_1(m_X \otimes m_Y)$, and the maps
\[ F(x) = \int_Y f(x, y)\, m_Y(dy), \qquad G(y) = \int_X f(x, y)\, m_X(dx) \]
are continuous of compact support in $X$ and $Y$ respectively.
If $g \in L_1(m_X \otimes m_Y)$, then
(ii) $E_g := \{g \neq 0\}$ is $m_X \otimes m_Y$–a.s. σ–compact.
(iii) $g_x$ is $m_Y$–integrable for $m_X$–a.a. $x \in X$, $g^y$ is $m_X$–integrable for $m_Y$–a.a. $y \in Y$, and
\[ \int_{X \times Y} g(x, y)\, m_X \otimes m_Y(dx, dy) = \int_X \int_Y g(x, y)\, m_Y(dy)\, m_X(dx) = \int_Y \int_X g(x, y)\, m_X(dx)\, m_Y(dy) . \]

Proof. The Stone–Weierstrass theorem 5.3.17 implies that C00 (X × Y ) ⊂ C00 (X) ⊗ C00 (Y );
thus, C00 (X × Y ) ⊂ M (mX ⊗ mY ).

(i) For $f \in C_{00}(X \times Y)$, let $U \subset X$ and $V \subset Y$ be open relatively compact sets such that $\pi_X(\mathrm{supp}(f)) \times \pi_Y(\mathrm{supp}(f)) \subset U \times V$, where $\pi_X$ and $\pi_Y$ are the projections onto $X$ and $Y$ respectively. By Urysohn’s lemma, there are $\phi \in C_{00}(X)$ and $\psi \in C_{00}(Y)$ such that $\pi_X(\mathrm{supp}(f)) \prec \phi \prec U$ and $\pi_Y(\mathrm{supp}(f)) \prec \psi \prec V$. Hence, $1_{\mathrm{supp}(f)}(x, y) \le \phi(x) \psi(y)$ for all $(x, y) \in X \times Y$. The integrability of $f$ follows from Fubini–Tonelli’s theorem.
It is clear that $F : x \mapsto \int_Y f(x, y)\, m_Y(dy)$ is supported in $U$. Fix $x_0 \in X$ and let $\varepsilon > 0$. For any $y \in Y$ there are neighborhoods $U_y$ of $x_0$ and $V_y$ of $y$ such that
\[ |f(x, z) - f(x_0, y)| < \varepsilon, \qquad (x, z) \in U_y \times V_y . \]
Let $\{V_{y_k} : k = 1, \ldots, N\}$ be a finite subcover of $\pi_Y(\mathrm{supp}(f))$ and set $W = \bigcap_{k=1}^N U_{y_k}$. Then
\[ |f(x, y) - f(x_0, y)| \le \varepsilon \big(\phi(x_0) + \phi(x)\big) \psi(y), \qquad x \in W,\ y \in Y . \]
Consequently
\[ |F(x) - F(x_0)| = \Big| \int_Y \big( f(x, y) - f(x_0, y) \big)\, m_Y(dy) \Big| \le \varepsilon\, |\phi(x_0) + \phi(x)| \int_Y |\psi(y)|\, m_Y(dy) . \]
This proves that $F \in C_{00}(X)$. A similar argument shows that $G \in C_{00}(Y)$.

(ii) If $g \in L_1(m_X \otimes m_Y)$ then $E_g$ is σ–finite with respect to $m_X \otimes m_Y$. The regularity of $m_X \otimes m_Y$ implies that there is a σ–compact set $F \subset E_g$ such that $m_X \otimes m_Y(E_g \setminus F) = 0$.

(iii) follows from Fubini’s theorem.

Example 9.2.7. The conditions $f \in M(m_X \otimes m_Y)$ and finiteness of one of the iterated integrals $m_X(m_Y f)$ or $m_Y(m_X f)$ are not enough to guarantee integrability of $f$, or even equality of the iterated integrals. Consider $X = \mathbb{R}$ with the usual topology and $Y = \mathbb{R}$ with the discrete topology. Clearly $X$ and $Y$ are l.c.H. spaces. The Lebesgue measure $\lambda_1$ and the counting measure $\#$ are Radon measures on $X$ and $Y$ respectively. The diagonal $\Delta = \{(x, y) \in X \times Y : x = y\}$ is Borel measurable, and hence measurable with respect to $\lambda_1 \otimes \#$. It is easy to check that
\[ \int_Y \int_X 1_\Delta(x, y)\, dx\, \#(dy) = 0, \qquad \int_X \int_Y 1_\Delta(x, y)\, \#(dy)\, dx = \infty . \]
The conflict here is that $\Delta$ is not σ–finite with respect to $\lambda_1 \otimes \#$. Also, $(\lambda_1 \otimes \#)(\Delta) = \infty$.

Example 9.2.8. Fubini’s theorem implies that when $(C_{00}(X), m_X)$ and $(C_{00}(Y), m_Y)$ are positive Radon measures and $f \in L_1(X \times Y, m_X \otimes m_Y)$, the maps $x \mapsto \int_Y f_x\, dm_Y$ and $y \mapsto \int_X f^y\, dm_X$ are $m_X$–measurable and $m_Y$–measurable respectively. However, they may fail to be Borel measurable. As before, consider the l.c.H. spaces $X = \mathbb{R}$ with the usual topology and $Y = \mathbb{R}$ with the discrete topology, and let $\Delta$ be the diagonal in $X \times Y$. The atomic measure $\delta_0$ and the counting measure $\#$ are Radon measures on $X$ and $Y$ respectively. Let $A \subset \mathbb{R}$ be a non–Borel set containing 0. The set $\Delta_A := \Delta \cap (X \times A)$ is a Borel set in $X \times Y$ and $\delta_0 \otimes \#(\Delta_A) = 1$. It follows that $1_{\Delta_A} \in L_1(X \times Y, \delta_0 \otimes \#)$; however, $1_A(x) = \int_Y (1_{\Delta_A})_x(y)\, \#(dy)$ is not a Borel function on $X$.

9.3. A few applications of Fubini’s theorem

As an application of Fubini’s theorem, we find expressions for the residual of the first-order
approximation of convex functions on the real line.
Theorem 9.3.1. Suppose $\varphi : (a, b) \to \mathbb{R}$ is a convex function and let $\mu_\varphi$ be the unique measure such that $D^+\varphi(y) - D^+\varphi(x) = \mu_\varphi((x, y])$ for all $-\infty \le a < x < y < b \le \infty$. Then
\[ \varphi(y) = \varphi(x) + D^+\varphi(x)(y - x) + \int_{(a, x]} (t - y)_+\, \mu_\varphi(dt) + \int_{(x, b)} (y - t)_+\, \mu_\varphi(dt). \tag{9.4} \]
For $x_0 \in (a, b)$ fixed, the function
\[ \varphi_{x_0}(x) = \varphi(x) - \varphi(x_0) - D^+\varphi(x_0)(x - x_0) = \int_{(a, x_0]} (t - x)_+\, \mu_\varphi(dt) + \int_{(x_0, b)} (x - t)_+\, \mu_\varphi(dt) \]
is nonnegative, convex, nonincreasing in $(a, x_0)$ and nondecreasing in $(x_0, b)$. Moreover, the limits $\lim_{x \searrow a} \varphi_{x_0}(x)$ and $\lim_{x \nearrow b} \varphi_{x_0}(x)$ exist as numbers in $[0, \infty]$.

Proof. Suppose that $x < y$. Then, by Fubini’s theorem,
\[ \varphi(y) - \varphi(x) - D^+\varphi(x)(y - x) = \int_{(x, y]} \big( D^+\varphi(s) - D^+\varphi(x) \big)\, ds = \int_{(x, y]} \int_{(x, s]} \mu_\varphi(dt)\, ds = \int_{(x, y]} \int_{(t, y]} ds\, \mu_\varphi(dt) = \int_{(x, y]} (y - t)\, \mu_\varphi(dt). \]
If $y < x$ then
\[ \varphi(y) - \varphi(x) - D^+\varphi(x)(y - x) = -\big( \varphi(x) - \varphi(y) - D^+\varphi(y)(x - y) \big) + \big( D^+\varphi(x) - D^+\varphi(y) \big)(x - y) \]
\[ = -\int_{(y, x]} (x - t)\, \mu_\varphi(dt) + \int_{(y, x]} (x - y)\, \mu_\varphi(dt) = \int_{(y, x]} (t - y)\, \mu_\varphi(dt). \]

The second statement follows directly from (9.4).
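A quick numerical sanity check of (9.4) can be sketched as follows. The sketch assumes the smooth case ϕ = exp, for which µϕ(dt) = ϕ''(t) dt = e^t dt; the helper names `remainder_integral` and `check_exp` are hypothetical and the integral is approximated by a midpoint rule, so agreement is only up to discretization error.

```python
import math

def remainder_integral(x, y, density, n=20000):
    """Midpoint-rule approximation of the remainder term in (9.4),
    assuming mu_phi(dt) = density(t) dt (illustrative assumption)."""
    lo, hi = min(x, y), max(x, y)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        # x < y: kernel (y - t)_+ on (x, b); y < x: kernel (t - y)_+ on (a, x]
        kernel = (y - t) if x < y else (t - y)
        total += kernel * density(t) * h
    return total

def check_exp(x, y):
    """Return (left side, right side) of (9.4) for phi = exp."""
    lhs = math.exp(y)
    rhs = math.exp(x) + math.exp(x) * (y - x) + remainder_integral(x, y, math.exp)
    return lhs, rhs
```

Both orderings x < y and y < x can be tried, since the two cases are handled by different integrals in (9.4).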

Theorem 9.3.2. (Generalized Minkowski’s inequality) Under the hypothesis of Fubini’s theorem, suppose that $x \mapsto \int_Y |f(x, y)|\, \nu(dy)$ is µ–a.s. finite. Then,
\[ \Big( \int_X \Big| \int_Y f(x, y)\, \nu(dy) \Big|^p \mu(dx) \Big)^{1/p} \le \int_Y \Big( \int_X |f(x, y)|^p\, \mu(dx) \Big)^{1/p} \nu(dy) \tag{9.5} \]
for all $1 \le p < \infty$.

Proof. Without loss of generality we can assume that $f \ge 0$. The case $p = 1$ is a restatement of Fubini’s theorem. Suppose that $p > 1$ and let $H(x) = \int_Y f(x, y)\, \nu(dy)$. From Fubini’s theorem and then Hölder’s inequality we obtain
\[ \|H\|^p_{L_p(\mu)} = \int_X \Big( \int_Y f(x, y)\, \nu(dy) \Big) H^{p-1}(x)\, \mu(dx) = \int_Y \int_X f(x, y) H^{p-1}(x)\, \mu(dx)\, \nu(dy) \le \int_Y \Big( \int_X |f(x, y)|^p\, \mu(dx) \Big)^{1/p} \|H\|^{p-1}_{L_p(\mu)}\, \nu(dy), \]
and the conclusion follows immediately if $\|H\|_p < \infty$. If $\|H\|_p = \infty$, choose monotone sequences of sets $A_n \subset X$ and $B_m \subset Y$, increasing to $X$ and $Y$ respectively, such that $\mu(A_n) \vee \nu(B_m) < \infty$, and for any $k \in \mathbb{N}$ define $f_k = f \wedge k$. Then
\[ \Big( \int_{A_n} \Big( \int_{B_m} f_k(x, y)\, \nu(dy) \Big)^p \mu(dx) \Big)^{1/p} \le \int_{B_m} \Big( \int_{A_n} |f_k(x, y)|^p\, \mu(dx) \Big)^{1/p} \nu(dy). \]

Letting first k → ∞, then n → ∞ and finally m → ∞ we obtain the desired result.
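For finitely supported measures, inequality (9.5) becomes a statement about weighted sums and can be checked directly. The following is a minimal sketch under that assumption; the function name `minkowski_sides` and the encoding of f as a nested list are illustrative choices, not part of the text.

```python
def minkowski_sides(f, mu, nu, p):
    """Both sides of (9.5) for discrete measures.

    f[i][j] stands for f(x_i, y_j); mu[i] and nu[j] are the nonnegative
    masses of the points x_i and y_j (hypothetical encoding)."""
    rows, cols = len(mu), len(nu)
    # Left side: L_p(mu) norm of x -> integral of f(x, .) against nu
    lhs = sum(
        mu[i] * abs(sum(f[i][j] * nu[j] for j in range(cols))) ** p
        for i in range(rows)
    ) ** (1.0 / p)
    # Right side: integral against nu of the L_p(mu) norms of f(., y)
    rhs = sum(
        nu[j] * sum(mu[i] * abs(f[i][j]) ** p for i in range(rows)) ** (1.0 / p)
        for j in range(cols)
    )
    return lhs, rhs
```

For p = 1 and f ≥ 0 the two sides coincide, which is the Fubini case mentioned in the proof.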

Suppose EX ⊂ Bb (X) is a ring lattice closed under chopping and let µ be a positive
σ–finite elementary integral on EX . Let ν be any Radon (Borel) elementary integral on the
Borel measurable space ([0, ∞), B([0, ∞))). From Corollary 9.2.4, it follows that for any
measurable function f : X → [0, ∞], the set
E = {(x, t) ∈ X × [0, ∞) : f (x) > t}
is measurable for the product ν ⊗ µ, for the function (x, t) ∈ X × [0, ∞) 7→ f (x) − t is
measurable.
Theorem 9.3.3. Let $\nu$ be a Radon measure (Borel measure) on the half line $[0, \infty)$. If $f \in L_1^+(\mu)$ then
\[ \int_X \nu\big( [0, f(x)) \big)\, \mu(dx) = \int_0^\infty \mu(\{f > t\})\, \nu(dt). \tag{9.6} \]
In particular, if $\varphi$ is a continuously differentiable function with $\varphi(0) = 0$ and $\nu(dx) = \varphi'(x)\, dx$, then
\[ \int_X (\varphi \circ f)\, d\mu = \int_0^\infty \mu(\{f > t\})\, \varphi'(t)\, dt. \tag{9.7} \]

Proof. As f ∈ L1 (µ), the set E = {(x, t) ∈ X × [0, ∞) : f (x) > t} ∈ M (ν ⊗ µ) is σ–finite.

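By Fubini’s theorem, with $E^t = \{x : f(x) > t\}$ and $E_x = [0, f(x))$,
\[ \int_0^\infty \mu(E^t)\, \nu(dt) = \int_{X \times [0, \infty)} 1_E(x, t)\, \mu \otimes \nu(dx, dt) = \int_X \nu(E_x)\, \mu(dx). \]
In the special case (9.7), notice that $\nu([0, f(x))) = \int_0^{f(x)} \varphi'(t)\, dt = \varphi(f(x))$ by the fundamental theorem of Calculus.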
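Identity (9.7) can be illustrated numerically when µ is a finite weighted sum of point masses. The sketch below makes that assumption and approximates the integral over t by a midpoint rule; the name `layer_cake_sides` and the argument layout are hypothetical.

```python
def layer_cake_sides(values, weights, phi, dphi, n=30000):
    """Both sides of (9.7) for a discrete measure mu.

    values[i] = f(x_i) >= 0, weights[i] = mu({x_i}); phi(0) = 0 and
    dphi is the derivative of phi (all illustrative assumptions)."""
    lhs = sum(w * phi(v) for v, w in zip(values, weights))
    top = max(values)          # mu({f > t}) vanishes for t >= max f
    h = top / n
    rhs = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        tail = sum(w for v, w in zip(values, weights) if v > t)  # mu({f > t})
        rhs += tail * dphi(t) * h
    return lhs, rhs
```

With ϕ(t) = t this reduces to the familiar formula expressing the integral of f through its distribution function.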

9.4. The product σ–algebra

Given measurable spaces (X, A ) and (Y, B), the projections pX : (x, y) 7→ x and pY :
(x, y) 7→ y on X × Y generate a σ–algebra, A ⊗ B, on X × Y called the product σ–algebra
of A and B. The measurable space (X × Y, A ⊗ B) is called the product space
of (X, A) and (Y, B).
The following result states that cross sections of measurable sets and measurable functions
on the product σ–algebra are also measurable.
Lemma 9.4.1. Let E ∈ A ⊗ B and let f : (X × Y, A ⊗ B) → (R, R) be a measurable
function. Then, for any x ∈ X and y ∈ Y ,
(i) Ex ∈ B, E y ∈ A,
(ii) fx : (Y, B) → (R, R) and f y : (X, A) → (R, R) are measurable.

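Proof. Statement (i) clearly holds for sets of the form $A \times B$ with $A \in \mathcal{A}$ and $B \in \mathcal{B}$. For each $x \in X$ and $y \in Y$, consider the collections $\mathcal{D}^y = \{D \in \mathcal{A} \otimes \mathcal{B} : D^y \in \mathcal{A}\}$ and $\mathcal{D}_x = \{D \in \mathcal{A} \otimes \mathcal{B} : D_x \in \mathcal{B}\}$. It is easy to check that if $E \subset F \subset X \times Y$ and $\{A_n : n \in \mathbb{Z}_+\}$ is a family of subsets of $X \times Y$, then
\[ (F \setminus E)_x = F_x \setminus E_x, \qquad \Big( \bigcup_n A_n \Big)_x = \bigcup_n (A_n)_x . \]
Similar results hold for the corresponding $y$–sections. From these observations, it follows that $\mathcal{D}_x$ and $\mathcal{D}^y$ are both d–systems containing the π–system $\{A \times B : A \in \mathcal{A}, B \in \mathcal{B}\}$. Therefore, $\mathcal{D}_x = \mathcal{A} \otimes \mathcal{B} = \mathcal{D}^y$.

Statement (ii) follows from noticing that $(f^{-1}(B))_x = (f_x)^{-1}(B)$ and $(f^{-1}(B))^y = (f^y)^{-1}(B)$ hold for any $B \subset \mathbb{R}$.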
Theorem 9.4.2. Let (X × Y, A ⊗ B) be the product space of the measurable spaces (X, A)
and (Y, B). For any C ∈ A ⊗ B, the collection of sections {Cx : x ∈ X} has at most the
cardinality of the continuum. In particular, if ∆ = {(x, x) : x ∈ X} ∈ A ⊗ A, then X has
at most the cardinality of the continuum.

Proof. There exists a sequence $\mathcal{S} = \{A_n \times B_n : A_n \in \mathcal{A}, B_n \in \mathcal{B}\}$ such that $C \in \sigma(\mathcal{S})$. If $1_{A_n}(x_1) = 1_{A_n}(x_2)$ for all $n$, then
\[ C_{x_1} = C_{x_2}. \tag{9.8} \]
Indeed, (9.8) holds with each $A_n \times B_n \in \mathcal{S}$ in place of $C$, and the collection $\mathcal{D}$ of subsets of $X \times Y$ for which (9.8) holds is a σ–algebra. Therefore, there is a one–to–one map from the different sections $\{C_x : x \in X\}$ into the different sequences $\{(1_{A_n}(x))_{n \in \mathbb{Z}_+} : x \in X\} \subset \{0, 1\}^{\mathbb{Z}_+}$. The last statement follows from the fact that $\Delta_x = \{x\}$ for each $x \in X$.
Theorem 9.4.3. Let (X, τX ) and (Y, τY ) be two topological spaces and let τX×Y be the
product topology on X × Y . If BX , BY and BX×Y are the corresponding Borel σ–algebras,
then BX ⊗ BY ⊂ BX×Y . Equality holds if both X and Y are second countable.

Proof. As the map pX : (x, y) 7→ x is continuous on (X × Y, τX×Y ), {A × Y : A ∈ BX } ⊂

BX×Y . Similarly, using pY : (x, y) 7→ y instead, we obtain that {X×B : B ∈ BY } ⊂ BX×Y .
Therefore, BX ⊗ BY ⊂ BX×Y .
If τX and τY have countable bases BX and BY respectively, then T = {U ×V : U ∈ BX , V ∈
BY } is a countable base for τX×Y . It follows that τX×Y ⊂ σ(T ) = BX ⊗ BY .

9.5. Image of elementary integrals

Let E1 ⊂ Bb (Ω1 ) and E2 ⊂ Bb (Ω2 ) be two vector lattices closed under chopping. Let n
be a positive σ–continuous elementary integral on E1 . Assume G : Ω1 −→ Ω2 satisfies
φ ◦ G ∈ L1 (n) for all φ ∈ E2 .
Theorem 9.5.1. The functional $n_G : \phi \mapsto n(\phi \circ G)$ is a positive σ–continuous elementary integral on $\mathcal{E}_2$. Moreover, $f \in L_1(n_G)$ iff $f \circ G \in L_1(n)$. In either case,
\[ n_G(f) = n(f \circ G), \qquad f \in L_1(n_G). \]
If $f \in M_{\mathbb{R}}(\|\cdot\|^*_{n_G})$ then $f \circ G$ is measurable on any $\|\cdot\|^*_n$–integrable set of the form $G^{-1}(B)$ with $B \in L_1(n_G)$. If $n_G$ is σ–finite, then so is $n$, and if $f \in M_{\mathbb{R}}(\|\cdot\|^*_{n_G})$ then $f \circ G \in M_{\mathbb{R}}(\|\cdot\|^*_n)$.

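Proof. Since the integral extension of $n$ to $L_1(\|\cdot\|^*_n)$ is linear and positive, $n_G$ is linear and positive. For any sequence $(\phi_k) \subset \mathcal{E}_2$ such that $\phi_k \searrow 0$ we have that $(\phi_k \circ G) \subset L_1(\|\cdot\|^*_n)$ and $\phi_k \circ G \searrow 0$. Therefore, by monotone convergence, $n_G(\phi_k) = n(\phi_k \circ G) = \|\phi_k \circ G\|^*_n \to 0$. This shows that $n_G$ is a positive σ–continuous elementary integral on $\mathcal{E}_2$.
The properties of the Daniell mean $\|\cdot\|^*_n$ imply that the functional $\|\cdot\|^\flat : f \mapsto \|f \circ G\|^*_n$ on $\overline{\mathbb{R}}^{\Omega_2}$ is a mean for $\mathcal{E}_2$ which coincides with the Daniell mean $\|\cdot\|^*_{n_G}$ on $\mathcal{E}_2$. By maximality, $\|\cdot\|^\flat = \|\cdot\|^*_{n_G}$ on $L_1(\|\cdot\|^*_{n_G})$. Hence
\[ \|\phi \circ G - f \circ G\|^*_n = \|\phi - f\|^*_{n_G} \]
for any $f \in L_1(\|\cdot\|^*_{n_G})$ and $\phi \in \mathcal{E}_2$. Consequently $f \in L_1(\|\cdot\|^*_{n_G})$ iff $f \circ G \in L_1(\|\cdot\|^*_n)$. Therefore, if $(\phi_k) \subset \mathcal{E}_2$ converges to $f$ in $L_1(\|\cdot\|^*_{n_G})$, then $n(f \circ G) = \lim_k n(\phi_k \circ G) = \lim_k n_G(\phi_k) = n_G(f)$.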
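Suppose $f \in M_{\mathbb{R}}(\mathcal{E}_2, \|\cdot\|^*_{n_G})$. Then, for any $B \in L_1(\|\cdot\|^*_{n_G})$ and $\varepsilon > 0$, there are $\psi \in \mathcal{E}_2^u$ and $B_0 \subset B$, $B_0 \in L_1(\|\cdot\|^*_{n_G})$, such that $\|B \setminus B_0\|^*_{n_G} < \varepsilon/2$ and $f 1_{B_0} = \psi 1_{B_0}$. As $(\psi 1_{B_0}) \circ G \in L_1(\|\cdot\|^*_n)$, there is a function $\varphi \in \mathcal{E}_1^u$ and a $\|\cdot\|^*_n$–integrable set $A \subset G^{-1}(B_0)$ with $\|G^{-1}(B_0) \setminus A\|^*_n < \varepsilon/2$ on which $(\psi 1_{B_0}) \circ G = \varphi$. Thus $\|G^{-1}(B) \setminus A\|^*_n < \varepsilon$ and $(f \circ G) 1_A = \varphi 1_A$.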

The last assertion follows from the second statement of the Theorem, the identity
\[ G^{-1}\Big( \bigcup_n B_n \Big) = \bigcup_n G^{-1}(B_n) \]
and Theorem 7.2.1(iii).
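In the simplest setting, where n is a finite sum of point masses, the identity nG(f) = n(f ◦ G) of Theorem 9.5.1 is a regrouping of a finite sum. The sketch below illustrates this special case only; the dictionary encoding of the measure and the names `pushforward` and `integrate` are assumptions made for the example.

```python
from collections import defaultdict

def pushforward(weights, G):
    """Image n_G of a discrete measure under G.

    weights maps each point x to its mass n({x}) (hypothetical encoding);
    the result maps each image point y to n(G^{-1}({y}))."""
    image = defaultdict(float)
    for x, w in weights.items():
        image[G(x)] += w
    return dict(image)

def integrate(weights, f):
    """Integral of f against the discrete measure given by weights."""
    return sum(w * f(x) for x, w in weights.items())
```

Integrating f against `pushforward(weights, G)` and integrating f ◦ G against `weights` should then agree up to rounding.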

9.6. Change of variables formula in (Rn , B(Rn ), λ).

Our aim is to study the induced measure $\lambda \circ G^{-1}$ for smooth functions $G$ from $\mathbb{R}^n$ to itself. We will first consider linear transformations and then generalize to diffeomorphisms between open sets.

9.6.1. Vitali’s covering theorem. We start by discussing two technical results about coverings of sets in $\mathbb{R}^n$ by closed balls. These results will be used in our proof of the change of variables theorem and in the proof of the equivalence of Lebesgue’s measure and Hausdorff’s measure $H^n$ on $\mathbb{R}^n$.

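Lemma 9.6.1. (Vitali’s covering Lemma.) Let $(X, d)$ be a separable metric space and $\mathcal{B}$ a collection of closed balls. For any $B \in \mathcal{B}$, let $\hat{B}$ be the concentric closed ball with $\mathrm{diam}(\hat{B}) = 5\, \mathrm{diam}(B)$.
(i) If
(a) $\mathrm{diam}(B) > 0$ for all $B \in \mathcal{B}$,
(b) $D := \sup_{B \in \mathcal{B}} \mathrm{diam}(B) < \infty$,
then there exists a countable collection $\mathcal{G} \subset \mathcal{B}$ of pairwise disjoint sets such that for any $B \in \mathcal{B}$, there is $B' \in \mathcal{G}$ satisfying $B \cap B' \neq \emptyset$ and $B \subset \hat{B'}$.
(ii) Suppose $\emptyset \neq A \subset \bigcup_{B \in \mathcal{B}} B$. In addition to (a) and (b), if
(c) $\inf\{\mathrm{diam}(B) : x \in B, B \in \mathcal{B}\} = 0$ for any $x \in A$,
then there exists a countable collection $\mathcal{G}$ of pairwise disjoint balls in $\mathcal{B}$ such that for any finite collection $\{B_1, \ldots, B_m\} \subset \mathcal{B}$,
\[ A \setminus \bigcup_{k=1}^m B_k \subset \bigcup_{B \in \mathcal{G} \setminus \{B_1, \ldots, B_m\}} \hat{B}. \tag{9.9} \]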

Proof. (i) For each $k \in \mathbb{N}$, define
\[ \mathcal{B}_k = \{B \in \mathcal{B} : 2^{-k} D < \mathrm{diam}(B) \le 2^{-k+1} D\}. \]
By Zorn’s lemma, there is a maximal collection $\mathcal{G}_1 \subset \mathcal{B}_1$ of pairwise disjoint sets. Suppose that collections $\mathcal{G}_1, \ldots, \mathcal{G}_{k-1}$ of pairwise disjoint sets have been determined. If we have not exhausted $\mathcal{B}$, choose a maximal subcollection $\mathcal{G}_k$ of pairwise disjoint sets in
\[ \Big\{ B \in \mathcal{B}_k : \forall B' \in \bigcup_{j=1}^{k-1} \mathcal{G}_j,\ B \cap B' = \emptyset \Big\}. \]
It is clear that the collection $\mathcal{G} = \bigcup_k \mathcal{G}_k$ only has pairwise disjoint sets.
If $B \in \mathcal{B}$, then there is a unique $k \in \mathbb{N}$ such that $B \in \mathcal{B}_k$. The maximality of $\mathcal{G}_k$ implies that there is $B' \in \bigcup_{j=1}^k \mathcal{G}_j$ such that $B \cap B' \neq \emptyset$. Let $z$ be an element in the intersection and let $x$ be the center of $B'$. Then, for any $y \in B$,
\[ d(y, x) \le d(y, z) + d(z, x) \le \mathrm{diam}(B) + \tfrac{1}{2} \mathrm{diam}(B') \le 2^{-k+1} D + \tfrac{1}{2} \mathrm{diam}(B') < 2\, \mathrm{diam}(B') + \tfrac{1}{2} \mathrm{diam}(B') = \tfrac{5}{2} \mathrm{diam}(B'). \]
Thus $B \subset \hat{B'}$. Since $X$ is separable, each $\mathcal{G}_k$ is countable.
(ii) Let $\{B_1, \ldots, B_m\} \subset \mathcal{B}$. If $A \subset \bigcup_{k=1}^m B_k$ we are done. Otherwise, suppose $x \in A \setminus \bigcup_{k=1}^m B_k$. As $F_m = \bigcup_{k=1}^m B_k$ is closed, $d(x, F_m) > 0$. By (c) there exists $B \in \mathcal{B}$ such that $x \in B$ and $B \subset X \setminus F_m$. Part (i) implies that there is $B' \in \mathcal{G}$ such that $B \cap B' \neq \emptyset$ and $B \subset \hat{B'}$. Therefore, $B' \in \mathcal{G} \setminus \{B_1, \ldots, B_m\}$ and (9.9) holds.
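For a finite family of balls the selection in part (i) can be imitated by a greedy procedure. The sketch below is not the construction used in the proof (which works through the dyadic classes and Zorn’s lemma); it is a simplified largest-first variant for closed intervals on the line, with hypothetical helper names, that produces a disjoint subfamily whose fivefold enlargements absorb every ball of the family.

```python
def greedy_disjoint(balls):
    """Largest-first greedy choice of pairwise disjoint closed intervals.

    balls is a list of (center, radius) pairs on the real line
    (an illustrative finite stand-in for the family in Lemma 9.6.1)."""
    chosen = []
    for c, r in sorted(balls, key=lambda b: -b[1]):
        # closed intervals are disjoint iff the centers are farther apart
        # than the sum of the radii
        if all(abs(c - c2) > r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

def covered_by_enlargement(ball, chosen, factor=5.0):
    """True if ball meets some chosen ball B' and lies inside factor * B'."""
    c, r = ball
    return any(
        abs(c - c2) <= r + r2 and abs(c - c2) + r <= factor * r2
        for c2, r2 in chosen
    )
```

Because larger balls are examined first, a rejected ball always meets a chosen ball of at least its own radius, which is why an enlargement factor of 5 (in fact 3) suffices in this finite setting.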
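Theorem 9.6.2. (Vitali’s covering theorem.) Suppose $U \subset \mathbb{R}^d$ is open. For any covering $\mathcal{B}$ of $U$ by closed balls contained in $U$ satisfying
\[ \inf_{B \in \mathcal{B}} \mathrm{diam}(B) = 0, \qquad \sup_{B \in \mathcal{B}} \mathrm{diam}(B) < \infty, \]
there is a sequence $\{B_n : n \in \mathbb{N}\} \subset \mathcal{B}$ of pairwise disjoint balls such that $\mathrm{rad}(B_n) \le \delta$ and $\lambda_d(U \setminus \bigcup_n B_n) = 0$.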
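Proof. Since $U = \bigcup_n \{x \in U : n < |x| < n + 1\} \cup \{x \in U : |x| \in \mathbb{N}\}$ and $\lambda_d(\{x \in U : |x| \in \mathbb{N}\}) = 0$, we may assume without loss of generality that $U$ is bounded. As before, for any given ball $B$ we will use $\hat{B}$ to denote the concentric ball of radius $5 r_B$. Vitali’s covering Lemma shows that there is a countable collection $\mathcal{G}_1 \subset \mathcal{B}$ of pairwise disjoint sets such that
\[ U = \bigcup_{B \in \mathcal{B}} B \subset \bigcup_{B \in \mathcal{G}_1} \hat{B}. \]
Whence,
\[ \lambda_d(U) \le \sum_{B \in \mathcal{G}_1} \lambda_d(\hat{B}) = 5^d \sum_{B \in \mathcal{G}_1} \lambda_d(B) = 5^d \lambda_d\Big( \bigcup_{B \in \mathcal{G}_1} B \Big); \]
that is, $\lambda_d\big( U \setminus \bigcup_{B \in \mathcal{G}_1} B \big) \le (1 - 5^{-d}) \lambda_d(U)$. Therefore, for any $1 - 5^{-d} < \theta < 1$, there are $n_1$ balls $B_1, \ldots, B_{n_1}$ in $\mathcal{G}_1$ such that
\[ \lambda_d\Big( U \setminus \bigcup_{j=1}^{n_1} B_j \Big) < \theta \lambda_d(U). \]
Proceeding by induction, suppose pairwise disjoint balls $B_1, \ldots, B_{n_k}$ in $\mathcal{B}$ have been chosen so that
\[ \lambda_d\Big( U \setminus \bigcup_{j=1}^{n_k} B_j \Big) < \theta^k \lambda_d(U). \]
Let $\mathcal{B}_k$ be the set of all closed balls in $\mathcal{B}$ that are contained in the open set $U_k := U \setminus \bigcup_{j=1}^{n_k} B_j$. Clearly $\mathcal{B}_k$ covers $U_k$. Applying the same argument to $U_k$ in place of $U$, we obtain disjoint balls $B_{n_k + 1}, \ldots, B_{n_{k+1}}$ in $\mathcal{B}_k$ such that
\[ \lambda_d\Big( U \setminus \bigcup_{j=1}^{n_{k+1}} B_j \Big) = \lambda_d\Big( U_k \setminus \bigcup_{j=n_k+1}^{n_{k+1}} B_j \Big) < \theta \lambda_d(U_k) < \theta^{k+1} \lambda_d(U). \]
The collection of all such balls $\{B_n : n \in \mathbb{N}\}$ is pairwise disjoint and, by letting $k \to \infty$, $\lambda_d(U \setminus \bigcup_n B_n) = 0$.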

9.6.2. Linear transformations. Suppose that $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation with $\det(T) \neq 0$. This transformation induces a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ through Lebesgue measure $\lambda$; namely, $\lambda(T^{-1}(dx))$. Denote by $GL(n, \mathbb{R})$ the group of invertible linear transformations of $\mathbb{R}^n$ onto itself.

Theorem 9.6.3. Suppose T is a linear transformation from Rn to itself. Then,

(i) if f is Borel measurable, so is f ◦ T .

(ii) λ(T (E)) = | det(T )|λ(E) for any E ∈ B(Rn ).
(iii) If, in addition, $\det(T) \neq 0$, then $g \circ T \in L_1(\lambda)$ iff $g \in L_1(\lambda)$ and
\[ \int_{\mathbb{R}^n} g \circ T(x)\, \lambda(dx) = \frac{1}{|\det(T)|} \int_{\mathbb{R}^n} g(x)\, \lambda(dx). \]

Proof. (i) Measurability of f ◦ T follows from the continuity of T .

(ii) Suppose that $\det(T) \neq 0$. Every nonsingular linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ can be expressed as a composition of three types of elementary linear transformations:

(a) $T_1 : [x_1 \ldots x_j \ldots x_n] \mapsto [x_1 \ldots c x_j \ldots x_n]$ where $c \in \mathbb{R} \setminus \{0\}$,
(b) $T_2 : [x_1 \ldots x_j \ldots x_n] \mapsto [x_1 \ldots x_j + x_k \ldots x_n]$, $j \neq k$,
(c) $T_3 : [x_1 \ldots x_j \ldots x_k \ldots x_n] \mapsto [x_1 \ldots x_k \ldots x_j \ldots x_n]$, $j \neq k$,

so it is enough to consider elementary linear transformations. Recall that $\int_{\mathbb{R}} f(c t)\, dt = \frac{1}{|c|} \int_{\mathbb{R}} f(t)\, dt$ and $\int_{\mathbb{R}} f(t + a)\, dt = \int_{\mathbb{R}} f(t)\, dt$ for any $f \in L_1(\mathbb{R})$, $c \neq 0$ and $a \in \mathbb{R}$. Integrating first with respect to the $j$–th coordinate and applying Fubini’s theorem gives
\[ \int_{\mathbb{R}^n} g \circ T_1(x)\, \lambda(dx) = \frac{1}{|c|} \int_{\mathbb{R}^n} g(y)\, \lambda(dy), \]
\[ \int_{\mathbb{R}^n} g \circ T_2(x)\, \lambda(dx) = \int_{\mathbb{R}^n} g(y)\, \lambda(dy), \]
\[ \int_{\mathbb{R}^n} g \circ T_3(x)\, \lambda(dx) = \int_{\mathbb{R}^n} g(y)\, \lambda(dy) \]
for any measurable function $g \ge 0$ or $g \in L_1(\mathbb{R}^n)$. Hence, if $\det(T) \neq 0$ and $E \in \mathcal{B}(\mathbb{R}^n)$, $\lambda(T^{-1}(E)) = |\det(T)|^{-1} \lambda(E)$. Since $(T^{-1})^{-1} = T$, we conclude that $\lambda(T(E)) = |\det(T)| \lambda(E)$.
If $\det(T) = 0$, then $T(\mathbb{R}^n)$ is a subspace of dimension $d < n$; thus, $\lambda(T(\mathbb{R}^n)) = 0$, for we can use a linear map $L$ with $|\det L| = 1$ to map $T(\mathbb{R}^n)$ onto $\mathbb{R}^d \times \{0\}^{n-d}$. Therefore $\lambda(T(E)) = 0$ for all $E \in \mathcal{B}(\mathbb{R}^n)$.
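Statement (ii) can be checked by hand in the plane, where the image of the unit square under T is a parallelogram whose area is given by the shoelace formula. The following sketch does this for 2×2 matrices; the function names are illustrative and the restriction to n = 2 and to the unit square is an assumption made for simplicity.

```python
def polygon_area(pts):
    """Shoelace formula for a polygon given by its vertices in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def image_area_of_unit_square(T):
    """Area of T([0,1]^2) for a 2x2 matrix T = ((a, b), (c, d))."""
    (a, b), (c, d) = T
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    image = [(a * x + b * y, c * x + d * y) for x, y in square]
    return polygon_area(image)
```

Since the unit square has Lebesgue measure 1, the returned area should equal |det(T)|, including the degenerate case det(T) = 0.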

9.6.3. Diffeomorphisms. Suppose Ω ⊂ Rn is open and consider a function G : Ω → Rn .

Recall that G is differentiable at a point x ∈ Ω if there exists a unique T ∈ L(Rn , Rn ) such
that for any ε > 0, there is δ > 0 such that
(9.10) |G(y) − G(x) − T (y − x)| ≤ ε|y − x|, for all y ∈ B(x; δ)
The linear operator T is the derivative of G at x and will be denoted as G′ (x). In the
standard basis on Rn , the matrix representation of G′ (x) is called the Jacobian matrix of
G at x. The determinant of G′ (x), which we will denote by JG (x), is called the Jacobian
determinant of G at x.
Remark 9.6.4. One useful geometric interpretation of (9.10) is that
\[ G\big( B(x; r) \big) \subset \Big\{ y : d\Big( y,\ B\big(G(x); \|T\| r\big) \cap \big( G(x) + T(\mathbb{R}^n) \big) \Big) < r \varepsilon \Big\}. \]

Lemma 9.6.5. If $G$ is differentiable at the point $x \in \Omega$ then, for any $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $0 < r \le \delta$,
\[ \lambda^*\big( G(B(x; r)) \big) \le \big( |J_G(x)| + \varepsilon \big) \lambda\big( B(x; r) \big), \]
where $\lambda^*$ is the outer measure (or the Daniell–mean) associated with the Lebesgue measure on $\mathbb{R}^n$.

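Proof. Let $T = G'(x)$. First we consider the case when $\det(T) = 0$. In this case, $T(\mathbb{R}^n)$ is a linear subspace of dimension $m := \mathrm{rank}(T) < n$. Given $\varepsilon > 0$ we will determine a small number $\epsilon_1 > 0$ and a corresponding $\delta > 0$ for which (9.10) holds with $\epsilon_1$ in place of $\varepsilon$. For any $0 < r \le \delta$, all points of $G(B(x; r))$ lie within a distance $\epsilon_1 r$ of $B(G(x); \|T\| r) \cap \{G(x) + T v : v \in \mathbb{R}^n\}$. Hence, $G(B(x; r))$ is contained in a box with $m$ sides of length $2(\|T\| + \epsilon_1) r$ and $n - m$ sides of length $2 \epsilon_1 r$. Consequently
\[ \lambda^*\big( G(B(x; r)) \big) \le 2^n (\|T\| + \epsilon_1)^m \epsilon_1^{n-m} r^n = c_n (\|T\| + \epsilon_1)^m \epsilon_1^{n-m} \lambda(B(x; r)), \]
where $c_n$ is a constant that depends only on the dimension $n$. It is enough to choose $\epsilon_1 > 0$ small enough so that $c_n (\|T\| + \epsilon_1)^m \epsilon_1^{n-m} < \varepsilon$.

We now assume that $\det(T) \neq 0$. For any $\varepsilon > 0$ we will choose a small $\epsilon_1 > 0$ and a corresponding $\delta > 0$ so that (9.10) holds with $\epsilon_1$ in place of $\varepsilon$. So, if $0 < r \le \delta$ then
\[ \big| T^{-1} G(y) - T^{-1} G(x) \big| \le \big( 1 + \epsilon_1 \|T^{-1}\| \big) |y - x| \]
for all $y \in B(x; r)$. This means that $T^{-1} G(B(x; r)) \subset B\big( T^{-1} G(x); (1 + \epsilon_1 \|T^{-1}\|) r \big)$. Therefore
\[ \lambda^*\big( T^{-1} G(B(x; r)) \big) \le \big( 1 + \epsilon_1 \|T^{-1}\| \big)^n \lambda\big( B(T^{-1} G(x); r) \big) = \big( 1 + \epsilon_1 \|T^{-1}\| \big)^n \lambda\big( B(x; r) \big). \]
By Theorem 9.6.3(ii), $\lambda^*\big( T^{-1} G(B(x; r)) \big) = |\det(T^{-1})|\, \lambda^*\big( G(B(x; r)) \big)$, and so
\[ \lambda^*\big( G(B(x; r)) \big) \le |\det(T)| \big( 1 + \epsilon_1 \|T^{-1}\| \big)^n \lambda\big( B(x; r) \big). \]
It suffices to choose $\epsilon_1 > 0$ small enough so that $|\det(T)| (1 + \epsilon_1 \|T^{-1}\|)^n < |\det(T)| + \varepsilon$.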

Theorem 9.6.6. Suppose Ω ⊂ Rn is an open set and G : Ω → Rn . Assume that G is

differentiable on a set E and that M := supx∈E |JG (x)| < ∞. Then

(9.11) λ∗ (G(E)) ≤ M λ∗ (E).

In particular, if JG (x) = 0 for all x ∈ E then G(E) is negligible, and so Lebesgue–measurable.

Remark 9.6.7. The last statement, with JG (x) = 0 for all x ∈ E, is a special version of Sard’s
theorem where domain and range are of the same dimension. A point y ∈ G(Ω) is a critical
value if there is x ∈ Ω such that y = G(x), G is differentiable at x and JG (x) = 0. Sard’s
theorem states that the set of critical values of a function G is Lebesgue negligible.

Proof. We first consider the case where $E$ is bounded. By the outer regularity of Lebesgue measure, for any $\varepsilon > 0$ there is an open set $U$ such that $E \subset U$ and $\lambda(U) < \lambda^*(E) + \varepsilon$. By Lemma 9.6.5, for each $x \in E$ there is $\delta_x > 0$ such that for all $0 < r \le \delta_x$,
\[ \lambda^*\big( G(B(x; r)) \big) \le (M + \varepsilon) \lambda(B(x; r)). \]
The family $\mathcal{B}$ of closed balls $\overline{B}(x; r) \subset U$ where $x \in E$ and $0 < r \le \min(\delta_x, 1)/5$ satisfies the conditions of Vitali’s covering lemma. Hence, there exists a sequence $\mathcal{G} = \{\overline{B}_k : k \in \mathbb{N}\}$ of pairwise disjoint balls in $\mathcal{B}$ such that
\[ E \subset \bigcup_{j=1}^k \overline{B}_j \cup \bigcup_{j=k+1}^\infty \hat{B}_j \]
for all $k \in \mathbb{N}$, where $\hat{B}$ is the ball concentric to $B$ with $\mathrm{diam}(\hat{B}) = 5\, \mathrm{diam}(B)$. It follows that
\[ \lambda^*(G(E)) \le \sum_{j=1}^k \lambda^*\big( G(\overline{B}_j) \big) + \sum_{j=k+1}^\infty \lambda^*\big( G(\hat{B}_j) \big) \le (M + \varepsilon) \sum_{j=1}^k \lambda(\overline{B}_j) + (M + \varepsilon) 5^n \sum_{j=k+1}^\infty \lambda(\overline{B}_j). \]
Since $\sum_k \lambda(\overline{B}_k) \le \lambda(U) < \infty$, by letting $k \to \infty$ we obtain that
\[ \lambda^*(G(E)) \le (M + \varepsilon) \lambda(U) \le (M + \varepsilon)(\lambda^*(E) + \varepsilon). \]
The conclusion for $E$ bounded follows by letting $\varepsilon \to 0$.
For the general case, choose an increasing sequence of bounded sets $E_k \nearrow E$. The monotone continuity of $\lambda^*$ implies that
\[ \lambda^*(G(E)) = \lim_k \lambda^*(G(E_k)) \le (M + \varepsilon) \lim_k \lambda^*(E_k) = (M + \varepsilon) \lambda^*(E). \]
As before, the conclusion follows by letting $\varepsilon \to 0$.
For the last statement of the Theorem we have that $M = 0$, which reduces (9.11) to $\lambda^*(G(E)) = 0$.
Theorem 9.6.8. Suppose $\Omega \subset \mathbb{R}^n$ is open and let $G : \Omega \to \mathbb{R}^n$ be differentiable on $\Omega$. If $E \subset \Omega$ is Lebesgue measurable, then so is $G(E)$ and
\[ \lambda(G(E)) \le \int_E |J_G(x)|\, dx. \tag{9.12} \]
In particular, if $\lambda(E) = 0$, then $\lambda(G(E)) = 0$.

Proof. Since $G$ is differentiable on $\Omega$, $G \in C(\Omega)$ and in particular $G$ is measurable. The partial derivatives $x \mapsto \frac{\partial G}{\partial x_j}(x)$, $j = 1, \ldots, n$, are limits of measurable functions and so they are themselves measurable. Consequently the Jacobian determinant function $x \mapsto J_G(x)$ is measurable on $\Omega$. Suppose that $E \subset \Omega$ is Lebesgue measurable.
First we assume that $\lambda(E) < \infty$. For $\varepsilon > 0$, define the sequence of measurable sets $\{E_k : k \in \mathbb{Z}_+\}$ by
\[ E_k = \big\{ x \in E : k \varepsilon \le |J_G(x)| < (k + 1) \varepsilon \big\}. \]
Since $G(E) = \bigcup_k G(E_k)$, Theorem 9.6.6 implies that
\[ \lambda^*(G(E)) \le \sum_k \lambda^*(G(E_k)) \le \sum_k (k + 1) \varepsilon \lambda(E_k) = \sum_k k \varepsilon \lambda(E_k) + \sum_k \varepsilon \lambda(E_k) \le \sum_k \int_{E_k} |J_G(x)|\, dx + \varepsilon \sum_k \lambda(E_k) = \int_E |J_G(x)|\, dx + \varepsilon \lambda(E). \]
The conclusion for $\lambda(E) < \infty$ follows by letting $\varepsilon \to 0$.

The case $\lambda(E) = \infty$ follows from the finite case by choosing an increasing sequence of bounded measurable sets $A_k \nearrow E$. The monotone continuity of $\lambda^*$ together with monotone convergence implies that
\[ \lambda^*(G(E)) = \lim_k \lambda^*(G(A_k)) \le \lim_k \int_{A_k} |J_G(x)|\, dx = \int_E |J_G(x)|\, dx. \]
It remains to show that $G(E)$ is measurable. If $E$ is Lebesgue negligible then, by (9.12), $G(E)$ is Lebesgue negligible and thus Lebesgue measurable. For a general Lebesgue measurable set $E$, the inner regularity of $\lambda$ implies that there exists a sequence of compact sets $\{K_n : n \in \mathbb{N}\}$ and a negligible set $N$ such that
\[ E = \bigcup_n K_n \cup N. \]
The previous argument shows that $G(N)$ is Lebesgue measurable. Each $G(K_n)$ is compact, and so Borel measurable. Since $G(E) = \bigcup_n G(K_n) \cup G(N)$, $G(E)$ is Lebesgue measurable and $\lambda^*(G(E)) = \lambda(G(E))$.
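Theorem 9.6.9. Assume $\Omega \subset \mathbb{R}^n$ is open and $G : \Omega \to \mathbb{R}^n$ is differentiable on $\Omega$. For any nonnegative Borel measurable function $f$ on $\mathbb{R}^n$, $f \circ G$ is Borel measurable on $\Omega$ and
\[ \int_{G(\Omega)} f(y)\, dy \le \int_\Omega (f \circ G)(x) |J_G(x)|\, dx. \tag{9.13} \]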

Proof. First consider $f = 1_B$ where $B$ is a Borel set. The continuity of $G$ implies its Borel measurability, and so $f \circ G = 1_{G^{-1}(B)}$ is Borel measurable. Applying Theorem 9.6.8 with $E = G^{-1}(B)$ and noticing that $G(E) = G(\Omega) \cap B$ leads to
\[ \int_{G(\Omega)} 1_B(y)\, dy = \lambda(G(\Omega) \cap B) \le \int_{G^{-1}(B)} |J_G(x)|\, dx = \int_\Omega (1_B \circ G)(x) |J_G(x)|\, dx. \]
By linearity (9.13) holds for nonnegative simple functions and by monotone convergence the conclusion extends to all nonnegative Borel functions.

Recall that a bijection $G : \Omega \to G(\Omega)$ is a $C^1$–diffeomorphism if both $G$ and $G^{-1}$ are continuously differentiable. In such a case
\[ G'\big( G^{-1}(y) \big) (G^{-1})'(y) = I, \quad y \in G(\Omega), \qquad (G^{-1})'(G(x))\, G'(x) = I, \quad x \in \Omega, \]
whence $J_G(x) \neq 0$ and $J_{G^{-1}}(G(x)) = J_G(x)^{-1}$ for all $x \in \Omega$. The next result gives a full description of the measures $\lambda(G^{-1}(dx))$ and $\lambda(G(dy))$ when $G$ is a $C^1$–diffeomorphism.
Theorem 9.6.10. (Change of variable formula) Suppose $\Omega$ is an open set in $\mathbb{R}^n$ and let $G : \Omega \to G(\Omega)$ be a $C^1$–diffeomorphism.
(i) $f \in M(G(\Omega), \lambda)$ iff $f \circ G \in M(\Omega, \lambda)$.
(ii) If $f \in L_1(G(\Omega), \lambda)$, then $(f \circ G) |J_G| \in L_1(\Omega, \lambda)$.
(iii) If $f \in M_+(G(\Omega), \lambda)$ or if $f \in L_1(G(\Omega), \lambda)$ then
\[ \int_{G(\Omega)} f(y)\, \lambda(dy) = \int_\Omega f \circ G(x) |J_G(x)|\, \lambda(dx). \tag{9.14} \]

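Proof. If $f \in M_{\mathbb{R}}(G(\Omega))$ then there is $\hat{f} \in \mathcal{B}(G(\Omega))$ such that $\lambda(\{f \neq \hat{f}\}) = 0$. Applying Theorem 9.6.8 to the function $G^{-1}$ and the set $\{f \neq \hat{f}\}$,
\[ \lambda\big( \{f \circ G \neq \hat{f} \circ G\} \big) = \lambda\big( G^{-1}(\{f \neq \hat{f}\}) \big) = 0. \]
Conversely, if $f \circ G \in M_{\mathbb{R}}(\Omega)$, there exists $h \in \mathcal{B}(\Omega)$ such that $\lambda(\{f \circ G \neq h\}) = 0$. The continuity of $G^{-1}$ implies that $h \circ G^{-1} \in \mathcal{B}(G(\Omega))$. Applying Theorem 9.6.8 to the function $G$ and the set $\{f \circ G \neq h\}$ implies that
\[ \lambda\big( \{f \neq h \circ G^{-1}\} \big) = \lambda\big( G(\{f \circ G \neq h\}) \big) = 0. \]
This argument shows that it is enough to consider Borel measurability in proving (i)–(iii).

(i) If $f \in \mathcal{B}(G(\Omega))$ then the continuity of $G$ implies that $f \circ G \in \mathcal{B}(\Omega)$. Conversely, if $f \circ G \in \mathcal{B}(\Omega)$ then the continuity of $G^{-1}$ implies that $f = (f \circ G) \circ G^{-1} \in \mathcal{B}(G(\Omega))$.
(ii) It is enough to consider $f \in \mathcal{B}_+(G(\Omega))$. Theorem 9.6.9 implies that
\[ \int_{G(\Omega)} f(y)\, dy \le \int_\Omega (f \circ G)(x) |J_G(x)|\, dx. \tag{9.15} \]
Set $\Omega_2 := G(\Omega)$ so that $G^{-1}(\Omega_2) = \Omega$. For any $g \in \mathcal{B}_+(\Omega)$, another application of Theorem 9.6.9 with $G^{-1}$ in place of $G$ and $\Omega_2$ in place of $\Omega$ gives
\[ \int_{G^{-1}(\Omega_2)} g(x)\, dx \le \int_{\Omega_2} (g \circ G^{-1})(y) |J_{G^{-1}}(y)|\, dy. \tag{9.16} \]
In particular, we consider $g(x) := (f \circ G)(x) |J_G(x)|$. The inverse function theorem shows that $J_G(x) \neq 0$ for all $x \in \Omega$ and that $J_{G^{-1}}(y) = \big( J_G(G^{-1}(y)) \big)^{-1}$. Hence, inequality (9.16) reduces to
\[ \int_\Omega (f \circ G)(x) |J_G(x)|\, dx \le \int_{G(\Omega)} f(y)\, |J_G(G^{-1}(y))|\, |J_G(G^{-1}(y))|^{-1}\, dy = \int_{G(\Omega)} f(y)\, dy. \tag{9.17} \]
Putting together (9.15) and (9.17) gives equation (9.14).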

(iii) follows from identity (9.14) applied to f+ and f− separately.
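As an illustration of (9.14), polar coordinates G(r, θ) = (r cos θ, r sin θ) map (0, ∞) × (0, 2π) diffeomorphically onto the plane minus a negligible half line, with |JG(r, θ)| = r. The sketch below compares both sides numerically for f(y) = exp(−|y|²); the choice of f, the truncation bounds and the function names are assumptions made for this example, and midpoint rules are used on both sides.

```python
import math

def lhs_cartesian(n=400, L=6.0):
    """Midpoint approximation of the integral of exp(-|y|^2) over [-L, L]^2."""
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        for j in range(n):
            y = -L + (j + 0.5) * h
            total += math.exp(-(x * x + y * y))
    return total * h * h

def rhs_polar(n=20000, R=8.0):
    """Midpoint approximation of the integral of (f o G)|J_G| over (0, R) x (0, 2*pi)."""
    h = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += math.exp(-r * r) * r   # f(G(r, theta)) * |J_G| with |J_G| = r
    return 2 * math.pi * total * h      # the integrand does not depend on theta
```

Both quantities should be close to π, the classical value of the Gaussian integral over the plane.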

Example 9.6.11. (The beta and the gamma functions) The gamma and beta functions are related by
\[ B(a, b) = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a + b)}. \]
To see this, we apply Fubini’s theorem and the change of variables $(x, y) \mapsto (u, v) = (x, x + y)$:
\[ \Gamma(a) \Gamma(b) = \int_{(0, \infty)^2} x^{a-1} e^{-x} y^{b-1} e^{-y}\, dx\, dy = \int_0^\infty \int_u^\infty e^{-v} u^{a-1} (v - u)^{b-1}\, dv\, du = \int_0^\infty \int_0^v e^{-v} u^{a-1} (v - u)^{b-1}\, du\, dv. \]
The change of variables $(u, v) \mapsto (s, t) = (u/v, v)$, whose inverse $(s, t) \mapsto (s t, t)$ has Jacobian determinant $t$, gives
\[ \Gamma(a) \Gamma(b) = \int_0^\infty \int_0^1 e^{-t} t^{a-1} s^{a-1} (t - t s)^{b-1}\, t\, ds\, dt = \int_0^\infty t^{a+b-1} e^{-t} \Big( \int_0^1 s^{a-1} (1 - s)^{b-1}\, ds \Big) dt = \Gamma(a + b) B(a, b). \]
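The relation between the beta and gamma functions is easy to test numerically. The sketch below approximates B(a, b) by a midpoint rule on (0, 1) and compares it with the gamma quotient computed through `math.gamma`; the function names and the restriction to a, b ≥ 1 (so that the integrand is bounded) are assumptions of the example.

```python
import math

def beta_numeric(a, b, n=200000):
    """Midpoint approximation of the integral of s^(a-1) (1-s)^(b-1) over (0, 1).

    Intended for a, b >= 1, where the integrand is bounded."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += s ** (a - 1) * (1.0 - s) ** (b - 1)
    return total * h

def beta_via_gamma(a, b):
    """Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)
```

For exponents below 1 the integrand is unbounded near the endpoints and a plain midpoint rule converges slowly, so a different quadrature would be preferable there.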

Example 9.6.12. (Generalization of Beta function) Consider the integral
\[ I := \int_{\mathbb{R}^n_+} f(x_1 + \ldots + x_n)\, x_1^{a_1 - 1} \cdots x_n^{a_n - 1}\, dx \]
where $a_1, \ldots, a_n > 0$. On $\mathbb{R}^n_+$ define
\[ g(t_1, \ldots, t_n) := f(t_n)\, t_1^{a_1 - 1} \cdots t_{n-1}^{a_{n-1} - 1} \big( t_n - (t_1 + \ldots + t_{n-1}) \big)^{a_n - 1}, \]
and
\[ T x := \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 1 & 1 & \cdots & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix}. \]
Then $I = \int_{\mathbb{R}^n_+} g(T x) |J_T(x)|\, dx$, and $T(\mathbb{R}^n_+) = \{t \in \mathbb{R}^n_+ : t_1 + \ldots + t_{n-1} < t_n\}$. Hence
\[ I = \int_0^\infty f(t_n) \Bigg( \int_{\substack{t_1, \ldots, t_{n-1} > 0 \\ t_1 + \ldots + t_{n-1} < t_n}} t_1^{a_1 - 1} \cdots t_{n-1}^{a_{n-1} - 1} (t_n - t_1 - \ldots - t_{n-1})^{a_n - 1}\, dt_1 \ldots dt_{n-1} \Bigg) dt_n. \]
Setting $G(t) := (t_1/t_n, \ldots, t_{n-1}/t_n, t_n)$ we obtain that $|J_G(t)| = t_n^{-(n-1)}$, and $G(T(\mathbb{R}^n_+)) = D_{n-1} \times (0, \infty)$ where $D_{n-1} := \{v \in \mathbb{R}^{n-1}_+ : v_1, \ldots, v_{n-1} > 0,\ v_1 + \ldots + v_{n-1} < 1\}$. Hence
\[ I = \Bigg( \int_{D_{n-1}} v_1^{a_1 - 1} \cdots v_{n-1}^{a_{n-1} - 1} \big( 1 - (v_1 + \ldots + v_{n-1}) \big)^{a_n - 1}\, dv_1 \ldots dv_{n-1} \Bigg) \int_0^\infty f(s)\, s^{\alpha - 1}\, ds \]
where $\alpha = a_1 + \ldots + a_n$. The generalized Beta function is defined as
\[ B(a_1, \ldots, a_n) := \int_{\substack{v_1, \ldots, v_{n-1} > 0 \\ v_1 + \ldots + v_{n-1} < 1}} v_1^{a_1 - 1} \cdots v_{n-1}^{a_{n-1} - 1} \big( 1 - (v_1 + \ldots + v_{n-1}) \big)^{a_n - 1}\, dv_1 \ldots dv_{n-1}. \]
It follows that
\[ \int_{\mathbb{R}^n_+} f(x_1 + \ldots + x_n)\, x_1^{a_1 - 1} \cdots x_n^{a_n - 1}\, dx = B(a_1, \ldots, a_n) \int_0^\infty f(s)\, s^{a_1 + \ldots + a_n - 1}\, ds. \]
In particular, setting $f(s) = e^{-s}$ gives
\[ B(a_1, \ldots, a_n) = \frac{\Gamma(a_1) \cdots \Gamma(a_n)}{\Gamma(a_1 + \ldots + a_n)}. \]
By putting things together, we obtain
\[ \int_{\mathbb{R}^n_+} f(x_1 + \ldots + x_n)\, x_1^{a_1 - 1} \cdots x_n^{a_n - 1}\, dx = \frac{\Gamma(a_1) \cdots \Gamma(a_n)}{\Gamma(a_1 + \ldots + a_n)} \int_0^\infty f(s)\, s^{a_1 + \ldots + a_n - 1}\, ds. \]
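The final formula can be checked in a small case. The sketch below takes n = 2 and f equal to the indicator of [0, 1), so that the left side is an integral over the triangle {x1, x2 > 0, x1 + x2 < 1} and the right side equals Γ(a1)Γ(a2) / (Γ(a1 + a2)(a1 + a2)). These choices, the function names and the nested midpoint rule are all assumptions made for illustration, and the quadrature is only meant for a1, a2 ≥ 1.

```python
import math

def dirichlet_lhs(a1, a2, n_outer=1000, n_inner=200):
    """Nested midpoint rule for the integral of x^(a1-1) y^(a2-1)
    over the triangle {x, y > 0, x + y < 1}."""
    hx = 1.0 / n_outer
    total = 0.0
    for i in range(n_outer):
        x = (i + 0.5) * hx
        hy = (1.0 - x) / n_inner          # inner variable ranges over (0, 1 - x)
        inner = sum(((j + 0.5) * hy) ** (a2 - 1) for j in range(n_inner)) * hy
        total += x ** (a1 - 1) * inner * hx
    return total

def dirichlet_rhs(a1, a2):
    """B(a1, a2) times the integral of s^(a1+a2-1) over (0, 1)."""
    beta = math.gamma(a1) * math.gamma(a2) / math.gamma(a1 + a2)
    return beta / (a1 + a2)
```

Integrating the inner variable up to exactly 1 − x avoids evaluating an indicator on a rectangular grid, which would only be first-order accurate near the hypotenuse.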

Example 9.6.13. (Order Statistics) Let $\nu(dx) = f(x)\, \lambda_1(dx)$ be a probability measure and $\nu_n(dx) = f(x_1) \cdots f(x_n)\, \lambda_n(dx)$ the product measure. The map $T : x = (x_1, \ldots, x_n) \mapsto (x_{(1)}, \ldots, x_{(n)})$, where $x_{(k)}$ is the $k$–th smallest element in $x$, is called the $n$–order statistic map. If $B = \{(x_{(1)}, \ldots, x_{(n)}) : x_{(1)} < \ldots < x_{(n)}\}$, then for each permutation $\sigma$ on $\{1, \ldots, n\}$ there is a one-to-one linear map $P_\sigma : B \to \mathbb{R}^n$ such that $T \circ P_\sigma = I_n$, the identity on $B$. It is easy to check that $\lambda_n\big( \mathbb{R}^n \setminus \bigcup_{\sigma \in \Sigma_n} P_\sigma(B) \big) = 0$. The maps $P_\sigma$ are represented by matrices with exactly one 1 in each row and each column, and $\det(P_\sigma) = \pm 1$. Then $\nu_n \circ T^{-1} \ll \lambda_n$ with density
\[ f_T(x_{(1)}, \ldots, x_{(n)}) = \sum_{\sigma \in \Sigma_n} |\det(P_\sigma)|\, f(x_{\sigma(1)}) \cdots f(x_{\sigma(n)}) = n!\, f(x_{(1)}) \cdots f(x_{(n)}) \]
for $x_{(1)} < \ldots < x_{(n)}$ and 0 otherwise.

9.6.4. Non invertible smooth functions. In this section assume that $\Omega \subset \mathbb{R}^n$ is open and that $G : \Omega \to \mathbb{R}^n$ is continuously differentiable on $\Omega$. Let $H^0$ be the counting measure on $\mathbb{R}^n$. For any set $E \subset \Omega$ we define the map $h_E : G(\Omega) \to [0, \infty]$ as
\[ h_E(y) = H^0\big( E \cap G^{-1}(\{y\}) \big) = \sum_{x \in G^{-1}(\{y\})} 1_E(x). \]

Theorem 9.6.14. For any Lebesgue measurable set $E \subset \Omega$, $h_E$ is $\lambda$–measurable on $G(\Omega)$ and
\[ \int_{\mathbb{R}^n} h_E(y)\, dy = \int_E |J_G(x)|\, dx. \tag{9.18} \]

Remark 9.6.15. If $G$ is a diffeomorphism then (9.18) follows from Theorem 9.6.10.


Proof. (1) First assume that E ⊂ Ω is open and that E ∩ {JG = 0} = ∅. The inverse
function theorem implies that for each x ∈ E there is δx > 0 such that, if 0 < r ≤ δx then,
B(x; r) ⊂ E and G is a C 1 –diffeomorphism from B(x; r) onto the open set G(B(x; r)). By
Theorem 9.6.10
Z

λ G(B(x; r)) = |Jg (x)| dx.
B(x;r)

The collection B of all closed balls B(x; r) with x ∈ E and 0 < r ≤ min(δx , 1) satisfy
the conditions of Vitali’s covering theorem.
S Hence, there exits a pairwise disjoint sequence
{B k : k ∈ N} ⊂ B such that N := E \ k Bk is Lebesgue negligible. Consequently
Z X XZ Z Z
1G(Bk ) dλ = |JG | dλ = S |JG | dλ = |JG | dλ
Rn k k Bk k Bk E
P
Since G is one to one on each Bk , 1G(Bk ) = hSk Bk ≤ hE . It is easy to check that
k
[
{hSk Bk < hE } ⊂ G E \ Bk = G(N )
k

By Theorem 9.6.8, λ(G(N )) = 0. Therefore hS k Bk = hE a.s., hE is Lebesgue measurable

and (9.18) holds.

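(2) Suppose $E \subset \Omega$ is open and $E \cap \{J_G = 0\} \neq \emptyset$. By Sard's theorem
\[
\lambda\big(G(\{J_G = 0\})\big) = 0.
\]
Applying part (1) to the open set $E_1 := E \cap \{J_G \neq 0\}$ we get
\[
\int_{\mathbb{R}^n} h_{E_1}\, d\lambda = \int_{E_1} |J_G|\, d\lambda = \int_E |J_G|\, d\lambda.
\]
It is clear that $h_{E_1} \le h_E$; thus, to see that (9.18) holds, it suffices to show that $h_{E_1} = h_E$ a.s. From
\[
\{h_{E_1} < h_E\} \subset G(E \setminus E_1) = G\big(E \cap \{J_G = 0\}\big)
\]
we conclude that $h_{E_1} = h_E$ a.s.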

(3) Suppose $E \subset \Omega$ is compact. Choose an open set $U$ such that $E \subset U \subset \overline{U} \subset \Omega$ with $\overline{U}$ compact. Applying the open set case to the sets $U$ and $U \setminus E$ we obtain that
\[
\int_{\mathbb{R}^n} h_U(y)\, dy = \int_U |J_G(x)|\, dx \tag{9.19}
\]
\[
\int_{\mathbb{R}^n} h_{U \setminus E}(y)\, dy = \int_{U \setminus E} |J_G(x)|\, dx \tag{9.20}
\]
The compactness of $\overline{U}$ and the continuity of $J_G$ imply that the right hand sides of (9.19) and (9.20) are finite. Hence $h_U - h_{U \setminus E} = h_E$ a.s. and, by subtracting (9.20) from (9.19), we obtain (9.18).

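(4) For a general Lebesgue measurable set $E$, let $\{K_j : j \in \mathbb{N}\}$ be a sequence of compact sets such that $K_j \subset K_{j+1} \subset E$ and $N := E \setminus \bigcup_j K_j$ is negligible. By Theorem 9.6.8, $\lambda(G(N)) = 0$. By monotone convergence $h_{K_j} \nearrow h_{\bigcup_j K_j}$, and so
\[
\int_{\mathbb{R}^n} h_{\bigcup_j K_j}\, d\lambda = \lim_j \int_{\mathbb{R}^n} h_{K_j}\, d\lambda = \lim_j \int_{K_j} |J_G|\, d\lambda = \int_{\bigcup_j K_j} |J_G|\, d\lambda = \int_E |J_G|\, d\lambda.
\]
Since $h_{\bigcup_j K_j} \le h_E$ and $\{h_{\bigcup_j K_j} < h_E\} \subset G(N)$, $h_E = h_{\bigcup_j K_j}$ a.s. and (9.18) holds.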

Corollary 9.6.16. Under the assumptions of Theorem 9.6.14, if $f \ge 0$ is a Borel measurable function in $\mathbb{R}^n$,
\[
\int_{\mathbb{R}^n} h_\Omega(y) f(y)\, dy = \int_\Omega (f \circ G)(x)\, |J_G(x)|\, dx \tag{9.21}
\]

Proof. We prove (9.21) for indicator functions of Borel sets first. Let $B \subset \mathbb{R}^n$ be a Borel set and set $E = G^{-1}(B)$. Then,
\[
h_E(y) = H^0\big(\{x \in \Omega : x \in G^{-1}(B) \cap G^{-1}(\{y\})\}\big) = 1_B(y)\, H^0\big(\{x \in \Omega : G(x) = y\}\big) = 1_B(y)\, h_\Omega(y).
\]
Thus, by Theorem 9.6.14,
\[
\int_{\mathbb{R}^n} h_\Omega 1_B\, d\lambda = \int_\Omega 1_{G^{-1}(B)} |J_G|\, d\lambda = \int_\Omega (1_B \circ G)\, |J_G|\, d\lambda.
\]
By linearity (9.21) holds for nonnegative Borel-measurable simple functions. By monotone convergence the result holds for nonnegative Borel-measurable functions.

Corollary 9.6.17. Under the assumptions of Theorem 9.6.14, if $\varphi : \Omega \to \mathbb{R}_+$ is Lebesgue measurable then,
\[
\int_{\mathbb{R}^n} \sum_{x \in G^{-1}(\{y\})} \varphi(x)\, dy = \int_\Omega \varphi(x)\, |J_G(x)|\, dx \tag{9.22}
\]

Proof. By Theorem 9.6.14, (9.22) holds for $\varphi = 1_E$ with $E \subset \Omega$ Lebesgue measurable. Notice that for any nonnegative function $\varphi$ and any $y \in G(\Omega)$
\[
\int_{G^{-1}(\{y\})} \varphi(t)\, H^0(dt) = \sum_{x \in G^{-1}(\{y\})} \varphi(x).
\]
Then, by linearity, (9.22) extends to nonnegative Lebesgue measurable simple functions. Finally, by monotone convergence arguments, (9.22) extends to nonnegative Lebesgue measurable functions.

9.7. Applications of change of variables in integration

In this section we present two applications of the change of variables formula. The first one is an analytical proof of an important topological result in finite dimensions, namely, Brouwer's fixed point theorem. The second one is the derivation of relations between Cartesian coordinates and polar coordinates. We will obtain explicit formulas for the volume of the unit ball in $\mathbb{R}^n$ and the surface area of the sphere $S^{n-1}$.

9.7.1. Brouwer's fixed point theorem. Suppose $f : \overline{B}(0; 1) \to \overline{B}(0; 1)$ is continuous. Brouwer's fixed point theorem states that there is a point $x \in \overline{B}(0; 1)$ such that $f(x) = x$. Here we present a proof of this result based on the change of variables formula.

Theorem 9.7.1. (Brouwer's fixed point theorem) If $f : \overline{B}(0; 1) \to \overline{B}(0; 1)$ is continuous, then there exists $x \in \overline{B}(0; 1)$ such that $f(x) = x$.

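Proof. It is enough to consider the case where $f$ is $C^\infty$ on $B(0; 1)$. To see that this is the case, suppose the result holds for all continuous functions on $\overline{B}(0; 1)$ which are $C^\infty$ on $B(0; 1)$. Let $\varepsilon > 0$. By the Stone–Weierstrass theorem there are polynomials $P_1(x), \ldots, P_n(x)$ such that $\|f - P\|_u = \sup_{x \in \overline{B}(0;1)} |f(x) - P(x)|_2 < \varepsilon$, where $P = (P_1, \ldots, P_n)^\top$. Setting $P_\varepsilon := \frac{1}{1+\varepsilon} P$ we obtain that $\|P_\varepsilon\|_u \le 1$ and
\[
\|f - P_\varepsilon\|_u \le \frac{1}{1+\varepsilon}\big(\|f - P\|_u + \varepsilon \|f\|_u\big) < 2\varepsilon.
\]
As $P_\varepsilon \in C^\infty(\mathbb{R}^n)$, there exists $x_\varepsilon \in \overline{B}(0; 1)$ such that $P_\varepsilon(x_\varepsilon) = x_\varepsilon$. Hence $|f(x_\varepsilon) - x_\varepsilon| < 2\varepsilon$. By compactness, there is a sequence $\varepsilon_k \to 0$ such that $x_{\varepsilon_k} \to x^*$ for some $x^* \in \overline{B}(0; 1)$. By continuity it follows that $f(x^*) = x^*$.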

We will assume that the statement is false and will reach a contradiction. Suppose $f : \overline{B}(0; 1) \to \overline{B}(0; 1)$ is continuous on $\overline{B}(0; 1)$, of class $C^\infty$ on $B(0; 1)$ and such that $f(x) \neq x$ for all $x \in \overline{B}(0; 1)$. Then, for each $x \in \overline{B}(0; 1)$ the equation
\[
F(\tau, x) := |\tau f(x) + (1 - \tau)x|^2 - 1 = 0
\]
has exactly two solutions,
\[
\tau_\pm(x) = \frac{-\langle x, f(x) - x\rangle \pm \sqrt{\langle x, f(x) - x\rangle^2 + |x - f(x)|^2 (1 - |x|^2)}}{|x - f(x)|^2}.
\]
By assumption $x \neq f(x)$ for all $x \in \overline{B}(0; 1)$; thus, as $\overline{B}(0; 1)$ is compact, $\inf_{x \in \overline{B}(0;1)} |f(x) - x| > 0$. This implies that
(a) $\langle x, f(x) - x\rangle < 0$ whenever $|x| = 1$.
(b) $\tau_- \in C(\overline{B}(0; 1))$.
(c) $\tau_-(x) = 0$ whenever $|x| = 1$.
(d) $\tau_-(x) < 0$ whenever $|x| < 1$.

By partial differentiation we get that
\[
\partial_\tau F(\tau_-(x), x) = 2\tau_-(x)|f(x) - x|^2 + 2\langle x, f(x) - x\rangle < 0 \tag{9.23}
\]
\[
\partial_x F(\tau_-(x), x) = 2\big(x + \tau_-(x)(f(x) - x)\big)^\top \big(I + \tau_-(x)(f'(x) - I)\big) \tag{9.24}
\]
The expression to the right of the equality in (9.23) is, as a function of $x$, continuous on $\overline{B}(0; 1)$ and strictly negative: by the formula for $\tau_-$, it equals $-2$ times the square root appearing there, which is positive. Similarly, the right hand side of (9.24) is, as a function of $x$, continuous on $B(0; 1)$. These observations, together with the implicit function theorem, imply that $\tau_-$ is $C^\infty$ on $B(0; 1)$ and that $\sup_{x \in B(0;1)} \|\tau_-'(x)\| < \infty$.

Define the function $G : \overline{B}(0; 1) \to \{y \in \mathbb{R}^n : |y| = 1\} =: S^{n-1}$ by
\[
G(x) := x + \tau_-(x)(f(x) - x).
\]
It is clear that $G(x) = x$ whenever $|x| = 1$, $G \in C(\overline{B}(0; 1))$ and that $G$ is $C^\infty$ on $B(0; 1)$. Moreover, as $\tau_-'$ is bounded, $\alpha := \sup_{x \in B(0;1)} \|G'(x)\| < \infty$. Consider the function $\Phi$ defined by
\[
\Phi(t, x) := x + t(G(x) - x), \qquad (t, x) \in [0, 1] \times \overline{B}(0; 1).
\]
For each $t \in [0, 1]$ we define a map $\Phi_t : x \mapsto \Phi(t, x)$ on $\overline{B}(0; 1)$. Notice that
(e) For all $t \in [0, 1]$, $\Phi_t(\overline{B}(0; 1)) \subset \overline{B}(0; 1)$ and $\Phi_t(S^{n-1}) = S^{n-1}$.
(f) For all $t \in [0, 1)$, $\Phi_t(B(0; 1)) \subset B(0; 1)$.
By the mean value theorem, for all $x, y \in B(0; 1)$ with $x \neq y$ and $0 \le t < \frac{1}{1+\alpha}$,
\[
|\Phi(t, x) - \Phi(t, y)| \ge (1 - t)|x - y| - t|G(x) - G(y)| \ge (1 - t)|x - y| - t\alpha|x - y| = \big(1 - (1 + \alpha)t\big)|x - y| > 0.
\]

This means that for each $0 \le t < 1/(1 + \alpha)$, the map $\Phi_t$ is injective on $B(0; 1)$. Since $\Phi_t'(x) = I + t(G'(x) - I)$ and $G'$ is continuous and bounded on $B(0; 1)$, there exists $0 < \delta < 1/(1 + \alpha)$ such that for all $0 \le t < \delta$, $\Phi_t'(x)$ is invertible for all $x \in B(0; 1)$. It follows from the inverse function theorem that each map $\Phi_t$ with $0 \le t < \delta$ is a local diffeomorphism. As all maps $\Phi_t$ with $t \in [0, \delta)$ are injective, we conclude that for each $t \in [0, \delta)$
(g) $\Phi_t(B(0; 1))$ is an open subset of $B(0; 1)$,
(h) $\Phi_t$ is a $C^1$-diffeomorphism from $B(0; 1)$ to $\Phi_t(B(0; 1))$.

We claim that for all $t \in [0, \delta)$, $\Phi_t(B(0; 1)) = B(0; 1)$. Observation (e) states that $\Phi_t(S^{n-1}) = S^{n-1}$. Hence $\Phi_t(\overline{B}(0; 1)) \cap B(0; 1) = \Phi_t(B(0; 1))$ for all $0 \le t < 1$. This implies that for each $0 \le t < \delta$, $\Phi_t(B(0; 1))$ is both open and closed in $B(0; 1)$. The fact that $B(0; 1)$ is connected implies that $\Phi_t(B(0; 1)) = B(0; 1)$ for all $0 \le t < \delta$.
Define $\rho(t) := \int_{B(0;1)} \det(\Phi_t'(x))\, dx$. It is clear that $\rho(t)$ is a polynomial in $t$ of degree at most $n$. Since $\Phi_0'(x) = I$ and $(t, x) \mapsto \Phi_t'(x)$ is continuous on $[0, 1) \times B(0; 1)$, then $\inf_{x \in B(0;1)} \det(\Phi_t'(x)) > 0$ for all $0 \le t < \delta$. By the change of variables theorem 9.6.10, for all $0 \le t < \delta$
\[
\rho(t) = \int_{B(0;1)} \det(\Phi_t')\, d\lambda = \int_{\Phi_t(B(0;1))} d\lambda = \lambda\big(\Phi_t(B(0; 1))\big) = \lambda\big(B(0; 1)\big) =: \omega_n.
\]

It follows that ρ(t) = ωn for all t and so ρ(1) = ωn . However since |Φ1 (x)|2 = |G(x)|2 = 1,
it follows that (G(x))⊤ G′ (x) = 0 for all x ∈ B(0; 1). This means that det(G′ (x)) = 0 for all
x ∈ B(0; 1), and so ρ(1) = 0. This is a contradiction.
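
The formula for $\tau_-$ and the map $G(x) = x + \tau_-(x)(f(x) - x)$ used in the proof can be evaluated numerically at any point where $f(x) \neq x$. The following is a minimal sketch in the plane; the test map `f` (a rotation by 90 degrees followed by scaling by 1/2) is a hypothetical choice made for illustration only. It does have a fixed point at the origin, as the theorem requires, so the sketch only evaluates $G$ away from it.

```python
import math

def f(x):
    """Hypothetical test map of the closed unit disc: rotate by 90 degrees, then halve."""
    return (-0.5 * x[1], 0.5 * x[0])

def tau_minus(x):
    """Smaller root of |x + tau (f(x) - x)|^2 = 1; requires f(x) != x."""
    fx = f(x)
    d = (fx[0] - x[0], fx[1] - x[1])            # f(x) - x
    a = x[0] * d[0] + x[1] * d[1]               # <x, f(x) - x>
    d2 = d[0] ** 2 + d[1] ** 2                  # |f(x) - x|^2
    disc = a * a + d2 * (1.0 - (x[0] ** 2 + x[1] ** 2))
    return (-a - math.sqrt(disc)) / d2

def G(x):
    """Point where the ray from f(x) through x leaves the disc."""
    t = tau_minus(x)
    fx = f(x)
    return (x[0] + t * (fx[0] - x[0]), x[1] + t * (fx[1] - x[1]))

inner = (0.3, 0.4)       # |x| < 1
boundary = (0.6, 0.8)    # |x| = 1
print(tau_minus(inner), math.hypot(*G(inner)))
print(tau_minus(boundary), G(boundary))
```

At the interior point $\tau_-$ is negative and $G(x)$ has norm one, in line with observations (c) and (d); at the boundary point $\tau_-$ vanishes up to rounding, so $G(x) = x$.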

If a topological space $A$ is homeomorphic to the closed unit ball in some Euclidean space $\mathbb{R}^n$ and $f : A \to A$ is continuous, then it follows immediately that $f$ has a fixed point.

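9.7.2. Polar coordinates. Consider the unit sphere $S^{n-1} = \{u \in \mathbb{R}^n : |u|_2 = 1\}$ in $\mathbb{R}^n$. For any $x \in \mathbb{R}^n \setminus \{0\}$, its polar coordinates are defined by
\[
r = |x|_2 \in (0, \infty), \qquad u = \frac{x}{r} \in S^{n-1}.
\]
The map $\Phi : \mathbb{R}^n \setminus \{0\} \to (0, \infty) \times S^{n-1}$ given by $x \mapsto (r, u)$ is continuous and invertible, and its inverse $\Phi^{-1} : (r, u) \mapsto ru$ is also continuous. In this section, we will study the measure $\lambda^*$ on $(0, \infty) \times S^{n-1}$ induced by $\Phi$ and the Lebesgue measure $\lambda$ on $\mathbb{R}^n \setminus \{0\}$, that is, $\lambda^*(E) = \lambda(\Phi^{-1}(E))$ for $E \in \mathcal{B}((0, \infty)) \otimes \mathcal{B}(S^{n-1})$.
Consider the measure $\rho$ on $((0, \infty), \mathcal{B}((0, \infty)))$ given by $\rho(I) = \int_I r^{n-1}\, dr$.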

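Theorem 9.7.2. There is a unique Borel measure $\sigma$ on $S^{n-1}$ such that $\lambda^* = \rho \times \sigma$. Moreover, for any $f$ in $\mathcal{B}_+(\mathbb{R}^n)$ or $L_1(\mathbb{R}^n)$,
\[
\int_{\mathbb{R}^n} f\, d\lambda = \int_0^\infty \int_{S^{n-1}} f(ru)\, r^{n-1}\, \sigma(du)\, dr \tag{9.25}
\]
In particular, if $f(x) = g(|x|)$, then
\[
\int_{\mathbb{R}^n} f\, d\lambda = \sigma(S^{n-1}) \int_0^\infty g(r)\, r^{n-1}\, dr. \tag{9.26}
\]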

Proof. For any $E \in \mathcal{B}(S^{n-1})$ and $a > 0$ denote
\[
E_a = \Phi^{-1}((0, a] \times E) = \{x \in \mathbb{R}^n \setminus \{0\} : 0 < |x| \le a,\ x/|x| \in E\}.
\]
If (9.25) holds, then
\[
\lambda(E_1) = \int_0^1 \int_E r^{n-1}\, \sigma(du)\, dr = \frac{\sigma(E)}{n}.
\]
This suggests defining $\sigma(E) := n\lambda(E_1)$ for all $E \in \mathcal{B}(S^{n-1})$. Since the map $E \mapsto E_1$ takes Borel sets to Borel sets and commutes with unions, intersections, and complements, it follows that $\sigma$ defines a finite measure on $\mathcal{B}(S^{n-1})$.

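Since $E_a = aE_1$, we have that $\lambda(E_a) = a^n \lambda(E_1)$; therefore
\[
\lambda^*((a, b] \times E) = \lambda(E_b \setminus E_a) = \frac{b^n - a^n}{n}\, \sigma(E) = \sigma(E) \int_a^b r^{n-1}\, dr = (\rho \times \sigma)((a, b] \times E).
\]
Hence $\lambda^*$ and $\rho \times \sigma$ coincide on the class of sets $\mathcal{C} = \{(a, b] \times E : 0 < a < b,\ E \in \mathcal{B}(S^{n-1})\}$, which is a $\pi$-system generating $\mathcal{B}((0, \infty)) \otimes \mathcal{B}(S^{n-1})$. Therefore, from Theorem 3.5.5, we conclude that $\lambda^* = \rho \times \sigma$.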

Corollary 9.7.3. The measure $\sigma$ on $(S^{n-1}, \mathcal{B}(S^{n-1}))$ is invariant under orthogonal transformations.

Proof. Let $P$ be any orthogonal transformation and $g : S^{n-1} \to [0, \infty)$ a measurable function. Then, by (9.25) and Theorem 9.6.3 we have that
\[
\int_{S^{n-1}} g(Pu)\, \sigma(du) = n \int_0^1 \int_{S^{n-1}} g(Pu)\, r^{n-1}\, \sigma(du)\, dr = \int_{B(0;1)} g\big(P(x/|x|)\big)\, dx
\]
\[
= \int_{B(0;1)} g(x/|x|)\, dx = n \int_0^1 \int_{S^{n-1}} g(u)\, r^{n-1}\, \sigma(du)\, dr = \int_{S^{n-1}} g(u)\, \sigma(du).
\]

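Example 9.7.4. Let $a \in \mathbb{R}^n \setminus \{0\}$ be fixed and let $\{e_i : i = 1, \ldots, n\}$ be the standard orthonormal basis of $\mathbb{R}^n$. For each $i$ let $T_i$ be any orthogonal map that maps $\frac{1}{|a|} a$ to $e_i$. Then, by Corollary 9.7.3,
\[
\frac{1}{|a|^2} \int_{S^{n-1}} (a \cdot u)^2\, \sigma(du) = \int_{S^{n-1}} (e_i \cdot u)^2\, \sigma(du).
\]
Thus,
\[
\int_{S^{n-1}} (a \cdot u)^2\, \sigma(du) = \frac{|a|^2}{n} \sum_{i=1}^n \int_{S^{n-1}} (e_i \cdot u)^2\, \sigma(du) = |a|^2 \omega_n.
\]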
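Example 9.7.5. Consider the function $f(x) = e^{-|x|^2}$. Fubini's theorem, a change to polar coordinates, and then the change of variables $u = r^2$, give
\[
\Big(\int_{\mathbb{R}} e^{-x^2}\, dx\Big)^n = \int_{\mathbb{R}^n} e^{-|x|^2}\, dx = \sigma(S^{n-1}) \int_0^\infty e^{-r^2} r^{n-1}\, dr = \frac{\sigma(S^{n-1})}{2} \int_0^\infty e^{-u} u^{n/2 - 1}\, du,
\]
whence, since $\int_{\mathbb{R}} e^{-x^2}\, dx = \sqrt{\pi}$, we conclude that $\sigma(S^{n-1}) = \frac{2\pi^{n/2}}{\Gamma(n/2)}$. If $g = 1_{[0,1)}$ in (9.26), that is, $f = 1_{B(0;1)}$, we obtain that
\[
\omega_n := \lambda(B(0; 1)) = \frac{\sigma(S^{n-1})}{n} = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)} = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}.
\]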
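
The closed forms for $\sigma(S^{n-1})$ and $\omega_n$ from Example 9.7.5 are easy to tabulate. The sketch below uses only the standard library; the function names `sphere_area` and `ball_volume` are chosen here for illustration and do not come from the text.

```python
import math

def sphere_area(n):
    """sigma(S^{n-1}) = 2 pi^{n/2} / Gamma(n/2)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

def ball_volume(n):
    """omega_n = pi^{n/2} / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

for n in range(1, 7):
    print(n, sphere_area(n), ball_volume(n), sphere_area(n) / n)
```

For $n = 2$ and $n = 3$ these reduce to the familiar values $2\pi$, $\pi$ and $4\pi$, $4\pi/3$, and the last two columns agree because $\omega_n = \sigma(S^{n-1})/n$.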

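9.7.3. Polar coordinates in $\mathbb{R}^n$. It is possible to give an explicit representation for $x \in \mathbb{R}^n \setminus \{0\}$ in polar coordinates in terms of the angle between $x$ and $e_n$, and the orthogonal projection of $x$ onto the subspace $\mathbb{R}^{n-1} \times \{0\}$. Let $\varphi_{n-1} \in [0, \pi]$ be the angle between $x$ and $e_n$, $\rho = |x|$, and let $P$ be the orthogonal projection from $\mathbb{R}^n$ onto $\mathbb{R}^{n-1} \times \{0\}$. Then, $x \cdot e_n = \rho \cos\varphi_{n-1}$ and $x \cdot Px = \rho \sin\varphi_{n-1} |Px|$; hence,
\[
x = Px + \rho \cos\varphi_{n-1}\, e_n = \rho \sin\varphi_{n-1} \tfrac{1}{|Px|} Px + \rho \cos\varphi_{n-1}\, e_n.
\]
Using induction starting with $n = 2$ we obtain a parameterization $\Phi$ of $\mathbb{R}^n$ in polar coordinates:
\[
x_n = \rho \cos\varphi_{n-1}; \qquad x_k = \rho \Big(\prod_{j=k}^{n-1} \sin\varphi_j\Big) \cos\varphi_{k-1}, \quad 2 < k \le n - 1;
\]
\[
x_2 = \rho \prod_{j=1}^{n-1} \sin\varphi_j; \qquad x_1 = \rho \Big(\prod_{j=2}^{n-1} \sin\varphi_j\Big) \cos\varphi_1;
\]
where $\rho \ge 0$ and $(\varphi_1, \ldots, \varphi_{n-1}) \in [0, 2\pi] \times [0, \pi]^{n-2}$. It is easy to check that $\Phi : (0, \infty) \times (0, 2\pi) \times (0, \pi)^{n-2} \to \mathbb{R}^n \setminus (\mathbb{R}_+ \times \{0\} \times \mathbb{R}^{n-2})$ is a diffeomorphism and that
\[
|\det(\Phi')| = \rho^{n-1} \prod_{j=2}^{n-1} \sin^{j-1}\varphi_j.
\]
If $\rho = 1$, we obtain a representation of the surface area $d\sigma_{n-1}$ on $S^{n-1} \setminus (\mathbb{R}_+ \times \{0\} \times \mathbb{R}^{n-2})$ in terms of the parameters $(\varphi_1, \ldots, \varphi_{n-1}) \in (0, 2\pi) \times (0, \pi)^{n-2}$:
\[
\sigma_{n-1}(d\varphi_1, \ldots, d\varphi_{n-1}) = \sin^{n-2}\varphi_{n-1} \cdots \sin\varphi_2\; d\varphi_1 \cdots d\varphi_{n-1} = \sin^{n-2}\varphi_{n-1}\, d\varphi_{n-1} \cdot \sigma_{n-2}(d\varphi_1, \ldots, d\varphi_{n-2}).
\]
As an application of this relation, we compute the following integral.
\[
\int_{\mathbb{R}^n} \frac{1}{(1 + |x|^2)^{(n+1)/2}}\, dx = \sigma(S^{n-1}) \int_0^\infty \frac{r^{n-1}}{(1 + r^2)^{(n+1)/2}}\, dr = \sigma(S^{n-1}) \int_0^{\pi/2} \sin^{n-1}\theta\, d\theta = \frac{\sigma(S^n)}{2} = \frac{\pi^{(n+1)/2}}{\Gamma((n+1)/2)}.
\]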
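
The last chain of equalities can be checked numerically. The sketch below compares $\sigma(S^{n-1}) \int_0^{\pi/2} \sin^{n-1}\theta\, d\theta$, computed with a composite Simpson rule, against $\pi^{(n+1)/2}/\Gamma((n+1)/2)$. The quadrature routine `simpson` and the number of panels are choices made for this illustration only.

```python
import math

def sphere_area(n):
    """sigma(S^{n-1}) = 2 pi^{n/2} / Gamma(n/2)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

def simpson(fn, lo, hi, panels=2000):
    """Composite Simpson rule; panels must be even."""
    h = (hi - lo) / panels
    total = fn(lo) + fn(hi)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * fn(lo + i * h)
    return total * h / 3.0

def via_angle_integral(n):
    """sigma(S^{n-1}) times the integral of sin^{n-1} over [0, pi/2]."""
    return sphere_area(n) * simpson(lambda t: math.sin(t) ** (n - 1), 0.0, math.pi / 2.0)

def closed_form(n):
    """pi^{(n+1)/2} / Gamma((n+1)/2), which equals sigma(S^n) / 2."""
    return math.pi ** ((n + 1) / 2.0) / math.gamma((n + 1) / 2.0)

for n in range(1, 7):
    print(n, via_angle_integral(n), closed_form(n))
```

For $n = 1$ both columns reduce to $\int_{\mathbb{R}} (1 + x^2)^{-1}\, dx = \pi$.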

9.8. Isodiametric inequality

For any integer $d \ge 1$, let $\omega_d$ denote the Lebesgue measure of the unit ball $B(0; 1)$ in $\mathbb{R}^d$. Suppose that $A \subset \mathbb{R}^d$ is symmetric, i.e. $A = -A$, and let $\mathrm{rad}(A) = \frac{\mathrm{diam}(A)}{2}$. For any $x \in A$, $2|x| = |x - (-x)| \le \mathrm{diam}(A) = 2\,\mathrm{rad}(A)$; thus, $A \subset \overline{B}(0; \mathrm{rad}(A))$. Taking Lebesgue outer measure $\lambda_d^*$ we obtain
\[
\lambda_d^*(A) \le \omega_d\, \mathrm{rad}(A)^d \tag{9.27}
\]
As we will see, inequality (9.27), referred to as the isodiametric inequality, holds for any bounded set $A \subset \mathbb{R}^d$. It says that among all subsets of $\mathbb{R}^d$ with a given diameter, the ball of that diameter has the largest volume. Though this is certainly obvious for $d = 1$, for $d > 1$ it is not as trivial as it appears to be.

To show (9.27) for general bounded sets we will use a technique known as Steiner symmetrization, which generates from $A$ a finite sequence of increasingly symmetric sets of the same volume and comparable radii.
For each $v \in S^{d-1}$, we will use the notation $\ell(v) = \{tv : t \in \mathbb{R}\}$ and $\{v\}^\perp = \{u \in \mathbb{R}^d : v \cdot u = 0\}$ for the straight line through the origin parallel to $v$ and the orthogonal complement of $v$ respectively. Given $v \in S^{d-1}$ and $x \in \mathbb{R}^d$, we will denote by $\lambda_{x,v}$ the measure on $\mathcal{B}(\mathbb{R}^d)$ induced by the map $\mathbb{R} \ni t \mapsto x + tv$, that is,
\[
\lambda_{x,v}(A) = \lambda_1^*(\{t : x + tv \in A\}), \qquad A \subset \mathbb{R}^d.
\]
The Steiner symmetrization of $A$ with respect to $v \in S^{d-1}$ is defined as
\[
S(A; v) = \big\{x + tv : x \in \{v\}^\perp,\ |t| < \tfrac{1}{2} \lambda_{x,v}(A)\big\}.
\]
Geometrically, $S(A; v)$ is constructed by bundling together line segments, each of which is obtained by taking the intersection of $A$ with $x + \ell(v)$ ($x \perp v$), squashing it to remove gaps, and then sliding the resulting interval along $x + \ell(v)$ to center it at $x$.
Remarks 9.8.1. The following observations can be checked in a straightforward manner.
(i) If A ⊂ B ⊂ Rd , then S(A; v) ⊂ S(B; v).
(ii) x + tv ∈ S(A; v) iff x − tv ∈ S(A; v).

(iii) If $R$ is a linear unitary operator on $\mathbb{R}^d$ (i.e., $R^\top = R^{-1}$), then $R(S(A; v)) = S(R(A); Rv)$.

Lemma 9.8.2. Let $A \in \mathcal{B}(\mathbb{R}^d)$ be bounded. Then, for all $v \in S^{d-1}$, $S(A; v) \in \mathcal{B}(\mathbb{R}^d)$, $\lambda_d(S(A; v)) = \lambda_d(A)$ and $\mathrm{rad}(S(A; v)) \le \mathrm{rad}(A)$. If $R$ is a unitary transformation of $\mathbb{R}^d$ such that $R(\ell(v)) = \ell(v)$ and $R(A) = A$, then $R(S(A; v)) = S(A; v)$.

Proof. The last statement follows directly from Remark 9.8.1(iii).

Remarks 9.8.1 also imply that the qualities and quantities under consideration (the measurability of $S(A; v)$ together with its Lebesgue measure and radius) are independent of any particular choice of coordinate system. Hence, without loss of generality, we can assume that $v = e_d = [0, \ldots, 0, 1]^\top$. This way,
\[
S(A; v) = \{(\xi, t) \in \mathbb{R}^{d-1} \times \mathbb{R} : -f(\xi) < t < f(\xi)\}
\]
\[
= \{(\xi, t) \in \mathbb{R}^{d-1} \times [0, \infty) : f(\xi) > t\} \cup \{(\xi, t) \in \mathbb{R}^{d-1} \times (-\infty, 0] : -t < f(\xi)\},
\]
where $f(\xi) = \frac{1}{2} \int_{\mathbb{R}} 1_A((\xi, t))\, \lambda_1(dt)$. By the Fubini–Tonelli theorem, $f$ is $\mathcal{B}(\mathbb{R}^{d-1})$-measurable; hence, $S(A; v) \in \mathcal{B}(\mathbb{R}^d)$. By Theorem 9.3.3 we have that
\[
\lambda_d(S(A; v)) = 2 \int_{\mathbb{R}^{d-1}} f(\xi)\, \lambda_{d-1}(d\xi) = \lambda_d(A).
\]

We now prove that the radius of the symmetrization of $A$ does not exceed the radius of $A$. Since $S(A; v) \subset S(\overline{A}; v)$ and $\mathrm{rad}(\overline{A}) = \mathrm{rad}(A)$, we can assume without loss of generality that $A$ is compact. For any pair of points $x$ and $y$ in $S(A; v)$, let $\xi, \tau \in \mathbb{R}^{d-1}$ and $s, t \in \mathbb{R}$ be such that $x = (\xi, s)$ and $y = (\tau, t)$. Define
\[
M^\pm(x) = \pm \sup\{r : (\xi, \pm r) \in A\}, \qquad M^\pm(y) = \pm \sup\{r : (\tau, \pm r) \in A\}.
\]
The compactness of $A$ implies that $X^\pm = (\xi, M^\pm(x))$ and $Y^\pm = (\tau, M^\pm(y))$ are in $A$. Moreover,
\[
2|s| \le \lambda_{(\xi,0),v}(A) \le M^+(x) - M^-(x), \qquad 2|t| \le \lambda_{(\tau,0),v}(A) \le M^+(y) - M^-(y);
\]
therefore,
\[
\big(M^+(y) - M^-(x)\big) \vee \big(M^+(x) - M^-(y)\big) \ge \frac{1}{2}\Big(\big(M^+(y) - M^-(x)\big) + \big(M^+(x) - M^-(y)\big)\Big)
\]
\[
= \frac{1}{2}\big(M^+(y) - M^-(y)\big) + \frac{1}{2}\big(M^+(x) - M^-(x)\big) \ge |s| + |t|.
\]
Consequently,
\[
|y - x|^2 = |\tau - \xi|^2 + |t - s|^2 \le |\tau - \xi|^2 + (|t| + |s|)^2
\]
\[
\le |\tau - \xi|^2 + \Big(\big(M^+(y) - M^-(x)\big) \vee \big(M^+(x) - M^-(y)\big)\Big)^2 = \big(|Y^+ - X^-| \vee |X^+ - Y^-|\big)^2 \le 4\, \mathrm{rad}^2(A);
\]
that is, $\mathrm{rad}(S(A; v)) \le \mathrm{rad}(A)$.
Theorem 9.8.3. The inequality (9.27) holds for any bounded A ⊂ Rd .

Proof. Since $\overline{A}$ is compact and hence measurable, $\lambda_d^*(A) \le \lambda_d(\overline{A})$ and $\mathrm{rad}(\overline{A}) = \mathrm{rad}(A)$, it suffices to assume that $A$ is compact. Consider the canonical orthonormal basis $\{e_1, \ldots, e_d\}$ of $\mathbb{R}^d$ and define recursively $A_0 = A$, $A_n = S(A_{n-1}; e_n)$ for $n = 1, \ldots, d$. It follows that $\lambda_d(A_n) = \lambda_d(A)$ and $\mathrm{rad}(A_n) \le \mathrm{rad}(A)$ for all $1 \le n \le d$. The crucial part of this construction is that by Remark 9.8.1(iii), the unitary operators $R_n : x \mapsto x - 2(x \cdot e_n)e_n$ satisfy $R_m(A_n) = A_n$ for all $1 \le m \le n \le d$. For $n = d$ in particular, this means that $-A_d = A_d$, that is, $A_d$ is symmetric. Therefore,
\[
\lambda_d(A) = \lambda_d(A_d) \le \omega_d\, \mathrm{rad}(A_d)^d \le \omega_d\, \mathrm{rad}(A)^d.
\]
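
To see why (9.27) is not obvious for non-symmetric sets, consider an equilateral triangle of side 1 in the plane (an example chosen here for illustration; it is not taken from the text). Its diameter is 1, so $\mathrm{rad} = 1/2$, yet its circumradius is $1/\sqrt{3} > 1/2$, so it does not fit in any ball of radius $\mathrm{rad}$. The inequality $\lambda_2(A) \le \omega_2\, \mathrm{rad}(A)^2$ still holds, as the sketch below confirms.

```python
import math
from itertools import combinations

# Vertices of an equilateral triangle with side length 1.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]

def diameter(points):
    """Diameter of a convex polygon: the largest distance between two vertices."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

def polygon_area(points):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

rad = diameter(vertices) / 2.0
area = polygon_area(vertices)
bound = math.pi * rad ** 2            # omega_2 * rad^2
circumradius = 1.0 / math.sqrt(3.0)   # radius of the smallest ball containing the triangle

print(area, bound, circumradius, rad)
```

The area $\sqrt{3}/4$ stays below the bound $\pi/4$ even though the smallest enclosing ball is strictly larger than $\overline{B}(0; \mathrm{rad})$.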

One important application of the isodiametric inequality is the equivalence of the Lebesgue measure $\lambda_d$ and the Hausdorff measure $H^d$ on $\mathcal{B}(\mathbb{R}^d)$.
Theorem 9.8.4. Let $c_d = 2^d/\omega_d$, where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$. Then $H^d = c_d \lambda_d$.

Proof. We have already shown in Section 3.4.2 that $H^d = a_d \lambda_d$ for some constant $\omega_d^{-1} \le a_d \le d^{d/2}$.

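Let $A \in \mathcal{B}(\mathbb{R}^d)$ and let $\{A_n\}$ be a countable cover of $A$ by sets of diameter at most $\delta$. Then, by the isodiametric inequality,
\[
\lambda_d(A) \le \sum_n \lambda_d^*(A_n) \le 2^{-d} \omega_d \sum_n (\mathrm{diam}(A_n))^d.
\]
Therefore, $\lambda_d(A) \le c_d^{-1} H_\delta^d(A) \le c_d^{-1} H^d(A)$. Consequently, for $Q = (0, 1]^d$ we have that $1 \le c_d^{-1} H^d(Q)$.

To obtain the reverse inequality we will make use of Vitali's covering theorem. Given $\delta > 0$, there is a countable collection of pairwise disjoint closed balls $B_n \subset Q$ of diameter $2r_n < \delta$ such that
\[
\lambda_d\Big(Q \setminus \bigcup_n B_n\Big) = 0 = a_d^{-1} H^d\Big(Q \setminus \bigcup_n B_n\Big).
\]
Thus,
\[
H_\delta^d(Q) \le H_\delta^d\Big(\bigcup_n B_n\Big) \le \sum_n H_\delta^d(B_n) \le \sum_n (\mathrm{diam}(B_n))^d = c_d \sum_n \lambda_d(B_n) = c_d\, \lambda_d\Big(\bigcup_n B_n\Big) = c_d.
\]
Letting $\delta \to 0$ gives $H^d(Q) \le c_d$, so $a_d = H^d(Q) = c_d$. Therefore, $H^d = c_d \lambda_d$.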

9.9. Laplace’s method

Consider the integral of the form
\[
Q(s) := \int_D e^{-s g(x)} f(x)\, dx
\]
where $D$ is a region in $\mathbb{R}^n$. In many applications, it is of interest to understand the behavior of $Q(s)$ as $s \to \infty$. This problem dates back to Laplace, who made the observation that the major contributions to the integral $Q(s)$ arise from the regions where $g$ is the smallest possible.
Theorem 9.9.1. (Laplace's method) Suppose $f$ and $g$ are measurable functions on $D$ such that
(i) $g_- := \inf_{x \in D} g(x) > -\infty$,
(ii) There is a unique $x_0$ in the interior of $D$ at which $g(x_0) = g_-$, and $f(x_0) \neq 0$.
(iii) For any $R > 0$, $g_R := \inf_{|x - x_0| \ge R} g(x) > g_-$.
(iv) $C_\alpha = \int_D e^{-\alpha g(x)} |f(x)|\, dx < \infty$ for some $\alpha > 0$.
If $f$ and $g$ are in $C$ and $C^2$ in a neighborhood of $x_0$ respectively, and $A = D^2 g(x_0)$ is strictly positive definite then,
\[
\lim_{s \to \infty} s^{n/2} e^{s g_-} \int_D e^{-s g(x)} f(x)\, dx = \frac{(2\pi)^{n/2} f(x_0)}{\sqrt{\det(A)}} \tag{9.28}
\]

Proof. Without loss of generality, we may assume that $x_0 = 0$. As $D^2 g(0)$ is strictly positive definite and $g$ is in $C^2$ near $0$, there exist $c > 0$ and $R > 0$ small enough so that $f \in C(B(0; R))$, $g \in C^2(B(0; R))$ and
\[
g(x) \ge g_- + c|x|^2, \qquad x \in B(0; R). \tag{9.29}
\]
By (iii) and (iv), for $s > \alpha$ we have that
\[
s^{n/2} e^{s g_-} \Big| \int_{\{x \in D : |x| > R\}} e^{-s g(x)} f(x)\, dx \Big| \le C_\alpha\, s^{n/2} e^{\alpha g_R} e^{-s(g_R - g_-)} \xrightarrow{s \to \infty} 0 \tag{9.30}
\]
Using the change of variables $y = s^{1/2} x$ in the integral over $B(0; R)$ leads to
\[
s^{n/2} e^{s g_-} \int_{B(0;R)} e^{-s g(x)} f(x)\, dx = \int_{B(0; s^{1/2} R)} \exp\Big(-s\big(g(s^{-1/2} y) - g_-\big)\Big) f\big(s^{-1/2} y\big)\, dy \tag{9.31}
\]
The continuity of $f$ over $B(0; R)$ together with (9.29) shows that the integrand in (9.31) is bounded by $\|f\|_{u(B(0;R))}\, e^{-c|y|^2}$ and, as $s \to \infty$, converges pointwise to $f(x_0) \exp\big(-\frac{1}{2} y^\top A y\big)$. Hence, by dominated convergence, the integral in the right hand side of (9.31) converges to the expression in the right hand side of (9.28).
Example 9.9.2. Using Laplace's method we will derive the classical first-order asymptotic expansion of the gamma function, which is known as Stirling's formula. Using the change of variable $y = x/s$ we obtain that
\[
\Gamma(s + 1) = \int_0^\infty x^s e^{-x}\, dx = s^{s+1} \int_0^\infty \exp(-s g(y))\, dy
\]
where $g(y) = y - \log(y)$. At $y_0 = 1$ we have $g(y_0) = 1$, $g'(y_0) = 0$, $g''(y_0) = 1 > 0$, and conditions (i)–(iv) are satisfied with $f \equiv 1$. Therefore
\[
\lim_{s \to \infty} e^s s^{1/2} \int_0^\infty e^{-s g(y)}\, dy = \sqrt{2\pi}.
\]
Therefore
\[
\lim_{s \to \infty} \frac{\Gamma(s + 1)}{s^{s + \frac{1}{2}} e^{-s} \sqrt{2\pi}} = 1 \tag{9.32}
\]
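
The limit (9.32) can be observed numerically. The sketch below evaluates the ratio in logarithmic form with `math.lgamma` to avoid overflow for large $s$; the function name `stirling_ratio` is chosen here for illustration.

```python
import math

def stirling_ratio(s):
    """Gamma(s + 1) / (s^(s + 1/2) e^(-s) sqrt(2 pi)), computed through logarithms."""
    log_stirling = (s + 0.5) * math.log(s) - s + 0.5 * math.log(2.0 * math.pi)
    return math.exp(math.lgamma(s + 1.0) - log_stirling)

for s in (1.0, 10.0, 100.0, 1000.0):
    print(s, stirling_ratio(s))
```

The ratio decreases towards 1 as $s$ grows, which is consistent with (9.32).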

9.10. Exercises
Exercise 9.10.1. Suppose $(E_X, m_X)$ and $(E_Y, m_Y)$ are $\sigma$-finite elementary integrals, and let $\|\ \|_m^*$, $m = m_X \otimes m_Y$, be the Daniell product mean. Show that $m$ is $\sigma$-finite and that for any set $A \in \mathcal{M}(\|\ \|_m^*)$,
\[
m(A) = \int_X \int_Y 1_A(x, y)\, m_Y(dy)\, m_X(dx) = \int_Y \int_X 1_A(x, y)\, m_X(dx)\, m_Y(dy).
\]

Exercise 9.10.2. Consider $X = \mathbb{R}$ with the usual topology, $Y = \mathbb{R}$ with the discrete topology, and let $\lambda_1$ and $\#$ be the Lebesgue and counting measures on $X$ and $Y$ respectively. Let $m = \lambda_1 \otimes \#$. Show that (a) the diagonal $\Delta$ in $X \times Y$ has measure $m(\Delta) = \infty$, and (b) $\sup\{m(K) : K \in \mathcal{K}, K \subset \Delta\} = 0$. (Hint: For (a), use the outer regularity of the Radon measure $\lambda_1 \otimes \#$; for (b), show that every compact subset of $\Delta$ is a finite set.)

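Exercise 9.10.3. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
\[
f(x, y) = \begin{cases} 1 & \text{if } 0 \le x \le y < x + 1 \\ -1 & \text{if } 0 \le x,\ x + 1 \le y < x + 2 \\ 0 & \text{otherwise} \end{cases}
\]
Show that $\int_{\mathbb{R}} \int_{\mathbb{R}} f(x, y)\, dx\, dy \neq \int_{\mathbb{R}} \int_{\mathbb{R}} f(x, y)\, dy\, dx$. Conclude that $f$ is not $\lambda_2$ integrable, where $\lambda_2$ is Lebesgue's measure on $\mathbb{R}^2$.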

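Exercise 9.10.4. Suppose $\varphi : (0, \infty) \to \mathbb{R}$ is convex. Recall that $\varphi^*(x) = x\varphi(1/x)$. Show that
\[
\varphi(0) := \lim_{x \searrow 0} \varphi(x) = \varphi(1) - D^+\varphi(1) + \int_{(0,1]} t\, \mu_\varphi(dt)
\]
\[
\varphi^*(0) := \lim_{x \searrow 0} \varphi^*(x) = D^+\varphi(1) + \mu_\varphi(1, \infty).
\]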

Exercise 9.10.5. Let $k(s, t)$ be a complex-valued measurable function in $(0, \infty)^2$ such that $k(\alpha s, \alpha t) = \alpha^{-1} k(s, t)$ for all $\alpha > 0$. Suppose that $t \mapsto k(1, t)\, t^{-1/p}$ is Lebesgue integrable in $(0, \infty)$ for some $1 < p$. Show that for every $f \in L_p(0, \infty)$,
\[
(Kf)(s) = \int_0^\infty k(s, t) f(t)\, dt
\]
satisfies $\|Kf\|_p \le C(k, p) \|f\|_p$, where $C = C(k, p)$ is a constant depending on $k$ and $p$ only. When $k(s, t) = \frac{1}{s} 1_{\{t < s\}}$, $K$ is called Hardy's operator. Find $C(k, p)$ in this case.

Exercise 9.10.6. Let $(X, \mathcal{A}, \mu) = ([0, 1], \mathcal{B}([0, 1]), \lambda) = (Y, \mathcal{B}, \nu)$. Let $0 = \delta_1 < \delta_2 < \ldots < \delta_n \to 1$. Let $g_n(x) = a_n 1_{(\delta_n, \delta_{n+1}]}(x)$, where $a_n$ is chosen so that $\int_{[0,1]} g_n(t)\, dt = 1$. Define the function
\[
f(x, y) = \sum_{n=1}^\infty \big(g_n(x) - g_{n+1}(x)\big) g_n(y)
\]
Show that $\int_{[0,1]} \int_{[0,1]} f(x, y)\, dy\, dx = 1 \neq 0 = \int_{[0,1]} \int_{[0,1]} f(x, y)\, dx\, dy$. Show that $|f| \notin L_1([0, 1]^2, \mathcal{B}([0, 1]^2), \lambda_2)$.

Exercise 9.10.7. Let $f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}$ for $(x, y) \in (0, 1]^2$. Is $f \in L_1((0, 1]^2, \lambda)$?

Exercise 9.10.8. If $f(n, x) = e^{-nx} - 2e^{-2nx}$, $(n, x) \in \mathbb{N} \times (0, \infty)$, show that $f \notin L_1(\# \otimes \lambda)$, where $\#$ is the counting measure on $\mathbb{N}$ and $\lambda$ is Lebesgue measure on $(0, \infty)$. (Hint: Check that $\int_0^\infty \sum_n f(n, x)\, dx \neq \sum_n \int_0^\infty f(n, x)\, dx$.)

Exercise 9.10.9. Let $X = [0, 1] = Y$, $\mathcal{F}$ the Borel $\sigma$-algebra on $[0, 1]$. Let $\lambda$ and $\nu$ be the Lebesgue and counting measure respectively. Let $\Delta$ be the diagonal in $[0, 1]^2$. Compute the iterated integrals $\int_{[0,1]} \int_{[0,1]} 1_\Delta\, d\lambda\, d\nu$ and $\int_{[0,1]} \int_{[0,1]} 1_\Delta\, d\nu\, d\lambda$. Show that $1_\Delta$ is not $\sigma$-finite and thus, not integrable w.r.t. $\lambda \otimes \nu$.
Exercise 9.10.10. Let $\mu$ be a measure on a space $(\Omega, \mathcal{F})$. If $f \in L_p(\mu)$ for some $1 \le p < \infty$, show that
\[
\int_\Omega |f|^p\, d\mu = p \int_0^\infty t^{p-1} \mu(|f| > t)\, dt \tag{9.33}
\]
Show that $f \in L_p(\mu)$ iff $\sum_{k \in \mathbb{Z}} 2^{kp} \mu(|f| > 2^k)$ converges.

Exercise 9.10.11. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. Let $f \in \mathcal{A}$ be fixed and assume that $\mu(\{f \le t\}) < \infty$ for all $t \in \mathbb{R}$. Consider the collection $\mathcal{C} = \{g \in \mathcal{A} : 0 \le g \le 1,\ \int g\, d\mu = G\}$, where $G > 0$ is fixed. Show that the function
\[
g_* = 1_{\{f < s\}} + c\, 1_{\{f = s\}},
\]
where $s = \sup\{t : \mu(f < t) \le G\}$ and $c$ is chosen so that
\[
G = \mu(\{f < s\}) + c\, \mu(\{f = s\}),
\]
solves the minimization problem
\[
I = \inf_{g \in \mathcal{C}} \int_X f(x) g(x)\, \mu(dx).
\]
Show that if $\mu(\{f = s\}) = 0$, then $g_*$ is unique.

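Exercise 9.10.12. Suppose that $0 \le f \in L_1 + L_\infty$. For $t > 0$, define
\[
\delta_f(t) := \mu(|f| > t), \qquad \lambda_t := \inf\{\tau : \delta_f(\tau) < t\},
\]
\[
f^*(t) := \inf\{\tau : \delta_f(\tau) \le t\}, \qquad g(\lambda, t) := \|f - f_\lambda\|_1 + t\lambda.
\]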
(a) Show that λt < ∞ and δf (λt ) ≤ t for all t > 0. (Hint: limt→∞ δf (t) = 0).
(b) If δf (λt ) < t, show that f ∗ (s) = λt for all δf (λt ) < s < t.
(c) For λ > 0, let fλ = f ∧ λ. Show that f − fλ ∈ L1 whenever λ ≥ λt .
(d) Suppose λ > λt , then 0 ≤ fλ − fλt ≤ (λ − λt )1{f >λt } . Show that g(λt , t) ≤ g(λ, t).
Assume that g(λ∗ , t) < ∞ for some λ∗ < λt .
(e) Show that g(λ, t) < ∞ for all λ > λ∗ , and that the map λ 7→ g(λ, t) is continuous
on [λ∗ , ∞). (Hint: kfλ2 − fλ1 k1 ≤ |λ2 − λ1 |δf (λ∗ ) for all λ1 , λ2 ≥ λ∗ .)
(f) Show that g(λ∗ , t) ≥ g(λ, t) for all λ∗ < λ < λt ; hence, g(λ∗ , t) ≥ g(λt , t). Conclude
that inf λ>0 g(λ, t) = g(λt , t) = kf − fλt k1 + tλt (Hint: whenever λ∗ < λ < λt ,
fλ − fλ∗ ≥ (λ − λ∗ )1{f >λ} and δf (λ) ≥ t.)
(g) Show that $g(\lambda_t, t) = \int_0^t f^*(s)\, ds$.

Exercise 9.10.13. The functional
\[
K(f; t) := \inf\big\{\|v\|_1 + t\|w\|_\infty : f = v + w\big\} \tag{9.34}
\]
defines a complete norm on $L_1 + L_\infty$.

(a) Show that K(f ; t) = K(|f |; t).
Suppose 0 ≤ f ∈ L1 + L∞ .
(b) Show that the infimum in (9.34) can be taken over real valued functions $v \in L_1$ and $w \in L_\infty$ with $f = v + w$. (Hint: $f = \mathrm{Re}(v) + \mathrm{Re}(w)$ and $\|\mathrm{Re}(v)\|_1 \le \|v\|_1$, $\|\mathrm{Re}(w)\|_\infty \le \|w\|_\infty$.)
(c) If $f = v + w$ where $v$ and $w$ are real valued, let $v_1 = (f \wedge v)_+$ and $w_1 = f - v_1$. Show that $0 \le v_1 \le |v|$ and $0 \le w_1 \le |w|$. Conclude that the infimum in (9.34) can be taken over real valued functions $v \in L_1^+$ and $w \in L_\infty^+$ with $f = v + w$.
(d) Show that K(f ; t) = inf λ>0 g(λ, t), where g is as in Exercise 9.10.12.

Exercise 9.10.14. Show that the stretching factor 5 in Vitali’s covering Lemma can be
reduced to any factor θ > 3.

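Exercise 9.10.15. Let $\mu$ be a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and define $F_\mu(x) := \mu(-\infty, x]$. Suppose that $\int |y|\, \mu(dy) < \infty$.
(a) Show that
\[
\Psi(x) := \int_{-\infty}^x F_\mu(z)\, dz = \int_{\mathbb{R}} (x - y)_+\, \mu(dy) < \infty, \qquad x \in \mathbb{R}
\]
(b) Show that $\Psi$ is monotone nondecreasing and convex.
(c) Show that $\lim_{x \to \infty} \frac{\Psi(x)}{x} = 1$, $\lim_{x \to -\infty} \Psi(x) = 0$, and $\lim_{x \to \infty} x\mu(x, \infty) = 0$.

Define $\Psi^*(y) := \sup_{x \in \mathbb{R}} (xy - \Psi(x))$ as an extended real valued function.

(d) Show that
\[
\Psi^*(y) = \begin{cases} \int_0^y Q_\mu(t)\, dt & \text{if } 0 \le y \le 1 \\ \infty & \text{otherwise} \end{cases}
\]
where $Q_\mu$ is the quantile function $Q_\mu(t) = \inf\{x : F_\mu(x) \ge t\}$ for $0 < t < 1$. In particular, $\Psi^*(1) = \int y\, \mu(dy)$.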

Exercise 9.10.16. A function $f$ on $\mathbb{R}^n$ is radial if $f(x) = f(y)$ for all $x, y \in \mathbb{R}^n$ with $|x| = |y|$. Let $O(n)$ be the collection of linear transformations on $\mathbb{R}^n$ that preserve $\|\ \|_2$, that is, $(Ux) \cdot (Uy) = x \cdot y$. This means that $U^{-1} = U^\top$ and $|\det(U)| = 1$. Show that $f$ is radial iff $f \circ U = f$ for all $U \in O(n)$.

Exercise 9.10.17. Let S and T be linear operators on Rd . If |Sx| ≤ |T x| for all x, show
that | det(S)| ≤ | det(T )|. (Hint: If det(T ) 6= 0 then S ◦ T −1 (B(0; 1)) ⊂ B(0; 1).)

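Exercise 9.10.18. Show that
\[
\int_{\mathbb{R}^n_+} f\big(x_1^{b_1} + \ldots + x_n^{b_n}\big)\, x_1^{a_1 - 1} \cdots x_n^{a_n - 1}\, dx = \frac{\Gamma\big(\frac{a_1}{b_1}\big) \cdots \Gamma\big(\frac{a_n}{b_n}\big)}{b_1 \cdots b_n\, \Gamma\big(\frac{a_1}{b_1} + \ldots + \frac{a_n}{b_n}\big)} \int_0^\infty f(s)\, s^{\frac{a_1}{b_1} + \ldots + \frac{a_n}{b_n} - 1}\, ds
\]
Show that the Lebesgue measure of the set $E = \{x \in \mathbb{R}^n_+ : x_1^{b_1} + \ldots + x_n^{b_n} < r\}$ is given by
\[
\lambda_n(E) = \frac{\Gamma\big(\frac{1}{b_1} + 1\big) \cdots \Gamma\big(\frac{1}{b_n} + 1\big)}{\Gamma\big(\frac{1}{b_1} + \ldots + \frac{1}{b_n} + 1\big)}\, r^{\frac{1}{b_1} + \ldots + \frac{1}{b_n}}
\]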
Exercise 9.10.19. Show that $\int_{S^{n-1}} v \cdot u\, \sigma(du) = 0$ for any $v \in \mathbb{R}^n$. (Hint: Let $P$ be any orthogonal transformation such that $Pv = -v$.)
Exercise 9.10.20. For any $a$ and $b$ in $\mathbb{R}^n$, show that
\[
\int_{S^{n-1}} (a \cdot u)(b \cdot u)\, \sigma(du) = (a \cdot b)\, \omega_n
\]
where $\omega_n = \frac{\sigma(S^{n-1})}{n}$ is the Lebesgue measure of the unit ball $B(0; 1)$ in $\mathbb{R}^n$. (Hint: Consider the orthogonal transformation $R$ such that $Ra = -a$ and $R$ is the identity in the orthogonal complement $\{a\}^\perp$.)
Chapter 10

Signed and Complex measures

In this chapter we will develop a theory of integration that extends the previous discussion on positive elementary integrals to signed elementary integrals. For a given signed measure $m$ under some simple technical condition, we will show that there is an optimal mean $\|\ \|$ dominating $m$ and $-m$. From there, using the integration theory developed for positive elementary integrals, we extend $m$ to $L_1(\|\ \|)$. We will show that the space of integrals has a rich algebraic and order structure.

10.1. Real valued elementary integrals

Throughout this section we will assume that $E$, a collection of bounded real functions defined on a common domain $\Omega$, is a ring lattice closed under chopping and that $m$ is a linear functional on $E$ taking values in $\mathbb{R}$.
Real valued elementary integrals, which we refer to as signed elementary integrals, appear in many applications.

Example 10.1.1. The simplest example of signed elementary integrals are those obtained
by difference of positive integrals; more precisely, if (E, m1 ) and (E, m2 ) are elementary
positive integrals, then m = m1 − m2 is a signed elementary integral on E.

Example 10.1.2. Suppose k k is a mean that dominates a positive σ–continuous elementary

integral (E, I). If f ∈ L1 (k k) takes values in R, then φ 7→ I(f φ) is a signed σ–continuous
elementary integral on E.

Definition 10.1.3. The elementary integral $m$ has finite variation at $\psi \in E_+$ if
\[
|m|(\psi) = \sup\{m(\phi) : \phi \in E,\ |\phi| \le \psi\} < \infty. \tag{10.1}
\]


An elementary integral $(E, m)$ is said to be of finite variation if (10.1) holds for all $\psi \in E_+$. The map $\psi \mapsto |m|(\psi)$ on $E_+$ is called the variation of $m$.
Remark 10.1.4. As −φ ∈ E iff φ ∈ E, we have that |m|(ψ) = sup{m(φ) : φ ∈ E, |φ| ≤ ψ} =
sup{|m(φ)| : φ ∈ E, |φ| ≤ ψ}. As |ψ| = | − ψ|, we also have that |m(ψ)| = m(ψ) ∨ m(−ψ) ≤
|m|(ψ) for all ψ ∈ E+ . It is clear that m = |m| whenever m is a positive elementary integral.
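Example 10.1.5. Given a measurable space $(\Omega, \mathcal{F})$, let $E := \mathcal{B}(\Omega, \mathcal{F})$ be the space of bounded real valued measurable functions with the sup norm. If $\Lambda$ is a bounded linear functional on $E$, then $\Lambda$ is of finite variation. Indeed, for any $\psi \in E_+$ and $\phi \in E$ with $|\phi| \le \psi$,
\[
|\Lambda\phi| \le \|\Lambda\| \|\phi\|_u \le \|\Lambda\| \|\psi\|_u.
\]
This shows that $|\Lambda|(\psi) \le \|\Lambda\| \|\psi\|_u < \infty$ for all $\psi \in E_+$. In particular $|\Lambda|(1) \le \|\Lambda\|$. Let $\psi_n \in E$ with $\|\psi_n\|_u \le 1$ be such that $\|\Lambda\| = \lim_n |\Lambda\psi_n|$. Since $|\psi_n| \le 1$, we have $|\Lambda\psi_n| \le |\Lambda|(1)$, and so $\|\Lambda\| \le |\Lambda|(1)$. This shows that $|\Lambda|(1) = \|\Lambda\|$.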
Lemma 10.1.6. Suppose (E, m) is a signed elementary integral of finite variation. The
variation map |m| on E+ is additive and positive homogeneous.

Proof. Positive homogeneity follows directly from the definition of $|m|$. Also, from the definition of variation it follows that $|m|(\psi) \le |m|(\varphi)$ whenever $\psi, \varphi \in E_+$ and $\psi \le \varphi$. Let $\psi_1$ and $\psi_2$ be nonnegative elementary functions and let $\varepsilon > 0$.
There are functions $\phi_j \in E$, $j = 1, 2$, such that $|\phi_j| \le \psi_j$ and $m(\phi_j) > |m|(\psi_j) - \frac{\varepsilon}{2}$. As $|\phi_1 + \phi_2| \le \psi_1 + \psi_2$,
\[
|m|(\psi_1) + |m|(\psi_2) - \varepsilon < m(\phi_1) + m(\phi_2) = m(\phi_1 + \phi_2) \le |m|(\psi_1 + \psi_2).
\]
Consequently, $|m|$ is superadditive on $E_+$.
Now we show that $|m|$ is subadditive on $E_+$. Let $\phi \in E$ be such that $|\phi| \le \psi_1 + \psi_2$ and $|m|(\psi_1 + \psi_2) - \varepsilon < m(\phi)$. Notice that $0 \le \psi_1' = \psi_1 \wedge |\phi| \le \psi_1$ and $0 \le \psi_2' = |\phi| - \psi_1 \wedge |\phi| \le \psi_2$. Consider the functions
\[
\begin{array}{ll}
\phi_{1+} = \phi_+ \wedge \psi_1' & \phi_{2+} = \phi_+ - \phi_+ \wedge \psi_1' \\
\phi_{1-} = \psi_1' - \phi_+ \wedge \psi_1' & \phi_{2-} = \phi_- + \phi_+ \wedge \psi_1' - \psi_1'
\end{array} \tag{10.2}
\]
The functions in the array (10.2) are in $E_+$; its columns add up to $\psi_1'$ and $\psi_2'$ respectively; its rows add up to $\phi_+$ and $\phi_-$ respectively. Hence
\[
|m|(\psi_1 + \psi_2) - \varepsilon < m(\phi) = m(\phi_+ - \phi_-) = m(\phi_{1+} - \phi_{1-}) + m(\phi_{2+} - \phi_{2-}) \le |m|(\psi_1') + |m|(\psi_2') \le |m|(\psi_1) + |m|(\psi_2).
\]
Subadditivity follows immediately.
Theorem 10.1.7. If (E, m) is a signed elementary integral of finite variation then the
variation map |m| admits a unique linear extension to E. This extension, denoted also by
|m|, is the minimal positive elementary integral on E such that
(10.3) |m(φ)| ≤ |m|(|φ|)

for all φ ∈ E.

Proof. By Lemma 10.1.6 the variation map $|m|$ is additive and positive homogeneous on $E_+$. Hence, for any $\phi \in E$ we can define $|m|(\phi) = |m|(\phi_+) - |m|(\phi_-)$. Furthermore, if $\phi = \phi_1 - \phi_2$ with $\phi_1$ and $\phi_2$ in $E_+$, then $\phi_+ + \phi_2 = \phi_1 + \phi_-$, whence it follows that $|m|(\phi_+) - |m|(\phi_-) = |m|(\phi_1) - |m|(\phi_2)$. This shows that the value $|m|(\phi)$ is independent of how we choose to express $\phi$ as the difference of nonnegative elementary functions. Moreover,
\[
|m(\phi)| \le |m(\phi_+)| + |m(\phi_-)| \le |m|(\phi_+ + \phi_-) = |m|(|\phi|).
\]
Suppose $n$ is a positive elementary integral on $E$ such that $|m(\psi)| \le n(\psi)$ for all $\psi \in E_+$. Then, for any $\phi \in E$ such that $|\phi| \le \psi$,
\[
|m(\phi)| \le |m(\phi_+)| + |m(\phi_-)| \le n(\phi_+) + n(\phi_-) = n(|\phi|) \le n(\psi).
\]
Taking the supremum over all such $\phi$ we obtain that $|m|(\psi) \le n(\psi)$. Consequently, for any $\phi \in E$ we have $|m(\phi)| \le |m|(|\phi|) \le n(|\phi|)$.

The following result provides an alternative representation for the variation of a signed
elementary integral.

Lemma 10.1.8. For any ψ ∈ E+ ,

(10.4) |m|(ψ) = sup{m(φ1 ) − m(φ2 ) : φ1 , φ2 ∈ E+ , φ1 + φ2 = ψ}.

Proof. Denote the right hand side of (10.4) by n(ψ). If φj ∈ E+ , j = 1, 2, and ψ = φ1 + φ2

then,
m(φ1 ) − m(φ2 ) ≤ |m|(φ1 ) + |m|(φ2 ) = |m|(φ1 + φ2 ) = |m|(ψ).
Taking suprema over all such pairs (φ1 , φ2 ) we obtain n(ψ) ≤ |m|(ψ). To prove the reverse
inequality, suppose φ ∈ E, |φ| ≤ ψ so that δ = ψ − |φ| ∈ E+ . If m(δ) > 0 define φ1 = φ+ + δ
and φ2 = φ− ; otherwise, define φ1 = φ+ and φ2 = φ− + δ. In either case, φ1 + φ2 = ψ
and n(ψ) ≥ m(φ1 ) − m(φ2 ) ≥ m(φ). Taking suprema over all such φ, we obtain that
n(ψ) ≥ |m|(ψ).

Theorem 10.1.9. If m is a σ–continuous signed elementary integral on E of finite variation,
then its variation |m| is a positive σ–continuous elementary integral.

Proof. By Theorem 10.1.7 the variation |m| is a positive elementary integral. It remains
to show that |m| is σ–additive whenever m is so. Let (ψn) be an increasing sequence in
E. By replacing ψn by ψn − ψ1 if necessary, we may assume without loss of generality
that (ψn) ⊂ E+. Let ψ = supn ψn. Clearly supn |m|(ψn) ≤ |m|(ψ). For the converse
inequality, for any ε > 0 choose φ ∈ E with |φ| ≤ ψ such that |m|(ψ) − ε < m(φ).
The sequences (ψn ∧ φ+) and (ψn ∧ φ−) in E+ increase to φ+ and φ− respectively. As
m is σ–additive, and hence σ–continuous, we have that limn m(ψn ∧ φ+) = m(φ+) and
limn m(ψn ∧ φ−) = m(φ−). Hence, for some N ∈ N, m(ψn ∧ φ+) − m(ψn ∧ φ−) > |m|(ψ) − ε
for all n ≥ N. As |ψn ∧ φ+ − ψn ∧ φ−| ≤ ψn, it follows that for all n ≥ N
supk |m|(ψk) ≥ |m|(ψn) ≥ m(ψn ∧ φ+) − m(ψn ∧ φ−) ≥ |m|(ψ) − ε.

Since ε > 0 is arbitrary, supn |m|(ψn) ≥ |m|(ψ).

Example 10.1.10. Suppose S is a topological space. A σ–continuous elementary integral
on Cb(S) of finite variation is called a Baire measure. Recall from Theorem 5.6.6
that the Baire σ–algebra Ba(S) coincides with the collection of sets in Cb(S)^Σ. A Baire
measure defines a finite σ–additive function on Ba(S).

10.2. Extension of elementary integrals of finite variation

Extending an elementary integral m of finite variation on a Stone lattice E is straightforward.
First we consider the case where m is σ–continuous. In this case, by Theorem 10.1.9,
|m| is a positive σ–continuous elementary integral on E. Let ‖ ‖∗|m| be the Daniell mean
associated with the positive σ–additive elementary integral |m|. We define L1(m) as L1(‖ ‖∗|m|),
the closure of E under (F∗(E), ‖ ‖∗|m|). As in Section 6.5, the extension of m to L1(m), which
we also denote by m, is linear and
|m(f)| ≤ |m|(|f|) = ‖f‖∗m,   f ∈ L1(|m|).
The extended real number
‖m‖TV := ∫ 1 d|m| = |m|(1)
is called the total variation of the elementary integral m. If 1 ∈ L1(m), that is,
‖m‖TV = ∫ 1 d|m| < ∞, then m is said to be of finite total variation, or simply a finite
elementary integral.
If m is only additive, then |m| is also additive on E. We use the Jordan seminorm ‖ ‖#m
instead of the Daniell mean and define L#(m) to be L#(|m|), the closure of E in (F#, ‖ ‖#m).
As in Section 6.1,
|m(f)| ≤ |m|(|f|) = ‖f‖#m,   f ∈ L#(|m|).

The procedure described above can be applied to linear functionals (not necessarily
positive) in C00 (X) where X is l.c.H. to produce signed–Radon measures.
Theorem 10.2.1. Suppose X is a l.c.H. space. A linear functional m on C00 (X) has finite
variation iff m has the following property:
Property R: If (φn : n ∈ N) is a sequence of functions in C00 (X) whose supports are
contained in a common compact set and which converges uniformly to a function
φ then, limn m(φn ) = m(φ).
In either case, m is order continuous.

Proof. Suppose m is not of finite variation. Then, there exist ψ ∈ C00(X), ψ ≥ 0, and a sequence
(φn : n ∈ N) ⊂ C00(X) with |φn| ≤ ψ such that |m(φn)| > 2^n. Each function gn := 2^(−n) φn
vanishes outside of supp(ψ) and ‖gn‖u → 0. Since |m(gn)| > 1, it follows that m does not
satisfy the Radon property.

Conversely, suppose m does not satisfy the Radon property. Then, there is a compact set K
and a sequence of functions φn ∈ C00(X) whose supports are contained in K, which converges
uniformly to some function φ ∈ C00(X) and such that ε := infn |m(φn − φ)| > 0. Without
loss of generality suppose that ‖φn − φ‖u < 2^(−n). Let ψ ∈ C00(X) be such that 1K ≺ ψ ≤ 1.
For each n define

ψn = sign(m(φn − φ)) · (φn − φ)

so that m(ψn) = |m(φn − φ)| ≥ ε. If Ψn := ψ1 + · · · + ψn, then clearly |Ψn| ≤ ψ and
|m(Ψn)| = m(Ψn) ≥ nε. Therefore m has infinite variation at ψ.

By Lemma 6.7.1, if m is a Radon measure, then |m| is order continuous. Consequently, for
any increasing directed family Φ ⊂ C00(X) with lim Φ = ψ,
limφ∈Φ |m(ψ − φ)| ≤ limφ∈Φ |m|(ψ − φ) = 0.

Therefore, m is order–continuous.
Definition 10.2.2. A linear functional m satisfying Property R in Theorem 10.2.1 is called
a (real valued) Radon measure.
Example 10.2.3. With λ as the Lebesgue measure on R, the linear functional
m(f) := ∫R f(x) sin(x) λ(dx),   f ∈ C00(R)
is a real valued Radon measure, and its variation is given by |m|(f) = ∫R f(x)| sin(x)| λ(dx).
The functional m is not defined in all of M|m|, e.g. m(R) is not defined. In contrast, |m| is a
positive Radon measure defined in all of M|m|. Moreover, |m|(R) = ∞.
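
As a numerical illustration of Example 10.2.3 (a sketch that is not part of the original notes), the following Python snippet approximates m(f) and |m|(f) by midpoint Riemann sums. The test function `tent`, the helper `integrate`, and the step counts are assumptions chosen only for illustration.

```python
import math

def integrate(h, a, b, steps=20000):
    """Midpoint Riemann sum of h over [a, b]."""
    dx = (b - a) / steps
    return sum(h(a + (i + 0.5) * dx) for i in range(steps)) * dx

def tent(x):
    """Continuous test function supported on [0, 2*pi], symmetric about pi."""
    return max(0.0, 1.0 - abs(x - math.pi) / math.pi)

def m(f, a, b):
    """Approximates m(f), the integral of f(x) sin(x) over [a, b]."""
    return integrate(lambda x: f(x) * math.sin(x), a, b)

def var_m(f, a, b):
    """Approximates |m|(f), the integral of f(x) |sin(x)| over [a, b]."""
    return integrate(lambda x: f(x) * abs(math.sin(x)), a, b)

a, b = 0.0, 2 * math.pi
m_f = m(tent, a, b)        # positive and negative lobes of sin cancel
var_f = var_m(tent, a, b)  # no cancellation under |sin|

# |m| of the indicator of [0, k*pi] grows linearly in k, consistent with |m|(R) = infinity.
growth = [integrate(lambda x: abs(math.sin(x)), 0.0, k * math.pi, 2000 * k)
          for k in (1, 2, 3)]
```

The cancellation in `m_f` against the strictly positive `var_f` is the inequality |m(f)| ≤ |m|(|f|) seen numerically.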

10.3. Signed measures

Suppose µ is a real–valued additive function on a ring R of subsets of a given set Ω. Its
linear extension m to the space E(R) of simple functions is an elementary integral. If m
is of finite variation, then the restriction |µ| of |m| to R is positive, additive, and satisfies
|µ(A)| ≤ |µ|(A) for all A ∈ R.

Conversely, if ν is a real nonnegative additive function on R such that |µ(A)| ≤ ν(A) for
all A ∈ R, then its extension n dominates m and −m on E(R), that is, |m(ψ)| ≤ n(ψ)
for all ψ ∈ E+(R). This implies that m has finite variation and that |m| ≤ n and |µ| ≤ ν.
Consequently, |µ| is the smallest positive additive function that dominates µ and −µ.
Theorem 10.3.1. Let µ be a real–valued additive function on a ring R of subsets of Ω
and let m be its linear extension to E(R).

(i) m is of finite variation iff there exists a positive additive function ν on R such that
(10.5) |µ(A)| ≤ ν(A), A ∈ R.
(ii) If m is of finite variation, then the restriction |µ| of |m| to R is the smallest positive
additive function on R satisfying (10.5); moreover,
(10.6) |µ|(A) = sup{µ(A1) − µ(A \ A1) : A1 ∈ R, A1 ⊂ A}.
(iii) m is σ–continuous iff µ is σ–additive.
Remark 10.3.2. |µ| is called the variation measure of µ. The total variation of a
measure µ is defined as ‖µ‖TV := |µ|(Ω). When ‖µ‖TV < ∞, we say that µ is of finite total
variation, or simply that µ is a finite measure.

Proof. The arguments given above prove (i) and half of (ii).

Proof of equation (10.6): Let m be the linear extension of µ to E(R), and let ν(A) denote
the right hand side of (10.6). Clearly |µ(A)| ≤ ν(A) and, since R ⊂ E(R), ν(A) ≤ |m|(A)
by (10.4). We will show that ν is an additive function on R. Let B1 and B2 be disjoint sets in
R and let ε > 0. Let A1 ⊂ B1 and A2 ⊂ B2 be sets in R such that
ν(B1) − ε/2 < µ(A1) − µ(B1 \ A1),
ν(B2) − ε/2 < µ(A2) − µ(B2 \ A2).
In this case, (B1 ∪ B2) \ (A1 ∪ A2) = (B1 \ A1) ∪ (B2 \ A2) and so,

ν(B1) + ν(B2) − ε < µ(A1 ∪ A2) − µ((B1 ∪ B2) \ (A1 ∪ A2)) ≤ ν(B1 ∪ B2).
This shows that ν is superadditive.

Let A ⊂ B1 ∪ B2 be a set in R such that ν(B1 ∪ B2) − ε < µ(A) − µ((B1 ∪ B2) \ A).
Identifying sets with their indicator functions, set
(10.7)    A1+ = A ∧ B1              A2+ = A − (A ∧ B1)
          A1− = B1 − (A ∧ B1)       A2− = B2 + (A ∧ B1) − A
that is, A1+ = A ∩ B1, A2+ = A \ B1, A1− = B1 \ A and A2− = B2 \ A.
The terms in (10.7) are pairwise disjoint sets in R since A2− = ((B1 ∪ B2) \ A) ∩ B2 = B2 \ A.
The union by rows in (10.7) is A and (B1 ∪ B2) \ A, while the union by columns is B1 and
B2. Therefore

ν(B1) + ν(B2) ≥ µ(A1+) − µ(A1−) + µ(A2+) − µ(A2−)
= µ(A1+ ∪ A2+) − µ(A1− ∪ A2−) = µ(A) − µ((B1 ∪ B2) \ A)
> ν(B1 ∪ B2) − ε.
This shows that ν is subadditive. Therefore ν is additive and dominates µ. The first part
of (ii) implies that |µ| = ν.

(iii) Suppose m is σ–continuous. By Theorem 10.1.9, m is σ–continuous iff |m| is
σ–continuous. As |µ| is the restriction of |m| to R, it follows that |µ| is σ–continuous.

We claim that µ is σ–continuous iff |µ| is σ–continuous. If |µ| is σ–continuous then it is
also δ–continuous. Let (An) ⊂ R be a sequence decreasing to ∅. From |µ(An)| ≤ |µ|(An),
µ(An) → 0. Hence µ is also δ–continuous, and so it is σ–continuous.

Conversely, suppose µ is σ–continuous and let (Bn) ⊂ R be a sequence increasing to some
B ∈ R. Given ε > 0, let A ⊂ B in R be such that |µ|(B) − ε < µ(A) − µ(B \ A). Then
Bn ∩ A ր A and Bn ∩ (B \ A) ր B \ A. Consequently

|µ|(B) − ε ≤ limn (µ(Bn ∩ A) − µ(Bn ∩ (B \ A)))
≤ lim infn |µ|(Bn) ≤ lim supn |µ|(Bn) ≤ |µ|(B).
This shows that |µ| is also σ–continuous.

Suppose |µ| is σ–continuous and let φn ց 0 in E(R). For any ε > 0,

∫ φn d|m| = ∫ φn 1{φ1>0} d|m| ≤ ε|µ|({φ1 > 0}) + ∫ φn 1{φn>ε} d|m|
≤ ε|µ|({φ1 > 0}) + ‖φ1‖u |µ|({φn > ε}).

Since {φn > ε} ց ∅, it follows that lim supn |m|(φn) ≤ ε|µ|({φ1 > 0}) for every ε > 0,
and so |m|(φn) ց 0. This shows that |m| is σ–continuous.
Remark 10.3.3. A direct consequence of Theorem 10.3.1 is that
µ+(A) = (1/2)(|µ|(A) + µ(A)) = sup{µ(B) : B ∈ R, B ⊂ A}
µ−(A) = (1/2)(|µ|(A) − µ(A)) = sup{−µ(B) : B ∈ R, B ⊂ A}
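
To make formula (10.6) and the identities of Remark 10.3.3 concrete, here is a small brute-force check on a finite set, where the ring consists of all subsets. This is only an illustrative sketch: the point masses in `weights` and all function names are hypothetical and do not come from the notes.

```python
from itertools import combinations

# Hypothetical point masses defining an additive set function mu on subsets of {a, b, c, d}.
weights = {"a": 2.0, "b": -3.0, "c": 0.5, "d": -1.0}

def mu(A):
    """mu(A) is the sum of the point masses in A."""
    return sum(weights[x] for x in A)

def subsets(A):
    """All subsets of the finite set A."""
    items = sorted(A)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def variation(A):
    """Right hand side of (10.6): sup of mu(A1) - mu(A minus A1) over A1 contained in A."""
    A = frozenset(A)
    return max(mu(A1) - mu(A - A1) for A1 in subsets(A))

def mu_plus(A):
    """sup of mu(B) over subsets B of A."""
    return max(mu(B) for B in subsets(A))

def mu_minus(A):
    """sup of -mu(B) over subsets B of A."""
    return max(-mu(B) for B in subsets(A))

omega = frozenset(weights)
```

In this finite setting the supremum in (10.6) is attained by taking A1 to be the points of A with positive mass, so `variation(A)` is simply the sum of the absolute values of the masses in A.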
Definition 10.3.4. Given a measurable space (Ω, F), a function µ on F into the extended
real line [−∞, ∞] is a signed measure if
(i) µ(∅) = 0,
(ii) µ(∪n An) = Σn µ(An) for any sequence (An) ⊂ F of pairwise disjoint sets.

Remark 10.3.5. Since the union of a sequence of sets is independent of any rearrangement
of the sequence, the series in (ii) is absolutely convergent whenever it is finite. By definition,
a signed measure µ takes at most one value in {−∞, ∞}.

The restriction of a signed measure to the ring R(F) of measurable sets A ∈ F with
|µ(A)| < ∞ is clearly σ–additive and its linear extension to the space of simple functions
E(R) is a σ–continuous elementary integral. The converse is not necessarily true, as the
next example shows.
Example 10.3.6. The function ν(A) = ∫A (f(x) − g(x)) dx on B(R), where f, g ∈ L1(λ) are
nonnegative, is a signed measure on B(R). The function µ(A) = ∫A x dx is not a signed
measure on B(R); however, µ is σ–additive on the ring of Borel sets with finite Lebesgue measure.

If µ is of finite variation on R(F) then (10.6) holds, the measures µ+ and µ− are well
defined and satisfy µ = µ+ − µ− and |µ| = µ+ + µ− on R(F). In the remainder of this
section we will extend these identities to all of F, even in the case where µ fails to be of
finite variation.

Definition 10.3.7. Let µ be a signed measure on (Ω, F). A set A ∈ F is a positive set
for µ if µ(B) ≥ 0 for any B ∈ F with B ⊂ A. Similarly, a set A ∈ F is a negative set for µ
if A is a positive set for −µ.

Theorem 10.3.8. Suppose that µ is a signed measure on (Ω, F ). If −∞ < µ(A) < 0, then
there is a negative set B with B ⊂ A and µ(B) ≤ µ(A).

Proof. Let δ0 = sup{µ(E) : E ∈ F, E ⊂ A}. Then 0 = µ(∅) ≤ δ0 ≤ ∞. If δ0 = 0, then A
is a negative set and we can take B = A. Suppose δ0 > 0 and choose A1 ∈ F, A1 ⊂ A so
that µ(A1) > (δ0/2) ∧ 1. By induction we obtain sequences 0 ≤ δn+1 ≤ δn ≤ ∞ and (An) ⊂ F
such that
(a) δn = sup{µ(E) : E ∈ F, E ⊂ A \ (A1 ∪ · · · ∪ An)},
(b) An+1 ⊂ A \ (A1 ∪ · · · ∪ An),
(c) µ(An+1) ≥ (δn/2) ∧ 1 (if δn = 0, take An+1 = ∅).
Let A∞ = ∪n An and B = A \ A∞. Since the sets An are pairwise disjoint and µ(An) ≥ 0,
it follows that 0 ≤ Σn µ(An) = µ(A∞). Hence,

µ(A) = µ(A∞) + µ(B) ≥ µ(B).

Since µ(A) is finite, both µ(A∞) and µ(A \ A∞) are finite. In particular limn µ(An) = 0,
and by (c) limn δn = 0. The set B satisfies the conclusion of the statement. Indeed, if
E ∈ F, E ⊂ B then E ⊂ A \ (A1 ∪ · · · ∪ An) for each n ∈ N. Consequently, µ(E) ≤ δn for all
n ∈ N and µ(E) ≤ 0.

Theorem 10.3.9. (Hahn decomposition theorem) Let (Ω, F ) be a measurable space and µ
a signed measure on F . There is a positive set P and a negative set N such that Ω = P ∪ N
and P ∩ N = ∅.

Proof. Without loss of generality we may assume that µ does not take the value −∞. Let
𝒩 denote the family of all negative sets and let η = inf{µ(E) : E ∈ 𝒩}. Since ∅ ∈ 𝒩, then
−∞ ≤ η ≤ µ(∅) = 0. Let (An) ⊂ 𝒩 be a sequence such that µ(An) ց η. The sets B1 = A1,
Bn+1 = An+1 \ (A1 ∪ · · · ∪ An) (n ≥ 1) are negative and pairwise disjoint and N = ∪n An = ∪n Bn;
hence, N ∈ 𝒩 and η ≤ µ(N) ≤ µ(An). Consequently, −∞ < µ(N) = η.

We will show that P = Ω \ N is a positive set. Suppose that there is a measurable set E ⊂ P
with µ(E) < 0. By Theorem 10.3.8 there is a negative set B ⊂ E with µ(B) ≤ µ(E). Since
N and B are disjoint negative sets, N ∪ B ∈ 𝒩 and µ(N ∪ B) = µ(N) + µ(B) < µ(N) = η,
contradicting the choice of η. Therefore P is a positive set.

Definition 10.3.10. Let (Ω, F) be a measurable space. Two measures µ and ν are mutually
singular, denoted by µ ⊥ ν, if there is A ∈ F such that µ(A) = 0 = ν(Ω \ A).

Theorem 10.3.11. (Jordan decomposition theorem) Let (Ω, F) be a measurable space and
µ a signed measure. There is a unique pair of measures µ+ and µ− such that
(10.8) µ = µ+ − µ−,   µ+ ⊥ µ−.
Set |µ| := µ+ + µ−. If (P, N) and (S, Q) are two Hahn decompositions of Ω with respect to
µ then P = S and N = Q |µ|–a.s.

Proof. Let (P, N ) be a Hahn decomposition of µ, and define

(10.9) µ+ (A) = µ(A ∩ P ), µ− (A) = −µ(A ∩ N )
for all A ∈ F. Clearly µ+ and µ− are mutually singular measures that satisfy (10.8). The
measures µ+ and µ− are independent of the choice of the Hahn decomposition. Indeed, for
any A, B ∈ F with B ⊂ A
µ(B) = µ+ (B) − µ− (B) ≤ µ+ (B) ≤ µ+ (A)
−µ(B) = µ− (B) − µ+ (B) ≤ µ− (B) ≤ µ− (A).
Consequently, from (10.9)
(10.10) µ+ (A) = sup{µ(B) : B ∈ F , B ⊂ A}
(10.11) µ− (A) = sup{−µ(B) : B ∈ F , B ⊂ A}
It remains to show that the decomposition (10.8) is unique. Suppose that ν and τ are
mutually singular measures with µ = ν − τ . If Q ∈ F , S = Ω \ Q are such that ν(Q) =
0 = τ (S), then (S, Q) is a Hahn decomposition of Ω with respect to µ, so ν(A) = µ(A ∩ S),
τ (A) = −µ(A ∩ Q). From (10.10) and (10.11), ν = µ+ and τ = µ− , consequently P = S
and N = Q |µ|–a.s.
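
The Hahn and Jordan decompositions can be seen directly on a finite set. The sketch below is an illustration under the assumption that µ is given by hypothetical point masses (`weights`), one of which is zero; it builds two different Hahn decompositions and the associated Jordan parts via (10.9).

```python
from itertools import combinations

# Hypothetical point masses; the point "c" carries no mass.
weights = {"a": 2.0, "b": -3.0, "c": 0.0, "d": 1.5}
omega = frozenset(weights)

def mu(A):
    return sum(weights[x] for x in A)

def subsets(A):
    items = sorted(A)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

# Two Hahn decompositions: they differ only in where the zero-mass point goes.
P = frozenset(x for x in omega if weights[x] >= 0)
N = omega - P
S = frozenset(x for x in omega if weights[x] > 0)
Q = omega - S

def jordan(pos, neg):
    """Jordan parts as in (10.9): mu_plus(A) = mu(A & pos), mu_minus(A) = -mu(A & neg)."""
    mu_plus = lambda A: mu(frozenset(A) & pos)
    mu_minus = lambda A: -mu(frozenset(A) & neg)
    return mu_plus, mu_minus

plus_PN, minus_PN = jordan(P, N)
plus_SQ, minus_SQ = jordan(S, Q)
```

Both decompositions give the same µ+ and µ−, and the symmetric difference of P and S has zero variation, in line with the uniqueness statement of Theorem 10.3.11.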

We will give a different description of the variation function that extends to complex
measures. For any A ∈ F, let PA denote the collection of all the countable measurable
partitions of A, and define
(10.12) Vµ(A) = sup{ Σj |µ(Aj)| : {Aj} ∈ PA }.

Theorem 10.3.12. Vµ = |µ|. If ν is a measure on (Ω, F ) such that |µ(A)| ≤ ν(A) for all
A ∈ F , then |µ| ≤ ν.

Proof. It follows from (10.12) that Vµ(∅) = 0 and |µ(A)| ≤ Vµ(A). Suppose that (En) ⊂ F is
a pairwise disjoint sequence whose union is E, and let (Am) ⊂ F be any countable partition
of E. Then {Am ∩ En : n ∈ N} is a countable partition of Am and {Am ∩ En : m ∈ N} is a
countable partition of En. Hence
Σm |µ(Am)| ≤ Σm Σn |µ(Am ∩ En)| = Σn Σm |µ(Am ∩ En)| ≤ Σn Vµ(En),
whence it follows that Vµ(∪n En) ≤ Σn Vµ(En). It remains to show that the last
inequality holds in the opposite direction. To that purpose, let (tn) ⊂ R be a sequence such
that tn < Vµ(En) and let {An,m : m ∈ N} be a measurable partition of En such that
tn < Σm |µ(An,m)|. Since {An,m : (n, m) ∈ N²} is a countable partition of E,
Σn tn ≤ Σn,m |µ(An,m)| ≤ Vµ(E).
Taking the supremum over all possible tn we obtain Σn Vµ(En) ≤ Vµ(∪n En).

For any A ∈ F and any countable partition (An) ⊂ F of A,

|µ(A)| = |Σn µ(An)| ≤ Σn |µ(An)| ≤ Σn |µ|(An) = |µ|(A),
so Vµ(A) ≤ |µ|(A). On the other hand, if (P, N) is a Hahn decomposition of Ω, then
{A ∩ P, A ∩ N} is a partition of A and |µ|(A) = |µ(A ∩ P)| + |µ(A ∩ N)| ≤ Vµ(A).
Consequently, |µ(A)| ≤ |µ|(A) ≤ Vµ(A) ≤ |µ|(A).
Finally, for any Hahn decomposition (P, N) of Ω,
|µ|(A) = |µ(A ∩ P)| + |µ(A ∩ N)| ≤ ν(A ∩ P) + ν(A ∩ N) = ν(A).
Therefore, |µ| is the smallest measure that bounds µ.
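
For a finite set the supremum in (10.12) runs over finitely many partitions, so Theorem 10.3.12 can be checked by enumeration. The following sketch uses hypothetical point masses and a standard recursive partition generator; none of these names appear in the notes.

```python
# Hypothetical point masses for a signed measure on {a, b, c, d}.
weights = {"a": 2.0, "b": -3.0, "c": 0.5, "d": -1.0}

def mu(A):
    return sum(weights[x] for x in A)

def partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def V(A):
    """(10.12): sup over partitions {A_j} of A of the sum of |mu(A_j)|."""
    return max(sum(abs(mu(block)) for block in part)
               for part in partitions(sorted(A)))

omega = sorted(weights)
all_partitions = list(partitions(omega))
total_variation = sum(abs(w) for w in weights.values())
```

The maximum is reached, for instance, by the partition into singletons, which separates positive from negative mass.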
Remark 10.3.13. A direct consequence of Theorem 10.3.12 is that if µ is a signed measure
on (Ω, F), then the restriction of µ to the ring R(F) of measurable sets A with |µ(A)| < ∞
is of finite variation. Moreover, at least one of the measures µ+ or µ− is finite, that is,
µ−(Ω) ∧ µ+(Ω) < ∞.

Taking linear combinations (aµ + bν)(E) := aµ(E) + bν(E), we conclude that the space
Mr(Ω, F) (Mc(Ω, F)) of real (complex) measures of finite total variation forms a real
(complex) vector space with norm µ ↦ ‖µ‖TV = |µ|(Ω).
Theorem 10.3.14. The space of complex measures Mc (Ω, F ) with the total variation norm
is a Banach space.

Proof. Suppose that (µn) is a Cauchy sequence. Then |µn(E) − µm(E)| ≤ ‖µn − µm‖TV for
all E ∈ F. This means that (µn) is a Cauchy sequence of bounded functions defined on F. Hence
µn converges uniformly to a bounded function µ on F. Clearly µ(∅) = 0 and µ is finitely
additive. To show that µ is countably additive, suppose (Em) ⊂ F increases to its union E.
Given ε > 0, there is N such that supA∈F |µn(A) − µ(A)| < ε/3 for all n ≥ N. The countable
additivity of µN implies that µN is continuous on F, that is, limm µN(Em) = µN(E). Thus,
for some m0, |µN(E \ Em)| = |µN(E) − µN(Em)| < ε/3 whenever m ≥ m0. Therefore,
|µ(E) − µ(Em)| ≤ |µ(E) − µN(E)| + |µN(E) − µN(Em)| + |µN(Em) − µ(Em)| < ε
for all m ≥ m0. This shows that µ is a complex measure.

10.4. The space of elementary integrals

The collection M(E) of finite variation elementary integrals on E is a vector space of linear
functionals on E. The cone M+ of positive elementary integrals satisfies M+ ∩(−M+ ) = {0}.
Therefore, the relation n ≤ m iff m − n ∈ M+ is an order relation on M that is compatible
with the linear structure.

Theorem 10.4.1. (M, ≤) is an order complete vector lattice, that is, if B ⊂ M has an
upper bound in M, then it has a least upper bound in M.

Proof. Suppose m, n, ρ ∈ M. If ρ ≤ m and ρ ≤ n then n − m ≤ m + n − 2ρ and
m − n ≤ m + n − 2ρ; thus, |m − n| ≤ m + n − 2ρ, and so ρ ≤ (1/2)(m + n − |m − n|). Conversely,
if ρ ≤ (1/2)(m + n − |m − n|) then n − m ≤ m + n − 2ρ and m − n ≤ m + n − 2ρ; hence, ρ ≤ m
and ρ ≤ n. It follows that {m, n} is bounded below and that
(10.13) m ∧ n := (1/2)(m + n − |m − n|)
is the greatest lower bound. Similarly, the set {m, n} is bounded above and
(10.14) m ∨ n := −((−m) ∧ (−n)) = (1/2)(m + n + |m − n|)
is its least upper bound, for n ≤ ρ and m ≤ ρ iff −ρ ≤ −n and −ρ ≤ −m.

For the second statement, let B′ be the set of finite suprema of elements in B, that is,
terms of the form m = b1 ∨ · · · ∨ bn for some {b1, . . . , bn} ⊂ B. For any b ∈ B, v − b is an upper
bound of B − b iff v is an upper bound of B which, in turn, holds iff v is an upper bound
of B′. As a consequence, it suffices to assume that B ⊂ M+ is bounded above and closed
under taking finite suprema. In such case, define
u(ψ) := supm∈B m(ψ),   ψ ∈ E+.
Clearly u is finite and positive homogeneous. We claim that u is additive on E+. Indeed,
given ε > 0 and ψ1 and ψ2 in E+, there are m1, m2 and m3 in B such that
u(ψj) − ε/3 < mj(ψj) (j = 1, 2),   u(ψ1 + ψ2) − ε/3 < m3(ψ1 + ψ2).
Setting m = m1 ∨ m2 ∨ m3 we obtain that
|u(ψ1 + ψ2) − (u(ψ1) + u(ψ2))| ≤ |u(ψ1 + ψ2) − m(ψ1 + ψ2)|
+ |m(ψ1) − u(ψ1)| + |u(ψ2) − m(ψ2)|
≤ ε/3 + ε/3 + ε/3.
This shows the additivity of u on E+. Finally, u is extended linearly to all of E in the obvious
way: u(ψ) := u(ψ+) − u(ψ−).
Remarks 10.4.2. Let m, n ∈ M.
(a) If m+ := m ∨ 0 and m− = (−m) ∨ 0 = −(m ∧ 0), then m± ≥ 0 and, by (10.13)
and (10.14),
m = m+ − m− , |m| = m+ + m− , m+ ∧ m− = 0.
(b) If m and n have finite total variation, then so do m ∨ n and m ∧ n; furthermore,
‖m ∧ n‖TV ≤ ‖m‖TV + ‖n‖TV
‖m ∨ n‖TV ≤ ‖m‖TV + ‖n‖TV

Theorem 10.4.1 and (b) show that the space M∗FV(E) of elementary integrals on E of finite
total variation is a Banach space and a vector lattice.
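
On a finite set an elementary integral of finite variation can be identified with a vector of point masses, and its variation with the vector of absolute values. Under that assumption (the vectors `m` and `n` below are hypothetical), formulas (10.13) and (10.14) reduce to coordinatewise minimum and maximum, which the following sketch verifies.

```python
def meet(m, n):
    """(10.13): m ∧ n = (m + n - |m - n|) / 2, computed coordinatewise."""
    return [0.5 * (a + b - abs(a - b)) for a, b in zip(m, n)]

def join(m, n):
    """(10.14): m ∨ n = (m + n + |m - n|) / 2, computed coordinatewise."""
    return [0.5 * (a + b + abs(a - b)) for a, b in zip(m, n)]

def tv(m):
    """Total variation norm: the sum of the absolute point masses."""
    return sum(abs(w) for w in m)

# Hypothetical point masses of two signed measures on a three-point set.
m = [2.0, -3.0, 0.5]
n = [1.0, -1.0, 4.0]
zero = [0.0] * len(m)

m_plus = join(m, zero)                      # m+ = m ∨ 0
m_minus = join([-w for w in m], zero)       # m− = (−m) ∨ 0
```

With these definitions one recovers m = m+ − m−, |m| = m+ + m− and m+ ∧ m− = 0 from Remarks 10.4.2 (a), as well as the norm bounds in (b).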
Theorem 10.4.3. For any m1 , m2 , n ∈ M+ , (m1 + m2 ) ∧ n ≤ m1 ∧ n + m2 ∧ n.

Proof. This follows from

m1 ∧ n + m2 ∧ n = ((m1 ∧ n) + m2) ∧ ((m1 ∧ n) + n)
= (m1 + m2) ∧ (n + m2) ∧ (m1 + n) ∧ (2n) ≥ (m1 + m2) ∧ n.
Here we have made use of Exercise 10.8.3(a),(c) & (f).
Example 10.4.4. Suppose B ⊂ M has least upper bound ∨B and let n ∈ M. Then
B ∧ n := {m ∧ n : m ∈ B} has least upper bound
∨(B ∧ n) = (∨B) ∧ n.
Indeed,
(∨B) ∧ n + (∨B) ∨ n = ∨B + n
= ∨(B + n) = ∨{m + n : m ∈ B}
= ∨{(m ∧ n) + (m ∨ n) : m ∈ B}
≤ ∨(B ∧ n) + (∨B) ∨ n,
whence we obtain upon subtraction that (∨B) ∧ n ≤ ∨(B ∧ n). The reverse inequality
∨(B ∧ n) ≤ (∨B) ∧ n is clear.

Suppose M is a vector lattice. Two elements m, n of M are said to be orthogonal or
disjoint if |m| ∧ |n| = 0. This is denoted by m ⊥ n.
(a) Given a collection G ⊂ M, G ⊥ = {n ∈ M : m ∈ G implies n ⊥ m} is called the
disjoint complement of G.
(b) A collection G ⊂ M is said to be solid if for any n ∈ G and m ∈ M, |m| ≤ |n|
implies that m ∈ G.
(c) A solid vector subspace V of M is called an ideal .
(d) An ideal V in an order complete vector lattice M is called a band if any G ⊂ V
that admits an upper bound in M satisfies ∨G ∈ V.
Example 10.4.5. The space R^R with pointwise sum, scalar multiplication and pointwise
order is an order complete vector lattice. The family Bb(R) of real bounded functions on
R is an ideal in R^R. Moreover, Bb(R) is order complete on itself, i.e., if G ⊂ Bb(R) admits
an upper bound in Bb(R), then ∨G ∈ Bb(R). However, Bb(R) fails to be a band. Indeed,
{fn(x) = |x| ∧ n : n ∈ N} ⊂ Bb(R) has least upper bound in R^R, namely f(x) = |x|; however,
f ∉ Bb(R).

Theorem 10.4.6. (Riesz) Let M be an order complete vector lattice. Then, for any G ⊂ M,
G⊥ is a band. Moreover, (G⊥)⊥ is the band (G) generated by G and every m ∈ M has a unique
decomposition m = m|| + m⊥ with m|| ∈ (G) and m⊥ ∈ G⊥.

Proof. Let m ∈ G⊥ and r ∈ R. Let k ∈ Z+ be such that |r| ≤ k. For any n ∈ G we have
(|rm|) ∧ |n| ≤ (k|m|) ∧ |n| ≤ (k|m|) ∧ (k|n|) = k(|m| ∧ |n|) = 0. Thus G⊥ is closed under
scalar multiplication. Next let m1, m2 ∈ G⊥. From Exercise 10.8.4 we obtain that
|m1 + m2| ∧ |n| ≤ (|m1| + |m2|) ∧ |n| ≤ |m1| ∧ |n| + |m2| ∧ |n| = 0
for any n ∈ G. This shows that G⊥ is a linear subspace. Suppose n ∈ G⊥ and |m| ≤ |n|.
Then for any p ∈ G, |p| ∧ |m| ≤ |p| ∧ |n| = 0. Hence m ∈ G⊥, and so G⊥ is an ideal. Suppose
m1, m2 ∈ G⊥. Then for any n ∈ G,
|m1 ∨ m2| ∧ |n| ≤ (|m1| + |m2|) ∧ |n| ≤ |m1| ∧ |n| + |m2| ∧ |n| = 0.
Therefore, G⊥ is an ideal closed under finite suprema. Finally, let B ⊂ G⊥ ∩ M+ be nonempty
and suppose ∨B exists in M. Then, for any n ∈ G,
(10.15) (∨B) ∧ |n| = ∨{m ∧ |n| : m ∈ B} = 0
and so, ∨B ∈ G⊥. This shows that G⊥ is a band.

The first part of the proof shows that (G⊥)⊥ is a band containing G. Hence (G) ⊂ (G⊥)⊥
and (G) ∩ G⊥ = {0}. For any m ∈ M+ let
(10.16) m|| = ∨{n ∈ (G) : n ≤ m}

and m⊥ = m − m||. As (G) is a band, we have that m|| ∈ (G)+ and m⊥ ≥ 0. We claim that
m⊥ ∈ G⊥. For n ∈ G, m⊥ ∧ |n| = (m − m||) ∧ |n| ∈ (G)+, and so m|| + (m⊥ ∧ |n|) ∈ (G)+. As
m|| + (m⊥ ∧ |n|) ≤ m, (10.16) implies that m|| + (m⊥ ∧ |n|) ≤ m||. Therefore m⊥ ∧ |n| = 0.

For arbitrary m ∈ M decompose m+ and m− into their components in (G) and G⊥. As
(G) ∩ G⊥ = {0}, the decomposition m = m|| + m⊥ is unique and (G) = (G⊥)⊥.
Theorem 10.4.7. The collection M∗ (E) of σ–continuous elementary integrals of finite
variation is a band in M(E). The collection M• (E) of order–continuous elementary integrals
of finite variation is a band in M∗ (E).

Proof. Theorem 10.1.9 implies that M∗(E) is an ideal, for m is σ–continuous iff |m| is
σ–continuous. Suppose B ⊂ M∗(E) has least upper bound n = ∨B in M(E). Without loss
of generality, we may assume that B is increasingly directed and contained in M∗+(E). If
(φn : n ∈ N) ⊂ E+ and φn ր φ ∈ E then, as in the proof of Theorem 10.4.1,
(∨B)(φ) = supm∈B m(φ) = supm∈B, n∈N m(φn) = supn supm∈B m(φn) = supn (∨B)(φn).
This shows that ∨B ∈ M∗(E). Therefore, M∗(E) is a band in M(E). A similar proof shows
that M•(E) is a band in M∗(E).

Example 10.4.8. Theorem 10.4.6 is of special interest when G consists of a single n ∈ M.
In this case, for any m ∈ M+ we have
(10.17) m|| = ∨k∈N (m ∧ (k|n|)).
Notice that m′ := ∨k∈N (m ∧ (k|n|)) ∈ (n) and (m − m′) ∧ |n| ∈ (G)+. Thus
m′ ≤ m′ + (m − m′) ∧ |n| = m ∧ (|n| + m′) = m ∧ (|n| + ∨k (m ∧ (k|n|)))
= ∨k (m ∧ (|n| + m) ∧ ((k + 1)|n|)) ≤ ∨k (m ∧ (k|n|)) = m′.

Therefore m⊥ = m − m′ ∈ {n}⊥ and m′ = m||.

Corollary 10.4.9. If m, n ∈ M∗ (E) and m = m|| + m⊥ , where m|| ∈ (n) and m⊥ ⊥ n,
then m|| , m⊥ ∈ M∗ (E).

Proof. Suppose m ≥ 0. Then, as M∗(E) is a band of M(E), we have that
B := {m ∧ (k|n|) : k ∈ N} ⊂ M∗(E) and ∨B ∈ M∗(E). Then (10.17) implies that m|| ∈ M∗(E),
and so m⊥ = m − m|| ∈ M∗(E). For general m ∈ M∗(E), the conclusion follows from the result
for M+(E) applied to m+ and m−, and the uniqueness of the Riesz decomposition.
Example 10.4.10. Any m ∈ M(E) has a unique decomposition as m = m∗ + mc where
m∗ ∈ M∗(E) and m∗ ⊥ mc. If mc ≠ 0 then m fails to be σ–continuous, in which case m is
called a charge. (M∗(E))⊥ contains purely finitely additive elementary integrals, which are
called pure charges.
Definition 10.4.11. Let M be an order complete vector lattice. We say that m ∈ M is
absolutely continuous with respect to n ∈ M if m ∈ (n). This is denoted by m ≪ n.

Let Eσ = {h ∈ ERΣ : ∃φ ∈ E, |h| ≤ φ}. It is clear that E ⊂ Eσ ⊂ L1(|m|) for any signed
elementary integral m on E. Eσ contains all sets of the form {φ > r} where φ ∈ E and r > 0,
for 1{φ>r} ≤ φ+/r. As E is a ring lattice closed under chopping, so is Eσ by Lemma 5.6.5.
Theorem 10.4.12. Suppose (E, m) is a signed elementary integral over a ring lattice E
closed under chopping. Then
(10.18) |m|(h) = sup{|m(ψ)| : ψ ∈ Eσ, |ψ| ≤ h}
for any h ∈ Eσ+.
Remark 10.4.13. In (10.18) it is understood that m(ψ) stands for the value at ψ of the
extension of m to all L1 (|m|).

Proof. For each h ∈ Eσ+ let ν(h) denote the value of the right hand side of (10.18). As Eσ is
a ring lattice closed under chopping, (Eσ, m) is a signed elementary integral whose variation
is given by ν. If ψ ∈ Eσ and |ψ| ≤ h ∈ Eσ+, then |m(ψ)| ≤ |m|(|ψ|) ≤ |m|(h), and so
ν(h) ≤ |m|(h). Hence, ν is finite. On the other hand, for all ψ ∈ E+
|m|(ψ) = sup{|m(φ)| : φ ∈ E, |φ| ≤ ψ} ≤ sup{|m(φ)| : φ ∈ Eσ, |φ| ≤ ψ} = ν(ψ).
Consequently ν and |m| coincide on E+, and so the Daniell means ‖ ‖ν and ‖ ‖|m| associated
to ν and to |m| respectively coincide on E+. Therefore, by Lemma 7.6.1, ν(h) = ‖h‖ν =
‖h‖|m| = |m|(h) for all h ∈ ERΣ.
Theorem 10.4.14. (Hahn) Let m, n ∈ M∗ (E).
(i) m ⊥ n iff any set B ∈ E σ admits a partition {Bn , Bm } ⊂ E σ such that |m|(Bn ) =
0 = |n|(Bm ).
(ii) m ≪ n iff for any set N ∈ E σ , |n|(N ) = 0 implies |m|(N ) = 0.

Proof. (i) If every set B ∈ Eσ admits the decomposition stated above then, from (10.4),
(|m| ∧ |n|)(B) = 0 and so |m| ∧ |n| ≡ 0. Conversely, suppose m ⊥ n and let B be a set in Eσ.
Without loss of generality suppose m and n are positive. By (10.4), for each k ∈ N there
exists a pair of functions ψk, φk ∈ Eσ+ such that
(10.19) 1B = ψk + φk
and
m(ψk) + n(φk) = ‖ψk‖∗m + ‖φk‖∗n ≤ 2^(−k).
Then 0 ≤ ψk ≤ 1 converges to 0 in ‖ ‖∗m–mean and ‖ ‖∗m–a.s., and the same conclusion holds
for φk with ‖ ‖∗n in place of ‖ ‖∗m. By (10.19) the set C where (ψk) converges coincides with
the set where (φk) converges. As Ω \ C ∈ ERΣ, 1C ψk and 1C φk belong to Eσ. Let
Bn = {lim infk 1C ψk > 0},   Bm = B \ Bn.
Then Bn, Bm ∈ Eσ+ and Bm ⊂ {lim infk 1C φk > 0}. Since m(Bn) = 0 = n(Bm), Bm and
Bn provide the desired decomposition.
(ii) If m ≪ n then |m| = |m||| = ∨k (|m| ∧ (k|n|)). As {|m| ∧ (k|n|) : k ∈ N} is increasingly
directed, |m|(ψ) = supk (|m| ∧ (k|n|))(ψ) for all ψ ∈ Eσ; therefore, if |n|(B) = 0 at some set
B ∈ Eσ, then |m|(B) = 0. Conversely, suppose |m|(B) = 0 whenever 1B ∈ Eσ with |n|(B) =
0. For any D ∈ Eσ, let D1, D2 be a disjoint partition of D so that |m|⊥(D2) = 0 = |n|(D1).
Then |m|⊥(D1) ≤ |m|(D1) = 0, and so |m|⊥(D) = 0. Therefore, |m| = |m||| ∈ (n).
Remark 10.4.15. If µ, ν are positive measures on a measurable space (Ω, F), then ν ≪ µ
iff for any A ∈ F, ν(A) = 0 whenever µ(A) = 0. Indeed, let E be the space of simple
functions φ = a1 1A1 + · · · + an 1An such that n ∈ N, aj ∈ R, Aj ∈ F, and µ(Aj) < ∞. As
elementary integrals on E, µ and ν are in M∗(E). Any set A ∈ F with µ(A) < ∞ is in Eσ.
The conclusion follows by Hahn’s theorem (ii).
Example 10.4.16. (Lebesgue decomposition) Suppose µ and ν are σ–finite measures on
(Ω, F). Then there are unique measures νa and νs with νa ≪ µ and νs ⊥ µ such that
ν = νa + νs. It is enough to assume that ν(Ω) < ∞. Let Nµ be the collection of all µ–negligible
sets, that is, Nµ = {B ∈ F : µ(B) = 0}. Choose an increasing sequence {Bj : j ∈ N} ⊂ Nµ such
that
limj ν(Bj) = sup{ν(B) : B ∈ Nµ}.
Let N = ∪j Bj, and notice that µ(N) = limj µ(Bj) = 0, and ν(N) = limj ν(Bj) =
sup{ν(B) : B ∈ Nµ}. Then ν = νa + νs where νa(A) := ν(A \ N) and νs(A) := ν(A ∩ N).
Then νs ⊥ µ and (N, N^c) is the Hahn partition of Ω as in Hahn’s theorem (i). We claim
that νa ≪ µ. To prove this it suffices to show that for any B ∈ F with B ⊂ N^c and
µ(B) = 0, ν(B) = 0 holds. If this were not the case then N ∪ B ∈ Nµ, and
ν(N ∪ B) = ν(N) + ν(B) > ν(N),
which is a contradiction. Uniqueness follows from Riesz’s decomposition. A more direct
proof follows from noticing that for any σ–finite measure ν, ν ≪ µ and ν ⊥ µ iff ν = 0.
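
For measures on a finite set the construction in Example 10.4.16 can be carried out explicitly: N is the set of points of zero µ–mass. The sketch below (hypothetical point masses `mu` and `nu`, hypothetical helper names) computes νa and νs and compares νa with the band projection ∨k ν ∧ (kµ) from (10.17).

```python
# Hypothetical point masses of two positive measures on {a, b, c, d}.
mu = {"a": 1.0, "b": 0.0, "c": 2.0, "d": 0.0}
nu = {"a": 0.5, "b": 3.0, "c": 0.0, "d": 1.0}

# N is the largest mu-negligible set: the points carrying no mu-mass.
N = {x for x in mu if mu[x] == 0.0}

# nu_s(A) = nu(A ∩ N) and nu_a(A) = nu(A \ N), expressed through point masses.
nu_s = {x: (nu[x] if x in N else 0.0) for x in nu}
nu_a = {x: (nu[x] if x not in N else 0.0) for x in nu}

def meet_with_multiple(nu, mu, k):
    """Point masses of nu ∧ (k mu); on a finite set the meet is a pointwise minimum."""
    return {x: min(nu[x], k * mu[x]) for x in nu}

# For k large enough the increasing family nu ∧ (k mu) has stabilized at nu_a.
band_part = meet_with_multiple(nu, mu, 10)
```

Here νs lives on N, which has µ–measure zero, while νa vanishes at every point where µ does.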
Example 10.4.17. (Hahn–Jordan decomposition) For any m ∈ M∗ we know that m+ ⊥
m−. Hence, any set B ∈ Eσ admits a partition {B−, B+} ⊂ Eσ such that m+(B−) = 0 =
m−(B+). It follows that for any ‖ ‖∗|m|–integrable sets E ⊂ B− and F ⊂ B+, m(E) ≤ 0
and m(F) ≥ 0. If m is σ–finite, then there exists a partition {N, P} ⊂ EΣ of Ω such that
m(A) ≤ 0 and m(B) ≥ 0 for all ‖ ‖∗m–integrable sets A ⊂ N and B ⊂ P.

10.5. Radon–Nikodym Theorem

Given a Stone lattice E ⊂ Bb(Ω), suppose n ∈ M∗+(E) is σ–finite. For any g ∈ L1loc(n) the
map ng : φ ↦ n(gφ) defines a σ–continuous elementary integral on E. From

|n(gφ)| ≤ n(|g||φ|),   φ ∈ E
it follows that ng is of finite variation |ng| and |ng| ≤ n|g|. The map ‖ ‖gn : f ↦ ‖f g‖∗n on
R^Ω defines a mean for E and, by Lemma 7.6.1,
(10.20) ‖f‖∗|ng| ≤ ‖f‖∗n|g| = ‖f g‖∗n = ‖f‖gn,   f ∈ ERΣ.

As we will show later, it turns out that |ng| = n|g|, ‖ ‖n|g| = ‖ ‖gn and ng(f) = n(f g) for all
f ∈ L1(|ng|).
Lemma 10.5.1. ‖ ‖∗|ng| ≤ ‖ ‖∗n|g| = ‖ ‖gn

Proof. The inequality on the left follows directly from the definition of a Daniell mean.
To show the identity on the right we first show that it is enough to consider g ∈
ERΣ ∩ L1loc(n). By Theorem 6.4.12, there exists a nondecreasing sequence (φk : k ∈ N) ⊂ E+
with supk φk ≡ 1. By Theorem 7.6.7, for each k ∈ N there is hk ∈ ERΣ such that
|g|1{φk>1/k} ≤ hk,   |g|1{φk>1/k} = hk   ‖ ‖∗n|g|–a.s.

Then h = lim infk hk ∈ ERΣ and |g| = h ‖ ‖∗n|g|–a.s. As

{h = ∞} = ∪k {hφk = ∞} = ∩k {h > k},

1{h=∞} ∈ ERΣ and ‖1{h=∞}‖∗n = 0; hence, γ = h1{h≠∞} ∈ ERΣ and ‖f g‖∗n = ‖f γ‖∗n for all
f ∈ R^Ω. This proves our claim.

For the rest of the proof we will assume that g ∈ ERΣ ∩ L1loc(n). As Daniell means are
maximal on E, ‖ ‖gn ≤ ‖ ‖∗n|g|. Since 1{|g|=0} ∈ ERΣ, ‖1{|g|=0}‖∗n|g| = ‖1{|g|=0} g‖∗n = 0. By
Lemma 7.6.5, for any f ∈ R^Ω with ‖f g‖∗n < ∞, there exists h ∈ ERΣ such that |f g| ≤ h and
‖f g‖∗n = ‖h‖∗n. From
1{|g|>0} |f| ≤ (1{|g|>0}/|g|) h ∈ ERΣ
we obtain that
‖f‖∗n|g| = ‖1{|g|>0} |f|‖∗n|g| ≤ ‖h 1{|g|>0}‖∗n ≤ ‖h‖∗n = ‖f g‖∗n.

Therefore, ‖ ‖∗n|g| = ‖ ‖gn.

Theorem 10.5.2. Let n be a positive σ–continuous σ–finite elementary integral on a Stone
lattice E ⊂ Bb(Ω). If g ∈ L1loc(‖ ‖∗n) then |ng| = n|g|.

Proof. By Theorem 7.4.4 and Lemma 10.5.1, f ∈ L1(n|g|) iff f g ∈ L1(n). Thus, if (ψm :
m ∈ N) ⊂ E converges to f in L1(n|g|) then (ψm g : m ∈ N) ⊂ L1(n) converges to f g in
L1(n). Consequently,
ng(f) = limm ng(ψm) = limm n(ψm g) = n(f g),   f ∈ L1(n|g|).

Since φ1{g>0} ∈ L1(n|g|) whenever φ ∈ E+,

(10.21) ng(φ1{g>0}) = n(φ g 1{g>0}) ≤ |ng|(φ1{g>0})
(10.22) n−g(φ1{g<0}) = −n(φ g 1{g<0}) ≤ |ng|(φ1{g<0}).
Adding (10.21) and (10.22) shows that
n(|g|φ) ≤ |ng|(φ).
Since |ng| ≤ n|g|, it follows that |ng| = n|g|.
Theorem 10.5.3. (Radon–Nikodym theorem) Let E ⊂ Bb(Ω) be a Stone lattice. Suppose
n is a positive σ–continuous elementary integral on E (n ∈ M∗+(E)). Assume n is σ–finite.
For any m ∈ M∗(E), the following statements are equivalent:
(i) m ≪ n.
(ii) ‖ ‖∗m ≪ ‖ ‖∗n.
(iii) There exists g ∈ L1loc(‖ ‖∗n) such that m = ng.

The function g such that m = ng is called the Radon–Nikodym derivative of m with respect
to n, and is denoted by g = dm/dn. If g ∈ L1(n), then ng is of finite total variation and
‖ng‖TV = ‖g‖∗n.

Proof. Let (φk : k ∈ N) ⊂ E+ be an increasing sequence such that sup_k φk = 1. Then Ak = {φk > 1/k} ∈ E^σ and Ω = ⋃_k Ak. If ‖1_B‖*_n = 0, there is a subset N ∈ E^R_Σ such that 1_B ≤ 1_N and 1_B = 1_N ‖ ‖*_n–a.s. Hence, N ∩ Ak ∈ E^σ.

The implication (i) implies (ii) follows from ‖1_B‖*_m ≤ ‖1_N‖*_m ≤ Σ_k ‖1_{N∩Ak}‖*_m and Hahn’s theorem (ii). The implication (ii) implies (i) is also a consequence of Hahn’s theorem (ii).

Suppose that (i) holds. As 0 ≤ m+, m− ≤ |m|, it is enough to assume that m ≥ 0. For each k ∈ Z+ let m_k = m ∧ (kn). Then m_k ≤ kn for all k and m = ⋁_k m_k. Since
\[ |m_k(\phi)| \le m_k(|\phi|) \le k\,\|\phi\|^*_n, \qquad \phi \in E, \tag{10.23} \]
m_k admits a unique extension to L1(‖ ‖*_n). Since n is assumed to be σ–finite, the Riesz representation theorem (Theorem 8.4.3) implies that there exists a ‖ ‖*_n–a.s. unique g_k ∈ L∞(‖ ‖*_n) such that
\[ m_k(f) = n(g_k f), \qquad f \in L_1(\|\ \|^*_n). \]
As 0 ≤ m_{k−1} ≤ m_k, the functions g_k can be chosen so that 0 ≤ g_{k−1} ≤ g_k. If g = sup_k g_k then, by monotone convergence,
\[ n(g\phi) = \sup_k n(g_k\phi) = \sup_k m_k(\phi) = m(\phi) < \infty, \qquad \phi \in E_+. \]

This shows that g ∈ L^loc_1(‖ ‖*_n) and m = n_g.

If (iii) holds then, by Theorem 10.5.2 and Lemma 10.5.1, ‖f‖*_{|m|} = ‖fg‖*_n for all f ∈ R^Ω. Hence (ii) holds. If g ∈ L1(n), then 1 ∈ L1(n_{|g|}) and ‖g‖*_n = ‖1‖*_{n|g|} = ‖n_g‖TV.
Corollary 10.5.4. If n ∈ M∗(E) is σ–finite, then m ∈ M∗(E) admits a unique decomposition of the form
\[ m = m_{||} + m_\perp = n_g + m_\perp \tag{10.24} \]
where m|| ≪ n, m⊥ ⊥ n and g = dm||/dn ∈ L^loc_1(n). We refer to (10.24) as the Radon–Nikodym decomposition of m with respect to n.

Proof. Riesz’s theorem provides a unique decomposition m = m|| + m⊥ where m|| ≪ n and m⊥ ⊥ n. By Corollary 10.4.9, m|| ∈ M∗(E). The conclusion follows from the Radon–Nikodym theorem (iii).

10.6. Some applications of the Radon–Nikodym theorem

Here we present a few applications of the Radon–Nikodym theorem.

10.6.1. Coupling. A coupling between probability measures µ and ν on (Ω, F ) is a

measure Γ on (Ω × Ω, F ⊗ F) with µ(dx) = Γ(dx × Ω) and ν(dy) = Γ(Ω × dy). Let C(µ, ν)
be the collection of all couplings between µ and ν. Assume ∆ = {(x, x) : x ∈ Ω} ∈ F ⊗ F
(this is possible if F is countably generated, for instance). For any Γ ∈ C(µ, ν) and A ∈ F
we have that
\[ \mu(A) - \nu(A) = \int_{\Omega\times\Omega} \big(1_A(x) - 1_A(y)\big)\,\Gamma(dx\times dy) \le \int_{\Delta^c} \big(1_{A\times\Omega} + 1_{\Omega\times A}\big)\,d\Gamma, \]
whence we conclude that |µ − ν|(A) ≤ ∫_{∆^c} (1_{A×Ω} + 1_{Ω×A}) dΓ. In particular,
\[ \|\mu - \nu\|_{TV} \le 2 \inf_{\Gamma\in C(\mu,\nu)} \Gamma(\Delta^c) \tag{10.25} \]

Let g : x ↦ (x, x), ρ := (µ ∧ ν) ∘ g^{−1}, and set a := ρ(∆) = (µ ∧ ν)(Ω). Define
\[ \tilde\Gamma := \frac{1(a \neq 1)}{1-a}\,(\mu-\nu)_+ \otimes (\nu-\mu)_+ + \rho \]

Since µ(Ω) = 1 = ν(Ω), it follows that (µ ∧ ν)(Ω) = 1 − (µ − ν)−(Ω) and so, Γ̃ ∈ C(µ, ν).
Let λ := µ + ν, f = dµ/dλ, and f′ = dν/dλ. Then,
\[ \int 1(x = y)\,\big(f(x) - f'(x)\big)_+ \big(f'(y) - f(y)\big)_+\,\lambda(dx)\,\lambda(dy) = 0 \]
and so, Γ̃(∆) = a. As a consequence,
\[ \|\mu-\nu\|_{TV} = \int |f - f'|\,d\lambda = 2\Big(1 - \int (f\wedge f')\,d\lambda\Big) = 2(1-a) = 2\,\tilde\Gamma(\Delta^c) \]

This, together with (10.25), shows that
\[ \|\mu-\nu\|_{TV} = 2 \min_{\Gamma\in C(\mu,\nu)} \Gamma(\Delta^c). \tag{10.26} \]

In applications of coupling in probability theory it is very often the case that Ω is a complete separable metric space and F is its Borel σ–algebra.
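
The construction of the optimal coupling can be checked by hand on a finite space. The following is a minimal sketch, assuming that µ and ν are given as Python dictionaries mapping points to exact masses; the function names `maximal_coupling` and `total_variation` are illustrative and are not part of the text.

```python
from fractions import Fraction as Fr

def total_variation(mu, nu):
    """|mu - nu|(Omega) for finite distributions given as dicts point -> mass."""
    points = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in points)

def maximal_coupling(mu, nu):
    """Return (gamma, a): the coupling rho + (mu-nu)_+ x (nu-mu)_+ / (1-a) and a = (mu ^ nu)(Omega)."""
    points = sorted(set(mu) | set(nu))
    meet = {x: min(mu.get(x, 0), nu.get(x, 0)) for x in points}  # density of mu ^ nu
    a = sum(meet.values())
    gamma = {(x, x): meet[x] for x in points if meet[x]}          # rho lives on the diagonal
    if a != 1:
        plus = {x: mu.get(x, 0) - meet[x] for x in points}        # (mu - nu)_+
        minus = {y: nu.get(y, 0) - meet[y] for y in points}       # (nu - mu)_+
        for x in points:
            for y in points:
                w = plus[x] * minus[y] / (1 - a)
                if w:
                    gamma[(x, y)] = gamma.get((x, y), 0) + w
    return gamma, a

mu = {0: Fr(1, 2), 1: Fr(1, 2)}
nu = {0: Fr(1, 4), 1: Fr(1, 4), 2: Fr(1, 2)}
gamma, a = maximal_coupling(mu, nu)
off_diagonal = sum(w for (x, y), w in gamma.items() if x != y)    # gamma(Delta^c)
print(total_variation(mu, nu), 2 * off_diagonal)
```

In this example the two printed numbers agree, in line with (10.26); exact rational arithmetic is used so that the marginals can be compared without rounding.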

10.6.2. Change of variables in Rn. Here we are concerned with smooth changes of variables in integrals with respect to Lebesgue measure in Rn.
Theorem 10.6.1. Suppose Ω is an open set in Rn and let G : Ω → G(Ω) be a diffeomorphism. If µ is a measure on B(Ω) and µ ≪ λ, where λ is Lebesgue’s measure on B(Ω), then the induced measure µ ∘ G^{−1} ≪ λ and
\[ \frac{d(\mu\circ G^{-1})}{d\lambda}(u) = f(G^{-1}(u))\,\big|\det (G^{-1})'(u)\big| = \frac{f(G^{-1}(u))}{|\det G'(G^{-1}(u))|}, \]
where f = dµ/dλ.

Example 10.6.2. Suppose (R², B(R²), µ) is a measure space such that µ is absolutely continuous with respect to Lebesgue measure λ2 on B(R²), and let f = dµ/dλ2. Consider the transformation T : (x, y) ↦ (x/y, y) on Ω = {(x, y) | y ≠ 0}. It is obvious that T is a diffeomorphism of Ω onto itself. We have that T^{−1}(u, v) = (uv, v), and det T′(x, y) = 1/y. Consequently, the measure µ ∘ T^{−1} ≪ λ2 and
\[ \frac{d(\mu\circ T^{-1})}{d\lambda_2}(u, v) = |v|\,f(uv, v), \qquad (u, v) \in \Omega \]
A similar conclusion is obtained if Ω = R × (0, ∞) or Ω = R × (−∞, 0).
It is now easy to show that the measure ν on (R, B(R)) induced by the map T1 : (x, y) ↦ x/y, i.e. ν(du) := (µ ∘ T^{−1})(du × (R \ {0})), is absolutely continuous with respect to Lebesgue measure λ1 on the real line, and that
\[ \frac{d\nu}{d\lambda_1}(u) = \int_{\mathbb R} |v|\,f(uv, v)\,dv. \]

Example 10.6.3. Let µ be the Gaussian distribution N(0, I2), i.e.,
\[ \mu(dx, dy) = \frac{1}{2\pi}\,e^{-\frac12 (x^2+y^2)}\,dx\,dy. \]
Let T1 be as in Example 10.6.2. Then, the induced measure ν = µ ∘ T1^{−1} on (R, B(R)) satisfies
\[ \frac{d\nu}{du}(u) = \frac{1}{2\pi}\int_{\mathbb R} |v|\,e^{-\frac12 (v^2u^2+v^2)}\,dv = \frac{1}{\pi}\int_0^\infty v\,e^{-\frac12 v^2(1+u^2)}\,dv = \frac{1}{\pi(1+u^2)}. \]
This is the Cauchy distribution centered at 0.
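
The closed form of the last integral can be sanity-checked numerically. The sketch below assumes a plain midpoint rule on a truncated interval; the truncation point and step count are arbitrary choices, and the function names are illustrative only.

```python
import math

def ratio_density(u, upper=12.0, steps=20000):
    """Midpoint rule for (1/pi) * integral_0^upper v exp(-v^2 (1+u^2)/2) dv."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += v * math.exp(-0.5 * v * v * (1.0 + u * u))
    return total * h / math.pi

def cauchy_density(u):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + u * u))

for u in (0.0, 0.5, 2.0):
    print(u, ratio_density(u), cauchy_density(u))
```

The two columns should agree up to the quadrature error, which illustrates that the ratio of two independent standard normal variables has the Cauchy law.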
Example 10.6.4. Consider the product µ of the Gaussian measure N (0, 1) on the real line
and the χ2r measure on the half space (0, ∞), that is,
\[ \mu(dx, dy) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}\,\frac{1}{\Gamma(\frac r2)\,2^{\frac r2}}\,y^{\frac r2 - 1} e^{-\frac y2}\,1_{(0,\infty)}(y)\,dx\,dy \]
The measure τr on (R, B(R)) induced by the map T : R × (0, ∞) → R given by T(x, y) = √r x/√y
is called Student’s–t distribution of degree r centered at 0. We see now that τr ≪ λ1 .
Let g : R → R be any bounded measurable function. Then
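\[ \int_{\mathbb R} g(t)\,\tau_r(dt) = \frac{1}{\sqrt{2\pi}\,\Gamma(\frac r2)\,2^{\frac r2}} \int_{\mathbb R}\int_0^\infty g\Big(\frac{\sqrt r\,x}{\sqrt y}\Big)\,e^{-\frac12 x^2}\,y^{\frac r2 - 1} e^{-\frac y2}\,dy\,dx \]
The function G(x, y) = (√r x/√y, y)^⊤ is a diffeomorphism from R × (0, ∞) to itself, and J_G(x, y) = √(r/y). Consequently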
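\[
\begin{aligned}
\int_{\mathbb R}\int_0^\infty g\Big(\frac{\sqrt r\,x}{\sqrt y}\Big) e^{-\frac12 x^2} y^{\frac r2 - 1} e^{-\frac y2}\,dy\,dx
&= \frac{1}{\sqrt r}\int_{\mathbb R}\int_0^\infty g\Big(\frac{\sqrt r\,x}{\sqrt y}\Big) e^{-\frac12 x^2} y^{\frac{r-1}{2}} e^{-\frac y2}\,|J_G(x, y)|\,dy\,dx \\
&= \frac{1}{\sqrt r}\int_{\mathbb R}\int_0^\infty g(u)\,e^{-\frac v2\left(1+\frac{u^2}{r}\right)}\,v^{\frac{r+1}{2}-1}\,dv\,du \\
&= \frac{2^{\frac{r+1}{2}}\,\Gamma\big(\frac{r+1}{2}\big)}{\sqrt r}\int_{\mathbb R} g(u)\Big(1+\frac{u^2}{r}\Big)^{-\frac{r+1}{2}}\,du
\end{aligned}
\]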
We conclude that

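\[ \frac{d\tau_r}{dt}(t) = \frac{\Gamma\big(\frac{r+1}{2}\big)}{\sqrt{\pi r}\,\Gamma\big(\frac r2\big)}\Big(1+\frac{t^2}{r}\Big)^{-\frac{r+1}{2}}. \]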

Notice that Cauchy’s distribution is Student’s distribution with r = 1.
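
As a quick check of the density just obtained, the following sketch evaluates it with the standard library gamma function, compares the case r = 1 with the Cauchy density, and integrates it numerically. The function names, the integration window and the step count are assumptions made for illustration.

```python
import math

def student_density(t, r):
    """Density of Student's t distribution with r degrees of freedom."""
    c = math.gamma((r + 1) / 2) / (math.sqrt(math.pi * r) * math.gamma(r / 2))
    return c * (1 + t * t / r) ** (-(r + 1) / 2)

def integrate(f, lo, hi, steps):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

# r = 1 reproduces the Cauchy density 1 / (pi (1 + t^2)).
print(student_density(0.7, 1), 1 / (math.pi * (1 + 0.7 ** 2)))
# The total mass for r = 5, truncated to [-60, 60], should be close to 1.
print(integrate(lambda t: student_density(t, 5), -60.0, 60.0, 120000))
```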


10.6.3. Exponential distributions. A family of probability measures {Pθ : θ ∈ ∆}, ∆ ⊂ R^k, on (X, B) dominated by a σ–finite measure ν is of exponential type if
\[ \frac{dP_\theta}{d\nu}(x) = \exp\big(\eta^\top(\theta)\,T(x) - \xi(\theta)\big)\,h(x) \]
where h : (X, B) → ([0, ∞), B([0, ∞))), T : (X, B) → (R^m, B(R^m)), η : (∆, B(∆)) → (R^m, B(R^m)), and ξ is chosen so that ∫_X (dPθ/dν) dν = 1, that is
\[ \xi(\theta) = \log \int_X \exp\big(\eta^\top(\theta)\,T(x)\big)\,h(x)\,\nu(dx) \]
If η(∆) is open, then the family {Pη : η ∈ η(∆)} with η = η(θ), θ ∈ ∆, is a family of exponential type with natural parameter set η(∆).
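
A standard instance, not discussed in the text, is the Poisson family: with ν the counting measure on {0, 1, 2, ...}, h(x) = 1/x!, T(x) = x and η(θ) = log θ, the normalizing term is ξ(θ) = θ. The sketch below checks this numerically; the truncation of the series at 200 terms and the function names are assumptions made for illustration.

```python
import math

def log_partition(eta, terms=200):
    """xi = log sum_x exp(eta * x) / x!  (counting measure, h(x) = 1/x!, T(x) = x)."""
    total, term = 0.0, 1.0          # term = exp(eta * x) / x!, starting at x = 0
    for x in range(terms):
        total += term
        term *= math.exp(eta) / (x + 1)
    return math.log(total)

def density(x, eta):
    """dP/dnu at x for the natural parameter eta."""
    return math.exp(eta * x - log_partition(eta)) / math.factorial(x)

theta = 3.0
eta = math.log(theta)
print(log_partition(eta), theta)                    # xi(theta) versus theta
print(sum(density(x, eta) for x in range(60)))      # total mass, close to 1
```

Here `density(x, eta)` coincides with the usual Poisson probability θ^x e^{−θ}/x!.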
Theorem 10.6.5. Suppose ν is a measure on (X, B) whose variation |ν| is σ–finite. Let T be a B–measurable R^p–valued function on X. Define
\[ \Xi = \Big\{ \eta \in \mathbb{R}^p : \int_X e^{\eta^\top T(x)}\,\nu(dx) < \infty \Big\}. \]
Then Ξ is a convex subset of R^p, and the map Λ : η ↦ log ∫ e^{η·T(x)} ν(dx) is convex on Ξ.
Suppose Ξ° ≠ ∅ and define
\[ G(z) := \int e^{z^\top T(x)}\,\nu(dx), \qquad z = \eta + i\beta \in \Xi^o + i\mathbb{R}^p. \]
Then G is analytic in Ξ° + iR^p and for any α ∈ Z^p_+,
\[ \partial^\alpha G(z) = \int \prod_{j=1}^{p} T_j^{\alpha_j}(x)\,e^{z^\top T(x)}\,\nu(dx) \tag{10.27} \]

Proof. Convexity of Ξ follows directly from an application of Hölder’s inequality. Let η1, η2 ∈ Ξ and 0 < t < 1. Then
\[ \int e^{t\eta_1\cdot T(x)}\,e^{(1-t)\eta_2\cdot T(x)}\,\nu(dx) \le \Big(\int e^{\eta_1\cdot T(x)}\,\nu(dx)\Big)^{t}\Big(\int e^{\eta_2\cdot T(x)}\,\nu(dx)\Big)^{1-t} < \infty \]

For the last statement it suffices to consider the case p = 1. Let µ = ν ∘ T^{−1}. Fix z = η + iβ ∈ Ξ° + iR. For δ > 0 small enough, |h| ≤ δ implies that z + h ∈ Ξ° + iR, and so
\[ \Big|\frac{e^{th} - 1}{h}\Big| \le \frac{e^{\delta|t|} - 1}{\delta} \le \frac{e^{\delta t} + e^{-\delta t}}{\delta}. \]
As ∫ (e^{(η+δ)t} + e^{(η−δ)t}) µ(dt) < ∞, we obtain from dominated convergence that G is analytic and G′(z) = ∫ T(x) e^{zT(x)} ν(dx). Equation (10.27) follows by repeating the same argument as above with T^n(x) ν(dx) in place of ν(dx).
Corollary 10.6.6. Suppose {Pθ : θ ∈ ∆} is a family of exponential type with natural parameter η = η(θ) and Ξ = η(∆) ⊂ R^p open. Then
\[ 0 = \int_X D_\theta f_\theta(x)\,\nu(dx) \]

10.7. Uniformly continuous families of measures

The following results relate convergence of measures in total variation with absolute continuity of measures with respect to a σ–finite measure.
Theorem 10.7.1. (Scheffé) Let {µ, µn}n be finite positive measures such that limn µn(Ω) = µ(Ω). Let ν be a σ–finite measure and suppose that µn, µ ≪ ν. If dµn/dν → dµ/dν ν–a.s., then
\[ \|\mu_n - \mu\|_{TV} \to 0. \]

Proof. If µ(Ω) = 0, then the proof is trivial since ‖µn − µ‖TV = ‖µn‖TV = µn(Ω) → 0.

If µ(Ω) > 0, then µn(Ω) > 0 for all n large enough; let µ′n = (µ(Ω)/µn(Ω)) µn, so that µ′n(Ω) = µ(Ω). Hence, if fn = dµ′n/dν and f = dµ/dν, then fn → f ν–a.s. and ∫ (f − fn)+ dν = ∫ (f − fn)− dν = ½ ∫ |f − fn| dν = ½ ‖µ − µ′n‖TV. Since (f − fn)+ ≤ f, dominated convergence implies that ‖µ′n − µ‖TV → 0. As
\[ \|\mu_n - \mu'_n\|_{TV} \le \Big|\frac{\mu(\Omega)}{\mu_n(\Omega)} - 1\Big|\,\sup_n \|\mu_n\|_{TV} \to 0, \]
we conclude that ‖µn − µ‖TV → 0.
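
Scheffé’s theorem can be illustrated numerically. In the sketch below, which is an illustration and not part of the text, µn is the centered Gaussian law with variance 1 + 1/n and µ is the standard Gaussian law; the densities converge pointwise and the total variation, computed by a midpoint rule on a truncated interval, decreases to 0. The window [−12, 12] and the step count are arbitrary choices.

```python
import math

def normal_density(x, var):
    """Density of the centered Gaussian law with variance var."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def tv_distance(var_a, var_b, half_width=12.0, steps=24000):
    """Midpoint approximation of the integral of |f_a - f_b| over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += abs(normal_density(x, var_a) - normal_density(x, var_b))
    return total * h

for n in (1, 10, 100):
    print(n, tv_distance(1.0 + 1.0 / n, 1.0))
```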

Lemma 10.7.2. Let (Ω, F, µ) be a measure space. For any finite signed or complex measure ν on (Ω, F), ν ≪ µ iff for any ε > 0 there is δ > 0 such that A ∈ F and µ(A) < δ implies that |ν(A)| < ε.

Proof. Sufficiency: Without loss of generality assume ν is a finite signed measure. Suppose
that for any ε > 0, there is δ > 0 such that |ν(A)| < ε whenever A ∈ F and µ(A) < δ.
If µ(E) = 0 then |ν(E)| < ε for all ε > 0; consequently ν(E) = 0. If (P, N ) is a Hahn
decomposition of ν, then ν+ (E) = ν(E ∩P ) = 0 = ν− (E) = ν(E ∩N ); therefore, |ν|(E) = 0.

Necessity: Suppose that there exists ε > 0 for which there is a sequence {An} ⊂ F with µ(An) < 2^{−n} but |ν(An)| ≥ ε. If A = ⋂_n ⋃_{m≥n} Am, then µ(A) = 0; however,
\[ \infty > |\nu|(\Omega) \ge |\nu|(A) = \lim_n |\nu|\Big(\bigcup_{m\ge n} A_m\Big) \ge \liminf_n |\nu|(A_n) \ge \liminf_n |\nu(A_n)| \ge \varepsilon. \]

Theorem 10.4.14(ii) implies that ν is not absolutely continuous w.r.t. µ.

A collection G of complex measures on (Ω, F ) is uniformly continuous w.r.t. µ if

for any ε > 0, there exists δ > 0 such that µ(A) < δ implies supν∈G |ν(A)| < ε.
Lemma 10.7.3. A collection G of complex measures on (Ω, F ) is uniformly continuous
w.r.t. a measure µ iff Ga = {|ν| : ν ∈ G} is uniformly continuous w.r.t. µ.

Proof. Sufficiency is obvious as |ν(A)| ≤ |ν|(A) for all A ∈ F. To prove necessity, assume without loss of generality that all the elements in G are signed measures of finite total variation. For each ν ∈ G let (Pν, Nν) be a Hahn decomposition of ν. For ε > 0, there is δ > 0 such that µ(A) < δ implies supν∈G |ν(A)| < ε. Then µ(A ∩ Pν) ∨ µ(A ∩ Nν) ≤ µ(A) < δ implies ν(A ∩ Pν) ∨ (−ν(A ∩ Nν)) = ν+(A) ∨ ν−(A) < ε for all ν ∈ G. This means that supν∈G |ν|(A) ≤ 2ε.

Theorem 10.7.4. (Vitali–Hahn–Saks) Let (Ω, F) be a measurable space, and let µn be a sequence of finite signed (or complex) measures on F converging to µ setwise, that is, µ(B) := limn µn(B) exists in R (or C) for each B ∈ F.
(i) If µn is an increasing sequence of measures, then µ is a measure.
(ii) If µn ≪ ν for some σ–finite measure ν in F , then µ is a finite signed (or complex)
measure and µ ≪ ν. Moreover, {µn , µ} is uniformly continuous with respect to ν.

Proof. (i) The limit set function µ is clearly a monotone finitely additive function in F with µ(∅) = 0. For any pairwise disjoint sequence {An} ⊂ F with union A, the monotonicity and additivity of µ imply that µ(A) ≥ Σ_{k=1}^n µ(Ak) for all n. Thus Σ_{k=1}^∞ µ(Ak) ≤ µ(A). On the other hand, for any c < µ(A), there is N such that c < µN(A) = Σ_k µN(Ak) ≤ Σ_k µ(Ak). Letting c ↗ µ(A), we obtain µ(A) ≤ Σ_{k=1}^∞ µ(Ak). It follows that µ is countably additive.

(ii) It suffices to assume that ν is a probability measure. Indeed, for any measurable partition {Ek : k ∈ N} of Ω such that 0 < ν(Ek) < ∞, the measure
\[ \nu' = \sum_k \frac{2^{-k}}{\nu(E_k)}\,1_{E_k}\,d\nu \]
is equivalent to ν. We may replace ν with ν′.

As µ is finitely additive, to prove σ–additivity it is enough to show that limn µ(An) = 0 whenever An ↘ ∅ and An ∈ F. Identify sets A, B ∈ F whenever ν(A △ B) = 0, and let F∗ be the corresponding set of equivalence classes. The completeness of L1(Ω, F, ν) and Lemma 8.6.9 imply that d(A, B) := ν(A △ B) defines a complete metric in F∗. By Lemma 10.7.2, each µn is a continuous function in (F∗, d). Since {µn : n ∈ N} is a pointwise convergent sequence of continuous functions, for each ε > 0 and k ∈ N the set
\[ F_k(\varepsilon) = \Big\{B \in F^* : \sup_{n\ge 1} \big|\mu_k(B) - \mu_{n+k}(B)\big| \le \frac{\varepsilon}{3}\Big\} \]
is closed in (F∗, d), and F∗ = ⋃_k Fk(ε). Baire’s category theorem applied to (F∗, d) implies that for some k0, Fk0(ε) has nonempty interior. Consequently, for some B0 ∈ F∗ and η > 0 the ball D(B0; η) ⊂ Fk0(ε). If B ∈ F∗ and ν(B) < η then ν(B0 △ (B ∪ B0)) = ν(B \ B0) < η, and ν(B0 △ (B0 \ B)) = ν(B0 ∩ B) < η. Thus B1 = B ∪ B0 and B2 = B0 \ B belong to Fk0(ε) and, since B = B1 \ B2,
\[
\begin{aligned}
|\mu_k(B)| &\le |\mu_{k_0}(B)| + \big|\mu_{k_0}(B) - \mu_k(B)\big| \\
&\le |\mu_{k_0}(B)| + \big|\mu_{k_0}(B_1) - \mu_k(B_1)\big| + \big|\mu_{k_0}(B_2) - \mu_k(B_2)\big| \\
&\le |\mu_{k_0}(B)| + \tfrac23\,\varepsilon
\end{aligned}
\]
for all k ≥ k0. Since |µℓ| ≪ ν for ℓ ≤ k0, there is 0 < η′ ≤ η such that supk |µk(B)| ≤ ε whenever ν(B) < η′. Putting things together we obtain that lim_{ν(B)→0} |µ(B)| ∨ supk |µk(B)| = 0. This shows that µ is σ–additive and that {µn, µ} is uniformly continuous w.r.t. ν.

The following corollaries are direct consequence of the Vitali–Hahn–Saks theorem.

Corollary 10.7.5. If {µn} is a setwise convergent sequence of finite signed or complex measures in (Ω, F), then the limiting function µ on F is a finite signed or complex measure, and {µ, µn : n ∈ N} is uniformly continuous w.r.t. a probability measure on (Ω, F).

Proof. Apply the Vitali–Hahn–Saks theorem with Σ_k (2^{−k}/(1 + ‖µk‖)) |µk| in place of ν.

We conclude this section with a result that is useful in the foundations of Statistics.
Theorem 10.7.6. (Halmos–Savage) Let P be a family of complex measures (or finite signed
measures), not all of them zero, on a measurable space (Ω, F ). Suppose ν is a σ–finite
measure on (Ω, F ) such that µ ≪ ν for all µ ∈ P. Then, there exists a probability measure
m ≪ ν on (Ω, F ) such that sup{|µ(A)| : µ ∈ P} = 0 iff m(A) = 0.

Proof. By considering the collection {|µ|/‖µ‖TV : µ ∈ P, µ ≢ 0} of normalized variations of the elements of P, we may assume that P is a collection of probability measures. Since ν is σ–finite, we may also assume that ν is finite. Let Pc be the collection of probability measures of the form Σ_{n≥1} cn µn such that µn ∈ P, cn > 0 and Σ_n cn = 1. Observe that if m = Σ_n cn µn, then dm/dν = Σ_n cn dµn/dν.

Consider the collection C of sets C ∈ F for which there is µC ∈ Pc with µC(C) > 0 and dµC/dν + 1_{C^c} > 0 ν–a.s. Choose a sequence {Cn} ⊂ C such that ν(Cn) ↗ sup_{C∈C} ν(C). Define C0 = ⋃_{n≥1} Cn and let m = Σ_n 2^{−n} µn, where µn is a choice for µ_{Cn}. Clearly m(C0) > 0 and dm/dν + 1_{C0^c} > 0; hence, C0 ∈ C. It is clear that sup{|µ(A)| : µ ∈ P} = 0 implies that m(A) = 0. To prove the converse implication, suppose that m(A) = 0. It follows from
\[ 0 = m(A\cap C_0) \ge \sum_n 2^{-n}\int_{A\cap C_n} \frac{d\mu_n}{d\nu}\,d\nu \ge 0 \]
that ν(A ∩ C0) = 0; hence, µ(A ∩ C0) = 0 for all µ ∈ P. We will show that µ(A ∩ C0^c) = 0 whenever µ ∈ P. Set B = {dµ/dν > 0} and notice that µ(A) = µ(A ∩ B ∩ C0^c). If µ(A ∩ B ∩ C0^c) > 0, then dµ/dν + 1_{(A∩B∩C0^c)^c} > 0 by definition of B. Hence, A ∩ B ∩ C0^c ∈ C and
\[ \nu\big(C_0 \cup (A\cap B\cap C_0^c)\big) > \nu(C_0) = \sup_{C\in\mathcal{C}} \nu(C), \]
which is a contradiction. Therefore, µ ≪ m ≪ ν for all µ ∈ P.

10.8. Exercises
Exercise 10.8.1. Let µ be a finitely additive function in F with values in R. Suppose that limk µ(Ak) = 0 for any sequence of measurable sets Ak ↘ ∅. Show that µ is a signed measure. (Hint: If {An : n ∈ N} is a sequence of pairwise disjoint sets, ⋃_{m≥n} Am ↘ ∅.)
Exercise 10.8.2. Show that (M, ≤) is a partially ordered vector space, that is, ≤ is a partial order and for any r ∈ [0, ∞) and n, m, k ∈ M, n ≤ m implies rn ≤ rm and n + k ≤ m + k.
Exercise 10.8.3. Suppose that m, n, k, l ∈ M and r ∈ [0, ∞). Show that
(a) k + (m ∧ n) = (k + m) ∧ (k + n) and k + (m ∨ n) = (k + m) ∨ (k + n).
(b) r(m ∧ n) = (rm) ∧ (rn) and r(m ∨ n) = (rm) ∨ (rn).
(c) (m ∧ n) ∧ k = m ∧ (n ∧ k) and (m ∨ n) ∨ k = m ∨ (n ∨ k)

(d) m ∧ n + m ∨ n = m + n.
(e) |m + n| ≤ |m| + |n|.
(f) If m ≤ n and k ≤ l, then m ∧ k ≤ n ∧ l and m ∨ k ≤ n ∨ l.
Exercise 10.8.4. Using (10.13) and equation (10.4) show that
\[ (m\wedge n)(\psi) = \inf\big\{ m(\phi_1) + n(\phi_2) : \phi_i \in E_+,\ \phi_1 + \phi_2 = \psi \big\} \tag{10.28} \]
for any ψ ∈ E+.
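Exercise 10.8.5. Let (Ω, B) be a measurable space. For any pair of finite signed measures µ and ν define
\[ \mu\vee\nu = \tfrac12\big(\mu + \nu + |\mu-\nu|\big), \qquad \mu\wedge\nu = \tfrac12\big(\mu + \nu - |\mu-\nu|\big). \]
Show that µ ∨ ν and µ ∧ ν are finite signed measures such that:
(a) For all F ∈ B,
\[ (\mu\vee\nu)(F) = \sup\big\{\mu(E) + \nu(F\setminus E) : \mathcal{B} \ni E \subset F\big\}, \qquad (\mu\wedge\nu)(F) = \inf\big\{\mu(E) + \nu(F\setminus E) : \mathcal{B} \ni E \subset F\big\}. \]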
(b) µ ≤ µ ∨ ν, ν ≤ µ ∨ ν, and if τ is a signed measure such that µ ≤ τ and ν ≤ τ , then
µ ∨ ν ≤ τ.
(c) µ ≥ µ ∧ ν, ν ≥ µ ∧ ν, and if λ is a signed measure such that µ ≥ λ and ν ≥ λ, then
µ ∧ ν ≥ λ.
In particular, µ+ = µ ∨ 0 and µ− = (−µ) ∨ 0 = −(µ ∧ 0).
Exercise 10.8.6. Suppose µ, ν are signed measures of finite variation on a measurable space (Ω, F). If µ, ν ≪ λ for some σ–finite measure λ, and f = dµ/dλ and f′ = dν/dλ, show that d|µ − ν|/dλ = |f − f′| and d(µ ∧ ν)/dλ = f ∧ f′.
Exercise 10.8.7. Let µ and ν be probability measures on (Ω, F). Show that
\[ \tfrac12\|\mu-\nu\|_{TV} = (\mu-\nu)_+(\Omega) = (\mu-\nu)_-(\Omega) = 1 - (\mu\wedge\nu)(\Omega) = \sup_{A\in F} |\mu(A) - \nu(A)|. \]
Thus, ‖µ − ν‖TV ≤ 2. Show that ‖µ − ν‖TV = 2 iff µ ⊥ ν. (Hint: Use the Hahn–Jordan decomposition of µ − ν.)
Exercise 10.8.8. Let G be a nonempty subset of a vector lattice M. Show that
(a) The intersection IG of all ideals containing G is an ideal.
(b) x ∈ IG iff there exist λ1, . . . , λn ∈ R+ and x1, . . . , xn ∈ G such that |x| ≤ Σ_{k=1}^n λk |xk|.
(c) If M is order complete, the intersection BG of all bands containing G is a band.

Exercise 10.8.9. Let m, n ∈ M∗ (E) and suppose there exists a sequence (φk ) ⊂ E with
supk φk = 1. If m ⊥ n, then there exists a partition {A1 , A2 } ⊂ E Σ of Ω such that
|m|(A1 ) = 0 = |n|(A2 ). (Hint: Ω is the countable union of sets in E σ .)
Exercise 10.8.10. Let m be a σ–finite elementary integral over a ring lattice closed under
chopping E ⊂ Bb (Ω). For any countable collection F ⊂ L0 (E, m), show that there is an
elementary integral n ≪ m such that F ⊂ L1 (E, n).
Exercise 10.8.11. Suppose µ is a measure on (R, B(R)). The map G : x ↦ x² induces a measure on ([0, ∞), B([0, ∞))). If µ ≪ λ, show that ν = µ ∘ G^{−1} ≪ λ and
\[ \frac{d\nu}{d\lambda}(t) = \frac{1}{2\sqrt t}\big(f(-\sqrt t) + f(\sqrt t)\big)\,1_{(0,\infty)}(t) \]
where f = dµ/dλ. In particular, if µ(dx) = (1/√(2π)) e^{−x²/2} dx, we obtain the χ²_1–measure
\[ \chi^2_1(dt) = \frac{1}{\sqrt{2\pi t}}\,e^{-t/2}\,1_{(0,\infty)}(t)\,dt. \]
2πt
Exercise 10.8.12. (Box–Muller) Let µ = 1_{(0,1)²} · λ2, where λ2 is Lebesgue’s measure on (R², B(R²)). Let T : (0, 1)² → R² be defined by
\[ T(u_1, u_2) = \Big(\sqrt{-2\log(u_1)}\,\cos(2\pi u_2),\ \sqrt{-2\log(u_1)}\,\sin(2\pi u_2)\Big)^\top \]
Show that T is a diffeomorphism from (0, 1)² onto R² \ (R+ × {0}). Conclude that the induced measure µ ∘ T^{−1} is the normal distribution on R² with mean 0 and covariance matrix I2 (the two–by–two identity matrix).
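
The map T of Exercise 10.8.12 is also a practical sampling recipe. The sketch below is an illustration under stated assumptions: it uses Python’s standard pseudo-random generator with an arbitrary seed, shifts the first uniform variable into (0, 1] to avoid log(0), and the function names are not from the text. Empirical moments of the output should be close to those of N(0, I2).

```python
import math
import random

def box_muller(u1, u2):
    """Apply T(u1, u2) from the exercise to a point of (0, 1]^2."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def sample(n, seed=0):
    """Draw n points by pushing uniform pairs through box_muller."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u1 = 1.0 - rng.random()     # lies in (0, 1], so log(u1) is finite
        u2 = rng.random()
        out.append(box_muller(u1, u2))
    return out

points = sample(20000)
mean_x = sum(p[0] for p in points) / len(points)
var_x = sum(p[0] ** 2 for p in points) / len(points) - mean_x ** 2
print(mean_x, var_x)
```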
Exercise 10.8.13. Let λ2 be Lebesgue’s measure on (R², B(R²)). Let
\[ D := \big\{(u_1, u_2) : u_1, u_2 > 0,\ u_1^{1/\alpha} + u_2^{1/\beta} < 1\big\} \]
for α, β > 0 and let c = λ2(D). Let µ := (1/c) 1_D(u1, u2) · λ2. Notice that D ⊂ (0, 1)². Let
\[ X(u_1, u_2) = \frac{u_1^{1/\alpha}}{u_1^{1/\alpha} + u_2^{1/\beta}} \]
Show that the induced law Bα,β := µ ∘ X^{−1} is absolutely continuous with respect to Lebesgue measure λ1 on R, and
\[ \frac{dB_{\alpha,\beta}}{d\lambda_1}(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}\,1_{(0,1)}(x) \]
This is the beta distribution with parameters α and β. (Hint: See Example 9.6.11.)
Exercise 10.8.14. Let µ and ν be measures on (R, B(R)) with µ ≪ ν. Suppose that µ((−∞, c]) < µ(R) < ∞ and consider the map T : x ↦ x ∧ c. Show that the induced measure µ ∘ T^{−1} satisfies d(µ ∘ T^{−1}) = f 1_{(−∞,c)} dν + µ([c, ∞)) dδc, where f = dµ/dν.
Exercise 10.8.15. The condition ν(Ω) < ∞ in Lemma 10.7.2 is necessary, as the following exercise shows. Consider (R, B(R), λ) and define ν(A) := ∫_A |x| dx. Show that ν ≪ λ; however, for no ε > 0 does there exist δ > 0 such that A ∈ B(R), λ(A) < δ implies ν(A) < ε.
Chapter 11

Differentiation

In this chapter we apply the results of the previous chapters to the case of Borel σ–finite
measures in Rd. In particular, we extend the Fundamental Theorem of Calculus to the
setting of Lebesgue integration.

11.1. Derivatives of Measures in Rd .

Suppose that µ is a Borel (complex or signed σ–finite) measure on Rd, and let λ denote the Lebesgue measure. For any x ∈ Rd we want to compare the measure µ with λ on small sets around x.
Definition 11.1.1. The symmetric derivative at x of µ is defined by
\[ D_\mu(x) = \lim_{r\searrow 0} \frac{\mu(B(x; r))}{\lambda(B(x; r))}, \tag{11.1} \]
whenever the limit exists.

One way to study the existence of (11.1) at a point x is to compare the variation measure
|µ| with the Lebesgue measure through a maximal ratio.
Definition 11.1.2. Hardy’s maximal function Mµ of µ at x is given by
\[ M_\mu(x) = \sup_{r>0} \frac{|\mu|(B(x; r))}{\lambda(B(x; r))} \tag{11.2} \]
Lemma 11.1.3. The map x ↦ Mµ(x) is lower semicontinuous.

Proof. Without loss of generality we assume that µ ≥ 0. For any t > 0 we show that Et = {Mµ > t} is open. For x ∈ Et, there is r > 0 such that µ(B(x; r)) = pλ(B(x; r)) with p > t. Choose δ > 0 small enough so that
\[ (r+\delta)^d < r^d\,p/t \]
Observe that if y ∈ B(x; δ), then B(x; r) ⊂ B(y; r + δ). By the translation invariance and the dilation property of λ,
\[ \mu(B(y; r+\delta)) \ge \mu(B(x; r)) = p\,\lambda(B(x; r)) = \frac{p\,r^d}{(r+\delta)^d}\,\lambda(B(y; r+\delta)) > t\,\lambda(B(y; r+\delta)) \]

Therefore, B(x; δ) ⊂ Et.

The next result is a covering lemma that depends on the properties of Lebesgue measure.

Lemma 11.1.4. Let W be the union of a finite collection of open balls B(xi; ri), i = 1, . . . , N. Then, there exists S ⊂ {1, . . . , N} such that
(a) The balls B(xj; rj), j ∈ S, are pairwise disjoint;
(b) W ⊂ ∪_{j∈S} B(xj; 3rj);
(c) λ(W) ≤ 3^d Σ_{j∈S} λ(B(xj; rj)).

Proof. List the balls Bj = B(xj; rj) so that r1 ≥ . . . ≥ rN. We first choose Bj1 = B1 and then eliminate all balls with j > 1 that intersect Bj1. If there are any balls left, let Bj2 be the first of the remaining balls. Now, we eliminate all balls with j > j2 that intersect Bj2. If there are any balls left, then let Bj3 be the first of the remaining balls; and so on. This gives S = {j1, j2, . . . , jk}.

By construction, (a) holds. Observe that in any metric space, if r′ ≤ r and B(x; r′) ∩ B(y; r) ≠ ∅, then B(x; r′) ⊂ B(y; 3r). Thus, (b) follows. Finally, (c) follows from (b) and the dilation property of λ: λ(B(0; ar)) = a^d λ(B(0; r)).
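
The selection procedure in the proof is a greedy algorithm and can be sketched in a few lines. The code below assumes Euclidean balls given as (center, radius) pairs with centers stored as tuples; the function name `select_disjoint` is hypothetical. Processing the balls by decreasing radius and keeping a ball only if it is disjoint from all balls kept so far yields the same set S as the elimination procedure above.

```python
import math

def select_disjoint(balls):
    """Greedy selection: balls is a list of (center, radius); returns the chosen indices."""
    order = sorted(range(len(balls)), key=lambda i: -balls[i][1])   # decreasing radius
    chosen = []
    for i in order:
        center_i, radius_i = balls[i]
        # Open balls are disjoint exactly when the centers are at least r_i + r_j apart.
        if all(math.dist(center_i, balls[j][0]) >= radius_i + balls[j][1] for j in chosen):
            chosen.append(i)
    return chosen

balls = [((0.0,), 1.0), ((1.5,), 1.0), ((3.5,), 0.5), ((0.2,), 0.3)]
print(select_disjoint(balls))
```

Every discarded ball meets a selected ball of radius at least as large, which is what makes the tripled balls in (b) cover W.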

Theorem 11.1.5. (Hardy–Littlewood) If µ is a complex Borel measure on B(Rd), then for any t > 0
\[ \lambda(\{M_\mu > t\}) \le \frac{3^d}{t}\,\|\mu\|_{TV} \tag{11.3} \]
where ‖µ‖TV = |µ|(Rd) is the total variation of µ.

Proof. Let K ⊂ {Mµ > t} be compact. Any x ∈ K is the center of a ball Bx with |µ|(Bx) > tλ(Bx). By compactness, K is covered by finitely many of the balls Bx; Lemma 11.1.4 implies the existence of a finite subcollection of pairwise disjoint balls {Bj : j = 1, . . . , N} such that
\[ \lambda(K) \le 3^d \sum_{j=1}^N \lambda(B_j) \le \frac{3^d}{t}\sum_{j=1}^N |\mu|(B_j) \le \frac{3^d}{t}\,\|\mu\|_{TV} \]

Since Lebesgue measure λ is regular, (11.3) follows by taking the supremum over all compact K ⊂ {Mµ > t}.

Theorem 11.1.6. Suppose f ∈ Lp(Rd, B(Rd), λ).

(i) If p = 1, then
\[ \lambda(\{Mf > t\}) \le \frac{3^d}{t}\,\|f\|_1 \tag{11.4} \]
(ii) If 1 < p ≤ ∞, then Mf ∈ Lp and there is a constant Cp such that
\[ \|Mf\|_p \le C_p\,\|f\|_p \tag{11.5} \]

Proof. Since Mf = M|f|, it suffices to assume that f ≥ 0. Statement (i) is a direct application of Hardy–Littlewood’s theorem with µ(dx) = f(x) dx. If p = ∞ then
\[ \frac{1}{\lambda(B(x;r))}\int_{B(x;r)} f(y)\,dy \le \frac{1}{\lambda(B(x;r))}\int_{B(x;r)} |f(y)|\,dy \le \|f\|_\infty. \]

Therefore ‖Mf‖∞ ≤ ‖f‖∞. If 1 < p < ∞, fix 0 < c < 1 and, for t > 0, let
\[ g_t = f\,1_{\{f > ct\}}, \qquad h_t = f\,1_{\{f \le ct\}}. \]

Clearly ht ∈ L∞ and, from Chebyshev’s and Hölder’s inequalities, we also have that gt ∈ L1. Hence, Mf ≤ Mgt + Mht ≤ Mgt + ct, and so, {Mf > t} ⊂ {Mgt > (1 − c)t}. By Hardy–Littlewood’s theorem,
\[ \lambda(\{Mf > t\}) \le \lambda(\{Mg_t > (1-c)t\}) \le \frac{3^d}{t(1-c)}\,\|g_t\|_1 = \frac{3^d}{t(1-c)}\int_{\{f > ct\}} f(x)\,dx \]

An application of Fubini’s theorem shows
\[
\begin{aligned}
\|Mf\|_p^p = p\int_0^\infty t^{p-1}\lambda(\{Mf > t\})\,dt
&\le \frac{3^d\,p}{1-c}\int_0^\infty t^{p-2}\int_{\{f > ct\}} f(x)\,dx\,dt \\
&= \frac{3^d\,p}{1-c}\int_{\mathbb{R}^d} f(x)\int_0^{f(x)/c} t^{p-2}\,dt\,dx
= \frac{3^d\,p\,c^{1-p}}{(p-1)(1-c)}\int_{\mathbb{R}^d} f^p(x)\,dx
\end{aligned}
\]

This proves all the statements in this theorem. The constant obtained in this way is smallest for c = (p − 1)/p = 1/q, where q = p/(p − 1); this gives C_p^p = 3^d p q^p ≤ 3^d e p q, that is, Cp ≤ (3^d e p q)^{1/p}.

A function f is said to be locally integrable, denoted by f ∈ L1loc(Rd, λd), if f 1E ∈ L1(Rd, λd) for each bounded measurable set E. A point x ∈ Rd is called a Lebesgue point of f ∈ L1loc if
\[ \lim_{r\to 0} \frac{1}{\lambda(B(x; r))}\int_{B(x;r)} |f(y) - f(x)|\,\lambda(dy) = 0. \tag{11.6} \]

The following result states that almost every point in Rd is a Lebesgue point of f.
Theorem 11.1.7. If µ ≪ λ and f = dµ/dλ, then Dµ exists λ–a.s. and f = Dµ λ–a.s.

Proof. It suffices to assume that f ∈ L1(Rd, B(Rd), λ). For each f ∈ L1(λ) define the maps Tr f, with r > 0, and Tf by
\[ (T_r f)(x) = \frac{1}{\lambda(B(x; r))}\int_{B(x;r)} |f(y) - f(x)|\,\lambda(dy), \qquad (Tf)(x) = \limsup_{r\searrow 0}\,(T_r f)(x). \]

We will show that T f = 0 λ–a.s.

Clearly Tf(x) = 0 if f is continuous at x. Recall that C00(Rd) is dense in L1(λ). For any n choose a continuous function gn with ‖f − gn‖L1(λ) < 1/n. Let hn = f − gn and let Mhn be the maximal function of the measure λhn = hn dλ. Since (Tr f)(x) ≤ (Tr hn)(x) + (Tr gn)(x), T gn = 0, and (Tr hn)(x) ≤ Mhn(x) + |hn|(x), we conclude that
\[ (Tf)(x) \le M_{h_n}(x) + |h_n|(x). \]
Observe that {Tf > 2t} ⊂ {Mhn > t} ∪ {|hn| > t}; then, by Hardy–Littlewood’s theorem and the Markov–Chebyshev inequality,
\[ \lambda(\{Tf > 2t\}) \le \frac{3^d + 1}{t\,n}. \]

Therefore λ({Tf > 2t}) = 0 for all t > 0, and Tf = 0 λ–a.s.
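
A one-dimensional example makes the notion of Lebesgue point concrete. The sketch below, which is an illustration and not part of the text, takes f = 1_[0,1] on R and computes the averages over B(x; r) in closed form: interior points are Lebesgue points, while at the endpoint x = 0 the averaged deviation stays at 1/2. The function names are hypothetical.

```python
def ball_average(x, r):
    """Average of f = 1_[0,1] over B(x; r) = (x - r, x + r)."""
    overlap = max(0.0, min(x + r, 1.0) - max(x - r, 0.0))
    return overlap / (2.0 * r)

def averaged_deviation(x, r):
    """Average of |f(y) - f(x)| over B(x; r) for f = 1_[0,1]."""
    fx = 1.0 if 0.0 <= x <= 1.0 else 0.0
    avg = ball_average(x, r)
    return 1.0 - avg if fx == 1.0 else avg

for k in (2, 5, 10):
    r = 2.0 ** -k
    print(r, averaged_deviation(0.5, r), averaged_deviation(0.0, r))
```

The exceptional set {0, 1} is Lebesgue null, in agreement with the theorem.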

The following result shows that the symmetric derivative of a measure that is singular with respect to Lebesgue measure vanishes λ–a.s.
Theorem 11.1.8. If µ ⊥ λ, then Dµ = 0 λ–a.s.

Proof. It suffices to assume µ ≥ 0 and ‖µ‖TV = µ(Rd) < ∞. Define
\[ \bar D_\mu(x) := \limsup_{r\searrow 0} \frac{\mu(B(x; r))}{\lambda(B(x; r))}. \]

Since µ ⊥ λ, there is a set E such that λ(E) = 0 = µ(Rd \ E). Given ε > 0, there is, by regularity, a compact K ⊂ E such that µ(K) > ‖µ‖TV − ε. If µ1(·) = µ(· ∩ K) and µ2(·) = µ(· ∩ K^c), then ‖µ2‖ < ε and D̄µ1(x) = Dµ1(x) = 0 for any x ∈ K^c. Hence
\[ \bar D_\mu(x) = \bar D_{\mu_2}(x) \le M_{\mu_2}(x), \qquad x \in K^c. \]
Therefore, {D̄µ > t} = ({D̄µ > t} ∩ K) ∪ ({D̄µ > t} ∩ K^c) ⊂ K ∪ {D̄µ2 > t}. Since λ(K) ≤ λ(E) = 0, Hardy–Littlewood’s theorem implies that
\[ \lambda(\{\bar D_\mu > t\}) \le \frac{3^d}{t}\,\|\mu_2\| < \frac{3^d}{t}\,\varepsilon. \]

Letting ε ↘ 0 gives λ({D̄µ > t}) = 0 for all t > 0. We conclude that Dµ exists and Dµ = D̄µ = 0 λ–a.s.
Corollary 11.1.9. Let µ = µa + µs = f dλ + µs be the Radon–Nikodym decomposition of
a complex or signed measure µ in B(Rd ). Then Dµ exists and Dµ = f λ–a.s.

Remark 11.1.10. In the case µ ≪ λ, open balls B(x; r) can be replaced by other types of sets whose Lebesgue measures are proportional to those of a ball. For instance, we can consider sets E(x; r) ⊂ B(x; r) for which there is a fixed number a > 0 such that λ(E(x; r)) ≥ aλ(B(x; r)). In such case,
\[ \frac{1}{\lambda(E(x; r))}\int_{E(x;r)} |f(y) - f(x)|\,\lambda(dy) \le \frac{1}{a\,\lambda(B(x; r))}\int_{B(x;r)} |f(y) - f(x)|\,\lambda(dy) \]

11.2. The fundamental theorem of Calculus

A real valued function F on R is absolutely continuous if for any ε > 0 there is δ > 0 such that Σ_{j=1}^N |F(bj) − F(aj)| < ε whenever (aj, bj), j = 1, . . . , N, are pairwise disjoint intervals with Σ_{j=1}^N (bj − aj) < δ. It is easy to check that if f ∈ L1(R, λ) and
\[ F(x) = \int_{-\infty}^{x} f\,d\lambda, \]
then F is absolutely continuous and λ–a.s. differentiable, with F′ = f λ–a.s. Indeed, let E^+_r(x) = [x − r, x] and E^−_r(x) = [x, x + r]. Then λ(E^±_r(x)) = ½ λ(B(x; r)); so, by Remark 11.1.10 and Theorem 11.1.7, F′(x) = f(x) almost everywhere. This observation and its converse are known as the fundamental theorem of Calculus.
Theorem 11.2.1. Suppose that f : [a, b] → R is nondecreasing and absolutely continuous. Then f is differentiable a.s., f′ ∈ L1([a, b], λ) and
\[ f(x) - f(a) = \int_a^x f'(t)\,dt \tag{11.7} \]

Proof. Let µ be the unique measure on B([a, b]) such that µ((x, y]) = f(y) − f(x). The absolute continuity of f means that µ ≪ λ, since µ([a, b]) = f(b) − f(a) < ∞. Let g be the Radon–Nikodym derivative of µ w.r.t. λ. By Theorem 11.1.7 we have that Dµ = g a.s. Since
\[ D_\mu(x) = \lim_{h\searrow 0} \frac{\mu((x-h, x])}{\lambda((x-h, x])} = \lim_{h\searrow 0} \frac{\mu((x, x+h])}{\lambda((x, x+h])} = g(x), \]
we have that f is differentiable λ–a.s. with f′ = g, and that
\[ f(x) - f(a) = \mu((a, x]) = \int_{(a,x]} f'(t)\,dt \]

Let f : [a, b] → C. The variation function Vf : [a, b] → R of f is defined by
\[ V_f(x) = \sup\Big\{ \sum_{j=1}^{n} |f(t_j) - f(t_{j-1})| : a = t_0 < \ldots < t_n = x,\ n \in \mathbb{N} \Big\} \]

If Vf(b) < ∞, we say that f is a function of finite variation on [a, b]. It is easy to verify that a function that is absolutely continuous on an interval [a, b] is automatically of finite variation.

Lemma 11.2.2. Suppose that f : [a, b] → R is of finite variation. Then Vf, Vf + f and Vf − f are nondecreasing. If f is absolutely continuous, then so is Vf.

Proof. Let a ≤ x < y ≤ b. For any partition a = t0 < . . . < tn = x we have
\[ \sum_{j=1}^{n} |f(t_j) - f(t_{j-1})| + |f(y) - f(x)| \le V_f(y) \]

Hence Vf(x) + |f(y) − f(x)| ≤ Vf(y). Therefore Vf(x) + f(x) − f(y) ≤ Vf(y) and Vf(x) + f(y) − f(x) ≤ Vf(y).

Suppose that f is absolutely continuous. Then, given ε > 0, there is δ > 0 such that whenever Σ_j (bj − aj) < δ, we have Σ_j |f(bj) − f(aj)| < ε/2. For each interval [aj, bj], choose a partition Pj = {t_{j,k}} ⊂ [aj, bj] such that
\[ V_f(b_j) - V_f(a_j) - \frac{\varepsilon}{2^{j+1}} < \sum_k |f(t_{j,k}) - f(t_{j,k-1})| \]
Since Σ_j Σ_k (t_{j,k} − t_{j,k−1}) = Σ_j (bj − aj) < δ, we conclude that
\[ \sum_j |V_f(b_j) - V_f(a_j)| < \frac{\varepsilon}{2} + \sum_j\sum_k |f(t_{j,k}) - f(t_{j,k-1})| < \varepsilon. \]

Theorem 11.2.3. (Fundamental Theorem of Calculus I) Suppose that f : [a, b] → C is

absolutely continuous. Then, f ′ exists λ–a.s., f ′ ∈ L1 ([a, b], λ), and
Z x
(11.8) f (x) − f (a) = f ′ (t) dt, a ≤ x ≤ b.
a

Proof. Observe that f = V − (V − f ). Since f is absolutely continuous then so are V

and V − f . Since V and V − f are increasing, we apply Theorem 11.2.1 to V and (V − f )
separately. The result follows immediately.

Example 11.2.4. If f is a Lipschitz function in [a, b], then it is clearly of bounded variation and absolutely continuous. Then f is differentiable λ–a.e., f′ ∈ L1([a, b]) and (11.8) holds.

A function f in [a, b] is nearly differentiable if f′ exists in [a, b] except for a countable set C ⊂ [a, b].

Theorem 11.2.5. (Fundamental Theorem of Calculus, II) Let f : [a, b] → C be continuous.

If f is nearly differentiable on [a, b] and f ′ ∈ L1 ([a, b], λ) then,
Z x
f (x) − f (a) = f ′ (t) dt, a ≤ x ≤ b.
a

Proof. Let C be the at most countable set where f′ does not exist. We extend f′ to [a, b] by setting f′(x) = 0 for x ∈ C. Since f′ ∈ L1([a, b]), given ε > 0 there is a l.s.c. function g on [a, b] such that f′ ≤ g and ∫_{[a,b]} g(t) dt < ∫_{[a,b]} f′(t) dt + ε. For any η > 0, define
\[ F_\eta(x) = \int_{[a,x]} g(t)\,dt - (f(x) - f(a)) + \eta(x - a), \qquad a \le x \le b \]

By adding a small constant to g if necessary, we can assume that f′ < g. The lower semicontinuity of g implies that for every x ∈ [a, b] \ C there is δx > 0 such that
\[ g(t) > f'(x), \qquad \frac{f(t) - f(x)}{t - x} < f'(x) + \eta \]
for all t ∈ (x, x + δx). Hence,
\[
\begin{aligned}
F_\eta(t) - F_\eta(x) &= \int_{[x,t]} g(s)\,ds - (f(t) - f(x)) + \eta(t - x) \\
&> f'(x)(t - x) - (f'(x) + \eta)(t - x) + \eta(t - x) = 0
\end{aligned}
\]

We claim that Fη is nondecreasing. Suppose that for some a ≤ x1 < x2 ≤ b we have Fη(x1) > Fη(x2). For each Fη(x2) < y < Fη(x1) define
\[ x_y = \sup\{x \in [x_1, x_2] : F_\eta(x) \ge y\} \]
The continuity of Fη implies that Fη(xy) = y and, by the inequality above, xy ∈ C. Since distinct values of y give distinct points xy and C is countable, we reach a contradiction.
As Fη(a) = 0, we have that Fη(x) ≥ 0 for all a < x ≤ b. Letting η → 0 gives
\[ f(x) - f(a) \le \int_{[a,x]} g(t)\,dt < \int_{[a,x]} f'(t)\,dt + \varepsilon, \]
and since ε is arbitrary, we conclude that
\[ f(x) - f(a) \le \int_{[a,x]} f'(t)\,dt \]
The reverse inequality follows by taking −f instead of f.

11.3. Integration by parts in R

A real–valued function F on an interval I is said to be of locally finite variation if F
has finite variation on every compact interval [a, b] ⊂ I. The following result shows how to
integrate functions of finite variation with respect to signed measures.
Theorem 11.3.1. (Integration by parts) Let F and G be right–continuous functions of
locally finite variation on I. Then, for any compact interval [a, b] ⊂ I,
\[ \int_{(a,b]} F(t)\,\mu_G(dt) = F(b)G(b) - F(a)G(a) - \int_{(a,b]} G(t-)\,\mu_F(dt), \tag{11.9} \]
where $G(t-) = \lim_{s \nearrow t} G(s)$ and µG , µF are the signed measures induced by G and F respectively.

Proof. By Lemma 11.2.2 we can assume without loss of generality that G and F are
nondecreasing. Let µG and µF be the unique Borel measures on (a, b] such that µG ((α, β]) =
G(β) − G(α), µF ((α, β]) = F (β) − F (α) for all (α, β] ⊂ I, −∞ < α < β < ∞. By Fubini’s
theorem,
\begin{align*}
(F(b) - F(a))(G(b) - G(a)) &= \int_{(a,b]\times(a,b]} \mu_F \otimes \mu_G(dt, ds) \\
&= \int_{(a,b]} \int_{(a,s]} \mu_F(dt)\,\mu_G(ds) + \int_{(a,b]} \int_{(a,t)} \mu_G(ds)\,\mu_F(dt) \\
&= \int_{(a,b]} F(s)\,\mu_G(ds) + \int_{(a,b]} G(t-)\,\mu_F(dt) - F(a)\mu_G((a,b]) - G(a)\mu_F((a,b]).
\end{align*}

Simplifying terms leads to (11.9).

Denoting by ∆G(t) = G(t) − G(t−) the size of the jump of G at t, we can express (11.9)
as
\[ \int_{(a,b]} F(t)\,\mu_G(dt) = F(b)G(b) - F(a)G(a) - \int_{(a,b]} G(t)\,\mu_F(dt) + \sum_{a<t\le b} \Delta G(t)\Delta F(t). \]
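For pure jump functions, both forms of the integration by parts formula reduce to finite sums, so they can be checked exactly. The following Python sketch does this with rational arithmetic. The jump locations and sizes, and the helper name `step_function`, are hypothetical illustration data and not part of the text.

```python
from fractions import Fraction


def step_function(jumps, start):
    """Right-continuous step function: start + sum of jumps at points <= t.

    With left_limit=True the jump at t itself is excluded, which gives the
    left limit at t.
    """
    def value(t, left_limit=False):
        total = start
        for point, size in jumps.items():
            if point < t or (point == t and not left_limit):
                total += size
        return total
    return value


# Hypothetical data on (a, b] = (0, 4]; F and G share the jump point 2.
a, b = 0, 4
F_jumps = {1: Fraction(3), 2: Fraction(-1), 4: Fraction(2)}
G_jumps = {2: Fraction(5), 3: Fraction(1, 2)}
F = step_function(F_jumps, Fraction(1))
G = step_function(G_jumps, Fraction(-2))

# Integrals against mu_G and mu_F reduce to sums over the jump points.
lhs = sum(F(t) * dG for t, dG in G_jumps.items() if a < t <= b)

# Right hand side of (11.9), with the integrand G(t-).
int_G_left_dF = sum(G(t, left_limit=True) * dF
                    for t, dF in F_jumps.items() if a < t <= b)
rhs = F(b) * G(b) - F(a) * G(a) - int_G_left_dF

# Jump form: integrand G(t) plus the sum of products of common jumps.
int_G_dF = sum(G(t) * dF for t, dF in F_jumps.items() if a < t <= b)
common_jumps = sum(F_jumps[t] * G_jumps[t]
                   for t in F_jumps.keys() & G_jumps.keys())
rhs_jump_form = F(b) * G(b) - F(a) * G(a) - int_G_dF + common_jumps

print(lhs, rhs, rhs_jump_form)
```

All three quantities agree; the correction term is nonzero only because F and G jump simultaneously at t = 2.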

Remark 11.3.2. (Differential notation) The integration by parts formula (11.9) is commonly written as
\[ d(FG) = F\,dG + G_-\,dF, \]
where G− (t) = G(t−) and dG stands for the Lebesgue–Stieltjes measure µG .
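Example 11.3.3. The function $f(x) = \frac{\sin x}{x}\,1_{(0,\infty)}(x)$ is not integrable. However,
\[ \lim_{x\to\infty} \int_0^x \frac{\sin t}{t}\,dt = \frac{\pi}{2}. \]
Indeed, first notice that $\int_{(n-1)\pi}^{n\pi} |\sin t|\,dt = 2$ and $1/t \ge 1/(n\pi)$ on $((n-1)\pi, n\pi]$, so that $\int_0^\infty |f(t)|\,dt \ge \frac{2}{\pi}\sum_n \frac{1}{n} = \infty$.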

As for the second statement, notice that $F(x, y) = e^{-xy}\sin x$ is integrable over any region of the form {0 < x ≤ a, y > 0}. By Fubini's theorem
\[ \int_0^a\!\int_0^\infty e^{-xy}\sin x\,dy\,dx = \int_0^a \frac{\sin x}{x}\,dx = \int_0^\infty\!\int_0^a e^{-xy}\sin x\,dx\,dy. \]
Integrating by parts we obtain
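\begin{align*}
\int_0^a e^{-xy}\sin x\,dx &= -e^{-ay}\cos a + 1 - \int_0^a y e^{-xy}\cos x\,dx, \\
\int_0^a y e^{-xy}\cos x\,dx &= y e^{-ay}\sin a + \int_0^a y^2 e^{-xy}\sin x\,dx.
\end{align*}
Collecting and rearranging all terms gives
\[ \int_0^a e^{-xy}\sin x\,dx = \frac{1}{1+y^2}\left(1 - e^{-ay}\cos a - y e^{-ay}\sin a\right). \]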

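Hence
\[ \int_0^a \frac{\sin x}{x}\,dx = \frac{\pi}{2} - \cos a \int_0^\infty \frac{e^{-ay}}{1+y^2}\,dy - \sin a \int_0^\infty \frac{y e^{-ay}}{1+y^2}\,dy. \]
Both integrals on the right are bounded by $\int_0^\infty e^{-ay}\,dy = 1/a$, so the conclusion follows by letting a → ∞.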
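The convergence in Example 11.3.3 can also be observed numerically. The sketch below approximates $\int_0^x \frac{\sin t}{t}\,dt$ with a composite Simpson rule and compares it with π/2. Since $e^{-ay}/(1+y^2) \le e^{-ay}$ and $y/(1+y^2) \le 1/2$, the distance to π/2 should be at most 3/(2x). The function names and the step size are our own choices for illustration.

```python
import math


def sinc(t):
    """sin(t)/t, extended by continuity at t = 0."""
    return 1.0 if t == 0.0 else math.sin(t) / t


def simpson(func, lower, upper, intervals):
    """Composite Simpson rule; `intervals` must be even."""
    step = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * func(lower + k * step)
    return total * step / 3


def sine_integral(x, intervals_per_unit=50):
    """Approximate the integral of sin(t)/t over [0, x]."""
    intervals = 2 * max(1, int(x * intervals_per_unit))
    return simpson(sinc, 0.0, x, intervals)


for x in (10.0, 100.0, 1000.0):
    value = sine_integral(x)
    print(x, value, abs(value - math.pi / 2), 1.5 / x)
```

The printed errors decay roughly like 1/x, in agreement with the bound; the slow decay also reflects the fact that the integrand is not absolutely integrable.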
Example 11.3.4. Suppose F is a right–continuous function of locally finite variation on I = [0, ∞) and that $\inf_{t\in[a,b]} |F(t)| > 0$ for any [a, b] ⊂ I. Then 1/F is also right–continuous and of locally finite variation on I. Applying (11.9) with G = 1/F we obtain
\[ 0 = F\,d\Big(\frac{1}{F}\Big) + \frac{1}{F_-}\,dF. \]
The uniqueness of the Radon–Nikodym derivative implies that
\[ d\Big(\frac{1}{F}\Big) = -\frac{1}{F(t)F(t-)}\,dF. \]
Example 11.3.5. If G is a continuous function of locally finite variation then,
\[ dG^n = nG^{n-1}(t)\,dG \tag{11.10} \]
for each n ∈ Z+ . For n = 1 this is evidently true. By induction assume that equation (11.10) holds for n ≥ 1. Then, an application of (11.9) implies
\[ d(G^{n+1}) = G(t)\,dG^n + G^n(t)\,dG = nG^n(t)\,dG + G^n(t)\,dG = (n+1)G^n(t)\,dG. \]
A simple consequence of (11.10) is
\[ de^{G(t)} = e^{G(t)}\,dG(t) \]
for any continuous function G of locally finite variation on I.
Lemma 11.3.6. Suppose G is right–continuous nondecreasing in the interval [0, T ) (0 <
T ≤ ∞). Then, for any n ∈ N
\[ \int_{(0,t]} G^{n-1}(s-)\,\mu_G(ds) \le \frac{G^n(t) - G^n(0)}{n} \le \int_{(0,t]} G^{n-1}(s)\,\mu_G(ds) \]
for all 0 < t < T . (In differential notation, $nG_-^{n-1}\,dG \le dG^n \le nG^{n-1}\,dG$.)

Proof. For n ∈ N, $G^n$ is right–continuous and nondecreasing and so, the associated Lebesgue–Stieltjes measure $\mu_{G^n}$ is nonnegative. Repeated application of integration by parts gives
\begin{align*}
dG^n &= G_-^{n-1}\,dG + G\,dG^{n-1} = G_-^{n-1}\,dG + G\big(G_-^{n-2}\,dG + G\,dG^{n-2}\big) \\
&= \big(G_-^{n-1} + G G_-^{n-2} + \dots + G^{n-1}\big)\,dG
\end{align*}
in differential notation. As G(s−) ≤ G(s) for all 0 < s ≤ T , we conclude that
\[ nG_-^{n-1}\,dG \le dG^n \le nG^{n-1}\,dG. \]

Recall that the set I of discontinuities of a right–continuous monotone nondecreasing function F on an interval (a, ∞) is countable. The functions
\begin{align*}
F_I(t) &= \sum_{x \in I\cap(a,t]} \Delta F(x), \\
F_c(t) &= F(t) - F_I(t)
\end{align*}
are right–continuous and monotone nondecreasing. While Fc is continuous, FI increases only at discontinuity points of F and ∆F (x) = ∆FI (x) for all x > a. The measure µFc associated with Fc is the continuous part of µF , the measure µFI := µF − µFc is supported on I and µFI ({x}) = ∆F (x) for every x ∈ (a, ∞).
Theorem 11.3.7. (Exponential Formula) Let F be a right–continuous monotone nondecreasing function in [0, ∞) and let µF be the unique measure on (0, ∞) such that µF ((a, b]) = F (b) − F (a). Let {xj : j ∈ N} be the sequence of all discontinuities of F . If $v \in L_1^{loc}(\mu_F)$ then, for any number H0 ≥ 0 the function
\[ H(t) = H_0 \exp\Big(\int_{(0,t]} v(x)\,\mu_{F_c}(dx)\Big) \prod_{0<x_j\le t} \big(1 + v(x_j)\Delta F(x_j)\big) \tag{11.11} \]
is the unique solution in t ≥ 0 of the integral equation
\[ H(t) = H(0) + \int_{(0,t]} H(x-)v(x)\,\mu_F(dx) \tag{11.12} \]
satisfying $\|H 1_{(0,t]}\|_u < \infty$ for all t > 0.

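Proof. As $v \in L_1^{loc}(\mu_F)$, also $v \in L_1^{loc}(\mu_{F_I})$, and so
\[ \|v 1_{(0,t]}\|_{L_1(\mu_{F_I})} = \sum_{0<x_j\le t} |v(x_j)|\Delta F(x_j) < \infty. \]
Consequently H is bounded on each compact subinterval of [0, ∞). Let
\begin{align*}
G_1(t) &= H_0 \prod_{0<x_j\le t} \big(1 + v(x_j)\Delta F(x_j)\big), \\
G_2(t) &= \exp\Big(\int_{(0,t]} v(x)\,\mu_{F_c}(dx)\Big).
\end{align*}
G1 is a right–continuous pure jump function of bounded variation which changes only at the points xj ; moreover,
\begin{align*}
\Delta G_1(x_j) &= G_1(x_j) - G_1(x_j-) = G_1(x_j-)\big(1 + v(x_j)\Delta F(x_j)\big) - G_1(x_j-) \\
&= G_1(x_j-)v(x_j)\Delta F(x_j).
\end{align*}
G2 is a continuous monotone nondecreasing function and
\begin{align*}
\mu_{G_2}(dx) &= \exp\Big(\int_{(0,x]} v(y)\,\mu_{F_c}(dy)\Big) v(x)\,\mu_{F_c}(dx) \\
&= G_2(x)v(x)\,\mu_{F_c}(dx).
\end{align*}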

Applying the integration by parts formula to H(t) = G1 (t)G2 (t) gives

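\begin{align*}
H(t) - H(0) &= \int_{(0,t]} G_1(x-)\,\mu_{G_2}(dx) + \int_{(0,t]} G_2(x)\,\mu_{G_1}(dx) \\
&= \int_{(0,t]} G_1(x-)G_2(x)v(x)\,\mu_{F_c}(dx) + \sum_{0<x_j\le t} G_2(x_j)G_1(x_j-)v(x_j)\Delta F(x_j) \\
&= \int_{(0,t]} H(x-)v(x)\,\mu_{F_c}(dx) + \int_{(0,t]} H(x-)v(x)\,\mu_{F_I}(dx) \\
&= \int_{(0,t]} H(x-)v(x)\,\mu_F(dx).
\end{align*}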
It remains to prove uniqueness. Suppose H1 and H2 are two solutions and set D = H1 − H2 . Let $M := \|D 1_{(0,t]}\|_u$ and $\Lambda(t) = \int_{(0,t]} |v(x)|\,\mu_F(dx)$. Then,
\[ |D(t)| \le \int_{(0,t]} |D(x-)||v(x)|\,\mu_F(dx) \le M \int_{(0,t]} |v(x)|\,\mu_F(dx) = M\Lambda(t). \]
As Λ is nondecreasing and right continuous, |D(x−)| ≤ M Λ(x−). By Lemma 11.3.6
\[ |D(t)| \le \int_{(0,t]} M\Lambda(x-)|v(x)|\,\mu_F(dx) = M \int_{(0,t]} \Lambda(x-)\,\mu_\Lambda(dx) \le \frac{M}{2}\Lambda^2(t). \]
Continuing by induction we obtain $|D(t)| \le \frac{M}{n!}\Lambda^n(t)$. Letting n → ∞ gives |D(t)| = 0.
Example 11.3.8. Suppose µ is a probability measure on ((0, ∞), B((0, ∞))) and let F (x) := µ((0, x]). The Integrated Hazard Function Q of µ is defined as
\[ Q(t) = \int_{(0,t]} \frac{1}{1 - F(x-)}\,\mu(dx). \]
The function S(t) := 1 − F (t) is a right–continuous monotone nonincreasing function. Q is a right–continuous monotone nondecreasing function whose associated measure µQ ≪ µ satisfies
\begin{align*}
\mu_Q(\{x\}) &= \Delta Q(x) = \frac{\Delta F(x)}{S(x-)}, \\
\mu_{Q_c}(dx) &= \frac{1}{S(x-)}\,\mu_c(dx), \\
S(x-)\,\mu_Q(dx) &= \mu(dx) = \mu_F(dx).
\end{align*}
Hence Q and F have the same points of discontinuity {xj : j ∈ I} and
\[ S(t) = 1 - F(t) = 1 - \int_{(0,t]} \mu(dx) = S(0) - \int_{(0,t]} S(x-)\,\mu_Q(dx). \]
Taking v ≡ −1 in Theorem 11.3.7, with Q in place of F , shows that S is the unique solution to the equation above with $\|S 1_{(0,t]}\|_u = 1 < \infty$. Therefore
\[ 1 - F(t) = \exp\big(-Q_c(t)\big) \prod_{0<x_j\le t} \big(1 - \Delta Q(x_j)\big). \]
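When µ is purely discrete, the continuous part Qc vanishes and the formula for 1 − F reduces to a finite product over the atoms. The following Python sketch verifies this identity exactly for a small distribution; the atoms and their masses are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical discrete probability measure on (0, infinity): (atom, mass).
atoms = [(1, Fraction(1, 10)), (2, Fraction(3, 10)),
         (5, Fraction(2, 5)), (7, Fraction(1, 5))]


def survival(t):
    """S(t) = 1 - F(t), the mass strictly to the right of t."""
    return 1 - sum(mass for x, mass in atoms if x <= t)


def survival_left(t):
    """S(t-), the mass on [t, infinity)."""
    return 1 - sum(mass for x, mass in atoms if x < t)


def hazard_jump(x, mass):
    """Jump of the integrated hazard: Delta Q(x) = Delta F(x) / S(x-)."""
    return mass / survival_left(x)


def product_formula(t):
    """Product of (1 - Delta Q(x_j)) over the atoms x_j <= t."""
    result = Fraction(1)
    for x, mass in atoms:
        if x <= t:
            result *= 1 - hazard_jump(x, mass)
    return result


for t in (0, 1, 2, 3, 5, 6, 7, 8):
    print(t, survival(t), product_formula(t))
```

Each factor 1 − ∆Q(xj ) equals S(xj )/S(xj −), so the product telescopes to S(t); this is the discrete counterpart of the exponential formula.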

As an application of integration by parts, we present a result that is useful in the study

of differential equations to control the growth of functions satisfying some integral bounds.
Theorem 11.3.9. (Gronwall's inequality) Let α and β ≥ 0 be differentiable and continuous functions on I := [a, ∞) respectively. If x is a function on I such that
\[ x(t) \le \alpha(t) + \int_a^t \beta(s)x(s)\,ds \tag{11.13} \]
then
\[ x(t) \le \alpha(t) + \int_a^t \alpha(s)\beta(s)\exp\Big(\int_s^t \beta(r)\,dr\Big)\,ds. \]
If in addition α is nondecreasing then,
\[ x(t) \le \alpha(t)\exp\Big(\int_a^t \beta(s)\,ds\Big). \]
Suppose µ is a Borel measure on (0, ∞) and $\alpha \in L_1^{loc}(\mu)$. If $x \in L_1^{loc}(\mu)$ satisfies
\[ x(t) \le \alpha(t) + \int_{(0,t)} x(s)\,\mu(ds) \tag{11.14} \]
then,
\[ x(t) \le \alpha(t) + \int_{(0,t)} \alpha(s)\exp\big(\mu((s,t))\big)\,\mu(ds). \tag{11.15} \]

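Proof. Set h(t) to be the right hand side of (11.13). By the fundamental theorem of Calculus
\[ \dot h(t) = \dot\alpha(t) + \beta(t)x(t) \le \dot\alpha(t) + \beta(t)h(t). \]
This implies that
\[ \frac{d}{dt}\Big(\exp\Big(-\int_a^t \beta(r)\,dr\Big) h(t)\Big) \le \dot\alpha(t)\exp\Big(-\int_a^t \beta(r)\,dr\Big). \]
Integrating over [a, t] and using h(a) = α(a),
\[ h(t) \le \alpha(a)\exp\Big(\int_a^t \beta(r)\,dr\Big) + \int_a^t \dot\alpha(s)\exp\Big(\int_s^t \beta(r)\,dr\Big)\,ds. \tag{11.16} \]
Integration by parts leads to
\[ x(t) \le h(t) \le \alpha(t) + \int_a^t \alpha(s)\beta(s)\exp\Big(\int_s^t \beta(r)\,dr\Big)\,ds. \]
If α is non–decreasing, then $\dot\alpha \ge 0$ and, since β ≥ 0, (11.16) reduces to
\begin{align*}
x(t) \le h(t) &\le \alpha(a)\exp\Big(\int_a^t \beta(r)\,dr\Big) + \int_a^t \dot\alpha(s)\exp\Big(\int_a^t \beta(r)\,dr\Big)\,ds \\
&= \alpha(t)\exp\Big(\int_a^t \beta(r)\,dr\Big).
\end{align*}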

11.4. Analytic functions

A complex valued function f defined on an open set D ⊂ C is analytic at a ∈ D if for
some r > 0, f can be expressed as a power series in a ball B(a; r) ⊂ D.
Theorem 11.4.1. Given (cn ) ⊂ C define $1/R := \limsup_{n\to\infty} \sqrt[n]{|c_n|}$. For any a ∈ C, the power series
\[ f(z) = \sum_{n=0}^\infty c_n (z - a)^n \tag{11.17} \]
converges absolutely and uniformly on any compact K ⊂ B(a; R) and diverges for all z with |z − a| > R. The number R is called the radius of convergence of f .

Proof. Since $\limsup_n \sqrt[n]{|c_n (z - a)^n|} = \frac{|z - a|}{R}$, the first and second statements follow from Theorem A.1.4[i,ii]. If K ⊂ B(a; R) is compact, then K ⊂ B(a; r) ⊂ B(a; R) for some 0 < r < R. The last statement follows from the first one.

In the rest of this section we will use complex measures to derive several properties of
analytic functions. We start with the following fundamental result.
Theorem 11.4.2. Let µ be a complex measure on a measurable space (Ω, F ) and let ϕ be
a complex–valued measurable function on Ω. Suppose D ⊂ C is an open set which does not
intersect ϕ(Ω). Then, the map f : D → C given by
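\[ f(z) = \int_\Omega \frac{\mu(d\omega)}{\varphi(\omega) - z} \]
is analytic. Moreover, if the closed ball $\overline{B}(a; r) \subset D$, then
\[ f(z) = \sum_{n=0}^\infty c_n (z - a)^n, \qquad z \in B(a; r), \tag{11.18} \]
where
\[ c_n = \int_\Omega \frac{\mu(d\omega)}{(\varphi(\omega) - a)^{n+1}}, \qquad |c_n| \le \frac{\|\mu\|_{TV}}{r^{n+1}}, \qquad n \in \mathbb{Z}_+. \tag{11.19} \]
If R is the radius of convergence of (11.18), then r ≤ R.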

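Proof. If $\overline{B}(a; r) \subset D$, then $q := \inf_{\omega\in\Omega} |\varphi(\omega) - a| > r$, and so
\[ \frac{|z - a|}{|\varphi(\omega) - a|} \le \frac{|z - a|}{q} \le \frac{r}{q} < 1, \qquad \omega \in \Omega,\ z \in B(a; r). \]
Hence, for any z ∈ B(a; r) fixed, the series
\[ \omega \mapsto \sum_{n=0}^\infty \frac{(z - a)^n}{(\varphi(\omega) - a)^{n+1}} = \frac{1}{\varphi(\omega) - z} \]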

converges absolutely and uniformly in Ω. By dominated convergence

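\[ f(z) = \int_\Omega \frac{\mu(d\omega)}{\varphi(\omega) - z} = \int_\Omega \sum_{n=0}^\infty \frac{(z - a)^n}{(\varphi(\omega) - a)^{n+1}}\,\mu(d\omega) = \sum_{n=0}^\infty c_n (z - a)^n, \]
where the cn satisfy (11.19). The last statement follows from the estimate
\[ \limsup_{n\to\infty} \sqrt[n]{|c_n|} \le \lim_{n\to\infty} \frac{1}{r}\sqrt[n]{\frac{\|\mu\|_{TV}}{r}} = \frac{1}{r}. \]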

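Remark 11.4.3. If $M := \sup_{\omega\in\Omega} |\varphi(\omega)| < \infty$, then the function $z \mapsto f\big(\frac{1}{z}\big)$ is also analytic at 0 and
\[ f\Big(\frac{1}{z}\Big) = -\sum_{n=0}^\infty \Big(\int_\Omega \varphi^n(\omega)\,\mu(d\omega)\Big) z^{n+1} \]
for all $z \in B\big(0; \frac{1}{M}\big)$. In this case, we say that f is analytic at infinity.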
Example 11.4.4. Suppose that µ is a complex measure on S1 . Then
\[ F(z) = \int_{S^1} \frac{w + z}{w - z}\,\mu(dw) \]
is analytic on the open unit disk B(0; 1) ⊂ C.

A complex valued function f defined on an open set D ⊂ C is holomorphic at a point z0 ∈ D if the limit
\[ f'(z_0) := \lim_{z\to z_0} \frac{f(z) - f(z_0)}{z - z_0} \]
exists. In such case, f ′ (z0 ) is the derivative of f at z0 . If f is holomorphic at every point of D, then we say that f is holomorphic on D and denote this fact by f ∈ H(D). If we identify C with R2 , then we have the following result.
Lemma 11.4.5. (Cauchy–Riemann) For a function f : D → C, let u = Re(f ) and v =
Im(f ), so that f = u + i v. f is holomorphic at z0 iff f is differentiable as a function from
D ⊂ R2 to R2 and the Cauchy–Riemann equations
(11.20) ∂x u(z0 ) = ∂y v(z0 ) ∂y u(z0 ) = −∂x v(z0 )
hold. Thus, f ′ (z0 ) = ∂x u(z0 ) + i∂x v(z0 ) = ∂y v(z0 ) − i∂y u(z0 ).

Proof. Without loss of generality, suppose z0 = 0 = f (z0 ). For any z ∈ C we denote by

x and y its real and imaginary parts respectively. If f is differentiable at 0, then there
are complex numbers α and β (the partial derivatives at 0 of f with respect to x and y
respectively) such that
f (z) = αx + βy + η(z)
where η(z)/z → 0 as z → 0. Since $2x = z + \bar z$ and $2iy = z - \bar z$, we have that
\[ f(z) = \frac{\alpha - i\beta}{2}\,z + \frac{\alpha + i\beta}{2}\,\bar z + \eta(z). \]

Therefore,
\[ \frac{f(z)}{z} = \frac{\alpha - i\beta}{2} + \frac{\alpha + i\beta}{2}\,\frac{\bar z}{z} + \frac{\eta(z)}{z}. \]
Since $\bar z / z$ has no limit as z → 0, f is holomorphic at 0 if and only if α = −iβ, which is equivalent to the Cauchy–Riemann equations (11.20).
Conversely, if f is holomorphic at z0 , it is obvious that f is differentiable as a function in the plane. The Cauchy–Riemann equations follow by comparing the real and imaginary parts in
\[ f'(0) = \lim_{h\to 0} \frac{f(h) - f(0)}{h} = \lim_{k\to 0} \frac{f(ik) - f(0)}{ik}, \]
where h and k are real.
The following result shows that a function that is analytic around a point a ∈ D, is also
holomorphic at any point close enough to a.
Theorem 11.4.6. Suppose that the power series
\[ f(z) = \sum_{n=0}^\infty c_n (z - a)^n \tag{11.21} \]
converges inside the disk B(a; r), r > 0. Then, f is holomorphic and analytic in B(a; r), f admits derivatives $f^{(k)}$ of any order k ∈ Z+ , all of which are holomorphic and analytic in B(a; r). Moreover,
\[ f^{(k)}(z) = \sum_{n=k}^\infty \frac{n!}{(n-k)!}\,c_n (z - a)^{n-k}, \qquad z \in B(a; r), \tag{11.22} \]
and $c_n = \frac{f^{(n)}(a)}{n!}$ for each n ∈ Z+ .

Proof. We first show that (a) f is analytic at any point w ∈ B(a; r), and then that (b) f and
f ′ are analytic and holomorphic on B(a; r). For derivatives of order k > 1, the statement
will follow by applications of (a) and (b) inductively on f (k−1) . The last statement follows
by setting z = a in (11.22).
The convergence of the power series f in B(a; r) implies that $r \le 1/\limsup_n \sqrt[n]{|c_n|}$. Since $\lim_n \sqrt[n]{n} = 1$, we conclude that the power series (11.22) (k = 1) converges absolutely in B(a; r). Let w ∈ B(a; r) and choose δ > 0 so that ρ := |a − w| + δ < r. Then, for any z ∈ B(w; δ) we have that
\[ \sum_{n=0}^\infty c_n (z - a)^n = \sum_{n=0}^\infty c_n \sum_{j=0}^n \binom{n}{j} (z - w)^j (w - a)^{n-j} = \sum_{n=0}^\infty \sum_{j=0}^\infty c_{n,j}(z), \]
where $c_{n,j}(z) = \binom{n}{j} c_n (z - w)^j (w - a)^{n-j} 1_{[0,n]}(j)$. Observe that if u = a + |a − w| + |w − z|, then |u − a| < r and thus,
\[ \sum_{n=0}^\infty \sum_{j=0}^\infty |c_{n,j}(z)| = \sum_{n=0}^\infty |c_n| \big(|z - w| + |w - a|\big)^n = \sum_{n=0}^\infty |c_n| (u - a)^n < \infty. \]

By Theorem A.2.7, for all z ∈ B(w; δ) we have that
\begin{align*}
f(z) &= \sum_{n=0}^\infty \sum_{j=0}^\infty c_{n,j}(z) = \sum_{j=0}^\infty \sum_{n=0}^\infty c_{n,j}(z) = \sum_{j=0}^\infty \sum_{n=j}^\infty \binom{n}{j} c_n (w - a)^{n-j} (z - w)^j \\
&= \sum_{j=0}^\infty b_j (z - w)^j,
\end{align*}
where $b_j = \sum_{n=j}^\infty \binom{n}{j} c_n (w - a)^{n-j}$. This shows that f is analytic at w. From
\[ \frac{f(z) - f(w)}{z - w} = b_1 + \sum_{j=2}^\infty b_j (z - w)^{j-1} \]
and the continuity of power series it follows that f is holomorphic at w and
\[ f'(w) = b_1 = \sum_{n=1}^\infty n c_n (w - a)^{n-1}. \]
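The recentering step in the proof can be illustrated numerically. For the geometric series $f(z) = \sum_n z^n = 1/(1 - z)$ (so a = 0 and cn = 1), the coefficients around a new center w are known in closed form: $b_j = (1 - w)^{-(j+1)}$. The Python sketch below computes truncated versions of the sums defining bj and compares them with these values. The truncation length and the sample points are our own illustrative choices.

```python
import math


def recentered_coefficient(coeff, shift, j, terms=400):
    """Truncation of b_j = sum_{n >= j} C(n, j) * c_n * (w - a)^(n - j).

    `coeff(n)` returns c_n and `shift` stands for w - a.
    """
    return sum(math.comb(n, j) * coeff(n) * shift ** (n - j)
               for n in range(j, j + terms))


def geometric(n):
    """Coefficients c_n = 1 of f(z) = 1 / (1 - z) on B(0; 1)."""
    return 1.0


w = 0.3 + 0.2j    # new center, |w| < 1
z = 0.35 + 0.1j   # evaluation point with |z - w| < 1 - |w|

b = [recentered_coefficient(geometric, w, j) for j in range(40)]
series_value = sum(b_j * (z - w) ** j for j, b_j in enumerate(b))

print(abs(b[1] - 1 / (1 - w) ** 2))      # b_1 against f'(w)
print(abs(series_value - 1 / (1 - z)))   # recentered series against f(z)
```

Both printed differences are at the level of rounding error, which is consistent with f ′ (w) = b1 and with the recentered series representing f on B(w; δ).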

Example 11.4.7. The complex exponential function
\[ \exp(z) := \sum_{n=0}^\infty \frac{1}{n!} z^n \]
extends the exponential function on R to C. The power series that defines the exponential function has radius of convergence ∞. By Theorem 11.4.6, exp ∈ H(C) and exp′ = exp. It is a simple exercise to verify that the formula exp(z + w) = exp(z) exp(w) holds for all z, w ∈ C. If z = x + iy (x, y ∈ R), it is easy to show that
\[ \exp(z) = e^x(\cos(y) + i\sin(y)). \tag{11.23} \]
When x = 0, equation (11.23) is known as Euler's formula. The complex trigonometric functions defined by $\cos(z) = \frac{e^{iz} + e^{-iz}}{2}$ and $\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}$ extend the usual real trigonometric functions to C.
Example 11.4.8. (Logarithmic branches) Given a real number θ0 and using polar coordinates, every z ∈ C \ {0} can be expressed uniquely in the form $z = re^{i\theta} = r(\cos\theta + i\sin\theta)$ where r = |z| and θ ∈ [θ0 , θ0 + 2π). The angle θ is called the argument of z, which we denote by argθ0 (z). Set Ωθ0 = {z ∈ C : |z| > 0, θ0 < arg(z) < θ0 + 2π} and define Lθ0 : Ωθ0 → R × (θ0 , θ0 + 2π) by
\[ z \mapsto \log(|z|) + i\arg_{\theta_0}(z), \]
where log is the usual logarithm function on the real line and R × (θ0 , θ0 + 2π) is identified with the strip {x + iy : x ∈ R, θ0 < y < θ0 + 2π}. Lθ0 is a bijective function whose inverse is the exponential function restricted to R × (θ0 , θ0 + 2π). Since exp ∈ H(C) and exp′ = exp ≠ 0, Lθ0 is differentiable as a function on the plane, and its derivative satisfies the Cauchy–Riemann equations. Hence Lθ0 ∈ H(Ωθ0 ) and $L'_{\theta_0}(z) = \frac{1}{z}$ for z ∈ Ωθ0 . The function Lθ0 is called the θ0 –branch of logarithm. The branch L−π is called the principal branch of logarithm. When the branch of logarithm is clear from the context, we use log to denote the function Lθ0 .

Example 11.4.9. (Complex powers) Let Lθ0 be the branch of logarithm defined on Ωθ0 =
{z ∈ C : |z| > 0, θ0 < arg(z) < θ0 + 2π}. For any α ∈ C, the complex power function pα
on Ωθ0 is defined as
pα : z 7→ z α := exp(αLθ0 (z)), z ∈ Ωθ0
Then, pα ∈ H(Ωθ0 ) and p′α (z) = αz α−1 on Ωθ0 . If α ∈ Z, then pα coincides with the usual
integer power function restricted to Ωθ0 .
Corollary 11.4.10. If $f(z) = \sum_{n=0}^\infty c_n (z - a)^n$ for all z ∈ B(a; r) and f ′ ≡ 0, then f ≡ c0 .

Proof. If f ′ ≡ 0, then ncn = 0 for all n ∈ N; hence, f (z) = c0 for all z ∈ B(a; r).

11.5. Cauchy formula

As we will see, the converse of Theorem 11.4.6 holds, that is, any holomorphic function f
in an open set D ⊂ C is analytic on D. To this end, we will make use of integration of
functions over paths. A path is a continuous map γ : [a, b] → C for which there are points
a = t0 < . . . < tn = b such that $\gamma \in C^1([t_k, t_{k+1}])$, k = 0, . . . , n − 1. We will often use γ∗ to denote γ([a, b]). The integral of a function f over γ is defined as
\[ \int_\gamma f := \int_a^b f(\gamma(t))\gamma'(t)\,dt. \]

The following result, based on Theorems 11.4.2 and 11.4.6, plays a very important role in
the theory of complex functions.
Theorem 11.5.1. Let γ be a closed path in the complex plane and D = C \ γ ∗ . The map
on D defined by
\[ \operatorname{Ind}_\gamma(z) = \frac{1}{2\pi i} \int_\gamma \frac{d\xi}{\xi - z} \]
is an integer valued function, constant on each connected component of D and 0 in the
unbounded component of D.

Proof. Let z ∈ D be fixed and let the interval [a, b] be the parameter domain of the closed path γ. Consider the map
\[ \varphi(t) = \exp\Big(\int_a^t \frac{\gamma'(s)}{\gamma(s) - z}\,ds\Big), \qquad t \in [a, b]. \]
We will show that ϕ(b) = 1. The fundamental theorem of calculus implies that
\[ \varphi'(t) = \frac{\varphi(t)\gamma'(t)}{\gamma(t) - z}, \]
which in turn, implies that
\[ \frac{d}{dt}\Big(\frac{\varphi}{\gamma - z}\Big) = 0. \]

Consequently, the map ϕ/(γ − z) is a constant function over the interval [a, b]. In particular,
\[ \frac{\varphi(b)}{\gamma(b) - z} = \frac{\varphi(a)}{\gamma(a) - z} = \frac{1}{\gamma(b) - z} \]
since ϕ(a) = 1 and γ(b) = γ(a). Therefore, ϕ(b) = exp(2πi Indγ (z)) = 1 and thus, Indγ (z) ∈ Z.

To prove the last statement, observe that Indγ is analytic on D by Theorem 11.4.2; being an integer valued function, it follows that Indγ is constant on each connected component of D. Since γ∗ is compact, we can choose a ball large enough that contains it. The complement of this ball is contained in one connected component of D; thus, D has a unique unbounded component. Since
\[ |\operatorname{Ind}_\gamma(z)| \le \frac{\Lambda(\gamma)}{\operatorname{dist}(z, \gamma^*)}, \]
where Λ(γ) denotes the length of γ, and the right hand side tends to 0 as |z| → ∞, we conclude that Indγ (z) = 0 for all z in the unbounded component of D.

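Example 11.5.2. If γ is a positively oriented circle of radius r > 0 centered at a, then
\[ \operatorname{Ind}_\gamma(z) = \begin{cases} 1 & \text{if } |z - a| < r, \\ 0 & \text{if } |z - a| > r. \end{cases} \]
Indeed, consider the parameterization $\gamma(t) = a + re^{it}$, with 0 ≤ t ≤ 2π. By Theorem 11.5.1 it is enough to consider z = a. Then,
\[ \operatorname{Ind}_\gamma(a) = \frac{1}{2\pi i} \int_\gamma \frac{dz}{z - a} = \frac{r}{2\pi} \int_0^{2\pi} e^{it}(re^{it})^{-1}\,dt = 1. \]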
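The index of a circle can also be approximated directly from its defining integral. The following Python sketch discretizes $\frac{1}{2\pi i}\int_\gamma \frac{d\xi}{\xi - z}$ with a uniform Riemann sum in the parameter t, which is very accurate for smooth periodic integrands. The function name `winding_number`, the sample count and the test points are assumptions made for illustration.

```python
import cmath
import math


def winding_number(path, path_derivative, z, samples=2000):
    """Approximate (1 / (2 pi i)) * integral of gamma'(t) / (gamma(t) - z) dt.

    The closed path is assumed to be parametrized on [0, 2 pi] and to avoid z.
    """
    step = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        t = k * step
        total += path_derivative(t) / (path(t) - z)
    return total * step / (2j * math.pi)


center, radius = 1 + 1j, 2.0


def circle(t):
    """Positively oriented circle gamma(t) = a + r e^{it}."""
    return center + radius * cmath.exp(1j * t)


def circle_derivative(t):
    """gamma'(t) = i r e^{it}."""
    return 1j * radius * cmath.exp(1j * t)


inside = winding_number(circle, circle_derivative, center + 0.5)
outside = winding_number(circle, circle_derivative, center + 3j)
print(inside, outside)
```

Up to rounding, the first value is 1 and the second is 0, matching the two cases |z − a| < r and |z − a| > r.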
Lemma 11.5.3. If f is the derivative of a function F ∈ H(D), then
\[ \int_\varphi f = 0 \]
for any closed path ϕ in D. In particular, $\int_\varphi z^n\,dz = 0$ for all integers n ≠ −1 and any closed path ϕ in C \ {0} (in C when n ≥ 0).

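Proof. F = [F1 F2 ]⊺ is a differentiable function on D ⊂ R2 . An application of the chain rule gives
\begin{align*}
\frac{d}{dt} F\circ\varphi &= \Big[\frac{d}{dt}(F_1\circ\varphi) \quad \frac{d}{dt}(F_2\circ\varphi)\Big]^\intercal = \Big[(\nabla F_1)\circ\varphi\cdot\varphi' \quad (\nabla F_2)\circ\varphi\cdot\varphi'\Big]^\intercal \\
&= (f\circ\varphi)\varphi'.
\end{align*}
The conclusion follows from the fundamental theorem of Calculus. In particular, $F(z) = \frac{z^{n+1}}{n+1}$, n ≠ −1, is holomorphic on D = C \ {0} and $F'(z) = z^n$.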

Theorem 11.5.4. (Cauchy's theorem for a triangle) Let D be an open set in C and p ∈ D. If f ∈ H(D \ {p}) and f ∈ C(D) then $\int_{\partial\triangle} f = 0$ for every triangle △ ⊂ D.

Proof. Let A, B and C be the vertexes of the triangle △ := △0 and consider ∂△0 as the
piecewise linear curve that goes from A to B, from B to C and then from C to A.

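Case (a) Assume first that p ∉ △0 . Let C′ , A′ and B′ be the midpoints of the segments AB, BC and CA respectively. By joining the midpoints with linear segments we divide the triangle △0 into four congruent sub-triangles $\triangle^{(1)}, \dots, \triangle^{(4)}$ and obtain
\[ \int_{\partial\triangle_0} f(z)\,dz = \sum_{j=1}^4 \int_{\partial\triangle^{(j)}} f(z)\,dz. \]
By the triangle inequality, there is at least one sub-triangle, which we denote by △1 , such that
\[ \Big|\int_{\partial\triangle_1} f(z)\,dz\Big| \ge \frac{1}{4}\Big|\int_{\partial\triangle_0} f(z)\,dz\Big|. \]
Applying the same argument to △1 in place of △0 and continuing by induction, we obtain a sequence of triangles △n ⊂ △n−1 such that
\[ \Big|\int_{\partial\triangle_n} f(z)\,dz\Big| \ge \frac{1}{4^n}\Big|\int_{\partial\triangle_0} f(z)\,dz\Big|. \tag{11.24} \]
Observe that $2^{-n}\operatorname{diam}(\triangle_0) = \operatorname{diam}(\triangle_n) \le \Lambda(\partial\triangle_n) = 2^{-n}\Lambda(\partial\triangle_0)$; hence, the intersection $\bigcap_n \triangle_n$ consists of a single point z0 ∈ △0 . Also, since f is holomorphic at z0 , given ε > 0, there is δ > 0 such that
\[ |f(z) - f(z_0) - f'(z_0)(z - z_0)| < \varepsilon|z - z_0| \]
whenever |z − z0 | < δ. By Lemma 11.5.3, we obtain that for all n large enough
\[ \Big|\int_{\partial\triangle_n} f(z)\,dz\Big| = \Big|\int_{\partial\triangle_n} \big(f(z) - f(z_0) - f'(z_0)(z - z_0)\big)\,dz\Big| \le \Lambda^2(\partial\triangle_0)\,\frac{\varepsilon}{4^n}. \tag{11.25} \]
Combining (11.25) with (11.24) and letting ε → 0 we obtain $\int_{\partial\triangle_0} f(z)\,dz = 0$.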

Case (b) Assume p is one of the vertexes of △0 , say A. The continuity of f at p implies that for any ε > 0, there is δ > 0 such that |f (z) − f (p)| < ε whenever |z − p| < δ. Let X and Y be points on AB and AC within δ distance from A and consider the triangles AXY , XBC and CY X. From Case (a) we have that
\[ \Big|\int_{\partial\triangle_0} f(z)\,dz\Big| = \Big|\int_{\partial\triangle_{AXY}} f(z)\,dz\Big| = \Big|\int_{\partial\triangle_{AXY}} \big(f(z) - f(p)\big)\,dz\Big| \le 4\delta\varepsilon. \]
Therefore, $\int_{\partial\triangle_0} f = 0$.

Case (c) Suppose $p \in \triangle_0^{\circ}$. By considering the triangles ABp, BCp and CAp, Case (b) shows that $\int_{\partial\triangle_0} f = 0$.

Theorem 11.5.5. (Morera's theorem) Suppose D is an open convex subset in the complex plane and let f be a continuous function in D. Then, $\int_{\partial\triangle} f = 0$ for any triangle △ ⊂ D if and only if there is F ∈ H(D) such that F ′ = f .

Proof. Sufficiency follows from Lemma 11.5.3. To prove necessity, let p ∈ D be fixed. The convexity of D allows us to define the function $F(z) = \int_{[p,z]} f$, where [p, z] denotes the straight line segment from p to z. For w ∈ D fixed, the hypothesis applied to the triangle with vertexes p, w and z shows that
\[ \frac{F(z) - F(w)}{z - w} - f(w) = \frac{1}{z - w} \int_{[w,z]} \big(f(\xi) - f(w)\big)\,d\xi. \]
The continuity of f shows that for any ε > 0 there is δ > 0 such that |f (ξ) − f (w)| < ε for all |ξ − w| < δ. Thus, if |z − w| < δ, then
\[ \Big|\frac{F(z) - F(w)}{z - w} - f(w)\Big| < \varepsilon. \]
This shows that F ∈ H(D) and F ′ (w) = f (w) for all w ∈ D.
Theorem 11.5.6. (Cauchy's theorem in a convex set) Suppose D is an open convex subset in the complex plane and let γ be a closed path in D. If f ∈ H(D), then
\[ f(z)\operatorname{Ind}_\gamma(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(\xi)}{\xi - z}\,d\xi \tag{11.26} \]
for all z ∈ D \ γ∗ .

Proof. Fix z ∈ D \ γ∗ . The function g on D defined by
\[ g(\xi) = \begin{cases} \dfrac{f(\xi) - f(z)}{\xi - z} & \text{if } \xi \ne z, \\[1ex] f'(z) & \text{if } \xi = z \end{cases} \]
satisfies the assumptions of Cauchy's theorem for a triangle. By Morera's theorem, g = G′ for some G ∈ H(D); by Lemma 11.5.3, $\int_\gamma g = 0$ for any closed path γ in D. The conclusion follows from Theorem 11.5.1.
Theorem 11.5.7. Let D be an open subset in the complex plane. f ∈ H(D) iff f is analytic
on D. Consequently, if f ∈ H(D) then, f ′ ∈ H(D) and for any a ∈ D and r > 0 such that
B(a; r) ⊂ D,
\[ f(z) = \sum_{n=0}^\infty c_n (z - a)^n, \qquad z \in B(a; r), \tag{11.27} \]
where
\[ c_n = \frac{f^{(n)}(a)}{n!} = \frac{1}{2\pi i} \int_\gamma \frac{f(\xi)}{(\xi - a)^{n+1}}\,d\xi, \tag{11.28} \]
and γ is the positively oriented circle of radius r centered at a. Moreover, with $M := \max_{\xi\in\gamma^*} |f(\xi)|$,
\[ |f^{(n)}(a)| \le \frac{n!}{2\pi} \int_\gamma \frac{|f(\xi)|}{r^{n+1}}\,|d\xi| \le \frac{n!M}{r^n}. \tag{11.29} \]
If R is the radius of convergence of the series (11.27) then, r < R. The inequalities (11.29) are known as Cauchy estimates.

Proof. Only necessity needs to be proved. For any a ∈ D let 0 < r < q be such that
B(a; r) ⊂ B(a; q) ⊂ D. Let γ be the positively oriented circle of radius r centered at a.
Applying Cauchy’s theorem on the convex set B(a; q) we obtain that
\[ f(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(\xi)}{\xi - z}\,d\xi, \qquad z \in B(a; r), \]
since Indγ (z) = 1 for all z ∈ B(a; r). All conclusions follow from Theorem 11.4.2 and
Theorem 11.4.6.
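Formula (11.28) gives a practical way to approximate power series coefficients from boundary values. In the Python sketch below the circle is parametrized as $\xi = a + re^{i\theta}$, so that $d\xi = i(\xi - a)\,d\theta$ and the integral becomes an average of $f(\xi)/(\xi - a)^n$ over θ. We test it on exp, whose coefficients at 0 are 1/n!, and check the Cauchy estimates with M = e^r. The function name and the number of sample points are illustrative assumptions.

```python
import cmath
import math


def cauchy_coefficient(f, a, r, n, samples=256):
    """Approximate c_n = (1/(2 pi i)) * integral of f(xi)/(xi - a)^(n+1) d(xi)
    over the positively oriented circle |xi - a| = r.
    """
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        xi = a + r * cmath.exp(1j * theta)
        # d(xi) = i (xi - a) d(theta): the integrand becomes f(xi)/(xi - a)^n.
        total += f(xi) / (xi - a) ** n
    return total / samples


radius = 2.0
bound = math.exp(radius)  # M = max of |exp| on the circle |xi| = 2

for n in range(8):
    c_n = cauchy_coefficient(cmath.exp, 0.0, radius, n)
    print(n, c_n.real, 1 / math.factorial(n), bound / radius ** n)
```

The computed coefficients match 1/n! to rounding error and stay below M/r^n, as (11.29) predicts after dividing by n!.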
Corollary 11.5.8. Suppose f ∈ C(D), where D is an open set in the plane. If $\int_{\partial\triangle} f(z)\,dz = 0$ for any closed triangle △ ⊂ D, then f ∈ H(D).

Proof. Let B(a; r) ⊂ D. Then, by Morera’s theorem, f = F ′ on B(a; r) for some F ∈

H(B(a; r)) and, by Theorem 11.5.7, f = F ′ ∈ H(B(a; r)).
Corollary 11.5.9. If f ∈ H(B(a; R) \ {a}) is bounded, then limz→a f (z) = L exists and,
after setting f (a) = L, f ∈ H(B(a; R)).
Remark 11.5.10. Under the conditions of Corollary 11.5.9, the point z = a is said to be
a removable singularity .

Proof. Let $h(z) = (z - a)^2 f(z)$ if z ∈ B(a; R) \ {a} and h(a) = 0. It is easy to check that h ∈ H(B(a; R)) and that h′ (a) = 0. Hence h admits a power series expansion
\[ h(z) = \sum_{n\ge 2} c_n (z - a)^n = (z - a)^2 \sum_{n\ge 0} c_{n+2} (z - a)^n, \]
whence it follows that $f(z) = \sum_{n\ge 0} c_{n+2} (z - a)^n$ for all z ∈ B(a; R) \ {a} and $\lim_{z\to a} f(z) = c_2$. Setting f (a) = c2 we obtain that f ∈ H(B(a; R)).
Theorem 11.5.11. Suppose {fn : n ∈ N} ⊂ H(D) converges to a function f uniformly on
compact subsets of D. Then f ∈ H(D) and fn′ also converges to f ′ uniformly on compact
subsets of D.

Proof. Since convergence is uniform on each compact disk contained in D, f is continuous on D. Let △ be a triangle contained in D. By Theorem 11.5.4 and the uniform convergence of fn on the compact set ∂△,
\[ 0 = \int_{\partial\triangle} f_n(z)\,dz \to \int_{\partial\triangle} f(z)\,dz. \]
By Corollary 11.5.8 f ∈ H(D). Let K ⊂ D be a nonempty compact set. There is δ > 0 such that the compact set $K^\delta$ = {x ∈ D : d(x, K) ≤ δ} is contained in D. Using Cauchy's estimates (11.29) for fn − f on the circle of radius δ centered at z we obtain that
\[ |f_n'(z) - f'(z)| \le \frac{\|f_n - f\|_{u,K^\delta}}{\delta} \]
for all z ∈ K. Therefore $\lim_n \|f_n' - f'\|_{u,K} = 0$.
Theorem 11.5.12. (Maximum modulus principle) If f ∈ H(U ) and f is not constant,
then for any B(a; r) ⊂ U , |f (a)| < maxz∈∂B(a;r) |f (z)|.

Proof. Suppose there is B(a; r) ⊂ U for which the opposite holds, that is, $|f(a)| \ge \max_{z\in\partial B(a;r)} |f(z)|$. From Cauchy's formula
\[ |f(a)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(a + re^{i\theta})|\,d\theta \le |f(a)|, \]
it follows that |f | ≡ |f (a)| in ∂B(a; r). If $f(z) = \sum_{n\ge 0} a_n (z - a)^n$ is the power series expansion of f around a, we obtain by dominated convergence that
\begin{align*}
|f(a)|^2 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(a + re^{i\theta})|^2\,d\theta \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big|\sum_{n\ge 0} a_n r^n e^{in\theta}\Big|^2\,d\theta \\
&= \sum_{n\ge 0} |a_n|^2 r^{2n}.
\end{align*}
Since $|f(a)|^2 = |a_0|^2$, we have an = 0 for all n ≥ 1, which means that f ≡ a0 = f (a) in B(a; r). As U is open and connected, it follows that f is constant, contradicting the assumption on f .

Remark 11.5.13. The behavior of an analytic function near the boundary of its disk of convergence may be very complicated, as the following examples demonstrate.

(a) The power series $\sum_{n=0}^\infty z^n$ and $\sum_{n=0}^\infty n z^n$ diverge at every point z ∈ S1 . At z = 1, both series diverge to +∞. For z ∈ S1 \ {1} the partial sums of each series oscillate.

(b) The power series $\sum_{n=1}^\infty \frac{z^n}{n}$ converges at every point z ∈ S1 \ {1}. To see this, set $S_N = \sum_{n=1}^N z^n$. Then, by summation by parts
\[ \sum_{n=M}^N \frac{z^n}{n} = \frac{1}{N} S_N - \frac{1}{M} S_{M-1} - \sum_{n=M}^{N-1} \Big(\frac{1}{n+1} - \frac{1}{n}\Big) S_n. \]
Hence
\[ \Big|\sum_{n=M}^N \frac{z^n}{n}\Big| \le \frac{2}{|1 - z|}\Big(\frac{1}{N} + \frac{1}{M}\Big) + \frac{2}{|1 - z|}\Big(\frac{1}{M} - \frac{1}{N}\Big) \le \frac{4}{|1 - z|}\,\frac{1}{M}, \]
which is small for all M large enough.

(c) The power series $\sum_{n=1}^\infty \frac{z^n}{n^2}$ converges at every point z ∈ S1 .
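The tail estimate in part (b) is easy to test numerically. The Python sketch below evaluates blocks $\sum_{n=M}^N z^n/n$ at a point of S1 \ {1} and compares them with the bound $4/(|1 - z| M)$. It also compares a long partial sum with −log(1 − z), the principal branch value that Abel's theorem identifies as the sum of the series. The sample point z = e^{2i} and the block sizes are arbitrary illustrative choices.

```python
import cmath


def block_sum(z, first, last):
    """Sum of z^n / n for n = first, ..., last."""
    return sum(z ** n / n for n in range(first, last + 1))


def tail_bound(z, first):
    """Bound 4 / (|1 - z| * M) from summation by parts, with M = first."""
    return 4 / (abs(1 - z) * first)


z = cmath.exp(2j)  # a point on the unit circle different from 1

for first, last in ((10, 50), (100, 5000), (1000, 20000)):
    print(first, last, abs(block_sum(z, first, last)), tail_bound(z, first))

partial = block_sum(z, 1, 20000)
limit = -cmath.log(1 - z)
print(abs(partial - limit), tail_bound(z, 20001))
```

The blocks stay below the bound, and the partial sum approaches −log(1 − z) only at the slow rate 1/M, in contrast with the geometric convergence inside the disk.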
Theorem 11.5.14. Suppose $f(z) = \sum_{n\ge 0} a_n z^n$ has radius of convergence one. Then, there is a sequence {zn : n ∈ N} ⊂ B(0; 1) =: U with |zn | → 1 along which f is bounded.

Proof. Assume the statement is false. Then, for any m ∈ N, there is n ∈ N such that if $1 - \frac{1}{n} < |z| < 1$, |f (z)| > m. This implies that the number of zeroes of f in U is finite. Let p be a polynomial with the same zeroes, including multiplicities, as f . Then $g = \frac{f}{p} \in H(U)$ has no zeroes in U . It follows that 1/g ∈ H(U ) and $\lim_{|z|\to 1} 1/g(z) = 0$. This contradicts the maximum modulus principle.

Example 11.5.15. The power series
\[ f(z) = \sum_{n=0}^\infty z^{n!} \]
has radius of convergence 1. Hence f ∈ H(U ) and has no analytical extension to any open set properly containing U . For any rational number $\frac{p}{m}$ we have that along $\{r\exp(2\pi i \frac{p}{m}) : 0 \le r < 1\}$, $\lim_{r\to 1} |f(z)| = \infty$.

On the other hand, the power series
\[ h(z) = \sum_{n=0}^\infty \frac{z^{n!}}{\exp(\sqrt{n!})} \]
has radius of convergence $\lim_{n\to\infty} \exp\big(\frac{1}{\sqrt{n!}}\big) = 1$. Thus h ∈ H(U ) and cannot be extended to any open set properly containing U . Since $\exp(-\sqrt{n!}) \le e^{-n}$ for n ≥ 4, we have that $\sum_n \exp(-\sqrt{n!}) < \infty$. Hence, by dominated convergence $\lim_{r\to 1-} h(re^{i\theta}) = \sum_n \frac{e^{i n!\theta}}{\exp(\sqrt{n!})} =: \tilde h(\theta)$, and clearly h̃ ∈ C(S1 ). For another interesting example, see Exercise 11.9.12.

Corollary 11.5.16. If f ∈ H(D), where D is an open region in C, then f , u = Re(f ) and v = Im(f ) are harmonic, that is, for all x ∈ D
\[ \triangle f(x) = \triangle u(x) = \triangle v(x) = 0, \]
where $\triangle = \partial^2_{xx} + \partial^2_{yy}$.

Proof. f ∈ H(D) implies that f ∈ C ∞ (D). The conclusion follows from the Cauchy–
Riemann equations ux = vy , uy = −vx .

Example 11.5.17. As in Example 11.4.4, for any complex measure µ on S1 , the function $F(z) = \int_{S^1} \frac{e^{it} + z}{e^{it} - z}\,\mu(de^{it})$ is analytic on B(0; 1). As linear combinations of harmonic functions are harmonic, it follows that $U(z) = \int_{S^1} \operatorname{Re}\Big(\frac{e^{it} + z}{e^{it} - z}\Big)\,\mu(de^{it})$ is harmonic on B(0; 1). The kernel $P(e^{it}, z) = \operatorname{Re}\Big(\frac{e^{it} + z}{e^{it} - z}\Big)$ is called the Poisson kernel on the unit disk.

A function f is said to be entire if f ∈ H(C). The following result is an immediate

consequence of Cauchy’s theorem.

Theorem 11.5.18. (Liouville's theorem) If f is a bounded entire function, then f is constant.

Proof. Suppose that |f (z)| ≤ M for all z ∈ C. The Cauchy estimates (11.29) imply that, for every r > 0,
\[ |f^{(n)}(0)| \le \frac{n!M}{r^n} \qquad (n \in \mathbb{N}). \]
Letting r → ∞ gives $f^{(n)}(0) = 0$ for all n ∈ N. Therefore, f (z) ≡ f (0).

Example 11.5.19. (Fundamental Theorem of Algebra) Every polynomial in C of degree

n ≥ 1 has a complex root. To verify this statement, suppose p is a polynomial of degree
n ≥ 1. It is easy to check that |p(z)| → ∞ as |z| → ∞. If p did not vanish at any point, then
f = 1/p would be an entire and bounded function. But then f would have to be constant,
which is not possible.
Theorem 11.5.20. Suppose D ⊂ C is open and connected and suppose f ∈ H(D). Let Z(f ) = {z ∈ D : f (z) = 0}. Then, either Z(f ) = D or Z(f ) has no limit points in D. In the latter case, if a ∈ Z(f ) then there is an integer m = m(a), a neighborhood V ⊂ D of a and h ∈ H(D) such that
\[ f(z) = (z - a)^m h(z), \qquad z \in V, \]
and h(z) ≠ 0 for all z ∈ V .

Proof. Let A be the set of all limit points of Z(f ) in D. By continuity, Z(f ) is closed in
D and A ⊂ Z(f ). Being that A is closed in Z(f ), A is closed in D. The first statement will
follow if we show that A is open in D.

Each a ∈ D has a neighborhood Va′ where f admits the representation
\[ f(z) = \sum_{n=0}^\infty c_n (z - a)^n, \qquad z \in V_a'. \]
If all cn = 0, then f ≡ 0 on Va′ and Va′ ⊂ A. Hence, to show that A is open in D, it is enough to show that if a ∈ A, then cn = 0 for all n. Assume, on the contrary, that there is m ∈ Z+ such that cm ≠ 0 and cn = 0 whenever 0 ≤ n < m. Then
\[ f(z) = (z - a)^m \sum_{n=0}^\infty c_{n+m} (z - a)^n, \qquad z \in V_a'. \tag{11.30} \]
If $h(z) = \sum_{n=0}^\infty c_{n+m} (z - a)^n$, then h ∈ H(Va′ ). Since h(a) = cm ≠ 0, there is a possibly smaller neighborhood Va ⊂ Va′ of a on which h does not vanish. Hence a is an isolated point of Z(f ), that is a ∉ A. This shows that A is indeed open in D.

In the case where A = ∅, the second statement follows from the representation (11.30) when not all cn are zero.

The next result gives some conditions under which analytic functions in open domains
may be extended to larger domains.
Corollary 11.5.21. Let U and V be connected open sets in C and suppose f ∈ H(U ) and g ∈ H(V ). If U ∩ V ≠ ∅ and {z ∈ U ∩ V : f (z) = g(z)} admits a limit point in each component of U ∩ V then,
\[ h(z) := \begin{cases} f(z) & \text{if } z \in U, \\ g(z) & \text{if } z \in V \end{cases} \]
is a well defined function, and is the only analytic function in U ∪ V whose restriction to U (or to V ) equals f (respectively g).

Proof. By Theorem 11.5.20, f and g coincide on U ∩ V . Since U ∪ V is open and connected, h is a well defined function and h ∈ H(U ∪ V ). Uniqueness follows by another application of Theorem 11.5.20.
Suppose γ1 , . . . , γn are paths in the plane and $\Gamma^* = \bigcup_{k=1}^n \gamma_k^*$. Each γk induces a linear map γ̃k on C(Γ∗ ) given by $f \mapsto \int_{\gamma_k} f$. Define $\tilde\Gamma := \tilde\gamma_1 \dotplus \dots \dotplus \tilde\gamma_n$ on C(Γ∗ ) by
\[ \tilde\Gamma(f) = \sum_{k=1}^n \tilde\gamma_k(f) = \sum_{k=1}^n \int_{\gamma_k} f. \]
The objects Γ are called chains and if all γk are closed paths, then Γ is called a cycle. If each path γk is replaced by its opposite path (denoted formally by −γk ) given by t ↦ γk (b + a − t) (t ∈ [a, b]), then the resulting chain −Γ satisfies
\[ \int_{-\Gamma} f = -\int_\Gamma f, \qquad f \in C(\Gamma^*). \]
If z ∈ C \ Γ∗ , then the index IndΓ (z) is defined by
\[ \operatorname{Ind}_\Gamma(z) = \sum_{k=1}^n \operatorname{Ind}_{\gamma_k}(z). \]
Suppose D ⊂ C is a non-empty open set and γ, η are chains in D, i. e., γ ∗ ∪ η ∗ ⊂ D. If
Indγ (z) = 0 for all z ∈ C \ D, then γ is said to be homologous to 0 in D, denoted by
γ ∼ 0. If γ − η ∼ 0, then Indγ (z) = Indη (z) for all z ∈ C \ D; in such case, γ is said to be
homologous to η in D, denoted by γ ∼ η.
The following result extends theorem 11.5.6 to cycles homologous to 0.
Theorem 11.5.22. (General Cauchy’s theorem) Suppose f ∈ H(D) where D is a non–
empty open set in the complex plane. If γ is a cycle in D and γ ∼ 0, then
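\[ f(z)\operatorname{Ind}_\gamma(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w - z}\,dw, \qquad z \in D \setminus \gamma^*, \tag{11.31} \]
and
\[ \int_\gamma f(w)\,dw = 0. \tag{11.32} \]
If γ1 and γ2 are cycles in D and γ1 ∼ γ2 , then
\[ \int_{\gamma_1} f(w)\,dw = \int_{\gamma_2} f(w)\,dw. \tag{11.33} \]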

Proof. Consider the function g : D × D → C given by
\[ g(z, w) = \begin{cases} \dfrac{f(w) - f(z)}{w - z} & \text{if } z \neq w \\ f'(z) & \text{if } z = w \end{cases} \]
We claim that g ∈ C(D × D). It is enough to show that g is continuous at any point
(a, a) ∈ D × D. For ε > 0, there is r > 0 such that B(a; r) ⊂ D and |f′(z) − f′(a)| < ε for all
z ∈ B(a; r). For all z, w ∈ B(a; r) with z ≠ w, the path ξ(t) = z + t(w − z), t ∈ [0, 1], is
contained in B(a; r). Since f(w) − f(z) = ∫_ξ f′(λ) dλ,
\[ |g(z, w) - g(a, a)| = \left| \int_{\xi} \frac{f'(\lambda) - f'(a)}{w - z}\,d\lambda \right| = \left| \int_0^1 \big( f'(\xi(t)) - f'(a) \big)\,dt \right| < \varepsilon. \]
The continuity of g follows. Define
\[ h(z) = \frac{1}{2\pi i}\int_{\gamma} g(z, w)\,dw, \qquad z \in D. \]
Identity (11.31) will follow by showing that h(z) = 0 for all z ∈ D \ γ∗. The uniform continuity
of g on compact subsets of D × D along with the inequality
\[ |h(z) - h(z')| \le \frac{1}{2\pi}\int_{\gamma} |g(z, w) - g(z', w)|\,d|w| \]
shows that h is continuous on D. An application of Fubini's theorem shows that for any
closed triangle △ ⊂ D,
\[ \int_{\partial\triangle} h(z)\,dz = \frac{1}{2\pi i}\int_{\gamma} \left( \int_{\partial\triangle} g(z, w)\,dz \right) dw. \tag{11.34} \]

By Corollary 11.5.8, the map z 7→ g(w, z) is holomorphic on D for all w ∈ D fixed and so,
the integral in parenthesis in (11.34) is zero. Hence h ∈ H(D) by Corollary 11.5.8.

Let D1 = {z ∈ C \ γ∗ : Indγ(z) = 0}. Then, D1 contains the unbounded component of
C \ γ∗ and, by assumption, the complement of D. Define
\[ h_1(z) = \frac{1}{2\pi i}\int_{\gamma} \frac{f(w)}{w - z}\,dw, \qquad z \in D_1. \]
Clearly h1 ∈ H(D1) and h(z) = h1(z) for all z ∈ D ∩ D1. Since D1 contains C \ D,
there is a function ϕ ∈ H(C) whose restriction to D is h and whose restriction to D1 is
h1. Since Indγ(z) = 0 on the unbounded component V of C \ γ∗, D1 contains V and so,
lim_{|z|→∞} ϕ(z) = lim_{|z|→∞} h1(z) = 0. By Liouville's theorem, ϕ ≡ 0, thus proving that
h(z) = 0 on D.

To prove (11.32), fix a ∈ D \ γ∗ and define F(z) = (z − a)f(z). Since F(a) = 0, the first
part of the proof gives
\[ \int_{\gamma} f(w)\,dw = \int_{\gamma} \frac{F(w)}{w - a}\,dw = 2\pi i\,F(a)\,\mathrm{Ind}_{\gamma}(a) = 0. \]

Equation (11.33) follows by applying (11.32) to the cycle γ = γ1 − γ2.

Corollary 11.5.23. If K is a compact subset of an open set Ω ⊂ C, then there is a cycle
Γ ∼ 0 in Ω such that IndΓ(z) = 1 for all z ∈ K. In particular, for any f ∈ H(Ω)
\[ f(z) = \frac{1}{2\pi i}\int_{\Gamma} \frac{f(w)}{w - z}\,dw \]
for all z ∈ K. A cycle Γ satisfying the conditions above is said to surround K in Ω.

Proof. The last statement will follow as a consequence of the first statement and the general
Cauchy theorem.
By compactness, η′ := d(K, C \ Ω) > 0. Construct a grid of vertical and horizontal lines
forming closed squares whose edges lie in the grid and have length η := η′/2. Since K is compact,
only a finite number of those squares, say Q1, …, Qm, intersect K. The choice of η ensures
that these squares are contained in Ω. Orient the boundary of each such square Qj =
[nj η, (nj + 1)η] × [mj η, (mj + 1)η] counterclockwise, that is
∂Qj = γj1 +̇ γj2 +̇ γj3 +̇ γj4
where γjk, k = 1, …, 4, are the directed edges from η(nj, mj) to η(nj + 1, mj), from η(nj + 1, mj)
to η(nj + 1, mj + 1), from η(nj + 1, mj + 1) to η(nj, mj + 1), and from η(nj, mj + 1) to η(nj, mj)
respectively. Clearly
\[ \mathrm{Ind}_{\partial Q_j}(z) = \begin{cases} 1 & \text{if } z \in \mathrm{Int}(Q_j) \\ 0 & \text{if } z \in \mathbb{C} \setminus Q_j \end{cases} \]
Let Σ be the collection of all directed edges γjk (1 ≤ j ≤ m, 1 ≤ k ≤ 4). Remove from
Σ those directed edges whose opposites also appear in Σ. Let Φ be the remaining set of
directed edges. None of the edges in Φ intersects K, for if an edge ℓ of some square Qj
intersects K, then there is exactly one other square Qj′ having ℓ as a common side. Hence ℓ
appears twice with opposite orientations and so, it is an edge that is removed from Σ. We
claim that the edges in Φ form a cycle. To see this, notice that Φ is balanced in the sense
that for each vertex p appearing in Φ, the number of edges having p as initial point is the
same as the number of edges having p as end point. Now, starting with a vertex p0 := p,
choose γ1 = [p0, p1] ∈ Φ. Having chosen k distinct oriented edges γj = [pj−1, pj], 1 ≤ j ≤ k,
we stop if pk = p, in which case we have a closed path based at p. If pk ≠ p and
exactly r of the edges γ1, …, γk have pk as end point, then exactly r − 1 of those edges
have pk as initial point. Since Φ is balanced, there is another edge γk+1 ∈ Φ whose initial
point is pk. Since Φ is finite, at some finite step n we get an edge γn = [pn−1, p]. The edges
γ1, …, γn form a closed path based at p.
The remaining members of Φ clearly form a balanced collection of edges, so the same
construction may be applied to them. This shows that Φ has a finite partition Φ1, …, Φt, each of
which forms a closed path Γ1, …, Γt. The sum Γ of those closed paths is a cycle.
By construction, IndΓ(z) = ∑_{j=1}^m Ind∂Qj(z) for each z that is not in the boundary of any
Qj. Hence
\[ \mathrm{Ind}_{\Gamma}(z) = \begin{cases} 1 & \text{if } z \in \bigcup_{j=1}^{m} \mathrm{Int}(Q_j) \\ 0 & \text{if } z \in \mathbb{C} \setminus \bigcup_{j=1}^{m} Q_j \end{cases} \]
If z ∈ K ∩ ⋃_{j=1}^m ∂Qj, then z ∉ Γ∗ and z is a limit point of the interior of some Qj. Since
the function z ↦ IndΓ(z) is constant in each component of the complement of Γ∗, it follows
that IndΓ(z) = 1. Consequently
\[ \mathrm{Ind}_{\Gamma}(z) = \begin{cases} 1 & \text{if } z \in K \\ 0 & \text{if } z \notin \Omega \end{cases} \]

The following results give some conditions under which two closed paths γ0 and γ1
in an open set D are homologous. Two closed curves γ0 and γ1 in a topological space
X parameterized by the same interval [a, b] are homotopic if there is a continuous map
H : [0, 1] × [a, b] → X such that
H(0, ·) = γ0(·), H(1, ·) = γ1(·), H(s, a) = H(s, b)
for all 0 ≤ s ≤ 1.
If X is a path connected topological space and every closed curve is homotopic to a
constant curve (a point), then X is said to be simply connected.
Lemma 11.5.24. Let γ0 and γ1 be closed paths in C parameterized by the interval [a, b]. If
there is α ∈ C such that
(11.35) |γ1 (t) − γ0 (t)| < |α − γ0 (t)|, a≤t≤b
then, Indγ0 (α) = Indγ1 (α).

Proof. Clearly, α ∉ (γ0∗ ∪ γ1∗) and thus, γ = (γ1 − α)/(γ0 − α) is a closed path. A simple
computation shows that
\[ \frac{\gamma'}{\gamma} = \frac{\gamma_1'}{\gamma_1 - \alpha} - \frac{\gamma_0'}{\gamma_0 - \alpha}. \tag{11.36} \]
Also, by (11.35), |γ − 1| < 1. Hence, γ∗ ⊂ B(1; 1) and consequently, 0 belongs to the
unbounded component of C \ γ∗. Therefore, Indγ(0) = 0. Integration over [a, b] on both
sides of (11.36) gives the desired result.
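
As a numerical sanity check of Lemma 11.5.24, the following Python sketch approximates the index of a point with respect to a sampled closed path by adding up the increments of the argument of γ(t) − α. The helper name `winding_number`, the two sample paths and the number of sample points are illustrative assumptions and are not part of the notes.

```python
import cmath
import math

def winding_number(path, alpha, n=4000):
    """Approximate Ind_path(alpha) for a closed path parameterized by [0, 1].

    Sums principal-value argument increments of path(t) - alpha over a
    uniform partition; valid when consecutive samples subtend an angle < pi.
    """
    total = 0.0
    prev = path(0.0) - alpha
    for k in range(1, n + 1):
        cur = path(k / n) - alpha
        total += cmath.phase(cur / prev)  # increment in (-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))

def gamma0(t):
    # Unit circle traversed twice.
    return cmath.exp(4j * math.pi * t)

def gamma1(t):
    # Perturbation of gamma0 with |gamma1 - gamma0| = 0.3.
    return gamma0(t) + 0.3 * cmath.exp(14j * math.pi * t)

alpha = 0.2 + 0.1j
# Hypothesis (11.35) holds: |alpha - gamma0(t)| >= 1 - |alpha| > 0.3.
print(winding_number(gamma0, alpha), winding_number(gamma1, alpha))
```

Under these assumptions both indices agree, as the lemma predicts; moving α close to γ0∗ so that (11.35) fails removes that guarantee.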
Theorem 11.5.25. If γ0 and γ1 are homotopic closed paths in D, then γ0 ∼ γ1 .

Proof. Without loss of generality, suppose that γ0 and γ1 are both parameterized by I =
[0, 1]. There exists a continuous function H : I² → D such that H(0, ·) = γ0(·), H(1, ·) = γ1(·)
and H(s, 0) = H(s, 1) for all 0 ≤ s ≤ 1. Let α ∈ C \ D. Since H(I²) is compact, there
is ε > 0 such that
\[ \inf_{0 \le s, t \le 1} |H(s, t) - \alpha| > 2\varepsilon \tag{11.37} \]

Since H is uniformly continuous, there is an integer n > 1 such that

From (11.41), (11.42), and n + 2 applications of Lemma 11.5.24 we conclude that α has the
same index with respect to the paths γ0 , g0 , . . . , gn , γn .
Remark 11.5.26. The polygonal paths were taken instead of the closed curves γk(·) :=
H(k/n, ·) because H may not be differentiable. It is possible to extend the definition of
index to continuous curves by approximating them uniformly by smooth paths (Weierstrass
theorem with trigonometric polynomials); then, an application of Theorem 11.5.25 justifies
that this procedure does not depend on the particular approximation.
Lemma 11.5.27. Suppose D ⊂ C is a simply connected open set. If f ∈ H(D), then there
exists F ∈ H(D) such that F′ = f. Any two such F differ by a constant.

Proof. The assumption on D implies that ∫_γ f(w) dw = 0 for every closed path γ in D.
Therefore, for fixed z0 ∈ D, the function
\[ F(z) = \int_{\eta(z_0, z)} f(w)\,dw, \]
where η(z0, z) is any path in D joining z0 to z, is well defined. For any z ∈ D and ε > 0, there is a
neighborhood B(z; r) ⊂ D of z such that |f(w) − f(z)| < ε for all w ∈ B(z; r). Choosing η(z, z + h) as the
straight line segment joining z to z + h for all h with 0 < |h| < r gives
\[ \left| \frac{F(z + h) - F(z)}{h} - f(z) \right| \le \frac{1}{|h|} \int_{\eta(z, z+h)} |f(w) - f(z)|\,d|w| < \varepsilon. \]
This shows that F ∈ H(D) and F′(z) = f(z) for all z ∈ D.
If G ∈ H(D) satisfies G′ = f then H = F − G satisfies H′ ≡ 0. Since D is connected, it
follows that H is a constant function.
Theorem 11.5.28. Suppose D is open and simply connected. If f ∈ H(D) and f(z) ≠ 0
for all z ∈ D, then there exists g ∈ H(D) such that f = exp ◦ g. Any two such g differ by an
integer multiple of 2πi.

Proof. The assumptions imply that f′/f ∈ H(D). Applying Lemma 11.5.27 to f′/f in
place of f gives a function h ∈ H(D) such that h′ = f′/f. If ϕ(z) = e^{−h(z)} f(z), then ϕ′ ≡ 0
in D. Therefore, ϕ is a nonzero constant in D. For z0 ∈ D fixed, let w0 be such that ϕ(z0) = e^{w0}.
Then f(z) = exp(h(z) + w0). The function g(z) = h(z) + w0 satisfies the conclusion of
the Theorem.

Any function g that satisfies exp ◦ g = f is said to be a logarithm of f in D. If f admits
a logarithm function g in D, then complex powers of f are defined by setting f^α = exp(αg)
for any α ∈ C.
Example 11.5.29. If f is the identity map f(z) = z on D = B(1; 1) then, by Theorem 11.5.28,
there exists a unique function L ∈ H(D) such that z = exp(L(z)) and L(1) = 0. Clearly
L′(z) = 1/z. We now prove that L is given by
\[ L(z) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} (z - 1)^n, \qquad z \in B(1; 1). \tag{11.43} \]
If F is given by the right hand side of (11.43), then F ∈ H(D), and F′(z) = 1/z on D. As
D is connected, L and F differ by a constant in D, and since L(1) = 0 = F(1), we conclude
that L ≡ F. The function L given by (11.43) coincides with the restriction to B(1; 1) of
the principal logarithm function introduced in Example 11.4.8.
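
The identity L ≡ F can be checked numerically. The sketch below, in which the function name `log_series`, the truncation level and the sample points are illustrative assumptions, compares partial sums of the series (11.43) with the principal logarithm provided by Python's `cmath.log`.

```python
import cmath

def log_series(z, terms=200):
    """Partial sum of sum_{n>=1} (-1)^(n-1) (z-1)^n / n, meant for |z - 1| < 1."""
    w = z - 1
    total = 0j
    power = 1 + 0j
    for n in range(1, terms + 1):
        power *= w                          # power = (z - 1)^n
        total += (-1) ** (n - 1) * power / n
    return total

z = 1.3 + 0.4j  # |z - 1| = 0.5
print(abs(log_series(z) - cmath.log(z)))
```

The truncation error decays like |z − 1|^terms, so points close to the boundary of B(1; 1) need many more terms.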
Example 11.5.30. (Complex binomial expansion) Let log(re^{iθ}) = log(r) + iθ be the
principal logarithm function on Ω := {re^{iθ} : r > 0, −π < θ < π} as in Example 11.4.8. For any
α ∈ C and k ∈ Z+ define α_(k) = 1 if k = 0 and α_(k) = α · … · (α − k + 1) otherwise. Define
\[ \binom{\alpha}{k} := \frac{\alpha_{(k)}}{k!}. \]
Suppose α ∈ C \ Z+ and let hα(z) = (1 + z)^α := exp(α log(1 + z)). Repeated
differentiation gives hα^{(k)}(z) = α_(k) hα−k(z) and so, hα^{(k)}(0) = α_(k) ≠ 0 for all k ∈ Z+. It
follows that hα has power series expansion around 0 given by
\[ h_\alpha(z) = (1 + z)^\alpha = \sum_{k=0}^{\infty} \binom{\alpha}{k} z^k, \qquad |z| < 1. \tag{11.44} \]
Indeed, setting ak := \binom{α}{k}, we have that
\[ R := \lim_{k\to\infty} \left| \frac{a_{k+1}}{a_k} \right| = \lim_{k\to\infty} \frac{|\alpha - k|}{k + 1} = 1. \]
Hence the radius of convergence of the power series (11.44) is 1/R = 1. Notice that if α ∈ Z+ then
equation (11.44) coincides with the usual binomial expansion of elementary algebra.
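
A short numerical illustration of (11.44) follows. The helper names `binom` and `binomial_series`, the truncation level and the chosen values of α and z are assumptions made only for this sketch; the reference value uses the principal logarithm from `cmath`.

```python
import cmath

def binom(alpha, k):
    """Generalized binomial coefficient alpha (alpha-1) ... (alpha-k+1) / k!."""
    result = 1 + 0j
    for j in range(k):
        result *= (alpha - j) / (j + 1)
    return result

def binomial_series(alpha, z, terms=300):
    """Partial sum of sum_k binom(alpha, k) z^k, meant for |z| < 1."""
    return sum(binom(alpha, k) * z ** k for k in range(terms))

alpha, z = 0.5 + 1j, 0.3 - 0.4j  # |z| = 0.5 < 1
exact = cmath.exp(alpha * cmath.log(1 + z))
print(abs(binomial_series(alpha, z) - exact))
```

When α is a nonnegative integer the coefficients vanish for k > α, and the partial sum reduces to the finite binomial formula.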
Theorem 11.5.31. Suppose D is a simply connected region in C and f ∈ H(D). If
0 ∉ f(D), then the map z ↦ log |f(z)| is harmonic on D and
\[ \log |f(z_0)| = \frac{1}{2\pi} \int_0^{2\pi} \log |f(z_0 + re^{i\theta})|\,d\theta \]
whenever the closed ball of radius r centered at z0 is contained in D.

Proof. By Theorem 11.5.28 there is g ∈ H(D) such that f = exp ◦ g on D. If u = Re(g),
then u is harmonic and log |f| = u.

11.6. Singularities
The next result concerns holomorphic functions in regions with holes.
Theorem 11.6.1. (Laurent–Weierstrass) Let D be an open set in the complex plane
containing an annulus A(a; r1, r2) = {z ∈ C : r1 ≤ |z − a| ≤ r2}. For r1 ≤ r ≤ r2, let γr(a) denote
the positively oriented circle of radius r centered at a. If f ∈ H(D), then
\[ f(z) = \sum_{n \in \mathbb{Z}} c_n (z - a)^n, \qquad z \in A(a; r_1, r_2) \tag{11.45} \]
where
\[ c_n = \frac{1}{2\pi i} \int_{\gamma_r(a)} \frac{f(w)}{(w - a)^{n+1}}\,dw, \qquad n \in \mathbb{Z}. \tag{11.46} \]
The series (11.45) converges absolutely and uniformly over A(a; r1, r2).

Proof. Since D is open, there exist R1 < r1 < r2 < R2 such that A(a; r1, r2) ⊂ A(a; R1, R2) ⊂
D. For any z ∈ A(a; r1, r2), Corollary 11.5.8 shows that the function
\[ g(\xi) = \begin{cases} \dfrac{f(\xi) - f(z)}{\xi - z} & \text{if } \xi \in D \setminus \{z\} \\ f'(z) & \text{if } \xi = z \end{cases} \]
is holomorphic on D. Since γR2 and γR1 are homotopic, γR2 ∼ γR1 and so,
\[ \int_{\gamma_{R_1}(a)} g(\xi)\,d\xi = \int_{\gamma_{R_2}(a)} g(\xi)\,d\xi. \tag{11.47} \]
Since R1 < |z − a| < R2, the integrands in (11.47) can be written as
g(ξ) = f(ξ)/(ξ − z) − f(z)/(ξ − z). After substitution and transposition of terms we obtain
\[ f(z) \left( \int_{\gamma_{R_2}(a)} \frac{d\xi}{\xi - z} - \int_{\gamma_{R_1}(a)} \frac{d\xi}{\xi - z} \right) = \int_{\gamma_{R_2}(a)} \frac{f(\xi)}{\xi - z}\,d\xi - \int_{\gamma_{R_1}(a)} \frac{f(\xi)}{\xi - z}\,d\xi. \]
Theorem 11.5.1 implies that f(z) = f1(z) + f2(z) where
\[ f_1(z) = -\frac{1}{2\pi i} \int_{\gamma_{R_1}(a)} \frac{f(\xi)}{\xi - z}\,d\xi, \qquad f_2(z) = \frac{1}{2\pi i} \int_{\gamma_{R_2}(a)} \frac{f(\xi)}{\xi - z}\,d\xi. \]
It follows from Theorem 11.4.2 that f2 ∈ H(B(a; r2)), and admits a power series
\[ f_2(z) = \sum_{n=0}^{\infty} c_n (z - a)^n, \qquad z \in B(a; r_2), \tag{11.48} \]
with
\[ c_n = \frac{1}{2\pi i} \int_{\gamma_{R_2}(a)} \frac{f(\xi)}{(\xi - a)^{n+1}}\,d\xi \]
for all n ∈ Z+. Similarly, f1 ∈ H(C \ B(a; r1)). Since ξ ∈ γ∗R1 and |z − a| > r1,
\[ \left| \frac{\xi - a}{z - a} \right| < \frac{R_1}{r_1} < 1. \]
Thus
\[ \frac{1}{\xi - z} = -\frac{1}{z - a} \cdot \frac{1}{1 - \frac{\xi - a}{z - a}} = -\sum_{n=1}^{\infty} \frac{(\xi - a)^{n-1}}{(z - a)^n} \]
converges absolutely and uniformly over ξ ∈ γ∗R1(a) and z ∈ D \ B(a; r1). By dominated
convergence,
\[ f_1(z) = \sum_{n=1}^{\infty} c_{-n} (z - a)^{-n}, \qquad z \in D \setminus B(a; r_1), \tag{11.49} \]
with
\[ c_{-n} = \frac{1}{2\pi i} \int_{\gamma_{R_1}(a)} \frac{f(\xi)}{(\xi - a)^{1-n}}\,d\xi \]
for all n ∈ N. Since γR2(a) ∼ γr(a) ∼ γR1(a) for all R1 < r < R2, and for each n ∈ Z the
function ξ ↦ f(ξ)/(ξ − a)^{n+1} is holomorphic on D \ {a}, we conclude from (11.33) that
\[ c_n = \frac{1}{2\pi i} \int_{\gamma_r(a)} \frac{f(\xi)}{(\xi - a)^{n+1}}\,d\xi \]
for all n ∈ Z.

Remark 11.6.2. The terms f1 defined by (11.49), and f2 defined by (11.48) are called
principal and regular parts of f respectively.
Theorem 11.6.3. Suppose f ∈ H(D) where D = B(a; R) \ {a}. One and only one of the
following holds:
(i) The point z = a is a removable singularity.
(ii) There exist m ∈ N and complex numbers c−1, …, c−m, with c−m ≠ 0, such that
\[ f(z) - \sum_{k=1}^{m} \frac{c_{-k}}{(z - a)^k} \]
has a removable singularity at a. In this case, lim_{z→a} |f(z)| = ∞ and f is said to
have a pole of order m at a.
(iii) f (B(a; ρ) \ {a}) is dense in C for all 0 < ρ ≤ R. In this case, f is said to have an
essential singularity at a.

Proof. Suppose (iii) does not hold. Then there are numbers 0 < ρ ≤ R, δ > 0 and a point
w ∈ C such that z ∈ B(a; ρ) \ {a} implies |f(z) − w| > δ. It follows that g : z ↦ 1/(f(z) − w)
is bounded and holomorphic on B(a; ρ) \ {a}; hence a is a removable singularity of g and
g ∈ H(B(a; ρ)) by setting g(a) = lim_{z→a} g(z). Then g has a zero at a of order m ∈ Z+ and
g(z) = (z − a)^m h(z) where h ∈ H(B(a; ρ)) and h(a) ≠ 0. Since g does not vanish on
B(a; ρ) \ {a}, φ = 1/h ∈ H(B(a; ρ)) admits a
power series φ(z) = ∑_{n≥0} cn (z − a)^n where c0 ≠ 0. As f(z) = w + φ(z)/(z − a)^m, it follows that
\[ f(z) = \frac{1}{(z - a)^m} \sum_{n \ge 0} c_n' (z - a)^n, \]
where c′m = w + cm and c′n = cn for all n ≠ m. If m = 0, that is g(a) ≠ 0, then (i) holds;
whereas if m ≥ 1, then c′0 = c0 ≠ 0 and (ii) holds.
Remark 11.6.4. The coefficient c−1 in the Laurent expansion (11.45) is called the residue of
f at a, and it is denoted by Res(f; a). The Laurent–Weierstrass theorem, together with the general
Cauchy theorem, implies that
\[ \frac{1}{2\pi i} \int_{\gamma} f(z)\,dz = \mathrm{Res}(f; a)\,\mathrm{Ind}_{\gamma}(a) = \mathrm{Res}(f_1; a)\,\mathrm{Ind}_{\gamma}(a) = \frac{1}{2\pi i} \int_{\gamma} f_1(z)\,dz \]
for any cycle γ ∼ 0 in D such that a ∉ γ∗.

A function f that is analytic on an open set D ⊂ C except for a discrete set of points A,
all of which are poles, is said to be meromorphic. A function f is said to be meromorphic
at z0 if it is meromorphic on a neighborhood U of z0 .
Theorem 11.6.5. (Theorem of residues) Suppose f ∈ H(D \ A) where A ⊂ D is a discrete
set at which f has singularities. If γ ∼ 0 in D and A ∩ γ∗ = ∅, then
\[ \frac{1}{2\pi i} \int_{\gamma} f(z)\,dz = \sum_{a \in A} \mathrm{Res}(f; a)\,\mathrm{Ind}_{\gamma}(a). \tag{11.50} \]

Proof. Let B = {a ∈ A : Indγ(a) ≠ 0}. Since A has no limit points in D, A is countable
and closed in D; hence, D \ A is open. Indγ is constant in each component of C \ γ∗, vanishes
at the unbounded component of C \ γ∗, and also vanishes at any component intersecting
C \ D. It follows that B is finite. Let a1, …, an be the points of B and Q1, …, Qn be
the principal parts of f at a1, …, an respectively. Then, D0 = D \ (A \ B) is open and
F = f − ∑_{k=1}^n Qk ∈ H(D0), for the singularities at a1, …, an are removable. From the general Cauchy
theorem 11.5.22 we obtain ∫_γ F(z) dz = 0. As Res(f; ak) = Res(Qk; ak),
\[ \frac{1}{2\pi i} \int_{\gamma} f(z)\,dz = \sum_{k=1}^{n} \frac{1}{2\pi i} \int_{\gamma} Q_k(z)\,dz = \sum_{k=1}^{n} \mathrm{Res}(Q_k; a_k)\,\mathrm{Ind}_{\gamma}(a_k). \]
This is formula (11.50) since Indγ(a) = 0 for all a ∈ A \ B.

The formula of residues (11.50) is often used to obtain explicit expressions of integrals
over infinite intervals of the real line.
Example 11.6.6. To evaluate ∫_R dx/(1 + x⁴) consider the function f(z) = 1/(1 + z⁴). Let γR be the
closed path obtained by joining the straight line segment ℓR from (−R, 0) to (R, 0), and
the upper semicircle cR of radius R centered at the origin (see Figure 1). The function f has only four
simple poles, namely zk = e^{iπ(2k+1)/4} with k = 0, …, 3, of which z2, z3 lie in the unbounded
component of C \ γR∗. The residues at z0 and z1 are given by
\[ \lim_{z \to z_0} (z - z_0) f(z) = \frac{1}{4 z_0^3} = -\frac{1 + i}{4\sqrt{2}}, \qquad \lim_{z \to z_1} (z - z_1) f(z) = \frac{1}{4 z_1^3} = \frac{1 - i}{4\sqrt{2}}. \]

Figure 1. (The contour γR: the segment ℓR from −R to R closed by the upper semicircle cR.)

For R > 1 we have
\[ \int_{\gamma_R} f = 2\pi i \sum_{k=0}^{1} \mathrm{Res}(f; z_k) = \frac{\sqrt{2}\,\pi}{2}. \]
Along ℓR we have lim_{R→∞} ∫_{ℓR} f = ∫_R dx/(1 + x⁴), whereas along cR
\[ \lim_{R \to \infty} \left| \int_{c_R} f \right| \le \lim_{R \to \infty} \frac{\pi R}{R^4 - 1} = 0. \]
Therefore,
\[ \int_{\mathbb{R}} \frac{dx}{1 + x^4} = \frac{\sqrt{2}\,\pi}{2}. \]
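
The value obtained in Example 11.6.6 can be cross-checked numerically. The sketch below evaluates 2πi times the sum of the two residues in the upper half plane and, independently, the real integral after the substitution x = tan θ, which turns it into the integral of cos²θ/(cos⁴θ + sin⁴θ) over (−π/2, π/2). The function names and the number of quadrature nodes are illustrative assumptions.

```python
import cmath
import math

def residue_sum():
    """2*pi*i times the residues of 1/(1+z^4) at the poles e^{i pi/4}, e^{3 i pi/4}."""
    poles = [cmath.exp(1j * math.pi * (2 * k + 1) / 4) for k in (0, 1)]
    return 2j * math.pi * sum(1 / (4 * z ** 3) for z in poles)

def integral_on_line(n=2000):
    """Midpoint rule for the integral over R after substituting x = tan(theta)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi / 2 + (k + 0.5) * h
        c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
        total += c2 / (c2 * c2 + s2 * s2)  # sec^2(theta) / (1 + tan^4(theta))
    return total * h

print(residue_sum(), integral_on_line(), math.sqrt(2) * math.pi / 2)
```

The substitution is used only so that the quadrature runs over a bounded interval with a smooth periodic integrand; it plays no role in the residue argument itself.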
Example 11.6.7. For any 0 < a < 1 the function fa(x) = e^{ax}/(1 + e^x) is integrable with respect to
Lebesgue's measure on (R, B(R)). To evaluate Ia = ∫_R fa(x) dx consider the rectangular
path γR with base on the segment from (−R, 0) to (R, 0) and height 2π (see Figure 2).

Figure 2. (The rectangle γR with vertices −R, R, R + i2π and −R + i2π.)

The function fa is meromorphic on C and has simple poles zk = iπ(2k − 1), k ∈ Z, all of which,
with the exception of z1 = iπ, are in the unbounded component of C \ γR∗. Now
\[ \mathrm{Res}(f_a; i\pi) = \lim_{z \to i\pi} (z - i\pi) f_a(z) = -e^{a\pi i}. \]
Thus, ∫_{γR} fa = −2πi e^{aπi}. Along the left (v¹R) and right (v²R) vertical sides of γR we have
\[ \left| \int_{v_R^1 \dot{+} v_R^2} f_a \right| \le 2\pi \left( \frac{e^{aR}}{e^{R} - 1} + \frac{e^{-Ra}}{1 - e^{-R}} \right) \xrightarrow{R \to \infty} 0. \]
Along the base h¹R and the opposite horizontal side h²R we have
\[ \int_{h_R^1 \dot{+} h_R^2} f_a = (1 - e^{2a\pi i}) \int_{[-R, R]} f_a(x)\,dx \xrightarrow{R \to \infty} (1 - e^{2a\pi i}) \int_{\mathbb{R}} f_a(x)\,dx. \]
Putting things together gives
\[ \int_{\mathbb{R}} \frac{e^{ax}}{1 + e^{x}}\,dx = -2\pi i\,\frac{e^{a\pi i}}{1 - e^{2a\pi i}} = \frac{\pi}{\sin(a\pi)}. \]
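
The closed form π/sin(aπ) can be compared with a direct quadrature. The following sketch applies a composite Simpson rule on a large symmetric interval; the function names, the truncation half-width and the number of subintervals are illustrative assumptions, and the integrand is written in two algebraically equivalent forms only to avoid floating point overflow.

```python
import math

def f(a, x):
    """Integrand e^{a x} / (1 + e^x), arranged to avoid overflow for large |x|."""
    if x > 0:
        return math.exp((a - 1) * x) / (1 + math.exp(-x))
    return math.exp(a * x) / (1 + math.exp(x))

def integral(a, half_width=100.0, n=40000):
    """Composite Simpson rule on [-half_width, half_width]; n must be even."""
    h = 2 * half_width / n
    total = f(a, -half_width) + f(a, half_width)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a, -half_width + k * h)
    return total * h / 3

for a in (0.3, 0.5):
    print(a, integral(a), math.pi / math.sin(a * math.pi))
```

The neglected tails are of order e^{−a·half_width}/a and e^{−(1−a)·half_width}/(1 − a), so values of a close to 0 or 1 require a wider interval.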

The theory of holomorphic functions we have presented can also be applied to solve
certain linear differential equations.
Example 11.6.8. Suppose f ∈ H(B(0; r) \ {0}) with Laurent expansion f(z) = ∑_{n∈Z} an z^n.
The region D = B(0; r) \ ((−∞, 0] × {0}) is open and simply connected. For any constant c ∈ C,
the function
\[ F_c(z) = \int^{z} f = c + a_{-1} \log(z) + \sum_{n \ne 0} \frac{a_{n-1}}{n} z^n, \]
where log is the principal branch of the logarithm, satisfies Fc ∈ H(D) and F′c(z) = f(z) for all
z ∈ D. This provides a method to solve the differential equation
\[ w'(z) - f(z) w(z) = 0, \qquad z \in D, \]
namely,
\[ w(z) = \exp(F_c(z)) = C z^{a_{-1}} \exp\Big( \sum_{n \ne 0} \frac{a_{n-1}}{n} z^n \Big), \qquad C = e^{c}. \]

Example 11.6.9. (Frobenius–Fuchs method) Consider the second order linear differential
equation
\[ w''(z) + \frac{P(z)}{z} w'(z) + \frac{Q(z)}{z^2} w(z) = 0 \tag{11.51} \]
where P and Q are analytic in a neighborhood B(0; a) of 0. Under these assumptions, the
point z = 0 is said to be a regular singular point of the differential equation (11.51). On
the region D = B(0; a) \ ((−∞, 0] × {0}) we propose a solution of the form
\[ w(z) = z^{r} \sum_{n=0}^{\infty} a_n z^n. \]
Let P(z) = ∑_{n≥0} pn z^n and Q(z) = ∑_{n≥0} qn z^n. A simple computation shows that
\[ z w'(z) = z^{r} \sum_{n \ge 0} a_n (r + n) z^n, \qquad z^2 w''(z) = z^{r} \sum_{n \ge 0} a_n (r + n)(r + n - 1) z^n. \]
A formal substitution of these expressions into (11.51) gives
\[ z^2 w''(z) + z P(z) w'(z) + Q(z) w(z) = z^{r} \Big[ \sum_{n \ge 0} a_n (r + n)(r + n - 1) z^n + \sum_{n \ge 0} \Big( \sum_{m=0}^{n} a_m (r + m) p_{n-m} \Big) z^n + \sum_{n \ge 0} \Big( \sum_{m=0}^{n} a_m q_{n-m} \Big) z^n \Big] = 0. \]
Equating the coefficient of the power n = 0 to 0 gives the equation
\[ a_0 \big( r(r - 1) + r p_0 + q_0 \big) = a_0 I(r) = 0. \]
This equation is known as the indicial equation of (11.51). For n ≥ 1, equating the
coefficient of the n–th power to 0 gives
\[ a_n \big( (r + n)(r + n - 1) + (r + n) p_0 + q_0 \big) + \sum_{m=0}^{n-1} a_m \big( (r + m) p_{n-m} + q_{n-m} \big) = 0 \]

which can be expressed as
\[ a_n I(r + n) = -\sum_{m=0}^{n-1} a_m \big( (r + m) p_{n-m} + q_{n-m} \big). \tag{11.52} \]
We set a0 = 1, and let α and β be the two solutions to I(r) = 0 arranged so that Re(α − β) ≥ 0.
Setting s = α − β, we obtain p0 − 1 = −(α + β) = −2α + s. Hence, for all n ≥ 1
\[ I(\alpha + n) = I(\alpha) + n(n + s) = n(n + s) \ne 0. \]
This shows that, for r = α, the recurrence equation (11.52) has a unique solution given by
\[ a_n = -\frac{\sum_{m=0}^{n-1} a_m \big( (\alpha + m) p_{n-m} + q_{n-m} \big)}{I(\alpha + n)}. \]
To prove that the formal series w(z) = z^α (1 + ∑_{n≥1} an z^n) = z^α f(z) is indeed a solution
to (11.51), it suffices to show that f converges in an open disk around z = 0. As P, Q ∈
H(B(0; a)), Cauchy's estimates show that for some 0 < ρ < a and M > 1
\[ |p_n| \le \frac{M}{\rho^n}, \qquad |q_n| \le \frac{M}{\rho^n}, \qquad |\alpha p_n + q_n| \le \frac{M}{\rho^n} \]
for all n ≥ 1. For n = 1 we have |a1| = |αp1 + q1|/|s + 1| ≤ M/ρ. By induction, assume
|aj| ≤ M^j/ρ^j for all 1 ≤ j ≤ n − 1. Then
\[ |a_n| \le \frac{\sum_{m=0}^{n-1} |a_m| \big( |\alpha p_{n-m} + q_{n-m}| + m |p_{n-m}| \big)}{n |n + s|} \le \frac{M^n}{\rho^n} \cdot \frac{n + \frac{n(n-1)}{2}}{n^2} < \frac{M^n}{\rho^n}. \]
This completes the induction argument. It follows that f has radius of convergence
R ≥ ρ/M > 0.
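
The recurrence (11.52) is easy to run on a computer. The sketch below solves it with a0 = 1 in exact rational arithmetic; the function name `frobenius_coefficients`, its argument layout, and the choice of Bessel's equation of order 0 (P(z) = 1, Q(z) = z², indicial root r = 0) as a test case are illustrative assumptions that do not appear in the notes.

```python
from fractions import Fraction

def frobenius_coefficients(p, q, r, count):
    """Solve (11.52) with a_0 = 1 and return [a_0, ..., a_{count-1}].

    p, q: Taylor coefficients of P and Q (missing entries are treated as 0).
    r: a root of I(r) = r(r - 1) + r p_0 + q_0 with I(r + n) != 0 for n >= 1,
       for instance the root alpha with Re(alpha - beta) >= 0.
    """
    coef = lambda c, n: c[n] if n < len(c) else 0
    indicial = lambda x: x * (x - 1) + x * coef(p, 0) + coef(q, 0)
    a = [Fraction(1)]
    for n in range(1, count):
        rhs = -sum(a[m] * ((r + m) * coef(p, n - m) + coef(q, n - m))
                   for m in range(n))
        a.append(rhs / indicial(r + n))
    return a

# Hypothetical test case: z^2 w'' + z w' + z^2 w = 0 (Bessel, order 0).
a = frobenius_coefficients(p=[1], q=[0, 0, 1], r=0, count=7)
print(a)
```

For this test case the recurrence reduces to n² an = −an−2, which reproduces the familiar coefficients of the Bessel function J0.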

We conclude this section with a result stating that in the complex plane one
can construct functions that have singularities on an arbitrary discrete set A and arbitrary
principal parts around the points in A.
Theorem 11.6.10. (Mittag–Leffler) Let {an : n ∈ N} ⊂ C be a sequence such that
lim_n |an| = ∞. For each n ∈ N let Pn(z) = ∑_{k=1}^∞ cn,k (z − an)^{−k} be a Laurent series
converging on C \ {an}. Then there is a holomorphic function f : C \ {an : n ∈ N} → C
such that for all n ∈ N, the principal part of f at an is Pn.

Proof. We can assume that a0 := 0 ∉ A := {an : n ∈ N} (otherwise we remove 0 from A
and add P0 at the end). Each Pn is analytic on B(0; |an|) and so, we may choose a Taylor
polynomial Qn of Pn such that
\[ \| P_n - Q_n \|_{\overline{B}(0; |a_n|/2)} \le 2^{-n}, \]
for the Taylor series converges uniformly on closed disks contained in B(0; |an|). We claim
that the series ∑_{n=1}^∞ (Pn − Qn) converges uniformly on compact subsets of C \ A. Indeed, for
any compact set K ⊂ C \ A, there is N = NK ∈ N such that 2 sup_{z∈K} |z| < inf_{n≥N} |an|. Hence
‖Pn − Qn‖K ≤ 2^{−n} for all n ≥ N and so,
\[ \sum_{n=1}^{\infty} \| P_n - Q_n \|_K \le \sum_{n=1}^{N-1} \| P_n - Q_n \|_K + \sum_{n=N}^{\infty} 2^{-n} < \infty. \]
Consequently, f := ∑_{n=1}^∞ (Pn − Qn) ∈ H(C \ A). It remains to check that f has the correct
principal parts. To this end, consider ak ∈ A, and let
\[ f_k = \sum_{n \ne k} (P_n - Q_n) = f - (P_k - Q_k) \]
so that f = (fk − Qk) + Pk. The first term of this sum is holomorphic near ak and so, Pk
is the principal part of f at ak.

11.7. Zeroes of analytic functions

The formula of residues gives an integral expression for the number of zeroes that an analytic
function has in a given region.
Theorem 11.7.1. Let D ⊂ C be open and let γ be a closed path such that γ ∼ 0 in D. Suppose
that Indγ(z) ∈ {0, 1} for all z ∈ D \ γ∗ and let f ∈ H(D). If f has no zeroes in γ∗, then
the number of zeroes Nf of f in D1 = {z ∈ D : Indγ(z) = 1}, counted according to their
multiplicity, is finite and
\[ N_f = \frac{1}{2\pi i} \int_{\gamma} \frac{f'(z)}{f(z)}\,dz = \mathrm{Ind}_{\gamma_f}(0) \tag{11.53} \]
where γf := f ◦ γ.

Proof. It follows from the hypothesis on f that A = {z ∈ D : f(z) = 0} is at most
countable and has no limit points in D. If f has a zero of order m = m(a) at a, then
f(z) = (z − a)^m h(z) where h and 1/h are holomorphic on a small neighborhood V of a. It
follows that
\[ g(z) = \frac{f'(z)}{f(z)} = \frac{m}{z - a} + \frac{h'(z)}{h(z)}, \qquad z \in V \setminus \{a\} \]
and so, Res(g; a) = m(a). Let B = {a ∈ A : Indγ(a) = 1}. Then, by the residue theorem
\[ \frac{1}{2\pi i} \int_{\gamma} \frac{f'(z)}{f(z)}\,dz = \sum_{a \in B} \mathrm{Res}(g; a) = \sum_{a \in B} m(a) = N_f. \]
If [a, b] parameterizes γ, then
\[ \mathrm{Ind}_{\gamma_f}(0) = \frac{1}{2\pi i} \int_a^b \frac{(f \circ \gamma)'(t)}{(f \circ \gamma)(t)}\,dt = \frac{1}{2\pi i} \int_a^b \frac{f'(\gamma(t))}{f(\gamma(t))}\,\gamma'(t)\,dt = \frac{1}{2\pi i} \int_{\gamma} \frac{f'(z)}{f(z)}\,dz. \]
This completes the proof.
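
Formula (11.53) lends itself to a direct numerical experiment. The sketch below approximates the integral of f′/f over a positively oriented circle with the trapezoidal rule; the function name `count_zeros`, the number of nodes and the sample polynomial f(z) = z²(z − 2)(z + 3i) are illustrative assumptions.

```python
import cmath
import math

def count_zeros(f, df, center, radius, n=4000):
    """Approximate (1/(2 pi i)) * integral of f'/f over the circle |z - center| = radius."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        z = center + radius * cmath.exp(1j * t)
        dz = 1j * radius * cmath.exp(1j * t) * (2 * math.pi / n)
        total += df(z) / f(z) * dz
    return total / (2j * math.pi)

# Sample function with zeros at 0 (double), 2 and -3i.
f = lambda z: z ** 2 * (z - 2) * (z + 3j)
df = lambda z: 2 * z * (z - 2) * (z + 3j) + z ** 2 * (z + 3j) + z ** 2 * (z - 2)

print(count_zeros(f, df, 0, 1))    # circle around the double zero only
print(count_zeros(f, df, 0, 2.5))  # circle around the zeros 0 and 2
```

The real part of the result, rounded to the nearest integer, is the number of zeroes inside the circle counted with multiplicity, provided no zero lies on the circle itself.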

Corollary 11.7.2. (Rouché) Let D ⊂ C be open and let γ be a closed path such that γ ∼ 0 in
D. Suppose that Indγ(z) ∈ {0, 1} for all z ∈ D \ γ∗ and let D1 = {z ∈ D : Indγ(z) = 1}. If
f, g ∈ H(D) and
\[ |f(z) - g(z)| < |f(z)|, \qquad z \in \gamma^*, \tag{11.54} \]
then Ng = Nf, where Ng and Nf are the numbers of zeroes of g and f in D1, counted according
to their multiplicity.

Proof. From (11.54) it follows that neither f nor g has zeroes in γ ∗ . If Γ1 = f ◦ γ and
Γ0 = g ◦ γ, then Lemma 11.5.24 and Theorem 11.7.1 show that Ng = IndΓ0 (0) = IndΓ1 (0) =
Nf .

We will use Rouché's theorem to introduce an important property of holomorphic functions.
Theorem 11.7.3. (Open mapping theorem) Let Ω be an open connected set in C and
suppose f ∈ H(Ω). If f is not constant, then f is an open map, that is, f(W) is open for
any open set W ⊂ Ω. Moreover, if f(z0) = w0 and m is the order of the zero of f(z) − w0
at z0, then there exist an open neighborhood V ⊂ Ω of z0 and φ ∈ H(V) such that
\[ f(z) = w_0 + \big( \varphi(z) \big)^m, \qquad z \in V \tag{11.55} \]
and φ′ does not vanish in V, and so φ is open and locally invertible.

Proof. Let W ⊂ Ω be nonempty and open, and z0 ∈ W. As f is not constant, there is an
open neighborhood U ⊂ W of z0 and a function g ∈ H(U) such that
\[ f(z) - w_0 = (z - z_0)^m g(z), \qquad z \in U \]
and g(z) ≠ 0 for all z in U. Let V be an open ball around z0 whose closure is contained in U and set ε :=
min_{z∈∂V} |f(z) − w0| > 0. Hence, for |w − w0| < ε
\[ |f(z) - w_0| > |w - w_0| = |(f(z) - w_0) - (f(z) - w)|, \qquad z \in \partial V. \]
By Rouché's theorem f(z) − w has m zeroes in V; hence B(w0; ε) ⊂ f(V) ⊂ f(W). To
prove the last statement, observe that since g ∈ H(V) has no zeroes in the ball V, there is
h ∈ H(V) such that g = exp(h). Define φ(z) = (z − z0) exp(h(z)/m) for z ∈ V. Clearly
φ′(z0) ≠ 0, so φ′ ≠ 0 in V after shrinking V if necessary, and (11.55) follows.

We will conclude this section with another remarkable integral equation involving the
number of zeroes of an analytic function in a ball B(0; r).
Lemma 11.7.4. For all ρ ∈ R,
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \big| 1 + \rho e^{it} \big|\,dt = \log^{+}(|\rho|). \]

Since ∑_{n≥1} |ρ|^n < ∞, by dominated convergence, we conclude that t ↦ g(t, ρ) is integrable
for every |ρ| < 1 and
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \big| 1 + \rho e^{it} \big|\,dt = 0. \]

For |ρ| > 1, observe that log |1 + ρe^{it}| = log(|ρ|) + log |1 + ρ^{−1}e^{−it}|. Thus, for all |ρ| > 1,
t ↦ g(t, ρ) is integrable and
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \big| 1 + \rho e^{it} \big|\,dt = \log(|\rho|). \]

It remains to consider the case |ρ| = 1. Notice that
\[ \partial_\rho g = \frac{\cos t + \rho}{|1 + \rho e^{it}|^2} \ge 0 \ \text{ for } \rho \ge 1, \qquad \partial_\rho g = \frac{\cos t + \rho}{|1 + \rho e^{it}|^2} \le 0 \ \text{ for } \rho \le -1. \]
As g(·, −2) and g(·, 2) are in L1(S¹), 0 ≤ g(t, 2) − g(t, ρ) ↗ g(t, 2) − g(t, 1) as ρ ↘ 1, and
0 ≤ g(t, −2) − g(t, ρ) ↗ g(t, −2) − g(t, −1) as ρ ↗ −1, by monotone convergence we obtain
that
\[ \frac{1}{2\pi} \int_0^{2\pi} g(t, 1)\,dt = \lim_{\rho \searrow 1} \frac{1}{2\pi} \int_0^{2\pi} g(t, \rho)\,dt = \lim_{\rho \searrow 1} \log(\rho) = 0. \]
Similarly, we obtain that (1/2π) ∫_0^{2π} g(t, −1) dt = 0.

When f has no zeroes in B(0; R), the product in (11.56) is assumed to be 1.

Proof. Fix 0 < r < R. Suppose f has mr zeroes in the closed ball of radius r centered at 0, listed so
that {α1, …, αnr} ⊂ B(0; r) and |αnr+1| = … = |αmr| = r. The function
\[ g(z) = f(z) \prod_{j=1}^{n_r} \frac{r^2 - \overline{\alpha_j} z}{r(\alpha_j - z)} \prod_{j=n_r+1}^{m_r} \frac{\alpha_j}{\alpha_j - z} \tag{11.57} \]
is analytic on B(0; R) and has no zeroes in B(0; s) for some r < s < R. By Theorem 11.5.28
g = exp ◦ h for some h ∈ H(B(0; s)). Then, log |g| = Re(h) is harmonic and satisfies the
mean–value property
\[ \log |g(0)| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log |g(re^{i\theta})|\,d\theta. \]
By (11.57)
\[ |g(0)| = |f(0)| \prod_{j=1}^{n_r} \frac{r}{|\alpha_j|}. \]
Each factor in the first product in (11.57) has modulus one, for if z = re^{iθ} and 1 ≤ j ≤ nr,
\[ \left| \frac{r^2 - \overline{\alpha_j} z}{r(\alpha_j - z)} \right| = \left| \frac{re^{-i\theta} - \overline{\alpha_j}}{\alpha_j - re^{i\theta}} \right| = 1. \]
If αj = re^{iθj}, nr + 1 ≤ j ≤ mr, then
\[ \log |g(re^{i\theta})| = \log |f(re^{i\theta})| - \sum_{j=n_r+1}^{m_r} \log \big| 1 - e^{i(\theta - \theta_j)} \big|. \tag{11.58} \]
Identity (11.56) follows from (11.58) by integration over [0, 2π] and application of Lemma 11.7.4.

The second statement follows by noticing that if r < s, then mr ≤ ms, 1 ≤ r/|αj| ≤ s/|αj|
for each j = 1, …, mr, and 1 ≤ s/|αj| for each j = mr + 1, …, ms. The last statement
corresponds to the case where there are no zeroes in B(0; r) when r is small. It follows by
the continuity of f and dominated convergence by choosing 0 < r0 < R small enough so
that |f(0)|/2 < |f(z)| < (3/2)|f(0)| whenever |z| < r0.

Remark 11.7.6. If f in Jensen’s formula is allowed to have a zero of order m at 0, then

that l(r) is still non–decreasing on (0, R), and that limr→0 l(r) = −∞ = log |f (0)|.

Corollary 11.7.7. Suppose f ∈ H(B(0; R)) and f(0) ≠ 0. For any 0 < r < R, let n(r) be
the number of zeroes of f in B(0; r), counting multiplicities. Then
\[ \int_0^r \frac{n(s)}{s}\,ds = \frac{1}{2\pi} \int_0^{2\pi} \log |f(re^{i\theta})|\,d\theta - \log |f(0)|. \tag{11.59} \]
If M(r) := sup_{|z|=r} |f(z)| then
\[ n(r) \le \frac{\log(M(2r)) - \log |f(0)|}{\log 2}, \qquad 2r < R. \]

Proof. Fix 0 < r < R and suppose α1, …, αn(r) are the zeroes of f in B(0; r) repeated
according to their multiplicity. Then
\[ \sum_{k=1}^{n(r)} \log \frac{r}{|\alpha_k|} = \sum_{k=1}^{n(r)} \int_{|\alpha_k|}^{r} \frac{ds}{s} = \sum_{k=1}^{n(r)} \int_0^r \mathbb{1}(|\alpha_k| < s)\,\frac{ds}{s} = \int_0^r \Big( \sum_{k=1}^{n(r)} \mathbb{1}(|\alpha_k| < s) \Big) \frac{ds}{s} = \int_0^r \frac{n(s)}{s}\,ds. \]
Identity (11.59) follows from Jensen's formula (11.56). As n(r) is nondecreasing, the last
statement follows by comparing ∫_r^{2r} (n(s)/s) ds ≥ n(r) log 2 with the right hand side of (11.59)
evaluated at radius 2r.

11.8. Entire functions

In this section we briefly present a study of zeroes of entire functions. In particular, we
consider the problem of existence of entire functions with a prescribed set of zeroes. A
solution to this problem is stated in a celebrated theorem by Weierstrass (Theorem 11.8.8).
We start by introducing the notion of infinite product of numbers.
Definition 11.8.1. Given a sequence (an : n ∈ N) ⊂ C let
\[ p_n = \prod_{k=1}^{n} a_k. \]
If lim_n pn = p we write ∏_{n=1}^∞ an = p. The infinite product ∏_{n=1}^∞ an is said to converge
properly if there is an integer N and a q ∈ C \ {0} such that an ≠ 0 for all n ≥ N and
lim_n ∏_{k=N}^n ak = q. In this case,
\[ \prod_{n=1}^{\infty} a_n := a_1 \cdots a_{N-1} \cdot q. \]
If infinitely many members of the sequence {an} are 0, we say that the product ∏ an
diverges to 0.
Theorem 11.8.2. The infinite product ∏_{n=1}^∞ an converges properly iff for every ε > 0 there
is an integer N such that
\[ |a_{n+1} \cdots a_{n+k} - 1| < \varepsilon, \qquad n > N, \ k \ge 0. \tag{11.60} \]
In either case, ∏_{n=1}^∞ an = 0 iff an = 0 for some n.

Proof. Suppose ∏_{n=1}^∞ an converges properly. Then there is an integer N0 and q ∈ C \ {0}
such that an ≠ 0 for n ≥ N0 and qn = ∏_{k=N0}^n ak → q as n → ∞. Hence, there is A > 0 such that
|qn| ≥ A for all n ≥ N0. Given ε > 0 there is N > N0 such that
\[ |q_{n+k} - q_n| < A\varepsilon, \qquad n > N, \ k \ge 0. \tag{11.61} \]
Dividing (11.61) by qn gives (11.60).

Conversely, suppose that for any ε > 0 there is N for which (11.60) holds. Then, for ε = 1/2
there is an integer N0 such that
\[ \frac{1}{2} < |a_{N_0} \cdots a_n| < \frac{3}{2}, \qquad n > N_0. \tag{11.62} \]
This implies that an ≠ 0 for all n ≥ N0. Let qn := ∏_{k=N0}^n ak, n ≥ N0. Then, for any other
ε > 0 we can choose N > N0 such that
\[ \left| \frac{q_{n+k}}{q_n} - 1 \right| < \frac{2}{3}\varepsilon, \qquad n \ge N, \ k \ge 0. \]
Consequently
\[ |q_{n+k} - q_n| < \frac{2}{3}\varepsilon |q_n| < \varepsilon, \qquad n \ge N, \ k \ge 0. \]
Hence {qn : n ∈ N} is a Cauchy sequence and by (11.62), qn converges to some q ≠ 0.
The infinite product ∏_{n=1}^∞ (1 + an) is absolutely convergent if ∏_{n=1}^∞ (1 + |an|) is
convergent.
Theorem 11.8.3. Absolute convergence of ∏_{n=1}^∞ (1 + an) implies proper convergence.
The product ∏_{n=1}^∞ (1 + an) converges absolutely iff ∑_{n=1}^∞ |an| < ∞. Absolute convergence
implies that ∏_{n=1}^∞ (1 + an) = 0 iff an = −1 for some n.
Suppose (bk : k ∈ N) is a rearrangement of (an : n ∈ N). Then ∏_n (1 + an) converges absolutely
iff ∏_k (1 + bk) converges absolutely. In either case,
\[ \prod_{n} (1 + a_n) = \prod_{k} (1 + b_k). \]

Proof. The first statement follows from Theorem 11.8.2 since
\[ \big| (1 + a_N) \cdots (1 + a_n) - 1 \big| \le (1 + |a_N|) \cdots (1 + |a_n|) - 1. \]
The second statement follows from
\[ s_n := \sum_{k=1}^{n} |a_k| \le q_n := \prod_{k=1}^{n} (1 + |a_k|) \le \exp\Big( \sum_{k=1}^{n} |a_k| \Big). \]
As both (sn : n ∈ N) and (qn : n ∈ N) are monotone nondecreasing sequences, the
convergence of one implies the boundedness, and hence convergence, of the other one. In either
case, there is N ∈ N for which |an| ≤ 1/2, and thus an ≠ −1, whenever n ≥ N.
Suppose ∏_n (1 + an) converges absolutely. Let bk = ag(k) where g is a permutation of N.
Let pn = ∏_{j=1}^n (1 + aj), qk = ∏_{j=1}^k (1 + bj) and p = ∏_{n=1}^∞ (1 + an). There is a constant
C > 0 such that |pn| ≤ C for all n. Given 0 < ε < 1, there is an integer N such that
n ≥ N implies that ∑_{j≥n} |aj| < ε and |pn − p| < ε. There is an integer M such that
{1, …, N} ⊂ {g(1), …, g(M)}. For m ≥ M,
\[ |q_m - p| \le |q_m - p_N| + |p_N - p| \le |p_N| \Big( \prod_{n > N} (1 + |a_n|) - 1 \Big) + \varepsilon \le C(e^{\varepsilon} - 1) + \varepsilon < (2C + 1)\varepsilon. \]
The third statement follows immediately.

Now we are ready to study infinite products of holomorphic functions.

Lemma 11.8.4. Suppose $\{F_n : n \in \mathbb{N}\}$ is a sequence of holomorphic functions on an open set $D$. If there exist constants $c_n > 0$ such that $\sum_n c_n < \infty$ and
\[ |F_n(z) - 1| \le c_n, \qquad z \in D, \]
then
(i) The product $P_m(z) = \prod_{n=1}^m F_n(z)$ converges uniformly and absolutely in $D$ to a holomorphic function $P$ and for any $z \in D$, $P(z) = 0$ iff $F_n(z) = 0$ for some $n$.
(ii) If $K$ is a compact subset of $D$ containing no zeroes of $F_n$ for any $n$, then
\[ \frac{P'(z)}{P(z)} = \sum_n \frac{F_n'(z)}{F_n(z)}, \qquad z \in K. \tag{11.63} \]

Proof. (i) For each $z \in D$ we can write $F_n(z) = 1 + a_n(z)$ with $|a_n(z)| \le c_n$. The convergence of $P_m(z)$ follows from Theorem 11.8.3. As each $P_m$ is holomorphic on $D$, we conclude that the limit $P \in H(D)$. Moreover, if for some $z \in D$, $F_n(z) \ne 0$ for all $n$, then $\{P_m(z)\}$ is bounded away from zero, that is $\inf_m |P_m(z)| > 0$.

(ii) As $P_m$ converges uniformly to $P$, the sequence $\{P_m' : m \in \mathbb{N}\}$ also converges uniformly to $P'$ in compact subsets of $D$. If $P$ does not vanish in a compact set $K \subset D$, then $\{P_m : m \in \mathbb{N}\}$ is uniformly bounded away from 0 on $K$. Therefore $\frac{P_m'}{P_m} \to \frac{P'}{P}$ uniformly on $K$. Since
\[ \frac{P_m'}{P_m} = \sum_{n=1}^m \frac{F_n'}{F_n}, \]
statement (ii) follows.

Define the functions
\[ E_0(z) = 1 - z, \qquad E_p(z) = (1 - z) \exp\Big( z + \ldots + \frac{z^p}{p} \Big) \]
for $p \in \mathbb{N}$. The functions $E_p$ are called Weierstrass elementary factors. They are entire functions with only one zero, $z = 1$, in $\mathbb{C}$.
316 11. Differentiation

Lemma 11.8.5. If $|z| \le 1$, then
\[ |E_p(z) - 1| \le |z|^{p+1} \tag{11.64} \]
for all $p \in \mathbb{Z}_+$.

Proof. For $p = 0$, (11.64) is obvious. For $p > 0$ we have that
\[ E_p'(z) = (1 - z)(1 + z + \ldots + z^{p-1}) \exp\Big( z + \ldots + \frac{z^p}{p} \Big) - \exp\Big( z + \ldots + \frac{z^p}{p} \Big) = -z^p \exp\Big( z + \ldots + \frac{z^p}{p} \Big). \]
Thus $-E_p'(z)$ is an entire function with a zero of order $p$ at $z = 0$; furthermore, its series expansion around 0 has only nonnegative real coefficients. Let $\gamma$ be the line segment from 0 to $z$. As
\[ 1 - E_p(z) = -\int_\gamma E_p'(w)\, dw \]
we conclude that $1 - E_p$ is an entire function with a zero of order $p + 1$ at $z = 0$ whose series expansion has only nonnegative coefficients. Hence the function
\[ \varphi(z) = \frac{1 - E_p(z)}{z^{p+1}} \]
is entire and for $|z| \le 1$, $|\varphi(z)| \le \varphi(|z|) \le \varphi(1) = 1$. Therefore (11.64) holds.
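The bound (11.64) can be spot-checked numerically. The following is only a sketch, not part of the proof: the helper names `weierstrass_factor` and `bound_holds`, the sample size, the range of $p$ and the rounding tolerance are all choices made for this illustration.

```python
import cmath
import random


def weierstrass_factor(p, z):
    """Elementary factor E_p(z) = (1 - z) * exp(z + z^2/2 + ... + z^p/p)."""
    exponent = sum(z ** k / k for k in range(1, p + 1))  # empty sum (0) when p = 0
    return (1 - z) * cmath.exp(exponent)


def bound_holds(p, z, tol=1e-12):
    """Check |E_p(z) - 1| <= |z|^(p+1), allowing a small rounding tolerance."""
    return abs(weierstrass_factor(p, z) - 1) <= abs(z) ** (p + 1) + tol


# Random sample points in the closed unit disc (rejection sampling).
random.seed(0)
samples = []
while len(samples) < 2000:
    z = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    if abs(z) <= 1:
        samples.append(z)

all_ok = all(bound_holds(p, z) for p in range(0, 8) for z in samples)
print(all_ok)
```

A passing check on finitely many points does not prove the lemma; it only guards against a misreading of the statement.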
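Theorem 11.8.6. Let $\{z_n : n \in \mathbb{N}\} \subset \mathbb{C} \setminus \{0\}$ with $|z_n| \to \infty$. If $\{p_n : n \in \mathbb{N}\} \subset \mathbb{Z}_+$ satisfies
\[ \sum_{n=1}^\infty \Big( \frac{r}{|z_n|} \Big)^{1+p_n} < \infty \tag{11.65} \]
for all $r > 0$, then the infinite product
\[ P(z) = \prod_{n=1}^\infty E_{p_n}\Big( \frac{z}{z_n} \Big) \tag{11.66} \]
defines an entire function $P$ whose only zeroes in $\mathbb{C}$ are $\{z_n : n \in \mathbb{N}\}$. Furthermore, if $a \in \{z_n : n \in \mathbb{N}\}$ occurs exactly $m$ times, then $a$ is a zero of $P$ of order $m$.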

Proof. Fix $r > 0$ and let $|z| < r$. There is $N \in \mathbb{N}$ such that $|z_n| > r$ for all $n \ge N$. By Lemma 11.8.5, for $n \ge N$,
\[ \Big| E_{p_n}\Big( \frac{z}{z_n} \Big) - 1 \Big| \le \Big| \frac{z}{z_n} \Big|^{1+p_n} \le \Big( \frac{r}{|z_n|} \Big)^{1+p_n}. \]
Since $\sum_{n=1}^\infty (r/|z_n|)^{1+p_n} < \infty$, by Lemma 11.8.4 the infinite product (11.66) converges uniformly and absolutely in $B(0; r)$ to a function $P \in H(B(0; r))$, and $P(z) = 0$ iff $E_{p_n}(z/z_n) = 0$ for some $n < N$. As this holds for any $r > 0$, $P$ is entire.
Remark 11.8.7. The sequence $p_n = n - 1$ satisfies (11.65) since $\frac{r}{|z_n|} \le \frac12$ for all sufficiently large $n$. There are cases of sequences $\{z_n : n \in \mathbb{N}\}$ that grow sufficiently fast that we can hold $\{p_n\}$ constant. In such cases it is of interest to find the smallest possible constant. For example, if $z_n = n$ we can take $p_n \equiv 1$.

Theorem 11.8.8. (Weierstrass factorization theorem) Let $f$ be an entire function with $f(0) \ne 0$. Let $z_1, z_2, \ldots$ be the zeros of $f$ repeated according to their multiplicities. There exists an entire function $g$ and a sequence $p_1, p_2, \ldots$ in $\mathbb{Z}_+$ such that
\[ f(z) = e^{g(z)} \prod_n E_{p_n}\Big( \frac{z}{z_n} \Big). \tag{11.67} \]
If $f$ has a zero of order $k$ at $z = 0$ we can apply the result to $f(z)/z^k$.

Proof. If $\{z_n\}$ is finite, the result is immediate. Suppose $\{z_n\}$ is infinite. As $f$ is entire and not constant, it follows that $|z_n| \to \infty$. There exists a sequence $\{p_n : n \in \mathbb{N}\} \subset \mathbb{Z}_+$ ($p_n = n - 1$ is one example) such that (11.65) in Theorem 11.8.6 holds. Hence $P(z) = \prod_{n=1}^\infty E_{p_n}\big( \frac{z}{z_n} \big)$ is an entire function whose zeroes are $\{z_n : n \in \mathbb{N}\}$. It follows that the function $h = f/P$ is entire and has no zeros in $\mathbb{C}$. Theorem 11.5.28 implies that there is an entire function $g$ such that $h = \exp \circ g$.
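Example 11.8.9. The function $f(z) = \frac{\sin(\pi z)}{\pi z}$ is entire and has only zeros of order one at each $n \in \mathbb{Z} \setminus \{0\}$. Then, with $p_n \equiv 1$,
\[ \frac{\sin(\pi z)}{\pi} = e^{g(z)}\, z \prod_{n=1}^\infty \Big( 1 - \frac{z^2}{n^2} \Big) \]
for some entire function $g$. We will show that $e^{g(z)} \equiv 1$. The function
\[ w(z) = \pi \cot(\pi z) = \pi \frac{\cos(\pi z)}{\sin(\pi z)} \]
is meromorphic on $\mathbb{C}$ with simple poles (order one) in $\mathbb{Z}$. The function
\[ h(z) = \frac{1}{z} + \sum_{n=1}^\infty \Big( \frac{1}{z + n} + \frac{1}{z - n} \Big) = \frac{1}{z} + \sum_{n=1}^\infty \frac{2z}{z^2 - n^2} \tag{11.68} \]
is also meromorphic on $\mathbb{C}$ with simple poles in $\mathbb{Z}$. We will show that $w \equiv h$. Let $\varphi = w - h$.
Then
(a) $\varphi$ is entire.
(b) $\varphi(z) = \varphi(z + 1)$ and $\varphi(-z) = -\varphi(z)$.
(a) is obvious since both $w$ and $h$ have only poles of order one at each integer, with residues equal to 1. Thus $\varphi \in H(\mathbb{C} \setminus \mathbb{Z})$ has a removable singularity at each $n \in \mathbb{Z}$.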
To prove (b), it is enough to show that $h$ is periodic with period 1. Let
\[ h_N(z) = \sum_{n=-N}^{N} \frac{1}{z - n}. \]
This sequence converges uniformly to $h$ on compact subsets of $\mathbb{C} \setminus \mathbb{Z}$. Since
\[ h_N(z + 1) = \sum_{n=-N}^{N} \frac{1}{z - (n - 1)} = \sum_{n=-(N+1)}^{N-1} \frac{1}{z - n} = h_N(z) + \frac{1}{z + 1 + N} - \frac{1}{z - N}, \]

we obtain that h(z + 1) = h(z) by letting N → ∞.

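To show $\varphi \equiv 0$ it is enough to prove that $\varphi$ is bounded. Then, by Liouville's theorem, $\varphi$ is a constant function, and by (b) $\varphi \equiv \varphi(0) = 0$. Since $\varphi$ has period 1, it suffices to show that $\varphi$ is bounded in the strip $|\mathrm{Re}(z)| \le \frac12$. Being entire, $\varphi$ is bounded in $\big[-\frac12, \frac12\big] \times [-1, 1]$; hence we only need to consider $z = x + iy$ with $|x| \le \frac12$ and $|y| > 1$. From
\[ \cot(\pi z) = i\, \frac{e^{-2\pi y} + e^{-2i\pi x}}{e^{-2\pi y} - e^{-2i\pi x}} = i\, \frac{e^{i2\pi x} + e^{2\pi y}}{e^{i2\pi x} - e^{2\pi y}}, \]
we obtain $|\cot(\pi z)| \le \sup_{0 < a < e^{-2\pi}} \frac{1 + a}{1 - a} = \frac{1 + e^{-2\pi}}{1 - e^{-2\pi}}$. As for $h$,
\[ h(z) = \frac{1}{x + iy} + 2 \sum_{n=1}^\infty \frac{x + iy}{x^2 - y^2 - n^2 + 2ixy}. \]
Since $|x| \le \frac12 < 1 < |y|$,
\[ y^2 + n^2 - x^2 \ge y^2 + n^2 - \frac14 > y^2 + n^2 - 1 > \frac12 (y^2 + n^2) \]
and so
\[ |h(z)| \le 1 + 4 \sum_{n=1}^\infty \frac{|y|}{y^2 + n^2} \le 1 + 4 \int_0^\infty \frac{|y|}{y^2 + t^2}\, dt = 1 + 4 \int_0^\infty \frac{1}{1 + u^2}\, du = 1 + 2\pi. \]
This concludes the proof that $w \equiv h$.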
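Let $P(z) := z \prod_{n=1}^\infty \big( 1 - \frac{z^2}{n^2} \big)$ and $G(z) = \frac{\sin(\pi z)}{\pi}$. By Lemma 11.8.4(ii), for all $z \in \mathbb{C} \setminus \mathbb{Z}$
\[ \frac{P'(z)}{P(z)} = \frac{1}{z} + \sum_{n=1}^\infty \frac{2z}{z^2 - n^2} = \pi \cot(\pi z). \]
As $\frac{G'(z)}{G(z)} = \pi \cot(\pi z)$,
\[ \Big( \frac{P(z)}{G(z)} \Big)' = \frac{P(z)}{G(z)} \Big( \frac{P'(z)}{P(z)} - \frac{G'(z)}{G(z)} \Big) = 0. \]
Hence $P(z) = cG(z)$ for some constant $c$. Since $\lim_{z \to 0} \frac{\sin z}{z} = 1$, we find that $c = 1$.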
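As a numerical illustration of Example 11.8.9, one may compare a partial product of $P$ with $\sin(\pi z)/\pi$, and a partial sum of (11.68) with $\pi \cot(\pi z)$. The sketch below is only a sanity check; the sample point, the truncation level and the function names are arbitrary choices made here.

```python
import cmath


def sine_partial_product(z, terms):
    """z * prod_{n=1}^{terms} (1 - z^2/n^2), a partial product of P(z)."""
    value = z
    for n in range(1, terms + 1):
        value *= 1 - (z * z) / (n * n)
    return value


def cot_partial_sum(z, terms):
    """1/z + sum_{n=1}^{terms} 2z/(z^2 - n^2), a partial sum of h(z) in (11.68)."""
    return 1 / z + sum(2 * z / (z * z - n * n) for n in range(1, terms + 1))


z = 0.3 + 0.4j       # arbitrary non-integer sample point
terms = 100_000      # truncation level; the tails decay only like 1/terms

product_error = abs(sine_partial_product(z, terms) - cmath.sin(cmath.pi * z) / cmath.pi)
cot_error = abs(
    cot_partial_sum(z, terms)
    - cmath.pi * cmath.cos(cmath.pi * z) / cmath.sin(cmath.pi * z)
)
print(product_error, cot_error)
```

Both errors shrink roughly like the reciprocal of the truncation level, which reflects the slow convergence of $\sum_n n^{-2}$ tails.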

11.9. Exercises
Exercise 11.9.1. Let µ be a complex measure on B(Rd ) and let Mµ be its Hardy’s maximal
function. Show that Mµ < ∞ λd –a.a.
Exercise 11.9.2. For any $a < x < y < b$, show that
\[ V_f(y) - V_f(x) = \sup\Big\{ \sum_{j=1}^n |f(t_j) - f(t_{j-1})| : x = t_0 < \ldots < t_n = y,\ n \in \mathbb{N} \Big\}. \]

This means that the variation of f over any subinterval [x, y] ⊂ [a, b] is given by the
difference of the variations over [a, y] and [a, x].
Exercise 11.9.3. Show that the function $f(t) = t \sin(t^{-1})$ if $t \ne 0$ and $f(0) = 0$ is not of bounded variation over any interval containing 0.
Exercise 11.9.4. Suppose f and g are absolutely continuous functions over [a, b] and let
α ∈ C. Show that f + αg, f · g and exp ◦f are absolutely continuous.
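Exercise 11.9.5. Define $I_\pm(s, R) := \int_1^R \exp\big( i \big( \pm \frac{t^3}{3} + st \big) \big)\, dt$. Show that $\lim_{R \to \infty} I_\pm(s, R)$ exists for any $s \in \mathbb{R}$. Conclude that $\lim_{R \to \infty} \int_{-R}^{R} \cos\big( \frac{t^3}{3} + st \big)\, dt$ exists for all $s \in \mathbb{R}$. (Hint: Notice that $\frac{d}{dt}\big( e^{i t^3/3} \big) = i t^2 e^{i t^3/3}$ and use integration by parts.)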
Exercise 11.9.6. Suppose $F$ and $G$ are functions on $\mathbb{R}_+$ which are of bounded variation over any interval $[a, b] \subset \mathbb{R}_+$. Suppose that $G(\infty) := \lim_{R \to \infty} G(R)$ exists. Show that
\[ \int_{(a,b]} F(s)\, \mu_G(ds) = F(a)\big( G(\infty) - G(a) \big) - F(b)\big( G(\infty) - G(b) \big) + \int_{(a,b]} \big( G(\infty) - G(t-) \big)\, \mu_F(dt). \]

Remark 11.9.7. The existence of $\lim_{R \to \infty} G(R)$ does not mean that $\mu_G(\mathbb{R}_+)$ is finite. If $\lim_{R \to \infty} V_G(R) < \infty$, then $|\mu_G|(\mathbb{R}_+) < \infty$ and so $|\mu_G(\mathbb{R}_+)| < \infty$.
Exercise 11.9.8. Let $\mu$ be a Borel measure on an interval $I$ in the real line. For any $a, b \in I$ with $a < b$, show that
\[ \int 1(a < s_n < \ldots < s_1 < b)\, \mu(ds_1) \otimes \cdots \otimes \mu(ds_n) \le \frac{1}{n!} \big( \mu(a, b) \big)^n. \]
If in addition, $\mu$ is continuous, show that
\[ \int 1(a < s_n \le \ldots \le s_1 \le b)\, \mu(ds_1) \otimes \cdots \otimes \mu(ds_n) \le \frac{1}{n!} \big( \mu(a, b] \big)^n. \]
(Hint: Define $G(t) := \mu(a, t]$. Apply Fubini's theorem together with Lemma 11.3.6.)
Exercise 11.9.9. Let $\{f_n : n \in \mathbb{N}\}$ be a sequence of differentiable functions on an open interval $I$ such that $f_n' \in L_1^{loc}(I)$. Assume that $f_n$ and $f_n'$ converge uniformly in compact subsets of $I$ to functions $f$ and $g$ respectively. Show that $f$ is $\lambda$–a.s. differentiable and that $f'(x) = g(x)$ at every differentiable point $x$. (Hint: For fixed $x \in I$, consider $\phi_n(h) = \frac{1}{2h} \int_{[x-h, x+h]} f_n'(t)\, dt$, $h > 0$. Then, show that $\frac{f_n(x+h) - f_n(x-h)}{2h} = \phi_n(h)$ converges uniformly to $\frac{1}{2h} \int_{[x-h, x+h]} g(t)\, dt$.)
Exercise 11.9.10. Let $D$ be an open subset in $\mathbb{C}$ and suppose $f \in H(D)$. Show that Cauchy–Riemann's equation in polar coordinates is given by
\[ \frac{\partial g}{\partial r} = \frac{1}{ir} \frac{\partial g}{\partial \theta} \]
where $g(r, \theta) = f(re^{i\theta})$.

Exercise 11.9.11. Suppose the double series $\sum_{(n,m) \in \mathbb{N}^2} a(n) z^{nm}$ converges absolutely on $B(0; 1)$ and call its sum $S(z)$. Show that each of the following series converges absolutely in $B(0; 1)$ as well, and has sum $S(z)$:
\[ \sum_{n=1}^\infty a(n) \frac{z^n}{1 - z^n}, \qquad \sum_{n=1}^\infty A(n) z^n, \quad \text{where } A(n) = \sum_{d | n} a(d). \]
Give a concrete expression for $S(z)$ when $a(n) \equiv 1$.

Exercise 11.9.12. Consider the function $f(z) = \sum_{k=1}^\infty 5^k z^{n_k}$ where $n_1 > 1$ and $n_{k+1} > 2k n_k$ for all $k \in \mathbb{N}$. Show that
(i) $f$ has radius of convergence 1.
(ii) There is a constant $c > 0$ such that for all $m$, $|f(z)| > c 5^m$ if $|z| = 1 - \frac{1}{n_m}$.
(iii) $f$ has no finite radial limit as $|z| \to 1$.
(iv) For any $\alpha \in \mathbb{C}$, $f(z) + \alpha = 0$ has infinitely many solutions in $B(0; 1)$.
Exercise 11.9.13. (Schwarz reflection principle) Suppose $U_+$ is an open connected set in the upper–half plane and that $\overline{U_+} \cap (\mathbb{R} \times \{0\}) = [a, b]$ ($-\infty \le a < b \le \infty$). Let $f$ be a complex–valued function that is continuous on $U_+ \cup \big( (a, b) \times \{0\} \big)$ and analytic on $U_+$. Let $U_- = \{\bar z : z \in U_+\}$, the reflection of $U_+$ with respect to the real line. Define $g$ on $U_-$ by $g(z) = \overline{f(\bar z)}$. If $f$ is real valued on $(a, b)$, show that the function $h$ on $U = U_+ \cup \big( (a, b) \times \{0\} \big) \cup U_-$ defined by $h = f$ on $U_+ \cup \big( (a, b) \times \{0\} \big)$ and $h = g$ on $U_-$ is analytic, and that it is the only analytic function in $U$ that coincides with $f$ on $U_+$. (Hint: Show that $\int_{\partial \triangle} h = 0$ for any triangle $\triangle \subset U$ and use Morera's theorem.)
Exercise 11.9.14. Let µ be a complex measure on a measurable space (X, B) and let
D ⊂ C be open. Suppose ϕ is a bounded complex valued function in D × X such that
ϕ(·, x) ∈ H(D) for each x ∈ X and that ϕ(z, ·) is B–measurable for each z ∈ D. Define
\[ f(z) := \int_X \varphi(z, x)\, \mu(dx), \qquad z \in D. \]

Show that f ∈ H(D). (Hint: Use Morera’s theorem together with Fubini’s theorem.)
Exercise 11.9.15. Determine the regions in which the following functions are holomorphic:
\[ f(z) = \int_0^1 \frac{dt}{1 + tz}, \qquad g(z) = \int_0^\infty \frac{e^{tz}}{1 + t^2}\, dt, \qquad h(z) = \int_{-1}^{1} \frac{e^{tz}}{1 + t^2}\, dt. \]

Exercise 11.9.16. Let $z_0 \in \mathbb{C}$ and $c > 0$. Define the path $\xi : t \mapsto z_0 + itc$, $-1 \le t \le 1$. For $x > 0$ define
\[ g(x) = \frac{1}{2\pi i} \int_\xi \Big( \frac{1}{z - z_0 - x} - \frac{1}{z - z_0 + x} \Big)\, dz. \]
Estimate $\lim_{x \to 0} g(x)$.

Exercise 11.9.17. (Gamma function reprise) Show that $\Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\, dt$ defines an analytic function in the half plane $H = \{z \in \mathbb{C} : \mathrm{Re}(z) > 0\}$, and that it satisfies $\Gamma(z + 1) = z\Gamma(z)$ for all $z \in H$. (Hint: Define $F_n(z) = \int_{(1/n, n]} e^{-t} t^{z-1}\, dt$. Apply the result from Exercise 11.9.14 to show that on any strip $S_{a,b} = \{z \in \mathbb{C} : a < \mathrm{Re}(z) < b\}$ ($0 < a < b < \infty$), $F_n$ is analytic and $F_n$ converges to $\Gamma$ uniformly. For the last statement, use integration by parts.)
Exercise 11.9.18. Show that
(a) $S_1(z) = \int_{(1,\infty)} e^{-t} t^{z-1}\, dt$ is an entire function.
(b) For $\mathrm{Re}(z) > 0$, $\int_{(0,1]} e^{-t} t^{z-1}\, dt = \sum_{n=0}^\infty \frac{(-1)^n}{n!(n+z)}$.
(c) Show that $S_0(z) = \sum_{n=0}^\infty \frac{(-1)^n}{n!(n+z)}$ is meromorphic with only simple poles in $-\mathbb{Z}_+$. (Hint: For fixed $R > 0$, split the series at some integer $N > 2R$. Show that the finite sum is meromorphic with poles in $0, -1, \ldots, -N$ and that the remaining series converges uniformly since
\[ \Big| \frac{(-1)^n}{n!(n + z)} \Big| \le \frac{1}{n! R} \]
for $2R < N < n$ and $|z| \le R$.)
(d) Conclude that $\Gamma$ can be extended as a meromorphic function in $\mathbb{C}$ with only simple poles at $-\mathbb{Z}_+$.
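Before attempting the exercise, the decomposition $\Gamma = S_0 + S_1$ can be sanity-checked numerically at a few real points, including a negative non-integer one. The sketch below is an illustration only: the function names, the truncation of the series, the cut-off of the integral at $t = 60$ and the use of Simpson's rule are assumptions made here, and `math.gamma` serves as the reference value.

```python
import math


def s0(z, terms=60):
    """Partial sum of S_0(z) = sum_{n>=0} (-1)^n / (n! (n + z))."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n / (math.factorial(n) * (n + z))
    return total


def s1(z, upper=60.0, steps=20000):
    """Simpson approximation of the integral of e^{-t} t^{z-1} over [1, upper]."""
    h = (upper - 1.0) / steps  # steps must be even for Simpson's rule

    def integrand(t):
        return math.exp(-t) * t ** (z - 1)

    total = integrand(1.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(1.0 + i * h)
    return total * h / 3


def gamma_via_split(z):
    """Approximate Gamma(z) as S_0(z) + S_1(z) for real z outside {0, -1, -2, ...}."""
    return s0(z) + s1(z)


for z in (0.5, 2.5, -0.5):
    print(z, gamma_via_split(z), math.gamma(z))
```

Agreement at $z = -0.5$ is consistent with part (d): the right-hand side makes sense away from the poles even where the defining integral of $\Gamma$ diverges.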
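Exercise 11.9.19. Prove that on the strip $S_{0,1} = \{z \in \mathbb{C} : 0 < \mathrm{Re}(z) < 1\}$
(a) $\lim_{R \to \infty} \int_0^R \cos(t)\, t^{z-1}\, dt = \Gamma(z) \cos\big( \frac{\pi z}{2} \big)$.
(b) $\lim_{R \to \infty} \int_0^R \sin(t)\, t^{z-1}\, dt = \Gamma(z) \sin\big( \frac{\pi z}{2} \big)$.
(Hint: Use the contour shown in Figure 3)

Figure 3. (Integration contour; the labels recoverable from the figure are $i\varepsilon$, $\varepsilon$ and $R$.)

(c) Show that the equation in (b) can be extended by analytic continuation to $-1 < \mathrm{Re}(z) < 1$, and as a consequence
\[ \lim_{R \to \infty} \int_0^R \frac{\sin x}{x}\, dx = \frac{\pi}{2} \quad \text{and} \quad \lim_{R \to \infty} \int_0^R \frac{\sin x}{x^{3/2}}\, dx = \sqrt{2\pi}. \]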

(Hint: use Exercise 11.9.18)

Exercise 11.9.20. (Frobenius–Fuchs method, cont.) Consider the second order differential equation (11.51) and suppose that the indicial equation $I(r) = r(r - 1) + p_0 r + q_0 = 0$ has solutions $\alpha, \beta$ such that $\mathrm{Re}(\alpha - \beta) \ge 0$. Show that
(a) If $s = \alpha - \beta \notin \mathbb{Z}_+$ then, there are two solutions to (11.51) of the form
\[ w_1(z) = z^\alpha \Big( 1 + \sum_{k \ge 1} a_k z^k \Big), \qquad w_2(z) = z^\beta \Big( 1 + \sum_{k \ge 1} b_k z^k \Big) \]
where the series have positive radii of convergence. $\{w_1, w_2\}$ is a linearly independent system of solutions.
(b) If $n = \alpha - \beta \in \mathbb{Z}_+$ then, there is a second solution to (11.51) of the form
\[ w_1(z) = z^\alpha \Big( 1 + \sum_{k \ge 1} a_k z^k \Big), \qquad w_2(z) = z^\beta \Big( 1 + \sum_{k \ge 1} b_k z^k \Big) + C w_1(z) \log z \]
for some constant $C$, and where the power series have positive radii of convergence. (Hint: suppose there is a solution of the form $w(z) = w_1(z) h(z)$ for some analytic function $h$ in a disk around 0. Write $w_1(z) = z^\alpha f(z)$ with $f$ analytic in a disk near 0 with $f(0) = 1$. This gives a first order equation on $h'$ given by
\[ h'' + \Big( \frac{2\alpha}{z} + 2\frac{f'}{f} + \frac{p(z)}{z} \Big) h' = 0. \]
Show that a solution to this reduced equation is of the form
\[ h'(z) = z^{-n-1} \Big( 1 + \sum_{k \ge 1} c_k z^k \Big). \]
Integration of $h'$ gives a well defined analytic function around 0.)

Exercise 11.9.21. Suppose $f$ is an entire function and $|f(z)| \le A|z|^k$ for some constant $A > 0$, $k \in \mathbb{N}$ and all $z$ large enough. Show that $f$ is a polynomial of degree at most $k + 1$ (Hint: Use Cauchy estimates).

Exercise 11.9.22. Suppose $f$ is an entire function and that for some $\rho > 0$ there are constants $A, B$ such that $|f(z)| \le A \exp(B|z|^\rho)$. The infimum $\rho_f$ of all such $\rho$ is called the order of growth of $f$. Show that
(a) There exists a constant $C$ depending only on $f$ such that
\[ n(r) \le C r^\rho, \qquad r > 0. \]

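(b) If $\{\alpha_k : k \in \mathbb{N}\}$ are the zeroes of $f$ that are different from 0 then, for any $s > \rho$, $\sum_{k=1}^\infty \frac{1}{|\alpha_k|^s} < \infty$. (Hint: notice that
\[ \sum_{|\alpha_k| \ge 1} \frac{1}{|\alpha_k|^s} = \sum_{j \ge 0} \Big( \sum_{2^j \le |\alpha_k| < 2^{j+1}} \frac{1}{|\alpha_k|^s} \Big) \]
and combine this with part (a).)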

(c) If {an : n ∈ Z+ } are the coefficients of the Taylor expansion of f around 0 then
n
|an | ≤ ρne ρ for all n large enough and so,
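\[ \limsup_n\, n^{1/\rho} \sqrt[n]{|a_n|} < \infty \tag{11.69} \]
(d) Show that $g(z) = \sum_{n=0}^\infty \frac{z^n}{(n!)^\alpha}$ is of order $1/\alpha$. (Hint: $\frac{R^n}{(n!)^\alpha} = \Big( \frac{(R^{1/\alpha})^n}{n!} \Big)^\alpha \le e^{\alpha R^{1/\alpha}}$).
(e) Show that $f$ is of order $\rho^*$ if whenever $\rho$ satisfies (11.69), then $\rho \ge \rho^*$.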
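Exercise 11.9.23. The Bernoulli numbers $\{B_n : n \in \mathbb{Z}_+\}$ are defined by
\[ \frac{z}{e^z - 1} = \sum_{n=0}^\infty \frac{B_n}{n!} z^n. \]
Show that the sequence $B_n$ satisfies
(a) $\limsup_n \sqrt[n]{|B_n|/n!} = 1/(2\pi)$.
(b) $B_0 = 1$ and $\sum_{k=0}^n \frac{B_{n-k}}{(k+1)!(n-k)!} = 0$ for $n \ge 1$.
Show that
\[ \frac{z}{2}\, \frac{e^{z/2} + e^{-z/2}}{e^{z/2} - e^{-z/2}} = \sum_{n=0}^\infty \frac{B_{2n}}{(2n)!} z^{2n} \]
and conclude that
\[ \pi z \cot(\pi z) = \sum_{n=0}^\infty (-1)^n \frac{(2\pi)^{2n}}{(2n)!} B_{2n} z^{2n}. \tag{11.70} \]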

Exercise 11.9.24. Show that
\[ \pi z \cot(\pi z) = 1 - 2 \sum_{m=1}^\infty \Big( \sum_{n=1}^\infty \frac{1}{n^{2m}} \Big) z^{2m} \tag{11.71} \]
for $|z| < 1$. Deduce from this that
\[ 2 \sum_{n=1}^\infty \frac{1}{n^{2m}} = (-1)^{m+1} (2\pi)^{2m} \frac{B_{2m}}{(2m)!} \]
for all $m \in \mathbb{N}$ where $B_{2m}$ is the $2m$–th Bernoulli number. (Hint: Use identity (11.68) in Example 11.8.9. Equate the coefficients of the power series (11.71) and (11.70).)
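The recurrence in Exercise 11.9.23(b) and the resulting values of $\sum_n n^{-2m}$ can be checked with exact rational arithmetic. This is a sketch under the sign convention $2\sum_n n^{-2m} = (-1)^{m+1}(2\pi)^{2m} B_{2m}/(2m)!$, obtained by comparing (11.70) and (11.71); the function names are introduced here for illustration only.

```python
from fractions import Fraction
from math import factorial, pi


def bernoulli_numbers(count):
    """Return B_0, ..., B_{count-1} using
    sum_{k=0}^{n} B_{n-k} / ((k+1)! (n-k)!) = 0 for n >= 1, with B_0 = 1."""
    b = [Fraction(1)]
    for n in range(1, count):
        # The k = 0 term is B_n / n!; solve for it from the remaining terms.
        rest = sum(
            b[n - k] / (factorial(k + 1) * factorial(n - k)) for k in range(1, n + 1)
        )
        b.append(-rest * factorial(n))
    return b


def zeta_even(m, bernoulli):
    """sum_{n>=1} n^(-2m) = (-1)^(m+1) (2 pi)^(2m) B_{2m} / (2 (2m)!)."""
    return (-1) ** (m + 1) * (2 * pi) ** (2 * m) * float(bernoulli[2 * m]) / (
        2 * factorial(2 * m)
    )


B = bernoulli_numbers(9)
print(B[1], B[2], B[4], B[6])
print(zeta_even(1, B), pi ** 2 / 6)
print(zeta_even(2, B), pi ** 4 / 90)
```

The classical values $\pi^2/6$ and $\pi^4/90$ serve as reference points for $m = 1$ and $m = 2$.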

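Exercise 11.9.25. (Laplace transform) Suppose $\mu$ is a Radon or a complex measure on $(\mathbb{R}_+, \mathcal{B}(\mathbb{R}_+))$. The Laplace transform of $\mu$ is defined as
\[ f(z) := \lim_{R \to \infty} \int_{[0,R]} e^{-zt}\, \mu(dt) \]
wherever the limit exists. If $f(z_0)$ exists for some $z_0 = \sigma_0 + i\xi_0$, show that $f(z)$ exists in the set $\Delta(\sigma_0) := \{z \in \mathbb{C} : \mathrm{Re}(z) > \sigma_0\}$. If $t \mapsto e^{-\sigma_0 t} \in L_1(|\mu|)$ for some $\sigma_0 \in \mathbb{R}$, show that $f \in H\big( \Delta(\sigma_0) \big) \cap C\big( \overline{\Delta(\sigma_0)} \big)$.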
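Exercise 11.9.26. Show that $\hat f_p : t \mapsto \int_{[-1,1]} (1 - x^2)^{-p} e^{-itx}\, dx$ is entire for each $p < 1$ and that
\[ \hat f_p(t) = \frac{\sqrt{\pi}\, \Gamma(1 - p)\, 2^{\frac12 - p}}{t^{\frac12 - p}}\, J_{\frac12 - p}(t) \]
where $J_m$, $m > -1$, is the Bessel function of order $m$ which is defined as
\[ J_m(z) = \sum_{n \ge 0} \frac{(-1)^n}{n!\, \Gamma(n + m + 1)} \Big( \frac{z}{2} \Big)^{m + 2n}. \]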
Chapter 12

Some Elements of
Functional Analysis

In this section we discuss a few results on the theory of continuous linear maps on topological vector spaces which will be useful in the following sections, in particular in the study of further representation theorems, addressed in Chapter 13, and in the study of weak convergence of measures, addressed in Chapter 17.

12.1. Topological Vector Spaces

Suppose that $X$ is a vector space over the field $\mathbb{F}$ of real or complex numbers. A seminorm on $X$ is a function $\rho$ from $X$ to $\mathbb{R}$ such that
(a) $\rho(\alpha x) = |\alpha| \rho(x)$ for all $\alpha \in \mathbb{F}$ and $x \in X$.
(b) $\rho(x + y) \le \rho(x) + \rho(y)$ for all $x, y \in X$.
It follows immediately from the definition that $\rho(0) = 0$ and that $\rho(x) \ge 0$ for all $x \in X$.
A seminorm $\rho$ that satisfies
(c) $\rho(x) = 0$ iff $x = 0$
is a norm as defined in Example 2.5.3.
Recall that if $\|\cdot\|$ is a norm on $X$, then $d(x, y) := \|x - y\|$ defines a metric on $X$. If this metric is complete, then $(X, \|\cdot\|)$ is a Banach space.

Example 12.1.1. The Euclidean spaces $(\mathbb{F}^n, \|\cdot\|_2)$ defined in Example 2.5.6 are the simplest examples of Banach spaces.

Example 12.1.2. Let $K$ be a compact topological space. The space $C(K)$ of complex or real continuous functions on $K$ with $\|f\|_u := \sup_{x \in K} |f(x)|$ is a Banach space.


Example 12.1.3. We saw in Chapter 8 that if $(\Omega, \mathcal{F}, \mu)$ is a measure space, then for each $1 \le p \le \infty$, $L_p(\mu)$ is a Banach space.

In general, suppose $X$ is a vector space over the field $\mathbb{F}$, where either $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$.

Let $\tau$ be a topology on $X$ and assign to $X \times X$ and $\mathbb{F} \times X$ the corresponding product topologies. $(X, \tau)$ is a topological vector space if $\{0\}$ is a closed set, and the maps $X \times X \ni (x, y) \mapsto x + y \in X$ and $\mathbb{F} \times X \ni (r, x) \mapsto rx \in X$ are continuous.
Remark 12.1.4. It follows immediately from the definition that if $X$ is a topological vector space, $x_0 \in X$ and $a \in \mathbb{F} \setminus \{0\}$, then
(a) $L_{x_0} : x \mapsto x + x_0$ and $g_a : x \mapsto ax$ are homeomorphisms of $X$ onto itself. The inverse maps are given by $L_{-x_0}$ and $g_{a^{-1}}$ respectively.
(b) If $x_0 \ne 0$, then $\lambda \mapsto \lambda x_0$ is a homeomorphism between $\mathbb{F}$ and $\mathrm{span}(x_0) = \{\lambda x_0 : \lambda \in \mathbb{F}\}$ with the relative topology given by $\tau$.
(c) $\{x\}$ is closed for each $x \in X$.
(d) If $\mathcal{V} \subset \tau$ is a local base of open sets at 0, then $\mathcal{B} = \{x + V : x \in X, V \in \mathcal{V}\}$ is a base for $\tau$.
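Lemma 12.1.5. Let $X$ be a topological vector space. Then, for any $\alpha \in \mathbb{F}$ and subsets $A$ and $B$ of $X$, $\alpha \overline{A} = \overline{\alpha A}$ and $\overline{A} + \overline{B} \subset \overline{A + B}$.

Proof. It is enough to consider $\alpha \ne 0$. Since the map $g_\alpha(x) = \alpha x$ is a homeomorphism and its inverse is given by $g_{\alpha^{-1}}$, the set $\alpha \overline{A}$ is closed. Therefore,
\[ \overline{\alpha A} \subset \alpha \overline{A} = \alpha \overline{\alpha^{-1} \alpha A} \subset \alpha \alpha^{-1} \overline{\alpha A} = \overline{\alpha A}. \]

Let $a \in \overline{A}$ and $b \in \overline{B}$ and suppose $W$ is an open set with $a + b \in W$. Then, there exist open neighborhoods $V_1$ and $V_2$ of $a$ and $b$ respectively such that $V_1 + V_2 \subset W$. Since $a \in \overline{A}$ and $b \in \overline{B}$, there are points $x \in V_1 \cap A$ and $y \in V_2 \cap B$. Therefore,
\[ x + y \in (A + B) \cap (V_1 + V_2) \subset (A + B) \cap W \]
and thus, $a + b \in \overline{A + B}$.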
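Example 12.1.6. In $\mathbb{R}$ consider the sets $A = \{n + \frac{1}{n+1} : n \in \mathbb{N}\}$ and $B = \{-n : n \in \mathbb{N}\}$. Clearly $A$ and $B$ are both closed subsets of $\mathbb{R}$; however, $A + B$ is not closed since $\{\frac{1}{n+1} : n \in \mathbb{N}\} \subset A + B$ but $0 \notin A + B$.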
Theorem 12.1.7. Let $X$ be a topological vector space and $\emptyset \ne A \subset X$. Then
(a) If $V$ is open, so is $A + V$.
(b) $\overline{A} = \bigcap \{A + V : V \text{ open}, 0 \in V\}$

Proof. (a) If $V$ is open, then so is $x + V$ for any $x \in X$. The conclusion follows since $A + V = \bigcup_{x \in A} (x + V)$.

(b) Notice that $x \in \overline{A}$ iff $(x + V) \cap A \ne \emptyset$ for any open neighborhood $V$ of 0, which is equivalent to $x \in A - V$ for every such neighborhood. Since $V$ is an open neighborhood of 0 iff so is $-V$, the proof is complete.
Theorem 12.1.8. Let $K$ and $F$ be disjoint subsets of $X$. If $K$ is compact and $F$ is closed then, there is an open neighborhood $B$ of 0 such that $(K + B) \cap (F + B) = \emptyset$. Consequently, a topological vector space is Hausdorff (in fact, it is regular).

Proof. First we show that every neighborhood of 0 contains sums of symmetric neighborhoods of 0. Let $W$ be an open neighborhood of 0. Since $0 + 0 = 0$, there are neighborhoods $B_1$ and $B_2$ of 0 such that $B_1 + B_2 \subset W$. Then, $U = B_1 \cap B_2 \cap (-B_1) \cap (-B_2)$ is a symmetric neighborhood of 0 such that $U + U \subset W$. Repeating this argument, there is a symmetric neighborhood $B$ of 0 such that
\[ B + B + B \subset B + B + B + B \subset U + U \subset W \]
and so, one can continue this way as needed.
If $K = \emptyset$ or $F = \emptyset$ there is nothing to prove. Assume $K \ne \emptyset$ and $F \ne \emptyset$. For every $x \in K$ there is a symmetric neighborhood $B_x$ of 0 such that $(x + B_x + B_x + B_x) \cap F = \emptyset$. Consequently,
\[ (x + B_x + B_x) \cap (F + B_x) = \emptyset. \tag{12.1} \]
By compactness, there are $x_1, \ldots, x_n \in K$ such that $K \subset \bigcup_{j=1}^n (x_j + B_{x_j})$. Let $B = \bigcap_{j=1}^n B_{x_j}$, then,
\[ K + B \subset \bigcup_{j=1}^n (x_j + B_{x_j} + B) \subset \bigcup_{j=1}^n (x_j + B_{x_j} + B_{x_j}). \]
From (12.1) we conclude that $(K + B) \cap (F + B) = \emptyset$.
Corollary 12.1.9. Let $V$ be a non–empty open subset of a topological vector space $X$. Any point $x \in V$ has an open neighborhood $W_x$ such that $W_x \subset \overline{W_x} \subset V$.

Proof. Let $F = V^c$. If $V = \emptyset$, then $W = \emptyset$ satisfies the statement. Suppose $x \in V$, then applying Theorem 12.1.8 gives an open neighborhood $B$ of 0 such that $(x + B) \cap (B + F) = \emptyset$. Therefore, $x + B =: W \subset \overline{W} \subset \overline{X \setminus (B + F)} = X \setminus (B + F) \subset X \setminus F = V$.

Corollary 12.1.9 implies that any topological vector space is Hausdorff regular, that is, for any point $x \in X$ and closed subset $F \subset X$ such that $x \notin F$, there exists an open neighborhood $V \subset X$ of 0 such that $x + V$ and $F + V$ are disjoint.
Definition 12.1.10. Let X be a vector space.
(a) B ⊂ X is balanced if λB ⊂ B for any λ ∈ F with |λ| ≤ 1.
(b) C ⊂ X is convex if λC + (1 − λ)C ⊂ C for all λ ∈ [0, 1].
(c) A ⊂ X is affine if αA + (1 − α)A ⊂ A for all α ∈ R.
If X is a topological vector space,

(d) $F \subset X$ is bounded if for any neighborhood $W$ of 0, there is $s > 0$ such that $F \subset sW$.
(e) $E \subset X$ is totally bounded if for any neighborhood $U$ of 0 there is a finite set $F \subset X$ such that $E \subset F + U$.
(f) A subset $C$ of a vector space $X$ is a cone if $tC \subset C$ for all $t \ge 0$. A cone is pointed if $(-C) \cap C \subset \{0\}$. The convex cone generated by a subset $A \subset X$, $\mathrm{cone}(A)$, is the smallest convex cone containing $A$.
Remark 12.1.11. It is clear that a compact subset of a topological vector space is both bounded and totally bounded.
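Lemma 12.1.12. Suppose $C$, $A$ and $B$ are convex, affine and balanced sets of a topological vector space $X$ respectively. Then,
(i) $C^o$ and $\overline{C}$ are convex and
\[ \lambda C^o + (1 - \lambda)\overline{C} \subset C^o, \qquad 0 < \lambda \le 1. \tag{12.2} \]
Moreover, if $C^o \ne \emptyset$ then $\overline{C^o} = \overline{C}$ and $(\overline{C})^o = C^o$.
(ii) $\overline{A}$ is affine.
(iii) $\overline{B}$ is balanced, and if $0 \in B^o$, so is $B^o$.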

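Proof. (i) Lemma 12.1.5 implies that for all $0 < \lambda < 1$
\[ \lambda \overline{C} + (1 - \lambda)\overline{C} = \overline{\lambda C} + \overline{(1 - \lambda) C} \subset \overline{\lambda C + (1 - \lambda) C} \subset \overline{C}. \]
This shows that $\overline{C}$ is convex.
(12.2) holds for $\lambda = 1$. Suppose $0 < \lambda < 1$, $x \in C^o$ and $y \in \overline{C}$. There is an open neighborhood $U$ of 0 such that $x + U \subset C$. Since $y \in \overline{C}$, there exists $z \in \big( y - \frac{\lambda}{1 - \lambda} U \big) \cap C$, and so $(1 - \lambda)(y - z) \in \lambda U$. The convexity of $C$ implies that the open set $V = \lambda(x + U) + (1 - \lambda)z$ is contained in $C$; moreover,
\[ \lambda x + (1 - \lambda)y = \lambda x + (1 - \lambda)z + (1 - \lambda)(y - z) \in \lambda x + (1 - \lambda)z + \lambda U = V \subset C. \]
Hence $w_\lambda := \lambda x + (1 - \lambda)y \in C^o$. Convexity of $C^o$ and (12.2) follow.
If $C^o \ne \emptyset$, then as $\lim_{\lambda \to 0} w_\lambda = y$, we obtain that $\overline{C^o} = \overline{C}$.
Clearly $C^o \subset (\overline{C})^o$. To prove the converse inclusion let $x \in (\overline{C})^o$. There exists an open neighborhood $W$ of 0 such that $x + W \subset \overline{C}$. Fix $x_0 \in C^o$. There is $\varepsilon > 0$ such that $\varepsilon(x - x_0) \in W$. Then $x + \varepsilon(x - x_0) \in \overline{C}$. By the first part of the proof, $x - \varepsilon(x - x_0) = \varepsilon x_0 + (1 - \varepsilon)x \in C^o$; hence, $x = \frac12 (x - \varepsilon(x - x_0)) + \frac12 (x + \varepsilon(x - x_0)) \in C^o$. Therefore $C^o \subset (\overline{C})^o \subset C^o$.
(ii) is proved similarly.
(iii) Suppose $0 < |\alpha| \le 1$. Then $\alpha \overline{B} = \overline{\alpha B} \subset \overline{B}$. Clearly if $\alpha = 0$, $\alpha \overline{B} = \{0\} \subset B \subset \overline{B}$; hence, $\overline{B}$ is balanced. If $0 \in B^o$ then, for $0 < |\alpha| \le 1$, $\alpha B^o = (\alpha B)^o \subset B^o$. This completes our proof.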

Lemma 12.1.13. Any neighborhood $W \ni 0$ contains a balanced neighborhood $B \ni 0$.

Proof. By continuity of the scalar product, there is $\delta > 0$ and a neighborhood $U \ni 0$ such that for all $|\alpha| < \delta$, $\alpha U \subset W$. The open set $B = \bigcup_{\{|\alpha| < \delta\}} \alpha U \subset W$ satisfies the conditions of the Lemma.
Lemma 12.1.14. A set $F$ is bounded iff for any open neighborhood $W \ni 0$, there is $t_0 > 0$ such that $F \subset tW$ for all $t \ge t_0$.

Proof. Let $B \subset W$ be a balanced neighborhood of 0. Let $t_0 > 0$ be such that $F \subset t_0 B$. Then $\frac{t_0}{t} B \subset B$ for all $t \ge t_0$. Therefore, $F \subset t_0 B \subset tB \subset tW$.
Theorem 12.1.15. (i) For any sequence $\{z_n \in \mathbb{F} : n \in \mathbb{N}\}$ with $|z_n| \to \infty$ and any open neighborhood $W \subset X$ of 0
\[ X = \bigcup_n z_n W. \]
(ii) If $V$ is an open bounded neighborhood of 0, then for any decreasing sequence $a_n \searrow 0$, the collection $\{a_n V : n \in \mathbb{N}\}$ is a local base at 0.
(iii) If $A \subset X$ is bounded, then so is $\overline{A}$.

Proof. (i) Let $U \subset W$ be a balanced neighborhood of 0. From the continuity of the map $t \mapsto tx$, it follows that the singleton $\{x\}$ is bounded for any $x \in X$. Hence, there is $\alpha_0 > 0$ such that $x \in tU$ for all $t \ge \alpha_0$. Then, $x \in |z|U = zU \subset zW$ for all $z \in \mathbb{F}$ with $|z| \ge \alpha_0$.

(ii) Suppose $V$ is an open bounded neighborhood of 0. Then for any open neighborhood $W$ of 0, there is $t_0 > 0$ such that $t \ge t_0$ implies that $V \subset tW$. There is $n_0$ such that $0 < t_0 a_n < 1$ whenever $n \ge n_0$. For such $n$, we have that $a_n V \subset W$.

(iii) For any open neighborhood $W$ of 0, let $U$ be an open set with $0 \in U \subset \overline{U} \subset W$. Since $A$ is bounded, there is $t_0 > 0$ such that $t \ge t_0$ implies $A \subset tU$. Then, $\overline{A} \subset \overline{tU} = t\overline{U} \subset tW$.

An immediate consequence of Theorem 12.1.15(i) is that compact subsets of a topological vector space $X$ are bounded.
A sequence {xn : n ∈ N} in a topological vector space X is said to be a Cauchy
sequence iff for any open neighborhood V of 0, there is an integer N such that n, m ≥ N
implies that xn − xm ∈ V .
Theorem 12.1.16. Let $\Phi = \{x_n : n \in \mathbb{N}\}$ be a sequence in a topological vector space $X$. If $\Phi$ is convergent, then it is a Cauchy sequence. If $\Phi$ is a Cauchy sequence, then it is bounded.

Proof. For any open neighborhood $V$ of 0 let $U$ be a balanced neighborhood such that $U + U \subset V$.
Suppose $x_n \to x$ as $n \to \infty$. Then, for all $n, m$ large enough, $x_m, x_n \in x + U$ and so, $x_n - x_m = (x_n - x) + (x - x_m) \in U + (-U) = U + U \subset V$.

Suppose {xn : n ∈ N} is Cauchy. Then there is an integer N such that xn ∈ xN + U for

all n ≥ N . There is t0 > 1 such that xj ∈ tU for all t ≥ t0 and 1 ≤ j ≤ N . Therefore,
{xn : n ∈ N} ⊂ tU + tU ⊂ tV for all t ≥ t0 .

Theorem 12.1.17. A set $E$ in a topological vector space $X$ is bounded iff $\gamma_n x_n \to 0$ for any sequences $\{x_n : n \in \mathbb{N}\} \subset E$ and $\{\gamma_n : n \in \mathbb{N}\} \subset \mathbb{F}$ with $\gamma_n \to 0$.

Proof. Suppose $E$ is bounded, $\gamma_n \to 0$ and $\{x_n : n \in \mathbb{N}\} \subset E$. Then, for any balanced open neighborhood $W$ of 0 there is $t > 0$ such that $E \subset tW$. There is an integer $N$ such that $n \ge N$ implies $|t\gamma_n| < 1$. Hence, $\gamma_n x_n = \gamma_n t\, t^{-1} x_n \in W$ for all $n \ge N$.

Suppose $E$ is not bounded. Then there is an open neighborhood $W$ of 0 such that for any integer $n$ there is $x_n \in E \setminus (nW)$. Then $\frac{1}{n} x_n \notin W$ for all $n$, so $\frac{1}{n} x_n$ does not converge to 0.

A metric $d$ in a topological vector space is translation invariant if
\[ d(x + z, y + z) = d(x, y) \]
for all $x, y, z$ in $X$.
A topological vector space $X$ is called an F–space if its topology is generated by a complete translation invariant metric. A topological vector space $X$ is locally convex if every point has a neighborhood $V$ which is convex. It is easy to check that any locally convex space admits a basis consisting of convex open sets. An F–space is called a Fréchet space if it is locally convex.

Example 12.1.18. Any Banach space is a Fréchet space.

Example 12.1.19. The ball $B(0; r) = \{f \in L_0 : \|f\|_0 < r\}$ is balanced for any $r > 0$; however, $L_0$ is not locally convex in general. As a counterexample, consider the probability space $((0, 1], \mathcal{B}((0, 1]), \lambda)$. Define
\[ f_0 \equiv 1, \qquad f_n = 2^k\, 1_{(2^{-k}(\ell - 1),\, 2^{-k}\ell]} \]
for all $n = 2 + \ldots + 2^{k-1} + \ell$, $1 \le \ell \le 2^k$, $k \in \mathbb{N}$. The sequence $f_n$ converges to 0 in $\lambda$–measure; in fact, $\|f_n\|_0 = 2^{-k}$. If $L_0$ were locally convex, then there would be a $\delta > 0$ such that the convex hull $\mathrm{co}(B(0; \delta)) \subset B(0; 1/2)$. For all $k$ large enough, there are $2^k$ functions $f_n$ such that $2^{-k}(f_n + \ldots + f_{n + 2^k - 1}) \equiv 1 \in \mathrm{co}(B(0; \delta)) \subset B(0; 1/2)$, which is not possible as $\|1\|_0 = 1$.

Theorem 12.1.20. Suppose $X$ is a topological vector space whose topology is generated by an invariant metric. Then, for any sequence $\{x_n : n \in \mathbb{N}\}$ that converges to 0, there is a sequence $\{\gamma_n : n \in \mathbb{N}\} \subset \mathbb{F}$ such that $\gamma_n \to \infty$ and $\gamma_n x_n \to 0$.

Proof. There is an increasing sequence of integers $\{n_k : k \in \mathbb{Z}_+\}$ with $n_0 = 0$ such that for $k \ge 1$, $n \ge n_k$ implies $d(x_n, 0) < \frac{1}{k^2}$. Let $\gamma_n = k$ for $n_k \le n < n_{k+1}$. Then, for such $n$, translation invariance and the triangle inequality give $d(\gamma_n x_n, 0) \le k\, d(x_n, 0) \le \frac{1}{k}$.

12.2. Quotient topology

In many applications it is typical to consider closed linear subspaces of a topological vector space. In this section we construct a linear topology on quotient spaces. Recall that if $M$ is a linear subspace of a linear space $X$, the space $X/M$ is the collection of equivalence classes of the relation $x \sim y$ iff $x - y \in M$. Denote by $\pi(x) = \{y \in X : x - y \in M\} = x + M$. The following facts are easy to check:
(a) $\pi(x) + \pi(y) = (x + y) + M = \pi(x + y)$, and if $x \sim x'$ and $y \sim y'$, then $\pi(x') + \pi(y') = \pi(x) + \pi(y)$.
(b) $\pi(\alpha x) = \alpha x + M$.
This defines an algebraic structure on $X/M$ by setting $\pi(x) + \pi(y) := \pi(x + y)$ and $\alpha \pi(x) := \pi(\alpha x)$. The map
\[ \pi : X \longrightarrow X/M \]
given by $x \mapsto \pi(x)$ is an epimorphism, that is, it is a surjective map that satisfies $\pi(x + y) = \pi(x) + \pi(y)$ and $\pi(\alpha x) = \alpha \pi(x)$. Here $\pi(0) = M$. The codimension of $M$ is defined as the dimension of $X/M$.
If $X$ is a topological vector space and $M$ is a closed linear subspace of $X$ then, the natural topology $\tau_q$ on $X/M$ is the quotient topology defined by declaring a set $U \subset X/M$ open iff $\pi^{-1}(U)$ is open in $X$. The space $(X/M, \tau_q)$ is said to be a quotient space.
Theorem 12.2.1. Let (X, τ ) be a topological vector space. Suppose M is a closed linear
subspace of X and let τM be the quotient topology on X/M .
(i) (X/M, τM ) is a topological vector space and the quotient map π is open and con-
tinuous.
(ii) If $\mathcal{V}$ is a local basis at 0 for $\tau$, then $\{\pi(V) : V \in \mathcal{V}\}$ is a local basis at $\pi(0)$ for $\tau_M$.
(iii) Each of the following properties of X is inherited by X/M : local boundedness, local
convexity, local countable basis, normability.
(iv) If X is an F –space, a Fréchet space, or a Banach space, then the same is true for
X/M .
Proof. (i) Since $\pi^{-1}\big( \bigcup_j U_j \big) = \bigcup_j \pi^{-1}(U_j)$ and $\pi^{-1}(A \cap B) = \pi^{-1}(A) \cap \pi^{-1}(B)$, $\tau_M$ is indeed a topology on $X/M$ and $\pi$ is continuous by definition of $\tau_M$.
Notice that for any $x \in X$
\[ \pi^{-1}(\pi(x)) = x + M. \]
If $V \subset X$ is open then, as $\pi^{-1}(\pi(V)) = M + V$, and $M + V$ is open, it follows that $\pi$ is open. As $\pi^{-1}(\pi(0)) = M$, it follows that $\{\pi(0)\}$ is closed in $\tau_M$.
It remains to show that the sum and scalar product on $X/M$ are continuous operations.
For the sum it is enough to show that $(\pi(x), \pi(y)) \mapsto \pi(x) + \pi(y)$ is continuous at $\pi(0) = M$.
If $W$ is a neighborhood of $\pi(0)$, then $\pi^{-1}(W)$ is an open set containing $M$. There is an open

neighborhood $V$ of 0 in $\tau$ such that $V + V \subset \pi^{-1}(W)$. Hence $\pi(V) + \pi(V) = \pi(V + V) \subset W$. Since $\pi(0) \in \pi(V)$ and $\pi$ is open, continuity of the sum follows.

For the scalar product, suppose $W$ is an open neighborhood of $\pi(\alpha x)$ in $\tau_M$. Then $\pi^{-1}(W)$ is an open set in $X$ containing the closed set $\alpha x + M$. There is an open neighborhood $U_x$ of $x$ in $X$ and an open ball $B(\alpha; r)$ in $\mathbb{F}$ such that $B(\alpha; r) \cdot U_x \subset \pi^{-1}(W)$. Hence $\pi(B(\alpha; r) \cdot U_x) = B(\alpha; r) \cdot \pi(U_x) \subset W$. Since $\pi$ is open, $\pi(U_x)$ is a neighborhood of $\pi(x)$ in $\tau_M$ and the continuity of the scalar product follows.

(ii) follows immediately from (i).

(iii) can be obtained from (ii) and the proof is left as an exercise.

(iv) Suppose that $d$ is a translation invariant metric on $X$ compatible with the topology $\tau$. We can define a metric $\rho$ on $X/M$ by setting
\[ \rho(\pi(x), \pi(y)) := \inf\{d(x - y, z) : z \in M\} = d(x - y, M). \]
Notice that $d(x - y, M) = 0$ iff $x - y \in \overline{M} = M$. Hence
\[ \rho(\pi(x), \pi(y)) = \rho(\pi(x) - \pi(y), \pi(0)), \]
and $\rho(\pi(x), \pi(y)) = 0$ iff $\pi(x) = \pi(y)$. Since
\[ d(x - y, z) = d(0, z + y - x) = d(-z, y - x) = d(y - x, -z), \]
it follows that $\rho(\pi(x), \pi(y)) = \rho(\pi(y), \pi(x))$. From
\[ d(x - y, z) \le d(x - y, u - y + z') + d(u - y + z', z) = d(x - u, z') + d(u - y, z - z'), \]
we conclude that $\rho(\pi(x), \pi(y)) \le \rho(\pi(x), \pi(u)) + \rho(\pi(u), \pi(y))$. This shows that $\rho$ is a translation invariant metric on $X/M$. Since $d(x, 0) = d(x + z, z)$ for all $z$,
\[ \pi\big( \{x : d(x, 0) < r\} \big) = \{\pi(x) : \rho(\pi(x), \pi(0)) < r\}. \]
From (ii), it follows that if $d$ is a translation invariant metric that generates the topology $\tau$ on $X$ then, $\rho$ is a translation invariant metric on $X/M$ that generates $\tau_M$.

If $d$ corresponds to a norm, then
\[ \|\pi(x)\|_M := \inf\{\|x - z\| : z \in M\} = d(x, M) \]
defines a norm on $X/M$. It suffices to show that $\|\pi(\alpha x)\|_M = |\alpha| \|\pi(x)\|_M$. If $\alpha = 0$ there is nothing to prove. If $\alpha \ne 0$ then, from $\|\alpha x - z\| = |\alpha| \|x - \alpha^{-1} z\|$ and $\alpha^{-1} M = M$ we conclude that $\|\pi(\alpha x)\|_M = |\alpha| \|\pi(x)\|_M$. Thus, $\|\cdot\|_M$ is a norm on $X/M$.

Suppose that d is a complete translation invariant metric generating τ, and let {π(x_n) :
n ∈ N} be a Cauchy sequence in (X/M, ρ). Passing to a subsequence if necessary, we may assume that
ρ(π(x_{n+1} − x_n), π(0)) < 2^{-n}. Set z_1 = 0 and choose z_2 ∈ M such that
\[ d(x_2 + z_2 - (x_1 + z_1), 0) < \tfrac{1}{2}. \]
Proceeding by induction, we obtain a sequence {z_n : n ∈ N} ⊂ M such that
\[ \rho(\pi(x_{n+1}), \pi(x_n)) \le d(x_{n+1} + z_{n+1}, x_n + z_n) < 2^{-n}. \]
Hence {x_n + z_n} is a Cauchy sequence in (X, d). Since X is complete, there is x* ∈ X such that
d(x_n + z_n, x*) → 0. The continuity of π implies that lim_n π(x_n + z_n) = lim_n π(x_n) = π(x*).
This shows that ρ is complete.
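
As a concrete illustration of the quotient norm ‖π(x)‖_M = d(x, M), the following minimal sketch assumes X = R^2 with the Euclidean norm and M the line spanned by a nonzero vector m. The function name `quotient_norm` and the sample vectors are illustrative choices, not part of the notes.

```python
import math

def quotient_norm(x, m):
    """Euclidean distance from x to the line M = span{m} in R^2.

    This equals inf{||x - z|| : z in M}, i.e. the quotient norm of pi(x).
    """
    # The orthogonal projection t*m of x onto M minimises ||x - z|| over z in M.
    t = (x[0] * m[0] + x[1] * m[1]) / (m[0] ** 2 + m[1] ** 2)
    return math.hypot(x[0] - t * m[0], x[1] - t * m[1])

m = (1.0, 1.0)   # M is the diagonal of R^2 (assumed example)
x = (3.0, 1.0)

# The value depends only on the coset x + M, and it is absolutely homogeneous.
print(quotient_norm(x, m))                 # distance from (3, 1) to the diagonal
print(quotient_norm((8.0, 6.0), m))        # same coset: x + 5m
print(quotient_norm((-6.0, -2.0), m))      # -2x: twice the distance
```

The three printed values illustrate that the quantity is well defined on cosets and satisfies ‖π(αx)‖_M = |α|‖π(x)‖_M.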

12.3. Locally convex spaces

In this section we study some properties of locally convex topological vector spaces. The
following result shows that any locally convex space admits a local base consisting of convex
balanced sets.

Lemma 12.3.1. Suppose X is locally convex. Any open convex neighborhood of 0 in X

contains an open convex balanced neighborhood.

Proof. Let U be a convex open neighborhood of 0. By Lemma 12.1.13 there is a balanced
open neighborhood W of 0 with W ⊂ U. For any α ∈ F with |α| = 1, αU is convex and α^{-1}W = W. Hence,
W ⊂ αU and A = ⋂_{|α|=1} αU is a convex subset of U with non-empty interior. It follows
that A° ⊂ U is a non-empty open convex set containing 0. We conclude the proof by showing that A is
balanced. Let β ∈ F with |β| = 1 and 0 ≤ r ≤ 1. Then βA = A and, as 0 ∈ U and U is
convex,
rA = rβA ⊂ rβU ⊂ βU.
Since β is arbitrary, rβA = rA ⊂ A. This shows that A is a balanced convex set. Therefore, A° is an
open convex balanced neighborhood of 0 contained in U.

A subset A of a topological vector space X is said to be absorbent if for any x ∈ X there
is t > 0 such that x ∈ tA. Theorem 12.1.15(a) shows that any open neighborhood of 0 is
absorbent. For any absorbent set A, define the function µ_A : X → R_+ by
(12.3) µ_A(x) = inf{t > 0 : t^{-1}x ∈ A}.
µ_A is called the Minkowski functional of A.
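
A small worked example may help fix the definition. It assumes X = R and the absorbent convex set A = (−1, 2), which is not balanced; this choice is ours and is only meant as an illustration.

```latex
\[
A=(-1,2)\subset\mathbb{R}:\qquad
\mu_A(x)=\inf\{t>0:\ -1<x/t<2\}=
\begin{cases}
x/2, & x\ge 0,\\[2pt]
-x, & x<0.
\end{cases}
\]
```

Here µ_A is positively homogeneous and subadditive, but µ_A(−1) = 1 ≠ 1/2 = µ_A(1), so µ_A is not a seminorm; this is consistent with the role played by balancedness in the results that follow.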

Theorem 12.3.2. Let A ⊂ X be absorbent. Then,

(a) For any s > 0, µ_A(sx) = µ_{s^{-1}A}(x) = sµ_A(x).
If A is also convex, then
(b) For any x, y ∈ X, µA (x + y) ≤ µA (x) + µA (y).
If A is convex and balanced, then
(c) For any λ ∈ F and x ∈ X, µA (λx) = |λ|µA (x).
Consequently, µA is a seminorm whenever A is an absorbent balanced convex set.

Proof. (a) follows from

s{t > 0 : t−1 x ∈ A} = {u > 0 : s u−1 x ∈ A} = {u > 0 : u−1 x ∈ s−1 A}.

(b) If A is convex and absorbent, then 0 ∈ A and {t > 0 : t^{-1}x ∈ A} is an infinite interval (either open
or closed) whose left end point is µ_A(x): if t^{-1}x ∈ A and s > t, then
\[ s^{-1}x = \Big(1 - \frac{t}{s}\Big)0 + \frac{t}{s}\,t^{-1}x \in A. \]
Hence, if µ_A(x) < r and µ_A(y) < t, then r^{-1}x and t^{-1}y belong to A. The convexity of A
implies
\[ (r + t)^{-1}(x + y) = \frac{r}{r+t}\,r^{-1}x + \frac{t}{r+t}\,t^{-1}y \in A. \]
Thus, µ_A(x + y) ≤ r + t. Letting r ↘ µ_A(x) and t ↘ µ_A(y) completes the proof.
(c) If A is balanced, convex and absorbent then for any θ ∈ F with |θ| = 1 we have that
θ−1 A = A. Then, {t > 0 : t−1 x ∈ θ−1 A} = {t > 0 : t−1 x ∈ A} and consequently,
µA (θx) = µA (x). For a general λ ∈ F we have that λ = |λ|θ for some θ ∈ F with |θ| = 1.
Therefore,
µA (λx) = |λ|µA (θx) = |λ|µA (x).
Therefore, if A is an absorbent balanced convex set µA is a seminorm.
Theorem 12.3.3. Suppose that A is convex and absorbent. Let B = {x ∈ X : µA (x) < 1}
and C = {x ∈ X : µA (x) ≤ 1}. Then, B ⊂ A ⊂ C and µB = µA = µC .

Proof. Since A is convex and absorbent, s > µ_A(x) implies that s^{-1}x ∈ A. Thus, if
µ_A(x) < 1 then x = 1^{-1}x ∈ A, that is, B ⊂ A. It is obvious that A ⊂ C. It is easy to see that
B and C are convex; since µ_A((µ_A(x) + 1)^{-1}x) < 1, it follows that B and C are also absorbent.
As B ⊂ A ⊂ C, the definition of the Minkowski functional gives µ_C ≤ µ_A ≤ µ_B. For x ∈ X fixed,
consider µ_C(x) < s < t. Then s^{-1}x ∈ C and µ_A(t^{-1}x) = (s/t)µ_A(s^{-1}x) ≤ s/t < 1, so that
t^{-1}x ∈ B and µ_B(x) ≤ t. By letting t ↘ µ_C(x) we obtain that µ_B(x) ≤ µ_C(x).
Theorem 12.3.4. Suppose ρ is a seminorm on a linear space X and set B = {x ∈ X :
ρ(x) < 1}. Then, B is balanced, convex and absorbent, and ρ = µB .

Proof. It is clear that B is a balanced convex set, and since ρ((ρ(x) + 1)^{-1}x) < 1, it follows
that B is also absorbent. For each t > 0 and x ∈ X, ρ(t^{-1}x) < 1 iff ρ(x) < t; hence,
{t > 0 : t^{-1}x ∈ B} = {t > 0 : ρ(x) < t}, and so ρ(x) = µ_B(x).
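
The identity ρ = µ_B can be checked numerically. The sketch below is only an illustration under stated assumptions: the set A is given through a hypothetical membership oracle `member`, A is convex and absorbent (so that {t > 0 : x/t ∈ A} is an unbounded interval), and µ_A is approximated by bisection. The names `minkowski`, `rho` and the tolerance are our own choices.

```python
def minkowski(member, x, tol=1e-10):
    """Approximate mu_A(x) = inf{t > 0 : x/t in A} by bisection.

    `member` is a hypothetical oracle deciding membership in a convex,
    absorbent set A; convexity makes {t > 0 : x/t in A} an interval
    of the form (mu, oo) or [mu, oo).
    """
    if all(c == 0 for c in x):
        return 0.0
    hi = 1.0
    while not member([c / hi for c in x]):
        hi *= 2.0                      # absorbency guarantees this loop stops
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if member([c / mid for c in x]):
            hi = mid                   # x/mid in A, so mu_A(x) <= mid
        else:
            lo = mid                   # x/mid not in A, so mu_A(x) >= mid
    return hi

def rho(v):
    """A seminorm (in fact a norm) on R^2: the l^1 norm."""
    return abs(v[0]) + abs(v[1])

# B = {rho < 1} is balanced, convex and absorbent, and mu_B should equal rho.
x = [3.0, -4.0]
approx = minkowski(lambda v: rho(v) < 1.0, x)
print(approx, rho(x))
```

Up to the tolerance of the bisection, the two printed numbers agree, which is what Theorem 12.3.4 predicts for this choice of ρ.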
Theorem 12.3.5. Suppose (X, τ ) is a locally convex topological linear space and let V be
a local convex balanced base at 0 ∈ X. Then,
(i) V = {x ∈ X : µV (x) < 1} for each V ∈ V.
(ii) {µV : V ∈ V} is a family of continuous seminorms that separates points in X.
(iii) τ is generated by {µV : V ∈ V}.
Conversely, if {ρα : α ∈ A} is a family of seminorms that separate points of X then,
(iv) the collection of finite intersections of sets of the form

Vα (x; t) = {y : ρα (x − y) < t}, α ∈ A and t > 0,
is a base for a Hausdorff locally convex linear topology on X in which each ρα is
continuous.
(v) Let τ be the topology generated by {ρ_α : α ∈ A}. A set E ⊂ X is bounded w.r.t.
τ iff each ρ_α is bounded on E.

Proof. (i) Theorem 12.3.3 shows that {x ∈ X : µ_V(x) < 1} ⊂ V. Let x ∈ V. Since V is open, the
continuity of λ ↦ λx implies that there is a real number 0 < λ < 1 such that λ^{-1}x ∈ V.
Thus µ_V(x) ≤ λ < 1, and so V ⊂ {x ∈ X : µ_V(x) < 1}.

(ii) By Theorem 12.3.2, µ_V is a seminorm for each V ∈ V. Given ε > 0, if y ∈ x + εV then, by (i),
|µ_V(x) − µ_V(y)| ≤ µ_V(y − x) < ε, which means that µ_V is continuous. If x ≠ 0 then there
is V ∈ V such that x ∉ V. It follows that µ_V(x) ≥ 1, and so {µ_V : V ∈ V} separates points
of X.

(iii) follows from (ii) since τ is generated by finite intersections of sets of the form x + tV =
{y : µV (x − y) < t} where x ∈ X and t > 0.

(iv) For any α ∈ A and t > 0, V_α(0; t) = {x ∈ X : ρ_α(x) < t} = tV_α(0; 1) is a balanced,
convex and absorbent set. Hence, by Theorem 12.3.4, µ_{V_α(0;1)} = ρ_α. Consequently, the
collection of all finite intersections of sets of the form V_α(0; 1/n), α ∈ A and n ∈ N, is a
balanced convex local base at 0 for some topology τ on X.

Since {ρ_α} separates points, for any x ≠ y there is α ∈ A with r_{x,y} := ρ_α(x − y) > 0. Hence,
(x + V_α(0; r_{x,y}/2)) ∩ (y + V_α(0; r_{x,y}/2)) = ∅, that is, τ is Hausdorff. The continuity of ρ_α is
clear from the inequality |ρ_α(x) − ρ_α(y)| ≤ ρ_α(x − y).

It remains to show that (x, y) ↦ x + y and (λ, x) ↦ λx are continuous. If V = ⋂_{j=1}^n V_{α_j}(0; ε_j)
for some n ≥ 1, α_1, . . . , α_n ∈ A and positive ε_1, . . . , ε_n, then
\[ \tfrac{1}{2}V + \tfrac{1}{2}V \subset V. \]
Continuity of (x, y) ↦ x + y follows.

Let α_0 ∈ F and x_0 ∈ X be fixed. If |α − α_0| < δ and max_{1≤j≤n} ρ_{α_j}(x − x_0) < δ for some
δ > 0, then
\[ \rho_{\alpha_j}(\alpha x - \alpha_0 x_0) \le |\alpha|\,\rho_{\alpha_j}(x - x_0) + |\alpha - \alpha_0|\,\rho_{\alpha_j}(x_0)
\le (\delta + |\alpha_0|)\delta + \delta \max_{1\le j\le n} \rho_{\alpha_j}(x_0). \]
For δ small enough, (δ + |α_0|)δ + δ max_{1≤j≤n} ρ_{α_j}(x_0) < min_{1≤j≤n} ε_j. Continuity of (α, x) ↦
αx follows.

(v) Suppose E is bounded. As V_α(0; 1) is an open neighborhood of 0, we have that E ⊂
kV_α(0; 1) = V_α(0; k) for some k > 0. This shows that ρ_α(x) < k for all x ∈ E. Conversely,
suppose every ρ_α is bounded on E. Let U be an open neighborhood of 0. Then, there are
seminorms ρ_{α_j} and positive numbers r_j, m_j, j = 1, . . . , k, such that V := ⋂_{j=1}^k V_{α_j}(0; r_j) ⊂
U and ρ_{α_j}(x) < m_j for all x ∈ E. Then, for n > max_{1≤j≤k} m_j/r_j, E ⊂ nV ⊂ nU. Hence, E is
bounded.
Theorem 12.3.6. If {ρ_n : n ∈ N} is a countable collection of seminorms that separates points of
X, then X admits a translation invariant metric that makes X a locally convex topological
vector space, with a local base of balanced convex sets, for which each ρ_n is continuous.

Proof. Theorem 12.3.5 shows that the collection of sets {x : ρ_n(x) < r}, n ∈ N and r > 0,
defines a convex balanced local base at 0 for a linear topology τ on X. It is easy to verify
that
\[ d(x, y) := \max_{n} 2^{-n}\,\frac{\rho_n(x - y)}{1 + \rho_n(x - y)} \]
is an invariant metric on X. We now show that d is compatible with τ. For any r > 0 let
N be the first integer for which 2^{-n} ≤ r for all n > N. Then,
\[ \bigcap_{k=1}^{N} \Big\{x \in X : \rho_k(x) < \frac{r}{2^{-k} - r}\Big\} = B_d(0; r) = \{x \in X : d(x, 0) < r\}. \tag{12.4} \]
Hence B_d(0; r) ∈ τ and the identity map I from (X, τ) into (X, d) is continuous. In
passing, (12.4) also shows that the balls B_d(0; r) are balanced and convex.
Conversely, consider the basic set V = ⋂_{j=1}^m {x ∈ X : ρ_j(x) < r_j}. Fix a positive number r
less than min_{1≤j≤m} 2^{-j} r_j/(1 + r_j). If d(x, 0) < r then
\[ 2^{-j}\,\frac{\rho_j(x)}{1 + \rho_j(x)} < r < 2^{-j}\,\frac{r_j}{1 + r_j} \]
for all 1 ≤ j ≤ m. Since t ↦ t/(1 + t) is increasing, ρ_j(x) < r_j for all 1 ≤ j ≤ m, that is, B_d(0; r) ⊂ V. This shows
that the identity map I^{-1} from (X, d) to (X, τ) is continuous.
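
The metric d of this proof is easy to experiment with. The following sketch is a hedged illustration rather than part of the argument: it assumes X = R^3 with the three coordinate seminorms ρ_n(x) = |x_n| (a finite truncation of a countable family), and the names `frechet_metric` and `seminorms` are ours.

```python
def frechet_metric(seminorms, x, y):
    """d(x, y) = max_n 2^{-n} * rho_n(x - y) / (1 + rho_n(x - y)), n = 1, 2, ..."""
    diff = [a - b for a, b in zip(x, y)]
    best = 0.0
    for n, rho in enumerate(seminorms, start=1):
        r = rho(diff)
        best = max(best, 2.0 ** (-n) * r / (1.0 + r))
    return best

# Assumed family: rho_n(x) = |x_n| on R^3; together they separate points.
seminorms = [lambda v, i=i: abs(v[i]) for i in range(3)]

x = [1.0, 0.0, 3.0]
y = [0.0, 0.0, 0.0]
z = [5.0, -2.0, 7.0]

print(frechet_metric(seminorms, x, y))  # 0.25
# Translation invariance: shifting both arguments by z leaves d unchanged.
xs = [a + c for a, c in zip(x, z)]
ys = [b + c for b, c in zip(y, z)]
print(frechet_metric(seminorms, xs, ys))
```

Note that d is bounded by 1/2 regardless of how large the seminorms are, so d-bounded sets carry no information about boundedness in the sense of topological vector spaces.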
Example 12.3.7. (The space C^∞(Ω).) Suppose Ω ⊂ R^n is a nonempty open set. Let
{K_m : m ∈ N} be a cover of Ω by compact sets such that K_m ⊂ K°_{m+1}. Let C^∞(Ω) be the
collection of all infinitely differentiable real valued functions on Ω. For each m ∈ N define
the seminorm
\[ p_m(\varphi) = \sup\big\{ |\partial^k \varphi(x)| : x \in K_m,\ |k| \le m \big\}, \]
where \( \partial^k\varphi = \frac{\partial^{|k|}\varphi}{\partial x_1^{k_1}\cdots\partial x_n^{k_n}} \) and \( |k| = \sum_{j=1}^n k_j \). Clearly {p_m : m ∈ N} separates points of C^∞(Ω).
By Theorem 12.3.6, the topology on C^∞(Ω) induced by {p_m : m ∈ N} is metrizable by a
translation invariant metric d. We claim that (C^∞(Ω), d) is complete. Suppose {φ_j : j ∈ N}
is a Cauchy sequence. For any N ∈ N let V_N = {φ : p_N(φ) < 1/N}. Then, for each N, there
is j_0 such that |∂^k φ_i − ∂^k φ_j| < 1/N on K_N whenever i, j ≥ j_0 and |k| ≤ N. It follows
that ∂^k φ_j converges uniformly on compact subsets of Ω to a function g_k. In particular,
φ_j → g_0 as j → ∞. It is an easy exercise to show that g_0 ∈ C^∞(Ω) and that ∂^k g_0 = g_k.
As another application of Theorem 12.3.5 we have the following result.

Theorem 12.3.8. A topological vector space X is normable iff its origin has a bounded
convex neighborhood.

Proof. If X is normable and ‖·‖ is a norm that generates the topology on X, then U =
{x ∈ X : ‖x‖ < 1} is a bounded convex neighborhood of the origin.
Conversely, suppose V is a bounded convex neighborhood of the origin. By Lemma 12.3.1,
V contains an open convex balanced neighborhood U of the origin. Let ‖·‖ be the Minkowski
functional of U. Theorem 12.3.2 shows that ‖·‖ is a seminorm. By Theorem 12.1.15,
{rV : r > 0} is a local base for the topology of X. If x ≠ 0, then there is r > 0
such that x ∉ rV ⊃ rU; hence ‖x‖ ≥ r. Therefore ‖·‖ is a norm. By Theorem 12.3.5(i),
{x ∈ X : ‖x‖ < r} = rU ⊂ rV. This shows that the norm topology coincides with the topology on
X.
Definition 12.3.9. The convex hull of a set A, denoted by co(A), is the smallest convex
set in X containing A. The balanced or circled convex hull of a set A, denoted by
co°(A), is the smallest balanced convex set in X that contains A.

The following result gives a full analytic description of the convex hull of a set.
Theorem 12.3.10. For any linear topological space X:
(i) The intersection of any collection of convex sets is convex.
(ii) For any A ⊂ X, co(A) = ⋂{C : A ⊂ C, C is convex} and
\[ \operatorname{co}(A) = \Big\{ \sum_{j=1}^{N} \lambda_j x_j : N \ge 1,\ \lambda_j \ge 0,\ \sum_{j=1}^{N} \lambda_j = 1,\ x_j \in A \Big\}. \tag{12.5} \]
(iii) For any ∅ ≠ A ⊂ X, A^b = {λx : λ ∈ F, |λ| ≤ 1, x ∈ A} is the smallest balanced
set in X containing A and
\[ \operatorname{co}^{\circ}(A) = \operatorname{co}(A^b) = \Big\{ \sum_{j=1}^{N} \lambda_j x_j : N \ge 1,\ \sum_{j=1}^{N} |\lambda_j| \le 1,\ x_j \in A \Big\}. \tag{12.6} \]
(iv) If A_1, . . . , A_n are convex subsets of X then
\[ \operatorname{co}\Big( \bigcup_{k=1}^{n} A_k \Big) = \Big\{ \sum_{k=1}^{n} \lambda_k x_k : \lambda_k \ge 0,\ \sum_{k=1}^{n} \lambda_k = 1,\ x_k \in A_k \Big\}. \]
If each A_k is also compact then co(⋃_{k=1}^n A_k) is compact.
(v) If A_1, . . . , A_n are convex and balanced subsets of X then
\[ \operatorname{co}^{\circ}\Big( \bigcup_{k=1}^{n} A_k \Big) = \Big\{ \sum_{k=1}^{n} \lambda_k x_k : \sum_{k=1}^{n} |\lambda_k| \le 1,\ x_k \in A_k \Big\}. \]
If each A_k is also compact, then co°(⋃_{k=1}^n A_k) is compact.
Proof. (i) Suppose 𝒞 is a collection of convex subsets of X and let x, y be points in ⋂𝒞.
Then, for any λ ∈ [0, 1] we have that λx + (1 − λ)y ∈ C for each C ∈ 𝒞. Therefore
λx + (1 − λ)y ∈ ⋂𝒞.

(ii) Denote the sets on the left-hand side and right-hand side of (12.5) by C and D respectively.
We claim that for any points x_1, . . . , x_n in C and nonnegative numbers λ_1, . . . , λ_n
with Σ_{k=1}^n λ_k = 1, we have Σ_{k=1}^n λ_k x_k ∈ C. Since C is convex, the claim holds for n ≤ 2 by
definition. Assume the statement is valid for n − 1 ≥ 2. Let λ_1, . . . , λ_n be nonnegative
numbers with 0 < λ_n < 1 and Σ_{j=1}^n λ_j = 1. Then, for any set of points x_1, . . . , x_n in C,
\[ \sum_{j=1}^{n} \lambda_j x_j = \lambda_n x_n + (1 - \lambda_n) \sum_{j=1}^{n-1} \frac{\lambda_j}{1 - \lambda_n}\, x_j \in C \]
since C is convex. As a consequence, A ⊂ D ⊂ C.

To complete the proof it is enough to show that D is convex. For any pair of points x
and y in D, there exist N ≥ 1, points x_1, . . . , x_N ∈ A and two sets of nonnegative numbers
{λ_1, . . . , λ_N} and {λ′_1, . . . , λ′_N} with Σ_{j=1}^N λ_j = 1 = Σ_{j=1}^N λ′_j such that x = Σ_{j=1}^N λ_j x_j and
y = Σ_{j=1}^N λ′_j x_j. Thus, for any α ∈ [0, 1],
\[ \alpha x + (1 - \alpha) y = \sum_{j=1}^{N} \big(\alpha\lambda_j + (1 - \alpha)\lambda'_j\big) x_j \in D \]
since αλ_j + (1 − α)λ′_j ≥ 0 and Σ_{j=1}^N (αλ_j + (1 − α)λ′_j) = 1. Hence D is a convex set containing A, and so C ⊂ D.

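(iii) It is clear that co(A^b) ⊂ co°(A). From part (ii) we obtain that
\[ \begin{aligned}
\operatorname{co}(A^b) &= \Big\{ \sum_{j=1}^{N} \lambda_j \alpha_j x_j : N \ge 1,\ |\alpha_j| \le 1,\ \lambda_j \ge 0,\ \sum_{j=1}^{N} \lambda_j = 1,\ x_j \in A \Big\} \\
&= \Big\{ \sum_{j=1}^{N} \lambda_j x_j : N \ge 1,\ \sum_{j=1}^{N} |\lambda_j| \le 1,\ x_j \in A \Big\}.
\end{aligned} \tag{12.7} \]
From (12.7) it is clear that co(A^b) is balanced, and therefore co°(A) ⊂ co(A^b).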

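(iv) Let S ⊂ R^n be the set of all points (λ_1, . . . , λ_n) such that λ_k ≥ 0 and Σ_{k=1}^n λ_k = 1, and
let A = A_1 × · · · × A_n. The function f : S × A → X defined as
f(λ, a) = λ_1 a_1 + . . . + λ_n a_n
is continuous and, by (ii),
\[ \bigcup_{k=1}^{n} A_k \subset K := f(S \times A) \subset \operatorname{co}\Big( \bigcup_{k=1}^{n} A_k \Big). \]
Let (α, a) and (β, b) be points in S × A. Let J = {1 ≤ k ≤ n : α_k + β_k ≠ 0}. Then, for any
0 < λ < 1,
\[ \begin{aligned}
\lambda f(\alpha, a) + (1 - \lambda) f(\beta, b) &= \sum_{k=1}^{n} \big( \lambda\alpha_k a_k + (1 - \lambda)\beta_k b_k \big) \\
&= \sum_{k \in J} \big(\lambda\alpha_k + (1 - \lambda)\beta_k\big)\, \frac{\lambda\alpha_k a_k + (1 - \lambda)\beta_k b_k}{\lambda\alpha_k + (1 - \lambda)\beta_k} \\
&= f(\lambda\alpha + (1 - \lambda)\beta, c),
\end{aligned} \]
where \( c_k = \frac{\lambda\alpha_k a_k + (1 - \lambda)\beta_k b_k}{\lambda\alpha_k + (1 - \lambda)\beta_k} \in A_k \) if k ∈ J and c_k = a_k otherwise. It follows that K is convex,
and so co(⋃_{k=1}^n A_k) = K. If each A_k is compact then S × A is compact and, since f is continuous, K is compact.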
(v) Let D be the set of points (z_1, . . . , z_n) in C^n such that Σ_{k=1}^n |z_k| ≤ 1. The function
g : D × A → X given by
\[ g(z, a) = \sum_{k=1}^{n} z_k a_k \]
is continuous and Q = g(D × A) ⊂ co°(⋃_{k=1}^n A_k). It is easy to check that Q is balanced
and convex. If each A_k is compact, then compactness of Q follows from continuity of g.

Theorem 12.3.11. (Mazur) Let X be a locally convex topological linear space. If E ⊂ X
is totally bounded, then so is co(E). In particular, if X is Fréchet and E ⊂ X is compact,
then the closure of co(E) is compact.

Proof. Let U be any open neighborhood of 0. We can choose a convex neighborhood V
of 0 such that V + V ⊂ U. By assumption, there exists a finite set F ⊂ X such that E ⊂ F + V.
It follows that E ⊂ co(F) + V. The last set is convex, as it is the sum of two convex sets.
Hence
co(E) ⊂ co(F) + V.
By Theorem 12.3.10(iv), co(F) is compact and so it is totally bounded. Hence, there is a
finite set F_1 ⊂ X such that co(F) ⊂ F_1 + V. Therefore,
co(E) ⊂ co(F) + V ⊂ F_1 + V + V ⊂ F_1 + U.
As U is arbitrary, we conclude that co(E) is totally bounded.

The last statement follows from the fact that the closure of a totally bounded set is also
totally bounded, and that in complete metric spaces a set is compact iff it is closed and
totally bounded.

Lemma 12.3.12. Let D be a subset of Rn . If x ∈ co(D), then there exists a subset J ⊂ D

with at most (n + 1) points such that x ∈ co(J).
Proof. Suppose x = Σ_{j=1}^k λ_j x_j where λ_j > 0, Σ_{j=1}^k λ_j = 1, and x_j ∈ D. Suppose k > n + 1.
We will show that x is in the convex hull of a proper subset of {x_1, . . . , x_k}. As k − 1 > n,
the vectors x_2 − x_1, . . . , x_k − x_1 are linearly dependent; thus, Σ_{j=2}^k c_j(x_j − x_1) = 0 for some
scalars c_j, not all zero. Let c_1 = −Σ_{j=2}^k c_j, so that at least one c_j is strictly positive, and let c := min{λ_j/c_j : c_j > 0}.
Then, Σ_{j=1}^k c_j = 0, Σ_{j=1}^k c_j x_j = 0, c > 0, λ_j − cc_j ≥ 0 for all j and λ_m − cc_m = 0 for some m. As
x = Σ_{j=1}^k (λ_j − cc_j)x_j and Σ_{j=1}^k (λ_j − cc_j) = 1, x is in the convex hull of a proper subset of {x_1, . . . , x_k}.
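
The reduction step in this proof is constructive. The sketch below follows it literally in exact rational arithmetic; it is an illustration under our own naming (`dependence`, `caratheodory` are hypothetical helpers), and it finds the linear dependence by plain Gaussian elimination, which is one possible choice among many.

```python
from fractions import Fraction

def dependence(vectors):
    """Return coefficients c, not all zero, with sum_j c[j] * vectors[j] == 0.

    Assumes there are more vectors than coordinates, so a dependence exists.
    """
    m, n = len(vectors), len(vectors[0])
    # n x m matrix whose columns are the given vectors.
    a = [[Fraction(vectors[j][i]) for j in range(m)] for i in range(n)]
    pivots, row = [], 0
    for col in range(m):
        piv = next((r for r in range(row, n) if a[r][col] != 0), None)
        if piv is None:
            continue
        a[row], a[piv] = a[piv], a[row]
        a[row] = [v / a[row][col] for v in a[row]]
        for r in range(n):
            if r != row and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[row])]
        pivots.append((row, col))
        row += 1
        if row == n:
            break
    pivot_cols = {c for _, c in pivots}
    free = next(c for c in range(m) if c not in pivot_cols)
    coeffs = [Fraction(0)] * m
    coeffs[free] = Fraction(1)
    for r, c in pivots:
        coeffs[c] = -a[r][free]
    return coeffs

def caratheodory(points, weights):
    """Rewrite a convex combination in R^n using at most n + 1 of the points."""
    pts = [tuple(Fraction(c) for c in p) for p in points]
    lam = [Fraction(w) for w in weights]
    n = len(pts[0])
    while len(pts) > n + 1:
        diffs = [tuple(u - v for u, v in zip(p, pts[0])) for p in pts[1:]]
        c = dependence(diffs)
        c = [-sum(c)] + c              # now sum(c) == 0 and sum_j c[j] * pts[j] == 0
        step = min(l / v for l, v in zip(lam, c) if v > 0)
        lam = [l - step * v for l, v in zip(lam, c)]
        keep = [i for i, l in enumerate(lam) if l > 0]
        pts = [pts[i] for i in keep]
        lam = [lam[i] for i in keep]
    return pts, lam

# Five points in R^2 with equal weights; their barycenter is (2, 2).
points = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
weights = [Fraction(1, 5)] * 5
pts, lam = caratheodory(points, weights)
print(len(pts), sum(lam))
```

Since the coefficients c sum to zero and are not all zero, some c_j is strictly positive, so the minimum defining `step` is always taken over a nonempty set, exactly as in the proof.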

For finite dimensional spaces we have a stronger version of Mazur’s theorem.

Theorem 12.3.13. If K is a compact subset of Rn , then co(K) is compact.

Proof. Let S be the simplex in R^{n+1} consisting of points (λ_1, . . . , λ_{n+1}) such that λ_j ≥ 0
and Σ_{j=1}^{n+1} λ_j = 1. Consider the function f : S × K^{n+1} → R^n given by f(λ, x) = Σ_{j=1}^{n+1} λ_j x_j.
By Lemma 12.3.12, co(K) = f(S × K^{n+1}). The compactness of S × K^{n+1} and continuity of
f imply that co(K) is compact.

12.4. Inductive limit topology

We now show how to construct a locally convex topology on a linear space which is covered
by linear subspaces each of which has a locally convex topological structure and satisfy
some compatibility conditions.
Definition 12.4.1. An inductive system is a linear space X together with a directed
family of locally convex topological linear spaces {(Xi , τi ) : i ∈ I} such that
(a) If i ⪯ j then X_i ⊂ X_j.
(b) If i ⪯ j then the topology τ_i is the relative topology on X_i induced by τ_j.
(c) X = ⋃_{i∈I} X_i.
Theorem 12.4.2. Suppose X, {(Xi , τi ), i ∈ I} is an inductive system. Then, the set B of
all convex balanced sets V ⊂ X such that V ∩ Xi ∈ τi for all i ∈ I forms a local basis for a
topology τ on X, called the inductive limit topology, such that
(i) The sum (x, y) ↦ x + y is continuous.
(ii) The scalar product (λ, x) ↦ λx is continuous.
(iii) If U ∈ τ , then U ∩ Xi ∈ τi for all i ∈ I. In other words, the topology on Xj induced
by τ is weaker than τj .
When {0} is closed in (X, τ ) then X is a locally convex space.

Proof. To check that B forms a local basis for a topology, we first prove that for each V ∈ B
and x_0 ∈ V, V − x_0 is absorbent. Fix V ∈ B and let x_0 ∈ V and x ∈ X. Then x_0 ∈ X_i
and x ∈ X_j for some i, j ∈ I. Since I is directed, there is k ∈ I such that i ⪯ k and j ⪯ k,
so that x_0, x ∈ X_k. Then (V ∩ X_k) − x_0 is a neighborhood of 0 in τ_k, and so there is
ε > 0 such that x_0 + λx ∈ V ∩ X_k ⊂ V for all |λ| < ε.
We now show that {x + V : x ∈ X, V ∈ B} forms a basis for a topology. Fix V ∈ B. Then,

V = {x ∈ X : µV (x) < 1}. Hence, if x0 ∈ V , then r0 := µV (x0 ) < 1 and the set
W = {x ∈ X : µV (x) < (1 − r0 )} = (1 − r0 )V
is also convex, balanced and absorbent. Furthermore, W satisfies

W ∩ Xj = (1 − r0 )V ∩ (1 − r0 )Xj = (1 − r0 ) (V ∩ Xj ) ∈ τj .
It follows that x0 + W ⊂ V . This shows that B is a local basis for a topology τ on X.

(i) To show that addition is continuous, it is enough to prove that it is
continuous at (0, 0). This follows from the observation that (1/2)V + (1/2)V ⊂ V for any V ∈ B.

(ii) To prove continuity of the scalar product, fix x_0 ∈ X and λ_0 ∈ F. Notice that
λx − λ_0x_0 = λ(x − x_0) + (λ − λ_0)x_0.
Given V ∈ B, there is ε > 0 such that (λ − λ_0)x_0 ∈ (1/2)V whenever |λ − λ_0| < ε. Setting
δ := 1/(2(ε + |λ_0|)), we have that if x ∈ x_0 + δV and |λ − λ_0| < ε, then |λ|δ < 1/2 and, since V is
balanced, λ(x − x_0) ∈ λδV ⊂ (1/2)V. Hence λx − λ_0x_0 ∈ (1/2)V + (1/2)V ⊂ V.
The continuity of the scalar product follows from this.

(iii) Fix x ∈ X and V ∈ B. We will show that (x + V) ∩ X_j ∈ τ_j for all j ∈ I. Let i ∈ I be
such that x ∈ X_i and choose k ∈ I such that i ⪯ k and j ⪯ k. Then both X_i and X_j
are contained in X_k and so (x + V) ∩ X_k = (x + V) ∩ (x + X_k) = x + (V ∩ X_k) ∈ τ_k,
since V ∩ X_k ∈ τ_k, x ∈ X_k, and X_k is a topological vector space. Since τ_j coincides with
the topology on X_j induced by τ_k, we have that (x + V) ∩ X_j = (x + V) ∩ X_k ∩ X_j ∈ τ_j.
Therefore the relative topology on X_j induced by τ is weaker than τ_j.
Example 12.4.3. Suppose Ω is a locally compact Hausdorff topological space. Consider
K(Ω), the collection of all compact subsets of Ω, partially ordered by inclusion. For each
K ∈ K(Ω) let C_K := {φ ∈ C_00(Ω) : supp(φ) ⊂ K}, equipped with the topology τ_K induced by
the uniform norm. The collection {C_K : K ∈ K(Ω)} satisfies the conditions of Theorem 12.4.2.
The inductive limit topology τ on C_00(Ω) contains the uniform topology, since any uniform
ball B(0; r) belongs to τ. Hence, the topology on C_K induced by τ coincides with the
original uniform topology τ_K. It follows that {0} is closed in τ, since W := C_00(Ω) \ {0} =
⋃_{φ ∈ C_00(Ω), φ ≠ 0} B(φ; ‖φ‖) ∈ τ, and W ∩ C_K = C_K \ {0} is open for all K ∈ K(Ω). Therefore,
(C_00(Ω), τ) is a locally convex space.
Example 12.4.4. (The space D(Ω)). For each K ∈ K(R^n) we define D_K := {φ ∈ C^∞(R^n) :
supp(φ) ⊂ K}, equipped with the topology τ_K induced by the seminorms
\[ p_m(\varphi) := \sup\{ |\partial^{\alpha}\varphi(x)| : x \in \mathbb{R}^n,\ |\alpha| \le m \}. \tag{12.8} \]
It is easy to check that if K′ ⊂ K ∈ K(R^n), then D_{K′} is a closed subset of (D_K, τ_K).
Suppose Ω is an open subset of the Euclidean space R^n. Define the space D(Ω) := {φ ∈
C^∞(R^n) : supp(φ) ∈ K(Ω)} with the inductive limit topology τ associated to the system
{(D_K, τ_K) : K ∈ K(Ω)}. As in Example 12.4.3, it is easy to check that {0} is closed in τ and
so (D(Ω), τ) is a locally convex space. Since the convex balanced sets V_{m,r} := {φ : p_m(φ) < r},
where m ∈ Z_+ and r > 0, belong to τ, we have that for each K ∈ K(Ω) the topology
induced by τ on D_K coincides with the original topology τ_K.
Theorem 12.4.5. (D(Ω), τ), as in Example 12.4.4, is a complete locally convex space. If
E ⊂ D(Ω) is bounded then
\[ \sup_{\varphi \in E} p_m(\varphi) < \infty \tag{12.9} \]
for all m ∈ Z_+, and the closure of E is compact in τ (that is, (D(Ω), τ) has the Heine–Borel property).

Proof. First we prove that if E ⊂ D(Ω) is bounded, then E ⊂ D_K for some K ∈ K(Ω). Consider
a sequence {K_n : n ∈ N} ⊂ K(Ω) such that K_n ⊂ int K_{n+1} and Ω = ⋃_n K_n. Suppose E
is contained in no D_K with K ∈ K(Ω). Then, there are functions φ_n ∈ E and points
x_n ∈ Ω \ K_n such that |φ_n(x_n)| > 0. Define the set
\[ W := \bigcap_{n \ge 1} \Big\{ \varphi \in D(\Omega) : |\varphi(x_n)| < \frac{1}{n}\,|\varphi_n(x_n)| \Big\}. \]
This is a convex and balanced set. Furthermore, for any K ∈ K(Ω), W ∩ D_K ∈ τ_K. To
check this assertion, let m be the smallest integer such that K ⊂ K_m. If n ≥ m, then x_n ∉ K and
φ(x_n) = 0 for all φ ∈ D_K. Then
\[ W \cap D_K = \bigcap_{1 \le n < m} \Big\{ \varphi \in D_K : |\varphi(x_n)| < \frac{1}{n}\,|\varphi_n(x_n)| \Big\} \in \tau_K \]
since each set {φ ∈ D_K : |φ(x_n)| < (1/n)|φ_n(x_n)|}, 1 ≤ n < m, is open in τ_K. By
definition of the inductive limit topology, W ∈ τ. As φ_n ∉ nW, no set of the form rW contains
E; hence, E is not bounded. Therefore, if E is bounded then there is K ∈ K(Ω) so that
E ⊂ D_K. Since τ_K coincides with the topology on D_K induced by τ, E is bounded in
(D_K, τ_K) and (12.9) holds for all m ∈ Z_+.

We now show that the closure of E is compact. Since E is bounded, E ⊂ D_K for some K ∈ K(Ω), and E is
bounded in D_K. Since sup_{φ∈E} p_m(φ) < ∞ for each m ∈ N, {∂^α φ : φ ∈ E} is uniformly bounded and equicontinuous
(in the sup norm in C(K)) for each α ∈ Z^n_+. The Arzelà–Ascoli theorem and Cantor's
diagonal process imply that any sequence in E contains a subsequence {φ_m : m ∈ N} ⊂ E
for which ∂^α φ_m converges uniformly for every α. From this, it follows that the closure of E is compact in D_K and
hence, in D(Ω).

Since Cauchy sequences are bounded, if Φ := {φn : n ∈ N} ⊂ D(Ω) is Cauchy in (D(Ω), τ ),

then Φ ⊂ DK for some K ∈ K(Ω). Since the topology τK coincides with τ ∩ DK , Φ is
Cauchy in (DK , τK ). Since τK is complete, Φ converges to a function φ ∈ DK .

The description of bounded sets and Cauchy sequences in D(Ω) was simple since each
D_K, K ∈ K(Ω), is a Fréchet space generated by a countable collection of seminorms. The
space D(Ω) is the archetype of a countable inductive system {X, (X_n, τ_n) : n ∈ N} in which
X_n is a closed subset of (X_{n+1}, τ_{n+1}), and τ_n coincides with the relative topology on X_n
induced by τ_{n+1}. These systems are called strict inductive systems. We will conclude
this section by presenting a result that describes bounded sets in strict inductive systems.
Lemma 12.4.6. Suppose X is a locally convex topological vector space and that M is a
linear subspace of X equipped with the induced topology from X. Let V ⊂ M be an open
convex balanced neighborhood of 0 in M .
(i) There exists an open balanced neighborhood W of 0 in X such that V = W ∩ M .
(ii) If M is closed in X then, for any x0 ∈ X \ M there exists an open convex balanced
neighborhood W of 0 in X such that W ∩ M = V and x0 ∈ X \ W .

Proof. (i) By Lemma 12.3.1 there is an open convex balanced neighborhood U of 0 in X
such that U ∩ M ⊂ V. Let W be the circled convex hull of U ∪ V. Since U ⊂ W, int(W)
is an open convex balanced neighborhood of 0 in X. Since V ⊂ W ∩ M, the interior of
W ∩ M in M contains V. It suffices to prove that W ∩ M ⊂ V. Let z ∈ W ∩ M. By
Theorem 12.3.10, z = αx + βy for some x ∈ U, y ∈ V and α, β ∈ F with |α| + |β| ≤ 1. If
α = 0 then z = βy ∈ V since V is balanced. If α ≠ 0 then x = α^{-1}(z − βy) ∈ U ∩ M ⊂ V;
hence, z = αx + βy ∈ V since V is convex and balanced.

(ii) If M is a closed linear subspace of X then X/M with the quotient topology is a locally
convex linear vector space and so it is also Hausdorff. Let π : X → X/M be the quotient
map. If x_0 ∉ M then there is an open convex balanced neighborhood Ṽ of 0 in X/M which does not contain
x_0 + M. Thus, π^{-1}(Ṽ) is a convex balanced open neighborhood of 0 in X which does not
intersect x_0 + M. Consequently, there is an open convex balanced neighborhood U of 0 in X such
that U ∩ M ⊂ V, and
(12.10) U ∩ (x_0 + M) = ∅.
As in the proof of part (i), the set W = co°(U ∪ V) is a convex balanced neighborhood of
0 in X such that W ∩ M = V. We will show that x_0 ∉ W. If x_0 ∈ W, then x_0 = αx + βy
for some x ∈ U, y ∈ V, and α, β ∈ F with |α| + |β| ≤ 1. If α = 0 then x_0 = βy ∈ V ⊂ M,
which contradicts the assumption x_0 ∉ M. If α ≠ 0 then, since U is balanced,
αx = x_0 − βy ∈ (x_0 + M) ∩ U,
which contradicts (12.10). Therefore x_0 ∉ W.
Theorem 12.4.7. (Dieudonné–Schwartz) Let X be a vector space and {(Xn , τn ) : n ∈ N}
an inductive sequence of locally convex vector spaces Xn ⊂ X such that for each n ∈ N, τn
is the topology on Xn induced by τn+1 , and Xn is a closed subset in (Xn+1 , τn+1 ). Then,
(i) (X, τ ) is a locally convex topological vector space, and for each n ∈ N the topology
induced on Xn by τ is the same as τn , and Xn is closed in (X, τ ).
(ii) B ⊂ X is bounded in (X, τ ) iff B ⊂ Xn for some n and B is bounded in (Xn , τn ).

Proof. (i) Fix n ∈ N and let V_n be an open convex balanced neighborhood of 0 in (X_n, τ_n).
By Lemma 12.4.6(i), there exists an open convex balanced neighborhood V_{n+1} of 0 in
(X_{n+1}, τ_{n+1}) such that V_{n+1} ∩ X_n = V_n. Continuing by induction we obtain an increasing
sequence {V_{n+j} : j ∈ Z_+} of convex balanced sets such that V_{n+j} ∈ τ_{n+j} and V_{n+j+1} ∩ X_{n+j} =
V_{n+j}. It is easy to check that V = ⋃_{k≥n} V_k is a convex balanced set and
\[ V \cap X_j = \begin{cases} V_n \cap X_j & \text{if } j < n, \\ V_j & \text{if } n \le j. \end{cases} \]
Hence V ∈ τ and Vn ∈ {Xn ∩ U : U ∈ τ }. This implies that τn and the induced topology
on Xn by τ are equal. We now show that X \ Xn ∈ τ . Let x ∈ X \ Xn . Then x ∈ Xn+p
for some integer p ≥ 1. Since Xn is closed in Xn+p , Theorem 12.1.8 shows that there is an
open convex balanced neighborhood Vn+p of 0 in τn+p such that
(x + Vn+p ) ∩ Xn = ∅.
The first part of the proof shows that there is a convex balanced neighborhood V ∈ τ of 0
such that V ∩ Xn+p = Vn+p . Consequently

(x + V ) ∩ Xn = (x + V ) ∩ Xn+p ∩ Xn = (x + Vn+p ) ∩ Xn = ∅.
This shows that X \ Xn is open in τ .
It remains to show that {0} is closed in (X, τ). Let x ∈ X \ {0}. Then x ∈ X_n for some
n ∈ N, and there is a convex balanced neighborhood V_n of 0 in τ_n such that x ∉ V_n. By
the first part of the proof there is an open neighborhood V of 0 in τ such that V ∩ X_n = V_n.
Then, x ∉ V and so {0} is closed in (X, τ).
(ii) Suppose B is a bounded set in (Xn , τn ). Let V be an open convex balanced neighborhood
of 0 in τ . Then Vn := V ∩ Xn is a convex balanced neighborhood of 0 in τn . There is t0
such that t ≥ t0 implies that B ⊂ tVn ⊂ tV . This shows that B is also bounded in (X, τ ).
Conversely, suppose B ⊂ X is not contained in any X_n. There is a sequence {x_n : n ∈ N}
such that x_n ∈ B \ X_n. We can extract a subsequence such that x_{n_k} ∈ X_{n_{k+1}} \ X_{n_k}. Clearly
x_{n_1} ≠ 0. Thus there is an open convex balanced neighborhood V_2 of 0 in τ_{n_2} such that
x_{n_1} ∉ V_2. Since (1/2)x_{n_2} ∈ X_{n_3} \ X_{n_2}, by Lemma 12.4.6(ii), there is a convex balanced
neighborhood V_3 of 0 in τ_{n_3} such that V_3 ∩ X_{n_2} = V_2 and (1/2)x_{n_2} ∉ V_3. Proceeding by
induction, we obtain an increasing sequence {V_k : k ≥ 2} of convex balanced sets such that
V_k ∈ τ_{n_k}, V_{k+1} ∩ X_{n_k} = V_k, and (1/k)x_{n_k} ∉ V_{k+1}. The set V = ⋃_k V_k is a convex balanced
subset of X and V ∩ X_n ∈ τ_n for each n ∈ N, that is, V is a neighborhood of 0 in τ. By
construction the sequence {(1/k)x_{n_k}} is contained in X \ V and, since X \ V is closed and 0 ∈ V, it does not converge
to 0. Hence, from Theorem 12.1.17, it follows that B is not bounded. Therefore, if B is
bounded in (X, τ), then B ⊂ X_n for some n.
Suppose B is bounded in (X, τ) and B ⊂ X_n. Let V_n be a convex balanced neighborhood of 0
in τ_n. By Lemma 12.4.6 there is a convex balanced neighborhood V of 0 in τ such that
V ∩ X_n = V_n. As B is bounded, there is t_0 > 0 such that B ⊂ tV for all t ≥ t_0. Thus
B = B ∩ X_n ⊂ (tV) ∩ X_n = t(V ∩ X_n) = tV_n for all t ≥ t_0.
Remark 12.4.8. Under the assumptions of the Dieudonné–Schwartz theorem, a sequence {x_n :
n ∈ N} is convergent in the inductive limit topology (X, τ) iff there exists X_k such that
{x_n : n ∈ N} ⊂ X_k, and the sequence converges in (X_k, τ_k).
Example 12.4.9. Suppose Ω is a locally compact second countable Hausdorff space. The
space C00 (Ω) with the inductive limit topology τ described in Example 12.4.3 is a strict
inductive system. Each space CK is closed in (C00 (Ω), τ ). A sequence {φn } is convergent
in τ iff there is a compact set K ⊂ Ω such that {φ_n : n ∈ N} ⊂ C_K, and φ_n converges
uniformly.
Example 12.4.10. Let Ω ⊂ R^n be an open set. The space D(Ω) described in Example 12.4.4
is a strict inductive system. Each DK is a closed subset of D, and a sequence {φn } ⊂ D
is convergent iff there is a compact set K such that {φn : n ∈ N} ⊂ DK , and a function
φ ∈ DK such that limn pm (φn − φ) = 0 for each m ∈ Z+ .
Example 12.4.11. Suppose ψ, φ ∈ D(R^n). Let K_1 = supp(ψ) and K_2 = supp(φ) and
K = K_1 + K_2. The map F : x ↦ ψ(x)τ_x φ is a continuous map from R^n to D_K. Indeed,
F(x) = 0 for all x ∈ K_1^c and supp(F(x)) ⊂ K_1 + K_2 for all x ∈ K_1. Since each ϕ ∈ D is
uniformly continuous, for any sequence x_m → x in R^n and any α ∈ Z^n_+, we have that
τ_{x_m} ∂^α φ → τ_x ∂^α φ uniformly as m → ∞. Hence F(x_m) → F(x) as m → ∞.

12.5. Continuous linear transformations

Suppose X and Y are linear spaces over the field F. A function Λ : X −→ Y such that
Λ(αu + v) = αΛ(u) + Λ(v) for all u, v ∈ X and α ∈ F is called a linear map. The kernel
or null space of such map Λ is defined as ker(Λ) := Λ−1 ({0}) = {x ∈ X : Λ(x) = 0}.
A linear map Λ between topological vector spaces X and Y is said to be a bounded
linear map if it maps bounded sets in X to bounded sets in Y .
Since a local base at 0 completely determines the topology of a topological vector space, it is clear that a linear map Λ is
continuous iff Λ is continuous at 0. The space of all continuous linear maps from X to Y will
be denoted by L(X, Y); when X = Y, we use L(X) to denote L(X, X). Elements of L(X)
are called operators on X. The following result provides some properties of continuous
linear functions on topological vector spaces.
Theorem 12.5.1. Suppose X and Y are topological vector spaces and let Λ be a linear map
from X into Y . Then the implications (i) ⇒ (ii) ⇒ (iii) hold.
(i) Λ is continuous.
(ii) Λ is bounded.
(iii) For any sequence {xn : n ∈ N} ⊂ X that converges to 0, {Λxn : n ∈ N} is bounded
in Y .
In addition, if the topology of X is generated by an invariant metric, then the implications
(iii) ⇒ (iv) ⇒ (i) hold.
(iv) If xn → 0 in X then Λxn → 0 in Y .

Proof. (i)⇒(ii): Let E ⊂ X be a bounded set. For any open neighborhood W of 0 in Y

there exists an open neighborhood V of 0 in X such that Λ(V ) ⊂ W . There exists t0 > 0
such that E ⊂ tV for all t ≥ t0 . Therefore, Λ(E) ⊂ tΛ(V ) ⊂ tW for all t ≥ t0 .
(ii)⇒(iii): Let {xn : n ∈ N} ⊂ X be a sequence converging to 0. By Theorem 12.1.16,

{xn : n ∈ N} is bounded in X. As Λ is bounded, {Λxn : n ∈ N} is bounded in Y .

Suppose X has a topology compatible with an invariant metric.

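(iii)⇒(iv): Suppose $x_n \to 0$ in X. By Theorem 12.1.20(ii) there is a sequence $\{\gamma_n\} \subset \mathbb{F}$ with $|\gamma_n| \to \infty$ such
that $\gamma_n x_n \to 0$. Hence, $\{\Lambda(\gamma_n x_n)\}$ is bounded in Y and, by Theorem 12.1.17,
$\Lambda(x_n) = \gamma_n^{-1}\Lambda(\gamma_n x_n) \to 0$ as $n \to \infty$.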

(iv)⇒(i): Suppose (iv) holds but Λ fails to be continuous. Let {Vn : n ∈ N} be a decreasing countable local
base at 0 in X. There exists an open neighborhood W of 0 in Y such that, for any n,
there is $x_n \in V_n \setminus \Lambda^{-1}(W)$. Then $x_n \to 0$ but $\Lambda x_n \not\to 0$, contradicting assumption (iv).
Corollary 12.5.2. Suppose (X, τ ) is a locally convex space and that τ is generated by a
countable nondecreasing family of seminorms {ρm : m ∈ N}. Then, Λ ∈ X ∗ iff there exists
a constant C > 0 and N ∈ N such that
(12.11) |Λx| ≤ CρN (x), x∈X

Proof. For any x ∈ X, m ∈ N and r > 0 define Vm (x; r) = {y ∈ X : ρm (x − y) < r}. Since
ρm ≤ ρm+1 for all m ∈ N, the collection of all sets Vm (x; r) forms a base for the topology τ .

Suppose (12.11) holds. Given ε > 0, let V := VN (0; ε/C). Then, Λ(V ) ⊂ B(0; ε). This
implies that Λ is continuous.

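Conversely, if Λ is continuous, there are N ∈ N and δ > 0 such that |Λ(x)| < 1 whenever x ∈
VN (0; δ). If $\rho_N(x) > 0$, applying this to $\delta x/(2\rho_N(x))$ gives $|\Lambda(x)| < \frac{2}{\delta}\rho_N(x)$; if $\rho_N(x) = 0$, then
$tx \in V_N(0;\delta)$ for all t > 0, so $|\Lambda(x)| < 1/t$ for all t > 0 and Λ(x) = 0. Consequently, $|\Lambda(x)| \le \frac{2}{\delta}\rho_N(x)$ for any x ∈ X.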
Example 12.5.3. (Continuous linear maps on D(Ω)) Suppose Y is a topological vector space, Ω
is an open subset of Rn , and Λ : D(Ω) → Y is a linear map. Theorem 12.5.1 implies that if
Λ is continuous, then Λ is bounded. Although (D(Ω), τ ) is not metrizable, we know that for
any compact K ⊂ Ω, the relative topology on DK induced by τ coincides with the topology
τK generated by the countable seminorms pm (φ) given. This will allow us to establish the
equivalence between continuity and boundedness of Λ.
Theorem 12.5.4. Suppose Λ : D(Ω) → Y is a linear map where Y is a locally convex
space. The following statements are equivalent.
(i) Λ is continuous.
(ii) Λ is bounded.
(iii) For any sequence φn → 0 in D(Ω), Λ(φn ) → 0 in Y .
(iv) The restriction of Λ to any DK , K ∈ K(Ω), is continuous.

Proof. The implication (i) ⇒ (ii) is as in Theorem 12.5.1.

Suppose Λ is bounded and let φn → 0 in D(Ω). Being bounded, {φn : n ∈ N} is contained

in some DK . As any bounded set in DK is bounded in D, the restriction of Λ to DK is

bounded. Since the relative topology τK induced by τ on DK is metrizable, the implication

(ii) ⇒ (iii) follows from Theorem 12.5.1.

Suppose (iii) holds and let {φn : n ∈ N} ⊂ DK be such that φn → 0 in τK . As τK

coincides with the relative topology induced by τ on DK , φn → 0 in (D(Ω), τ ). Hence, by
assumption Λφn → 0 in Y . As τK is metrizable, the implication (iii) ⇒ (iv) follows from
Theorem 12.5.1.

Suppose that (iv) holds. This is the only place where the locally convex assumption on Y is
used. Let U be an open balanced convex set in Y and set V = Λ−1 (U ). Then V is balanced
and convex. By assumption V ∩ DK ∈ τK . By definition of the inductive topology τ on
D(Ω) (see Theorem 12.4.2), it follows that V ∈ τ . Therefore, Λ is continuous.
Example 12.5.5. Let α ∈ Zn+ and ψ ∈ C ∞ (Ω). The maps
Ψ : φ 7→ ψφ
Dα : φ 7→ ∂ α φ
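from D(Ω) to itself are continuous. By Theorem 12.5.4, it is enough to show that if
$\phi_m \to 0$ in D(Ω) as $m \to \infty$, then $\psi\phi_m$ and $\partial^\alpha\phi_m$ converge to 0 in D(Ω). There is K ∈ K(Ω)
such that $\{\phi_m\} \subset D_K$, and $\partial^\beta\phi_m \to 0$ uniformly for all $\beta \in \mathbb{Z}^n_+$.

Clearly $\{\partial^\alpha\phi_m : m \in \mathbb{N}\} \subset D_K$. For any N ∈ N,

$$p_N(\partial^\alpha\phi) \le p_{N+|\alpha|}(\phi), \qquad \phi \in D(\Omega).$$

Hence, $\partial^\alpha\phi_m \to 0$ in D(Ω) as $m \to \infty$.

It is obvious that $\{\psi\phi_m : m \in \mathbb{N}\} \subset D_K$. From the Leibniz differentiation formula

$$\partial^\beta(\psi\phi)(x) = \sum_{0 \le \gamma \le \beta} \binom{\beta}{\gamma}\, \partial^{\beta-\gamma}\psi(x)\, \partial^\gamma\phi(x),$$

for each N ∈ N there is a constant C = C(ψ, N, K) such that

$$p_N(\psi\phi) \le C\, p_N(\phi), \qquad \phi \in D_K.$$

It follows that $\psi\phi_m \to 0$ in D(Ω) as $m \to \infty$.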
Corollary 12.5.6. Λ ∈ D∗ (Ω) iff for any K ∈ K(Ω), there is a constant C > 0 and N ∈ N
such that
|Λφ| ≤ CpN (φ), φ ∈ DK
where pN (φ) = sup{|Dα φ(x)| : x ∈ Ω, |α| ≤ N }.

Proof. By Theorem 12.5.4, Λ ∈ D∗ (Ω) iff Λ restricted to DK is continuous. The topology

τK on DK induced by the inductive limit topology τ is generated by the family of
seminorms {pN : N ∈ N}, and pN ≤ pN +1 for all N ∈ N. The conclusion follows from
Corollary 12.5.2.

When Y is a normed space, the equivalence between continuity and boundedness of

linear maps is easier to establish.

Theorem 12.5.7. Suppose X is a topological vector space and Y is a normed space. A

linear map Λ : X → Y is continuous iff Λ is bounded on a neighborhood of 0 in X.

Proof. If Λ is continuous, then there exists a neighborhood V of 0 in X such that Λ(V ) ⊂

B(0; 1). If E ⊂ X is bounded, then E ⊂ tV for some t > 0. Hence Λ(E) ⊂ tΛ(V ) ⊂
tB(0; 1) = B(0; t). This means that Λ maps bounded sets in X to bounded sets in Y .

Conversely, if Λ is bounded on a neighborhood V of 0 in X, then there is t > 0 such that

Λ(V ) ⊂ B(0; t) = tB(0; 1). Then, for any ε > 0, Λ(εt−1 V ) ⊂ εB(0; 1) = B(0; ε).

12.6. Banach algebra of linear operators on a Banach space

When X and Y are normed vector spaces, continuity of a linear map Λ : X → Y can be
determined directly by analyzing the norm of elements of the image.
Theorem 12.6.1. Let X and Y be normed spaces. The map $\|\cdot\| : T \mapsto \sup_{\|x\|_X = 1} \|Tx\|_Y$
defines a norm on L(X, Y ). If Y is a Banach space, then $(L(X, Y), \|\cdot\|)$ is a
Banach space.

Proof. As T ∈ L(X, Y ) is bounded, $\|T\| < \infty$. From $\|(T + \alpha U)x\|_Y \le \|Tx\|_Y + |\alpha|\|Ux\|_Y$,
it follows that $\|T\| := \sup_{\|x\|_X = 1} \|Tx\|_Y$ is a norm.
Suppose Y is a Banach space, and assume (Tn : n ∈ N) is a Cauchy sequence in L(X, Y ).
Since
$$\|T_n x - T_m x\|_Y \le \|T_n - T_m\|\,\|x\|_X$$
for all x ∈ X, the sequence $(T_n x)$ is Cauchy in Y, and it follows that {Tn : n ∈ N} converges pointwise to some function T : X → Y .
Since
$$\|T(\alpha u + v) - \alpha T(u) - T(v)\|_Y \le \|T(\alpha u + v) - T_n(\alpha u + v)\|_Y + \|\alpha T(u) + T(v) - \alpha T_n(u) - T_n(v)\|_Y,$$
by passing to the limit as n → ∞ we conclude that T is a linear map. Given ε > 0, there
is an integer N such that n > m ≥ N implies that $\|(T_n - T_m)x\|_Y \le \|T_n - T_m\| < \varepsilon$ for all
x with $\|x\|_X = 1$. Letting n → ∞ we obtain that $\sup_{\|x\|_X = 1} \|(T - T_m)x\|_Y \le \varepsilon$ for all m ≥ N.
In particular, $T = T_m + (T - T_m)$ is bounded, and $T_n \to T$ in L(X, Y ) as $n \to \infty$.
Definition 12.6.2. A normed ring $(A, +, \cdot, \|\cdot\|)$ over the field F is a Banach ring if $(A, \|\cdot\|)$
is a nontrivial Banach space, and for any x, y ∈ A
$$\|xy\| \le \|x\|\,\|y\|.$$
$(A, +, \cdot, \|\cdot\|)$ is a Banach algebra if A is a Banach unital ring whose unit e satisfies $\|e\| = 1$.
Remark 12.6.3. If A is a Banach ring, then the product (x, y) ↦ xy is continuous in
A × A. This follows from
$$\|xy - x'y'\| = \|x(y - y') - (x' - x)y'\| \le \|x\|\,\|y - y'\| + \|y'\|\,\|x - x'\|.$$

Remark 12.6.4. If A is a Banach ring with a unit e, then $\|e\| = \|ee\| \le \|e\|^2$ and so,
$\|e\| \ge 1$. For each a ∈ A define $L_a x = ax$. Then $L_a \in L(A)$ since $\|L_a x\| = \|ax\| \le \|a\|\,\|x\|$.
Clearly $a \mapsto L_a$ is an algebra isomorphism from A onto $\tilde{A} := \{L_a : a \in A\}$. Since
$\|L_a\| \le \|a\|$ and $\|a\| = \|L_a e\| \le \|e\|\,\|L_a\|$, A and $\tilde{A}$ are linearly homeomorphic. More
importantly, the norms $a \mapsto |||a||| := \|L_a\|$ and $a \mapsto \|a\|$ are equivalent. Under the norm
$|||\cdot|||$, A becomes a Banach algebra since $L_e = I$.
Remark 12.6.5. Suppose A is a Banach ring. On $\tilde{A} := A \times \mathbb{C}$ define
(a) (x, α) + c(y, β) = (x + cy, α + cβ) for all x, y ∈ A and α, β, c ∈ C.
(b) (x, α) · (y, β) = (xy + αy + βx, αβ) for all x, y ∈ A and α, β ∈ C.
(c) $\|(x, \alpha)\| = \|x\| + |\alpha|$ for all x ∈ A and α ∈ C.
Under these operations and norm, $(\tilde{A}, +, \cdot)$ is a Banach algebra with unit (0, 1). Indeed,
$\|(x, \alpha) \cdot (y, \beta)\| \le (\|x\| + |\alpha|)(\|y\| + |\beta|) = \|(x, \alpha)\|\,\|(y, \beta)\|$, and $\|(0, 1)\| = 1$. Clearly
x ↦ (x, 0) is an isometric isomorphism from A onto A × {0}. Notice that (x, α) · (y, 0) =
(xy + αy, 0) ∈ A × {0} and (y, 0) · (x, α) = (yx + αy, 0) ∈ A × {0} for all x, y ∈ A and
α ∈ C. By identifying A with A × {0} we have that any non–unital Banach ring is a closed
ideal of codimension one in a Banach algebra.
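
The unitization rules (a)–(c) can be checked on a toy case. The sketch below is only an illustration: it assumes that A is C² with the pointwise product and the sup norm (a choice made here for simplicity, not taken from the text), and the class name `Unitized` is hypothetical.

```python
class Unitized:
    """Element (x, alpha) of A x C, assuming A = C^2 with pointwise product and sup norm."""

    def __init__(self, x, alpha):
        self.x = tuple(complex(c) for c in x)
        self.alpha = complex(alpha)

    def __mul__(self, other):
        # Rule (b): (x, a) * (y, b) = (xy + a*y + b*x, a*b)
        prod = tuple(
            xi * yi + self.alpha * yi + other.alpha * xi
            for xi, yi in zip(self.x, other.x)
        )
        return Unitized(prod, self.alpha * other.alpha)

    def __eq__(self, other):
        return self.x == other.x and self.alpha == other.alpha

    def norm(self):
        # Rule (c): ||(x, a)|| = ||x|| + |a|
        return max(abs(c) for c in self.x) + abs(self.alpha)


unit = Unitized((0, 0), 1)          # the adjoined unit (0, 1)
p = Unitized((1 + 2j, -3), 2j)
q = Unitized((0.5, 1j), -1)
embedded = Unitized((1, 1), 0)      # an element of A x {0}

print(unit * p == p, p * unit == p)            # (0, 1) acts as a two-sided unit
print((p * q).norm() <= p.norm() * q.norm())   # submultiplicativity on this sample
print((p * embedded).alpha == 0)               # A x {0} absorbs products (ideal property)
```

On this sample, the three printed checks correspond to the unit property, the inequality for the norm of a product, and the ideal property of A × {0}.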
Example 12.6.6. If X is a locally compact Hausdorff space, C0 (X) under the uniform
norm and the pointwise sum, product and scalar product is a Banach ring. C0 (X) × R with
the operations and norm defined in Remark 12.6.5 is a Banach algebra. This algebra is
homeomorphic to C(X ∪ {∆}) where X ∪ {∆} is the one–point compactification of X.
Remark 12.6.7. If A is a Banach algebra, x ∈ A is invertible if there is y ∈ A such that
xy = e = yx. Clearly such y, if it exists, is unique, and will be denoted by $x^{-1}$. The
collection $G_A$ of all invertible elements in A contains the unit element e, and is a group
under multiplication. Indeed, if $x, y \in G_A$ then $(xy)^{-1} = y^{-1}x^{-1}$.
Example 12.6.8. Suppose $(X, \|\cdot\|)$ is a complex Banach space. Under operator addition,
scalar multiplication, and composition, L(X) is an algebra whose unit is the identity
map I. With respect to the operator norm
$$\|T\| = \sup_{\|x\| = 1} \|Tx\|,$$
L(X) is a Banach algebra (see Exercise 12.17.16). The group $GL(X) := G_{L(X)}$, called the
general linear group of X, consists of all bijective maps T ∈ L(X) for which $T^{-1} \in L(X)$.
It will be shown below that GL(X) is an open subset of L(X) and that it is a topological
group, that is, the group multiplication (composition) in GL(X) and the map $T \mapsto T^{-1}$ are
continuous with respect to the operator norm.
Lemma 12.6.9. Suppose A is a Banach algebra. If x ∈ A and $\|x\| < 1$, then $(e - x) \in G_A$
and
$$(e - x)^{-1} = \sum_{n=0}^{\infty} x^n.$$

Proof. Let $s_n = \sum_{k=0}^{n} x^k$. Notice that $(e - x)s_n = s_n(e - x) = e - x^{n+1}$ and $\|x^{n+1}\| \le \|x\|^{n+1} \to 0$; hence, if $s_n$
converges in A to some element s, then $s = (e - x)^{-1}$. It therefore suffices to show that
{sn : n ∈ N} is a Cauchy sequence. For n > m we have that
$$\|s_n - s_m\| \le \sum_{k=m+1}^{n} \|x\|^k < \frac{\|x\|^{m+1}}{1 - \|x\|}.$$

The Cauchy property follows from $\|x\| < 1$.
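
As a numerical illustration of the Neumann series, the following sketch assumes that A is the algebra of real 2×2 matrices with the maximum row sum norm (an operator norm, hence submultiplicative with unit of norm one). The matrix `x` and the helper names are illustrative choices, not part of the text.

```python
# Assumption: A = real 2x2 matrices with the max-row-sum norm.

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_comb(a, b, c=1.0):
    """Return a + c*b entrywise."""
    return [[a[i][j] + c * b[i][j] for j in range(2)] for i in range(2)]


def norm(a):
    return max(sum(abs(v) for v in row) for row in a)


def partial_sum(x, n):
    """Return s_n = e + x + ... + x^n."""
    total, power = IDENTITY, IDENTITY
    for _ in range(n):
        power = mat_mul(power, x)
        total = mat_comb(total, power)
    return total


x = [[0.1, 0.2], [0.3, 0.1]]        # norm(x) = 0.4 < 1
s60, s10 = partial_sum(x, 60), partial_sum(x, 10)

# (e - x) s_n = e - x^(n+1), so the residual should be negligible for n = 60.
residual = norm(mat_comb(mat_mul(mat_comb(IDENTITY, x, -1.0), s60), IDENTITY, -1.0))
# Tail estimate from the proof: ||s_n - s_m|| < ||x||^(m+1) / (1 - ||x||).
tail = norm(mat_comb(s60, s10, -1.0))
tail_bound = norm(x) ** 11 / (1.0 - norm(x))

print(residual, tail, tail_bound)
```

The partial sum `s60` behaves as an approximate inverse of e − x, and the observed tail stays below the geometric bound used in the proof.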

Theorem 12.6.10. If A is a Banach algebra, then the set $G_A$ of all x ∈ A for which $x^{-1}$
exists is open in $(A, \|\cdot\|)$. Moreover, if $x \in G_A$, then for $\|h\|\,\|x^{-1}\| < 1$

$$(x + h)^{-1} - x^{-1} + x^{-1}hx^{-1} = o(\|h\|),$$

which implies that the map $x \mapsto x^{-1}$ on $G_A$ is continuous.

Proof. Suppose $x \in G_A$. For any y ∈ A, we have

$$y = x - (x - y) = x\big(e - x^{-1}(x - y)\big).$$

If $\|x - y\| < \frac{1}{\|x^{-1}\|}$ then $\|x^{-1}(x - y)\| \le \|x^{-1}\|\,\|x - y\| < 1$ and so, by Lemma 12.6.9, $\big(e - x^{-1}(x - y)\big)^{-1}$
exists in A. Hence y is invertible and $y^{-1} = \big(e - x^{-1}(x - y)\big)^{-1}x^{-1}$. This shows that the
open ball $B(x; 1/\|x^{-1}\|)$ is fully contained in $G_A$.

Let f be the map $x \mapsto x^{-1}$ on $G_A$. For any h ∈ A such that $\|h\| < 1/\|x^{-1}\|$ we have that
$$(x + h)^{-1} = (e + x^{-1}h)^{-1}x^{-1} = \sum_{n \ge 0} (-1)^n (x^{-1}h)^n x^{-1}
= x^{-1} - x^{-1}hx^{-1} + x^{-1}(hx^{-1})^2 \sum_{n \ge 0} (-1)^n (hx^{-1})^n.$$

Hence
$$\frac{\|(x + h)^{-1} - x^{-1} + x^{-1}hx^{-1}\|}{\|h\|} \le \frac{\|x^{-1}\|^3\,\|h\|}{1 - \|x^{-1}\|\,\|h\|} \longrightarrow 0 \quad \text{as } h \to 0.$$
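
The first order expansion of the inverse can be observed numerically. The sketch below again assumes real 2×2 matrices with the maximum row sum norm; the base point `x` and the direction used for `h` are arbitrary illustrative choices.

```python
# Assumption: A = real 2x2 matrices with the max-row-sum norm.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_comb(a, b, c=1.0):
    """Return a + c*b entrywise."""
    return [[a[i][j] + c * b[i][j] for j in range(2)] for i in range(2)]


def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def norm(a):
    return max(sum(abs(v) for v in row) for row in a)


x = [[2.0, 1.0], [0.0, 1.0]]
x_inv = mat_inv(x)                  # norm(x_inv) = 1.0


def remainder_ratio(t):
    """Return (||(x+h)^-1 - x^-1 + x^-1 h x^-1|| / ||h||, bound from the proof)."""
    h = [[t, -t], [t, 0.5 * t]]     # norm(h) = 2|t|
    linear_term = mat_mul(mat_mul(x_inv, h), x_inv)
    remainder = mat_comb(mat_comb(mat_inv(mat_comb(x, h)), x_inv, -1.0), linear_term)
    bound = norm(x_inv) ** 3 * norm(h) / (1.0 - norm(x_inv) * norm(h))
    return norm(remainder) / norm(h), bound


results = [remainder_ratio(t) for t in (0.1, 0.01, 0.001)]
for ratio, bound in results:
    print(ratio, bound)
```

The ratio shrinks roughly linearly with the size of h and stays below the bound obtained in the proof, which is what the o(‖h‖) statement predicts.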
Lemma 12.6.11. Suppose $x \in \partial G_A$ in A. If $\{x_n : n \in \mathbb{N}\} \subset G_A$ and $\|x - x_n\| \to 0$ as $n \to \infty$,
then $\lim_n \|x_n^{-1}\| = \infty$.

Proof. Suppose the conclusion is false. Then, for some M > 0 and subsequence $\{x_{n_k} : k \in \mathbb{N}\}$
we have $\sup_k \|x_{n_k}^{-1}\| < M$. Let K be large enough so that $\|x_{n_K} - x\| < \frac{1}{M}$. Then

$$\|e - x_{n_K}^{-1}x\| = \|x_{n_K}^{-1}(x_{n_K} - x)\| \le \|x_{n_K}^{-1}\|\,\|x_{n_K} - x\| < 1.$$

Hence, $x_{n_K}^{-1}x \in G_A$ and so $x = x_{n_K}(x_{n_K}^{-1}x) \in G_A$. Since $G_A$ is open, $x \notin \partial(G_A)$, which is a
contradiction.

12.7. Finite dimensional spaces

Recall that the n–dimensional Euclidean space $\mathbb{F}^n$ is the set of all ordered n–tuples
$$x = \big(x(1), \ldots, x(n)\big)^{\top}$$
with the topology generated by the norm
$$\|x\|_2 = \Big(\sum_{k=1}^{n} |x(k)|^2\Big)^{1/2}.$$

It is easy to check that for each 1 ≤ j ≤ n, the projection $\pi_j : \mathbb{F}^n \to \mathbb{F}$ given by

(12.12) $\pi_j\big(x(1), \ldots, x(n)\big) = x(j)$
is continuous. The following result shows that every subspace of dimension n of a topological
vector space is homeomorphic to the Euclidean space $(\mathbb{F}^n, \|\cdot\|_2)$.

Theorem 12.7.1. Let X be a topological vector space.

(i) Any linear map from Fn into X is continuous.
(ii) If Y ⊂ X is a linear subspace of finite dimension n, then Y is closed in X and
homeomorphic to (Fn , k · k2 ).
(iii) If X is a locally compact topological space, then it has finite dimension.

Proof. (i) Suppose Λ is a linear map on $\mathbb{F}^n$ into X. Set $y_j = \Lambda(e_j)$ where $\{e_j : j = 1, \ldots, n\}$
is the standard basis $e_j(k) = 1_{\{j\}}(k)$. Then
$$\Lambda(x) = \sum_{j=1}^{n} \pi_j(x)\,y_j = \sum_{j=1}^{n} S_j \circ \pi_j(x)$$

where $\pi_j$ is as in (12.12), and $S_j : \lambda \mapsto \lambda y_j$, j = 1, . . . , n. The continuity of Λ follows from
the continuity of addition and scalar multiplication on X as well as the continuity of each
$\pi_j$ and each $S_j$.

(ii) Let $\{y_j : j = 1, \ldots, n\}$ be a basis of Y and define the linear map Λ on $\mathbb{F}^n$ into X by
setting $\Lambda(e_j) = y_j$. Then Λ is an isomorphism between $\mathbb{F}^n$ and Y which, by part (i), is
continuous.
We will show that $\Lambda^{-1} : Y \to \mathbb{F}^n$ is bounded in a neighborhood of 0 in Y . Let B be the open
unit ball in $\mathbb{F}^n$ and $S^{n-1} = \partial B$ the unit sphere in $\mathbb{F}^n$. As $S^{n-1}$ is compact and Λ(x) = 0
iff x = 0, $K := \Lambda(S^{n-1})$ is a compact subset of X and $0 \notin K$. By Theorems 12.1.8 and 12.1.13
there is an open balanced neighborhood V of 0 in X such that V ∩ (K + V ) = ∅. Then
$U = \Lambda^{-1}(V \cap Y) = \Lambda^{-1}(V)$ is an open neighborhood of 0 in $\mathbb{F}^n$ such that $U \cap S^{n-1} = \emptyset$.
Since V is balanced, the linearity of Λ implies that U is balanced, and so U is a connected
subset of $\mathbb{F}^n$. It follows that U is contained in B, that is, $\Lambda^{-1}(V \cap Y) \subset B$. This establishes that Λ is a linear
homeomorphism between $\mathbb{F}^n$ and Y .

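To show that Y is closed in X consider a point $y \in \overline{Y}$. For some t > 0, y ∈ tV . Since tV is open,
$y \in \overline{Y \cap tV}$; and since $\overline{B}$ is compact in $\mathbb{F}^n$, the set $\Lambda(t\overline{B})$ is compact and hence closed in X. Therefore
$$y \in \overline{Y \cap tV} \subset t\,\overline{\Lambda(\Lambda^{-1}(Y \cap V))} \subset t\,\overline{\Lambda(B)} \subset \Lambda(t\overline{B}).$$
This shows that $y \in \Lambda(\mathbb{F}^n) = Y$.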

(iii) Suppose V is a neighborhood of 0 in X whose closure is compact. By Theorem 12.1.15(b)
the collection $\{2^{-n}V : n \in \mathbb{Z}_+\}$ is a local base at 0 in X. The compactness of $\overline{V}$ implies
that
$$\overline{V} \subset \bigcup_{j=1}^{m} \Big(x_j + \frac{1}{2}V\Big)$$

for some finite collection of points $x_1, \ldots, x_m \in X$. The linear space Y generated by such
points $x_j$ is finite dimensional and thus, it is closed in X. Then
$$V \subset Y + \frac{1}{2}V \subset Y + \frac{1}{2}Y + \frac{1}{4}V = Y + \frac{1}{4}V.$$
The same argument shows that $V \subset Y + 2^{-n}V$ for each n ∈ N. By Theorem 12.1.7
$$V \subset \bigcap_{n \in \mathbb{N}} (Y + 2^{-n}V) = \overline{Y} = Y.$$

This shows that Y has nonempty interior; therefore, Y = X.

Theorem 12.7.2. Let X be a topological vector space, M a closed linear subspace of X
and F a finite dimensional subspace of X. Then M + F is a closed linear subspace of X.

Proof. Let π : X → X/M be the quotient map. Clearly π(F ) is a finite dimensional linear
subspace of X/M and closed in (X/M, τM ) by Theorem 12.7.1. Therefore π −1 (π(F )) =
F + M is a closed subset of X.
Remark 12.7.3. The assumption that F is finite dimensional is necessary. See Exercise 12.17.4.

A pair of linear subspaces M and N of a linear space X are said to be complementary

if M ∩ N = {0} and X = M + N . In such case we use the notation X = M ⊕ N . The
following result follows directly from Theorem 12.7.1.
Corollary 12.7.4. Let X be a locally convex space. If M is a closed linear subspace of
X with finite codimension n then, there is a subspace N ⊂ X of dimension n such that
X = M ⊕ N.

Proof. Let {x1 + M, . . . , xn + M } be a basis of X/M . Then, B = {x1 , . . . , xn } is a linearly

independent subset of X and N = span(B) is a closed subspace of X. Clearly M ∩ N = {0}
and by definition of X/M , X = N + M .
Theorem 12.7.5. The convex cone generated by a finite set in a topological vector space
is closed.

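Proof. Let $A = \{x_1, \ldots, x_n\}$ be a nonempty finite subset of a topological vector space X.
Clearly $\operatorname{cone}(A) = \big\{\sum_{j=1}^{n} \lambda_j x_j : \lambda_j \ge 0\big\}$. For any x ∈ cone(A) there exists a subset
B of linearly independent elements of A such that x is a positive linear combination of
B. Indeed, suppose $x = \sum_{j=1}^{n} \lambda_j x_j$ where $\lambda_j \ge 0$. If $T = \{x_j : \lambda_j > 0\}$ is not a linearly
independent collection, then $\sum_{x_j \in T} \alpha_j x_j = 0$ for some scalars $\alpha_j$, at least one of which
is strictly positive. Let $\mu = \max\{\alpha_j/\lambda_j : \lambda_j > 0\}$. Then µ > 0, $\lambda_j \ge \alpha_j/\mu$ whenever $\lambda_j > 0$, and
$\lambda_m = \alpha_m/\mu$ for some m with $\lambda_m > 0$. As
$$x = \sum_{\lambda_j > 0} \Big(\lambda_j - \frac{\alpha_j}{\mu}\Big)x_j,$$

it follows that x is a positive linear combination of a proper subset of T . Repeating this
argument finitely many times, we arrive at the desired conclusion.

Suppose $(y_m : m \in \mathbb{N}) \subset \operatorname{cone}(A)$ converges to some point x ∈ X. As the collection
of linearly independent subsets of A is finite, by passing to a subsequence if necessary there exists a linearly independent collection
$B = \{z_1, \ldots, z_\ell\} \subset A$ such that $y_m = \sum_{j=1}^{\ell} \lambda_j^{(m)} z_j$, where $\lambda_j^{(m)} \ge 0$ for all m ∈ N and
1 ≤ j ≤ ℓ. By Theorem 12.7.1, span(B) is closed in X and linearly homeomorphic to $\mathbb{F}^\ell$, so the coefficients
$\lambda_j^{(m)}$ converge to nonnegative limits as $m \to \infty$. It follows that x ∈ cone(A).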
Theorem 12.7.6. If K is a nonempty compact convex subset of the Euclidean space $(\mathbb{R}^n, \|\cdot\|_2)$, then
there is 0 ≤ m ≤ n such that K and the closed unit ball $\overline{B}^m(0; 1)$ in $\mathbb{R}^m$ are homeomorphic.

Proof. If K is a singleton, then the statement holds with m = 0. Assume that K is not a
singleton. Fix p ∈ K and let $M_p := \operatorname{span}(K - p)$. If $m := \dim(M_p)$ then, there are m linearly
independent vectors $\{x_1, \ldots, x_m\} \subset K - p$ that generate $M_p$. Let $I_p$ be the interior of K − p
relative to $M_p$. Consider the norm $\|\cdot\|_{M_p}$ on $M_p$ defined by $\big\|\sum_{j=1}^{m} \alpha_j x_j\big\|_{M_p} := \sum_{j=1}^{m} |\alpha_j|$.
Theorem 12.7.1 shows that this norm induces the same topology on $M_p$ as the topology
induced by $(\mathbb{R}^n, \|\cdot\|_2)$. Thus $I_p$ is an open convex set in $(M_p, \|\cdot\|_{M_p})$.
We claim that $I_p \ne \emptyset$ which, by Lemma 12.2, would imply that $\overline{I_p} \cap M_p = K - p$. First
notice that any point of the form $x = \sum_{j=1}^{m} \alpha_j x_j$ with $0 \le \alpha_j$ and $\sum_{j=1}^{m} \alpha_j < 1$ belongs to
K − p since 0 ∈ K − p. If $w := \frac{1}{2m}\sum_{j=1}^{m} x_j$ then, the set
$$U(w; \varepsilon) := \{x \in M_p : \|x - w\|_{M_p} < \varepsilon\} \subset K - p$$
for all $0 < \varepsilon < \frac{1}{2m}$. Indeed, if x ∈ U (w; ε), then $x - w = \sum_{j=1}^{m} \epsilon_j x_j$ where $\sum_{j=1}^{m} |\epsilon_j| < \varepsilon < \frac{1}{2m}$. Then
$x = \sum_{j=1}^{m} \big(\frac{1}{2m} + \epsilon_j\big)x_j$, $\sum_{j=1}^{m} \big(\frac{1}{2m} + \epsilon_j\big) < 1$, and $\frac{1}{2m} + \epsilon_j > 0$. This completes the proof of
the claim.

By moving the origin $0 \in M_p$ to $w \in M_p$ we may assume without loss of generality that
$0 \in I_p$. Let $\mu : M_p \to \mathbb{R}_+$ be the Minkowski functional of $I_p$. Define the function $\phi : M_p \to M_p$ by
$$\phi(x) = \frac{\mu(x)}{\|x\|_2}\,x \ \text{ if } x \ne 0, \qquad \phi(0) = 0.$$
It is easy to check that φ is a homeomorphism and $\phi^{-1}(y) = \frac{\|y\|_2}{\mu(y)}\,y$ for $y \in M_p \setminus \{0\}$ and
$\phi^{-1}(0) = 0$. Since $\phi(K - p) = \overline{B}^n(0; 1) \cap M_p$ and $(M_p, \|\cdot\|_2)$ is isometric to $(\mathbb{R}^m, \|\cdot\|_2)$,
K − p is homeomorphic to $\overline{B}^m(0; 1)$.

Corollary 12.7.7. Suppose K is a nonempty compact convex subset in $\mathbb{R}^n$. If f : K → K is
continuous, then f (x) = x for some x ∈ K.

Proof. By Theorem 12.7.6, K is homeomorphic to the closed unit ball in some Euclidean space $\mathbb{R}^m$.
The conclusion follows from Brouwer’s fixed point theorem.

12.8. Fixed point theorems

Brouwer’s theorem admits extensions to infinite dimensional spaces. Our presentation will
not follow the historical order. The first extension was obtained by Schauder for Banach
spaces. Tihonov considered the case of locally convex spaces later. Here we treat Tihonov’s
theorem and obtain Schauder’s as a corollary.
Theorem 12.8.1. (Tihonov) Let K be a nonempty compact convex subset of a Fréchet space X. If
f : K → K is continuous, then there exists p ∈ K such that f (p) = p.

Proof. Suppose that f has no fixed point in K. Then, its graph $G = \{(x, f(x)) : x \in K\} \subset X \times X$
is a compact set that does not intersect the diagonal ∆ in X × X. Thus, there is a
convex balanced open neighborhood V of 0 such that
$$\big(G + (V \times V)\big) \cap \Delta = \emptyset.$$
This implies that
(12.13) $f(x) \notin x + V, \qquad x \in K$;

otherwise, we would have $(x + v, f(x)) \in \big(G + (V \times V)\big) \cap \Delta$ for some x ∈ K and v ∈ V . Let µ
be the Minkowski functional of V . By Theorem 12.3.5 µ is a continuous seminorm and
V = {x ∈ X : µ(x) < 1}. Define the function
$$\alpha(x) = (1 - \mu(x))_+.$$
Clearly, $\alpha^{-1}(\{0\}) = X \setminus V$. Let $x_1, \ldots, x_n \in K$ be such that $K \subset \bigcup_{j=1}^{n} (x_j + V)$ and define
the functions $\beta_j$ on K as
$$\beta_j(x) := \frac{\alpha(x - x_j)}{\sum_{k=1}^{n} \alpha(x - x_k)}, \qquad j = 1, \ldots, n.$$
These functions are well defined since the denominator is positive in K.

The set $H := \operatorname{co}(\{x_1, \ldots, x_n\})$ is a finite dimensional compact convex set contained in K. Define the function g on
K by
$$g(x) := \sum_{j=1}^{n} \beta_j(x)\,x_j.$$

Clearly g is continuous and g(K) ⊂ H. The same holds for the function g ◦ f . By Brouwer’s
fixed point theorem (Corollary 12.7.7), there exists p ∈ H such that g(f (p)) = p. Since $\beta_j(x) = 0$ outside $x_j + V$, we

have that
$$x - g(x) = \sum_{j=1}^{n} \beta_j(x)(x - x_j) \in \operatorname{co}(V) = V$$

for all x ∈ K. Hence,

$$f(p) - p = f(p) - g(f(p)) \in V,$$
which contradicts (12.13).
Corollary 12.8.2. (Schauder) Let K be a nonempty closed bounded convex set in a Banach
space X. Suppose f : K → K is continuous and that for any bounded subset A ⊂ K, the closure $\overline{f(A)}$
is compact. Then, there exists p ∈ K such that f (p) = p.
Remark 12.8.3. A function that satisfies the conditions above is said to be a compact
map or totally continuous map.

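Proof. Set $A := \overline{f(K)}$. By assumption A is a compact subset of K. As K is closed and convex,
$\overline{\operatorname{co}}(A) \subset K$. By Mazur’s theorem, $B := \overline{\operatorname{co}(A)} = \overline{\operatorname{co}}(A)$ is compact. Moreover,

$$f(B) = f\big(\overline{\operatorname{co}}(A)\big) \subset f(K) \subset A \subset \overline{\operatorname{co}}(A) = B.$$
By Schauder–Tihonov’s theorem, there is p ∈ B such that f (p) = p.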

12.9. Uniform boundedness

Definition 12.9.1. Suppose Γ is a collection of linear maps from a topological vector
space X into another topological vector space Y . Γ is equicontinuous if for any open
neighborhood W of 0 in Y , there exists an open neighborhood V of 0 in X such that
Λ(V ) ⊂ W for all Λ ∈ Γ.

Γ is uniformly bounded if for any bounded set E ⊂ X there exists a bounded set F ⊂ Y
such that Λ(E) ⊂ F for all Λ ∈ Γ.

The following result will be used in Section 17.4.

Theorem 12.9.2. (Banach–Steinhaus) Let X and Y be topological vector spaces. Suppose
Γ is a collection of continuous linear maps from X into Y . Let B be the set of all points
x ∈ X whose orbits
Γ(x) = {Λ(x) : Λ ∈ Γ}
are bounded in Y . If B is of second category, then Γ is equicontinuous, B = X and Γ is
uniformly bounded.

Proof. Let W be an open neighborhood of 0 in Y and U a balanced open neighborhood
of 0 in Y such that $\overline{U} + \overline{U} \subset W$. Define
$$D = \bigcap_{\Lambda \in \Gamma} \Lambda^{-1}(\overline{U}).$$

If x ∈ B, then Γ(x) ⊂ nU for some n ∈ N, and so $\Lambda(n^{-1}x) \in U \subset \overline{U}$ for all Λ ∈ Γ. Hence
$B \subset \bigcup_{n \in \mathbb{N}} nD$. As B is of second category and D is closed, D must have an interior point x.
Therefore, there exists a neighborhood V of 0 in X such that V ⊂ x − D, and
(12.14) $\Lambda(V) \subset \Lambda(x) - \Lambda(D) \subset \overline{U} - \overline{U} \subset W$
for each Λ ∈ Γ. This shows that Γ is equicontinuous.

If E ⊂ X is bounded, then E ⊂ tV for some t > 0. Then, by (12.14)

$$F = \bigcup_{\Lambda \in \Gamma} \Lambda(E) \subset t \bigcup_{\Lambda \in \Gamma} \Lambda(V) \subset tW.$$

This shows that Γ is uniformly bounded. In particular, as {x} is bounded in X for any
x ∈ X, we have that Γ(x) is bounded in Y , that is, B = X.
Corollary 12.9.3. Suppose X is an F–space and Γ is a collection of continuous linear maps from
X into a topological vector space Y . If Γ(x) = {Λ(x) : Λ ∈ Γ} is bounded in Y for each
x ∈ X, then Γ is equicontinuous.

Proof. By Baire’s category theorem, X is of second category. The conclusion follows from
the Banach–Steinhaus theorem.

The Banach–Steinhaus theorem is sometimes referred to as the uniform boundedness

principle.
The following version of the Banach–Steinhaus theorem establishes uniform bounded-
ness on compact convex sets instead of the whole space.
Theorem 12.9.4. Suppose X and Y are topological vector spaces, K ⊂ X is a compact
convex set, and Γ is a collection of continuous linear maps from X into Y . If the orbit
Γ(x) = {Λ(x) : Λ ∈ Γ} of each x ∈ K is bounded in Y , then {Λ(x) : x ∈ K, Λ ∈ Γ} is
bounded in Y .

Proof. Let C = {Λ(x) : x ∈ K, Λ ∈ Γ}. As in the proof of the Banach–Steinhaus theorem, for any
open neighborhood W of 0 in Y , let U be a balanced open set in Y such that $\overline{U} + \overline{U} \subset W$ and
define $D = \bigcap_{\Lambda \in \Gamma} \Lambda^{-1}(\overline{U})$. For x ∈ K, there is an integer n such that Γ(x) ⊂ nU . Hence
$$K = \bigcup_{n \in \mathbb{N}} (K \cap nD).$$

As K is a compact Hausdorff space, the Baire category theorem implies that K is of second
category in the relative topology. Since D is closed, for some integer n, $x_0 \in K$ and a
neighborhood V of 0 in X,
(12.15) $K \cap (x_0 + V) \subset nD$.
Since compact sets are bounded, there exists an integer m such that
(12.16) $K - x_0 \subset mV$.

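The convexity of K implies that for any x ∈ K

$$z = \frac{1}{m}x + \Big(1 - \frac{1}{m}\Big)x_0 \in K.$$
From (12.16), $z - x_0 = \frac{1}{m}(x - x_0) \in V$; from (12.15), z ∈ nD. Since $x = mz - (m - 1)x_0$ and
U is balanced, we obtain that
$$\Lambda(x) \in mn\overline{U} - (m - 1)n\overline{U} \subset nm(\overline{U} - \overline{U}) \subset nmW$$
for all x ∈ K and Λ ∈ Γ. This shows that C is bounded in Y .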

The convexity assumption in Theorem 12.9.4 cannot be removed, as the following example
shows.
Example 12.9.5. Consider the sequence $x_n \in \ell_2(\mathbb{C})$ defined as $x_n(m) = \frac{1}{n}1_{\{n\}}(m)$ for
n ≥ 1 and $x_0 \equiv 0$, and set $K := \{x_n : n \ge 0\}$. Let $\Lambda_n$ be the linear map on $\ell_2(\mathbb{C})$ given by $\Lambda_n x = \sum_{k=1}^{n} k^2 x(k)$. As
$x_n \to 0$ in $\ell_2(\mathbb{C})$, we have that K is compact in $\ell_2(\mathbb{C})$. If Γ(x) denotes the orbit of x, then
$\Gamma(x_0) = \{0\}$, $\Gamma(x_1) = \{1\}$ and $\Gamma(x_m) = \{0, m\}$ for all m ≥ 2. Hence Γ(x) is bounded for
each x ∈ K; however, $\{\Lambda_n(x) : x \in K, n \in \mathbb{N}\} = \mathbb{Z}_+$ is not bounded.
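
The orbits in this example can be computed exactly. The following sketch represents each $x_m$ as a sparse dictionary and uses exact fractions; the function names are hypothetical, and only finitely many of the maps $\Lambda_n$ are evaluated.

```python
from fractions import Fraction


def x_seq(m):
    """x_m as a sparse sequence {index: value}; x_0 is the zero sequence."""
    return {m: Fraction(1, m)} if m >= 1 else {}


def Lambda(n, x):
    """Lambda_n x = sum_{k=1}^n k^2 x(k)."""
    return sum((k * k * x.get(k, 0) for k in range(1, n + 1)), Fraction(0))


def orbit(m, n_max):
    """Finite part {Lambda_n x_m : 1 <= n <= n_max} of the orbit of x_m."""
    return {Lambda(n, x_seq(m)) for n in range(1, n_max + 1)}


for m in (0, 1, 2, 5):
    print(m, sorted(orbit(m, 20)))

# Each orbit is bounded, but the bound grows with m, so the union over K is unbounded.
largest = max(max(orbit(m, 50)) for m in range(0, 51))
print(largest)
```

Each individual orbit has at most two points, while the largest value over the first M points of K grows like M.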
Definition 12.9.6. (Bilinear mappings) Suppose X, Y , and Z are vector spaces. A function
B : X × Y → Z is a bilinear map if for each (x, y) ∈ X × Y , the maps B(x, ·) : Y → Z and
B(·, y) : X → Z defined as u ↦ B(x, u) and v ↦ B(v, y) respectively, are linear.
Theorem 12.9.7. Suppose X is an F –space, and Y and Z are topological vector spaces. If
B : X × Y → Z is a bilinear map such that for each (x, y) ∈ X × Y , the maps B(x, ·) and
B(·, y) are continuous, then $B(x_n, y_n) \to B(x_0, y_0)$ in Z as $n \to \infty$ whenever $x_n \to x_0$ in X
and $y_n \to y_0$ in Y .

Proof. For each n ∈ N the map bn (x) = B(x, yn ) is continuous in X. Since y 7→ B(x, y)
is continuous in Y , bn (x) → B(x, y0 ) in Z. Since convergent sequences are bounded, {bn (x) :
n ∈ N} is bounded in Z for each x ∈ X. Corollary 12.9.3 of the Banach–Steinhaus theorem
implies that the maps {bn : n ∈ N} are equicontinuous. Let U and W be open neighborhoods
of 0 ∈ Z such that U + U ⊂ W . There is a neighborhood V of 0 ∈ X such that
bn (V ) ⊂ U, n∈N
There is N ∈ N such that for all n ≥ N
xn − x0 ∈ V
B(x0 , yn ) − B(x0 , y0 ) ∈ U
Hence, for all such n
B(xn , yn ) − B(x0 , y0 ) = bn (xn − x0 ) + B(x0 , yn − y0 ) ∈ bn (V ) + U ⊂ U + U ⊂ W
This means that limn B(xn , yn ) = B(x0 , y0 ) in Z.
Remark 12.9.8. When Y in Theorem 12.9.7 is metrizable, the product X × Y is a metrizable
topological linear space (with sum and scalar product defined as (x, y) + a(x′ , y ′ ) =
(x + ax′ , y + ay ′ )). It follows that the bilinear map B is continuous on X × Y .

12.10. Duality and separation theorems

A linear map from X into the field F is called a linear functional .
Definition 12.10.1. The topological dual space X ∗ of the topological vector space X is
the space of all continuous linear functionals Λ. It is clear that Λ ∈ X ∗ iff Λ is a linear
functional which is continuous at 0.
Lemma 12.10.2. Suppose x∗ is a linear functional and x∗ (x) ≠ 0 for some x ∈ X. The
following statements are equivalent:
(i) x∗ ∈ X ∗ .
(ii) x∗ is bounded in some neighborhood of 0.
(iii) ker(x∗ ) is closed in X.
(iv) ker(x∗ ) is not dense in X, that is, $\overline{\ker(x^*)} \ne X$.

Proof. The equivalence of (i) and (ii) follows from Theorem 12.5.7 when Y = F.

(i) implies (iii) since {0} is closed in F. (iii) implies (iv) since by assumption x∗ is not
identically zero.

(iv) implies (ii): Let $x \in X \setminus \overline{\ker(x^*)}$. Then, for some balanced open neighborhood B of 0

(12.17) $(x + B) \cap \ker(x^*) = \emptyset$.
Since x∗ (B) is a balanced subset of F, either x∗ (B) is bounded and (ii) follows, or x∗ (B) = F.
In the latter case, there is y ∈ B such that x∗ (y) = −x∗ (x). This means that
$(x + B) \cap \ker(x^*) \ne \emptyset$, contradicting (12.17).
Theorem 12.10.3. Suppose x∗ ∈ X ∗ \ {0}. Then ker(x∗ ) is a closed subspace of X of
codimension 1 and there is x0 ∈ X such that X = ker(x∗ ) ⊕ span({x0 }).

Proof. By Lemma 12.10.2 N = ker(x∗ ) is a closed subspace of X. Choose x0 ∈ X such

that x∗ (x0 ) = 1. For any y ∈ X, x∗ (y − x∗ (y)x0 ) = 0; hence y − x∗ (y)x0 ∈ N . This
means that X/N = span(x0 + N ) and so, dim(X/N ) = 1. The last statement follows from
Corollary 12.7.4.

Suppose X is a linear topological space. Let ∅ ≠ A ⊂ X and ∅ ≠ B ⊂ X ∗ . The sets
$$A^{\perp} := \{x^* \in X^* : x^*(x) = 0 \text{ for all } x \in A\}$$
$${}^{\perp}B := \{x \in X : x^*(x) = 0 \text{ for all } x^* \in B\}$$
are called annihilators of A and B respectively. Clearly, for any x∗ ∈ X ∗ , ${}^{\perp}\{x^*\} = \ker(x^*)$;
hence, ${}^{\perp}B = \bigcap_{x^* \in B} \ker(x^*)$ is a closed linear subspace of X.
The Hahn–Banach theorem is an important result in analysis that states that a linear
functional f defined on a subspace of a linear space admits an extension to the whole space
provided that f is dominated by a subadditive, positively homogeneous function.

Theorem 12.10.4. (Hahn–Banach theorem.) Let X be a real vector space, and L ⊂ X a
linear subspace. Suppose that ρ : X → R is a function that satisfies
(12.18) ρ(x + y) ≤ ρ(x) + ρ(y), ρ(a x) = aρ(x)
for all x, y ∈ X and a ≥ 0. If f : L → R is a linear functional such that
(12.19) f (x) ≤ ρ(x), x ∈ L,
then, there exists a linear functional $\tilde{F}$ on X such that $\tilde{F} = f$ on L, and
(12.20) $-\rho(-x) \le \tilde{F}(x) \le \rho(x), \qquad x \in X.$

Proof. Suppose f is a linear functional on L ⊂ X which satisfies (12.19). For any v ∈ X \ L
and x, y ∈ L, we have that f (x) + f (y) = f (x + y) ≤ ρ(x + y) ≤ ρ(x − v) + ρ(v + y); therefore,

(12.21) $A = \sup\{f(x) - \rho(x - v) : x \in L\} \le \inf\{\rho(v + y) - f(y) : y \in L\} = B.$
If A ≤ α ≤ B, the function $\tilde{f}$ on M = {x + a v : x ∈ L, a ∈ R} defined by $\tilde{f}(x + a v) = f(x) + a\alpha$
extends f and, by (12.21),

$$\tilde{f}(x + a v) = |a|\big(f(|a|^{-1}x) + \operatorname{sign}(a)\,\alpha\big) \le |a|\,\rho\big(|a|^{-1}x + \operatorname{sign}(a)v\big) = \rho(x + a v)$$
for all a ≠ 0; for a = 0 the inequality is (12.19). The collection Q of linear functionals $\tilde{f}$, defined on subspaces of X, that extend f and satisfy $\tilde{f} \le \rho$ on their
domain is partially ordered by inclusion. Any totally ordered set D ⊂ Q has the upper bound
$\bigcup\{\tilde{f} : \tilde{f} \in D\} \in Q$. By Zorn’s lemma, Q has a maximal element $\tilde{F}$. It follows that
domain($\tilde{F}$) = X; otherwise, $\tilde{F}$ can be extended to a linear functional $\hat{F}$ defined on a larger
domain on which $\hat{F} \le \rho$ holds, in contradiction to the maximality of $\tilde{F}$. Finally,
$-\tilde{F}(x) = \tilde{F}(-x) \le \rho(-x)$ gives the lower bound in (12.20).
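
The one–step extension in (12.21) can be made concrete in a small case. The sketch below assumes X = R², L = span{(1, 0)}, ρ the ℓ¹ norm, f(t, 0) = t and v = (0, 1); all of these are illustrative choices. The supremum and infimum are taken over a finite grid, which happens to attain the exact values in this example.

```python
def rho(p):
    """Assumed sublinear function: the l^1 norm on R^2."""
    return abs(p[0]) + abs(p[1])


def f(t):
    """Assumed functional on L = span{(1, 0)}: f(t, 0) = t."""
    return t


grid = [k / 2 for k in range(-20, 21)]

# With v = (0, 1): x - v = (t, -1) and v + y = (t, 1) for x = y = (t, 0).
A = max(f(t) - rho((t, -1.0)) for t in grid)
B = min(rho((t, 1.0)) - f(t) for t in grid)


def dominated(alpha):
    """Check on the grid that the extension (t, a) -> f(t) + a*alpha stays below rho."""
    return all(f(t) + alpha * a <= rho((t, a)) for t in grid for a in grid)


print(A, B)
print([dominated(alpha) for alpha in (-1.0, 0.0, 0.5, 1.0)], dominated(1.5))
```

Here every α in [A, B] = [−1, 1] yields an extension dominated by ρ, while a value outside this interval such as α = 1.5 fails at the point (0, 1).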
Example 12.10.5. (Banach–limit) Suppose D is a directed set and let Bb (D) be the
collection of all real bounded functions on D. Let $L = \{x \in B_b(D) : \lim_{d \in D} x(d) \text{ exists}\}$ and
define f : L → R as $f(x) = \lim_d x(d)$. Define
$$\liminf_d x(d) := \sup_d \inf_{d \le n} x(n)$$
$$\limsup_d x(d) := \inf_d \sup_{d \le n} x(n)$$

There is a linear functional Λ : Bb (D) → R such that Λ = f on L and

$$\liminf_d x(d) \le \Lambda(x) \le \limsup_d x(d), \qquad x \in B_b(D).$$

To see that, set $p(x) := \limsup_d x(d)$ and notice that L, f and p satisfy the conditions of
the Hahn–Banach theorem. The last statement follows from the observation that $-p(-x) = \liminf_d x(d)$.
Banach limits can be used to prove the existence of charges that are not
measures (Exercise 12.17.6).
Corollary 12.10.6. Let X be a complex linear space and L ⊂ X be a linear subspace.
Suppose that ρ : X → [0, ∞) satisfies ρ(x + y) ≤ ρ(x) + ρ(y) and ρ(a x) = |a|ρ(x) for all
x, y ∈ X and a ∈ C. If f : L → C is a linear functional such that |f (x)| ≤ ρ(x), then there
is a linear functional F on X such that F = f on L and |F | ≤ ρ on X.

Proof. Let u and v be the real and imaginary parts of f respectively. By considering X as
a linear space over R, it follows that u and v are real linear functionals on L. Consequently,
from
f (ix) = i f (x) = iu(x) − v(x)
f (ix) = u(ix) + iv(ix)
it follows that v(x) = −u(ix) and so, f (x) = u(x) − i u(ix). Since
u(x) ≤ |f (x)| ≤ ρ(x), x ∈ L,
by Theorem 12.10.4, there is a real linear functional U on X such that U = u on L and U (x) ≤
ρ(x). It is easy to check that
F (x) = U (x) − iU (i x)
defines a complex linear functional on X as a complex linear space, and that F = f on L. For any x ∈ X, let
a ∈ C with |a| = 1 be such that |F (x)| = aF (x). Then, since F (a x) is real,
|F (x)| = F (a x) = U (a x) ≤ ρ(a x) = ρ(x)
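
The passage from a real linear functional U to the complex linear functional F(x) = U(x) − iU(ix) can be checked numerically. The sketch below assumes X = C² and a particular real linear U chosen only for illustration.

```python
def U(x):
    """Assumed real-linear functional on C^2 (viewed as R^4)."""
    return 2 * x[0].real - 3 * x[0].imag + x[1].real + 0.5 * x[1].imag


def F(x):
    """F(x) = U(x) - i U(ix), as in the proof."""
    ix = tuple(1j * c for c in x)
    return complex(U(x), -U(ix))


def scale(a, x):
    return tuple(a * c for c in x)


def add(x, y):
    return tuple(p + q for p, q in zip(x, y))


x = (1 + 2j, -0.5 + 1j)
y = (3 - 1j, 2j)
a = 0.3 - 1.7j

print(F(scale(a, x)), a * F(x))        # complex homogeneity
print(F(add(x, y)), F(x) + F(y))       # additivity
print(F(x).real, U(x))                 # the real part of F recovers U
```

For this choice of U one finds F(x) = (2 + 3i)x(1) + (1 − 0.5i)x(2), which is complex linear even though U itself is only real linear.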

A linear functional f on a partially ordered vector space V is a positive linear func-

tional if f (x) ≥ 0 for all x ≥ 0.
Corollary 12.10.7. (Kantorovich) Let V be a partially ordered vector space. If M is a
linear subspace that majorizes V , then every positive linear functional on M extends to a
positive linear functional on V .

Proof. Suppose f is a positive linear functional on a majorizing linear subspace M . Let

ρ : V → R be the function
ρ(x) = inf{f (y) : x ≤ y, y ∈ M }
As M majorizes V , ρ is a well defined real–valued function. Clearly ρ satisfies condi-
tions (12.18) and (12.19) of Hahn–Banach’s theorem 12.10.4. Hence, f admits an extension
to V such that f (x) ≤ ρ(x) for all x ∈ V . If x ≥ 0, then −x ≤ 0 and as 0 ∈ M , we have
that −f (x) = f (−x) ≤ ρ(−x) ≤ f (0) = 0. This shows that f (x) ≥ 0.

The following result is a specialization of the Hahn–Banach theorem to normed spaces.

Theorem 12.10.8. Suppose X is a normed space and L a linear subspace of X. If Λ is a
continuous linear functional on L, then Λ has a continuous extension $\tilde{\Lambda}$ to X that preserves
the original norm, that is, $\tilde{\Lambda}y = \Lambda y$ for all y ∈ L and
$$\|\Lambda\| = \sup_{\{\|y\| = 1 :\, y \in L\}} |\Lambda y| = \sup_{\{\|x\| = 1 :\, x \in X\}} |\tilde{\Lambda}x| = \|\tilde{\Lambda}\|.$$

Proof. The result follows directly from Corollary 12.10.6 to the Hahn–Banach theorem
with ρ(x) := ‖Λ‖‖x‖.

Theorem 12.10.9. Let X be a normed linear space and H a linear subspace. Then $x \in X \setminus \overline{H}$
iff there exists a bounded linear functional f ∈ H⊥ such that ‖f‖ = 1 and f (x) ≠ 0. In
particular, for any x ∈ X with x ≠ 0, there is a bounded linear functional x∗ such that
x∗(x) = ‖x‖ and ‖x∗‖ = 1.

Proof. If $x \notin \overline{H}$ then d := d(x, H) = inf{‖x − h‖ : h ∈ H} > 0. The set W = {λx + h : λ ∈ F, h ∈ H} is a linear subspace of X. The non–zero linear functional f : λx + h ↦ λd on W
vanishes on H. Since f (x + h) = d ≤ ‖x + h‖ for all h ∈ H,
\[ \sup_{\lambda x + h \neq 0} \frac{|f(\lambda x + h)|}{\|\lambda x + h\|} = \sup_{h \in H} \frac{|f(x + h)|}{\|x + h\|} \leq 1, \]
and so f ∈ W∗. By taking hn ∈ H so that ‖x − hn‖ → d we get that limn f (x − hn)/‖x − hn‖ = 1 = ‖f‖. By the Hahn–Banach theorem, f can be extended to a bounded
linear functional F on X such that ‖F‖ = ‖f‖ = 1.

Conversely, if F is a bounded linear functional such that F (H) = {0} and F (x) ≠ 0 then,
since F⁻¹({0}) is closed and $\overline{H} \subset F^{-1}(\{0\})$, it follows that $x \notin \overline{H}$.

The last statement follows from the first part by taking H = {0}.
Corollary 12.10.10. Let X be a normed linear space and denote by X∗ the space of all
continuous linear functionals on X. Then, for any x ∈ X,
\[ \|x\| = \max_{x^* \in X^*,\ \|x^*\| = 1} |x^*(x)|. \tag{12.22} \]

Proof. Since |x∗(x)| ≤ ‖x∗‖‖x‖, the right hand side of (12.22) is at most ‖x‖. For x = 0
there is nothing to prove; if x ≠ 0, then (12.22) follows by taking a linear functional x∗ such
that x∗(x) = ‖x‖ and ‖x∗‖ = 1, which exists by Theorem 12.10.9.
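For a concrete instance of the norming functional, the sketch below takes X to be R^n with the Euclidean norm; this choice of space, and the functional written out in it, are assumptions made here for illustration only.

```latex
% Illustrative assumption: X = \mathbb{R}^n with the Euclidean norm and inner product <.,.>.
% For x \neq 0 define the functional
\[
x^*(y) := \frac{\langle y, x \rangle}{\|x\|}, \qquad y \in \mathbb{R}^n .
\]
% Cauchy--Schwarz gives |x^*(y)| \leq \|y\|, so \|x^*\| \leq 1, while
\[
x^*(x) = \frac{\langle x, x \rangle}{\|x\|} = \|x\| ,
\]
% so \|x^*\| = 1 and the maximum in (12.22) is attained at x^*.
```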
Remark 12.10.11. If (X, ‖ ‖X) is a Banach space, then X∗ is a Banach space under the
norm ‖x∗‖ := sup{|x∗(x)| : ‖x‖X ≤ 1}. By the same token, X∗∗ = (X∗)∗ is a Banach space
under the corresponding sup norm. X∗∗ contains a copy of X, namely the functionals of
the form fx : x∗ ↦ x∗(x). Corollary 12.10.10 implies that the map F : X → X∗∗ given by
x ↦ fx is an injective isometry. If F is also onto, then X is said to be a reflexive space.
Example 12.10.12. Suppose X is a Banach space and M ⊂ X is a closed linear subspace.
It is easy to check that M⊥ is closed in X∗ under the norm topology. Theorem 12.10.8 implies that any linear functional m∗ ∈ M∗ has an extension $x^*_{m^*} \in X^*$ with
$\|m^*\| = \|x^*_{m^*}\|$. If $x^*_2 \in X^*$ is another extension of m∗ to X, then $x^*_{m^*} - x^*_2 \in M^{\perp}$ and
$\|m^*\| \leq \|x^*_2\|$. Hence, σ : M∗ → X∗/M⊥ given by $m^* \mapsto x^*_{m^*} + M^{\perp}$ is a well defined linear
map. Moreover, from
\[ \|m^*\| \leq \|x^*_{m^*} + M^{\perp}\|_{\tau_q} := \inf_{y^* \in M^{\perp}} \|x^*_{m^*} + y^*\| \leq \|x^*_{m^*}\| = \|m^*\| \]
it follows that σ is an isometry between M∗ and X∗/M⊥.

Lemma 12.10.13. Any non–zero linear functional Λ on a topological vector space is open.

Proof. It suffices to show that Λ(B) is an open neighborhood of 0 ∈ F for any balanced
open neighborhood B of 0 ∈ X. If x ∈ B is such that r_x = |Λ(x)| ≠ 0, then, as B is balanced, Λ(B) ⊃ D_{r_x},
where D_s = {z ∈ F : |z| ≤ s}. Then, there are two alternatives: either Λ(B) = F
or r := sup_{x∈B} |Λ(x)| < ∞. In the former alternative Λ is unbounded and there is nothing else to prove;
in the latter, Λ ∈ X∗ and we claim that Λ(B) = {z ∈ F : |z| < r}. It is clear that
{z ∈ F : |z| < r} ⊂ Λ(B) ⊂ D_r. Suppose there is x ∈ B such that |Λ(x)| = r. The continuity of the scalar
product α ↦ αx implies that there is t > 1 such that tx ∈ B. But then |Λ(tx)| = tr > r, which
is a contradiction.

The core of a set A, denoted by core(A), is the set of points x ∈ A such that A − x is
absorbent. Points in core(A) are called core points of A. Theorem 12.1.15(a) shows that
A° ⊂ core(A).
Lemma 12.10.14. If A is a nonempty subset of a topological vector space X and a nonzero
linear functional f on X satisfies Re(f )(x) ≥ a for all x ∈ A, then Re(f )(x) > a for all
x ∈ A°.

Proof. Assume that x0 + V ⊂ A, where V is a balanced neighborhood of zero. It suffices
to consider the case where X is a real topological vector space. If f (x0) = a, then for each
v ∈ V we have a ± f (v) = f (x0 ± v) ≥ a. Consequently, ±f (v) ≥ 0, which means that
f (v) = 0 on V . Since V is absorbent, we have that f (y) = 0 for all y ∈ X, which is a
contradiction. Hence f (x) > a holds for all x ∈ A°.
Theorem 12.10.15. Suppose that A and B are disjoint non–empty convex sets in a topo-
logical vector space X.
(i) If A has at least one core point, then there exists a linear functional Λ on X and
a real number s such that
Re(Λ(x)) ≤ s ≤ Re(Λ(y))
for every x ∈ A and y ∈ B. If A° ≠ ∅, then Λ can be chosen to be continuous.
(ii) If A is an open set, then Λ and s in (i) can be chosen so that Λ is continuous and
Re(Λ(x)) < s ≤ Re(Λ(y))
for every x ∈ A and y ∈ B.
(iii) If X is locally convex, A is compact and B closed, then there is a continuous linear
functional Λ and real numbers t < s such that
Re(Λ(x)) ≤ t < s ≤ Re(Λ(y))
for every x ∈ A and y ∈ B.

Proof. It is enough to consider the case where the scalar field F = R since, once the real
case is proved with a real linear functional Λ1, the unique complex linear functional Λ whose real
part is given by Λ1 gives the stated separation.

(i) Suppose a0 ∈ A is a core point and let b0 ∈ B be arbitrary. Set x0 = a0 − b0. As A − a0 ⊂ A − B − x0, we have
that x0 is a core point of the convex set A − B. It follows that C = A − B − x0 is
convex and absorbent and, since A ∩ B = ∅, −x0 ∉ C. Let ρ be the Minkowski functional
of C. Theorems 12.3.2 and 12.3.3 imply that ρ is subadditive and positive homogeneous,
and that ρ(−x0) ≥ 1. Let f (tx0) = −t for all t ∈ R. If t > 0 then f (tx0) = −t < 0 ≤ ρ(tx0),
and if t ≤ 0 then f (tx0) = −t ≤ −tρ(−x0) = ρ(tx0). By the Hahn–Banach theorem, there
exists an extension Λ of f to all X such that Λ ≤ ρ. For any x ∈ A and y ∈ B we have
x − y − x0 ∈ C; hence,
Λ(x − y − x0) = Λ(x) − Λ(y) + 1 ≤ ρ(x − y − x0) ≤ 1.
Consequently, Λ(x) ≤ Λ(y) for all x ∈ A and y ∈ B. Since Λ(A) and Λ(B) are
convex sets in the real line, they are intervals, and Λ(A) lies to the left of
Λ(B). (i) follows by letting s be the right–endpoint of Λ(A), that is, s = sup Λ(A).

(ii) If a0 is an interior point of A, then A° − B − x0 is an open neighborhood of 0 contained
in C. Since Λ ≤ 1 on C, then Λ ≥ −1 on −C and thus, |Λ| ≤ 1 in the neighborhood
C ∩ (−C) of 0. Consequently Λ is continuous.

If A is open in X then, as linear functionals are open, Λ(A) is open in F. (ii) follows from
(i) by taking s as the right–endpoint of the open interval Λ(A).

(iii) If X is locally convex, A convex and compact and B convex and closed, then by
Theorem 12.1.8 there is a convex neighborhood V of 0 such that (A + V ) ∩ (B + V ) = ∅. From
(ii), there is a continuous linear functional Λ and a real number s such that Λ(x) < s ≤ Λ(y)
for all x ∈ A + V and y ∈ B + V . Since A is compact and Λ is continuous, the latter attains
its maximum value on A at some point x0 ∈ A. (iii) follows by setting t = Λ(x0).
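As a concrete illustration of parts (i) and (ii), the following sketch works in X = R² with the Euclidean topology. The sets A and B and the functional Λ below are chosen here for illustration and do not come from the text.

```latex
% Illustrative assumption: X = \mathbb{R}^2, A the open unit disc, B a closed half-plane.
\[
A = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 < 1\}, \qquad
B = \{y \in \mathbb{R}^2 : y_1 \geq 1\}, \qquad
\Lambda(x) = x_1 .
\]
% A and B are disjoint, convex and nonempty, and A is open. For x in A and y in B,
\[
\Lambda(x) = x_1 \leq \sqrt{x_1^2 + x_2^2} < 1 \leq y_1 = \Lambda(y),
\]
% so (ii) holds with s = 1 = \sup \Lambda(A), the right endpoint of the open interval \Lambda(A) = (-1, 1).
```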
Corollary 12.10.16. If X is locally convex, then X ∗ separates points.

Proof. If x, y ∈ X and x ≠ y then {x} and {y} are disjoint compact, and hence closed,
sets in X. The conclusion follows from Theorem 12.10.15(iii).
Corollary 12.10.17. Suppose X is locally convex, B closed balanced and convex and x0 ∉ B.
Then, there exists Λ ∈ X∗ such that |Λ| ≤ 1 on B and Λ(x0) > 1.

Proof. By Theorem 12.10.15 there are Λ1 ∈ X∗ and real numbers t < s such that Λ1(x0) ∈
(−∞, t) × R and Λ1(B) ⊂ (s, ∞) × R. Since B is balanced, it follows that s < 0 and that
$K = \overline{\Lambda_1(B)} \subset \mathbb{C}$ is a bounded closed ball around 0. If Λ1(x0) = R e^{iθ} with R > 0, then there is
0 < r < R such that |Λ1| ≤ r on B. The function Λ = r^{−1} e^{−iθ} Λ1 satisfies the desired
properties.

The following result extends Theorem 12.10.9 to the setting of locally convex spaces.
Theorem 12.10.18. Suppose X is a locally convex topological vector space and Y a linear
subspace of X. If $x \in X \setminus \overline{Y}$, then there exists x∗ ∈ Y⊥ such that x∗(x) = 1.

Proof. Theorem 12.10.15(iii) with A = {x} and $B = \overline{Y}$ implies that there exist Λ ∈ X∗
and a constant s ∈ R such that Re(Λ(y)) < s < Re(Λ(x)) for all y ∈ Y . As Y is a vector
space, it follows that Λ(Y ) = {0} and Λ(x) ≠ 0. The functional $x^* = \frac{1}{\Lambda(x)} \Lambda$ satisfies the
conditions of the theorem.
Corollary 12.10.19. Suppose X is locally convex and M ⊂ X is a linear subspace. If
f ∈ M ∗ , then there is Λ ∈ X ∗ such that Λ = f on M .

Proof. It is enough to consider a non–zero continuous functional f ∈ M∗. The set N =
{x ∈ M : f (x) = 0} is a closed linear subspace of M and N ≠ M . Hence, $N = \overline{N} \cap M$ and
so, $M \setminus \overline{N} = M \setminus N \neq \emptyset$. Fix $x_0 \in M \setminus \overline{N}$. By Theorem 12.10.18, there exists Λ ∈ N⊥ such
that Λ(x0) = 1. Set Λ′ = f (x0)Λ so that Λ′ = 0 on N and Λ′(x0) = f (x0). For each
x ∈ M
\[ f\Bigl(x - \frac{f(x)}{f(x_0)}\, x_0\Bigr) = 0. \]
Thus, $x - \frac{f(x)}{f(x_0)} x_0 \in N$ and
\[ 0 = \Lambda'\Bigl(x - \frac{f(x)}{f(x_0)}\, x_0\Bigr) = \Lambda'(x) - f(x). \]
The linear functional Λ′ satisfies the desired properties.
Corollary 12.10.20. Suppose X is a locally convex space. If F is a finite dimensional
subspace of X, then there is a closed linear subspace M of X such that X = F ⊕ M .

Proof. Since n := dim(F ) < ∞, F is a closed subspace of X. Let {e1 , . . . , en } be a basis

for F . Every x ∈ F can be uniquely expressed as
x = φ1 (x)e1 + . . . + φn (x)en
where each φk ∈ F∗ by Theorem 12.7.1[(i)]. By Corollary 12.10.19, each φk can be extended as a continuous functional in X∗, which we will denote by φk as well. Clearly
$M = \bigcap_{k=1}^{n} \phi_k^{-1}(\{0\})$ is a closed linear subspace of X, M ∩ F = {0}, and X = M + F .
Remark 12.10.21. Corollaries 12.7.4 and 12.10.20 state that in locally convex spaces,
any linear subspace of finite dimension or of finite codimension has a complementary closed
linear subspace in X.

For any convex closed set B in a topological vector space, let PB be the collection of
half spaces PΛ,c = {x : Re(Λ(x)) ≤ c}, Λ ∈ X∗ and c ∈ R, that contain B. The next result
states that closed convex sets in a locally convex topological vector space are completely described
by the dual X∗.
Theorem 12.10.22. Let (X, τ ) be a locally convex topological vector space with dual X ∗ .
If B ⊂ X is a closed convex set, then B = ∩PB . Consequently, all locally convex topologies
on X with a common dual X ∗ have the same closed convex subsets.

Proof. Clearly B ⊂ ∩PB. If B = X, then X = P0,0 and there is nothing to prove. Suppose B ≠ X and let x ∈ X \ B.
By Theorem 12.10.15, there is Λ ∈ X∗ and a number t such that B ⊂ PΛ,t and t < Re(Λ)(x).
Thus x ∉ ∩PB. Hence, B ⊂ ∩PB ⊂ B.

The last statement follows from the fact that half spaces are defined only in terms of X ∗ .

12.11. Weak topology

Suppose X is a linear space and X′ is a vector space of linear functionals on X. The weak
topology σ(X, X′) is the minimal topology on X that makes every f ∈ X′ continuous.
The collection of sets
(12.23) {x ∈ X : |Λj(x)| < εj , j = 1, . . . , n},
where Λj ∈ X′ and εj > 0 for j = 1, . . . , n, n ∈ N, forms a local base at 0 for this topology. For
any linear functional f on X, let Nf = {x ∈ X : f (x) = 0}.
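To see how much coarser a weak topology can be than a norm topology, the following sketch looks at the standard unit vectors of ℓ². The choice X = ℓ² (real) with X′ the functionals induced by elements of ℓ² is an assumption made here for illustration.

```latex
% Illustrative assumption: X = \ell^2 (real), X' = \{\Lambda_a : a \in \ell^2\}, \Lambda_a(x) = \sum_k a_k x_k.
% Let e_n be the sequence with 1 in position n and 0 elsewhere.
\[
\Lambda_a(e_n) = a_n \xrightarrow[n \to \infty]{} 0 \quad \text{for every } a \in \ell^2,
\qquad \text{while} \qquad \|e_n\|_2 = 1 \text{ for all } n .
\]
% Hence every basic neighborhood of the form (12.23),
\[
\{x \in \ell^2 : |\Lambda_{a^{(j)}}(x)| < \varepsilon_j,\ j = 1, \dots, m\},
\]
% contains e_n for all sufficiently large n, whereas the norm ball \{x : \|x\|_2 < 1/2\} contains no e_n.
```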
Theorem 12.11.1. Suppose (X, τ ) is a locally convex topological vector space and let X ∗ be
its dual space. If Y ⊂ X is a linear subspace of X, then σ(Y, Y ∗ ) = {Y ∩U : U ∈ σ(X, X ∗ )}.
Remark 12.11.2. This result says that the weak topology σ(Y, Y ∗ ) is the same as the
topology on Y inherited as a subspace of (X, σ(X, X ∗ )).

Proof. By Corollary 12.10.19 every linear functional f ∈ Y∗ can be extended to a linear
functional F ∈ X∗. Therefore, the collection $X^*_Y$ of all restrictions to Y of linear functionals
in X∗ is the same as Y∗.
Remark 12.11.3. If (X, τ ) is a topological vector space with dual X ′ , then clearly σ(X, X ′ ) ⊂
τ and X ′ ⊂ (X, σ(X, X ′ ))∗ . We will see in Theorem 12.11.5 that in fact X ′ = (X, σ(X, X ′ ))∗ .
Lemma 12.11.4. Let f, f1, . . . , fn be linear functionals on a vector space X (no topological
assumptions needed). Then $\bigcap_{j=1}^{n} N_{f_j} \subset N_f$ iff f ∈ span{f1, . . . , fn}.

Proof. Clearly, $f = \sum_{j=1}^{n} \lambda_j f_j$, where λ1, . . . , λn ∈ F, implies $\bigcap_{j=1}^{n} N_{f_j} \subset N_f$.

Conversely, assume $\bigcap_{j=1}^{n} N_{f_j} \subset N_f$. Let T : X → F^n be the linear transformation given by
T (x) = (f1(x), . . . , fn(x)). Since T (x) = 0 iff $x \in \bigcap_{j=1}^{n} N_{f_j}$ and $\bigcap_{j=1}^{n} N_{f_j} \subset N_f$, the map
g : T (X) → F given by g(T (x)) = f (x) is a well defined linear functional on T (X). Since
T (X) is a subspace of F^n, there is an n–tuple (λ1, . . . , λn) ∈ F^n such that
\[ G : v = [v_1, \ldots, v_n] \mapsto \sum_{j=1}^{n} \lambda_j v_j \]
extends g to all F^n. Therefore, $f = \sum_{j=1}^{n} \lambda_j f_j$.
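A small finite dimensional case may help fix the statement of Lemma 12.11.4. The functionals below on X = R³ are chosen here for illustration only.

```latex
% Illustrative assumption: X = \mathbb{R}^3, f_1(x) = x_1, f_2(x) = x_2, f(x) = 2x_1 - 3x_2, h(x) = x_3.
\[
N_{f_1} \cap N_{f_2} = \{(0, 0, t) : t \in \mathbb{R}\} \subset N_f,
\qquad f = 2 f_1 - 3 f_2 \in \operatorname{span}\{f_1, f_2\}.
\]
% In the notation of the proof, T(x) = (x_1, x_2), g(v_1, v_2) = 2v_1 - 3v_2 and (\lambda_1, \lambda_2) = (2, -3).
% By contrast, h does not vanish on N_{f_1} \cap N_{f_2}:
\[
h(0, 0, 1) = 1 \neq 0, \qquad \text{so } h \notin \operatorname{span}\{f_1, f_2\}.
\]
```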
Theorem 12.11.5. Let X be a vector space and X ′ a linear space of linear functionals
on X. Then, (X, σ(X, X ′ ))∗ = X ′ . If X ′ separates points of X, then (X, σ(X, X ′ )) is a
Hausdorff locally convex topological vector space.

Proof. If Λ ∈ (X, σ(X, X′))∗, then there exists a weak neighborhood of 0 of the form
V = {x : |Λj(x)| < ε, j = 1, . . . , n}, Λ1, . . . , Λn ∈ X′, such that x ∈ V implies |Λ(x)| < 1.
If $x \in \bigcap_{j=1}^{n} N_{\Lambda_j}$, then ax ∈ V and so |aΛ(x)| < 1 for all a ∈ F; therefore, Λ(x) = 0. It follows from
Lemma 12.11.4 that $\Lambda = \sum_{j=1}^{n} \lambda_j \Lambda_j$ for some scalars λ1, . . . , λn and thus, Λ ∈ X′.

To prove the last statement, notice that the topology σ(X, X ′ ) is generated by the separating
family of seminorms ρΛ (x) := |Λ(x)|, Λ ∈ X ′ . By Theorem 12.3.5, (X, σ(X, X ′ )) is a
Hausdorff locally convex topological linear space.

Suppose X is a vector space and X′ is a vector space of linear functionals on X. If X
separates points of X′ and X′ separates points of X we say that ⟨X, X′⟩ is a dual pair.
Example 12.11.6. Let (E, m) be a positive σ–continuous elementary integral and let ‖ ‖∗
be its Daniell mean. Assume that either 1 < p < ∞, or p = 1 and ‖ ‖∗ is σ–finite, and let q be the conjugate exponent of p. The
Riesz representation theorem shows that $L^q(\| \|^*, \mathbb{C})$ and $(L^p(\| \|^*, \mathbb{C}))^*$ are isometric. The
map g ↦ Λg, where $\Lambda_g(f) = \int f g \, dm$, provides a conjugate isometry between these spaces.
⟨L^p, L^q⟩ is a classical example of a dual pair. For 1 < p < ∞, the spaces L^p are examples
of reflexive spaces.
Theorem 12.11.7. Suppose X is a vector space and X′ is a vector space of linear functionals on X. If the weak topology σ(X, X′) is locally convex, then X′ separates points of
X. Conversely, if X′ separates points, then σ(X, X′) is the weakest locally convex topology
τ on X for which (X, τ)∗ = X′.

Proof. The first statement is a direct consequence of Corollary 12.10.16.

Conversely, if X ′ separates points of X, then by Theorem 12.11.5 σ(X, X ′ ) is a locally
convex Hausdorff linear topology on X and (X, σ(X, X ′ ))∗ = X ′ .

If X∗ is the dual space of a topological vector space X, then each x ∈ X defines a
linear functional fx on X∗ by letting fx(x∗) = x∗(x) for any x∗ ∈ X∗. The weak topology
σ(X∗, X) is called the weak∗–topology on X∗.
Example 12.11.8. Example 8.3.16 shows that C([−1, 1]) is not dense in (L∞([−1, 1]), ‖ ‖∞).
In the weak∗ topology σ(L∞, L1) on L∞([−1, 1]), C([−1, 1]) is dense. Indeed, suppose
f ∈ L∞([−1, 1]). Then, by Littlewood's principle (Theorem 7.1.1), for any n ∈ N there
is a set An and a continuous function φn such that $\lambda_1(A_n^c) < 2^{-n}$ and f = φn on An.
The sequence $f_n = (-\|f\|_\infty) \vee (\phi_n \wedge \|f\|_\infty)$ is uniformly bounded and fn → f pointwise
a.s. Thus, by dominated convergence, $\lim_n \int_{[-1,1]} g(x) f_n(x)\, dx = \int_{[-1,1]} g(x) f(x)\, dx$ for any
g ∈ L1([−1, 1]). This shows that fn → f in σ(L∞, L1).
Remark 12.11.9. Since x∗ (x) = y ∗ (x) for all x ∈ X implies x∗ = y ∗ , it follows that X
(or rather {fx : x ∈ X}) separates points of X ∗ . Thus, the weak∗ topology is the weakest
locally convex topology on X ∗ for which (X ∗ , σ(X ∗ , X))∗ = X. It is clear that M ⊥ is
closed in (X ∗ , σ(X ∗ , X)) for any nonempty M ⊂ X. Indeed, if {yn∗ : n ∈ D} is a net in
M ⊥ converging to y ∗ in σ(X ∗ , X) then, 0 = limn yn∗ (x) = y ∗ (x) for any x ∈ M , that is,
y∗ ∈ M ⊥.

Theorem 12.11.10. Suppose (X, τ ) is a locally convex topological vector space, and let X∗
be its dual space equipped with the weak∗ topology. Let M and N be linear subspaces of X
and X∗ respectively. Then
\[ {}^{\perp}(M^{\perp}) = \overline{M}^{\tau}, \qquad ({}^{\perp}N)^{\perp} = \overline{N}^{w^*}, \]
where $\overline{M}^{\tau}$ and $\overline{N}^{w^*}$ denote the closures of M and N in (X, τ ) and (X∗, σ(X∗, X))
respectively.

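Proof. If x ∈ M then x∗(x) = 0 for all x∗ ∈ M⊥; hence, $x \in {}^{\perp}(M^{\perp})$. Since ${}^{\perp}(M^{\perp})$ is closed
in (X, τ ), it follows that $\overline{M}^{\tau} \subset {}^{\perp}(M^{\perp})$. Conversely, if $x \notin \overline{M}^{\tau}$ then, by Theorem 12.10.18,
there is x∗ ∈ M⊥ such that x∗(x) ≠ 0 and so, $x \notin {}^{\perp}(M^{\perp})$. This shows that $X \setminus \overline{M}^{\tau} \subset X \setminus {}^{\perp}(M^{\perp})$.

If x∗ ∈ N then x∗(x) = 0 for all $x \in {}^{\perp}N$; hence, $x^* \in ({}^{\perp}N)^{\perp}$. Since $({}^{\perp}N)^{\perp}$ is closed
in (X∗, σ(X∗, X)), it follows that $\overline{N}^{w^*} \subset ({}^{\perp}N)^{\perp}$. Conversely, as (X∗, σ(X∗, X)) is a locally convex space whose dual is X, if $x^* \notin \overline{N}^{w^*}$ then, by Theorem 12.10.18 applied to
(X∗, σ(X∗, X)), there is $x \in {}^{\perp}N$ such that x∗(x) ≠ 0 and so, $x^* \notin ({}^{\perp}N)^{\perp}$. This shows that
$X^* \setminus \overline{N}^{w^*} \subset X^* \setminus ({}^{\perp}N)^{\perp}$.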

The following result is a weaker version of Theorem 12.10.15(iii) which does not require
local convexity.
Theorem 12.11.11. Suppose X is a topological vector space whose dual X∗ separates points.
If A and B are non–empty disjoint compact convex subsets of X, then there exists Λ ∈ X∗
such that
\[ \sup_{x \in A} \mathrm{Re}(\Lambda x) < \inf_{y \in B} \mathrm{Re}(\Lambda y). \tag{12.24} \]

Proof. By Theorem 12.11.5, the weak topology σ(X, X∗) is a Hausdorff locally convex
topology on X. Since σ(X, X∗) ⊂ τ , A and B are nonempty disjoint convex σ(X, X∗)–compact
subsets of X. Therefore, by Theorem 12.10.15(iii) applied to (X, σ(X, X∗)), there is Λ ∈ (X, σ(X, X∗))∗ = X∗ for which (12.24) holds.
Corollary 12.11.12. Suppose X1′ and X2′ are linear spaces of linear functionals on X which
separate points. X1′ ⊂ X2′ iff σ(X, X1′ ) ⊂ σ(X, X2′ ).

Proof. If X1′ ⊂ X2′ then clearly σ(X, X1′ ) ⊂ σ(X, X2′ ) by definition of weak topology.

Conversely, if σ(X, X1′) ⊂ σ(X, X2′) then C((X, σ(X, X1′)); F) ⊂ C((X, σ(X, X2′)); F) and
thus, X1′ = (X, σ(X, X1′))∗ ⊂ (X, σ(X, X2′))∗ = X2′ by Theorem 12.11.5.
Theorem 12.11.13. Assume (X, τ ) is a locally convex topological vector space with dual
X′. For any non–empty convex set E ⊂ X, the closure $\overline{E}$ of E in τ and the closure $\overline{E}^{w}$ of
E in σ(X, X′) coincide.

Proof. As $\overline{E}^{w}$ is weakly closed, it is originally closed; hence, $\overline{E} \subset \overline{E}^{w}$. Conversely,
by the separation theorem (Theorem 12.10.15(iii)), for any $x \notin \overline{E}$ there exist Λ ∈ X′ and s ∈ R
such that Re(Λ(x)) < s < Re(Λ(y)) for all $y \in \overline{E}$. Thus, V = {z ∈ X : Re(Λ(z)) < s} is
a weak–neighborhood of x that does not contain points in $\overline{E}$. It follows that $\overline{E}$ is weakly
closed; therefore, $\overline{E}^{w} \subset \overline{E}$.

12.12. Some compactness theorems in linear spaces

In this section we present three useful compactness results in functional analysis, namely the
Krein–Milman theorem, the Banach–Alaoglu theorem, and the Eberlein–Smulian theorem.
The first theorem states that compact convex sets in suitable topological vector spaces are
the closed convex hull of special points (extreme points). The Banach–Alaoglu theorem establishes
weak∗ compactness of any set of linear functionals that are uniformly bounded on a neighborhood of zero.
The Eberlein–Smulian theorem makes a connection between weak compactness and weak
sequential compactness similar to what happens in metric spaces.
Definition 12.12.1. Let K be a nonempty set in a topological vector space. A non empty
set S ⊂ K is an extreme set of K if for any x, y ∈ K and 0 < λ < 1,
λx + (1 − λ)y ∈ S
implies that x, y ∈ S. A point x ∈ K is an extreme point of K if {x} is an extreme set.
The set of all extreme points of a nonempty set S will be denoted by E(S).
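The definition can be tested on the unit square in R². The set K and the subsets named below are assumptions introduced here for illustration.

```latex
% Illustrative assumption: K = [0,1]^2 \subset \mathbb{R}^2 and the edge S = \{0\} \times [0,1].
% S is an extreme set of K: if x, y \in K, 0 < \lambda < 1 and \lambda x + (1 - \lambda) y \in S, then
\[
\lambda x_1 + (1 - \lambda) y_1 = 0 \ \text{ with } x_1, y_1 \geq 0
\quad \Longrightarrow \quad x_1 = y_1 = 0, \ \text{ that is, } x, y \in S .
\]
% The midpoint of the edge is not an extreme point of K, since
\[
(0, \tfrac12) = \tfrac12 (0, 0) + \tfrac12 (0, 1),
\]
% whereas the same argument applied to both coordinates shows that the corner (0,0) is one. Altogether
\[
E(K) = \{(0,0),\ (1,0),\ (0,1),\ (1,1)\}, \qquad K = \operatorname{co}\bigl(E(K)\bigr).
\]
```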
Lemma 12.12.2. Suppose X is a topological vector space whose dual space X∗ separates points. If K
is a nonempty compact subset of X then E(K) ≠ ∅. Moreover, if S is a compact extreme
set of K then, for any Λ ∈ X∗,

(i) the set $S_\Lambda := \{x \in S : \mathrm{Re}(\Lambda x) = \sup_{y \in S} \mathrm{Re}(\Lambda y)\}$ is also a compact extreme set of
K,
(ii) and S ∩ E(K) ≠ ∅.

Proof. Let P be the collection of all nonempty compact extreme sets of K. This is a
nonempty collection since K ∈ P. It is clear that if $\emptyset \neq \mathcal{C} \subset P$ then, either $\bigcap \mathcal{C} = \emptyset$ or
$\bigcap \mathcal{C} \in P$.

(i) Fix S ∈ P and Λ ∈ X∗ and let $\mu := \max_{z \in S} \mathrm{Re}(\Lambda z)$. Clearly SΛ ≠ ∅ and SΛ is compact. Suppose that
for some 0 < λ < 1 and points x, y ∈ K, z := λx + (1 − λ)y ∈ SΛ. Since z ∈ S, we have
x, y ∈ S and
µ = Re(Λ(λx + (1 − λ)y)) = λ Re(Λx) + (1 − λ) Re(Λy).
Since Re(Λx) ≤ µ and Re(Λy) ≤ µ, this implies that Re(Λx) = µ = Re(Λy), that is, x, y ∈ SΛ. Therefore SΛ ∈ P.

(ii) We now prove that E(K) is not empty. Fix any S ∈ P and let P(S) be the collection of all
sets in P that are contained in S. By definition S ∈ P(S) and so P(S) is not empty. Order
P(S) by inclusion. By Hausdorff's maximal principle there is a maximal chain $\mathcal{C} \subset P(S)$.
Since $\mathcal{C}$ is a chain of nonempty compact sets, it satisfies the finite intersection property, and we have that $\emptyset \neq C := \bigcap \mathcal{C} \in P(S)$. The
maximality of $\mathcal{C}$ implies that C = CΛ for any Λ ∈ X∗, and so each Λ ∈ X∗ is constant on C.
Since X∗ separates points, C has exactly one point. This shows that S ∩ E(K) ≠ ∅ for all S ∈ P.

Theorem 12.12.3. (Krein–Milman) Suppose X is a topological vector space whose dual
X∗ separates points. If K is a nonempty compact convex set in X, then $K = \overline{\mathrm{co}}(E(K))$.

Proof. By Lemma 12.12.2, ∅ ≠ E(K) ⊂ K. The assumption on K implies that $\overline{\mathrm{co}}(E(K)) \subset K$.
In particular, the closed convex hull of E(K) is compact. To prove the opposite
inclusion, suppose that there is $x_0 \in K \setminus \overline{\mathrm{co}}(E(K))$. By Theorem 12.11.11 there exists
Λ ∈ X∗ such that
\[ \mathrm{Re}(\Lambda x) < \mathrm{Re}(\Lambda x_0), \qquad x \in \overline{\mathrm{co}}(E(K)). \]
This means that $\overline{\mathrm{co}}(E(K)) \cap K_\Lambda = \emptyset$, which contradicts Lemma 12.12.2(ii). Therefore
$K \subset \overline{\mathrm{co}}(E(K))$.

Remark 12.12.4. If X is a locally convex linear space then, by Corollary 12.10.16, the
Krein–Milman theorem holds since X∗ separates points. Moreover, if K is a nonempty
compact (not necessarily convex) subset of X then, we can apply Theorem 12.10.15[(iii)]
in place of Theorem 12.11.11 in the proof of the Krein–Milman theorem to conclude that
$K \subset \overline{\mathrm{co}}(E(K))$.

Theorem 12.12.5. (Banach–Alaoglu) Suppose B ⊂ X is an open neighborhood of 0 in the

topological vector space X with dual X ∗ . Then,
(12.25) K = {Λ ∈ X ∗ : |Λ(x)| ≤ 1, for every x ∈ B}
is weak∗ –compact.

Remark 12.12.6. The set K in (12.25) is called the polar of B. The Banach–Alaoglu theorem
states that the polar of any open neighborhood of 0 ∈ X is σ(X∗, X)–compact
in X∗.

Proof. Since B is an open neighborhood of 0 ∈ X, it is absorbent and for every x ∈ X
there exists s(x) ∈ R+ such that x ∈ s(x)B. Consequently,
|Λ(x)| ≤ s(x) for all x ∈ X, Λ ∈ K.
Each closed ball Dx = {z ∈ C : |z| ≤ s(x)} is compact; hence, by Tihonov's theorem,
$P = \prod_{x \in X} D_x$ is compact. Observe that P is the space of all the functions f : X → C such that
|f (x)| ≤ s(x), x ∈ X.
Clearly, K ⊂ P ∩ X∗. Let τ1 be the weak∗–topology on K as a subspace of X∗ and let τ2 be
the topology of K as a subspace of P . If the following hold
(a) τ1 = τ2 and
(b) K is a closed subspace of P ,

then we conclude that K is compact. Indeed, (b) implies that K is τ2 –compact and (a)
implies that K is τ1 –compact.

(a): Let Λ0 ∈ K. For each n ∈ N, xi ∈ X and δi > 0, 1 ≤ i ≤ n, the sets

W1 = {Λ ∈ X ∗ : |Λ(xi ) − Λ0 (xi )| < δi , 1 ≤ i ≤ n}
W2 = {f ∈ P : |f (xi ) − Λ0 (xi )| < δi , 1 ≤ i ≤ n}
are basic open sets in X∗ with the weak∗–topology and in P with the product topology, respectively.
Since K ⊂ P ∩ X ∗ , we have that W1 ∩ K = W2 ∩ K and (a) follows.

(b): Let f0 be an element in the closure of K in P . We will show that f0 ∈ X ∗ and that
|f0 (x)| ≤ 1 for all x ∈ B. Given x ∈ X and ε > 0, let
W (x; ε) = {f ∈ P : |f (x) − f0 (x)| < ε}
For any x, y ∈ X, a, b ∈ C, W (x; ε) ∩ W (y; ε) ∩ W (ax + by; ε) is an open neighborhood of f0 in P ; thus, it contains
a function f ∈ K. As f is linear,
|f0(ax + by) − af0(x) − bf0(y)| ≤ |f0(ax + by) − f (ax + by)|
+ |a||f0(x) − f (x)| + |b||f0(y) − f (y)|
≤ (1 + |a| + |b|)ε.
As ε > 0 is arbitrary, we conclude that f0 is linear. Similarly, for x ∈ B and ε > 0, let f ∈ W (x; ε) ∩ K. Then
|f0(x)| ≤ |f0(x) − f (x)| + |f (x)| < ε + 1.
Hence |f0| ≤ 1 on B; in particular, f0 is continuous. We conclude that f0 ∈ K, and (b) follows.
Example 12.12.7. If X is a Banach space, then the unit ball {x∗ ∈ X∗ : ‖x∗‖ ≤ 1}
in X∗ is weak∗ compact. More generally, the weak∗ closure of a norm bounded set in X∗ is
σ(X∗, X)–compact.
Example 12.12.8. Suppose (Ω, F , µ) is a nonatomic measure space. Let B be the closed
unit ball in (L1(µ), ‖ ‖1). We claim that E(B) = ∅. First, if f ∈ L1 with 0 < ‖f‖1 < 1 then,
from
\[ f = (1 - \|f\|_1)\, 0 + \|f\|_1 \frac{f}{\|f\|_1}, \qquad 0 = \frac{1}{2}\bigl(f + (-f)\bigr), \]
it follows that neither 0 nor f is an extreme point. If ‖f‖1 = 1, we claim that the measure
µf (dx) := |f | · µ(dx) is nonatomic, for if µf (A) > 0, B ⊂ A ∩ {f ≠ 0}, and µf (B) = 0,
then µ(B) = 0. By Saks theorem (Theorem 8.8.2), there is a set A ∈ F for which µf (A) = 1/2.
Setting $g = 2f \mathbf{1}_A$ and $h = 2f \mathbf{1}_{A^c}$, we have that ‖g‖1 = ‖h‖1 = 1 and
\[ f = \frac{1}{2}(g + h). \]
This shows that f is not an extreme point. As a consequence, (L1(µ), ‖ ‖1) cannot be
isometrically isomorphic to the dual space of any Banach space. Otherwise B would be weak∗
compact and so, E(B) ≠ ∅ by Lemma 12.12.2, which is a contradiction.

Alaoglu’s theorem is very useful under the additional assumption that X is separable,
for then the weak∗ –ball (12.25) is also sequentially compact, that is, any sequence {x∗n } ⊂ K
has a weak∗ –convergent subsequence.
Theorem 12.12.9. Let X be a separable topological vector space. If K ⊂ X∗ is weak∗–compact, then K is metrizable.

Proof. Let {xn} be a countable dense subset of X. Each linear functional $\widehat{x}_n : \Lambda \mapsto \Lambda(x_n)$ is
weak∗–continuous. If $\widehat{x}_n(\Lambda) = \widehat{x}_n(\Lambda')$ for all n, then Λ = Λ′ for they are continuous functions
that coincide on a dense set of X. Thus $\{\widehat{x}_n\}$ separates points in X∗. By Theorem 2.9.1,
we conclude that K is metrizable.
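Example 12.12.10. Let µ ≥ 0 be a Radon measure on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. The unit ball in L∞(µ)
is weak∗ compact and metrizable. Indeed, first notice that $C_{00}(\mathbb{R}^d)$ is dense in L1(µ). Let {Gn :
n ∈ N} be a sequence of open sets such that $\overline{G_n}$ is compact in $\mathbb{R}^d$ and $\overline{G_n} \subset G_{n+1} \nearrow \mathbb{R}^d$.
From Urysohn's lemma we obtain a sequence $\{\phi_n : n \in \mathbb{N}\} \subset C_{00}(\mathbb{R}^d)$ with $\overline{G_n} \prec \phi_n \prec G_{n+1}$.
Let R be the collection of polynomials in $\mathbb{R}^d$ with rational coefficients. By the Stone–Weierstrass
theorem, D := {φn p : n ∈ N, p ∈ R} is a countable dense subset of $(C_{00}(\mathbb{R}^d), \| \|_u)$. As µ is finite on
compact sets, it follows that D is dense in L1(µ). Thus L1(µ) is separable, and the claim follows from Theorems 12.12.5 and 12.12.9 applied to L∞(µ) = (L1(µ))∗.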
Definition 12.12.11. Let A ⊂ X and B ⊂ X∗. The polar A° of A and the dual polar °B of
B are the sets in X∗ and X respectively, defined by
\[ A^{\circ} = \{\Lambda \in X^* : |\Lambda(x)| \leq 1,\ x \in A\}, \]
\[ {}^{\circ}B = \{x \in X : |\Lambda(x)| \leq 1,\ \Lambda \in B\}, \]
respectively.
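A two dimensional computation illustrates both polars. The sketch assumes X = R², identifies X∗ with R² through Λ_{(a,b)}(x) = a x_1 + b x_2, and writes A° for the polar of A and °B for the dual polar of B; the set A below is chosen here for illustration.

```latex
% Illustrative assumption: X = \mathbb{R}^2, X^* \cong \mathbb{R}^2 via \Lambda_{(a,b)}(x) = a x_1 + b x_2, A = \{e_1\}.
\[
A^{\circ} = \{(a, b) : |\Lambda_{(a,b)}(e_1)| \leq 1\} = \{(a, b) \in \mathbb{R}^2 : |a| \leq 1\}.
\]
% Dual polar of A^{\circ}: the parameter b is unrestricted, which forces x_2 = 0, and then |a x_1| \leq 1 for all |a| \leq 1.
\[
{}^{\circ}(A^{\circ}) = \{x : |a x_1 + b x_2| \leq 1 \text{ whenever } |a| \leq 1,\ b \in \mathbb{R}\}
= \{(x_1, 0) : |x_1| \leq 1\}.
\]
% This is the segment \{t e_1 : |t| \leq 1\}, the (already closed) balanced convex hull of A.
```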
Lemma 12.12.12. Suppose X is a topological vector space with dual X∗. Let ∅ ≠ A ⊂ X
and ∅ ≠ B ⊂ X∗. Then, A° is convex, balanced and weak∗–closed; similarly, °B is convex,
balanced and closed in X.

Proof. Balance and convexity are clear. Since
\[ A^{\circ} = \bigcap_{x \in A} \{\Lambda \in X^* : |\Lambda(x)| \leq 1\}, \qquad {}^{\circ}B = \bigcap_{\Lambda \in B} \{x \in X : |\Lambda(x)| \leq 1\}, \]
it follows that A° and °B are weak∗–closed and closed in X∗ and X respectively.
Theorem 12.12.13. (Bipolar theorem) Suppose X is a locally convex topological vector
space with dual X∗ and let ∅ ≠ A ⊂ X and ∅ ≠ B ⊂ X∗. Then, °(A°) is the closure in
X of the balanced convex hull of A. Similarly, (°B)° is the weak∗–closure of the balanced
convex hull of B.

Proof. Let co◦(A) denote the balanced convex hull of A. It is clear that A ⊂ °(A°). Since the latter set is balanced, convex and closed
in X, it contains $\overline{\mathrm{co}_\circ(A)}$. Suppose there is $x \in {}^{\circ}(A^{\circ}) \setminus \overline{\mathrm{co}_\circ(A)}$. By Corollary 12.10.17,
there is Λ ∈ X∗ such that |Λ| ≤ 1 on $\overline{\mathrm{co}_\circ(A)}$ and Λ(x) > 1. The first condition implies that
Λ ∈ A°, and so |Λ(x)| ≤ 1. This is a contradiction.
Since (X∗, σ(X∗, X)) is locally convex and has X as its dual, the second statement goes
through step by step as above, exchanging the roles of X and X∗.

In the following result, we combine the Banach–Alaoglu theorem and the compact
version of the Banach–Steinhaus theorem to show that in locally convex topological vector spaces,
weakly bounded sets are originally bounded.
Theorem 12.12.14. Suppose (X, τ ) is a locally convex topological vector space, and let X ∗
be its dual space. A non empty subset E in X is bounded in τ iff E is bounded in σ(X, X ∗ ).

Proof. Since σ(X, X ∗ ) ⊂ τ , every τ bounded set E is σ(X, X ∗ )–bounded.

Conversely, suppose E is weakly bounded and let U ∈ τ be a neighborhood of 0. Since
X is locally convex, there exists a convex balanced set V ∈ τ such that $V \subset \overline{V} \subset U$. By
the Banach–Alaoglu theorem K = V° is convex and σ(X∗, X)–compact; by the bipolar
theorem $\overline{V} = {}^{\circ}K$. As E is weakly bounded, for each Λ ∈ X∗ there exists a number
c(Λ) ∈ R+ such that
\[ \sup_{x \in E} |\Lambda x| \leq c(\Lambda). \tag{12.26} \]
(12.26) means that the orbit {Λ(x) : x ∈ E} for each Λ ∈ K is a bounded subset of F;
therefore, by Theorem 12.9.4 there exists a constant c ∈ R+ such that
\[ \sup_{x \in E,\ \Lambda \in K} |\Lambda x| \leq c. \tag{12.27} \]
Hence, if x ∈ E then $c^{-1}x \in {}^{\circ}K = \overline{V}$, that is, $x \in c\overline{V}$. As $\overline{V}$ is balanced, E ⊂ tU for all t ≥ c.

The following result fully describes weakly compact sets in Banach spaces.
Theorem 12.12.15. (Eberlein–Smulian) Let X be a Banach space with dual space X ∗ . A
set K ⊂ X is σ(X, X ∗ )–compact iff any sequence in K has a σ(X, X ∗ )–weakly convergent
subsequence in K.

Proof. Assume K is weakly compact and (xn) ⊂ K. If $Y = \overline{\mathrm{span}}(x_n : n \in \mathbb{N})$, then Y
is a separable closed Banach subspace of X. For each xn there is $x^*_n \in X^*$ such that
$x^*_n(x_n) = \|x_n\|$ and $\|x^*_n\| = 1$. The sequence $(x^*_n) \subset X^*$ separates points of Y for if $x^*_n(x) = 0$
for all n, then
\[ \|x\| \leq \|x - x_n\| + \|x_n\| = \|x - x_n\| + x^*_n(x_n - x) \leq 2\|x_n - x\| \]
from whence it follows that x = 0. Hence K ∩ Y is a metrizable σ(Y, Y∗)–compact set.
Therefore there exist a subsequence $(x_{n_k})$ and x ∈ K ∩ Y such that $\Lambda(x_{n_k}) \to \Lambda(x)$ for all
Λ ∈ X∗. As $Y^* = X^*_Y$, we conclude that $x_{n_k} \to x$ in σ(X, X∗).
Conversely, suppose that any sequence in K has a σ(X, X∗)–convergent subsequence in K.
It follows that K is norm bounded; otherwise, there is a sequence (xn) ⊂ K with ‖xn‖ ≥ n.
Then, for some subsequence $\{x_{n_k} : k \in \mathbb{N}\}$ and x ∈ K we have that $\Lambda(x_{n_k}) \to \Lambda(x)$
as k → ∞ for all Λ ∈ X∗. As X∗ is a Banach space, the Banach–Steinhaus theorem
implies that $(x_{n_k})$ is norm bounded in X∗∗ and hence, in X. This contradicts the fact that
$\lim_k \|x_{n_k}\| = \infty$.
By the Banach–Alaoglu theorem, the σ(X∗∗, X∗)–closure of K, denoted by $\overline{K}^{w^{**}}$, is
σ(X∗∗, X∗)–compact. We will show that $\overline{K}^{w^{**}} \subset X$ by constructing, for each $x^{**} \in \overline{K}^{w^{**}}$, a sequence (xn) ⊂ X
which converges to x∗∗ in σ(X∗∗, X∗). The conclusion of the theorem would then follow
from Theorem 12.11.1. Fix $x^{**} \in \overline{K}^{w^{**}}$ and choose any $x^*_1 \in X^*$ with $\|x^*_1\| = 1$. Then,
there exists x1 ∈ K such that
\[ |(x^{**} - x_1)(x^*_1)| < 1. \]
We continue by induction. Suppose that {x1, . . . , xn} ⊂ X, $\{x^*_1, \ldots, x^*_{k_n}\} \subset X^*$ and
{k1, . . . , kn} ⊂ N, have been constructed so that
(1) 1 = k1 < . . . < kn.
(2) $\|x^*_j\| = 1$, j = 1, . . . , kn.
(3) $\max\{|y^{**}(x^*_j)| : j = 1, \ldots, k_n\} > \frac{1}{2}\|y^{**}\|$ for all y∗∗ in
$E_n = \mathrm{span}(x^{**}, x^{**} - x_1, \ldots, x^{**} - x_n) \subset X^{**}$.
(4) $\max\{|(x^{**} - x_n)(x^*_j)| : j = 1, \ldots, k_n\} < \frac{1}{n}$.
As En is finite dimensional, it is a closed subspace of (X∗∗, ‖ ‖) and the sphere
$S^{n-1} = \{y^{**} \in E_n : \|y^{**}\| = 1\}$ is compact. Hence, there are points $y^{**}_{k_n+1}, \ldots, y^{**}_{k_{n+1}}$ in
$S^{n-1}$ such that
\[ S^{n-1} \subset \bigcup_{j=k_n+1}^{k_{n+1}} \Bigl\{ y^{**} \in E_n : \|y^{**} - y^{**}_j\| < \frac{1}{4} \Bigr\}. \tag{12.28} \]
For each j = kn + 1, . . . , kn+1 choose $x^*_j \in X^*$ so that $\|x^*_j\| = 1$ and $|y^{**}_j(x^*_j)| > \frac{3}{4}$.
From (12.28) it follows that
\[ \max\{|y^{**}(x^*_j)| : j = k_n + 1, \ldots, k_{n+1}\} > \frac{1}{2}\|y^{**}\| \]
for all y∗∗ ∈ En. For each ℓ = 1, . . . , n + 1 define
\[ V_\ell = \bigcap_{j=1}^{k_\ell} \Bigl\{ y^{**} \in X^{**} : |(x^{**} - y^{**})(x^*_j)| < \frac{1}{\ell} \Bigr\}. \]
As $x^{**} \in \overline{K}^{w^{**}}$, there exists a point xn+1 ∈ K ∩ Vn+1. The sequence (xn) ⊂ X thus
constructed satisfies xn ∈ Vn and, by (3), it follows that
\[ \sup\{|y^{**}(x^*_j)| : j \in \mathbb{N}\} \geq \frac{1}{2}\|y^{**}\| \tag{12.29} \]
for all y∗∗ in the closure $\overline{E}$ of $E = \bigcup_n E_n$ in (X∗∗, ‖ ‖).

By hypothesis, there exist a subsequence $(x_{n_m})$ and a point x ∈ K to which $x_{n_m}$ converges in
σ(X, X∗). By Theorem 12.11.13, x belongs to the closure of span(xn : n ∈ N) in (X, ‖ ‖);
consequently, $x^{**} - x \in \overline{E}$. Fix j ∈ N. Then, for any ε > 0 there is M > kj such that
$|(x - x_{n_m})(x^*_j)| < \varepsilon$ for m ≥ M . For all such m we have that nm ≥ m ≥ M > kj ≥ j and
\[ |(x^{**} - x)(x^*_j)| \leq |(x^{**} - x_{n_m})(x^*_j)| + |(x_{n_m} - x)(x^*_j)| \leq \frac{1}{n_m} + \varepsilon. \]
It follows that $|(x^{**} - x)(x^*_j)| = 0$ for all j ∈ N. Therefore, from (12.29), x = x∗∗.

12.13. The open map theorem

A map $f$ between topological spaces $X$ and $Y$ is an open map if $f(U)$ is open in $Y$ for any
open set $U\subset X$. The following result implies, in particular, that a continuous one-to-one
linear map from an F–space onto another F–space is in fact a homeomorphism.
Theorem 12.13.1. (Open map theorem) Suppose X is an F –space and let Λ : X → Y
be a continuous linear map such that Λ(X) is of second category in Y . Then Λ is an open
map, Λ(X) = Y , and Y is an F–space.

Proof. To show that Λ is an open map it suffices to show that for any open neighborhood
V of 0 in X, the set Λ(V ) contains an open neighborhood of 0 in Y .

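Let $d$ be a complete invariant metric on $X$ and let $V$ be a neighborhood of $0$ in $X$. Let
$r>0$ be small enough so that $V_0=B_d(0;r)\subset V$, and for $n\ge1$ define $V_n=B_d(0;2^{-n}r)$.
For each $k\in\mathbb{Z}_+$
\[ \Lambda(X)=\bigcup_{n\in\mathbb{N}}n\Lambda(V_k). \]
Since $\Lambda(X)$ is of second category in $Y$, there is some $n$ for which $\big(\overline{n\Lambda(V_k)}\big)^{\circ}\ne\emptyset$. Since
$y\mapsto ny$ is a homeomorphism of $Y$, $\big(\overline{\Lambda(V_k)}\big)^{\circ}\ne\emptyset$. For any $x\in V_k$ and $y\in V_k$
\[ d(x-y,0)=d(x,y)\le d(x,0)+d(0,y)<r2^{-k}+r2^{-k}=r2^{-(k-1)}. \]
Hence, $V_k-V_k\subset V_{k-1}$ for all $k\ge1$. By Lemma 12.1.5
\[ \overline{\Lambda(V_k)}-\overline{\Lambda(V_k)}\subset\overline{\Lambda(V_k)-\Lambda(V_k)}\subset\overline{\Lambda(V_{k-1})}, \]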
whence we conclude that $\overline{\Lambda(V_{k-1})}$ contains a neighborhood $W_{k-1}$ of $0$ in $Y$. We construct
sequences $x_n\in V_n$ and $y_n\in\overline{\Lambda(V_n)}$ as follows: fix $y_1\in\overline{\Lambda(V_1)}$. Once a point $y_n\in\overline{\Lambda(V_n)}$ has
been chosen,
\[ \big(y_n-\overline{\Lambda(V_{n+1})}\big)\cap\Lambda(V_n)\supset(y_n-W_{n+1})\cap\Lambda(V_n)\ne\emptyset \]
and so, we can choose $x_n\in V_n$ so that $\Lambda(x_n)\in y_n-\overline{\Lambda(V_{n+1})}$. Then, $y_{n+1}:=y_n-\Lambda(x_n)\in\overline{\Lambda(V_{n+1})}$,
and we continue by induction. Since $\Lambda$ is continuous and $\{V_n:n\in\mathbb{Z}_+\}$ is a decreasing
local basis at $0$, we have that $y_n\to0$ in $Y$ as $n\to\infty$. Since $d(x_n,0)<2^{-n}r$ and $X$ is an F–space,

the sequence of partial sums $x_1+\ldots+x_n$ converges to some point $x\in X$ with $d(x,0)<r$.
Consequently
\[ \Lambda(x)=\lim_{n\to\infty}\sum_{k=1}^{n}\Lambda(x_k)=\lim_{n\to\infty}\sum_{k=1}^{n}(y_k-y_{k+1})=\lim_{n\to\infty}(y_1-y_{n+1})=y_1. \]

This shows that $W_1\subset\overline{\Lambda(V_1)}\subset\Lambda(V_0)\subset\Lambda(V)$.

The second statement follows directly from the first since $\Lambda(X)$ is an open linear subspace
of $Y$. To prove the last statement, notice that $N=\Lambda^{-1}(\{0\})$ is a closed subspace of $X$, and
by Theorem 12.2.1[(iv)], $X/N$ inherits the metric (F–space, Fréchet, normed) properties of
$X$. Let $\pi:X\to X/N$ be the quotient map. Since $\pi(X)=X/N$ and $x-y\in N$ implies
$\Lambda x=\Lambda y$, there exists $f:X/N\to Y$ such that $\Lambda=f\circ\pi$. Since $\Lambda(X)=Y$ and $f$ is
one-to-one, $f$ is a linear isomorphism. Since $\Lambda$ is continuous, for any open set $V$ in $Y$ the
set $\pi^{-1}(f^{-1}(V))=\Lambda^{-1}(V)$ is open in $X$; hence, by definition of the quotient topology,
$f^{-1}(V)$ is open in $X/N$ and so, $f$ is continuous. To show that $f^{-1}$ is continuous, it is enough
to show that $f$ is open. This follows from the identity
\[ f(U)=f(\pi(\pi^{-1}(U)))=\Lambda(\pi^{-1}(U)), \]
the continuity of $\pi$, and the fact that $\Lambda$ is open. Thus $f$ is a linear homeomorphism of the
F–space $X/N$ onto $Y$, and so $Y$ is an F–space.

If $T\in L(X,Y)$ is bijective, then it is clear that the inverse map $T^{-1}$ from $Y$ to $X$ is
linear; however, the inverse is not always a continuous map. The open map
theorem provides some conditions to address this problem.
Corollary 12.13.2. Let $\Lambda$ be a continuous linear mapping from a topological vector space $X$
into another topological vector space $Y$.
(i) If $X$ and $Y$ are F–spaces and $\Lambda$ is surjective, then $\Lambda$ is open.
(ii) If $X$ and $Y$ are F–spaces and $\Lambda$ is bijective, then $\Lambda^{-1}$ is also continuous.
(iii) If both $X$ and $Y$ are Banach spaces and $\Lambda$ is bijective, then there exist constants
$a,b>0$ such that
\[ a\|x\|\le\|\Lambda(x)\|\le b\|x\|,\qquad x\in X. \]

Proof. (i) is an immediate consequence of the open mapping theorem since $\Lambda(X)=Y$ is
a complete metric space and hence, by Baire's theorem, of second category in itself.

(ii) For any open set $V\subset X$, $(\Lambda^{-1})^{-1}(V)=\Lambda(V)$ is open in $Y$ by (i). Thus, $\Lambda^{-1}$ is continuous.

(iii) is a restatement of the continuity of both $\Lambda$ and $\Lambda^{-1}$.
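
The two-sided estimate in part (iii) can be checked numerically in a finite-dimensional Banach space. The following is only a sketch under stated assumptions: the space is $\mathbb{R}^2$ with the sup norm, the matrix `LAM`, its hand-computed inverse `LAM_INV` and the sample vectors are hypothetical choices, and the constants are taken as $b=\|\Lambda\|$ and $a=1/\|\Lambda^{-1}\|$.

```python
# Sketch: the bounds a*||x|| <= ||LAM x|| <= b*||x|| on (R^2, sup norm).
# LAM, LAM_INV and the sample vectors are illustrative assumptions.

def sup_norm(v):
    """Sup norm of a finite vector."""
    return max(abs(t) for t in v)

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def op_norm(m):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in m)

LAM = [[2.0, 1.0], [0.0, 1.0]]
LAM_INV = [[0.5, -0.5], [0.0, 1.0]]  # inverse of LAM, computed by hand

b = op_norm(LAM)            # upper constant ||LAM||
a = 1.0 / op_norm(LAM_INV)  # lower constant 1 / ||LAM^{-1}||

samples = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-3.0, 2.0], [0.25, 4.0]]
bounds_hold = all(
    a * sup_norm(x) <= sup_norm(apply(LAM, x)) <= b * sup_norm(x)
    for x in samples
)
```

For this choice the lower bound is attained at the vector $(1,-1)$, which shows that $a$ cannot be improved in this example.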

Corollary 12.13.3. If τ1 ⊂ τ2 are vector topologies on X and if both (X, τ1 ) and (X, τ2 )
are F–spaces, then τ1 = τ2 .

Proof. The identity map $I:x\mapsto x$ from $(X,\tau_2)$ into $(X,\tau_1)$ is continuous and bijective.
Therefore, by Corollary 12.13.2(ii), its inverse is also continuous, that is, $\tau_2\subset\tau_1$. Hence $\tau_1=\tau_2$.

A map f from a topological space (X, τX ) into a topological space (Y, τY ) has a closed
graph if {(x, f (x)) : x ∈ X} is closed in the product space (X × Y, τX ⊗ τY ).
Theorem 12.13.4. (Closed graph theorem) Suppose X and Y are F–spaces. If Λ is a
linear map from X to Y whose graph G is closed in X × Y , then Λ is continuous.

Proof. The product space $X\times Y$ is also an F–space with metric
\[ d((x_1,y_1),(x_2,y_2))=d_X(x_1,x_2)+d_Y(y_1,y_2). \]
The projection maps $\pi_X:(x,y)\mapsto x$ and $\pi_Y:(x,y)\mapsto y$ are linear and continuous. Thus
$\pi_X|_G:(x,\Lambda(x))\mapsto x$ is continuous and bijective from $G$ onto $X$. As $G$ is a closed linear
subspace of $X\times Y$, it is also an F–space; hence, by the open mapping theorem, the map
$(\pi_X|_G)^{-1}:x\mapsto(x,\Lambda(x))$ is continuous. The conclusion follows from the identity
$\Lambda=\pi_Y\circ(\pi_X|_G)^{-1}$.
Remark 12.13.5. The closedness condition of the graph G = {(x, Λ(x)) : x ∈ X} is usually
checked by showing that for any x ∈ X and sequence xn → x, if Λ(xn ) → y for some y ∈ Y ,
then y = Λ(x). In other words, if (xn , Λ(xn )) → (x, y) in X × Y , then y = Λ(x).

A linear map $\Lambda:X\to Y$ induces a linear map from the space $Y^\sharp$ of all linear functionals
on $Y$ into the space $X^\sharp$ of all linear functionals on $X$, namely $\Lambda^\dagger:f\mapsto f\circ\Lambda$.
When $\Lambda$ is continuous, $\Lambda^\dagger(Y^*)\subset X^*$. In this situation, the
restriction of $\Lambda^\dagger$ to $Y^*$ is called the transpose of $\Lambda$.
Lemma 12.13.6. Suppose $X$ and $Y$ are topological linear spaces whose duals $X^*$ and
$Y^*$ separate points, and let $\Lambda:X\to Y$ be a continuous linear map. Then $\Lambda^\dagger\in L(Y^*,X^*)$,
where $Y^*$ and $X^*$ are given the weak$^*$ topologies.

Proof. Let $y^*\in Y^*$ and let $\{y_n^*:n\in D\}$ be a net that converges to $y^*$ in $\sigma(Y^*,Y)$. Then,
for any $x\in X$
\[ \lim_n(\Lambda^\dagger y_n^*)(x)=\lim_ny_n^*(\Lambda x)=y^*(\Lambda x)=(\Lambda^\dagger y^*)(x). \]

That is, $\{\Lambda^\dagger y_n^*:n\in D\}$ converges to $\Lambda^\dagger y^*$ in $\sigma(X^*,X)$.

Theorem 12.13.7. (Duality) Suppose X and Y are locally convex spaces and let Λ : X → Y
be a continuous linear operator. Then, the following hold:
(i) Range(Λ)⊥ = ker(Λ† ) and ⊥ Range(Λ† ) = ker(Λ)
(ii) Λ has a dense image iff Λ† is injective.
(iii) Λ is injective iff Λ† has a weak∗ dense image.

Proof. (i) Suppose $y^*\in Y^*$. Then, $y^*\in\operatorname{Range}(\Lambda)^\perp$ iff $y^*(\Lambda x)=0$ for all $x\in X$.
Since $y^*(\Lambda x)=(\Lambda^\dagger y^*)(x)$, the last statement is equivalent to $(\Lambda^\dagger y^*)(x)=0$
for all $x\in X$. Therefore, $y^*\in\operatorname{Range}(\Lambda)^\perp$ iff $\Lambda^\dagger y^*=0$.

Similarly, $x\in{}^\perp\operatorname{Range}(\Lambda^\dagger)$ iff $(\Lambda^\dagger y^*)(x)=0$ for all $y^*\in Y^*$. This means that
$x\in{}^\perp\operatorname{Range}(\Lambda^\dagger)$ iff $y^*(\Lambda x)=0$ for all $y^*\in Y^*$. Since $Y^*$ separates points of $Y$ by
Corollary 12.10.16, we conclude that $x\in{}^\perp\operatorname{Range}(\Lambda^\dagger)$ iff $\Lambda x=0$.


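(ii) Since $\overline{\operatorname{Range}(\Lambda)}^{\perp}=\operatorname{Range}(\Lambda)^{\perp}$, part (i) implies that $\Lambda$ has dense image in $Y$ iff
$\ker(\Lambda^\dagger)=\{0^*\}$, or equivalently, iff $\Lambda^\dagger$ is injective.

(iii) Equip $X^*$ and $Y^*$ with the corresponding weak$^*$ topologies. Part (i) implies that $\Lambda$ is
injective iff ${}^\perp\operatorname{Range}(\Lambda^\dagger)=\{0\}$. Since the dual of $(X^*,\sigma(X^*,X))$ is $X$, Theorem 12.10.18
(applied to the locally convex space $(X^*,\sigma(X^*,X))$) implies that ${}^\perp\operatorname{Range}(\Lambda^\dagger)=\{0\}$ iff
\[ \overline{\operatorname{Range}(\Lambda^\dagger)}^{w^*}=\big({}^\perp\operatorname{Range}(\Lambda^\dagger)\big)^{\perp}=X^*. \]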

When $X$ and $Y$ are Banach spaces, the conclusion of Lemma 12.13.6 can be strengthened.
Theorem 12.13.8. Let $X$ and $Y$ be Banach spaces and equip $X^*$ and $Y^*$ with the
corresponding norm topologies.
(i) $\Lambda\in L(X,Y)$ iff $\Lambda^\dagger\in L(Y^*,X^*)$.
(ii) The map $\sigma:\Lambda\mapsto\Lambda^\dagger$ is a linear isometry from $L(X,Y)$ into $L(Y^*,X^*)$.

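Proof. (ii) Suppose $\Lambda\in L(X,Y)$. From Corollary 12.10.10
\begin{align*}
\|\Lambda^\dagger\|&=\sup_{\|y^*\|=1}\|\Lambda^\dagger y^*\|=\sup_{\|y^*\|=1}\sup_{\|x\|=1}|(\Lambda^\dagger y^*)(x)|=\sup_{\|y^*\|=1}\sup_{\|x\|=1}|y^*(\Lambda x)|\\
&=\sup_{\|x\|=1}\sup_{\|y^*\|=1}|y^*(\Lambda x)|=\sup_{\|x\|=1}\|\Lambda x\|=\|\Lambda\|.
\end{align*}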

This shows that σ is an isometry from L(X, Y ) into L(Y ∗ , X ∗ ). The linearity of σ is left as
an exercise.

(i) Necessity follows from (ii). As for sufficiency, suppose $\Lambda^\dagger\in L(Y^*,X^*)$. Let
$(x,y)\in\overline{\operatorname{Graph}(\Lambda)}$. Choose a sequence $\{x_n:n\in\mathbb{N}\}\subset X$ such that $\|x_n-x\|_X+\|\Lambda x_n-y\|_Y\to0$.
For any $f\in Y^*$ the functional $f\circ\Lambda=\Lambda^\dagger f$ belongs to $X^*$; hence, by continuity,
$(f\circ\Lambda)(x)=\lim_n(f\circ\Lambda)(x_n)=f(y)$, that is,
$f(\Lambda x)=f(y)$ for all $f\in Y^*$. Theorem 12.10.18 implies that $\Lambda x=y$. Continuity of $\Lambda$
follows from the closed graph theorem.
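
A finite-dimensional sanity check of the isometry in part (ii) is sketched below. It relies on two standard facts rather than on anything specific to these notes: on $(\mathbb{R}^n,\|\cdot\|_\infty)$ the operator norm of a matrix is its largest absolute row sum, and the dual space is $\mathbb{R}^n$ with the $\ell_1$ norm, on which a matrix has operator norm equal to its largest absolute column sum. The matrix `LAM` is a hypothetical example.

```python
# Sketch: ||LAM|| on sup-norm spaces equals ||LAM^T|| on the l1 duals.
# LAM is an illustrative matrix mapping R^3 into R^2.

def norm_sup_to_sup(m):
    """Operator norm between sup-norm spaces: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in m)

def norm_l1_to_l1(m):
    """Operator norm between l1-norm spaces: largest absolute column sum."""
    return max(sum(abs(t) for t in col) for col in zip(*m))

def transpose(m):
    """Matrix of the transpose map acting on the dual spaces."""
    return [list(col) for col in zip(*m)]

LAM = [[1.0, -2.0, 0.5],
       [0.0, 3.0, -1.0]]

norm_of_map = norm_sup_to_sup(LAM)                  # ||LAM||
norm_of_transpose = norm_l1_to_l1(transpose(LAM))   # ||LAM^dagger||
```

The columns of the transposed matrix are the rows of `LAM`, which is why the two quantities coincide here.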
Example 12.13.9. (Dual of a quotient space) Suppose $M$ is a closed linear subspace of a
Banach space $X$. We know that $Y=X/M$ equipped with the norm induced by the quotient
topology is a Banach space. The quotient map $\pi:x\mapsto x+M$ belongs to $L(X,Y)$ and its
transpose $\pi^\dagger$ belongs to $L(Y^*,X^*)$. Since $\pi(x)=0+M$ for any $x\in M$, $\pi^\dagger(Y^*)\subset M^\perp$. We
claim that $\pi^\dagger(Y^*)=M^\perp$. Fix $x^*\in M^\perp$ and let $N=\ker(x^*)$. $N$ is a closed linear subspace
of $X$ and $M\subset N$. If $\pi(x)=\pi(y)$, then $x-y\in M\subset N$ and $x^*(x)=x^*(y)$. Hence, there
is a unique map $\Lambda:Y\to\mathbb{F}$ such that $\Lambda\circ\pi=x^*$. It is easy to check that $\Lambda$ is linear. It
follows from the definition of the quotient topology that $\Lambda\in Y^*$ and $\pi^\dagger(\Lambda)=x^*$. Therefore
$\pi^\dagger(Y^*)=M^\perp$. If $U$ is the open unit ball in $X$, then $\pi(U)$ is the open unit ball in $Y$. Hence
\begin{align*}
\|\pi^\dagger y^*\|_{X^*}&=\sup\{|(\pi^\dagger y^*)(x)|:\|x\|_X<1\}=\sup\{|y^*(\pi x)|:\|x\|_X<1\}\\
&=\sup\{|y^*(y)|:\|y\|_Y<1\}=\|y^*\|_{Y^*}.
\end{align*}
Therefore, $(X/M)^*$ and $M^\perp$ are isometrically isomorphic.

Example 12.13.10. (Quotient in the dual space) Suppose $M$ is a closed linear subspace
of a Banach space $X$. We know that $Z=X^*/M^\perp$ equipped with the norm induced by the
quotient topology is a Banach space. By the Hahn–Banach theorem every $m^*\in M^*$ admits
an extension $x^*\in X^*$, and if $x_1^*$ and $x_2^*$ are two such extensions, $x_1^*-x_2^*\in M^\perp$. Thus,
the map $\tau:M^*\to Z$ given by $m^*\mapsto x^*+M^\perp$, where $x^*$ extends $m^*$, is a well defined
linear map. It is injective, since $\tau m^*=M^\perp$ means that $m^*$ is the restriction to $M$ of an
element of $M^\perp$, that is $m^*=0$. Since the restriction of any $x^*\in X^*$ to $M$ belongs to $M^*$,
$\tau$ is also surjective. We claim that $\tau$ is an isometry. Fix $m^*\in M^*$. Notice that for
any extension $x^*$ of $m^*$ we have that $\|m^*\|_{M^*}\le\|x^*\|_{X^*}$. The Hahn–Banach theorem provides an
extension $x^*_{m^*}\in X^*$ of $m^*$ such that $\|m^*\|_{M^*}=\|x^*_{m^*}\|_{X^*}$. Since the extensions of $m^*$ are
exactly the functionals $x^*_{m^*}+y^*$ with $y^*\in M^\perp$,
\[ \|m^*\|_{M^*}\le\inf\{\|x^*_{m^*}+y^*\|_{X^*}:y^*\in M^\perp\}=\|\tau m^*\|_Z\le\|x^*_{m^*}\|_{X^*}=\|m^*\|_{M^*}. \]
Therefore, $X^*/M^\perp$ and $M^*$ are isometrically isomorphic.

In the remainder of this section we focus on linear maps between Banach spaces. The
following results state equivalent forms of the open mapping theorem in this setting.
Theorem 12.13.11. Let $U$ and $V$ be the open unit balls in the Banach spaces $X$ and $Y$
respectively. Suppose $T\in L(X,Y)$ and let $\delta>0$. The following statements are equivalent.
(i) $\|T^\dagger y^*\|\ge\delta\|y^*\|$ for every $y^*\in Y^*$.
(ii) $\delta V\subset\overline{T(U)}$.
(iii) $\delta V\subset T(U)$.
Moreover, $T$ satisfies $T(X)=Y$ iff any (and hence all) of (i)–(iii) holds for some $\delta>0$.

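Proof. (i) implies (ii). Let $y_0\notin\overline{T(U)}$. As $\overline{T(U)}$ is a closed, convex and balanced set, Corollary 12.10.17
implies that there is $\Lambda\in Y^*$ such that $|\Lambda y|\le1$ for all $y\in\overline{T(U)}$ and $|\Lambda y_0|>1$. Hence, for
any $x\in U$
\[ |(T^\dagger\Lambda)(x)|=|\Lambda(Tx)|\le1, \]
whence we conclude that $\|T^\dagger\Lambda\|\le1$. This, together with
\[ \delta<\delta|\Lambda y_0|\le\delta\|\Lambda\|\|y_0\|\le\|T^\dagger\Lambda\|\|y_0\|\le\|y_0\| \]
shows that $Y\setminus\overline{T(U)}\subset Y\setminus\delta V$.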

(ii) implies (iii). Statement (ii) implies that $\delta\overline{V}\subset\overline{T(U)}$. Then, for any $y\in Y\setminus\{0\}$ and
$\varepsilon>0$ there is $x'$ such that $\|x'\|\le1$ and $\|\delta^{-1}Tx'-\|y\|^{-1}y\|<\|y\|^{-1}\varepsilon$. Setting $x=\|y\|x'$, this means that for
any $y\in Y$ and $\varepsilon>0$ there is $x\in X$ with $\|x\|\le\|y\|$ such that $\|\delta^{-1}Tx-y\|<\varepsilon$.
Fix $y_1\in V$ and choose a sequence of positive numbers $\varepsilon_n>0$ such that
\[ \sum_{n\ge1}\varepsilon_n<1-\|y_1\|. \]
By induction, once $y_n$ has been picked, there is $x_n\in X$ such that $\|x_n\|\le\|y_n\|$ and
$\|y_n-\delta^{-1}Tx_n\|<\varepsilon_n$. Set
\[ y_{n+1}:=y_n-\delta^{-1}Tx_n. \]

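The sequences $\{x_n:n\in\mathbb{N}\}\subset X$ and $\{y_n:n\in\mathbb{N}\}\subset Y$ thus constructed satisfy
\[ \|x_{n+1}\|\le\|y_{n+1}\|=\|y_n-\delta^{-1}Tx_n\|<\varepsilon_n \]
and so,
\[ \sum_{n\ge1}\|x_n\|\le\|x_1\|+\sum_{n\ge1}\varepsilon_n<\|y_1\|+(1-\|y_1\|)=1. \]
This means that $x:=\sum_{n\ge1}x_n\in U$, and by continuity of $T$
\[ \delta^{-1}Tx=\lim_{N\to\infty}\sum_{n=1}^{N}\delta^{-1}Tx_n=\lim_{N\to\infty}\sum_{n=1}^{N}(y_n-y_{n+1})=y_1-\lim_{N\to\infty}y_{N+1}=y_1 \]
since $y_n\to0$ as $n\to\infty$. Hence $\delta y_1=Tx\in T(U)$ for every $y_1\in V$. Therefore $\delta V\subset T(U)$.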

(iii) implies (i). By definition of the operator norm and of the transpose
\begin{align*}
\|T^\dagger y^*\|&=\sup\{|(T^\dagger y^*)(x)|:x\in U\}=\sup\{|y^*(Tx)|:x\in U\}\\
&=\sup\{|y^*(y)|:y\in T(U)\}\ge\sup\{|y^*(y)|:y\in\delta V\}=\delta\|y^*\|.
\end{align*}

Necessity in the last statement is a direct consequence of the open map theorem. Sufficiency is
clear from (iii).
Theorem 12.13.12. Suppose $X$ and $Y$ are Banach spaces and $T\in L(X,Y)$. Then the
following statements are equivalent:
(i) $\operatorname{Range}(T)$ is norm closed in $Y$.
(ii) $\operatorname{Range}(T^\dagger)$ is $\sigma(X^*,X)$–closed in $X^*$.
(iii) $\operatorname{Range}(T^\dagger)$ is norm closed in $X^*$.

Proof. Clearly (ii) implies (iii). We will prove that (i) implies (ii) and that (iii) implies (i).

(i) implies (ii): Let $U$ be the open unit ball in $X$ and let $V$ be the open unit ball in $\operatorname{Range}(T)$.
By Theorems 12.13.7 and 12.11.10, $\ker(T)^\perp$ is the closure of $\operatorname{Range}(T^\dagger)$ in $(X^*,\sigma(X^*,X))$. To prove
that (ii) holds it is enough to show that $\ker(T)^\perp\subset\operatorname{Range}(T^\dagger)$. Fix $x^*\in\ker(T)^\perp$. Since $\operatorname{Range}(T)$
is closed in $Y$, it is also a Banach space; hence, $T:X\to\operatorname{Range}(T)$ is an open map. Theorem 12.13.11
implies that $\delta V\subset T(U)$ for some $\delta>0$. Thus, for every $y\in\operatorname{Range}(T)$ there is $x\in X$ such that
$y=Tx$ and $\|x\|\le\frac{2}{\delta}\|y\|$.

Since $Tx=Tx'$ implies that $(x-x')\in\ker(T)$, the linear functional $\Lambda:\operatorname{Range}(T)\to\mathbb{F}$
given by
\[ \Lambda(Tx)=x^*(x),\qquad x\in X, \]
is well defined. Furthermore,
\[ |\Lambda y|=|\Lambda Tx|=|x^*(x)|\le\|x^*\|\|x\|\le\frac{2}{\delta}\|x^*\|\|y\|. \]

This shows that $\Lambda$ is continuous. The Hahn–Banach theorem provides an extension $y^*\in Y^*$
of $\Lambda$. Then $(T^\dagger y^*)(x)=y^*(Tx)=\Lambda(Tx)=x^*(x)$ for all $x\in X$. Hence $T^\dagger y^*=x^*$, that is
$x^*\in\operatorname{Range}(T^\dagger)$.

(iii) implies (i): Let $W$ be the norm closure of $\operatorname{Range}(T)$ in $Y$ and define the map
$S:X\to W$ by $Sx=Tx$. Since $\operatorname{Range}(S)=\operatorname{Range}(T)$ is dense in $W$, by Theorem 12.13.7[(ii)],
$S^\dagger:W^*\to X^*$ is injective. By the Hahn–Banach theorem, if $w^*\in W^*$ then there is $y^*\in Y^*$
with $\|w^*\|=\|y^*\|$ that extends $w^*$ to $Y$. For any $x\in X$
\[ (T^\dagger y^*)(x)=y^*(Tx)=w^*(Tx)=w^*(Sx)=(S^\dagger w^*)(x). \]
Hence $T^\dagger y^*=S^\dagger w^*$. Conversely, every $y^*\in Y^*$ satisfies $T^\dagger y^*=S^\dagger(y^*|_W)$, which means
that $\operatorname{Range}(T^\dagger)=\operatorname{Range}(S^\dagger)$. The assumption that
$\operatorname{Range}(T^\dagger)$ is closed, and hence complete, implies that $S^\dagger:W^*\to\operatorname{Range}(S^\dagger)$ is invertible
by the open map theorem. As a consequence, there is $c>0$ such that
\[ c\|w^*\|\le\|S^\dagger w^*\| \]
for all $w^*\in W^*$. Theorem 12.13.11 implies that $S$ is an open map. It follows that $S(X)$ is
an open dense linear subspace of $W$; hence,
\[ \operatorname{Range}(T)=\operatorname{Range}(S)=W. \]
This shows that (i) holds.
Corollary 12.13.13. Suppose $X$ and $Y$ are Banach spaces and let $T\in L(X,Y)$. Then $\operatorname{Range}(T)=Y$
iff $T^\dagger$ is injective and $\operatorname{Range}(T^\dagger)$ is norm closed. In particular, $T$ is invertible as an
element of $L(X,Y)$ iff $T^\dagger$ is invertible as an element of $L(Y^*,X^*)$.

Proof. Necessity: If $T(X)=Y$ then, by Theorem 12.13.7[(i)], $T^\dagger$ is injective. As $Y$ is
closed, by Theorem 12.13.12, $T^\dagger(Y^*)$ is norm closed in $X^*$.
Sufficiency: If $T^\dagger$ is injective then, by Theorem 12.13.7[(ii)], $T(X)$ is norm dense in $Y$.
If in addition $T^\dagger$ has a closed range then, by Theorem 12.13.12, $T(X)$ is norm closed.
Hence $T(X)=Y$.

Clearly, if $T$ is invertible, then $T^\dagger$ is invertible and $(T^\dagger)^{-1}=(T^{-1})^\dagger$. Conversely, if $T^\dagger$ is
invertible then, the first part of this Corollary implies that $T(X)=Y$. Since $T^\dagger(Y^*)=X^*$,
Theorem 12.13.7[(i)] implies that $T$ is injective. By the open map theorem, it follows that
$T$ is in fact invertible.

12.14. Spectrum of linear operators on Banach spaces

Definition 12.14.1. Let $A$ be a (complex) Banach algebra with unit $e$, and let $x\in A$.
The resolvent set of $x$, denoted by $\rho(x)$, is the collection of all scalars $\lambda$ for which $(\lambda e-x)$ is
invertible. The set $\sigma(x):=\mathbb{C}\setminus\rho(x)$ is called the spectrum of $x$.

Theorem 12.14.2. For any $x\in A$ and $y^*\in A^*$ define the function $g_{y^*}(\lambda)=y^*\big((\lambda e-x)^{-1}\big)$ on
$\rho(x)$. Then, $\sigma(x)$ is a nonempty compact subset contained in $\overline{B}(0;\|x\|)$ and $g_{y^*}\in H(\rho(x))$.

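Proof. Notice that the map $\Phi:\mathbb{C}\to A$ given by $\lambda\mapsto\lambda e-x$ is continuous. As $G_A$ is open
in $A$, $\rho(x)=\Phi^{-1}(G_A)$ is open in $\mathbb{C}$ and so, $\sigma(x)$ is closed. Theorem 12.6.10 implies that
\[ \lim_{\mu\to\lambda}\Big\|\frac{(\mu e-x)^{-1}-(\lambda e-x)^{-1}}{\mu-\lambda}+(\lambda e-x)^{-2}\Big\|=0,\qquad\lambda\in\rho(x). \]
Hence, for any $y^*\in A^*$, $g_{y^*}\in H(\rho(x))$ and $g_{y^*}'(\lambda)=-y^*\big((\lambda e-x)^{-2}\big)$.

If $|\lambda|>\|x\|$, then $e-\lambda^{-1}x$ is invertible and so, $(\lambda e-x)=\lambda(e-\lambda^{-1}x)\in G_A$. Hence
$\sigma(x)\subset\overline{B}(0;\|x\|)$ and $\sigma(x)$ is compact.

If $\sigma(x)=\emptyset$, then $\rho(x)=\mathbb{C}$ and for any $y^*\in A^*$, $g_{y^*}(\lambda)=y^*\big((\lambda e-x)^{-1}\big)$ is an entire
function. Since $\|(\lambda e-x)^{-1}\|\le\frac{1}{|\lambda|}\,\frac{1}{1-|\lambda|^{-1}\|x\|}$
for all $|\lambda|>\|x\|$, $\lim_{|\lambda|\to\infty}g_{y^*}(\lambda)=0$. By
Liouville's theorem, $g_{y^*}\equiv0$ for all $y^*\in A^*$; by the Hahn–Banach theorem, $(\lambda e-x)^{-1}=0$ for
all $\lambda\in\mathbb{C}$. Hence $e=(\lambda e-x)(\lambda e-x)^{-1}=0$. This is a contradiction since $e\ne0$.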
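
The invertibility of $\lambda e-x$ for $|\lambda|>\|x\|$ rests on the series $\sum_{n\ge0}\lambda^{-n-1}x^n$. The following sketch checks this in the Banach algebra of $2\times2$ matrices with exact rational arithmetic. The matrix `X`, the scalar `LAM`, the number of terms and the helper names are hypothetical choices made for illustration; the norm is the one induced by the sup norm on $\mathbb{R}^2$.

```python
from fractions import Fraction as F

# Sketch: partial sums of sum_n lam^(-n-1) x^n approximate (lam*e - x)^(-1).
# X, LAM and the number of terms are illustrative assumptions.

def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def op_norm(m):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in m)

def neumann_partial_sum(x, lam, terms):
    """Return sum_{n=0}^{terms-1} lam^(-n-1) x^n."""
    n = len(x)
    power = [[F(int(i == j)) for j in range(n)] for i in range(n)]  # x^0 = e
    total = [[F(0)] * n for _ in range(n)]
    coeff = F(1) / lam
    for _ in range(terms):
        total = [[total[i][j] + coeff * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = matmul(power, x)
        coeff /= lam
    return total

X = [[F(0), F(1, 2)], [F(1, 2), F(0)]]   # ||X|| = 1/2
LAM = F(2)                               # |LAM| > ||X||
S = neumann_partial_sum(X, LAM, 10)

lam_e_minus_x = [[LAM * int(i == j) - X[i][j] for j in range(2)] for i in range(2)]
product = matmul(lam_e_minus_x, S)
residual = [[product[i][j] - int(i == j) for j in range(2)] for i in range(2)]
residual_norm = op_norm(residual)        # norm of (lam*e - x) S - e
```

By telescoping, $(\lambda e-x)S_N=e-\lambda^{-N}x^{N}$ for the partial sum $S_N$ with $N$ terms, so the residual norm is $(\|x\|/|\lambda|)^{N}$ in this example and decays geometrically.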

The spectral radius of x ∈ A is defined as r(x) := sup{|λ| : λ ∈ σ(x)}.

Theorem 12.14.3. (Spectral radius formula) Suppose A is a complex Banach algebra. For
any x ∈ A
\[ r(x)=\lim_{n\to\infty}\sqrt[n]{\|x^n\|}=\inf_{n\in\mathbb{N}}\sqrt[n]{\|x^n\|}. \tag{12.30} \]

Proof. By Theorem A.1.4, the $A$–valued series
\[ f(\lambda)=\sum_{n=0}^{\infty}\lambda^{-n-1}x^n \]
converges absolutely and uniformly in compact subsets of $D=\{\lambda\in\mathbb{C}:|\lambda|>\limsup_n\sqrt[n]{\|x^n\|}\}$
and diverges in $E=\{\lambda\in\mathbb{C}:|\lambda|<\limsup_n\sqrt[n]{\|x^n\|}\}$. Clearly, $f(\lambda)=(\lambda e-x)^{-1}$ for all
$\lambda\in D$. Hence, for any $y^*\in A^*\setminus\{0\}$
\[ g_{y^*}(\lambda):=y^*\circ f(\lambda)=\sum_{n=0}^{\infty}\lambda^{-n-1}y^*(x^n)=y^*\big((\lambda e-x)^{-1}\big) \tag{12.31} \]
for all $\lambda\in D$. Theorem 12.14.2 implies that $g_{y^*}(\lambda)=y^*\big((\lambda e-x)^{-1}\big)$ is analytic on
$\rho(x)\supset\mathbb{C}\setminus\overline{B}(0;r(x))$. Then, by the Laurent–Weierstrass theorem, we conclude that
\[ \limsup_n\sqrt[n]{\|x^n\|}\le r(x). \tag{12.32} \]
On the other hand, if $\lambda\in\sigma(x)$, the factorization
\[ \lambda^ne-x^n=(\lambda e-x)(\lambda^{n-1}e+\ldots+x^{n-1}) \]
implies that $(\lambda^ne-x^n)$ is not invertible, that is $\lambda^n\in\sigma(x^n)$. Consequently, by Theorem 12.14.2,
$|\lambda|^n\le\|x^n\|$ and so,
\[ r(x)\le\inf_{n\in\mathbb{N}}\sqrt[n]{\|x^n\|}\le\liminf_n\sqrt[n]{\|x^n\|}. \tag{12.33} \]

The conclusion follows by combining (12.32) and (12.33).
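
As a numerical illustration of (12.30), consider the Banach algebra of $2\times2$ matrices with the operator norm induced by the sup norm. The sketch below uses a hypothetical upper triangular matrix `X` whose only eigenvalue is $1$, so that $r(X)=1$ while $\|X\|=2$; the helper names are assumptions made for this example.

```python
# Sketch: ||x^n||^(1/n) decreases towards the spectral radius r(x) = 1
# for the illustrative matrix X below, although ||X|| = 2.

def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def op_norm(m):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in m)

def root_of_power_norm(x, n):
    """Return ||x^n||^(1/n) for an integer n >= 1."""
    power = x
    for _ in range(n - 1):
        power = matmul(power, x)
    return op_norm(power) ** (1.0 / n)

X = [[1, 1], [0, 1]]   # X^n = [[1, n], [0, 1]], hence ||X^n|| = n + 1
values = {n: root_of_power_norm(X, n) for n in (1, 10, 100, 1000)}
```

Here the infimum in (12.30) is not attained: every term $\sqrt[n]{n+1}$ is strictly larger than $1$, and only the limit equals $r(X)$.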


For the rest of this section we focus primarily on the Banach algebra $L(X)$ of bounded
linear operators on a non–trivial complex Banach space $X$. For $T\in L(X)$, the set $\sigma_P(T)$
of all $\lambda\in\sigma(T)$ for which $T-\lambda I$ is not injective is called the point spectrum of $T$, and its
elements are called eigenvalues of $T$. For $\lambda\in\sigma_P(T)$, $\ker(T-\lambda I)$ is called the eigenspace of $\lambda$,
and its nonzero elements are called eigenvectors.
Corollary 12.14.4. Suppose $X$ is a complex Banach space. For any $T\in L(X)$, $\sigma(T)$ is a
non–empty compact set in $\mathbb{C}$. If $\lambda\in\partial(\sigma(T))$ in $\mathbb{C}$, then $\inf_{\|x\|=1}\|(T-\lambda I)x\|=0$.

Proof. Only the second statement needs to be proved. Suppose $\lambda$ is in the boundary
of $\sigma(T)$. Let $\{\lambda_n:n\in\mathbb{N}\}\subset\rho(T)$ be such that $\lambda_n\to\lambda$. Lemma 12.6.11 implies that
$\lim_n\|(T-\lambda_nI)^{-1}\|=\infty$. For all $n$ large enough, there are $x_n\in X$ with $\|x_n\|=1$ such that
\[ \|(T-\lambda_nI)^{-1}x_n\|>\|(T-\lambda_nI)^{-1}\|-\frac1n>0. \]
Let $\alpha_n=\|(T-\lambda_nI)^{-1}x_n\|$ and set $y_n=\alpha_n^{-1}(T-\lambda_nI)^{-1}x_n$. Then $\|y_n\|=1$, $\alpha_n\to\infty$, and
\[ \|(T-\lambda I)y_n\|\le\|(T-\lambda_nI)y_n\|+|\lambda-\lambda_n|=\alpha_n^{-1}+|\lambda-\lambda_n|\to0\quad\text{as }n\to\infty. \]
This shows that $\inf_{\|y\|=1}\|(T-\lambda I)y\|=0$.
Theorem 12.14.5. Suppose X is a Banach space and T ∈ L(X). Then σ(T ) = σ(T † ).

Proof. Notice that $\lambda\in\rho(T)$ iff $(T-\lambda I)$ is invertible as an element of $L(X)$, and by
Corollary 12.13.13 this happens iff $(T-\lambda I)^\dagger=(T^\dagger-\lambda I)$ is invertible as an element of the
Banach algebra $L(X^*)$.
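Example 12.14.6. Consider the maps $S$ and $T$ from $\mathbb{C}^{\mathbb{N}}$ to itself given by
\[ Sx(n)=x(n-1)\mathbb{1}(n\ge2),\qquad Tx(n)=x(n+1). \]
For $1\le p\le\infty$ and $\frac1p+\frac1q=1$, let $S_p$ and $T_q$ be the restrictions of $S$ and $T$ to $\ell_p$ and $\ell_q$
respectively. It is easy to check that for $1\le p<\infty$, $S_p^\dagger=T_q$. When $p=\infty$, we have
$T_1^\dagger=S_\infty$. Then $\sigma(T_q)=\sigma(S_p)$. Since $S_p$ is an isometry which is not surjective, $0\in\sigma(S_p)$ and
$\sigma(S_p)\subset\overline{B}(0;1)$. For any $\lambda\ne0$, if $Sx=\lambda x$, then $\lambda x(1)=0$, and $x(n-1)=\lambda x(n)$ for $n\ge2$.
From this, it follows that $x\equiv0$. Since $S_p$ is injective, $0$ is not an eigenvalue either. Thus
\[ \sigma_P(S_p)=\emptyset. \]
Similarly, if $|\lambda|\le1$ and $Tx=\lambda x$ for some nonzero $x\in\ell_q$, then $x(n+1)=\lambda x(n)$. From this it
follows that $x(n)=x(1)\lambda^{n-1}$ for all $n\in\mathbb{N}$ and so, $|\lambda|<1$ when $q<\infty$. From this and Example 12.13.9
it follows that if $|\lambda|<1$
\begin{align*}
1&=\dim\ker(T_q-\lambda I)=\dim\ker(S_p^\dagger-\lambda I)\\
&=\dim\operatorname{Range}(S_p-\lambda I)^{\perp}=\dim\big(\ell_p/\operatorname{Range}(S_p-\lambda I)\big)^{*},
\end{align*}
and $B(0;1)\subset\sigma_P(T_q)\subset\sigma(T_q)\subset\overline{B}(0;1)$. Therefore
\[ \sigma(T_q)=\sigma(S_p)=\overline{B}(0;1). \]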

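Finally, for $\lambda\notin S^1$,
\[ \|(S_p-\lambda I)x\|\ge\big|\|S_px\|-|\lambda|\|x\|\big|=\big|1-|\lambda|\big|\,\|x\| \]
for all $x\in\ell_p$. This shows that when $|\lambda|\ne1$, $(S_p-\lambda I)$ is injective and $\operatorname{Range}(S_p-\lambda I)$ is
closed in $\ell_p$. Since $\partial\sigma(S_p)=S^1$, Corollary 12.14.4 then gives $\{\lambda\in\mathbb{C}:\inf_{\|x\|=1}\|(S_p-\lambda I)x\|=0\}=S^1$.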
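
The computations in this example can be tried out on finite truncations of sequences. The sketch below is an illustration under stated assumptions: sequences are represented by Python lists holding their first $N$ coordinates, the value `LAM = 0.5` and the length `N` are arbitrary choices, and the function names are hypothetical.

```python
# Sketch: the shifts of Example 12.14.6 acting on truncated sequences.
# A list x holds x(1), ..., x(N); LAM and N are illustrative choices.

def forward_shift(x):
    """(S x)(1) = 0 and (S x)(n) = x(n - 1) for n >= 2."""
    return [0.0] + x

def backward_shift(x):
    """(T x)(n) = x(n + 1); the truncation loses the last coordinate."""
    return x[1:]

def lp_norm(x, p):
    """The l_p norm of a finite list, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

LAM = 0.5
N = 30
x = [LAM ** n for n in range(N)]   # x(n) = LAM^(n-1), normalized by x(1) = 1

Tx = backward_shift(x)
# Eigenvector relation T x = LAM x on the coordinates that survive truncation.
is_eigenvector = all(abs(Tx[n] - LAM * x[n]) <= 1e-15 for n in range(N - 1))

# S preserves the l_p norm, and T is a left inverse of S.
norm_gap = abs(lp_norm(forward_shift(x), 2) - lp_norm(x, 2))
left_inverse = backward_shift(forward_shift(x)) == x
```

The geometric sequence $x(n)=\lambda^{n-1}$ lies in $\ell_q$ for $q<\infty$ exactly when $|\lambda|<1$, which matches the description of $\sigma_P(T_q)$ above.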

12.15. Compact operators

Definition 12.15.1. Let $X$ and $Y$ be locally convex linear spaces. A linear map $T:X\to Y$
is said to be completely continuous if for any bounded set $U$ in $X$, $T(U)$ is totally bounded
in $Y$.
Remark 12.15.2. As totally bounded sets in a locally convex linear space are bounded, if
T : X → Y is completely continuous then T is bounded; in addition, if X is Fréchet, then
T is continuous.

In these notes, we will only consider the case where $X$ and $Y$ are both Banach spaces.
Let $U$ be the unit ball in $X$. Since $Y$ is a complete normed space, totally bounded sets in $Y$
are relatively compact; hence, $T$ is completely continuous iff $\overline{T(U)}$ is a compact subset of $Y$.
In this setting, completely continuous maps are called compact operators. We will use
$L_c(X,Y)$ to denote the set of compact operators. Clearly, $T\in L_c(X,Y)$ iff any bounded
sequence $\{x_n:n\in\mathbb{N}\}\subset X$ admits a subsequence $\{x_{n_k}:k\in\mathbb{N}\}$ such that $\{Tx_{n_k}:k\in\mathbb{N}\}$ converges in $Y$.
Example 12.15.3. Let $\Omega$ be an open bounded subset of $\mathbb{R}^d$. The space $X=C(\overline{\Omega})$
equipped with the sup norm is a Banach space. Suppose $K\in C(\overline{\Omega}\times\overline{\Omega})$. The map
$Tx(t):=\int_\Omega K(t,s)x(s)\,ds$ defines a bounded operator on $C(\overline{\Omega})$. An application of the
Arzelà–Ascoli theorem shows that $T$ is a compact operator (see Exercise 12.17.31).
Theorem 12.15.4. Suppose $X$, $Y$ and $Z$ are Banach spaces. The collection of compact
operators $L_c(X,Y)$ is a closed linear subspace of $L(X,Y)$ with its norm topology. Furthermore,
if $S\in L(X,Y)$, $T\in L(Y,Z)$, and either $S$ or $T$ is compact, then $TS\in L_c(X,Z)$.

Proof. Suppose S, T ∈ Lc (X, Y ) and α ∈ F. Let {xn : n ∈ N} ⊂ X be a bounded sequence.

Then, there is a subsequence {xnk : k ∈ N} such that Sxnk and T xnk converge in Y . Hence
(S + αT )xnk converges. Thus S + αT is compact, and so Lc (X, Y ) is a linear subspace of
L(X, Y ).

Let $U$ be the unit ball in $X$. Suppose $\{T_n:n\in\mathbb{N}\}\subset L_c(X,Y)$ converges in operator norm
to $T$. To show that $T\in L_c(X,Y)$ it is enough to show that $T(U)$ is totally bounded. Given
$\varepsilon>0$, choose $T_N$ so that $\|T_N-T\|<\frac{\varepsilon}{3}$. Then, there is a finite collection $\{x_1,\ldots,x_m\}\subset U$
such that $T_N(U)\subset\bigcup_{j=1}^{m}B\big(T_Nx_j;\frac{\varepsilon}{3}\big)$. For $x\in U$, choose $x_j$ so that $\|T_Nx-T_Nx_j\|<\frac{\varepsilon}{3}$.
Since
\[ \|Tx-Tx_j\|\le\|Tx-T_Nx\|+\|T_Nx-T_Nx_j\|+\|T_Nx_j-Tx_j\|<\varepsilon, \]
$T(U)\subset\bigcup_{j=1}^{m}B(Tx_j;\varepsilon)$. This shows that $T(U)$ is totally bounded.

Suppose $S\in L(X,Y)$ and $T\in L(Y,Z)$. Let $U$ be the unit ball in $X$. If $S$ is compact then,
$\overline{S(U)}$ is compact in $Y$ and so $T(\overline{S(U)})$ is compact in $Z$. Hence $\overline{T(S(U))}\subset T(\overline{S(U)})$ is compact in $Z$.
Similarly, if $T$ is compact then, $\overline{T(S(U))}$ is compact in $Z$ since $S(U)$ is bounded in $Y$.

We list a few simple facts about compact operators.

Theorem 12.15.5. Let X and Y be Banach spaces, and T ∈ L(X, Y ).

(i) If dim Range(T ) < ∞, then T is compact.

(ii) If $T$ is compact and $\operatorname{Range}(T)$ is closed in $Y$, then $\dim\operatorname{Range}(T)<\infty$.
Suppose now that $X=Y$.

(iii) If $T$ is compact, then $\dim\ker(T-\lambda I)<\infty$ for any $\lambda\ne0$.
(iv) If $\dim(X)=\infty$ and $T$ is compact, then $0\in\sigma(T)$.

Proof. (i) is a consequence of the fact that a subspace of $Y$ of finite dimension $n$ is
homeomorphic to the Euclidean space $\mathbb{F}^n$. There, a set is compact iff it is closed and bounded.

(ii) If Range(T ) is closed, then it is itself a Banach space. By the open map theorem,
T : X → Range(T ) is an open map. If T is compact, then Range(T ) is locally compact;
hence, by Theorem 12.7.1[(iii)], Range(T ) is of finite dimension.

(iii) Suppose $\lambda\ne0$. Clearly $Y=\ker(T-\lambda I)$ is a closed subspace of $X$, hence a Banach space.
Since $\lambda^{-1}Ty=y$ for all $y\in Y$, the restriction of $T$ to $Y$ is a compact linear map onto $Y$.
Part (ii) implies that $Y$ is of finite dimension.

(iv) If $0\notin\sigma(T)$ then $T\in GL(X)$ and so $\operatorname{Range}(T)=X$. Since $T$ is compact, part (ii)
implies that $\dim(X)<\infty$, contradicting the assumption in the statement.
Example 12.15.6. Let $1\le p<\infty$. Suppose $\{\alpha_n:n\in\mathbb{N}\}\subset\mathbb{C}$ is bounded and let
$a:=\sup_n|\alpha_n|$. Let $A:\ell_p\to\ell_p$ be the operator defined by $Ax(n)=\alpha_nx(n)$. It is easy to check
that $\|Ax\|_{\ell_p}\le a\|x\|_{\ell_p}$ and that $\|A\|=a$. Furthermore $\{\alpha_n:n\in\mathbb{N}\}\subset\sigma_P(A)\subset\sigma(A)$.
For each $m\in\mathbb{N}$ define $A_m:\ell_p\to\ell_p$ by
\[ A_mx(n)=\alpha_nx(n)\mathbb{1}(n\le m). \]
Each $A_m$ has finite dimensional range and so, it is compact. Notice that
$\|A-A_m\|=\sup_{n>m}|\alpha_n|$. Therefore, if $\alpha_n\to0$ as $n\to\infty$ then, by Theorem 12.15.4, $A$ is compact.
Conversely, if $A$ is compact, then we must have that $\alpha_n\to0$ as $n\to\infty$.
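
The approximation error $\|A-A_m\|=\sup_{n>m}|\alpha_n|$ stated in the example can be tabulated for a concrete sequence. The sketch below assumes the hypothetical choice $\alpha_n=1/n$ and works with the first `N` terms only, so the supremum is taken over a finite tail; the function name `tail_sup` is likewise an assumption.

```python
# Sketch: the error ||A - A_m|| = sup_{n > m} |alpha_n| for alpha_n = 1/n,
# computed on a finite truncation of the sequence (an illustrative assumption).

def tail_sup(alpha, m):
    """sup of |alpha_n| over n > m, where alpha[0] stores alpha_1."""
    return max((abs(a) for a in alpha[m:]), default=0.0)

N = 1000
alpha = [1.0 / n for n in range(1, N + 1)]

# Errors of the finite rank approximations A_m for a few values of m.
errors = {m: tail_sup(alpha, m) for m in (1, 10, 100)}
```

For this sequence the error equals $1/(m+1)$, so the finite rank operators $A_m$ converge to $A$ in operator norm, in line with the compactness criterion above.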
Theorem 12.15.7. (Schauder) Suppose $X$, $Y$ are Banach spaces, and let $T\in L(X,Y)$.
Then $T\in L_c(X,Y)$ iff $T^\dagger\in L_c(Y^*,X^*)$.

Proof. Suppose $T$ is compact. Let $U$ be the unit disk in $X$, and let $\{y_n^*:n\in\mathbb{N}\}$ be a sequence
in the unit disk of $Y^*$. For each $n$, denote by $f_n$ the restriction of $y_n^*$ to $\overline{T(U)}$. Since
$|f_n(y)-f_n(y')|=|y_n^*(y-y')|\le\|y-y'\|$, $\{f_n:n\in\mathbb{N}\}$ is an equicontinuous sequence
in $C(\overline{T(U)},\mathbb{F})$. Clearly $\sup_n|f_n(y)|\le\|T\|$ for all $y\in\overline{T(U)}$. Hence, by Arzelà–Ascoli's

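theorem, $\{f_n:n\in\mathbb{N}\}$ is relatively compact in $C(\overline{T(U)},\mathbb{F})$ and so, there exists a subsequence
$f_{n_k}$ that converges uniformly to some $f\in C(\overline{T(U)},\mathbb{F})$. From
\begin{align*}
\|T^\dagger y^*_{n_k}-T^\dagger y^*_{n_j}\|&=\sup_{x\in U}|(T^\dagger y^*_{n_k}-T^\dagger y^*_{n_j})(x)|=\sup_{x\in U}|(y^*_{n_k}-y^*_{n_j})(Tx)|\\
&=\sup_{y\in T(U)}|f_{n_k}(y)-f_{n_j}(y)|
\end{align*}
it follows that $\{T^\dagger y^*_{n_k}:k\in\mathbb{N}\}$ is a Cauchy sequence in $X^*$ and hence converges. This shows that $T^\dagger$ is compact.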

Suppose $T^\dagger$ is compact. The first part of the Theorem implies that $T^{\dagger\dagger}\in L_c(X^{**},Y^{**})$. Let
$\phi:X\to X^{**}$ and $\psi:Y\to Y^{**}$ be the standard isometric embeddings given by $\phi:x\mapsto e_x$,
where $e_x(x^*)=x^*(x)$ for all $x^*\in X^*$, and $\psi:y\mapsto e_y$, where $e_y(y^*)=y^*(y)$ for all $y^*\in Y^*$.
Then
\begin{align*}
(\psi(Tx))(y^*)&=e_{Tx}(y^*)=y^*(Tx)=(T^\dagger y^*)(x)\\
&=e_x(T^\dagger y^*)=(\phi(x))(T^\dagger y^*)=(T^{\dagger\dagger}(\phi(x)))(y^*)
\end{align*}
for all $y^*\in Y^*$ and $x\in X$. This means that $\psi\circ T=T^{\dagger\dagger}\circ\phi$. Since $\phi$ is an isometry, $\phi(U)$
is contained in the unit disc $U^{**}$ of $X^{**}$. Hence
\[ \psi(T(U))=T^{\dagger\dagger}(\phi(U))\subset T^{\dagger\dagger}(U^{**}). \]
It follows that $\psi(T(U))$ is totally bounded in $Y^{**}$. Since $\psi$ is an isometry, it follows that
$T(U)$ is totally bounded in $Y$; therefore, $T$ is compact.

The rest of this section is dedicated to the analysis of the spectrum of compact operators
in X.
Theorem 12.15.8. If $X$ is a Banach space and $T\in L_c(X)$, then $\operatorname{Range}(T-\lambda I)$ is closed
in $X$ for all $\lambda\ne0$.

Proof. By Theorem 12.15.5[(iii)], $\dim\ker(T-\lambda I)<\infty$. By Corollary 12.10.20, there
exists a closed linear subspace $M$ such that $X=\ker(T-\lambda I)\oplus M$. Let $S$ be the restriction
of $T-\lambda I$ to $M$. Then $S\in L(M,X)$, $S$ is injective, and $\operatorname{Range}(S)=\operatorname{Range}(T-\lambda I)$. To
show that $\operatorname{Range}(S)$ is closed it suffices to show that for some $r>0$
\[ \|Sx\|\ge r\|x\|,\qquad x\in M. \tag{12.34} \]
If (12.34) does not hold for any $r>0$ then, for any $n\in\mathbb{N}$ there is $x_n\in M$ with $\|x_n\|=1$ such
that $\|Sx_n\|<\frac1n$. Then $Sx_n\to0$ and by compactness of $T$, after passage to a subsequence,
$Tx_n$ converges to some $x_0\in X$. Hence $\lambda x_n=Tx_n-Sx_n\to x_0$. Since $M$ is closed, $x_0\in M$,
and $\|x_0\|=|\lambda|>0$. However, by continuity of $S$
\[ Sx_0=\lim_nS(\lambda x_n)=0, \]
which contradicts the injectivity of $S$. Therefore (12.34) holds for some $r>0$.

The following technical results will be used to give a full description of the spectrum of
compact operators.

Lemma 12.15.9. Suppose $Y$ is a locally convex topological linear space and $M\subset Y$ a
closed linear subspace of $Y$. Then
\[ \dim(Y/M)\le\dim(M^\perp). \]

Proof. If $M=Y$ the conclusion is obvious. Suppose $M$ is properly contained in $Y$. For any
positive integer $k\le\dim(Y/M)$ there are vectors $y_1,\ldots,y_k$ such that $\{y_1+M,\ldots,y_k+M\}$
is linearly independent in $Y/M$. Let $M_0=M$ and for each $j=1,\ldots,k$ set
$M_j=\operatorname{span}(M\cup\{y_1,\ldots,y_j\})$, so that $y_j\in M_j\setminus M_{j-1}$ for all $j=1,\ldots,k$. By Theorem 12.7.2, each
$M_j$ is closed in $Y$. By Theorem 12.10.18, there are linear functionals $\Lambda_1,\ldots,\Lambda_k$ in $Y^*$ such
that $\Lambda_jy_j=1$ and $\Lambda_j\in M_{j-1}^\perp\subset M^\perp$. It follows that these functionals are linearly independent;
hence $\dim M^\perp\ge k$.

Lemma 12.15.10. Suppose $M$ is a proper closed linear subspace of a Banach space $X$. For
any $r>1$ there exists $x\in X$ such that $\|x\|<r$ and $d(x,M)=1$.

Proof. Let $x'\in X\setminus M$. As $M$ is closed, $d(x',M)=\inf\{\|x'-y\|:y\in M\}=:\delta>0$. If
$x_1=\delta^{-1}x'$, then $d(x_1,M)=1$. Since $r>1$, there is $y\in M$ such that $\|x_1-y\|<r$. The vector
$x=x_1-y$ satisfies the desired properties, because $d(x_1-y,M)=d(x_1,M)$.

Theorem 12.15.11. Suppose $X$ is a Banach space and $T\in L_c(X)$.

(i) If $\lambda\ne0$ is an eigenvalue of $T$, then $\operatorname{Range}(T-\lambda I)\ne X$.
(ii) For each $r>0$, the set of eigenvalues $\lambda$ of $T$ with $|\lambda|>r$ is finite.

Proof. We first show that if either (i) or (ii) is false, then there are closed subspaces
$M_n$ and scalars $\lambda_n$ such that
(a) $\{M_n:n\in\mathbb{N}\}$ is a strictly increasing sequence of closed subspaces of $X$.
(b) $T(M_n)\subset M_n$ for all $n\in\mathbb{N}$.
(c) $c:=\inf_n|\lambda_n|>0$.
(d) $(T-\lambda_nI)(M_n)\subset M_{n-1}$ for all integers $n\ge2$.

Suppose (i) is false. Let $T_\lambda:=T-\lambda I$ and for each $n\in\mathbb{Z}_+$ define $M_n:=\ker(T_\lambda^n)$.
Since $\lambda$ is an eigenvalue, there exists $x_1\ne0$ in $M_1$. Since $T_\lambda(X)=X$, there exists
$x_2\ne0$ with $T_\lambda x_2=x_1$ and so, $T_\lambda^2x_2=0$. Proceeding by induction, we obtain a sequence
$\{x_n:n\in\mathbb{N}\}\subset X$ with
\[ T_\lambda^nx_{n+1}=x_1\ne0,\qquad T_\lambda^{n+1}x_{n+1}=T_\lambda x_1=0. \]
Thus, for all $n\in\mathbb{N}$ we have that $M_{n-1}$ is a proper closed subspace of $M_n$, $T_\lambda(M_n)\subset M_{n-1}$
and, since $TT_\lambda^n=T_\lambda^nT$, $T(M_n)\subset M_n$. Set $\lambda_n:=\lambda$ for all integers $n\ge1$ and set $c=|\lambda|$.

Suppose (ii) is false. Let $\{\lambda_n\}$ be a sequence of distinct eigenvalues with $|\lambda_n|>r$. For each
$\lambda_n$ choose a unit–norm eigenvector $x_n$ and define $M_n=\operatorname{span}\{x_1,\ldots,x_n\}$. Each $M_n$ is finite

dimensional and hence closed. We prove by induction that $\{x_1,\ldots,x_n\}$ is a linearly
independent set for each $n$. For $n=1$ this is trivial. Assume that the statement holds for $n\ge1$.
Suppose
\[ 0=a_1x_1+\ldots+a_nx_n+a_{n+1}x_{n+1}. \]
Applying $T$ gives
\[ 0=a_1\lambda_1x_1+\ldots+a_n\lambda_nx_n+a_{n+1}\lambda_{n+1}x_{n+1}. \]
Consequently
\[ 0=a_1(\lambda_{n+1}-\lambda_1)x_1+\ldots+a_n(\lambda_{n+1}-\lambda_n)x_n. \]
As $\lambda_j\ne\lambda_{n+1}$ for all $1\le j\le n$, we conclude that $a_j=0$ for all $1\le j\le n$. Hence
$a_{n+1}x_{n+1}=0$ and so $a_{n+1}=0$. We conclude that $M_n$ is properly contained in $M_{n+1}$.
Clearly $T(M_n)\subset M_n$, and (c) holds with $c\ge r$. Notice that if $x\in M_n$ and
\[ x=a_1x_1+\ldots+a_nx_n, \]
then
\[ (T-\lambda_n)x=a_1(\lambda_1-\lambda_n)x_1+\ldots+a_{n-1}(\lambda_{n-1}-\lambda_n)x_{n-1}\in M_{n-1}. \]
This shows that $(T-\lambda_n)(M_n)\subset M_{n-1}$.
Having shown the existence of spaces $M_n$ and scalars $\lambda_n$ satisfying (a)–(d), we obtain from
Lemma 12.15.10 vectors $y_n\in M_n$ such that
\[ \|y_n\|<2,\qquad d(y_n,M_{n-1})=1 \]
for all integers $n\ge2$. For $2\le m<n$ we have $Ty_m\in M_m\subset M_{n-1}$ and $(T-\lambda_n)y_n\in M_{n-1}$.
Hence
\begin{align*}
\|Ty_n-Ty_m\|&=\big\|\lambda_ny_n-\big(Ty_m-(T-\lambda_n)y_n\big)\big\|=|\lambda_n|\,\big\|y_n-\lambda_n^{-1}\big(Ty_m-(T-\lambda_n)y_n\big)\big\|\\
&\ge c\,d(y_n,M_{n-1})=c>0.
\end{align*}
This shows that the bounded sequence $\{y_n\}$ is mapped to a sequence $\{Ty_n:n\in\mathbb{N}\}$ that does not admit a
convergent subsequence, which contradicts the compactness of $T$. Therefore (i) and (ii) hold.
We now present the main result of this section.
Theorem 12.15.12. (Spectral theorem for compact operators) Suppose $X$ is a Banach
space and $T\in L_c(X)$. For any scalar $\lambda\ne0$
(i) The numbers defined below are all finite and equal:

α = dim ker(T − λI)

β = dim X/(Range(T − λI))

α∗ = dim ker(T † − λI)

β ∗ = dim X ∗ /(Range(T † − λI))
(ii) If in addition λ ∈ σ(T ), then λ is an eigenvalue of T and T † .
(iii) σ(T ) is compact, at most countable and it has at most one limit point, namely 0.

Proof. Since T is compact iff T † is compact, then by Theorem 12.15.5 implies that α and
α∗ are finite. Set Tλ := T − λI.

Let Y = Y with the norm topology and M = Range(T − λI). Then M is closed in Y and,
by Theorem 12.13.7[(i)], the annihilator of M is ker(T † − λI). Lemma 12.15.9 implies that
(12.35) β ≤ α∗
Set Y = X∗ with the weak∗–topology and M = Range(T† − λI). By Theorem 12.13.12, M
is a closed subspace of Y . By Theorem 12.13.7[(i)], the annihilator of M is ker(T − λI).
Lemma 12.15.9 implies that
(12.36) β∗ ≤ α
We now show that
(12.37) α≤β
Assume (12.37) is false. Since β < α < ∞, Corollaries 12.7.4 and 12.10.20 imply that
there are closed subspaces E and F in X, with dim(F ) = β, such that
X = ker(Tλ ) ⊕ E = Range(Tλ ) ⊕ F
Each x ∈ X has a unique representation x = x1 + x2 with x1 ∈ ker(Tλ ) and x2 ∈ E. Let
π : X → ker(Tλ) be given by x ↦ x1. Clearly π is linear. We claim that π is continuous.
Suppose (x, y) belongs to the closure of Graph(π) and let (x_n, π(x_n)) → (x, y) in X × ker(Tλ). Then, y ∈ ker(Tλ)
and, since E is closed,
z := x − y = lim_n (x_n − π(x_n)) ∈ E

This shows π(x) = y. Thus, by the closed graph theorem, π is continuous.

Since we are assuming α = dim(ker(Tλ)) > β = dim(F), there is a linear map A from ker(Tλ) onto F
such that Ax0 = 0 for some x0 ≠ 0. As a map with finite dimensional range, A is compact.
Hence, the map defined as Φ = T + A ◦ π is compact. Notice that
Φ − λI = Tλ + A ◦ π
Since π(E) = {0},
(Φ − λI)(E) = Range(Tλ )
Similarly, since π acts on ker(Tλ ) as the identity operator,
(Φ − λI)(ker(Tλ )) = A(ker(Tλ )) = F
Consequently
(12.38) X = Range(Tλ ) ⊕ F ⊂ Range(Φ − λI)
Since (Φ − λI)x0 = Tλ x0 + A(πx0) = 0, it follows that λ is an eigenvalue of Φ. Therefore, by
Theorem 12.15.11[(i)], the range of Φ − λI is properly contained in X. This contradicts
(12.38). Therefore, α ≤ β.

Since T† is also compact, applying (12.37) to T† gives
(12.39) α∗ ≤ β∗
Putting things together, α ≤ β ≤ α∗ ≤ β∗ ≤ α, and we conclude that α = α∗ = β = β∗.

(ii) If λ ≠ 0 is not an eigenvalue, (T − λI) is one to one and, since α = β = 0, X =
Range(T − λI). Thus, by the open mapping theorem, (T − λI) is invertible, that is λ ∈ ρ(T ).

(iii) Theorem 12.14.2 shows that σ(T ) is compact. Part (ii) shows that σ(T ) \ {0} consists only
of eigenvalues. From Theorem 12.15.11, the set of nonzero eigenvalues of T is at most countable,
with 0 as the only possible accumulation point. If dim(X) < ∞, σ(T ) is finite. If
dim(X) = ∞ then 0 ∈ σ(T ).

12.16. Hilbert Spaces

Suppose H is a vector space over F, where F is either the real numbers R or the complex
numbers C. An inner product on H is a map (·, ·) : H × H → F such that
(a) (x, x) ≥ 0 and if (x, x) = 0, then x = 0,
(b) (x, y) = \overline{(y, x)},
(c) (x + y, z) = (x, z) + (y, z),
(d) (α x, y) = α(x, y)
for all x, y and z in H and α ∈ F.

We will see that the map ‖x‖ = √(x, x) defines a norm on H.

Lemma 12.16.1. (Cauchy–Schwarz) If H is a vector space with inner product (·, ·), then
(12.40) |(x, y)| ≤ ‖x‖‖y‖
for all x, y ∈ H.

Proof. We will assume that F = C as the real case is simple to check. It is enough to
assume that both x and y are not the zero vector; then, for any α ∈ C

0 ≤ ‖x − αy‖² = ‖x‖² − 2 Re(ᾱ(x, y)) + |α|²‖y‖².

Letting α = (x, y)/‖y‖² we obtain that

0 ≤ ‖x‖² − |(x, y)|²/‖y‖²,

whence (12.40) follows.
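
As a numerical sanity check of the argument above, the following Python sketch evaluates both sides of the identity obtained with α = (x, y)/‖y‖² for two vectors in C³. The helper names `inner` and `norm` and the sample vectors are assumptions made only for this illustration.

```python
import math

def inner(x, y):
    # Inner product on C^n: linear in the first argument,
    # conjugate-linear in the second, as in properties (a)-(d).
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

# Arbitrary sample vectors (assumed for illustration only).
x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -1 + 0.5j]

# Scalar used in the proof: alpha = (x, y) / ||y||^2.
alpha = inner(x, y) / norm(y) ** 2
residual = [a - alpha * b for a, b in zip(x, y)]

lhs = norm(residual) ** 2                                  # ||x - alpha y||^2
rhs = norm(x) ** 2 - abs(inner(x, y)) ** 2 / norm(y) ** 2  # ||x||^2 - |(x,y)|^2 / ||y||^2
cs_gap = norm(x) * norm(y) - abs(inner(x, y))              # nonnegative by (12.40)
print(lhs, rhs, cs_gap)
```

The two printed quantities `lhs` and `rhs` should agree up to rounding, and `cs_gap` should be nonnegative.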

Corollary 12.16.2. If H is a vector space with an inner product (·, ·), then (H, ‖·‖) is a
normed space.

Proof. We will only prove the triangle inequality as the other properties of a norm are easy
to verify. For any x, y ∈ H

‖x + y‖² = ‖x‖² + 2 Re(x, y) + ‖y‖² ≤ ‖x‖² + 2|(x, y)| + ‖y‖²
≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
The conclusion follows immediately.

The following relations between the inner product and the induced norm play a very
important role in applications.
Lemma 12.16.3. If H is an inner product vector space and ‖·‖ is the induced norm, then
(12.41) ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²
and
(12.42) (x, y) = (1/4)(‖x + y‖² − ‖x − y‖²) + (i/4)(‖x + iy‖² − ‖x − iy‖²)
for all x, y ∈ H.

Proof. For any x, y ∈ H, we have that

(12.43) ‖x + y‖² = ‖x‖² + 2 Re(x, y) + ‖y‖²,
        ‖x − y‖² = ‖x‖² − 2 Re(x, y) + ‖y‖².
Adding the two equations in (12.43) we obtain (12.41). On the other hand, substituting y
by iy in (12.43) gives
(12.44) ‖x + iy‖² = ‖x‖² + 2 Im(x, y) + ‖y‖²,
        ‖x − iy‖² = ‖x‖² − 2 Im(x, y) + ‖y‖².
Solving for the real and imaginary parts of (x, y) from (12.43) and (12.44)
gives (12.42).
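
The identities (12.41) and (12.42) can be checked numerically. The sketch below does so in C² with the standard inner product; the function names (`inner`, `norm_sq`, `combo`, `polarization`) and the sample vectors are illustrative assumptions.

```python
def inner(x, y):
    # Standard inner product on C^n, linear in the first argument.
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def combo(x, y, c):
    # Returns the vector x + c*y.
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    # Right hand side of (12.42), computed from norms only.
    real_part = (norm_sq(combo(x, y, 1)) - norm_sq(combo(x, y, -1))) / 4
    imag_part = (norm_sq(combo(x, y, 1j)) - norm_sq(combo(x, y, -1j))) / 4
    return complex(real_part, imag_part)

# Arbitrary sample vectors (assumed for illustration only).
x = [1 + 1j, 2 - 3j]
y = [0.5j, -1 + 2j]

# Defect in the parallelogram law (12.41); should vanish up to rounding.
parallelogram_gap = (norm_sq(combo(x, y, 1)) + norm_sq(combo(x, y, -1))
                     - 2 * norm_sq(x) - 2 * norm_sq(y))
# Distance between the polarization formula and the inner product itself.
polarization_gap = abs(polarization(x, y) - inner(x, y))
print(parallelogram_gap, polarization_gap)
```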

The identity (12.41) is known as the parallelogram law. The next result shows that
the parallelogram law is a defining property of any inner product space.
Theorem 12.16.4. (von Neumann–Jordan) A normed space (H, ‖·‖) has an inner product
(·, ·) with ‖x‖ = √(x, x) if and only if (12.41) holds.

Proof. Only sufficiency requires a proof at this point. Let (x, y) be defined by equation (12.42). We will show that (·, ·) satisfies properties (a)–(d). Observe that the continuity
of the norm implies the continuity of (·, ·).
It is clear that (x, 0) = 0, (y, x) = \overline{(x, y)} and that (x, iy) = −i(x, y). Since |1 + i| =
√2 = |1 − i|, it follows that (x, x) = ‖x‖² ≥ 0; moreover, since ‖·‖ is a norm, we have that (x, x) = 0
only if x = 0.

The parallelogram law implies that

(x, y) + (z, y) = (1/2)(‖(x + z)/2 + y‖² − ‖(x + z)/2 − y‖²)
                + (i/2)(‖(x + z)/2 + iy‖² − ‖(x + z)/2 − iy‖²) = 2((x + z)/2, y).

Letting z = 0 shows that (x, y) = 2(x/2, y) for all x, y ∈ H; therefore, (x + z, y) = (x, y) + (z, y).
It follows that (αx, y) = α(x, y) for all α ∈ Q and thus, for any α ∈ R by continuity.
Since (ix, y) = \overline{(y, ix)} = i(x, y), we conclude that (αx + z, y) = α(x, y) + (z, y) for all α ∈ C, and x, y, z ∈ H.
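
To illustrate how Theorem 12.16.4 can be used as a test, the sketch below computes the defect in the parallelogram law (12.41) for the Euclidean norm and for the ℓ¹ norm on R². The choice of norms, the vectors, and the function names are assumptions for this example; the nonzero defect for ℓ¹ indicates that this norm is not induced by any inner product.

```python
import math

def norm_l1(v):
    return sum(abs(t) for t in v)

def norm_l2(v):
    return math.sqrt(sum(abs(t) ** 2 for t in v))

def parallelogram_defect(norm, x, y):
    # ||x + y||^2 + ||x - y||^2 - 2||x||^2 - 2||y||^2; zero iff (12.41) holds for x, y.
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

x = [1.0, 0.0]
y = [0.0, 1.0]

defect_l2 = parallelogram_defect(norm_l2, x, y)  # vanishes up to rounding
defect_l1 = parallelogram_defect(norm_l1, x, y)  # 4 + 4 - 2 - 2 = 4
print(defect_l2, defect_l1)
```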

Two vectors x, y ∈ H are said to be orthogonal whenever (x, y) = 0; the orthogonal

complement of a set ∅ ≠ V ⊂ H is defined as

V^⊥ = {u ∈ H : (v, u) = 0, ∀v ∈ V }.

The following concept is a slight generalization of inner product on a general linear space.

Definition 12.16.5. Suppose X is a complex linear space (no topology needed). A map
f : X × X → C is said to be sesquilinear if for all x, y, z ∈ X and
α ∈ C
(i) f (x + αy, z) = f (x, z) + αf (y, z).
(ii) f (x, y + αz) = f (x, y) + ᾱf (x, z)
If, in addition, f satisfies f (y, x) = \overline{f (x, y)} for all x, y ∈ X, then f is said to be symmetric.

Example 12.16.6. The inner product on a complex vector space H is a symmetric sesquilin-
ear map. For any linear map A : H → H on an inner product space, the map f (x, y) :=
(Ax, y) is sesquilinear (but not necessarily symmetric).

Lemma 12.16.7. Suppose X is a complex linear space. A sesquilinear map f : X × X → C
is symmetric iff f (x, x) ∈ R for all x ∈ X.

Proof. If f is symmetric then f (x, x) = \overline{f (x, x)} for all x ∈ X. This means that f (x, x) is
real.

Conversely, suppose f̃(x) := f (x, x) is real for all x ∈ X. For any x, y ∈ X, a simple
calculation gives

f̃(x + y) = f̃(x) + f (x, y) + f (y, x) + f̃(y),
f̃(x − y) = f̃(x) − f (x, y) − f (y, x) + f̃(y).

From this, and the analogous identities with iy in place of y, it follows that

f (x, y) = (1/4)(f̃(x + y) − f̃(x − y)) + (i/4)(f̃(x + iy) − f̃(x − iy)).

Since f̃ is real valued, f̃(λx) = |λ|² f̃(x) for all x ∈ X and λ ∈ C, and i⁻¹ = −i,
\overline{f (y, x)} = (1/4)(f̃(x + y) − f̃(x − y)) − (i/4)(f̃(y + ix) − f̃(y − ix))
= (1/4)(f̃(x + y) − f̃(x − y)) + (i/4)(f̃(y − ix) − f̃(y + ix))
= (1/4)(f̃(x + y) − f̃(x − y)) + (i/4)(f̃(x + iy) − f̃(x − iy)) = f (x, y).
12.16.1. Hilbert spaces and the Projection Theorem. A Hilbert space is a vector
space H with an inner product such that, under the induced norm x ↦ √(x, x), (H, ‖·‖) is
a complete normed space.
Theorem 12.16.8. (The projection theorem) Let M be a nonempty closed convex subset
of a Hilbert space H. For any x_0 ∈ H, there exists a unique y_0 ∈ M such that
(12.45) ‖x_0 − y_0‖ = inf{‖x_0 − y‖ : y ∈ M }

Proof. Let d be the right hand side of (12.45) and let (y_n) ⊂ M be a sequence such that
‖x_0 − y_n‖ → d as n → ∞. By the parallelogram law,
4‖x_0 − (y_n + y_m)/2‖² + ‖y_n − y_m‖² = 2‖x_0 − y_n‖² + 2‖x_0 − y_m‖².
Since (y_n + y_m)/2 ∈ M , it follows that
‖y_n − y_m‖² ≤ 2‖x_0 − y_n‖² + 2‖x_0 − y_m‖² − 4d² → 0
as n, m → ∞. Therefore, (y_n) is a Cauchy sequence in H, and by completeness and the
closedness of M , there exists y_0 ∈ M such that lim_n ‖y_n − y_0‖ = 0 and thus, ‖x_0 − y_0‖ = d.
To show uniqueness, suppose there is another y∗ ∈ M such that ‖x_0 − y∗‖ = d. Since
(y_0 + y∗)/2 ∈ M , it follows from the parallelogram law that
‖y_0 − y∗‖² = 4d² − 4‖x_0 − (y_0 + y∗)/2‖² ≤ 4d² − 4d² = 0;
that is, y_0 = y∗.
Corollary 12.16.9. Let M be a nonempty closed convex subset of a Hilbert space H. For each x ∈ H
let P_M(x) be the unique vector in M such that ‖x − P_M(x)‖ = inf_{y∈M} ‖x − y‖. Then
(i) P_M(x) = x if x ∈ M and P_M(x) ∈ ∂M if x ∉ M .
(ii) sup_{y∈M} Re(x − P_M(x), y) ≤ Re(x − P_M(x), P_M(x)) ≤ Re(x − P_M(x), x), that is,
the hyperplane through P_M(x) defined by v := x − P_M(x) separates M from x.
(iii) For all x, y in H we have that ‖P_M(x) − P_M(y)‖ ≤ ‖x − y‖, that is, the map
x ↦ P_M(x) is continuous.
(iv) If M is a closed linear subspace of H, then for any x ∈ H, x − P_M(x) ∈ M^⊥.

Proof. (i): If x ∈ M then it is clear that P_M(x) = x. Suppose now that x ∉ M . Consider
the continuous map g : R → H given by λ ↦ λx + (1 − λ)P_M(x). As g(0) = P_M(x), if
P_M(x) belongs to the interior M° of M then g(λ) =: x_λ ∈ M° ⊂ M for all λ > 0 small enough. However, ‖x − x_λ‖ =
(1 − λ)‖x − P_M(x)‖ < ‖x − P_M(x)‖ whenever 0 < λ < 1. This contradicts the
definition of P_M(x). Therefore P_M(x) ∈ ∂M .

(ii): Let x ∈ H, y ∈ M and 0 < λ < 1. Then

‖x − P_M(x)‖² ≤ ‖x − λy − (1 − λ)P_M(x)‖²
= ‖x − P_M(x)‖² − 2λ Re(x − P_M(x), y − P_M(x)) + λ²‖y − P_M(x)‖².
This shows that 2 Re(x − P_M(x), y − P_M(x)) ≤ λ‖y − P_M(x)‖². Letting λ → 0 gives
Re(x − P_M(x), y − P_M(x)) ≤ 0
from where we obtain
Re(x − P_M(x), y) ≤ Re(x − P_M(x), P_M(x)).
To complete the proof of (ii), notice that
0 ≤ ‖x − P_M(x)‖² = Re(x − P_M(x), x − P_M(x)).

(iii): From part (ii) with P_M(y) in place of y we have that

(12.46) Re(x − P_M(x), P_M(y) − P_M(x)) ≤ 0, x, y ∈ H
Exchanging the roles of x and y gives
(12.47) Re(y − P_M(y), P_M(x) − P_M(y)) ≤ 0, x, y ∈ H
Adding (12.46) and (12.47) gives
0 ≥ Re(x − y − (P_M(x) − P_M(y)), P_M(y) − P_M(x))
= − Re(x − y, P_M(x) − P_M(y)) + ‖P_M(y) − P_M(x)‖²
By the Cauchy–Schwarz inequality
‖P_M(y) − P_M(x)‖² ≤ Re(x − y, P_M(x) − P_M(y)) ≤ ‖x − y‖‖P_M(y) − P_M(x)‖,
whence (iii) follows.

(iv) Let x ∈ H and y ∈ M \ {0}. Then, for any α ∈ F,

‖x − P_M x‖² ≤ ‖x − P_M x − αy‖² = ‖x − P_M x‖² − 2 Re(ᾱ(x − P_M x, y)) + |α|²‖y‖².
In particular, for α = t(x − P_M x, y) with t > 0 we obtain
2|(x − P_M x, y)|² ≤ t|(x − P_M x, y)|²‖y‖²
Letting t ↘ 0 implies that (x − P_M x, y) = 0. As y ∈ M is arbitrary, x − P_M(x) ∈ M^⊥.
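
The properties in Corollary 12.16.9 can be observed on a concrete convex set. The sketch below takes M to be the unit cube [0, 1]³ in R³, where the nearest point is obtained by clipping each coordinate (a standard fact, assumed here rather than taken from the text), and records the largest observed values of the quantities bounded in the proof of (ii) and in (iii). All names and the random sampling scheme are illustrative assumptions.

```python
import math
import random

def project_box(x, lo=0.0, hi=1.0):
    # Nearest point of the cube [lo, hi]^n to x: clip each coordinate.
    return [min(max(t, lo), hi) for t in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(u):
    return math.sqrt(dot(u, u))

rng = random.Random(0)  # fixed seed so the run is reproducible
worst_variational = float("-inf")
worst_lipschitz = float("-inf")
for _ in range(200):
    x = [rng.uniform(-3, 3) for _ in range(3)]
    z = [rng.uniform(-3, 3) for _ in range(3)]
    y = [rng.uniform(0, 1) for _ in range(3)]  # a point of M
    px, pz = project_box(x), project_box(z)
    # Proof of (ii): (x - P_M x, y - P_M x) <= 0 for every y in M.
    worst_variational = max(worst_variational, dot(sub(x, px), sub(y, px)))
    # Part (iii): ||P_M x - P_M z|| <= ||x - z||.
    worst_lipschitz = max(worst_lipschitz, norm(sub(px, pz)) - norm(sub(x, z)))
print(worst_variational, worst_lipschitz)
```

Both printed maxima should be nonpositive (up to rounding) if the inequalities hold on the sampled points.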
Corollary 12.16.10. Let M be a closed linear subspace of a Hilbert space H. For any x ∈ H,
there is a unique decomposition
x = P x + Qx, P x ∈ M, Qx ∈ M^⊥.
P x and Qx are the nearest points to x in M and M^⊥ respectively. Moreover, the maps
x ↦ P x and x ↦ Qx are linear.

Proof. To prove uniqueness, suppose x = x1 + y1 = x2 + y2 where xj ∈ M and yj ∈ M ⊥ .

Then
x1 − x2 = y2 − y1 ∈ M ∩ M ⊥ = {0};
that is, x1 = x2 and y1 = y2 .

To prove the existence of the decomposition above, let x ∈ H. Let P x := P_M x ∈ M be
the unique vector such that ‖x − P x‖ = inf_{y∈M} ‖x − y‖. Corollary 12.16.9[(iv)] shows that
Qx = x − P x ∈ M^⊥.
To show linearity of P and Q, notice that
P (αx + βy) + Q(αx + βy) = αx + βy = α(P x + Qx) + β(P y + Qy).
Hence, by uniqueness of the orthogonal decomposition, we have that P (αx + βy) = αP x +
βP y and Q(αx + βy) = αQx + βQy.

The linear transformation P is called the orthogonal projection of H onto M . The

following consequences of the projection theorem are very useful in applications.
Corollary 12.16.11. Let M , N be closed linear subspaces of a Hilbert space H such that
M ⊂ N . Then, P_M P_N = P_M , where P_M and P_N are the orthogonal projections of H onto
M and N respectively.

Proof. It is enough to show that x − P_M P_N x ∈ M^⊥. For any y ∈ M ,

⟨x − P_M P_N x, y⟩ = ⟨x − P_N x, y⟩ + ⟨P_N x − P_M P_N x, y⟩ = 0,
for y ∈ M ⊂ N .
Corollary 12.16.12. Let E_n ⊂ E_{n+1}, n ∈ N, be closed linear subspaces of a Hilbert space
H. Let P_n and P_∞ be the orthogonal projections of H onto E_n and E = \overline{⋃_n E_n} (the closure of the union) respectively.
Then, for any x ∈ H,
lim_n ‖P_n x − P_∞ x‖ = 0.

Proof. For any x ∈ H and ε > 0, since P_∞ x belongs to the closure of ⋃_n E_n, there is x_N ∈ E_N , N ∈ N, such that
‖P_∞ x − x_N‖ < ε. Since E_N ⊂ E_n whenever n ≥ N , and P_n = P_n P_∞ by Corollary 12.16.11,
‖P_∞ x − P_n x‖ = ‖P_∞ x − P_n P_∞ x‖ ≤ ‖P_∞ x − x_N‖ < ε.
This shows that P_n x → P_∞ x in H.

12.16.2. Adjoint operator. An important application of the projection theorem is the
representation of continuous functionals on a Hilbert space.
Theorem 12.16.13. (Riesz representation) Suppose (H, ‖·‖) is a Hilbert space and denote
by ‖·‖_{H∗} the sup norm induced on its dual space H∗. For any y∗ ∈ H∗, there exists a unique
y ∈ H such that
y∗(x) = (x, y), x ∈ H.
The map V : H → H∗ defined by (V y)(x) = (x, y) is a sesquilinear isometry from H onto
H∗, that is V (y + λz) = V (y) + λ̄V (z) and ‖V (y)‖_{H∗} = ‖y‖.

Proof. We first show uniqueness. If (x, y) = (x, y′) for all x ∈ H, then (x, y − y′) = 0 for
all x ∈ H. In particular, if x = y − y′, we conclude that ‖y − y′‖ = 0; therefore, y = y′.
If y∗ ≡ 0, then y = 0 represents the functional. Suppose that y∗ ≠ 0 and let u ∈ H
so that y∗(u) ≠ 0. By continuity, M = {x : y∗(x) = 0} is a closed linear subspace. Let
u = P u + Qu, with P u ∈ M and Qu ∈ M^⊥. Then, Qu ≠ 0 and v = (1/‖Qu‖)Qu is a well
defined unit vector in M^⊥. If w = y∗(x) v − y∗(v) x, then y∗(w) = 0 and

0 = (w, v) = y∗(x) − y∗(v) (x, v).

Hence, if y = \overline{y∗(v)} v, then y∗(x) = (x, y) = (V (y))(x) for all x ∈ H.

For any y, z ∈ H and λ ∈ F we have that

(V (y + λz))(x) = (x, y + λz) = (x, y) + λ̄(x, z) = (V (y))(x) + λ̄(V (z))(x),
which shows that V is conjugate–linear. To conclude, notice that

‖V (y)‖ = sup_{‖x‖=1} |(V (y))(x)| = sup_{‖x‖=1} |(x, y)| = ‖y‖.

This shows that V is an isometry.
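
In C^n the construction in the proof can be carried out explicitly. The sketch below takes a functional y∗(x) = Σ c_k x_k on C³ (the coefficients `c` are an arbitrary assumption), builds the unit vector v orthogonal to its kernel, forms y = \overline{y∗(v)} v, and compares y∗(x) with (x, y). Function and variable names are illustrative only.

```python
import math

def inner(x, y):
    # Inner product on C^n, linear in the first argument.
    return sum(a * b.conjugate() for a, b in zip(x, y))

# Coefficients of the functional y*(x) = sum_k c_k x_k (assumed sample data).
c = [2 - 1j, 3j, -1 + 0.5j]

def functional(x):
    return sum(ck * xk for ck, xk in zip(c, x))

# Unit vector orthogonal to ker(y*): v = conj(c) / ||c||.
norm_c = math.sqrt(sum(abs(ck) ** 2 for ck in c))
v = [ck.conjugate() / norm_c for ck in c]

# Representing vector from the proof: y = conj(y*(v)) v.
y = [functional(v).conjugate() * vk for vk in v]

x = [1 + 1j, -2 + 0j, 0.5 - 3j]   # arbitrary test vector
gap = abs(functional(x) - inner(x, y))
norm_y = math.sqrt(inner(y, y).real)
print(gap, norm_y, norm_c)
```

In this finite dimensional setting y is the entrywise conjugate of `c`, and its norm agrees with the norm of the coefficient vector, in line with the isometry statement.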

Remark 12.16.14. Since (H, ‖·‖) and (H∗, ‖·‖_{H∗}) are isometric, H∗ satisfies the parallelogram law (12.41). Hence H∗ admits an inner product ⟨·|·⟩ which generates ‖·‖_{H∗}. The
polar formula (12.42) implies that
⟨V (x)|V (y)⟩ = \overline{(x, y)} = (y, x), x, y ∈ H
where (·, ·) is the inner product on H. If y∗∗ ∈ H∗∗, there is y∗ ∈ H∗ such that y∗∗(x∗) = ⟨x∗|y∗⟩
for all x∗ ∈ H∗. Let V⁻¹(x∗) = x and V⁻¹(y∗) = y. Then
y∗∗(x∗) = ⟨x∗|y∗⟩ = ⟨V (x)|V (y)⟩ = \overline{(x, y)} = (y, x) = x∗(y)
for all x∗ ∈ H∗. This shows that H is reflexive.
Theorem 12.16.15. (Weakly sequential compactness in Hilbert spaces) If {x_n : n ∈ N} is
a bounded sequence in a Hilbert space H, then there exists a subsequence x_{n_k} and a point
x ∈ H such that x_{n_k} converges to x in σ(H, H).

Proof. Consider the closed linear span G = \overline{span}({x_n : n ∈ N}). G is a separable Hilbert space, and
by Alaoglu’s theorem and Theorem 12.12.9 there exist x ∈ G and a subsequence x_{n_k} such
that x_{n_k} → x in σ(G, G), that is, for any g ∈ G, lim_k ⟨g, x_{n_k}⟩ = ⟨g, x⟩. Let P_G be the
orthogonal projection from H onto G. Then, for any u ∈ H, u = P_G u + (I − P_G)u and
⟨g, (I − P_G)u⟩ = 0 for all g ∈ G. Hence lim_k ⟨u, x_{n_k}⟩ = lim_k ⟨P_G u, x_{n_k}⟩ = ⟨P_G u, x⟩ = ⟨u, x⟩. Therefore
x_{n_k} → x in σ(H, H).

Theorem 12.16.16. For any T ∈ L(H, H) there exists a unique map T∗ : H → H such that
(T x, y) = (x, T∗y)
for all x, y ∈ H. Moreover, T∗ ∈ L(H, H) and ‖T∗‖ = ‖T‖. The operator T∗ is called
the adjoint of T . The adjoint and the transpose of T are related by
T∗ = V⁻¹ T† V,
where V is the Riesz representation map in Theorem 12.16.13. Furthermore, the map
T ↦ T∗ on L(H) satisfies
(i) (λT + S)∗ = λ̄T∗ + S∗ for all λ ∈ C and T, S ∈ L(H).
(ii) (T S)∗ = S∗T∗ for all T, S ∈ L(H).
(iii) (T∗)∗ = T for all T ∈ L(H).
(iv) If T is invertible, then so is T∗ and (T∗)⁻¹ = (T⁻¹)∗.

Proof. For fixed y ∈ H, the map x ↦ (T x, y) is linear and bounded. Therefore, the Riesz
representation theorem implies that there is a unique T∗y ∈ H such that
(12.48) (T x, y) = (x, T∗y), x ∈ H.

The left hand side of (12.48) can be expressed as (T†V (y))(x) = (V (y) ∘ T )(x), whereas
the right hand side of (12.48) can be expressed as (V (T∗y))(x). Therefore T† ∘ V = V ∘ T∗.

For all x, y and z in H and λ ∈ F we have that

(x, T∗(y + λz)) = (T x, y + λz) = (T x, y) + λ̄(T x, z)
= (x, T∗y) + λ̄(x, T∗z) = (x, T∗y + λT∗z).
Hence T∗(y + λz) = T∗y + λT∗z, that is, T∗ is linear.

From the definition of the operator norm

‖T∗‖ = sup_{‖y‖=1} ‖T∗y‖ = sup_{‖y‖=1} sup_{‖x‖=1} |(x, T∗y)| = sup_{‖y‖=1} sup_{‖x‖=1} |(T x, y)|
= sup_{‖x‖=1} sup_{‖y‖=1} |(T x, y)| = sup_{‖x‖=1} ‖T x‖ = ‖T‖.

Therefore T∗ ∈ L(H, H) and ‖T‖ = ‖T∗‖.

Properties (i)–(iv) are easy to verify and this is left as an exercise.
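
For H = C^n with the standard inner product, the adjoint of a matrix is its conjugate transpose (a standard fact, used here as an assumption). The sketch below checks the defining relation (T x, y) = (x, T∗y) and property (iii) on a sample 2 × 2 matrix; the matrix, the vectors, and the helper names are illustrative.

```python
def inner(x, y):
    # Inner product on C^n, linear in the first argument.
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def adjoint(A):
    # Conjugate transpose: entry (j, i) of the result is conj(A[i][j]).
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

# Sample operator and vectors (assumed for illustration only).
T = [[1 + 2j, -1j],
     [3 + 0j, 2 - 1j]]
x = [1 - 1j, 2 + 0.5j]
y = [-1j, 1 + 1j]

lhs = inner(matvec(T, x), y)            # (T x, y)
rhs = inner(x, matvec(adjoint(T), y))   # (x, T* y)
double_adjoint = adjoint(adjoint(T))    # should reproduce T, property (iii)
print(lhs, rhs)
```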

Corollary 12.16.17. For any T ∈ L(H), ‖T‖ = ‖T∗‖ = √‖T∗T‖.

Proof. The first equality has been proved already. Let x ∈ H with ‖x‖ = 1. It follows
from
‖T x‖² = (T x, T x) = (x, T∗T x) ≤ ‖x‖‖T∗T x‖ ≤ ‖T∗T‖ ≤ ‖T∗‖‖T‖ = ‖T‖²
that ‖T‖² ≤ ‖T∗T‖ ≤ ‖T‖².

Corollary 12.16.18. Suppose H is a Hilbert space. If T ∈ L(H), then σ(T∗) = {λ̄ : λ ∈
σ(T )}. If in addition T is compact, then so is T∗.

Proof. The first statement follows from σ(T ) = σ(T†) and the fact that
T∗ − λ̄I = (T − λI)∗ = V⁻¹ ∘ (T − λI)† ∘ V = V⁻¹ ∘ (T† − λI†) ∘ V
The last statement follows from the fact that V is an isometry (sesquilinear though).

The adjoint map on L(H) is an example of a more general concept which we define
below.
Definition 12.16.19. A C∗–algebra is a complex Banach algebra A together with a map
∗ from A into itself (called involution) that satisfies
(a) (λx)∗ = λ̄x∗ for any λ ∈ C and x ∈ A.
(b) (x∗)∗ = x for all x ∈ A.
(c) (xy)∗ = y∗x∗ for all x, y ∈ A.
(d) ‖x∗x‖ = ‖x‖² for all x ∈ A.
Remark 12.16.20. In a C∗–algebra, (a)–(d) imply that ‖x∗‖ = ‖x‖. Indeed, ‖x‖² =
‖x∗x‖ ≤ ‖x∗‖‖x‖ implies ‖x‖ ≤ ‖x∗‖. Applying this to x∗ gives ‖x∗‖ ≤ ‖(x∗)∗‖ = ‖x‖.
Example 12.16.21. By Theorem 12.16.16 and Corollary 12.16.17, L(H) is a C∗–algebra.
Theorem 12.16.22. Any non–unital complex Banach ring A with an involution operator
is isometrically ∗–isomorphic to a C∗–subalgebra of codimension one in a C∗–algebra.

Proof. Suppose A is a non–unital ring with an involution operator. For any a ∈ A and
λ ∈ C the operator L_a + λI, where L_a x = ax, belongs to L(A). Since ‖L_a x‖ = ‖ax‖ ≤ ‖a‖‖x‖,
‖L_a‖ ≤ ‖a‖. On the other hand,
‖L_a a∗‖ = ‖aa∗‖ = ‖a‖² = ‖a∗‖² = ‖a‖‖a∗‖
Hence, ‖L_a‖ = ‖a‖, that is, a ↦ L_a is an isometric homomorphism from A into L(A). By
defining (L_a + λI)∗ := L_{a∗} + λ̄I, we have that a ↦ L_a preserves the involution, that is
a∗ ↦ L_{a∗} = (L_a)∗. It is readily seen that Ã := {L_a + λI : (a, λ) ∈ A × C} is a subalgebra
of L(A) and has unit L_0 + I = I.
Claim I: Ã is closed in L(A). First notice that if ‖L_a + λI‖ = 0 then λ = 0; otherwise
(−λ⁻¹a)x = x and x(−λ̄⁻¹a∗) = x for all x ∈ A. Consequently
−λ⁻¹a = (−λ⁻¹a)(−λ̄⁻¹a∗) = (−λ̄⁻¹a∗)
This means that A has a unit, which contradicts the assumption on A. Hence λ = 0 and
‖L_a‖ = 0. Since a ↦ L_a is an isometry, we have that a = 0. Let φ(L_a + λI) := λ. This is
a well defined linear map on Ã since L_a + λI = L_b + βI implies that L_{a−b} + (λ − β)I = 0
and so, a = b and λ = β. The kernel of φ is {L_a : a ∈ A}, which is a closed subspace
of L(A), for a ↦ L_a is an isometry and A is a Banach space. This shows that φ ∈ Ã∗.

If {L_{a_n} + λ_n I : n ∈ N} is a Cauchy sequence in Ã, then φ(L_{a_n} + λ_n I) = λ_n is a Cauchy
sequence in C and so, λ_n converges to some λ ∈ C. This implies that
‖a_n − a_m‖ = ‖L_{a_n} − L_{a_m}‖ ≤ ‖L_{a_n} + λ_n I − (L_{a_m} + λ_m I)‖ + |λ_n − λ_m|
Hence a_n is a Cauchy sequence in A and so, it converges to some a ∈ A. Putting things
together we have that L_{a_n} + λ_n I converges to (L_a + λI) ∈ Ã. This proves the claim.
It remains to show that ∗ is an involution on Ã. Notice that
‖(L_a + λI)x‖² = ‖ax + λx‖² = ‖(ax + λx)∗(ax + λx)‖
= ‖(x∗a∗ + λ̄x∗)(ax + λx)‖ = ‖x∗(L_a + λI)∗(L_a + λI)x‖
≤ ‖(L_a + λI)∗(L_a + λI)‖‖x∗‖‖x‖ = ‖(L_a + λI)∗(L_a + λI)‖‖x‖²
Hence,
‖L_a + λI‖² ≤ ‖(L_a + λI)∗(L_a + λI)‖ ≤ ‖(L_a + λI)∗‖‖L_a + λI‖
That is, ‖L_a + λI‖ ≤ ‖(L_a + λI)∗‖. Replacing a by a∗ and λ by λ̄ gives the reverse inequality, so that
‖L_a + λI‖² ≤ ‖(L_a + λI)∗(L_a + λI)‖ ≤ ‖L_a + λI‖²
Thus, Ã is a C∗–algebra containing a closed ideal {L_a : a ∈ A} that is isometrically isomorphic to A.

From the parallelogram law, a map U ∈ L(H) is an isometry iff

(U x, U y) = (x, y), x, y ∈ H.
If U is a linear isometry, then it is easy to check that
U∗U x = x, x ∈ H, U U∗y = y, y ∈ U (H).
If U is a surjective isometry in H, then U is said to be a unitary operator. In this case,
U is invertible and U⁻¹ = U∗.
Example 12.16.23. While any isometry on any finite dimensional Hilbert space is unitary,
this is not the case for infinite dimensional spaces. For instance, the map S : ℓ² → ℓ² defined
as Sx(n) = x(n − 1)1(n ≥ 2) is an isometry on ℓ² but not onto. To see this, consider the
maps e_n : m ↦ δ_{nm} and notice that e_1 ∉ S(ℓ²) = {e_1}^⊥.
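
The shift S can be simulated on finitely supported sequences, stored as Python lists with the trailing zeros left implicit. The sketch below checks that S preserves the norm and that the first coordinate of Sx always vanishes; it also uses the left shift, which is the adjoint of S on ℓ² (a standard fact not stated in the text), to illustrate that S∗S = I while SS∗e_1 = 0 ≠ e_1. All names are illustrative assumptions.

```python
import math

def right_shift(x):
    # (Sx)(1) = 0 and (Sx)(n) = x(n - 1) for n >= 2.
    return [0.0] + list(x)

def left_shift(x):
    # (S*x)(n) = x(n + 1); drops the first coordinate.
    return list(x[1:])

def norm(x):
    return math.sqrt(sum(abs(t) ** 2 for t in x))

x = [3.0, -4.0, 12.0]   # a finitely supported sequence
e1 = [1.0, 0.0, 0.0]    # first coordinate vector

norm_before = norm(x)                       # 13.0
norm_after = norm(right_shift(x))           # unchanged by S
first_coordinate = right_shift(x)[0]        # always 0, so Sx is orthogonal to e1
recovered = left_shift(right_shift(x))      # S*S x = x
not_recovered = right_shift(left_shift(e1)) # S S* e1 = 0
print(norm_before, norm_after, first_coordinate, recovered, not_recovered)
```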
Lemma 12.16.24. Let H be a Hilbert space and suppose M is a closed linear subspace. If
P_M is the orthogonal projection from H onto M and U is an isometry in H, then U (M ) =
{U x : x ∈ M } is a closed linear subspace of H and
P_{U(M)} U = U P_M .

Proof. That U (M ) = {U m : m ∈ M } is a closed linear subspace of H is a consequence
of U being an isometry because U maps Cauchy sequences into Cauchy sequences. Notice
that U P_M x ∈ U (M ) for any x ∈ H. For any m ∈ M ,
(U x − U P_M x, U m) = (x − P_M x, m) = 0.
Since U (M ) is a closed subspace of H, it follows that U P_M x = P_{U(M)} U x.

Definition 12.16.25. Suppose T is a bounded operator on a Hilbert space H. T is called
self–adjoint or hermitian if T ∗ = T . T is called normal if T T ∗ = T ∗ T .
Lemma 12.16.26. Suppose H is a complex Hilbert space. A bounded linear map T : H → H is
self–adjoint iff (T x, x) ∈ R for all x ∈ H.

Proof. The sesquilinear map f (x, y) = (T x, y) is symmetric iff T is self–adjoint. The
conclusion follows from Lemma 12.16.7.

It is clear that unitary operators as well as self–adjoint operators are normal.

Example 12.16.27. Any orthogonal projection P of a Hilbert space H onto a closed subspace M ⊂ H
is self–adjoint. Indeed, for any x, y ∈ H, we have that (x − P x, P y) = 0 = (P x, y − P y);
hence
(P x, y) = (P x, y − P y) + (P x, P y) = (P x − x, P y) + (x, P y) = (x, P y).
If P ≠ 0 and P ≠ I then {0, 1} ⊂ σ(P ). If λ ∉ {0, 1} then
−(1/λ)(I + (1/(λ − 1))P )(P − λI) = −(1/λ)(P − λI)(I + (1/(λ − 1))P ) = I
which means that λ ∈ ρ(P ). Notice that if x ∈ P (H), P x = x; and if x ∈ (I − P )(H),
P x = 0. Thus σ(P ) = σ_P(P ) = {0, 1}.
Example 12.16.28. Suppose (Ω, F , µ) is a σ–finite measure space. Let k ∈ L²(Ω × Ω, µ ⊗ µ)
and define
T_k f (x) = ∫ k(x, y)f (y) µ(dy), f ∈ L²(µ)
T_k is a bounded operator in L²(Ω, µ) and T_k∗ g(y) = ∫ \overline{k(x, y)} g(x) µ(dx). If k(x, y) = \overline{k(y, x)}
for all (x, y) ∈ Ω × Ω, then T_k is self–adjoint.
Lemma 12.16.29. Any bounded operator T in a (complex) Hilbert space can be expressed
uniquely as
T = R + iJ
where R and J are self–adjoint operators.

Proof. This is similar to the decomposition of complex numbers into their real and imaginary
parts. If such a decomposition exists, then
T + T∗ = (R + iJ) + (R∗ − iJ∗) = 2R
T − T∗ = (R + iJ) − (R∗ − iJ∗) = 2iJ
Therefore, the decomposition exists, is unique and
R = (1/2)(T + T∗), J = (1/(2i))(T − T∗)
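
For matrices, where the adjoint is the conjugate transpose (assumed here as the finite dimensional model), the decomposition can be computed directly from the two formulas above. The following sketch does so for a sample 2 × 2 complex matrix and checks that R and J are self–adjoint and that R + iJ recovers T; the matrix and helper names are illustrative.

```python
def adjoint(A):
    # Conjugate transpose of a square matrix given as a list of rows.
    n = len(A)
    return [[A[i][j].conjugate() for i in range(n)] for j in range(n)]

def combine(A, B, a, b):
    # Entrywise linear combination a*A + b*B.
    return [[a * p + b * q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_diff(A, B):
    return max(abs(p - q) for ra, rb in zip(A, B) for p, q in zip(ra, rb))

# Sample operator on C^2 (assumed for illustration only).
T = [[1 + 2j, 3 - 1j],
     [0.5j, -2 + 1j]]
T_star = adjoint(T)

R = combine(T, T_star, 0.5, 0.5)        # R = (T + T*) / 2
J = combine(T, T_star, -0.5j, 0.5j)     # J = (T - T*) / (2i), since 1/(2i) = -i/2

r_defect = max_diff(R, adjoint(R))              # R is self-adjoint
j_defect = max_diff(J, adjoint(J))              # J is self-adjoint
t_defect = max_diff(combine(R, J, 1, 1j), T)    # R + iJ = T
print(r_defect, j_defect, t_defect)
```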
Theorem 12.16.30. Suppose H is a Hilbert space. If T ∈ L(H) is self–adjoint, then
(i) (T x, x) is real for all x ∈ H and so, σ_P(T ) ⊂ R.

Define

(12.49) |||T ||| := sup_{‖x‖=1} |(T x, x)|

(ii) For any x, y ∈ H

|(T x, y)| ≤ |||T |||‖x‖‖y‖

(iii) ‖T‖ = |||T |||

Proof. Since T is self–adjoint, (T x, y) = (x, T y) = \overline{(T y, x)} for all x, y ∈ H.

(i) The first statement follows by taking x = y. The second follows by taking a unit eigenvector
x corresponding to an eigenvalue λ, so that λ = (T x, x) ∈ R.

To prove (ii) it is enough to assume that ‖x‖ = ‖y‖ = 1. It follows from the sesquilinearity
of (x, y) ↦ (T x, y) that
(T x, y) = (1/4)[(T (x + y), x + y) − (T (x − y), x − y)]
+ (i/4)[(T (x + iy), x + iy) − (T (x − iy), x − iy)]

for all x, y ∈ H. Since each (T u, u) is real, taking real parts and using the parallelogram law gives

Re(T x, y) ≤ (|||T |||/4)(‖x + y‖² + ‖x − y‖²) = (|||T |||/2)(‖x‖² + ‖y‖²) = |||T |||.

Replacing y by e^{iθ}y, with θ ∈ R chosen so that (T x, e^{iθ}y) = |(T x, y)|, yields |(T x, y)| ≤ |||T |||.

(iii) The Cauchy–Schwarz inequality implies that |(T x, y)| ≤ ‖T‖‖x‖‖y‖. Consequently

|||T ||| = sup_{‖x‖=1} |(T x, x)| ≤ ‖T‖ = sup_{‖x‖=1} sup_{‖y‖=1} |(T x, y)| ≤ |||T |||

Corollary 12.16.31. Suppose H is a (complex) Hilbert space and let T ∈ L(H). If

(T x, x) = 0 for all x ∈ H, then T = 0.

Proof. Since (T x, x) = (x, T∗x), then (T x, x) = 0 for all x implies that (T∗x, x) = 0 for
all x. Let T = R + iJ be the real and imaginary decomposition of T . It follows from the
expressions for R and J that (Rx, x) = 0 = (Jx, x) for all x ∈ H. Since R and J are
self–adjoint, we conclude that ‖R‖ = |||R||| = 0 = |||J||| = ‖J‖. Therefore R = 0 = J and so
T = 0.
Theorem 12.16.32. Suppose H is a (complex) Hilbert space and let T ∈ L(H). The
following statements are equivalent.
(i) T is normal.
(ii) ‖T x‖ = ‖T∗x‖ for all x ∈ H.
(iii) The real and imaginary parts of T as in Lemma 12.16.29 commute.

Proof. Notice that ‖T x‖² − ‖T∗x‖² = (T∗T x, x) − (T T∗x, x) = ((T∗T − T T∗)x, x) for all
x ∈ H. Hence (i) clearly implies (ii), and by Corollary 12.16.31, (ii) implies (i).

Let T = R + iJ be the real–imaginary decomposition of T . Then

T∗T = R² + i(RJ − JR) + J²
T T∗ = R² − i(RJ − JR) + J²
Hence T∗T − T T∗ = 2i(RJ − JR). The equivalence between (i) and (iii) follows.

12.16.3. Orthonormal systems. In inner product spaces, families of orthogonal vectors
are very helpful, as the coefficients of the elements in the closure of their span can be
easily computed. In a Hilbert space H, a collection of orthonormal vectors P ⊂ H forms
a complete orthonormal system (orthonormal basis for short) if span(P ) is dense in H.
In this section we show such a system exists in any Hilbert space. For separable Hilbert
spaces, we show a procedure to construct such a system.
Theorem 12.16.33. (Bessel’s inequality) Suppose that H is a Hilbert space and that {e_n :
n ∈ 𝒩} is a collection of orthonormal (orthogonal and of unit norm) vectors, that is (e_n, e_m) =
δ_{n,m}. Then, for any x ∈ H,
(12.50) Σ_n |(x, e_n)|² ≤ ‖x‖².

In particular, if 𝒩 = N, then lim_n |(e_n, x)| = 0.

Proof. For a finite set I ⊂ 𝒩, let M_I = span{e_n : n ∈ I} and define

Q_I x = x − Σ_{n∈I} (x, e_n)e_n.

Since (Q_I x, e_m) = 0 for all m ∈ I, the orthogonal projection P_I of H onto M_I satisfies

P_I(x) = Σ_{n∈I} (x, e_n)e_n, x ∈ H.

Clearly
‖x‖² = ‖P_I x‖² + ‖Q_I x‖² ≥ ‖P_I x‖² = Σ_{n∈I} |(x, e_n)|²

Inequality (12.50) follows by taking the supremum over all finite subsets I ⊂ 𝒩.
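
A small numerical instance of the proof: for two orthonormal vectors in R³ the sketch below computes the coefficients (x, e_n), the projection P_I x, the remainder Q_I x, and checks ‖x‖² = ‖P_I x‖² + ‖Q_I x‖² together with Bessel's inequality. The vectors and helper names are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
# Two orthonormal vectors in R^3 (they do not span the whole space).
e = [[s, s, 0.0], [s, -s, 0.0]]
x = [1.0, 2.0, 3.0]

coeffs = [dot(x, en) for en in e]             # (x, e_n)
bessel_sum = sum(c ** 2 for c in coeffs)      # sum_n |(x, e_n)|^2
# P_I x = sum_n (x, e_n) e_n and Q_I x = x - P_I x.
px = [sum(c * en[i] for c, en in zip(coeffs, e)) for i in range(3)]
qx = [a - b for a, b in zip(x, px)]

norm_sq_x = dot(x, x)                              # 14
pythagoras_gap = norm_sq_x - bessel_sum - dot(qx, qx)
print(bessel_sum, norm_sq_x, qx)
```

Here the Bessel sum is approximately 5 while ‖x‖² = 14; the difference is carried by Q_I x, which is approximately (0, 0, 3).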
Theorem 12.16.34. (Parseval) Let H be a Hilbert space. There exists a maximal family
G ⊂ H of orthonormal vectors such that H = \overline{span}(G ). If in addition, H is separable, then
G is countable and
(12.51) lim_{n→∞} ‖x − Σ_{k=1}^{n} (x, e_k)e_k‖ = 0, and ‖x‖² = Σ_n |(x, e_n)|²

for all x ∈ H, where G = {e_n : n ∈ N}.


Proof. Consider the family S of all collections of orthonormal vectors partially ordered by
inclusion. It is clear that ⋃C is an orthonormal family of vectors whenever C ⊂ S is a chain of
orthonormal families. By Zorn’s lemma, there exists a maximal orthonormal family G in H.
Let M = \overline{span}(G ). We claim that M = H. If not, there is u ∈ H \ M and u = P u + Qu
for some P u ∈ M , Qu ∈ M^⊥, and Qu ≠ 0. It follows that G ∪ {(1/‖Qu‖)Qu} is an orthonormal
collection, in contradiction to the maximality of G . Therefore H = \overline{span}(G ).
If H is separable, then any maximal orthonormal family G is countable. Indeed, for any
distinct orthonormal vectors e and e′ we have that ‖e − e′‖ = √2. If D is a countable dense subset
of H, then for each e ∈ G one can choose u(e) ∈ D such that ‖e − u(e)‖ < √2/4. It follows
that ‖u(e) − u(e′)‖ ≥ √2/2 whenever e ≠ e′, so e ↦ u(e) is one to one; consequently, G is countable.
Let (e_n : n ∈ N) be an enumeration of the elements of G . Bessel’s inequality implies that
the sequence s_n = Σ_{k=1}^{n} (x, e_k)e_k is a Cauchy sequence in H; let s = Σ_n (x, e_n)e_n be its limit.
Since (x − s, e_m) = 0 for all m ∈ N, it follows that (x − s) ∈ span(G )^⊥ = H^⊥ = {0}. A
simple calculation shows that
‖x − Σ_{k=1}^{n} (x, e_k)e_k‖² = ‖x‖² − 2 Σ_{k=1}^{n} |(x, e_k)|² + Σ_{k=1}^{n} |(x, e_k)|².
After letting n → ∞, (12.51) follows immediately.
Theorem 12.16.35. (Gram–Schmidt orthogonalization) Suppose {x_n : n ∈ N} ⊂ H is a
sequence of linearly independent vectors in a Hilbert space H. Let M_0 = {0}, and for n ≥ 1
let M_n = span(x_1, . . . , x_n). There exists an orthonormal sequence {u_n : n ∈ N} ⊂ H such
that for each n ∈ N
(i) u_n ∈ M_n ∩ M_{n−1}^⊥.
(ii) M_n = span(u_1, . . . , u_n).
If {v_n : n ∈ N} is another orthonormal sequence satisfying (i)–(ii), then v_n = λ_n u_n, where
λ_n ∈ S¹.

Proof. For n = 1 define u_1 = ‖x_1‖⁻¹x_1. Clearly conditions (i)–(ii) are satisfied. Assume
vectors {u_1, . . . , u_n}, n ≥ 1, have been constructed so that (i) and (ii) hold. Let P_n be the
orthogonal projection from H onto M_n. Define
u′_{n+1} = (I − P_n)x_{n+1}
u_{n+1} = (1/‖u′_{n+1}‖) u′_{n+1}
Clearly u′_{n+1} ∈ M_{n+1} ∩ M_n^⊥, and since the vectors x_1, . . . , x_{n+1} are linearly independent, ‖u′_{n+1}‖ >
0. Since
P_n x_{n+1} = Σ_{j=1}^{n} (x_{n+1}, u_j)u_j
x_{n+1} = P_n x_{n+1} + ‖u′_{n+1}‖ u_{n+1}
it follows that M_{n+1} = span(u_1, . . . , u_{n+1}). This completes our construction. The last
statement is easily proved by induction.
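
The inductive construction in the proof translates directly into a procedure. The sketch below is one possible rendering for vectors in R^n or C^n, assuming the standard inner product; the function name `gram_schmidt`, the tolerance `1e-12`, and the sample vectors are illustrative choices, not part of the text.

```python
import math

def inner(x, y):
    # Inner product linear in the first argument (works for real or complex entries).
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def gram_schmidt(vectors):
    basis = []
    for x in vectors:
        # u' = (I - P_n) x, where P_n x = sum_j (x, u_j) u_j.
        residual = list(x)
        for u in basis:
            coeff = inner(x, u)
            residual = [r - coeff * t for r, t in zip(residual, u)]
        length = norm(residual)
        if length < 1e-12:  # tolerance is an arbitrary choice for this sketch
            raise ValueError("vectors are not linearly independent")
        basis.append([r / length for r in residual])
    return basis

# Linearly independent sample vectors in R^3 (assumed for illustration only).
xs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
us = gram_schmidt(xs)

# Largest deviation of the Gram matrix ((u_i, u_j)) from the identity.
gram_error = max(abs(inner(us[i], us[j]) - (1.0 if i == j else 0.0))
                 for i in range(3) for j in range(3))
print(us, gram_error)
```

This is the classical form of the procedure; in floating point arithmetic, variants that subtract the projections one at a time from the running residual are often preferred for stability.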

12.16.4. Compact operators in Hilbert spaces. Compact operators on Hilbert spaces
are in a way more similar to operators on finite dimensional spaces than compact operators on general Banach spaces. We will
see for instance, that any T ∈ Lc (H) may be approximated by operators of finite range.
Also, from Theorem 12.15.12 we know that if T ∈ Lc (H), σ(T ) \ {0} consists of an at most
countable collection of eigenvalues and, when there are infinitely many of them, the eigenvalues can be ordered
in a sequence {λ_n : n ∈ N} so that |λ_{n+1}| ≤ |λ_n| and lim_n λ_n = 0. If in addition T is self–adjoint, σ(T ) ⊂ R. The interesting fact we will study here is that a self–adjoint compact
operator is in fact completely determined by its values on eigenvectors.
We now consider self–adjoint compact operators and show that they are similar to
their finite dimensional counterparts.
Theorem 12.16.36. Suppose H is a Hilbert space. An operator T ∈ L(H) is compact iff
T can be approximated in L(H) by a sequence of operators with finite dimensional range.

Proof. Sufficiency is a direct consequence of Theorems 12.15.4 and 12.15.5[(i)].

Necessity: Let U be the unit ball in H and let L = \overline{T (H)}. L is a Hilbert space and since T (U )
is totally bounded, L is separable. Therefore, by Parseval’s theorem L admits a sequence
of orthonormal vectors Φ = {φ_n : n ∈ N} such that \overline{span}(Φ) = L. Let P_n be the orthogonal projection
from H onto span{φ_j : 1 ≤ j ≤ n}. Each T_n := P_n T is a bounded operator with finite dimensional
range.

By assumption K = \overline{T (U )} is compact in L. Notice that

‖T − T_n‖ = ‖(I − P_n)T‖ ≤ sup_{y∈K} ‖(I − P_n)y‖

To complete the proof of the Theorem, it suffices to show that the sequence of functions
g_n(y) = ‖(I − P_n)y‖ defined on K converges uniformly to 0. Bessel’s inequality
and (12.51) show that g_n converges pointwise to 0 since
g_n²(y) = Σ_{k>n} |(y, φ_k)|² → 0

for each y ∈ K. For y, y′ ∈ K

|g_n(y) − g_n(y′)| ≤ ‖(I − P_n)(y − y′)‖ ≤ ‖y − y′‖
Hence {g_n : n ∈ N} is equicontinuous. By the Arzelà–Ascoli
theorem, a subsequence of (g_n) converges uniformly on K, necessarily to 0; since g_{n+1} ≤ g_n, the whole sequence converges uniformly to 0.
Lemma 12.16.37. Suppose H is a Hilbert space. If T is a self–adjoint compact operator,
then there is an eigenvalue λ of T such that |λ| = ‖T‖.

Proof. By Theorem 12.16.30, there is a sequence of unit vectors {x_n : n ∈ N} such that
lim_n |(T x_n, x_n)| = ‖T‖. Since T is compact and x ↦ (T x, x) is real, without loss of
generality we may assume that T x_n converges to some y ∈ H and (T x_n, x_n) converges to
some number λ ∈ R. Clearly |λ| = ‖T‖. From
‖T x_n − λx_n‖² = ‖T x_n‖² − 2λ (T x_n, x_n) + λ² ≤ ‖T‖² − 2λ (T x_n, x_n) + ‖T‖² → 0 as n → ∞,
we obtain lim_n λx_n = lim_n T x_n = y, and by continuity
T y = lim_n T (λx_n) = λy
If T = 0 there is nothing to prove. Otherwise ‖y‖ = lim_n ‖λx_n‖ = |λ| > 0, so y is an eigenvector and λ is an eigenvalue of T .
Lemma 12.16.38. Suppose T is a self–adjoint compact operator on a Hilbert space. If λ
and µ are distinct eigenvalues of T then ker(T − λI) ⊂ ker(T − µI)^⊥.

Proof. Suppose T x = λx and T y = µy for nonzero x and y. Both λ and µ are real. Thus
λ(x, y) = (T x, y) = (x, T y) = µ(x, y).
As a consequence (λ − µ)(x, y) = 0. Since λ ≠ µ, (x, y) = 0.
Theorem 12.16.39. Suppose T is a self–adjoint compact operator on a Hilbert space H.
Then σ(T) is at most countable, every element of σ(T) \ {0} is an eigenvalue of T, and σ(T) has at most one
accumulation point, namely 0.
Let Φ = {λn} be the list of all distinct nonzero eigenvalues of T ordered decreasingly according to magnitude, i.e., |λn+1| ≤ |λn|. If Pn is the orthogonal projection from H onto Nn = ker(T − λn I),
then
\[ T = \sum_n \lambda_n P_n \tag{12.52} \]

Proof. The first statement is a consequence of Theorem 12.15.12, which also implies that
each Pn has finite dimensional range. Define
\[ M_n := N_1 \oplus \ldots \oplus N_n, \qquad M := \overline{\mathrm{span}} \bigcup_n M_n . \]

Since Nn = ker(T − λn I), T Pn = λn Pn. By Lemma 12.16.38, N0 := ker(T) ⊥ Mn, and Pn Pm = 0 for all n and m ≠ n. Consequently, $T(M_n) \subset M_n$ and $T(M_n^\perp) \subset M_n^\perp$; furthermore,
the orthogonal projection $P_{M_n} : H \to M_n$ is given by
\[ P_{M_n} = P_1 + \ldots + P_n \]
and satisfies $T P_{M_n} = \sum_{j=1}^n \lambda_j P_j$. It follows from these observations that $N_0 \subset M^\perp$,
T(M) ⊂ M and, since 0 = (T x, y) = (x, T y) for all x ∈ M and $y \in M^\perp$, $T(M^\perp) \subset M^\perp$.

We claim that $T(M^\perp) = \{0\}$. Otherwise, the restriction $T_{M^\perp}$ of T to $M^\perp$ is a non–zero self–adjoint compact operator in $L(M^\perp)$. By Lemma 12.16.37, $T_{M^\perp}$ has an eigenvalue λ
with $|\lambda| = \|T_{M^\perp}\| > 0$. However, all eigenspaces of T corresponding to nonzero eigenvalues are contained in M, which is a contradiction. Therefore
$T(M^\perp) = \{0\}$, that is, $\ker(T) = M^\perp$.

When the sequence Φ of nonzero eigenvalues has a finite number k of elements, then

H = N1 ⊕ . . . ⊕ Nk ⊕ N0

and (12.52) follows immediately.

To prove (12.52) in the case when Φ is infinite, it suffices to show that $T_n = T P_{M_n}$ converges
to T in the operator norm. For each n, the restriction $T_{M_n^\perp}$ of T to $M_n^\perp$ is a self–adjoint compact operator all of whose distinct nonzero eigenvalues are {λk : k > n}. Lemma 12.16.37
implies that $\|T_{M_n^\perp}\| = |\lambda_{n+1}|$. Hence
\[ \|T - T_n\| = \Big\| T - \sum_{j=1}^n \lambda_j P_j \Big\| = \|T - T P_{M_n}\| = \|T(I - P_{M_n})\| \le \|T_{M_n^\perp}\| \, \|I - P_{M_n}\| \le |\lambda_{n+1}| \xrightarrow{n\to\infty} 0 \]

since limn λn = 0.
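
The decomposition (12.52) can be checked by hand in the smallest nontrivial case. The following is a minimal sketch, assuming a real symmetric 2×2 matrix [[a, b], [b, d]] with b ≠ 0 (so that it has two distinct eigenvalues); the helper names `spectral_pieces` and `matmul` are illustrative and not part of the text. It uses the fact that, for two distinct eigenvalues λ and µ, (T − µI)/(λ − µ) is the orthogonal projection onto ker(T − λI).

```python
from math import sqrt

def spectral_pieces(a, b, d):
    """Eigenvalues and spectral projections of the symmetric matrix [[a, b], [b, d]], b != 0."""
    mean = (a + d) / 2
    radius = sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam1, lam2 = mean + radius, mean - radius      # two distinct real eigenvalues
    T = [[a, b], [b, d]]

    def projection(lam, other):
        # (T - other*I) / (lam - other) is the orthogonal projection onto ker(T - lam*I)
        return [[(T[i][j] - (other if i == j else 0.0)) / (lam - other)
                 for j in range(2)] for i in range(2)]

    return T, (lam1, projection(lam1, lam2)), (lam2, projection(lam2, lam1))

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

T, (lam1, P1), (lam2, P2) = spectral_pieces(2.0, 1.0, 2.0)
# Recombine lam1*P1 + lam2*P2, which should reproduce T as in (12.52).
recombined = [[lam1 * P1[i][j] + lam2 * P2[i][j] for j in range(2)] for i in range(2)]
product = matmul(P1, P2)   # projections onto distinct eigenspaces are orthogonal
print(lam1, lam2, recombined, product)
```

In this finite dimensional setting the sum in (12.52) has two terms and the operator norm of T equals the largest eigenvalue in magnitude, in agreement with Lemma 12.16.37.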

Corollary 12.16.40. Suppose T and S are self–adjoint compact operators in a Hilbert

space H. If T S = ST , then there exists an orthonormal sequence {en : n ∈ N} ⊂ H and
real sequences {τn } and {ξn } such that
\[ Tx = \sum_n \tau_n (x, e_n) e_n, \qquad Sx = \sum_n \xi_n (x, e_n) e_n . \]

Proof. As in Theorem 12.16.39, let Φ = {λn : n ∈ N} be the sequence of all distinct nonzero
eigenvalues of T ordered decreasingly in order of magnitude (|λn+1| ≤ |λn|), Nn = ker(T − λn I), and Pn be the orthogonal projection from H onto Nn. Then
\[ T = \sum_n \lambda_n P_n . \]

Let N0 = ker(T). Since T S = ST, S(Nn) ⊂ Nn for all n ∈ Z+, for if x ∈ Nn, then
T Sx = ST x = λn Sx. Similarly, $S(N_n^\perp) \subset N_n^\perp$, for if x ∈ Nn and $y \in N_n^\perp$,
then 0 = (Sx, y) = (x, Sy). Hence, SPn = Pn S for all n.

For each n ∈ N, kn := dim(Nn) < ∞. Theorem 12.16.39 applied to the restriction of S to
Nn implies that there are an orthonormal basis $\{e_{n,j} : 1 \le j \le k_n\}$ for Nn and a sequence of
real numbers $\{\mu_{n,j} : 1 \le j \le k_n\}$ (not necessarily distinct) such that $S e_{n,j} = \mu_{n,j} e_{n,j}$, and
for all x ∈ Nn
\[ Tx = \lambda_n \sum_{j=1}^{k_n} (x, e_{n,j}) e_{n,j}, \qquad Sx = \sum_{j=1}^{k_n} \mu_{n,j} (x, e_{n,j}) e_{n,j} . \]

If N0 ≠ {0}, then Theorem 12.16.39 applied to the restriction S0 of S to N0 implies that

\[ S_0 = \sum_m \mu_m E_m \]

where {µm} are all the distinct nonzero eigenvalues of S0 ordered decreasingly according
to magnitude, and Em is the projection from N0 onto $N_{0,m} = \ker(S_0 - \mu_m I|_{N_0})$. Each
$k'_m = \dim(N_{0,m}) < \infty$ and so $N_{0,m}$ has a finite orthonormal basis $\{e'_{m,j} : 1 \le j \le k'_m\}$. Rearranging the order of the {µm} and $\bigcup_m \{e'_{m,j} : 1 \le j \le k'_m\}$ we obtain a
sequence of orthonormal vectors $\{e_{0,m}\} \subset N_0$ and a sequence of real numbers $\{\mu_{0,m}\}$ (not
necessarily distinct) such that $S e_{0,m} = \mu_{0,m} e_{0,m}$. Then, setting λ0 = 0 and
$k_0 = \dim \mathrm{span}(\{e_{0,m}\})$ we get
\[ Sx = \sum_{n \ge 0} \Big( \sum_{j=1}^{k_n} \mu_{n,j} (x, e_{n,j}) e_{n,j} \Big), \qquad Tx = \sum_{n \ge 0} \Big( \sum_{j=1}^{k_n} \lambda_n (x, e_{n,j}) e_{n,j} \Big) \]

for all x ∈ H. The remainder of the proof consists of rearranging the double sequences into
one, keeping the relation between eigenvalues and corresponding eigenvectors.

12.17. Exercises
Exercise 12.17.1. Let α ∈ F \ {0} and A ⊂ X. Show that (αA)◦ = αA◦ .
Exercise 12.17.2. Let X be a topological vector space. Suppose Y is a linear subspace of
X. Show that
(a) Y has non empty interior iff Y = X (Hint: Use Theorem 12.1.15(a)).
(b) Y is bounded iff Y = {0} (Hint: let x ∈ Y \ {0} and choose a neighborhood V ∋ 0
which does not contain x).
Exercise 12.17.3. Suppose X is a linear topological space, A ⊂ X is nonempty compact
and B ⊂ X nonempty closed. Show that A + B is closed in X.
Exercise 12.17.4. For each n ∈ Z let $e_n(t) := e^{int}$ (|t| ≤ π). Define
\[ f_n = e_{-n} + n e_n, \qquad (n \in \mathbb{N}) \]
Let X1 be the closure in L2(−π, π) of the linear span of the functions {en : n ∈ Z+}, and
let X2 be the closure in L2(−π, π) of the linear span of {fn}. Show that X1 + X2 is dense
in L2(−π, π) but it is not closed. For instance,
\[ x = \sum_{n=1}^{\infty} \frac{1}{n} e_{-n} \]

is an element of L2(−π, π) \ (X1 + X2).


Exercise 12.17.5. Suppose d is a translation invariant metric in a topological vector space

X. Show that
(a) d(x, 0) = d(−x, 0) for all x ∈ X.
(b) d(x1 + . . . + xn , 0) ≤ d(x1 , 0) + . . . + d(xn , 0) for all x1 , . . . , xn ∈ X and n ∈ N.
(c) d(nx, 0) ≤ nd(x, 0) for all n ∈ N and x ∈ X.
Exercise 12.17.6. Let ℓ∞ = L∞ (N, P(N), #). Let Λ be a Banach limit on ℓ∞ . For any
A ⊂ N define ν(A) := Λ(1A). Show that ν is not countably additive and conclude that ν is
a charge on (N, P(N)).
Exercise 12.17.7. Complete the proof that the topology τd on C∞(Ω) defined in Example 12.3.7 is a Fréchet topology. Show that for each K ∈ K(Ω), DK, restricted to Ω, is a
closed subspace with respect to τd .
Exercise 12.17.8. Suppose {X, (Xn , τn ) : n ∈ N} is a strict inductive system of Fréchet
spaces, and let Y be a locally convex space. Show that a linear transformation T : X → Y
is continuous iff T is sequentially continuous, that is, T φn → T φ whenever φn → φ in X.
(Hint: For any open convex and balanced neighborhood V of 0 ∈ Y , W = T −1 (V ) is a
convex and balanced set in X.)
Exercise 12.17.9. Suppose Ω ⊂ Rn is nonempty and open, h ∈ Rn, and U is an open set
containing Ω + h. Show that
(a) τh : φ(x) ↦ φ(x − h) is a continuous linear map from D(Ω) to D(U).
(b) R : φ(x) 7→ φ̃(x) := φ(−x) is a continuous linear map from D(Ω) into D(−Ω).
Exercise 12.17.10. Suppose Ω ⊂ Rn is nonempty and open. Members of the dual space
D′(Ω) of D(Ω) are called distributions in Ω. The weak∗ topology σ(D′(Ω), D(Ω))
makes D′(Ω) a locally convex space. Let f ∈ C∞(Ω), µ be a (real–valued) Radon measure
on Ω, and u ∈ D′(Ω), and define
\[ \Lambda_\mu(\phi) = \int_\Omega \phi(x)\,\mu(dx), \qquad D^\alpha u(\phi) = (-1)^{|\alpha|} u(\partial^\alpha \phi), \qquad (f \cdot u)(\phi) = u(f\phi) \]
(a) Show that $\Lambda_\mu$, $D^\alpha u$, $f \cdot u$ are distributions. For fixed x ∈ Rn show that φ ↦ u∗φ(x)
is a distribution. $D^\alpha u$ is called the α–th derivative of u; f · u is the multiplication
of u with f. (Hint: $D^\alpha u = (-1)^{|\alpha|} u \circ \partial^\alpha$.)
(b) Consider the case µ = f dλn and Ω = Rn. Show that
\[ (-1)^{|\alpha|} \int_{\mathbb{R}^n} f(x)\,(\partial^\alpha \phi)(x)\,dx = \int_{\mathbb{R}^n} (\partial^\alpha f)(x)\,\phi(x)\,dx \]

for φ ∈ D(Rn). This gives some justification to the name the derivative of a
distribution. (Hint: Use Fubini's theorem together with integration by parts.)
(c) For any u ∈ D′(Rn) define $\tau_x u(\phi) := u(\tau_{-x}\phi)$. Show that $\tau_x u \in D'(\mathbb{R}^n)$ for any
x ∈ Rn. For φ ∈ D(Rn) fixed, show that $x \mapsto \tau_x u(\phi)$ is continuous.

Exercise 12.17.11. For any set A in a vector space X, show that $\mathrm{co}(A) = \bigcap \{C : A \subset C,\ C \text{ convex}\}$. If X is a topological vector space, show that $\overline{\mathrm{co}}(A)$ is the smallest closed
convex set that contains A.
Exercise 12.17.12. Suppose A is a non–empty subset of a real vector space X. The
minimal affine set aff(A) that contains A is defined as the intersection of all affine subspaces in X
that contain A. For any a ∈ A, show that
\[ \mathrm{aff}(A) = \Big\{ \sum_{k=1}^n \alpha_k x_k : n \in \mathbb{N},\ \alpha_k \in \mathbb{R},\ \sum_{k=1}^n \alpha_k = 1,\ x_k \in A \Big\} = a + \mathrm{span}(A - a). \]

Show that the smallest closed affine space that contains A is given by $\overline{\mathrm{aff}}(A) = a + \overline{\mathrm{span}}(A - a)$ for all a ∈ A.
Exercise 12.17.13. Let µ be a probability measure on (R, B(R)) such that F (x) =
µ(−∞, x] is continuous. Show that L0 (µ) is not locally convex.
Exercise 12.17.14. Let C ⊂ X be a convex set in a real vector space X. The relative
interior of C, denoted by ri(C), is defined as the interior of C relative to aff(C). For any
a ∈ C, show that
(a) ri(C) = a + int(C − a), where int(C − a) is the interior of C relative to the vector
space span(C − a).
(b) If ri(C) ≠ ∅, show that $\overline{\mathrm{ri}(C)} = \overline{C}$.
Exercise 12.17.15. For any nonempty subset A ⊂ X, show that
\[ \mathrm{cone}(A) = \Big\{ \sum_{j=1}^n \lambda_j x_j : n \in \mathbb{N},\ \lambda_j \ge 0,\ x_j \in A \Big\} \]

and co(A) ⊂ cone(A).

Exercise 12.17.16. Suppose (X, ‖·‖X) and (Y, ‖·‖Y) are normed vector spaces. Recall
that
\[ \|T\| := \sup_{\|x\|_X = 1} \|Tx\|_Y, \qquad T \in L(X, Y) \]

defines a norm on L(X, Y). Show that

(a) $\|T\| = \sup_{\|x\|_X \le 1} \|Tx\|_Y$ for all T ∈ L(X, Y).
(b) If T ∈ L(X, Y) and S ∈ L(Y, Z), where Z is another normed space, then ‖ST‖ ≤
‖S‖‖T‖. (Here ST denotes S ◦ T.)
Exercise 12.17.17. Suppose K is a compact set in a Fréchet space X. If f : X → K is
continuous, show that f admits a fixed point in K. (Hint: Use Mazur’s theorem to show
that co(K) is compact.)

Exercise 12.17.18. Show that there is a solution f ∈ C([0, 1]) to the equation
\[ f(x) = \int_0^1 \sin(t + f^2(x))\,dt \]
for all x ∈ [0, 1]. (Hint: Use Exercise 12.17.17.)
Exercise 12.17.19. Suppose X is an F–space, Y is a normed space and Γ is a collection
of continuous maps from X into Y . Let B be the set of all points x ∈ X whose orbit
Γ(x) = {Λ(x) : Λ ∈ Γ} is bounded. If B is of first category, show that $X \setminus B = \{x \in X : \sup_{\Lambda \in \Gamma} \|\Lambda(x)\| = \infty\}$ is a dense Gδ set in X. (Hint: Consider the map $\varphi : x \mapsto \sup_{\Lambda \in \Gamma} \|\Lambda(x)\|$. As ϕ is lower semicontinuous, $V_n = \varphi^{-1}((n, \infty))$ is open in X for any
n ∈ N. Show that Vn is dense in X.)
Exercise 12.17.20. Let X be an F–space and let Γ = {Λn : n ∈ N} be a sequence of
continuous linear maps from X into a topological vector space Y such that Λn x converges
to a point Λx for each x ∈ X. Show that Λ is a continuous linear map from X to Y.
Exercise 12.17.21. Suppose (X, ‖·‖X), (Y, ‖·‖Y) and (Z, ‖·‖Z) are Banach spaces, and
B : X × Y → Z is a bilinear map continuous separately on each component. Show that B
is continuous as a map from the product space X × Y to Z. Show that there is M > 0 such
that ‖B(x, y)‖Z ≤ M ‖x‖X ‖y‖Y for all (x, y) ∈ X × Y.
Exercise 12.17.22. Let X be a topological vector space over F with topological dual space
X∗.
(a) Show that the space X × F with addition and scalar multiplication given by
λ(x, α) + (y, β) = (λx + y, λα + β)
is a topological vector space over F when F has the Euclidean topology.
(b) Show that (X × F)∗ = X ∗ × F.
Exercise 12.17.23. Consider the measure space $(\mathbb{N}, 2^{\mathbb{N}}, \#)$ where # is the counting measure. The spaces Lp(#) on $(\mathbb{N}, 2^{\mathbb{N}})$ will be denoted by ℓp. Let c0 be the subspace of all
f ∈ ℓ∞ such that $\lim_{m\to\infty} f(m) = 0$. Show that
(a) c0 is a closed subspace of ℓ∞.
(b) $c_0^* = \ell_1$, that is, for any $L \in c_0^*$, there is a sequence l ∈ ℓ1 such that
\[ \|l\|_1 = \sum_{n \ge 1} |l(n)| < \infty \]
and $L(f) = \sum_{n \ge 1} l(n) f(n)$ for all f ∈ c0.
(Hint: Given $L \in c_0^*$, let $\gamma_n = L(1_{\{n\}})$. For f ∈ c0 define $f_n = \sum_{k=1}^n f(k) 1_{\{k\}}$. Show that
‖fn − f‖∞ → 0.) Contrast the conclusion in (b) with Example 12.12.8.
Exercise 12.17.24. Suppose X is a locally convex topological vector space with dual X ′ .
Suppose K is σ(X, X ′ )–compact. If there is a countable set in X ′ that separates points of
K, show that K is originally bounded and metrizable. (Hint: K is weakly bounded and
hence, originally bounded. Use Theorem 2.9.1)

Exercise 12.17.25. Let X, Y and Z be Banach spaces. For any T ∈ L(X, Y ), S ∈ L(Y, Z)
and a ∈ F show that
(i) (aT + S)† = aT † + S † .
(ii) (ST )† = T † S † .
(iii) If T is bijective, then so is T† and $(T^{-1})^\dagger = (T^\dagger)^{-1} \in L(X^*, Y^*)$.
Suppose X = Y and we identify X as a subspace of double dual X ∗∗ through the map
x 7→ x̂ where x̂(x∗ ) = x∗ (x).
(iv) T †† |X = T .

Exercise 12.17.26. If X is a Banach space and T is an isometry in X, show that either

σ(T) ⊂ S1 or σ(T) = B(0; 1). (Hint: if T is bijective, σ(T) ⊂ S1. If T is not surjective
and σ(T) ≠ B(0; 1) there is λ ∈ ∂σ(T) with |λ| < 1.)

Exercise 12.17.27. Suppose X is a Banach space. Show that the set Surj(X) of all
bounded linear surjective maps is open in L(X) with the operator norm (Hint: Apply
Theorem 12.13.12).

Exercise 12.17.28. Let k ∈ Lp(X × Y, B ⊗ F, µ ⊗ ν), 1 < p < ∞, and define $Kf(x) := \int_Y k(x, y) f(y)\,\nu(dy)$. Show that there is a constant C > 0 such that ‖Kf‖p ≤ C‖f‖q for
all f ∈ Lq(Y), where q is the conjugate of p. This means that K : Lq(Y) → Lp(X) is a
bounded linear operator. What is K†?

Exercise 12.17.29. Suppose H is a Hilbert space and let T ∈ L(H). Show that $|||T||| := \sup_{\|x\|=1} |(Tx, x)| \le \|T\|$. If T is self–adjoint, show that ‖T‖ = |||T||| (Hint: Show that
\[ \mathrm{Re}\big(\alpha\bar{\beta}(Tx, y)\big) = \frac{1}{4}\Big[ (T(\alpha x + \beta y), \alpha x + \beta y) - (T(\alpha x - \beta y), \alpha x - \beta y) \Big] \le |||T||| \big( |\alpha|^2 \|x\|^2 + |\beta|^2 \|y\|^2 \big) \]

for all x, y ∈ H and α, β ∈ C. In particular, set y = T x and choose α, β appropriately.)

Exercise 12.17.30. Suppose H is a complex Hilbert space. A linear map A : H → H is

positive iff (Ax, x) ≥ 0 for all x ∈ H. Show that if A is positive, then A is self–adjoint, and
that

(12.53) |(Ax, y)|2 ≤ (Ax, x)(Ay, y), x, y ∈ H

Inequality (12.53) is the generalized Cauchy inequality. (Hint: f(x, y) := (Ax, y) satisfies
the properties of an inner product, except for possibly the condition f(x, x) = 0 iff x = 0. The
proof of the Cauchy–Schwarz inequality goes through in this case).

Exercise 12.17.31. Suppose Ω is an open bounded subset of Rd, and let $K \in C(\overline{\Omega} \times \overline{\Omega})$.
Show that the operator $Tx(t) = \int_\Omega K(t, s) x(s)\,ds$ defined on $C(\overline{\Omega})$ is a compact operator in
$L(C(\overline{\Omega}))$.

Exercise 12.17.32. Let I = [a, b], a < b. On $C^2(I)$ define the norm |||x||| = ‖x‖u + ‖x′‖u +
‖x′′‖u. Under this norm $C^2(I)$ is a Banach space. Define the map $L : C^2(I) \to C^0(I)$ by
Lx(t) = a0 (t)x′′ (t) + a1 (t)x′ (t) + a2 (t)x(t)
where aj ∈ C 2−j (I) for j = 0, 1, 2, and a0 > 0. Show that
(a) L ∈ L(C 2 (I), C 0 (I)).
(b) dim(ker(L)) = 2 (Hint: there are unique solutions to the initial value problems
Lx = 0 with x(a) = 0, x′ (a) = 1 and x(a) = 1, x′ (a) = 0 respectively.)
Exercise 12.17.33. In this exercise, D is the differential operator. Show that the n–th term
in each of the sequences {pn : n ∈ N} defined below is a polynomial of degree n, and that
each sequence is orthogonal in a corresponding L2 space.
(a) Legendre polynomials
\[ P_n(x) := \frac{\sqrt{2n+1}}{n!\,2^n \sqrt{2}}\, D^n (1 - x^2)^n, \qquad ([-1, 1], B([-1, 1]), dx) \]
(b) Laguerre polynomials
\[ L_n(x) := \frac{e^x}{n!}\, D^n (e^{-x} x^n) = \frac{1}{n!} (D - 1)^n x^n, \qquad ((0, \infty), B(0, \infty), e^{-x}\,dx) \]
(c) Hermite polynomials
\[ H_n(x) := (-1)^n e^{x^2/2} D^n (e^{-x^2/2}), \qquad \big(\mathbb{R}, B(\mathbb{R}), \tfrac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx\big) \]
In Section 15.1, it will be seen that these sequences are complete orthogonal systems in
their respective L2 spaces.
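
For part (c), orthogonality can be verified exactly in small cases. The sketch below is only an illustration, not a solution of the exercise: it assumes the recurrence $H_{n+1}(x) = xH_n(x) - H_n'(x)$, which follows by differentiating the Rodrigues formula in (c), and the standard Gaussian moments (0 for odd k, (k − 1)!! for even k). The names `hermite`, `gaussian_moment` and `inner` are illustrative.

```python
from math import factorial

def hermite(n):
    """Integer coefficients (lowest degree first) of H_n, via H_{k+1} = x*H_k - H_k'."""
    p = [1]
    for _ in range(n):
        shifted = [0] + p                                        # coefficients of x * p(x)
        deriv = [k * p[k] for k in range(1, len(p))] + [0, 0]    # coefficients of p'(x), padded
        p = [s - d for s, d in zip(shifted, deriv)]
    return p

def gaussian_moment(k):
    """k-th moment of the standard normal law: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    m = 1
    for j in range(1, k, 2):
        m *= j
    return m

def inner(p, q):
    """Inner product of two polynomials in L2 of the standard Gaussian measure."""
    return sum(pi * qj * gaussian_moment(i + j)
               for i, pi in enumerate(p) for j, qj in enumerate(q))

# Gram matrix of H_0, ..., H_5: off-diagonal entries vanish, diagonal entries are n!.
gram = [[inner(hermite(m), hermite(n)) for n in range(6)] for m in range(6)]
print(hermite(3), gram[3][3])
```

All arithmetic is done with integers, so the zeros off the diagonal are exact rather than approximate.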
Exercise 12.17.34. On L2 [0, 1], define the operator Ax(t) = tx(t). Show that A is a
self–adjoint bounded operator, ‖A‖ = 1, and σP(A) = ∅. What is σ(A)?
Exercise 12.17.35. Let H be a Hilbert space. Suppose T is a compact normal operator on
H. Show that there are a sequence of complex numbers {λn} and an orthonormal sequence
of vectors {en} ⊂ H such that
\[ Tx = \sum_n \lambda_n (x, e_n) e_n, \qquad x \in H. \]
(Hint: use the real–imaginary decomposition of T.)
Chapter 13

More results on duality

13.1. Dunford–Pettis Theorem

The dual of L∞(µ) clearly contains L1(µ). When µ is σ–finite we know by the Riesz
representation theorem 8.4.3 that $L_1^*(\mu) = L_\infty(\mu)$. By Corollary 12.10.10
\[ \|\Lambda_f\| = \max_{\{g \in L_\infty : \|g\|_\infty = 1\}} \Big| \int g f\,d\mu \Big| = \|f\|_1, \]

which means that the map $f \mapsto \Lambda_f$ is an isometry from (L1, ‖·‖1) into ($L_\infty^*$, ‖·‖). As
a consequence, L1(µ) is norm–closed in $L_\infty^*(\mu)$ and, by Theorem 12.11.13, $\sigma(L_\infty^*, L_\infty)$–closed.
The following example shows that if p = 1, the statement of Theorem 8.4.3 may not
hold if µ is not σ–finite.

Example 13.1.1. Suppose Ω is uncountable. Let F = P(Ω) and let B be the sub σ–
algebra generated by the countable subsets of Ω. Let µ be the counting measure on F and
let µ0 be its restriction to B. Then L1 (µ) = L1 (µ0 ) consists of all functions equal to zero
except on countable subsets of Ω; L∞ (µ0 ) is the collection of all functions that are constant
except on countable subsets of Ω; L∞(µ) is the collection of all bounded functions. It is
clear that $(L_1(\Omega))^* = L_\infty(\mu) \supset L_\infty(\mu_0)$. Let A and B be uncountable with A ∪ B = Ω and
define $\Lambda(f) = \sum_{x \in A} f(x)$. Then Λ is a continuous linear functional on L1(µ0) with ‖Λ‖ = 1
and if $\Lambda f = \int f g\,d\mu$, then g = 1A ∉ L∞(µ0).

Example 13.1.2. Suppose (Ω, F, µ) is a σ–finite measure space. For any 1 ≤ p < ∞,
the collection S∗ of simple integrable functions is dense in Lp, and by Alaoglu's theorem,
the dual unit ball Bq = {g ∈ Lq : kgkq ≤ 1}, 1 < q ≤ ∞, is σ(Lq , Lp )–compact. If F is
countably generated then, S ∗ has a countable subset that is dense in Lp ; in which case,
the topology σ(Lq , Lp ) on Bq is metrizable, and so Bq is sequentially compact.


We conclude this section with a result that describes uniform integrability in terms of
weak compactness.
Lemma 13.1.3. Let (fn ) be a bounded sequence in L1 (Ω, F , ν) . (a) If µn = fn dν con-
verges setwise, then (fn ) is uniformly integrable, and there is f ∈ L1 (Ω, F , ν) to which (fn )
converges weakly in σ(L1 , L∞ ). (b) In addition, if ν is finite and fn → f in ν–measure,
then fn → f in L1 .

Proof. (a) By Corollary 10.7.5 of the Vitali–Hahn–Saks theorem, {µn} converges setwise to a
finite signed or complex measure µ and {µn, µ : n ∈ N} is uniformly continuous w.r.t. a
probability measure P ≪ ν. Thus, µ ≪ ν and µ can be expressed uniquely as µ = f · dν
for some f ∈ L1(ν). Conditions (i) and (ii) of Theorem 8.7.4(a), with $h = \frac{dP}{d\nu}$, hold and we
conclude that {fn, f} is uniformly integrable in L1(ν).

We now show that (fn) converges to f in σ(L1, L∞). Let $M := \sup_n \|f_n\|_1$. Setwise
convergence of µn = fn dν to µ = f dν implies that $\int s f_n\,d\nu \to \int s f\,d\nu$ for all simple
functions s. As simple functions are dense in L∞(ν), for any g ∈ L∞ and ε > 0, there exists
a simple function s such that $\|g - s\|_\infty < \frac{\varepsilon}{3(M+1)}$. For such a simple function s, there exists
an integer Nε such that n ≥ Nε implies $\big| \int s(f_n - f)\,d\nu \big| < \frac{\varepsilon}{3}$. Combining these facts, we
obtain that
\[ \Big| \int g(f_n - f)\,d\nu \Big| \le \Big| \int (g - s) f_n\,d\nu \Big| + \Big| \int s(f_n - f)\,d\nu \Big| + \Big| \int (s - g) f\,d\nu \Big| \le 2\|g - s\|_\infty M + \frac{\varepsilon}{3} < \varepsilon. \]

(b) Suppose that, in addition, ν is finite and fn → f in ν–measure. Then, for any ε > 0 we
have limn ν(|fn − f| > ε) = 0. From the uniform integrability of (fn) and the inequality
\[ \|f_n - f\|_1 \le \varepsilon\,\nu(\Omega) + \int_{\{|f_n - f| > \varepsilon\}} |f_n|\,d\nu + \int_{\{|f_n - f| > \varepsilon\}} |f|\,d\nu \]

it follows that fn → f in L1(ν).

Theorem 13.1.4. (Dunford–Pettis) Suppose (Ω, F , µ) is σ–finite. A subset K ⊂ L1 is
σ(L1 , L∞ )–relatively compact iff K is uniformly integrable.

Proof. We consider the case where µ is a probability measure. The general case can be
derived from this one.

Suppose K is σ(L1, L∞)–relatively compact. As (L1(µ))∗ = L∞(µ), by Eberlein–Smulian's theorem
K is relatively sequentially σ(L1, L∞)–compact.
We claim that K is bounded in L1. If this were not the case, there would be a sequence
(fn) ⊂ K with ‖fn‖1 ≥ n. Let (fn′) be a σ(L1, L∞)–convergent subsequence. Then, for all
g ∈ L∞, $\Lambda_{n'}(g) = \int g f_{n'}\,d\mu$ converges. Thus, by the Banach–Steinhaus theorem, $\sup_{n'} \|\Lambda_{n'}\| = \sup_{n'} \|f_{n'}\|_1 < \infty$, which is a contradiction.

Now we prove that K is uniformly integrable. If that were not the case, there would be a
number ε > 0 and sequences (En) ⊂ F and (fn) ⊂ K such that
\[ \mu(E_n) < \frac{1}{n}, \qquad \int_{E_n} |f_n|\,d\mu \ge \varepsilon. \tag{13.1} \]
Let (fn′) be a σ(L1, L∞)–convergent subsequence. Then by Lemma 13.1.3, (fn′) is uniformly
integrable, and so $\lim_{n'} \int_{E_{n'}} |f_{n'}|\,d\mu = 0$. This is a contradiction to (13.1).

Conversely, if K is uniformly integrable, then K is bounded in L1 and thus, the closure $\overline{K}^{w*}$
of K in $\sigma(L_\infty^*, L_\infty)$ is $\sigma(L_\infty^*, L_\infty)$–compact. For any $\Lambda \in \overline{K}^{w*}$, the map E ↦ Λ1E is clearly
a bounded finitely additive function in F; for if (fα) is a net in K such that limα fα = Λ
in $\sigma(L_\infty^*, L_\infty)$, then
\[ |\Lambda 1_E| = \Big| \lim_\alpha \int_E f_\alpha\,d\mu \Big| \le \sup_{f \in K} \int_E |f|\,d\mu \le \sup_{f \in K} \|f\|_1 < \infty \]

for all E ∈ F. We will show now that in fact Λ is countably additive (hence, a measure)
and that Λ ≪ µ. Indeed, since K is uniformly integrable, for any ε > 0 there is δ > 0
such that µ(E) < δ implies $|\Lambda 1_E| \le \sup_{f \in K} \int_E |f|\,d\mu < \varepsilon$. As µ is finite, if En ↘ ∅, then
µ(En) → 0, and so limn Λ1En = 0. This shows that Λ is a finite signed or complex measure
and Λ ≪ µ. Consequently, Λ = f dµ for some f ∈ L1 and so $\overline{K}^{w*} \subset L_1$. Therefore K is
relatively σ(L1, L∞)–compact.

13.2. The dual of L∞

In this section we describe the dual of the space L∞(µ). As we mentioned earlier, $L_\infty^*(\mu)$
contains L1(µ). If $\Lambda \in L_\infty^*(\mu)$ then the map mΛ : E ↦ Λ(1E) is clearly finitely additive
and mΛ(A) = 0 whenever µ(A) = 0. For any A ∈ F we have that
\[ |m_\Lambda|(A) = \sup\{\Lambda(\phi) : \phi \in \mathcal{E}(\mathcal{F}),\ |\phi| \le 1_A\} \le \|\Lambda\| \|1_A\|_\infty < \infty \]
that is, mΛ is a charge of bounded variation. We will show that $\Lambda f = \int f\,dm_\Lambda$ for all f ∈ L∞(µ).
We use baµ to denote the collection of finitely additive functions on F that have finite
variation and are absolutely continuous with respect to µ, i.e., if m ∈ baµ , then |m|(A) = 0
whenever µ(A) = 0.
Theorem 13.2.1. Suppose m ∈ baµ. For any h ∈ L∞, let f be a measurable bounded
function such that h = f µ–a.s., and define $\int h\,dm := \int f\,dm$. Then, $\Lambda : h \mapsto \int h\,dm$
defines a continuous linear functional on L∞. Conversely, if Λ is a continuous linear
functional on L∞(µ), then there exists mΛ ∈ baµ such that
\[ \Lambda(h) = \int h\,dm_\Lambda, \qquad h \in L_\infty(\mu). \]

Moreover, ‖Λ‖ = ‖mΛ‖ := |mΛ|(Ω).

Proof. Suppose Λ ∈ (L∞(µ))∗. Let mΛ be the restriction of Λ to the set of simple functions
E(F) on F. The arguments above show that mΛ is additive and of finite variation. If
f ∈ L∞, there is a sequence (φn : n ∈ N) ⊂ E(F) that converges uniformly to f, except on
a set A ∈ F with µ(A) = 0, and so ‖φn − f‖∞ → 0. From
\[ |\Lambda(\phi_n) - \Lambda(\phi_m)| \le |m_\Lambda|(|\phi_n - \phi_m|) = |m_\Lambda|(|\phi_n - \phi_m| 1_{A^c}) \le \|\Lambda\| \, \|(\phi_n - \phi_m) 1_{A^c}\|_\infty, \]
it follows that $\lim_n \int \phi_n\,dm_\Lambda = \lim_n \Lambda\phi_n = \Lambda f$, that is, $\Lambda f = \int f\,dm_\Lambda$.

We now prove that if m is a charge of finite variation on E(R) that is absolutely continuous
with respect to µ, then the map $\Lambda : f \mapsto \int f\,dm$ is in (L∞(µ))∗. If f, g are bounded
functions such that f = h = g µ–a.s., then $\int |f - g|\,d|m| = 0$. Indeed, if A = {f ≠ g}, then
\[ \int |f - g|\,d|m| = \int_A |f - g|\,d|m| + \int_{A^c} |f - g|\,d|m| \le \|f - g\|_u\,|m|(A) = 0 \tag{13.2} \]
Hence, $\int h\,dm = \int f\,dm$ does not depend on the choice of f. If in addition ‖f‖u = ‖h‖∞, then
\[ \Big| \int h\,dm \Big| = \Big| \int f\,dm \Big| \le \|f\|_u \|m\| = \|h\|_\infty \|m\|. \]
Therefore $\Lambda : h \mapsto \int h\,dm$ is a continuous linear functional.

The last statement is a consequence of Theorem 10.3.1.

13.3. Lp –Interpolation Theorems

If 1 < p0 < p < p1 then $L_p \subset L_{p_0} + L_{p_1}$ (see Exercise 8.9.13). Suppose that T is an operator
on $L_{p_0} + L_{p_1}$ that is bounded on both $L_{p_0}$ and $L_{p_1}$. A natural question is whether
T is also bounded on Lp. We will answer this question in the positive under some general
assumptions.
Theorem 13.3.1. (Phragmen–Lindelöf) Let Ω = {x + i y : a < x < b}. Suppose f is a
bounded continuous function in $\overline{\Omega}$ and holomorphic on Ω. If
\[ M(x) = \sup\{|f(x + iy)| : -\infty < y < \infty\} \qquad (a \le x \le b), \]
then
\[ M^{b-a}(x) \le M^{b-x}(a)\,M^{x-a}(b) \qquad (a \le x \le b) \]

Proof. Without loss of generality assume a = 0, b = 1. For ε > 0 let

for all 0 ≤ t ≤ 1.

Theorem 13.3.2. (M. Riesz). Suppose (X, MX, µ) and (Y, MY, ν) are measure spaces, ν
is semifinite, and 1 ≤ p0, p1, q0, q1 ≤ ∞. For any 0 < t < 1 define pt and qt as
\[ \frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad \frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}. \]
Suppose T is a linear operator on $L_{p_0}(\mu) + L_{p_1}(\mu)$ into $L_{q_0}(\nu) + L_{q_1}(\nu)$ such that T is
bounded from $L_{p_j}(\mu)$ to $L_{q_j}(\nu)$, that is, for some constants M0, M1, $\|Tf\|_{q_j} \le M_j \|f\|_{p_j}$ for
all $f \in L_{p_j}(\mu)$ (j = 0, 1). Then T is bounded from $L_{p_t}(\mu)$ to $L_{q_t}(\nu)$ for all 0 < t < 1, and

\[ \|Tf\|_{q_t} \le M_0^{1-t} M_1^{t} \|f\|_{p_t} \tag{13.3} \]

for all $f \in L_{p_t}(\mu)$.
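
A finite dimensional instance may help fix ideas. The sketch below is an illustration under stated assumptions, not part of the theorem: it takes the counting measure on {1, 2, 3}, a real 3×3 matrix A as the operator T, and the endpoint pairs p0 = q0 = 1 and p1 = q1 = ∞ with t = 1/2, so that pt = qt = 2. In this special case the interpolated bound $\|Ax\|_2 \le \sqrt{M_0 M_1}\,\|x\|_2$ is the classical Schur test. All function names are illustrative.

```python
from math import sqrt

def interpolated_exponent(p0, p1, t):
    """Return p_t with 1/p_t = (1 - t)/p0 + t/p1; float('inf') encodes p = infinity."""
    inv = (1 - t) / p0 + t / p1          # x / inf evaluates to 0.0
    return float("inf") if inv == 0 else 1 / inv

def column_sum_norm(A):
    """Operator norm of A from l_1 to l_1: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def row_sum_norm(A):
    """Operator norm of A from l_infinity to l_infinity: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def apply(A, x):
    """Matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(x):
    """Euclidean norm."""
    return sqrt(sum(v * v for v in x))

A = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0], [2.0, 1.0, 4.0]]
M0, M1, t = column_sum_norm(A), row_sum_norm(A), 0.5
bound = M0 ** (1 - t) * M1 ** t          # right-hand constant in (13.3)
samples = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, -1.0, 2.0], [0.3, -0.7, 0.2]]
ratios = [norm2(apply(A, x)) / norm2(x) for x in samples]
print(interpolated_exponent(1, float("inf"), t), bound, max(ratios))
```

The sample vectors only probe the inequality; they do not compute the exact operator norm on ℓ2.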

Proof. For each number 1 ≤ p ≤ ∞, we use p′ to denote its conjugate; that is, $\frac{1}{p} + \frac{1}{p'} = 1$.

Fix 0 < t < 1. We first assume that pt < ∞ and qt > 1. This excludes the cases
p0 = ∞ = p1 and q0 = 1 = q1 . Let SX denote the collection of µ–integrable simple
functions on X; similarly for SY . As the collection SX is dense in Lp (µ) for all 1 ≤ p < ∞,
it suffices to prove (13.3) for functions in SX . Corollary 8.3.10 and the density of SY in Lqt′
imply that if f ∈ SX , then
\[ \|Tf\|_{q_t} = \sup\Big\{ \Big| \int_Y (Tf)\,g\,d\nu \Big| : g \in S_Y,\ \|g\|_{q_t'} = 1 \Big\}. \tag{13.4} \]
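Let $f = \sum_{j=1}^m a_j 1_{A_j} \in S_X$ and $g = \sum_{k=1}^n b_k 1_{B_k} \in S_Y$, where all aj and bk are not zero and
the sets in {Aj} and {Bk} are disjoint, be such that $\|f\|_{p_t} = 1 = \|g\|_{q_t'}$. Then $Tf \in L_{q_0} \cap L_{q_1}$
and thus, $Tf \in L_{q_t}$ for any qt between q0 and q1. Consider the functions α and β on C
given by
\[ \alpha(z) = \frac{1-z}{p_0} + \frac{z}{p_1}, \qquad \beta(z) = \frac{1-z}{q_0'} + \frac{z}{q_1'}, \]
so that α(t) = 1/pt and β(t) = 1/qt′.

Suppose $a_j = |a_j| e^{i\theta_j}$ and $b_k = |b_k| e^{i\phi_k}$ and let

\[ f_z = \sum_{j=1}^m |a_j|^{\alpha(z) p_t} e^{i\theta_j} 1_{A_j} \tag{13.5} \]
\[ g_z = \sum_{k=1}^n |b_k|^{\beta(z) q_t'} e^{i\phi_k} 1_{B_k}. \tag{13.6} \]
If z = x + iy then $|f_z|^{p_x} = \sum_{j=1}^m |a_j|^{p_t} 1_{A_j} = |f_x|^{p_x}$ and $|g_z|^{q_x'} = \sum_{k=1}^n |b_k|^{q_t'} 1_{B_k} = |g_x|^{q_x'}$.
The function
\[ F(z) = \int (Tf_z)\,g_z\,d\nu = \sum_{j=1}^m \sum_{k=1}^n |a_j|^{\alpha(z) p_t} |b_k|^{\beta(z) q_t'} e^{i(\theta_j + \phi_k)} \int (T 1_{A_j}) 1_{B_k}\,d\nu \]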

is entire and bounded on the strip Ω = {z = x + i y : 0 ≤ x ≤ 1}. Hölder's inequality and
equations (13.5) and (13.6) lead to
\[ |F(iy)| \le \|Tf_{iy}\|_{q_0} \|g_{iy}\|_{q_0'} \le M_0 \|f_{iy}\|_{p_0} \|g_{iy}\|_{q_0'} = M_0 \|f\|_{p_t}^{p_t/p_0} \|g\|_{q_t'}^{q_t'/q_0'} = M_0 \]
and
\[ |F(1 + iy)| \le \|Tf_{1+iy}\|_{q_1} \|g_{1+iy}\|_{q_1'} \le M_1 \|f_{1+iy}\|_{p_1} \|g_{1+iy}\|_{q_1'} = M_1 \|f\|_{p_t}^{p_t/p_1} \|g\|_{q_t'}^{q_t'/q_1'} = M_1 . \]
As ft = f and gt = g, the Phragmen–Lindelöf theorem implies that
\[ \Big| \int (Tf)\,g\,d\nu \Big| = |F(t)| \le \sup\{|F(t + iy)| : y \in \mathbb{R}\} \le M_0^{1-t} M_1^{t} \]

By (13.4), this shows that $\|Tf\|_{q_t} \le M_0^{1-t} M_1^{t}$ for any f ∈ SX with $\|f\|_{p_t} = 1$, whence we conclude that (13.3)
holds.

If q0 = 1 = q1 , then we define fz as in (13.5), and set gz = g. The arguments of the proof

above hold in this case.

If p0 = ∞ = p1, the conclusion follows directly from Hölder's inequality, for in such case
f ∈ L∞(µ) implies $Tf \in L_{q_0}(\nu) \cap L_{q_1}(\nu)$. Therefore,
\[ \int |Tf|^{q_t}\,d\nu \le \|Tf\|_{q_0}^{(1-t) q_t} \|Tf\|_{q_1}^{t q_t} \le M_0^{(1-t) q_t} M_1^{t q_t} \|f\|_\infty^{q_t}. \]

Remark 13.3.3. The assumption that ν is semifinite is used only when q0 = ∞ = q1 ,
where qt = ∞ for all 0 < t < 1 and Corollary 8.3.10 still applies.
Definition 13.3.4. Suppose 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞. A mapping T : Lp → L0 is
said to be of strong–type (p, q) if for f ∈ Lp
\[ \|Tf\|_q \le A \|f\|_p, \tag{13.7} \]
where A is a constant not depending on f.
If q < ∞, then T is said to be of weak–type (p, q) if there is a constant A such that

\[ \nu(|Tf| > \alpha) \le \Big( \frac{A \|f\|_p}{\alpha} \Big)^q, \tag{13.8} \]
for all α > 0. If q = ∞, weak (p, ∞) type is the same as strong (p, ∞) type.

It follows from a straightforward application of Chebyshev's inequality that strong–type (p, q) implies
weak–type (p, q):

\[ \alpha^q\,\nu(|Tf| > \alpha) \le \|Tf\|_q^q \le (A \|f\|_p)^q . \]
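
The Chebyshev step can be checked directly for the counting measure on a finite set. The sketch below is only an illustration: the list `g` plays the role of T f, ν is assumed to be the counting measure on the index set, and the names `distribution` and `norm_q_to_q` are illustrative.

```python
def distribution(values, alpha):
    """nu(|g| > alpha) for the counting measure nu on a finite index set."""
    return sum(1 for v in values if abs(v) > alpha)

def norm_q_to_q(values, q):
    """The q-th power of the l_q norm of a finite sequence."""
    return sum(abs(v) ** q for v in values)

g = [0.5, -3.0, 1.25, 2.0, -0.1, 4.0]
q = 2
# For each level alpha, compare alpha^q * nu(|g| > alpha) with ||g||_q^q.
checks = [(alpha, alpha ** q * distribution(g, alpha), norm_q_to_q(g, q))
          for alpha in (0.25, 1.0, 2.0, 3.5)]
for alpha, lhs, rhs in checks:
    print(alpha, lhs, rhs)
```

The converse fails in general: a function can satisfy the weak bound at every level α without having a finite Lq norm, which is why weak–type is the weaker notion.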

Theorem 13.3.5. (Marcinkiewicz) Suppose (X, F, µ) and (Y, B, ν) are σ–finite measure
spaces. Let 1 ≤ s < r ≤ ∞ and suppose T is a subadditive map from Ls(µ) + Lr(µ) to
the space MY of B–measurable functions. If T is simultaneously of weak–type (s, s) and
weak–type (r, r), then T is of strong–type (p, p) for all s < p < r. […]
(iii) $\nu(|Tf| > \alpha) \le \big( \frac{A_s}{\alpha} \|f\|_s \big)^s$, when f ∈ Ls.
(iv) $\nu(|Tf| > \alpha) \le \big( \frac{A_r}{\alpha} \|f\|_r \big)^r$, when f ∈ Lr.

(If r = ∞ we assume T is of strong type (∞, ∞)). Then,

\[ \|Tf\|_p \le A_p \|f\|_p, \qquad f \in L_p \]

for all s < p < r, where Ap depends only on As, Ar, p and r.

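Proof. We first consider the case r < ∞. Let f ∈ Lp and define the function λ(α) =
ν{|T f| > α}. For α > 0, we have that $f = f 1_{\{|f| > \alpha\}} + f 1_{\{|f| \le \alpha\}}$, so that $f_1 = f 1_{\{|f| > \alpha\}} \in L_s$
and $f_2 = f 1_{\{|f| \le \alpha\}} \in L_r$. Condition (i) implies that

\[ \{|Tf| > \alpha\} \subset \{|Tf_1| > \alpha/2\} \cup \{|Tf_2| > \alpha/2\} . \]

Hence,

\[ \lambda(\alpha) = \nu\{|Tf| > \alpha\} \le \nu\{|Tf_1| > \alpha/2\} + \nu\{|Tf_2| > \alpha/2\}, \]

and by assumptions (iii) and (iv)

\[ \lambda(\alpha) \le \frac{(2A_s)^s}{\alpha^s} \int |f_1|^s\,d\mu + \frac{(2A_r)^r}{\alpha^r} \int |f_2|^r\,d\mu. \]
From the definition of f1 and f2, we conclude that
\[ \lambda(\alpha) \le \frac{(2A_s)^s}{\alpha^s} \int_{\{|f| > \alpha\}} |f|^s\,d\mu + \frac{(2A_r)^r}{\alpha^r} \int_{\{|f| \le \alpha\}} |f|^r\,d\mu. \tag{13.9} \]

By Fubini's theorem
\[ \int |Tf|^p\,d\nu = p \int_0^\infty \alpha^{p-1} \lambda(\alpha)\,d\alpha. \]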

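Multiplying both sides of (13.9) by $\alpha^{p-1}$ and integrating with respect to α gives
\[ \int_0^\infty \alpha^{p-1} \alpha^{-s} \int_{\{|f| > \alpha\}} |f|^s\,d\mu\,d\alpha = \int |f|^s \int_0^{|f|} \alpha^{p-s-1}\,d\alpha\,d\mu = \frac{1}{p-s} \int |f|^s |f|^{p-s}\,d\mu \]
Similarly,
\[ \int_0^\infty \alpha^{p-1} \alpha^{-r} \int_{\{|f| \le \alpha\}} |f|^r\,d\mu\,d\alpha = \int |f|^r \int_{|f|}^\infty \alpha^{p-1-r}\,d\alpha\,d\mu = \frac{1}{r-p} \int |f|^r |f|^{p-r}\,d\mu \]
Consequently,

\[ \|Tf\|_p \le A_p \|f\|_p, \qquad (A_p)^p = \Big( \frac{(2A_s)^s}{p-s} + \frac{(2A_r)^r}{r-p} \Big) p. \]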

Finally, we consider the case r = ∞. We decompose f ∈ Lp by letting $f_2 = f 1_{\{|f| \le \alpha / (2(A_\infty + 1))\}}$
and f1 = f − f2. Then the following inequality holds (almost everywhere):
\[ |Tf| \le |Tf_1| + |Tf_2| \le |Tf_1| + \|Tf_2\|_\infty \le |Tf_1| + \frac{\alpha}{2} \]
This means that {|T f| > α} ⊂ {|T f1| > α/2}. Therefore
\[ \nu\{|Tf| > \alpha\} \le \nu\{|Tf_1| > \alpha/2\} \le \frac{(2A_s)^s}{\alpha^s} \int |f_1|^s\,d\mu = \frac{(2A_s)^s}{\alpha^s} \int_{\{2(A_\infty + 1)|f| > \alpha\}} |f|^s\,d\mu \]

Just as we did before, we multiply both sides of the previous inequality by $p\,\alpha^{p-1}$, integrate
with respect to α and apply Fubini's theorem to get:

\[ \|Tf\|_p \le A_p \|f\|_p \quad \text{with} \quad A_p^p = \frac{p\,(2A_s)^s\,(2(A_\infty + 1))^{p-s}}{p-s} \]

This concludes the proof of the theorem.

Example 13.3.6. (Hardy–Littlewood) The Hardy–Littlewood maximal function is of weak–type (1, 1) with
\[ m(|Mf| \ge \alpha) \le \frac{3^n}{\alpha} \|f\|_1 \]
and strong–type (∞, ∞), with
\[ \|Mf\|_\infty \le \|f\|_\infty \]
Thus by the Marcinkiewicz theorem, it is of strong–type (p, p) for any 1 < p < ∞.
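
A discrete analogue gives a feel for the two endpoint bounds. The sketch below is an illustration under assumptions that are not in the text: it replaces Rn by the integers with the counting measure, uses centered windows {k − r, ..., k + r} as balls, and relies on the same covering argument to give the constant 3 in dimension one. The name `maximal_function` and the sample data are illustrative.

```python
def maximal_function(f, k):
    """Centered maximal function at k on the integers; f is a dict {index: value} of finite support."""
    lo, hi = min(f), max(f)
    best = 0.0
    # Once the window contains the whole support, enlarging it can only lower the average.
    for r in range(0, max(abs(k - lo), abs(k - hi)) + 1):
        total = sum(abs(f.get(j, 0.0)) for j in range(k - r, k + r + 1))
        best = max(best, total / (2 * r + 1))
    return best

f = {0: 4.0, 1: 1.0, 5: 2.0}
norm1 = sum(abs(v) for v in f.values())
window = range(-60, 66)
Mf = {k: maximal_function(f, k) for k in window}

# Weak-type (1, 1): the number of points where Mf exceeds alpha is at most 3 * ||f||_1 / alpha.
weak_checks = [(alpha, sum(1 for k in window if Mf[k] > alpha), 3 * norm1 / alpha)
               for alpha in (0.5, 1.0, 2.0)]
sup_Mf = max(Mf.values())   # strong-type (infinity, infinity): sup Mf <= sup |f|
print(weak_checks, sup_Mf)
```

Counting only inside a finite window can undercount the level set, so the printed counts are lower bounds for the true ones; the inequality being illustrated is unaffected.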
Theorem 13.3.7. Suppose {Tt : t ∈ I}, where I ⊂ R, is a family of linear operators on
Lp(X, F, µ) into Lp(Y, G, ν). Define the maximal function
\[ (T^* f)(y) = \sup_{t \in I} |T_t f(y)|, \qquad y \in Y \]

If T∗ is of weak (p, q)–type, 1 ≤ p, q < ∞, then the set $\{f \in L_p(\mu) : \lim_{t \to t_0} T_t f(y) = f(y)\ \nu\text{–a.s.}\}$ is closed in Lp(µ).

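Proof. Suppose {fn : n ∈ N} ⊂ Lp(µ) converges to f in Lp(µ) and satisfies $\lim_{t \to t_0} T_t f_n = f_n$ ν–a.s. for each n. Then for any λ > 0,

\[
\begin{aligned}
\nu\Big\{ y \in Y : \limsup_{t \to t_0} |T_t f(y) - f(y)| > \lambda \Big\}
&\le \nu\Big\{ y \in Y : \limsup_{t \to t_0} |T_t(f - f_n)(y) - (f(y) - f_n(y))| > \lambda \Big\} \\
&\le \nu\Big\{ y \in Y : T^*(f - f_n)(y) > \frac{\lambda}{2} \Big\} + \nu\Big\{ y \in Y : |f(y) - f_n(y)| > \frac{\lambda}{2} \Big\} \\
&\le \Big( \frac{2A}{\lambda} \|f - f_n\|_p \Big)^q + \Big( \frac{2}{\lambda} \|f - f_n\|_p \Big)^p
\end{aligned}
\]
The terms in the last inequality converge to 0 as n → ∞. Therefore,

\[ \nu\Big\{ y \in Y : \limsup_{t \to t_0} |T_t f(y) - f(y)| > 0 \Big\} \le \sum_{k \in \mathbb{N}} \nu\Big\{ y \in Y : \limsup_{t \to t_0} |T_t f(y) - f(y)| > \frac{1}{k} \Big\} = 0. \]

We conclude that $\lim_{t \to t_0} T_t f = f$ ν–a.s.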

We can provide another proof of the a.s. convergence of $\frac{1}{\lambda_d(B(x;r))} \int_{B(x;r)} f(y)\,dy$ to f(x).

Theorem 13.3.8. Let $f \in L_1^{loc}(\mathbb{R}^d, \lambda_d)$. Then
\[ \lim_{r \to 0} \frac{1}{\lambda_d(B(x; r))} \int_{B(x;r)} f(y)\,dy = f(x) \]

for almost all x ∈ Rd.

Proof. The conclusion is obviously true for all f ∈ C00(Rd). The operators $T_r f(x) = \frac{1}{\lambda_d(B(x;r))} \int_{B(x;r)} f\,d\lambda_d$ clearly map L1(Rd, λd) into itself, and T∗ is Hardy's maximal function
M. As M is of weak–(1, 1) type, the result follows from Theorem 13.3.7.

13.4. Localization of distributions

In this section, we discuss a technique that allows us to define distributions by looking at their
local behavior. Suppose Ω is a nonempty open set in Rn. Let U be an open subset of Ω. If
u1 and u2 are distributions in Ω, i.e. ui ∈ D′(Ω) for i = 1, 2, we say that u1 = u2 in U if
u1 φ = u2 φ for all φ ∈ D(U). We say that u ∈ D′(Ω) vanishes in U if u(φ) = 0 whenever
φ ∈ D(U). To study local properties of distributions, we will make use of a special type of
functions in D(Rn) called mollifiers. A mollifier is a radial function ψ ∈ D such that for
some compact K ⊂ Rn and open set U ⊂ Rn, K ≺ ψ ≺ U (see Exercise 13.7.4).
Theorem 13.4.1. (Smooth partition of unity). Suppose Ω is a nonempty open set in Rn.
For any open covering U of Ω, there is a sequence {(Vn, ψn) : n ∈ N}, where ψn ∈ D(Ω),
Vn is open in Ω, and such that
(i) $\overline{V_n}$ is a compact subset of Ω, and 0 ≤ ψn ≺ Vn.
(ii) {Vn : n ∈ N} is a locally finite cover of Ω.
(iii) Each $\overline{V_n}$ is contained in some member of U.
(iv) $\sum_{n \ge 1} \psi_n(x) = 1$ for all x ∈ Ω.
(v) For any compact K ⊂ Ω, there is an m ∈ N and an open set K ⊂ W ⊂ Ω such
that
ψ1(x) + . . . + ψm(x) = 1
for all x ∈ W.

Proof. Let S be a countable dense set in Ω, and let {Bn : n ∈ N} be the sequence of all closed balls
whose centers pn lie in S, whose radii rn are rational, and that are contained in some
member of U. For each Bn = B(pn; rn) set Vn = B(pn; rn/2). Clearly {Vn : n ∈ N} is an
open cover of Ω. For each n ∈ N, let φn be a mollifier such that Vn ≺ φn ≺ Bn. Define
ψ1 = φ1, and inductively ψn+1 = (1 − φ1) · . . . · (1 − φn)φn+1. Clearly 0 ≤ ψn ≺ Bn. It is
easy to check by induction that for any n ∈ N,
ψ1 + . . . + ψn = 1 − (1 − φ1) · · · (1 − φn).
If K ⊂ Ω is compact, then K ⊂ V1 ∪ . . . ∪ Vn for some n and so,
(13.10) ψ1(x) + . . . + ψn(x) = 1, x ∈ V1 ∪ . . . ∪ Vn.
From (13.10) it follows that {(Vn, ψn) : n ∈ N} satisfies (i)–(v).
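The inductive construction ψ1 = φ1, ψn+1 = (1 − φ1) · · · (1 − φn)φn+1 can be checked numerically. The sketch below works on the real line with four cutoff functions centered at 0, 1, 2, 3; the helper names (`smooth_step`, `cutoff`, `partition`, `psi_values`) and the radii 0.6 and 0.9 are illustrative assumptions, not taken from the notes. It verifies that the ψn are nonnegative and add up to 1 wherever at least one φn equals 1.

```python
import math


def _f(s):
    """exp(-1/s) for s > 0 and 0 otherwise (the function of Exercise 13.7.4)."""
    return math.exp(-1.0 / s) if s > 0 else 0.0


def smooth_step(s):
    """Smooth function equal to 0 for s <= 0 and to 1 for s >= 1."""
    a, b = _f(s), _f(1.0 - s)
    return a / (a + b)


def cutoff(t, center, inner=0.6, outer=0.9):
    """Smooth cutoff: 1 if |t - center| <= inner, 0 if |t - center| >= outer."""
    return smooth_step((outer - abs(t - center)) / (outer - inner))


def partition(phis):
    """Given values phi_1, ..., phi_n at one point, return psi_1, ..., psi_n."""
    psis, prod = [], 1.0
    for p in phis:
        psis.append(prod * p)   # psi_{k+1} = (1 - phi_1)...(1 - phi_k) phi_{k+1}
        prod *= (1.0 - p)
    return psis


CENTERS = (0.0, 1.0, 2.0, 3.0)


def psi_values(t):
    return partition([cutoff(t, c) for c in CENTERS])


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.7, 3.0):
        print(t, psi_values(t), sum(psi_values(t)))
```

Every point of [0, 3] lies within distance 0.5 of some center, so some φn equals 1 there and the telescoping identity forces the sum to be 1.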

The sequence {(Vn, ψn)} in Theorem 13.4.1 is said to be a smooth partition of unity
subordinated to U.
Theorem 13.4.2. Suppose U is an open cover of Ω ⊂ Rn, and that to any U ∈ U there
corresponds a distribution ΛU ∈ D′(U) such that
ΛU = ΛV in U ∩ V
whenever U ∩ V ≠ ∅. Then, there exists a unique Λ ∈ D′(Ω) such that
(13.11) Λ = ΛU in U
for any U ∈ U.

Proof. Let {(Vn, ψn)} be a partition of unity subordinated to U so that supp(ψn) ⊂ Vn ⊂
V̄n ⊂ Un, for Un ∈ U, and define
\[
\Lambda(\phi) := \sum_{n\ge 1} \Lambda_{U_n}(\psi_n\phi), \qquad \phi \in \mathcal{D}(\Omega). \tag{13.12}
\]
For each φ ∈ D(Ω), the summation in (13.12) is in fact finite. Clearly Λ is linear on D(Ω).
To prove continuity, suppose φn → 0 in D(Ω). Then, there is a compact set K ⊂ Ω such
that supp(φn) ⊂ K for all n. Let m be as in Theorem 13.4.1(v), so that
\[
\Lambda(\phi_n) = \sum_{j=1}^{m} \Lambda_{U_j}(\psi_j\phi_n), \qquad n \in \mathbb{N}.
\]
Since ψjφn → 0 in D(Uj) as n → ∞ for each j, Λ(φn) → 0 as n → ∞. This means (see Exercise 12.17.8)
that Λ ∈ D′(Ω).

We claim that Λ satisfies property (13.11). Consider φ ∈ D(U) where U ∈ U. Then
ψnφ ∈ D(U ∩ Un) and so, ΛU(ψnφ) = ΛUn(ψnφ). Consequently,
\[
\Lambda(\phi) = \sum_{n\ge 1} \Lambda_{U_n}(\psi_n\phi) = \sum_{n\ge 1} \Lambda_{U}(\psi_n\phi)
= \Lambda_U\Bigl(\sum_{n\ge 1}\psi_n\phi\Bigr) = \Lambda_U(\phi)
\]
where we have used again the fact that Σ_{n≥1} ψnφ has only a finite number of nonzero
terms. This proves the existence of Λ. To prove uniqueness, notice that for any φ ∈ D(Ω),
ψnφ ∈ D(Un). If Λ′ satisfies (13.11), then
\[
\Lambda'(\phi) = \Lambda'\Bigl(\sum_{n\ge 1}\psi_n\phi\Bigr) = \sum_{n\ge 1} \Lambda'(\psi_n\phi)
= \sum_{n\ge 1} \Lambda_{U_n}(\psi_n\phi) = \Lambda(\phi).
\]

Definition 13.4.3. Suppose Λ ∈ D′(Ω). Let WΛ be the union of all open sets in Ω where
Λ vanishes. The support of Λ is defined as SΛ = Ω \ WΛ.
Theorem 13.4.4. If Λ ∈ D′ (Ω) has support SΛ then,
(i) Λ vanishes off SΛ .
(ii) If φ ∈ D(Ω) and supp(φ) ∩ SΛ = ∅, then Λ(φ) = 0.
(iii) If SΛ = ∅, then Λ ≡ 0.
(iv) If SΛ ⊂ W ⊂ Ω for some open set W and ψ ∈ C∞(Ω) is such that ψ|W ≡ 1, then
ψ · Λ = Λ.

Proof. (i) Let {(Vn, ψn)} be a partition of unity subordinated to the collection U of all
open sets in Ω on which Λ vanishes, and let WΛ = ⋃U. If φ ∈ D(WΛ), then φ = Σ_{n≥1} ψnφ.
Only finitely many terms in the sum are different from 0. Since ψnφ ∈ D(U) for some open set U
where Λ vanishes,
\[
\Lambda(\phi) = \sum_{n\ge 1} \Lambda(\psi_n\phi) = 0.
\]

(ii) and (iii) follow directly from (i).

(iv) If φ ∈ D(Ω), then {φ − ψφ ≠ 0} ⊂ Wc ⊂ SΛc. By (ii), Λ(φ) − Λ(ψφ) = Λ(φ − ψφ) = 0.

13.5. Riesz duality between C0 (X) and M (X)

In this section assume that X is a l.c.H. topological space. The Riesz representation theorem
gives an isomorphism from the cone of finite (positive) regular measures on B(X) onto the
cone of positive continuous linear functionals on C00 (X). The space C0 (X) of continuous
functions vanishing at infinity is the closure of C00 (X) with respect to the sup norm. Thus,
any bounded linear functional Λ on C00 (X) has a unique extension to a bounded linear
functional Λ∗ on C0 (X) and kΛ∗ k = kΛk. Furthermore, if Λ is positive, so is the extension
Λ∗. It is straightforward to check that the conclusion of the Riesz representation theorem remains
true if C00(X) is replaced by C0(X).
Suppose now that µ is a complex (or finite signed) measure. Then the map f ↦ ∫_X f dµ
defines a bounded linear functional Λµ on C0(X) since
\[
\Bigl|\int_X f\,d\mu\Bigr| \le \int_X |f|\,d|\mu| \le \|f\|_u\,\|\mu\|_{TV}.
\]

Lemma 13.5.1. If |µ| is a regular finite measure on B(X), then ‖Λµ‖ = ‖µ‖TV.

Proof. We only need to prove that ‖Λµ‖ ≥ ‖µ‖TV. Since |µ| is finite and regular, for any
measurable set A and ε > 0, there is a compact set K ⊂ A such that |µ|(A \ K) < ε.
Let {Aj : 1 ≤ j ≤ n} be a finite partition of X such that ‖µ‖TV < Σ_{j=1}^n |µ(Aj)| + ε/2. For
each 1 ≤ j ≤ n, let Kj ⊂ Aj be a compact set such that |µ|(Aj \ Kj) < 2^{−j−1}ε. Then
\[
\|\mu\|_{TV} < \sum_{j=1}^{n} |\mu(K_j)| + \varepsilon \le \sum_{j=1}^{n} |\mu|(K_j) + \varepsilon .
\]

Λ is a real linear functional on C0 (X) if Λ(f ) ∈ R whenever f is real–valued. The next

result for linear functionals is the analog to the Hahn decomposition theorem for signed
measures.

Lemma 13.5.2. Suppose Λ is a real bounded linear functional on the space C0 (X). There
exists a pair of positive bounded linear functionals Λ+ and Λ− on C0 (X) such that Λ =
Λ+ − Λ− .

Proof. Since Λ is a bounded linear functional on C0(X), by Dini’s theorem, it is a σ–continuous
elementary integral. For any ψ ∈ C0+(X) we have that
|Λ|(ψ) := sup{|Λ(φ)| : φ ∈ C0(X), |φ| ≤ ψ} ≤ ‖Λ‖‖ψ‖u.
Hence, Λ has finite variation |Λ|. Theorem 10.1.9 implies that Λ+ = ½(|Λ| + Λ) and Λ− =
½(|Λ| − Λ) are positive σ–continuous elementary integrals on C0(X). Since |Λ|(ψ) ≤ ‖Λ‖‖ψ‖u
for all ψ ∈ C0+(X), Λ+ and Λ− are in fact positive bounded linear functionals.
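A finite-dimensional analogue may help to fix ideas. In the sketch below we assume that X is a finite set with n points, so that C0(X) is identified with R^n and a bounded linear functional is given by a weight vector w through Λ(f) = Σ w_i f_i. The function names are ours. In this setting the supremum defining |Λ|(ψ) is attained at φ_i = sign(w_i)ψ_i, and Λ± correspond to the positive and negative parts of the weights.

```python
def apply(w, f):
    """Lambda(f) = sum_i w_i f_i on a finite set X = {0, ..., n-1}."""
    return sum(wi * fi for wi, fi in zip(w, f))


def variation(w, psi):
    """|Lambda|(psi) = sup{|Lambda(phi)| : |phi| <= psi} for psi >= 0.

    On a finite set the supremum is attained at phi_i = sign(w_i) * psi_i.
    """
    phi = [(1.0 if wi >= 0 else -1.0) * pi for wi, pi in zip(w, psi)]
    return abs(apply(w, phi))


def positive_part(w, psi):
    """Lambda_+(psi) = (|Lambda|(psi) + Lambda(psi)) / 2."""
    return 0.5 * (variation(w, psi) + apply(w, psi))


def negative_part(w, psi):
    """Lambda_-(psi) = (|Lambda|(psi) - Lambda(psi)) / 2."""
    return 0.5 * (variation(w, psi) - apply(w, psi))


if __name__ == "__main__":
    w, psi = [2.0, -3.0, 0.5], [1.0, 2.0, 4.0]
    print(variation(w, psi), positive_part(w, psi), negative_part(w, psi))
```

For the weights and test function in the usage lines, Λ+(ψ) agrees with Σ max(w_i, 0)ψ_i and Λ−(ψ) with Σ max(−w_i, 0)ψ_i, and their difference recovers Λ(ψ).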

Theorem 13.5.3. (Riesz representation theorem) Let X be a l.c.H. topological space. Suppose
that Λ is a complex or real bounded linear functional on C0(X). Then, there is a unique
regular complex (finite signed) measure µΛ on B(X) such that
\[
\Lambda(f) = \int_X f\,d\mu_\Lambda, \qquad f \in C_0(X), \tag{13.13}
\]
and ‖Λ‖ = ‖µΛ‖.

Proof. It suffices to consider the case of real bounded linear functional, for if Λr = ℜ(Λ),
then Λ(f ) = Λr (f ) − i Λr (i f ).

Let Λ be a real bounded linear functional on C0(X). By Lemma 13.5.2, there is a pair of positive bounded
linear functionals Λ+, Λ− such that Λ = Λ+ − Λ−. By the Riesz representation theorem 7.7.3,
there is a pair of regular finite measures µ+ and µ− such that Λ±(f) = ∫_X f dµ± on C0(X).
Let µΛ = µ+ − µ−. Hence Λ(f) = ∫_X f dµΛ and, by Lemma 13.5.1, ‖Λ‖ = ‖µΛ‖.

To prove uniqueness, suppose that ν is a finite regular signed measure and that ∫_X f dν = 0 for
all f ∈ C0(X). Let ν = ν+ − ν− be the Hahn decomposition of ν. The linear functionals
Λ±(f) = ∫_X f dν± are bounded. The assumption ∫_X f dν = 0 implies that Λ+ = Λ−. The
Riesz representation theorem 7.7.3 shows that ν+ = ν−. Since ν+ ⊥ ν−, we have that
ν+ = ν− = 0.

If we denote by M(X) the space of complex (or real–valued, of finite total variation) measures
on B(X), and by C0∗(X) the space of complex (real) bounded linear functionals on
C0(X), the Riesz duality principle states that the map µ ↦ Λµ from M(X) to C0∗(X) is an
isometry.

Corollary 13.5.4. Suppose X is a Hausdorff compact topological space. The set P(X) of
Borel probability measures on X is a weak∗–compact convex subset of M(X).

Proof. Convexity is obvious. By the Riesz representation theorem, C∗(X) = M(X). Since
\[
\mathcal{P}(X) \subset \Bigl\{\mu : \Bigl|\int f\,d\mu\Bigr| \le 1 \text{ whenever } \|f\|_u \le 1\Bigr\}
\]
and the set on the right hand side is compact by Alaoglu’s theorem, it is enough to show that
P(X) is weak∗–closed. For any h ∈ C(X) with h ≥ 0 the set Eh := {µ : ∫ h dµ ≥ 0} is weak∗–closed.
Similarly, the set E := {µ : ∫ 1 dµ = 1} is weak∗–closed. Since P(X) is the intersection of
E and the sets Eh, we have that P(X) is weak∗–closed.

13.6. An application: Runge’s theorem.

This section is an application of the Riesz representation theorem to the problem of approx-
imating holomorphic functions by nice functions, extending the Stone–Weierstrass theorem
to the setting of holomorphic functions.
A rational function f is the ratio of two polynomial functions. Assume f = P/Q
where P and Q are polynomials with no common factors. Then, f has a pole at each zero
of Q, and there are finitely many such zeroes since Q is a polynomial. Subtracting the principal part at
each of those zeroes leaves a rational function whose only singularity is at
∞, that is, f is of the form
\[
f(z) = p_0(z) + \sum_{j=1}^{k} p_j\bigl((z-a_j)^{-1}\bigr)
\]
where p0, . . . , pk are polynomials and a1, . . . , ak are the distinct zeroes of Q. The degree of
pj, 1 ≤ j ≤ k, corresponds to the order of multiplicity of aj.
The problem we will study below is that of approximating holomorphic functions in an
open set by rational functions with a prescribed set of poles. We first state an auxiliary
topological result about the complex plane.
Lemma 13.6.1. Let Ω be a nonempty open set in C. There exists a sequence of compact
sets Kn such that
(i) Kn ⊂ Int(Kn+1)
(ii) Ω = ⋃n Kn
(iii) Every component of S2 \ Kn contains a component of S2 \ Ω, where S2 is the
one point compactification C ∪ {∞} of C.

Proof. (i) For each n ∈ N define Kn = {z ∈ C : |z| ≤ n, d(z, Ωc) ≥ 1/n}. Clearly each Kn is a
compact subset of Ω. Suppose z ∈ Kn and let r = 1/n − 1/(n+1). We claim that B(z; r) ⊂ Kn+1.
Indeed, for y ∈ B(z; r) and ω ∈ C \ Ω,
\[
|y| \le |z| + |z-y| < n + \frac{1}{n} - \frac{1}{n+1} < n+1,
\]
\[
\frac{1}{n} \le |z-\omega| \le |z-y| + |y-\omega| < \frac{1}{n} - \frac{1}{n+1} + |y-\omega| .
\]
Hence |y − ω| > 1/(n+1) and so, d(y, Ωc) ≥ 1/(n+1).
(ii) If z ∈ Ω, then d(z, Ωc) > 0. Let m ∈ N be large enough so that |z| ≤ m and 1/m < d(z, Ωc).
Then, z ∈ Km.
(iii) For each n ∈ N, set B(∞; n) := {z : |z| > n}. Let C be a connected component of
Vn := S2 \ Kn. Since
\[
S^2 \setminus \Omega \subset V_n = B(\infty; n) \cup \bigcup_{a \notin \Omega} B\bigl(a; \tfrac{1}{n}\bigr),
\]
C is open and contains at least one of the open discs B(a; 1/n) where a ∈ {∞} ∪ (C \ Ω) = S2 \ Ω.
Say B(a0; 1/n) ⊂ C. Since discs are connected, C intersects the connected component Q of a0
in S2 \ Ω. Since connected components are pairwise disjoint, Q ⊂ C.
Theorem 13.6.2. (Runge) Suppose ∅ ≠ K ⊂ Ω ⊂ C, where K and Ω are compact and open
respectively. Let A = {aj} be a set that contains one point in each component of S2 \ K. If
f ∈ H(Ω), for every ε > 0 there exists a rational function R whose poles lie in A such that
\[
\sup_{z\in K} |f(z) - R(z)| < \varepsilon. \tag{13.14}
\]

Proof. Let R be the subspace of rational functions contained in C(K), and whose poles lie
in A. The statement of the theorem is equivalent to saying that if f ∈ H(Ω), then f is in the
uniform closure of R in C(K). By the Hahn–Banach theorem 12.10.9, this is equivalent to
saying that if µ ∈ C∗(K) and µ ∈ R⊥, then µ(f) = 0. By the Riesz representation theorem,
C∗(K) is the space M(K) of complex (and thus of finite variation) Borel measures on K.
Suppose then that µ ∈ M(K) is such that ∫_K R dµ = 0 for all R ∈ R. Define
\[
h(z) := \int_K \frac{1}{w-z}\,\mu(dw), \qquad z \in S^2 \setminus K.
\]
Theorem 11.4.2 together with Remark 11.4.3 imply that h ∈ H(S2 \ K). We claim that
h ≡ 0 on S2 \ K. Suppose Cj is the component of S2 \ K that contains aj.
Case aj ∈ C: For some r > 0, B(aj; r) ⊂ Cj. For fixed z ∈ B(aj; r) and w ∈ K,
|z − aj| < r ≤ |w − aj| and so,
\[
\frac{1}{w-z} = \lim_{N\to\infty} \sum_{n=0}^{N} \frac{(z-a_j)^n}{(w-a_j)^{n+1}} \tag{13.15}
\]
uniformly for w ∈ K. The truncated sums in (13.15) belong to R and so, they vanish under
µ. Hence h(z) = 0 for z ∈ B(aj; r) and, since Cj is connected, h ≡ 0 on Cj.
Case aj = ∞: There is r > 0 such that B(∞; r) ⊂ Cj. For fixed z with |z| > r,
\[
\frac{1}{w-z} = -\lim_{N\to\infty} \sum_{n=0}^{N} \frac{w^n}{z^{n+1}} \tag{13.16}
\]
uniformly for w ∈ K. Again, the truncated sums in (13.16) belong to R and, by similar
arguments as before, h ≡ 0 on Cj.
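The two geometric expansions of the Cauchy kernel used in this proof are easy to check numerically. The sketch below is only an illustration with sample points of our choosing; the function names are not part of the notes. For a finite pole a with |z − a| < |w − a| it sums (z − a)^n/(w − a)^{n+1}, and for the pole at ∞ with |z| > |w| it uses 1/(w − z) = −(1/z)(1 − w/z)^{−1} = −Σ w^n/z^{n+1}.

```python
def partial_sum_finite_pole(z, w, a, N):
    """Truncated expansion of 1/(w - z) around a finite point a, for |z - a| < |w - a|."""
    return sum((z - a) ** n / (w - a) ** (n + 1) for n in range(N + 1))


def partial_sum_pole_at_infinity(z, w, N):
    """Truncated expansion of 1/(w - z) for |z| > |w|; note the overall minus sign."""
    return -sum(w ** n / z ** (n + 1) for n in range(N + 1))


if __name__ == "__main__":
    w = 1 + 1j                      # a sample point playing the role of w in K
    z_near, a = 0.2 + 0.1j, 0j      # z close to the pole a
    z_far = 5 + 0j                  # z in a neighborhood of infinity
    print(abs(partial_sum_finite_pole(z_near, w, a, 40) - 1 / (w - z_near)))
    print(abs(partial_sum_pole_at_infinity(z_far, w, 60) - 1 / (w - z_far)))
```

As functions of w, the first family of partial sums consists of rational functions with a single pole at a, and the second of polynomials, which is why both belong to R.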

Let Γ be a cycle in Ω such that Γ ∼ 0 in Ω and IndΓ(z) = 1 for all z ∈ K. Since Γ∗ ⊂
Kc, Cauchy’s general theorem and Fubini’s theorem (notice that the integrand involved is
continuous on the compact set K × Γ∗) imply that
\[
\begin{aligned}
\int_K f\,d\mu &= \int_K \Bigl(\frac{1}{2\pi i}\int_\Gamma \frac{f(w)}{w-z}\,dw\Bigr)\mu(dz)\\
&= \frac{1}{2\pi i}\int_\Gamma f(w)\Bigl(\int_K \frac{1}{w-z}\,\mu(dz)\Bigr)dw\\
&= -\frac{1}{2\pi i}\int_\Gamma f(w)h(w)\,dw = 0.
\end{aligned}
\]
This shows that µ(f) = 0 for all µ ∈ R⊥.

The following special case is an extension of the Stone–Weierstrass theorem.

Corollary 13.6.3. Let K and Ω be as in Runge’s theorem, and suppose S2 \ K is connected.
If f ∈ H(Ω), for any ε > 0, there exists a polynomial p such that supz∈K |f(z) − p(z)| < ε.

Proof. The assumptions imply that S2 \ K is an open connected set containing ∞. The
conclusion follows by applying Runge’s theorem with A = {∞}.
Theorem 13.6.4. Let Ω be a nonempty open set in C, and A a set that has one point in each
component of S2 \ Ω (A could be uncountable). If f ∈ H(Ω), there exists a sequence of
rational functions Rn whose poles are all in A, such that Rn converges to f uniformly on
compact subsets of Ω. In particular, when S2 \ Ω is connected, one may take A = {∞} and
get Rn to be polynomials.

Proof. Let {Kn} be a sequence of compact sets as in Lemma 13.6.1. Since each component
of Vn := S2 \ Kn contains a component of S2 \ Ω, each component of Vn contains a point of A. Thus, by
Runge’s theorem, for each n ∈ N there exists a rational function Rn whose poles are in A
such that
\[
\sup_{z\in K_n} |R_n(z) - f(z)| < \frac{1}{n}.
\]
For any compact set K ⊂ Ω, there is n0 ∈ N such that K ⊂ Kn for all n ≥ n0. The
conclusion follows.

13.7. Exercises
Exercise 13.7.1. Suppose that Λ is a bounded linear functional on Lp (Ω). For A ∈ F ,
define FA = {F ∩ A : F ∈ F }; denote by µA the restriction of µ to FA ; for any real or
complex valued function f on A, define fA as fA = f on A and zero elsewhere. Show that
ΛA : f 7→ Λ(fA ) is a bounded linear functional on Lp (A) with kΛA k ≤ kΛk.
Exercise 13.7.2. Let X = Ck([0, 1]) be the space of functions that admit continuous derivatives
up to order k. Define ‖f‖k := ‖f‖u + ‖f′‖u + . . . + ‖f(k)‖u. Show that ‖ ‖k is a complete
norm on X. If Λ ∈ X∗, show that there is a Borel measure µ on [0, 1] and constants
c0, . . . , ck−1 such that
\[
\Lambda(f) = \int_0^1 f^{(k)}\,d\mu + \sum_{j=0}^{k-1} c_j f^{(j)}(0)
\]
for all f ∈ X. (Hint: Let I : C([0, 1]) → C1([0, 1]) be the operator defined by f ↦ ∫_0^x f(t) dt. Show
that Λ ∘ I^k ∈ C([0, 1])∗.)
Exercise 13.7.3. Let µ1 , µ2 be measures on the unit circle S1 defined by
µ1 (dθ) = cos θ dθ, µ2 (dθ) = sin θ dθ.
Find the range of the vector–valued measure µ := (µ1 , µ2 ).
Exercise 13.7.4. (Mollifiers) In R1, let f(t) = e^{−1/t} 1_{(0,∞)}(t).
(a) Verify that f ∈ C∞(R) and that f(k)(0) = 0 for all k ∈ Z+. (This is a typical
example of a smooth function that fails to be analytic.)
(b) Show that ψa,b(t) = f(t − a)f(b − t) is a C∞(R) function with support [a, b].
(c) Let Ca,b = ∫_R ψa,b(t) dt and define ϕa,b(t) = Ca,b^{−1} ∫_{−∞}^t ψa,b(s) ds. Show that 0 ≤
ϕa,b ≤ 1, ϕa,b(t) = 0 for t ≤ a, and ϕa,b(t) = 1 for t ≥ b.
(d) For ε > 0, use (c) to construct a mollifier η ∈ D(Rn) which equals one in the
closed unit ball B(0; 1) and vanishes outside the ball B(0; 1 + ε). (Hint: The function
g(t) = ϕ−δ,0(t)ϕ−δ,0(1 − t) is a function in D(R), 0 ≤ g ≤ 1, g(t) = 1 for 0 ≤ t ≤ 1,
and supp(g) = [−δ, 1 + δ].)
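The construction in Exercise 13.7.4 can be explored numerically before attempting the proofs. The following sketch is only an experiment, not a solution: the integral defining ϕa,b is approximated by a midpoint rule, the radial function `eta` is one possible reading of part (d), and all function names are ours. The value δ = 0.5 is chosen so that e^{−1/t} does not underflow to zero on the whole interval of integration.

```python
import math


def f(t):
    """f(t) = exp(-1/t) for t > 0 and 0 otherwise."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def psi(t, a, b):
    """psi_{a,b}(t) = f(t - a) f(b - t), supported on [a, b]."""
    return f(t - a) * f(b - t)


def integrate(func, lo, hi, n=2000):
    """Midpoint rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def phi(t, a, b):
    """Normalized primitive of psi_{a,b}: 0 for t <= a, 1 for t >= b."""
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    total = integrate(lambda s: psi(s, a, b), a, b)
    return integrate(lambda s: psi(s, a, b), a, t) / total


def g(t, delta):
    """g(t) = phi_{-delta,0}(t) * phi_{-delta,0}(1 - t), as in the hint of part (d)."""
    return phi(t, -delta, 0.0) * phi(1.0 - t, -delta, 0.0)


def eta(x, delta):
    """Radial cutoff on R^n (x is a sequence of coordinates): eta(x) = g(|x|)."""
    return g(math.sqrt(sum(c * c for c in x)), delta)


if __name__ == "__main__":
    for t in (-0.6, -0.25, 0.0, 0.5, 1.0, 1.25, 1.6):
        print(t, g(t, 0.5))
```

With δ = 0.5 the printed values are 0 outside [−0.5, 1.5], equal to 1 on [0, 1], and strictly between 0 and 1 in the two transition zones.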
Exercise 13.7.5. Show that every meromorphic function on S2 is rational.
Chapter 14

Calculus on Banach spaces

In this chapter we give a brief presentation of Calculus on Banach spaces. We cover three
topics: integration, differentiation, and optimization. First, following the steps of Daniell’s
approach to integration, we extend the notion of measurability, and introduce Bochner’s
integral, a form of integration defined for functions taking values in a Banach space. The
second part of these notes is dedicated to differentiation. We extend the notion of derivative
to functions between Banach spaces, and present important results such as the
mean value theorem and the implicit function theorem. Lastly, we give a brief introduction
to the problem of optimization, where the objective and constraints are defined in Banach
spaces.

14.1. Measurability and uniformity

The general Stone–Weierstrass theorem allows us to extend the notion of measurability to
functions taking values in a metric space.
Definition 14.1.1. Let ‖ ‖ be a mean for a Stone lattice or a ring ℰ ⊂ Bb(Ω). Suppose
(S, ρ) is a metric space. A function f ∈ SΩ is measurable if for any A ∈ L1 and ε > 0,
there is L1 ∋ A0 ⊂ A with ‖A \ A0‖ < ε on which f is ℰ–uniformly continuous. We
denote by MS(‖ ‖) the space of S–valued measurable functions.
Remark 14.1.2. If f and g are MS(‖ ‖)–functions, then the map x ↦ ρ(f(x), g(x)) is in
MR(‖ ‖). Indeed, given A ∈ L1 and ε > 0 there exists an L1–subset A0 ⊂ A with ‖A \ A0‖ < ε
where both f and g are ℰ–uniformly continuous. Hence, for any η > 0 there exist δ > 0 and
a finite set {φ1, . . . , φn} ⊂ ℰ such that x, y ∈ A0 and max1≤j≤n |φj(x) − φj(y)| < δ imply
that ρ(f(x), f(y)) ∨ ρ(g(x), g(y)) < η/2. Consequently, for all such x, y
|ρ(f(x), g(x)) − ρ(f(y), g(y))| ≤ ρ(f(x), f(y)) + ρ(g(x), g(y)) < η.


Suppose (Ω, M, µ) is a measure space. Let µ∗ be the Daniell mean associated to the
elementary integral µ on the space ℰ of simple integrable M–measurable functions. A Borel
measurable function on Ω with values in a general metric space may fail to be in MS(µ∗).
For separable metric spaces we have the following result.
Lemma 14.1.3. Suppose (Ω, M, µ) is a measure space and let µ∗ be the Daniell mean
associated to µ. If (S, d) is separable and f : (Ω, M) → (S, B(S)) is measurable, then f ∈ MS(µ∗).

Proof. For any ε > 0 there exists a countable disjoint Borel cover {Bn : n ∈ N} of S with
diam(Bn) < ε. Let A ∈ L1. As A = ⋃n A ∩ f−1(Bn) and f−1(Bn) ∈ M for all n, there is
an integer N such that µ(A \ A0) < ε where A0 = ⋃_{j=1}^N (f−1(Bj) ∩ A). Let δ = ε ∧ 1 and
φj = 1Bj ∘ f. If x, y ∈ A0 and max1≤j≤N |φj(x) − φj(y)| < δ, then x and y belong to the
same set f−1(Bj) for some 1 ≤ j ≤ N. Consequently, d(f(x), f(y)) ≤ diam(Bj) < ε. This
shows that f is ℰ–uniformly continuous on A0.

The general Stone–Weierstrass theorem shows that when (S, ρ) = (R, | |), Definitions
14.1.1 and 7.1.3 are equivalent. Egorov’s theorem extends to the general setting
of Definition 14.1.1.
Theorem 14.1.4. (Extended Egorov’s theorem) Suppose that (fn : n ∈ N) ⊂ MS(‖ ‖)
converges almost surely to f. Then f is measurable and, for any A ∈ L1 and ε > 0, there
is L1 ∋ A0 ⊂ A with ‖A \ A0‖ < ε on which convergence is uniform.

Proof. By repeating the proof of Lemma 7.1.4 we obtain a set L1 ∋ A′0 ⊂ A with ‖A \ A′0‖ <
ε/2 on which each fn is ℰ–uniformly continuous.

If f, g are ℰ–uniformly continuous and Φ : x ↦ d(f(x), g(x)) ∧ 1, then Φ is also ℰ–uniformly
continuous since |Φ(x) − Φ(y)| ≤ d(f(x), f(y)) + d(g(x), g(y)). It follows that
each Φn,m = d(fn, fm) ∧ 1 is ℰ–uniformly continuous on A′0 and, by the general Stone–Weierstrass
theorem, 1A′0 Φn,m ∈ L1. Repeating the steps of the proof of Theorem 7.1.2 and
applying monotone convergence we obtain that
\[
S(n,k) = A_0' \cap \bigcap_{i,j\ge n} \bigl\{\Phi_{i,j} \le \tfrac{1}{k}\bigr\} \in L_1
\]
and for each k fixed, S(n, k) ↗ A′0 as n ↗ ∞. By monotone convergence, there is a
subsequence nk < nk+1 such that ‖A′0 \ S(nk, k)‖ < ε2^{−k−1}. If U is the set where fn
converges to f, then A0 = U ∩ ⋂k S(nk, k) is an integrable set with ‖A \ A0‖ < ε on
which fn converges to f uniformly. It remains to show that f is ℰ–uniformly continuous on
A0. Let N ∈ N be large enough so that supx∈A0 d(fN(x), f(x)) < ε/3. As fN is ℰ–uniformly
continuous on A0, there exist δ > 0 and φ1, . . . , φk in ℰ such that for any x, y ∈ A0,
max1≤j≤k |φj(x) − φj(y)| < δ implies d(fN(x), fN(y)) < ε/3. Therefore,
d(f(x), f(y)) ≤ d(f(x), fN(x)) + d(fN(x), fN(y)) + d(fN(y), f(y)) < ε
whenever max1≤j≤k |φj(x) − φj(y)| < δ.

14.2. Banach valued integral

Suppose (ℰ, I) is a positive σ–continuous elementary integral with Daniell mean ‖ ‖∗. Let
(E, | |E) be a Banach space. For any function f ∈ EΩ define
\[
\|f\|_E^* := \bigl\|\,|f|_E\,\bigr\|^* . \tag{14.1}
\]
Functions of the form
\[
\Phi = \sum_{j=1}^{n} e_j\phi_j, \qquad n \in \mathbb{N},\ e_j \in E,\ \phi_j \in \mathcal{E}, \tag{14.2}
\]
will be called E–valued elementary functions. The collection of all such functions will
be denoted as ℰ ⊗ E.
The following result summarizes the properties of ‖ ‖∗E; this, in turn, will be used to
define E–valued integrable functions.
Theorem 14.2.1. Let FE be the space of E–valued almost surely defined functions for
which the seminorm (14.1) is finite.
(i) (FE, ‖ ‖∗E) is a complete seminormed space.
(ii) If {fn} ⊂ EΩ converges to f in ‖ ‖∗E–mean, then there is a subsequence that
converges to f almost surely.
Let L1(E) be the closure of ℰ ⊗ E in FE. If f ∈ L1(E), then
(iii) f is measurable and |f|E ∈ L1(R) := L1(‖ ‖∗),
(iv) for any ε > 0, there is a set U ∈ L1(R) with ‖U‖∗ < ε such that f is the uniform
limit of a sequence in ℰ ⊗ E on Uc.
Remark 14.2.2. Unless it is clear from the context, we will explicitly specify the Banach
space E in L1(E) to distinguish it from the space of numerical integrable functions L1(‖ ‖∗).

Proof. (i) and (ii) follow by repeating all the steps of the proof of Theorem 6.3.12, substituting
the absolute value | | by the E–norm | |E.
To prove (iii), we first consider functions of the form Φ = Σj ejφj as in (14.2). Since
\[
\bigl|\,|\Phi(x)|_E - |\Phi(y)|_E\,\bigr| \le |\Phi(x) - \Phi(y)|_E \le \sum_j |e_j|_E\,|\phi_j(x) - \phi_j(y)|,
\]
it follows that Φ and |Φ|E are E–valued and R–valued ℰ–uniformly continuous respectively.
By the Stone–Weierstrass theorem, |Φ|E is the sum of a constant a ∈ R and a function
φ ∈ ℰ^u. Hence, |Φ|E ∈ F(‖ ‖∗) ∩ MR(‖ ‖∗) and we conclude that |Φ|E ∈ L1(‖ ‖∗). For
general f ∈ L1(E), let Φn ∈ ℰ ⊗ E be a sequence converging to f almost surely and in
‖ ‖∗E–norm. Egorov’s theorem 14.1.4 shows that f is measurable. Since
\[
\bigl\|\,|f|_E - |\Phi_n|_E\,\bigr\|^* \le \bigl\|\,|f - \Phi_n|_E\,\bigr\|^* = \|f - \Phi_n\|_E^* \to 0
\]
and |Φn|E ∈ L1(‖ ‖∗), we conclude that |f|E ∈ L1(‖ ‖∗).


Statement (iv) is proved by a slight modification of the proof of Theorem 7.1.1. Choose a
sequence {Φn} ⊂ ℰ ⊗ E that converges to f in mean and such that ‖Φn − Φn−1‖∗E < 2^{−n−1}.
Setting Ψ0 = Φ0 and Ψn = Φn − Φn−1 for n ≥ 1 we have that f = Σn Ψn in mean and
almost surely. Let f′ = Σn Ψn where the series converges, and zero otherwise.
The real valued sequence ψn = Σ_{k=1}^n k|Ψk|E converges in L1 and almost surely to ψ =
Σn n|Ψn|E. For any M and K
\[
\|\{\psi > M\}\|^* \le \frac{1}{M}\|\psi\|^* \le \frac{1}{M}\Bigl(\sum_{k=1}^{K} k\,\|\Psi_k\|_E^* + \sum_{k>K} \frac{k}{2^k}\Bigr);
\]
thus, for K and M large enough we have ‖{ψ > M}‖∗ < ε. For such M define U = {ψ >
M}. Then U ∈ L1(‖ ‖∗) and on Uc,
\[
\Bigl|f' - \sum_{k=1}^{n} \Psi_k\Bigr|_E \le \sum_{k>n} |\Psi_k|_E \le \frac{1}{n}\sum_{k>n} k\,|\Psi_k|_E \le \frac{\psi}{n} \le \frac{M}{n}.
\]
Therefore, on Uc ∩ {f′ = f}, f is the uniform limit of a sequence in ℰ ⊗ E.

The functions f ∈ L1(E) are called Bochner–integrable. The dominated convergence
theorem can be extended to Bochner’s integral.
Theorem 14.2.3. (Bochner dominated convergence.) Suppose {fn} ⊂ L1(E) converges
almost surely to f and that |fn|E ≤ g for some g ∈ F. Then f ∈ L1(E) and ‖fn − f‖∗E → 0.

Proof. For any k, j ≥ n, |fk − fj|E ≤ ⋁_{k,j≥n} |fk − fj|E =: gn. The real valued sequence
gn → 0 almost surely and is dominated by 2g; thus, ‖gn‖∗ → 0 by Daniell–Lebesgue
dominated convergence, and {fn} is a Cauchy sequence in L1(E). Therefore, ‖ |fn − f|E ‖∗ =
‖fn − f‖∗E → 0.

Parts (iii) and (iv) of Theorem 14.2.1 make a connection between integrability of E–valued
functions, ℰ–uniform continuity, and uniform limits of ℰ ⊗ E functions. This motivates the
following stronger notion of measurability.
Definition 14.2.4. An E–valued function f is strongly measurable if for any set A ∈ L1
and ε > 0, there exists an integrable set A0 ⊂ A with ‖A \ A0‖∗ < ε on which f is the
uniform limit of a sequence in ℰ ⊗ E.
As any ℰ ⊗ E–function is ℰ–uniformly continuous and thus measurable, strong measurability
implies measurability. It is easy to check that the collection of strongly measurable
functions is a linear space.
Example 14.2.5. (Strong measurability of functions in C(R, E)). Let ∆ be a subinterval
of R. Consider ℰ, the space of step functions in (∆, B(∆)), and let ‖ ‖λ be Daniell’s mean
associated to Lebesgue’s measure on R. If u ∈ C(∆; E), then u is strongly measurable. To
check this, define
n −1
2X k
un (t) := u n 1 k k+1 (t), n ∈ N.
n
2 ,
2n 2n
∩∆
k=−2

Clearly (un : n ∈ N) ⊂ ℰ ⊗ E converges to u uniformly in compact sets. It follows from the
inner regularity of Lebesgue measure that u is strongly measurable.
Theorem 14.2.6. Suppose f ∈ EΩ and f(Ω) is separable. If Λ ∘ f ∈ MR(‖ ‖∗) for all
Λ ∈ E∗, then f is strongly measurable.

Proof. The linear space V = span f(Ω) is separable. Let {yn : n ∈ N} be a countable
dense subset in V. By the Hahn–Banach extension theorem 12.10.9, for each yn there is Λn ∈ V∗
such that ‖Λn‖V = 1 and Λn yn = ‖yn‖. We claim that
‖y‖ = supn |Λn y|, y ∈ V.
It is clear that supn |Λn y| ≤ ‖y‖. If ynk → y, then
‖ynk‖ = |Λnk ynk| ≤ |Λnk(ynk − y)| + |Λnk y| ≤ ‖ynk − y‖ + supn |Λn y|
and so, ‖y‖ ≤ supn |Λn y|.
Since Λn is the restriction of a linear functional in E∗ to V, Λn ∘ f ∈ MR(‖ ‖∗) by assumption.
Consequently, for any y0 ∈ V
\[
f^{-1}(B(y_0; r)) = \bigcap_n \{\omega \in \Omega : |\Lambda_n(f(\omega)) - \Lambda_n(y_0)| \le r\} \in \mathcal{M}(\|\ \|^*).
\]

For any m, {B(yn; 1/m) : n ∈ N} covers V. Set D^m_1 = B(y1; 1/m), D^m_{n+1} = B(yn+1; 1/m) \
⋃_{j=1}^n B(yj; 1/m) for n ≥ 1. Then Φm = Σn yn 1_{D^m_n} clearly satisfies ‖f − Φm ∘ f‖u ≤ 1/m for
each m.
Notice that 1 ≡ Σn 1_{D^m_n} ∘ f for each m. By Egorov’s theorem, given a set A0 ∈ L1(‖ ‖∗)
and ε > 0, there is an integrable subset A1 ⊂ A0 with ‖A0 \ A1‖∗ < ε/2 on which Σn 1_{D^1_n} ∘ f
converges uniformly to 1. Thus, all but finitely many 1_{D^1_n} ∘ f vanish on A1. As a consequence,
there exists ψ1 ∈ ℰ ⊗ E such that ‖ |Φ1 ∘ f − ψ1|E ‖u,A1 < 1, and so ‖ |f − ψ1|E ‖u,A1 < 2.
Repeating this argument inductively we obtain a decreasing sequence of sets (Am : m ∈
N) ⊂ L1(‖ ‖∗) and a sequence (ψm : m ∈ N) ⊂ ℰ ⊗ E such that
\[
\|A_{m-1} \setminus A_m\|^* < \frac{\varepsilon}{2^m}, \qquad
\bigl\|\,|f - \psi_m|_E\,\bigr\|_{u,A_m} < \frac{2}{m}.
\]
By monotone convergence B := ⋂m Am ∈ L1(‖ ‖∗) and by construction ‖A0 \ B‖∗ ≤
Σm ‖Am−1 \ Am‖∗ < ε. On B we have that ψm → f uniformly. This shows that f is
strongly measurable.

The next result gives necessary and sufficient conditions for a function f ∈ E Ω to be
integrable.
Theorem 14.2.7. f ∈ L1(E) iff f is strongly measurable and |f|E ∈ L1(‖ ‖∗). In either
case, there exists f̃ ∈ L1(E) such that ‖f − f̃‖∗E = 0 and f̃(Ω) is separable.

Proof. Necessity is in Theorem 14.2.1 (iii) and (iv).

To prove sufficiency, we first show that if ‖ |f − Φn|E ‖u,A → 0 where A ∈ L1 and {Φn} ⊂
ℰ ⊗ E, then f1A ∈ L1(E). Indeed,
\[
\|(f - \Phi_n)1_A\|_E^* \le \bigl\|\,|f - \Phi_n|_E\,\bigr\|_{u,A}\,\|A\|^* \to 0 .
\]
We show now that f1A ∈ L1(E) whenever A ∈ L1. For each k ≥ 1 choose L1 ∋ Ak ⊂ A
with ‖A \ Ak‖∗ < 2^{−k} on which f is the uniform limit of a sequence in ℰ ⊗ E. The sequence
of integrable functions fk = f1Ak converges to f1A pointwise on A′ := ⋃n ⋂k≥n Ak. Since
‖A \ A′‖∗ = 0, the convergence is in fact almost sure. As |fk|E ≤ |f|E and |f|E ∈ L1, by
dominated convergence f1A ∈ L1(E).
By Chebyshev’s inequality Bn = {|f|E > 1/n} ∈ L1 for each n. Consequently, gn = f1Bn ∈
L1(E). Since gn converges to f almost surely and is dominated by |f|E, we conclude that
f ∈ L1(E).
The last statement follows by choosing a sequence {Φn} ⊂ ℰ ⊗ E converging to f a.s.
and in mean. Let f̃ = f where Φn converges and 0 ∈ E otherwise.
Example 14.2.8. In Example 14.2.5 we showed that E–valued continuous functions on an
interval ∆ ⊂ R are strongly measurable. Consequently, if f ∈ C(∆; E) and ∫_∆ |f|E dλ < ∞
then f ∈ L1(‖ ‖λE).
Example 14.2.9. Suppose Ω is a Hausdorff compact topological space, and µ is a Borel
probability measure on Ω. Let ‖ ‖∗µ be the Daniell mean induced by the elementary integral
(C(Ω), µ). If F : Ω → E is continuous, then F(Ω) is a compact subset of E and, since E is a
(complete) normed space, F(Ω) is separable. For each Λ ∈ E∗, Λ ∘ F ∈ C(Ω) ⊂ MC(‖ ‖∗µ).
Hence, F is strongly measurable and since |F|E ∈ L1(‖ ‖∗µ), F ∈ L1(E).

14.3. Extension of Bochner’s Integral

Let S denote the collection of functions of the form
\[
\Phi = \sum_{j=1}^{n} e_j 1_{A_j}, \qquad e_j \in E,\ A_j \in L_1(\|\ \|^*). \tag{14.3}
\]
Clearly S ⊂ L1(‖ ‖∗E). Define the integral I on S by I(Φ) = Σ_{j=1}^n ej I(Aj). It is clear that
I is independent of the representation (14.3) of Φ. Moreover,
\[
I(\Phi) = \sum_{e \in E} e\, I(\{\Phi = e\}) \tag{14.4}
\]
and the integral is dominated by the mean since
\[
|I(\Phi)|_E \le \sum_{e\in E} |e|_E\, I(\{\Phi = e\}) = I\Bigl(\sum_{e\in E} |e|_E 1_{\{\Phi = e\}}\Bigr) = I(|\Phi|_E) = \|\Phi\|_E^* .
\]
Since simple functions are dense in L1(‖ ‖∗) we have that f ∈ L1(E) iff there is {Φn} ⊂ S
such that ‖f − Φn‖∗E → 0. From
\[
|I(\Phi_n) - I(\Phi_m)|_E = |I(\Phi_n - \Phi_m)|_E \le \bigl\|\,|\Phi_n - \Phi_m|_E\,\bigr\|^* = \|\Phi_n - \Phi_m\|_E^*
\]

it follows that I(Φn) is a Cauchy sequence in E. We define the Bochner integral of a
function f ∈ L1(E) by I(f) = limn I(Φn).
As in the real–valued case, the Bochner integral is linear on L1(‖ ‖∗E). To check this,
suppose f, g ∈ L1(E) and a ∈ C. Choose sequences Ψn and Φn in S such that ‖Ψn − f‖∗E ∨
‖Φn − g‖∗E → 0. Then aΨn + Φn ∈ S and from
‖af + g − (aΨn + Φn)‖∗E ≤ |a|‖f − Ψn‖∗E + ‖g − Φn‖∗E
the linearity of I follows. A direct consequence of the linearity of the Bochner integral is
that I on ℰ ⊗ E is independent of the particular choice of representation (14.2) of Φ, that
is, if Φ = Σ_{j=1}^n ejφj = Σ_{k=1}^m e′kφ′k then,
\[
I(\Phi) = \sum_{j=1}^{n} e_j I(\phi_j) = \sum_{k=1}^{m} e_k' I(\phi_k').
\]

Theorem 14.3.1. If f ∈ L1(‖ ‖∗E) and Λ ∈ E∗, then
\[
|I(f)|_E \le I(|f|_E) \le \bigl\|\,|f|_E\,\bigr\|^* = \|f\|_E^*,
\]
Λf ∈ L1(‖ ‖∗) and I(Λf) = Λ(I(f)).

Proof. Suppose {Φn} ⊂ S is such that ‖Φn − f‖∗E → 0. Since
\[
\bigl|\,|I(f)|_E - |I(\Phi_n)|_E\,\bigr| \le |I(f) - I(\Phi_n)|_E,
\]
it follows that
\[
|I(f)|_E = \lim_n |I(\Phi_n)|_E \le \lim_n I(|\Phi_n|_E) = \lim_n \|\Phi_n\|_E^* = I(|f|_E) = \|f\|_E^* .
\]
For any Λ ∈ E∗ and Φ ∈ S, ΛΦ ∈ L1(‖ ‖∗) and I(ΛΦ) = Λ(I(Φ)). For general f ∈ L1(‖ ‖∗E),
let {Φn : n ∈ N} ⊂ S be such that ‖f − Φn‖∗E → 0 as n → ∞. Then
\[
|\Lambda(I(f) - I(\Phi_n))| \le \|\Lambda\|\,|I(f) - I(\Phi_n)|_E \le \|\Lambda\|\, I(|f - \Phi_n|_E) = \|\Lambda\|\,\|f - \Phi_n\|_E^*,
\]
and
\[
|I(\Lambda f - \Lambda\Phi_n)| \le I(|\Lambda f - \Lambda\Phi_n|) \le \|\Lambda\|\, I(|f - \Phi_n|_E) = \|\Lambda\|\,\|f - \Phi_n\|_E^* .
\]
Therefore Λ(I(f)) = limn Λ(I(Φn)) = limn I(ΛΦn) = I(Λf).
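For simple functions the statements of Theorem 14.3.1 reduce to finite sums and can be verified directly. The sketch below assumes E = R^2 with the Euclidean norm and a simple function Φ = Σ ej 1_{Aj} with disjoint sets Aj of given measures; the data and function names are illustrative only. It checks that |I(Φ)|E ≤ I(|Φ|E) and that a linear functional Λ commutes with the integral.

```python
import math


def bochner_integral(values, measures):
    """I(Phi) = sum_j e_j mu(A_j) for Phi = sum_j e_j 1_{A_j}, with e_j in R^d."""
    dim = len(values[0])
    return tuple(sum(e[i] * m for e, m in zip(values, measures)) for i in range(dim))


def norm(v):
    """Euclidean norm on R^d, playing the role of | |_E."""
    return math.sqrt(sum(x * x for x in v))


def integral_of_norm(values, measures):
    """I(|Phi|_E) = sum_j |e_j|_E mu(A_j), valid because the A_j are disjoint."""
    return sum(norm(e) * m for e, m in zip(values, measures))


def functional(lam, v):
    """A linear functional on R^d given by the coefficient vector lam."""
    return sum(l * x for l, x in zip(lam, v))


if __name__ == "__main__":
    values = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]   # the vectors e_j
    measures = [0.5, 0.25, 1.0]                       # mu(A_j), A_j pairwise disjoint
    lam = (3.0, -2.0)
    I_phi = bochner_integral(values, measures)
    print(I_phi, norm(I_phi), integral_of_norm(values, measures))
    # Lambda(I(Phi)) versus I(Lambda Phi)
    print(functional(lam, I_phi),
          sum(functional(lam, e) * m for e, m in zip(values, measures)))
```

The general case in the theorem follows from this finite computation by passing to limits in the mean ‖ ‖∗E.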

Remarks 14.3.2. The following observations are in order:
(a) By Theorem 14.3.1 and the Hahn–Banach theorem, the Bochner integral of f ∈
L1(E) is the unique element I(f) ∈ E such that Λ(I(f)) = I(Λf) for all Λ ∈ E∗.
(b) If L is a linear map from E into F, then Lφ ∈ ℰ ⊗ F whenever φ ∈ ℰ ⊗ E. When L
is bounded, arguments similar to those used in the proof of Theorem 14.3.1 show
that Lf ∈ L1(‖ ‖∗F) for all f ∈ L1(‖ ‖∗E), and L(I(f)) = I(Lf).

Corollary 14.3.3. If f ∈ L1(‖ ‖∗E) then I(f) belongs to the closed linear span of {f(x) : x ∈ Ω}.


Proof. Let V be the closed linear space generated by f(Ω). If I(f) ∉ V, then there
exists Λ ∈ E∗ such that Λ(V) = {0} and Λ(I(f)) ≠ 0. However, Λ(I(f)) = I(Λf) = 0,
which is a contradiction.

The following result extends Remark 14.3.2(b) to the setting of closed operators (not necessarily
bounded). B ⊂ E × F is a closed linear map if B is a closed linear subspace
of E × F such that (0, y) ∈ B implies that y = 0. The domain of B is defined as
dom(B) = {x ∈ E : ∃y ∈ F with (x, y) ∈ B}. Similarly, the range of B is defined as
range(B) = {y ∈ F : ∃x ∈ E with (x, y) ∈ B}. If (x, y) ∈ B, then we write y = Bx.
Theorem 14.3.4. (Hille) Let B ⊂ E × F be a closed linear map. Suppose f ∈ L1(‖ ‖∗E)
and that f(Ω) ⊂ dom(B). If Bf ∈ L1(‖ ‖∗F) then I(f) ∈ dom(B) and B(I(f)) = I(Bf).

Proof. Consider the Banach space (E × F, | |E + | |F). Since L1(E × F) is the closure
of ℰ ⊗ (E × F) in FE×F under the seminorm ‖ ‖∗E×F ≤ ‖ ‖∗E + ‖ ‖∗F, we obtain that if
f1 ∈ L1(‖ ‖∗E) and f2 ∈ L1(‖ ‖∗F), then g = (f1, f2) ∈ L1(‖ ‖∗E×F) and I(g) = (I(f1), I(f2)).
This, together with the assumptions on f, implies that h := (f, Bf) ∈ L1(‖ ‖∗E×F) and
I(h) = (I(f), I(Bf)). As B is closed and h takes values in graph(B), it follows from Corollary 14.3.3
that I(h) ∈ graph(B). Therefore, I(f) ∈ dom(B) and I(Bf) = B(I(f)).

We conclude this section with a simple fundamental theorem of Calculus for Banach
valued integrals over compact intervals.
Theorem 14.3.5. If f ∈ C1([a, b]; E), then f′ ∈ L1([a, b], ‖ ‖λE) and
\[
f(b) - f(a) = \int_a^b f'(t)\,dt .
\]

Proof. Integrability of f′ follows immediately from continuity, see Examples 14.2.5 and 14.2.9.
For any Λ ∈ E∗ we have that φΛ = Λ ∘ f ∈ C1([a, b]; R) and, by the fundamental theorem
of Calculus and Theorem 14.3.1,
\[
\Lambda\bigl(f(b) - f(a)\bigr) = \int_a^b \frac{d}{dt}(\Lambda \circ f)(t)\,dt = \int_a^b \Lambda(f'(t))\,dt
= \Lambda\Bigl(\int_a^b f'(t)\,dt\Bigr).
\]
The conclusion follows from the version of the Hahn–Banach extension theorem stated in
Theorem 12.10.9.

Recall that a continuous curve in $E$ parameterized by $[a, b]$ is a continuous function $\varphi : [a, b] \to E$. We denote the image $\varphi([a, b])$ by $\varphi^*$. When $\varphi(a) = \varphi(b)$ we say that $\varphi$ is a closed curve. A curve $\varphi$ in $E$ is rectifiable if
\[ \ell_\varphi(a, b) := \sup_P \sum_{k=1}^n |\varphi(t_k) - \varphi(t_{k-1})|_E < \infty, \]
where the supremum is taken over all partitions $P = \{a = t_0 < \dots < t_n = b\}$ of $[a, b]$. The term $\ell_\varphi(a, b)$ is the arc length of the curve $\varphi$ over $[a, b]$.
A path $\varphi$ in $E$ is a continuous function $\varphi : [a, b] \to E$ such that, for some partition $a = t_0 < \dots < t_n = b$, $\varphi \in C^1([t_{k-1}, t_k])$ for each $k$. In Example 14.6.6 we show that if $\varphi$ is a path in $E$ defined on $[a, b]$, then
\[ \ell_\varphi(a, b) = \int_a^b |\varphi'(t)|_E\,dt. \tag{14.5} \]
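As a numerical illustration of formula (14.5), the following sketch compares the polygonal sums that define the arc length with a midpoint-rule approximation of the integral of the speed. The helix in $\mathbb{R}^3$ and the function names are hypothetical choices made only for this illustration; they do not come from the notes.

```python
import math

def polygonal_length(phi, a, b, n):
    """Sum of |phi(t_k) - phi(t_{k-1})| over a uniform partition of [a, b] into n pieces."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    pts = [phi(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def speed_integral(dphi, a, b, n):
    """Midpoint-rule approximation of the integral of |phi'(t)| over [a, b]."""
    h = (b - a) / n
    return sum(math.hypot(*dphi(a + (k + 0.5) * h)) * h for k in range(n))

# Hypothetical test curve (an assumption of this sketch): a helix in R^3.
def helix(t):
    return (math.cos(t), math.sin(t), t)

def helix_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)

a, b = 0.0, 2.0 * math.pi
poly = polygonal_length(helix, a, b, 2000)
integral = speed_integral(helix_prime, a, b, 2000)
exact = 2.0 * math.pi * math.sqrt(2.0)  # |phi'(t)| = sqrt(2) for the helix
print(poly, integral, exact)
```

The polygonal sum approaches the exact length from below as the partition is refined, which is consistent with the supremum in the definition of $\ell_\varphi(a, b)$.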

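If $\varphi$ is a path defined on $[a, b]$ and $f : \varphi^* \to L(E, F)$ is continuous, the path integral of $f$ over $\varphi$ is defined as
\[ \int_\varphi f := \int_a^b f(\varphi(t))\varphi'(t)\,dt. \]
As $f \circ \varphi$ is continuous on $[a, b]$, $M = \sup_{p \in \varphi^*} \|f(p)\|_{L(E,F)} < \infty$ and so,
\[ \Big|\int_\varphi f\Big|_F \le M\,\ell_\varphi(a, b). \tag{14.6} \]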

The next result is a direct consequence of Theorem 14.3.5.

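Theorem 14.3.6. Suppose that $f$ is a continuously differentiable function in an open set $\Omega \subset \mathbb{R}^d$ and that $\varphi$ is a path with $\varphi^* \subset \Omega$. Then,
\[ f(\varphi(b)) - f(\varphi(a)) = \int_a^b f'(\varphi(t))\varphi'(t)\,dt =: \int_\varphi f'. \]

Proof. It is enough to assume that $\varphi$ is continuously differentiable over $[a, b]$; the general case follows by adding the identities over the subintervals of the partition. By the chain rule, $\frac{d}{dt}(f \circ \varphi)(t) = f'(\varphi(t))\varphi'(t)$. The conclusion follows from the fundamental theorem of Calculus (Theorem 14.3.5).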

14.4. Other vector valued integrals

There are other integration theories for functions defined on a measure space $(\Omega, \mathcal{F}, \mu)$ with values in a topological vector space $(X, \tau)$. The Gelfand integral, the Pettis integral, and the Dunford integral are examples. In these notes, we only briefly discuss the Pettis integral, which we will use later on in the discussion of convolution of distributions.
A function $\phi : \Omega \to X$ is weakly measurable if for any $\Lambda \in X^*$, the function $\omega \mapsto \Lambda(\phi(\omega))$ is measurable.

Definition 14.4.1. Suppose $\phi$ is a weakly measurable function. If there exists $y \in X$ such that
\[ \Lambda y = \int \Lambda \circ \phi\,d\mu, \qquad \Lambda \in X^*, \tag{14.7} \]
then we say that $\phi$ is Pettis integrable and that $y := \int \phi\,d\mu$ is its Pettis integral.

Remark 14.4.2. If X is a Banach space, then Theorems 14.2.6 and 14.2.7 show that
Bochner integrability implies Pettis integrability and in this case, the value of the integral
is uniquely defined. If $X$ is a topological vector space where $X^*$ separates points, then the
Pettis integral of a weakly measurable function, when it exists, is uniquely defined.

Theorem 14.4.3. Suppose $\mu$ is a Borel probability measure on a compact Hausdorff space $\Omega$. Let $X$ be a topological vector space where $X^*$ separates points. If $\phi : \Omega \to X$ is continuous and the closed convex hull $\overline{\operatorname{co}}(\phi(\Omega))$ of $\phi(\Omega)$ is compact in $X$, then $\phi$ is Pettis integrable and $y := \int_\Omega \phi\,d\mu \in \overline{\operatorname{co}}(\phi(\Omega))$.

Proof. We consider the case of real vector spaces. The complex case follows from this by
doubling dimension in the arguments detailed below.

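Let $H = \overline{\operatorname{co}}(\phi(\Omega))$. As $\Lambda \circ \phi$ is continuous for each $\Lambda \in X^*$, $\phi$ is weakly measurable. For any finite sequence $L = (\Lambda_1, \dots, \Lambda_n)$ in $X^*$, define $E_L$ as the set of $y \in H$ for which (14.7) holds whenever $\Lambda \in L$. Each $E_L$ is closed, by the continuity of the $\Lambda_j$, and $H$ is compact. If $E_L \neq \emptyset$ for every nonempty finite sequence $L$, then the collection $\mathcal{E}$ of all such $E_L$ has the finite intersection property and so, $\bigcap \mathcal{E} \neq \emptyset$. Any $y$ in that intersection is a Pettis integral of $\phi$.

It remains to show that $E_L \neq \emptyset$. Regard $L$ as the continuous linear map $x \mapsto (\Lambda_1 x, \dots, \Lambda_n x)$ from $X$ to $\mathbb{R}^n$. Set $K = L(\phi(\Omega))$, $m_j = \int_\Omega (\Lambda_j \phi)\,d\mu$, and $m := (m_1, \dots, m_n)$. Suppose $t \in \mathbb{R}^n \setminus \operatorname{co}(K)$. Since $\operatorname{co}(K)$ is compact in $\mathbb{R}^n$, by the separation Theorem 12.10.15(iii) there is a vector $c = (c_1, \dots, c_n)$ such that
\[ \sum_{j=1}^n c_j u_j < \sum_{j=1}^n c_j t_j, \qquad u = (u_1, \dots, u_n) \in \operatorname{co}(K). \]
In particular,
\[ \sum_{j=1}^n c_j \Lambda_j \phi(\omega) < \sum_{j=1}^n c_j t_j, \qquad \omega \in \Omega. \]
Since $\mu$ is a probability measure, integrating over $\Omega$ gives $c \cdot m < c \cdot t$, so $t \neq m$. This shows that $m \in \operatorname{co}(L(\phi(\Omega)))$. Since $L$ is linear, $\operatorname{co}(L(\phi(\Omega))) = L(H)$ and so, there exists $y \in H$ such that $m = Ly$. Thus $\Lambda_j y = \int_\Omega (\Lambda_j \phi)\,d\mu$, $1 \le j \le n$, and $E_L \neq \emptyset$.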

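Example 14.4.4. Let $\psi, \phi$ be functions in $\mathcal{D}(\mathbb{R}^n)$, and set $K_1 = \operatorname{supp}(\psi)$, $K_2 = \operatorname{supp}(\phi)$. The map $x \mapsto \psi(x)\tau_x\phi$ is a continuous function from $K_1$ to $\mathcal{D}(\mathbb{R}^n)$. Notice that
\[ \operatorname{supp}(\psi(x)\tau_x\phi) \subset K_1 + K_2, \]
and $\{\psi(x)\tau_x\phi : x \in K_1\}$ is a compact subset of $\mathcal{D}_{K_1+K_2}$. As $\mathcal{D}_{K_1+K_2}$ is Fréchet, the closed convex hull of this compact set is compact, so Theorem 14.4.3 applies and, for any distribution $u \in \mathcal{D}^*(\mathbb{R}^n)$,
\[ u\Big(\int_{K_1} \psi(x)\tau_x\phi\,dx\Big) = \int_{K_1} \psi(x)\,u(\tau_x\phi)\,dx. \]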

Theorem 14.4.5. Suppose $X$ is a Banach space, $Q$ is a compact Hausdorff topological space, and $\mu$ is a Borel measure on $Q$. If $f : Q \to X$ is continuous, then
\[ \Big\|\int_Q f\,d\mu\Big\| \le \int_Q \|f\|\,d\mu. \]

Proof. By Theorem 14.4.3, $y = \int_Q f\,d\mu$ exists. By Theorem 12.10.9, there is $\Lambda \in X^*$ with $\Lambda y = \|y\|$ and $\|\Lambda\| = 1$. It follows that
\[ \|y\| = \Lambda y = \int_Q \Lambda \circ f\,d\mu \le \int_Q \|f\|\,d\mu. \]

14.5. Symbolic calculus in Banach algebras

In this section we present an extension of the notion of analytic functions to functions with
values on a topological vector space. An instance of this appeared already in our discussion
of the spectrum of points in a Banach algebra in Theorem 12.14.2.
Definition 14.5.1. Suppose $X$ is a locally convex space. Let $f$ be a function defined on an open set $\Omega \subset \mathbb{C}$ with values in $X$. $f$ is weakly holomorphic in $\Omega$ if $\Lambda \circ f \in H(\Omega)$ for all $\Lambda \in X^*$; $f$ is strongly holomorphic in $\Omega$ if
\[ \lim_{z \to a} \frac{f(z) - f(a)}{z - a} \]
exists in the topology of $X$ for any $a \in \Omega$.

It is left as an exercise to check that the sum of strongly (resp. weakly) holomorphic functions is strongly (resp. weakly) holomorphic, and that the product of a scalar holomorphic function and a strongly (resp. weakly) holomorphic function $f$ is strongly (resp. weakly) holomorphic.
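Theorem 14.5.2. Suppose $X$ is a Fréchet space. If $f : \Omega \to X$, where $\Omega \subset \mathbb{C}$ is open, is weakly holomorphic on $\Omega$, then it is continuous (in the original topology of $X$), strongly holomorphic, and
\[ \frac{1}{2\pi i}\int_\Gamma \frac{f(w)}{w - a}\,dw = f(a), \tag{14.8} \]
\[ \int_\Gamma f(w)\,dw = 0 \tag{14.9} \]
for any cycle $\Gamma \sim 0$ in $\Omega$, and $a \in \Omega \setminus \Gamma^*$ such that $\operatorname{Ind}_\Gamma(a) = 1$.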

Proof. We first prove that $f$ is continuous. Let $a \in \Omega$ and $r > 0$ be such that $\overline{B}(a; r) \subset \Omega$. Since $\Lambda \circ f \in H(\Omega)$ for all $\Lambda \in X^*$, the set
\[ Q = \Big\{ \frac{f(w) - f(a)}{w - a} : 0 < |w - a| \le r \Big\} \]
is weakly bounded. Then, by Theorem 12.12.14, $Q$ is bounded in the original topology of $X$. Hence, for any open neighborhood $V$ of $0$ in $X$, there is $t_0 > 0$ such that
\[ f(w) - f(a) \in (w - a)\,t_0 V, \qquad |w - a| \le r. \]
This implies continuity of $f$ at $a$.

Theorems 12.3.11 and 14.4.3 imply that the integrals in (14.8) and (14.9) exist. By the general Cauchy theorem 11.31, the formulas there hold for $\Lambda \circ f$ in place of $f$, where $\Lambda \in X^*$; hence, the identities hold by the definition of the Pettis integral.
We now prove that $f$ is strongly holomorphic. Let $a \in \Omega$ and $r > 0$ be as before. Let $\Gamma$ be the circle of radius $r$ centered at $a$. It follows from Theorems 12.3.11 and 14.4.3 that $\frac{1}{2\pi i}\int_\Gamma \frac{f(w)}{(w-a)^2}\,dw$ exists. A simple calculation using (14.8) gives, for $0 < |z - a| < r$,
\[ \frac{f(z) - f(a)}{z - a} - \frac{1}{2\pi i}\int_\Gamma \frac{f(w)}{(w - a)^2}\,dw = (z - a)\,\frac{1}{2\pi i}\int_\Gamma \frac{f(w)}{(w - a)^2 (w - z)}\,dw. \tag{14.10} \]
Let $V$ be any balanced convex neighborhood of $0$ in $X$. Define $g(z)$ as the integral on the right-hand side of (14.10). As $K = \{f(w) : w \in \Gamma^*\}$ is compact in $X$, $K \subset t_0 V$ for some $t_0 > 0$. Since $dw = i r e^{i\theta}\,d\theta$ on $a + r\mathbb{S}^1$, for $|z - a| < r/2$ we have that
\[ |(w - a)^{-2}(w - z)^{-1}|\,|dw| \le 2r^{-2}\,d\theta. \]
It follows that the integrand in $g$ is contained in $2r^{-2}K \subset 2r^{-2}t_0 V$; consequently, $g(z) \in 2r^{-2}t_0 V$. This shows that $g$ is bounded for $|z - a| < r/2$, so that $(z - a)g(z) \to 0$ in $X$ as $z \to a$; hence, $f$ is strongly holomorphic.
Example 14.5.3. Suppose $A$ is a Banach algebra and let $x \in A$. Let $\sigma(x)$, $\rho(x)$ and $r(x)$ be the spectrum, resolvent set and spectral radius of $x$. Theorem 12.14.2 shows that $f(\lambda) = (\lambda e - x)^{-1}$ is weakly holomorphic on $\rho(x) = \mathbb{C} \setminus \sigma(x)$. Hence, $f(\lambda)$ and the functions $\lambda^n f(\lambda)$, $n \in \mathbb{Z}_+$, are strongly holomorphic on $\rho(x)$. If $|\lambda| > \|x\|$, then $f(\lambda) = \sum_{m=0}^\infty \lambda^{-m-1}x^m$, where the series converges absolutely, and uniformly on compact subsets of $\mathbb{C} \setminus \overline{B}(0; \|x\|)$. Denoting by $\Gamma_r$ the circle of radius $r$ centered at $0$, by Theorem 14.4.5, we have that for $r > \|x\|$
\[ x^n = \frac{1}{2\pi i}\int_{\Gamma_r} \lambda^n f(\lambda)\,d\lambda, \qquad n \in \mathbb{Z}_+. \tag{14.11} \]
Since $\rho(x)$ contains all $\lambda$ with $|\lambda| > r(x)$, Theorem 14.5.2 implies that the condition $r > \|x\|$ in (14.11) can be replaced by $r > r(x)$. For such $r$, let $M(r) = \max\{\|f(\lambda)\| : |\lambda| = r\}$. Then
\[ \|x^n\| \le r^{n+1}M(r). \]
Consequently $\limsup_n \sqrt[n]{\|x^n\|} \le r$ and so, $\limsup_n \sqrt[n]{\|x^n\|} \le r(x)$.
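Formula (14.11) can be checked numerically in the Banach algebra of $2 \times 2$ complex matrices. The sketch below is only an illustration under stated assumptions: the matrix, the radius and the helper names are hypothetical, and the contour integral is approximated with the trapezoid rule on the circle. The radius $2.5$ lies strictly between the spectral radius ($2$) and the operator norm (about $3.7$) of the chosen matrix, so it also illustrates the remark that $r > r(x)$ suffices.

```python
import cmath
import math

def resolvent(lam, x):
    """Return (lam*I - x)^{-1} for a 2x2 matrix x = ((a, b), (c, d))."""
    (a, b), (c, d) = x
    p, q, r, s = lam - a, -b, -c, lam - d
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))

def contour_power(x, n, radius, nodes=400):
    """Trapezoid-rule value of (1/(2*pi*i)) * integral over |lam| = radius
    of lam^n (lam*I - x)^{-1} d(lam)."""
    acc = [[0j, 0j], [0j, 0j]]
    for k in range(nodes):
        lam = radius * cmath.exp(2j * math.pi * k / nodes)
        res = resolvent(lam, x)
        # d(lam) = i*lam*d(theta), so the factor 1/(2*pi*i) leaves lam/nodes.
        weight = lam ** n * lam / nodes
        for i in range(2):
            for j in range(2):
                acc[i][j] += weight * res[i][j]
    return acc

# Hypothetical example: eigenvalues -1 and -2, hence spectral radius 2.
x = ((0.0, 1.0), (-2.0, -3.0))
approx = contour_power(x, 2, radius=2.5)
expected = [[-2.0, -3.0], [6.0, 7.0]]  # x squared, computed by hand
print(approx)
```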

Formula (14.11) may be easily extended to a more suggestive identity. If $p(\lambda) = a_0 + a_1\lambda + \dots + a_n\lambda^n$, it is clear what we mean by $p(x) = a_0 e + a_1 x + \dots + a_n x^n$. Then
\[ p(x) = \frac{1}{2\pi i}\int_{\Gamma_r} p(\lambda)(\lambda e - x)^{-1}\,d\lambda, \qquad r > r(x). \tag{14.12} \]
In the next result, we show that formula (14.12) actually holds for rational functions whose poles are in the resolvent set $\rho(x)$, when integration is done along cycles surrounding (see Corollary 11.5.23 for the definition) the spectrum $\sigma(x)$ of $x$.
Lemma 14.5.4. Suppose $A$ is a Banach algebra and $x \in A$. Let $\alpha \in \rho(x)$ and let $\Gamma$ be a cycle that surrounds $\sigma(x)$ in $\mathbb{C} \setminus \{\alpha\}$. Then, for any $n \in \mathbb{Z}$
\[ (\alpha e - x)^n = \frac{1}{2\pi i}\int_\Gamma (\alpha - \lambda)^n (\lambda e - x)^{-1}\,d\lambda. \tag{14.13} \]

14.6. Differentiation on Banach spaces

In this section we extend the notions of differentiability of real-valued Calculus to the setting of Banach spaces. We also present extensions of classical results such as the mean value theorem, the implicit function theorem, and problems of optimization to this setting. The statements and proofs of all results from real-valued Calculus that we use are left as exercises.
Suppose X and Y are Banach spaces over the field F (F = R or F = C) and let U ⊂ X
be nonempty and open.
Definition 14.6.1. (Fréchet) A function $F : U \to Y$ is called differentiable at $x \in U$ if there is $F'(x) \in L(X, Y)$ such that
\[ F(x + h) = F(x) + F'(x)h + r(h), \tag{14.14} \]
where $r(h) = o(h)$; i.e., $\lim_{h \to 0} \frac{\|r(h)\|}{\|h\|} = 0$.

$F'(x)$ is called the derivative of $F$ at $x$. It is easy to check that if $F$ is differentiable at $x$, then $F$ is continuous at $x$. The function $F$ is continuously differentiable in $U$ if $F$ is differentiable at every $x \in U$ and the map $F' : U \to L(X, Y)$ is continuous. If the derivative function $F'$ is itself continuously differentiable on $U$, then we say that $F$ is of class $C^2(U, Y)$. More generally, $F \in C^r(U, Y)$ if $F' \in C^{r-1}(U, L(X, Y))$.
We now extend the chain rule for composition of differentiable functions to Banach spaces.
Lemma 14.6.2. Let $X$, $Y$ and $Z$ be Banach spaces and let $U \subset X$ and $V \subset Y$ be nonempty open sets. Suppose $F : U \to Y$ and $G : V \to Z$ are functions such that $F(U) \subset V$. If $F$ is differentiable at $x$ and $G$ is differentiable at $F(x)$, then $G \circ F$ is differentiable at $x$ and
\[ (G \circ F)'(x) = G'(F(x))\,F'(x). \]

Proof. Denote $y = F(x)$. Then, for all $h \in X$ and $k \in Y$ small enough,
\begin{align*}
F(x + h) &= F(x) + F'(x)h + r(h), \\
G(y + k) &= G(y) + G'(y)k + p(k),
\end{align*}
where $r(h) = o(h)$ and $p(k) = o(k)$. By continuity, $k(h) = F(x + h) - F(x) = F'(x)h + r(h) \to 0$ as $h \to 0$; hence, for $h$ small enough,
\begin{align*}
G(F(x + h)) &= G(F(x) + k(h)) = G(F(x)) + G'(F(x))k(h) + p(k(h)) \\
&= G(F(x)) + G'(F(x))F'(x)h + G'(F(x))r(h) + p(k(h)).
\end{align*}
For $s(h) = G'(F(x))r(h) + p(k(h))$, we have
\[ \frac{\|s(h)\|}{\|h\|} \le \|G'(F(x))\|\frac{\|r(h)\|}{\|h\|} + \frac{\|p(k(h))\|}{\|k(h)\|}\,\frac{\|k(h)\|}{\|h\|} \to 0 \]
as $h \to 0$, since $\frac{\|k(h)\|}{\|h\|} \le \|F'(x)\| + \frac{\|r(h)\|}{\|h\|}$ remains bounded (the second term is read as $0$ when $k(h) = 0$).

The following result is an extension of the real-variable mean value theorem (see Exercise 14.10.11) to the setting of differentiable functions in Banach spaces.
Theorem 14.6.3. (Mean value theorem) Suppose $F \in C^1(U, Y)$ where $U \subset X$ is convex. For any $x, y \in U$,
\[ \|F(x) - F(y)\| \le M(x, y)\,\|x - y\|, \tag{14.15} \]
where $M(x, y) = \sup_{0 \le t \le 1} \|F'(x + t(y - x))\|$.

Conversely, if there is $M \ge 0$ such that
\[ \|F(x) - F(y)\| \le M\|x - y\|, \qquad x, y \in U, \tag{14.16} \]
then $\sup_{x \in U} \|F'(x)\| \le M$.

Proof. The last statement is the simplest to prove. The differentiability of $F$ implies that for any unit vector $u \in X$
\[ \lim_{t \to 0} \frac{F(x + tu) - F(x)}{t} = F'(x)u, \]
where the limit is taken over $t \in \mathbb{F}$. Therefore, from (14.16), we conclude that
\[ \sup_{\|u\| = 1} \|F'(x)u\| \le M. \]

For the first statement, it is enough to assume that $X$ and $Y$ are Banach spaces over $\mathbb{R}$. Let $x, y \in U$ be fixed. Let $v \in Y^*$ with $\|v\| = 1$. Define $\varphi : [0, 1] \to \mathbb{R}$ as $\varphi(t) = (v \circ F)(x + t(y - x))$. Then, $\varphi$ is differentiable in $(0, 1)$ and, by the real-valued mean value theorem, there is $t^* \in (0, 1)$ such that $\varphi(1) - \varphi(0) = \varphi'(t^*)$. Hence
\[ v(F(y) - F(x)) = v\big(F'(x + t^*(y - x))(y - x)\big) \le \|F'(x + t^*(y - x))\|\,\|y - x\| \le M(x, y)\|y - x\|. \]
The point $t^*$ may depend on $v$, but the last bound does not. Consequently, $\|F(y) - F(x)\| = \sup_{\|v\| = 1} v(F(y) - F(x)) \le M(x, y)\|y - x\|$.

The following results are immediate consequences of the mean value theorem.
Corollary 14.6.4. Suppose $U \subset X$ is an open connected set in the Banach space $X$. A function $F \in C^1(U, Y)$ is constant iff $F' = 0$.

Proof. Exercise.
Corollary 14.6.5. Let $X$, $Y$ be Banach spaces and $U \subset X$ open. Suppose $f \in C^1(U, Y)$. For any $x_0 \in U$ and $\varepsilon > 0$, there exists a ball $B(x_0; r) \subset U$ such that
\[ \|f(u) - f(v) - f'(x_0)(u - v)\| < \varepsilon\|u - v\|, \qquad u, v \in B(x_0; r). \]

Proof. Consider the function $g(x) = f(x) - f'(x_0)x$. The continuity of $f'$ at $x_0$ implies that there exists a ball $B(x_0; r) \subset U$ such that $\|f'(x) - f'(x_0)\| < \varepsilon$ whenever $x \in B(x_0; r)$. Since $tv + (1 - t)u \in B(x_0; r)$ for any $u, v \in B(x_0; r)$ and $0 \le t \le 1$, the mean value theorem gives
\begin{align*}
\|g(u) - g(v)\| &= \|f(u) - f(v) - f'(x_0)(u - v)\| \\
&\le \sup_{0 \le t \le 1} \|g'(u + t(v - u))\|\,\|u - v\| \\
&= \sup_{0 \le t \le 1} \|f'(tv + (1 - t)u) - f'(x_0)\|\,\|u - v\| < \varepsilon\|u - v\|.
\end{align*}

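Example 14.6.6. We are now in a position to prove that if $\varphi \in C^1([a, b], E)$, then $\varphi$ is rectifiable and $\ell_\varphi(a, b) = \int_{(a,b]} |\varphi'(t)|_E\,dt$. Given $\varepsilon > 0$, there is $\delta > 0$ such that $|t - s| < \delta$ implies $|\varphi'(t) - \varphi'(s)|_E < \varepsilon$. Let $B(x_j; \frac{\delta}{2})$, $j = 1, \dots, N$, be a finite cover of $[a, b]$. If $|s - t| < \frac{\delta}{2}$, then $s, t \in B(x_j; \delta)$ for some $j$ and so, setting $g_j(t) = \varphi(t) - \varphi'(x_j)t$, we have that
\begin{align*}
|\varphi(t) - \varphi(s) - \varphi'(x_j)(t - s)|_E &\le \sup_{0 \le \lambda \le 1} |g_j'(t + \lambda(s - t))|_E\,|t - s| \\
&= \sup_{0 \le \lambda \le 1} |\varphi'(t + \lambda(s - t)) - \varphi'(x_j)|_E\,|t - s| \le \varepsilon|t - s|.
\end{align*}
Let $a = t_0 < \dots < t_n = b$ be any partition such that $\max_{0 \le k \le n-1}(t_{k+1} - t_k) < \frac{\delta}{2}$. For each $j = 0, \dots, n - 1$, there is $x_{k_j}$ with $t_{j+1}, t_j \in B(x_{k_j}; \delta)$. Then
\begin{align*}
\Big| \sum_{j=0}^{n-1} |\varphi(t_{j+1}) - \varphi(t_j)|_E - \sum_{j=0}^{n-1} |\varphi'(t_j)|_E (t_{j+1} - t_j) \Big|
&\le \sum_{j=0}^{n-1} |\varphi(t_{j+1}) - \varphi(t_j) - \varphi'(t_j)(t_{j+1} - t_j)|_E \\
&\le \varepsilon(b - a) + \sum_{j=0}^{n-1} |\varphi'(x_{k_j}) - \varphi'(t_j)|_E (t_{j+1} - t_j) \\
&\le 2\varepsilon(b - a).
\end{align*}
The conclusion follows immediately, since the polygonal sums increase to $\ell_\varphi(a, b)$ under refinement while the Riemann sums of the continuous function $|\varphi'|_E$ converge to its integral.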

Suppose $X$ and $Y$ are locally convex linear spaces. Consider a function $F : U \to Y$, where $U$ is an open subset of $X$. The directional derivative of $F$ at $x$ in the direction $v$ is defined as
\[ D_v F(x) := \lim_{t \to 0} \frac{F(x + tv) - F(x)}{t} \]
when the limit exists. $F$ is said to be Gâteaux-differentiable at $x \in U$ if there is $L_x \in L(X, Y)$ such that $D_v F(x) = L_x v$ for all $v \in X$. In this case, $L_x$ is called the Gâteaux-derivative of $F$ at $x$. When $X$ and $Y$ are Banach spaces, if $F$ is differentiable at $x \in U$ then it is Gâteaux-differentiable at $x$ and $D_v F(x) = F'(x)v$ for all $v \in X$. The converse does not necessarily hold.

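Example 14.6.7. Consider the function $F : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[ F(x, y) = \begin{cases} x + y - \dfrac{x^3 y}{x^4 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{otherwise.} \end{cases} \]
For any $v = (h, k)$, $D_v F(0) = h + k$. However, $F$ is not differentiable at $0$ since
\[ \frac{F(h, k) - h - k}{\sqrt{h^2 + k^2}} = -\frac{1}{\sqrt{h^2 + k^2}}\,\frac{h^3 k}{h^4 + k^2} \]
fails to converge to $0$ as $(h, k) \to (0, 0)$; for instance, along $k = h^2$ with $h > 0$ it equals $-\frac{1}{2\sqrt{1 + h^2}} \to -\frac{1}{2}$.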

Theorem 14.6.8. Suppose that $F : U \subset X \to Y$ is Gâteaux-differentiable on a neighborhood $V \subset U$ of a point $x \in U$. If the Gâteaux derivative $y \mapsto L_y$ is continuous at $x$, then $F$ is differentiable at $x$ and $F'(x) = L_x$.

Proof. As $y \mapsto L_y$ is continuous at $x$, we can assume without loss of generality that $M := \sup_{y \in V} \|L_y\| < \infty$. Choose $v \in X$ small enough so that $x + tv \in V$ for all $|t| \le 1$. Let $e \in Y^*$ be such that $\|e\|_{Y^*} = 1$ and define $f(t) := e\big(F(x + tv)\big)$. The chain rule implies that $f$ is differentiable on $[0, 1]$ and $f'(t) = e\big(L_{x+tv}v\big)$. Since $f'$ is continuous over $[0, 1]$ it is integrable and, by the fundamental theorem of Calculus,
\[ f(1) - f(0) = \int_{(0,1]} f'(t)\,dt. \]
As a consequence
\begin{align*}
e\big(F(x + v) - F(x) - L_x v\big) &= \int_{(0,1]} e\big((L_{x+tv} - L_x)v\big)\,dt \\
&\le \|v\| \int_{(0,1]} \|L_{x+tv} - L_x\|\,dt.
\end{align*}
By taking the supremum over all $e \in Y^*$ with $\|e\|_{Y^*} = 1$, we obtain
\[ \|F(x + v) - F(x) - L_x v\| \le \|v\| \int_{(0,1]} \|L_{x+tv} - L_x\|\,dt. \]
Therefore
\[ F(x + v) - F(x) - L_x v = o(v) \]
since $L$ is continuous at $x$.

14.7. Implicit Function Theorem

The following result studies how the fixed points of contractions vary with respect to a parameter. From this result, we will generalize the celebrated implicit function theorem of multivariate calculus to the more general setting of Banach spaces.

Let U and V be open subsets of Banach spaces X and Y respectively. Consider a

function F : U × V −→ U . Then, F is a uniform contraction if there exists 0 ≤ θ < 1
such that

(14.17) |F (x, y) − F (x′ , y)| ≤ θ|x − x′ | x, x′ ∈ U , y ∈ V.

Theorem 14.7.1. (Uniform contraction principle) Suppose $W$ and $V$ are, respectively, a closed subset of the Banach space $X$ and an open subset of the Banach space $Y$. Let $F : W \times V \to W$ be a uniform contraction and let $x_*(y)$ be the unique fixed point of $F(\cdot, y) : W \to W$.
(i) If $F \in C(W \times V, X)$, then $x_* \in C(V, X)$.
Suppose $W = \overline{U}$ where $U$ is an open subset of $X$ and that $F(\overline{U} \times V) \subset U$.
(ii) If $F \in C(\overline{U} \times V, X)$ and $F \in C^r(U \times V, X)$ ($r \ge 1$), then $x_* \in C^r(V, X)$ and
\[ x_*'(y) = \big(I - \partial_x F(x_*(y), y)\big)^{-1} \partial_y F(x_*(y), y), \qquad y \in V. \tag{14.18} \]

Proof. (i) We have
\begin{align*}
\|x_*(y + h) - x_*(y)\| &= \|F(x_*(y + h), y + h) - F(x_*(y), y)\| \\
&\le \|F(x_*(y + h), y + h) - F(x_*(y), y + h)\| + \|F(x_*(y), y + h) - F(x_*(y), y)\| \\
&\le \theta\|x_*(y + h) - x_*(y)\| + \|F(x_*(y), y + h) - F(x_*(y), y)\|.
\end{align*}
From the continuity of $F$ on $W \times V$ we obtain that
\[ \|x_*(y + h) - x_*(y)\| \le \frac{1}{1 - \theta}\|F(x_*(y), y + h) - F(x_*(y), y)\| \to 0 \]
as $h \to 0$. Hence, $x_* \in C(V, X)$.

(ii) The assumption $F(\overline{U} \times V) \subset U$ implies that $x_*$ maps $V$ into $U$ since $x_*(y) = F(x_*(y), y)$. By the chain rule,
\[ x_*'(y) = \partial_x F(x_*(y), y)\,x_*'(y) + \partial_y F(x_*(y), y) \tag{14.19} \]
at every $y \in V$ where $x_*$ is differentiable. Consider (14.19) as a fixed point equation $T(z, y) = z$ where $T : L(Y, X) \times V \to L(Y, X)$ is given by
\[ T(z, y) = \partial_x F(x_*(y), y)\,z + \partial_y F(x_*(y), y). \tag{14.20} \]
The converse part of the mean value theorem 14.6.3 along with (14.17) shows that
\[ \sup_{(x,y) \in U \times V} \|\partial_x F(x, y)\| \le \theta. \tag{14.21} \]
Hence $T$ is a uniform contraction and, by the first part of the proof, $T$ has a continuous fixed point $z : V \to L(Y, X)$. We will show that $z$ is in fact the derivative of $x_*$. We fix $y \in V$, and set $B(y) = \partial_x F(x_*(y), y)$, $A(y) = \partial_y F(x_*(y), y)$. Let $h(k) := x_*(y + k) - x_*(y)$ for all $k$ small enough. The fixed point property of $x_*$ and $z$ together with the differentiability of $F$ implies that for all $k$ small enough
\begin{align*}
(I - B(y))(h(k) - z(y)k) &= F(x_*(y + k), y + k) - F(x_*(y), y) - B(y)h(k) - A(y)k \\
&= F(x_*(y) + h(k), y + k) - F(x_*(y), y) - B(y)h(k) - A(y)k \\
&=: P(h(k), k),
\end{align*}
where $\frac{\|P(h, k)\|}{\|h\| + \|k\|} \to 0$ as $(h, k) \to (0, 0)$. From (14.21), $I - B(y) \in L(X)$ is an invertible operator with $(I - B(y))^{-1} \in L(X)$. Moreover, the estimate in part (i) and the differentiability of $F$ in $y$ give $\|h(k)\| \le C\|k\|$ for some constant $C$ and all $k$ small enough. This shows that
\[ x_*(y + k) = x_*(y) + z(y)k + r(k), \]
where $r(k) = o(k)$ as $k \to 0$.

For $r > 1$, the result follows by induction. Suppose the result holds for $r - 1$; then at least $x_* \in C^{r-1}(V, X)$. The fact that $x_*$ satisfies (14.19) implies that
\[ x_*'(y) = \big(I - \partial_x F(x_*(y), y)\big)^{-1} \partial_y F(x_*(y), y). \tag{14.22} \]
Since the map $T \mapsto T^{-1}$ from $GL(X)$ to $GL(X)$ is differentiable, it follows that $x_* \in C^r(V, X)$ whenever $F \in C^r(U \times V, X)$.
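Formula (14.18) can be tested on a scalar example. The sketch below is only an illustration: the map $F(x, y) = \frac{1}{2}\cos x + y$ (a uniform contraction in $x$ with $\theta = \frac{1}{2}$), the tolerances and the helper names are hypothetical choices, not taken from the notes. It computes the fixed point $x_*(y)$ by iteration and compares the derivative predicted by (14.18) with a central finite difference.

```python
import math

def F(x, y):
    """Hypothetical uniform contraction in x: |dF/dx| <= 1/2 for every y."""
    return 0.5 * math.cos(x) + y

def dF_dx(x, y):
    return -0.5 * math.sin(x)

def dF_dy(x, y):
    return 1.0

def fixed_point(y, x_init=0.0, tol=1e-14, max_iter=200):
    """Banach iteration x <- F(x, y) until successive iterates differ by less than tol."""
    x = x_init
    for _ in range(max_iter):
        x_new = F(x, y)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

y = 0.3
x_star = fixed_point(y)

# Derivative predicted by (14.18): (I - d_x F)^{-1} d_y F at (x_*(y), y).
formula = dF_dy(x_star, y) / (1.0 - dF_dx(x_star, y))

# Central finite difference of y -> x_*(y).
eps = 1e-6
finite_diff = (fixed_point(y + eps) - fixed_point(y - eps)) / (2.0 * eps)
print(x_star, formula, finite_diff)
```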
Remark 14.7.2. The continuity of x∗ in Theorem 14.7.1 holds if one assumes F ∈ C(U ×
V, X) and F (U × V ) ⊂ U . Theorem 14.7.1 holds when F ∈ C r (W × V, X), where W ⊂ X
is an open subset containing U and F (U × V ) ⊂ U .

We now prove one of the fundamental theorems in differential Calculus.

Theorem 14.7.3. (Implicit Function Theorem) Let $X$, $Y$ and $Z$ be Banach spaces, $\Omega \subset X \times Y$ open and $F \in C^r(\Omega, Z)$ for some $r \ge 0$. When $r = 0$ assume that $\partial_x F \in C(\Omega)$. If $\partial_x F(x_0, y_0) \in L(X, Z)$ has a bounded inverse for some $(x_0, y_0) \in \Omega$, then there is an open neighborhood $U \times V \subset \Omega$ of $(x_0, y_0)$ and a unique function $g : V \to U$ such that
\begin{align*}
g(y_0) &= x_0, \\
F(g(y), y) &= F(x_0, y_0), \qquad y \in V.
\end{align*}
Moreover, $g \in C^r(V, X)$ and if $r \ge 1$, then
\[ g'(y) = -\big(\partial_x F(g(y), y)\big)^{-1} \partial_y F(g(y), y), \qquad y \in V. \tag{14.23} \]

Proof. Define $G : \Omega \to X$ by
\[ G(x, y) = x - \big(\partial_x F(x_0, y_0)\big)^{-1}\big(F(x, y) - F(x_0, y_0)\big). \]
Observe that $G$ has the same smoothness as $F$; moreover, $x = G(x, y)$ is equivalent to $F(x, y) = F(x_0, y_0)$. Since $\partial_x G(x_0, y_0) = 0$, for any $0 < \theta < 1$ there exist open balls $U$ and $V_1$ around $x_0$ and $y_0$ respectively, such that $\overline{U} \times V_1 \subset \Omega$ and $\sup_{(x,y) \in \overline{U} \times V_1} \|\partial_x G(x, y)\| \le \theta < 1$. The mean value theorem implies that
\[ \|G(x, y) - G(x', y)\| \le \theta\|x - x'\|, \qquad x, x' \in \overline{U},\ y \in V_1. \]
Let $\delta = \operatorname{rad}(U)$. Since $F$ is continuous on $\overline{U} \times V_1$ and
\[ \|G(x_0, y) - x_0\| \le \big\|\big(\partial_x F(x_0, y_0)\big)^{-1}\big\|\,\|F(x_0, y) - F(x_0, y_0)\|, \]
there is an open ball $V \subset V_1$ around $y_0$ such that $\|G(x_0, y) - x_0\| < (1 - \theta)\delta$ for all $y \in V$. Hence,
\[ \|G(x, y) - x_0\| \le \|G(x, y) - G(x_0, y)\| + \|G(x_0, y) - x_0\| < \theta\delta + (1 - \theta)\delta = \delta \]
for all $x \in \overline{U}$ and $y \in V$. This shows that $G : \overline{U} \times V \to U$ is a uniform contraction with $G \in C^r(U \times V, X)$. By Theorem 14.7.1, for each $y \in V$ there is a unique $g(y) \in U$ such that $F(g(y), y) = F(x_0, y_0)$; moreover, $g \in C^r(V, X)$ and, if $r \ge 1$,
\[ g'(y) = \big(I - \partial_x G(g(y), y)\big)^{-1} \partial_y G(g(y), y) = -\big(\partial_x F(g(y), y)\big)^{-1} \partial_y F(g(y), y) \]
for all $y \in V$.
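The contraction $G$ used in the proof also gives a practical way to compute the implicit function. The sketch below is a minimal illustration under stated assumptions: the function $F(x, y) = x^2 + y^2$, the base point $(x_0, y_0) = (1, 0)$ and the helper names are hypothetical. In this case $g(y) = \sqrt{1 - y^2}$ is known in closed form, so both the iteration and formula (14.23) can be checked.

```python
import math

def F(x, y):
    """Hypothetical example: the level set F = F(1, 0) = 1 is the unit circle."""
    return x * x + y * y

x0, y0 = 1.0, 0.0
A_inv = 1.0 / (2.0 * x0)  # inverse of d_x F(x0, y0) = 2 * x0

def G(x, y):
    """The map from the proof: G(x, y) = x - A^{-1} (F(x, y) - F(x0, y0))."""
    return x - A_inv * (F(x, y) - F(x0, y0))

def g(y, tol=1e-14, max_iter=500):
    """Implicit function computed as the fixed point of x -> G(x, y), starting at x0."""
    x = x0
    for _ in range(max_iter):
        x_new = G(x, y)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

y = 0.3
value = g(y)

# Formula (14.23): g'(y) = -(d_x F)^{-1} d_y F = -(2 y) / (2 g(y)).
formula = -(2.0 * y) / (2.0 * value)
eps = 1e-6
finite_diff = (g(y + eps) - g(y - eps)) / (2.0 * eps)
print(value, formula, finite_diff)
```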

Another consequence of the uniform contraction principle is the following important result in differential Calculus.
Theorem 14.7.4. (Inverse Function Theorem) Let $X$, $Y$ be Banach spaces, $W \subset X$ open, and let $f \in C^r(W, Y)$, $r \ge 1$. If $f'(x_0)$ has a bounded inverse for some $x_0 \in W$, then there exists an open set $U \subset W$ containing $x_0$ such that $f(U)$ is open and $f : U \to f(U)$ is bijective. Moreover, the inverse function $g$ satisfies $g \in C^r(f(U), X)$ and
\[ g'(y) = \big(f'(g(y))\big)^{-1}, \qquad y \in f(U). \tag{14.24} \]

Proof. Applying the implicit function theorem to $F(x, y) = y - f(x)$ gives neighborhoods $U' \subset W$ and $V \subset Y$ around $x_0$ and $y_0 = f(x_0)$ respectively, such that for each $y \in V$, there exists a unique $g(y) \in U'$ satisfying $y = f(g(y))$. Moreover, the map $g : y \mapsto g(y)$ is necessarily in $C^r(V, X)$. This uniqueness shows that $f$ is injective in $U'$. The set $U = U' \cap f^{-1}(V)$ is an open neighborhood of $x_0$ with $V = f(U)$, and thus, $f : U \to V$ is a bijective function whose inverse is $f^{-1} = g$. Finally, equation (14.24) follows directly from (14.23).

14.8. Existence and uniqueness of solutions to differential equations

As another application of the Banach fixed point theorem and the uniform contraction principle, we find solutions to the initial value problem
\[ \dot{x}(t) = f(t, x(t)), \qquad x(t_0) = x_0, \tag{14.25} \]
where $f : D \subset \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$, with $D$ open, satisfies some regularity conditions and $(t_0, x_0) \in D$. Here, $x$ is defined in a neighborhood $I$ of $t_0$ and $(t, x(t)) \in D$ for all $t \in I$.
Theorem 14.8.1. (Picard) Let $D \subset \mathbb{R} \times \mathbb{R}^n$ be open and let $f : D \to \mathbb{R}^n$ be a continuous function that is locally Lipschitz in $x$; that is, for any $(t_0, x_0) \in D$, there are an interval $I$ around $t_0$, an open neighborhood $U$ of $x_0$ and a constant $L \ge 0$ such that $I \times U \subset D$ and
\[ |f(t, x) - f(t, y)| \le L|x - y| \tag{14.26} \]
for all $t \in I$ and $x, y \in U$. Then, for any $(t_0, x_0) \in D$, there exist $\delta > 0$ and a function $x(\cdot; (t_0, x_0)) \in C^1((t_0 - \delta, t_0 + \delta); \mathbb{R}^n)$ satisfying (14.25). Furthermore, there is $\eta > 0$ such that the map $(t, (\tau, x)) \mapsto x(t; (\tau, x))$ is continuous on $(t_0 - \eta, t_0 + \eta)^2 \times B(x_0; \eta)$.

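Proof. Existence and uniqueness of solutions. We first prove that for any point $(t_0, x_0) \in D$ we can find $\delta > 0$ and a neighborhood $V$ of $(t_0, x_0)$ such that (14.25) admits a solution with initial conditions $(\tau, x) \in V$ in the interval $(\tau - \delta, \tau + \delta)$. Let $a, b > 0$ be such that $K := [t_0 - a, t_0 + a] \times \overline{B}(x_0; b) \subset D$ and (14.26) holds on $K$. Let $m = \sup_{(t,x) \in K} |f(t, x)|$ and choose $\delta > 0$ so that (i) $\delta < \frac{a}{2}$, (ii) $m\delta \le \frac{b}{2}$, and (iii) $L\delta < 1$. Define $\mathcal{F}$ as the family of all continuous functions $\varphi$ on $I_\delta := [-\delta, \delta]$ such that $\varphi(0) = 0$ and $\|\varphi\|_{u(I_\delta)} \le \frac{b}{2}$, equipped with the uniform norm. For each $(\tau, x) \in K' := (t_0 - \delta, t_0 + \delta) \times B\big(x_0; \frac{b}{2}\big)$, define the transformation $T_{(\tau,x)}$ on $\mathcal{F}$ by
\[ T_{(\tau,x)}\varphi(t) = \int_\tau^{t+\tau} f(s, \varphi(s - \tau) + x)\,ds, \qquad t \in I_\delta. \]
We claim that $T_{(\tau,x)}$ is a uniform contraction on $\mathcal{F}$. Indeed, if $\varphi \in \mathcal{F}$ then clearly $T_{(\tau,x)}\varphi(0) = 0$. Also, for any $t \in I_\delta$
\[ |T_{(\tau,x)}\varphi(t)| \le \Big|\int_\tau^{t+\tau} |f(s, \varphi(s - \tau) + x)|\,ds\Big| \le m\delta \le \frac{b}{2}. \]
If $\psi$ is another function in $\mathcal{F}$, then
\[ |T_{(\tau,x)}\varphi(t) - T_{(\tau,x)}\psi(t)| \le \Big|\int_\tau^{t+\tau} |f(s, \varphi(s - \tau) + x) - f(s, \psi(s - \tau) + x)|\,ds\Big| \le L\delta\,\|\varphi - \psi\|_{u(I_\delta)}. \]
This proves the claim. Since $\mathcal{F}$ is a closed subset of the Banach space $\big(C(I_\delta; \mathbb{R}^n), \|\cdot\|_{u(I_\delta)}\big)$, each $T_{(\tau,x)}$ admits a unique fixed point $\varphi^*_{(\tau,x)} \in \mathcal{F}$. The fundamental theorem of Calculus shows that $\varphi^*_{(\tau,x)} \in C^1((-\delta, \delta); \mathbb{R}^n)$; hence, $x(t; (\tau, x)) = \varphi^*_{(\tau,x)}(t - \tau) + x$ is continuous on $[\tau - \delta, \tau + \delta]$, continuously differentiable on $I_\delta(\tau) := (\tau - \delta, \tau + \delta)$, and satisfies
\begin{align*}
x(t; (\tau, x)) = x + \varphi^*_{(\tau,x)}(t - \tau) &= x + \int_\tau^t f(s, \varphi^*_{(\tau,x)}(s - \tau) + x)\,ds \\
&= x + \int_\tau^t f(s, x(s; (\tau, x)))\,ds.
\end{align*}
Consequently, $x(\cdot; (\tau, x))$ is the only function on $I_\delta(\tau)$ solving (14.25) with $x(\tau; (\tau, x)) = x$.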

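Local continuity with respect to initial conditions. We will show that $T : \mathcal{F} \times K' \to \mathcal{F}$ given by $(\varphi, (\tau, x)) \mapsto T_{(\tau,x)}\varphi$ is continuous. Fix $\varphi \in \mathcal{F}$ and $(t_1, x_1) \in K'$. As $\varphi$ is uniformly continuous, given $\varepsilon > 0$, choose $0 < \hat{\delta} < \varepsilon \wedge \frac{b}{2} \wedge \delta$ so that $|\varphi(s) - \varphi(s')| < \frac{\varepsilon}{2m+2}$ whenever $|s - s'| < \hat{\delta}$. Hence, if $\|\varphi - \psi\|_{u(I_\delta)} \vee |t_2 - t_1| \vee |x_2 - x_1| < \hat{\delta}$ and $|t| \le \delta$ (assuming for simplicity that $t_1 \le t_2 \le t_1 + t$, the other cases being handled similarly),
\begin{align*}
|T_{(t_1,x_1)}\varphi(t) - T_{(t_2,x_2)}\psi(t)| &\le 2m\hat{\delta} + \int_{t_2}^{t+t_1} |f(s, \varphi(s - t_1) + x_1) - f(s, \psi(s - t_2) + x_2)|\,ds \\
&\le 2m\hat{\delta} + L\int_{t_2}^{t+t_1} |\varphi(s - t_1) - \psi(s - t_2)|\,ds + L\delta\hat{\delta} \\
&\le 2m\hat{\delta} + L\delta(\hat{\delta} + \varepsilon) < (2m + 2)\varepsilon,
\end{align*}
where the last inequality uses $\hat{\delta} < \varepsilon$ and $L\delta < 1$. As $\varepsilon$ is arbitrary, $T$ is continuous. Hence, by the uniform contraction principle, the map $\phi : K' \to \mathcal{F}$ given by $(\tau, x) \mapsto \varphi^*_{(\tau,x)}(\cdot)$ is continuous. We further claim that the function $\phi^* : I_\delta \times K' \to \mathbb{R}^n$ given by $(t, (\tau, x)) \mapsto \varphi^*_{(\tau,x)}(t)$ is continuous. Fix $(t, (t_1, x_1)) \in I_\delta \times K'$. As $\phi$ and $\varphi^*_{(t_1,x_1)}$ are continuous on $K'$ and $I_\delta$ respectively, given $\varepsilon > 0$, there is $0 < \delta' < \varepsilon \wedge \delta$ such that
\begin{align*}
|\varphi^*_{(t_1,x_1)}(t) - \varphi^*_{(t_1,x_1)}(s)| &< \frac{\varepsilon}{2}, \\
\|\varphi^*_{(t_1,x_1)} - \varphi^*_{(t_2,x_2)}\|_{u(I_\delta)} &< \frac{\varepsilon}{2}
\end{align*}
whenever $(s, (t_2, x_2)) \in I_\delta \times K'$ and $|t - s| \vee |t_1 - t_2| \vee |x_1 - x_2| < \delta'$. This in turn implies that
\[ |\varphi^*_{(t_1,x_1)}(t) - \varphi^*_{(t_2,x_2)}(s)| \le |\varphi^*_{(t_1,x_1)}(t) - \varphi^*_{(t_1,x_1)}(s)| + |\varphi^*_{(t_1,x_1)}(s) - \varphi^*_{(t_2,x_2)}(s)| < \varepsilon. \]
It follows that $(t, (\tau, x)) \mapsto x(t; (\tau, x))$ is continuous on $V = \{(t, (\tau, x)) : (\tau, x) \in K',\ |t - \tau| < \delta\}$.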
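The fixed point in Picard's theorem is the limit of successive approximations, and for a simple equation these can be computed exactly. The sketch below is only an illustration: it treats the hypothetical scalar problem $\dot{x} = x$, $x(0) = 1$, works with the solution $x$ directly rather than with the shifted unknown $\varphi = x - x_0$ used in the proof, and represents polynomials by lists of exact rational coefficients. The iterates turn out to be the Taylor partial sums of $e^t$.

```python
from fractions import Fraction

def integrate(poly):
    """Antiderivative vanishing at 0 of a polynomial given by its coefficient list
    (poly[k] is the coefficient of t^k)."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]

def picard_step(poly, x0):
    """One Picard step for x' = x, x(0) = x0: phi -> x0 + integral from 0 to t of phi(s) ds."""
    out = integrate(poly)
    out[0] += x0
    return out

def picard_iterates(x0, steps):
    """Apply picard_step `steps` times, starting from the constant function x0."""
    phi = [Fraction(x0)]
    for _ in range(steps):
        phi = picard_step(phi, Fraction(x0))
    return phi

# Coefficients of the fourth iterate: 1, 1, 1/2, 1/6, 1/24.
fourth = picard_iterates(1, 4)
print(fourth)
```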

For each point $(t_0, x_0)$ in the domain of the vector field $f$, the solution $x(\cdot) = x(\cdot; (t_0, x_0))$ to (14.25) provided by Theorem 14.8.1 is only defined in a neighborhood of $t_0$. Such a solution can be extended uniquely to a continuously differentiable function defined on a maximal interval. Suppose $y(t)$ and $z(t)$ are solutions to (14.25) with $y(t_0) = x_0 = z(t_0)$ defined in an interval $J$ containing $I_\delta(t_0)$. We claim that $y(t) \equiv z(t)$. Otherwise, there is an interval $[a, b]$ with $I_\delta(t_0) \subset [a, b] \subset J$ such that $y = z$ on $[a, b]$ but in any neighborhood of $b$ (or of $a$) there is a point $t'$ with $y(t') \neq z(t')$. Applying Picard's construction around the point $(b, y(b))$ we obtain a unique solution $\phi$ to the problem (14.25) with $\phi(b) = y(b)$ in an interval containing $b$. Both $y$ and $z$ must agree with $\phi$ on that interval, which contradicts the existence of points $t'$ arbitrarily close to $b$ with $y(t') \neq z(t')$. Hence, $x(\cdot; (t_0, x_0))$ can be extended uniquely to a maximal interval $J(t_0, x_0) \supset I_\delta(t_0)$ as a continuously differentiable function (also denoted by $x$) satisfying (14.25) on $J(t_0, x_0)$.
Local continuity of solutions to (14.25) can be extended to the whole domain (maximal interval) of definition.
Theorem 14.8.2. Consider the initial value problems
\begin{align*}
\dot{x}(t) &= f(t, x(t)), \qquad x(t_0) = x_0, \\
\dot{y}(t) &= g(t, y(t)), \qquad y(t_0) = y_0.
\end{align*}
Assume that $g$ satisfies the conditions of Theorem 14.8.1 and that
\[ |f(t, x) - f(t, y)| \le L|x - y| \]
for all $(t, x)$ and $(t, y)$ in $D$. If
\begin{align*}
|x_0 - y_0| &\le \delta, \\
\sup_{(t,x) \in D} |f(t, x) - g(t, x)| &\le \varepsilon,
\end{align*}
then
\[ |x(t; (t_0, x_0)) - y(t; (t_0, y_0))| \le \big(\delta + \varepsilon|t - t_0|\big)\exp\big(L|t - t_0|\big) \]
for all $t$ in the intersection of the maximal domains of definition of $x(\cdot; (t_0, x_0))$ and $y(\cdot; (t_0, y_0))$.

Proof. For simplicity we set $x(t) = x(t; (t_0, x_0))$ and $y(t) = y(t; (t_0, y_0))$. Then, for $t \ge t_0$
\begin{align*}
|x(t) - y(t)| &\le |x_0 - y_0| + \int_{t_0}^t |f(s, x(s)) - g(s, y(s))|\,ds \\
&\le \delta + \int_{t_0}^t |f(s, y(s)) - g(s, y(s))|\,ds + \int_{t_0}^t |f(s, x(s)) - f(s, y(s))|\,ds \\
&\le \delta + \varepsilon(t - t_0) + L\int_{t_0}^t |x(s) - y(s)|\,ds.
\end{align*}
Applying Gronwall's inequality with $u(t) = |x(t) - y(t)|$, $\alpha(t) = \delta + \varepsilon(t - t_0)$ and $\beta = L$ we obtain that
\[ |x(t) - y(t)| \le \big(\delta + \varepsilon(t - t_0)\big)\exp\big(L(t - t_0)\big). \]
For $t \le t_0$ the proof is similar.
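The estimate of Theorem 14.8.2 can be checked on a pair of problems with closed-form solutions. In the sketch below, all choices are hypothetical and made only for illustration: $f(t, x) = x$ (so $L = 1$), $g(t, x) = x + \varepsilon$, $t_0 = 0$, $x_0 = 1$ and $y_0 = 1 + \delta$, for which $x(t) = e^t$ and $y(t) = (1 + \delta + \varepsilon)e^t - \varepsilon$.

```python
import math

def x_sol(t):
    """Solution of x' = x, x(0) = 1."""
    return math.exp(t)

def y_sol(t, delta, eps):
    """Solution of y' = y + eps, y(0) = 1 + delta."""
    return (1.0 + delta + eps) * math.exp(t) - eps

def gronwall_bound(t, delta, eps, lipschitz=1.0):
    """Right-hand side of the estimate in Theorem 14.8.2 with t0 = 0."""
    return (delta + eps * abs(t)) * math.exp(lipschitz * abs(t))

delta, eps = 0.01, 0.02
times = [0.0, 0.5, 1.0, 2.0, 3.0]
gaps = [abs(x_sol(t) - y_sol(t, delta, eps)) for t in times]
bounds = [gronwall_bound(t, delta, eps) for t in times]
for t, gap, bound in zip(times, gaps, bounds):
    print(t, gap, bound)
```

In this example the gap equals $(\delta + \varepsilon)e^t - \varepsilon$, which is at most $(\delta + \varepsilon t)e^t$ for $t \ge 0$ because $1 - e^{-t} \le t$.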

14.9. Optimization and Lagrange Multipliers

Suppose $D$ is a nonempty subset of the Banach space $X$ and let $F : D \to \mathbb{R}$. A point $x_0 \in D$ is a local minimum of $F$ if there is $\delta > 0$ such that $F(x) \ge F(x_0)$ for all $x \in B(x_0; \delta) \cap D$. A point $y_0 \in D$ is a local maximum of $F$ if $y_0$ is a local minimum of $-F$.
Theorem 14.9.1. Suppose $F : U \to \mathbb{R}$, where $U$ is open in $X$, has a local minimum at $x_0$. If $F$ is differentiable at $x_0$, then $F'(x_0) = 0$.

Proof. By hypothesis, there is $\delta > 0$ such that $F(x) \ge F(x_0)$ for all $x \in B(x_0; \delta) \subset U$. For any unit vector $u \in X$ and $|t| < \delta$ we have that $x_0 + tu \in B(x_0; \delta)$. Define $g_u : t \mapsto F(x_0 + tu)$. Clearly $g_u$ has a local minimum at $t = 0$ and it is differentiable at $t = 0$. By a classical result of real-valued Calculus, $g_u'(0) = F'(x_0)u = 0$. As this holds for any unit vector in $X$, we conclude that $F'(x_0) = 0$.

In the remainder of this section, we will consider the problem of finding local extreme points of a function under functional constraints.
Theorem 14.9.2. (Surjective Theorem) Let $X$, $Y$ be Banach spaces and $\Omega \subset X$ open. Assume that $f \in C^1(\Omega, Y)$ and that for some $x_0 \in \Omega$, $f'(x_0)$ has a right inverse in $L(Y, X)$. Then, $f(\Omega)$ contains an open ball around $f(x_0)$.

Proof. Let $L \in L(Y, X)$ be a right inverse of $A = f'(x_0)$, that is $AL = I_Y$, and let $c = \|L\|$. By Corollary 14.6.5, there exists a ball $B(x_0; \delta) \subset \Omega$ such that
\[ \|f(u) - f(v) - A(u - v)\| \le \frac{1}{2c}\|u - v\|, \qquad u, v \in B(x_0; \delta). \]
We will show that the ball $B\big(y_0; \frac{\delta}{2c}\big) \subset f(\Omega)$, where $y_0 = f(x_0)$. For $y \in B\big(y_0; \frac{\delta}{2c}\big)$, we define inductively a sequence $(x_n : n \ge 0)$ as follows. Starting at $x_0$, we let $x_1 = x_0 + L(y - y_0)$, and for $n \ge 1$
\[ x_{n+1} = x_n - L\big(f(x_n) - f(x_{n-1}) - A(x_n - x_{n-1})\big). \tag{14.27} \]
We show by induction that $x_n \in B(x_0; \delta)$ and $\|x_n - x_{n-1}\| < \delta 2^{-n}$. Indeed, for $n = 1$
\[ \|x_1 - x_0\| \le c\|y - y_0\| < \frac{\delta}{2}, \]
and if the statement holds for $n \ge 1$, then
\[ \|x_{n+1} - x_n\| \le c\,\|f(x_n) - f(x_{n-1}) - A(x_n - x_{n-1})\| \le \frac{1}{2}\|x_n - x_{n-1}\| < \delta 2^{-n-1}. \]
Hence,
\[ \|x_{n+1} - x_0\| \le \sum_{k=1}^{n+1} \|x_k - x_{k-1}\| < \sum_{k=1}^{n+1} \delta 2^{-k} < \delta. \]
It follows immediately that $\{x_n : n \ge 0\}$ is a Cauchy sequence in $B(x_0; \delta)$ and thus, by completeness, $x_n \to x$ as $n \to \infty$ for some $x$ with $\|x - x_0\| \le \|x_1 - x_0\| + \frac{\delta}{2} < \delta$. From (14.27) and $AL = I_Y$, we have that
\[ A(x_{n+1} - x_n) = A(x_n - x_{n-1}) - \big(f(x_n) - f(x_{n-1})\big); \]
whence, adding these identities, we obtain that
\begin{align*}
A(x_{n+1} - x_n) &= A(x_1 - x_0) - \big(f(x_n) - f(x_0)\big) \\
&= AL(y - y_0) - \big(f(x_n) - y_0\big) = y - f(x_n).
\end{align*}
Letting $n \to \infty$ and using the continuity of $f$ we conclude that $y = f(x)$.
Corollary 14.9.3. Suppose $f \in C^1(\Omega, Y)$ where $\Omega \subset X$ is an open subset of the Banach space $X$, and $Y$ is a finite dimensional Banach space. If $f'(x_0)$ is surjective, then $f(\Omega)$ contains an open neighborhood of $y_0 = f(x_0)$.

Proof. It suffices to consider the case when $Y = \mathbb{F}^n$. Let $\{e_k : 1 \le k \le n\}$ be the standard basis for $\mathbb{F}^n$. If $A = f'(x_0)$ is surjective, then there are $\{u_k : 1 \le k \le n\}$ in $X$ such that $Au_k = e_k$. Define $Le_k = u_k$, $k = 1, \dots, n$, and linearly extend $L$ to all of $Y$, so that $AL = I_Y$. For any $y = \sum_{k=1}^n \alpha_k e_k$, we have
\[ \|Ly\| \le \sum_{k=1}^n |\alpha_k|\,\|u_k\| \le \Big(\sum_{k=1}^n |\alpha_k|^2\Big)^{1/2}\Big(\sum_{k=1}^n \|u_k\|^2\Big)^{1/2} = \|y\|\Big(\sum_{k=1}^n \|u_k\|^2\Big)^{1/2}. \]
This shows that $L \in L(Y, X)$. Therefore, the conclusion of the result follows immediately from the surjective theorem.

The following application of the preceding Corollary to the Surjective Theorem is to the
problem of nonlinear optimization with constraints.
Theorem 14.9.4. (Lagrange Multipliers) Let $f, g_1, \ldots, g_n$ be functions in $C^1(\Omega, \mathbb{R})$, where $\Omega \subset X$ is an open subset of a Banach space $X$. Let $M = \{x \in \Omega : g_1(x) = \cdots = g_n(x) = 0\}$. If $x_0$ is a local maximum point of $f$ restricted to $M$, then there exists a nontrivial linear relation of the form
(14.28) $\mu f'(x_0) + \sum_{k=1}^n \lambda_k g_k'(x_0) = 0$.
Moreover, if $\{g_k'(x_0) : 1 \le k \le n\}$ is a linearly independent family in $L(X, \mathbb{R})$, then $\mu \ne 0$ and there is a unique solution $(\lambda_1, \ldots, \lambda_n)$ to
(14.29) $f'(x_0) + \sum_{k=1}^n \lambda_k g_k'(x_0) = 0$.

Proof. Let $U$ be a ball around $x_0$ such that $f(x_0) \ge f(x)$ for all $x \in U \cap M$. Let $F : U \to \mathbb{R}^{n+1}$ be the function given by
$F(x) = (f(x), g_1(x), \ldots, g_n(x))^\top$.
For any $r > f(x_0)$, the vector $(r, 0, \ldots, 0)^\top \notin F(U)$. Hence, $F(U)$ does not contain any open neighborhood of the point $(f(x_0), g_1(x_0), \ldots, g_n(x_0))^\top = (f(x_0), 0, \ldots, 0)^\top$. By Corollary 14.9.3, $F'(x_0)$ is not surjective. Therefore, the range $V$ of $F'(x_0)$ is a proper subspace of $\mathbb{R}^{n+1}$. Let $(\mu, \lambda)^\top = (\mu, \lambda_1, \ldots, \lambda_n)^\top$ be a nonzero element of $V^\perp$. Then
$\mu f'(x_0)v + \sum_{k=1}^n \lambda_k g_k'(x_0)v = 0$
for all $v \in X$ and (14.28) follows.

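If in addition $G = \{g_1'(x_0), \ldots, g_n'(x_0)\}$ is linearly independent, then $\mu \ne 0$, since otherwise (14.28) would be a nontrivial linear relation among the elements of $G$. Dividing by $\mu$, one can assume $\mu = 1$. The uniqueness of $\lambda$ follows from the linear independence of $G$.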
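
As a small illustration of (14.29), the following sketch checks the multiplier relation numerically in the simplest setting $X = \mathbb{R}^2$. The objective $f(x, y) = x + y$, the constraint $g(x, y) = x^2 + y^2 - 1$, and all function names are choices made for this example only; they do not come from the text.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative objective f(x, y) = x + y.
    return (1.0, 1.0)

def grad_g(x, y):
    # Gradient of the illustrative constraint g(x, y) = x^2 + y^2 - 1.
    return (2.0 * x, 2.0 * y)

def multiplier_residual(x, y, lam):
    # Left-hand side of f'(x0) + lam * g'(x0) = 0, evaluated componentwise.
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return (fx + lam * gx, fy + lam * gy)

# Maximizer of f on the unit circle and the corresponding multiplier.
x0 = y0 = 1.0 / math.sqrt(2.0)
lam = -1.0 / math.sqrt(2.0)
residual = multiplier_residual(x0, y0, lam)

# f restricted to the circle never exceeds f(x0, y0) = sqrt(2).
circle_values = [math.cos(a) + math.sin(a)
                 for a in (2 * math.pi * k / 360 for k in range(360))]
```

Here $g'(x_0) \ne 0$, so the family $\{g'(x_0)\}$ is linearly independent and the multiplier is unique, in agreement with the second part of the theorem.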

14.10. Exercises
Exercise 14.10.1. Suppose that $E$ is a Banach space. Show that if $f$ is an $E$–valued measurable function, then $\Lambda f$ is measurable for all $\Lambda \in E^*$, where $E^*$ is the space of continuous linear functionals on $E$.
Exercise 14.10.2. Any $f \in \mathbb{C}^\Omega$ has a unique representation $f = u + iv$, where $u, v \in \mathbb{R}^\Omega$. Show that $f$ is measurable iff $u, v \in M_{\mathbb{R}}$. In either case, $|f| = \sqrt{u^2 + v^2} \in M_{\mathbb{R}}$.
Exercise 14.10.3. Show that $\|\cdot\|^*$ defines a complete seminorm on the space $F^*_{\mathbb{C}}$ of complex–valued functions with finite mean $\|\cdot\|^*_{\mathbb{C}}$.
Exercise 14.10.4. Let CΩ ∋ f = u + i v, where u, v ∈ RΩ . Show that

(a) f ∈ L1 (C) iff u, v ∈ L1 .

(b) If f ∈ L1 (C), then |f | ∈ L1 .
(c) f is measurable as in Definition 14.1.1 iff for any set A ∈ L1 and ε > 0, there are an
u u
integrable set A0 ⊂ A and a function ϕ ∈ E ⊗ R = E ⊗ C such that kA \ A0 k < ε
and f = ϕ on A0 .
(d) If $\|\cdot\|^*$ is Daniell's mean, then $f \in L_1(\mathbb{C})$ iff $f$ is measurable and $|f| \in L_1$.
(e) Dominated convergence: if $\{f_n\} \subset L_1(\mathbb{C})$, $f_n \to f$ almost surely, and $\sup_n |f_n| \le g$, where $g \in F$, then $f \in L_1(\mathbb{C})$ and $\|f_n - f\|^*_{\mathbb{C}} \to 0$.
Exercise 14.10.5. Suppose $\{f_n\} \subset E^\Omega$ is a sequence of strongly measurable functions that converges to $f$ pointwise. Show that $f$ is strongly measurable. (Hint: Show that for any $A \in L_1$ and $\varepsilon > 0$, there is an integrable set $A_0 \subset A$ with $\|A \setminus A_0\| < \varepsilon$ on which each $f_n$ is the uniform limit of a sequence in E ⊗ E. Then apply Egorov's theorem 14.1.4.)
Exercise 14.10.6. If $E$ is finite dimensional, show that strong measurability and measurability coincide and that $f$ is measurable iff $f^{-1}(U) \in M$ for all $U \subset E$ open. (Hint: it suffices to consider $\mathbb{R}^d$. Then $f$ is strongly measurable iff each of its components is measurable.)
Exercise 14.10.7. Suppose $\Omega$ is a compact Hausdorff space and $(\Omega, B(\Omega), \mu)$ is a finite Borel measure space. If $f$ is an $E$–valued continuous function in $\Omega$, show that $f \in L_1(\|\cdot\|^*_E)$ where $\|\cdot\|^* = \mu(|\cdot|)$ and $\|\cdot\|^*_E = \mu(|\cdot|_E)$.
Exercise 14.10.8. Suppose $\Omega$ is an open set in $\mathbb{C}$, $X$ is a Fréchet space, and that $f : \Omega \to X$ is holomorphic. State and prove a theorem concerning the power series representation of $f$, that is, concerning the formula $f(z) = \sum_{n=0}^\infty c_n (z - a)^n$ for $a \in \Omega$, where $c_n \in X$.

The next exercises deal with basic results on real valued functions of a single variable.
Exercise 14.10.9. Suppose $f : (a, b) \to \mathbb{R}$ has a local minimum at some point $x_0 \in (a, b)$. If $f$ is differentiable at $x_0$, show that $f'(x_0) = 0$.
Exercise 14.10.10. (Rolle's theorem) Suppose $f : [a, b] \to \mathbb{R}$ ($-\infty < a < b < \infty$) is a continuous function, and that $f$ is differentiable in $(a, b)$. If $f(a) = f(b)$, show that there is $c \in (a, b)$ such that $f'(c) = 0$.
Exercise 14.10.11. (General Mean value theorem) Suppose $f, g : [a, b] \to \mathbb{R}$ ($-\infty < a < b < \infty$) are continuous functions, and that both $f$ and $g$ are differentiable in $(a, b)$. Show that there is a point $c \in (a, b)$ such that $g'(c)(f(b) - f(a)) = f'(c)(g(b) - g(a))$. (Hint: consider the function $h(x) = (f(x) - f(a))(g(b) - g(a)) - (f(b) - f(a))(g(x) - g(a))$.) The general version is known as Cauchy's mean value theorem; the case $g(x) = x$ is the classical (Lagrange) mean value theorem.
Exercise 14.10.12. (L’Hôpital’s rule) Suppose f and g are real valued functions defined
in an interval I. For a ∈ I, suppose that limx→a f (x) = 0 = limx→a g(x).
(a) If $f$ and $g$ are differentiable at $a$ and $g'(a) \ne 0$, show that
$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}$.
(b) Conversely, suppose $f$ and $g$ are differentiable in a neighborhood of $a$. Show that if $\lim_{x \to a} \frac{f'(x)}{g'(x)}$ exists, then so does $\lim_{x \to a} \frac{f(x)}{g(x)}$ and
$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$.
(Hint: Without loss of generality, assume $f(a) = 0 = g(a)$. Apply the general mean value theorem to $f$ and $g$ over the interval with endpoints $a$ and $x \in I$.)
Exercise 14.10.13. (Taylor approximation) Suppose f is a real-valued function defined in
an interval I.
(a) (Peano's residual) Suppose $f$ has $n$ finite derivatives at $a \in I$, that is, $f'(a), \ldots, f^{(n)}(a)$ exist. Show that
$r_n(x) := f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k = o\big((x - a)^n\big)$.
(Hint: Apply L'Hôpital's rule (a) and (b) to $r_n(x)$ and $(x - a)^n$.)
Suppose $f$ and $g$ admit $n \ge 0$ continuous derivatives in $(\alpha, \beta) \subset I$, and that $f^{(n+1)}$ and $g^{(n+1)}$ exist in $(\alpha, \beta)$. Let $\alpha < a < \beta$ and fix $x \in (\alpha, \beta)$. Define
$F(t) = \sum_{k=0}^n \frac{f^{(k)}(t)}{k!}(x - t)^k$,
$G(t) = \sum_{k=0}^n \frac{g^{(k)}(t)}{k!}(x - t)^k$.
(b) Show that $F(x) = f(x)$ and $F'(t) = \frac{(x-t)^n}{n!} f^{(n+1)}(t)$, and similarly, $G(x) = g(x)$, $G'(t) = \frac{(x-t)^n}{n!} g^{(n+1)}(t)$.
(c) For any $x \in [\alpha, \beta]$ and $x \ne a$, show that there is $\xi$ between $a$ and $x$ such that
$\Big(f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k\Big) g^{(n+1)}(\xi) = \Big(g(x) - \sum_{k=0}^n \frac{g^{(k)}(a)}{k!}(x - a)^k\Big) f^{(n+1)}(\xi)$.
(d) (Lagrange's residual) Show that there is a point $\xi$ between $a$ and $x$ such that
(14.30) $f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1}$.
(Hint: Set $g(t) = (t - a)^{n+1}$ in the definition of $G$.)
(e) If in addition $f^{(n+1)}$ is continuous in $(\alpha, \beta)$, show that
$f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k = \frac{1}{n!}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt$.
(Hint: integration by parts)

Exercise 14.10.14. In Theorem 12.6.10 it was shown that the group $GL(X)$ of bounded operators on a Banach space $X$ whose inverses are also bounded is open in $L(X)$. Show that the map $T \mapsto T^{-1}$ on $GL(X)$ is differentiable and compute its derivative.
Exercise 14.10.15. Let $X$ and $Y$ be two normed spaces. If $T \in L(X, Y)$, show that $x \mapsto T(x)$ is differentiable everywhere and that $T'(x) = T$.
Chapter 15

Fourier transform and Convolution on $\mathbb{R}^n$

15.1. Fourier transform

Definition 15.1.1. Let $\mu$ be a complex measure on $(\mathbb{R}^d, B(\mathbb{R}^d))$. The Fourier transform or characteristic function of $\mu$ is the function defined as
$\hat\mu(t) = \int \exp(it \cdot x)\,\mu(dx)$.

Observe that $g_t(x) := e^{ix \cdot t}$ satisfies $|g_t| \equiv 1$; hence $g_t \in L_1(|\mu|)$ for all $t \in \mathbb{R}^d$ and so, $\hat\mu$ is a well defined function from $\mathbb{R}^d$ to $\mathbb{C}$.
Example 15.1.2. The Bernoulli measure $\eta^p_{a,b}$ ($0 \le p \le 1$, $a \ne b$) on $\mathbb{R}$ is given by $\eta^p_{a,b} := (1 - p)\delta_a + p\delta_b$. This measure corresponds to flipping a biased coin that lands heads up (with a value of $b$) with probability $p$ or tails up (with a value of $a$) with probability $1 - p$. Its characteristic function is given by $\widehat{\eta^p_{a,b}}(t) = (1 - p)e^{ita} + pe^{itb}$. Special cases are the symmetric Bernoulli measure $\eta = \frac{1}{2}(\delta_{-1} + \delta_1)$, in which case $\hat\eta(t) = \cos t$; and the Bernoulli 0–1 measure with probability of success $\eta^p_{0,1}(\{1\}) = p$, where $\eta^p_{0,1} = p\delta_1 + (1 - p)\delta_0$, in which case $\widehat{\eta^p_{0,1}}(t) = pe^{it} + (1 - p)$.

Example 15.1.3. The uniform distribution on $\mathbb{R}$ over $(a, b)$ is the measure $U_{a,b}(dx) = \frac{1}{b-a} 1_{(a,b)}(x)\,dx$. Its characteristic function is $\widehat{U_{a,b}}(t) = \frac{e^{ibt} - e^{iat}}{it(b-a)}$ for $t \ne 0$.

Example 15.1.4. The exponential distribution $E(\lambda)$ with parameter $\lambda > 0$ is given by $\mu(dx; \lambda) = \lambda e^{-\lambda x} 1_{[0,\infty)}(x)\,dx$. Its characteristic function is
$\hat\mu(t; \lambda) = \int_0^\infty \lambda e^{ixt} e^{-\lambda x}\,dx = \lambda \frac{e^{(it-\lambda)x}}{it - \lambda}\Big|_0^\infty = \frac{\lambda}{\lambda - it}$.

For $\lambda = 1$, the reflected exponential is $\mu_r(dx) = e^x 1_{(-\infty,0]}(x)\,dx$ and thus, $\widehat{\mu_r}(t) = \frac{1}{1+it}$. The double exponential distribution $\nu(dx) = \frac{1}{2}(\mu_r + \mu)(dx) = \frac{1}{2}e^{-|x|}\,dx$ has characteristic function
$\hat\nu(t) = \frac{1}{2(1 - it)} + \frac{1}{2(1 + it)} = \frac{1}{1 + t^2}$.
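
The closed forms in Examples 15.1.3 and 15.1.4 can be checked numerically. The sketch below approximates $\int e^{itx}\,\mu(dx)$ with a composite Simpson rule; the helper names `simpson` and `char_fn`, the truncation of the half line to $[0, 40]$, and the sample parameters are assumptions made for this illustration only.

```python
import cmath
import math

def simpson(func, lo, hi, steps=20000):
    # Composite Simpson rule for a complex-valued integrand (steps must be even).
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

def char_fn(density, t, lo, hi):
    # Approximates the characteristic function of density(x) dx on [lo, hi].
    return simpson(lambda x: cmath.exp(1j * t * x) * density(x), lo, hi)

t = 1.3

# Uniform distribution over (a, b).
a, b = -1.0, 2.0
uniform_num = char_fn(lambda x: 1.0 / (b - a), t, a, b)
uniform_exact = (cmath.exp(1j * b * t) - cmath.exp(1j * a * t)) / (1j * t * (b - a))

# Exponential distribution with parameter lam (tail beyond 40 is negligible).
lam = 2.0
expo_num = char_fn(lambda x: lam * math.exp(-lam * x), t, 0.0, 40.0)
expo_exact = lam / (lam - 1j * t)

# Double exponential distribution.
laplace_num = char_fn(lambda x: 0.5 * math.exp(-abs(x)), t, -40.0, 40.0)
laplace_exact = 1.0 / (1.0 + t * t)
```

In each case the numerical value agrees with the closed form up to quadrature error.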
Theorem 15.1.5. Suppose that $\mu$ and $\nu$ are complex measures (measures of finite variation) on $B(\mathbb{R}^d)$. Then, $\mu = \nu$ iff $\hat\mu = \hat\nu$.

Proof. Denote by $f_t(x) = \exp(ix \cdot t)$ and consider the collection M of all such functions; observe that $f_0 \equiv 1 \in$ M. This is a complex multiplicative family contained in the space V of all bounded complex valued Borel measurable functions. The latter is a complex vector space and a bounded class. By the Complex Bounded Class Theorem, V contains all the bounded complex valued σ(M)–measurable functions, which contains in particular all functions of the form $1_B$, $B \in$ σ(M). Since the integrals with respect to $\mu$ and $\nu$ coincide on M, then by Dominated Convergence they also coincide on σ(M). Consider the maps $\gamma_t(x) = t \cdot x$, with $t \in \mathbb{R}^d$, and observe that they generate $B(\mathbb{R}^d)$. Since
$\gamma_t(x) = t \cdot x = -i \lim_{n \to \infty} n\big(f_{t/n}(x) - f_0(x)\big)$,
each $\gamma_t$ is σ(M)–measurable. Therefore σ(M) $= B(\mathbb{R}^d)$ and $\mu = \nu$.

Lemma 15.1.6. Suppose $\mu$ is a Borel probability measure on $\mathbb{R}$. If $|\hat\mu(t)| = 1$ for some $t \ne 0$, then there are $b \in \mathbb{R}$ and $h > 0$ such that $\operatorname{supp}\mu \subset b + h\mathbb{Z}$.

Proof. Without loss of generality assume $t > 0$. Then for some $\theta \in (-\pi, \pi]$,
$1 = e^{-i\theta}\hat\mu(t) = \int \cos(xt - \theta)\,\mu(dx)$.
By Theorem 4.3.11, $\cos(xt - \theta) = 1$ $\mu$–a.s. Hence, $\operatorname{supp}\mu \subset \frac{\theta}{t} + \frac{2\pi}{t}\mathbb{Z}$.
Theorem 15.1.7. Suppose $\mu$ is a probability measure on $\mathbb{R}^n$. If $|\hat\mu(t)| = 1$ in a small neighborhood of 0, then $\mu = \delta_b$ for some $b \in \mathbb{R}^n$.

Proof. By considering the law of each component of $\mathbb{R}^n$, it is enough to consider the case $n = 1$. For every $t \ne 0$ in a neighborhood of 0, there is a number $b_t$ such that $\operatorname{supp}\mu \subset b_t + \frac{2\pi}{|t|}\mathbb{Z}$. If $\mu$ is not a trivial distribution, then there are distinct points $x_1$ and $x_2$ of positive $\mu$–measure. It follows that $|x_1 - x_2| \ge \frac{2\pi}{|t|}$. This is not possible as $|t|$ can be taken to be arbitrarily small.

Given a positive finite measure $\mu$, its reflection $\mu_r$ is given by $\mu_r(A) = \mu(-A)$ for all $A \in B(\mathbb{R}^n)$. Then, for any bounded measurable function $f$
$\int f(x)\,\mu_r(dx) = \int f(-x)\,\mu(dx)$.

Consequently, $\widehat{\mu_r}(t) = \overline{\hat\mu(t)} = \hat\mu(-t)$ and the real part $\operatorname{Re}(\hat\mu(t))$ of $\hat\mu(t)$ is the characteristic function of the measure $\frac{1}{2}(\mu + \mu_r)$. In particular, if $\mu$ is symmetric, i.e. $\mu = \mu_r$, then $\hat\mu$ is real and
$\hat\mu(t) = \int \cos(x \cdot t)\,\mu(dx)$.

15.1.1. Smoothness of the Fourier transform. Here we present an analysis of the relation between moments of a measure and the degree of smoothness of its Fourier transform.
For any $t \in \mathbb{R}^n$ and $\alpha \in \mathbb{Z}^n_+$ we denote $t^\alpha = t_1^{\alpha_1} \cdots t_n^{\alpha_n}$, $|\alpha| = \sum_{j=1}^n \alpha_j$, and $\alpha! = \alpha_1! \cdots \alpha_n!$.
Lemma 15.1.8. For any $n \in \mathbb{Z}_+$ and $x \in \mathbb{R}$
(15.1) $\Big|e^{ix} - \sum_{k=0}^n \frac{(ix)^k}{k!}\Big| \le \min\Big\{\frac{|x|^{n+1}}{(n+1)!}, \frac{2|x|^n}{n!}\Big\}$.

Proof. Let $h_{-1}(x) := e^{ix}$, and $h_n(x) := e^{ix} - \sum_{k=0}^n \frac{(ix)^k}{k!}$ for $n \ge 0$. It is easy to check that
(15.2) $h_n(x) = i\int_0^x h_{n-1}(s)\,ds$, $n \ge 0$.
Since $|h_0(t)| \le 2$ and $|h_{-1}| = 1$, it follows from (15.2) that $|h_0(x)| \le |x| \wedge 2$. By induction, if (15.1) holds for $n - 1$ then, from (15.2) we obtain that
$|h_n(x)| \le \min\Big\{\frac{|x|^{n+1}}{(n+1)!}, \frac{2|x|^n}{n!}\Big\}$.
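
Inequality (15.1) is easy to test numerically. The following sketch evaluates both sides on a grid of points for small $n$; the grid, the range of $n$, and the helper names are arbitrary choices for this illustration, and the small tolerance only absorbs floating point rounding.

```python
import cmath
import math

def taylor_remainder(x, n):
    # |e^{ix} - sum_{k=0}^{n} (ix)^k / k!|
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)

def bound(x, n):
    # Right-hand side of (15.1).
    return min(abs(x) ** (n + 1) / math.factorial(n + 1),
               2 * abs(x) ** n / math.factorial(n))

# Collect every grid point where the inequality fails (none are expected).
violations = [
    (x, n)
    for n in range(0, 8)
    for x in [k / 4.0 for k in range(-80, 81)]
    if taylor_remainder(x, n) > bound(x, n) + 1e-9
]
```

The first term of the minimum is the sharper bound for small $|x|$, while the second takes over once $|x| > 2(n+1)$.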
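Theorem 15.1.9. Suppose that $\mu$ is a complex measure on $(\mathbb{R}^n, B(\mathbb{R}^n))$. If
$\int_{\mathbb{R}^n} |x_j|^m\,|\mu|(dx) < \infty$,
then the partial derivative $\partial_j^m \hat\mu$ exists, is uniformly continuous, and
(15.3) $\partial_j^k \hat\mu(t) = i^k \int_{\mathbb{R}^n} x_j^k e^{ix \cdot t}\,\mu(dx)$, $0 \le k \le m$.
Moreover, if $|x|^m = \big(\sum_{j=1}^n x_j^2\big)^{m/2} \in L_1(|\mu|)$, then $\hat\mu \in C^m(\mathbb{R}^n)$, and
(15.4) $\hat\mu(t) = \sum_{0 \le |\alpha| \le m} \frac{i^{|\alpha|}}{\alpha!}\, t^\alpha \int x^\alpha\,\mu(dx) + o(|t|^m)$.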

Proof. Let $f = \frac{d\mu}{d|\mu|}$. Since $\|\mu\|$ is finite, $\int |x_j|^k\,|\mu|(dx) < \infty$ for all $0 \le k \le m$. We proceed by induction. For $k = 0$ there is nothing to prove. Suppose the statement is valid for $0 \le k < m$. Since $\big|\frac{e^{ix_j h} - 1}{h}\big| \le |x_j|$, by dominated convergence we get that
$\lim_{h \to 0} \frac{(\partial_j^k \hat\mu)(t + he_j) - (\partial_j^k \hat\mu)(t)}{h} = \lim_{h \to 0} \int \frac{e^{ix_j h} - 1}{h}\,(ix_j)^k e^{ix \cdot t} f(x)\,|\mu|(dx) = i^{k+1} \int x_j^{k+1} e^{ix \cdot t} f(x)\,|\mu|(dx)$.
This shows that $\partial_j^{k+1} \hat\mu$ exists and that (15.3) holds.
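To prove the last statement, we take $t \cdot x$ in place of $x$, and $m$ in place of $n$, in Lemma 15.1.8 to obtain
$\Big|\hat\mu(t) - \sum_{k=0}^m \int \frac{(it \cdot x)^k}{k!}\,\mu(dx)\Big| \le \int \Big|e^{it \cdot x} - \sum_{k=0}^m \frac{(it \cdot x)^k}{k!}\Big|\,|\mu|(dx)$
$\le \int \min\Big\{\frac{|t \cdot x|^{m+1}}{(m+1)!}, \frac{2|t \cdot x|^m}{m!}\Big\}\,|\mu|(dx)$
$\le |t|^m \int \min\Big\{\frac{|t||x|^{m+1}}{(m+1)!}, \frac{2|x|^m}{m!}\Big\}\,|\mu|(dx)$.
The conclusion follows by dominated convergence, since
$\lim_{t \to 0} \int \min\Big\{\frac{|t||x|^{m+1}}{(m+1)!}, \frac{2|x|^m}{m!}\Big\}\,|\mu|(dx) = 0$.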
Lemma 15.1.10. For the normal distribution $\mu(dx) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$, $\hat\mu(t) = e^{-t^2/2}$. Moreover, denoting $M_n = \int x^n\,\mu(dx)$, $M_{2n-1} = 0$ for all $n \ge 1$ and $M_{2n} = \frac{(2n)!}{2^n n!}$ for all $n \in \mathbb{Z}_+$.

Proof. We give a simple ODE proof of this fact. First note that
$\hat\mu(t) = \frac{1}{\sqrt{2\pi}} \int e^{ixt} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int \cos(xt) e^{-x^2/2}\,dx$.
Theorem 15.1.9 and integration by parts show that
$\hat\mu'(t) = -\frac{1}{\sqrt{2\pi}} \int x \sin(xt) e^{-x^2/2}\,dx = -\frac{1}{\sqrt{2\pi}} \int t \cos(xt) e^{-x^2/2}\,dx = -t\hat\mu(t)$.
Therefore, $\hat\mu$ satisfies the equation
$\hat\mu'(t) + t\hat\mu(t) = 0$; $\hat\mu(0) = 1$.
The unique solution to this initial value problem is $\hat\mu(t) = e^{-t^2/2}$.
The last statement follows from (15.4) and $\hat\mu(t) = \sum_{n=0}^\infty \frac{i^{2n}}{2^n n!}\, t^{2n}$.
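
The moment formula of Lemma 15.1.10 can be compared against a direct numerical integration. The sketch below uses a midpoint rule on a truncated symmetric interval; the truncation width, the number of steps, and the function names are assumptions made only for this check.

```python
import math

def gaussian_moment(order, half_width=12.0, steps=24000):
    # Midpoint-rule approximation of the integral of x^order against
    # the standard normal density, truncated to [-half_width, half_width].
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += x ** order * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

def closed_form(n):
    # M_{2n} = (2n)! / (2^n n!)
    return math.factorial(2 * n) / (2 ** n * math.factorial(n))

even_moments = [(gaussian_moment(2 * n), closed_form(n)) for n in range(5)]
odd_moments = [gaussian_moment(2 * n - 1) for n in range(1, 5)]
```

The closed form equals $1 \cdot 3 \cdots (2n - 1)$, so the first even moments are 1, 1, 3, 15, 105.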

The following result relates the smoothness of the Fourier transform of a measure to the existence of finite moments.
Theorem 15.1.11. Let $\mu$ be a finite positive measure on $(\mathbb{R}^n, B(\mathbb{R}^n))$. If $\partial^\alpha \hat\mu(0)$ exists and is finite for all $|\alpha| = 2m$, then $\hat\mu \in C^{2m}(\mathbb{R}^n)$; furthermore, for all $\alpha \in \mathbb{Z}^n_+$ with $|\alpha| = 2m$, $\int |x^\alpha|\,\mu(dx) < \infty$ and $\partial^\alpha \hat\mu(t) = i^{|\alpha|} \int x^\alpha e^{ix \cdot t}\,\mu(dx)$.

Proof. We first show that $x_j^{2k} \in L_1(\mu)$ for all $1 \le k \le m$ and $1 \le j \le n$. We proceed by induction on $k$. Fix $1 \le j \le n$, and let $u_j$ denote the $j$–th canonical unit vector in $\mathbb{R}^n$. For any $h \in \mathbb{R} \setminus \{0\}$
$-\frac{e^{ihu_j \cdot x} - 2 + e^{-ihu_j \cdot x}}{h^2} = 2\,\frac{1 - \cos(hx_j)}{h^2} \ge 0$,
$\lim_{h \to 0} 2\,\frac{1 - \cos(hx_j)}{h^2} = x_j^2$.
By Fatou's lemma
$\int x_j^2\,\mu(dx) \le \liminf_{h \to 0} \int 2\,\frac{1 - \cos(hx_j)}{h^2}\,\mu(dx) = -\limsup_{h \to 0} \frac{\hat\mu(hu_j) - 2\hat\mu(0) + \hat\mu(-hu_j)}{h^2} = -\partial_j^2 \hat\mu(0)$.
Hence the claim holds for $k = 1$.

Suppose that the claim holds for $1 \le k < m$. Then
$\partial_j^{2k} \hat\mu(t) = i^{2k} \int x_j^{2k} e^{ix \cdot t}\,\mu(dx)$
for all $1 \le j \le n$. By applying the case $k = 1$ to each measure of the form $x_j^{2k}\,\mu(dx)$ we obtain that $\int x_j^{2(k+1)}\,\mu(dx) < \infty$ for all $1 \le j \le n$ and that $\partial_j^{2(k+1)} \hat\mu(t) = i^{2(k+1)} \int x_j^{2(k+1)} e^{ix \cdot t}\,\mu(dx)$. This completes our induction argument.

It follows from
$|x_1 + \ldots + x_n|^{2m} \le n^{2m-1}(x_1^{2m} + \ldots + x_n^{2m})$
that $x^\alpha \in L_1(\mu)$ for all $\alpha \in \mathbb{Z}^n_+$ with $|\alpha| = 2m$, and from Theorem 15.1.9, $\partial^\alpha \hat\mu(t) = i^{2m} \int x^\alpha e^{ix \cdot t}\,\mu(dx)$.
Lemma 15.1.12. Suppose $\mu$ is a complex measure on $(\mathbb{R}, B(\mathbb{R}))$ of finite variation. If $\int e^{\delta_0 |x|}\,|\mu|(dx) < \infty$ for some $\delta_0 > 0$, then $\hat\mu(z) = \int e^{izx}\,\mu(dx)$ has an analytic extension to the strip $D = \{z \in \mathbb{C} : |\operatorname{Im}(z)| < \delta_0\}$. Furthermore, for any $z \in D$
$\hat\mu'(z) = i \int x e^{izx}\,\mu(dx)$.

Proof. The ideas in the proof of Theorem 10.6.5 provide a proof for the present lemma. Set $f := \frac{d\mu}{d|\mu|}$. Then $|f| = 1$ $|\mu|$–a.s. and $\mu = f \cdot |\mu|$. As $|e^{izx}| \le e^{\delta_0 |x|}$ for any $z \in D$, the map $z \mapsto \int e^{izx}\,\mu(dx)$ is a continuous extension of $\hat\mu$ to $D$. For $a + ib = z \in D$ fixed, let $\delta_1 > 0$ be such that $B(z; \delta_1) \subset D$. Clearly, $|b| + \delta_1 < \delta_0$ and, since
$\delta_1 |x e^{izx}| \le e^{\delta_0 |x|}$,
$x e^{izx} \in L_1(|\mu|(dx))$. The convexity of the exponential function implies that for any $0 < |h| < \delta_1$,
$\Big|\frac{e^{ihx} - 1}{h}\Big| \le \frac{e^{|x||h|} - 1}{|h|} \le \frac{e^{|x|\delta_1} - 1}{\delta_1} \le \frac{e^{\delta_1 |x|} + e^{-\delta_1 |x|}}{\delta_1}$.
Dominated convergence implies that $\hat\mu'(z)$ exists and
$\hat\mu'(z) = \lim_{h \to 0} \frac{\hat\mu(z + h) - \hat\mu(z)}{h} = \int e^{izx} \lim_{h \to 0} \frac{e^{ihx} - 1}{h}\, f(x)\,|\mu|(dx) = i \int x e^{izx} f(x)\,|\mu|(dx) = i \int x e^{izx}\,\mu(dx)$.
This shows that $\hat\mu \in H(D)$.

We state now an important theorem about the completeness of orthogonal polynomials in $L_2$.

Theorem 15.1.13. (Kolmogorov) Suppose $\mu$ is a finite positive measure on $(\mathbb{R}, B(\mathbb{R}))$. If $\int e^{\delta_0 |x|}\,\mu(dx) < \infty$ for some $\delta_0 > 0$, then $\operatorname{span}\{p_n(x) = x^n : n \in \mathbb{Z}_+\}$ is dense in $L_2(\mathbb{R}, \mu)$.
R
Proof. By hypothesis, $\int |x|^n\,\mu(dx) < \infty$ for all nonnegative integers $n$. Assume the statement is false. The Hahn–Banach theorem 12.10.9 and the Riesz representation theorem for Hilbert spaces imply that for some $h \in L_2$ not identically zero, $(p_n, h) = \int x^n h(x)\,\mu(dx) = 0$ for all integers $n \ge 0$. By hypothesis and the Cauchy–Schwarz inequality, for any $0 < \delta < \frac{1}{2}\delta_0$ the map $x \mapsto e^{\delta |x|} h(x)$ is in $L_1(\mu)$. Hence, setting $\mu_h := h \cdot \mu$, Lemma 15.1.12 shows that
$\widehat{\mu_h}(t) = \int e^{itx} h(x)\,\mu(dx)$
can be extended analytically to the strip $H = \{z \in \mathbb{C} : |\operatorname{Im}(z)| < \frac{1}{2}\delta_0\}$. Our assumption implies that $\widehat{\mu_h}^{(n)}(0) = i^n (p_n, h) = 0$ for all $n$ and so, $\widehat{\mu_h}(z) \equiv 0$. By Theorem 15.1.5, this means that $h = 0$ $\mu$–a.s., which is a contradiction.

Example 15.1.14. The Gram–Schmidt orthogonalization process applied to $P := \{x^n : n \in \mathbb{Z}_+\}$ in different finite measure spaces in the real line gives complete sequences of orthonormal polynomials encountered in applied mathematics. For $(\mathbb{R}, B(\mathbb{R}), e^{-x^2/2}\,dx)$ we obtain the Hermite polynomials; for $(\mathbb{R}_+, B(\mathbb{R}_+), e^{-x}\,dx)$ we obtain the Laguerre polynomials defined in Exercise 12.17.33; for $([-1, 1], B([-1, 1]), dx)$ we obtain the Legendre polynomials; for $((-1, 1), B((-1, 1)), (1 - x^2)^{-1/2}\,dx)$ we obtain the Chebyshev polynomials.
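
The Legendre case can be carried out exactly by machine. The sketch below runs the Gram–Schmidt process on $1, x, x^2, x^3$ in $L_2([-1, 1], dx)$ with rational arithmetic, representing a polynomial by its list of coefficients (lowest degree first). The representation and the function names are choices made for this illustration, and the normalization step is omitted, so the output consists of the monic orthogonal polynomials rather than the orthonormal ones.

```python
from fractions import Fraction

def inner(p, q):
    # Inner product in L2([-1, 1], dx): the integral of x^(i+j) over [-1, 1]
    # is 2 / (i + j + 1) when i + j is even and 0 otherwise.
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:
                total += a * b * Fraction(2, i + j + 1)
    return total

def gram_schmidt_monomials(count):
    # Orthogonalize 1, x, ..., x^(count-1) without normalizing.
    basis = []
    for n in range(count):
        p = [Fraction(0)] * n + [Fraction(1)]
        for q in basis:
            coeff = inner(p, q) / inner(q, q)
            padded = q + [Fraction(0)] * (len(p) - len(q))
            p = [a - coeff * b for a, b in zip(p, padded)]
        basis.append(p)
    return basis

polys = gram_schmidt_monomials(4)
# polys[2] represents x^2 - 1/3 and polys[3] represents x^3 - (3/5) x.
```

These are scalar multiples of the classical Legendre polynomials $P_2(x) = \frac{1}{2}(3x^2 - 1)$ and $P_3(x) = \frac{1}{2}(5x^3 - 3x)$.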

15.1.2. Fourier transform on integrable functions. By identifying $L_1(\mathbb{R}^n)$ with the set of complex measures which are absolutely continuous with respect to the Lebesgue measure, one can extend the definition of Fourier transform to integrable functions in the obvious way. In Analysis however, it is convenient to define the Fourier transform $\hat f$ of $f \in L_1$ slightly differently.

Definition 15.1.15. If $f \in L_1$, the Fourier transform of $f$ is the function $\hat f$ defined by letting
$\hat f(y) = \int_{\mathbb{R}^n} f(x) e^{-2\pi i x \cdot y}\,dx$.

Remark 15.1.16. If $\mu_f(dx) = f(x)\,dx$, then $\hat f(t) = \widehat{\mu_f}(-2\pi t)$.

Theorem 15.1.17. If $\{f, g\} \subset L_1$, then
$\int \hat f(y) g(y)\,dy = \int f(y) \hat g(y)\,dy$.

Proof. Applying Fubini's theorem, we get
$\int \hat f(y) g(y)\,dy = \int \Big(\int f(x) e^{-2\pi i x \cdot y}\,dx\Big) g(y)\,dy = \int \Big(\int g(y) e^{-2\pi i x \cdot y}\,dy\Big) f(x)\,dx = \int f(x) \hat g(x)\,dx$.

Remark 15.1.18. For any positive number $a$ and any vector $h$ we define the dilation by $a$, $\delta_a$, and the translation by $h$, $\tau_h$, as the operators mapping any function $g(x)$ into $g(ax)$ and $g(x - h)$ respectively. It is left as an exercise (see Exercise 15.9.4) to show that the Fourier transform satisfies
(a') $(e^{2\pi i x \cdot h} f(x))^\wedge(y) = (\tau_h \hat f)(y)$.
(b') $(\tau_h f)^\wedge(y) = e^{-2\pi i y \cdot h} \hat f(y)$.
(d') $(\delta_a f)^\wedge(y) = a^{-n} \hat f(a^{-1} y)$.
Theorem 15.1.19. Suppose $f \in L_1(\mathbb{R}^n, \lambda_n)$. Let $A$ be an invertible linear transformation on $\mathbb{R}^n$ and set $f_A = f \circ A$. Then,
$\widehat{f_A}(y) = \frac{1}{|\det(A)|}\, \hat f\big((A^\top)^{-1} y\big)$.
In particular, if $f$ is a radial function, so is $\hat f$.

Proof. The first statement is a direct application of the change of variables formula for Lebesgue measure on $\mathbb{R}^n$. For the last statement, recall that $f$ is radial iff $f = f_U$ for all orthogonal linear transformations $U$ (i.e. $U \in O(n)$). Since $|\det(U)| = 1$ and $(U^\top)^{-1} = U$, we have $\hat f(y) = \widehat{f_U}(y) = \hat f(Uy)$ for all $U \in O(n)$ and so, $\hat f$ is radial.

The following result will be very useful when we study regularity properties of the Fourier transform of integrable functions, as well as of the operations discussed in Section 15.2.
Theorem 15.1.20. Suppose 1 ≤ p < ∞, and let f ∈ Lp (Rn , λn ). Then, the mapping
τ : Rn −→ Lp (Rn , λn ) given by t 7→ τt f = f (· − t) is uniformly continuous.

Proof. We first prove this theorem for continuous functions of compact support. Suppose that $g \in C_{00}(\mathbb{R}^n)$ and that $\operatorname{supp}(g) \subset B(0, a)$; then $g$ is uniformly continuous. Given $\varepsilon > 0$, by uniform continuity of $g$ there is $0 < \delta < a$ such that $|s - t| < \delta$ implies
$|g(s) - g(t)| < (\lambda(B(0, 3a)))^{-1/p}\,\varepsilon$.
Hence, for $|s - t| < \delta$,
$\int |g(x - t) - g(x - s)|^p\,dx = \|\tau_t g - \tau_s g\|_p^p = \|\tau_{t-s} g - g\|_p^p < \varepsilon^p$.
Therefore $t \mapsto \tau_t g$ is uniformly continuous. For general $f \in L_p$, the conclusion follows from the density of $C_{00}(\mathbb{R}^n)$ in $L_p$.

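Theorem 15.1.21. (Riemann–Lebesgue lemma) The Fourier transform $F : f \mapsto \hat f$ is a bounded linear transformation from $L_1(\mathbb{R}^n, \lambda_n)$ to $L_\infty(\mathbb{R}^n, \lambda_n)$ with $\|F\| \le 1$. Moreover, $\hat f$ is uniformly continuous and $\hat f(y) \to 0$ as $|y| \to \infty$ for any $f \in L_1(\mathbb{R}^n, \lambda_n)$.

Proof. The first statement is clear from the definition of $\hat f$. Uniform continuity follows from
$|\hat f(y) - \hat f(s)| \le \int_{\mathbb{R}^n} |f(x)|\,|e^{-2\pi i x \cdot (y - s)} - 1|\,dx$
and dominated convergence. To prove that $\hat f$ vanishes at infinity, notice that since $e^{\pi i} = -1$, then
$\hat f(y) = -\int f(x) e^{-2\pi i \big(x + \frac{y}{2|y|^2}\big) \cdot y}\,dx = -\int f\Big(x - \frac{y}{2|y|^2}\Big) e^{-2\pi i x \cdot y}\,dx$.
Hence,
$2\hat f(y) = \int \Big(f(x) - f\Big(x - \frac{y}{2|y|^2}\Big)\Big) e^{-2\pi i x \cdot y}\,dx$,
whence $2|\hat f(y)| \le \|f - \tau_h f\|_1$ with $h = \frac{y}{2|y|^2}$. Since $|h| = \frac{1}{2|y|} \to 0$ as $|y| \to \infty$, from Theorem 15.1.20 we conclude that $\hat f(y) \to 0$ as $|y| \to \infty$.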

15.2. Convolution
Definition 15.2.1. Suppose that $\mu$ and $\nu$ are two complex Borel measures on $\mathbb{R}^n$. The convolution of $\mu$ with $\nu$ is the Borel measure $\mu * \nu$ defined by
$(\mu * \nu)(E) = \int_{\mathbb{R}^n \times \mathbb{R}^n} 1_E(x + y)\,(\mu \otimes \nu)(dx, dy)$
(15.5) $= \int_{\mathbb{R}^n} \nu(E - x)\,\mu(dx) = \int_{\mathbb{R}^n} \mu(E - y)\,\nu(dy)$.
(Here, (15.5) follows from Fubini's theorem. Thus $\mu * \nu = \nu * \mu$.)

The Fourier transform and convolution are linked as follows.

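Theorem 15.2.2. If $\mu$ and $\nu$ are complex Borel measures on $\mathbb{R}^n$ then, $|\mu * \nu| \le |\mu| * |\nu|$, $\|\mu * \nu\|_{TV} \le \|\mu\|_{TV} \|\nu\|_{TV}$, and
(15.6) $\widehat{\mu * \nu}(t) = \hat\mu(t)\hat\nu(t)$.

Proof. By definition
$\int g\,d(\mu * \nu) = \int_{\mathbb{R}^n \times \mathbb{R}^n} g(x + y)\,(\mu \otimes \nu)(dx, dy)$
for every bounded Borel measurable function $g$. Then, the first two statements follow directly from Radon–Nikodym's theorem together with Fubini's theorem. To obtain (15.6), for each $t \in \mathbb{R}^n$, define $g_t(x) := e^{ix \cdot t}$. Applying Fubini's theorem we obtain that
$\widehat{\mu * \nu}(t) = \int_{\mathbb{R}^n \times \mathbb{R}^n} e^{i(x + y) \cdot t}\,(\mu \otimes \nu)(dx, dy) = \int_{\mathbb{R}^n} e^{ix \cdot t}\,\mu(dx) \int_{\mathbb{R}^n} e^{iy \cdot t}\,\nu(dy) = \hat\mu(t)\hat\nu(t)$.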
Rn Rn
Remark 15.2.3. From Theorem 10.4.1 we have that the space $M(\mathbb{R}^n)$ of Borel complex measures on $\mathbb{R}^n$ with setwise addition and scalar multiplication is a Banach space with respect to the total variation norm (see Remark 10.4.2). From Theorem 15.2.2, it follows that $M(\mathbb{R}^n)$ with convolution as product operation is a Banach algebra with unit $\delta_0$, since $\|\delta_0\|_{TV} = 1$.
Theorem 15.2.4. Suppose $\mu$ and $\nu$ are positive measures on $(\mathbb{R}^d, B(\mathbb{R}^d))$. Then,
$\operatorname{supp}(\mu * \nu) = \overline{\operatorname{supp}(\mu) + \operatorname{supp}(\nu)}$.

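Proof. We first show that $\operatorname{supp}(\mu) + \operatorname{supp}(\nu) \subset \operatorname{supp}(\mu * \nu)$; since $\operatorname{supp}(\mu * \nu)$ is closed, it then also contains the closure of the sum. Let $x_0 \in \operatorname{supp}(\mu)$ and $y_0 \in \operatorname{supp}(\nu)$. It is enough to show that $(\mu * \nu)(x_0 + y_0 + U) > 0$ for any open neighborhood $U$ of 0. Choose an open neighborhood $V$ of 0 such that $V + V \subset U$. Then, $1_{x_0 + V}(x)\, 1_{y_0 + V}(y) \le 1_{x_0 + y_0 + U}(x + y)$. Integration with respect to $\mu \otimes \nu$ gives the desired result, for $\mu(x_0 + V)\,\nu(y_0 + V) > 0$.
To obtain the converse inclusion, suppose that $z \in \operatorname{supp}(\mu * \nu)$. Let $X = \operatorname{supp}(\mu)$ and $Y = \operatorname{supp}(\nu)$. Then, for any $\varepsilon > 0$
$0 < (\mu * \nu)\big(B(z; \varepsilon)\big) = \int_X \nu\big(Y \cap (B(z; \varepsilon) - x)\big)\,\mu(dx)$.
This means that for some $x \in X$, $\nu\big(Y \cap (B(z; \varepsilon) - x)\big) > 0$. This in turn implies that there exists $y \in Y \cap (B(z; \varepsilon) - x)$, that is $x + y \in B(z; \varepsilon) \cap (X + Y)$. Since $\varepsilon > 0$ is arbitrary, this shows that $z \in \overline{X + Y}$.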
Example 15.2.5. Suppose that $\mu$ is a (positive) Radon measure on $[0, \infty)$. The renewal measure associated to $\mu$ is defined as $U = \sum_{n=0}^\infty \mu^{*n}$, where $\mu^{*0} := \delta_0$. It is left as an exercise to show that $U$ is a Radon measure on $\mathbb{R}_+$ when $\mu(0, \infty) > 0$ (see Exercise 15.9.18). Here we show that (a) $\operatorname{supp}(U) = \overline{\bigcup_{n=0}^\infty \operatorname{supp}(\mu^{*n})}$, and (b) $\operatorname{supp}(U)$ is closed under addition.
To check (a), set $F_n := \operatorname{supp}(\mu^{*n})$ and let $F = \overline{\bigcup_n F_n}$. Clearly $\bigcup_n F_n \subset F \subset \operatorname{supp}(U)$. Each $(F_n)^c$ is open in $\mathbb{R}_+$ and $\mu^{*n}(F_n^c) = 0$; hence, $U(F^c) = \sum_n \mu^{*n}(F^c) = 0$. This means that $\operatorname{supp}(U) \subset F$ since $\operatorname{supp}(U)$ is the smallest closed set whose complement has zero $U$–measure. To check (b), let $x, y \in \operatorname{supp}(U)$ and let $W$ be a ball around 0. Let $V$ be another ball around 0 such that $V + V \subset W$. Then, for some $n, m \in \mathbb{N}$, $\mu^{*n}(x + V)\,\mu^{*m}(y + V) > 0$. Hence $\mu^{*(n+m)}(x + y + W) > 0$. This shows that $x + y \in \operatorname{supp}(U)$.
Example 15.2.6. Let $\mu$ be a complex Borel measure on $\mathbb{R}^n$. For $f \in L_1^{loc}(\mathbb{R}^n, \lambda_n)$, consider the measure $\nu_f(dx) := f(x)\,dx$. This defines a linear operator $T : f \mapsto \nu_f * \mu$. It is obvious that $\nu_f * \mu \ll \lambda_n$ and that
$\frac{d(\nu_f * \mu)}{d\lambda_n}(x) = \int f(x - y)\,\mu(dy)$.
We define $(\mu * f)(x) := \int f(x - y)\,\mu(dy)$ for any $f \in L_1^{loc}(\lambda_n)$. If $f \in L_1$, then by Fubini's theorem we have
$\|Tf\|_1 \le \|\mu\| \|f\|_1$
where $\|\mu\|$ is the total variation of $\mu$. If $f \in L_\infty(\mathbb{R}^n)$ then
$\|Tf\|_\infty \le \|\mu\| \|f\|_\infty$.
Both the Marcinkiewicz and the Riesz interpolation theorems show that $T$ is of strong–type $(p, p)$ for any $p$ such that $1 < p < \infty$. Furthermore, the Riesz interpolation theorem gives
$\|Tf\|_p \le \|\mu\| \|f\|_p$.

From Example 15.2.6 it follows that when $\nu_f = f\,d\lambda_n$ and $\nu_g = g\,d\lambda_n$ then, $\nu_f * \nu_g \ll \lambda_n$ and $\frac{d(\nu_f * \nu_g)}{d\lambda_n}(x) = \int_{\mathbb{R}^n} f(x - y)\,g(y)\,dy$. This leads to the following definition.

Definition 15.2.7. Suppose $f, g \in L_1^{loc}(\mathbb{R}^n, \lambda_n)$. The convolution $f * g$ of $f$ and $g$ is the function defined as
$(f * g)(x) = \int f(x - y) g(y)\,dy = \int f(y) g(x - y)\,dy$
for all $x \in \mathbb{R}^n$ for which $y \mapsto f(x - y) g(y) \in L_1$.

If $f$ and $g$ are measurable functions in $\mathbb{R}^n$ then $(x, y) \mapsto f(x - y) g(y)$ is measurable in $\mathbb{R}^{2n}$. Fubini's theorem and the translation and reflection invariance of Lebesgue's measure on $\mathbb{R}^n$ imply that $h = f * g$ is a well defined measurable function in $\mathbb{R}^n$ and that $f * g = g * f$.
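Lemma 15.2.8. Suppose $\{f, g\} \subset L_1^{loc}(\mathbb{R}^n, \lambda_n)$.

(i) If $f * g$ is defined, then $\operatorname{supp}(f * g) \subset \operatorname{supp}(f) + \operatorname{supp}(g)$.

(ii) If $f, g \in L_1(\mathbb{R}^n, \lambda_n)$, then $f * g \in L_1(\mathbb{R}^n, \lambda_n)$, $\|f * g\|_1 \le \|f\|_1 \|g\|_1$, and
(15.7) $\widehat{f * g}(t) = \hat f(t)\hat g(t)$.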

Then, |f (zε − y)g(y)| > 0 for some y ∈ supp(g). Hence, zε ∈ supp(f ) + supp(g).
(ii) follows by applying Theorem 15.2.2 to the measures $\nu_f = f\,d\lambda_n$ and $\nu_g = g\,d\lambda_n$, and by recalling that by definition, $\hat f(t) = \widehat{\nu_f}(-2\pi t)$ for all $f \in L_1(\mathbb{R}^n, \lambda_n)$. A more direct proof may be obtained by direct application of Fubini's theorem along with the translation invariance of Lebesgue's measure; for instance, $|(f * g)(x)| \le \int |f(x - y)||g(y)|\,dy$, and
$\|f * g\|_1 \le \int\!\!\int |f(x - y)||g(y)|\,dy\,dx = \int |g(y)| \int |f(x - y)|\,dx\,dy = \|f\|_1 \|g\|_1$.

Corollary 15.2.9. There is no g ∈ L1 (Rn , λn ) such that f ∗ g = f for all f ∈ L1 (Rn , λn ).

Proof. Suppose there is such $g$. Then $\hat g \hat f = \hat f$ for all $f \in L_1(\mathbb{R}^n, \lambda_n)$. Taking $f(x) = e^{-\pi |x|^2}$, whose Fourier transform has no zeros, gives $\hat g \equiv 1 \notin C_0(\mathbb{R}^n)$, which contradicts the Riemann–Lebesgue lemma.

Remark 15.2.10. The space $L_1(\lambda_n)$ with the addition operation and scalar product induced by pointwise evaluation, is a complex Banach space. Convolution makes $L_1(\lambda_n)$ a Banach ring, and Corollary 15.2.9 implies that $L_1(\lambda_n)$ is not an algebra with unit. The Radon–Nikodym theorem shows that the map from $L_1(\lambda_n)$ to the space of Borel complex measures $M(\mathbb{R}^n)$ given by $f \mapsto f \cdot \lambda_n$ is an isometry. Hence, by considering $L_1(\lambda_n)$ as a subspace of $M(\mathbb{R}^n)$, we have that $\operatorname{span}(L_1(\lambda_n) \cup \{\delta_0\})$ is a Banach algebra with unit $\delta_0$. Indeed, for any $f, g \in L_1(\mathbb{R}^n)$ and $a, b \in \mathbb{C}$,
$(f + a\delta_0) * (g + b\delta_0) = (f * g + ag + bf) + ab\delta_0$,
$\|f + a\delta_0\|_{TV} = \int_{\mathbb{R}^n} |f|\,d\lambda_n + |a|\delta_0(\mathbb{R}^n) = \|f\|_1 + |a|$.
Theorem 15.2.11. (Young) Let $1 \le r, p, q \le \infty$ satisfy $\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1$. If $f \in L_p(\mathbb{R}^n, \lambda_n)$ and $g \in L_q(\mathbb{R}^n, \lambda_n)$, then $f * g \in L_r(\lambda_n)$ and
$\|f * g\|_r \le \|f\|_p \|g\|_q$.

Proof. For any $s \ge 1$, let $s'$ be its conjugate, that is $\frac{1}{s} + \frac{1}{s'} = 1$. Since
$\frac{1}{r} + \frac{1}{q'} + \frac{1}{p'} = \frac{1}{r} + \Big(1 - \frac{1}{q}\Big) + \Big(1 - \frac{1}{p}\Big) = 1$
then
$\Big(1 - \frac{p}{r}\Big) q' = p\Big(\frac{1}{p} - \frac{1}{r}\Big) q' = p\Big(1 - \frac{1}{q}\Big) q' = p$,
$\Big(1 - \frac{q}{r}\Big) p' = q\Big(\frac{1}{q} - \frac{1}{r}\Big) p' = q\Big(1 - \frac{1}{p}\Big) p' = q$.
If $1 < r, p, q < \infty$, then by Hölder's inequality with exponents $r$, $q'$ and $p'$
$|(f * g)(x)| \le \int |f(y)|^{p/r} |g(x - y)|^{q/r}\, |f(y)|^{1 - p/r}\, |g(x - y)|^{1 - q/r}\,dy$
$\le \Big(\int |f(y)|^p |g(x - y)|^q\,dy\Big)^{1/r} \Big(\int |f(y)|^{(1 - p/r)q'}\,dy\Big)^{1/q'} \Big(\int |g(x - y)|^{(1 - q/r)p'}\,dy\Big)^{1/p'}$
$= \big((|f|^p * |g|^q)(x)\big)^{1/r}\, \|f\|_p^{p/q'}\, \|g\|_q^{q/p'}$.
By Lemma 15.2.8 we conclude that
$\int |f * g(x)|^r\,dx \le \int (|f|^p * |g|^q)(x)\,dx\; \|f\|_p^{pr/q'} \|g\|_q^{qr/p'} = \|f\|_p^p \|g\|_q^q\, \|f\|_p^{pr/q'} \|g\|_q^{qr/p'} = \|f\|_p^r \|g\|_q^r$.

If $r = \infty$ and $q = p'$, then a direct application of Hölder's inequality and the reflection and translation invariance properties of Lebesgue measure shows that
$|f * g(x)| \le \|f\|_p \|g\|_q$, $x \in \mathbb{R}^n$.
Hence $\|f * g\|_\infty \le \|f\|_p \|g\|_q$.
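
Young's inequality can be observed numerically on a grid. The sketch below works on the lattice $h\mathbb{Z} \subset \mathbb{R}$ with the measure that gives mass $h$ to each point, where the same inequality holds, so the check is exact for the discretized functions rather than an approximation argument. The grid, the two sample functions, the exponents, and the helper names are all assumptions made for this illustration.

```python
import math

def lp_norm(values, p, h):
    # L_p norm with respect to the measure giving mass h to each grid point.
    if math.isinf(p):
        return max(abs(v) for v in values)
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

def convolve(f, g, h):
    # Discrete convolution scaled by h (Riemann sum of the convolution integral).
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b * h
    return out

h = 0.05
grid = [k * h for k in range(-100, 101)]
f = [math.exp(-abs(x)) for x in grid]                  # sample f
g = [1.0 if abs(x) <= 1.0 else 0.0 for x in grid]      # sample g

p, q = 1.5, 1.2
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)   # 1/r = 1/p + 1/q - 1, here r = 2

lhs = lp_norm(convolve(f, g, h), r, h)
rhs = lp_norm(f, p, h) * lp_norm(g, q, h)
```

Here `lhs` plays the role of $\|f * g\|_r$ and `rhs` that of $\|f\|_p \|g\|_q$.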

An alternative proof of Theorem 15.2.11 can be obtained by Riesz interpolation.

Proof. Let $g \in L_1$ be fixed. The convolution operator $T_g : f \mapsto f * g$ maps $L_1 + L_\infty$ into $L_1 + L_\infty$ and
$\|T_g f\|_1 = \|f * g\|_1 \le \|f\|_1 \|g\|_1$, $f \in L_1$,
$\|T_g f\|_\infty = \|f * g\|_\infty \le \|f\|_\infty \|g\|_1$, $f \in L_\infty$.
By Riesz interpolation, for any $1 \le p \le \infty$, $T_g$ defines a bounded linear operator from $L_p$ into itself with $\|T_g f\|_p \le \|g\|_1 \|f\|_p$.
If we fix $f \in L_p$, the convolution operator $T_f : g \mapsto f * g$ maps $L_1 + L_{p'}$ into $L_p + L_\infty$ and $\|T_f g\|_p \le \|f\|_p \|g\|_1$, $\|T_f h\|_\infty \le \|f\|_p \|h\|_{p'}$ for all $g \in L_1$ and $h \in L_{p'}$, where $1/p + 1/p' = 1$. The last assertion follows from translation invariance and Hölder's inequality:
$\|T_f h\|_\infty = \|f * h\|_\infty \le \|f\|_p \|h\|_{p'}$.
By interpolation, $T_f$ maps $L_q$ into $L_r$ for all $r$ and $q$ such that $1/r = (1/p) + (1/q) - 1$, and $\|T_f g\|_r = \|f * g\|_r \le \|f\|_p \|g\|_q$ for all $g \in L_q$.

Theorem 15.1.20 can be used to obtain some regularity properties of convolution of

functions in conjugate integrable spaces.
Theorem 15.2.12. If 1/p + 1/q = 1, f ∈ Lp (Rn , λn ) and g ∈ Lq (Rn , λn ), then f ∗ g is
uniformly continuous. If 1 < p < ∞ then f ∗ g ∈ C0 (Rn ).

Proof. Without loss of generality, we may assume that $1 \le p < \infty$. By Hölder's inequality and translation invariance of Lebesgue measure we have
$|(f * g)(x + h) - (f * g)(x + k)| \le \int |f(x + h - y) - f(x + k - y)||g(y)|\,dy \le \|\tau_{-(k-h)} f - f\|_p \|g\|_q$.
Uniform continuity follows directly from Theorem 15.1.20.
To prove the last statement, let $\{f_k\} \cup \{g_k\} \subset C_{00}(\mathbb{R}^n)$ be such that $\lim_k \|f_k - f\|_p = 0 = \lim_k \|g_k - g\|_q$ and $\operatorname{supp}(f_k) \cup \operatorname{supp}(g_k) \subset B(0; a_k)$. Then, $f_k * g_k \in C_{00}(\mathbb{R}^n)$, $\operatorname{supp}(f_k * g_k) \subset B(0; 2a_k)$ and, by Hölder's inequality,
$\|f * g - f_k * g_k\|_u \le \|f - f_k\|_p \|g\|_q + \|f_k\|_p \|g - g_k\|_q$.
Since the right hand side tends to 0, $f * g$ is a uniform limit of functions in $C_{00}(\mathbb{R}^n)$. We conclude that $f * g \in C_0$ and hence, it is uniformly continuous.
Theorem 15.2.13. Let f ∈ L1 (Rn , λn ). If ϕ ∈ C k (Rn ) and ∂ α ϕ is bounded for all 0 ≤
|α| ≤ k, then f ∗ ϕ ∈ C k and ∂ α (f ∗ ϕ) = f ∗ (∂ α ϕ) = (∂ α ϕ) ∗ f .

Proof. Suppose $\sup_{|\alpha| \le k} \|\partial^\alpha \varphi\|_\infty = M$. By the mean value theorem
$|\varphi(x + he_j - y) - \varphi(x - y)| \le |\partial_{x_j} \varphi(x + \theta he_j - y)||h| \le M|h|$
for some $\theta \in (0, 1)$. Consequently, by dominated convergence
$\frac{1}{h}\big(f * \varphi(x + he_j) - f * \varphi(x)\big) = \int f(y)\, \frac{\varphi(x + he_j - y) - \varphi(x - y)}{h}\,dy$
converges to $\big((\partial_{x_j} \varphi) * f\big)(x)$ as $h \to 0$. Repeating the same argument proves the result for any other partial derivative of order $1 \le |\alpha| \le k$.

For the following result we will make use of Stokes' theorem from differential topology.

Lemma 15.2.14. Suppose $f$ and $g$ are functions in $C^1(\mathbb{R}^d)$ such that $g\,\partial_j f$ and $f\,\partial_j g$ are in $L_1(\mathbb{R}^d, \lambda_d)$. If $\lim_{|x| \to \infty} |x|^{d-1} f(x) g(x) = 0$ then
(15.8) $\int f\,\partial_j g = -\int g\,\partial_j f$.

Proof. Let $B_r$ denote the ball of radius $r$ in $\mathbb{R}^d$ centered at 0, $S_r = \partial B_r$, $\sigma_r(du)$ the surface measure on $S_r$, and $u(x) = x/\|x\|$ the outer unit normal vector to $S_r$ at $x$. By Stokes' theorem
$\int_{B_r} f\,\partial_j g = \int_{S_r} f g u_j\,d\sigma_r - \int_{B_r} g\,\partial_j f = r^{d-1} \int_{S_1} f(ru) g(ru) u_j\,\sigma_1(du) - \int_{B_r} g\,\partial_j f$.
The conclusion follows by letting $r \to \infty$ and applying dominated convergence.

Remark 15.2.15. In the setting of probability theory, if $X$ and $Y$ are independent random vectors in $\mathbb{R}^n$ defined on a common probability space, then the law of $X + Y$ is the convolution of the laws of $X$ and $Y$.

Example 15.2.16. Let $X$, $Y$ be a pair of independent random variables with uniform distribution over $(-c/2, c/2)$. The law of $Z = X + Y$ is called the tent distribution, and is given by $T_c = \mu_{c/2} * \mu_{c/2}$, where $\mu_{c/2}(dx) = \frac{1}{c} 1_{(-c/2, c/2)}(x)\,dx$. From
$c^2 \int f(u)\,T_c(du) = \int_{-c/2}^{c/2} \int_{-c/2}^{c/2} f(x + y)\,dx\,dy = \int_{-c/2}^{c/2} \int_{u - c/2}^{u + c/2} f(v)\,dv\,du$
$= \int_0^c \int_{v - c/2}^{c/2} f(v)\,du\,dv + \int_{-c}^0 \int_{-c/2}^{v + c/2} f(v)\,du\,dv = \int_{-c}^c f(v)(c - |v|)\,dv$,
we conclude that $T_c(dx) = \frac{1}{c}\Big(1 - \frac{|x|}{c}\Big)_+ dx$.
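
The tent density can be recovered numerically by convolving the two uniform densities. The sketch below approximates $\int u(x)\,u(z - x)\,dx$ with a midpoint rule, where $u$ is the density of the uniform distribution over $(-c/2, c/2)$; the value $c = 2$, the number of steps, and the function names are assumptions made for this check.

```python
def uniform_density(x, c):
    # Density of the uniform distribution over (-c/2, c/2).
    return 1.0 / c if -c / 2 < x < c / 2 else 0.0

def tent_density(z, c):
    # Closed form (1/c) * (1 - |z|/c)_+ of the tent distribution.
    return max(0.0, 1.0 - abs(z) / c) / c

def convolved_density(z, c, steps=40000):
    # Midpoint-rule approximation of the convolution of the two
    # uniform densities, evaluated at the point z.
    h = 2 * c / steps
    total = 0.0
    for k in range(steps):
        x = -c + (k + 0.5) * h
        total += uniform_density(x, c) * uniform_density(z - x, c)
    return total * h

c = 2.0
points = [-1.5, -0.5, 0.0, 0.7, 1.9, 2.5]
gaps = [abs(convolved_density(z, c) - tent_density(z, c)) for z in points]
```

Because the integrand is piecewise constant with jumps, the quadrature error is of the order of the step size, which is why the comparison uses a loose tolerance.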

15.2.1. Convolution of distributions and test functions. Using the notion of convolution of a locally integrable function with a complex measure introduced in Example 15.2.6, we show how to define convolution of distributions and test functions.
Suppose $u \in D^*(\mathbb{R}^n)$. Since the maps $\phi \mapsto \tau_x \phi$ and $\phi \mapsto \tilde\phi$, where $\tau_x \phi(y) = \phi(y - x)$ and $\tilde\phi(y) = \phi(-y)$, are continuous maps from $D(\mathbb{R}^n)$ into itself, it follows that
$\tau_x u(\phi) := u(\tau_{-x} \phi)$,
$u * \phi(x) := u(\tau_x \tilde\phi)$, $\phi \in D(\mathbb{R}^n)$,
are well defined for each $x \in \mathbb{R}^n$; in particular, $\tau_x u$ is a distribution. Recall that for any complex measure $\mu$, $u_\mu(\phi) = \int \phi\,d\mu$. For each $x \in \mathbb{R}^n$ define the measure $(\tau_x \mu)(A) := \mu(A - x)$. Then
$\tau_x u_\mu(\phi) = \int \tau_{-x} \phi\,d\mu = \int \phi(y + x)\,\mu(dy) = u_{\tau_x \mu}(\phi)$,
$(u_\mu * \phi)(x) = \int \tau_x \tilde\phi(y)\,\mu(dy) = \int \phi(x - y)\,\mu(dy) = \mu * \phi(x)$.

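Lemma 15.2.17. Let $\Omega\subset\mathbb{R}^n$ be an open set. For any $\phi\in\mathcal{D}(\Omega)$ and $1\le j\le n$,
\[
\frac{\tau_0-\tau_{he_j}}{h}\,\phi\xrightarrow{h\to0}\partial_{x_j}\phi\quad\text{in }\mathcal{D}(\Omega).
\]
Proof. The mean value theorem implies that every $\psi\in\mathcal{D}(\Omega)$ is Lipschitz. Another application of the mean value theorem shows that there is a constant $C=C(\phi,\alpha)$ such that
\[
\Bigl|\frac{\partial^\alpha\phi(x)-\partial^\alpha\phi(x-he_j)}{h}-\partial_{x_j}\partial^\alpha\phi(x)\Bigr|
=\bigl|\partial_{x_j}\partial^\alpha\phi(x-\theta he_j)-\partial_{x_j}\partial^\alpha\phi(x)\bigr|\le C|h|\xrightarrow{h\to0}0,
\]
where $\theta=\theta(h,x)\in(0,1)$.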
Theorem 15.2.18. Suppose u ∈ D∗ (Rn ), and let φ, ψ ∈ D(Rn ).
(i) For any x ∈ Rn , τx (u ∗ φ) = (τx u) ∗ φ = u ∗ (τx φ).
(ii) u ∗ φ ∈ C ∞ (Rn ), and
∂ α (u ∗ φ) = (Dα u) ∗ φ = u ∗ (∂ α φ)
for any α ∈ Zn+ .
(iii) u ∗ (φ ∗ ψ) = (u ∗ φ) ∗ ψ.

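Proof. (i) Notice that $\tau_y\tau_{-x}=\tau_{y-x}$ and $\widetilde{\tau_x\phi}(z)=\tau_x\phi(-z)=\phi(-z-x)=\widetilde{\phi}(z+x)=\tau_{-x}\widetilde{\phi}(z)$. Consequently, for any $y\in\mathbb{R}^n$ and $x\in\mathbb{R}^n$,
\[
\begin{aligned}
\bigl(\tau_x(u*\phi)\bigr)(y)&=u*\phi(y-x)=u(\tau_{y-x}\widetilde{\phi}),\\
\bigl((\tau_x u)*\phi\bigr)(y)&=(\tau_x u)(\tau_y\widetilde{\phi})=u(\tau_{y-x}\widetilde{\phi}),\\
\bigl(u*(\tau_x\phi)\bigr)(y)&=u(\tau_y\widetilde{\tau_x\phi})=u(\tau_{y-x}\widetilde{\phi}).
\end{aligned}
\]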

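(ii) For fixed $x\in\mathbb{R}^n$, $\tau_x\widetilde{\phi}(y)=\phi(x-y)$ and so,
\[
\partial^\alpha(\tau_x\widetilde{\phi})(y)=(-1)^{|\alpha|}(\partial^\alpha\phi)(x-y)=(-1)^{|\alpha|}\bigl(\tau_x\widetilde{\partial^\alpha\phi}\bigr)(y),
\]
that is,
\[
\tau_x\widetilde{\partial^\alpha\phi}=(-1)^{|\alpha|}\partial^\alpha(\tau_x\widetilde{\phi}). \tag{15.9}
\]
Applying $u$ on both sides of (15.9) gives
\[
\bigl(u*(\partial^\alpha\phi)\bigr)(x)=(-1)^{|\alpha|}u\bigl(\partial^\alpha(\tau_x\widetilde{\phi})\bigr)=(D^\alpha u)(\tau_x\widetilde{\phi})=\bigl((D^\alpha u)*\phi\bigr)(x).
\]
By part (i),
\[
\frac{(u*\phi)(x)-(u*\phi)(x-he_j)}{h}=\Bigl(u*\frac{\tau_0-\tau_{he_j}}{h}\phi\Bigr)(x).
\]
Set $\eta_h:=\frac{\tau_0-\tau_{he_j}}{h}\phi$. Then, by Lemma 15.2.17,
\[
\tau_x\widetilde{\eta_h}\xrightarrow{h\to0}\tau_x\widetilde{\partial_{x_j}\phi}\quad\text{in }\mathcal{D}(\mathbb{R}^n).
\]
Therefore, $\partial_{x_j}(u*\phi)=u*(\partial_{x_j}\phi)$. Iterating this argument shows that $u*\phi\in C^\infty(\mathbb{R}^n)$ and
\[
\partial^\alpha(u*\phi)=u*(\partial^\alpha\phi).
\]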

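(iii) Let $K_1=\operatorname{supp}(\psi)$. Notice that
\[
\widetilde{\phi*\psi}(z)=\int_{-K_1}\widetilde{\psi}(y)\,\widetilde{\phi}(z-y)\,dy=\int_{-K_1}\widetilde{\psi}(y)\,(\tau_y\widetilde{\phi})(z)\,dy.
\]
By Theorem 14.4.3 and Example 14.4.4, for any $x\in\mathbb{R}^n$
\[
\begin{aligned}
\bigl(u*(\phi*\psi)\bigr)(x)&=(\tau_{-x}u)\bigl(\widetilde{\phi*\psi}\bigr)=(\tau_{-x}u)\Bigl(\int_{-K_1}\widetilde{\psi}(y)\,\tau_y\widetilde{\phi}\,dy\Bigr)\\
&=\int_{-K_1}\widetilde{\psi}(y)\,(\tau_{-x}u)(\tau_y\widetilde{\phi})\,dy=\int_{\mathbb{R}^n}\psi(-y)\,(u*\phi)(x+y)\,dy\\
&=\int_{\mathbb{R}^n}\psi(x-y)\,(u*\phi)(y)\,dy=\bigl((u*\phi)*\psi\bigr)(x).
\end{aligned}
\]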

15.3. Approximation to the identity

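A family $\{K_\varepsilon:\varepsilon>0\}$ of kernels in $\mathbb{R}^n$ is called an approximation to the identity for a space $L$ of functions $f$ if $f*K_\varepsilon\to f$ in some sense.

Consider a collection $\{K_\varepsilon:\varepsilon>0\}\subset L_1(\mathbb{R}^n,\lambda_n)$ that satisfies the following properties:
(i) $\int_{\mathbb{R}^n}K_\varepsilon(x)\,dx=a$ for all $\varepsilon>0$.
(ii) $\sup_{\varepsilon>0}\|K_\varepsilon\|_1<\infty$.
(iii) $\int_{|x|>\delta}|K_\varepsilon(x)|\,dx\to0$ as $\varepsilon\to0$, for every $\delta>0$.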

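Theorem 15.3.1. Suppose $\{K_\varepsilon:\varepsilon>0\}\subset L_1(\mathbb{R}^n,\lambda_n)$ satisfy (i)–(iii) above. Then, for any $f\in L_p(\mathbb{R}^n,\lambda_n)$, $1\le p<\infty$,
\[
\lim_{\varepsilon\to0}\|f*K_\varepsilon-a f\|_p=0. \tag{15.10}
\]
If $f\in L_\infty(\mathbb{R}^n,\lambda_n)$ is continuous at a point $x$, then $\lim_{\varepsilon\to0}f*K_\varepsilon(x)=a f(x)$. If $f$ is bounded and uniformly continuous, then $f*K_\varepsilon$ converges to $a f$ uniformly as $\varepsilon\to0$.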

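Proof. Let $M=\sup_{\varepsilon>0}\|K_\varepsilon\|_1$. If $f\in L_p(\mathbb{R}^n,\lambda_n)$, $1\le p<\infty$, then by (i) and Minkowski's integral inequality we have that
\[
\begin{aligned}
\|f*K_\varepsilon-af\|_p&\le\Bigl(\int\Bigl(\int_{\mathbb{R}^n}|f(x-y)-f(x)|\,|K_\varepsilon(y)|\,dy\Bigr)^p dx\Bigr)^{1/p}\\
&\le\int_{\mathbb{R}^n}\Bigl(\int|f(x-y)-f(x)|^p\,dx\Bigr)^{1/p}|K_\varepsilon(y)|\,dy\\
&=\int_{\mathbb{R}^n}\|\tau_yf-f\|_p\,|K_\varepsilon(y)|\,dy.
\end{aligned}
\]
Theorem 15.1.20 along with assumption (ii) implies that for any $\eta>0$, there exists $\delta>0$ such that $M\|\tau_yf-f\|_p<\eta/2$ whenever $|y|\le\delta$. By assumption (iii), for some $\varepsilon'>0$ we have $2\|f\|_p\int_{|y|>\delta}|K_\varepsilon(y)|\,dy<\eta/2$ whenever $\varepsilon<\varepsilon'$. Combining these facts with $\|\tau_yf-f\|_p\le2\|f\|_p$, we obtain
\[
\|f*K_\varepsilon-af\|_p\le\int_{|y|\le\delta}\|\tau_yf-f\|_p\,|K_\varepsilon(y)|\,dy+\int_{|y|>\delta}\|\tau_yf-f\|_p\,|K_\varepsilon(y)|\,dy\le\frac{\eta}{2}+\frac{\eta}{2}
\]
whenever $0<\varepsilon<\varepsilon'$.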

The second statement follows similarly. Let $\eta>0$ be fixed. If $f$ is continuous at $x$, then for some $\delta>0$, $|x-u|\le\delta$ implies that $|f(x)-f(u)|<\frac{\eta}{2M}$. For such $\delta>0$, there is $\varepsilon'>0$ such that $2\|f\|_\infty\int_{|y|>\delta}|K_\varepsilon(y)|\,dy<\frac{\eta}{2}$ whenever $0<\varepsilon<\varepsilon'$. Putting these statements together gives
\[
\begin{aligned}
|f*K_\varepsilon(x)-af(x)|&\le\int|f(x-y)-f(x)|\,|K_\varepsilon(y)|\,dy\\
&\le\Bigl(\int_{|y|\le\delta}+\int_{|y|>\delta}\Bigr)|f(x-y)-f(x)|\,|K_\varepsilon(y)|\,dy\le\frac{\eta}{2}+\frac{\eta}{2}.
\end{aligned}\tag{15.11}
\]
If $f$ is bounded and uniformly continuous, then $\delta>0$ can be chosen so that
\[
\sup_{|v-u|<\delta}|f(v)-f(u)|<\frac{\eta}{2M}.
\]
Then (15.11) holds uniformly in $x$.

A family $\{K_\varepsilon:\varepsilon>0\}\subset L_1(\mathbb{R}^n,\lambda_n)$ satisfying properties (i)–(iii) is said to be a collection of good kernels. A common construction of good kernels is obtained by renormalization of integrable functions. For any $\phi\in L_1(\mathbb{R}^n,\lambda_n)$, define $\phi_\varepsilon(x)=\varepsilon^{-n}\phi(\varepsilon^{-1}x)$. It is a simple exercise to show that $\{\phi_\varepsilon:\varepsilon>0\}$ satisfies (i), with $a=\int\phi$, and (ii); as for (iii), for any $\delta>0$,
\[
\int_{\{|x|>\delta\}}\varepsilon^{-n}|\phi(\varepsilon^{-1}x)|\,dx=\int_{\{|u|>\delta/\varepsilon\}}|\phi(u)|\,du\xrightarrow{\varepsilon\to0}0.
\]

Example 15.3.2. (Mollification) Let $U\subset\mathbb{R}^n$ be an open set and let $f$ be a function that is locally integrable in $U$; that is, $f\in L_1(V)$ for any compact subset $V$ of $U$. For any $\varepsilon>0$, let $U_\varepsilon=\{x\in U:d(x,\partial U)>\varepsilon\}$. Let $\eta$ be a nonnegative function in $\mathcal{D}(\mathbb{R}^n)$ with support in $B(0;1)$ and $\int\eta(x)\,dx=1$, and define $\eta_\varepsilon(x)=\varepsilon^{-n}\eta(\varepsilon^{-1}x)$ for all $\varepsilon>0$. The mollification of $f$ by $\eta$ is defined as
\[
f^\varepsilon(x):=\int_U\eta_\varepsilon(x-y)f(y)\,dy=\int_{B(x;\varepsilon)}\eta_\varepsilon(x-y)f(y)\,dy,\qquad x\in U_\varepsilon.
\]

Lemma 15.3.3. Let $\eta\ge0$ be a mollifier with support in $B(0;1)$ and $\int\eta(x)\,dx=1$. Suppose $f\in L_1^{loc}(U)$ and let $f^\varepsilon$ be its mollification by $\eta$. Then,

(i) $\eta_\varepsilon*f\in C^\infty(U_\varepsilon)$, and $\partial^\alpha f^\varepsilon(x)=(\partial^\alpha\eta_\varepsilon)*f(x)$ for all $x\in U_\varepsilon$ and $\alpha\in\mathbb{Z}^n_+$.
(ii) $f^\varepsilon$ converges to $f$ a.s. in $U$ as $\varepsilon\to0$.
(iii) If $f\in C(U)$ then the convergence in (ii) is uniform in compact subsets of $U$.
(iv) Suppose $1\le p<\infty$ and $f\in L_p^{loc}(U)$. For any relatively compact set $V\subset\overline{V}\subset U$,
\[
\|f^\varepsilon-f\|_{L_p(V)}\xrightarrow{\varepsilon\to0}0.
\]

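Proof. (i) Fix $\varepsilon>0$ and let $x\in U_\varepsilon$. Then, there is $\delta>0$ such that
\[
B(x;\varepsilon)\subset B(x;\varepsilon+\delta)\subset\overline{B(x;\varepsilon+\delta)}=:V\subset U.
\]
It follows that $f^\varepsilon$ is well defined on $B(x;\delta/2)$, for if $|h|\le\delta/2$ then $B(x+h;\varepsilon)\subset V$ and
\[
\begin{aligned}
(\eta_\varepsilon*f)(x+h)&=\varepsilon^{-n}\int_{U\cap B(x+h;\varepsilon)}\eta\Bigl(\frac{x+h-y}{\varepsilon}\Bigr)f(y)\,dy\\
&=\varepsilon^{-n}\int_{B(x+h;\varepsilon)}\eta\Bigl(\frac{x+h-y}{\varepsilon}\Bigr)(\mathbb{1}_Vf)(y)\,dy\\
&=\varepsilon^{-n}\int_{B(0;\varepsilon)}\eta\Bigl(\frac{y}{\varepsilon}\Bigr)(\mathbb{1}_Vf)(x+h-y)\,dy=\bigl(\eta_\varepsilon*(\mathbb{1}_Vf)\bigr)(x+h).
\end{aligned}
\]
Conclusion (i) follows from Theorem 15.2.13.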


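(ii) For $x\in U$, there is $\varepsilon_0>0$ such that $x\in U_\varepsilon$ for all $0<\varepsilon\le\varepsilon_0$. Let $C=\|\eta\|_\infty\omega_n$. By Theorem 11.1.7
\[
\begin{aligned}
|f^\varepsilon(x)-f(x)|&=\Bigl|\int_{B(x;\varepsilon)}\eta_\varepsilon(x-y)\bigl(f(y)-f(x)\bigr)\,dy\Bigr|\\
&\le\frac{1}{\varepsilon^n}\int_{B(x;\varepsilon)}\eta\Bigl(\frac{x-y}{\varepsilon}\Bigr)|f(y)-f(x)|\,dy\\
&\le C\,\frac{1}{\lambda_n(B(x;\varepsilon))}\int_{B(x;\varepsilon)}|f(y)-f(x)|\,dy\xrightarrow{\varepsilon\to0}0
\end{aligned}\tag{15.12}
\]
whenever $x$ is a Lebesgue point of $f$. Hence, $f^\varepsilon\xrightarrow{\varepsilon\to0}f$ a.s. in $U$.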

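(iii) If $V$ is a relatively compact subset of $U$ with $\overline{V}\subset U$, then there is another relatively compact set $W$ with $\overline{V}\subset W\subset\overline{W}\subset U$. The function $f$ is uniformly continuous on $\overline{W}$ and so, the limit in (15.12) is uniform in $x\in V$.

(iv) Let $W$ be a relatively compact set such that $\overline{V}\subset W\subset\overline{W}\subset U$. For all $\varepsilon>0$ small enough, $\overline{W}\subset U_\varepsilon$. By assumption $\mathbb{1}_Wf\in L_p(\mathbb{R}^n)$, and for any $x\in V$ and $\varepsilon$ small enough,
\[
f^\varepsilon(x)=\int_{B(x;\varepsilon)}\eta_\varepsilon(x-y)f(y)\,dy=\int\eta_\varepsilon(x-y)\,\mathbb{1}_W(y)f(y)\,dy=\bigl(\eta_\varepsilon*(\mathbb{1}_Wf)\bigr)(x).
\]
As $\{\eta_\varepsilon:\varepsilon>0\}\subset L_1(\mathbb{R}^n,\lambda_n)$ is a family of good kernels,
\[
\|f^\varepsilon-f\|_{L_p(V)}\le\|\eta_\varepsilon*(\mathbb{1}_Wf)-\mathbb{1}_Wf\|_{L_p(\mathbb{R}^n)}\xrightarrow{\varepsilon\to0}0.
\]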

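Theorem 15.3.4. For any $1\le p<\infty$, $\mathcal{D}(\mathbb{R}^n)$ is dense in $L_p(\mathbb{R}^n,\lambda_n)$.

Proof. Fix $1\le p<\infty$ and let $f\in L_p(\mathbb{R}^n)$. Let $\eta\in\mathcal{D}(\mathbb{R}^n)$ be a mollifier such that $\int\eta(x)\,dx=1$, and define $\eta_\varepsilon(x)=\varepsilon^{-n}\eta(\varepsilon^{-1}x)$. Given $\delta>0$, there is $g\in C_{00}(\mathbb{R}^n)$ such that $\|f-g\|_p<\frac{\delta}{2}$. Since $\operatorname{supp}(\eta_\varepsilon*g)\subset B(0;\varepsilon)+\operatorname{supp}(g)$, $\{\eta_\varepsilon*g:\varepsilon>0\}\subset\mathcal{D}(\mathbb{R}^n)$ by Theorem 15.2.13. It follows from Theorem 15.3.1 that $\|\eta_\varepsilon*g-g\|_p\xrightarrow{\varepsilon\to0}0$. Hence, for all $\varepsilon>0$ small enough we have that $\|g*\eta_\varepsilon-g\|_p<\frac{\delta}{2}$, and
\[
\|f-g*\eta_\varepsilon\|_p\le\|f-g\|_p+\|g-g*\eta_\varepsilon\|_p<\delta.
\]
This shows that $\mathcal{D}(\mathbb{R}^n)$ is dense in $L_p(\mathbb{R}^n)$.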

The following two classical examples are very important and will be used in the analysis
of the invertibility of the Fourier transform.

Example 15.3.5. (Poisson kernel on $\mathbb{R}^n$) Consider
\[
P(x)=c_n\,\frac{1}{(1+|x|^2)^{(n+1)/2}}
\]
where $c_n=\Gamma\bigl(\frac{n+1}{2}\bigr)\pi^{-(n+1)/2}$. Integration in polar coordinates followed by the change of variable $r=\tan\theta$ gives
\[
\begin{aligned}
\int_{\mathbb{R}^n}\frac{1}{(1+|x|^2)^{(n+1)/2}}\,dx&=\sigma_{n-1}\int_0^\infty\frac{r^{n-1}}{(1+r^2)^{(n+1)/2}}\,dr=\sigma_{n-1}\int_0^{\pi/2}\sin^{n-1}\theta\,d\theta\\
&=\frac{1}{2}\sigma_n=\frac{\pi^{(n+1)/2}}{\Gamma[(n+1)/2]}.
\end{aligned}
\]
Observe that $P$ satisfies the condition in Theorem 15.3.9. Thus, the family of kernels $P_\varepsilon(x)=c_n\frac{\varepsilon}{(\varepsilon^2+|x|^2)^{(n+1)/2}}$ is an approximation to the identity.

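Example 15.3.6. In this example we show that the Poisson kernel $P_\varepsilon$ introduced in Example 15.3.5 is related to the function $\rho(x)=e^{-2\pi|x|}$ through the identity $\widehat{\rho}(y)=P_1(y)$. Using the inverse Fourier transform of the Cauchy distribution in $\mathbb{R}$ and applying Fubini's theorem we obtain that for $\beta>0$,
\[
\begin{aligned}
e^{-\beta}&=\int_0^\infty\frac{2\cos\beta x}{\pi(1+x^2)}\,dx=\int_0^\infty\frac{2\cos(\beta x)}{\pi}\int_0^\infty e^{-u(1+x^2)}\,du\,dx\\
&=\int_0^\infty\frac{2e^{-u}}{\pi}\int_0^\infty e^{-ux^2}\cos(\beta x)\,dx\,du\\
&=\int_0^\infty\frac{e^{-u}}{\pi}\int_{-\infty}^\infty e^{-ux^2}e^{-i\beta x}\,dx\,du=\int_0^\infty\frac{e^{-u}}{\sqrt{\pi u}}\,e^{-\frac{\beta^2}{4u}}\,du.
\end{aligned}
\]
Replacing $\beta$ with $2\pi\|x\|$ leads to
\[
\begin{aligned}
\int_{\mathbb{R}^n}e^{-2\pi ix\cdot y}e^{-2\pi\|x\|}\,dx&=\int_{\mathbb{R}^n}\Bigl(\int_0^\infty\frac{e^{-u}}{\sqrt{\pi u}}\,e^{-\frac{\pi^2\|x\|^2}{u}}\,du\Bigr)e^{-2\pi ix\cdot y}\,dx\\
&=\int_0^\infty\frac{e^{-u}}{\sqrt{\pi u}}\Bigl(\int_{\mathbb{R}^n}e^{-\frac{\pi^2\|x\|^2}{u}}e^{-2\pi ix\cdot y}\,dx\Bigr)du=\int_0^\infty\frac{e^{-u}}{\sqrt{\pi u}}\Bigl(\frac{u}{\pi}\Bigr)^{\frac{n}{2}}e^{-u\|y\|^2}\,du\\
&=\frac{1}{\pi^{\frac{n+1}{2}}}\int_0^\infty u^{\frac{n-1}{2}}e^{-u(1+\|y\|^2)}\,du=\frac{\Gamma[(n+1)/2]}{\pi^{\frac{n+1}{2}}}\,\frac{1}{(1+\|y\|^2)^{\frac{n+1}{2}}}.
\end{aligned}
\]
As a consequence $\rho_\varepsilon(x)=\varepsilon^{-n}e^{-2\pi\|x\|/\varepsilon}$ defines a good kernel.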

Example 15.3.7. (Gauss–Weierstrass) The function
\[
W(x)=\frac{1}{(2\pi)^{n/2}}\,e^{-|x|^2/2}
\]
satisfies the conditions of Theorem 15.3.9 and $\int_{\mathbb{R}^n}W(x)\,dx=1$. Hence, the collection of functions $W_\varepsilon(x)=\frac{1}{(2\pi\varepsilon^2)^{n/2}}\,e^{-|x|^2/2\varepsilon^2}$ is an approximation to the identity.

The Poisson and the Gaussian kernels given above are radial, that is, they come from
renormalization of integrable radial functions. A large class of good kernels {Kε : ε > 0}
found in applications can be dominated by a family of radial kernels, and for such kernels it is
possible to obtain a.s. convergence results.

Lemma 15.3.8. Let $\mu$ be either a complex measure or a $\sigma$–finite measure on $\mathcal{B}(\mathbb{R}^n)$. Suppose $\psi\in L_1(\mathbb{R}^n,\lambda_n)$ is a nonnegative decreasing radial function. Then, for any $x\in\mathbb{R}^n$
\[
|\psi*\mu(x)|\le M\mu(x)\,\|\psi\|_1.
\]
In particular, if $f\in L_1^{loc}(\mathbb{R}^n,\lambda_n)$ then,
\[
|\psi*f(x)|\le Mf(x)\,\|\psi\|_1.
\]

Proof. Fix $x\in\mathbb{R}^n$ and let $\mu_x$ be the measure given by $\mu_x(A)=\mu(A+x)$. If $E=\{(y,t)\in\mathbb{R}^n\times[0,\infty):\psi(y)>t\}$ then, by assumption on $\psi$, $E^t=\{y:\psi(y)>t\}$ is a ball around the origin. Fubini's theorem implies that
\[
\begin{aligned}
|\psi*\mu(x)|&\le\int_{\mathbb{R}^n}\psi(y)\,|\mu_x|(dy)=\int_{\mathbb{R}^n}\int_0^\infty\mathbb{1}_E\,dt\,d|\mu_x|\\
&=\int_0^\infty|\mu_x|(\psi>t)\,dt\le M\mu_x(0)\int_0^\infty\lambda(\psi>t)\,dt\\
&=M\mu(x)\,\|\psi\|_1.
\end{aligned}
\]

Theorem 15.3.9. Let $\{K_\varepsilon:\varepsilon>0\}$ be a family of good kernels in $\mathbb{R}^n$ such that $a=\int_{\mathbb{R}^n}K_\varepsilon(x)\,dx$. Suppose $\phi_0$ is a nonnegative, decreasing function in $[0,\infty)$ such that
\[
|K_\varepsilon(x)|\le\psi_\varepsilon(x):=\varepsilon^{-n}\psi(\varepsilon^{-1}x) \tag{15.13}
\]
where $\psi(\cdot)=\phi_0(\|\cdot\|)\in L_1(\mathbb{R}^n,\lambda_n)$. If $f\in L_p$, $1\le p<\infty$ then
(i) $\lim_{\varepsilon\to0}(f*K_\varepsilon)(x)=af(x)$ whenever $x$ is a Lebesgue point of $f$,
(ii) $\sup_{\varepsilon>0}|(f*K_\varepsilon)(x)|\le\|\psi\|_1Mf(x)$, where $Mf$ is Hardy's maximal function of $f$.

Proof. Let $T_rf(x)=\frac{1}{\lambda_n(B(x;r))}\int_{B(0;r)}|f(x-y)-f(x)|\,dy$. Since $f\in L_p$, then $Mf(x)<\infty$ and $\lim_{r\to0}T_rf(x)=0$ at every Lebesgue point $x$ of $f$. Let $x$ be such a point. (i) Since $\psi$ is a $\lambda_n$–integrable nonnegative radial function, it follows that
\[
\begin{aligned}
\int_{\mathbb{R}^n}\psi(x)\,dx&\ge\int_{r/2\le|x|\le r}\psi(x)\,dx\\
&=\int_{r/2}^r\int_{S^{n-1}}\phi_0(s)s^{n-1}\,\sigma_{n-1}(du)\,ds\ge\frac{2^n-1}{2^n}\,\omega_nr^n\phi_0(r).
\end{aligned}\tag{15.14}
\]

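For any $\varepsilon>0$,
\[
\begin{aligned}
|f*K_\varepsilon(x)-af(x)|&\le\int_{\mathbb{R}^n}|f(x-y)-f(x)|\,|K_\varepsilon(y)|\,dy\\
&\le\int_{\mathbb{R}^n}|f(x-y)-f(x)|\,\psi_\varepsilon(y)\,dy\\
&=\sum_{k\in\mathbb{Z}}\int_{2^k\varepsilon<|y|\le2^{k+1}\varepsilon}|f(x-y)-f(x)|\,\psi_\varepsilon(y)\,dy.
\end{aligned}\tag{15.15}
\]
Let $I_k$ denote the $k$–th term in the sum (15.15). Since $\psi$ is nonincreasing, (15.14) yields a constant $c_n$, depending only on $n$, such that
\[
\begin{aligned}
I_k&\le c_n\,\frac{1}{\omega_n(2^{k+1}\varepsilon)^n}\int_{2^k\varepsilon<|y|\le2^{k+1}\varepsilon}|f(x-y)-f(x)|\Bigl(\int_{\frac{|y|}{2\varepsilon}<|z|\le\frac{|y|}{\varepsilon}}\psi(z)\,dz\Bigr)dy\\
&\le c_n\,\frac{1}{\omega_n(2^{k+1}\varepsilon)^n}\int_{|y|\le2^{k+1}\varepsilon}|f(x-y)-f(x)|\,dy\int_{2^{k-1}<|z|\le2^{k+1}}\psi(z)\,dz\\
&\le c_n\,T_{2^{k+1}\varepsilon}f(x)\int_{2^{k-1}<|z|\le2^{k+1}}\psi(z)\,dz.
\end{aligned}
\]
Since $\sum_{k\in\mathbb{Z}}\int_{2^{k-1}<|z|\le2^{k+1}}\psi(z)\,dz\le2\|\psi\|_1<\infty$, for any $\varepsilon_1>0$ there is $K_0$ big enough so that
\[
\sum_{|k|>K_0}\int_{2^{k-1}<|z|\le2^{k+1}}\psi(z)\,dz<\varepsilon_1.
\]
Putting all these together gives
\[
\begin{aligned}
|f*K_\varepsilon(x)-af(x)|&\le\sum_{k\in\mathbb{Z}}I_k\le\sum_{|k|\le K_0}I_k+\sum_{|k|>K_0}I_k\\
&\le c_n\|\psi\|_1\sum_{|k|\le K_0}T_{2^{k+1}\varepsilon}f(x)+c_n\bigl(Mf(x)+|f(x)|\bigr)\varepsilon_1.
\end{aligned}
\]
Statement (i) follows by letting $\varepsilon\to0$ and then $\varepsilon_1\to0$.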

(ii) By Lemma 15.3.8 we have that

|f ∗ Kε (x)| ≤ |f | ∗ ψε (x) ≤ Mf (x)kψk1 .

In many situations, the radial function $\psi$ in Theorem 15.3.9 is of the form
\[
\psi(x)=A\Bigl(\mathbb{1}_{|x|\le1}+\frac{1}{|x|^{n+\alpha}}\mathbb{1}_{|x|>1}\Bigr)
\]
where $A$ and $\alpha$ are positive constants.


15.4. Fourier series

The results on approximations to the identity presented thus far can be used to study functions in the unit circle $S^1$. Any function on $S^1$ may be considered as a $2\pi$–periodic function in $\mathbb{R}$. This, and the obvious fact that $\int_{[-\pi,\pi]}f=\int_Qf$ for any $2\pi$–periodic function $f$ and interval $Q$ of length $2\pi$, will allow us to use the real variable methods developed in previous sections to study (periodic) integral operators of functions on $S^1$.
If $f\in L_1(S^1)$, then its Fourier series is formally defined as
\[
f(x)\sim\sum_{n\in\mathbb{Z}}c_ne^{inx} \tag{15.16}
\]
where $c_n=\frac{1}{2\pi}\int_0^{2\pi}f(y)e^{-iny}\,dy$. For functions in $L_2(S^1)$, the Fourier series (15.16) has a precise geometrical interpretation.

Theorem 15.4.1. Suppose $f\in L_2(S^1)$. For each $N\in\mathbb{N}$ define
\[
S_Nf(x)=\sum_{|n|\le N}c_ne^{inx},\qquad c_n=\langle f,e_n\rangle.
\]
Then, $\|f\|^2_{L_2(S^1)}=\sum_{n\in\mathbb{Z}}|c_n|^2$ and $\lim_{N\to\infty}\|f-S_Nf\|_2=0$.

Conversely, if $c=(c_n)\in\ell_2(\mathbb{Z})$, then $S_N(x)=\sum_{|n|\le N}c_ne^{inx}$ converges in $L_2(S^1)$ to a function $f\in L_2(S^1)$ whose $n$–th Fourier coefficient is $c_n$.

Proof. The space $L_2(S^1)$ is a Hilbert space with inner product $\langle f,g\rangle=\frac{1}{2\pi}\int_{-\pi}^\pi f(x)\overline{g(x)}\,dx$. The sequence $E=\{e_n:n\in\mathbb{Z}\}$, where $e_n(x)=e^{inx}$, is an orthonormal collection in $L_2(S^1)$ that separates points of $S^1$. On the other hand, $e_je_k=e_{j+k}$, $\overline{e_k}=e_{-k}$ and $e_0=1$. By the Stone–Weierstrass theorem the algebra $A$ generated by $E$ is dense in $C(S^1)$, and so $A$ is dense in $L_2(S^1)$. The first conclusion follows from Parseval's theorem.
Conversely, as $\|S_n-S_m\|^2_{L_2(S^1)}=\sum_{n<|j|\le m}|c_j|^2$ for all $n<m$, it follows that $S_n$ is Cauchy in $L_2(S^1)$. Hence, $S_n$ converges to some $f\in L_2(S^1)$ and $c_n=\langle f,e_n\rangle$ for all $n\in\mathbb{Z}$.

Example 15.4.2. (sawtooth function) Let $f$ be the $2\pi$–periodic piecewise smooth odd function defined as $f(0)=0$ and $f(x)=\frac{1}{2}(\pi-x)$ for all $0<x<2\pi$. Then $f\in L_2(S^1)$ and its Fourier series is given by
\[
f(x)\sim\frac{1}{2i}\sum_{|n|\ge1}\frac{e^{inx}}{n}=\sum_{n=1}^\infty\frac{\sin nx}{n}.
\]
Hence $S_nf$ converges to $f$ in $L_2(S^1)$. As a consequence of Parseval's theorem, we obtain
\[
\frac{\pi^2}{6}=2\|f\|^2_{L_2(S^1)}=\sum_{n=1}^\infty\frac{1}{n^2}.
\]
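As a sanity check of the Parseval identity in Example 15.4.2, the sketch below (illustrative only; the function names and the midpoint quadrature are assumptions, not taken from the notes) compares $2\|f\|^2_{L_2(S^1)}$, computed with the normalized measure $dx/2\pi$, against a partial sum of $\sum 1/n^2$.

```python
import math


def sawtooth(x):
    """f(x) = (pi - x) / 2 on the open interval (0, 2*pi)."""
    return 0.5 * (math.pi - x)


def norm_squared(steps=20000):
    """(1 / 2pi) * integral over (0, 2pi) of f^2, by the midpoint rule."""
    h = 2 * math.pi / steps
    total = sum(sawtooth((k + 0.5) * h) ** 2 for k in range(steps))
    return total * h / (2 * math.pi)


def partial_zeta2(terms):
    """Partial sum of 1/n^2 for n = 1..terms."""
    return sum(1.0 / n ** 2 for n in range(1, terms + 1))


if __name__ == "__main__":
    print(2 * norm_squared())      # close to pi^2 / 6
    print(partial_zeta2(100000))   # approaches pi^2 / 6 from below
    print(math.pi ** 2 / 6)
```

The partial sum converges slowly (the tail after $N$ terms is about $1/N$), whereas the quadrature of the smooth integrand is accurate to many digits.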

We will study the convergence of Fourier series by studying a particular kernel operator. For each $n\in\mathbb{Z}_+$ consider the sum
\[
D_n(x)=\sum_{|k|\le n}e^{ikx}=\frac{e^{-inx}-e^{i(n+1)x}}{1-e^{ix}}=\frac{\sin\bigl((n+\frac{1}{2})x\bigr)}{\sin(x/2)}.
\]
The $n$–th Dirichlet kernel in the unit circle is given by $\frac{1}{2\pi}\mathbb{1}_{[-\pi,\pi]}D_n$. If $f\in L_1(S^1)$ then, from the periodicity of $D_n$ and $f$, we have that
\[
\begin{aligned}
S_nf(x)&=\sum_{|k|\le n}c_ke^{ikx}=\frac{1}{2\pi}\int_{-\pi}^\pi D_n(x-y)f(y)\,dy\\
&=\frac{1}{2\pi}\int_{-\pi}^\pi D_n(y)f(x-y)\,dy=\frac{1}{2\pi}\int_{-\pi}^\pi D_n(y)f(x+y)\,dy.
\end{aligned}
\]
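Notice that $\widetilde{f}=f\mathbb{1}_{[-2\pi,2\pi]}\in L_1(\mathbb{R})$. For any $|x|\le\pi$ we have $[x-\pi,x+\pi]\subset[-2\pi,2\pi]$; thus,
\[
\begin{aligned}
\int_{\mathbb{R}}\widetilde{f}(x-y)e^{iky}\mathbb{1}_{[-\pi,\pi]}(y)\,dy&=\int_{-\pi}^\pi f(x-y)e^{iky}\,dy\\
&=e^{ikx}\int_{-\pi}^\pi f(y)e^{-iky}\,dy=e^{ikx}\int_{-\pi}^\pi\widetilde{f}(y)e^{-iky}\,dy
\end{aligned}\tag{15.17}
\]
for all $k\in\mathbb{Z}$. Hence, $S_nf(x)=\frac{1}{2\pi}\bigl(\widetilde{f}*(\mathbb{1}_{[-\pi,\pi]}D_n)\bigr)(x)$ for all $|x|\le\pi$ and $n\in\mathbb{Z}_+$.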
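Notice that for each $n\in\mathbb{N}$, $\frac{1}{2\pi}\int_{-\pi}^\pi D_n(x)\,dx=1$. As $|\sin(t)|\le|t|$,
\[
\begin{aligned}
\int_{-\pi}^\pi|D_n(y)|\,dy&\ge2\int_{-\pi}^\pi\frac{|\sin\bigl((n+\frac{1}{2})y\bigr)|}{|y|}\,dy=4\int_0^{(n+\frac{1}{2})\pi}\frac{|\sin t|}{|t|}\,dt\\
&\ge4\int_\pi^{n\pi}\frac{|\sin t|}{|t|}\,dt=4\sum_{k=1}^{n-1}\int_{k\pi}^{(k+1)\pi}\frac{|\sin t|}{|t|}\,dt\\
&\ge\frac{8}{\pi}\sum_{k=1}^{n-1}\frac{1}{k+1}\ge\frac{8}{\pi}\log\frac{n+1}{2}.
\end{aligned}
\]
From this, we conclude that $\sup_n\|D_n\|_1=\infty$, so the Dirichlet kernels do not constitute a family of good kernels. This in turn suggests that the convergence of Fourier series is intricate, and may even fail for continuous functions.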
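The logarithmic growth of $\int_{-\pi}^\pi|D_n|$ can be observed numerically. The sketch below is illustrative (the names `dirichlet` and `l1_norm` and the midpoint rule are assumptions of this example); it prints the $L_1$ norm over $[-\pi,\pi]$ next to the reference quantity $\frac{8}{\pi}\log n$.

```python
import math


def dirichlet(n, x):
    """D_n(x) = sin((n + 1/2) x) / sin(x / 2), with value 2n + 1 at x = 0."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return 2.0 * n + 1.0
    return math.sin((n + 0.5) * x) / s


def l1_norm(n, steps=20000):
    """Midpoint-rule approximation of the integral of |D_n| over [-pi, pi]."""
    h = 2 * math.pi / steps
    return h * sum(abs(dirichlet(n, -math.pi + (k + 0.5) * h)) for k in range(steps))


if __name__ == "__main__":
    for n in (1, 8, 64, 512):
        # Second column: computed norm; third column: (8 / pi) * log(n).
        print(n, l1_norm(n), 8 / math.pi * math.log(n))
```

For large `n` the number of quadrature points should be increased, since $D_n$ oscillates at frequency about $n$.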
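The following result is the analogue of Theorem 15.1.20 for integrable functions on $S^1$.
Lemma 15.4.3. Suppose $1\le p<\infty$ and $f\in L_p(S^1)$. The mapping $\tau:S^1\longrightarrow L_p(S^1)$ given by $h\mapsto f(\cdot-h)$ is uniformly continuous.
Proof. Since $\|f\|_{L_p(S^1)}=(2\pi)^{-1/p}\|\mathbb{1}_{[0,2\pi]}f\|_{L_p(\mathbb{R})}$ for all $f\in L_p(S^1)$, it suffices to estimate the $L_p(\mathbb{R})$–norm of $\mathbb{1}_{[0,2\pi]}(f-\tau_hf)$. With $\widetilde{f}=\mathbb{1}_{[0,2\pi]}f$, notice that
\[
\mathbb{1}_{[0,2\pi]}|f-\tau_hf|\le|\widetilde{f}-\tau_h\widetilde{f}|+|\mathbb{1}_{[0,2\pi]}-\mathbb{1}_{[h,2\pi+h]}|\,|\tau_hf|. \tag{15.18}
\]
By Theorem 15.1.20, $\|\widetilde{f}-\tau_h\widetilde{f}\|_{L_p(\mathbb{R})}\xrightarrow{h\to0}0$. For the second term on the right hand side of (15.18) we have, by periodicity,
\[
\int_{\mathbb{R}}\bigl|\mathbb{1}_{[0,2\pi]}(y)-\mathbb{1}_{[h,2\pi+h]}(y)\bigr|^p\,|f(y-h)|^p\,dy\le2\int_{\mathbb{R}}\mathbb{1}_{[-|h|,|h|]}(y)\,|f(y)|^p\,dy\xrightarrow{h\to0}0.
\]
Consequently, $\lim_{h\to0}\|f-\tau_hf\|_{L_p(S^1)}=0$.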
h→0
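Theorem 15.4.4. (Riemann–Lebesgue) If $f\in L_1(S^1)$ and $c_n(f):=\frac{1}{2\pi}\int_{-\pi}^\pi f(y)e^{-iny}\,dy$ then, $\lim_{|n|\to\infty}c_n(f)=0$.

Proof. Suppose $f\in L_1(S^1)$. Since
\[
\int_{-\pi}^\pi f(y)e^{-iny}\,dy=-\int_{-\pi}^\pi f(y)e^{-in(y-\frac{\pi}{n})}\,dy=-\int_{-\pi}^\pi f\Bigl(y+\frac{\pi}{n}\Bigr)e^{-iny}\,dy
\]
we have, after averaging the first and last expressions, that
\[
c_n(f)=\frac{1}{4\pi}\int_{-\pi}^\pi\Bigl(f(y)-f\Bigl(y+\frac{\pi}{n}\Bigr)\Bigr)e^{-iny}\,dy.
\]
Therefore, by Lemma 15.4.3, $|c_n(f)|\le\frac{1}{2}\|f-\tau_{-\frac{\pi}{n}}f\|_{L_1(S^1)}\xrightarrow{|n|\to\infty}0$.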

We will use the Riemann–Lebesgue theorem to address the problem of pointwise convergence of the Fourier partial sums $S_n$ of (15.16). From
\[
S_nf(x)=\frac{1}{2\pi}\int_{-\pi}^\pi D_n(y)f(x-y)\,dy=\frac{1}{2\pi}\int_{-\pi}^\pi D_n(y)f(x+y)\,dy
\]
and the identity $\frac{\sin((n+\frac12)y)}{\sin(y/2)}=\cos(ny)+\cot(y/2)\sin(ny)$ we obtain that
\[
\begin{aligned}
S_nf(x)-f(x)&=\frac{1}{2\pi}\int_{-\pi}^\pi\frac{f(x-y)+f(x+y)-2f(x)}{2}\,\frac{\sin\bigl((n+\frac{1}{2})y\bigr)}{\sin(y/2)}\,dy &&(15.19)\\
&=\frac{1}{2\pi}\int_{-\pi}^\pi\frac{f(x-y)+f(x+y)-2f(x)}{2}\cos(ny)\,dy &&(15.20)\\
&\quad+\frac{1}{2\pi}\int_{-\pi}^\pi\frac{f(x-y)+f(x+y)-2f(x)}{2}\cot(y/2)\sin(ny)\,dy. &&(15.21)
\end{aligned}
\]
The term (15.20) goes to zero as $n\to\infty$ by the Riemann–Lebesgue theorem. Convergence of the second term (15.21) provides a criterion for the convergence of $S_nf$.
Theorem 15.4.5. (Dini's test) Suppose $f\in L_1(S^1)$. If
\[
\int_0^\pi|f(x-y)+f(x+y)-2f(x)|\cot(y/2)\,dy<\infty \tag{15.22}
\]
at a point $x\in[-\pi,\pi]$ then, $\lim_nS_nf(x)=f(x)$.

Proof. The conclusion follows from identity (15.19) and the Riemann–Lebesgue theorem applied to the odd function
\[
g_x(y)=\bigl(f(x-y)+f(x+y)-2f(x)\bigr)\cot(y/2),
\]
which is integrable on $[-\pi,\pi]$ by (15.22).
Since $\frac{2}{\pi}\le\frac{\sin t}{t}\le1$ for $|t|\le\frac{\pi}{2}$, condition (15.22) holds whenever
\[
\int_0^\pi\frac{|f(x-y)+f(x+y)-2f(x)|}{y}\,dy<\infty. \tag{15.23}
\]
Suppose $f$ has a jump discontinuity at $x$. Modifying the value of $f$ on sets of measure zero does not change the value of the Fourier coefficients so, we set $f(x)=\frac{f(x-)+f(x+)}{2}$. Hence, if
\[
\int_0^\pi\frac{|f(x-y)-f(x-)|}{y}\,dy<\infty\quad\text{and}\quad\int_0^\pi\frac{|f(x+y)-f(x+)|}{y}\,dy<\infty, \tag{15.24}
\]
then $\lim_{n\to\infty}S_nf(x)=\frac{f(x-)+f(x+)}{2}$. We have the following result.
Corollary 15.4.6. If $f$ is piecewise differentiable on $S^1$ then,
\[
\lim_{n\to\infty}S_nf(x)=\frac{f(x-)+f(x+)}{2}
\]
for all $x\in[-\pi,\pi]$.

Proof. Being piecewise continuous, $f$ is integrable. As the number of discontinuities of $f$ is finite, $f$ may be modified so that $f(x)=\frac{f(x-)+f(x+)}{2}$ for all $x\in[-\pi,\pi]$. By assumption, the number of discontinuities of $f'$ is also finite, the limits $f'_+(x)=\lim_{h\to0+}\frac{f(x+h)-f(x+)}{h}$ and $f'_-(x)=\lim_{h\to0+}\frac{f(x-h)-f(x-)}{-h}$ exist for all $x\in[-\pi,\pi]$, and $f'_-(x)=f'_+(x)$ for all $x$ with the exception of a finite set of points. Consequently, the integrability conditions (15.24) hold and the conclusion follows.
Example 15.4.7. Let $f$ be the sawtooth function defined in Example 15.4.2. Being a $2\pi$–periodic piecewise smooth function, we have that
\[
f(x)=\frac{1}{2i}\sum_{|n|\ge1}\frac{e^{inx}}{n}=\sum_{n=1}^\infty\frac{\sin nx}{n}
\]
for all $x$. It is easy to check that $S_nf(x)=\sum_{k=1}^n\frac{\sin kx}{k}=\frac{1}{2}\int_0^x(D_n(t)-1)\,dt$. We now compute the maximum of $S_nf$ closest to the discontinuity point $0$. This can be seen to be achieved at the smallest positive solution of the equation $(S_nf)'(x)=\frac{1}{2}(D_n(x)-1)=0$, namely $x=\frac{\pi}{n+1}$. Since $\sin(t)\cong t$ as $t\to0$,
\[
\begin{aligned}
S_nf\Bigl(\frac{\pi}{n+1}\Bigr)&=\frac{1}{2}\int_0^{\frac{\pi}{n+1}}\Bigl(\frac{\sin\bigl((n+\frac{1}{2})t\bigr)}{\sin(\frac{1}{2}t)}-1\Bigr)dt\cong\int_0^{\frac{\pi}{n+1}}\frac{\sin\bigl((n+\frac{1}{2})t\bigr)}{t}\,dt-\frac{\pi}{2(n+1)}\\
&=\int_0^{\frac{2n+1}{2(n+1)}\pi}\frac{\sin t}{t}\,dt-\frac{\pi}{2(n+1)}\xrightarrow{n\to\infty}\int_0^\pi\frac{\sin t}{t}\,dt.
\end{aligned}
\]
Notice that $f(0+)=\frac{\pi}{2}=-f(0-)$. So, $S_nf$ tends to overshoot the graph of $f$ at the point of discontinuity $x=0$ by $\int_0^\pi\frac{\sin t}{t}\,dt-\frac{\pi}{2}\approx0.28114$, which is about $8.95\%$ of the length of the jump $\triangle f(0)=\pi$. The overshoot shown in this example is known as the Gibbs phenomenon. It is caused by the distribution of mass of the Dirichlet kernel, which causes ripples to form along the graph of $f$ around the point of discontinuity evenly in both directions. The overshoot is about 9% of the length of the jump in both directions.
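The size of the Gibbs overshoot can be reproduced numerically. The following sketch is illustrative only (the helper names and the use of Simpson's rule are assumptions of this example): it evaluates $\int_0^\pi\frac{\sin t}{t}\,dt-\frac{\pi}{2}$ and compares the limit with the partial sum $S_nf$ at $x=\pi/(n+1)$.

```python
import math


def sinc(t):
    """sin(t) / t with the convention sinc(0) = 1."""
    return 1.0 if t == 0.0 else math.sin(t) / t


def simpson(g, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def sine_integral_pi():
    """Integral of sin(t)/t over [0, pi]."""
    return simpson(sinc, 0.0, math.pi)


def gibbs_overshoot():
    """Overshoot of S_n f above f(0+) = pi/2 in the limit n -> infinity."""
    return sine_integral_pi() - math.pi / 2


def partial_sum(n, x):
    """S_n f(x) = sum_{k=1}^n sin(k x) / k for the sawtooth function."""
    return sum(math.sin(k * x) / k for k in range(1, n + 1))


if __name__ == "__main__":
    print(gibbs_overshoot(), gibbs_overshoot() / math.pi)
    n = 2000
    print(partial_sum(n, math.pi / (n + 1)), sine_integral_pi())
```

The ratio printed in the first line is the overshoot relative to the jump $\pi$, roughly 9%.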

Although the Dirichlet kernel is not within the class of good kernels, its Cesàro and Abel sums are well behaved. For each $n\in\mathbb{N}$, consider the averages
\[
K_n(y)=\frac{1}{n}\sum_{k=0}^{n-1}D_k(y)=\frac{1}{n}\,\frac{1-\cos(ny)}{1-\cos(y)}=\frac{\sin^2(ny/2)}{n\sin^2(y/2)}
\]
with the convention that $\sin(t)/t=1$ at $t=0$. The $n$–th Fejér kernel in the unit circle is defined as $\frac{1}{2\pi}\mathbb{1}_{[-\pi,\pi]}K_n$. The Cesàro sum of the Fourier series $f\sim\sum_{n\in\mathbb{Z}}c_n(f)e^{inx}$ of a function $f\in L_1(S^1)$ is given by
\[
\sigma_nf(x):=\frac{1}{n}\sum_{k=0}^{n-1}S_kf(x)=\frac{1}{2\pi}\int_{-\pi}^\pi f(y)K_n(x-y)\,dy.
\]
From (15.17) we have that $\sigma_nf(x)=\frac{1}{2\pi}\bigl(\widetilde{f}*(\mathbb{1}_{[-\pi,\pi]}K_n)\bigr)(x)$ for all $|x|\le\pi$, where $\widetilde{f}=f\mathbb{1}_{[-2\pi,2\pi]}$.
Theorem 15.4.8. Suppose $f\in L_p(S^1)$ where $1\le p<\infty$. Then, $\sigma_nf$ converges to $f$ as $n\to\infty$ in $L_p(S^1)$ and pointwise at every Lebesgue point of $f$. In particular, if $f\in L_p(S^1)$ is such that $c_n(f)=0$ for all $n$, then $f\equiv0$ a.s.

Proof. We claim that the Fejér kernels form a family of good kernels. First, for each $n\in\mathbb{N}$, $K_n\ge0$, and
\[
\frac{1}{2\pi}\int_{-\pi}^\pi K_n(y)\,dy=1.
\]
If $0<\delta<|y|\le\pi$ then $K_n(y)\le\frac{1}{n\sin^2(\delta/2)}$; hence,
\[
\lim_{n\to\infty}\frac{1}{2\pi}\int_{\delta<|y|\le\pi}K_n(y)\,dy\le\lim_{n\to\infty}\frac{2(\pi-\delta)}{2\pi n\sin^2(\delta/2)}=0.
\]
This proves the claim and so, for any $f\in L_p(S^1)$
\[
\|\sigma_nf-f\|_{L_p(S^1)}\le\Bigl\|\tfrac{1}{2\pi}(\mathbb{1}_{[-\pi,\pi]}K_n)*\widetilde{f}-\widetilde{f}\Bigr\|_p\xrightarrow{n\to\infty}0.
\]
We now prove a.s. pointwise convergence. For any $|x|\le\pi$,
\[
\frac{1}{2\pi}K_n(x)=\frac{n}{2\pi}\,\frac{x^2/4}{\sin^2(x/2)}\,\frac{\sin^2(nx/2)}{n^2(x^2/4)}\le\frac{n\pi}{8},
\]
and
\[
\frac{1}{2\pi}K_n(x)\le\frac{1}{2n\pi}\,\frac{x^2/4}{\sin^2(x/2)}\,\frac{4}{x^2}\le\frac{\pi}{2n}\,\frac{1}{x^2}.
\]
Define
\[
\psi(x)=\frac{\pi}{8}\mathbb{1}_{|x|\le1}+\frac{\pi}{2}\,\frac{1}{x^2}\mathbb{1}_{|x|>1}.
\]
Then, $\psi$ is a nonnegative integrable radial function for which
\[
\frac{1}{2\pi}K_n(x)\mathbb{1}_{[-\pi,\pi]}(x)\le\psi_{1/n}(x):=n\psi(nx).
\]
Pointwise a.s. convergence is a consequence of Theorem 15.3.9. The last statement follows immediately.
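The two properties of the Fejér kernel used in the proof, the closed form $K_n(y)=\sin^2(ny/2)/(n\sin^2(y/2))$ and the normalization $\frac{1}{2\pi}\int_{-\pi}^\pi K_n=1$, can be checked with a short script. This is a sketch under the stated conventions; the function names are hypothetical.

```python
import math


def dirichlet(k, y):
    """D_k(y) = sum over |j| <= k of exp(i j y), written with cosines."""
    return 1.0 + 2.0 * sum(math.cos(j * y) for j in range(1, k + 1))


def fejer_by_average(n, y):
    """K_n(y) as the average of D_0, ..., D_{n-1}."""
    return sum(dirichlet(k, y) for k in range(n)) / n


def fejer_closed_form(n, y):
    """K_n(y) = sin^2(n y / 2) / (n sin^2(y / 2)), with value n at y = 0."""
    s = math.sin(y / 2)
    if abs(s) < 1e-12:
        return float(n)
    return math.sin(n * y / 2) ** 2 / (n * s * s)


def mean_value(n, steps=2000):
    """(1 / 2pi) times the integral of K_n over [-pi, pi], midpoint rule."""
    h = 2 * math.pi / steps
    total = sum(fejer_closed_form(n, -math.pi + (k + 0.5) * h) for k in range(steps))
    return total * h / (2 * math.pi)


if __name__ == "__main__":
    for y in (0.3, 1.0, 2.5):
        print(y, fejer_by_average(7, y), fejer_closed_form(7, y))
    print(mean_value(7))
```

Unlike the Dirichlet kernel, the values printed are nonnegative, which is what makes the uniform $L_1$ bound in property (ii) of good kernels immediate.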
The Abel sum of the Fourier series $f\sim\sum_{n\in\mathbb{Z}}c_n(f)e^{inx}$ is given by
\[
A_rf(x)=\sum_{n\in\mathbb{Z}}r^{|n|}c_n(f)e^{inx},\qquad0<r<1.
\]
As with Cesàro sums, Abel sums can be expressed in terms of a family of convolution operators. For each $0<r<1$ consider the function
\[
P_r(x)=\frac{1-r^2}{1-2r\cos(x)+r^2}.
\]
It is easy to check that
\[
P_r(x)=\sum_{n\in\mathbb{Z}}r^{|n|}e^{ixn}=\operatorname{Re}\Bigl(\frac{1+z}{1-z}\Bigr)=\frac{1-|z|^2}{|1-z|^2},
\]
where $z=re^{ix}$ and $|x|\le\pi$. The $r$–th Poisson kernel in the unit disc is defined as $\frac{1}{2\pi}P_r\mathbb{1}_{[-\pi,\pi]}$. It is easy to see that for any $f\in L_1(S^1)$,
\[
\sum_{n\in\mathbb{Z}}c_nr^{|n|}e^{inx}=\frac{1}{2\pi}\int_{-\pi}^\pi f(y)P_r(x-y)\,dy=\frac{1}{2\pi}\int_{-\pi}^\pi P_r(y)f(x-y)\,dy.
\]
Hence, $A_rf(x)=\frac{1}{2\pi}\bigl(\widetilde{f}*(\mathbb{1}_{[-\pi,\pi]}P_r)\bigr)(x)$ for all $|x|\le\pi$ and $0<r<1$, where $\widetilde{f}=\mathbb{1}_{[-2\pi,2\pi]}f$.

Theorem 15.4.9. Suppose $f\in L_p(S^1)$ where $1\le p<\infty$. Then, $A_rf$ converges to $f$ as $r\nearrow1$ in $L_p(S^1)$ and pointwise at every Lebesgue point of $f$. In particular, if $f\in L_p(S^1)$ is such that $c_n(f)=0$ for all $n$, then $f\equiv0$ a.s.

Proof. We claim that $\bigl\{\frac{1}{2\pi}P_r\mathbb{1}_{[-\pi,\pi]}:0<r<1\bigr\}$ is a family of good kernels as $r\to1-$. Indeed, since $|1-|z||\le|1-z|$,
\[
0\le\frac{1}{2\pi}P_r(x)\le\frac{2}{1-r},\qquad0<r<1.
\]
For all $0<r<1$
\[
\frac{1}{2\pi}\int_{-\pi}^\pi P_r(x)\,dx=1.
\]
For $0<\eta\le|x|\le\pi$, $\cos(x)\le\cos(\eta)$ and so,
\[
P_r(x)\le\frac{1-r^2}{1-2r\cos(\eta)+r^2}.
\]
Consequently
\[
\lim_{r\to1-}\int_{\eta<|x|\le\pi}P_r(x)\,dx=0.
\]
This proves the claim. Hence, for any $f\in L_p(S^1)$, $1\le p<\infty$,
\[
\|A_rf-f\|_{L_p(S^1)}\le\Bigl\|\tfrac{1}{2\pi}(\mathbb{1}_{[-\pi,\pi]}P_r)*\widetilde{f}-\widetilde{f}\Bigr\|_p\xrightarrow{r\to1-}0.
\]
We now show that $A_rf\xrightarrow{r\to1-}f$ pointwise a.s. in $S^1$. Fix $|x|\le\pi/2$. The function $g(r)=1-2r\cos(x)+r^2$ attains its minimum value within $[0,1]$ at $r=\cos(x)$, where $g(\cos x)=\sin^2(x)$. Thus,
\[
P_r(x)\le2(1-r)\frac{1}{\sin^2(x)}.
\]
Hence, for $|x|\le\pi/2$
\[
P_r(x)\le\frac{\pi^2}{2}(1-r)\frac{1}{x^2}\le2\pi^2(1-r)\frac{1}{x^2},
\]
while for $\pi/2\le|x|\le\pi$, $1-2r\cos(x)+r^2\ge1$, and so
\[
P_r(x)\le2(1-r)\le2\pi^2(1-r)\frac{1}{x^2}.
\]
Define
\[
\psi(x)=2\,\mathbb{1}_{|x|\le1}+\pi\,\frac{1}{x^2}\mathbb{1}_{|x|>1}.
\]
Then, $\psi$ is a nonnegative integrable radial function for which
\[
\frac{1}{2\pi}P_r(x)\mathbb{1}_{[-\pi,\pi]}(x)\le\psi_{1-r}(x):=(1-r)^{-1}\psi\bigl((1-r)^{-1}x\bigr).
\]
Pointwise a.s. convergence follows from Theorem 15.3.9. The last statement follows immediately.

Example 15.4.10. If $\log$ is the principal branch of the logarithmic function, we have that $-\log(1-z)=\sum_{n=1}^\infty\frac{z^n}{n}$ for all $|z|<1$. If $z=re^{i\theta}$ with $0<r<1$ then, the Abel sum of the sawtooth function $f(\theta)=\frac{1}{2i}\sum_{|n|\ge1}\frac{e^{in\theta}}{n}=\sum_{n=1}^\infty\frac{\sin(n\theta)}{n}$ is given by
\[
\begin{aligned}
A_rf(\theta)&=\frac{1}{2i}\sum_{|n|\ge1}\frac{r^{|n|}e^{in\theta}}{n}=\frac{1}{2i}\sum_{n=1}^\infty\frac{r^n}{n}\bigl(e^{in\theta}-e^{-in\theta}\bigr)=\sum_{n=1}^\infty\frac{r^n\sin(n\theta)}{n}\\
&=-\frac{1}{2i}\bigl(\log(1-re^{i\theta})-\log(1-re^{-i\theta})\bigr)=\operatorname{Im}\bigl(-\log(1-re^{i\theta})\bigr)\\
&=-\arg(1-re^{i\theta}).
\end{aligned}
\]
It follows that $f(\theta)=\lim_{r\to1-}A_rf(\theta)$ for all $\theta$. For $0<\theta<2\pi$, we obtain another expression for $f$, namely $f(\theta)=\lim_{r\to1-}A_rf(\theta)=-\arg(1-e^{i\theta})$. Let us now consider
\[
\begin{aligned}
-\log(1-re^{i\theta})&=\sum_{n=1}^\infty\frac{r^n\cos(n\theta)}{n}+i\sum_{n=1}^\infty\frac{r^n\sin(n\theta)}{n}\\
&=-\log(|1-re^{i\theta}|)-i\arg(1-re^{i\theta}).
\end{aligned}\tag{15.25}
\]
The imaginary part of the right hand side of (15.25) converges to $f(\theta)$ for every $\theta$. For $0<\theta<2\pi$, the real part of the right hand side of (15.25) converges to the $2\pi$–periodic even function $g(\theta):=-\log(|1-e^{i\theta}|)=-\log\bigl(2|\sin(\theta/2)|\bigr)$. Notice that $g$ is unbounded and that $\lim_{\theta\to0}g(\theta)=\infty=\lim_{\theta\to2\pi}g(\theta)$. Since $\sin(t)\cong t$ as $t\to0$ and $\lim_{t\to0+}t^\alpha\log(t)=0$ for any $\alpha>0$, we have that $g\in L_p(S^1)$ for all $p\ge1$. Since $\theta\mapsto\sum_{n=1}^\infty\frac{\cos(n\theta)}{n}$ is a square integrable function on $S^1$, it follows that $\log\bigl(2|\sin(\theta/2)|\bigr)=-\sum_{n=1}^\infty\frac{\cos(n\theta)}{n}$ a.s.
Remark 15.4.11. The statement of Theorem 15.4.1 holds also for $L_2(\mathbb{T}^n)$, $n\ge1$. The collection $E=\{E_k:k\in\mathbb{Z}^n\}\subset L_2(\mathbb{T}^n)$ given by $E_k(x)=e^{2\pi ik\cdot x}$ separates points of $\mathbb{T}^n$. Since $E_kE_j=E_{k+j}$, $E_0\equiv1$ and $\overline{E_k}=E_{-k}$, the linear span $A$ generated by $E$ is a dense algebra in $C(\mathbb{T}^n)$. Since $C(\mathbb{T}^n)$ is dense in $L_p(\mathbb{T}^n)$ for all $1\le p<\infty$, we conclude that $A$ is also dense in $L_p(\mathbb{T}^n)$. It is easy to check that $\int_{[0,1)^n}E_k(x)\overline{E_j(x)}\,dx=\delta_{jk}$. Therefore, for any $f\in L_2(\mathbb{T}^n)$, $\sum_{k\in\mathbb{Z}^n}\langle f|E_k\rangle E_k$ converges to $f$ in $L_2(\mathbb{T}^n)$.

The following example makes a connection between Fourier series and Fourier integrals.
Example 15.4.12. (Poisson summation formula) If $f\in L_1(\mathbb{R}^n)$ then the series $Pf:x\mapsto\sum_{k\in\mathbb{Z}^n}f(x+k)$ converges absolutely a.s. Indeed, set $Q=\bigl[-\frac{1}{2},\frac{1}{2}\bigr)^n$. Then $\mathbb{R}^n=\bigcup_{k\in\mathbb{Z}^n}(Q+k)$. From
\[
\int_Q\Bigl|\sum_{k\in\mathbb{Z}^n}f(x+k)\Bigr|\,dx\le\int_Q\sum_{k\in\mathbb{Z}^n}|f(x+k)|\,dx=\sum_{k\in\mathbb{Z}^n}\int_{Q+k}|f(x)|\,dx=\|f\|_1
\]
we conclude that $\sum_{k\in\mathbb{Z}^n}f(x+k)$ converges absolutely a.s. and in $L_1(Q)$ to some function $Pf\in L_1(Q)$. We can extend $Pf$ periodically to almost all $\mathbb{R}^n$ by noticing that $Pf(x+\ell)=Pf(x)$ for all $\ell\in\mathbb{Z}^n$ and for all $x\in Q$ where $Pf$ converges. Thus $Pf$ can be considered as a function on $\mathbb{T}^n$. Moreover, by applying Fubini's theorem we obtain that the $\ell$–th Fourier coefficient of $Pf$ is given by
\[
\begin{aligned}
\int_Q\Bigl(\sum_{k\in\mathbb{Z}^n}f(x+k)\Bigr)e^{-2\pi ix\cdot\ell}\,dx&=\sum_{k\in\mathbb{Z}^n}\int_Qf(x+k)e^{-2\pi ix\cdot\ell}\,dx\\
&=\sum_{k\in\mathbb{Z}^n}\int_{Q+k}f(x)e^{-2\pi i(x-k)\cdot\ell}\,dx=\sum_{k\in\mathbb{Z}^n}\int_{Q+k}f(x)e^{-2\pi ix\cdot\ell}\,dx\\
&=\int_{\mathbb{R}^n}f(x)e^{-2\pi ix\cdot\ell}\,dx=\widehat{f}(\ell).
\end{aligned}
\]
Suppose there is a nonincreasing function $\phi_0$ in $[0,\infty)$ such that $|f(x)|\le\phi_0(\|x\|)$ and that $\phi_0\circ\|\cdot\|\in L_1(\mathbb{R}^n)$. For any $x\in Q$ and $k\in\mathbb{Z}^n$, $\|k+x\|\ge c\|k\|$ with $c=\frac{1}{2\sqrt{n}}$, and so $|f(x+k)|\le\phi_0(\|x+k\|)\le\phi_0(c\|k\|)$ which implies that
\[
\sum_{k\in\mathbb{Z}^n}|f(x+k)|\le\sum_{k\in\mathbb{Z}^n}\phi_0(c\|k\|)\le C\int_{\mathbb{R}^n}\phi_0(\|x\|)\,dx<\infty.
\]
This means that $Pf(x)=\sum_{k\in\mathbb{Z}^n}f(x+k)$ converges absolutely and uniformly on $Q$. So $Pf\in L_\infty(\mathbb{T}^n)\subset L_2(\mathbb{T}^n)$. Consequently
\[
\sum_{k\in\mathbb{Z}^n}f(x+k)=\sum_{k\in\mathbb{Z}^n}\widehat{f}(k)e^{2\pi ik\cdot x} \tag{15.26}
\]
a.s. If $\widehat{f}$ is also dominated by a nonincreasing radial function $\varphi\in L_1$, then the right hand side of (15.26) converges absolutely and uniformly on $\mathbb{T}^n$ and thus, it is also continuous. If $f\in C(\mathbb{R}^n)$ then both series in (15.26) are absolutely and uniformly convergent; hence, (15.26) holds everywhere on $\mathbb{T}^n$. In particular, for $x=0$ we have that $\sum_{k\in\mathbb{Z}^n}f(k)=\sum_{k\in\mathbb{Z}^n}\widehat{f}(k)$.
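Example 15.4.13. The Poisson summation obtained from periodization of the Gaussian kernel $\varphi_\varepsilon(x)=\varepsilon^{-n}\varphi(\varepsilon^{-1}x)$, $\varphi(x)=e^{-\pi\|x\|^2}$, in $\mathbb{R}^n$ is given by
\[
\varepsilon^{-n}\sum_{m\in\mathbb{Z}^n}e^{-\frac{\pi\|x-m\|^2}{\varepsilon^2}}=\sum_{m\in\mathbb{Z}^n}e^{-\pi\varepsilon^2\|m\|^2}e^{2\pi im\cdot x}.
\]
For $n=1$, the function $\Theta(x;t):=\sum_{m\in\mathbb{Z}}e^{-\pi tm^2}e^{2\pi imx}$ is called the Theta function.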
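The one-dimensional case of the Gaussian Poisson summation identity is easy to test numerically, since both series converge very fast. The sketch below is illustrative; the truncation at `terms = 50` and the function names are assumptions of this example, and the right-hand side is written with cosines because the imaginary parts of the symmetric sum cancel.

```python
import math


def periodized_gaussian(x, eps, terms=50):
    """Left-hand side for n = 1: (1/eps) * sum_m exp(-pi (x - m)^2 / eps^2)."""
    total = sum(math.exp(-math.pi * (x - m) ** 2 / eps ** 2)
                for m in range(-terms, terms + 1))
    return total / eps


def theta_series(x, eps, terms=50):
    """Right-hand side for n = 1: sum_m exp(-pi eps^2 m^2) * exp(2 pi i m x)."""
    return sum(math.exp(-math.pi * eps ** 2 * m ** 2) * math.cos(2 * math.pi * m * x)
               for m in range(-terms, terms + 1))


if __name__ == "__main__":
    for eps in (0.7, 1.5):
        for x in (0.0, 0.3):
            print(eps, x, periodized_gaussian(x, eps), theta_series(x, eps))
```

For small `eps` the left-hand series needs only a few terms while the right-hand one converges slowly, and conversely for large `eps`; this trade-off is the usual reason the identity is useful in computations.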
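Example 15.4.14. The Poisson summation obtained from periodization of the Poisson kernel (in $\mathbb{R}^n$) is given by
\[
\frac{\Gamma[(n+1)/2]}{\pi^{\frac{n+1}{2}}}\sum_{m\in\mathbb{Z}^n}\frac{\varepsilon}{(\varepsilon^2+\|x-m\|^2)^{\frac{n+1}{2}}}=\sum_{m\in\mathbb{Z}^n}e^{-2\pi\varepsilon\|m\|}e^{-2\pi ix\cdot m}.
\]
For $n=1$ and $x=0$ we get that
\[
\sum_{m\in\mathbb{Z}}\frac{1}{\varepsilon^2+m^2}=\frac{\pi}{\varepsilon}\,\frac{1+e^{-2\pi\varepsilon}}{1-e^{-2\pi\varepsilon}}
\]
which means that
\[
2\sum_{m=1}^\infty\frac{1}{\varepsilon^2+m^2}=\frac{\pi}{\varepsilon}\,\frac{1+e^{-2\pi\varepsilon}}{1-e^{-2\pi\varepsilon}}-\frac{1}{\varepsilon^2}=\frac{\frac{2}{3}\pi^3\varepsilon^3+o(\varepsilon^3)}{2\pi\varepsilon^3+o(\varepsilon^3)}\to\frac{\pi^2}{3}
\]
as $\varepsilon\to0$. This gives another proof that $\sum_{m=1}^\infty\frac{1}{m^2}=\frac{\pi^2}{6}$.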
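The closed form for $\sum_{m\in\mathbb{Z}}(\varepsilon^2+m^2)^{-1}$ and the limit $\varepsilon\to0$ can be illustrated numerically. The following is a minimal sketch; the cutoff of the lattice sum and the function names are assumptions made for this example.

```python
import math


def lattice_sum(eps, cutoff=200000):
    """Truncated sum over |m| <= cutoff of 1 / (eps^2 + m^2)."""
    total = 1.0 / eps ** 2  # the m = 0 term
    for m in range(1, cutoff + 1):
        total += 2.0 / (eps ** 2 + m ** 2)  # terms m and -m
    return total


def closed_form(eps):
    """(pi / eps) * (1 + q) / (1 - q) with q = exp(-2 pi eps)."""
    q = math.exp(-2 * math.pi * eps)
    return (math.pi / eps) * (1 + q) / (1 - q)


def half_sum(eps):
    """Sum over m >= 1 of 1 / (eps^2 + m^2), obtained from the closed form."""
    return 0.5 * (closed_form(eps) - 1.0 / eps ** 2)


if __name__ == "__main__":
    print(lattice_sum(0.5), closed_form(0.5))
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, half_sum(eps), math.pi ** 2 / 6)
```

The truncated lattice sum misses a tail of size about `2 / cutoff`, so the two numbers in the first line agree only to that accuracy.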

15.5. Inversion of the Fourier transform in L1 (Rn )

The following result shows how to recover a measure on $\mathbb{R}$ directly from its Fourier transform.
Theorem 15.5.1. (Inversion formula in $\mathbb{R}$) Let $\mu$ be a complex Borel measure on $\mathbb{R}$ and let $\widehat{\mu}$ be its characteristic function. Then
(i) For any $-\infty<a<b<\infty$,
\[
\mu((a,b))+\frac{1}{2}\mu(\{a,b\})=\frac{1}{2\pi}\lim_{T\to\infty}\int_{-T}^T\int_a^be^{-iyt}\,\widehat{\mu}(t)\,dy\,dt \tag{15.27}
\]
\[
\mu(\{a\})=\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^Te^{-iat}\,\widehat{\mu}(t)\,dt \tag{15.28}
\]
(ii) If $f\in L_1(\mathbb{R},\lambda)$ then
\[
f(x)=\frac{1}{2\pi}\lim_{T\to\infty}\int_{-T}^Te^{-ixt}\,\widehat{f}(-t/2\pi)\,dt\quad\text{a.s.} \tag{15.29}
\]
(iii) If $\widehat{\mu}\in L_1$, then $\mu\ll\lambda$ and
\[
\frac{d\mu}{d\lambda}(y)=\frac{1}{2\pi}\int e^{-ity}\,\widehat{\mu}(t)\,dt\quad\text{a.s.} \tag{15.30}
\]
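Proof. (i) Let $\Psi_T(\theta):=\int_{-T}^T\frac{\sin\theta t}{t}\,dt=2\operatorname{sign}(\theta)\int_0^{|\theta|T}\frac{\sin t}{t}\,dt$. By Fubini's theorem
\[
\begin{aligned}
I_T(a,b)&=\int_{-T}^T\int_a^be^{-iyt}\,\widehat{\mu}(t)\,dy\,dt=\int_{-T}^T\int_a^b\int_{\mathbb{R}}e^{it(x-y)}\,\mu(dx)\,dy\,dt\\
&=\int_{\mathbb{R}}\int_{-T}^T\frac{\sin(t(x-a))-\sin(t(x-b))}{t}\,dt\,\mu(dx)\\
&=\int_{\mathbb{R}}\bigl(\Psi_T(x-a)-\Psi_T(x-b)\bigr)\,\mu(dx).
\end{aligned}
\]
Since $\lim_{T\to\infty}\Psi_T(\theta)=\operatorname{sign}(\theta)\pi$, then
\[
\lim_{T\to\infty}\bigl(\Psi_T(x-a)-\Psi_T(x-b)\bigr)=2\pi\mathbb{1}_{(a,b)}(x)+\pi\mathbb{1}_{\{a,b\}}(x).
\]
As $|\Psi_T(\theta)|\le2\pi$, dominated convergence implies that
\[
\lim_{T\to\infty}I_T(a,b)=2\pi\mu((a,b))+\pi\mu(\{a,b\}).
\]
Similarly,
\[
\begin{aligned}
J_T(a)&:=\frac{1}{2T}\int_{-T}^Te^{-iat}\,\widehat{\mu}(t)\,dt=\int_{\mathbb{R}}\frac{1}{T}\int_0^T\cos(t|x-a|)\,dt\,\mu(dx)\\
&=\mu(\{a\})+\int_{\{a\}^c}\frac{\sin(T|x-a|)}{T|x-a|}\,\mu(dx).
\end{aligned}
\]
By dominated convergence, $\lim_{T\to\infty}J_T(a)=\mu(\{a\})$.
(ii) The second statement of the theorem follows from part (i) applied to the measure $\mu_f(dx)=f(x)\,dx$, where $f\in L_1(\lambda)$.
(iii) If $\widehat{\mu}\in L_1$ then, for any $a\in\mathbb{R}$
\[
|J_T(a)|\le\frac{\|\widehat{\mu}\|_1}{2T}\to0,
\]
which implies that $\mu(\{a\})=0$. For any $-\infty<a<b<\infty$
\[
\Bigl|\int_a^be^{-ity}\,\widehat{\mu}(t)\,dy\Bigr|\le(b-a)|\widehat{\mu}(t)|\in L_1.
\]
Dominated convergence and Fubini's theorem imply that
\[
\lim_{T\to\infty}I_T(a,b)=\int_{\mathbb{R}}\int_a^be^{-iyt}\,\widehat{\mu}(t)\,dy\,dt=\int_a^b\int_{\mathbb{R}}e^{-ity}\,\widehat{\mu}(t)\,dt\,dy.
\]
Therefore $\mu(dy)=\frac{1}{2\pi}\bigl(\int_{\mathbb{R}}e^{-ity}\,\widehat{\mu}(t)\,dt\bigr)\lambda(dy)$.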

which implies that µ({a}) = 0. For any −∞ < a < b < ∞

Z
a −ity
e b(t) dy ≤ (b − a)|b
µ µ(t)| ∈ L1 .
b
Dominated convergence and Fubini’s theorem imply that
Z Z a Z aZ
−iyt
lim IT (a, b) = e b(t) dy dt =
µ e−ity µ
b(t) dt dy.
T →∞ R b b R
1
R −ity
Therefore µ(dy) = 2π R e b(t) dt λ(dy).
µ
Remark 15.5.2. Theorem 15.5.1 can be easily generalized to a multidimensional setting; however, the expressions corresponding to the left-hand sides of identities (15.27) and (15.28) are slightly more involved. If $[-\mathbf{T},\mathbf{T}]=\prod_{k=1}^n[-T_k,T_k]$ and $[a,b]=\prod_{k=1}^n[a_k,b_k]$ are $n$–dimensional intervals, then
\[
\mu\bigl((a,b)\bigr)+L_\mu(\partial(a,b))=\frac{1}{(2\pi)^n}\lim_{\mathbf{T}\to\infty}\int_{[-\mathbf{T},\mathbf{T}]}\int_{[a,b]}e^{-iy\cdot t}\,\widehat{\mu}(t)\,dy\,dt \tag{15.31}
\]
where $L_\mu$ is a weighted sum of the measures of the $k$–dimensional hyperfaces $F_k$ ($k=0,\ldots,n-1$) of the $n$–dimensional cube $[a,b]$
\[
L_\mu=\sum_{k=0}^{n-1}\frac{1}{2^{n-k}}\,\mu(F_k).
\]
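The derivation of (15.31) is similar to that of (15.27):
\[
\begin{aligned}
\lim_{\mathbf{T}\to\infty}\frac{1}{(2\pi)^n}\int_{[-\mathbf{T},\mathbf{T}]}\int_{[a,b]}e^{-iy\cdot t}\,\widehat{\mu}(t)\,dy\,dt
&=\lim_{\mathbf{T}\to\infty}\frac{1}{(2\pi)^n}\int\prod_{k=1}^n\bigl(\Psi_{T_k}(x_k-a_k)-\Psi_{T_k}(x_k-b_k)\bigr)\,\mu(dx)\\
&=\frac{1}{(2\pi)^n}\int\prod_{k=1}^n\bigl(2\pi\mathbb{1}_{(a_k,b_k)}+\pi\mathbb{1}_{\{a_k,b_k\}}\bigr)\,d\mu.
\end{aligned}
\]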
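The measure of the lower dimensional faces is obtained similarly. For any $J\subset\{1,\ldots,n\}$, let $\delta_{a_J}(dy_J)=\prod_{j\in J}\delta_{a_j}$, $\lambda_J(dy_J)=\prod_{j\in J}dy_j$ and $[a_J,b_J]=\prod_{j\in J}[a_j,b_j]$. If $k=\#J$ then $\mu(\{a_J\}\times[a_{J^c},b_{J^c}])$ is obtained from
\[
\begin{aligned}
&\lim_{\mathbf{T}\to\infty}\frac{1}{\prod_{j\in J}(2T_j)\,(2\pi)^{n-k}}\int_{[-\mathbf{T},\mathbf{T}]}\int_{\mathbb{R}^J\times[a_{J^c},b_{J^c}]}e^{-iy\cdot t}\,\widehat{\mu}(t)\,\delta_{a_J}(dy_J)\otimes dy_{J^c}\,dt\\
&\quad=\frac{1}{(2\pi)^{n-k}}\lim_{\mathbf{T}\to\infty}\int\prod_{j\in J}\frac{\sin(T_j|x_j-a_j|)}{T_j|x_j-a_j|}\prod_{j\in J^c}\bigl(\Psi_{T_j}(x_j-a_j)-\Psi_{T_j}(x_j-b_j)\bigr)\,\mu(dx)\\
&\quad=\frac{1}{(2\pi)^{n-k}}\int\prod_{j\in J}\mathbb{1}_{\{a_j\}}\prod_{j\in J^c}\bigl(2\pi\mathbb{1}_{(a_j,b_j)}+\pi\mathbb{1}_{\{a_j,b_j\}}\bigr)\,d\mu.
\end{aligned}
\]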
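As in Theorem 15.5.1, for any $f\in L_1(\mathbb{R}^n,\lambda_n)$,
\[
f(x)=\frac{1}{(2\pi)^n}\lim_{\mathbf{T}\to\infty}\int_{[-\mathbf{T},\mathbf{T}]}e^{-ix\cdot t}\,\widehat{f}(-t/2\pi)\,dt\quad\text{a.s.},
\]
and if $\widehat{\mu}\in L_1(\mathbb{R}^n,\lambda_n)$, then $\mu\ll\lambda_n$ and
\[
\frac{d\mu}{d\lambda_n}(x)=\frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}e^{-it\cdot x}\,\widehat{\mu}(t)\,dt\quad\text{a.s.}
\]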

Example 15.5.3. The double exponential distribution $\nu(dx)=\frac{1}{2}e^{-|x|}\,dx$ has characteristic function $\widehat{\nu}(t)=\frac{1}{1+t^2}\in L_1(\mathbb{R})$. Therefore, by (15.30)
\[
\frac{1}{2}e^{-|x|}=\frac{1}{2\pi}\int_{\mathbb{R}}\frac{e^{-itx}}{1+t^2}\,dt.
\]
Consequently, the Cauchy distribution $\rho(dx)=\frac{1}{\pi(1+x^2)}\,dx$ has characteristic function given by $\widehat{\rho}(t)=e^{-|t|}$.

Example 15.5.4. If $T_c$ is the tent distribution defined in Example 15.2.16, then
\[
\widehat{T_c}(t)=\bigl(\widehat{\mu_{c/2}}(t)\bigr)^2=4\,\frac{\sin^2(tc/2)}{c^2t^2}=2\,\frac{1-\cos(ct)}{(ct)^2}.
\]
Since $\widehat{T_c}\in L_1$, it follows from (15.30) that
\[
\frac{1}{c}\Bigl(1-\frac{|y|}{c}\Bigr)_+=\frac{1}{\pi}\int\frac{1-\cos(ct)}{(ct)^2}\,e^{-iyt}\,dt=\frac{1}{c\pi}\int\frac{1-\cos t}{t^2}\,e^{-iyt/c}\,dt.
\]
The probability measure defined by $p_c(dx)=\frac{c}{\pi}\,\frac{1-\cos(cx)}{(cx)^2}\,dx$ is called Polya's distribution. Its characteristic function is given by $\widehat{p_c}(t)=(1-|t/c|)_+$.

The following result provides another way to invert the Fourier transform of functions in $L_1$, as a limit of regular functions. This approach involves convolution operations and provides both $L_1$ and a.s. convergence.

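Theorem 15.5.5. (Inversion Theorem of the Fourier transform in $L_1(\mathbb{R}^n, \lambda_n)$). Suppose $\varphi \in L_1(\mathbb{R}^n, \lambda_n)$ is such that $\widehat{\varphi} \in L_1(\mathbb{R}^n, \lambda_n)$ and $\int \widehat{\varphi}(t)\, dt = 1$. For any $\varepsilon > 0$ define
\[
S_\varepsilon(f, x) = \int \varphi(\varepsilon s)\, \widehat{f}(s)\, e^{i 2\pi x\cdot s}\, ds. \tag{15.32}
\]
Then, $\|S_\varepsilon(f) - f\|_1 \to 0$ as $\varepsilon \to 0$. If $\widehat{f} \in L_1(\mathbb{R}^n, \lambda_n)$ then $S_\varepsilon(f)$ converges pointwise to $f$ at every Lebesgue point of $f$. Furthermore,
\[
f(x) = \lim_{\varepsilon\to 0} S_\varepsilon(f, x) = \int e^{2\pi i t\cdot x}\, \widehat{f}(t)\, dt \tag{15.33}
\]
pointwise at every Lebesgue point $x$ of $f$. In particular, $f$ coincides almost surely with a function in $C_0(\mathbb{R}^n)$.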
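Proof. Let $K(x) = \widehat{\varphi}(-x)$ and define $K_\varepsilon(x) = \varepsilon^{-n} K(\varepsilon^{-1} x)$ for all $\varepsilon > 0$. By Fubini's theorem,
\[
\begin{aligned}
K_\varepsilon * f(x) &= \varepsilon^{-n} \int \int e^{i 2\pi \varepsilon^{-1} (x-y)\cdot s}\, \varphi(s)\, ds\, f(y)\, dy \\
&= \varepsilon^{-n} \int \varphi(s) \int e^{i 2\pi \varepsilon^{-1} (x-y)\cdot s} f(y)\, dy\, ds \\
&= \int \varphi(\varepsilon s)\, \widehat{f}(s)\, e^{i 2\pi x\cdot s}\, ds,
\end{aligned}
\]
where the last equality follows by a change of variables $s \mapsto \varepsilon^{-1} s$. The first conclusion follows from Theorem 15.3.1.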
For the second statement consider $\varphi(x) = e^{-\pi |x|^2}$. Then, $\varphi(x) = \varphi(-x)$ and
\[
\widehat{\varphi}(t) = \int_{\mathbb{R}^n} e^{-2\pi i x\cdot t}\, e^{-\pi |x|^2}\, dx = \prod_{j=1}^{n} \int_{-\infty}^{\infty} e^{-2\pi i x_j t_j}\, e^{-\pi x_j^2}\, dx_j = e^{-\pi |t|^2} = \varphi(t).
\]
Hence $\int \widehat{\varphi}(t)\, dt = 1 = \int \varphi(x)\, dx$. Clearly the kernels $K_\varepsilon$ defined in the first part of the proof satisfy the conditions of Theorem 15.3.9; thus, the left hand side of (15.32) converges to $f$ pointwise at every Lebesgue point of $f$ as $\varepsilon \to 0$. By dominated convergence, the right hand side of (15.32) converges to $\widehat{\widehat{f}}(-x)$ pointwise as $\varepsilon \to 0$, and (15.33) follows. The last statement is a consequence of the Riemann–Lebesgue lemma 15.1.21.
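The key computation in the proof, namely that the Gaussian $e^{-\pi x^2}$ is its own Fourier transform under the convention $\widehat{f}(t) = \int f(x) e^{-2\pi i x t}\, dx$, is easy to confirm numerically in one dimension. The sketch below is illustrative only; the helper `fourier_transform`, the window `L` and the grid size `n` are assumptions, not part of the notes.

```python
import cmath
import math


def fourier_transform(f, t, L=8.0, n=4000):
    """Trapezoid approximation of int_{-L}^{L} f(x) e^{-2 pi i x t} dx."""
    h = 2.0 * L / n
    total = 0j
    for k in range(n + 1):
        x = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * h


def gaussian(x):
    """The function phi(x) = exp(-pi x^2) used in the proof."""
    return math.exp(-math.pi * x * x)


for t in (0.0, 0.5, 1.0, 1.5):
    value = fourier_transform(gaussian, t)
    print(f"t = {t:3.1f}   transform = {value.real:.10f}   phi(t) = {gaussian(t):.10f}")
```

Because the Gaussian and all its derivatives are negligible at $\pm L$, the trapezoid rule is accurate here far beyond its usual second order.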

The following result is a simple consequence of Fourier inversion theorem.

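Corollary 15.5.6. Suppose $f \in L_1(\mathbb{R}^n, \lambda_n)$ and $\widehat{f} \ge 0$. If $f$ is continuous at $0$ then $\widehat{f} \in L_1(\mathbb{R}^n, \lambda_n)$ and
\[
f(0) = \int_{\mathbb{R}^n} \widehat{f}(y)\, dy.
\]

Proof. For $\varphi(x) = e^{-\pi |x|^2}$, we have that $K_\varepsilon(x) = \varepsilon^{-n}\, \widehat{\varphi}(\varepsilon^{-1} x) = \varepsilon^{-n} \varphi(\varepsilon^{-1} x)$ and so, $S_\varepsilon(f) \equiv K_\varepsilon * f$ in $\mathbb{R}^n$. If $f$ is continuous at $0$, then $0$ is a Lebesgue point of $f$. It follows from Theorem 15.3.9 that
\[
\lim_{\varepsilon\to 0} S_\varepsilon(f, 0) = \lim_{\varepsilon\to 0} K_\varepsilon * f(0) = f(0).
\]
If $\widehat{f} \ge 0$ then, by monotone convergence, $\widehat{f} \in L_1$ and $\int \widehat{f} = \lim_{\varepsilon\to 0} S_\varepsilon(f, 0)$.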

15.6. L2 Theory and Plancherel’s Theorem

We know that the space L1 ∩ L2 is dense in L2 . Here we will extend the Fourier transform
from the former space to the latter. One of the nice properties of this extension is that it
turns out to be a unitary linear transformation.

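Theorem 15.6.1. If $f \in L_1 \cap L_2$ then $\widehat{f} \in L_2$ and $\|\widehat{f}\|_2 = \|f\|_2$.

Proof. If $g(x) = \overline{f(-x)}$ then $h = f * g \in L_1$ and, by Theorem 15.2.12, $h \in C_0$. As $\widehat{g} = \overline{\widehat{f}}$, we have that $\widehat{h} = \widehat{f}\, \widehat{g} = |\widehat{f}|^2$. By Corollary 15.5.6, $\widehat{h}$ is integrable and $h(0) = \int_{\mathbb{R}^n} \widehat{h}(y)\, dy$. Thus
\[
\begin{aligned}
\int_{\mathbb{R}^n} |\widehat{f}(y)|^2\, dy &= \int_{\mathbb{R}^n} \widehat{h}(y)\, dy = h(0) = \int_{\mathbb{R}^n} f(y)\, g(0 - y)\, dy \\
&= \int_{\mathbb{R}^n} f(y)\, \overline{f(y)}\, dy = \int_{\mathbb{R}^n} |f(y)|^2\, dy.
\end{aligned}
\]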

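Theorem 15.6.1 states that the Fourier transform $\mathcal{F}$ maps $L_1 \cap L_2$ into $L_2$ isometrically. Since $L_1 \cap L_2$ is dense in $L_2$ and $L_2$ is complete, $\mathcal{F}$ admits a unique continuous extension to all of $L_2$, which is also an isometry. We will keep the notation $\widehat{f} = \mathcal{F}f$ for $f \in L_2$. If $f \in L_2$, then $h_k = f \mathbf{1}_{B(0;k)} \in L_1 \cap L_2$ and $\|h_k - f\|_2 \to 0$ as $k \to \infty$. Therefore, $\widehat{h_k} \in L_2$ and
\[
\widehat{f}(y) = \mathcal{F}f(y) = \lim_{k\to\infty} \widehat{h_k}(y) = \lim_{k\to\infty} \int_{|x|\le k} f(x)\, e^{-2\pi i x\cdot y}\, dx
\]
in $L_2$. The next result establishes that $\mathcal{F}$ is in fact a unitary operator.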

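Theorem 15.6.2. (Plancherel)
(i) The Fourier transform is a unitary operator on $L_2$.
(ii) For all $g \in L_2(\lambda_n)$, $(\mathcal{F}^{-1} g)(x) = (\mathcal{F} g)(-x)$ a.s.

Proof. (i) Since $\mathcal{F}$ is an isometry on $L_2$, $\mathcal{F}(L_2)$ is a closed subspace of $L_2$. Let $g \in \mathcal{F}(L_2)^{\perp}$. A simple density argument extends Theorem 15.1.17 to functions in $L_2$. Thus
\[
\int_{\mathbb{R}^n} \widehat{f}(x)\, g(x)\, dx = \int_{\mathbb{R}^n} f(x)\, \widehat{g}(x)\, dx = 0
\]
for all $f \in L_2$. Therefore $\|g\|_2 = \|\widehat{g}\|_2 = 0$.
(ii) As $\mathcal{F}$ is a unitary operator, it preserves the inner product $(u|v) = \int_{\mathbb{R}^n} u\, \overline{v}\, dx$. Let $g$ be any function in $L_1 \cap L_2$ and $f \in L_2$. Since $\widehat{f} \in L_2$, $\widehat{f}\, \mathbf{1}_{B(0;k)} \in L_1 \cap L_2$ for each $k$. Consequently,
\[
f_k(x) = \int_{|y|\le k} \widehat{f}(y)\, e^{2\pi i x\cdot y}\, dy = \mathcal{F}\bigl( \widehat{f}\, \mathbf{1}_{B(0;k)} \bigr)(-x) \in L_2
\]
and the sequence $f_k$ converges to $\tilde{f}(x) = \mathcal{F}(\widehat{f})(-x)$ in $L_2$. Therefore,
\[
\begin{aligned}
(g|\tilde{f}) = \lim_{k\to\infty} (g|f_k) &= \lim_{k\to\infty} \int_{\mathbb{R}^n} g(x) \int_{|y|\le k} \overline{\widehat{f}(y)}\, e^{-2\pi i y\cdot x}\, dy\, dx \\
&= \lim_{k\to\infty} \int_{|y|\le k} \overline{\widehat{f}(y)} \int_{\mathbb{R}^n} g(x)\, e^{-2\pi i y\cdot x}\, dx\, dy \\
&= \lim_{k\to\infty} \int_{|y|\le k} \widehat{g}(y)\, \overline{\widehat{f}(y)}\, dy = (\widehat{g}|\widehat{f}) = (g|f).
\end{aligned}
\]
Since $L_1 \cap L_2$ is dense in $L_2$, this implies that $f(x) = (\mathcal{F}^{-1} \widehat{f})(x) = \tilde{f}(x) = (\mathcal{F} \widehat{f})(-x)$ for all $f \in L_2$.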

Riesz's interpolation theorem extends the Fourier transform to all $L_p$ spaces with $1 < p < 2$.

Theorem 15.6.3. (Hausdorff–Young) For each $0 \le \theta \le 1$, let $p_\theta = 2/(2 - \theta)$ and $q_\theta = 2/\theta$. The Fourier transform $\mathcal{F}$ is a bounded linear transformation from $L_{p_\theta}$ into $L_{q_\theta}$ with norm $\|\mathcal{F}\| \le 1$.

Proof. From the Riemann–Lebesgue lemma and Plancherel's theorem we know that the Fourier transform $\mathcal{F}$ is a linear map from $L_1(\mathbb{R}^n) + L_2(\mathbb{R}^n)$ into $L_\infty(\mathbb{R}^n) + L_2(\mathbb{R}^n)$ such that $\|\mathcal{F}(f)\|_\infty \le \|f\|_1$ and $\|\mathcal{F}(g)\|_2 = \|g\|_2$ for all $f \in L_1$ and $g \in L_2$. By Riesz's interpolation theorem, for any $0 < \theta < 1$ we can define the Fourier transform as a bounded operator from $L_{p_\theta}$ into $L_{q_\theta}$ with $\|\mathcal{F}(f)\|_{q_\theta} \le \|f\|_{p_\theta}$.

15.7. Schwartz functions

A function $\phi \in C^\infty(\mathbb{R}^n)$ is said to be rapidly decreasing if
\[
\rho_{\alpha,\beta}(\phi) := \sup_{x\in\mathbb{R}^n} |x^\alpha (\partial^\beta \phi)(x)| < \infty \tag{15.34}
\]
for any $n$–tuples $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\beta_1, \ldots, \beta_n)$ of nonnegative integers. The collection $\mathcal{S}$ of all such functions is also known as the Schwartz space. The topology $\rho$ generated by the collection of seminorms $\rho_{\alpha,\beta}$ defined in (15.34) makes $(\mathcal{S}, \rho)$ a Fréchet space. Indeed, if $\{\phi_m : m \in \mathbb{N}\}$ is Cauchy in $\mathcal{S}$, then for any $\alpha, \beta \in \mathbb{Z}^n_+$, $x^\beta (\partial^\alpha \phi_m)(x)$ converges uniformly to a bounded continuous function $g_{\alpha,\beta}(x)$ as $m \to \infty$. Using the fundamental theorem of Calculus repeatedly, we have that $\phi_m \to g_{00}$ and $g_{\alpha,\beta}(x) = x^\beta (\partial^\alpha g_{00})(x)$.
Clearly $\phi \in \mathcal{S}$ if and only if for any polynomial $P$ in $\mathbb{R}^n$ and any $n$–tuple $\beta$ of nonnegative integers, $P(x) \partial^\beta \phi(x)$ is bounded. The Leibniz formula, or induction, shows that $\mathcal{S}$ is a vector ring with respect to pointwise addition and multiplication. It is left as an exercise to show that the topology of $(\mathcal{S}, \rho)$ is also induced by the family of norms
\[
\rho_m(\phi) := \sup_{x\in\mathbb{R}^n,\ |\beta|\le m} |(1 + |x|^2)^m\, \partial^\beta \phi(x)|, \qquad m \in \mathbb{Z}_+ \tag{15.35}
\]

Example 15.7.1. The family of functions $\varphi_\alpha(x) = e^{-\alpha |x|^2}$, $\alpha > 0$, is contained in $\mathcal{S}$.

The next result makes a connection between $\mathcal{D}(\mathbb{R}^n)$ (with the strictly inductive limit topology $\tau$ defined in Example 12.4.4) and $\mathcal{S}$ with the Fréchet topology $\rho$ induced by the seminorms $\rho_m$.
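Theorem 15.7.2.
(i) $\mathcal{D}(\mathbb{R}^n)$ is dense in $(\mathcal{S}, \rho)$, and $\mathcal{S}$ is dense in $(L_p(\lambda_n), \|\cdot\|_p)$ for all $1 \le p < \infty$.
(ii) The inclusion map $\iota : (\mathcal{D}(\mathbb{R}^n), \tau) \to (\mathcal{S}, \rho)$ is continuous.
(iii) For any $1 \le p < \infty$ there is a constant $0 < C = C(n, p) < \infty$ such that
\[
\|\phi\|_p \le C \rho_n(\phi), \qquad \phi \in \mathcal{S}.
\]
Hence, the inclusion map $j : (\mathcal{S}, \rho) \to (L_p(\lambda_n), \|\cdot\|_p)$ is continuous.

Proof. (ii) For each compact $K \subset \mathbb{R}^n$, the topology induced on $\mathcal{D}_K$ by $\rho$ is the same as the topology $\tau_K$ induced on $\mathcal{D}_K$ by the seminorms $p_m(\phi) = \sup\{ |\partial^\alpha \phi(x)| : x \in \mathbb{R}^n, |\alpha| \le m \}$, $m \in \mathbb{Z}_+$, since $(1 + |x|^2)^m$ is bounded on $K$ for each $m \in \mathbb{Z}_+$. This shows that the restriction of $\iota$ to $\mathcal{D}_K$ is continuous. Thus, by Theorem 12.5.4, $\iota$ is continuous.
(i) Let $\phi \in \mathcal{S}$. Choose $\eta \in \mathcal{D}(\mathbb{R}^n)$ with $0 \le \eta \le 1$ such that $\eta \equiv 1$ in the unit ball and $\eta \equiv 0$ outside the ball of radius 2. Define $\phi_r(x) := \phi(x) \eta(rx)$ for $r > 0$. Clearly $\phi_r \in \mathcal{D}(\mathbb{R}^n)$. We claim that $\phi_r \to \phi$ in $\mathcal{S}$ as $r \to 0$. Indeed, by the Leibniz formula for differentiation, for any polynomial $P(x)$ and $\alpha \in \mathbb{Z}^n_+$,
\[
P(x)\, \partial^\alpha (\phi - \phi_r)(x) = P(x) \sum_{0\le\beta\le\alpha} \binom{\alpha}{\beta} (\partial^{\alpha-\beta} \phi)(x)\, r^{|\beta|}\, \bigl( \partial^\beta (1 - \eta) \bigr)(rx). \tag{15.36}
\]
On the closed ball $\overline{B}(0; 1/r)$ each term $\bigl(\partial^\beta (1 - \eta)\bigr)(rx) \equiv 0$, $0 \le \beta \le \alpha$. Since $P(x) (\partial^{\alpha-\beta} \phi)(x) \in \mathcal{S}$ for each $0 \le \beta \le \alpha$, we have that for any $\varepsilon > 0$, there is a $\delta > 0$ such that
\[
\sum_{0\le\beta\le\alpha} \binom{\alpha}{\beta} \|\partial^\beta (1 - \eta)\|_\infty\, |P(x) (\partial^{\alpha-\beta} \phi)(x)| < \varepsilon, \qquad |x| \ge 1/\delta.
\]
It follows that the sum in (15.36) converges uniformly to $0$ as $r \to 0$. Therefore, $\phi_r \to \phi$ in $\mathcal{S}$ as $r \to 0$.
(iii) Let $\phi \in \mathcal{S}$ and $1 \le p < \infty$. Since
\[
|\phi(x)| = (1 + |x|^2)^n |\phi(x)|\, \frac{1}{(1 + |x|^2)^n} \le \frac{\rho_n(\phi)}{(1 + |x|^2)^n},
\]
we have that
\[
\|\phi\|_p \le \Bigl( \int_{\mathbb{R}^n} \frac{1}{(1 + |x|^2)^{np}}\, dx \Bigr)^{1/p} \rho_n(\phi).
\]
The density of $\mathcal{S}$ in $L_p(\lambda_n)$ follows from Theorem 15.3.4.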

The dual space of (S(Rn ), ρ) is called the space of tempered distributions (see Exer-
cise 15.9.28).
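Example 15.7.3. Suppose $\mu$ is a positive Radon measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ such that
\[
C := \int_{\mathbb{R}^n} (1 + |x|^2)^{-N}\, \mu(dx) < \infty
\]
for some $N \in \mathbb{N}$. The map $u_\mu : \phi \mapsto \int \phi\, d\mu$ is a tempered distribution. To see that, suppose $\phi_m \to 0$ in $\mathcal{S}$. Then $\|(1 + |x|^2)^N \phi_m(x)\|_\infty \to 0$ as $m \to \infty$. Consequently, $|u_\mu(\phi_m)| \le \|(1 + |x|^2)^N \phi_m(x)\|_\infty\, C \to 0$.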
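Theorem 15.7.4. The Fourier transform $\mathcal{F}$ maps the space $\mathcal{S}$ onto itself; moreover, $\mathcal{F} : (\mathcal{S}, \rho) \to (\mathcal{S}, \rho)$ is a continuous bijection and $\mathcal{F}^{-1} = \mathcal{F}^3$. For any polynomial $P$ on $\mathbb{R}^n$,
\[
P(-\partial) \widehat{\varphi}(t) = \bigl( P(2\pi i x) \varphi(x) \bigr)^{\wedge}(t) \tag{15.37}
\]
\[
\widehat{P(\partial) \varphi}(t) = P(2\pi i t)\, \widehat{\varphi}(t) \tag{15.38}
\]

Proof. Suppose $\phi \in \mathcal{S}$. Then, $x^\alpha \phi(x)$ is integrable for any $\alpha \in \mathbb{Z}^n_+$. By Theorem 15.1.9, $\widehat{\phi} \in C^\infty(\mathbb{R}^n)$ and (15.37) holds for $P(x) = x^\alpha$ and hence, for any polynomial by linearity.
Since $\partial^\alpha \phi$ is integrable for all $\alpha \in \mathbb{Z}^n_+$, $\widehat{\partial^\alpha \phi}$ is well defined. Consider first the differential operator $\partial_{x_1}$. Fubini's theorem and integration by parts give
\[
\int_{\mathbb{R}^n} (\partial_1 \phi)(x)\, e^{-2\pi i x\cdot t}\, dx = 2\pi i t_1 \int_{\mathbb{R}^n} \phi(x)\, e^{-2\pi i t\cdot x}\, dx = 2\pi i t_1\, \widehat{\phi}(t).
\]
Iterating this argument for any $\alpha \in \mathbb{Z}^n_+$ we obtain that
\[
\int_{\mathbb{R}^n} (\partial^\alpha \phi)(x)\, e^{-2\pi i x\cdot t}\, dx = (2\pi i t)^\alpha \int_{\mathbb{R}^n} \phi(x)\, e^{-2\pi i t\cdot x}\, dx = (2\pi i t)^\alpha\, \widehat{\phi}(t).
\]
Consequently $\widehat{\partial^\alpha \phi}(t) = (2\pi i t)^\alpha \widehat{\phi}(t)$ and (15.38) follows. Since $x^\alpha \phi(x)$ and $\partial^\beta (x^\alpha \phi(x))$ are both in $\mathcal{S}$ for any $\alpha, \beta \in \mathbb{Z}^n_+$, applying (15.37) first and then (15.38) we obtain that
\[
\begin{aligned}
\bigl| (2\pi i t)^\beta (\partial^\alpha \widehat{\phi})(t) \bigr| &= \bigl| (2\pi i t)^\beta \bigl( (-2\pi i x)^\alpha \phi(x) \bigr)^{\wedge}(t) \bigr| \\
&= \bigl| \bigl( \partial^\beta \bigl( (-2\pi i x)^\alpha \phi(x) \bigr) \bigr)^{\wedge}(t) \bigr| \\
&\le (2\pi)^{|\alpha|}\, \bigl\| \partial^\beta \bigl( x^\alpha \phi(x) \bigr) \bigr\|_1 < \infty.
\end{aligned}
\]
This proves that $\widehat{\phi} \in \mathcal{S}$.

The Fourier inversion theorem 15.5.5 implies that $\mathcal{F}$ restricted to $\mathcal{S}$ is bijective and that for any $\phi \in \mathcal{S}$, $\phi(-x) = \mathcal{F}^2 \phi(x)$. This implies that $\mathcal{F}^4 \phi = \phi$, which means that $\mathcal{F}^{-1} = \mathcal{F}^3$.
To prove continuity of $\mathcal{F}$ on $\mathcal{S}$ we use the closed graph theorem. Suppose $\phi_m \to \phi$ in $(\mathcal{S}, \rho)$ and that for some $\psi \in \mathcal{S}$, $\widehat{\phi_m} \to \psi$ in $\mathcal{S}$ as $m \to \infty$. Then $\phi_m \to \phi$ in $L_1(\lambda_n)$ since $(1 + |x|^2)^n \bigl( \phi_m(x) - \phi(x) \bigr) \to 0$ uniformly as $m \to \infty$. Hence $\widehat{\phi_m} \to \widehat{\phi}$ uniformly, which means that $\psi = \widehat{\phi}$.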
Remark 15.7.5. Theorem 15.7.4 implies the existence of smooth Lebesgue integrable func-
tions whose Fourier transform is not only smooth, but also has compact support.
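Corollary 15.7.6. If $u \in \mathcal{S}^*$, the map $\widehat{u} : \mathcal{S} \to \mathbb{C}$ given by $\widehat{u}(\phi) := u(\widehat{\phi})$, where $\widehat{\phi} = \mathcal{F}(\phi)$, is a tempered distribution. Furthermore, $\Phi : \mathcal{S}^* \to \mathcal{S}^*$ given by $u \mapsto \widehat{u}$ is a continuous bijection and $\Phi^{-1} = \Phi^3$.

Proof. $\widehat{u} = u \circ \mathcal{F}$ and so, $\widehat{u} \in \mathcal{S}^*$. To show continuity of $\Phi$, let $W$ be any neighborhood of $0$ in $\mathcal{S}^*$. Then, there are $\varepsilon > 0$ and $\phi_1, \ldots, \phi_k \in \mathcal{S}$ such that
\[
\{ u \in \mathcal{S}^* : |u(\phi_j)| < \varepsilon \text{ for } 1 \le j \le k \} \subset W.
\]
Define
\[
V := \{ u \in \mathcal{S}^* : |u(\widehat{\phi_j})| < \varepsilon \text{ for } 1 \le j \le k \}.
\]
If $u \in V$, then $\Phi u = \widehat{u} \in W$, which means that $\Phi$ is continuous. The last statement follows from the fact that $\mathcal{F}^4 \phi = \phi$ for all $\phi \in \mathcal{S}$.

Remark 15.7.7. Motivation for the definition of $\widehat{u}$ follows from the identity
\[
\int \widehat{u}\, \phi\, d\lambda_n = \int u\, \widehat{\phi}\, d\lambda_n
\]
when $u, \phi \in L_1$.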

A function $f \in L_p(\mathbb{R}^n)$ has a partial derivative with respect to the $k$–th coordinate in the sense of $L_p(\mathbb{R}^n)$ if there exists $g \in L_p(\mathbb{R}^n)$ such that
\[
\lim_{h_k\to 0} \Bigl\| \frac{f(\cdot + h_k e_k) - f(\cdot)}{h_k} - g(\cdot) \Bigr\|_p = 0.
\]
In such case $g$ is a.s. unique. Clearly, if $f$ admits a partial derivative $\partial_k f$ in $\mathbb{R}^n$ in the sense of differential Calculus as well as a partial derivative $g$ with respect to $x_k$ in the sense of $L_p(\mathbb{R}^n)$, then $g = \partial_k f$ a.s.
Lemma 15.7.8. If $f \in L_1(\mathbb{R}^n)$ has a partial derivative $g$ w.r.t. $x_k$ in the sense of $L_1(\mathbb{R}^n)$ then,
\[
\widehat{g}(t) = 2\pi i t_k\, \widehat{f}(t)
\]
for all $t \in \mathbb{R}^n$.

Proof. This follows from
\[
\Bigl| \frac{e^{2\pi i t_k h_k} - 1}{h_k}\, \widehat{f}(t) - \widehat{g}(t) \Bigr| \le \Bigl\| \frac{f(\cdot + h_k e_k) - f(\cdot)}{h_k} - g(\cdot) \Bigr\|_1,
\]
and by letting $h_k \to 0$.
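Theorem 15.7.9. If $\psi \in \mathcal{S}$, then
\[
\frac{\tau_{-h e_j} \psi - \psi}{h} \xrightarrow{h\to 0} \partial_{x_j} \psi
\]
in $\mathcal{S}$ and in $L_p(\lambda_n)$ for each $1 \le j \le n$ and $1 \le p < \infty$.

Proof. Let $\eta_h = \frac{\tau_{-h e_j} - \tau_0}{h}$. From
\[
1 + |x| \le 1 + |x - y| + |y| \le (1 + |x - y|)(1 + |y|),
\]
and Jensen's inequality we obtain that
\[
(1 + |x|^2) \le (1 + |x|)^2 \le (1 + |x - y|)^2 (1 + |y|)^2 \le 4 (1 + |x - y|^2)(1 + |y|^2).
\]
Let $N \in \mathbb{Z}_+$ and $\alpha \in \mathbb{Z}^n_+$ with $|\alpha| \le N$. There is a constant $A = A(\psi, N, \alpha) > 0$ such that
\[
\begin{aligned}
(1 + |x|^2)^N \bigl| \partial^\alpha \bigl( \eta_h \psi(x) - \partial_{x_j} \psi(x) \bigr) \bigr| &= (1 + |x|^2)^N \bigl| \partial_{x_j} \partial^\alpha \psi(x + \theta h e_j) - \partial_{x_j} \partial^\alpha \psi(x) \bigr| \\
&= (1 + |x|^2)^N \bigl| \partial^2_{x_j} \partial^\alpha \psi(x + \xi\theta h e_j) \bigr|\, \theta |h| \\
&\le \frac{(1 + |x|^2)^N}{(1 + |x + \xi\theta h e_j|^2)^N}\, A |h| \le 4^N (1 + |h|^2)^N A |h|,
\end{aligned}
\]
where $\xi, \theta \in (0, 1)$ result from applications of the mean value theorem. Convergence in $\mathcal{S}$ and in $L_p(\lambda_n)$ follows from
\[
\rho_N(\eta_h \psi - \partial_{x_j} \psi) \le 4^N (1 + |h|^2)^N A |h| \xrightarrow{h\to 0} 0,
\]
and Theorem 15.7.2(iii).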

15.8. Harmonic functions

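A function $u$ defined on an open set $U \subset \mathbb{R}^n$ satisfies the mean–value property if for any $x \in U$ and $r > 0$ such that $\overline{B}(x; r) \subset U$,
\[
u(x) = \frac{1}{r^{n-1} \sigma_{n-1}(\mathbb{S}^{n-1})} \int_{\partial B(x,r)} u(z)\, \sigma(dz) = \frac{1}{\sigma_{n-1}(\mathbb{S}^{n-1})} \int_{\mathbb{S}^{n-1}} u(x + rz)\, \sigma_{n-1}(dz), \tag{15.39}
\]
where $\sigma$ is the spherical measure on $\partial B(x; r)$ and $\sigma_{n-1}$ is the spherical measure on $\mathbb{S}^{n-1}$. If $u$ satisfies (15.39) then
\[
\frac{1}{r^n \omega_n} \int_{B(x,r)} u(y)\, dy = \frac{1}{r^n \omega_n} \int_0^r s^{n-1} \int_{\mathbb{S}^{n-1}} u(x + sz)\, \sigma_{n-1}(dz)\, ds = u(x). \tag{15.40}
\]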

Example 15.8.1. We have already seen that any analytic function on an open subset $U \subset \mathbb{C}$, and thus its real and imaginary parts, satisfies the mean value property. In fact, in this case the mean value property coincides with the Cauchy formula $f(a) = \frac{1}{2\pi i} \int_{\gamma_r(a)} \frac{f(z)}{z - a}\, dz$, where $\gamma_r(a)(t) = a + r e^{it}$, $0 \le t \le 2\pi$.
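Theorem 15.8.2. If $u \in C(U)$ satisfies the mean–value property in $U$, then $u \in C^\infty(U)$.

Proof. Let $\eta(x) = \psi(|x|)$ be a mollifier with support $\overline{B}(0; 1)$ and mass one. As before, $u^\varepsilon = \eta_\varepsilon * u$ denotes the mollification of $u$ with $\eta_\varepsilon$. We will show that $u = u^\varepsilon$ in $U_\varepsilon$. Indeed, if $x \in U_\varepsilon$, then by using polar coordinates, we obtain
\[
\begin{aligned}
u^\varepsilon(x) &= \frac{1}{\varepsilon^n} \int_U \eta\Bigl( \frac{x - y}{\varepsilon} \Bigr) u(y)\, dy \\
&= \frac{1}{\varepsilon^n} \int_{B(0;\varepsilon)} \eta\Bigl( \frac{y}{\varepsilon} \Bigr) u(x - y)\, dy \\
&= \int_{B(0;1)} \eta(y)\, u(x - \varepsilon y)\, dy \\
&= \int_0^1 \psi(r)\, r^{n-1} \int_{\mathbb{S}^{n-1}} u(x - r\varepsilon z)\, \sigma_{n-1}(dz)\, dr \\
&= u(x)\, \sigma_{n-1}(\mathbb{S}^{n-1}) \int_0^1 \psi(r)\, r^{n-1}\, dr = u(x).
\end{aligned}
\]
Thus $u \in C^\infty(U_\varepsilon)$ for any $\varepsilon > 0$.

A function $u \in C^2(U)$ is harmonic in $U$ if
\[
\triangle u(x) = \nabla \cdot \nabla u(x) = \sum_{j=1}^{n} \partial^2_{x_j x_j} u(x) = 0, \qquad x \in U. \tag{15.41}
\]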

Remark 15.8.3. If f is a complex valued function on U , then f is harmonic iff u = Re(f )

and v = Im(f ) are harmonic. Linear combination of harmonic functions are harmonic.
Analytic functions in a complex region D are harmonic on D.
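Theorem 15.8.4. A function $u \in C(U)$ satisfies the mean–value property in $U$ iff it is harmonic.

Proof. Suppose $\overline{B}(x_0; R) \subset U$ and let
\[
\varphi(r) = \frac{1}{\sigma_{n-1}(\mathbb{S}^{n-1})} \int_{\mathbb{S}^{n-1}} u(x_0 + rz)\, \sigma_{n-1}(dz), \qquad 0 < r < R.
\]
If $u \in C^2$ then $\nabla u$ is bounded on any closed ball contained in $U$. Consequently, $\varphi$ is differentiable and the derivative can be taken inside the integral. From Stokes' theorem,
\[
\begin{aligned}
\varphi'(r) &= \frac{1}{\sigma_{n-1}(\mathbb{S}^{n-1})} \int_{\mathbb{S}^{n-1}} \nabla u(x_0 + rz) \cdot z\, \sigma_{n-1}(dz) \\
&= \frac{r}{\sigma_{n-1}(\mathbb{S}^{n-1})} \int_{B(0;1)} (\nabla \cdot \nabla u)(x_0 + ry)\, dy
\end{aligned}
\tag{15.42}
\]
whenever $\overline{B}(x_0; r) \subset U$. Hence, if $u$ satisfies the mean–value property or if $u$ is harmonic, then $\varphi'(r) \equiv 0$ for all $0 < r < R$.

Suppose $u$ satisfies the mean–value property. By Theorem 15.8.2, $u \in C^\infty(U)$. If $\triangle u \not\equiv 0$ then, without loss of generality, $\triangle u(x_0) > 0$ for some $x_0 \in U$, and so $\triangle u > 0$ in a ball $B(x_0; R) \subset U$. Then $\int_{B(x_0;r)} \triangle u(y)\, dy > 0$ for $0 < r < R$, which contradicts $\varphi'(r) = 0$.

Conversely, if $u \in C^2(U)$ is harmonic, then $\varphi' \equiv 0$; it follows that $\varphi$ is constant and
\[
\varphi(r) = \lim_{s\to 0+} \varphi(s) = u(x_0)
\]
by dominated convergence. Therefore $u$ satisfies the mean–value property.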
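The equivalence in Theorem 15.8.4 can be observed numerically in the plane. The sketch below is an illustration under assumed names (`circle_average`, `harmonic`, `not_harmonic`): it averages a function over a circle and compares the average with the value at the center. For the harmonic function $e^x \cos y$ the two agree, while for $x^2 + y^2$, whose Laplacian is $4$, the average exceeds the center value by exactly $r^2$.

```python
import math


def circle_average(u, x0, y0, r, n=2000):
    """Average of u over the circle of radius r centered at (x0, y0),

    computed with n equally spaced sample points.
    """
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        total += u(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / n


def harmonic(x, y):
    """Real part of exp(x + iy); its Laplacian vanishes identically."""
    return math.exp(x) * math.cos(y)


def not_harmonic(x, y):
    """Laplacian equals 4, so the mean-value property must fail."""
    return x * x + y * y


x0, y0, r = 0.3, -0.2, 0.7
print("harmonic:     center =", harmonic(x0, y0),
      " average =", circle_average(harmonic, x0, y0, r))
print("not harmonic: center =", not_harmonic(x0, y0),
      " average =", circle_average(not_harmonic, x0, y0, r))
```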
Corollary 15.8.5. Suppose $\{u_n : n \in \mathbb{N}\}$ is a sequence of harmonic functions on an open set $\Omega \subset \mathbb{R}^d$. If $u_n$ converges to a function $u$ uniformly on compact subsets of $\Omega$, then $u$ is harmonic on $\Omega$.

Proof. Fix $x_0 \in \Omega$. For any $r > 0$ such that $\overline{B}(x_0; r) \subset \Omega$, each $u_n$ satisfies the mean–value property. From the hypothesis of the Corollary we have that $u$ is continuous on $\Omega$ and $\{u_n : n \in \mathbb{N}\}$ is uniformly bounded on $\overline{B}(x_0; r)$. By dominated convergence, we obtain that
\[
u(x_0) = \lim_n u_n(x_0) = \lim_n \frac{1}{\omega_d r^d} \int_{B(x_0;r)} u_n(x)\, dx = \frac{1}{\omega_d r^d} \int_{B(x_0;r)} u(x)\, dx.
\]
Therefore, $u$ also satisfies the mean–value property, and so it is harmonic.

Theorem 15.8.6. (Maximum principle) Suppose $u$ is a real–valued continuous function on $\overline{U}$ that is harmonic on $U$. If $u$ attains a maximum on $\overline{U}$, then
\[
\max_{x\in\overline{U}} u(x) = \max_{x\in\partial U} u(x).
\]
If $U$ is connected and $u$ attains a local maximum in $U$, then $u \equiv \max_{x\in\overline{U}} u(x)$. A similar implication holds for $\min$ by using $-u$ instead.

Proof. Suppose that $u$ attains its maximum at some point $x_0 \in U$, that is, $u(x_0) \ge u(y)$ for all $y \in U$. As
\[
u(x_0) = \frac{1}{r^n \omega_n} \int_{B(x_0;r)} u(y)\, dy
\]
for any ball $B(x_0; r) \subset U$, it follows from Corollary 4.2.5(ii) that $u \equiv u(x_0)$ on $B(x_0; r)$. Hence the set $\{x \in U : u(x) = u(x_0)\}$ is both closed and open in $U$. Therefore, $u \equiv u(x_0)$ in the connected component of $x_0$ in $U$.

We will use the results on harmonic functions discussed above to study harmonic functions in the unit disc of the complex plane.
If $f \in C(\mathbb{S}^1)$, and $P_r$ is the Poisson kernel in the unit disc then, as in Example 11.5.17 (with $\mu(d\theta) = f(e^{i\theta})\, d\theta$ on $\mathbb{S}^1$), $u = P_r * f$ is harmonic on $B(0; 1)$. Let $H_f(z) = P_r * f(\theta)$ for $z = re^{i\theta} \in B(0; 1)$ and $H_f(z) = f(e^{i\theta})$ if $|z| = 1$. Then, $H_f$ is bounded in $\overline{B}(0; 1)$ and
\[
\|H_f\|_{u(\overline{B}(0;1))} \le \|f\|_{u(\mathbb{S}^1)}. \tag{15.43}
\]
For each $e_k(\theta) = e^{ik\theta}$, $k \in \mathbb{Z}$, we have that $H_{e_k}(re^{i\theta}) = r^{|k|} e_k(\theta)$; consequently, $H_g \in C(\overline{B}(0; 1))$ and $H_g(e^{i\theta}) = g(e^{i\theta})$ for any trigonometric polynomial $g$. As trigonometric polynomials are dense in $C(\mathbb{S}^1)$, we conclude from (15.43) that $H_f \in C(\overline{B}(0; 1))$ and $H_f(e^{i\theta}) = f(e^{i\theta})$. In the remainder of this section, we use $P[f]$ to denote the function $H_f$ on $\overline{B}(0; 1)$ introduced above.
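The identity $H_{e_k}(re^{i\theta}) = r^{|k|} e_k(\theta)$ can be checked numerically. The sketch below assumes the usual normalization $P_r(\theta) = \frac{1 - r^2}{1 - 2r\cos\theta + r^2}$ with convolution taken against $\frac{d\theta}{2\pi}$; the notes define the kernel in Example 11.5.17, which is not reproduced here, so this normalization and the helper names are assumptions.

```python
import cmath
import math


def poisson_kernel(r, theta):
    """Poisson kernel of the unit disc (normalization assumed, see lead-in)."""
    return (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(theta) + r * r)


def poisson_integral(f, r, theta, n=4096):
    """(P_r * f)(theta) = (1 / 2pi) * int_0^{2pi} P_r(theta - s) f(s) ds,

    approximated with n equally spaced nodes on the circle.
    """
    total = 0j
    for j in range(n):
        s = 2.0 * math.pi * j / n
        total += poisson_kernel(r, theta - s) * f(s)
    return total / n


r, theta = 0.6, 1.1
for k in (3, -2, 0):
    e_k = lambda s, k=k: cmath.exp(1j * k * s)
    numeric = poisson_integral(e_k, r, theta)
    expected = r ** abs(k) * cmath.exp(1j * k * theta)
    print(f"k = {k:2d}   |numeric - r^|k| e_k(theta)| = {abs(numeric - expected):.2e}")
```

The case $k = 0$ also confirms that the kernel has mass one with respect to $\frac{d\theta}{2\pi}$, which is what makes the bound (15.43) work.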
The next result shows that any continuous function $f$ on $\overline{B}(0; 1)$ that is harmonic on $B(0; 1)$ is obtained by applying the Poisson kernel to the restriction of $f$ to $\partial B(0; 1) = \mathbb{S}^1$. For any function $u$ on $B(0; 1)$, we use $u_r$ to denote the map on $\mathbb{S}^1$ given by $\theta \mapsto u(re^{i\theta})$. For any function $f$ on $\mathbb{S}^1$.
Theorem 15.8.7. If $u \in C(\overline{B}(0; 1))$ and $u$ is harmonic on $B(0; 1)$, then $u = P[u]$.

Proof. If $v = P[u]$, then $v \in C(\overline{B}(0; 1))$, $v$ is harmonic on $B(0; 1)$ and $v = u$ on $\mathbb{S}^1$. The maximum principle asserts that $v - u \equiv 0$.
Corollary 15.8.8. Suppose $f$ is harmonic on $B(0; 1)$. For any $1 \le p \le \infty$ the map $r \mapsto \|f_r\|_{L_p(\mathbb{S}^1)}$ is nondecreasing on $(0, 1)$.

Proof. As $f$ is harmonic on $B(0; 1)$, so is $f_r(z) = f(rz)$; moreover, $f_r \in C(\overline{B}(0; 1))$. Theorem 15.8.7 shows that for any $0 \le r, \rho < 1$, $f_{r\rho}(\theta) = f(r\rho e^{i\theta}) = f_r(\rho e^{i\theta}) = P_\rho * f_r(\theta)$. Therefore, by Jensen's inequality (or the generalized Minkowski inequality)
\[
\|f_{r\rho}\|_{L_p(\mathbb{S}^1)} \le \|P_\rho\|_{L_1(\mathbb{S}^1)} \|f_r\|_{L_p(\mathbb{S}^1)} = \|f_r\|_{L_p(\mathbb{S}^1)}.
\]
This concludes our proof.

15.9. Exercises
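Exercise 15.9.1. Suppose that $\widehat{\mu}(t)$ is the characteristic function of a finite positive measure $\mu$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Show that
(a) $\widehat{\mu}(-t) = \overline{\widehat{\mu}(t)}$.
(b) $\widehat{\mu}$ is uniformly continuous and $|\widehat{\mu}(t)| \le \widehat{\mu}(0) = \mu(\mathbb{R}^n)$.
(c) $\widehat{\mu}$ is positive definite, i.e., $\sum_{k,j=1}^{m} \widehat{\mu}(t_k - t_j)\, z_k \overline{z_j} \ge 0$ for all $t_j \in \mathbb{R}^n$ and $z_j \in \mathbb{C}$, $j = 1, \ldots, m$.
(d) For any $g \in L_1(\mathbb{R}^n)$, show that $\int g(x)\, \widehat{\mu}(x - y)\, \overline{g(y)}\, dx \otimes dy \ge 0$.
Remark 15.9.2. In Section 18.5 we will show that a function $\varphi$ that satisfies conditions (b) and (c) of Exercise 15.9.1 is in fact the characteristic function of a finite measure $\mu$ on $\mathcal{B}(\mathbb{R}^n)$.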
Exercise 15.9.3. (Hamburger moment problem.) The question is whether, given a sequence of real numbers $\{m_n : n \in \mathbb{Z}_+\}$, $m_0 = 1$, there is a unique probability measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $m_n = \int x^n\, \mu(dx)$. Suppose there is one such probability measure, and define $M_n := \int |x|^n\, \mu(dx)$. Assume
\[
r := \limsup_n \frac{1}{2n} \sqrt[2n]{M_{2n}} < \infty.
\]
Show that
(a) $r = \limsup_n \frac{1}{n} \sqrt[n]{M_n}$. (Hint: By Cauchy–Schwarz, $M_{2n+1} \le (M_{2n+2} M_{2n})^{1/2}$.)
(b) For $|z| < \frac{1}{re}$, $\int e^{|xz|}\, \mu(dx) < \infty$. (Hint: Given $\varepsilon > 0$, there is $N \in \mathbb{N}$ such that $M_n < n^n (r + \varepsilon)^n$ whenever $n \ge N$. Thus
\[
\frac{|z|^n M_n}{n!} \le |e z (r + \varepsilon)|^n
\]
and so, $\sum_{n\ge 0} \frac{M_n z^n}{n!}$ converges for $|z| < 1/(re)$.)
(c) $\widehat{\mu}$ admits an analytic extension to the strip $D = \{z \in \mathbb{C} : |\mathrm{Im}(z)| < \frac{1}{re}\}$. Conclude that $\mu$ must be unique.
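Exercise 15.9.4. Suppose $f \in L_1$ and $h \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$. Show that
(a) If $g(x) = f(x) e^{2\pi i x\cdot h}$ then $\widehat{g}(y) = \widehat{f}(y - h)$.
(b) If $g(x) = f(x - h)$ then $\widehat{g}(y) = \widehat{f}(y) e^{-2\pi i h\cdot y}$.
(c) If $g(x) = \overline{f(-x)}$, then $\widehat{g}(y) = \overline{\widehat{f}(y)}$.
(d) If $g(x) = f(x/\alpha)$ and $\alpha > 0$, then $\widehat{g}(y) = \alpha^n \widehat{f}(\alpha y)$.

Exercise 15.9.5. Suppose $\mu$ is a finite positive measure on $\mathbb{R}$. If $\mathrm{supp}\, \mu \subset h\mathbb{Z}$ for some $h > 0$, show that $\tilde{\mu}$ is $2\pi/h$–periodic.
Exercise 15.9.6. Show that for each $t > 0$, the map $U_t : f \mapsto \mathbf{1}_{(-\infty,0]} f + \tau_t(\mathbf{1}_{(0,\infty)} f)$ is a linear isometry on $L_p(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda_1)$ for all $1 \le p < \infty$ but $U_t(L_p(\mathbb{R})) \ne L_p(\mathbb{R})$. Show that $\|I - U_t\| \to 0$ as $t \searrow 0$ in operator norm.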
Exercise 15.9.7. Suppose $h$ is a nontrivial continuous linear functional on $L_1(\mathbb{R}^n, \lambda_n)$ such that $h(f * g) = h(f) \cdot h(g)$ for all $f, g$ in $L_1(\mathbb{R}^n, \lambda_n)$. Show that there is a bounded continuous function $\beta : \mathbb{R}^n \to \mathbb{C}$ such that
\[
h(f) = \int \beta(x) f(x)\, dx, \qquad f \in L_1(\mathbb{R}^n, \lambda_n),
\]
\[
\beta(x + y) = \beta(x) \beta(y), \qquad (x, y) \in \mathbb{R}^n \times \mathbb{R}^n.
\]
Conclude that there is $t \in \mathbb{R}^n$ such that $\beta(x) = e^{-ix\cdot t}$ and so, $h(f) = \widehat{f}(t/2\pi)$ for some $t \in \mathbb{R}^n$. That is, evaluations of the Fourier transform are the only complex–valued continuous algebraic isomorphisms in the vector ring $L_1(\mathbb{R}^n, \lambda_n)$ with convolution as the multiplication operation. (Hint: Since $L_1(\mathbb{R}^n, \lambda_n)^* = L_\infty(\mathbb{R}^n, \lambda_n)$, $h(f) = \int f \beta\, d\lambda_n$ for some $\beta \in L_\infty(\mathbb{R}^n, \lambda_n)$. Choose $f$ such that $h(f) \ne 0$ and use Fubini's theorem to show that $h(f) \beta(y) = h(\tau_y f)$ a.s., where $\tau_y$ is the translation operator. Deduce that $\beta$ has a continuous representation.)
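Exercise 15.9.8. Let $f \in L_1(\mathbb{R}^n)$ and assume that $\int |x^\alpha| |f(x)|\, dx < \infty$ for all $\alpha \in \mathbb{Z}^n_+$ such that $|\alpha| \le m$. Show that $\widehat{f} \in C^m(\mathbb{R}^n)$.
Exercise 15.9.9. Suppose that $f \in L_1(\mathbb{R})$ is differentiable and that $f' \in L_1(\mathbb{R})$. Show that $\lim_{|x|\to\infty} f(x) = 0$ and that $f(x) = \int_{(-\infty,x]} f'(s)\, ds$ for all $x \in \mathbb{R}$. Show that $\widehat{f'}(t) = 2\pi i t \widehat{f}(t)$. Extend the conclusion to functions in $\mathbb{R}^n$ by proving that if $D^\alpha f \in C(\mathbb{R}^n) \cap L_1(\mathbb{R}^n)$ for all $\alpha \in \mathbb{Z}^n_+$ with $|\alpha| \le m$ then, $\widehat{D^\alpha f}(t) = (2\pi i t)^\alpha \widehat{f}(t)$ whenever $|\alpha| \le m$.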

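Exercise 15.9.10. The Bessel kernel of order $\alpha > 0$ is the function on $\mathbb{R}^n$ defined by
\[
G_\alpha(x) = \frac{1}{(4\pi)^{n/2}\, \Gamma\bigl(\frac{\alpha}{2}\bigr)} \int_0^\infty \exp\Bigl( -\frac{|x|^2}{4t} - t \Bigr)\, t^{\frac{\alpha - n}{2} - 1}\, dt.
\]
(a) Show that $\{G_\alpha : \alpha > 0\} \subset L_1(\mathbb{R}^n, \lambda_n)$ and $\int G_\alpha\, d\lambda_n = 1$.
(b) Show that $\widehat{G_\alpha}(\xi) = (1 + |2\pi\xi|^2)^{-\frac{\alpha}{2}}$, and that $G_\alpha * G_\beta = G_{\alpha+\beta}$.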

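Exercise 15.9.11. The gamma distribution with parameters $u > 0$ and $\theta > 0$ is defined as the Borel measure $\gamma_{u,\theta}$ on $\mathbb{R}$ with
\[
\frac{d\gamma_{u,\theta}}{dx}(x) = \frac{\theta^u}{\Gamma(u)}\, x^{u-1} e^{-\theta x}\, \mathbf{1}_{(0,\infty)}(x).
\]
The cases $\gamma_{1,\theta}$ and $\gamma_{1/2,1/2}$ correspond to the exponential $E(\theta)$ and $\chi^2_1$ distributions respectively. For $n \in \mathbb{N}$, $\gamma_{n,\theta}$ is known as the Erlang distribution $E(\theta, n)$. Show that
(a) $\int e^{sx}\, \gamma_{u,\theta}(dx) = \theta^u (\theta - s)^{-u} < \infty$ for all $s \in (-\infty, \theta)$.
(b) The map $G_{u,\theta}(z) = \int e^{zx}\, \gamma_{u,\theta}(dx)$ is analytic on $(-\infty, \theta) \times \mathbb{R} \subset \mathbb{C}$.
(c) $G_{u,\theta}(z) = G_{u,1}\bigl(\frac{z}{\theta}\bigr) = 1 + \sum_{n\ge 1} \frac{\Gamma(u+n)}{\Gamma(u)} \frac{z^n}{n!\, \theta^n} = \theta^u (\theta - z)^{-u}$ for all $|z| < \theta$.
(d) The characteristic function is $\widehat{\gamma_{u,\theta}}(s) = \theta^u (\theta - is)^{-u}$.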

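Exercise 15.9.12. Let $\gamma_{u,\theta}$ be the Gamma measure on $\bigl( (0, \infty), \mathcal{B}((0, \infty)) \bigr)$. Show that the measure on $(0, \infty)$ induced by the function $Y : x \mapsto \frac{1}{x}$ is absolutely continuous w.r.t. the Lebesgue measure on $(0, \infty)$ with Radon–Nikodym derivative given by
\[
f_{u,\theta}(y) = \frac{\theta^u}{\Gamma(u)}\, y^{-(u+1)} e^{-\frac{\theta}{y}}\, \mathbf{1}_{(0,\infty)}(y).
\]
This induced measure is called the inverse–gamma distribution, and is denoted by $\mathrm{Ig}(u, \theta)$.
Exercise 15.9.13. Let $\mu$ and $\nu$ be two complex measures on $\mathcal{B}(\mathbb{R}^n)$. Assume that $\mu \ll \lambda_n$ and that $f = \frac{d\mu}{d\lambda_n}$. Show that $\mu * \nu \ll \lambda_n$ and that $\frac{d(\mu * \nu)}{d\lambda_n}(x) = f * \nu(x) := \int_{\mathbb{R}^n} f(x - y)\, \nu(dy)$.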

Exercise 15.9.14. Let $A, B \in \mathcal{B}(\mathbb{R}^n)$ be such that $\lambda(A), \lambda(B) > 0$. Show that $A + B$ contains an open ball. (Hint: Without loss of generality assume that $A$ and $B$ are compact. Then $\mathbf{1}_A * \mathbf{1}_B$ is continuous and not identically zero.)
Exercise 15.9.15. For each $a > 0$ define $f_a(x) = \mathbf{1}_{[-a,a]} * \mathbf{1}_{[-1,1]}(x)$. Then $\widehat{f_a}(t) = \frac{1}{(\pi t)^2} \sin(2\pi a t) \sin(2\pi t) \in L_1(\mathbb{R})$. Show that $\|f_a\|_u = 2$ for $a \ge 1$ and that $\lim_{a\to\infty} \|\widehat{f_a}\|_1 = \infty$. Conclude from the open mapping theorem that the Fourier transform map $f \mapsto \widehat{f}$ from $L_1$ to $C_0$ is not surjective.
Exercise 15.9.16. If $\gamma_{u,\theta}$ denotes the gamma measure with parameters $u > 0$, $\theta > 0$, then $\gamma_{u_1,\theta} * \gamma_{u_2,\theta} = \gamma_{u_1+u_2,\theta}$. In particular (a) the convolution of the exponential distribution $E(\theta)$ $n$ times with itself is the Erlang $E(\theta, n)$ distribution; (b) the convolution $\chi^2_1 * \chi^2_1$ is the exponential distribution $E(1/2)$.
Exercise 15.9.17. Suppose that µ is a finite Borel measure on Rn such that µ ∗ ν = ν for
some Borel measure ν not identically zero. Show that µ = δ0 .
Exercise 15.9.18. Let $U$ be the renewal measure associated to a positive Radon measure $\mu$ on $\mathbb{R}_+$. Let $\lambda$ be Lebesgue's measure on $[0, \infty)$. Suppose $\nu$ is another Radon measure on $[0, \infty)$, and $z$ is a measurable function on $[0, \infty)$ bounded in compact sets. Show that
(a) $U$ is a Radon measure on $\bigl( [0, \infty), \mathcal{B}([0, \infty)) \bigr)$. (Hint: For finite $\mu$ consider $\check{\mu}(s) = \int_{[0,\infty)} e^{-sx}\, \mu(dx)$. For infinite $\mu$, consider $\mu_t(dx) = \mathbf{1}_{[0,t]}(x)\, \mu(dx)$ and check that $\mu_t^{*n}[0, s] = \mu^{*n}[0, s]$ for all $n$ and $0 \le s \le t$.)
(b) The function $Z := U * z$ is the unique solution to the equation
\[
Z(t) = z(t) + \mu * Z(t), \qquad Z(0) = \frac{1}{1 - \mu(\{0\})}\, z(0)
\]
that is bounded in compact sets.
(c) $\lambda = U * \nu$ iff $\nu(dx) = (1 - \mu([0, x])) \cdot \lambda(dx)$.

Exercise 15.9.19. Suppose $\{K_\varepsilon : \varepsilon > 0\}$ is a family of good kernels in $\mathbb{R}^n$ dominated by a nonnegative radial decreasing function $\psi$ as in (15.13). Show that $K_\varepsilon * \mu \to D\mu$ $\lambda_n$–a.s. as $\varepsilon \to 0$, where $D\mu$ is the symmetric derivative of $\mu$ with respect to $\lambda_n$.
Exercise 15.9.20. For $f \in L_1(\mathbb{S}^1)$ with $f \sim \sum_{n\in\mathbb{Z}} c_n(f) e^{inx}$ define the $2\pi$–periodic function $F$ as $F(x) = \int_0^x f(t)\, dt - c_0(f)\, x$ for $0 \le x < 2\pi$. Show that
(1) $F \in L_1(\mathbb{S}^1)$ and $F$ is absolutely continuous.
(2) $F \sim C_0 + \sum_{|n|\ge 1} \frac{1}{in} c_n(f) e^{inx}$ for some constant $C_0$. (Hint: integration by parts.)
(3) If in addition $f \in L_2(\mathbb{S}^1)$, show that $S_n F$ converges uniformly to $F$.

Exercise 15.9.21. Suppose $f \in L_1(\mathbb{S}^1)$. If $f \in C^k(\mathbb{S}^1)$, show that $c_n(f) = o\bigl( \frac{1}{|n|^k} \bigr)$ as $|n| \to \infty$, in which case $S_n f$ converges to $f$ uniformly. (Hint: Integration by parts.) If $f$ is of finite variation over $\mathbb{S}^1$, show that $c_n(f) = O\bigl( \frac{1}{|n|} \bigr)$.
Exercise 15.9.22. Suppose that $\mu$ is a probability measure on $\mathbb{R}$ with $\mu(h\mathbb{Z}) = 1$. Show that $\mu(\{x\}) = \frac{h}{2\pi} \int_{-\pi/h}^{\pi/h} e^{-ixt}\, \widehat{\mu}(t)\, dt$ for all $x \in h\mathbb{Z}$.

Exercise 15.9.23. Assume $g \in L_2(\lambda_n)$. If $\mu$ is a complex Borel measure, show that $\mathcal{F}(g * \mu) = \widehat{g}\, \widehat{\mu}$. (Hint: Consider first functions $g \in L_1 \cap L_2$.)
Exercise 15.9.24. Let g ∈ L2 (λn ) and suppose that gb ∈ L∞ (Rn , λn ). If f ∈ L2 (λn ), show
that f ∗ g ∈ L2 (λn ) and F(f ∗ g) = fbgb. (Hint: Consider first functions f ∈ L1 ∩ L2 .)
Exercise 15.9.25. Show that D is dense in Lp (µ), 1 ≤ p < ∞, for any regular measure
µ on B(Rn ). (Hint: Use the density of C00 (Rn ) combined with Stone–Weierstrass theorem
and Exercise 13.7.4.)
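Exercise 15.9.26. Suppose $\phi \in L_p$ admits a partial derivative $\partial_{x_j} \phi$ at every point $x \in \mathbb{R}^n$ and that $|\partial_{x_j} \phi(x)| \le \frac{A}{(1 + |x|^2)^{(n+\alpha)/2}}$ for some constants $A > 0$ and $\alpha > 0$. Show that $\partial_{x_j} \phi$ is also the $L_p$ partial derivative of $\phi$.
Exercise 15.9.27. Show that the shift operator $\tau_h : \phi(x) \mapsto \phi(x - h)$ is continuous on $\mathcal{S}$ and that $\tau_h \phi \to \phi$ in $\mathcal{S}$ as $h \to 0$.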
Exercise 15.9.28. For any tempered distribution L show that
(a) uL = L ◦ ι, where ι : D(Rn ) → S is the inclusion map, is a distribution in D(Rn ).
(b) There is a unique uL ∈ D∗ (Rn ) such that uL = L ◦ ι.
(c) For any α ∈ Zn+ , polynomial P , and g ∈ S the following are also tempered distri-
butions: Dα L(φ) := (−1)|α| L(Dα φ), P · L(φ) := L(P φ), g · L(φ) := L(gφ).
Exercise 15.9.29. Show that (15.39) and (15.40) are equivalent.
Exercise 15.9.30. Find all radial functions in $\mathbb{R}^n$ that are harmonic. (Hint: Suppose $u(x) = v(r)$, where $r = |x| = (x_1^2 + \ldots + x_n^2)^{1/2}$. Show that $\triangle u = v''(r) + \frac{n-1}{r} v'(r)$ on $\mathbb{R}^n \setminus \{0\}$.)
Chapter 16

Countable product of
probability spaces

Products of measurable spaces are common in Probability theory as they provide the natural setting for the study of sequences of random variables and, more generally, the construction of random processes.

16.1. Product of measurable spaces

Let $\{(E_\alpha, \mathcal{F}_\alpha, \mu_\alpha) : \alpha \in A\}$ be a collection of probability spaces. We equip $E := \prod_\alpha E_\alpha$ with the product σ–algebra $\mathcal{F} := \bigotimes_{\alpha\in A} \mathcal{F}_\alpha := \sigma(p_\alpha : \alpha \in A)$. The main problem is to assign a measure on the product space $(E, \mathcal{F})$ that is compatible with the measure structure of each of the factors $(E_\alpha, \mathcal{F}_\alpha, \mu_\alpha)$.
Recall that for a collection $\{X_\alpha : \alpha \in A\}$ of topological spaces, the product topology $\tau_p$ is the minimal topology for which each projection $p_\alpha$ is continuous. This means that the product σ–algebra $\bigotimes_\alpha \mathcal{B}(X_\alpha)$, where $\mathcal{B}(X_\alpha)$ is the Borel σ–algebra generated by the topology $\tau_\alpha$ of $X_\alpha$, is contained in the Borel σ–algebra $\mathcal{B}\bigl( \prod_\alpha X_\alpha \bigr)$ generated by $\tau_p$. The converse inclusion is not true in general. By Theorem 3.9.5, if $A$ is at most countable and each $X_\alpha$ is second countable then both σ–fields coincide.
Throughout this chapter, we will consider countable products of nice measurable spaces.

Example 16.1.1. A typical example, albeit theoretical, of a product space in Probability theory is the idealization of tossing a fair coin infinitely many times. Here, $A = \mathbb{N}$, $E_n = \{Head, Tail\}$ for each $n \in \mathbb{N}$, and $\mathcal{F}_n = \mathcal{P}(\{Head, Tail\})$.

16.2. Independence
The concept of independence plays an important role in Probability and Statistics. It is related to the notion that one can repeat an experiment whose outcomes neither influence nor are influenced by the outcomes of other experiments.

Definition 16.2.1. Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The sets in a collection $\mathcal{C} \subset \mathcal{F}$ are mutually independent if for any finite sub-collection $\mathcal{D} \subset \mathcal{C}$,
\[
\mathbb{P}\Bigl[ \bigcap_{C\in\mathcal{D}} C \Bigr] = \prod_{C\in\mathcal{D}} \mathbb{P}[C].
\]
The collections in $\{\mathcal{C}_t \subset \mathcal{F} : t \in T\}$ are independent if for any finite $I \subset T$ and any choice $C_i \in \mathcal{C}_i$, $i \in I$,
\[
\mathbb{P}\Bigl[ \bigcap_{i\in I} C_i \Bigr] = \prod_{i\in I} \mathbb{P}[C_i].
\]
A collection $\{X_t : t \in T\}$ of measurable functions is said to be independent if the σ–algebras in $\{\sigma(X_t) : t \in T\}$ are independent. A collection $\{X_t : t \in T\}$ of independent measurable functions with values in a common space is said to be independent identically distributed, abbreviated as i.i.d., if all $X_t$ are equal in law.

Example 16.2.2. Consider $([0, 1], \mathcal{B}([0, 1]), \lambda)$. The sets $A = [0, \frac{1}{2}]$ and $B = [\frac{1}{4}, \frac{3}{4}]$ are independent, for $\lambda(A \cap B) = \frac{1}{4} = \lambda(A) \lambda(B)$.
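The computation in Example 16.2.2 can be reproduced with exact rational arithmetic. The sketch below uses hypothetical helpers `length` and `intersect` for closed intervals in $[0, 1]$, and adds a third interval $C = [0, \frac{1}{4}]$ (not in the original example) to show a pair that is not independent.

```python
from fractions import Fraction


def length(interval):
    """Lebesgue measure of a closed interval given as (left, right)."""
    left, right = interval
    return max(Fraction(0), right - left)


def intersect(first, second):
    """Intersection of two closed intervals (possibly empty)."""
    return (max(first[0], second[0]), min(first[1], second[1]))


A = (Fraction(0), Fraction(1, 2))
B = (Fraction(1, 4), Fraction(3, 4))
C = (Fraction(0), Fraction(1, 4))  # extra interval, contained in A

print("lambda(A n B) =", length(intersect(A, B)),
      "  lambda(A) lambda(B) =", length(A) * length(B))
print("lambda(A n C) =", length(intersect(A, C)),
      "  lambda(A) lambda(C) =", length(A) * length(C))
```

Since $C \subset A$, knowing that a point lies in $C$ determines that it lies in $A$, which is why the product rule fails for that pair.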

Definition 16.2.3. A measurable function X such that P[X = 0] = p, P[X = 1] = 1 − p

is called a Bernoulli random variable; a measurable function θ is uniformly distributed
on [0, 1], denoted θ ∼ U (0, 1), if its law is the Lebesgue measure on [0, 1].

Recall that every $x \in [0, 1]$ has a unique binary expansion $x = \sum_{n\ge 1} r_n / 2^n$ where $r_n \in \{0, 1\}$, and $\sum_{n\ge 1} r_n = \infty$ for $x > 0$. Observe that for each $n \in \mathbb{N}$, the $n$–th bit map $x \mapsto r_n(x)$ defines a measurable function from $([0, 1], \mathcal{B}([0, 1]))$ to $(\{0, 1\}, \mathcal{P}(\{0, 1\}))$. Therefore, the map $\beta : [0, 1] \to \{0, 1\}^{\mathbb{N}}$ given by $x \mapsto (r_n(x))$ is measurable. The next result is a mathematical formulation of tossing a fair coin.

Lemma 16.2.4. Suppose $\theta \sim U[0, 1]$, and let $\{X_n = r_n \circ \theta\}$ be its binary expansion. Then, $\{X_n\}$ is an i.i.d. Bernoulli sequence with rate $p = \frac{1}{2}$. Conversely, if $(X_n)$ is an i.i.d. Bernoulli sequence with rate $p = \frac{1}{2}$, then $\theta = \sum_{n\ge 1} 2^{-n} X_n \sim U[0, 1]$.
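Proof. Suppose that $\theta \sim U(0, 1)$. For any $N \in \mathbb{N}$ and $k_1, \ldots, k_N \in \{0, 1\}$,
\[
\begin{aligned}
\bigcap_{j=1}^{N} \{x \in (0, 1] : r_j(x) = k_j\} &= \Bigl( \sum_{j=1}^{N} \frac{k_j}{2^j},\ \sum_{j=1}^{N} \frac{k_j}{2^j} + \frac{1}{2^N} \Bigr] \\
\{x \in (0, 1] : r_N(x) = 0\} &= \bigcup_{j=0}^{2^{N-1}-1} \Bigl( \frac{2j}{2^N},\ \frac{2j+1}{2^N} \Bigr] \\
\{x \in (0, 1] : r_N(x) = 1\} &= \bigcup_{j=0}^{2^{N-1}-1} \Bigl( \frac{2j+1}{2^N},\ \frac{2(j+1)}{2^N} \Bigr]
\end{aligned}
\]
It follows immediately that $\mathbb{P}\bigl[ \bigcap_{j=1}^{N} \{X_j = k_j\} \bigr] = \frac{1}{2^N} = \prod_{j=1}^{N} \mathbb{P}[X_j = k_j]$. Hence $\{X_n\}$ is an i.i.d. Bernoulli sequence with rate $\frac{1}{2}$.
Conversely, suppose $\{X_n : n \ge 1\}$ is an i.i.d. Bernoulli sequence with rate $\frac{1}{2}$. If $\tilde{\theta} \sim U(0, 1)$, then the first part shows that the sequence of bits $\{\tilde{X}_n\} \stackrel{d}{=} \{X_n\}$. Therefore,
\[
\theta := \sum_{n\ge 1} 2^{-n} X_n \stackrel{d}{=} \sum_{n\ge 1} 2^{-n} \tilde{X}_n = \tilde{\theta}
\]
since $\theta$ is a measurable function of $\{X_n\}$.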
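The dyadic interval identities in the proof can be verified exactly for a small $N$. The sketch below is only an illustration: the helper `bit` implements $r_n$ via the formula $(\lceil 2^n x \rceil - 1) \bmod 2$, which is an assumed closed form for the non-terminating expansion on $(0, 1]$, and the dictionary `cells` (a hypothetical name) stores the Lebesgue measure of each set $\bigcap_j \{r_j = k_j\}$.

```python
import math
from fractions import Fraction
from itertools import product


def bit(x, n):
    """n-th binary digit of x in (0, 1], using the non-terminating expansion."""
    return (math.ceil(x * 2 ** n) - 1) % 2


N = 4
cells = {}
for pattern in product((0, 1), repeat=N):
    left = sum(Fraction(k, 2 ** (j + 1)) for j, k in enumerate(pattern))
    right = left + Fraction(1, 2 ** N)
    mid = (left + right) / 2
    # An interior point and the right endpoint both carry the prescribed digits.
    assert tuple(bit(mid, j) for j in range(1, N + 1)) == pattern
    assert tuple(bit(right, j) for j in range(1, N + 1)) == pattern
    cells[pattern] = right - left


def prob(event):
    """Lebesgue measure of the union of the cells whose pattern satisfies event."""
    return sum(size for pattern, size in cells.items() if event(pattern))


# Each digit is Bernoulli(1/2) and the joint law is the product of the marginals.
for j in range(N):
    print(f"P[X_{j + 1} = 1] =", prob(lambda p, j=j: p[j] == 1))
print("P[X_1 = 1, X_3 = 0] =", prob(lambda p: p[0] == 1 and p[2] == 0))
```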

One can generate a U [0, 1] i.i.d. sequence out of a single U [0, 1] random variable.

Lemma 16.2.5. There exist a sequence (fn ) of measurable functions on [0, 1] such that for
any θ ∼ U [0, 1], (fn (θ)) is an i.i.d sequence random variables with f1 (θ) ∼ U [0, 1].

Proof. Reorder the sequence $(r_m)$ of binary bit maps into a two–dimensional array $(h_{n,j} : n, j \in \mathbb{N})$,
and define the function $f_n := \sum_{j\ge 1} h_{n,j}/2^j$ on $[0,1]$ for each $n$. By Lemma 16.2.4,
$\{X_n = r_n \circ \theta\}$ forms an i.i.d. Bernoulli sequence with rate $p = \tfrac12$. Thus, writing $X_{n,j} = h_{n,j} \circ \theta$,
the collections $\sigma(X_{n,j} : j \ge 1)$, $n \in \mathbb{N}$, are independent. Again, by Lemma 16.2.4, it follows that $(f_n(\theta))$
is an i.i.d. sequence of $U[0,1]$ random variables.
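
The reordering step in this proof can be sketched as follows. The particular bijection $(n, j) \mapsto 2^{n-1}(2j-1)$ is an assumption chosen for illustration (the text does not fix one), and the names `index` and `f` are hypothetical. The function `f` returns only a finite truncation of $f_n$.

```python
from fractions import Fraction
from math import ceil


def bit(x, m):
    """m-th binary digit r_m(x) of x in (0, 1], non-terminating expansion."""
    return (ceil(x * 2**m) - 1) % 2


def index(n, j):
    """A bijection N x N -> N: every m >= 1 is uniquely 2^(n-1) * (2j - 1)."""
    return 2 ** (n - 1) * (2 * j - 1)


def f(n, x, depth=10):
    """Truncation of f_n(x) = sum_j h_{n,j}(x) / 2^j, with h_{n,j} = r_{index(n, j)}."""
    return sum(Fraction(bit(x, index(n, j)), 2**j) for j in range(1, depth + 1))


x = Fraction(5, 7)
print([f(n, x) for n in (1, 2, 3)])  # three numbers in [0, 1] built from disjoint bit sets
```

Because the index sets $\{index(n, j) : j \ge 1\}$ are disjoint for different $n$, the functions $f_n$ read disjoint families of bits, which is what yields independence.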

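Corollary 16.2.6. Suppose that $(S_n, \mathscr{S}_n, \mu_n)$, with $\mathscr{S}_n = \mathscr{B}(S_n)$, are Borel probability spaces. Then, there
is a map $F : ([0,1], \mathscr{B}([0,1]), \lambda) \to \big(\prod_n S_n, \bigotimes_n \mathscr{S}_n\big)$ such that the projections $p_n : s \mapsto s_n$
form an independent sequence of random variables on $\big(\prod_n S_n, \bigotimes_n \mathscr{S}_n, \mu\big)$, $\mu = \lambda \circ F^{-1}$,
with $p_n \stackrel{d}{=} \mu_n$.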

Proof. It suffices to assume that each $(S_n, \mathscr{S}_n) = ([0,1], \mathscr{B}([0,1]))$. Lemma 16.2.5 provides a
$U[0,1]$–distributed i.i.d. sequence $(f_n)$ of random variables defined on $[0,1]$. Theorem 4.6.4
shows that for each $n$, there is a map $T_n : [0,1] \to S_n$ such that $\lambda \circ T_n^{-1} = \mu_n$. The map $F$
given by $x \mapsto (T_n(f_n(x)))$ has the stated properties.
508 16. Countable product of probability spaces

16.3. Ionescu Tulcea’s Theorem

In this section we consider a countable collection of probability spaces. We study conditions
under which it is possible to define a single probability measure on the Cartesian product
of spaces that is compatible with the probability measure of each factor.
Definition 16.3.1. Suppose that $(S, \mathscr{S})$ and $(T, \mathscr{T})$ are measurable spaces. A kernel $\mu$
from $S$ to $T$ is a function $\mu : S \times \mathscr{T} \to \mathbb{R}_+$ such that
(i) For fixed $s \in S$, $\mu(s, \cdot)$ is a measure on $(T, \mathscr{T})$.
(ii) For any $B \in \mathscr{T}$, $\mu(\cdot, B)$ is $\mathscr{S}$–measurable.
If $\mu(\cdot, T) \equiv 1$, then $\mu$ is said to be a stochastic kernel.
Lemma 16.3.2. Suppose that $\mu$ is a stochastic kernel from $S$ to $T$ and that $\nu$ is a stochastic
kernel from $S \times T$ to $U$. For any measurable map $f : S \times T \times U \to \mathbb{R}_+$, the function
\[ (\mu \otimes \nu) f(s) = \int_T \int_U f(s,t,u)\, \nu((s,t); du)\, \mu(s; dt) \tag{16.1} \]
is $\mathscr{S}$–measurable. Moreover, (16.1) defines a stochastic kernel from $S$ to $T \times U$. In particular,
$(\mu \otimes \nu)(s, B \times U) = \mu(s, B)$ for all $B \in \mathscr{T}$.

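Proof. Consider first maps of the form $\mathbb{1}_{A \times B}$ with $A \in \mathscr{S}$ and $B \in \mathscr{T}$. Then, a
monotone class argument shows that $s \mapsto \mu(s, E_s)$, where $E_s = \{t : (s,t) \in E\}$, is $\mathscr{S}$–measurable
for any $E \in \mathscr{S} \otimes \mathscr{T}$. By linearity and monotone convergence, we extend the $\mathscr{S}$–measurability of
$\mu g(s) = \int_T g(s,t)\, \mu(s; dt)$ to arbitrary nonnegative $\mathscr{S} \otimes \mathscr{T}$–measurable functions $g$.

Next, consider maps of the form $\mathbb{1}_{A \times B \times C}$, with $A \in \mathscr{S}$, $B \in \mathscr{T}$ and $C \in \mathscr{U}$. Then a
monotone class argument shows that (16.1) is $\mathscr{S}$–measurable for functions of the form
$\mathbb{1}_D$, $D \in \mathscr{S} \otimes \mathscr{T} \otimes \mathscr{U}$. Linearity and monotone convergence give the extension
to arbitrary $f$.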

The following result establishes the existence of a unique probability measure on any
countable product of measurable spaces, where a compatible collection of stochastic kernels
that involve finite–dimensional projections is prescribed. No topological restrictions need
to be imposed on the spaces.
Theorem 16.3.3. (Ionescu Tulcea) For any measurable spaces $(S_n, \mathscr{S}_n)$ and stochastic
kernels $\mu_n$ from $S_1 \times \ldots \times S_{n-1}$ to $S_n$, where $\mu_1$ is a probability measure on $(S_1, \mathscr{S}_1)$, there exists
a unique probability measure on $\big(\prod_n S_n, \bigotimes_n \mathscr{S}_n\big)$ such that for any $k$, the law of the projection
$(p_1, \ldots, p_k) : \prod_n S_n \to S_1 \times \ldots \times S_k$ is $\mu_1 \otimes \ldots \otimes \mu_k$.
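Proof. Let $\Omega = \prod_n S_n$ and $\mathscr{F} = \bigotimes_n \mathscr{S}_n$. For each $n$, let $T_n = \prod_{j>n} S_j$, $\widetilde{\mathscr{F}}_n = \bigotimes_{j=1}^{n} \mathscr{S}_j$,
$\mathscr{F}_n = \widetilde{\mathscr{F}}_n \times T_n$ and $\mathcal{C} = \bigcup_n \mathscr{F}_n$. Observe that $\mathcal{C}$ is an algebra and that $\sigma(\mathcal{C}) = \mathscr{F}$. For each
$A \in \widetilde{\mathscr{F}}_n$, define
\[ \mu(A \times T_n) = (\mu_1 \otimes \ldots \otimes \mu_n)(A). \tag{16.2} \]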


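By Lemma 16.3.2, formula (16.2) defines an additive function on $\mathcal{C}$. By Carathéodory's
extension theorem, to show that $\mu$ extends to a measure on $\mathscr{F}$, it suffices to show that $\mu$ is
countably additive on $\mathcal{C}$. To that effect, we will show that $\mu$ is continuous at $\emptyset$. Let $\{C_n\} \subset \mathcal{C}$
be such that $C_n \searrow \emptyset$. Without loss of generality, we may assume that $C_n = A_n \times T_n$ for
some $A_n \in \widetilde{\mathscr{F}}_n$. Define
\[ f_k^n = \mu_{k+1} \otimes \ldots \otimes \mu_n \mathbb{1}_{A_n}, \quad 0 \le k < n; \qquad f_n^n = \mathbb{1}_{A_n}. \tag{16.3} \]
Clearly $f_k^n$ is $\widetilde{\mathscr{F}}_k$–measurable for each $k \le n$ and
\[ f_k^n = \mu_{k+1} f_{k+1}^n, \quad 0 \le k < n. \tag{16.4} \]
Since $\{C_n\}$ decreases, for $k$ fixed,
\[ f_k^n = \mu_{k+1} \otimes \ldots \otimes \mu_n \mathbb{1}_{A_n} = \mu_{k+1} \otimes \ldots \otimes \mu_n \otimes \mu_{n+1} \mathbb{1}_{A_n \times S_{n+1}} \ge \mu_{k+1} \otimes \ldots \otimes \mu_n \otimes \mu_{n+1} \mathbb{1}_{A_{n+1}} = f_k^{n+1}. \]
Hence $g_k = \lim_n f_k^n$ exists, $g_k$ is $\widetilde{\mathscr{F}}_k$–measurable and, by (16.4) and dominated convergence,
\[ g_k = \mu_{k+1} g_{k+1}, \quad k \ge 0. \tag{16.5} \]
From the definitions (16.2) and (16.3) we have
\[ \mu(C_n) = f_0^n \searrow g_0 \quad \text{as } n \to \infty. \]
If $g_0 \ne 0$, then by (16.5) there is $s_1 \in S_1$ such that $g_1(s_1) > 0$. By induction, we obtain a
sequence $s = (s_n) \in T_0$ such that $g_n(s_1, \ldots, s_n) > 0$ for each $n$. Thus,
\[ \mathbb{1}_{C_n}(s) = \mathbb{1}_{A_n}(s_1, \ldots, s_n) = f_n^n(s_1, \ldots, s_n) \ge g_n(s_1, \ldots, s_n) > 0. \]
Therefore, $s \in \bigcap_n C_n$, which contradicts $C_n \searrow \emptyset$.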
Corollary 16.3.4. For any sequence of probability spaces $(S_n, \mathscr{S}_n, \mu_n)$, there is a unique
probability measure $\mu = \bigotimes_n \mu_n$ on $\big(\prod_n S_n, \bigotimes_n \mathscr{S}_n\big)$ such that the projections $p_n : s \mapsto s_n$ are
independent and $p_n \stackrel{d}{=} \mu_n$.

Proof. Consider the measures µn as kernels from S1 × . . . × Sn−1 to Sn and apply Ionescu
Tulcea’s theorem.

16.4. 0–1 laws.

Let $(S, \mathscr{F})$ be a measurable space. For a given countable index set $I$, let $\{\mathscr{F}_i : i \in I\}$ be a
collection of sub–σ–algebras of $\mathscr{F}$, and let $\{X_i : i \in I\}$ be a collection of $\mathscr{F}$–measurable functions.
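Definition 16.4.1. The tail–σ–algebra of $\{\mathscr{F}_i : i \in I\}$ is defined as
\[ \tau\{\mathscr{F}_i : i \in I\} := \bigcap_{\substack{J \subset I \\ \#J < \infty}} \sigma\Big( \bigcup_{j \in I \setminus J} \mathscr{F}_j \Big). \]
The tail–σ–algebra of $\{X_i : i \in I\}$ is defined as $\tau\{\sigma(X_i) : i \in I\}$.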

Events in $\tau\{X_i : i \in I\}$ are those whose occurrence does not depend on any fixed
finite subfamily of $\{X_i : i \in I\}$.

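Lemma 16.4.2. Let $J_n \subset I$ be an increasing sequence of finite sets with $J_n \nearrow I$. Then
\[ \tau\{\mathscr{F}_i : i \in I\} = \bigcap_{n=1}^{\infty} \sigma\Big( \bigcup_{j \in I \setminus J_n} \mathscr{F}_j \Big). \]
In particular, if $I = \mathbb{N}$, then $\tau\{\mathscr{F}_i : i \in \mathbb{N}\} = \bigcap_{n=1}^{\infty} \sigma\big( \bigcup_{m=n}^{\infty} \mathscr{F}_m \big)$.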

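Proof. The inclusion $\subset$ is obvious. To prove the inclusion in the opposite direction, suppose
$J \subset I$ is finite. There exists $N \in \mathbb{N}$ such that $J \subset J_N$. Then
\[ \bigcap_{n=1}^{\infty} \sigma\Big( \bigcup_{j \in I \setminus J_n} \mathscr{F}_j \Big) \subset \bigcap_{n=1}^{N} \sigma\Big( \bigcup_{j \in I \setminus J_n} \mathscr{F}_j \Big) \subset \sigma\Big( \bigcup_{j \in I \setminus J_N} \mathscr{F}_j \Big) \subset \sigma\Big( \bigcup_{j \in I \setminus J} \mathscr{F}_j \Big). \tag{16.6} \]
The left hand side of (16.6) is independent of $J$. Taking the intersection over all finite subsets $J$
of $I$ gives the reverse inclusion.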

Example 16.4.3. Suppose $\mathbb{P}$ is a probability measure on $(S^I, \mathscr{F}^{\otimes I})$, $I = \mathbb{Z}_+$ or $I = \mathbb{Z}$, and
for each $j \in I$ let $p_j$ denote the projection onto the $j$–th component. For extended
integers $-\infty \le m \le n \le \infty$, let $\mathscr{F}_m^n = \sigma(p_j : m \le j \le n)$. For each $n$, define
$\mathscr{F}^{(n)} = \sigma(p_j : |j| > n)$. Then, the tail σ–algebra of $\{p_j : j \in I\}$ is $\mathcal{T} = \bigcap_n \mathscr{F}^{(n)}$.

Theorem 16.4.4. (Kolmogorov 0–1 law) Let $(S, \mathscr{F}, \mathbb{P})$ be a probability space. Suppose
$\{\mathscr{A}_i : i \in I\}$ is a countable collection of independent sub σ–algebras of $\mathscr{F}$. The tail σ–algebra
$\mathcal{T} := \tau\{\mathscr{A}_i : i \in I\}$ is trivial, i.e., $\mathbb{P}[A] \in \{0,1\}$ for all $A \in \mathcal{T}$.
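Proof. For each finite set $J \subset I$, define $\mathcal{E}_J = \big\{ \bigcap_{j \in J} A_j : A_j \in \mathscr{A}_j \big\}$ and set
\[ \mathcal{E} = \bigcup_{\substack{J \subset I \\ \#J < \infty}} \mathcal{E}_J. \]
It is easy to check that

(a) For any finite subsets $J_1 \subset J_2 \subset I$, $\mathcal{E}_{J_1} \subset \mathcal{E}_{J_2}$.
(b) $\mathcal{E}$ is a semiring.
(c) $\sigma(\mathcal{E}) = \sigma\big( \bigcup_{i \in I} \mathscr{A}_i \big)$.
Suppose $A \in \mathcal{T}$. As $\mathcal{T} \subset \sigma(\mathcal{E})$, by Corollary 3.3.6, for any $\varepsilon > 0$ there exists a finite collection
$\{A_{j_1}, \ldots, A_{j_N}\} \subset \mathcal{E}$ such that
\[ \Big| \mathbb{P}[A] - \mathbb{P}\Big[ \bigcup_{k=1}^{N} A_{j_k} \Big] \Big| \le \mathbb{P}\Big[ A \,\triangle\, \bigcup_{k=1}^{N} A_{j_k} \Big] < \varepsilon. \]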

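The set $B = \bigcup_{k=1}^{N} A_{j_k}$ belongs to $\mathcal{E}_J$, with $J = \{j_1, \ldots, j_N\}$. Since $A \in \sigma\big( \bigcup_{j \in I \setminus J} \mathscr{A}_j \big)$, $A$
and $B$ are independent, and so
\[ \varepsilon > \mathbb{P}[A \setminus B] = \mathbb{P}[A](1 - \mathbb{P}[B]) > \mathbb{P}[A](1 - \mathbb{P}[A] - \varepsilon). \]
Letting $\varepsilon \searrow 0$ yields $0 = \mathbb{P}[A](1 - \mathbb{P}[A])$.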

16.5. Canonical space

Suppose $I$ is a countable index set and, for a given measurable space $(S, \mathscr{F})$, $\mathbb{P}$ is a probability
measure on $(S^I, \mathscr{F}^{\otimes I})$ for which the projections $\{p_j : j \in I\}$ form an independent
family. Kolmogorov's 0–1 law asserts that $\mathbb{P}[A] \in \{0,1\}$ for any set $A$ in the tail σ–algebra $\mathcal{T}$.
If $\mu$ is a probability measure on $(S, \mathscr{F})$, then the canonical product space $(S^I, \mathscr{F}^{\otimes I}, \mu^{\otimes I})$
has $\{p_j : j \in I\}$ as a collection of i.i.d. variables.
Consider the case $I = \mathbb{N}$ or $I = \mathbb{Z}$. A natural transformation on $(S^I, \mathscr{F}^{\otimes I})$ is the shift
operator $\theta$ defined as $(\theta s)(n) = s(n+1)$ for all $s \in S^I$. Clearly $\theta^{-1}(\mathscr{F}_m^n) = \mathscr{F}_{m+1}^{n+1}$. The
shift is $\mathbb{P}$–invariant iff $\mathbb{P} \circ \theta^{-1} = \mathbb{P}$ (we also say that $\mathbb{P}$ is $\theta$–invariant). A set $A \in \mathscr{F}^{\otimes I}$ is $\theta$–invariant if
$\theta^{-1}(A) = A$. It is easy to check that the collection $\mathcal{I}_\theta$ of all $\theta$–invariant measurable sets
is a σ–algebra. If $\mathbb{P}$ is the product measure $\mu^{\otimes I}$, then $\mathbb{P}$ is clearly $\theta$–invariant.
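Lemma 16.5.1. If $I = \mathbb{Z}_+$ then $\mathcal{I}_\theta \subset \mathcal{T}$. If $I = \mathbb{Z}$ and $\mathbb{P}$ is $\theta$–invariant, then $\mathcal{I}_\theta$ is
contained in the measurable completion of each of the σ–algebras $\mathcal{T}_- = \bigcap_n \mathscr{F}_{-\infty}^{-n}$, $\mathcal{T}_+ = \bigcap_n \mathscr{F}_n^{\infty}$ and
$\mathcal{T}$.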

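Proof. For $I = \mathbb{Z}_+$ the statement follows from $\theta^{-n}(\mathscr{F}_0^\infty) = \mathscr{F}_n^\infty$.

Suppose $I = \mathbb{Z}$. For $A \in \mathcal{I}_\theta$, let $B_k \in \mathscr{F}_{-n_k}^{n_k}$, $n_k > n_{k-1} \ge 0$, be a sequence of sets such that
$\lim_k \mathbb{P}[A \triangle B_k] = 0$. Since $\theta^{-2n_k-1}(B_k) \in \mathscr{F}_{n_k}^{\infty}$ and $\theta^{2n_k+1}(B_k) \in \mathscr{F}_{-\infty}^{-n_k}$, it follows that
\[ \mathbb{P}[A \triangle B_k] = \mathbb{P}[\theta^{-2n_k-1}(A \triangle B_k)] = \mathbb{P}[\theta^{2n_k+1}(A \triangle B_k)] = \mathbb{P}[A \triangle \theta^{-2n_k-1}(B_k)] = \mathbb{P}[A \triangle \theta^{2n_k+1}(B_k)] \to 0. \]
This shows that $C_k = \theta^{2n_k+1}(B_k)$ and $D_k = \theta^{-2n_k-1}(B_k)$ converge to $A$ in $L_1(\mathbb{P})$. Consequently,
along a subsequence $k'$, $C_{k'}$ and $D_{k'}$ converge pointwise $\mathbb{P}$–a.s. to $A$, which shows
that $A_1 = \limsup_{k'} C_{k'} \in \mathcal{T}_-$, $A_2 = \limsup_{k'} D_{k'} \in \mathcal{T}_+$ and $\mathbb{P}(A \triangle A_1) = 0 = \mathbb{P}(A \triangle A_2)$.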
Theorem 16.5.2. Suppose that the family of projections {pj : j ∈ I} is i.i.d. If A ∈ Iθ ,
then P[A] ∈ {0, 1}.

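Proof. Let $A \in \mathcal{I}_\theta$ be fixed. If $I = \mathbb{Z}_+$, then $\mathcal{I}_\theta \subset \mathcal{T}$. Thus, the conclusion follows from
Kolmogorov's 0–1 law. If $I = \mathbb{Z}$, then by Lemma 16.5.1 there are sets $A_1 \in \mathcal{T}_-$ and $A_2 \in \mathcal{T}_+$
such that $\mathbb{P}[A \triangle A_1] = 0 = \mathbb{P}[A \triangle A_2]$. Therefore, by independence, $\mathbb{P}[A] = \mathbb{P}[A_1 \cap A_2] =
\mathbb{P}[A_1]\mathbb{P}[A_2] = \mathbb{P}[A]^2$.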

Another important type of measurable transformation on $(S^I, \mathscr{F}^{\otimes I})$ is obtained by
permuting a finite number of indices of $I$. That is, if $\pi : I \to I$ is a bijective function
such that $\pi(j) = j$ for all but finitely many $j$, then $\pi$ acts on $S^I$ by $\pi(s)(j) = s(\pi(j))$ for all $s \in S^I$. The

collection $\mathcal{P}$ of sets in $\mathscr{F}^{\otimes I}$ which are invariant under finite permutations forms a σ–algebra
called the symmetric or exchangeable σ–algebra. It is easy to verify that
\[ \mathcal{I}_\theta \subset \mathcal{T} \subset \mathcal{P}. \]
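Example 16.5.3. Suppose $(S, \mathscr{F}) = (\mathbb{R}, \mathscr{B}(\mathbb{R}))$ and let $A, B \in \mathscr{B}(\mathbb{R})$.
(i) $\{x \in \mathbb{R}^{\mathbb{Z}_+} : \lim_n x(n) \in A\} \in \mathcal{I}_\theta$.
(ii) $\{x \in \mathbb{R}^{\mathbb{Z}_+} : \lim_n x(2n) \in A,\ \lim_n x(2n+1) \in B\} \in \mathcal{T} \setminus \mathcal{I}_\theta$.
(iii) $\{x \in \mathbb{R}^{\mathbb{Z}_+} : \sum_{n=1}^{\infty} x(n) \ge 0\} \in \mathcal{P} \setminus \mathcal{T}$.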

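Theorem 16.5.4. (Hewitt–Savage 0–1 law.) Suppose that the family of projections
$\{p_j : j \in I\}$ is i.i.d. If $A \in \mathcal{P}$, then $\mathbb{P}(A) \in \{0,1\}$.

Proof. We will consider the case $I = \mathbb{Z}$. Let $A \in \mathcal{P}$ and let $B_k \in \mathscr{F}_{-n_k}^{n_k}$ be a sequence such
that $\lim_k \mathbb{P}[A \triangle B_k] = 0$. For each $j$, let $\operatorname{sign}(j) = -\mathbb{1}_{\{j<0\}} + \mathbb{1}_{\{j \ge 0\}}$. Then,
\[ \pi_k(j) = \begin{cases} j + \operatorname{sign}(n_k - j)(2n_k + 1) & \text{if } -n_k \le j \le 3n_k + 1, \\ j & \text{otherwise} \end{cases} \]
is a finite permutation with $\pi_k \circ \pi_k = \mathrm{Id}$ and $B_k' = \pi_k^{-1}(B_k) \in \mathscr{F}_{n_k+1}^{\infty}$. Hence
\[ \mathbb{P}[A \triangle B_k'] = \mathbb{P}[\pi_k^{-1}(A \triangle B_k)] = \mathbb{P}[A \triangle B_k] \to 0. \]
By independence, $\mathbb{P}[A] = \mathbb{P}[A \cap A] = \lim_k \mathbb{P}[B_k' \cap B_k] = \lim_k \mathbb{P}[B_k']\mathbb{P}[B_k] = \mathbb{P}[A]^2$.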
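
The permutation $\pi_k$ in this proof swaps the block $[-n_k, n_k]$ with the block $[n_k+1, 3n_k+1]$. A minimal sketch checking this on a finite window of indices follows; the function names `sign` and `pi_k` are illustrative only.

```python
def sign(j):
    """sign(j) = +1 for j >= 0 and -1 for j < 0, as in the proof."""
    return 1 if j >= 0 else -1


def pi_k(j, n_k):
    """Finite permutation swapping [-n_k, n_k] with [n_k + 1, 3 n_k + 1]."""
    if -n_k <= j <= 3 * n_k + 1:
        return j + sign(n_k - j) * (2 * n_k + 1)
    return j


n_k = 3
window = range(-20, 21)
# pi_k is an involution on the window.
print(all(pi_k(pi_k(j, n_k), n_k) == j for j in window))
# Image of the block [-n_k, n_k].
print(sorted(pi_k(j, n_k) for j in range(-n_k, n_k + 1)))
```

Since $\pi_k$ moves the coordinates $-n_k, \ldots, n_k$ to $n_k+1, \ldots, 3n_k+1$, the set $B_k'$ depends on coordinates disjoint from those of $B_k$, which is what the independence step uses.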

The Borel–Cantelli theorem, in the case of a sequence $\{A_n\} \subset \mathscr{F}$ of independent events,
can be viewed as an instance of Kolmogorov's 0–1 law.
Theorem 16.5.5. (Borel–Cantelli: reprise). Suppose that $\{A_n\}$ are independent events in
a probability space. Then $\mathbb{P}[A_n \text{ i.o.}] = 0$ iff $\sum_n \mathbb{P}[A_n] < \infty$.

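Proof. Necessity is Borel–Cantelli 4.3.4; sufficiency is the inverse Borel–Cantelli 8.5.3. We
provide another proof for sufficiency that uses only independence. Suppose that $\sum_n \mathbb{P}[A_n] = \infty$.
Since $1 - x \le e^{-x}$ for $0 \le x \le 1$, if $n < m$ then
\[ \mathbb{P}\Big[ \bigcap_{k=n}^{m} A_k^c \Big] = \prod_{k=n}^{m} \big(1 - \mathbb{P}[A_k]\big) \le e^{-\sum_{k=n}^{m} \mathbb{P}[A_k]}. \]
Hence $\mathbb{P}\big[\bigcap_{k \ge n} A_k^c\big] = 0$. Therefore $\mathbb{P}[A_n \text{ i.o.}] = \lim_{n\to\infty} \mathbb{P}\big[\bigcup_{k \ge n} A_k\big] = 1$.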
Example 16.5.6. A monkey writing a novel. Writing a novel amounts to entering a sequence
of $N$ symbols into a computer. There are $30^N$ different ways to write a string of characters
of length $N$ with 26 letters together with a minimal set of punctuation: space, comma,
exclamation and interrogation signs. Suppose that a monkey can type, in one second, any
of the 30 characters with equal probability; then, the chance that this monkey types the
quote "to be, or not to be, that is the question." with 42 random hits onto the keyboard
is about $9.14 \times 10^{-63}$. Let $A_k$ be the event that the previous quote is typed between the $(42k+1)$–th
and $42(k+1)$–th key–strokes. These events are independent and, by the second Borel–Cantelli

theorem, they occur infinitely often. The life span of a monkey is of the order of $10^8$
seconds; as a result, it is unlikely that this famous quote will ever be typed this way.
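
The numbers in this example can be reproduced with exact arithmetic. The sketch below assumes, as in the example, 30 equally likely characters and one key–stroke per second; the union bound over disjoint 42–stroke blocks is an illustration added here, not a computation from the text.

```python
from fractions import Fraction

quote = "to be, or not to be, that is the question."
alphabet_size = 30

# Probability that 42 independent uniform key-strokes reproduce the quote.
p = Fraction(1, alphabet_size ** len(quote))
print(len(quote))   # number of characters in the quote
print(float(p))     # about 9.14e-63

# Union bound over the disjoint 42-stroke blocks available in 10^8 seconds.
lifetime_blocks = 10**8 // len(quote)
print(float(lifetime_blocks * p))
```

Even though the events $A_k$ occur infinitely often with probability one, the bound on the probability of seeing the quote within a life span of $10^8$ seconds is of order $10^{-56}$.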

16.6. Symmetrization
Let (Ω, F , P) be a probability space and let X be a real–valued random variable on Ω.
Definition 16.6.1. A probability measure µ on (R, B(R)) is symmetric if µ(A) = µ(−A)
for all A ∈ B(R). A random variable X defined on (Ω, F , P) is symmetric if its law
µX = P ◦ X −1 is symmetric.
Example 16.6.2. Suppose X and X̃ are two i.i.d. random variables in Ω. Then, Y = X −X̃
is symmetric. Y is called a symmetrization of X.
A median $m$ for $X$ is a $\tfrac12$–quantile of the distribution of $X$, that is
\[ \max\{\mathbb{P}[X < m],\ \mathbb{P}[X > m]\} \le \tfrac12. \]
Proposition 16.6.3. If $X \in L_1(\mathbb{P})$ and $m$ is a median for $X$ then,
\[ \mathbb{E}[|X - m|] = \inf_{c \in \mathbb{R}} \mathbb{E}[|X - c|]. \]

Proof. Suppose $m \le a \le b$. A simple calculation shows that
$|X - b| - |X - a| = 2(b - X)\mathbb{1}_{\{a < X \le b\}} + (b - a)\big(2 \cdot \mathbb{1}_{\{X \le a\}} - 1\big)$; whence,
\[ \mathbb{E}[|X - b|] - \mathbb{E}[|X - a|] = 2\mathbb{E}[(b - X);\ a < X \le b] + (a - b)(1 - 2\mathbb{P}[X \le a]) \ge 0. \]
Observe that $-m$ is a median for $-X$ whenever $m$ is a median for $X$. Thus, if $b \le a \le m$,
we have that $\mathbb{E}[|b - X|] \ge \mathbb{E}[|a - X|]$.
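
Proposition 16.6.3 can be checked on an empirical distribution, where expectations are finite averages. This is a minimal sketch with a made–up sample and a hypothetical helper `mean_abs_dev`; it illustrates the statement on a grid of candidates and is not a proof.

```python
from fractions import Fraction
from statistics import median


def mean_abs_dev(sample, c):
    """E|X - c| when X is uniformly distributed on the sample points."""
    return sum(abs(Fraction(x) - c) for x in sample) / len(sample)


sample = [1, 2, 3, 10, 100]   # illustrative data, heavily skewed
m = median(sample)            # the median of the empirical distribution

# Search a grid of candidate centers c in [0, 100] with step 1/4.
candidates = [Fraction(k, 4) for k in range(0, 401)]
best = min(candidates, key=lambda c: mean_abs_dev(sample, c))
print(m, best, mean_abs_dev(sample, m))
```

The mean of this sample is far from the median, yet the mean absolute deviation is minimized at the median, as the proposition predicts.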
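Proposition 16.6.4. Suppose that $X \in L_p(\mathbb{P})$, $p \ge 1$, and let $\mu_X = \mathbb{E}[X]$. If $m$ is a median
for $X$, then $|\mu_X - m| \le \sqrt[p]{2}\, \|X - \mu_X\|_p$.

Proof. If $m = \mu_X$ then the statement holds trivially. Assume that $m \ne \mu_X$. Then
\[ \frac{|m - \mu_X|^p}{2} \le |m - \mu_X|^p \min\{\mathbb{P}[X \ge m],\ \mathbb{P}[X \le m]\} \le |m - \mu_X|^p\, \mathbb{P}\big[ |X - \mu_X| \ge |m - \mu_X| \big] \le \|X - \mu_X\|_p^p. \]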

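Lemma 16.6.5. Let $Y$ be the symmetrization of $X$ as above, and let $m$ be a median for
$X$. Then
\[ \tfrac12 \mathbb{P}\big[ |X - m| > r \big] \le \mathbb{P}\big[ |Y| > r \big] \le 2\mathbb{P}\big[ |X| > \tfrac{r}{2} \big], \quad r \ge 0. \tag{16.7} \]
Proof. As before, let $Y = X - \tilde X$ where $\{X, \tilde X\}$ is an i.i.d. pair of random variables. Observe
that
\[ \{X - m > r,\ \tilde X \le m\} \cup \{X - m < -r,\ \tilde X \ge m\} \subset \{|Y| > r\} \subset \{|X| > \tfrac{r}{2}\} \cup \{|\tilde X| > \tfrac{r}{2}\}. \]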

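By taking expectations we obtain that
\begin{align*}
\tfrac12 \mathbb{P}[|X - m| > r] &\le \mathbb{P}[X - m > r]\,\mathbb{P}[\tilde X \le m] + \mathbb{P}[X - m < -r]\,\mathbb{P}[\tilde X \ge m] \\
&= \mathbb{P}[X - m > r,\ \tilde X \le m] + \mathbb{P}[X - m < -r,\ \tilde X \ge m] \\
&= \mathbb{P}\big[ \{X - m > r,\ \tilde X \le m\} \cup \{X - m < -r,\ \tilde X \ge m\} \big] \\
&\le \mathbb{P}[|Y| > r] \le 2\mathbb{P}[|X| > \tfrac{r}{2}].
\end{align*}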
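
Inequality (16.7) can be verified exactly on a small discrete law. The sketch below assumes $X$ uniform on the illustrative set $\{0, 1, 5\}$, for which $m = 1$ is a median; the helper names `p_x`, `p_y` and `check` are hypothetical.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 5]               # X is uniform on these points (an assumption)
prob = Fraction(1, len(values))
m = 1                            # P[X < 1] = P[X > 1] = 1/3 <= 1/2, so m is a median


def p_x(pred):
    """P[pred(X)] under the uniform law on `values`."""
    return sum(prob for x in values if pred(x))


def p_y(pred):
    """P[pred(Y)] for Y = X - X~ with X, X~ i.i.d."""
    return sum(prob * prob for x, xt in product(values, repeat=2) if pred(x - xt))


def check(r):
    """Return the three terms of (16.7) for a given r >= 0."""
    left = Fraction(1, 2) * p_x(lambda x: abs(x - m) > r)
    mid = p_y(lambda y: abs(y) > r)
    right = 2 * p_x(lambda x: abs(x) > Fraction(r) / 2)
    return left, mid, right


for r in (Fraction(1, 2), 2, Fraction(9, 2)):
    print(r, check(r))
```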

Corollary 16.6.6. Suppose Xn is a sequence of random variables that converge to zero in
probability. If mn is a median for Xn , then mn → 0.

Proof. From (16.7) we have that

P[|Xn − mn | > 2ε] ≤ 4P[|Xn | > ε] → 0
Since {|mn | > 2ε} ⊂ {|Xn − mn | > ε} ∪ {|Xn | > ε}, it follows that mn → 0.
Lemma 16.6.7. Suppose that $X_1, \ldots, X_n$ are independent symmetric random variables.
Let $S_n = X_1 + \cdots + X_n$, and $M_n = \max_{1 \le k \le n} |X_k|$. If $\eta_n$ is the first of the $X_k$ such that
$|X_k| = M_n$, then $(\eta_n, S_n - \eta_n)$ and $(\eta_n, \eta_n - S_n)$ are identically distributed random vectors.

Proof. Decompose $\Omega$ by the pairwise disjoint sets $\{\eta_n = X_k\}$, $k = 1, \ldots, n$, and observe
that the vectors $(-X_1, \ldots, -X_{k-1}, X_k, -X_{k+1}, \ldots, -X_n)$ and $(X_1, \ldots, X_n)$ have the same
law. Hence, for any bounded measurable function $f : \mathbb{R}^2 \to \mathbb{R}$,
\begin{align*}
\mathbb{E}[f(\eta_n, S_n - \eta_n);\ \eta_n = X_k] &= \mathbb{E}[f(X_k, X_1 + \cdots + X_{k-1} + X_{k+1} + \cdots + X_n);\ \eta_n = X_k] \\
&= \mathbb{E}[f(X_k, -X_1 - \cdots - X_{k-1} - X_{k+1} - \cdots - X_n);\ \eta_n = X_k] \\
&= \mathbb{E}[f(\eta_n, \eta_n - S_n);\ \eta_n = X_k].
\end{align*}
Therefore $\mathbb{E}[f(\eta_n, S_n - \eta_n)] = \mathbb{E}[f(\eta_n, \eta_n - S_n)]$.
Theorem 16.6.8. Suppose that $X_1, \ldots, X_n$ are independent symmetric random variables.
Let $S_n = X_1 + \cdots + X_n$, and $M_n = \max_{1 \le k \le n} |X_k|$. Then,
\[ \mathbb{P}[|S_n| > t] \ge \tfrac12 \mathbb{P}[M_n > t]. \tag{16.8} \]
Moreover, if the $X_k$ are i.i.d., then
\[ \mathbb{P}[|S_n| > t] \ge \tfrac12 \big( 1 - \exp(-n\mathbb{P}[|X_1| > t]) \big). \tag{16.9} \]
Proof. Let ηn be as in Lemma 16.6.7, then (ηn , Sn − ηn ) and (ηn , ηn − Sn ) have the same
law. Consequently,
P[ηn > t] ≤ P[ηn > t; Sn − ηn ≥ 0] + P[ηn > t; Sn − ηn ≤ 0]
= 2P[ηn > t; Sn − ηn ≥ 0] ≤ 2P[Sn > t]
Similarly, P[ηn < −t] ≤ 2P[Sn < −t]; whence (16.8) follows.

If in addition the $X_k$ are i.i.d., then
\[ \mathbb{P}[|S_n| > t] \ge \tfrac12 \big( 1 - (1 - \mathbb{P}[|X_1| > t])^n \big) \ge \tfrac12 \big( 1 - \exp(-n\mathbb{P}[|X_1| > t]) \big), \]
where the last inequality follows from $1 - x \le e^{-x}$ for $0 \le x \le 1$.
Lemma 16.6.9. (Lévy) Let $\{X_n\}$ be a sequence of independent random variables and $S_n =
X_1 + \cdots + X_n$. For each $l, k$, let $m_{l,k}$ be a median for $S_l - S_k$. Then
\[ \mathbb{P}\Big[ \max_{1 \le k \le n} |S_k + m_{n,k}| > \varepsilon \Big] \le 2\mathbb{P}[|S_n| \ge \varepsilon] \]
for any $n \ge 1$ and $\varepsilon > 0$.

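Proof. Let $T_n = \inf\{1 \le k \le n : S_k + m_{n,k} > \varepsilon\}$ and $A_k = \{T_n = k\}$. Clearly the
sets $A_k$ are pairwise disjoint and $\{\max_{1 \le k \le n} (S_k + m_{n,k}) > \varepsilon\} = \bigcup_{k=1}^{n} A_k$. Notice that
$\{S_n > \varepsilon\} \supset A_k \cap \{S_n - S_k \ge m_{n,k}\}$; therefore,
\begin{align*}
\mathbb{P}[S_n > \varepsilon] &\ge \sum_{k=1}^{n} \mathbb{P}[A_k \cap \{S_n - S_k \ge m_{n,k}\}] = \sum_{k=1}^{n} \mathbb{P}[A_k]\,\mathbb{P}[S_n - S_k \ge m_{n,k}] \\
&\ge \frac12 \sum_{k=1}^{n} \mathbb{P}[A_k] = \frac12 \mathbb{P}\Big[ \max_{1 \le k \le n} (S_k + m_{n,k}) > \varepsilon \Big].
\end{align*}
Applying the same reasoning to the sequence $\{-X_n\}$, we obtain that
\[ \mathbb{P}[-S_n > \varepsilon] \ge \frac12 \mathbb{P}\Big[ \min_{1 \le k \le n} (S_k + m_{n,k}) < -\varepsilon \Big]. \]
We conclude that $2\mathbb{P}[|S_n| > \varepsilon] \ge \mathbb{P}[\max_{1 \le k \le n} |S_k + m_{n,k}| > \varepsilon]$.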
Theorem 16.6.10. Let $\{X_n\}$ be a sequence of i.i.d. random variables. Suppose there is a
sequence $\{a_n\} \subset \mathbb{R}$ such that
\[ \frac{S_n}{n} - a_n \longrightarrow 0 \quad \text{in probability.} \]
Then $\lim_{x \to \infty} x\mathbb{P}[|X_1| > x] = 0$.

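Proof. Let $\{\tilde X_n\}$ be an independent copy of $\{X_n\}$. Observe that for each $n$, $Z_n = S_n - \tilde S_n =
\sum_{k=1}^{n} (X_k - \tilde X_k)$ is a symmetrization of both $S_n - na_n$ and $S_n$. Let $m$ be a median for $X_1$;
then, combining Lemma 16.6.5 and Theorem 16.6.8, for all $n$ large enough, we obtain
\begin{align*}
2\mathbb{P}\big[ |S_n - na_n| > n\varepsilon \big] &\ge \mathbb{P}\big[ |Z_n| > 2n\varepsilon \big] \ge \tfrac12 \Big( 1 - e^{-n\mathbb{P}[|X_1 - \tilde X_1| > 2n\varepsilon]} \Big) \\
&\ge \tfrac12 \Big( 1 - e^{-\frac12 n\mathbb{P}[|X_1 - m| > 2n\varepsilon]} \Big) \ge \tfrac12 \Big( 1 - e^{-\frac12 n\mathbb{P}[|X_1| > 2n\varepsilon + |m|]} \Big).
\end{align*}
Thus, if $\lim_n \mathbb{P}[|S_n - na_n| > n\varepsilon] = 0$, then $\lim_{x \to \infty} x\mathbb{P}[|X_1| > x] = 0$.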

16.7. Series of independent random variables

16.7.1. Kolmogorov’s three series theorem.

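Theorem 16.7.1. Suppose $\{X_n\} \subset L_2(\mathbb{P})$ is a sequence of independent random variables
with zero mean, and let $S_k = X_1 + \cdots + X_k$. Then, for any $\varepsilon > 0$,
\[ \mathbb{P}\Big[ \max_{1 \le k \le n} |S_k| > \varepsilon \Big] \le \frac{1}{\varepsilon^2} \sum_{k=1}^{n} \mathbb{E}[X_k^2]. \tag{16.10} \]
If in addition $R = \sup_n \|X_n\|_\infty < \infty$, then
\[ \mathbb{P}\Big[ \max_{1 \le k \le n} |S_k| > \varepsilon \Big] \ge 1 - \frac{(R + \varepsilon)^2}{\sum_{k=1}^{n} \mathbb{E}[X_k^2]}. \tag{16.11} \]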

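Proof. Let $T = \inf\{k \ge 1 : |S_k| > \varepsilon\}$ and define $A_k = \{T = k\}$. Observe that $B_n =
\{\max_{1 \le k \le n} |S_k| > \varepsilon\} = \bigcup_{k=1}^{n} A_k$, and that $S_k \mathbb{1}_{A_k}$ is independent from $S_n - S_k$ for all $1 \le k \le n$.
Hence
\begin{align*}
\mathbb{E}[|S_n|^2] &\ge \mathbb{E}[|S_n|^2 \mathbb{1}_{B_n}] = \sum_{k=1}^{n} \mathbb{E}[|S_n|^2 \mathbb{1}_{A_k}] \ge \sum_{k=1}^{n} \mathbb{E}\big[ (|S_k|^2 + 2S_k(S_n - S_k)) \mathbb{1}_{A_k} \big] \\
&= \sum_{k=1}^{n} \mathbb{E}[|S_k|^2 \mathbb{1}_{A_k}] \ge \varepsilon^2 \sum_{k=1}^{n} \mathbb{P}[A_k] = \varepsilon^2 \mathbb{P}[B_n].
\end{align*}
Since $\mathbb{E}[|S_n|^2] = \sum_{k=1}^{n} \mathbb{E}[X_k^2]$, this proves (16.10).

Suppose that $R = \sup_n \|X_n\|_\infty < \infty$. Then
\[ \mathbb{E}[|S_n|^2] = \mathbb{E}[|S_n|^2 \mathbb{1}_{B_n}] + \mathbb{E}[|S_n|^2 \mathbb{1}_{B_n^c}] \le \mathbb{E}[|S_n|^2 \mathbb{1}_{B_n}] + \varepsilon^2 (1 - \mathbb{P}[B_n]). \]
On the other hand, $|S_k| \mathbb{1}_{A_k} \le (|X_k| + |S_{k-1}|) \mathbb{1}_{A_k} \le \mathbb{1}_{A_k} (R + \varepsilon)$; hence
\begin{align*}
\mathbb{E}[|S_n|^2 \mathbb{1}_{B_n}] &= \sum_{k=1}^{n} \mathbb{E}\big[ (S_k^2 + (S_n - S_k)^2) \mathbb{1}_{A_k} \big] = \sum_{k=1}^{n} \Big( \mathbb{E}[S_k^2 \mathbb{1}_{A_k}] + \mathbb{P}[A_k]\, \mathbb{E}[|S_n - S_k|^2] \Big) \\
&\le \big( (R + \varepsilon)^2 + \mathbb{E}[|S_n|^2] \big) \sum_{k=1}^{n} \mathbb{P}[A_k] = \mathbb{P}[B_n] \big( (R + \varepsilon)^2 + \mathbb{E}[|S_n|^2] \big).
\end{align*}
Therefore
\[ \mathbb{P}[B_n] \ge \frac{\mathbb{E}[|S_n|^2] - \varepsilon^2}{(R + \varepsilon)^2 + \mathbb{E}[|S_n|^2] - \varepsilon^2} \ge 1 - \frac{(R + \varepsilon)^2}{\mathbb{E}[|S_n|^2]}. \]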
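
Both Kolmogorov inequalities can be checked exactly for a simple random walk by enumerating all paths. The sketch below assumes fair $\pm 1$ steps, so that $\mathbb{E}[X_k^2] = 1$ and $R = 1$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate, product


def max_prob(n, eps):
    """Exact P[max_{k<=n} |S_k| > eps] for n fair +/-1 steps (all 2^n paths)."""
    hits = sum(
        1
        for steps in product((-1, 1), repeat=n)
        if max(abs(s) for s in accumulate(steps)) > eps
    )
    return Fraction(hits, 2**n)


def upper_bound(n, eps):
    """Right side of (16.10): sum_k E[X_k^2] / eps^2 = n / eps^2."""
    return Fraction(n, eps**2)


def lower_bound(n, eps, R=1):
    """Right side of (16.11): 1 - (R + eps)^2 / n."""
    return 1 - Fraction((R + eps) ** 2, n)


n = 10
for eps in (1, 2, 4):
    print(eps, lower_bound(n, eps), max_prob(n, eps), upper_bound(n, eps))
```

For small $\varepsilon$ the upper bound exceeds one and for large $\varepsilon$ the lower bound is negative, so each inequality is informative only in its own regime.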

Lemma 16.7.2. (Kolmogorov) Suppose $\{X_n\} \subset L_2(\mathbb{P})$ is an independent sequence of random
variables. If $\sum_n \operatorname{var}[X_n] < \infty$, then $\sum_n (X_n - \mathbb{E}[X_n])$ converges $\mathbb{P}$–a.s.

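Proof. Let $Y_n = X_n - \mathbb{E}[X_n]$ and $S_n = Y_1 + \ldots + Y_n$. Then by (16.10)
\[ \mathbb{P}\Big[ \sup_{m \ge n} |S_m - S_n| \ge \varepsilon \Big] \le \frac{1}{\varepsilon^2} \sum_{k > n} \operatorname{var}[X_k] \to 0 \quad \text{as } n \to \infty. \]
This shows that $S_n$ converges $\mathbb{P}$–a.s.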
Theorem 16.7.3. (Kolmogorov's three series theorem) Let $\{X_n\}$ be a sequence of independent
random variables. Given $b > 0$, let $X_n^b = X_n \mathbb{1}_{\{|X_n| \le b\}}$. The series $\sum_n X_n$ converges
$\mathbb{P}$–a.s. if and only if the following hold for some $b > 0$:
(i) $\sum_n \mathbb{P}[|X_n| > b] < \infty$;
(ii) $\sum_n \mathbb{E}[X_n^b]$ converges;
(iii) $\sum_n \operatorname{var}[X_n^b] < \infty$.
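Proof. Necessity: Suppose $\sum_n X_n$ converges; then $X_n \to 0$ $\mathbb{P}$–a.s. Hence,
\[ \mathbb{P}[X_n^b \ne X_n, \text{ i.o.}] = \mathbb{P}[|X_n| > b, \text{ i.o.}] = 0. \]
By the reversed Borel–Cantelli lemma, $\sum_n \mathbb{P}[|X_n| > b] < \infty$, that is, (i) holds. Consequently,
$S_n^b = \sum_{k=1}^{n} X_k^b$ converges $\mathbb{P}$–a.s. Let $\{\tilde X_{b,n}\}$ be an independent copy of $\{X_n^b\}$ and let
$Z_{b,n} := X_n^b - \tilde X_{b,n}$ and $T_{b,n} = \sum_{k=1}^{n} Z_{b,k}$ be the corresponding symmetrizations of $X_n^b$ and $S_n^b$
respectively. Then, $\sum_n \tilde X_{b,n}$ and therefore $T_{b,n}$ converge $\mathbb{P}$–a.s. Observe that $\|Z_{b,n}\|_\infty \le 2b$
and $\mathbb{E}[Z_{b,n}] = 0$. The second Kolmogorov inequality (16.11) shows that
\[ \mathbb{P}\Big[ \sup_{m \ge n} |T_{b,m} - T_{b,n}| > \varepsilon \Big] \ge 1 - \frac{(2b + \varepsilon)^2}{\sum_{m > n} \mathbb{E}[Z_{b,m}^2]}. \]
Since $\mathbb{P}$–a.s. convergence of $T_{b,n}$ is equivalent to $\lim_n \mathbb{P}\big[ \sup_{m \ge n} |T_{b,m} - T_{b,n}| > \varepsilon \big] = 0$ for every $\varepsilon > 0$, we
conclude that $\sum_n \mathbb{E}[Z_{b,n}^2] = \sum_n \operatorname{var}[X_n^b - \tilde X_{b,n}] = 2\sum_n \operatorname{var}[X_n^b]$ converges. So (iii) holds,
and by Lemma 16.7.2, $\sum_n (X_n^b - \mathbb{E}[X_n^b])$ converges. Consequently, $\sum_n \mathbb{E}[X_n^b]$ converges; that
is, (ii) holds.
Sufficiency: (iii) implies that $\sum_n (X_n^b - \mathbb{E}[X_n^b])$ converges; together with (ii), this implies that $\sum_n X_n^b$ converges;
(i) implies, by the Borel–Cantelli lemma, that $\mathbb{P}[X_n^b \ne X_n, \text{ i.o.}] = 0$. Therefore
$\sum_n X_n$ converges $\mathbb{P}$–a.s.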
Example 16.7.4. Let $(\epsilon_n : n \in \mathbb{N})$ be an i.i.d. sequence of random variables with $\mathbb{P}[\epsilon_n =
1] = \tfrac12 = \mathbb{P}[\epsilon_n = -1]$. For any sequence of nonnegative numbers $(c_n : n \in \mathbb{N})$, Kolmogorov's
three series theorem implies that $X = \sum_{n \ge 1} c_n \epsilon_n$ converges a.s. iff $\sum_n c_n^2 < \infty$.
The law $\mu_X$ of $X$ is continuous in the sense that $\mu_X(\{a\}) = 0$ for any $a \in \mathbb{R}$. Indeed, let $(c_{n_k})$
be a subsequence of $(c_n)$ such that $c_{n_{k+1}} < \tfrac12 c_{n_k}$. Then $X_1 := \sum_{k=1}^{\infty} c_{n_k} \epsilon_{n_k}$ and $X_2 := X - X_1$
are independent. The law of $X_1$ admits no atoms since the map $X' : \{-1,1\}^{\mathbb{N}} \to \mathbb{R}$ given
by
\[ \omega \mapsto \sum_k \omega_k c_{n_k} \]

is injective. To check this, suppose $\omega \ne \omega'$ and let $k$ be the first component such that
$\omega_k \ne \omega_k'$. Then
\[ |X'(\omega) - X'(\omega')| \ge 2c_{n_k} - 2\sum_{j \ge 1} c_{n_{k+j}} > 2c_{n_k} - 2c_{n_k} \sum_{j \ge 1} 2^{-j} = 0. \]
As $\mu_X = \mu_{X_1} * \mu_{X_2}$, the conclusion follows from Fubini's theorem.

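The characteristic function $\varphi_X$ of $X$ is given by
\[ \varphi_X(t) = \prod_{n \ge 1} \cos c_n t. \]
By the inversion formula (i) and the continuity of $\mu_X$,
\[ \mathbb{P}[a < X \le b] = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \Big( \frac{\sin bt}{t} - \frac{\sin at}{t} \Big) \prod_{n \ge 1} \cos c_n t \, dt. \]
From the inversion formula (ii),
\[ \mathbb{P}[X = a] = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} e^{-iat} \prod_n \cos c_n t \, dt = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} \cos(at) \prod_n \cos c_n t \, dt = 0 \]
for all $a \in \mathbb{R}$.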

16.7.2. Lévy characterization theorem. Let $\{X_n\}$ be a sequence of independent random variables
and $S_n = X_1 + \cdots + X_n$. For any $l, k$ let $m_{l,k}$ be a median for $S_l - S_k$.
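Theorem 16.7.5. $S_n$ converges $\mathbb{P}$–a.s. iff $S_n$ converges in probability.

Proof. Only sufficiency needs to be proved. If $\{S_n\}$ converges in probability, then
\[ \lim_{n \to \infty} \sup_{l,k \ge n} \mathbb{P}[|S_l - S_k| > \varepsilon] = 0; \]
hence, $\lim_{n \to \infty} \sup_{l,k \ge n} |m_{l,k}| = 0$. Therefore, given $\varepsilon > 0$, there is $n_0$ such that
\[ \sup_{k,l \ge n} |m_{k,l}| < \varepsilon, \quad n \ge n_0. \]
Consequently, by Lemma 16.6.9 applied to the increments $X_{n+1}, X_{n+2}, \ldots$, for $k > n \ge n_0$,
\[ \mathbb{P}\Big[ \max_{n < j \le k} |S_j - S_n| > 2\varepsilon \Big] \le \mathbb{P}\Big[ \max_{n < j \le k} |S_j - S_n + m_{k,j}| > \varepsilon \Big] \le 2\mathbb{P}[|S_k - S_n| \ge \varepsilon]. \]
Letting $k \to \infty$ and then $n \to \infty$ shows that
\[ \lim_n \mathbb{P}\Big[ \sup_{k \ge n} |S_k - S_n| > 2\varepsilon \Big] = 0 \]
for any $\varepsilon > 0$. This is equivalent to the $\mathbb{P}$–a.s. convergence of $S_n$.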


16.8. The law of large numbers of independent variables

The law of large numbers (LLN) is a set of statements that describes the result of performing
the same experiment a large number of times. According to the law, the average of the
results obtained from a large number of trials should be close to the expected value. The
LLN guarantees stable long-term results for the averages of some random events. For
example, while a casino may lose money in a single spin of the roulette wheel, its earnings
will tend towards a predictable percentage over a large number of spins. Any winning streak
by a player will eventually be overcome by the parameters of the game. It is important
to remember that the LLN only applies (as the name indicates) when a large number of
observations are considered. There is no principle that a small number of observations will
coincide with the expected value or that a streak of one value will immediately be balanced.
In this section we discuss two kinds of law of large numbers. The first one
(weak) describes probabilistic properties of averages of random events; the second
(strong) describes almost sure and integrability properties of such averages.

16.8.1. Weak law of large numbers. We start this section with the statement and proof
of two technical results that will be useful in the proof of both the weak and strong versions of
the LLN.

Lemma 16.8.1. (Cesàro average) Let $\{\mu_t : t > 0\}$ be a family of probability measures on
$[0, \infty)$ such that $\lim_{t \to \infty} \mu_t([0,a]) = 0$ for any $a \ge 0$. If $f$ is a bounded measurable function
on $[0, \infty)$ and $\lim_{t \to \infty} f(t) = b$, then
\[ \lim_{t \to \infty} \int_{[0,\infty)} f(s)\, \mu_t(ds) = b. \]

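Proof. Given $\varepsilon > 0$, choose $a > 0$ such that $|f(t) - b| < \varepsilon$ whenever $t \ge a$. Then
\begin{align*}
\Big| \int_{[0,\infty)} f(s)\, \mu_t(ds) - b \Big| &\le \int_{[0,a]} |f(s) - b|\, \mu_t(ds) + \int_{(a,\infty)} |f(s) - b|\, \mu_t(ds) \\
&\le 2\|f\|_u\, \mu_t[0,a] + \varepsilon (1 - \mu_t[0,a]).
\end{align*}
The result follows by letting $t \to \infty$ first, and then letting $\varepsilon \searrow 0$.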

Lemma 16.8.2. (Kronecker) Let $a_n$ and $x_n$ be numeric sequences such that $0 < a_n \nearrow \infty$ and
such that $\sum_n \frac{x_n}{a_n}$ converges. Then
\[ \lim_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} x_k = 0. \tag{16.12} \]

Proof. Let $b_n = \sum_{k=1}^{n} \frac{x_k}{a_k}$ and $a_0 = 0 = b_0$ so that $x_n = a_n (b_n - b_{n-1})$. If $s_n = \sum_{k=1}^{n} x_k$,
then summation by parts gives
\begin{align*}
s_n &= \sum_{k=1}^{n} a_k (b_k - b_{k-1}) = a_n b_n + \sum_{k=2}^{n} a_{k-1} b_{k-1} - \sum_{k=1}^{n} a_k b_{k-1} \\
&= a_n b_n - \sum_{k=1}^{n} (a_k - a_{k-1}) b_{k-1}.
\end{align*}
Hence
\[ \frac{s_n}{a_n} = b_n - \sum_{k=1}^{n} \frac{(a_k - a_{k-1})}{a_n} b_{k-1}. \tag{16.13} \]
Let $b = \lim_n b_n$. As $a_{k-1} \le a_k \nearrow \infty$, the sum on the right of (16.13) converges to $b$ by
Cesàro's lemma and thus, (16.12) follows.
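
The summation by parts identity and the conclusion (16.12) can be checked exactly on a concrete pair of sequences. The sketch below assumes the illustrative choice $x_k = (-1)^k$ and $a_k = k$, for which $\sum_k x_k/a_k$ is the convergent alternating harmonic series; the function name `kronecker_terms` is hypothetical.

```python
from fractions import Fraction


def kronecker_terms(n):
    """Return (s_n, summation-by-parts expression, s_n / a_n) for x_k = (-1)^k, a_k = k."""
    a = [0] + list(range(1, n + 1))                  # a_0 = 0, a_k = k
    x = [0] + [(-1) ** k for k in range(1, n + 1)]   # x_0 is a placeholder
    b = [Fraction(0)]                                # b_0 = 0
    for k in range(1, n + 1):
        b.append(b[-1] + Fraction(x[k], a[k]))       # b_k = sum_{i<=k} x_i / a_i
    s_n = sum(x[1:])
    by_parts = a[n] * b[n] - sum((a[k] - a[k - 1]) * b[k - 1] for k in range(1, n + 1))
    return s_n, by_parts, Fraction(s_n, a[n])


for n in (10, 51, 200):
    s_n, by_parts, ratio = kronecker_terms(n)
    print(n, s_n == by_parts, ratio)
```

Here $s_n \in \{-1, 0\}$, so $s_n / a_n \to 0$ is immediate; the value of the sketch lies in confirming the algebraic identity that drives the general proof.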
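Theorem 16.8.3. For each $n$, let $X_{n,m}$, $m = 1, \ldots, m_n$, be independent random variables.
Let $\{b_n\} \subset (0, \infty)$ be a numeric sequence with $b_n \to \infty$ and define the truncated sequence
$\tilde X_{n,m} = X_{n,m} \mathbb{1}_{\{|X_{n,m}| \le b_n\}}$. Suppose that
(a) $\lim_{n \to \infty} \sum_{m=1}^{m_n} \mathbb{P}[|X_{n,m}| > b_n] = 0$;
(b) $\lim_{n \to \infty} \frac{1}{b_n^2} \sum_{m=1}^{m_n} \mathbb{E}[|\tilde X_{n,m}|^2] = 0$.
Let $S_n = \sum_{m=1}^{m_n} X_{n,m}$ and $a_n = \mathbb{E}\big[ \sum_{m=1}^{m_n} \tilde X_{n,m} \big]$. Then, $\frac{S_n - a_n}{b_n}$ converges to 0 in probability.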

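Proof. Let $\tilde S_n = \sum_{m=1}^{m_n} \tilde X_{n,m}$. For any $\varepsilon > 0$,
\[ \mathbb{P}\Big[ \Big| \frac{S_n - a_n}{b_n} \Big| > \varepsilon \Big] \le \mathbb{P}[S_n \ne \tilde S_n] + \mathbb{P}\Big[ \Big| \frac{\tilde S_n - a_n}{b_n} \Big| > \varepsilon \Big]. \tag{16.14} \]
The first term on the right of (16.14) converges to 0 by (a) since
\[ \mathbb{P}[S_n \ne \tilde S_n] \le \mathbb{P}\Big[ \bigcup_{m=1}^{m_n} \{X_{n,m} \ne \tilde X_{n,m}\} \Big] \le \sum_{m=1}^{m_n} \mathbb{P}[|X_{n,m}| > b_n] \to 0. \]
The second term on the right of (16.14) converges to 0 by Chebyshev–Markov's inequality
and (b) since
\[ \frac{1}{\varepsilon^2 b_n^2} \mathbb{E}\big[ |\tilde S_n - a_n|^2 \big] = \frac{1}{\varepsilon^2 b_n^2} \sum_{m=1}^{m_n} \operatorname{var}(\tilde X_{n,m}) \le \frac{1}{\varepsilon^2 b_n^2} \sum_{m=1}^{m_n} \mathbb{E}[|\tilde X_{n,m}|^2] \to 0. \]
Since $\varepsilon > 0$ is arbitrary, the proof is complete.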
Theorem 16.8.4. (Weak law of large numbers (WLLN)) Let $\{X_n\}$ be a sequence of i.i.d.
random variables, define $S_n = \sum_{m=1}^{n} X_m$ and $\mu_n = \mathbb{E}[X_1 \mathbb{1}_{\{|X_1| \le n\}}]$. The following statements
are equivalent:
(i) $\lim_{x \to \infty} x\mathbb{P}[|X_1| > x] = 0$.
(ii) $\frac{S_n}{n} - \mu_n \to 0$ in probability.
Consequently, if $X_1 \in L_1$, then $\frac{S_n}{n} \to \mathbb{E}[X_1]$ in probability.

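Proof. The implication (ii) implies (i) is in Theorem 16.6.10.

Suppose (i) holds. For each $n$, define $X_{n,m} = X_m$, $m = 1, \ldots, n$, and let $b_n = n$. By
Theorem 16.8.3, it is enough to verify conditions (a) and (b) in that statement. Observe
that
\[ \sum_{m=1}^{n} \mathbb{P}[|X_{n,m}| > n] = n\mathbb{P}[|X_1| > n] \to 0. \]
Hence, (a) holds. Let $\tilde X_1 = X_1 \mathbb{1}_{\{|X_1| \le n\}}$; then by Fubini's theorem
\begin{align*}
\mathbb{E}[|\tilde X_1|^2] &= \int_0^\infty 2t\, \mathbb{P}[|\tilde X_1| > t]\, dt = \int_0^n 2t\, \mathbb{P}[|\tilde X_1| > t]\, dt \\
&= \int_0^n 2t\, \mathbb{P}[n \ge |X_1| > t]\, dt \le \int_0^n 2t\, \mathbb{P}[|X_1| > t]\, dt.
\end{align*}
Since $t\mathbb{P}[|X_1| > t] \to 0$ as $t \to \infty$, we conclude from Lemma 16.8.1 that
\[ \frac{1}{n^2} \sum_{m=1}^{n} \mathbb{E}[|\tilde X_{n,m}|^2] \le \frac{1}{n} \int_0^n 2t\, \mathbb{P}[|X_1| > t]\, dt \to 0. \]
The last statement follows from dominated convergence since
\[ x\mathbb{P}[|X_1| > x] \le \mathbb{E}[|X_1| \mathbb{1}_{\{|X_1| > x\}}] \to 0 \quad \text{and} \quad \mu_n = \mathbb{E}[X_1 \mathbb{1}_{\{|X_1| \le n\}}] \to \mathbb{E}[X_1]. \]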

16.8.2. Strong law of large numbers. The following result is one version of the strong
law of large numbers for i.i.d. random variables.

Theorem 16.8.5. (Kolmogorov, Marcinkiewicz, Zygmund) Let $\{X_n\}$ be a sequence of i.i.d.
random variables. For any $0 < p < 2$ define
\[ A_n(p) := \frac{1}{n^{1/p}} \sum_{k=1}^{n} X_k = \frac{1}{n^{1/p}} S_n. \tag{16.15} \]
$A_n(p)$ converges $\mathbb{P}$–a.s. as $n \to \infty$ if and only if
(i) $\mathbb{E}[|X_1|^p] < \infty$ and
(ii) either $0 < p \le 1$ or $\mathbb{E}[X_1] = 0$.
In that case, $\lim_n A_n(1) = \mathbb{E}[X_1]$ when $p = 1$, and $\lim_n A_n(p) = 0$ otherwise.

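Proof. Sufficiency: Assume that $\mathbb{E}[|X_1|^p] < \infty$ and also that $\mathbb{E}[X_1] = 0$ if $p > 1$. Let
$\tilde X_n = X_n \mathbb{1}_{\{|X_n| \le n^{1/p}\}}$. By Fubini's theorem, we have that
\[ \sum_n \mathbb{P}[X_n \ne \tilde X_n] = \sum_n \mathbb{P}[|X_n| > n^{1/p}] \le \sum_n \int_{n-1}^{n} \mathbb{P}[|X_n| > t^{1/p}]\, dt = \int_0^\infty \mathbb{P}[|X_1|^p > t]\, dt = \mathbb{E}[|X_1|^p] < \infty. \]
The Borel–Cantelli lemma shows that $\mathbb{P}[X_n \ne \tilde X_n, \text{ i.o.}] = 0$. Consequently, to show that $A_n(p)$
converges it is enough to show that $\frac{1}{n^{1/p}} \sum_{k=1}^{n} \tilde X_k \to 0$ $\mathbb{P}$–a.s. By Kronecker's lemma, it
suffices to show that $\sum_n \frac{\tilde X_n}{n^{1/p}}$ converges $\mathbb{P}$–a.s.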

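In the case $p < 1$ we have the estimate
\begin{align*}
\mathbb{E}\Big[ \sum_n \frac{1}{n^{1/p}} |\tilde X_n| \Big] &= \sum_n \frac{1}{n^{1/p}} \mathbb{E}[|X_n|;\ |X_n| \le n^{1/p}] \\
&\le 2^{1/p} \int_0^\infty \frac{1}{t^{1/p}} \mathbb{E}[|X_1|;\ |X_1| \le t^{1/p}]\, dt \\
&= 2^{1/p}\, \mathbb{E}\Big[ |X_1| \int_{|X_1|^p}^{\infty} \frac{1}{t^{1/p}}\, dt \Big] = \frac{2^{1/p} p}{1 - p} \mathbb{E}[|X_1|^p] < \infty.
\end{align*}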

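If $p > 1$, then by Kolmogorov's lemma, it is enough to show that $\sum_n \operatorname{var}\big( \frac{1}{n^{1/p}} \tilde X_n \big) =
\sum_n \frac{1}{n^{2/p}} \operatorname{var}(\tilde X_n)$ and $\sum_n \frac{1}{n^{1/p}} \mathbb{E}[\tilde X_n]$ converge. For the former series, the following estimate
holds for $p \ge 1$:
\begin{align*}
\sum_n \frac{1}{n^{2/p}} \operatorname{var}(\tilde X_n) &\le \sum_n \frac{1}{n^{2/p}} \mathbb{E}[|X_n|^2;\ |X_n| \le n^{1/p}] \\
&\le 2^{2/p} \int_0^\infty \frac{1}{t^{2/p}} \mathbb{E}[|X_1|^2;\ |X_1| \le t^{1/p}]\, dt \\
&= 4^{1/p}\, \mathbb{E}\Big[ |X_1|^2 \int_{|X_1|^p}^{\infty} \frac{1}{t^{2/p}}\, dt \Big] = \frac{4^{1/p} p}{2 - p} \mathbb{E}[|X_1|^p] < \infty.
\end{align*}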

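As for the latter series, observe that $\mathbb{E}[\tilde X_n] = -\mathbb{E}[X_n;\ |X_n| > n^{1/p}]$. Hence, for $p > 1$,
\begin{align*}
\sum_n \frac{1}{n^{1/p}} |\mathbb{E}[\tilde X_n]| &\le \sum_n \frac{1}{n^{1/p}} \mathbb{E}[|X_n|;\ |X_n| > n^{1/p}] \\
&\le \int_0^\infty \frac{1}{t^{1/p}} \mathbb{E}[|X_1|;\ |X_1| > t^{1/p}]\, dt \\
&= \mathbb{E}\Big[ |X_1| \int_0^{|X_1|^p} \frac{1}{t^{1/p}}\, dt \Big] = \frac{p}{p - 1} \mathbb{E}[|X_1|^p] < \infty.
\end{align*}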
In the special case $p = 1$, notice that $\frac{1}{n} \sum_{k=1}^{n} \mathbb{E}[X_k;\ |X_k| \le k] \to 0$ since by dominated
convergence $\mathbb{E}[X_n;\ |X_n| \le n] = \mathbb{E}[X_1;\ |X_1| \le n] \to \mathbb{E}[X_1] = 0$. Hence, it suffices to show

that $\frac{1}{n} \sum_{k=1}^{n} (\tilde X_k - \mathbb{E}[\tilde X_k]) \to 0$, which follows from the previous estimate $\sum_n \frac{1}{n^{2/p}} \operatorname{var}(\tilde X_n) \le
C_p\, \mathbb{E}[|X_1|^p]$ with $p = 1$ and Kolmogorov's lemma.
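Necessity: Assume that $A_p := \lim_n \frac{1}{n^{1/p}} S_n$ exists $\mathbb{P}$–a.s. Then
\[ \frac{X_n}{n^{1/p}} = \frac{S_n}{n^{1/p}} - \Big( \frac{n-1}{n} \Big)^{1/p} \frac{S_{n-1}}{(n-1)^{1/p}} \to 0. \]
Consequently, $\mathbb{P}[|X_n| > n^{1/p}, \text{ i.o.}] = 0$ and, by the reversed Borel–Cantelli lemma and Fubini's
theorem,
\[ \mathbb{E}[|X_1|^p] = \int_0^\infty \mathbb{P}[|X_1|^p > t]\, dt \le 1 + \sum_{n \ge 1} \mathbb{P}[|X_1| > n^{1/p}] < \infty. \]
The proof of sufficiency shows that $A_p = 0$ for $p < 1$ and $A_1 = \mathbb{E}[X_1]$. If $p > 1$, the proof
of sufficiency shows that
\[ \frac{1}{n^{1/p}} \sum_{k=1}^{n} (X_k - \mathbb{E}[X_k]) = \frac{1}{n^{1/p}} (S_n - n\mathbb{E}[X_1]) \to 0. \]
This implies that $n^{1 - 1/p}\, \mathbb{E}[X_1]$ converges. Consequently, $\mathbb{E}[X_1] = 0$.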

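Theorem 16.8.6. ($L_1$ law of large numbers) Suppose $\{X_n\} \subset L_1(\mathbb{P})$ is an i.i.d. sequence.
Then $\frac{1}{n} S_n$ converges to $\mathbb{E}[X_1]$ $\mathbb{P}$–a.s. and in $L_1$.

Proof. Only $L_1$ convergence needs to be proved. For bounded $X_1$, the conclusion of the statement
follows from dominated convergence. For general $X_1$, use the truncation $X_n^m = X_n \mathbb{1}_{\{|X_n| \le m\}}$.
By dominated convergence, for any $\varepsilon > 0$, there is $m_0$ such that $\|X_1^m - X_1\|_1 < \varepsilon/3$ for all
$m \ge m_0$. Let $S_n^m = \sum_{k=1}^{n} X_k^m$; then
\begin{align*}
\big\| \tfrac{1}{n} S_n - \mathbb{E}[X_1] \big\|_1 &\le \big\| \tfrac{1}{n} S_n - \tfrac{1}{n} S_n^{m_0} \big\|_1 + \big\| \tfrac{1}{n} S_n^{m_0} - \mathbb{E}[X_1^{m_0}] \big\|_1 + \| X_1^{m_0} - X_1 \|_1 \\
&\le 2 \| X_1^{m_0} - X_1 \|_1 + \big\| \tfrac{1}{n} S_n^{m_0} - \mathbb{E}[X_1^{m_0}] \big\|_1 < 2\varepsilon/3 + \big\| \tfrac{1}{n} S_n^{m_0} - \mathbb{E}[X_1^{m_0}] \big\|_1.
\end{align*}
The conclusion follows by first letting $n \to \infty$, and then $\varepsilon \to 0$.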

16.9. Random Walks

A sequence $\{S_n : n \in \mathbb{Z}_+\}$ of $\mathbb{R}^d$–valued random variables is a random walk if $S_0 \equiv 0$ and
$Y = \{Y_n = S_n - S_{n-1} : n \in \mathbb{N}\}$ is an i.i.d. sequence. The elements of the sequence $Y$ are the steps of the
random walk. Clearly $S_n = Y_1 + \ldots + Y_n$ for all $n \ge 1$.
Theorem 16.9.1. For a random walk on $\mathbb{R}$, one and only one of the following holds:
(i) $S_n = 0$ $\mathbb{P}$–a.s. for all $n$.
(ii) $\lim_n S_n = -\infty$ $\mathbb{P}$–a.s.
(iii) $\lim_n S_n = +\infty$ $\mathbb{P}$–a.s.
(iv) $-\infty = \liminf_n S_n < \limsup_n S_n = +\infty$ $\mathbb{P}$–a.s.

Proof. The Hewitt–Savage 0–1 law implies that for some constant $c \in [-\infty, \infty]$, $\liminf_n S_n = c$
$\mathbb{P}$–a.s. Since $Y$ is an i.i.d. sequence, $\{S_{n+1} - Y_1 : n \in \mathbb{N}\}$ and $\{S_n : n \in \mathbb{N}\}$ have the same
distribution; therefore, $c - Y_1 = c$ $\mathbb{P}$–a.s. If $c$ is finite, then $Y_1 = 0$ $\mathbb{P}$–a.s., which in turn
implies (i). If $Y_1 \not\equiv 0$, then $c$ is either $+\infty$ or $-\infty$. The same analysis applies to $\limsup_n S_n$.
Clearly the combination $\limsup_n S_n = -\infty$ and $\liminf_n S_n = +\infty$ is impossible. This proves
the theorem.

In the remainder of this section we will analyze how often a random walk on $\mathbb{R}^d$ returns
near a point $x \in \mathbb{R}^d$.
Definition 16.9.2. Suppose $S = \{S_n : n \in \mathbb{Z}_+\}$ is a random walk in $\mathbb{R}^d$. A point $x$ is said
to be a recurrent point for $S$ if for every $\varepsilon > 0$, $\mathbb{P}[\|S_n - x\| < \varepsilon \text{ i.o.}] = 1$. A point $x$ is said
to be a possible value for $S$ if for any $\varepsilon > 0$, there is $n \in \mathbb{N}$ such that $\mathbb{P}[\|S_n - x\| < \varepsilon] > 0$.

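The set of recurrent points and the set of possible values for $S_n$ will be denoted by $\mathcal{V}$
and $\mathcal{U}$ respectively. Since $\{\|S_n - x\| < \varepsilon \text{ i.o.}\} \subset \bigcup_{n \ge 1} \{\|S_n - x\| < \varepsilon\}$ for all $x \in \mathbb{R}^d$ and
$\varepsilon > 0$, it is clear that $\mathcal{V} \subset \mathcal{U}$.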
Theorem 16.9.3. The set of recurrent points V is either ∅ or a closed subgroup of Rd . In
the latter case, V = U .

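Proof. Suppose $\mathcal{V} \ne \emptyset$ throughout the proof. If $x \in \mathbb{R}^d \setminus \mathcal{V}$ there is $\varepsilon > 0$ such that
$\mathbb{P}[\|S_n - x\| < \varepsilon \text{ i.o.}] < 1$ (in fact 0 by the Hewitt–Savage 0–1 law). Since $A \subset B(x; \varepsilon)$ implies
that $\{S_n \in A, \text{ i.o.}\} \subset \{S_n \in B(x; \varepsilon), \text{ i.o.}\}$, it follows that $\mathcal{V}$ is closed.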

To prove that V = U and that V is a subgroup of (Rd, +) it is enough to show that x ∈ U
and y ∈ V imply y − x ∈ V (see Exercise 1.6.12). Suppose that is not the case and that
there is a pair (x, y) ∈ U × V such that y − x ∉ V. Then, for some ε > 0 and m ∈ N,
\[
P\Big[\bigcap_{n\ge m}\{\|S_n-(y-x)\|\ge 2\varepsilon\}\Big] > 0 .
\]

As x ∈ U, there is k ∈ N such that P[‖Sk − x‖ < ε] > 0. Since {Sn+k − Sk : n ∈ N} and
{Sn : n ∈ N} have the same law, we have that
\[
P\Big[\bigcap_{n\ge m}\{\|S_n-(y-x)\|\ge 2\varepsilon\}\Big]
= P\Big[\bigcap_{n\ge m}\{\|S_{n+k}-S_k-(y-x)\|\ge 2\varepsilon\}\Big] > 0 .
\]
Notice that
\[
\{\|S_k-x\|<\varepsilon\}\cap\bigcap_{n\ge m}\{\|S_{n+k}-S_k-(y-x)\|\ge 2\varepsilon\}
\subset \bigcap_{n\ge m}\{\|S_{n+k}-y\|\ge\varepsilon\} .
\]
Since {Yn : n ∈ N} are i.i.d., the two events on the left are independent, and we obtain that
\[
0 < P[\|S_k-x\|<\varepsilon]\;P\Big[\bigcap_{n\ge m}\{\|S_n-(y-x)\|\ge 2\varepsilon\}\Big]
\le P\Big[\bigcap_{n\ge m+k}\{\|S_n-y\|\ge\varepsilon\}\Big] .
\]
This contradicts the assumption y ∈ V, which implies that P[‖Sn − y‖ < ε i.o.] = 1.
Therefore y − x ∈ V.

The following result provides some conditions under which V = ∅.

Lemma 16.9.4. If $\sum_{n\ge 1} P[\|S_n\| < \varepsilon] < \infty$, then P[‖Sn‖ < ε i.o.] = 0. Conversely, if
$\sum_{n\ge 1} P[\|S_n\| < \varepsilon] = \infty$, then P[‖Sn‖ < 2ε i.o.] = 1.

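Proof. The first statement is a direct consequence of the Borel–Cantelli lemma. For the
second part, set F = {‖Sn‖ < ε i.o.}^c. Looking at the last time that ‖Sn‖ < ε we obtain
\[
\begin{aligned}
P[F] &= \sum_{m\ge 0} P\Big[\{\|S_m\|<\varepsilon\}\cap\bigcap_{n\ge m+1}\{\|S_n\|\ge\varepsilon\}\Big] \\
&\ge \sum_{m\ge 0} P\Big[\{\|S_m\|<\varepsilon\}\cap\bigcap_{n\ge m+1}\{\|S_n-S_m\|\ge 2\varepsilon\}\Big] \\
&= \sum_{m\ge 0} P[\|S_m\|<\varepsilon]\;P\Big[\bigcap_{n\ge 1}\{\|S_n\|\ge 2\varepsilon\}\Big] .
\end{aligned}
\]
Since P[F] ≤ 1 and $\sum_{m\ge 0} P[\|S_m\|<\varepsilon] = \infty$, we conclude that $P\big[\bigcap_{n\ge 1}\{\|S_n\|\ge 2\varepsilon\}\big] = 0$.
Let k ≥ 2 and set
\[
A(m,k) := \{\|S_m\|<\varepsilon\}\cap\bigcap_{n\ge m+k}\{\|S_n\|\ge\varepsilon\} .
\]
For each ℓ = 0, . . . , k − 1, the sets in Aℓ = {A(m, k) : m ≡ ℓ mod k} are pairwise disjoint.
Hence
\[
\begin{aligned}
k &\ge \sum_{\ell=0}^{k-1} P\Big[\bigcup \mathcal{A}_\ell\Big] = \sum_{m\ge 0} P[A(m,k)] \\
&\ge \sum_{m\ge 0} P\Big[\{\|S_m\|<\varepsilon\}\cap\bigcap_{n\ge m+k}\{\|S_n-S_m\|\ge 2\varepsilon\}\Big] \\
&= \sum_{m\ge 0} P[\|S_m\|<\varepsilon]\;P\Big[\bigcap_{n\ge k}\{\|S_n\|\ge 2\varepsilon\}\Big] .
\end{aligned}
\]
As before, we conclude that $P\big[\bigcap_{n\ge k}\{\|S_n\|\ge 2\varepsilon\}\big] = 0$. Therefore
\[
P\big[\liminf_n\{\|S_n\|\ge 2\varepsilon\}\big] = \lim_{k\to\infty} P\Big[\bigcap_{n\ge k}\{\|S_n\|\ge 2\varepsilon\}\Big] = 0 ,
\]
that is, P[‖Sn‖ < 2ε i.o.] = 1.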

The next result shows that convergence of the series in Lemma 16.9.4 is independent of
ε > 0. To make things simpler, we will use the uniform norm $\|x\| = \max_{1\le j\le d}|x_j|$ on Rd.
Lemma 16.9.5. For any integer m ≥ 2
\[
\sum_{n\ge 0} P[\|S_n\| < m\varepsilon] \le (2m)^d \sum_{n\ge 0} P[\|S_n\| < \varepsilon] .
\]

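Proof. Dividing the d–cube (−mε, mε)^d into (2m)^d cubes of side ε we obtain that
\[
\sum_{n\ge 0} P[\|S_n\| < m\varepsilon] \le \sum_{n\ge 0}\sum_{k} P\big[S_n \in k\varepsilon + [0,\varepsilon)^d\big] ,
\]
where the inner sum is over k ∈ {−m, . . . , m − 1}^d. Let
\[
T_k := \inf\{\ell \ge 0 : S_\ell \in k\varepsilon + [0,\varepsilon)^d\} .
\]
Then, by Fubini's theorem,
\[
\begin{aligned}
\sum_{n\ge 0} P\big[S_n \in k\varepsilon + [0,\varepsilon)^d\big]
&= \sum_{n\ge 0}\sum_{\ell=0}^{n} P\big[S_n \in k\varepsilon + [0,\varepsilon)^d,\ T_k = \ell\big] \\
&\le \sum_{\ell\ge 0}\sum_{n\ge \ell} P\big[\|S_n - S_\ell\| < \varepsilon,\ T_k = \ell\big] .
\end{aligned}
\]
Since the events {Tk = ℓ} and {‖Sn − Sℓ‖ < ε} are independent, we further obtain that
\[
\sum_{\ell\ge 0}\sum_{n\ge \ell} P\big[\|S_n - S_\ell\| < \varepsilon,\ T_k = \ell\big]
= \Big(\sum_{\ell\ge 0} P[T_k = \ell]\Big)\Big(\sum_{n\ge 0} P[\|S_n\| < \varepsilon]\Big)
\le \sum_{n\ge 0} P[\|S_n\| < \varepsilon] .
\]
As the cardinality of {−m, . . . , m − 1}^d is (2m)^d, the conclusion of the lemma follows.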
Theorem 16.9.6. For any random walk Sn, V = ∅ iff $\sum_{n\ge 0} P[\|S_n\| < \varepsilon] < \infty$ for some
(and hence all) ε > 0.

Proof. This follows by combining Lemmas 16.9.4 and 16.9.5.
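As a numerical illustration of the criterion, here is a sketch for the nearest-neighbour walk on Z with steps +1 (probability p) and −1 (probability 1 − p); this example and the helper `return_series` are our own and not part of the text. For ε ≤ 1 the event {|Sn| < ε} is {Sn = 0}, which has probability C(2k, k)(p(1 − p))^k when n = 2k and 0 when n is odd. The identity Σ_k C(2k, k) x^k = (1 − 4x)^{−1/2} then suggests that the series equals 1/|2p − 1| for p ≠ 1/2 and diverges for p = 1/2.

```python
def return_series(p, terms):
    """Partial sum over n = 0, ..., terms-1 of P[S_{2n} = 0] = C(2n, n) (p(1-p))^n.

    The terms are updated iteratively to avoid huge binomial coefficients.
    """
    x = p * (1.0 - p)
    term, total = 1.0, 0.0  # the n = 0 term is P[S_0 = 0] = 1
    for n in range(terms):
        total += term
        # C(2n+2, n+1) / C(2n, n) = 2 (2n + 1) / (n + 1)
        term *= 2.0 * (2 * n + 1) / (n + 1) * x
    return total

# Symmetric walk: partial sums keep growing (roughly like 2 sqrt(N / pi)).
symmetric = [return_series(0.5, N) for N in (100, 1000, 10000)]
# Biased walk, p = 0.6: the series converges, here to 1 / |2p - 1| = 5.
biased = return_series(0.6, 10000)
print(symmetric, biased)
```

Under these assumptions the symmetric walk falls in the recurrent case V ≠ ∅, while the biased walk has V = ∅.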

Lemma 16.9.7. Suppose Sn is a recurrent random walk in R and V ≠ {0}. Then either V = hZ
for some h > 0, or V = R.

Proof. Suppose V is not a lattice, i.e., there is no h > 0 for which V/h ⊂ Z. We claim
that m := inf{x ∈ V : x > 0} = 0. Suppose m > 0. Then, there is d ∈ V such that
mq < d < m(q + 1) for some q ∈ N. Hence
\[
m < \frac{d}{q} < m\Big(1 + \frac{1}{q}\Big) .
\]
Then, there is x ∈ V such that m ≤ x < d/q; consequently,
\[
0 < d - xq < q(m - x) + m \le m .
\]
Since d − xq ∈ V, this contradicts the definition of m. Therefore, m = 0. To conclude,
for any ε > 0 choose v ∈ V with 0 < v < ε. Since $(0,\infty) = \bigcup_{n\ge 0}(nv, (n+1)v]$ and
{nv : n ∈ N} ⊂ V, we conclude that any x ∈ (0, ∞) is within distance ε of V. As V = −V, this
shows that V is dense in R; since V is closed, V = R.

We conclude this section with an important result for one–dimensional random walks.
Theorem 16.9.8. (Chung–Fuchs) Suppose Sn is a random walk on R. If $\frac{1}{n}S_n \to 0$ in
probability, then V ≠ ∅.
Proof. By Theorem 16.9.6, it suffices to show that $\sum_{n\ge 0} P[|S_n| < 1] = \infty$. By Lemma 16.9.5
(with ε = 1), for any m ≥ 2 and L ∈ N
\[
\sum_{n\ge 0} P[|S_n| < 1] \ge \frac{1}{2m}\sum_{n\ge 0} P[|S_n| < m] \ge \frac{1}{2m}\sum_{n=0}^{Lm} P\Big[|S_n| < \frac{n}{L}\Big] ,
\]
where the last inequality follows from the observation that x ↦ P[|Sn| < x] is nondecreasing
for each n. By assumption $\lim_n P\big[\frac{|S_n|}{n} < \frac{1}{L}\big] = 1$; hence, letting m → ∞,
\[
\sum_{n\ge 0} P[|S_n| < 1] \ge \frac{L}{2} .
\]
Since L is arbitrary, it follows that $\sum_{n\ge 0} P[|S_n| < 1] = \infty$.

16.10. Exercises
Exercise 16.10.1. Show that if {Ct : t ∈ T} is an independent family of π–systems, then
the σ–algebras σ(Ct), t ∈ T, are independent.

Exercise 16.10.2. Suppose X, Y are identically distributed random variables and that
X > 0 and E[X] < ∞. Show that $E\big[\frac{Y}{X}\big] > 1$, unless X is constant a.s.

Exercise 16.10.3. Suppose X1, . . . , Xn are positive independent random variables with
Xj ∼ Gamma(aj, θ). Show that
\[
P = \Big(\frac{X_1}{\sum_{j=1}^n X_j}, \ldots, \frac{X_{n-1}}{\sum_{j=1}^n X_j}\Big)
\]
is a random vector with values in $D_{n-1} := \{p \in \mathbb{R}_+^{n-1} : p_1 + \ldots + p_{n-1} < 1\}$ whose
distribution $\delta_{a_1,\ldots,a_n}$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}_+^{n-1}$, and
\[
\frac{d\delta_{a_1,\ldots,a_n}}{d\lambda_{n-1}}(p) = \frac{1}{B(a_1,\ldots,a_n)}\, p_1^{a_1-1}\cdots p_{n-1}^{a_{n-1}-1}\big(1-(p_1+\ldots+p_{n-1})\big)^{a_n-1}\, 1_{D_{n-1}}(p) ,
\]
where B(a1, . . . , an) is the generalized Beta function (see Example 9.6.12). The probability
measure $\delta_{a_1,\ldots,a_n}$ on $D_{n-1}$ is called Dirichlet's distribution with parameters a1, . . . , an.

Exercise 16.10.4. Suppose X and Y are independent Rd –valued random vectors (that is,
Rd –valued measurable functions) defined on a common probability space (Ω, F , P). Let µX
and µY be the laws of X and Y respectively. Show that the law µZ of Z = X + Y is given
by the convolution µX ∗ µY .

Exercise 16.10.5. Suppose that X and Y are independent random variables defined on a
common probability space (Ω, F, P). If X, Y ∈ L1(P) and $E[Y] := \int Y\,dP = 0$, show that
E[|X|] ≤ E[|X + Y|].

Exercise 16.10.6. Let (ϵn : n ∈ N) be an i.i.d. sequence of Bernoulli random variables with
p = 1/2. Let
\[
X = \sum_{n\ge 1} \frac{\epsilon_n}{3^n} .
\]
Show that X has the Cantor Devil's stairs distribution defined in Example 3.4.4. Find E[X]
and var[X].
Exercise 16.10.7. If Z is a compound random walk subordinated by N with P–distributed
steps, show that
(i) $\varphi_Z(t) = E[e^{itZ}] = \sum_{n=0}^{\infty} \varphi_P^n(t)\,P[N = n] = E[(\varphi_P(t))^N]$.
(ii) If N is Poisson distributed with parameter λ, then $\varphi_Z(t) = \exp\big(\lambda(\varphi_P(t) - 1)\big)$.
(iii) If N is geometric with parameter p, then $\varphi_Z(t) = \frac{p}{1-(1-p)\varphi_P(t)}$.

Exercise 16.10.8. Let (S, F, µ) be a probability space and suppose that I is a countable set of
indices. If the set of projections {pj : j ∈ I} is i.i.d. on $(S^I, F^{\otimes I}, \mu^{\otimes I})$, show that $P\circ\pi^{-1} = P$
for every finite permutation π of I. (Hint: Consider first the collection of all finite dimensional
elementary cylinders.)
Exercise 16.10.9. For any q ∈ (0, 1) consider the function
\[
\varphi_q(x) = \big(q - 1_{(-\infty,0]}(x)\big)\,x .
\]
It is easy to check that $\varphi_{1-q}(-x) = \varphi_q(x)$. Show that for any a ≤ b
\[
\varphi_q(x-b) - \varphi_q(x-a) = (b-x)\,1_{(a,b]}(x) + (b-a)\big(1_{(-\infty,a]}(x) - q\big) .
\]
If X ∈ L1(P) and zq is a q–th quantile of X, show that
\[
E[\varphi_q(X - z_q)] = \min_{a\in\mathbb{R}} E[\varphi_q(X - a)] .
\]
Observe that Proposition 16.6.3 follows by taking q = 1/2.
Exercise 16.10.10. Suppose µ is the step distribution of a random walk S. Show that
$U = \bigcup_{n\ge 1} \mathrm{supp}(\mu^{*n})$, and that U is closed under addition.
Exercise 16.10.11. Let {Sn : n ∈ Z+ } be a random walk on Z with steps Yn = Sn − Sn−1 ,
n ≥ 1. Suppose E[|Y1 |] < ∞ and that Y1 is aperiodic, that is, the greatest common divisor
of {m : P[Y1 = m] > 0} is 1. Show that P[Sn = x i.o.] = 1 for any x ∈ Z.
Exercise 16.10.12. Suppose S is a random walk on R+. Let µ be the step distribution and
assume µ({0}) < 1. For any t ≥ 0 define $N(t) = \sum_{n\ge 0} 1_{[0,t]}(S_n)$. Show that
\[
\lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{m}, \qquad P\text{–a.s.},
\]
where $m = \int_{[0,\infty)} x\,\mu(dx) \in \mathbb{R}_+$. (Hint: $S_{N_t - 1} \le t < S_{N_t}$ and $\lim_{n\to\infty} S_n = \infty$ P–a.s.)
Chapter 17

Weak convergence of measures

Weak convergence of measures plays an important role in probability theory, statistics and
their applications. The central limit theorem, for instance, is one such important and
widely used application. In this chapter we present the theoretical framework of weak
convergence of measures. In the following chapter we treat the setting of Euclidean spaces
and discuss the Central Limit Theorem for independent random variables.

17.1. The weak topology for measures of finite variation

For any topological space (S, τ) the collection of C(S)–Baire sets is, by Lemma 5.6.5, the
σ–algebra generated by C(S). Let M(S) denote the space of all Baire measures of finite
total variation on (S, σ(C(S))), and let M+(S) ⊂ M(S) denote the subcollection of finite
positive Baire measures. Recall that if S is a metric space, then the Baire and the Borel
σ–algebras, σ(C(S)) and B(S) respectively, coincide.
For a given linear subspace W of continuous functions on S we consider the space of
measures M for which
\[
\int |f|\,d|\mu| < \infty, \qquad f \in W,\ \mu \in M .
\]
We equip M with the weak* topology σ(M, W). In particular, when W separates points
of M, Theorem 12.11.5 implies that (M, σ(M, W)) is a locally convex Hausdorff topological
vector space whose dual is W. In that case, limits of convergent nets in (M, σ(M, W)) are
uniquely defined.
When S is a metric space, it is natural to consider the dual pair (M, W0) where W0 =
Cb(S), as M(S) is contained in the dual space of (Cb(S), ‖·‖u) and Cb(S) separates the Borel
measures. When S is a locally compact Hausdorff topological space then, based on Riesz's
representation theorem, it is natural to consider the dual pair (M, Wk) where W1 = C00(S)
or W2 = C0(S). If S is a compact metric space, then the dual pairs (M, Wk),
k = 0, 1, 2, coincide. In probability theory one is mainly concerned with the set $M_1^+$ of
probability measures as a subspace of (M, σ(M, W0)).
Definition 17.1.1. Let W be a linear space of bounded measurable functions on S. A net
{µα : α ∈ D} ⊂ M(S) converges W–weakly to µ ∈ M(S), denoted by $\mu_\alpha \xrightarrow{w} \mu$, if
\[
\lim_\alpha \int_S f\,d\mu_\alpha = \int_S f\,d\mu
\]
for all f ∈ W. If W = Cb(S) we simply say that µα converges weakly to µ, which we
denote by µα ⇒ µ. If S is locally compact and Hausdorff and W = C00(S) then we say
that µα converges vaguely to µ, which we denote by $\mu_\alpha \xrightarrow{v} \mu$; if W = C0(S) then we say
that µα converges weakly* to µ, which we denote by $\mu_\alpha \xrightarrow{w^*} \mu$.
Example 17.1.2. If {µα : α ∈ D} and µ are finite measures on (S, B(S)) and ‖µα − µ‖TV → 0,
then $\big|\int f\,d(\mu_\alpha - \mu)\big| \le \|f\|_u\,\|\mu_\alpha - \mu\|_{TV}$ for any f ∈ Cb(S). Therefore, µα ⇒ µ.
The converse is not necessarily true. For instance, consider $\mu_n = \delta_{1/n}$, n ∈ N, and µ = δ0
on (R, B(R)). Clearly µn ⇒ µ; however, ‖µn − µ‖TV = 2 for all n.
Example 17.1.3. If S is locally compact Hausdorff then $\mu_\alpha \xrightarrow{w^*} \mu$ iff supα ‖µα‖TV < ∞
and $\mu_\alpha \xrightarrow{v} \mu$. This follows from the fact that C00(S) is dense in C0(S). To see that the net
{µα} needs to be bounded, consider the example S = (0, ∞) and the sequence $\mu_n = n\delta_{1/n}$.
Then $\mu_n \xrightarrow{v} 0$; however, µn does not weak*–converge, as any function f ∈ C0(S) such that
$f(1/n) \sim 1/\sqrt{n}$ will show.

The weak topology σ(M(S), Cb(S)), as the example below shows, may be too restrictive,
for only bounded continuous functions are considered as test functions.

Example 17.1.4. On (R, B(R)), the sequence $\mu_n = \big(1 - \frac{1}{n}\big)\delta_0 + \frac{1}{n}\delta_n$ converges weakly to
δ0. Consider the (unbounded) continuous function ψ(x) = x. Then µn(ψ) = 1 ≠ 0 = δ0(ψ) for all
n ∈ N, and so µn(ψ) ↛ 0 as n → ∞.
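The computation in Example 17.1.4 can be checked numerically. The sketch below (the helper `integrate_mu_n` and the choice of cos as bounded test function are ours) integrates a bounded continuous function and the unbounded ψ(x) = x against µn.

```python
import math

def integrate_mu_n(f, n):
    """Integral of f against mu_n = (1 - 1/n) delta_0 + (1/n) delta_n."""
    return (1.0 - 1.0 / n) * f(0.0) + (1.0 / n) * f(float(n))

bounded = math.cos        # bounded continuous test function, delta_0(cos) = 1
psi = lambda x: x         # the unbounded function psi(x) = x, delta_0(psi) = 0

for n in (10, 100, 1000):
    # First column tends to cos(0) = 1; second column stays at 1 instead of 0.
    print(n, integrate_mu_n(bounded, n), integrate_mu_n(psi, n))
```

The mass 1/n escaping to infinity is invisible to bounded test functions but is detected by ψ.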

We present below one extension of the theory of weak convergence developed thus far
which enlarges the collection of test functions to include some unbounded functions, and
which is useful in many applications. Suppose ψ ∈ C(S) with ψ ≥ 1. Let
$M_\psi(S) = \{\mu \in M(S) : \int_S \psi\,d|\mu| < \infty\}$ and $C^\psi(S) = \{f \in C(S) : \psi^{-1}f \in C_b(S)\}$. Equip $M_\psi(S)$
with the weak topology $\sigma(M_\psi(S), C^\psi(S))$. As $\|\psi^{-1}\mu\|_{TV} < \infty$ for all µ ∈ M(S), the
map $\Psi : M(S) \to M_\psi(S)$ given by $\mu \mapsto \frac{1}{\psi}\cdot\mu$ is well defined, and $\Psi(\mu)f = \int f\,\frac{1}{\psi}\,d\mu$ for all
$f \in C^\psi(S)$. Since $C_b(S) \subset C^\psi(S)$, the weak topology $\sigma(M_\psi(S), C^\psi(S))$ on $M_\psi(S)$ is stronger
than the relative topology on $M_\psi(S)$ inherited as a subspace of (M(S), σ(M(S), Cb(S))).
The following theorem shows that results about weak convergence in σ(M(S), Cb(S))
can then be translated into results about weak convergence in $\sigma(M_\psi(S), C^\psi(S))$.

Theorem 17.1.5. The map Ψ is a homeomorphism between (M(S), σ(M(S), Cb(S))) and
$\big(M_\psi(S), \sigma(M_\psi(S), C^\psi(S))\big)$.

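Proof. Notice that µ ∈ M(S) iff $\frac{1}{\psi}\cdot\mu \in M_\psi(S)$, and that f ∈ Cb(S) iff $\psi f \in C^\psi(S)$. Let
µ ∈ M(S), f1, . . . , fN ∈ Cb(S), N ∈ N, and ε > 0. Consider the neighborhood
\[
U_\varepsilon(\mu; f_1,\ldots,f_N) := \Big\{\nu \in M(S) : \Big|\int f_j\,(d\nu - d\mu)\Big| < \varepsilon,\ j = 1,\ldots,N\Big\} .
\]
Clearly ν ∈ Uε(µ; f1, . . . , fN) iff, for each j,
\[
\Big|\int f_j\,d\nu - \int f_j\,d\mu\Big| = \Big|\int \psi f_j\,d\Big(\frac{1}{\psi}\cdot\nu - \frac{1}{\psi}\cdot\mu\Big)\Big|
= \Big|\int \psi f_j\,d\big(\Psi(\nu) - \Psi(\mu)\big)\Big| < \varepsilon .
\]
This shows that $\Psi\big(U_\varepsilon(\mu; f_1,\ldots,f_N)\big) = U_\varepsilon(\Psi(\mu); \psi f_1,\ldots,\psi f_N)$, whence the conclusion
follows immediately.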

17.2. Weak convergence of measures on metric spaces

For the rest of this section we will assume that (S, d) is a metric space. We use Lb (S) to
denote the space of all real lower semicontinuous functions which are bounded below, Ub (S)
to denote the space of all real upper semicontinuous functions bounded above, and Lipb (S)
the space of bounded Lipschitz functions in (S, d).
Theorem 17.2.1. Let (S, d) be a metric space. For any net {µα : α ∈ D} ⊂ M+(S) and
µ ∈ M+(S),
(i) µα ⇒ µ if and only if
\[
\liminf_\alpha \int f\,d\mu_\alpha \ge \int f\,d\mu \tag{17.1}
\]
for all f ∈ Lb(S).

Suppose that (S, d) is a locally compact separable metric space.
(ii) If $\mu_\alpha \xrightarrow{v} \mu$, then (17.1) holds for all 0 ≤ f ∈ Lb(S).

Proof. (i): Suppose that µα ⇒ µ and let g ∈ Lb(S) with g ≥ c. By Theorem B.1.6, there
is a sequence gk of bounded Lipschitz functions such that c ≤ gk ≤ gk+1 ↗ g. Hence, for
each k
\[
\liminf_\alpha \int g\,d\mu_\alpha \ge \liminf_\alpha \int g_k\,d\mu_\alpha = \int g_k\,d\mu .
\]
As µ(S) < ∞, by monotone convergence we obtain that $\liminf_\alpha \int g\,d\mu_\alpha \ge \int g\,d\mu$.
Conversely, suppose (17.1) holds for all functions in Lb(S) and let f ∈ Cb(S). Since
Cb(S) ⊂ Lb(S), both f and −f are in Lb(S), so
\[
\liminf_\alpha \int f\,d\mu_\alpha \ge \int f\,d\mu, \qquad
\liminf_\alpha \int (-f)\,d\mu_\alpha \ge \int (-f)\,d\mu .
\]
Therefore, $\lim_\alpha \int f\,d\mu_\alpha = \int f\,d\mu$.

(ii) Let 0 ≤ f ∈ Lb(S) and let fk ∈ Cb(S) be such that 0 ≤ fk ↗ f pointwise. Since S is
locally compact and separable, there is a sequence of open sets Vj with compact closure such
that $\overline{V_j} \subset V_{j+1} \nearrow S$. Choose vj ∈ C00(S) so that $1_{\overline{V_j}} \le v_j \le 1_{V_{j+1}}$ and supp vj ⊂ Vj+1.
Let $f_{kj} = f_k v_j$; clearly $f_{kj} \in C_{00}(S)$ and $f_{kj} \nearrow f_k$ as j ↗ ∞. Then for all k and j
\[
\liminf_\alpha \int f\,d\mu_\alpha \ge \liminf_\alpha \int f_k\,d\mu_\alpha \ge \liminf_\alpha \int f_{kj}\,d\mu_\alpha = \int f_{kj}\,d\mu .
\]
By monotone convergence we obtain (17.1) by letting j ↗ ∞ and then k ↗ ∞.

Let Ub(S) ⊂ Cb(S) denote the collection of all bounded uniformly continuous functions
on S. Then Lipb(S) ⊂ Ub(S) ⊂ Cb(S), and so, by Corollary 12.11.12,
σ(M(S), Lipb(S)) ⊂ σ(M(S), Ub(S)) ⊂ σ(M(S), Cb(S)). These weak topologies coincide on the cone M+(S).
Corollary 17.2.2. Let (S, d) be a metric space. A net {µα : α ∈ D} ⊂ M+ (S) converges
weakly to µ ∈ M(S) if and only if µα f → µf for all f ∈ Lipb (S).

Proof. Suppose limα µα f = µf for each f ∈ Lipb(S). We claim that µ ∈ M+(S). Indeed,
for any function $g \in C_b^+(S)$ there is, by Theorem B.1.6, a sequence $\{f_n : n \in \mathbb{N}\} \subset \mathrm{Lip}_b^+(S)$
such that fn ↗ g. Then 0 ≤ limα µα fn = µfn. By dominated convergence limn µfn =
µg. Hence µg ≥ 0 for every $g \in C_b^+(S)$. The conclusion follows as a consequence of
Theorem B.1.6 along with Theorem 17.2.1(i).
Theorem 17.2.3. (Portmanteau theorem) Let (S, d) be a metric space, µ ∈ M+ (S) and
suppose {µα : α ∈ D} is a net in M+(S). The following statements are equivalent:
(i) µα converges weakly to µ.
(ii) lim supα µα(S) ≤ µ(S) and lim infα µα(U) ≥ µ(U) for each open set U.
(iii) lim infα µα(S) ≥ µ(S) and lim supα µα(F) ≤ µ(F) for each closed set F.
(iv) limα µα (A) = µ(A) for each Borel set A such that µ(∂A) = 0.

Proof. (i) ⟹ (ii): If (i) holds then limα µα(S) = µ(S). For any open set U, the indicator
1U is a bounded lower semicontinuous function. Therefore, by Theorem 17.2.1(i), (ii) holds.

The equivalence of (ii) and (iii) is evident by taking complements.

(iii) ⟹ (iv): Suppose that A is such that µ(∂A) = 0. Denote by A° the interior of A and by
Ā its closure. Since µα(A°) ≤ µα(A) ≤ µα(Ā), statements (ii) and (iii) give
\[
\mu(A^\circ) \le \liminf_\alpha \mu_\alpha(A) \le \limsup_\alpha \mu_\alpha(A) \le \mu(\overline{A}) .
\]
Since 0 = µ(∂A) = µ(Ā \ A°), we have µ(Ā) = µ(A°) = µ(A), and (iv) follows.

(iv) ⟹ (i): Since ∂S = ∅, we have that limα µα(S) = µ(S). Suppose f ∈ Cb(S) with f ≥ c
so that 0 ≤ g := f − c ∈ Cb(S). The sets Ft := {g = t} ∈ B(S), t ≥ 0, are pairwise disjoint.
Since µ(S) < ∞, µ(Ft) = 0 for all but countably many t ≥ 0. For any δ > 0 and k ∈ Z+ define
$B_{k,\delta} := \{k\delta \le g < (k+1)\delta\}$. As g is bounded, for each δ > 0 there is Nδ ∈ N such that
$B_{k,\delta} = \emptyset$ for all k > Nδ; since g is continuous, $\partial B_{k,\delta} \subset F_{k\delta} \cup F_{(k+1)\delta}$. It follows that for any
n ≥ 1, there are uncountably many 0 < δ < 1/n such that
\[
\mu(F_{k\delta}) = 0 \quad \text{for all } k \in \mathbb{N}. \tag{17.2}
\]
For each such δ we have the estimate
\[
\int g\,d\mu - \delta\mu(S) \le \sum_{k=0}^{N_\delta} k\delta\,\mu(B_{k,\delta}) = \lim_\alpha \sum_{k=0}^{N_\delta} k\delta\,\mu_\alpha(B_{k,\delta}) \le \liminf_\alpha \int g\,d\mu_\alpha .
\]
Letting δ → 0 over all δ satisfying (17.2), and using limα µα(S) = µ(S) to pass from g to
f = g + c, leads to
\[
\int f\,d\mu \le \liminf_\alpha \int f\,d\mu_\alpha .
\]
Substituting f by −f implies that $\limsup_\alpha \int f\,d\mu_\alpha \le \int f\,d\mu$. Consequently, µα ⇒ µ.
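A minimal sketch of the Portmanteau statements for µn = δ1/n ⇒ µ = δ0 on R follows; the sets and the helper `dirac` are illustrative choices of ours. It shows that the inequalities in (ii) and (iii) can be strict, and that (iv) needs µ(∂A) = 0.

```python
def dirac(point):
    """Return the set function A -> delta_point(A); A is passed as an indicator."""
    return lambda indicator: 1.0 if indicator(point) else 0.0

open_set = lambda x: 0.0 < x < 2.0         # U = (0, 2), open; its boundary contains 0
closed_set = lambda x: x == 0.0            # F = {0}, closed
continuity_set = lambda x: -1.0 < x < 1.0  # A = (-1, 1); boundary {-1, 1} is mu-null

mu = dirac(0.0)
for n in (2, 10, 100):
    mu_n = dirac(1.0 / n)
    print(n, mu_n(open_set), mu_n(closed_set), mu_n(continuity_set))
print("limit", mu(open_set), mu(closed_set), mu(continuity_set))
```

Here lim µn(U) = 1 > 0 = µ(U) and lim µn(F) = 0 < 1 = µ(F), while µn(A) → µ(A) for the µ–continuity set A.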
Lemma 17.2.4. Let {xα } be a net in a metric space S and x ∈ S. δxα ⇒ δx iff xα → x.

Proof. δxα ⇒ δx iff limα f(xα) = f(x) for all f ∈ Cb(S). If xα → x, this holds by continuity
of f. Conversely, the particular choice f(y) = 1 ∧ d(y, x) shows that xα → x.
Lemma 17.2.5. The set {δx : x ∈ S} is weakly closed in M+ (S).

Proof. Suppose that δxα ⇒ µ. As M+(S) is closed in (M(S), σ(M(S), Cb(S))) and
{δx : x ∈ S} ⊂ M+(S), µ ∈ M+(S). Let x ∈ supp(µ). For any open neighborhood V of x,
let f ∈ Cb(S) be such that 0 ≤ f ≤ 1, f(x) = 1 and f = 0 on S \ V. Clearly $\int f\,d\mu > 0$
and, as limα δxα f = limα f(xα) = µf, there exists α0 ∈ D such that α ≥ α0 implies that
xα ∈ V. Therefore, xα → x and δx = µ.
Theorem 17.2.6. For any metric space (S, ρ), co(δx : x ∈ S) is σ(M(S), Cb(S))–dense in
$M_1^+(S)$.

Proof. Suppose there exists $\mu \in M_1^+(S) \setminus \overline{\mathrm{co}}(\delta_x : x \in S)$. As M(S) is locally convex with
respect to the weak topology σ(M(S), Cb(S)), by Theorem 12.10.15(b) there exist a function
f ∈ M′(S) = Cb(S) and a constant c ∈ R such that νf < c < µf for all ν ∈ co(δx : x ∈ S).
In particular, $f(x) = \delta_x f < c < \int f\,d\mu$ for all x ∈ S. Integrating the first inequality against
the probability measure µ, it follows that $\int f\,d\mu \le c < \int f\,d\mu$, which is a contradiction.

The following result is a direct consequence of Theorem 17.2.6.

Corollary 17.2.7. For any metric space (S, ρ), span{δx : x ∈ S} and co(aδx : a ≥ 0, x ∈ S)
are σ(M(S), Cb (S))–dense in M(S) and M+ (S) respectively.

The next result gives sufficient conditions for the uniform convergence of integrals with
respect to a weakly convergent net of positive measures.
Theorem 17.2.8. (R. Rao) Let (S, d) be a separable metric space and suppose that the net
{µα : α ∈ D} ⊂ M+(S) converges weakly to a nonnegative measure µ. If Γ ⊂ Cb(S) is
uniformly bounded (i.e., supf∈Γ ‖f‖u < ∞) and equicontinuous (i.e., for any x ∈ S and
ε > 0 there is δ > 0 such that d(x, y) < δ implies that |f(x) − f(y)| < ε for all f ∈ Γ), then
\[
\lim_\alpha \sup_{f\in\Gamma} \Big|\int f\,d(\mu_\alpha - \mu)\Big| = 0 .
\]

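Proof. Let M := supf∈Γ ‖f‖u. For any x ∈ S and ε > 0 there is an open ball Bx centered
at x such that µ(∂Bx) = 0 and |f(x) − f(y)| < ε for all y ∈ Bx and f ∈ Γ. Since S is
separable, $S = \bigcup_{n\in\mathbb{N}} B_{x_n}$ for some countable subcollection of balls. Set $A_1 = B_{x_1}$, and
$A_n = B_{x_n} \setminus \bigcup_{j=1}^{n-1} A_j$ for n > 1. It follows that {An : n ∈ N} is a pairwise disjoint collection
of Borel sets covering S with µ(∂An) = 0 for all n ∈ N. Define
\[
\nu := \sum_n \mu(A_n)\,\delta_{x_n}, \qquad \nu_\alpha := \sum_n \mu_\alpha(A_n)\,\delta_{x_n} .
\]
For any δ > 0, there is N ∈ N large enough such that $\mu\big(S \setminus \bigcup_{n=1}^N A_n\big) < \delta$. Since
$\partial\big(S \setminus \bigcup_{n=1}^N A_n\big) \subset \bigcup_{n=1}^N \partial A_n$, we have that $\lim_\alpha \mu_\alpha\big(S \setminus \bigcup_{n=1}^N A_n\big) = \mu\big(S \setminus \bigcup_{n=1}^N A_n\big)$.
Hence, for any f ∈ Γ
\[
\begin{aligned}
\Big|\int_S f\,d(\nu_\alpha - \nu)\Big| &\le M\Big(\sum_{n=1}^N |\mu_\alpha(A_n) - \mu(A_n)| + \sum_{n>N} |\mu_\alpha(A_n) - \mu(A_n)|\Big) \\
&\le M\sum_{n=1}^N |\mu_\alpha(A_n) - \mu(A_n)| + M\Big(\mu_\alpha\Big(S \setminus \bigcup_{n=1}^N A_n\Big) + \mu\Big(S \setminus \bigcup_{n=1}^N A_n\Big)\Big) .
\end{aligned}
\]
Passing to the limit we obtain that $\limsup_\alpha \sup_{f\in\Gamma} \big|\int_S f\,d(\nu_\alpha - \nu)\big| \le 2M\delta$. As δ may be
arbitrarily small, we conclude that
\[
\lim_\alpha \sup_{f\in\Gamma} \Big|\int_S f\,d(\nu_\alpha - \nu)\Big| = 0 . \tag{17.3}
\]
Since $A_n \subset B_{x_n}$, for any f ∈ Γ
\[
\begin{aligned}
\Big|\int_S f\,d(\mu_\alpha - \mu)\Big| &\le \Big|\int_S f\,d(\mu_\alpha - \nu_\alpha)\Big| + \Big|\int_S f\,d(\nu_\alpha - \nu)\Big| + \Big|\int_S f\,d(\nu - \mu)\Big| \\
&\le \sum_n \int_{A_n} |f(x) - f(x_n)|\,(\mu_\alpha + \mu)(dx) + \Big|\int_S f\,d(\nu_\alpha - \nu)\Big| \\
&\le \varepsilon\big(\mu_\alpha(S) + \mu(S)\big) + \sup_{f\in\Gamma}\Big|\int_S f\,d(\nu_\alpha - \nu)\Big| .
\end{aligned} \tag{17.4}
\]
From (17.3) and the fact that limα µα(S) = µ(S) we get that $\limsup_\alpha \sup_{f\in\Gamma} \big|\int_S f\,d(\mu_\alpha - \mu)\big| \le 2\varepsilon\mu(S)$.
The conclusion follows by letting ε → 0.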

We will use Rao's theorem to show that M+(S), as a subspace of (M(S), σ(M(S), Cb(S))),
is metrizable. Recall that a function f on S is Lipschitz iff
\[
\mathrm{Lip}(f) = \sup_{x\ne y} \frac{|f(x) - f(y)|}{d(x,y)} < \infty .
\]
We denote by Lip1(S) the collection of all bounded Lipschitz functions with ‖f‖BL :=
‖f‖u + Lip(f) ≤ 1.
Theorem 17.2.9. (Kantorovich–Rubinstein) Let (S, d) be a separable metric space. For
any µ ∈ M(S) define
\[
\|\mu\|^* := \sup\Big\{\Big|\int_S f\,d\mu\Big| : f \in \mathrm{Lip}_1(S)\Big\} . \tag{17.5}
\]
Then, ‖·‖* is a norm on M(S). Moreover, on M+(S), the topology induced by ‖·‖*
coincides with the topology induced by σ(M(S), Cb(S)).

Proof. We prove that ‖µ‖* = 0 implies that µ ≡ 0. The remaining details that show
that ‖·‖* is a norm on M(S) are left as an exercise. Let F ⊂ S be a closed set. Then
fn(x) := 1 ∧ n d(x, F) is a sequence of bounded Lipschitz functions such that $f_n \nearrow 1_{S\setminus F}$.
Then, $\big|\int_S f_n\,d\mu\big| \le \|f_n\|_{BL}\,\|\mu\|^* = 0$. By dominated convergence $|\mu(S \setminus F)| = \lim_n \big|\int_S f_n\,d\mu\big| = 0$.
Thus, µ(S \ F) = 0 for any closed set F. Since µ is a finite measure, Sierpinski's monotone
class theorem implies that µ(A) = 0 for any Borel set A.

We claim that for any net {µα : α ∈ D} ⊂ M+(S) and µ ∈ M(S), µα ⇒ µ iff µ ∈ M+(S)
and limα ‖µα − µ‖* = 0. To prove necessity, notice that M+(S) is closed in σ(M(S), Cb(S)).
Thus, µα ⇒ µ implies that µ ∈ M+(S). A direct application of Rao's theorem with
Γ = Lip1(S) shows that limα ‖µα − µ‖* = 0. To prove sufficiency, notice that for any
bounded Lipschitz function f, $\lim_\alpha \big|\int_S f\,d(\mu_\alpha - \mu)\big| \le \|f\|_{BL}\lim_\alpha \|\mu_\alpha - \mu\|^* = 0$. This shows that
limα µα f = µf for all f ∈ Lipb(S). The conclusion of the claim follows from Corollary 17.2.2.

Suppose A ⊂ M+(S) is σ(M+(S), Cb(S))–closed. If µ belongs to the closure of A under ‖·‖*,
there is a net {µα : α ∈ D} ⊂ A (a sequence will suffice) such that limα ‖µα − µ‖* = 0, which
means that µα ⇒ µ. Hence µ ∈ A and so A is ‖·‖*–closed. Conversely, suppose A is
‖·‖*–closed. If µ is in the closure of A under σ(M(S), Cb(S)), there is a net {µα : α ∈ D} ⊂ A
with µα ⇒ µ. Then, limα ‖µα − µ‖* = 0 and so µ ∈ A. This shows that A is σ(M(S), Cb(S))–
closed.
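To contrast the norm (17.5) with total variation, consider µ = δ0 − δh on R with h > 0. The sketch below is our own computation, not taken from the text: it evaluates |f(0) − f(h)| for one explicit test function with ‖f‖BL = 1, which gives the lower bound 2h/(2 + h) for ‖δ0 − δh‖*. Since any f ∈ Lip1(R) satisfies |f(0) − f(h)| ≤ min(2‖f‖u, Lip(f) h), this value appears to be the supremum as well. The helper name `bl_distance_diracs` is hypothetical.

```python
def bl_distance_diracs(h):
    """|f(0) - f(h)| for the explicit test function below, with h = d(x, y) > 0.

    f(t) = clip(a - (1 - a) t, -a, a) with a = h / (2 + h) has sup norm a and
    Lipschitz constant 1 - a, hence bounded-Lipschitz norm exactly 1.
    """
    a = h / (2.0 + h)
    f = lambda t: max(-a, min(a, a - (1.0 - a) * t))
    return abs(f(0.0) - f(h))

for n in (1, 10, 100, 1000):
    # Second column: value for delta_0 - delta_{1/n}; third: total variation norm.
    print(n, bl_distance_diracs(1.0 / n), 2.0)
```

Under these assumptions ‖δ1/n − δ0‖* tends to 0 while ‖δ1/n − δ0‖TV = 2, in line with Example 17.1.2.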

If (S, d) is a locally compact separable metric space then C0(S) is separable with respect
to the uniform norm. The Riesz representation theorem along with Alaoglu's theorem
implies that the ball $M_{\le 1} := \{\mu \in M(S) : \|\mu\|_{TV} \le 1\}$ is weak*–compact, metrizable
and thus, sequentially weak*–compact. We have shown above that if (S, d) is a separable
metric space then (M+(S), σ(M(S), Cb(S))) is metrizable. Consequently, $M_1^+(S)$ is a convex,
weakly closed and metrizable subspace of M(S). To emphasize the connection between
weak convergence and weak topology, we will give a different proof of the metrizability of
the weak topology on $M_{\le 1}^+(S)$.

Theorem 17.2.10. Let (S, d) be a metric space. If (S, d) is separable, then
\[
M_{\le 1}^+(S) := \{\mu \in M^+(S) : \|\mu\|_{TV} \le 1\}
\]
is a separable and metrizable closed subset of M(S). Furthermore, (S, d) is separable iff
$\big(M_1^+(S), \sigma(M(S), C_b(S))\big)$ is separable and metrizable.

Proof. It is obvious that $M_{\le 1}^+(S)$ is a closed set in (M(S), σ(M(S), Cb(S))).
If (S, d) is a separable metric space then, by Theorem 2.9.3, there is an equivalent metric
e on S so that (a) (S, e) is isometrically homeomorphic to a dense subset of a compact
subset $\hat{S}$ of $[0,1]^{\mathbb{N}}$, and (b) $(U_b(S,e), \|\cdot\|_u)$ is isometric to $(C_b(\hat{S}), \|\cdot\|_u)$. Conclusion (a),
together with the Riesz representation theorem, implies that $(C_b(\hat{S}), \|\cdot\|_u)^* = (M(\hat{S}), \|\cdot\|_{TV})$;
conclusion (b) implies that $(M(S), \sigma(M(S), U_b(S)))$ can be embedded as a
subspace of $(M(\hat{S}), \sigma(M(\hat{S}), C_b(\hat{S})))$. By Theorem 2.11.9, $C_b(\hat{S})$ is separable which, along
with Alaoglu's theorem and Theorem 12.12.9, implies that $\{\mu \in M(\hat{S}) : \|\mu\|_{TV} \le 1\}$ is a
$\sigma(M(\hat{S}), C_b(\hat{S}))$–compact, metrizable, and hence separable, space. As σ(M(S), Ub(S)) and
σ(M(S), Cb(S)) coincide on M+(S), it follows that $M_{\le 1}^+(S)$ is metrizable and separable.

For the last statement, Lemma 17.2.4 shows that the map x ↦ δx is a continuous embedding
of S into $M_1^+(S)$. Therefore, if $M_1^+(S)$ is separable, so is S.

If S is a separable l.c.H. topological space then, by Theorem 2.11.9, C00(S) is separable.
By the Riesz representation theorem, $(C_{00}(S), \|\cdot\|_u)^* = (M(S), \|\cdot\|_{TV})$. Hence, by
Theorem 12.12.9, any norm–bounded subset of M(S) is relatively weak*–compact and
metrizable and thus, separable. Therefore, the weak* topology is completely determined
by considering vague convergence of sequences of measures. The following result is the
analogue of the Portmanteau theorem for vague convergence in locally compact separable
(l.c.s.) metric spaces.
Theorem 17.2.11. Suppose (S, d) is a locally compact separable metric space, {µn : n ∈ N}
is a sequence in M+(S), and µ ∈ M+(S). If $\mu_n \xrightarrow{v} \mu$, then
(i) lim infn µn(U) ≥ µ(U) for all U open,
(ii) lim supn µn(K) ≤ µ(K) for all K compact,
(iii) limn µn(A) = µ(A) for every Borel set A with compact closure and µ(∂A) = 0.
Conversely, assume that supn ‖µn‖TV < ∞. If either (i) & (ii) hold, or if (iii) holds, then
$\mu_n \xrightarrow{v} \mu$.
v
Proof. (i): Assume that $\mu_n \xrightarrow{v} \mu$. As 1U is lower semicontinuous for any open set U, (i) is
a direct consequence of Theorem 17.2.1(ii).
(ii): Suppose K is compact and consider the open sets $K^\varepsilon = \{s \in S : d(s, K) < \varepsilon\}$. There
is an open set Uε such that $K \subset U_\varepsilon \subset \overline{U_\varepsilon} \subset K^\varepsilon$ and $\overline{U_\varepsilon}$ is compact. Let fε ∈ C00(S)
be such that $\mathrm{supp}(f_\varepsilon) \subset \overline{U_\varepsilon}$ and $1_K \le f_\varepsilon \le 1_{\overline{U_\varepsilon}}$. Hence,
$\limsup_n \mu_n(K) \le \lim_n \int f_\varepsilon\,d\mu_n = \int f_\varepsilon\,d\mu \le \mu(\overline{U_\varepsilon})$. Since $\mu(\overline{U_\varepsilon}) < \infty$, (ii) follows by letting ε ↘ 0.

(iii): We show that (i) and (ii) imply (iii). Suppose A is Borel measurable; then µn(A°) ≤
µn(A) ≤ µn(Ā). If Ā is compact and µ(∂A) = 0, then (iii) follows from (i) and (ii).

For the last statement, assume that (iii) holds and that supn ‖µn‖TV < ∞. Let 0 ≤ f ∈
C00(S), b = ‖f‖u, and K = supp f. By Fubini's theorem we have that $\int f\,d\mu = \int_0^b \mu(f > t)\,dt$. Since
µ(S) < ∞ and ∂{f > t} ⊂ {f = t} ⊂ K for t > 0, we obtain that µ(∂{f > t}) ≤ µ(f = t) = 0
for all but countably many t > 0. Therefore limn µn(f > t) = µ(f > t) for a.e. t. The assumption
supn ‖µn‖TV < ∞ and dominated convergence imply that
\[
\int f\,d\mu = \int_0^b \mu(f > t)\,dt = \lim_n \int_0^b \mu_n(f > t)\,dt = \lim_n \int f\,d\mu_n .
\]
Since f ∈ C00(S) implies f+, f− ∈ C00(S), vague convergence follows.
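The layer-cake identity ∫ f dµ = ∫_0^b µ(f > t) dt used above can be checked on a discrete measure. The sketch below is illustrative only: the atoms, the tent function and the helper names are our own choices, and the t–integral is approximated by a midpoint Riemann sum.

```python
def integral_direct(f, atoms):
    """Integral of f against mu = sum_i w_i delta_{x_i}; atoms is a list of (x_i, w_i)."""
    return sum(w * f(x) for x, w in atoms)

def integral_layer_cake(f, atoms, b, grid=20000):
    """Midpoint Riemann sum of t -> mu(f > t) over [0, b]."""
    dt = b / grid
    total = 0.0
    for k in range(grid):
        t = (k + 0.5) * dt
        total += sum(w for x, w in atoms if f(x) > t) * dt
    return total

# Tent function: continuous, nonnegative, supported in [-1, 1], sup norm b = 1.
f = lambda x: max(0.0, 1.0 - abs(x))
# Hypothetical finite positive measure; the atom at 3.0 lies outside supp f.
atoms = [(-0.5, 0.2), (0.0, 0.5), (0.25, 0.3), (3.0, 1.0)]

direct = integral_direct(f, atoms)
layered = integral_layer_cake(f, atoms, b=1.0)
print(direct, layered)
```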

Theorem 17.2.12. Let (S, d) be a l.c.s. metric space and let {µn}, µ be finite measures in
M+(S). The following statements are equivalent.
(i) µn ⇒ µ
(ii) $\mu_n \xrightarrow{v} \mu$ and µn(S) → µ(S).

Proof. Clearly (i) implies (ii). Conversely, assume (ii), and let f ∈ Lb(S) with c ≤ f for
some constant c. Then 0 ≤ f − c ∈ Lb(S) and by Theorem 17.2.1(ii)
$\liminf_n \int (f - c)\,d\mu_n \ge \int (f - c)\,d\mu$. The assumption µn(S) → µ(S) implies that
\[
\liminf_n \int f\,d\mu_n \ge \int f\,d\mu .
\]
The conclusion follows from Theorem 17.2.1(i).

17.3. Weak convergence under continuous transformations

An important property of weak convergence of measures is that it is preserved by continuous
transformation of spaces.

Theorem 17.3.1. Let (S, d) and (S′, d′) be metric spaces and {µα}, µ in M+(S). If h : S → S′
is continuous and µα ⇒ µ, then $\mu_\alpha\circ h^{-1} \Rightarrow \mu\circ h^{-1}$.

Proof. If f ∈ Cb(S′), then f ∘ h ∈ Cb(S), so
\[
\int_{S'} f\,d(\mu_\alpha\circ h^{-1}) = \int_S f\circ h\,d\mu_\alpha \longrightarrow \int_S f\circ h\,d\mu = \int_{S'} f\,d(\mu\circ h^{-1}) .
\]


Weak convergence also behaves well under a.s. continuous transformations. Let h be a
function on an arbitrary space X with values in a metric space (S′, d′). For any T ⊂ X, the
modulus of continuity of h on T is defined as
\[
\Omega_h(T) := \sup\{d'(h(x), h(y)) : x, y \in T\} .
\]
If X is also a metric space, the modulus of continuity of h at x is defined as
\[
\omega_h(x) = \lim_{\delta\searrow 0} \Omega_h(B(x;\delta)) = \inf_{\delta>0} \Omega_h(B(x;\delta)) .
\]

Lemma 17.3.2. Let S and S ′ be metric spaces and let h : S → S ′ . For any r > 0, the set
Jr = {x ∈ S : ωh (x) ≥ r} is closed.

Proof. If x ∈ Jrc , there is δ > 0 such that Ωh (B(x; δ)) < r. Clearly B(x; δ) ⊂ Jrc .

Lemma 17.3.3. For any function h : S → S′, the set Dh ⊂ S of discontinuities of h is an
Fσ set and thus, Borel measurable.

Proof. h is continuous at x if and only if ωh(x) = 0. By Lemma 17.3.2 the set
Eε = {x : ωh(x) ≥ ε} is closed in S. Therefore $D_h = \bigcup_n E_{1/n}$ is an Fσ set.

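Theorem 17.3.4. Under the assumptions of Theorem 17.3.1, if h : S → S′ is continuous
µ–a.s., then
(i) If µα ⇒ µ, then $\mu_\alpha\circ h^{-1} \Rightarrow \mu\circ h^{-1}$.
(ii) If µα ⇒ µ, S′ = R and h is bounded, then $\lim_\alpha \int_S h\,d\mu_\alpha = \int_S h\,d\mu$.

Proof. (i) Clearly $\lim_\alpha \mu_\alpha h^{-1}(S') = \lim_\alpha \mu_\alpha(S) = \mu(S) = \mu h^{-1}(S')$. For any closed set
F ⊂ S′, we have
\[
h^{-1}(F) \subset \overline{h^{-1}(F)} \subset D_h \cup h^{-1}(F) .
\]
If µ(Dh) = 0 then $\mu\big(\overline{h^{-1}(F)}\big) = \mu(h^{-1}(F))$. By the Portmanteau theorem
\[
\limsup_\alpha \mu_\alpha(h^{-1}(F)) \le \limsup_\alpha \mu_\alpha\big(\overline{h^{-1}(F)}\big) \le \mu\big(\overline{h^{-1}(F)}\big) = \mu(h^{-1}(F)) .
\]
(ii) Let f(x) = ((−M) ∨ x) ∧ M where M = ‖h‖u. As h = f ∘ h and f ∈ Cb(R), by part (i)
\[
\int h\,d\mu_\alpha = \int f\circ h\,d\mu_\alpha = \int f\,d(\mu_\alpha\circ h^{-1}) \longrightarrow \int f\,d(\mu\circ h^{-1}) = \int f\circ h\,d\mu = \int h\,d\mu .
\]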
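The role of the hypothesis µ(Dh) = 0 can be seen in a small sketch with µn = δ1/n ⇒ µ = δ0 on R; the maps g and h and the helper name below are illustrative choices of ours. The continuous map g preserves weak convergence, while the step function h, discontinuous exactly at the atom of µ, does not.

```python
import math

h = lambda x: 1.0 if x > 0.0 else 0.0   # discontinuous exactly at 0, and mu({0}) = 1
g = lambda x: x * x                     # continuous map, for comparison
f = math.atan                           # bounded continuous test function on R

def pushforward_integral(f, transform, point):
    """Integral of f against delta_point o transform^{-1}, that is, f(transform(point))."""
    return f(transform(point))

for n in (1, 10, 100, 1000):
    print(n, pushforward_integral(f, g, 1.0 / n), pushforward_integral(f, h, 1.0 / n))
# Limit measure mu = delta_0: both push-forwards integrate f to atan(0) = 0.
print("limit", pushforward_integral(f, g, 0.0), pushforward_integral(f, h, 0.0))
```

Along h the integrals stay at atan(1) = π/4, so µn ∘ h⁻¹ = δ1 does not converge weakly to µ ∘ h⁻¹ = δ0.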

An important consequence of Theorem 17.3.4 which is useful in Probability and Statistics
is the following result.
Corollary 17.3.5. In M(R), if µn ⇒ µ, then $\int |x|\,\mu(dx) \le \liminf_n \int |x|\,\mu_n(dx)$.

Proof. Consider the function $h_a(x) = |x|\,1_{\{|x|\le a\}}$. Notice that ha is continuous everywhere
except on $D_{h_a} = \{\pm a\}$. With the exception of at most countably many points a, we have
that µ({±a}) = 0. For such typical a, Theorem 17.3.4 shows that
\[
\int_{\{|x|\le a\}} |x|\,\mu(dx) = \lim_n \int_{\{|x|\le a\}} |x|\,\mu_n(dx) \le \liminf_n \int |x|\,\mu_n(dx) .
\]
The conclusion follows by letting a ↗ ∞ along typical values.

17.4. Tightness and Prohorov’s theorem

Weak convergence criteria for (complex or signed) measures in a complete separable metric
space (S, d) are related to concentration of mass in sets of finite size. In this setting, M(S)
is the collection of all finite (complex) measures on B(S).
Definition 17.4.1. Let (S, d) be a metric space with the Borel σ–algebra. A family Π ⊂
M(S) is tight if for every ε > 0 there is a compact set K ⊂ S such that
sup{|µ|(S \ K) : µ ∈ Π} < ε
where |µ| is the variation measure of µ.
Example 17.4.2. The sequence of measures $\mu_n = n\delta_0 - \delta_{\cos n}$ is tight, but not uniformly
bounded in total variation. The family of measures $(-1)^n\delta_{\cos n}$ is tight and also uniformly
bounded in total variation. The sequence $\nu_n = \frac{1}{2n}1_{[-n,n]}$ (understood as densities with respect
to Lebesgue measure) is uniformly bounded in total variation, but it is not tight.
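A short numerical sketch of Example 17.4.2 follows, assuming νn is the uniform distribution on [−n, n]; the helper names are ours. It computes the mass that each family leaves outside the compact set [−K, K].

```python
import math

def nu_mass_outside(n, K):
    """nu_n(|x| > K) for nu_n uniform on [-n, n], i.e. density 1/(2n) on [-n, n]."""
    return max(0.0, 1.0 - K / n)

def mu_variation_outside(n, K):
    """|mu_n|(|x| > K) for mu_n = n delta_0 - delta_{cos n}."""
    atoms = [(0.0, float(n)), (math.cos(n), -1.0)]
    return sum(abs(w) for x, w in atoms if abs(x) > K)

def mu_total_variation(n):
    """Total variation of mu_n; the atoms 0 and cos n are distinct for integer n >= 1."""
    return n + 1.0

for n in (10, 1000, 100000):
    print(n, nu_mass_outside(n, 100.0), mu_variation_outside(n, 1.0), mu_total_variation(n))
```

For every K the supremum over n of νn(|x| > K) equals 1, so no single compact set works for {νn}; in contrast, [−1, 1] carries all the mass of every µn even though ‖µn‖TV = n + 1 is unbounded.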

Let (X, B) be a topological space with its Borel σ–algebra. A measurable function
V : X −→ [0, ∞] is precompact or norm–like if V −1 ([0, r]) is compact for any 0 ≤ r < ∞.
Theorem 17.4.3. A collection Π ⊂ M(X) is bounded in total variation and tight iff there
exists a precompact function V ≥ 1 such that
\[
\sup_{\mu\in\Pi} \int V\,d|\mu| < \infty .
\]
Proof. Suppose there is a precompact function V ≥ 1 with $a := \sup_{\mu\in\Pi} \int V\,d|\mu| < \infty$.
Then
\[
\sup_{\mu\in\Pi} \|\mu\|_{TV} \le \sup_{\mu\in\Pi} \int V\,d|\mu| < \infty .
\]
For any ε > 0 let r > 1 be such that r > a/ε. The set $K = V^{-1}([1, r])$ is compact and
\[
\sup_{\mu\in\Pi} |\mu|(K^c) \le \sup_{\mu\in\Pi} |\mu|(V > r) \le \frac{a}{r} < \varepsilon .
\]
Therefore, Π is bounded in total variation and tight.
Conversely, suppose Π is of bounded total variation and tight. There exists a sequence of
compact sets K1 ⊂ K2 ⊂ . . . such that
\[
|\mu|(K_n^c) < 2^{-n}, \qquad \mu \in \Pi .
\]
Let $V(x) = 1 + \sum_{n=1}^{\infty} 1_{K_n^c}(x)$. For any r > 0 let nr = [r] + 1. As $\{V \le r\} \subset \{V \le n_r\} \subset K_{n_r}$,
V is precompact and
\[
\sup_{\mu\in\Pi} \int V\,d|\mu| \le \sup_{\mu\in\Pi} \|\mu\|_{TV} + 1 < \infty .
\]

Definition 17.4.4. Let (S, d) be a metric space and µ ∈ M(S). A set A ∈ B(S) is inner
regular with respect to µ if
(17.6) |µ|(A) = sup{|µ|(K) : K compact, K ⊂ A},
and µ is inner regular, or simply regular , if (17.6) holds for all A ∈ B(S).

If (17.6) holds with supremum over closed sets, then A is closed regular .
Lemma 17.4.5. Let (S, d) be a metric space.
(i) If µ ∈ M(S) is tight, then the family R of measurable sets A such that A and S \ A
are inner regular is a σ–algebra.
(ii) For any finite measure µ ∈ M(S), the collection RF of measurable sets A such
that A and S \ A are closed regular is a σ–algebra.

Proof. (i) Without loss of generality, we may assume µ is a finite nonnegative tight measure.
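Let R be the collection of Borel sets A such that A and S \ A are both regular. Clearly
S ∈ R, and A ∈ R if and only if S \ A ∈ R. Suppose the sets $A_n$, n ∈ N, belong to R, set $A = \bigcup_n A_n$, and fix ε > 0.
For each n, there are compact sets $K_n \subset A_n$ and $L_n \subset S\setminus A_n$ such that $\mu(A_n\setminus K_n) < \varepsilon 2^{-n-1}$
and $\mu((S\setminus A_n)\setminus L_n) < \varepsilon 2^{-n}$. Choose N large enough so that $\mu\bigl(A\setminus\bigcup_{k=1}^{N}A_k\bigr) < \varepsilon/2$. The
sets $F = \bigcup_{j=1}^{N}K_j$ and $L = \bigcap_n L_n$ are compact and
\[ \mu(A\setminus F) \le \varepsilon/2 + \varepsilon\sum_{j=1}^{N}2^{-j-1} < \varepsilon, \qquad
   \mu((S\setminus A)\setminus L) \le \varepsilon\sum_{n}2^{-n} = \varepsilon. \]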

This shows that A ∈ R, and so R is a σ–algebra.

(ii) The same proof with closed in place of compact shows that RF = {A ∈ B(S) :
A and S \ A closed regular} is a σ–algebra.
Theorem 17.4.6. Let (S, d) be a metric space and µ ∈ M(S). µ is inner regular iff the
singleton {µ} is tight.

Proof. Necessity is obvious. To prove sufficiency, we may assume without loss of generality
that µ is a nonnegative tight measure on B(S). We show first that µ is closed regular. For
any open set U , let F = S \ U . The sequence of closed sets $F_n = \{x \in S : \rho(x,F) \ge \frac{1}{n}\}$,
n ∈ N, satisfies $F_n \nearrow U$. Consequently U is closed regular. By Lemma 17.4.5 we conclude
that µ is closed regular.

For ε > 0, let K be a compact set such that µ(S \ K) < ε/2. For any B ∈ B(S),
let F ⊂ B be a closed set with µ(B \ F ) < ε/2. Hence L = F ∩ K is compact and
µ(B \ L) ≤ µ(B \ F ) + µ(F \ L) < ε. This shows that µ is inner regular.
Theorem 17.4.7. (Ulam) If (S, d) is a complete separable metric space and µ ∈ M(S),
then µ is tight.

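Proof. Without loss of generality, assume µ is a nonnegative measure. Let $\{x_n\} \subset S$ be a
dense sequence and ε > 0. Denote by $B(x;r) = \{s \in S : \rho(s,x) \le r\}$. Then, for any m, there is
$n_m$ such that $\mu\bigl(S\setminus\bigcup_{j=1}^{n_m}B(x_j;\tfrac{1}{m})\bigr) < \varepsilon 2^{-m}$. Consider the set $K = \bigcap_m\bigcup_{j=1}^{n_m}B(x_j;\tfrac{1}{m})$ and
observe that
\[ \mu(S\setminus K) \le \sum_m \mu\Bigl(S\setminus\bigcup_{j=1}^{n_m}B(x_j;\tfrac{1}{m})\Bigr) \le \varepsilon\sum_m\frac{1}{2^m} = \varepsilon. \]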

As K is closed and totally bounded, the completeness of S implies that K is compact.

For any set A ⊂ S and ε > 0, denote by $A^{\varepsilon} = \{x \in S : \rho(x,A) \le \varepsilon\}$ the ε–neighborhood
of A. It is clear that $A^{\varepsilon}$ is a closed set.
Lemma 17.4.8. Let (S, d) be a complete metric space. A family Π ⊂ M(S) of (complex)
measures is tight if and only if for any ε > 0, there is a compact K ⊂ S such that
$\sup_{\mu\in\Pi}|\mu|(S\setminus K^{\varepsilon}) \le \varepsilon$.

Proof. Necessity is obvious, even without the assumption of completeness. To prove
sufficiency, let ε > 0 be fixed and let $\varepsilon_n > 0$ be a sequence such that $\sum_n\varepsilon_n < \varepsilon$. For any n
there is a compact $K_n \subset S$ such that $\sup_{\mu\in\Pi}|\mu|(S\setminus K_n^{\varepsilon_n}) \le \varepsilon_n$.
We will show that the closed set $K = \bigcap_n K_n^{\varepsilon_n}$ is in fact compact. Let $\{x_k\} \subset K$ and
choose $\{s_k\} \subset K_1$ such that $\rho(x_k,s_k) \le \varepsilon_1$. By compactness of $K_1$, we obtain a convergent
subsequence $\{s_{m^1_k}\} \subset \{s_k\}$ so that $\operatorname{diam}(\{s_{m^1_k}\}) < \varepsilon_1/2$. It follows that $\operatorname{diam}(\{x_{m^1_k}\}) \le 3\varepsilon_1$.
Suppose that for ℓ ≥ 1, we have obtained a subsequence $\{x_{m^{\ell}_k}\} \subset \{x_{m^{\ell-1}_k}\}$ with
$\operatorname{diam}(\{x_{m^{\ell}_k}\}) \le 3\varepsilon_{\ell}$. Then, by compactness of $K_{\ell+1}$, we can construct a
subsequence $\{x_{m^{\ell+1}_k}\} \subset \{x_{m^{\ell}_k}\}$ such that $\operatorname{diam}(\{x_{m^{\ell+1}_k}\}) \le 3\varepsilon_{\ell+1}$. By a diagonal argument, we
can extract a subsequence $\{x_{n_k}\} \subset \{x_k\}$ such that $\operatorname{diam}(\{x_{n_m} : m \ge k\}) \le 3\varepsilon_k$. Therefore,
$\{x_{n_k}\}$ is a Cauchy sequence and, by completeness, it converges to a point x, which belongs to K because K is closed.
Observe that K satisfies
\[ |\mu|(S\setminus K) \le \sum_n |\mu|(S\setminus K_n^{\varepsilon_n}) \le \sum_n\varepsilon_n < \varepsilon \]
for all µ ∈ Π. Hence, Π is tight.
Theorem 17.4.9. (Prohorov) Suppose (S, d) is a complete and separable metric space and
let Π ⊂ M(S). The following statements are equivalent
(i) Every sequence {µn } ⊂ Π has a weakly convergent subsequence.
(ii) The family Π is tight and bounded in total variation.

The above conditions are equivalent on a complete metric space (S, d) if each µ ∈ Π is tight.
On any metric space (S, d), (ii) implies (i).

Proof. Assume (i). First we show that Π is bounded in total variation. Suppose that there
is $\{\mu_n\}$ with $\|\mu_n\|_{TV} > n$ and let $\{\mu_{n'}\}$ be a convergent subsequence; then, $\sup_{n'}|\mu_{n'}f| < \infty$
for any $f \in C_b(S)$. Since $\Pi \subset C_b(S)^*$ and $C_b(S)$ is a Banach space, it follows from the
Banach–Steinhaus theorem that $\{\mu_{n'}\}$ is bounded with respect to the total variation norm.
This contradicts the choice of $\{\mu_n\}$.
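We show now that Π is tight. Suppose that Π fails to be tight. Then there exists ε > 0
such that for any compact K ⊂ S, one can find $\mu_K \in \Pi$ such that $\|\mu_K\| \ge |\mu_K|(K) + \varepsilon$. In
particular, there is $\mu_1 \in \Pi$ with $\|\mu_1\| > \varepsilon$. Ulam's theorem provides a compact set $K_1 \subset S$
with $|\mu_1|(K_1) > \varepsilon$. By Lemma 17.4.8, there is $\mu_2 \in \Pi$ such that $|\mu_2|(S\setminus K_1^{\varepsilon}) > \varepsilon$. Let
$K_2 \subset S\setminus K_1^{\varepsilon}$ be a compact set so that $|\mu_2|(K_2) > \varepsilon$. By induction, having constructed a
compact set $K_m$ and $\mu_m \in \Pi$ with $K_m \subset S\setminus\bigcup_{j=1}^{m-1}K_j^{\varepsilon}$ and $|\mu_m|(K_m) > \varepsilon$, we can find
$\mu_{m+1} \in \Pi$ so that $|\mu_{m+1}|\bigl(S\setminus\bigcup_{j=1}^{m}K_j^{\varepsilon}\bigr) > \varepsilon$. Let $K_{m+1} \subset S\setminus\bigcup_{j=1}^{m}K_j^{\varepsilon}$ be a compact set
such that $|\mu_{m+1}|(K_{m+1}) > \varepsilon$. This construction yields the sequence $\{U_m\} = \{\operatorname{Int}(K_m^{\varepsilon/4})\}$ of
pairwise disjoint open sets. For each m, we choose $f_m \in C_b(S)$ such that $\mathbb{1}_{K_m} \le f_m \le \mathbb{1}_{U_m}$.
Clearly, $\sum_m f_m = \sum_m |f_m| \le 1$ and
\[ \int_S f_m\,d|\mu_m| = \int_{U_m} f_m\,d|\mu_m| > \varepsilon. \tag{17.7} \]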

By assumption, $\{\mu_m\}$ has a weakly convergent subsequence. Without loss of generality, we
assume that $\{\mu_m\}$ is already a convergent sequence. It follows that the numerical sequence
$a_n : m \mapsto \int f_m\,d|\mu_n|$ belongs to $\ell^1$, since $\sum_m |a_n(m)| \le \int\sum_m f_m\,d|\mu_n| \le \|\mu_n\|_{TV}$. For any
$b \in \ell^{\infty}$ we have that $h_b = \sum_m b_m f_m \in C_b(S)$, therefore $\langle b, a_n\rangle = \int h_b\,d\mu_n$ converges. This means
that $\{a_n\}$ converges in the weak–$(\ell^1,\ell^{\infty})$ topology; in particular, $a_n$ converges pointwise
to some numerical sequence a. By Corollary 13.1.3 to the Vitali–Hahn–Saks
theorem with $(\Omega,\mathcal{F},\nu) = (\mathbb{N}, 2^{\mathbb{N}}, \#)$, $a_n$ converges in $\ell^1$ to a. Hence, $\lim_n a_n(n) = 0$,
contradicting (17.7). Therefore, Π is tight.

Assume (ii): Choose an increasing sequence $K_n \subset S$ of compact subsets such that
\[ \sup_{\mu\in\Pi}|\mu|(S\setminus K_n) \le 2^{-n}. \]

Denote by $c = \sup_{\mu\in\Pi}\|\mu\|_{TV}$. Since any compact set K ⊂ S is a separable metric space,
it follows that C(K) is a separable Banach space. Alaoglu's theorem implies that $B_K =
\{\mu \in M(K) : \|\mu\|_{TV} \le c\}$ is weak*–compact. The separability of C(K) implies that
$B_K$ is metrizable; hence, sequentially compact. A standard diagonal argument shows that
any sequence $\{\mu_m\} \subset \Pi$ has a subsequence $\{\mu_{m_k}\}$ that weak*–converges in each space
$(M(K_n), C(K_n))_{w^*}$.

We will show that $\mu_{m_k}$ converges weakly on M(S). For $f \in C_b(S)$ and ε > 0, let $m_0$ be so
that $2^{-m_0} < \varepsilon$. Then, for all n, k with $m_n \ge m_k > m_0$ we have
\[ \Bigl|\int_S f\,d(\mu_{m_n} - \mu_{m_k})\Bigr| \le \varepsilon\|f\|_u + \Bigl|\int_{K_{m_0}} f\,d(\mu_{m_n} - \mu_{m_k})\Bigr|. \tag{17.8} \]
The choice of $\{\mu_{m_k}\}$ implies that $\int_S f\,d\mu_{m_k}$ is a numerical Cauchy sequence. Let us denote
the limit by $\mu^*(f) = \lim_k\int_S f\,d\mu_{m_k}$. It is clear that $\mu^*$ is a bounded linear functional on
$C_b(S)$. It remains to show that $\mu^*$ is a measure on $\sigma(C_b(S)) = B(S)$. By considering the
families $\{|\mu| : \mu \in \Pi\}$ and $\{\mu^+ : \mu \in \Pi\}$, we can assume without loss of generality that
Π consists of nonnegative measures. So, let $\{f_n\} \subset C_b(S)$ be a nonincreasing sequence with $f_n \searrow 0$ pointwise. By
Dini's theorem, $f_n \searrow 0$ uniformly on the compact set $K_{m_0}$ ($m_0$ as before). Therefore, for
all n large enough, we have that
\[ 0 \le \int_S f_n\,d\mu_{m_k} \le \|f_1\|_u\,\varepsilon + \int_{K_{m_0}} f_n\,d\mu_{m_k} \le (\|f_1\|_u + c)\,\varepsilon, \]
from which we conclude that $\lim_n\mu^*(f_n) = 0$. The Daniell–Stone theorem implies that $\mu^*$
is a measure.
Remark 17.4.10. If $\mu_n \Rightarrow \mu$ converges weakly, $|\mu_n|$ may fail to converge. Consider for
example $\mu_n = \delta_0 - \delta_{1/n}$ for n even and $\mu_n = \frac{1}{n}\delta_n$ for n odd. Then, $\mu_n \Rightarrow 0$ but $|\mu_n|$ does not
converge weakly.

If both $\mu_n$ and $|\mu_n|$ converge weakly, say to µ and ν respectively, then it might be that
$\nu \ne |\mu|$. Consider $\mu_n = \delta_0 - \delta_{1/n}$. Then $\mu_n \Rightarrow 0$, while $|\mu_n| \Rightarrow 2\delta_0$.

If $|\mu_n|$ converges weakly, $\mu_n$ might fail to converge. Consider $\mu_n = \delta_0 - \delta_{1/n}$ for n even and
$\mu_n = 2\delta_0$ for n odd. Then $|\mu_n| \Rightarrow 2\delta_0$, but $\mu_n$ fails to converge weakly.
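
The limits claimed in Remark 17.4.10 can be verified by pairing each measure with a test function. The sketch below assumes $S = \mathbb{R}$ and an arbitrary $f \in C_b(\mathbb{R})$; both are taken here only for illustration.

```latex
% Case mu_n = delta_0 - delta_{1/n}: continuity of f at 0 gives
\[ \int f\,d\mu_n = f(0) - f(1/n) \to 0, \qquad
   \int f\,d|\mu_n| = f(0) + f(1/n) \to 2f(0). \]
% Case mu_n = (1/n) delta_n: boundedness of f gives
\[ \int f\,d\mu_n = \int f\,d|\mu_n| = \tfrac{1}{n}\,f(n) \to 0. \]
% Case mu_n = 2 delta_0:
\[ \int f\,d\mu_n = \int f\,d|\mu_n| = 2f(0). \]
```

In the first example of the remark, $\int f\,d|\mu_n|$ therefore alternates between values near $2f(0)$ (n even) and values near 0 (n odd), so $|\mu_n|$ has no weak limit as soon as $f(0) \ne 0$.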

17.5. Vague convergence for σ–finite measures

Here we consider vague convergence of Radon measures on a locally compact separable
metric space.
Theorem 17.5.1. Suppose (S, d) is a locally compact separable metric space. If $\Delta = \{\mu_t :
t \in T\}$ is a family of Radon measures such that $c_K = \sup_t\mu_t(K) < \infty$ for every compact
K ⊂ S, then any sequence $\{\mu_m\} \subset \Delta$ admits a subsequence $\{\mu_{m_k}\}$ that converges vaguely.

Proof. Without loss of generality, we assume that $S = \bigcup_n K_n$ with each $K_n$ compact and
$K_n \subset \operatorname{Int}(K_{n+1})$. Since a compact set K ⊂ S is a separable metric space, C(K) is a separable Banach
space. By Alaoglu's theorem, $B_K = \{\mu \in M(K) : \|\mu\|_{TV} \le c_K\}$ is weak*–compact. The
separability of C(K) implies that $B_K$ is metrizable and hence, sequentially compact. A
standard diagonal argument shows that any sequence $\{\mu_m\} \subset \Delta$ has a subsequence $\{\mu_{m_k}\}$
that weak*–converges on each space $(M(K_n), C(K_n))_{w^*}$.

We show that $\mu_{m_k}$ converges vaguely to some measure ν in (S, B(S)) that is finite on
compact sets. Let $f \in C_{00}(S)$ and suppose $\operatorname{supp}(f) \subset K_{m_0}$. Then, by the choice of $\{\mu_{m_k}\}$,
the numerical sequence $\int f\,d\mu_{m_k} = \int_{K_{m_0}} f\,d\mu_{m_k}$ converges. Clearly, $L(f) = \lim_k\int f\,d\mu_{m_k}$
is a positive linear functional on $C_{00}(S)$. By the Daniell–Stone theorem, it suffices to show
that L is δ–continuous. Let $\{f_n\} \subset C_{00}(S)$ be a decreasing sequence converging to zero. If
$K_p \supset \operatorname{supp}(f_1)$, then $K_p \supset \operatorname{supp}(f_n)$ and by Dini's theorem, $f_n \searrow 0$ uniformly. From
\[ L(f_n) = \lim_k\int_{K_p} f_n\,d\mu_{m_k} \le \|f_n\|_u\,c_{K_p}, \]
we conclude that $L(f_n) \searrow 0$.

17.6. Convergence determining classes

It is sometimes convenient to prove weak convergence by showing that µn (A) → µ(A) for a
special class of sets. Such a class is called a convergence–determining class. We will assume
throughout this section that µn and µ are measures in M+ (S).
Theorem 17.6.1. Suppose that U ⊂ B(S) is a π–system and that every open set in S
is a countable union of sets in U . Then, µn ⇒ µ if and only if µn (A) → µ(A) for every
A ∈ U ∪ {S}.
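Proof. If $A_1, \ldots, A_m$ are in U , then $\bigcap_{l=1}^{j}A_{k_l} \in U$ for any 1 ≤ j ≤ m and
$1 \le k_1 < \ldots < k_j \le m$. Then, from
\[ \mathbb{1}_{\bigcup_{j=1}^{m}A_j} = 1 - \prod_{j=1}^{m}(1 - \mathbb{1}_{A_j})
   = \sum_{j=1}^{m}(-1)^{j+1}\sum_{1\le k_1<\ldots<k_j\le m}\ \prod_{l=1}^{j}\mathbb{1}_{A_{k_l}} \]
it follows that $\mu_n\bigl(\bigcup_{j=1}^{m}A_j\bigr) \to \mu\bigl(\bigcup_{j=1}^{m}A_j\bigr)$. If G is open, then $G = \bigcup_n A_n$ for some finite or
infinite sequence $A_n \in U$. For any ε > 0, there is m ∈ N such that $\mu(G) < \mu\bigl(\bigcup_{j=1}^{m}A_j\bigr) + \varepsilon$.
Then $\mu(G) - \varepsilon < \mu\bigl(\bigcup_{j=1}^{m}A_j\bigr) = \lim_n\mu_n\bigl(\bigcup_{j=1}^{m}A_j\bigr) \le \liminf_n\mu_n(G)$. The conclusion follows
from the Portmanteau theorem (ii) by letting ε ց 0.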
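
For orientation, the inclusion–exclusion identity used in the proof of Theorem 17.6.1 reads as follows in the smallest nontrivial case m = 2; this is only a worked instance, with $A_1, A_2 \in U$ arbitrary.

```latex
% Expand the product for m = 2:
\[ \mathbb{1}_{A_1\cup A_2} = 1 - (1-\mathbb{1}_{A_1})(1-\mathbb{1}_{A_2})
   = \mathbb{1}_{A_1} + \mathbb{1}_{A_2} - \mathbb{1}_{A_1\cap A_2}. \]
% Integrate against mu_n; every set on the right lies in the pi-system U:
\[ \mu_n(A_1\cup A_2) = \mu_n(A_1) + \mu_n(A_2) - \mu_n(A_1\cap A_2)
   \longrightarrow \mu(A_1) + \mu(A_2) - \mu(A_1\cap A_2) = \mu(A_1\cup A_2). \]
```

The π–system assumption is what guarantees that $A_1 \cap A_2$ again belongs to U, so that the convergence hypothesis applies to each term.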
Corollary 17.6.2. Let U ⊂ B(S) be a π–system and suppose that for any x ∈ S and ε > 0
there is A ∈ U such that $x \in A^{o} \subset A \subset B(x;\varepsilon)$. If S is separable and $\mu_n(A) \to \mu(A)$ for
every A ∈ U ∪ {S}, then $\mu_n \Rightarrow \mu$.

Proof. Let G be open. Then, for any x ∈ G there is ε > 0 and A ∈ U such that
$x \in A^{o} \subset A \subset B(x;\varepsilon) \subset G$. Since S is separable, there is a finite or infinite sequence
$A_n \in U$ such that $G \subset \bigcup_n A_n^{o}$ and $A_n \subset G$. Hence, $G = \bigcup_n A_n$ and U satisfies the
hypotheses in Theorem 17.6.1.
Corollary 17.6.3. Let V be the π–system generated by the collection of open balls B(x; ε).
If S is separable and µn (A) → µ(A) for any A ∈ V ∪ {S} with µ(∂A) = 0, then µn ⇒ µ.

Proof. Since ∂B(x; ε) ⊂ {y ∈ S : ρ(x, y) = ε}, the boundaries of the open balls around a
point x are pairwise disjoint; hence, all but countably many have zero µ–measure. Since
∂(A ∩ B) ⊂ ∂A ∪ ∂B, the collection U of finite intersections of open balls with zero µ–
measure boundary satisfies the hypothesis of Corollary 17.6.2.

17.7. Uniform integrability and weak convergence of measures

In probability theory it is sometimes useful to estimate the asymptotic behavior of certain
statistical quantities of a sequence of weakly convergent probability measures, such as their
means or variances. If the random variables are uniformly integrable, these asymptotic
properties can be easily studied. A partial result in this direction has already been discussed
in Corollary 17.3.5.
Theorem 17.7.1. Let $X_n$ and X be real–valued measurable functions defined on some
finite measure space (Ω, F , µ). Suppose that $X_n \Rightarrow X$, i.e. $\mu\circ X_n^{-1} \Rightarrow \mu\circ X^{-1}$. If $\{X_n\}$ is
uniformly integrable, then $\int X\,d\mu = \lim_n\int X_n\,d\mu$.

Proof. For each α > 0 consider the functions $g_{\alpha}(x) = x\mathbb{1}_{\{|x|\le\alpha\}}$. By Theorem 17.3.4,
$\int g_{\alpha}(X)\,d\mu = \lim_n\int g_{\alpha}(X_n)\,d\mu$ for all but countably many α ≥ 0. Observe that
\[ |\mu(X_n - X)| \le |\mu(g_{\alpha}(X_n) - g_{\alpha}(X))|
   + \sup_n\int_{\{|X_n|>\alpha\}}|X_n|\,d\mu + \int_{\{|X|>\alpha\}}|X|\,d\mu. \tag{17.9} \]
As n → ∞, the first term on the right of (17.9) converges to zero for all but countably many α; as α → ∞, the second term
converges to zero by the uniform integrability of $\{X_n\}$ as in Theorem 8.7.4(iii), and the third
converges to zero for obvious reasons (dominated convergence for instance).
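
The estimate (17.9) comes from splitting each variable into a truncated part and a tail. A short sketch of that step, using only the truncation $g_{\alpha}$ defined in the proof of Theorem 17.7.1:

```latex
% Every real x decomposes as truncated part plus tail:
\[ x = g_{\alpha}(x) + x\,\mathbb{1}_{\{|x|>\alpha\}} . \]
% Apply this to X_n and X and integrate:
\[ \mu(X_n - X) = \mu\bigl(g_{\alpha}(X_n) - g_{\alpha}(X)\bigr)
   + \int_{\{|X_n|>\alpha\}} X_n\,d\mu - \int_{\{|X|>\alpha\}} X\,d\mu . \]
% Take absolute values and bound the n-dependent tail by its supremum over n
% to arrive at (17.9).
```

The order of limits matters: first let n → ∞ for a fixed admissible α, then let α → ∞ along admissible values.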
Corollary 17.7.2. Let $X_n$ and X be measurable functions with values in a topological space S, defined on
a finite measure space (Ω, F , µ), and let ϕ be a continuous real–valued function on S. Suppose
that $X_n \Rightarrow X$. If $\{\varphi(X_n)\}$ is uniformly integrable, then $\lim_n\mu\varphi(X_n) = \mu\varphi(X)$.

Proof. Follows immediately from Theorem 17.7.1 since ϕ(Xn ) ⇒ ϕ(X).

For any measurable function $X \in L_1(\Omega,\mathcal{F},\mu)$ define a measure $\nu_X \in M(\mathbb{R})$ by $\nu_X(A) =
\int_{X^{-1}(A)} X\,d\mu$. Clearly $\{X_n\}$ is a uniformly integrable sequence if and only if the sequence of
measures $\{\nu_{X_n}\}$ is tight.
Corollary 17.7.3. If $X_n \Rightarrow X$ and $\{X_n\}$ is u.i., then $\nu_{X_n} \Rightarrow \nu_X$.

Proof. Let $f \in C_b(\mathbb{R})$; the continuity of $x \mapsto xf(x)$ implies that $X_nf(X_n) \Rightarrow Xf(X)$. Since
$|f(X_n)X_n| \le \|f\|_{\infty}|X_n|$, Theorem 8.7.4 implies that $\{f(X_n)X_n\}$ is uniformly integrable.
From Theorem 17.7.1 we obtain $\int f(X_n)X_n\,d\mu \to \int f(X)X\,d\mu$.

Theorem 17.7.1 can be stated in terms of measures.

Theorem 17.7.4. Let $\mu_n$, µ be finite measures on R and suppose that $\mu_n \Rightarrow \mu$. If
\[ \lim_{\alpha\to\infty}\sup_n\int_{\{|x|>\alpha\}}|x|\,\mu_n(dx) = 0, \]
then the measures $\nu_n(dx) = x\,\mu_n(dx) \Rightarrow x\,\mu(dx)$.

Proof. It follows the same steps as that of Theorem 17.7.1, replacing $\mu f(X_n)$ with $\int f(x)\,\mu_n(dx)$
and $\mu f(X)$ with $\int f(x)\,\mu(dx)$ for all functions f involved.

17.8. Weak convergence on probability spaces

One of the most important applications of weak convergence is in the context of probability
measures, the central limit theorem being one of the most celebrated results in Probability
and Statistics. In this section we will give a representation of weak convergence of
probability measures in terms of almost sure convergence of random variables in a suitable
probability space.

Theorem 17.8.1. Let X and $X_n$ be random variables on a probability space (Ω, F , µ) with
values in a metric space (S, d).
(i) If $X_n$ converges in measure to X, then $X_n \Rightarrow X$, that is $\mu\circ X_n^{-1} \Rightarrow \mu\circ X^{-1}$.
(ii) Let a ∈ S. Then $X_n \Rightarrow a$ if and only if $X_n \to a$ in measure.

Proof. (i) Suppose the contrary, so there is $f \in C_b(S)$ and an infinite set $N \subset \mathbb{N}$ such that we have
$\inf_{n\in N}\bigl|\int (f(X_n) - f(X))\,d\mu\bigr| > 0$. By hypothesis, there is a subsequence $N' \subset N$ along
which $X_n \to X$ µ–a.s. By continuity $f(X_n) \to f(X)$ along $n \in N'$; thus, by dominated
convergence $\lim_{n\in N'}\int f(X_n)\,d\mu = \int f(X)\,d\mu$, contradicting the choice of N .
(ii) Sufficiency is clear from (i). For necessity, observe that $f(x) = d(x,a)\wedge 1$ is bounded
and continuous, so that $\int f(X_n)\,d\mu \to \int f(a)\,d\mu = 0$. The conclusion follows from Lemma
8.6.7.

Convergence in law of a sequence of random variables does not provide any
pointwise information about the random variables; moreover, each random variable may
be defined on a different probability space. When the probability laws are defined on a
nice space, it is possible to construct a probability space supporting a sequence of random
variables with prescribed laws in which the sequence of random variables converges pointwise
to a random variable with the prescribed limiting law.

Lemma 17.8.2. (Kallenberg) Suppose κ and $\{\kappa_n\}$ are random variables in S = {1, . . . , m}
such that $\kappa_n \Rightarrow \kappa$. If θ ∼ U (0, 1) and θ and κ are independent, then there are measurable
functions $f_n : S\times[0,1] \to S$ such that $\tilde\kappa_n = f_n(\kappa,\theta) \overset{d}{=} \kappa_n$ and $\tilde\kappa_n \to \kappa$ almost surely as
n → ∞.

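Proof. Let $\mu_n$ and µ be the laws of $\kappa_n$ and κ respectively and denote by $p^n_j = \mu_n(\{j\})$ and
$p_j = \mu(\{j\})$. For each n ∈ N, let $J_n$ be the set of j ∈ S such that $p^n_j \le p_j$. For each $j \in J_n$,
divide the interval $\Delta_j = [p^n_j/p_j, 1]$ in $\#(J_n^c)$ disjoint subintervals $\Delta_{j,i}$, $i \in J_n^c$, so that
\[ |\Delta_{j,i}| = \alpha_i\Bigl(1 - \frac{p^n_j}{p_j}\Bigr), \qquad
   \alpha_i = \frac{p^n_i - p_i}{\sum_{j\in J_n^c}(p^n_j - p_j)}, \qquad i \in J_n^c. \]

Let $f_n : S\times[0,1] \to S$ be defined as
\[ f_n(s,t) = \sum_{j\in J_n} j\,\mathbb{1}_{\{j\}}(s)\,\mathbb{1}_{\Delta_j^c}(t)
   + \sum_{j\in J_n^c} j\Bigl(\mathbb{1}_{\{j\}}(s) + \sum_{i\in J_n}\mathbb{1}_{\{i\}}(s)\,\mathbb{1}_{\Delta_{i,j}}(t)\Bigr) \]
and define $\tilde\kappa_n = f_n(\kappa,\theta)$. Observe that if $j \in J_n$, then
\[ P[\tilde\kappa_n = j] = P[\kappa = j]\,P[\theta \le p^n_j/p_j] = p^n_j. \]
And if $j \in J_n^c$, then
\[ P[\tilde\kappa_n = j] = P[\kappa = j] + \sum_{i\in J_n}P[\kappa = i]\,P[\theta \in \Delta_{i,j}]
   = p_j + \alpha_j\sum_{i\in J_n}(p_i - p^n_i) = p^n_j, \]
where the last equality uses $\sum_{i\in J_n}(p_i - p^n_i) = \sum_{i\in J_n^c}(p^n_i - p_i)$.
Hence, $\tilde\kappa_n \overset{d}{=} \kappa_n$, and since $\lim_n p^n_j = p_j$ for each j ∈ S, we have that $\tilde\kappa_n \to \kappa$ P–a.s.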
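
A small numerical instance may help to see how the coupling in the proof of Lemma 17.8.2 moves mass. The values below (m = 2, $p = (\tfrac12,\tfrac12)$, $p^n = (\tfrac14,\tfrac34)$) are hypothetical and chosen here only for illustration.

```latex
% Data: p_1 = p_2 = 1/2, p^n_1 = 1/4, p^n_2 = 3/4, so J_n = {1} and J_n^c = {2}.
\[ \Delta_1 = [p^n_1/p_1,\,1] = [\tfrac12, 1], \qquad \alpha_2 = 1, \qquad
   \Delta_{1,2} = [\tfrac12, 1]. \]
% Resulting coupling: keep kappa unless kappa = 1 and theta falls in Delta_1,
% in which case the value is moved to 2.
\[ \tilde\kappa_n = \begin{cases} 1 & \text{if } \kappa = 1,\ \theta < \tfrac12,\\
   2 & \text{if } \kappa = 2 \text{ or } (\kappa = 1,\ \theta \ge \tfrac12). \end{cases} \]
% Check of the law and of the mismatch probability:
\[ P[\tilde\kappa_n = 1] = \tfrac12\cdot\tfrac12 = \tfrac14 = p^n_1, \qquad
   P[\tilde\kappa_n = 2] = \tfrac12 + \tfrac12\cdot\tfrac12 = \tfrac34 = p^n_2, \qquad
   P[\tilde\kappa_n \ne \kappa] = \tfrac14 = p_1 - p^n_1 . \]
```

In general the mismatch event is $\{\kappa = j \in J_n,\ \theta \ge p^n_j/p_j\}$; since $p^n_j/p_j \to 1$ whenever $p_j > 0$, almost every outcome eventually leaves this event, which is the almost sure convergence claimed in the lemma.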
Theorem 17.8.3. (Skorokhod–Dudley) Let µ and $\{\mu_n\}$ be probability measures on a
separable metric space (S, d) such that $\mu_n \Rightarrow \mu$. There exist random variables X and $\{X_n\}$
on (S, d) defined on a common probability space (Ω, F , P) such that $X_n \overset{d}{=} \mu_n$, $X \overset{d}{=} \mu$ and
$X_n \to X$ P–a.s. as n → ∞.

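Proof. For any p ∈ N let $\{B^p_k : k \in \mathbb{N}\}$ be a partition of S by measurable sets such that
$\sup_k\operatorname{diam}(B^p_k) < 2^{-p}$ and $\mu(\partial B^p_k) = 0$ for each k. Choose $m_p$ so that $\mu\bigl(\bigcup_{k=1}^{m_p}B^p_k\bigr) > 1 - 2^{-p}$
and define
\[ A^p_0 = \Bigl(\bigcup_{k=1}^{m_p}B^p_k\Bigr)^c, \qquad A^p_k = B^p_k, \quad 1 \le k \le m_p. \]
For each n, p ∈ N and $0 \le k \le m_p$, let
\[ \mu^p_{n,k}(\cdot) = \begin{cases} \mu_n(\,\cdot\,|A^p_k) & \text{if } \mu_n(A^p_k) \ne 0,\\ \mu & \text{otherwise.} \end{cases} \]
By Corollary 16.3.4 there exists a probability space (Ω, F , P) and random variables $G^p_{n,k} \overset{d}{=}
\mu^p_{n,k}$, $X \overset{d}{=} \mu$ and $\theta \overset{d}{=} U(0,1)$ on Ω such that $\{G^p_{n,k}\}$ and (X, θ) are independent, as well as
X and θ.
Let $\{Y_n\}$ be random variables in S such that $Y_n \overset{d}{=} \mu_n$ (not necessarily defined on a common
probability space). For each p, define $K^p : s \mapsto \sum_{k=0}^{m_p}k\,\mathbb{1}_{A^p_k}(s)$, and set $\kappa^p = K^p(X)$,
$\kappa^p_n = K^p(Y_n)$. Since $\lim_n\mu_n(A^p_k) = \mu(A^p_k)$ for each $0 \le k \le m_p$, it follows that $\kappa^p_n \Rightarrow \kappa^p$ as
n → ∞. Consequently, by Lemma 17.8.2, there exist random variables $\tilde\kappa^p_n = \tilde\kappa^p_n(X,\theta)$, with the same law as $\kappa^p_n$, such
that $\tilde\kappa^p_n \to \kappa^p$ P–a.s. as n → ∞. Define
\[ X^p_n = G^p_{n,k} \quad\text{on } \{\tilde\kappa^p_n = k\}, \]
and observe
\[ P[X^p_n \in A] = \sum_{k=0}^{m_p}P[G^p_{n,k} \in A,\ \tilde\kappa^p_n = k] = \sum_{k=0}^{m_p}\mu_n(A\cap A^p_k) = \mu_n(A) \]
for any A ∈ B(S), that is, $X^p_n \overset{d}{=} \mu_n$ for each n, p ∈ N. Since $X \in A^p_{\kappa^p}$ and $X^p_n \in A^p_{\tilde\kappa^p_n}$
P–a.s., we have that
\[ \{d(X^p_n, X) > 2^{-p}\} \subset \{\tilde\kappa^p_n \ne \kappa^p\}\cup\{X \in A^p_0\}. \]
Since $\tilde\kappa^p_n \to \kappa^p$ P–a.s. and $\{\tilde\kappa^p_n \ne \kappa^p\} = \{|\tilde\kappa^p_n - \kappa^p| \ge 1\}$, then $\lim_m P\bigl[\bigcup_{n\ge m}\{\tilde\kappa^p_n \ne \kappa^p\}\bigr] = 0$,
and from $\mu(A^p_0) < 2^{-p}$, we conclude that there is $n_p \in \mathbb{N}$ such that
\[ P\Bigl[\bigcup_{n\ge n_p}\{\tilde\kappa^p_n \ne \kappa^p\}\Bigr] < 2^{-p}. \]

We may assume that $n_1 < n_2 < \ldots$. By the Borel–Cantelli lemma, we have that $\sup_{n\ge n_p}d(X^p_n, X) \le 2^{-p}$
P–a.s. for all but finitely many p. If $X_n = X^p_n$ for $n_p \le n < n_{p+1}$, then $X_n \overset{d}{=} \mu_n$ and $X_n \to X$
P–a.s. as n → ∞.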

The following result, very useful in applications, states that sequences of random
variables that are close to one another have the same weak limit distribution.

Theorem 17.8.4. (Slutsky) Let {Xn }, {Yn } and X be random variables in (S, d) defined on
a probability space (Ω, F , P). If Xn ⇒ X and d(Xn , Yn ) → 0 in probability, then Yn ⇒ X.

Proof. For F ⊂ S, let $F^{\varepsilon} = \{x \in S : d(x,F) \le \varepsilon\}$. Then
\[ P[Y_n \in F] \le P[d(X_n,Y_n) > \varepsilon] + P[X_n \in F^{\varepsilon}]. \]
Since $F^{\varepsilon}$ is closed, by the Portmanteau theorem (iii), we obtain
\[ \limsup_n P[Y_n \in F] \le \limsup_n P[X_n \in F^{\varepsilon}] \le P[X \in F^{\varepsilon}]. \]

If F is closed, then $F^{\varepsilon} \searrow F$ as ε ց 0 and the result follows from the Portmanteau theorem
(iii).
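
The first inequality in the proof of Slutsky's theorem rests on a set inclusion, which the following sketch spells out; it uses only the sets $F^{\varepsilon}$ defined in the proof.

```latex
% Split according to whether X_n and Y_n are within epsilon of each other:
\[ \{Y_n \in F\} \subset \{d(X_n,Y_n) > \varepsilon\}\cup\{Y_n \in F,\ d(X_n,Y_n) \le \varepsilon\}
   \subset \{d(X_n,Y_n) > \varepsilon\}\cup\{X_n \in F^{\varepsilon}\}, \]
% because Y_n in F and d(X_n,Y_n) <= eps give d(X_n,F) <= d(X_n,Y_n) <= eps.
% For closed F the sets F^eps decrease to F, so continuity from above yields
\[ \limsup_n P[Y_n \in F] \le \inf_{\varepsilon>0}P[X \in F^{\varepsilon}] = P[X \in F]. \]
```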

Corollary 17.8.5. Let $X_n$ and $Y_n$ be real or complex valued random variables defined on
a common probability space, and let c be a real or complex constant. Assume that $X_n \Rightarrow X$
and $Y_n \Rightarrow c$. Then
(i) $X_n + Y_n \Rightarrow X + c$.
(ii) $Y_nX_n \Rightarrow cX$.
(iii) If $c \ne 0$, then $X_n/Y_n \Rightarrow X/c$.

Proof. Clearly $[X_n, c]^{\top} \Rightarrow [X, c]^{\top}$. As $d_2([X_n,Y_n]^{\top},[X_n,c]^{\top}) = |Y_n - c|$ converges to 0
in measure, $[X_n,Y_n]^{\top} \Rightarrow [X,c]^{\top}$. Consequently, $f(X_n,Y_n) \Rightarrow f(X,c)$ for any f which is
$P\otimes\delta_c$–a.s. continuous. (i), (ii) follow from the particular continuous cases $[x,y]^{\top} \mapsto x + y$
and $[x,y]^{\top} \mapsto yx$. (iii) follows from the particular case $h : [x,y]^{\top} \mapsto x/y$ since the set of
discontinuities of h is $D_h = \mathbb{C}\times\{0\}$ and $P[(X,c) \in \mathbb{C}\times\{0\}] = P[X \in \mathbb{C}]\,\delta_c(\{0\}) = 0$.

17.9. Exercises
Exercise 17.9.1. Let (S, τ ) be a Hausdorff topological space. Suppose W is a linear
subspace of Cb (S) which separates points of M(S). Show that
(a) M+ (S) is a closed convex pointed cone of (M, σ(M, W)).
(b) The collection $M^+_1(S) = \{\mu \in M^+(S) : \mu(S) = 1\}$ of Baire probability measures
on S is a closed convex subset of (M, σ(M, W)).
Exercise 17.9.2. Let (S, d) be a metric space. Suppose ψ ∈ C(S) and ψ ≥ 1. Show that
$\operatorname{span}\{\delta_x : x \in S\}$, $\operatorname{co}\{a\delta_x : x \in S,\ a \ge 0\}$, and $\operatorname{co}\{\delta_x : x \in S\}$ are $\sigma\bigl(M^{\psi}(S), C^{\psi}(S)\bigr)$–dense
in $M^{\psi}(S)$, $M^{\psi}_+(S)$, and $M^{\psi}(S)\cap M^+_1(S)$ respectively.
Exercise 17.9.3. Suppose (K, B(K)) is a compact metric space with Borel σ–algebra, and
let D be a dense subset of K. Show that $\bigl(M(D), \sigma(M(D), U_b(D))\bigr)$ coincides with the subspace
$M_D(K) \subset \bigl(M(K), \sigma(M(K), C_b(K))\bigr)$ of all measures ν on K such that ν(K \ D) = 0.
Exercise 17.9.4. Suppose $f \in L_b(S)$, where (S, d) is a metric space. Show that the map
$\mu \mapsto \int f\,d\mu$ on $M^+(S)$ with the relative weak topology $\sigma(M(S), C_b(S))$ is lower
semicontinuous. Similarly, if $f \in U_b(S)$, then $\mu \mapsto \int f\,d\mu$ is upper semicontinuous.
Exercise 17.9.5. Complete the proof that $\|\cdot\|_*$ given by (17.5) defines a norm on the space
M(S), where (S, d) is a metric space (not necessarily separable).
Exercise 17.9.6. Under the assumptions of Lemma 17.3.2, given δ > 0 define $h_{\delta}(x) :=
\Omega_h(B(x;\delta))$. For any r > 0 show that the set $A^{\delta}_r := \{x \in S : h_{\delta}(x) > r\}$ is open in (S, d).
Show that $\bigcap_{\delta>0}A^{\delta}_r = J_r$.
Exercise 17.9.7. Suppose Xn and X are random variables in a metric space (S, d) and
that Xn converges in probability to X. Suppose that h : S → (S ′ , d′ ) is continuous on a set
C ⊂ S and that P[X ∈ C] = 1. Show that h(Xn ) converges in probability to h(X). (Hint:
Fix ε > 0. For any δ > 0,
P∗ [d′ (h(Xn ), h(X)) > ε] ≤ P∗ [d(Xn , X) ≥ δ]
+ P∗ [d(Xn , X) < δ, d′ (h(Xn ), h(X)) > ε]
≤ P∗ [d(Xn , X) ≥ δ] + P[X ∈ Aδε ].
Use Exercise 17.9.6.)
Exercise 17.9.8. Suppose $\mu_n$, $n \in \mathbb{Z}_+$, and µ are probability measures on (R, B(R)).
On $((0,1), B((0,1)), \lambda_1)$ define $X_n(t) = \inf\{x : F_n(x) \ge t\}$ and $X(t) = \inf\{x : F(x) \ge t\}$,
where $F_n(x) = \mu_n((-\infty,x])$ (similarly for F ). If $\mu_n \Rightarrow \mu$, show that $X_n \to X$ pointwise.
Exercise 17.9.9. Let X, {Xn } be real valued random variables. Suppose Xn ⇒ X and let
{an } be a numerical sequence.
(i) Suppose that Xn + an converges in distribution. Show that an converges.
(ii) Show that if an > 0, an Xn ⇒ X, and X is not identically zero, then limn an = 1.
(iii) Show that if an → ∞ and an Xn converges in law, then Xn → 0 in measure.
Chapter 18

Weak convergence in
Euclidean spaces

Weak convergence of (complex) measures on complete separable metric spaces is fully characterized
by tightness through the Prohorov–Varadarajan theorem. However, for positive finite
measures in Euclidean spaces, the treatment can be carried out without the heavy machinery
used in Section 17.4.

18.1. Weak convergence and distribution functions

In this section we will observe that weak convergence of positive finite Borel measures on Rn
is closely related to the continuity properties of the corresponding distribution functions.

Theorem 18.1.1. (Helly–Bray) Suppose that µ, µn are nonnegative finite measures on

B(Rd ) with distributions F and Fn respectively. Then, µn ⇒ µ if and only if µn (Rd ) →
µ(Rd ), and Fn (x) → F (x) at any point x where F is continuous.

Proof. Since F is monotone nondecreasing in each coordinate variable $x_k$ and
right–continuous, it is easy to check that F is continuous at x if and only if $\lim_{y\nearrow x}F(y) = F(x)$,
that is, if and only if $\mu(\partial\{y : y \le x\}) = 0$. Then, necessity is a consequence of the
Portmanteau Theorem (iv).

To prove sufficiency we assume without loss of generality that $\mu_n(\mathbb{R}^d) = \mu(\mathbb{R}^d)$. Each
d–dimensional interval (a, b] is determined by the 2d hyperplanes that contain its faces.
Let U be the class of d–dimensional intervals for which the hyperplanes containing their
faces have µ–measure zero. Notice that there are at most countably many hyperplanes
(orthogonal to one of the coordinate axes) with positive µ–measure. For each A = (a, b] ∈ U , let
$V_A$ be the set of vertices of A; then, each $v \in V_A$ is a point of continuity for F . Since
\[ \mu_n(A) = \sum_{v\in V_A}(-1)^{p(v)}F_n(v), \qquad\text{where } p(v) = \sum_{k=1}^{d}\mathbb{1}_{\{v_k = a_k\}}, \]
we conclude that $\mu_n(A) \to \mu(A)$ for each A ∈ U . Therefore, by Corollary 17.6.2, $\mu_n \Rightarrow \mu$.
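
As an illustration, the vertex formula used in the proof of the Helly–Bray theorem takes the following familiar form for d = 2; the rectangle $(a,b] = (a_1,b_1]\times(a_2,b_2]$ is generic and F denotes the distribution function of µ.

```latex
% Vertices and their signs: p(v) counts the coordinates of v that equal a_k.
%   v = (b_1,b_2): p = 0,   v = (a_1,b_2): p = 1,
%   v = (b_1,a_2): p = 1,   v = (a_1,a_2): p = 2.
\[ \mu((a,b]) = F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2). \]
```

If the four lines $x_1 = a_1$, $x_1 = b_1$, $x_2 = a_2$, $x_2 = b_2$ carry no µ–mass, then all four vertices are continuity points of F, so each term converges when F is replaced by $F_n$.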
Theorem 18.1.2. Let $\mu_n$ be a sequence of measures in $M^+(\mathbb{R}^d)$ with distributions $F_n$,
and let F be a right–continuous function on $\mathbb{R}^d$ with positive increments. If $\sup_n\|\mu_n\|_{TV} <
\infty$ and $F_n(x) \to F(x)$ for each point x of continuity of F , then $\mu_n \xrightarrow{v} \mu$, where µ is
the Lebesgue–Stieltjes measure with $\mu((a,b]) = \prod_{k=1}^{d}\Delta_k(a_k,b_k)F$ for each d–dimensional
interval (a, b].

Proof. We will show that $\mu_n(A) \to \mu(A)$ for any Borel measurable set A such that $\mu(\partial A) =
0$ and $\overline{A}$ is compact. Let U be the collection of d–dimensional intervals for which the parallel
hyperplanes containing their faces have zero µ–measure. Clearly $\overline{A}$ is compact and $\mu(\partial A) =
0$ if A ∈ U , and as in the proof of Theorem 17.6.1, $\liminf_n\mu_n(G) \ge \mu(G)$ for any G
open. By Theorem 17.2.11, it suffices to prove that $\limsup_n\mu_n(K) \le \mu(K)$ holds for any
compact set K. Assume that $K \subset \mathbb{R}^d$ is compact; then for any ε > 0, one can choose an open set
$V = \bigcup_{j=1}^{n}(a_j,b_j) \supset K$ such that $(a_j,b_j] \in U$ for each j and $\mu(V) < \mu(K) + \varepsilon$. As in the proof of
Theorems 17.6.1 and 18.1.1, $\lim_n\mu_n(V) = \mu(V)$, so $\limsup_n\mu_n(K) \le \lim_n\mu_n(V) = \mu(V) <
\mu(K) + \varepsilon$.
Theorem 18.1.3. (Helly’s selection theorem) Any sequence of uniformly bounded measures
in (Rd , B(Rd )) has a weak*–convergent subsequence.

Proof. Without loss of generality, assume that $\sup_n\|\mu_n\|_{TV} \le 1$. A short proof follows
from Alaoglu's theorem and the Riesz representation theorem. Indeed, $C_0(\mathbb{R}^d)^* = M(\mathbb{R}^d)$ and the
closed ball $B = \{\mu \in M(\mathbb{R}^d) : \|\mu\|_{TV} \le 1\}$ is weak* compact. Since $C_0(\mathbb{R}^d)$ is separable,
B with the weak* topology is metrizable and hence sequentially compact.

If $\mu_n$ is a sequence of probability measures on $(\mathbb{R}^d, B(\mathbb{R}^d))$, the vague limit in Helly's
selection theorem may not be a probability measure since some mass may escape to infinity.
We conclude this section with a result that extends the notion of approximations to the
identity to measures in $B(\mathbb{R}^n)$.
Theorem 18.1.4. Let $\{K_{\varepsilon} : \varepsilon > 0\}$ be a family of good kernels on $\mathbb{R}^n$ such that $\int K_{\varepsilon}\,d\lambda_n = a$
for all ε > 0. For any complex measure µ on $(\mathbb{R}^n, B(\mathbb{R}^n))$, $(K_{\varepsilon}\cdot\lambda_n)*\mu$ converges weakly to
aµ as ε → 0.

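Proof. Fix $f \in C_b(\mathbb{R}^n)$. Applying Fubini's theorem and using the translation invariance of Lebesgue
measure we have that
\[ \Bigl|\int f(z)\,\bigl((K_{\varepsilon}\cdot\lambda_n)*\mu\bigr)(dz) - a\int f\,d\mu\Bigr|
   = \Bigl|\int\!\!\int\bigl(f(x+y) - f(y)\bigr)K_{\varepsilon}(x)\,dx\,\mu(dy)\Bigr|
   \le \int\!\!\int|f(x+y) - f(y)|\,|K_{\varepsilon}(x)|\,dx\,|\mu|(dy). \]

Let $M = \sup_{\varepsilon>0}\|K_{\varepsilon}\|_1$. For any η > 0, there exists a compact subset K of $\mathbb{R}^n$ such that
\[ 2\|f\|_u\,M\,|\mu|(K^c) < \frac{\eta}{3}. \tag{18.1} \]
For such K, there is δ > 0 such that
\[ \|\mu\|_{TV}\,M\sup_{\{v\in K,\ |v-u|\le\delta\}}|f(v) - f(u)| < \frac{\eta}{3}. \tag{18.2} \]
For such δ, there exists ε′ > 0 such that 0 < ε < ε′ implies that
\[ 2\|f\|_u\,\|\mu\|_{TV}\int_{|x|>\delta}|K_{\varepsilon}|(x)\,dx < \frac{\eta}{3}. \tag{18.3} \]
Putting (18.1), (18.2) and (18.3) together gives
\[ \int\!\!\int|f(x+y) - f(y)|\,|K_{\varepsilon}|(x)\,dx\,|\mu|(dy)
   \le \Bigl(\int_K\!\int_{|x|\le\delta} + \int_K\!\int_{|x|>\delta} + \int_{K^c}\!\int_{\mathbb{R}^n}\Bigr)|f(x+y) - f(y)|\,|K_{\varepsilon}|(x)\,dx\,|\mu|(dy) < \eta. \]

This shows that $(K_{\varepsilon}\cdot\lambda_n)*\mu \Rightarrow a\mu$ as ε → 0.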

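Corollary 18.1.5. Let $\varphi \in L_1(\mathbb{R}^n,\lambda_n)$ be such that $\hat\varphi \in L_1(\mathbb{R}^n,\lambda_n)$ and $\int\hat\varphi(t)\,dt = 1$. For
any complex measure µ on $B(\mathbb{R}^n)$ define
\[ S_{\varepsilon}(\mu,x) = \int\varphi(\varepsilon s)\,e^{i2\pi x\cdot s}\,\hat\mu(-2\pi s)\,ds. \]

Then, $S_{\varepsilon}\cdot\lambda_n \Rightarrow \mu$ as ε → 0. If $\hat\mu \in L_1(\mathbb{R}^n,\lambda_n)$ then $\mu \ll \lambda_n$ and
\[ \frac{d\mu}{d\lambda_n}(x) = \int e^{i2\pi x\cdot s}\,\hat\mu(-2\pi s)\,ds \]
at every Lebesgue point of $\frac{d\mu}{d\lambda_n}$.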

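Proof. Set $K(x) = \hat\varphi(-x)$ and let $K_{\varepsilon}(x) = \varepsilon^{-n}K(\varepsilon^{-1}x)$ for all ε > 0. Using Fubini's
theorem and the translation invariance of $\lambda_n$ we obtain that
\[ \begin{aligned}
K_{\varepsilon}*\mu(x) &= \varepsilon^{-n}\int\hat\varphi(\varepsilon^{-1}(y-x))\,\mu(dy)\\
 &= \varepsilon^{-n}\int\!\!\int e^{-i2\pi\varepsilon^{-1}(y-x)\cdot s}\,\varphi(s)\,ds\,\mu(dy)\\
 &= \int\!\!\int e^{-i2\pi(y-x)\cdot s}\,\varphi(\varepsilon s)\,ds\,\mu(dy)\\
 &= \int e^{i2\pi x\cdot s}\,\varphi(\varepsilon s)\int e^{-i2\pi y\cdot s}\,\mu(dy)\,ds\\
 &= \int\varphi(\varepsilon s)\,e^{i2\pi x\cdot s}\,\hat\mu(-2\pi s)\,ds.
\end{aligned} \]

The first statement follows from Theorem 18.1.4. The second statement follows from the
observations in Remark 15.5.2 and by Theorem 15.3.8(i). It can also be proved directly
by considering $\varphi(x) = e^{-\pi|x|^2}$. By dominated convergence, for any $f \in C_{00}(\mathbb{R}^n)$
\[ \int f\,d\mu = \lim_{\varepsilon\to0}\int f(x)\,S_{\varepsilon}(\mu,x)\,dx = \int f(x)\int e^{i2\pi x\cdot s}\,\hat\mu(-2\pi s)\,ds\,dx. \]
Then $\nu(dx) = \mu(dx) - \bigl(\int e^{i2\pi x\cdot s}\,\hat\mu(-2\pi s)\,ds\bigr)\,dx$ is a complex measure that vanishes on $C_{00}(\mathbb{R}^n)$.
It follows from the Riesz representation theorem that ν = 0.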
Remark 18.1.6. Theorem 18.1.4 offers a direct proof that the Fourier transform
separates complex measures in $B(\mathbb{R}^n)$. Indeed, if $\hat\mu = \hat\nu$ then $S_{\varepsilon}(\mu) = S_{\varepsilon}(\nu)$, and so for all
$f \in C_b(\mathbb{R}^n)$
\[ \mu f = \lim_{\varepsilon\to0}\int f\,S_{\varepsilon}(\mu)\,d\lambda_n = \lim_{\varepsilon\to0}\int f\,S_{\varepsilon}(\nu)\,d\lambda_n = \nu f. \]

In particular ν and µ coincide on $C_0(\mathbb{R}^n)$; hence µ = ν by Riesz representation.

18.2. Tightness and weak convergence of positive measures in Rn

The connection between tightness and convergence of (positive) measures in $\mathbb{R}^d$ is easy to
study in comparison with the case of complex measures.
Lemma 18.2.1. Let $\mu_n$, µ be measures in $M^+(\mathbb{R}^d)$. If $\mu_n \Rightarrow \mu$, then $\{\mu_n\}$ is tight.

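Proof. Given ε > 0, let M > 0 be such that $\mu(|x| > M) < \frac{\varepsilon}{2}$. Let f be a continuous
function such that $\mathbb{1}_{\mathbb{R}^d\setminus B(0;2M)} \le f \le \mathbb{1}_{\mathbb{R}^d\setminus B(0;M)}$, for instance
\[ f(x) = 0\vee\bigl(1\wedge(\tfrac{|x|}{M} - 1)\bigr). \]
Then $\limsup_n\mu_n(|x| > 2M) \le \lim_n\int f\,d\mu_n = \int f\,d\mu < \frac{\varepsilon}{2}$; hence, there is $n_0$ such that
$\sup_{n>n_0}\mu_n(|x| > 2M) < \varepsilon$. For $1 \le n \le n_0$, let $M_n > 0$ be such that $\mu_n(|x| > M_n) < \varepsilon$.
Therefore, if $J = 2M\vee\max_{1\le n\le n_0}M_n$ then $\sup_n\mu_n(|x| > J) < \varepsilon$.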
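
The cutoff function f in the proof of Lemma 18.2.1 can be checked region by region. The sketch below assumes, as earlier in the text, that B(0; r) denotes the closed ball of radius r.

```latex
% Values of f(x) = 0 v (1 ^ (|x|/M - 1)):
\[ f(x) = \begin{cases} 0, & |x| \le M,\\ |x|/M - 1 \in (0,1), & M < |x| < 2M,\\ 1, & |x| \ge 2M. \end{cases} \]
% Hence 1_{|x| > 2M} <= f <= 1_{|x| > M}, which gives the two bounds used in the proof:
\[ \mu_n(|x| > 2M) \le \int f\,d\mu_n, \qquad \int f\,d\mu \le \mu(|x| > M) < \frac{\varepsilon}{2}. \]
```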
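Lemma 18.2.2. Let $\{\mu_n\}$, µ be measures in $M^+(\mathbb{R}^d)$ such that $\mu_n \xrightarrow{v} \mu$. Then, $\{\mu_n\}$ is
tight if and only if $\lim_n\mu_n(\mathbb{R}^d) = \mu(\mathbb{R}^d)$, in which case $\mu_n \Rightarrow \mu$.

Proof. Assume $\{\mu_n\}$ is tight and let $f \in C_b(\mathbb{R}^d)$. For ε > 0, there is r > 0 such that
$\mu(|x| > r)\vee\sup_n\mu_n(|x| > r) < \frac{\varepsilon}{2(\|f\|_u+1)}$. Let $g_r \in C_{00}(\mathbb{R}^d)$ be such that $\mathbb{1}_{B(0;r)} \le g_r \le 1$.
Then
\[ \begin{aligned}
\Bigl|\int f\,d(\mu_n - \mu)\Bigr| &\le \Bigl|\int(f - fg_r)\,d\mu_n\Bigr| + \Bigl|\int fg_r\,d(\mu_n - \mu)\Bigr| + \Bigl|\int(f - fg_r)\,d\mu\Bigr|\\
 &\le \Bigl|\int fg_r\,d(\mu_n - \mu)\Bigr| + \|f\|_u\,(\mu_n + \mu)(|x| > r)\\
 &< \Bigl|\int fg_r\,d(\mu_n - \mu)\Bigr| + \varepsilon.
\end{aligned} \]
Letting n ր ∞ and then ε ց 0 we obtain $\lim_n\int f\,d\mu_n = \int f\,d\mu$.
Sufficiency follows from Theorem 17.2.12 and Lemma 18.2.1.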

Combining Lemmas 18.2.1 and 18.2.2 we obtain the following result.

Theorem 18.2.3. A family Π of uniformly bounded measures in M+ (Rd ) is tight if and
only if every sequence in Π has a weakly convergent subsequence.

Proof. For any sequence $\{\mu_n\} \subset \Pi$ there is, by Helly's selection theorem, a subsequence
$\mu_{n'}$ and a finite measure µ such that $\mu_{n'} \xrightarrow{v} \mu$. If Π is tight, then Lemma 18.2.2 implies that
$\mu_{n'} \Rightarrow \mu$.
Conversely, if Π is not tight, then for some ε > 0 there is a sequence $\{\mu_n\} \subset \Pi$ such
that $\mu_n(|x| > n) \ge \varepsilon$. By hypothesis, there is a subsequence $\{\mu_{n'}\}$ such that $\mu_{n'} \Rightarrow \mu$ and
hence, by Lemma 18.2.1, $\{\mu_{n'}\}$ is tight. Therefore, $\sup_{n'}\mu_{n'}(|x| > M) < \varepsilon$ for some M > 0.
This contradicts the choice of $\mu_n$.

18.3. Random series with independent terms

In Sections 16.7.2 and 16.7.1 we derived necessary and sufficient conditions for convergence of
random series $\sum_n X_n$ whose terms $\{X_n\}$ form an independent sequence. Using Prohorov's
theorem, we can obtain another characterization in terms of convergence in distribution.
Theorem 18.3.1. Let $\{X_n\}$ be a sequence of independent random variables defined on a
probability space (Ω, F , P). The following statements are equivalent
(a) $\sum_n X_n$ converges P–a.s.
(b) $\sum_n X_n$ converges in probability.
(c) $\sum_n X_n$ converges in distribution.

Proof. It is clear that each of (a) and (b) implies (c). Theorem 16.7.5 states that (a) and (b) are
equivalent. Thus, it suffices to show that (c) implies (b). Let $S_n = \sum_{k=1}^n X_k$, and let $\mu$ be
the Borel probability measure on $\mathbb{R}$ to which $S_n$ converges in distribution. Then, $\{S_n\}$ is uniformly tight
and so is the family $\{S_m - S_n : m < n\}$. We will show that $\{S_n\}$ is a Cauchy sequence in
measure. If that were not the case, there would be a sequence $Y_j := S_{m_j} - S_{n_j}$ such that
\[ \inf_j D(Y_j, 0) = \inf_j \int |Y_j| \wedge 1\, dP > 0. \tag{18.4} \]
By Prohorov's theorem, we can extract a subsequence $Y_{j'}$ that converges in distribution to
some probability measure $\nu$. Since $S_{m_j} = S_{n_j} + (S_{m_j} - S_{n_j})$ and the two summands are independent, it follows that $\mu * \nu = \mu$.
Hence, $\nu = \delta_0$ and therefore $Y_{j'} \to 0$ in measure; this is in contradiction to (18.4). We
conclude that $\{S_n\}$ is Cauchy and therefore convergent in probability.

18.4. Characteristic functions and weak convergence

In this section we show how characteristic functions can be used to determine weak con-
vergence of sequences of measures in M+ (Rd ). The starting point will be the following tail
estimate.
Lemma 18.4.1. If $\mu \in \mathcal{M}^+(\mathbb{R}^d)$ has characteristic function $\hat\mu$, then
\[ \mu(x : |a \cdot x| \ge r) \le \frac{r}{2} \int_{-2/r}^{2/r} \big(\mu(\mathbb{R}^d) - \hat\mu(ta)\big)\, dt \tag{18.5} \]

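Proof. It suffices to assume that $\mu(\mathbb{R}^d) = 1$. Since $\sin x \le \frac{x}{2}$ for all $x \ge 2$, we obtain
\begin{align*}
\int_{-c}^{c} \big(\mu(\mathbb{R}^d) - \hat\mu(ta)\big)\, dt &= \int \Big( \int_{-c}^{c} (1 - e^{i t a \cdot x})\, dt \Big)\, \mu(dx) \\
&= 2c \int \Big(1 - \frac{\sin(c\, a \cdot x)}{c\, a \cdot x}\Big)\, \mu(dx) \ge c\, \mu(x : |c\, a \cdot x| \ge 2).
\end{align*}
(18.5) follows by taking $c = 2/r$.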
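
Numerical illustration. The following Python sketch evaluates both sides of (18.5) for a discrete measure $\mu = \sum_i w_i \delta_{x_i}$ on $\mathbb{R}$ with $a = 1$. For such a measure the imaginary part of $\hat\mu$ integrates to zero over a symmetric interval, so the right hand side reduces to $\sum_i w_i\,(2 - r \sin(2x_i/r)/x_i)$. The atoms, the weights and the function names are hypothetical choices made only for this check.

```python
import math

def tail_mass(points, weights, r):
    """Left hand side of (18.5): mu(|x| >= r) for a discrete measure on R."""
    return sum(w for x, w in zip(points, weights) if abs(x) >= r)

def tail_bound(points, weights, r):
    """Right hand side of (18.5) with a = 1, using the closed form of
    the integral of 1 - cos(t x) over [-c, c], where c = 2 / r."""
    c = 2.0 / r
    total = 0.0
    for x, w in zip(points, weights):
        # integral of (1 - cos(t x)) dt over [-c, c]; it vanishes when x = 0
        integral = 0.0 if x == 0 else 2.0 * c - 2.0 * math.sin(c * x) / x
        total += w * integral
    return 0.5 * r * total

# Hypothetical atoms and weights (total mass 1).
points = [-3.0, -0.5, 0.0, 1.0, 4.0]
weights = [0.1, 0.2, 0.3, 0.3, 0.1]
radii = [0.5, 1.0, 2.0, 5.0]
comparison = [(tail_mass(points, weights, r), tail_bound(points, weights, r)) for r in radii]
```

Each pair in `comparison` lists the tail mass followed by its bound; the bound is typically not sharp, which is all that the tightness arguments below require.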

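Lemma 18.4.2. Suppose $\{\mu_n : n \in \mathbb{N}\} \subset \mathcal{M}^+(\mathbb{R}^d)$, and let $\hat\mu_n$ be the characteristic
function of $\mu_n$. If $\{\hat\mu_n : n \in \mathbb{N}\}$ converges pointwise to some limit $p$ and $\{\mu_n : n \in \mathbb{N}\}$ is
tight, then $\mu_n \Rightarrow \mu$ for some measure $\mu \in \mathcal{M}^+(\mathbb{R}^d)$ with $\hat\mu = p$.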

Proof. Pointwise convergence implies that $\mu_n(\mathbb{R}^d) = \hat\mu_n(0) \to p(0)$; thus, $\{\mu_n\}$ is uniformly
bounded. Tightness, Helly's theorem and Lemma 18.2.2 imply that any subsequence of
$\{\mu_n\}$ has a weakly convergent subsequence $\mu_{n'}$. Suppose $\mu_{n'} \Rightarrow \mu$; then, since $f_t(x) =
e^{i t \cdot x} \in C_b(\mathbb{R}^d)$, we have that $p(t) = \lim_{n'} \hat\mu_{n'}(t) = \hat\mu(t)$. By uniqueness of characteristic
functions, any subsequential limit is actually $\mu$; therefore, $\mu_n \Rightarrow \mu$.

Theorem 18.4.3. (Lévy–Bochner) Let $\{\mu_n : n \in \mathbb{N}\} \subset \mathcal{M}^+(\mathbb{R}^d)$. Suppose that $\hat\mu_n(t) \to p(t)$
pointwise, and that $p$ is continuous at 0. Then, $\{\mu_n\}$ is tight and $\mu_n \Rightarrow \mu$ for some $\mu$
with $\hat\mu = p$.

Proof. Clearly $\{\mu_n\}$ is uniformly bounded, since $\mu_n(\mathbb{R}^d) = \hat\mu_n(0) \to p(0)$. By (18.5) and dominated convergence, for any
$a \in \mathbb{R}^d$ and $r$ fixed we have
\[ \limsup_n \mu_n(x : |a \cdot x| > r) \le \frac{r}{2} \int_{-2/r}^{2/r} \big(p(0) - p(ta)\big)\, dt \tag{18.6} \]
Since $p$ is continuous at 0, the right hand side of (18.6) tends to 0 as $r \nearrow \infty$. Therefore,
$\{\mu_n\}$ is tight, and by Lemma 18.4.2 $\mu_n \Rightarrow \mu$ for some $\mu$ with $\hat\mu = p$.

The next result gives necessary and sufficient conditions for weak convergence of a
sequence of probability measures in Euclidean space in terms of the corresponding sequence
of characteristic functions.

Theorem 18.4.4. Let $\{\mu, \mu_n : n \in \mathbb{N}\} \subset \mathcal{M}^+(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. Then, $\mu_n \Rightarrow \mu$ if and only if
$\hat\mu_n \to \hat\mu$ uniformly on compact sets.

Proof. Necessity follows by a direct application of Rao's theorem 17.2.8 with $(S, d) =
(\mathbb{R}^d, \|\cdot\|_2)$, and $\Gamma := \{f_t(x) = e^{i t \cdot x} : t \in K\}$ where $K \subset \mathbb{R}^d$ is compact.

Sufficiency follows from Lévy–Bochner’s theorem 18.4.3.


18.5. Positive definite functions

If $\varphi(t)$ is the characteristic function of a finite measure $\mu$ on $\mathcal{B}(\mathbb{R}^d)$, we know that $\varphi$ is
uniformly continuous, $\varphi(0) = \mu(\mathbb{R}^d)$ and that $\varphi$ is positive definite; that is
\[ \sum_{l,k=1}^{n} c_l\, \varphi(t_l - t_k)\, \overline{c_k} \ge 0 \tag{18.7} \]
for all $c_1, \ldots, c_n \in \mathbb{C}$, $t_1, \ldots, t_n \in \mathbb{R}^d$, $n \ge 1$. The converse of this result is the celebrated
Bochner–Herglotz theorem.
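
Numerical illustration. The quadratic form in (18.7) is easy to evaluate directly. The Python sketch below does so for the standard Gaussian characteristic function $e^{-t^2/2}$ and for the indicator of $[-1, 1]$; the latter is bounded and even, but the quadratic form is negative for the points and coefficients chosen here, so it cannot be a characteristic function. The test points, coefficients and function names are hypothetical choices for this check only.

```python
import math

def quadratic_form(phi, ts, cs):
    """Evaluate sum over l, k of c_l * phi(t_l - t_k) * conj(c_k), as in (18.7)."""
    return sum(cl * phi(tl - tk) * ck.conjugate()
               for tl, cl in zip(ts, cs)
               for tk, ck in zip(ts, cs))

def gaussian_cf(t):
    """Characteristic function of the standard normal law on R."""
    return math.exp(-t * t / 2.0)

def indicator(t):
    """Indicator function of the interval [-1, 1]."""
    return 1.0 if abs(t) <= 1.0 else 0.0

# Hypothetical test points and coefficients.
ts = [0.0, 1.0, 2.0]
cs = [1 + 0j, -1 + 0j, 1 + 0j]
gaussian_value = quadratic_form(gaussian_cf, ts, cs)
indicator_value = quadratic_form(indicator, ts, cs)
```

A single negative value of the quadratic form rules out positive definiteness, whereas nonnegative values at finitely many points prove nothing by themselves.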
Lemma 18.5.1. Suppose that $\varphi \in L_1(\mathbb{R}^d)$ is continuous and positive definite. There exists
a function $f \in L_1^+(\mathbb{R}^d)$ such that
\[ \varphi(t) = \hat f(t) = \int e^{-2\pi i\, t \cdot x} f(x)\, dx. \]

Proof. By Theorem 15.5.5, it is enough to show that $\hat\varphi \in L_1$. Since $\varphi$ is positive definite,
we have that
\[ (|c_1|^2 + |c_2|^2)\varphi(0) + c_1 \varphi(t_1 - t_2)\overline{c_2} + \overline{c_1}\varphi(t_2 - t_1) c_2 \ge 0 \]
for all $c_1, c_2 \in \mathbb{C}$ and $t_1, t_2 \in \mathbb{R}^d$. The choice $(c_1, c_2) = (1, 0)$ shows that $\varphi(0) \ge 0$; the
choices $(c_1, c_2) = (1, 1)$, $(t_1, t_2) = (0, t)$ and $(c_1, c_2) = (i, -1)$, $(t_1, t_2) = (0, t)$ show that
$\varphi(-t) = \overline{\varphi(t)}$; the choice $(c_1, c_2) = (\varphi(t)/|\varphi(t)|, -1)$, $(t_1, t_2) = (0, t)$ shows that $|\varphi(t)| \le
\varphi(0)$.
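Given $g \in L_1(\mathbb{R}^d)$ define $F = \varphi * g * \check g$, where $\check g(x) = \overline{g(-x)}$. Then, $F \in L_1 \cap C$ and
\[ \|F\|_u \le \|g\|_1^2\, \varphi(0), \qquad \|F\|_1 \le \|\varphi\|_1 \|g\|_1^2 \tag{18.8} \]
Since $\varphi$ is positive definite, $F(0) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \varphi(x - y) g(x) \overline{g(y)}\, dx \otimes dy \ge 0$ and $\hat F(t) =
|\hat g(t)|^2 \hat\varphi(t)$. In addition, if $g$ is in the Schwartz class $\mathcal{S}$, then $|\hat g|^2 \in \mathcal{S}$ by Corollary 15.7.4;
moreover, $|\hat g|^2 \hat\varphi \in L_1 \cap C_0$ and by Theorem 15.5.5,
\[ F(x) = \int e^{2\pi i\, t \cdot x} \hat F(t)\, dt = \int e^{2\pi i\, t \cdot x} |\hat g(t)|^2 \hat\varphi(t)\, dt. \]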

We show now that $\hat\varphi \ge 0$. Indeed, if $\hat\varphi(t_0) < 0$ for some point $t_0 \in \mathbb{R}^d$, choose a real valued
function $\rho \in C_{00}^{\infty}(\mathbb{R}^d)$ that equals one at $t_0$ and that vanishes outside a neighborhood of $t_0$
in which $\hat\varphi$ is negative. Let $g(x) = \mathcal{F}^{-1}\rho(x) = \int e^{2\pi i\, t \cdot x} \rho(t)\, dt$ so that, by Theorem 15.5.5,
$\hat g = \rho$. Thus, $0 \le F(0) = \int \rho^2(t) \hat\varphi(t)\, dt < 0$, which is a contradiction.
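Let $g_1(x) = \frac{1}{(\sqrt{2\pi})^d} e^{-|x|^2/2}$, $g_n(x) = n^d g_1(nx)$ and define $F_n = g_n * g_n * \varphi$. Since $\|g_n\|_1 = 1$
and $\hat g_n(t) = e^{-2\pi^2 |t|^2 / n^2} \nearrow 1$, we obtain by the Monotone Convergence Theorem and (18.8) that
\[ \int \hat\varphi(t)\, dt = \lim_n \int (\hat g_n(t))^2 \hat\varphi(t)\, dt = \lim_n F_n(0) \le \varphi(0) < \infty. \]
Therefore, $\hat\varphi \in L_1^+(\mathbb{R}^d)$.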

Theorem 18.5.2. (Bochner–Herglotz) ϕ : Rd → C is the characteristic function of a finite

measure µ in B(Rd ) iff ϕ is a bounded positive definite continuous function.

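Proof. We only prove sufficiency. If $\varphi \in L_1$, then Lemma 18.5.1 implies that $\varphi = \hat\mu$ for
some finite measure $\mu \ll \lambda_d$. The general case will be obtained from the integrable case
through the Lévy–Bochner continuity theorem. Suppose $\gamma$ is a positive integrable function
such that $\|\gamma\|_1 = 1$ and define $\sigma(t) := \int e^{i\, t \cdot x} \gamma(x)\, dx = \hat\gamma(-t/(2\pi))$. Then, $t \mapsto \varphi(t)\sigma(t)$ also
satisfies the conditions of the theorem, for
\begin{align*}
\sum_{j,k=1}^{n} c_j\, \varphi(t_j - t_k)\sigma(t_j - t_k)\, \overline{c_k} &= \sum_{j,k=1}^{n} c_j\, \varphi(t_j - t_k) \Big( \int e^{i (t_j - t_k) \cdot y} \gamma(y)\, dy \Big) \overline{c_k} \\
&= \int \sum_{j,k=1}^{n} c_j e^{i\, t_j \cdot y}\, \varphi(t_j - t_k)\, \overline{c_k e^{i\, t_k \cdot y}}\, \gamma(y)\, dy \ge 0.
\end{align*}
Consider $\gamma(x) = \frac{1}{(\sqrt{2\pi})^d} e^{-|x|^2/2}$ and define $\gamma_n(x) = n^d \gamma(nx)$ for each $n \in \mathbb{N}$. Since $\sigma_n(t) =
e^{-|t|^2/(2n^2)}$ and $\varphi$ is bounded, $\varphi\sigma_n \in L_1(\mathbb{R}^d)$ for each $n \in \mathbb{N}$; consequently, $\varphi\sigma_n = \hat\mu_n$ for some finite measure
$\mu_n \ll \lambda_d$, and $\mu_n(\mathbb{R}^d) = \varphi(0)$. Since $\lim_n \varphi\sigma_n = \varphi$ pointwise and $\varphi$ is continuous at 0, we conclude from the Lévy–Bochner continuity
theorem that there is a finite measure $\mu$ such that $\mu_n \Rightarrow \mu$ and $\hat\mu = \varphi$.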

18.6. Classical Central Limit Theorem

In this section we derive the classical central limit theorem for sequences of i.i.d. random
vectors. This result is very important in the applications of Probability and Statistics.
We will start by discussing a few estimates of the exponential function on C.

Lemma 18.6.1. Let $\{z_j, w_j : 1 \le j \le n\} \subset \mathbb{C}$ be such that $\max_j \{|z_j|, |w_j|\} \le \theta$. Then
\[ \Big| \prod_{j=1}^{n} z_j - \prod_{j=1}^{n} w_j \Big| \le \theta^{n-1} \sum_{j=1}^{n} |z_j - w_j| \tag{18.9} \]

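Proof. (18.9) holds trivially if $n = 1$. By induction, suppose that (18.9) holds for $n - 1$.
Then
\begin{align*}
\Big| \prod_{j=1}^{n} z_j - \prod_{j=1}^{n} w_j \Big| &\le \Big| z_1 \prod_{j=2}^{n} z_j - z_1 \prod_{j=2}^{n} w_j \Big| + \Big| z_1 \prod_{j=2}^{n} w_j - w_1 \prod_{j=2}^{n} w_j \Big| \\
&\le \theta \Big| \prod_{j=2}^{n} z_j - \prod_{j=2}^{n} w_j \Big| + \theta^{n-1} |z_1 - w_1| \le \theta^{n-1} \sum_{j=1}^{n} |z_j - w_j|.
\end{align*}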
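
Numerical illustration. As a sanity check that is not part of the proof, the Python sketch below evaluates both sides of (18.9) on randomly generated complex numbers, taking $\theta$ to be the largest modulus that occurs. The sampling scheme, the seed and the helper names are hypothetical choices.

```python
import cmath
import random

def product(values):
    """Product of a list of complex numbers."""
    result = 1 + 0j
    for v in values:
        result *= v
    return result

def product_bound(z, w):
    """Return (lhs, rhs) of (18.9) with theta = max_j max(|z_j|, |w_j|)."""
    theta = max(max(abs(a), abs(b)) for a, b in zip(z, w))
    lhs = abs(product(z) - product(w))
    rhs = theta ** (len(z) - 1) * sum(abs(a - b) for a, b in zip(z, w))
    return lhs, rhs

def random_complex(rng, radius):
    """A complex number with modulus at most `radius` (hypothetical sampler)."""
    return cmath.rect(rng.uniform(0.0, radius), rng.uniform(0.0, 2.0 * cmath.pi))

rng = random.Random(0)
pairs = []
for _ in range(1000):
    n = rng.randint(1, 8)
    z = [random_complex(rng, 2.0) for _ in range(n)]
    w = [random_complex(rng, 2.0) for _ in range(n)]
    pairs.append(product_bound(z, w))

# Allow a small relative slack for floating point rounding.
all_hold = all(lhs <= rhs * (1 + 1e-12) + 1e-12 for lhs, rhs in pairs)
```

In the applications below the inequality is used with $\theta = 1$ or with $\theta$ close to 1, where the factor $\theta^{n-1}$ stays bounded.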

Theorem 18.6.2. Suppose |z| ≤ 1, then |ez − 1 − z| ≤ |z|2 .


Proof. Observe that $2^{n-1} \le n!$ for $n \ge 2$. Since $e^z - 1 - z = \sum_{n \ge 2} \frac{z^n}{n!}$ and $|z| \le 1$, it
follows that
\[ |e^z - 1 - z| \le \frac{|z|^2}{2} \sum_{n \ge 2} 2^{-(n-2)} = |z|^2. \]

18.6.1. Classical CLT for i.i.d. sequences.

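Theorem 18.6.3. Suppose that $\{c_n\} \subset \mathbb{C}$ and that $c_n \to c$. Then, $\big(1 + \frac{c_n}{n}\big)^n \to e^c$.

Proof. Let $\gamma > |c|$ and $n_0$ large enough so that $|c_n| < \gamma$ and $\gamma/n \le 1$ whenever $n \ge n_0$. If
$z_j = 1 + \frac{c_n}{n}$ and $w_j = e^{c_n/n}$, $1 \le j \le n$, then by Theorem 18.6.2 we have that
\[ \max_{1 \le j \le n} \{|z_j|, |w_j|\} \le e^{\gamma/n}, \qquad \Big| e^{c_n/n} - 1 - \frac{c_n}{n} \Big| \le \frac{\gamma^2}{n^2}. \]
Therefore, by (18.9),
\[ \Big| \Big(1 + \frac{c_n}{n}\Big)^n - e^{c_n} \Big| \le e^{\gamma(n-1)/n}\, n\, \frac{\gamma^2}{n^2} \to 0. \]
The conclusion follows from the continuity of the exponential function.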
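
Numerical illustration. The Python sketch below tracks the error $|(1 + c_n/n)^n - e^c|$ for one complex limit $c$ and the sequence $c_n = c + 1/n$. Both the limit and the sequence are hypothetical choices; any sequence converging to $c$ would serve.

```python
import cmath

def compound(c_n, n):
    """Evaluate (1 + c_n / n) ** n for a complex number c_n."""
    return (1 + c_n / n) ** n

c = 0.5 + 2.0j                      # hypothetical limit
sizes = [10, 100, 1000, 10000]
errors = []
for n in sizes:
    c_n = c + 1.0 / n               # hypothetical sequence with c_n -> c
    errors.append(abs(compound(c_n, n) - cmath.exp(c)))
```

For this choice the errors decay roughly like a constant times $1/n$, which agrees with the bound $e^{\gamma(n-1)/n}\gamma^2/n$ obtained in the proof combined with $|e^{c_n} - e^c| = O(1/n)$.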
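Theorem 18.6.4. Let $\{c_{n,m} : 1 \le m \le m_n\} \subset \mathbb{C}$. Suppose that
(i) $\lim_{n \to \infty} \sup_{1 \le m \le m_n} |c_{n,m}| = 0$,
(ii) $\lim_{n \to \infty} \sum_{m=1}^{m_n} c_{n,m} = c \in \mathbb{C}$,
(iii) and $M := \sup_n \sum_{m=1}^{m_n} |c_{n,m}| < \infty$.
Then,
\[ \lim_{n \to \infty} \prod_{m=1}^{m_n} (1 + c_{n,m}) = e^c. \]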

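Proof. If $\log$ is the principal logarithm on $\mathbb{C} \setminus \big((-\infty, 0] \times \{0\}\big)$, then $\lim_{z \to 0} \frac{\log(1+z)}{z} = 1$.
Given $\varepsilon > 0$, there is $0 < \delta < 1$ such that $0 < |z| < \delta$ implies
\[ |\log(1 + z) - z| < \varepsilon |z|. \]
By (i), we have $\sup_m |c_{n,m}| < \delta$ for all $n$ large enough. For such $n$,
\[ \Big| \sum_{m=1}^{m_n} \log(1 + c_{n,m}) - \sum_{m=1}^{m_n} c_{n,m} \Big| \le \sum_{m=1}^{m_n} |\log(1 + c_{n,m}) - c_{n,m}| \le \varepsilon \sum_{m=1}^{m_n} |c_{n,m}| \le M\varepsilon. \]
By letting $n \to \infty$ and then $\varepsilon \to 0$ we obtain
\[ \lim_{n \to \infty} \sum_{m=1}^{m_n} \log(1 + c_{n,m}) = \lim_{n \to \infty} \sum_{m=1}^{m_n} c_{n,m} = c. \]
The conclusion follows from the continuity of the exponential function.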

Theorem 18.6.5. (Classical CLT) Let $\{X_n\} \subset L_2(P)$ be a sequence of i.i.d. random vectors
with covariance matrix $\Sigma = E[(X_1 - E[X_1])(X_1 - E[X_1])^*]$. If $S_n = \sum_{k=1}^{n} X_k$, then
\[ \frac{S_n - nE[X_1]}{\sqrt{n}} \Longrightarrow N(0, \Sigma) \]
where $N(0, \Sigma)$ is the multivariate normal distribution with mean 0 and covariance $\Sigma$.

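Proof. By setting $X_n' = X_n - E[X_1]$ we can assume that $\{X_n\}$ is a mean zero sequence.
Equation 15.4 in Theorem 15.1.9 shows that
\[ \varphi_{X_1}(t) = 1 - \frac{t^* \Sigma t}{2} + o(|t|^2). \]
Therefore, for fixed $t$,
\[ E\Big[ \exp\Big( \frac{i\, t \cdot S_n}{\sqrt{n}} \Big) \Big] = \Big( 1 - \frac{t^* \Sigma t}{2n} + o\Big(\frac{1}{n}\Big) \Big)^n \to \exp\Big( -\frac{1}{2} t^* \Sigma t \Big) \]
by Theorem 18.6.3. The conclusion follows from the Lévy–Bochner theorem.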
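
Numerical illustration. For i.i.d. symmetric $\pm 1$ (Rademacher) variables the characteristic function of $S_n/\sqrt{n}$ is exactly $\cos(t/\sqrt{n})^n$, so the convergence to $e^{-t^2/2}$ can be observed without simulation. The Python sketch below records the largest gap over a few values of $t$; the choice of distribution, of the test points and of the function names is hypothetical.

```python
import math

def rademacher_sum_cf(t, n):
    """Characteristic function of S_n / sqrt(n) for i.i.d. +-1 steps."""
    return math.cos(t / math.sqrt(n)) ** n

def normal_cf(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2.0)

ts = [0.5, 1.0, 2.0]                 # hypothetical test points
sizes = [10, 100, 1000, 10000]
gaps = [max(abs(rademacher_sum_cf(t, n) - normal_cf(t)) for t in ts) for n in sizes]
```

Expanding $\log\cos$ suggests that the gap at a fixed $t$ is of order $t^4 e^{-t^2/2}/(12n)$, so `gaps` should shrink by about a factor of ten at each step.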

18.6.2. Lindeberg–Feller CLT. In this section we obtain a slightly more general CLT
for independent random variables.
Theorem 18.6.6. For each $n$, let $X_{n,m}$, $1 \le m \le m_n$, be independent random vectors with
$E[X_{n,m}] = 0$. Suppose that
(1) $\sum_{m=1}^{m_n} E[X_{n,m} X_{n,m}^*] \to \Sigma$ as $n \to \infty$, where $\Sigma$ is a positive definite matrix.
(2) For any $\varepsilon > 0$, $\sum_{m=1}^{m_n} E[|X_{n,m}|^2 ; |X_{n,m}| > \varepsilon] \to 0$ as $n \to \infty$.
Then $S_n = \sum_{m=1}^{m_n} X_{n,m} \Longrightarrow N(0, \Sigma)$.

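Proof. Let $\varphi_{n,m}(t) = E[e^{i t \cdot X_{n,m}}]$ and $\Sigma_{n,m} = E[X_{n,m} X_{n,m}^*]$. By the Lévy–Bochner theorem,
it is enough to show that
\[ \prod_{m=1}^{m_n} \varphi_{n,m}(t) \to \exp\Big( -\frac{1}{2} t^* \Sigma t \Big) \tag{18.10} \]
Let $z_{n,m} = \varphi_{n,m}(t)$ and $w_{n,m} = 1 - \frac{1}{2} t^* \Sigma_{n,m} t$. For $0 < \varepsilon \le 1$, Corollary 15.4 shows that
\begin{align*}
|z_{n,m} - w_{n,m}| &\le E\Big[ \frac{|t \cdot X_{n,m}|^3}{6} \wedge |t \cdot X_{n,m}|^2 \Big] \\
&\le \frac{|t|^3}{6} E\big[ |X_{n,m}|^3 ; |X_{n,m}| \le \varepsilon \big] + |t|^2 E\big[ |X_{n,m}|^2 ; |X_{n,m}| > \varepsilon \big] \\
&\le \frac{\varepsilon |t|^3}{6} E\big[ |X_{n,m}|^2 ; |X_{n,m}| \le \varepsilon \big] + |t|^2 E\big[ |X_{n,m}|^2 ; |X_{n,m}| > \varepsilon \big].
\end{align*}
Adding over $m = 1, \ldots, m_n$ and passing first to the limit $n \to \infty$ and then $\varepsilon \to 0$, we obtain
from assumptions (1) and (2)
\[ \lim_{n \to \infty} \sum_{m=1}^{m_n} |z_{n,m} - w_{n,m}| = 0. \tag{18.11} \]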

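Assumption (2) along with
\[ \|\Sigma_{n,m}\| \le E[|X_{n,m}|^2] \le \varepsilon^2 + E[|X_{n,m}|^2 ; |X_{n,m}| > \varepsilon] \le \varepsilon^2 + \sum_{m=1}^{m_n} E[|X_{n,m}|^2 ; |X_{n,m}| > \varepsilon] \]
shows that $\lim_{n \to \infty} \sup_m \|\Sigma_{n,m}\| = 0$. Hence, for all $n$ large enough $|w_{n,m}| \le 1$. Since
$|\varphi_{n,m}(t)| \le 1$, Lemma 18.6.1 with $\theta = 1$ and (18.11) imply that
\[ \lim_{n \to \infty} \Big| \prod_{m=1}^{m_n} \varphi_{n,m}(t) - \prod_{m=1}^{m_n} \Big( 1 - \frac{1}{2} t^* \Sigma_{n,m} t \Big) \Big| = 0. \]
Since $\sum_{m=1}^{m_n} |t^* \Sigma_{n,m} t| = \sum_{m=1}^{m_n} t^* \Sigma_{n,m} t \to t^* \Sigma t$ as $n \to \infty$, the conditions in Theorem 18.6.4 with
$c_{n,m} = -\frac{1}{2} t^* \Sigma_{n,m} t$ are satisfied; hence
\[ \lim_{n \to \infty} \prod_{m=1}^{m_n} \varphi_{n,m}(t) = \lim_{n \to \infty} \prod_{m=1}^{m_n} \Big( 1 - \frac{1}{2} t^* \Sigma_{n,m} t \Big) = \exp\Big( -\frac{1}{2} t^* \Sigma t \Big). \]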

The following result is useful to derive asymptotic properties of smooth transformations of

a sequence of random vectors Xn that converges in law.

Theorem 18.6.7. (Delta Method) Let Xn , n ∈ N, and Y be random vectors in Rm . Suppose

an → ∞ and an (Xn − c) ⇒ Y . If g is a differentiable function, then

an (g(Xn ) − g(c)) ⇒ g ′ (c)Y

Proof. Using a first order Taylor expansion we obtain
\[ g(x) - g(c) - g'(c)(x - c) = o(x - c) \]
where $|o(x - c)|/|x - c| \to 0$ as $x \to c$. By Slutsky's theorem and Theorem 17.8.1, it
is enough to show that $a_n\, o(X_n - c)$ converges in law, and hence in measure, to 0. Since
$a_n(X_n - c)$ converges in law and $a_n \to \infty$, Slutsky's theorem and Theorem 17.8.1 show
that $X_n - c = \frac{1}{a_n} a_n (X_n - c)$ converges in law, and hence in measure, to 0. For any $\varepsilon > 0$,
there is $\delta > 0$ such that $|x - c| < \delta$ implies that $|o(x - c)| < \varepsilon |x - c|$. Hence,
\[ P\big[ |o(X_n - c)|/|X_n - c| \ge \varepsilon \big] \le P\big[ |X_n - c| \ge \delta \big] \to 0 \]
as $n \to \infty$, and so $o(X_n - c)/|X_n - c|$ converges in measure to 0. As $a_n(X_n - c)$ converges
in law, Slutsky's theorem shows that $a_n\, o(X_n - c) = a_n |X_n - c| \cdot \frac{o(X_n - c)}{|X_n - c|}$ converges in law to 0.

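Example 18.6.8. Let $(X_n : n \in \mathbb{N})$ be an i.i.d. sequence of random variables in $L_4(P)$,
assume that $E[X_1] = 0$ and let $\sigma^2 = E[X_1^2]$. Write $\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and $\overline{X^2}_n = \frac{1}{n}\sum_{i=1}^n X_i^2$, and define the sample variance $S_n^2$ by
\begin{align*}
(n - 1) S_n^2 &:= \sum_{i=1}^{n} (X_i - \overline{X}_n)^2 = \sum_{i=1}^{n} X_i^2 - n(\overline{X}_n)^2 \\
&= \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \Big( \sum_{j=1}^{n} X_j^2 + 2 \sum_{1 \le j < k \le n} X_j X_k \Big).
\end{align*}
Then $(n - 1)E[S_n^2] = (n - 1)(\sigma^2 - \mathrm{cov}(X_1, X_2)) = (n - 1)\sigma^2$. By the i.i.d. Central Limit
Theorem
\[ \sqrt{n} \left( \begin{pmatrix} \overline{X}_n \\ \overline{X^2}_n \end{pmatrix} - \begin{pmatrix} 0 \\ \sigma^2 \end{pmatrix} \right) \Longrightarrow N(0, \Sigma) \tag{18.12} \]
where
\[ \Sigma = \begin{pmatrix} \sigma^2 & m_3 \\ m_3 & m_4 - \sigma^4 \end{pmatrix}, \]
$m_3 = E[X_1^3]$ and $m_4 = E[X_1^4]$. Consider the function on $\mathbb{R} \times (0, \infty)$ given by $\phi :
(x, y) \mapsto y - x^2$. Then $\phi(\overline{X}_n, \overline{X^2}_n) = \frac{n-1}{n} S_n^2$ and $D\phi(0, \sigma^2) = [0, 1]$. The delta method
shows that
\[ \sqrt{n} \Big( \frac{n-1}{n} S_n^2 - \sigma^2 \Big) \Longrightarrow [0\ 1]\, N(0, \Sigma) = N(0, m_4 - \sigma^4). \]
By the law of large numbers $\frac{1}{\sqrt{n}} S_n^2 \to 0$ a.s. Therefore, by Slutsky's theorem
\[ \sqrt{n} (S_n^2 - \sigma^2) \Longrightarrow N(0, m_4 - \sigma^4). \]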
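
Numerical illustration. The limit in Example 18.6.8 can be checked roughly by simulation. The Python sketch below draws i.i.d. samples uniform on $[-1, 1]$, for which $\sigma^2 = 1/3$ and $E[X_1^4] = 1/5$, and estimates the variance of $\sqrt{n}(S_n^2 - \sigma^2)$; this should be close to $\mathrm{Var}(X_1^2) = E[X_1^4] - (\sigma^2)^2 = 4/45$. The distribution, sample sizes, seed and function names are hypothetical choices, and the estimate carries Monte Carlo error.

```python
import math
import random

def sample_variance(xs):
    """Unbiased sample variance S_n^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_of_normalized_error(n, reps, seed=0):
    """Empirical variance of sqrt(n) * (S_n^2 - sigma^2) over `reps` samples."""
    rng = random.Random(seed)
    sigma2 = 1.0 / 3.0                      # variance of Uniform[-1, 1]
    zs = []
    for _ in range(reps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        zs.append(math.sqrt(n) * (sample_variance(xs) - sigma2))
    mean_z = sum(zs) / reps
    return sum((z - mean_z) ** 2 for z in zs) / (reps - 1)

limit_variance = 1.0 / 5.0 - (1.0 / 3.0) ** 2   # E[X^4] - (E[X^2])^2 = 4/45
estimate = variance_of_normalized_error(n=200, reps=2000)
```

With these sizes the estimate is expected to agree with `limit_variance` to within a few percent; a larger number of repetitions narrows the Monte Carlo error.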

18.7. Poisson approximation

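The Poisson distribution with parameter $\lambda \ge 0$ is the measure supported on $\mathbb{Z}_+$ given by
$P_\lambda(dx) = e^{-\lambda} \sum_{n \ge 0} \frac{\lambda^n}{n!} \delta_n(dx)$. Its characteristic function is given by $\widehat{P_\lambda}(t) = \exp\big( \lambda(e^{it} - 1) \big)$.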
Theorem 18.7.1. Let $X_{n,m}$, $m = 1, \ldots, m_n$, be independent random variables with
values in $\mathbb{Z}_+$. Suppose that
(i) $\sum_{m=1}^{m_n} P[X_{n,m} = 1] \to \lambda$ as $n \to \infty$.
(ii) $\max_{1 \le m \le m_n} P[X_{n,m} = 1] \to 0$ as $n \to \infty$.
(iii) $\sum_{m=1}^{m_n} P[X_{n,m} \ge 2] \to 0$ as $n \to \infty$.
If $S_n = \sum_{m=1}^{m_n} X_{n,m}$, then $S_n \Longrightarrow P_\lambda$.

Proof. Let $Y_{n,m} = 1_{\{X_{n,m} = 1\}}$; then $P[Y_{n,m} = 1] = P[X_{n,m} = 1]$ and $P[Y_{n,m} \ne X_{n,m}] =
P[X_{n,m} \ge 2]$. Hence, $\{Y_{n,m} : 1 \le m \le m_n\}$ satisfy (i) and (ii); furthermore, if $T_n =
\sum_{m=1}^{m_n} Y_{n,m}$, then (iii) implies that $|T_n - S_n| \to 0$ in measure. Therefore, by Slutsky's
theorem, it suffices to prove that $T_n \Longrightarrow P_\lambda$. Denote $p_{n,m} = P[Y_{n,m} = 1]$; then
\[ \varphi_{T_n}(t) = E[e^{i t T_n}] = \prod_{m=1}^{m_n} \big( 1 + p_{n,m}(e^{it} - 1) \big). \]

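Let $z_{n,m} = 1 + p_{n,m}(e^{it} - 1)$ and $w_{n,m} = \exp\big( p_{n,m}(e^{it} - 1) \big)$. Notice that $|z_{n,m}| \le 1$ and
$|w_{n,m}| = e^{p_{n,m}(\cos t - 1)} \le 1$. By (ii), there is $n_0$ such that $\Delta_n := \max_{1 \le m \le m_n} p_{n,m} < 1/2$
whenever $n \ge n_0$. Hence, by Lemma 18.6.1 and Theorem 18.6.2 we have that
\begin{align*}
\Big| \prod_{m=1}^{m_n} z_{n,m} - \prod_{m=1}^{m_n} w_{n,m} \Big| &\le \sum_{m=1}^{m_n} \Big| \exp\big( p_{n,m}(e^{it} - 1) \big) - 1 - p_{n,m}(e^{it} - 1) \Big| \\
&\le \sum_{m=1}^{m_n} p_{n,m}^2 |e^{it} - 1|^2 \le 4 \Delta_n \sum_{m=1}^{m_n} p_{n,m} \to 0.
\end{align*}
Since $\lim_n \prod_{m=1}^{m_n} w_{n,m} = \exp(\lambda(e^{it} - 1))$ by (i), we conclude that
\[ \varphi_{T_n}(t) \to \exp(\lambda(e^{it} - 1)). \]
Therefore, by the Lévy–Bochner theorem, $T_n \Longrightarrow P_\lambda$.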
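
Numerical illustration. The simplest triangular array satisfying (i), (ii) and (iii) consists of $n$ independent Bernoulli variables with success probability $\lambda/n$, so that $S_n$ is binomial. The Python sketch below computes the total variation distance between this binomial law and $P_\lambda$, and compares it with the classical Le Cam bound $\sum_m p_{n,m}^2 = \lambda^2/n$, which is a standard fact not proved in these notes. The value of $\lambda$, the truncation level and the function names are hypothetical choices.

```python
import math

def binomial_pmf(k, n, p):
    """P[Bin(n, p) = k], with value 0 for k > n."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P[Poisson(lam) = k]."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def total_variation(n, lam, kmax=80):
    """Half the l1 distance between Bin(n, lam/n) and Poisson(lam),
    truncated at kmax (the neglected tails are tiny for small lam)."""
    p = lam / n
    return 0.5 * sum(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
                     for k in range(kmax + 1))

lam = 2.0                            # hypothetical parameter
sizes = [10, 100, 1000]
distances = [total_variation(n, lam) for n in sizes]
```

Convergence in total variation is stronger than the weak convergence asserted in Theorem 18.7.1; on $\mathbb{Z}_+$ the two notions coincide for probability measures.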

18.8. Exercises
Exercise 18.8.1. Let $\lambda$ be Lebesgue measure on $([0, 1], \mathcal{B}([0, 1]))$. For each $n$, let $\mu_n(dt) =
f_n(t)\lambda(dt)$ where
\[ f_n(t) = n^2 \sum_{k=0}^{n-1} 1_{(\frac{k}{n},\, \frac{k}{n} + \frac{1}{n^3}]}(t). \]
Let $F_n(x)$ be the distribution function of $\mu_n$. Show that $|F_n(x) - x| \le \frac{1}{n}$, $f_n \to 0$ $\lambda$–a.s.
and that $\|\mu_n - \lambda\|_{TV} = \frac{n^2 - 1}{n^2} + 1 - \frac{1}{n^2}$. Conclude that $\mu_n \Rightarrow \lambda$ but $\mu_n$ fails to converge in
total variation.
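Exercise 18.8.2. Consider $\mu_n = \delta_n$ and $\nu_n = \frac{1}{3}(\delta_{-n} + \delta_0 + \delta_n)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Show that
$\mu_n \xrightarrow{v} 0$ and $\nu_n \xrightarrow{v} \frac{1}{3}\delta_0$. In the process of passing to the limit, $\{\mu_n\}$ lets mass escape to
$\infty$ whereas $\{\nu_n\}$ lets equal mass escape to $-\infty$ and $\infty$.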
Exercise 18.8.3. Let $\mu$ be a complex Borel measure on $\mathbb{R}^n$. Suppose $T$ is the operator
on $L_1(\mathbb{R}^n, \lambda_n)$ into itself given by $f \mapsto f * \mu$. Show that $\|T\| = \|\mu\|_{TV}$. (Hint: Let $h$ be
a measurable function such that $|h| = 1$ and $d\mu = h\, d|\mu|$. Choose $g \in C_{00}(\mathbb{R}^n)$ such that
$|g| \le 1$ and $\|g - h\|_{L_1(\mathbb{R}^n, \lambda_n)} < \varepsilon$. With $\phi(x) = e^{-\pi|x|^2}$, choose $\delta > 0$ small enough so that
$\big| \int g(x)\, \phi_\delta * \mu(x)\, dx - \int g\, d\mu \big| < \varepsilon$.)

Exercise 18.8.4. Let $Z_\lambda$ be a compound Poisson random walk with parameter $\lambda$ and
$P$–distributed steps. Show that
\[ \frac{Z_\lambda - E[Z_\lambda]}{\sqrt{\mathrm{var}\, Z_\lambda}} \Longrightarrow N(0, 1) \]
provided that $\int x^2\, P(dx) < \infty$.
Chapter 19

Conditioning and
disintegration

Throughout this chapter we will consider probability spaces only, that is, measure spaces
$(\Omega, \mathcal{F}, P)$ with $P(\Omega) = 1$. For any integrable function $X$, we will denote $E[X] =
\int X(\omega)\, P(d\omega)$.

19.1. Conditional expectation

Suppose that $\mathcal{A} \subset \mathcal{F}$ is a sub–σ–algebra and that $f \in L_1(P)$. The measure $P_f(d\omega) :=
f(\omega)\, P(d\omega)$ is absolutely continuous with respect to $P$ and, if we restrict the measures $P_f$ and
$P$ to $\mathcal{A}$, then we also have that $P_f \ll P$. The Radon–Nikodym theorem shows that there
exists a $P$–a.s. unique $\mathcal{A}$–measurable function $g = dP_f/dP$ such that
\[ P_f(A) = \int_A g\, dP, \qquad A \in \mathcal{A}. \]
This motivates the following definition.
Definition 19.1.1. Let A ⊂ F be a sub–σ–algebra. Given f ∈ L1 (P), its conditional
expectation given A is an A –measurable random variable, E[f |A ], satisfying
(19.1) E[f 1A ] = E[E[f |A ]1A ]
for all A ∈ A .

The conditional expectation of an integrable function $f$ exists and is essentially unique
by virtue of the Radon–Nikodym theorem. From (19.1), it follows that $E[E[f|\mathcal{A}]] = E[f]$.
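
Numerical illustration. On a finite probability space, a σ–algebra generated by a partition makes Definition 19.1.1 completely explicit: the conditional expectation is the block-wise weighted average. The Python sketch below computes it with exact rational arithmetic for a fair die conditioned on parity, and collects the quantities appearing in (19.1). The example and all names are hypothetical.

```python
from fractions import Fraction

def expectation(f, prob, event=None):
    """E[f 1_event] on a finite sample space; event=None means the whole space."""
    return sum(f[w] * p for w, p in prob.items() if event is None or w in event)

def conditional_expectation(f, prob, partition):
    """E[f | sigma(partition)] as a dict that is constant on each block."""
    cond = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        average = sum(f[w] * prob[w] for w in block) / mass
        for w in block:
            cond[w] = average
    return cond

# Hypothetical example: a fair die, conditioned on the parity of the outcome.
prob = {w: Fraction(1, 6) for w in range(1, 7)}
f = {w: Fraction(w) for w in prob}
partition = [{1, 3, 5}, {2, 4, 6}]
cond = conditional_expectation(f, prob, partition)

# Both sides of (19.1) for each generating block A of the partition.
checks = [(expectation(f, prob, A), expectation(cond, prob, A)) for A in partition]
```

Since every set in the σ–algebra is a finite disjoint union of blocks, verifying (19.1) on the blocks is enough in this finite setting.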
The following result contains several important properties of conditional expectation.
Lemma 19.1.2. Suppose that f, g ∈ L1 (P), a, b ∈ R, and A , B ⊂ F are sub–σ–algebras.
Then, P–a.s.


(a) E[|E[f |A ]|] ≤ E[|f |];

Proof. (a) Let $A = [E[f|\mathcal{A}] \ge 0]$; then $A \in \mathcal{A}$ and

E[|E[f |A ]|] = E[1A E[f |A ]] − E[1Ac E[f |A ]] = E[1A f ] − E[1Ac f ] ≤ E[|f |]
(b) Ac = [E[f |A ] < 0] ∈ A , so 0 ≤ E[f 1Ac ] = E[E[f |A ]1Ac ] ≤ 0. Therefore P[Ac ] = 0.
(c) It follows from the linearity of the integral and the P–a.s. uniqueness of the conditional
expectation.
(d) If $A \in \mathcal{A}$ then $A \in \mathcal{B}$. So, $E[1_A E[f|\mathcal{B}]] = E[1_A f] = E[1_A E[f|\mathcal{A}]]$. The statement follows
from uniqueness of the conditional expectation.
(e) The statement clearly holds when g is an A –measurable indicator function; then, by
linearity, it holds for A –measurable simple functions; by standard monotone class argu-
ments, it holds for A –measurable functions.
(f) From (a), (b) and monotone convergence 0 ≤ E[fn |A ] converges monotonically to an
A –measurable function X a.s. and in L1 . Consequently
0 ≤ E[E[f |A ] − X] = E[E[(f − fn )|A ]] − E[X − E[fn |A ]].
By passing to the limit we obtain that X = E[f |A ].
Theorem 19.1.3. Let $X$ and $Y$ be random variables with values in measurable spaces $E$
and $F$ respectively. Let $f : E \times F \to \mathbb{C}$ be a measurable function such that $E[|f(X, Y)|] < \infty$.
Suppose that $\mathcal{A}$ is a σ–algebra such that $X$ is $\mathcal{A}$–measurable and $\sigma(Y)$ is independent of $\mathcal{A}$. Then
\[ E[f(X, Y)|\mathcal{A}] = h(X) \]
where $h$ is the map on $E$ given by $x \mapsto E[f(x, Y)]$.

Proof. Let $A \in \mathcal{A}$ and denote by $\mu$ and $\nu$ the laws of $(1_A, X)$ and $Y$ respectively. The
independence of $\mathcal{A}$ and $Y$ means that the joint law of $((1_A, X), Y)$ is the product measure
$\mu \otimes \nu$. Thus, by Fubini's theorem
\begin{align*}
\int_A f(X, Y)\, dP &= \iint s f(x, y)\, \mu(ds, dx) \otimes \nu(dy) = \int s \Big( \int f(x, y)\, \nu(dy) \Big) \mu(ds, dx) \\
&= \int s\, E[f(x, Y)]\, \mu(ds, dx) = \int_A h(X)\, dP.
\end{align*}
Observe that $h(X)$ is $\mathcal{A}$–measurable.
Theorem 19.1.4. (Conditional Jensen's inequality) Let $X : \Omega \to (a, b)$, where $-\infty \le a <
b \le \infty$, be an integrable function. If $\varphi : (a, b) \to \mathbb{R}$ is a convex function and $\varphi \circ X \in L_1$,
then
\[ \varphi(E[X|\mathcal{A}]) \le E[\varphi \circ X|\mathcal{A}] \tag{19.2} \]
for any sub–σ–algebra $\mathcal{A}$.

Proof. Let $S = \{(p, q) \in \mathbb{R}^2 : px + q \le \varphi(x),\ a < x < b\}$. The convexity of $\varphi$ implies $S \ne \emptyset$
and that $\varphi(x) = \sup\{px + q : (p, q) \in S\}$. If $S'$ is a countable dense subset of $S$, then we also
have $\varphi(x) = \sup\{px + q : (p, q) \in S'\}$. Hence, for all $(p, q) \in S'$, $E[\varphi \circ X|\mathcal{A}] \ge p\, E[X|\mathcal{A}] + q$
almost surely. Taking the supremum over all $(p, q) \in S'$ gives (19.2).

Elementary results from the theory of Hilbert spaces also lead to the notion of conditional
expectation without reference to the Radon–Nikodym theorem. Indeed, $L_2(\Omega)$ is a
Hilbert space with inner product $\langle f|g \rangle = \int_\Omega f \bar g\, dP$. Given a σ–algebra $\mathcal{A}$, the space $\mathcal{H}$ of
$\mathcal{A}$–measurable square integrable functions is a closed subspace of $L_2$. Thus, for any $f \in L_2$
the orthogonal projection $g = P_{\mathcal{H}} f$ of $f$ onto $\mathcal{H}$ satisfies $\langle f - g|h \rangle = 0$ for all $h \in \mathcal{H}$. So, $g$
satisfies (19.1), that is, $g = E[f|\mathcal{A}]$. Since $A = [g < 0] \in \mathcal{A}$, we have that
\[ E[|g|] = E[1_{A^c} g] - E[1_A g] = E[1_{A^c} f] - E[1_A f] \le E[|f|]. \]
Thus the map $f \mapsto E[f|\mathcal{A}]$ defined on $L_2$ is $L_1$–continuous. Since $P(\Omega) = 1$, $L_2$ is a dense
subspace of $L_1$; hence there is a unique continuous extension of the conditional expectation
map to $L_1$.
Theorem 19.1.5. Suppose that G is a collection of σ–algebras contained in F and let
f ∈ L1 (P). The family {E[f |A ] : A ∈ G} is uniformly integrable.

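Proof. Denote $f_{\mathcal{A}} = E[f|\mathcal{A}]$. As $\{|f_{\mathcal{A}}| > a\} \subset \{E[|f|\,|\mathcal{A}] > a\}$,
\[ \int_{\{|f_{\mathcal{A}}| > a\}} |f_{\mathcal{A}}|\, dP \le \int_{\{E[|f|\,|\mathcal{A}] > a\}} E[|f|\,|\mathcal{A}]\, dP = \int_{\{E[|f|\,|\mathcal{A}] > a\}} |f|\, dP. \]
Since $P[E[|f|\,|\mathcal{A}] > a] \le \frac{E[|f|]}{a} \to 0$ as $a \to \infty$ uniformly in $\mathcal{A} \in G$, we conclude that
\[ \inf_{a > 0} \sup_{\mathcal{A} \in G} \int_{\{|f_{\mathcal{A}}| > a\}} |f_{\mathcal{A}}|\, dP = 0. \]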

19.2. Conditional Independence

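Let $(\Omega, \mathcal{F}, P)$ be a probability space. Suppose $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are σ–algebras contained in $\mathcal{F}$.
We say that $\mathcal{A}$ and $\mathcal{B}$ are independent given $\mathcal{C}$, denoted by $\mathcal{A} \perp\!\!\!\perp_{\mathcal{C}} \mathcal{B}$, if
\[ P[A \cap B|\mathcal{C}] = P[A|\mathcal{C}]\, P[B|\mathcal{C}] \]
for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$. Notice that if $\mathcal{C} = \{\emptyset, \Omega\}$ then independence given $\mathcal{C}$ is the
same as independence in the usual sense.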

Theorem 19.2.1. (Doob) Let A , B and C be sub–σ–algebras of F . A ⊥⊥C B iff

(19.3) P[A|σ(C , B)] = P[A|C ]
for all A ∈ A .

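Proof. Suppose that $\mathcal{A}$ and $\mathcal{B}$ are conditionally independent given $\mathcal{C}$. For any $A \in \mathcal{A}$, $B \in \mathcal{B}$
and $C \in \mathcal{C}$ we have
\begin{align*}
P[A \cap C \cap B] &= P\big[ 1_C\, P[A \cap B|\mathcal{C}] \big] = P\big[ 1_C\, P[A|\mathcal{C}]\, P[B|\mathcal{C}] \big] \\
&= P\big[ P[A|\mathcal{C}]\, P[B \cap C|\mathcal{C}] \big] = P\Big[ P\big[ P[A|\mathcal{C}]\, 1_{B \cap C} \big| \mathcal{C} \big] \Big] \\
&= P\big[ P[A|\mathcal{C}]\, 1_{B \cap C} \big].
\end{align*}
Since $\sigma(\mathcal{B}, \mathcal{C}) = \sigma\big( \{B \cap C : B \in \mathcal{B}, C \in \mathcal{C}\} \big)$, a monotone class argument shows that
\[ P[A \cap H] = P\big[ P[A|\mathcal{C}]\, 1_H \big] \]
for all $H \in \sigma(\mathcal{B}, \mathcal{C})$. This means that
\[ P[A|\sigma(\mathcal{B}, \mathcal{C})] = P[A|\mathcal{C}]. \]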

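Conversely, suppose that (19.3) holds. For any $A \in \mathcal{A}$ and $B \in \mathcal{B}$ we have
\[ P[A \cap B|\mathcal{C}] = P\big[ 1_B\, P[A|\sigma(\mathcal{C}, \mathcal{B})] \big| \mathcal{C} \big] = P\big[ 1_B\, P[A|\mathcal{C}] \big| \mathcal{C} \big] = P[A|\mathcal{C}]\, P[B|\mathcal{C}]. \]
This shows that $\mathcal{A}$ and $\mathcal{B}$ are independent given $\mathcal{C}$.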

Corollary 19.2.2. Under the conditions of Theorem 19.2.1, we have that A ⊥⊥C B iff
A ⊥⊥C σ(C , B).

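Proof. The statement is a direct consequence of $\sigma\big( \mathcal{C}, \sigma(\mathcal{C}, \mathcal{B}) \big) = \sigma(\mathcal{C}, \mathcal{B})$.

Corollary 19.2.3. $\mathcal{A} \perp\!\!\!\perp_{\mathcal{C}} \mathcal{A}$ iff $\mathcal{A} \overset{P}{\subset} \mathcal{C}$.

Proof. If $\mathcal{A} \perp\!\!\!\perp_{\mathcal{C}} \mathcal{A}$ then, for any $A \in \mathcal{A}$,
\[ 1_A = P[A|\sigma(\mathcal{C}, \mathcal{A})] = P[A|\mathcal{C}]. \]
This means that $\mathcal{A} \overset{P}{\subset} \mathcal{C}$.
Conversely, suppose that $\mathcal{A} \overset{P}{\subset} \mathcal{C}$. Then, for any $A \in \mathcal{A}$ there is $C_A \in \mathcal{C}$ such that
$P[A \triangle C_A] = 0$. Then
\[ P[A|\sigma(\mathcal{C}, \mathcal{A})] = 1_A = 1_{C_A} = P[C_A|\mathcal{C}] = P[A|\mathcal{C}]. \]
We conclude that $\mathcal{A} \perp\!\!\!\perp_{\mathcal{C}} \mathcal{A}$.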
Lemma 19.2.4. Let A and B0 , . . . , Bm , Bm+1 , m ∈ Z+ , be sub–σ–algebras of F . If
A ⊥⊥B0 σ(B1 , . . . , Bm+1 ), then A ⊥⊥σ(B0 ,...,Bm ) Bm+1 .

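Proof. For $m = 0$ there is nothing to prove. Suppose $m \ge 1$. Then, as $\sigma(\mathcal{B}_1, \ldots, \mathcal{B}_m) \subset
\sigma(\mathcal{B}_1, \ldots, \mathcal{B}_{m+1})$, we have $\mathcal{A} \perp\!\!\!\perp_{\mathcal{B}_0} \sigma(\mathcal{B}_1, \ldots, \mathcal{B}_m)$. Thus, by Theorem 19.2.1, for any $A \in \mathcal{A}$ we
have that
\[ P[A|\sigma(\mathcal{B}_0, \mathcal{B}_1, \ldots, \mathcal{B}_m)] = P[A|\mathcal{B}_0] = P[A|\sigma(\mathcal{B}_0, \mathcal{B}_1, \ldots, \mathcal{B}_{m+1})]. \]
Therefore, by Theorem 19.2.1, $\mathcal{A} \perp\!\!\!\perp_{\sigma(\mathcal{B}_0, \mathcal{B}_1, \ldots, \mathcal{B}_m)} \mathcal{B}_{m+1}$.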
Remark 19.2.5. Let (Ω, F , P) be a probability space. For any X ∈ L1 (P), E[X|{∅, Ω}] =
E[X].
Theorem 19.2.6. (Chain rule) Let {A , Bn : n ∈ Z+ } be a family of sub–σ–algebras of F .
The following statements are equivalent:
(i) A ⊥⊥B0 σ(Bn : n ∈ N)
(ii) A ⊥⊥σ(B0 ,...,Bm ) Bm+1 for all m ∈ Z+ .

Proof. (i) implies (ii): As σ(B1 , . . . , Bm+1 ) ⊂ σ(Bn : n ∈ N) for all m ∈ Z+ , A ⊥⊥B0
σ(B1 , . . . , Bm+1 ) for all m ∈ Z+ . The result then follows from Lemma 19.2.4.

(ii) implies (i): Let A ∈ A . For any m ∈ Z+

P[A|σ(B0 , B1 , . . . , Bm )] = P[A|σ(B0 , B1 , . . . , Bm+1 )]
Chaining these identities for $m = 0, \ldots, n$ we obtain that
P[A|B0 ] = P[A|σ(B0 , B1 , . . . , Bn+1 )]
This shows that for any n ∈ N, A ⊥⊥B0 σ(B1 , . . . , Bn ). A monotone class argument then
shows that A ⊥⊥B0 σ(Bn : n ∈ N).

19.3. Regular conditional probabilities

Consider a probability space $(\Omega, \mathcal{F}, \mu)$, and let $\mathcal{G} \subset \mathcal{F}$ be a sub–σ–algebra. Then, $\mu[1|\mathcal{G}] =
1$ $\mu$–a.s. and for any sequence $\{A_n\} \subset \mathcal{F}$ of pairwise disjoint sets with union $A$, we have
that $\mu[1_A|\mathcal{G}] = \sum_{n=1}^{\infty} \mu[1_{A_n}|\mathcal{G}]$ $\mu$–a.s. A natural question is whether $\mu[\cdot|\mathcal{G}]$ defines a
probability measure $\mu$–a.s. Since $\mathcal{F}$ is often uncountable, the exceptional sets could exhaust
the whole space $\Omega$, or their uncountable union may not even be measurable.
Definition 19.3.1. µ[·|G ] admits a regular conditional probability if there is a sto-
chastic kernel ν from (Ω, G ) to (Ω, F ), and a set N ∈ F with µ[N ] = 0 such that
µ[1A |G ](ω) = ν(ω, A) for all A ∈ F and ω ∈ Ω \ N .

We introduce some technical concepts that will be useful in the construction of regular
conditional probabilities.
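Definition 19.3.2. A collection $\mathcal{K} \subset \mathcal{P}(\Omega)$ is a compact class if it is closed under
finite unions and it has the finite intersection property: for any $\{K_n : n \in \mathbb{N}\} \subset \mathcal{K}$,
$\bigcap_n K_n = \emptyset$ implies that $\bigcap_{n \le n_0} K_n = \emptyset$ for some $n_0 \in \mathbb{N}$.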

Definition 19.3.3. Let (Ω, F , µ) be a σ–finite measure space. Suppose K ⊂ F is a

compact class. Then, µ has the approximation property with respect to K if
µ[F ] = sup{µ[K] : K ∋ K ⊂ F }, F ∈ F.
Example 19.3.4. The collection of compact sets in a Hausdorff topological space is a
compact class.
Example 19.3.5. In (Rd , B(Rd )), the Lebesgue measure λd (any positive Radon measure
will do) has the approximating property w.r.t. the collection of all compact sets K in Rd .

A linear operator $T : L_1 \to L_1$ is a positive contraction if $Tf \in L_1^+$ for all $f \in L_1^+$
and $\|Tf\|_1 \le \|f\|_1$ for all $f \in L_1$. The dual operator $T^* : L_\infty \to L_\infty$ is defined as the map
$g \mapsto T^* g$ so that
\[ \int f\, T^* g\, d\mu = \int Tf \cdot g\, d\mu \]
for all $f \in L_1$. It follows that $\|T^* g\|_\infty \le \|g\|_\infty$ for any $g \in L_\infty$.

Example 19.3.6. Let (Ω, F , µ) be a probability space and G a sub σ–algebra in F . The
conditional expectation operator T : f → µ[f |G ] on L1 (µ) is a positive contraction. It is
left as an exercise (Exercise 19.9.4) to show that T ∗ g = µ[g|G ] for all g ∈ L∞ (µ).
Lemma 19.3.7. Let µ be a σ–finite measure on (Ω, F ) and let T be a positive L1 contrac-
tion. Let {h, hn } ⊂ L∞ , and suppose that hn ր h µ–a.s. Then, T ∗ hn ր T ∗ h µ–a.s.

Proof. Since T ∗ is a positive L∞ contraction T ∗ hn ր g ≤ T ∗ h µ–a.s. for some g ∈ L∞ .

By dominated convergence
Z Z Z Z Z
f · T ∗ h dµ = (T f ) · h dµ = lim (T f ) · hn dµ = lim f · T ∗ hn dµ = f g dµ
n n

for all f ∈ L1 . Therefore, T ∗h = g µ–a.s.

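Theorem 19.3.8. Let $(\Omega, \mathcal{F}, \mu)$ be a σ–finite measure space and $\mathcal{E} \subset \mathcal{F}$ a sub–σ–algebra that is
countably generated. Assume that $\mu$ has the approximation property on $\mathcal{E}$ with respect to
a compact class $\mathcal{K} \subset \mathcal{E}$. Let $T$ be a positive contraction on $L_1(\Omega, \mathcal{F}, \mu)$, and assume that
for some σ–algebra $\mathcal{G} \subset \mathcal{F}$, $T^*(L_\infty)$ is contained in the space $M^{\mathcal{G}}$ of $\mathcal{G}$–measurable
functions. Then, there exists a quasi–stochastic kernel $P$ from $(\Omega, \mathcal{G})$ to $(\Omega, \mathcal{E})$ (that is, $P(\omega, \Omega) \le
1$) such that $\mu P \ll \mu$ (that is, $\int P(x, A)\, \mu(dx) = 0$ if $\mu[A] = 0$), and $T^* g(\omega) = \int g(x)\, P(\omega, dx)$ $\mu$–a.s.
for all $g \in L_\infty(\Omega, \mathcal{E}, \mu)$. If $T^* 1 = 1$, then $P$ can be chosen to satisfy $P(\omega, \Omega) = 1$ for all
$\omega \in \Omega$. The kernel $P$ is $\mu$–$\mathcal{G}$ unique in the sense that if $P'$ is another $\mathcal{G}$–representation of
$T^*$ on $L_\infty(\Omega, \mathcal{E}, \mu)$ then there is $M \in \mathcal{G}$ such that $\mu(M) = 0$ and $P(\omega, \cdot) = P'(\omega, \cdot)$ for all
$\omega \in \Omega \setminus M$.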

Proof. As $\mathcal{E}$ is countably generated and $\mu$ is σ–finite, there is a countable algebra $\mathcal{B}$, on
which $\mu$ is finite, that generates $\mathcal{E}$. For each $B \in \mathcal{B}$, there is an increasing sequence
$\{K_m^B : m \in \mathbb{N}\} \subset \mathcal{K}$ such that $K_m^B \subset B$ and $\lim_m \mu(K_m^B) = \mu(B)$. In particular, $K_m^B \nearrow B$
$\mu$–a.s. The algebra $\mathcal{D}$ generated by $\mathcal{B} \cup \{K_m^B : m \in \mathbb{N}, B \in \mathcal{B}\}$ is also countable and
$\sigma(\mathcal{D}) = \mathcal{E}$. For each $D \in \mathcal{D}$ we choose $P(\cdot, D) := P1_D$ in the equivalence class of $T^* 1_D$.
As $T^*(L_\infty) \subset M^{\mathcal{G}}$, each map $\omega \mapsto P(\omega, D)$, $D \in \mathcal{D}$, is $\mathcal{G}$–measurable. Moreover,
(a) For each $D \in \mathcal{D}$, $0 \le P1_D \le 1$ $\mu$–a.s., and if $T^* 1 = 1$, then $P1_\Omega = 1$ $\mu$–a.s.
(b) For any pair of disjoint sets $D, D' \in \mathcal{D}$, $P1_{D \cup D'} = P1_D + P1_{D'}$ $\mu$–a.s.
(c) For each $B \in \mathcal{B}$, $P1_{K_m^B} \nearrow P1_B$ as $m \to \infty$ $\mu$–a.s. by Lemma 19.3.7.

Since $\mathcal{D}$ is countable, there is a $\mu$–null set $N \in \mathcal{G}$ such that conditions (a), (b) and (c) hold
on $\Omega \setminus N$. Hence, for each $\omega \in \Omega \setminus N$, $P(\omega, \cdot)$ is a finitely additive quasi–probability measure
on $\mathcal{D}$.
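We claim that $P(\omega, \cdot)$ is countably additive on $\mathcal{B}$ for each $\omega \in \Omega \setminus N$. Fix $\omega \in \Omega \setminus N$. It is
enough to show that if $\mathcal{B} \ni B_j \searrow \emptyset$, then $\lim_j P(\omega, B_j) = 0$. Given $\varepsilon > 0$, for each $B_j$ there
is $K_{m_j}^{B_j} \in \mathcal{K}$ such that $K_{m_j}^{B_j} \subset B_j$, and $P(\omega, B_j) < P(\omega, K_{m_j}^{B_j}) + \varepsilon/2^j$. Since $\bigcap_j K_{m_j}^{B_j} = \emptyset$,
there is $j_0 \in \mathbb{N}$ such that $\bigcap_{j \le j_0} K_{m_j}^{B_j} = \emptyset$. For all $j \ge j_0$,
\[ B_j \subset B_{j_0} = \bigcap_{\ell \le j_0} B_\ell = \bigcap_{\ell \le j_0} B_\ell \setminus \bigcap_{\ell \le j_0} K_{m_\ell}^{B_\ell} \subset \bigcup_{\ell \le j_0} \big( B_\ell \setminus K_{m_\ell}^{B_\ell} \big). \]
Consequently,
\[ P(\omega, B_j) \le P(\omega, B_{j_0}) \le \sum_{\ell \le j_0} P\big( \omega, B_\ell \setminus K_{m_\ell}^{B_\ell} \big) < \varepsilon. \]
Therefore, $\lim_j P(\omega, B_j) = 0$.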

Carathéodory’s extension theorem implies that for all ω ∈ Ω\N , P (ω, ·) is uniquely extended
to a quasi–probability measure on σ(B) = E . Let ν be a probability measure equivalent
to µ and set P (ω, ·) = ν(·) for ω ∈ N . By construction, P (·, B) is G –measurable for each
B ∈ B. Then, by Sierpinski’s monotone class theorem, P (·, E) is G –measurable for all
E ∈ E . This means that P is a quasi–stochastic kernel from (Ω, G) to (Ω, E ), and that
P 1E = T ∗ 1E µ–a.s. for any E ∈ E . Since µ(A) = 0 implies that kT ∗ 1A k∞ ≤ k1A k∞ = 0,
we also have that µP ≪ µ.
To prove uniqueness, suppose P ′ is another quasi–stochastic kernel from (Ω, G ) to (Ω, E )
such that for each g ∈ L∞ (Ω, E , µ), T ∗ g = P ′ g µ–a.s. Then, P 1B = P ′ 1B µ–a.s. for
each $B \in \mathcal{B}$. As $\mathcal{B}$ is a countable algebra, there is a negligible set $M \in \mathcal{G}$ outside of which
$P(\omega, B) = P'(\omega, B)$ for all $B \in \mathcal{B}$. By Sierpinski's monotone class theorem we conclude
that $P1_E(\omega) = P'1_E(\omega)$ for all $\omega \in \Omega \setminus M$ and all $E \in \mathcal{E}$.
Remark 19.3.9. Under the assumptions of Theorem 19.3.8, define for each $f \in L_1(\Omega, \mathcal{E}, \mu)$
the measure $\mu_f P : E \mapsto \int f \cdot P1_E\, d\mu$ on $\mathcal{E}$. Then $Tf = \frac{d\mu_f P}{d\mu}$.
Corollary 19.3.10. (Existence of regular conditional probabilities) Let $(\Omega, \mathcal{F}, \mu)$ be a
probability space, and $\mathcal{G} \subset \mathcal{F}$ be a sub σ–algebra. Suppose $(S, \mathcal{B}(S))$ is a Polish space with
its Borel σ–algebra. If $X$ is an $S$–valued measurable function then there is a $\mu$–$\mathcal{G}$ a.s.
unique stochastic kernel $\nu$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{B}(S))$ such that
\[ \nu(\omega, A) = \mu\big[ 1_A(X) \big| \mathcal{G} \big] \quad \mu\text{–a.s.} \]
for all $A \in \mathcal{B}(S)$. The stochastic kernel $P(\omega, X^{-1}(A)) := \nu(\omega, A)$ is the $\mu$–$\mathcal{G}$ regular
conditional probability that represents $\mu[X \in \cdot\,|\mathcal{G}]$.

Proof. As $S$ is a Polish space, $\mathcal{B}(S)$ is countably generated, and so $\sigma(X) = \{X^{-1}(B) : B \in
\mathcal{B}(S)\}$ is countably generated. By Ulam's theorem, $\mu$ has the approximation property on
σ(X) with respect to the compact class K = {X −1 (K) : K compact in S}. The operator
T f = µ[f |G ] on L1 (µ) is a positive contraction, and T ∗ g = µ[g|G ] on L∞ (µ). As µ[1|G ] = 1
µ–a.s., Theorem 19.3.8 with E = σ(X) implies the existence of a stochastic kernel P from
(Ω, G ) to (Ω, E ), and a set N ∈ G with µ(N ) = 0, such that T ∗ g(ω) = P g(ω) for all
g ∈ L∞ (Ω, E , µ) and ω ∈ Ω \ N . Moreover, the kernel P is µ–G unique. The stochastic
kernel ν from (Ω, G ) to (S, B(S)) is defined by

ν(ω, A) = P (ω, X −1 (A))

for all A ∈ B(S).

Remark 19.3.11. In many situations, the σ–algebra $\mathcal{G}$ is generated by a measurable
function $Y : (\Omega, \mathcal{F}) \to (T, \mathcal{T})$, where $(T, \mathcal{T})$ is a given measurable space, that is, $\mathcal{G} =
\sigma(Y) = \{Y^{-1}(A) : A \in \mathcal{T}\}$. Then, for each $B \in \mathcal{B}(S)$ the map $\omega \mapsto P(\omega, X^{-1}(B))$
is real–valued and $\sigma(Y)$–measurable, and so $P(\omega, X^{-1}(B)) = g_B(Y(\omega))$ for some $\mathcal{T}$–measurable
function $g_B : T \to [0, 1]$. Let $Q$ be any probability measure on $(S, \mathcal{B}(S))$, and
define $\tilde\nu$ by
\[ \tilde\nu(t, B) = \begin{cases} g_B(t) & \text{if } t \in Y(\Omega) \\ Q(B) & \text{if } t \notin Y(\Omega) \end{cases} \]
Then $\tilde\nu$ is a stochastic kernel from $(T, \mathcal{T})$ to $(S, \mathcal{B}(S))$, and $P(\omega, X^{-1}(\cdot)) = \tilde\nu(Y(\omega), \cdot)$ is
the unique regular conditional probability that represents $\mu[X \in \cdot\,|\sigma(Y)]$.

19.4. Disintegration
The goal in this section is the extension of Fubini’s theorem through the use of regular
conditional expectations.

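Theorem 19.4.1. (Disintegration) Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $(S, \mathcal{S})$ and
$(T, \mathcal{T})$ measurable spaces. Let $\mathcal{G} \subset \mathcal{F}$ be a sub–σ–algebra. Let $X : (\Omega, \mathcal{F}) \to (S, \mathcal{S})$ be such
that $P[X \in \cdot\,|\mathcal{G}]$ has a regular version $\nu$. If $Y : (\Omega, \mathcal{G}) \to (T, \mathcal{T})$ and $f : (S \times T, \mathcal{S} \otimes \mathcal{T}) \to \mathbb{C}$
are measurable functions such that $E[|f(X, Y)|] < \infty$ then
\begin{align*}
E[f(X, Y)|\mathcal{G}](\cdot) &= \int_S f(x, Y(\cdot))\, \nu(\cdot, dx) \quad P\text{–a.s.} \tag{19.4} \\
E[f(X, Y)] &= \int_\Omega \int_S f(x, Y(\omega))\, \nu(\omega, dx)\, P(d\omega) \tag{19.5}
\end{align*}
If $\mathcal{G} = \sigma(Y)$ and $P[X \in dx|\sigma(Y)](\omega) = \mu(Y(\omega), dx)$ for some stochastic kernel $\mu$ from $(T, \mathcal{T})$ to
$(S, \mathcal{S})$ then
\begin{align*}
E[f(X, Y)|\sigma(Y)](\cdot) &= \int_S f(x, Y(\cdot))\, \mu(Y(\cdot), dx) \quad P\text{–a.s.} \\
E[f(X, Y)] &= \int_\Omega \int_S f(x, Y(\omega))\, \mu(Y(\omega), dx)\, P(d\omega)
\end{align*}
If $X$ and $Y$ are independent then $P[X \in dx|\sigma(Y)](\cdot) = P[X \in dx]$ $P$–a.s.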

Proof. Suppose B ∈ S and C ∈ T . Then, by the properties of conditional expectation,
\[
\begin{aligned}
\mathbb{E}[1_B(X)1_C(Y)] &= \mathbb{E}\big[1_C(Y)\,\mathbb{E}[1_B(X)\mid\mathcal{G}]\big] \\
&= \mathbb{E}[1_C(Y)\,\nu(\cdot, B)] = \mathbb{E}\Big[\int_S 1_{B\times C}(s, Y(\cdot))\,\nu(\cdot, ds)\Big].
\end{aligned}
\]
This shows that (19.5) holds for f (s, t) = 1B×C (s, t). By Sierpinski’s monotone class theorem
the formula extends to all indicator functions 1D , with D ∈ S ⊗ T ; hence it also extends
to nonnegative simple functions and, by monotone convergence, to all nonnegative
S ⊗ T –measurable functions f .
Fix f : S × T → R+ and let G ∈ G . Regarding ω ↦ (Y (ω), 1G (ω)) as a T × {0, 1}–valued
G –measurable function, we obtain from (19.5) that
\[
\mathbb{E}[f(X,Y)1_G] = \mathbb{E}\Big[\int_S f(s, Y(\cdot))\,1_G(\cdot)\,\nu(\cdot, ds)\Big],
\]
proving that (19.4) holds for all f ≥ 0. The result for general f follows by taking real and
imaginary parts and positive and negative parts.
The case when G = σ(Y ) is discussed in Remark 19.3.11. If X and Y are independent, then
P ◦ (X, Y )−1 = (P ◦ X −1 ) ⊗ (P ◦ Y −1 ), and the conclusion follows from Fubini’s
theorem.
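As a sanity check of (19.5), the following minimal sketch verifies the formula on a finite sample space with exact rational arithmetic. The sample space, the weights, the function f, and the helper names are all hypothetical choices made for illustration; the kernel nu is computed by elementary conditioning on the value of Y.

```python
from fractions import Fraction

# Hypothetical finite sample space: each omega is a pair (x, y) with weight P[omega].
P = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def X(omega):
    return omega[0]

def Y(omega):
    return omega[1]

def nu(omega, x):
    """A regular version of P[X = x | sigma(Y)] evaluated at omega."""
    y = Y(omega)
    p_y = sum(p for w, p in P.items() if Y(w) == y)
    p_xy = sum(p for w, p in P.items() if Y(w) == y and X(w) == x)
    return p_xy / p_y

def f(x, y):
    # Arbitrary integrable test function (an assumption of this sketch).
    return Fraction(3 * x + 5 * y * y - x * y)

# Left-hand side of (19.5): E[f(X, Y)].
lhs = sum(f(X(w), Y(w)) * p for w, p in P.items())

# Right-hand side: integrate f(x, Y(omega)) against nu(omega, dx), then against P.
rhs = sum(sum(f(x, Y(w)) * nu(w, x) for x in (0, 1)) * p for w, p in P.items())

print(lhs, rhs)
```

Both sides agree exactly here because the inner sum is the conditional expectation of f(X, Y) given Y, computed atom by atom.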
Example 19.4.2. (Conditional density) Let (Ω, F , P) be a probability space, and (S, S , µ)
and (T, T , ν) be σ–finite measure spaces. Suppose X and Y are random variables with
values in the measurable spaces S and T respectively, and that P ◦ (X, Y )−1 ≪ µ ⊗ ν with
Radon–Nikodym derivative fX,Y (x, y). Then
\[
\mathbb{P}[X \in A, Y \in B] = \int_{S\times T} 1_A(x)1_B(y) f_{X,Y}(x,y)\,\mu(dx)\,\nu(dy), \qquad A \in \mathcal{S},\ B \in \mathcal{T}.
\]
The law of X, P ◦ X −1 , is absolutely continuous with respect to µ since
\[
\mathbb{P}[X \in A] = \int_S 1_A(x) \int_T f_{X,Y}(x,y)\,\nu(dy)\,\mu(dx) = \int_A \int_T f_{X,Y}(x,y)\,\nu(dy)\,\mu(dx).
\]
Hence,
\[
f_X(x) := \frac{d\,\mathbb{P}\circ X^{-1}}{d\mu}(x) = \int_T f_{X,Y}(x,y)\,\nu(dy) \quad \mu\text{-a.s.}
\]
It follows that for any bounded measurable function g on T
\[
\mathbb{E}[g(Y)\,1_A(X)] = \int_A f_X(x) \int_T g(y)\,\frac{f_{X,Y}(x,y)}{f_X(x)}\,\nu(dy)\,\mu(dx).
\]
574 19. Conditioning and disintegration

19.5. Kolmogorov’s extension theorem

The Ionescu–Tulcea theorem states that for countable collections of probability spaces with
a given collection of finite dimensional conditional probabilities, there is a unique
probability measure on the product space with the prescribed finite dimensional conditional
probabilities. Here, we will prove a similar result for an arbitrary collection of Borel spaces
where a system of compatible finite–dimensional probabilities is prescribed. The result is
based on the existence of regular conditional probabilities for Borel spaces.
Lemma 19.5.1. (Randomization) Let µ be a stochastic kernel from a measurable space S to
a Borel space T . There is a measurable function f : S × [0, 1] → T such that if θ ∼ U [0, 1],
then the law of f (s, θ) is µ(s, ·).

Proof. We will consider only the case where T is uncountable, in which case, by the measurable
isomorphism theorem 3.9.15, there is a bijection φ : (R, B(R)) −→ (T, T ) such that φ and
φ−1 are measurable. Then ν(s, B) := µ(s, φ(B)) is a stochastic kernel from S to R. Let
g : S × (0, 1) → R be defined as the quantile transformation
\[
g(s, t) = \inf\{x \in \mathbb{R} : \nu(s, (-\infty, x]) \ge t\}.
\]
Since g(s, t) ≤ x iff ν(s, (−∞, x]) ≥ t, the measurability of the map s ↦ ν(s, (−∞, x])
implies that g is S ⊗ B((0, 1))–measurable. If θ ∼ U [0, 1], then
\[
\mathbb{P}[g(s, \theta) \le x] = \mathbb{P}[\theta \le \nu(s, (-\infty, x])] = \nu(s, (-\infty, x]).
\]
This shows that g(s, θ) ∼ ν(s, dx). Therefore, for f := φ ◦ g, f (s, θ) ∼ µ(s, dt).
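The quantile transformation used in the proof can be sketched in a few lines. The kernel below is a hypothetical one with finite support on the real line, and the uniform variable θ is replaced by a deterministic midpoint grid on (0, 1) so that the resulting frequencies can be compared exactly with the kernel; none of these names come from the text.

```python
from fractions import Fraction
from collections import Counter

# Hypothetical kernel nu(s, .) with finite support on the real line (illustration only).
KERNEL = {
    "a": {-1.0: Fraction(1, 4), 0.0: Fraction(1, 2), 2.5: Fraction(1, 4)},
    "b": {0.0: Fraction(1, 10), 1.0: Fraction(9, 10)},
}

def cdf(s, x):
    """nu(s, (-inf, x])."""
    return sum(p for point, p in KERNEL[s].items() if point <= x)

def g(s, t):
    """Quantile transformation: inf{x : nu(s, (-inf, x]) >= t} for 0 < t < 1."""
    for x in sorted(KERNEL[s]):
        if cdf(s, x) >= t:
            return x
    raise ValueError("t must lie in (0, 1)")

def law_of_g(s, n=1000):
    """Push the midpoint grid on (0, 1), a stand-in for U[0, 1], through g(s, .)."""
    counts = Counter(g(s, Fraction(2 * k + 1, 2 * n)) for k in range(n))
    return {x: Fraction(c, n) for x, c in counts.items()}

print(law_of_g("a"))
print(law_of_g("b"))
```

With a genuinely random θ the frequencies would only approximate the kernel; the grid is used here so that the comparison is exact.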
Theorem 19.5.2. (Transfer) Suppose (S, S ) is a measurable space, (T, T ) is a Borel space,
and let (ξ, η) be a random variable in S × T . If ξ̃ and θ are independent random variables in S
and [0, 1] respectively, so that ξ̃ has the same law as ξ and θ ∼ U [0, 1], then there exists a
measurable function f : S × [0, 1] −→ T such that
\[
(\tilde{\xi}, f(\tilde{\xi}, \theta)) \overset{d}{=} (\xi, \eta).
\]

Proof. Example 19.3.10 shows that there is a stochastic kernel µ from S to T such that
P[η ∈ ·|ξ] = µ(ξ, ·). Lemma 19.5.1 implies that there is a function f : S × [0, 1] → T such
that the law of f (s, θ) is µ(s, ·) for each s ∈ S. Hence, for any bounded measurable function
g on S × T we have
\[
\begin{aligned}
\mathbb{E}[g(\tilde{\xi}, f(\tilde{\xi}, \theta))]
&= \mathbb{E}\Big[\int_0^1 g(\tilde{\xi}, f(\tilde{\xi}, u))\,du\Big]
 = \mathbb{E}\Big[\int_0^1 g(\xi, f(\xi, u))\,du\Big] \\
&= \mathbb{E}\Big[\int_T g(\xi, t)\,\mu(\xi, dt)\Big]
 = \mathbb{E}\big[\mathbb{E}[g(\xi, \eta)\mid\xi]\big] = \mathbb{E}[g(\xi, \eta)].
\end{aligned}
\]
Setting η̃ = f (ξ̃, θ), we obtain that (ξ, η) and (ξ̃, η̃) have the same law.
Theorem 19.5.3. (Daniell) Let {(Sn , Sn )} be a sequence of Borel spaces. Suppose that the
sequence of probability spaces
\[
\Big(\prod_{k=1}^{n} S_k,\ \bigotimes_{k=1}^{n} \mathcal{S}_k,\ \mu_n\Big)
\]
is projective; that is,
\[
\mu_{n+1}(\,\cdot \times S_{n+1}) = \mu_n(\cdot), \qquad n \in \mathbb{N}. \tag{19.6}
\]
Then, there exists a unique probability measure µ on the product space
\(\big(\prod_n S_n, \bigotimes_n \mathcal{S}_n\big)\) such that
\[
\mu \circ (p_1, \ldots, p_k)^{-1} = \mu_k, \qquad k \in \mathbb{N},
\]
where \((p_1, \ldots, p_k) : \prod_n S_n \to \prod_{j=1}^{k} S_j\) is the projection s ↦ (s1 , . . . , sk ).

Proof. Let θ := (θn ) be an i.i.d. sequence of U [0, 1]–random variables defined on the
space ((0, 1), B((0, 1)), λ). Since S1 is a Borel space, there is a measurable function
f1 : (0, 1) → S1 such that f1 (θ1 ) has law µ1 . For n ≥ 1, suppose that we have constructed
random variables ξ1 = f1 (θ1 ), . . . , ξn = fn (θ1 , . . . , θn ) on S1 , . . . , Sn respectively, so that
(ξ1 , . . . , ξn ) has law µn . Let (η1 , . . . , ηn+1 ) be an arbitrary random variable in
S1 × · · · × Sn+1 with law µn+1 . By (19.6), (ξ1 , . . . , ξn ) and (η1 , . . . , ηn ) have the same
law. Theorem 19.5.2 implies that there is a measurable function f̃n+1 on
S1 × · · · × Sn × [0, 1] such that if ξn+1 := f̃n+1 (ξ1 , . . . , ξn , θn+1 ), then (ξ1 , . . . , ξn+1 )
has the same law as (η1 , . . . , ηn+1 ), namely µn+1 . The law of the sequence (ξn ) satisfies the
required conditions. Uniqueness follows from Sierpinski’s monotone class theorem.

Suppose {(St , St ) : t ∈ T } is a collection of Borel spaces. For each I ⊂ T , denote by
\[
(S_I, \mathcal{S}_I) = \Big(\prod_{t\in I} S_t,\ \bigotimes_{t\in I} \mathcal{S}_t\Big)
\]
and let pI : ST −→ SI be the projection (st : t ∈ T ) ↦ (st : t ∈ I). A family of probability
measures {µJ : J ⊂ T , J finite or countable} on SJ is projective if
\[
\mu_J(\,\cdot \times S_{J\setminus I}) = \mu_I(\cdot), \qquad I \subset J,
\]
for any finite or countable J ⊂ T .
Theorem 19.5.4. (Kolmogorov’s extension theorem) Suppose {(St , St ) : t ∈ T } is a family
of Borel spaces. If {µI : I ⊂ T , I finite} is a projective family of probability measures on
SI , then there exists a unique probability measure µ on ST such that
\[
\mu \circ p_I^{-1} = \mu_I
\]
for any finite I ⊂ T .

Proof. As ST = σ(pJ : J ⊂ T , J countable), for each A ∈ ST there is a countable
J ⊂ T and B ∈ SJ such that A = B × ST \J . By Theorem 19.5.3 there exists a unique
probability measure µJ on SJ such that
\[
\mu_J(\,\cdot \times S_{J\setminus I}) = \mu_I(\cdot)
\]
for all finite I ⊂ J . It follows that if J ⊂ K ⊂ T and K is countable then, for any finite
set I ⊂ J and B ∈ SI , µJ (B × SJ \I ) = µI (B) = µK (B × SK\I ). This means that the
collection {µJ : J ⊂ T , J countable} is projective. Therefore, the map µ : ST −→ [0, 1]
given by
\[
\mu(A) = \mu_J(B), \qquad B \in \mathcal{S}_J,\ A = B \times S_{T\setminus J},
\]
is well defined. We show now that µ is in fact a probability measure on ST . To that
purpose, it suffices to check that µ is countably additive. For a pairwise disjoint sequence
{An } ⊂ ST , choose countable Jn ⊂ T and Bn ∈ SJn such that An = Bn × ST \Jn . Let
J = ⋃n Jn and define Cn = Bn × SJ \Jn . Then ⋃n An = (⋃n Cn ) × ST \J , so
\[
\mu\Big(\bigcup_n A_n\Big) = \mu_J\Big(\bigcup_n C_n\Big) = \sum_n \mu_J(C_n) = \sum_n \mu(A_n).
\]
Consequently, µ is a measure on ST and µ ◦ pI−1 = µI for all finite I ⊂ T . Uniqueness
follows immediately from Sierpinski’s monotone class theorem.

Suppose (S, S ) is a Borel space, let T be a non-empty index set, and consider
the product space (S^T , S^⊗T ). Kolmogorov’s extension theorem shows that for any
projective family {µI : I ⊂ T , I finite} of probability measures on S^⊗I there exists a unique
probability measure µ on S^⊗T such that µ ◦ pI−1 = µI for every finite I ⊂ T .
The identity map in S^T is called the S–valued canonical stochastic process in T
with distribution µ, and the coordinate evaluation maps Xt (s) = s(t) are the values of the
process at t.
For any probability space (Ω, F , P), a measurable map X̃ : (Ω, F ) −→ (S^T , S^⊗T ) is
an S–valued stochastic process in T with distribution P ◦ X̃ −1 , and X̃t (ω) = (X̃(ω))(t)
is the value of the process at t.

19.5.1. Weakly stationary processes. A mean zero complex valued process {Xt : t ∈
R} ⊂ L2 (Ω, P) is weakly stationary if \(\mathbb{E}[X_t \overline{X_s}] = \mathbb{E}[X_{t-s}\overline{X_0}]\) for all t, s ∈ R. Denote
by \(\rho(h) = \mathbb{E}[X_h \overline{X_0}]\) the covariance function of X. If X is weakly stationary, then ρ is a
positive definite function, that is,
\[
\sum_{k,j=1}^{n} z_k\,\rho(t_k - t_j)\,\overline{z_j} \ge 0
\]
for all {z1 , . . . , zn } ⊂ C and {t1 , . . . , tn } ⊂ R. Indeed,
\[
0 \le \operatorname{var}\Big(\sum_{k=1}^{n} z_k X_{t_k}\Big)
= \mathbb{E}\Big[\sum_{k=1}^{n} z_k X_{t_k}\,\overline{\sum_{j=1}^{n} z_j X_{t_j}}\Big]
= \sum_{k,j=1}^{n} z_k\,\rho(t_k - t_j)\,\overline{z_j}.
\]

We will show next that the converse statement holds.
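To make the positive definiteness concrete, the sketch below evaluates the quadratic form for the hypothetical process X_t = ξ e^{it}, where ξ is assumed to have mean zero and E|ξ|² = 1, so that ρ(h) = e^{ih}. The coefficients z and the times t are arbitrary illustrative values.

```python
import cmath

def rho(h):
    """Covariance function of the assumed process X_t = xi * exp(i t), with E|xi|^2 = 1."""
    return cmath.exp(1j * h)

def quadratic_form(z, t):
    """sum_{k,j} z_k * rho(t_k - t_j) * conj(z_j)."""
    return sum(zk * rho(tk - tj) * zj.conjugate()
               for zk, tk in zip(z, t) for zj, tj in zip(z, t))

z = [1 + 2j, -0.5j, 3.0, -1 + 1j]
t = [0.0, 0.7, 1.9, -2.3]

q = quadratic_form(z, t)

# Direct evaluation of E|sum_k z_k X_{t_k}|^2 = |sum_k z_k exp(i t_k)|^2 for this process.
direct = abs(sum(zk * cmath.exp(1j * tk) for zk, tk in zip(z, t))) ** 2

print(q, direct)
```

The quadratic form is real and nonnegative up to rounding, and it matches the variance computed directly, which is exactly the identity displayed above.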

Theorem 19.5.5. (Kolmogorov–Khintchine) Let κ be a complex valued function on R. Then,
κ is positive definite iff there exists a weakly stationary process X such that
\(\kappa(h) = \mathbb{E}[X_h \overline{X_0}]\).

Proof. Let t = (t1 , . . . , tn ) ∈ Rn be fixed. We first consider the case where κ is real
valued. Then,
\[
Q(z) = \sum_{k,j=1}^{n} z_k\,\kappa(t_k - t_j)\,z_j
\]
is a positive quadratic form. Thus, exp(−½ Q(z)) is the characteristic function of an
n–dimensional Gaussian measure µt with covariance matrix (κ(tk − tj ))k,j . It is clear that
the collection of measures {µt : t ∈ Rn , n ∈ N} is projective. Therefore, by Kolmogorov’s
extension theorem, there exists a stochastic process {Xt : t ∈ R} with the prescribed
covariance function κ.

For general κ, let α and β be its real and imaginary parts respectively. Since κ is positive
definite, the function Q on R2n given by
\[
\begin{aligned}
Q(u, v) &= \sum_{k,j=1}^{n} (u_k - i v_k)\,\overline{(u_j - i v_j)}\,\kappa(t_k - t_j) \\
&= \sum_{k,j=1}^{n} \Big(\alpha(t_k - t_j)(u_k u_j + v_k v_j) - \beta(t_k - t_j)(u_k v_j - u_j v_k)\Big)
\end{aligned}
\]
is a positive quadratic form. Therefore exp(−½ Q(u, v)) is the characteristic function of a
2n–dimensional Gaussian random vector (Yt , Zt ) with
\[
\mathbb{E}[Y_{t_k} Y_{t_j}] = \mathbb{E}[Z_{t_k} Z_{t_j}] = \alpha(t_k - t_j) \tag{19.7}
\]
\[
\mathbb{E}[Y_{t_k} Z_{t_j}] = -\mathbb{E}[Y_{t_j} Z_{t_k}] = -\beta(t_k - t_j). \tag{19.8}
\]
If \(X_t = \tfrac{1}{\sqrt{2}}(Y_t + iZ_t)\), then \(\mathbb{E}[X_{t_k}\overline{X_{t_j}}] = \kappa(t_k - t_j)\). The laws µt = L (Yt , Zt ) are projective,
and so the existence of a weakly stationary process with covariance function κ follows from
Kolmogorov’s extension theorem.

19.6. Sufficient statistics

In this section we discuss an application of conditional expectation to the theory of
Mathematical Statistics. We will assume that (Ω, F ) is a fixed measurable space (sample space)
with F ≠ {∅, Ω}. A measurable function T is called a statistic; a family P of probability
measures on (Ω, F ) is called a population.

Definition 19.6.1. A sub σ–algebra A ⊂ F is sufficient for P if for any F ∈ F , there
exists an A –measurable function µF such that
\[
P[F \mid \mathcal{A}] = \mu_F \quad P\text{-a.s.}, \qquad P \in \mathcal{P}.
\]
A statistic T is sufficient if σ(T ) is sufficient.

Remark 19.6.2. When each P [·|A ], P ∈ P, admits a regular conditional probability (for
instance, if (Ω, F ) is a Borel space), there is a slightly stronger notion of sufficiency. A
σ–algebra A ⊂ F is said to be strongly sufficient if there is a stochastic kernel µ from
(Ω, A ) to (Ω, F ) such that
\[
P[F \mid \mathcal{A}](\omega) = \mu(\omega, F), \qquad F \in \mathcal{F},\ P \in \mathcal{P}.
\]
If A = σ(T ), then µ(ω, F ) = η(T (ω), F ) for some stochastic kernel η from (E, E ) to (Ω, F ),
where T (Ω) ⊂ E.

A sufficient σ–algebra reduces the uncertainty in the population conditionally. Observe
that a constant statistic (equivalently, the trivial σ–algebra {∅, Ω}) is not sufficient. On
the opposite extreme, F is sufficient. Thus, sufficient σ–fields are useful insofar as they
contain just the minimal information needed to conditionally reduce the population P.
Theorem 19.6.3. (Factorization) Let P be a population on a measurable space (Ω, F ).
Assume that there is a σ–finite measure ν such that P ≪ ν for all P ∈ P. A σ–algebra
A ⊂ F is sufficient iff there exist functions g : P −→ A+ , P ↦ gP , and h : (Ω, F ) −→ [0, ∞)
such that
\[
\frac{dP}{d\nu} = g_P\,h \tag{19.9}
\]
for each P ∈ P.

Proof. Necessity: Suppose A is sufficient for P. For any F ∈ F , let µF be an A –measurable
function such that for each P ∈ P, P [F |A ] = µF P –a.s. The Halmos–Savage
theorem shows that there is a sequence {Pn } ⊂ P such that P ≪ Q := Σn 2−n Pn for all
P ∈ P. Hence, for any B ∈ A
\[
\begin{aligned}
\int_B Q[F\mid\mathcal{A}]\,dQ &= Q[F \cap B] = \sum_n 2^{-n} \int_B P_n[F\mid\mathcal{A}]\,dP_n \\
&= \sum_n 2^{-n} \int_B \mu_F\,dP_n = \int_B \mu_F\,dQ.
\end{aligned}
\]
Consequently, Q[F |A ] = µF Q–a.s. Let gP be the Radon–Nikodym derivative of P w.r.t.
Q on (Ω, A ). Then,
\[
\begin{aligned}
P[F] &= \int P[F\mid\mathcal{A}]\,dP = \int Q[F\mid\mathcal{A}]\,g_P\,dQ = \int Q[g_P 1_F\mid\mathcal{A}]\,dQ \\
&= \int_F g_P\,dQ = \int_F g_P\,\frac{dQ}{d\nu}\,d\nu.
\end{aligned}
\]
Therefore, \(\frac{dP}{d\nu} = g_P\,\frac{dQ}{d\nu}\).

Sufficiency: Suppose that (19.9) holds and let Q be as above. Then
\[
\frac{dP}{dQ}\,\frac{dQ}{d\nu} = \frac{dP}{d\nu} \quad \nu\text{-a.s.}
\]
and hence
\[
\frac{dP}{dQ} = \frac{g_P}{\sum_n 2^{-n} g_{P_n}} \quad Q\text{-a.s.}
\]
Consequently, dP/dQ admits an A –measurable version. Then, for any F ∈ F and B ∈ A
\[
\begin{aligned}
P[F \cap B] &= \int_B 1_F\,\frac{dP}{dQ}\,dQ = \int_B Q\Big[1_F\,\frac{dP}{dQ}\,\Big|\,\mathcal{A}\Big]\,dQ \\
&= \int_B Q[F\mid\mathcal{A}]\,\frac{dP}{dQ}\,dQ = \int_B Q[F\mid\mathcal{A}]\,dP.
\end{aligned}
\]
Therefore P [F |A ] = Q[F |A ] P –a.s. for all P ∈ P.

Example 19.6.4. Let P be the family of d–dimensional Gaussian measures µa,σ with
covariance matrix σ 2 Id and mean ad = (a, . . . , a)⊺ . Let λd be Lebesgue measure on Rd .
Then
\[
\frac{d\mu_{a,\sigma}}{d\lambda_d}(x) = \frac{1}{(2\pi\sigma^2)^{d/2}}\,e^{-\frac{|x - a_d|^2}{2\sigma^2}} = e^{-\xi(\eta(a,\sigma))}\,e^{\eta(a,\sigma)\cdot T(x)}
\]
where \(T(x) = (x\cdot 1_d, -|x|^2)\), \(\eta(a,\sigma) = \big(\tfrac{a}{\sigma^2}, \tfrac{1}{2\sigma^2}\big)^{\top}\), and \(\xi(\eta) = \log\big(\int e^{\eta\cdot T(x)}\,dx\big)\).
Therefore, T is a sufficient statistic for P = {µa,σ : (a, σ) ∈ R × (0, ∞)}.
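The factorization in this example can be checked numerically. The sketch below, with hypothetical function names, evaluates the Gaussian log–density once directly and once through T(x) = (x · 1_d, −|x|²), using the closed form ξ = (d/2) log(2πσ²) + d a²/(2σ²) for the normalizing term, which is obtained by completing the square.

```python
import math

def log_density(x, a, sigma):
    """Log of the N(a * 1_d, sigma^2 * I_d) density at x, computed directly."""
    d = len(x)
    sq = sum((xi - a) ** 2 for xi in x)
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def T(x):
    """The statistic T(x) = (x . 1_d, -|x|^2)."""
    return (sum(x), -sum(xi * xi for xi in x))

def log_density_via_T(x, a, sigma):
    """The same log-density written as eta . T(x) - xi(eta)."""
    d = len(x)
    eta = (a / sigma ** 2, 1 / (2 * sigma ** 2))
    t = T(x)
    xi = 0.5 * d * math.log(2 * math.pi * sigma ** 2) + d * a ** 2 / (2 * sigma ** 2)
    return eta[0] * t[0] + eta[1] * t[1] - xi

x = (1.0, 2.0, 3.0)
print(log_density(x, 1.5, 0.3), log_density_via_T(x, 1.5, 0.3))
```

In particular, two sample points with the same value of T have the same density under every member of the family, which is the content of sufficiency here.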

We recall that given a population P on (Ω, F ) and a sub–σ–algebra B ⊂ F , a set
N is P–negligible with respect to B if for any P ∈ P, there is BP ∈ B such that N ⊂
BP and P (BP ) = 0. The collection of all P–negligible sets w.r.t. B is denoted by N^P_B .
Exercise 3.10.22 shows that B̃^P = σ(B ∪ N^P_B ), the sub–completion of B with respect to
P, equals
\[
\{A \subset \Omega : \exists B \in \mathcal{B} \text{ with } A \triangle B \in \mathcal{N}^{\mathcal{P}}_{\mathcal{B}}\}.
\]
When B = F , we simply use N^P to denote N^P_F . When P consists of only one
measure P , then B̃^P coincides with \(\overline{\mathcal{B}}^{P}\), the completion of B under P .
eP ⊂ B and thus, P
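Remark 19.6.5. Every set that is negligible for the whole population is negligible for each
of its members. Hence, for each P in the population, the sub–completion of B is contained in
the completion of B under P , and thus
\[
\widetilde{\mathcal{B}}^{\mathcal{P}} \subset \overline{\mathcal{B}}^{\mathcal{P}} := \bigcap_{P\in\mathcal{P}} \overline{\mathcal{B}}^{P}.
\]
If ν is a σ–finite measure such that P ≪ ν for all P ∈ P, then
\(\overline{\mathcal{B}}^{\nu} \subset \widetilde{\mathcal{B}}^{\mathcal{P}}\).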
Definition 19.6.6. A sufficient σ–algebra A ⊂ F for a population P is minimal sufficient
if whenever B ⊂ F is a sufficient σ–algebra for P, then A ⊂ B̃^P .

A minimal sufficient σ–algebra for some population P has the minimal information
needed to conditionally reduce the entire population P.
Remark 19.6.7. If T is a minimal sufficient statistic and S is a sufficient statistic with
S = ψ(T ) for some measurable function ψ, then S is also minimal sufficient.
Lemma 19.6.8. Suppose N^P0 = N^P for some P0 ⊂ P. If A is sufficient for P and
minimal sufficient for P0 , then A is minimal sufficient for P.

Proof. If B is sufficient for P, then B is also sufficient for the smaller population P0 ; thus,
A ⊂ B̃^P0 = B̃^P .
Theorem 19.6.9. (Bahadur) Let P be a population on (Ω, F ). If there is a σ–finite
measure ν on (Ω, F ) such that P ≪ ν for all P ∈ P, then there exists a minimal sufficient
statistic for P.

Proof. The Halmos–Savage theorem shows that there is a sequence {Pn } ⊂ P such that
P ≪ Σn cn Pn =: Q for all P ∈ P, where the cn are positive constants with Σn cn = 1.
For any P ∈ P let
\[
f_P = \frac{dP}{d\nu} \qquad\text{and}\qquad f = \frac{dQ}{d\nu} = \sum_n c_n\,\frac{dP_n}{d\nu}.
\]
The map
\[
T : x \mapsto \Big(\frac{f_P(x)}{f(x)}\,1_{\{f>0\}}(x) : P \in \mathcal{P}\Big)
\]
is measurable as a function from (Ω, F ) to the product space (R^P , B^⊗P ). We will show that
T is a minimal sufficient statistic for P. Indeed, for each P ∈ P, let gP be the projection
in R^P onto its P –th component. As fP = (gP ◦ T ) f , T is sufficient. If B is another
sufficient σ–algebra then, by the factorization theorem, there are B–measurable functions
(g̃P : P ∈ P) and a measurable function h such that fP = g̃P h ν–a.s. for each P ∈ P. The
uniqueness of the Radon–Nikodym derivative implies that
\[
g_P \circ T = \frac{\tilde{g}_P\,h}{f} = \frac{\tilde{g}_P}{\sum_n c_n\,\tilde{g}_{P_n}} \quad \nu\text{-a.s.}
\]
Therefore T is B̃^P –measurable.

Lemma 19.6.10. Suppose A ⊂ B̃^P . If A is sufficient for P, then so is B. In particular,
if T is a sufficient statistic for P and T = φ(S) for some measurable function φ, then S
is sufficient for P.

Proof. Lemma 4.2.2 shows that for any f ∈ A+ , there are sequences {an } ⊂ R+ and
{An } ⊂ A such that f = Σn an 1An . By assumption, there exists a sequence {Bn } ⊂ B
with {An △Bn } ⊂ N^P_B , so that if f̃ = Σn an 1Bn , then supP ∈P P [f ≠ f̃ ] = 0.

Consider the case where there exists a σ–finite measure ν such that P ≪ ν for all P ∈ P.
By the factorization theorem, there is h ∈ F+ such that for any P ∈ P
\[
\frac{dP}{d\nu}(x) = g_P(x)\,h(x)
\]
for some gP ∈ A+ . Let G̃P ∈ B+ be such that P [gP ≠ G̃P ] = 0; then,
\[
\frac{dP}{d\nu}(x) = \tilde{G}_P(x)\,h(x).
\]
By the factorization theorem, we conclude that B is sufficient.
In the general case, suppose that B is not sufficient. Then there are F ∈ F and
P1 , P2 ∈ P such that
\[
P_1[F\mid\mathcal{B}] \ne P_2[F\mid\mathcal{B}]. \tag{19.10}
\]
Let P0 = {P1 , P2 } and notice that A is sufficient for P0 . Since Pi ≪ P1 + P2 , i = 1, 2,
and B̃^P ⊂ B̃^P0 , the dominated part of the proof implies that B is sufficient for P0 ; this
contradicts (19.10).

Example 19.6.11. Let P = {Pθ : θ ∈ Θ} be a family of probability measures on (Ω, F )
of exponential type relative to a σ–finite measure ν with
\[
f_\theta(x) = \frac{dP_\theta}{d\nu}(x) = e^{\eta(\theta)\cdot T(x) - \xi(\theta)}\,h(x),
\]
where T : (Ω, F ) −→ Rk and h : (Ω, F ) −→ R+ . If there is a finite set Θ0 = {θ0 , . . . , θk } such
that ηj = η(θj ) − η(θ0 ), j = 1, . . . , k, are linearly independent, then T is a minimal sufficient
statistic.

Proof. Let P0 = {Pθ0 , . . . , Pθk }. Since ν and Pθ are equivalent measures, N^P0 =
N^P . Let ηj = η(θj ) − η(θ0 ) and ξj = ξ(θj ) − ξ(θ0 ), j = 1, . . . , k. As in the proof of
Theorem 19.6.9,
\[
S(x) = \big[\exp(\eta_1\cdot T(x) - \xi_1)\ \ \ldots\ \ \exp(\eta_k\cdot T(x) - \xi_k)\big]^{\top}
\]
is minimal sufficient for P0 . The linear independence of the vectors ηj implies that the
k × k matrix L whose i–th row is ηi⊺ is invertible. Let g(w) = [e^w1 . . . e^wk ]⊺ ; then the function
\[
G(t) = \operatorname{diag}(e^{-\xi_i})\,g(Lt)
\]
is a homeomorphism from Rk onto (0, ∞)k and G(T (X)) = S(X). Consequently T (X) =
G−1 (S(X)) is minimal sufficient for P0 and, by Lemma 19.6.8, T (X) is minimal sufficient
for P.

As we pointed out above, it is desirable to have sufficient statistics that make a reduc-
tion with minimal information. The following result addresses the problem of existence of
minimal sufficient statistics under a mild condition.
Definition 19.6.12. Let P be a population on (Ω, F ). A σ–algebra A ⊂ F is said to be
complete (resp. bounded complete) for P if whenever f ∈ L1 (Ω, A , P) (resp. f bounded
and A –measurable)
\[
\sup_{P\in\mathcal{P}} \Big|\int f\,dP\Big| = 0 \quad\text{implies}\quad \sup_{P\in\mathcal{P}} P(f \ne 0) = 0.
\]
A statistic T is complete (resp. bounded complete) if σ(T ) is complete (resp. bounded
complete).
Example 19.6.13. Consider a population of exponential type
\[
\frac{dP_\eta}{d\nu}(x) = e^{\eta\cdot T(x) - \xi(\eta)}\,\psi(x)
\]
where η ∈ ∆ ⊂ Rk , and ∆ has nonempty interior. It follows from Example 19.6.11 that T
is a sufficient statistic. We prove here that T is also complete.

Proof. Let F be a measurable function with Eη [F (T )] = 0 for all η ∈ ∆. Fix η0 in the
interior of ∆ and δ > 0 such that B(η0 ; δ) ⊂ ∆; then Eη [F+ (T )] = Eη [F− (T )] for all
η ∈ B(η0 ; δ). Hence,
\[
g(h) := \int F_+(t)\,e^{h\cdot t}\,e^{\eta_0\cdot t}\,\tau(dt) = \int F_-(t)\,e^{h\cdot t}\,e^{\eta_0\cdot t}\,\tau(dt) < \infty
\]
for |h| < δ, where τ = (ψ · ν) ◦ T −1 . The function g can be analytically
extended to B(0; δ) + iRk . This implies that the Borel measures F+ (t)e^(η0 ·t) τ (dt) and
F− (t)e^(η0 ·t) τ (dt) on Rk have the same characteristic function. From the uniqueness of the
characteristic function we obtain F+ = F− τ –a.s. Therefore F (T ) = F+ (T ) − F− (T ) = 0
Pη –a.s. for every η ∈ ∆.
Lemma 19.6.14. Suppose A ⊂ B̃^P . If B is complete for P, then so is A . In particular, if
T is a complete statistic and S = ψ(T ) for some measurable function ψ, then S is complete.

Proof. For any f ∈ L1 (A , P), there is f̃ ∈ L1 (B, P) such that ‖f − f̃ ‖L1 (P ) = 0 for all
P ∈ P. So, if supP ∈P |EP [f ]| = 0, then supP ∈P |EP [f̃ ]| = 0; consequently,
supP ∈P P [f ≠ 0] = supP ∈P P [f̃ ≠ 0] = 0.
Theorem 19.6.15. (Lehmann–Scheffé–Bahadur) Suppose there exists a minimal sufficient
σ–algebra A for P. A σ–algebra B is sufficient and complete for P iff B is minimal
sufficient for P and A is complete.

Proof. Suppose B is sufficient and complete. The minimal sufficiency of A implies that
A ⊂ B̃^P ; thus, by Lemma 19.6.14, A is complete. The sufficiency of A implies that for
any B ∈ B, there is a function gB ∈ A+ with supP ∈P P [P [B|A ] ≠ gB ] = 0. Hence,
there is g̃B ∈ B+ such that supP ∈P P [gB ≠ g̃B ] = 0. Since G = (1B − g̃B ) ∈ L1 (B, P)
and supP ∈P |EP [G]| = 0, the completeness of B implies that supP ∈P P (G ≠ 0) = 0. Thus
supP ∈P EP [|1B − gB |] = 0, which in turn means that B ∈ Ã^P . Consequently, B ⊂ Ã^P
and thus, B is minimal sufficient.
Definition 19.6.16. Given a population P on (Ω, F ), a statistic V : (X , B) → (V, D) of
X is said to be ancillary if there is a probability measure µ on (V, D) such that
\[
P[V(X) \in D] = \mu(D), \qquad P \in \mathcal{P},\ D \in \mathcal{D}.
\]
Example 19.6.17. Given a fixed σ 2 > 0, consider the family P of normal distributions Pµ
with mean µ ∈ R and variance σ 2 . If X ∼ N (µ, σ 2 ), then V (X) = X − µ ∼ N (0, σ 2 )
and so, V is ancillary for P.
Theorem 19.6.18. (Basu) Let T and V be two statistics of X from the population P. If
T is boundedly complete and sufficient and V is ancillary, then T and V are independent.

Proof. Let µ be a measure on (V, D) such that P (V (X) ∈ D) = µ(D) for any P ∈ P and
D ∈ D. It is enough to show that EP [1(V (X) ∈ D)|T ] = µ(D) P –a.s. for each P ∈ P and
all D ∈ D. Sufficiency of T and ancillarity of V imply that
ψD (T ) = E[1(V ∈ D)|T ] − P (V ∈ D) is bounded and does not depend on P . Moreover,
\[
E_P[\psi_D(T)] = 0, \qquad P \in \mathcal{P}.
\]
Hence, by bounded completeness, ψD (T ) = 0 P–a.s.; that is,
P [V (X) ∈ D|T ] = P [V (X) ∈ D] for all P ∈ P.
Example 19.6.19. Let σ 2 > 0 be fixed. Let {Xj : 1 ≤ j ≤ n} be an i.i.d. sample from a
normal population {Pµ : µ ∈ R} with mean (unknown) µ and variance (known) σ 2 . Then,
\(\bar{X} := \frac{1}{n}\sum_{j=1}^{n} X_j\) is sufficient and complete. The statistic
\(S^2(X) := \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X})^2\) is ancillary since, as in Example 19.6.17,
its distribution does not depend on the value of µ. Therefore, by Theorem 19.6.18, X̄ and S 2
are independent. Notice that
\[
n\Big(\frac{\bar{X} - \mu}{\sigma}\Big)^2 + \frac{(n-1)S^2}{\sigma^2} = \sum_{j=1}^{n} \Big(\frac{X_j - \mu}{\sigma}\Big)^2 \tag{19.11}
\]
The first term in the left–hand side of (19.11) is the square of a normal N (0, 1) random
variable and so, it has χ21 distribution. The term in the right–hand side of (19.11) is the sum
of the squares of n i.i.d. normal N (0, 1) random variables, and so it has distribution χ2n .
The independence of X̄ and S 2 implies that \(\frac{(n-1)S^2}{\sigma^2}\) has characteristic function
\((1 - 2it)^{-(n-1)/2}\), which means that it has \(\chi^2_{n-1}\) distribution.
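The decomposition (19.11) is a purely algebraic identity, so it can be checked with exact arithmetic. The following sketch uses an arbitrary illustrative sample and parameter values; the function name and the data are assumptions of this example only.

```python
from fractions import Fraction

def decomposition(xs, mu, sigma2):
    """Return both sides of (19.11) for the sample xs."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    lhs = n * (xbar - mu) ** 2 / sigma2 + (n - 1) * s2 / sigma2
    rhs = sum((x - mu) ** 2 for x in xs) / sigma2
    return lhs, rhs

xs = [Fraction(v) for v in (3, -1, 4, 1, 5, 9, 2, 6)]
lhs, rhs = decomposition(xs, mu=Fraction(5, 2), sigma2=Fraction(7, 3))
print(lhs, rhs)
```

The identity holds for every sample and every value of µ, since the cross term between X̄ − µ and the deviations X_j − X̄ sums to zero.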

19.7. Bayes model and conjugate priors

Let (∆, B) and (X , F ) be measurable spaces. A stochastic kernel Pθ (dx) = P (θ, dx)
from (∆, B) to (X , F ) is called a parametric model in Statistics. Denote by X and Θ
the projections from (X × ∆, F ⊗ B) to (X , F ) and (∆, B) respectively. A probability
measure Π on (∆, B) is called a prior distribution. The probability measure P = Π ⊗ P on
(X × ∆, F ⊗ B) has marginals
\[
\mathbb{P}[X \in A] = \int_\Delta P(\theta, A)\,\Pi(d\theta), \qquad \mathbb{P}[\Theta \in B] = \Pi[B].
\]
Disintegration implies that the conditional distribution of X given Θ exists and is given by
\[
\mathbb{P}[X \in dx \mid \Theta = \theta] = P_\theta(dx).
\]
Consider a parametric model {Pθ : θ ∈ ∆} such that Pθ ≪ µ for all θ ∈ ∆, where µ is
a σ–finite measure on F . Suppose there is a function f : (X × ∆, F ⊗ B) → (R+ , B(R+ ))
with
\[
\frac{dP_\theta}{d\mu}(x) = f_\theta(x) := f(x; \theta).
\]
Then, the function L(θ; x) = fθ (x) is called the likelihood function and ℓ(θ; x) := log(fθ (x))
is called the log–likelihood function. Maximum likelihood estimators are solutions to the
problem
\[
\hat{\theta}(x) := \arg\max_{\theta\in\Delta} L(\theta; x) = \arg\max_{\theta\in\Delta} \ell(\theta; x).
\]
The function ℓ̂(x) = supθ∈∆ ℓ(θ; x) is called the super–log–maximal function.
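As a small illustration of these definitions, the sketch below maximizes a log–likelihood over a parameter grid. The Poisson model, the grid, and the function names are assumptions made for this example and do not come from the text; for a single Poisson observation x > 0 the maximizer is known to be θ = x.

```python
import math

def log_likelihood(theta, x):
    """log f_theta(x) for a Poisson(theta) density w.r.t. counting measure (illustrative)."""
    return x * math.log(theta) - theta - math.lgamma(x + 1)

def grid_mle(x, grid):
    """Approximate arg max of theta -> log_likelihood(theta, x) over a finite grid."""
    return max(grid, key=lambda theta: log_likelihood(theta, x))

# Candidate parameters 0.01, 0.02, ..., 20.00.
GRID = [k / 100.0 for k in range(1, 2001)]

theta_hat = grid_mle(4, GRID)
l_hat = log_likelihood(theta_hat, 4)  # grid value of the super-log-maximal function at x = 4
print(theta_hat, l_hat)
```

Because the logarithm is increasing, maximizing L and maximizing ℓ select the same parameter; the grid search is only a crude stand-in for an analytic or numerical optimizer.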

For the rest of this section we will focus on the class of models of exponential type. As
we will see, families of exponential type have rich convex structure which makes their study
amenable through convex analysis.

Lemma 19.7.1. Suppose {Pθ : θ ∈ ∆} is a family of natural exponential type relative to a
Borel measure µ on Rn . The set
\[
\Delta := \Big\{\theta \in \mathbb{R}^n : \int e^{\theta\cdot x}\,\mu(dx) < \infty\Big\}
\]
is convex, and the map Λ : Rn → (−∞, +∞] given by \(\Lambda(\theta) = \log\int e^{\theta\cdot x}\,\mu(dx)\) is proper,
lower semicontinuous and convex with dom(Λ) = ∆.

Proof. From Theorem 10.6.5 we know that ∆ is convex and that Λ is finite and convex on
∆. Clearly, if θ ∉ ∆ then Λ(θ) = +∞. Hence Λ is proper and convex.

Suppose (θn : n ∈ N) ⊂ ∆ and that θn → θ. By Fatou’s lemma
\[
\int e^{\theta\cdot x}\,\mu(dx) \le \liminf_n \int e^{\theta_n\cdot x}\,\mu(dx).
\]
Therefore, Λ(θ) ≤ lim inf n Λ(θn ).

The log–likelihood function of this parametric model is given by
\[
\ell(\theta; x) = \theta\cdot x - \Lambda(\theta), \qquad \Lambda(\theta) = \log\int e^{\theta\cdot x}\,\mu(dx).
\]
The super–log–maximal function is given by
\[
\hat{\ell}(x) = \sup_{\theta\in\mathbb{R}^n} \{\theta\cdot x - \Lambda(\theta)\}.
\]
This is the same as the Fenchel–Legendre transform of Λ.
Theorem 19.7.2. Suppose {Pθ : θ ∈ ∆} is an exponential model in Rn . Let Λ be the
extension to all Rn of the cumulant generating function. The super–log–maximal
function ℓ̂ is convex and lower semicontinuous on Rn and
\[
\hat{\ell}(x) = \sup_{\theta\in\Delta} \{\theta\cdot x - \Lambda(\theta)\}
\]
\[
\Lambda(\theta) = \sup_{x\in\mathbb{R}^n} \{\theta\cdot x - \hat{\ell}(x)\}
\]

Proof. As Λ is proper, lower semicontinuous and convex on Rn , the conclusion follows from
the Fenchel–Legendre duality theorem B.2.14[(iv)].
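The duality can be illustrated numerically in one dimension. The sketch below assumes that the reference measure µ is the standard normal law, so that Λ(θ) = θ²/2 and the transform is ℓ̂(x) = x²/2; the grid, its range, and the function names are choices of this example, and the supremum is only approximated on the grid.

```python
def cumulant(theta):
    """Lambda(theta) = log E[exp(theta * x)] for a standard normal reference measure (assumed)."""
    return 0.5 * theta * theta

# Parameter grid -10.000, -9.999, ..., 10.000 used to approximate the supremum.
GRID = [k / 1000.0 for k in range(-10000, 10001)]

def conjugate(fn, point):
    """Grid approximation of sup_u {u * point - fn(u)}, the Fenchel-Legendre transform."""
    return max(u * point - fn(u) for u in GRID)

def l_hat(x):
    """Super-log-maximal function: the conjugate of the cumulant function."""
    return conjugate(cumulant, x)

print(l_hat(1.25), 0.5 * 1.25 ** 2)
# Applying the transform again to x -> x^2 / 2 recovers Lambda, as in the theorem.
print(conjugate(lambda x: 0.5 * x * x, 1.5), cumulant(1.5))
```

The approximation is only reliable for points well inside the grid range, since the maximizer θ = x must lie in the grid.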
Definition 19.7.3. A family of exponential type {Pθ : θ ∈ ∆} relative to a Borel measure
µ on Rn is said to be full if
(A) ∆o ≠ ∅.
(B) For any v ∈ Rn \ {0} and r ∈ R, 0 ≤ µ({x : v · x = r}) < 1.
(C) C = co(supp(µ)) has nonempty interior in Rn .

Assumptions (A), (B) and (C) guarantee that the model {Pθ : θ ∈ ∆} is truly
n–dimensional and that if Pθ = Pη , then θ = η.

Theorem 19.7.4. Suppose {Pθ : θ ∈ ∆} is a family of full exponential type relative to a
Borel measure µ on Rn , and let C = co(supp(µ)). Then
\[
C^{o} \subset \operatorname{dom}(\hat{\ell}) \subset C.
\]

Proof. Without loss of generality, we may assume that µ is in fact a probability measure.
Indeed, fix θ0 ∈ ∆. As Pθ0 and µ are equivalent measures, they have the same support S.
Then, by shifting ∆ to ∆′ := ∆ − θ0 , we can consider the exponential model
\[
P'_{\theta'}(dx) = e^{\theta'\cdot x - \Lambda'(\theta')}\,P_{\theta_0}(dx),
\]
where θ′ ∈ ∆′ and Λ′ (θ′ ) = Λ(θ′ + θ0 ) − Λ(θ0 ).

If t ∉ C then, by Theorem 12.10.15[(iii)], there exist v ∈ Rn and real constants α < β such
that
\[
v\cdot x \le \alpha < \beta < v\cdot t, \qquad x \in C.
\]
Hence, for λ > 0
\[
e^{-\lambda v\cdot t} \int e^{\lambda v\cdot x}\,\mu(dx) \le e^{-\lambda(\beta - \alpha)} \to 0
\]
as λ → ∞. This shows that ℓ(λv; t) = λv · t − Λ(λv) → +∞ as λ → ∞. Therefore
ℓ̂(t) = +∞.

Suppose that t ∈ C o . Then, there is a finite set {sj : j = 1, . . . , m} ⊂ supp(µ) such that
t ∈ (co(sj : j = 1, . . . , m))o . For each u ∈ Sn−1 let Hu (t) be the hyperplane through t
with normal u. The function ρ : u ↦ max{d(sj , Hu (t)) : j = 1, . . . , m} is clearly continuous,
and so attains its minimum at some point u0 ∈ Sn−1 . Since Hu (t) is an affine space of
dimension (n − 1), ρ0 = ρ(u0 ) > 0. This means that for any u ∈ Sn−1 , the half–space
Hu+ (t) := {x : u · x ≥ u · t} contains at least one of the balls B(sj ; ρ0 ), each of which has
positive measure under µ. Hence
\[
\xi(t) := \inf_{u\in S^{n-1}} \mu(H_u^{+}(t)) \ge \min_{1\le j\le m} \mu\big(B(s_j; \rho_0)\big) > 0.
\]
To conclude, notice that for any θ ∈ Rn
\[
\xi(t) \le \mu\big(\{x : x\cdot\theta \ge t\cdot\theta\}\big) \le e^{-\theta\cdot t} \int e^{\theta\cdot x}\,\mu(dx);
\]
whence it follows that
\[
\theta\cdot t - \Lambda(\theta) \le -\log\xi(t) < \infty.
\]
Therefore, t ∈ dom(ℓ̂).

Suppose P = {Pθ : θ ∈ ∆} is a family of full exponential type with
\[
P_\theta(dx) = e^{\theta\cdot x - \Lambda(\theta)}\,\mu(dx), \qquad \theta \in \Delta.
\]
The conjugate family of priors on (∆, B(∆)) is defined as the collection of probability
measures
\[
\Pi_{a,\tau}(d\theta) = D(a, \tau)\,e^{\theta\cdot\tau - a\Lambda(\theta)}\,\lambda(d\theta)
\]
where λ is Lebesgue measure on Rn , and D(a, τ ) is a normalizing factor. Define
\[
E = \Big\{(a, \tau) \in \mathbb{R}\times\mathbb{R}^n : \int_\Delta e^{\theta\cdot\tau - a\Lambda(\theta)}\,d\theta < \infty\Big\} \tag{19.12}
\]

Theorem 19.7.5. If a > 0, then
\[
E_a := \{\tau \in \mathbb{R}^n : (a, \tau) \in E\} = \{\tau \in \mathbb{R}^n : a^{-1}\tau \in C^{o}\}
\]
where C = co(supp(µ)).

Proof. Fix a > 0 and let fτ (θ) := aΛ(θ) − θ · τ . Then, (a, τ ) ∈ E iff ∫ e^(−fτ (θ)) dθ < ∞. By
Theorem B.3.13, this is equivalent to continuity of the Fenchel–Legendre transform fτ∗ of
fτ at 0. Since
\[
f_\tau^{*}(x) = \sup_{\theta\in\mathbb{R}^n} \{\theta\cdot(x + \tau) - a\Lambda(\theta)\} = a\,\Lambda^{*}\big(a^{-1}(x + \tau)\big) = a\,\hat{\ell}\big(a^{-1}(x + \tau)\big),
\]
(a, τ ) ∈ E iff ℓ̂ is continuous at a−1 τ or, equivalently, a−1 τ ∈ (dom(ℓ̂))o = C o .

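Example 19.7.6. Consider the normal distribution N (µ, σ 2 ) where both µ and σ 2 are
unknown. Then
\[
\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}
= \exp\Big(\frac{\mu}{\sigma^2}\,x - \frac{1}{2\sigma^2}\,x^2 - \Big[\frac{1}{2}\log(2\pi\sigma^2) + \frac{\mu^2}{2\sigma^2}\Big]\Big).
\]
Let T1 (x) = x, T2 (x) = x2 , θ1 = µ/σ 2 , and θ2 = −1/(2σ 2 ). Then, the normal distribution has a
(natural) exponential representation
\[
f_{\theta_1,\theta_2}(dt_1, dt_2) = \exp\big(\theta\cdot t - K(\theta)\big)\,\nu(dt)
\]
where θ ∈ R × (−∞, 0),
\[
K(\theta) = \frac{1}{2}\log\Big(\frac{\pi}{-\theta_2}\Big) - \frac{\theta_1^2}{4\theta_2},
\]
and ν is a measure on R2 supported on the parabola t2 = t1² . The conjugate measure is of the form
\[
p_{a,b}(\theta) = D(a, b)\,e^{\theta\cdot b - aK(\theta)}.
\]
By Theorem 19.7.5, the domain of conjugacy E contains the set {(a, b) : a > 0, ab2 > b1² }.
This can be seen directly from
\[
\begin{aligned}
\int_{\mathbb{R}\times(-\infty,0)} e^{b_1\theta_1 + b_2\theta_2 - aK(\theta_1,\theta_2)}\,d\theta_1\,d\theta_2
&= \frac{1}{\pi^{a/2}} \int_0^\infty e^{-b_2 s}\,s^{a/2} \int_{-\infty}^{\infty} e^{b_1\theta_1 - \frac{a\theta_1^2}{4s}}\,d\theta_1\,ds \\
&= \frac{2}{\sqrt{\pi^{a-1}a}} \int_0^\infty s^{\frac{a+3}{2}-1} \exp\Big(-s\Big(b_2 - \frac{b_1^2}{a}\Big)\Big)\,ds \\
&= \frac{2}{\sqrt{\pi^{a-1}a}}\,\frac{\Gamma\big(\frac{a+3}{2}\big)}{\big(b_2 - \frac{b_1^2}{a}\big)^{\frac{a+3}{2}}} = \frac{1}{D(a, b)},
\end{aligned}
\]
which requires a > 0 and ab2 > b1² . To obtain the conjugate measure in terms of the original
parameters (µ, σ 2 ), we apply the change of variables formula for integration. Consider the
change of variable \((\theta_1, \theta_2) = G(\mu, \sigma^2) = \big(\tfrac{\mu}{\sigma^2}, -\tfrac{1}{2\sigma^2}\big)\) on R × (0, ∞). Then,
\[
\begin{aligned}
q_{a,b}(\mu, \sigma^2) &= p_{a,b}(G(\mu, \sigma^2))\,|J_G(\mu, \sigma^2)| \\
&= D(a, b)\,\exp\Big(-\frac{a}{2\sigma^2}\Big(\mu^2 - \frac{2b_1}{a}\mu + \frac{b_2}{a}\Big)\Big)\,\frac{1}{(2\pi\sigma^2)^{a/2}}\,\frac{1}{2\sigma^6} \\
&= \frac{D(a, b)\,\pi^{-a/2}}{2^{\frac{a+2}{2}}}\,\exp\Big(-\frac{a}{2\sigma^2}\Big(\mu - \frac{b_1}{a}\Big)^2\Big)\,\frac{1}{(\sigma^2)^{\frac{a+6}{2}}}\,\exp\Big(-\frac{1}{2\sigma^2}\Big(b_2 - \frac{b_1^2}{a}\Big)\Big) \\
&= \sqrt{\frac{a}{2\pi\sigma^2}}\,\exp\Big(-\frac{a}{2\sigma^2}\Big(\mu - \frac{b_1}{a}\Big)^2\Big)\,
\frac{\big(\frac{b_2}{2} - \frac{b_1^2}{2a}\big)^{\frac{a+3}{2}}}{\Gamma\big(\frac{a+3}{2}\big)}\,\frac{1}{(\sigma^2)^{\frac{a+3}{2}+1}}\,\exp\Big(-\frac{1}{\sigma^2}\Big(\frac{b_2}{2} - \frac{b_1^2}{2a}\Big)\Big).
\end{aligned}
\]
The expression qa,b (µ, σ 2 ) can be interpreted as follows: given σ 2 , the distribution of µ is
normal \(N\big(\tfrac{b_1}{a}, \tfrac{\sigma^2}{a}\big)\); while marginally, σ 2 has inverse–gamma distribution
\(Ig\big(\tfrac{a+3}{2}, \tfrac{1}{2}\big(b_2 - \tfrac{b_1^2}{a}\big)\big)\).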

19.8. Information inequality

The following application is a well known result in the theory of point estimation of
parameters in Statistics. Consider a measurable space (X, B) and let µ be a σ–finite measure
on B. Suppose that an open set ∆ ⊂ Rk parameterizes a family of probability densities
{fθ : θ ∈ ∆}. The problem in parameter estimation is to find a statistic T = T (x) that
estimates a function g : ∆ −→ Rp of the parameter θ.
Suppose that T ∈ L2 (fθ dµ) and let g(θ) = Eθ [T ] = ∫X T (x)fθ (x) µ(dx). We will further
assume that g and fθ satisfy enough regularity conditions that allow the exchange of order
of differentiation and integration. Then, we have that
\[
\partial_\theta g(\theta) = \int T(x)\,\partial_\theta f_\theta(x)\,\mu(dx) = E_\theta[T\,\partial_\theta \log(f_\theta)] \tag{19.13}
\]
\[
0 = \int \partial_\theta f_\theta(x)\,\mu(dx) = E_\theta[\partial_\theta \log(f_\theta)] \tag{19.14}
\]
The function s : (x, θ) ↦ ∂θ log(fθ (x)) is called the score function. From (19.13) and
(19.14) we have that ∂θ g(θ) = covθ (T, s⊤θ ) = Eθ [(T − g(θ))(sθ − Eθ [sθ ])].

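Theorem 19.8.1. For any real valued functions ψ1 , . . . , ψk on X × ∆ such that ψi (·, θ) ∈
L2 (fθ dµ) for each θ ∈ ∆, define
\[
C(\theta) = \operatorname{var}_\theta(\psi) = E_\theta\big[(\psi - E_\theta[\psi])(\psi - E_\theta[\psi])^{\top}\big] = \big(\operatorname{cov}_\theta(\psi_i, \psi_j)\big)
\]
\[
\gamma(\theta) = \operatorname{cov}_\theta(\psi, T) = E_\theta\big[(\psi - E_\theta[\psi])(T - E_\theta[T])^{\top}\big] = \big(\operatorname{cov}_\theta(\psi_i, T_j)\big)
\]
where ψ = (ψ1 , . . . , ψk )⊤ . If C(θ) is invertible for each θ ∈ ∆, then
\[
\operatorname{var}_\theta(T) \ge \gamma^{\top}(\theta)\,C^{-1}(\theta)\,\gamma(\theta) \tag{19.15}
\]
where the inequality is in the sense of symmetric matrices.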


Proof. Choosing an arbitrary v ∈ Rp and considering v⊤ g(θ) instead of g(θ) shows that it
suffices to consider the case p = 1.
For any a ∈ Rk , the Cauchy–Schwarz inequality shows that
\[
\operatorname{var}_\theta(T) \ge \frac{\big(\operatorname{cov}_\theta(T, a^{\top}\psi)\big)^2}{\operatorname{var}_\theta(a^{\top}\psi)}. \tag{19.16}
\]
Since covθ (T, a⊤ ψ) = a⊤ γ(θ) and varθ (a⊤ ψ) = a⊤ C(θ)a, we conclude that
\[
\operatorname{var}_\theta(T) \ge \sup_{a\ne 0} \frac{a^{\top}\gamma(\theta)\gamma^{\top}(\theta)a}{a^{\top}C(\theta)a} = \rho(\theta). \tag{19.17}
\]
From the theory of symmetric matrices, we know that ρ equals the largest eigenvalue of
the matrix C −1 γγ ⊤ , which has the same eigenvalues as
\(C^{-1/2}\gamma\gamma^{\top}C^{-1/2} = \big(C^{-1/2}\gamma\big)\big(C^{-1/2}\gamma\big)^{\top}\).
Therefore, ρ = γ ⊤ C −1 γ and (19.15) follows.

If ψ(θ, x) = ∂θ⊤ log(fθ (x)), then (19.15) takes the form
\[
\operatorname{var}_\theta(T) \ge \partial_\theta g(\theta)\,I^{-1}(\theta)\,\partial_\theta^{\top} g(\theta) \tag{19.18}
\]
where I(θ) = Eθ [∂θ⊤ log(fθ )∂θ log(fθ )]. In Statistics, I(θ) is referred to as Fisher’s
information matrix, and (19.18) as the Cramér–Rao information inequality.
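The inequality (19.18) can be checked exactly in a simple case. The sketch below assumes an i.i.d. Bernoulli(p) sample of size n with T the sample mean, so that g(p) = p and ∂g = 1; both the variance of T and the Fisher information are computed by enumerating the number of successes. The model and the function names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import comb

def variance_of_sample_mean(n, p):
    """Exact variance of T = (X_1 + ... + X_n) / n for i.i.d. Bernoulli(p), by enumeration."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(Fraction(k, n) * w for k, w in enumerate(pmf))
    return sum((Fraction(k, n) - mean) ** 2 * w for k, w in enumerate(pmf))

def fisher_information(n, p):
    """I(p) = E[(d/dp log f_p)^2] for the n-sample, by enumeration over the success count."""
    total = Fraction(0)
    for k in range(n + 1):
        weight = comb(n, k) * p ** k * (1 - p) ** (n - k)
        score = Fraction(k) / p - Fraction(n - k) / (1 - p)
        total += score ** 2 * weight
    return total

n, p = 6, Fraction(1, 3)
var_T = variance_of_sample_mean(n, p)
bound = 1 / fisher_information(n, p)  # right-hand side of (19.18) with g(p) = p
print(var_T, bound)
```

In this model the bound is attained, which reflects the fact that the score is an affine function of the sample mean.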

19.9. Exercises
Exercise 19.9.1. Suppose that A = σ({A1 , . . . , An }) where the sets {Aj } form
Pa pairwise
disjoint measurable partition of Ω. Show that for any f ∈ L1 (E), E[f |A ] = nj=1 aj 1Aj ,
E[f 1Aj ]
where aj = P[Aj ] if P[Aj ] > 0, or aj = 0 otherwise.
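Exercise 19.9.2. For any pair of measurable sets A and B, P[A|B] := P[A ∩ B]/P[B] if P[B] > 0
or P[A|B] = 0 otherwise. Let A ⊂ F be a sub–σ–algebra and A ∈ A .
(i) (Bayes’s formula) Suppose that P[B] > 0. Show that
\[ \mathbb{P}[A|B] = \frac{\mathbb{E}\big[\mathbf{1}_A\,\mathbb{E}[\mathbf{1}_B|\mathscr{A}]\big]}{\mathbb{E}\big[\mathbb{E}[\mathbf{1}_B|\mathscr{A}]\big]} \]
(ii) If A is generated by a partition {A1 , . . . , An }, show that
\[ \mathbb{P}[A_k|B] = \frac{\mathbb{P}[B|A_k]\,\mathbb{P}[A_k]}{\sum_{j=1}^n \mathbb{P}[B|A_j]\,\mathbb{P}[A_j]} \]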

Exercise 19.9.3. Let A ⊂ F be a σ–algebra. If f ∈ L1 (P) and σ(f ) is independent from

A , then E[f |A ] = E[f ].
Exercise 19.9.4. Let (Ω, F , µ) be a probability space and G a sub σ–algebra in F . Show
that T : f → µ[f |G ], f ∈ L1 (µ), is a positive contraction and that T ∗ g = µ[g|G ], g ∈ L∞ (µ).

Exercise 19.9.5. Let (T, T ) and (U, U ) be Borel spaces and let µ and ν be probability
measures on (T × U, T ⊗ U ). Assume that µ ≪ ν and that
\[ \frac{d\mu}{d\nu}(t,u) = a(t)\,b(u). \]
Let µT and νT be the marginals on (T, T ) of µ and ν respectively. Similarly, let µU |T
and νU |T be the regular conditional probabilities of U given T with respect to µ and ν
respectively. Show that
(i) µT ≪ νT and
\[ \frac{d\mu_T}{d\nu_T}(t) = a(t)\int_U b(u)\,\nu_{U|T}(du|t) \]

(Hint: Consider g(t, u) = 1A (t)1B (u), compute Eµ [g] and apply disintegration.)

Exercise 19.9.6. Let P = {Pθ,ψ : (θ, ψ) ∈ ∆}, ∆ ⊂ Rk × Rm open, be a family of

exponential type of probability measures on (Ω, F ) with
\[ \frac{d\mathbb{P}_{\theta,\psi}}{d\nu} = c(\theta,\psi)\exp\big(\theta\cdot T(x) + \psi\cdot U(x)\big) \]
where ν is a σ–finite measure on (Ω, F ). Let πk and πm be the projections onto Rk and Rm
respectively and let ∆θ = {ψ : (θ, ψ) ∈ ∆}. Show that:
(a) The law of V (x) = [T (x) U (x)]⊺ is exponential type relative to ν ◦ V −1 .
(b) There is a σ–finite kernel λ from πm (∆) to Rk such that for ψ ∈ πm (∆), the law
P^T_{θ,ψ} of T is of exponential type relative to λ(ψ, ·) and
\[ \mathbb{P}^{T}_{\theta,\psi}(dt) = c(\theta,\psi)\,e^{\theta\cdot t}\,\lambda(\psi,dt) \]
(c) There is a σ–finite kernel µ from Rm to Rk such that for θ ∈ πk (∆), the conditional
distribution P^{T|U}_{θ,ψ} of T given U is of exponential type relative to µ(U, ·) and
\[ \mathbb{P}^{T|U}_{\theta,\psi}(dt|U) = \bar{c}_\theta(y)\,e^{\theta\cdot t}\,\mu(U,dt) \]
Conclude that for θ ∈ πk (∆) fixed, U is a sufficient statistic for {Pθ,ψ : ψ ∈ ∆θ }.

Exercise 19.9.7. If X : (Ω, F ) −→ (S T , S ⊗T ) is measurable and P is a probability

measure on F , show that the family of probability measures µI = P ◦ (pI ◦ X)−1 on S ⊗I ,
where I ⊂ T and I is finite, is projective.

Exercise 19.9.8. Assume Pθ ≪ µ for all θ ∈ ∆, where µ is a σ–finite measure on F , and let
fθ = dPθ /dµ.
(a) Show that the marginal PX of X is absolutely continuous with respect to µ and
has density
\[ m(x) := \frac{d\mathbb{P}_X}{d\mu}(x) = \int_\Delta f_\theta(x)\,\Pi(d\theta). \]
(b) Show that the conditional distribution P[Θ ∈ dθ|X], called the posterior distribution,
exists, is absolutely continuous with respect to Π, and
\[ \frac{\mathbb{P}[\Theta\in d\theta|X]}{d\Pi} = \frac{f_\theta(X)}{m(X)}. \]
(c) If Π ≪ τ for some σ–finite measure τ on (∆, B) and π = dΠ/dτ , show that the
posterior distribution P[Θ ∈ dθ|X] ≪ τ and
\[ \frac{\mathbb{P}[\Theta\in d\theta|X]}{d\tau} = \frac{f_\theta(X)\,\pi(\theta)}{m(X)}. \]
Exercise 19.9.9. Consider the normal distribution N (µ, σ 2 ) where µ is a fixed known
number and σ 2 is unknown. Show that this distribution admits a natural exponential
representation of the form

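\[ f_\theta(x) = \exp\Big(\theta T(x) - \tfrac{1}{2}\log\Big(\frac{\pi}{-\theta}\Big)\Big) \]
where T (x) = (x − µ)2 and θ = −1/(2σ 2 ). Show that the conjugate prior has density w.r.t.
Lebesgue measure on (0, ∞) given by
\[ g_{a,b}(\sigma^2) = \frac{(b/2)^{\frac{a+3}{2}}}{\Gamma\big(\frac{a+3}{2}\big)}\Big(\frac{1}{\sigma^2}\Big)^{\frac{a+3}{2}+1}\exp\Big(-\frac{b}{2}\,\frac{1}{\sigma^2}\Big) \]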
and that its domain of conjugacy is E = {(a, b) : a > 0, b > 0}. This means that the conjugate
prior is distributed as an inverse-gamma Ig((a + 3)/2, b/2).
Chapter 20

Martingales

A martingale {Xt : t ∈ T}, where T ⊂ R, is a family of random variables (the index t
represents time evolution) that do not anticipate the future given the present information.
For example, if Xt is the fortune at time t of a gambler, then a martingale corresponds
to a fair game. Similarly, a submartingale corresponds to betting on a favorable game; a
supermartingale corresponds to betting on an unfavorable game. Typically one considers
time T to be Z+ or [0, ∞).

20.1. Measurability concepts for stochastic processes

Suppose (S, S ) is a Borel space and let T be a non-empty index set, and consider the
product space (S T , S ⊗T ). Kolmogorov’s extension theorem shows that for any projective
family {µI : I ⊂ T , I finite} of probability measures on S ⊗I there exists a unique
probability measure µ on S ⊗T whose marginal on S ⊗I is µI for every finite I ⊂ T .
The identity map on S T is called the canonical S–valued stochastic process in T with
distribution µ; the coordinate evaluation maps Xt (s) = s(t) are the values of the process at
t.
For any probability space (Ω, F , P), a measurable map X̃ : (Ω, F ) −→ (S T , S ⊗T ) is an
S–valued stochastic process in T with distribution P ◦ X̃ −1 ; the map X̃t : ω ↦ (X̃(ω))(t)
is the value of the process at t. An S–valued stochastic process X̃ can also be viewed as a
function X̃ : Ω × T → S.

Definition 20.1.1. Let (Ω, F ) be a measurable space and suppose T ⊂ R. A filtration

{Ft : t ∈ T} is a family of sub–σ–algebras such that Fs ⊂ Ft ⊂ F for every s ≤ t with
s, t ∈ T. A stochastic process X is said to be adapted to the filtration {Ft : t ∈ T} if
Xt : ω 7→ X(ω, t) is Ft –measurable for all t ∈ T.
Example 20.1.2. Let X : Ω × T → (S, S ) be a process. For each t ∈ T, let
FtX = σ(Xs : s ∈ T, s ≤ t).


The filtration {FtX : t ∈ T}, called the natural history of X, is the smallest filtration
with respect to which X is adapted.
Definition 20.1.3. When T = [0, ∞), B̂ = (Ω × [0, ∞), F ⊗ B([0, ∞))) is referred to as the
base space. Suppose (Ω, Ft )t≥0 is a filtered space. A process X : Ω × [0, ∞) → S taking
values in a measurable space (S, S ) is progressively measurable if for each t ≥ 0 the
process defined by X t : (ω, s) ↦ Xt∧s (ω) is Ft ⊗ B([0, ∞)) − S –measurable.
Remark 20.1.4. A set Γ ⊂ Ω × [0, ∞) is progressively measurable if the process 1Γ is
progressively measurable. Thus, Γ is progressively measurable iff Γ ∩ (Ω × [0, t]) ∈ Ft ⊗
B([0, t]) for every t ≥ 0.

Loosely speaking, progressive measurability means that the information on the evolution
of the process X up to time t is contained in Ft .
Theorem 20.1.5. Let X be a stochastic process in a filtered space (Ω, {Ft : t ∈ T}).
(i) If X is progressively measurable, then it is adapted.
Suppose X takes values on a metric space.
(ii) If X is left–continuous and adapted, then X is progressively measurable.
(iii) If X is right–continuous and adapted, then X is progressively measurable.
(iv) If {Xn : n ∈ N} is a sequence of progressively measurable processes converging
pointwise to X, then X is progressively measurable.

Proof. (i) Suppose A ∈ S . For any t ≥ 0 fixed, (X t )−1 (A) ∈ Ft ⊗ B([0, ∞)). As the
t–cross section of (X t )−1 (A) is (Xt )−1 (A), the conclusion follows from Lemma 9.4.1.

(ii) For t fixed, consider the sequence

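\[ Y_n(\omega,s) = \begin{cases} X\big(\omega,\tfrac{k}{2^n}t\big) & \text{if } \tfrac{k}{2^n}t \le s < \tfrac{k+1}{2^n}t,\ k=0,\dots,2^n-1 \\ X(\omega,t) & \text{if } s\ge t \end{cases} \tag{20.1} \]
Each Yn , n ∈ N, is right–continuous and Ft ⊗ B([0, ∞))–measurable. The left–continuity
of X implies that limn Yn (ω, s) = X(ω, s ∧ t) and, by Theorem 3.6.8, X t is Ft ⊗ B([0, ∞))–
measurable. As this holds for each t ≥ 0, X is progressively measurable.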

(iii) For t fixed, consider the sequence

\[ Z_n(\omega,s) = \begin{cases} X\big(\omega,\tfrac{k+1}{2^n}t\big) & \text{if } \tfrac{k}{2^n}t \le s < \tfrac{k+1}{2^n}t,\ k=0,\dots,2^n-1 \\ X(\omega,t) & \text{if } s\ge t \end{cases} \tag{20.2} \]
Each Zn , n ∈ N, is right–continuous and Ft ⊗ B([0, ∞))–measurable. The right continuity
of X implies that limn Zn (ω, s) = X(ω, s ∧ t) and, by Theorem 3.6.8, X t is Ft ⊗ B([0, ∞))–
measurable. As this holds for each t ≥ 0, we have that X is progressively measurable.

(iv) For any t ≥ 0, {(Xn )t : n ∈ N} converges to X t pointwise. The conclusion follows from
Theorem 3.6.8.

A stochastic process X on [0, ∞) with values in a metric space (S, d) is said to be
right continuous with left limits (or càdlàg) if for any ω ∈ Ω, t → X(ω, t) is a right
continuous function, and for any t > 0, lims↗t X(ω, s) =: X(ω, t−) exists. Similarly, X is
said to be left continuous with right limits (or càglàd) if for any ω ∈ Ω, t → X(ω, t)
is a left continuous function, and for any t ≥ 0, lims↘t X(ω, s) =: X(ω, t+) exists.
Given a filtered space (Ω, Ft )t≥0 , the σ–algebra P on Ω×[0, ∞) generated by all the F –
adapted left continuous functions with right limits is called the predictable σ–algebra. The
σ–algebra O on Ω × [0, ∞) generated by all the F –adapted right–continuous functions with
left limits is called the optional σ–algebra. Theorem 20.1.5 implies that any predictable
or optional process is progressively measurable.
Definition 20.1.6. Suppose {Ft : t ∈ [0, ∞)} is a filtration in Ω. Define Ft+ = ⋂s>t Fs .
A filtration Ft is right–continuous if Ft = Ft+ for all t ≥ 0.

Clearly {Ft+ : t ≥ 0} is a filtration and Ft ⊂ Ft+ for all t ≥ 0. Moreover, if Gt = Ft+ ,

then Gt+ = Gt . This follows from
\[ \bigcap_{t<s}\bigcap_{s<u}\mathcal{F}_u = \bigcap_{t<u}\mathcal{F}_u = \mathcal{F}_{t+} \]
If {Ft : t ≥ 0} is a filtration then A∞ = ⋃t≥0 Ft is an algebra of subsets of Ω. Denote
by A∞σ the collection of countable unions of sets in A∞ . A set N ⊂ Ω is called nearly empty if
there exists a set M ∈ A∞σ with M ⊃ N and P[M ] = 0. We will use NP to denote the collection
of all nearly empty sets in Ω.
Roughly speaking, a nearly empty set is either a negligible set that someone with a
long life span would be able to measure at some point t in time, or a set which is a countable
union of such negligible measurable sets.
Definition 20.1.7. Let (Ω, F , P) be a probability space and {Ft ⊂ F : t ∈ T} be a
filtration on Ω. The natural regularization of Ft with respect to P is defined by
FtP = {A ⊂ Ω : A△AP ∈ NP, for some AP ∈ Ft }.
If 𝒫 is a collection of probability measures on (Ω, F ) then the natural 𝒫–regularization
of {Ft : t ∈ T} is defined as
\[ \mathcal{F}^{\mathcal{P}}_t := \bigcap_{\mathbb{P}\in\mathcal{P}}\mathcal{F}^{\mathbb{P}}_t, \qquad t\in\mathbb{T}. \]

FtP contains the completion of Ft and if 𝒫 is the collection of all probability measures
on (Ω, F ) then Ft𝒫 contains the universal completion of Ft .
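Lemma 20.1.8. {FtP : t ∈ T} is a filtration and the restriction of P to FtP is complete for
each t ∈ T. Moreover, Ft ⊂ FtP = σ(Ft ∪ NP) and
\[ \mathcal{F}^{\mathbb{P}}_\infty := \sigma\Big(\bigcup_{t\in\mathbb{T}}\mathcal{F}^{\mathbb{P}}_t\Big) = \sigma(\mathcal{F}_\infty\cup\mathcal{N}_{\mathbb{P}}). \]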

Proof. We first show that FtP is a σ–algebra. Since A△B = Ac △B c , it follows that FtP
is closed under taking complements. For {An } ⊂ FtP, let {AnP } ⊂ Ft be such that An △AnP ⊂
Nn ∈ A∞σ and P[Nn ] = 0. Then,
\[ \Big(\bigcup_n A_n\Big)\triangle\Big(\bigcup_n A^{\mathbb{P}}_n\Big) \subset \bigcup_n \big(A_n\triangle A^{\mathbb{P}}_n\big) \subset \bigcup_n N_n \in \mathcal{A}_{\infty\sigma}. \]
Therefore, ⋃n An ∈ FtP.

By definition, Ft ∪ NP ⊂ FtP; hence σ(Ft ∪ NP) ⊂ FtP. Conversely, if A ∈ FtP, there is

AP ∈ Ft with A△AP ⊂ N ∈ A∞σ and P[N ] = 0. In particular, AP \ A and A \ AP belong
to NP. Since A = (AP \ (AP \ A)) ∪ (A \ AP), we conclude that A ∈ σ(Ft ∪ NP).

To show that FtP is complete it is enough to show that if E ∈ FtP and P[E] = 0 then E ∈ NP.
In such a case, there are E P ∈ Ft and N ∈ A∞σ with P[N ] = 0 such that E P△E ⊂ N , which
is equivalent to E P \ N ⊂ E ⊂ E P ∪ N ∈ A∞σ . As P[E P] = P[E] = 0, it follows that
E ∈ NP. The last statement is evident.
Lemma 20.1.9. The right–continuous augmentation of the P–regularization of {Ft : t ≥ 0}
is the same as the P–regularization of the right–continuous augmentation {Ft+ : t ≥ 0}.

Proof. First notice that ∪t Ft+ = ∪t ∩u>t Fu = ∪t Ft . Therefore the filtrations Ft and
Ft+ have the same nearly empty sets.
Denote by Gt = Ft+ and by Ht = FtP. We want to show that GtP = Ht+ . If A ∈ Ht+ ,
then A ∈ FuP for all u > t. Hence, for each n there is An ∈ Ft+1/n such that A△An ∈ NP.
Let Ã = ⋂n ⋃m≥n Am ; then Ã ∈ Gt and A△Ã ⊂ ⋃n (A△An ) ∈ NP. Thus, Ht+ ⊂ GtP.
Conversely, if A ∈ GtP, then there is A′ ∈ Gt such that A△A′ ∈ NP. Since A′ ∈ Fu for all
u > t, then A ∈ FuP = Hu for all u > t. It follows that A ∈ Ht+ , that is, GtP ⊂ Ht+ .
Definition 20.1.10. Suppose (Ω, F , P) is a probability space. The natural augmentation
of a filtration {Ft : t ∈ T} is defined as {FtP : t ∈ T} if T ⊂ Z+ or {(F P)t+ : t ∈ T}
(equivalently {(Ft+ )P : t ∈ T}) if T = [0, ∞). A filtration satisfies the natural conditions
if it is equal to its natural augmentation.

20.2. Stopping times

Definition 20.2.1. Assume that T is either [0, ∞) or Z+ and let T : Ω → T.
(a) T is a stopping time with respect to {Ft : t ∈ T} if {T ≤ t} ∈ Ft for any t ∈ T.
(b) If T = [0, ∞), then T is an optional time with respect to {Ft : t ≥ 0} if
{T < t} ∈ Ft for any t ∈ [0, ∞).
For any set Γ ⊂ Ω × [0, ∞), the map DΓ : Ω −→ R+ given by
DΓ (ω) = inf{t ≥ 0 : (ω, t) ∈ Γ}
is called the debut time of Γ.

Suppose T and S are stopping and optional times with respect to {Ft : t ∈ T} respectively.
The collections of sets
FT := {A ∈ F : A ∩ {T ≤ t} ∈ Ft , ∀t ∈ T}
and
FS+ := {A ∈ F : A ∩ {S < t} ∈ Ft , ∀t ∈ T}
are sub σ–algebras of F . Indeed, clearly Ω ∈ FT . Since {T ≤ t} ∩ ⋃n An = ⋃n ({T ≤ t} ∩ An ),
it follows that FT is closed under countable unions. If A ∈ FT then, as Ac ∩ {T ≤ t} =
{T ≤ t} \ (A ∩ {T ≤ t}), Ac ∈ FT . Similar arguments show that FS+ is a sub σ–algebra
of F .
Remark 20.2.2. Since {T ≤ t} ∩ {T ≤ s} = {T ≤ t ∧ s} ∈ Ft∧s , T is FT –measurable. If
T ≡ t, show that FT = Ft . A similar argument shows that S is FS+ measurable.

Clearly, any stopping time is an optional time; however, the converse statement depends
upon the continuity properties of the filtration Ft .
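Lemma 20.2.3. Suppose T = [0, ∞). T is an F –optional time iff T is an F+ –stopping
time. In such a case,
\[ \bigcap_{h>0}\mathcal{F}_{T+h} = \mathcal{F}_{T+} = \big\{A\in\mathcal{F}_\infty : A\cap\{T\le t\}\in\mathcal{F}_{t+}\ \text{for all } t\ge 0\big\} \]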

Proof. We only consider the case T = [0, ∞). Notice that

\[ \{T<t\} = \bigcup_n\Big\{T\le t-\frac{t}{n}\Big\}, \qquad \{T\le t\} = \bigcap_n\Big\{T<t+\frac{1}{n}\Big\}. \]

If A ∩ {T < t} ∈ Ft for all t ≥ 0 then A ∩ {T ≤ t} ∈ Ft+1/n for all n ≥ 1, and so
A ∩ {T ≤ t} ∈ Ft+ for all t ≥ 0.

Conversely, if A ∩ {T ≤ t} ∈ Ft+ for all t ≥ 0, then A ∩ {T ≤ t − t/n} ∈ F(t−t/n)+ ⊂ Ft , and
so A ∩ {T < t} ∈ Ft for all t ≥ 0. Therefore,
FT + = {A ∈ F : A ∩ {T ≤ t} ∈ Ft+ , ∀t ≥ 0}

The first statement follows by letting A = Ω. To prove the last statement, observe that
A ∈ ⋂h>0 FT +h iff A ∩ {T + h ≤ t} ∈ Ft for all t ≥ 0 and h > 0. This is equivalent to
A ∩ {T ≤ t} ∈ Ft+h for all t ≥ 0 and h > 0. Hence A ∈ ⋂h>0 FT +h iff A ∩ {T ≤ t} ∈ Ft+
for all t ≥ 0.

A direct consequence of Lemma 20.2.3 is that if {Ft : t ≥ 0} is right–continuous, i.e.,

Ft = Ft+ , then an Ft –optional time is an Ft –stopping time and FT = FT + .
Theorem 20.2.4. Assume X is a progressively measurable process taking values in a metric
space. If T is a stopping time, then
(i) the process X T : (ω, s) 7→ X(ω, T (ω) ∧ s) is progressively measurable.
(ii) XT : ω 7→ X(ω, T (ω)) is FT –measurable.

Proof. (i) For any t ∈ T, the map ΦT,t : (ω, s) ↦ (ω, T (ω) ∧ t ∧ s) is Ft ⊗ B([0, ∞))–
Ft ⊗ B([0, ∞)) measurable. Indeed, for any u ≥ 0
\[ \{T\wedge t\le u\} = \begin{cases} \Omega & \text{if } t\le u \\ \{T\le u\} & \text{if } u<t \end{cases} \]
which means that {T ∧ t ≤ u} ∈ Ft for all u ≥ 0. Hence, for any A ∈ Ft
\[ \Phi^{-1}_{T,t}(A\times[0,u]) = \big(A\times[0,u]\big)\cup\big((A\cap\{T\wedge t\le u\})\times(u,\infty)\big)\in\mathcal{F}_t\otimes\mathcal{B}([0,\infty)) \]

Since (X T )t = X t ◦ ΦT,t , we conclude that X T is progressively measurable.

(ii) By part (i) it follows that XtT is adapted to Ft . Therefore {XT ∈ B} ∩ {T ≤ t} =

{XtT ∈ B} ∩ {T ≤ t} ∈ Ft for all B ∈ B(S) and t ≥ 0.
Lemma 20.2.5. Suppose that X is Ft –adapted. If T is a stopping time taking values on
a countable set, then XT is FT –measurable.

Proof. Let {tn : n ∈ N} be an enumeration of T (Ω). Since {T = tn } ∈ Ftn and X is
adapted,
\[ X_T^{-1}(A)\cap\{T\le t\} = \bigcup_{t_n\le t} X_{t_n}^{-1}(A)\cap\{T=t_n\}\in\mathcal{F}_t \]
for all t ≥ 0.
Theorem 20.2.6. Suppose T , S and {Tn } are stopping times. Then,
(i) S + T , S ∧ T , S ∨ T and supn Tn are stopping times.
(ii) If S ≤ T , then FS ⊂ FT , and S is FT –measurable.
(iii) A ∩ {S ≤ T } ∈ FT for all A ∈ FS .
(iv) In addition, if Ft is right–continuous then inf n Tn is a stopping time.

Proof. We consider only the case T = [0, ∞). The case T = Z+ is left as an exercise.

(i) The conclusion {S + T > t} ∈ Ft for each t ≥ 0 follows directly from the identity
\[ \{S+T>t\} = \{S>t\}\cup\{T>t\}\cup\bigcup_{q\in\mathbb{Q}\cap(0,t]}\big(\{q<T\le t\}\cap\{t-q<S\le t\}\big). \]
For S ∧ T and S ∨ T , note that {S ∧ T ≤ t} = {S ≤ t} ∪ {T ≤ t} and {S ∨ T ≤ t} = {S ≤ t} ∩ {T ≤ t}.
The last statement follows from {supn Tn ≤ t} = ⋂n {Tn ≤ t}.

(ii) Suppose S ≤ T . For any a ≥ 0, {S ≤ a} ∩ {T ≤ t} = {S ≤ a ∧ t} ∩ {T ≤ t} ∈ Ft , and

so S is FT –measurable.
If A ∈ FS , then A ∩ {T ≤ t} = (A ∩ {S ≤ t}) ∩ {T ≤ t} ∈ Ft . Hence FS ⊂ FT .

(iii) Suppose A ∈ FS . The process Zt (ω) = 1A (ω)1[S(ω),∞) (t) is right–continuous and
adapted to Ft and so it is progressively measurable. In particular, ZT = 1A∩{S≤T } is
FT –measurable.
(iv) If {Tn : n ∈ N} is a sequence of optional times, then {inf n Tn < t} = ⋃n {Tn < t} ∈ Ft
for all t ≥ 0. Hence T = inf n Tn is an optional time. By Lemma 20.2.3, if {Ft : t ≥ 0} is
right continuous, then T is a stopping time.
Corollary 20.2.7. Suppose T and S are stopping times. Then,
(i) FS∧T = FS ∩ FT .
(ii) Each of the events {T < S}, {S < T }, {T ≤ S}, {S ≤ T }, and {T = S} belong to
FS∧T .
(iii) {S = T } ∩ FS = {T = S} ∩ FT .

Proof. (i): Clearly FS∧T ⊂ FS ∩ FT . If A ∈ FS ∩ FT , then

A ∩ {S ∧ T ≤ t} = (A ∩ {S ≤ t}) ∪ (A ∩ {T ≤ t}) ∈ Ft
Therefore, FS∧T = FS ∩ FT .
(ii): Theorem 20.2.6 shows that {S ≤ T } ∈ FT and, reversing the roles of S and T , that
B ∩ {T ≤ S} ∈ FS for all B ∈ FT . Hence {S = T } = {S ≤ T } ∩ {T ≤ S} ∈ FS .
Again, reversing the roles of S and T gives {S = T } ∈ FS∧T . It is now obvious that
{T ≤ S}c = {S < T } = {S ≤ T } \ {S = T } ∈ FS∧T , and so on.
(iii): By part (ii), if A ∈ FS , then A ∩ {S ≤ T } ∈ FT ∩ FS = FS∧T . Hence, if A ∈ FS ,
then A ∩ {S = T } = A ∩ {S ≤ T } ∩ {T ≤ S} ∈ FS∧T . Changing the roles of S and T
shows that B ∩ {S = T } = B ∩ {T ≤ S} ∩ {S ≤ T } ∈ FS∧T for all B ∈ FT . Therefore
{S = T } ∩ FT = {S = T } ∩ FS .
Theorem 20.2.8. Suppose Z ∈ L1 (P). Then
1{T ≤S} E[Z|FT ] = 1{T ≤S} E[Z|FS∧T ]
Moreover,
E[E[Z|FT ]|FS ] = E[Z|FS∧T ]

Proof. If B ∈ FT , then B ∩ {T ≤ S} ∈ FS∧T ; hence,

E[1B 1{T ≤S} E[Z|FT ]] = E[1{T ≤S}∩B Z] = E[1B 1{T ≤S} E[Z|FS∧T ]].

If A ∈ FS , then A∩{S < T } = (A∩{S ≤ T })\{S = T } ∈ FS∧T by Corollary 20.2.7[(i),(ii)].

Hence,
E[1A E[Z|FT ]] = E[1A 1{T ≤S} E[Z|FT ]] + E[1A 1{S<T } E[Z|FT ]]
= E[1A 1{T ≤S} E[Z|FS∧T ]] + E[1A 1{S<T } Z]
= E[1A 1{T ≤S} E[Z|FS∧T ]] + E[1A 1{S<T } E[Z|FT ∧S ]]
= E[1A E[Z|FS∧T ]];
that is, E[E[Z|FT ]|FS ] = E[Z|FS∧T ].

The next result shows that a stopping time can be approximated from above by stopping
times that take countably many values.

Lemma 20.2.9. Suppose T = [0, ∞). Let T be a stopping time w.r.t. some filtration Ft .
Let
Tn = 2−n ([2n T ] + 1)1{T <∞} + ∞1{T =∞} .
Then {Tn } is a sequence of stopping times and Tn ↘ T . Moreover, if Ft is right–continuous,
then FT = ⋂n FTn .

Proof. If k = [2n t], then {Tn ≤ t} = {T < k/2n } ∈ Fk/2n ⊂ Ft . As dyadic numbers are
dense in R, Tn ↘ T .
Clearly FT ⊂ ⋂n FTn . Suppose A ∈ ⋂n FTn . From
\[ A\cap\{T<t\} = \bigcup_n\big(A\cap\{T_n<t\}\big)\in\mathcal{F}_t \]
we obtain A ∈ FT + . Therefore, if Ft is right–continuous, we conclude that FT = ⋂n FTn .
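The dyadic approximation in Lemma 20.2.9 can be illustrated numerically. The sketch below evaluates Tn = 2^{-n}([2^n T] + 1) for a single value of T using exact rational arithmetic; the function name `dyadic_upper` is a hypothetical helper introduced only for this illustration.

```python
import math
from fractions import Fraction


def dyadic_upper(t, n):
    """Return 2^{-n} (floor(2^n t) + 1) for finite t; infinity is left unchanged."""
    if t == math.inf:
        return math.inf
    return Fraction(math.floor(2**n * t) + 1, 2**n)


# One sample value of the stopping time, T(omega) = 1/3.
t = Fraction(1, 3)
approximations = [dyadic_upper(t, n) for n in range(10)]
```

For each fixed value of T the approximations lie strictly above T, are non-increasing in n, and differ from T by at most 2^{-n}, which is the pointwise content of Tn ↘ T.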

Optional times are often constructed recursively in terms of shifts on the underlying
path space. Recall that Xt ◦ θs (ω) = Xt+s (ω) := ω(t + s) for all s, t ≥ 0. For any pair of
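optional times S and T on the canonical space consider the random time U = S + T ◦ θS
defined as
\[ U(\omega) = \begin{cases} +\infty & \text{if } S(\omega)=\infty \\ s+T(\theta_s\omega) & \text{if } S(\omega)=s<\infty \end{cases} \]
Then
\[ X_T\circ\theta_S(\omega) := X_{S(\omega)+T(\theta_{S(\omega)}\omega)}(\omega) = \omega\big(S(\omega)+T(\theta_{S(\omega)}\omega)\big) \]
if S(ω) < ∞.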
Theorem 20.2.10. (compound optional times) For any metric space (S, d), let S and T
be optional times on the canonical space S Z+ , C([0, ∞), S) or D([0, ∞), S) endowed with the
right–continuous filtration Ft+ = σ(Xs : s ≤ t)+ . Then U = S + T ◦ θS is an optional time.

Proof. For S Z+ the proof is simple and is left as an exercise. For the canonical spaces
C([0, ∞), S) or D([0, ∞), S), the process X is progressively measurable. As S ∧ n + T ◦ θS∧n ↗
U , we may assume without loss of generality that S is bounded. By Theorem 20.2.4(ii),
XS+t ∈ F(S+t)+ for all t ≥ 0. Therefore, for any set A = {Xs ∈ B} with B ∈ B(S) and
0 ≤ s ≤ t we have θS−1 (A) ∈ F(S+t)+ . The set of all such sets A generate the history
σ–algebra Ft := σ(Xs : s ≤ t) and thus,
θS−1 (Ft ) ⊂ F(S+t)+
For t ≥ 0 fixed,
\[ \{U<t\} = \bigcup_{q\in\mathbb{Q}\cap(0,t)}\{S<q\}\cap\{T\circ\theta_S<t-q\} \]
If 0 < q < t then {T < t − q} ∈ Ft−q and thus, θS−1 ({T < t − q}) ∈ F(S+t−q)+ . By
Lemma 20.2.3
{S < q, T ◦ θS < t − q} = {S + t − q < t} ∩ θS−1 ({T < t − q}) ∈ Ft .
Therefore, {U < t} ∈ Ft and so U is an Ft –optional time.

Suppose (E, E) is a measurable space and let B ∈ E. If X is an E–valued adapted

process, then the first hitting time of B by X, denoted by TB , is defined as the debut time
of X −1 (B), that is,
TB (ω) = inf{t ≥ 0 : Xt (ω) ∈ B} = DX −1 (B) (ω)
In general, the debut time of a set Γ ⊂ Ω × [0, ∞) is not a stopping time. The following
result gives sufficient regularity conditions so that a debut time is a stopping time.
Theorem 20.2.11. Suppose that the filtration Ft satisfies the natural conditions. If Γ ⊂
Ω × [0, ∞) is progressively measurable, then DΓ is a stopping time. Consequently, if X is a
progressive measurable E–valued process and B ∈ E, then TB is a stopping time.

Proof. By the assumption on {Ft : t ≥ 0}, it is enough to show that {DΓ < t} ∈ Ft for
all t ≥ 0. Since Γ is progressively measurable, Γt := Γ ∩ (Ω × [0, t)) ∈ Ft ⊗ B([0, t])
for all t ≥ 0. Notice that if pΩ is the projection (ω, s) ↦ ω, then
{DΓ < t} = pΩ (Γt )
The measurable projection theorem 3.8.6 shows that pΩ (Γt ) is universally measurable with
respect to Ft . Since Ft is complete, then pΩ (Γt ) ∈ Ft .

20.3. Martingales and Stopping times

Definition 20.3.1. Suppose that X = {Xt : t ∈ T} ⊂ L1 (P) is adapted with respect to a
filtration {Ft }.
(i) X is called a martingale if for any s, t ∈ T with s ≤ t,
(20.3) Xs = E[Xt |Fs ]
(ii) X is called a submartingale if for any s, t ∈ T with s ≤ t
(20.4) Xs ≤ E[Xt |Fs ]
(iii) X is called a supermartingale if for any s, t ∈ T with s ≤ t
(20.5) Xs ≥ E[Xt |Fs ]
When no mention of the specific filtration is made, it is assumed that Ft = σ(Xs :
s ∈ T, s ≤ t).
Example 20.3.2. Suppose that {ξn }n is a sequence of integrable independent random
variables and let S0 = 0, Sn = ξ1 + · · · + ξn for n ≥ 1. Consider the filtration F0 = {∅, Ω},
Fn = σ(ξj : 1 ≤ j ≤ n) for n ≥ 1. Then
(i) If E[ξn ] = 0 for all n, then Sn is a martingale with respect to Fn ;
(ii) If E[ξn ] ≥ 0 for all n, then Sn is a submartingale;
(iii) If E[ξn ] ≤ 0 for all n, then Sn is a supermartingale.
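The three cases of Example 20.3.2 can be checked exactly for steps taking finitely many values. The Python sketch below enumerates every possible history of the first n steps and computes E[S_{n+1} | F_n] − S_n on each; the two step laws and the helper names are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def conditional_mean_next(prefix, step_law):
    """E[S_{n+1} | xi_1, ..., xi_n = prefix] for an independent next step."""
    s_n = sum(prefix)
    return sum(p * (s_n + x) for x, p in step_law.items())


def conditional_drifts(step_law, n):
    """Set of values of E[S_{n+1} | F_n] - S_n over all histories of length n."""
    return {
        conditional_mean_next(prefix, step_law) - sum(prefix)
        for prefix in product(step_law, repeat=n)
    }


# Hypothetical step laws: a fair coin and a coin biased towards +1.
fair = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
favourable = {-1: Fraction(1, 3), 1: Fraction(2, 3)}
```

The drift is identically zero for the fair law (martingale) and identically E[ξ] = 1/3 for the favourable law (submartingale), independently of the history.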
Remark 20.3.3. The term martingale is associated with that of harmonic functions. Recall
from calculus the Laplace operator
\[ \triangle = \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}. \]
A function f : D ⊂ Rd → R is
harmonic if △f = 0; subharmonic if △f ≥ 0; superharmonic if △f ≤ 0. Stokes’s
theorem gives another characterization of such functions. Let us denote by λd the Lebesgue
measure on Rd .
(a) f is harmonic in D iff f (x) = λd (B(0; r))−1 ∫B(x;r) f (y) dy for all x ∈ D and r > 0 such
that B(x, r) ⊂ D;
(b) f is subharmonic in D iff f (x) ≤ λd (B(0; r))−1 ∫B(x;r) f (y) dy for all x ∈ D and r > 0
such that B(x, r) ⊂ D;
(c) f is superharmonic in D iff f (x) ≥ λd (B(0; r))−1 ∫B(x;r) f (y) dy for all x ∈ D and r > 0
such that B(x, r) ⊂ D.
If {ξn } is a sequence of i.i.d. random variables uniformly distributed in the ball B(0; 1) and
f is harmonic on Rd , then f (Sn ) is a martingale. Similarly, if f is sub(super)–harmonic,
then f (Sn ) is a sub(super)–martingale.

Proof.
\[ \mathbb{E}[f(S_{n+1})|\mathcal{F}_n] = \mathbb{E}[f(S_n+\xi_{n+1})|\mathcal{F}_n] = \frac{1}{\lambda_d(B(0;1))}\int_{B(0;1)} f(S_n+y)\,dy = \frac{1}{\lambda_d(B(0;1))}\int_{B(S_n;1)} f(y)\,dy = f(S_n) \]

Let T ⊂ R+ and suppose {Xt , Ft : t ∈ T} is an adapted process. A function H on Ω × T
is called elementary if it is of the form
\[ H_t(\omega) = h_0(\omega)\mathbf{1}_{\{0\}}(t) + \sum_{j=1}^N h_{t_j}(\omega)\mathbf{1}_{(t_j,t_{j+1}]}(t) \]
where {0 = t0 = t1 < . . . < tN +1 } ⊂ T, {htn : 0 ≤ n ≤ N } are bounded functions such that
htn ∈ Ftn . We define the function H · X
\[ (H\cdot X)(\omega) = h_0(\omega)X_0 + \sum_{j=1}^N h_{t_j}(\omega)\big(X_{t_{j+1}}(\omega)-X_{t_j}(\omega)\big) \tag{20.6} \]
Observe that for any u ∈ T, the function H(t, ω)1[0,u] (t) is also elementary. Hence, we can
define a process (H · X)u by letting
(H · X)u = (H1[0,u] ) · X
It is straightforward to check that {(H · X)u , Fu : u ∈ T} is an adapted process.
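For a path observed on the integer grid tj = j, the sum in (20.6) becomes a finite discrete transform. The Python sketch below implements it with the h0 X0 term omitted (that is, assuming h0 = 0) and evaluates it for the indicator of a time interval (s, t]; the grid, the sample path and the helper names are assumptions made only for this illustration.

```python
def transform(h, x):
    """Return [(H.X)_u for u = 0, ..., len(x) - 1], where
    (H.X)_u = sum over j < u of h[j] * (x[j + 1] - x[j]).
    The h_0 X_0 term of (20.6) is omitted, i.e. h_0 = 0 is assumed."""
    out = [0]
    for j in range(len(x) - 1):
        out.append(out[-1] + h[j] * (x[j + 1] - x[j]))
    return out


def indicator_between(s, t, n):
    """h[j] = 1 exactly when the grid interval (j, j + 1] lies inside (s, t]."""
    return [1 if s <= j < t else 0 for j in range(n)]


# A sample path and two deterministic times s <= t on the grid.
x = [0, 1, 0, -1, -2, -1, 0, 1]
s, t = 2, 5
h = indicator_between(s, t, len(x) - 1)
hx = transform(h, x)
```

With this choice of H the transform reduces to the increment of the path stopped at t minus the path stopped at s, which is the identity used in the proof of Theorem 20.3.5.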

Lemma 20.3.4. Let (Ω, {Fu : u ∈ T}, F , P) be a filtered probability space where T ⊂ R+ ,
Fu ⊂ F for all u ∈ T, and let H be an elementary function.
(i) If {Xu , Fu : u ∈ T} is a martingale, then so is {(H · X)u , Fu : u ∈ T}.
(ii) If {Xu , Fu : u ∈ T} is a supermartingale and H ≥ 0, then so is {(H · X)u , Fu : u ∈ T}.
(iii) If {Xu , Fu : u ∈ T} is a submartingale and H ≥ 0, then {(H · X)u , Fu : u ∈ T} is a submartingale.

Proof. Let s ≤ u and let A ∈ Fs . As

H(t, ω)1[0,u] (t) = H(t, ω)1[0,s] (t) + H(t, ω)1(s,u] (t),
we have that
1A (ω)(H · X)u (ω) = 1A (ω)(H · X)s (ω) + 1A (ω)((H1(s,u] ) · X)(ω).
Hence
(20.7) E[1A (H · X)u ] = E[1A (H · X)s ] + E[1A ((H1(s,u] ) · X)].
The second term on the right hand side is the sum of terms of the form
E[(Xr − Xt )1A g] g ∈ Ft , s ≤ t < r ≤ u.
If X is a martingale, then
E[(Xr − Xt )1A g] = E[1A g E[Xr − Xt |Ft ]] = 0.
If X is a submartingale and H ≥ 0, then g ≥ 0 and
E[(Xr − Xt )1A g] = E[1A g E[Xr − Xt |Ft ]] ≥ 0.
If X is a supermartingale and H ≥ 0, then g ≥ 0 and
E[(Xr − Xt )1A g] = E[1A g E[Xr − Xt |Ft ]] ≤ 0.
The desired results follow readily from (20.7).
Theorem 20.3.5. Suppose {Xt , Ft : t ∈ T ⊂ R+ } is a martingale (submartingale, super-
martingale). Let ∞ ∈ R+ such that u < ∞ for all u ∈ T. If S and T , S ≤ T , are stopping
times taking values on a discrete countable subset of T ∪ {∞} then,
(i) X T − X S is a martingale (submartingale, supermartingale).
(ii) If T (Ω) is finite then E[XT |FS ] − XS = 0 (≥ 0, ≤ 0).

Proof. Let H(s, ω) = 1(S(ω),T (ω)] (s). Then, for any u ∈ T
H(s, ω)1[0,u] (s) = (1(0,T (ω)] (s) − 1(0,S(ω)] (s)) 1[0,u] (s) = 1(S∧u,T ∧u] (s).
Let V = {0 ≤ t1 < . . . < tN ≤ u} be the set of values of S ∧ u and T ∧ u and set tN +1 = u.
For 1 ≤ k ≤ N , tk < s ≤ tk+1 and S ∧ u < s ≤ T ∧ u iff S ∧ u ≤ tk and T ∧ u ≥ tk+1 , or
equivalently, iff S ∧ u ≤ tk < T ∧ u. Therefore
\[ H(s,\omega)\mathbf{1}_{[0,u]}(s) = \sum_{k=1}^N \mathbf{1}_{\{S\wedge u\le t_k<T\wedge u\}}(\omega)\,\mathbf{1}_{(t_k,t_{k+1}]}(s) \]
is a nonnegative elementary function and
\[ (H\cdot X)_u = \sum_{k=1}^N \mathbf{1}_{[S(\omega)\wedge u,\,T(\omega)\wedge u)}(t_k)\big(X_{t_{k+1}}-X_{t_k}\big) = X_{T\wedge u}-X_{S\wedge u} \]
(i) Lemma 20.3.4 implies that XT ∧u − XS∧u is a martingale (submartingale, supermartin-
gale) whenever X is a martingale (submartingale, supermartingale).
(ii) For any A ∈ FS it is easy to check that S A := 1A S + ∞ · 1Ac is a stopping time (see
Exercise 20.8.5). If T (Ω) is finite so is S(Ω); hence, XS , XT ∈ L1 (P). Setting p := max T (Ω)
we have that 1A 1(S,T ] = 1(S A ∧p,T A ∧p] and, by part (i),
((1A H) · X)u = XT A ∧p∧u − XS A ∧p∧u = 1A (XT ∧u − XS∧u ).
From Lemma 20.2.5 XS ∈ FS ⊂ FT ; therefore,
0 = E[((1A H) · X)0 ] = (≤, ≥) E[((1A H) · X)p ] = E[XT A ∧p − XS A ∧p ]
= E[1A (XT − XS )] = E[1A (E[XT |FS ] − XS )]
whenever X is a martingale (submartingale, supermartingale).
Theorem 20.3.6. Suppose that {Xt , Ft : t ∈ T ⊂ R+ } is a martingale. Let T and S be
stopping times taking values in a countable subset of T, and suppose that T ≤ u for some u ∈ T. Then
E[XT |FS ] = XT ∧S .

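Proof. Let {tn : n ∈ N} = T (Ω) so 0 ≤ tn ≤ u for all n. Since
\[ \mathbb{E}[|X_T|] = \mathbb{E}\Big[\sum_{t_n\le u}\mathbf{1}_{\{T=t_n\}}|X_T|\Big] = \sum_{t_n\le u}\mathbb{E}\big[\mathbf{1}_{\{T=t_n\}}|X_{t_n}|\big] \le \sum_{t_n\le u}\mathbb{E}\big[\mathbf{1}_{\{T=t_n\}}\mathbb{E}[|X_u|\,|\mathcal{F}_{t_n}]\big] = \mathbb{E}[|X_u|] < \infty, \]
XT ∈ L1 (P). We claim that E[Xu |FT ] = XT . Indeed, let A ∈ FT . Since A ∩ {T = tn } ∈
Ftn , it follows that
\[ \mathbb{E}[\mathbf{1}_A X_u] = \mathbb{E}\Big[\sum_{t_n\le u}\mathbf{1}_{A\cap\{T=t_n\}}X_u\Big] = \sum_{t_n\le u}\mathbb{E}\big[\mathbf{1}_{A\cap\{T=t_n\}}X_{t_n}\big] = \sum_n\mathbb{E}\big[\mathbf{1}_{A\cap\{T=t_n\}}X_T\big] = \mathbb{E}[\mathbf{1}_A X_T] \]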
Therefore, from Theorem 20.2.8 we conclude that
E[XT |FS ] = E[E[Xu |FT ]|FS ] = E[Xu |FS∧T ] = XS∧T

20.4. Martingale convergence theorem

20.4.1. Doob’s up-crossing. In this section we study the path regularity of martingales.
Let a, b be a pair of real numbers such that a < b. A stochastic process X indexed by
T ⊂ R has an up-crossing from a to b in the time interval [s, u] ∩ T if there is a pair of
instants s ≤ t1 < t2 ≤ u in T such that Xt1 ≤ a and Xt2 ≥ b. Suppose D is a countable
dense set in T and let U^{[a,b]}_{[s,u]∩D} be the number of up-crossings of X from a to b over the time
interval [s, u] ∩ D.
Theorem 20.4.1. (Doob’s up-crossing theorem) If {Xt , Ft : t ∈ T} is a submartingale,
then
\[ \mathbb{E}\big[U^{[a,b]}_{[s,u]\cap D}\big] \le \frac{1}{b-a}\big(\mathbb{E}[X_u^+]+|a|\big) \le \frac{1}{b-a}\big(\mathbb{E}[|X_u|]+|a|\big). \tag{20.8} \]
Proof. By translating the origin to s, we may assume without loss of generality that 0 =
s < u. Given a finite sequence SN = {s1 < . . . < sN } ⊂ [0, u] ∩ D, let
T0 = min{t ∈ SN : Xt ≤ a} ∧ u
T2k−1 = min{SN ∋ t > T2k−2 : Xt ≥ b} ∧ u
T2k = min{SN ∋ t > T2k−1 : Xt ≤ a} ∧ u
for 1 ≤ k ≤ [N/2] + 1. It is easy to check that each Tj is a stopping time. Let U^{[a,b]}_N denote
the number of up-crossings from a to b of X in SN . It is clear by definition that U^{[a,b]}_N ∈ Fu .
Observe that Zt = a + (Xt − a)+ is a submartingale that has the same up-crossings in SN
as Xt . Consider the elementary process
\[ H = \sum_{k=0}^{[N/2]+1}\mathbf{1}_{(T_{2k},T_{2k+1}]}. \]
If U^{[a,b]}_N = m then m ≤ [N/2] + 1, each of the first m terms of H · Z contributes at least (b − a),
the (m + 1)–th term is nonnegative (it is at most (Xu − a)+ ), and the remaining terms are all zero;
therefore,
\[ (b-a)\,U^{[a,b]}_N \le (H\cdot Z)_u. \]
After taking expectations we obtain
\[ \mathbb{E}\big[U^{[a,b]}_N\big] \le \frac{1}{b-a}\,\mathbb{E}[(H\cdot Z)_u]. \]
If K = 1(0,u] − H, then
Zu − Z0 = 1(0,u] · Z = (H · Z)u + (K · Z)u .
As K is a nonnegative elementary process, Lemma 20.3.4(iii) implies that
E[Zu − Z0 ] = E[(H · Z)u ] + E[(K · Z)u ]
≥ E[(H · Z)u ] + E[(K · Z)0 ] = E[(H · Z)u ].
Therefore
\[ \mathbb{E}\big[U^{[a,b]}_N\big] \le \frac{1}{b-a}\big(\mathbb{E}[(X_u-a)^+]-\mathbb{E}[(X_0-a)^+]\big) \le \frac{1}{b-a}\,\mathbb{E}[(X_u-a)^+]. \]
The estimate (20.8) follows from monotone convergence after taking supremum over all
finite sets S ⊂ D ∩ [0, u].
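The quantity bounded in (20.8) is easy to compute for a path observed at finitely many times. The following Python sketch counts completed up-crossings of [a, b] by a finite sequence, in the spirit of the alternating times T0, T1, T2, . . . used in the proof; the function name and the sample path are illustrative assumptions.

```python
def count_upcrossings(path, a, b):
    """Number of completed up-crossings from a to b by the finite sequence `path`."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once the path has visited (-inf, a] since the last up-crossing
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            count += 1
            below = False
    return count


sample_path = [0, 2, -1, 3, 1, 0, 5]
```

Widening the band can only decrease the count, and a path that never returns to level b after dropping to a or below has no up-crossings at all.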

Remark 20.4.2. If T is an interval of the form [u, v], Doob’s up-crossing Theorem implies
that
\[ \mathrm{Osc}_{[u,v]\cap D} := \bigcup_{a,b\in\mathbb{Q},\,a<b}\Big[U^{[a,b]}_{[u,v]\cap D}=\infty\Big] \tag{20.9} \]
is nearly empty. Therefore, the limits
\[ X_{u+} = \lim_{D\ni q\searrow u}X_q, \qquad X_{v-} = \lim_{D\ni q\nearrow v}X_q \]
exist a.s. in R.

20.4.2. Doob’s regularity and convergence theorems.

Theorem 20.4.3. (martingale convergence theorem) Let {Xu , Fu : u ∈ T} be a submartin-
gale over T = [0, ∞) or T = Z+ . Let D be a countable dense subset of T. If supu E[Xu+ ] < ∞,
then
X∞ := lim Xq
D∋q→∞
exists a.s. and X∞ ∈ L1 .
Proof. Let D be a countable dense subset of T and let U^{[a,b]}_D denote the number of up-crossings
from a to b of X over the range D. Then U^{[a,b]}_{[0,n]∩D} ↗ U^{[a,b]}_D as n → ∞. By monotone
convergence and (20.8),
\[ \mathbb{E}\big[U^{[a,b]}_D\big] = \lim_n\mathbb{E}\big[U^{[a,b]}_{[0,n]\cap D}\big] \le \frac{\sup_n\mathbb{E}[X_n^+]+|a|}{b-a} < \infty. \]
Therefore OscD := ⋃a,b∈Q, a<b {U^{[a,b]}_D = ∞} is nearly empty. Consequently, if Ω0 = Ω \ OscD ,
then X∞ := limD∋q→∞ Xq exists on Ω0 as an element of the extended real line. By Fatou’s Lemma,
\[ \mathbb{E}[X_\infty^+] \le \liminf_{D\ni q\to\infty}\mathbb{E}[X_q^+] \le \sup_u\mathbb{E}[X_u^+] < \infty \]
\[ \mathbb{E}[X_\infty^-] \le \liminf_{D\ni q\to\infty}\big(\mathbb{E}[X_q^+]-\mathbb{E}[X_q]\big) \le \sup_u\mathbb{E}[X_u^+]-\mathbb{E}[X_0] < \infty. \]
Hence X∞ ∈ L1 and X∞ is finite a.e. in Ω0 .
Corollary 20.4.4. Suppose that X is a supermartingale and X ≥ 0 on Ω × T. Then
limD∋q→∞ Xq = X∞ exists a.e. and X∞ ∈ L1 . Furthermore, {Xt : t ∈ T ∪ {∞}} is a
supermartingale with respect to {Ft : t ∈ T ∪ {∞}}.

Proof. The process Yu = −Xu is a submartingale and sup E[Yu+ ] = 0. The conclusion
follows from the martingale convergence theorem 20.4.3. For the last statement assume
A ∈ Ft , t < ∞. By Fatou’s lemma, for any sequence {tn } ⊂ D with tn ↗ ∞
\[ \int_A X_\infty\,d\mathbb{P} \le \sup_{t_n}\inf_{t_m>t_n}\int_A X_{t_m}\,d\mathbb{P} \le \int_A X_t\,d\mathbb{P} \]
since {Xs : s ∈ T} is a supermartingale. Therefore E[X∞ |Ft ] ≤ Xt .

Theorem 20.4.5. Suppose T = Z+ or T = [0, ∞). X := {Xt , Ft : t ∈ T} is a uniformly
integrable martingale iff there exists an integrable F∞ = σ(⋃t Ft )–measurable function X∞ such that
Xt = E[X∞ |Ft ]. In either case, limt→∞ Xt = X∞ a.s. and in L1 .

Proof. Sufficiency is a direct consequence of Theorem 19.1.5.

To prove necessity, let T ∋ tn ր ∞. As supt E[|Xt |] < ∞, the martingale convergence
theorem 20.4.3 implies that Xtn converges a.s to a F∞ –measurable limit X∞ ∈ L1 . By
Theorem 8.7.5, uniform integrability implies that kXtn − X∞ k1 → 0. Given t ∈ T, choose
N large enough so that tn ≥ t if n > N . For any A ∈ Ft ,
Z Z Z
Xt dP = Xtn dP → X∞ dP.
A A A
Therefore, Xt = E[X∞ |Ft ] for all t ∈ T ∪ {∞}.

A direct consequence of Theorem 20.4.5 is:

Corollary 20.4.6. (Lévy’s 0-1 law) If A ∈ F∞ then E[1A |Ft ] → 1A a.s. and in L1 .

A more technical application of Theorem 20.4.5 is:

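Theorem 20.4.7. (Dominated convergence for conditional expectation) Suppose $X_n\to X$
a.s. and $|X_n|\le Z$ for all $n$ where $Z\in L_1(\mathbb{P})$. If $\mathcal{F}_n\nearrow\mathcal{F}_\infty$ then
$$\mathbb{E}[X_n|\mathcal{F}_n]\longrightarrow\mathbb{E}[X|\mathcal{F}_\infty]\quad\text{a.s.}$$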

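Proof. Let $W_N=\sup\{|X_n-X_m| : n,m\ge N\}$. Then $0\le W_N\le2Z$ and $W_N\in L_1$ for all
$N$. By Theorem 20.4.5, for each $N$
$$\limsup_n\mathbb{E}\big[|X_n-X|\,\big|\,\mathcal{F}_n\big]\le\lim_n\mathbb{E}[W_N|\mathcal{F}_n]=\mathbb{E}[W_N|\mathcal{F}_\infty].$$
As $W_N\searrow0$ as $N\to\infty$, Lemma 19.1.2(f) implies that $\mathbb{E}[W_N|\mathcal{F}_\infty]\searrow0$ a.s. Therefore
$$\big|\mathbb{E}[X_n|\mathcal{F}_n]-\mathbb{E}[X|\mathcal{F}_n]\big|\le\mathbb{E}\big[|X_n-X|\,\big|\,\mathcal{F}_n\big]\longrightarrow0$$
a.s. as $n\to\infty$. A second application of Theorem 20.4.5 shows that $\mathbb{E}[X|\mathcal{F}_n]\to\mathbb{E}[X|\mathcal{F}_\infty]$.
The desired result follows by the triangle inequality.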
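Theorem 20.4.8. (Reverse martingale theorem) Suppose $\mathbb{T}=\mathbb{Z}_-$ or $\mathbb{T}=(-\infty,0]$. If
$\{X_t,\mathcal{F}_t : t\in\mathbb{T}\}$ is a martingale, then there exists a function $X_{-\infty}\in\mathcal{F}_{-\infty}:=\bigcap_{t\in\mathbb{T}}\mathcal{F}_t$ to
which $X_t$ converges as $t\to-\infty$ a.s. and in $L_1$. Moreover, $X_{-\infty}=\mathbb{E}[X_0|\mathcal{F}_{-\infty}]$.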

Proof. Doob’s up–crossing theorem shows that
$$\mathbb{E}\big[U^{[a,b]}_{D\cap[t,0]}\big]\le\frac{1}{b-a}\big(\mathbb{E}[X_0^+]+|a|\big)$$
for any rational pair $a<b$, and any dense subset $D$ of $\mathbb{T}$. Since $U^{[a,b]}_{D\cap[n,0]}\nearrow U^{[a,b]}_D$ as $n\searrow-\infty$,
by monotone convergence we conclude that $\mathbb{E}[U^{[a,b]}_D]<\infty$; thus, $\mathrm{Osc}_D=\bigcup_{a,b\in\mathbb{Q},\,a<b}\{U^{[a,b]}_D=\infty\}$
is nearly empty. As $X$ is uniformly integrable, $X_t$ converges a.s. and in $L_1$ to a function

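$X_{-\infty}$ as $t\searrow-\infty$. It is readily seen that $X_{-\infty}\in\mathcal{F}_{-\infty}$. If $A\in\mathcal{F}_{-\infty}$, then as $\mathcal{F}_{-\infty}\subset\mathcal{F}_q$
for all $q\le0$,
$$\int_AX_0\,d\mathbb{P}=\int_AX_q\,d\mathbb{P}\xrightarrow{q\to-\infty}\int_AX_{-\infty}\,d\mathbb{P}.$$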

Therefore X−∞ = E[X0 |F−∞ ].

Theorem 20.4.9. (Law of large numbers) Let $\{X_n\}\subset L_1(\mathbb{P})$ be an i.i.d. sequence of
random variables. Let $S_n=\sum_{k=1}^nX_k$. Then,
$$\frac1nS_n\to\mathbb{E}[X_1]\tag{20.10}$$
$\mathbb{P}$–a.s. and in $L_1(\mathbb{P})$.

Proof. For each $n\ge1$, define $\mathcal{F}_{-n}:=\sigma(S_k : k\ge n)$ and let $M_{-n}=\frac1nS_n$. Since $\{X_k\}$ is
an i.i.d. sequence, it follows by symmetry that
$$\mathbb{E}[X_k|\mathcal{F}_{-n}]=\frac1nS_n\quad\text{for all }1\le k\le n.$$
Consequently, $\mathbb{E}[M_{-n}|\mathcal{F}_{-n-1}]=M_{-n-1}$. Hence, $\{M_{-n} : n\in\mathbb{N}\}$ is a backwards martingale
and by Theorem 20.4.8, $M_{-n}$ converges $\mathbb{P}$–a.s. and in $L_1(\mathbb{P})$ to $M=\mathbb{E}[M_{-1}|\mathcal{F}_{-\infty}]$.

For each $k\in\mathbb{N}$, $M=\lim_{n\to\infty}\frac{S_n-S_k}{n}$. Hence $M$ is measurable with respect to
$\mathcal{T}_k=\sigma(X_p : p>k)$, which means that $M$ is measurable with respect to the tail σ–algebra $\mathcal{T}=\bigcap_k\mathcal{T}_k$. By the Kolmogorov 0–1 law, we have that $M=\mathbb{E}[M]$ $\mathbb{P}$–a.s. The $L_1$ convergence
of $M_{-n}$ shows that $\mathbb{E}[M]=\mathbb{E}[X_1]$.

Theorem 20.4.10. (Path regularity of martingales.) Suppose that $\mathcal{F}_t$, $t\ge0$, is a right–continuous
filtration and let $\{X_t,\mathcal{F}_t\}$ be a martingale. Then, $\mathrm{Osc}:=\bigcup_n\mathrm{Osc}_{\mathbb{Q}\cap[0,n]}$ is nearly
empty. Let $\Omega_0=\Omega\setminus\mathrm{Osc}$ and for each $t\ge0$ define
$$X'_t(\omega)=\begin{cases}\lim_{\mathbb{Q}\ni q\searrow t}X_q(\omega)&\omega\in\Omega_0\\0&\text{otherwise}\end{cases}\tag{20.11}$$
Then, $X'$ is a right–continuous with left limits, $\mathcal{F}^{\mathbb{P}}_t$–adapted modification of $X$. Any other
such modification of $X$ is indistinguishable from $X'$.

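Proof. As $X$ is a martingale, $U^{[a,b]}_{\mathbb{Q}\cap[0,n]}$ is integrable for each rational pair $a<b$ and $n\in\mathbb{Z}_+$.
Thus $\mathrm{Osc}$ is nearly empty. As $\{X_u : u\in[t,t+1]\cap\mathbb{Q}\}$ is a uniformly integrable martingale,
$X_q$ converges a.s. and in $L_1$ to $X'_t$ as $\mathbb{Q}\ni q\searrow t$. For all $A\in\mathcal{F}_t$
$$\int_AX_t\,d\mathbb{P}=\int_AX_q\,d\mathbb{P}\xrightarrow{\mathbb{Q}\ni q\searrow t}\int_AX'_t\,d\mathbb{P}.$$
Therefore, as $X'_t$ and $X_t$ are $\mathcal{F}^{\mathbb{P}}_{t+}=\mathcal{F}^{\mathbb{P}}_t$ measurable, we conclude that $X'_t=X_t$ a.s.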

Suppose that $Y$ is another right–continuous with left–limits modification of $X$. Let $d(x,y)=|\arctan x-\arctan y|$; then the right continuity of $X'$ and $Y$ implies that
$$\{X'\ne Y\}=\bigcup_{q\in\mathbb{Q}_+}\{d(X'_q,Y_q)>0\}.$$
The set on the right–hand side is nearly empty since $X'_q=X_q=Y_q$ a.s. for all $q\in\mathbb{Q}_+$.

20.5. Optional stopping time theorems

The following results are known as Doob’s optional stopping theorems. They show that, under
some regularity conditions, martingale properties are preserved by stopping times. We
first present three results concerning discrete time processes. The more technical continuous
time results are treated at the end.

20.5.1. Discrete time optional stopping theorems. When $\mathbb{T}=\mathbb{R}_+$ or $\mathbb{Z}_+$, we say that
a martingale (submartingale, supermartingale) $\{X_t,\mathcal{F}_t : t\in\mathbb{T}\}$ is closable if $X_\infty$ is defined
and $\{X_t,\mathcal{F}_t : t\in\mathbb{T}\cup\{\infty\}\}$ is also a martingale (submartingale, supermartingale).
Theorem 20.5.1. (optional stopping time: discrete time u.i. martingales) Suppose X :=
{Xt , Ft : t ∈ T} is a uniformly integrable martingale and let T be a stopping time taking
values in a countable subset of T ∪ {∞}. Then, X is closable, XT ∈ L1 , and XT =
E[X∞ |FT ].

Proof. Closability follows from Theorem 20.4.5. Hence, there is $X_\infty\in\sigma(\mathcal{F}_t : t\in\mathbb{T})$ such
that $X_t=\mathbb{E}[X_\infty|\mathcal{F}_t]$ for all $t\in\mathbb{T}$. Let $\{t_k : k\in\mathbb{N}\}=T(\Omega)$.

The integrability of XT follows from

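$$\begin{aligned}\mathbb{E}[|X_T|]&=\sum_k\mathbb{E}\big[|X_T|\mathbf{1}_{\{T=t_k\}}\big]=\sum_k\mathbb{E}\big[|X_{t_k}|\mathbf{1}_{\{T=t_k\}}\big]\\&\le\sum_k\mathbb{E}\Big[\mathbb{E}\big[|X_\infty|\,\big|\,\mathcal{F}_{t_k}\big]\mathbf{1}_{\{T=t_k\}}\Big]=\sum_k\mathbb{E}\big[|X_\infty|\mathbf{1}_{\{T=t_k\}}\big]=\mathbb{E}[|X_\infty|],\end{aligned}$$
where we have used the fact that $|\mathbb{E}[X_\infty|\mathcal{F}_t]|\le\mathbb{E}[|X_\infty|\,|\,\mathcal{F}_t]$ for all $t\in\mathbb{T}\cup\{\infty\}$.

To prove the last statement, let $A\in\mathcal{F}_T$. Then,
$$\int_AX_\infty\,d\mathbb{P}=\sum_k\int_{A\cap\{T=t_k\}}X_\infty\,d\mathbb{P}=\sum_k\int_{A\cap\{T=t_k\}}X_{t_k}\,d\mathbb{P}=\sum_k\int_{A\cap\{T=t_k\}}X_T\,d\mathbb{P}=\int_AX_T\,d\mathbb{P}.$$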

Theorem 20.5.2. (optional stopping time: discrete time closable processes) Suppose
$\{X_n : n\in\mathbb{Z}_+\cup\{\infty\}\}$ is a closed martingale (resp. submartingale, supermartingale). Then,
for any stopping time $T$, $X_T\in L_1$. If $S$ is another stopping time and $S\le T$, then
$\mathbb{E}[X_T|\mathcal{F}_S]=$ (resp. $\ge$, $\le$) $X_S$.

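Proof. Since $X$ is a submartingale iff $-X$ is a supermartingale, it is enough to assume that
$X$ is a supermartingale.
Case $X\ge0$ and $X_\infty=0$. By Theorem 20.3.5, $X^T$ is also a supermartingale and so,
$\mathbb{E}[X^T_n]\le\mathbb{E}[X_0]$. Applying Fatou’s lemma gives $0\le\mathbb{E}[X_T]\le\liminf_n\mathbb{E}[X^T_n]\le\mathbb{E}[X_0]$, whence
we conclude that $X_T\in L_1$. Theorem 20.3.5 shows that $\mathbb{E}[X_{T\wedge n}|\mathcal{F}_{S\wedge n}]\le X_{S\wedge n}$ for all
$n\in\mathbb{Z}_+$. Since $A\cap\{S\le n\}\in\mathcal{F}_{S\wedge n}$ whenever $A\in\mathcal{F}_S$,
$$\mathbb{E}\big[\mathbf{1}_A\mathbf{1}_{\{S\le n\}}X_{T\wedge n}\big]\le\mathbb{E}\big[\mathbf{1}_A\mathbf{1}_{\{S\le n\}}X_{S\wedge n}\big]\le\mathbb{E}[\mathbf{1}_AX_S].$$
A second application of Fatou’s lemma shows that $\mathbb{E}[\mathbf{1}_AX_T]\le\mathbb{E}[\mathbf{1}_AX_S]$. This shows that
$\mathbb{E}[X_T|\mathcal{F}_S]\le X_S$.
General case. Set $M_n:=\mathbb{E}[X_\infty|\mathcal{F}_n]$. Then, $M$ is a uniformly integrable martingale and by
Theorem 20.5.1, $\mathbb{E}[M_T|\mathcal{F}_S]=\mathbb{E}\big[\mathbb{E}[X_\infty|\mathcal{F}_T]\,\big|\,\mathcal{F}_S\big]=\mathbb{E}[X_\infty|\mathcal{F}_S]=M_S$. As $Y=X-M$ is a
nonnegative supermartingale and $Y_\infty=0$,
$$\mathbb{E}[X_T|\mathcal{F}_S]-\mathbb{E}[M_T|\mathcal{F}_S]=\mathbb{E}[X_T-M_T|\mathcal{F}_S]=\mathbb{E}[Y_T|\mathcal{F}_S]\le Y_S=X_S-M_S.$$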

The next example shows that the closable condition in Theorem 20.5.2 is necessary.
Example 20.5.3. Consider an i.i.d. sequence $\{\xi_n\}$ of random variables with $\mathbb{P}[\xi_1=-1]=\mathbb{P}[\xi_1=1]=1/2$. Let $X_0=0$ and $X_n=\sum_{k=1}^n\xi_k$ for $n\ge1$ and $\mathcal{F}_n=\sigma(X_k : k\le n)$. If
$T=\inf\{n\ge0 : X_n=1\}$, then $\mathbb{P}[T<\infty]=1$ and thus, $\mathbb{E}[X_T]=1$; however $\mathbb{E}[X_0]=\mathbb{E}[X_n]=0$
for all $n\in\mathbb{Z}_+$.
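
The gap between $\mathbb{E}[X_{T\wedge n}]=0$ and $\mathbb{E}[X_T]=1$ in Example 20.5.3 can be checked by exact computation. The following Python sketch (the function name `stopped_walk_law` is our own, chosen for illustration) propagates the law of the stopped walk $X_{T\wedge n}$ with rational arithmetic. It is only a numerical illustration under the assumptions of the example, not part of the argument.

```python
from fractions import Fraction

def stopped_walk_law(n):
    """Exact law of X_{T ∧ n}: simple symmetric walk stopped at its first visit to 1."""
    half = Fraction(1, 2)
    law = {0: Fraction(1)}              # X_0 = 0
    for _ in range(n):
        nxt = {}
        for x, p in law.items():
            if x == 1:                  # already stopped: the mass stays at 1
                nxt[1] = nxt.get(1, 0) + p
            else:                       # still running: one more fair step
                for y in (x - 1, x + 1):
                    nxt[y] = nxt.get(y, 0) + p * half
        law = nxt
    return law

for n in (1, 10, 50, 200):
    law = stopped_walk_law(n)
    mean = sum(x * p for x, p in law.items())
    # mean stays equal to 0 while P[T <= n] = law[1] increases with n
    print(n, mean, float(law[1]))
```

The stopped process has mean zero at every finite time, while the mass at the level 1 grows; the compensating mass sits on increasingly negative values, which is exactly the failure of uniform integrability.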
Corollary 20.5.4. (Wald inequality) Suppose that $\{X_n,\mathcal{F}_n : n\in\mathbb{Z}_+\}$ is a martingale
(resp. submartingale, supermartingale) such that
$$\mathbb{E}\big[|X_{n+1}-X_n|\,\big|\,\mathcal{F}_n\big]\le B$$
for some constant $B>0$. If $T$ is a stopping time and $T\in L_1(\mathbb{P})$, then the stopped process
$X^T$ is a u.i. martingale (resp. submartingale, supermartingale) and
$$\mathbb{E}[X_T]=\text{(resp. }\ge,\ \le\text{)}\ \mathbb{E}[X_0].$$

Proof. It is enough to suppose $X$ is a submartingale. Theorem 20.3.5 shows that $X^T$ is
also a submartingale. Observe that
$$|X^T_n|\le|X_0|+\sum_{m=0}^{T-1}|X_{m+1}-X_m|=|X_0|+\sum_{m=0}^\infty|X_{m+1}-X_m|\mathbf{1}_{\{T>m\}}=:Z.$$
Since $\{T>m\}\in\mathcal{F}_m$, it follows that
$$\mathbb{E}\big[|X_{m+1}-X_m|\mathbf{1}_{\{T>m\}}\big]=\mathbb{E}\Big[\mathbf{1}_{\{T>m\}}\mathbb{E}\big[|X_{m+1}-X_m|\,\big|\,\mathcal{F}_m\big]\Big]\le B\,\mathbb{P}[T>m].$$
Adding over $m\in\mathbb{Z}_+$, we obtain $\mathbb{E}[Z]\le\mathbb{E}[|X_0|]+B\sum_m\mathbb{P}[T>m]=\mathbb{E}[|X_0|]+B\,\mathbb{E}[T]<\infty$. Consequently,
$X^T$ is a u.i. submartingale, and by the martingale convergence theorem 20.4.3 it follows
that $\{X^T_n : n\in\mathbb{Z}_+\cup\{\infty\}\}$, where $X_T=X^T_\infty=\lim_nX^T_n$, is a submartingale. Thus, by
Theorem 20.5.2, $\mathbb{E}[X^T_N]\ge\mathbb{E}[X^T_S]$ for any stopping times $N\ge S$. In particular, for $N=T$
and $S=0$ we get $\mathbb{E}[X_T]\ge\mathbb{E}[X_0]$.

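Example 20.5.5. Suppose $\{\xi_n\}\subset L_1$ is an i.i.d. sequence of real random variables. Define
$X_0=0=S_0$, $S_n=\sum_{k=1}^n\xi_k$, $X_n=S_n-n\mathbb{E}[\xi_1]$ for $n\in\mathbb{N}$, and $\mathcal{F}_n=\sigma(X_k : k\le n)$.
If $T$ is a stopping time and $\mathbb{E}[T]<\infty$, then $\mathbb{E}[S_T]=\mathbb{E}[\xi_1]\mathbb{E}[T]$. Indeed, since $\xi_{n+1}$ and
$\mathcal{F}_n$ are independent, $\mathbb{E}[|X_{n+1}-X_n|\,|\,\mathcal{F}_n]=\mathbb{E}[|\xi_{n+1}-\mathbb{E}[\xi_1]|]\le2\mathbb{E}[|\xi_1|]<\infty$ for all $n$. Hence,
from Corollary 20.5.4, we conclude that $X^T$ is a u.i. martingale and $\mathbb{E}[S_T]-\mathbb{E}[\xi_1]\mathbb{E}[T]=\mathbb{E}[X_T]=\mathbb{E}[X_0]=0$. In particular, if $\mathbb{P}[\xi_1=\pm1]=\frac12$, then $S_n=X_n$ for all $n\in\mathbb{N}$. Hence, if
$T_x=\inf\{n\ge0 : X_n=x\}$, $x\ne0$, then $\mathbb{E}[T_x]=\infty$; otherwise we would obtain $x=\mathbb{E}[S_{T_x}]=0$.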
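
Wald’s identity $\mathbb{E}[S_T]=\mathbb{E}[\xi_1]\mathbb{E}[T]$ can be verified exactly for a bounded stopping time by enumerating all paths. The sketch below is an illustration only: the helper `wald_check`, the biased $\pm1$ step law and the truncation `horizon` are our own assumptions, chosen so that $T$ is bounded and everything can be computed with rational numbers.

```python
from fractions import Fraction
from itertools import product

def wald_check(p_up, horizon, level):
    """Return (E[S_T], E[xi_1] * E[T]) for T = min(first n with S_n = level, horizon)."""
    p_up = Fraction(p_up)
    e_S_T = Fraction(0)
    e_T = Fraction(0)
    for steps in product((1, -1), repeat=horizon):
        prob = Fraction(1)
        for s in steps:
            prob *= p_up if s == 1 else 1 - p_up
        S, T = 0, horizon
        for n, s in enumerate(steps, start=1):
            S += s
            if S == level:              # the walk reaches `level`: stop here
                T = n
                break
        # here S equals S_T (either the hitting value or the value at `horizon`)
        e_S_T += prob * S
        e_T += prob * T
    mean_step = p_up - (1 - p_up)       # E[xi_1] for steps in {+1, -1}
    return e_S_T, mean_step * e_T

print(wald_check(Fraction(1, 3), 10, 2))
```

Both entries of the returned pair coincide, as the identity predicts for any integrable stopping time; in the symmetric case `p_up = 1/2` both are zero.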

20.5.2. Continuous time optional stopping theorems.

Theorem 20.5.6. (optional stopping time: càdlàg martingales) Suppose $\{X_t,\mathcal{F}_t : t\in\mathbb{R}_+\}$
is a right–continuous with left limits martingale with right–continuous filtration, and let $T$
be a stopping time taking values in $\mathbb{R}_+$.
(i) If $X$ is uniformly integrable (u.i.), then $X_T=\mathbb{E}[X_\infty|\mathcal{F}_T]$ and the stopped process
$\{X^T_t,\mathcal{F}_t : t\in\mathbb{R}_+\}$ is a u.i. martingale and $X^T_s=\mathbb{E}[X_T|\mathcal{F}_s]$.
(ii) Even if $X$ is not necessarily u.i., the stopped process $\{X^T_t,\mathcal{F}_t : t\in\mathbb{R}_+\}$ is a martingale. If $T$ is bounded, then $X^T$ is u.i.

Proof. (i) Suppose first that T takes values on a countable set {tk } ⊂ T. Theorem 20.5.1
shows that XT ∈ L1 and that E[X∞ |FT ] = XT .

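For the general case, let $T_n$ be a sequence of stopping times taking values on a countable
set and decreasing to $T$ as in Lemma 20.2.9. By the backwards martingale theorem and
the right–continuity of $X$
$$\mathbb{E}[X_\infty|\mathcal{F}_T]=\lim_n\mathbb{E}[X_\infty|\mathcal{F}_{T_n}]=\lim_nX_{T_n}=X_T$$
almost surely and in $L_1$. The last statement in (i) follows from Theorem 20.2.8 since
$$X_{T\wedge s}=\mathbb{E}[X_\infty|\mathcal{F}_{T\wedge s}]=\mathbb{E}\big[\mathbb{E}[X_\infty|\mathcal{F}_T]\,\big|\,\mathcal{F}_s\big]=\mathbb{E}[X_T|\mathcal{F}_s].$$
(ii) Suppose $s,t\in\mathbb{T}$, with $s<t$, and let $A\in\mathcal{F}_s$. Notice that for any stopping time $T$,
$\{T>s\}\in\mathcal{F}_s\subset\mathcal{F}_{T\vee s}$. Since the stopped process $X^t$ is a u.i. martingale, by part (i)
$$\begin{aligned}\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}X^T_t]&=\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}(X^t)_{T\vee s}]=\mathbb{E}\big[\mathbf{1}_{A\cap\{T>s\}}\mathbb{E}[X^t_\infty|\mathcal{F}_{T\vee s}]\big]\\&=\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}X_t]=\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}X_s]=\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}X_{T\wedge s}].\end{aligned}$$
Hence
$$\begin{aligned}\mathbb{E}[\mathbf{1}_AX^T_t]&=\mathbb{E}[\mathbf{1}_{A\cap\{T\le s\}}X^T_t]+\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}X^T_t]\\&=\mathbb{E}[\mathbf{1}_{A\cap\{T\le s\}}X_{T\wedge s}]+\mathbb{E}[\mathbf{1}_{A\cap\{T>s\}}X_{T\wedge s}]=\mathbb{E}[\mathbf{1}_AX^T_s].\end{aligned}$$
Therefore, $\mathbb{E}[X^T_t|\mathcal{F}_s]=X^T_s$. If $T$ is bounded, let $\tau=\sup T$. Then $X^T=(X^\tau)^T$ and by part
(i) it follows that $X^T$ is u.i. with $X^T_s=\mathbb{E}[X_T|\mathcal{F}_s]$.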
Lemma 20.5.7. (Chung) Suppose {X−n , F−n : n ∈ Z+ } is a reversed submartingale. If
{E[X−n ] : n ∈ Z+ } is bounded below, then {X−n } is uniformly integrable.

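Proof. Let $\ell=\inf_n\mathbb{E}[X_{-n}]=\lim_n\mathbb{E}[X_{-n}]>-\infty$. We will show that $\{X^+_{-n}\}$ and $\{X^-_{-n}\}$
are u.i. sequences. Notice that $\{X^+_{-n},\mathcal{F}_{-n}\}$ is also a reversed submartingale. For $\lambda>0$,
$$\lambda\,\mathbb{P}[|X_{-n}|>\lambda]\le\mathbb{E}[|X_{-n}|]=2\mathbb{E}[X^+_{-n}]-\mathbb{E}[X_{-n}]\le2\mathbb{E}[X^+_0]-\ell<\infty.$$
Therefore, $\lim_{\lambda\to\infty}\sup_n\mathbb{P}[|X_{-n}|>\lambda]=0$. From the submartingale property, we have that
$$\mathbb{E}\big[X^+_{-n}\mathbf{1}_{\{X^+_{-n}>\lambda\}}\big]\le\mathbb{E}\big[X^+_0\mathbf{1}_{\{X^+_{-n}>\lambda\}}\big]\le\mathbb{E}\big[X^+_0\mathbf{1}_{\{|X_{-n}|>\lambda\}}\big];$$
whence we conclude that $\{X^+_{-n}\}$ is uniformly integrable. It remains to show that $\{X^-_{-n}\}$ is
uniformly integrable. Given $\varepsilon>0$, there is $n_0$ such that $|\mathbb{E}[X_{-m}]-\mathbb{E}[X_{-n}]|<\varepsilon/2$ whenever
$n\ge m\ge n_0$. Then,
$$\begin{aligned}\mathbb{E}\big[X^-_{-n}\mathbf{1}_{\{X^-_{-n}>\lambda\}}\big]&=-\mathbb{E}\big[X_{-n}\mathbf{1}_{\{X_{-n}<-\lambda\}}\big]=\mathbb{E}\big[X_{-n}\mathbf{1}_{\{X_{-n}\ge-\lambda\}}\big]-\mathbb{E}[X_{-n}]\\&\le\mathbb{E}\big[X_{-m}\mathbf{1}_{\{X_{-n}\ge-\lambda\}}\big]-\mathbb{E}[X_{-n}]=\mathbb{E}[X_{-m}]-\mathbb{E}\big[X_{-m}\mathbf{1}_{\{X_{-n}<-\lambda\}}\big]-\mathbb{E}[X_{-n}]\\&\le\frac\varepsilon2-\mathbb{E}\big[X_{-m}\mathbf{1}_{\{X_{-n}<-\lambda\}}\big]\le\frac\varepsilon2+\mathbb{E}\big[|X_{-m}|\mathbf{1}_{\{|X_{-n}|>\lambda\}}\big].\end{aligned}$$
Setting $m=n_0$, letting $\lambda\to\infty$ and then $\varepsilon\to0$, we conclude that $\{X^-_{-n}\}$ is uniformly
integrable. Therefore $\{X_{-n}\}$ is uniformly integrable.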

Theorem 20.5.8. (optional stopping time: closable càdlàg processes) Suppose {Xt , Ft :
t ∈ R+ } is a closed right–continuous with left limits martingale (resp. submartingale, su-
permartingale). If S ≤ T are stopping times, then E[XT |FS ] = (resp. ≥, ≤) XS

Proof. Without loss of generality, we will assume X is a submartingale. As in Lemma 20.2.9,

let Sn and Tn be stopping times decreasing to S and T respectively and such that Sn ≤ Tn .
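By Theorem 20.5.2, $\{X_{S_n},X_{T_n} : n\in\mathbb{Z}_+\}\subset L_1$ and whenever $m\le n$,
$$\mathbb{E}[X_{S_m}|\mathcal{F}_{S_n}]\ge X_{S_n},\qquad\mathbb{E}[X_{T_m}|\mathcal{F}_{T_n}]\ge X_{T_n},\qquad\mathbb{E}[X_{T_m}|\mathcal{F}_{S_n}]\ge X_{S_n}.$$
Let $Y_{-n}=X_{S_n}$, $\mathcal{G}_{-n}=\mathcal{F}_{S_n}$, $Z_{-n}=X_{T_n}$ and $\mathcal{H}_{-n}=\mathcal{F}_{T_n}$. Clearly the processes $(Y,\mathcal{G})$
and $(Z,\mathcal{H})$ are reversed submartingales; and since $\mathbb{E}[X_0]\le\mathbb{E}[X_{S_n}]\le\mathbb{E}[X_{T_n}]$ for all $n\in\mathbb{Z}_+$, they are uniformly integrable by Lemma 20.5.7. By Doob’s upcrossing theorem and
Theorem 8.7.5, we conclude that $Y_{-n}$ and $Z_{-n}$ converge $\mathbb{P}$–a.s. and in $L_1$ to some integrable
functions $Y_{-\infty}$ and $Z_{-\infty}$ respectively. Since $X$ is right–continuous, it follows that $Y_{-\infty}=X_S$
and $Z_{-\infty}=X_T$. Since $\mathcal{F}_S=\bigcap_m\mathcal{F}_{S_m}$, for any $A\in\mathcal{F}_S$
$$\mathbb{E}[X_T\mathbf{1}_A]=\lim_m\mathbb{E}[X_{T_m}\mathbf{1}_A]\ge\lim_m\mathbb{E}[X_{S_m}\mathbf{1}_A]=\mathbb{E}[X_S\mathbf{1}_A].$$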

Therefore, E[XT |FS ] ≥ XS .

20.6. Doob’s decomposition

A process $\{A_n : n\in\mathbb{Z}_+\}$ is previsible w.r.t. the filtration $\{\mathcal{F}_n\}$ if $A_0\in\mathcal{F}_0$, and $A_{n+1}\in\mathcal{F}_n$
for all $n$.

Theorem 20.6.1. Let $\{X_n : n\in\mathbb{Z}_+\}\subset L_1(\mathbb{P})$ be adapted with respect to $\{\mathcal{F}_n : n\in\mathbb{Z}_+\}$.
Then, $X$ can be decomposed as
$$X_n=X_0+M_n+A_n;\qquad M_0=A_0=0,\tag{20.12}$$
where $M$ is a martingale, and $A$ is a previsible process. Moreover, this decomposition
is unique up to sets of measure 0. If, in addition, $X$ is a submartingale, then $A$ is a
nondecreasing process, i.e., $A_n\le A_{n+1}$.

Proof. If (20.12) holds, then by taking conditional expectation of $X_n-X_{n-1}$ w.r.t. $\mathcal{F}_{n-1}$
we obtain
$$\mathbb{E}[X_n-X_{n-1}|\mathcal{F}_{n-1}]=A_n-A_{n-1}.$$
Consequently, we obtain an expression for $A$:
$$A_n=\sum_{k=1}^n\mathbb{E}[X_k-X_{k-1}|\mathcal{F}_{k-1}];\qquad A_0=0.\tag{20.13}$$

Defining A as in (20.13) and letting

(20.14) Mn := Xn − X0 − An
gives the desired decomposition.
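
Formulas (20.13) and (20.14) are constructive, and on a finite coin–tossing space they can be evaluated directly. The Python sketch below is an illustration under our own assumptions: the process is given as a function `f` of the first $k$ steps of a $\pm1$ walk, and the names `doob_decomposition` and `square_of_walk` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def doob_decomposition(f, path, p_up=Fraction(1, 2)):
    """Return (M_n, A_n) along `path` for X_k = f(first k steps), steps in {+1, -1}."""
    A = Fraction(0)
    for k in range(1, len(path) + 1):
        past = tuple(path[:k - 1])
        # E[X_k | F_{k-1}]: average over the two possible values of the k-th step
        cond_mean = p_up * f(past + (1,)) + (1 - p_up) * f(past + (-1,))
        A += cond_mean - f(past)        # increment A_k - A_{k-1} as in (20.13)
    M = f(tuple(path)) - f(()) - A      # M_n = X_n - X_0 - A_n as in (20.14)
    return M, A

def square_of_walk(steps):
    # example process X_n = S_n^2, a submartingale for the symmetric walk
    return Fraction(sum(steps)) ** 2

print(doob_decomposition(square_of_walk, (1, 1, -1, 1)))
```

For $X_n=S_n^2$ with fair steps the previsible part is $A_n=n$, which is nondecreasing as the theorem predicts for a submartingale, and $M_n=S_n^2-n$.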

Doob’s decomposition introduces an important process that relates to martingales.

Definition 20.6.2. Let $\{M_n : n\in\mathbb{Z}_+\}\subset L_2$ be a martingale with $M_0=0$. The compensator
process $\langle M\rangle$ associated to $M$ is the unique previsible process with $\langle M\rangle_0=0$ such
that $M^2-\langle M\rangle$ is a martingale.

Since $M^2$ is a submartingale, Doob’s decomposition implies that
$$\langle M\rangle_n=\sum_{k=1}^n\mathbb{E}[M_k^2-M_{k-1}^2|\mathcal{F}_{k-1}]=\sum_{k=1}^n\mathbb{E}[(M_k-M_{k-1})^2|\mathcal{F}_{k-1}].$$
This expression motivates the introduction of another important process associated to martingales.
Definition 20.6.3. Let $\{M_n : n\in\mathbb{Z}_+\}\subset L_2$ be a martingale with $M_0=0$. The quadratic
variation process $[M]$ associated to $M$ is defined by
$$[M]_n=\sum_{k=1}^n(M_k-M_{k-1})^2,\qquad[M]_0=0.$$

Since $(M_k-M_{k-1})^2=M_k^2-M_{k-1}^2-2M_{k-1}(M_k-M_{k-1})$, it follows that
$$M_n^2=(C\cdot M)_n+[M]_n$$
where $C_k=2M_{k-1}$. Thus, $M^2-[M]$ is a martingale. In fact, from Doob’s decomposition,
it follows that $\langle M\rangle$ is the previsible part (compensator) of $[M]$.

20.7. Doob’s maximal function

Suppose that $X$ is a right–continuous submartingale w.r.t. $\{\mathcal{F}_t : t\in\mathbb{T}\}$. Clearly the
processes $X^\natural_t=\sup_{0\le s\le t}X^+_s$ and $X^*_t=\sup_{0\le s\le t}|X_s|$ are increasing, right–continuous and
adapted. The following results state that these processes are controlled by the process
$X$.
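Lemma 20.7.1. (Doob’s maximal lemma) For any $0\le t\in\mathbb{T}$ and $\lambda>0$,
$$\mathbb{P}[\{X^\natural_t>\lambda\}]\le\frac1\lambda\int_{\{X^\natural_t>\lambda\}}X^+_t\,d\mathbb{P}\le\frac1\lambda\|X^+_t\|_{L_1(\mathbb{P})}.$$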

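Proof. Let $Q$ be a countable dense set in $\mathbb{T}$ and let $S\subset Q\cap[0,t]$ be finite. Define $M_S=\max_{s\in S}X^+_s$, and for $\mathbb{T}\ni u>t$ fixed let $U=\min\{s\in S : X_s>\lambda\}\wedge u$. Then $U$ is a stopping
time taking values on a finite subset of $\mathbb{T}$ and
$$\{U<u\}=\{U\le t\}=\{M_S>\lambda\}\subset\{X^\natural_t>\lambda\},$$
$$\lambda\mathbf{1}_{\{U<u\}}=\lambda\mathbf{1}_{\{M_S>\lambda\}}\le X_U\mathbf{1}_{\{U<u\}}=X_{U\wedge t}\mathbf{1}_{\{U\le t\}}\in\mathcal{F}_t.$$
By Theorem 20.3.5, $\mathbb{E}[X_t|\mathcal{F}_{U\wedge t}]\ge X_{U\wedge t}$, therefore
$$\begin{aligned}\mathbb{P}[\{M_S>\lambda\}]&\le\frac1\lambda\int_{\{U\le t\}}X_{U\wedge t}\,d\mathbb{P}\le\frac1\lambda\int_{\{U\le t\}}\mathbb{E}[X_t|\mathcal{F}_{U\wedge t}]\,d\mathbb{P}\\&=\frac1\lambda\int_{\{U\le t\}}X_t\,d\mathbb{P}=\frac1\lambda\int_{\{M_S>\lambda\}}X_t\,d\mathbb{P}\le\frac1\lambda\int_{\{X^\natural_t>\lambda\}}X^+_t\,d\mathbb{P}.\end{aligned}$$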
By taking suprema over all finite subsets S of {t} ∪ (Q ∩ [0, t]), Doob’s inequality follows
from the right–continuity of X.
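
In discrete time the maximal lemma can be verified by brute force. The following Python sketch is an illustration under our own assumptions: it takes the nonnegative submartingale $X_k=|S_k|$, with $S$ the simple symmetric walk, and enumerates all paths of length $n$ to compute the three quantities in Lemma 20.7.1 exactly. The function name `maximal_lemma_terms` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def maximal_lemma_terms(n, lam):
    """Exact terms of Doob's maximal lemma for X_k = |S_k|, S a simple symmetric walk."""
    weight = Fraction(1, 2 ** n)        # every path of length n is equally likely
    prob = Fraction(0)                  # P[max_{k<=n} X_k > lam]
    restricted = Fraction(0)            # E[X_n ; max_{k<=n} X_k > lam]
    full = Fraction(0)                  # E[X_n]
    for steps in product((1, -1), repeat=n):
        S, running_max = 0, 0
        for s in steps:
            S += s
            running_max = max(running_max, abs(S))
        full += weight * abs(S)
        if running_max > lam:
            prob += weight
            restricted += weight * abs(S)
    return prob, restricted / lam, full / lam

p, middle, upper = maximal_lemma_terms(10, 2)
print(float(p), float(middle), float(upper))
```

The three returned numbers appear in the same order as in the chain of inequalities of the lemma, so they should be nondecreasing from left to right.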
Theorem 20.7.2. (Doob’s inequality.) Suppose that $X$ is a right–continuous submartingale
w.r.t. $\mathcal{F}_t$ and let $1\le p,q\le\infty$ with $\frac1p+\frac1q=1$. If $\{X^+_t : t\in\mathbb{T}\}\subset L_p(\mathbb{P})$ then
$$\|X^\natural_t\|_p\le q\,\|X^+_t\|_p,\qquad\|X^\natural_\infty\|_p\le q\,\sup_t\|X^+_t\|_p.\tag{20.15}$$
If $X$ is actually a martingale and $\{X_t : t\in\mathbb{T}\}\subset L_p$, then
$$\|X^*_t\|_p\le q\,\|X_t\|_p,\qquad\|X^*_\infty\|_p\le q\,\sup_t\|X_t\|_p.\tag{20.16}$$
If $X$ is u.i. then $\|X^*_\infty\|_p\le q\,\|X_\infty\|_p$.

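Proof. If $p=1$ or $p=\infty$ the result is obvious. Suppose $1<p<\infty$, and let $S\subset Q\cap[0,t]$ be as
before; then $M_S=\max_{s\in S}X^+_s\in L_p$ and
$$\begin{aligned}\mathbb{E}[(M_S)^p]&=p\int_0^\infty\lambda^{p-1}\,\mathbb{P}[\{M_S>\lambda\}]\,d\lambda\le p\int_0^\infty\int_\Omega\lambda^{p-2}X^+_t\mathbf{1}_{\{M_S>\lambda\}}\,d\mathbb{P}\,d\lambda\\&=\frac{p}{p-1}\int X^+_t(M_S)^{p-1}\,d\mathbb{P}\le\frac{p}{p-1}\,\|X^+_t\|_p\,\|M_S\|_p^{p/q}.\end{aligned}$$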

Therefore $\|M_S\|_p\le q\,\|X^+_t\|_p$. The first inequality follows from the right–continuity of $X$
and by taking suprema over all finite subsets of $\{t\}\cup(Q\cap[0,t])$. The second follows by monotone
convergence on letting $t\to\infty$.
(ii) If $X$ is actually a martingale, then (20.16) follows from applying (20.15) to $Y_t=|X_t|$.
If in addition $X$ is u.i., then $|X_t|^p=|\mathbb{E}[X_\infty|\mathcal{F}_t]|^p\le\mathbb{E}[|X_\infty|^p|\mathcal{F}_t]$. Therefore $\|X^*_\infty\|_p\le q\,\sup_t\|X_t\|_p\le q\,\|X_\infty\|_p$.
Corollary 20.7.3. If $X$ is a right–continuous martingale w.r.t. $\{\mathcal{F}_t : t\in\mathbb{T}\}$ and $\sup_t\|X_t\|_p<\infty$ for some $1<p<\infty$, then $\lim_{t\nearrow\sup\mathbb{T}}X_t=X_\infty$ exists a.s. and in $L_p$. Consequently, $X$ is u.i.

Proof. Since $\|X_t\|_1\le\|X_t\|_p$, it follows from the martingale convergence theorem 20.4.3
that $\lim_{t\nearrow\sup\mathbb{T}}X_t=X_\infty$ exists a.s. To obtain convergence in $L_p$ notice that
$$|X_t-X_\infty|^p\le(2X^*_\infty)^p.$$
The conclusion follows from Doob’s maximal inequality and dominated convergence.
Theorem 20.7.4. (Azuma–Hoeffding) Suppose $\{X_n,\mathcal{F}_n\}$ is a martingale such that
$$|X_n-X_{n-1}|\le c_n,\qquad n\ge1.$$
Then, for all $m\in\mathbb{Z}_+$ and $a>0$
$$\mathbb{P}[|X_m-X_0|>a]\le2\exp\Big(-\frac{a^2}{2\sum_{k=1}^mc_k^2}\Big).\tag{20.17}$$

Proof. For any numbers $a$ and $x$, $ax=-a\frac{1-x}2+a\frac{1+x}2$; thus, for $|x|\le1$, the convexity of
the exponential function implies that
$$e^{ax}\le\frac{1-x}2e^{-a}+\frac{1+x}2e^{a}=\cosh(a)+x\sinh(a)\le e^{a^2/2}+x\sinh(a).\tag{20.18}$$
For any $a>0$ and $t>0$, Markov’s inequality and (20.18) give
$$\begin{aligned}\mathbb{P}[X_m-X_0>a]&=\mathbb{P}\Big[\exp\Big(t\sum_{k=1}^m(X_k-X_{k-1})\Big)>e^{ta}\Big]\\&\le e^{-ta}\,\mathbb{E}\Big[\exp\Big(t\sum_{k=1}^{m-1}(X_k-X_{k-1})\Big)\,\mathbb{E}\Big[e^{tc_m\frac{X_m-X_{m-1}}{c_m}}\,\Big|\,\mathcal{F}_{m-1}\Big]\Big]\\&\le e^{-ta}\,\mathbb{E}\Big[\exp\Big(t\sum_{k=1}^{m-1}(X_k-X_{k-1})\Big)\Big]e^{t^2c_m^2/2}\le e^{-ta}\exp\Big(\frac{t^2}2\sum_{k=1}^mc_k^2\Big).\end{aligned}$$
A similar estimate for $\mathbb{P}[X_m-X_0<-a]$ is obtained by setting $X'_n=-X_n$. Hence,
$$\mathbb{P}[|X_m-X_0|>a]\le2e^{-ta}\exp\Big(\frac{t^2}2\sum_{k=1}^mc_k^2\Big).$$
The choice $t=a/\big(\sum_{k=1}^mc_k^2\big)$ implies (20.17).
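
To get a feeling for how sharp (20.17) is, one can compare it with an exact tail. The Python sketch below is illustrative only: it uses the simple symmetric random walk (so $c_k=1$), for which the tail is a binomial sum, and the function names `exact_tail` and `azuma_bound` are our own.

```python
import math

def exact_tail(m, a):
    """P[|S_m| > a] for the simple symmetric walk S_m (increments bounded by c_k = 1)."""
    total = 0
    for ups in range(m + 1):
        if abs(2 * ups - m) > a:        # S_m = (#ups) - (#downs) = 2 * ups - m
            total += math.comb(m, ups)
    return total / 2 ** m

def azuma_bound(a, c):
    """Right-hand side of (20.17) for increment bounds c = [c_1, ..., c_m]."""
    return 2 * math.exp(-a ** 2 / (2 * sum(ck ** 2 for ck in c)))

m = 100
for a in (10, 20, 30):
    print(a, exact_tail(m, a), azuma_bound(a, [1] * m))
```

The bound is trivial (larger than 1) for small deviations and becomes informative once $a$ exceeds a few multiples of $\sqrt{m}$.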

Example 20.7.5. Let $X_n$ be a sequence of integrable i.i.d. random variables and let
$S_0=0$, $S_n=\sum_{k=1}^n(X_k-\mathbb{E}[X_1])$ for $n\ge1$. Then, $S_n$ is a martingale with respect to
$\mathcal{F}_n=\sigma(X_k ; k\le n)$ and $S_n-S_{n-1}=X_n-\mathbb{E}[X_1]$ for all $n\ge1$. Suppose that $|X_1|\le C$ with
probability one for some $C>0$, so that $|S_n-S_{n-1}|\le2C$. Then, with $\overline{X}_n=\frac1n\sum_{k=1}^nX_k$,
$$\mathbb{P}\big[|\overline{X}_n-\mathbb{E}[X_1]|>a\big]=\mathbb{P}\Big[\frac1n|S_n|>a\Big]\le2\exp\Big(-\frac{a^2n}{8C^2}\Big)$$
for all $a>0$. An $L_1$ rate of convergence can be derived for the strong law of large numbers
by integrating over $a$,
$$\mathbb{E}\big[|\overline{X}_n-\mathbb{E}[X_1]|\big]\le\int_0^\infty2\exp(-a^2n/8C^2)\,da=\frac{2C\sqrt{2\pi}}{\sqrt n}.$$

20.8. Exercises
Exercise 20.8.1. If $X : (\Omega,\mathcal{F})\to(S^T,\mathcal{S}^{\otimes T})$ is measurable and $\mathbb{P}$ is a probability
measure on $\mathcal{F}$, show that the family of probability measures $\mu_I=\mathbb{P}\circ(p_I\circ X)^{-1}$ on $\mathcal{S}^{\otimes I}$,
where $I\subset T$ and $I$ is finite, is projective. This shows that every stochastic process has a
canonical representation.
Exercise 20.8.2. Show that P is generated by the sets of the form A×(a, b] where A ∈ Fb
and 0 ≤ a < b < ∞. Show that P ⊂ O.
Exercise 20.8.3. Suppose $T$ is a stopping time with respect to the filtration $\{\mathcal{F}_t : t\in[0,\infty)\}$.
Show that $T^{-1}(A)\in\mathcal{F}_t$ for any $A\in\mathcal{B}([0,t])$ and $t\ge0$.
Exercise 20.8.4. If T = Z+ show that T is a stopping time iff {T = n} ∈ Fn for any
n ∈ Z+ .
Exercise 20.8.5. Suppose $T$ is a stopping time, and let $A\in\mathcal{F}_T$. Define $T^A=\mathbf{1}_AT+\infty\mathbf{1}_{A^c}$
(under the convention that $\infty\cdot0=0$). Show that $T^A$ is a stopping time.
Exercise 20.8.6. (a) Suppose $X$ is a martingale w.r.t. $\{\mathcal{F}_t\}$, and let $\varphi$ be a convex function
such that $Y=\varphi\circ X\in L_1$. Show that $Y$ is a submartingale.
(b) Suppose that $X$ is a real–valued submartingale instead, and that $\varphi$ is a convex, nondecreasing function such that $Y=\varphi\circ X\in L_1$. Show that $Y$ is a submartingale.
Exercise 20.8.7. Suppose that T and S are stopping times taking values on a discrete
countable set and that max T := u < ∞. If X is a submartingale, show that XT ∈ L1 , that
E[Xu |FT ] ≥ XT , and that E[XT |FS ] ≤ E[Xu |FT ∧S ].
Chapter 21

Applications of
Martingale theory

21.1. Differentiation
We will present versions of the Radon–Nikodym theorem for stochastic kernels using a
technique based on martingales.
Theorem 21.1.1. Let $(\Omega,\{\mathcal{B}_n : n\in\mathbb{N}\})$ be a filtered space and let $\mathcal{B}=\sigma\big(\bigcup_n\mathcal{B}_n\big)$.
Suppose $\mu$ and $\nu$ are a probability measure and a finite measure on $(\Omega,\mathcal{B})$ respectively.
Denote by $\mu_n$ and $\nu_n$ the restrictions of $\mu$ and $\nu$ to $\mathcal{B}_n$ respectively. Suppose that $\nu_n\ll\mu_n$
for all $n$ and let $X_n=\frac{d\nu_n}{d\mu_n}$. Then
(i) $\{X_n,\mathcal{B}_n\}$ is a martingale.
(ii) $\limsup_nX_n=X\in L_1(\mu)$,
$$\nu(A)=\int_AX\,d\mu+\nu(A\cap\{X=\infty\})=\nu_a(A)+\nu_s(A),$$
and $\nu_a\ll\mu$ and $\nu_s\perp\mu$ is the Radon–Nikodym decomposition of $\nu$ w.r.t. $\mu$.

Proof. (i) Since $X_n$ is $\mathcal{B}_n$–measurable, for any $A\in\mathcal{B}_n\subset\mathcal{B}_{n+1}$
$$\int_AX_{n+1}\,d\mu_{n+1}=\int_AX_{n+1}\,d\mu=\nu_{n+1}(A)=\nu_n(A)=\int_AX_n\,d\mu_n=\int_AX_n\,d\mu,$$
that is, $\mathbb{E}_\mu[X_{n+1}|\mathcal{B}_n]=X_n$.

(ii) Since $X_n\ge0$, Corollary 20.4.4 implies that $\lim_nX_n=X$ exists a.s. and $X\in L_1(\mu)$.
By replacing $\nu$ by $\nu/\nu(\Omega)$, we may assume that $\nu$ is also a probability measure. Define
$$\rho=\frac{\mu+\nu}2,\qquad\rho_n=\frac{\mu_n+\nu_n}2.$$

Then, ρn is the restriction of ρ to Bn , and

(21.1) µ ≪ ρ ≪ µ, µn ≪ ρ n ≪ µ n , ν n ≪ ρn .

Set $Y_n=\frac{d\mu_n}{d\rho_n}$ and $Z_n=\frac{d\nu_n}{d\rho_n}$. Then, $Y_n,Z_n\ge0$, $Y_n+Z_n=2$, and so by part (i),
$\{Y_n,\mathcal{B}_n\}$ and $\{Z_n,\mathcal{B}_n\}$ are nonnegative bounded martingales with respect to $\rho$. The
martingale convergence theorem 20.4.3 and dominated convergence imply that $\lim_nY_n=Y$
and $\lim_nZ_n=Z$ exist $\rho$–a.s. and in $L_1(\rho)$. For any $A\in\bigcup_n\mathcal{B}_n$,
$$\mu(A)=\lim_n\mu_n(A)=\lim_n\int_AY_n\,d\rho=\int_AY\,d\rho,$$
$$\nu(A)=\lim_n\nu_n(A)=\lim_n\int_AZ_n\,d\rho=\int_AZ\,d\rho.$$
Consequently, as $\bigcup_n\mathcal{B}_n$ is a π–system, by Dynkin’s monotone class theorem we have that
$$d\mu=Y\,d\rho,\qquad d\nu=Z\,d\rho.\tag{21.2}$$
As $X_n=\frac{d\nu_n}{d\mu_n}$, we have that $X_n=\frac{Z_n}{Y_n}$ $\rho$–a.s., and hence $\mu$–a.s. Since $Y+Z=2$ $\rho$–a.s.,
it follows that $\rho(Y=0=Z)=0$, and so $X=\frac ZY$ $\rho$–a.s., and hence $\mu$–a.s. By (21.1)
and (21.2) we have that $\mu(\{Y=0\})=0$, and $\rho(\{Y=0\}\triangle\{X=\infty\})=0$. Thus, $1=Y\frac1Y\mathbf{1}_{\{Y>0\}}+\mathbf{1}_{\{Y=0\}}$, and so
$$\nu(A)=\int_AZ\,d\rho=\int_A\mathbf{1}_{\{Y>0\}}\frac ZY\,Y\,d\rho+\int_AZ\mathbf{1}_{\{Y=0\}}\,d\rho=\int_AX\,d\mu+\nu(A\cap\{X=\infty\}).$$

Since µ({X = ∞}) = µ({Y = 0}) = 0, we conclude that νs ⊥ µ.

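The assumption $\nu_n\ll\mu_n$ can be relaxed if we assume that $\mathcal{B}$ is countably generated.
In such case, there is a sequence of finite partitions $\mathcal{B}_n\subset\mathcal{B}$ of $\Omega$ such that $\mathcal{B}_{n+1}$ is a
refinement of $\mathcal{B}_n$ (i.e., $B=\bigcup\{B' : \mathcal{B}_{n+1}\ni B'\subset B\}$ for each $B\in\mathcal{B}_n$), and $\sigma\big(\bigcup_n\mathcal{B}_n\big)=\mathcal{B}$.
In particular, $\sigma(\mathcal{B}_n)\subset\sigma(\mathcal{B}_{n+1})$.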

Theorem 21.1.2. Suppose $(\Omega,\mathcal{B},\mu)$ is a countably generated probability space and let $\mathcal{B}_n$
be as above. Given a finite measure $\nu$ on $\mathcal{B}$, define
$$X_n(\omega)=\sum_{B\in\mathcal{B}_n}\frac{\nu(B)}{\mu(B)}\mathbf{1}_B(\omega)\mathbf{1}_{(0,\infty)}(\mu(B)\vee\nu(B))\tag{21.3}$$
with the understanding that $0\cdot\infty=0$. Then,
(i) $\{X_n,\sigma(\mathcal{B}_n)\}$ is a nonnegative supermartingale w.r.t. $\mu$, and a martingale if $\nu\ll\mu$.
(ii) $X_n$ converges $\mu$–a.s. to some $\mathcal{B}$–measurable function $X_\infty$.
(iii) If $\nu=\nu_a+\nu_s$, $\nu_a\ll\mu$ and $\nu_s\perp\mu$, is the Radon–Nikodym decomposition of $\nu$
w.r.t. $\mu$ then $X_\infty=\frac{d\nu_a}{d\mu}$.

Proof. Notice that $X_n=\infty$ on $B$ if $\mu(B)=0$ and $\nu(B)>0$, and $X_n=0$ on $B$ if
$\nu(B)=\mu(B)=0$. It is obvious that $X_n\ge0$ and $\mathbb{E}_\mu[X_n]\le\nu(\Omega)<\infty$, and so $X_n\in L_1(\mu)$.
(i) and (ii): Fix $B\in\mathcal{B}_n$ with $\mu(B)\ne0$. Then
$$\mathbf{1}_B\,\mathbb{E}_\mu[X_{n+1}|\sigma(\mathcal{B}_n)]=\frac{\mathbf{1}_B}{\mu(B)}\sum\{\nu(B') : B'\in\mathcal{B}_{n+1},\ B'\subset B,\ \mu(B')>0\}\le X_n\mathbf{1}_B.$$
This shows that $\{X_n,\sigma(\mathcal{B}_n)\}$ is a nonnegative supermartingale; moreover, if $\nu\ll\mu$ on
$\mathcal{B}$ then $X_n$ is a martingale w.r.t. $\{\sigma(\mathcal{B}_n) : n\in\mathbb{N}\}$. Corollary 20.4.4 implies that $X_n$
converges $\mu$–a.s. to a function $X_\infty\in L_1(\mu)$.
(iii) Suppose $\nu=\nu_a+\nu_s$, with $\nu_a\ll\mu$ and $\nu_s\perp\mu$, is the Radon–Nikodym decomposition
of $\nu$ w.r.t. $\mu$. Decompose $X_n=X^a_n+X^s_n$ where
$$X^a_n(\omega)=\sum_{B\in\mathcal{B}_n}\frac{\nu_a(B)}{\mu(B)}\mathbf{1}_B(\omega)\mathbf{1}_{(0,\infty)}(\mu(B)\vee\nu_a(B)),\tag{21.4}$$
$$X^s_n(\omega)=\sum_{B\in\mathcal{B}_n}\frac{\nu_s(B)}{\mu(B)}\mathbf{1}_B(\omega)\mathbf{1}_{(0,\infty)}(\mu(B)\vee\nu_s(B)).\tag{21.5}$$
There is $X^s_\infty\in L^+_1(\mu)$ to which $X^s_n$ converges $\mu$–a.s. By Fatou’s lemma and (21.5), for any
$B\in\bigcup_n\sigma(\mathcal{B}_n)$
$$\int_BX^s_\infty\,d\mu\le\liminf_n\int_BX^s_n\,d\mu\le\nu_s(B).\tag{21.6}$$
By monotone convergence, the class of sets in $\mathcal{B}$ that satisfy (21.6) is a monotone class
containing the algebra $\bigcup_n\sigma(\mathcal{B}_n)$. Hence, by the monotone class theorem 3.5.2, (21.6) holds
for all sets in $\mathcal{B}$. As $\nu_s\perp\mu$, there is $B\in\mathcal{B}$ such that $\nu_s(B)=0=\mu(\Omega\setminus B)$. This means
that
$$\int X^s_\infty\,d\mu=\int_BX^s_\infty\,d\mu+\int_{\Omega\setminus B}X^s_\infty\,d\mu=0.$$
It follows that $X^s_\infty=0$ $\mu$–a.s., and so $X_\infty=X^a_\infty$ $\mu$–a.s.
To conclude, suppose that $Y=\frac{d\nu_a}{d\mu}$. For any $B\in\mathcal{B}_n$
$$\nu_a(B)=\int_BY\,d\mu=\int_BX^a_n\,d\mu,$$
that is, $\mathbb{E}[Y|\sigma(\mathcal{B}_n)]=X^a_n$. This means that $\{X^a_n,\sigma(\mathcal{B}_n)\}$ is a uniformly integrable martingale;
hence, by Theorem 20.4.5, $Y=\lim_nX^a_n$ $\mu$–a.s. and in $L_1(\mu)$. Therefore $Y=X_\infty$ $\mu$–a.s.
Remark 21.1.3. The martingale approach developed above can be used to extend the
notion of symmetric derivative to general measures. For example, Theorem 21.1.2 should
be compared with Corollary 11.1.9 by considering a d–dimensional interval I with integer
vertices, µ as the normalized Lebesgue measure on I, and ν any finite Borel measure on I.
As partitions Bn , we may consider dyadic boxes contained in I.
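
The dyadic construction mentioned in Remark 21.1.3 is easy to experiment with in dimension one. The Python sketch below is an illustration under our own assumptions: $\mu$ is Lebesgue measure on $[0,1)$, $\nu$ is described by its distribution function `nu_cdf` (here a hypothetical example with density $2x$), and `dyadic_ratio` evaluates the ratio $X_n(\omega)=\nu(B)/\mu(B)$ of (21.3) on the level–$n$ dyadic interval $B$ containing $\omega$.

```python
from fractions import Fraction

def dyadic_ratio(nu_cdf, omega, n):
    """X_n(omega) = nu(B) / mu(B), B = [j/2^n, (j+1)/2^n) the dyadic interval containing omega."""
    j = int(omega * 2 ** n)             # index of the dyadic interval containing omega
    left, right = Fraction(j, 2 ** n), Fraction(j + 1, 2 ** n)
    return (nu_cdf(right) - nu_cdf(left)) / (right - left)

def nu_cdf(x):
    # hypothetical example: nu has density 2x w.r.t. Lebesgue measure on [0, 1)
    return x * x

omega = Fraction(1, 3)
for n in (1, 2, 4, 8, 16):
    print(n, float(dyadic_ratio(nu_cdf, omega, n)))
```

In this example the ratios approach the density $2\omega$ at the chosen point, in line with part (iii) of Theorem 21.1.2.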

We conclude this section with a result that establishes the measurability of Radon–
Nikodym’s decomposition between σ–finite kernels.

Theorem 21.1.4. (de Possel, Doob) Let $\mu$ and $\nu$ be σ–finite kernels from $(S,\mathcal{S})$ to $(T,\mathcal{T})$.
If $\mathcal{T}$ is countably generated, then there is a measurable function $X : (S\times T,\mathcal{S}\otimes\mathcal{T})\to[0,\infty]$
such that for all $B\in\mathcal{T}$
$$\nu(s,B)=\int_BX(s,t)\mathbf{1}_{\{X<\infty\}}\,\mu(s,dt)+\nu\big(s,B\cap\{X=\infty\}\big).\tag{21.7}$$
For each $s\in S$, $X(s,\cdot)$ is unique $(\mu+\nu)_s$–a.s.; the kernels $\nu^a:=(X\mathbf{1}_{\{X<\infty\}})\cdot\mu$ and $\nu^s:=\mathbf{1}_{\{X=\infty\}}\cdot\nu$ are the absolutely continuous and the singular part, respectively, of the Radon–Nikodym decomposition of $\nu$ w.r.t. $\mu$; the sets $\{r\in S : \nu(r,dt)\ll\mu(r,dt)\}$ and $\{r\in S : \nu(r,dt)\perp\mu(r,dt)\}$ belong to $\mathcal{S}$.

Proof. Suppose $\mu$ and $\nu$ are σ–finite. Then $\rho:=\mu+\nu$ is also σ–finite; hence, there exists
a function $f : (S\times T,\mathcal{S}\otimes\mathcal{T})\to(0,\infty)$ such that $\int f(s,t)\,\rho(s,dt)=\mathbf{1}_{\{\rho(s,T)\ne0\}}$. We may
assume without loss of generality that $\rho(s,T)>0$ for all $s\in S$. It follows that $\rho'=f\cdot\rho$ is
a stochastic kernel, and $\mu'=f\cdot\mu$ and $\nu'=f\cdot\nu$ are finite kernels.

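Let $\mathcal{B}_n\subset\mathcal{T}$ be a sequence of finite partitions of $T$ such that $\mathcal{B}_{n+1}$ is a refinement of $\mathcal{B}_n$ and $\sigma\big(\bigcup_n\mathcal{B}_n\big)=\mathcal{T}$.
For each $n$, let
$$X_n(s,t)=\sum_{B\in\mathcal{B}_n}\frac{\nu'(s,B)}{\rho'(s,B)}\mathbf{1}_B(t)\mathbf{1}_{(0,\infty)}(\rho'(s,B)).\tag{21.8}$$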

Notice that $0\le X_n\le1$. For each $s\in S$, the sequence of functions $t\mapsto X_n(s,t)$ is a
bounded martingale with respect to $\{\sigma(\mathcal{B}_n) : n\in\mathbb{N}\}$ and the measure $\rho'_s(\cdot):=\rho'(s,\cdot)$.
Hence, $X_n(s,\cdot)$ converges $\rho'_s$–a.s. (and in $L_1(\rho'_s)$ by dominated convergence) to some $\mathcal{T}$–measurable function $0\le X'(s,\cdot)\le1$. $X'$ admits a product–measurable version given by
$$X'(s,t)=\limsup_nX_n(s,t),\qquad(s,t)\in S\times T.$$
From (21.8) we have that
$$\nu'(s,B)=\int_BX_n(s,t)\,\rho'(s,dt),\qquad B\in\mathcal{B}_n.$$
By $L_1(\rho'_s)$ convergence, $\nu'(s,dt)=X'(s,t)\cdot\rho'(s,dt)$ in $\sigma(\mathcal{B}_n)$ for each $n\in\mathbb{N}$, which extends
to $\mathcal{T}=\sigma\big(\bigcup_n\sigma(\mathcal{B}_n)\big)$ by the monotone class theorem. Any other function $Y$ such that
$Y\cdot\rho=X'\cdot\rho$ satisfies $Y(s,\cdot)=X'(s,\cdot)$ $\rho_s$–a.s. for each $s\in S$. From $\rho'=\mu'+\nu'$, we obtain
$(1-X')\cdot f\cdot\nu=(X'\cdot f)\cdot\mu$. Hence
$$\begin{aligned}f\cdot\nu&=(\mathbf{1}_{\{X'<1\}}+\mathbf{1}_{\{X'=1\}})\cdot f\cdot\nu\\&=\mathbf{1}_{\{X'<1\}}\frac{X'}{1-X'}\cdot f\cdot\mu+\mathbf{1}_{\{X'=1\}}\cdot f\cdot\nu\\&=\mathbf{1}_{\{X<\infty\}}X\cdot f\cdot\mu+\mathbf{1}_{\{X=\infty\}}\cdot f\cdot\nu,\end{aligned}$$
where $X=\frac{X'}{1-X'}$ with $1/0:=\infty$. Consequently $\nu=\mathbf{1}_{\{X<\infty\}}X\cdot\mu+\mathbf{1}_{\{X=\infty\}}\cdot\nu$.

Clearly $\nu^a:=\mathbf{1}_{\{X<\infty\}}X\cdot\mu\ll\mu$, and for any $B\in\mathcal{T}$ the map $s\mapsto\nu^a(s,B)=\int_BX(s,t)\mathbf{1}_{\{X(s,\cdot)<\infty\}}(t)\,\mu(s,dt)$ is $\mathcal{S}$–measurable. Since $\{X=\infty\}=\{X'=1\}$ and
$$\mu'(s,X'(s,\cdot)=1)=\int_{\{X'(s,\cdot)=1\}}X'(s,t)\,\mu'(s,dt)=\int_{\{X'(s,\cdot)=1\}}(1-X'(s,t))\,\nu'(s,dt)=0,$$
we conclude that $\mu\perp\nu^s:=\mathbf{1}_{\{X=\infty\}}\cdot\nu$. The measurability of $r\mapsto\nu^s(r,B)$ follows from
$\nu^s=\nu-\nu^a$. The uniqueness of $X$ follows from the uniqueness of $X'=1-\frac1{1+X}$.

21.2. Disintegration of Stochastic kernels

In this section we use Doob’s martingale method described above to obtain disintegration
formulas for stochastic kernels from arbitrary measurable spaces to product of nice spaces
(product of Polish spaces or of Borel sets of Polish spaces).
Theorem 21.2.1. Let $\rho$ be a stochastic kernel from a measurable space $(S,\mathcal{S})$ to $(T,\mathcal{T})\otimes(U,\mathcal{U})$, where $(T,\mathcal{T})$ and $(U,\mathcal{U})$ are Borel sets of Polish spaces. There exist stochastic
kernels $\nu$ from $S$ to $T$ and $\mu$ from $S\times T$ to $U$ such that $\rho=\nu\otimes\mu$, that is,
$$\int_{T\times U}f(t,u)\,\rho(s,dt\times du)=\int_T\int_Uf(t,u)\,\mu((s,t),du)\,\nu(s,dt)$$
for all nonnegative $\mathcal{T}\otimes\mathcal{U}$–measurable functions $f$.

Proof. For each $B\in\mathcal{U}$ consider the measures $\nu_B(s,dt)=\rho(s,dt\times B)$. It is easy to verify
that each $\nu_B$ is a kernel from $S$ to $T$; moreover, $\nu_B\le\nu_U$ for all $B\in\mathcal{U}$, and so $\nu_B\ll\nu_U$,
and $\nu_U$ is a stochastic kernel. We use $\nu$ to denote $\nu_U$. As $(T,\mathcal{T})$ is countably generated,
by Theorem 21.1.4, there exists a measurable function $h_B : S\times T\to[0,\infty)$ such that
$\nu_B=h_B\cdot\nu_U$ for each $B\in\mathcal{U}$. For each $s\in S$ and $B\in\mathcal{U}$, the function $h_B(s,\cdot)$ is uniquely
determined $\nu(s,dt)$–a.s., and
(1) $h_U(s,\cdot)=1$ $\nu(s,dt)$–a.s.
(2) For all $A,B\in\mathcal{U}$ with $A\subset B$, $h_A(s,\cdot)\le h_B(s,\cdot)$ $\nu(s,dt)$–a.s.
(3) For any monotone sequence $\{B_n : n\in\mathbb{N}\}\subset\mathcal{U}$ with $B_n\nearrow B$, $h_{B_n}(s,\cdot)\nearrow h_B(s,\cdot)$
$\nu(s,dt)$–a.s.
We will consider only the case where $U$ is uncountable, in which case, by the measurable
isomorphism theorem 3.9.15, we may assume without loss of generality that $(U,\mathcal{U})=(\mathbb{R},\mathcal{B}(\mathbb{R}))$. For each $s\in S$ and each set $D\subset S\times T$, we denote $D_s=\{t\in T : (s,t)\in D\}$.
The sets defined by
$$\Omega(p,q)=\{(s,t) : h_{(-\infty,p]}(s,t)\le h_{(-\infty,q]}(s,t)\},$$
where $p<q$, and
$$\Omega(-\infty)=\Big\{(s,t) : \inf_{p\in\mathbb{Q}}h_{(-\infty,p]}(s,t)=0\Big\},\qquad\Omega(+\infty)=\Big\{(s,t) : \sup_{p\in\mathbb{Q}}h_{(-\infty,p]}(s,t)=1\Big\}$$

are $\mathcal{S}\otimes\mathcal{T}$–measurable, and properties (1)–(3) imply that
$$\nu\big(s,\Omega(p,q)_s\big)=\nu\big(s,\Omega(-\infty)_s\big)=\nu\big(s,\Omega(+\infty)_s\big)=1$$
for each $s\in S$ and all $p<q$. Thus $\Omega:=\Omega(-\infty)\cap\Omega(+\infty)\cap\bigcap_{p,q\in\mathbb{Q},\,p<q}\Omega(p,q)$ is $\mathcal{S}\otimes\mathcal{T}$–measurable, $\nu(s,\Omega_s)=1$ for each $s\in S$, and
$$H((s,t),x):=\inf_{q\in\mathbb{Q},\,x<q}h_{(-\infty,q]}(s,t)\mathbf{1}_\Omega(s,t)+\mathbf{1}_{[0,\infty)}(x)\mathbf{1}_{\Omega^c}(s,t)\tag{21.9}$$
is a distribution function for each $(s,t)\in S\times T$, i.e.
$$\lim_{x\to-\infty}H((s,t),x)=0,\qquad\lim_{x\to+\infty}H((s,t),x)=1,$$
and $x\mapsto H((s,t),x)$ is nondecreasing and right–continuous. Since the infimum in (21.9) is taken
over a countable set, for any $x\in\mathbb{R}$ the map $(s,t)\mapsto H((s,t),x)$ is $\mathcal{S}\otimes\mathcal{T}$–measurable.
As a consequence, for each (s, t) there is a unique measure µ((s, t), du) on (U, U ) whose
distribution is given by H((s, t), ·). We claim that µ is in fact a kernel from S × T to U .
Indeed, let C be the collection of sets B ∈ U for which the map (s, t) 7→ µ((s, t), B) is
S ⊗ T –measurable. It is obvious that C is a λ–system that contains all the intervals (a, b],
−∞ ≤ a < b < ∞. Sierpinski’s monotone class theorem implies that C = U .

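Let $\tilde\rho=\nu\otimes\mu$. We prove that $\tilde\rho=\rho$. It is enough to show that
$$\tilde\rho(s,A\times B)=\rho(s,A\times B)\tag{21.10}$$
for all $s\in S$, $A\times B\in\mathcal{T}\times\mathcal{U}$. Fix $s\in S$ and $A\in\mathcal{T}$. Then, by construction, for any $q\in\mathbb{Q}$
$$\begin{aligned}\tilde\rho(s,A\times(-\infty,q])&=\int_{\Omega_s}\mathbf{1}_A(t)\Big(\int_{(-\infty,q]}\mu((s,t),du)\Big)\nu(s,dt)\\&=\int_{\Omega_s\cap A}\mu\big((s,t),(-\infty,q]\big)\,\nu(s,dt)\\&=\int_Ah_{(-\infty,q]}(s,t)\,\nu(s,dt)=\rho(s,A\times(-\infty,q]).\end{aligned}$$
By monotone convergence $\tilde\rho(s,A\times(-\infty,x])=\rho(s,A\times(-\infty,x])$ for all $x\in\mathbb{R}$. By monotone
class arguments it follows that $\tilde\rho(s,A\times B)=\rho(s,A\times B)$ for all $B\in\mathcal{U}$. As $A$ is arbitrary,
we conclude that (21.10) holds for each $s\in S$ and all measurable boxes $A\times B\in\mathcal{T}\times\mathcal{U}$;
moreover, for each $s\in S$ and $B\in\mathcal{U}$, $h_B(s,\cdot)=\mu((s,\cdot),B)$ $\nu(s,dt)$–a.s.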

21.3. Exchangeability
Suppose $X=\{X_i : i\in I\}$ is a collection of measurable functions defined on some probability
space $(\Omega,\mathcal{F},\mathbb{P})$ with values in some Polish space $(E,d)$. Denote by $S_I$ the collection of all
finite permutations $\rho$ on $I$, that is, $\rho : I\to I$ is bijective and $\rho(i)=i$ for all but finitely many
$i\in I$.

Definition 21.3.1. The collection $X$ is exchangeable if the joint law of $X$ is the same as
the joint law of $X_\rho=\{X_{\rho(i)} : i\in I\}$, for any $\rho\in S_I$.

A simple example is an i.i.d. sequence $\{X_n\}$. In this section we will derive a result that
characterizes exchangeable sequences of $E$–valued functions. For simplicity, we will consider
the canonical probability space of sequences in $E$ invariant under finite permutations; that
is, $\Omega=E^{\mathbb{N}}$, $\mathcal{F}=\mathcal{B}^{\otimes\mathbb{N}}(E)$, and $\mathbb{P}$ is a probability measure on $\mathcal{F}$ such that $\mathbb{P}=\mathbb{P}\circ\rho^{-1}$ for
any $\rho\in S_{\mathbb{N}}$. We will use $X_n(x)=x_n$ to denote the projection onto the $n$–th component (or
copy of $E$).
Given a permutation $\rho\in S_n$ ($S_n:=S_{\{1,\dots,n\}}$), we will consider its extension, also denoted
by $\rho$, to $S_{\mathbb{N}}$ by setting $\rho(j)=j$ for all $j>n$. Similarly, given a measurable function
$f : E^n\to\mathbb{R}$, we consider its extension to $E^{\mathbb{N}}$, denoted also by $f$, as $f(x_1,\dots,x_n,\dots)=f(x_1,\dots,x_n)$.
A measurable function $F : E^{\mathbb{N}}\to\mathbb{R}$ is symmetric if $F(x_\rho)=F(x)$ for all $\rho\in S_{\mathbb{N}}$. A
measurable function $f : E^n\to\mathbb{R}$ is $n$–symmetric if $f(x_\rho)=f(x)$ for all $\rho\in S_n$.
Example 21.3.2. $S(x)=\frac1n\sum_{k=1}^nx_k$ is $n$–symmetric, but not symmetric as a function
on $\mathbb{R}^{\mathbb{N}}$. The function $a(x)=\limsup_n\frac1n\sum_{k=1}^nx_k$ is a symmetric $\overline{\mathbb{R}}$–valued function. If $B\in\mathcal{B}(E)$, $F(x)=\mathbf{1}_{\{X_n\in B,\ \text{i.o.}\}}(x)$ is symmetric.

The n–exchangeable σ–algebra, En , is the σ–algebra generated by the n–symmetric

functions. Similarly, the exchangeable σ–algebra, E, is the σ–algebra generated by the
symmetric functions.
Given a measurable function f : E n −→ R, we define its n–symmetrization as the
function An (f ) given by
\[
A_n(f)(x) = \frac{1}{n!}\sum_{\rho\in S_n} f(x_\rho),
\]
where $x_\rho = (x_{\rho(1)}, x_{\rho(2)}, \ldots)$.
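The symmetrization can be checked by brute force on small examples. The following is a minimal sketch, not part of the original notes: the helper `symmetrize` and the test function `g` are hypothetical names, and x is represented as a finite tuple standing in for the first coordinates of a sequence.

```python
from itertools import permutations
from math import factorial


def symmetrize(f, n):
    """Return A_n(f), the average of f over all permutations of the first n coordinates."""
    def a_n_f(x):
        head, tail = tuple(x[:n]), tuple(x[n:])
        total = 0.0
        for rho in permutations(range(n)):
            # x_rho = (x_{rho(1)}, ..., x_{rho(n)}, x_{n+1}, ...)
            total += f(tuple(head[i] for i in rho) + tail)
        return total / factorial(n)
    return a_n_f


# Hypothetical test function: depends on the first two coordinates only
# and is not symmetric in them.
def g(x):
    return x[0] * x[1] ** 2


A3g = symmetrize(g, 3)
print(A3g((1.0, 2.0, 3.0)), A3g((3.0, 1.0, 2.0)))
```

Both printed values agree, which illustrates that A3(g) is 3–symmetric even though g itself is not.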

Lemma 21.3.3. Let X = (Xn )n∈N be an exchangeable sequence. If $g : E^k \to \mathbb{R}$ is measurable
and $g\circ X \in L_1$, then
\[
E[g(X)\mid\mathcal{E}_n] = E[g(X_\rho)\mid\mathcal{E}_n] = A_n(g)(X)
\]
for all ρ ∈ Sn .

Proof. Suppose A ∈ En ; then 1A (Xρ ) = 1A (X) for all ρ ∈ Sn . Hence, by exchangeability,
\[
\int 1_A(X)\,g(X)\,dP = \int 1_A(X_\rho)\,g(X_\rho)\,dP = \int 1_A(X)\,g(X_\rho)\,dP.
\]
This shows that E[g(X)|En ] = E[g(Xρ )|En ]. Since $A_n g(X) = \frac{1}{n!}\sum_{\rho\in S_n} g(X_\rho) = A_n(g)(X_\rho)$
is En –measurable, averaging over ρ ∈ Sn we conclude that E[g(X)|En ] = An g(X).

The following result is a direct application of the backwards martingale theorem.


Theorem 21.3.4. Suppose $g : E^k \to \mathbb{R}$ is a measurable function such that $g\circ X \in L_1$.
Then
\[
A_n g(X) \xrightarrow{\;n\to\infty\;} E[g(X)\mid\mathcal{E}] = E[g(X)\mid\mathcal{T}]
\]
P–a.s. and in L1 , where T is the tail σ–field.

Proof. Observe that En is a decreasing sequence of σ–algebras and that
\[
A_n g(X) = E[g(X)\mid\mathcal{E}_n].
\]
Thus (An g(X), En ) defines a reversed martingale, so An g(X) converges P–a.s. and in
L1 to $Ag(X) = E[g(X)\mid\mathcal{E}]$ since $\bigcap_n \mathcal{E}_n = \mathcal{E}$. To show that Ag(X) ∈ T , we show that
Ag(X) ∈ σ(Xj : j > m) for all m ∈ N. For m fixed, let $\Delta^n_m$ be the collection of permutations
ρ ∈ Sn such that $\rho(\{1,\ldots,k\})\cap\{1,\ldots,m\}\neq\emptyset$. Then
\[
A_n g(X) = \frac{1}{n!}\sum_{\rho\in\Delta^n_m} g(X_\rho) + \frac{1}{n!}\sum_{\rho\notin\Delta^n_m} g(X_\rho) = B_{n,m} + G_{n,m}.
\]
The term $B_{n,m}$ on the right–hand side involves at most $km\,(n-1)!$ terms; thus, by the exchangeability
of X, $\|B_{n,m}\|_1 \le \frac{km}{n}\|g(X_1,\ldots,X_k)\|_1 \to 0$ as n → ∞. Therefore,
$Ag(X) = \lim_n G_{n,m} \in \sigma(X_j : j > m)$.
Theorem 21.3.5. (de Finetti) The sequence (Xn ) is exchangeable iff there exists a σ–
algebra A ⊂ F such that (Xn ) is i.i.d. conditioned to A . In either case, A can be taken
to be the exchangeable σ–algebra E or any sub σ–algebra of it.

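Proof. Necessity: Suppose (Xn ) is exchangeable. For each k ∈ N let $f_k : E \to \mathbb{R}$ be a
bounded measurable function. Let $g_k(x) = \prod_{j=1}^k f_j(x_j)$. Then
\[
\begin{aligned}
A_n(g_{k-1})(X)\,A_n(f_k)(X) &= \frac{1}{n!}\sum_{\rho\in S_n} g_{k-1}(X_\rho)\,\frac{1}{n}\sum_{j=1}^n f_k(X_j)\\
&= \frac{n-k+1}{n}\,\frac{1}{n!}\sum_{\rho\in S_n} g_k(X_\rho) + E_{n,k}(X)\\
&= \frac{n-k+1}{n}\,A_n(g_k)(X) + E_{n,k}(X).
\end{aligned}
\]
The term $E_{n,k}(X)$ is estimated by
\[
\|E_{n,k}(X)\|_u \le \|g_{k-1}\|_u\|f_k\|_u\,\frac{1}{n!}\,\frac{1}{n}\sum_{\rho\in S_n}\sum_{j=1}^n 1_{\rho(\{1,\ldots,k-1\})}(j)
= \|g_{k-1}\|_u\|f_k\|_u\,\frac{1}{n!}\,\frac{1}{n}\,n!\,(k-1) \to 0.
\]
Therefore, letting n → ∞ and applying Theorem 21.3.4, if A = E or T ,
\[
\begin{aligned}
E[g_{k-1}(X_1,\ldots,X_{k-1})\mid\mathcal{A}]\,E[f_k(X_1)\mid\mathcal{A}] &= E[g_{k-1}(X_1,\ldots,X_{k-1})\mid\mathcal{A}]\,E[f_k(X_k)\mid\mathcal{A}]\\
&= E[g_{k-1}(X_1,\ldots,X_{k-1})\,f_k(X_k)\mid\mathcal{A}];
\end{aligned}
\]
that is, conditioned to A , (Xn ) is an i.i.d. sequence.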

Sufficiency: Suppose (Xn ) is i.i.d. when conditioned to A . Then, for any integrable function
$\varphi : E^n \to \mathbb{R}$,
\[
E[\varphi(X)] = E\big[E[\varphi(X)\mid\mathcal{A}]\big] = E\big[E[\varphi(X_\rho)\mid\mathcal{A}]\big] = E[\varphi(X_\rho)]
\]
for all ρ ∈ SN.

For each n ∈ N there is a natural measurable map ξn from (Ω, F , P) to the space of
probability measures on (E, B(E)) given by $x \mapsto \frac{1}{n}\sum_{k=1}^n \delta_{x_k}$. This is the n–empirical
measure of P. The following result gives a detailed description of the limiting distribution
of the empirical measures ξn as n → ∞.
Theorem 21.3.6. (Xn ) is an exchangeable sequence iff there is a σ–algebra A and an
A –measurable random measure ξ on (Ω, F ) such that conditioned to A , (Xn ) is an i.i.d.
sequence with
\[
E[g(X_1)\mid\mathcal{A}] = E[g(X_1)\mid\xi] = \int g(x)\,\xi(dx)
\]
for any bounded g ∈ B(E). Moreover, for any g ∈ Cb (E),
\[
A_n(g)(X) = E[g(X_1)\mid\mathcal{E}_n] = \int g(y)\,\xi_n(dy) \to \int g(y)\,\xi(dy)
\]
as n → ∞.

Proof. Sufficiency is contained in de Finetti’s theorem.

Necessity: By de Finetti’s theorem, the sequence (Xn ), conditioned on A = E, is i.i.d.
Since E is a Polish space, there exists a regular conditional probability
ξ(x; ·) = P[X1 ∈ ·|A ](x),
which is obviously A –measurable.
The last statement follows from Theorem 21.3.4 with k = 1 and g ∈ Cb (E).

21.4. Exercises
Exercise 21.4.1. Apply Theorem 21.1.1 to provide a probabilistic proof of the Radon–
Nikodym theorem in the case where B is countably generated. Observe that the conditional
expectations that are needed have elementary definitions in this case and there is no need
of circular reasoning.
Appendix A

Infinite series on Banach spaces

Recall that for any numerical sequence {an : n ∈ N}, the series $\sum_n a_n$ is convergent if
$\lim_{n\to\infty}\sum_{k=1}^n a_k$ exists. The series $\sum_n a_n$ is absolutely convergent when $\sum_n |a_n|$ converges.
If $\sum_n a_n$ converges but is not absolutely convergent, we say that the series $\sum_n a_n$ is
conditionally convergent. It is a well known result that when $\sum_n a_n$ is absolutely convergent,
then $\sum_n a_n$ is convergent. All this is easily extended to sequences {an : n ∈ N} in a complete
(real or complex) normed space X by substituting the modulus | | in R or C by the norm
‖ ‖ in X. Throughout this section, we will assume that (X, | |) is a Banach space.

A.1. Properties of absolutely convergent series

Let $f : \mathbb{N}\to\mathbb{N}$ be a bijection and let $b_n = a_{f(n)}$; then the series $\sum_n b_n$ is called a rearrangement
of $\sum_n a_n$. We have the following result.
Theorem A.1.1. Suppose $\sum_n a_n$ is absolutely convergent, and let $\sum_n b_n$ be any rearrangement
of $\sum_n a_n$. Then, $\sum_n b_n$ is absolutely convergent and $\sum_n a_n = \sum_n b_n$.

Proof. For any n ∈ N, let N = N (n) = max{f (k) : 1 ≤ k ≤ n}. Then
\[
\sum_{k=1}^n |b_k| \le \sum_{k=1}^N |a_k| \le \sum_{k=1}^\infty |a_k|.
\]
Thus $\sum_n b_n$ is absolutely convergent. Let $s := \sum_n a_n$. Given ε > 0 there is N1 = N1 (ε) ∈ N
such that $\big|\sum_{k=1}^n a_k - s\big| < \frac{\varepsilon}{2}$ and $\sum_{k>n}|a_k| < \frac{\varepsilon}{2}$ whenever n ≥ N1 . Setting
$N_2 := \max\{f^{-1}(k) : 1\le k\le N_1\}$ we obtain that {1, . . . , N1 } ⊂ {f (1), . . . , f (N2 )}. Hence, for n > N2 ,
\[
\Big|\sum_{k=1}^n b_k - s\Big| \le \Big|\sum_{k=1}^n b_k - \sum_{m=1}^{N_1} a_m\Big| + \Big|\sum_{m=1}^{N_1} a_m - s\Big|
< \sum_{k>N_1}|a_k| + \frac{\varepsilon}{2} < \varepsilon.
\]
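As a numerical illustration of Theorem A.1.1, the sketch below compares partial sums of an absolutely convergent series with those of one particular rearrangement. The choice $a_n = (-1)^{n+1}/n^2$ (whose sum is the known value $\pi^2/12$), the pattern "two odd indices, then one even index", and the function names are assumptions made only for this example.

```python
import math


def a(n):
    """Terms of an absolutely convergent series: a_n = (-1)^(n+1) / n^2."""
    return (-1) ** (n + 1) / n ** 2


def rearranged_indices(blocks):
    """Yield f(1), f(2), ... for a bijection f of N: two odd indices, then one even index."""
    odd, even = 1, 2
    for _ in range(blocks):
        yield odd
        yield odd + 2
        yield even
        odd += 4
        even += 2


blocks = 20000
original_sum = sum(a(n) for n in range(1, 3 * blocks + 1))
rearranged_sum = sum(a(k) for k in rearranged_indices(blocks))
exact = math.pi ** 2 / 12  # known value of sum (-1)^(n+1)/n^2
print(original_sum, rearranged_sum, exact)
```

Both partial sums approach the same limit. For the conditionally convergent series with terms $(-1)^{n+1}/n$ the same rearrangement would change the sum, which is why absolute convergence is essential in the theorem.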
Theorem A.1.2. Let f : N × N → N be a bijection. Suppose $\sum_n a_n$ is an absolutely
convergent series, and for each k ∈ N define $b_k(n) := a_{f(k,n)}$. Then,
(i) For each k, $\sum_n b_k(n)$ is an absolutely convergent subseries of $\sum_n a_n$.
(ii) If $s_k := \sum_n b_k(n)$, the series $\sum_k s_k$ converges absolutely and $\sum_k s_k = \sum_n a_n$.
Proof. (i) is a consequence of Theorem A.1.1. To prove (ii), set $t_k := \sum_{j=1}^k |s_j|$. Then
\[
t_k \le \sum_{j=1}^k\sum_{n=1}^\infty |b_j(n)| = \sum_{n=1}^\infty\sum_{j=1}^k |b_j(n)| = \sum_{n=1}^\infty\sum_{j=1}^k |a_{f(j,n)}| \le \sum_{n=1}^\infty |a_n|.
\]
Thus, $\sum_k s_k$ is absolutely convergent. Let $S = \sum_n a_n$. Given ε > 0, there is N = N (ε)
such that $\sum_{k>n}|a_k| < \frac{\varepsilon}{2}$ whenever n ≥ N . Let r be the minimum integer such that
$\{1,\ldots,N\}\subset\bigcup_{j=1}^r f(j,\mathbb{N})$. For n > N ∨ r,
\[
\Big|\sum_{k=1}^n s_k - \sum_{k=1}^n a_k\Big| \le \sum_{k>N}|a_k| < \frac{\varepsilon}{2},
\qquad
\Big|\sum_{k=1}^n a_k - \sum_{k=1}^\infty a_k\Big| \le \sum_{k>n}|a_k| < \frac{\varepsilon}{2}.
\]
Putting things together, we have that $\big|\sum_{k=1}^n s_k - \sum_{k=1}^\infty a_k\big| < \varepsilon$ whenever n > N ∨ r. This
completes the proof of (ii).
Lemma A.1.3. For any sequence (an ) ⊂ X,
\[
\liminf_n \frac{|a_{n+1}|}{|a_n|} \le \liminf_n \sqrt[n]{|a_n|} \le \limsup_n \sqrt[n]{|a_n|} \le \limsup_n \frac{|a_{n+1}|}{|a_n|}.
\]
Proof. Let $\beta^* = \limsup_n \frac{|a_{n+1}|}{|a_n|}$, $\alpha^* = \limsup_n \sqrt[n]{|a_n|}$, $\beta_* = \liminf_n \frac{|a_{n+1}|}{|a_n|}$ and $\alpha_* = \liminf_n \sqrt[n]{|a_n|}$.
If $\beta^* < \infty$, then for any $b > \beta^*$ fixed, there exists N ∈ N such that
\[
|a_{n+1}| < b\,|a_n| \quad\text{for all } n \ge N.
\]
Hence, $|a_{m+N}| \le b^m |a_N|$ for all m > 0; consequently,
\[
\sqrt[n]{|a_n|} \le b^{1-N/n}\,\sqrt[n]{|a_N|}
\]
for n > N . Letting n → ∞ and then $b \searrow \beta^*$ shows that $\alpha^* \le \beta^*$. A similar argument
shows that $\beta_* \le \alpha_*$.
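The inequalities in Lemma A.1.3 can be strict. A small sketch, assuming the illustrative sequence $a_n = 2^{-n+(-1)^n}$ (not taken from the notes): the ratios oscillate between 1/8 and 2, while the n–th roots converge to 1/2.

```python
def exponent(n):
    """a_n = 2 ** exponent(n), with exponent(n) = -n + (-1)^n."""
    return -n + (-1) ** n


def a(n):
    return 2.0 ** exponent(n)


ns = range(200, 401)
ratios = [a(n + 1) / a(n) for n in ns]
roots = [a(n) ** (1.0 / n) for n in ns]
print(min(ratios), min(roots), max(roots), max(ratios))
```

For this sequence the n–th roots have limit 1/2 < 1, so the root criterion detects absolute convergence, whereas the lim sup of the ratios equals 2 and gives no information.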
Theorem A.1.4. (i) If $\limsup_n \sqrt[n]{|a_n|} < 1$, then $\sum_n a_n$ converges absolutely.
(ii) If $\limsup_n \sqrt[n]{|a_n|} > 1$, then $\sum_n a_n$ diverges.
(iii) If $\limsup_n \sqrt[n]{|a_n|} = 1$, the test is inconclusive: $\sum_n a_n$ may converge or diverge.

Proof. Suppose $\alpha^* = \limsup_n \sqrt[n]{|a_n|} < 1$. Then for $\alpha^* < A < 1$, there exists N ∈ N such
that $\sqrt[n]{|a_n|} < A$ for all n ≥ N . Thus, $|a_n| < A^n$ for all n ≥ N and since the geometric
series $\sum_n A^n$ converges, it follows that $\sum_n a_n$ converges absolutely.
On the other hand, if $\alpha^* > 1$, then for any fixed $1 < A < \alpha^*$, there are infinitely many $a_n$
with $|a_n| > A^n$. Therefore, $a_n \not\to 0$, and so $\sum_n a_n$ diverges.
The series $\sum_n a_n$ and $\sum_n b_n$ with $a_n = 1/n$ and $b_n = 1/n^2$ diverge and converge respectively;
in both cases, $\lim_n \sqrt[n]{a_n} = 1 = \lim_n \sqrt[n]{b_n}$.

For conditionally convergent series we present the following result.

Theorem A.1.5. Let {an : n ∈ N} ⊂ C and {xn : n ∈ N} ⊂ X be sequences such that
$\lim_{n\to\infty} a_n = 0$, $\sum_n |a_{n+1} - a_n| < \infty$, and $S_n := \sum_{k=1}^n x_k$ defines a bounded sequence in
X. Then, the series $\sum_n a_n x_n$ converges in X.

Proof. Suppose ‖Sn ‖ ≤ c for all n ∈ N and some constant c > 0. Summation by parts
gives
\[
\sum_{k=1}^n a_k x_k = a_{n+1}S_n - \sum_{k=1}^n (a_{k+1} - a_k)S_k.
\]
The first term on the right hand side converges to 0 in X since $\|a_{n+1}S_n\| \le c\,|a_{n+1}|$. The
second term on the right hand side converges absolutely since $\|(a_{k+1}-a_k)S_k\| \le c\,|a_{k+1}-a_k|$.
Example A.1.6. The simplest application of Theorem A.1.5 is to determine convergence
of alternating series. Suppose an is a nonincreasing sequence converging to 0. Then
$\sum_n (-1)^{n+1} a_n$ converges.
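The summation by parts identity used in the proof of Theorem A.1.5 can be verified exactly on the alternating harmonic series. This is a minimal sketch with the assumed choices $a_k = 1/k$ and $x_k = (-1)^{k+1}$; the function name is hypothetical, and exact rational arithmetic is used so that both sides can be compared with equality.

```python
import math
from fractions import Fraction


def check_summation_by_parts(n):
    """Compare both sides of the identity for a_k = 1/k, x_k = (-1)^(k+1)."""
    a = lambda k: Fraction(1, k)
    x = lambda k: (-1) ** (k + 1)
    S = [0]  # S[k] = x_1 + ... + x_k, with S[0] = 0
    for k in range(1, n + 1):
        S.append(S[-1] + x(k))
    lhs = sum(a(k) * x(k) for k in range(1, n + 1))
    rhs = a(n + 1) * S[n] - sum((a(k + 1) - a(k)) * S[k] for k in range(1, n + 1))
    return lhs, rhs, max(abs(s) for s in S)


lhs, rhs, bound = check_summation_by_parts(50)
print(lhs == rhs, bound, float(lhs), math.log(2))
```

Here the partial sums $S_k$ are bounded by 1, and the partial sum of the series is already close to its known limit log 2.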

We conclude this section by introducing two types of summability of series that will
appear in these notes and which can be extended to complete normed spaces.
Definition A.1.7. Let $s_n = \sum_{k=0}^n b_k$ be the n–th partial sum of the series $\sum_n b_n$.
The series is Cesàro summable if $\sigma_n = \frac{1}{n}\sum_{k=0}^{n-1} s_k$ converges, and Abel summable if
$A(r) = \sum_{n=0}^\infty b_n r^n$ converges as r → 1−.

Theorem A.1.8. (Abel's test) If $\sum_n b_n$ converges and has sum B then, the series $\sum_n b_n$ is
Cesàro and Abel summable, and $B = \lim_{n\to\infty}\sigma_n = \lim_{r\to 1-} A(r)$.
Proof. Let $B_n := \sum_{k=0}^n b_k$. For all integers N ≥ 1, we have by summation by parts that
\[
\sum_{k=0}^N b_k r^k = r^{N}B_N - \sum_{k=0}^{N-1}(r^{k+1}-r^k)B_k = r^{N}B_N + (1-r)\sum_{k=0}^{N-1} r^k B_k.
\]
Since $\sum_n b_n$ converges, {Bn : n ∈ Z+ } is bounded, that is |Bn | ≤ c for some constant c > 0,
and $\sum_n r^n B_n$ converges for 0 ≤ r < 1. Letting N → ∞ gives $A(r) = (1-r)\sum_{n=0}^\infty r^n B_n$.
Given ε > 0 there is N such that n ≥ N implies that $|B_n - B| < \frac{\varepsilon}{2}$.
Breaking the sum in two parts we obtain
\[
\Big|\sum_{n=0}^\infty r^n b_n - B\Big| = \Big|(1-r)\sum_{n=0}^\infty r^n (B_n - B)\Big|
\le (1-r)\sum_{n=0}^{N-1} r^n |B_n - B| + r^N\frac{\varepsilon}{2}
\le (1-r)\,2cN + \frac{\varepsilon}{2}.
\]
Thus, if $|1-r| < \frac{\varepsilon}{4Nc}$ we obtain that $\big|\sum_{n=0}^\infty r^n b_n - B\big| < \varepsilon$. Therefore $\lim_{r\to 1-} A(r) = A(1) := B$.
Cesàro convergence is left as an exercise.
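A quick numerical check of Theorem A.1.8, as a sketch under the assumption $b_n = (-1)^n/(n+1)$, whose sum is log 2 and whose Abel function has the closed form $A(r) = \log(1+r)/r$. The function names and the truncation length are ad hoc choices for illustration.

```python
import math


def b(n):
    """Terms of a convergent series: sum_{n>=0} (-1)^n / (n+1) = log 2."""
    return (-1) ** n / (n + 1)


def cesaro_mean(n):
    """sigma_n = (1/n) * (s_0 + ... + s_{n-1}), where s_k is the k-th partial sum."""
    partial, total = 0.0, 0.0
    for k in range(n):
        partial += b(k)
        total += partial
    return total / n


def abel_sum(r, terms=40000):
    """Truncation of A(r) = sum_n b_n r^n; the truncation length is an ad hoc choice."""
    return sum(b(n) * r ** n for n in range(terms))


print(cesaro_mean(2000), abel_sum(0.999), math.log(2))
```

Both the Cesàro mean and the Abel function evaluated near r = 1 are close to the ordinary sum, as the theorem predicts.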

A.2. Double series

Definition A.2.1. A double sequence a : N × N → R converges to L, which we denote
by $\lim_{p,q\to\infty} a(p,q) = L$, if for any ε > 0, there is an integer N = Nε such that |a(p, q) − L| < ε
whenever p > N and q > N .

Even if the iterated limits $\lim_{p\to\infty}\lim_{q\to\infty} a(p,q)$ and $\lim_{q\to\infty}\lim_{p\to\infty} a(p,q)$ exist and are equal, it
may happen that the double sequence a(p, q) diverges.
Example A.2.2. Consider $a(p,q) = \frac{pq}{p^2+q^2}$. The iterated limits are both zero; however, the
double sequence a(p, q) diverges.
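The behaviour in Example A.2.2 is easy to observe numerically. The following short sketch evaluates the double sequence along three different paths; the particular paths are chosen only for illustration.

```python
def a(p, q):
    return p * q / (p * p + q * q)


# Inner limits: for fixed p, a(p, q) -> 0 as q -> infinity (and symmetrically in p).
print([a(5, 10 ** k) for k in (2, 4, 6)])
# Along the diagonal the values are constantly 1/2; along q = 2p they are 2/5.
print([a(n, n) for n in (10, 1000, 10 ** 6)])
print([a(n, 2 * n) for n in (10, 1000, 10 ** 6)])
```

Since the values along p = q and along q = 2p stay at different constants, no single number L can satisfy Definition A.2.1.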
Theorem A.2.3. Suppose that $\lim_{p,q\to\infty} a(p,q) = \alpha$ and that for any p, the limit $\lim_{q\to\infty} a(p,q)$
exists. Then the iterated limit $\lim_{p\to\infty}\lim_{q\to\infty} a(p,q) = \alpha$.

Proof. For any ε > 0, there is N1 = N1 (ε) such that if p > N1 and q > N1 , then
$|a(p,q) - \alpha| < \frac{\varepsilon}{2}$.
For each p, let $A(p) = \lim_{q\to\infty} a(p,q)$. Therefore, there is N2 = N2 (p, ε) such that if q > N2 ,
then $|a(p,q) - A(p)| < \frac{\varepsilon}{2}$.
For each p > N1 , choose q = q(p) > N1 ∨ N2 . It follows that
\[
|A(p) - \alpha| \le |A(p) - a(p,q)| + |a(p,q) - \alpha| < \varepsilon.
\]
This completes the proof.
Definition A.2.4. Given a double sequence a(m, n) consider the sequence of double partial
sums
\[
s(p,q) = \sum_{m=1}^p\sum_{n=1}^q a(m,n). \tag{A.1}
\]
The double series $\sum a(p,q)$ converges to a sum S if $\lim_{p,q\to\infty} s(p,q) = S$. The double series
$\sum a(p,q)$ is absolutely convergent if $\sum |a(p,q)|$ converges.
Lemma A.2.5. If $\sum_{p,q} a(p,q)$ converges absolutely, then $\sum_{p,q} a(p,q)$ converges.

Proof. Let $\varphi_n := \sum_{p=1}^n\sum_{q=1}^n a(p,q)$. Since $\sum_{p,q}|a(p,q)|$ converges, $\varphi_n$ is a Cauchy sequence
and so it converges to some point, say S. Given ε > 0, there is N1 such that
$|\varphi_n - S| < \frac{\varepsilon}{2}$ whenever n ≥ N1 . Let $s = \sum_{p,q}|a(p,q)|$. For ε > 0, there is N2 such that
when p, q ≥ N2
\[
s - \sum_{m=1}^p\sum_{n=1}^q |a(m,n)| < \frac{\varepsilon}{2}.
\]
For p, q > N := N1 ∨ N2
\[
|s(p,q) - S| \le |s(p,q) - \varphi_N| + |\varphi_N - S| \le s - \sum_{m=1}^N\sum_{n=1}^N |a(m,n)| + \frac{\varepsilon}{2} < \varepsilon.
\]
This shows that $\sum_{p,q} a(p,q)$ converges to S.
Remark A.2.6. If a(m, n) is a nonnegative double sequence, and $s(p,q) := \sum_{m=1}^p\sum_{n=1}^q a(m,n)$
is a bounded double sequence, then the series $\sum a(p,q)$ converges. In particular, if the
iterated sum $\sum_{m=1}^\infty\sum_{n=1}^\infty a(m,n)$ converges and has limit s, then $\sum_{p,q} a(p,q)$ converges to s.
Theorem A.2.7. Suppose $\sum_{n,m} a(n,m)$ is an absolutely convergent double series. Let
g : N → N × N be a bijection. Consider the rearrangement G(n) = a(g(n)). Then $\sum_n G(n)$
is absolutely convergent and
\[
\sum_{n=1}^\infty G(n) = \lim_{p,q\to\infty} s(p,q) = \lim_{p\to\infty}\lim_{q\to\infty} s(p,q) = \lim_{q\to\infty}\lim_{p\to\infty} s(p,q).
\]
That is
\[
\sum_n G(n) = \sum_{n,m} a(n,m) = \sum_{n=1}^\infty\sum_{m=1}^\infty a(n,m) = \sum_{m=1}^\infty\sum_{n=1}^\infty a(n,m).
\]

Proof. Let $T_k = \sum_{j=1}^k |G(j)|$ and $S(p,q) = \sum_{m=1}^p\sum_{n=1}^q |a(m,n)|$. For any k, there is
a pair (p, q) such that Tk ≤ S(p, q). Conversely, for any pair (p, q) there is k such that
S(p, q) ≤ Tk . This shows that $\sum_n G(n)$ is absolutely convergent iff $\sum_{p,q} a(p,q)$ is absolutely
convergent. Let $s(p,q) = \sum_{m=1}^p\sum_{n=1}^q a(m,n)$. Lemma A.2.5 shows that $s := \sum a(p,q) =
\lim_{p,q\to\infty} s(p,q)$ exists.

Let H(n) = a(h(n)) be another rearrangement of a. Then $H(m) = G(g^{-1}h(m))$ is a
rearrangement of G(n) and so $s' := \sum_n G(n) = \sum_n H(n)$. Since $A_p := \sum_{n=1}^\infty a(p,n)$ and
$B_q := \sum_{n=1}^\infty a(n,q)$ are subseries of $\sum_n G(n)$, we obtain from Theorems A.1.2 and A.2.3
that $\sum_p\sum_q a(p,q) = \sum_q\sum_p a(p,q) = \sum_n G(n) = \sum_{p,q} a(p,q)$.
Example A.2.8. Suppose {an : n ∈ N} and {bn : n ∈ N} are bounded sequences. Then
$\sum_{n,m} a_n b_m (nm)^{-s}$ is convergent for all s ∈ C with Re(s) > 1. Moreover,
\[
\sum_{m,n}\frac{a_m b_n}{(mn)^s} = \Big(\sum_{n=1}^\infty\frac{a_n}{n^s}\Big)\Big(\sum_{m=1}^\infty\frac{b_m}{m^s}\Big) = \sum_{n=1}^\infty\frac{1}{n^s}\Big(\sum_{d\mid n} a_d\,b_{n/d}\Big)
\]
where for each n ∈ N, the sum inside parenthesis runs along all (positive integer) divisors d
of n. In particular, for an = 1 = bn , we have that
\[
\Big(\sum_{n=1}^\infty\frac{1}{n^s}\Big)^2 = \sum_{n=1}^\infty\frac{d(n)}{n^s}
\]
where d(n) is the number of divisors of n.
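The last identity can be tested numerically for s = 2, where the left side is $(\pi^2/6)^2$. The sketch below is only an illustration: the sieve `divisor_counts` and the cutoff 20000 are choices made here, and the partial sum underestimates the limit by a tail of order $(\log N)/N$.

```python
import math


def divisor_counts(limit):
    """Return a list d with d[n] = number of positive divisors of n, for 1 <= n <= limit."""
    d = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            d[multiple] += 1
    return d


limit = 20000
d = divisor_counts(limit)
s = 2.0
partial = sum(d[n] / n ** s for n in range(1, limit + 1))
zeta_squared = (math.pi ** 2 / 6) ** 2  # zeta(2)^2, using zeta(2) = pi^2/6
print(partial, zeta_squared)
```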

A.3. Exercises
Exercise A.3.1. Suppose (an : n ∈ N) is a sequence in a Banach space (X, | |). Show that
(i) $\sum_n a_n$ converges iff for any ε > 0 there exists N such that |an + . . . + am | < ε
whenever m > n ≥ N .
(ii) If $\sum_n a_n$ converges absolutely, then $\sum_n a_n$ converges.

Exercise A.3.2. Suppose bn > 0, and $\frac{\|a_{n+1}\|}{\|a_n\|} \le \frac{b_{n+1}}{b_n}$ for all n ≥ N . Show that
(a) If $\sum_n b_n$ converges then so does $\sum_n \|a_n\|$.
(b) If $\sum_n \|a_n\|$ diverges, then so does $\sum_n b_n$.
(c) If $\frac{\|a_{n+1}\|}{\|a_n\|} \le 1 - \frac{p}{n}$ for some p > 1, $\sum_n a_n$ converges absolutely.
(d) If $\frac{\|a_{n+1}\|}{\|a_n\|} \ge 1 - \frac{p}{n}$ for some p ≤ 1, $\sum_n \|a_n\|$ diverges.
Parts (c) and (d) constitute Raabe's test of convergence. (Hint: For 0 < x < 1 define
$\varphi(x) = px + (1-x)^p$. Show that $1 - px \le (1-x)^p$ if p > 1 and $1 - px \ge (1-x)^p$ if p ≤ 1.)
Exercise A.3.3. (Kummer) Let $\sum_n a_n$ be a series in X and bn a sequence of positive
numbers. If there is r > 0 and N ∈ N such that
\[
b_n - b_{n+1}\frac{\|a_{n+1}\|}{\|a_n\|} \ge r, \qquad n \ge N,
\]
then $\sum_n a_n$ is absolutely convergent. Conversely, if there is N ∈ N such that
\[
b_n - b_{n+1}\frac{\|a_{n+1}\|}{\|a_n\|} < 0, \qquad n \ge N,
\]
then $\sum_n a_n$ is not absolutely convergent. When bn = n − 1, we get Raabe's test back.
Exercise A.3.4. (Tauber) The converse of Abel's summability theorem is false in general
unless some additional conditions on the sequence an are imposed. This exercise proves the
simplest of such conditions. Let {an : n ∈ Z+ } ⊂ X be a bounded sequence in X. Let
$f(z) = \sum_{n=0}^\infty a_n z^n$ and $S_N := \sum_{n=0}^N a_n$. Prove that
(a) f converges absolutely for any |z| < 1, and
\[
\|S_N - f(z)\| \le |1-z|\sum_{n=0}^N \|n a_n\| + \frac{1}{N}\sum_{n=N+1}^\infty \|n a_n\|\,|z|^n.
\]
(Hint: $|1 - z^n| \le |1-z|\,n$.)
(b) Deduce from (a) that
\[
\Big\|S_N - f\Big(1 - \frac{1}{N}\Big)\Big\| \le \frac{1}{N}\sum_{n=0}^N \|n a_n\| + \sup_{n>N}\|n a_n\|.
\]
(c) If $n a_n \to 0$ as n → ∞ and $\lim_{r\to 1-} f(r) = \alpha$, prove that $S_N \to \alpha$ as N → ∞.
Exercise A.3.5. Suppose $\sum_n a_n$ is an absolutely convergent series in a Banach space
(X, ‖ ‖) with sum A, and $\sum_n b_n$ is a convergent numeric series with sum b. Show that
(a) For any m ∈ Z+
\[
\sum_{n=0}^m\sum_{k=0}^n b_{n-k}\,a_k = \sum_{k=0}^m\Big(\sum_{j=0}^{m-k} b_j - b\Big)a_k + b\sum_{k=0}^m a_k.
\]
(b) $\sum_{n=0}^m\sum_{k=0}^n b_{n-k}\,a_k$ converges to bA in X as m → ∞.
(Hint: If dn → 0 in C, then $\sum_{k=0}^n d_{n-k}a_k \to 0$ in X. Given ε > 0, let N = N (ε) be such
that $|d_n| < \frac{\varepsilon}{K}$ and $\sum_{m>n}\|a_m\| < \frac{\varepsilon}{L}$ whenever n ≥ N , where K and L are some constants.
Then, for m > 2N ,
\[
\Big\|\sum_{k=0}^m d_{m-k}a_k\Big\| \le \sum_{k=0}^N |d_{m-k}|\,\|a_k\| + \sum_{k=N+1}^{m} |d_{m-k}|\,\|a_k\| < \frac{\varepsilon}{K}\sum_n \|a_n\| + \frac{\varepsilon}{L}\sup_n |d_n|.
\]
Then use part (a) with $d_n = \sum_{m=0}^n b_m - b$.)
Appendix B

Lower semicontinuous and convex functions

B.1. Lower semicontinuous functions

Definition B.1.1. Let (X, τ ) be a topological space. A function f : X → R is lower
semicontinuous if f −1 ((a, ∞)) is open for all a ∈ R. Similarly, f is upper semicontinuous
if −f is lower semicontinuous. f is said to be proper if dom(f ) := {x ∈ X : f (x) < ∞} is
not empty and f (x) > −∞ for all x ∈ X.

Suppose X is a Hausdorff topological space and let f ∈ RX . For any net {xα : α ∈ D}
in X define
\[
\liminf_\alpha f(x_\alpha) := \sup_{\alpha_0\in D}\ \inf_{\alpha\in D:\,\alpha\ge\alpha_0} f(x_\alpha),
\qquad
\limsup_\alpha f(x_\alpha) := \inf_{\alpha_0\in D}\ \sup_{\alpha\in D:\,\alpha\ge\alpha_0} f(x_\alpha).
\]

Theorem B.1.2. f is lower semicontinuous iff for any x ∈ X and any net xα → x,
\[
f(x) \le \liminf_\alpha f(x_\alpha). \tag{B.1}
\]
Similarly, f is upper semicontinuous iff for any x ∈ X and any net xα → x,
\[
f(x) \ge \limsup_\alpha f(x_\alpha).
\]
If in addition X is first countable, the statements above hold with sequences in place of nets.

Proof. Suppose f is lower semicontinuous and let {xn : n ∈ D} be a net that converges
to x. For any α < f (x) the set V = {f > α} is an open neighborhood of x. Hence there
is n0 ∈ D such that n ≥ n0 implies that f (xn ) > α; this implies that α ≤ lim inf n f (xn ).
(B.1) follows by letting α ր f (x).

Conversely, suppose that (B.1) holds for any x ∈ X and any net xn → x. We will show that for each
α ∈ R, the set Fα := {f ≤ α} is closed. Let {xn : n ∈ D} be a net in Fα that converges to
a point x ∈ X. Then f (xn ) ≤ α for all n ∈ D, and so
\[
f(x) \le \liminf_n f(x_n) = \sup_{n\in D}\ \inf_{m\in D:\,m\ge n} f(x_m) \le \alpha.
\]
Therefore x ∈ Fα .
Lemma B.1.3. The epigraph of a function f : X → R is defined as
epi(f ) = {(x, α) ∈ X × R : f (x) ≤ α}.
Then, f is lower semicontinuous iff epi(f ) is closed.

Proof. Suppose f is lower semicontinuous and let {(xn , αn ) : n ∈ D} be a net in epi(f )
converging to (x, α). Then
\[
f(x) \le \liminf_n f(x_n) \le \lim_n \alpha_n = \alpha,
\]
that is, (x, α) ∈ epi(f ).
Conversely, suppose epi(f ) is closed. We will show that {f > c} is open in X for each c ∈ R. If f (x) > c
then $(x, c) \in \mathrm{epi}(f)^c$. As $\mathrm{epi}(f)^c$ is open, there is a neighborhood V of x and an open interval
I containing c such that $V\times I \subset \mathrm{epi}(f)^c$. Hence, for any y ∈ V we have that $(y, c) \in \mathrm{epi}(f)^c$,
that is, f (y) > c. Therefore V ⊂ {f > c}.
Theorem B.1.4. Suppose X is a compact Hausdorff space. If f : X → R is lower semi-
continuous, then there is x0 ∈ X such that f (x0 ) = inf x∈X f (x).

Proof. For any a ∈ f (X) let Fa = {f ≤ a}. Each Fa is closed and the collection {Fa :
a ∈ f (X)} satisfies the finite intersection property. Consequently the set of minimizers
$F := \bigcap_{a\in f(X)} F_a \neq \emptyset$.
Theorem B.1.5. Let X be a locally compact Hausdorff topological space. For any lower
semicontinuous function f ≥ 0, f = sup{φ ∈ C00 (X) : φ ≤ f }.

Proof. Let x0 ∈ X. If f (x0 ) = 0 let ψ ≡ 0. If f (x0 ) > 0, then for any 0 < a < f (x0 ),
Ua = {x ∈ X : f (x) > a} is an open neighborhood of x0 . By Urysohn's lemma, there
is ψa ∈ C00 (X) such that 1{x0 } ≺ ψa ≺ 1Ua . Hence, φa = aψa satisfies a = φa (x0 ) and
0 ≤ φa ≤ f . The conclusion follows by letting a ր f (x0 ).
Theorem B.1.6. Let (S, d) be a metric space and suppose f ∈ RS and f (x) ≥ b > −∞ for
all x ∈ S. The function f is lower semicontinuous if and only if there is a sequence of
bounded Lipschitz continuous functions fk such that inf k,x {fk (x)} ≥ b and fk ր f pointwise.

Proof. Sufficiency is clear since continuous functions are lower semicontinuous, and so is
the supremum of lower semicontinuous functions.
For necessity, it suffices to assume that f ≥ 0. For each t ≥ 0 define
\[
g_t(x) = \inf_{z\in S}\{f(z) + t\,d(x,z)\}.
\]

Clearly 0 ≤ gs ≤ gt whenever s < t, and gt (x) ≤ f (x) + td(x, x) = f (x). Notice that for all
x, y ∈ S, f (z) + td(x, z) ≤ f (z) + td(y, z) + td(x, y); consequently, gt (x) ≤ gt (y) + td(x, y).
By symmetry, we obtain |gt (x) − gt (y)| ≤ td(x, y), which means that each gt is Lipschitz
continuous. If h = limn gn , then 0 ≤ h ≤ f . We will show that h = f . To that purpose, fix
x ∈ S and let ε > 0. For each n ∈ N, there is zn ∈ S such that
(B.2) gn (x) + ε > f (zn ) + nd(x, zn ) ≥ nd(x, zn )
Since f (x) ≥ gn (x), it follows that f (x) + ε > nd(x, zn ) for all n; hence, zn converges to
x. Since f is lower semicontinuous, there is N such that for n ≥ N , f (x) − ε < f (zn ). For
such n, we obtain from (B.2) that gn (x) > f (x) − 2ε. Letting n → ∞ and then ε ց 0 shows
that h = f . To conclude, notice that {fn := gn ∧ n : n ∈ Z+ } is an increasing sequence of
nonnegative bounded Lipschitz–continuous functions which converges to f .
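The regularization $g_t(x) = \inf_z\{f(z) + t\,d(x,z)\}$ from the proof can be computed exactly when the metric space is finite. The sketch below assumes a finite grid in [−1, 1] with the usual distance and a lower semicontinuous step function; both are illustrative choices, not taken from the notes.

```python
def f(x):
    """A lower semicontinuous step function: 0 on (-inf, 0], 1 on (0, inf)."""
    return 0.0 if x <= 0 else 1.0


# The metric space (S, d): a finite grid in [-1, 1] with d(x, z) = |x - z|.
S = [i / 20 for i in range(-20, 21)]


def g(t, x):
    """g_t(x) = inf_z { f(z) + t * d(x, z) }, computed exactly over the finite space S."""
    return min(f(z) + t * abs(x - z) for z in S)


for t in (1, 5, 50):
    print(t, [round(g(t, x), 2) for x in (-0.5, 0.0, 0.05, 0.1, 0.5, 1.0)])
```

The printed rows increase with t, stay below f, and on this finite grid coincide with f once t exceeds the reciprocal of the grid spacing.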

B.2. Convex functions

Suppose X is a topological vector space and let X ∗ be its topological dual.
Definition B.2.1. A function f : X −→ R is said to be convex iff epi(f ) is a convex subset
of X × R.
Example B.2.2. Suppose C ⊂ X is a nonempty convex set and that f : C → R satisfies
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) for all x, y in C and all 0 ≤ λ ≤ 1. If f is extended
to X by setting f (x) = ∞ for all x ∈ X \ C, then f is convex and proper.
Lemma B.2.3. Suppose f : X → R is lower semicontinuous and convex. If f (x0 ) ∈ R for
some x0 ∈ X, then f is proper.

Proof. Suppose there is x1 ∈ X with f (x1 ) = −∞. Let xλ := λx0 + (1 − λ)x1 . Then,
since epi(f ) is convex, it follows that f (xλ ) = −∞ for all 0 ≤ λ < 1. Therefore, f (x0 ) ≤
lim inf λ→1 f (xλ ) = −∞ which is a contradiction.
Lemma B.2.4. Suppose f is convex. If x ∈ core(dom(f )) and f (x) ∈ R, then f is proper.

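Proof. Let y ∈ dom(f ). There exists ε > 0 such that x + ε(x − y) ∈ dom(f ). As
\[
x = \frac{\varepsilon}{1+\varepsilon}\,y + \frac{1}{1+\varepsilon}\,(x + \varepsilon(x-y)),
\]
for any α > f (y) and β > f (x + ε(x − y)) we have that
\[
f(x) \le \frac{\varepsilon}{1+\varepsilon}\,\alpha + \frac{1}{1+\varepsilon}\,\beta.
\]
Since f (x) > −∞, letting α → f (y) we conclude that f (y) > −∞.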
Theorem B.2.5. Suppose f : X → R is convex and that for some x0 ∈ X there is an open
neighborhood V of 0 such that supx∈V f (x0 + x) < ∞. If f (x0 ) ∈ R, then f is proper and
continuous on $\mathrm{dom}(f)^o$.

Proof. By assumption x0 + V ⊂ dom(f ); hence, x0 ∈ core(dom(f )). As f (x0 ) > −∞, f
is proper by Lemma B.2.4. Let m := supx∈V f (x0 + x).

Without loss of generality suppose V is balanced. Then, for any 0 < λ ≤ 1 and x ∈ V
f (x0 + λx) = f (λ(x0 + x) + (1 − λ)x0 ) ≤ λf (x0 + x) + (1 − λ)f (x0 ).
Thus
f (x0 + λx) − f (x0 ) ≤ λ(f (x0 + x) − f (x0 )) ≤ λ(m − f (x0 ))
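On the other hand,
\[
x_0 = \frac{1}{1+\lambda}(x_0 + \lambda x) + \frac{\lambda}{1+\lambda}(x_0 - x).
\]
Thus,
\[
f(x_0) \le \frac{1}{1+\lambda}f(x_0+\lambda x) + \frac{\lambda}{1+\lambda}f(x_0 - x),
\]
whence it follows that
\[
f(x_0) - f(x_0+\lambda x) \le \lambda\big(f(x_0 - x) - f(x_0)\big) \le \lambda\big(m - f(x_0)\big).
\]
Consequently,
\[
|f(y) - f(x_0)| \le \lambda\big(m - f(x_0)\big), \qquad y \in x_0 + \lambda V,
\]
and continuity at x0 follows. For any $z \in \mathrm{dom}(f)^o$ there is µ > 1 such that $x_0 + \mu(z - x_0) \in
\mathrm{dom}(f)^o$. For any x ∈ V ,
\[
z + \Big(1 - \frac{1}{\mu}\Big)x = \frac{1}{\mu}\big(x_0 + \mu(z - x_0)\big) + \Big(1 - \frac{1}{\mu}\Big)(x_0 + x).
\]
Hence
\[
\begin{aligned}
f\Big(z + \Big(1 - \frac{1}{\mu}\Big)x\Big) &\le \frac{1}{\mu}f\big(x_0 + \mu(z - x_0)\big) + \Big(1 - \frac{1}{\mu}\Big)f(x_0 + x)\\
&\le \frac{1}{\mu}f\big(x_0 + \mu(z - x_0)\big) + \Big(1 - \frac{1}{\mu}\Big)m.
\end{aligned}
\]
This shows that f is bounded above on $z + \big(1 - \frac{1}{\mu}\big)V$; therefore, by the argument developed
above, f is continuous at z.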

Definition B.2.6. Let f : X → R be a convex function. A linear function x′ ∈ X ∗ is said

to be a subgradient of f at x if
f (y) ≥ f (x) + x′ (y − x), y ∈ X.
The collection of all subgradients of f at x, denoted by ∂f (x), is called the subdifferential
of f at x.

Lemma B.2.7. Suppose f is a convex function in a real vector space X. For any x ∈
dom(f ) with f (x) > −∞ and d ∈ X, the function
\[
\lambda \mapsto \frac{f(x+\lambda d) - f(x)}{\lambda}
\]
is monotone nondecreasing on R \ {0}.

Proof. Suppose 0 < µ < λ. Then µ = αλ where $0 < \alpha = \frac{\mu}{\lambda} < 1$. If (x + λd) ∉ dom(f )
there is nothing to prove. If (x + λd) ∈ dom(f ) then, for any v > f (x + λd),
\[
f(x+\mu d) = f\big((1-\alpha)x + \alpha(x+\lambda d)\big) < (1-\alpha)f(x) + \alpha v.
\]
Rearranging the terms above we obtain
\[
f(x+\mu d) - f(x) < \frac{\mu}{\lambda}\big(v - f(x)\big).
\]
Letting v → f (x + λd) shows the result holds on (0, ∞).
Applying the result to −d we have that $\frac{f(x-\mu d)-f(x)}{\mu} \le \frac{f(x-\lambda d)-f(x)}{\lambda}$ whenever 0 < µ < λ.
Hence,
\[
\frac{f(x-\lambda d)-f(x)}{-\lambda} \le \frac{f(x-\mu d)-f(x)}{-\mu}.
\]
This shows the result holds on (−∞, 0).
Remark B.2.8. If f is a convex function with dom(f ) ≠ ∅ and ∂f (x) ≠ ∅ for some x ∈ X,
then clearly x ∈ dom(f ). Also, f attains a minimum at x iff 0 ∈ ∂f (x).

Suppose f is a proper convex function and let x ∈ dom(f ). The map $f'_+(x;\cdot)$ given by
\[
d \mapsto \lim_{\lambda\searrow 0}\frac{f(x+\lambda d)-f(x)}{\lambda} = \inf_{\lambda>0}\frac{f(x+\lambda d)-f(x)}{\lambda}
\]
is the right–sided directional derivative of f at x. If $f'_+(x;\cdot) \in X^*$ then it is the
Gâteaux derivative of f at x.
Theorem B.2.9. Suppose f is a proper convex function in the topological vector space X.
Let x ∈ dom(f ). Then
(i) The function d 7→ f+′ (x; d) is positive homogeneous and convex.
(ii) If f is continuous at x then d 7→ f+′ (x; d) is continuous on X.

Proof. Positive homogeneity follows from $\frac{f(x+\alpha\lambda d)-f(x)}{\lambda} = \alpha\,\frac{f(x+\alpha\lambda d)-f(x)}{\alpha\lambda}$ for α > 0. Let d, v ∈ X,
0 ≤ α ≤ 1 and 0 < λ < 1. As f is convex and proper
\[
\frac{f\big(x+\lambda(\alpha d + (1-\alpha)v)\big)-f(x)}{\lambda} \le \alpha\,\frac{f(x+\lambda d)-f(x)}{\lambda} + (1-\alpha)\,\frac{f(x+\lambda v)-f(x)}{\lambda}.
\]
Statement (i) follows by letting λ ց 0.

(ii) If f is continuous at x, then there is a balanced neighborhood V of 0 such that f is
bounded above on x + V . As in the proof of Theorem B.2.5, with m = supv∈V f (x + v),
\[
|f(x+\lambda v) - f(x)| \le \lambda\big(m - f(x)\big)
\]
for all v ∈ V and 0 ≤ λ ≤ 1. Consequently for all d ∈ V ,
\[
f(x) - m \le \frac{f(x-\lambda d)-f(x)}{-\lambda} \le f'_+(x;d) \le \frac{f(x+\lambda d)-f(x)}{\lambda} \le m - f(x).
\]
Hence $f'_+(x;\cdot)$ is bounded on V and so it is finite everywhere. Since $f'_+(x;0) = 0 \in \mathbb{R}$, continuity
follows from Theorem B.2.5.

The topological dual space of X × R is X ∗ × R. For any (v, λ) ∈ X ∗ × R and c ∈ R we

refer to the set
H(v,λ) (c) := {(x, α) : v(x) + λα = c}
as a hyperplane in X × R. If λ ≠ 0, then H(v,λ) (c) is said to be a non–vertical hyperplane.
The connection between hyperplanes and affine functions is given by the following result.
Lemma B.2.10. A hyperplane H is non–vertical iff it is the graph of an affine function.

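Proof. If f (x) = v(x) + c, where v ∈ X ∗ then,
\[
\mathrm{Graph}(f) = \{(x,\alpha) : \alpha = v(x) + c\} = \{(x,\alpha) : (v,-1)(x,\alpha) = -c\} = H_{(v,-1)}(-c).
\]
Conversely, if λ ≠ 0 then
\[
H_{(v,\lambda)}(c) = \{(x,\alpha) : v(x) + \lambda\alpha = c\} = \{(x,\alpha) : \alpha = -\tfrac{1}{\lambda}v(x) + \tfrac{c}{\lambda}\} = \mathrm{Graph}(f)
\]
where $f(x) = -\frac{1}{\lambda}v(x) + \frac{c}{\lambda}$.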
Theorem B.2.11. Let X be a locally convex topological space and suppose f : X → R ∪
{+∞} is a proper lower semicontinuous convex function. For any x ∈ X, if α < f (x) then
there exists a continuous affine function g such that α ≤ g(x) (with equality if x ∈ dom(f ))
and g(y) < f (y) for all y ∈ X.

Proof. By assumption (x, α) ∉ epi(f ). Since X × R is locally convex and epi(f ) is convex
and closed in X × R, by Theorem 12.10.15[(iii)] there exist some (v, λ) ∈ X ∗ × R and ε > 0
such that
\[
v(x) + \lambda\alpha + \varepsilon = (v,\lambda)(x,\alpha) + \varepsilon < v(y) + \lambda\beta \tag{B.3}
\]
for all (y, β) ∈ epi(f ). By letting β → ∞ we obtain that λ ≥ 0. If λ > 0 then
\[
g(y) = \frac{1}{\lambda}v(x-y) + \alpha, \qquad y \in X,
\]
satisfies α = g(x) and g(y) < f (y) for all y ∈ X.
If x ∈ dom(f ) then (x, f (x)) ∈ epi(f ) and from (B.3) we conclude that λ > 0.
If f (x) = +∞ and λ = 0 then v(x − y) < −ε for all y ∈ dom(f ). Hence, the continuous
affine function $h(y) := v(x-y) + \frac{\varepsilon}{2}$ satisfies $h(x) = \frac{\varepsilon}{2} > 0$ and h(y) < 0 for all y ∈ dom(f ).
Fix y0 ∈ dom(f ). As before, there exists a continuous affine function φ such that φ(y0 ) =
f (y0 ) − 1 and φ(y) < f (y) for all y ∈ X. For any c > 0 define
\[
g_c(y) := c\,h(y) + \varphi(y), \qquad y \in X.
\]
For y ∈ dom(f ) we have that gc (y) < φ(y) < f (y); whereas if f (y) = ∞, gc (y) < ∞ =
f (y). We choose c large enough so that $g_c(x) = c\,\frac{\varepsilon}{2} + \varphi(x) \ge \alpha$. The corresponding function
gc has the desired properties.
Corollary B.2.12. Suppose X is a normed space and that f : X → R ∪ {+∞} is proper
lower semicontinuous and convex. If B ⊂ X is bounded then inf x∈B f (x) > −∞.

Proof. If B ⊂ {f = ∞} there is nothing to prove. Suppose x0 ∈ B ∩ dom(f ) and let
α < f (x0 ). By Theorem B.2.11, there is a continuous affine function g(y) = v(y) − c such
that α ≤ g(x0 ) and g(y) < f (y) for all y ∈ X. As B is bounded and v ∈ X ∗ , M := inf y∈B v(y) > −∞.
As a consequence, inf y∈B f (y) ≥ M − c.
Definition B.2.13. For any function f : X → R, the function f ∗ : X ∗ → R given by
\[
f^*(v) = \sup_{x\in X}\big\{v(x) - f(x)\big\}, \qquad v \in X^*, \tag{B.4}
\]
is the Fenchel–Legendre transform of f .
Theorem B.2.14. For any function f : X → R,
(i) f ∗ is convex on X ∗ .
(ii) (Fenchel–Young inequality) f (x) + f ∗ (v) ≥ v(x) for all (x, v) ∈ X × X ∗ .
(iii) Under the σ(X ∗ , X)–topology on X ∗ , f ∗ is lower semicontinuous and f ∗∗ (x) =
(f ∗ )∗ (x) ≤ f (x).
(iv) (Fenchel–Legendre duality) If in addition, X is locally convex and f is proper
lower semicontinuous and convex, we have that
\[
f(x) = f^{**}(x) = \sup\{g(x) : g \text{ affine},\ g(y) < f(y)\}.
\]

Proof. (i) Suppose (v, α) and (w, β) are in epi(f ∗ ). Then, for any x ∈ X
\[
v(x) - f(x) \le f^*(v) \le \alpha, \qquad w(x) - f(x) \le f^*(w) \le \beta.
\]
Hence, for any 0 ≤ λ ≤ 1 we have that
\[
(\lambda v + (1-\lambda)w)(x) - f(x) \le \lambda\alpha + (1-\lambda)\beta, \qquad x \in X.
\]
Taking suprema over all x ∈ X leads to f ∗ (λv + (1 − λ)w) ≤ λα + (1 − λ)β. Therefore
epi(f ∗ ) is a convex subset of X ∗ × R.

(ii) By definition f ∗ (v) ≥ v(x) − f (x) for all (x, v) ∈ X × X ∗ .

(iii) Let {vn : n ∈ D} be a net with vn → v in σ(X ∗ , X). Then, for any x ∈ X
\[
\liminf_n f^*(v_n) \ge \lim_n v_n(x) - f(x) = v(x) - f(x),
\]
whence it follows that f ∗ (v) ≤ lim inf n f ∗ (vn ).

For any x ∈ X, part (ii) gives f (x) ≥ v(x) − f ∗ (v) for all v ∈ X ∗ . Taking suprema over all
v ∈ X ∗ gives f (x) ≥ f ∗∗ (x).

(iv) It is enough to show that under the additional condition in (iv), f ≤ f ∗∗ . Let x ∈
X and suppose α < f (x). By Theorem B.2.11 there exists a continuous affine function
g(y) = v(y) − c such that α ≤ g(x) and g(y) < f (y) for all y ∈ X. We claim that
(v, c) ∈ E ∗ := epi(f ∗ ). Otherwise, f ∗ (v) > c and by definition of f ∗ there exists x0 ∈ X
such that c < v(x0 ) − f (x0 ) which leads to the contradiction f (x0 ) < v(x0 ) − c = g(x0 ).
Consequently,
\[
\alpha \le g(x) = v(x) - c \le \sup_{(w,\lambda)\in E^*}\big\{w(x) - \lambda\big\}
= \sup_{w\in \mathrm{dom}(f^*)}\big\{w(x) - f^*(w)\big\} = f^{**}(x).
\]
The conclusion follows by letting α → f (x).

Example B.2.15. Suppose X is a Banach space. As $ab \le \frac{1}{p}a^p + \frac{1}{q}b^q$ for all a, b ≥ 0 and
p, q > 1 with $\frac{1}{p} + \frac{1}{q} = 1$,
\[
v(x) \le \|v\|\,\|x\| \le \frac{1}{p}\|v\|^p + \frac{1}{q}\|x\|^q.
\]
Therefore, if $f(x) = \frac{1}{q}\|x\|^q$, then $f^*(v) = \frac{1}{p}\|v\|^p$.
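In the one–dimensional case X = R the conjugate in Example B.2.15 can be approximated by maximizing over a grid. This is only a sketch under the assumptions X = R, q = 3 and a finite grid on [−5, 5]; the helper name `conjugate` is hypothetical.

```python
def conjugate(f, v, grid):
    """Approximate f*(v) = sup_x { v*x - f(x) } by a maximum over a finite grid."""
    return max(v * x - f(x) for x in grid)


q = 3.0
p = q / (q - 1.0)  # conjugate exponent, 1/p + 1/q = 1
f = lambda x: abs(x) ** q / q
grid = [i / 1000 for i in range(-5000, 5001)]

for v in (-4.0, -1.0, 0.0, 0.5, 2.0):
    print(v, conjugate(f, v, grid), abs(v) ** p / p)
```

The two printed columns agree up to the discretization error of the grid, provided the maximizer $x = \mathrm{sign}(v)|v|^{1/(q-1)}$ lies inside the grid.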

Example B.2.16. For any C ⊂ X, δC (x) := 0 if x ∈ C and +∞ otherwise. Then

(δC )∗ (v) = sup{v(x) : x ∈ C}.
Theorem B.2.17. Suppose X is a normed space and that f : X → R ∪ {+∞} is a proper
lower semicontinuous convex function. The following statements are equivalent:
(i) Fa = {f ≤ a} is bounded for all a ∈ R.
(ii) $\liminf_{\|x\|\to\infty}\frac{f(x)}{\|x\|} > 0$.
(iii) There exists k > 0 and β ∈ R such that f (x) ≥ k‖x‖ + β.
(iv) $\lim_{\|x\|\to\infty} f(x) = +\infty$.
(v) f ∗ (0) ∈ R and f ∗ is continuous at v = 0.

Proof. (i) implies (ii): By considering g(x) = f (x+x0 )−f (x0 ) for some x0 ∈ dom(f ) if necessary,
we can assume without loss of generality that f (0) = 0. Suppose $\liminf_{\|x\|\to\infty}\frac{f(x)}{\|x\|} \le 0$.
Then, there exists a sequence (xn : n ∈ N) ⊂ X such that ‖xn ‖ > n and $\frac{f(x_n)}{\|x_n\|} < \frac{1}{n}$.
Hence
\[
f\Big(\frac{n}{\|x_n\|}x_n\Big) = f\Big(\Big(1 - \frac{n}{\|x_n\|}\Big)0 + \frac{n}{\|x_n\|}x_n\Big) \le \frac{n}{\|x_n\|}f(x_n) < 1.
\]
This shows that $\frac{n}{\|x_n\|}x_n \in F_1$ and so, F1 is unbounded.

(ii) implies (iii): Suppose $A := \liminf_{\|x\|\to\infty}\frac{f(x)}{\|x\|} > 0$ and let 0 < k < A. There is r > 0
such that ‖x‖ > r implies that f (x) ≥ k‖x‖. By Corollary B.2.12 there is α > 0 such that
inf ‖x‖≤r f (x) > −α. Hence
\[
f(x) \ge k\|x\| + \beta, \qquad x \in X,
\]
where β = −(α + kr).
(iii) implies (i) and (iv): For any a ∈ R, Fa ⊂ {x : k‖x‖ + β ≤ a} which is either empty or
a closed ball in X. As k > 0, lim‖x‖→∞ f (x) = ∞.
(iv) implies (i): For any a ∈ R there is r > 0 such that ‖x‖ > r implies that f (x) > |a|.
Therefore Fa ⊂ F|a| ⊂ B(0; r).
(iii) implies (v): For any $v \in X^*$,
\[
v(x) - f(x) \le v(x) - k\|x\| - \beta .
\]
Hence
\[
f^*(v) \le \sup_{x \in X} \{ v(x) - k\|x\| \} - \beta
= \sup_{x \in X} \Big\{ \frac{1}{k} v(kx) - \|kx\| \Big\} - \beta
= \delta_{\{\|v\| \le k\}}(v) - \beta .
\]
Thus $f^*$ is bounded above in a neighborhood of $0$ in $X^*$. As $f$ is proper and $-f(x) \le f^*(0) \le -\beta$
for all $x \in X$, we have that $f^*(0) \in \mathbb{R}$. Then, by Theorem B.2.5, $f^*$ is continuous at $0$.
(v) implies (iii): Suppose $f^*(0) \in \mathbb{R}$ and that $f^*$ is continuous at $0$. Then $f^*$ is bounded
above in a neighborhood $B(0; r) \subset X^*$. Then, for some $m \in \mathbb{R}$,
\[
v(x) - f(x) \le f^*(v) \le m, \qquad x \in X,\ v \in X^*,\ \|v\| \le r,
\]
which leads to
\[
f(x) \ge v(x) - m, \qquad x \in X,\ v \in X^*,\ \|v\| \le r.
\]
We claim that $f(x) \ge r\|x\| - m$ for all $x \in X$. If $x = 0$ there is nothing to prove. If $x \ne 0$
then, by Theorem 12.10.9, there is $x^* \in X^*$ such that $\|x^*\| = 1$ and $x^*(x) = \|x\|$. So, if
$v = r x^*$ we obtain that $f(x) \ge r\|x\| - m$.

A proper lower semicontinuous convex function that satisfies (iv) in Theorem B.2.17 is
said to be coercive.

B.3. Asymptotic Cones and Functions in $\mathbb{R}^n$

In this section we consider subsets of, and functions defined on, Euclidean space $\mathbb{R}^n$.
Definition B.3.1. Let $C \subset \mathbb{R}^n$ be nonempty. The asymptotic cone of $C$ is defined as
the set
\[
C_\infty := \big\{ d \in \mathbb{R}^n : \exists\, t_n \nearrow \infty,\ x_n \in C, \text{ with } \lim_n t_n^{-1} x_n = d \big\}.
\]

Lemma B.3.2. Suppose $C \subset \mathbb{R}^n$ is nonempty. Then

(i) $C_\infty$ is a nonempty closed cone.
(ii) $(\overline{C})_\infty = C_\infty$.
(iii) If $C$ is a cone, $C_\infty = \overline{C}$.
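Proof. (i) It is clear that $C_\infty$ is a cone and that $0 \in C_\infty$. Suppose $d \in \overline{C_\infty}$ and let
$\{d_n : n \in \mathbb{N}\}$ be a sequence in $C_\infty$ with $d_n \to d$. There is $t_1 \ge 1$ and $x_1 \in C$ such that
$\|t_1^{-1} x_1 - d_1\| \le 1$. Once $t_1, \ldots, t_{n-1}$ and $x_1, \ldots, x_{n-1}$ have been constructed, we can find
$t_n > \max\{t_{n-1}, n\}$ and $x_n \in C$ such that $\|t_n^{-1} x_n - d_n\| \le \frac{1}{n}$. It follows that $t_n \nearrow \infty$ and
$\|t_n^{-1} x_n - d\| \le \frac{1}{n} + \|d_n - d\| \to 0$, whence we conclude that $d \in C_\infty$.

(ii) Clearly $C_\infty \subset (\overline{C})_\infty$. Suppose $d \in (\overline{C})_\infty$. Let $t_n \nearrow \infty$ and $x_n \in \overline{C}$ be such that
$t_n^{-1} x_n \to d$. For each $n$, there is $\hat{x}_n \in C$ with $\|x_n - \hat{x}_n\| \le \frac{1}{n}$. Hence
\[
\|t_n^{-1} \hat{x}_n - d\| \le t_n^{-1} \|\hat{x}_n - x_n\| + \|t_n^{-1} x_n - d\| \to 0.
\]
Therefore, $d \in C_\infty$.
(iii) Suppose $C$ is a cone. If $d \in C_\infty$ then for some $t_n \nearrow \infty$ and $x_n \in C$, $d = \lim_n t_n^{-1} x_n$.
Since $t_n^{-1} x_n \in C$, $d \in \overline{C}$.
Conversely, suppose $d \in \overline{C}$. As $nd \in \overline{C}$ and $d = \frac{1}{n}\, nd$, we have that $d \in (\overline{C})_\infty = C_\infty$.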
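Theorem B.3.3. Suppose $C$ is a nonempty convex subset in $\mathbb{R}^n$. Then $C_\infty$ is a closed
convex cone and
\[
\text{(B.5)} \qquad C_\infty = \{ d \in \mathbb{R}^n : \forall x \in C,\ \forall \lambda \ge 0,\ x + \lambda d \in \overline{C} \}
\]
\[
\text{(B.6)} \qquad \phantom{C_\infty} = \{ d \in \mathbb{R}^n : \forall \lambda \ge 0,\ x_0 + \lambda d \in \overline{C} \}
\]
for any $x_0 \in C$.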

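Proof. Let $R$ be the set in the right hand side of (B.5) and let $R_{x_0}$ be the set in (B.6).
Evidently $R \subset R_{x_0}$ for any $x_0 \in C$.
Suppose $d \in R_{x_0}$, so that $d_\lambda = x_0 + \lambda d \in \overline{C}$ for all $\lambda \ge 0$. As $n^{-1} d_n \to d$, it follows that
$d \in (\overline{C})_\infty = C_\infty$; therefore $R_{x_0} \subset C_\infty$.
We now show that $C_\infty \subset R$. Let $d \in C_\infty$. There is $t_n \nearrow \infty$ and $x_n \in C$ such that
$d = \lim_n t_n^{-1} x_n$. Let $x \in C$ and set $d_n = t_n^{-1}(x_n - x)$. Then $d_n \to d$ and $x_n = x + t_n d_n$. For
any $\lambda > 0$, there exists $n_0$ such that $t_n > \lambda$ whenever $n \ge n_0$. By convexity,
\[
\hat{x}_n := x + \lambda d_n = \Big(1 - \frac{\lambda}{t_n}\Big) x + \frac{\lambda}{t_n} x_n \in C.
\]
Then $\lim_n \hat{x}_n = x + \lambda d \in \overline{C}$, which means that $d \in R$.
It remains to show that $C_\infty$ is convex. Suppose $d_1, d_2 \in C_\infty$. Let $0 < \alpha < 1$. Fix $x_0 \in C$.
Then for all $\lambda > 0$, $x_0 + \lambda d_i \in \overline{C}$, $i = 1, 2$. Since $C$ is convex so is $\overline{C}$, and so
\[
x_0 + \lambda(\alpha d_1 + (1 - \alpha) d_2) = \alpha(x_0 + \lambda d_1) + (1 - \alpha)(x_0 + \lambda d_2) \in \overline{C}.
\]
Therefore, $\alpha d_1 + (1 - \alpha) d_2 \in R_{x_0} = C_\infty$.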
Theorem B.3.4. A nonempty subset $C$ of $\mathbb{R}^n$ is bounded iff $C_\infty = \{0\}$.

Proof. Suppose $M := \sup_{x \in C} \|x\| < \infty$. We know that $0 \in C_\infty$. If $d \in C_\infty$ then there is
$t_n \nearrow \infty$ and $x_n \in C$ such that $d = \lim_n t_n^{-1} x_n$. Then $\|d\| = \lim_n t_n^{-1} \|x_n\| \le \lim_n \frac{M}{t_n} = 0$,
and so $d = 0$.
Suppose $C$ is not bounded. Then there is a sequence $\{x_n : n \in \mathbb{N}\} \subset C$ such that $\|x_n\| > n$.
As $u_n = \|x_n\|^{-1} x_n \in S^{n-1}$, by compactness, there is a subsequence $u_{n_k}$ which converges to
some $u \in S^{n-1}$. This shows that $u \in C_\infty \setminus \{0\}$.
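As an illustration of Theorems B.3.3 and B.3.4, here is a worked example that is not part of the original text. The set $C$ below is an assumed example, chosen because it is closed, convex and unbounded, so that (B.6) applies with $\overline{C} = C$.

```latex
\[
C = \{(x, y) \in \mathbb{R}^2 : y \ge x^2\}, \qquad x_0 = (0, 0) \in C .
\]
By (B.6), $d = (a, b) \in C_\infty$ iff $\lambda b \ge \lambda^2 a^2$ for all $\lambda \ge 0$,
that is, iff $b \ge \lambda a^2$ for all $\lambda > 0$. This forces $a = 0$ and $b \ge 0$, so
\[
C_\infty = \{(0, b) : b \ge 0\} \ne \{0\},
\]
in agreement with Theorem B.3.4, since $C$ is unbounded.
```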

Suppose that F is a nonempty closed subset of X ×R with the property that if (x, µ) ∈ F
then (x, µ′ ) ∈ F for all µ′ > µ. Then, the function on X defined as
g(x) := inf{µ : (x, µ) ∈ F }
is the unique lower semicontinuous function with epi(g) = F .
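Example B.3.5. Let $f : X \to \overline{\mathbb{R}}$ and set $F = \overline{\operatorname{epi}(f)}$. The function $\overline{f}(x) := \inf\{\mu : (x, \mu) \in F\}$
is called the closure of $f$. Clearly, it is lower semicontinuous and $\overline{f} \le f$. If $f$ is convex,
then so is $\overline{f}$.

Theorem B.3.6. Suppose $f, g : X \to \overline{\mathbb{R}}$ satisfy $g \le f$. Then $\overline{g} \le \overline{f}$. If in addition $g$ is
lower semicontinuous, then $g \le \overline{f}$.

Proof. $g \le f$ implies that $\operatorname{epi}(f) \subset \operatorname{epi}(g)$, and hence $\overline{\operatorname{epi}(f)} \subset \overline{\operatorname{epi}(g)}$. Whence it follows that $\overline{g} \le \overline{f}$. If $g$ is lower
semicontinuous then $\overline{\operatorname{epi}(g)} = \operatorname{epi}(g)$ and so $\overline{g} = g$.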
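Theorem B.3.7. Suppose $f : X \to \mathbb{R} \cup \{+\infty\}$ is a proper function. There exists a unique
function $f_\infty : X \to \overline{\mathbb{R}}$ such that

(i) $\operatorname{epi}(f_\infty) = (\operatorname{epi}(f))_\infty$.
(ii) $f_\infty$ is lower semicontinuous and positive homogeneous.
(iii) $f_\infty(0) = 0$ or $f_\infty(0) = -\infty$.
(iv) If $f_\infty(0) = 0$, then $f_\infty$ is proper.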

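Proof. (i) $F := (\operatorname{epi}(f))_\infty$ is a nonempty closed set in $X \times \mathbb{R}$ since $f$ is proper. Let
$(x, \mu) \in F$ and let $\mu' > \mu$. There is $t_n \nearrow \infty$ and $(x_n, \mu_n) \in \operatorname{epi}(f)$ such that $t_n^{-1}(x_n, \mu_n) \to (x, \mu)$.
Set $\mu'_n = \mu_n + t_n(\mu' - \mu)$. Then $(x_n, \mu'_n) \in \operatorname{epi}(f)$ and $t_n^{-1}(x_n, \mu'_n) \to (x, \mu')$, and so
$(x, \mu') \in F$. By the remark preceding Example B.3.5, $f_\infty(x) := \inf\{\mu : (x, \mu) \in F\}$ is the unique
function with $\operatorname{epi}(f_\infty) = F$.

(ii) Lower semicontinuity of $f_\infty$ follows from Theorem B.1.4 since $F = \operatorname{epi}(f_\infty) = (\operatorname{epi}(f))_\infty$
is closed.
To show positive homogeneity, let $(x, \mu) \in \operatorname{epi}(f_\infty)$. Since $F$ is a cone, $\lambda(x, \mu) \in F$, and
so $f_\infty(\lambda x) \le \lambda\mu$, for all $\lambda > 0$. If $f_\infty(x) = -\infty$, then by letting $\mu \to -\infty$ we obtain that
$f_\infty(x) = -\infty$ if and only if $f_\infty(\lambda x) = -\infty$ for all $\lambda > 0$.
If $-\infty < f_\infty(x)$, then $f_\infty(\lambda x) \le \lambda f_\infty(x)$ for all $\lambda > 0$. It follows that $-\infty < f_\infty(\lambda x)$ and
hence, $(\lambda x, f_\infty(\lambda x)) \in \operatorname{epi}(f_\infty)$, for all $\lambda > 0$. This leads to $(x, \lambda^{-1} f_\infty(\lambda x)) \in \operatorname{epi}(f_\infty)$.
Therefore $\lambda f_\infty(x) \le f_\infty(\lambda x)$.

(iii) Since $f$ is proper, $(0, 0) \in \operatorname{epi}(f_\infty)$, and so $f_\infty(0) \le 0$. If $f_\infty(0) > -\infty$ then, as
$f_\infty(0) = f_\infty(\lambda 0) = \lambda f_\infty(0)$ for all $\lambda > 0$, it follows that $f_\infty(0) = 0$.
(iv) Suppose $f_\infty(0) = 0$ and that $f_\infty(x) = -\infty$ for some $x$. Then
$0 = f_\infty(0) \le \liminf_n f_\infty(n^{-1} x) = \liminf_n n^{-1} f_\infty(x) = -\infty$, which is a contradiction.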
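Theorem B.3.8. For any proper function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the asymptotic function
$f_\infty$ associated to $f$ is given by
\[
\text{(B.7)} \qquad f_\infty(d) = \liminf_{\substack{z \to d \\ t \to \infty}} \frac{f(tz)}{t}
\]
for all $d \in \mathbb{R}^n$.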

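Proof. Let $g(d)$ denote the right hand side of (B.7). It is enough to show that $\operatorname{epi}(f_\infty) = \operatorname{epi}(g)$.
Let $(d, \mu) \in \operatorname{epi}(f_\infty)$. Then for some $t_n \nearrow \infty$ and $(d_n, \mu_n) \in \operatorname{epi}(f)$, $\lim_n t_n^{-1}(d_n, \mu_n) = (d, \mu)$.
Hence
\[
\frac{1}{t_n} f(t_n\, t_n^{-1} d_n) \le \frac{\mu_n}{t_n}.
\]
By passing to the limit we obtain that $g(d) \le \liminf_n \frac{1}{t_n} f(t_n\, t_n^{-1} d_n) \le \mu$, which means that
$(d, \mu) \in \operatorname{epi}(g)$.
Conversely, suppose that $(d, \mu) \in \operatorname{epi}(g)$. By definition of $g$, there is a sequence $t_n \nearrow \infty$
and a sequence $d_n \to d$ such that
\[
\lim_n \frac{f(t_n d_n)}{t_n} = g(d).
\]
Hence for any $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies that $f(t_n d_n) < t_n(\mu + \varepsilon)$.
This means that $t_n(d_n, \mu + \varepsilon) \in \operatorname{epi}(f)$ for all $n \ge n_0$. Consequently $t_n^{-1} t_n (d_n, \mu + \varepsilon) \to (d, \mu + \varepsilon) \in \operatorname{epi}(f_\infty)$.
Since $\operatorname{epi}(f_\infty)$ is closed, by letting $\varepsilon \to 0$, $(d, \mu) \in \operatorname{epi}(f_\infty)$.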

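When $f$ is a proper lower semicontinuous convex function in $\mathbb{R}^n$ we have the following
representation of $f_\infty$.
Theorem B.3.9. Suppose $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper lower semicontinuous convex
function. Then $f_\infty$ is a proper lower semicontinuous convex function and
\[
\text{(B.8)} \qquad f_\infty(d) = \sup\{ f(x + d) - f(x) : x \in \operatorname{dom}(f) \}
\]
\[
\text{(B.9)} \qquad \phantom{f_\infty(d)} = \sup_{\lambda > 0} \frac{f(y + \lambda d) - f(y)}{\lambda}
\]
for any $y \in \operatorname{dom}(f)$.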

Proof. Since $f$ is proper, lower semicontinuous and convex, $\operatorname{epi}(f)$ is a nonempty closed
convex set in $\mathbb{R}^{n+1}$. Thus $\operatorname{epi}(f_\infty) = (\operatorname{epi}(f))_\infty$ is a closed convex cone. This means that
$f_\infty$ is a lower semicontinuous convex function. We will show that $f_\infty$ is proper by proving that (B.8)
holds. Notice that $(d, \mu) \in \operatorname{epi}(f_\infty)$ iff for any $x \in \operatorname{dom}(f)$ we have that $(x + d, f(x) + \mu) \in \operatorname{epi}(f)$.
This is equivalent to $f(x + d) - f(x) \le \mu$ for all $x \in \operatorname{dom}(f)$.
Therefore, $(d, \mu) \in \operatorname{epi}(f_\infty)$ iff $\sup\{f(x + d) - f(x) : x \in \operatorname{dom}(f)\} \le \mu$, which is equivalent
to (B.8).

Let $x \in \operatorname{dom}(f)$. By Theorem B.3.3, $(d, \mu) \in \operatorname{epi}(f_\infty)$ iff for all $\lambda > 0$, $(x + \lambda d, f(x) + \lambda\mu) \in \operatorname{epi}(f)$.
Equivalently, $(d, \mu) \in \operatorname{epi}(f_\infty)$ iff $\frac{f(x + \lambda d) - f(x)}{\lambda} \le \mu$ for all $\lambda > 0$. Set
\[
g(d) := \sup_{\lambda > 0} \frac{f(x + \lambda d) - f(x)}{\lambda}, \qquad d \in \mathbb{R}^n.
\]
We have proved that $(d, \mu) \in \operatorname{epi}(f_\infty)$ iff $(d, \mu) \in \operatorname{epi}(g)$, and so $g \equiv f_\infty$.
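Formula (B.9) lends itself to a quick numerical experiment in dimension one. The sketch below is illustrative only: the test function $f(x) = \sqrt{1 + x^2}$, the helper name `asymptotic_value`, and the choice of a single large value `lam` are assumptions made here, not part of the text. It relies on the standard fact that for convex $f$ the difference quotient is nondecreasing in $\lambda$, so one large $\lambda$ approximates the supremum from below. For this $f$ the asymptotic function is $f_\infty(d) = |d|$.

```python
import math

def asymptotic_value(f, d, y=0.0, lam=1e8):
    """Difference quotient (f(y + lam*d) - f(y)) / lam for one large lam.

    For convex f the quotient is nondecreasing in lam, so a large lam
    approximates the supremum in (B.9) from below.
    """
    return (f(y + lam * d) - f(y)) / lam

f = lambda x: math.sqrt(1.0 + x * x)   # convex and finite on the whole real line

for d in (-2.0, 0.0, 0.5, 3.0):
    # The base point y should not matter, as stated after (B.9).
    print(d, asymptotic_value(f, d), asymptotic_value(f, d, y=5.0), abs(d))
```

Changing the base point `y` leaves the result unchanged up to rounding, which mirrors the phrase "for any $y \in \operatorname{dom}(f)$" in the theorem.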
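Theorem B.3.10. Suppose $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper. For any $\alpha \in \mathbb{R}$, if $\{f \le \alpha\} \ne \emptyset$
then $\{f \le \alpha\}_\infty \subset \{f_\infty \le 0\}$.
If in addition $f$ is lower semicontinuous and convex, then $\{f \le \alpha\}_\infty = \{f_\infty \le 0\}$.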

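Proof. Let $d \in \{f \le \alpha\}_\infty$. There exist $t_n \nearrow \infty$ and $x_n \in \{f \le \alpha\}$ such that $t_n^{-1} x_n \to d$.
Therefore, by Theorem B.3.8,
\[
f_\infty(d) = \liminf_{\substack{z \to d \\ t \to \infty}} \frac{f(tz)}{t}
\le \liminf_n \frac{f(t_n\, t_n^{-1} x_n)}{t_n} \le \lim_n \frac{\alpha}{t_n} = 0.
\]
Suppose in addition that $f$ is lower semicontinuous and convex. If $f_\infty(d) \le 0$ then, by
Theorem B.3.9, for any $x \in \operatorname{dom}(f)$
\[
\sup_{\lambda > 0} \frac{f(x + \lambda d) - f(x)}{\lambda} = f_\infty(d) \le 0.
\]
Hence $f(x + \lambda d) \le f(x)$ for all $\lambda > 0$. In particular, if $f(x) \le \alpha$, then $f(x + \lambda d) \le \alpha$ for all
$\lambda > 0$, that is, $d \in \{f \le \alpha\}_\infty$ by Theorem B.3.3.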

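Combining Theorems B.2.17 and B.3.10 we obtain the following characterization of
coercive functions in terms of the asymptotic function.
Corollary B.3.11. A proper lower semicontinuous convex function $f$ in $\mathbb{R}^n$ is coercive iff
$f_\infty(d) > 0$ for all $d \ne 0$.

Proof. $f$ is coercive iff $\{f \le \alpha\}$ is bounded for all $\alpha \in \mathbb{R}$. Suppose $f$ is coercive. As $\{f \le f(x)\} \ne \emptyset$
for $x \in \operatorname{dom}(f)$, Theorems B.3.4 and B.3.10 give $\{0\} = \{f \le f(x)\}_\infty = \{f_\infty \le 0\}$. Consequently, for any
$d \ne 0$, $f_\infty(d) > 0$. Conversely, if $f_\infty(d) > 0$ for all $d \ne 0$, then $\{f \le \alpha\}_\infty = \{f_\infty \le 0\} = \{0\}$ for every
nonempty level set, and so each level set is bounded by Theorem B.3.4.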
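A short worked example, added here for illustration and not taken from the original text, shows how Corollary B.3.11 separates a coercive function from a non-coercive one on $\mathbb{R}$. Both functions are assumed examples; the computation uses (B.9) with $y = 0$.

```latex
\[
f(x) = e^{x}: \qquad
f_\infty(d) = \sup_{\lambda > 0} \frac{e^{\lambda d} - 1}{\lambda}
= \begin{cases} +\infty, & d > 0, \\ 0, & d \le 0, \end{cases}
\]
so $f_\infty(-1) = 0$ and $f$ is not coercive; indeed $\{f \le 1\} = (-\infty, 0]$ is unbounded.
\[
g(x) = |x|: \qquad
g_\infty(d) = \sup_{\lambda > 0} \frac{|\lambda d|}{\lambda} = |d| > 0 \quad \text{for } d \ne 0,
\]
so $g$ is coercive, in agreement with $\{g \le \alpha\} = [-\alpha, \alpha]$ for $\alpha \ge 0$.
```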

The next result identifies the sign of the asymptotic function $f_\infty$ associated to a
proper lower semicontinuous convex function $f$ in terms of the limit behavior of $f$ along
rays.
Lemma B.3.12. Let $f$ be a proper lower semicontinuous convex function in $\mathbb{R}^n$. Then $f_\infty(d) \le 0$
iff $\limsup_{\lambda\to\infty} f(x + \lambda d) < \infty$ for all $x \in \operatorname{dom}(f)$. Equivalently, $f_\infty(d) > 0$ iff
$\liminf_{\lambda\to\infty} f(x + \lambda d) = +\infty$ for all $x \in \operatorname{dom}(f)$.

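Proof. Suppose that $f_\infty(d) \le 0$. Then, for any $x \in \operatorname{dom}(f)$ and $\lambda > 0$,
\[
\frac{f(x + \lambda d) - f(x)}{\lambda} \le 0.
\]
Hence $\limsup_{\lambda\to\infty} f(x + \lambda d) \le f(x) < \infty$.
Conversely, suppose $f_\infty(d) > \alpha > 0$. For any $x \in \operatorname{dom}(f)$ there exists $\lambda_0 > 0$ such that $\lambda \ge \lambda_0$
implies that
\[
\frac{f(x + \lambda d) - f(x)}{\lambda} > \alpha.
\]
Consequently, $f(x + \lambda d) \ge f(x) + \lambda\alpha \to +\infty$.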
Theorem B.3.13. Suppose $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper lower semicontinuous convex
function. Then $f$ is coercive iff
\[
\text{(B.10)} \qquad \int e^{-f(x)}\, \lambda(dx) < \infty,
\]
where $\lambda$ is Lebesgue measure on $\mathbb{R}^n$.

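Proof. If $f$ is coercive, then (B.10) holds by Theorem B.2.17(iii).

Suppose (B.10) holds. Let $\nu(dx) := e^{-f(x)}\, \lambda(dx)$. For any $0 < \varepsilon < \nu(\mathbb{R}^n)$, there exists a closed
ball $B$ around the origin such that $\nu(\mathbb{R}^n \setminus B) < \varepsilon$.
We claim that $f_\infty(d) > 0$ for all $d \ne 0$. Otherwise, suppose there is $d \ne 0$ with $f_\infty(d) \le 0$.
Then for any $x \in \operatorname{dom}(f)$,
\[
\text{(B.11)} \qquad f(x + \lambda d) \le f(x), \qquad \lambda \ge 0.
\]
Clearly (B.11) also holds for $x \notin \operatorname{dom}(f)$. Since $\int e^{-f(x + \lambda d)}\, dx = \int e^{-f(x)}\, dx$, we have that
$f(\cdot + \lambda d) = f(\cdot)$ $\lambda$-a.s. Hence
\[
\nu(B) = \int 1_B\, e^{-f}\, d\lambda = \int 1_B(x)\, e^{-f(x + \lambda d)}\, dx = \int 1_{B + \lambda d}\, e^{-f}\, d\lambda = \nu(B + \lambda d).
\]
This is not possible when $\varepsilon < \nu(\mathbb{R}^n)/2$, since $(B + \lambda d) \cap B = \emptyset$ for $\lambda$ large enough and then
$\nu(B) = \nu(B + \lambda d) \le \nu(\mathbb{R}^n \setminus B) < \varepsilon < \nu(B)$. Therefore, $f_\infty(d) > 0$ for all
$d \ne 0$. The conclusion follows from Corollary B.3.11.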
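The dichotomy in Theorem B.3.13 can be seen numerically on the real line. The following is a rough sketch under stated assumptions: the two test functions, the helper name `truncated_integral`, and the hand-written trapezoidal rule on a symmetric window are all illustrative choices that do not come from the text. For the coercive function $|x|$ the truncated integrals of $e^{-f}$ stabilise near 2, while for the non-coercive function $e^x$ they keep growing with the window.

```python
import numpy as np

def truncated_integral(f, radius, n=200001):
    """Trapezoidal approximation of the integral of exp(-f) over [-radius, radius]."""
    xs = np.linspace(-radius, radius, n)
    ys = np.exp(-f(xs))
    dx = xs[1] - xs[0]
    return float(dx * (ys.sum() - 0.5 * (ys[0] + ys[-1])))

coercive = lambda x: np.abs(x)        # asymptotic function |d| > 0 for d != 0
not_coercive = lambda x: np.exp(x)    # asymptotic function vanishes for d <= 0

for radius in (10.0, 20.0, 40.0):
    print(radius,
          truncated_integral(coercive, radius),
          truncated_integral(not_coercive, radius))
```

A finite computation cannot establish divergence; the growth of the second column with the window size is only consistent with it.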

B.4. Exercises
Exercise B.4.1. Suppose that $\{f_\alpha\}$ is a collection of lower semicontinuous functions. Then
$\bigvee_\alpha f_\alpha$ is lower semicontinuous. If $f$ and $g$ are real and lower semicontinuous, then $f + g$ is
also lower semicontinuous. Any $f \in C(X, \mathbb{R})$ is lower semicontinuous. For any $U \in \tau$, the
function $g(x) = 1_U(x)$ is lower semicontinuous.
Exercise B.4.2. Let $\{p_n : n \in \mathbb{Z}_+\}$ and $\{q_n : n \in \mathbb{Z}_+\}$ be non-increasing sequences of
real-valued continuous functions on a compact set $X$. If $p_n \searrow u_1$ and $q_n \searrow u_2$ and $u_1 \le u_2$,
show that for any $r \in \mathbb{N}$, there is $N_r \in \mathbb{N}$ such that $n \ge N_r$ implies that
\[
p_n(x) < q_r(x) + \frac{1}{r}, \qquad x \in X.
\]
(Hint: Consider the open sets $E_n = \{p_n < q_r + \frac{1}{r}\}$.)

Exercise B.4.3. A function f : X → R is called affine if f (λx + (1 − λ)y) = λf (x) + (1 −
λ)f (y) for all x, y in X and λ ∈ R.
(a) If f is affine, show that there exist a unique linear functional v (not necessarily
continuous) and a constant c ∈ R such that f (x) = v(x) + c. Hence, f is also
convex.
(b) If $f$ is affine and lower semicontinuous, then $f$ is continuous and $v := f - f(0)$ is in
$X^*$.
Exercise B.4.4. Suppose $f$ is a proper convex function and let $x \in \operatorname{dom}(f)$. Show that
\[
\sup\{x'(y) : x' \in \partial f(x)\} \le f'_+(x; y)
\]
for all $y \in X$.
Exercise B.4.5. Suppose $X$ is a normed space. Show that if $f(x) = \|x\|$ then $f^*(v) = \delta_{B^*(0;1)}(v)$,
where $B^*(0; 1)$ is the closed unit ball in $X^*$.
Exercise B.4.6. If C is a nonempty closed convex set, show that
C∞ = {d : C + d ⊂ C}.
(Hint: show that if x + d ∈ C for all x ∈ C, then x + nd ∈ C for all x ∈ C and n ∈ Z+ .)
No ratings yet
Kinds of Sets
18 pages
Assignment 1 PDF
No ratings yet
Assignment 1 PDF
2 pages
Graph Coloring DA-1341572615794729
No ratings yet
Graph Coloring DA-1341572615794729
21 pages
Forward Backward Chaining
No ratings yet
Forward Backward Chaining
34 pages
Travelling Salesman Problem Using Branch and Bound Approach: Chaitanya Pothineni December 13, 2013
No ratings yet
Travelling Salesman Problem Using Branch and Bound Approach: Chaitanya Pothineni December 13, 2013
8 pages
Bottom-Up Parsing Including LR (0), SLR
No ratings yet
Bottom-Up Parsing Including LR (0), SLR
55 pages
JNTUH MCA Syllabus 2013
No ratings yet
JNTUH MCA Syllabus 2013
107 pages
