
University of Cambridge

Mathematics Tripos

Part II

Probability and Measure

Michaelmas, 2018

Lectures by
E. Breuillard
Notes by
Qiangru Kuang
Contents

1 Lebesgue measure
  1.1 Boolean algebra
  1.2 Jordan measure
  1.3 Lebesgue measure

2 Abstract measure theory

3 Integration and measurable functions

4 Product measures

5 Foundations of probability theory

6 Independence
  6.1 Useful inequalities

7 Convergence of random variables

8 𝐿𝑝 spaces

9 Hilbert space and 𝐿2-methods

10 Fourier transform

11 Gaussians

12 Ergodic theory
  12.1 The canonical model

Index


1 Lebesgue measure
1.1 Boolean algebra

Definition (Boolean algebra). Let 𝑋 be a set. A Boolean algebra on 𝑋 is


a family of subsets of 𝑋 which

1. contains ∅,
2. is stable under finite unions and complementation.

Example.
• The trivial Boolean algebra ℬ = {∅, 𝑋}.
• The discrete Boolean algebra ℬ = 2𝑋 , the family of all subsets of 𝑋.

• Less trivially, if 𝑋 is a topological space, the family of constructible sets


forms a Boolean algebra, where a constructible set is a finite union of
locally closed sets, i.e. sets of the form 𝐸 = 𝑈 ∩ 𝐹 with 𝑈 open and 𝐹 closed.

Definition (finitely additive measure). Let 𝑋 be a set and ℬ a Boolean


algebra on 𝑋. A finitely additive measure on (𝑋, ℬ) is a function 𝑚 ∶ ℬ →
[0, +∞] such that
1. 𝑚(∅) = 0,

2. 𝑚(𝐸 ∪ 𝐹) = 𝑚(𝐸) + 𝑚(𝐹) whenever 𝐸 ∩ 𝐹 = ∅.

Example.
1. Counting measure: 𝑚(𝐸) = #𝐸, the cardinality of 𝐸 where ℬ is the
discrete Boolean algebra of 𝑋.
2. More generally, given 𝑓 ∶ 𝑋 → [0, +∞], define for 𝐸 ⊆ 𝑋,

𝑚(𝐸) = ∑_{𝑒∈𝐸} 𝑓(𝑒).

3. Suppose 𝑋 = ∐_{𝑖=1}^{𝑁} 𝑋𝑖, then define ℬ(𝑋) to be the unions of the 𝑋𝑖's. Assign a weight 𝑎𝑖 ≥ 0 to each 𝑋𝑖 and define 𝑚(𝐸) = ∑_{𝑖∶𝑋𝑖⊆𝐸} 𝑎𝑖 for 𝐸 ∈ ℬ.

1.2 Jordan measure


This section is a historic review and provides intuition for Lebesgue measure
theory. We’ll gloss over details of proofs in this section.

Definition. A subset of R𝑑 is called elementary if it is a finite union of


boxes, where a box is a set 𝐵 = 𝐼1 × ⋯ × 𝐼𝑑 where each 𝐼𝑖 is a finite interval
of R.


Proposition 1.1. Let 𝐵 ⊆ R𝑑 be a box. Let ℰ(𝐵) be the family of elementary


subsets of 𝐵. Then
1. ℰ(𝐵) is a Boolean algebra on 𝐵,

2. every 𝐸 ∈ ℰ(𝐵) is a disjoint finite union of boxes,


3. if 𝐸 ∈ ℰ(𝐵) can be written as a disjoint finite union in two ways, 𝐸 = ⋃_{𝑖=1}^{𝑛} 𝐵𝑖 = ⋃_{𝑗=1}^{𝑚} 𝐵𝑗′, then ∑_{𝑖=1}^{𝑛} |𝐵𝑖| = ∑_{𝑗=1}^{𝑚} |𝐵𝑗′|, where |𝐵| = ∏_{𝑖=1}^{𝑑} |𝑏𝑖 − 𝑎𝑖| if 𝐵 = 𝐼1 × ⋯ × 𝐼𝑑 and 𝐼𝑖 has endpoints 𝑎𝑖, 𝑏𝑖.

Following this, we can define a finitely additive measure corresponding to our intuition of length, area, volume etc.:

Proposition 1.2. Define 𝑚(𝐸) = ∑_{𝑖=1}^{𝑛} |𝐵𝑖| if 𝐸 is any elementary set which is the disjoint union of boxes 𝐵1, …, 𝐵𝑛 ⊆ R𝑑. Then 𝑚 is a finitely additive measure on ℰ(𝐵) for any box 𝐵.

Definition. A subset 𝐸 ⊆ R𝑑 is Jordan measurable if for any 𝜀 > 0 there


are elementary sets 𝐴, 𝐵, 𝐴 ⊆ 𝐸 ⊆ 𝐵 and 𝑚(𝐵 \ 𝐴) < 𝜀.

Remark. Jordan measurable sets are bounded.

Proposition 1.3. If a set 𝐸 ⊆ R𝑑 is Jordan measurable, then

sup_{𝐴⊆𝐸 elementary} 𝑚(𝐴) = inf_{𝐵⊇𝐸 elementary} 𝑚(𝐵).

In that case we define the Jordan measure of 𝐸 as

𝑚(𝐸) = sup_{𝐴⊆𝐸 elementary} 𝑚(𝐴).

Proof. Take 𝐴𝑛 ⊆ 𝐸 such that 𝑚(𝐴𝑛 ) ↑ sup and 𝐵𝑛 ⊇ 𝐸 such that 𝑚(𝐵𝑛 ) ↓ inf.
Note that

inf ≤ 𝑚(𝐵𝑛 ) = 𝑚(𝐴𝑛 ) + 𝑚(𝐵𝑛 \ 𝐴𝑛 ) ≤ sup +𝑚(𝐵𝑛 \ 𝐴𝑛 ) ≤ sup +𝜀

for arbitrary 𝜀 > 0 so they are equal.


Exercise.
1. If 𝐵 is a box, the family 𝒥(𝐵) of Jordan measurable subsets of 𝐵 is a
Boolean algebra.

2. A subset 𝐸 ⊆ [0, 1] is Jordan measurable if and only if 1𝐸 , the indicator


function on 𝐸, is Riemann integrable.


1.3 Lebesgue measure


Although Jordan measure corresponds to the intuition of length, area and volume, it suffers from a few severe problems:
1. unbounded sets in R𝑑 are not Jordan measurable.
2. 1_{Q∩[0,1]} is not Riemann integrable, and therefore Q ∩ [0, 1] is not Jordan measurable.
3. pointwise limits of Riemann integrable functions need not be Riemann integrable: 𝑓𝑛 ∶= 1_{(1/𝑛!)Z∩[0,1]} → 1_{Q∩[0,1]}, and the limit is not Riemann integrable.
The idea of Lebesgue is to use countable covers by boxes.

Definition. A subset 𝐸 ⊆ R𝑑 is Lebesgue measurable if for all 𝜀 > 0, there


exists a countable union of boxes 𝐶 with 𝐸 ⊆ 𝐶 and 𝑚∗ (𝐶 \ 𝐸) < 𝜀, where
𝑚∗ , the Lebesgue outer measure, is defined as

𝑚∗(𝐸) = inf{∑_{𝑖≥1} |𝐵𝑖| ∶ 𝐸 ⊆ ⋃_{𝑖≥1} 𝐵𝑖, 𝐵𝑖 boxes}

for every subset 𝐸 ⊆ R𝑑 .

Remark. wlog in these definitions we may assume that boxes are open.

Proposition 1.4. The family ℒ of Lebesgue measurable subsets of R𝑑 is a


Boolean algebra stable under countable unions.

Lemma 1.5.
1. 𝑚∗ is monotone: if 𝐸 ⊆ 𝐹 then 𝑚∗(𝐸) ≤ 𝑚∗(𝐹).
2. 𝑚∗ is countably subadditive: if 𝐸 = ⋃_{𝑛≥1} 𝐸𝑛 where 𝐸𝑛 ⊆ R𝑑 then

𝑚∗(𝐸) ≤ ∑_{𝑛≥1} 𝑚∗(𝐸𝑛).

Proof. Monotonicity is obvious. For countable subadditivity, pick 𝜀 > 0 and let
𝐶𝑛 = ⋃𝑖≥1 𝐶𝑛,𝑖 where 𝐶𝑛,𝑖 are boxes such that 𝐸𝑛 ⊆ 𝐶𝑛 and
∑_{𝑖≥1} |𝐶𝑛,𝑖| ≤ 𝑚∗(𝐸𝑛) + 𝜀/2^𝑛.

Then

∑_{𝑛≥1} ∑_{𝑖≥1} |𝐶𝑛,𝑖| ≤ ∑_{𝑛≥1} (𝑚∗(𝐸𝑛) + 𝜀/2^𝑛) = 𝜀 + ∑_{𝑛≥1} 𝑚∗(𝐸𝑛)

and 𝐸 ⊆ ⋃_{𝑛≥1} 𝐶𝑛 = ⋃_{𝑛≥1} ⋃_{𝑖≥1} 𝐶𝑛,𝑖, so

𝑚∗(𝐸) ≤ 𝜀 + ∑_{𝑛≥1} 𝑚∗(𝐸𝑛)

for all 𝜀 > 0.


Remark. Note that 𝑚∗ is not additive on the family of all subsets of R𝑑 .


However, it will be on ℒ, as we will show later.

Lemma 1.6. If 𝐴, 𝐵 are disjoint compact subsets of R𝑑 then

𝑚∗ (𝐴 ∪ 𝐵) = 𝑚∗ (𝐴) + 𝑚∗ (𝐵).

Proof. ≤ by the previous lemma so need to show ≥. Pick 𝜀 > 0. Let 𝐴 ∪ 𝐵 ⊆


⋃𝑛≥1 𝐵𝑛 where 𝐵𝑛 are open boxes such that

∑_{𝑛≥1} |𝐵𝑛| ≤ 𝑚∗(𝐴 ∪ 𝐵) + 𝜀.

wlog we may assume that the side lengths of each 𝐵𝑛 are < 𝛼/2, where

𝛼 = inf{‖𝑥 − 𝑦‖1 ∶ 𝑥 ∈ 𝐴, 𝑦 ∈ 𝐵} > 0;

the inequality 𝛼 > 0 comes from the fact that 𝐴 and 𝐵 are disjoint, compact and thus closed. wlog we may discard the 𝐵𝑛's that do not intersect 𝐴 ∪ 𝐵; then each remaining 𝐵𝑛 meets exactly one of 𝐴 and 𝐵. Then by construction

∑_{𝑛≥1} |𝐵𝑛| = ∑_{𝑛∶𝐵𝑛∩𝐴≠∅} |𝐵𝑛| + ∑_{𝑛∶𝐵𝑛∩𝐵≠∅} |𝐵𝑛| ≥ 𝑚∗(𝐴) + 𝑚∗(𝐵)

so
𝜀 + 𝑚∗ (𝐴 ∪ 𝐵) ≥ 𝑚∗ (𝐴) + 𝑚∗ (𝐵)
for all 𝜀.

Lemma 1.7. If 𝐸 ⊆ R𝑑 has 𝑚∗ (𝐸) = 0 then 𝐸 ∈ ℒ.

Definition (null set). A set 𝐸 ⊆ R𝑑 such that 𝑚∗ (𝐸) = 0 is called a null


set.

Proof. For all 𝜀 > 0, there exist 𝐶 = ⋃𝑛≥1 𝐵𝑛 where 𝐵𝑛 are boxes such that
𝐸 ⊆ 𝐶 and ∑𝑛≥1 |𝐵𝑛 | ≤ 𝜀. But

𝑚∗ (𝐶 \ 𝐸) ≤ 𝑚∗ (𝐶) ≤ 𝜀.

Lemma 1.8. Every open subset of R𝑑 and every closed subset of R𝑑 is in ℒ.
We will prove the lemma using the fact that the family of Lebesgue mea-
surable subsets is stable under countable union, which itself does not use this
lemma. This lemma, however, will be used to show the stability under comple-
mentation. Since the proof is quite technical (it has more to do with general
topology than measure theory), for brevity and fluency of ideas we present the
proof of the main proposition first.


Proof of Proposition 1.4. It is obvious that ∅ ∈ ℒ. To show it is stable under


countable unions, start with 𝐸𝑛 ∈ ℒ for 𝑛 ≥ 1. Need to show 𝐸 ∶= ⋃𝑛≥1 𝐸𝑛 ∈
ℒ.
Pick 𝜀 > 0. By assumption there exist 𝐶𝑛 = ⋃𝑖≥1 𝐵𝑛,𝑖 where 𝐵𝑛,𝑖 are boxes
such that 𝐸𝑛 ⊆ 𝐶𝑛 and
𝑚∗(𝐶𝑛 \ 𝐸𝑛) < 𝜀/2^𝑛.
Now
𝐸 = ⋃ 𝐸𝑛 ⊆ ⋃ 𝐶𝑛 =∶ 𝐶
𝑛≥1 𝑛≥1

so 𝐶 is again a countable union of boxes and 𝐶 \ 𝐸 ⊆ ⋃_{𝑛≥1} (𝐶𝑛 \ 𝐸𝑛), so

𝑚∗(𝐶 \ 𝐸) ≤ ∑_{𝑛≥1} 𝑚∗(𝐶𝑛 \ 𝐸𝑛) ≤ ∑_{𝑛≥1} 𝜀/2^𝑛 = 𝜀

by countable subadditivity so 𝐸 ∈ ℒ.
To show it is stable under complementation, suppose 𝐸 ∈ ℒ. By assumption
there exist 𝐶𝑛 a countable union of boxes with 𝐸 ⊆ 𝐶𝑛 and 𝑚∗(𝐶𝑛 \ 𝐸) ≤ 1/𝑛.
wlog we may assume the boxes are open so 𝐶𝑛 is open, 𝐶𝑛𝑐 is closed so 𝐶𝑛𝑐 ∈ ℒ.
Thus ⋃𝑛≥1 𝐶𝑛𝑐 ∈ ℒ by first part of the proof.
But
𝑚∗(𝐸𝑐 \ ⋃_{𝑛≥1} 𝐶𝑛𝑐) ≤ 𝑚∗(𝐸𝑐 \ 𝐶𝑛𝑐) = 𝑚∗(𝐶𝑛 \ 𝐸) ≤ 1/𝑛
so 𝑚∗ (𝐸 𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐 ) = 0 so 𝐸 𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐 ∈ ℒ since it is a null set. But

𝐸𝑐 = (𝐸𝑐 \ ⋃_{𝑛≥1} 𝐶𝑛𝑐) ∪ ⋃_{𝑛≥1} 𝐶𝑛𝑐,

both of which are in ℒ so 𝐸 𝑐 ∈ ℒ.


Proof of Lemma 1.8. Every open set in R𝑑 is a countable union of boxes so is
in ℒ. It is more subtle for closed sets. The key observation is that every closed
set is the countable union of compact subsets so we are left to show compact
sets of R𝑑 are in ℒ.
Let 𝐹 ⊆ R𝑑 be compact. For all 𝑘 ≥ 1, there exists 𝑂𝑘 ∶= ⋃_{𝑖≥1} 𝑂𝑘,𝑖 a countable union of open boxes such that 𝐹 ⊆ 𝑂𝑘 and

∑_{𝑖≥1} |𝑂𝑘,𝑖| ≤ 𝑚∗(𝐹) + 1/2^𝑘.

By compactness there exists a finite subcover so we can assume 𝑂𝑘 is a finite union of open boxes. Moreover, wlog assume that

1. the side lengths of 𝑂𝑘,𝑖 are ≤ 1/2^𝑘,
2. for each 𝑖, 𝑂𝑘,𝑖 intersects 𝐹.
3. 𝑂𝑘+1 ⊆ 𝑂𝑘 (by replacing 𝑂𝑘+1 with 𝑂𝑘+1 ∩ 𝑂𝑘 iteratively).


Then 𝐹 = ⋂𝑘≥1 𝑂𝑘 and we are left to show 𝑚∗ (𝑂𝑘 \ 𝐹 ) → 0. By additivity on


disjoint compact sets,

𝑚∗ (𝐹 ) + 𝑚∗ (𝑂𝑖 \ 𝑂𝑖+1 ) = 𝑚∗ (𝐹 ∪ (𝑂𝑖 \ 𝑂𝑖+1 ))

so
𝑚∗(𝐹) + 𝑚∗(𝑂𝑖 \ 𝑂𝑖+1) ≤ 𝑚∗(𝑂𝑖) ≤ ∑_{𝑗≥1} |𝑂𝑖,𝑗| ≤ 𝑚∗(𝐹) + 1/2^𝑖

so 𝑚∗(𝑂𝑖 \ 𝑂𝑖+1) ≤ 1/2^𝑖. Finally,

𝑚∗(𝑂𝑘 \ 𝐹) = 𝑚∗(⋃_{𝑖≥𝑘} (𝑂𝑖 \ 𝑂𝑖+1)) ≤ ∑_{𝑖≥𝑘} 𝑚∗(𝑂𝑖 \ 𝑂𝑖+1) ≤ ∑_{𝑖≥𝑘} 1/2^𝑖 = 1/2^{𝑘−1}.

The result we’re working towards is

Proposition 1.9. 𝑚∗ is countably additive on ℒ, i.e. if (𝐸𝑛 )𝑛≥1 where


𝐸𝑛 ∈ ℒ are pairwise disjoint then

𝑚∗(⋃_{𝑛≥1} 𝐸𝑛) = ∑_{𝑛≥1} 𝑚∗(𝐸𝑛).

Lemma 1.10. If 𝐸 ∈ ℒ then for all 𝜀 > 0 there exists 𝑈 open, 𝐹 closed,
𝐹 ⊆ 𝐸 ⊆ 𝑈 such that 𝑚∗ (𝑈 \ 𝐸) < 𝜀 and 𝑚∗ (𝐸 \ 𝐹 ) < 𝜀.

Proof. By definition of ℒ, there exists a countable union of open boxes 𝐸 ⊆


⋃𝑛≥1 𝐵𝑛 such that 𝑚∗ (⋃𝑛≥1 𝐵𝑛 \ 𝐸) < 𝜀. Just take 𝑈 = ⋃𝑛≥1 𝐵𝑛 which is
open.
For 𝐹 do the same with 𝐸 𝑐 = R𝑑 \ 𝐸 in place of 𝐸.
Proof of Proposition 1.9. First we assume each 𝐸𝑛 is compact. By a previous
lemma 𝑚∗ is additive on compact sets so for all 𝑁 ∈ N,
𝑚∗(⋃_{𝑛=1}^{𝑁} 𝐸𝑛) = ∑_{𝑛=1}^{𝑁} 𝑚∗(𝐸𝑛).

In particular

∑_{𝑛=1}^{𝑁} 𝑚∗(𝐸𝑛) ≤ 𝑚∗(⋃_{𝑛≥1} 𝐸𝑛)

since 𝑚∗ is monotone. Take 𝑁 → ∞ to get one inequality. The other direction


holds by countable subadditivity of 𝑚∗ .


Now assume that each 𝐸𝑛 is a bounded subset in ℒ. By the lemma there exists 𝐾𝑛 ⊆ 𝐸𝑛 closed, so compact, such that 𝑚∗(𝐸𝑛 \ 𝐾𝑛) ≤ 𝜀/2^𝑛. Since the 𝐾𝑛's are disjoint, by the previous case

𝑚∗(⋃_{𝑛≥1} 𝐾𝑛) = ∑_{𝑛≥1} 𝑚∗(𝐾𝑛)


then

∑_{𝑛≥1} 𝑚∗(𝐸𝑛) ≤ ∑_{𝑛≥1} (𝑚∗(𝐾𝑛) + 𝑚∗(𝐸𝑛 \ 𝐾𝑛)) ≤ 𝑚∗(⋃_{𝑛≥1} 𝐾𝑛) + ∑_{𝑛≥1} 𝜀/2^𝑛 ≤ 𝑚∗(⋃_{𝑛≥1} 𝐸𝑛) + 𝜀

which gives one direction of the inequality since 𝜀 is arbitrary; the other direction holds by countable subadditivity of 𝑚∗.
For the general case, note that R𝑑 = ⋃𝑛∈Z𝑑 𝐴𝑛 where 𝐴𝑛 is bounded and
in ℒ, for example by taking 𝐴𝑛 to be product of half open intervals of unit
length. Write 𝐸𝑛 as ⋃𝑚∈Z𝑑 𝐸𝑛 ∩ 𝐴𝑚 so just apply the previous results to
(𝐸𝑛 ∩ 𝐴𝑚 )𝑛≥1,𝑚∈Z𝑑 .

Definition (Lebesgue measure). 𝑚∗ when restricted to ℒ is called the


Lebesgue measure and is simply denoted by 𝑚.

Example (Vitali counterexample). Although ℒ is pretty big (it includes all open and closed sets, countable unions and intersections of them, and has cardinality at least 2^𝔠 where 𝔠 is the continuum, by considering a null set with cardinality 𝔠 and taking each subset thereof), it does not include every subset of R𝑑.

Consider (Q, +) as an additive subgroup of (R, +). Pick a set 𝐸 of representatives of the cosets of (Q, +), chosen inside [0, 1]: for each 𝑥 ∈ R, there exists a unique 𝑒 ∈ 𝐸 such that 𝑥 − 𝑒 ∈ Q (here we require the axiom of choice). Claim that 𝐸 ∉ ℒ and 𝑚∗ is not additive on the family of all subsets of R𝑑.

Proof. Pick distinct rationals 𝑝1 , … , 𝑝𝑁 in [0, 1]. The sets 𝑝𝑖 + 𝐸 are pairwise
disjoint so if 𝑚∗ were additive then we would have
𝑚∗(⋃_{𝑖=1}^{𝑁} (𝑝𝑖 + 𝐸)) = ∑_{𝑖=1}^{𝑁} 𝑚∗(𝑝𝑖 + 𝐸) = 𝑁𝑚∗(𝐸)

by translation invariance of 𝑚∗. But then

⋃_{𝑖=1}^{𝑁} (𝑝𝑖 + 𝐸) ⊆ [0, 2]

since 𝐸 ⊆ [0, 1], so by monotonicity of 𝑚∗,

𝑚∗(⋃_{𝑖=1}^{𝑁} (𝑝𝑖 + 𝐸)) ≤ 2,

so 𝑁𝑚∗(𝐸) ≤ 2 for all 𝑁, hence 𝑚∗(𝐸) = 0. But

[0, 1] ⊆ ⋃_{𝑞∈Q} (𝐸 + 𝑞) = R,


so by countable subadditivity of 𝑚∗,

1 = 𝑚∗([0, 1]) ≤ ∑_{𝑞∈Q} 𝑚∗(𝐸 + 𝑞) = 0.

Absurd.
In particular 𝐸 ∉ ℒ as 𝑚∗ is additive on ℒ.


2 Abstract measure theory


In this chapter we extend measure theory to arbitrary sets. Most of the theory was developed by Fréchet and Carathéodory.

Definition (𝜎-algebra). A 𝜎-algebra on a set 𝑋 is a Boolean algebra stable


under countable unions.

Definition (measurable space). A measurable space is a couple (𝑋, 𝒜)


where 𝑋 is a set and 𝒜 is a 𝜎-algebra on 𝑋.

Definition (measure). A measure on (𝑋, 𝒜) is a map 𝜇 ∶ 𝒜 → [0, ∞] such


that
1. 𝜇(∅) = 0,
2. 𝜇 is countably additive (also known as 𝜎-additive), i.e. for every family (𝐴𝑛)𝑛≥1 of disjoint subsets in 𝒜,

𝜇(⋃_{𝑛≥1} 𝐴𝑛) = ∑_{𝑛≥1} 𝜇(𝐴𝑛).

The triple (𝑋, 𝒜, 𝜇) is called a measure space.

Example.
1. (R𝑑 , ℒ, 𝑚) is a measure space.

2. (𝑋, 2𝑋 , #) where # is the counting measure.

Proposition 2.1. Let (𝑋, 𝒜, 𝜇) be a measure space. Then


1. 𝜇 is monotone: 𝐴 ⊆ 𝐵 implies 𝜇(𝐴) ≤ 𝜇(𝐵),
2. 𝜇 is countably subadditive: 𝜇(⋃𝑛≥1 𝐴𝑛 ) ≤ ∑𝑛≥1 𝜇(𝐴𝑛 ),

3. upward monotone convergence: if

𝐸1 ⊆ 𝐸2 ⊆ ⋯ ⊆ 𝐸𝑛 ⊆ …

then

𝜇(⋃_{𝑛≥1} 𝐸𝑛) = lim_{𝑛→∞} 𝜇(𝐸𝑛) = sup_{𝑛≥1} 𝜇(𝐸𝑛).

4. downward monotone convergence: if

𝐸1 ⊇ 𝐸2 ⊇ ⋯ ⊇ 𝐸𝑛 ⊇ …


and 𝜇(𝐸1) < ∞ then

𝜇(⋂_{𝑛≥1} 𝐸𝑛) = lim_{𝑛→∞} 𝜇(𝐸𝑛) = inf_{𝑛≥1} 𝜇(𝐸𝑛).

Proof.
1. 𝜇(𝐵) = 𝜇(𝐴) + 𝜇(𝐵 \ 𝐴) ≥ 𝜇(𝐴), since 𝜇(𝐵 \ 𝐴) ≥ 0, by additivity of 𝜇.
2. See example sheet. The idea is that every countable union ⋃𝑛≥1 𝐴𝑛 is a
disjoint countable union ⋃𝑛≥1 𝐵𝑛 where for each 𝑛, 𝐵𝑛 ⊆ 𝐴𝑛 . It then
follows by 𝜎-additivity.
3. Let 𝐸0 = ∅ so
⋃_{𝑛≥1} 𝐸𝑛 = ⋃_{𝑛≥1} (𝐸𝑛 \ 𝐸𝑛−1),

a disjoint union. By 𝜎-additivity,

𝜇(⋃_{𝑛≥1} 𝐸𝑛) = ∑_{𝑛≥1} 𝜇(𝐸𝑛 \ 𝐸𝑛−1)

but for all 𝑁, by additivity of 𝜇,


∑_{𝑛=1}^{𝑁} 𝜇(𝐸𝑛 \ 𝐸𝑛−1) = 𝜇(𝐸𝑁)

so take limit. The supremum part is obvious.


4. Apply the previous result to 𝐸1 \ 𝐸𝑛 .

Remark. Note the 𝜇(𝐸1 ) < ∞ condition in the last part. Counterexample:
𝐸𝑛 = [𝑛, ∞) ⊆ R.

Definition (𝜎-algebra generated by a family). Let 𝑋 be a set and ℱ be


some family of subsets of 𝑋. The intersection of all 𝜎-algebras on 𝑋
containing ℱ is a 𝜎-algebra, called the 𝜎-algebra generated by ℱ and is
denoted by 𝜎(ℱ).

Proof. Easy check. See example sheet.


Example.
1. Suppose 𝑋 = ∐_{𝑖=1}^{𝑁} 𝑋𝑖, i.e. 𝑋 admits a finite partition. Let ℱ = {𝑋1, …, 𝑋𝑁}; then 𝜎(ℱ) consists of all subsets that are unions of 𝑋𝑖's.


2. Suppose 𝑋 is countable and let ℱ be the collection of all singletons. Then
𝜎(ℱ) = 2𝑋 .


Definition (Borel 𝜎-algebra). Let 𝑋 be a topological space. The 𝜎-algebra


generated by open subsets of 𝑋 is called the Borel 𝜎-algebra of 𝑋, denoted
by ℬ(𝑋).

Proposition 2.2. If 𝑋 = R𝑑 then ℬ(𝑋) ⊆ ℒ. Moreover every 𝐴 ∈ ℒ can


be written as a disjoint union 𝐴 = 𝐵 ∪ 𝑁 where 𝐵 ∈ ℬ(𝑋) and 𝑁 is a null
set.

Proof. We’ve shown that ℒ is a 𝜎-algebra and contains all open sets so ℬ(𝑋) ⊆
ℒ. Given 𝐴 ∈ ℒ, 𝐴𝑐 ∈ ℒ so for all 𝑛 ≥ 1 there exists 𝐶𝑛 countable unions of
(open) boxes such that 𝐴𝑐 ⊆ 𝐶𝑛 and 𝑚∗(𝐶𝑛 \ 𝐴𝑐) ≤ 1/𝑛. Take 𝐶 = ⋂_{𝑛≥1} 𝐶𝑛 ∈
ℬ(𝑋). Thus 𝐵 ∶= 𝐶 𝑐 ∈ ℬ(𝑋) and 𝑚(𝐴 \ 𝐵) = 0 because 𝐴 \ 𝐵 = 𝐶 \ 𝐴𝑐 .
Remark.
1. It can be shown that ℬ(R𝑑 ) ⊊ ℒ. In fact |ℒ| ≥ 2𝔠 and |ℬ(R𝑑 )| = 𝔠.
2. If ℱ is a family of subsets of a set 𝑋, the Boolean algebra generated by
ℱ can be explicitly described as

ℬ(ℱ) = {finite unions of 𝐹1 ∩ ⋯ ∩ 𝐹𝑁 ∶ 𝐹𝑖 ∈ ℱ or 𝐹𝑖𝑐 ∈ ℱ}.

3. However, this is not so for 𝜎(ℱ). There is no “simple” description of 𝜎-


algebra generated by ℱ. (c.f. Borel hierarchy in descriptive set theory and
transfinite induction)

Definition (𝜋-system). A family ℱ of subsets of a set 𝑋 is called a 𝜋-system


if it contains ∅ and it is closed under finite intersection.

Proposition 2.3 (measure uniqueness). Let (𝑋, 𝒜) be a measurable space.


Assume 𝜇1 and 𝜇2 are two finite measures (i.e. 𝜇𝑖 (𝑋) < ∞) such that
𝜇1 (𝐹 ) = 𝜇2 (𝐹 ) for every 𝐹 ∈ ℱ where ℱ is a 𝜋-system with 𝜎(ℱ) = 𝒜.
Then 𝜇1 = 𝜇2 .

For R𝑑 , we only have to check open boxes.


Proof. We state first the following lemma:

Lemma 2.4 (Dynkin lemma). If ℱ is a 𝜋-system on 𝑋 and 𝒞 is a family of subsets of 𝑋 such that ℱ ⊆ 𝒞 and 𝒞 is stable under complementation and disjoint countable unions, then 𝜎(ℱ) ⊆ 𝒞.

Let 𝒞 = {𝐴 ∈ 𝒜 ∶ 𝜇1 (𝐴) = 𝜇2 (𝐴)}. Then 𝒞 is clearly stable under comple-


mentation as
𝜇𝑖 (𝐴𝑐 ) = 𝜇𝑖 (𝑋 \ 𝐴) = 𝜇𝑖 (𝑋) − 𝜇𝑖 (𝐴).
𝒞 is also clearly stable under countable disjoint unions by 𝜎-additivity. Thus by
Dynkin lemma, 𝒞 ⊇ 𝜎(ℱ) = 𝒜.


Proof of Dynkin lemma. Let ℳ be the smallest family of subsets of 𝑋 contain-


ing ℱ and stable under complementation and countable disjoint union (2𝑋 is
such a family and taking intersection). Sufficient to show that ℳ is a 𝜎-algebra,
as then ℳ ⊆ 𝒞 implies 𝜎(ℱ) ⊆ 𝒞.
It suffices to show ℳ is a Boolean algebra. Let

ℳ′ = {𝐴 ∈ ℳ ∶ 𝐴 ∩ 𝐵 ∈ ℳ for all 𝐵 ∈ ℱ}.

ℳ′ again is stable under countable disjoint unions and complementation because

𝐴𝑐 ∩ 𝐵 = (𝐵𝑐 ∪ (𝐴 ∩ 𝐵))𝑐

as a disjoint union so is in ℳ.
As ℳ′ ⊇ ℱ, by minimality of ℳ, have ℳ = ℳ′ . Now let

ℳ″ = {𝐴 ∈ ℳ′ ∶ 𝐴 ∩ 𝐵 ∈ ℳ for all 𝐵 ∈ ℳ}.

The same argument shows that ℳ″ = ℳ. Thus ℳ is a Boolean algebra and a


𝜎-algebra.

Proposition 2.5 (uniqueness of Lebesgue measure). Lebesgue measure is


the unique translation invariant measure 𝜇 on (R𝑑 , ℬ(R𝑑 )) such that

𝜇([0, 1]𝑑 ) = 1.

Proof. Exercise. Hint: use the 𝜋-system ℱ made of all boxes in R𝑑 and dissect
a cube into dyadic pieces. Then approximate and use monotone convergence.
Remark.
1. There is no countably additive translation invariant measure on R defined
on all subsets of R. (c.f. Vitali’s counterexample).

2. However, the Lebesgue measure can be extended to a finitely additive


measure on all subsets of R (proof requires Hahn-Banach theorem. See
IID Linear Analysis).
Recall the construction of Lebesgue measure: we take boxes in R𝑑 , and define
elementary sets, which is the Boolean algebra generated by boxes. Then we can
define Jordan measure which is finitely additive. However, this is not countably
additive but analysis craves limits so we define Lebesgue measurable sets, by
introducing the outer measure 𝑚∗ , which is built from the Jordan measure.
Finally we restrict this outer measure to ℒ. We also define the Borel 𝜎-algebra,
which is the same as the 𝜎-algebra generated by the boxes. We show that the
Borel 𝜎-algebra is contained in ℒ, and every element in ℒ can be written as a
disjoint union of an element in the Borel 𝜎-algebra and a measure zero set.
Suppose ℬ is a Boolean algebra on a set 𝑋. Let 𝜇 be a finitely additive
measure on ℬ. We are going to construct a measure on 𝜎(ℬ).


Theorem 2.6 (Carathéodory extension theorem). Assume that 𝜇 is count-


ably additive on ℬ, i.e. if 𝐵𝑛 ∈ ℬ disjoint is such that ⋃𝑛≥1 𝐵𝑛 ∈ ℬ then
𝜇(⋃𝑛≥1 𝐵𝑛 ) = ∑𝑛≥1 𝜇(𝐵𝑛 ) and assume that 𝜇 is 𝜎-finite, i.e. there exists
𝑋𝑚 ∈ ℬ such that 𝑋 = ⋃𝑚≥1 𝑋𝑚 and 𝜇(𝑋𝑚 ) < ∞, then 𝜇 extends uniquely
to a measure on 𝜎(ℬ).

Proof. For any 𝐸 ⊆ 𝑋, let

𝜇∗(𝐸) = inf{∑_{𝑛≥1} 𝜇(𝐵𝑛) ∶ 𝐸 ⊆ ⋃_{𝑛≥1} 𝐵𝑛, 𝐵𝑛 ∈ ℬ}

and call it the outer measure associated to 𝜇. Define a subset 𝐸 ⊆ 𝑋 to be


𝜇∗ -measurable if for all 𝜀 > 0 there exists 𝐶 = ⋃𝑛≥1 𝐵𝑛 with 𝐵𝑛 ∈ ℬ such that
𝐸 ⊆ 𝐶 and
𝜇∗ (𝐶 \ 𝐸) ≤ 𝜀.
We denote by ℬ∗ the set of 𝜇∗ -measurable subsets. Claim that
1. 𝜇∗ is countably subadditive and monotone.

2. 𝜇∗ (𝐵) = 𝜇(𝐵) for all 𝐵 ∈ ℬ.


3. ℬ∗ is a 𝜎-algebra and contains all 𝜇∗ -null sets and ℬ.
4. 𝜇∗ is 𝜎-additive on ℬ∗ .

Then existence follows from the proposition as ℬ∗ ⊇ 𝜎(ℬ): 𝜇∗ will be a


measure on ℬ∗ and thus on 𝜎(ℬ). Uniqueness follows from a similar proof for
Lebesgue measure via Dynkin lemma.
Proof. This will be very easy as we only need to adapt our previous work to the general case. Note that on a few occasions we used properties of R𝑑, such as openness of some sets, so be careful.
1. Same.
2. 𝜇∗ (𝐵) ≤ 𝜇(𝐵) for all 𝐵 ∈ ℬ by definition of 𝜇∗ . For the other direction, for
all 𝜀 > 0, there exist 𝐵𝑛 ∈ ℬ such that 𝐵 ⊆ ⋃𝑛≥1 𝐵𝑛 and ∑𝑛≥1 𝜇(𝐵𝑛 ) ≤
𝜇∗ (𝐵) + 𝜀. But
𝐵 = ⋃_{𝑛≥1} (𝐵𝑛 ∩ 𝐵) = ⋃_{𝑛≥1} 𝐶𝑛

where 𝐶𝑛 ∶= (𝐵𝑛 ∩ 𝐵) \ ⋃_{𝑖<𝑛} (𝐵 ∩ 𝐵𝑖), so 𝐶𝑛 ∈ ℬ. Thus by countable additivity

𝜇(𝐵) = ∑_{𝑛≥1} 𝜇(𝐶𝑛) ≤ ∑_{𝑛≥1} 𝜇(𝐵𝑛) ≤ 𝜇∗(𝐵) + 𝜀.

3. 𝜇∗ -null sets and ℬ are obviously in ℬ∗ . Thus it is left to show that ℬ∗


is a 𝜎-algebra. Stability under countable union is exactly the same and
then we claim that ℬ∗ is stable under complementation. This is the bit
where we used closed/open sets in R𝑑 in the original proof. Here we use
a lemma as a substitute.


Lemma 2.7. Suppose 𝐵𝑛 ∈ ℬ then ⋂𝑛≥1 𝐵𝑛 ∈ ℬ∗ .

Proof. First claim that if 𝐸 = ⋂𝑛≥1 𝐼𝑛 where 𝐼𝑛+1 ⊆ 𝐼𝑛 and 𝐼𝑛 ∈ ℬ such


that 𝜇(𝐼1 ) < ∞ then 𝜇∗ (𝐸) = lim𝑛→∞ 𝜇(𝐼𝑛 ) and 𝐸 ∈ ℬ∗ : by additivity of
𝜇 on ℬ,
∑_{𝑛=1}^{𝑁} 𝜇(𝐼𝑛 \ 𝐼𝑛+1) = 𝜇(𝐼1) − 𝜇(𝐼𝑁+1)

which converges as 𝑁 → ∞ (because 𝜇(𝐼𝑛+1) ≤ 𝜇(𝐼𝑛)), so

∑_{𝑛≥𝑁} 𝜇(𝐼𝑛 \ 𝐼𝑛+1) → 0

as 𝑁 → ∞. But the LHS is at least 𝜇∗(𝐼𝑁 \ 𝐸) because 𝐼𝑁 \ 𝐸 = ⋃_{𝑛≥𝑁} (𝐼𝑛 \ 𝐼𝑛+1). Therefore 𝐸 ∈ ℬ∗ and

𝜇(𝐼𝑛) ≤ 𝜇∗(𝐼𝑛 \ 𝐸) + 𝜇∗(𝐸), where 𝜇∗(𝐼𝑛 \ 𝐸) → 0 and 𝜇∗(𝐸) ≤ 𝜇(𝐼𝑛),

so

lim_{𝑛→∞} 𝜇(𝐼𝑛) = 𝜇∗(𝐸).

Now for the actual lemma, let 𝐸 = ⋂𝑛≥1 𝐼𝑛 where 𝐼𝑛 ∈ ℬ. wlog we may
assume 𝐼𝑛+1 ⊆ 𝐼𝑛 . By 𝜎-finiteness assumption, 𝑋 = ⋃𝑚≥1 𝑋𝑚 where
𝑋𝑚 ∈ ℬ with 𝜇(𝑋𝑚 ) < ∞ so

𝐸 = ⋃_{𝑚≥1} (𝐸 ∩ 𝑋𝑚).

By the claim for all 𝑚, 𝐸 ∩ 𝑋𝑚 ∈ ℬ∗ so 𝐸 ∈ ℬ∗ .

From the lemma we can derive that ℬ∗ is also stable under complementation: given 𝐸 ∈ ℬ∗, for all 𝑛 there exists 𝐶𝑛 = ⋃_{𝑖≥1} 𝐵𝑛,𝑖 with 𝐵𝑛,𝑖 ∈ ℬ such that 𝐸 ⊆ 𝐶𝑛 and 𝜇∗(𝐶𝑛 \ 𝐸) ≤ 1/𝑛. Now

𝐸𝑐 = (⋃_{𝑛≥1} 𝐶𝑛𝑐) ∪ (𝐸𝑐 \ ⋃_{𝑛≥1} 𝐶𝑛𝑐)

but 𝐶𝑛𝑐 is a countable intersection ⋂_{𝑖≥1} 𝐵𝑐𝑛,𝑖 and 𝐸𝑐 \ ⋃_{𝑛≥1} 𝐶𝑛𝑐 is 𝜇∗-null, so by the lemma 𝐶𝑛𝑐 ∈ ℬ∗. Therefore their union is also in ℬ∗. Since we've shown that null sets are in ℬ∗, 𝐸𝑐 ∈ ℬ∗.


4. We want to show 𝜇∗ is countably additive on ℬ∗ . Recall that 𝜇 is 𝜎-finite:
there exists 𝑋𝑚 ∈ ℬ such that 𝑋 = ⋃𝑚≥1 𝑋𝑚 , 𝜇(𝑋𝑚 ) < ∞. We say
𝐸 ⊆ 𝑋 is bounded if there exists 𝑚 such that 𝐸 ⊆ 𝑋𝑚 . It is then enough
to show countable additivity for bounded sets by the same argument as
before: write 𝑋 = ⋃𝑚≥1 𝑋̃ 𝑚 where 𝑋̃ 𝑚 = 𝑋𝑚 \ ⋃𝑖<𝑚 𝑋𝑖 ∈ ℬ so this is a
disjoint union. Then if 𝐸 = ⋃𝑛≥1 𝐸𝑛 as a disjoint union then

𝐸 = ⋃_{𝑛≥1} ⋃_{𝑚≥1} (𝐸𝑛 ∩ 𝑋̃𝑚)


which is also a countable disjoint union.


Given 𝐸, if we can show finite additivity then

∑_{𝑛=1}^{𝑁} 𝜇∗(𝐸𝑛) = 𝜇∗(⋃_{𝑛=1}^{𝑁} 𝐸𝑛) ≤ 𝜇∗(𝐸) ≤ ∑_{𝑛≥1} 𝜇∗(𝐸𝑛);

take the limit as 𝑁 → ∞ to get equality throughout.


It suffices to prove finite additivity when 𝐸 and 𝐹 are countable intersec-
tions of sets from ℬ: 𝐸, 𝐹 ∈ ℬ∗ so for 𝜀 > 0 there exists 𝐶, 𝐷 countable
intersections of sets from ℬ such that 𝐶 ⊆ 𝐸, 𝐷 ⊆ 𝐹 and

𝜇∗ (𝐸) ≤ 𝜇∗ (𝐶) + 𝜀
𝜇∗ (𝐹 ) ≤ 𝜇∗ (𝐷) + 𝜀

As 𝐸 ∩ 𝐹 = ∅ and 𝐶 ⊆ 𝐸, 𝐷 ⊆ 𝐹, 𝐶 ∩ 𝐷 = ∅ so by finite additivity,

𝜇∗ (𝐸) + 𝜇∗ (𝐹 ) ≤ 2𝜀 + 𝜇∗ (𝐶 ∪ 𝐷) ≤ 2𝜀 + 𝜇∗ (𝐸 ∪ 𝐹 ).

As usual, reverse holds by subadditivity.


Finally, for 𝐸 = ⋂_{𝑛≥1} 𝐼𝑛, 𝐹 = ⋂_{𝑛≥1} 𝐽𝑛 bounded, wlog assume 𝐼𝑛+1 ⊆ 𝐼𝑛, 𝐽𝑛+1 ⊆ 𝐽𝑛 and 𝜇(𝐼𝑛), 𝜇(𝐽𝑛) < ∞. Now use the claim in the proof of Lemma 2.7:

𝜇∗(𝐸) = lim_{𝑛→∞} 𝜇(𝐼𝑛), 𝜇∗(𝐹) = lim_{𝑛→∞} 𝜇(𝐽𝑛)

so

𝜇∗(𝐸) + 𝜇∗(𝐹) = lim_{𝑛→∞} (𝜇(𝐼𝑛) + 𝜇(𝐽𝑛)) = lim_{𝑛→∞} (𝜇(𝐼𝑛 ∪ 𝐽𝑛) + 𝜇(𝐼𝑛 ∩ 𝐽𝑛)).

But

⋂_{𝑛≥1} (𝐼𝑛 ∩ 𝐽𝑛) = 𝐸 ∩ 𝐹 = ∅, ⋂_{𝑛≥1} (𝐼𝑛 ∪ 𝐽𝑛) = 𝐸 ∪ 𝐹

so by the same claim

lim_{𝑛→∞} 𝜇(𝐼𝑛 ∩ 𝐽𝑛) = 0, lim_{𝑛→∞} 𝜇(𝐼𝑛 ∪ 𝐽𝑛) = 𝜇∗(𝐸 ∪ 𝐹),

which finishes the proof.

Remark. One can similarly prove that every set in ℬ∗ is a disjoint union 𝐸 = 𝐹 ∪ 𝑁 where 𝐹 ∈ 𝜎(ℬ) and 𝑁 is 𝜇∗-null.


Definition (completion). We say that ℬ∗ is the completion of 𝜎(ℬ) with


respect to 𝜇.

Example. ℒ is the completion of ℬ(R𝑑 ) in R𝑑 .


3 Integration and measurable functions

Definition (measurable function). Let (𝑋, 𝒜) be a measurable space. A


function 𝑓 ∶ 𝑋 → R is called measurable or 𝒜-measurable if for all 𝑡 ∈ R,

{𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) < 𝑡} ∈ 𝒜.

Remark. The 𝜎-algebra generated by intervals (−∞, 𝑡) where 𝑡 ∈ R is the Borel


𝜎-algebra of R, denoted ℬ(R). Thus for every measurable function 𝑓 ∶ 𝑋 → R,
the preimage 𝑓 −1 (𝐵) ∈ 𝒜 for all 𝐵 ∈ ℬ(R). However, it is not true that
𝑓 −1 (𝐿) ∈ 𝒜 for any 𝐿 ∈ ℒ.
Remark. If 𝑓 is allowed to take the values +∞ and −∞ we will say that 𝑓 is
measurable if additionally 𝑓 −1 ({+∞}) ∈ 𝒜 and 𝑓 −1 ({−∞}) ∈ 𝒜.
More generally,

Definition (measurable map). Suppose (𝑋, 𝒜) and (𝑌 , ℬ) are measurable


spaces. A map 𝑓 ∶ 𝑋 → 𝑌 is measurable if for all 𝐵 ∈ ℬ, 𝑓 −1 (𝐵) ∈ 𝒜.

Proposition 3.1.

1. The composition of measurable maps is measurable.


2. If 𝑓, 𝑔 ∶ (𝑋, 𝒜) → R are measurable functions then 𝑓 + 𝑔, 𝑓𝑔 and 𝜆𝑓
for 𝜆 ∈ R are also measurable.
3. If (𝑓𝑛 )𝑛≥1 is a sequence of measurable functions on (𝑋, 𝒜) then so are
sup𝑛 𝑓𝑛 , inf𝑛 𝑓𝑛 , lim sup𝑛 𝑓𝑛 and lim inf𝑛 𝑓𝑛 .

Proof.
1. Obvious.
2. Follow from 1 once it’s shown that + ∶ R2 → R and × ∶ R2 → R are
measurable (with respect to Borel sets). The sets

{(𝑥, 𝑦) ∶ 𝑥 + 𝑦 < 𝑡}
{(𝑥, 𝑦) ∶ 𝑥𝑦 < 𝑡}

are open in R2 and hence Borel.


3. inf_𝑛 𝑓𝑛(𝑥) < 𝑡 if and only if

𝑥 ∈ ⋃_{𝑛} {𝑥 ∶ 𝑓𝑛(𝑥) < 𝑡}

and similarly for sup. Similarly lim sup_𝑛 𝑓𝑛(𝑥) < 𝑡 if and only if

𝑥 ∈ ⋃_{𝑚≥1} ⋃_{𝑘≥1} ⋂_{𝑛≥𝑘} {𝑥 ∶ 𝑓𝑛(𝑥) < 𝑡 − 1/𝑚}.


Proposition 3.2. 𝑓 = (𝑓1 , … , 𝑓𝑑 ) ∶ (𝑋, 𝒜) → (R𝑑 , ℬ(R𝑑 )) where 𝑑 ≥ 1 is


measurable if and only if each 𝑓𝑖 ∶ 𝑋 → R is measurable.

Proof. One direction is easy: suppose 𝑓 is measurable then

{𝑥 ∶ 𝑓𝑖 (𝑥) < 𝑡} = 𝑓 −1 ({𝑦 ∈ R𝑑 ∶ 𝑦𝑖 < 𝑡}),

which is open so 𝑓𝑖 is measurable.


Conversely, suppose 𝑓𝑖 is measurable. Then
𝑓−1(∏_{𝑖=1}^{𝑑} [𝑎𝑖, 𝑏𝑖]) = ⋂_{𝑖=1}^{𝑑} {𝑥 ∶ 𝑎𝑖 ≤ 𝑓𝑖(𝑥) ≤ 𝑏𝑖}.

As the boxes generate the Borel sets, done.


Example.
1. Let (𝑋, 𝒜) be a measurable space and 𝐸 ⊆ 𝑋. Then 𝐸 ∈ 𝒜 if and only if
1𝐸 , the indicator function on 𝐸, is 𝒜-measurable.

2. If 𝑋 = ∐_{𝑖=1}^{𝑁} 𝑋𝑖 and 𝒜 is the Boolean algebra generated by the 𝑋𝑖's, a function 𝑓 ∶ (𝑋, 𝒜) → R is measurable if and only if 𝑓 is constant on each


𝑋𝑖 . In this case the vector space of measurable functions has dimension
𝑁.
3. Every continuous function 𝑓 ∶ R𝑑 → R is measurable.

Definition (Borel measurable). If 𝑋 is a topological space, 𝑓 ∶ 𝑋 → R is


Borel or Borel measurable if it is ℬ(𝑋)-measurable.

Definition (simple function). A function 𝑓 on (𝑋, 𝒜) is called simple if


𝑓 = ∑_{𝑖=1}^{𝑛} 𝑎𝑖 1𝐴𝑖

for some 𝑎𝑖 ≥ 0 and 𝐴𝑖 ∈ 𝒜.

Of course simple functions are measurable.

Lemma 3.3. If a simple function can be written in two ways


𝑓 = ∑_{𝑖=1}^{𝑛} 𝑎𝑖 1𝐴𝑖 = ∑_{𝑗=1}^{𝑠} 𝑏𝑗 1𝐵𝑗

then

∑_{𝑖=1}^{𝑛} 𝑎𝑖 𝜇(𝐴𝑖) = ∑_{𝑗=1}^{𝑠} 𝑏𝑗 𝜇(𝐵𝑗)

for any measure 𝜇 on (𝑋, 𝒜).

Proof. Example sheet 1.


Definition (integral of a simple function with respect to a measure). The


𝜇-integral of 𝑓 is defined by
𝑛
𝜇(𝑓) ∶= ∑ 𝑎𝑖 𝜇(𝐴𝑖 ).
𝑖=1

Remark.
1. The lemma says that the integral is well-defined.
2. We also use the notation ∫_𝑋 𝑓 𝑑𝜇 to denote 𝜇(𝑓).

Proposition 3.4. 𝜇-integral satisfies, for all simple functions 𝑓 and 𝑔,


1. linearity: for all 𝛼, 𝛽 ≥ 0, 𝜇(𝛼𝑓 + 𝛽𝑔) = 𝛼𝜇(𝑓) + 𝛽𝜇(𝑔).
2. positivity: if 𝑔 ≤ 𝑓 then 𝜇(𝑔) ≤ 𝜇(𝑓).

3. if 𝜇(𝑓) = 0 then 𝑓 = 0 𝜇-almost everywhere, i.e. {𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) ≠ 0}


is a 𝜇-null set.

Proof. Obvious from definition and lemma.

Definition. If 𝑓 ≥ 0 and measurable on (𝑋, 𝒜), define

𝜇(𝑓) = sup{𝜇(𝑔) ∶ 𝑔 simple , 𝑔 ≤ 𝑓} ∈ [0, +∞].

Remark. This is consistent with the definition for 𝑓 simple, due to positivity.

Definition (integrable). If 𝑓 is an arbitrary measurable function on (𝑋, 𝒜)


we say 𝑓 is 𝜇-integrable if
𝜇(|𝑓|) < ∞.

Definition (integral with respect to a measure). If 𝑓 is 𝜇-integrable, then


we define its 𝜇-integral by

𝜇(𝑓) = 𝜇(𝑓 + ) − 𝜇(𝑓 − )

where 𝑓 + = max{0, 𝑓} and 𝑓 − = (−𝑓)+ .

Note.
|𝑓| = 𝑓 + + 𝑓 −
𝑓 = 𝑓+ − 𝑓−

Theorem 3.5 (monotone convergence theorem). Let (𝑓𝑛 )𝑛≥1 be a sequence


of measurable functions on a measure space (𝑋, 𝒜, 𝜇) such that

0 ≤ 𝑓 1 ≤ 𝑓 2 ≤ ⋯ ≤ 𝑓𝑛 ≤ …


Let 𝑓 = lim_{𝑛→∞} 𝑓𝑛. Then

𝜇(𝑓) = lim_{𝑛→∞} 𝜇(𝑓𝑛).

Lemma 3.6. If 𝑔 is a simple function on (𝑋, 𝒜, 𝜇), the map

𝑚𝑔 ∶ 𝒜 → [0, ∞]
𝐸 ↦ 𝜇(1𝐸 𝑔)

is a measure on (𝑋, 𝒜).

Proof. Write 𝑔 = ∑_{𝑖=1}^{𝑟} 𝑎𝑖 1𝐴𝑖 so 𝑔1𝐸 = ∑_{𝑖=1}^{𝑟} 𝑎𝑖 1𝐴𝑖∩𝐸, so

𝜇(1𝐸 𝑔) = ∑_{𝑖=1}^{𝑟} 𝑎𝑖 𝜇(𝐴𝑖 ∩ 𝐸).

By a question on example sheet this is well-defined. Then 𝜎-additivity follows


immediately from 𝜎-additivity of 𝜇.

Proof of monotone convergence theorem. 𝑓𝑛 ≤ 𝑓𝑛+1 ≤ 𝑓 by assumption so

𝜇(𝑓𝑛 ) ≤ 𝜇(𝑓𝑛+1 ) ≤ 𝜇(𝑓)

by definition of the integral, so

lim_{𝑛→∞} 𝜇(𝑓𝑛) ≤ 𝜇(𝑓),

although the RHS may be infinite.


Let 𝑔 be any simple function with 𝑔 ≤ 𝑓. Need to show that 𝜇(𝑔) ≤
lim𝑛→∞ 𝜇(𝑓𝑛 ). Pick 𝜀 > 0. Let

𝐸𝑛 = {𝑥 ∈ 𝑋 ∶ 𝑓𝑛 (𝑥) ≥ (1 − 𝜀)𝑔(𝑥)}.

Then 𝑋 = ⋃_{𝑛≥1} 𝐸𝑛 and 𝐸𝑛 ⊆ 𝐸𝑛+1. So we may apply upward monotone convergence for sets to the measure 𝑚𝑔 and get

lim_{𝑛→∞} 𝑚𝑔(𝐸𝑛) = 𝑚𝑔(𝑋) = 𝜇(𝑔1𝑋) = 𝜇(𝑔).

But

(1 − 𝜀)𝑚𝑔(𝐸𝑛) = 𝜇((1 − 𝜀)𝑔1𝐸𝑛) ≤ 𝜇(𝑓𝑛)

because (1 − 𝜀)𝑔1𝐸𝑛 is a simple function smaller than 𝑓𝑛. Taking limits,

(1 − 𝜀)𝜇(𝑔) ≤ lim_{𝑛→∞} 𝜇(𝑓𝑛),

which holds for all 𝜀, so

𝜇(𝑔) ≤ lim_{𝑛→∞} 𝜇(𝑓𝑛).


Lemma 3.7. If 𝑓 ≥ 0 is a measurable function on (𝑋, 𝒜) then there is a


sequence of simple functions (𝑔𝑛 )𝑛≥1

0 ≤ 𝑔𝑛 ≤ 𝑔𝑛+1 ≤ 𝑓

such that for all 𝑥 ∈ 𝑋, 𝑔𝑛 (𝑥) ↑ 𝑓(𝑥).

Notation. 𝑔𝑛 ↑ 𝑓 means that lim𝑛→∞ 𝑔𝑛 (𝑥) = 𝑓(𝑥) and 𝑔𝑛+1 ≥ 𝑔𝑛 .


Proof. We can take

𝑔𝑛 = ⌊2^𝑛 min{𝑓, 𝑛}⌋ / 2^𝑛

pointwise. Check that ⌊2𝑦⌋ ≥ 2⌊𝑦⌋ for all 𝑦 ≥ 0.
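This construction is concrete enough to compute. Below is a minimal numerical sketch (Python with numpy; the choice of 𝑓 is an arbitrary example, not from the notes) of the approximating simple functions 𝑔𝑛 = ⌊2^𝑛 min{𝑓, 𝑛}⌋/2^𝑛, checking 0 ≤ 𝑔𝑛 ≤ 𝑔𝑛+1 ≤ 𝑓 and the pointwise convergence 𝑔𝑛 ↑ 𝑓.

```python
import numpy as np

def g(f_vals, n):
    # g_n = floor(2^n * min(f, n)) / 2^n takes finitely many dyadic values,
    # so it is a simple function, and g_n <= f by construction.
    return np.floor(2.0**n * np.minimum(f_vals, n)) / 2.0**n

x = np.linspace(0, 10, 1001)
f_vals = np.exp(x / 3)                    # a sample measurable f >= 0

prev = np.zeros_like(f_vals)
for n in range(1, 30):
    cur = g(f_vals, n)
    # monotonicity g_{n-1} <= g_n uses floor(2y) >= 2*floor(y)
    assert np.all(prev <= cur) and np.all(cur <= f_vals)
    prev = cur

print(np.max(f_vals - prev))              # ~ 2^-29: g_n(x) increases up to f(x)
```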

Proposition 3.8. Basic properties of the integral (for positive functions):


suppose 𝑓, 𝑔 ≥ 0 are measurable on (𝑋, 𝒜, 𝜇).
1. linearity: for all 𝛼, 𝛽 ≥ 0, 𝜇(𝛼𝑓 + 𝛽𝑔) = 𝛼𝜇(𝑓) + 𝛽𝜇(𝑔).
2. positivity: if 0 ≤ 𝑓 ≤ 𝑔 then 𝜇(𝑓) ≤ 𝜇(𝑔).

3. if 𝜇(𝑓) = 0 then 𝑓 = 0 𝜇-almost everywhere.


4. if 𝑓 = 𝑔 𝜇-almost everywhere then 𝜇(𝑓) = 𝜇(𝑔).
Proof.
1. Follows from the same property for simple functions and from Lemma 3.7
combined with monotone convergence theorem.
2. Obvious from definition.
3. Since

{𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) ≠ 0} = ⋃_{𝑛≥1} {𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) > 1/𝑛},

set 𝑔𝑛 = (1/𝑛) 1_{{𝑥∈𝑋∶𝑓(𝑥)>1/𝑛}}, which is simple and 𝑔𝑛 ≤ 𝑓, so by definition of the integral 𝜇(𝑔𝑛) ≤ 𝜇(𝑓) = 0, so 𝜇(𝑔𝑛) = 0, i.e. 𝜇({𝑥 ∶ 𝑓(𝑥) > 1/𝑛}) = 0.
4. Note that if 𝐸 ∈ 𝒜, 𝜇(𝐸 𝑐 ) = 0 then
𝜇(ℎ1𝐸 ) = 𝜇(ℎ)
for all ℎ simple. Thus it holds for all ℎ ≥ 0 measurable. Now take
𝐸 = {𝑥 ∶ 𝑓(𝑥) = 𝑔(𝑥)}.

Proposition 3.9 (linearity of integral). Suppose 𝑓, 𝑔 are 𝜇-integrable func-


tions and 𝛼, 𝛽 ∈ R. Then 𝛼𝑓 + 𝛽𝑔 is 𝜇-integrable and

𝜇(𝛼𝑓 + 𝛽𝑔) = 𝛼𝜇(𝑓) + 𝛽𝜇(𝑔).


Proof. We have shown the case when 𝛼, 𝛽 ≥ 0 and 𝑓, 𝑔 ≥ 0. In the general case,
use the positive and negative parts.


Lemma 3.10 (Fatou’s lemma). Suppose (𝑓𝑛 )𝑛≥1 is a sequence of measurable


functions on (𝑋, 𝒜, 𝜇) such that 𝑓𝑛 ≥ 0 for all 𝑛. Then

𝜇(lim inf_{𝑛→∞} 𝑓𝑛) ≤ lim inf_{𝑛→∞} 𝜇(𝑓𝑛).

Remark. We may not have equality: let 𝑓𝑛 = 1[𝑛,𝑛+1] on (R, ℒ, 𝑚). Then
𝜇(𝑓𝑛 ) = 1 but lim𝑛→∞ 𝑓𝑛 = 0.

Proof. Let 𝑔𝑛 ∶= inf_{𝑘≥𝑛} 𝑓𝑘. Then 𝑔𝑛+1 ≥ 𝑔𝑛 ≥ 0, so by the monotone convergence theorem 𝜇(𝑔𝑛) ↑ 𝜇(𝑔) as 𝑛 → ∞, where 𝑔 = lim_{𝑛→∞} 𝑔𝑛 = lim inf_{𝑛→∞} 𝑓𝑛; and 𝑔𝑛 ≤ 𝑓𝑛 so 𝜇(𝑔𝑛) ≤ 𝜇(𝑓𝑛) for all 𝑛. Take 𝑛 → ∞:

𝜇(𝑔) ≤ lim inf_{𝑛→∞} 𝜇(𝑓𝑛).

In both monotone convergence theorem and Fatou’s lemma we assumed that


the sequence of functions is nonnegative. There is another version of convergence
theorem where we replace nonnegativity by domination:

Theorem 3.11 (Lebesgue’s dominated convergence theorem). Let (𝑓𝑛 )𝑛≥1


be a sequence of measurable functions on (𝑋, 𝒜, 𝜇) and 𝑔 a 𝜇-integrable
function on 𝑋. Assume |𝑓𝑛 | ≤ 𝑔 for all 𝑛 (domination assumption) and
assume for all 𝑥 ∈ 𝑋, lim𝑛→∞ 𝑓𝑛 (𝑥) = 𝑓(𝑥). Then 𝑓 is 𝜇-integrable and

𝜇(𝑓) = lim_{𝑛→∞} 𝜇(𝑓𝑛).

This allows us to swap limit and integral.


Proof. |𝑓𝑛 | ≤ 𝑔 so |𝑓| ≤ 𝑔 so 𝜇(|𝑓|) ≤ 𝜇(𝑔) < ∞ and 𝑓 is integrable. Note that
𝑔 + 𝑓𝑛 ≥ 0 so by Fatou’s lemma,

𝜇(lim inf_{𝑛→∞} (𝑔 + 𝑓𝑛)) ≤ lim inf_{𝑛→∞} 𝜇(𝑔 + 𝑓𝑛).

But lim inf_{𝑛→∞} (𝑔 + 𝑓𝑛) = 𝑔 + 𝑓 and by linearity 𝜇(𝑔 + 𝑓𝑛) = 𝜇(𝑔) + 𝜇(𝑓𝑛), so

𝜇(𝑔) + 𝜇(𝑓) ≤ 𝜇(𝑔) + lim inf_{𝑛→∞} 𝜇(𝑓𝑛),

i.e.

𝜇(𝑓) ≤ lim inf_{𝑛→∞} 𝜇(𝑓𝑛).

Do the same with 𝑔 − 𝑓𝑛 in place of 𝑔 + 𝑓𝑛 to get

𝜇(−𝑓) ≤ lim inf_{𝑛→∞} 𝜇(−𝑓𝑛) = − lim sup_{𝑛→∞} 𝜇(𝑓𝑛)

so

𝜇(𝑓) = lim_{𝑛→∞} 𝜇(𝑓𝑛).


Corollary 3.12 (exchanging integral and summation). Let (𝑋, 𝒜, 𝜇) be a


measure space and let (𝑓𝑛 )𝑛≥1 be a sequence of measurable functions on 𝑋.
1. If 𝑓𝑛 ≥ 0 then

𝜇(∑_{𝑛≥1} 𝑓𝑛) = ∑_{𝑛≥1} 𝜇(𝑓𝑛).

2. If ∑_{𝑛≥1} |𝑓𝑛| is 𝜇-integrable then ∑_{𝑛≥1} 𝑓𝑛 is 𝜇-integrable and

𝜇(∑_{𝑛≥1} 𝑓𝑛) = ∑_{𝑛≥1} 𝜇(𝑓𝑛).

Proof.
1. Let 𝑔𝑁 = ∑_{𝑛=1}^{𝑁} 𝑓𝑛; then 𝑔𝑁 ↑ ∑_{𝑛≥1} 𝑓𝑛 as 𝑁 → ∞, so the result follows from the monotone convergence theorem.
2. Let 𝑔 = ∑𝑛≥1 |𝑓𝑛 | and 𝑔𝑁 as above. Then |𝑔𝑁 | ≤ 𝑔 for all 𝑁 so the
domination assumption holds. The result thus follows from dominated
convergence theorem.

Corollary 3.13 (differentiation under integral sign). Let (𝑋, 𝒜, 𝜇) be a


measure space. Let 𝑈 ⊆ R be an open set and let 𝑓 ∶ 𝑈 × 𝑋 → R be such
that

1. 𝑥 ↦ 𝑓(𝑡, 𝑥) is 𝜇-integrable for all 𝑡 ∈ 𝑈,


2. 𝑡 ↦ 𝑓(𝑡, 𝑥) is differentiable for all 𝑥 ∈ 𝑋,
3. domination: there exists 𝑔 ∶ 𝑋 → R 𝜇-integrable such that for all 𝑡 ∈ 𝑈, 𝑥 ∈ 𝑋,

|∂𝑓/∂𝑡 (𝑡, 𝑥)| ≤ 𝑔(𝑥).

Then 𝑥 ↦ ∂𝑓/∂𝑡 (𝑡, 𝑥) is 𝜇-integrable for all 𝑡 ∈ 𝑈 and if we set 𝐹(𝑡) = ∫_𝑋 𝑓(𝑡, 𝑥)𝑑𝜇 then 𝐹 is differentiable and

𝐹′(𝑡) = ∫_𝑋 ∂𝑓/∂𝑡 (𝑡, 𝑥)𝑑𝜇.

Proof. Pick ℎ𝑛 > 0, ℎ𝑛 → 0 and define

𝑔𝑛(𝑡, 𝑥) ∶= (𝑓(𝑡 + ℎ𝑛, 𝑥) − 𝑓(𝑡, 𝑥)) / ℎ𝑛.

Then

lim_{𝑛→∞} 𝑔𝑛(𝑡, 𝑥) = ∂𝑓/∂𝑡 (𝑡, 𝑥).

By the mean value theorem, there exists 𝜃𝑡,𝑛,𝑥 ∈ [𝑡, 𝑡 + ℎ𝑛] such that

𝑔𝑛(𝑡, 𝑥) = ∂𝑓/∂𝑡 (𝜃𝑡,𝑛,𝑥, 𝑥)


so
|𝑔𝑛 (𝑡, 𝑥)| ≤ 𝑔(𝑥)
by domination assumption. Now apply dominated convergence theorem.
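As a numerical sanity check of the corollary (Python; the integrand 𝑓(𝑡, 𝑥) = 𝑒^{−𝑡𝑥^2} on 𝑈 = (0, 2), 𝑋 = [0, 1] is my own choice, with ∂𝑓/∂𝑡 = −𝑥^2 𝑒^{−𝑡𝑥^2} dominated by the integrable 𝑔(𝑥) = 𝑥^2): a difference quotient of 𝐹 matches the integral of ∂𝑓/∂𝑡.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_001)

def integrate(vals):
    # trapezoidal rule on the fixed uniform grid x
    return float(np.sum((vals[1:] + vals[:-1]) / 2) * (x[1] - x[0]))

def F(t):
    return integrate(np.exp(-t * x**2))       # F(t) = integral of f(t, x) over [0, 1]

t, h = 1.0, 1e-6
diff_quotient = (F(t + h) - F(t)) / h          # (F(t + h) - F(t)) / h
dF = integrate(-x**2 * np.exp(-t * x**2))      # integral of df/dt (t, x) over [0, 1]

print(diff_quotient, dF)                       # agree to ~1e-6, as predicted
```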
Remark.

1. If 𝑓 ∶ [𝑎, 𝑏] → R is continuous where 𝑎 < 𝑏 in R, then 𝑓 is 𝑚-integrable (where 𝑚 is the Lebesgue measure) and 𝑚(𝑓) = ∫_𝑎^𝑏 𝑓(𝑥)𝑑𝑥 is the Riemann integral. In general if 𝑓 is only assumed to be bounded, then 𝑓 is Riemann integrable if and only if the set of points of discontinuity of 𝑓 is an 𝑚-null set. See example sheet 2.
2. If 𝑔 ∈ GL𝑑(R) and 𝑓 ≥ 0 is Borel measurable on R𝑑, then

𝑚(𝑓 ∘ 𝑔) = (1/|det 𝑔|) 𝑚(𝑓).

See example sheet 2. In particular 𝑚 is invariant under linear transfor-


mation whose determinant has absolute value 1, e.g. rotation.
Remark. In each of monotone convergence theorem, Fatou’s lemma and dom-
inated convergence theorem, we can replace pointwise assumption by the corre-
sponding 𝜇-almost everywhere. The same conclusion holds. Indeed, let

𝐸 = {𝑥 ∈ 𝑋 ∶ assumptions hold at 𝑥}

so 𝐸𝑐 is a 𝜇-null set. Replace each 𝑓𝑛 (and similarly 𝑔 etc.) by 1𝐸 𝑓𝑛. The assumptions then hold everywhere, and 𝜇(𝑓1𝐸) = 𝜇(𝑓) for all 𝑓 measurable.


4 Product measures

Definition (product 𝜎-algebra). Let (𝑋, 𝒜) and (𝑌 , ℬ) be measurable spaces.


The 𝜎-algebra of subsets of 𝑋 ×𝑌 generated by the product sets 𝐸 ×𝐹 where
𝐸 ∈ 𝒜, 𝐹 ∈ ℬ is called the product 𝜎-algebra of 𝒜 and ℬ and is denoted by
𝒜 ⊗ ℬ.

Remark.
1. By analogy with the notion of product topology, 𝒜 ⊗ ℬ is the smallest
𝜎-algebra of subsets of 𝑋 × 𝑌 making the two projection maps measurable.
2. ℬ(R𝑑1 ) ⊗ ℬ(R𝑑2 ) = ℬ(R𝑑1 +𝑑2 ). See example sheet. However this is not so
for ℒ(R𝑑 ).

Lemma 4.1. If 𝐸 ⊆ 𝑋 × 𝑌 is 𝒜 ⊗ ℬ-measurable then for all 𝑥 ∈ 𝑋, the


slice
𝐸𝑥 = {𝑦 ∈ 𝑌 ∶ (𝑥, 𝑦) ∈ 𝐸}
is in ℬ.

Proof. Let
ℰ = {𝐸 ⊆ 𝑋 × 𝑌 ∶ 𝐸𝑥 ∈ ℬ for all 𝑥 ∈ 𝑋}.
Note that ℰ contains all product sets 𝐴 × 𝐵 where 𝐴 ∈ 𝒜, 𝐵 ∈ ℬ. ℰ is a 𝜎-
algebra: if 𝐸 ∈ ℰ then 𝐸 𝑐 ∈ ℰ and if 𝐸𝑛 ∈ ℰ then ⋃ 𝐸𝑛 ∈ ℰ since (𝐸 𝑐 )𝑥 = (𝐸𝑥 )𝑐
and (⋃ 𝐸𝑛 )𝑥 = ⋃(𝐸𝑛 )𝑥 .

Lemma 4.2. Assume (𝑋, 𝒜, 𝜇) and (𝑌 , ℬ, 𝜈) are 𝜎-finite measure spaces.


Let 𝑓 ∶ 𝑋 × 𝑌 → [0, +∞] be 𝒜 ⊗ ℬ-measurable. Then
1. for all 𝑥 ∈ 𝑋, the function 𝑦 ↦ 𝑓(𝑥, 𝑦) is ℬ-measurable.

2. the map 𝑥 ↦ ∫_𝑌 𝑓(𝑥, 𝑦)𝑑𝜈(𝑦) is 𝒜-measurable.

Proof.
1. In case 𝑓 = 1𝐸 for 𝐸 ∈ 𝒜 ⊗ ℬ the function 𝑦 ↦ 𝑓(𝑥, 𝑦) is just 𝑦 ↦ 1𝐸𝑥 (𝑦),
which is measurable by the previous lemma.
More generally, the result is true for simple functions and thus for all
measurable functions by taking pointwise limit.
2. By the same reduction we may assume 𝑓 = 1𝐸 for some 𝐸 ∈ 𝒜 ⊗ ℬ. Now
let 𝑌 = ⋃𝑚≥1 𝑌𝑚 with 𝜈(𝑌𝑚 ) < ∞. Let

ℰ = {𝐸 ∈ 𝒜 ⊗ ℬ ∶ 𝑥 ↦ 𝜈(𝐸𝑥 ∩ 𝑌𝑚 ) is 𝒜-measurable for all 𝑚}.

ℰ contains all product sets 𝐸 = 𝐴 × 𝐵 where 𝐴 ∈ 𝒜, 𝐵 ∈ ℬ because 𝜈(𝐸𝑥 ∩ 𝑌𝑚) = 1_{𝑥∈𝐴} 𝜈(𝐵 ∩ 𝑌𝑚). ℰ is stable under complementation:

𝜈((𝐸 𝑐 )𝑥 ∩ 𝑌𝑚 ) = 𝜈(𝑌𝑚 ) − 𝜈(𝑌𝑚 ∩ 𝐸𝑥 )


where the right hand side is 𝒜-measurable in 𝑥. ℰ is stable under disjoint countable unions:


let 𝐸 = ⋃_{𝑛≥1} 𝐸𝑛 where the 𝐸𝑛 ∈ ℰ are disjoint. Then by 𝜎-additivity

𝜈(𝐸𝑥 ∩ 𝑌𝑚) = ∑_{𝑛≥1} 𝜈((𝐸𝑛)𝑥 ∩ 𝑌𝑚)

which is 𝒜-measurable.
The product sets form a 𝜋-system which generates the product 𝜎-algebra, so by the Dynkin lemma ℰ = 𝒜 ⊗ ℬ.

Definition (product measure). Let (𝑋, 𝒜, 𝜇) and (𝑌 , ℬ, 𝜈) be measure


spaces and 𝜇, 𝜈 𝜎-finite. Then there exists a unique product measure, denoted
by 𝜇 ⊗ 𝜈, on 𝒜 ⊗ ℬ such that for all 𝐴 ∈ 𝒜, 𝐵 ∈ ℬ,

(𝜇 ⊗ 𝜈)(𝐴 × 𝐵) = 𝜇(𝐴)𝜈(𝐵).

Proof. Uniqueness follows from the Dynkin lemma. For existence, set

𝜎(𝐸) = ∫_𝑋 𝜈(𝐸𝑥)𝑑𝜇(𝑥).

𝜎 is well-defined because 𝑥 ↦ 𝜈(𝐸𝑥) is 𝒜-measurable by Lemma 4.2. 𝜎 is countably additive: suppose 𝐸 = ⋃_{𝑛≥1} 𝐸𝑛 where the 𝐸𝑛 ∈ 𝒜 ⊗ ℬ are disjoint; then

𝜎(𝐸) = ∫_𝑋 𝜈(𝐸𝑥)𝑑𝜇(𝑥) = ∫_𝑋 ∑_{𝑛≥1} 𝜈((𝐸𝑛)𝑥)𝑑𝜇(𝑥) = ∑_{𝑛≥1} ∫_𝑋 𝜈((𝐸𝑛)𝑥)𝑑𝜇(𝑥) = ∑_{𝑛≥1} 𝜎(𝐸𝑛)

by a corollary of MCT.

Theorem 4.3 (Tonelli-Fubini). Let (𝑋, 𝒜, 𝜇) and (𝑌 , ℬ, 𝜈) be 𝜎-finite mea-


sure spaces.

1. Let 𝑓 ∶ 𝑋 × 𝑌 → [0, +∞] be 𝒜 ⊗ ℬ-measurable. Then

∫_{𝑋×𝑌} 𝑓(𝑥, 𝑦)𝑑(𝜇 ⊗ 𝜈) = ∫_𝑋 ∫_𝑌 𝑓(𝑥, 𝑦)𝑑𝜈(𝑦)𝑑𝜇(𝑥) = ∫_𝑌 ∫_𝑋 𝑓(𝑥, 𝑦)𝑑𝜇(𝑥)𝑑𝜈(𝑦).

2. If 𝑓 ∶ 𝑋 × 𝑌 → R is 𝜇 ⊗ 𝜈-integrable then for 𝜇-almost every 𝑥, 𝑦 ↦ 𝑓(𝑥, 𝑦) is 𝜈-integrable, for 𝜈-almost every 𝑦, 𝑥 ↦ 𝑓(𝑥, 𝑦) is 𝜇-integrable, and

∫_{𝑋×𝑌} 𝑓(𝑥, 𝑦)𝑑(𝜇 ⊗ 𝜈) = ∫_𝑋 ∫_𝑌 𝑓(𝑥, 𝑦)𝑑𝜈(𝑦)𝑑𝜇(𝑥) = ∫_𝑌 ∫_𝑋 𝑓(𝑥, 𝑦)𝑑𝜇(𝑥)𝑑𝜈(𝑦).

Without the nonnegativity or integrability assumption, the result is false in


general. For example for 𝑋 = 𝑌 = N, let 𝒜 = ℬ be discrete 𝜎-algebras and


𝜇 = 𝜈 the counting measure. Let 𝑓(𝑛, 𝑚) = 1_{𝑛=𝑚} − 1_{𝑛=𝑚+1}. Check that

∑_{𝑛≥1} 𝑓(𝑛, 𝑚) = 0 for every 𝑚, while ∑_{𝑚≥1} 𝑓(𝑛, 𝑚) = 0 for 𝑛 ≥ 2 and 1 for 𝑛 = 1,

so

∑_{𝑛≥1} ∑_{𝑚≥1} 𝑓(𝑛, 𝑚) ≠ ∑_{𝑚≥1} ∑_{𝑛≥1} 𝑓(𝑛, 𝑚).
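Each inner sum here has only finitely many nonzero terms, so one can verify the two iterated sums exactly by summing the inner index out far enough (Python sketch; the truncation sizes are arbitrary):

```python
import numpy as np

M = 100                     # outer index range to inspect
K = M + 10                  # inner index summed far enough to be exact

def f(n, m):
    return (n == m).astype(int) - (n == m + 1).astype(int)

idx = np.arange(1, K + 1)

# for each fixed m: the sum over all n is exactly 0 (terms at n = m, m+1 cancel)
inner_over_n = np.array([f(idx, m).sum() for m in range(1, M + 1)])
# for each fixed n: the sum over all m is 1 for n = 1 and 0 for n >= 2
inner_over_m = np.array([f(n, idx).sum() for n in range(1, M + 1)])

print(inner_over_n.sum(), inner_over_m.sum())   # 0 vs 1: iterated sums differ
```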

Proof.

1. The result holds for 𝑓 = 1𝐸 where 𝐸 ∈ 𝒜 ⊗ ℬ by the definition of product


measure and lemma 2, so it holds for all simple functions. Now take limits
and apply MCT.
2. Write 𝑓 = 𝑓 + − 𝑓 − and apply 1.

Note.
1. The Lebesgue measure 𝑚𝑑 on R𝑑 is equal to 𝑚1 ⊗ ⋯ ⊗ 𝑚1, because this is true on boxes and then extends by uniqueness of measures.

2. 𝐸 ∈ 𝒜 ⊗ ℬ is 𝜇 ⊗ 𝜈-null if and only if for 𝜇-almost every 𝑥, 𝜈(𝐸𝑥) = 0.


5 Foundations of probability theory


Modern probability theory was founded by Kolmogorov, who formulated the
axioms of probability theory in 1933 in his monograph Foundations of the Theory of
Probability. He defined a probability space to be a measure space (Ω, ℱ, P). The
interpretation is as follow: Ω is the universe of possible outcomes. However, we
wouldn’t be able to assign probability to every single outcome unless the space
is discrete. Instead we are interested in studying some subsets of Ω, which
are called events and contained in ℱ. Finally P is a probability measure with
P(Ω) = 1. Thus for 𝐴 ∈ ℱ, P(𝐴 occurs) ∈ [0, 1], and finite additivity of P says that if 𝐴 and 𝐵 never occur simultaneously then P(𝐴 or 𝐵) = P(𝐴) + P(𝐵). 𝜎-additivity is slightly more difficult to justify; it is perhaps easier to accept the equivalent notion of continuity: if 𝐴𝑛+1 ⊆ 𝐴𝑛 and ⋂_{𝑛≥1} 𝐴𝑛 = ∅ then P(𝐴𝑛) → 0 as 𝑛 → ∞.

Definition (probability measure, probability space). Let Ω be a set and ℱ


a 𝜎-algebra on Ω. A measure 𝜇 on (Ω, ℱ) is called a probability measure if
𝜇(Ω) = 1 and the measure space (Ω, ℱ, 𝜇) is called a probability space.

Definition (random variable). A measurable function 𝑋 ∶ Ω → R is called


a random variable.

We usually use a capital letter to denote a random variable.

Definition (expectation). If (Ω, ℱ, P) is a probability space then the P-


integral is called expectation, denoted E.

Definition (distribution/law). A random variable 𝑋 ∶ Ω → R on a proba-


bility space (Ω, ℱ, P) determines a Borel measure 𝜇𝑋 on R defined by

𝜇𝑋 ((−∞, 𝑡]) = P(𝑋 ≤ 𝑡) = P({𝜔 ∈ Ω ∶ 𝑋(𝜔) ≤ 𝑡})

and 𝜇𝑋 is called the distribution of 𝑋, or the law of 𝑋.

Note. 𝜇𝑋 is the image of P under

Ω→R
𝜔 ↦ 𝑋(𝜔)

Definition (distribution function). The function

𝐹𝑋 ∶ R → [0, 1]
𝑡 ↦ P(𝑋 ≤ 𝑡)

is called the distribution function of 𝑋.


Proposition 5.1. If (Ω, ℱ, P) is a probability space and 𝑋 ∶ Ω → R is a ran-


dom variable then 𝐹𝑋 is non-decreasing, right-continuous and it determines
𝜇𝑋 uniquely.

Proof. Given 𝑡𝑛 ↓ 𝑡,

𝐹𝑋(𝑡𝑛) = P(𝑋 ≤ 𝑡𝑛) → P(⋂_{𝑛≥1} {𝑋 ≤ 𝑡𝑛}) = P(𝑋 ≤ 𝑡) = 𝐹𝑋(𝑡)

by downward monotone convergence for sets. Uniqueness follows from Dynkin


lemma applied to the 𝜋-system {∅} ∪ {(−∞, 𝑡]}𝑡∈R .
Conversely,

Proposition 5.2. If 𝐹 ∶ R → [0, 1] is a non-decreasing right-continuous


function with

lim_{𝑡→−∞} 𝐹(𝑡) = 0, lim_{𝑡→+∞} 𝐹(𝑡) = 1,

then there exists a unique probability measure 𝜇 on R such that

𝐹 (𝑡) = 𝜇((−∞, 𝑡])

for all 𝑡 ∈ R.

Remark. The measure 𝜇 is called the Lebesgue-Stieltjes measure on R associ-


ated to 𝐹. Furthermore for all 𝑎, 𝑏 ∈ R,

𝜇((𝑎, 𝑏]) = 𝐹 (𝑏) − 𝐹 (𝑎).

We can also construct Lebesgue measure this way.


Proof. Uniqueness is the same as above. For existence, we use the lemma

Lemma 5.3. Let

𝑔 ∶ (0, 1) → R
𝑦 ↦ inf{𝑥 ∈ R ∶ 𝐹 (𝑥) ≥ 𝑦}

then 𝑔 is non-decreasing, left-continuous and for all 𝑥 ∈ R, 𝑦 ∈ (0, 1), 𝑔(𝑦) ≤


𝑥 if and only if 𝐹 (𝑥) ≥ 𝑦.

Proof. Let
𝐼𝑦 = {𝑥 ∈ R ∶ 𝐹 (𝑥) ≥ 𝑦}.
Clearly if 𝑦1 ≥ 𝑦2 then 𝐼𝑦1 ⊆ 𝐼𝑦2 so 𝑔(𝑦2 ) ≤ 𝑔(𝑦1 ) so 𝑔 is non-decreasing. 𝐼𝑦 is
an interval of R because if 𝑥 > 𝑥1 and 𝑥1 ∈ 𝐼𝑦 then 𝐹 (𝑥) ≥ 𝐹 (𝑥1 ) ≥ 𝑦 so 𝑥 ∈ 𝐼𝑦 .
So 𝐼𝑦 is an interval with endpoints 𝑔(𝑦) and +∞. But 𝐹 is right-continuous so
𝑔(𝑦) = min 𝐼𝑦 and the minimum is obtained. Thus 𝐼𝑦 = [𝑔(𝑦), +∞).
This means that 𝑥 ≥ 𝑔(𝑦) if and only if 𝑥 ∈ 𝐼𝑦 if and only if 𝐹 (𝑥) ≥ 𝑦.
Finally for left-continuity, suppose 𝑦𝑛 ↑ 𝑦 then ⋂𝑛≥1 𝐼𝑦𝑛 = 𝐼𝑦 by definition
of 𝐼𝑦 so 𝑔(𝑦𝑛 ) → 𝑔(𝑦).


Remark. If 𝐹 is continuous and strictly increasing then 𝑔 = 𝐹 −1 .


Now back to the proposition. Set 𝜇 = 𝑔∗ 𝑚 where 𝑚 is the Lebesgue measure
on (0, 1). 𝜇 is a probability measure as 𝑔 is Borel-measurable. By the lemma
𝜇((𝑎, 𝑏]) = 𝑚(𝑔−1 (𝑎, 𝑏]) = 𝑚((𝐹 (𝑎), 𝐹 (𝑏))) = 𝐹 (𝑏) − 𝐹 (𝑎).

Proposition 5.4. If 𝜇 is a Borel probability measure on R then there exists


some probability space (Ω, ℱ, P) and a random variable 𝑋 on Ω such that
𝜇𝑋 = 𝜇.
In fact, one can even pick Ω = (0, 1), ℱ = ℬ(0, 1) and P = 𝑚, the
Lebesgue measure.
Proof. For the first claim set Ω = R, ℱ = ℬ(R), P = 𝜇 and 𝑋(𝑥) = 𝑥.
For the second claim, set 𝐹 (𝑡) = 𝜇((−∞, 𝑡]) and take 𝑋 = 𝑔 where 𝑔 is the
auxiliary function defined in the previous lemma, namely
𝑋(𝜔) = inf{𝑥 ∶ 𝐹 (𝑥) ≥ 𝜔}.
Check that 𝜇𝑋 = 𝜇:
𝜇𝑋((𝑎, 𝑏]) = P(𝑋 ∈ (𝑎, 𝑏]) = 𝑚({𝜔 ∈ (0, 1) ∶ 𝑎 < 𝑋(𝜔) ≤ 𝑏}) = 𝑚({𝜔 ∈ (0, 1) ∶ 𝐹(𝑎) < 𝜔 ≤ 𝐹(𝑏)}) = 𝐹(𝑏) − 𝐹(𝑎) = 𝜇((𝑎, 𝑏]).
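This proof is constructive and is exactly the inverse transform sampling method: push the Lebesgue measure on (0, 1) forward through the quantile function 𝑔. A minimal sketch (Python with numpy; the exponential target law is an arbitrary example, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0   # target mu = exponential law of rate lam, F(x) = 1 - exp(-lam x)

def g(y):
    # quantile function g(y) = inf{x : F(x) >= y} = -log(1 - y) / lam
    return -np.log1p(-y) / lam

omega = rng.uniform(size=1_000_000)  # omega distributed as Lebesgue on (0, 1)
X = g(omega)                         # X = g(omega) then has law mu

for t in [0.1, 0.5, 1.0, 2.0]:       # empirical CDF of X matches F
    print(t, np.mean(X <= t), 1 - np.exp(-lam * t))
```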

Remark. If 𝜇 is a Borel probability measure on R such that 𝜇 = 𝑓𝑑𝑡 for some 𝑓 ≥ 0 measurable, we say that 𝜇 has a density (with respect to Lebesgue measure) and 𝑓 is called the density of 𝜇. Here 𝜇 = 𝑓𝑑𝑡 means that 𝜇((𝑎, 𝑏]) = ∫_𝑎^𝑏 𝑓(𝑡)𝑑𝑡.
Example.
1. uniform distribution on [0, 1]:
𝑓(𝑡) = 1[0,1] (𝑡)
𝐹 (𝑡) = 𝜇((−∞, 𝑡] ∩ [0, 1])

2. exponential distribution of rate 𝜆:


𝑓𝜆(𝑡) = 𝜆𝑒^{−𝜆𝑡} 1_{𝑡≥0}, 𝐹𝜆(𝑡) = ∫_{−∞}^{𝑡} 𝑓𝜆(𝑠)𝑑𝑠 = 1_{𝑡≥0}(1 − 𝑒^{−𝜆𝑡})

3. Gaussian distribution with standard deviation 𝜎 and mean 𝑚:


𝑓𝜎,𝑚(𝑡) = (1/√(2𝜋𝜎^2)) exp(−(𝑡 − 𝑚)^2 / (2𝜎^2)), 𝐹𝜎,𝑚(𝑡) = ∫_{−∞}^{𝑡} 𝑓𝜎,𝑚(𝑠)𝑑𝑠


Definition (mean, moment, variance). If 𝑋 is a random variable then


1. E(𝑋) is called the mean,
2. E(𝑋 𝑘 ) is called the 𝑘th-moment of 𝑋,
3. Var(𝑋) = E((𝑋 − E𝑋)2 ) = E(𝑋 2 ) − E(𝑋)2 is called the variance.

Remark. Suppose 𝑓 ≥ 0 is measurable and 𝑋 is a random variable. Then

E(𝑓(𝑋)) = ∫_R 𝑓(𝑥)𝑑𝜇𝑋(𝑥),

where 𝜇𝑋 = 𝑋∗P by definition.


6 Independence
Independence is the key notion that makes probability theory distinct from
(abstract) measure theory.

Definition (independence). Let (Ω, ℱ, P) be a probability space. A se-


quence of events (𝐴𝑛 )𝑛≥1 is called independent or mutually independent if
for all 𝐹 ⊆ N finite,
P(⋂_{𝑖∈𝐹} 𝐴𝑖) = ∏_{𝑖∈𝐹} P(𝐴𝑖).

Definition (independent 𝜎-algebra). A sequence of 𝜎-algebras (𝒜𝑛 )𝑛≥1


where 𝒜𝑛 ⊆ ℱ is called independent if for all 𝐴𝑛 ∈ 𝒜𝑛 , the family (𝐴𝑛 )𝑛≥1
is independent.

Remark.
1. To prove that (𝒜𝑛 )𝑛≥1 is an independent family, it is enough to check
the independence condition for all 𝐴𝑛 ’s with 𝐴𝑛 ∈ Π𝑛 where Π𝑛 is a 𝜋-
system generating 𝒜𝑛 . The proof is an application of Dynkin lemma. For
example for 𝜎-algebras 𝒜1 , 𝒜2 , suffices to check

P(𝐴1 ∩ 𝐴2 ) = P(𝐴1 )P(𝐴2 )

for all 𝐴1 ∈ Π1 , 𝐴2 ∈ Π2 . Fix 𝐴2 ∈ Π2 , look at the measures

𝐴1 ↦ P(𝐴1 ∩ 𝐴2 )
𝐴1 ↦ P(𝐴1 )P(𝐴2 )

on 𝒜1 . They coincide on Π1 by assumption and hence everywhere on 𝒜1 .


Subsequently consider 𝒜2 .
Notation. Suppose 𝑋 is a random variable. Denote by 𝜎(𝑋) the smallest
𝜎-subalgebra 𝒜 of ℱ such that 𝑋 is 𝒜-measurable, i.e.

𝜎(𝑋) = 𝜎({𝜔 ∈ Ω ∶ 𝑋(𝜔) ≤ 𝑡}𝑡∈R ).

Definition (independence). A sequence of random variables (𝑋𝑖 )𝑖≥1 is


called independent if the sequence of 𝜎-subalgebras (𝜎(𝑋𝑖 ))𝑖≥1 is indepen-
dent.
Remark. This is equivalent to the condition that for all (𝑡𝑖 )𝑖≥1 , for all 𝑛,
P((𝑋1 ≤ 𝑡1) ∩ ⋯ ∩ (𝑋𝑛 ≤ 𝑡𝑛)) = ∏_{𝑖=1}^{𝑛} P(𝑋𝑖 ≤ 𝑡𝑖).

Yet another equivalent formulation is

𝜇_{(𝑋1,…,𝑋𝑛)} = ⨂_{𝑖=1}^{𝑛} 𝜇𝑋𝑖

as Borel probability measures on R^𝑛: “the joint law is the product of the individual laws”.


Note. Note that independence is a property of a family, so pairwise independence is necessary but not sufficient for independence. A famous counterexample is Bernstein's example: take 𝑋 and 𝑌 to be random variables for two independent fair coin flips. Set 𝑍 = |𝑋 − 𝑌|. Then 𝑍 = 0 if and only if 𝑋 = 𝑌. Check that

P(𝑍 = 0) = P(𝑍 = 1) = 1/2

and each pair (𝑋, 𝑌), (𝑋, 𝑍) and (𝑌, 𝑍) is independent. But (𝑋, 𝑌, 𝑍) is not independent.

Proposition 6.1. If 𝑋 and 𝑌 are independent random variables, 𝑋 ≥


0, 𝑌 ≥ 0 then
E(𝑋𝑌 ) = E(𝑋)E(𝑌 ).

Proof. Essentially Tonelli-Fubini:

E(𝑋𝑌) = ∫_{R^2} 𝑥𝑦 𝑑𝜇_{(𝑋,𝑌)}(𝑥, 𝑦) = ∫_{R^2} 𝑥𝑦 𝑑𝜇𝑋(𝑥)𝑑𝜇𝑌(𝑦) = (∫_R 𝑥𝑑𝜇𝑋(𝑥))(∫_R 𝑦𝑑𝜇𝑌(𝑦)) = E(𝑋)E(𝑌)

Remark. As in Tonelli-Fubini, we may require 𝑋𝑌 to be integrable instead and


the same conclusion holds.

Example. Let Ω = (0, 1), ℱ = ℬ(0, 1), P = 𝑚 the Lebesgue measure. Write
the decimal expansion of 𝜔 ∈ (0, 1) as

𝜔 = 0.𝜀1 𝜀2 …

where 𝜀𝑖 (𝜔) ∈ {0, … , 9}. Choose a convention so that each 𝜔 has a well-defined
expansion (to avoid things like 0.099 ⋯ = 0.100 …). Now let 𝑋𝑛 (𝜔) = 𝜀𝑛 (𝜔).
Claim that the (𝑋𝑛)𝑛≥1 are iid. random variables uniformly distributed on {0, …, 9}, where “iid.” stands for independent and identically distributed.
Proof. Easy check. For example 𝑋1(𝜔) = ⌊10𝜔⌋ so

P(𝑋1 = 𝑖1) = 1/10.

Similarly, for all 𝑛,

P(𝑋1 = 𝑖1, …, 𝑋𝑛 = 𝑖𝑛) = 1/10^𝑛, P(𝑋𝑛 = 𝑖𝑛) = 1/10,

so

P(𝑋1 = 𝑖1, …, 𝑋𝑛 = 𝑖𝑛) = ∏_{𝑘=1}^{𝑛} P(𝑋𝑘 = 𝑖𝑘).


Remark.

𝜔 = ∑_{𝑛≥1} 𝑋𝑛(𝜔)/10^𝑛

is distributed according to Lebesgue measure, so if we want we can construct Lebesgue measure as the law of this random variable.
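A short simulation of this (Python; purely illustrative, my own choices of sample sizes): drawing 𝜔 uniformly, the digit maps 𝑋𝑛 behave like iid uniform draws from {0, …, 9}, and conversely summing iid digits reconstructs a uniform 𝜔.

```python
import numpy as np

rng = np.random.default_rng(1)

def digits(omega, n):
    # first n decimal digits eps_1, ..., eps_n of omega in (0, 1)
    return [int(10**k * omega) % 10 for k in range(1, n + 1)]

# X_1, X_2 on a large sample look iid uniform on {0, ..., 9}:
omegas = rng.uniform(size=100_000)
d = np.array([digits(w, 2) for w in omegas])
print(np.bincount(d[:, 0]) / len(d))   # each digit value has frequency ~ 0.1
joint = np.mean((d[:, 0] == 3) & (d[:, 1] == 7))
print(joint, 0.1 * 0.1)                # joint frequency ~ product: independence

# conversely, omega = sum X_n / 10^n is ~ uniform on (0, 1):
X = rng.integers(0, 10, size=(100_000, 12))
omega = (X / 10.0 ** np.arange(1, 13)).sum(axis=1)
print(np.mean(omega <= 0.25))          # ~ 0.25, matching Lebesgue measure
```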

Proposition 6.2 (infinite product of product measure). Let (Ω𝑖 , ℱ𝑖 , 𝜇𝑖 )𝑖≥1


be a sequence of probability spaces, Ω = ∏𝑖≥1 Ω𝑖 and ℰ be the Boolean algebra
of cylinder sets, i.e. sets of the form

𝐴 × ∏_{𝑖>𝑛} Ω𝑖

for some 𝐴 ∈ ⨂𝑛𝑖=1 ℱ𝑖 . Set ℱ = 𝜎(ℰ), the infinite product 𝜎-algebra. Then
there is a unique probability measure 𝜇 on (Ω, ℱ) such that it agrees with
product measures on all cylinder sets, i.e.
𝜇(𝐴 × ∏_{𝑖>𝑛} Ω𝑖) = (⨂_{𝑖=1}^{𝑛} 𝜇𝑖)(𝐴)

for all 𝐴 ∈ ⨂𝑛𝑖=1 ℱ𝑖 .

Proof. Omitted. See example sheet 3.

Lemma 6.3 (Borel-Cantelli). Let (Ω, ℱ, P) be a probability space and (𝐴𝑛 )𝑛≥1
a sequence of events.
1. If ∑𝑛≥1 P(𝐴𝑛 ) < ∞ then

P(lim sup_𝑛 𝐴𝑛) = 0.

2. Conversely, if (𝐴𝑛 )𝑛≥1 are independent and ∑𝑛≥1 P(𝐴𝑛 ) = ∞ then

P(lim sup_𝑛 𝐴𝑛) = 1.

Note that lim sup_𝑛 𝐴𝑛 is also written {𝐴𝑛 io.}, meaning “𝐴𝑛 occurs infinitely often”.
Proof.
1. Let 𝑌 = ∑_{𝑛≥1} 1𝐴𝑛, a random variable. Then

E(𝑌) = ∑_{𝑛≥1} E(1𝐴𝑛) = ∑_{𝑛≥1} P(𝐴𝑛).

Since 𝑌 ≥ 0, recall that we prove that E(𝑌 ) < ∞ implies that 𝑌 < ∞
almost surely, i.e. P-almost everywhere.
2. Note that
(lim sup_𝑛 𝐴𝑛)𝑐 = ⋃_𝑁 ⋂_{𝑛≥𝑁} 𝐴𝑐𝑛


so

P(⋂_{𝑛≥𝑁} 𝐴𝑐𝑛) ≤ P(⋂_{𝑛=𝑁}^{𝑀} 𝐴𝑐𝑛) = ∏_{𝑛=𝑁}^{𝑀} P(𝐴𝑐𝑛) = ∏_{𝑛=𝑁}^{𝑀} (1 − P(𝐴𝑛)) ≤ ∏_{𝑛=𝑁}^{𝑀} exp(−P(𝐴𝑛)) = exp(−∑_{𝑛=𝑁}^{𝑀} P(𝐴𝑛)) → 0

as 𝑀 → ∞. Thus

P(⋂_{𝑛≥𝑁} 𝐴𝑐𝑛) = 0

for all 𝑁, so

P(⋃_𝑁 ⋂_{𝑛≥𝑁} 𝐴𝑐𝑛) = 0.

Definition (random/stochastic process, filtration, tail 𝜎-algebra, tail event).


Let (Ω, ℱ, P) be a probability space and (𝑋𝑛 )𝑛≥1 a sequence of random vari-
ables.
1. (𝑋𝑛 )𝑛≥1 is sometimes called a random process or stochastic process.
2.
ℱ𝑛 = 𝜎(𝑋1 , … , 𝑋𝑛 ) ⊆ ℱ
is called the associated filtration. ℱ𝑛 ⊆ ℱ𝑛+1 .
3.
𝒞 = ⋂_{𝑛≥1} 𝜎(𝑋𝑛, 𝑋𝑛+1, …)

is called the tail 𝜎-algebra of the process. Its elements are called tail
events.

Example. Tail events are those not affected by the first few terms in the se-
quence of random variables. For example,

{𝜔 ∈ Ω ∶ lim_𝑛 𝑋𝑛(𝜔) exists}

is a tail event, so is
{𝜔 ∈ Ω ∶ lim sup_𝑛 𝑋𝑛(𝜔) ≥ 𝑇}.

Theorem 6.4 (Kolmogorov 0−1 law). If (𝑋𝑛)𝑛≥1 is a sequence of mutually independent random variables then for all 𝐴 ∈ 𝒞,

P(𝐴) ∈ {0, 1}.


Proof. Pick 𝐴 ∈ 𝒞. Fix 𝑛. For all 𝐵 ∈ 𝜎(𝑋1 , … , 𝑋𝑛 ),

P(𝐴 ∩ 𝐵) = P(𝐴)P(𝐵)

as 𝒞 is independent of 𝜎(𝑋1 , … , 𝑋𝑛 ). The measures 𝐵 ↦ P(𝐴)P(𝐵) and


𝐵 ↦ P(𝐴 ∩ 𝐵) coincide on each ℱ𝑛 so on ⋃𝑛≥1 ℱ𝑛 . Hence they coincide
on 𝜎(⋃𝑛≥1 ℱ𝑛 ) ⊇ 𝒞 so

P(𝐴) = P(𝐴 ∩ 𝐴) = P(𝐴)P(𝐴)

so
P(𝐴) ∈ {0, 1}.

6.1 Useful inequalities

Proposition 6.5 (Cauchy-Schwarz). Suppose 𝑋, 𝑌 are random variables


then
E(|𝑋𝑌|) ≤ √(E(𝑋^2) ⋅ E(𝑌^2)).

Proof. For all 𝑡 ∈ R,

0 ≤ E((|𝑋| + 𝑡|𝑌 |)2 ) = E(𝑋 2 ) + 2𝑡E(|𝑋𝑌 |) + 𝑡2 E(𝑌 2 )

so, viewed as a quadratic in 𝑡, the discriminant is nonpositive, i.e.

E(|𝑋𝑌|)^2 − E(𝑋^2)E(𝑌^2) ≤ 0.

Proposition 6.6 (Markov). Let 𝑋 ≥ 0 be a random variable. Then for all


𝑡 ≥ 0,
𝑡P(𝑋 ≥ 𝑡) ≤ E(𝑋).

Proof.
E(𝑋) ≥ E(𝑋1𝑋≥𝑡 ) ≥ E(𝑡1𝑋≥𝑡 ) = 𝑡P(𝑋 ≥ 𝑡)

Proposition 6.7 (Chebyshev). Let 𝑌 be a random variable with E(𝑌 2 ) < ∞,


then for all 𝑡 ≥ 0,

𝑡^2 P(|𝑌 − E(𝑌)| ≥ 𝑡) ≤ Var 𝑌.

E(𝑌 2 ) < ∞ implies that E(|𝑌 |) < ∞ by Cauchy-Schwarz, so Var 𝑌 < ∞.


The converse is more subtle.
Proof. Apply Markov to 𝑋 = |𝑌 − E(𝑌 )|2 .
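Both bounds are easy to probe numerically (Python; the exponential example is my own): compare the empirical tails with the Markov and Chebyshev estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(scale=1.0, size=1_000_000)   # X >= 0, E(X) = 1, Var(X) = 1

# Markov: P(X >= t) <= E(X) / t
for t in [2.0, 4.0, 8.0]:
    print(t, np.mean(X >= t), "<= Markov bound", X.mean() / t)

# Chebyshev: P(|X - E(X)| >= t) <= Var(X) / t^2
for t in [2.0, 4.0]:
    dev = np.mean(np.abs(X - X.mean()) >= t)
    print(t, dev, "<= Chebyshev bound", X.var() / t**2)
```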


Theorem 6.8 (strong law of large numbers). Let (𝑋𝑛 )𝑛≥1 be a sequence of
iid. random variables. Assume E(|𝑋1 |) < ∞. Let
𝑆𝑛 = ∑_{𝑘=1}^{𝑛} 𝑋𝑘;

then (1/𝑛)𝑆𝑛 converges almost surely to E(𝑋1).

Proof. We prove the theorem under a stronger condition: we assume E(𝑋1^4) < ∞. This implies, by Cauchy-Schwarz, E(𝑋1^2), E(|𝑋1|) < ∞, and subsequently E(|𝑋1|^3) < ∞. The full proof is much harder but will be given later when we have developed enough machinery.
wlog we may assume E(𝑋1) = 0 by replacing 𝑋𝑛 with 𝑋𝑛 − E(𝑋1). Have

E(𝑆𝑛^4) = ∑_{𝑖,𝑗,𝑘,ℓ} E(𝑋𝑖 𝑋𝑗 𝑋𝑘 𝑋ℓ).

All terms vanish because E(𝑋𝑖) = 0 and the (𝑋𝑖)𝑖≥1 are independent, except for E(𝑋𝑖^4) and E(𝑋𝑖^2 𝑋𝑗^2) for 𝑖 ≠ 𝑗. For example,

E(𝑋𝑖 𝑋𝑗^3) = E(𝑋𝑖) ⋅ E(𝑋𝑗^3) = 0

for 𝑖 ≠ 𝑗. Thus

E(𝑆𝑛^4) = ∑_{𝑖=1}^{𝑛} E(𝑋𝑖^4) + 6 ∑_{𝑖<𝑗} E(𝑋𝑖^2 𝑋𝑗^2).

By Cauchy-Schwarz,

E(𝑋𝑖^2 𝑋𝑗^2) ≤ √(E(𝑋𝑖^4) ⋅ E(𝑋𝑗^4)) = E(𝑋1^4)

so

E(𝑆𝑛^4) ≤ (𝑛 + 6 ⋅ 𝑛(𝑛 − 1)/2) E(𝑋1^4)

and asymptotically,

E((𝑆𝑛/𝑛)^4) = 𝑂(1/𝑛^2)

so

E(∑_{𝑛≥1} (𝑆𝑛/𝑛)^4) = ∑_{𝑛≥1} E((𝑆𝑛/𝑛)^4) < ∞.

Hence ∑_{𝑛≥1} (𝑆𝑛/𝑛)^4 < ∞ almost surely and it follows that

lim_{𝑛→∞} 𝑆𝑛/𝑛 = 0

almost surely.

The strong law of large numbers has a very important statistical implication: by averaging a large number of iid. samples we can estimate an unknown law, or at least its mean.
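A Monte Carlo illustration of the theorem (Python; the exponential samples with mean 0.5 are an arbitrary choice): the running averages 𝑆𝑛/𝑛 settle down to E(𝑋1) along each sample path.

```python
import numpy as np

rng = np.random.default_rng(3)

# iid X_k ~ Exp(rate 2), so E(X_1) = 0.5; all moments are finite.
X = rng.exponential(scale=0.5, size=(5, 1_000_000))   # 5 independent sample paths
running_avg = np.cumsum(X, axis=1) / np.arange(1, X.shape[1] + 1)

print(running_avg[:, -1])   # each path's S_n / n is close to 0.5 for large n
```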


7 Convergence of random variables

Definition (weak convergence). A sequence of probability measures (𝜇𝑛 )𝑛≥1


on (R𝑑 , ℬ(R𝑑 )) is said to converge weakly to a measure 𝜇 if for all 𝑓 ∈ 𝐶𝑏 (R𝑑 ),
the set of continuous bounded functions on R𝑑 ,

lim_{𝑛→∞} 𝜇𝑛(𝑓) = 𝜇(𝑓).

Example.
1. Let 𝜇𝑛 = 𝛿_{1/𝑛} be the Dirac mass at 1/𝑛 on R𝑑, where for 𝑥 ∈ R𝑑, 𝛿𝑥 is the Borel probability measure on R𝑑 such that 𝛿𝑥(𝐴) = 1 if 𝑥 ∈ 𝐴 and 𝛿𝑥(𝐴) = 0 if 𝑥 ∉ 𝐴. Then 𝜇𝑛 → 𝛿0.

2. Let 𝜇𝑛 = 𝒩(0, 𝜎𝑛^2), the Gaussian distribution with standard deviation 𝜎𝑛, where 𝜎𝑛 → 0; then again 𝜇𝑛 → 𝛿0. Indeed, substituting 𝑥 ↦ 𝜎𝑛𝑥,

𝜇𝑛(𝑓) = ∫ 𝑓(𝑥)𝑑𝜇𝑛(𝑥) = ∫ 𝑓(𝑥) (1/√(2𝜋𝜎𝑛^2)) exp(−𝑥^2/(2𝜎𝑛^2))𝑑𝑥 = ∫ 𝑓(𝜎𝑛𝑥) (1/√(2𝜋)) exp(−𝑥^2/2)𝑑𝑥;

𝜎𝑛 → 0 so 𝑓(𝜎𝑛𝑥) → 𝑓(0), so by the dominated convergence theorem 𝜇𝑛(𝑓) → 𝑓(0) = 𝛿0(𝑓).

Definition (convergence of random variable). A sequence (𝑋𝑛 )𝑛≥1 of R𝑑 -


valued random variables on (Ω, ℱ, P) is said to converge to a random variable
𝑋
1. almost surely if
lim 𝑋𝑛 (𝜔) = 𝑋(𝜔)
𝑛→∞

for P-almost every 𝜔.


2. in probability or in measure if for all 𝜀 > 0,

lim_{𝑛→∞} P(‖𝑋𝑛 − 𝑋‖ > 𝜀) = 0.

Note that all norms on R𝑑 are equivalent so we don’t have to specify


one.


3. in distribution or in law if 𝜇𝑋𝑛 → 𝜇𝑋 weakly, where 𝜇𝑋 = 𝑋∗ P is


the law of 𝑋, a Borel probability measure on R𝑑 . Equivalently, for all
𝑓 ∈ 𝐶𝑏 (R𝑑 ), E(𝑓(𝑋𝑛 )) → E(𝑓(𝑋)).

Proposition 7.1. 1 ⟹ 2 ⟹ 3.

Proof.
1. 1 ⟹ 2:
P(‖𝑋𝑛 − 𝑋‖ > 𝜀) = E(1‖𝑋𝑛 −𝑋‖>𝜀 )
so if 𝑋𝑛 → 𝑋 almost surely then

1‖𝑋𝑛 −𝑋‖>𝜀 → 0

P-almost everywhere so by dominated convergence theorem P(‖𝑋𝑛 −𝑋‖ >


𝜀) → 0.
2. 2 ⟹ 3: given 𝑓 ∈ 𝐶𝑏 (R𝑑 ), need to show that 𝜇𝑋𝑛 (𝑓) → 𝜇𝑋 (𝑓). But

𝜇𝑋𝑛 (𝑓) − 𝜇𝑋 (𝑓) = E(𝑓(𝑋𝑛 ) − 𝑓(𝑋)).

To bound this, note that 𝑓 is continuous and R𝑑 is locally compact so 𝑓 is locally uniformly continuous. In particular for all 𝜀 > 0 there exists 𝛿 > 0 such that if ‖𝑥‖ < 1/𝜀 and ‖𝑦 − 𝑥‖ < 𝛿 then |𝑓(𝑦) − 𝑓(𝑥)| < 𝜀. Thus

|E(𝑓(𝑋𝑛) − 𝑓(𝑋))| ≤ E(1_{‖𝑋𝑛−𝑋‖<𝛿} 1_{‖𝑋‖<1/𝜀} |𝑓(𝑋𝑛) − 𝑓(𝑋)|) + 2‖𝑓‖∞ (P(‖𝑋𝑛 − 𝑋‖ ≥ 𝛿) + P(‖𝑋‖ ≥ 1/𝜀)),

where the first term is < 𝜀, so

lim sup_{𝑛→∞} |E(𝑓(𝑋𝑛) − 𝑓(𝑋))| ≤ 𝜀 + 2‖𝑓‖∞ P(‖𝑋‖ > 1/𝜀),

which goes to 0 as 𝜀 → 0, and 𝜀 is arbitrary.

Remark. When 𝑑 = 1, 3 is equivalent to 𝐹𝑋𝑛(𝑥) → 𝐹𝑋(𝑥) as 𝑛 → ∞ for every 𝑥 at which 𝐹𝑋 is continuous. See example sheet 3.
The strict converses do not hold but we can say something weaker.

Proposition 7.2. If 𝑋𝑛 → 𝑋 in probability then there is a subsequence


(𝑛𝑘 )𝑘 such that 𝑋𝑛𝑘 → 𝑋 almost surely as 𝑘 → ∞.

Proof. We know for all 𝜀 > 0, P(‖𝑋𝑛 − 𝑋‖ > 𝜀) → 0 as 𝑛 → ∞. So for all 𝑘 there exists 𝑛𝑘 such that

P(‖𝑋𝑛𝑘 − 𝑋‖ > 1/𝑘) ≤ 1/2^𝑘

so

∑_{𝑘≥1} P(‖𝑋𝑛𝑘 − 𝑋‖ > 1/𝑘) < ∞


so by the first Borel-Cantelli lemma

P(‖𝑋𝑛𝑘 − 𝑋‖ > 1/𝑘 io.) = 0.

This means that with probability 1, ‖𝑋𝑛𝑘 − 𝑋‖ → 0 as 𝑘 → ∞.

Definition (convergence in mean). Let (𝑋𝑛 )𝑛≥1 and 𝑋 be R𝑑 -valued inte-


grable random variables. We say that (𝑋𝑛 )𝑛 converges in mean or in 𝐿1 to
𝑋 if
lim_{𝑛→∞} E(‖𝑋𝑛 − 𝑋‖) = 0.

Remark.
1. If 𝑋𝑛 → 𝑋 in mean then 𝑋𝑛 → 𝑋 in probability by the Markov inequality: 𝜀 ⋅ P(‖𝑋𝑛 − 𝑋‖ > 𝜀) ≤ E(‖𝑋𝑛 − 𝑋‖).

2. The converse is false. For example take Ω = (0, 1), ℱ = ℬ((0, 1)) and P the Lebesgue measure. Let 𝑋𝑛 = 𝑛 1_{[0,1/𝑛]}. Then 𝑋𝑛 → 0 almost surely but E(𝑋𝑛) = 1.

When does convergence in probability imply convergence in mean? We need


some kind of domination assumption.

Definition (uniformly integrable). A sequence of random variables (𝑋𝑛 )𝑛≥1


is uniformly integrable if

lim_{𝑀→∞} lim sup_{𝑛→∞} E(‖𝑋𝑛‖ 1_{‖𝑋𝑛‖≥𝑀}) = 0.

Remark. If (𝑋𝑛 )𝑛≥1 are dominated, namely exists an integrable random vari-
able 𝑌 ≥ 0 such that ‖𝑋𝑛 ‖ ≤ 𝑌 for all 𝑛 then (𝑋𝑛 )𝑛 is uniformly integrable by
dominated convergence theorem:
E(‖𝑋𝑛 ‖1‖𝑋𝑛 ‖≥𝑀 ) ≤ E(𝑌1𝑌 ≥𝑀 ) → 0
as 𝑀 → ∞.

Theorem 7.3. Let (𝑋𝑛 )𝑛≥1 be a sequence of R𝑑 -valued integrable random


variable. Let 𝑋 be another random variable. Then TFAE:
1. 𝑋 is integrable and 𝑋𝑛 → 𝑋 in mean,
2. (𝑋𝑛 )𝑛≥1 is uniformly integrable and 𝑋𝑛 → 𝑋 in probability.

Proof.
• 1 ⟹ 2: Left to show uniform integrability:
E(‖𝑋𝑛‖1_{‖𝑋𝑛‖≥𝑀}) ≤ E(‖𝑋𝑛 − 𝑋‖1_{‖𝑋𝑛‖≥𝑀}) + E(‖𝑋‖1_{‖𝑋𝑛‖≥𝑀})
≤ E(‖𝑋𝑛 − 𝑋‖) + E(‖𝑋‖1_{‖𝑋𝑛‖≥𝑀}(1_{‖𝑋‖≤𝑀/2} + 1_{‖𝑋‖>𝑀/2}))
≤ E(‖𝑋𝑛 − 𝑋‖) + E(‖𝑋‖1_{‖𝑋𝑛−𝑋‖≥𝑀/2} 1_{‖𝑋‖≤𝑀/2}) + E(‖𝑋‖1_{‖𝑋‖>𝑀/2})
≤ E(‖𝑋𝑛 − 𝑋‖) + (𝑀/2) P(‖𝑋𝑛 − 𝑋‖ ≥ 𝑀/2) + E(‖𝑋‖1_{‖𝑋‖≥𝑀/2})


Take lim sup:

lim sup_{𝑛→∞} E(‖𝑋𝑛‖1_{‖𝑋𝑛‖≥𝑀}) ≤ 0 + 0 + E(‖𝑋‖1_{‖𝑋‖≥𝑀/2}) → 0

as 𝑀 → ∞, by the dominated convergence theorem.


• 2 ⟹ 1: Prove first that 𝑋 is integrable. By the previous proposition, we can find a subsequence (𝑛𝑘)𝑘 such that 𝑋𝑛𝑘 → 𝑋 almost surely. By Fatou's lemma,

E(‖𝑋‖1_{‖𝑋‖≥𝑀}) ≤ lim inf_{𝑘→∞} E(‖𝑋𝑛𝑘‖1_{‖𝑋𝑛𝑘‖≥𝑀})

which goes to 0 as 𝑀 → ∞ by uniform integrability assumption. Thus

E(‖𝑋‖) ≤ 𝑀 + E(‖𝑋‖1‖𝑋‖≥𝑀 ) < ∞

for 𝑀 sufficiently large. Thus 𝑋 is integrable.


To show convergence in mean, we use the same trick of splitting into small and big parts:

E(‖𝑋𝑛 − 𝑋‖) = E((1‖𝑋𝑛−𝑋‖≤𝜀 + 1‖𝑋𝑛−𝑋‖>𝜀 )‖𝑋𝑛 − 𝑋‖)
            ≤ 𝜀 + E(1‖𝑋𝑛−𝑋‖>𝜀 ‖𝑋𝑛 − 𝑋‖(1‖𝑋𝑛‖≤𝑀 + 1‖𝑋𝑛‖>𝑀 ))
            ≤ 𝜀 + † + ‡

where

† = E(‖𝑋𝑛 − 𝑋‖1‖𝑋𝑛−𝑋‖>𝜀 1‖𝑋𝑛‖≤𝑀 )
  ≤ E((𝑀 + ‖𝑋‖)1‖𝑋𝑛−𝑋‖>𝜀 (1‖𝑋‖≤𝑀 + 1‖𝑋‖>𝑀 ))
  ≤ 2𝑀 P(‖𝑋𝑛 − 𝑋‖ > 𝜀) + 2E(‖𝑋‖1‖𝑋‖>𝑀 )

so

lim sup𝑛→∞ † ≤ 2E(‖𝑋‖1‖𝑋‖>𝑀 ) → 0

as 𝑀 → ∞. On the other hand

‡ = E(‖𝑋𝑛 − 𝑋‖1‖𝑋𝑛 −𝑋‖>𝜀 1‖𝑋𝑛 ‖>𝑀 )


≤ E((‖𝑋𝑛 ‖ + ‖𝑋‖)1‖𝑋𝑛 ‖≥𝑀 )
≤ E(‖𝑋𝑛 ‖1‖𝑋𝑛 ‖≥𝑀 + ‖𝑋‖1‖𝑋‖>𝑀 + ‖𝑋‖1‖𝑋𝑛 ‖≥𝑀 1‖𝑋‖≤𝑀 )
≤ 2E(‖𝑋𝑛 ‖1‖𝑋𝑛 ‖≥𝑀 ) + E(‖𝑋‖1‖𝑋‖>𝑀 )

taking lim sup,

lim sup𝑛→∞ ‡ ≤ 2 lim sup𝑛→∞ E(‖𝑋𝑛 ‖1‖𝑋𝑛‖≥𝑀 ) + E(‖𝑋‖1‖𝑋‖>𝑀 ) → 0 + 0

as 𝑀 → ∞. Combining, lim sup𝑛→∞ E(‖𝑋𝑛 − 𝑋‖) ≤ 𝜀 for every 𝜀 > 0, which proves convergence in mean.


Definition. We say that a sequence of random variables (𝑋𝑛 )𝑛≥1 is bounded


in 𝐿𝑝 if there exists 𝐶 > 0 such that E(‖𝑋𝑛 ‖𝑝 ) ≤ 𝐶 for all 𝑛.

Proposition 7.4. If 𝑝 > 1 and (𝑋𝑛 )𝑛≥1 is bounded in 𝐿𝑝 then (𝑋𝑛 )𝑛≥1 is
uniformly integrable.

Proof.

𝑀 𝑝−1 E(‖𝑋𝑛 ‖1‖𝑋𝑛 ‖≥𝑀 ) ≤ E(‖𝑋𝑛 ‖𝑝 1‖𝑋𝑛 ‖≥𝑀 ) ≤ E(‖𝑋𝑛 ‖𝑝 ) ≤ 𝐶

so

lim sup𝑛→∞ E(‖𝑋𝑛 ‖1‖𝑋𝑛‖≥𝑀 ) ≤ 𝐶/𝑀 𝑝−1 → 0

as 𝑀 → ∞.
This provides a sufficient condition for uniform integrability, and hence (together with convergence in probability) for convergence in mean.
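A numerical sketch of the bound in the proof (assuming NumPy; the exponential distribution is an arbitrary illustration, not from the notes): for identically distributed 𝑋𝑛 with E(|𝑋𝑛 |2 ) = 𝐶, the tail expectation is controlled by 𝐶/𝑀 when 𝑝 = 2.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_exponential(1_000_000)   # E X^2 = 2, so C = 2 works for p = 2
C = 2.0

for M in [2.0, 5.0, 10.0, 20.0]:
    tail = np.mean(X * (X >= M))          # estimates E(X 1_{X >= M})
    print(f"M={M:5.1f}  E(X 1_(X>=M)) ~ {tail:.5f}  bound C/M = {C/M:.5f}")
```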


8 𝐿𝑝 spaces
Recall that 𝜑 ∶ 𝐼 → R is convex if for all 𝑥, 𝑦 ∈ 𝐼 and all 𝑡 ∈ [0, 1],

𝜑(𝑡𝑥 + (1 − 𝑡)𝑦) ≤ 𝑡𝜑(𝑥) + (1 − 𝑡)𝜑(𝑦).

Proposition 8.1 (Jensen inequality). Let 𝐼 be an open interval of R and 𝜑 ∶ 𝐼 → R a convex function. Let 𝑋 be a random variable on (Ω, ℱ, P). Assume 𝑋 is integrable and takes values in 𝐼. Then

E(𝜑(𝑋)) ≥ 𝜑(E(𝑋)).

Remark.
1. As 𝑋 ∈ 𝐼 almost surely and 𝐼 is an interval, have E(𝑋) ∈ 𝐼.
2. We'll show that 𝜑(𝑋)− is integrable, so

E(𝜑(𝑋)) = E(𝜑(𝑋)+ ) − E(𝜑(𝑋)− )

is well-defined, with the possibility that E(𝜑(𝑋)+ ), and hence E(𝜑(𝑋)), is +∞.

Lemma 8.2. TFAE:


1. 𝜑 is convex,
2. there exists a family ℱ of affine functions (𝑥 ↦ 𝑎𝑥 + 𝑏) such that
𝜑 = supℓ∈ℱ ℓ on 𝐼.

Proof.
1. 2 ⟹ 1: each ℓ is affine, and a pointwise supremum of affine functions is convex: for every ℓ ∈ ℱ,

   ℓ(𝑡𝑥 + (1 − 𝑡)𝑦) = 𝑡ℓ(𝑥) + (1 − 𝑡)ℓ(𝑦) ≤ 𝑡 supℓ∈ℱ ℓ(𝑥) + (1 − 𝑡) supℓ∈ℱ ℓ(𝑦)

   so

   𝜑(𝑡𝑥 + (1 − 𝑡)𝑦) = supℓ∈ℱ ℓ(𝑡𝑥 + (1 − 𝑡)𝑦) ≤ 𝑡𝜑(𝑥) + (1 − 𝑡)𝜑(𝑦).

2. We need to show that for all 𝑥0 ∈ 𝐼 we can find an affine function

   ℓ𝑥0 (𝑥) = 𝜃𝑥0 (𝑥 − 𝑥0 ) + 𝜑(𝑥0 ),

   where 𝜃𝑥0 is morally the slope at 𝑥0 , such that 𝜑(𝑥) ≥ ℓ𝑥0 (𝑥) for all 𝑥 ∈ 𝐼. Then 𝜑 = sup𝑥0∈𝐼 ℓ𝑥0 .
   To find 𝜃𝑥0 observe that for all 𝑥 < 𝑥0 < 𝑦 where 𝑥, 𝑦 ∈ 𝐼,

   (𝜑(𝑥0 ) − 𝜑(𝑥))/(𝑥0 − 𝑥) ≤ (𝜑(𝑦) − 𝜑(𝑥0 ))/(𝑦 − 𝑥0 ).


Indeed this is the convexity of 𝜑 on [𝑥, 𝑦] with 𝑡 = (𝑥0 − 𝑥)/(𝑦 − 𝑥). This holds for all 𝑥 < 𝑥0 and 𝑦 > 𝑥0 , so there exists 𝜃 ∈ R such that

(𝜑(𝑥0 ) − 𝜑(𝑥))/(𝑥0 − 𝑥) ≤ 𝜃 ≤ (𝜑(𝑦) − 𝜑(𝑥0 ))/(𝑦 − 𝑥0 )

for all such 𝑥, 𝑦. Then just set ℓ𝑥0 (𝑥) = 𝜃(𝑥 − 𝑥0 ) + 𝜑(𝑥0 ). By construction 𝜑(𝑥) ≥ ℓ𝑥0 (𝑥) for all 𝑥 ∈ 𝐼.

Proof of Jensen inequality. Write 𝜑(𝑥) = supℓ∈ℱ ℓ(𝑥) with each ℓ affine. Then

E(𝜑(𝑋)) ≥ E(ℓ(𝑋)) = ℓ(E(𝑋))

for all ℓ ∈ ℱ, so taking the supremum,

E(𝜑(𝑋)) ≥ supℓ∈ℱ ℓ(E(𝑋)) = 𝜑(E(𝑋)).

Also for the remark,

−𝜑 = − supℓ∈ℱ ℓ = infℓ∈ℱ (−ℓ)

so 𝜑− = (−𝜑)+ ≤ |ℓ| for all ℓ ∈ ℱ. Then, fixing any ℓ(𝑥) = 𝑎𝑥 + 𝑏 in ℱ,

𝜑(𝑋)− ≤ |ℓ(𝑋)| ≤ |𝑎||𝑋| + |𝑏|.

As 𝑋 is integrable, 𝜑(𝑋)− is integrable.
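A Monte Carlo sanity check of Jensen's inequality (a sketch; the choices of 𝜑 and of the distribution of 𝑋 are arbitrary illustrations, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=0.5, size=1_000_000)   # integrable, values in I = (0, inf)

tests = [(np.exp, "exp"), (lambda t: t**2, "t^2"), (lambda t: -np.log(t), "-log")]
for phi, name in tests:                          # each phi is convex on I
    lhs, rhs = phi(X).mean(), phi(X.mean())      # E(phi(X)) vs phi(E X)
    print(f"phi={name:5s}  E(phi(X)) ~ {lhs:.4f} >= phi(EX) ~ {rhs:.4f}")
```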


Jensen's inequality is specific to probability spaces; the following applies to all measure spaces.

Proposition 8.3 (Minkowski inequality). Let (𝑋, 𝒜, 𝜇) be a measure space


and 𝑓, 𝑔 measurable functions on 𝑋. Let 𝑝 ∈ [1, ∞) and define the 𝑝-norm
‖𝑓‖𝑝 = (∫𝑋 |𝑓|𝑝 𝑑𝜇)^(1/𝑝) .

Then
‖𝑓 + 𝑔‖𝑝 ≤ ‖𝑓‖𝑝 + ‖𝑔‖𝑝 .

Proof. wlog assume ‖𝑓‖𝑝 , ‖𝑔‖𝑝 ≠ 0 (and both finite). Need to show

∥ (‖𝑓‖𝑝 /(‖𝑓‖𝑝 + ‖𝑔‖𝑝 )) ⋅ 𝑓/‖𝑓‖𝑝 + (‖𝑔‖𝑝 /(‖𝑓‖𝑝 + ‖𝑔‖𝑝 )) ⋅ 𝑔/‖𝑔‖𝑝 ∥𝑝 ≤ 1.

It suffices to show that for all 𝑡 ∈ [0, 1] and all measurable 𝐹 , 𝐺 with ‖𝐹 ‖𝑝 = ‖𝐺‖𝑝 = 1,

‖𝑡𝐹 + (1 − 𝑡)𝐺‖𝑝 ≤ 1
“the unit ball is convex”. For this note that

[0, +∞) → [0, +∞)


𝑥 ↦ 𝑥𝑝


is convex if 𝑝 ≥ 1 so
|𝑡𝐹 + (1 − 𝑡)𝐺|𝑝 ≤ 𝑡|𝐹 |𝑝 + (1 − 𝑡)|𝐺|𝑝
and
∫𝑋 |𝑡𝐹 + (1 − 𝑡)𝐺|𝑝 𝑑𝜇 ≤ 𝑡 ∫𝑋 |𝐹 |𝑝 𝑑𝜇 + (1 − 𝑡) ∫𝑋 |𝐺|𝑝 𝑑𝜇 = 𝑡 + (1 − 𝑡) = 1.

Proposition 8.4 (Hölder inequality). Suppose (𝑋, 𝒜, 𝜇) is a measure space


and let 𝑓, 𝑔 be measurable functions on 𝑋. Given 𝑝, 𝑞 ∈ (1, ∞) such that 1/𝑝 + 1/𝑞 = 1,

∫𝑋 |𝑓𝑔|𝑑𝜇 ≤ (∫𝑋 |𝑓|𝑝 𝑑𝜇)^(1/𝑝) (∫𝑋 |𝑔|𝑞 𝑑𝜇)^(1/𝑞)

with equality if and only if there exists (𝛼, 𝛽) ≠ (0, 0) such that 𝛼|𝑓|𝑝 = 𝛽|𝑔|𝑞
𝜇-almost everywhere.

Lemma 8.5 (Young inequality). For all 𝑝, 𝑞 ∈ (1, ∞) such that 1/𝑝 + 1/𝑞 = 1 and all 𝑎, 𝑏 ≥ 0,

𝑎𝑏 ≤ 𝑎𝑝 /𝑝 + 𝑏𝑞 /𝑞.

Proof of Hölder inequality. wlog assume ‖𝑓‖𝑝 , ‖𝑔‖𝑞 ≠ 0. By scaling by factors (𝛼, 𝛽) ≠ (0, 0), wlog ‖𝑓‖𝑝 = ‖𝑔‖𝑞 = 1. Then by Young inequality,

|𝑓𝑔| ≤ |𝑓|𝑝 /𝑝 + |𝑔|𝑞 /𝑞

so

∫𝑋 |𝑓𝑔|𝑑𝜇 ≤ (1/𝑝) ∫𝑋 |𝑓|𝑝 𝑑𝜇 + (1/𝑞) ∫𝑋 |𝑔|𝑞 𝑑𝜇 = 1/𝑝 + 1/𝑞 = 1.

Remark. Applying Jensen inequality to 𝜑(𝑥) = 𝑥^(𝑝′/𝑝) (convex for 𝑝′ ≥ 𝑝) and the random variable |𝑋|𝑝 gives

E(|𝑋|𝑝 )^(1/𝑝) ≤ E(|𝑋|^𝑝′ )^(1/𝑝′) ,

so the function 𝑝 ↦ E(|𝑋|𝑝 )^(1/𝑝) is non-decreasing. This can be used, for example, to show that if 𝑋 has finite 𝑝′th moment then it has finite 𝑝th moment for 𝑝′ ≥ 𝑝.
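Both inequalities are easy to test numerically. A sketch (assuming NumPy; random vectors stand in for functions on a finite set with counting measure, an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.standard_normal(10_000)
g = rng.standard_normal(10_000)
p, q = 3.0, 1.5                      # 1/3 + 1/1.5 = 1

def norm(h, r):                      # ||h||_r for counting measure
    return (np.abs(h) ** r).sum() ** (1.0 / r)

print("Hölder:   ", np.abs(f * g).sum(), "<=", norm(f, p) * norm(g, q))
print("Minkowski:", norm(f + g, p), "<=", norm(f, p) + norm(g, p))
```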

Definition. Let (𝑋, 𝒜, 𝜇) be a measure space.

• For 𝑝 ≥ 1,

ℒ𝑝 (𝑋, 𝒜, 𝜇) = {𝑓 ∶ 𝑋 → R measurable such that |𝑓|𝑝 is 𝜇-integrable}.

• For 𝑝 = ∞,

ℒ∞ (𝑋, 𝒜, 𝜇) = {𝑓 ∶ 𝑋 → R measurable such that essup |𝑓| < ∞}


where

essup |𝑓| = inf{𝑡 ∶ |𝑓(𝑥)| ≤ 𝑡 for 𝜇-almost every 𝑥}.

Lemma 8.6. ℒ𝑝 (𝑋, 𝒜, 𝜇) is an R-vector space.

Proof. For 𝑝 < ∞ use Minkowski inequality. Similarly one checks that ‖𝑓 + 𝑔‖∞ ≤ ‖𝑓‖∞ + ‖𝑔‖∞ for all 𝑓, 𝑔.

Definition. We say 𝑓 and 𝑔 are 𝜇-equivalent and write 𝑓 ≡𝜇 𝑔 if for 𝜇-


almost every 𝑥, 𝑓(𝑥) = 𝑔(𝑥).

Check that this is an equivalence relation stable under addition and multiplication.

Definition (𝐿𝑝 -space). Define

𝐿𝑝 (𝑋, 𝒜, 𝜇) = ℒ𝑝 (𝑋, 𝒜, 𝜇)/ ≡𝜇

and if [𝑓] denotes the equivalence class of 𝑓 under ≡𝜇 we define

‖[𝑓]‖𝑝 = ‖𝑓‖𝑝 .

Proposition 8.7. For 𝑝 ∈ [1, ∞], 𝐿𝑝 (𝑋, 𝒜, 𝜇) is a normed vector space


under ‖⋅‖𝑝 and it is complete, i.e. it is a Banach space.

Proof. If 𝑓 ≡𝜇 𝑔 then ‖𝑓‖𝑝 = ‖𝑔‖𝑝 , so ‖⋅‖𝑝 is well-defined on 𝐿𝑝 . The triangle inequality follows from Minkowski inequality and homogeneity is obvious, so ‖⋅‖𝑝 is indeed a norm.
For completeness, pick (𝑓𝑛 )𝑛 a Cauchy sequence in ℒ𝑝 (𝑋, 𝒜, 𝜇). Need to
show that there exists 𝑓 ∈ ℒ𝑝 such that ‖𝑓𝑛 − 𝑓‖𝑝 → 0 as 𝑛 → ∞. This then
implies that [𝑓𝑛 ] → [𝑓] in 𝐿𝑝 .
We can extract a subsequence 𝑛𝑘 ↑ ∞ such that ‖𝑓𝑛𝑘+1 − 𝑓𝑛𝑘 ‖𝑝 ≤ 2−𝑘 . Let
𝐾
𝑆𝐾 = ∑ |𝑓𝑛𝑘+1 − 𝑓𝑛𝑘 |
𝑘=1

then
𝐾 𝐾
‖𝑆𝐾 ‖𝑝 ≤ ∑‖𝑓𝑛𝑘+1 − 𝑓𝑛𝑘 ‖𝑝 ≤ ∑ 2−𝑘 ≤ 1
𝑘=1 𝑘=1
so by monotone convergence,

lim𝐾→∞ ∫𝑋 |𝑆𝐾 |𝑝 𝑑𝜇 = ∫𝑋 |𝑆∞ |𝑝 𝑑𝜇 ≤ 1,

so 𝑆∞ ∈ ℒ𝑝 . In particular for 𝜇-almost every 𝑥, |𝑆∞ (𝑥)| < ∞, i.e.

∑ |𝑓𝑛𝑘+1 (𝑥) − 𝑓𝑛𝑘 (𝑥)| < ∞.


𝑘≥1


Hence (𝑓𝑛𝑘 (𝑥))𝑘 is a Cauchy sequence in R. By completeness of R the limit exists; define 𝑓(𝑥) to be this limit, and set 𝑓(𝑥) = 0 where the limit does not exist.
We then have, in case 𝑝 < ∞, by Fatou’s lemma

‖𝑓𝑛 − 𝑓‖𝑝 ≤ lim inf‖𝑓𝑛 − 𝑓𝑛𝑘 ‖𝑝 ≤ 𝜀


𝑘→∞

for any 𝜀 for 𝑛 sufficiently large. Thus

lim ‖𝑓𝑛 − 𝑓‖𝑝 = 0.


𝑛→∞

When 𝑝 = ∞, we use the fact that if 𝑓𝑛 → 𝑓 𝜇-almost everywhere then

‖𝑓‖∞ ≤ lim inf‖𝑓𝑛 ‖∞ .


𝑛→∞

Proof. Let 𝑡 > lim inf𝑛→∞ ‖𝑓𝑛 ‖∞ . Then there exists 𝑛𝑘 ↑ ∞ such that

‖𝑓𝑛𝑘 ‖∞ = essup |𝑓𝑛𝑘 | = inf{𝑠 ≥ 0 ∶ |𝑓𝑛𝑘 (𝑥)| ≤ 𝑠 for 𝜇-almost every 𝑥} < 𝑡

for all 𝑘. Thus for all 𝑘, for 𝜇-almost every 𝑥, |𝑓𝑛𝑘 (𝑥)| < 𝑡. But by 𝜎-additivity of 𝜇 we can swap the quantifiers: for 𝜇-almost every 𝑥, for all 𝑘, |𝑓𝑛𝑘 (𝑥)| < 𝑡. Thus for 𝜇-almost every 𝑥, |𝑓(𝑥)| ≤ 𝑡, i.e. ‖𝑓‖∞ ≤ 𝑡.

Proposition 8.8 (approximation by simple functions). Let 𝑝 ∈ [1, ∞). Let


𝑉 be the linear span of all simple functions on 𝑋. Then 𝑉 ∩ ℒ𝑝 is dense in
ℒ𝑝 .

Proof. Note that 𝑔 ∈ ℒ𝑝 implies 𝑔+ , 𝑔− ∈ ℒ𝑝 . Thus by writing 𝑓 = 𝑓 + − 𝑓 − and using Minkowski inequality, it suffices to show that every 𝑓 ≥ 0 in ℒ𝑝 is the limit in ℒ𝑝 of a sequence of simple functions.
Recall there exist simple functions 0 ≤ 𝑔𝑛 ≤ 𝑓 such that 𝑔𝑛 (𝑥) ↑ 𝑓(𝑥) for
𝜇-almost every 𝑥. Then

lim ‖𝑔𝑛 − 𝑓‖𝑝𝑝 = lim ∫ |𝑔𝑛 − 𝑓|𝑝 𝑑𝜇 = 0


𝑛→∞ 𝑛→∞
𝑋

by the dominated convergence theorem (|𝑔𝑛 − 𝑓|𝑝 ≤ (𝑔𝑛 + 𝑓)𝑝 ≤ (2𝑓)𝑝 , which is 𝜇-integrable).
Remark. When 𝑋 = R𝑑 , 𝒜 = ℬ(R𝑑 ) and 𝜇 is the Lebesgue measure, 𝐶𝑐 (R𝑑 ), the space of continuous functions with compact support, is dense in ℒ𝑝 (𝑋, 𝒜, 𝜇) when 𝑝 ∈ [1, ∞). (This fails for 𝑝 = ∞: a nonzero constant function is at uniform distance at least that constant from every compactly supported function.) See example sheet. In fact, 𝐶𝑐∞ (R𝑑 ) suffices.


9 Hilbert space and 𝐿2 -methods

Definition (inner product). Let 𝑉 be a complex vector space. A Hermitian


inner product on 𝑉 is a map

𝑉 ×𝑉→C
(𝑥, 𝑦) ↦ ⟨𝑥, 𝑦⟩

such that
1. ⟨𝛼𝑥 + 𝛽𝑦, 𝑧⟩ = 𝛼⟨𝑥, 𝑧⟩ + 𝛽⟨𝑦, 𝑧⟩ for all 𝛼, 𝛽 ∈ C, for all 𝑥, 𝑦, 𝑧 ∈ 𝑉.

2. ⟨𝑦, 𝑥⟩ = ⟨𝑥, 𝑦⟩∗ , where ∗ denotes complex conjugation.


3. ⟨𝑥, 𝑥⟩ ∈ R and ⟨𝑥, 𝑥⟩ ≥ 0, with equality if and only if 𝑥 = 0.

Definition (Hermitian norm). The Hermitian norm is defined as ‖𝑥‖ =


√⟨𝑥, 𝑥⟩.

Lemma 9.1. Properties of norm:


1. homogeneity: ‖𝜆𝑥‖ = |𝜆|‖𝑥‖ for all 𝜆 ∈ C, 𝑥 ∈ 𝑉,
2. Cauchy-Schwarz: |⟨𝑥, 𝑦⟩| ≤ ‖𝑥‖ ⋅ ‖𝑦‖,
3. triangle inequality: ‖𝑥 + 𝑦‖ ≤ ‖𝑥‖ + ‖𝑦‖,

4. parallelogram identity: ‖𝑥 + 𝑦‖2 + ‖𝑥 − 𝑦‖2 = 2(‖𝑥‖2 + ‖𝑦‖2 )

Proof. Exercise. For reference see author’s notes on IID Linear Analysis.

Corollary 9.2. (𝑉 , ‖⋅‖) is a normed vector space.

Definition (Hilbert space). We say (𝑉 , ‖⋅‖) is a Hilbert space if it is com-


plete.

Example. Let 𝑉 = 𝐿2 (𝑋, 𝒜, 𝜇) where (𝑋, 𝒜, 𝜇) is a measure space. Then we can define

⟨𝑓, 𝑔⟩ = ∫𝑋 𝑓𝑔∗ 𝑑𝜇

which is well-defined (i.e. finite) by Cauchy-Schwarz. The axioms are easy to check, with positive-definiteness given by

0 = ⟨𝑓, 𝑓⟩ = ∫𝑋 |𝑓|2 𝑑𝜇

if and only if 𝑓 = 0 𝜇-almost everywhere, i.e. 𝑓 = 0 as an element of 𝐿2 .


Proposition 9.3. Let 𝐻 be a Hilbert space and let 𝒞 be a closed convex


subset of 𝐻. Then for all 𝑥 ∈ 𝐻, there exists unique 𝑦 ∈ 𝒞 such that

‖𝑥 − 𝑦‖ = 𝑑(𝑥, 𝒞)

where by definition
𝑑(𝑥, 𝒞) = inf ‖𝑥 − 𝑐‖.
𝑐∈𝒞

This 𝑦 is called the orthogonal projection of 𝑥 on 𝒞.

Proof. Let 𝑐𝑛 ∈ 𝒞 be a sequence such that ‖𝑥 − 𝑐𝑛 ‖ → 𝑑(𝑥, 𝒞). Let's show that (𝑐𝑛 )𝑛 is a Cauchy sequence. By the parallelogram identity,

∥(𝑥 − 𝑐𝑛 )/2 + (𝑥 − 𝑐𝑚 )/2∥2 + ∥(𝑥 − 𝑐𝑛 )/2 − (𝑥 − 𝑐𝑚 )/2∥2 = (1/2)(‖𝑥 − 𝑐𝑛 ‖2 + ‖𝑥 − 𝑐𝑚 ‖2 )

so

∥𝑥 − (𝑐𝑛 + 𝑐𝑚 )/2∥2 + (1/4)‖𝑐𝑛 − 𝑐𝑚 ‖2 = (1/2)(‖𝑥 − 𝑐𝑛 ‖2 + ‖𝑥 − 𝑐𝑚 ‖2 ).

Since (𝑐𝑛 + 𝑐𝑚 )/2 ∈ 𝒞, the first term is ≥ 𝑑(𝑥, 𝒞)2 , while the right hand side tends to 𝑑(𝑥, 𝒞)2 . Hence

lim𝑛,𝑚→∞ ‖𝑐𝑛 − 𝑐𝑚 ‖ = 0,

i.e. (𝑐𝑛 )𝑛 is Cauchy. By completeness lim𝑛→∞ 𝑐𝑛 = 𝑦 exists in 𝐻. As 𝒞 is closed, 𝑦 ∈ 𝒞, and as ‖𝑥 − 𝑐𝑛 ‖ → 𝑑(𝑥, 𝒞) we get ‖𝑥 − 𝑦‖ = 𝑑(𝑥, 𝒞). This shows existence of 𝑦.
For uniqueness, suppose 𝑦′ ∈ 𝒞 also attains the distance; by the parallelogram identity again,

∥𝑥 − (𝑦 + 𝑦′ )/2∥2 + (1/4)‖𝑦 − 𝑦′ ‖2 = (1/2)(‖𝑥 − 𝑦‖2 + ‖𝑥 − 𝑦′ ‖2 ) = 𝑑(𝑥, 𝒞)2

and the first term is ≥ 𝑑(𝑥, 𝒞)2 , so ‖𝑦 − 𝑦′ ‖ = 0.

Corollary 9.4. Suppose 𝑉 ≤ 𝐻 is a closed subspace of a Hilbert space 𝐻.


Then
𝐻 =𝑉 ⊕𝑉⟂
where
𝑉 ⟂ = {𝑥 ∈ 𝐻 ∶ ⟨𝑥, 𝑣⟩ = 0 for all 𝑣 ∈ 𝑉 }
is the orthogonal complement of 𝑉.

Proof. 𝑉 ∩ 𝑉 ⟂ = 0 by positivity of the inner product. If 𝑥 ∈ 𝐻 then, as 𝑉 is a closed convex set, there exists a unique 𝑦 ∈ 𝑉 such that ‖𝑥 − 𝑦‖ = 𝑑(𝑥, 𝑉 ). Need to show that 𝑥 − 𝑦 ∈ 𝑉 ⟂ .
For all 𝑧 ∈ 𝑉,

‖𝑥 − 𝑦 − 𝑧‖ ≥ ‖𝑥 − 𝑦‖

as 𝑦 + 𝑧 ∈ 𝑉. Thus

‖𝑥 − 𝑦‖2 + ‖𝑧‖2 − 2 Re⟨𝑥 − 𝑦, 𝑧⟩ ≥ ‖𝑥 − 𝑦‖2 .

Rearrange:

2 Re⟨𝑥 − 𝑦, 𝑧⟩ ≤ ‖𝑧‖2

for all 𝑧 ∈ 𝑉. Now substitute 𝑡𝑧 for 𝑧 where 𝑡 ∈ R+ :

𝑡 ⋅ 2 Re⟨𝑥 − 𝑦, 𝑧⟩ ≤ 𝑡2 ‖𝑧‖2 .

Divide by 𝑡 and let 𝑡 → 0+ to get

Re⟨𝑥 − 𝑦, 𝑧⟩ ≤ 0.

Similarly replace 𝑧 by −𝑧 to conclude Re⟨𝑥 − 𝑦, 𝑧⟩ = 0. Finally replace 𝑧 by 𝑒𝑖𝜃 𝑧 to get ⟨𝑥 − 𝑦, 𝑧⟩ = 0 for all 𝑧. Thus 𝑥 − 𝑦 ∈ 𝑉 ⟂ .
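In finite dimensions the orthogonal projection onto the column span of a matrix can be computed by least squares, and the residual is orthogonal to the subspace, exactly as in the corollary. A sketch (assuming NumPy; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 3))          # columns span a 3-dim subspace V of R^50
x = rng.standard_normal(50)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)
y = A @ coef                              # orthogonal projection of x onto V
r = x - y                                 # component in V^perp

print("residual orthogonal to V:", np.allclose(A.T @ r, 0, atol=1e-10))
print("Pythagoras holds:", np.isclose(np.dot(x, x), np.dot(y, y) + np.dot(r, r)))
```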

Definition (bounded linear form). A linear form ℓ ∶ 𝐻 → C is bounded if


there exists 𝐶 > 0 such that |ℓ(𝑥)| ≤ 𝐶‖𝑥‖ for all 𝑥 ∈ 𝐻.

Remark. ℓ bounded is equivalent to ℓ continuous.

Theorem 9.5 (Riesz representation theorem). Let 𝐻 be a Hilbert space.


For every bounded linear form ℓ there exists 𝑣0 ∈ 𝐻 such that

ℓ(𝑥) = ⟨𝑥, 𝑣0 ⟩

for all 𝑥 ∈ 𝐻.
Proof. By boundedness of ℓ, ker ℓ is closed, so write

𝐻 = ker ℓ ⊕ (ker ℓ)⟂ .

If ℓ = 0 then just pick 𝑣0 = 0. Otherwise pick 𝑥0 ∈ (ker ℓ)⟂ \ {0}. Then (ker ℓ)⟂ is spanned by 𝑥0 : indeed for any 𝑥 ∈ (ker ℓ)⟂ ,

ℓ(𝑥 − (ℓ(𝑥)/ℓ(𝑥0 ))𝑥0 ) = 0,

so 𝑥 − (ℓ(𝑥)/ℓ(𝑥0 ))𝑥0 ∈ ker ℓ ∩ (ker ℓ)⟂ = 0. Now let

𝑣0 = (ℓ(𝑥0 )∗ /‖𝑥0 ‖2 ) 𝑥0

(∗ denoting complex conjugation) and observe that ℓ(𝑥) − ⟨𝑥, 𝑣0 ⟩ vanishes on ker ℓ and on (ker ℓ)⟂ = C𝑥0 . Thus it is identically zero.

Definition (absolutely continuous, singular measure). Let (𝑋, 𝒜) be a mea-


surable space and let 𝜇, 𝜈 be two measures on (𝑋, 𝒜).

1. 𝜇 is absolutely continuous with respect to 𝜈, denoted 𝜇 ≪ 𝜈, if for


every 𝐴 ∈ 𝒜, 𝜈(𝐴) = 0 implies 𝜇(𝐴) = 0.
2. 𝜇 is singular with respect to 𝜈, denoted 𝜇 ⟂ 𝜈, if there exists Ω ∈ 𝒜 such that 𝜇(Ω) = 0 and 𝜈(Ω𝑐 ) = 0.
Example.
1. Let 𝜈 be the Lebesgue measure on (R, ℬ(R)) and 𝑑𝜇 = 𝑓𝑑𝜈 where 𝑓 ≥ 0 is a Borel function; then 𝜇 ≪ 𝜈.
2. If 𝜇 = 𝛿𝑥0 , the Dirac mass at 𝑥0 ∈ R, then 𝜇 ⟂ 𝜈.
The following theorem and its proof are non-examinable.

Theorem 9.6 (Radon-Nikodym). Assume 𝜇 and 𝜈 are 𝜎-finite measures


on (𝑋, 𝒜).

1. If 𝜇 ≪ 𝜈 then there exists 𝑔 ≥ 0 measurable such that 𝑑𝜇 = 𝑔𝑑𝜈,


namely
𝜇(𝐴) = ∫ 𝑔(𝑥)𝑑𝜈(𝑥)
𝐴

for all 𝐴 ∈ 𝒜. 𝑔 is called the density of 𝜇 with respect to 𝜈, or Radon-Nikodym derivative, sometimes denoted 𝑔 = 𝑑𝜇/𝑑𝜈.

2. For any 𝜇, 𝜈 𝜎-finite, 𝜇 decomposes as

𝜇 = 𝜇 𝑎 + 𝜇𝑠

where 𝜇𝑎 ≪ 𝜈 and 𝜇𝑠 ⟂ 𝜈. Moreover this decomposition is unique.


Proof. Consider 𝐻 = 𝐿2 (𝑋, 𝒜, 𝜇 + 𝜈), which is a Hilbert space. First assume 𝜇
and 𝜈 are finite. Consider the linear form
ℓ∶𝐻→R

𝑓 ↦ 𝜇(𝑓) = ∫ 𝑓𝑑𝜇
𝑋
ℓ is bounded by Cauchy-Schwarz and finiteness of the measures:

|𝜇(𝑓)| ≤ 𝜇(|𝑓|) ≤ (𝜇 + 𝜈)(|𝑓|) ≤ √((𝜇 + 𝜈)(𝑋)) ⋅ √(∫𝑋 |𝑓|2 𝑑(𝜇 + 𝜈)) = 𝐶 ⋅ ‖𝑓‖𝐻

so by Riesz representation theorem, there exists 𝑔0 ∈ 𝐿2 (𝑋, 𝒜, 𝜇 + 𝜈) such that

𝜇(𝑓) = ∫ 𝑓𝑔0 𝑑(𝜇 + 𝜈). (∗)


𝑋
Claim that for (𝜇 + 𝜈)-almost every 𝑥, 0 ≤ 𝑔0 (𝑥) ≤ 1. Take 𝑓 = 1{𝑔0 <0} and plug it into (∗):

0 ≤ 𝜇({𝑔0 < 0}) = ∫𝑋 1{𝑔0 <0} 𝑔0 𝑑(𝜇 + 𝜈) ≤ 0

(the integrand is ≤ 0), so equality holds throughout. Thus 𝑔0 ≥ 0 (𝜇 + 𝜈)-almost everywhere. Similarly take
𝑓 = 1{𝑔0 >1+𝜀} for 𝜀 > 0 and plug it into (∗),
(𝜇 + 𝜈)({𝑔0 > 1 + 𝜀}) ≥ 𝜇({𝑔0 > 1 + 𝜀})

= ∫ 1{𝑔0 >1+𝜀} 𝑔0 𝑑(𝜇 + 𝜈)


𝑋
≥ (1 + 𝜀)(𝜇 + 𝜈)({𝑔0 > 1 + 𝜀})


so must have
(𝜇 + 𝜈)({𝑔0 > 1 + 𝜀}) = 0
i.e. 𝑔0 ≤ 1 (𝜇 + 𝜈)-almost everywhere.
Now set Ω = {𝑥 ∈ 𝑋 ∶ 𝑔0 ∈ [0, 1)} so on Ω𝑐 , 𝑔0 = 1 (𝜇 + 𝜈)-almost
everywhere. Then (∗) is equivalent to

∫ 𝑓(1 − 𝑔0 )𝑑𝜇 = ∫ 𝑓𝑔0 𝑑𝜈


𝑋 𝑋

for all 𝑓 ∈ 𝐿2 (𝑋, 𝒜, 𝜇 + 𝜈), hence for all 𝑓 ≥ 0 measurable. Now replacing 𝑓 by (𝑓/(1 − 𝑔0 ))1Ω , we get

∫Ω 𝑓𝑑𝜇 = ∫Ω 𝑓 (𝑔0 /(1 − 𝑔0 )) 𝑑𝜈. (∗∗)
Set

𝜇𝑎 (𝐴) = 𝜇(𝐴 ∩ Ω)
𝜇𝑠 (𝐴) = 𝜇(𝐴 ∩ Ω𝑐 )

Clearly 𝜇 = 𝜇𝑎 + 𝜇𝑠 . Claim that this is the required decomposition, i.e.

1. 𝜇𝑎 ≪ 𝜈,
2. 𝜇𝑠 ⟂ 𝜈,
3. 𝑑𝜇𝑎 = 𝑔𝑑𝜈 where 𝑔 = (𝑔0 /(1 − 𝑔0 ))1Ω .

Proof.

1. If 𝜈(𝐴) = 0 set 𝑓 = 1𝐴 and plug into (∗∗) to get 𝜇(𝐴 ∩ Ω) = 0, namely


𝜇𝑎 (𝐴) = 0.
2. Set 𝑓 = 1Ω𝑐 . On Ω𝑐 , 𝑔0 = 1 (𝜇 + 𝜈)-almost everywhere. Plug this into (∗) to get 𝜈(Ω𝑐 ) = 0. But 𝜇𝑠 (Ω) = 0, so 𝜇𝑠 ⟂ 𝜈.
3. (∗∗) is equivalent to 𝑑𝜇𝑎 = 𝑔𝑑𝜈 where 𝑔 = (𝑔0 /(1 − 𝑔0 ))1Ω .
This settles part 2 of the theorem, and also part 1, as if 𝜇 ≪ 𝜈 then 𝜇 = 𝜇𝑎 .


If 𝜇 and 𝜈 are not finite but only 𝜎-finite, use the usual trick: partition 𝑋
into countably many 𝜇- and 𝜈-finite sets and take their intersections. Suppose
we get a disjoint countable union 𝑋 = ⋃𝑛 𝑋𝑛 . Then 𝜇 = ∑𝑛 𝜇|𝑋𝑛 where for
each 𝑛 we can write
𝜇|𝑋𝑛 = (𝜇|𝑋𝑛 )𝑎 + (𝜇|𝑋𝑛 )𝑠 .
Then set

𝜇𝑎 = ∑(𝜇|𝑋𝑛 )𝑎
𝑛
𝜇𝑠 = ∑(𝜇|𝑋𝑛 )𝑠
𝑛


Remains to check uniqueness of decomposition. Suppose 𝜇 can be decom-


posed in two ways
𝜇 = 𝜇𝑎 + 𝜇𝑠 = 𝜇′𝑎 + 𝜇′𝑠 .
As 𝜇𝑠 , 𝜇′𝑠 ⟂ 𝜈 there exists Ω0 , Ω′0 ∈ 𝒜 such that

𝜇𝑠 (Ω0 ) = 0, 𝜈(Ω𝑐0 ) = 0
𝜇′𝑠 (Ω′0 ) = 0, 𝜈((Ω′0 )𝑐 ) = 0

Set Ω1 = Ω0 ∩ Ω′0 . Check that

𝜇𝑠 (Ω1 ) = 𝜇′𝑠 (Ω1 ) = 0,
𝜈(Ω𝑐1 ) = 𝜈(Ω𝑐0 ∪ (Ω′0 )𝑐 ) = 0.

Now 𝜇𝑎 , 𝜇′𝑎 ≪ 𝜈 so
𝜇𝑎 (Ω𝑐1 ) = 𝜇′𝑎 (Ω𝑐1 ) = 0.
Hence for all 𝐴 ∈ 𝒜,

𝜇𝑎 (𝐴) = 𝜇𝑎 (𝐴 ∩ Ω1 ) = 𝜇(𝐴 ∩ Ω1 ) = 𝜇′𝑎 (𝐴 ∩ Ω1 ) = 𝜇′𝑎 (𝐴)

so 𝜇𝑎 = 𝜇′𝑎 and hence 𝜇𝑠 = 𝜇′𝑠 .

Proposition 9.7. Let (Ω, ℱ, P) be a probability space. Let 𝒢 be a 𝜎-subalgebra of ℱ and 𝑋 a random variable on (Ω, ℱ, P). Assume 𝑋 is integrable; then there exists a random variable 𝑌 on (Ω, 𝒢, P) such that

E(1𝐴 𝑋) = E(1𝐴 𝑌 )

for all 𝐴 ∈ 𝒢. Moreover 𝑌 is unique almost surely.

One might wonder why it is nontrivial to produce such a 𝑌 : since 𝒢 ⊆ ℱ, it seems we could simply "restrict" 𝑋 to a sub-random variable. But this makes no sense: random variables are functions on Ω, and shrinking the 𝜎-algebra makes measurability harder, not easier. In other words, id ∶ (Ω, ℱ, P) → (Ω, 𝒢, P) is measurable but its inverse is not, and an ℱ-measurable 𝑋 need not be 𝒢-measurable.

Definition (conditional expectation). 𝑌 as above is called the conditional


expectation of 𝑋 with respect to 𝒢, denoted by

𝑌 = E(𝑋|𝒢).

Proof. wlog assume 𝑋 ≥ 0. Set 𝜇(𝐴) = E(1𝐴 𝑋) for all 𝐴 ∈ 𝒢. 𝜇 is finite


by integrability of 𝑋 and is a measure on (Ω, 𝒢). Moreover 𝜇 ≪ P. Thus by
Radon-Nikodym there exists 𝑔 ≥ 0 𝒢-measurable such that

𝜇(𝐴) = ∫ 𝑔𝑑P = E(1𝐴 𝑔).


𝐴

Set 𝑌 = 𝑔.
Uniqueness is shown in example sheet 3.


Remark. If 𝑋 ∈ 𝐿2 (Ω, ℱ, P) then 𝑌 is the orthogonal projection of 𝑋


onto 𝐿2 (Ω, 𝒢, P). It is well-defined since 𝐿2 (Ω, ℱ, P) is a Hilbert space and
𝐿2 (Ω, 𝒢, P) is a closed subspace. In this case TFAE:

1. E((𝑋 − 𝑌 )1𝐴 ) = 0 for all 𝐴 ∈ 𝒢,


2. E((𝑋 − 𝑌 )ℎ) = 0 for all ℎ simple on (Ω, 𝒢, P),
3. E((𝑋 − 𝑌 )ℎ) = 0 for all ℎ ≥ 0 𝒢-measurable,

4. E((𝑋 − 𝑌 )ℎ) = 0 for all ℎ ∈ 𝐿2 (Ω, 𝒢, P).


Remark. Special case when 𝒢 = {∅, 𝐵, 𝐵𝑐 , Ω} where 𝐵 ∈ ℱ:

P(𝐴|𝐵) 𝜔 ∈ 𝐵
E(1𝐴 |𝒢)(𝜔) = {
P(𝐴|𝐵𝑐 ) 𝜔 ∈ 𝐵𝑐

where

P(𝐴|𝐵) = P(𝐴 ∩ 𝐵)/P(𝐵).

Proposition 9.8 (non-examinable). Properties of conditional expectation:


1. linearity: E(𝛼𝑋 + 𝛽𝑌 |𝒢) = 𝛼E(𝑋|𝒢) + 𝛽E(𝑌 |𝒢).

2. if 𝑋 is 𝒢-measurable then E(𝑋|𝒢) = 𝑋.


3. positivity: if 𝑋 ≥ 0 then E(𝑋|𝒢) ≥ 0.
4. E(E(𝑋|𝒢)|ℋ) = E(𝑋|ℋ) if ℋ ⊆ 𝒢.

5. if 𝑍 is 𝒢-measurable and bounded then E(𝑋𝑍|𝒢) = 𝑍 ⋅ E(𝑋|𝒢).


10 Fourier transform

Definition (Fourier transform). Let 𝑓 ∈ 𝐿1 (R𝑑 , ℬ(R𝑑 ), 𝑑𝑥) where 𝑑𝑥 is the


Lebesgue measure. The function

𝑓 ̂ ∶ R𝑑 → C

𝑢 ↦ ∫ 𝑓(𝑥)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑥
R𝑑

where ⟨𝑢, 𝑥⟩ = 𝑢1 𝑥1 + ⋯ + 𝑢𝑑 𝑥𝑑 , is called the Fourier transform of 𝑓.

Proposition 10.1.
1. |f̂(𝑢)| ≤ ‖𝑓‖1 .
2. f̂ is continuous.

Proof. 1 is clear. 2 follows from dominated convergence theorem.

Definition. Given a finite Borel measure 𝜇 on R𝑑 , its Fourier transform is given by

μ̂(𝑢) = ∫R𝑑 𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝜇(𝑥).

Again |μ̂(𝑢)| ≤ 𝜇(R𝑑 ) and μ̂ is continuous.
Remark. If 𝑋 is an R𝑑 -valued random variable with law 𝜇𝑋 then μ̂𝑋 is called the characteristic function of 𝑋.

Example.
1. Normalised Gaussian distribution on R: 𝜇 = 𝒩(0, 1), 𝑑𝜇 = 𝑔𝑑𝑥 where

   𝑔(𝑥) = 𝑒−𝑥²/2 /√2𝜋.

   Claim that

   μ̂(𝑢) = ĝ(𝑢) = 𝑒−𝑢²/2 ,

   i.e. ĝ = √2𝜋 𝑔: up to normalisation, the Gaussian is a fixed point of the Fourier transform.

Proof. Since 𝑥 ↦ |𝑥𝑒−𝑥²/2 | is integrable on R, we can differentiate under the integral sign:

(𝑑/𝑑𝑢) ĝ(𝑢) = (𝑑/𝑑𝑢) ∫R 𝑒𝑖𝑢𝑥 𝑒−𝑥²/2 𝑑𝑥/√2𝜋
            = ∫R 𝑖𝑥 𝑒𝑖𝑢𝑥 𝑒−𝑥²/2 𝑑𝑥/√2𝜋
            = −∫R 𝑖 𝑒𝑖𝑢𝑥 𝑔′ (𝑥) 𝑑𝑥
            = 𝑖 ∫R (𝑒𝑖𝑢𝑥 )′ 𝑔(𝑥) 𝑑𝑥        (integration by parts)
            = −𝑢 ∫R 𝑒𝑖𝑢𝑥 𝑔(𝑥) 𝑑𝑥
            = −𝑢 ĝ(𝑢).

Thus

(𝑑/𝑑𝑢)(ĝ(𝑢)𝑒𝑢²/2 ) = 0,

so ĝ(𝑢) = ĝ(0)𝑒−𝑢²/2 . But

ĝ(0) = ∫R 𝑔(𝑥)𝑑𝑥 = 1,

so ĝ(𝑢) = 𝑒−𝑢²/2 as required.

2. 𝑑-dimensional version: 𝜇 = 𝒩(0, 𝐼𝑑 ), 𝑑𝜇(𝑥) = 𝐺(𝑥)𝑑𝑥 where 𝑑𝑥 = 𝑑𝑥1 ⋯ 𝑑𝑥𝑑 and

   𝐺(𝑥) = ∏𝑖=1..𝑑 𝑔(𝑥𝑖 ) = 𝑒−‖𝑥‖²/2 /(√2𝜋)𝑑 .

   Then

   Ĝ(𝑢) = ∫R𝑑 𝑒𝑖⟨𝑢,𝑥⟩ 𝐺(𝑥)𝑑𝑥 = ∏𝑖=1..𝑑 ∫R 𝑔(𝑥𝑖 )𝑒𝑖𝑢𝑖𝑥𝑖 𝑑𝑥𝑖 = ∏𝑖=1..𝑑 ĝ(𝑢𝑖 ) = 𝑒−‖𝑢‖²/2 .
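These identities are easy to verify by numerical quadrature. A sketch (assuming NumPy; the grid and truncation of R are ad hoc choices):

```python
import numpy as np

dx = 0.005
x = np.arange(-10, 10, dx)
g = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)       # standard Gaussian density

for u in [0.0, 0.5, 1.0, 2.0]:
    ghat = (g * np.exp(1j * u * x)).sum() * dx   # Riemann sum for ghat(u)
    print(f"u={u:3.1f}  ghat ~ {ghat.real:+.6f}  e^(-u^2/2) = {np.exp(-u**2 / 2):.6f}")
```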

Theorem 10.2 (Fourier inversion formula).

1. If 𝑓 ∈ 𝐿1 (R𝑑 ) is such that f̂ ∈ 𝐿1 (R𝑑 ) then 𝑓 is continuous (i.e. 𝑓 equals a continuous function almost everywhere) and

   𝑓(𝑥) = (1/(2𝜋)𝑑 ) ∫R𝑑 f̂(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ 𝑑𝑢,

   i.e. 𝑓 is (2𝜋)−𝑑 times the Fourier transform of f̂ evaluated at −𝑥.

2. If 𝜇 is a finite Borel measure on R𝑑 such that μ̂ ∈ 𝐿1 (R𝑑 ) then 𝜇 has a continuous density with respect to Lebesgue measure, i.e. 𝑑𝜇 = 𝑓𝑑𝑥 with

   𝑓(𝑥) = (1/(2𝜋)𝑑 ) ∫R𝑑 μ̂(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ 𝑑𝑢.
Remark. In the formula

𝑓(𝑥) = (1/(2𝜋)𝑑 ) ∫R𝑑 f̂(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ 𝑑𝑢

the f̂(𝑢) are Fourier coefficients and the 𝑒−𝑖⟨𝑢,𝑥⟩ are called Fourier modes; these are the characters 𝑒𝑖⟨𝑢,−⟩ ∶ R𝑑 → {𝑧 ∈ C ∶ |𝑧| = 1}. Informally this says that every 𝑓 can be written as an "infinite linear combination" of Fourier modes.
Proof. (Update: 1 does not quite reduce to 2 by writing 𝑓 = 𝑓+ − 𝑓− , since f̂ ∈ 𝐿1 need not give integrable transforms for the two parts separately; instead write f̂ = 𝑎μ̂𝑋 − 𝑏μ̂𝑌 where 𝑎 = ‖𝑓+ ‖1 , 𝑏 = ‖𝑓− ‖1 , 𝑑𝜇𝑋 = 𝑓+ 𝑑𝑥/‖𝑓+ ‖1 and 𝑑𝜇𝑌 = 𝑓− 𝑑𝑥/‖𝑓− ‖1 .)
In 2 we may assume wlog that 𝜇 is a probability measure, so it is the law of some random variable 𝑋. Let 𝑓(𝑥) = (1/(2𝜋)𝑑 ) ∫ μ̂(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ 𝑑𝑢. We need to show 𝜇 = 𝑓𝑑𝑥, which is equivalent to: for all 𝐴 ∈ ℬ(R𝑑 ),

𝜇(𝐴) = ∫R𝑑 𝑓1𝐴 𝑑𝑥.

Let ℎ = 1𝐴 and wlog assume 𝐴 is a bounded Borel set. The trick is to introduce an independent Gaussian random variable 𝑁 ∼ 𝒩(0, 𝐼𝑑 ) with law 𝐺𝑑𝑥. We have

∫R𝑑 ℎ(𝑥)𝑑𝜇(𝑥) = E(ℎ(𝑋)) = lim𝜎→0 E(ℎ(𝑋 + 𝜎𝑁 ))

by the dominated convergence theorem. But

E(ℎ(𝑋 + 𝜎𝑁 )) = E(∫R𝑑 ℎ(𝑋 + 𝜎𝑥)𝐺(𝑥)𝑑𝑥)
             = E(∫R𝑑 ∫R𝑑 ℎ(𝑋 + 𝜎𝑥)𝑒𝑖⟨𝑢,𝑥⟩ 𝐺(𝑢) 𝑑𝑢 𝑑𝑥 / (√2𝜋)𝑑 )

as

𝐺(𝑥) = (1/(√2𝜋)𝑑 ) Ĝ(𝑥) = (1/(√2𝜋)𝑑 ) ∫R𝑑 𝐺(𝑢)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑢.

So by a change of variable 𝑦 = 𝜎𝑥,

E(ℎ(𝑋 + 𝜎𝑁 )) = E(∫∫ ℎ(𝑋 + 𝑦)𝑒𝑖⟨𝑢,𝑦/𝜎⟩ 𝐺(𝑢) 𝑑𝑢 𝑑𝑦 / ((√2𝜋)𝑑 𝜎𝑑 ))
             = E(∫∫ ℎ(𝑧)𝑒𝑖⟨𝑢/𝜎,𝑧−𝑋⟩ 𝐺(𝑢) 𝑑𝑢 𝑑𝑧 / (√(2𝜋𝜎2 ))𝑑 )
             = ∫∫ ℎ(𝑧)𝑒𝑖⟨𝑢/𝜎,𝑧⟩ μ̂𝑋 (−𝑢/𝜎) 𝐺(𝑢) 𝑑𝑢 𝑑𝑧 / (√(2𝜋𝜎2 ))𝑑     (Tonelli-Fubini)
             = (1/(2𝜋)𝑑 ) ∫∫ μ̂𝑋 (𝑢)𝑒−𝑖⟨𝑢,𝑧⟩ ℎ(𝑧)𝑒−𝜎²‖𝑢‖²/2 𝑑𝑢 𝑑𝑧.

Letting 𝜎 → 0, since μ̂𝑋 ∈ 𝐿1 and ℎ is bounded with bounded support, the dominated convergence theorem gives E(ℎ(𝑋)) = ∫ ℎ(𝑧)𝑓(𝑧)𝑑𝑧, as required.

We want a condition ensuring f̂ ∈ 𝐿1 (R𝑑 ). Clearly continuity of 𝑓 is necessary. Here we use a general principle in Fourier analysis: the Fourier transform converts decay at infinity into smoothness, and vice versa. In the Fourier inversion formula, if 𝑢 is large then the Fourier mode oscillates quickly; if f̂ decays fast at infinity these high-frequency contributions are small, and the resulting function is smoother.

Proposition 10.3. If 𝑓, 𝑓 ′ and 𝑓 ″ exist (for example if 𝑓 is 𝐶 2 ) and are in 𝐿1 then f̂ ∈ 𝐿1 .

Proof. We prove the case 𝑑 = 1; the general case follows from Tonelli-Fubini. We show first that 𝑓, 𝑓 ′ ∈ 𝐿1 implies f̂(𝑢) = (𝑖/𝑢) f̂′(𝑢). This follows from integration by parts:

f̂(𝑢) = ∫ 𝑓(𝑥)𝑒𝑖𝑢𝑥 𝑑𝑥 = (1/𝑖𝑢) ∫ 𝑓(𝑥)(𝑒𝑖𝑢𝑥 )′ 𝑑𝑥 = −(1/𝑖𝑢) ∫ 𝑓 ′ (𝑥)𝑒𝑖𝑢𝑥 𝑑𝑥,

so in particular |f̂(𝑢)| ≤ ‖𝑓 ′ ‖1 /|𝑢|. Thus if 𝑓, 𝑓 ′ , 𝑓 ″ ∈ 𝐿1 then f̂(𝑢) = −(1/𝑢2 ) f̂″(𝑢), so

|f̂(𝑢)| ≤ ‖𝑓 ″ ‖1 /𝑢2 .

As |f̂| ≤ ‖𝑓‖1 everywhere and ∫|𝑢|≥1 𝑢−2 𝑑𝑢 < ∞, f̂ ∈ 𝐿1 .

Definition (convolution). Given two Borel measures 𝜇 and 𝜈 on R𝑑 , we


define their convolution 𝜇 ∗ 𝜈 as the image of 𝜇 ⊗ 𝜈 under the addition map

Φ ∶ R𝑑 × R𝑑 → R𝑑
(𝑥, 𝑦) ↦ 𝑥 + 𝑦

i.e. 𝜇 ∗ 𝜈 = Φ∗ (𝜇 ⊗ 𝜈)

Thus given 𝐴 ∈ ℬ(R𝑑 ),

(𝜇 ∗ 𝜈)(𝐴) = Φ∗ (𝜇 ⊗ 𝜈)(𝐴) = (𝜇 ⊗ 𝜈)({(𝑥, 𝑦) ∶ 𝑥 + 𝑦 ∈ 𝐴}).

Example. If 𝑋, 𝑌 are independent random variables with laws 𝜇, 𝜈 respectively, then 𝜇 ∗ 𝜈 is the law of 𝑋 + 𝑌.

Definition (convolution). If 𝑓, 𝑔 ∈ 𝐿1 (R𝑑 ) define their convolution 𝑓 ∗ 𝑔 by

(𝑓 ∗ 𝑔)(𝑥) = ∫ 𝑓(𝑥 − 𝑡)𝑔(𝑡)𝑑𝑡.


R𝑑


This is well defined by Fubini: 𝑓, 𝑔 ∈ 𝐿1 so

∫ ∫ |𝑓(𝑥 − 𝑡)𝑔(𝑡)|𝑑𝑡𝑑𝑥 < ∞


R𝑑 R𝑑

and

‖𝑓 ∗ 𝑔‖1 = ∫ ∣ ∫ 𝑓(𝑥 − 𝑡)𝑔(𝑡)𝑑𝑡∣𝑑𝑥 ≤ ∫ ∫ |𝑓(𝑥 − 𝑡)𝑔(𝑡)|𝑑𝑡𝑑𝑥 ≤ ‖𝑓‖1 ⋅ ‖𝑔‖1

Therefore (𝐿1 (R𝑑 ), ∗) forms a Banach algebra.


Remark. If 𝜇, 𝜈 are two finite Borel measures on R𝑑 and if 𝜇, 𝜈 ≪ 𝑑𝑥, i.e.
absolutely continuous, then by Radon-Nikodym there exist 𝑓, 𝑔 ∈ 𝐿1 (R𝑑 ) such
that
𝑑𝜇 = 𝑓𝑑𝑥
𝑑𝜈 = 𝑔𝑑𝑥
then 𝜇 ∗ 𝜈 ≪ 𝑑𝑥 and
𝑑(𝜇 ∗ 𝜈) = (𝑓 ∗ 𝑔)𝑑𝑥.
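A numerical sketch of both facts (assuming NumPy; the distributions, grid and bin counts are ad hoc choices): a histogram of samples of 𝑋 + 𝑌 matches the discretized convolution of the two densities.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal(1_000_000)
Y = rng.exponential(1.0, 1_000_000)

dx = 0.05
grid = np.arange(-6, 10, dx)
fX = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)        # density of X
fY = np.where(grid >= 0, np.exp(-grid), 0.0)          # density of Y

conv = np.convolve(fX, fY) * dx                       # f_X * f_Y, density of X+Y
conv_grid = np.arange(len(conv)) * dx + 2 * grid[0]   # support of the convolution

hist, edges = np.histogram(X + Y, bins=200, range=(-6, 10), density=True)
centres = (edges[:-1] + edges[1:]) / 2
approx = np.interp(centres, conv_grid, conv)
print("max density discrepancy:", np.abs(hist - approx).max())
```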

Proposition 10.4 (Gaussian approximation). If 𝑓 ∈ 𝐿𝑝 (R𝑑 ) where 𝑝 ∈ [1, ∞) then

lim𝜎→0 ‖𝑓 ∗ 𝐺𝜎 − 𝑓‖𝑝 = 0

where 𝐺𝜎 is the density of 𝒩(0, 𝜎2 𝐼𝑑 ), i.e.

𝐺𝜎 (𝑥) = (1/(√(2𝜋𝜎2 ))𝑑 ) 𝑒−‖𝑥‖²/(2𝜎²) .

Lemma 10.5 (continuity of translation in 𝐿𝑝 ). Suppose 𝑝 ∈ [1, ∞) and


𝑓 ∈ 𝐿𝑝 . Then
lim‖𝜏𝑡 (𝑓) − 𝑓‖𝑝 = 0
𝑡→0

where 𝜏𝑡 (𝑓)(𝑥) = 𝑓(𝑥 + 𝑡), 𝑡 ∈ R𝑑 .

Proof. Example sheet.


Proof of Gaussian approximation.

(𝑓 ∗ 𝐺𝜎 − 𝑓)(𝑥) = ∫ 𝐺𝜎 (𝑡)(𝑓(𝑥 − 𝑡) − 𝑓(𝑥))𝑑𝑡 = E(𝑓(𝑥 − 𝜎𝑁 ) − 𝑓(𝑥))


R𝑑

where 𝑁 ∼ 𝒩(0, 𝐼𝑑 ) is Gaussian with density 𝐺1 . Then


‖𝑓 ∗ 𝐺𝜎 − 𝑓‖𝑝𝑝 ≤ E(‖𝑓(𝑥 + 𝜎𝑁 ) − 𝑓(𝑥)‖𝑝𝑝 ) = E(‖𝜏𝜎𝑁 (𝑓) − 𝑓‖𝑝𝑝 )
by Jensen’s inequality and convexity of 𝑥 ↦ 𝑥𝑝 . By the lemma,
lim ‖𝜏𝜎𝑁 (𝑓) − 𝑓‖𝑝 = 0.
𝜎→0

As ‖𝜏𝜎𝑁 (𝑓) − 𝑓‖𝑝 ≤ 2‖𝑓‖𝑝 , apply dominated convergence theorem to get the
required result.
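A sketch of this proposition in action (assuming NumPy; the discontinuous test function and grid are illustrative): convolving an indicator with 𝐺𝜎 smooths it, and the 𝐿2 error shrinks with 𝜎.

```python
import numpy as np

dx = 0.001
x = np.arange(-4, 4, dx)
f = (np.abs(x) < 1).astype(float)               # indicator of (-1,1), in every L^p

def L2_sq(a, b):                                # ||a - b||_2^2 on the grid
    return (np.abs(a - b)**2).sum() * dx

for sigma in [0.5, 0.1, 0.02]:
    G = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    fG = np.convolve(f, G, mode="same") * dx    # f * G_sigma on the same grid
    print(f"sigma={sigma:5.2f}  ||f*G - f||_2^2 ~ {L2_sq(fG, f):.5f}")
```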


Proposition 10.6.
• If 𝑓, 𝑔 ∈ 𝐿1 (R𝑑 ) then (𝑓 ∗ 𝑔)^ = f̂ ⋅ ĝ.
• If 𝜇, 𝜈 are finite Borel measures then (𝜇 ∗ 𝜈)^ = μ̂ ⋅ ν̂.

Proof. 1 reduces to 2 by writing

𝑓(𝑥)𝑑𝑥 = 𝑓 + (𝑥)𝑑𝑥 − 𝑓 − (𝑥)𝑑𝑥 = 𝑎 𝑑𝜇 − 𝑏 𝑑𝜈

for some constants 𝑎, 𝑏 ≥ 0 and probability measures 𝜇, 𝜈, and expanding by bilinearity.
For 2, wlog we may assume 𝜇 and 𝜈 are the laws of independent random variables 𝑋 and 𝑌. Then by a previous result 𝜇 ∗ 𝜈 is the law of 𝑋 + 𝑌, so

(𝜇 ∗ 𝜈)^(𝑢) = ∫ 𝑒𝑖⟨𝑢,𝑥+𝑦⟩ 𝑑𝜇(𝑥)𝑑𝜈(𝑦)
           = E(𝑒𝑖⟨𝑢,𝑋+𝑌 ⟩ ) = E(𝑒𝑖⟨𝑢,𝑋⟩ 𝑒𝑖⟨𝑢,𝑌 ⟩ )
           = E(𝑒𝑖⟨𝑢,𝑋⟩ )E(𝑒𝑖⟨𝑢,𝑌 ⟩ )             (𝑋, 𝑌 independent)
           = μ̂(𝑢) ⋅ ν̂(𝑢).

In short, this is precisely because the 𝑒𝑖⟨𝑢,−⟩ are characters.

Theorem 10.7 (Lévy criterion). Let (𝑋𝑛 )𝑛≥1 and 𝑋 be R𝑑 -valued random variables. Then TFAE:
1. 𝑋𝑛 → 𝑋 in law,
2. for all 𝑢 ∈ R𝑑 , μ̂𝑋𝑛 (𝑢) → μ̂𝑋 (𝑢).
In particular if μ̂𝑋 = μ̂𝑌 for two random variables 𝑋 and 𝑌 then 𝑋 = 𝑌 in law, i.e. 𝜇𝑋 = 𝜇𝑌 .

Thus the Fourier transform is injective on finite Borel measures.
Proof.
• 1 ⟹ 2: Clear by definition, as 𝑥 ↦ 𝑒𝑖⟨𝑢,𝑥⟩ is continuous and bounded for all 𝑢 ∈ R𝑑 .
• 2 ⟹ 1: Need to show that for all 𝑔 ∈ 𝐶𝑏 (R𝑑 ),

  E(𝑔(𝑋𝑛 )) → E(𝑔(𝑋)).

  It is enough to check this for all 𝑔 ∈ 𝐶𝑐∞ (R𝑑 ); for the sufficiency see example sheet.
  Note that for all 𝑔 ∈ 𝐶𝑐∞ (R𝑑 ), ĝ ∈ 𝐿1 , so by the Fourier inversion formula

  𝑔(𝑥) = ∫ ĝ(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ 𝑑𝑢/(2𝜋)𝑑 .

  Hence

  E(𝑔(𝑋𝑛 )) = ∫ ĝ(𝑢)E(𝑒−𝑖⟨𝑢,𝑋𝑛 ⟩ ) 𝑑𝑢/(2𝜋)𝑑
            = ∫ ĝ(𝑢) μ̂𝑋𝑛 (−𝑢) 𝑑𝑢/(2𝜋)𝑑
            → ∫ ĝ(𝑢) μ̂𝑋 (−𝑢) 𝑑𝑢/(2𝜋)𝑑
            = E(𝑔(𝑋))

  by the dominated convergence theorem (|ĝ| is an integrable dominating function).

Theorem 10.8 (Plancherel formula).

1. If 𝑓 ∈ 𝐿1 (R𝑑 ) ∩ 𝐿2 (R𝑑 ) then f̂ ∈ 𝐿2 (R𝑑 ) and

   ‖f̂‖22 = (2𝜋)𝑑 ‖𝑓‖22 .

2. If 𝑓, 𝑔 ∈ 𝐿1 (R𝑑 ) ∩ 𝐿2 (R𝑑 ) then

   ⟨f̂, ĝ⟩𝐿2 = (2𝜋)𝑑 ⟨𝑓, 𝑔⟩𝐿2 .

3. The Fourier transform

   ℱ ∶ 𝐿1 (R𝑑 ) ∩ 𝐿2 (R𝑑 ) → 𝐿2 (R𝑑 ),    𝑓 ↦ (1/(√2𝜋)𝑑 ) f̂

   extends uniquely to a linear operator on 𝐿2 (R𝑑 ) which is an isometry. Moreover

   ℱ ∘ ℱ(𝑓) = f̌

   where f̌(𝑥) = 𝑓(−𝑥), for all 𝑓 ∈ 𝐿2 (R𝑑 ).

Proof. First we prove 1 and 2 assuming f̂, ĝ ∈ 𝐿1 (R𝑑 ); write 𝑧∗ for the complex conjugate of 𝑧. By the Fourier inversion formula,

‖f̂‖22 = ∫R𝑑 |f̂(𝑢)|2 𝑑𝑢
      = ∫ f̂(𝑢) f̂(𝑢)∗ 𝑑𝑢
      = ∫ (∫ 𝑓(𝑥)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑥) f̂(𝑢)∗ 𝑑𝑢
      = ∫ ∫ 𝑓(𝑥) (f̂(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ )∗ 𝑑𝑢 𝑑𝑥
      = ∫ 𝑓(𝑥) ((2𝜋)𝑑 𝑓(𝑥))∗ 𝑑𝑥
      = (2𝜋)𝑑 ‖𝑓‖22

and in particular f̂ ∈ 𝐿2 (R𝑑 ). Similarly for 2,

⟨f̂, ĝ⟩𝐿2 = ∫ f̂(𝑢) ĝ(𝑢)∗ 𝑑𝑢
         = ∫ (∫ 𝑓(𝑥)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑥) ĝ(𝑢)∗ 𝑑𝑢
         = ∫ ∫ 𝑓(𝑥) (ĝ(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ )∗ 𝑑𝑢 𝑑𝑥
         = (2𝜋)𝑑 ∫ 𝑓(𝑥)𝑔(𝑥)∗ 𝑑𝑥
         = (2𝜋)𝑑 ⟨𝑓, 𝑔⟩𝐿2

Now for the general case we use the Gaussian as a mollifier. Consider 𝑓𝜎 = 𝑓 ∗ 𝐺𝜎 and 𝑔𝜎 = 𝑔 ∗ 𝐺𝜎 ; based on the results and computations above,

f̂𝜎 = f̂ ⋅ Ĝ𝜎 = f̂ 𝑒−𝜎²‖𝑢‖²/2 .

As ‖f̂‖∞ ≤ ‖𝑓‖1 , f̂𝜎 ∈ 𝐿1 (R𝑑 ). Thus f̂𝜎 ∈ 𝐿2 (R𝑑 ) and ‖f̂𝜎 ‖22 = (2𝜋)𝑑 ‖𝑓𝜎 ‖22 . But by Gaussian approximation we know that 𝑓𝜎 → 𝑓 in 𝐿2 (R𝑑 ) as 𝜎 → 0, hence ‖𝑓𝜎 ‖2 → ‖𝑓‖2 . Then

‖f̂𝜎 ‖22 = ‖f̂ ⋅ Ĝ𝜎 ‖22 = ∫ |f̂(𝑢)|2 𝑒−𝜎²‖𝑢‖² 𝑑𝑢 → ‖f̂‖22

as 𝜎 → 0 by the monotone convergence theorem. Thus

‖f̂‖22 = (2𝜋)𝑑 ‖𝑓‖22 .

For 2,

⟨f̂𝜎 , ĝ𝜎 ⟩ = ∫ f̂ ĝ∗ 𝑒−𝜎²‖𝑢‖² 𝑑𝑢 → ∫ f̂ ĝ∗ 𝑑𝑢

as 𝜎 → 0 by the dominated convergence theorem, since f̂ ĝ∗ ∈ 𝐿1 (both factors are in 𝐿2 ). The result follows from Gaussian approximation as before.
For 3, 𝐿1 (R𝑑 ) ∩ 𝐿2 (R𝑑 ) is dense in 𝐿2 (R𝑑 ) because it contains 𝐶𝑐 (R𝑑 ). Then
extend by completeness: given 𝑓 ∈ 𝐿2 (R𝑑 ), pick a sequence 𝑓𝑛 ∈ 𝐿1 (R𝑑 ) ∩
𝐿2 (R𝑑 ) such that 𝑓𝑛 → 𝑓 in 𝐿2 (R𝑑 ). Then define
ℱ𝑓 = lim𝑛→∞ ℱ𝑓𝑛 .

The limit exists as 𝐿2 (R𝑑 ) is complete, and ℱ𝑓 is well-defined (independent of the chosen sequence) since

‖ℱ𝑓𝑛 − ℱ𝑓𝑚 ‖2 = ‖𝑓𝑛 − 𝑓𝑚 ‖2

by 1. Finally,

‖ℱ𝑓‖2 = ‖𝑓‖2

for all 𝑓 ∈ 𝐿2 (R𝑑 ), and ℱ ∘ ℱ(𝑓) = f̌ for all 𝑓 such that 𝑓, f̂ ∈ 𝐿1 (R𝑑 ); by continuity this holds on all of 𝐿2 (R𝑑 ).
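A quadrature sketch of Plancherel's identity (assuming NumPy; the test function, grids and truncation are ad hoc choices):

```python
import numpy as np

dx = 0.01
x = np.arange(-20, 20, dx)
f = np.exp(-x**2 / 2) * np.cos(3 * x)     # some f in L^1 and L^2

du = 0.01
u = np.arange(-20, 20, du)
# fhat(u) = int f(x) e^{iux} dx, computed by a Riemann sum
fhat = np.array([(f * np.exp(1j * ui * x)).sum() * dx for ui in u])

lhs = (np.abs(fhat)**2).sum() * du
rhs = 2 * np.pi * (np.abs(f)**2).sum() * dx
print(f"||fhat||_2^2 ~ {lhs:.6f}   (2*pi)||f||_2^2 ~ {rhs:.6f}")
```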


11 Gaussians

Definition (Gaussian). An R𝑑 -valued random variable 𝑋 is called Gaussian if for all 𝑢 ∈ R𝑑 , ⟨𝑋, 𝑢⟩ is Gaussian, namely its law has the form 𝒩(𝑚, 𝜎2 ) for some 𝑚 ∈ R, 𝜎 ≥ 0 (where 𝒩(𝑚, 0) = 𝛿𝑚 ).

Proposition 11.1. The law of a Gaussian vector 𝑋 = (𝑋1 , … , 𝑋𝑑 ) ∈ R𝑑


is uniquely determined by
1. its mean E𝑋 = (E𝑋1 , … , E𝑋𝑑 ),

2. its covariance matrix (Cov(𝑋𝑖 , 𝑋𝑗 ))𝑖𝑗 where

Cov(𝑋𝑖 , 𝑋𝑗 ) = E((𝑋𝑖 − E𝑋𝑖 )(𝑋𝑗 − E𝑋𝑗 )).

Proof. If 𝑑 = 1 this just says that the law is determined by the mean 𝑚 and the variance 𝜎2 , which is true for Gaussians. For 𝑑 > 1, compute the characteristic function

μ̂𝑋 (𝑢) = E(𝑒𝑖⟨𝑋,𝑢⟩ );

by assumption ⟨𝑋, 𝑢⟩ is a one-dimensional Gaussian, so its law (and hence E(𝑒𝑖⟨𝑋,𝑢⟩ )) is determined by

1. the mean E(⟨𝑋, 𝑢⟩) = ⟨E𝑋, 𝑢⟩,
2. the variance Var⟨𝑋, 𝑢⟩ = E((⟨𝑋, 𝑢⟩ − E⟨𝑋, 𝑢⟩)2 ) = ∑𝑖,𝑗 𝑢𝑖 𝑢𝑗 Cov(𝑋𝑖 , 𝑋𝑗 ).

Conclude by the Lévy criterion. In particular this shows that (Cov(𝑋𝑖 , 𝑋𝑗 ))𝑖𝑗 is a positive semidefinite symmetric matrix.

Proposition 11.2. If 𝑋 is a Gaussian vector then exists 𝐴 ∈ ℳ𝑑 (R), 𝑏 ∈ R𝑑


such that 𝑋 has the same law as 𝐴𝑁 + 𝑏 where 𝑁 = (𝑁1 , … , 𝑁𝑑 ), (𝑁𝑖 )𝑑𝑖=1
are iid. 𝒩(0, 1).

Proof. Take 𝐴 such that

𝐴𝐴∗ = (Cov(𝑋𝑖 , 𝑋𝑗 ))𝑖𝑗 ,

which is possible since the covariance matrix is symmetric positive semidefinite (e.g. take its symmetric square root); here 𝐴∗ is the adjoint (transpose) of 𝐴. Set

𝑏 = (E𝑋1 , … , E𝑋𝑑 ).

Check that for all 𝑢 ∈ R𝑑 ,

E(⟨𝑋, 𝑢⟩) = ⟨𝑏, 𝑢⟩ = E(⟨𝐴𝑁 + 𝑏, 𝑢⟩),
Var(⟨𝑋, 𝑢⟩) = ⟨𝐴𝐴∗ 𝑢, 𝑢⟩ = ‖𝐴∗ 𝑢‖22 = Var(⟨𝐴𝑁 + 𝑏, 𝑢⟩),

and conclude by the previous proposition.
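This is exactly how Gaussian vectors are sampled in practice: take any 𝐴 with 𝐴𝐴∗ = 𝐾 (e.g. a Cholesky factor) and set 𝑋 = 𝐴𝑁 + 𝑏. A sketch (assuming NumPy; the particular 𝐾 and 𝑏 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
K = np.array([[2.0, 0.8], [0.8, 1.0]])    # target covariance (positive definite)
b = np.array([1.0, -2.0])                 # target mean

A = np.linalg.cholesky(K)                 # A @ A.T = K
N = rng.standard_normal((2, 1_000_000))   # iid N(0,1) coordinates
X = (A @ N).T + b                         # samples of N(b, K)

print("empirical mean:", X.mean(axis=0))
print("empirical covariance:\n", np.cov(X.T))
```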


Proposition 11.3. If (𝑋1 , … , 𝑋𝑑 ) is a Gaussian vector then TFAE:


1. 𝑋𝑖 ’s are independent.
2. 𝑋𝑖 ’s are pairwise independent.
3. (Cov(𝑋𝑖 , 𝑋𝑗 ))𝑖𝑗 is a diagonal matrix.

Proof. 1 ⟹ 2 ⟹ 3 is obvious. 3 ⟹ 1 as we can choose 𝐴 to be diagonal.


Thus 𝑋 has the same law as (𝑎1 𝑁1 , … , 𝑎𝑑 𝑁𝑑 ) + 𝑏.

Theorem 11.4 (central limit theorem). Let (𝑋𝑖 )𝑖≥1 be R𝑑 -valued iid. ran-
dom variables with law 𝜇. Assume they have second moment, i.e. E(‖𝑋1 ‖2 ) <
∞. Let 𝐦 = E(𝑋1 ) ∈ R𝑑 and

𝑋1 + ⋯ + 𝑋𝑛 − 𝑛 ⋅ 𝐦
𝑌𝑛 = √ .
𝑛

Then 𝑌𝑛 converges in law to a centred Gaussian on R𝑑 with law 𝒩(0, 𝐾) where

𝐾𝑖𝑗 = Cov(𝑋1 )𝑖𝑗 = ∫R𝑑 (𝑥𝑖 − 𝐦𝑖 )(𝑥𝑗 − 𝐦𝑗 ) 𝑑𝜇(𝑥).

Proof. The proof is an application of the Lévy criterion. Need to show μ̂𝑌𝑛 (𝑢) → μ̂𝑌 (𝑢) as 𝑛 → ∞ for all 𝑢, where 𝑌 ∼ 𝒩(0, 𝐾). As μ̂𝑌𝑛 (𝑢) = E(𝑒𝑖⟨𝑌𝑛 ,𝑢⟩ ), this is equivalent to showing that for all 𝑢, ⟨𝑌𝑛 , 𝑢⟩ converges in law to ⟨𝑌 , 𝑢⟩. But

⟨𝑌𝑛 , 𝑢⟩ = (⟨𝑋1 , 𝑢⟩ + ⋯ + ⟨𝑋𝑛 , 𝑢⟩ − 𝑛⟨𝐦, 𝑢⟩)/√𝑛,

so we reduce the problem to the 1-dimensional case. By rescaling, wlog E(𝑋1 ) = 0, E(𝑋12 ) = 1.
Now

μ̂𝑌𝑛 (𝑢) = E(𝑒𝑖𝑢𝑌𝑛 )
        = E(exp(𝑖𝑢(𝑋1 + ⋯ + 𝑋𝑛 )/√𝑛))
        = ∏𝑖=1..𝑛 E(exp(𝑖𝑢𝑋𝑖 /√𝑛))
        = (E(exp(𝑖𝑢𝑋1 /√𝑛)))𝑛
        = (μ̂(𝑢/√𝑛))𝑛


But E(𝑋1 ) = 0 and E(𝑋12 ) = 1 < ∞, so we can differentiate μ̂ twice under the integral sign:

μ̂(𝑢) = ∫R 𝑒𝑖𝑢𝑥 𝑑𝜇(𝑥),
μ̂′(𝑢) = ∫R 𝑖𝑥𝑒𝑖𝑢𝑥 𝑑𝜇(𝑥),    so μ̂′(0) = 𝑖E(𝑋1 ) = 0,
μ̂″(𝑢) = ∫R −𝑥2 𝑒𝑖𝑢𝑥 𝑑𝜇(𝑥),  so μ̂″(0) = −E(𝑋12 ) = −1.

Taylor expand μ̂ around 0 to 2nd order:

μ̂(𝑢) = μ̂(0) + 𝑢μ̂′(0) + (𝑢2 /2)μ̂″(0) + 𝑜(𝑢2 ) = 1 + 0 ⋅ 𝑢 − 𝑢2 /2 + 𝑜(𝑢2 )

so

μ̂𝑌𝑛 (𝑢) = (1 − 𝑢2 /(2𝑛) + 𝑜(𝑢2 /𝑛))𝑛 → 𝑒−𝑢²/2 = ĝ(𝑢)

as 𝑛 → ∞, which is the characteristic function of 𝒩(0, 1).
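A simulation sketch of the theorem in 𝑑 = 1 with a skewed, non-Gaussian 𝑋𝑖 (assuming NumPy; the exponential distribution and sample sizes are arbitrary illustrations):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(9)
n, trials = 1000, 10_000
X = rng.exponential(1.0, (trials, n))          # iid, mean m = 1, variance 1

Y = (X.sum(axis=1) - n * 1.0) / np.sqrt(n)     # Y_n as in the theorem

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal CDF
for t in [-1.0, 0.0, 1.0]:
    print(f"t={t:+.1f}  P(Yn<=t) ~ {(Y <= t).mean():.4f}  Phi(t) = {Phi(t):.4f}")
```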


12 Ergodic theory
Let (𝑋, 𝒜, 𝜇) be a measure space. Let 𝑇 ∶ 𝑋 → 𝑋 be an 𝒜-measurable map. We
are interested in the trajectories of 𝑇 𝑛 𝑥 for 𝑛 ≥ 0 and their statistical behaviour.
In particular we are interested in those 𝑇 preserving measure 𝜇.

Definition (measure-preserving). 𝑇 ∶ 𝑋 → 𝑋 is measure-preserving if


𝑇∗ 𝜇 = 𝜇. (𝑋, 𝒜, 𝜇, 𝑇 ) is called a measure-preserving dynamical system.

Definition (invariant function, invariant set, invariant 𝜎-algebra).


• A measurable function 𝑓 ∶ 𝑋 → R is called 𝑇-invariant if 𝑓 = 𝑓 ∘ 𝑇.
• A set 𝐴 ∈ 𝒜 is 𝑇-invariant if 1𝐴 is 𝑇-invariant.

𝒯 = {𝐴 ∈ 𝒜 ∶ 𝐴 is 𝑇-invariant}
is called the 𝑇-invariant 𝜎-algebra.

Lemma 12.1. 𝑓 is 𝑇-invariant if and only if 𝑓 is 𝒯-measurable.

Proof. Indeed for all 𝑡 ∈ R,

{𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) < 𝑡} = {𝑥 ∈ 𝑋 ∶ 𝑓 ∘ 𝑇 (𝑥) < 𝑡} = 𝑇 −1 ({𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) < 𝑡}).

Definition (ergodic). 𝑇 is ergodic with respect to 𝜇, or that 𝜇 is ergodic


with respect to 𝑇, if for all 𝐴 ∈ 𝒯, 𝜇(𝐴) = 0 or 𝜇(𝐴𝑐 ) = 0.

This condition asserts that 𝒯 is trivial, i.e. its elements are either null or
conull.

Lemma 12.2. 𝑇 is ergodic with respect to 𝜇 if and only if every invariant


function 𝑓 is almost everywhere constant.

Proof. Exercise.
Example.
1. Let 𝑋 be a finite space, 𝑇 ∶ 𝑋 → 𝑋 a map and 𝜇 = # the counting measure. Then 𝑇 is measure-preserving if and only if 𝑇 is a bijection, and 𝑇 is ergodic if and only if there is no partition 𝑋 = 𝑋1 ∪ 𝑋2 into nonempty 𝑇-invariant sets, which is equivalent to: for all 𝑥, 𝑦 ∈ 𝑋 there exists 𝑛 such that 𝑇 𝑛 𝑥 = 𝑦.
2. Let 𝑋 = R𝑑 /Z𝑑 , 𝒜 the Borel 𝜎-algebra and 𝜇 the Lebesgue measure. Given 𝑎 ∈ R𝑑 , the translation 𝑇𝑎 ∶ 𝑥 ↦ 𝑥 + 𝑎 is measure-preserving. 𝑇𝑎 is ergodic with respect to 𝜇 if and only if 1, 𝑎1 , … , 𝑎𝑑 , where the 𝑎𝑖 are the coordinates of 𝑎, are linearly independent over Q. See example sheet. (hint: Fourier transform)


3. Let 𝑋 = R/Z, again with 𝒜 the Borel 𝜎-algebra and 𝜇 the Lebesgue measure. The doubling map 𝑇 ∶ 𝑥 ↦ 2𝑥 − ⌊2𝑥⌋ is ergodic with respect to 𝜇 (hint: again consider Fourier coefficients). Intuitively, the preimage under 𝑇 of an interval of length 𝜀 consists of two intervals each of length 𝜀/2, so 𝑇 is measure-preserving; see the numerical sketch after this list.
4. Furstenberg conjecture: every ergodic measure 𝜇 on R/Z invariant under
𝑇2 , 𝑇3 must be either Lebesgue or finitely supported.
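The numerical sketch promised in example 3 (assuming NumPy; the test function and starting point are arbitrary): Birkhoff averages along an orbit of the irrational rotation 𝑇𝑎 approach the space average ∫ 𝑓 𝑑𝜇. The rotation is used rather than the doubling map because iterating 𝑥 ↦ 2𝑥 mod 1 in floating point discards one binary digit per step.

```python
import numpy as np

a = np.sqrt(2.0)                        # 1, a linearly independent over Q
f = lambda x: np.cos(2 * np.pi * x)     # integral over R/Z is 0

x0 = 0.1                                # arbitrary starting point
for n in [10, 100, 10_000, 1_000_000]:
    orbit = (x0 + a * np.arange(n)) % 1.0   # x, Tx, ..., T^{n-1}x
    print(f"n={n:8d}  S_n f ~ {f(orbit).mean():+.6f}   (space average = 0)")
```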

12.1 The canonical model


Let (𝑋𝑛 )𝑛≥1 be an R𝑑 -valued stochastic process on (Ω, ℱ, P). Let 𝑋 = (R𝑑 )N
and define the sample path map

Φ∶Ω→𝑋
𝜔 ↦ (𝑋𝑛 (𝜔))𝑛≥1

Let

𝑇 ∶𝑋→𝑋
(𝑥𝑛 )𝑛≥1 ↦ (𝑥𝑛+1 )𝑛≥1

be the shift map. Let 𝑥𝑛 ∶ 𝑋 → R𝑑 be the 𝑛th coordinate function and let
𝒜 = 𝜎(𝑥𝑛 ∶ 𝑛 ≥ 1).
Note. 𝒜 is the infinite product 𝜎-algebra ℬ(R𝑑 )⊗N on (R𝑑 )N .

Let 𝜇 = Φ∗ P, a probability measure on (𝑋, 𝒜). This 𝜇 is called the law of


the process (𝑋𝑛 )𝑛≥1 . Now (𝑋, 𝒜, 𝜇, 𝑇 ) is called the canonical model associated
to (𝑋𝑛 )𝑛≥1 .

Proposition 12.3 (stationary process). TFAE:


1. (𝑋, 𝒜, 𝜇, 𝑇 ) is measure-preserving.
2. For all 𝑘 ≥ 1, the law of (𝑋𝑛 , 𝑋𝑛+1 , … , 𝑋𝑛+𝑘 ) on (R𝑑 )𝑘 is independent
of 𝑛.
In this case we say that (𝑋𝑛 )𝑛≥1 is a stationary process.

Proof.
• 1 ⟹ 2: 𝜇 = 𝑇∗ 𝜇 implies 𝜇 = 𝑇∗𝑛 𝜇 for all 𝑛, and this says the law of (𝑋𝑖 )𝑖≥1 is the same as that of (𝑋𝑖+𝑛 )𝑖≥1 .
• 2 ⟹ 1: 𝜇 and 𝑇∗ 𝜇 agree on the cylinders 𝐴 × (R𝑑 )N∖𝐹 for 𝐹 ⊆ N finite and 𝐴 ∈ ℬ((R𝑑 )𝐹 ), which generate 𝒜.

In some sense ergodic theory is the study of stationary processes.


Proposition 12.4 (Bernoulli shift). If (𝑋𝑛 )𝑛≥1 are iid. then (𝑋, 𝒜, 𝜇, 𝑇 )
is ergodic. It is called the Bernoulli shift associated to the law 𝜈 of 𝑋1 . We
have
𝜇 = 𝜈 ⊗N .

Proof. Claim that Φ−1 (𝒯) ⊆ 𝒞, the tail 𝜎-algebra of (𝑋𝑛 )𝑛≥1 . But Kolmogorov
0-1 law says that if 𝐴 ∈ 𝒯 then P(Φ−1 (𝐴)) = 0 or 1, so 𝜇(𝐴) = 0 or 1, thus 𝜇
is 𝑇-ergodic.
Given 𝐴 ∈ 𝒯, 𝑇 −1 𝐴 = 𝐴, so

Φ−1 (𝐴) = {𝜔 ∈ Ω ∶ (𝑋𝑛 (𝜔))𝑛≥1 ∈ 𝐴}
        = {𝜔 ∈ Ω ∶ (𝑋𝑛 (𝜔))𝑛≥1 ∈ 𝑇 −𝑘 𝐴}
        = {𝜔 ∈ Ω ∶ (𝑋𝑛+𝑘 (𝜔))𝑛≥1 ∈ 𝐴} ∈ 𝜎(𝑋𝑘 , 𝑋𝑘+1 , … )

for every 𝑘, hence Φ−1 (𝐴) ∈ 𝒞.

Theorem 12.5 (von Neumann mean ergodic theorem). Let (𝑋, 𝒜, 𝜇, 𝑇 ) be a measure-preserving system. Let 𝑓 ∈ 𝐿2 (𝑋, 𝒜, 𝜇). Then the ergodic average

𝑆𝑛 𝑓 = (1/𝑛) ∑𝑖=0..𝑛−1 𝑓 ∘ 𝑇 𝑖

converges in 𝐿2 to a 𝑇-invariant function f̄. In fact f̄ is the orthogonal projection of 𝑓 onto 𝐿2 (𝑋, 𝒯, 𝜇).

The intuition is as follows: if 𝑓 is the indicator function of a set 𝐴, then 𝑆𝑛 𝑓(𝑥) is exactly the fraction of time the orbit of 𝑥 spends in 𝐴 up to time 𝑛.
Proof. Hilbert space argument. Let 𝐻 = 𝐿2 (𝑋, 𝒜, 𝜇) and define

𝑈 ∶𝐻→𝐻
𝑓↦𝑓 ∘𝑇

which is an isometry: because 𝜇 is 𝑇-invariant, ∫ |𝑓 ∘ 𝑇 |2 𝑑𝜇 = ∫ |𝑓|2 𝑑𝜇. Then


by Riesz representation theorem it has an adjoint

𝑈∗ ∶ 𝐻 → 𝐻
𝑥 ↦ 𝑈 ∗𝑥

which satisfies ⟨𝑈 ∗ 𝑥, 𝑦⟩ = ⟨𝑥, 𝑈 𝑦⟩ for all 𝑦 ∈ 𝐻. Let

𝑊 = {𝜑 − 𝜑 ∘ 𝑇 ∶ 𝜑 ∈ 𝐻}

be the space of coboundaries. Let 𝑓 = 𝜑 − 𝜑 ∘ 𝑇 ∈ 𝑊. Then

𝑆𝑛 𝑓 = (1/𝑛) ∑𝑖=0..𝑛−1 (𝜑 ∘ 𝑇 𝑖 − 𝜑 ∘ 𝑇 𝑖+1 ) = (𝜑 − 𝜑 ∘ 𝑇 𝑛 )/𝑛 → 0


as 𝑛 → ∞.
Let 𝑓 ∈ W̄ , the closure of 𝑊. Then again 𝑆𝑛 𝑓 → 0: for all 𝜀 > 0 there exists 𝑔 ∈ 𝑊 such that ‖𝑓 − 𝑔‖ < 𝜀, and

‖𝑆𝑛 𝑓 − 𝑆𝑛 𝑔‖ = ‖𝑆𝑛 (𝑓 − 𝑔)‖ ≤ ‖𝑓 − 𝑔‖ ≤ 𝜀,

so lim sup𝑛 ‖𝑆𝑛 𝑓‖ ≤ 𝜀.
Have 𝐻 = W̄ ⊕ W̄ ⟂ and W̄ ⟂ = 𝑊 ⟂ . Claim that 𝑊 ⟂ consists exactly of the 𝑇-invariant functions. The theorem then follows because if 𝑓 ∘ 𝑇 = 𝑓 then 𝑆𝑛 𝑓 = 𝑓 for all 𝑛.
Proof of claim.

𝑊 ⟂ = {𝑔 ∈ 𝐻 ∶ ⟨𝑔, 𝜑 − 𝑈 𝜑⟩ = 0 for all 𝜑 ∈ 𝐻}
    = {𝑔 ∶ ⟨𝑔, 𝜑⟩ = ⟨𝑔, 𝑈 𝜑⟩ for all 𝜑}
    = {𝑔 ∶ ⟨𝑔, 𝜑⟩ = ⟨𝑈 ∗ 𝑔, 𝜑⟩ for all 𝜑}
    = {𝑔 ∶ 𝑈 ∗ 𝑔 = 𝑔}
    = {𝑔 ∶ 𝑈 𝑔 = 𝑔}

where the last equality is by

‖𝑈 𝑔 − 𝑔‖2 = 2‖𝑔‖2 − 2 Re⟨𝑔, 𝑈 𝑔⟩ = 2‖𝑔‖2 − 2 Re⟨𝑈 ∗ 𝑔, 𝑔⟩,

which vanishes when 𝑈 ∗ 𝑔 = 𝑔. This shows that 𝑊 ⟂ is exactly the set of 𝑇-invariant functions.

In fact we can do better:

Theorem 12.6 (Birkhoff (pointwise) ergodic theorem). Let (𝑋, 𝒜, 𝜇, 𝑇 ) be a measure-preserving system. Assume 𝜇 is finite (actually 𝜎-finite suffices) and let 𝑓 ∈ 𝐿1 (𝑋, 𝒜, 𝜇). Then

𝑆𝑛 𝑓 = (1/𝑛) ∑𝑖=0..𝑛−1 𝑓 ∘ 𝑇 𝑖

converges 𝜇-almost everywhere to a 𝑇-invariant function f̄ ∈ 𝐿1 . Moreover 𝑆𝑛 𝑓 → f̄ in 𝐿1 .

Corollary 12.7 (strong law of large numbers). Let (𝑋𝑛 )𝑛≥1 be a sequence of iid. random variables. Assume E(|𝑋1 |) < ∞. Let 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 ; then 𝑆𝑛 /𝑛 converges almost surely to E(𝑋1 ).

Proof. Let (𝑋, 𝒜, 𝜇, 𝑇 ) be the canonical model associated to (𝑋𝑛 )𝑛≥1 , where
𝑋 = RN , 𝒜 = ℬ(R)⊗N , 𝑇 the shift operator and 𝜇 = 𝜈 ⊗N where 𝜈 is the law of
𝑋1 . It is a Bernoulli shift. Let
𝑓 ∶𝑋→R
𝑥 ↦ 𝑥1


the first coordinate. Then 𝑓 ∘ 𝑇 𝑖 (𝑥) = 𝑥𝑖+1 , so

(1/𝑛)(𝑋1 + ⋯ + 𝑋𝑛 )(𝜔) = 𝑆𝑛 𝑓(𝑥)

where 𝑥 = (𝑋𝑛 (𝜔))𝑛≥1 . Hence by the Birkhoff ergodic theorem, and since the Bernoulli shift is ergodic,

(1/𝑛)(𝑋1 + ⋯ + 𝑋𝑛 ) → f̄ = ∫ 𝑓𝑑𝜇 = ∫ 𝑥1 𝑑𝜇(𝑥) = E(𝑋1 )

almost surely.
Remark. If 𝑇 is ergodic then f̄, being 𝑇-invariant, is almost everywhere constant; hence f̄ = ∫ 𝑓𝑑𝜇 when 𝜇 is a probability measure.
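A quick sketch of the strong law (assuming NumPy; the uniform distribution is an arbitrary illustration): running averages of a single sample path settle at E(𝑋1 ).

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.uniform(0.0, 2.0, 1_000_000)      # iid, E X_1 = 1
avg = np.cumsum(X) / np.arange(1, len(X) + 1)

for n in [10, 1000, 100_000, 1_000_000]:
    print(f"n={n:8d}  S_n/n = {avg[n-1]:.5f}")
```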

Lemma 12.8 (maximal ergodic lemma). Let 𝑓 ∈ 𝐿1 (𝑋, 𝒜, 𝜇) and 𝛼 ∈ R. Let

𝐸𝛼 = {𝑥 ∈ 𝑋 ∶ sup𝑛≥1 𝑆𝑛 𝑓(𝑥) > 𝛼}.

Then

𝛼𝜇(𝐸𝛼 ) ≤ ∫𝐸𝛼 𝑓𝑑𝜇.

Lemma 12.9 (maximal inequality). Let 𝑓0 = 0 and

𝑓𝑛 = 𝑛𝑆𝑛 𝑓 = ∑𝑖=0..𝑛−1 𝑓 ∘ 𝑇 𝑖 ,    𝑛 ≥ 1.

Let

𝑃𝑁 = {𝑥 ∈ 𝑋 ∶ max0≤𝑛≤𝑁 𝑓𝑛 (𝑥) > 0}.

Then

∫𝑃𝑁 𝑓𝑑𝜇 ≥ 0.

Proof of maximal inequality. Set 𝐹𝑁 = max0≤𝑛≤𝑁 𝑓𝑛 . Observe that for all 𝑛 ≤ 𝑁, 𝐹𝑁 ≥ 𝑓𝑛 and hence

𝐹𝑁 ∘ 𝑇 + 𝑓 ≥ 𝑓𝑛 ∘ 𝑇 + 𝑓 = 𝑓𝑛+1 .

Now if 𝑥 ∈ 𝑃𝑁 then

𝐹𝑁 (𝑥) = max1≤𝑛≤𝑁 𝑓𝑛 (𝑥) ≤ 𝐹𝑁 (𝑇 𝑥) + 𝑓(𝑥).

Integrate over 𝑃𝑁 :

∫𝑃𝑁 𝐹𝑁 𝑑𝜇 ≤ ∫𝑃𝑁 𝐹𝑁 ∘ 𝑇 𝑑𝜇 + ∫𝑃𝑁 𝑓𝑑𝜇.

Note that 𝐹𝑁 (𝑥) = 0 if 𝑥 ∉ 𝑃𝑁 (because 𝑓0 = 0) and 𝐹𝑁 ≥ 0, so

∫𝑃𝑁 𝐹𝑁 = ∫𝑋 𝐹𝑁 ≤ ∫𝑋 𝐹𝑁 ∘ 𝑇 + ∫𝑃𝑁 𝑓.


As 𝜇 is 𝑇-invariant, ∫ 𝐹𝑁 ∘ 𝑇 = ∫ 𝐹𝑁 so

∫ 𝑓𝑑𝜇 ≥ 0.
𝑃𝑁

Proof of maximal ergodic lemma. Apply the maximal inequality to 𝑔 = 𝑓 − 𝛼. Observe that 𝑆𝑛 𝑔 = 𝑆𝑛 𝑓 − 𝛼 and

𝐸𝛼 (𝑓) = ⋃𝑁≥1 𝑃𝑁 (𝑔).

Thus, letting 𝑁 → ∞ along the increasing sets 𝑃𝑁 (𝑔),

∫𝐸𝛼(𝑓) (𝑓 − 𝛼)𝑑𝜇 ≥ 0,

which is equivalent to

𝛼𝜇(𝐸𝛼 ) ≤ ∫𝐸𝛼 𝑓𝑑𝜇.

Proof of Birkhoff (pointwise) ergodic theorem. Let

f̄ = lim sup𝑛 𝑆𝑛 𝑓,    f̲ = lim inf𝑛 𝑆𝑛 𝑓.

Observe that f̄ ∘ 𝑇 = f̄ and f̲ ∘ 𝑇 = f̲: indeed

𝑆𝑛 (𝑓 ∘ 𝑇 ) = (1/𝑛)(𝑓 ∘ 𝑇 + ⋯ + 𝑓 ∘ 𝑇 𝑛 ) = ((𝑛 + 1)𝑆𝑛+1 𝑓 − 𝑓)/𝑛.

Need to show that f̄ = f̲ 𝜇-almost everywhere. This is equivalent to: for all 𝛼, 𝛽 ∈ Q with 𝛼 > 𝛽, the set

𝐸𝛼,𝛽 (𝑓) = {𝑥 ∈ 𝑋 ∶ f̲(𝑥) < 𝛽, f̄(𝑥) > 𝛼}

is 𝜇-null, as then

{𝑥 ∶ f̲(𝑥) ≠ f̄(𝑥)} = ⋃𝛼>𝛽 𝐸𝛼,𝛽

is 𝜇-null by subadditivity. Observe that 𝐸𝛼,𝛽 is 𝑇-invariant. Apply the maximal ergodic lemma to the system restricted to 𝐸𝛼,𝛽 to get

𝛼𝜇(𝐸𝛼,𝛽 (𝑓)) ≤ ∫𝐸𝛼,𝛽 𝑓𝑑𝜇.

Dually

−𝛽𝜇(𝐸𝛼,𝛽 (𝑓)) ≤ ∫𝐸𝛼,𝛽 −𝑓𝑑𝜇,

so

𝛼𝜇(𝐸𝛼,𝛽 ) ≤ 𝛽𝜇(𝐸𝛼,𝛽 ).

But 𝛼 > 𝛽, so 𝜇(𝐸𝛼,𝛽 ) = 0.
We have proved that the limit lim𝑛 𝑆𝑛 𝑓 exists almost everywhere; define f̄ to be this limit. It is left to show f̄ ∈ 𝐿1 and lim𝑛 ‖𝑆𝑛 𝑓 − f̄‖1 = 0. The first is an application of Fatou's lemma:

∫ |f̄|𝑑𝜇 = ∫ lim inf𝑛 |𝑆𝑛 𝑓|𝑑𝜇 ≤ lim inf𝑛 ∫ |𝑆𝑛 𝑓|𝑑𝜇 = lim inf𝑛 ‖𝑆𝑛 𝑓‖1 ≤ ‖𝑓‖1 ,

so f̄ ∈ 𝐿1 , where the last inequality is because

‖𝑆𝑛 𝑓‖1 ≤ (1/𝑛)(‖𝑓‖1 + ⋯ + ‖𝑓 ∘ 𝑇 𝑛−1 ‖1 ) = ‖𝑓‖1 .

Now to show ‖𝑆𝑛 𝑓 − f̄‖1 → 0, we truncate 𝑓. Let 𝑀 > 0 and set 𝜑𝑀 = 𝑓1|𝑓|<𝑀 . Note that

• |𝜑𝑀 | ≤ 𝑀 so |𝑆𝑛 𝜑𝑀 | ≤ 𝑀, and 𝑆𝑛 𝜑𝑀 → φ̄𝑀 almost everywhere by the first part. Hence by the dominated convergence theorem ‖𝑆𝑛 𝜑𝑀 − φ̄𝑀 ‖1 → 0.

• 𝜑𝑀 → 𝑓 𝜇-almost everywhere as 𝑀 → ∞, and also in 𝐿1 by the dominated convergence theorem.

By Fatou's lemma,

‖φ̄𝑀 − f̄‖1 ≤ lim inf𝑛 ‖𝑆𝑛 𝜑𝑀 − 𝑆𝑛 𝑓‖1 ≤ ‖𝜑𝑀 − 𝑓‖1 .

Finally

‖𝑆𝑛 𝑓 − f̄‖1 ≤ ‖𝑆𝑛 𝑓 − 𝑆𝑛 𝜑𝑀 ‖1 + ‖𝑆𝑛 𝜑𝑀 − φ̄𝑀 ‖1 + ‖φ̄𝑀 − f̄‖1
            ≤ ‖𝑓 − 𝜑𝑀 ‖1 + ‖𝑆𝑛 𝜑𝑀 − φ̄𝑀 ‖1 + ‖𝑓 − 𝜑𝑀 ‖1

so

lim sup𝑛 ‖𝑆𝑛 𝑓 − f̄‖1 ≤ 2‖𝑓 − 𝜑𝑀 ‖1

for all 𝑀, which goes to 0 as 𝑀 → ∞.
Remark.
1. The theorem holds if 𝜇 is only assumed to be 𝜎-finite.
2. The theorem holds if 𝑓 ∈ 𝐿𝑝 for 𝑝 ∈ [1, ∞); then 𝑆𝑛 𝑓 → f̄ in 𝐿𝑝 .

Index

𝐿𝑝 -space, 47
𝜋-system, 12
𝜎-algebra, 10
    independence, 33
    invariant, 67
    tail, 36
Bernoulli shift, 69
Birkhoff ergodic theorem, 70
Boolean algebra, 2
Borel 𝜎-algebra, 12
Borel measurable, 19
Borel-Cantelli lemma, 35
canonical model, 68
Carathéodory extension theorem, 14
central limit theorem, 65
characteristic function, 56
coboundary, 69
completion, 17
conditional expectation, 54
convergence
    almost surely, 39
    in 𝐿1 , 41
    in distribution, 39
    in mean, 41
    in measure, 39
    in probability, 39
convolution, 59
covariance matrix, 64
density, 31, 52
Dirac mass, 39
distribution, 29
distribution function, 29
Dynkin lemma, 12
ergodic, 67
expectation, 29
Fatou's lemma, 23
filtration, 36
Fourier transform, 56
Furstenberg conjecture, 68
Gaussian, 64
Gaussian approximation, 60
Hermitian norm, 49
Hilbert space, 49
Hölder inequality, 46
iid., 34
independence, 33
infinite product 𝜎-algebra, 35, 68
inner product, 49
integrable, 20
integral with respect to a measure, 20
invariant 𝜎-algebra, 67
invariant function, 67
invariant set, 67
io., 35
Jensen inequality, 44
Kolmogorov 0 − 1 law, 36, 69
law, 29
law of large numbers, 38, 70
Lebesgue measure, 8
Lebesgue's dominated convergence theorem, 23
Lévy criterion, 61
maximal ergodic lemma, 71
maximal inequality, 71
mean, 32, 64
measurable, 18
measurable space, 10
measure, 10
    absolutely continuous, 51
    finitely additive, 2
    singular, 51
measure-preserving, 67
measure-preserving dynamical system, 67
Minkowski inequality, 45
moment, 32
monotone convergence theorem, 20
null set, 5
orthogonal projection, 50
Plancherel formula, 62
probability measure, 29
probability space, 29
product 𝜎-algebra, 26
    infinite, 35, 68
product measure, 27
    infinite, 35
Radon-Nikodym derivative, 52
Radon-Nikodym theorem, 52
random process, 36
random variable, 29
    independence, 33
Riesz representation theorem, 51, 69
sample path map, 68
simple function, 19
stationary process, 68
stochastic process, 68
strong law of large numbers, 38, 70
tail event, 36, 69
Tonelli-Fubini theorem, 27
uniformly integrable, 41
variance, 32
von Neumann mean ergodic theorem, 69
weak convergence, 39
Young inequality, 46
