II Probability and Measure
Cambridge
Mathematics Tripos
Part II
Michaelmas, 2018
Lectures by
E. Breuillard
Notes by
Qiangru Kuang
Contents
1 Lebesgue measure
1.1 Boolean algebra
1.2 Jordan measure
1.3 Lebesgue measure
4 Product measures
6 Independence
6.1 Useful inequalities
8 𝐿𝑝 spaces
10 Fourier transform
11 Gaussians
12 Ergodic theory
12.1 The canonical model
1 Lebesgue measure
1.1 Boolean algebra
A Boolean algebra on a set 𝑋 is a collection ℬ of subsets of 𝑋 which
1. contains ∅,
2. is stable under finite unions and complementation.
Example.
• The trivial Boolean algebra ℬ = {∅, 𝑋}.
• The discrete Boolean algebra ℬ = 2𝑋 , the family of all subsets of 𝑋.
Example.
1. Counting measure: 𝑚(𝐸) = #𝐸, the cardinality of 𝐸 where ℬ is the
discrete Boolean algebra of 𝑋.
2. More generally, given 𝑓 ∶ 𝑋 → [0, +∞], define for 𝐸 ⊆ 𝑋,
𝑚(𝐸) = ∑𝑒∈𝐸 𝑓(𝑒).
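The weighted counting measure in example 2 is easy to experiment with on finite sets. The following Python sketch (an illustration, not part of the course; the weight functions are arbitrary choices) checks the definition and finite additivity:

```python
from fractions import Fraction

def weight_measure(f, E):
    """m(E) = sum of the weights f(e) over e in E (finite E only here)."""
    return sum(f(e) for e in E)

# Counting measure is the special case f ≡ 1.
f_count = lambda e: 1
assert weight_measure(f_count, {1, 2, 3}) == 3

# A geometric weight f(n) = 2^{-n}; on {1, 2, 3, 4} this gives 15/16.
f_geo = lambda n: Fraction(1, 2 ** n)
assert weight_measure(f_geo, range(1, 5)) == Fraction(15, 16)

# Finite additivity on disjoint sets, as a measure should satisfy.
assert (weight_measure(f_geo, {1, 2}) + weight_measure(f_geo, {3, 4})
        == weight_measure(f_geo, {1, 2, 3, 4}))
```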
Proposition 1.2. Define 𝑚(𝐸) = ∑𝑛𝑖=1 |𝐵𝑖| if 𝐸 is any elementary set, written as a disjoint union of boxes 𝐵1, …, 𝐵𝑛.
Proof. Take 𝐴𝑛 ⊆ 𝐸 such that 𝑚(𝐴𝑛 ) ↑ sup and 𝐵𝑛 ⊇ 𝐸 such that 𝑚(𝐵𝑛 ) ↓ inf.
Note that
Remark. wlog in these definitions we may assume that boxes are open.
Lemma 1.5.
1. 𝑚∗ is monotone: if 𝐸 ⊆ 𝐹 then 𝑚∗(𝐸) ≤ 𝑚∗(𝐹).
2. 𝑚∗ is countably subadditive: if 𝐸 = ⋃𝑛≥1 𝐸𝑛 where 𝐸𝑛 ⊆ R𝑑 then
𝑚∗(𝐸) ≤ ∑𝑛≥1 𝑚∗(𝐸𝑛).
Proof. Monotonicity is obvious. For countable subadditivity, pick 𝜀 > 0 and let
𝐶𝑛 = ⋃𝑖≥1 𝐶𝑛,𝑖 where 𝐶𝑛,𝑖 are boxes such that 𝐸𝑛 ⊆ 𝐶𝑛 and
∑𝑖≥1 |𝐶𝑛,𝑖| ≤ 𝑚∗(𝐸𝑛) + 𝜀/2𝑛.
Then
∑𝑛≥1 ∑𝑖≥1 |𝐶𝑛,𝑖| ≤ ∑𝑛≥1 (𝑚∗(𝐸𝑛) + 𝜀/2𝑛) = 𝜀 + ∑𝑛≥1 𝑚∗(𝐸𝑛)
so
𝑚∗(𝐸) ≤ 𝜀 + ∑𝑛≥1 𝑚∗(𝐸𝑛)
and the claim follows as 𝜀 > 0 was arbitrary.
𝑚∗ (𝐴 ∪ 𝐵) = 𝑚∗ (𝐴) + 𝑚∗ (𝐵).
∑𝑛≥1 |𝐵𝑛| ≤ 𝑚∗(𝐴 ∪ 𝐵) + 𝜀.
wlog we may assume that the side lengths of each 𝐵𝑛 are < 𝛼/2, where 𝛼 = 𝑑(𝐴, 𝐵) > 0, where the inequality 𝛼 > 0 comes from the fact that 𝐴 and 𝐵 are compact (thus closed) and disjoint. wlog we may discard the 𝐵𝑛's that do not intersect 𝐴 ∪ 𝐵. Then by construction each 𝐵𝑛 meets at most one of 𝐴 and 𝐵, so
𝜀 + 𝑚∗ (𝐴 ∪ 𝐵) ≥ 𝑚∗ (𝐴) + 𝑚∗ (𝐵)
for all 𝜀.
Proof. For all 𝜀 > 0, there exist 𝐶 = ⋃𝑛≥1 𝐵𝑛 where 𝐵𝑛 are boxes such that
𝐸 ⊆ 𝐶 and ∑𝑛≥1 |𝐵𝑛 | ≤ 𝜀. But
𝑚∗ (𝐶 \ 𝐸) ≤ 𝑚∗ (𝐶) ≤ 𝜀.
𝑚∗(𝐶 \ 𝐸) ≤ ∑𝑛≥1 𝑚∗(𝐶𝑛 \ 𝐸𝑛) ≤ ∑𝑛≥1 𝜀/2𝑛 = 𝜀
by countable subadditivity so 𝐸 ∈ ℒ.
To show it is stable under complementation, suppose 𝐸 ∈ ℒ. By assumption
there exists 𝐶𝑛, a countable union of boxes, with 𝐸 ⊆ 𝐶𝑛 and 𝑚∗(𝐶𝑛 \ 𝐸) ≤ 1/𝑛.
wlog we may assume the boxes are open so 𝐶𝑛 is open, 𝐶𝑛𝑐 is closed so 𝐶𝑛𝑐 ∈ ℒ.
Thus ⋃𝑛≥1 𝐶𝑛𝑐 ∈ ℒ by first part of the proof.
But
𝑚∗(𝐸𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐) ≤ 𝑚∗(𝐸𝑐 \ 𝐶𝑛𝑐) = 𝑚∗(𝐶𝑛 \ 𝐸) ≤ 1/𝑛
so 𝑚∗ (𝐸 𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐 ) = 0 so 𝐸 𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐 ∈ ℒ since it is a null set. But
𝐸𝑐 = (𝐸𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐) ∪ ⋃𝑛≥1 𝐶𝑛𝑐,
so 𝐸𝑐 ∈ ℒ.
∑𝑖≥1 |𝑂𝑘,𝑖| ≤ 𝑚∗(𝐹) + 1/2𝑘.
so
𝑚∗(𝐹) + 𝑚∗(𝑂𝑖 \ 𝑂𝑖+1) ≤ 𝑚∗(𝑂𝑖) ≤ ∑𝑗≥1 |𝑂𝑖,𝑗| ≤ 𝑚∗(𝐹) + 1/2𝑖
so 𝑚∗(𝑂𝑖 \ 𝑂𝑖+1) ≤ 1/2𝑖.
Finally,
𝑚∗(𝑂𝑘 \ 𝐹) = 𝑚∗(⋃𝑖≥𝑘 (𝑂𝑖 \ 𝑂𝑖+1)) ≤ ∑𝑖≥𝑘 𝑚∗(𝑂𝑖 \ 𝑂𝑖+1) ≤ ∑𝑖≥𝑘 1/2𝑖 = 1/2𝑘−1.
𝑚∗(⋃𝑛≥1 𝐸𝑛) = ∑𝑛≥1 𝑚∗(𝐸𝑛).
Lemma 1.10. If 𝐸 ∈ ℒ then for all 𝜀 > 0 there exists 𝑈 open, 𝐹 closed,
𝐹 ⊆ 𝐸 ⊆ 𝑈 such that 𝑚∗ (𝑈 \ 𝐸) < 𝜀 and 𝑚∗ (𝐸 \ 𝐹 ) < 𝜀.
In particular
∑𝑁𝑛=1 𝑚∗(𝐸𝑛) ≤ 𝑚∗(⋃𝑛≥1 𝐸𝑛)
𝑚∗(⋃𝑛≥1 𝐾𝑛) = ∑𝑛≥1 𝑚∗(𝐾𝑛)
then
∑𝑛≥1 𝑚∗(𝐸𝑛) ≤ ∑𝑛≥1 (𝑚∗(𝐾𝑛) + 𝑚∗(𝐸𝑛 \ 𝐾𝑛))
≤ 𝑚∗(⋃𝑛≥1 𝐾𝑛) + ∑𝑛≥1 𝜀/2𝑛
≤ 𝑚∗(⋃𝑛≥1 𝐸𝑛) + 𝜀
Proof. Pick distinct rationals 𝑝1 , … , 𝑝𝑁 in [0, 1]. The sets 𝑝𝑖 + 𝐸 are pairwise
disjoint so if 𝑚∗ were additive then we would have
𝑚∗(⋃𝑁𝑖=1 (𝑝𝑖 + 𝐸)) = ∑𝑁𝑖=1 𝑚∗(𝑝𝑖 + 𝐸) = 𝑁𝑚∗(𝐸)
[0, 1] ⊆ ⋃𝑞∈Q (𝐸 + 𝑞) = R,
by countable subadditivity of 𝑚∗ ,
1 = 𝑚∗([0, 1]) ≤ ∑𝑞∈Q 𝑚∗(𝐸 + 𝑞) = 0.
Absurd.
In particular 𝐸 ∉ ℒ as 𝑚∗ is additive on ℒ.
2 Abstract measure theory
𝜇(⋃𝑛≥1 𝐴𝑛) = ∑𝑛≥1 𝜇(𝐴𝑛).
Example.
1. (R𝑑 , ℒ, 𝑚) is a measure space.
𝐸1 ⊆ 𝐸2 ⊆ ⋯ ⊆ 𝐸𝑛 ⊆ …
then
𝜇(⋃𝑛≥1 𝐸𝑛) = lim𝑛→∞ 𝜇(𝐸𝑛) = sup𝑛≥1 𝜇(𝐸𝑛).
𝐸1 ⊇ 𝐸2 ⊇ ⋯ ⊇ 𝐸𝑛 ⊇ …
Proof.
1. 𝜇(𝐵) = 𝜇(𝐴) + 𝜇(𝐵 \ 𝐴) ≥ 𝜇(𝐴) by additivity of 𝜇, since 𝜇(𝐵 \ 𝐴) ≥ 0.
2. See example sheet. The idea is that every countable union ⋃𝑛≥1 𝐴𝑛 is a
disjoint countable union ⋃𝑛≥1 𝐵𝑛 where for each 𝑛, 𝐵𝑛 ⊆ 𝐴𝑛 . It then
follows by 𝜎-additivity.
3. Let 𝐸0 = ∅ so
⋃𝑛≥1 𝐸𝑛 = ⋃𝑛≥1 (𝐸𝑛 \ 𝐸𝑛−1),
a disjoint union, so
𝜇(⋃𝑛≥1 𝐸𝑛) = ∑𝑛≥1 𝜇(𝐸𝑛 \ 𝐸𝑛−1) = lim𝑛→∞ 𝜇(𝐸𝑛).
Remark. Note the 𝜇(𝐸1 ) < ∞ condition in the last part. Counterexample:
𝐸𝑛 = [𝑛, ∞) ⊆ R.
Proof. We’ve shown that ℒ is a 𝜎-algebra and contains all open sets so ℬ(𝑋) ⊆
ℒ. Given 𝐴 ∈ ℒ, 𝐴𝑐 ∈ ℒ so for all 𝑛 ≥ 1 there exists 𝐶𝑛, a countable union of (open) boxes, such that 𝐴𝑐 ⊆ 𝐶𝑛 and 𝑚∗(𝐶𝑛 \ 𝐴𝑐) ≤ 1/𝑛. Take 𝐶 = ⋂𝑛≥1 𝐶𝑛 ∈ ℬ(𝑋). Thus 𝐵 ∶= 𝐶𝑐 ∈ ℬ(𝑋) and 𝑚(𝐴 \ 𝐵) = 0 because 𝐴 \ 𝐵 = 𝐶 \ 𝐴𝑐 ⊆ 𝐶𝑛 \ 𝐴𝑐 for all 𝑛.
Remark.
1. It can be shown that ℬ(R𝑑 ) ⊊ ℒ. In fact |ℒ| ≥ 2𝔠 and |ℬ(R𝑑 )| = 𝔠.
2. If ℱ is a family of subsets of a set 𝑋, the Boolean algebra generated by
ℱ can be explicitly described as
𝐴𝑐 ∩ 𝐵 = (𝐵𝑐 ∪ (𝐴 ∩ 𝐵))𝑐
as a disjoint union so is in ℳ.
As ℳ′ ⊇ ℱ, by minimality of ℳ, have ℳ = ℳ′ . Now let
𝜇([0, 1]𝑑 ) = 1.
Proof. Exercise. Hint: use the 𝜋-system ℱ made of all boxes in R𝑑 and dissect a cube into dyadic pieces. Then approximate and use monotone convergence of measures.
Remark.
1. There is no countably additive translation invariant measure on R defined
on all subsets of R. (c.f. Vitali’s counterexample).
∑𝑛≥𝑁 𝜇(𝐼𝑛 \ 𝐼𝑛+1) → 0
𝜇(𝐼𝑛) ≤ 𝜇∗(𝐼𝑛 \ 𝐸) + 𝜇∗(𝐸),
where 𝜇∗(𝐼𝑛 \ 𝐸) → 0 and 𝜇∗(𝐸) ≤ 𝜇(𝐼𝑛),
so
lim𝑛→∞ 𝜇(𝐼𝑛) = 𝜇∗(𝐸).
Now for the actual lemma, let 𝐸 = ⋂𝑛≥1 𝐼𝑛 where 𝐼𝑛 ∈ ℬ. wlog we may
assume 𝐼𝑛+1 ⊆ 𝐼𝑛 . By 𝜎-finiteness assumption, 𝑋 = ⋃𝑚≥1 𝑋𝑚 where
𝑋𝑚 ∈ ℬ with 𝜇(𝑋𝑚 ) < ∞ so
𝐸 = ⋃𝑚≥1 (𝐸 ∩ 𝑋𝑚).
From the lemma we can derive that ℬ∗ is also stable under complementation: given 𝐸 ∈ ℬ∗, for all 𝑛 there exist 𝐶𝑛 = ⋃𝑖≥1 𝐵𝑛,𝑖 where 𝐵𝑛,𝑖 ∈ ℬ such that 𝐸 ⊆ 𝐶𝑛 and 𝜇∗(𝐶𝑛 \ 𝐸) ≤ 1/𝑛.
Now
𝐸𝑐 = (⋃𝑛≥1 𝐶𝑛𝑐) ∪ (𝐸𝑐 \ ⋃𝑛≥1 𝐶𝑛𝑐)
𝐸 = ⋃𝑛≥1 ⋃𝑚≥1 (𝐸𝑛 ∩ 𝑋̃𝑚)
𝜇∗ (𝐸) ≤ 𝜇∗ (𝐶) + 𝜀
𝜇∗ (𝐹 ) ≤ 𝜇∗ (𝐷) + 𝜀
𝜇∗ (𝐸) + 𝜇∗ (𝐹 ) ≤ 2𝜀 + 𝜇∗ (𝐶 ∪ 𝐷) ≤ 2𝜀 + 𝜇∗ (𝐸 ∪ 𝐹 ).
so
But
⋂𝑛≥1 (𝐼𝑛 ∩ 𝐽𝑛) = 𝐸 ∩ 𝐹 = ∅
⋂𝑛≥1 (𝐼𝑛 ∪ 𝐽𝑛) = 𝐸 ∪ 𝐹
so by claim 3
lim𝑛→∞ 𝜇(𝐼𝑛 ∩ 𝐽𝑛) = 0
lim𝑛→∞ 𝜇(𝐼𝑛 ∪ 𝐽𝑛) = 𝜇∗(𝐸 ∪ 𝐹)
3 Integration and measurable functions
{𝑥 ∈ 𝑋 ∶ 𝑓(𝑥) < 𝑡} ∈ 𝒜.
Proposition 3.1.
Proof.
1. Obvious.
2. Follows from 1 once it's shown that + ∶ R2 → R and × ∶ R2 → R are
measurable (with respect to Borel sets). The sets
{(𝑥, 𝑦) ∶ 𝑥 + 𝑦 < 𝑡}
{(𝑥, 𝑦) ∶ 𝑥𝑦 < 𝑡}
and similar for sup. Similarly lim sup𝑛 𝑓𝑛 (𝑥) < 𝑡 if and only if
𝑥 ∈ ⋃𝑚≥1 ⋃𝑘≥1 ⋂𝑛≥𝑘 {𝑥 ∶ 𝑓𝑛(𝑥) < 𝑡 − 1/𝑚}.
then
∑𝑛𝑖=1 𝑎𝑖𝜇(𝐴𝑖) = ∑𝑠𝑗=1 𝑏𝑗𝜇(𝐵𝑗)
Remark.
1. The lemma says that the integral is well-defined.
2. We also use the notation ∫𝑋 𝑓𝑑𝜇 to denote 𝜇(𝑓).
Remark. This is consistent with the definition for 𝑓 simple, due to positivity.
Note.
|𝑓| = 𝑓 + + 𝑓 −
𝑓 = 𝑓+ − 𝑓−
0 ≤ 𝑓1 ≤ 𝑓2 ≤ ⋯ ≤ 𝑓𝑛 ≤ ⋯
𝑚𝑔 ∶ 𝒜 → [0, ∞]
𝐸 ↦ 𝜇(1𝐸 𝑔)
by definition of integral so
𝐸𝑛 = {𝑥 ∈ 𝑋 ∶ 𝑓𝑛 (𝑥) ≥ (1 − 𝜀)𝑔(𝑥)}.
But
(1 − 𝜀)𝑚𝑔(𝐸𝑛) = 𝜇((1 − 𝜀)𝑔1𝐸𝑛) ≤ 𝜇(𝑓𝑛)
because (1 − 𝜀)𝑔1𝐸𝑛 is a simple function smaller than 𝑓𝑛 . Taking limit,
0 ≤ 𝑔𝑛 ≤ 𝑔𝑛+1 ≤ 𝑓
Remark. We may not have equality: let 𝑓𝑛 = 1[𝑛,𝑛+1] on (R, ℒ, 𝑚). Then
𝜇(𝑓𝑛 ) = 1 but lim𝑛→∞ 𝑓𝑛 = 0.
i.e.
𝜇(𝑓) ≤ lim inf 𝜇(𝑓𝑛 ).
𝑛→∞
so
𝜇(𝑓) = lim 𝜇(𝑓𝑛 ).
𝑛→∞
𝜇(∑𝑛≥1 𝑓𝑛) = ∑𝑛≥1 𝜇(𝑓𝑛).
Proof.
1. Let 𝑔𝑁 = ∑𝑁𝑛=1 𝑓𝑛, then 𝑔𝑁 ↑ ∑𝑛≥1 𝑓𝑛 as 𝑁 → ∞ so the result follows from the monotone convergence theorem.
2. Let 𝑔 = ∑𝑛≥1 |𝑓𝑛 | and 𝑔𝑁 as above. Then |𝑔𝑁 | ≤ 𝑔 for all 𝑁 so the
domination assumption holds. The result thus follows from dominated
convergence theorem.
𝐹′(𝑡) = ∫𝑋 (𝜕𝑓/𝜕𝑡)(𝑡, 𝑥) 𝑑𝜇.
𝑔𝑛(𝑡, 𝑥) = (𝜕𝑓/𝜕𝑡)(𝜃, 𝑥)
so
|𝑔𝑛 (𝑡, 𝑥)| ≤ 𝑔(𝑥)
by domination assumption. Now apply dominated convergence theorem.
Remark.
𝑚(𝑓 ∘ 𝑔) = (1/|det 𝑔|) 𝑚(𝑓).
𝐸 = {𝑥 ∈ 𝑋 ∶ assumptions hold at 𝑥}
4 Product measures
Remark.
1. By analogy with the notion of product topology, 𝒜 ⊗ ℬ is the smallest
𝜎-algebra of subsets of 𝑋 × 𝑌 making the two projection maps measurable.
2. ℬ(R𝑑1 ) ⊗ ℬ(R𝑑2 ) = ℬ(R𝑑1 +𝑑2 ). See example sheet. However this is not so
for ℒ(R𝑑 ).
Proof. Let
ℰ = {𝐸 ⊆ 𝑋 × 𝑌 ∶ 𝐸𝑥 ∈ ℬ for all 𝑥 ∈ 𝑋}.
Note that ℰ contains all product sets 𝐴 × 𝐵 where 𝐴 ∈ 𝒜, 𝐵 ∈ ℬ. ℰ is a 𝜎-
algebra: if 𝐸 ∈ ℰ then 𝐸 𝑐 ∈ ℰ and if 𝐸𝑛 ∈ ℰ then ⋃ 𝐸𝑛 ∈ ℰ since (𝐸 𝑐 )𝑥 = (𝐸𝑥 )𝑐
and (⋃ 𝐸𝑛 )𝑥 = ⋃(𝐸𝑛 )𝑥 .
Proof.
1. In case 𝑓 = 1𝐸 for 𝐸 ∈ 𝒜 ⊗ ℬ the function 𝑦 ↦ 𝑓(𝑥, 𝑦) is just 𝑦 ↦ 1𝐸𝑥 (𝑦),
which is measurable by the previous lemma.
More generally, the result is true for simple functions and thus for all
measurable functions by taking pointwise limit.
2. By the same reduction we may assume 𝑓 = 1𝐸 for some 𝐸 ∈ 𝒜 ⊗ ℬ. Now
let 𝑌 = ⋃𝑚≥1 𝑌𝑚 with 𝜈(𝑌𝑚 ) < ∞. Let
𝜈(𝐸𝑥 ∩ 𝑌𝑚) = ∑𝑛≥1 𝜈((𝐸𝑛)𝑥 ∩ 𝑌𝑚)
which is 𝒜-measurable.
The product sets form a 𝜋-system which generates the product 𝜎-algebra, so by the Dynkin lemma ℰ = 𝒜 ⊗ ℬ.
(𝜇 ⊗ 𝜈)(𝐴 × 𝐵) = 𝜇(𝐴)𝜈(𝐵).
by a corollary of MCT.
∑𝑛≥1 𝑓(𝑛, 𝑚) = 0
∑𝑚≥1 𝑓(𝑛, 𝑚) = { 1 if 𝑛 = 1; 0 if 𝑛 ≥ 2 }
so
∑𝑛≥1 ∑𝑚≥1 𝑓(𝑛, 𝑚) ≠ ∑𝑚≥1 ∑𝑛≥1 𝑓(𝑛, 𝑚).
Proof.
Note.
1. The Lebesgue measure 𝑚𝑑 on R𝑑 is equal to 𝑚1 ⊗ ⋯ ⊗ 𝑚1, because this is true on boxes and extends by uniqueness of measure.
5 Foundations of probability theory
Ω→R
𝜔 ↦ 𝑋(𝜔)
𝐹𝑋 ∶ R → [0, 1]
𝑡 ↦ P(𝑋 ≤ 𝑡)
Proof. Given 𝑡𝑛 ↓ 𝑡,
lim𝑡→−∞ 𝐹(𝑡) = 0
lim𝑡→+∞ 𝐹(𝑡) = 1
for all 𝑡 ∈ R.
𝑔 ∶ (0, 1) → R
𝑦 ↦ inf{𝑥 ∈ R ∶ 𝐹 (𝑥) ≥ 𝑦}
Proof. Let
𝐼𝑦 = {𝑥 ∈ R ∶ 𝐹 (𝑥) ≥ 𝑦}.
Clearly if 𝑦1 ≥ 𝑦2 then 𝐼𝑦1 ⊆ 𝐼𝑦2 so 𝑔(𝑦2 ) ≤ 𝑔(𝑦1 ) so 𝑔 is non-decreasing. 𝐼𝑦 is
an interval of R because if 𝑥 > 𝑥1 and 𝑥1 ∈ 𝐼𝑦 then 𝐹 (𝑥) ≥ 𝐹 (𝑥1 ) ≥ 𝑦 so 𝑥 ∈ 𝐼𝑦 .
So 𝐼𝑦 is an interval with endpoints 𝑔(𝑦) and +∞. But 𝐹 is right-continuous so
𝑔(𝑦) = min 𝐼𝑦 and the minimum is obtained. Thus 𝐼𝑦 = [𝑔(𝑦), +∞).
This means that 𝑥 ≥ 𝑔(𝑦) if and only if 𝑥 ∈ 𝐼𝑦 if and only if 𝐹 (𝑥) ≥ 𝑦.
Finally for left-continuity, suppose 𝑦𝑛 ↑ 𝑦 then ⋂𝑛≥1 𝐼𝑦𝑛 = 𝐼𝑦 by definition
of 𝐼𝑦 so 𝑔(𝑦𝑛 ) → 𝑔(𝑦).
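The lemma underlies inverse transform sampling: if 𝑈 is uniform on (0, 1) then 𝑔(𝑈) has distribution function 𝐹. A Python sketch (illustrative only; the exponential distribution and tolerances are arbitrary choices), where 𝑔 has a closed form:

```python
import math, random

random.seed(0)

# For F(x) = 1 - exp(-x) (the Exp(1) distribution function), the lemma's
# g(y) = inf{x : F(x) >= y} has the closed form g(y) = -log(1 - y).
g = lambda y: -math.log(1.0 - y)

n = 100_000
samples = [g(random.random()) for _ in range(n)]  # g(U) with U ~ Uniform(0,1)

# The empirical distribution function of g(U) should be close to F.
for t in (0.5, 1.0, 2.0):
    emp = sum(s <= t for s in samples) / n
    assert abs(emp - (1 - math.exp(-t))) < 0.01
```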
where 𝜇𝑋 = 𝑋∗P by definition.
6 Independence
Independence is the key notion that makes probability theory distinct from
(abstract) measure theory.
Remark.
1. To prove that (𝒜𝑛 )𝑛≥1 is an independent family, it is enough to check
the independence condition for all 𝐴𝑛 ’s with 𝐴𝑛 ∈ Π𝑛 where Π𝑛 is a 𝜋-
system generating 𝒜𝑛 . The proof is an application of Dynkin lemma. For
example for 𝜎-algebras 𝒜1, 𝒜2, it suffices to check that the two measures
𝐴1 ↦ P(𝐴1 ∩ 𝐴2)
𝐴1 ↦ P(𝐴1)P(𝐴2)
agree on a 𝜋-system generating 𝒜1.
as Borel probability measures on R𝑛. "The joint law is the same as the product of individual laws".
P(𝑍 = 0) = P(𝑍 = 1) = 1/2
and each pair (𝑋, 𝑌 ), (𝑋, 𝑍) and (𝑌 , 𝑍) is independent. But (𝑋, 𝑌 , 𝑍) is not
independent.
Example. Let Ω = (0, 1), ℱ = ℬ(0, 1), P = 𝑚 the Lebesgue measure. Write
the decimal expansion of 𝜔 ∈ (0, 1) as
𝜔 = 0.𝜀1 𝜀2 …
where 𝜀𝑖 (𝜔) ∈ {0, … , 9}. Choose a convention so that each 𝜔 has a well-defined
expansion (to avoid things like 0.099 ⋯ = 0.100 …). Now let 𝑋𝑛 (𝜔) = 𝜀𝑛 (𝜔).
Claim that the (𝑋𝑛)𝑛≥1 are iid. random variables uniformly distributed on {0, …, 9}, where "iid." stands for independent and identically distributed.
Proof. Easy check. For example 𝑋1 (𝜔) = ⌊10𝜔⌋ so
P(𝑋1 = 𝑖1) = 1/10.
Similarly for all 𝑛,
P(𝑋1 = 𝑖1, …, 𝑋𝑛 = 𝑖𝑛) = 1/10𝑛, P(𝑋𝑛 = 𝑖𝑛) = 1/10
so
P(𝑋1 = 𝑖1, …, 𝑋𝑛 = 𝑖𝑛) = ∏𝑛𝑘=1 P(𝑋𝑘 = 𝑖𝑘).
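The claim can also be checked empirically: the first two decimal digits of a uniform 𝜔 behave like independent uniform digits. A Python sketch (sample size and tolerance are arbitrary choices):

```python
import random
from collections import Counter

random.seed(1)
n = 200_000

# First two decimal digits of omega ~ Uniform(0, 1).
pairs = Counter()
for _ in range(n):
    w = random.random()
    pairs[(int(10 * w) % 10, int(100 * w) % 10)] += 1

# All 100 digit pairs occur, each with frequency close to 1/100,
# matching P(X1 = i1, X2 = i2) = 1/100 = P(X1 = i1) P(X2 = i2).
assert len(pairs) == 100
assert all(abs(c / n - 0.01) < 0.005 for c in pairs.values())
```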
Remark.
𝜔 = ∑𝑛≥1 𝑋𝑛(𝜔)/10𝑛
is distributed according to Lebesgue measure so if we want we can construct
Lebesgue measure as the law of this random variable.
𝐴 × ∏𝑖>𝑛 Ω𝑖
for some 𝐴 ∈ ⨂𝑛𝑖=1 ℱ𝑖 . Set ℱ = 𝜎(ℰ), the infinite product 𝜎-algebra. Then
there is a unique probability measure 𝜇 on (Ω, ℱ) such that it agrees with
product measures on all cylinder sets, i.e.
𝜇(𝐴 × ∏𝑖>𝑛 Ω𝑖) = (⨂𝑛𝑖=1 𝜇𝑖)(𝐴)
Lemma 6.3 (Borel-Cantelli). Let (Ω, ℱ, P) be a probability space and (𝐴𝑛 )𝑛≥1
a sequence of events.
1. If ∑𝑛≥1 P(𝐴𝑛) < ∞ then
P(lim sup𝑛 𝐴𝑛) = 0.
2. If the events (𝐴𝑛)𝑛≥1 are independent and ∑𝑛≥1 P(𝐴𝑛) = ∞ then
P(lim sup𝑛 𝐴𝑛) = 1.
Note that lim sup𝑛 𝐴𝑛 is also called 𝐴𝑛 io. meaning “infinitely often”.
Proof.
1. Let 𝑌 = ∑𝑛≥1 1𝐴𝑛 be a random variable. Then E(𝑌) = ∑𝑛≥1 P(𝐴𝑛) < ∞ by monotone convergence. Since 𝑌 ≥ 0, recall that we proved that E(𝑌) < ∞ implies that 𝑌 < ∞ almost surely, i.e. P-almost everywhere.
2. Note that
(lim sup𝑛 𝐴𝑛)𝑐 = ⋃𝑁 ⋂𝑛≥𝑁 𝐴𝑐𝑛
so
P(⋂𝑛≥𝑁 𝐴𝑐𝑛) ≤ P(⋂𝑀𝑛=𝑁 𝐴𝑐𝑛)
= ∏𝑀𝑛=𝑁 P(𝐴𝑐𝑛) = ∏𝑀𝑛=𝑁 (1 − P(𝐴𝑛))
≤ ∏𝑀𝑛=𝑁 exp(−P(𝐴𝑛))
= exp(−∑𝑀𝑛=𝑁 P(𝐴𝑛))
→ 0
as 𝑀 → ∞. Thus
P(⋂𝑛≥𝑁 𝐴𝑐𝑛) = 0
for all 𝑁 so
P(⋃𝑁 ⋂𝑛≥𝑁 𝐴𝑐𝑛) = 0.
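Both parts of Borel-Cantelli can be illustrated numerically with independent events (a sketch only; the probabilities 1/𝑛2 and 1/2 and the thresholds are arbitrary choices):

```python
import random

random.seed(2)

def occurrences(p_fun, N):
    """Indices n <= N at which independent events A_n with P(A_n) = p_fun(n) occur."""
    return [n for n in range(1, N + 1) if random.random() < p_fun(n)]

# Summable case (part 1): P(A_n) = 1/n^2 sums to pi^2/6, so only finitely
# many A_n occur almost surely -- the total count stays tiny even for large N.
hits_sum = occurrences(lambda n: 1 / n ** 2, 100_000)
assert len(hits_sum) < 20

# Divergent independent case (part 2): P(A_n) = 1/2 has divergent sum, so A_n
# occurs infinitely often almost surely -- a positive fraction of indices are hit.
hits_div = occurrences(lambda n: 0.5, 1000)
assert len(hits_div) > 400
```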
is called the tail 𝜎-algebra of the process. Its elements are called tail
events.
Example. Tail events are those not affected by the first few terms in the se-
quence of random variables. For example,
is a tail event, so is
{𝜔 ∈ Ω ∶ lim sup𝑛 𝑋𝑛(𝜔) ≥ 𝑇}.
P(𝐴 ∩ 𝐵) = P(𝐴)P(𝐵)
so
P(𝐴) ∈ {0, 1}.
Proof.
E(𝑋) ≥ E(𝑋1𝑋≥𝑡 ) ≥ E(𝑡1𝑋≥𝑡 ) = 𝑡P(𝑋 ≥ 𝑡)
Theorem 6.8 (strong law of large numbers). Let (𝑋𝑛 )𝑛≥1 be a sequence of
iid. random variables. Assume E(|𝑋1 |) < ∞. Let
𝑆𝑛 = ∑𝑛𝑘=1 𝑋𝑘,
then (1/𝑛)𝑆𝑛 converges almost surely to E(𝑋1).
Proof. We prove the theorem under a stronger condition: we assume E(𝑋14) < ∞. This implies, by Cauchy-Schwarz, E(𝑋12), E(|𝑋1|) < ∞, and subsequently E(|𝑋1|3) < ∞. The full proof is much harder but will be given later when we have developed enough machinery.
wlog we may assume E(𝑋1 ) = 0 by replacing 𝑋𝑛 with 𝑋𝑛 − E(𝑋1 ). Have
E(𝑆𝑛4) = ∑𝑖,𝑗,𝑘,ℓ E(𝑋𝑖𝑋𝑗𝑋𝑘𝑋ℓ).
All terms vanish because E(𝑋𝑖) = 0 and (𝑋𝑖)𝑖≥1 are independent, except for E(𝑋𝑖4) and E(𝑋𝑖2𝑋𝑗2) for 𝑖 ≠ 𝑗. For example,
E(𝑋𝑖3𝑋𝑗) = E(𝑋𝑖3)E(𝑋𝑗) = 0
for 𝑖 ≠ 𝑗. Thus
E(𝑆𝑛4) = ∑𝑛𝑖=1 E(𝑋𝑖4) + 6 ∑𝑖<𝑗 E(𝑋𝑖2𝑋𝑗2).
By Cauchy-Schwarz, E(𝑋𝑖2𝑋𝑗2) ≤ E(𝑋14), so
E(𝑆𝑛4) ≤ (𝑛 + 6 ⋅ 𝑛(𝑛 − 1)/2) E(𝑋14)
and asymptotically,
E((𝑆𝑛/𝑛)4) = 𝑂(1/𝑛2)
so
E(∑𝑛≥1 (𝑆𝑛/𝑛)4) = ∑𝑛≥1 E((𝑆𝑛/𝑛)4) < ∞.
Hence ∑𝑛≥1 (𝑆𝑛/𝑛)4 < ∞ almost surely, so
lim𝑛→∞ 𝑆𝑛/𝑛 = 0
almost surely.
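The conclusion 𝑆𝑛/𝑛 → E(𝑋1) is easy to observe numerically; a sketch with Uniform(0, 1) variables, so E(𝑋1) = 1/2 (sample size and tolerance are arbitrary choices):

```python
import random

random.seed(3)

# S_n / n for iid Uniform(0, 1) variables: the strong law gives
# almost sure convergence to E(X_1) = 1/2.
n = 200_000
s = 0.0
for _ in range(n):
    s += random.random()
assert abs(s / n - 0.5) < 0.005
```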
7 Convergence of random variables
Example.
1. Let 𝜇𝑛 = 𝛿1/𝑛 be the Dirac mass on R𝑑 , i.e. for 𝑥 ∈ R𝑑 , 𝛿𝑥 is the Borel
probability measure on R𝑑 such that
𝛿𝑥(𝐴) = { 1 if 𝑥 ∈ 𝐴; 0 if 𝑥 ∉ 𝐴 }
then 𝜇𝑛 → 𝛿0 .
= ∫ 𝑓(𝑥) (1/√(2𝜋𝜎𝑛2)) exp(−𝑥2/(2𝜎𝑛2)) 𝑑𝑥
= ∫ 𝑓(𝑥𝜎𝑛) (1/√(2𝜋)) exp(−𝑥2/2) 𝑑𝑥
Proposition 7.1. 1 ⟹ 2 ⟹ 3.
Proof.
1. 1 ⟹ 2:
P(‖𝑋𝑛 − 𝑋‖ > 𝜀) = E(1‖𝑋𝑛 −𝑋‖>𝜀 )
so if 𝑋𝑛 → 𝑋 almost surely then
1‖𝑋𝑛 −𝑋‖>𝜀 → 0
so
lim sup𝑛→∞ |E(𝑓(𝑋𝑛) − 𝑓(𝑋))| ≤ 𝜀 + 2‖𝑓‖∞ P(‖𝑋‖ > 1/𝜀)
which is 0 as 𝜀 is arbitrary (the second term → 0 as 𝜀 → 0).
Remark.
1. If 𝑋𝑛 → 𝑋 in mean then 𝑋𝑛 → 𝑋 in probability by Markov inequality:
𝜀 ⋅ P(‖𝑋𝑛 − 𝑋‖ > 𝜀) ≤ E(‖𝑋𝑛 − 𝑋‖).
2. The converse is false. For example take Ω = (0, 1), ℱ = ℬ((0, 1)) and P the Lebesgue measure. Let 𝑋𝑛 = 𝑛1(0,1/𝑛]. Then 𝑋𝑛 → 0 almost surely but E(𝑋𝑛) = 1.
Remark. If (𝑋𝑛 )𝑛≥1 are dominated, namely exists an integrable random vari-
able 𝑌 ≥ 0 such that ‖𝑋𝑛 ‖ ≤ 𝑌 for all 𝑛 then (𝑋𝑛 )𝑛 is uniformly integrable by
dominated convergence theorem:
E(‖𝑋𝑛 ‖1‖𝑋𝑛 ‖≥𝑀 ) ≤ E(𝑌1𝑌 ≥𝑀 ) → 0
as 𝑀 → ∞.
Proof.
• 1 ⟹ 2: Left to show uniform integrability:
E(‖𝑋𝑛‖1‖𝑋𝑛‖≥𝑀) ≤ E(‖𝑋𝑛 − 𝑋‖1‖𝑋𝑛‖≥𝑀) + E(‖𝑋‖1‖𝑋𝑛‖≥𝑀)
≤ E(‖𝑋𝑛 − 𝑋‖) + E(‖𝑋‖1‖𝑋𝑛‖≥𝑀(1‖𝑋‖≤𝑀/2 + 1‖𝑋‖>𝑀/2))
≤ E(‖𝑋𝑛 − 𝑋‖) + (𝑀/2)P(‖𝑋𝑛 − 𝑋‖ ≥ 𝑀/2) + E(‖𝑋‖1‖𝑋‖≥𝑀/2)
where
so
lim sup𝑛→∞ † ≤ 2E(‖𝑋‖1‖𝑋‖>𝑀) → 0
as 𝑀 → ∞.
Proposition 7.4. If 𝑝 > 1 and (𝑋𝑛 )𝑛≥1 is bounded in 𝐿𝑝 then (𝑋𝑛 )𝑛≥1 is
uniformly integrable.
Proof. Since ‖𝑋𝑛‖1‖𝑋𝑛‖≥𝑀 ≤ ‖𝑋𝑛‖𝑝/𝑀𝑝−1,
E(‖𝑋𝑛‖1‖𝑋𝑛‖≥𝑀) ≤ E(‖𝑋𝑛‖𝑝)/𝑀𝑝−1 ≤ 𝐶/𝑀𝑝−1
so
lim sup𝑛→∞ E(‖𝑋𝑛‖1‖𝑋𝑛‖≥𝑀) ≤ 𝐶/𝑀𝑝−1 → 0
as 𝑀 → ∞.
This provides a sufficient condition for uniform integrability and thus convergence in mean.
8 𝐿𝑝 spaces
Recall that 𝜑 ∶ 𝐼 → R is convex means that for all 𝑥, 𝑦 ∈ 𝐼 and all 𝑡 ∈ [0, 1], 𝜑(𝑡𝑥 + (1 − 𝑡)𝑦) ≤ 𝑡𝜑(𝑥) + (1 − 𝑡)𝜑(𝑦).
E(𝜑(𝑋)) ≥ 𝜑(E(𝑋)).
Remark.
1. As 𝑋 ∈ 𝐼 almost surely and 𝐼 is an interval, have E(𝑋) ∈ 𝐼.
2. We’ll show that 𝜑(𝑋)− is integrable so
Proof.
1. 2 ⟹ 1: every affine ℓ is convex and a supremum of convex functions is convex:
so
where 𝜃𝑥0 is morally the slope at 𝑥0, such that 𝜑(𝑥) ≥ ℓ𝑥0(𝑥) for all 𝑥 ∈ 𝐼. Then have 𝜑 = sup𝑥0∈𝐼 ℓ𝑥0.
To find 𝜃𝑥0 observe that for all 𝑥 < 𝑥0 < 𝑦 where 𝑥, 𝑦 ∈ 𝐼, have
Proof of Jensen inequality. Let 𝜑(𝑥) = supℓ∈ℱ ℓ(𝑥) where each ℓ ∈ ℱ is affine.
Then
‖𝑓 + 𝑔‖𝑝 ≤ ‖𝑓‖𝑝 + ‖𝑔‖𝑝 .
∥ (‖𝑓‖𝑝/(‖𝑓‖𝑝 + ‖𝑔‖𝑝)) ⋅ 𝑓/‖𝑓‖𝑝 + (‖𝑔‖𝑝/(‖𝑓‖𝑝 + ‖𝑔‖𝑝)) ⋅ 𝑔/‖𝑔‖𝑝 ∥𝑝 ≤ 1.
It suffices to show that for all 𝑡 ∈ [0, 1] and all 𝐹, 𝐺 measurable with ‖𝐹‖𝑝 = ‖𝐺‖𝑝 = 1, we have
‖𝑡𝐹 + (1 − 𝑡)𝐺‖𝑝 ≤ 1
“the unit ball is convex”. For this note that
45
8 𝐿𝑝 spaces
The function 𝑥 ↦ |𝑥|𝑝 is convex if 𝑝 ≥ 1 so
|𝑡𝐹 + (1 − 𝑡)𝐺|𝑝 ≤ 𝑡|𝐹 |𝑝 + (1 − 𝑡)|𝐺|𝑝
and
∫𝑋 |𝑡𝐹 + (1 − 𝑡)𝐺|𝑝 𝑑𝜇 ≤ 𝑡 ∫𝑋 |𝐹|𝑝 𝑑𝜇 + (1 − 𝑡) ∫𝑋 |𝐺|𝑝 𝑑𝜇 = 𝑡 + (1 − 𝑡) = 1.
with equality if and only if there exists (𝛼, 𝛽) ≠ (0, 0) such that 𝛼|𝑓|𝑝 = 𝛽|𝑔|𝑞
𝜇-almost everywhere.
E(|𝑋|𝑝)1/𝑝 ≤ E(|𝑋|𝑝′)1/𝑝′ for 𝑝 ≤ 𝑝′,
so the function 𝑝 ↦ E(|𝑋|𝑝 )1/𝑝 is non-decreasing. This can be used, for example,
to show that if 𝑋 has finite 𝑝′ th moment then it has finite 𝑝th moment for 𝑝′ ≥ 𝑝.
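The monotonicity of 𝑝 ↦ E(|𝑋|𝑝)1/𝑝 holds exactly for the empirical (uniform) measure on a finite sample, by the power mean inequality, which gives a quick sanity check (the exponential sample and the exponents are arbitrary choices):

```python
import random

random.seed(4)
xs = [random.expovariate(1.0) for _ in range(50_000)]

def p_norm(xs, p):
    """E(|X|^p)^(1/p) for the empirical (uniform) measure on the sample."""
    return (sum(abs(x) ** p for x in xs) / len(xs)) ** (1 / p)

# Monotone in p exactly (power mean inequality), not just on average.
norms = [p_norm(xs, p) for p in (1, 1.5, 2, 3)]
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```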
• For 𝑝 ≥ 1,
• For 𝑝 = ∞,
where
Proof. For 𝑝 < ∞ use the Minkowski inequality. Similarly we can check that ‖𝑓 + 𝑔‖∞ ≤ ‖𝑓‖∞ + ‖𝑔‖∞ for all 𝑓, 𝑔.
Check that this is an equivalence relation stable under addition and multiplication.
‖[𝑓]‖𝑝 = ‖𝑓‖𝑝 .
then
‖𝑆𝐾‖𝑝 ≤ ∑𝐾𝑘=1 ‖𝑓𝑛𝑘+1 − 𝑓𝑛𝑘‖𝑝 ≤ ∑𝐾𝑘=1 2−𝑘 ≤ 1
so by monotone convergence,
Proof. Let 𝑡 > lim sup𝑛→∞ ‖𝑓𝑛 ‖∞ . Then exists 𝑛𝑘 ↑ ∞ such that
‖𝑓𝑛𝑘 ‖∞ = essup |𝑓𝑛𝑘 | = inf{𝑠 ≥ 0 ∶ |𝑓𝑛𝑘 (𝑥)| ≤ 𝑠 for 𝜇-almost every 𝑥} < 𝑡
for all 𝑘. Thus for all 𝑘, for 𝜇-almost every 𝑥, |𝑓𝑛𝑘(𝑥)| < 𝑡. But by 𝜎-additivity of 𝜇 we can swap the quantifiers, i.e. for 𝜇-almost every 𝑥, for all 𝑘, |𝑓𝑛𝑘(𝑥)| < 𝑡. Thus for 𝜇-almost every 𝑥, |𝑓(𝑥)| ≤ 𝑡, so ‖𝑓‖∞ ≤ 𝑡.
9 Hilbert space and 𝐿2 -methods
𝑉 ×𝑉→C
(𝑥, 𝑦) ↦ ⟨𝑥, 𝑦⟩
such that
1. ⟨𝛼𝑥 + 𝛽𝑦, 𝑧⟩ = 𝛼⟨𝑥, 𝑧⟩ + 𝛽⟨𝑦, 𝑧⟩ for all 𝛼, 𝛽 ∈ C, for all 𝑥, 𝑦, 𝑧 ∈ 𝑉.
Proof. Exercise. For reference see author’s notes on IID Linear Analysis.
0 = ⟨𝑓, 𝑓⟩ = ∫𝑋 |𝑓|2 𝑑𝜇
‖𝑥 − 𝑦‖ = 𝑑(𝑥, 𝒞)
where by definition
𝑑(𝑥, 𝒞) = inf𝑐∈𝒞 ‖𝑥 − 𝑐‖.
Proof. Let 𝑐𝑛 ∈ 𝒞 be a sequence such that ‖𝑥 − 𝑐𝑛 ‖ → 𝑑(𝑥, 𝒞). Let’s show that
(𝑐𝑛 )𝑛 is a Cauchy sequence. By parallelogram identity,
∥(𝑥 − 𝑐𝑛)/2 + (𝑥 − 𝑐𝑚)/2∥2 + ∥(𝑥 − 𝑐𝑛)/2 − (𝑥 − 𝑐𝑚)/2∥2 = (1/2)(‖𝑥 − 𝑐𝑛‖2 + ‖𝑥 − 𝑐𝑚‖2)
so
∥𝑥 − (𝑐𝑛 + 𝑐𝑚)/2∥2 + (1/4)‖𝑐𝑛 − 𝑐𝑚‖2 = (1/2)(‖𝑥 − 𝑐𝑛‖2 + ‖𝑥 − 𝑐𝑚‖2),
where the first term is ≥ 𝑑(𝑥, 𝒞)2 since (𝑐𝑛 + 𝑐𝑚)/2 ∈ 𝒞, and the right hand side → 𝑑(𝑥, 𝒞)2,
so
lim𝑛,𝑚→∞ ‖𝑐𝑛 − 𝑐𝑚‖ = 0
∥𝑥 − (𝑦 + 𝑦′)/2∥2 + (1/4)‖𝑦 − 𝑦′‖2 = (1/2)(‖𝑥 − 𝑦‖2 + ‖𝑥 − 𝑦′‖2) = 𝑑(𝑥, 𝒞)2,
where the first term is ≥ 𝑑(𝑥, 𝒞)2, so ‖𝑦 − 𝑦′‖ = 0.
Rearrange,
2 Re⟨𝑥 − 𝑦, 𝑧⟩ ≤ ‖𝑧‖2
for all 𝑧 ∈ 𝑉. Now substitute 𝑡𝑧 for 𝑧 where 𝑡 ∈ R+ , have
𝑡 ⋅ 2 Re⟨𝑥 − 𝑦, 𝑧⟩ ≤ 𝑡2‖𝑧‖2.
Dividing by 𝑡 and letting 𝑡 ↓ 0,
Re⟨𝑥 − 𝑦, 𝑧⟩ ≤ 0
Similarly replace 𝑧 by −𝑧 to conclude Re⟨𝑥 − 𝑦, 𝑧⟩ = 0. Finally replace 𝑧 by 𝑒𝑖𝜃 𝑧
to have ⟨𝑥 − 𝑦, 𝑧⟩ = 0 for all 𝑧. Thus 𝑥 − 𝑦 ∈ 𝑉 ⟂ .
ℓ(𝑥) = ⟨𝑥, 𝑣0 ⟩
for all 𝑥 ∈ 𝐻.
Proof. By boundedness of ℓ, ker ℓ is closed so write
𝐻 = ker ℓ ⊕ (ker ℓ)⟂ .
If ℓ = 0 then just pick 𝑣0 = 0. Otherwise pick 𝑥0 ∈ (ker ℓ)⟂ \ {0}. But (ker ℓ)⟂
is spanned by 𝑥0 : indeed for any 𝑥 ∈ (ker ℓ)⟂ ,
ℓ(𝑥) = (ℓ(𝑥)/ℓ(𝑥0)) ℓ(𝑥0)
so
ℓ(𝑥 − (ℓ(𝑥)/ℓ(𝑥0)) 𝑥0) = 0
so 𝑥 − (ℓ(𝑥)/ℓ(𝑥0)) 𝑥0 ∈ ker ℓ ∩ (ker ℓ)⟂ = {0}. Now let
𝑣0 = (ℓ(𝑥0)‾/‖𝑥0‖2) 𝑥0 (the bar denoting complex conjugation)
and observe that ℓ(𝑥) − ⟨𝑥, 𝑣0 ⟩ vanishes on ker ℓ and on (ker ℓ)⟂ = C𝑥0 . Thus
it is identically zero.
𝜈(Ω𝑐 ) = 0.
Example.
1. Let 𝜈 be the Lebesgue measure on (R, ℬ(R)) and 𝑑𝜇 = 𝑓𝑑𝜈 where 𝑓 ≥ 0
is a Borel function then 𝜇 ≪ 𝜈.
2. If 𝜇 = 𝛿𝑥0, the Dirac mass at 𝑥0 ∈ R, then 𝜇 ⟂ 𝜈.
Non-examinable theorem and proof:
𝜇 = 𝜇 𝑎 + 𝜇𝑠
𝑓 ↦ 𝜇(𝑓) = ∫𝑋 𝑓𝑑𝜇
ℓ is bounded by Cauchy-Schwarz and finiteness of the measures:
so must have
(𝜇 + 𝜈)({𝑔0 > 1 + 𝜀}) = 0
i.e. 𝑔0 ≤ 1 (𝜇 + 𝜈)-almost everywhere.
Now set Ω = {𝑥 ∈ 𝑋 ∶ 𝑔0 ∈ [0, 1)} so on Ω𝑐 , 𝑔0 = 1 (𝜇 + 𝜈)-almost
everywhere. Then (∗) is equivalent to
for all 𝑓 ∈ 𝐿2(𝑋, 𝒜, 𝜇 + 𝜈). Hence this holds for all measurable 𝑓 ≥ 0. Now replacing 𝑓 by (𝑓/(1 − 𝑔0))1Ω, get
∫Ω 𝑓𝑑𝜇 = ∫Ω 𝑓 (𝑔0/(1 − 𝑔0)) 𝑑𝜈. (∗∗)
Set
𝜇𝑎 (𝐴) = 𝜇(𝐴 ∩ Ω)
𝜇𝑠 (𝐴) = 𝜇(𝐴 ∩ Ω𝑐 )
1. 𝜇𝑎 ≪ 𝜈,
2. 𝜇𝑠 ⟂ 𝜈,
3. 𝑑𝜇𝑎 = 𝑔𝑑𝜈 where 𝑔 = (𝑔0/(1 − 𝑔0))1Ω.
Proof.
𝜇𝑎 = ∑𝑛 (𝜇|𝑋𝑛)𝑎
𝜇𝑠 = ∑𝑛 (𝜇|𝑋𝑛)𝑠
𝜇𝑠 (Ω0 ) = 0, 𝜈(Ω𝑐0 ) = 0
𝜇′𝑠 (Ω′0 ) = 0, 𝜈((Ω′0 )𝑐 ) = 0
Now 𝜇𝑎 , 𝜇′𝑎 ≪ 𝜈 so
𝜇𝑎 (Ω𝑐1 ) = 𝜇′𝑎 (Ω𝑐1 ) = 0.
Hence for all 𝐴 ∈ 𝒜,
E(1𝐴 𝑋) = E(1𝐴 𝑌 )
𝑌 = E(𝑋|𝒢).
Set 𝑌 = 𝑔.
Uniqueness is shown in example sheet 3.
E(1𝐴|𝒢)(𝜔) = { P(𝐴|𝐵) if 𝜔 ∈ 𝐵; P(𝐴|𝐵𝑐) if 𝜔 ∈ 𝐵𝑐 }
where
P(𝐴|𝐵) = P(𝐴 ∩ 𝐵)/P(𝐵).
10 Fourier transform
𝑓̂ ∶ R𝑑 → C
𝑢 ↦ ∫R𝑑 𝑓(𝑥)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑥
Proposition 10.1.
1. |𝑓̂(𝑢)| ≤ ‖𝑓‖1.
2. 𝑓̂ is continuous.
Again |𝜇̂(𝑢)| ≤ 𝜇(R𝑑) and 𝜇̂ is continuous.
Remark. If 𝑋 is an R𝑑-valued random variable with law 𝜇𝑋 then 𝜇̂𝑋 is called the characteristic function of 𝑋.
Example.
1. Normalised Gaussian distribution on R: 𝜇 = 𝒩(0, 1). 𝑑𝜇 = 𝑔𝑑𝑥 where
𝑔(𝑥) = 𝑒−𝑥2/2/√(2𝜋).
Claim that
𝜇̂(𝑢) = 𝑔̂(𝑢) = 𝑒−𝑢2/2,
i.e. 𝑔̂ = √(2𝜋)𝑔. This is the defining characteristic of the Gaussian distribution.
𝑔̂′(𝑢) = ∫R 𝑖𝑥𝑒𝑖𝑢𝑥𝑔(𝑥)𝑑𝑥
= −∫R 𝑖𝑒𝑖𝑢𝑥𝑔′(𝑥)𝑑𝑥
= 𝑖 ∫R (𝑒𝑖𝑢𝑥)′𝑔(𝑥)𝑑𝑥
= −𝑢 ∫R 𝑒𝑖𝑢𝑥𝑔(𝑥)𝑑𝑥
= −𝑢𝑔̂(𝑢)
Thus
(𝑑/𝑑𝑢)(𝑔̂(𝑢)𝑒𝑢2/2) = 0
so
𝑔̂(𝑢) = 𝑔̂(0)𝑒−𝑢2/2.
But
𝑔̂(0) = ∫R 𝑔(𝑥)𝑑𝑥 = 1
so 𝑔̂(𝑢) = 𝑒−𝑢2/2 as required.
𝐺̂(𝑢) = ∫R𝑑 𝑒𝑖⟨𝑢,𝑥⟩𝐺(𝑥)𝑑𝑥 = ∏𝑑𝑖=1 ∫R 𝑔(𝑥𝑖)𝑒𝑖𝑢𝑖𝑥𝑖 𝑑𝑥𝑖 = ∏𝑑𝑖=1 𝑔̂(𝑢𝑖) = 𝑒−‖𝑢‖2/2
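The computation 𝑔̂(𝑢) = 𝑒−𝑢2/2 can be verified numerically by approximating the Fourier integral with a Riemann sum (the truncation at |𝑥| = 10 and the step size are arbitrary choices; the Gaussian tail makes the truncation error negligible):

```python
import cmath, math

def g(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def g_hat(u, L=10.0, h=1e-3):
    """Riemann-sum approximation of the Fourier integral of g over [-L, L]."""
    n = int(2 * L / h)
    return h * sum(cmath.exp(1j * u * (-L + k * h)) * g(-L + k * h)
                   for k in range(n))

# Matches the closed form e^{-u^2/2} to high accuracy.
for u in (0.0, 0.5, 1.0, 2.0):
    assert abs(g_hat(u) - math.exp(-u * u / 2)) < 1e-6
```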
𝑓(𝑥) = (1/(2𝜋)𝑑) 𝑓̂̂(−𝑥).
where 𝑓̂(𝑢) are Fourier coefficients and 𝑒−𝑖⟨𝑢,𝑥⟩ are called Fourier modes, which
are characters 𝑒𝑖⟨𝑢,−⟩ ∶ R𝑑 → {𝑧 ∈ C ∶ |𝑧| = 1}. Informally this says that every
𝑓 can be written as an “infinite linear combination” of Fourier modes.
Proof. (!UPDATE: 1 does not quite reduce to 2 as 𝑓̂ = 𝑓+̂ − 𝑓−̂ does not quite hold. Instead write 𝑓̂ = 𝑎𝜇̂𝑋 − 𝑏𝜇̂𝑌 where 𝑎 = ‖𝑓+‖1 and 𝑑𝜇𝑋 = (𝑓+/‖𝑓+‖1)𝑑𝑥, and similarly for 𝑏, 𝜇𝑌.)
1 reduces to 2 by considering 𝑓 = 𝑓+ − 𝑓−. In 2 we may assume wlog 𝜇 is a probability measure so is the law of some random variable 𝑋. Let 𝑓(𝑥) = (1/(2𝜋)𝑑) 𝜇̂̂(−𝑥). We need to show 𝜇 = 𝑓𝑑𝑥, which is equivalent to 𝜇(𝐴) = ∫𝐴 𝑓(𝑥)𝑑𝑥 for all 𝐴 ∈ ℬ(R𝑑).
Let ℎ = 1𝐴 and wlog assume 𝐴 is a bounded Borel set. The trick is to introduce
an independent Gaussian random variable 𝑁 ∼ 𝒩(0, 𝐼𝑑 ) with law 𝐺𝑑𝑥. We have
as
𝐺(𝑥) = (1/(√2𝜋)𝑑) 𝐺̂(𝑥) = (1/(√2𝜋)𝑑) ∫R𝑑 𝐺(𝑢)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑢.
So by a change of variable 𝑦 = 𝜎𝑥,
E(ℎ(𝑋 + 𝜎𝑁)) = E(∫ ∫ ℎ(𝑋 + 𝑦)𝑒𝑖⟨𝑢,𝑦/𝜎⟩ 𝐺(𝑢) 𝑑𝑢/(√2𝜋)𝑑 𝑑𝑦/𝜎𝑑)
= E(∫ ∫ ℎ(𝑧)𝑒𝑖⟨𝑢/𝜎,𝑧−𝑋⟩ 𝐺(𝑢) 𝑑𝑢/(√2𝜋𝜎2)𝑑 𝑑𝑧)
= ∫ ∫ ℎ(𝑧)𝑒𝑖⟨𝑢/𝜎,𝑧⟩ 𝜇̂𝑋(−𝑢/𝜎) 𝐺(𝑢) 𝑑𝑢/(√2𝜋𝜎2)𝑑 𝑑𝑧 (Tonelli-Fubini)
= (1/(2𝜋)𝑑) ∫ ∫ 𝜇̂𝑋(𝑢)𝑒−𝑖⟨𝑢,𝑧⟩ ℎ(𝑧)𝑒−𝜎2‖𝑢‖2/2 𝑑𝑢 𝑑𝑧
Proof. We prove the case 𝑑 = 1. The general case follows from Tonelli-Fubini. We show first that 𝑓, 𝑓′ ∈ 𝐿1 implies that 𝑓̂(𝑢) = (𝑖/𝑢)𝑓′̂(𝑢). This easily follows from integration by parts:
𝑓̂(𝑢) = ∫ 𝑓(𝑥)𝑒𝑖𝑢𝑥 𝑑𝑥 = (1/𝑖𝑢) ∫ 𝑓(𝑥)(𝑒𝑖𝑢𝑥)′ 𝑑𝑥 = −(1/𝑖𝑢) ∫ 𝑓′(𝑥)𝑒𝑖𝑢𝑥 𝑑𝑥
so in particular |𝑓̂(𝑢)| ≤ (1/|𝑢|)‖𝑓′‖1. Iterating,
|𝑓̂(𝑢)| ≤ ‖𝑓″‖1/|𝑢|2.
As ∫|𝑢|≥1 𝑑𝑢/|𝑢|2 < ∞ and 𝑓̂ is bounded, 𝑓̂ ∈ 𝐿1.
Φ ∶ R𝑑 × R𝑑 → R𝑑
(𝑥, 𝑦) ↦ 𝑥 + 𝑦
i.e. 𝜇 ∗ 𝜈 = Φ∗ (𝜇 ⊗ 𝜈)
and
𝐺𝜎(𝑥) = (1/(√(2𝜋𝜎2))𝑑) 𝑒−‖𝑥‖2/(2𝜎2).
As ‖𝜏𝜎𝑁 (𝑓) − 𝑓‖𝑝 ≤ 2‖𝑓‖𝑝 , apply dominated convergence theorem to get the
required result.
Proposition 10.6.
• If 𝑓, 𝑔 ∈ 𝐿1(R𝑑) then
(𝑓 ∗ 𝑔)̂ = 𝑓̂ ⋅ 𝑔̂.
• Similarly for finite measures 𝜇, 𝜈,
(𝜇 ∗ 𝜈)̂ = 𝜇̂ ⋅ 𝜈̂.
Proof.
(𝜇 ∗ 𝜈)̂(𝑢) = ∫ 𝑒𝑖⟨𝑢,𝑥+𝑦⟩ 𝑑𝜇(𝑥)𝑑𝜈(𝑦) = 𝜇̂(𝑢)𝜈̂(𝑢)
Theorem 10.7 (Lévy criterion). Let (𝑋𝑛 )𝑛≥1 and 𝑋 be R𝑑 -valued random
variables. Then TFAE:
1. 𝑋𝑛 → 𝑋 in law,
2. For all 𝑢 ∈ R𝑑, 𝜇̂𝑋𝑛(𝑢) → 𝜇̂𝑋(𝑢).
In particular if 𝜇̂𝑋 = 𝜇̂𝑌 for two random variables 𝑋 and 𝑌 then 𝑋 = 𝑌 in law, i.e. 𝜇𝑋 = 𝜇𝑌.
E(𝑔(𝑋𝑛 )) → E(𝑔(𝑋)).
wlog it’s enough to check this for all 𝑔 ∈ 𝐶𝑐∞ (R𝑑 ). For the sufficiency see
example sheet.
Note that for all 𝑔 ∈ 𝐶𝑐∞(R𝑑), 𝑔̂ ∈ 𝐿1 so by the Fourier inversion formula
𝑔(𝑥) = ∫ 𝑔̂(𝑢)𝑒−𝑖⟨𝑢,𝑥⟩ 𝑑𝑢/(2𝜋)𝑑.
Hence
E(𝑔(𝑋𝑛)) = ∫ 𝑔̂(𝑢)E(𝑒−𝑖⟨𝑢,𝑋𝑛⟩) 𝑑𝑢/(2𝜋)𝑑
= ∫ 𝑔̂(𝑢)𝜇̂𝑋𝑛(−𝑢) 𝑑𝑢/(2𝜋)𝑑
→ ∫ 𝑔̂(𝑢)𝜇̂𝑋(−𝑢) 𝑑𝑢/(2𝜋)𝑑
= E(𝑔(𝑋))
by the dominated convergence theorem.
‖𝑓̂‖22 = ∫R𝑑 |𝑓̂(𝑢)|2 𝑑𝑢
= ∫ 𝑓̂(𝑢) 𝑓̂(𝑢)‾ 𝑑𝑢
= ∫ (∫ 𝑓(𝑥)𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑥) 𝑓̂(𝑢)‾ 𝑑𝑢
= ∫ ∫ 𝑓(𝑥) 𝑓̂(𝑢)‾ 𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑢 𝑑𝑥
= ∫ 𝑓(𝑥) 𝑓(𝑥)‾ (2𝜋)𝑑 𝑑𝑥
= (2𝜋)𝑑‖𝑓‖22
= ∫ ∫ 𝑓(𝑥) 𝑔̂(𝑢)‾ 𝑒𝑖⟨𝑢,𝑥⟩ 𝑑𝑢 𝑑𝑥
= (2𝜋)𝑑 ∫ 𝑓(𝑥)𝑔(𝑥)‾ 𝑑𝑥
As ‖𝑓̂‖∞ ≤ ‖𝑓‖1, 𝑓̂𝜎 ∈ 𝐿1(R𝑑). Thus 𝑓̂𝜎 ∈ 𝐿2(R𝑑) and ‖𝑓̂𝜎‖22 = (2𝜋)𝑑‖𝑓𝜎‖22. But
by Gaussian approximation we know that 𝑓𝜎 → 𝑓 in 𝐿2 (R𝑑 ) as 𝜎 → 0. Hence
‖𝑓𝜎 ‖2 → ‖𝑓‖2 . Then
ℱ ∘ ℱ(𝑓) = 𝑓 ̌
for all 𝑓 such that 𝑓, 𝑓 ̂ ∈ 𝐿1 (R𝑑 ). Thus by continuity this holds for 𝐿2 (R𝑑 ).
11 Gaussians
Proof. If 𝑑 = 1 then this just says that it is determined by its mean 𝑚 and
covariance 𝜎2 , which is obviously true. For 𝑑 > 1, compute the characteristic
function
𝜇𝑋̂ (𝑢) = E(𝑒𝑖⟨𝑋,𝑢⟩ )
but by assumption ⟨𝑋, 𝑢⟩ is Gaussian in 𝑑 = 1 so its law is determined by
1. the mean E(⟨𝑋, 𝑢⟩) = ⟨E𝑋, 𝑢⟩,
2. the variance Var⟨𝑋, 𝑢⟩. But
𝑏 = (E𝑋1 , … , E𝑋𝑑 ).
Theorem 11.4 (central limit theorem). Let (𝑋𝑖 )𝑖≥1 be R𝑑 -valued iid. ran-
dom variables with law 𝜇. Assume they have second moment, i.e. E(‖𝑋1 ‖2 ) <
∞. Let 𝐦 = E(𝑋1 ) ∈ R𝑑 and
𝑌𝑛 = (𝑋1 + ⋯ + 𝑋𝑛 − 𝑛 ⋅ 𝐦)/√𝑛.
Proof. The proof is an application of the Lévy criterion. Need to show 𝜇̂𝑌𝑛(𝑢) → 𝜇̂𝑌(𝑢) as 𝑛 → ∞ for all 𝑢, where 𝑌 ∼ 𝒩(0, 𝐾). As
this is equivalent to show that for all 𝑢, ⟨𝑌𝑛 , 𝑢⟩ converges in law to ⟨𝑌 , 𝑢⟩. But
𝜇̂(𝑢) = ∫R 𝑒𝑖𝑢𝑥 𝑑𝜇(𝑥)
(𝑑/𝑑𝑢)𝜇̂(𝑢) = ∫R 𝑖𝑥𝑒𝑖𝑢𝑥 𝑑𝜇(𝑥), which at 𝑢 = 0 equals 𝑖E(𝑋1)
(𝑑2/𝑑𝑢2)𝜇̂(𝑢) = ∫R −𝑥2𝑒𝑖𝑢𝑥 𝑑𝜇(𝑥), which at 𝑢 = 0 equals −E(𝑋12)
so by Taylor expansion
𝜇̂(𝑢) = 𝜇̂(0) + 𝑢𝜇̂′(0) + (𝑢2/2)𝜇̂″(0) + 𝑜(𝑢2) = 1 + 0 ⋅ 𝑢 − 𝑢2/2 + 𝑜(𝑢2)
so
𝜇̂𝑌𝑛(𝑢) = (1 − 𝑢2/(2𝑛) + 𝑜(𝑢2/𝑛))𝑛 → 𝑒−𝑢2/2 = 𝑔̂(𝑢)
as 𝑛 → ∞ where 𝑔 is the law of 𝑌.
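The central limit theorem is easy to observe by simulation; a sketch with Uniform(0, 1) summands, for which 𝐦 = 1/2 and Var(𝑋1) = 1/12 (sample sizes and tolerances are arbitrary choices):

```python
import math, random

random.seed(5)

# X_i ~ Uniform(0, 1): m = 1/2 and Var(X_1) = 1/12, so Y_n should be
# approximately N(0, 1/12) in law for large n.
n, trials = 500, 10_000
ys = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    ys.append((s - n * 0.5) / math.sqrt(n))

assert abs(sum(y <= 0 for y in ys) / trials - 0.5) < 0.02
sigma = math.sqrt(1 / 12)
# P(Y <= sigma) = Phi(1) ~ 0.8413 for Y ~ N(0, sigma^2).
assert abs(sum(y <= sigma for y in ys) / trials - 0.8413) < 0.02
```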
12 Ergodic theory
Let (𝑋, 𝒜, 𝜇) be a measure space. Let 𝑇 ∶ 𝑋 → 𝑋 be an 𝒜-measurable map. We
are interested in the trajectories of 𝑇 𝑛 𝑥 for 𝑛 ≥ 0 and their statistical behaviour.
In particular we are interested in those 𝑇 preserving measure 𝜇.
This condition asserts that 𝒯 is trivial, i.e. its elements are either null or
conull.
Proof. Exercise.
Example.
1. Let 𝑋 be a finite space, 𝑇 ∶ 𝑋 → 𝑋 a map and 𝜇 = # the counting measure. Then 𝑇 is measure-preserving if and only if 𝑇 is a bijection, and 𝑇 is ergodic if and only if there is no partition 𝑋 = 𝑋1 ∪ 𝑋2 into non-empty 𝑇-invariant sets, which is equivalent to: for all 𝑥, 𝑦 ∈ 𝑋, there exists 𝑛 such that 𝑇𝑛𝑥 = 𝑦.
2. Let 𝑋 = R𝑑 /Z𝑑 , 𝒜 the Borel 𝜎-algebra and 𝜇 the Lebesgue measure.
Given 𝑎 ∈ R𝑑 , translation 𝑇𝑎 ∶ 𝑥 ↦ 𝑥 + 𝑎 is measure-preserving. 𝑇𝑎
is ergodic with respect to 𝜇 if and only if 1, 𝑎1, …, 𝑎𝑑, where the 𝑎𝑖's are the coordinates of 𝑎, are linearly independent over Q. See example sheet. (hint:
Fourier transform)
3. Let 𝑋 = R/Z and again 𝒜 the Borel 𝜎-algebra and 𝜇 the Lebesgue measure. The doubling map 𝑇 ∶ 𝑥 ↦ 2𝑥 − ⌊2𝑥⌋ is ergodic with respect to 𝜇. (hint: again consider Fourier coefficients). Intuitively, in the graph of this function the preimage of an interval of length 𝜀 is two segments each of length 𝜀/2, so 𝑇 is measure-preserving.
4. Furstenberg conjecture: every ergodic measure 𝜇 on R/Z invariant under
𝑇2 , 𝑇3 must be either Lebesgue or finitely supported.
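For example 2 above, ergodicity of an irrational rotation can be seen through Birkhoff averages: time averages of 𝑓 along the orbit converge to the space average ∫ 𝑓𝑑𝜇. A Python sketch with 𝑎 = √2 − 1 and 𝑓(𝑥) = cos(2𝜋𝑥), whose space average is 0 (the starting point, orbit length and tolerance are arbitrary choices):

```python
import math

# Irrational rotation T_a(x) = x + a mod 1 with a = sqrt(2) - 1.
# Birkhoff averages of f along the orbit converge to the space average,
# here the integral of cos(2*pi*x) over [0, 1], which is 0.
a = math.sqrt(2) - 1
f = lambda x: math.cos(2 * math.pi * x)

x, total, n = 0.1, 0.0, 100_000
for _ in range(n):
    total += f(x)
    x = (x + a) % 1.0
assert abs(total / n) < 0.01
```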
Φ∶Ω→𝑋
𝜔 ↦ (𝑋𝑛 (𝜔))𝑛≥1
Let
𝑇 ∶𝑋→𝑋
(𝑥𝑛 )𝑛≥1 ↦ (𝑥𝑛+1 )𝑛≥1
be the shift map. Let 𝑥𝑛 ∶ 𝑋 → R𝑑 be the 𝑛th coordinate function and let
𝒜 = 𝜎(𝑥𝑛 ∶ 𝑛 ≥ 1).
Note. 𝒜 is the infinite product 𝜎-algebra ℬ(R𝑑)⊗N on (R𝑑)N.
Proof.
• 1 ⟹ 2: 𝜇 = 𝑇∗𝜇 implies 𝜇 = 𝑇∗𝑛𝜇 for all 𝑛, and this says the law of (𝑋𝑖)𝑖≥1 is the same as that of (𝑋𝑖+𝑛)𝑖≥1.
Proposition 12.4 (Bernoulli shift). If (𝑋𝑛 )𝑛≥1 are iid. then (𝑋, 𝒜, 𝜇, 𝑇 )
is ergodic. It is called the Bernoulli shift associated to the law 𝜈 of 𝑋1 . We
have
𝜇 = 𝜈 ⊗N .
Proof. Claim that Φ−1 (𝒯) ⊆ 𝒞, the tail 𝜎-algebra of (𝑋𝑛 )𝑛≥1 . But Kolmogorov
0-1 law says that if 𝐴 ∈ 𝒯 then P(Φ−1 (𝐴)) = 0 or 1, so 𝜇(𝐴) = 0 or 1, thus 𝜇
is 𝑇-ergodic.
Given 𝐴 ∈ 𝒯, 𝑇 −1 𝐴 = 𝐴 so
𝑈 ∶𝐻→𝐻
𝑓↦𝑓 ∘𝑇
𝑈∗ ∶ 𝐻 → 𝐻
𝑥 ↦ 𝑈 ∗𝑥
𝑊 = {𝜑 − 𝜑 ∘ 𝑇 ∶ 𝜑 ∈ 𝐻}
𝑆𝑛𝑓 = (1/𝑛) ∑𝑛−1𝑖=0 (𝜑 ∘ 𝑇𝑖 − 𝜑 ∘ 𝑇𝑖+1) = (𝜑 − 𝜑 ∘ 𝑇𝑛)/𝑛 → 0
as 𝑛 → ∞.
Let 𝑓 ∈ 𝑊‾ (the closure of 𝑊); then again 𝑆𝑛𝑓 → 0 because for all 𝜀 > 0 there exists 𝑔 ∈ 𝑊 such that ‖𝑓 − 𝑔‖ < 𝜀. Then
‖𝑆𝑛𝑓 − 𝑆𝑛𝑔‖ = ‖𝑆𝑛(𝑓 − 𝑔)‖ ≤ ‖𝑓 − 𝑔‖ ≤ 𝜀
so lim sup𝑛 ‖𝑆𝑛𝑓‖ ≤ 𝜀.
Have 𝐻 = 𝑊‾ ⊕ 𝑊‾⟂ and 𝑊‾⟂ = 𝑊⟂. Claim 𝑊⟂ is exactly the set of 𝑇-invariant functions. The theorem then follows because if 𝑓 ∘ 𝑇 = 𝑓 then 𝑆𝑛𝑓 = 𝑓 for all 𝑛.
Proof of claim.
𝑊 ⟂ = {𝑔 ∈ 𝐻 ∶ ⟨𝑔, 𝜑 − 𝑈 𝜑⟩ = 0 for all 𝜑 ∈ 𝐻}
= {𝑔 ∶ ⟨𝑔, 𝜑⟩ = ⟨𝑔, 𝑈 𝜑⟩ for all 𝜑}
= {𝑔 ∶ ⟨𝑔, 𝜑⟩ = ⟨𝑈 ∗ 𝑔, 𝜑⟩ for all 𝜑}
= {𝑔 ∶ 𝑈 ∗ 𝑔 = 𝑔}
= {𝑔 ∶ 𝑈 𝑔 = 𝑔}
where the last equality is by
‖𝑈 𝑔 − 𝑔‖2 = 2‖𝑔‖2 − 2 Re⟨𝑔, 𝑈 𝑔⟩ = 2‖𝑔‖2 − 2 Re⟨𝑈 ∗ 𝑔, 𝑔⟩,
and this shows that 𝑊 ⟂ are exactly 𝑇-invariant functions.
𝑆𝑛𝑓 = (1/𝑛) ∑𝑛−1𝑖=0 𝑓 ∘ 𝑇𝑖
Corollary 12.7 (strong law of large numbers). Let (𝑋𝑛 )𝑛≥1 be a sequence
of iid. random variables. Assume E(|𝑋1 |) < ∞. Let
𝑆𝑛 = ∑𝑛𝑘=1 𝑋𝑘,
then (1/𝑛)𝑆𝑛 converges almost surely to E(𝑋1).
Proof. Let (𝑋, 𝒜, 𝜇, 𝑇 ) be the canonical model associated to (𝑋𝑛 )𝑛≥1 , where
𝑋 = RN , 𝒜 = ℬ(R)⊗N , 𝑇 the shift operator and 𝜇 = 𝜈 ⊗N where 𝜈 is the law of
𝑋1 . It is a Bernoulli shift. Let
𝑓 ∶𝑋→R
𝑥 ↦ 𝑥1
then
𝛼𝜇(𝐸𝛼) ≤ ∫𝐸𝛼 𝑓𝑑𝜇.
𝑓0 = 0
𝑓𝑛 = 𝑛𝑆𝑛𝑓 = ∑𝑛−1𝑖=0 𝑓 ∘ 𝑇𝑖 for 𝑛 ≥ 1
Let
𝑃𝑁 = {𝑥 ∈ 𝑋 ∶ max0≤𝑛≤𝑁 𝑓𝑛(𝑥) > 0}.
Then
∫𝑃𝑁 𝑓𝑑𝜇 ≥ 0.
Integrate to get
∫𝑃𝑁 𝐹𝑁 𝑑𝜇 ≤ ∫𝑃𝑁 𝐹𝑁 ∘ 𝑇 𝑑𝜇 + ∫𝑃𝑁 𝑓𝑑𝜇.
∫𝑃𝑁 𝐹𝑁 = ∫𝑋 𝐹𝑁 ≤ ∫𝑋 𝐹𝑁 ∘ 𝑇 + ∫𝑃𝑁 𝑓.
As 𝜇 is 𝑇-invariant, ∫ 𝐹𝑁 ∘ 𝑇 = ∫ 𝐹𝑁 so
∫𝑃𝑁 𝑓𝑑𝜇 ≥ 0.
and 𝑆𝑛 𝑔 = 𝑆𝑛 𝑓 − 𝛼. Thus
∫𝐸𝛼(𝑓) (𝑓 − 𝛼)𝑑𝜇 ≥ 0,
which is equivalent to
𝛼𝜇(𝐸𝛼) ≤ ∫𝐸𝛼 𝑓𝑑𝜇.
𝑓̄ = lim sup𝑛 𝑆𝑛𝑓
𝑓̲ = lim inf𝑛 𝑆𝑛𝑓
𝑆𝑛𝑓 ∘ 𝑇 = (1/𝑛)(𝑓 ∘ 𝑇 + ⋯ + 𝑓 ∘ 𝑇𝑛) = (1/𝑛)((𝑛 + 1)𝑆𝑛+1𝑓 − 𝑓).
is 𝜇-null, as then
{𝑥 ∶ 𝑓̄(𝑥) ≠ 𝑓̲(𝑥)} = ⋃𝛼>𝛽, 𝛼,𝛽∈Q 𝐸𝛼,𝛽
Dually
−𝛽𝜇(𝐸𝛼,𝛽(𝑓)) ≤ ∫𝐸𝛼,𝛽 −𝑓𝑑𝜇
so
𝛼𝜇(𝐸𝛼,𝛽 ) ≤ 𝛽𝜇(𝐸𝛼,𝛽 ).
But 𝛼 > 𝛽 so 𝜇(𝐸𝛼,𝛽 ) = 0.
We have proved that the limit lim𝑛 𝑆𝑛 𝑓 exists almost everywhere, which we
now define to be 𝑓, and left to show 𝑓 ∈ 𝐿1 and lim𝑛 ‖𝑆𝑛 𝑓 − 𝑓‖1 = 0. This is an
application of Fatou’s lemma:
Now to show ‖𝑆𝑛 𝑓 − 𝑓‖1 → 0, we truncate 𝑓. Let 𝑀 > 0 and set 𝜑𝑀 = 𝑓1|𝑓|<𝑀 .
Note that
Finally
so
lim sup‖𝑆𝑛 𝑓 − 𝑓‖1 ≤ 2‖𝑓 − 𝜑𝑀 ‖1
for all 𝑀, so goes to 0 as 𝑀 → ∞.
Remark.
1. The theorem holds if 𝜇 is only assumed to be 𝜎-finite.
2. The theorem holds if 𝑓 ∈ 𝐿𝑝 for 𝑝 ∈ [1, ∞). Then 𝑆𝑛𝑓 → 𝑓̄ in 𝐿𝑝.