
Probability Theory

Prof. Dr. Nina Gantert

Lecture at TUM in SS 2011

February 7, 2012

Produced by Stefan Seichter


Contents

1 Measure spaces

2 Measurable functions, random variables

3 Integration of functions

4 Markov inequality, Fatou's lemma, dominated convergence

5 Lp-spaces, inequalities

6 Different types of convergence, uniform integrability

7 Independence

8 Kernels and Fubini's Theorem

9 Absolute continuity, Radon-Nikodym derivatives

10 Construction of stochastic processes

11 The law of large numbers

12 Weak convergence, characteristic functions and the central limit theorem

13 Conditional expectation

14 Martingales

1 Measure spaces
Let Ω be a non-empty set.

Definition 1.1 A ⊆ P(Ω), i.e. a collection A of subsets of Ω, is a σ-field on Ω, if

(i) Ω ∈ A

(ii) A ∈ A ⇒ A^c ∈ A

(iii) An ∈ A ∀n ∈ N ⇒ ⋃_{n∈N} An ∈ A

If An ∈ A ∀n ∈ N, then also ⋂_{n∈N} An ∈ A (by (i)-(iii) and De Morgan: ⋂_{n∈N} An = (⋃_{n∈N} An^c)^c).

Definition 1.2 If A is a σ-field (”σ-Algebra”) on Ω, (Ω, A) is a measurable space and


each A ∈ A is measurable.

Definition 1.3 A probability space is a triple (Ω, A, P ) where Ω is a non-empty set, A a


σ-field on Ω and P : A → R a function with

(i) P (A) ≥ 0 ∀A ∈ A

(ii) P (Ω) = 1
(iii) P(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ P(An) for each sequence (An)n≥1 of pairwise disjoint sets in A.

Property (iii) is the σ-additivity of P .

Example 1.4 (Discrete probability space)


Ω countable, say Ω = {ω1, ω2, . . . }, A = P(Ω). p1, p2, . . . are weights, i.e. pj ≥ 0 ∀j and Σ_{j=1}^∞ pj = 1. P is defined by P(A) = Σ_{j : ωj ∈ A} pj, A ⊆ Ω. Then (Ω, A, P) is a probability space. Note that pj = P({ωj}) > 0 for at least one j. (1.1)
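The construction in Example 1.4 is easy to make concrete. The following is a minimal Python sketch (the geometric weights and the event A are illustrative choices, not part of the script):

    # Discrete probability space: Omega = {0, 1, 2, ...} with weights
    # p_j = (1/2)^(j+1), which sum to 1. P(A) is the sum of the weights
    # of the outcomes in A, as in (1.1).
    weights = {j: 0.5 ** (j + 1) for j in range(60)}   # truncated at j = 59

    def P(A):
        """Probability of a subset A of Omega."""
        return sum(weights[j] for j in A if j in weights)

    print(P({0, 2, 4}))        # 1/2 + 1/8 + 1/32 = 0.65625
    print(P(set(weights)))     # ~ 1.0 = P(Omega), up to truncation error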

Example 1.5 Ω = {0, 1}^N = {ω = (x1, x2, . . . ) | xi ∈ {0, 1} ∀i}

For a = (a1, a2, . . . ) ∈ Ω, let An = {ω ∈ Ω | xj = aj, 1 ≤ j ≤ n} be the set of all sequences with first entries (a1, . . . , an).
If P describes fair coin tosses, one should have An ∈ A and P(An) = 1/2^n. (*)
Since {a} = ⋂_n An, also {a} ∈ A, and since P has the property

P(⋂_n An) = lim_{n→∞} P(An) if An ց (i.e. An ⊇ An+1 ∀n)   (1.2)

(proof of (1.2): see exercises), one has

P({a}) = lim_{n→∞} P(An) = 0.   (1.3)

In contrast to the discrete case, P is continuous, i.e. there is no ω ∈ Ω with P({ω}) > 0.

Remark A theorem by Ulam says that (assuming the continuum hypothesis) there is no continuous probability measure on (Ω, P(Ω)). ⇒ (*) is only possible if A ⊊ P(Ω).
In general, one cannot take A = P(Ω), see Example 1.5. On the other hand, A should contain the ”interesting” sets (e.g. the An in Example 1.5).
Often, one knows P(A) for certain sets A and one wants to choose A so that one can extend P to a probability measure on (Ω, A).
In the following, Ω is always a non-empty set.
A collection of sets M ⊆ P(Ω) is ∪-stable if it contains all finite unions of sets in M, and ∩-stable if it contains all intersections of finitely many sets in M.

Definition 1.6 A collection A0 ⊆ P(Ω) is an algebra on Ω if

(i) Ω ∈ A0

(ii) A ∈ A0 ⇒ A^c ∈ A0

(iii) A0 is ∪-stable

An algebra contains the empty set ∅ and is ∩-stable (since A ∩ B = (A^c ∪ B^c)^c).

Each σ-field is an algebra, but there are algebras which are not σ-fields:
Example Ω = N. Then, A0 = {A ⊆ Ω | A finite or Ac is finite} is the algebra of finite
or cofinite sets. A0 is not a σ-field.

Definition 1.7 A collection D ⊆ P(Ω) is a Dynkin system or λ-system, if

(i) Ω ∈ D

(ii) A, B ∈ D, A ⊆ B ⇒ B\A ∈ D

(iii) An ∈ D ∀n ∈ N, Ai ∩ Aj = ∅ for i ≠ j ⇒ ⋃_{n=1}^∞ An ∈ D

A λ-system contains ∅ and the complement Ac of a set A ∈ D.

Each σ-field is a λ-system. On the other hand, there are λ-systems which are not σ-fields.
Example Ω = {1, 2, 3, 4}, D = {∅, Ω, {1, 2}, {1, 4}, {3, 4}, {2, 3}}. D is a λ-system, but not a σ-field.
Note that a λ-system which is ∩-stable is a σ-field:

for A1, A2, . . . ∈ D, ⋃_{n=1}^∞ An = A1 ∪ ⋃_{n=2}^∞ (An ∩ A1^c ∩ . . . ∩ A_{n−1}^c), a union of pairwise disjoint sets in D (∩-stability is used to see An ∩ A1^c ∩ . . . ∩ A_{n−1}^c ∈ D), hence ⋃_{n=1}^∞ An ∈ D.

Lemma 1.8 Let I ≠ ∅ be an index set. If Ai is a σ-field on Ω ∀i ∈ I, then
⋂_{i∈I} Ai = {A ⊆ Ω | A ∈ Ai ∀i ∈ I} is a σ-field on Ω.
The same statement holds for algebras and λ-systems.

Definition 1.9 Let M ⊆ P(Ω), M ≠ ∅. Then
σ(M) = ⋂{A | M ⊆ A ⊆ P(Ω), A σ-field} is the σ-field generated by M,
D(M) = ⋂{A | M ⊆ A ⊆ P(Ω), A λ-system} is the λ-system generated by M,
α(M) = ⋂{A | M ⊆ A ⊆ P(Ω), A algebra} is the algebra generated by M.
M is the generating system of σ(M), D(M) or α(M).

Theorem 1.10 σ(M) is the smallest σ-field F with M ⊆ F .


If A is a σ-field with M ⊆ A ⊆ P(Ω), then σ(M) ⊆ A.
The same statement holds for α(M) and D(M).

Proof: Use Lemma 1.8, where I is an index set for {A | M ⊆ A ⊆ P(Ω), A σ-field}. We have I ≠ ∅, since M ⊆ P(Ω) and P(Ω) is a σ-field. Hence σ(M) is a σ-field. -

Example 1.11 Let M = {{ω} | ω ∈ Ω}. Then


σ(M) = {A ⊆ Ω | A countable or Ac countable},
α(M) = {A ⊆ Ω | A finite or Ac finite},
D(M) = σ(M).

Some properties: For all collections M, M1, M2 ⊆ P(Ω), we have

(a) M ⊆ σ(M)

(b) σ(M) = σ(σ(M))

(c) M1 ⊆ M2 ⇒ σ(M1 ) ⊆ σ(M2 )

(d) α(M) ⊆ σ(M)

(e) D(M) ⊆ σ(M)

(f) M1 ⊆ σ(M2 ) and M2 ⊆ σ(M1 ) ⇒ σ(M1 ) = σ(M2 )

Proof: (a) and (c) follow from the definition of σ(M).


(b) is true since σ(M) is a σ-field.
(d) and (e) are true since σ(M) is an algebra and a λ-system.
(f) is implied by (b) and (c). -

Lemma 1.12 If M ⊆ P(Ω) is ∩-stable, then D(M) = σ(M).

Proof:

1) We show A ∈ D(M), B ∈ M ⇒ A ∩ B ∈ D(M).


Proof: For B ∈ M, DB = {A ⊆ Ω | A ∩ B ∈ D(M)} is a λ-system;
M ∩-stable ⇒ M ⊆ DB, and DB λ-system ⇒ D(M) ⊆ DB.

2) Let A, B ∈ D(M). We show A ∩ B ∈ D(M).

Proof: DA = {B | A ∩ B ∈ D(M)} is a λ-system, M ⊆ DA due to 1) ⇒ D(M) ⊆ DA.
Hence D(M) is ∩-stable and therefore a σ-field (see the note after Definition 1.7), so D(M) = σ(M) (D(M) ⊆ σ(M) by property (e), and σ(M) ⊆ D(M) since D(M) is a σ-field containing M). -

Example 1.13 An important σ-field is the Borel-σ-field on Rk .
Let Ok = {A ⊆ Rk | A open} be the collection of open subsets of Rk .
The σ-field Bk of Borel-sets of Rk (Borel-σ-field) is Bk = σ(Ok ).

Remark Each of the following collections generates Bk :


C k = {A ⊆ Rk | A closed},
Kk = {A ⊆ Rk | A compact},
I k = {(x, y] | x, y ∈ Rk , x ≤ y}, where (x, y] = (x1 , y1] × . . . × (xk , yk ] and x ≤ y, if
xi ≤ yi , 1 ≤ i ≤ k,
J k = {(−∞, x] | x ∈ Rk }.

Remark Using the axiom of choice, one can show B^k ≠ P(R^k).

Goal Assume that for A ∈ M ⊆ P(Ω), P (A) is known (see for instance the sets An in
example 1.5), M is not a σ-field.
Can we extend P to a probability measure on a σ-field A with M ⊆ A?
Is this extension unique?
In the following definitions, think of Ω = Rk , M = I k , µ = k-dim. content.

Definition 1.14 Assume M ≠ ∅, M ⊆ P(Ω). A function µ : M → [0, ∞] is a non-negative function (on sets). µ is

a) finite, if µ(A) < ∞ ∀A ∈ M.

b) σ-finite, if there is an increasing sequence A1 ⊆ A2 ⊆ . . . in M such that ⋃_{n=1}^∞ An = Ω and µ(An) < ∞ ∀n.

c) finitely additive, if for finitely many pairwise disjoint sets A1, . . . , An ∈ M with ⋃_{j=1}^n Aj ∈ M, we have µ(⋃_{j=1}^n Aj) = Σ_{j=1}^n µ(Aj).

d) σ-additive, if for pairwise disjoint sets A1, A2, . . . ∈ M with ⋃_{j=1}^∞ Aj ∈ M, we have µ(⋃_{j=1}^∞ Aj) = Σ_{j=1}^∞ µ(Aj).

Definition 1.15 (Ω, A, µ) is a measure space if A is a σ-field on Ω and µ : A → [0, ∞] a


σ-additive non-negative function on A with µ(∅) = 0.
Hence, a probability space is a measure space with µ(Ω) = 1.
The measure space is finite, if µ is finite and σ-finite, if µ is σ-finite.

Example 1.16 Ω ≠ ∅, A σ-field on Ω.

a) Let ω ∈ Ω and define a measure δω by δω(A) = 1 if ω ∈ A, 0 if ω ∉ A (A ∈ A).
δω is the Dirac measure in ω.

b) Let T ⊆ Ω and define a measure µT by µT(A) = |A ∩ T| if A ∩ T is finite, ∞ else.
µT is the counting measure on T.

c) Assume µn, n ≥ 1, are measures on (Ω, A) and bn ≥ 0, n ≥ 1. Define µ(A) = Σ_{n=1}^∞ bn µn(A) (A ∈ A). Then µ is a measure on (Ω, A).

In particular, take in c) A = P(Ω), µn = δωn with ωn ∈ Ω ∀n, bn > 0 ∀n and Σ_{n=1}^∞ bn = 1.
In this case, the measure is a discrete measure which is concentrated on the set {ω1, ω2, . . . }.

Theorem 1.17 (Properties of measures)


Let (Ω, A, µ) be a measure space and A, B, A1 , A2 , . . . ∈ A. Then we have
a) If A1, . . . , An ∈ A are pairwise disjoint, µ(⋃_{j=1}^n Aj) = Σ_{j=1}^n µ(Aj)

b) A ⊆ B ⇒ µ(A) ≤ µ(B)

c) A ⊆ B, µ(A) < ∞ ⇒ µ(B\A) = µ(B) − µ(A)

d) µ(⋃_{j=1}^∞ Aj) ≤ Σ_{j=1}^∞ µ(Aj) (σ-subadditivity)

e) An ր A (i.e. A1 ⊆ A2 ⊆ . . . , A = ⋃_{i=1}^∞ Ai) ⇒ µ(A) = lim_{n→∞} µ(An)

f) An ց A (i.e. A1 ⊇ A2 ⊇ . . . , A = ⋂_{i=1}^∞ Ai) and µ(A1) < ∞ ⇒ µ(A) = lim_{n→∞} µ(An)

Remark

1. Assume that µ is a non-negative function on A and µ is finitely additive. Then:
(e) ⇒ µ is σ-additive.
Proof: Let (Aj) be a sequence of pairwise disjoint sets in A. Then
µ(⋃_{n=1}^∞ An) = lim_{n→∞} µ(⋃_{j=1}^n Aj) = lim_{n→∞} Σ_{j=1}^n µ(Aj) = Σ_{j=1}^∞ µ(Aj).

2. The assumption µ(A1) < ∞ in (f) is needed:

Example Ω = N, A = P(Ω), µ(A) = |A| if A finite, ∞ else.
An = {n, n + 1, . . . }. Then An ց ∅ but µ(An) = ∞ ∀n.

Theorem 1.18 (Uniqueness for measures agreeing on a ∩-stable collection of
sets)
Let M ⊆ P(Ω) be ∩-stable and A = σ(M). Assume that µ1 , µ2 are measures on (Ω, A)
with µ1 (A) = µ2 (A), ∀A ∈ M and assume that there is a sequence (An )n≥1 , An ∈ M, ∀n
with An ր Ω and µ1 (An ) = µ2 (An ) < ∞, ∀n. Then µ1 (A) = µ2 (A), ∀A ∈ A.

Proof: Let B ∈ M with µ1 (B) = µ2 (B). Then DB := {A ∈ A | µ1 (B ∩ A) = µ2 (B ∩ A)}


is a λ-system. From Lemma 1.12, since M ⊆ DB , A = σ(M) = D(M) ⊆ DB . In
particular, A ⊆ DAn , ∀n. Since A ∩ An ր A ∈ A, property e) implies that µ1 (A) =
lim µ1 (A ∩ An ) = lim µ2 (A ∩ An ) = µ2 (A), ∀A ∈ A. -
n→∞ n→∞

Corollary 1.19

1. P1 , P2 probability measures on (Rk , Bk ) with P1 ((−∞, x]) = P2 ((−∞, x]), ∀x ∈ Rk


⇒ P1 = P2 .
2. There is (at most) one measure λk on (R^k, B^k) with λk((x, y]) = Π_{j=1}^k (yj − xj), ((x, y] ∈ I^k).

3. Let A0 ⊆ P(Ω) be an algebra and A = σ(A0 ), P1 , P2 probability measures on (Ω, A)


with P1 (A) = P2 (A), ∀A ∈ A0 . Then we have P1 = P2 .

Proof:

1. follows from Theorem 1.18 with Ω = R^k, M = J^k, An = (−∞, xn], xn = (n, . . . , n).

2. follows from Theorem 1.18 with Ω = R^k, M = I^k, An = (xn, yn], xn = (−n, . . . , −n), yn = (n, . . . , n).

3. follows from Theorem 1.18 with An = Ω. -

Definition 1.20 A collection S ⊆ P(Ω) is a semiring (”Semiring”) on Ω if

i) ∅ ∈ S.

ii) S is ∩-stable.

iii) If A, B ∈ S with A ⊆ B, there are finitely many pairwise disjoint sets C1, . . . , Cn ∈ S such that B\A = ⋃_{j=1}^n Cj.

Examples

a) Ω = Rk , S = I k .

b) (Ω1, A1), (Ω2, A2) measurable spaces, Ω := Ω1 × Ω2. Then the collection
S = {A1 × A2 | A1 ∈ A1, A2 ∈ A2} is a semiring on Ω.

Proof: exercise. -

Definition 1.21 A non-negative function µ∗ : P(Ω) → [0, ∞] is an outer measure
(”äußeres Maß”) if:

i) µ∗ (∅) = 0

ii) A ⊆ B ⇒ µ∗ (A) ≤ µ∗ (B)


!

S P∞
iii) µ∗ Aj ≤ µ∗ (Aj ) (σ-subadditivity)
j=1 j=1

Theorem 1.22 Let M ⊆ P(Ω) be a collection of sets with ∅ ∈ M and µ : M → [0, ∞]


a non-negative function on M with µ(∅) = 0. For A ⊆ Ω let
µ*(A) = inf { Σ_{n=1}^∞ µ(An) | (An) sequence in M with A ⊆ ⋃_{n=1}^∞ An }

if there is such a ”covering sequence” for A, +∞ otherwise. Then µ∗ is an outer measure


and we say µ∗ is the outer measure induced by µ.

Proof: See ”Measure Theory”. -

We will now restrict µ* to a σ-field on which it is a measure.

Lemma 1.23 (Carathéodory)


Let µ* : P(Ω) → [0, ∞] be an outer measure and let A(µ*) := {A ⊆ Ω | µ*(A ∩ E) + µ*(A^c ∩ E) = µ*(E) ∀E ⊆ Ω} be the collection of µ*-measurable sets. Then:

a) A(µ∗ ) is a σ-field on Ω.

b) The restriction of µ∗ on A(µ∗ ) is a measure on (Ω, A(µ∗)).

Proof: See ”Measure Theory”. -

A set A is µ*-measurable if it divides each subset E of Ω into two parts on which µ* behaves additively.

Theorem 1.24 (Extension Theorem)


Let S ⊆ P(Ω) be a semiring and µ : S → [0, ∞] a non-negative function with the following
properties:

i) µ(∅) = 0

ii) µ is finitely additive

iii) µ is σ-subadditive

Then there is a measure µ̃ on (Ω, σ(S)) with µ̃(A) = µ(A), ∀A ∈ S. If µ is σ-finite,


µ̃ is unique.

Proof: See ”Measure Theory”. -

Remark iii) in theorem 1.24 can be replaced with ”µ is σ-additive”.

Lemma 1.25 S ⊆ P(Ω) a semiring, µ : S → [0, ∞] a non-negative function with µ(∅) = 0, µ finitely additive. Then:

µ is σ-additive ⇔ µ is σ-subadditive

Proof: See exercises -

Corollary 1.26 Let à ⊆ P(Ω) be an algebra and µ : à → [0, 1] be a non-negative function


with µ(∅) = 0, µ(Ω) = 1. Assume µ is σ-additive. Then there is a unique extension of µ
to a probability measure on (Ω, A) where A = σ(Ã).

Proof: Follows from Theorem 1.24 and Lemma 1.25 since each algebra is a semiring. -

Continuation of example 1.5 (coin tossing)


Ω = {0, 1}^N = {ω = (x1, x2, . . . ) | xi ∈ {0, 1} ∀i}
For a ∈ Ω, let Bn,a = {ω | xi = ai, 1 ≤ i ≤ n} and An = σ({Bn,a | a ∈ Ω}).
Then A1 ⊆ A2 ⊆ . . . is an increasing sequence of σ-fields on Ω ⇒ Ã := ⋃_{n=1}^∞ An is an algebra (see exercises).
If P corresponds to tosses of a fair coin, we have P(Bn,a) = 1/2^n, ∀a ∈ Ω, ∀n.
This defines P on (Ω, Ã). To show that P satisfies the hypothesis of Corollary 1.26, we
have to show that P is σ-additive on Ã.
We then get an extension of P to a probability measure on (Ω, σ(Ã)).
We postpone the proof of the σ-additivity and will do it later in a more general frame.
Interpretation of An :
An contains the information about ω ”until time n” if ω = (x1 , x2 , . . . ) is seen as a process
in time.
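As a plausibility check on (*), one can simulate fair coin tosses and compare the empirical frequency of a cylinder set Bn,a with 1/2^n. A Monte Carlo sketch (the prefix a and the sample size are arbitrary choices):

    import random

    random.seed(0)
    n, trials = 4, 200_000
    a = (1, 0, 1, 1)    # B_{n,a} fixes the first n coordinates to this prefix

    # empirical frequency of B_{n,a} under independent fair coin tosses
    hits = sum(
        all(random.randint(0, 1) == a[j] for j in range(n))
        for _ in range(trials)
    )
    print(hits / trials, 1 / 2 ** n)    # both close to 0.0625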
As an application of Theorem 1.24 we consider the construction of probability measures
on (R, B).

Definition 1.27 A function G : R → R is a cdf (cumulative distribution function)


(”maßdefinierende Funktion”) if

(i) G is increasing (x ≤ y ⇒ G(x) ≤ G(y)).

(ii) G is continuous from the right (”rechtsstetig”).

If G satisfies in addition lim_{x→∞} G(x) = 1 and lim_{x→−∞} G(x) = 0, then G is a cdf of a probability measure (”Verteilungsfunktion”).

Theorem 1.28 If G is a cdf then there exists a unique measure µG on (R, B) such that

µG ((a, b]) = G(b) − G(a), ∀a, b ∈ R (1.5)

Further, µG is σ-finite.
If G is the cdf of a probability measure, then µG is a probability measure.
µG is the Lebesgue-Stieltjes measure associated to G.

Proof: µG is a non-negative function on the semiring I^1 with µG(∅) = 0, and µG is finitely additive. Due to Theorem 1.24 and Lemma 1.25, it remains to show that µG is σ-additive, i.e. that for each sequence of pairwise disjoint sets (An)n≥1, An ∈ I ∀n, with ⋃_{n=1}^∞ An ∈ I, we have µG(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ µG(An).
Let ⋃_{n=1}^∞ An = (x, y] ∈ I. Then µG(⋃_{n=1}^∞ An) = G(y) − G(x).
Assume An = (xn, yn] ∀n. Then inf_n xn = x, sup_n yn = y, and Σ_{n=1}^N µG(An) ր G(y) − G(x) as N → ∞ (this uses the right-continuity of G). -

Example 1.29

1. G(x) = x (x ∈ R). The associated measure λ1 or λ is the Borel-Lebesgue measure


on (R, B).

2. If G is a cdf, µG({x}) = G(x) − G(x−), where G(x−) = lim_{y↗x} G(y). Hence µG is continuous (i.e. µG({x}) = 0 ∀x) if and only if G is continuous.

3. For a probability measure ν on (R, B), F(x) := ν((−∞, x]) is the cdf of ν.
F is a cdf and we have µF = ν.
Example ν exponential distribution with parameter α.
Then F(x) = 1 − e^{−αx} for x > 0 and F(x) = 0 for x ≤ 0.
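Numerically, the defining property µG((a, b]) = G(b) − G(a) of Theorem 1.28 can be cross-checked against the exponential density; a small Python sketch (α and the interval are arbitrary choices):

    import math

    alpha = 2.0
    F = lambda x: 1 - math.exp(-alpha * x) if x > 0 else 0.0  # exponential cdf

    a, b = 0.5, 1.5
    mass = F(b) - F(a)                 # mu_F((a, b]) by Theorem 1.28
    # Riemann sum over the density alpha * exp(-alpha * x) on (a, b]
    steps = 100_000
    dx = (b - a) / steps
    riemann = sum(alpha * math.exp(-alpha * (a + (i + 0.5) * dx)) * dx
                  for i in range(steps))
    print(mass, riemann)               # agree up to discretization error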

4. A ⊆ R, A countable ⇒ λ(A) = 0.
”⇐” is not true.
Example (Cantor set)
A1 = (1/3, 2/3), A2 = (1/9, 2/9) ∪ (7/9, 8/9) and, in general,
Ak = ⋃_{(z1,...,z_{k−1})∈{0,1}^{k−1}} ( Σ_{j=1}^{k−1} zj·2/3^j + 1/3^k , Σ_{j=1}^{k−1} zj·2/3^j + 2/3^k )   (k ≥ 2)
A := ⋃_{k=1}^∞ Ak (a disjoint union).
The Cantor set C is defined by C := [0, 1]\A.
Claim: C is not countable.
Proof: Consider the function {0, 1, 2}^N → [0, 1], a ↦ Σ_{k=1}^∞ ak/3^k.
The set {a = (ak)k≥1 | ak ∈ {0, 2} ∀k} is mapped one-to-one onto C. -
We have C ∈ B and λ(A) = Σ_{k=1}^∞ λ(Ak) = Σ_{k=1}^∞ 2^{k−1} · (1/3^k) = (1/3) Σ_{k=1}^∞ (2/3)^{k−1} = 1
⇒ λ(C) = 0.
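The series λ(A) = Σ 2^{k−1}/3^k = 1 can be checked by a partial sum (a one-line sketch):

    # Lengths removed in the Cantor construction: 2^(k-1) intervals of
    # length 1/3^k at stage k; the partial sums approach 1.
    print(sum(2 ** (k - 1) / 3 ** k for k in range(1, 60)))   # -> ~1.0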

Further, Theorem 1.24 implies the existence of the Borel-Lebesgue measure λk on (R^k, B^k).
λk is defined on the semiring I^k by λk((x, y]) = Π_{i=1}^k (yi − xi).

Proof: Check that the hypotheses of Theorem 1.24 are satisfied. -

Definition 1.31 Let (Ω, A, µ) be a measure space.


A set A ∈ A with µ(A) = 0 is a µ-nullset.
(Ω, A, µ) is complete if for all A ⊆ B ⊆ Ω with B ∈ A a µ-nullset, we have A ∈ A (and then µ(A) = 0).

Theorem 1.32 (Ω, A, µ) measure space. The collection

Aµ := {A ⊆ Ω | ∃E, F ∈ A s.t. E ⊆ A ⊆ F, µ(F \E) = 0}

is a σ-field on Ω, A ⊆ Aµ and the non-negative function

µ̄ : Aµ → [0, ∞], A ↦ µ̄(A) = sup{µ(B) | B ∈ A, B ⊆ A}

is a measure on (Ω, Aµ ) which extends µ. The measure space (Ω, Aµ , µ̄) is complete.
We say that (Ω, Aµ , µ̄) is the completion of (Ω, A, µ).

Proof: see exercises -

Remark The measure space (R^k, B^k, λk) is not complete. Its completion is denoted (R^k, L^k, λk). L^k is the σ-field of Lebesgue-measurable sets and λk is the Lebesgue measure on (R^k, L^k).

2 Measurable functions, random variables
Ω ≠ ∅, Ω̃ ≠ ∅, f : Ω → Ω̃.
The preimage (”Urbild”) of a set Ã ⊆ Ω̃ is f^{-1}(Ã) = {ω ∈ Ω | f(ω) ∈ Ã}.
The corresponding set function f^{-1} is defined by f^{-1} : P(Ω̃) → P(Ω), Ã ↦ f^{-1}(Ã).
f^{-1} commutes with ∪ and ∩; more precisely, let I be an index set and Ãj ⊆ Ω̃ (j ∈ I). Then
f^{-1}(⋃_{j∈I} Ãj) = ⋃_{j∈I} f^{-1}(Ãj), f^{-1}(⋂_{j∈I} Ãj) = ⋂_{j∈I} f^{-1}(Ãj), f^{-1}(Ãj^c) = (f^{-1}(Ãj))^c, f^{-1}(Ω̃) = Ω.

Lemma 2.1 Ω ≠ ∅, Ω̃ ≠ ∅, f : Ω → Ω̃. Then:

a) If à is a σ-field on Ω̃, then f −1 (Ã) = {f −1 (Ã) | à ∈ Ã} is a σ-field on Ω.

b) If A is a σ-field on Ω, Af := {Ã ⊆ Ω̃ | f −1 (Ã) ∈ A} is a σ-field on Ω̃.

Proof: Check the properties of a σ-field in Definition 1.1. -

For M̃ ⊆ P(Ω̃) we define f −1 (M̃) = {f −1 (Ã) | Ã ∈ M̃}.

Definition 2.2 Let (Ω, A) and (Ω̃, Ã) be measurable spaces. The function f : Ω → Ω̃ is
(A, Ã)-measurable if

f −1 (Ã) ⊆ A (i.e. f −1 (Ã) ∈ A, ∀Ã ∈ Ã) (2.1)

Notation: f : (Ω, A) → (Ω̃, Ã).

Remark If Ã is “large” and A is “small”, there are fewer measurable functions f : (Ω, A) → (Ω̃, Ã).

Examples

1. If A = {∅, Ω} and Ã = P(Ω̃), only the constant functions f (i.e. f(ω) = ω̃ ∀ω ∈ Ω, for some ω̃ ∈ Ω̃) are (A, Ã)-measurable.

2. If A = P(Ω), each function f : Ω → Ω̃ is (A, Ã)-measurable (no matter what à is).

3. Assume A is generated by a countable partition of Ω into ”atoms” A1, A2, . . . (i.e. Ai ∩ Aj = ∅ for i ≠ j and Ω = ⋃_{j=1}^∞ Aj).
Then f : Ω → R is (A, B)-measurable, if f is constant on each atom Ai.

We show that it suffices to check (2.1) for a generating system of Ã.

Lemma 2.3 Let (Ω, A) and (Ω̃, Ã) be measurable spaces, f : Ω → Ω̃ and assume
M̃ ⊆ P(Ω̃) satisfies σ(M̃) = Ã. Then
f is (A, Ã)-measurable ⇔ f −1 (M̃) ⊆ A.

Proof: ”⇒” is clear.
”⇐” holds since M̃ ⊆ Af , Af is a σ-field due to Lemma 2.1 b) ⇒ Ã = σ(M̃) ⊆ Af . -

Consequences 2.4 Let (Ω, A) be a measurable space. Then

a) f : Ω → Rk is (A, Bk )-measurable ⇔ f −1 (I k ) ⊆ A ⇔ f −1 (Ok ) ⊆ A etc.

b) fj : Ω → R (j = 1, . . . , k), f = (f1 , . . . , fk ). Then


f is (A, Bk )-measurable ⇔ fj is (A, B1 )-measurable ∀j ∈ {1, . . . , k}

c) f : Rk → Rm continuous ⇒ f is (Bk , Bm )-measurable (we say: f is Borel-measurable)

d) Let A ⊆ Ω. Then IA is (A, B^1)-measurable ⇔ A ∈ A.
IA is the indicator function defined by IA(ω) = 1 if ω ∈ A, 0 else.

e) Let (Ωj , Aj ) (j = 1, 2, 3) be measurable spaces and f1 : (Ω1 , A1) → (Ω2 , A2 ),


f2 : (Ω2 , A2) → (Ω3 , A3 ) be measurable functions.
Then f2 ◦ f1 : Ω1 → Ω3 is (A1 , A3)-measurable.

Proof:

a) Follows from Lemma 2.3 since B^k = σ(I^k) = σ(O^k) etc.

b) ”⇒”: Fix j and let Aj ⊆ R be an open set. Then the set A := R × . . . × R × Aj × R × . . . × R (with Aj in the j-th coordinate) is an open subset of R^k, and we have fj^{-1}(Aj) = f^{-1}(A) ∈ A.
⇒ fj is (A, B^1)-measurable.
”⇐”: For (a, b] = Π_{j=1}^k (aj, bj] ∈ I^k, f^{-1}((a, b]) = ⋂_{j=1}^k fj^{-1}((aj, bj]) ∈ A.
⇒ f^{-1}(A) ∈ A ∀A ∈ I^k.

c) f continuous ⇒ f^{-1}(O^m) ⊆ O^k; but O^k ⊆ B^k and σ(O^m) = B^m, hence f^{-1}(A) ∈ B^k ∀A ∈ B^m.

d) For B ⊆ R, IA^{-1}(B) ∈ {Ω, A, A^c, ∅}.

e) (f2 ◦ f1)^{-1}(A3) = f1^{-1}(f2^{-1}(A3)) ∈ A1 ∀A3 ∈ A3. -

We can define σ-fields through functions:

Definition 2.5 Ω ≠ ∅, (Ωj, Aj) (j ∈ J) a collection of measurable spaces and (fj)j∈J a collection of functions fj : Ω → Ωj.
Then σ(fj, j ∈ J) := σ(⋃_{j∈J} fj^{-1}(Aj)) is the σ-field generated by the functions fj (j ∈ J).
If J = {1, . . . , n} we write σ(f1, . . . , fn). We have fk^{-1}(Ak) ⊆ ⋃_{j∈J} fj^{-1}(Aj) ⊆ σ(fj, j ∈ J) ⇒ for each k ∈ J, the function fk is (σ(fj, j ∈ J), Ak)-measurable.
Let A be a σ-field on Ω such that fk is (A, Ak)-measurable ∀k ∈ J.
⇒ ⋃_{j∈J} fj^{-1}(Aj) ⊆ A ⇒ σ(⋃_{j∈J} fj^{-1}(Aj)) ⊆ A.
Hence, σ(fj, j ∈ J) is the smallest σ-field A such that all functions fk are (A, Ak)-measurable.

Example 2.6 Let (Ω1, F1), (Ω2, F2), . . . be measurable spaces and
Ω = Π_{j=1}^∞ Ωj = {ω = (x1, x2, . . . ) | xi ∈ Ωi}.
Let Πj : Ω → Ωj, ω ↦ xj = Πj(ω) be the projection. Here J = N.

Definition 2.7 In the above setup, the σ-field σ(Πj, j ∈ J) is the product-σ-field (”Produkt-σ-Algebra”) of F1, F2, . . . and we write Π_{j=1}^∞ Fj.

Remark Let An = σ({F1 × . . . × Fn × Ωn+1 × . . . | Fi ∈ Fi, 1 ≤ i ≤ n}).
We have Πj^{-1}(Fj) = Ω1 × . . . × Ωj−1 × Fj × Ωj+1 × . . . and
⋂_{j=1}^m Πj^{-1}(Fj) = F1 × . . . × Fm × Ωm+1 × . . .
⇒ An ⊆ Π_{j=1}^∞ Fj ∀n ⇒ Π_{j=1}^∞ Fj = σ(⋃_{n=1}^∞ An),
since Π_{j=1}^∞ Fj is the smallest σ-field which makes all the projections measurable.

Lemma 2.8 Let (Ω, A, µ) be a measure space and (Ω̃, Ã) be a measurable space and
f : (Ω, A) → (Ω̃, Ã).
Then µf : Ã → [0, ∞], Ã ↦ µf(Ã) = µ(f^{-1}(Ã)) is a measure on (Ω̃, Ã).
µf is the image of µ under f or the law of f under µ (”Bildmaß”).

Proof: Check the properties in Definition 1.15. -

Definition 2.9 Let P be a probability measure on (Ω, A), let (Ω̃, Ã) be a measurable
space. Then, a function X : (Ω, A) → (Ω̃, Ã) is a random variable with values in Ω̃.
The law PX of X under P (PX is a probability measure) is the law of X.
If (Ω̃, Ã) = (R^k, B^k), the random variable X = (X1, . . . , Xk) is a random vector (k ≥ 2) or a real-valued random variable (k = 1).
If k ≥ 2, PX = P(X1,...,Xk) is the joint law of X1, . . . , Xk under P.

Definition 2.10 Let (Ω, A) be a measurable space and Ω0 ⊆ Ω, Ω0 6= ∅.


Let id : Ω0 → Ω, ω ↦ ω be the injection map.
Then we are in the setup of Definition 2.5 with J = {1}, f1 = id.
The σ-field σ(id) = {id^{-1}(A) | A ∈ A} = {A ∩ Ω0 | A ∈ A} is the trace of A in Ω0.

Example 2.11 λ Lebesgue measure on (R, B), Ω0 = [0, 1].


Then let B[0,1] denote the trace of B in [0, 1], i.e. B[0,1] = {B ∩ [0, 1] | B ∈ B}.
λ|[0,1] denotes the restriction of λ on ([0, 1], B[0,1] ).
We say λ|[0,1] is the equidistribution or uniform distribution on [0, 1] (”Gleichverteilung”).

Define a function f : [0, 1] → {0, 1}^N in the following way:
choose for a ∈ [0, 1] a sequence (ak)k≥1 such that a = Σ_{k=1}^∞ ak/2^k (if there are two such sequences, take the one with infinitely many 1’s).
Equip {0, 1}^N with the product-σ-field Π_{i∈N} Fi, where Fi = {∅, {0}, {1}, {0, 1}}.
Then f : ([0, 1], B[0,1]) → (Ω, A), Ω = {0, 1}^N, A = Π_{i∈N} Fi, i.e. f is (B[0,1], A)-measurable.

Proof: exercise -

Remark The image of λ under f is the probability measure on {0, 1}^N which describes infinitely many tosses of a fair coin.
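A sketch of the digit map behind Example 2.11 in Python (here the convention for dyadic rationals — an expansion with infinitely many 1’s — is only approximated, since we truncate after n digits):

    def binary_digits(a, n):
        """First n digits (a_1, ..., a_n) of a binary expansion of a in [0, 1].
        For dyadic rationals this greedy rule returns the expansion ending
        in zeros, so the convention of Example 2.11 is only approximated."""
        digits = []
        for _ in range(n):
            a *= 2
            d = 1 if a >= 1 else 0
            digits.append(d)
            a -= d
        return digits

    print(binary_digits(0.625, 8))  # 0.101 in binary -> [1, 0, 1, 0, 0, 0, 0, 0]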

Example 2.12 Ω = R^2, Ω0 = E = {(x, y) ∈ R^2 | x^2 + y^2 ≤ 1}

Let BE be the trace of B^2 in E. The probability measure (1/π) λ2|E on (E, BE) is the uniform distribution on E.

We now want to consider functions (or random variables) which can take values in R̄ = R ∪ {−∞, ∞}.
Let B̄ = {B ⊆ R̄ | B ∩ R ∈ B}. B̄ is a σ-field on R̄.
We have B̄ = B ∪ {B ∪ {∞} | B ∈ B} ∪ {B ∪ {−∞} | B ∈ B} ∪ {B ∪ {−∞, ∞} | B ∈ B}.
We will consider functions f : (Ω, A) → (R̄, B̄). We write
{f > a} = {ω ∈ Ω | f (ω) > a} = f −1 ((a, ∞])
{f = g} = {ω ∈ Ω | f (ω) = g(ω)}
{f = ∞} = {ω ∈ Ω | f (ω) = ∞}

Lemma 2.13 (Ω, A) measurable space and f : (Ω, A) → (R̄, B̄).


The following statements are equivalent:

a) f is (A, B̄)-measurable

b) {f > a} ∈ A, ∀a ∈ R

c) {f ≥ a} ∈ A, ∀a ∈ R

d) {f < a} ∈ A, ∀a ∈ R

e) {f ≤ a} ∈ A, ∀a ∈ R

Proof: a) ⇒ b) since (a, ∞] ∈ B̄.
b) ⇒ c) since {f ≥ a} = ⋂_{n=1}^∞ {f > a − 1/n}.
c) ⇒ d) since {f < a} = {f ≥ a}^c.
d) ⇒ e) since {f ≤ a} = ⋂_{n=1}^∞ {f < a + 1/n}.
e) ⇒ a) since B̄ is generated by J̄ = {[−∞, a] | a ∈ R} (proof: exercise). -

Lemma 2.14 Assume f, g : (Ω, A) → (R̄, B̄).
Then: {f < g} ∈ A, {f ≤ g} ∈ A, {f = g} ∈ A, {f ≠ g} ∈ A.

Proof: {f < g} = ⋃_{a∈Q} ({f < a} ∩ {g > a})
{f ≤ g} = {f > g}^c = {g < f}^c
{f = g} = {f ≤ g} ∩ {g ≤ f}
{f ≠ g} = {f = g}^c -

Theorem 2.15 Assume f, g : (Ω, A) → (R̄, B̄)


Then:

a) f + g and f − g are (A, B̄)-measurable (if defined everywhere!).

b) f · g : (Ω, A) → (R̄, B̄)

c) 1/g : (Ω, A) → (R̄, B̄)

Here: 1/∞ = 1/(−∞) = 0, 1/0 = ∞,
0 · ∞ = ∞ · 0 = 0, 0 · (−∞) = (−∞) · 0 = 0,
∞ · ∞ = (−∞) · (−∞) = ∞, ∞ · (−∞) = (−∞) · ∞ = −∞,
∞ + (−∞) and (−∞) + ∞ are not defined.

Proof:

a) Due to Lemma 2.13, a + cf and −f are (A, B̄)-measurable, too (a, c ∈ R); in particular a − g is measurable.
Hence {f + g ≥ a} = {f ≥ a − g} ∈ A due to Lemma 2.14.

b) Assume first that f, g take values in R.
We have f · g = (1/4)((f + g)^2 − (f − g)^2).
Hence it suffices to take f = g, i.e. to show that f^2 : (Ω, A) → (R̄, B̄).
But {f^2 ≥ a} = Ω if a ≤ 0, and {f^2 ≥ a} = {f ≥ √a} ∪ {f ≤ −√a} if a > 0
⇒ {f^2 ≥ a} ∈ A ⇒ f^2 is (A, B̄)-measurable.
General case:
Ω1 = {f · g = ∞}, Ω2 = {f · g = −∞}, Ω3 = {f · g = 0}, Ω4 = Ω\(Ω1 ∪ Ω2 ∪ Ω3).
Then Ωj ∈ A (j = 1, 2, 3, 4). The restriction f|Ω4 of f to Ω4 is (A, B)-measurable, same for g|Ω4. Due to the first case, f|Ω4 · g|Ω4 is (A, B)-measurable. But for a ∈ R,
{f · g ≥ a} = ⋃_{j=1}^4 ({f · g ≥ a} ∩ Ωj) = {f · g = ∞} ∪ ∅ ∪ ({f · g ≥ a} ∩ Ω3) ∪ (f|Ω4 · g|Ω4)^{-1}([a, ∞))
and {f · g ≥ a} ∩ Ω3 = {f · g = 0} if a ≤ 0, ∅ else
⇒ f · g : (Ω, A) → (R̄, B̄).

c) Assume a > 0. Then
{1/g ≥ a} = ({1/g ≥ a} ∩ {g > 0}) ∪ ({1/g ≥ a} ∩ {g = 0}) ∪ ({1/g ≥ a} ∩ {g < 0})
= {0 < g ≤ 1/a} ∪ {g = 0} ∪ ∅, and all three sets are in A
⇒ {1/g ≥ a} ∈ A.
a ≤ 0 is handled analogously. -

Theorem 2.16 Assume f, f1 , f2 , . . . are (A, B̄)-measurable.


Then, the following functions are measurable, too:

a) sup_n fn, inf_n fn

b) lim sup_n fn, lim inf_n fn

c) f^+ = max(f, 0) = f ∨ 0 (f^+ is the positive part of f)
f^- = −min(f, 0) = (−f) ∨ 0 (f^- is the negative part of f)
|f| = f^+ + f^-

Proof:

a) {sup_n fn ≤ a} = ⋂_{n=1}^∞ {fn ≤ a} ∈ A;
inf_n fn = −sup_n (−fn).

b) follows from a) since lim sup_n fn = inf_{n≥1} sup_{k≥n} fk and lim inf_n fn = sup_{n≥1} inf_{k≥n} fk.

c) f^+, f^- are (A, B̄)-measurable due to a), |f| due to Theorem 2.15 a). -

Note that
f^+ ≥ 0, f^- ≥ 0, f = f^+ − f^-   (2.2)

Corollary 2.17 If (fn )n≥1 is a sequence of functions which are (A, B̄)-measurable and
lim fn (ω) exists, ∀ω (we say that the sequence (fn ) converges pointwise) then lim fn is
n→∞ n→∞
(A, B̄)-measurable.

3 Integration of functions
Assume (Ω, A, µ) is a measure space.
Goal: For as many functions f : (Ω, A) → (R̄, B̄) as possible, define ∫ f dµ.
We proceed in 4 steps:

1. f = IA .

2. f ≥ 0, values in R, f takes only finitely many values.

3. f ≥ 0 (take limits from 2.).

4. f arbitrary (decompose f = f + − f − and apply 3.)

1. ∫ IA dµ = ∫_A 1 dµ = µ(A) (A ∈ A)

2. E = {f : (Ω, A) → (R, B) | f ≥ 0, f takes only finitely many values}

If f ∈ E, f(Ω) = {α1, . . . , αn}, αj ∈ R+ ⇒

f = Σ_{j=1}^n αj I_{Aj}   (3.1)

with Aj = f^{-1}({αj}) ∈ A and Ω = ⋃_{j=1}^n Aj (a disjoint union).
Define ∫ f dµ = Σ_{j=1}^n αj µ(Aj)
- one has to show that for two different representations of the form (3.1), the two integrals coincide.

3. E* = {f : (Ω, A) → (R̄, B̄) | f ≥ 0}

Theorem 3.1 Let f ∈ E*. Then there is an increasing sequence of functions (un)n≥1 ⊆ E with un ր f (i.e. un ≤ un+1 ∀n and lim_n un = f).

Proof: see ”Measure Theory”. -

Define:
∫ f dµ = lim_n ∫ un dµ   (3.2)

- one has to show that the limit in (3.2) exists.
- one has to show that for two different sequences (un)n≥1 and (vn)n≥1 with un ր f, vn ր f, we have lim_n ∫ un dµ = lim_n ∫ vn dµ.

4. Definition 3.2 The function f : (Ω, A) → (R̄, B̄) is µ-integrable or integrable if ∫ f^+ dµ < ∞ and ∫ f^- dµ < ∞.
In this case,
∫ f dµ := ∫ f^+ dµ − ∫ f^- dµ   (3.3)
∫ f dµ is the integral of f with respect to µ.

Notations: ∫ f dµ = µ(f) = ∫ f(ω) µ(dω)

Remark

1. If ∫ f^+ dµ < ∞ or ∫ f^- dµ < ∞, then (3.3) still defines ∫ f dµ - with possible values +∞ or −∞. In this case, we say that f is quasi-integrable.

2. Assume (Ω, A, P) is a probability space. Then
X is integrable with respect to P ⇔ E(|X|) = ∫ |X| dP < ∞.
We write E(X) = ∫ X dP (X : (Ω, A) → (R̄, B̄) a random variable).

Theorem 3.3 (Properties of the integral)


f, g : (Ω, A) → (R̄, B̄) and f, g are integrable with respect to µ. Then:
a) αf is integrable (α ∈ R) with ∫ αf dµ = α ∫ f dµ.

b) If f + g is defined on Ω, f + g is integrable and ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ.

c) f ∨ g = max(f, g) and f ∧ g = min(f, g) are integrable as well.

d) f ≤ g ⇒ ∫ f dµ ≤ ∫ g dµ

e) |∫ f dµ| ≤ ∫ |f| dµ

Proof: see ”Measure Theory” -

Examples

1. (Ω, A) measurable space, ω ∈ Ω, δω the Dirac measure in ω.
A function f : (Ω, A) → (R̄, B̄) is integrable with respect to δω if and only if |f(ω)| < ∞.
In this case, ∫ f dδω = f(ω).

2. (Ω, A) = (N, P(N)) and µ is the counting measure: µ(A) = |A| if A finite, +∞ else.
f : N → R̄ is given by the sequence bn = f(n). Then
f is integrable with respect to µ ⇔ Σ_n |bn| < ∞.
If f is integrable w.r.t. µ, ∫ f dµ = Σ_{n=1}^∞ bn.

Theorem 3.4 (Monotone Convergence)
If 0 ≤ f1 ≤ f2 ≤ . . . then
∫ lim_n fn dµ = lim_n ∫ fn dµ

Proof: see ”Measure Theory” -

Remark The assumption fn ≥ 0 is needed.

Example 3.5 Ω = R, A = B, µ = λ, fn(x) = −1/n ∀x; fn is not µ-integrable
(fn is quasi-integrable with ∫ fn dλ = −∞).
But fn ր f with f(x) = 0 ∀x, f is integrable and ∫ f dλ = 0.

Theorem 3.6 (Transformation of measure)


(Ω, A, µ) measure space, (Ω̃, Ã) measurable space, f : (Ω, A) → (Ω̃, Ã).
µf is the law of f under µ. Then:

(i) h : (Ω̃, Ã) → (R̄, B̄), h ≥ 0 ⇒

∫ h dµf = ∫ (h ◦ f) dµ   (3.4)

(ii) h : (Ω̃, Ã) → (R̄, B̄). Then:

h is integrable with respect to µf ⇒ h ◦ f is integrable with respect to µ

In this case, (3.4) holds.

Proof: In 4 steps, see the definition of the integral at the beginning of the chapter. -

4 Markov inequality, Fatou’s lemma, dominated
convergence
Definition 4.1 (Ω, A, µ) measure space. We say, that a property E holds µ-almost
everywhere (µ-a.e.) (”µ-fast überall”) if {ω | E does not hold for ω} is a µ-nullset.
If P is a probability measure, we say that E holds P -almost surely (P -a.s.)
(”P -fast sicher”). In this case P ({ω | E holds for ω}) = 1.

Example f, g : (Ω, A) → (R̄, B̄).
f = g µ-a.e. means that µ({f ≠ g}) = 0.

Theorem 4.2 Let f, g : (Ω, A) → (R̄, B̄) and assume f = g µ-a.e. Then

f is µ-integrable ⇔ g is µ-integrable

and in this case: ∫ f dµ = ∫ g dµ.

Proof: {f^+ ≠ g^+} ∪ {f^- ≠ g^-} ⊆ {f ≠ g}. Hence w.l.o.g. f ≥ 0, g ≥ 0.
N := {f ≠ g} ∈ A since f, g are measurable.
h := ∞ · IN and hn := n · IN ր h.
∫ hn dµ = nµ(N) = 0 ⇒ ∫ h dµ = 0 (monotone convergence).
Since f ≤ g + h and g ≤ f + h, we have
f µ-integrable ⇔ g µ-integrable,
and if f is µ-integrable, ∫ f dµ = ∫ g dµ. -

Theorem 4.3 (Markov inequality)
f : (Ω, A) → (R̄, B̄), f ≥ 0, Bt = {f ≥ t} ∈ A (t > 0). Then

µ({f ≥ t}) ≤ (1/t) ∫ f dµ.   (4.1)

Proof: 0 ≤ t · I_{Bt} ≤ f · I_{Bt} ≤ f and ∫ t · I_{Bt} dµ = tµ(Bt).
Hence µ({f ≥ t}) ≤ (1/t) ∫ f · I_{Bt} dµ ≤ (1/t) ∫ f dµ. -
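A quick Monte Carlo illustration of (4.1) (the choice f(ω) = ω^2 with ω uniform on [0, 1] is arbitrary; then ∫ f dµ = 1/3):

    import random

    random.seed(0)
    N = 100_000
    samples = [random.random() ** 2 for _ in range(N)]  # f = omega^2, mu = U[0,1]
    integral = sum(samples) / N                         # ~ 1/3
    for t in (0.25, 0.5, 0.75):
        lhs = sum(s >= t for s in samples) / N          # mu({f >= t})
        print(t, lhs, integral / t)                     # lhs <= integral / t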

Consequences 4.4
1. f : (Ω, A) → (R̄, B̄), f ≥ 0. Then

∫ f dµ = 0 ⇔ f = 0 µ-a.e.

Proof: µ({f ≥ t}) ≤ (1/t) ∫ f dµ = 0 (t > 0),
hence µ({f > 0}) = µ(⋃_n {f ≥ 1/n}) ≤ Σ_n µ({f ≥ 1/n}) = 0
⇒ µ({f > 0}) = 0. -

2. f : (Ω, A) → (R̄, B̄).
Assume f is µ-integrable. Then:
µ({|f| = ∞}) = 0, i.e. |f| < ∞ µ-a.e.

Proof: µ({|f| = ∞}) ≤ µ({|f| ≥ n}) ≤ (1/n) ∫ |f| dµ → 0 as n → ∞, by (4.1). -

Lemma 4.5 (Fatou’s lemma)
(fn) a sequence of measurable functions, fn ≥ 0 ∀n. Then

∫ lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫ fn dµ   (4.2)

Proof: gn := inf_{k≥n} fk, n = 1, 2, . . .
Then 0 ≤ g1 ≤ g2 ≤ . . . and lim inf_{n→∞} fn = lim_{n→∞} gn.
Hence ∫ lim inf_{n→∞} fn dµ = ∫ lim_n gn dµ = lim_n ∫ gn dµ (monotone convergence) ≤ lim inf_{n→∞} ∫ fn dµ (since gn ≤ fn). -

Remark

1. It suffices to assume that fn ≥ g ∀n for some function g which is µ-integrable.
Proof: Apply the statement to fn − g. -

2. Without such an assumption, (4.2) does not always hold.
Example Ω = [0, 1], A = B[0,1], µ = λ|[0,1],
fn(ω) = −n^2 ω^n, 0 ≤ ω ≤ 1.
fn → 0 µ-a.e., hence ∫ lim inf_{n→∞} fn dµ = 0, but
lim inf_{n→∞} ∫ fn dµ = lim inf_{n→∞} ∫_0^1 (−n^2 x^n) dx = lim inf_{n→∞} (−n^2/(n+1)) = −∞.

Theorem 4.6 (Dominated convergence, Lebesgue’s Theorem)
f, f1, f2, . . . : (Ω, A) → (R̄, B̄).
Assume that f = lim_n fn µ-a.e. If there is a function g ≥ 0 which is integrable and

|fn| ≤ g µ-a.e., ∀n ≥ 1   (4.3)

then f is µ-integrable and we have

∫ f dµ = lim_{n→∞} ∫ fn dµ

Proof: see ”Measure Theory”. -

Remark If µ is a probability measure, (4.3) is satisfied if |fn | ≤ K µ-a.s. for some


constant K.

5 Lp-spaces, inequalities
(Ω, A, µ) a measure space, p ≥ 1.

Definition 5.1 f : (Ω, A) → (R̄, B̄) has a finite p-th absolute moment if ∫ |f|^p dµ < ∞.
L^p = L^p(Ω, A, µ) := {f : (Ω, A) → (R̄, B̄) | ∫ |f|^p dµ < ∞} is the collection of measurable functions on (Ω, A) with a finite p-th absolute moment.

Remark L^p(Ω, A, µ) is a vector space over R:
f, g ∈ L^p(Ω, A, µ) ⇒ αf ∈ L^p (α ∈ R), and
|f + g|^p ≤ (|f| + |g|)^p ≤ (2(|f| ∨ |g|))^p ≤ 2^p |f|^p + 2^p |g|^p
⇒ f + g ∈ L^p if f, g ∈ L^p.

Definition 5.2 Let X be a random variable on (Ω, A, P) (X : (Ω, A) → (R̄, B̄)).
E(|X|^p) = ∫ |X|^p dP is the p-th absolute moment of X.
X ∈ L^p(Ω, A, P) ⇔ E(|X|^p) < ∞.
In this case, E(X^p) exists and is finite. E(X^p) is the p-th moment of X.

Definition 5.3 f : (Ω, A) → (R̄, B̄) is µ-almost everywhere bounded if there is a constant K (0 ≤ K < ∞) such that µ({|f| > K}) = 0.
L^∞ = L^∞(Ω, A, µ) = {f : (Ω, A) → (R̄, B̄) | f µ-a.e. bounded}

Remark L^∞(Ω, A, µ) is a vector space over R.

Definition 5.4 (p-Norm)
For f : (Ω, A) → (R̄, B̄) we define
||f||_p = (∫ |f|^p dµ)^{1/p} and ||f||_∞ = inf{s > 0 : µ({|f| > s}) = 0}.
||f||_∞ is called the essential supremum of f with respect to µ and we write ||f||_∞ = esssup f.

Example 5.5 (Ω, A, µ) = (R, B, λ), f = I_Q ⇒ ||f||_∞ = 0

Next we prove that || · ||_p is a seminorm on L^p for p ∈ [1, ∞].
In order to achieve this, we need two inequalities:

Theorem 5.6 (Hölder’s inequality)
For p ∈ [1, ∞] let q ∈ [1, ∞] be such that 1/p + 1/q = 1. Then for f, g : (Ω, A) → (R̄, B̄) we have
∫ |fg| dµ ≤ (∫ |f|^p dµ)^{1/p} (∫ |g|^q dµ)^{1/q}, or short: ||fg||_1 ≤ ||f||_p ||g||_q.

The proof of Hölder’s inequality makes use of the following

Lemma 5.7 1 < p, q < ∞, 1/p + 1/q = 1. For all x, y ∈ [0, ∞] we have xy ≤ x^p/p + y^q/q.

Proof: If 0 < x, y < ∞, we consider the following picture:


(picture will be here soon) -
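Until the picture is inserted, here is a standard algebraic substitute (a sketch, not necessarily the argument intended by the picture): for 0 < x, y < ∞, the concavity of the logarithm with weights 1/p and 1/q gives

    \log\Big(\frac{x^p}{p} + \frac{y^q}{q}\Big)
      \ge \frac{1}{p}\log(x^p) + \frac{1}{q}\log(y^q) = \log(xy),

and taking exponentials yields xy ≤ x^p/p + y^q/q; the cases x, y ∈ {0, ∞} are checked directly.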

Proof of Hölder’s inequality: W.l.o.g. let 0 < ||f||_p, ||g||_q < ∞.
Due to Lemma 5.7 we have
(|f|/||f||_p) · (|g|/||g||_q) ≤ (1/p) · |f|^p/||f||_p^p + (1/q) · |g|^q/||g||_q^q.
Integrating with respect to µ yields
∫ |fg| dµ / (||f||_p ||g||_q) ≤ (1/p) · 1 + (1/q) · 1 = 1. -

Remark Hölder’s inequality still holds for p = 1, q = ∞, i.e. ||fg||_1 ≤ ||f||_1 ||g||_∞.
Proof: see exercises -
If p = q = 2, Hölder’s inequality is also called the Cauchy-Schwarz inequality.

Theorem 5.8 (Minkowski inequality)
Let p ∈ [1, ∞] and f, g : (Ω, A) → (R̄, B̄) such that f + g is well-defined. Then the triangle inequality in L^p holds, i.e. we have ||f + g||_p ≤ ||f||_p + ||g||_p.

Proof:

Step 1 p < ∞
We can restrict ourselves to the non-trivial case ||f||_p, ||g||_p < ∞ and thus ||f + g||_p < ∞. Moreover, w.l.o.g. we can assume that f, g are non-negative, since ||f + g||_p ≤ || |f| + |g| ||_p.
Further, for p = 1 we have || |f| + |g| ||_p = ||f||_p + ||g||_p, and thus we can assume p > 1.
We define q := 1/(1 − 1/p) ⇒ 1/p + 1/q = 1, and we have
∫ (f + g)^p dµ = ∫ f(f + g)^{p−1} dµ + ∫ g(f + g)^{p−1} dµ
≤ ||f||_p ||(f + g)^{p−1}||_q + ||g||_p ||(f + g)^{p−1}||_q   (Hölder)
= (||f||_p + ||g||_p) (∫ (f + g)^{(p−1)q} dµ)^{1/q}
= (||f||_p + ||g||_p) (∫ (f + g)^p dµ)^{1−1/p}   (since (p − 1)q = p).
Dividing by (∫ (f + g)^p dµ)^{1−1/p} (finite; if it is 0 the claim is trivial) yields the assertion.

Step 2 p = ∞
Again we assume that ||f||_∞, ||g||_∞ < ∞.
For all ε > 0 we have
µ({|f + g| > ||f||_∞ + ||g||_∞ + ε}) ≤ µ({|f| > ||f||_∞ + ε/2}) + µ({|g| > ||g||_∞ + ε/2}) = 0
⇒ µ({|f + g| > s}) = 0 ∀s > ||f||_∞ + ||g||_∞. -

Remark || · ||_p is a seminorm on L^p(Ω, A, µ):

• ||f||_p ≥ 0

• f ≡ 0 ⇒ ||f||_p = 0

• ||αf||_p = |α| ||f||_p

• ||f + g||_p ≤ ||f||_p + ||g||_p

But ||f||_p = 0 does not imply that we have f ≡ 0.
We have ||f||_p = 0 ⇔ f = 0 µ-a.e.

We define N0 := {f ∈ L^p : f = 0 µ-a.e.} and observe that N0 is a linear subspace of L^p.
Let L^p := L^p(Ω, A, µ)/N0 denote the quotient space of L^p with respect to N0.
L^p is a normed vector space over R with norm ||[f]||_p := ||f||_p for some f ∈ [f], where [f] denotes the equivalence class of f.

Definition 5.9 (L^p-convergence)
For p ∈ [1, ∞] let (fn)n∈N be a sequence of functions in L^p. We say (fn)n∈N converges in the p-th norm or in L^p to a limit f ∈ L^p if lim_{n→∞} ||fn − f||_p = 0.

Theorem 5.10 (Completeness of L^p)
Every Cauchy sequence in L^p converges in L^p to a limit f ∈ L^p, i.e. L^p equipped with the metric d(·, ·) defined by d(f, g) = ||f − g||_p is a complete metric space.
In particular, L^p(Ω, A, µ) is a Banach space.
Furthermore, for every Cauchy sequence (fn)n∈N in L^p there are f ∈ L^p and a subsequence (f_{n_k})k∈N such that f_{n_k} → f µ-a.e. as k → ∞.

Proof:

Step 1 We construct a subsequence (f_{n_k})k∈N which converges µ-a.e.
For k ∈ N there is nk ∈ N such that ||fn − fm||_p ≤ 2^{-k} ∀m, n ≥ nk.
We define gk := f_{n_{k+1}} − f_{n_k} and g := Σ_{k=1}^∞ |gk|.
||g||_p ≤ Σ_{k=1}^∞ ||gk||_p < ∞ (Theorem 5.8 / Lemma 5.12)
⇒ |g| < ∞ µ-a.e. ⇒ Σ_{k=1}^∞ gk converges absolutely µ-a.e.
⇒ the sequence (f_{n_k})k∈N converges µ-a.e., because Σ_{k=1}^m gk = f_{n_{m+1}} − f_{n_1}.
⇒ there is a set N ∈ A with µ(N) = 0 such that lim_{k→∞} f_{n_k} exists for all ω ∈ Ω\N.
We define f = lim_{k→∞} f_{n_k} · I_{Ω\N}, and this completes Step 1.

Step 2 We show that (fn)n∈N converges in L^p to f.
In order to do so, we need the following lemma.

Step 2 We show that (fn )n∈N converges in Lp to f .


In order to do so, we need the following lemma.

Lemma 5.11 Let (Ω, A, µ) be a measure space, p ∈ [1, ∞] and (fn) a sequence of R̄-valued, (A, B̄)-measurable functions such that:

• ∃g ≥ 0, g ∈ L^p, |fn| ≤ g µ-a.e. ∀n ∈ N

• fn → f µ-a.e. with f : (Ω, A) → (R̄, B̄)

Then we have f ∈ L^p, and
∫ |fn − f|^p dµ → 0 as n → ∞ (⇔ fn → f in L^p).

Proof: see exercises -

With the help of Lemma 5.11, we can conclude that (f_{n_k}) converges in L^p to f, since
|f_{n_k}| = |f_{n_k} − f_{n_{k−1}} + f_{n_{k−1}} − f_{n_{k−2}} + . . . + f_{n_2} − f_{n_1} + f_{n_1}| ≤ Σ_j |gj| + |f_{n_1}| ∈ L^p.
In other words, ||f_{n_k} − f||_p → 0 as k → ∞, and since (fn) is a Cauchy sequence,
||fn − f||_p → 0 as n → ∞. -

Lemma 5.12 Let (gn) be a sequence of non-negative measurable functions. Then
||Σ_n gn||_p ≤ Σ_n ||gn||_p.

Proof: hN = Σ_{n=1}^N gn ր h = Σ_{n=1}^∞ gn.
Hence (∫ hN^p dµ)^{1/p} ր (∫ h^p dµ)^{1/p} (monotone convergence).
But (∫ hN^p dµ)^{1/p} ≤ Σ_{n=1}^N ||gn||_p ≤ Σ_{n=1}^∞ ||gn||_p (Minkowski).
Hence sup_N (∫ hN^p dµ)^{1/p} ≤ Σ_{n=1}^∞ ||gn||_p. -

Example 5.13 (fn → f in L^p, but fn ↛ f µ-a.e.)
Ω = [0, 1], A = B[0,1] and µ = λ|[0,1].
Define a sequence (An) as follows:
m = 1: A1 = [0, 1)
m = 2: A2 = [0, 1/2), A3 = [1/2, 1)
m = 3: A4 = [0, 1/3), A5 = [1/3, 2/3), A6 = [2/3, 1)
· · ·
Define (fn) as fn := I_{An}, n ≥ 1, and f = 0.
Then (fn) converges to f in L^p, for 1 ≤ p < ∞.
Indeed, as soon as n > Σ_{j=1}^{m−1} j, we have ∫ |fn − f|^p dµ = λ(An) ≤ 1/m.
But fn ↛ f µ-a.e.
Indeed, ∀ω ∈ (0, 1): lim sup_n fn(ω) = 1, lim inf_n fn(ω) = 0.

Remark fn ↛ f in L^∞.
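The ”sliding intervals” of Example 5.13 are easy to enumerate explicitly; a Python sketch showing that a fixed ω keeps landing in the sets An (so fn(ω) does not converge) while λ(An) → 0:

    from fractions import Fraction

    def interval(n):
        """A_n for the block enumeration m = 1, 2, ...: block m consists of
        the m intervals [(i-1)/m, i/m), i = 1, ..., m."""
        m = 1
        while n > m:
            n -= m
            m += 1
        return Fraction(n - 1, m), Fraction(n, m)

    omega = Fraction(1, 3)
    hits = [n for n in range(1, 46)
            if interval(n)[0] <= omega < interval(n)[1]]
    print(hits)   # omega falls in one interval of every block: infinitely often
    print([float(interval(n)[1] - interval(n)[0]) for n in (1, 3, 6, 10, 15)])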

Example 5.14 (l^p-space)
(Ω, A) = (N, P(N)), µ the counting measure.
l^p := L^p(Ω, A, µ) = {x = (xk) ∈ R^N : ||x||_p < ∞}, where
||x||_p = (Σ_{k=1}^∞ |xk|^p)^{1/p} for 1 ≤ p < ∞,
||x||_∞ = sup_n {|xn|} for p = ∞.
From Theorem 5.10, (l^p, || · ||_p) is a Banach space for any 1 ≤ p ≤ ∞.
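A small numerical companion to Example 5.14 and Theorem 5.6: computing p-norms of finite sequences and checking Hölder's inequality (the vectors and exponents are arbitrary choices):

    def norm(x, p):
        """l^p-norm of a finite sequence (p = float('inf') for the sup-norm)."""
        if p == float("inf"):
            return max(abs(t) for t in x)
        return sum(abs(t) ** p for t in x) ** (1 / p)

    x = [1.0, -2.0, 0.5, 3.0]
    y = [0.3, 1.0, -4.0, 0.25]
    p, q = 3.0, 1.5                      # conjugate exponents: 1/3 + 2/3 = 1
    lhs = sum(abs(a * b) for a, b in zip(x, y))      # ||x*y||_1
    print(lhs, norm(x, p) * norm(y, q))              # lhs <= rhs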

6 Different types of convergence, uniform integrability
(Ω, A, P ) a probability space.
X, X1 , X2 , . . . random variables.

Definition 6.1
• We say that (Xn) converges in probability to X (notation: Xn → X in probability) if
∀ε > 0 : P(|Xn − X| ≥ ε) → 0 as n → ∞.

• We say that (Xn) converges almost surely (a.s.) to X if
P({ω : Xn(ω) → X(ω) as n → ∞}) = 1.

• We say that (Xn) converges in L^p to X if
E(|Xn − X|^p) → 0 as n → ∞ (1 ≤ p < ∞).

Theorem 6.2
i) If Xn → X in L^p, then Xn → X in probability.

ii) If Xn → X a.s., then Xn → X in probability.

Proof: For i), note that P(|Xn − X| ≥ ε) ≤ E(|Xn − X|^p)/ε^p → 0 as n → ∞.
For ii), we need

Lemma 6.3 Xn → X a.s. ⇔ ∀ε > 0 : P(sup_{k≥n} |Xk − X| ≥ ε) → 0 as n → ∞.

Thanks to Lemma 6.3 and the fact that
P(|Xn − X| ≥ ε) ≤ P(sup_{k≥n} |Xk − X| ≥ ε),
we have ∀ε > 0 : P(|Xn − X| ≥ ε) → 0 as n → ∞ if Xn → X a.s. -

Proof of Lemma 6.3:
”⇒”: Let ε > 0 and define
An = {sup_{k≥n} |Xk − X| ≥ ε},
C = {lim_{n→∞} Xn = X},
Bn = C ∩ An.
We have B1 ⊇ B2 ⊇ . . . and ⋂_{n=1}^∞ Bn = ∅.
The continuity from above of P implies that lim_n P(Bn) = 0 (see (1.2)).
On the other hand, P(C) = 1 ⇒ P(Bn) = P(An) ∀n ≥ 1.
Hence P(An) → 0 as n → ∞.
”⇐”: Define An and C as above and Dε = {lim sup_n |Xn − X| ≥ ε}.
We have Dε ⊆ An ∀n ≥ 1.
By hypothesis, P(An) → 0 as n → ∞, hence P(Dε) = 0 ∀ε > 0.
One can write C^c = ⋃_{k=1}^∞ {lim sup_n |Xn − X| ≥ 1/k} = ⋃_{k=1}^∞ D_{1/k}, so
0 ≤ P(C^c) ≤ Σ_{k=1}^∞ P(D_{1/k}) = 0 ⇒ P(C) = 1. -

Remark The converse implications of i) and ii) in Theorem 6.2 do not hold.

Example 5.13 (continued) Ω = [0, 1), A = B, P = λ, Xn = I_{An}, X ≡ 0.
We have Xn → X in L^p but Xn ↛ X a.s.
In this case, Xn → X in probability.

Example 6.4 Ω = [0, 1], A = B, P = λ, Xn(ω) = (n + 1)ω^n (n = 1, 2, . . . ), X(ω) = 0.
Then Xn → X a.s. ⇒ Xn → X in probability.
E(|Xn − X|^p) = (n + 1)^p ∫_0^1 x^{np} dx = (n + 1)^p/(np + 1) → 1 if p = 1, → ∞ if 1 < p < ∞.
In both cases, the limit is different from 0.
⇒ Xn ↛ X in L^p.

Theorem 6.5 The following assertions are equivalent:

a) Xn → X in probability.

b) Every subsequence (X_{n_k}) of (Xn) has a further subsequence (X_{ñ_k}) such that X_{ñ_k} → X a.s. as k → ∞.

Proof: a) ⇒ b):
Let (X_{n_k}) be a subsequence of (Xn).
Then there exists a subsequence (X_{ñ_k}) of (X_{n_k}) such that P(|X_{ñ_k} − X| ≥ 1/k) ≤ 1/k^2, k ≥ 1.
We now prove that ∀ε > 0, P(sup_{k≥n} |X_{ñ_k} − X| ≥ ε) → 0 as n → ∞.
Let ε > 0 and n large enough such that 1/n < ε. Then
P(sup_{k≥n} |X_{ñ_k} − X| ≥ ε) ≤ Σ_{k=n}^∞ P(|X_{ñ_k} − X| ≥ ε) ≤ Σ_{k=n}^∞ P(|X_{ñ_k} − X| ≥ 1/k) ≤ Σ_{k=n}^∞ 1/k^2 → 0 as n → ∞.
b) ⇒ a):
Let ε > 0 and define bn = P(|Xn − X| ≥ ε). (bn) is bounded.
Let (b_{n_k}) be a convergent subsequence of (bn).
Then there exists (b_{ñ_k}) such that X_{ñ_k} → X a.s., which implies that b_{ñ_k} → 0 as k → ∞ by Theorem 6.2 ii).
⇒ b_{n_k} → 0 as k → ∞.
Hence, every subsequence (b_{n_k}) of (bn) which converges, converges to 0.
⇒ bn → 0 as n → ∞. -

Remark We already know from Theorem 5.10 that for a sequence (Xn) with Xn → X in L^p (1 ≤ p ≤ ∞), there is a subsequence (X_{n_k}) such that X_{n_k} → X a.s.

Definition 6.6 We say that the sequence (Xn) is uniformly integrable if

lim_{c→∞} sup_n ∫_{{|Xn|≥c}} |Xn| dP = 0   (6.1)

(where ∫_{{|Xn|≥c}} |Xn| dP = E(|Xn| I_{{|Xn|≥c}})).

Remarks

1. Every uniformly integrable sequence (Xn) is bounded in L^1, that is,
sup_n ∫ |Xn| dP < ∞.
Indeed, ∫ |Xn| dP ≤ c + ∫_{{|Xn|≥c}} |Xn| dP.

2. |Xn| ≤ K a.s. ∀n ⇒ (Xn) is uniformly integrable.

Theorem 6.7 (Extension of Lebesgue’s Theorem / Dominated convergence)
For a sequence (Xn) of random variables which is uniformly integrable and converges to a random variable X in probability, we have Xn → X in L^1.
In particular, if Xn → X P-a.s. and (Xn) is uniformly integrable, then
lim_n E(Xn) = E(lim_n Xn) = E(X).

For the proof, we need the following lemma:

Lemma 6.8 If (Xn) is uniformly integrable, then there is, for each ε > 0, some δ = δ(ε) > 0 so that P(A) ≤ δ ⇒ sup_n ∫_A |Xn| dP ≤ ε.

Proof of Lemma 6.8: We have

∫_A |Xn| dP = ∫_{A∩{|Xn|<c}} |Xn| dP + ∫_{A∩{|Xn|≥c}} |Xn| dP ≤ cP(A) + ∫_{{|Xn|≥c}} |Xn| dP   (6.3)

and for c = c(ε) large enough and δ = ε/(2c), both terms on the right hand side of (6.3) are ≤ ε/2. -

Proof of Theorem 6.7: Xn → X in probability; w.l.o.g. X ≡ 0.
Then E(|Xn|) ≤ ε + ∫_{{|Xn|≥ε}} |Xn| dP. (Take c = ε, A = Ω in (6.3).)
But P(|Xn| ≥ ε) ≤ δ for n ≥ N0(δ) since Xn → 0 in probability, hence ∫_{{|Xn|≥ε}} |Xn| dP ≤ ε for n ≥ N0(δ(ε)) because of Lemma 6.8. Hence E(|Xn|) → 0 as n → ∞. -

Remark 6.9 X, X1 , X2 , . . . ∈ Lp (Ω, A, P ).


Then the following statements are equivalent:

a) Xn → X in Lp .

b) Xn → X in probability and the sequence (|Xn − X|p ) is uniformly integrable.

Proof: see exercises -

An example of a sequence which is not uniformly integrable is Example 6.4 or

Example 6.10 Y1, Y2, . . . fair coin tosses, i.e.
P(Yi = 0) = 1/2 = P(Yi = 1), P(Y1 = 0, . . . , Yn = 0) = 1/2^n.
T(ω) = min{n ≥ 1 | Yn(ω) = 1} is the waiting time for the first ”1”.
Let c > 2 be a constant, Xn := c^n I_{{T>n}}.
Then Xn → 0 P-a.s. as n → ∞ (since T < ∞ P-a.s.).
But E(|Xn|) = E(Xn) = c^n 2^{-n} → ∞ as n → ∞, since c > 2.
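A simulation of Example 6.10 (with c = 3, an arbitrary choice > 2), illustrating that Xn vanishes along almost every path while E(Xn) = (c/2)^n explodes — so the sequence cannot be uniformly integrable:

    import random

    random.seed(1)
    c, n_max, runs = 3.0, 12, 200_000
    mean_Xn = [0.0] * (n_max + 1)
    for _ in range(runs):
        T = 1
        while random.randint(0, 1) == 0:    # wait for the first "1"
            T += 1
        for n in range(1, n_max + 1):       # X_n = c^n on {T > n}, else 0
            if T > n:
                mean_Xn[n] += c ** n / runs
    print([round(m, 2) for m in mean_Xn[1:8]])   # grows like (3/2)^n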

A sequence (Xn) is uniformly integrable if it is bounded in L^p for some p > 1, i.e. sup_n E(|Xn|^p) < ∞, or if sup_n E(|Xn| log^+ |Xn|) < ∞.
More generally, we have the following theorem:

Theorem 6.11 Assume that g : R+ → R+ is increasing with lim_{x→∞} g(x)/x = ∞ and that sup_n E(g(|Xn|)) < ∞.
Then (Xn) is uniformly integrable.
Then (Xn ) is uniformly integrable.

Proof: For each δ > 0, there is c(δ) so that

|x| ≥ c(δ) ⇒ g(|x|)/|x| ≥ 1/δ   (6.4)

Hence, sup_n ∫_{{|Xn|≥c(δ)}} |Xn| dP ≤ δ sup_n ∫ g(|Xn|) dP ≤ ε for δ = δ(ε), using (6.4) and sup_n E(g(|Xn|)) < ∞. -

Then take g(x) = x^p (p > 1) or g(x) = x log^+ x to get the statements above.

7 Independence
(Ω, A, P ) probability space.

Definition 7.1 A collection Ai (i ∈ I) of sets in A is (stochastically) independent if for each finite J ⊆ I:

P(⋂_{i∈J} Ai) = Π_{i∈J} P(Ai)   (7.1)

Sets in A are also called ”events”.
A collection Gi (i ∈ I) with Gi ⊆ A (i ∈ I) is independent, if for each choice Ai ∈ Gi (i ∈ I), the events Ai (i ∈ I) are independent.

Remark 7.2 In particular, two events A and B are independent if P(A ∩ B) = P(A)P(B).
Note that there may be probability measures Q on (Ω, A) with Q(A ∩ B) ≠ Q(A)Q(B).
Hence, independence is not a property of A and B alone but involves P.

Theorem 7.3 Assume Gi (i ∈ I) is an independent collection of systems of sets, Gi ⊆ A (i ∈ I), and Gi is ∩-stable (i ∈ I). Then:

(i) The σ-fields σ(Gi) (i ∈ I) are independent, too.

(ii) For a partition Jk (k ∈ K) of I (partitioning I into pairwise disjoint subsets), the σ-fields σ(⋃_{i∈Jk} Gi) (k ∈ K) are independent, too.

Example Ω = {0, 1}^N, A = product-σ-field, Xi(ω) = xi, where ω = (x1, x2, . . . ).
P a probability measure on (Ω, A) such that the events {Xi = 1} (i = 1, 2, . . . ) are independent.
Due to Theorem 7.3 (i), the events {Xi = 0} (i = 1, 2, . . . ) are independent, too.
Due to Theorem 7.3 (ii), the events Ak = {X_{kN+1} = x1, . . . , X_{(k+1)N} = xN}
(”in the period [kN + 1, (k + 1)N] the binary text (x1, . . . , xN) appears”)
(k = 0, 1, 2, . . . ) are independent, too.

Proof of Theorem 7.3:

(i) Take J = {i1, . . . , ik} and Aij ∈ σ(Gij).
We have to show: P(Ai1 ∩ . . . ∩ Aik) = P(Ai1) · . . . · P(Aik). (∗)
Fix Ai2 ∈ Gi2, . . . , Aik ∈ Gik and let Di1 = {A ∈ σ(Gi1) | (∗) holds with Ai1 = A}.
Due to the hypothesis of the theorem, Gi1 ⊆ Di1.
Further, Di1 is a λ-system: for instance, if A ∈ Di1 then
P(A^c ∩ Ai2 ∩ . . . ∩ Aik) = P(Ai2 ∩ . . . ∩ Aik) − P(A ∩ Ai2 ∩ . . . ∩ Aik)
= (1 − P(A)) P(Ai2) · . . . · P(Aik) = P(A^c) P(Ai2) · . . . · P(Aik) ⇒ A^c ∈ Di1.
Due to Lemma 1.12, Di1 = σ(Gi1).
Now let Di2 = {A ∈ σ(Gi2) | (∗) holds with Ai1 ∈ σ(Gi1), Ai2 = A, Ai3 ∈ Gi3, . . . , Aik ∈ Gik} and iterate the argument.

(ii) The collections Ck := {⋂_{i∈J} Ai | J ⊆ Jk, J finite, Ai ∈ Gi} (k ∈ K) are ∩-stable and independent.
For Ckj = ⋂_{i∈Jkj} Ai ∈ Ckj, we have
P(Ck1 ∩ . . . ∩ Ckn) = P(⋂_{i∈Jk1} Ai ∩ . . . ∩ ⋂_{i∈Jkn} Ai) = Π_{i∈Jk1} P(Ai) · . . . · Π_{i∈Jkn} P(Ai) = P(Ck1) · . . . · P(Ckn).
Due to (i), the generated σ-fields σ(Ck) = σ(⋃_{i∈Jk} Gi) (k ∈ K) are independent, too. -

Lemma 7.4 (Borel-Cantelli-Lemma)

(i) (Ak)k≥1 a sequence of events. Then
Σ_{k=1}^∞ P(Ak) < ∞ ⇒ P(⋂_n ⋃_{k≥n} Ak) = 0.

(ii) (Ak)k≥1 a sequence of independent events. Then
Σ_{k=1}^∞ P(Ak) = ∞ ⇒ P(⋂_n ⋃_{k≥n} Ak) = 1.

Note that ⋂_n ⋃_{k≥n} Ak is the event that Ak happens for infinitely many k.
n k≥n

Remark 7.5 For an arbitrary sequence (An)n∈N of measurable sets in some measure space we define
lim sup_{n→∞} An := ⋂_{n∈N} ⋃_{k≥n} Ak and
lim inf_{n→∞} An := ⋃_{n∈N} ⋂_{k≥n} Ak.
It is an easy exercise to show that lim inf_{n→∞} An ⊆ lim sup_{n→∞} An and that
(lim sup_{n→∞} An)^c = lim inf_{n→∞} An^c.
lim sup_{n→∞} An = {ω ∈ Ω : ω ∈ An for infinitely many n ∈ N},
lim inf_{n→∞} An = {ω ∈ Ω : ω ∈ An for all but finitely many n ∈ N}.

Proof of Lemma 7.4:

(i) Let Bn = ⋃_{k≥n} Ak.
Then we have lim sup_{n→∞} An = ⋂_{n∈N} Bn and thus lim sup_{n→∞} An ⊆ Bn ∀n ∈ N.
⇒ P(lim sup_{n→∞} An) ≤ P(Bn) ≤ Σ_{k=n}^∞ P(Ak) → 0 as n → ∞, since Σ_{k=1}^∞ P(Ak) < ∞
⇒ P(lim sup_{n→∞} An) = 0.

(ii) It suffices to show that P(⋃_{k≥n} Ak) = 1 ∀n ∈ N, or equivalently
P(⋂_{k≥n} Ak^c) = 0 ∀n ∈ N.
Due to Theorem 7.3 the events (An^c)n∈N are also independent, and hence we have
P(⋂_{k≥n} Ak^c) = lim_{m→∞} P(⋂_{k=n}^m Ak^c)   (Thm. 1.17 (f))
= lim_{m→∞} Π_{k=n}^m P(Ak^c) = lim_{m→∞} Π_{k=n}^m (1 − P(Ak)) ≤ lim inf_{m→∞} e^{−Σ_{k=n}^m P(Ak)} = 0,
using 1 − x ≤ e^{-x}. -
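Both halves of the Borel-Cantelli-Lemma can be illustrated with independent events Ak = {Uk ≤ pk} for independent uniforms Uk (a single simulation run; pk = 1/k^2 is summable, pk = 1/k is not):

    import random

    random.seed(0)
    K = 100_000
    for label, p in (("1/k^2", lambda k: 1 / k ** 2),
                     ("1/k", lambda k: 1 / k)):
        occurred = [k for k in range(1, K + 1) if random.random() <= p(k)]
        # summable case: a few early hits, then none (part (i));
        # divergent case: hits keep appearing all the way up to K (part (ii))
        print(label, len(occurred), occurred[-1] if occurred else None)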

A very remarkable consequence of Theorem 7.3 is the following.

Theorem 7.6 (Kolmogorov’s 0-1-law)
Let (Ai)i∈I be a countable collection of independent σ-algebras. Further let
A∞ := ⋂_{J⊆I, |J|<∞} σ(⋃_{i∉J} Ai)
denote the corresponding tail σ-algebra (”terminale σ-Algebra”). Then we have
A ∈ A∞ ⇒ P(A) ∈ {0, 1}.

Interpretation of A∞

1) Dynamical:
If we interpret I as a sequence {1, 2, 3, . . . } of points in time and An as the σ-algebra of all events observable at time n ∈ N, then we have
A∞ =: A* = ⋂_{n=1}^∞ σ(⋃_{k≥n} Ak) = ⋂_{n=1}^∞ σ(An, An+1, . . . ).
A* can be interpreted as the σ-algebra of all events observable ”in the infinitely distant future”.
Example Let (Xn)n∈N be a sequence of random variables on (Ω, A).
We define An := σ(X1, X2, . . . , Xn); then we have A* = ⋂_{n=1}^∞ σ(Xn, Xn+1, . . . ).
The events {lim_{n→∞} (Σ_{k=1}^n Xk)/cn exists and is ≤ t}, {lim sup_{n→∞} (Σ_{k=1}^n Xk)/cn ≤ t} for cn, t ∈ R with cn ր ∞ are elements of A*.
Due to Kolmogorov’s 0-1-law we have P(A) ∈ {0, 1} ∀A ∈ A* if the random variables X1, X2, . . . are independent.

2) Static:
We interpret I as the set of ”subsystems” which act independently of each other and Ai as the σ-algebra of events which only depend on the i-th subsystem.
Then A∞ is the collection of all ”macroscopic” events which do not depend on finitely many subsystems. Thus, if the subsystems are independent, we know that on this ”macroscopic scale” the whole system is deterministic.

Proof of Theorem 7.6:

Step 1 For every finite set J ⊆ I, the collection Aj (j ∈ J), ⋃_{i∉J} Ai is independent.
Due to Theorem 7.3, Aj (j ∈ J), σ(⋃_{i∉J} Ai) are also independent (note that each Ai is ∩-stable). Since A∞ ⊆ σ(⋃_{i∉J} Ai) for all finite sets J ⊆ I, we have that Ai (i ∈ I), A∞ are independent.

Step 2 Again with the help of Theorem 7.3, we can conclude that σ(⋃_{i∈I} Ai) and A∞ are independent.

Step 3 Let A ∈ A∞. Then A is also an element of σ(⋃_{i∈I} Ai), since
⋂_{J⊆I, |J|<∞} σ(⋃_{i∉J} Ai) ⊆ σ(⋃_{i∈I} Ai).
Therefore, Step 2 of this proof implies P(A) = P(A ∩ A) = P(A)P(A)
(A is independent of itself)
⇒ P(A) ∈ {0, 1}. -

Remark 7.7 Let (Ai)i∈I be a countable collection of independent σ-algebras and let Ai ∈ Ai (i ∈ I).
Then Ā := ⋂_{J⊆I, |J|<∞} ⋃_{i∉J} Ai is an element of A∞ and thus P(Ā) ∈ {0, 1},
and the Borel-Cantelli-Lemma can be used to decide whether P(Ā) = 1 or P(Ā) = 0.

Definition 7.8 A collection of random variables (Xi)i∈I is independent if the σ-algebras generated by the random variables Xi are independent, i.e. if σ(Xi) (i ∈ I) are independent.

8 Kernels and Fubini’s Theorem
Let (Ω1, F1) and (Ω2, F2) be measurable spaces.

Definition 8.1 A transition kernel or stochastic kernel K(x1 , dx2 ) from (Ω1 , F1 ) to
(Ω2 , F2 ) is a function K : Ω1 × F2 → [0, 1], (x1 , A2 ) 7→ K(x1 , A2 ) such that

i) K(x1 , ·) is a probability measure on (Ω2 , F2 ), ∀x1 ∈ Ω1 .

ii) K(·, A2 ) is (F1, B[0,1] )-measurable ∀A2 ∈ F2 .

Interpretations of kernels

1. Dynamic:
(Ωi , Fi ) = state space of period/time i.
K(x1 , ·) = law of the state at time 2 if state at time 1 was x1 .

2. Static:
(Ωi , Fi ) = state spaces of system number i.
K(x1 , ·) = law of the state of system number 2 given that system number 1 is in
state x1 .

Examples

1. K(x1 , ·) ≡ P2 , P2 probability measure on (Ω2 , F2) (”no information”)

2. K(x1 , ·) = δT (x1 ) where T : (Ω1 , F1) → (Ω2 , F2 ) (”full information”)

3. Markov chain
Ω1 = Ω2 = S, S countable.
F1 = F2 = P(S).
⇒ K is given by weights k(x, y) (x, y ∈ S), and (k(x, y))_{x,y∈S} is a stochastic matrix:
k(x, y) ≥ 0 ∀x, y ∈ S, Σ_{y∈S} k(x, y) = 1 ∀x ∈ S.
Example S = {0, 1} with matrix
( α   1−α )
( 1−α   α )   (0 < α < 1),
i.e. k(x, y) = α I_{{x=y}} + (1 − α) I_{{x≠y}}.
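The two-state kernel above, written as a stochastic matrix, together with the weights p(x1, x2) = p1(x1)k(x1, x2) of P1 × K from the discrete case treated below (α and p1 are arbitrary choices, not from the script):

    alpha = 0.7
    k = {(x, y): alpha if x == y else 1 - alpha
         for x in (0, 1) for y in (0, 1)}
    assert all(abs(k[x, 0] + k[x, 1] - 1) < 1e-12 for x in (0, 1))  # rows sum to 1

    p1 = {0: 0.4, 1: 0.6}                            # law of the first component
    p = {(x1, x2): p1[x1] * k[x1, x2]
         for x1 in (0, 1) for x2 in (0, 1)}          # weights of P1 x K
    p2 = {x2: p[0, x2] + p[1, x2] for x2 in (0, 1)}  # second marginal
    print(p)
    print(p2)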

Let P1 be a probability measure on (Ω1 , F1 ) and K a stochastic kernel from (Ω1 , F1 ) to


(Ω2 , F2 ).
Goal: construct a probability measure P on (Ω, F ) where Ω = Ω1 × Ω2 , F = F1 × F2 such
that K(x1 , ·) can be interpreted as the conditional distribution of the second component,
given the first one.

A Discrete case
Ωi countable, Fi = P(Ωi) (i = 1, 2).
⇒ Ω = Ω1 × Ω2 is countable, and the probability measure P on (Ω, F) (F = P(Ω)) is given by the weights p(x1, x2) = p1(x1)k(x1, x2), where p1(x1) = P1({x1}).
For each function f : Ω → [0, ∞], we then have
∫ f dP = Σ_{x1,x2} f(x1, x2) p(x1, x2) = Σ_{x1} (Σ_{x2} f(x1, x2) k(x1, x2)) p1(x1) = ∫ (∫ f(x1, x2) K(x1, dx2)) P1(dx1).
B General case
Let Ω = Ω1 × Ω2 , F = F1 × F2 .

Theorem 8.2 Assume P1 is a probability measure on (Ω1, F1) and K a stochastic kernel from (Ω1, F1) to (Ω2, F2). Then there is a probability measure P on (Ω, F) such that

∫ f dP = ∫ (∫ f(x1, x2) K(x1, dx2)) P1(dx1)   (8.1)

∀f : (Ω, F) → (R̄, B̄), f ≥ 0.
In particular,

P(A) = ∫ K(x1, A_{x1}) P1(dx1) (A ∈ F),   (8.2)

where A_{x1} = {x2 ∈ Ω2 | (x1, x2) ∈ A}, and

P(A1 × A2) = ∫_{A1} K(x1, A2) P1(dx1) (A1 ∈ F1, A2 ∈ F2).   (8.3)

P is uniquely determined by (8.3).

Proof: The uniqueness of P follows from Theorem 1.18, since the sets of the form A1 × A2 generate F and they are an ∩-stable collection of sets.
We show that the right hand sides of (8.1) and (8.2), respectively, are well-defined.

Step 1 For x1 ∈ Ω1, take φ_{x1}(x2) := (x1, x2).
Since φ_{x1}^{-1}(A1 × A2) = ∅ if x1 ∉ A1 and = A2 if x1 ∈ A1, we have φ_{x1} : (Ω2, F2) → (Ω, F).
Hence, for any function f : (Ω, F) → (R̄, B̄), the function f_{x1} = f ◦ φ_{x1}, i.e. f_{x1}(x2) = f(x1, x2), is measurable: f_{x1} : (Ω2, F2) → (R̄, B̄).
In particular, A ∈ F ⇒ A_{x1} ∈ F2 ∀x1 ∈ Ω1.

Step 2 For f : (Ω, F) → (R̄, B̄), f ≥ 0, the function x1 ↦ ∫ f(x1, x2) K(x1, dx2) is well-defined due to Step 1.
We show that this function is (F1, B̄)-measurable. Follow the definition of the integral: take first f = IA, then f ≥ 0 with finitely many values, then f ≥ 0.
Let D be the collection of all A ∈ F for which x1 ↦ K(x1, A_{x1}) is (F1, B̄)-measurable.
• D is a λ-system.
• D contains all sets of the form A = A1 × A2, since K(x1, A_{x1}) = I_{A1}(x1) K(x1, A2); hence x1 ↦ K(x1, A_{x1}) is (F1, B̄)-measurable as a product of two measurable functions.
⇒ D = F (Lemma 1.12).

Step 3 Due to Step 2, the right hand sides of (8.1) and (8.2), respectively, are well-defined.
(8.2) defines a probability measure on (Ω, F):
(i) P(Ω) = ∫ K(x1, Ω2) P1(dx1) = 1, since K(x1, Ω2) = 1.
(ii) A1, A2, . . . ∈ F, Ai ∩ Aj = ∅ for i ≠ j ⇒ (⋃_{i=1}^∞ Ai)_{x1} = ⋃_{i=1}^∞ (Ai)_{x1} (a disjoint union)
⇒ P(⋃_{i=1}^∞ Ai) = ∫ K(x1, ⋃_{i=1}^∞ (Ai)_{x1}) P1(dx1) = Σ_{i=1}^∞ ∫ K(x1, (Ai)_{x1}) P1(dx1) = Σ_{i=1}^∞ P(Ai),
using K(x1, ⋃_i (Ai)_{x1}) = Σ_i K(x1, (Ai)_{x1}) and monotone convergence.
⇒ P is a probability measure.
P satisfies (8.1), since for f = IA, (8.1) is (8.2), and for general f one proceeds as in the definition of the integral. -

The probability measure P in (8.2) is denoted P1 × K.
If Xi(ω) = xi (ω = (x1, x2)) is the projection of Ω to Ωi (i = 1, 2), then X1 has the law P1: P(X1 ∈ A1) = P(A1 × Ω2) = P1(A1) ∀A1 ∈ F1.
The law of X2 is given by
P(X2 ∈ A2) = P(Ω1 × A2) = ∫ K(x1, A2) P1(dx1) =: P2(A2), ∀A2 ∈ F2.

Definition 8.3 We say that P1 and P2 are the marginals of P (”Randverteilungen”).
The stochastic kernel K(·, ·) is also called the conditional distribution of X2 given X1, and we write

P(X2 ∈ A2 | X1 = x1) = K(x1, A2) (x1 ∈ Ω1, A2 ∈ F2)   (8.4)

or P(X2 ∈ · | X1 = x1) = K(x1, ·) = conditional distribution of X2, given X1(ω) = x1.

If Ω1 is countable, the left hand side of (8.4) can be defined in an elementary way, namely
P(X2 ∈ A2 | X1 = x1) = P(X2 ∈ A2, X1 = x1)/P(X1 = x1) = P({x1} × A2)/P1({x1})
(provided that P1({x1}) > 0), and (8.4) can be proved:
P({x1} × A2) = P1({x1}) K(x1, A2) by (8.3) ⇒ (8.4).
In general, this is not possible (for instance, P1({x1}) can be = 0 ∀x1 ∈ Ω1!) and then we can take (8.4) as the definition of P(X2 ∈ A2 | X1 = x1).

38
Example 8.4 (Classical case)
Assume K(x1 , ·) ≡ P2 (”no information”).
Then we write P = P1 ⊗ P2 and P is the product (”Produktmaß”) of P1 and P2 .
We then have

P1 ⊗ P2 (A1 × A2 ) = P1 (A1 )P2 (A2 ) = P2 ⊗ P1 (A2 × A1 ) (8.5)

and (8.5) and (8.1) imply the classical

Fubini Theorem For f : (Ω, F ) → (R̄, B̄), f ≥ 0,


Z Z Z  Z Z 
f d(P1 ⊗ P2 ) = f (x1 , x2 )P2 (dx2 ) P1 (dx1 ) = f (x1 , x2 )P1 (dx1 ) P2 (dx2 )
(8.6)

Remark In fact, (8.6) remains true for σ-finite measures P1 and P2 .

Example 8.5 (Uniform distribution on [0, 1]2 )


Take (Ω1 , F1 ) = (Ω2 , F2 ) = ([0, 1], B[0,1] ),
P1 = λ|[0,1] = U[0, 1] = uniform distribution on [0, 1], K(x1 , dx2 ) ≡ λ|[0,1] = U[0, 1].
Then, X2 has the law P2 = P1 and X1 , X2 are independent.

Example 8.6 (Ω1 , F1 ), (Ω2 , F2), P1 as in Example 8.5., K(x1 , ·) = δx1 , ∀x1 .
Then, X2 = X1 P -a.s. and P2 = P1 = U[0, 1].
1
Example 8.7 (Ω1 , F1 ), (Ω2 , F2), P1 as in Example 8.5, K(x1 , dx2 ) = λ|
x1 [0,x1 ]
.
Rx1
K(x1 , ·) = x11 λ|[0,x1 ] = U[0, x1 ] (or K(x1 , A) = x11 IA (u) du)
0
We compute P2 :
R1
P2 (A2 ) = P (Ω × A2 ) = K(x1 , A2 ) dx1
0
R1 Rt R1 t
Take A2 = [0, t]. Then K(x1 , A2 ) dx1 = 1 dx1 + x1
dx1 = t − log t.
0 0 t
⇒ X2 has the cdf F2 (t) = t − t log t (0 ≤ t ≤ 1).

Remark X2 has the density f2 (x2 ) = − log x2 .


P is concentrated on the set {(x1 , x2 ) | 0 ≤ x2 ≤ x1 ≤ 1}.

Example 8.8 (Uniform distribution on A = {(x1 , x2 ) | 0 ≤ x2 ≤ x1 ≤ 1})


(Ω1 , F1 ), (Ω2 , F2 ) as in Example 8.5, P1 is the probability measure on (Ω1 , F1 ) with
density f1 (x1 ) = 2x1 and K(x1 , ·) = U[0, x1 ] as in Example 8.7.
R1
Then compute P2 : P2 (A2 ) = P (Ω1 × A2 ) = K(x1 , A2 )P1 (dx1 ).
0
Take A2 = [0, t].
R1 Rt R1 t
P2 (A2 ) = 2 K(x1 , [0, t])x1 dx1 = 2 x1 dx1 + 2 x
x1 1
dx1 = . . . = 2(t − 12 t2 )
0 0 t
⇒ the random variable X2 has the density f2 (x2 ) = 2(1 − x2 ).

39
Claim P is the uniform distribution on A, i.e. P (B) = 2λ2 (B ∩ A), ∀B ⊆ [0, 1]2 , B ∈
B[0,1]2 .

Proof: Suffices to consider B = B1 × B2 , B1 = [0, b1 ], B2 = [0, b2 ].


Assume b1 > b2 . Then
R Rb2 Rb1 Rb2 Rb2
P (B) = K(x1 , B2 )P1 (dx1 ) = 2 1P1 (dx1 ) + 2 xb21 P1 (dx1 ) = 2 x1 dx1 + 2 b2 dx1 =
B1 0 b2 0 b1
2( 21 b22 ) + (b1 − b2 )b2 = 2λ2 (B ∩ A). -

Example 8.9 (2-dimensional Gaussian law (”2-dim. Normalverteilung”))


R x2
1
(Ω1 , F1 ) = (Ω2 , F2 ) = (R, B), P1 = N(0, 1) i.e. P1 (A1 ) = √12π e− 2 dx1 (A1 ∈ B1 )
A1
R (x2 −ρx1 )2
1 −
Let 0 ≤ ρ ≤ 1 and K(x1 , ·) = N(ρx1 , 1 − ρ2 ) i.e. K(x1 , A2 ) = √ e 2(1−ρ2 )
2π(1−ρ2 )
A2

Claim P2 = N(0, 1) (no matter what ρ is!)


R∞
Proof: P2 (A2 ) = K(x1 , A2 )P1 (dx1 ). Take A2 = (−∞, t].
−∞
 
R∞ Rt 1
(x −ρx1 )2
− 2 1
x2
− 21
P2 (A2 ) = √ e 2(1−ρ ) dx
2
2 √ e dx1
2π(1−ρ2 ) 2π
−∞ −∞
t
∞ (x −ρx1 )2 +(1−ρ2 )x2

Fubini R R 1 − 2 1
1
= √ e 2(1−ρ ) 2 √ dx1 dx2
2π(1−ρ2 ) 2π
−∞
 ∞ −∞ 
t
Z 2 x2 (1−ρ2 )
R 1 − 1
(x −ρx 2 )
1 − 2 Rt x22
=  p e 2(1−ρ 2)
dx1 √2π e 2(1−ρ2 ) dx2 =
 √1 e− 2 dx2

−∞ 2π(1 − ρ2 ) −∞
−∞
| {z }
=1
x22
⇒ X2 has the density f2 (x2 ) = √1 e− 2 i.e. X2 has the law N(0, 1). -

Note that X1 , X2 are independent ⇔ ρ = 0.


ρ is the correlation (”Korrelationskoeffizient”).
ρ(x1 , x2 ) = √ Cov(X1 ,X2 ) , since V ar(X1 ) = V ar(X2 ) = 1 and
V ar(X1 )V ar(X2 )
(exercise!)
Cov(X1, X2 ) = E(X1 X2 ) − E(X1 ) E(X2 ) = ρ.
| {z } | {z } | {z }
=ρ =0 =0
!
1 ρ
P is the 2-dim. Gaussian law with expectation (0, 0) and covariance matrix .
ρ 1
Question: Does a probability measure P on (Ω1 × Ω2 , F1 × F2 ) always have the form
P1 × K with a probability measure P1 on (Ω1 , F1 ) and a stochastic kernel K from (Ω1 , F1 )
to (Ω2 , F2)?
Answer: Under (mild) regularity assumption on (Ω1 , F1 ), (Ω2 , F2): Yes!
In general: No!
See literature.

40
9 Absolute continuity, Radon-Nikodym derivatives
Recall that a real-valued random variable X on (Ω, A, P ) has the density f if

Zb Z
P (X ∈ [a, b]) = PX ([a, b]) = f (x) dx = f (x) dx.
a [a,b]

We will generalize this notion of density.


Let (Ω, A, µ) be a measure space. Recall that

E ∗ = {f : Ω → R̄ | f ≥ 0, f : (Ω, A) → (R̄, B̄)}.

Theorem 9.1 Let f ∈ E ∗ and define


Z
ν(A) := f dµ (A ∈ A). (9.1)
A

Then, ν is a measure with density f with respect to µ.

Proof: We have ν(∅) = 0, ν(A) ≥ 0, ∀A ∈ A. Remains to show that νis σ-additive.


 Take

S R S R m
P mon.
A = · An , An ∈ A, ∀n. Then ν(A) = f I · ∞ n=1 An
dµ = lim f IAn dµ =
n=1 m→∞ n=1 conv.
Pm ∞ R
P ∞
P
lim f IAn dµ = f IAn dµ = ν(An ). -
m→∞ n=1 n=1 n=1

If (Ω, A, P ) = (Rk , Bk , λk ) and f satisfies (9.1) we say that f is the Lebesgue-density of ν.


If ν = PX for some random variable X (ν is the law of X) we say that f is the Lebesgue-
density or density of X.
R
Theorem 9.2 Assume f ∈ E ∗ and define ν by ν(A) = f dµ (A ∈ A). Then
A
Z Z
a) ∀h ∈ E , ∗
h dν = hf dµ (9.2)

b) For h : (Ω, A) → (R̄, B̄), we have


h is integrable with respect to ν ⇔ hf is integrable with respect to µ
and in this case (9.2) holds.

Proof: see exercise 20. -


P
Example 9.3 Ω = Z, A = P(Z) and µ = δω counting measure on (Ω, A).
P ω∈Z R
Let f : Ω → [0, ∞] be such that f (ω) = 1 and define P by P (A) = f dµ (A ∈ A).
ω∈Z A
Then, P is a discrete probability measure with P ({ω}) = f (ω), ∀ω ∈ Z.
Let X : (Ω, A) → (R, B) be a random variable. If E(X) is defined,
Z Z X
Thm. 9.2
E(X) = X dP = Xf dµ = X(ω)f (ω).
ω∈Z

41
Theorem 9.4 (Uniqueness of densities)
Assume f, g ∈ E ∗ . Then:
R R
a) f = g µ-a.e. ⇒ f dµ = g dµ, ∀A ∈ A.
A A
R R
b) f integrable with respect to µ and f dµ = g dµ, ∀A ∈ A ⇒ f = g µ-a.e.
A A

Proof:

a) f = g µ-a.e. ⇒ f IA = gIA µ-a.e., ∀A ∈ A and the claim follows with Theorem 4.2.
R R
b) f dµ = g dµ ⇒ g is integrable as well.
Ω Ω
Define N := {f > g}, h := f IN − gIN ≥ 0. Then,
R R R
f IN dµ = gIN dµ ⇒ h dµ = 0 ⇒ h = 0 µ-a.e. ⇒ µ(N) = 0.
In the same way, µ({g > f }) = 0 ⇒ f = g µ-a.e. -

Example 9.5 b) does not hold without the assumption that f is integrable.
c
 countable, A = {A ⊆ Ω | A countable or A countable},
Take Ω not
0 A is countable
µ(A) =
∞ Ac is countable.
µ is a measure on (Ω, A).
Take f (ω) = 1, g(ω) = 2, ∀ω ∈ Ω.
R R
Then, f dµ = µ(A) = 2µ(A) = g dµ, ∀A ∈ A but µ({f 6= g}) = µ(Ω) = ∞.
A A

Question: For two measures ν, µ on (Ω, A), is there f : (Ω, A) → ([0, ∞], B̄) such that
R
ν(A) = f dµ, ∀A ∈ A?
A

Definition 9.6 Assume ν, µ are measures on (Ω, A). We say that ν is absolutely contin-
uous with respect to µ if ∀A ∈ A : µ(A) = 0 ⇒ ν(A) = 0.
We write ν ≪ µ. In words: ν ≪ µ means that each µ-nullset is also a ν-nullset.

Remark 9.7 If ν has a density f with respect to µ and µ(A) = 0, then


R
ν(A) = f IA dµ = 0 ⇒ ν ≪ µ.
|{z}
=0 µ-a.e.

Theorem 9.8 (Radon-Nikodym Theorem)


Assume ν and µ are σ-finite measures on (Ω, A).
Then the following two statements are equivalent:

a) ν has the density f with respect to µ.

b) ν ≪ µ.

f is also denoted Radon-Nikodym derivative and we write f = dµ
.

42
Proof:
a) ⇒ b) see Remark 9.7.
b) ⇒ a) will need the following lemma.

Lemma 9.9 Let η and τ be finite measures on (Ω, A) such that η(Ω) < τ (Ω). Then,
there is Ω∗ ∈ A with η(Ω∗ ) < τ (Ω∗ ) and η(A) ≤ τ (A), ∀A ∈ A with A ⊆ Ω∗ .

Proof: Define the function ψ = τ − η.


For A ∈ A, have −η(A) ≤ ψ(A) ≤ τ (A) ⇒ ψ : A → R is bounded.
Define the sets Ωn , An (n ∈ N) as follows:
A1 = ∅, Ω1 = Ω\A1 = Ω.
If A1 , . . . , An , Ω1 , . . . , Ωn are constructed, take
αn := inf{ψ(A) | A ∈ A, A ⊆ Ωn } (αn ≤ ψ(∅) = 0)

Case 1: αn = 0.
Then An+1 := ∅, Ωn+1 := Ωn \An+1 = Ωn ⇒ αk = 0, ∀k ≥ n.

Case 2: αn < 0.
Take An+1 ∈ A, An+1 ⊆ Ωn with ψ(An+1 ) < α2n and take Ωn+1 := Ωn \An+1 .
With this construction
∞ we
 have
A , . . . ∈ A, Ai ∩ Aj = 0 for i 6= j and
1 , A2
P∞ S ∞
S
|ψ(An )| ≤ τ · An + η · An < ∞.
n=1 n=1 n=1
⇒ lim ψ(An ) = 0 ⇒ lim αn = 0.
n→∞ n→∞

T

Define Ω := Ωn and show that the statements in Lemma 9.9 are satisfied
n=1
with Ω∗ . We have Ω1 ⊇ Ω2 ⊇ . . . ,
ψ(Ω∗ ) = τ (Ω∗ ) − η(Ω∗ ) = lim τ (Ωn ) − lim η(Ωn ) = lim ψ(Ωn ).
n→∞ n→∞ n→∞
Since ψ(Ωn+1 ) = ψ(Ωn ) − ψ(An+1 ) ≥ ψ(Ωn ) ≥ ψ(Ωn−1 ) ≥ . . . ≥ ψ(Ω1 ) = ψ(Ω),
| {z }
≤0
we have ψ(Ω∗ ) ≥ ψ(Ω) > 0 ⇒ η(Ω∗ ) < τ (Ω∗ ).
If A ∈ A, A ⊆ Ω∗ , we have A ∈ A, A ⊆ Ωn , ∀n ⇒ ψ(A) ≥ αn , ∀n.
But lim αn = 0 ⇒ ψ(A) ≥ 0. -
n→∞

Proof of Theorem 9.8: b) ⇒ a)

Case 1: ν(Ω) < ∞ and µ(Ω) < ∞.


R
G := {g ∈ E ∗ | g dµ ≤ ν(A), ∀A ∈ A} (g ≡ 0 ∈ G ⇒ G 6= ∅).
A R
If g, h ∈ G then max(g, h) ∈ G since max(g, h) dµ ≤
R R A
g dµ + h dµ ≤ ν(A ∩ {g ≥ h}) + ν(A ∩ {g < h}) = ν(A).
A∩{g≥h} A∩{g<h}
R
Define γ := sup{ g dµ | g ∈ G} (γ ≤ ν(Ω) < ∞).
R
Let (g̃n ) be a sequence in G such that lim g̃n dµ = γ.
n→∞
Let gn := max{g̃1 , . . . , g̃n } ⇒ gn ∈ G, ∀n.
R R R
Have g̃n dµ ≤ gn dµ ≤ γ ⇒ lim gn dµ = γ.
n→∞

43
R R
With monotone convergence lim gn dµ = lim gn dµ ≤ ν(A).
AR n→∞ n→∞ A
Hence: f := lim gn ∈ G and f dµ = γ.
n→∞
Claim: f is a density of ν with respect to µ.
R
Proof: Define τ on (Ω, A) by τ (A) := ν(A) − f dµ ≥ 0, ∀A ∈ A.
A
τ is a finite measure on (Ω, A) with τ ≪ µ (µ(A) = 0 ⇒ τ (A) = ν(A) = 0).
Assume τ (Ω) > 0.
τ (Ω)
τ ≪ µ ⇒ µ(Ω) > 0 ⇒ q := 2µ(Ω) > 0 ⇒ τ (Ω) = 2qµ(Ω) > qµ(Ω).
Apply Lemma 9.9 with τ and η = qµ and conclude that there is Ω∗ ∈ A with
τ (Ω∗ ) > qµ(Ω∗ ) and τ (A) ≥ qµ(A), ∀A ∈ A, A ⊆ Ω∗ .
Take f ∗ := f + qIΩ∗ . Then ∀A ∈ A:
R ∗ R R
f dµ = f dµ + qµ(A ∩ Ω∗ ) ≤ f dµ + τ (A) = ν(A) ⇒ f ∗ ∈ G.
A A A Z
R
Since 0 < qµ(Ω ) < τ (Ω ), f dµ = f dµ + qµ(Ω∗ ) and this is a contradiction
∗ ∗ ∗
| {z }
| {z } >0

to the definition of γ.
R
⇒ τ (Ω) = 0 ⇒ ν(A) = f dµ, ∀A ∈ A.
A

Case 2: ν, µ σ-finite
There are sequences (An ), (Bn ), An ∈ A, ∀n, Bn ∈ A, ∀n such that
A1 ⊆ A2 ⊆ . . . , An ր Ω, B1 ⊆ B2 ⊆ . . . , Bn ր Ω and
ν(An ) < ∞, ∀n, µ(Bn ) < ∞, ∀n.
Define Cn := An ∩ Bn , n ∈ N.
Then C1 ⊆ C2 ⊆ . . . , Cn ր Ω and ν(Cn ) < ∞, µ(Cn ) < ∞, ∀n.

S
Take En := Cn \Cn−1 . Then · En = Ω, ν(En ) < ∞, µ(En ) < ∞, ∀n.
n=1
Define for each n the finite measures νn , µn by
νn (A) := ν(A ∩ En ) and µn (A) := µ(A ∩ En ).
Then, dµ n

= IEn (∗) (Proof of (∗): exercise)
We have measurable functions fn : Ω → [0, ∞] such that 
∞ ∞ m
P Case 1 P R (∗) R P R
ν(A) = νn (A) = fn dµn = lim fn IEn dµ = f dµ
n=1 n=1 An m→∞ A n=1 A

P
with f := fn IEn .
n=1
⇒ ν has the density f with respect to µ. -

0 x<0
Example 9.10 Let F (x) =
1 − 1 e−x x ≥ 0
2
and let ν be the corresponding measure on (R, B) with ν((−∞, x]) = F (x), x ∈ R.
(ν could be for instance the law of the amount of rain falling at some place per month.)
Let λ be the Lebesgue measure on (R, B).
ν is NOT absolutely continuous with respect to λ since λ({0}) = 0, ν({0}) = 12 > 0.

Define µ := λ + δ0 . Then ν ≪ µ ⇒ there is a Radon-Nikodym derivative f = dµ .

44

0 x<0



Claim: f (x) = 1
2
x=0



 1 e−x
2
x>0
does the job.
 Rx 


0 = f (t) dt x < 0



 −∞


 
 Rx  Rx
Proof: ν((−∞, x]) = 12 = f (t) dt + 21 x=0 = f dµ, ∀x. -

 −∞ 
 −∞

 Rx 

1 − 21 e−x = 1
 
 f (t) dt + 2
x > 0

−∞

Example 9.11 Assume ν is a probability measure on (R, B) with ν ≪ λ. Then, the cdf
F of ν is absolutely continuous.
A function F is absolutely continuous if there is a measurable function f ≥ 0 such that
Rx Rx
F (x) = f (t) dt = f (t)λ(dt), ∀x ∈ R.
−∞ −∞
f is determined uniquely, λ-a.e.

Remark 9.12 There are probability measures on (R, B) such that ν(x) = 0, ∀x ∈ R
(⇒ the cdf F is continuous ) but ν 6≪ λ (⇒ F is not absolutely continuous).

Example Uniform distribution on the Cantor set C, see later.

Definition 9.13 Two measures µ and ν on (Ω, A) are singular with respect to each other
if there is a set A ∈ A such that µ(A) = 0, ν(Ac ) = 0. We write ν ⊥ µ.

Examples

1. ν uniform distribution on [0, 1], µ uniform distribution on [1, 2].


Then ν ⊥ µ (take A = [0, 1]).
P
2. µ a continuous measure on (R, B) (i.e. µ(x) = 0, ∀x ∈ R), ν = aq δq for some
q∈Q
sequence (aq )q∈Q , then ν ⊥ µ (take A = Q).

Theorem 9.14 (Lebesgue’s decomposition)


µ, ν measures on (Ω, A), ν finite measure. Then there are unique measures νa and νs on
(Ω, A) with

i) νa ≪ µ,

ii) νs ⊥ µ,

iii) ν = νa + νs .

We say that νa is the absolutely continuous part and νs the singular part of ν with respect
to µ.

45
Proof: Let Nµ = {A ∈ A | µ(A) = 0} and α := sup{ν(A) | A ∈ Nµ } and let (An ) be an
increasing sequence in Nµ such that lim ν(An ) = α.
n→∞

S
Define N := An . Then µ(N) = 0, ν(N) = α.
n=1
Define νa and νs by νa (A) = ν(A ∩ N c ), νs (A) = ν(A ∩ N).
Then, νa and νs are measures on (Ω, A) with ν = νa + νs .
Since νs (N c ) = 0 and µ(N) = 0, we have νs ⊥ µ.
We show that νa ≪ µ:
A ∈ A, µ(A) = 0 ⇒ N ∪(A · ∩ N c ) ∈ Nµ ⇒ ν(N ∪(A· ∩ N c )) = ν(N) + ν(A ∩ N c ) =
ν(B)≤α, ∀B∈Nµ
α + νa (A) ≤ α
⇒ νa (A) = 0.
Remains to show: uniqueness of the decomposition.
Assume ν = νa + νs = ν̃a + ν̃s with νa , νs as above, ν̃a ≪ µ, ν̃s ⊥ µ.
Since ν̃s ⊥ µ, there is a µ-nullset Ñ with ν̃s (Ñ c ) = 0, hence

ν̃s (A) = ν̃s (A ∩ Ñ), ∀A ∈ A. (9.3)

Then take N0 = N ∪ Ñ, then N0 ∈ Nµ and νa ≪ µ, ν̃a ≪ µ


⇒ νa (A ∩ N0 ) = ν̃a (A ∩ N0 ) = 0, ∀A ∈ A
⇒ νs (A ∩ N0 ) = ν̃(A ∩ N0 ), ∀A ∈ A.
Together with (9.3) this implies
(9.3) (9.3)
ν(A ∩ N0 ) = ν̃s (A ∩ N0 ) = ν̃s (A ∩ N0 ∩ Ñ ) = ν̃s (A ∩ Ñ) = ν̃s (A), ∀A ∈ A.
In the same way, ν(A ∩ N0 ) = νs (A)
⇒ νs = ν̃s ⇒ νa = ν̃a . -

46
10 Construction of stochastic processes
We saw that for a probability measure P1 on (Ω1 , F1) and a stochastic kernel from (Ω1 , F1 )
to (Ω2 , F2 ), there is a unique probability measure P = P1 × K on (Ω, F ) (Ω = Ω1 ×
Ω2 , F = F1 ×F2 ). On the other hand, under mild regularity assumptions, each probability
measure P on (Ω, F ) is of the form P1 × K when P1 is the marginal P X1−1 (where
ω = (x1 , x2 ), X1 (ω) = x1 , X2 (ω) = x2 ) and K is a suitable kernel from (Ω1 , F1) to
(Ω2 , F2 ).
We now describe two particular cases:

1. If Ω1 is countable,
 we take
P (X ∈ A | X = x ) P (X = x ) > 0
2 2 1 1 1 1
K(x1 , A2 ) =
ν2 (A2 ) otherwise
where ν2 is an arbitrary probability measure on (Ω2 , F2 ).
P
Then, P (A1 × A2 ) = P (X1 = x1 , x2 ∈ A2 ) =
P x 1 ∈A 1
P (X1 = x1 ) P (x2 ∈ A2 | X1 = x1 ) = (P1 × K)(A1 × A2 )
x1 ∈A1 | {z }
P (X1 =x1 )>0 =K(x1 ,A2 )
⇒ P = P1 × K, see the uniqueness statement after (8.3).

2. P is given by a Radon-Nikodym derivative f (x1 , x2 ) ≥ 0 with respect to a product


s
measure µ = µ1 ⊗ µ2 on Ω1 × Ω2 , i.e. P (A) = f (x1 , x2 )µ1 (dx1 )µ2 (dx2 )
A
⇒ P1 has the Radon-Nikodym derivative
R
f1 (x1 ) = f (x1 , x2 )µ2 (dx2 ) with respect to µ1 , i.e. f1 = dP
dµ1
1
.
dP2
R
In the same way, f2 = dµ2 where f2 (x2 ) = f (x1 , x2 )µ1 (dx1 ).
f (x1 ,x2 )
For x1 ∈ Ω1 , f1 (x1 ) 6= 0, we define the conditional density f2 (X2 | X1 ) = f1 (x1 )
(x2 ∈ Ω2 ). 
f (x | x )µ (·) f (x ) 6= 0
2 1 2 1
Let K(x1 , ·) =
ν2 (·) otherwise,
where ν2 is an arbitrary probability measure on (Ω2 , F2 ). Then
R
(P1 × K)(A1 × A2 ) = f1 (x1 )K(x1 , A2 )µ1 (dx1 )
R R f (xA11,x2 ) R R
= f1 (x1 ) f (x1 )
µ2 (dx2 )µ1 (dx1 ) = f (x1 , x2 )µ2 (dx2 )µ1 (dx1 )
A1 ∩{f1 >0} A1 A2
= P (A1 × A2 )
⇒ P = P1 × K, see the uniqueness statement after (8.3).

Example: 2-dimensional centered Gaussian law


Let ρ ∈ [0, 1), Ω = R2 , µ1 = µ2 = λ,
x2 2
1 −2ρx1 x2 +x2 x2 (x2 −ρx1 )2
− 1 −
f (x1 , x2 ) = √1 e 2(1−ρ2 ) = √1 e− 2 √ 1
e 2(1−ρ2 ) f1 (x1 )f (x2 | x1 ).
2π 1−ρ 2 2π 2π(1−ρ2 )
The kernel K(x1 , ·) is therefore given as the Gaussian law N (ρx1 , 1 − ρ2 ).
If ρ = 0, P = P1 ⊗ P2 .

47
Modelling the evolution of a stochastic system
Let(Ωi , Fi ) (i =0, 1, . . . ) be a sequence of measurable spaces. For n ≥ 0 take (Ω(n) , F (n) )
Qn Qn
= Ωi , Fi . Assume P0 is a probability measure on (Ω0 , F0 ) and for each n ≥ 1, Kn
i=0 i=0
is a stochastic kernel from (Ω(n−1) , F (n−1) ) to (Ωn , Fn ).

Interpretation (Ωi , Fi ) state space for time i, P0 = initial law, Kn (n = 1, 2, . . . )


”evolution laws”, Kn ((x0 , . . . , xn−1 ), An ) = probability that the system is at time n in An ,
given the history (x0 , . . . , xn−1 ).

By applying Theorem 8.2 n times, we get for each n ≥ 1 a probability measure P (n) on
(Ω(n) , F (n) ) with P (1) = P0 , P (n) = P (n−1) × Kn and we have ∀f ≥ 0, F (n) -measurable:
Z Z Z
(n)
f dP = · · · f (x0 , . . . , xn )Kn−1 ((x0 , . . . , xn−1 ), dxn ) · . . . · K1 (x0 , dx1 )P0 (dx0 ).
(10.1)
(n) (n) (n)
(Ω ,F ,P ) models the evolution of the system in the time 0, 1, . . . , n.

Goal: Model for infinite time horizon.



Q
Let Ω = Ωj = {ω = (x0 , x1 , . . . ) | xi ∈ Ωi } be the set of all possible trajectories
j=0
(”Menge aller möglichen Pfade”).
Xn (ω) = xn is the state at time n.
An = σ(X0 , . . . , Xn ) is the
 ∞σ-field
 ”containing all events observable until time n”.
S
A = σ(X0 , X1 , . . . ) = σ An .
n=0

Wanted: A probability measure P on (Ω, A) such that the restriction to (Ω(n) , F (n) ) is
P (n) . (Ω, A, P ) is then a model for the evolution of the system for infinite time horizon.
More precisely, A ∈ An is of the form A = A(n) × Ωn+1 × Ωn+2 × . . . with A(n) ∈ F (n) and
we want a probability measure P on (Ω, A) such that

P (A(n) × Ωn+1 × Ωn+2 × . . . ) = P (n) (A(n) ), ∀A(n) ∈ F (n) , n ≥ 0. (10.2)

Theorem 10.1 (Ionescu-Tulcea)


Given are P0 , a probability measure on (Ω0 , F0 ) and, for each n, a stochastic kernel Kn
from (Ω(n−1) , F (n−1) ) to (Ωn , Fn ).
Then, there is a unique probability measure P on (Ω, A) with (10.2) or (10.1), respectively.

S
Proof: (10.2) defines P on An , in a consistent way:
n=0
A ∈ An ∩ An−1 ⇒ A = A × Ωn+1 × Ωn+2 × . . . where A(n) = A(n−1) × Ωn with
(n)
R
A(n−1) ∈ F (n−1) ⇒ P (n) (A(n) ) = Kn ((x0 , . . . , xn−1 ), Ωn )P (n−1) (d(x0 , . . . , xn−1 )) =
A(n−1)
P (n−1) (A(n−1) )

48

 ∞

S S
Question: Can we extend P from the algebra An to the σ-field A = σ An ?
n=0 n=0

S
Uniqueness of the extension follows with Theorem 1.18 since An is ∩-stable and A =
∞  n=0
S S∞
σ An . P is additive on An .
n=0 n=0

S
To apply Theorem 1.18 we have to show that P is σ-additive on An .
n=0

S
Due to the Remark after Theorem 1.17, it suffices to show that An ∈ Ak , An ց ∅ ⇒
k=0
lim P (An ) = 0.
n→∞
Without loss of generality An ∈ An (n = 1, 2, . . . ) hence An = A(n) × Ωn+1 × Ωn+2 × . . . ,
A(n+1) ⊆ A(n) × Ωn+1 with A(n) ∈ F (n) (n = 1, 2, . . . ).
We assume that inf P (An ) > 0, and this will lead to a contradiction.
n
Now, since Z
R
P (An ) = · · · IA(n) (x0 , . . . , xn )Kn ((x0 , . . . , xn−1 ), dxn ) . . . K1 (x0 , dx1 ) P0 (dx0 ) =
| {z }
f0,n (x0 )
R
f0,n (x0 )P0 (dx0 ), there is some x̄0 ∈ Ω0 with inf f0,n (x̄0 ) > 0.
n
In the same way, with K1 (x̄0 , dx1 ) instead of P0 , there has to be x̄1 ∈ Ω1 with inf f1,n (x̄1 ) >
R R n
0, where inf · · · IA(n) ((x̄0 , x̄1 , x2 , . . . , xn )Kn ((x̄0 , x̄1 , x2 , . . . , xn−1 ), dxn ) . . .
n R
K2 ((x̄0 , x̄1 ), dx2 )K1 (x̄0 , dx1 ) = inf f1,n (x̄1 )Kn (. . . ) . . . K2 ((x̄0 , x̄1 ), dx2 ) > 0.
n
⇒ for each k ≥ 0 there is x̄k ∈ Ωk such that
R R
inf · · · IA(n) (x̄0 , . . . , x̄k , xk+1 , . . . , xn )Kn ((x̄0 , . . . , x̄k , xk+1 , . . . , xn−1 ), dxn ) . . .
n
Kk+1((x̄0 , . . . , x̄k ), dxk+1) > 0.
In particular,
A(k+1) ⊆A(k) ×Ωk+1
for n = k + 1, IA(k+1) (x̄0 , . . . , x̄k , ·) 6= 0 ⇒ (x̄0 , . . . , x̄k ) ∈ A(k) .
T
But now ω̄ = (x̄0 , x̄1 , . . . ) ∈ Ak , ∀k and this contradicts An = ∅. -
n

Definition 10.2 The sequence (Xn )n=1,2,... on the probability space (Ω, A, P ) is the
stochastic process with initial law P0 and evolution laws Kn (n = 1, 2, . . . ). We write
P (Xn ∈ An | X0 = x0 , . . . , Xn−1 = xn−1 ) = Kn ((x0 , . . . , xn−1 ), An ) (xi ∈ Ωi , An ∈ Fn ).
If K((x0 , . . . , xn−1 ), ·) = Kn (xn−1 , ·), ∀n for a kernel Kn from (Ωn−1 , Fn−1) to (Ωn , Fn ),
we say that (Xn )n=0,1,... is a Markov process.
If in addition (Ωn , Fn ) = (S, S) and Kn (·, ·) = K(·, ·), ∀n (for some measurable space
(S, S) and some kernel K from (S, S) to (S, S)) we say that (Xn )n=0,1,... is a time-
homogeneous Markov process with state space (S, S) and kernel K.

Example 10.3 S = R, S = B. Take β > 0. The kernel K from (S, S) to (S, S) is given
by K(x, ·) = N(0, βx2 ) and we take P0 = δx0 for some x0 6= 0.
(Xn )n=0,1,... is a system that ”approaches 0” with a variance which depends on the present
state.

49
Stability question:
For which values of β do we have Xn → 0 P -a.s.?

We have Z
2 (10.2)
R 2 (n)
E(Xn ) = xn P (dx0 , . . . , dxn ) = x2n K(xn−1 , dxn ) P (n−1) (dx0 , . . . , dxn−1 )
| {z }
=βx2n−1
2
= βE(Xn−1 ) ⇒ E(Xn2 )
= β x0  n
(n = 0, 1,
. . . ).

P ∞ ∞
mon. conv. P P
For β < 1, we conclude that E Xn2 = E(Xn2 ) < ∞ ⇒ Xn2 < ∞ P -a.s.
n=0 n=0 n=0
and in particular,
P (lim Xn = 0) = 1. (10.3)
n

For β < 1, we therefore have stable behaviour (in the sense of (10.3)).
q
R
π
This continues to be true for β < 2 : since |y|N(0, σ )(dy) = π2 σ, we have
2
RR
E(|Xn |) = |xn |K(xn−1 , dxn ) P (n−1) (dx0 , . . . , dxn−1).
| {z }
√ √
= β|xn−1 | π2
q
Hence E(|Xn |) = π2 βE(|Xn−1 |) and (10.3) follows with the same argument as above.
For 1 < β < π2 , we hence have E(Xn2 ) → ∞, Xn → 0 P -a.s.

Definition 10.4 If Kn ((x0 , . . . , xn−1 ), ·) ≡ Pn (n = 1, 2, . . . ) then the probability mea-


sure P , given by Theorem 10.1, is the (infinite) product measure (”Produktmaß”) with

Q
marginals Pn and we write P = Pn .
n=0
Pn is the n-th marginal or the law of Xn : Pn = P ◦ Xn−1 .

Theorem 10.5 P is the product of its marginals P ◦ Xn−1 if and only if X0 , X1 , . . . are
independent with respect to P .

Q
Proof: Let P̃ = Pn . Then, X0 , X1 , . . . are independent with respect to P ⇔
n=0
k
Q
P (X0 ∈ A0 , . . . , Xk ∈ Ak ) = P (Xi ∈ Ai ), ∀k, ∀A1 , . . . , Ak ∈ F1 , . . . , Fk .
i=0
k
Q
But P (Xi ∈ Ai ) = P̃ (X0 ∈ A0 , . . . , Xk ∈ Ak ). Hence X0 , X1 , . . . independent with
i=0
respect to P ⇔ P = P̃ . -

Example 10.6 (Independent coin tosses with success parameter p)


S = {0, 1}, S = P(S). Take p ∈ [0, 1].
µp probability measure on (S, S) with µp ({1}) = p = 1 − µp ({0}).

Q ∞
Q
Pp := µp is a probability measure on (Ω, A), Ω = S N , A = Si , Si = S, ∀i,
n=0 i=0
ω = (x0 , x1 , . . . ), Xi (ω) = xi .
Then X0 , X1 , . . . are independent under Pp with law µp . (We say that X0 , X1 , . . . are
i.i.d. (independent and identically distributed).)
Pn Pn
Pp (X0 = x0 , . . . , Xn = xn ) = p i=0 xi (1 − p)n− i=1 xi .

50
Hence, for 0 < p < 1, Pp ({ω}) = 0, ∀ω ∈ Ω.
n
P
See later: for Sn = xi , Snn → p Pp -a.s.
i=0
n
P
This implies that for p 6= q ⇒ Pp ⊥ Pq (take A = {ω : lim n1 xi = q}, then
n i=1
Pp (A) = 0, Pq (Ac ) = 0). Let An = σ(X0 , . . . , Xn ).
We write Pp |An for the restriction of Pp to (Ω, An ).
Then, Pp |An ≪ Pq |An , ∀p, q ∈ (0, 1).

Example 10.7 0 ≤ α, β, γ ≤ 1. Markov chain with state space S = {0, 1}, initial law
!
α 1−α
µ = (γ, 1 − γ), µ({0}) = γ, µ({1}) = 1 − γ and the stochastic kernel K = .
β 1−β
Then P (X0 = 0) = γ = 1 − P (X0 = 1),
P (Xn+1 = 0 | Xn = 0) = α = 1 − P (Xn+1 = 1 | Xn = 0),
P (Xn+1 = 0 | X1 = 1) = β = 1 − P (Xn+1 = 1 | Xn = 1).
Due to Theorem 10.1, this determines uniquely a probability measure P on (Ω, A) where

Q
Ω = S N, A = Si , Si = P(S), ∀i, as above.
i=0
Example 10.6 is a particular case with α = β = γ = 1 − p.

We finally consider a process which is not a Markov process.

Example 10.8 (Polya’s urn)


In an urn we have a white and a black ball. We draw a ball at random and replace it
with two balls of the colour which was drawn. 
1 i-th ball drawn is white
N
Ω = {0, 1} , ω = (x1 , x2 , . . . ), Xi (ω) =
0 i-th ball drawn is black,
1
P (Xi = 1) = = P (Xi =
2  0),
2 x = 1
1
P (X2 = 1 | X1 = x1 ) = 3
 1 x1 = 0.
3
Pn
Let Sn = Xi . After n balls drawn, n + 2 balls are in the urn,
i=1 Pn
i x +1
P (Xn+1 = 1 | X1 = x1 , . . . , Xn = xn ) = i=1
n+2
= Kn ((x1 , . . . , xn−1 ), 1).
With Theorem 10.1, this uniquely determines a probability measure on (Ω, A).
Sn +1
Question: Long-time behaviour of n+2
= proportion of white balls?

51
11 The law of large numbers
(Ω, A, P ) probability space.

Lemma 11.1 (Jensen’s inequality)


X ∈ L1 , g : R → R convex function. Then, g(X) is semi-integrable and

E(g(X)) ≥ g(E(X)) (11.1)

If g is strictly convex and X is not P -a.s. constant, we have ”>” in (11.1).

Proof: g is convex (strictly convex) if and only if there is for each x0 ∈ R a linear function
l(x) = ax + b such that l(x0 ) = g(x0 ) and l(x) ≤ g(x), ∀x 6= x0 (l(x) < g(x), ∀x 6= x0 ).
l linear
Take x0 = E(X), then E(g(X)) ≥ E(l(X)) = l(E(X)).
| {z }
g(E(X))
If we have ”=” in the above argument then l(X) = g(X) P -a.s.
(since for Y = g(X) − l(X), Y ≥ 0 and E(Y ) = 0 ⇒ Y = 0 P -a.s.).
If g is strictly convex, this implies X = x0 = E(X) P -a.s. -

Theorem 11.2 (Strong law of large numbers)


n
P Sn
X1 , X2 , . . . i.i.d. and X1 ∈ L1 . Sn := Xi . Then, n
→ E(X1 ) P -a.s.
i=1

Remark Due to Theorem 6.2 (ii), Theorem 11.2 implies the weak law of large numbers
which says that Snn → E(X1 ) in probability.

Proof of Theorem 11.2: Under the assumption E(X14 ) < ∞


Jensen
1) E(X14 ) ≥ E(|X1 |)4 ⇒ X1 ∈ L1 .

2) Without loss of generality E(X1 ) = 0 (otherwise, consider X̃i = Xi − E(Xi )).


Take ε > 0.
Markov inequ.
P (| Snn | ≥ ε) ≤ 1
ε4
E(( Snn )4 )
f (x)=x4
= ε41n4 E((X1 + . . . + Xn )(X1 + . . . + Xn )(X1 + . . . + Xn )(X1 + . . . + Xn ))
= ε41n4 (nE(X14 ) + 4n(n − 1)E(X13)E(X2 ) + 3n(n − 1)E(X12 )E(X22 )
+ 6n(n − 1)(n − 2)E(X12 )E(X2 )E(X3 )
+ n(n − 1)(n − 2)(n − 3)E(X1 )E(X2 )E(X3 )E(X4 )).
Jensen
But E(Xi ) = 0, ∀i, and E(Xi2 )2 ≤ E(Xi4 ).
Hence P (| Snn | ≥ ε) ≤ ε14 n14 (nE(X14 ) + 3n(n − 1)E(X14 )) ≤ C(ε)
n2
.

P Borel-
Hence P (| Snn | ≥ ε) < ∞ ⇒ P (| Snn | ≥ ε for infinitely many n) = 0.
n=1 Cantelli
Since ε > 0 was arbitrary, we conclude that | Snn | → 0 P -a.s. -

As an application, we go back to Example 10.3.

52
Example 10.13 (continuation)
P0 := δx0 where x0 6= 0, K(x, ·) = N(0, βx2 ).

The process (Xn )n=0,1,... can also be written as X0 = x0 , Xn+1 = β|Xn |Yn+1 ,
n = 0, 1, . . . where Y1 , Y2 , . . . are i.i.d. with law N(0, 1).

Then, |Xn | = ( β)n |Yn ||Yn−1| . . . |X0 |
Pn √ Pn
⇒ log |Xn | = (log(|Yi |) + log( β)) + log(|x0 |) = log |x0 | + Zi where Zi , i = 1, 2, . . .
i=1 √ √ i=1
are i.i.d., Zi = log |Yi | + log β, E(Z1 ) = E(log |Y1 |) + log β.

We define βc by − log βc = E(log |Y1 |).
Thm. 11.2
Then, if β > βc , n1 log |Xn | −−−−−−→ E(Z1 ) > 0 P -a.s. ⇒ |Xn | → ∞ P -a.s.
If β < βc , n1 log |Xn | → E(Z1 ) < 0 P -a.s. ⇒ |Xn | → 0 P -a.s.
(∗)
Remark βc = exp(−2E(log |Y1|)) > exp(−2 log E(|Y1 |)) = π2 .
| {z }
√2
π

(∗) x 7→ log x concave → exercises.

Corollary 11.3 Assume X1 , X2 , . . . i.i.d. and X1 is semi-integrable with E(X1 ) = ∞.


Then, Snn → +∞ P -a.s.

Proof: Take X̃i = Xi ∧ M. Then, X̃1 , X̃2 , . . . are i.i.d. and S̃nn → E(X̃1 ) P -a.s.
But: for each K > 0 there is M(K) = M such that E(X̃1 ) = E(X1 ∧ M) ≥ K.
Hence, lim inf Snn ≥ lim inf n S̃nn ≥ K P -a.s.
n
Sn Sn
Since K was arbitrary, this implies lim inf n
= ∞ P -a.s. ⇒ n
→ ∞ P -a.s. -
n

53
12 Weak convergence, characteristic functions and the
central limit theorem
Recall that for X1 , X2 , . . . i.i.d. with P (Xi = 1) = p, P (Xi = 0) = 1 − p,
Pn
Sn = Xi , Sn∗ = √Sn −np , we have
i=1 np(1−p)

n→∞
P (Sn∗ ≤ x) −−−→ Φ(x), (12.1)
Rx t2
where Φ(x) = √1 e− 2 dt (de Moivre-Laplace Theorem).

−∞

Goal:

1.) Generalize (12.1) for random variables with different laws.

2.) Interpretation of (12.1) as a convergence statement for the law of Sn∗ .

We often assume that S is a Polish space, i.e. a complete separable metric space.
We always equip S with its Borel-σ-field S (S is generated by the open subsets of S).

Definition 12.1 (Weak convergence)


Let S be a Polish space and let µ, µ1, µ2 , . . . be probability measures on (S, S).
Then (µn ) converges weakly to µ for n → ∞ if for each bounded, continuous function
f : S → R (we write f ∈ Cb (S)),
Z Z
f dµn → f dµ (12.2)

w
We write µn −
→ µ.

Theorem 12.2 (Portemanteau-Theorem)


The following statements are equivalent:
w
i) µn −
→ µ.
R R
ii) f dµn → f dµ for all f : S → R, f uniformly continuous and bounded.

iii) lim sup µn (F ) ≤ µ(F ) for all closed sets F .


n→∞

iv) lim inf µn (G) ≥ µ(G) for all open sets G.


n→∞

v) lim µn (A) = µ(A) for all sets A whose boundary is not charged by µ (”µ-randlose
n→∞
Mengen”) i.e. ∀A ∈ S with µ(Ā\Å) = 0 where Ā is the closure of A and Å its
interior.

Proof:
(i) ⇒ (ii) clear.

54
(ii) ⇒ (iii) F closed, ε > 0 ⇒ ∃f uniformly continuous, 0 ≤ f ≤ 1, f (u) = 1,
∀u ∈ F, f (u) = 0, ∀u ∈ Uε (F )c , where Uε (F ) = {s : d(s, F ) < ε}
(take for instance f (u) = (1 − 1ε d(u, F )) ∨ 0).
R (ii) R R
Then lim sup µn (F ) ≤ lim sup f dµn = f dµ ≤ IUε (F ) dµ = µ(Uε (F )).
n n
ε→0
Since F is closed, Uε (F ) ց F , hence µ(Uε (F )) ց µ(F ).
⇒ lim sup µn (F ) ≤ µ(F ).
n
(iii) ⇔ (iv) ”⇒”lim inf µn (G) = 1 − lim sup µn (Gc ) ≥ 1 − µ(Gc ) = µ(G).
n n
”⇐” follows in the same way.
(iii)+(iv) ⇒ (v) If the boundary of A is not charged by µ, then
µ(A) = µ(Å) ≤ lim inf µn (Å) ≤ lim sup µn (Ā) = µ(Ā) = µ(A).
n n
(v) ⇒ (i) Take f ∈ Cb (S), ε > 0. Choose c1 , . . . , cm such that ck − ck−1 < ε,
µ({f = ck }) = 0, ∀k, hence Ak = {ck−1 < f < ck } are sets whose boundary is not charged
Pm
by µ. Take g := ck IAk . Then, kg − f k ≤ ε (where kg − f k = sup |f (s) − g(s)|).
R R k=1 s∈S
But g dµn → g dµ due to (v).
R R
⇒ lim sup | f dµn − f dµ| ≤ 2ε. -
n

Definition 12.3 (Convergence in law)


A sequence of random variables (Xn ) with values in a Polish space S converges in law
(”in Verteilung”) to a random variable X if the laws of Xn converge weakly to the law of
d
X. We write Xn − → X (”(Xn ) konvergiert in Verteilung gegen X”).
d
Hence, Xn − → X ⇔ E(f (Xn )) → E(f (X)), ∀f ∈ Cb (S).

Lemma 12.4 Assume (Xn ) is a sequence of real-valued random variables which converges
to 0 in probability.
d
Then, Xn − → 0, i.e. the laws of (Xn ) converge weakly to the Dirac measure in 0.

Proof: Let µn be the law of Xn , n = 1, 2, . . . .


/ F . Then, there is ε > 0 so that (−ε, ε) ⊆ F c .
1. Take F closed so that 0 ∈
n→∞
⇒ µn (F ) = P (Xn ∈ F ) ≤ P (|Xn | ≥ ε) −−−→ 0 = δ0 (F ).

2. Let F be a closed set so that 0 ∈ F . Then, obviously, lim sup µn (F ) ≤ 1 = δ0 (F ).


n

Thm. 12.2 (iii) w


3. Hence, for all cosed sets F, lim sup µn (F ) ≤ δ0 (F ) ⇒ µn −
→ δ0 . -
n

Lemma 12.5 Assume (S1 , d1 ) and (S2 , d2 ) are Polish spaces with metrics d1 and d2 and
h : S1 → S2 a continuous function. Then:
(a) For probability measures µ, µ1 , µ2 , . . . on (S1 , S1 ) with µn → µ, we have
w
(µn )h = µn h−1 −
→ µ ◦ h−1 = µh .
d
(b) Assume S1 = S2 = R and X, X1 , X2 , . . . are random variables with Xn −
→ X.
d
Then, h(Xn ) −
→ h(X).

55
Proof:
R n→∞ R
(a) We have to show that for all f ∈ Cb (S2 ), f d(µh ◦ h−1 ) −−−→ f d(µ ◦ h−1 ).
But, due to Theorem 3.6,
R R n→∞ R R
f d(µh ◦ h−1 ) = (f ◦ h) dµn −−−→ (f ◦ h) dµ = f d(µ ◦ h−1) since f ◦ h ∈ Cb (S1 ).

(b) follows from (a) with µ = law of X, µn = law of Xn . -

12.1 Characteristic functions


We can integrate complex functions by decomposing into real and imaginary part, i.e.
E(X + iY ) = E(X) + iE(Y ), where X, Y are real-valued and in L1 .

Definition 12.6 (Characteristic function)


The characteristic function of a probability measure µ on (Rd , Bd ) is the function
µ̂ : Rd → C defined by
Z Z
ihx,yi
µ̂(x) = e µ(dy) = cos(hx, yi)µ(dy) + i sin(hx, yi)µ(dy) (x ∈ Rd ), (12.3)

where h·, ·i is the scalar product on Rd .


The characteristic function ϕX of a random variable X is the characteristic function of
the law of X. We write ϕX instead of P\ ◦ X −1 = P̂X , i.e. ϕX (z) = E(eihz,Xi ).

Example 12.7 (d = 1)
1. For c ∈ R and µ = δc , µ̂(x) = eixc , x ∈ R.

P ∞
P
2. For a discrete law µ = αk δck with αk > 0 we have µ̂(x) = αk eick x
k=1 k=1
(note that (12.3) is linear in µ).
n
P 
n
In particular, for µ = Bin(n, p), µ = k
pk (1 − p)n−k δk .
k=0
n
P  k
n
⇒ µ̂(x) = k
p (1 − p)n−k eikx = (1 − p + peix )n .
k=0
In the same way, if µ is Poisson with parameter λ,

P k

P k ix
µ= e−λ λk! δk ⇒ µ̂(x) = e−λ λk! eikx = eλ(e −1) .
k=0 k=0
1 2
3. For µ = N(0, 1), µ̂(x) = e− 2 x , x ∈ R.
R∞ − 1 y2 ixy
Proof: µ̂(x) = √12π e 2 e dy and one can calculate this integral.
−∞
x2
Let ψ(x) = µ̂(x), ϕ(x) = √12π e− 2 .
ϕ is a solution of the differential equation
ϕ′ (x) + xϕ(x) = 0. (12.4)
R∞
(Check!) The same holds true for ψ. More precisely, ψ(x) = eixy ϕ(y) dy
−∞
Z∞
⇒ ψ ′ (x) = i yeixy ϕ(y) dy (12.5)
−∞

56
and, integrating by parts,
Z∞ Z∞ Z∞
xψ(x) = xe ixy
ϕ(y) dy = −ieixy ϕ(y)|∞
| {z } |{z} −∞ + i eixy ϕ′ (y) dy = i eixy ϕ′ (y) dy
−∞ ր ց −∞ −∞
(12.6)
(since ϕ(|x|) → 0 for |x| → ∞).
Due to (12.4), ϕ′ (y) + yϕ(y) = 0, hence (12.5) and (12.6) imply

ψ ′ (x) + xψ(x) = 0. (12.7)



Now, 2πϕ and ψ solve the same linear differential equation (see (12.4), (12.7))
and they both take the value 1 in x = 0.
√ 1 2
⇒ ψ = 2πϕ i.e. µ̂(x) = e− 2 x , x ∈ R. -
2 imx− 12 σ2 x2
Similarly, one can show that for µ = N(m, σ ), µ̂(x) = e , x ∈ R.

4. Let µ be the Cauchy distribution with parameter c > 0, i.e. µ has the density
f (x) = π(c2c+x2 ) . Then, µ̂(x) = e−c|x| , see literature.
ix
5. µ = U[0, 1]. Then, µ̂(x) = e ix−1 .
R1
Proof: µ̂(x) = eixy dy = ix1 eixy |10 = 1
ix
(eix − 1). -
0

αe−αx x≥0
6. µ = exp(α) i.e. µ has the density f (x) =
0 x < 0.
α
Then, µ̂(x) = α−ix
, x ∈ R.
R∞ ixy −αy R∞ α
Proof: µ̂(x) = e αe dy = αe−(α−ix)y dy = α−ix
.
0 0

The characteristic function is an important tool in probability for the following reasons:

(1) The mapping X 7→ ϕX has nice properties under transformations and sums of i.i.d.
random variables.

(2) ϕX characterizes the law of X uniquely.

(3) Pointwise convergence of the characteristic function is (under mild additional assump-
tions) equivalent to the weak convergence of the corresponding measures.

At (1):

Lemma 12.8 For a random variable X with values in Rd and characteristic function ϕX
we have

a) For each y ∈ Rd , |ϕX (y)| ≤ 1 and ϕX (0) = 1.

b) ϕX is uniformly continuous.

c) For all a, b ∈ Rd , ϕaX+b (y) = ϕX (ay)eihb,yi .

57
d) ϕX (·) is real-valued if and only if PX = P−X i.e. if the law of X equals the law of −X.

e) If X and Z are independent, then ϕX+Z = ϕX · ϕZ

Proof:

a) Clear from the definition of ϕX .


R R
b) |ϕX (y1 ) − ϕX (y2 )| = |eihy1 ,xi − eihy2 ,xi |PX (dx) = |eihy1 ,xi | |1 − eihy2 −y1 ,xi | PX (dx).
| {z } | {z }
≤1 ≤ε if |y1 −y2 |≤δ(ε)
R R R
c) ϕaX+b (y) = eihy,zi PaX+b (dz) = eihy,az+bi PX (dz) = eihb,yi eihz,ayi PX (dz)
= eihb,yi ϕX (ay).
R R R
d) ϕX (y) = eihy,zi PX (dz) = coshy, ziPX (dz) + i sinhy, ziPX (dz) is real-valued
R R
⇔ sinhy, ziPX (dz) = − sinhy, ziPX (dz)
R
⇔ (sinhy, zi + sinhy, −zi)PX (dz) = 0, ∀y
⇔ PX = P−X .
Independence
e) ϕX+Z (y) = E(eihX+Z,yi ) = E(eihX,yi eihZ,yi ) = E(eihX,yi )E(eihZ,yi )
of X and Z
= ϕX (y) · ϕZ (y).

The moments of X can be calculated from its characteristic function.

Lemma 12.9 Let X be a real-valued random variable with characteristic function ϕX .


Then

(a) If E(|X|k ) < ∞, ϕX is k-times continuously differentiable and the derivatives are
(j)
given by ϕX (t) = E((iX)j eitX ), j = 0, 1, . . . , k.

(b) If E(X 2 ) < ∞ then


ϕX (t) = 1 + itE(X) − 21 t2 E(X 2 ) + o(t2 ) for t → 0.

We will use (b).

Proof of (b): We use the following analytical estimate:


n  
ix
X (ix)m |x|n+1 2|x|n
e − ≤ min , (12.8)
m=0
m! (n + 1)! n!

(see R. Durrett, Probability: Theory and Examples)


Take x = tX andtake expectations
  
Pn m n
P m
  n+1 
itX (itX) itX (itX) |tX| 2|tX|n
⇒ E(e ) − E m!
≤E e − m!
≤ E min (n+1)! , n!
m=0 m=0
tn
= (n+1)! E(min(|t||X|n+1, 2(n + 1)|X|n)).
For n = 2, we have therefore
|E(eitX ) − (1 + itE(X) − 21 t2 E(X 2 )) ≤ 61 t2 E(min(|t||X|3, 6|X|2)).
t→0
But |t||X|3 ∧ 6|X|2 ≤ 6|X|2 and |t||X|3 ∧ 6|X|2 −−→ 0
Lebesgue’s n→∞
⇒ E(min(|t||X|3, 6|X|2)) −−−→ 0. -
Theorem

58
At (2):

Theorem 12.10 Assume µ1 , µ2 are probability measures on (Rd , Bd ) with µ̂1 = µ̂2 . Then,
µ1 = µ2 .

Proof: Since the compact sets generate the Borel-σ-field, it suffices to show that
µ1 (K) = µ2 (K), ∀K ⊆ Rd , K compact.
Assume K compact and let d(x, K) = inf{d(x, y) | y ∈ K} be the distance from x to K
(x ∈ Rd ). 
1


 x ∈ K,
For m ∈ N define fm : Rd → [0, 1] by fm (x) = 0 d(x, K) ≥ m1 ,



1 − md(x, K) otherwise.
Then, fm is continuous, has values in [0, 1] and compact support and fm ց IK for m → ∞.
R R
With monotone convergence, we see that it suffices to show that fm dµ1 = fm dµ2 , ∀m
(because we then conclude µ1 (K) = µ2 (K)). Fix m.
Take ε > 0 and choose N large enough such that BN := [−N, N]d contains the set
{x ∈ Rd | fm (x) 6= 0} and such that µ1 (BN c
) ≤ ε and µ2 (BN c
) ≤ ε.
Using the Fourier convergence Theorem, there is a function g : Rd → C of the form
P n 2π
g(x) = cj eih 2N tj ,xi , where n ∈ N, c1 , . . . , cn ∈ C and t1 , . . . , tn ∈ Zd , such that
j=1
sup |g(x) − fm (x)| ≤ ε.
x∈BN
We conclude that sup |g(x)| ≤ 1 + ε. Now we can estimate
x∈Rd

Z Z Z Z Z Z
fm dµ1 − fm dµ2 ≤ fm dµ1 − g dµ1 + g dµ1 − g dµ2
Z Z
+ g dµ2 − fm dµ2 . (12.9)

Since µ̂1 = µ̂2 , the second term in (12.9) vanishes.


Since |g(x)| ≤ 1 + ε, the first term in (12.9) can be estimated as follows:
R R R R R
| fm dµ1 − g dµ1| ≤ |fm − g| dµ1 + |fm | dµ1 + |g| dµ1
BN c
BN c
BN
c
≤ εµ1(BN ) + (1 + ε)µ1(BN≤ ε + (1 + ε)ε = ε(2 + ε).
)
The last term in (12.9) is ≤ ε(2 + ε), in the same way.
R R
Since ε > 0 was arbitrary, we conclude that fm dµ1 = fm dµ2 , ∀m. -

At (3):

Theorem 12.11 (Continuity Theorem)


Assume µ1 , µ2, . . . are probability measures on (R, B) with characteristic functions
ϕ1 , ϕ2 , . . . . Then
w
(a) If µn −−−→ µ for some probability measure µ on (R, B), then (ϕn ) converges pointwise
n→∞
n→∞
to the characteristic function ϕ of µ, i.e. ϕn (t) −−−→ ϕ(t), ∀t ∈ R.

59
(b) If (ϕn ) converges pointwise to a function ϕ : R → C and ϕ is continuous at 0, then
there is a probability measure µ on (R, B) such that ϕ is the characteristic function
w
of µ and µn −→ µ.

For the proof, we will need the improtant notion of tightness (”Straffheit”).

Definition 12.12 A sequence of probability measures on a Polish space (S, S) is tight


(”straff”) if there is, ∀ε > 0, a compact set Kε ⊆ S such that sup µn (Kεc ) ≤ ε.
n

Theorem 12.13 (Prohorov’s Theorem)


(S, S) Polish space.
(i) (µn ) tight ⇒ each subsequence of (µn ) has a weakly convergent subsequence.

(ii) ”⇐” holds as well.

Proof: see P. Billingsley, Convergence of Probability Measures. -

Proof of Theorem 12.11:


(a) t → eitx is continuous and bounded, hence
R n→∞ R
ϕn (t) = eitx µn (dx) −−−→ eitx µ(dx) = ϕ(t).

(b) We show that (µn ) is tight.


Ru Ru 2 sin ux
(1 − eitx ) dt = 2u − (cos tx + i sin tx) dt = 2u − x
.
−u −u
Devide both sides by u and integrate with µn ,
1
Ru R R
u
(1 − ϕn (t)) dt = 2 (1 − sinuxux )µn (dx) ≥ 2 (1 − 1
|ux|
)µn (dx)
−u 2
{|x|≥ u }
2
≥ µn ({x : |x| > u
}).
t→0 1
Ru u→0
Since ϕn (t) −−→ 1 and ϕ is continuous in 0, we have u
(1 − ϕ(t)) dt −−→ 0.
−u
1
Ru
Choose u small enough so that u
(1 − ϕ(t)) dt < ∞.
−u
Ru
We show that µn ({x : |x| > u2 }) ≤ 1
u
(1 − ϕn (t)) dt.
−u
Since ϕn (t) → ϕ(t), ∀t we conclude with dominated convergence that for
Ru
n ≥ N0 (ε), u1 (1 − ϕn (t)) dt ≤ 2ε and this proves that (µn ) is tight.
−u
Now, with Theorem 12.13 (i): each subsequence of (µn ) has a weakly convergent
subsequence and this subsequences al converge to part a).
w
There µn −
→ µ. -

Example 12.14 Let µn be the uniform distribution on [−n, n], i.e. µn = U[−n, n]. Then,
1
Rn itx 1 1 1
ϕn (t) = 2n e λ(dx) = 2n ti
(eitn − e−itn ) = 2nt (sin tn − sin(−tn)) = sintntn , t ∈ R\{0}.
−n 
1 t = 0
n→∞
Hence ϕn (t) −−−→ ϕ(t), ∀t, where ϕ(t) =
0 else.

60
In particular, ϕ is not continuous in 0.
In fact, (µn ) does not converge weakly, see exercises.

Remark (µn ) is not tight.

Corollary 12.15 Let α > 0. For each n ∈ N consider i.i.d. random variables X1 , . . . , Xn .
Bernoulli with p = αn , i.e. P (Xi = 1) = αn = 1 − P (Xi = 0).
n
P
Sn = Xi has the law Bin(n, αn ).
i=1
⇒ the characteristic function ϕSn of Sn is given by ϕSn (t) = ϕn (t) = (1 − αn + αn eit )n , see
Example 12.7.2.
it it
ϕn converges to eα(e −1) for n → ∞ and ϕ(t) = eα(e −1) is the characteristic function of
the Poisson distribution with parameter α, see Example 12.7.2.
According to Theorem 12.11, the laws of Sn converge weakly to the Poisson distribution
with parameter α.

Theorem 12.16 (Central Limit Theorem (CLT))


Assume X1 , X2 , . . . are i.i.d. with σ 2 = V ar(X1 ) < ∞ and m = E(X1 ). Assume σ 2 > 0.
Pn
Take Sn = Xi and Sn∗ = S√n −nm nσ2
, n = 1, 2, . . . .
i=1
Then, the laws of Sn∗ converge weakly to N(0, 1). In particular, for −∞ ≤ a < b ≤ ∞,
we have
Zb
∗ 1 u2
lim P (a ≤ Sn ≤ b) = √ e− 2 du. (12.10)
n→∞ 2π
a

Proof: ϕ̃ characteristic function of X1 − m. Then, due to Lemma 12.9 b),


2 t→0
ϕ̃(t) = 1 − σ2 t2 + t2 g(t) with g(t) −−→ 0.
Let ϕn be the characteristic function of Sn∗ . Due to Lemma 12.8 (c) and (e), we have
t
ϕn (t) = ϕ̃( √nσ 2
)n (t ∈ R).
2 t2
t n
Note that (1 − 2n ) → e− 2 (t ∈ R).
Since |un − v n | ≤ |u − v|n max(|u|, |v|)n−1 for all u, v ∈ C, we have
t2 n t2 n t t2 t t2 n→∞
|(1 − 2n ) − ϕn (t)| = (1 − 2n ) − ϕ̃( √nσ )n | ≤ n|1 − 2n − ϕ̃( √nσ )| ≤ n nσ 2 |g(
√ t )| −−−→ 0.
2 2 nσ2
t2
⇒ the characteristic functions ϕn of Sn∗ converge to the characteristic function t → e− 2
of µ = N(0, 1), see Example 12.7.3.
Theorem 12.11 implies that the laws of Sn∗ converge weakly to N(0, 1).
(12.10) then follows with the Portemanteau Theorem, since µ = N(0, 1) has a density
and [a, b] is a set whose boundary is not charged by µ, ∀a, b. -

61
R
Remark Let M = {ν | ν probability measure on (R̄, B̄) with x2 dν < ∞}.
R R R R
For ν ∈ M, |x| dν < ∞. Let m = x dν and σ 2 = x2 dν − ( x dν)2 .
For ν1 , ν1 ∈ M, define ν1 ∗ ν2 as follows.
Let X1 , X2 independent random variables with laws ν1 and ν2 .
√2 −m
Let ν1 ∗ ν2 be the law of X1 +X 2
1 −m2
2
.
σ1 +σ2
n −nm
S√
Then, if X1 , . . . , Xn are i.i.d. with ν, the laws of Sn∗ = nσ2
is |ν ∗ ν ∗{z. . . ∗ ν}.
n-times
Note µ = N(0, 1) is a fixed point, µ ∗ µ = µ.
In this sense, the CLT describes convergence to a fixed point.

62
13 Conditional expectation
13.1 Motivation
Assume X is random variable on some probability space (Ω, A, P ), X ≥ 0.
The expectation E(X) can be interpreted as a prediction for the unknown (random) value
of X.
Assume A0 ⊆ A, A0 is a σ-field and assume we ”have the information in A0 ”, i.e. for
each A0 ∈ A0 we know if A0 will occur or not.
How does this partial information modify the prediction of X?

Example 13.1

(a) If X is measurable with respect to A0, then {X ≤ c} ∈ A0 , ∀c and we know for each
c if {X(ω) ≤ c} or {X(ω) > c} occurs.
⇒ We know the value of X(ω).

(b) X1 , X2 , . . . i.i.d. with m = E(X1 ) < ∞.


How should we modify the prediction E(X1 ) = m if we know the value
Sn (ω) = X1 (ω) + . . . + Xn (ω)?

The solution of the prediction problem is to pass from the constant E(X) = m to a
random variable E(X | A0 ), which is measurable with respect to A0 , the conditional
expectation of X, given A0 .

13.2 Conditional expectation for a σ-field generated by atoms



S
Let G = (Ai )i=1,2,... be a countable partition of Ω into ”atoms” Ai , i.e. Ω = · Ai and
i=1
A0 = σ(G).
If P (Ai ) > 0, consider the conditional law P (· | Ai ) defined by

P (B ∩ Ai )
P (B | Ai ) = (B ∈ A)
P (Ai )
R 1
R 1
and define E(X | Ai ) = X dP (· | Ai ) = P (A i)
X dP = P (A i)
E(XIAi ). Now define
Ai

X 1
E(X | A0 )(ω) := E(XIAi )IAi (ω). (13.1)
P (Ai )
i : P (Ai )>0

(13.1) gives for each ω ∈ Ω a prediction E(X | A0 )(ω) which uses only the information in
which atom ω is.

Definition 13.2 If A0 is generated by a countable partition G of Ω, the random variable


E(X | A0 ) defined in (13.1) is the conditional expectation of X given A0 .

63
Theorem 13.3 The random variable E(X | A0 ) (defined in (13.1)) has the following
properties:

(i) E(X | A0 ) is measurable with respect to A0 (E(X | A0 ) : (Ω, A0 ) → (R̄, B̄)).

(ii) For each random variable Y0 ≥ 0, which is measurable with respect to A0 , we have

E(XY0 ) = E(E(X | A0 )Y0 ). (13.2)

In particular,
E(X) = E(E(X | A0 )). (13.3)

Proof: (i) follows from (13.1).


To show (ii), take firstIAj (Aj ∈ A0 ): 
P 1 1
E(E(X | A0 )IAj ) = E P (Ai )
E(XIAj ) IA j IA j = P (Aj )
E(XIAj )P (Aj )
i : P (Ai )>0 | {z }

I i=j


Aj
=

0

else

= E(XIAj ) if P (Aj ) > 0.


Hence (13.2) follows in this case from (13.1) (if P (Aj ) = 0, both sides in (13.2) are = 0).
P
Next, we consider functions of the form ci IAi (ci ≥ 0), then monotone limits of such
functions as in the definition of the integral.
⇒ (13.2) holds true for all Y0 ≥ 0, Y0 measurable with respect to A0 .
Taking Y0 ≡ 1, (13.3) follows. -

Notation: If A0 = σ(Y ) for some random variable Y , we write E(X | Y ) instead of


E(X | σ(Y )).

Example 13.4
Take p ∈ [0, 1], let X1 , X2 , . . . be i.i.d. Bernoulli random variables with parameter p, i.e.
P (Xi = 1) = p = 1 − P (Xi = 0).
Question: What is E(X1 | Sn )?
n
P
Answer: E(X1 | Sn ) = P (X1 = 1 | Sn = k)I{Sn =k} and
k=0
P (X1 =1,Sn =k) k−1 )
p(n−1 pk−1 (1−p)n−1−(k−1) k
P (X1 = 1 | Sn = k) = = =
P (Sn =k) (nk)pk (1−p)n−k n

Sn
⇒ E(X1 | Sn ) = . (13.4)
n
Remark E(X1 | Sn ) does not depend on the ”success parameter” p.

64
Example 13.5 (Randomized sums)
X1 , X2 , . . . random variables with Xi ≥ 0, ∀i and E(Xi ) = m, ∀i.
TP
(ω)
T : Ω → {0, 1, . . . } is independent of (X1 , X2 , . . . ), ST (ω) := Xk (ω).
k=1

P 1
Then, according to (13.1), E(ST | T ) = P (T =k)
E(ST I{T =k} )I{T =k} .
k=0
T indep.
But E(ST I{T =k} ) = E(Sk I{T =k} ) = E(Sk )E(I{T =k} ) = k · m · P (T = k).
of Sk

P
Hence E(ST | T ) = m kI{T =k}
k=0
| {z }
=T
⇒ E(ST | T ) = m · T .
Now, with (13.3) we conclude that

E(ST ) = m · E(T ) (Wald’s identity) (13.5)

13.3 Conditional expectation for general σ-fields


X random variable on (Ω, A, P ), X ≥ 0, A0 ⊆ A, A0 σ-field.

Definition 13.7 A random variable X0 ≥ 0 is (a version of) the conditional expectation


of X, given A0 , if it satisfies

(i) X0 is measurable with respect to A0 ,

(ii) for each random variable Y0 ≥ 0, Y0 measurable with respect to A0 , we have


E(XY0 ) = E(X0 Y0 ).

We write X0 = E(X | A0 ).

Remark 13.8

1) For (ii) it suffices to have E(XIA0 ) = E(X0 IA0 ), ∀A0 ∈ A0 .

2) (ii) implies, with Y0 ≡ 1,


E(X) = E(E(X | A0 )) (13.6)

Theorem 13.9 (Existence and uniqueness of conditional expectations)


The conditional expectation E(X | A0 ) of a random variable X ≥ 0 given a σ-field A0
exists and is unique in the following sense:
If X0 and X̃0 are two random variables satisfying (i) and (ii) in Definition 13.7, then
X0 = X̃0 P -a.s.

Remark If X = X + − X − is semi-integrable, we define


E(X | A0 ) = E(X + | A0 ) − E(X − | A0 ).
Then, X ∈ L1 ⇒ E(X | A0 ) ∈ L1 .

65
Proof of Theorem 13.9:

1. Let Q(A0 ) := E(XIA0 ) (A0 ∈ A0 ).


Then
 ∞Q isa measure
 ∞on (Ω,  A0 ):∞ ∞
S P P P
Q · Ai = E X IA i = E(XIAi ) = Q(Ai ).
i=1 i=1 i=1 i=1
The measure Q is absolutely continuous with respect to P on (Ω, A0 ), i.e. we have,
∀A ∈ A0 , P (A0 ) = 0 ⇒ Q(A0 ) = 0.
Due to Radon Nikodym’s Theorem, there is a function X0 ≥ 0, which is measurable
R
with respect to A0 such that Q(A0 ) = X0 dP = E(X0 IA0 ) (A0 ∈ A0 ).
| {z } A0
=E(XIA0 )
Remark 13.8.1 implies that (ii) in Definition 13.7 is satisfied for X0 .

2. If X0 and X̃0 are random variables which satisfy (ii) in Definition 13.7, then
A0 = {X0 > X̃0 } ∈ A0 .
13.7 (ii) implies that E(X0 IA0 ) = E(X̃0 IA0 ) ⇒ E((X0 − X̃0 )IA0 ) = 0 ⇒ P (A0 ) = 0.
In the same way, P (X0 < X̃0 ) = 0 ⇒ X0 = X̃0 P -a.s. -

Example 13.10 We generalize the explicit computation in Example 13.4.


n
P
Lemma 13.10 X1 , . . . , Xn are i.i.d. and X1 ∈ L1 , Sn = Xi . Then,
i=1

Sn
E(Xi | Sn ) = , i = 1, . . . , n. (13.7)
n
Proof: We will need the following lemma.

Lemma 13.11 X and Y are random variables on some probability space (Ω, A, P ). Then,
the following statements are equivalent:

(a) Y is measurable with respect to σ(X).

(b) There is a measurable function h : (R̄, B̄) → (R̄, B̄) such that Y = h(X).

Proof of Lemma 13.11: (b) ⇒ (a) is clear because the composition of measurable functions
is measurable.
(a) ⇒ (b): Take first Y = IA , A ∈ σ(X).
Then, A = {X ∈ B} for some B ∈ B and Y = h(X) = I{X∈B} .
P
Then, take Y = ci IAi , then monotone limits of such functions etc. -
i

Proof of Lemma 13.10: Let Y0 ≥ 0, Y0 measurable with respect to σ(Sn ).


Hence, with Lemma 13.11, Y0 = h(Sn ) for a measurable function h. Hence,
R R
E(Xi h(Sn )) = · · · xi h(x1 + . . . + xn )µ1 (dx1 ) . . . µ(dxn ), where µ is the law of X1 .
E(Xi h(Sn )) is invariant under permutations of the indices {1, . . . , n}
⇒ E(Xi h(Sn )) = E(Xj h(Sn )), ∀i, j
Pn
⇒ E(Xi h(Sn )) = n1 E(Xk h(Sn )) = E( Snn h(Sn ))
k=1
⇒ E(Xi Y0 ) = E( Snn Y0 ) and we showed that Sn
n
satisfies property (ii) in Definition 13.7. -

66
Remark The proof used only that the joint law of (X1 , . . . , Xn ) is invariant under per-
mutations of the indices.

13.4 Properties of conditional expectations


Conditional expectation satisfies the same ”rules” as expectation.
In some situations, this is obvious since conditional expectation is an expectation with
respect to some conditional distribution.

Theorem 13.12 X1 , X2 random variables with X1 ≥ 0, X2 ≥ 0. Then

(a) E(X1 + X2 | A0 ) = E(X1 | A0 ) + E(X2 | A0 ) P -a.s.


E(cX1 | A0 ) = cE(X1 | A0 )
(linearity)

(b) X1 ≤ X2 P -a.s. ⇒ E(X1 | A0 ) ≤ E(X2 | A0 ) P -a.s. (monotonicity)

(c) 0 ≤ X1 ≤ X2 ≤ . . . P -a.s. ⇒ E(lim Xn | A0 ) = lim E(Xn | A0 ) P -a.s.


n n

Remark about (c)


T
A := {Xn < Xn+1 }. Due to the hypothesis, P (A) = 1 and (b) implies P (A0 ) = 1 where
n
T
A0 = {E(Xn | A0 ) ≤ E(Xn+1 | A0 )} (for all versions E(Xn | A0 ), E(Xn+1 | A0 )).
n
We now set lim Xn (ω) = lim Xn (ω)IA (ω) and lim E(X | A0 ) = lim E(Xn | A0 )IA0 (ω).
n n n n
(c) says that now lim E(Xn | A0 ) is (a version of) the conditional expectation of lim Xn ,
n n
given A0 , i.e. a random variable with properties 13.7 (i) and 13.7 (ii) with A0 and
X = lim Xn .
n
Proof:

(a) For each choice of a version E(Xi | A0 ) (i = 1, 2) we have that


E(X1 | A0 ) + E(X2 | A0 ) is a random variable which is measurable with respect to
A0 and for Y0 ≥ 0, Y0 measurable with respect to A0 , we have
13.7 (ii)
E(Y0 (E(X1 | A0 ) + E(X2 | A0 ))) = E(Y0 E(X1 | A0 )) + E(Y0 E(X2 | A0 )) =
E(Y0 X1 ) + E(Y0 X2 ) = E(Y0 (X1 + X2 )).

(b) Let B0 = {E(X1 | A0 ) > E(X2 | A0 )}. Then, B ∈ A0 and


R
(E(X1 | A0 ) − E(X2 | A0 )) dP = E(IB0 E(X1 | A0 ) − E(X2 | A0 ))
B0
13.7 (ii) R X1 ≤X2
= E(IB0 (X1 − X2 )) = (X1 − X2 ) dP ≤ 0 ⇒ P (B0) = 0.
B0 P -a.s.

(c) Let Y0 ≥ 0, Y0 measurable with respect to A0 . Then


mon. 13.7 (ii)
E(Y0 lim E(Xn | A0 )) = lim E(Y0 E(Xn | A0 )) = lim E(Y0 Xn )
n conv. n n
mon.
= E(Y0 lim Xn ). -
conv. n

67
Theorem 13.13

(a) Let Z0 ≥ 0 be a random variable which is measurable with respect to A0 . Then

E(Z0 X | A0 ) = Z0 E(X | A0 ). (13.8)

(b) Assume that σ(X) and A0 are independent. Then,

E(X | A0 ) = E(X). (13.9)

Proof:

(a) The right hand side of (13.8) is measurable with respect to A0 and for Y0 ≥ 0, Y0
measurable with respect to A0 we have
13.7 (ii)
E(Y0 (Z0 X)) = E((Y0 Z0 )X) = E(Y0 Z0 E(X | A0 )) = E(Y0 (Z0 E(X | A0 ))).

(b) see exercise 45. -

Theorem 13.12 implies the following

Fatou’s Lemma for conditional expectation


 
Xn ≥ Y P -a.s. ∀n for some Y ∈ L1 ⇒ E lim inf Xn | A0 ≤ lim inf E(Xn | A0 ) P -a.s.
n→∞ n→∞

Lebesgue’s Theorem for conditional expectations


1
|Xn | ≤ Y P -a.s. ∀n
 for some Y ∈ L , Xn → X P -a.s.
⇒ E(X | A0 ) = E lim Xn | A0 = lim E(Xn | A0 ) P -a.s.
n n

Jensen’s inequality for conditional expectations


X ∈ L1 , f convex. Then, f (x) is semi-integrable and E(f (x) | A0 ) ≥ f (E(X | A0 )) P -a.s.

Proof: Each convex function f is of the form f (x) = sup ln (x) ∀x with linear functions
n
ln (x) = an x + bn . In particular, f ≥ ln , ln (X) ∈ L1 .
13.12(b) 13.12(a)
Since E(f (x) | A0 ) ≥ E(ln (x) | A0 ) = ln (E(X | A0 )), we have
E(f (x) | A0 ) ≥ sup ln (E(X | A0 )) = f (E(X | A0 )) P -a.s. -
n

Corollary 13.14
For p ≥ 1, conditional expectation is a contraction of Lp in the following sense:
X ∈ Lp ⇒ E(X | A0 ) ∈ Lp and kE(X | A0 )kp ≤ kXkp .

Proof: With f (x) = |x|p , Jensen’s inequality for conditional expectations implies that
|E(X | A0 )|p ≤ E(|X|p | A0 ) ⇒ E(|E(X | A0 )|p ) ≤ E(|X|p )
⇒ kE(X | A0 )kp ≤ kXkp . -

In particular, if X ∈ L2 , then E(X | A0 ) ∈ L2 and E(X | A0 ) can be interpreted as the


”best” prediction of X, given A0 , in the following sense.

68
Theorem 13.15 Assume X ∈ L2 , Y0 is measurable with respect to A0 and Y0 ∈ L2 .
Then E ((X − E(X | A0 ))2 ) ≤ E((X − Y0 )2 ) and we have
”=” if and only if Y0 = E(X | A0 ) P -a.s.

Proof: Assume X0 is a version of E(X | A0 ).


Then E((X − Y0 )2 ) = E(X 2 ) − 2E(XY0 ) + E(Y02 ).
For Y0 = X0 , we conclude E((X − X0 )2 ) = E(X 2 ) − E(X02 ).
Hence E((X − Y0 )2 ) = E((X − X0 )2 ) + E((X0 − Y0 )2 ).
⇒ E((X − Y0 )2 ) ≥ E((X − X0 )2 ) with ”=” if and only if X0 = Y0 P -a.s. -

Remark Theorem 13.15 says that the conditional expectation E(X | A0 ) is the projec-
tion of the element X in the Hilbert space L2 (Ω, A, P ) on the closed subspace L2 (Ω, A0 , P ).

Theorem 13.16 (Projection property of conditional expectation)


Let A0 , A1 be σ-fields with A0 ⊆ A1 ⊆ A and X a random variable with X ≥ 0. Then,

E(E(X | A1 ) | A0 ) = E(X | A0 ) P -a.s. (13.10)

and
E(E(X | A0 ) | A1 ) = E(X | A0 ) P -a.s. (13.11)

Proof: For Y0 ≥ 0, Y0 measurable with respect to A0 ,


E(Y0 E(X | A1 )) = E(Y0 X) and this proves (13.10).
(13.11) is clear since E(X | A0 ) is measurable with respect to A1 : use (13.8). -

69
14 Martingales
14.1 Definition and examples
(Ω, A, P ) probability space, A0 ⊆ A1 ⊆ A2 ⊆ . . . increasing sequence of σ-fields with
Ai ⊆ A, ∀i.
Interpretation: An is the collection of events observable at time n.

Definition 14.1 A martingale is a sequence (Mn )n=0,1,... of random variables with

(i) Mn is measurable with respect to An , ∀n ≥ 0 and Mn ∈ L1 , ∀n. (14.1)


(ii) E(Mn+1 | An ) = Mn , ∀n ≥ 0. (14.2)

Remarks 14.2

1. Under the assumption (i), (ii) is equivalent to

E(Mn+1 − Mn | An ) = 0, ∀n ≥ 0. (14.3)

2. We are now omitting ”P -a.s.” in (14.2), (14.3).

3. We say that (Mn ) is adapted to (An ) (meaning that for each n, Mn is measurable
with respect to An ).

4. (14.3) implies that for n, k ≥ 0,


Pk
E(Mn+k − Mn | An ) = E(Mn+l − Mn+l−1 | An )
l=1
k
P
Thm.13.16
= E(E(Mn+l − Mn+l−1 | An+l−1 ) | An ) = 0.
l=1

We consider four important (classes of) examples.

14.1.1 Sums of independent centered random variables

Y1 , Y2, . . . independent random variables, Yi ∈ L1 , ∀i.


An := σ(Y1 , . . . , Yn ), n ≥ 1, A0 := {∅, Ω}.
Pn
Let Mn := (Yi − E(Yi )), n ≥ 1 and M0 = 0.
i=1
Then, (Mn ) is a martingale with respect to (An ):
(14.1) is satisfied and we have
Thm.13.13(a)
E(Mn+1 − Mn | An ) = E(Yn+1 − E(Yn+1) | An ) = E(Yn+1 | An ) − E(Yn+1 ) =
E(Yn+1) − E(Yn+1 ) = 0.

Example 14.3 p ∈ (0, 1), Y1 , Y2 , . . . i.i.d. with P (Yi = 1) = p = 1 − P (Yi = −1).


n
P
Sn := , n ≥ 1 (S0 = 1) is the corresponding random walk.
i=1
An = σ(Y1 , . . . , Yn ), A0 = {∅, Ω}.

70
Mn := Sn − n(2p − 1) (n = 0, 1, . . . ).
Then (Mn ) is a martingale with respect to (An ).
In the same way, for x ∈ R, M̃n = x + Sn − n(2p − 1) (n = 0, 1, . . . ) is a martingale with
respect to (An ).
Note that (Sn ) is a martingale with respect to An ⇔ p = 12 .

14.1.2 Successive Predictions

Take X ∈ L1 and set


Mn := E(X | An ), n = 0, 1, 2, . . . (14.4)
Then, (Mn ) is a martingale with respect to (An ).
Mn is measurable with respect to An , ∀n and
Thm.13.16
E(Mn+1 | An ) = E(E(X | An+1) | An ) = E(X | An ) = Mn .

14.1.3 Radon-Nikodym derivatives on increasing sequences of σ-fields

Example 14.4 P and Q are probability measures on (Ω, A) with Q ≪ P .


Let X := dQ
dP
be the Radon-Nikodym derivative of Q with respect to P .
dQ
Let Mn := dP |An be the Radon-Nikodym derivative of Q|An with respect to P |An .

Claim: Mn = E(X | An ), n = 0, 1, 2, . . .

Consequence: (Mn ) is a martingale with respect to (An ) (it falls into class 2).

Proof of the claim: Zn := E(X | An ).

1. Zn is measurable with respect to An , ∀n.


R dQ
2. We show that Q(A) = Zn dP, ∀A ∈ An (and this implies Zn = | ).
dP An
A
Take A ∈ An . Then,
R dQ
Q(A) = X dP since X = dP .
A
R R A∈A R R
X dP = X1A dP = n E(X | An )1A dP = Zn dP . -
A (13.2) A

Now Radon-Nikodym derivatives on increasing σ-fields form a martingale even if they are
not of the form E(X | An ).

Theorem 14.5 P and Q probability measures on (Ω, A), A0 ⊆ A1 ⊆ . . . increasing


sequence of σ-fields with Ai ⊆ A, ∀i.
We assume Q|Ai ≪ P |Ai , ∀i.
dQ
Let Mn = dP |An (n = 0, 1, . . . ).
Then, (Mn ) is a martingale with respect to (An ).

71
Proof:

1. Mn is measurable with respect to An , ∀n.

2. E(Mn+1 | An ) = Mn , ∀n, with the same argument as in Example 14.4, i.e. we show
R
that Q(A) = E(Mn+1 | An ) dP, ∀A ∈ An .
A R R
Take A ∈ An . Then, Q(A) = Mn dP = 1A Mn+1 dP
A
A∈An R R
= 1A E(Mn+1 | An ) dP = E(Mn+1 | An ) dP .
A
⇒ Mn = E(Mn+1 | An ) P -a.s. -

14.1.4 Harmonic functions of Markov chains

Consider a Markov process with state space (S, S) and transition kernel K(x, dy).
A function h : (S, S) → (R̄, B̄) is harmonic if it satisfies, ∀x ∈ S, the mean value property
Z
h(x) = h(y)K(x, dy). (14.5)

Take Ω = {ω = (x0 , x1 , . . . ) | xi ∈ S}, Xi (ω) = xi and let Px be the law of the Markov
process (Xn ) with x0 = x and transition kernel K. Assume h is harmonic.
If h(x) < ∞, Mn := h(Xn ) (n = 0, 1, . . . ) is a martingale with respect to Px and
An = σ(X0 , . . . , Xn ):
R (14.5)
E(h(Xn+1 ) | An ) = h(y)K(Xn , dy) = h(Xn ) Px -a.s.

Example 14.6 p ∈ (0, 1), x ∈ Z, Y1 , Y2 , . . . i.i.d. with


Pn
P (Yi = 1) = p = 1 − P (Yi = −1), Sn = Yi , (S0 = 0).
i=1
Consider the random walk Xn = x + Sn (n = 0, 1, . . . ).
(Xn) is a Markov chain with starting point x and transition kernel
K(z, ·) = p δ_{z+1} + (1 − p) δ_{z−1}.
Take h(y) = ((1 − p)/p)^y (y ∈ Z).
Then, h is a harmonic function for K:
∫ h(y) K(z, dy) = p h(z + 1) + (1 − p) h(z − 1) = ((1 − p)/p)^z = h(z).
⇒ h(Xn ) (n = 0, 1, . . . ) is a martingale with respect to An = σ(X0 , . . . , Xn )
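A quick numerical sanity check of the mean value property (an added illustration, not in the original script):

def h(y, p):
    """The harmonic function h(y) = ((1 - p)/p)**y from Example 14.6."""
    return ((1 - p) / p) ** y

p = 0.7
for z in range(-3, 4):
    lhs = p * h(z + 1, p) + (1 - p) * h(z - 1, p)
    print(z, lhs, h(z, p))  # the two printed values agree up to rounding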

14.2 Stopping times


Definition 14.7 (Ω, A) measurable space, (An ) increasing sequence of σ-fields with
An ⊆ A, ∀n.
A stopping time is a function T : Ω → {0, 1, . . . , +∞} such that

{T = n} ∈ An (n = 0, 1, . . . ) (14.6)

In words: the decision whether to stop at time n can be based on the events observable
until time n.

Remark (14.6) is equivalent to

{T ≤ n} ∈ An (n = 0, 1, . . . ) (14.7)
Proof: (14.6) implies that {T ≤ n} = ∪_{k=0}^{n} {T = k} ∈ An .
On the other hand, (14.7) implies that {T = n} = {T ≤ n} ∩ {T ≤ n − 1}c ∈ An . □

Example 14.8 If (Xn )n=0,1,... is adapted to (An ) and A ∈ B, the first entry time to A,
given by TA (ω) = min{n ≥ 0 : Xn (ω) ∈ A} (min ∅ = +∞) is a stopping time.
Proof: {TA ≤ n} = ∪_{k=0}^{n} {Xk ∈ A} ∈ An . □

But the time of the last visit to A, given by LA (ω) = max{n ≥ 0 : Xn (ω) ∈ A} is in
general not a stopping time.
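In computational terms (a hedged sketch added here, not part of the script): deciding whether TA ≤ n only requires the path up to time n, which is exactly (14.7), whereas deciding whether LA ≤ n requires the whole future of the path.

import math

def first_entry_time(path, A):
    """T_A = min{n >= 0 : path[n] in A}; math.inf if the path never enters A.

    Whether T_A <= n is decided by path[0..n] alone: property (14.7)."""
    for n, x in enumerate(path):
        if x in A:
            return n
    return math.inf

print(first_entry_time([0, -1, 0, 1, 2], {1}))  # -> 3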

Theorem 14.9 (Optional stopping (”Stoppsatz”))


(Ω, A, P ) probability space, (An ) increasing sequence of σ-fields.

a) If (Mn ) is a martingale with respect to (An ) and T a stopping time, then


(MT ∧n ) (n = 0, 1, . . . ) is again a martingale with respect to (An ).

b) If in addition
i) T is bounded, i.e. there is a constant K so that P (T ≤ K) = 1 or
ii) P (T < ∞) = 1 and (MT ∧n )n=0,1,... is uniformly integrable, then

E(MT ) = E(M0 ). (14.9)

Proof:

a) Since {T ≤ n} ∈ An , MT ∧n is measurable with respect to An , ∀n.


E(MT∧(n+1) − MT∧n | An) = E((Mn+1 − Mn) 1{T>n} | An) = 1{T>n} E(Mn+1 − Mn | An) = 0,
using {T > n} ∈ An and the martingale property of (Mn).
⇒ (MT ∧n ) martingale with respect to (An ).

b) We have to show (14.9) under the above assumption on (Mn ) and T .


If P (T ≤ K) = 1, then (MT∧n) is uniformly integrable, since
|MT∧n| ≤ max{|M0|, . . . , |MK|} ≤ |M0| + · · · + |MK| =: Z and Z ∈ L1
⇒ (MT∧n) uniformly integrable.
Hence, it suffices to prove (14.9) under assumption (ii).
For ω ∈ {T < ∞}, we have lim_{n→∞} MT∧n(ω) = MT(ω).
Hence E(M0) = lim_{n→∞} E(MT∧n) = E(lim_{n→∞} MT∧n) = E(MT),
using a) for the first equality and uniform integrability of (MT∧n) for the second. □

Example 14.10 (Simple random walk)
Y1, Y2, . . . i.i.d. with P (Yi = 1) = 1/2 = P (Yi = −1), Sn = Σ_{k=1}^{n} Yk (n ≥ 1), S0 = 0,
An = σ(Y1, . . . , Yn), A0 = {∅, Ω}.
Then, (Sn ) is a martingale with respect to (An ), see Example 14.3.
Let T = min{n ≥ 0 : Sn = 1}.
Claim:
P (T < ∞) = 1. (14.10)

Proof: see later -

Note E(S0 ) = 0, E(ST ) = 1.


We conclude that (ST ∧n ) is not uniformly integrable.
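A small simulation (an added illustration, not in the script) makes the failure of uniform integrability visible: E(ST∧n) = 0 for every fixed n by part a) of Theorem 14.9, although ST = 1 a.s.

import random

def stopped_value(n):
    """S_{T ∧ n} for one simulated path, with T = min{k >= 0 : S_k = 1}."""
    s = 0
    for _ in range(n):
        if s == 1:
            break
        s += 1 if random.random() < 0.5 else -1
    return s

runs = 50_000
for n in (10, 100, 1000):
    est = sum(stopped_value(n) for _ in range(runs)) / runs
    print(n, est)  # close to 0 for every n, while S_T = 1 a.s.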

Example 14.11 (The classical ruin problem)


p ∈ (0, 1), x ∈ Z, Y1, Y2, . . . i.i.d., P (Yi = 1) = p = 1 − P (Yi = −1),
An = σ(Y1, . . . , Yn), A0 = {∅, Ω}, Xn = x + Sn, Sn = Σ_{k=1}^{n} Yk, S0 = 0.
a, b ∈ Z, a < b, T = min{n ≥ 0 : Xn ∉ (a, b)}.

Gambling interpretation:

• ruin of the gambler if XT = a.

• ruin of the casino if XT = b.

Claim: P (T < ∞) = 1.

Proof: Let c = b − a.
The events Ak = {Y_{kc+1} = 1, . . . , Y_{(k+1)c} = 1} (k = 0, 1, . . . ) are independent, each
with probability p^c > 0.
The Borel-Cantelli Lemma implies P (∩_n ∪_{k≥n} Ak) = 1; on Ak, the walk makes c = b − a
consecutive steps up, so it must leave (a, b) by time (k + 1)c. ⇒ P (T < ∞) = 1. □

Goal: Calculate the ruin probability r(x) = P (XT = a) = P (x + ST = a).

i) If p = 1/2, (Xn) is a martingale with respect to (An).
   Clearly, |XT∧n| is bounded (a ≤ XT∧n ≤ b) and we can apply Theorem 14.9.
   ⇒ x = E(X0) = E(XT) = a r(x) + b (1 − r(x))

   ⇒ r(x) = (b − x)/(b − a).    (14.11)

ii) If p ≠ 1/2, we apply Theorem 14.9 to the martingale h(Xn) = ((1 − p)/p)^{Xn} = ((1 − p)/p)^{x+Sn}
   from Example 14.6 and we get
   h(x) = E(h(X0)) = E(h(XT)) = h(a) r(x) + h(b)(1 − r(x))

   ⇒ r(x) = (h(b) − h(x))/(h(b) − h(a)) = . . . = (1 − (p/(1 − p))^{b−x}) / (1 − (p/(1 − p))^{b−a}).    (14.12)

Remark If p < 1/2,

r(x) ≥ 1 − (p/(1 − p))^{b−x}    (14.13)

and this bound does not depend on a.
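The formulas (14.11) and (14.12) can be checked by simulation. A sketch (an editorial addition with illustrative names, not part of the script):

import random

def ruin_estimate(p, x, a, b, runs=100_000):
    """Monte Carlo estimate of r(x) = P(X_T = a)."""
    hits_a = 0
    for _ in range(runs):
        z = x
        while a < z < b:
            z += 1 if random.random() < p else -1
        hits_a += (z == a)
    return hits_a / runs

def ruin_formula(p, x, a, b):
    """r(x) according to (14.11) for p = 1/2 and (14.12) otherwise."""
    if p == 0.5:
        return (b - x) / (b - a)
    rho = p / (1 - p)
    return (1 - rho ** (b - x)) / (1 - rho ** (b - a))

print(ruin_estimate(0.5, 2, 0, 10), ruin_formula(0.5, 2, 0, 10))  # both close to 0.8
print(ruin_estimate(0.6, 2, 0, 10), ruin_formula(0.6, 2, 0, 10))  # both close to 0.43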

14.3 The Martingale Convergence Theorem


(Ω, A, P ) probability space, (An ) increasing sequence of σ-fields, (Mn ) martingale with
respect to (An ).
For a < b and N ≥ 1, let U^N_{a,b} be the number of upcrossings of the interval [a, b] during
the time interval [0, N].
More precisely, let S0 = T0 = 0,
Sk = min{n ≥ Tk−1 : Mn ≤ a},
Tk = min{n ≥ Sk : Mn ≥ b},
U^N_{a,b} = max{k : Tk ≤ N}.

Lemma 14.12 (Upcrossing inequality)

E(U^N_{a,b}) ≤ E((M_N − a)^−) / (b − a)    (14.14)
Proof: Sk (k = 1, 2, . . . ) and Tk (k = 1, 2, . . . ) are stopping times.
Theorem 14.9 implies that for Z = Σ_{k=1}^{∞} (M_{Tk∧N} − M_{Sk∧N}), we have E(Z) = 0.
If U^N_{a,b} = m, then Z ≥ m(b − a) + M_N − M_{S_{m+1}∧N}.
Further, M_N − M_{S_{m+1}∧N} ≥ M_N − a if S_{m+1} < N and M_N − M_{S_{m+1}∧N} = 0 else.
⇒ M_N − M_{S_{m+1}∧N} ≥ −(M_N − a)^− ⇒ Z ≥ (b − a) U^N_{a,b} − (M_N − a)^−.
Now, (14.14) follows since E(Z) = 0. □
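The definition of U^N_{a,b} translates directly into a path-counting routine (an added sketch, not from the script):

def upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by the finite path,
    following the definition of the S_k and T_k above."""
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True        # an S_k: the path has dropped to <= a
        elif below and x >= b:
            below = False       # a T_k: the path has climbed to >= b
            count += 1
    return count

print(upcrossings([0, -1, 2, -2, 3, 0], a=0, b=1))  # -> 2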

Remark

1. Let Ua,b := lim_{N→∞} U^N_{a,b}. Then, monotone convergence implies

   E(Ua,b) = lim_{N→∞} E(U^N_{a,b}) ≤ (1/(b − a)) sup_N E((M_N − a)^−)    (14.15)

2. The right hand side of (14.15) is finite if

   sup_N E(M_N^−) < ∞.    (14.16)

   Further, (14.16) is equivalent to

   sup_N E(|M_N|) < ∞.    (14.17)

   Proof: E(|M_N|) = E(M_N^+) + E(M_N^−).
   Since E(M_N^+) − E(M_N^−) = E(M_N) = E(M_0) we have

   sup_N E(|M_N|) < ∞ ⇔ sup_N E(M_N^−) < ∞ ⇔ sup_N E(M_N^+) < ∞. □    (14.18)

Theorem 14.13 ((Doob’s) Martingale Convergence Theorem)
Assume (Mn ) is a martingale which satisfies (14.16) (or (14.17)). Then,
P (Mn converges to a finite limit) = 1 and for M∞ := lim_{n→∞} Mn, we have M∞ ∈ L1.

Proof:
1. We have {lim inf_n Mn < lim sup_n Mn} ⊆ ∪_{a,b∈Q, a<b} {Ua,b = ∞}.
   Due to (14.15), (14.16) implies that P (Ua,b < ∞) = 1, ∀a, b and we conclude that
   lim inf_n Mn = lim sup_n Mn P-a.s.
   ⇒ for P-almost all ω, M∞(ω) := lim_{n→∞} Mn(ω) exists.

2. By Fatou's lemma, E(|M∞|) ≤ lim inf_n E(|Mn|) < ∞ due to (14.18)
   ⇒ M∞ ∈ L1 (and in particular, M∞ is finite, P-a.s.). □
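As an illustration of the theorem (added here; the example is not spelled out in the script): the likelihood-ratio martingale from Section 14.1.3 is nonnegative with E(|Mn|) = E(Mn) = 1, so (14.17) holds and Mn converges a.s.; when q ≠ p, the strong law applied to log Mn shows the limit is 0 P-a.s., so E(M∞) = 0 ≠ 1 and the convergence cannot be in L1.

import random

def lr_final(p, q, n):
    """M_n for one path of the likelihood-ratio martingale of 14.1.3, under P."""
    m = 1.0
    for _ in range(n):
        y = 1 if random.random() < p else 0
        m *= (q if y else 1 - q) / (p if y else 1 - p)
    return m

vals = [lr_final(0.5, 0.6, 2000) for _ in range(20)]
print(max(vals))  # typically astronomically small: each path tends to 0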

Example 14.14 Consider simple random walk (Sn ), see Example 14.10.
(Sn ) is a martingale. Clearly, (Sn ) does not converge (since |Sn − Sn−1 | = 1).
In fact, (14.17) does not hold, since the CLT implies that
(1/√n) E(|Sn|) → (1/√(2π)) ∫_{−∞}^{∞} |x| e^{−x²/2} dx = √(2/π) as n → ∞.
Nevertheless, we can benefit from Theorem 14.13.
Let c ∈ Z\{0}, Tc := min{n ≥ 1 : Sn = c}.
Then, (STc∧n) is again a martingale with respect to (An).
If c > 0, STc∧n ≤ c; if c < 0, STc∧n ≥ c: (STc∧n) is a martingale which is bounded above
(or below, respectively).
⇒ (STc∧n) converges P-a.s. Since |STc∧(n+1) − STc∧n| = 1 as long as n < Tc, the limit can
only exist on {Tc < ∞}.
⇒ P (Tc < ∞) = 1, ∀c ∈ Z\{0} (this proves (14.10) with c = 1)
⇒ P (Tc < ∞) = 1, ∀c ∈ Z
⇒ P (lim sup_n Sn = +∞, lim inf_n Sn = −∞) = 1.
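A final simulation sketch (an editorial addition, not in the script): the stopped walk ST1∧n settles at 1 on almost every path, consistent with P (T1 < ∞) = 1.

import random

def stopped_walk(c, n_max):
    """S_{T_c ∧ n_max} for one path of simple symmetric random walk."""
    s = 0
    for _ in range(n_max):
        if s == c:
            break
        s += 1 if random.random() < 0.5 else -1
    return s

hits = sum(stopped_walk(1, 10_000) == 1 for _ in range(10_000))
print(hits / 10_000)  # close to 1; the gap comes from paths with T_1 > n_max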
