probability_script_070212.pdf
February 7, 2012
3 Integration of functions
5 Lp-spaces, inequalities
7 Independence
13 Conditional expectation
14 Martingales
1 Measure spaces
Let Ω be a non-empty set.
(i) Ω ∈ A
(ii) A ∈ A ⇒ Aᶜ ∈ A
(iii) An ∈ A, ∀n ∈ N ⇒ ⋃_{n∈N} An ∈ A
If An ∈ A, ∀n ∈ N, then also ⋂_{n∈N} An ∈ A.
(i) P(A) ≥ 0, ∀A ∈ A
(ii) P(Ω) = 1
(iii) P(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ P(An) for each sequence of pairwise disjoint sets in A.
In contrast to the discrete case, P is continuous, i.e. there is no ω ∈ Ω with P({ω}) > 0.
Remark A theorem by Ulam says that (assuming the continuum hypothesis) there is no
continuous probability measure on (Ω, P(Ω)) ⇒ (*) is only possible if A ⊊ P(Ω).
In general one cannot take A = P(Ω), see Example 1.5. On the other hand, A should
contain the "interesting" sets (e.g. the An in Example 1.5).
Often, one knows P (A) for certain sets A and one wants to choose A so that one can
extend P to a probability measure on (Ω, A).
Ω is always a non-empty set.
A collection of sets M ⊆ P(Ω) is ∪-stable if finite unions of sets in M are again in M,
and ∩-stable if finite intersections of sets in M are again in M.
A collection A0 ⊆ P(Ω) is an algebra if:
(i) Ω ∈ A0
(ii) A ∈ A0 ⇒ Aᶜ ∈ A0
(iii) A0 is ∪-stable
Each σ-field is an algebra, but there are algebras which are not σ-fields:
Example Ω = N. Then, A0 = {A ⊆ Ω | A finite or Aᶜ finite} is the algebra of finite
or cofinite sets. A0 is not a σ-field: the singletons {2n}, n ∈ N, belong to A0, but their
union, the set of even numbers, is neither finite nor cofinite.
A collection D ⊆ P(Ω) is a λ-system if:
(i) Ω ∈ D
(ii) A, B ∈ D, A ⊆ B ⇒ B\A ∈ D
(iii) An ∈ D, ∀n ∈ N, Ai ∩ Aj = ∅ for i ≠ j ⇒ ⋃_{n=1}^∞ An ∈ D
Each σ-field is a λ-system. On the other hand, there are λ-systems which are not σ-fields.
Example Ω = {1, 2, 3, 4}, D = {∅, Ω, {1, 2}, {1, 4}, {3, 4}, {2, 3}}. D is a λ-system, but
not a σ-field.
Note that a λ-system which is ∩-stable is a σ-field:
for A1, A2, ... ∈ D: ⋃_{n=1}^∞ An = A1 ∪ ⋃_{n=2}^∞ (An ∩ A1ᶜ ∩ ... ∩ A_{n−1}ᶜ) ⇒ ⋃_{n=1}^∞ An ∈ D.
Definition 1.9 Let M ⊆ P(Ω), M ≠ ∅. Then
σ(M) = ⋂{A | M ⊆ A ⊆ P(Ω), A σ-field} is the σ-field generated by M.
D(M) = ⋂{A | M ⊆ A ⊆ P(Ω), A λ-system} is the λ-system generated by M.
α(M) = ⋂{A | M ⊆ A ⊆ P(Ω), A algebra} is the algebra generated by M.
M is the generating system of σ(M), D(M) or α(M).
Proof: Use Lemma 1.8, where I is an index set for {A | M ⊆ A ⊆ P(Ω), A σ-field}.
We have I ≠ ∅, since M ⊆ P(Ω) and P(Ω) is a σ-field. Hence σ(M) is a σ-field. □
(a) M ⊆ σ(M)
Proof:
Example 1.13 An important σ-field is the Borel-σ-field on Rk .
Let Ok = {A ⊆ Rk | A open} be the collection of open subsets of Rk .
The σ-field Bk of Borel-sets of Rk (Borel-σ-field) is Bk = σ(Ok ).
Goal Assume that for A ∈ M ⊆ P(Ω), P (A) is known (see for instance the sets An in
example 1.5), M is not a σ-field.
Can we extend P to a probability measure on a σ-field A with M ⊆ A?
Is this extension unique?
In the following definitions, think of Ω = Rᵏ, M = Iᵏ (the half-open rectangles (x, y]),
µ = k-dimensional content.
d) σ-additive, if for pairwise disjoint sets A1, A2, ... ∈ M with ⋃_{j=1}^∞ Aj ∈ M, we have
µ(⋃_{j=1}^∞ Aj) = Σ_{j=1}^∞ µ(Aj).
b) Let T ⊆ Ω and define a measure µT by
µT(A) = |A ∩ T| if A ∩ T is finite, and µT(A) = ∞ else.
µT is the counting measure on T.
In particular, take in c) A = P(Ω), µn = δ_{ωn}, ωn ∈ Ω, ∀n, bn > 0, ∀n, and Σ_{n=1}^∞ bn = 1.
In this case, the measure is a discrete measure which is concentrated on the set {ω1, ω2, ...}.
b) A ⊆ B ⇒ µ(A) ≤ µ(B)
e) An ր A (i.e. A1 ⊆ A2 ⊆ ..., A = ⋃_{i=1}^∞ Ai) ⇒ µ(A) = lim_{n→∞} µ(An)
f) An ց A (i.e. A1 ⊇ A2 ⊇ ..., A = ⋂_{i=1}^∞ Ai) and µ(A1) < ∞ ⇒ µ(A) = lim_{n→∞} µ(An)
Remark
Theorem 1.18 (Uniqueness for measures agreeing on a ∩-stable collection of sets)
Let M ⊆ P(Ω) be ∩-stable and A = σ(M). Assume that µ1 , µ2 are measures on (Ω, A)
with µ1 (A) = µ2 (A), ∀A ∈ M and assume that there is a sequence (An )n≥1 , An ∈ M, ∀n
with An ր Ω and µ1 (An ) = µ2 (An ) < ∞, ∀n. Then µ1 (A) = µ2 (A), ∀A ∈ A.
Corollary 1.19
Proof:
i) ∅ ∈ S.
ii) S is ∩-stable.
iii) For A, B ∈ S, A\B is a finite union of pairwise disjoint sets in S.
Examples
a) Ω = Rᵏ, S = Iᵏ.
Proof: exercise. □
Definition 1.21 A non-negative function µ* : P(Ω) → [0, ∞] is an outer measure
("äußeres Maß") if:
i) µ*(∅) = 0
ii) A ⊆ B ⇒ µ*(A) ≤ µ*(B) (monotonicity)
iii) µ*(⋃_{n=1}^∞ An) ≤ Σ_{n=1}^∞ µ*(An) (σ-subadditivity)
a) A(µ∗ ) is a σ-field on Ω.
i) µ(∅) = 0
iii) µ is σ-subadditive
Remark iii) in Theorem 1.24 can be replaced with "µ is σ-additive":
µ is σ-additive ⇔ µ is σ-subadditive.
Proof: Follows from Theorem 1.24 and Lemma 1.25 since each algebra is a semiring. □
Theorem 1.28 If G is a cdf then there exists a unique measure µG on (R, B) such that
µG((a, b]) = G(b) − G(a) for all a < b.
Further, µG is σ-finite.
If G is the cdf of a probability measure, then µG is a probability measure.
µG is the Lebesgue-Stieltjes measure associated to G.
Example 1.29
3. For a probability measure ν on (R, B), F (x) := ν((−∞, x]) is the cdf of ν.
F is a cdf and we have µF = ν.
Example ν = exponential distribution with parameter α.
Then F(x) = 1 − e^{−αx} for x > 0 and F(x) = 0 for x ≤ 0.
4. A ⊆ R, A countable ⇒ λ(A) = 0.
The converse is not true:
Example (Cantor set)
A1 = (1/3, 2/3), A2 = (1/9, 2/9) ∪ (7/9, 8/9), and in general
Ak = ⋃_{(z1,...,z_{k−1})∈{0,1}^{k−1}} ( Σ_{j=1}^{k−1} 2zj/3^j + 1/3^k , Σ_{j=1}^{k−1} 2zj/3^j + 2/3^k )  (k ≥ 2)
A := ⋃_{k=1}^∞ Ak (a disjoint union).
The Cantor set C is defined by C := [0, 1]\A.
Claim: C is not countable.
Proof: Consider the function {0, 1, 2}^N → [0, 1], a ↦ Σ_{k=1}^∞ ak 3^{−k}.
The set {a = (ak)_{k≥1} | ak ∈ {0, 2}, ∀k} is mapped one-to-one to C. □
We have C ∈ B and λ(A) = Σ_{k=1}^∞ λ(Ak) = Σ_{k=1}^∞ 2^{k−1} 3^{−k} = (1/3) Σ_{k=1}^∞ (2/3)^{k−1} = 1
⇒ λ(C) = 0.
Further, Theorem 1.24 implies the existence of the Borel-Lebesgue measure λk on (Rk , Bk ).
λk is defined on the semiring by λk((x, y]) = Π_{i=1}^k (yi − xi).
is a measure on (Ω, Aµ ) which extends µ. The measure space (Ω, Aµ , µ̄) is complete.
We say that (Ω, Aµ , µ̄) is the completion of (Ω, A, µ).
Remark The measure space (Rᵏ, Bᵏ, λᵏ) is not complete. Its completion is denoted
(Rᵏ, Lᵏ, λᵏ). Lᵏ is the σ-field of Lebesgue-measurable sets and λᵏ is the Lebesgue measure
on (Rᵏ, Lᵏ).
2 Measurable functions, random variables
Ω 6= ∅, Ω̃ 6= ∅, f : Ω → Ω̃.
The preimage (”Urbild”) of a set à ⊆ Ω̃ is f −1 (Ã) = {ω ∈ Ω | f (ω) ∈ Ã}.
The corresponding function f⁻¹ is defined by f⁻¹ : P(Ω̃) → P(Ω), Ã ↦ f⁻¹(Ã).
f⁻¹ commutes with ∪ and ∩; more precisely: let I be an index set and Ãj ⊆ Ω̃ (j ∈ I). Then
f⁻¹(⋃_{j∈I} Ãj) = ⋃_{j∈I} f⁻¹(Ãj), f⁻¹(⋂_{j∈I} Ãj) = ⋂_{j∈I} f⁻¹(Ãj), f⁻¹(Ãjᶜ) = (f⁻¹(Ãj))ᶜ,
f⁻¹(Ω̃) = Ω.
Definition 2.2 Let (Ω, A) and (Ω̃, Ã) be measurable spaces. The function f : Ω → Ω̃ is
(A, Ã)-measurable if f⁻¹(Ã) ∈ A, ∀Ã ∈ Ã.
Examples
1. If A = {∅, Ω} and Ã = P(Ω̃), only the constant functions f with f(ω) = ω̃, ∀ω ∈ Ω,
are (A, Ã)-measurable.
Lemma 2.3 Let (Ω, A) and (Ω̃, Ã) be measurable spaces, f : Ω → Ω̃ and assume
M̃ ⊆ P(Ω̃) satisfies σ(M̃) = Ã. Then
f is (A, Ã)-measurable ⇔ f −1 (M̃) ⊆ A.
Proof: "⇒" is clear.
"⇐" holds since M̃ ⊆ Af and Af is a σ-field due to Lemma 2.1 b) ⇒ Ã = σ(M̃) ⊆ Af. □
Proof:
Let A be a σ-field on Ω such that fk is (A, Ak)-measurable ∀k ∈ J.
⇒ ⋃_{j∈J} fj⁻¹(Aj) ⊆ A ⇒ σ(⋃_{j∈J} fj⁻¹(Aj)) ⊆ A.
Hence, σ(fj, j ∈ J) is the smallest σ-field A such that all functions fk are
(A, Ak)-measurable.
Definition 2.7 In the above setup, the σ-field σ(Πj, j ∈ J) is the product σ-field
("Produkt-σ-Algebra") of F1, F2, ... and we write Π_{j=1}^∞ Fj.
Lemma 2.8 Let (Ω, A, µ) be a measure space and (Ω̃, Ã) be a measurable space and
f : (Ω, A) → (Ω̃, Ã).
Then, µf : Ã → [0, ∞], Ã ↦ µf(Ã) = µ(f⁻¹(Ã)), is a measure on (Ω̃, Ã).
µf is the image of µ under f or the law of f under µ (”Bildmaß”).
Definition 2.9 Let P be a probability measure on (Ω, A), let (Ω̃, Ã) be a measurable
space. Then, a function X : (Ω, A) → (Ω̃, Ã) is a random variable with values in Ω̃.
The law PX of X under P (PX is a probability measure) is the law of X.
If (Ω̃, Ã) = (Rᵏ, Bᵏ), the random variable X = (X1, ..., Xk) is called
a random vector if k ≥ 2, and a real-valued random variable if k = 1.
If k ≥ 2, PX = P_{(X1,...,Xk)} is the joint law of X1, ..., Xk under P.
Define a function f : [0, 1] → {0, 1}^N in the following way:
choose for a ∈ [0, 1] a sequence (ak)_{k≥1} such that a = Σ_{k=1}^∞ ak 2^{−k} (if there are two
such sequences, take the sequence with infinitely many 1's).
Equip {0, 1}^N with the product σ-field Π_{i∈N} Fi, where Fi = {∅, {0}, {1}, {0, 1}}.
Then, f : ([0, 1], B[0,1]) → (Ω, A) with Ω = {0, 1}^N, A = Π_{i∈N} Fi, i.e. f is
(B[0,1], A)-measurable.
Proof: exercise. □
Remark The image of λ under f is the probability measure on {0, 1}N which describes
infinitely many tosses of a fair coin.
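The map f can be illustrated with a short Python sketch (our own illustration, not part of the script; the function names are ours). It computes the binary digits of a ∈ [0, 1), which behave like tosses of a fair coin when a is drawn from U[0, 1]:

```python
import random

def binary_digits(a, n):
    """First n binary digits (a_1, ..., a_n) of a in [0, 1), a ~ sum a_k 2^{-k}.
    For dyadic rationals this greedy choice picks the terminating expansion,
    which differs from the script's convention only on a Lebesgue-nullset."""
    digits = []
    for _ in range(n):
        a *= 2
        d = int(a)
        digits.append(d)
        a -= d
    return digits

# Drawing a ~ U[0,1] and taking binary digits yields fair coin tosses:
random.seed(0)
tosses = binary_digits(random.random(), 20)
print(tosses, "relative frequency of 1:", sum(tosses) / len(tosses))
```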
We want now to consider functions (or random variables) which can take values in
R̄ = R ∪ {−∞, ∞}.
Let B̄ = {B ⊆ R̄ | B ∩ R ∈ B}. B̄ is a σ-field on R̄.
We have B̄ = B ∪ {B ∪ {∞} | B ∈ B} ∪ {B ∪ {−∞} | B ∈ B} ∪ {B ∪ {−∞, ∞} | B ∈ B}.
We will consider functions f : (Ω, A) → (R̄, B̄). We write
{f > a} = {ω ∈ Ω | f (ω) > a} = f −1 ((a, ∞])
{f = g} = {ω ∈ Ω | f (ω) = g(ω)}
{f = ∞} = {ω ∈ Ω | f (ω) = ∞}
a) f is (A, B̄)-measurable
b) {f > a} ∈ A, ∀a ∈ R
c) {f ≥ a} ∈ A, ∀a ∈ R
d) {f < a} ∈ A, ∀a ∈ R
e) {f ≤ a} ∈ A, ∀a ∈ R
Proof: {f < g} = ⋃_{a∈Q} ({f < a} ∩ {g > a}),
{f ≤ g} = {f > g}ᶜ,
{f = g} = {f ≤ g} ∩ {g ≤ f},
{f ≠ g} = {f = g}ᶜ. □
Proof:
a) Due to Lemma 2.14, a + cf and −f are (A, B̄)-measurable, too (a, c ∈ R).
Hence {f + g ≥ a} = {f ≥ a − g} ∈ A due to Lemma 2.14.
But {f² ≥ a} = Ω if a ≤ 0, and {f² ≥ a} = {f ≥ √a} ∪ {f ≤ −√a} if a > 0
⇒ {f² ≥ a} ∈ A ⇒ f² is (A, B̄)-measurable.
General case:
Ω1 = {f · g = ∞}
Ω2 = {f · g = −∞}
Ω3 = {f · g = 0}
Ω4 = Ω\(Ω1 ∪ Ω2 ∪ Ω3 )
Then, Ωj ∈ A (j = 1, 2, 3, 4). The restriction f|Ω4 of f to Ω4 is (A, B)-measurable,
same for g|Ω4. Due to the first case, f|Ω4 · g|Ω4 is (A, B)-measurable. But
{f·g ≥ a} = ⋃_{j=1}^4 ({f·g ≥ a} ∩ Ωj) = {f·g = ∞} ∪ ∅ ∪ ({f·g ≥ a} ∩ Ω3) ∪ (f|Ω4 · g|Ω4)⁻¹([a, ∞))
(a disjoint union) and
{f·g ≥ a} ∩ Ω3 = {f·g = 0} if a ≤ 0, and = ∅ else
⇒ f·g : (Ω, A) → (R̄, B̄).
c) Assume a > 0. Then
{1/g ≥ a} = ({1/g ≥ a} ∩ {g > 0}) ∪ ({1/g ≥ a} ∩ {g = 0}) ∪ ({1/g ≥ a} ∩ {g < 0})
= {0 < g ≤ 1/a} ∪ {g = 0} ∪ ∅,
and each of these sets is in A
⇒ {1/g ≥ a} ∈ A.
The case a < 0 is analogous. □
a) sup_n fn, inf_n fn
Proof:
a) {sup_n fn ≤ a} = ⋂_{n=1}^∞ {fn ≤ a} ∈ A;
inf_n fn = −sup_n(−fn).
b) follows from a) since lim sup_n fn = inf_{n≥1} sup_{k≥n} fk and lim inf_n fn = sup_{n≥1} inf_{k≥n} fk.
Note that f⁺ ≥ 0, f⁻ ≥ 0, f = f⁺ − f⁻.  (2.2)
Corollary 2.17 If (fn)_{n≥1} is a sequence of functions which are (A, B̄)-measurable and
lim_{n→∞} fn(ω) exists ∀ω (we say that the sequence (fn) converges pointwise), then
lim_{n→∞} fn is (A, B̄)-measurable.
3 Integration of functions
Assume (Ω, A, µ) is a measure space.
Goal: For as many functions f : (Ω, A) → (R̄, B̄) as possible, define ∫ f dµ.
We proceed in 4 steps:
1. f = IA: define ∫ IA dµ = ∫_A 1 dµ = µ(A)  (A ∈ A).
with Aj = f⁻¹({αj}) ∈ A and Ω = ⋃_{j=1}^n Aj (a disjoint union).
Define ∫ f dµ = Σ_{j=1}^n αj µ(Aj);
one has to show that for two different representations of the form (3.1), the two
integrals coincide.
Define:
∫ f dµ = lim_n ∫ un dµ.  (3.2)
4. Definition 3.2 The function f : (Ω, A) → (R̄, B̄) is µ-integrable or integrable if
∫ f⁺ dµ < ∞ and ∫ f⁻ dµ < ∞.
In this case,
∫ f dµ := ∫ f⁺ dµ − ∫ f⁻ dµ.  (3.3)
∫ f dµ is the integral of f with respect to µ.
Notations:
∫ f dµ = µ(f) = ∫ f(ω) µ(dω) = ∫_Ω f dµ
Remark
1. If ∫ f⁺ dµ < ∞ or ∫ f⁻ dµ < ∞ then (3.3) still defines ∫ f dµ, with possible values
+∞ or −∞. In this case, we say that f is quasi-integrable.
Examples
If f is integrable w.r.t. µ, then ∫ f dµ = Σ_{n=1}^∞ bn f(ωn).
Theorem 3.4 (Monotone Convergence)
If 0 ≤ f1 ≤ f2 ≤ ... then
∫ lim_n fn dµ = lim_n ∫ fn dµ.
Remark The assumption fn ≥ 0 is needed: for fn = −I_{[n,∞)} on (R, B, λ) we have fn ր 0,
but ∫ fn dλ = −∞ for all n.
Proof: In 4 steps, see the definition of the integral at the beginning of the chapter. □
4 Markov inequality, Fatou's lemma, dominated convergence
Definition 4.1 (Ω, A, µ) measure space. We say that a property E holds µ-almost
everywhere (µ-a.e.) ("µ-fast überall") if {ω | E does not hold for ω} is a µ-nullset.
If P is a probability measure, we say that E holds P -almost surely (P -a.s.)
(”P -fast sicher”). In this case P ({ω | E holds for ω}) = 1.
Theorem 4.2 Let f, g : (Ω, A) → (R̄, B̄) and assume f = g, µ-a.e. Then
f is µ-integrable ⇔ g is µ-integrable
and in this case: ∫ f dµ = ∫ g dµ.
Consequences 4.4
1. f : (Ω, A) → (R̄, B̄), f ≥ 0. Then,
∫ f dµ = 0 ⇔ f = 0 µ-a.e.
Proof: µ({f ≥ t}) ≤ (1/t) ∫ f dµ = 0 (t > 0),
hence µ({f > 0}) = µ(⋃_n {f ≥ 1/n}) ≤ Σ_n µ({f ≥ 1/n}) = 0
⇒ µ({f > 0}) = 0. □
2. If f is µ-integrable, then |f| < ∞ µ-a.e.  (4.1)
Proof: µ({|f| = ∞}) ≤ µ({|f| ≥ n}) ≤ (1/n) ∫ |f| dµ → 0 (n → ∞). □
Proof: gn := inf_{k≥n} fk, n = 1, 2, ...
Then 0 ≤ g1 ≤ g2 ≤ ... and lim inf_{n→∞} fn = lim_{n→∞} gn.
Hence ∫ lim inf_{n→∞} fn dµ = ∫ lim_{n→∞} gn dµ = lim_{n→∞} ∫ gn dµ (monotone convergence)
≤ lim inf_{n→∞} ∫ fn dµ (since gn ≤ fn). □
Remark
5 Lp-spaces, inequalities
(Ω, A, µ) measure space, p ≥ 1.
Definition 5.1 f : (Ω, A) → (R̄, B̄) has a finite p-th absolute moment if ∫ |f|ᵖ dµ < ∞.
Lᵖ = Lᵖ(Ω, A, µ) := {f : (Ω, A) → (R̄, B̄) | ∫ |f|ᵖ dµ < ∞} is the collection of measurable
functions on (Ω, A) with a finite p-th absolute moment.
Definition 5.3 f : (Ω, A) → (R̄, B̄) is µ-almost everywhere bounded if there is a constant
K (0 ≤ K < ∞) such that µ({|f | > K}) = 0
L∞ = L∞ (Ω, A, µ) = {f : (Ω, A) → (R̄, B̄) | f µ-a.e. bounded}
Proof of Hölder's inequality: W.l.o.g. let 0 < ‖f‖p ‖g‖q < ∞.
Due to Lemma 5.7 we have
(|f|/‖f‖p)(|g|/‖g‖q) ≤ (1/p) |f|ᵖ/‖f‖pᵖ + (1/q) |g|^q/‖g‖q^q.
Integrating with respect to µ yields
∫ |fg| dµ / (‖f‖p ‖g‖q) ≤ (1/p)·1 + (1/q)·1 = 1. □
Proof:
Step 1 p < ∞
We can restrict ourselves to the non-trivial case in which we have ‖f‖p, ‖g‖p < ∞
and thus ‖f + g‖p < ∞. Moreover, w.l.o.g. we can assume that f, g are non-negative
since we have ‖f + g‖p ≤ ‖|f| + |g|‖p.
Further, for p = 1 we have ‖|f| + |g|‖p = ‖f‖p + ‖g‖p and thus we can assume p > 1.
We define q := 1/(1 − 1/p) ⇒ 1/p + 1/q = 1, and we have
∫ (f + g)ᵖ dµ = ∫ f(f + g)^{p−1} dµ + ∫ g(f + g)^{p−1} dµ
≤ ‖f‖p ‖(f + g)^{p−1}‖q + ‖g‖p ‖(f + g)^{p−1}‖q  (Hölder)
= (‖f‖p + ‖g‖p) (∫ (f + g)^{(p−1)q} dµ)^{1/q}
= (‖f‖p + ‖g‖p) (∫ (f + g)ᵖ dµ)^{1−1/p}  (since (p − 1)q = p).
Step 2 p = ∞
Again we assume that we have ‖f‖∞, ‖g‖∞ < ∞.
For all ε > 0 we have
µ({|f + g| > ‖f‖∞ + ‖g‖∞ + ε}) ≤ µ({|f| > ‖f‖∞ + ε/2}) + µ({|g| > ‖g‖∞ + ε/2}) = 0
⇒ µ({|f + g| > s}) = 0, ∀s > ‖f‖∞ + ‖g‖∞. □
• ‖f‖p ≥ 0
• f ≡ 0 ⇒ ‖f‖p = 0
• ‖αf‖p = |α| ‖f‖p
• ‖f + g‖p ≤ ‖f‖p + ‖g‖p
We define N0 := {f ∈ Lᵖ : f = 0 µ-a.e.} and observe that N0 is a linear subspace of Lᵖ.
Let Lᵖ := Lᵖ(Ω, A, µ)/N0 denote the quotient space of Lᵖ with respect to N0.
Lᵖ is a normed vector space over R with norm ‖[f]‖p := ‖f‖p for f ∈ [f], where [f]
denotes the equivalence class of f.
Proof:
Lemma 5.11 Let (Ω, A, µ) be a measure space, p ∈ [1, ∞] and (fn ) a sequence of R̄-
valued, (A, B̄)-measurable functions such that:
• ∃g ≥ 0, g ∈ Lp , |fn | ≤ g µ-a.e., ∀n ∈ N
With the help of Lemma 5.11, we can conclude that (f_{n_k}) converges in Lᵖ to f since we have
|f_{n_k}| ≤ |f_{n_k} − f_{n_{k−1}} + f_{n_{k−1}} − ... + f_{n_2} − f_{n_1} + f_{n_1}| ≤ Σ_k |f_{n_k} − f_{n_{k−1}}| + |f_{n_1}| ∈ Lᵖ.
In other words, ‖f_{n_k} − f‖p → 0 (k → ∞), and since (fn) is a Cauchy sequence,
‖fn − f‖p → 0 (n → ∞). □
Proof: hN = Σ_{n=1}^N gn ր h = Σ_{n=1}^∞ gn.
Hence (∫ hNᵖ dµ)^{1/p} ր (∫ hᵖ dµ)^{1/p} (monotone convergence).
But (∫ hNᵖ dµ)^{1/p} ≤ Σ_{n=1}^N ‖gn‖p ≤ Σ_{n=1}^∞ ‖gn‖p (Minkowski).
Hence sup_N (∫ hNᵖ dµ)^{1/p} ≤ Σ_{n=1}^∞ ‖gn‖p. □
Example 5.13 (fn → f in Lᵖ, but fn ↛ f a.e.)
Ω = [0, 1], A = B and µ = λ.
Define a sequence (An) as follows:
m = 1: A1 = [0, 1)
m = 2: A2 = [0, 1/2), A3 = [1/2, 1)
m = 3: A4 = [0, 1/3), A5 = [1/3, 2/3), A6 = [2/3, 1)
···
Define (fn) as fn := I_{An}, n ≥ 1, and f = 0.
Then (fn) converges to f in Lᵖ, for 1 ≤ p < ∞.
Indeed, as soon as n ≥ Σ_{j=1}^{m−1} j, we have ∫ |fn − f|ᵖ dλ = λ(An) ≤ 1/m.
But fn ↛ f µ-a.e.
Indeed, ∀ω ∈ (0, 1): lim sup_n fn(ω) = 1, lim inf_n fn(ω) = 0.
Remark fn ↛ f in L∞.
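A small Python sketch (ours, for illustration) of this "sliding indicator" example: the interval lengths λ(An) tend to 0, giving Lᵖ-convergence, while any fixed ω keeps landing in infinitely many An, so there is no a.e. convergence:

```python
from fractions import Fraction

def intervals(count):
    """The intervals A_1 = [0,1), A_2 = [0,1/2), A_3 = [1/2,1), A_4 = [0,1/3), ..."""
    result, m = [], 1
    while len(result) < count:
        for j in range(m):
            result.append((Fraction(j, m), Fraction(j + 1, m)))
        m += 1
    return result[:count]

A = intervals(21)
omega = Fraction(1, 3)                       # a fixed point of (0, 1)
f_at_omega = [1 if a <= omega < b else 0 for (a, b) in A]
print([float(b - a) for (a, b) in A][-5:])   # lambda(A_n) -> 0: L^p convergence
print(f_at_omega)                            # keeps returning to 1: no a.e. limit
```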
6 Different types of convergence, uniform integrability
(Ω, A, P ) a probability space.
X, X1 , X2 , . . . random variables.
Definition 6.1
• We say that (Xn) converges in probability to X (notation: Xn → X in probability) if
∀ε > 0 : P(|Xn − X| ≥ ε) → 0 (n → ∞).
Theorem 6.2
i) If Xn → X in Lᵖ, then Xn → X in probability.
ii) If Xn → X a.s., then Xn → X in probability.
One can write Cᶜ = ⋃_{k=1}^∞ {lim sup_n |Xn − X| ≥ 1/k} = ⋃_{k=1}^∞ D_{1/k} and
0 ≤ P(Cᶜ) ≤ Σ_{k=1}^∞ P(D_{1/k}) = 0 ⇒ P(C) = 1. □
Remark The converse implications of i) and ii) in Theorem 6.2 do not hold.
b) Every subsequence (X_{n_k}) of (Xn) has a further subsequence (X_{ñ_k}) such that
X_{ñ_k} → X a.s. (k → ∞).
Proof: a) ⇒ b):
Let (X_{n_k}) be a subsequence of (Xn).
Then, there exists a subsequence (X_{ñ_k}) of (X_{n_k}) such that P(|X_{ñ_k} − X| ≥ 1/k) ≤ 1/k², k ≥ 1.
We now prove that ∀ε > 0, P(sup_{k≥n} |X_{ñ_k} − X| ≥ ε) → 0 (n → ∞).
Let ε > 0 and n large enough such that 1/n < ε. Then
P(sup_{k≥n} |X_{ñ_k} − X| ≥ ε) ≤ Σ_{k=n}^∞ P(|X_{ñ_k} − X| ≥ ε) ≤ Σ_{k=n}^∞ P(|X_{ñ_k} − X| ≥ 1/k)
≤ Σ_{k=n}^∞ 1/k² → 0 (n → ∞).
b) ⇒ a):
Let ε > 0 and define bn = P(|Xn − X| ≥ ε). (bn) is bounded.
Let (b_{n_k}) be a convergent subsequence of (bn).
Then there exists (b_{ñ_k}) such that X_{ñ_k} → X a.s. (k → ∞), which implies that b_{ñ_k} → 0
(k → ∞) by Theorem 6.2 ii).
⇒ b_{n_k} → 0 (k → ∞).
Hence, every convergent subsequence (b_{n_k}) of (bn) converges to 0
⇒ bn → 0 (n → ∞). □
Remark We already know from Theorem 5.10 that for a sequence (Xn) such that
Xn → X in Lᵖ (1 ≤ p ≤ ∞), there is a subsequence (X_{n_k}) such that X_{n_k} → X a.s.
Definition 6.6 We say that the sequence (Xn) is uniformly integrable if
lim_{c→∞} ( sup_n ∫_{|Xn|≥c} |Xn| dP ) = 0.  (6.1)
Note that ∫_{|Xn|≥c} |Xn| dP = E(|Xn| I_{|Xn|≥c}).
Remarks
Lemma 6.8 If (Xn) is uniformly integrable then there is, for each ε > 0, some δ = δ(ε) > 0
so that P(A) ≤ δ ⇒ sup_n ∫_A |Xn| dP ≤ ε.
For c = c(ε) large enough and δ = ε/(2c), both terms on the right hand side of (6.3) are
≤ ε/2. □
Proof of Theorem 6.7: Xn → X in probability. W.l.o.g. X ≡ 0.
Then E(|Xn|) ≤ ε + ∫_{|Xn|≥ε} |Xn| dP. (Take c = ε, A = Ω in (6.3).)
But P(|Xn| ≥ ε) ≤ δ for n ≥ N0(δ) since Xn → 0 in probability, hence ∫_{|Xn|≥ε} |Xn| dP ≤ ε
for n ≥ N0(ε) because of Lemma 6.8. Hence E(|Xn|) → 0 (n → ∞). □
a) Xn → X in Lp .
Proof: see exercises. □
|x| ≥ c(δ) ⇒ g(|x|)/|x| ≥ 1/δ  (6.4)
Hence, sup_n ∫_{|Xn|≥c(δ)} |Xn| dP ≤ δ sup_n ∫ g(|Xn|) dP ≤ ε for δ = δ(ε), by (6.4) and since
sup_n ∫ g(|Xn|) dP < ∞. □
Then, take g(x) = xᵖ (p > 1) or g(x) = x log⁺ x to get the statements above.
7 Independence
(Ω, A, P ) probability space.
Remark 7.2 In particular, two events A and B are independent if P(A ∩ B) = P(A)P(B).
Note that there may be probability measures Q on (Ω, A) with Q(A ∩ B) ≠ Q(A)Q(B).
Hence, independence is not a property of A and B alone but involves P.
(ii) The collections Ck := { ⋂_{i∈J} Ai | J ⊆ Jk, J finite, Ai ∈ Gi } (k ∈ K) are ∩-stable and
independent:
for C_{k_j} = ⋂_{i∈J_{k_j}} Ai ∈ C_{k_j}, we have
P(C_{k_1} ∩ ... ∩ C_{k_n}) = P( ⋂_{i∈J_{k_1}} Ai ∩ ... ∩ ⋂_{i∈J_{k_n}} Ai ) = Π_{i∈J_{k_1}} P(Ai) · ... · Π_{i∈J_{k_n}} P(Ai)
= P(C_{k_1}) · ... · P(C_{k_n}).
Due to (i), the generated σ-fields σ(Ck) = σ(⋃_{i∈J_k} Gi) (k ∈ K) are independent, too. □
Note that ⋂_n ⋃_{k≥n} Ak is the event that Ak happens for infinitely many k.
Remark 7.5 For an arbitrary sequence (An)_{n∈N} of measurable sets in some measure space
we define
lim sup_{n→∞} An := ⋂_{n∈N} ⋃_{k≥n} Ak and
lim inf_{n→∞} An := ⋃_{n∈N} ⋂_{k≥n} Ak.
It is an easy exercise to show that we have lim inf_{n→∞} An ⊆ lim sup_{n→∞} An and that
(lim sup_{n→∞} An)ᶜ = lim inf_{n→∞} Anᶜ.
lim sup_{n→∞} An = {ω ∈ Ω : ω ∈ An for infinitely many n ∈ N}.
lim inf_{n→∞} An = {ω ∈ Ω : ω ∈ An for all but finitely many n ∈ N}.
Due to Theorem 7.3 the events (Anᶜ)_{n∈N} are also independent and hence we have
P(⋂_{k≥n} Akᶜ) = lim_{m→∞} P(⋂_{k=n}^m Akᶜ) = lim_{m→∞} Π_{k=n}^m P(Akᶜ)  (Thm. 1.17 (f))
= lim_{m→∞} Π_{k=n}^m (1 − P(Ak)) ≤ lim inf_{m→∞} e^{−Σ_{k=n}^m P(Ak)} = 0  (using 1 − x ≤ e^{−x}). □
Interpretation of A∞
1) Dynamical:
If we interpret I as a sequence {1, 2, 3, ...} of points in time and An as the σ-algebra
of all events observable at time n ∈ N, then we have
A∞ =: A* = ⋂_{n=1}^∞ σ(⋃_{k≥n} Ak) = ⋂_{n=1}^∞ σ(An, An+1, ...).
A* can be interpreted as the σ-algebra of all events observable "in the infinitely distant
future".
Example Let (Xn)_{n∈N} be a sequence of random variables on (Ω, A).
We define An := σ(X1, X2, ..., Xn); then we have A* = ⋂_{n=1}^∞ σ(Xn, Xn+1, ...).
The events {lim_{n→∞} (Σ_{k=1}^n Xk)/cn ≤ t} and {lim sup_{n→∞} (Σ_{k=1}^n Xk)/cn ≤ t} for cn, t ∈ R
with cn ր ∞ are elements of A*.
Due to Kolmogorov's 0-1-law we have P(A) ∈ {0, 1}, ∀A ∈ A*, if the random variables
X1, X2, ... are independent.
2) Static:
We interpret I as the set of "subsystems" which act independently of each other and
Ai as the σ-algebra of events which only depend on the i-th subsystem.
Then A∞ is the collection of all "macroscopic" events which do not depend on finitely
many subsystems. Thus, if the subsystems are independent, we know that on this
"macroscopic scale" the whole system is deterministic.
Step 2 Again with the help of Theorem 7.3 we can conclude that σ(⋃_{i∈I} Ai) and A∞ are
independent.
Step 3 Let A ∈ A∞. Then A is also an element of σ(⋃_{i∈I} Ai), since
A∞ = ⋂_{J⊆I, |J|<∞} σ(⋃_{i∈I\J} Ai) ⊆ σ(⋃_{i∈I} Ai).
Therefore, step 2 of this proof implies P(A) = P(A ∩ A) = P(A)P(A)
(A is independent of itself)
⇒ P(A) ∈ {0, 1}. □
Remark 7.7 Let (Ai)_{i∈I} be a countable collection of independent σ-algebras and let
Ai ∈ Ai (i ∈ I).
Then we have that Ā := ⋂_{J⊆I, |J|<∞} ⋃_{i∉J} Ai is an element of A∞ and thus P(Ā) ∈ {0, 1},
and the Borel-Cantelli Lemma can be used to decide whether P(Ā) = 1 or P(Ā) = 0.
Definition 7.8 A collection of random variables (Xi)_{i∈I} is independent if the σ-algebras
generated by the random variables Xi are independent, i.e. if σ(Xi) (i ∈ I) are
independent.
8 Kernels and Fubini’s Theorem
Let (Ω1, F1) and (Ω2, F2) be measurable spaces.
Definition 8.1 A transition kernel or stochastic kernel K(x1, dx2) from (Ω1, F1) to
(Ω2, F2) is a function K : Ω1 × F2 → [0, 1], (x1, A2) ↦ K(x1, A2), such that
(i) for each A2 ∈ F2, x1 ↦ K(x1, A2) is F1-measurable, and
(ii) for each x1 ∈ Ω1, A2 ↦ K(x1, A2) is a probability measure on (Ω2, F2).
Interpretations of kernels
1. Dynamic:
(Ωi , Fi ) = state space of period/time i.
K(x1 , ·) = law of the state at time 2 if state at time 1 was x1 .
2. Static:
(Ωi , Fi ) = state spaces of system number i.
K(x1 , ·) = law of the state of system number 2 given that system number 1 is in
state x1 .
Examples
3. Markov chain
Ω1 = Ω2 = S, S countable.
F1 = F2 = P(S).
⇒ K is given by weights k(x, y) (x, y ∈ S), and (k(x, y))_{x,y∈S} is a stochastic matrix:
k(x, y) ≥ 0, ∀x, y ∈ S, and Σ_{y∈S} k(x, y) = 1, ∀x ∈ S.
Example S = {0, 1}, with stochastic matrix
( α  1−α )
( 1−α  α )   (0 < α < 1),
i.e. k(x, y) = α I_{x=y} + (1 − α) I_{x≠y}.
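As a small illustration (a sketch of ours, not from the script), this two-state kernel can be sampled step by step; with α close to 1 the chain rarely changes state:

```python
import random

def kernel_step(x, alpha, rng):
    """Sample y ~ K(x, .), where k(x, y) = alpha if y == x and 1 - alpha otherwise."""
    return x if rng.random() < alpha else 1 - x

def run_chain(x0, alpha, n, rng):
    path = [x0]
    for _ in range(n):
        path.append(kernel_step(path[-1], alpha, rng))
    return path

rng = random.Random(1)
print(run_chain(0, 0.9, 20, rng))   # long runs of 0's and 1's
```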
A Discrete case
Ωi countable, Fi = P(Ωi ) (i = 1, 2).
⇒ Ω = Ω1 × Ω2 countable and the probability measure P on (Ω, F ) (F = P(Ω)) is given
by the weights p(x1 , x2 ) = p1 (x1 )k(x1 , x2 ), where p1 (x1 ) = P1 ({x1 }).
For each function f : Ω → [0, ∞], we then have
∫ f dP = Σ_{x1,x2} f(x1, x2) p(x1, x2) = Σ_{x1} ( Σ_{x2} f(x1, x2) k(x1, x2) ) p1(x1)
= ∫ ( ∫ f(x1, x2) K(x1, dx2) ) P1(dx1).
B General case
Let Ω = Ω1 × Ω2 , F = F1 × F2 .
Proof: The uniqueness of P follows from Theorem 1.18 since the sets of the form A1 × A2
generate F and they are a ∩-stable collection of sets.
We show that the right hand side of (8.1) or (8.2), respectively, is well-defined.
Step 1 For x1 ∈ Ω1, take φ_{x1}(x2) := (x1, x2).
Since φ_{x1}⁻¹(A1 × A2) = ∅ if x1 ∉ A1, and = A2 if x1 ∈ A1,
φ_{x1} : (Ω2, F2) → (Ω, F) is measurable.
Hence, for any function f : (Ω, F) → (R̄, B̄), the function f_{x1} = f ∘ φ_{x1}, i.e.
f_{x1}(x2) = f(x1, x2), is measurable: f_{x1} : (Ω2, F2) → (R̄, B̄).
In particular, A ∈ F ⇒ A_{x1} ∈ F2, ∀x1 ∈ Ω1.
Step 2 For f : (Ω, F) → (R̄, B̄), f ≥ 0, the function x1 ↦ ∫ f(x1, x2) K(x1, dx2) is well-
defined due to Step 1.
We show that this function is (F1, B̄)-measurable. Follow the definition of the integral:
take first f = IA, then f ≥ 0 with finitely many values, then general f ≥ 0.
Let D be the collection of all A ∈ F for which x1 ↦ K(x1, A_{x1}) is (F1, B̄)-measurable.
• D is a λ-system
• D contains all sets of the form A = A1 × A2 , since
K(x1 , Ax1 ) = IA1 (x1 )K(x1 , A2 ), hence x1 7→ K(x1 , Ax1 ) is (F1 , B̄)-measurable
as a product of two measurable functions.
⇒ D = F.
Step 3 Due to Step 2, the right hand side of (8.1) or (8.2), respectively, is well-defined.
(8.2) defines a probability measure on (Ω, F):
(i) P(Ω) = ∫ K(x1, Ω2) P1(dx1) = 1, since K(x1, Ω2) = 1;
(ii) A1, A2, ... ∈ F, Ai ∩ Aj = ∅ for i ≠ j ⇒ (⋃_{i=1}^∞ Ai)_{x1} = ⋃_{i=1}^∞ (Ai)_{x1} (disjoint), hence
P(⋃_{i=1}^∞ Ai) = ∫ K(x1, ⋃_{i=1}^∞ (Ai)_{x1}) P1(dx1) = ∫ Σ_{i=1}^∞ K(x1, (Ai)_{x1}) P1(dx1)
= Σ_{i=1}^∞ ∫ K(x1, (Ai)_{x1}) P1(dx1) (monotone convergence) = Σ_{i=1}^∞ P(Ai).
⇒ P is a probability measure.
P satisfies (8.1) since for f = IA, (8.1) is (8.2), and for general f, proceed as in
the definition of the integral. □
If Ω1 is countable, the left hand side of (8.4) can be defined in an elementary way, namely
P(X2 ∈ A2 | X1 = x1) = P(X2 ∈ A2, X1 = x1)/P(X1 = x1) = P({x1} × A2)/P1({x1})
(provided that P1({x1}) > 0), and (8.4) can be proved:
P({x1} × A2) = P1({x1}) K(x1, A2) by (8.3) ⇒ (8.4).
In general, this is not possible (for instance, P1({x1}) can be = 0, ∀x1 ∈ Ω1!) and then
we can take (8.4) as the definition of P(X2 ∈ A2 | X1 = x1).
Example 8.4 (Classical case)
Assume K(x1 , ·) ≡ P2 (”no information”).
Then we write P = P1 ⊗ P2 and P is the product (”Produktmaß”) of P1 and P2 .
We then have
Example 8.6 (Ω1, F1), (Ω2, F2), P1 as in Example 8.5, K(x1, ·) = δ_{x1}, ∀x1.
Then, X2 = X1 P -a.s. and P2 = P1 = U[0, 1].
Example 8.7 (Ω1, F1), (Ω2, F2), P1 as in Example 8.5, K(x1, dx2) = (1/x1) λ|_{[0,x1]}.
K(x1, ·) = (1/x1) λ|_{[0,x1]} = U[0, x1] (or K(x1, A) = (1/x1) ∫_0^{x1} IA(u) du).
We compute P2:
P2(A2) = P(Ω1 × A2) = ∫_0^1 K(x1, A2) dx1.
Take A2 = [0, t]. Then ∫_0^1 K(x1, A2) dx1 = ∫_0^t 1 dx1 + ∫_t^1 (t/x1) dx1 = t − t log t.
⇒ X2 has the cdf F2(t) = t − t log t (0 ≤ t ≤ 1).
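The formula F2(t) = t − t log t is easy to confirm by simulation (an illustration of ours; the sample size is arbitrary): draw X1 ~ U[0, 1] and then X2 from K(X1, ·) = U[0, X1]:

```python
import math
import random

random.seed(0)
N, t = 200_000, 0.3
hits = 0
for _ in range(N):
    x1 = random.random()        # X1 ~ P1 = U[0, 1]
    x2 = x1 * random.random()   # X2 | X1 = x1  ~  K(x1, .) = U[0, x1]
    hits += x2 <= t
print("empirical P(X2 <= t):", hits / N)
print("t - t log t         :", t - t * math.log(t))   # ~ 0.6612
```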
Claim P is the uniform distribution on A, i.e. P (B) = 2λ2 (B ∩ A), ∀B ⊆ [0, 1]2 , B ∈
B[0,1]2 .
9 Absolute continuity, Radon-Nikodym derivatives
Recall that a real-valued random variable X on (Ω, A, P ) has the density f if
P(X ∈ [a, b]) = PX([a, b]) = ∫_a^b f(x) dx = ∫_{[a,b]} f(x) dx.
Theorem 9.4 (Uniqueness of densities)
Assume f, g ∈ E ∗ . Then:
a) f = g µ-a.e. ⇒ ∫_A f dµ = ∫_A g dµ, ∀A ∈ A.
b) f integrable with respect to µ and ∫_A f dµ = ∫_A g dµ, ∀A ∈ A ⇒ f = g µ-a.e.
Proof:
a) f = g µ-a.e. ⇒ f IA = g IA µ-a.e., ∀A ∈ A, and the claim follows with Theorem 4.2.
b) ∫_Ω f dµ = ∫_Ω g dµ ⇒ g is integrable as well.
Define N := {f > g}, h := f I_N − g I_N ≥ 0. Then,
∫ f I_N dµ = ∫ g I_N dµ ⇒ ∫ h dµ = 0 ⇒ h = 0 µ-a.e. ⇒ µ(N) = 0.
In the same way, µ({g > f}) = 0 ⇒ f = g µ-a.e. □
Example 9.5 b) does not hold without the assumption that f is integrable.
Take Ω not countable, A = {A ⊆ Ω | A countable or Aᶜ countable},
µ(A) = 0 if A is countable, µ(A) = ∞ if Aᶜ is countable.
µ is a measure on (Ω, A).
Take f(ω) = 1, g(ω) = 2, ∀ω ∈ Ω.
Then, ∫_A f dµ = µ(A) = 2µ(A) = ∫_A g dµ, ∀A ∈ A, but µ({f ≠ g}) = µ(Ω) = ∞.
Question: For two measures ν, µ on (Ω, A), is there f : (Ω, A) → ([0, ∞], B̄) such that
ν(A) = ∫_A f dµ, ∀A ∈ A?
Definition 9.6 Assume ν, µ are measures on (Ω, A). We say that ν is absolutely contin-
uous with respect to µ if ∀A ∈ A : µ(A) = 0 ⇒ ν(A) = 0.
We write ν ≪ µ. In words: ν ≪ µ means that each µ-nullset is also a ν-nullset.
b) ν ≪ µ.
f is also called the Radon-Nikodym derivative and we write f = dν/dµ.
Proof:
a) ⇒ b) see Remark 9.7.
b) ⇒ a) will need the following lemma.
Lemma 9.9 Let η and τ be finite measures on (Ω, A) such that η(Ω) < τ (Ω). Then,
there is Ω∗ ∈ A with η(Ω∗ ) < τ (Ω∗ ) and η(A) ≤ τ (A), ∀A ∈ A with A ⊆ Ω∗ .
Proof: Set ψ := τ − η and Ω1 := Ω; given Ωn, let αn := inf{ψ(A) | A ∈ A, A ⊆ Ωn} (≤ 0).
Case 1: αn = 0.
Then An+1 := ∅, Ωn+1 := Ωn\An+1 = Ωn ⇒ αk = 0, ∀k ≥ n.
Case 2: αn < 0.
Take An+1 ∈ A, An+1 ⊆ Ωn with ψ(An+1) < αn/2, and take Ωn+1 := Ωn\An+1.
With this construction we have A1, A2, ... ∈ A, Ai ∩ Aj = ∅ for i ≠ j and
Σ_{n=1}^∞ |ψ(An)| ≤ τ(⋃_n An) + η(⋃_n An) < ∞.
⇒ lim_{n→∞} ψ(An) = 0 ⇒ lim_{n→∞} αn = 0.
Define Ω* := ⋂_{n=1}^∞ Ωn and show that the statements in Lemma 9.9 are satisfied
with Ω*. We have Ω1 ⊇ Ω2 ⊇ ... and
ψ(Ω*) = τ(Ω*) − η(Ω*) = lim_{n→∞} τ(Ωn) − lim_{n→∞} η(Ωn) = lim_{n→∞} ψ(Ωn).
Since ψ(Ωn+1) = ψ(Ωn) − ψ(An+1) ≥ ψ(Ωn) ≥ ψ(Ωn−1) ≥ ... ≥ ψ(Ω1) = ψ(Ω) (each
ψ(An+1) is ≤ 0), we have ψ(Ω*) ≥ ψ(Ω) > 0 ⇒ η(Ω*) < τ(Ω*).
If A ∈ A, A ⊆ Ω*, we have A ⊆ Ωn, ∀n ⇒ ψ(A) ≥ αn, ∀n.
But lim_{n→∞} αn = 0 ⇒ ψ(A) ≥ 0. □
With monotone convergence, lim_{n→∞} ∫_A gn dµ = ∫_A lim_{n→∞} gn dµ ≤ ν(A).
Hence: f := lim_{n→∞} gn ∈ G and ∫ f dµ = γ.
Claim: f is a density of ν with respect to µ.
Proof: Define τ on (Ω, A) by τ(A) := ν(A) − ∫_A f dµ ≥ 0, ∀A ∈ A.
τ is a finite measure on (Ω, A) with τ ≪ µ (µ(A) = 0 ⇒ τ(A) = ν(A) = 0).
Assume τ(Ω) > 0.
τ ≪ µ ⇒ µ(Ω) > 0 ⇒ q := τ(Ω)/(2µ(Ω)) > 0 ⇒ τ(Ω) = 2qµ(Ω) > qµ(Ω).
Apply Lemma 9.9 with τ and η = qµ and conclude that there is Ω* ∈ A with
τ(Ω*) > qµ(Ω*) and τ(A) ≥ qµ(A), ∀A ∈ A, A ⊆ Ω*.
Take f* := f + q I_{Ω*}. Then ∀A ∈ A:
∫_A f* dµ = ∫_A f dµ + qµ(A ∩ Ω*) ≤ ∫_A f dµ + τ(A) = ν(A) ⇒ f* ∈ G.
Since 0 < qµ(Ω*) < τ(Ω*), ∫ f* dµ = ∫ f dµ + qµ(Ω*) > γ, and this is a contradiction
to the definition of γ.
⇒ τ(Ω) = 0 ⇒ ν(A) = ∫_A f dµ, ∀A ∈ A.
Case 2: ν, µ σ-finite.
There are sequences (An), (Bn), An ∈ A, Bn ∈ A, ∀n, such that
A1 ⊆ A2 ⊆ ..., An ր Ω, B1 ⊆ B2 ⊆ ..., Bn ր Ω and
ν(An) < ∞, µ(Bn) < ∞, ∀n.
Define Cn := An ∩ Bn, n ∈ N.
Then C1 ⊆ C2 ⊆ ..., Cn ր Ω and ν(Cn) < ∞, µ(Cn) < ∞, ∀n.
Take En := Cn\Cn−1. Then ⋃_{n=1}^∞ En = Ω (a disjoint union), ν(En) < ∞, µ(En) < ∞, ∀n.
Define for each n the finite measures νn, µn by
νn(A) := ν(A ∩ En) and µn(A) := µ(A ∩ En).
Then dµn/dµ = I_{En} (∗) (proof of (∗): exercise).
By Case 1, we have measurable functions fn : Ω → [0, ∞] such that
ν(A) = Σ_{n=1}^∞ νn(A) = Σ_{n=1}^∞ ∫_A fn dµn = lim_{m→∞} ∫_A Σ_{n=1}^m fn I_{En} dµ = ∫_A f dµ
with f := Σ_{n=1}^∞ fn I_{En} (using (∗) and monotone convergence).
⇒ ν has the density f with respect to µ. □
Example 9.10 Let F(x) = 0 for x < 0 and F(x) = 1 − (1/2)e^{−x} for x ≥ 0,
and let ν be the corresponding measure on (R, B) with ν((−∞, x]) = F(x), x ∈ R.
(ν could be for instance the law of the amount of rain falling at some place per month.)
Let λ be the Lebesgue measure on (R, B).
ν is NOT absolutely continuous with respect to λ since λ({0}) = 0, ν({0}) = 1/2 > 0.
Define µ := λ + δ0. Then ν ≪ µ ⇒ there is a Radon-Nikodym derivative f = dν/dµ.
Claim: f(x) = 0 for x < 0, f(0) = 1/2, f(x) = (1/2)e^{−x} for x > 0 does the job.
Proof: We check ν((−∞, x]) = ∫_{(−∞,x]} f dµ, ∀x:
for x < 0: ν((−∞, x]) = 0 = ∫_{−∞}^x f(t) dt;
for x = 0: ν((−∞, 0]) = 1/2 = ∫_{−∞}^0 f(t) dt + f(0) = 0 + 1/2;
for x > 0: ν((−∞, x]) = 1 − (1/2)e^{−x} = ∫_{−∞}^x f(t) dt + 1/2. □
Example 9.11 Assume ν is a probability measure on (R, B) with ν ≪ λ. Then, the cdf
F of ν is absolutely continuous.
A function F is absolutely continuous if there is a measurable function f ≥ 0 such that
F(x) = ∫_{−∞}^x f(t) dt = ∫_{−∞}^x f(t) λ(dt), ∀x ∈ R.
f is determined uniquely, λ-a.e.
Remark 9.12 There are probability measures ν on (R, B) such that ν({x}) = 0, ∀x ∈ R
(⇒ the cdf F is continuous), but ν is not absolutely continuous with respect to λ (⇒ F is
not absolutely continuous); an example is the uniform distribution on the Cantor set.
Definition 9.13 Two measures µ and ν on (Ω, A) are singular with respect to each other
if there is a set A ∈ A such that µ(A) = 0, ν(Ac ) = 0. We write ν ⊥ µ.
Examples
i) νa ≪ µ,
ii) νs ⊥ µ,
iii) ν = νa + νs .
We say that νa is the absolutely continuous part and νs the singular part of ν with respect
to µ.
Proof: Let Nµ = {A ∈ A | µ(A) = 0} and α := sup{ν(A) | A ∈ Nµ}, and let (An) be an
increasing sequence in Nµ such that lim_{n→∞} ν(An) = α.
Define N := ⋃_{n=1}^∞ An. Then µ(N) = 0, ν(N) = α.
Define νa and νs by νa(A) = ν(A ∩ Nᶜ), νs(A) = ν(A ∩ N).
Then, νa and νs are measures on (Ω, A) with ν = νa + νs.
Since νs(Nᶜ) = 0 and µ(N) = 0, we have νs ⊥ µ.
We show that νa ≪ µ:
A ∈ A, µ(A) = 0 ⇒ N ∪ (A ∩ Nᶜ) ∈ Nµ (a disjoint union) ⇒
ν(N ∪ (A ∩ Nᶜ)) = ν(N) + ν(A ∩ Nᶜ) = α + νa(A) ≤ α (since ν(B) ≤ α, ∀B ∈ Nµ)
⇒ νa(A) = 0.
Remains to show: uniqueness of the decomposition.
Assume ν = νa + νs = ν̃a + ν̃s with νa, νs as above, ν̃a ≪ µ, ν̃s ⊥ µ.
Since ν̃s ⊥ µ, there is a µ-nullset Ñ with ν̃s(Ñᶜ) = 0, hence
10 Construction of stochastic processes
We saw that for a probability measure P1 on (Ω1 , F1) and a stochastic kernel from (Ω1 , F1 )
to (Ω2 , F2 ), there is a unique probability measure P = P1 × K on (Ω, F ) (Ω = Ω1 ×
Ω2 , F = F1 ×F2 ). On the other hand, under mild regularity assumptions, each probability
measure P on (Ω, F ) is of the form P1 × K when P1 is the marginal P X1−1 (where
ω = (x1 , x2 ), X1 (ω) = x1 , X2 (ω) = x2 ) and K is a suitable kernel from (Ω1 , F1) to
(Ω2 , F2 ).
We now describe two particular cases:
1. If Ω1 is countable, we take
K(x1, A2) = P(X2 ∈ A2 | X1 = x1) if P(X1 = x1) > 0, and K(x1, A2) = ν2(A2) otherwise,
where ν2 is an arbitrary probability measure on (Ω2, F2).
Then, P(A1 × A2) = Σ_{x1∈A1} P(X1 = x1, X2 ∈ A2)
= Σ_{x1∈A1, P(X1=x1)>0} P(X1 = x1) P(X2 ∈ A2 | X1 = x1) = (P1 × K)(A1 × A2)
⇒ P = P1 × K, see the uniqueness statement after (8.3).
Modelling the evolution of a stochastic system
Let (Ωi, Fi) (i = 0, 1, ...) be a sequence of measurable spaces. For n ≥ 0 take
(Ω(n), F(n)) = (Π_{i=0}^n Ωi, Π_{i=0}^n Fi). Assume P0 is a probability measure on (Ω0, F0)
and, for each n ≥ 1, Kn is a stochastic kernel from (Ω(n−1), F(n−1)) to (Ωn, Fn).
By applying Theorem 8.2 n times, we get for each n ≥ 1 a probability measure P(n) on
(Ω(n), F(n)) with P(0) = P0, P(n) = P(n−1) × Kn, and we have ∀f ≥ 0, F(n)-measurable:
∫ f dP(n) = ∫ ··· ∫ f(x0, ..., xn) Kn((x0, ..., xn−1), dxn) · ... · K1(x0, dx1) P0(dx0).  (10.1)
(Ω(n), F(n), P(n)) models the evolution of the system over the time period 0, 1, ..., n.
Wanted: A probability measure P on (Ω, A) such that the restriction to (Ω(n) , F (n) ) is
P (n) . (Ω, A, P ) is then a model for the evolution of the system for infinite time horizon.
More precisely, A ∈ An is of the form A = A(n) × Ωn+1 × Ωn+2 × . . . with A(n) ∈ F (n) and
we want a probability measure P on (Ω, A) such that
Question: Can we extend P from the algebra ⋃_{n=0}^∞ An to the σ-field A = σ(⋃_{n=0}^∞ An)?
Uniqueness of the extension follows with Theorem 1.18 since ⋃_{n=0}^∞ An is ∩-stable and
A = σ(⋃_{n=0}^∞ An). P is additive on ⋃_{n=0}^∞ An.
To apply Theorem 1.18 we have to show that P is σ-additive on ⋃_{n=0}^∞ An.
Due to the Remark after Theorem 1.17, it suffices to show that An ∈ ⋃_{k=0}^∞ Ak, An ց ∅ ⇒
lim_{n→∞} P(An) = 0.
Without loss of generality An ∈ An (n = 1, 2, ...), hence An = A(n) × Ωn+1 × Ωn+2 × ...
with A(n) ∈ F(n) and A(n+1) ⊆ A(n) × Ωn+1 (n = 1, 2, ...).
We assume that inf_n P(An) > 0, and this will lead to a contradiction.
Now, since
P(An) = ∫ ( ∫ ··· ∫ I_{A(n)}(x0, ..., xn) Kn((x0, ..., xn−1), dxn) ... K1(x0, dx1) ) P0(dx0)
= ∫ f_{0,n}(x0) P0(dx0), there is some x̄0 ∈ Ω0 with inf_n f_{0,n}(x̄0) > 0.
In the same way, with K1(x̄0, dx1) instead of P0, there has to be x̄1 ∈ Ω1 with
inf_n f_{1,n}(x̄1) > 0, where
f_{1,n}(x1) = ∫ ··· ∫ I_{A(n)}(x̄0, x1, x2, ..., xn) Kn((x̄0, x1, x2, ..., xn−1), dxn) ... K2((x̄0, x1), dx2)
and inf_n ∫ f_{1,n}(x1) K1(x̄0, dx1) > 0.
⇒ for each k ≥ 0 there is x̄k ∈ Ωk such that
inf_n ∫ ··· ∫ I_{A(n)}(x̄0, ..., x̄k, xk+1, ..., xn) Kn((x̄0, ..., x̄k, xk+1, ..., xn−1), dxn) ...
K_{k+1}((x̄0, ..., x̄k), dx_{k+1}) > 0.
In particular, for n = k + 1, I_{A(k+1)}(x̄0, ..., x̄k, ·) ≢ 0 ⇒ (x̄0, ..., x̄k) ∈ A(k)
(since A(k+1) ⊆ A(k) × Ωk+1).
But now ω̄ = (x̄0, x̄1, ...) ∈ Ak, ∀k, and this contradicts ⋂_n An = ∅. □
Definition 10.2 The sequence (Xn)n=0,1,... on the probability space (Ω, A, P) is the
stochastic process with initial law P0 and evolution laws Kn (n = 1, 2, ...). We write
P(Xn ∈ An | X0 = x0, ..., Xn−1 = xn−1) = Kn((x0, ..., xn−1), An)  (xi ∈ Ωi, An ∈ Fn).
If Kn((x0, ..., xn−1), ·) = K̃n(xn−1, ·), ∀n, for kernels K̃n from (Ωn−1, Fn−1) to (Ωn, Fn),
we say that (Xn)n=0,1,... is a Markov process.
If in addition (Ωn , Fn ) = (S, S) and Kn (·, ·) = K(·, ·), ∀n (for some measurable space
(S, S) and some kernel K from (S, S) to (S, S)) we say that (Xn )n=0,1,... is a time-
homogeneous Markov process with state space (S, S) and kernel K.
Example 10.3 S = R, S = B. Take β > 0. The kernel K from (S, S) to (S, S) is given
by K(x, ·) = N(0, βx2 ) and we take P0 = δx0 for some x0 6= 0.
(Xn )n=0,1,... is a system that ”approaches 0” with a variance which depends on the present
state.
Stability question:
For which values of β do we have Xn → 0 P-a.s.?
We have
E(Xn²) = ∫ xn² P(n)(dx0, ..., dxn) = ∫ ( ∫ xn² K(xn−1, dxn) ) P(n−1)(dx0, ..., dxn−1)  (10.2)
with ∫ xn² K(xn−1, dxn) = βxn−1²,
hence E(Xn²) = βE(Xn−1²) ⇒ E(Xn²) = βⁿ x0²  (n = 0, 1, ...).
For β < 1, we conclude that E(Σ_{n=0}^∞ Xn²) = Σ_{n=0}^∞ E(Xn²) < ∞ (monotone convergence)
⇒ Σ_{n=0}^∞ Xn² < ∞ P-a.s., and in particular,
P(lim_n Xn = 0) = 1.  (10.3)
For β < 1, we therefore have stable behaviour (in the sense of (10.3)).
This continues to be true for β < π/2: since ∫ |y| N(0, σ²)(dy) = √(2/π) σ, we have
E(|Xn|) = ∫∫ |xn| K(xn−1, dxn) P(n−1)(dx0, ..., dxn−1) with ∫ |xn| K(xn−1, dxn) = √β |xn−1| √(2/π).
Hence E(|Xn|) = √(2/π) √β E(|Xn−1|), and (10.3) follows with the same argument as above.
For 1 < β < π/2, we hence have E(Xn²) → ∞ but Xn → 0 P-a.s.
Theorem 10.5 P is the product of its marginals Pn = P ∘ Xn⁻¹ if and only if X0, X1, ...
are independent with respect to P.
Proof: Let P̃ = Π_{n=0}^∞ Pn. Then, X0, X1, ... are independent with respect to P ⇔
P(X0 ∈ A0, ..., Xk ∈ Ak) = Π_{i=0}^k P(Xi ∈ Ai), ∀k, ∀Ai ∈ Fi (i = 0, ..., k).
But Π_{i=0}^k P(Xi ∈ Ai) = P̃(X0 ∈ A0, ..., Xk ∈ Ak). Hence X0, X1, ... are independent
with respect to P ⇔ P = P̃. □
Hence, for 0 < p < 1, Pp({ω}) = 0, ∀ω ∈ Ω.
See later: for Sn = Σ_{i=1}^n Xi, Sn/n → p Pp-a.s.
This implies that for p ≠ q, Pp ⊥ Pq (take A = {ω : lim_n (1/n) Σ_{i=1}^n xi = q}; then
Pp(A) = 0, Pq(Aᶜ) = 0). Let An = σ(X0, ..., Xn).
We write Pp|An for the restriction of Pp to (Ω, An).
Then, Pp|An ≪ Pq|An, ∀p, q ∈ (0, 1).
Example 10.7 0 ≤ α, β, γ ≤ 1. Markov chain with state space S = {0, 1}, initial law
µ = (γ, 1 − γ), µ({0}) = γ, µ({1}) = 1 − γ, and stochastic kernel given by the matrix
( α  1−α )
( β  1−β ).
Then P(X0 = 0) = γ = 1 − P(X0 = 1),
P(Xn+1 = 0 | Xn = 0) = α = 1 − P(Xn+1 = 1 | Xn = 0),
P(Xn+1 = 0 | Xn = 1) = β = 1 − P(Xn+1 = 1 | Xn = 1).
Due to Theorem 10.1, this determines uniquely a probability measure P on (Ω, A), where
Ω = S^N, A = Π_{i=0}^∞ Si, Si = P(S), ∀i, as above.
Example 10.6 is a particular case with α = β = γ = 1 − p.
11 The law of large numbers
(Ω, A, P ) probability space.
Proof: g is convex (strictly convex) if and only if there is for each x0 ∈ R a linear function
l(x) = ax + b such that l(x0) = g(x0) and l(x) ≤ g(x), ∀x ≠ x0 (l(x) < g(x), ∀x ≠ x0).
Take x0 = E(X); then E(g(X)) ≥ E(l(X)) = l(E(X)) = g(E(X)) (l linear, l(x0) = g(x0)).
If we have "=" in the above argument then l(X) = g(X) P-a.s.
(since for Y = g(X) − l(X), Y ≥ 0 and E(Y) = 0 ⇒ Y = 0 P-a.s.).
If g is strictly convex, this implies X = x0 = E(X) P-a.s. □
Remark Due to Theorem 6.2 (ii), Theorem 11.2 implies the weak law of large numbers,
which says that Sn/n → E(X1) in probability.
Example 10.13 (continuation)
P0 := δ_{x0} where x0 ≠ 0, K(x, ·) = N(0, βx²).
The process (Xn)n=0,1,... can also be written as X0 = x0, Xn+1 = √β |Xn| Yn+1,
n = 0, 1, ..., where Y1, Y2, ... are i.i.d. with law N(0, 1).
Then, |Xn| = (√β)ⁿ |Yn| |Yn−1| ··· |Y1| |x0|
⇒ log |Xn| = Σ_{i=1}^n (log |Yi| + log √β) + log |x0| = log |x0| + Σ_{i=1}^n Zi, where Zi, i = 1, 2, ...,
are i.i.d., Zi = log |Yi| + log √β, E(Z1) = E(log |Y1|) + log √β.
We define βc by −log √βc = E(log |Y1|).
Then, if β > βc, (1/n) log |Xn| → E(Z1) > 0 P-a.s. (Thm. 11.2) ⇒ |Xn| → ∞ P-a.s.
If β < βc, (1/n) log |Xn| → E(Z1) < 0 P-a.s. ⇒ |Xn| → 0 P-a.s.
Remark βc = exp(−2E(log |Y1|)) > exp(−2 log E(|Y1|)) = π/2 (by Jensen (∗), and
E(|Y1|) = √(2/π)).
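A simulation sketch (ours; the parameters are arbitrary) makes the phase transition visible: by the strong law, (1/n) log |Xn| converges to E(Z1) = E(log |Y1|) + log √β, and for standard normal Y1 one has E(log |Y1|) = −(γ + log 2)/2 ≈ −0.635, so βc ≈ 3.56:

```python
import math
import random

def log_growth_rate(beta, n, rng):
    """Empirical (1/n) log|X_n| for X_{k+1} = sqrt(beta) * |X_k| * Y_{k+1}, X_0 = 1."""
    total = 0.0
    for _ in range(n):
        total += 0.5 * math.log(beta) + math.log(abs(rng.gauss(0.0, 1.0)))
    return total / n

rng = random.Random(42)
for beta in (1.0, 2.0, 3.56, 5.0):
    print(beta, round(log_growth_rate(beta, 100_000, rng), 4))
# negative rate -> |X_n| -> 0 a.s.; positive rate -> |X_n| -> infinity a.s.
```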
Proof: Take X̃i = Xi ∧ M. Then, X̃1, X̃2, ... are i.i.d. and S̃n/n → E(X̃1) P-a.s.
But: for each K > 0 there is M(K) = M such that E(X̃1) = E(X1 ∧ M) ≥ K.
Hence, lim inf_n Sn/n ≥ lim inf_n S̃n/n ≥ K P-a.s.
Since K was arbitrary, this implies lim inf_n Sn/n = ∞ P-a.s. ⇒ Sn/n → ∞ P-a.s. □
12 Weak convergence, characteristic functions and the central limit theorem
Recall that for X1, X2, ... i.i.d. with P(Xi = 1) = p, P(Xi = 0) = 1 − p,
Sn = Σ_{i=1}^n Xi, Sn* = (Sn − np)/√(np(1 − p)), we have
P(Sn* ≤ x) → Φ(x) (n → ∞),  (12.1)
where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt (de Moivre-Laplace Theorem).
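A quick numerical illustration of (12.1) (our sketch; n, p and x are arbitrary choices): simulate Sn*, estimate P(Sn* ≤ x) and compare with Φ(x):

```python
import math
import random

def empirical_cdf_of_Sn_star(n, p, x, reps, rng):
    """Empirical P(S_n* <= x) with S_n* = (S_n - n p) / sqrt(n p (1 - p))."""
    sd = math.sqrt(n * p * (1 - p))
    count = 0
    for _ in range(reps):
        s = sum(rng.random() < p for _ in range(n))   # S_n ~ Bin(n, p)
        count += (s - n * p) / sd <= x
    return count / reps

rng = random.Random(0)
x = 1.0
print("empirical:", empirical_cdf_of_Sn_star(1000, 0.3, x, 10_000, rng))
print("Phi(1.0) :", 0.5 * (1 + math.erf(x / math.sqrt(2))))   # ~ 0.8413
```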
Goal:
We often assume that S is a Polish space, i.e. a complete separable metric space.
We always equip S with its Borel-σ-field S (S is generated by the open subsets of S).
We write µn →^w µ.
v) lim_{n→∞} µn(A) = µ(A) for all sets A whose boundary is not charged by µ ("µ-randlose
Mengen"), i.e. ∀A ∈ S with µ(Ā\Å) = 0, where Ā is the closure of A and Å its interior.
Proof:
(i) ⇒ (ii) clear.
(ii) ⇒ (iii) F closed, ε > 0 ⇒ ∃f uniformly continuous, 0 ≤ f ≤ 1, f(u) = 1,
∀u ∈ F, f(u) = 0, ∀u ∈ Uε(F)ᶜ, where Uε(F) = {s : d(s, F) < ε}
(take for instance f(u) = (1 − (1/ε) d(u, F)) ∨ 0).
Then lim sup_n µn(F) ≤ lim sup_n ∫ f dµn = ∫ f dµ ≤ ∫ I_{Uε(F)} dµ = µ(Uε(F))  (using (ii)).
Since F is closed, Uε(F) ց F as ε → 0, hence µ(Uε(F)) ց µ(F).
⇒ lim sup_n µn(F) ≤ µ(F).
(iii) ⇔ (iv) "⇒": lim inf_n µn(G) = 1 − lim sup_n µn(Gᶜ) ≥ 1 − µ(Gᶜ) = µ(G).
"⇐" follows in the same way.
(iii)+(iv) ⇒ (v) If the boundary of A is not charged by µ, then
µ(A) = µ(Å) ≤ lim inf_n µn(Å) ≤ lim sup_n µn(Ā) ≤ µ(Ā) = µ(A).
(v) ⇒ (i) Take f ∈ Cb(S), ε > 0. Choose c1, ..., cm such that ck − ck−1 < ε,
µ({f = ck}) = 0, ∀k; hence the Ak = {ck−1 < f < ck} are sets whose boundary is not charged
by µ. Take g := Σ_{k=1}^m ck I_{Ak}. Then, ‖g − f‖ ≤ ε (where ‖g − f‖ = sup_{s∈S} |f(s) − g(s)|).
But ∫ g dµn → ∫ g dµ due to (v).
⇒ lim sup_n |∫ f dµn − ∫ f dµ| ≤ 2ε. □
Lemma 12.4 Assume (Xn) is a sequence of real-valued random variables which converges
to 0 in probability.
Then, Xn → 0 in distribution, i.e. the laws of (Xn) converge weakly to the Dirac measure in 0.
Lemma 12.5 Assume (S1, d1) and (S2, d2) are Polish spaces with metrics d1 and d2 and
h : S1 → S2 a continuous function. Then:
(a) For probability measures µ, µ1, µ2, ... on (S1, S1) with µn →^w µ, we have
(µn)h = µn ∘ h⁻¹ →^w µ ∘ h⁻¹ = µh.
(b) Assume S1 = S2 = R and X, X1, X2, ... are random variables with Xn → X in
distribution. Then, h(Xn) → h(X) in distribution.
Proof:
(a) We have to show that for all f ∈ Cb(S2), ∫ f d(µn ∘ h⁻¹) → ∫ f d(µ ∘ h⁻¹) (n → ∞).
But, due to Theorem 3.6,
∫ f d(µn ∘ h⁻¹) = ∫ (f ∘ h) dµn → ∫ (f ∘ h) dµ = ∫ f d(µ ∘ h⁻¹), since f ∘ h ∈ Cb(S1).
Example 12.7 (d = 1)
1. For c ∈ R and µ = δc, µ̂(x) = e^{ixc}, x ∈ R.
2. For a discrete law µ = Σ_{k=1}^∞ αk δ_{ck} with αk > 0 we have µ̂(x) = Σ_{k=1}^∞ αk e^{i ck x}
(note that (12.3) is linear in µ).
In particular, for µ = Bin(n, p), µ = Σ_{k=0}^n (n choose k) p^k (1 − p)^{n−k} δk
⇒ µ̂(x) = Σ_{k=0}^n (n choose k) (p e^{ix})^k (1 − p)^{n−k} = (1 − p + p e^{ix})^n.
In the same way, if µ is Poisson with parameter λ,
µ = Σ_{k=0}^∞ e^{−λ} (λ^k/k!) δk ⇒ µ̂(x) = Σ_{k=0}^∞ e^{−λ} (λ^k/k!) e^{ikx} = e^{λ(e^{ix}−1)}.
3. For µ = N(0, 1), µ̂(x) = e^{−x²/2}, x ∈ R.
Proof: µ̂(x) = (1/√(2π)) ∫_{−∞}^∞ e^{−y²/2} e^{ixy} dy and one can calculate this integral.
Let ψ(x) = µ̂(x), φ(x) = (1/√(2π)) e^{−x²/2}.
φ is a solution of the differential equation
φ′(x) + xφ(x) = 0.  (12.4)
(Check!) The same holds true for ψ. More precisely, ψ(x) = ∫_{−∞}^∞ e^{ixy} φ(y) dy
⇒ ψ′(x) = i ∫_{−∞}^∞ y e^{ixy} φ(y) dy  (12.5)
and, integrating by parts,
xψ(x) = ∫_{−∞}^∞ x e^{ixy} φ(y) dy = −i e^{ixy} φ(y)|_{−∞}^∞ + i ∫_{−∞}^∞ e^{ixy} φ′(y) dy = i ∫_{−∞}^∞ e^{ixy} φ′(y) dy  (12.6)
(since φ(y) → 0 for |y| → ∞).
Due to (12.4), φ′(y) + yφ(y) = 0, hence (12.5) and (12.6) imply ψ′(x) + xψ(x) = 0.
Together with ψ(0) = 1, this gives ψ(x) = e^{−x²/2}. □
4. Let µ be the Cauchy distribution with parameter c > 0, i.e. µ has the density
f(x) = c/(π(c² + x²)). Then, µ̂(x) = e^{−c|x|}, see literature.
5. µ = U[0, 1]. Then, µ̂(x) = (e^{ix} − 1)/(ix).
Proof: µ̂(x) = ∫_0^1 e^{ixy} dy = (1/(ix)) e^{ixy}|_0^1 = (1/(ix))(e^{ix} − 1). □
6. µ = exp(α), i.e. µ has the density f(x) = αe^{−αx} for x ≥ 0 and f(x) = 0 for x < 0.
Then, µ̂(x) = α/(α − ix), x ∈ R.
Proof: µ̂(x) = ∫_0^∞ e^{ixy} αe^{−αy} dy = ∫_0^∞ αe^{−(α−ix)y} dy = α/(α − ix). □
The characteristic function is an important tool in probability for the following reasons:
(1) The mapping X 7→ ϕX has nice properties under transformations and sums of i.i.d.
random variables.
(3) Pointwise convergence of the characteristic function is (under mild additional assump-
tions) equivalent to the weak convergence of the corresponding measures.
At (1):
Lemma 12.8 For a random variable X with values in Rd and characteristic function ϕX
we have
b) ϕX is uniformly continuous.
d) ϕX (·) is real-valued if and only if PX = P−X i.e. if the law of X equals the law of −X.
Proof:
(a) If E(|X|^k) < ∞, ϕX is k-times continuously differentiable and the derivatives are
given by ϕX^{(j)}(t) = E((iX)^j e^{itX}), j = 0, 1, ..., k.
At (2):
Theorem 12.10 Assume µ1 , µ2 are probability measures on (Rd , Bd ) with µ̂1 = µ̂2 . Then,
µ1 = µ2 .
Proof: Since the compact sets generate the Borel-σ-field, it suffices to show that
µ1(K) = µ2(K), ∀K ⊆ R^d, K compact.
Assume K compact and let d(x, K) = inf{d(x, y) | y ∈ K} be the distance from x to K
(x ∈ R^d).
For m ∈ N define fm : R^d → [0, 1] by fm(x) = 1 if x ∈ K, fm(x) = 0 if d(x, K) ≥ 1/m,
and fm(x) = 1 − m d(x, K) otherwise.
Then, fm is continuous, has values in [0, 1] and compact support and fm ց IK for m → ∞.
R R
With monotone convergence, we see that it suffices to show that fm dµ1 = fm dµ2 , ∀m
(because we then conclude µ1 (K) = µ2 (K)). Fix m.
Take ε > 0 and choose N large enough such that BN := [−N, N]^d contains the set
{x ∈ R^d | fm(x) ≠ 0} and such that µ1(BNᶜ) ≤ ε and µ2(BNᶜ) ≤ ε.
Using the Fourier convergence theorem, there is a function g : R^d → C of the form
g(x) = Σ_{j=1}^n cj e^{i⟨(2π/(2N)) tj, x⟩}, where n ∈ N, c1, ..., cn ∈ C and t1, ..., tn ∈ Z^d, such that
sup_{x∈BN} |g(x) − fm(x)| ≤ ε.
We conclude that sup_{x∈R^d} |g(x)| ≤ 1 + ε. Now we can estimate
|∫ fm dµ1 − ∫ fm dµ2| ≤ |∫ fm dµ1 − ∫ g dµ1| + |∫ g dµ1 − ∫ g dµ2| + |∫ g dµ2 − ∫ fm dµ2|.  (12.9)
At (3):
(b) If (ϕn) converges pointwise to a function ϕ : R → C and ϕ is continuous at 0, then
there is a probability measure µ on (R, B) such that ϕ is the characteristic function
of µ and µn →^w µ.
For the proof, we will need the important notion of tightness ("Straffheit").
Example 12.14 Let µn be the uniform distribution on [−n, n], i.e. µn = U[−n, n]. Then,
ϕn(t) = (1/(2n)) ∫_{−n}^n e^{itx} λ(dx) = (1/(2n))(1/(it))(e^{itn} − e^{−itn}) = sin(tn)/(tn), t ∈ R\{0}.
Hence ϕn(t) → ϕ(t), ∀t (n → ∞), where ϕ(t) = 1 for t = 0 and ϕ(t) = 0 else.
In particular, ϕ is not continuous in 0.
In fact, (µn ) does not converge weakly, see exercises.
Corollary 12.15 Let α > 0. For each n ∈ N consider i.i.d. random variables X1, ..., Xn,
Bernoulli with p = α/n, i.e. P(Xi = 1) = α/n = 1 − P(Xi = 0).
Sn = Σ_{i=1}^n Xi has the law Bin(n, α/n)
⇒ the characteristic function ϕ_{Sn} of Sn is given by ϕ_{Sn}(t) = ϕn(t) = (1 − α/n + (α/n)e^{it})^n,
see Example 12.7.2.
ϕn converges to e^{α(e^{it}−1)} for n → ∞, and ϕ(t) = e^{α(e^{it}−1)} is the characteristic function of
the Poisson distribution with parameter α, see Example 12.7.2.
According to Theorem 12.11, the laws of Sn converge weakly to the Poisson distribution
with parameter α.
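The Poisson approximation can also be checked directly on the probability weights (a sketch of ours): the pointwise distance between the Bin(n, α/n) and Poisson(α) pmfs shrinks as n grows:

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(alpha, k):
    return math.exp(-alpha) * alpha**k / math.factorial(k)

alpha = 2.0
for n in (10, 100, 1000):
    gap = max(abs(binom_pmf(n, alpha / n, k) - poisson_pmf(alpha, k)) for k in range(8))
    print(n, gap)   # maximal difference over k = 0..7 decreases with n
```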
Remark Let M = {ν | ν probability measure on (R̄, B̄) with ∫ x² dν < ∞}.
For ν ∈ M, ∫ |x| dν < ∞. Let m = ∫ x dν and σ² = ∫ x² dν − (∫ x dν)².
For ν1, ν2 ∈ M, define ν1 ∗ ν2 as follows.
Let X1, X2 be independent random variables with laws ν1 and ν2.
Let ν1 ∗ ν2 be the law of (X1 + X2 − m1 − m2)/√(σ1² + σ2²).
Then, if X1, ..., Xn are i.i.d. with law ν, the law of Sn* = (Sn − nm)/√(nσ²) is
ν ∗ ν ∗ ... ∗ ν (n times).
Note µ = N(0, 1) is a fixed point: µ ∗ µ = µ.
In this sense, the CLT describes convergence to a fixed point.
13 Conditional expectation
13.1 Motivation
Assume X is a random variable on some probability space (Ω, A, P), X ≥ 0.
The expectation E(X) can be interpreted as a prediction for the unknown (random) value
of X.
Assume A0 ⊆ A, A0 is a σ-field and assume we ”have the information in A0 ”, i.e. for
each A0 ∈ A0 we know if A0 will occur or not.
How does this partial information modify the prediction of X?
Example 13.1
(a) If X is measurable with respect to A0, then {X ≤ c} ∈ A0 , ∀c and we know for each
c if {X(ω) ≤ c} or {X(ω) > c} occurs.
⇒ We know the value of X(ω).
The solution of the prediction problem is to pass from the constant E(X) = m to a
random variable E(X | A0 ), which is measurable with respect to A0 , the conditional
expectation of X, given A0 .
P(B | Ai) = P(B ∩ Ai)/P(Ai)  (B ∈ A)
and define E(X | Ai) = ∫ X dP(· | Ai) = (1/P(Ai)) ∫_{Ai} X dP = (1/P(Ai)) E(X I_{Ai}). Now define
E(X | A0)(ω) := Σ_{i : P(Ai)>0} (1/P(Ai)) E(X I_{Ai}) I_{Ai}(ω).  (13.1)
(13.1) gives for each ω ∈ Ω a prediction E(X | A0)(ω) which uses only the information in
which atom ω lies.
Theorem 13.3 The random variable E(X | A0 ) (defined in (13.1)) has the following
properties:
(ii) For each random variable Y0 ≥ 0, which is measurable with respect to A0, we have
E(Y0 X) = E(Y0 E(X | A0)).  (13.2)
In particular,
E(X) = E(E(X | A0)).  (13.3)
Example 13.4
Take p ∈ [0, 1], let X1 , X2 , . . . be i.i.d. Bernoulli random variables with parameter p, i.e.
P (Xi = 1) = p = 1 − P (Xi = 0).
Question: What is E(X1 | Sn)?
Answer: E(X1 | Sn) = Σ_{k=0}^n P(X1 = 1 | Sn = k) I_{Sn=k} and
P(X1 = 1 | Sn = k) = P(X1 = 1, Sn = k)/P(Sn = k)
= p (n−1 choose k−1) p^{k−1} (1−p)^{n−1−(k−1)} / ( (n choose k) p^k (1−p)^{n−k} ) = k/n
⇒ E(X1 | Sn) = Sn/n.  (13.4)
Remark E(X1 | Sn) does not depend on the "success parameter" p.
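A Monte Carlo check of (13.4) (our illustration; n, p, k are arbitrary): averaging X1 over the samples with Sn = k gives approximately k/n, whatever p is:

```python
import random

random.seed(1)
n, p, k, reps = 10, 0.7, 4, 300_000
num = den = 0
for _ in range(reps):
    xs = [random.random() < p for _ in range(n)]   # i.i.d. Bernoulli(p)
    if sum(xs) == k:
        num += xs[0]
        den += 1
print("E(X1 | Sn = k) ~", num / den, "  k/n =", k / n)   # both ~ 0.4
```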
Example 13.5 (Randomized sums)
X1, X2, ... random variables with Xi ≥ 0, ∀i, and E(Xi) = m, ∀i.
T : Ω → {0, 1, ...} is independent of (X1, X2, ...), S_T(ω) := Σ_{k=1}^{T(ω)} Xk(ω).
Then, according to (13.1), E(S_T | T) = Σ_{k=0}^∞ (1/P(T = k)) E(S_T I_{T=k}) I_{T=k}.
But E(S_T I_{T=k}) = E(Sk I_{T=k}) = E(Sk) E(I_{T=k}) = k · m · P(T = k) (T independent of Sk).
Hence E(S_T | T) = m Σ_{k=0}^∞ k I_{T=k} = m · T
⇒ E(S_T | T) = m · T.
Now, with (13.3) we conclude that E(S_T) = m · E(T) (Wald's identity).
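Wald's identity E(S_T) = m·E(T) is easy to verify by simulation (a sketch of ours; the exponential Xi and the geometric T are arbitrary choices satisfying the assumptions):

```python
import random

random.seed(2)
m, reps = 2.0, 100_000
sum_S = sum_T = 0.0
for _ in range(reps):
    T = 0                            # T geometric, drawn independently of the X_i
    while random.random() < 0.75:
        T += 1
    S_T = sum(random.expovariate(1 / m) for _ in range(T))   # X_i >= 0, E(X_i) = m
    sum_S += S_T
    sum_T += T
print("E(S_T) ~", sum_S / reps, "  m*E(T) ~", m * sum_T / reps)   # both ~ 6
```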
We write X0 = E(X | A0 ).
Remark 13.8
Proof of Theorem 13.9:
2. If X0 and X̃0 are random variables which satisfy (ii) in Definition 13.7, then
A0 = {X0 > X̃0} ∈ A0.
13.7 (ii) implies that E(X0 I_{A0}) = E(X̃0 I_{A0}) ⇒ E((X0 − X̃0) I_{A0}) = 0 ⇒ P(A0) = 0.
In the same way, P(X0 < X̃0) = 0 ⇒ X0 = X̃0 P-a.s. □
E(Xi | Sn) = Sn/n, i = 1, ..., n.  (13.7)
Proof: We will need the following lemma.
Lemma 13.11 X and Y are random variables on some probability space (Ω, A, P ). Then,
the following statements are equivalent:
(b) There is a measurable function h : (R̄, B̄) → (R̄, B̄) such that Y = h(X).
Proof of Lemma 13.11: (b) ⇒ (a) is clear because the composition of measurable functions
is measurable.
(a) ⇒ (b): Take first Y = IA , A ∈ σ(X).
Then, A = {X ∈ B} for some B ∈ B and Y = h(X) = I{X∈B} .
Then, take Y = Σ_i ci I_{Ai}, then monotone limits of such functions, etc. □
Remark The proof used only that the joint law of (X1 , . . . , Xn ) is invariant under per-
mutations of the indices.
Theorem 13.13
Proof:
(a) The right hand side of (13.8) is measurable with respect to A0, and for Y0 ≥ 0, Y0
measurable with respect to A0, we have
E(Y0(Z0 X)) = E((Y0 Z0)X) = E(Y0 Z0 E(X | A0)) = E(Y0(Z0 E(X | A0)))  (by 13.7 (ii)).
Proof: Each convex function f is of the form f(x) = sup_n ln(x), ∀x, with linear functions
ln(x) = an x + bn. In particular, f ≥ ln, ln(X) ∈ L¹.
Since E(f(X) | A0) ≥ E(ln(X) | A0) = ln(E(X | A0)) (by 13.12 (b) and 13.12 (a)), we have
E(f(X) | A0) ≥ sup_n ln(E(X | A0)) = f(E(X | A0)) P-a.s. □
Corollary 13.14
For p ≥ 1, conditional expectation is a contraction of Lᵖ in the following sense:
X ∈ Lᵖ ⇒ E(X | A0) ∈ Lᵖ and ‖E(X | A0)‖p ≤ ‖X‖p.
Proof: With f(x) = |x|ᵖ, Jensen's inequality for conditional expectations implies that
|E(X | A0)|ᵖ ≤ E(|X|ᵖ | A0) ⇒ E(|E(X | A0)|ᵖ) ≤ E(|X|ᵖ)
⇒ ‖E(X | A0)‖p ≤ ‖X‖p. □
Theorem 13.15 Assume X ∈ L2 , Y0 is measurable with respect to A0 and Y0 ∈ L2 .
Then E ((X − E(X | A0 ))2 ) ≤ E((X − Y0 )2 ) and we have
”=” if and only if Y0 = E(X | A0 ) P -a.s.
Remark Theorem 13.15 says that the conditional expectation E(X | A0 ) is the projec-
tion of the element X in the Hilbert space L2 (Ω, A, P ) on the closed subspace L2 (Ω, A0 , P ).
and
E(E(X | A0) | A1) = E(X | A0) P-a.s.  (13.11)
14 Martingales
14.1 Definition and examples
(Ω, A, P ) probability space, A0 ⊆ A1 ⊆ A2 ⊆ . . . increasing sequence of σ-fields with
Ai ⊆ A, ∀i.
Interpretation: An is the collection of events observable at time n.
Remarks 14.2
E(Mn+1 − Mn | An ) = 0, ∀n ≥ 0. (14.3)
3. We say that (Mn ) is adapted to (An ) (meaning that for each n, Mn is measurable
with respect to An ).
Mn := Sn − n(2p − 1) (n = 0, 1, . . . ).
Then (Mn ) is a martingale with respect to (An ).
In the same way, for x ∈ R, M̃n = x + Sn − n(2p − 1) (n = 0, 1, . . . ) is a martingale with
respect to (An ).
Note that (Sn) is a martingale with respect to (An) ⇔ p = 1/2.
Claim: Mn = E(X | An ), n = 0, 1, 2, . . .
Consequence: (Mn ) is a martingale with respect to (An ) (it falls into class 2).
Now Radon-Nikodym derivatives on increasing σ-fields form a martingale even if they are
not of the form E(X | An ).
Proof:
2. E(Mn+1 | An) = Mn, ∀n, with the same argument as in Example 14.4, i.e. we show
that Q(A) = ∫_A E(Mn+1 | An) dP, ∀A ∈ An.
Take A ∈ An. Then, Q(A) = ∫_A Mn dP and Q(A) = ∫ 1_A Mn+1 dP
= ∫ 1_A E(Mn+1 | An) dP (since A ∈ An) = ∫_A E(Mn+1 | An) dP
⇒ Mn = E(Mn+1 | An) P-a.s. □
Consider a Markov process with state space (S, S) and transition kernel K(x, dy).
A function h : (S, S) → (R̄, B̄) is harmonic if it satisfies, ∀x ∈ S, the mean value property
h(x) = ∫ h(y) K(x, dy).  (14.5)
Take Ω = {ω = (x0, x1, ...) | xi ∈ S}, Xi(ω) = xi, and let Px be the law of the Markov
process (Xn) with X0 = x and transition kernel K. Assume h is harmonic.
If h(x) < ∞, Mn := h(Xn) (n = 0, 1, ...) is a martingale with respect to Px and
An = σ(X0, ..., Xn):
E(h(Xn+1) | An) = ∫ h(y) K(Xn, dy) = h(Xn) Px-a.s.  (by (14.5))
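For a concrete instance (an illustration of ours, not from the script): a random walk on Z that jumps to x+1 with probability p and to x−1 with probability 1−p has the harmonic function h(x) = ((1−p)/p)^x, since p·h(x+1) + (1−p)·h(x−1) = h(x). The sketch checks (14.5) and the martingale property E(h(Xn)) = h(x0) numerically:

```python
import random

p = 0.4
h = lambda x: ((1 - p) / p) ** x   # harmonic for the kernel of this walk

# mean value property (14.5) at x = 0:
print(p * h(1) + (1 - p) * h(-1), "=", h(0))

# Monte Carlo: E(h(X_n)) stays at h(x0) = 1 along the walk
random.seed(3)
x0, n, reps = 0, 10, 100_000
acc = 0.0
for _ in range(reps):
    x = x0
    for _ in range(n):
        x += 1 if random.random() < p else -1
    acc += h(x)
print("E(h(X_n)) ~", acc / reps, "  h(x0) =", h(x0))
```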
{T = n} ∈ An (n = 0, 1, . . . ) (14.6)
In words: the decision whether to stop at time n can be based on the events observable
until time n.
Remark (14.6) is equivalent to
{T ≤ n} ∈ An  (n = 0, 1, ...)  (14.7)
Proof: (14.6) implies that {T ≤ n} = ⋃_{k=0}^n {T = k} ∈ An.
On the other hand, (14.7) implies that {T = n} = {T ≤ n} ∩ {T ≤ n − 1}ᶜ ∈ An. □
Example 14.8 If (Xn)n=0,1,... is adapted to (An) and A ∈ B, the first entry time of A,
given by TA(ω) = min{n ≥ 0 : Xn(ω) ∈ A} (min ∅ = +∞), is a stopping time.
Proof: {TA ≤ n} = ⋃_{k=0}^n {Xk ∈ A} ∈ An. □
But the time of the last visit to A, given by LA(ω) = max{n ≥ 0 : Xn(ω) ∈ A}, is in
general not a stopping time.
b) If in addition
i) T is bounded, i.e. there is a constant K so that P(T ≤ K) = 1, or
ii) P(T < ∞) = 1 and (M_{T∧n})n=0,1,... is uniformly integrable,
then E(M_T) = E(M0).
Proof:
Example 14.10 (Simple random walk)
Y1, Y2, ... i.i.d. with P(Yi = 1) = 1/2 = P(Yi = −1), Sn = Σ_{k=1}^n Yk (n ≥ 1), S0 = 0,
An = σ(Y1, ..., Yn), A0 = {∅, Ω}.
Then, (Sn) is a martingale with respect to (An), see Example 14.3.
Let T = min{n ≥ 0 : Sn = 1}.
Claim:
P(T < ∞) = 1.  (14.10)
Gambling interpretation:
Claim: P(T < ∞) = 1.
Proof: Let c = b − a.
The events Ak = {Y_{kc+1} = 1, ..., Y_{(k+1)c} = 1} (k = 0, 1, ...) are independent.
The Borel-Cantelli Lemma implies P(⋂_n ⋃_{k≥n} Ak) = 1 ⇒ P(T < ∞) = 1. □
Remark If p < 1/2,
r(x) ≥ 1 − (p/(1 − p))^{b−x}  (14.13)
and this bound does not depend on a.
E(U_{a,b}^N) ≤ E((M_N − a)⁻)/(b − a)  (14.14)
Proof: Sk (k = 1, 2, ...) and Tk (k = 1, 2, ...) are stopping times.
Theorem 14.9 implies that for Z = Σ_{k=1}^∞ (M_{Tk∧N} − M_{Sk∧N}), we have E(Z) = 0.
If U_{a,b}^N = m, then Z ≥ m(b − a) + M_N − M_{S_{m+1}∧N}.
Further, M_N − M_{S_{m+1}∧N} ≥ M_N − a if S_{m+1} < N, and M_N − M_{S_{m+1}∧N} = 0 else.
⇒ M_N − M_{S_{m+1}∧N} ≥ −(M_N − a)⁻ ⇒ Z ≥ (b − a) U_{a,b}^N − (M_N − a)⁻.
Now, (14.14) follows since E(Z) = 0. □
Remark
1. Let U_{a,b} := lim_{N→∞} U_{a,b}^N. Then, monotone convergence implies
E(U_{a,b}) = lim_{N→∞} E(U_{a,b}^N) ≤ (1/(b − a)) sup_N E((M_N − a)⁻)  (14.15)
sup_N E(|M_N|) < ∞ ⇔ sup_N E(M_N⁻) < ∞ ⇔ sup_N E(M_N⁺) < ∞. □  (14.18)
Theorem 14.13 ((Doob's) Martingale Convergence Theorem)
Assume (Mn) is a martingale which satisfies (14.16) (or (14.17)). Then,
P(Mn converges to a finite limit) = 1, and for M∞ := lim_{n→∞} Mn, we have M∞ ∈ L¹.
Proof:
1. We have {lim inf_n Mn < lim sup_n Mn} ⊆ ⋃_{a,b∈Q, a<b} {U_{a,b} = ∞}.
Due to (14.15), (14.16) implies that P(U_{a,b} < ∞) = 1, ∀a, b, and we conclude that
lim inf_n Mn = lim sup_n Mn P-a.s.
⇒ for P-almost all ω, M∞(ω) := lim_{n→∞} Mn(ω) exists.
Example 14.14 Consider simple random walk (Sn), see Example 14.10.
(Sn) is a martingale. Clearly, (Sn) does not converge (since |Sn − Sn−1| = 1).
In fact, (14.17) does not hold, since the CLT implies that
(1/√n) E(|Sn|) → (1/√(2π)) ∫_{−∞}^∞ |x| e^{−x²/2} dx = √(2/π)  (n → ∞).
Nevertheless, we can benefit from Theorem 14.13.
Let c ∈ Z\{0}, Tc := min{n ≥ 1 : Sn = c}.
Then, (S_{Tc∧n}) is again a martingale with respect to (An).
If c > 0, S_{Tc∧n} ≤ c; if c < 0, S_{Tc∧n} ≥ c ⇒ (S_{Tc∧n}) is a martingale which is bounded
above (or below, respectively)
⇒ (S_{Tc∧n}) converges P-a.s.
⇒ P(Tc < ∞) = 1, ∀c ∈ Z\{0} (this proves (14.10) with c = 1)
⇒ P(Tc < ∞) = 1, ∀c ∈ Z
⇒ P(lim sup_n Sn = ∞, lim inf_n Sn = −∞) = 1.
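The conclusion P(Tc < ∞) = 1 can be watched in a simulation (our sketch; note that E(T1) = ∞, so a cap on the number of steps is needed and occasional runs are extremely long):

```python
import random

def hitting_time(c, cap, rng):
    """Steps until the simple random walk first hits c, or None if beyond cap."""
    s = n = 0
    while s != c:
        s += 1 if rng.random() < 0.5 else -1
        n += 1
        if n >= cap:
            return None
    return n

rng = random.Random(4)
times = [hitting_time(1, 10**6, rng) for _ in range(200)]
finite = sorted(t for t in times if t is not None)
print("finite within cap:", len(finite), "of", len(times))   # almost all of them
print("median:", finite[len(finite) // 2], " max:", finite[-1])
# P(T_1 < infinity) = 1, but E(T_1) = infinity: the empirical mean never settles.
```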