Proba 2
Joseph Abdou
Master 1 MAEF-IMMAEF-MMEF-QEM
Under continuous revision.
This version April 3, 2021
The author of the present notes has consulted available textbooks and lecture notes
on the subject. For martingale theory and Markov chains my main sources were:
Jacques Neveu: Bases Mathématiques de la théorie des probabilités,
Jacques Neveu: Martingales en temps discret
J. Lacroix, P. Priouret, Cours: J. Lacroix, Probabilités approfondies, Université Pierre
et Marie Curie, Master de Mathématiques, 2005-2006
Jean Jacod, Chaı̂nes de Markov, Processus de Poisson et Applications, Université Pierre
et Marie Curie, DEA de Probabilités et Applications, 2003-2004
Prerequisite. Probability with measure: σ-fields, measure space, measurable maps.
Non-negative measures, integration of real valued functions. Convergence of sequences
of real valued maps. Monotone convergence, Fatou lemma, dominated convergence
(Lebesgue). Lp spaces. Probability measure. Random variables. Expectations of r.v.
Independence of sub-σ-fields, independence of random variables.
Contents
I Conditional expectations 1
I.1 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
I.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
I.1.2 Conditional Expectation: definition and existence . . . . . . . . . 1
I.2 Conditional probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
I.2.1 Regular conditional probability . . . . . . . . . . . . . . . . . . . 4
I.2.2 Partially defined random variables . . . . . . . . . . . . . . . . . 6
B Monotone class 69
B.1 Monotone class theorem for sets . . . . . . . . . . . . . . . . . . . . . . . 69
B.2 Monotone class theorem for functions . . . . . . . . . . . . . . . . . . . . 70
C Transition probability 73
C.1 Composition of probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 73
C.2 Regular conditional probability . . . . . . . . . . . . . . . . . . . . . . . 74
D Uniform integrability 79
Chapter I
Conditional expectations
I.1.1 Theorem. For any sub-σ-field B ⊂ A and any nonnegative r.v. X, there exists
a nonnegative r.v. Y such that:
(i) Y is B-measurable,
(ii) For any B ∈ B: ∫_B Y dP = ∫_B X dP.
Such a r.v. is unique in the sense that if Y and Y ′ are two such r.v., then Y = Y ′ a.s.
I.1.2. By an easy application of the monotone class theorem (see Appendix B) property
(ii) of theorem I.1.1 can be replaced by any of the following equivalent properties:
(ii)’ for any bounded, B-measurable r.v. Z, one has : E(ZY ) = E(ZX).
(ii)” for any nonnegative, B-measurable r.v. Z, one has : E(ZY ) = E(ZX).
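On a finite probability space with B generated by a partition, the construction is completely explicit: E^B(X) is obtained by averaging X over each atom. The following short Python sketch (our own illustration with hypothetical data, not an example from the notes) builds this version of E^B(X) and checks property (ii)' numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
p = np.full(n, 1.0 / n)                 # uniform P on Omega = {0, ..., 11}
X = rng.normal(size=n)                  # an arbitrary r.v. X
atoms = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]  # partition generating B

Y = np.empty(n)                         # Y = E^B(X): average of X on each atom
for A in atoms:
    Y[A] = np.dot(p[A], X[A]) / p[A].sum()

# Check (ii)': E(ZY) = E(ZX) for every bounded B-measurable Z
# (B-measurable = constant on each atom).
Z = np.empty(n)
for c, A in zip([2.0, -1.0, 0.5], atoms):
    Z[A] = c
assert abs(np.dot(p, Z * Y) - np.dot(p, Z * X)) < 1e-12
```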
Since E(EB (X)) = E(X), X is integrable if and only if EB (X) is integrable. This
remark allows the extension of theorem I.1.1 to any quasi-integrable r.v. as follows:
I.1.3 Theorem and definition. For any sub-σ-field B ⊂ A , and any quasi-integrable
r.v. X, there exists a r.v. Y such that :
(i) Y is B-measurable,
(ii) For any B ∈ B: ∫_B Y dP = ∫_B X dP.
Such a r.v. is unique in the sense that if Y and Y ′ are two such r.v., then Y = Y ′ a.s.
It is called a (version of) the conditional expectation of X given B, and will be denoted
EB (X) or E(X|B).
Moreover EB (X) = EB (X + ) − EB (X − ).
It is easy to see, using again a monotone class theorem (see Appendix B), that
property (ii) of theorem I.1.3 can be replaced by the equivalent property:
(ii)’ for any bounded, B-measurable r.v. Z, one has : E(ZY ) = E(ZX).
EB (X + Y ) = EB (X) + EB (Y ) (I.1)
3) If X is independent of B then:
4) If B ⊂ C ⊂ A then :
EB EC (X) = EB (X) (I.4)
5) Monotone convergence:
If X_n ↑ X and if for some n_0, E(X_{n_0}^−) < +∞, then E^B(X_n) ↑ E^B(X)
If X_n ↓ X and if for some n_0, E(X_{n_0}^+) < +∞, then E^B(X_n) ↓ E^B(X)
6) Fatou-like lemma:
If Z ≤ X_n a.s. for all n and E(|Z|) < +∞, then E^B(lim inf_n X_n) ≤ lim inf_n E^B(X_n)
If X_n ≤ Z a.s. for all n and E(|Z|) < +∞, then E^B(lim sup_n X_n) ≥ lim sup_n E^B(X_n)
7) Dominated convergence:
If |Xn | ≤ Z a.s. for all n and E(|Z|) < +∞ and Xn → X a.s then EB (Xn ) → EB (X)
8) Jensen's inequality: if f : R → R is convex, E(|X|) < +∞ and E(|f(X)|) < +∞, then
f(E^B(X)) ≤ E^B(f(X))
9) The restriction of EB on L2 (Ω, A , P) is the orthogonal projector on the closed
subspace L2 (Ω, B, P) that is, for all X ∈ L2 (Ω, A , P), EB (X) is the unique element
Z ∈ L2 (Ω, B, P) such that for all Y ∈ L2 (Ω, B, P) one has: E(ZY ) = E(XY ).
10) For any p ≥ 1, the restriction of EB on Lp (Ω, A , P) is a linear map between
Lp (Ω, A , P) and its subspace Lp (Ω, B, P). This map is idempotent (EB ◦ EB = EB )
and ||EB || = 1, that is EB is a projector on Lp (Ω, B, P) of norm 1.
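Property 9 can also be checked numerically on the finite partition example above: among all B-measurable W, the conditional expectation minimizes the L² distance to X. Again a hedged sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
p = np.full(n, 1.0 / n)
X = rng.normal(size=n)
atoms = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]

def project(v):
    """Average v over each atom: the E^B of the partition case."""
    out = np.empty(n)
    for A in atoms:
        out[A] = np.dot(p[A], v[A]) / p[A].sum()
    return out

Y = project(X)
# E^B(X) minimizes E[(X - W)^2] over B-measurable W (constant on atoms).
for _ in range(100):
    W = project(rng.normal(size=n))     # a random B-measurable vector
    assert np.dot(p, (X - Y) ** 2) <= np.dot(p, (X - W) ** 2) + 1e-12
```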
(i) is true since Q(·, A) is constant on any element of the partition, while (ii) is straightforward. Properties (i) and (ii) imply that Q is a transition probability (or Markov kernel) from (Ω, B) to (Ω, A). Now ∫_{B_i} Q(ω, A) dP(ω) = P(A ∩ B_i) = ∫_{B_i} 1_A dP(ω).
It follows that we have:
Since for almost all ω, PB (ω, ·) is a probability measure one can write the following
equality between random variables:
E^B(X) = ∫_Ω X(ω′) P^B(·, dω′)  a.s.  (I.7)
I.2.3 Definition. A transition probability (or Markov kernel) Q from (E, E ) to (Ω, A )
is a quotient regular conditional probability (QRCP) of P w.r.t T if for any A ∈ A one
has:
Q(T (·), A) = P(A |T )(·) a.s. (I.8)
I.2.4 Lemma (Doob). Let (E, E ) be a measurable space, let F be a Polish space (complete, separable metric space) endowed with its Borel σ-field B_F, let T : Ω → E and let
X : Ω → F . Then the following statements are equivalent:
(i) X is σ(T )-measurable
(ii) There exists a mapping h : E → F , E /BF - measurable and such that X = h ◦ T
[Commutative diagram: T : Ω → E, X : Ω → F, and a dotted arrow h : E → F with X = h ∘ T.]
The discrete case. Assume now that E is finite or countable with its discrete σ-field.
A random variable T with values in E is said to be discrete. Put Bt := T −1 (t) (t ∈ E).
T being measurable, one has B_t ∈ A for all t. Let E* be the set of all t ∈ E such that P(B_t) > 0 and let Ω* = ∪_{t∈E*} B_t. We put N(t, A) = P(A|T = t) if t ∈ E*, and let N(t, ·) be an arbitrary probability measure on (Ω, A) otherwise; then N is a transition probability (Markov kernel) from E to (Ω, A). Since N(T(ω), A) = P(A|T)(ω) a.s., the family N is the quotient regular conditional probability (QRCP) of P given T. Let X be a quasi-integrable r.v. and let h(t) := ∫_Ω X(ω′) N(t, dω′) for all t ∈ E*. We see that h(T) is then a version of E(X|T).
Let U ∈ B and let X be a quasi-integrable random variable and let Y = EB (X). Then
one has EB (1U X) = 1U EB (X).
Since the last equality is true for any extension X of T , one can take the extension
X = T̄ . We conclude that :
Let (Ω, F , P) a probability space and let (T, ≤) be a partially ordered set. A family
(Xt , t ∈ T) of random variables with values in (E, E ) is called a random process. When
(T, ≤) is totally ordered, it can be viewed as “time”. In most applications T is either
an interval of R+ or Z.
A T-filtration is a collection (Ft , t ∈ T) of sub-σ-fields of F such that Fs ⊂ Ft if s ≤ t.
A filtered space (Ω, F , P, (Ft , t ∈ T)) is a probability space together with a T-filtration.
Let (Xt , t ∈ T) be a process with values in (E, E ). The natural filtration associated to
(Xt ) is the filtration defined by FtX = σ(Xs , s ≤ t) (t ∈ T). Any process is adapted
relatively to its natural filtration.
In this course we are only interested in discrete time processes, that is, processes
where T is an interval of Z̄ ≡ Z ∪ {−∞, +∞} with its natural order. In the sequel,
T will be mostly the set of natural numbers N = Z₊ or its closure N̄ = N ∪ {+∞}.
The case where T is a finite interval can be easily deduced from the case T = N. We
shall also consider, in very few cases, the negative interval T = Z₋ and its closure
Z₋ ∪ {−∞}. In what follows we shall use the following conventions:
A process without any further qualification is a process where T = Z+ = N. In such a
process we are mostly interested in asymptotic properties when the time goes to +∞.
A process closed on the right is a process where T = Z+ ∪ {+∞},
If τ (ω) = p for all ω ∈ Ω, then τ is a stopping time and Fτ = Fp . Most of the stopping
times that will be met in this course are obtained as follows:
II.1.3 Definition. Let (X_n, n ∈ N) be a process with values in (E, E) and let B ∈ E;
then the first hitting time of B is defined by:
τ_B(ω) := inf{n ∈ N | X_n(ω) ∈ B}, with the usual convention inf ∅ = +∞.
Let τ be a stopping time of the filtration (Fn ). An event B ∈ A is determined
prior to τ if B ∈ F∞ and B ∩ {τ ≤ n} ∈ Fn for all n ∈ N. The set of such events is a
σ-field, denoted Fτ , this is the σ-field of events determined prior to τ .
Proof. Exercise
II.1.7 Proposition. Let τ be a stopping time, and let X be a r.v. with values in (E, E )
then:
(a) the following are equivalent:
(i) X is Fτ -measurable
(ii) X is F∞ -measurable and X is Fn -measurable on {τ = n} for every n ∈ N
(iii) X is Fn -measurable on {τ = n} for every n ∈ N
(b) If X is an integrable or ≥ 0 r.r.v then: E(X|Fτ ) = E(X|Fn ) on {τ = n} (n ∈ N)
Proof (a) (i) ⇒ (iii): If (i) is verified then for any n ∈ N and B ∈ E, one has
{X ∈ B} ∈ F_τ ⊂ F_∞, therefore {τ = n} ∩ {X ∈ B} ∈ F_n. (iii) ⇒ (ii): If (iii)
is satisfied, then for any Borel B, (X ∈ B) = ∪n∈N ((X ∈ B) ∩ (τ = n)) ∈ F∞ ,
Let (Xn , n ∈ N) be a closed process and let τ be a random time. One can define a
new mapping Xτ : Ω → E as follows:
When the process (X_n) is defined only for n ∈ N, then the formula defines a partial map
on {τ < +∞}, denoted also by X_τ.
Let (X_n, n ∈ N) be a process, and let τ be a stopping time. The process (X_{τ∧n}, n ∈ N)
is called the process (X_n) stopped by τ. It is denoted (X_n^τ).
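As a concrete illustration (ours, with a hypothetical random-walk path), the following Python sketch computes a first hitting time τ_B as in Definition II.1.3 and forms the stopped process X_{τ∧n}:

```python
import numpy as np

rng = np.random.default_rng(2)

def hitting_time(path, B):
    """First n with path[n] in B; np.inf if the path never enters B."""
    for n, x in enumerate(path):
        if x in B:
            return n
    return np.inf

N = 50
steps = rng.choice([-1, 1], size=N)
X = np.concatenate([[0], np.cumsum(steps)])     # X_0 = 0, simple random walk
tau = hitting_time(X, B={3})                    # {tau <= n} depends only on X_0..X_n
# the stopped process X^tau_n = X_{tau ^ n}
X_stopped = [X[min(n, int(tau))] if tau < np.inf else X[n] for n in range(N + 1)]
```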
It follows from the definition that an adapted process X is a martingale if and only if
it is a sub-martingale and a super-martingale. In particular X is integrable. Sometimes
a slightly more general notion of martingale is needed:
A numerical adapted process X = (Xt , t ∈ T) is a general martingale if the Xt are only
assumed to be quasi-integrable and if the relations III.1 are satisfied with equality.
This notion is mostly used when the Xt are non-negative and are referred to in the
literature as non-negative martingales even when they are not assumed to be integrable.
When needed, we refer to this notion as a non-negative general martingale.
In what follows, we shall assume that T = N so that, in general, we write (Fn )-
submartingale without further qualification. When the process is defined on T = Z+ ∪
Unless otherwise stated, all martingales and stopping times are w.r.t. some fixed filtration (F_n, n ∈ N), and F_∞ = ∨_{n∈N} F_n.
∫_A X_{τ_1} dP ≤ ∫_A X_{τ_2} dP for all A ∈ F_{τ_1}. Let us first assume τ_2 ≡ k where k ∈ N. For any j ≤ k, A ∩ {τ_1 = j} ∈ F_j, therefore ∫_{A∩{τ_1=j}} X_{τ_1} dP = ∫_{A∩{τ_1=j}} X_j dP ≤ ∫_{A∩{τ_1=j}} X_k dP. Summing up for 0 ≤ j ≤ k, one has:
∫_A X_{τ_1} dP = Σ_{j=0}^k ∫_{A∩{τ_1=j}} X_j dP ≤ Σ_{j=0}^k ∫_{A∩{τ_1=j}} X_k dP = ∫_A X_k dP  (III.3)
For the general case, we have τ_1, τ_2 ≤ k for some k ∈ N, therefore we apply the first part of the proof to the stopped submartingale (X_n^{τ_2}). We thus obtain:
∫_A X_{τ_1} dP = ∫_A X_{τ_1}^{τ_2} dP ≤ ∫_A X_k^{τ_2} dP = ∫_A X_{τ_2} dP.
The optional sampling theorem does not extend to stopping times that are a.s. finite but not bounded. However, if we start with a closed process, then this extension is possible. Recall
that when considering the closed process (Xn , n ∈ N), the σ-field of the filtration
associated to the terminal time +∞ is precisely F+∞ = ∨n∈N Fn . For submartingales
closed on the right (Xn , Fn , n ∈ N ∪ {+∞}), the optional sampling theorem is valid
for any stopping times. Precisely:
Now let (X_n) be any closed submartingale. For any a ∈ R, the process Y_n = (X_n ∨ a) − a is a nonnegative submartingale. In view of the first part, we have for any A ∈ F_τ the inequality ∫_A (X_τ ∨ a) dP ≤ ∫_A (X_∞ ∨ a) dP, and in particular E(X_τ^+) ≤ E(X_∞^+) < +∞.
III.1.5 Remark. Theorem III.1.3 can be easily deduced from Theorem III.1.4 as follows. If (X_n, n ∈ N) is a submartingale and τ_1 ≤ τ_2 ≤ p, then introduce the closed submartingale (Y_n, n ∈ N ∪ {+∞}) defined by Y_n = X_n (n < p) and Y_n = X_p (n ≥ p), and apply Theorem III.1.4 to (Y_n).
III.1.9 Definition. The process (An ) involved in the Doob decomposition of (Xn ) is
called the compensator of (Xn ). It is given inductively by A0 = 0 and An = An−1 +
E(Xn − Xn−1 |Fn−1 ) for (n ≥ 1).
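For instance (our illustration, not an example from the notes): if S_n is a simple symmetric random walk, X_n = S_n² is a submartingale with E(X_n − X_{n−1}|F_{n−1}) = 1, so its compensator is A_n = n and X_n − n is a martingale. A quick simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
paths = rng.choice([-1, 1], size=(100_000, 30))
S = np.cumsum(paths, axis=1)            # random walk S_1, ..., S_30
X = S ** 2                              # submartingale X_n = S_n^2
A = np.arange(1, 31)                    # compensator A_n = n in this example
M = X - A                               # the martingale part of the decomposition
print(M.mean(axis=0)[:5])               # approximately 0 for every n
```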
III.1.10 Proposition. For any submartingale (X_n, F_n, n ∈ N), any a > 0 and any N ∈ N we have:
P(sup_{k≤N} X_k > a) ≤ (1/a) ∫_{(sup_{k≤N} X_k > a)} X_N^+ dP ≤ (1/a) E(X_N^+)  (III.5)
P(inf_{k≤N} X_k < −a) ≤ (1/a) ( −E(X_0) + ∫_{(inf_{k≤N} X_k ≥ −a)} X_N^+ dP ) ≤ (1/a) ( −E(X_0) + E(X_N^+) )  (III.6)
For any supermartingale (X_n, F_n, n ∈ N), any a > 0 and any N ∈ N we have:
P(sup_{k≤N} X_k > a) ≤ (1/a) ( E(X_0) − ∫_{(sup_{k≤N} X_k ≤ a)} X_N dP ) ≤ (1/a) ( E(X_0) + E(X_N^−) )  (III.7)
P(inf_{k≤N} X_k < −a) ≤ (1/a) ∫_{(inf_{k≤N} X_k < −a)} X_N^− dP ≤ (1/a) E(X_N^−)  (III.8)
Proof. Let ν = inf{n ≥ 0 | X_n > a} with the usual convention inf ∅ = +∞; ν is a stopping time. Also {ν ≤ N} = {sup_{k≤N} X_k > a} = {sup_{k≤N} X_k^+ > a}. If (X_n) is a submartingale, then (X_n^+) is a submartingale. For any k ≤ N, since {ν = k} ∈ F_k, we have ∫_{(ν=k)} X_k ≤ ∫_{(ν=k)} X_N, so that a P(ν ≤ N) ≤ ∫_{(ν≤N)} X_ν ≤ ∫_{(ν≤N)} X_N ≤ ∫_{(ν≤N)} X_N^+ ≤ E(X_N^+). This proves inequality (III.5).
If (X_n) is a supermartingale, then a P(ν ≤ N) ≤ ∫_{(ν≤N)} X_ν = ∫_{(ν≤N)} X_{ν∧N} = E(X_{ν∧N}) − ∫_{(ν>N)} X_{ν∧N} = E(X_{ν∧N}) + ∫_{(ν>N)} (−X_N) ≤ E(X_0) + ∫_{(ν>N)} X_N^− ≤ E(X_0) + E(X_N^−). We used the fact that (X_{ν∧n}) is a supermartingale and −X_N ≤ X_N^−. This proves inequality (III.7). The other inequalities are easy consequences of the proved ones.
lim inf x_n < a < b < lim sup x_n ⇒ U((x_n), a, b) = +∞ ⇒ lim inf x_n ≤ a < b ≤ lim sup x_n  (III.12)
We conclude that (x_n) is convergent in R̄ if and only if, for all a, b ∈ Q with a < b, one has
U((x_n), a, b) < +∞.
III.2.3 Proposition. For any process (Xn ), νn and νn′ , U ((Xn ), a, b) and D((Xn ), a, b)
are random variables. If (Xn ) is adapted, then νn and νn′ are stopping times for all
n∈ N. One has the following equalities:
(i) U ((Xn ), a, b) = D((−Xn ), −b, −a)
(ii) U((X_n), a, b) = U(((X_n − a)^+), 0, b − a)
We define the upcrossing number of (X_n) on the finite time interval [0, N] as the number U^N((X_n), a, b) := U((X_n^N), a, b), where (X_n^N) is (X_n) stopped at N. We define similarly the downcrossing number D^N((X_n), a, b).
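The upcrossing number of a finite path can be computed by a simple scan; the following sketch (function name and example ours) counts the completed passages from level a to level b:

```python
def upcrossings(path, a, b):
    """Number of completed passages of the path from (-inf, a] to [b, +inf)."""
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True                      # the path has gone below a
        elif below and x >= b:
            count, below = count + 1, False   # completed an upcrossing
    return count

assert upcrossings([0, 3, -1, 4, 0, 5], a=0, b=3) == 3
```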
Proof. Let Ω_0 = {ω | (X_n(ω)) converges in R̄}, and let I be the collection of pairs ((a, b) ∈ Q × Q, a < b). For (a, b) ∈ I put C(a, b) = {ω | U((X_n(ω)), a, b) = +∞}. From relations (III.12), one has Ω \ Ω_0 = ∪_{(a,b)∈I} C(a, b). Since I is countable, P(Ω_0) = 1 if and only if P(C(a, b)) = 0 for all (a, b) ∈ I.
We are going to associate to (Xn ) two processes (Vn ) and (Vn′ ). Put V0 = 0 and for
k ≥ 1 put:
V_k = 0 if ∃ m ≥ 0, ν_{2m} < k ≤ ν_{2m+1};  V_k = 1 if ∃ m ≥ 0, ν_{2m+1} < k ≤ ν_{2m+2}.
III.2.5 Lemma. For any adapted process (X_n), the sequences (V_n) and (V′_n) are predictable.
III.2.6 Theorem. For any (Fn )-submartingale (Xn ), any (a, b) ∈ R2 , a < b, N ∈ N,
we have the following inequalities:
Proof of the first inequality. Assume first that the submartingale is integrable. Put U = U^N((X_n), a, b) and ρ = ν_{2U+1} ∧ N. Using the inequalities b − X_{ν_{2k}} ≤ 0 and X_{ν_{2k−1}} − a ≤ 0, true for all k ∈ {1, …, U}, we have:
V_0 X_0 + U(b − a) + X_N − X_ρ
= V_0 X_0 + Σ_{k=1}^U [(X_{ν_{2k}} − X_{ν_{2k−1}}) + (b − X_{ν_{2k}}) + (X_{ν_{2k−1}} − a)] + X_N − X_ρ
≤ V_0 X_0 + Σ_{k=1}^U (X_{ν_{2k}} − X_{ν_{2k−1}}) + X_N − X_ρ
= (V ⋆ X)_N
Now (((1 − V) ⋆ X)_n) is a submartingale by proposition III.2.2. Taking conditional expectations in the preceding inequalities, it follows that
(b − a) E^{F_0}[U]
≤ E^{F_0}[(V ⋆ X)_N − X_N] − V_0 X_0 + E^{F_0}[X_ρ]
= −E^{F_0}[((1 − V) ⋆ X)_N] − V_0 X_0 + E^{F_0}[X_ρ]
≤ −(1 − V_0) X_0 − V_0 X_0 + E^{F_0}[X_ρ]
= E^{F_0}[X_ρ] − X_0
In the particular case where a = 0, b = ∆ and X_n ≥ 0, we have X_ρ = 1_{(ν_{2U+1}>N)} X_N + 1_{(ν_{2U+1}≤N)} X_{ν_{2U+1}} ≤ X_N. Thus ∆ E^{F_0}[U] ≤ E^{F_0}[X_N] − X_0. The first inequality of the assertion is proved in this case. The general case is reduced to this particular case by replacing respectively X_n by (X_n − a)^+, a by 0, and b by b − a, and using Proposition III.2.3 (ii).
Proof of the second inequality. Assume first that the submartingale is integrable. Put D = D^N((X_n), a, b) and ρ′ = ν′_{2D+1} ∧ N. Using the inequalities a − X_{ν′_{2k}} ≥ 0 and X_{ν′_{2k−1}} − b ≥ 0, true for all k ∈ {1, …, D}, we have:
V′_0 X_0 + D(a − b) + X_N − X_{ρ′}
= V′_0 X_0 + Σ_{k=1}^D [(X_{ν′_{2k}} − X_{ν′_{2k−1}}) + (a − X_{ν′_{2k}}) + (X_{ν′_{2k−1}} − b)] + X_N − X_{ρ′}
≥ [V′_0 X_0 + Σ_{k=1}^D (X_{ν′_{2k}} − X_{ν′_{2k−1}}) + (X_N − X_{ρ′})] + (X_{ν′_1} − b) 1_{(ν′_1 < +∞)}
III.2.7 Corollary. For any (Fn )-submartingale (Xn , n ∈ N), any (a, b) ∈ R2 , a < b,
we have the following inequalities:
(b − a) E^{F_0}[U((X_n), a, b)] ≤ sup_{n∈N} E^{F_0}[(X_n − a)^+] − (X_0 − a)^+  (III.15)
(b − a) E^{F_0}[D((X_n), a, b)] ≤ sup_{n∈N} E^{F_0}[(X_n − b)^+] − (X_0 − b)^+  (III.16)
III.2.8 Theorem. Any submartingale (X_n, n ∈ N) such that sup_{n∈N} E(X_n^+) < +∞ converges a.s., when n → ∞, to some F_{+∞}-measurable r.v., denoted X_{+∞}, that satisfies E(X_{+∞}^+) < +∞.
III.2.9 Remark. For an integrable submartingale (X_n, n ∈ N), the inequality E(X_n^+) ≤ E(|X_n|) = E(2X_n^+ − X_n) ≤ 2E(X_n^+) − E(X_0) being verified for every n ∈ N, the condition sup_{n∈N} E(X_n^+) < +∞ is equivalent to sup_{n∈N} ‖X_n‖_1 < +∞, where ‖·‖_1 is the L¹ norm.
III.2.10 Remark. With the assumptions of Theorem III.2.8, one cannot conclude that the submartingale (X_n, n ∈ N) can be closed on the right by X_{+∞}; that is, the inequality X_n ≤ E(X_{+∞}|F_n) is not necessarily true for all n ∈ N. However we have the following:
Proof. The implication (ii) ⇒ (iii) is trivial. We prove (i) ⇒ (ii). In view of theorem III.2.8, the limit X_∞ exists and is F_{+∞}-measurable. For any a ∈ R, the sequence (X_n ∨ a) is a submartingale; by assumption (X_n^+) is equiintegrable and therefore (X_n ∨ a) is equiintegrable. It follows that for any n and A ∈ F_n:
∫_A (X_n ∨ a) dP ≤ lim_{m→+∞} ∫_A (X_m ∨ a) dP = ∫_A (X_∞ ∨ a) dP
Now we prove (iii) ⇒ (i). Put Y_n = E^{F_n}(X_∞^+). Then by proposition D.0.3, (Y_n, n ∈ N) is equiintegrable. Since X_n^+ ≤ Y_n for all n, (X_n^+) is equiintegrable. The fact that (iv)
In the case where the submartingale is integrable, one has the following easy criterion for convergence in L¹:
We take as discrete time set the set Z− = {0, −1, −2, . . .} with a filtration F0 ⊃ F−1 ⊃
F−2 · · · . We put F−∞ = ∩n∈Z− Fn .
III.2.15 Corollary. For any (reverse) (F_n)-submartingale (X_n, n ∈ Z₋), any (a, b) ∈ R², a < b, we have the following inequalities:
III.2.16 Theorem. Any (reverse) submartingale (Xn , n ∈ Z−) converges a.s., when
n → −∞, to some F−∞ - measurable r.v. that shall be denoted X−∞ . Moreover
(Xn , n ∈ Z− ∪ {−∞}) is a submartingale.
Proof. The convergence is a consequence of propositions III.2.4 and III.2.15. Since X_{−∞} is measurable with respect to all F_n, it is measurable with respect to their intersection. Let a ∈ R. Then (X_n ∨ a) is a submartingale and is bounded from below by a. For any p ∈ Z₋ and A ∈ F_{−∞}, E(1_A(X_{−∞} ∨ a)) = E(1_A(lim inf_{n→−∞} X_n ∨ a)) ≤ lim inf_{n→−∞} E(1_A(X_n ∨ a)) = inf_{n∈Z₋} E(1_A(X_n ∨ a)) ≤ E(1_A(X_p ∨ a)). By monotone convergence, when a ↓ −∞ we have E(1_A X_{−∞}) ≤ E(1_A X_p).
III.2.17 Theorem. Let (Xn , n ∈ Z−) be a (reverse) submartingale and let X−∞ be its
a.s. limit. The following are equivalent:
(i) Xn is integrable for all n ∈ Z− and (Xn ) converges to X−∞ in L1,
(ii) The family (X_n, n ∈ Z₋) is equiintegrable,
(iii) E(Xn ) ↓ α > −∞ as n ↓ −∞,
(iv) sup_{n∈Z₋} E(X_n^−) < +∞.
Proof. (i) ⇔ (ii) is an immediate consequence of Proposition D.0.6 and the fact that
a.s. convergence implies convergence in probability. (iii) ⇔ (iv) as well as (ii) ⇒ (iv)
are trivial. We prove (iii) ⇒ (ii). Assume that the monotone sequence E(X_n) ↓ α > −∞ as n ↓ −∞. Clearly the X_n are integrable. Take ε > 0, and fix k_0 such that α ≤ E(X_{k_0}) < α + ε/2. It will be enough to prove that the family (X_n, n ≤ k_0) is equiintegrable. Put I(n, a) = ∫_{(|X_n|>a)} |X_n| dP (n ≤ k_0, a ∈ R₊). For n ≤ k_0 we
have:
I(n, a) = ∫_{(X_n>a)} X_n dP − ∫_{(X_n<−a)} X_n dP
= ∫_{(X_n>a)} X_n dP + ∫_{(X_n≥−a)} X_n dP − E(X_n)
≤ ∫_{(X_n>a)} X_{k_0} dP + ∫_{(X_n≥−a)} X_{k_0} dP − E(X_{k_0}) + ε/2
= ∫_{(X_n>a)} X_{k_0} dP − ∫_{(X_n<−a)} X_{k_0} dP + ε/2
where we used, to obtain the first inequality, the fact that (Xn > a) and (Xn ≥ −a) are
in Fn , n ≤ k0 , the submartingale property, and the fact that α ≤ E(Xn ) ≤ E(Xk0 ).
Now:
E(|X_n|) = 2E(X_n^+) − E(X_n) ≤ 2E(X_{k_0}^+) − α
so that:
P(|X_n| > a) ≤ (1/a) E(|X_n|) ≤ (1/a) (2E(X_{k_0}^+) − α)
Since X_{k_0}^+ is integrable, it follows that there exists a_0 ∈ R₊ such that for a ≥ a_0 one has ∫_{(|X_n|>a)} |X_{k_0}| dP ≤ ε/2. To sum up, for a ≥ a_0 and n ≤ k_0 we have I(n, a) ≤ ε.
for all n ∈ N:
X_n = E(Y|F_n)  (III.19)
(ii) There exists an integrable r.v. Z such that: Xn = E(Z|Fn ) for all n ∈ N,
(iii) The sequence (Xn , n ∈ N) converges a.s. to some r.v. X∞ and (Xn , n ∈ N∪{+∞})
is a closed martingale,
(iv) The sequence (X_n, n ∈ N) converges in L¹,
(v) The family (X_n, n ∈ N) is equiintegrable, that is, sup_n ∫_{(|X_n|>α)} |X_n| dP → 0 as α ↑ +∞.
Moreover the closure of the martingale in (i), EF∞ (Z) in (ii) and the limit X∞ claimed
in (iii) are equal.
Proof. (i) ⇔ (ii) and (iii) ⇒ (i)(ii) are straightforward. (ii) ⇒ (v) is a consequence
of Proposition D.0.3. (iv) ⇒ (v) is a direct consequence of Proposition D.0.6. Proof
of (v) ⇒ (iv): Any equiintegrable family is bounded in L1 (Proposition D.0.5), a
martingale (Xn ) that is bounded in L1 is convergent a.s. , hence in probability, and
in view of proposition D.0.6(Appendix), this implies the convergence of the sequence
(Xn ) in L1 . Proof of (iv) ⇒ (iii): Let Y be the limit of Xn in L1 , then for any
p ∈ N, lim_{n→∞} E^{F_p}(X_n) = E^{F_p}(Y), by continuity of the operator E^{F_p} on L¹, but
EFp (Xn ) = Xp for n ≥ p. It follows that EFp (Y ) = Xp . Now convergence in L1
implies boundedness in L1 and therefore, by Theorem III.2.8, the a.s. convergence of
the martingale (Xn ) to some X∞ . It is clear that Y = X∞ .
III.3.3 Corollary. Let (Xn ) be a regular martingale, closed by X∞ , and let τ1 and τ2
be two stopping times, such that τ1 ≤ τ2 . Then one has:
III.4 Martingales in Lp
III.4.1 Proposition. Let p ∈ [1, ∞[. For any Z ∈ Lp , the sequence Xn := EFn (Z)
is a martingale in Lp , that converges almost surely and in Lp to Z∞ := EF∞ (Z).
We conclude that X∞ ∈ Lp . We can now apply proposition III.4.1 to conclude for the
convergence of (Xn ) in Lp .
III.4.3 Proposition. Let p ∈]1, ∞[. For any martingale (Xn , n ∈ N) that is bounded
in Lp, the r.v. X* ≡ sup_n |X_n| is in Lp. Moreover:
‖X*‖_p ≤ (p/(p−1)) sup_n ‖X_n‖_p  (III.21)
Proof. Put S_n = sup_{m≤n} |X_m|. The maximal inequality (III.5) applied to the submartingale (|X_n|) yields:
Chapter IV
Markov chains
Proof. We first prove the only if part. Assume that A_1 and A_2 are conditionally independent w.r.t. C. Put H = {A ∈ F | E(1_A X_2) = E(1_A E(X_2|C))} and M = {A_1 ∩ C | A_1 ∈ A_1, C ∈ C}. It is easily verified that H is a monotone class that contains A_1 ∪ C. Moreover if A = A_1 ∩ C where A_1 ∈ A_1, C ∈ C, then E(1_A X_2) = E(1_C 1_{A_1} X_2) = E(E^C[1_C 1_{A_1} X_2]) = E(1_C E^C[1_{A_1} X_2]) = E(1_C E^C[1_{A_1}] E^C[X_2]), the last equality being due to conditional independence. Now, by definition of E^C[1_{A_1}], and since 1_C E^C[X_2] is C-measurable, we have the equalities E(1_C E^C[X_2] E^C[1_{A_1}]) = E(1_C E^C[X_2] 1_{A_1}) = E(1_A E^C[X_2]). By the monotone class theorem, H contains the σ-field generated by M, that is A_1 ∨ C.
Conversely, assume that (IV.3) holds. Let X1 ∈ A1b , X2 ∈ A2b . For any C ∈ C , one
has: E(1C X1 X2 ) = E(EA1 ∨C [1C X1 X2 ]) = E(1C X1 EA1 ∨C [X2 ]) = E(1C X1 EC [X2 ]),
and the latter is equal to: E(1C EC [X1 ] EC [X2 ]). We conclude that EC [X1 X2 ] =
EC [X1 ] EC [X2 ].
Let (Ω, F , (Fn ), P) be a filtered space and let (E, E ) be a measurable space, called the
set of states. If X = (Xn ) is a measurable sequence of random variables with values
in (E, E ), we put FnX = σ(X0 , . . . , Xn ) (n ∈ N). (FnX ) is called the natural filtration
of X. We also put AnX = σ(Xk , k ≥ n): this is the σ-field of future events determined
by X after time n.
By a monotone class argument, condition (IV.4) is equivalent to the following: for all
B ∈ E,
Any (Fn )-Markov sequence is an (FnX )-Markov sequence. This is why we introduce
the following definition:
Proof. First we prove IV.7 for all Φ of the form Φ = f0 (Xn ) . . . fp (Xn+p ) where p ≥ 0
and f0 , . . . fp ∈ Eb . The proof is by induction on p. For p = 0 equality IV.7 is obviously
true for all f0 ∈ Eb . Assume that the formula is true for all f0 , . . . fk ∈ Eb where
0 ≤ k < p. Put U = f0 (Xn ) . . . fk (Xn+k ). We have:
E(U fk+1 (Xn+k+1 )|Fn ) = E[E(U fk+1 (Xn+k+1 )|Fn+k )|Fn ]
= E[U E(fk+1 (Xn+k+1 )|Fn+k )|Fn ]
= E[U E(fk+1 (Xn+k+1 )|Xn+k )|Fn ] (using formula IV.4 )
The Markov property can be formulated via a transition probability between two successive states, at least if the state space is a Polish space. (A reminder on transition probabilities is the object of Appendix C.)
In the rest of this chapter the set E denotes a finite or countable set, endowed with
its discrete σ-field, P(E). A nonnegative measure on E is a σ-additive measure on
(E, E) with values in [0, +∞]. Since any nonnegative measure µ on E verifies µ(B) = Σ_{i∈B} µ({i}) for any B ∈ E, µ can be identified with a mapping µ : E → [0, +∞]. In our setting, µ is represented by a row matrix (µ(i)).
Dually, a numerical function f : E → R is usually represented by a column matrix
f = (f (j)). A nonnegative function f is a numerical function such that f ≥ 0. The set
of all nonnegative functions is denoted E+ . A bounded function f is a function such
that |f | ≤ c for some c ∈ R. The set of all bounded functions is denoted Eb.
A transition matrix (or stochastic matrix, or Markov matrix) on E is a mapping M : E × E → R₊ such that Σ_{j∈E} M(i, j) = 1 for all i ∈ E. M thus represents a transition probability from E to E (see Appendix C). In the discrete setting that we consider in this chapter, M is usually denoted as a matrix M = (m_{ij}) where m_{ij} := M(i, j).
The product of transition matrices M and N is denoted MN and is defined by MN(i, j) = Σ_{k∈E} M(i, k)N(k, j). MN is a transition matrix. We put M⁰(i, j) := δ_{ij}.
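A minimal numerical sketch of these definitions (hypothetical 3-state matrix, ours): rows of a transition matrix are probability vectors, and the matrix product MN is again a transition matrix.

```python
import numpy as np

M = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
assert np.allclose(M.sum(axis=1), 1.0)  # each row is a probability on E
MN = M @ M                              # the product MN(i,j) = sum_k M(i,k)N(k,j)
assert np.allclose(MN.sum(axis=1), 1.0) # MN is again a transition matrix
M0 = np.eye(3)                          # M^0(i,j) = delta_ij
```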
In view of relation IV.6, and the discreteness of E one has the following:
Assume that property (IV.4) is satisfied; then property (IV.11) is satisfied for the transition matrix M_n defined by (IV.12). The converse is trivial.
The matrix M_n is not uniquely defined, since for x ∈ E such that P(X_n = x) = 0, M_n(x, ·) can be any arbitrary probability measure on E. A family of matrices (M_n, n ∈ N) with
the property stated in the proposition is called a system of Markov matrices associated
to the Markov sequence.
IV.2.10 Definition. A sequence (X_n) with values in the countable set E is a homogeneous (F_n)-Markov sequence if there exists a transition matrix M on E such that for all f ∈ E₊ (or all f ∈ E_b) and for all n ∈ N, one has:
E(f (Xn+1 )|Fn ) = M f (Xn ) a.s. (IV.15)
M is the transition matrix and the probability µ defined by µ(i) = P(X0 = i) is the
initial distribution of the Markov sequence.
Equivalently, one can replace in definition IV.2.10 the relation (IV.15) by the fol-
lowing: for all n ∈ N and all j ∈ E:
P(Xn+1 = j|Fn ) = M (Xn , j) a.s. (IV.16)
In the rest of this chapter we deal only with homogeneous Markov sequences, there-
fore in the sequel Markov sequence will mean homogeneous Markov sequence.
IV.2.11 Proposition. (i) If (Xn ) is an (Fn )-Markov sequence with transition matrix
M and initial distribution µ then for all n ∈ N and all i0 , . . . , in ∈ E one has:
P(X0 = i0 , . . . , Xn = in ) = µ(i0 )M (i0 , i1 ) · · · M (in−1 , in ) (IV.17)
(ii) Conversely if (IV.17) is verified for some µ and M , then (Xn ) is a Markov sequence
(w.r.t. (F_n^X)) with transition matrix M and initial distribution µ.
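A Monte-Carlo sketch of formula (IV.17) (hypothetical 2-state chain, our own illustration): simulating paths from µ and M, the empirical frequency of a fixed word (i_0, i_1, i_2) should approach µ(i_0)M(i_0, i_1)M(i_1, i_2).

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.3, 0.7])
M = np.array([[0.9, 0.1],
              [0.4, 0.6]])

def sample_path(n):
    x = [int(rng.choice(2, p=mu))]
    for _ in range(n):
        x.append(int(rng.choice(2, p=M[x[-1]])))
    return tuple(x)

target = (0, 0, 1)                                   # the word i_0, i_1, i_2
emp = np.mean([sample_path(2) == target for _ in range(100_000)])
exact = mu[0] * M[0, 0] * M[0, 1]                    # formula (IV.17)
print(emp, exact)                                    # close for large samples
```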
Let E be a finite or countable set endowed with its discrete σ-field E. The infinite product E^N, that is, the set of all sequences in E, will be endowed with the product σ-field E^{⊗N} and the filtration obtained as follows. Let π_n be the projection of E^N on its n-th coordinate. We put E_n := σ(π_0, …, π_n). Clearly E^{⊗N} = ∨_{n≥0} E_n. The sequence (π_n) is said to be the canonical sequence associated to E. The term (E^N, E^{⊗N}, E_n, P) is said to be a canonical process on E.
Let (Ω, F , P, (Xn )) be a measurable process with values in (E, E ). Put X = (Xn ),
then X : Ω → E N is F /E ⊗N -measurable. The image of P by X denoted PX is called the
Since Xn = πn ◦ X, we have:
It follows that (Ω, F , P, Xn ) and (Ω′ , F ′ , P′ , Xn′ ) are equivalent if and only if they
have the same canonical realization. It is easy to see that the canonical realization
(E N , E ⊗N , PX , πn ) of any (Fn )-Markov sequence is a Markov sequence. In view of
proposition IV.2.11 we can state:
IV.2.13 Proposition. Two Markov sequences (Xn ) and (Xn′ ) (each defined on some
filtered space) are equivalent if and only if they have the same initial distribution and
the same transition matrix.
Proof. Using formula (IV.17), one can define a family of probability measures Q_n on E_n := E^{{0,…,n}}, with its discrete σ-field B̃_n. For n ≤ m, Q_n is the image of Q_m by the projection π_n^m of E_m on E_n. It follows that the system ((E_n, Q_n), n ≥ 0) is consistent, so that one can apply a probability extension theorem of the Kolmogorov type (Neveu, Calcul des Probabilités, section III-3) to conclude the existence of a unique probability measure Q on E^N such that Q_n is the image of Q by the projection (π_0, …, π_n). One can also apply the Ionescu-Tulcea theorem (Neveu, Calcul des Probabilités, Proposition V-1-1) for the same purpose.
IV.3.1 Definition. Let M be a transition matrix on E. The array X = (Ω, (X_n), θ, (P_i)) as described above is said to be a Markov chain with transition matrix M on E if for all i ∈ E the following properties hold:
(i) Pi (X0 = i) = 1
(ii) For all n ∈ N, and f ∈ E+ (or f ∈ Eb ) one has:
Ei (f (Xn+1 )|Fn ) = M f (Xn ) Pi a.s. (IV.21)
Translation operator. The translation (or shift) operator θ will play a major role in the reformulation of the Markov property. Let A_p^{p+n} = σ(X_p, …, X_{p+n}) (p, n ∈ N) and A^p = ∨_{n≥0} A_p^{p+n}. A^p is known as the σ-field of future events at time p. We put θ_0 = Id_Ω and for any n ≥ 0 we put θ_{n+1} := θ ∘ θ_n. For any B_0, …, B_n ⊂ E, the set {ω | X_0 ∘ θ_p(ω) ∈ B_0, …, X_n ∘ θ_p(ω) ∈ B_n} = {ω | X_p(ω) ∈ B_0, …, X_{n+p}(ω) ∈ B_n}. It follows that θ_p^{−1}(F_n) = σ(X_p, …, X_{n+p}) = A_p^{p+n}, so that in particular θ_p : (Ω, F) → (Ω, F) is measurable and A^p = θ_p^{−1}(F); that is, A^p is the σ-field generated by θ_p. In view of Lemma I.2.4, a real random variable Y′ is A^p-measurable if and only if there exists an F-measurable r.v. Y such that Y′ = Y ∘ θ_p. Homogeneity and the existence of a translation operator allow more precise relations than equality (IV.7).
The Markov property for a Markov chain can be reformulated nicely using the existence
of a rich family of probability measures for which the sequence is markovian and the
translation operator. We start by remarking that the formula IV.15 can be expressed
as follows: For any µ ∈ ∆(E), any f ∈ E+ , any n ∈ N:
Eµ (f (X1 ◦ θn )|Fn ) = EXn (f (X1 )) Pµ a.s. (IV.22)
It is important to note that the right hand side of this equation is a random variable
equal, for Pµ -almost all ω, to the (unconditional !) expectation EXn (ω) (f (X1 )). The
following proposition extends the operator formulation of the Markov property to all
future random variables. This extension is similar to the one stated in Proposition
IV.2.3.
Note that in the RHS of formula (IV.23), E_{X_n}(Φ) is the r.v. ω ↦ E_{X_n(ω)}(Φ).
First proof. By the remarks preceding the proposition, the result appears as a refor-
mulation of Proposition IV.2.3 if we remark that the RHS is precisely E(Φ|Xn )
Second proof. We first prove the formula for Px . We want to prove that for any Z that
is bounded and Fn -measurable we have:
It will be enough to prove this equality for Φ of the form f0 (X0 ) . . . fp (Xp ) where
f0 , . . . , fp ∈ Eb since then, by the monotone class argument, it will be valid for all
bounded F - measurable Φ. The proof is by induction on p. For p = 0 Ex (Zf0 (X0 ) ◦
θn ) = Ex (Zf0 (Xn )). Since Ey (f0 (X0 )) = f0 (y) one has EXn (f0 (X0 )) = f0 (Xn ) so that
equality (IV.24) is true in this case. Let U = f_0(X_0) ⋯ f_{p−1}(X_{p−1}) and V = U · Mf_p(X_{p−1}). Then:
E_x(ZΦ ∘ θ_n) = E_x(Z (U ∘ θ_n) f_p(X_{p+n}))
= E_x[E_x(Z (U ∘ θ_n) f_p(X_{p+n}) | F_{p+n−1})]
= E_x[Z (U ∘ θ_n) E_x(f_p(X_{p+n}) | F_{p+n−1})]
= E_x[Z (U ∘ θ_n) Mf_p(X_{p+n−1})]
= E_x[Z (V ∘ θ_n)]
= E_x[Z E_{X_n}(V)]  (by the induction hypothesis)
In particular, taking Z = 1 and n = 0, we have E_x(Φ) = E_x(V). Since this equality is true for all x ∈ E, one can replace E_{X_n}(V) by E_{X_n}(Φ) in the last term of the above equalities. This proves (IV.24).
Let T be a stopping time for the filtration (Fn ) then (T < +∞) ∈ F and one can
define on (T < +∞) the operator θT as follows:
The LHS of the equation must be understood as the restriction of Eµ (1(T <+∞) Φ ◦
θT |FT ) to (T < +∞) (see remark I.2.2). An alternative way to write this equality is
therefore:
In particular taking Φ = f (Xm ) in (IV.26) and applying Proposition IV.2.12 one has:
σ_i = 1 + τ_i ∘ θ  (IV.29)
σ_i^{n+1} = σ_i + σ_i^n ∘ θ_{σ_i} on (σ_i < +∞)  (IV.30)
  = σ_i^n + σ_i ∘ θ_{σ_i^n} on (σ_i^n < +∞)  (IV.31)
L_i := E_i(σ_i)
N_i := Σ_{n≥0} 1_{{i}}(X_n); N_i is called the number of visits to state i.
u_{ij} := E_i(N_j)
U := (u_{ij}) is called the potential matrix.
IV.3.6 Proposition. (i) For any j ∈ E and p ∈ N*, σ_j^p is a stopping time for the filtration (F_n). (ii) For any i, j ∈ E, the sequence (σ_j^p, p ∈ N*) is a homogeneous Markov sequence with values in N* ∪ {+∞} for the filtered space (Ω, A, P_i, (F_{σ_j^p})_{p≥1}), with a transition matrix Q^{(j)} that depends on j and not on i, and initial distribution µ equal to the law of σ_j under P_i; precisely, µ(k) = P_i(σ_j = k) ≡ s_{ij}^k (k ∈ N*).
Proof. The proof that σ_j^p is a stopping time is left to the reader. Put B_p = F_{σ_j^p}. Let i ∈ E and let f : N* ∪ {+∞} → R be a bounded function. For any k ∈ N*, E_i^{B_p}(f(σ_j^{p+1})) = E_i^{F_k}(f(k + σ_j ∘ θ_k)) on (σ_j^p = k), and E_i^{B_p}(f(σ_j^{p+1})) = f(+∞) on (σ_j^p = ∞). Thus we have the equality:
E_i^{B_p}(f(σ_j^{p+1})) = Σ_{k∈N*} 1_{(σ_j^p=k)} E_i^{F_k}(f(k + σ_j ∘ θ_k)) + 1_{(σ_j^p=∞)} f(∞)
Therefore E_i^{B_p} f(σ_j^{p+1}) = Qf(σ_j^p), where Q is the following transition matrix:
Q(k, l) = s_{jj}^{l−k} if k, l ∈ N*, l − k ≥ 1;  Q(k, +∞) = 1 − f_{jj} if k ∈ N*;  Q(+∞, +∞) = 1;  Q(k, l) = 0 otherwise.  (IV.32)
IV.3.7 Corollary. For any i, j ∈ E and p ≥ 1: f_{ij}^{(p+1)} = f_{ij}^{(p)} f_{jj}.
Therefore f_{ij}^{(p)} = f_{ij} (f_{jj})^{p−1} (i ≠ j), and f_{jj}^{(p)} = (f_{jj})^p.
Proof. Using the notations of Proposition IV.3.6, we see that for k ∈ N*, Q1_{N*}(k) = P_j(σ_j < +∞) and Q1_{N*}(∞) = 0. Since (σ_j^p, p ∈ N*) is a Markov sequence for the probability P_i and the matrix Q, one has: E_i^{B_p} 1_{(σ_j^{p+1}<+∞)} = E_i^{B_p} 1_{N*}(σ_j^{p+1}) = Q1_{N*}(σ_j^p) = 1_{(σ_j^p<+∞)} P_j(σ_j < +∞). By taking expectations, this implies P_i(σ_j^{p+1} < +∞) = P_i(σ_j^p < +∞) P_j(σ_j < +∞). In particular if i = j, P_j(σ_j^{p+1} < +∞) = P_j(σ_j^p < +∞) P_j(σ_j < +∞). Therefore f_{jj}^{(p)} = (f_{jj})^p and consequently f_{ij}^{(p)} = f_{ij} (f_{jj})^{p−1}.
IV.3.8 Proposition. The number of visits N_j to state j is a r.v. with values in N ∪ {∞} that has the following law, where we take i ≠ j:
If f_{jj} < 1:
P_i(N_j = k) = 1 − f_{ij} if k = 0;  f_{ij}(1 − f_{jj})(f_{jj})^{k−1} if k ∈ N*;  0 if k = ∞.  (IV.33)
P_j(N_j = k) = 0 if k = 0;  (1 − f_{jj})(f_{jj})^{k−1} if k ∈ N*;  0 if k = ∞.  (IV.34)
If f_{jj} = 1:
P_i(N_j = k) = 1 − f_{ij} if k = 0;  0 if k ∈ N*;  f_{ij} if k = ∞.  (IV.35)
P_j(N_j = k) = 0 if k = 0;  0 if k ∈ N*;  1 if k = ∞.  (IV.36)
Proof. a) Proof of the case i ≠ j. On (X_0 = i) one has (N_j = 0) = (σ_j = +∞), therefore P_i(N_j = 0) = 1 − f_{ij}, and for p ∈ N* one has (N_j = p) = (σ_j^p < +∞) ∩ (σ_j^{p+1} = +∞). Therefore P_i(N_j = p) = P_i[(σ_j^p < +∞) ∩ (σ_j^{p+1} = +∞)]. Writing σ_j^{p+1} = σ_j^p + σ_j ∘ θ_{σ_j^p} on (σ_j^p < +∞), it follows that:
P_i(N_j = p)
= P_i[(σ_j^p < +∞) ∩ (σ_j ∘ θ_{σ_j^p} = +∞)]
= E_i[E_i^{F_{σ_j^p}}(1_{(σ_j^p<+∞)} 1_{(σ_j∘θ_{σ_j^p}=+∞)})]
= E_i[1_{(σ_j^p<+∞)} E_i^{F_{σ_j^p}}(1_{(σ_j∘θ_{σ_j^p}=+∞)})]
= E_i[1_{(σ_j^p<+∞)} P_j(σ_j = +∞)]  (by the strong Markov property)
= f_{ij}^{(p)} (1 − f_{jj})
= f_{ij} (f_{jj})^{p−1} (1 − f_{jj})  (in view of corollary IV.3.7)
Also P_i(N_j = ∞) = P_i(∩_{k≥1}(σ_j^k < +∞)) = lim_k P_i(σ_j^k < +∞) = lim_k f_{ij} (f_{jj})^{k−1} (corollary IV.3.7 and monotone convergence). P_i(N_j = ∞) is thus equal to 0 if f_{jj} < 1 and equal to f_{ij} if f_{jj} = 1.
b) Proof of the case i = j (sketch). (X_0 = j) ∩ (N_j = 0) = ∅, thus P_j(N_j = 0) = 0. On (X_0 = j) we have (N_j = 1) = (σ_j = +∞) and, for p ≥ 2, (N_j = p) = (σ_j^{p−1} < +∞) ∩ (σ_j^p = +∞). The same method as in the first paragraph leads to the announced results.
u_{ij} = δ_{ij} + Σ_{n≥1} f_{ij}^{(n)}  (IV.37)
and:
u_{ij} = δ_{ij} + f_{ij} u_{jj}  (IV.38)
and:
u_{jj} = (1 − f_{jj})^{−1} if f_{jj} < 1;  u_{jj} = +∞ if f_{jj} = 1  (IV.39)
with the conventions 1/0 = +∞ and 0 · (+∞) = 0.
Proof. Method 1. Since N_j is integer valued, u_{ij} ≡ E_i(N_j) = Σ_{n≥1} P_i(N_j ≥ n). On (X_0 ≠ j) we have (N_j ≥ n) = (σ_j^n < +∞), while on (X_0 = j) we have, for n ≥ 2, (N_j ≥ n) = (σ_j^{n−1} < +∞). It follows that if i ≠ j, P_i(N_j ≥ n) = P_i(σ_j^n < +∞) = f_{ij}^{(n)}; and if i = j and n ≥ 2, P_j(N_j ≥ n) = P_j(σ_j^{n−1} < +∞), also P_j(N_j ≥ 1) = 1. This ends the proof of the first part. Applying corollary IV.3.7 yields the two other formulae.
Method 2. Use the formula giving the law of N_j in the preceding proposition to compute u_{ij} = E_i(N_j).
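Numerically (a sketch with our own hypothetical chain): truncating the series U = Σ_{n≥0} M^n shows the dichotomy of (IV.39) — the diagonal entry u_jj of a transient state stabilizes at (1 − f_jj)^{−1}, while that of a recurrent state diverges.

```python
import numpy as np

M = np.array([[0.5, 0.5],               # state 0 leaks to the absorbing state 1
              [0.0, 1.0]])
U = np.zeros((2, 2))
P = np.eye(2)
for _ in range(200):                    # truncation of U = sum_{n>=0} M^n
    U += P
    P = P @ M
# f_00 = 1/2, so u_00 = 1/(1 - f_00) = 2 by (IV.39): state 0 is transient.
print(U[0, 0])                          # ~ 2 (stable as the truncation grows)
print(U[0, 1])                          # grows without bound: state 1 is recurrent
```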
IV.3.11 Proposition. (i) i → j if and only if u_{ij} > 0, or equivalently, if and only if there exists n ≥ 0 such that m_{ij}^{(n)} > 0.
(ii) The binary relation → defined on E is reflexive and transitive (a preorder).
(iii) The binary relation ∼ defined on E is an equivalence relation.
Proof. (i) If i = j then u_{ii} ≥ m_{ii}^{(0)} = 1. Let i ≠ j and P_i(σ_j < +∞) > 0. Then for some n ≥ 1, P_i(σ_j = n) > 0. Thus u_{ij} ≥ m_{ij}^{(n)} = P_i(X_n = j) ≥ P_i(σ_j = n) > 0. Conversely, if u_{ij} > 0 and i ≠ j, then for some n ≥ 1, m_{ij}^{(n)} > 0. Thus P_i(σ_j < +∞) ≥ P_i(σ_j ≤ n) ≥ P_i(X_n = j) = m_{ij}^{(n)} > 0.
(ii) If i → j and j → k, then for some n, p ≥ 0, m_{ij}^{(n)} > 0 and m_{jk}^{(p)} > 0. It follows that m_{ik}^{(n+p)} = Σ_{ℓ∈E} m_{iℓ}^{(n)} m_{ℓk}^{(p)} ≥ m_{ij}^{(n)} m_{jk}^{(p)} > 0. Thus i → k.
(iii) The binary relation ∼ is clearly reflexive, symmetric and transitive.
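The relations → and ∼ can be computed mechanically from the support of M by transitive closure; a sketch (our helper, not from the notes):

```python
import numpy as np

def communication_classes(M):
    n = len(M)
    reach = ((M > 0) | np.eye(n, dtype=bool)).astype(int)   # one-step i -> j, plus i -> i
    for _ in range(n):
        reach = ((reach @ reach) > 0).astype(int)           # transitive closure by squaring
    mutual = (reach > 0) & (reach.T > 0)                    # i ~ j iff i -> j and j -> i
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            cls = {j for j in range(n) if mutual[i, j]}
            classes.append(cls)
            seen |= cls
    return classes

M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.3, 0.3, 0.4]])
print(communication_classes(M))          # [{0, 1}, {2}]
```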
IV.3.15 Definition. A state i is said to be recurrent if Pi (σi < +∞) = 1 and transient
otherwise.
Proof. For any i, j: (σ_j < +∞) ∩ (σ_i ∘ θ_{σ_j} = +∞) ⊂ (N_i < +∞). By the strong Markov property, P_i[(σ_j < +∞) ∩ (σ_i ∘ θ_{σ_j} = +∞)] = P_i(σ_j < +∞) · P_j(σ_i = +∞). Thus f_{ij}(1 − f_{ji}) ≤ P_i(N_i < +∞).
(i) If i → j and i is recurrent, we have f_{ij} > 0 and P_i(N_i < +∞) = 0, therefore 1 − f_{ji} = 0; consequently j → i and finally i ∼ j. In view of IV.3.16 we conclude that j is recurrent. Now since j is recurrent and j → i, by symmetry f_{ij} = 1 and, in view of Proposition IV.3.9, u_{ij} = +∞. Now in view of corollary IV.3.7, f_{ij}^{(n)} = f_{ij}(f_{jj})^{n−1} = 1, therefore P_i(N_j = +∞) = P_i(∩_n(σ_j^n < +∞)) = lim_{n→+∞} f_{ij}^{(n)} = 1.
(ii) If j is transient then f_{jj} < 1. In view of corollary IV.3.7, f_{ij}^{(n)} = f_{ij}(f_{jj})^{n−1}, therefore P_i(N_j = +∞) = P_i(∩_n(σ_j^n < +∞)) = lim_{n→+∞} f_{ij}^{(n)} = 0. On the other hand, in view of (IV.38) and (IV.39), u_{jj} < +∞ and u_{ij} < +∞.
It follows from Proposition IV.3.17 (i) that any recurrent communication class is a maximal element in the partially ordered set (C, →); the converse is not true, since there exist irreducible transient chains. However, one has the following in the case where the number of states is finite:
IV.3.19 Theorem. Any Markov chain with a finite number of states has at least one
recurrent state. A communication class is recurrent if and only if it is a maximal
element of the p.o.s. (C , →).
Proof. Σ_{j∈E} u_{ij} = Σ_{j∈E} E_i(N_j) = Σ_{j∈E} Σ_{n≥0} E_i(1_{{j}} ∘ X_n) = Σ_{n≥0} 1 = +∞. Therefore there exists some j such that u_{ij} = +∞. In view of (IV.38), u_{ij} = δ_{ij} + f_{ij} u_{jj}; therefore u_{jj} = +∞ and we conclude that j is a recurrent state. Now if C is a maximal class, then the chain restricted to C is irreducible and has finitely many states. It follows from the first part of the proof that this chain is recurrent and therefore, in view of remark IV.3.18, that C is recurrent.
If the set E is not finite a recurrent state may not exist.
Assume that i is a recurrent state; then P_i(σ_i < +∞) = 1. It follows that θ_{σ_i^k} is defined, for all k ≥ 1, on a set of P_i-measure 1. In the following we prove that F_{σ_i}, θ_{σ_i^1}^{−1}(F_{σ_i}), …, θ_{σ_i^n}^{−1}(F_{σ_i}) are independent under P_i. Precisely:
where we used the fact that Y_0 is F_{σ_i}-measurable, the strong Markov property, and the fact that X_{σ_i} = i. By induction we have that this expectation is equal to E_i[f_0(Y_0)] ⋯ E_i[f_n(Y_n)].
µM = µ (IV.42)
and excessive if :
µM ≤ µ (IV.43)
Mf = f (IV.44)
and superharmonic if :
Mf ≤ f (IV.45)
Proof. The first assertion is obtained by induction on n. For the second, apply the first part and the equality E(f(X_{n+1})|F_n) = Mf(X_n). For the third assertion, remark that E_µ(f(X_0)) = µ(f).
Let X be any Markov chain, with matrix M , then it is clear that the constant function
f : f (x) = 1 for all x ∈ E, is harmonic.
IV.3.25 Theorem. Let X be an irreducible recurrent Markov chain. Then any super-
harmonic function is constant.
closes it on the right as a supermartingale (corollary III.2.14), for any a.s. finite stopping
time τ , the optional sampling theorem III.1.4 applies thus: Eµ (f (Xτ )|F0 ) ≤ f (X0 )
Pµ -a.s. Take i, j ∈ E. Since the chain is irreducible recurrent, Pi (σj < +∞) = 1
and we can take τ = σj so that Xσj = j Pi -a.s. Thus we have f (j) = Ei (f (j)|F0 ) =
Ei (f (Xσj )|F0 ) ≤ f (X0 ) = f (i) Pi -a.s. By symmetry f (i) = f (j).
Second proof, not relying on martingales. (1) For any Markov chain, if (f_i, i ∈ I) is a family of superharmonic functions, then ∧_{i∈I} f_i is superharmonic. Indeed if f = ∧_{i∈I} f_i, then for any j ∈ I, f_j ≥ Mf_j ≥ Mf, so that f = inf_j f_j ≥ Mf.
(2) For any irreducible Markov chain, if f is harmonic and f has a maximum, then f is constant (the maximum principle). Indeed, let i_0 ∈ E be such that f(i_0) = max_{i∈E} f(i). For all n ≥ 0 one has 0 = Σ_{j∈E} m_{i_0 j}^{(n)} (f(i_0) − f(j)). Since all the terms of the sum are nonnegative, it follows that m_{i_0 j}^{(n)} (f(i_0) − f(j)) = 0. By irreducibility, for any j ∈ E there exists n ≥ 0 such that m_{i_0 j}^{(n)} > 0. We conclude that f(j) = f(i_0).
(3) For any irreducible recurrent Markov chain, any superharmonic function is harmonic. Let f be superharmonic. Put g := f − Mf and for n ≥ 1 let U^{(n)} = Σ_{k=0}^n M^k. Then U^{(n)} g = f − M^{n+1} f. If for some j ∈ E, g(j) > 0, then for any i ∈ E: u_{ij}^{(n)} g(j) ≤ U^{(n)} g(i) ≤ f(i), so that u_{ij}^{(n)} ≤ f(i)/g(j), contradicting the fact that lim_{n→∞} u_{ij}^{(n)} = u_{ij} = +∞ (recurrence). We conclude that g(j) = 0 for all j.
End of the proof. Let f be superharmonic and let i and j be in E. In view of (1), and since a constant function is harmonic, inf{f, f(i)} is superharmonic, and in view of (3), inf{f, f(i)} is harmonic. Since the latter has a maximum at i, it follows from (2) that min{f(j), f(i)} = f(i). Permuting the roles of i and j, we obtain that f(j) = min{f(j), f(i)} = f(i). Therefore f is constant.
IV.3.28 Lemma. Let X be an irreducible recurrent Markov chain. Then any non
trivial excessive measure µ is such that: 0 < µ(i) < +∞ for all i ∈ E.
û_{ij} = (λ(j)/λ(i)) u_{ji}  (IV.50)
so that the chain with matrix M̂ is irreducible and recurrent. The last assertion can
be easily checked.
IV.3.30 Theorem. Let X be an irreducible recurrent Markov chain, let i ∈ E, and let
λi be defined by (IV.46). Then any excessive measure µ is proportional to λi and as
such is invariant.
48 CHAPTER IV. MARKOV CHAINS APRIL 3, 2021
Proof. Let µ be an excessive measure of M and let i ∈ E. Since the measure λ_i is nontrivial, we can define, as in Lemma IV.3.29, µ̂ by µ̂(j) = µ(j)/λ_i(j). Then µ̂ is a superharmonic function for M̂. Since M̂ is the transition matrix of an irreducible recurrent Markov chain, theorem IV.3.25 implies that µ̂ is constant. That is, µ = cλ_i for some c ∈ [0, +∞].
IV.3.32 Theorem. Let X be an irreducible recurrent Markov chain, and for any i ∈ E
let λi be defined by (IV.46). Then the set of all nontrivial invariant measures of X is
precisely the set: {cλi | 0 < c < +∞}
Proof. Fix i0 ∈ E. By the preceding theorem any nontrivial invariant measure is of the
form µ = cλi0 where 0 < c < ∞. In particular, it follows that for any i ∈ E we have
λi = cλi0 where 0 < c < ∞. But for the invariant measure λi , one has λi (E) = Ei (σi ).
Thus two cases are possible: either all nontrivial invariant measures have a finite total weight (case 1), or all nontrivial invariant measures have an infinite total weight (case 2).
IV.3.34 Remark. In case (i) the unique invariant probability π verifies π(i) = 1/E_i(σ_i) (i ∈ E). Indeed there exists 0 < c_i < +∞ such that π = c_i λ_i. Since λ_i(i) = 1 and π(E) = 1, one has π(i) = π(i)/λ_i(i) = π(E)/λ_i(E) = 1/E_i(σ_i).
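The identity π(i) = 1/E_i(σ_i) can be checked by simulation (hypothetical 2-state positive chain, our own sketch): we solve πM = π by an eigenvector computation and compare with the empirical mean return time.

```python
import numpy as np

rng = np.random.default_rng(5)
M = np.array([[0.2, 0.8],
              [0.5, 0.5]])
w, V = np.linalg.eig(M.T)                # invariant probability: pi M = pi
pi = np.real(V[:, np.argmin(abs(w - 1))])
pi = pi / pi.sum()

def return_time(i):
    """One sample of sigma_i under P_i."""
    x, n = i, 0
    while True:
        x = rng.choice(2, p=M[x])
        n += 1
        if x == i:
            return n

L0 = np.mean([return_time(0) for _ in range(20_000)])   # estimate of E_0(sigma_0)
print(pi[0], 1 / L0)                                    # the two should agree
```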
In this subsection X is a Markov chain. Taking into account the classification stated
in corollary IV.3.33 for irreducible recurrent chains, we introduce the following:
Proof. If i is a transient state then Pi (σi = +∞) > 0 and consequently Ei (σi ) = +∞.
IV.3.37 Proposition. In the same communication class either all the states are pos-
itive or all the states are null.
Proof. If the class is transient all the states of the class are null. If the class is recurrent,
say C, then the positivity or the nullity of a state i ∈ C does not change if we consider
the restricted chain MC , and in view of corollary IV.3.33 the states of C are either all
positive or all null.
IV.3.39 Remark. The lemma implies that if µ(i) > 0, where µ is a bounded (that is, µ(E) < +∞) invariant measure, then this state is positive, hence recurrent. The condition of boundedness on the measure µ cannot be removed: if µ is invariant but not bounded, then a transient state can have a strictly positive weight. In Example IV.3.20, for the Markov chain with E = Z and transition matrix m_{ij} = 1 if j = i + 1 and m_{ij} = 0 if j ≠ i + 1, the measure µ such that µ(i) = 1 for all i ∈ E is invariant. Indeed, for any j ∈ E, µM(j) = Σ_{i∈Z} µ(i) m_{ij} = µ(j − 1) = 1.
IV.3.40 Corollary. Any Markov chain that has an invariant probability has a positive
class.
The following refines the result of Theorem IV.3.19 in the finite case:
IV.3.41 Corollary. Any Markov chain with finitely many states has at least one in-
variant probability hence at least one positive class.
IV.3.42 Proposition. Let X be a Markov chain, let A be the set of all positive classes,
and for any α ∈ A denote by γα the unique invariant probability measure of α extended
by 0 on E \ α, and by ∆(A) the set of all probabilities on A. The set of all invariant
probability measures Γ of X is given by the following:
Γ = { Σ_{α∈A} δ_α γ_α : δ ∈ ∆(A) }  (IV.51)
In particular X has an invariant probability if and only if it has some positive class. It
has a unique invariant probability if and only if it has exactly one positive class.
Proof. Let f ∈ L¹(λ_i) and assume first that f is nonnegative. Let Z_0 = Σ_{0≤k<σ_i} f(X_k) and, for n ≥ 1, Z_n = Σ_{σ_i^n ≤ k < σ_i^{n+1}} f(X_k). Then by corollary IV.3.22, Z_n = Z_0 ∘ θ_{σ_i^n} and, under P_i, the sequence Z_0, …, Z_n, … is i.i.d. Note that E_i(Z_0) = λ_i(f). Applying the strong law of large numbers, we have:
(1/n)(Z_0 + ⋯ + Z_{n−1}) →_{n→+∞} λ_i(f)  P_i-a.s.  (IV.53)
For any m ∈ N let ν(m) = inf{n ∈ N | m < σ_i^n}. Then ν(m) = Σ_{0≤k≤m} 1_{{i}}(X_k) P_i-a.s. Moreover we have σ_i^{ν(m)−1} ≤ m < σ_i^{ν(m)} and ν(m) →_{m→+∞} +∞. Now Z_0 + ⋯ + Z_{n−1} = Σ_{0≤k<σ_i^n} f(X_k), so that:
Σ_{0≤k<σ_i^{ν(m)−1}} f(X_k) ≤ Σ_{0≤k≤m} f(X_k) ≤ Σ_{0≤k<σ_i^{ν(m)}} f(X_k)  (IV.54)
Therefore:
((ν(m)−1)/ν(m)) · (Σ_{0≤k<ν(m)−1} Z_k)/(ν(m)−1) ≤ (Σ_{0≤k≤m} f(X_k))/(Σ_{0≤k≤m} 1_{{i}}(X_k)) ≤ (Σ_{0≤k<ν(m)} Z_k)/ν(m)  (IV.55)
If f ∈ L¹(λ_i), write f = f₊ − f₋ and apply the preceding result to f₊ and f₋ separately. Since a similar limit can be obtained for g, and since λ_i(g) ≠ 0, (IV.52) results from writing the limit of the quotient as a quotient of the limits.
The following is an easy consequence of Theorem IV.4.1 and Corollary IV.3.31.
(Σ_{0≤k≤n} f(X_k)) / (Σ_{0≤k≤n} g(X_k)) →_{n→∞} λ(f)/λ(g)  P_µ-a.s.  (IV.57)
IV.4.3 Theorem. Let X be an irreducible recurrent Markov chain, and let µ be the
initial distribution.
(i) If X is positive and π is the invariant probability, then for all f ∈ L¹(π):
(1/n) Σ_{0≤k<n} f(X_k) →_{n→∞} π(f)  P_µ-a.s.  (IV.58)
(ii) If X is null and λ is a nontrivial invariant measure, then for all f ∈ L¹(λ):
(1/n) Σ_{0≤k<n} f(X_k) →_{n→∞} 0  P_µ-a.s.  (IV.59)
Proof. (i) Since X is positive, relation (IV.58) is a particular case of theorem IV.4.2 where g is the constant 1 on E. (ii) Let f be nonnegative. For any finite F ⊂ E, 1_F ∈ L¹(λ), so that, in view of theorem IV.4.2:
(1/n) Σ_{0≤k<n} f(X_k) ≤ (Σ_{0≤k<n} f(X_k)) / (Σ_{0≤k<n} 1_F(X_k)) →_{n→∞} λ(f)/λ(F)  P_µ-a.s.
Hence:
lim sup_{n→∞} (1/n) Σ_{0≤k<n} f(X_k) ≤ λ(f)/λ(F)
Since λ(f) is finite and λ(F) can be chosen as large as desired, lim_{n→∞} (1/n) Σ_{0≤k<n} f(X_k) = 0.
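An empirical check of (IV.58) (our sketch, hypothetical positive 2-state chain): the time average of f along a single long trajectory approaches π(f).

```python
import numpy as np

rng = np.random.default_rng(6)
M = np.array([[0.2, 0.8],
              [0.5, 0.5]])
w, V = np.linalg.eig(M.T)
pi = np.real(V[:, np.argmin(abs(w - 1))])
pi = pi / pi.sum()

f = np.array([1.0, -3.0])               # any f (E is finite, so f is in L1(pi))
x, total, n = 0, 0.0, 200_000
for _ in range(n):
    total += f[x]
    x = rng.choice(2, p=M[x])
print(total / n, pi @ f)                # time average ~ pi(f)
```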
IV.5 Periodicity
IV.5.1 Definition. Let i ∈ E. The period d_i of i is the GCD of the set D_i := {n ≥ 1 | m_{ii}^{(n)} > 0}. By convention d_i = +∞ if D_i = ∅. When d_i = 1 the state i is said to be aperiodic.
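The period can be computed directly from the definition by scanning the powers of M up to a horizon; a sketch (the horizon and example are ours):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(M, i, horizon=200):
    """GCD of {1 <= n <= horizon : M^n(i,i) > 0}; inf if that set is empty."""
    D, P = [], np.eye(len(M))
    for n in range(1, horizon + 1):
        P = P @ M
        if P[i, i] > 0:
            D.append(n)
    return reduce(gcd, D) if D else float("inf")

M = np.array([[0.0, 1.0],               # deterministic 2-cycle
              [1.0, 0.0]])
print(period(M, 0))                     # 2
```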
(i) For any k_0, k ∈ {0, …, d−1} and any i_0 ∈ C_{k_0}, the chain under P_{i_0} visits C_k only at times n such that n ≡ k − k_0 (mod d).
(ii) M^d is the transition matrix of a Markov chain for which C_0, …, C_{d−1} are communication classes and for which a chain starting from i_0 ∈ C_{k_0} never visits C_ℓ unless ℓ = k_0. Moreover, if i_0 is recurrent for M then it is recurrent for M^d, and if it is transient for M it is transient for M^d.
Proof. Fix i_0 ∈ C arbitrary. Let j ∈ C. There exists m such that m_{j i_0}^{(m)} > 0. Let s_j be the unique integer in {1, …, d} such that m ≡ s_j (mod d) and let r_j = d − s_j. Let D_{i_0 j} := {n ≥ 0 | m_{i_0 j}^{(n)} > 0}. For any n ∈ D_{i_0 j} one has m_{i_0 i_0}^{(m+n)} > 0, so that d divides m + n, or equivalently m + n ≡ 0 (mod d). Thus D_{i_0 j} is included in the set {r_j + kd : k ∈ N}. Let C_r be the set of all j such that D_{i_0 j} ⊂ r + dN. The C_r are disjoint and their union is C. The only point that remains to be proved is that each C_r is nonempty. i_0 ∈ C_0. Assume that d > 1. Since there exist q ≥ 1 and i_0, i_1, …, i_{qd−1} such that i_0 → i_1 → ⋯ → i_{qd−1} → i_0, it is clear that i_k ∈ C_k for k = 1, …, d − 1. This ends the proof of (i). Now the set {n ≥ 1 | m_{i_0 i_0}^{(dn)} > 0} has GCD 1, so that the period of i_0 for M^d is 1. Now it is clear from (i) that the chain with matrix M^d starting from i_0 ∈ C_{k_0} never visits C_ℓ with ℓ ≠ k_0. Moreover Σ_{n≥0} m_{i_0 i_0}^{(dn)} = Σ_{n≥0} m_{i_0 i_0}^{(n)}, so that if i_0 is recurrent for M it is recurrent for M^d, and if it is transient for M it is transient for M^d.
Proof. The first equality is a trivial consequence of the definitions and notations of IV.3.1. We prove the second. Since (X_n = j) ∩ (σ_j = k) = ∅ if k > n ≥ 1, we have:
(X_n = j) = ∪_{k∈N*} (X_n = j) ∩ (σ_j = k) = ∪_{k=1}^n (X_n = j) ∩ (σ_j = k).
Thus we have the following equalities:
P_i(X_n = j) = Σ_{k=1}^n E_i(1_{(σ_j=k)} 1_{(X_n=j)})
= Σ_{k=1}^n E_i(E_i^{F_k}[1_{(σ_j=k)} 1_{(X_n=j)}])
GCD A_n, d′_n = GCD A′_n. By induction we prove d′_n = d_n for all n ≥ 1. For n = 1 the equality is straightforward. We distinguish 3 cases. Case (1): s_{ii}^{n+1} > 0; then in view of formula (IV.61), m_{ii}^{(n+1)} > 0, so that d′_{n+1} = GCD{d′_n, n+1} = GCD{d_n, n+1} = d_{n+1}. Case (2): s_{ii}^{n+1} = 0 and m_{ii}^{(n+1)} = 0; then d′_{n+1} = d′_n = d_n = d_{n+1}. Case (3): s_{ii}^{n+1} = 0 and m_{ii}^{(n+1)} > 0; in view of (IV.61), 0 < m_{ii}^{(n+1)} = Σ_{k=1}^n s_{ii}^k m_{ii}^{(n+1−k)}, so that for some 1 ≤ k ≤ n, s_{ii}^k m_{ii}^{(n+1−k)} > 0. It follows that n − k + 1 ∈ A_n and k ∈ A′_n. By the induction hypothesis, d_n divides n − k + 1 and k, therefore d_n divides (n − k + 1) + k = n + 1. Thus d′_{n+1} = d′_n = d_n = d_{n+1}. Now d = lim_{n→∞} d_n = lim_{n→∞} d′_n = d′.
Subclaim 1. For any q such that s_q > 0, lim_t u_{n_t−q} = λ. Passing to the limit in the equality
u_{n_t} = s_q u_{n_t−q} + Σ_{1≤k≤n_t, k≠q} s_k u_{n_t−k}
yields
λ = lim_t u_{n_t} ≤ s_q lim inf_t u_{n_t−q} + lim sup_t Σ_{1≤k≤n_t, k≠q} s_k u_{n_t−k}
The second term of the RHS can be considered as the limsup, when t goes to +∞, of the integral of the function g_t defined on N* by g_t(k) = u_{n_t−k} if 1 ≤ k ≤ n_t, k ≠ q, and g_t(k) = 0 otherwise, the integral being w.r.t. the probability measure (s_n) on N*. Since lim sup_t g_t(k) ≤ λ if k ≠ q and lim sup_t g_t(q) = 0, by the Fatou lemma the limsup of the integral is less than or equal to (1 − s_q)λ. Therefore λ ≤ s_q lim inf_t u_{n_t−q} + (1 − s_q)λ, and since s_q > 0 we have λ ≤ lim inf_t u_{n_t−q}. This ends the proof of subclaim 1.
Subclaim 2. For any q = α_1 q_1 + ⋯ + α_ℓ q_ℓ, where α_1, …, α_ℓ ∈ N and s_{q_1}, …, s_{q_ℓ} > 0, one has lim_t u_{n_t−q} = λ. The proof is straightforward by induction.
Claim 1 is thus obtained as a consequence of lemma IV.6.3.
Claim 2. Put ℓ_i = Σ_{k=i+1}^∞ s_k (i ≥ 0) and δ_n = Σ_{i=0}^n ℓ_i u_{n−i} (n ≥ 0). Then δ_n = δ_{n−1} = ⋯ = δ_0 = 1 and Σ_{i=0}^∞ ℓ_i = Σ_{n=1}^∞ n s_n.
The proof of this claim is standard.
End of the proof of the lemma. For any t, δ_{n_t} ≡ Σ_{i∈N} ℓ_i g_t(i) is viewed as an integral of the function defined on N by g_t(i) = u_{n_t−i} 1_{{i : i ≤ n_t}} with respect to the nonnegative measure ℓ = (ℓ_i) on N. We distinguish two cases:
If Σ_{i=0}^∞ ℓ_i < +∞, then the measure ℓ is bounded. Since 0 ≤ g_t ≤ 1 for all t, and since in view of claim 1 the sequence g_t(i) goes to λ when t goes to +∞, the dominated convergence theorem implies that δ_{n_t} goes to λ Σ_{i∈N} ℓ_i when t goes to ∞. Since δ_{n_t} = 1 for all t, we conclude that λ Σ_{i∈N} ℓ_i = 1.
If Σ_{i=0}^∞ ℓ_i = +∞, by the Fatou lemma we obtain λ Σ_{i∈N} ℓ_i ≤ 1, so that λ = 0.
Last argument: replacing limsup by liminf at the beginning of the proof leads to the same conclusion. Since they are both equal to (Σ_{n=1}^∞ n s_n)^{−1}, the proof is complete.
lim_{n→∞} m_{jj}^{(n d_j)} = d_j / L_j  (IV.63)
(with the convention 1/∞ = 0, and notation L_j = E_j(σ_j))
(g_n(k))_{n≥0} converges to 1/L_j when n goes to infinity. The dominated convergence theorem yields: lim_{n→∞} m_{ij}^{(n)} = lim_{n→∞} Σ_{k∈N} s_{ij}^k g_n(k) = Σ_{k∈N} s_{ij}^k (1/L_j) = (1/L_j) Σ_{k∈N} s_{ij}^k. In view of (IV.60), lim_{n→∞} m_{ij}^{(n)} = f_{ij}/L_j.
lim_{n→∞} m_{ij}^{(n)} = 1/L_j  (IV.65)
(with the convention 1/∞ = 0, and notation L_j = E_j(σ_j)). In particular:
If the chain is positive, then the limit of M^n as n goes to infinity is a matrix with equal rows, each row being equal to π, the invariant measure of M.
If the chain is null, then the limit of M^n as n goes to infinity is the null matrix.
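This can be observed numerically (our sketch, hypothetical positive aperiodic chain): a moderate power of M already has both rows close to the invariant probability π.

```python
import numpy as np

M = np.array([[0.2, 0.8],
              [0.5, 0.5]])
print(np.linalg.matrix_power(M, 50))    # both rows ~ pi = (0.3846..., 0.6153...)
```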
In the following we gather the results concerning the asymptotic behavior of the
sequence M n .
lim_{n→∞} m_{ij}^{(n d_j + r)} = (d_j / L_j) f_{ij}(r)  (IV.66)
where f_{ij}(r) = Σ_{k≥0} P_i(σ_j = k d_j + r).
In particular, if i ∼ j there exists r_{ij} ∈ {0, …, d_j − 1} such that:
lim_{n→∞} m_{ij}^{(n d_j + r)} = d_j/L_j if r = r_{ij};  = 0 if r ≠ r_{ij}  (IV.67)
(with the convention 1/∞ = 0, and notation L_j = E_j(σ_j))
Proof. If j is transient, then u_{ij} = Σ_{n≥0} m_{ij}^{(n)} < +∞, therefore lim_{n→∞} m_{ij}^{(n)} = 0. If j is recurrent, then by modifying the proof of (IV.64) one can establish, in a similar way, that m_{ij}^{(n d_j + r)} = Σ_{k≥0} 1_{{k≤n}} s_{ij}^{k d_j + r} m_{jj}^{((n−k) d_j)} and that the limit when n → ∞ is equal to (d_j/L_j) Σ_{k≥0} s_{ij}^{k d_j + r} = (d_j/L_j) f_{ij}(r).
Appendix A
Reminder on measure theory
The pair (Ω, A ) where Ω is a set and A is a σ-field on Ω, is called a measurable space.
If A is a σ-field then:
(1) Ω ∈ A ,
(2) If A = ∩n≥0 An and if An ∈ A (n = 0, 1, . . .) then A ∈ A ,
(3) If A = ∪_{n=1}^k A_n and if A_n ∈ A (n = 1, …, k) then A ∈ A.
(1) P(Ω) is a σ-field, {∅, Ω} is a σ-field, the empty collection is not a σ-field.
(3) To any collection C ⊂ P(Ω) we associate the collection σ(C) defined as the intersection of all σ-fields containing C. σ(C) is then a σ-field. This is the smallest σ-field containing C. It will be called the σ-field generated by C.
(4) Let (Ω, τ) be a topological space, where τ is the collection of open sets. The Borel σ-field is the σ-field generated by τ, that is, σ(τ).
(5) R ≡ (−∞, +∞) will generally be endowed with its Borel σ-field B_R associated to its usual topology. B_R is generated by the collection of all open intervals, by the collection of all closed intervals, by the collection of intervals (−∞, α) where α ∈ Q, etc.
(6) R̄ ≡ [−∞, +∞] will be endowed with the σ-field generated by the family [−∞, α) (resp. [−∞, α], (α, +∞], [α, +∞]) where α ∈ R (or α ∈ Q). This is precisely the Borel σ-field of [−∞, +∞] when endowed with its usual topology as the two point compactification of R. If τ denotes the collection of open sets of R, then the collection of open sets of the topology of R̄ is the union of τ with the two collections [−∞, α) and (β, +∞], α, β ∈ R.
A.1.3 Definition. Let (E, E ) and (F, F ) be two measurable spaces. A map f : E → F
is E /F -measurable if f −1 (A) ∈ E for any A ∈ F .
(2) Let (E, E), (F, F), (G, G) be measurable spaces. If f : (E, E) → (F, F) and g : (F, F) → (G, G) are measurable, then so is g ◦ f : (E, E) → (G, G).
Therefore we have:
f = f^+ − f^−   (A.3)
|f| = f^+ + f^−   (A.4)
(lim inf_{n→+∞} f_n)(ω) := lim inf_{n→+∞} f_n(ω) = sup_{n≥0} inf_{m≥n} f_m(ω)   (A.10)
(i) There exists a sequence (sn ) of simple measurable functions such that: sn (ω) →
f (ω) when n → +∞, for every ω ∈ Ω
(ii) If f ≥ 0 then the sequence (sn ) may be chosen to be positive and increasing:
0 ≤ s0 ≤ s1 ≤ · · · ≤ f
(iii) If f is bounded (that is, |f| ≤ M for some M ∈ R_+), then the sequence (s_n) can be chosen to be increasing and uniformly convergent.
(i) µ(∅) = 0,
Note that in the definition, requirement (ii) is equivalent to the two following conditions:
(iia) µ(∪_{k=1}^n A_k) = Σ_{k=1}^n µ(A_k) for any finite family (A_i ∈ A, i = 1, . . . , n) of pairwise disjoint sets (A_i ∩ A_j = ∅ for i ≠ j),
Moreover:
Arithmetic in [0, +∞]. Throughout this chapter we adopt the following rules:
x + (+∞) = (+∞) + x = +∞ for all x ∈ [0, +∞]
0 · x = x · 0 = 0 for all x ∈ [0, +∞]
(+∞) · x = x · (+∞) = +∞ if x > 0
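These conventions make expressions such as Σ_i a_i µ(A_i) in definition A.2.2 below unambiguous: a term with a_i = 0 contributes 0 even when µ(A_i) = +∞.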
A.2.2 Definition. Let (Ω, A, µ) be a positive measure space. For any E ∈ A and any simple function s of the form:
s = Σ_{i=1}^n a_i 1_{A_i}   (A.12)
where (a_1, . . . , a_n) are distinct in [0, +∞) and (A_1, . . . , A_n) is a partition of Ω, the integral of s on E with respect to µ is the extended positive number (i.e. in [0, +∞]):
∫_E s dµ = Σ_{i=1}^n a_i µ(A_i ∩ E)   (A.13)
For any measurable f : Ω → [0, ∞], the integral of f on E with respect to µ is the extended positive number:
∫_E f dµ = sup { ∫_E s dµ : s simple, s ≤ f }   (A.14)
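For example, on (R, B_R) with µ the Lebesgue measure, the simple function s = 2 · 1_{[0,1]} + 5 · 1_{(1,3]} (with value 0 outside [0, 3]) satisfies
∫_R s dµ = 2 µ([0, 1]) + 5 µ((1, 3]) + 0 · µ(R \ [0, 3]) = 2 + 10 + 0 = 12,
the last term being 0 by the convention 0 · (+∞) = 0.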
Then
∫ f dµ = Σ_{n=0}^∞ ∫ f_n dµ   (A.17)
∫ (lim inf_{n→∞} f_n) dµ ≤ lim inf_{n→∞} ∫ f_n dµ   (A.1)
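The inequality may be strict: on (R, B_R) with the Lebesgue measure, take f_n = 1_{[n,n+1]}. Then lim inf_{n→∞} f_n = 0 pointwise, so the left-hand side is 0, whereas ∫ f_n dµ = 1 for all n, so the right-hand side is 1.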
∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ   (A.18)
We denote by L 1 (µ) the set of all measurable functions f : Ω → R that are integrable.
A.2.8 Proposition (dominated convergence). Let (f_n) be a sequence of measurable functions such that:
(1) f_n → f µ-a.e.,
(2) |f_n| ≤ g µ-a.e. for all n, where g ∈ L¹(µ).
Then:
(3) f ∈ L¹(µ),
(4) ∫ |f_n − f| dµ → 0 as n → ∞,
and
(5) ∫ f_n dµ → ∫ f dµ as n → ∞.
A property that depends measurably on ω is said to hold almost everywhere if the set of all ω where the property fails has µ-measure 0. This applies in particular to measurable functions.
A positive measure µ is bounded (or finite) if µ(Ω) < +∞. µ is σ-finite if there exists
an increasing sequence A1 ⊂ A2 ⊂ · · · in A such that µ(An ) < +∞ and ∪n An = Ω.
A.2.13 Remark. The main statement of the Radon-Nikodym theorem remains true if µ and λ are both assumed to be σ-finite positive measures; note however that in this case the function f that is obtained will only be “locally integrable”; that means precisely that there exists an increasing sequence A_1 ⊂ A_2 ⊂ · · · such that µ(A_n) < +∞, ∪_n A_n = Ω, and ∫_{A_n} f dµ < +∞.
A.2.4 Lp spaces
Appendix B
Monotone class
Since the intersection of any family of monotone classes is a monotone class, and since the set of all subsets of Ω is a monotone class, the intersection of all monotone classes containing some set B of subsets of Ω (a class) is a monotone class. This is the smallest monotone class containing B and it will be denoted B^m.
B.1.2 Definition. A class I is said to be stable by finite intersection (or a π-system) if it has the following property:
(i) If A ∈ I and B ∈ I, then A ∩ B ∈ I.
B.1.3 Theorem (monotone class theorem for sets). Let M be a monotone class and let I be a class that is stable by finite intersection and contains Ω. If M contains I then M contains σ(I).
Proof. We are going to prove that I^m, the smallest monotone class containing I, is stable by finite intersection, thus showing that I^m is a σ-field. Since I ⊂ I^m ⊂ M, this will prove the theorem.
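A typical application of theorem B.1.3 is the following uniqueness result: if two probability measures P and Q on (Ω, A) coincide on a class I that is stable by finite intersection, contains Ω and generates A, then P = Q. Indeed, M = {A ∈ A : P(A) = Q(A)} is a monotone class (by monotone continuity of P and Q) which contains I, hence M ⊃ σ(I) = A.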
B.2.3 Theorem (Monotone Class theorem - functional form). Let H be a vector space
of bounded functions defined on Ω that is stable by bounded monotone convergence and
let C be a set of real functions stable by multiplication and containing the constant
function 1. If H contains C then H contains all bounded σ(C )-measurable functions.
B.2.4 Lemma. Let H0 be a vector space of bounded real functions that is stable by
bounded monotone convergence, by multiplication and containing the constant 1. Then
H0 is the set of bounded σ(H0 )-measurable functions.
H2 = H0 .
We conclude from (3) that H0 is stable by multiplication.
Appendix C
Transition probability
C.0.1 Definition. A kernel from the measurable space (E, E) to the measurable space (F, F) is a mapping N : E × F → [0, +∞] such that:
(i) For all x ∈ E, N(x, ·) is a nonnegative measure on (F, F),
(ii) For all B ∈ F, N(·, B) is E-measurable.
We see that a kernel is simply a family of nonnegative measures (N_x, x ∈ E) with the additional requirement of measurability in the sense of property (ii). A kernel is also called a transition measure. The kernel is said to be sub-Markov (resp. Markov) if N(x, F) ≤ 1 (resp. N(x, F) = 1) for all x ∈ E. A Markov kernel is also called a transition probability. The measure N(x, ·) is also denoted N_x(·).
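For example, when E = F is a countable set and E = F = P(E), a kernel from E to E is determined by the numbers N(i, {j}) = m_{ij} ≥ 0, and N is a Markov kernel exactly when M = (m_{ij}) is a stochastic (Markov) matrix, each row summing to 1.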
C.1.1 Proposition. Let N be a transition probability from (E, E) to (F, F) and let µ be a probability measure on (E, E).
1) There exists a probability measure Q on (E × F, E ⊗ F) such that for all A ∈ E, B ∈ F:
Q(A × B) = ∫_A N(x, B) µ(dx)   (C.1)
2) Let f : E × F → R be E ⊗ F-measurable.
(i) If f ≥ 0, then the map x ↦ ∫_F f(x, y) N_x(dy) is E-measurable and ≥ 0, and we have the equality:
∫ f d(µ ⋉ N) = ∫_E [∫_F f(x, y) N_x(dy)] µ(dx)   (C.2)
(ii) f is µ ⋉ N-integrable if and only if the map x ↦ ∫_F |f(x, y)| N_x(dy) is µ-integrable, and in this case equality (C.2) holds.
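In the countable setting of the preceding example, with µ a probability on E, the measure Q of proposition C.1.1 is given by Q({i} × {j}) = µ({i}) m_{ij}; when µ is the initial distribution and M the transition matrix of a Markov chain, Q is the joint law of (X_0, X_1).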
(iv) For any Y that is A-measurable and ≥ 0 (resp. bounded, resp. integrable):
(v) For any Y that is A-measurable and ≥ 0 (resp. bounded, resp. integrable) and any f that is E-measurable, ≥ 0 (resp. bounded):
∫ f(T(ω)) Y(ω) P(dω) = ∫_E f(t) [∫_Ω Y(ω) N_t(dω)] µ(dt)   (C.7)
C.2.4 Theorem. Assume that Ω is a Polish space endowed with its Borel σ-field (A = B_Ω), that (E, E) is a measurable space and that T : Ω → E is A/E-measurable. Then there exists a QRCP N of P given T.
One standard particular case where regular conditional probability is often considered is the following: E = Ω, E = B is a sub-σ-field of A and T = id_Ω.
An RCP given B is a special case of QRCP: just take (E, E) = (Ω, B) and T the identity on Ω. Conversely, given a QRCP N of P given T, put n(ω, A) = N(T(ω), A); then n is an RCP of P given σ(T).
or equivalently: for any f : E × F → R_+ that is E ⊗ F-measurable,
∫_E [∫_F f(x, y) N_x(dy)] µ(dx) = ∫_{E×F} f dQ   (C.10)
C.2.7 Proposition. Let (Ω, A, P) be a probability space and let X and Y be two random variables with values in (E, E) and (F, F) respectively. Let Q = P_{X,Y} be the probability measure on (E × F, E ⊗ F), image of P under (X, Y). If N is a PRCP of Q given π_E, then:
1) For any B ∈ F: P(Y ∈ B|X)(ω) = N(X(ω), B) P-a.s.
2) For any f : E × F → R that is E ⊗ F-measurable, ≥ 0 or Q-integrable:
E(f(X, Y)|X) = h(X) a.s.
where h(x) = ∫_F f(x, y) N_x(dy) (x ∈ E).
3) In particular, when f : F → R is ≥ 0 or P_Y-integrable, we have:
E(f(Y)|X)(ω) = ∫_F f(y) N(X(ω), dy)
If we put N f(x) = ∫_F f(y) N(x, dy), then E(f(Y)|X) = N f(X) P-a.s.
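For example, when E and F are countable and P(X = x) > 0 for all x ∈ E, one may take N(x, {y}) = P(Y = y|X = x); then N f(x) = Σ_{y∈F} f(y) P(Y = y|X = x), and E(f(Y)|X) = N f(X) recovers the elementary conditional expectation given a discrete r.v.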
C.2.8 Theorem. Let (E, E) be a measurable space and let F be a Polish space with F its Borel σ-field. Let Q be a probability on the product (E × F, E ⊗ F). Then there exists a PRCP N of Q given π_E.
Appendix D
Uniform integrability
D.0.1 Definition. A family (X_i, i ∈ I) of real r.v. is said to be uniformly integrable (or equiintegrable) if
sup_{i∈I} ∫_{{|X_i|>α}} |X_i| dP ↓ 0 when α ↑ +∞   (D.1)
Explicitly: ∀ε > 0, ∃α_0 > 0, ∀α ≥ α_0, ∀i ∈ I : ∫_{{|X_i|>α}} |X_i| dP < ε.
D.0.2 Proposition. If |Xi | ≤ X a.s. for all i ∈ I and X is integrable then (Xi , i ∈ I)
is uniformly integrable. In particular any finite family of integrable r.v. is uniformly
integrable.
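The family of all conditional expectations of a fixed integrable r.v. is also uniformly integrable: if X is an integrable r.v., then the family (E^B(X)), where B runs over the sub-σ-fields of A, is uniformly integrable.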
Proof. For any sub-σ-field B of A, |E^B(X)| ≤ E^B(|X|). Thus for any a > 0:
∫_{{|E^B(X)|>a}} |E^B(X)| dP ≤ ∫_{{E^B(|X|)>a}} E^B(|X|) dP = ∫_{{E^B(|X|)>a}} |X| dP,
the last equality being valid since {E^B(|X|) > a} ∈ B. For any b > 0:
∫_{{E^B(|X|)>a}} |X| dP = ∫_{{E^B(|X|)>a}∩{|X|≤b}} |X| dP + ∫_{{E^B(|X|)>a}∩{|X|>b}} |X| dP
≤ (b/a) E(E^B(|X|)) + ∫_{{|X|>b}} |X| dP
= (b/a) E(|X|) + ∫_{{|X|>b}} |X| dP
If we take b = √a and let a go to infinity we get the result.
Explicitly, P-equicontinuity of the family (X_i, i ∈ I) means: ∀ε > 0, ∃η > 0, ∀A ∈ A, ∀i ∈ I : P(A) < η ⇒ ∫_A |X_i| dP < ε.
The family (X_i, i ∈ I) is said to be bounded in L¹ if:
sup_{i∈I} ∫_Ω |X_i| dP < +∞   (D.3)
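Boundedness in L¹ is strictly weaker than uniform integrability: on ([0, 1], B_{[0,1]}, P) with P the Lebesgue measure, the family X_n = n 1_{(0,1/n)} satisfies ∫ |X_n| dP = 1 for all n, hence is bounded in L¹; but ∫_{{|X_n|>α}} |X_n| dP = 1 whenever n > α, so (D.1) fails.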
There exists η > 0 such that ∫_A max(|Y|, |X_1|, . . . , |X_N|) dP < ε/2 for all A ∈ A such that P(A) < η. We conclude that for any n ∈ N, ∫_A |X_n| dP < ε. This means that (X_n) is P-equicontinuous. Since the family (X_n) is bounded in L¹, we conclude by proposition D.0.5 that (X_n) is equiintegrable.
(ii) ⇒ (iii) is a straightforward consequence of Proposition D.0.5.
We now prove (iii) ⇒ (i). Let (X_n) be P-equicontinuous and convergent in probability to an a.s. finite r.v. Y. For any ε > 0, there exists η > 0 such that for any A ∈ A with P(A) < η one has ∫_A |X_n| dP < ε/4 (*). On the other hand, convergence of (X_n) in probability implies the existence of N ∈ N such that for all m, n ≥ N one has P(|X_m − X_n| > ε/2) < η (**). Therefore for m, n ≥ N one has:
∥X_m − X_n∥_1 = ∫_{{|X_m−X_n|≤ε/2}} |X_m − X_n| dP + ∫_{{|X_m−X_n|>ε/2}} |X_m − X_n| dP
≤ ε/2 + ∫_A |X_n| dP + ∫_A |X_m| dP
≤ ε/2 + ε/4 + ε/4 = ε
where A = {|X_m − X_n| > ε/2} and we used (*) and (**). This proves that (X_n) is a Cauchy sequence in the Banach space L¹, hence convergent, and we know that the limit is Y.
Index

σ-algebra, 59
σ-field, 59
    discrete, 4
Canonical
    process, 34
    sequence, 34
    setting, 34
canonical
    realization, 35
Communication Class, 42
Compensator, 16
Conditional
    expectation, 2
    independence, 29
    probability, 4
Conditional probability
    product, 76
    quotient, 5, 74
    regular, 4, 74
Doob
    decomposition, 16
Downcrossing Number, 18
Equicontinuous, 79
Equiintegrable, 79
Essential
    supremum, 66
Fatou
    lemma, 64
Filtration, 9
    natural, 9
Function
    harmonic, 45
    measurable, 60
    simple, 61
    superharmonic, 45
Initial distribution, 34
Integrable
    uniformly, 79
Integral
    of a measurable function, 64
    of a positive function, 63
    of a simple function, 63
Kernel, 73
    Markov, 73
    sub-Markov, 73
Markov
    matrix, 32
    chain, 36
    kernel, 73
    property, 37
    sequence, 30
    homogeneous, 34
    strong property, 38
Markov chain
    irreducible, 42
Martingale, 13
    regular, 23
    stopped, 14
Measurable
    map, 60
    space, 59
Measure
    σ-finite, 65
    bounded, 65
    excessive, 45
    invariant, 45
    non trivial, 45
    positive, 62
    space, 62
Period, 52
Potential Matrix, 39
Process, 9
    adapted, 9
    closed, 9
    discrete time, 9
    predictable, 16
    reverse, 10
Random
    process, 9
Random variable
    discrete, 6
    integrable, 1
    quasi-integrable, 1
Simple
    function, 61
Space
    filtered, 9
    measurable, 59
    measure, 62
State, 30
    aperiodic, 52
    null, 49
    positive, 49
    recurrent, 43
    transient, 43
stochastic
    matrix, 32
Submartingale, 13
Supermartingale, 13
Supremum
    essential, 66
Theorem
    of monotone convergence, 63
    of dominated convergence, 64
    of Radon-Nikodym, 65
Time
    hitting, 10
    return, 10
    stopping, 10
Transition
    matrix, 32
    measure, 73
    probability, 73
Translation Operator, 35
    shift operator, 35
Upcrossing Number, 18