
Probability II

Martingales and Markov chains

Joseph Abdou

Master 1 MAEF-IMMAEF-MMEF-QEM
Under continuous revision.
This version April 3, 2021

The author of the present notes has consulted available textbooks and lecture notes
on the subject. For martingale theory and Markov chains my main sources were:
Jacques Neveu, Bases Mathématiques de la théorie des probabilités;
Jacques Neveu, Martingales en temps discret;
J. Lacroix, P. Priouret, Probabilités approfondies, cours de Master de Mathématiques,
Université Pierre et Marie Curie, 2005-2006;
Jean Jacod, Chaînes de Markov, Processus de Poisson et Applications, Université Pierre
et Marie Curie, DEA de Probabilités et Applications, 2003-2004.
Prerequisite. Probability with measure: σ-fields, measure space, measurable maps.
Non-negative measures, integration of real valued functions. Convergence of sequences
of real valued maps. Monotone convergence, Fatou lemma, dominated convergence
(Lebesgue). Lp spaces. Probability measure. Random variables. Expectations of r.v.
Independence of sub-σ-fields, independence of random variables.
Contents

I Conditional expectations 1
I.1 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
I.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
I.1.2 Conditional Expectation: definition and existence . . . . . . . . . 1
I.2 Conditional probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
I.2.1 Regular conditional probability . . . . . . . . . . . . . . . . . . . 4
I.2.2 Partially defined random variables . . . . . . . . . . . . . . . . . 6

II Discrete time Processes 9


II.1 Filtered measure space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
II.1.1 Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
II.1.2 σ-field of events determined prior to a stopping time . . . . . . . 10

III Discrete time martingales 13


III.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . . . 13
III.1.1 Examples of martingales . . . . . . . . . . . . . . . . . . . . . . . 14
III.1.2 Stopped martingales . . . . . . . . . . . . . . . . . . . . . . . . . 14
III.1.3 Maximal inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 16
III.2 Asymptotic properties of submartingales . . . . . . . . . . . . . . . . . . 17
III.2.1 Upcrossings and downcrossings of an interval . . . . . . . . . . . 17
III.2.2 Convergence of submartingales . . . . . . . . . . . . . . . . . . . 21
III.2.3 Convergence of reverse submartingales . . . . . . . . . . . . . . . 22
III.3 Regular martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
III.4 Martingales in Lp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

IV Markov Chains with countable states 29


IV.1 Conditional independence . . . . . . . . . . . . . . . . . . . . . . . . . . 29
IV.2 Markov sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
IV.2.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . 30
IV.2.2 Countable states, Transition matrix . . . . . . . . . . . . . . . . 32


IV.2.3 Homogeneous Markov sequence . . . . . . . . . . . . . . . . . . . 34


IV.2.4 The canonical setting . . . . . . . . . . . . . . . . . . . . . . . . 34
IV.3 Markov chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
IV.3.1 Calculations on a Markov chain . . . . . . . . . . . . . . . . . . . 38
IV.3.2 First classification: Communication classes . . . . . . . . . . . . 41
IV.3.3 Second Classification: recurrence and transience . . . . . . . . . 42
IV.3.4 Independence of Markov excursions . . . . . . . . . . . . . . . . . 44
IV.3.5 Invariant measures . . . . . . . . . . . . . . . . . . . . . . . . . . 45
IV.3.6 Third classification . . . . . . . . . . . . . . . . . . . . . . . . . . 48
IV.4 Ergodic Theorem for irreducible recurrent chains . . . . . . . . . . . . . 51
IV.5 Periodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
IV.6 Asymptotic properties of the Markov matrix . . . . . . . . . . . . . . . . 53

A Reminder on measure theory 59


A.1 Measurable structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
A.2 Positive measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
A.2.1 Integration of positive functions . . . . . . . . . . . . . . . . . . 63
A.2.2 Integration of measurable functions . . . . . . . . . . . . . . . . . 64
A.2.3 Radon-Nikodym derivative . . . . . . . . . . . . . . . . . . . . . 65
A.2.4 Lp spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

B Monotone class 69
B.1 Monotone class theorem for sets . . . . . . . . . . . . . . . . . . . . . . . 69
B.2 Monotone class theorem for functions . . . . . . . . . . . . . . . . . . 70

C Transition probability 73
C.1 Composition of probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 73
C.2 Regular conditional probability . . . . . . . . . . . . . . . . . . . . . . . 74

D Uniform integrability 79
Chapter I

Conditional expectations

Preliminary and incomplete, Please do not quote

I.1 Conditional Expectation


I.1.1 Preliminaries

Let (Ω, A , P) be a probability space, whose mathematical expectation operator will be
denoted E. A random variable (r.v. in the sequel) is a map from (Ω, A ) to some
measurable space (E, E ) that is A /E -measurable. When E = R or E = R̄, we always
assume that E is endowed with its Borel σ-field. Maps taking their values in R are said
to be real, those taking their values in R̄ are said to be numerical. For any sub-σ-field
B of A , we call nonnegative and denote by B+ the set of B-measurable r.v. with values
in R+ , and we call bounded and denote by Bb the set of B-measurable r.v. with values
in R that are bounded. The corresponding sets of equivalence classes (X =P Y if and
only if P(X ≠ Y ) = 0) are denoted respectively B+ (P) and Bb (P). A numerical r.v. X
is said to be integrable if E(|X|) < +∞; its integral is defined by E(X + ) − E(X − ) and
is denoted E(X). A numerical r.v. is said to be quasi-integrable if either E(X + ) < +∞
or E(X − ) < +∞; its integral is again well defined by E(X + ) − E(X − ) and can take
the values +∞ or −∞. Any nonnegative r.v. and any integrable r.v. is quasi-integrable.

I.1.2 Conditional Expectation: definition and existence

Let (Ω, A , P) be a probability space.

I.1.1 Theorem. For any sub-σ-field B ⊂ A and any nonnegative r.v. X, there exists
a nonnegative r.v. Y such that:
(i) Y is B-measurable,
(ii) for any B ∈ B: ∫_B Y dP = ∫_B X dP.
Such a r.v. is unique in the sense that if Y and Y ′ are two such r.v., then Y = Y ′ a.s.

The r.v. Y associated to X will be denoted EB (X) or E(X|B).


The proof of existence is a straightforward application of the Radon-Nikodym theorem
A.2.12: the map Q : B ↦ E(1B X) defines a σ-additive nonnegative measure on B with
the property that P(B) = 0 implies Q(B) = 0; therefore there exists some element Y
of B+ such that Q(B) = E(1B Y ) for all B ∈ B. Uniqueness up to a set of measure zero
is an easy exercise.
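
When B is generated by a finite partition, EB (X) is easy to compute: it is constant on
each cell of the partition and equal there to the average of X over that cell. As an
illustration, the following Python sketch (the two-dice model and the uniform probability
are arbitrary choices) builds EB (X) in this way and checks property (ii) of Theorem I.1.1
on the cells that generate B.

from fractions import Fraction

# Finite probability space: two dice, uniform probability.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}
X = {w: w[0] + w[1] for w in omega}                    # X = sum of the two dice

# B is generated by the partition according to the value of the first die.
cells = {i: [w for w in omega if w[0] == i] for i in range(1, 7)}

# E^B(X) is constant on each cell, equal to the average of X over the cell.
Y = {}
for cell in cells.values():
    avg = sum(P[w] * X[w] for w in cell) / sum(P[w] for w in cell)
    for w in cell:
        Y[w] = avg

# Property (ii): the integrals of Y and of X over every generating cell agree
# (hence over every B in the sigma-field, by additivity).
for cell in cells.values():
    assert sum(P[w] * Y[w] for w in cell) == sum(P[w] * X[w] for w in cell)
print(sorted(set(Y.values())))                         # 9/2, 11/2, ..., 19/2, i.e. d1 + 7/2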

I.1.2. By an easy application of the monotone class theorem (see Appendix B) property
(ii) of theorem I.1.1 can be replaced by any of the following equivalent properties:
(ii)’ for any bounded, B-measurable r.v. Z, one has : E(ZY ) = E(ZX).
(ii)” for any nonnegative, B-measurable r.v. Z, one has : E(ZY ) = E(ZX).

Since E(EB (X)) = E(X), X is integrable if and only if EB (X) is integrable. This
remark allows the extension of theorem I.1.1 to any quasi-integrable r.v as follows:

I.1.3 Theorem and definition. For any sub-σ-field B ⊂ A and any quasi-integrable
r.v. X, there exists a r.v. Y such that:
(i) Y is B-measurable,
(ii) for any B ∈ B: ∫_B Y dP = ∫_B X dP.
Such a r.v. is unique in the sense that if Y and Y ′ are two such r.v., then Y = Y ′ a.s.
It is called a (version of) the conditional expectation of X given B and will be denoted
EB (X) or E(X|B).
Moreover EB (X) = EB (X + ) − EB (X − ).

It is easy to see, using again a monotone class argument (see Appendix B), that
property (ii) of theorem I.1.3 can be replaced by the equivalent property:
(ii)’ for any bounded, B-measurable r.v. Z, one has: E(ZY ) = E(ZX).

I.1.4. An immediate consequence of the theorem is that for any quasi-integrable X, Y :

if X =P Y then EB (X) =P EB (Y ).

Therefore, rigorously speaking, EB is an operator from the set of equivalence classes
of q.i. A -measurable random variables to the set of equivalence classes of q.i. B-
measurable random variables. In the sequel we identify a r.v. with its equivalence
class, so that an equality or inequality written between two random variables means an
almost sure equality or inequality even if this is not mentioned.

I.1.5. In what follows X, Y, Z, . . . are supposed to be quasi-integrable r.v.


1) If X ≥ 0 and Y ≥ 0 (resp. if X or Y is integrable) then:

EB (X + Y ) = EB (X) + EB (Y ) (I.1)

2) Let Z be B measurable. If Z ≥ 0 and X ≥ 0 (resp. if Z is bounded and X is


integrable) then :
EB (ZX) = ZEB (X) (I.2)

3) If X is independent of B then:

EB (X) = E(X) (I.3)

4) If B ⊂ C ⊂ A then :
EB EC (X) = EB (X) (I.4)

5) Monotone convergence:
If Xn ↑ X and if for some n0 , E(X_{n0}^− ) < +∞, then EB (Xn ) ↑ EB (X).
If Xn ↓ X and if for some n0 , E(X_{n0}^+ ) < +∞, then EB (Xn ) ↓ EB (X).
6) Fatou-like lemma:
If Z ≤ Xn a.s for all n and E(|Z|) < +∞ then EB (lim inf n Xn ) ≤ lim inf n EB (Xn )
If Xn ≤ Z a.s for all n and E(|Z|) < +∞ then EB (lim supn Xn ) ≥ lim supn EB (Xn )
7) Dominated convergence:
If |Xn | ≤ Z a.s. for all n and E(|Z|) < +∞ and Xn → X a.s then EB (Xn ) → EB (X)
8) Jensen’s inequality: If f : R → R convex, E(|X|) < +∞ and E(|f (X)|) < +∞ then
f (EB (X)) ≤ EB (f (X))
9) The restriction of EB on L2 (Ω, A , P) is the orthogonal projector on the closed
subspace L2 (Ω, B, P) that is, for all X ∈ L2 (Ω, A , P), EB (X) is the unique element
Z ∈ L2 (Ω, B, P) such that for all Y ∈ L2 (Ω, B, P) one has: E(ZY ) = E(XY ).
10) For any p ≥ 1, the restriction of EB on Lp (Ω, A , P) is a linear map between
Lp (Ω, A , P) and its subspace Lp (Ω, B, P). This map is idempotent (EB ◦ EB = EB )
and ||EB || = 1, that is EB is a projector on Lp (Ω, B, P) of norm 1.
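
Property 4) (the tower property) can also be verified exactly on a finite space, using the
same cell-average description of conditional expectations with respect to partition-generated
σ-fields. In the Python sketch below (the three-dice model and the particular X are
arbitrary choices), C = σ(D1 , D2 ) and B = σ(D1 ), so that B ⊂ C .

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=3))           # three dice
P = {w: Fraction(1, 216) for w in omega}
X = {w: w[0] * w[1] + w[2] for w in omega}             # an integrable r.v.

def cond_exp(Z, key):
    # E(Z | sigma-field generated by the partition {key = constant}).
    out = {}
    for v in set(key(w) for w in omega):
        cell = [w for w in omega if key(w) == v]
        avg = sum(P[w] * Z[w] for w in cell) / sum(P[w] for w in cell)
        for w in cell:
            out[w] = avg
    return out

EC = cond_exp(X, key=lambda w: (w[0], w[1]))           # E(X | C), C = sigma(D1, D2)
EB_EC = cond_exp(EC, key=lambda w: w[0])               # E( E(X | C) | B ), B = sigma(D1)
EB = cond_exp(X, key=lambda w: w[0])                   # E(X | B)
assert all(EB_EC[w] == EB[w] for w in omega)           # tower property (I.4), exactly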

Conditional expectation given a r.v. T. Let (E, E ) be a measurable space, let
T : Ω → E and let X be a quasi-integrable r.v.; then E(X|σ(T )) is often denoted
E(X|T ). E(X|T ) is the unique σ(T )-measurable Y such that, for any f : E → R that
is E -measurable and bounded: E[f (T )Y ] = E[f (T )X]. (Here f (T ) is another notation
for f ◦ T .)

E(X|T ) is the conditional expectation of X given T .

I.2 Conditional probability


If A ∈ A , we put P(A|B) ≡ PB (A) := E(1A |B). This is sometimes called the
conditional probability of A given B. For any fixed A ∈ A , PB (A) is a B-measurable
random variable, defined P-a.s.
Let (E, E ) be a measurable space and let T : Ω → E; then P(A|σ(T )) is often denoted
P(A|T ) and it is called the conditional probability of A given T .

I.2.1 Regular conditional probability

Let B be a sub-σ-field of A . Assume that we choose for each A ∈ A an arbitrary
version of P(A|B). Such a random variable is a B-measurable map. If A, B ∈ A
are disjoint then one can verify the equality P(A ∪ B|B) = P(A|B) + P(B|B) a.s.;
the set of states Λ(A, B) where this equality holds true has measure 1. More generally,
given a countable family I = (Ai ) of disjoint sets, the set Λ(I ) where σ-additivity is
satisfied is an element of A and P(Λ(I )) = 1 (by monotone convergence for conditional
expectations). However, since the collection of all countable disjoint families of A may
not be countable, the set Ω∗ of all states ω where the set function A ↦ P(A|B)(ω) ≡
PB (ω, A) is σ-additive may not belong to A and may be too small. Therefore we do
not necessarily have a family of probability measures (PB (ω, ·), ω ∈ Ω∗ ) on (Ω, A )
with P(Ω∗ ) = 1. The question is: under what conditions is such a choice possible?
Such a family of measures, or transition probability, when it exists, is called a regular
conditional probability (in short RCP) of P given B. The notion of regular conditional
probability is the object of Appendix C.

Regular conditional probability for discrete σ-fields. A sub-σ-field B is said to
be discrete if B is generated by a countable or finite partition π = (Bi , i ∈ I). Let π(ω)
be the element of the partition that contains ω. A r.v. X is B-measurable if and only if
X is constant on each Bi (i ∈ I), or equivalently if and only if π(ω) = π(ω ′ ) ⇒ X(ω) =
X(ω ′ ). Let I ∗ be the set of indices i such that P(Bi ) > 0, and let Ω∗ = ∪i∈I ∗ Bi , so that
P(Ω∗ ) = 1. Define Q(ω, A) := P(A|π(ω)) if P(π(ω)) > 0 and A ∈ A , and Q(ω, ·) an
arbitrary probability measure on (Ω, A ) if P(π(ω)) = 0. Then:
(i) for all A ∈ A , Q(·, A) is B-measurable;
(ii) for all ω ∈ Ω, Q(ω, ·) is a probability (σ-additive) measure on (Ω, A ).
(i) is true since Q(·, A) is constant on each element of the partition, while (ii) is
straightforward. Properties (i) and (ii) imply that Q is a transition probability (or
Markov kernel) from (Ω, B) to (Ω, A ). Now ∫_{Bi} Q(ω, A) dP(ω) = P(A ∩ Bi ) =
∫_{Bi} 1A dP(ω). It follows that we have:

(iii) for all A ∈ A , Q(·, A) = PB (A) a.s.

I.2.1 Definition. Let (E, E ) be a measurable space. A transition probability Q from


(E, E ) to (Ω, A ) (or Markov kernel) is a map from E × A to R+ such that:
(i) for all A ∈ A , Q(·, A) is E -measurable
(ii) for all x ∈ E, Q(x, ·) is a probability measure on (Ω, A ).

I.2.2 Definition. Let B be a sub-σ-field of A . A transition probability Q from (Ω, B)


to (Ω, A ) is a regular conditional probability (RCP) of P given B if for all A ∈ A :

Q(· , A) = PB (A) a.s. (I.5)

If Q is a RCP of P given B, and if X is a r.v., then we have:

EB (X)(ω) = ∫ X(ω ′ ) Q(ω, dω ′ ) for almost all ω. (I.6)

Since for almost all ω, PB (ω, ·) is a probability measure, one can write the following
equality between random variables:

EB (X) = ∫ X(ω ′ ) PB (· , dω ′ ) a.s. (I.7)
Quotient regular conditional probability. Let (Ω, A , P) be a probability space,


(E, E ) be a measurable space and T : Ω → E be A /E - measurable.

I.2.3 Definition. A transition probability (or Markov kernel) Q from (E, E ) to (Ω, A )
is a quotient regular conditional probability (QRCP) of P w.r.t T if for any A ∈ A one
has:
Q(T (·), A) = P(A |T )(·) a.s. (I.8)

We recall a useful result in measure theory (no probability is needed):

I.2.4 Lemma (Doob). Let (E, E ) be a measurable space, let F be a Polish space (com-
plete, separable metric space) endowed with its Borel σ-field BF , let T : Ω → E and let
X : Ω → F . Then the following statements are equivalent:
(i) X is σ(T )-measurable
(ii) There exists a mapping h : E → F , E /BF - measurable and such that X = h ◦ T
(Commutative diagram: T : Ω → E, h : E → F and X = h ◦ T : Ω → F .)
Let (Ω, A , P) be a probability space, (E, E ) be a measurable space and T : Ω → E


be A /E - measurable. In view of Doob’s lemma I.2.4, (where we take F = R̄) for
every quasi-integrable random variable X and T : Ω → E there exists some measurable
hX : E → R̄ such that E(X|T ) = hX (T ). Since E(X|T ) is defined up to a P- null set,
hX is defined up to a PT -null set, where PT ≡ T (P) is the probability image of P by
T.
When a quotient regular conditional probability (QRCP) Q of P given T exists,
hX (t) can be obtained as the expectation of X w.r.t. the probability Q(t, ·) on (Ω, A ),
that is hX (t) = ∫_Ω X(ω) Q(t, dω).

The discrete case. Assume now that E is finite or countable with its discrete σ-field.
A random variable T with values in E is said to be discrete. Put Bt := T −1 (t) (t ∈ E).
T being measurable, one has Bt ∈ A for all t. Let E ∗ be the set of all t ∈ E such
that P(Bt ) > 0 and let Ω∗ = ∪t∈E ∗ Bt . We put N (t, A) = P(A|T = t) if t ∈ E ∗ and
N (t, ·) an arbitrary probability measure on (Ω, A ) otherwise; then N is a transition
probability (Markov kernel) from E to (Ω, A ). Since N (T (ω), A) = P(A|T )(ω) a.s.,
the kernel N is a quotient regular conditional probability (QRCP) of P given T . Let X
be a quasi-integrable r.v. and let h(t) := ∫_Ω X(ω ′ ) N (t, dω ′ ) for all t ∈ E ∗ . We see that
for any ω ∈ Ω∗ we have E(X|T )(ω) = ∫_Ω X(ω ′ ) N (T (ω), dω ′ ) = h(T (ω)).
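
On a finite space, the kernel N and the function h can be written down explicitly. The
Python sketch below (the choices of Ω, X and T are arbitrary) builds N (t, ·) = P(·|T = t),
computes h(t), and checks the defining property E[1_{(T =t)} X] = E[1_{(T =t)} h(T )] for every t.

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}
X = {w: w[0] * w[1] for w in omega}
T = {w: (w[0] + w[1]) % 3 for w in omega}              # a discrete r.v. with values 0, 1, 2

# The kernel N(t, .) = P(. | T = t): a probability measure for each t in E*.
N = {}
for t in set(T.values()):                              # here every value has positive probability
    Bt = [w for w in omega if T[w] == t]
    pt = sum(P[w] for w in Bt)
    N[t] = {w: P[w] / pt for w in Bt}

h = {t: sum(N[t][w] * X[w] for w in N[t]) for t in N}  # h(t) = integral of X against N(t, .)

for t in N:                                            # E(X|T) = h(T): check on each {T = t}
    lhs = sum(P[w] * X[w] for w in omega if T[w] == t)
    rhs = sum(P[w] * h[t] for w in omega if T[w] == t)
    assert lhs == rhs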

I.2.2 Partially defined random variables

Partially defined random variable. Let B ⊂ A be a sub-σ-field. For any U ⊂ Ω,
we put BU = {B ∩ U : B ∈ B}; BU is the section of B on U .

Let U ∈ A ; one can consider random variables restricted to U . A map X defined
on some event containing U with values in (E, E ) is said to be B-measurable on U
if its restriction to U is BU -measurable. If U ∈ B, BU is equal to the subclass
{B ⊂ U | B ∈ B}; therefore X is B-measurable on U if and only if U ∩ {X ∈ C} ∈ B
for all C ∈ E . We say that X is a partially defined random variable; its domain of
definition is U .

Let U ∈ B and let X be a quasi-integrable random variable and let Y = EB (X). Then
one has EB (1U X) = 1U EB (X).

Cond. exp. of partially defined random variables. Let T be a random variable


defined on U and let T̄ be its extension by 0 on Ω \U . We say that T is quasi-integrable
if T̄ is.
Assume P(U ) > 0. Let PU be the probability on (U, AU ) defined by PU (B) =
P(B|U ) := P(B ∩ U )/P(U ), and denote by EU the corresponding expectation operator.
We can view T as a random variable on the probability space (U, AU , PU ). Then T is
quasi-integrable if and only if it is quasi-integrable relatively to the space (U, AU , PU ).
Let B be a sub-σ-field of A , and let BU be its section on U . Define the conditional
expectation of T given B as the conditional expectation of T in the space (U, AU , PU )
given BU . In formal notations E(T |B) := EU (T |BU ).
Let X be a quasi-integrable extension of T , Y = EB (X). Denote by Y|U the restriction
of Y to U . In general, if U is not in B, E(T |B) and Y|U are not comparable.
Assume that U ∈ B. For any bounded r.v. Z defined on U and BU -measurable,
denote by Z̄ the extension of Z by 0 on Ω \ U . Z̄ is B-measurable. Therefore one has:
∫_U ZT dPU = ∫_U Z X|U dPU = (1/P(U )) ∫_Ω 1U Z̄X dP = (1/P(U )) ∫_Ω 1U Z̄Y dP = ∫_U Z Y|U dPU .
Since Y|U is BU -measurable, by the uniqueness of the conditional expectation on
(U, AU , PU ) we conclude that:

E(T |B) = E(X|B)|U a.s. (I.9)

Since the last equality is true for any extension X of T , one can take the extension
X = T̄ . We conclude that :

E(T |B) = E(T̄ |B)|U a.s. (I.10)

In general, if π = (Ui , i ∈ N) is a partition of Ω with Ui ∈ B (i ∈ N), one has EB (X) =
EB (1Ui X) on Ui . Equivalently, on Ui , EB (X) coincides with the conditional expectation
given B of the restriction of X to Ui .
Chapter II

Discrete time Processes

Preliminary and incomplete, Please do not quote

II.1 Filtered measure space

Let (Ω, F , P) be a probability space and let (T, ≤) be a partially ordered set. A family
(Xt , t ∈ T) of random variables with values in (E, E ) is called a random process. When
(T, ≤) is totally ordered, it can be viewed as “time”. In most applications T is either
an interval of R+ or Z.
A T-filtration is a collection (Ft , t ∈ T) of sub-σ-fields of F such that Fs ⊂ Ft if s ≤ t.
A filtered space (Ω, F , P, (Ft , t ∈ T)) is a probability space together with a T-filtration.

A process (Xt , t ∈ T) is said to be adapted to the filtration (Ft , t ∈ T) if Xt is Ft -


measurable for all t ∈ T.

Let (Xt , t ∈ T) be a process with values in (E, E ). The natural filtration associated to
(Xt ) is the filtration defined by FtX = σ(Xs , s ≤ t) (t ∈ T). Any process is adapted
relatively to its natural filtration.
In this course we are only interested in discrete time processes, that is, processes
where T is an interval of Z̄ ≡ Z ∪ {−∞, +∞} with its natural order. In the sequel,
T will mostly be the set of natural numbers N = Z+ or its closure N̄ = N ∪ {+∞}.
The case where T is a finite interval can be easily deduced from the case T = N. We
shall also consider, in very few cases, the negative interval T = Z− and its closure
Z− ∪ {−∞}. In what follows we shall use the following conventions:
A process without any further qualification is a process where T = Z+ = N. In such a
process we are mostly interested in asymptotic properties when the time goes to +∞.
A process closed on the right is a process where T = Z+ ∪ {+∞},

A finite process is a discrete process where T is a finite interval of Z,


A reverse process is a discrete process where T = Z−. In a reverse process, we are
interested in asymptotic properties when the time goes to −∞.

II.1.1 Stopping times

In this subsection we shall take T = N. Let (Fn , n ∈ N) be a filtration. We put


F∞ = ∨n∈N Fn .

II.1.1 Definition. A stopping time of the filtration (Fn ) is a mapping τ : Ω →


N ∪ {+∞} such that {τ ≤ n} ∈ Fn for all n ∈ N.
II.1.2 Exercise. τ : Ω → N ∪ {+∞} is a stopping time if and only if {τ = n} ∈ Fn
for all n ∈ N.

Examples of stopping times

If τ (ω) = p for all ω ∈ Ω, then τ is a stopping time and Fτ = Fp . Most of the stopping
times that will be met in this course are obtained as follows:

II.1.3 Definition. Let (Xn , n ∈ N) be a process with values in (E, E ) and let B ∈ E ,
then the first hitting time of B is defined by :

τB (ω) = inf{n| n ≥ 0, Xn ∈ B} (II.1)

The first return time to B is defined by:

σB (ω) = inf{n| n ≥ 1, Xn ∈ B} (II.2)

with the convention inf ∅ = +∞

II.1.4 Proposition. If (Xn , n ∈ N) is (Fn )-adapted with values in (E, E ) and if


B ∈ E then τB and σB are stopping times.

Proof. (τB ≤ n) = ∪_{k=0}^{n} (Xk ∈ B) ∈ Fn (n ∈ N), (σB ≤ n) = ∪_{k=1}^{n} (Xk ∈ B) ∈ Fn
(n ≥ 1) and (σB ≤ 0) = ∅.
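
As an illustration, the following Python sketch simulates a first hitting time for a simple
symmetric random walk (the walk and the set B = [3, +∞) are arbitrary choices). The
loop makes visible why τB is a stopping time: whether τB ≤ n is decided by X0 , . . . , Xn
alone.

import random

def hitting_time(path, threshold=3):
    # tau_B = inf{n >= 0 : X_n >= threshold}, with inf(empty set) = +infinity.
    for n, x in enumerate(path):
        if x >= threshold:
            return n
    return float("inf")

random.seed(0)
path = [0]
for _ in range(50):                                    # X_0 = 0 and 50 steps of +-1
    path.append(path[-1] + random.choice([-1, 1]))

print("tau_B =", hitting_time(path))                   # finite if the level 3 is reached before time 50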

II.1.2 σ-field of events determined prior to a stopping time

Let τ be a stopping time of the filtration (Fn ). An event B ∈ A is determined
prior to τ if B ∈ F∞ and B ∩ {τ ≤ n} ∈ Fn for all n ∈ N. The set of such events is a
σ-field, denoted Fτ , this is the σ-field of events determined prior to τ .

II.1.5 Proposition. Fτ = {B ∈ F∞ |B ∩ {τ = n} ∈ Fn , ∀n ∈ N} is a σ-field.


Proof. Exercise

II.1.6 Proposition. If τ and ν are stopping times then:


1) τ ∧ ν, τ ∨ ν, τ + ν are stopping times
2) If τ ≤ ν then Fτ ⊂ Fν
3) Fτ ∧ν = Fτ ∩ Fν
4) (τ ≤ ν), (τ < ν), (τ = ν) are in Fτ ∩ Fν

Proof. Exercise

Closed process. Starting with a process (Xn , n ∈ N), it is sometimes useful to
embed it in a closed process, that is, a process defined on N ∪ {+∞}. One can close
any adapted process (Xn , n ∈ N). This can be done by first adding the terminal time
+∞ to the time interval and associating to this new instant some sub-σ-field G such that
Fn ⊂ G for all n ∈ N. Most of the time we take G = F∞ := ∨n∈N Fn . Then we add
some real r.v. X∞ that is F∞ -measurable. The closure that one adopts depends on the
context. For instance if E = R and (Xn ) has an a.s. limit Y , then one can set X∞ := Y .
One can also set X∞ = 0. In some cases, where the process takes its values in some
abstract measurable space (E, E ), it may be useful to add some isolated point denoted ∂
to E, to endow E ∪ {∂} with the σ-field generated by E and {∂}, to consider the process
(Xn , n ∈ N) as a process with values in E ∪ {∂} and to set X∞ := ∂.
In what follows we sometimes use statements like “X is measurable on B” or “X =
Y on B” where B ∈ A and X and Y are random variables. For restricted domains see
paragraph I.2.2.

II.1.7 Proposition. Let τ be a stopping time, and let X be a r.v. with values in (E, E )
then:
(a) the following are equivalent:
(i) X is Fτ -measurable
(ii) X is F∞ -measurable and X is Fn -measurable on {τ = n} for every n ∈ N
(iii) X is Fn -measurable on {τ = n} for every n ∈ N
(b) If X is an integrable or ≥ 0 r.r.v then: E(X|Fτ ) = E(X|Fn ) on {τ = n} (n ∈ N)
Proof. (a) (i) ⇒ (iii): If (i) is verified then for any n ∈ N and B ∈ E , one has
{X ∈ B} ∈ Fτ ⊂ F∞ , therefore {τ = n} ∩ {X ∈ B} ∈ Fn . (iii) ⇒ (ii): If (iii)
is satisfied, then for any B ∈ E , (X ∈ B) = ∪n∈N ((X ∈ B) ∩ (τ = n)) ∈ F∞ ,
therefore X is F∞ -measurable. (ii) ⇒ (i): If (ii) is satisfied then for any B ∈ E
one has {X ∈ B} ∈ F∞ and {τ = n} ∩ {X ∈ B} ∈ Fn , so that {X ∈ B} ∈ Fτ .
(b) First remark that (τ = n) ∈ Fτ and that E(X|Fn ) is Fτ -measurable on (τ = n).
For any A ∈ Fτ , E[1A 1(τ =n) X] = E[E(1A 1(τ =n) X|Fn )] = E[E(1A∩(τ =n) X|Fn )] =
E[1A∩(τ =n) E(X|Fn )] = E[1A 1(τ =n) E(X|Fn )] (we used A ∩ (τ = n) ∈ Fn to establish
the third equality).

Let (Xn , n ∈ N) be a closed process and let τ be a random time. One can define a
new mapping Xτ : Ω → E as follows:

Xτ (ω) := Xτ (ω) (ω) (II.3)

When the process (Xn ) is defined only for n ∈ N, this formula defines a partial map
on {τ < +∞}, also denoted by Xτ .

II.1.8 Proposition. Let τ be a stopping time.


(i) For any closed adapted process (Xn , n ∈ N), Xτ is Fτ -measurable
(ii) For any adapted process (Xn , n ∈ N), Xτ is Fτ - measurable on (τ < +∞)
Proof. (i) Let B ∈ E . For any n ∈ N, {Xτ ∈ B}∩{τ = n} = {Xn ∈ B}∩{τ = n} ∈ Fn,
in particular {Xτ ∈ B} ∈ F∞ . (ii) Similar proof. 

Let (Xn , n ∈ N) be a process and let τ be a stopping time. The process (Xτ ∧n , n ∈ N)
is called the process (Xn ) stopped by τ . It is denoted (Xnτ ).

II.1.9 Proposition. If (Xn , n ∈ N) is (Fn )-adapted and if τ is a stopping time of


(Fn ), then the stopped process (Xnτ ) is (Fn )-adapted and even (Fτ ∧n )-adapted.

Proof. For any B ∈ E , {Xτ ∧n ∈ B} = ∪k≤n ({τ = k} ∩ {Xk ∈ B}) ∪ ({τ > n} ∩ {Xn ∈
B}) ∈ Fn . Moreover {Xτ ∧n ∈ B} ∩ {τ = k} is equal to {Xk ∈ B} ∩ {τ = k} ∈ Fk
if k ≤ n, and is equal to {Xn ∈ B} ∩ {τ = k} ∈ Fk if k > n. Therefore Xτ ∧n is
Fτ -measurable. Since Fτ ∧n = Fτ ∩ Fn , we conclude that (Xnτ ) is (Fτ ∧n )-adapted.
Chapter III

Discrete time martingales

Preliminary and incomplete, Please do not quote

In all this chapter we are given a probability space (Ω, A , P)

III.1 Definition and first properties


III.1.1 Definition. Let (Ft , t ∈ T) be a T- filtration. A numerical process X =
(Xt , t ∈ T) is an (Ft )-submartingale if :
(i) For all t ∈ T, (Xt ) is Ft -measurable
(ii) For all t ∈ T, E(Xt+ ) < +∞
(iii) For all t, s ∈ T, t ≤ s:
Xt ≤ E(Xs |Ft ) a.s. (III.1)

X is a (Ft )-supermartingale if (−Xt ) is a (Ft )-submartingale, and a (Ft )-martingale


if X is a (Ft )- submartingale and a (Ft )-supermartingale.

It follows from the definition that an adapted process X is a martingale if and only if
it is a sub-martingale and a super-martingale. In particular X is integrable. Sometimes
a slightly more general notion of martingale is needed:
A numerical adapted process X = (Xt , t ∈ T) is a general martingale if the Xt are only
assumed to be quasi-integrable and if the relations III.1 are satisfied with equality.
This notion is mostly used when the Xt are non-negative and are referred to in the
literature as non-negative martingales even when they are not assumed to be integrable.
When needed we refer to this notion as a non-negative general martingale.
In what follows, we shall assume that T = N so that, in general, we write (Fn )-
submartingale without further qualification. When the process is defined on T = Z+ ∪
{+∞} we shall refer to it as a closed (on the right) (Fn )-submartingale. When T =
Z− we shall refer to it as a reverse (Fn )-submartingale. When (Fn , n ∈ N) is the
natural filtration generated by (Xn ), then the (Fn )-submartingale is simply called a
submartingale.
Let (Gn ) and (Fn ) be two filtrations such that Gn ⊂ Fn (n ∈ N). If X is an (Fn )-
submartingale and if it is (Gn )-adapted, then X is a (Gn )-submartingale. Indeed if
n ≤ m, E(Xm |Gn ) = E(E(Xm |Fn )|Gn ) ≥ E(Xn |Gn ) = Xn . In particular any (Fn )-
submartingale is also a submartingale w.r.t. its natural filtration.

III.1.1 Examples of martingales


················································

III.1.2 Stopped martingales

Unless otherwise stated, all martingales and stopping times are relative to some fixed
filtration (Fn , n ∈ N), and F∞ = ∨n∈N Fn .

III.1.2 Proposition. Let (Xn , n ∈ N) be a martingale (resp. submartingale, resp.


supermartingale) and let τ be a stopping time. The stopped process (Xnτ ) is a martingale
(resp. submartingale, resp. supermartingale)
Proof. We give the proof for a submartingale. The stopped process (Xnτ ) is adapted
(prop. II.1.9). Let n ∈ N. Then X^τ_{n+1} = 1{τ ≤n} Xτ + 1{τ ≥n+1} Xn+1 while X^τ_n =
1{τ ≤n} Xτ + 1{τ ≥n+1} Xn . Now 1{τ ≤n} Xτ = Σ_{k≤n} 1{τ =k} Xk is Fn -measurable and so
is the event {τ ≥ n + 1}. It follows that EFn (X^τ_{n+1} ) = 1{τ ≤n} Xτ + 1{τ ≥n+1} EFn (Xn+1 ) ≥
1{τ ≤n} Xτ + 1{τ ≥n+1} Xn = X^τ_n .

III.1.3 Proposition (Doob’s optional sampling theorem). Let (Xn , n ∈ N) be a mar-


tingale (resp. submartingale, resp. supermartingale) and let τ1 and τ2 be two bounded
stopping times, such that τ1 ≤ τ2 . Then one has:

E(Xτ2 |Fτ1 ) = Xτ1 (resp. ≥, resp. ≤) (III.2)

Proof. We prove the statement in the case of a submartingale. It is equivalent to
prove:

∫_A Xτ1 dP ≤ ∫_A Xτ2 dP

for all A ∈ Fτ1 . Let us first assume τ2 ≡ k where k ∈ N. For any j ≤ k, A ∩ {τ1 = j} ∈
Fj , therefore ∫_{A∩{τ1 =j}} Xτ1 dP = ∫_{A∩{τ1 =j}} Xj dP ≤ ∫_{A∩{τ1 =j}} Xk dP. Summing up for
0 ≤ j ≤ k, one has:

∫_A Xτ1 dP = Σ_{j=0}^{k} ∫_{A∩{τ1 =j}} Xj dP ≤ Σ_{j=0}^{k} ∫_{A∩{τ1 =j}} Xk dP = ∫_A Xk dP (III.3)

For the general case, we have τ1 , τ2 ≤ k for some k ∈ N, therefore we apply the first
part of the proof to the stopped submartingale (X^{τ2}_n ). We thus obtain:
∫_A Xτ1 dP = ∫_A X^{τ2}_{τ1} dP ≤ ∫_A X^{τ2}_k dP = ∫_A Xτ2 dP.

The optional sampling theorem does not extend to non bounded (finite) stopping
times. However if we start with a closed process, then this extension is possible. Recall
that when considering the closed process (Xn , n ∈ N), the σ-field of the filtration
associated to the terminal time +∞ is precisely F+∞ = ∨n∈N Fn . For submartingales
closed on the right (Xn , Fn , n ∈ N ∪ {+∞}), the optional sampling theorem is valid
for any stopping times. Precisely:

III.1.4 Theorem. Let (Xn , Fn , n ∈ N ∪ {+∞}) be a submartingale. Then for any


stopping time τ , E(Xτ+ ) < +∞ and for any pair of stopping times τ1 and τ2 such that
τ1 ≤ τ2 , one has:

Xτ1 ≤ E(Xτ2 |Fτ1 ) a.s. (III.4)

Proof. We first prove our assertion for a nonnegative submartingale (Xn , Fn , n ∈
N ∪ {+∞}) in the special case where τ1 = τ is an arbitrary stopping time and where
τ2 ≡ +∞. For any A ∈ Fτ and n ∈ N ∪ {+∞}, A ∩ {τ = n} ∈ Fn . By the submartingale
property: ∫_A 1{τ =n} Xτ dP = ∫_A 1{τ =n} Xn dP ≤ ∫_A 1{τ =n} X∞ dP. By the monotone
convergence theorem, one has:
E(1A Xτ ) = Σ_{j∈N∪{+∞}} E[1A 1{τ =j} Xτ ] ≤ Σ_{j∈N∪{+∞}} E[1A 1{τ =j} X∞ ] = E(1A X∞ ).
Thus if we take A = Ω we see first that Xτ is integrable, and in general ∫_A Xτ dP ≤
∫_A X∞ dP. This ends the proof of the particular case.

Now let (Xn ) be any closed submartingale. For any a ∈ R, the process Yn = (Xn ∨ a) − a
is a nonnegative submartingale. In view of the first part, we have for any A ∈ Fτ the
inequality ∫_A (Xτ ∨ a) dP ≤ ∫_A (X∞ ∨ a) dP, and in particular E(Xτ+ ) ≤ E(X∞+ ) < +∞.
If we let a ↓ −∞, then Xτ ∨ a ↓ Xτ and X∞ ∨ a ↓ X∞ . Moreover for a ≤ 0, Xτ ∨ a ≤ Xτ+
and X∞ ∨ a ≤ X∞+ . By the monotone convergence theorem (property 5 of paragraph
I.1.5), ∫_A Xτ dP ≤ ∫_A X∞ dP.
In the general case where τ1 ≤ τ2 , we apply the result to the stopped submartingale
(X^{τ2}_n ). One has: Xτ1 = X^{τ2}_{τ1} ≤ E(X^{τ2}_{+∞} |Fτ1 ) = E(Xτ2 |Fτ1 ).

III.1.5 Remark. Theorem III.1.3 can be easily deduced from Theorem III.1.4 as
follows. If (Xn , n ∈N) is a submartingale and τ1 ≤ τ2 ≤ p, then introduce the closed
submartingale (Yn , n ∈ N ∪ {+∞}) defined by Yn = Xn (n < p) and Yn = Xp (n ≥ p)
and apply Theorem III.1.4 on (Yn ).

III.1.6 Proposition. An integrable (Fn )-adapted process is an (Fn )-martingale if and


only if for any bounded stopping time τ one has E(Xτ ) = E(X0 ).

Proof. Let (Xn ) be an (Fn )-martingale. If N ∈ N is such that τ ≤ N , then in view of
proposition III.1.3, E(Xτ ) = E[X^τ_N ] = E(X0 ). Conversely let (Xn ) be an (Fn )-adapted
process and let n ∈ N. For any A ∈ Fn we define a map τ as follows: τ = n on
A and τ = n + 1 on Ac . τ is a stopping time taking the values n and n + 1.
By assumption E(Xn+1 ) = E(X0 ) = E(Xτ ) = E(1A Xn + 1Ac Xn+1 ). It follows that
E(1A Xn ) = E(1A Xn+1 ).
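
This characterization is easy to test by simulation. The Python sketch below (the walk,
the level 3 and the horizon 20 are arbitrary choices) estimates E(Xτ ) by Monte Carlo for
the martingale Xn given by a simple symmetric random walk and the bounded stopping
time τ = inf{n : |Xn | ≥ 3} ∧ 20; the estimate is close to E(X0 ) = 0, as Proposition III.1.6
predicts.

import random

random.seed(1)

def sample_X_tau(level=3, horizon=20):
    x = 0                                              # X_0 = 0
    for n in range(1, horizon + 1):
        x += random.choice([-1, 1])
        if abs(x) >= level:                            # tau = n on this event
            return x
    return x                                           # otherwise tau = horizon

samples = [sample_X_tau() for _ in range(200_000)]
print(sum(samples) / len(samples))                     # close to 0 = E(X_0)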

III.1.7 Definition. A process (An , n ∈ N) is said to be predictable (relatively to the


filtration (Fn )) if A0 is F0 -measurable and for any n ≥ 1, An is Fn−1 -measurable. It
is said to be nondecreasing if An ≤ An+1 (n ∈ N).
III.1.8 Theorem (Doob decomposition of an integrable submartingale). Any inte-
grable submartingale (Xn ) can be written as a sum Xn = Mn + An a.s., where (Mn ) is
an integrable martingale and (An ) is an integrable, predictable, nondecreasing process
with A0 = 0. Such a decomposition is unique.

Proof. By induction define A0 := 0 and An+1 := An + EFn (Xn+1 − Xn ) (n ≥ 0), and
Mn := Xn − An (n ≥ 0). It is clear that the process (An ) is predictable, nondecreasing
and that EFn (Mn+1 − Mn ) = 0. Such a decomposition is unique: if Xn = Mn + An =
Mn′ + A′n , then A′n+1 − A′n = EFn (Xn+1 − Xn ) = An+1 − An . Since A0 = A′0 = 0, then
by induction A′n = An and therefore Mn′ = Mn for all n ≥ 0.

III.1.9 Definition. The process (An ) involved in the Doob decomposition of (Xn ) is
called the compensator of (Xn ). It is given inductively by A0 = 0 and An = An−1 +
E(Xn − Xn−1 |Fn−1 ) for (n ≥ 1).
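
As an illustration, let Sn = ξ1 + · · · + ξn (with S0 = 0), where the ξk are independent with
P(ξk = 1) = P(ξk = −1) = 1/2, and let Fn = σ(ξ1 , . . . , ξn ). Then Xn = Sn^2 is an
integrable submartingale and

EFn (Xn+1 − Xn ) = EFn (2Sn ξn+1 + ξn+1^2 ) = 2Sn E(ξn+1 ) + 1 = 1,

so that the compensator is An = n and Mn = Sn^2 − n is a martingale.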

III.1.3 Maximal inequalities

III.1.10 Proposition. For any submartingale (Xn , Fn , n ∈ N), any a > 0 and any
N ∈ N we have:

P( sup_{k≤N} Xk > a) ≤ (1/a) ∫_{{sup_{k≤N} Xk >a}} XN+ dP ≤ (1/a) E(XN+ ) (III.5)

P( inf_{k≤N} Xk < −a) ≤ (1/a) [ −E(X0 ) + ∫_{{inf_{k≤N} Xk ≥−a}} XN dP ] ≤ (1/a) [ −E(X0 ) + E(XN+ ) ] (III.6)

For any supermartingale (Xn , Fn , n ∈ N), any a > 0 and any N ∈ N we have:

P( sup_{k≤N} Xk > a) ≤ (1/a) [ E(X0 ) − ∫_{{sup_{k≤N} Xk ≤a}} XN dP ] ≤ (1/a) [ E(X0 ) + E(XN− ) ] (III.7)

P( inf_{k≤N} Xk < −a) ≤ (1/a) ∫_{{inf_{k≤N} Xk <−a}} XN− dP ≤ (1/a) E(XN− ) (III.8)

Proof. Let ν = inf{n ≥ 0 | Xn > a} with the usual convention inf ∅ = +∞. ν is a
stopping time. Also {ν ≤ N } = {sup_{k≤N} Xk > a} = {sup_{k≤N} Xk+ > a}. If (Xn ) is a
submartingale, then (Xn+ ) is a submartingale. For any k ≤ N , since {ν = k} ∈ Fk ,
we have ∫_{{ν=k}} Xk dP ≤ ∫_{{ν=k}} XN dP, so that aP(ν ≤ N ) ≤ ∫_{{ν≤N }} Xν dP ≤
∫_{{ν≤N }} XN dP ≤ ∫_{{ν≤N }} XN+ dP ≤ E(XN+ ). This proves inequality (III.5).
If (Xn ) is a supermartingale, then aP(ν ≤ N ) ≤ ∫_{{ν≤N }} Xν dP = ∫_{{ν≤N }} Xν∧N dP =
∫ Xν∧N dP − ∫_{{ν>N }} Xν∧N dP = E(Xν∧N ) + ∫_{{ν>N }} (−XN ) dP ≤ E(X0 ) + ∫_{{ν>N }} XN− dP ≤
E(X0 ) + E(XN− ). We used the fact that (Xν∧n ) is a supermartingale and −XN ≤ XN− . This
proves inequality (III.7). The other inequalities are easy consequences of the proved
ones.

III.1.11 Corollary. For any submartingale (Xn , n ∈ N) and any a > 0,

P(sup_n |Xn | > a) ≤ (1/a) (E(X0− ) + 2 sup_n E(Xn+ )) ≤ (3/a) sup_n E(|Xn |) (III.9)

III.2 Asymptotic properties of submartingales


III.2.1 Definition. Let A = (An , n ∈ N) be a bounded predictable process and let
X = (Xn ) be an adapted integrable process. The transform of X by A is the new
process denoted A ⋆ X, defined by induction as follows:
(i) (A ⋆ X)0 = A0 X0 and
(ii) (A ⋆ X)n = (A ⋆ X)n−1 + An (Xn − Xn−1 ) (n ≥ 1)

It is easy to see that the new process A ⋆ X is adapted.

III.2.2 Proposition. If A is bounded predictable (resp. bounded predictable and non


negative) and X a martingale (resp. an integrable submartingale), then A ⋆ X is a
martingale (resp. an integrable submartingale).
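
The transform A ⋆ X can be read as the cumulative gain of a gambler who bets the
predictable stake An on the n-th increment of X. The Python sketch below (the walk and
the bounded strategy “stake 2 after a losing step, 1 otherwise” are arbitrary choices)
estimates E[(A ⋆ X)N ] by Monte Carlo and finds it close to (A ⋆ X)0 = A0 X0 = 0, in
agreement with Proposition III.2.2.

import random

random.seed(2)

def transform_at_N(N=30):
    x_prev, ax = 0, 0.0                                # X_0 = 0 and (A*X)_0 = A_0 X_0 = 0
    stake = 1.0                                        # A_1; each A_n depends only on X_0,...,X_{n-1}
    for n in range(1, N + 1):
        step = random.choice([-1, 1])
        x = x_prev + step
        ax += stake * (x - x_prev)                     # (A*X)_n = (A*X)_{n-1} + A_n (X_n - X_{n-1})
        stake = 2.0 if step == -1 else 1.0             # A_{n+1}, a function of the past only
        x_prev = x
    return ax

values = [transform_at_N() for _ in range(100_000)]
print(sum(values) / len(values))                       # close to 0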

III.2.1 Upcrossings and downcrossings of an interval

Let (xn , n ∈ N) be a sequence in R. Let a, b ∈ R be such that a < b. We shall associate to
((xn ), a, b) a sequence (νk ) in N ∪ {+∞} as follows (with the convention inf(∅) = +∞):
ν0 = 0
ν1 = inf(n : n ≥ 0, xn ≤ a)
ν2 = inf(n : n ≥ ν1 , xn ≥ b)
and for k ≥ 1, by induction:

ν2k+1 = inf(n : n ≥ ν2k , xn ≤ a)


ν2k+2 = inf(n : n ≥ ν2k+1 , xn ≥ b)
··· ··················
Clearly (νk ) (k ≥ 1) is strictly increasing as long as it does not take the value +∞. We
define the number of upcrossings by (xn ) of the interval [a, b] as the number possibly
+∞:
U ((xn ), a, b) := sup(p ∈ N : ν2p < +∞) (III.10)

We define similarly a sequence


ν0′ = 0
ν1′ = inf(n : n ≥ 0, xn ≥ b)
ν2′ = inf(n : n ≥ ν1′ , xn ≤ a)
and for k ≥ 1, by induction:

ν2k+1′ = inf(n : n ≥ ν2k′ , xn ≥ b)
ν2k+2′ = inf(n : n ≥ ν2k+1′ , xn ≤ a)
··· ··················
The sequence (νk′ ) (k ≥ 1) is strictly increasing as long as it does not take the value
+∞. The number of downcrossings by (xn ) of the interval [a, b] is the number possibly
+∞ defined by:
D((xn ), a, b) := sup(p ∈ N : ν2p′ < +∞) (III.11)

We have the following:

lim inf xn < a < b < lim sup xn ⇒ U ((xn ), a, b) = +∞ ⇒ lim inf xn ≤ a < b ≤ lim sup xn

We conclude that (xn ) is convergent in R̄ if and only if: for all a, b ∈ Q with a < b one has
U ((xn ), a, b) < +∞.
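
For a finite sequence, U ((xn ), a, b) is computed by a single scan that waits alternately for
a value ≤ a and then for a value ≥ b. The Python function below is a direct transcription
of the definition (the test sequence is an arbitrary choice).

def upcrossings(x, a, b):
    # Number of upcrossings of [a, b] completed by the finite sequence x.
    assert a < b
    count, waiting_for_low = 0, True                   # first wait for x_n <= a, then for x_n >= b
    for value in x:
        if waiting_for_low:
            if value <= a:
                waiting_for_low = False                # nu_{2k+1} reached
        elif value >= b:
            count += 1                                 # nu_{2k+2} reached: one more upcrossing
            waiting_for_low = True
    return count

print(upcrossings([0, 2, -1, 3, 0, 5, 1], a=0, b=2))   # prints 3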

When (Xn , n ∈ N) is a sequence of random variables we can define the sequences


(νn (ω)), (νn′ (ω)), U ((Xn (ω)), a, b) and D((Xn (ω)), a, b) (ω ∈ Ω).

III.2.3 Proposition. For any process (Xn ), the νn and νn′ , U ((Xn ), a, b) and D((Xn ), a, b)
are random variables. If (Xn ) is adapted, then the νn and νn′ are stopping times for all
n ∈ N. One has the following equalities:
(i) U ((Xn ), a, b) = D((−Xn ), −b, −a)
(ii) U ((Xn ), a, b) = U (((Xn − a)+ ), 0, b − a)
(iii) D((Xn ), a, b) = D(((Xn − a)+ ), 0, b − a)

We define the upcrossing number of (Xn ) on the finite time interval [0, N ] as the number
U^N ((Xn ), a, b) := U ((X^N_n ), a, b), where (X^N_n ) is (Xn ) stopped at N . We define similarly
the downcrossing number D^N ((Xn ), a, b).

If a process X = (Xn : n = 0, . . . , N ) is defined only on the finite time interval [0, N ],
we define the upcrossing number U (X, a, b) as U^N ((Yn ), a, b), where (Yn ) extends X by
putting Yn = Xn for n ≤ N and Yn = XN for all n > N .

III.2.4 Proposition. Let (Xn , n ∈ N) be a sequence of random variables. (Xn ) con-
verges a.s. in R̄ if and only if for any (a, b) ∈ Q × Q with a < b, the r.v. U ((Xn ), a, b) is
a.s. finite.

Proof. Let Ω0 = {ω | (Xn (ω)) converges in R̄}, and let I be the collection of pairs
(a, b) ∈ Q × Q with a < b. For (a, b) ∈ I put C(a, b) = {ω | U ((Xn (ω)), a, b) = +∞}.
From the implications displayed above, one has Ω \ Ω0 = ∪(a,b)∈I C(a, b). Since I is
countable, P(Ω0 ) = 1 if and only if P(C(a, b)) = 0 for all (a, b) ∈ I .

We are going to associate to (Xn ) two processes (Vn ) and (Vn′ ). Put V0 = 0 and for
k ≥ 1 put:

Vk = 0 if ∃m ≥ 0, ν2m < k ≤ ν2m+1 ;    Vk = 1 if ∃m ≥ 0, ν2m+1 < k ≤ ν2m+2 .

Similarly put V0′ = 0 and for k ≥ 1 put:

Vk′ = 0 if ∃m ≥ 0, ν2m′ < k ≤ ν2m+1′ ;    Vk′ = 1 if ∃m ≥ 0, ν2m+1′ < k ≤ ν2m+2′ .

III.2.5 Lemma. For any adapted process (Xn ), the sequences (Vn ) and (Vn′ ) are
predictable.

III.2.6 Theorem. For any (Fn )-submartingale (Xn ), any (a, b) ∈ R², a < b, and any
N ∈ N, we have the following inequalities:

(b − a) EF0 [U^N ((Xn ), a, b)] ≤ EF0 [(XN − a)+ ] − (X0 − a)+ (III.12)

(b − a) EF0 [D^N ((Xn ), a, b)] ≤ EF0 [(XN − b)+ ] − (X0 − b)+ (III.13)

Proof of the first inequality. Assume first that the submartingale is integrable. Put
U = U^N ((Xn ), a, b) and ρ = ν2U +1 ∧ N . Using the inequalities b − X_{ν2k} ≤ 0 and
X_{ν2k−1} − a ≤ 0, true for all k ∈ {1, . . . , U }, we have:
V0 X0 + U (b − a) + XN − Xρ
= V0 X0 + Σ_{k=1}^{U} [(X_{ν2k} − X_{ν2k−1} ) + (b − X_{ν2k} ) + (X_{ν2k−1} − a)] + XN − Xρ
≤ V0 X0 + Σ_{k=1}^{U} (X_{ν2k} − X_{ν2k−1} ) + XN − Xρ
= (V ⋆ X)N
Now ((1 − V ) ⋆ X)n is a submartingale by proposition III.2.2. Taking conditional
expectations in the preceding inequalities, it follows that
(b − a) EF0 [U ]
≤ EF0 [(V ⋆ X)N − XN ] − V0 X0 + EF0 [Xρ ]
= −EF0 [((1 − V ) ⋆ X)N ] − V0 X0 + EF0 [Xρ ]
≤ −(1 − V0 )X0 − V0 X0 + EF0 [Xρ ]
= EF0 [Xρ ] − X0
In the particular case where a = 0, b = ∆ and Xn ≥ 0, we have Xρ = 1(ν2U +1 >N ) XN +
1(ν2U +1 ≤N ) X_{ν2U +1} ≤ XN . Thus ∆ EF0 [U ] ≤ EF0 [XN ] − X0 . The first inequality of the
assertion is proved in this case. The general case is reduced to this particular case by
replacing respectively Xn by (Xn − a)+ , a by 0, and b by b − a, and using Proposition
III.2.3 (ii).
Proof of the second inequality. Assume first that the submartingale is integrable. Put
D = D^N ((Xn ), a, b) and ρ′ = ν2D+1′ ∧ N . Using the inequalities a − X_{ν2k′} ≥ 0 and
X_{ν2k−1′} − b ≥ 0, true for all k ∈ {1, . . . , D}, we have:
V0′ X0 + D(a − b) + XN − Xρ′
= V0′ X0 + Σ_{k=1}^{D} [(X_{ν2k′} − X_{ν2k−1′} ) + (a − X_{ν2k′} ) + (X_{ν2k−1′} − b)] + XN − Xρ′
≥ [V0′ X0 + Σ_{k=1}^{D} (X_{ν2k′} − X_{ν2k−1′} ) + (XN − Xρ′ )] + (X_{ν1′} − b) 1(ν1′ <+∞)
= (V ′ ⋆ X)N + (X_{ν1′} − b) 1(ν1′ <+∞)

Now we claim that we have the inequality:

V0′ X0 + D(a − b) + (XN − b)+ ≥ (V ′ ⋆ X)N + (X0 − b)+ (III.14)

For that purpose we prove the two inequalities:
XN − Xρ′ ≤ (XN − b)+ and 1(ν1′ <+∞) (X_{ν1′} − b) ≥ (X0 − b)+ .
Indeed the first inequality is verified by definition of ρ′ , and the second is verified
trivially on (ν1′ = 0), while on (1 ≤ ν1′ < +∞) one has X_{ν1′} − b ≥ 0 > X0 − b, so that
the second inequality is verified. Taking conditional expectations with respect to F0 , and
using the submartingale property of ((V ′ ⋆ X)n ) given by proposition III.2.2, we obtain:
(b − a) EF0 [D] ≤ EF0 [(XN − b)+ ] − (X0 − b)+ . In the case of nonintegrable submartingales,
we replace Xn by (Xn − a)+ , a by 0 and b by b − a, and apply Proposition III.2.3 (iii).

III.2.7 Corollary. For any (Fn )-submartingale (Xn , n ∈ N), any (a, b) ∈ R², a < b,
we have the following inequalities:

(b − a) EF0 [U ((Xn ), a, b)] ≤ sup_{n∈N} EF0 [(Xn − a)+ ] − (X0 − a)+ (III.15)

(b − a) EF0 [D((Xn ), a, b)] ≤ sup_{n∈N} EF0 [(Xn − b)+ ] − (X0 − b)+ (III.16)

III.2.2 Convergence of submartingales

III.2.8 Theorem. Any submartingale (Xn , n ∈ N) such that sup_{n∈N} E(Xn+ ) < +∞
converges a.s., when n → ∞, to some F+∞ -measurable r.v., denoted X+∞ , that satisfies
E((X+∞ )+ ) < +∞.

Proof. The convergence is a consequence of propositions III.2.4 and III.2.7. Applying
the Fatou lemma, E((X+∞ )+ ) = E(limn→+∞ Xn+ ) ≤ limn→+∞ E(Xn+ ) = supn≥0 E(Xn+ ) <
+∞.

III.2.9 Remark. For an integrable submartingale (Xn , n ∈ N), the inequality E(Xn+ ) ≤
E(|Xn |) = E(2Xn+ − Xn ) ≤ 2E(Xn+ ) − E(X0 ) being verified for every n ∈ N, the condition
sup_{n∈N} E(Xn+ ) < +∞ is equivalent to sup_{n∈N} ||Xn ||1 < +∞, where || · ||1 is the L1 norm.

III.2.10 Remark. With the assumptions of Theorem III.2.8, one cannot conclude
that the submartingale (Xn , n ∈N) can be closed on the right by X+∞, that is the
inequality Xn ≤ E(X+∞ |Fn ) is not necessarily true for all n ∈ N. However we have
the following:

III.2.11 Theorem. Let (Xn , n ∈ N) be a submartingale. The following are equivalent:


(i) The family (Xn+ , n ∈ N) is equiintegrable,
(ii) (Xn , n ∈ N) converges a.s. to some r.v. X+∞ when n → +∞ and (Xn , n ∈
N ∪ {+∞}) is a submartingale.
(iii) (Xn , n ∈ N) is closable on the right as a submartingale.
(iv) (Xn+ , n ∈ N) converges in L1 .

III.2.12 Definition. A submartingale that verifies one of these properties is said to


be a regular submartingale

Proof. The implication (ii) ⇒ (iii) is trivial. We prove (i) ⇒ (ii). In view of theorem
III.2.8, the limit X+∞ exists and is F+∞ -measurable. For any a ∈ R, the sequence (Xn ∨ a)
is a submartingale and, by assumption, (Xn+ ) is equiintegrable, therefore (Xn ∨ a) is
equiintegrable. It follows that for any n and A ∈ Fn

∫_A (Xn ∨ a) dP ≤ lim_{m→+∞} ∫_A (Xm ∨ a) dP = ∫_A (X+∞ ∨ a) dP,

the equality being a consequence of the equiintegrability of (Xn ∨ a) (Proposition D.0.6).
Now Xn+ and (X+∞ )+ are integrable, the second being a consequence of the Fatou lemma.
If a decreases to −∞, (Xn ∨ a) and (X+∞ ∨ a) decrease and by monotone convergence we
have ∫_A Xn dP ≤ ∫_A X+∞ dP. This is equivalent to: Xn ≤ EFn (X+∞ ).
Now we prove (iii) ⇒ (i). Let X+∞ be a r.v. closing (Xn ) on the right and put Yn =
EFn ((X+∞ )+ ). Then by proposition D.0.3 (Yn , n ∈ N) is equiintegrable. Since Xn+ ≤ Yn
for all n, (Xn+ ) is equiintegrable. The fact that (iv) is equivalent to (i) is a consequence
of Theorem III.2.13 since (Xn+ ) is an integrable submartingale.

In the case where the submartingale is integrable, one has the following easy criterion
for convergence in L1 :

III.2.13 Theorem. Let (Xn , n ∈ N) be an integrable submartingale. The following
are equivalent:
(i) The family (Xn , n ∈ N) is equiintegrable,
(ii) (Xn , n ∈ N) converges in L1 .

Proof. (i) ⇒ (ii): We want to apply proposition D.0.6 of Appendix D. For that purpose
we show that the a.s. limit X+∞ is finite. The Fatou lemma yields E(|X+∞ |) =
E(limn→+∞ |Xn |) ≤ lim inf_{n→+∞} E(|Xn |) ≤ supn≥0 E(|Xn |) < +∞, since equiintegrability
implies boundedness of the sequence in L1 . (ii) ⇒ (i) is a consequence of D.0.6.

Theorem III.2.11 applies naturally for supermartingales after proceeding to the usual
sign change. The following is an important particular case :

III.2.14 Corollary. Any nonnegative supermartingale converges a.s. to some nonneg-


ative F∞ -measurable r. v. X+∞ . Moreover (Xn , n ∈ N ∪ {+∞}) is a supermartingale.
Proof. Since Xn− = 0, for all n, the family (Xn− , n ∈ N) is trivially equiintegrable. The
conclusion is straightforward from theorem III.2.11. 
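
A classical example shows that the closed process of Corollary III.2.14 need not be a
martingale: the “double or nothing” martingale X0 = 1, Xn+1 = 2Xn or 0 with probability
1/2 each. It is nonnegative, E(Xn ) = 1 for every n, and Xn → 0 a.s., so the a.s. convergence
does not hold in L1 and E(X+∞ |Fn ) = 0 < Xn on {Xn > 0}. The short Python simulation
below makes this visible.

import random

random.seed(3)
n_paths, horizon = 100_000, 30
alive = n_paths                                        # number of paths with X_n > 0 (X_0 = 1)
for n in range(horizon):                               # X_{n+1} = 2 X_n with prob. 1/2, else 0
    alive = sum(1 for _ in range(alive) if random.random() < 0.5)

print("fraction of paths with X_30 > 0:", alive / n_paths)        # about 2**-30, so 0 in practice
print("empirical mean of X_30:", alive * 2 ** horizon / n_paths)  # 0 here, although E(X_30) = 1 exactly
# All the mass of X_30 sits on an event of probability 2**-30: this is why
# X_n -> 0 a.s. while E(X_n) stays equal to 1 (no convergence in L1).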

III.2.3 Convergence of reverse submartingales

We take as discrete time set the set Z− = {0, −1, −2, . . .} with a filtration F0 ⊃ F−1 ⊃
F−2 · · · . We put F−∞ = ∩n∈Z− Fn .

III.2.15 Corollary. For any (reverse) (Fn )-submartingale (Xn , n ∈ Z− ), any (a, b) ∈
R², a < b, we have the following inequalities:

(b − a) EF0 [U ((Xn ), a, b)] ≤ EF0 [(X0 − a)+ ] (III.17)

(b − a) EF0 [D((Xn ), a, b)] ≤ EF0 [(X0 − b)+ ] (III.18)

III.2.16 Theorem. Any (reverse) submartingale (Xn , n ∈ Z−) converges a.s., when
n → −∞, to some F−∞ - measurable r.v. that shall be denoted X−∞ . Moreover
(Xn , n ∈ Z− ∪ {−∞}) is a submartingale.
Proof. The convergence is a consequence of propositions III.2.4 and III.2.15. Since
X−∞ is measurable with respect to every Fn , it is measurable with respect to their
intersection. Let a ∈ R. Then (Xn ∨ a) is a submartingale and is bounded from below by
a. For any p ∈ Z− and A ∈ F−∞ , E(1A (X−∞ ∨ a)) = E(1A lim inf_{n→−∞} (Xn ∨ a)) ≤
lim inf_{n→−∞} E(1A (Xn ∨ a)) = inf_{n∈Z−} E(1A (Xn ∨ a)) ≤ E(1A (Xp ∨ a)). By monotone
convergence, when a ց −∞ we have E(1A X−∞ ) ≤ E(1A Xp ).

III.2.17 Theorem. Let (Xn , n ∈ Z− ) be a (reverse) submartingale and let X−∞ be its
a.s. limit. The following are equivalent:
(i) Xn is integrable for all n ∈ Z− and (Xn ) converges to X−∞ in L1 ,
(ii) The family (Xn , n ∈ Z− ) is equiintegrable,
(iii) E(Xn ) ↓ α > −∞ as n ↓ −∞,
(iv) sup_{n∈Z−} E(Xn− ) < +∞.

Proof. (i) ⇔ (ii) is an immediate consequence of Proposition D.0.6 and the fact that
a.s. convergence implies convergence in probability. (iii) ⇔ (iv) as well as (ii) ⇒ (iv)
are trivial. We prove (iii) ⇒ (ii). Assume that the monotone sequence E(Xn ) satisfies
E(Xn ) ↓ α > −∞ as n ↓ −∞. Clearly the Xn are integrable. Take ǫ > 0 and fix k0
such that α ≤ E(Xk0 ) < α + ǫ/2. It will be enough to prove that the family (Xn , n ≤ k0 )
is equiintegrable. Put I(n, a) = ∫_{(|Xn |>a)} |Xn | dP (n ≤ k0 , a ∈ R+ ). For n ≤ k0 we
have:
I(n, a) = ∫_{(Xn >a)} Xn dP − ∫_{(Xn <−a)} Xn dP
= ∫_{(Xn >a)} Xn dP + ∫_{(Xn ≥−a)} Xn dP − E(Xn )
≤ ∫_{(Xn >a)} Xk0 dP + ∫_{(Xn ≥−a)} Xk0 dP − E(Xk0 ) + ǫ/2
= ∫_{(Xn >a)} Xk0 dP − ∫_{(Xn <−a)} Xk0 dP + ǫ/2
≤ ∫_{(|Xn |>a)} |Xk0 | dP + ǫ/2
where we used, to obtain the first inequality, the fact that (Xn > a) and (Xn ≥ −a) are
in Fn for n ≤ k0 , the submartingale property, and the fact that α ≤ E(Xn ) ≤ E(Xk0 ).
Now:
E(|Xn |) = 2E(Xn+ ) − E(Xn ) ≤ 2E(Xk0+ ) − α
so that:
P(|Xn | > a) ≤ (1/a) E(|Xn |) ≤ (1/a) (2E(Xk0+ ) − α).
Since Xk0 is integrable, it follows that there exists a0 ∈ R+ such that for a ≥ a0 one
has ∫_{(|Xn |>a)} |Xk0 | dP ≤ ǫ/2. To sum up, for a ≥ a0 and n ≤ k0 we have I(n, a) ≤ ǫ.

III.3 Regular martingales


III.3.1 Definition. An (Fn )-martingale (Xn ) is regular if it is closable on the right as
a martingale, that is, if there exists some integrable F∞ -measurable r.v. Y such that
for all n ∈ N:
Xn = E(Y |Fn ) (III.19)

If it exists, such a Y is unique. Indeed if Y and Y ′ verify property (III.19), then
E(1A (Y − Y ′ )) = 0 for all A ∈ ∪n∈N Fn . By a monotone class argument, this equality
extends to all A ∈ F∞ . Since A = {Y ′ − Y > 0} and B = {Y − Y ′ ≥ 0} are both in
F∞ , E(|Y ′ − Y |) = E[1A (Y ′ − Y ) + 1B (Y − Y ′ )] = 0. Thus Y = Y ′ a.s.

III.3.2 Proposition. Let (Xn , n ∈ N) be an (Fn )- martingale. The following proper-


ties are equivalent:

(i) (Xn ) is a regular martingale,

(ii) There exists an integrable r.v. Z such that: Xn = E(Z|Fn ) for all n ∈ N,
(iii) The sequence (Xn , n ∈ N) converges a.s. to some r.v. X∞ and (Xn , n ∈ N∪{+∞})
is a closed martingale,

(iv) The sequence (Xn , n ∈ N) converges in L1 ,

(v) The family (Xn , n ∈ N) is uniformly integrable, that is: sup_n ∫_{{|Xn |>α}} |Xn | dP ↓ 0
as α ↑ +∞.

Moreover the closure of the martingale in (i), EF∞ (Z) in (ii) and the limit X∞ claimed
in (iii) are equal.

Proof. (i) ⇔ (ii) and (iii) ⇒ (i), (ii) are straightforward. (ii) ⇒ (v) is a consequence
of Proposition D.0.3. (iv) ⇒ (v) is a direct consequence of Proposition D.0.6. Proof
of (v) ⇒ (iv): Any equiintegrable family is bounded in L1 (Proposition D.0.5), a
martingale (Xn ) that is bounded in L1 is convergent a.s., hence in probability, and
in view of proposition D.0.6 (Appendix D), this implies the convergence of the sequence
(Xn ) in L1 . Proof of (iv) ⇒ (iii): Let Y be the limit of Xn in L1 ; then for any
p ∈ N, limn→∞ EFp (Xn ) = EFp (Y ) by continuity of the operator EFp on L1 , but
EFp (Xn ) = Xp for n ≥ p. It follows that EFp (Y ) = Xp . Now convergence in L1
implies boundedness in L1 and therefore, by Theorem III.2.8, the a.s. convergence of
the martingale (Xn ) to some X∞ . It is clear that Y = X∞ .

III.3.3 Corollary. Let (Xn ) be a regular martingale, closed by X∞ , and let τ1 and τ2
be two stopping times, such that τ1 ≤ τ2 . Then one has:

E(Xτ2 |Fτ1 ) = Xτ1 (III.20)

Proof. This is a restatement of proposition III.1.4, in the case of a regular martingale.



III.4 Martingales in Lp

III.4.1 Proposition. Let p ∈ [1, ∞[. For any Z ∈ Lp , the sequence Xn := EFn (Z)
is a martingale in Lp that converges almost surely and in Lp to Z∞ := EF∞ (Z).

Proof. (Xn ) is equiintegrable (D.0.2, Appendix D) and therefore, by theorem III.3.2,
it is closed on the right by its a.s. limit X∞ , and the convergence takes place in L1 . Since
X∞ = EF∞ (Z), by the Jensen inequality |X∞ | ≤ EF∞ (|Z|) and |X∞ |p ≤ (EF∞ (|Z|))p ≤
EF∞ (|Z|p ). We conclude that X∞ ∈ Lp . We want to prove that the convergence takes
place also in Lp . Assume first that there exists some a > 0 such that |Z| ≤ a. By the
Lebesgue bounded convergence theorem we have ||EFn (Z) − EF∞ (Z)||p → 0. In the
general case we put Z = Z1{|Z|≤a} + Z1{|Z|>a} . We have:
||EFn (Z) − EF∞ (Z)||p
≤ ||EFn (Z1{|Z|≤a} ) − EF∞ (Z1{|Z|≤a} )||p + ||EFn (Z1{|Z|>a} ) − EF∞ (Z1{|Z|>a} )||p
≤ ||EFn (Z1{|Z|≤a} ) − EF∞ (Z1{|Z|≤a} )||p + 2||Z1{|Z|>a} ||p ,
where we used ||EB (Z1{|Z|>a} )||p ≤ ||Z1{|Z|>a} ||p for B = Fn and B = F∞ . For
any ǫ > 0, the fact that Z ∈ Lp implies, by dominated convergence, that there exists a0
such that ||Z1{|Z|>a0 } ||p < ǫ/4. Furthermore, by the first part of the proof, there exists
n0 ∈ N such that for n ≥ n0 , ||EFn (Z1{|Z|≤a0 } ) − EF∞ (Z1{|Z|≤a0 } )||p < ǫ/2.

III.4.2 Proposition. Let p ∈ ]1, ∞[. Any martingale (Xn , n ∈ N) in Lp that is
bounded in Lp converges a.s. to some F∞ -measurable r.v. X∞ . Moreover X∞ ∈ Lp ,
Xn = EFn (X∞ ) (n ∈ N) and (Xn ) converges to X∞ in Lp .

Proof. Let C = sup_{n∈N} ||Xn ||p . For any a > 0, a^{p−1} ∫_{{|Xn |>a}} |Xn | dP ≤ ∫_Ω |Xn |p dP.
Therefore:
sup_{n∈N} ∫_{{|Xn |>a}} |Xn | dP ≤ C^p / a^{p−1} .
When a ր +∞, the RHS goes to 0, and then so does the LHS. The martingale is thus
equiintegrable and therefore regular. Let X∞ be its a.s. limit; we know by theorem III.3.2
that X∞ ∈ L1 and Xn = EFn (X∞ ) (n ∈ N). Since |Xn |p converges a.s. to |X∞ |p ,
applying the Fatou lemma to the sequence (|Xn |p , n ∈ N) yields ∫ |X∞ |p dP ≤
lim inf_{n→+∞} ∫ |Xn |p dP. We conclude that X∞ ∈ Lp . We can now apply proposition
III.4.1 to conclude for the convergence of (Xn ) in Lp .

III.4.3 Proposition. Let p ∈ ]1, ∞[. For any martingale (Xn , n ∈ N) that is bounded
in Lp , the r.v. X ∗ ≡ sup_n |Xn | is in Lp . Moreover

||X ∗ ||p ≤ (p/(p − 1)) sup_n ||Xn ||p (III.21)

Proof. Put Sn = sup_{m≤n} |Xm |. The maximal inequality (III.5) applied to the sub-
martingale (|Xn |) yields:

a E(1(Sn >a) ) ≤ E(|Xn | 1(Sn >a) ) (n ∈ N, a > 0)

Integrating both sides of this inequality against the nonnegative measure p a^{p−2} da on
R+ , we have by Fubini, for the left-hand side:

∫_0^∞ p a^{p−1} E(1(Sn >a) ) da = ∫_Ω ( ∫_0^{Sn} p a^{p−1} da ) dP = E(Sn^p ) = ||Sn ||_p^p

and for the right-hand side:

∫_0^∞ E(|Xn | 1(Sn >a) ) p a^{p−2} da = ∫_Ω |Xn | ( ∫_0^{Sn} p a^{p−2} da ) dP = (p/(p − 1)) E(|Xn | Sn^{p−1} ).

In view of the Hölder inequality:

E(|Xn | Sn^{p−1} ) ≤ ||Xn ||p ||Sn^{p−1} ||q = ||Xn ||p ||Sn ||_p^{p−1} (where 1/p + 1/q = 1).

Therefore ||Sn ||_p^p ≤ (p/(p − 1)) ||Xn ||p ||Sn ||_p^{p−1} , and finally ||Sn ||p ≤ (p/(p − 1)) ||Xn ||p .
The inequality announced in the proposition is obtained by letting n go to infinity.
Chapter IV

Markov Chains with countable


states

IV.1 Conditional independence


IV.1.1 Definition. Let A1 , A2 , C be sub-σ-fields of F . A1 and A2 are conditionally
independent w.r.t. C if for all X1 ∈ A1b and all X2 ∈ A2b one has:

E(X1 X2 |C ) = E(X1 |C ) E(X2 |C ) (IV.1)

or equivalently for all A1 ∈ A1 , A2 ∈ A2 :

P(A1 A2 |C ) = P(A1 |C ) P(A2 |C ) (IV.2)

IV.1.2 Proposition. A1 and A2 are conditionally independent w.r.t. to C if and only


if for all X2 ∈ A2b one has:

E(X2 |A1 ∨ C ) = E(X2 |C ) (IV.3)

Proof. We first prove the "only if" part. Assume that A1 and A2 are conditionally
independent w.r.t. C . Put H = {A ∈ F | E(1A X2 ) = E(1A E(X2 |C ))} and
M = {A1 ∩ C | A1 ∈ A1 , C ∈ C }. It is easily verified that H is a monotone class that
contains A1 ∪ C . Moreover if A = A1 ∩ C where A1 ∈ A1 , C ∈ C , then E(1A X2 ) =
E(1C 1A1 X2 ) = E(EC [1C 1A1 X2 ]) = E(1C EC [1A1 X2 ]) = E(1C EC [1A1 ] EC [X2 ]),
the last equality being due to the conditional independence. Now, by the defining property
of EC [1A1 ], and since 1C EC [X2 ] is C -measurable, we have the equalities
E(1C EC [X2 ] EC [1A1 ]) = E(1C EC [X2 ] 1A1 ) = E(1A EC [X2 ]). By the monotone class
theorem, H contains the σ-field generated by M , that is A1 ∨ C .
Conversely, assume that (IV.3) holds. Let X1 ∈ A1b , X2 ∈ A2b . For any C ∈ C , one
has E(1C X1 X2 ) = E(EA1 ∨C [1C X1 X2 ]) = E(1C X1 EA1 ∨C [X2 ]) = E(1C X1 EC [X2 ]),
and the latter is equal to E(1C EC [X1 ] EC [X2 ]). We conclude that EC [X1 X2 ] =
EC [X1 ] EC [X2 ].


IV.2 Markov sequence


IV.2.1 Definition and first properties

Let (Ω, F , (Fn ), P) be a filtered space and let (E, E ) be a measurable space, called the
set of states. If X = (Xn ) is a measurable sequence of random variables with values
in (E, E ), we put FnX = σ(X0 , . . . , Xn ) (n ∈ N). (FnX ) is called the natural filtration
of X. We also put AnX = σ(Xk , k ≥ n): this is the σ-field of future events determined
by X after time n.

IV.2.1 Definition. A sequence X = (Xn ) of random variables with values in E has


the (Fn )-Markov property (or is an (Fn )-Markov sequence) if (Xn ) is (Fn )-adapted
and if for all f ∈ E+ (or all f ∈ Eb ) and all n ∈ N one has:
E(f (Xn+1 )|Fn ) = E(f (Xn+1 )|Xn ) a.s. (IV.4)

By monotone class argument condition IV.4 is equivalent to the following: for all
B∈E

P(Xn+1 ∈ B|Fn ) = P(Xn+1 ∈ B|Xn ) a.s. (IV.5)

Any (Fn )-Markov sequence is an (FnX )-Markov sequence. This is why we introduce
the following definition:

IV.2.2 Definition. A sequence X = (Xn ) of r.v. with values in E is said to be a


Markov sequence (without reference to a filtration) if X is an (FnX )-Markov sequence
in the sense of definition IV.2.1; explicitly for any B ∈ E , n ≥ 0:

P(Xn+1 ∈ B|X0 , . . . , Xn ) = P(Xn+1 ∈ B|Xn ) (IV.6)

IV.2.3 Proposition. Let X be an (Fn )-adapted sequence. X is an (Fn )-Markov


sequence if and only if for any n ∈ N, any AnX - measurable random variable Φ that is
nonnegative or bounded one has:

E(Φ|Fn ) = E(Φ|Xn ) a.s. (IV.7)

Proof. First we prove IV.7 for all Φ of the form Φ = f0 (Xn ) . . . fp (Xn+p ) where p ≥ 0
and f0 , . . . fp ∈ Eb . The proof is by induction on p. For p = 0 equality IV.7 is obviously
true for all f0 ∈ Eb . Assume that the formula is true for all f0 , . . . fk ∈ Eb where
0 ≤ k < p. Put U = f0 (Xn ) . . . fk (Xn+k ). We have:
E(U fk+1 (Xn+k+1 )|Fn ) = E[E(U fk+1 (Xn+k+1 )|Fn+k )|Fn ]
= E[U E(fk+1 (Xn+k+1 )|Fn+k )|Fn ]
= E[U E(fk+1 (Xn+k+1 )|Xn+k )|Fn ] (using formula IV.4 )

Now E(fk+1 (Xn+k+1 )|Xn+k ) = g(Xn+k ) for some g ∈ Eb and therefore:

U g(Xn+k ) = f0 (Xn ) . . . fk−1 (Xn+k−1 )fk (Xn+k )g(Xn+k )

Thus, by the induction hypothesis applied to U g(Xn+k ):


E(U fk+1 (Xn+k+1 )|Fn ) = E[U g(Xn+k )|Fn ]
= E[U g(Xn+k )|Xn ]
= E[U E(fk+1 (Xn+k+1 )|Fn+k )|Xn ]
= E[E(U fk+1 (Xn+k+1 )|Fn+k )|Xn ]
= E[U fk+1 (Xn+k+1 )|Xn ]
Now a monotone class argument allows the extension of equality (IV.7) to all AnX -
measurable and bounded random variables. 

Remark that Fn ∨ σ(Xn ) = Fn , so that the result of Proposition IV.2.3 combined with Proposition IV.1.2 gives the following abstract formulation of the so-called Markov property:

IV.2.4 Corollary. An (Fn )-adapted sequence X = (Xn ) is an (Fn )-Markov sequence if


and only if for any n ∈ N, the σ-fields AnX and Fn are independent conditionally on
σ(Xn ).

The Markov property can be formulated via a transition probability between two successive states, at least if the state space is a Polish space. (A reminder on transition probabilities is the object of Appendix C.)

IV.2.5 Proposition. Assume that (E, E ) is a Polish space. Let X be an (Fn )-


adapted sequence with values in E. X is an (Fn )-Markov sequence if and only if for
all n ∈ N there exists a transition probability (Markov Kernel) Qn from E to E such
that for all f ∈ E+ (or all f ∈ Eb ) one has:

E(f (Xn+1 )|Fn ) = Qn f (Xn ) a.s. (IV.8)

Proof. Let X be a sequence of r.v. with values in (E, E ), let Pn,n+1 be the law of the couple (Xn , Xn+1 ), that is the probability measure on (E × E, E ⊗ E ) image of P by (Xn , Xn+1 ), and let Pn be the law of Xn . In view of Theorem C.2.8 of the Appendix, there exists a product regular conditional probability (PRCP), say Qn , associated to Pn,n+1 . This means that Qn is a transition probability (or Markov kernel) from (E, E ) to (E, E ) such that for all (A, B) ∈ E × E one has:

Pn,n+1 (A × B) = ∫_A Qn (x, B) Pn (dx)    (IV.9)

In view of Proposition C.2.7, the RHS of equation IV.4 is equal to the RHS of equation IV.8. □

IV.2.2 Countable states, Transition matrix

In the rest of this chapter the set E denotes a finite or countable set, endowed with
its discrete σ-field, P(E). A nonnegative measure on E is a σ-additive measure on
(E, E ) with values in [0, +∞]. Since any nonnegative measure µ on E verifies µ(B) = Σ_{i∈B} µ({i}) for any B ∈ E , µ can be identified with a mapping µ : E → [0, +∞]. In our setting, µ is represented by a row matrix (µ(i)).
Dually, a numerical function f : E → R is usually represented by a column matrix
f = (f (j)). A nonnegative function f is a numerical function such that f ≥ 0. The set
of all nonnegative functions is denoted E+ . A bounded function f is a function such
that |f | ≤ c for some c ∈ R. The set of all bounded functions is denoted Eb.
A transition matrix (or stochastic matrix, or Markov matrix) on E is a mapping M : E × E → R+ such that Σ_{j∈E} M (i, j) = 1 for all i ∈ E. M thus represents a transition probability from E to E (see Appendix C). In the discrete setting that we consider in this chapter, M is usually denoted as a matrix M = (m_{ij}) where m_{ij} := M (i, j).
The product of transition matrices M and N is denoted M N and is defined by M N (i, j) = Σ_{k∈E} M (i, k)N (k, j). M N is a transition matrix. We put M^0 (i, j) := δ_{ij} (Kronecker convention). Thus M^0 is the identity matrix and we define by induction M^{n+1} := M M^n for n ∈ N. It follows that for all n, p ∈ N, M^{n+p} = M^n M^p = M^p M^n .
Let µ be a nonnegative measure on E. For a nonnegative function f : E → [0, +∞], the integral of f by µ is the nonnegative, finite or infinite, number µf := Σ_{i∈E} µ(i)f (i) ∈ [0, +∞]. (Note that the expectation µf is obtained as a usual matrix multiplication.) A numerical function f : E → R is µ-quasi-integrable if either µf+ < +∞ or µf− < +∞. The integral of f by µ is then µf = µf+ − µf− . f is µ-integrable if µf+ < +∞ and µf− < +∞, or equivalently µ|f | < +∞. We denote by L^1 (µ) the set of all µ-integrable functions and by L1 (µ) the set of equivalence classes of L^1 (µ) with respect to the equivalence relation defined by f ∼ g if µ({i | f (i) ≠ g(i)}) = 0.
The product µM (a row matrix) defines a nonnegative measure on E. This is the measure that assigns the value µM (j) := Σ_i µ(i)M (i, j) to the state j. The product M f (a column matrix), when it is well defined (e.g. f ∈ E+ ∪ Eb , or generally when f is quasi-integrable for all M (i, ·)), is a function; M f (i) will also be written M (i, f ).
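On a finite state space, these objects translate directly into linear algebra. The sketch below is not part of the original notes; the three-state matrix M, the vectors mu and f, and the use of numpy are illustrative assumptions.

import numpy as np

# A toy transition matrix M on E = {0, 1, 2}: rows sum to 1.
M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

mu = np.array([1.0, 0.0, 0.0])      # a probability on E, as a row matrix
f = np.array([0.0, 1.0, 2.0])       # a bounded function, as a column matrix

print(mu @ M)                        # the measure muM:  (muM)(j) = sum_i mu(i) M(i,j)
print(M @ f)                         # the function Mf:  Mf(i) = sum_j M(i,j) f(j)
print(np.linalg.matrix_power(M, 5))  # M^5, again a transition matrix
print(mu @ np.linalg.matrix_power(M, 5) @ f)   # mu M^5 f (for a homogeneous chain, E_mu[f(X_5)])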

In view of relation IV.6, and the discreteness of E one has the following:

IV.2.6 Proposition. Let X = (Xn ) be an Fn -adapted sequence with values in E. X


is an (Fn )-Markov sequence if and only if: for all n ≥ 0 and all i0 , . . . , in , in+1 ∈ E:

P(Xn+1 = in+1 |X0 = i0 , . . . , Xn = in ) = P(Xn+1 = in+1 |Xn = in ) (IV.10)



(Equality must be understood as verified when both members are defined)

IV.2.7 Proposition. Let X be an (Fn )-adapted sequence. X is an (Fn )-Markov


sequence if and only if for any n ∈ N there exists a transition matrix Mn on E such
that for all f ∈ E+ (or all f ∈ Eb ) one has:

E(f (Xn+1 )|Fn ) = Mn f (Xn ) a.s. (IV.11)

Proof. Fix an arbitrary probability π on E. For n ∈ N and (i, j) ∈ E × E put:

Mn (i, j) = P(Xn+1 = j|Xn = i)   if P(Xn = i) > 0
Mn (i, j) = π(j)                 if P(Xn = i) = 0        (IV.12)

Assume that property (IV.4) is satisfied; then property (IV.11) is satisfied for the transition matrix Mn defined by (IV.12). The converse is trivial. □
The matrix Mn is not uniquely defined since for x ∈ E such that P(Xn = x) = 0, Mn (x, ·) can be any arbitrary probability measure on E. A family of matrices (Mn , n ∈ N) with
the property stated in the proposition is called a system of Markov matrices associated
to the Markov sequence.

IV.2.8 Proposition. Let X be an (Fn )-Markov sequence, let (Mn , n ∈ N) be an associated system of Markov matrices, and let µ be the initial distribution (the law of X0 ). Then for any f ∈ E+ ∪ Eb , n ≥ 0, p ≥ 1 one has:

E(f (Xn+p )|Fn ) = Mn,n+p f (Xn ) a.s. (IV.13)

where Mn,n+p = Mn · · · Mn+p−1 . Moreover the law of Xp is µM0,p .

Proof. By induction on p. (IV.13) is true by definition for p = 1 and all f ∈ E+ . Assume that E[f (Xn+p−1 )|Fn ] = Mn,n+p−1 f (Xn ) for all f ∈ E+ . By property (IV.11), EFn [f (Xn+p )] = EFn [EFn+p−1 [f (Xn+p )]] = EFn [Mn+p−1 f (Xn+p−1 )]. Now, applying the induction hypothesis to the function Mn+p−1 f (which is in E+ ), the last expression is equal to Mn,n+p−1 Mn+p−1 f (Xn ) = Mn,n+p f (Xn ). Thus the property is proved. In particular E[f (Xp )] = E[EF0 [f (Xp )]] = E[M0,p f (X0 )] = µM0,p f , from which it follows that the law of Xp is µM0,p . □

IV.2.9 Proposition. Let X be an (Fn )-Markov sequence with an associated system of Markov matrices (Mn , n ∈ N) and initial distribution µ. Then for any n ≥ 0 and for any i0 , i1 , . . . , in ∈ E one has:

P(X0 = i0 , . . . , Xn = in ) = µ(i0 ) M0 (i0 , i1 ) · · · Mn−1 (in−1 , in ) (IV.14)



IV.2.3 Homogeneous Markov sequence

IV.2.10 Definition. A sequence (Xn ) with values in the countable set E is a homogeneous (Fn )-Markov sequence if there exists a transition matrix M on E such that for all f ∈ E+ (or all f ∈ Eb ) and for all n ∈ N, one has:
E(f (Xn+1 )|Fn ) = M f (Xn ) a.s. (IV.15)

M is the transition matrix and the probability µ defined by µ(i) = P(X0 = i) is the
initial distribution of the Markov sequence.

Equivalently, one can replace in definition IV.2.10 the relation (IV.15) by the fol-
lowing: for all n ∈ N and all j ∈ E:
P(Xn+1 = j|Fn ) = M (Xn , j) a.s. (IV.16)

In the rest of this chapter we deal only with homogeneous Markov sequences, there-
fore in the sequel Markov sequence will mean homogeneous Markov sequence.

IV.2.11 Proposition. (i) If (Xn ) is an (Fn )-Markov sequence with transition matrix
M and initial distribution µ then for all n ∈ N and all i0 , . . . , in ∈ E one has:
P(X0 = i0 , . . . , Xn = in ) = µ(i0 )M (i0 , i1 ) · · · M (in−1 , in ) (IV.17)

(ii) Conversely if (IV.17) is verified for some µ and M , then (Xn ) is a Markov sequence
(w.r.t (FnX )) with transition Matrix M and initial distribution µ.

IV.2.12 Proposition. If (Xn ) is an (Fn )-Markov sequence with transition matrix M


and initial distribution µ then we have for all f ∈ E+ ∪ Eb and for all n, p ∈ N:
E(f (Xn+p )|Fn ) = M p f (Xn ) a.s. (IV.18)

Moreover the law of Xn is µM n .
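The identity "law of Xn equals µM^n" can be checked by simulation. The sketch below is not part of the original notes; the chain (M, µ), the helper sample_path and the use of numpy are illustrative assumptions.

import numpy as np

# Compare the empirical law of X_n (over many simulated paths) with mu M^n.
rng = np.random.default_rng(1)
M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
mu = np.array([1.0, 0.0, 0.0])

def sample_path(M, mu, n, rng):
    """Return (X_0, ..., X_n) sampled from the homogeneous chain (mu, M)."""
    states = np.arange(M.shape[0])
    x = rng.choice(states, p=mu)
    path = [x]
    for _ in range(n):
        x = rng.choice(states, p=M[x])   # next state drawn from the row M(x, .)
        path.append(x)
    return path

n, n_paths = 6, 50_000
hits = np.zeros(M.shape[0])
for _ in range(n_paths):
    hits[sample_path(M, mu, n, rng)[-1]] += 1

print("empirical law of X_n:", hits / n_paths)
print("mu M^n              :", mu @ np.linalg.matrix_power(M, n))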

IV.2.4 The canonical setting

Let E be a finite or countable set endowed with its discrete σ-field E . The infinite
product E N , that is the set of all sequences in E, will be endowed with the product
σ-field E ⊗N and the filtration obtained as follows. Let πn be the projector of E N on its
n-th coordinate. We put En := σ(π0 , . . . , πn ). Clearly E ⊗N = ∨n≥0 En . The sequence
(πn ) is said to be the canonical sequence associated to E. The term (E N , E ⊗N , En , P) is
said to be a canonical process on E .
Let (Ω, F , P, (Xn )) be a measurable process with values in (E, E ). Put X = (Xn ),
then X : Ω → E N is F /E ⊗N -measurable. The image of P by X denoted PX is called the

law (or the distribution) of X. The process (E N , E ⊗N , PX , πn ) is called the canonical


realization of (Ω, F , P, (Xn )). Two processes (Ω, F , P, Xn ) and (Ω′ , F ′ , P′ , Xn′ ) with values in (E, E ) are said to be equivalent if PX = P′X ′ ; this is equivalent to saying that they have the same finite dimensional distributions, precisely: for all n ∈ N, all (i0 , . . . , in ) ∈
E n+1 :
P(X0 = i0 , . . . , Xn = in ) = P′ (X0′ = i0 , . . . , Xn′ = in ) (IV.19)

Since Xn = πn ◦ X, we have:

P(X0 = i0 , . . . , Xn = in ) = PX (π0 = i0 , . . . , πn = in ) (IV.20)

It follows that (Ω, F , P, Xn ) and (Ω′ , F ′ , P′ , Xn′ ) are equivalent if and only if they
have the same canonical realization. It is easy to see that the canonical realization
(E N , E ⊗N , PX , πn ) of any (Fn )-Markov sequence is a Markov sequence. In view of
proposition IV.2.11 we can state:

IV.2.13 Proposition. Two Markov sequences (Xn ) and (Xn′ ) (each defined on some
filtered space) are equivalent if and only if they have the same initial distribution and
the same transition matrix.

IV.2.14 Proposition. Given any probability µ and any transition matrix M on E,


there exists a (unique) canonical Markov sequence with initial distribution µ and tran-
sition matrix M

Proof. Using formula (IV.17), one can define a family of probability measures Qn on En := E^{{0,...,n}} , with its discrete σ-field B̃n . For n ≤ m, Qn is the image of Qm by the projection πnm of Em on En . It follows that the system ((En , Qn ), n ≥ 0) is consistent, so that one can apply a probability extension theorem of the Kolmogorov type (Neveu, Calcul des Probabilités, section III-3) to conclude that there exists a unique probability measure Q on E N such that Qn is the image of Q by the projection (π0 , . . . , πn ). One can also apply the Ionescu-Tulcea theorem (Neveu, Calcul des Probabilités, Proposition V-1-1) for the same purpose. □

IV.3 Markov chain


In the rest of this chapter we assume that we are given a measurable space (Ω, F ) and a sequence of F -measurable functions (Xn ) with values in E. Let Fn = σ(X0 , . . . , Xn ) (n ∈ N); we assume that F = ∨_{n∈N} Fn . We also assume the existence of a translation (or shift) operator on Ω, that is θ : Ω → Ω such that Xn+1 = Xn ◦ θ, and that we are given a family of probabilities (Pi , i ∈ E) on (Ω, F ), the expectation operator of which is denoted Ei .

IV.3.1 Definition. Let M be a transition matrix on E. The array X = (Ω, (Xn ), θ, (Pi ))
as described above is said to be a Markov chain with transition matrix M on E if for all i ∈ E the following properties hold:
(i) Pi (X0 = i) = 1
(ii) For all n ∈ N, and f ∈ E+ (or f ∈ Eb ) one has:
Ei (f (Xn+1 )|Fn ) = M f (Xn ) Pi a.s. (IV.21)

IV.3.2 Definition. Let M be a transition matrix on E. The canonical Markov chain associated to M is the one obtained in the canonical setting, that is: Ω = E N , F is the product σ-field, Xn is the nth projection of Ω on E (n ∈ N), Fn = σ(X0 , . . . , Xn ) (n ∈ N), and the translation operator is the map θ defined by θ(ω0 , ω1 , ω2 , . . .) = (ω1 , ω2 , ω3 , . . .).
It follows from definition IV.3.1 that a Markov chain is a sequence (Xn ) that has
the Markov property relatively to all Pi , (i ∈ E), and for the same transition matrix.
For any probability µ on E, we denote by Pµ the probability measure on Ω defined by Pµ (B) = Σ_{i∈E} µ(i)Pi (B) (B ∈ F ). The corresponding expectation operator will
be denoted Eµ . It is easy to verify that for a Markov chain as in definition IV.3.1,
relatively to the probability Pµ the sequence (Xn ) is a Markov sequence with matrix
M and initial distribution µ.

Translation operator. The translation ( or shift) operator θ will play a major role
in the reformulation of the Markov property. Let App+n = σ(Xp , . . . , Xp+n ) (p, n ∈ N)
and Ap = ∨n≥0 App+n . Ap is known as the σ-field of future events at time p. We put
θ0 = IdΩ and for any n ≥ 0 we put θn+1 := θ ◦ θn . For any B0 , . . . , Bn ⊂ E, the set
{ω|X0 ◦ θp (ω) ∈ B0 , . . . , Xn ◦ θp (ω) ∈ Bn } = {ω|Xp (ω) ∈ B0 , . . . , Xn+p (ω) ∈ Bn }. It
follows that θp−1 (Fn ) = σ(Xp , . . . , Xn+p ) = App+n so that in particular θp : (Ω, F ) →
(Ω, F ) is measurable and Ap = θp−1 (F ) that is, Ap is the σ-field generated by θp . In
view of Lemma I.2.4, a real random variable Y ′ is Ap -measurable if and only if there
exists an F -measurable r.v. Y such that Y ′ = Y ◦ θp . Homogeneity and the existence of a translation operator allow more precise relations than equality (IV.7).

Markov property: formulation with translation operator

The Markov property for a Markov chain can be reformulated nicely using the existence
of a rich family of probability measures for which the sequence is markovian and the
translation operator. We start by remarking that the formula IV.15 can be expressed
as follows: For any µ ∈ ∆(E), any f ∈ E+ , any n ∈ N:
Eµ (f (X1 ◦ θn )|Fn ) = EXn (f (X1 )) Pµ a.s. (IV.22)

It is important to note that the right hand side of this equation is a random variable
equal, for Pµ -almost all ω, to the (unconditional !) expectation EXn (ω) (f (X1 )). The
following proposition extends the operator formulation of the Markov property to all
future random variables. This extension is similar to the one stated in Proposition
IV.2.3.

IV.3.3 Proposition (Markov Property formulation with shift operator). Let X be


a Markov chain with transition matrix M . For any probability µ on E, the sequence
(Xn ) is a Markov sequence for the probability space (Ω, F , Pµ ) with transition matrix
M and initial distribution µ. Moreover, for any F -measurable function Φ on Ω that is
bounded or nonnegative and any n ∈ N one has:
Eµ (Φ ◦ θn |Fn ) = EXn (Φ) Pµ a.s. (IV.23)

Note that in the RHS of formula (IV.23), EXn (Φ) is the r.v. ω → EXn (ω) (Φ).
First proof. By the remarks preceding the proposition, the result appears as a refor-
mulation of Proposition IV.2.3 if we remark that the RHS is precisely E(Φ|Xn )
Second proof. We first prove the formula for Px . We want to prove that for any Z that
is bounded and Fn -measurable we have:

Ex (ZΦ ◦ θn ) = Ex (ZEXn (Φ)) (IV.24)

It will be enough to prove this equality for Φ of the form f0 (X0 ) . . . fp (Xp ) where
f0 , . . . , fp ∈ Eb since then, by the monotone class argument, it will be valid for all
bounded F - measurable Φ. The proof is by induction on p. For p = 0 Ex (Zf0 (X0 ) ◦
θn ) = Ex (Zf0 (Xn )). Since Ey (f0 (X0 )) = f0 (y) one has EXn (f0 (X0 )) = f0 (Xn ) so that
equality IV.24 is true in this case. Let U = f0 (X0 ) . . . fp−1 (Xp−1 ) and V = U M fp (Xp−1 ). Then:
Ex (ZΦ ◦ θn ) = Ex (Z U ◦ θn fp (Xp+n ))
= Ex [Ex (Z U ◦ θn fp (Xp+n )|Fp+n−1 )]
= Ex [Z U ◦ θn Ex (fp (Xp+n )|Fp+n−1 )]
= Ex [Z U ◦ θn M fp (Xp+n−1 )]
= Ex [Z V ◦ θn ]
= Ex [Z EXn (V )]   (by the induction hypothesis applied to V )
In particular, taking Z = 1 and n = 0, we have Ex (Φ) = Ex (V ). Since this equality is true for all x ∈ E, one can replace EXn (V ) by EXn (Φ) in the last term of the above equalities. This proves IV.24. □

Let T be a stopping time for the filtration (Fn ) then (T < +∞) ∈ F and one can
define on (T < +∞) the operator θT as follows:

θT (ω) = θn (ω) on (T = n) (IV.25)



If Φ is an F -measurable r.v., then Φ ◦ θT is F -measurable on (T < +∞). Indeed for


any B ∈ B, (Φ ◦ θT ∈ B) ∩ (T < +∞) = ∪n∈N (Φ ◦ θT ∈ B) ∩ (T = n) = ∪n∈N (Φ ◦ θn ∈
B) ∩ (T = n) ∈ F

IV.3.4 Proposition (Strong Markov Property). For any F -measurable function Φ


on Ω that is nonnegative or bounded, and any stopping time T one has:

Eµ (Φ ◦ θT |FT ) = EXT (Φ) Pµ a.s. on (T < +∞) (IV.26)

The LHS of the equation must be understood as the restriction of Eµ (1(T <+∞) Φ ◦
θT |FT ) to (T < +∞) (see remark I.2.2). An alternative way to write this equality is
therefore:

Eµ (1(T <+∞) Φ ◦ θT |FT ) = 1(T <+∞) EXT (Φ) Pµ a.s. (IV.27)

In particular taking Φ = f (Xm ) in (IV.26) and applying Proposition IV.2.12 one has:

Eµ (f (XT +m )|FT ) = EXT (f (Xm )) = M m f (XT ) Pµ a.s. on (T < +∞) (IV.28)

IV.3.1 Calculations on a Markov chain


Definitions and Notations.

We shall use the following notations:
M^n = (m_{ij}^{(n)}); in particular m_{ij}^{(0)} = δ_{ij} (Kronecker notation) and m_{ij}^{(1)} = m_{ij} .
τ_i := inf(m ≥ 0 | X_m = i). τ_i is the hitting time of i.
σ_i := inf(m ≥ 1 | X_m = i). σ_i is the return time to i.
By induction define a sequence (σ_i^n , n ≥ 1) by: σ_i^1 := σ_i and, for n ≥ 1, σ_i^{n+1} := inf(m > σ_i^n | X_m = i).
It is easily seen that τ_i and σ_i^n are stopping times for the filtration (Fn ) and that:

σ_i = 1 + τ_i ◦ θ    (IV.29)
σ_i^{n+1} = σ_i + σ_i^n ◦ θ_{σ_i}   on (σ_i < +∞)    (IV.30)
          = σ_i^n + σ_i ◦ θ_{σ_i^n}   on (σ_i^n < +∞)    (IV.31)

f_{ij} := Pi (σ_j < +∞)
f_{ij}^{(n)} := Pi (σ_j^n < +∞)   (n ≥ 1)
s_{ij}^k := Pi (σ_j = k)   (k ∈ N∗ ), so that s_{ij}^∞ = 1 − f_{ij}
L_i := Ei (σ_i )
N_i := Σ_{n≥0} 1_{i} (Xn ). N_i is called the number of visits to state i.
u_{ij} := Ei (N_j )
U := (u_{ij} ) is called the potential matrix.

IV.3.5 Proposition. For all i, j ∈ E, one has:
(i) m_{ij}^{(p)} = Pi (Xp = j) for all p ∈ N,
(ii) u_{ij} = Σ_{n≥0} m_{ij}^{(n)} and therefore U = Σ_{n≥0} M^n .

Proof. (i) is a consequence of Proposition IV.2.12 applied to the particular case E = Ei , n = 0, f = 1_{j} . For assertion (ii), remark that u_{ij} = Σ_{n≥0} Ei (1_{j} ◦ Xn ) = Σ_{n≥0} Pi (Xn = j). □
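For a finite chain, assertion (ii) can be checked numerically by truncating the series Σ_n M^n. The sketch below is not part of the original notes; the three-state matrix with an absorbing state, the truncation level N and the use of numpy are illustrative assumptions.

import numpy as np

# Illustrates u_ij = sum_n m_ij^(n) for the transient states 0, 1 of a chain
# in which state 2 is absorbing.
M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

N = 500
U_trunc = sum(np.linalg.matrix_power(M, n) for n in range(N + 1))
# Note: u_i2 = +infinity (state 2 is absorbing and reachable), so that column grows with N.

# For the transient block Q = M restricted to {0,1}, the series sums to (I - Q)^{-1}.
Q = M[:2, :2]
U_exact = np.linalg.inv(np.eye(2) - Q)

print(U_trunc[:2, :2])   # approximately equal to U_exact
print(U_exact)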


Law of return to some state

IV.3.6 Proposition. (i) For any j ∈ E and p ∈ N∗ , σ_j^p is a stopping time for the filtration (Fn ). (ii) For any i, j ∈ E, the sequence (σ_j^p , p ∈ N∗ ) is a homogeneous Markov sequence with values in N∗ ∪ {+∞} for the filtered space (Ω, A , Pi , (F_{σ_j^p})_{p≥1} ), with a transition matrix Q^{(j)} that depends on j and not on i, and initial distribution µ equal to the law of σ_j under Pi , precisely: µ(k) = Pi (σ_j = k) ≡ s_{ij}^k (k ∈ N∗ ).
Proof. The proof that σ_j^p is a stopping time is left to the reader. Put B_p = F_{σ_j^p} . Let i ∈ E and let f : N∗ ∪ {+∞} → R be a bounded function. For any k ∈ N∗ , E_i^{B_p}(f (σ_j^{p+1} )) = E_i^{F_k}(f (k + σ_j ◦ θ^k )) on (σ_j^p = k), and E_i^{B_p}(f (σ_j^{p+1} )) = f (+∞) on (σ_j^p = ∞). Thus we have the equality:

E_i^{B_p}(f (σ_j^{p+1} )) = Σ_{k∈N∗} 1_{(σ_j^p = k)} E_i^{F_k}( f (k + σ_j ◦ θ^k ) ) + 1_{(σ_j^p = ∞)} f (∞)

Applying the Markov property and remarking that on (σ_j^p = k), X_k = j, for all k ∈ N∗ :

E_i^{B_p}(f (σ_j^{p+1} )) = Σ_{k∈N∗} 1_{(σ_j^p = k)} E_{X_k}( f (k + σ_j ) ) + 1_{(σ_j^p = ∞)} f (∞)
                          = Σ_{k∈N∗} 1_{(σ_j^p = k)} E_j ( f (k + σ_j ) ) + 1_{(σ_j^p = ∞)} f (∞)

Therefore E_i^{B_p}( f (σ_j^{p+1} ) ) = Qf (σ_j^p ) where Q is the following transition matrix:

Q(k, l) = s_{jj}^{l−k}   if k, l ∈ N∗ , l − k ≥ 1
Q(k, +∞) = s_{jj}^{∞} = 1 − f_{jj}   if k ∈ N∗
Q(+∞, +∞) = 1    (IV.32)
Q(k, l) = 0   otherwise    □


IV.3.7 Corollary. For any i, j ∈ E, p ≥ 1: f_{ij}^{(p+1)} = f_{ij}^{(p)} f_{jj} .
Therefore: f_{ij}^{(p)} = f_{ij} (f_{jj} )^{p−1} (i ≠ j), and f_{jj}^{(p)} = (f_{jj} )^p .

Proof. Using the notations of Proposition IV.3.6, we see that for k ∈ N∗ , Q1_{N∗} (k) = Pj (σ_j < +∞) and Q1_{N∗} (∞) = 0. Since (σ_j^p , p ∈ N∗ ) is a Markov sequence for the probability Pi and matrix Q, one has: E_i^{B_p}( 1_{(σ_j^{p+1} < +∞)} ) = E_i^{B_p}( 1_{N∗} (σ_j^{p+1} ) ) = Q1_{N∗} (σ_j^p ) = 1_{(σ_j^p < +∞)} Pj (σ_j < +∞). By taking expectations, this implies Pi (σ_j^{p+1} < +∞) = Pi (σ_j^p < +∞) Pj (σ_j < +∞). In particular if i = j, Pj (σ_j^{p+1} < +∞) = Pj (σ_j^p < +∞) Pj (σ_j < +∞). Therefore f_{jj}^{(p)} = (f_{jj} )^p and consequently f_{ij}^{(p)} = f_{ij} (f_{jj} )^{p−1} . □


Number of visits to a state

IV.3.8 Proposition. The number of visits N_j to state j is a r.v. with values in N ∪ {∞} that has the following law, where we take i ≠ j:

If f_{jj} < 1:
Pi (N_j = k) = 1 − f_{ij}   if k = 0
            = f_{ij} (1 − f_{jj} )(f_{jj} )^{k−1}   if k ∈ N∗    (IV.33)
            = 0   if k = ∞

Pj (N_j = k) = 0   if k = 0
            = (1 − f_{jj} )(f_{jj} )^{k−1}   if k ∈ N∗    (IV.34)
            = 0   if k = ∞

If f_{jj} = 1:
Pi (N_j = k) = 1 − f_{ij}   if k = 0
            = 0   if k ∈ N∗    (IV.35)
            = f_{ij}   if k = ∞

Pj (N_j = k) = 0   if k = 0
            = 0   if k ∈ N∗    (IV.36)
            = 1   if k = ∞

Proof. a) Proof of the case i ≠ j. On (X0 = i) one has (N_j = 0) = (σ_j = +∞), therefore Pi (N_j = 0) = 1 − f_{ij} , and for p ∈ N∗ one has (N_j = p) = (σ_j^p < +∞) ∩ (σ_j^{p+1} = +∞). Therefore Pi (N_j = p) = Pi [(σ_j^p < +∞) ∩ (σ_j^{p+1} = +∞)]. Writing σ_j^{p+1} = σ_j^p + σ_j ◦ θ_{σ_j^p} on (σ_j^p < +∞), it follows that:

Pi (N_j = p) = Pi [(σ_j^p < +∞) ∩ (σ_j ◦ θ_{σ_j^p} = +∞)]
= Ei [ E_i^{F_{σ_j^p}}( 1_{(σ_j^p < +∞)} 1_{(σ_j ◦ θ_{σ_j^p} = +∞)} ) ]
= Ei [ 1_{(σ_j^p < +∞)} E_i^{F_{σ_j^p}}( 1_{(σ_j ◦ θ_{σ_j^p} = +∞)} ) ]
= Ei [ 1_{(σ_j^p < +∞)} Ej (1_{(σ_j = +∞)} ) ]   (by the strong Markov property)
= f_{ij}^{(p)} (1 − f_{jj} )
= f_{ij} (f_{jj} )^{p−1} (1 − f_{jj} )   (in view of corollary IV.3.7)

Also Pi (N_j = ∞) = Pi ( ∩_{k≥1} (σ_j^k < +∞) ) = lim_k Pi (σ_j^k < +∞) = lim_k f_{ij} (f_{jj} )^{k−1} (corollary IV.3.7 and monotone convergence). Pi (N_j = ∞) is thus equal to 0 if f_{jj} < 1 and equal to f_{ij} if f_{jj} = 1.
b) Proof of the case i = j (sketch). (X0 = j) ∩ (Nj = 0) = ∅. Thus Pj (Nj = 0) = 0.
On (X0 = j) we have: (Nj = 1) = (σj = +∞) and for p ≥ 2 (Nj = p) = (σjp−1 <
+∞) ∩ (σjp = +∞). The same method as in the first paragraph leads to the announced
results.


IV.3.9 Proposition. For any i, j ∈ E:

u_{ij} = δ_{ij} + Σ_{n≥1} f_{ij}^{(n)}    (IV.37)

It follows that for i ≠ j:

u_{ij} = f_{ij} u_{jj}    (IV.38)

and:

u_{jj} = (1 − f_{jj} )^{−1}   if f_{jj} < 1;   u_{jj} = +∞   if f_{jj} = 1    (IV.39)

With the conventions 1/0 = +∞ and 0 · (+∞) = 0.
Proof. Method 1. Since N_j is integer valued, u_{ij} ≡ Ei (N_j ) = Σ_{n≥1} Pi (N_j ≥ n). On (X0 ≠ j), we have (N_j ≥ n) = (σ_j^n < +∞) while on (X0 = j) we have, for n ≥ 2, (N_j ≥ n) = (σ_j^{n−1} < +∞). It follows that if i ≠ j, Pi (N_j ≥ n) = Pi (σ_j^n < +∞) = f_{ij}^{(n)} , and if i = j and n ≥ 2, Pj (N_j ≥ n) = Pj (σ_j^{n−1} < +∞) = f_{jj}^{(n−1)} ; also Pj (N_j ≥ 1) = 1. This ends the proof of the first part. Applying corollary IV.3.7 yields the two other formulae.
Method 2. Use the formula giving the law of N_j in the preceding proposition to compute u_{ij} = Ei (N_j ). □
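When j is transient, the relations (IV.38)–(IV.39) can be checked by simulation. The sketch below is not part of the original notes; the absorbing-state chain, the truncation horizon and the use of numpy are illustrative assumptions, and f_ij, f_jj and E_i(N_j) are all estimated by Monte Carlo.

import numpy as np

# Monte Carlo check of u_ij = f_ij / (1 - f_jj) for a chain where state 2 is absorbing,
# so that j = 1 is transient.
rng = np.random.default_rng(4)
M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])
i, j, n_paths, horizon = 0, 1, 50_000, 400

def run(start):
    """Return (number of visits to j after time 0, whether j was reached after time 0)."""
    x, visits = start, 0
    for _ in range(horizon):            # absorption happens long before this horizon
        x = rng.choice(3, p=M[x])
        visits += (x == j)
    return visits, visits > 0

visits_i = np.array([run(i)[0] for _ in range(n_paths)])   # N_j under P_i (i != j)
hits_j = np.array([run(j)[1] for _ in range(n_paths)])     # (sigma_j < +inf) under P_j
f_ij = np.mean(visits_i > 0)
f_jj = np.mean(hits_j)
print("E_i(N_j) simulated :", visits_i.mean())
print("f_ij/(1 - f_jj)    :", f_ij / (1 - f_jj))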


IV.3.2 First classification: Communication classes

IV.3.10 Definition. Let i, j ∈ E. i leads to j (denoted i → j) if i = j or Pi (σj <


+∞) > 0. i communicates with j (denoted i ∼ j) if i → j and j → i.

IV.3.11 Proposition. (i) i → j if and only if u_{ij} > 0, or equivalently if and only if there exists n ≥ 0 such that m_{ij}^{(n)} > 0.
(ii) The binary relation → defined on E is reflexive and transitive (preorder).
(iii) The binary relation ∼ defined on E is an equivalence relation.

Proof. (i) If i = j then u_{ii} ≥ m_{ii}^{(0)} = 1. Let i ≠ j and Pi (σ_j < +∞) > 0. Then for some n ≥ 1, Pi (σ_j = n) > 0. Thus u_{ij} ≥ m_{ij}^{(n)} = Pi (Xn = j) ≥ Pi (σ_j = n) > 0. Conversely, if u_{ij} > 0 and i ≠ j then for some n ≥ 1, m_{ij}^{(n)} > 0. Thus Pi (σ_j < +∞) ≥ Pi (σ_j ≤ n) ≥ Pi (Xn = j) = m_{ij}^{(n)} > 0.
(ii) If i → j and j → k then for some n, p ≥ 0, m_{ij}^{(n)} > 0 and m_{jk}^{(p)} > 0. It follows that m_{ik}^{(n+p)} = Σ_{ℓ∈E} m_{iℓ}^{(n)} m_{ℓk}^{(p)} ≥ m_{ij}^{(n)} m_{jk}^{(p)} > 0. Thus i → k.
(iii) The binary relation ∼ is clearly reflexive, symmetric and transitive. □


IV.3.12 Definition. The equivalence classes of E with respect to the equivalence


relation ∼ are called the communication classes of the Markov chain. A chain with
only one communication class is said to be irreducible

Let C be the set of communication classes; equivalently C = E/∼. Endowed with the induced (quotient) relation → (c → c′ if and only if there exist x ∈ c, x′ ∈ c′ with x → x′ ), the set (C , →) is a partially ordered set.

IV.3.13 Remark. If i0 → i1 → · · · → in , and if i0 and in are in the same communi-


cation class C then i0 , i1 , . . . , in belong to C. Put otherwise: If the chain leaves some
communication class C then with probability 1 it will never return to C.

IV.3.3 Second Classification: recurrence and transience

IV.3.14 Theorem. We have the equivalences:

Pi (σi < +∞) = 1 ⇐⇒ uii = +∞ ⇐⇒ Pi (Ni = +∞) = 1 (IV.40)

Pi (σi < +∞) < 1 ⇐⇒ uii < +∞ ⇐⇒ Pi (Ni = +∞) = 0 (IV.41)


Proof. In view of Corollary IV.3.7 and Proposition IV.3.9 we have f_{ii}^{(n)} = (f_{ii} )^n and u_{ii} = Σ_{n≥0} (f_{ii} )^n . Thus f_{ii} = 1 is equivalent to u_{ii} = +∞. On the other hand (N_i = +∞) = ∩_n (σ_i^n < +∞), so that Pi (N_i = +∞) = lim_n f_{ii}^{(n)} = lim_n (f_{ii} )^n . Thus when i is recurrent Pi (N_i = +∞) = 1 and when it is transient Pi (N_i = +∞) = 0. □


IV.3.15 Definition. A state i is said to be recurrent if Pi (σi < +∞) = 1 and transient
otherwise.

IV.3.16 Proposition. Recurrence and transience are class properties: In a commu-


nication class either all the states are recurrent or all the states are transient.

Proof. Let j ∼ i, j ≠ i. There exist n0 , p0 ∈ N such that b := m_{ij}^{(n0)} > 0 and a := m_{ji}^{(p0)} > 0. It follows that for any l ≥ 0, m_{jj}^{(p0+l+n0)} ≥ a b m_{ii}^{(l)} . Thus u_{jj} ≡ Σ_{n≥0} m_{jj}^{(n)} ≥ ab Σ_{n≥0} m_{ii}^{(n)} = ab u_{ii} . Thus if i is recurrent (that is u_{ii} = +∞) then j is recurrent (that is u_{jj} = +∞). By symmetry, i is recurrent if and only if j is recurrent, and i is transient if and only if j is transient. Thus recurrence and transience are class properties. □

IV.3.17 Proposition. In any Markov chain we have the following:


(i) If i is recurrent and i → j then: i ∼ j, j is recurrent, fij = 1, Pi (Nj = +∞) = 1
and uij = +∞
(ii) If j is transient and i is arbitrary then: uij < +∞ and Pi (Nj = +∞) = 0.

Proof. We first remark that for any i, j, (σ_j < +∞) ∩ (σ_i ◦ θ_{σ_j} = +∞) ⊂ (N_i < +∞). By the strong Markov property: Pi [(σ_j < +∞) ∩ (σ_i ◦ θ_{σ_j} = +∞)] = Pi (σ_j < +∞) Pj (σ_i = +∞). Thus f_{ij} (1 − f_{ji} ) ≤ Pi (N_i < +∞).

(i) If i → j and i is recurrent we have f_{ij} > 0 and Pi (N_i < +∞) = 0, therefore 1 − f_{ji} = 0, consequently j → i and finally i ∼ j. In view of IV.3.16 we conclude that j is recurrent. Now since j is recurrent and j → i, by symmetry f_{ij} = 1 and, in view of Proposition IV.3.9, u_{ij} = +∞. Now, in view of Corollary IV.3.7, f_{ij}^{(n)} = f_{ij} (f_{jj} )^{n−1} = 1; therefore Pi (N_j = +∞) = Pi (∩_n (σ_j^n < +∞)) = lim_{n→+∞} f_{ij}^{(n)} = 1.
(ii) If j is transient then f_{jj} < 1. In view of Corollary IV.3.7, f_{ij}^{(n)} = f_{ij} (f_{jj} )^{n−1} ; therefore Pi (N_j = +∞) = Pi (∩_n (σ_j^n < +∞)) = lim_{n→+∞} f_{ij}^{(n)} = 0. On the other hand, in view of (IV.38) and (IV.39), u_{jj} < +∞ and u_{ij} < +∞. □

IV.3.18 Remark. A communication class C is a maximal element in the partially


ordered set (C , →) if and only if any state in C can lead only to states in C, equivalently
if Pi (Xn ∈ C, ∀n ∈ N) = 1 for every i ∈ C. One can consider the chain restricted to C
with Markov matrix MC . The restricted chain is clearly irreducible and it is recurrent
(resp. transient) if and only if C is recurrent (resp. transient) in the original chain.

It follows from Proposition IV.3.17 (i) that any recurrent communication class is
a maximal element in the partially ordered set (C , →), the converse being not true,
since there exist irreducible transient chains. However one has the following in the case
where the number of states is finite :

IV.3.19 Theorem. Any Markov chain with a finite number of states has at least one
recurrent state. A communication class is recurrent if and only if it is a maximal
element of the p.o.s. (C , →).
Proof. Σ_{j∈E} u_{ij} = Σ_{j∈E} Ei (N_j ) = Σ_{j∈E} Σ_{n≥0} Ei (1_{j} ◦ Xn ) = Σ_{n≥0} 1 = +∞. Since E is finite, there exists some j such that u_{ij} = +∞. In view of (IV.38), u_{ij} = f_{ij} u_{jj} (the case j = i being trivial). Therefore u_{jj} = +∞ and we conclude that j is a recurrent state. Now if C is a maximal class then the chain restricted to C is irreducible and has finitely many states. It follows by the first part of the proof that this chain is recurrent and therefore, in view of remark IV.3.18, that C is recurrent. □
If the set E is not finite a recurrent state may not exist.

IV.3.20 Example. Let E = Z and mij = 1 if j = i + 1 and mij = 0 if j 6= i + 1. If


the chain starts at i0 , then it describes the trajectory ω = (i0 , i0 + 1, i0 + 2, . . .), that
is Pi0 (ω) = 1 so that fi0 ,i0 = 0. The state i0 is transient. The chain has no recurrent
state.

IV.3.4 Independence of Markov excursions

Assume that i is a recurrent state, then Pi (σ_i < +∞) = 1. It follows that θ_{σ_i^k} is defined, for all k ≥ 1, on a set of Pi -measure 1. In the following we prove that F_{σ_i} , θ_{σ_i^1}^{−1}(F_{σ_i} ), . . . , θ_{σ_i^n}^{−1}(F_{σ_i} ) are independent under Pi . Precisely:

IV.3.21 Theorem. Let i be a recurrent state, let Y0 , . . . , Yn be a family of F_{σ_i} -measurable random variables with values in some measurable spaces (T0 , T0 ), . . . , (Tn , Tn ) respectively and let ρ0 , . . . , ρn be their respective laws under Pi . Then under Pi , the random variables Y0 , . . . , Yk ◦ θ_{σ_i^k} , . . . , Yn ◦ θ_{σ_i^n} (well defined since the σ_i^k are Pi -a.s. finite) are independent and their laws are respectively ρ0 , . . . , ρn .

Proof. For any Tk -measurable and bounded real functions fk : Tk → R (k = 0, . . . , n) we have:

Ei [f0 (Y0 ) f1 (Y1 ◦ θ_{σ_i^1} ) · · · fn (Yn ◦ θ_{σ_i^n} )]
= Ei [f0 (Y0 ) ( f1 (Y1 ) · · · fn (Yn ◦ θ_{σ_i^{n−1}} ) ) ◦ θ_{σ_i} ]
= Ei [f0 (Y0 ) Ei [ ( f1 (Y1 ) · · · fn (Yn ◦ θ_{σ_i^{n−1}} ) ) ◦ θ_{σ_i} | F_{σ_i} ]]
= Ei [f0 (Y0 )] Ei [f1 (Y1 ) · · · fn (Yn ◦ θ_{σ_i^{n−1}} )]

where we used the fact that Y0 is F_{σ_i} -measurable, the strong Markov property and the fact that X_{σ_i} = i. By induction this expectation is equal to Ei [f0 (Y0 )] · · · Ei [fn (Yn )]. □

IV.3.22 Corollary. Let i be a recurrent state and let f : E → R+ . Let Z0 = Σ_{0≤k<σ_i} f (Xk ) and, for n ≥ 1, Zn = Σ_{σ_i^n ≤ k < σ_i^{n+1}} f (Xk ). Then Zn = Z0 ◦ θ_{σ_i^n} and, under Pi , the sequence Z0 , . . . , Zn , . . . is i.i.d. In particular, taking f = 1, the sequence σ_i , σ_i^2 − σ_i , . . . , σ_i^{n+1} − σ_i^n , . . . is i.i.d.

IV.3.5 Invariant measures

We recall that any nonnegative measure µ on E can be represented by a function


µ : E → [0, +∞]. Such a measure is said to be invariant if :

µM = µ (IV.42)

and excessive if :
µM ≤ µ (IV.43)

A nonnegative function f : E → [0, +∞] is harmonic if :

Mf = f (IV.44)

and superharmonic if :
Mf ≤ f (IV.45)

IV.3.23 Definition. µ (resp. f ) is said to be trivial if µ (resp. f ) is either identically 0 or identically +∞; otherwise it is said to be non trivial.

IV.3.24 Proposition. If f is superharmonic (resp. harmonic), then (1) for every


n∈ N one has M nf ≤ f (resp. M nf = f ). (2) For any probability µ on E, f (Xn) is
an (Fn )-nonnegative supermartingale (resp. nonnegative general martingale) for Pµ .
(3) The latter is Pµ - integrable if and only if µ(f ) < +∞.

Proof. The first assertion is obtained by induction on n. For the second, apply the first
part and the equality E(f (Xn+1 )|Fn ) = M f (Xn ). For the third assertion remark that Eµ (f (X0 )) = µ(f ). □

Harmonic functions of irreducible recurrent MC

Let X be any Markov chain, with matrix M , then it is clear that the constant function
f : f (x) = 1 for all x ∈ E, is harmonic.

IV.3.25 Theorem. Let X be an irreducible recurrent Markov chain. Then any super-
harmonic function is constant.

Proof. Since f is superharmonic, f (Xn ) is a nonnegative supermartingale under Pµ for


any initial distribution µ. Since a nonnegative supermartingale has an a.s. limit that

closes it on the right as a supermartingale (corollary III.2.14), for any a.s. finite stopping
time τ , the optional sampling theorem III.1.4 applies thus: Eµ (f (Xτ )|F0 ) ≤ f (X0 )
Pµ -a.s. Take i, j ∈ E. Since the chain is irreducible recurrent, Pi (σj < +∞) = 1
and we can take τ = σj so that Xσj = j Pi -a.s. Thus we have f (j) = Ei (f (j)|F0 ) =
Ei (f (Xσj )|F0 ) ≤ f (X0 ) = f (i) Pi -a.s. By symmetry f (i) = f (j). 
Second proof, not relying on martingales. (1) For any Markov chain, if (f_i , i ∈ I) is a family of superharmonic functions then ∧_{i∈I} f_i is superharmonic. Indeed if f = ∧_{i∈I} f_i , for any j ∈ I, f_j ≥ M f_j ≥ M f , so that f = inf_j f_j ≥ M f .
(2) For any irreducible Markov chain, if f is harmonic and f attains a maximum then f is constant (the maximum principle). Indeed, let i0 ∈ E be such that f (i0 ) = max_{i∈E} f (i); for all n ≥ 0 one has 0 = Σ_{j∈E} m_{i0 j}^{(n)} (f (i0 ) − f (j)). Since all the terms of the sum are nonnegative, it follows that m_{i0 j}^{(n)} (f (i0 ) − f (j)) = 0 for every j. By irreducibility, for any j ∈ E there exists n ≥ 0 such that m_{i0 j}^{(n)} > 0. We conclude that f (j) = f (i0 ).
(3) For any irreducible recurrent Markov chain, any superharmonic function is harmonic. Let f be superharmonic. Put g := f − M f and for n ≥ 1 let U^{(n)} = Σ_{k=0}^{n} M^k . Then U^{(n)} g = f − M^{n+1} f ≤ f . If g(j) > 0 for some j ∈ E, then for any i ∈ E: u_{ij}^{(n)} g(j) ≤ U^{(n)} g(i) ≤ f (i), so that u_{ij}^{(n)} ≤ f (i)/g(j), contradicting the fact that lim_{n→∞} u_{ij}^{(n)} = u_{ij} = +∞ (recurrence). We conclude that g ≡ 0, that is, f is harmonic.
End of the proof. Let f be superharmonic and let i and j be in E. In view of (1), and since a constant function is harmonic, inf{f, f (i)} is superharmonic, and in view of (3) inf{f, f (i)} is harmonic. Since the latter attains a maximum at i, it follows from (2) that min{f (j), f (i)} = f (i). Permuting the roles of i and j, we obtain that f (j) = min{f (j), f (i)} = f (i). Therefore f is constant.

Invariant measures of irreducible recurrent MC

IV.3.26 Definition. Let X be a Markov chain. For any i ∈ E we define a positive measure λ_i on E by:

λ_i (j) = Ei ( Σ_{0≤k<σ_i} 1_{j} (Xk ) )    (IV.46)

or equivalently, for any f : E → [0, +∞],

λ_i (f ) = Ei ( Σ_{0≤k<σ_i} f (Xk ) )    (IV.47)

One has: λ_i (i) = 1 and λ_i (E) = Ei (σ_i ).

λ_i (j) is the average number of visits to j between two successive visits to i.



IV.3.27 Theorem (Existence of invariant measures). Let X be an irreducible recurrent


Markov chain. Then there exists at least one nontrivial invariant measure. Precisely
for every i ∈ E the measure λi defined by (IV.46) is a nontrivial invariant measure.

Proof. For all i, j ∈ E put: λ′_i (j) = Ei ( Σ_{1≤k≤σ_i} 1_{j} (Xk ) ). We have λ′_i (j) − λ_i (j) = Ei (1_{j} (X_{σ_i} )) − Ei (1_{j} (X0 )) = 0. Thus λ′_i = λ_i . Now, for any f ∈ E+ , we have the following equalities: (λ_i M )(f ) = λ_i (M f ) = Ei ( Σ_{k≥0} 1_{(k<σ_i)} M f (Xk ) ). By the Markov property, this is equal to Ei ( Σ_{k≥0} 1_{(k<σ_i)} Ei (f (Xk+1 )|Fk ) ), and since (k < σ_i ) ∈ Fk , the last expression is equal to Ei ( Σ_{k≥0} 1_{(k<σ_i)} f (Xk+1 ) ) = Ei ( Σ_{k≥1} 1_{(k≤σ_i)} f (Xk ) ) = λ′_i (f ) = λ_i (f ). This proves that λ_i is invariant. □

IV.3.28 Lemma. Let X be an irreducible recurrent Markov chain. Then any non
trivial excessive measure µ is such that: 0 < µ(i) < +∞ for all i ∈ E.

Proof. Let i, j ∈ E. µ ≥ µM^n , so that for all n ∈ N, µ(j) ≥ Σ_{k∈E} µ(k) m_{kj}^{(n)} ≥ µ(i) m_{ij}^{(n)} . Since i and j communicate, there exists n ∈ N such that m_{ij}^{(n)} > 0. Therefore µ(i) > 0 implies µ(j) > 0, and µ(i) = +∞ implies µ(j) = +∞. □

IV.3.29 Lemma. Let X be an irreducible recurrent Markov chain and let λ be an invariant measure such that 0 < λ(i) < +∞ for all i ∈ E. Then the matrix M̂ with entries

m̂_{ij} = (λ(j)/λ(i)) m_{ji}    (IV.48)

is a transition matrix. The Markov chain associated to M̂ is irreducible and recurrent. To any nonnegative measure ρ on E, associate the nonnegative function ρ̂ defined by ρ̂(i) := ρ(i)/λ(i). Then the mapping ρ → ρ̂ is a bijection between the set of excessive measures of M and the set of superharmonic functions of M̂ .
Proof. Since λ is an invariant measure, everywhere positive and finite: Σ_{j∈E} m̂_{ij} = (1/λ(i)) Σ_{j∈E} λ(j) m_{ji} = 1, so that M̂ is a transition matrix. By induction it is easily proved (with obvious notations) that:

m̂_{ij}^{(n)} = (λ(j)/λ(i)) m_{ji}^{(n)}    (IV.49)
û_{ij} = (λ(j)/λ(i)) u_{ji}    (IV.50)

so that the chain with matrix M̂ is irreducible and recurrent. The last assertion can be easily checked. □

IV.3.30 Theorem. Let X be an irreducible recurrent Markov chain, let i ∈ E, and let
λi be defined by (IV.46). Then any excessive measure µ is proportional to λi and as
such is invariant.

Proof. Let µ be an excessive measure of M and let i ∈ E. The measure λ_i is nontrivial and invariant, hence everywhere positive and finite (Lemma IV.3.28), so we can define, as in Lemma IV.3.29, µ̂ by µ̂(j) = µ(j)/λ_i (j). Then µ̂ is a superharmonic function for M̂ . Since M̂ is the transition matrix of an irreducible recurrent Markov chain, Theorem IV.3.25 implies that µ̂ is constant. That is, µ = cλ_i for some c ∈ [0, +∞]. □

IV.3.31 Corollary (Uniqueness of invariant measures). Let X be an irreducible re-


current Markov chain, let i ∈ E, and let λi be defined by (IV.46). Then:
(i) If µ is an invariant measure then µ = cλi for some c ∈ [0, +∞].
(ii) Any two nontrivial invariant measures are proportional.

Proof. This is a consequence of Theorem IV.3.27 and theorem IV.3.30.



We summarize the results on existence and uniqueness of nontrivial invariant mea-
sures for irreducible recurrent Markov chains in the following:

IV.3.32 Theorem. Let X be an irreducible recurrent Markov chain, and for any i ∈ E
let λi be defined by (IV.46). Then the set of all nontrivial invariant measures of X is
precisely the set: {cλi | 0 < c < +∞}

IV.3.33 Corollary (Classification of recurrent chains). Let X be an irreducible recur-


rent Markov chain. Then one and only one of the following two cases prevails:
(i) For all non trivial invariant measures µ, µ(E) < +∞, Ei (σi ) < +∞ for all i ∈ E,
there exists a unique invariant probability (The chain is said to be positive),
(ii) For all non trivial invariant measures µ, µ(E) = +∞, Ei (σi ) = +∞ for all i ∈ E,
no invariant probability exists (The chain is said to be null).

Proof. Fix i0 ∈ E. By the preceding theorem any nontrivial invariant measure is of the
form µ = cλi0 where 0 < c < ∞. In particular, it follows that for any i ∈ E we have
λi = cλi0 where 0 < c < ∞. But for the invariant measure λi , one has λi (E) = Ei (σi ).
Thus two cases are possible: either all non trivial invariant measures have a finite total weight (case (i)) or all non trivial invariant measures have an infinite total weight (case (ii)). □

IV.3.34 Remark. In case (i) the unique invariant probability π verifies π(i) = 1/Ei (σ_i ) (i ∈ E). Indeed there exists 0 < c_i < +∞ such that π = c_i λ_i . Since λ_i (i) = 1 and π(E) = 1, one has π(i) = π(i)/λ_i (i) = π(E)/λ_i (E) = 1/Ei (σ_i ).
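This remark lends itself to a numerical check: solve πM = π for a small irreducible chain and compare π(i) with 1/Ei(σ_i) estimated from simulated return times. The sketch below is not part of the original notes; the matrix M, the helper mean_return_time and the use of numpy are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
M = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
n_states = M.shape[0]

# Solve pi M = pi together with sum(pi) = 1 (least squares on the stacked system).
A = np.vstack([M.T - np.eye(n_states), np.ones(n_states)])
b = np.concatenate([np.zeros(n_states), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

def mean_return_time(M, i, n_excursions, rng):
    """Monte Carlo estimate of E_i(sigma_i)."""
    total = 0
    for _ in range(n_excursions):
        x, steps = i, 0
        while True:
            x = rng.choice(n_states, p=M[x])
            steps += 1
            if x == i:
                break
        total += steps
    return total / n_excursions

i = 0
print("pi             :", pi)
print("1/E_i(sigma_i) :", 1.0 / mean_return_time(M, i, 20_000, rng))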

IV.3.6 Third classification

In this subsection X is a Markov chain. Taking into account the classification stated
in corollary IV.3.33 for irreducible recurrent chains, we introduce the following:

IV.3.35 Definition. A state i ∈ E is said to be positive if Ei (σi ) < +∞ and null if


Ei (σi ) = +∞.

IV.3.36 Proposition. Any transient state is null.

Proof. If i is a transient state then Pi (σi = +∞) > 0 and consequently Ei (σi ) = +∞.


IV.3.37 Proposition. In the same communication class either all the states are pos-
itive or all the states are null.

Proof. If the class is transient all the states of the class are null. If the class is recurrent,
say C, then the positivity or the nullity of a state i ∈ C does not change if we consider
the restricted chain MC , and in view of corollary IV.3.33 the states of C are either all
positive or all null. 

Invariant probability, the general case

IV.3.38 Lemma. Let X be a Markov chain and let µ be an invariant probability. If j


is a null state then µ(j) = 0.

Proof. Let µ be an invariant probability; then µ(j) = Σ_{i∈E} µ(i) m_{ij}^{(n)} for all n ∈ N, all j ∈ E. Suppose first that j ∈ E is a transient state. For any state i ∈ E, in view of Proposition IV.3.17(ii), Σ_{n≥0} m_{ij}^{(n)} = u_{ij} < +∞. It follows that m_{ij}^{(n)} →_{n→∞} 0. Applying the Lebesgue dominated convergence theorem on the discrete space E with probability µ to the sequence of functions i → m_{ij}^{(n)} (n ∈ N), we obtain µ(j) = 0.
Alternative proof. µ being invariant, µU (j) = Σ_n µM^n (j) = Σ_n µ(j) (= 0 or +∞) (∗). On the other hand µU (j) = Σ_i µ(i) u_{ij} ≤ u_{jj} Σ_i µ(i) = u_{jj} , where we used u_{ij} ≤ u_{jj} (Proposition IV.3.9). If j is transient then u_{jj} is finite, and therefore µU (j) is finite, and from equality (∗) it follows that µ(j) = 0.
Now suppose that j is a recurrent null state and let C be its communication class. Since
the restriction of µ on all transient states is 0, µC is invariant for the matrix MC . If µC
were non trivial, C being null, by corollary IV.3.33, we should have µC (C) = +∞, this
contradicts µC (C) ≤ 1. It follows that µC is trivial, but then µ(i) = 0 for all i ∈ C. 

IV.3.39 Remark. The lemma implies that if µ(i) > 0 where µ is a bounded (that
is µ(E) < +∞) invariant measure, then this state is positive, hence recurrent. The
condition of boundedness on the measure µ cannot be removed. If µ is invariant but
not bounded, then a transient state can have a strictly positive weight. For instance, in Example IV.3.20 (the Markov chain with E = Z and the transition matrix m_{ij} = 1 if j = i + 1 and m_{ij} = 0 if j ≠ i + 1), the measure µ such that µ(i) = 1 for all i ∈ E is invariant. Indeed for any j ∈ E, µM (j) = Σ_{i∈Z} µ(i) m_{ij} = µ(j − 1) = 1.

IV.3.40 Corollary. Any Markov chain that has an invariant probability has a positive
class.

The following refines the result of Theorem IV.3.19 in the finite case:

IV.3.41 Corollary. Any Markov chain with finitely many states has at least one in-
variant probability hence at least one positive class.

Proof. Let C ⊂ E be a recurrent class. The Markov chain restricted to C is irreducible, recurrent and has finitely many states; its invariant measure λ_i (i ∈ C) therefore has finite total mass and can be normalized into an invariant probability µ of the restricted chain, which extends (by 0 outside C) to a probability measure on E that is invariant on E as well. Moreover µ(j) > 0 for all j ∈ C. In view of Lemma IV.3.38, any state j ∈ C is positive. □

In general the situation concerning invariant probabilities is described in the fol-


lowing:

IV.3.42 Proposition. Let X be a Markov chain, let A be the set of all positive classes,
and for any α ∈ A denote by γα the unique invariant probability measure of α extended
by 0 on E \ α, and by ∆(A) the set of all probabilities on A. The set of all invariant
probability measures Γ of X is given by the following:

Γ = { Σ_{α∈A} δ_α γ_α : δ ∈ ∆(A) }    (IV.51)

In particular X has an invariant probability if and only if it has some positive class. It
has a unique invariant probability if and only if it has exactly one positive class.

Proof. Let α be a positive class of X. Since α is recurrent, the matrix Mα , restriction of M to α, is the transition matrix of an irreducible Markov chain on α, say Xα . For any invariant probability γα of Xα , the measure on E defined by extending γα by 0 on E \ α is an invariant probability of X. Conversely, any invariant probability of X with support in α gives rise to an invariant probability of Xα . Moreover, if δ ∈ ∆(A) and if (γα , α ∈ A) is the family of invariant probabilities defined above, then Σ_{α∈A} δα γα is an invariant probability of X and thus is in Γ. Conversely, any element γ of Γ gives rise to a family of invariant probabilities (γα , α ∈ A), where γα is supported by α, and a family of nonnegative reals (δα , α ∈ A). Precisely, δα = γ(α) (α ∈ A) and, for all α such that γ(α) > 0, γα (i) = γ(i)/γ(α) for all i ∈ α. Since γ(i) = 0 for any null state, one has γ = Σ_{α∈A} δα γα . This proves the first part of the assertion. Since in each positive class there is exactly one invariant probability, the set Γ is a singleton if and only if A is a singleton. □

IV.4 Ergodic Theorem for irreducible recurrent chains


IV.4.1 Theorem (Ergodic theorem). Let i be recurrent, let λ_i be the measure defined by (IV.46). For any f, g ∈ L1 (λ_i ) such that λ_i (g) ≠ 0, one has:

( Σ_{0≤k≤n} f (Xk ) ) / ( Σ_{0≤k≤n} g(Xk ) ) −→_{n→∞} λ_i (f )/λ_i (g)    Pi -a.s.    (IV.52)

Proof. Let f ∈ L1 (λ_i ) and assume first that f is nonnegative. Let Z0 = Σ_{0≤k<σ_i} f (Xk ) and, for n ≥ 1, Zn = Σ_{σ_i^n ≤ k < σ_i^{n+1}} f (Xk ). Then by corollary IV.3.22, Zn = Z0 ◦ θ_{σ_i^n} and, under Pi , the sequence Z0 , . . . , Zn , . . . is i.i.d. Note that Ei (Z0 ) = λ_i (f ). Applying the strong law of large numbers, we have:

(1/n)(Z0 + · · · + Zn−1 ) −→_{n→+∞} λ_i (f )    Pi -a.s.    (IV.53)

For any m ∈ N let ν(m) = inf{n ∈ N | m < σ_i^n }. Then ν(m) = Σ_{0≤k≤m} 1_{i} (Xk ) Pi -a.s. Moreover we have σ_i^{ν(m)−1} ≤ m < σ_i^{ν(m)} and ν(m) →_{m→+∞} +∞. Now Z0 + · · · + Zn−1 = Σ_{0≤k<σ_i^n} f (Xk ), so that:

Σ_{0≤k<σ_i^{ν(m)−1}} f (Xk ) ≤ Σ_{0≤k≤m} f (Xk ) ≤ Σ_{0≤k<σ_i^{ν(m)}} f (Xk )    (IV.54)

Therefore:

[(ν(m) − 1)/ν(m)] · [ Σ_{0≤k<ν(m)−1} Zk / (ν(m) − 1) ] ≤ [ Σ_{0≤k≤m} f (Xk ) ] / [ Σ_{0≤k≤m} 1_{i} (Xk ) ] ≤ [ Σ_{0≤k<ν(m)} Zk ] / ν(m)    (IV.55)

Since the two extreme terms converge to λ_i (f ) Pi -a.s., it follows that:

[ Σ_{0≤k≤m} f (Xk ) ] / [ Σ_{0≤k≤m} 1_{i} (Xk ) ] −→_{m→+∞} λ_i (f )    Pi -a.s.    (IV.56)

If f ∈ L1 (λ_i ), write f = f+ − f− and apply the preceding result to f+ and f− separately. Since a similar limit can be obtained for g, and since λ_i (g) ≠ 0, (IV.52) results from writing the limit of the quotient as a quotient of the limits. □

The following is an easy consequence of Theorem IV.4.1 and Corollary IV.3.31.

IV.4.2 Theorem. Let X be an irreducible recurrent Markov chain, let λ be a non trivial invariant measure. For any f, g ∈ L1 (λ) such that λ(g) ≠ 0, and any initial distribution µ, one has:

( Σ_{0≤k≤n} f (Xk ) ) / ( Σ_{0≤k≤n} g(Xk ) ) −→_{n→∞} λ(f )/λ(g)    Pµ -a.s.    (IV.57)

IV.4.3 Theorem. Let X be an irreducible recurrent Markov chain, and let µ be the initial distribution.
(i) If X is positive and π is the invariant probability, then for all f ∈ L1 (π):

(1/n) Σ_{0≤k<n} f (Xk ) −→_{n→∞} π(f )    Pµ -a.s.    (IV.58)

(ii) If X is null and λ is a nontrivial invariant measure, then for all f ∈ L1 (λ):

(1/n) Σ_{0≤k<n} f (Xk ) −→_{n→∞} 0    Pµ -a.s.    (IV.59)

Proof. (i) Since X is positive, relation (IV.58) is a particular case of theorem IV.4.2 where g is the constant 1 on E and λ = π. (ii) Let f be nonnegative. For any finite F ⊂ E, 1F ∈ L1 (λ), so that, in view of theorem IV.4.2:

(1/n) Σ_{0≤k<n} f (Xk ) ≤ [ Σ_{0≤k<n} f (Xk ) ] / [ Σ_{0≤k<n} 1F (Xk ) ] −→_{n→∞} λ(f )/λ(F )    Pµ -a.s.

hence

lim sup_{n→∞} (1/n) Σ_{0≤k<n} f (Xk ) ≤ λ(f )/λ(F )

Since λ(f ) is finite and λ(F ) can be chosen as large as desired (the chain being null, λ(E) = +∞), lim_{n→∞} (1/n) Σ_{0≤k<n} f (Xk ) = 0. For a general f ∈ L1 (λ), write f = f+ − f− . □
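The positive case (IV.58) is easy to check by simulation: run one long trajectory and compare the time average of f with π(f). The sketch below is not part of the original notes; the matrix M, the function f and the use of numpy are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
M = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
f = np.array([1.0, -2.0, 5.0])
n_states = M.shape[0]

# Invariant probability via the eigenvector of M^T for the eigenvalue 1.
w, v = np.linalg.eig(M.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# One long trajectory started from state 0.
n = 200_000
x, total = 0, 0.0
for _ in range(n):
    total += f[x]
    x = rng.choice(n_states, p=M[x])

print("time average (1/n) sum f(X_k):", total / n)
print("pi(f)                        :", pi @ f)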

IV.5 Periodicity
IV.5.1 Definition. Let i ∈ E. The period of i is the GCD of the set Di := {n ≥ 1 | m_{ii}^{(n)} > 0}. By convention di = +∞ if Di = ∅. When di = 1 the state i is said to be aperiodic.
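Numerically, the period of a state can be approximated as the gcd of the truncated set Di read off from the powers of M. The sketch below is not part of the original notes; the function period, the tolerance, the truncation n_max and the example matrix are illustrative assumptions.

import math
import numpy as np

def period(M, i, n_max=64):
    """gcd of {n >= 1 : M^n(i,i) > 0}, truncated at n_max (math.inf if the set is empty)."""
    d = 0
    P = np.eye(M.shape[0])
    for n in range(1, n_max + 1):
        P = P @ M
        if P[i, i] > 1e-12:
            d = math.gcd(d, n)
    return d if d > 0 else math.inf   # mirrors the convention d_i = +infinity

# A deterministic 3-cycle: every state has period 3.
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(period(C, 0))   # 3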

IV.5.2 Remark. di = +∞ if and only if the singleton {i} is a transient communication


class.

IV.5.3 Proposition. If i ∼ j then di = dj .

Proof. If i ∼ j and i ≠ j then there exist n, p ≥ 1 such that m_{ij}^{(n)} > 0 and m_{ji}^{(p)} > 0, so that m_{ii}^{(n+p)} = Σ_{ℓ∈E} m_{iℓ}^{(n)} m_{ℓi}^{(p)} ≥ m_{ij}^{(n)} m_{ji}^{(p)} > 0, hence n + p ∈ Di . Let q ∈ Dj , that is m_{jj}^{(q)} > 0. Then m_{ii}^{(n+q+p)} ≥ m_{ij}^{(n)} m_{jj}^{(q)} m_{ji}^{(p)} > 0. Thus n + p + q ∈ Di . di divides n + p + q and n + p, thus di divides their difference q. Since q ∈ Dj is arbitrary, it follows that di divides dj . By symmetry, dj divides di . We conclude that di = dj . □

IV.5.4 Proposition. Let C be a class of finite period d. Then there is a partition of


C into d subclasses C0 , . . . , Cd−1 with the following property:

(i) For any k0 , k ∈ {0, . . . , d − 1} and any i0 ∈ Ck0 , the chain Pi0 visits Ck only on
times n such that n ≡ k − k0 (mod d)

(ii) M d is the transition matrix of a Markov chain, for which C0 , . . . , Cd−1 are com-
munication classes and for which a chain starting from i0 ∈ Ck0 never visits Cℓ unless
ℓ = k0 . Moreover if i0 is recurrent for M , then it is recurrent for M d and if it is
transient for M it is transient for M d .

Proof. Fix i0 ∈ C arbitrary. Let j ∈ C. There exists m such that m_{j i0}^{(m)} > 0. Let s_j be the unique integer in {1, . . . , d} such that m ≡ s_j (mod d) and let r_j = d − s_j . Let D_{i0,j} := {n ≥ 0 | m_{i0 j}^{(n)} > 0}. For any n ∈ D_{i0,j} one has m_{i0 i0}^{(m+n)} > 0, so that d divides m + n, or equivalently m + n ≡ 0 (mod d). Thus D_{i0,j} is included in the set {r_j + kd : k ∈ N}. Let C_r be the set of all j such that D_{i0,j} ⊂ r + dN. We have just proved that the C_r are disjoint and that their union is C. The only point that remains to be proved is that each C_r is nonempty. First, i0 ∈ C0 . Assume that d > 1. Since there exist q ≥ 1 and i0 , i1 , . . . , i_{qd−1} such that i0 → i1 → · · · → i_{qd−1} → i0 , it is clear that i_k ∈ C_k for k = 1, . . . , d − 1. This ends the proof of (i). Now the set {n ≥ 1 | m_{i0 i0}^{(dn)} > 0} has GCD 1, so that the period of i0 for M^d is 1. Now it is clear from (i) that the chain with matrix M^d starting from i0 ∈ C_{k0} never visits C_ℓ with ℓ ≠ k0 . Moreover Σ_{n≥0} m_{i0 i0}^{(dn)} = Σ_{n≥0} m_{i0 i0}^{(n)} , so that if i0 is recurrent for M it is recurrent for M^d and if it is transient for M it is transient for M^d . □

IV.6 Asymptotic properties of the Markov matrix


In this section we study the asymptotic behavior of the sequence M^n ≡ (m_{ij}^{(n)}) when n goes to infinity. We already have some idea about the limit if the latter exists. Indeed, in the case of an irreducible Markov chain, the ergodic theorem (relation (IV.58)) shows that the Cesàro limit exists and is equal to a matrix with equal rows, each row being equal to the invariant probability in case of a positive chain, and to the null matrix in case of a null chain. Therefore if the limit of M^n exists then it will take the same values. On the other hand we know that when the chain is periodic (period d ≥ 2), Proposition IV.5.4 shows that the values of m_{ij}^{(n)} are zero unless the class of j is the one reached from the class of i after n steps modulo d, which excludes convergence of the sequence but hints at the convergence of subsequences.
For k ∈ N∗ , i, j ∈ E we recall the notations s_{ij}^k := Pi (σ_j = k), f_{ij} := Pi (σ_j < +∞) and L_i := Ei (σ_i ), and the equality L_i = λ_i (E) (definition IV.3.26).

IV.6.1 Proposition. For n ≥ 1, we have:

f_{ij} = Σ_{k∈N∗} s_{ij}^k    (IV.60)

m_{ij}^{(n)} = Σ_{k=1}^{n} s_{ij}^k m_{jj}^{(n−k)}    (IV.61)

Proof. The first equality is a trivial consequence of the definitions and notations of IV.3.1. We prove the second. Since (Xn = j) ∩ (σ_j = k) = ∅ if k > n ≥ 1, we have:
(Xn = j) = ∪_{k∈N∗} (Xn = j) ∩ (σ_j = k) = ∪_{k=1}^{n} (Xn = j) ∩ (σ_j = k).
Thus we have the following equalities:
Pi (Xn = j) = Σ_{k=1}^{n} Ei (1_{(σ_j =k)} 1_{(Xn =j)} )
= Σ_{k=1}^{n} Ei (E_i^{F_k}[1_{(σ_j =k)} 1_{(Xn =j)} ])
= Σ_{k=1}^{n} Ei (1_{(σ_j =k)} E_i^{F_k}[1_{(Xn =j)} ])
= Σ_{k=1}^{n} Ei (1_{(σ_j =k)} M^{n−k} (Xk , j))
= Σ_{k=1}^{n} Ei (1_{(σ_j =k)} M^{n−k} (j, j))
= Σ_{k=1}^{n} Pi (σ_j = k) M^{n−k} (j, j)
We used the Markov property to derive the fourth equality. □

IV.6.2 Lemma. For any i ∈ E, let d = GCD{k ∈ N∗ | m_{ii}^{(k)} > 0} and d′ = GCD{k ∈ N∗ | s_{ii}^k > 0}. Then d = d′ .

Proof. Let An = {k ∈ N∗ | k ≤ n, m_{ii}^{(k)} > 0}, A′n = {k ∈ N∗ | k ≤ n, s_{ii}^k > 0}, dn = GCD An , d′n = GCD A′n . By induction we prove d′n = dn for all n ≥ 1. For n = 1, the equality is straightforward. We distinguish 3 cases. Case (1): s_{ii}^{n+1} > 0; then in view of formula (IV.61), m_{ii}^{(n+1)} > 0, so that d′_{n+1} = GCD{d′n , n+1} = GCD{dn , n+1} = d_{n+1} . Case (2): s_{ii}^{n+1} = 0 and m_{ii}^{(n+1)} = 0. Then d′_{n+1} = d′n = dn = d_{n+1} . Case (3): s_{ii}^{n+1} = 0 and m_{ii}^{(n+1)} > 0; in view of (IV.61), 0 < m_{ii}^{(n+1)} = Σ_{k=1}^{n} s_{ii}^k m_{ii}^{(n+1−k)} , so that for some 1 ≤ k ≤ n, s_{ii}^k m_{ii}^{(n+1−k)} > 0. It follows that n − k + 1 ∈ An and k ∈ A′n . By the induction hypothesis, dn divides n − k + 1 and k, therefore dn divides (n − k + 1) + k = n + 1. Thus d′_{n+1} = d′n = dn = d_{n+1} . Now d = lim_{n→∞} dn = lim_{n→∞} d′n = d′ . □

IV.6.3 Lemma. For any A ⊂ N∗ , we put:

Â = {α1 q1 + · · · + αℓ qℓ : ℓ ≥ 1, α1 , . . . , αℓ ∈ N, q1 , . . . , qℓ ∈ A}

If GCD A = 1, then there exists v0 ∈ N∗ such that [v0 , ∞[ ∩ N ⊂ Â.

Proof. In view of the Bezout theorem, there exist some β1 , . . . , βℓ ∈ Z and q1 , . . . , qℓ ∈ A such that Σ_{k=1}^{ℓ} βk qk = 1. Let M = sup_k |βk |, q = Σ_k qk and v0 = M q² . For any v ≥ v0 we write v = qd + v′ where d ∈ N and 0 ≤ v′ ≤ q − 1. It follows that qd = v − v′ ≥ M q² − v′ > M q² − q, so that d > M q − 1, that is d ≥ M q. Now v = d Σ_{k=1}^{ℓ} qk + v′ Σ_{k=1}^{ℓ} βk qk = Σ_{k=1}^{ℓ} (d + βk v′ ) qk . Since d + βk v′ ≥ d − M q ≥ 0, we conclude that v ∈ Â. □

IV.6.4 Lemma. Let (sk )_{k≥1} be a nonnegative sequence such that Σ_{k≥1} sk = 1 and GCD{k | sk > 0} = 1. Let (un )_{n≥0} be a real sequence defined by induction as follows: u0 = 1 and un = Σ_{k=1}^{n} sk u_{n−k} (n ≥ 1). Then the sequence (un ) is convergent; precisely:

lim_{n→∞} un = 1 / ( Σ_{k=1}^{∞} k sk )    (IV.62)

Proof. Put λ = lim sup_n un and let (nt )_{t≥0} be a sequence in N increasing to +∞ such that lim_{t→∞} u_{nt} = λ. We are going to prove:
Claim 1. There exists some q0 ∈ N∗ such that for all q ≥ q0 , lim_t u_{nt−q} = λ.
Subclaim 1. For any q such that sq > 0, lim_t u_{nt−q} = λ. Passing to the limit in the equality

u_{nt} = sq u_{nt−q} + Σ_{1≤k≤nt , k≠q} sk u_{nt−k}

yields

lim_t u_{nt} ≤ lim inf_t ( sq u_{nt−q} ) + lim sup_t Σ_{1≤k≤nt , k≠q} sk u_{nt−k}

The left-hand side is equal to λ. The first term of the right-hand side is equal to sq lim inf_t u_{nt−q} . The second term can be considered as the limsup, when t goes to +∞, of the integral of the function gt defined on N∗ by gt (k) = u_{nt−k} if 1 ≤ k ≤ nt , k ≠ q, and gt (k) = 0 otherwise, the integral being w.r.t. the probability measure (sn ) on N∗ . Since lim sup_t gt (k) ≤ λ if k ≠ q and lim sup_t gt (q) = 0, by the Fatou lemma (note that 0 ≤ un ≤ 1 for all n, by induction, so the gt are uniformly bounded) the limsup of the integral is less than or equal to (1 − sq ) λ. Therefore λ ≤ sq lim inf_t u_{nt−q} + (1 − sq ) λ, and since sq > 0 we have λ ≤ lim inf_t u_{nt−q} . This ends the proof of subclaim 1.
Subclaim 2. For any q = α1 q1 + · · · + αℓ qℓ where α1 , . . . , αℓ ∈ N and s_{q1} , . . . , s_{qℓ} > 0, one has lim_t u_{nt−q} = λ. The proof is straightforward by induction.
Claim 1 is thus obtained as a consequence of lemma IV.6.3.
Claim 2. Put ℓi = Σ_{k=i+1}^{∞} sk (i ≥ 0), δn = Σ_{i=0}^{n} ℓi u_{n−i} (n ≥ 0). Then δn = δn−1 = · · · = δ0 = 1 and Σ_{i=0}^{∞} ℓi = Σ_{n=1}^{∞} n sn . The proof of this claim is standard.
End of the proof of the lemma. For any t, δ_{nt} ≡ Σ_{i∈N} ℓi gt (i) is viewed as an integral of the function defined on N by gt (i) = u_{nt−i} 1_{(i≤nt )} with respect to the nonnegative measure ℓ = (ℓi ) on N. We distinguish two cases:
If Σ_{i=0}^{∞} ℓi < +∞, then the measure ℓ is bounded. Since 0 ≤ gt ≤ 1 for all t, and since in view of claim 1 the sequence gt (i) goes to λ when t goes to +∞, the dominated convergence theorem implies that δ_{nt} goes to λ Σ_{i∈N} ℓi when t goes to +∞. Since δ_{nt} = 1 for all t, we conclude that λ Σ_{i∈N} ℓi = 1.
If Σ_{i=0}^{∞} ℓi = +∞, by the Fatou lemma we obtain λ Σ_{i∈N} ℓi ≤ 1, so that λ = 0.
Last argument: replacing limsup by liminf at the beginning of the proof leads to the same conclusion. Since both limits are equal to ( Σ_{n=1}^{∞} n sn )^{−1} , the proof is complete. □
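Lemma IV.6.4 is easy to visualize numerically: compute the renewal sequence u_n by its recursion and compare it with 1/Σ k s_k. The sketch below is not part of the original notes; the particular law (s_k) and the use of numpy are illustrative assumptions.

import numpy as np

s = {2: 0.5, 3: 0.3, 7: 0.2}          # gcd{2,3,7} = 1 and the weights sum to 1

N = 2000
u = np.zeros(N + 1)
u[0] = 1.0
for n in range(1, N + 1):
    u[n] = sum(sk * u[n - k] for k, sk in s.items() if k <= n)

print("u_N        :", u[N])
print("1/sum k*s_k:", 1.0 / sum(k * sk for k, sk in s.items()))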


IV.6.5 Theorem. Let X be a Markov chain. If j is a recurrent state of period dj then

lim_{n→∞} m_{jj}^{(n dj)} = dj / Lj    (IV.63)

(with the convention 1/∞ = 0, and notation Lj = Ej (σ_j )).

Proof. Put d = dj . In view of lemma IV.6.2, GCD{k ∈ N∗ | s_{jj}^k > 0} = d. Put sk = s_{jj}^{dk} and un = m_{jj}^{(dn)} . Thus GCD{k ∈ N∗ | sk > 0} = 1. Since j is recurrent, 1 = Σ_{k=1}^{∞} s_{jj}^k = Σ_{k=1}^{∞} s_{jj}^{dk} = Σ_{k=1}^{∞} sk . We can apply lemma IV.6.4 and, since Lj = Ej (σ_j ) = Σ_{k=1}^{∞} k s_{jj}^k = d Σ_{k=1}^{∞} k sk , we obtain (IV.63). □

IV.6.6 Theorem. Let X be a Markov chain. If j is an aperiodic recurrent state then, for any i ∈ E,

lim_{n→∞} m_{ij}^{(n)} = f_{ij} / Lj    (IV.64)

(with the convention 1/∞ = 0, and notation Lj = Ej (σ_j )).

Proof. In view of (IV.61), m_{ij}^{(n)} = Σ_{k∈N} 1_{(k≤n)} s_{ij}^k m_{jj}^{(n−k)} is the integral, on the measure space N endowed with the nonnegative measure s_{ij} : k → s_{ij}^k (s_{ij}^0 = 0), of the nonnegative function gn : k → 1_{(k≤n)} m_{jj}^{(n−k)} . By (IV.63), for any k ∈ N∗ , the sequence (gn (k))_{n≥0} converges to 1/Lj when n goes to infinity. The dominated convergence theorem yields: lim_{n→∞} m_{ij}^{(n)} = lim_{n→∞} Σ_{k∈N} s_{ij}^k gn (k) = Σ_{k∈N} s_{ij}^k (1/Lj ) = (1/Lj ) Σ_{k∈N} s_{ij}^k . In view of (IV.60), lim_{n→∞} m_{ij}^{(n)} = f_{ij} / Lj . □

IV.6.7 Theorem. Let X be an irreducible recurrent aperiodic Markov chain. Then:

lim_{n→∞} m_ij^{(n)} = 1 / L_j    (IV.65)

(with the convention 1/∞ = 0, and notation L_j = E_j(σ_j)). In particular:
If the chain is positive, then the limit of M^n as n goes to infinity is a matrix with equal
rows, each row being equal to π, the invariant measure of M.
If the chain is null, then the limit of M^n as n goes to infinity is the null matrix.
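Here is a small numerical illustration of (IV.65) (a hedged sketch; the 3-state matrix below is chosen arbitrarily, it is irreducible, aperiodic and positive recurrent):

```python
import numpy as np

M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])           # an arbitrary irreducible aperiodic stochastic matrix

print(np.linalg.matrix_power(M, 50))      # all rows are (numerically) identical

# The common row is the invariant probability pi (pi M = pi, sum pi = 1),
# obtained here from the left eigenvector of M for the eigenvalue 1.
eigval, eigvec = np.linalg.eig(M.T)
pi = np.real(eigvec[:, np.argmin(np.abs(eigval - 1.0))])
pi = pi / pi.sum()
print(pi)                                 # pi_j = 1 / L_j, matches each row of M^50
```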

In the following we gather the results concerning the asymptotic behavior of the sequence M^n.

IV.6.8 Theorem. For any Markov chain we have the following:

(i) If j is a null state, then lim_{n→∞} m_ij^{(n)} = 0.
(ii) If j is a positive state of period d_j (we know that in this case j is recurrent and d_j
is finite), then for any i and any r ∈ {0, . . . , d_j − 1},

lim_{n→∞} m_ij^{(n d_j + r)} = (d_j / L_j) f_ij(r)    (IV.66)

where f_ij(r) = ∑_{k≥0} P_i(σ_j = k d_j + r).
In particular, if i ∼ j there exists r_ij ∈ {0, . . . , d_j − 1} such that:

lim_{n→∞} m_ij^{(n d_j + r)} = d_j / L_j if r = r_ij, and 0 if r ≠ r_ij    (IV.67)

(with the convention 1/∞ = 0, and notation L_j = E_j(σ_j)).

Proof. If j is transient, then u_ij = ∑_{n≥0} m_ij^{(n)} < +∞, therefore lim_{n→∞} m_ij^{(n)} = 0. If j
is recurrent, then by adapting the proof of (IV.64) one establishes in a similar way
that m_ij^{(n d_j + r)} = ∑_{k≥0} 1_{{k≤n}} s_ij^{k d_j + r} m_jj^{((n−k) d_j)}, and that the limit when n → ∞ is equal
to (d_j / L_j) ∑_{k≥0} s_ij^{k d_j + r} = (d_j / L_j) f_ij(r). □
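To see the role of the period in (IV.66)–(IV.67), one can look at the simplest periodic example (a sketch added for illustration, with period d_j = 2):

```python
import numpy as np

# Deterministic alternation between two states: every state has period 2 and L_j = 2.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(np.linalg.matrix_power(M, 10))      # even powers: the identity matrix
print(np.linalg.matrix_power(M, 11))      # odd powers: the flip matrix
# M^n has no limit, but along each residue r mod 2 the entries are constant,
# equal to d_j / L_j = 1 at the admissible residue r_ij and to 0 otherwise,
# as predicted by (IV.67).
```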

Appendix A

Reminder on measure theory

A.1 Measurable structure


For any set Ω we denote by P(Ω) the set of subsets of Ω. We frequently use the term
"collection" instead of "set" for a subset of P(Ω).

A.1.1 Definition. A ⊂ P(Ω) is a σ-field (or σ-algebra) on Ω if :


(i) ∅ ∈ A ,
(ii) If A ∈ A then Ω \ A ∈ A ,
(iii) If A = ∪n≥0 An and if An ∈ A (n = 0, 1, . . .) then A ∈ A .

The pair (Ω, A ) where Ω is a set and A is a σ-field on Ω, is called a measurable space.
If A is a σ-field then:
(1) Ω ∈ A ,
(2) If A = ∩n≥0 An and if An ∈ A (n = 0, 1, . . .) then A ∈ A ,
(3) If A = ∪_{n=1}^k A_n and if A_n ∈ A (n = 1, . . . , k) then A ∈ A .

A.1.2 Fact. Note the following:

(1) P(Ω) is a σ-field, {∅, Ω} is a σ-field, the empty collection is not a σ-field.

(2) The intersection of any family of σ-fields is again a σ-field.

(3) To any collection C ⊂ P(Ω) we associate the collection σ(C ) defined as the
intersection of all σ-fields containing C . σ(C ) is then a σ-field; it is the smallest
σ-field containing C . It will be called the σ-field generated by C .

(4) If (Ω, τ ) is a topological space, where τ is the collection of open sets, the Borel
σ-field is the σ-field generated by τ , that is σ(τ ).


(5) R ≡ (−∞, +∞) will generally be endowed with its Borel σ-field B_R associated
with its usual topology. B_R is generated by the collection of all open intervals, by
the collection of all closed intervals, by the collection of intervals (−∞, α) where α ∈ Q, etc.

(6) R̄ ≡ [−∞, +∞] will be endowed with the σ-field generated by the family [−∞, α)
(resp. [−∞, α], (α, +∞], [α, +∞]) where α ∈ R (or α ∈ Q). This is precisely
the Borel σ-field of [−∞, +∞] when endowed with its usual topology as the two
point compactification of R. If τ denotes the collection of open sets of R, then
the collection of open sets of the topology of R̄ is the union of τ with the two
collections [−∞, α) and (β, +∞], α, β ∈ R.
A.1.3 Definition. Let (E, E ) and (F, F ) be two measurable spaces. A map f : E → F
is E /F -measurable if f −1 (A) ∈ E for any A ∈ F .

The notation f : (E, E ) → (F, F ) will always mean a E /F -measurable map f : E → F .

A.1.4 Fact. Note the following:

(1) Assume that the σ-field F is generated by C . Then f is E /F -measurable as soon as
f⁻¹(A) ∈ E for every A ∈ C .

(2) Let (E, E ), (F, F ), (G, G ) be measurable spaces. If f : (E, E ) → (F, F ) and
g : (F, F ) → (G, G ), then g ◦ f : (E, E ) → (G, G ).

(3) Let (E, ξ), (F, η) be topological spaces. If f : E → F is continuous, then f is


σ(ξ)/σ(η)- measurable.

A function Ω → R is a real function. A function Ω → R̄ ≡ [−∞, +∞] is an extended real function
or a numerical function.

A.1.5 Proposition. Let X : (Ω, A ) → (E, E ) be a measurable map and Y : Ω → R̄
an extended real function. Y is σ(X)-measurable if and only if there exists f : E → R̄,
E /B_R̄-measurable, such that Y = f ◦ X.

For any a ∈ [−∞, +∞] put a+ = max(a, 0) and a− = max(−a, 0).
Then we have: a = a+ − a− and |a| = a+ + a−.
For f : Ω → [−∞, +∞] define:

(f + )(ω) = (f (ω))+ (ω ∈ Ω) (A.1)


(f − )(ω) = (f (ω))− (ω ∈ Ω) (A.2)

Therefore we have:

f = f+ − f− (A.3)
|f | = f + + f − (A.4)

For any sequence (an ) in R we recall the following limits:


lim sup_{n→+∞} a_n = inf_{m≥0} sup_{n≥m} a_n    (A.5)

lim inf_{n→+∞} a_n = sup_{m≥0} inf_{n≥m} a_n    (A.6)

A.1.6 Fact. Note the following:

(1) lim inf_{n→+∞} a_n ≤ lim sup_{n→+∞} a_n

(2) (a_n) converges in R̄ if and only if lim inf_{n→+∞} a_n = lim sup_{n→+∞} a_n

Let f_n : Ω → [−∞, +∞], n = 0, 1, 2, . . .. We define the following functions on Ω:

(sup_n f_n)(ω) := sup_n f_n(ω)    (A.7)

(inf_n f_n)(ω) := inf_n f_n(ω)    (A.8)

(lim sup_{n→+∞} f_n)(ω) := lim sup_{n→+∞} f_n(ω) = inf_{n≥0} sup_{m≥n} f_m(ω)    (A.9)

(lim inf_{n→+∞} f_n)(ω) := lim inf_{n→+∞} f_n(ω) = sup_{n≥0} inf_{m≥n} f_m(ω)    (A.10)

A.1.7 Fact. If fn (n = 0, 1, . . .) is measurable then:

(1) sup_n f_n, inf_n f_n, lim sup_{n→+∞} f_n, lim inf_{n→+∞} f_n are measurable

(2) CV ((fn )) := {ω ∈ Ω|fn (ω) converges} ∈ A

A.1.8 Definition. A measurable real function s : Ω → R is said to be simple if its
range is a finite set. If the range is {a_1, . . . , a_n} (where the a_i are mutually
distinct), then A_i := {ω | s(ω) = a_i} ∈ A and one has:

s = ∑_{i=1}^n a_i 1_{A_i}    (A.11)

Any function of the form s = ∑_{i=1}^n a_i 1_{A_i}, where A_i ∈ A and a_i ∈ R, is simple;
note that the A_i are not required to be pairwise disjoint. The form of s given in the
definition is the canonical form: it is the only representation of s in which the a_i are
distinct and are exactly the values of s.

A.1.9 Theorem. Let f : Ω → [−∞, +∞] be measurable.

(i) There exists a sequence (sn ) of simple measurable functions such that: sn (ω) →
f (ω) when n → +∞, for every ω ∈ Ω

(ii) If f ≥ 0 then the sequence (sn ) may be chosen to be positive and increasing:
0 ≤ s0 ≤ s1 ≤ · · · ≤ f

(iii) If f is bounded (that is, |f| ≤ M for some M ∈ R+), then the sequence (s_n) can be
chosen to be increasing and uniformly convergent.
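The usual construction behind (ii) truncates f at level n and rounds down to dyadic values; a minimal sketch of this construction (added for illustration, the function f below is an arbitrary example):

```python
import math

def dyadic_approx(f, n):
    """Return the simple function s_n = min(floor(2^n f) / 2^n, n), for f >= 0.
    s_n takes finitely many values and s_n increases pointwise to f."""
    def s(omega):
        return min(math.floor((2 ** n) * f(omega)) / (2 ** n), n)
    return s

f = lambda x: x * x                        # an arbitrary nonnegative function
for n in (1, 3, 6, 10):
    print(n, dyadic_approx(f, n)(1.3))     # increases towards f(1.3) = 1.69
```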

A.2 Positive measure

A.2.1 Definition. Let (Ω, A ) be a measurable space. A positive measure on (Ω, A )


is a map µ : A → [0, +∞] satisfying the following properties:

(i ) µ(∅) = 0,

(ii) For any sequence (A_n ∈ A , n = 0, 1, . . .) of pairwise disjoint elements of A
(A_i ∩ A_j = ∅ for i ≠ j): µ(∪_{n≥0} A_n) = ∑_{n≥0} µ(A_n).

The triple (Ω, A , µ) is called a measure space.

Note that in the definition, requirement (ii) is equivalent to the two following conditions:

(iia) µ(∪_{k=1}^n A_k) = ∑_{k=1}^n µ(A_k) for any finite family (A_i ∈ A , i = 1, . . . , n) of pairwise
disjoint subsets (A_i ∩ A_j = ∅ for i ≠ j),

(iib) µ(∪_{n≥0} A_n) = lim_{n→+∞} µ(A_n) for any increasing sequence (A_n ∈ A , n = 0, 1, . . .):
A_0 ⊂ A_1 ⊂ · · ·

Moreover:

(iii) µ(∩_{n≥0} A_n) = lim_{n→+∞} µ(A_n) for any decreasing sequence (A_n ∈ A , n = 0, 1, . . .):
A_0 ⊃ A_1 ⊃ · · · , provided there is at least some i_0 such that µ(A_{i_0}) < +∞.

Arithmetic in [0, +∞] Throughout this chapter we adopt the following rules:
x + (+∞) = (+∞) + x = +∞ for all x ∈ [0, +∞]
0.x = x.0 = 0 for all x ∈ [0, +∞]
(+∞).x = x.(+∞) = +∞ if x > 0

A.2.1 Integration of positive functions

A.2.2 Definition. Let (Ω, A , µ) be a positive measure space. For any E ∈ A and
any simple function s of the form

s = ∑_{i=1}^n a_i 1_{A_i}    (A.12)

where (a_1, . . . , a_n) are distinct in [0, +∞) and (A_1, . . . , A_n) is a partition of Ω, the
integral of s on E with respect to µ is the extended positive number (i.e. in [0, +∞]):

∫_E s dµ = ∑_{i=1}^n a_i µ(A_i ∩ E)    (A.13)

For any measurable f : Ω → [0, +∞] the integral of f on E with respect to µ is the extended
positive number:

∫_E f dµ = sup{ ∫_E s dµ : s simple, s ≤ f }    (A.14)

A.2.3 Proposition. Let f, g : Ω → [0, +∞] be measurable and let E ∈ A .

(1) If 0 ≤ f ≤ g then ∫_E f dµ ≤ ∫_E g dµ

(2) If A ⊂ B then ∫_A f dµ ≤ ∫_B f dµ

(3) If c ∈ [0, +∞] then ∫_E c f dµ = c ∫_E f dµ

(4) If f(ω) = 0 for all ω ∈ E, then ∫_E f dµ = 0

(5) If µ(E) = 0 and 0 ≤ f ≤ +∞ then ∫_E f dµ = 0

(6) If ∫_E f dµ = 0 then µ({ω ∈ E | f(ω) > 0}) = 0

(7) If ∫_E f dµ < +∞ then µ({ω ∈ E | f(ω) = +∞}) = 0

(8) ∫_E f dµ = ∫_Ω 1_E f dµ

In the sequel we drop the subscript Ω in the integral ∫_Ω f dµ when the integral is over
the whole space.

A.2.4 Theorem (Lebesgue monotone convergence Theorem). Let (fn ) be a sequence


of measurable functions such that:

(i) 0 ≤ f0 (ω) ≤ f1 (ω) ≤ · · · ≤ +∞, for every ω ∈ Ω



(ii) fn (ω) → f (ω) as n → ∞, for every ω ∈ Ω

Then f is measurable and :


∫ f_n dµ → ∫ f dµ as n → ∞    (A.15)

A.2.5 Theorem. If f_n : Ω → [0, +∞] is measurable for n = 0, 1, 2, . . . and

f(ω) = ∑_{n=0}^∞ f_n(ω) (ω ∈ Ω)    (A.16)

then

∫ f dµ = ∑_{n=0}^∞ ∫ f_n dµ    (A.17)

A.2.6 Theorem (Fatou lemma). If (f_n) is a sequence of positive measurable functions then:

∫ (lim inf_{n→∞} f_n) dµ ≤ lim inf_{n→∞} ∫ f_n dµ    (A.1)
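The inequality can be strict. For instance (a standard example added here for illustration), on (R, B_R, Lebesgue) take f_n = 1_{[n, n+1)}: then lim inf_n f_n = 0, so the left-hand side equals 0, while ∫ f_n dµ = 1 for every n, so the right-hand side equals 1.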

A.2.2 Integration of measurable functions

A.2.7 Definition. f : Ω → [−∞, +∞] is said to be quasi-integrable if ∫ f⁺ dµ < +∞
or ∫ f⁻ dµ < +∞. If f is quasi-integrable its integral is the extended number:

∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ    (A.18)

f is integrable if ∫ f⁺ dµ < +∞ and ∫ f⁻ dµ < +∞, or equivalently if ∫ |f| dµ < +∞.

We denote by L 1 (µ) the set of all measurable functions f : Ω → R that are integrable.
A.2.8 Proposition. We have the following:

(1) If f, g ∈ L 1 (µ) and α, β ∈ R, then αf + βg ∈ L 1 (µ)

(2) If f ∈ L 1 (µ) then |∫ f dµ| ≤ ∫ |f| dµ

A.2.9 Theorem (Lebesgue dominated convergence theorem). Let fn : Ω → R be a


sequence of measurable functions and let g ∈ L 1 (µ). Assume that:

(1) fn (ω) → f (ω) as n → ∞ (ω ∈ Ω)

(2) |fn (ω)| ≤ g(ω) (n = 0, 1, · · · ; ω ∈ Ω)

Then :

(3) f ∈ L 1 (µ),

(4) ∫ |f_n − f| dµ → 0 as n → ∞, and

(5) ∫ f_n dµ → ∫ f dµ as n → ∞

The role of sets of null measure

A property that depends measurably on ω is said to be true almost everywhere if the
set of all ω where the property fails has µ-measure 0. This applies in particular to
measurable functions.

A.2.10 Definition. Two measurable functions f, g : Ω → [−∞, +∞] are equivalent if:

µ({ω | f(ω) ≠ g(ω)}) = 0    (A.19)

This equivalence is denoted f ≡_µ g, and commonly f = g a.e. [µ] or simply f = g a.e.

Theorems A.2.4, A.2.6, A.2.9 remain true if we replace functions in R by functions in
R̄, and equality of functions, inequality of functions, limits of sequences of functions,
when they are stated for all ω ∈ Ω, by the corresponding relations a.e. [µ].

A.2.3 Radon-Nikodym derivative

A positive measure µ is bounded (or finite) if µ(Ω) < +∞. µ is σ-finite if there exists
an increasing sequence A1 ⊂ A2 ⊂ · · · in A such that µ(An ) < +∞ and ∪n An = Ω.

A.2.11 Definition. Let λ and µ be two positive measures on (Ω, A ).


(i) λ is absolutely continuous with respect to µ (written λ << µ) if λ(A) = 0 for every
A ∈ A such that µ(A) = 0 (that is : µ(A) = 0 ⇒ λ(A) = 0).
(ii) λ and µ are equivalent if for any A ∈ A : λ(A) = 0 if and only if µ(A) = 0

A.2.12 Theorem (Radon-Nikodym Theorem). Let (Ω, A , µ) be a bounded positive
measure space and let λ be a positive measure on (Ω, A ). If λ << µ then there exists a
unique (up to equivalence) measurable function f : Ω → [0, +∞] such that dλ = f dµ.
Moreover:
(i) f ∈ L 1 (µ) if and only if λ is bounded
(ii) f is finite if and only if λ is σ-finite

A.2.13 Remark. The main statement of the Radon-Nikodym theorem remains true if
µ and λ are both assumed to be σ-finite positive measures; note however that in this case
the function f obtained is only "locally integrable", which means precisely
that there exists an increasing sequence A_1 ⊂ A_2 ⊂ · · · such that µ(A_n) < +∞,
∪_n A_n = Ω, and ∫_{A_n} f dµ < +∞.
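In the purely atomic case the density is just a ratio of point masses; the following is a minimal sketch (the measures on the three-point space are made up for the example):

```python
# d(lambda) = f d(mu) on a finite space: f(x) = lambda({x}) / mu({x}) wherever
# mu({x}) > 0, and f(x) = 0 on the mu-null points (allowed since lambda << mu).
mu  = {"a": 0.5, "b": 0.5, "c": 0.0}
lam = {"a": 0.2, "b": 0.8, "c": 0.0}       # lam << mu since lam vanishes where mu does

f = {x: (lam[x] / mu[x] if mu[x] > 0 else 0.0) for x in mu}

A = {"a", "c"}
print(sum(f[x] * mu[x] for x in A), lam["a"] + lam["c"])   # both equal 0.2
```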

A.2.4 Lp spaces

We denote by L ≡ L (Ω, A ) the set of all measurable functions Ω → (−∞, +∞), by


L(µ) ≡ L(Ω, A , µ) the set of all equivalence classes of L with respect to the equivalence
relation ≡µ , and by N ≡ N (µ) the set of all functions that are 0 a.e.[µ]. L is a real
vector space for pointwise addition and multiplication by a scalar. N is a vector
subspace of L . The set L(µ) is endowed with a vector space structure as the quotient
space L /N (µ).
For any f : Ω → [−∞, +∞] and α ∈ R, put A(α, f ) = {ω| f (ω) > α} and define:

ess sup f = sup{α ∈ R | µ(A(α, f)) > 0}    (A.20)

(Convention: inf ∅ = +∞, sup ∅ = −∞.)

ess sup f is called the essential supremum of f. It obviously depends on µ (through
its null sets only).
Put A(f ) = {α ∈ R|µ({ω| f (ω) > α}) = 0}. We have the following:
(1) ess sup f = inf A(f ),
(2) f ≤ α µ-a.e. if and only if ess sup f ≤ α.
(3) If f ≡µ g then ess sup f = ess sup g
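For example (a standard illustration added here), on ([0, 1], B, Lebesgue) the function f = 1_{Q∩[0,1]} satisfies sup f = 1 while ess sup f = 0, since {f > 0} = Q ∩ [0, 1] has measure 0; in particular f ≡_µ 0.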
For any measurable f : Ω → [−∞, +∞] define:

‖f‖_∞ = ess sup |f|    (A.21)

‖f‖_p = ( ∫ |f|^p dµ )^{1/p}    (1 ≤ p < +∞)    (A.22)

For 1 ≤ p ≤ +∞ we denote by L p (µ) ≡ L p (Ω, A , µ) the set of all functions
f : Ω → (−∞, +∞) such that ‖f‖_p < +∞.

A.2.14 Proposition. Let 1 ≤ p ≤ ∞. For any α, β ∈ R and f, g ∈ L p (Ω, A , µ) one has:

(1) αf + βg ∈ L p (µ),

(2) ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p


(3) ‖αf‖_p = |α| ‖f‖_p

(4) ‖f‖_p = 0 if and only if f ≡_µ 0

Therefore L p (µ) is a vector subspace of L , and ‖ · ‖_p is a pseudo-norm on L p (µ).

L p (µ) contains N as a vector subspace. We denote by Lp (µ) ≡ Lp (Ω, A , µ) the
set of all equivalence classes of L p (µ) with respect to the equivalence relation ≡_µ, or
equivalently the vector space L p (µ)/N . Thus Lp (µ) is a vector subspace of L(µ). For
any class f̃ ∈ Lp (µ), we define ‖f̃‖_p as ‖f‖_p where f ∈ f̃; it is clear that this definition does
not depend on the choice of the representative f in f̃. Note that N is the zero of Lp (µ) and
that ‖f̃‖_p = 0 if and only if f̃ = N . It follows that ‖ · ‖_p is a norm on Lp (µ). In fact
we have:

A.2.15 Theorem. Lp (Ω, A , µ) is a Banach space (1 ≤ p ≤ ∞)


Appendix B

Monotone class

B.1 Monotone class theorem for sets


In the following, we are given a set Ω. The word class is used for “set of subsets of Ω”.

B.1.1 Definition. A set M of subsets of Ω is a monotone class (or λ-system) if it has


the following properties:
(i) If A, B ∈ M and B ⊂ A then A − B ∈ M
(ii) If (An )n≥0 is an increasing sequence in M then ∪n≥0 An ∈ M .

Since the intersection of any family of monotone classes is a monotone class, and
since the set of all subsets of Ω is a monotone class, the intersection of all monotone
classes containing some set B of subsets of Ω (a class), is a monotone class. This is the
smallest monotone class containing B and it will be denoted B m .

B.1.2 Definition. A class I is said to be stable for finite intersection (or π-system)
if it has the following property:
(i) If A ∈ I and B ∈ I , then A ∩ B ∈ I .

It is clear that a class containing Ω is a σ-field if and only if it is a monotone class


and stable for finite intersection. If B is any class we denote by σ(B) the smallest
σ-field containing B. The following result is often used to prove that some class is or
contains some desired σ-field.

B.1.3 Theorem (monotone class theorem for sets). Let M be a monotone class and
let I be a class that is stable by finite intersection and containing Ω. If M contains
I then M contains σ(I ).

Proof. We are going to prove that I m , the smallest monotone class containing I , is
stable by finite intersection, thus showing that I m is a σ-field. Since I ⊂ I m ⊂ M ,


we will have σ(I ) ⊂ I m ⊂ M , and the theorem will be proved. Let M1 = {A ∈


M | ∀B ∈ I , A ∩ B ∈ I m }. It is easy to see that M1 is a monotone class containing I
so that I m ⊂ M1 . Let M2 = {A ∈ M | ∀B ∈ I m , A ∩ B ∈ I m }. M2 is a monotone
class. Moreover let A ∈ I . For any B ∈ I m , we have B ∈ M1 , so that, by definition
of M1 , A ∩ B ∈ I m , thus A ∈ M2 . This proves I ⊂ M2 . It follows that I m ⊂ M2 ,
or put otherwise, I m is stable by finite intersection. 

B.2 Monotone class theorem for functions


A map f : Ω → R̄ will be called a numerical function. A real function is a map
f : Ω → R. A numerical function is said to be bounded if there exists some K ∈ R+
such that |f(ω)| ≤ K for all ω ∈ Ω. The set of all bounded real functions defined on Ω is a
subspace of the vector space of all real valued functions.

B.2.1 Definition. A set H of numerical functions is stable by (bounded) monotone


convergence if whenever a non decreasing sequence (fn ) of (bounded) functions in H
converges to some (bounded) function f , then f ∈ H .

B.2.2 Definition. A set of real functions C is stable by multiplication if whenever


f ∈ C and g ∈ C then f g ∈ C

B.2.3 Theorem (Monotone Class theorem - functional form). Let H be a vector space
of bounded functions defined on Ω that is stable by bounded monotone convergence and
let C be a set of real functions stable by multiplication and containing the constant
function 1. If H contains C then H contains all bounded σ(C )-measurable functions.

We start with the following:

B.2.4 Lemma. Let H0 be a vector space of bounded real functions that is stable by
bounded monotone convergence, by multiplication and containing the constant 1. Then
H0 is the set of bounded σ(H0 )-measurable functions.

Proof. 1) H0 is stable under the maps f ↦ |f|, f ↦ f⁺, (f, g) ↦ f ∧ g, (f, g) ↦ f ∨ g.
Let f ∈ H0 with |f| ≤ 1 (in the general case replace f by f/‖f‖). Recall that
√(1 − x) = 1 − ∑_{n≥1} α_n xⁿ, where α_n ≥ 0 (n ≥ 1), for all 0 ≤ x ≤ 1. Since |f| = √(1 − (1 − f²)), we
have the equality: |f| = 1 − ∑_{n≥1} α_n (1 − f²)ⁿ. It follows that 1 − |f| ∈ H0 by bounded
monotone convergence. Since 1 ∈ H0, we conclude that |f| ∈ H0. From f⁺ = ½(f + |f|),
f⁻ = (−f)⁺, f ∨ g = g + (f − g)⁺ and f ∧ g = −((−f) ∨ (−g)), we deduce the rest of
the claim.
2) Let B := {A ⊂ Ω : 1A ∈ H0 }. B is clearly a σ-field.

3) If f ∈ H0 then f is B-measurable: Assume first that f ≥ 0 and let A = {ω | f(ω) ≥ 1}.
Then 1_A = lim_{n→+∞} (f ∧ 1)ⁿ (a bounded monotone limit), therefore A ∈ B. For any
t > 0, {ω | f(ω) ≥ t} = {ω | (f/t)(ω) ≥ 1} is in B. Thus f is B-measurable. In general,
if f ∈ H0, then by the preceding f + ‖f‖ is B-measurable (being nonnegative and in H0), and hence so is f.

4) B = σ(H0): By definition B ⊂ σ(H0), and by 3) σ(H0) ⊂ B.

5) If f ∈ H0 then f is B-measurable and bounded.

6) If f is B-measurable and bounded then f ∈ H0: By definition of B, any B-measurable
indicator function is in H0; so is any B-simple function (that is, a function that takes only
a finite number of values), since it is a linear combination of indicator functions. More
generally, any B-measurable bounded f ≥ 0 is an increasing limit of B-simple functions and
therefore f ∈ H0; finally, any B-measurable bounded f can be written f = f⁺ − f⁻ and
therefore is in H0.
The conclusion follows from 5) and 6). 

Proof of theorem B.2.3. Let H0 be the intersection of all vector subspaces of H
containing C and stable by bounded monotone convergence. Then H0 is a vector
subspace of H containing C and stable by bounded monotone convergence. We claim
that H0 is stable by multiplication. By Lemma B.2.4, it follows that H0 is the set of all
σ(H0)-measurable bounded functions. Therefore H0 (and hence H ) contains the σ(C )-measurable
bounded functions.

Proof of the claim: H0 is stable by multiplication. Let L0 be the smallest vector
subspace of H containing C . Clearly L0 ⊂ H0.

1) L0 is stable by multiplication. The proof is straightforward.

2) Let H1 = {f ∈ H0 | ∀g ∈ L0 : f g ∈ H0}; then H1 = H0. First remark that in
order that f ∈ H0 belong to H1, it is enough to prove: ∀g ∈ L0, g ≥ 0 : f g ∈ H0.
Indeed, if the latter is proved, then for any g ∈ L0, g + ‖g‖ ∈ L0 is nonnegative,
therefore f(g + ‖g‖) ∈ H0 and f g = f(g + ‖g‖) − ‖g‖ f. Since ‖g‖ f ∈ H0 we conclude
that f g ∈ H0. Thus H1 = {f ∈ H0 | ∀g ∈ L0, g ≥ 0 : f g ∈ H0}. Now H1 is easily
seen to be a vector space. H1 is stable by bounded monotone convergence: if f_n ↗ f,
f_n ∈ H1 (n ∈ N) and f is bounded, then for all g ∈ L0, g ≥ 0, we have f_n g ↗ f g and
consequently f g ∈ H0. It follows that f ∈ H1. Finally H1 contains C . Thus H1
contains H0, and since H1 ⊂ H0 we conclude that H1 = H0.

3) H2 = {f ∈ H0 |∀g ∈ H0 : f g ∈ H0 }. Then H2 = H0 . By a similar argument to the


one used in (2) H2 = {f ∈ H0 |∀g ∈ H0 , g ≥ 0 : f g ∈ H0 }. We prove similarly that H2
is a vector space stable by bounded monotone convergence. Finally it follows from 2)
that H2 contains C . Therefore H2 contains H0 and since H2 ⊂ H0 we conclude that

H2 = H0 .
We conclude from (3) that H0 is stable by multiplication. 
Appendix C

Transition probability

We consider a family (N_x, x ∈ E) of probabilities on the measurable space (F, F ),
where E is a set. If E is itself endowed with a measurable structure, then it is natural
to require that the map x → N_x be measurable in some sense. The following definition
provides the right assumptions:

C.0.1 Definition. A kernel from the measurable space (E, E ) to the measurable space
(F, F ) is a mapping N : E × F → [0, +∞] such that:
(i) For all x ∈ E, N(x, ·) is a nonnegative measure on (F, F ),
(ii) For all B ∈ F , N(·, B) is E -measurable.

We see that a kernel is simply a family of nonnegative measures (N_x, x ∈ E) with
the additional requirement of measurability in the sense of property (ii). A kernel is
also called a transition measure. The kernel is said to be sub-Markov (resp. Markov)
if N(x, F) ≤ 1 (resp. N(x, F) = 1) for all x ∈ E. A Markov kernel is also called a
transition probability. The measure N(x, ·) is also denoted N_x(·).

C.1 Composition of probabilities


Given a measurable family of nonnegative measures (N_x, x ∈ E) on F and a positive
measure µ on E, it is meaningful to define an integral of measures ∫ N dµ as a
measure on F. The following specializes the construction to the case of probabilities; the
result of the construction is what we call the composition of probabilities, in fact the product
of µ by N. This leads to a general form of the Fubini theorem.

C.1.1 Proposition. Let N be a transition probability from (E, E ) to (F, F ) and let µ
be a probability measure on (E, E ).
1) There exists a probability measure Q on (E × F, E ⊗ F ) such that for all A ∈ E , B ∈ F :

Q(A × B) = ∫_A N(x, B) µ(dx)    (C.1)

Such a measure is unique and is denoted µ ⋉ N.

2) Let f : E × F → R be E ⊗ F -measurable.
(i) If f ≥ 0 then the map x → ∫_F f(x, y) N_x(dy) is E -measurable and ≥ 0, and we have the
equality:

∫ f d(µ ⋉ N) = ∫_E [ ∫_F f(x, y) N_x(dy) ] µ(dx)    (C.2)

(ii) f is µ ⋉ N-integrable if and only if the map x → ∫_F |f(x, y)| N_x(dy) is µ-integrable,
and in this case equality (C.2) holds.

The marginal of µ ⋉ N on F is a probability measure denoted µ.N or simply µN.
By definition µN(B) = ∫_E N(x, B) µ(dx) for all B ∈ F . This also justifies the
notation µ.N = ∫ N dµ, which shows that µ.N is obtained as an "integral" of the
measure-valued function x → N_x. Intuitively, µN is the probability obtained by
composition of probabilities. In the discrete case this amounts to first selecting x randomly
using the distribution µ, then using the distribution N_x to select y ∈ F.
µ → µN defines an operator from the set of probabilities on (E, E ) to the set of
probabilities on (F, F ). Dually, let f ∈ F_b. For any x ∈ E, ∫_F f(y) N_x(dy) will be
denoted N(x, f) or N f(x). Since x → N f(x) is E -measurable and bounded, N gives
rise to an operator f → N f from the Banach space F_b to the Banach space E_b.
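In the discrete case all of these objects reduce to matrix and vector operations; the following minimal sketch (with made-up finite spaces and numbers) illustrates µ ⋉ N, µN and N f:

```python
import numpy as np

# E = {0, 1}, F = {0, 1, 2}; N is a transition probability (rows sum to 1),
# mu is a probability on E.
N  = np.array([[0.10, 0.60, 0.30],
               [0.50, 0.25, 0.25]])
mu = np.array([0.4, 0.6])

Q   = mu[:, None] * N          # mu ⋉ N: joint probability on E x F, Q(x, y) = mu(x) N(x, y)
muN = mu @ N                   # composition mu.N: the marginal of Q on F

f  = np.array([1.0, 2.0, 0.5]) # a bounded function on F
Nf = N @ f                     # N f(x) = sum_y N(x, y) f(y), a function on E

print(muN, muN.sum())          # a probability on F
print(mu @ Nf, (Q * f).sum())  # both equal the integral of f under mu.N, as in (C.2)
```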

C.2 Regular conditional probability


Let (Ω, A , P ) be a probability space, (E, E ) a measurable space and T : Ω → E a
A /E -measurable map.
For any given A ∈ A the conditional probability P (A |T ) is a r.v. defined up to a set
of P -measure 0 (that depends on A). If for any A ∈ A , we choose an arbitrary version
of the conditional probability P (A |T )(·), it is not true that the σ-additivity of the map
P (· |T )(ω) holds for a significant set of states ω. The question is therefore whether we
can find a family of probability measures (that is σ-additive measures) (N (T (ω) , ·),
ω ∈ Ω) such that for any given A ∈ A , N (T (ω), A) = P (A |T )(ω) for almost all ω. If
such an object exists then we say that this a regular conditional probability of P given
T.
In what follows E × Ω is endowed with the product σ-field E ⊗ A . We denote by
µ := T(P) = P_T the probability on (E, E ) image of P by T, and Q := P_{T×Id_Ω} the
probability measure on (E × Ω, E ⊗ A ), image of P by T × IdΩ . By definition Q is


the unique measure on (E × Ω, E ⊗ A ) such that Q(B × A) = P (A ∩ T −1 (B)) for all
B ∈ E and A ∈ A . Note that the marginal of Q on E is µ and the marginal on Ω is P .

C.2.1 Definition. A Markov kernel N from (E, E ) to (Ω, A ) is a quotient regular
conditional probability (QRCP) of P w.r.t. T if for any A ∈ A one has:

N(T(ω), A) = P(A | T)(ω) a.s.    (C.3)

C.2.2 Proposition. The following properties are equivalent:

(i) N is a QRCP of P w.r.t. T
(ii) Q = µ ⋉ N
(iii) for any A ∈ A , B ∈ E :

P(A ∩ T⁻¹(B)) = ∫_B N(t, A) µ(dt)    (C.4)

(iv) For any Y that is A -measurable and ≥ 0 (resp. bounded, resp. integrable):

E(Y | T)(ω) = N(T(ω), Y) a.s.    (C.5)

or explicitly E(Y | T) = h(T) ≡ h ◦ T where:

h(t) = ∫ Y(ω) N_t(dω)    (C.6)

(v) For any Y that is A -measurable and ≥ 0 (resp. bounded, resp. integrable) and any
f that is E -measurable and ≥ 0 (resp. bounded):

∫ f(T(ω)) Y(ω) P(dω) = ∫_E f(t) [ ∫_Ω Y(ω) N_t(dω) ] µ(dt)    (C.7)

C.2.3 Remark. If N is a QRCP of P given T, then in particular one has: P = µ.N = ∫ N dµ.
Therefore N is in a sense a "decomposition" of P via the measure space
(E, E , µ). Put B := σ(T) and ν(ω, A) = N(T(ω), A) for all ω ∈ Ω and A ∈ A . Then ν
is a transition probability from (Ω, B) to (Ω, A ), and one can write P = P̂ν = ∫ ν dP̂,
where we denoted by P̂ the restriction of P to B. Thus ν is a decomposition of P via
the measure space (Ω, B, P̂).

Given a probability space (Ω, A , P), a measurable space (E, E ) and an A /E -measurable
map T : Ω → E, it is not always possible to find a decomposition of P via the
measure space (E, E , T(P)); in other words, it is not true that a quotient regular
conditional probability always exists. We shall content ourselves with the following
existence result, which is sufficient for most applications in probability theory:

C.2.4 Theorem. Assume that Ω is a Polish space endowed with its Borel σ-field (A = B_Ω),
that (E, E ) is a measurable space and that T : Ω → E is A /E -measurable. Then
there exists a QRCP N of P given T.

One standard particular case where regular conditional probability is often considered
is the following: E = Ω, E = B is a sub-σ-field of A and T = id_Ω.

C.2.5 Definition. Let P be a probability measure on (Ω, A ) and let B be a sub-σ-field
of A . A kernel m from Ω to (Ω, A ) is a regular conditional probability (RCP in
the sequel) of P given B if m is B-measurable and if, for all A ∈ A ,

m(ω, A) = P(A | B)(ω) a.s.    (C.8)

Equivalently, m is a QRCP of P w.r.t. id_Ω, where id_Ω is considered as an A /B-measurable
map.

An RCP given B is a special case of QRCP: just take (E, E ) = (Ω, B) and T the
identity on Ω. Conversely, given a QRCP N of P given T, put n(ω, A) = N(T(ω), A);
then n is an RCP of P given σ(T).

C.2.6 Definition. Let Q be a probability on the product (E × F, E ⊗ F ), let π_E be the
projection on E, and µ = π_E(Q) the marginal of Q on E. A transition probability N from
E to F is a product regular conditional probability (PRCP in the sequel) of Q given
π_E if Q = µ ⋉ N, that is, for all A ∈ E and B ∈ F the following equality holds:

∫_A N_x(B) µ(dx) = Q(A × B)    (C.9)

or equivalently: for any f : E × F → R₊ that is E ⊗ F -measurable,

∫_E [ ∫_F f(x, y) N_x(dy) ] µ(dx) = ∫_{E×F} f dQ    (C.10)

C.2.7 Proposition. Let (Ω, A , P) be a probability space and let X and Y be two
random variables with values in (E, E ) and (F, F ) respectively. Let Q = P_{X,Y} be the
probability measure on (E × F, E ⊗ F ), image of P by (X, Y). If N is a PRCP of Q given
π_E, then:
1) For any B ∈ F : P(Y ∈ B | X)(ω) = N(X(ω), B) P-a.s.
2) For any f : E × F → R that is E ⊗ F -measurable and ≥ 0, or Q-integrable:
E(f(X, Y) | X) = h(X) a.s., where h(x) = ∫_F f(x, y) N_x(dy) (x ∈ E).
3) In particular, when f : F → R is ≥ 0 or P_Y-integrable, we have:

E(f(Y) | X)(ω) = ∫_F f(y) N(X(ω), dy)

If we put N f(x) = ∫_F f(y) N(x, dy), then E(f(Y) | X) = N f(X) P-a.s.
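For a joint law on a finite product, the PRCP is just the elementary conditional distribution; a hedged sketch (the joint table is made up for the example, with all marginals positive):

```python
import numpy as np

# Joint law Q of (X, Y) on E x F with |E| = 2, |F| = 3 (entries sum to 1).
Q  = np.array([[0.10, 0.20, 0.10],
               [0.15, 0.05, 0.40]])
mu = Q.sum(axis=1)                     # marginal of X (assumed > 0 everywhere here)
N  = Q / mu[:, None]                   # N(x, .) = Q(x, .) / mu(x): each row sums to 1

f  = np.array([0.0, 1.0, 3.0])         # a function of Y
Nf = N @ f                             # E(f(Y) | X = x) = N f(x)
print(N)
print(Nf, (Q @ f) / mu)                # two computations of the same conditional mean
```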

C.2.8 Theorem. Let (E, E ) be a measurable space and let F be a Polish space with F
its Borel σ-field. Let Q be a probability on the product (E × F, E ⊗ F ). Then there exists
a PRCP N of Q given π_E.
Appendix D

Uniform integrability

(Ω, A , P ) is a probability space and E denotes the expectation operator.

D.0.1 Definition. A family (X_i)_{i∈I} of real r.v. is said to be uniformly integrable (or
equiintegrable) if

sup_{i∈I} ∫_{{|X_i|>α}} |X_i| dP ↓ 0 when α ↑ +∞    (D.1)

Explicitly: ∀ε > 0, ∃α_0 > 0, ∀α > 0, ∀i ∈ I, α > α_0 ⇒ ∫_{{|X_i|>α}} |X_i| dP < ε
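A family that is bounded in L¹ but not uniformly integrable (a standard example added here for illustration): on ([0, 1], B, Lebesgue), take X_n = n 1_{(0,1/n)}. Then E|X_n| = 1 for every n, but for any α > 0 one has ∫_{{|X_n|>α}} |X_n| dP = 1 as soon as n > α, so the supremum in (D.1) stays equal to 1 and does not go to 0.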

D.0.2 Proposition. If |Xi | ≤ X a.s. for all i ∈ I and X is integrable then (Xi , i ∈ I)
is uniformly integrable. In particular any finite family of integrable r.v. is uniformly
integrable.

D.0.3 Proposition. For any X ∈ L1 , and any family Σ of sub-σ-fields of A , the


family (EB (X), B ∈ Σ) is equiintegrable

Proof. For any sub-σ-field B of A , |E^B(X)| ≤ E^B(|X|). Thus for any a > 0,

∫_{(|E^B(X)|>a)} |E^B(X)| dP ≤ ∫_{(E^B(|X|)>a)} E^B(|X|) dP = ∫_{(E^B(|X|)>a)} |X| dP,

the last equality being valid since (E^B(|X|) > a) ∈ B. For any b > 0:

∫_{(E^B(|X|)>a)} |X| dP = ∫_{(E^B(|X|)>a)∩(|X|≤b)} |X| dP + ∫_{(E^B(|X|)>a)∩(|X|>b)} |X| dP
  ≤ b P(E^B(|X|) > a) + ∫_{(|X|>b)} |X| dP
  ≤ (b/a) E(E^B(|X|)) + ∫_{(|X|>b)} |X| dP
  = (b/a) E(|X|) + ∫_{(|X|>b)} |X| dP

If we take b = √a and let a go to infinity we get the result. □

D.0.4 Definition. A family (Xi , i ∈ I) of real r.v. is said to be P -equicontinuous if:


sup_{i∈I} ∫_A |X_i| dP → 0 when P(A) → 0    (D.2)


Explicitly: ∀ε > 0, ∃η > 0, ∀A ∈ A , ∀i ∈ I, P(A) < η ⇒ ∫_A |X_i| dP < ε.

The family (X_i, i ∈ I) is said to be bounded in L¹ if:

sup_{i∈I} ∫_Ω |X_i| dP < +∞    (D.3)

D.0.5 Proposition. A family of r.v. (Xi , i ∈ I) is equiintegrable if and only if it is


P -equicontinuous and bounded in L1 .

Proof. Let (X_i, i ∈ I) be equiintegrable. Taking ε = 1 in Definition D.0.1, there exists
some α > 0 such that ∫_{{|X_i|>α}} |X_i| dP < 1 for all i ∈ I. It follows that
∫ |X_i| dP ≤ α + ∫_{{|X_i|>α}} |X_i| dP < α + 1. Therefore the family is bounded in L¹. For any ε > 0, choose
α such that ∫_{{|X_i|>α}} |X_i| dP < ε/2 for all i ∈ I, and let η > 0 be such that αη < ε/2. For any A ∈ A such
that P(A) < η, and any i ∈ I, we have ∫_A |X_i| = ∫_A 1_{{|X_i|>α}} |X_i| + ∫_A 1_{{|X_i|≤α}} |X_i| ≤
ε/2 + αη < ε. This proves that (X_i, i ∈ I) is P-equicontinuous.
Conversely, let M be such that ∫_Ω |X_i| dP ≤ M for all i ∈ I. For any ε > 0 let η > 0 be such that
∫_A |X_i| < ε for all i ∈ I and all A that satisfy P(A) < η (⋆). Let α > 0. From ∫ 1_{{|X_i|>α}} |X_i| +
∫ 1_{{|X_i|≤α}} |X_i| = ∫ |X_i| ≤ M, we deduce α P(|X_i| > α) + ∫ 1_{{|X_i|≤α}} |X_i| ≤ M, and
finally P(|X_i| > α) ≤ M/α. For α > M/η one has P(|X_i| > α) < η, and therefore, in view
of (⋆), ∫_{{|X_i|>α}} |X_i| < ε. □

D.0.6 Proposition. Let (Xn , n ∈ N) be a sequence of integrable r.v. The following


are equivalent:
(i) (Xn ) converges in L1 ,
(ii) (Xn ) is equiintegrable and converges in probability to some a.s. finite r.v.
(iii) (Xn ) is P -equicontinuous and converges in probability to some a.s. finite r.v.
Moreover the limits in (i) (ii) (iii) are equal a.s.

Proof. We prove (i) ⇒ (ii). If (X_n) converges to Y in L¹, then (X_n) converges to
Y in probability and Y ∈ L¹ is a.s. finite. For any ε > 0 there exists N ∈ N such that
‖X_n − Y‖₁ < ε/2 for all n > N. It follows that for any A ∈ A , ∫_A |X_n| ≤ ∫_A |Y| + ε/2 for all n > N.
There exists η > 0 such that ∫_A max(|Y|, |X_1|, . . . , |X_N|) dP < ε/2 for all A ∈ A such
that P(A) < η. We conclude that ∫_A |X_n| < ε for any n ∈ N and any such A. This means that (X_n) is
P-equicontinuous. Since the family (X_n) is bounded in L¹, we conclude by Proposition
D.0.5 that (X_n) is equiintegrable.
(ii) ⇒ (iii) is a straightforward consequence of Proposition D.0.5.
We now prove (iii) ⇒ (i). Let (X_n) be P-equicontinuous and convergent in probability
to an a.s. finite r.v. Y. For any ε > 0, there exists η > 0 such that for any A ∈ A
with P(A) < η one has ∫_A |X_n| < ε/4 (*). On the other hand, convergence of (X_n)
in probability implies the existence of N ∈ N such that for all m, n ≥ N one has
P(|X_m − X_n| > ε/2) < η (**). Therefore for m, n ≥ N one has:

‖X_m − X_n‖₁ = ∫_{{|X_m−X_n|≤ε/2}} |X_m − X_n| + ∫_{{|X_m−X_n|>ε/2}} |X_m − X_n|
             ≤ ε/2 + ∫_A |X_n| + ∫_A |X_m|
             ≤ ε/2 + ε/4 + ε/4

where A = {|X_m − X_n| > ε/2} and we used (*) and (**). This proves that
(X_n) is a Cauchy sequence in the Banach space L¹, and therefore convergent; and we
know that the limit is Y. □
Index

σ-algebra, 59
σ-field, 59
  discrete, 4
Canonical
  process, 34
  sequence, 34
  setting, 34
canonical
  realization, 35
Communication Class, 42
Compensator, 16
Conditional
  expectation, 2
  independence, 29
  probability, 4
Conditional probability
  product, 76
  quotient, 5, 74
  regular, 4, 74
Doob
  decomposition, 16
Downcrossing Number, 18
Equicontinuous, 79
Equiintegrable, 79
Essential
  supremum, 66
Fatou
  lemma, 64
Filtration, 9
  natural, 9
Function
  harmonic, 45
  measurable, 60
  simple, 61
  superharmonic, 45
Initial distribution, 34
Integrable
  uniformly, 79
Integral
  of a measurable function, 64
  of a positive function, 63
  of a simple function, 63
Kernel, 73
  Markov, 73
  sub-Markov, 73
Markov
  matrix, 32
  chain, 36
  kernel, 73
  property, 37
  sequence, 30
  homogeneous, 34
  strong property, 38
Markov chain
  irreducible, 42
Martingale, 13
  regular, 23
  stopped, 14
Measurable
  map, 60
  space, 59
Measure
  σ-finite, 65
  bounded, 65
  excessive, 45
  invariant, 45
  non trivial, 45
  positive, 62
  space, 62
Period, 52
Potential Matrix, 39
Process, 9
  adapted, 9
  closed, 9
  discrete time, 9
  predictable, 16
  reverse, 10
Random
  process, 9
Random variable
  discrete, 6
  integrable, 1
  quasi-integrable, 1
shift operator, 35
Simple
  function, 61
Space
  filtered, 9
  measurable, 59
  measure, 62
State, 30
  aperiodic, 52
  null, 49
  positive, 49
  recurrent, 43
  transient, 43
stochastic
  matrix, 32
Submartingale, 13
Supermartingale, 13
Supremum
  essential, 66
Theorem
  of monotone convergence, 63
  of dominated convergence, 64
  of Radon-Nikodym, 65
Time
  hitting, 10
  return, 10
  stopping, 10
Transition
  matrix, 32
  measure, 73
  probability, 73
Translation Operator, 35
Upcrossing Number, 18
