
Markov Chains

Richard Lockhart

Simon Fraser University

STAT 870 — Summer 2011

Purposes of Today’s Lecture

Define Markov Chain, transition matrix.
Prove Chapman-Kolmogorov equations.
Introduce classification of states: communicating classes.
Define hitting times; prove the Strong Markov property.
Define initial distribution.
Establish relation between mean return time and stationary initial distribution.
Discuss ergodic theorem.
Markov Chains

Stochastic process: a family {X_i ; i ∈ I} of rvs; I is the index set. Often
I ⊂ R, e.g. [0, ∞), [0, 1],
Z = {. . . , −2, −1, 0, 1, 2, . . .},
or
N = {0, 1, 2, . . .}.
Continuous time: I is an interval.
Discrete time: I ⊂ Z.
Generally all X_n take values in a state space S. In the following S is a
finite or countable set; each X_n is discrete.
Usually S is Z, N or {0, . . . , m} for some finite m.
Definition
Markov Chain: a stochastic process X_n, n ∈ N, taking values in a finite
or countable set S such that for every n and every event of the form

A = {(X_0, . . . , X_{n−1}) ∈ B ⊂ S^n}

we have

P(X_{n+1} = j | X_n = i, A) = P(X_1 = j | X_0 = i)     (1)

Notation: P is the (possibly infinite) array with elements

P_{ij} = P(X_1 = j | X_0 = i)

indexed by i, j ∈ S.
P is the (one step) transition matrix of the Markov Chain.
WARNING: in (1) we require the condition to hold only when

P(X_n = i, A) > 0.

NOTE: this chain has stationary transitions.
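To see definition (1) in action it helps to simulate a chain from its transition matrix. Below is a minimal sketch in Python (the slides' own computations use Maple); the matrix is the two-state weather chain used later in these notes, and the helper name simulate_chain is introduced here for illustration only.

import numpy as np

# Two-state weather chain used later in these notes (0 = Dry, 1 = Wet).
P = np.array([[3/5, 2/5],
              [1/5, 4/5]])

def simulate_chain(P, x0, n_steps, rng):
    """Simulate X_0, ..., X_n; each step depends only on the current state."""
    path = [x0]
    for _ in range(n_steps):
        # Row path[-1] of P is the conditional law of X_{n+1} given X_n = path[-1].
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

rng = np.random.default_rng(0)
print(simulate_chain(P, 0, 10, rng))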
Stochastic matrices
Evidently the entries in P are non-negative and

Σ_j P_{ij} = 1

for all i ∈ S.
Any such matrix is called stochastic.
We define powers of P by

(P^n)_{ij} = Σ_k (P^{n−1})_{ik} P_{kj}

Notice that even if S is infinite these sums converge absolutely
because for all i

Σ_j (P^n)_{ij} = 1.
Chapman-Kolmogorov Equations 1
Condition on X_{l+n−1} to compute P(X_{l+n} = j | X_l = i):

P(X_{l+n} = j | X_l = i)
  = Σ_k P(X_{l+n} = j, X_{l+n−1} = k | X_l = i)
  = Σ_k P(X_{l+n} = j | X_{l+n−1} = k, X_l = i) P(X_{l+n−1} = k | X_l = i)
  = Σ_k P(X_1 = j | X_0 = k) P(X_{l+n−1} = k | X_l = i)
  = Σ_k P(X_{l+n−1} = k | X_l = i) P_{kj}

Now condition on X_{l+n−2} to get

P(X_{l+n} = j | X_l = i) = Σ_{k_1, k_2} P(X_{l+n−2} = k_1 | X_l = i) P_{k_1 k_2} P_{k_2 j}
Chapman-Kolmogorov Equations 2
Notice: the sum over k_2 computes the (k_1, j) entry in the matrix PP = P^2:

P(X_{l+n} = j | X_l = i) = Σ_{k_1} (P^2)_{k_1 j} P(X_{l+n−2} = k_1 | X_l = i)

We may now prove by induction on n that

P(X_{l+n} = j | X_l = i) = (P^n)_{ij}.

This proves the Chapman-Kolmogorov equations:

P(X_{l+m+n} = j | X_l = i) = Σ_k P(X_{l+m} = k | X_l = i) P(X_{l+m+n} = j | X_{l+m} = k)

These are simply a restatement of the identity

P^{n+m} = P^n P^m.
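The identity P^{n+m} = P^n P^m is easy to check numerically. A small sketch, reusing the illustrative matrix above (any stochastic matrix would do):

import numpy as np
from numpy.linalg import matrix_power

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])

# Every power of a stochastic matrix is stochastic: rows sum to 1.
for n in range(1, 6):
    assert np.allclose(matrix_power(P, n).sum(axis=1), 1.0)

# Chapman-Kolmogorov: P^(n+m) = P^n P^m.
n, m = 3, 4
assert np.allclose(matrix_power(P, n + m),
                   matrix_power(P, n) @ matrix_power(P, m))
print("Chapman-Kolmogorov holds for n = 3, m = 4")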
Remarks
These probabilities depend on m and n but not on l.
We say the chain has stationary transition probabilities.
A more general definition of Markov chain than (1) is

P(X_{n+1} = j | X_n = i, A) = P(X_{n+1} = j | X_n = i).

Notice the RHS is now permitted to depend on n.
Define P^{n,m}: the matrix with (i, j)th entry

P(X_m = j | X_n = i)

for m > n.
We get a more general form of the Chapman-Kolmogorov equations:

P^{r,s} P^{s,t} = P^{r,t}

Such a chain does not have stationary transitions.
The calculations above involve sums in which all terms are non-negative. They
therefore apply even if the state space S is countably infinite.
Extensions of the Markov Property
A function f(x_0, x_1, . . .) is defined on S^∞ = the set of all infinite sequences of
points in S.
Example: f might be

Σ_{k=0}^∞ 2^{−k} 1(x_k = 0).

Theorem
Let B_n be the event

f(X_n, X_{n+1}, . . .) ∈ C

for suitable C in the range space of f. Then

P(B_n | X_n = x, A) = P(B_0 | X_0 = x)     (2)

for any event A of the form

{(X_0, . . . , X_{n−1}) ∈ D}
Extensions of the Markov Property 2

Theorem
With B_n as before,

P(A B_n | X_n = x) = P(A | X_n = x) P(B_n | X_n = x)     (3)

"Given the present, the past and future are conditionally independent."
Markov Property: Proof of (2)
Special case:

B_n = {X_n = x, X_{n+1} = x_1, · · · , X_{n+m} = x_m}

The LHS of (2) is evaluated by repeated conditioning (cf. Chapman-Kolmogorov):

P_{x,x_1} P_{x_1,x_2} · · · P_{x_{m−1},x_m}

The same holds for the RHS.
For events defined from X_n, . . . , X_{n+m}: sum over the appropriate vectors
x, x_1, . . . , x_m.
General case: monotone class techniques.
To prove (3) write, using (2):

P(A B_n | X_n = x) = P(B_n | X_n = x, A) P(A | X_n = x)
                   = P(B_n | X_n = x) P(A | X_n = x)
Classification of States

If an entry P_{ij} is 0 it is not possible to go from state i to state j in one
step. It may be possible to make the transition in some larger number of
steps, however. We say i leads to j (or j is accessible from i) if there is an
integer n ≥ 0 such that

P(X_n = j | X_0 = i) > 0.

We use the notation i → j. Define P^0 to be the identity matrix I. Then i → j
if there is an n ≥ 0 for which (P^n)_{ij} > 0.
States i and j communicate if i → j and j → i.
Write i ↔ j if i and j communicate.
Communication is an equivalence relation: a reflexive, symmetric, transitive
relation on the states of S.
Equivalence Classes
More precisely:
Reflexive: for all i we have i ↔ i.
Symmetric: if i ↔ j then j ↔ i.
Transitive: if i ↔ j and j ↔ k then i ↔ k.
Proof:
Reflexive: follows from the inclusion of n = 0 in the definition of "leads to".
Symmetry is obvious.
Transitivity: it suffices to check that i → j and j → k imply that i → k. But
if (P^m)_{ij} > 0 and (P^n)_{jk} > 0 then

(P^{m+n})_{ik} = Σ_l (P^m)_{il} (P^n)_{lk} ≥ (P^m)_{ij} (P^n)_{jk} > 0
Equivalence Classes

Any equivalence relation on a set partitions the set into equivalence
classes; two elements are in the same equivalence class if and only if they
are equivalent.
Communication partitions S into equivalence classes, called
communicating classes.
Example
Example:

P = [ 1/4  1/4  1/4  1/4   0    0    0    0
      1/4  1/4  1/4  1/4   0    0    0    0
      1/4  1/4  1/4  1/4   0    0    0    0
      1/4  1/4  1/4  1/4   0    0    0    0
      1/4  1/4  1/2   0    0    0    0    0
       0   1/4  1/4  1/4  1/4   0    0    0
       0    0    0    0    0    0   1/2  1/2
       0    0    0    0    0    0   1/2  1/2 ]
Example

Find the communicating classes: start with, say, state 1, and see where it leads.
1 → 2, 1 → 3 and 1 → 4 in row 1.
Row 4: 4 → 1. So (transitivity) 1, 2, 3 and 4 are all in the same
communicating class.
Claim: none of these leads to 5, 6, 7 or 8.
Suppose i ∈ {1, 2, 3, 4} and j ∈ {5, 6, 7, 8}.
Then (P^n)_{ij} is a sum of products of entries P_{kl}.
It cannot be positive unless there is a sequence i_0 = i, i_1, . . . , i_n = j
with P_{i_{k−1}, i_k} > 0 for k = 1, . . . , n.
Consider the first k for which i_k ∈ {5, 6, 7, 8}.
Then i_{k−1} ∈ {1, 2, 3, 4} and so P_{i_{k−1}, i_k} = 0.
Example

So: {1, 2, 3, 4} is a communicating class.
5 → 1, 5 → 2, 5 → 3 and 5 → 4.
None of these leads to any of {5, 6, 7, 8}, so {5} must be a
communicating class.
Similarly {6} and {7, 8} are communicating classes.
Note: states 5 and 6 have a special property. Each time you are in either
state you run a risk of going to one of the states 1, 2, 3 or 4. Eventually
you will make such a transition and then never return to state 5 or 6.
States 5 and 6 are transient.
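The class structure claimed above can be checked mechanically: compute the reachability relation from the pattern of zeros in P and group mutually reachable states. A sketch in Python, assuming the reconstruction of the 8×8 matrix shown earlier:

import numpy as np

# Transition matrix as reconstructed above.
P = np.array([
    [1/4, 1/4, 1/4, 1/4, 0,   0,   0,   0  ],
    [1/4, 1/4, 1/4, 1/4, 0,   0,   0,   0  ],
    [1/4, 1/4, 1/4, 1/4, 0,   0,   0,   0  ],
    [1/4, 1/4, 1/4, 1/4, 0,   0,   0,   0  ],
    [1/4, 1/4, 1/2, 0,   0,   0,   0,   0  ],
    [0,   1/4, 1/4, 1/4, 1/4, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   0,   0,   0,   1/2, 1/2],
])
n = len(P)

# reach[i, j] = 1 iff i leads to j in some number n >= 0 of steps
# (n = 0 is included, so start from the identity pattern).
reach = ((P > 0) | np.eye(n, dtype=bool)).astype(int)
for _ in range(n):                      # transitive closure
    reach = ((reach @ reach) > 0).astype(int)

comm = (reach * reach.T) > 0            # i <-> j: each leads to the other
classes = {frozenset(np.flatnonzero(comm[i]) + 1) for i in range(n)}
print(sorted(map(sorted, classes)))     # [[1, 2, 3, 4], [5], [6], [7, 8]]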
Hitting Times

To make this precise define hitting times:

T_k = min{n > 0 : X_n = k}

We define

f_k = P(T_k < ∞ | X_0 = k)

State k is transient if f_k < 1 and recurrent if f_k = 1.
Geometric Distribution

Let N_k be the number of times the chain is ever in state k.
Claims:
1. If f_k < 1 then N_k has a Geometric distribution:

   P(N_k = r | X_0 = k) = f_k^{r−1} (1 − f_k)

   for r = 1, 2, . . ..
2. If f_k = 1 then

   P(N_k = ∞ | X_0 = k) = 1
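A small simulation makes the geometric claim concrete. The chain below is not from the slides: it is a hypothetical three-state chain with state 2 absorbing, chosen so that one can compute f_0 = 1/2 + (1/2)(1/2) = 3/4 and hence E(N_0) = 1/(1 − f_0) = 4. A sketch:

import numpy as np

# Hypothetical chain (not from the slides); state 2 is absorbing.
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])

rng = np.random.default_rng(1)

def visits_to_0():
    """Run from X_0 = 0 until absorption at 2; count visits to 0."""
    state, count = 0, 1            # the initial visit counts
    while state != 2:
        state = rng.choice(3, p=P[state])
        count += state == 0
    return count

samples = [visits_to_0() for _ in range(20000)]
print(np.mean(samples))            # should be close to 1/(1 - 3/4) = 4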
Stopping Times
Def’n: A stopping time for the Markov chain is a random variable T
taking values in {0, 1, · · · } ∪ {∞} such that for each finite k there is a
function f_k such that

1(T = k) = f_k(X_0, . . . , X_k)

Notice that the hitting time T_k defined earlier is a stopping time.
Standard shorthand notation: by

P^x(A)

we mean

P(A | X_0 = x).

Similarly we define

E^x(Y) = E(Y | X_0 = x).
Strong Markov Property

Goal: explain and prove

E(f(X_T, . . .) | X_T, . . . , X_0) = E^{X_T}(f(X_0, . . .))

Simpler claim:

P(X_{T+1} = j | X_T = i) = P_{ij} = P^i(X_1 = j).

Explanation: given what happens up to and including a random stopping
time T, the future behaviour of the chain is like that of the chain started from X_T.
Proof of Strong Markov Property
Notation: A_k = {X_k = i, T = k}
Notice: A_k = {X_T = i, T = k}. Then

P(X_{T+1} = j | X_T = i)
  = P(X_{T+1} = j, X_T = i) / P(X_T = i)
  = Σ_k P(X_{T+1} = j, X_T = i, T = k) / Σ_k P(X_T = i, T = k)
  = Σ_k P(X_{k+1} = j, A_k) / Σ_k P(A_k)
  = Σ_k P(X_{k+1} = j | A_k) P(A_k) / Σ_k P(A_k)
  = Σ_k P(X_1 = j | X_0 = i) P(A_k) / Σ_k P(A_k)
  = P_{ij}
More Proof of Strong Markov Property

Notice the use of the fact that T = k is an event defined in terms of X_0, . . . , X_k.
Technical problems with the proof:
◮ It might be that P(T = ∞) > 0. What are X_T and X_{T+1} on the event
  T = ∞?
◮ Answer: condition also on T < ∞.
◮ Prove the formula only for stopping times where {T < ∞} ∩ {X_T = i} has
  positive probability.
We will now fix up these technical details.
Proof of Strong Markov Property continued

Suppose f(x_0, x_1, . . .) is a (measurable) function on S^N. Put

Y_n = f(X_n, X_{n+1}, . . .).

Assume E(|Y_0| | X_0 = x) < ∞ for all x. Claim:

E(Y_n | X_n, A) = E^{X_n}(Y_0)     (4)

whenever A is any event defined in terms of X_0, . . . , X_n.
1. The family of f for which the claim holds includes all indicators; see the
extension of the Markov Property in the previous lecture.
2. The family of f for which the claim is true is a vector space (so if f, g are in
the family then so is af + bg for any constants a and b).
Proof of Strong Markov Property continued

So the family of f for which the claim is true includes all simple functions.
The family of f for which the claim is true is closed under monotone increasing
limits (of non-negative f_n) by the Monotone Convergence Theorem.
So the claim is true for every non-negative integrable f.
The claim follows for integrable f by linearity.
Aside on "measurable": what sorts of events can be defined in terms of a
family {Y_i : i ∈ I}?
Strong Markov Property Commentary
Natural: any event of the form (Y_{i_1}, . . . , Y_{i_k}) ∈ C is "defined in terms of
the family" for any finite set i_1, . . . , i_k and any (Borel) set C in S^k.
For countable S: each singleton (s_1, . . . , s_k) ∈ S^k is Borel. So every
subset of S^k is Borel.
Natural: if you can define each of a sequence of events A_n in terms of
the Y's, then the definition "there exists an n such that (definition of
A_n). . ." defines ∪A_n.
Natural: if A is definable in terms of the Y's then A^c can be defined
from the Y's by just inserting the phrase "It is not true that" in front
of the definition of A.
So the family of events definable in terms of the family {Y_i : i ∈ I} is a
σ-field which includes every event of the form (Y_{i_1}, . . . , Y_{i_k}) ∈ C.
We call the smallest such σ-field, F({Y_i : i ∈ I}), the σ-field
generated by the family {Y_i : i ∈ I}.
Use of Strong Markov Property
Toss a coin until I get a head. What is the expected number of tosses?
Define the state to be 0 if the toss is a tail and 1 if the toss is a head.
Define X_0 = 0.
Let N = min{n > 0 : X_n = 1}. We want

E(N) = E^0(N)

Note: if X_1 = 1 then N = 1. If X_1 = 0 then

N = 1 + min{n > 0 : X_{n+1} = 1}.

In symbols:

N = min{n > 0 : X_n = 1} = f(X_1, X_2, · · · )

and

N = 1 + 1(X_1 = 0) f(X_2, X_3, · · · )
Use of Strong Markov Property

Take expected values starting from 0:

E^0(N) = 1 + E^0{1(X_1 = 0) f(X_2, X_3, · · · )}

Condition on X_1 and get

E^0(N) = 1 + E^0[E{1(X_1 = 0) f(X_2, · · · ) | X_1}]

But

E{1(X_1 = 0) f(X_2, X_3, · · · ) | X_1} = 1(X_1 = 0) E^{X_1}{f(X_1, X_2, · · · )}
                                        = 1(X_1 = 0) E^0{f(X_1, X_2, · · · )}
                                        = 1(X_1 = 0) E^0(N)
Use of Strong Markov Property

Hence

E^0(N) = 1 + p E^0(N)

where p is the probability of tails.
Solve for E(N) to get

E(N) = 1 / (1 − p)

This is the formula for the expected value of the sort of Geometric distribution
which starts at 1 and has p as the probability of failure.
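A quick simulation sketch of this computation (the function name tosses_until_head is introduced here for illustration):

import numpy as np

rng = np.random.default_rng(2)
p_tails = 0.5                      # probability of a tail ("failure")

def tosses_until_head():
    n = 1
    while rng.random() < p_tails:  # keep tossing while we see tails
        n += 1
    return n

mean = np.mean([tosses_until_head() for _ in range(100000)])
print(mean)                        # close to 1/(1 - p_tails) = 2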
Initial Distributions

What is the meaning of unconditional expected values?
The Markov property specifies only conditional probabilities; there is no way to
deduce marginal distributions.
For every distribution π on S and transition matrix P there is a stochastic
process X_0, X_1, . . . with

P(X_0 = k) = π_k

which is a Markov Chain with transition matrix P.
Note: the Strong Markov Property proof used only conditional
expectations.
Notation: π is a probability on S. E^π and P^π are expected values and
probabilities for the chain with initial distribution π.
Summary of easy properties

For any sequence of states i_0, . . . , i_k

P(X_0 = i_0, . . . , X_k = i_k) = π_{i_0} P_{i_0 i_1} · · · P_{i_{k−1} i_k}

For any event A:

P^π(A) = Σ_k π_k P^k(A)

For any bounded rv Y = f(X_0, . . .)

E^π(Y) = Σ_k π_k E^k(Y)
Recurrence and Transience
Now consider a transient state k, that is, a state for which

f_k = P^k(T_k < ∞) < 1

Note that T_k = min{n > 0 : X_n = k} is a stopping time.
Let N_k be the number of visits to state k. That is,

N_k = Σ_{n=0}^∞ 1(X_n = k)

Notice that if we define the function

f(x_0, x_1, . . .) = Σ_{n=0}^∞ 1(x_n = k)

then

N_k = f(X_0, X_1, . . .)
Recurrence and Transience 2

Notice, also, that on the event T_k < ∞

N_k = 1 + f(X_{T_k}, X_{T_k+1}, . . .)

and on the event T_k = ∞ we have

N_k = 1
Proof
In short:

N_k = 1 + f(X_{T_k}, X_{T_k+1}, . . .) 1(T_k < ∞)

Hence

P^k(N_k = r) = E^k{P(N_k = r | F_T)}
             = E^k[P{1 + f(X_{T_k}, X_{T_k+1}, . . .) 1(T_k < ∞) = r | F_T}]
             = E^k[1(T_k < ∞) P^{X_{T_k}}{f(X_0, X_1, . . .) = r − 1}]
             = E^k{1(T_k < ∞) P^k(N_k = r − 1)}
             = E^k{1(T_k < ∞)} P^k(N_k = r − 1)
             = f_k P^k(N_k = r − 1)

It is easily verified by induction, then, that

P^k(N_k = r) = f_k^{r−1} P^k(N_k = 1)
Proof
But N_k = 1 if and only if T_k = ∞, so

P^k(N_k = r) = f_k^{r−1}(1 − f_k)

so N_k has (when the chain starts from k) a Geometric distribution, with mean 1/(1 − f_k).
The argument also shows that if f_k = 1 then

P^k(N_k = 1) = P^k(N_k = 2) = · · ·

which can only happen if all these probabilities are 0. Thus if f_k = 1,

P^k(N_k = ∞) = 1

Since N_k = Σ_{n=0}^∞ 1(X_n = k),

E^k(N_k) = Σ_{n=0}^∞ (P^n)_{kk}

So state k is transient if and only if

Σ_{n=0}^∞ (P^n)_{kk} = 1/(1 − f_k) < ∞.
Class properties

Theorem
Recurrence (or transience) is a class property. That is, if i and j are in the
same communicating class then i is recurrent (respectively transient) if
and only if j is recurrent (respectively transient).

Proof:
Suppose i is recurrent and i ↔ j. There are integers m and n such that

(P^m)_{ji} > 0 and (P^n)_{ij} > 0
Recurrence is a class property

Then

Σ_k (P^k)_{jj} ≥ Σ_{k≥0} (P^{m+k+n})_{jj} ≥ Σ_{k≥0} (P^m)_{ji} (P^k)_{ii} (P^n)_{ij}
             = (P^m)_{ji} {Σ_{k≥0} (P^k)_{ii}} (P^n)_{ij}

The middle term is infinite and the two outside terms positive, so

Σ_k (P^k)_{jj} = ∞

which shows j is recurrent.
Existence of Recurrent States

Theorem
A finite state space chain has at least one recurrent state.

Proof.
If all states were transient we would have P(N_k < ∞) = 1 for each k. This
would mean P(∀k: N_k < ∞) = 1. But for any ω there must be at least one
k for which N_k = ∞ (otherwise the total time Σ_k N_k would be a finite sum of
finite numbers, yet it must be infinite).

An infinite state space chain may have all states transient: the chain X_n
satisfying X_{n+1} = X_n + 1 on the integers has all states transient.
Coin Tossing

A more interesting example:
Toss a coin repeatedly.
Let X_n be X_0 plus the number of heads minus the number of tails in
the first n tosses.
Let p denote the probability of heads on an individual trial.
X_n − X_0 is a sum of n iid random variables Y_i where P(Y_i = 1) = p
and P(Y_i = −1) = 1 − p.
The SLLN shows X_n/n converges almost surely to 2p − 1.
If p ≠ 1/2 this is not 0.
Coin Tossing Example Continued

For p > 1/2: in order for X_n/n to have a positive limit we must have X_n → ∞
almost surely.
So all states are visited only finitely many times.
That is, all states are transient.
Similarly for p < 1/2, X_n → −∞ almost surely and all states are
transient.
Coin Tossing

Now look at p = 1/2. The law of large numbers argument no longer shows
anything. I will show that all states are recurrent.
Proof: We evaluate Σ_n (P^n)_{00} and show the sum is infinite. If n is odd
then (P^n)_{00} = 0, so we evaluate

Σ_m (P^{2m})_{00}

Now

(P^{2m})_{00} = (2m choose m) 2^{−2m}
Coin Tossing

According to Stirling's approximation,

lim_{m→∞} m! / (m^{m+1/2} e^{−m} √(2π)) = 1

Hence

lim_{m→∞} √m (P^{2m})_{00} = 1/√π

Since

Σ_m 1/√m = ∞

we are done.
Mean return times

Compute expected times to return.
For x ∈ S let T_x denote the hitting time of x.
Suppose x is recurrent in an irreducible chain (only one communicating
class).
Derive equations for the expected values of the different T_x.
Each T_x is a certain function f_x applied to X_1, . . ..
Setting μ_{ij} = E^i(T_j) we find

μ_{ij} = Σ_k E^i(T_j 1(X_1 = k))

Note that if X_1 = j then T_j = 1, so

E^i(T_j 1(X_1 = j)) = P_{ij}
Mean Return Times

For k ≠ j, if X_1 = k then

T_j = 1 + f_j(X_2, X_3, . . .)

and, by conditioning on X_1 = k, we find

E^i(T_j 1(X_1 = k)) = P_{ik} {1 + E^k(T_j)}

This gives

μ_{ij} = 1 + Σ_{k≠j} P_{ik} μ_{kj}     (5)
Technical details

Technically, I should check that the expectations in (5) are finite.
All the random variables involved are non-negative, however, and the
equation actually makes sense even if some terms are infinite.
(To prove this you actually study

T_{x,n} = min(T_x, n),

deriving an identity for fixed n, letting n → ∞ and applying the
monotone convergence theorem.)
Mean Return Times

Here is a simple example:

P = [  0   1/2  1/2
      1/2   0   1/2
      1/2  1/2   0  ]

The identity (5) becomes

μ_{1,1} = 1 + μ_{2,1}/2 + μ_{3,1}/2     μ_{1,2} = 1 + μ_{3,2}/2                 μ_{1,3} = 1 + μ_{2,3}/2
μ_{2,1} = 1 + μ_{3,1}/2                 μ_{2,2} = 1 + μ_{1,2}/2 + μ_{3,2}/2     μ_{2,3} = 1 + μ_{1,3}/2
μ_{3,1} = 1 + μ_{2,1}/2                 μ_{3,2} = 1 + μ_{1,2}/2                 μ_{3,3} = 1 + μ_{1,3}/2 + μ_{2,3}/2

The seventh and fourth equations show μ_{2,1} = μ_{3,1}. Similar calculations give
μ_{ii} = 3 and, for i ≠ j, μ_{i,j} = 2.
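The nine equations can also be solved by a linear solve. The key observation: for a fixed target state j, the unknowns μ_{·,j} satisfy (I − P_{(j)}) μ_{·,j} = 1, where P_{(j)} is P with column j zeroed out. A sketch in Python:

import numpy as np

P = np.array([[0,   1/2, 1/2],
              [1/2, 0,   1/2],
              [1/2, 1/2, 0  ]])
n = len(P)

mu = np.zeros((n, n))
for j in range(n):
    Pj = P.copy()
    Pj[:, j] = 0                   # drop transitions into j
    # mu_{.j} = 1 + Pj mu_{.j}  =>  (I - Pj) mu_{.j} = 1
    mu[:, j] = np.linalg.solve(np.eye(n) - Pj, np.ones(n))

print(mu)                          # diagonal entries 3, off-diagonal entries 2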
Mean Return Times
The coin tossing Markov Chain with p = 1/2 shows the situation can be different
when S is infinite. The equations above become:

m_{0,0} = 1 + (1/2) m_{1,0} + (1/2) m_{−1,0}
m_{1,0} = 1 + (1/2) m_{2,0}

and many more.
Some observations:
You have to go through 1 to get to 0 from 2, so

m_{2,0} = m_{2,1} + m_{1,0}

Symmetry (switching H and T):

m_{1,0} = m_{−1,0}
More Coin Tossing

The transition probabilities are homogeneous:

m_{2,1} = m_{1,0}

Conclusion:

m_{0,0} = 1 + m_{1,0}
        = 1 + 1 + (1/2) m_{2,0}
        = 2 + m_{1,0}

Notice that there are no finite solutions!
Coin Tossing Summary

Every state is recurrent.
All the expected hitting times m_{ij} are infinite.
All entries (P^n)_{ij} converge to 0.
Jargon: the states in this chain are null recurrent.
Example
Model: 2 state MC for weather: ‘Dry’ or ‘Wet’.

> p:= matrix(2,2,[[3/5,2/5],[1/5,4/5]]);


[3/5 2/5]
p := [ ]
[1/5 4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):

This computes the powers (evalm understands matrix algebra).
Fact:

lim_{n→∞} P^n = [ 1/3 2/3
                  1/3 2/3 ]
Example
> evalf(evalm(p2));
[.4400000000 .5600000000]
[ ]
[.2800000000 .7200000000]
> evalf(evalm(p4));
[.3504000000 .6496000000]
[ ]
[.3248000000 .6752000000]
> evalf(evalm(p8));
[.3337702400 .6662297600]
[ ]
[.3331148800 .6668851200]
> evalf(evalm(p16));
[.3333336197 .6666663803]
[ ]
[.3333331902 .6666668098]

Where did 1/3 and 2/3 come from?


Example

Suppose we toss a coin with P(H) = α_D.
Start the chain with Dry if we get heads and Wet if we get tails.
Then

P(X_0 = x) = { α_D             if x = Dry
             { α_W = 1 − α_D   if x = Wet

and

P(X_1 = x) = Σ_y P(X_1 = x | X_0 = y) P(X_0 = y) = Σ_y α_y P_{y,x}
Example

Notice the last line is a matrix multiplication of the row vector α by the matrix P.
A special α: if we put α_D = 1/3 and α_W = 2/3 then

( 1/3  2/3 ) [ 3/5 2/5 ]  =  ( 1/3  2/3 )
             [ 1/5 4/5 ]

So: if P(X_0 = D) = 1/3 then P(X_1 = D) = 1/3, and analogously for W.
This means that X_0 and X_1 have the same distribution.
Initial Distributions
Def’n: A probability vector α is called the initial distribution for the chain
if

P(X_0 = i) = α_i

Def’n: A Markov Chain is stationary if

P(X_1 = i) = P(X_0 = i)

for all i.
Finding stationary initial distributions:
Consider the P above.
The equation
αP = α
is really

α_D = 3α_D/5 + α_W/5
α_W = 2α_D/5 + 4α_W/5
Initial Distributions

The first can be rearranged to

α_W = 2α_D.

So can the second.
If α is a probability vector then

α_W + α_D = 1

so we get

1 − α_D = 2α_D

leading to

α_D = 1/3
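Rather than iterating powers as the Maple session does, one can solve αP = α together with Σ_i α_i = 1 directly. A sketch in Python (stacking the normalization as an extra equation and using least squares):

import numpy as np

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])
n = len(P)

# alpha (P - I) = 0 with sum(alpha) = 1: transpose to column form and
# append the normalization row, then solve the overdetermined system.
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
print(alpha)                       # (1/3, 2/3)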
Initial Distributions: More Examples
 
P = [  0   1/3   0   2/3
      1/3   0   2/3   0
       0   2/3   0   1/3
      2/3   0   1/3   0  ]
Set αP = α and get

α_1 = α_2/3 + 2α_4/3
α_2 = α_1/3 + 2α_3/3
α_3 = 2α_2/3 + α_4/3
α_4 = 2α_1/3 + α_3/3
1 = α_1 + α_2 + α_3 + α_4

The first plus the third gives

α_1 + α_3 = α_2 + α_4

so both sums are 1/2. Continue the algebra to get the unique solution

(1/4, 1/4, 1/4, 1/4).
Initial Distributions
p:=matrix([[0,1/3,0,2/3],[1/3,0,2/3,0],
[0,2/3,0,1/3],[2/3,0,1/3,0]]);

[ 0 1/3 0 2/3]
[ ]
[1/3 0 2/3 0 ]
p := [ ]
[ 0 2/3 0 1/3]
[ ]
[2/3 0 1/3 0 ]
> p2:=evalm(p*p);
[5/9 0 4/9 0 ]
[ ]
[ 0 5/9 0 4/9]
p2:= [ ]
[4/9 0 5/9 0 ]
[ ]
[ 0 4/9 0 5/9]

Initial Distributions

> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):
> p17:=evalm(p8*p8*p):

Initial Distributions

> evalf(evalm(p16));
[.5000000116 , 0 , .4999999884 , 0]
[ ]
[0 , .5000000116 , 0 , .4999999884]
[ ]
[.4999999884 , 0 , .5000000116 , 0]
[ ]
[0 , .4999999884 , 0 , .5000000116]
> evalf(evalm(p17));
[0 , .4999999961 , 0 , .5000000039]
[ ]
[.4999999961 , 0 , .5000000039 , 0]
[ ]
[0 , .5000000039 , 0 , .4999999961]
[ ]
[.5000000039 , 0 , .4999999961 , 0]

Initial Distributions

> evalf(evalm((p16+p17)/2));
[.2500, .2500, .2500, .2500]
[ ]
[.2500, .2500, .2500, .2500]
[ ]
[.2500, .2500, .2500, .2500]
[ ]
[.2500, .2500, .2500, .2500]

P^n doesn’t converge but (P^n + P^{n+1})/2 does. Next example:

P = [ 2/5 3/5  0   0
      1/5 4/5  0   0
       0   0  2/5 3/5
       0   0  1/5 4/5 ]
Initial Distributions
Solve αP = α:

α_1 = (2/5) α_1 + (1/5) α_2
α_2 = (3/5) α_1 + (4/5) α_2
α_3 = (2/5) α_3 + (1/5) α_4
α_4 = (3/5) α_3 + (4/5) α_4
1 = α_1 + α_2 + α_3 + α_4

The second and fourth equations are redundant. We get

α_2 = 3α_1     3α_3 = α_4     1 = 4α_1 + 4α_3

Pick any α_1 in [0, 1/4]; put α_3 = 1/4 − α_1. Then

α = (α_1, 3α_1, 1/4 − α_1, 3(1/4 − α_1))

solves αP = α. So the solution is not unique.
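A quick check of this one-parameter family; a sketch:

import numpy as np

P = np.array([[2/5, 3/5, 0,   0  ],
              [1/5, 4/5, 0,   0  ],
              [0,   0,   2/5, 3/5],
              [0,   0,   1/5, 4/5]])

for a1 in [0.0, 0.1, 0.25]:
    alpha = np.array([a1, 3*a1, 1/4 - a1, 3*(1/4 - a1)])
    assert np.isclose(alpha.sum(), 1.0)
    assert np.allclose(alpha @ P, alpha)   # every member solves alpha P = alpha
print("the whole family is stationary")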
Initial Distributions

> p:=matrix([[2/5,3/5,0,0],[1/5,4/5,0,0],
[0,0,2/5,3/5],[0,0,1/5,4/5]]);
[2/5 3/5 0 0 ]
[ ]
[1/5 4/5 0 0 ]
p := [ ]
[ 0 0 2/5 3/5]
[ ]
[ 0 0 1/5 4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):

Initial Distributions

> evalf(evalm(p8*p8));
[.2500000000 , .7500000000 , 0 , 0]
[ ]
[.2500000000 , .7500000000 , 0 , 0]
[ ]
[0 , 0 , .2500000000 , .7500000000]
[ ]
[0 , 0 , .2500000000 , .7500000000]

Initial Distributions
Notice that the rows converge, but to two different vectors:

α^{(1)} = (1/4, 3/4, 0, 0)

and

α^{(2)} = (0, 0, 1/4, 3/4)

Solutions of αP = α revisited? Check that

α^{(1)} P = α^{(1)}

and

α^{(2)} P = α^{(2)}

If α = λ α^{(1)} + (1 − λ) α^{(2)} (0 ≤ λ ≤ 1) then

αP = α

so again the solution is not unique.
Initial Distributions: Last example

> p:=matrix([[2/5,3/5,0],[1/5,4/5,0],
[1/2,0,1/2]]);

[2/5 3/5 0 ]
[ ]
p := [1/5 4/5 0 ]
[ ]
[1/2 0 1/2]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> evalf(evalm(p8*p8));
[.2500000000 .7500000000 0 ]
[ ]
[.2500000000 .7500000000 0 ]
[ ]
[.2500152588 .7499694824 .00001525878906]

Initial Distributions

Interpretation of the examples:

For some P all rows converge to some α. In this case this α is a
stationary initial distribution.
For some P the locations of zeros flip-flop; P^n does not converge.
Observation: the average

(P + P^2 + · · · + P^n) / n

does converge.
For some P some rows converge to one α and some to another. In
this case the solution of αP = α is not unique.
Basic distinguishing feature: the pattern of 0s in the matrix P.
The ergodic theorem

Consider a finite state space chain.
If x is a vector then the ith entry of Px is

Σ_j P_{ij} x_j

The rows of P are probability vectors, so this is a weighted average of the entries
of x.
If the weights are strictly between 0 and 1, and the largest and smallest entries
of x are not the same, then Σ_j P_{ij} x_j is strictly between the largest and smallest
entries of x.
Ergodic Theorem

In fact

Σ_j P_{ij} x_j − min_k{x_k} = Σ_j P_{ij} {x_j − min_k{x_k}}
                            ≥ min_j{P_{ij}} (max_k{x_k} − min_k{x_k})

and

max_k{x_k} − Σ_j P_{ij} x_j ≥ min_j{P_{ij}} (max_k{x_k} − min_k{x_k})
Ergodic Theorem

Now multiply P^r by P^m.
The (i, j)th entry of P^{r+m} is a weighted average of the jth column of P^m.
So the ith entry of the jth column of P^{r+m} must be strictly between the
minimum and maximum entries of the jth column of P^m.
In fact, fix a j. Let
x̄_m = the maximum entry in column j of P^m,
x_m = the minimum entry.
Suppose all entries of P^r are positive.
Ergodic Theorem

Let δ > 0 be the smallest entry in P^r. Our argument above shows that

x̄_{m+r} ≤ x̄_m − δ(x̄_m − x_m)

and

x_{m+r} ≥ x_m + δ(x̄_m − x_m)

Putting these together gives

(x̄_{m+r} − x_{m+r}) ≤ (1 − 2δ)(x̄_m − x_m)

In summary: the column maximum decreases, the column minimum
increases, and the gap between the two decreases exponentially along the
sequence m, m + r, m + 2r, . . ..
Ergodic Theorem

This idea can be used to prove:

Theorem
Suppose X_n is a finite state space Markov Chain with stationary transition
matrix P. Assume that there is a power r such that all entries in P^r are
positive. Then P^k has all entries positive for all k ≥ r and P^n converges,
as n → ∞, to a matrix P^∞. Moreover,

(P^∞)_{ij} = π_j

where π is the unique row vector satisfying

π = πP

whose entries sum to 1.
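The contraction argument can be watched numerically: the gap between the largest and smallest entry of each column of P^n shrinks geometrically (here r = 1, since all entries of P are already positive). A sketch using the weather chain:

import numpy as np
from numpy.linalg import matrix_power

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])

for n in [1, 2, 4, 8, 16]:
    Pn = matrix_power(P, n)
    gap = (Pn.max(axis=0) - Pn.min(axis=0)).max()
    print(n, gap)                  # gap -> 0, so the rows approach a common pi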
Proof of Ergodic Theorem

First, for k > r,

(P^k)_{ij} = Σ_ℓ (P^{k−r})_{iℓ} (P^r)_{ℓj}

For each i there is an ℓ for which (P^{k−r})_{iℓ} > 0.
Since (P^r)_{ℓj} > 0 we see (P^k)_{ij} > 0.
The argument before the theorem shows that

lim_{j→∞} P^{m+jk}

exists for each m and k ≥ r.
Proof of Ergodic Theorem

This proves P^n has a limit, which we call P^∞.
Since P^{n−1} also converges to P^∞ we find

P^∞ = P^∞ P

Hence each row of P^∞ is a solution of xP = x.
The argument before the statement of the theorem shows all rows
of P^∞ are equal.
Let π be this common row.
Proof of Ergodic Theorem

Now if α is any vector whose entries sum to 1 then αP^n converges to

αP^∞ = π

If α is any solution of x = xP we have by induction αP^n = α, so
αP^∞ = α and hence α = π.
That is, exactly one vector whose entries sum to 1 satisfies x = xP. •
Note the conditions:
There is an r for which all entries in P^r are positive.
The chain has a finite state space.
Finite state space case: Pn need not have limit

Example:

P = [ 0 1
      1 0 ]

Note P^{2n} is the identity while P^{2n+1} = P.
Note, too, that

(P^0 + · · · + P^n) / (n + 1)  →  [ 1/2 1/2
                                    1/2 1/2 ]

Consider the equations π = πP with π_1 + π_2 = 1.
We get

π_1 = (1/2) π_1 + (1/2)(1 − π_1) = 1/2

so that the solution to π = πP is again unique.
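A sketch showing the oscillation of the powers and the convergence of the Cesàro average:

import numpy as np
from numpy.linalg import matrix_power

P = np.array([[0., 1.],
              [1., 0.]])

print(matrix_power(P, 10))         # even powers: the identity
print(matrix_power(P, 11))         # odd powers: P itself

# The Cesaro average (P^0 + ... + P^n)/(n + 1) still converges:
n = 999
avg = sum(matrix_power(P, k) for k in range(n + 1)) / (n + 1)
print(avg)                         # all entries near 1/2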
Periodic Chains
Def’n: The period d of a state i is the greatest common divisor of

{n : (P^n)_{ii} > 0}

Lemma
If i ↔ j then i and j have the same period.

Def’n: A state is aperiodic if its period is 1.

Proof: I do the case d = 1. Fix i. Let

G = {k : (P^k)_{ii} > 0}

If k_1, k_2 ∈ G then k_1 + k_2 ∈ G.
This (and aperiodicity) implies (by a number theory argument) that there is an r
such that k ≥ r implies k ∈ G.
Now find m and n so that

(P^m)_{ij} > 0 and (P^n)_{ji} > 0
Periodic Chains
For k > r + m + n we see (P^k)_{jj} > 0, so the gcd of the set of k such that
(P^k)_{jj} > 0 is 1. •
The case of period d > 1 can be dealt with by considering P^d.

P = [ 0 1 0 0 0
      0 0 1 0 0
      1 0 0 0 0
      0 0 0 0 1
      0 0 0 1 0 ]

For this example {1, 2, 3} is a class of period 3 states and {4, 5} a class of
period 2 states.

P = [ 0 1/2 1/2
      1   0   0
      1   0   0 ]

has a single communicating class of period 2.
A chain is aperiodic if all its states are aperiodic.
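The period can be computed straight from the definition by taking the gcd of the return times found among the first several powers. A sketch applied to the 5-state example (the cutoff n_max is a pragmatic choice, not part of the definition):

import numpy as np
from math import gcd
from functools import reduce
from numpy.linalg import matrix_power

P = np.array([[0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

def period(P, i, n_max=50):
    """gcd of {n <= n_max : (P^n)_{ii} > 0}."""
    returns = [n for n in range(1, n_max + 1)
               if matrix_power(P, n)[i, i] > 0]
    return reduce(gcd, returns)

print([period(P, i) for i in range(5)])   # [3, 3, 3, 2, 2]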
Hitting Times
Start an irreducible recurrent chain X_n in state i. Let T_j be the first n > 0 such
that X_n = j. Define

m_{ij} = E(T_j | X_0 = i)

First step analysis:

m_{ij} = 1 · P(X_1 = j | X_0 = i) + Σ_{k≠j} (1 + E(T_j | X_0 = k)) P_{ik}
       = Σ_k P_{ik} + Σ_{k≠j} P_{ik} m_{kj}
       = 1 + Σ_{k≠j} P_{ik} m_{kj}

Example

P = [ 3/5 2/5
      1/5 4/5 ]
Stationary Initial Distributions: Equations

m_{11} = 1 + (2/5) m_{21}     m_{12} = 1 + (3/5) m_{12}
m_{21} = 1 + (4/5) m_{21}     m_{22} = 1 + (1/5) m_{12}

The second and third equations give immediately

m_{12} = 5/2  and  m_{21} = 5

Then plug in to the others to get

m_{11} = 3  and  m_{22} = 3/2

Notice the stationary initial distribution is

(1/m_{11}, 1/m_{22}) = (1/3, 2/3)
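Reusing the linear-solve scheme sketched for the three-state example recovers these values and confirms π_i = 1/m_ii:

import numpy as np

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])
n = len(P)

m = np.zeros((n, n))
for j in range(n):
    Pj = P.copy()
    Pj[:, j] = 0                       # first-step analysis: (I - Pj) m_{.j} = 1
    m[:, j] = np.linalg.solve(np.eye(n) - Pj, np.ones(n))

print(m)                               # [[3, 5/2], [5, 3/2]]
print(1 / np.diag(m))                  # (1/3, 2/3): the stationary distribution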
Stationary Initial Distributions
Consider the fraction of time spent in state j:

[1(X_0 = j) + · · · + 1(X_n = j)] / (n + 1)

Imagine the chain starts in state i; take expected values:

[Σ_{r=1}^n (P^r)_{ij} + 1(i = j)] / (n + 1)

If the rows of P^r converge to π (as r → ∞) then this fraction converges to π_j;
i.e. the limiting fraction of time in state j is π_j.
Heuristic: start the chain in i. Expect to return to i every m_{ii} time units. So
you are in state i about once every m_{ii} time units; i.e. the limiting fraction of
time in state i is 1/m_{ii}.
Conclusion: for an irreducible recurrent finite state space Markov chain

π_i = 1/m_{ii}.
Stationary Initial Distributions
Real proof: the Renewal Theorem or a variant.
Idea: S_1 < S_2 < · · · are the times of the visits to i. Segment i is

X_{S_{i−1}+1}, . . . , X_{S_i}.

The segments are iid by the Strong Markov property.
The number of visits to i by time S_k is exactly k.
The total elapsed time is S_k = T_1 + · · · + T_k where the T_i are iid.
The fraction of time in state i by time S_k is

k / S_k → 1 / m_{ii}

by the SLLN. So if the fraction converges to π_i we must have

π_i = 1/m_{ii}.
Summary of Theoretical Results

For an irreducible aperiodic positive recurrent Markov Chain:
1. P^n converges to a stochastic matrix P^∞.
2. Each row of P^∞ is π, the unique stationary initial distribution.
3. The stationary initial distribution is given by

   π_i = 1/m_i

   where m_i is the mean return time to state i from state i.

If the state space is finite, an irreducible chain is positive recurrent.
Ergodic Theorem
Notice the sleight of hand: I showed

E{Σ_{i=0}^n 1(X_i = k)} / n → π_k

but claimed

Σ_{i=0}^n 1(X_i = k) / n → π_k

almost surely, which is also true. This is a step in the proof of the ergodic
theorem. For an irreducible positive recurrent Markov chain and any f on
S such that E^π(f(X_0)) < ∞:

Σ_{i=0}^n f(X_i) / n → Σ_j π_j f(j)

almost surely. The limit holds in other senses, too. You also get

Σ_{i=0}^n f(X_i, . . . , X_{i+k}) / n → E^π{f(X_0, . . . , X_k)}

E.g. the fraction of transitions from i to j goes to

π_i P_{ij}
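A simulation sketch of the ergodic theorem on the weather chain: long-run time averages match π, and the fraction of 0 → 1 transitions matches π_0 P_01 = (1/3)(2/5) = 2/15:

import numpy as np

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])
rng = np.random.default_rng(3)

n_steps = 200000
x = 0
time_in = np.zeros(2)
trans_01 = 0
for _ in range(n_steps):
    nxt = rng.choice(2, p=P[x])
    time_in[x] += 1
    trans_01 += (x == 0 and nxt == 1)
    x = nxt

print(time_in / n_steps)           # close to pi = (1/3, 2/3)
print(trans_01 / n_steps)          # close to pi_0 * P_01 = 2/15 = 0.1333...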
Positive Recurrent Chains

For an irreducible positive recurrent chain of period d:
1. P^d has d communicating classes, each of which forms an irreducible
   aperiodic positive recurrent chain.
2. (P^{n+1} + · · · + P^{n+d})/d has a limit P^∞.
3. Each row of P^∞ is π, the unique stationary initial distribution.
4. The stationary initial distribution places probability 1/d on each of the
   communicating classes in 1.
Null Recurrent and Transient Chains

For an irreducible null recurrent chain:
1. P^n converges to 0 (pointwise).
2. There is no stationary initial distribution.
For an irreducible transient chain:
1. P^n converges to 0 (pointwise).
2. There is no stationary initial distribution.
Reducible Chains

For a chain with more than 1 communicating class:
1. If C is a recurrent class, the submatrix P_C of P made by picking out the
   rows i and columns j for which i, j ∈ C is a stochastic matrix. The
   corresponding entries in P^n are just (P_C)^n, so you can apply the
   conclusions above.
2. For any transient or null recurrent class the corresponding columns in
   P^n converge to 0.
3. If there are multiple positive recurrent communicating classes then
   the stationary initial distribution is not unique.
