

richardwu.ca

STAT 333 Course Notes


Applied Probability
Steve Drekic • Winter 2018 • University of Waterloo

Last Revision: April 21, 2018

Table of Contents
1 January 4, 2018 1
1.1 Example 1.1 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Example 1.2 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 January 9, 2018 1
2.1 Example 1.3 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.2 Example 1.4 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.3 Example 1.5 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.4 Exercise 1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3 January 11, 2018 4


3.1 Theorem 2.1 (conditional variance) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2 Example 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.3 Example 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4 Tutorial 1 6
4.1 Exercise 1: MGF of Erlang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.2 Exercise 2: MGF of Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.3 Exercise 3: Moments from PGF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.4 Exercise 4: PGF of Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 January 16, 2018 10


5.1 Example 2.3 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2 Example 2.4 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3 Example 2.5 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

6 January 18, 2018 13


6.1 Example 2.6 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6.2 Example 2.7 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.3 Theorem 2.2 (law of total expectation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.4 Example 2.8 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

7 Tutorial 2 16
7.1 Sum of geometric distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
7.2 Conditional card drawing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.3 Conditional points from interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18


8 January 23, 2018 20


8.1 Example 2.8 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8.2 Theorem 2.3 (variance as expectation of conditionals) . . . . . . . . . . . . . . . . . . . . . . . . . . 21
8.3 Example 2.9 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

9 January 25, 2018 23


9.1 Example 2.10 solution (P (X < Y )) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
9.2 Example 2.11 solution (P (X < Y ) where X ∼ EXP (λ1 ) and Y ∼ EXP (λ2 ) . . . . . . . . . . . . . 24
9.3 Example 2.12 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

10 Tutorial 3 27
10.1 Mixed conditional distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
10.2 Law of total expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
10.3 Conditioning on wins and losses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

11 February 1, 2018 30
11.1 Example 3.1 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
11.2 Example 3.2 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
11.3 Example 3.3 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

12 Tutorial 4 32
12.1 Law of total expectations with indicator variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
12.2 Discrete time Markov chain urn example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
12.3 Discrete time Markov chain weather example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

13 February 6, 2018 35
13.1 Example 3.4 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
13.2 Example 3.2 (continued) solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
13.3 Example 3.5 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
13.4 Theorem 3.1 (equivalent states have equivalent periods) . . . . . . . . . . . . . . . . . . . . . . . . 36
13.5 Example 3.6 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

14 February 8, 2018 37
14.1 Theorem 3.2 (communication and recurrent state i implies recurrent state j) . . . . . . . . . . . . . 37
14.2 Theorem 3.3 (communication and recurrent state i implies mutual recurrence among all states) . . 38
14.3 Theorem 3.4 (finite-state DTMCs have at least one recurrent state) . . . . . . . . . . . . . . . . . . 38

15 Tutorial 5 39
15.1 Determining diagram, equivalence classes, period, and transience/recurrence of DTMC . . . . . . . 39
15.2 Discrete time Markov chain consecutive successes example . . . . . . . . . . . . . . . . . . . . . . . 40
15.3 Limiting behaviour of discrete Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

16 February 13, 2018 42


16.1 Theorem 3.5 (recurrent i and not communicate with j implies Pi,j = 0) . . . . . . . . . . . . . . . . 42
16.2 Example 3.3 (continued) solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
16.3 Example 3.7 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
16.4 Example 3.8 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44


17 February 15, 2018 48


17.1 Example 3.9 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
17.2 Example 3.10 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

18 Tutorial 6 49
18.1 Determining fi,i and recurrence using definition (infinite summation) from TPM . . . . . . . . . . 49
18.2 Making random walk recurrent with a reflecting boundary . . . . . . . . . . . . . . . . . . . . . . . 50
18.3 Determining pmf of Ni ∼ f_{i,i}^{(n)} and the mean number of transitions between re-visits . . . . . . . 51

19 February 27, 2018 53


19.1 Example 3.11 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
19.2 Theorem 3.6 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
19.3 Example 3.10 (continued) solution (finding limiting probability using BLT) . . . . . . . . . . . . . 54

20 March 1, 2018 55
20.1 Example 3.12 solution (Gambler’s Ruin) (applying limiting probability) . . . . . . . . . . . . . . . 55

21 Tutorial 7 57
21.1 DTMCs and Ni (minimum n for Xn = i), transience and recurrence, limit probabilities, number of
transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

22 March 6, 2018 61
22.1 Example 3.11 (continued) solution (showing absorption probability equal limiting probabilities) . . 61
22.2 Example 3.13 solution (solving absorption probabilities) . . . . . . . . . . . . . . . . . . . . . . . . 62
22.3 Example 3.14 solution (absorbing states with absorbing recurrent classes) . . . . . . . . . . . . . . 64
22.4 Aside: n-step TPM (P (n) ) for absorbing DTMCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
22.5 Example 3.11 (continued) solution (mean absorption time wi ) . . . . . . . . . . . . . . . . . . . . . 65
22.6 Example 3.13 (continued) solution (mean absorption time wi ) . . . . . . . . . . . . . . . . . . . . . 66
22.7 Example 3.13 (continued) solution (average number of visits prior to absorption Si,l ) . . . . . . . . 66

23 Tutorial 8 67
23.1 Transforming problem into absorption problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
23.2 Mean recurrent time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

24 March 13, 2018 68


24.1 Example 4.1 solution (P (Xj = min{X1 , . . . , Xn }) for Xi ∼ EXP (λi )) . . . . . . . . . . . . . . . . . 68
24.2 Theorem 4.1 (equivalent memoryless property definition) . . . . . . . . . . . . . . . . . . . . . . . . 68
24.3 Theorem 4.2 (exponentials are memoryless) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

25 March 15, 2018 69


25.1 Example 4.2 solution (non-identical exponentials problem) . . . . . . . . . . . . . . . . . . . . . . . 69

26 March 20, 2018 70


26.1 Theorem 4.3 (Poisson processes are Poisson distributed) . . . . . . . . . . . . . . . . . . . . . . . . 70
26.2 Interarrival times Ti between Poisson events are Exponential distributed . . . . . . . . . . . . . . . 72

27 March 22, 2018 73


27.1 Example 4.3 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73


28 March 27, 2018 74


28.1 Example 4.3 (continued) solution (Poisson process with classifications) . . . . . . . . . . . . . . . . 74
28.2 Theorem 4.5 (conditional distribution of first arrival time given N (t) = 1 is uniform) . . . . . . . . 74
28.3 Theorem 4.6 (conditional joint distribution of n arrival times is the n order statistics with U (0, t)) 75

29 March 29, 2018 76


29.1 Example 4.4 solution (average waiting time) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
29.2 Example 4.5 solution (pdf and joint probability of arrival times) . . . . . . . . . . . . . . . . . . . . 77
29.3 Example 5.1 solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78


Abstract
These notes are intended as a resource for myself; for past, present, or future students of this course; and for anyone
interested in the material. The goal is to provide an end-to-end resource that covers all material discussed
in the course, presented in an organized manner. These notes are my interpretation and transcription of the
content covered in lectures. The instructor has not verified or confirmed the accuracy of these notes, and any
discrepancies, misunderstandings, typos, etc. in these notes as they relate to the course's content are not the responsibility of
the instructor. If you spot any errors or would like to contribute, please contact me directly.

1 January 4, 2018
1.1 Example 1.1 solution
What is the probability that we roll a number less than 4 given that we know it’s odd?
Solution. Let A = {1, 2, 3} (less than 4) and B = {1, 3, 5} (odd). We want to find P (A | B). Note that
A ∩ B = {1, 3} and there are six elements in the sample space S thus
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/6}{3/6} = \frac{2}{3}
\]
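As a quick sanity check (my addition, not from the lectures), here is a short Python sketch that estimates P(A | B) by simulating fair die rolls; the long-run conditional frequency should be close to 2/3.

import random

random.seed(1)
n_trials = 100_000
odd = 0      # rolls landing in B = {1, 3, 5}
both = 0     # rolls landing in A ∩ B = {1, 3}

for _ in range(n_trials):
    roll = random.randint(1, 6)
    if roll % 2 == 1:        # event B: the roll is odd
        odd += 1
        if roll < 4:         # event A: the roll is less than 4
            both += 1

print(both / odd)            # ≈ 2/3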

1.2 Example 1.2 solution


Show that BIN(n, p) is approximately POI(λ), where λ = np, for n large and p small.

Solution. Let λ = np, so that p = λ/n with λ > 0. From the pmf for X ∼ BIN(n, p),
\[
p(x) = \binom{n}{x} p^x (1-p)^{n-x}
     = \frac{n(n-1)\cdots(n-x+1)}{x!} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n-x}
     = \frac{n(n-1)\cdots(n-x+1)}{n^x} \cdot \frac{\lambda^x}{x!} \cdot \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^x}
\]
Recall that \lim_{n \to \infty} (1 - \lambda/n)^n = e^{-\lambda}, while the remaining factors each tend to 1, so
\[
\lim_{n \to \infty} p(x) = \frac{\lambda^x e^{-\lambda}}{x!}
\]
which is the POI(λ) pmf.
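A small numerical illustration (my addition, assuming scipy is available): for large n and small p = λ/n, the BIN(n, p) pmf is already very close to the POI(λ) pmf.

from scipy.stats import binom, poisson

lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    for x in range(6):   # compare the two pmfs at x = 0, 1, ..., 5
        print(f"n={n:5d}  x={x}  BIN={binom.pmf(x, n, p):.5f}  POI={poisson.pmf(x, lam):.5f}")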

2 January 9, 2018
2.1 Example 1.3 solution
Find the mgf of BIN (n, p) and use that to find E[X] and V ar(X).
Solution. Recall the binomial series:
\[
(a + b)^m = \sum_{x=0}^{m} \binom{m}{x} a^x b^{m-x}, \qquad a, b \in \mathbb{R},\ m \in \mathbb{N}
\]
Let X ∼ BIN(n, p), so
\[
p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n
\]
Taking the mgf E[e^{tX}]:
\[
\Phi_X(t) = E[e^{tX}] = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x}
\]
From the binomial series we have
\[
\Phi_X(t) = (pe^t + 1 - p)^n, \qquad t \in \mathbb{R}
\]
We can take the first and second derivatives for the first and second moments:
\[
\Phi_X'(t) = n(pe^t + 1 - p)^{n-1} pe^t
\]
\[
\Phi_X''(t) = np\left[(pe^t + 1 - p)^{n-1} e^t + e^t (n-1)(pe^t + 1 - p)^{n-2} pe^t\right]
\]
So E[X] = Φ_X'(0) = np.

For the variance, we need the second moment:
\[
E[X^2] = \Phi_X''(0) = np[1 + (n-1)p] = np + (np)^2 - np^2
\]
So
\[
Var(X) = E[X^2] - E[X]^2 = np + (np)^2 - np^2 - (np)^2 = np(1 - p)
\]
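A hedged symbolic check (my addition, assuming sympy): differentiate the mgf Φ_X(t) = (pe^t + 1 − p)^n and evaluate at t = 0 to recover E[X] = np and Var(X) = np(1 − p).

import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
mgf = (p * sp.exp(t) + 1 - p) ** n

m1 = sp.diff(mgf, t, 1).subs(t, 0)     # E[X]
m2 = sp.diff(mgf, t, 2).subs(t, 0)     # E[X^2]

print(sp.simplify(m1))                 # n*p
print(sp.simplify(m2 - m1 ** 2))       # algebraically equal to n*p*(1 - p)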

2.2 Example 1.4 solution


Show that Cov(X, Y) = 0 does not imply independence.

Solution. We show this using a counter example

p(x, y)    y = 0    y = 1   | pX(x)
x = 0      0.2      0       | 0.2
x = 1      0        0.6     | 0.6
x = 2      0.2      0       | 0.2
pY(y)      0.4      0.6     | 1

Note that

Cov(X, Y ) = E[XY ] − E[X]E[Y ]


where
\[
E[XY] = \sum_{x=0}^{2} \sum_{y=0}^{1} x\,y\,p(x, y) = (1)(1)(0.6) = 0.6
\]
\[
E[X] = \sum_{x=0}^{2} x\,p_X(x) = (1)(0.6) + (2)(0.2) = 0.6 + 0.4 = 1
\]
\[
E[Y] = \sum_{y=0}^{1} y\,p_Y(y) = (1)(0.6) = 0.6
\]
So Cov(X, Y) = 0.6 − (1)(0.6) = 0. However, p(2, 0) = 0.2 ≠ pX(2) pY(0) = (0.2)(0.4) = 0.08, thus X and Y are
not independent (they are dependent).
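A short numeric check of this counterexample (my addition): compute Cov(X, Y) and compare p(2, 0) with pX(2) pY(0) directly from the table.

# joint pmf from the table above, keyed by (x, y)
p = {(0, 0): 0.2, (0, 1): 0.0,
     (1, 0): 0.0, (1, 1): 0.6,
     (2, 0): 0.2, (2, 1): 0.0}

EX  = sum(x * pr for (x, y), pr in p.items())
EY  = sum(y * pr for (x, y), pr in p.items())
EXY = sum(x * y * pr for (x, y), pr in p.items())
pX2 = sum(pr for (x, y), pr in p.items() if x == 2)
pY0 = sum(pr for (x, y), pr in p.items() if y == 0)

print(EXY - EX * EY)          # 0.0, so Cov(X, Y) = 0
print(p[(2, 0)], pX2 * pY0)   # 0.2 vs 0.08, so X and Y are dependent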

2.3 Example 1.5 solution


Given X1, . . . , Xn independent r.v.'s where Φ_{X_i}(t) is the mgf of X_i, show that T = \sum_{i=1}^{n} X_i has mgf
Φ_T(t) = \prod_{i=1}^{n} Φ_{X_i}(t).

Solution. We take the definition of the mgf of T:
\[
\Phi_T(t) = E[e^{tT}] = E[e^{t(X_1 + \cdots + X_n)}] = E[e^{tX_1} \cdots e^{tX_n}]
          = E[e^{tX_1}] \cdots E[e^{tX_n}] \quad \text{(independence)}
          = \prod_{i=1}^{n} \Phi_{X_i}(t)
\]

2.4 Exercise 1.3


If Xi ∼ POI(λi), show that T = \sum_{i} X_i ∼ POI(\sum_{i} λ_i).

Solution. Recall that P OI(λi ) ∼ BIN (ni , p) where λi = ni p and

ΦXi (t) = (pet + 1 − p)ni ∀t ∈ R

where Xi ∼ BIN (ni , p) i = 1, . . . , m.


Therefore
\[
\Phi_T(t) = \prod_{i=1}^{m} (pe^t + 1 - p)^{n_i} = (pe^t + 1 - p)^{n_1} \cdots (pe^t + 1 - p)^{n_m} = (pe^t + 1 - p)^{\sum_{i=1}^{m} n_i}, \qquad t \in \mathbb{R}
\]

By the mgf uniqueness property, we have
\[
T = \sum_{i=1}^{m} X_i \sim BIN\!\left(\sum_{i=1}^{m} n_i, p\right)
\]
and applying the Poisson limit of Example 1.2, with \sum_{i=1}^{m} n_i p = \sum_{i=1}^{m} \lambda_i, gives T ∼ POI(\sum_{i=1}^{m} \lambda_i) as claimed.


3 January 11, 2018


3.1 Theorem 2.1 (conditional variance)
Theorem 3.1.
V ar(X1 | X2 = x2 ) = E[X12 | X2 = x2 ] − E[X1 | X2 = x2 ]2

Proof.

V ar(X1 | X2 = x2 ) = E[(X1 − E[X1 | X2 = x2 ])2 | X2 = x2 ]


= E[(X12 − 2E[X1 | X2 = x2 ]X1 + E[X1 | X2 = x2 ]2 ) | X2 = x2 ]
= E[X12 | X2 = x2 ] − 2E[X1 | X2 = x2 ]E[X1 | X2 = x2 ] + E[X1 | X2 = x2 ]2
= E[X12 | X2 = x2 ] − E[X1 | X2 = x2 ]2

3.2 Example 2.1


Suppose that X and Y are discrete random variables having joint pmf of the form
\[
p(x, y) =
\begin{cases}
1/5, & \text{if } x = 1 \text{ and } y = 0, \\
2/15, & \text{if } x = 0 \text{ and } y = 1, \\
1/15, & \text{if } x = 1 \text{ and } y = 2, \\
1/5, & \text{if } x = 2 \text{ and } y = 0, \\
2/5, & \text{if } x = 1 \text{ and } y = 1, \\
0, & \text{otherwise.}
\end{cases}
\]
Find the conditional pmf of X | (Y = 1). Also calculate E[X | Y = 1] and Var(X | Y = 1).

Solution. Note: for problems of this nature, construct a table.

p(x, y)    y = 0    y = 1    y = 2   | pX(x)
x = 0      0        2/15     0       | 2/15
x = 1      1/5      2/5      1/15    | 2/3
x = 2      1/5      0        0       | 1/5
pY(y)      2/5      8/15     1/15    | 1

Then we have
2/15 1
p(0 | 1) = P (X = 0 | Y = 1) = =
8/15 4
2/5 3
p(1 | 1) = P (X = 1 | Y = 1) = =
8/15 4
0
p(2 | 1) = P (X = 2 | Y = 1) = =0
8/15

The conditional pmf of X | (Y = 1) can be represented as follows


x 0 1
p(x | 1) 1/4 3/4

We observe X | (Y = 1) ∼ Bern(3/4). We can use the known results E[X] = p and Var(X) = p(1 − p) for X ∼ Bern(p),
thus

E[X | (Y = 1)] = 3/4


V ar(X | (Y = 1)) = 3/4(1 − 3/4) = 3/16
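A minimal sketch (my addition) that extracts the conditional pmf of X given Y = 1 from the joint table and recovers the Bern(3/4) mean and variance.

from fractions import Fraction as F

# joint pmf p(x, y) from the example
p = {(1, 0): F(1, 5), (0, 1): F(2, 15), (1, 2): F(1, 15),
     (2, 0): F(1, 5), (1, 1): F(2, 5)}

pY1 = sum(pr for (x, y), pr in p.items() if y == 1)        # 8/15
cond = {x: p.get((x, 1), F(0)) / pY1 for x in (0, 1, 2)}   # pmf of X | Y = 1

mean = sum(x * pr for x, pr in cond.items())
var = sum(x * x * pr for x, pr in cond.items()) - mean ** 2
print(cond)        # {0: 1/4, 1: 3/4, 2: 0}
print(mean, var)   # 3/4 and 3/16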

3.3 Example 2.2


For i = 1, 2 suppose that Xi ∼ BIN (ni , p) where X1 , X2 are independent (but not identically distributed).
Find conditional distribution of X1 given X1 + X2 = n.

Solution. We want to find the conditional pmf of X1 | (X1 + X2 = n). Let this conditional pmf be denoted by
\[
p(x_1 \mid n) = P(X_1 = x_1 \mid X_1 + X_2 = n) = \frac{P(X_1 = x_1, X_1 + X_2 = n)}{P(X_1 + X_2 = n)}
\]

Recall: X1 + X2 ∼ BIN (n1 + n2 , p) so


 
\[
P(X_1 + X_2 = n) = \binom{n_1 + n_2}{n} p^{n} (1-p)^{n_1 + n_2 - n}
\]

Next, consider

P (X1 = x1 , X1 + X2 = n) = P (X1 = x1 , x1 + X2 = n)
= P (X1 = x1 , X2 = n − x1 )
= P (X1 = x1 )P (X2 = n − x1 ) independence
   
\[
= \binom{n_1}{x_1} p^{x_1} (1-p)^{n_1 - x_1} \cdot \binom{n_2}{n - x_1} p^{n - x_1} (1-p)^{n_2 - (n - x_1)}
\]

provided that 0 ≤ x1 ≤ n1 and

0 ≤ n − x1 ≤ n2
− n2 ≤ x1 − n ≤ 0
n − n2 ≤ x1 ≤ n

(from the binomial coefficients). Therefore our domain for x1 is

x1 = max{0, n − n2 }, . . . , min{n1 , n}


Thus we have
\[
p(x_1 \mid n) = \frac{P(X_1 = x_1, X_1 + X_2 = n)}{P(X_1 + X_2 = n)}
             = \frac{\binom{n_1}{x_1} p^{x_1} (1-p)^{n_1 - x_1} \cdot \binom{n_2}{n - x_1} p^{n - x_1} (1-p)^{n_2 - (n - x_1)}}{\binom{n_1 + n_2}{n} p^{n} (1-p)^{n_1 + n_2 - n}}
             = \frac{\binom{n_1}{x_1} \binom{n_2}{n - x_1}}{\binom{n_1 + n_2}{n}}
\]
for x_1 = \max\{0, n - n_2\}, \ldots, \min\{n_1, n\}.


Recall: a HG(N, r, n) (hypergeometric) distribution has pmf
\[
\frac{\binom{r}{x} \binom{N - r}{n - x}}{\binom{N}{n}}, \qquad x = \max\{0, n - N + r\}, \ldots, \min\{n, r\}
\]
So this is precisely HG(n_1 + n_2, n_1, n), i.e. X_1 \mid (X_1 + X_2 = n) \sim HG(n_1 + n_2, n_1, n).


If you think about it: we are choosing x1 successes from n1 trials from the first set X1 and choosing the remaining
n − x1 successes from n2 trials from X2 .
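A hedged simulation sketch (my addition, assuming numpy and scipy): draw X1 ∼ BIN(n1, p) and X2 ∼ BIN(n2, p), keep only the samples with X1 + X2 = n, and compare the empirical distribution of X1 with the HG(n1 + n2, n1, n) pmf. The choices n1 = 6, n2 = 4, p = 0.3, n = 3 are arbitrary.

import numpy as np
from scipy.stats import hypergeom

rng = np.random.default_rng(0)
n1, n2, p, n = 6, 4, 0.3, 3

x1 = rng.binomial(n1, p, size=200_000)
x2 = rng.binomial(n2, p, size=200_000)
keep = x1[x1 + x2 == n]                         # condition on X1 + X2 = n

for k in range(n + 1):
    emp = np.mean(keep == k)
    exact = hypergeom.pmf(k, n1 + n2, n1, n)    # HG(N = n1 + n2, r = n1, n draws)
    print(k, round(emp, 4), round(exact, 4))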

4 Tutorial 1
4.1 Exercise 1: MGF of Erlang
Find the mgf of X ∼ Erlang(n, λ) and use it to find E[X], Var(X).
Note that the Erlang(n, λ) pdf, for n ∈ Z+ and λ > 0, is
\[
f(x) = \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}, \qquad x > 0
\]

Solution.
\[
\Phi_X(t) = E[e^{tX}] = \int_0^\infty e^{tx}\,\frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx
          = \int_0^\infty \frac{\lambda^n x^{n-1} e^{-(\lambda - t)x}}{(n-1)!}\,dx
\]
Note that the integrand is similar to the pdf of an Erlang but with rate λ − t. So we rescale so that the integrand becomes this Erlang pdf:
\[
\Phi_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^n \int_0^\infty \frac{(\lambda - t)^n x^{n-1} e^{-(\lambda - t)x}}{(n-1)!}\,dx
          = \left(\frac{\lambda}{\lambda - t}\right)^n, \qquad t < \lambda
\]
since the integral over the positive real line of the pdf of an Erlang(n, λ − t) is 1, and t < λ must hold so that the rate
parameter λ − t is positive.


Differentiating,
\[
\Phi_X^{(1)}(t) = \frac{d}{dt}\left(\frac{\lambda}{\lambda - t}\right)^n = \frac{n\lambda^n}{(\lambda - t)^{n+1}}
\]
\[
\Phi_X^{(2)}(t) = \frac{d}{dt}\,\frac{n\lambda^n}{(\lambda - t)^{n+1}} = \frac{n(n+1)\lambda^n}{(\lambda - t)^{n+2}}
\]
Thus we have
\[
E[X] = \Phi_X^{(1)}(0) = \left.\frac{n\lambda^n}{(\lambda - t)^{n+1}}\right|_{t=0} = \frac{n}{\lambda}
\]
\[
E[X^2] = \Phi_X^{(2)}(0) = \left.\frac{n(n+1)\lambda^n}{(\lambda - t)^{n+2}}\right|_{t=0} = \frac{n(n+1)}{\lambda^2}
\]
\[
Var(X) = E[X^2] - E[X]^2 = \frac{n(n+1)}{\lambda^2} - \frac{n^2}{\lambda^2} = \frac{n}{\lambda^2}
\]
Remark 4.1. To solve any of these mgfs, it is useful to see if one can reduce the integral into a pdf of a known
distribution (possibly itself).
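A quick numerical check (my addition, assuming numpy): an Erlang(n, λ) rv is a gamma rv with shape n and scale 1/λ, so simulated moments should be close to n/λ and n/λ².

import numpy as np

rng = np.random.default_rng(42)
n, lam = 4, 2.0

x = rng.gamma(shape=n, scale=1.0 / lam, size=500_000)   # Erlang(n, lam) samples
print(x.mean(), n / lam)          # both ≈ 2.0
print(x.var(), n / lam ** 2)      # both ≈ 1.0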

4.2 Exercise 2: MGF of Uniform


Find the mgf of the uniform distribution on (0, 1) and find E[X] and V ar(X).

Solution. Let X ∼ U (0, 1) so that f (x) = 1 0 ≤ x ≤ 1. We have


Z 1
ΦX (t) = E[etX ] = etx (1)dx
0
x=1
1
= etx
t x=0
= t−1 (et − 1) t 6= 0

Differentiating

(1) d −1 t
ΦX (t) = (t (e − 1))
dt
= t−1 et − t−2 (et − 1)
tet − et + 1
=
t2
(2) d tet − et + 1
ΦX (t) = ( )
dt t2
t2 (tet + et − et ) − 2t(tet − et + 1)
=
t4
2 t t
t e − 2te + 2e − 2 t
=
t3


We may calculate the first two moments by applying L'Hopital's rule to evaluate the limits as t → 0:
\[
E[X] = \lim_{t \to 0} \Phi_X^{(1)}(t) = \lim_{t \to 0} \frac{te^t - e^t + 1}{t^2}
     = \lim_{t \to 0} \frac{te^t + e^t - e^t}{2t}
     = \lim_{t \to 0} \frac{e^t}{2} = \frac{1}{2}
\]
Similarly
\[
E[X^2] = \lim_{t \to 0} \Phi_X^{(2)}(t) = \lim_{t \to 0} \frac{t^2 e^t - 2te^t + 2e^t - 2}{t^3}
       = \lim_{t \to 0} \frac{t^2 e^t + 2te^t - 2te^t - 2e^t + 2e^t}{3t^2}
       = \lim_{t \to 0} \frac{e^t}{3} = \frac{1}{3}
\]
So we have
\[
Var(X) = E[X^2] - E[X]^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}
\]

4.3 Exercise 3: Moments from PGF


Suppose X is a discrete r.v. on N with pmf p(x). Show how to find the first two moments of X from its pgf.
Solution. By definition, the pgf of X is Ψ_X(z) = E[z^X] = \sum_{x=0}^{\infty} z^x p(x).
If we let z = 1, then the sum equals 1. However, if we take its derivative with respect to z just once
∞ ∞
(1) d X x X
ΨX (z) = z p(x) = xz x−1 p(x)
dz
x=0 x=1

Letting z = 1 we can find the first moment



(1)
X
ΨX (1) = lim xz x−1 p(x)
z→1
x=1

X
= xp(x)
x=1
X∞
= xp(x) when x = 0 the term is 0 anyways
x=0
= E[X]

For the second moment, we consider the second derivative



\[
\Psi_X^{(2)}(z) = \frac{d^2}{dz^2} \sum_{x=0}^{\infty} z^x p(x) = \sum_{x=2}^{\infty} x(x-1) z^{x-2} p(x)
\]


Letting z = 1

(2)
X
ΨX (1) = lim x(x − 1)z x−2 p(x)
z→1
x=2

X
= x(x − 1)p(x)
x=2

X
= x(x − 1)p(x)
x=0
= E[X(X − 1)]
= E[X 2 ] − E[X]

(2) (1)
So we have E[X 2 ] = ΨX (1) + ΨX (1). To find the variance
(2) (1) (1)
V ar(X) = ΨX (1) + ΨX (1) − (ΨX (1))2

4.4 Exercise 4: PGF of Poisson


Suppose X ∼ P OI(λ). Find the pgf of X and use it to find E[X] and V ar(X). The pmf of P OI(λ) for λ > 0

λx
f (x) = e−λ x = 0, 1, 2, . . .
x!
Solution.

X −λ
X (zλ)x
ΨX (z) = E[z ] = e
x!
x=0
−λ zλ
=e ·e
λ(z−1)
=e

where the second equality holds since the summation is the Taylor expansion of ezλ .
Differentiating

(1) d λ(z−1)
ΨX (z) = e
dz
= λeλ(z−1)
(2) d λ(z−1)
ΨX (z) = λe
dz
= λ2 eλ(z−1)


The moments are thus
\[
E[X] = \Psi_X^{(1)}(1) = \lambda e^{\lambda(1-1)} = \lambda
\]
\[
E[X(X-1)] = \Psi_X^{(2)}(1) = \lambda^2 e^{\lambda(1-1)} = \lambda^2
\]
\[
E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda
\]
\[
Var(X) = E[X^2] - E[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda
\]
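A small symbolic sketch (my addition, assuming sympy) that differentiates the pgf Ψ_X(z) = e^{λ(z−1)} and recovers E[X] = λ and Var(X) = λ via the factorial-moment identities from Exercise 3.

import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
pgf = sp.exp(lam * (z - 1))

m1 = sp.diff(pgf, z, 1).subs(z, 1)     # E[X]
fm2 = sp.diff(pgf, z, 2).subs(z, 1)    # E[X(X - 1)]

print(sp.simplify(m1))                       # lam
print(sp.simplify(fm2 + m1 - m1 ** 2))       # Var(X) = lam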

5 January 16, 2018


5.1 Example 2.3 solution
Pm
Let X1 , . . . , Xm be independent r.v.’s where Xi ∼ P OI(λi ). Define Y = i=1 Xi . Find the conditional distribution
Xj | (Y = n).

Solution. We set out to find


P (Xj = xj , Y = n)
p(xj | n) = p(Xj = xj | Y = n) =
P (Y = n)
P (Xj = xj , m
P
i=1 Xi = n)
=
P (Y = n)
P (Xj = xj , Xj + m
P
i=1,i6=j Xi = n)
=
P (Y = n)
Pm
P (Xj = xj , i=1,i6=j Xi = n − xj )
=
P (Y = n)
P (Xj = xj )P ( m
P
i=1,i6=j Xi = n − xj )
= independence of Xi
P (Y = n)

Remember that if Xi ∼ P OI(λi ), then


m
X m
X
Y = Xi ∼ P OI( λi )
i=1 i=1

which can be derived from mgfs (Exercise 1.3). Therefore


m
X m
X
Xi ∼ P OI( λi )
i=1,i6=j i=1,i6=j

Expanding out p(x_j | n) with the Poisson pmfs:
\[
p(x_j \mid n) = \frac{\dfrac{e^{-\lambda_j} \lambda_j^{x_j}}{x_j!} \cdot \dfrac{e^{-\sum_{i \ne j} \lambda_i} \left(\sum_{i \ne j} \lambda_i\right)^{n - x_j}}{(n - x_j)!}}{\dfrac{e^{-\sum_{i=1}^{m} \lambda_i} \left(\sum_{i=1}^{m} \lambda_i\right)^{n}}{n!}}
\]
where x_j ≥ 0 and n − x_j ≥ 0, i.e. 0 ≤ x_j ≤ n (from the factorials).


Cancelling the exponential terms and letting λ_Y = \sum_{i=1}^{m} \lambda_i,
\[
p(x_j \mid n) = \frac{n!}{(n - x_j)!\, x_j!} \cdot \frac{\lambda_j^{x_j} (\lambda_Y - \lambda_j)^{n - x_j}}{\lambda_Y^{x_j}\, \lambda_Y^{n - x_j}}
             = \binom{n}{x_j} \left(\frac{\lambda_j}{\lambda_Y}\right)^{x_j} \left(1 - \frac{\lambda_j}{\lambda_Y}\right)^{n - x_j}
\]
This is the binomial pmf, so we have
\[
X_j \mid (Y = n) \sim BIN\!\left(n, \frac{\lambda_j}{\lambda_Y}\right)
\]
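A hedged simulation sketch (my addition, assuming numpy and scipy): generate independent Poisson counts, condition on their total, and compare the conditional distribution of X_j with BIN(n, λ_j/λ_Y). The rates and the value of n below are arbitrary choices.

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(7)
lams = np.array([1.0, 2.0, 3.0])        # lambda_1, lambda_2, lambda_3
n, j = 5, 0                             # condition on Y = n and look at X_1

samples = rng.poisson(lams, size=(500_000, 3))
keep = samples[samples.sum(axis=1) == n, j]

for k in range(n + 1):
    emp = np.mean(keep == k)
    exact = binom.pmf(k, n, lams[j] / lams.sum())
    print(k, round(emp, 4), round(exact, 4))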

5.2 Example 2.4 solution


Suppose X ∼ P OI(λ) and Y | (X = x) ∼ BIN (x, p). Find the conditional distribution X | Y = y.
(Note: range of y depends on x (that is y ≤ x). Graphically, we have integral points on and below the y = x line
starting from 0 for both x and y).

Solution. We wish to find the conditional pmf given by X | Y = y or

P (X = x, Y = y)
p(x | y) = P (X = x | Y = y) =
P (Y = y)

Note that also


P (Y = y, X = x)
P (Y = y | X = x) =
P (X = x)
⇒ P (X = x, Y = y) = P (X = x)P (Y = y | X = x)
e−λ λx
 
x y
= · p (1 − p)x−y
x! y

for x = 0, 1, 2, . . . and y = 0, 1, 2, . . . , x (range of y depends on x).


To find the marginal pmf of Y , we use
X
pY (y) = p(x, y)
x

To find the support for x, note that from the graphical region, we realize that x = 0, 1, 2, . . . and y = 0, 1, 2, . . . , x
is equivalent to y = 0, 1, 2, . . . and x = y, y + 1, y + 2, . . ..


So
∞ −λ x
X e λ x!
pY (y) = py (1 − p)x−y
x=y
x! (x − y)!y!

λy e−λ py X λx−y (1 − p)x−y
=
y! x=y
(x − y)!

e−λ (λp)y X [λ(1 − p)]x−y
=
y! x=y
(x − y)!
e−λ (λp)y λ(1−p)
= e
y!
e−λp (λp)y
= y = 0, 1, 2, . . .
y!

Note that pY (y) ∼ P OI(λp).


Thus
P (X = x, Y = y)
p(x | y) =
P (Y = y)
e−λ λx x! y
x! ·
(x−y)!y! p (1 − p)x−y
= e−λp (λp)y
y!
e−λ+λp [λ(1 − p)]x−y
=
(x − y)!
e−λ(1−p) [λ(1 − p)]x−y
= x = y, y + 1, y + 2, . . .
(x − y)!

This resembles the Poisson distribution with rate λ(1 − p), but with a shifted domain.
So we see that
\[
X \mid (Y = y) \sim W + y
\]
where W ∼ POI(λ(1 − p)). This is the Poisson pmf shifted y units to the right (note that W is a random variable
while y is a fixed value).
We can easily find the conditional expectations and variance e.g.

E[X | Y = y] = E[W + y] = E[W ] + y

5.3 Example 2.5 solution


Suppose the joint pdf of X and Y is
(
12
5 x(2 − x − y) , 0 < x < 1, 0 < y < 1,
f (x, y) =
0 , elsewhere

Determine the conditional distribution of X given Y = y where 0 < y < 1. Also calculate the mean of X | (Y = y).
(Note: the graphical region is a unit square box where the bottom left corner is at 0, 0: the inside of the box is the
support).


Solution. Using our theory, we wish to find the conditional pdf of X | (Y = y) given by

f (x, y)
fX|Y (x | y) =
fY (y)

For 0 < y < 1


Z ∞
fY (y) = f (x, y)dx
−∞
Z 1
12
= x(2 − x − y)dx
0 5
12 1
Z
= (2x − x2 − xy)dx
5 0
1
12 2 x3 x2 y
= (x − − )
5 3 2 0
12 1 y
= (1 − − )
5 3 2
2
= (4 − 3y)
5
So we have
12
5 x(2 − x − y)
fX|Y (x | y) = 2
5 (4 − 3y)
6x(2 − x − y)
=
4 − 3y
Thus we have
\[
E[X \mid Y = y] = \int_0^1 x \cdot f_{X|Y}(x \mid y)\,dx = \frac{5 - 4y}{2(4 - 3y)}
\]

6 January 18, 2018


6.1 Example 2.6 solution
Suppose the joint pdf of X and Y is
(
5e−3x−y , 0 < 2x < y < ∞,
f (x, y) =
0 , otherwise

Find the conditional distribution of Y | (X = x) where 0 < x < ∞.


Note the region of support is a "flag" (an upright triangle with a downward point) whose slanted edge lies along the
line y = 2x.

Solution. We wish to find


f (x, y)
fY |X (y | x) =
fX (x)


For 0 < x < ∞,
\[
f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{2x}^{\infty} 5e^{-3x - y}\,dy = 5e^{-3x} \int_{2x}^{\infty} e^{-y}\,dy
       = 5e^{-3x}\left(-e^{-y}\right)\Big|_{2x}^{\infty} = 5e^{-3x} e^{-2x} = 5e^{-5x}
\]
so we have X ∼ EXP(5).

Remark 6.1. The bounds on the integral are in terms of y: it is dependent on x in our f (x, y) definition.

Now
5e−3x−y
fY |X (y | x) =
5e−5x
= e−y+2x y > 2x

Note: recognize the conditional pdf of Y | (X = x) as that of a shifted exponential distribution (2x units to the
right). Specifically, we have
Y | (X = x) ∼ W + 2x
where W ∼ Exp(1). Thus E[Y | (X = x)] = E(W ) + 2x and V ar[Y | (X = x)] = V ar(W ).

6.2 Example 2.7 solution


Suppose X ∼ U (0, 1) and Y | (X = x) ∼ Bern(x). Find the conditional distribution X | (Y = y).
Note: X is continuous and Y | (X = x) is discrete.

Solution. We wish to find


p(y | x)fX (x)
fX|Y (x | y) =
pY (y)
From the given information, we have fX (x) = 1 for 0 < x < 1 Furthermore p(y | x) = Bern(x) = xy (1 − x)1−y for
y = 0, 1. R
For y = 0, 1 note that (from f (x | y)dx = 1)
Z ∞
pY (y) = p(y | x)fX (x)dx
−∞
Z 1
pY (y) = xy (1 − x)1−y dx
0


To compute this integral, let’s check pY (0) and pY (1)


Z 1
pY (0) = x0 (1 − x)1−0 dx
0
Z 1
= 1 − xdx
0
1
x2
=x−
2 0
1
=
2
Similarly, take y = 1 where pY (1) = 12 .
In other words, we have that pY (y) = 21 y = 0, 1 so
 
1
Y ∼ Bern
2

So
p(y | x)fX (x)
f (x | y) =
pY (y)
x (1 − x)1−y · 1
y
= 1
2
= 2xy (1 − x)1−y 0<x<1

6.3 Theorem 2.2 (law of total expectation)


Theorem 6.1. For random variables X and Y , E[X] = E[E[X | Y ]].

Proof. WLOG assume X, Y are jointly continuous random variables. We note


Z ∞
E[E[X | Y ]] = E[X | Y = y]fY (y)dy
−∞
Z ∞ Z ∞ 
= xfX|Y (x | y)dx fY (y)dy
−∞ −∞
Z ∞Z ∞
f (x, y)
= x · fY (y)dxdy
fY (y)
Z−∞
∞ Z −∞

= xf (x, y)dxdy
−∞ −∞
Z ∞ Z ∞ 
= x f (x, y)dy dx
Z−∞

−∞

= xfX (x)dx
−∞
= E[X]


6.4 Example 2.8 solution


Suppose X ∼ GEO(p) with pmf pX (x) = (1 − p)x−1 p where x = 1, 2, 3, . . .. Calculate E[X] and V ar(X) using the
law of total expectation.

Solution. Recall E[X] = p1 and V ar(X) = 1−p p2


where X models the number of (independent) trials necessary to
obtain the first success.
Remember: we could manually solve E[X] = \sum_{x=1}^{\infty} x (1 - p)^{x-1} p and similarly Var(X) = E[X^2] − E[X]^2, or take the
derivatives of the mgf Φ_X(t) = E[e^{tX}]. This is tedious in general.

7 Tutorial 2
7.1 Sum of geometric distributions
Let Xi for i = 1, 2, 3 be independent geometric random variables having the same parameter p. Determine the value of
\[
P\!\left(X_j = x_j \,\middle|\, \sum_{i=1}^{3} X_i = n\right)
\]

Solution. Note that, by construction, the sum of k independent GEO(p) random variables is distributed as
N B(k, p). Recall that

Xi ∼ GEO(p) ⇒ PXi (x) = (1 − p)x−1 p x = 1, 2, 3, . . .


 
y−1 k
Y ∼ N B(k, p) ⇒ PY (y) = p (1 − p)y−k y = k, k + 1, k + 2, . . .
k−1

Breaking apart the summation


3
X 3
X
P (Xj = xj | Xi = n) = P (Xj = xj | Xj + Xi = n)
i=1 i=1,i6=j
P3
P (Xj = xj , Xj + i=1,i6=j Xi = n)
= P3
P ( i=1 Xi = n)
P (Xj = xj , 3i=1,i6=j Xi = n − xj )
P
=
P ( 3i=1 Xi = n)
P

P (Xj = xj ) · P ( 3i=1,i6=j Xi = n − xj )
P
= Xi ’s are independent
P ( 3i=1 Xi = n)
P

(1 − p)xj −1 p · n−x1j −1 p2 (1 − p)n−xj −2



= n−1 3
 provided that xj ≥ 1 and n − xj ≥ 2
n−3
2 p (1 − p)
(1 − p)xj −1 p · n−x1j −1 p2 (1 − p)n−xj −2

= n−1 3

n−3
2 p (1 − p)
(n − xj − 1)! 2!(n − 3)!
= ·
1!(n − xj − 2)! (n − 1)!
2(n − xj − 1)
= xj = 1, 2, . . . , n − 2
(n − 1)(n − 2)


Note this is a pmf so we can check


n−2 n−2 n−2
X 2(n − x1 ) X 2(n − 1) X 2x
= −
x1
(n − 1)(n − 2) x
(n − 1)(n − 2) x
(n − 1)(n − 2)
1 1
n−2
2(n − 1)(n − 2) 2 X
= − x
(n − 1)(n − 2) (n − 1)(n − 2)
x=1
2 (n − 2)(n − 1)
=2− ·
(n − 1)(n − 2) 2
=2−1
=1

which satisfies the cdf axiom.

7.2 Conditional card drawing


Given N ∈ Z+ cards labelled 1, 2, . . . , N , let X represent the number that is picked. Suppose a second card Y is
picked from 1, 2, . . . , X.
Assuming N = 10, calculate the expected value of X given Y = 8.

Solution. Clearly we have that PX (x) = N1 where x = 1, 2, . . . , N and PY |X (y | x) = x1 for y = 1, 2, . . . , x.


To find the conditional distribution of X | (Y = y) we must identify the joint distribution of X, Y . It immediately
follows that
1
p(x, y) = P (X = x, Y = y) = PY |X (y | x)PX (x) =
xN
for x = 1, 2, . . . , N and y = 1, 2, . . . , x. or equivalently the range can be re-expressed as

y = 1, 2, . . . , N and x = y, y + 1, . . . , N

Remark 7.1. Whenever we want to find the marginal pmf/pdf for a given rv Y , we generally need to re-map the
support such that the support of Y is independent of the other rv X.

Note that
N N
X X 1
PY (y) = p(x, y) =
x=y x=y
xN
N
1 X1
= y = 1, 2, . . . , N
N x=y x


Letting N = 10, we can calculate
\[
E[X \mid Y = 8] = \sum_{x=8}^{10} x\, P_{X|Y}(x \mid 8)
               = \sum_{x=8}^{10} x\, \frac{P(x, 8)}{P_Y(8)}
               = \sum_{x=8}^{10} x \cdot \frac{\frac{1}{10x}}{\frac{1}{10}\sum_{z=8}^{10} \frac{1}{z}}
               = 3\left(\sum_{z=8}^{10} \frac{1}{z}\right)^{-1}
               = 3\left(\frac{1}{8} + \frac{1}{9} + \frac{1}{10}\right)^{-1}
               = 3\left(\frac{242}{720}\right)^{-1}
               = \frac{1080}{121} \approx 8.9256
\]

7.3 Conditional points from interval


Let us choose a random point from the interval (0, 1), denoted by the rv X1. We then choose a random point X2 on the
interval (0, x1), where x1 is the realized value of X1.

1. Make assumptions about the marginal pdf f1 (x1 ) and conditional pdf f2|1 (x2 | x1 ).

2. Find the conditional mean E[X1 | X2 = x2 ].

3. Compute P (X1 + X2 ≥ 1).

Solution. 1. It makes sense that X1 ∼ U (0, 1) and X2 | (X1 = x1 ) ∼ U (0, x1 ) so that f1 (x1 ) = 1, 0 < x1 < 1
and f2|1 (x2 | x1 ) = x11 for 0 < x2 < x1 < 1.

2. Note that f1|2 (x1 | x2 ) = ff(x2 (x


1 ,x2 )
2)
and so we need to identify the joint distribution of x1 and x2 as well as the
marginal distribution of X2 . We have

f (x1 , x2 ) = f2|1 (x2 | x1 ) · f1 (x1 )


1
= 0 < x2 < x1 < 1 0 < x1 < 1
x1
or equivalently, the region of support can be re-expressed as

0 < x2 < 1
x2 < x1 < 1


so the marginal pdf of f2 (x2 ) is


Z 1
f2 (x2 ) = p(x1 , x2 )dx1
x1 =x2
Z 1
1
= dx1
x1 =x2 x1
x1 =1
= ln(x1 )
x1 =x2
= − ln(x2 ) 0 < x2 < 1

so the conditional pdf is

f (x1 , x2 )
f1|2 (x1 | x2 ) =
f2 (x2 )
1
= 0 < x2 < x1 < 1
−x1 ln(x2 )

Taking the expectation


Z 1
E[X1 | X2 = x2 ] = x1 p1|2 (x1 , x2 )dx1
x1 =x2
Z 1
1
= x1 · dx1
x1 =x2 −x1 ln(x2 )
Z 1
1
= dx1
x1 =x2 − ln(x2 )
1 − x2
= 0 < x2 < 1
− ln(x2 )

Exercise: solve for limx2 →1 E[X1 | X2 = x2 ] (use LHR).

3. The probability that X1 + X2 ≥ 1 may be calculated by taking the double integral over the region R of their
support where X1 + X2 ≥ 1 holds. This region may be found as follows:

Figure 7.1: The region R is the support where X1 + X2 ≥ 1.

The region R is equivalent to the bounds 1/2 < x1 < 1 and 1 − x1 < x2 < x1.


Integrating f (x1 , x2 ) over R we obtain


\[
P(X_1 + X_2 \ge 1) = \iint_{R} f(x_1, x_2)\,dx_2\,dx_1
                   = \int_{1/2}^{1} \int_{1 - x_1}^{x_1} \frac{1}{x_1}\,dx_2\,dx_1
                   = \int_{1/2}^{1} \frac{x_2}{x_1}\bigg|_{x_2 = 1 - x_1}^{x_2 = x_1} dx_1
                   = \int_{1/2}^{1} \left(2 - \frac{1}{x_1}\right) dx_1
\]
\[
                   = \big(2x_1 - \ln(x_1)\big)\Big|_{x_1 = 1/2}^{x_1 = 1}
                   = 1 + \ln\!\left(\tfrac{1}{2}\right)
                   = 1 - \ln(2)
                   \approx 0.3068528
\]
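A Monte Carlo check (my addition, assuming numpy): sample X1 ∼ U(0, 1), then X2 uniform on (0, X1), and estimate P(X1 + X2 ≥ 1); the estimate should be close to 1 − ln 2 ≈ 0.3069.

import numpy as np

rng = np.random.default_rng(2018)
x1 = rng.uniform(0.0, 1.0, size=1_000_000)
x2 = rng.uniform(0.0, x1)             # X2 | X1 = x1 is uniform on (0, x1)
print(np.mean(x1 + x2 >= 1.0))        # ≈ 1 - ln(2)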

8 January 23, 2018


8.1 Example 2.8 solution
Suppose X ∼ GEO(p) with pmf pX (x) = (1 − p)x−1 p for x = 1, 2, 3 . . .. Calculate E[X], V ar(X) using the law of
total expectation.
Solution. Recall X is modelling the number of trials needed to obtain the 1st success. We want to calculate
E[X] and V ar(X) using the total law of expectation.
Define (
0 if the 1st trial is a failure
Y =
1 if the 1st trial is a success
Note that Y ∼ Bern(p) so that PY (0) = P (Y = 0) = 1 − p and similarly PY (1) = P (Y = 1) = p.
Thus by the law of total expectation

E[X] = E[E[X | Y ]]
1
X
= E[X | Y = y]pY (y)
y=0

= (1 − p)E[X | Y = 0] + pE[X | Y = 1]

Note that
X | (Y = 1) = 1
with probability 1 (one success is equivalent to X = 1 for GEO(p)), and

X | (Y = 0) ∼ 1 + X

(the first one failed, we expect to take X more trials; same initial problem - recurse. See course notes for formal
proof).


Thus we have

E[X] = (1 − p)E[1 + X] + p(1)


= (1 − p)(1 + E[X]) + p
= 1 + (1 − p)E[X]
⇒E[X](1 − (1 − p)) = 1
1
⇒E[X] =
p
as expected.
For V ar(X), notice that

E[X 2 ] = E[E[X 2 | Y ]]
1
X
= E[X 2 | Y = y]pY (y)
y=0

= (1 − p)E[X 2 | Y = 0] + pE[X 2 | Y = 1]
= (1 − p)E[(1 + X)2 ] + p(1)2 from above
2
= (1 − p)E[1 + 2X + X ] + p
= (1 − p)(1 + 2E[X] + E[X 2 ]) + p
= 1 + 2(1 − p)E[X] + (1 − p)E[X 2 ]
2(1 − p)
⇒E[X 2 ](1 − (1 − p)) = 1 +
p
1 2(1 − p)
⇒E[X 2 ] = +
p p2
So we have

V ar(X) = E[X 2 ] − E[X]2


1 2(1 − p) 1
= + 2
− 2
p p p
p + 2 − 2p − 1
=
p2
1−p
=
p2
Remark 8.1. For law of total expectations, a large part of it is choosing the right random variable to condition on
(i.e. Y = Bern(p) in this example).
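As a sanity check on the conditioning argument above (my addition, assuming numpy): simulate GEO(p), counting the number of trials until the first success, and compare the sample mean and variance with 1/p and (1 − p)/p².

import numpy as np

rng = np.random.default_rng(3)
p = 0.3

# numpy's geometric counts the number of trials up to and including the first success
x = rng.geometric(p, size=500_000)
print(x.mean(), 1 / p)               # both ≈ 3.33
print(x.var(), (1 - p) / p ** 2)     # both ≈ 7.78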

8.2 Theorem 2.3 (variance as expectation of conditionals)


Theorem 8.1. For random variables X and Y

V ar(X) = E[V ar(X | Y )] + V ar(E[X | Y ])

Proof. Recall that


Var(X | Y = y) = E[X^2 | Y = y] − E[X | Y = y]^2


so more generally we have


Var(X | Y) = E[X^2 | Y] − E[X | Y]^2
Taking the expectation of this,

E[V ar(X | Y )] = E[E[X 2 | Y ] − E[X | Y ]2 ]


= E[E[X 2 | Y ]] − E[E[X | Y ]2 ]
= E[X 2 ] − E[E[X | Y ]2 ] E[A] = E[E[A | B]] (law of total expectation)

Note that
V ar(E[X | Y ]) = V ar(v(Y ))
where v(Y ) = E[X | Y ] is a function of Y (not X!).

V ar(v(Y )) = E[v(Y )2 ] − E[v(Y )]2


= E[E[X | Y ]2 ] − E[X]2 law of total expectation

Therefore we have

E[V ar(X | Y )] + V ar(E[X | Y ]) = E[X 2 ] − E[E[X | Y ]2 ] + E[E[X | Y ]2 ] − E[X]2


= E[X 2 ] − E[X]2
= V ar(X)

as desired.

8.3 Example 2.9 solution


Suppose {Xi }∞ 2
i=1 is an iid sequence of random variables with common mean µ and variance σ . Let N be a discrete,
non-negative integer-valued rv that is
Pindependent of each Xi .
N
Find the mean and variance of T = i=1 Xi (referred to as a random sum).

Solution. To find the mean:


We condition on N since the value of our T depends on how many Xi ’s there are which depends on N . By the law
of total expectations
E[T ] = E[E[T | N ]]
Note that
N
X
E[T | N = n] = E[ Xi | N = n]
i=1
Xn
= E[ Xi | N = n]
i=1
n
X
= E[Xi | N = n] due to independence of Xi and N
i=1
Xn
= E[Xi ]
i=1
= nµ


So we have E[T | N ] = N µ.

Remark 8.2. We needed to first condition on a concrete N = n in order to unwrap the summation, then revert
back to the random variable N .

Thus we have
E[T ] = E[E[T | N ]] = E[N µ] = µE[N ]
which intuitively makes sense.
To find the variance:
We use our previous theorem on variance as expectation of conditionals

V ar(T ) = E[V ar(T | N )] + V ar(E[T | N ])

We know from before that


V ar(E[T | N ]) = V ar(N µ) = µ2 V ar(N )
We can break apart the variance as
N
X
V ar(T | N = n) = V ar( Xi | N = n)
i=1
n
X
= V ar( Xi | N = n)
i=1
n
X
= V ar( Xi
i=1
X
= i = 1n V ar(Xi ) independence of Xi
= σ2n

Therefore V ar(T | N )V ar(T | N = n) = σ2N .


n=N
So
E[V ar(T | N )] = E[σ 2 N ] = σ 2 E[N ]
and thus
V ar(T ) = σ 2 E[N ] + µ2 V ar(N )
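A hedged simulation sketch (my addition, assuming numpy) of a random sum T = X1 + · · · + XN with Xi ∼ EXP(λ) and N ∼ POI(ν), checking E[T] = µE[N] and Var(T) = σ²E[N] + µ²Var(N); for a Poisson N both E[N] and Var(N) equal ν.

import numpy as np

rng = np.random.default_rng(5)
lam, nu, reps = 2.0, 4.0, 100_000          # X_i ~ EXP(lam), N ~ POI(nu)
mu, sigma2 = 1 / lam, 1 / lam ** 2         # mean and variance of each X_i

N = rng.poisson(nu, size=reps)
T = np.array([rng.exponential(1 / lam, size=n).sum() for n in N])

print(T.mean(), mu * nu)                      # E[T] = mu * E[N]
print(T.var(), sigma2 * nu + mu ** 2 * nu)    # Var(T) = sigma^2 E[N] + mu^2 Var(N)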

9 January 25, 2018


9.1 Example 2.10 solution (P (X < Y ))
Suppose X and Y are independent continuous random variables. Find an expression for P (X < Y ).

Solution. Define our event of interest as


A = {X < Y }


Thus we have
Z ∞
P (X < Y ) = P (A) = P (A | Y = y)fY (y)dy law of total probability
Z−∞

= P (X < Y | Y = y)fY (y)dy
Z−∞

= P (X < y | Y = y)fY (y)dy
Z−∞

= P (X < y)fY (y)dy X < y only depends on X; Y = y only depends on Y
Z−∞

= P (X ≤ y)fY (y)dy X is a continuous rv
Z−∞

= FX (y)fY (y)dy
−∞

Suppose that X and Y have the same distribution. We expect P (X < Y ) = 21 . Let’s verify it with our expression
Z ∞
P (X < Y ) = FX (y)fY (y)dy
−∞
Z ∞
= FY (y)fY (y)dy X∼Y
−∞

du
Let u = FY (y), thus dy = fY (y) ⇐⇒ du = fY (y)dy. So we have
Z 1
P (X < Y ) = udu domain for a CDF is [0, 1]
0
u2 1
=
2 0
1
=
2

9.2 Example 2.11 solution (P (X < Y ) where X ∼ EXP (λ1 ) and Y ∼ EXP (λ2 )
Suppose X ∼ Exp(λ1 ) and Y ∼ Exp(λ2 ) are independent exponential rvs. Show that

λ1
P (X < Y ) =
λ1 + λ2

Solution. Since Y ∼ EXP(λ2), we have fY(y) = λ2 e^{−λ2 y} for y > 0.


Since X ∼ Exp(λ1 ), we have
Z x
FX (x) = P (X ≤ x) = λ1 e−λ1 x dx
0
x
= −e−λ1 x
0
= 1 − e−λ1 x x≥0


From the expression in Example 2.10, we have


\[
P(X < Y) = \int_{0}^{\infty} F_X(y) f_Y(y)\,dy
         = \int_{0}^{\infty} (1 - e^{-\lambda_1 y})(\lambda_2 e^{-\lambda_2 y})\,dy
         = \int_{0}^{\infty} \lambda_2 e^{-\lambda_2 y} - \lambda_2 e^{-(\lambda_1 + \lambda_2) y}\,dy
\]
\[
         = \left(-e^{-\lambda_2 y} + \frac{\lambda_2}{\lambda_1 + \lambda_2} e^{-(\lambda_1 + \lambda_2) y}\right)\Bigg|_{0}^{\infty}
         = 1 - \frac{\lambda_2}{\lambda_1 + \lambda_2}
         = \frac{\lambda_1}{\lambda_1 + \lambda_2}
\]

9.3 Example 2.12 solution


Consider an experiment in which independent trials, each having success probability p ∈ (0, 1), are performed until k ∈ Z+
consecutive successes are achieved. Determine the expected number of trials needed for k consecutive successes.

Solution. Let Nk be the rv which counts the number of trials needed to obtain k consecutive successes.
Current goal: we want to find E[Nk ].
Note: when k = 1, we have N1 ∼ GEO(p), and so E[N1] = 1/p.
For arbitrary k ≥ 2, we will try to find E[Nk ] using the law of total expectations, namely

E[Nk ] = E[E[Nk | W ]]

for some W rv we choose carefully.


Suppose we choose W where (we will later see why this won’t work)
(
0 if first trial is a failure
W =
1 if first trial is a success

So we have
X
E[Nk ] = E[Nk | W = w]P (W = w)
w
= P (W = 0)E[Nk | W = 0] + P (W = 1)E[Nk | W = 1]
= (1 − p)E[Nk | W = 0] + pE[Nk | W = 1]

Note that

Nk | (W = 0) ∼ 1 + Nk
Nk | (W = 1) ∼?

We can’t simply have Nk | (W = 1) ∼ 1 + Nk−1 since Nk−1 does not guarantee that the k − 1 consecutive successes
are followed immediately after our first W = 1.
Perhaps we need another W , W = Nk−1 so we attempt to find

E[Nk ] = E[E[Nk | Nk−1 ]]


Consider E[N_k | N_{k-1} = n]. Conditional on N_{k-1} = n, define
\[
Y = \begin{cases} 0 & \text{if the } (n+1)\text{th trial is a failure} \\ 1 & \text{if the } (n+1)\text{th trial is a success} \end{cases}
\]
Now we have
X
E[Nk | Nk−1 = n] = E[Nk | Nk−1 = n, Y = y]P (Y = y | Nk−1 = n)
y

= P (Y = 0 | Nk−1 = n)E[Nk | Nk−1 = n, Y = 0]


+ P (Y = 1 | Nk−1 = n)E[Nk | Nk−1 = n, Y = 1]
= (1 − p)E[Nk | Nk−1 = n, Y = 0] + pE[Nk | Nk−1 = n, Y = 1] Y is independent from Nk−1

Note that
\[
N_k \mid (N_{k-1} = n, Y = 0) \sim n + 1 + N_k \quad \text{(we need to start over again)}
\]
\[
N_k \mid (N_{k-1} = n, Y = 1) = n + 1 \quad \text{with probability 1}
\]

Therefore

E[Nk | Nk−1 = n] = (1 − p)(n + 1 + E[Nk ]) + p(n + 1)


= n + 1 + (1 − p)E[Nk ]

which, in terms of the rv N_{k-1}, gives
\[
E[N_k \mid N_{k-1}] = E[N_k \mid N_{k-1} = n]\Big|_{n = N_{k-1}} = N_{k-1} + 1 + (1 - p)E[N_k]
\]

Thus from the law of total expectations

E[Nk ] = E[E[Nk | Nk−1 ]]


= E[Nk−1 + 1 + (1 − p)E[Nk ]]
= E[Nk−1 ] + 1 + (1 − p)E[Nk ]
1 E[Nk−1 ]
⇒E[Nk ] = +
p p
This is a recurrence relation for k = 2, 3, 4, . . .. To solve, we check for some k values to gain some intuition

1 E[N1 ] 1 1
k = 2 ⇒ E[N2 ] = + = + 2
p p p p
1 E[N2 ] 1 1 1
k = 3 ⇒ E[N3 ] = + = + 2+ 3
p p p p p
..
.
k
X 1
E[Nk ] = k = 1, 2, 3, . . . by induction
pi
i=1


This is a finite geometric series with ratio 1/p, thus we have
\[
E[N_k] = \frac{\frac{1}{p} - \frac{1}{p^{k+1}}}{1 - \frac{1}{p}}
\]
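A simulation sketch (my addition) of the number of trials needed for k consecutive successes, compared against the closed form Σ_{i=1}^{k} 1/p^i.

import random

def trials_until_k_successes(p, k, rng):
    # run Bernoulli(p) trials until k consecutive successes occur
    run, n = 0, 0
    while run < k:
        n += 1
        run = run + 1 if rng.random() < p else 0
    return n

rng = random.Random(2018)
p, k, reps = 0.5, 3, 200_000
sim = sum(trials_until_k_successes(p, k, rng) for _ in range(reps)) / reps
exact = sum(1 / p ** i for i in range(1, k + 1))
print(sim, exact)    # both ≈ 14 when p = 0.5 and k = 3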

10 Tutorial 3
10.1 Mixed conditional distribution
Suppose X is Erlang(n, λ) with pdf
λn xn−1 e−λx
fX (x) = x>0
(n − 1)!
Suppose Y | (X = x) is P OI(x) with pmf

e−x xy
pY |X (y | x) = y = 0, 1, 2, . . .
y!

Find the condition distribution X | (Y = y).

Solution. The marginal distribution of Y is characterized by its pmf


Z ∞
pY (y) = pY |X (y | x)fX (x)dx
Z−∞

= pY |X (y | x)fX (x)dx
−∞
∞ −x y
λn xn−1 e−λx
Z
e x
= · dx
0 y! (n − 1)!
Z ∞
λn
= xn+y−1 e−(λ+1)x dx
y!(n − 1)! 0
Z ∞
λn (n + y − 1)! (λ + 1)n+y xn+y−1 e−(λ+1)x
= dx
(λ + 1)n+y y!(n − 1)! 0 (n + y − 1)!
λn (n + y − 1)!
= integral of pdf Erlang(n + y, λ + 1)
(λ + 1)n+y y!(n − 1)!
  n  y
n+y−1 λ 1
= y = 0, 1, 2, . . .
n−1 λ+1 λ+1

Note that pY (y) is the Negative Binomial distribution shifted to the left n units. In other words, it counts the
number of “failures” before n successes, where the probability of success if λ/(λ + 1).
The distribution of X | (Y = y) is thus

pY |X (y | x)fX (x)
fX|Y (x | y) =
pY (y)
e−x xy λn xn−1 e−λx
y! (n−1)!
= (n+y−1)! λn
y!(n−1)! (λ+1)n+y

(λ + 1)n+y xn+y−1 e−(λ+1)x


= x>0
(n + y − 1)!


Note that fX|Y (x | y) is exactly the Erlang distribution Erlang(n + y, λ + 1).

10.2 Law of total expectations


1. Let {X_i}_{i=1}^{\infty} be an iid sequence of EXP(λ) random variables and let N ∼ GEO(p) be independent of each X_i.
   Find E[\prod_{i=1}^{N} X_i].

2. Let {X_i}_{i=0}^{\infty} be an iid sequence where X_i ∼ BIN(10, 1/2^i), i = 0, 1, 2, . . .. Also let N ∼ POI(λ) be independent
   of each X_i. Find E[X_N].

1. We want to first find E[ N


Q
Solution. i=1 Xi | N = n] (conditioning on N = n)

N
Y n
Y
E[ Xi | N = n] = E[ Xi | N = n]
i=1 i=1
n
Y
= E[ Xi ] independence of Xi0 s and N
i=1
n
Y
= E[Xi ] independence of Xi0 s
i=1
n
Y 1
=
λ
i=1
1
=
λn
Thus by the law of total expectations
N
Y N
Y
E[ Xi ] = E[E[ Xi | N = n]]
i=1 i=1

X 1
= (1 − p)n−1 p
λn
n=1

p X
= (1 − p)n−1
λn
n=1

p X 1 − p n−1
 
=
λ λ
n=1
∞ 
1 − p n−1
  
p X 1−p
= 1−
λ(1 − 1−p
λ ) n=1
λ λ
p 1−p
= summation of pmf of GEO( )
λ(1 − 1−p
λ )
λ
p
=
λ−1+p
1−p
provided that λ < 1 or 1 − p < λ.


2. Condition on N = n we have
1 10
E[XN | N = n] = E[Xn | N = n] = E[Xn ] = 10 · n
= n
2 2
From the law of total expectations

E[XN ] = E[E[XN | N = n]]



X 10 e−λ λn
=
2n n!
n=0
∞ −λ/2 λ n

−λ/2
X e 2
= 10e
n!
n=0
= 10e−λ/2 summation of pmf of P OI(λ/2)

10.3 Conditioning on wins and losses


A, B, C are evenly matched tennis players. Initially A and B play a set, and the winner plays C. The winner of
each set continues playing the waiting player until one player wins two sets in a row. What is the probability that
A is the overall winner?

Solution. Key idea: we condition on wins and losses each time until we can find some sort of recurrent relationship,
eliminating trivial cases along the way.
Let A denote the event that A is the overall winner, and Wi , Li denote that A wins or loses game i, respectively.
Then we have
1 1
P (A) = P (W1 )P (A | W1 ) + P (L1 )P (A | L1 ) = P (A | W1 ) + P (A | L1 )
2 2
We can then continue conditioning on subsequent games and their possible outcomes
1 1
P (A | W1 ) = P (A | W1 , W2 ) + P (A | W1 , L2 )
2 2
1 1
= (1) + [P (A | W1 , L2 , C wins)
2 2
1
+ P (A | W1 , L2 , C loses)]
2
1 1
= + P (A | W1 , L2 , C loses) P (A | W1 , L2 , C wins) = 0
2 4
since C wins twice in a row
1 1 1
= + [ P (A | W1 , L2 , C loses, W4 )
2 4 2
1
+ P (A | W1 , L2 , C loses, L4 )]
2
1 1
= + P (A | W1 , L2 , C loses, W4 ) P (A | W1 , L2 , C loses, L4 ) = 0
2 8
since B wins twice in a row
1 1
= + P (A | W1 ) since the probability is the same as
2 8
A winning its second game after a win


Solving this recurrence we get P(A | W1) = 8/14. Similarly

1 1
P (A | L1 ) = P (A | L1 , B wins) + P (A | L1 , B loses)
2 2
1 1 1
= [ P (A | L1 , B loses, W3 ) + P (A | L1 , B loses, L3 )] P (A | L1 , B wins) = 0
2 2 2
since B wins twice in a row
1
= P (A | L1 , B loses, W3 ) P (A | L1 , B loses, L3 ) = 0
4
since C wins twice in a row
1
= P (A | W1 )
4
So P(A | L1) = 2/14.
Plugging this into our initial equation we get P(A) = 5/14.

11 February 1, 2018
11.1 Example 3.1 solution
A particle moves along the state space {0, 1, 2} according to a DTMC whose TPM is given by
 
0.7 0.2 0.1
P =  0 0.6 0.4
0.5 0 0.5

where Pij is the transition probability P (Xn = j | Xn−1 = i).


Let Xn denote the position of the particle after the nth move. Suppose the particle is equally likely to start in any of the
three states.

1. Calculate P (X3 = 1 | X0 = 0).

2. Calculate P (X4 = 2).

3. Calculate P (X6 = 0, X4 = 2).


(3)
Solution. 1. We wish to determine P0,1 . To get this, we proceed to calculate P (3) = P 3 . So we have
  
0.54 0.26 0.2 0.7 0.2 0.1
3 2
P = (P )P = 0.2
 0.36 0.44  0 0.6 0.4 (11.1)
0.6 0.1 0.3 0.5 0 0.5
 
0.478 0.264 0.258
= 0.36
 0.256 0.384 (11.2)
0.57 0.18 0.25

(3)
So P (X3 = 1 | X0 = 0) = P0,1 = 0.264.


2. We wish to find α4,2 = P (X4 = 2). So

α4 = (α4,0 , α4,1 , α4,2 )


= α0 P (4)
1 1 1
= ( , , )P 3 P
3 3 3   
0.478 0.264 0.258 0.7 0.2 0.1
1 1 1
= ( , , )  0.36 0.256 0.384  0 0.6 0.4
3 3 3
0.57 0.18 0.25 0.5 0 0.5
 
0.4636 0.254 0.2824
1 1 1
= ( , , )  0.444 0.2256 0.3304
3 3 3
0.524 0.222 0.254
= (0.4772, 0.233867, 0.288933)

So we have P (X4 = 2) = 0.288933.

3. We wish to calculate P (X6 = 0, X4 = 2), which is

P(X6 = 0, X4 = 2) = P(X4 = 2) P(X6 = 0 | X4 = 2)
                  = (0.288933) P(X2 = 0 | X0 = 2)    (by the stationarity assumption)
                  = (0.288933) P^{(2)}_{2,0}
                  = (0.288933)(0.6)
                  = 0.1733598
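The matrix computations above are easy to reproduce numerically; a hedged numpy sketch (my addition) follows.

import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5]])
alpha0 = np.full(3, 1 / 3)                # uniform initial distribution

P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
P4 = np.linalg.matrix_power(P, 4)

print(P3[0, 1])                           # P(X3 = 1 | X0 = 0) ≈ 0.264
alpha4 = alpha0 @ P4
print(alpha4[2])                          # P(X4 = 2) ≈ 0.2889
print(alpha4[2] * P2[2, 0])               # P(X6 = 0, X4 = 2) ≈ 0.1734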

Continued: what are the equivalence classes of the DTMC?

Solution. Remember we have the TPM  


0.7 0.2 0.1
P =  0 0.6 0.4
0.5 0 0.5
To answer questions of this nature, it is useful to draw a state transition diagram.

We see that all states communicate with each other (there is some path from state i to j and vice versa). There is
only one equivalence class, namely {0, 1, 2}. This is an irreducible DTMC.


11.2 Example 3.2 solution


Consider a DTMC with TPM  
0 1 0 0
0 0 1 0
P =
0

0 0 1
0.5 0 0.5 0
What are the equivalence classes of this DTMC?

Solution. Using a state diagram we have

From the diagram, there is only one equivalence class {0, 1, 2, 3}. This DTMC is irreducible.

11.3 Example 3.3 solution


Consider a DTMC with TPM
\[
P = \begin{pmatrix}
1/3 & 2/3 & 0 & 0 \\
1/2 & 1/4 & 1/8 & 1/8 \\
0 & 0 & 1 & 0 \\
3/4 & 1/4 & 0 & 0
\end{pmatrix}
\]
What are the equivalence classes of this DTMC?

Solution. From the state diagram there are two equivalence classes: {2} and {0, 1, 3}. Thus this DTMC is not
irreducible.

12 Tutorial 4
12.1 Law of total expectations with indicator variables
Suppose the number of people who get on the ground floor of an elevator follows P OI(λ). If there are m floors
above the ground floor and if each person is equally likely to get off at each of the m floors, independent of where
the others get off, calculate the expected number of stops the elevator will make before discharging all passengers.


Solution. Let Xi indicate whether or not someone gets off at floor i, that is
\[
X_i = \begin{cases} 0 & \text{if no one gets off at floor } i \\ 1 & \text{if someone gets off at floor } i \end{cases}
\]
It is easier to think of the case where no one gets off at floor i. Given N = n passengers, each independently avoids floor i with probability 1 − 1/m, so
\[
P(X_i = 0 \mid N = n) = \left(1 - \frac{1}{m}\right)^{n}
\]
Since Xi is Bernoulli, we have
\[
E[X_i \mid N = n] = 1 - \left(1 - \frac{1}{m}\right)^{n}
\]
Let X = X_1 + · · · + X_m denote the total number of stops for the elevator. Thus we have
\[
E[X] = E[E[X \mid N]] = \sum_{n=0}^{\infty} E\!\left[\sum_{i=1}^{m} X_i \,\middle|\, N = n\right] p_N(n)
     = \sum_{n=0}^{\infty} m\left(1 - \left(1 - \tfrac{1}{m}\right)^{n}\right) \cdot \frac{e^{-\lambda}\lambda^{n}}{n!}
\]
\[
     = m\left[\sum_{n=0}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!} - \sum_{n=0}^{\infty} \frac{e^{-\lambda}\left((1 - \tfrac{1}{m})\lambda\right)^{n}}{n!}\right]
     = m\left[1 - e^{-\lambda/m} \sum_{n=0}^{\infty} \frac{e^{-(1 - \frac{1}{m})\lambda}\left((1 - \tfrac{1}{m})\lambda\right)^{n}}{n!}\right]
     = m\left(1 - e^{-\lambda/m}\right)
\]
where the last two equalities follow from recognizing each sum as the total mass of a Poisson pmf.
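A simulation sketch of the elevator problem (my addition, assuming numpy): the average number of distinct floors visited should be close to m(1 − e^{−λ/m}); the values of λ and m below are arbitrary.

import numpy as np

rng = np.random.default_rng(11)
lam, m, reps = 10.0, 6, 100_000

stops = np.empty(reps)
for r in range(reps):
    n = rng.poisson(lam)                        # number of passengers
    floors = rng.integers(1, m + 1, size=n)     # each picks a floor uniformly
    stops[r] = len(np.unique(floors))           # distinct floors = number of stops

print(stops.mean(), m * (1 - np.exp(-lam / m)))   # both ≈ 4.87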

12.2 Discrete time Markov chain urn example


Three white and three black balls are distributed in two urns in such a way that each contains three balls. We say
that the system is in state i, i = 0, 1, 2, 3, if the first urn contains i white balls. At each step, we draw one ball from
each urn and place the ball drawn from the first urn into the second, and conversely with the ball from the second
urn. Let Xn denote the state of the system after the nth step. Explain why {Xn , n = 0, 1, 2, . . .} is a Markov chain
and calculate its transition probability matrix.

Solution. Since the determination of the number of white balls in the first urn in the nth step is only dependent
on the same information in the (n − 1)th step, the process {Xn , n = 0, 1, 2, . . .} satisfies the Markov property
assumption, and hence is a Markov chain.
Let Urn 1 contain i white balls and 3 − i black balls (and Urn 2 has 3 − i white balls and i black balls). There are 9
possible ways to choose a ball from each so we do a case analysis on what color ball is chosen from each

Urn 1 Urn 2 Xn Xn+1


W B i i−1 with probability i2 /9
W W i i with probability i(3 − i)/9
B B i i with probability (3 − i)i/9
B W i i+1 with probability (3 − i)2 /9


So for i = 0, 1, 2, 3 we have
\[
P_{i,i-1} = \frac{i^2}{9}, \qquad P_{i,i} = \frac{2i(3-i)}{9}, \qquad P_{i,i+1} = \frac{(3-i)^2}{9}
\]
thus we have the TPM
\[
P = \begin{pmatrix}
0 & 1 & 0 & 0 \\
1/9 & 4/9 & 4/9 & 0 \\
0 & 4/9 & 4/9 & 1/9 \\
0 & 0 & 1 & 0
\end{pmatrix}
\]

12.3 Discrete time Markov chain weather example


On a given day, the weather is either clear (C), overcast (O), or raining (R). If the weather is clear today, then it
will be C, O, or R tomorrow with respective probabilities 0.6, 0.3, 0.1. If the weather is overcast today, then it will
be C, O, or R tomorrow with probabilities 0.2, 0.5, 0.3. If the weather is raining today, then it will be C, O, or R
tomorrow with probabilities 0.4, 0.2, 0.4. Construct the one-step transition probability matrix and use it to find α1 ,
P (2) , and the probability that it rains on both the first and third days. Assume that the initial probability row
vector is given by a0 = (0.5, 0.3, 0.2).

Solution. Let state 0, 1, 2 be the states for clear, overcast, and raining. Then we can easily construct the TPM
from the given probabilities of transitioning to the other states given an initial state
 
0.6 0.3 0.1
P = 0.2 0.5 0.3
0.4 0.2 0.4

To find α1 , we simply take α0 P or


 
0.6 0.3 0.1
α1 = (0.5, 0.3, 0.2) 0.2 0.5 0.3 = (0.44, 0.34, 0.22)
0.4 0.2 0.4

To find P (2) , we take P P or


    
0.6 0.3 0.1 0.6 0.3 0.1 0.46 0.35 0.19
0.2 0.5 0.3 0.2 0.5 0.3 = 0.34 0.37 0.29
0.4 0.2 0.4 0.4 0.2 0.4 0.44 0.30 0.26

We know the probability it rains on the first day α1,2 or 0.22.

Remark 12.1. We can’t just take α3,2 as the probability it rains on the third day then apply the multiplication
rule since we have some prior that the first day also rains. So we must use conditional probability here.

Thus we have

P (X3 = 2, X1 = 2) = P (X3 = 2 | X1 = 2)P (X1 = 2)


= P (X2 = 2 | X0 = 2)P (X1 = 2) stationary assumption
(2)
= P2,2 α1,2 = (0.26)(0.22) = 0.0572

(2)
Where P (X2 = 2 | X0 = 2) = P2,2 since we already know day 0 has state 2.


13 February 6, 2018
13.1 Example 3.4 solution
Consider the DTMC with TPM
\[
P = \begin{pmatrix}
1/3 & 0 & 0 & 2/3 \\
1/2 & 1/4 & 1/8 & 1/8 \\
0 & 0 & 1 & 0 \\
3/4 & 0 & 0 & 1/4
\end{pmatrix}
\]

which has equivalence classes {0, 3}, {1}, and {2}. Determine the period of each state.

Solution. Consider the state 0. There are many paths which we can go from state 0 to 0 in n steps, but one
obvious one is simply going from 0 to 0 n times which has probability (P0,0 )n . Therefore
(n)
P0,0 ≥ (P0,0 )n = (1/3)n > 0 ∀n ∈ Z+

(n)
Thus d(0) = gcd{n ∈ Z+ | P0,0 > 0} = gcd{1, 2, 3, . . .} = 1 (there is a way to get from 0 to 0 in any n ∈ Z+ steps,
so we take the gcd of Z+ which is 1).
In fact, since every term on the main diagonal of P is positive, the same argument holds for every state. Thus
d(1) = d(2) = d(3) = 1.

13.2 Example 3.2 (continued) solution


Recall for the DTMC with TPM  
0 1 0 0
0 0 1 0
P =
0

0 0 1
0.5 0 0.5 0
there is 1 equivalence class {0, 1, 2, 3} with state diagram

(note the bi-direction between 2 and 3). Determine the period for each state.

Solution. We see that in one of the loops 0 → 1 → 2 → 3 → 0 we have


(n)
P0,0 > 0 for n = 4, 8, 12, 16, . . .

Also we have (for the cycle 0 → 1 → 2 → 3 → 2 → 3 → 0)


(n)
P0,0 > 0 for n = 6, 10, 14, 18, . . .

Thus d(0) = gcd{4, 6, 8, 10, 12} = 2.


Following a similar line of logic, we find

d(1) = gcd{4, 6, 8, 10, 12, . . .} = 2


d(2) = gcd{2, 4, 6, 8, 10, 12, . . .} = 2
d(3) = gcd{2, 4, 6, 8, 10, 12, . . .} = 2

13.3 Example 3.5 solution


Consider the DTMC with TPM  
1/2 1/2 0 0
2/3 1/3 0 0
P =
 0

0 0 1
0 0 1 0
Find the equivalence class of this DTMC and determine the period of each state.
Solution. To determine the equivalence class we draw the state diagram

Clearly the equivalence classes are {0, 1} and {2, 3}.


As in Example 3.4, the main diagonal terms for rows 0 and 1 are positive (i.e. P0,0 , P1,1 > 0) and so d(0) = d(1) = 1.
For states 2 and 3 the DTMC will continually alternate (with probability 1) between each other at every step (i.e.
2 → 3 → 2 → 3 → . . .). Thus it is clear that
(n)
d(2) = gcd{n ∈ Z+ | P2,2 > 0} = gcd{2, 4, 6, 8, . . .} = 2
(n)
d(3) = gcd{n ∈ Z+ | P3,3 > 0} = gcd{2, 4, 6, 8, . . .} = 2

13.4 Theorem 3.1 (equivalent states have equivalent periods)


Theorem 13.1. If i ↔ j (they communicate), then d(i) = d(j).
(n) (m)
Proof. Assume d(i) 6= d(j). Since i ↔ j, we know by definition that Pi,j > 0 for some n ∈ Z+ and Pj,i > 0 for
some m ∈ Z+ . Moreover since state i is accessible from state j and state j is accessible from state i, ∃s ∈ Z+ such
(s)
that Pj,j > 0.
Clearly we have that
(n+m) (n) (m)
Pi,i ≥ Pi,j · Pj,i > 0
(paths that take n steps to i to j then m steps to j to i is one such possible path from i to i in n + m steps. There
could be more n + m paths hence the ≥. This also follows from the Chapman-Kolmogorov equations.)
In addition,
(n+s+m) (n) (s) (m)
Pi,i ≥ Pi,j · Pj,j · Pj,i > 0
So we have paths with n + m and n + s + m steps, thus d(i) divides both n + m and n + s + m. Therefore it follows
that d(i) divides their difference, namely (n + s + m) − (n + m) = s. Since this holds true for any s which satisfies
(s)
Pj,j > 0, then it must be the case that d(i) divides d(j).
Using the same line of logic, it is straightforward to show d(j) divides d(i).


Putting these two arguments together, we deduce that d(i) = d(j).

13.5 Example 3.6 solution


Consider the DTMC with TPM  
0 1/2 1/2
P = 1/2 0 1/2
1/2 1/2 0
Find the equivalence classes and determine the period of the states.
Solution. The state transition diagram looks like

Clearly the DTMC is irreducible (i.e. there is just one class {0, 1, 2}).
Note that
(1)
P0,0 = 0
 2
(2) 1 1
P0,0 ≥ P0,1 P1,0 = = >0
2 4
 3
(3) 1 1
P0,0 ≥ P0,1 P1,2 P2,0 = = >0
2 8

Clearly d(0) = gcd{2, 3, . . .} = 1.


By the above theorem, we have d(0) = d(1) = d(2) = 1.

14 February 8, 2018
14.1 Theorem 3.2 (communication and recurrent state i implies recurrent state j)
Theorem 14.1. If i ↔ j (communicate) and state i is recurrent, then state j is recurrent.
Proof. Since i ↔ j, ∃m, n ∈ Z+ such that

P_{j,i}^{(m)} > 0   and   P_{i,j}^{(n)} > 0

Also since state i is recurrent, then we have



Σ_{l=1}^∞ P_{i,i}^{(l)} = ∞


Suppose that s ∈ Z+. Note that

P_{j,j}^{(n+s+m)} ≥ P_{j,i}^{(m)} · P_{i,i}^{(s)} · P_{i,j}^{(n)}

Now, to show that j is recurrent, we show that the following series diverges:

Σ_{k=1}^∞ P_{j,j}^{(k)} ≥ Σ_{k=n+m+1}^∞ P_{j,j}^{(k)}
                      = Σ_{s=1}^∞ P_{j,j}^{(n+s+m)}
                      ≥ Σ_{s=1}^∞ P_{j,i}^{(m)} · P_{i,i}^{(s)} · P_{i,j}^{(n)}
                      = P_{j,i}^{(m)} P_{i,j}^{(n)} Σ_{s=1}^∞ P_{i,i}^{(s)}
                      = ∞

since P_{j,i}^{(m)}, P_{i,j}^{(n)} > 0 and the series Σ_{s=1}^∞ P_{i,i}^{(s)} diverges by our premise. Therefore j is recurrent.

Remark 14.1. A by-product of the above theorem is that if i ↔ j and state i is transient, then state j is transient.

14.2 Theorem 3.3 (communication and recurrent state i implies mutual recurrence among
all states)
Theorem 14.2. If i ↔ j and state i is recurrent, then

fi,j = P (DTMC ever makes a future visit to state j | X0 = i) = 1

Proof. Clearly the result is true if i = j. Therefore suppose that i ≠ j. Since i ↔ j, the fact that state i is recurrent implies that state j is recurrent by the previous theorem, and f_{j,j} = 1.
To prove f_{i,j} = 1, suppose that f_{i,j} < 1 and try to get a contradiction.
Since state i is accessible from state j, ∃n ∈ Z+ such that P_{j,i}^{(n)} > 0; i.e. each time the DTMC visits state j, there is the possibility of being in state i n time units later, with probability P_{j,i}^{(n)} > 0.
If f_{i,j} < 1, then once the DTMC is in state i there is probability 1 − f_{i,j} > 0 of never visiting state j again. Therefore

1 − f_{j,j} = P(DTMC never makes a future visit to state j | X0 = j)
           ≥ P_{j,i}^{(n)} · (1 − f_{i,j})
           > 0          (both factors are > 0)

This implies that 1 − f_{j,j} > 0, or f_{j,j} < 1, which is a contradiction. Therefore f_{i,j} = 1.

14.3 Theorem 3.4 (finite-state DTMCs have at least one recurrent state)
Theorem 14.3. A finite-state DTMC has at least one recurrent state.

Proof. Equivalently, we want to show that not all states can be transient.
Suppose that {0, 1, 2, . . . , N } represents the states of the DTMC where N < ∞ (finite).


To prove that not all states can be transient, we suppose they are all transient and try to get a contradiction.
Now for each i = 0, 1, . . . , N, if state i is assumed to be transient, we know that after a finite amount of time (denoted by T_i), state i will never be visited again. As a result, after the finite time T = max{T_0, T_1, . . . , T_N} has elapsed, none of the states will ever be visited again.
However, the DTMC must be in some state after time T, but we have exhausted all states for the DTMC to be in. This is a contradiction; thus not all states can be transient in a finite-state DTMC.

15 Tutorial 5
15.1 Determining diagram, equivalence classes, period, and transience/recurrence of DTMC
For the following Markov chain, draw its state transition diagram, determine its equivalence classes, and the periods
of states within each class, and determine whether they are transient or recurrent.

P =
[  0    1    0    0    0  ]
[  0   2/3  1/3   0    0  ]
[  0   1/5  4/5   0    0  ]
[  0    0    0   3/4  1/4 ]
[  0    0    0    0    1  ]

Solution. The state diagram for the DTMC is

From the diagram we see the only closed loop path containing more than one state is 1 → 2 → 1, thus we have the
equivalence classes {0}, {1, 2}, {3}, {4}.
Clearly 0 → 1 with probability 1 and state 0 can never be revisited, so state 0 is transient and its period is taken to be ∞ (the set {n ∈ Z+ | P_{0,0}^{(n)} > 0} is empty).
Notice for states 1, 2, 3, 4 the main diagonal entries Pi,i > 0 so their periods are 1.


For transience/recurrence, note that

Σ_{n=1}^∞ P_{0,0}^{(n)} = Σ_{n=1}^∞ 0 = 0 < ∞  ⇒  {0} is transient
Σ_{n=1}^∞ P_{3,3}^{(n)} = Σ_{n=1}^∞ (3/4)^n = (3/4)/(1 − 3/4) = 3 < ∞  ⇒  {3} is transient
Σ_{n=1}^∞ P_{4,4}^{(n)} = Σ_{n=1}^∞ 1 = ∞  ⇒  {4} is recurrent

For class {1, 2}, consider the values of f_{1,1}^{(n)}:

f_{1,1}^{(1)} = P(X1 = 1 | X0 = 1) = P_{1,1} = 2/3
f_{1,1}^{(2)} = P(X2 = 1, X1 ≠ 1 | X0 = 1) = P(X2 = 1, X1 = 2 | X0 = 1) = P_{1,2} P_{2,1} = (1/3)(1/5)
f_{1,1}^{(3)} = P(X3 = 1, X2 ≠ 1, X1 ≠ 1 | X0 = 1) = P_{1,2} P_{2,2} P_{2,1} = (1/3)(4/5)(1/5)
f_{1,1}^{(4)} = P(X4 = 1, X3 = 2, X2 = 2, X1 = 2 | X0 = 1) = P_{1,2} (P_{2,2})^2 P_{2,1} = (1/3)(4/5)^2(1/5)

In general, for n ≥ 2,

f_{1,1}^{(n)} = P(Xn = 1, Xn−1 = 2, . . . , X1 = 2 | X0 = 1) = P_{1,2} (P_{2,2})^{n−2} P_{2,1} = (1/3)(4/5)^{n−2}(1/5)

So we have

f_{1,1} = Σ_{n=1}^∞ f_{1,1}^{(n)}
        = 2/3 + (1/3)(1/5) Σ_{n=2}^∞ (4/5)^{n−2}
        = 2/3 + (1/15) Σ_{n=0}^∞ (4/5)^n
        = 2/3 + (1/15) · 1/(1 − 4/5)
        = 2/3 + 1/3
        = 1

which by definition implies class {1, 2} is recurrent.

15.2 Discrete time Markov chain consecutive successes example


Let k ∈ Z+ and consider an experiment in which independent trials, each having success probability p ∈ (0, 1),
are conducted indefinitely. We say that the system is in state i at time n if after the nth trial we have observed i
consecutive successes occurring at times n − i + 1, n − i + 2, . . . , n − 1, n where i = 1, 2, . . . , k − 1. In addition, we


say that the system is in state k at time n if we have ever observed at least k consecutive successes occur at least
once by time n. Finally, if we have not observed k consecutive successes by time n, and the nth trial was a failure,
then we say that the system is in state 0. Letting Xn denote the state of the system after the nth trial, calculate its
transition probability matrix.

Solution. Note that in any state i = 0, 1, 2, . . . , k − 1 we always have probability 1 − p of failing on the next trial and going back to state 0.
To transition from state i to state i + 1 (one more consecutive success), where i = 0, 1, . . . , k − 1, we must get a success on the next trial, which happens with probability p.
Finally, once we reach state k (we have seen k consecutive successes at some point), any further successes or failures will not change the system's state, so P_{k,k} = 1.
Thus we have (rows and columns indexed 0, 1, . . . , k)

P =
[ 1−p   p   0  . . .  0   0   0 ]
[ 1−p   0   p  . . .  0   0   0 ]
[ 1−p   0   0  . . .  0   0   0 ]
[  ⋮                            ]
[ 1−p   0   0  . . .  0   p   0 ]
[ 1−p   0   0  . . .  0   0   p ]
[  0    0   0  . . .  0   0   1 ]
The corresponding state diagram of this DTMC is

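As a quick sanity check (not part of the original notes), here is a short Python sketch that builds this (k+1) × (k+1) TPM and verifies each row sums to 1; the parameter values k = 4 and p = 0.3 are arbitrary illustrations.

import numpy as np

def consecutive_success_tpm(k, p):
    """(k+1) x (k+1) one-step TPM for the consecutive-successes chain."""
    P = np.zeros((k + 1, k + 1))
    for i in range(k):
        P[i, 0] = 1 - p      # a failure resets the run of successes
        P[i, i + 1] = p      # a success extends the run by one
    P[k, k] = 1.0            # state k is absorbing
    return P

P = consecutive_success_tpm(k=4, p=0.3)
assert np.allclose(P.sum(axis=1), 1.0)
print(P)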
15.3 Limiting behaviour of discrete Markov chains


For a DTMC {Xn, n ∈ N} suppose we are given initial probability row vector α0 = (1, 0) and TPM

P =
[ 1−p   p ]
[  0    1 ]

where 0 < p < 1. Find P^{(n)} and αn.

Solution. We know the n-step TPM is simply the one-step TPM multiplied by itself n times, i.e. P^{(n)} = P^n.


Note that, for all n ∈ Z+,

P_{0,0}^{(n)} = Σ_{k=0}^{1} P_{0,k}^{(n−1)} P_{k,0} = P_{0,0}^{(n−1)} P_{0,0} + P_{0,1}^{(n−1)} P_{1,0} = P_{0,0}^{(n−1)} (1 − p)          (since P_{1,0} = 0)
            = P_{0,0}^{(n−2)} (1 − p)^2 = . . . = P_{0,0}^{(1)} (1 − p)^{n−1} = (1 − p)^n

P_{1,0}^{(n)} = Σ_{k=0}^{1} P_{1,k}^{(n−1)} P_{k,0} = P_{1,0}^{(n−1)} P_{0,0} + P_{1,1}^{(n−1)} P_{1,0} = P_{1,0}^{(n−1)} (1 − p)
            = P_{1,0}^{(n−2)} (1 − p)^2 = . . . = P_{1,0}^{(1)} (1 − p)^{n−1} = 0 · (1 − p)^{n−1} = 0

P_{1,1}^{(n)} = Σ_{k=0}^{1} P_{1,k}^{(n−1)} P_{k,1} = P_{1,0}^{(n−1)} P_{0,1} + P_{1,1}^{(n−1)} P_{1,1} = P_{1,1}^{(n−1)} · 1 = . . . = P_{1,1}^{(1)} = 1

Using the fact that each row sum of a TPM must equal 1, we have

P_{0,1}^{(n)} = 1 − P_{0,0}^{(n)} = 1 − (1 − p)^n,   n ∈ N

So we have

P^{(n)} =
[ (1−p)^n   1 − (1−p)^n ]
[    0            1     ]          n ∈ N

and by definition

αn = α0 P^{(n)} = (1, 0) P^{(n)} = ((1 − p)^n, 1 − (1 − p)^n)
0 1
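As an optional numerical check (a sketch; the values p = 0.3 and n = 5 are arbitrary), we can compare the closed form for P^{(n)} against a direct matrix power in Python:

import numpy as np

p, n = 0.3, 5
P = np.array([[1 - p, p],
              [0.0,   1.0]])
Pn = np.linalg.matrix_power(P, n)               # n-step TPM by repeated multiplication
closed_form = np.array([[(1 - p)**n, 1 - (1 - p)**n],
                        [0.0,        1.0]])
assert np.allclose(Pn, closed_form)
alpha_n = np.array([1.0, 0.0]) @ Pn             # distribution of X_n given X_0 = 0
print(Pn, alpha_n)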

16 February 13, 2018


16.1 Theorem 3.5 (recurrent i and not communicate with j implies Pi,j = 0)
Theorem 16.1. If state i is recurrent and state i does not communicate with state j, then Pi,j = 0.

Proof. Let us assume that Pi,j > 0 (one-step transition probability) and try to get a contradiction.
Then P_{j,i}^{(n)} = 0 ∀n ∈ Z+, otherwise states i and j would communicate.
However, the DTMC, starting in state i, would have a positive probability of at least Pi,j of never returning to
state i. This contradicts the recurrence of state i, therefore we must have Pi,j = 0.

16.2 Example 3.3 (continued) solution


Recall our earlier DTMC with TPM

P =
[ 1/3  2/3   0    0  ]
[ 1/2  1/4  1/8  1/8 ]
[  0    0    1    0  ]
[ 3/4  1/4   0    0  ]

Determine whether each state is transient or recurrent.

Solution. Note that Σ_{n=1}^∞ P_{2,2}^{(n)} = Σ_{n=1}^∞ 1 = ∞, which implies state 2 is recurrent.


Looking at the possible transitions that can take place among states 0, 1, and 3 we strongly suspect state 1 to be
transient (since there is a positive probability of never returning to state 1 if a transition to state 2 occurs). To
show this formally assume state 1 is recurrent and try to get a contradiction.
So if state 1 is recurrent, note that state 1 does not communicate with state 2. By Theorem 3.5, we would have P_{1,2} = 0, but in fact P_{1,2} = 1/8 > 0. This is a contradiction, so state 1 must be transient and hence the class {0, 1, 3} is transient.

16.3 Example 3.7 solution


Consider a DTMC with TPM

P =
[ 1/4   0   3/4   0  ]
[  0   1/3   0   2/3 ]
[  0    1    0    0  ]
[  0   2/5   0   3/5 ]
Determine whether each state is transient or recurrent.

Solution. The equivalence classes are {0}, {2}, {1, 3}.


Note that (since state 0 is in its own equivalence class)

Σ_{n=1}^∞ P_{0,0}^{(n)} = Σ_{n=1}^∞ (1/4)^n = (1/4)/(1 − 1/4) = 1/3 < ∞

so state 0 is transient.
Furthermore (since state 2 is in its own equivalence class)

Σ_{n=1}^∞ P_{2,2}^{(n)} = Σ_{n=1}^∞ 0 = 0 < ∞

so state 2 is also transient.

On the other hand, concerning the class {1, 3}, we observe

f_{1,1}^{(1)} = P_{1,1} = 1/3

and, for n ≥ 2,

f_{1,1}^{(n)} = P_{1,3} (P_{3,3})^{n−2} P_{3,1} = (2/3)(3/5)^{n−2}(2/5)
n=2


Now

f_{1,1} = Σ_{n=1}^∞ f_{1,1}^{(n)} = 1/3 + Σ_{n=2}^∞ (2/3)(3/5)^{n−2}(2/5)
        = 1/3 + (2/3)(2/5) Σ_{n=2}^∞ (3/5)^{n−2}
        = 1/3 + (2/3)(2/5) · 1/(1 − 3/5)
        = 1

By definition since f1,1 = 1 state 1 is recurrent and so class {1, 3} is recurrent.


(Or we could have concluded from Theorem 3.4 that because {0} and {2} are both transient, {1, 3} must be
recurrent).

16.4 Example 3.8 solution


Consider a DTMC {Xn , n ∈ N} whose state space is all integers i.e. Z = {0, ±1, ±2, . . .}. Suppose the TPM for the
DTMC satisfies
Pi,i−1 = 1 − p and Pi,i+1 = p ∀i ∈ Z and 0 < p < 1
In other words from any state, we can either jump up or down one state. This is often referred to as the Random Walk.
Characterize the behaviour of this DTMC in terms of its equivalence classes, periodicity, and transience/recurrence.

Solution. First of all since 0 < p < 1 all states clearly communicate with each other. This implies that {Xn , n ∈ N}
is an irreducible DTMC.
Hence we can determine its periodicity (and likewise its transience/recurrence) by analyzing any state we wish.
Let us select state 0. Starting from state 0, note that state 0 cannot possibly be visited in an odd number of transitions, since returning to 0 requires an equal number of up jumps and down jumps, which forces the number of transitions to be even.
Thus P_{0,0}^{(1)} = P_{0,0}^{(3)} = P_{0,0}^{(5)} = . . . = 0, or equivalently

P_{0,0}^{(2n−1)} = 0   ∀n ∈ Z+

However, since it is clearly possible to return to state 0 in any even number of transitions, it immediately follows that P_{0,0}^{(2n)} > 0 ∀n ∈ Z+.
Hence d(0) = gcd{n ∈ Z+ | P_{0,0}^{(n)} > 0} = gcd{2, 4, 6, . . .} = 2.
Finally, to determine whether state 0 is transient or recurrent, let us consider

Σ_{n=1}^∞ P_{0,0}^{(n)} = Σ_{n=1}^∞ P_{0,0}^{(2n)} = Σ_{n=1}^∞ \binom{2n}{n} p^n (1 − p)^n


That is, to return to 0 in 2n steps we need to take n up steps (each with probability p) and n down steps (each with probability 1 − p); the binomial coefficient \binom{2n}{n} accounts for the number of ways to arrange them.
Recall the ratio test for series: suppose that Σ_{n=1}^∞ a_n is a series of positive terms and

L = lim_{n→∞} a_{n+1}/a_n

1. If L < 1 then the series converges.

2. If L > 1 then the series diverges.

3. If L = 1 then the test is inconclusive.

In our case we obtain

L = lim_{n→∞} [ (2(n+1))!/((n+1)!(n+1)!) · p^{n+1}(1−p)^{n+1} ] / [ (2n)!/(n!n!) · p^n(1−p)^n ]
  = lim_{n→∞} (2n+2)!/((n+1)!(n+1)!) · n!n!/(2n)! · p(1−p)
  = lim_{n→∞} (2n+2)(2n+1)/((n+1)(n+1)) · p(1−p)
  = lim_{n→∞} (4n^2 + 6n + 2)/(n^2 + 2n + 1) · p(1−p)
  = 4p(1−p)

A plot of L = 4p(1 − p) over p ∈ (0, 1) is a downward parabola that attains its maximum value of 1 at p = 1/2 and is strictly below 1 otherwise.

Note that if p ≠ 1/2 then L < 1, which means state 0 is transient (and so is the DTMC).
However, if p = 1/2 then L = 1 and the ratio test is inconclusive. We need another approach to handle p = 1/2 (the approach below in fact handles both p = 1/2 and p ≠ 1/2).
Recall that fi,j = P (DTMC ever makes a future visit to state j | X0 = i).
Letting q = 1 − p and conditioning on the state of the DTMC at time 1 we have

f0,0 = P (DTMC ever makes a future visit to state 0 | X0 = 0)


= P (X1 = −1 | X0 = 0) · P (DTMC ever makes a future visit to state 0 | X1 = −1, X0 = 0)
+ P (X1 = 1 | X0 = 0) · P (DTMC ever makes a future visit to state 0 | X1 = 1, X0 = 0)
by Markov property (only most recent state matters) and stationarity (we can treat time 1 as time 0)
= q · f−1,0 + p · f1,0

Using precisely the same logic it follows that

f1,0 = q · 1 + p · f2,0 (16.1)

Consider f2,0 : in order to visit state 0 from state 2, we first have to visit state 1 again, which happens with
probability f2,1 . This is equivalent to f1,0 , so f2,1 = f1,0 (probability of ever making a visit to the state one step
down).
Given you make a visit back to state 1, then we must make a visit back to state 0, which happens to with probability
f1,0 .
Putting this together we have

f_{2,0} = f_{2,1} · f_{1,0} = (f_{1,0})^2

Thus from equation 16.1 we have

f_{1,0} = q + p (f_{1,0})^2

which is a quadratic of the form

p (f_{1,0})^2 − f_{1,0} + q = 0


Applying the quadratic formula we get

f_{1,0} = (1 ± √(1 − 4pq)) / (2p)
        = (1 ± √((p + q)^2 − 4pq)) / (2p)
        = (1 ± √(p^2 − 2pq + q^2)) / (2p)
        = (1 ± |p − q|) / (2p)
There can only be one solution for f_{1,0}, which means that one of

r_1 = (1 + |p − q|)/(2p)   or   r_2 = (1 − |p − q|)/(2p)

must be inadmissible (intuitively: if p < 0.5 then r_1 is bigger than 1, which is not possible for a probability, so r_1 cannot work for all p).
To determine which it is, suppose that q > p. Then |p − q| = q − p and the two roots become

r_1 = (1 + (q − p))/(2p) = 2q/(2p) = q/p > 1
r_2 = (1 − (q − p))/(2p) = 2p/(2p) = 1
Thus we must have

f_{1,0} = (1 − |p − q|)/(2p)

Note: using the exact same approach,

f_{−1,0} = (1 − |p − q|)/(2q)

With knowledge of f_{1,0} and f_{−1,0} we find that

f_{0,0} = q f_{−1,0} + p f_{1,0}
        = q · (1 − |p − q|)/(2q) + p · (1 − |p − q|)/(2p)
        = 1 − |p − q|

Suppose p > q, i.e. p = 1 − q > q, or equivalently 2q < 1. Then

f_{0,0} = 1 − (p − q) = 1 − p + q = 2q < 1

Since f_{0,0} < 1, state 0 is transient.

Similarly, for p < q we get f_{0,0} = 2p < 1, so state 0 is again transient.

When p = q = 1/2, we have f_{0,0} = 1 − |p − q| = 1, so state 0 (and hence the whole DTMC) is recurrent.
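As an optional illustration (not part of the derivation), a small Monte Carlo sketch in Python can estimate f_{0,0} for the random walk and compare it to 1 − |p − q|; note that the simulation truncates each path at a finite horizon, so for p = 1/2 the estimate will sit slightly below the true value of 1.

import numpy as np

rng = np.random.default_rng(0)

def estimate_f00(p, n_paths=20000, max_steps=2000):
    """Monte Carlo estimate of P(random walk ever returns to 0 | X0 = 0)."""
    returns = 0
    for _ in range(n_paths):
        x = 0
        for _ in range(max_steps):
            x += 1 if rng.random() < p else -1
            if x == 0:
                returns += 1
                break
    return returns / n_paths

for p in (0.3, 0.5, 0.7):
    print(p, estimate_f00(p), 1 - abs(2 * p - 1))   # compare with f00 = 1 - |p - q|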

17 February 15, 2018


17.1 Example 3.9 solution
Consider the DTMC with TPM  
0 0 1
P = 0 1 0
1 0 0
Determine if limn→∞ P (n) exists (i.e. the limiting behavior of the DTMC).

Solution. There are 2 equivalence classes, namely {0, 2} and {1}.


Each class is recurrent with periods 2 and 1, respectively.
For n ∈ Z+ , note that  
1 0 0
P (2n) = 0 1 0
0 0 1
which follows by taking P (2) and multiplying by itself arbitrarily many times. Also
 
0 0 1
P (2n−1) = 0 1 0
1 0 0

which follows by taking P (1) and multiply it arbitrarily many times by P (2) (identity).
As such, lim_{n→∞} P^{(n)} does not exist, since P^{(n)} alternates between those two matrices.
However, while lim_{n→∞} P_{0,0}^{(n)} and lim_{n→∞} P_{0,2}^{(n)} (and many others) do not exist, note that some limits do exist, such as

lim_{n→∞} P_{0,1}^{(n)} = 0

and

lim_{n→∞} P_{1,1}^{(n)} = 1

17.2 Example 3.10 solution


Consider the DTMC with TPM

P =
[ 1/2  1/2   0  ]
[ 1/2  1/4  1/4 ]
[  0   1/3  2/3 ]

Determine if limn→∞ P (n) exists.

Solution. There is only one equivalence class and so the DTMC is irreducible. Also it is straightforward to show
that the DTMC is aperiodic and recurrent.
It can be shown that

lim_{n→∞} P^{(n)} =
[ 4/11  4/11  3/11 ]
[ 4/11  4/11  3/11 ]
[ 4/11  4/11  3/11 ]


(we can find this by using some software and repeatedly applying matrix exponentiation; we will later demonstrate
a way to solve this for certain DTMCs (see Example 3.10 (continued) solution (finding limiting probability using
BLT))).
Note that this matrix has identical rows. This implies that P_{i,j}^{(n)} converges, as n → ∞, to a value which is the same for all initial states i. In other words, there is a limiting probability that the DTMC will be in state j as n → ∞, and this probability is independent of the initial state.
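One quick way to see this numerically (a sketch, using a large power of P as a stand-in for the limit) is:

import numpy as np

P = np.array([[1/2, 1/2, 0],
              [1/2, 1/4, 1/4],
              [0,   1/3, 2/3]])
Pn = np.linalg.matrix_power(P, 100)    # large n stands in for the limit
print(Pn)                               # every row is approximately (4/11, 4/11, 3/11)
print(np.array([4/11, 4/11, 3/11]))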

18 Tutorial 6
18.1 Determining fi,i and recurrence using definition (infinite summation) from TPM
Consider a DTMC {Xn , n = 0, 1, 2, . . .} with

       0     1     2     3     4     5
P =
0  [ 1/4   1/2    0    1/4    0     0  ]
1  [ 1/5   1/2   3/10   0     0     0  ]
2  [ 2/5    0    3/5    0     0     0  ]
3  [  0     0     1     0     0     0  ]
4  [  0     0     0    1/2    0    1/2 ]
5  [  0     0     0    1/2    0    1/2 ]

It is easy to show there are three equivalence classes {0, 1, 2, 3}, {4}, and {5}. Calculate f0,0 to show that state 0 is
recurrent (and hence its equivalence class is recurrent).

Solution. Recall that

f_{i,i}^{(n)} = P(Xn = i, Xn−1 ≠ i, . . . , X2 ≠ i, X1 ≠ i | X0 = i)

As f_{0,0} = Σ_{n=1}^∞ f_{0,0}^{(n)}, we consider the possible paths starting from state 0 that require exactly n steps to return to state 0 for the first time:

n = 1 :  0 → 0
n = 2 :  0 → 1 → 0
n ≥ 3 :  0 → 1 → . . . → 1 → 0
         0 → 1 → . . . → 1 → 2 → . . . → 2 → 0
         0 → 3 → 2 → . . . → 2 → 0


Thus we can compute f_{0,0}^{(n)}:

f_{0,0}^{(1)} = P_{0,0} = 1/4
f_{0,0}^{(2)} = P_{0,1} P_{1,0} = (1/2)(1/5) = 1/10

and, for n ≥ 3,

f_{0,0}^{(n)} = P_{0,1} (P_{1,1})^{n−2} P_{1,0} + Σ_{k=0}^{n−3} P_{0,1} (P_{1,1})^k P_{1,2} (P_{2,2})^{n−3−k} P_{2,0} + P_{0,3} P_{3,2} (P_{2,2})^{n−3} P_{2,0}
            = (1/2)(1/2)^{n−2}(1/5) + (1/2)(3/10)(2/5) Σ_{k=0}^{n−3} (1/2)^k (3/5)^{n−3−k} + (1/4)(1)(3/5)^{n−3}(2/5)
            = (1/10)(1/2)^{n−2} + (3/50) Σ_{k=0}^{n−3} (1/2)^k (3/5)^{n−3−k} + (1/10)(3/5)^{n−3}

So we have

f_{0,0} = Σ_{n=1}^∞ f_{0,0}^{(n)}
        = 1/4 + 1/10 + Σ_{n=3}^∞ { (1/10)(1/2)^{n−2} + (3/50) Σ_{k=0}^{n−3} (1/2)^k (3/5)^{n−3−k} + (1/10)(3/5)^{n−3} }
        = 1/4 + 1/10 + (1/20) Σ_{m=0}^∞ (1/2)^m + (3/50) Σ_{m=0}^∞ Σ_{k=0}^{m} (1/2)^k (3/5)^{m−k} + (1/10) Σ_{m=0}^∞ (3/5)^m          (m = n − 3)
        = 1/4 + 1/10 + 1/10 + (3/50) Σ_{k=0}^∞ (1/2)^k Σ_{m=k}^∞ (3/5)^{m−k} + 1/4
        = 1/4 + 1/10 + 1/10 + (3/50) · 1/(1 − 3/5) · 1/(1 − 1/2) + 1/4
        = 9/20 + 3/10 + 1/4
        = 1

as desired.
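As an optional check (a sketch; the path count and the truncation horizon are arbitrary choices, and rows 4 and 5 of the TPM play no role in f_{0,0}), we can estimate the return probability by simulation:

import numpy as np

rng = np.random.default_rng(1)
P = np.array([
    [1/4, 1/2, 0,    1/4, 0,   0  ],
    [1/5, 1/2, 3/10, 0,   0,   0  ],
    [2/5, 0,   3/5,  0,   0,   0  ],
    [0,   0,   1,    0,   0,   0  ],
    [0,   0,   0,    1/2, 0,   1/2],
    [0,   0,   0,    1/2, 0,   1/2],
])

def estimate_return_prob(P, i=0, n_paths=20000, max_steps=500):
    """Estimate f_{i,i} = P(chain started in i ever returns to i)."""
    hits = 0
    for _ in range(n_paths):
        state = i
        for _ in range(max_steps):
            state = rng.choice(len(P), p=P[state])
            if state == i:
                hits += 1
                break
    return hits / n_paths

print(estimate_return_prob(P))   # should be close to 1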

18.2 Making random walk recurrent with a reflecting boundary


Consider the DTMC {Xn, n ∈ N} representing the random walk model, but now with a reflecting boundary at state 0, so that the state space contains only the non-negative integers. Specifically, we now assume the TPM for the DTMC satisfies

P_{i,i−1} = 1 − p   and   P_{i,i+1} = p   ∀i ∈ Z+

and P_{0,1} = 1, where 0 < p < 1. Determine the values of p that result in the DTMC being recurrent.


Solution. It is clear that the DTMC is irreducible, i.e. there is one equivalence class containing all the states, so we only need to focus on a single state, say state 0. Since state 0 transitions to state 1 with probability 1 (P_{0,1} = 1), conditioning on the first step gives f_{0,0} = f_{1,0}. Letting q = 1 − p and conditioning on the next transition from state 1, we can show (exactly as in class) that

f_{1,0} = q + p f_{2,0} = q + p (f_{1,0})^2

since f_{2,0} = f_{2,1} f_{1,0} = (f_{1,0})^2. Recall that we showed in class that the quadratic equation

p (f_{1,0})^2 − f_{1,0} + q = 0

has the admissible solution

f_{1,0} = (1 − |p − q|)/(2p)

We plug in values for p and q. When p ≤ q, |p − q| = q − p, so

f_{0,0} = f_{1,0} = (1 − (q − p))/(2p) = 2p/(2p) = 1

For p > q we have |p − q| = p − q, so

f_{0,0} = (1 − (p − q))/(2p) = 2q/(2p) = q/p < 1

Thus state 0 (and hence the entire DTMC) is recurrent if and only if p ≤ 1/2.

18.3 Determining the pmf of N_i (i.e. f_{i,i}^{(n)}) and the mean number of transitions between re-visits
Consider the DTMC with the TPM

       0      1
P =
0  [  α    1−α ]
1  [  β    1−β ]

and assume α, β ∈ (0, 1) so the DTMC is irreducible (thus both states are recurrent). Find the pmf of N = min{n ∈ Z+ | Xn = 0} conditional on X0 = 0, use it to show that f_{0,0} = 1, and find the mean number of transitions between successive visits to state 0.

Solution. By definition, the pmf of N | (X0 = 0) is given by P(N = n | X0 = 0) = f_{0,0}^{(n)} for n ∈ Z+. Thus

P(N = 1 | X0 = 0) = f_{0,0}^{(1)} = P_{0,0} = α
P(N = n | X0 = 0) = f_{0,0}^{(n)} = P_{0,1} (P_{1,1})^{n−2} P_{1,0} = (1 − α)(1 − β)^{n−2} β,   n ≥ 2


To find f_{0,0}, note that

f_{0,0} = Σ_{n=1}^∞ f_{0,0}^{(n)}
        = α + β(1 − α) Σ_{n=2}^∞ (1 − β)^{n−2}
        = α + β(1 − α) Σ_{m=0}^∞ (1 − β)^m
        = α + β(1 − α) · 1/(1 − (1 − β))
        = α + (1 − α)
        = 1

Also, the mean number of transitions m_0 between successive visits to state 0 is

m_0 = E[N | X0 = 0] = Σ_{n=1}^∞ n f_{0,0}^{(n)}
    = α + (1 − α)β Σ_{n=2}^∞ n(1 − β)^{n−2}
    = α + (1 − α)β Σ_{m=1}^∞ (m + 1)(1 − β)^{m−1}          (m = n − 1)
    = α + (1 − α)β · (1/β) ( Σ_{m=1}^∞ m(1 − β)^{m−1} β + Σ_{m=1}^∞ (1 − β)^{m−1} β )
    = α + (1 − α)(1/β + 1)          (mean and total probability of a GEO(β) pmf)
    = (1 − α + β)/β

Let α = 1/2 and β = 1/4. Evaluate E[N | X0 = 0] and find lim_{n→∞} P^{(n)}. What is the relationship between the mean number of transitions and the limiting probability of being in state 0?

Solution. Note that we get m_0 = (1 − 1/2 + 1/4)/(1/4) = 3. Furthermore, we have (e.g. from R, by taking large matrix powers)

lim_{n→∞} P^{(n)} =
[ 1/3  2/3 ]
[ 1/3  2/3 ]

This coincides with the Basic Limit Theorem (BLT) we proved in class for irreducible, aperiodic and positive recurrent DTMCs, where the limiting probability π_i of being in some state i is equal to 1/m_i (intuitively, since the mean time to get back to state 0 is 3 transitions, we are in some other state for the other two transitions; so the limiting probability of observing the DTMC in state 0 is equal to the long-run fraction of time spent in state 0).
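A quick numerical sketch of this relationship (a large matrix power standing in for the limit):

import numpy as np

alpha, beta = 1/2, 1/4
P = np.array([[alpha, 1 - alpha],
              [beta,  1 - beta]])
Pn = np.linalg.matrix_power(P, 200)     # stand-in for the limiting matrix
pi0 = Pn[0, 0]
print(Pn)                                # rows approach (1/3, 2/3)
print(1 / pi0)                           # approximately m0 = 3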


19 February 27, 2018


19.1 Example 3.11 solution
Consider the DTMC with TPM

P =
[  1    0    0  ]
[ 1/3  1/2  1/6 ]
[  0    0    1  ]

Examine lim_{n→∞} P^{(n)} and explain why the limiting probability of being in a state depends on the initial state of this DTMC.
Solution. We have three equivalence classes {0}, {1}, {2}. Since P_{i,i} > 0 for all i = 0, 1, 2, each state is aperiodic.

Σ_{n=1}^∞ P_{0,0}^{(n)} = Σ_{n=1}^∞ P_{2,2}^{(n)} = Σ_{n=1}^∞ 1 = ∞

which implies states 0 and 2 are recurrent.

We can also show that

Σ_{n=1}^∞ P_{1,1}^{(n)} = Σ_{n=1}^∞ (1/2)^n = (1/2) · 1/(1 − 1/2) = 1 < ∞

which means state 1 is transient.

In fact, states 0 and 2 are examples of absorbing states (i.e. P_{0,0} = P_{2,2} = 1).
If one begins in state 1, note that with probability 1/2 the DTMC enters one of the two recurrent classes at the next transition; i.e. the DTMC will eventually be absorbed into either state 0 or state 2, which is why state 1 is transient.
It can be shown that

lim_{n→∞} P^{(n)} =
[  1    0    0  ]
[ 2/3   0   1/3 ]
[  0    0    1  ]
0 0 1
Intuitively this makes sense: eventually state 1 will go into 0 or 2. The relative probability of going into state 0 and
going into state 2 is equivalent to our limiting probability entries.
Remark 19.1. Unlike the previous example, it does matter from which state one begins in this DTMC. Furthermore
the second column of the above contains all zeros.

19.2 Theorem 3.6 solution


Theorem 19.1. For any state i and transient state j of a DTMC, lim_{n→∞} P_{i,j}^{(n)} = 0.

Proof. Recall that we defined the “first visit probability in n steps” as

f_{i,j}^{(n)} = P(Xn = j, Xn−1 ≠ j, . . . , X1 ≠ j | X0 = i)

Furthermore, we defined

f_{i,j} = P(DTMC ever makes a future visit to state j | X0 = i) = Σ_{n=1}^∞ f_{i,j}^{(n)}

We can then use the fact that

P_{i,j}^{(n)} = Σ_{k=1}^{n} f_{i,j}^{(k)} P_{j,j}^{(n−k)}


So we have

Σ_{n=1}^∞ P_{i,j}^{(n)} = Σ_{n=1}^∞ Σ_{k=1}^{n} f_{i,j}^{(k)} P_{j,j}^{(n−k)}
                       = Σ_{k=1}^∞ f_{i,j}^{(k)} Σ_{n=k}^∞ P_{j,j}^{(n−k)}
                       = Σ_{k=1}^∞ f_{i,j}^{(k)} Σ_{l=0}^∞ P_{j,j}^{(l)}          (l = n − k)
                       = ( Σ_{k=1}^∞ f_{i,j}^{(k)} ) ( 1 + Σ_{l=1}^∞ P_{j,j}^{(l)} )          (P_{j,j}^{(0)} = 1)
                       = f_{i,j} ( 1 + Σ_{l=1}^∞ P_{j,j}^{(l)} )
                       ≤ 1 + Σ_{l=1}^∞ P_{j,j}^{(l)}          (f_{i,j} ≤ 1 since it is a probability)
                       < ∞          (since j is transient, the summation is finite)

Recall the nth term test: if lim_{n→∞} a_n ≠ 0, or if the limit does not exist, then Σ_{n=1}^∞ a_n diverges. Equivalently (by the contrapositive), if the series converges then its terms must tend to 0.
By the nth term test for infinite series,

lim_{n→∞} P_{i,j}^{(n)} = 0

19.3 Example 3.10 (continued) solution (finding limiting probability using BLT)
Recall we were given the DTMC with TPM

P =
[ 1/2  1/2   0  ]
[ 1/2  1/4  1/4 ]
[  0   1/3  2/3 ]

Find the limiting probabilities of this DTMC.

Solution. Clearly the DTMC is irreducible, aperiodic and positive recurrent. The BLT conditions are met, so π = (π0, π1, π2) exists and satisfies

π = πP
πe′ = 1,   where e′ = (1, 1, . . . , 1)^T

Thus we have

(π0, π1, π2) = (π0, π1, π2)
[ 1/2  1/2   0  ]
[ 1/2  1/4  1/4 ]
[  0   1/3  2/3 ]


Expanding the matrix multiplication, we get

π0 = (1/2)π0 + (1/2)π1
π1 = (1/2)π0 + (1/4)π1 + (1/3)π2
π2 = (1/4)π1 + (2/3)π2
1 = π0 + π1 + π2

(we can disregard the equation for π1, since it involves the most terms and we have 4 equations for 3 unknowns). Solving the system we get

π0 = π1 = 4/11,   π2 = 3/11

Recall we had

lim_{n→∞} P^{(n)} =
[ 4/11  4/11  3/11 ]     [ π0  π1  π2 ]
[ 4/11  4/11  3/11 ]  =  [ π0  π1  π2 ]
[ 4/11  4/11  3/11 ]     [ π0  π1  π2 ]
We observe that each row of P (n) converges to π as n → ∞.
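Equivalently, we can solve the BLT equations numerically; the following Python sketch stacks π(P − I) = 0 with the normalization Σ πi = 1 and solves by least squares (one of several reasonable ways to set this up):

import numpy as np

P = np.array([[1/2, 1/2, 0],
              [1/2, 1/4, 1/4],
              [0,   1/3, 2/3]])
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])   # pi (P - I) = 0 and sum(pi) = 1
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                          # approximately (4/11, 4/11, 3/11)
print(np.array([4, 4, 3]) / 11)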

20 March 1, 2018
20.1 Example 3.12 solution (Gambler’s Ruin) (applying limiting probability)
Consider a gambler who, at each play of a game, has probability p ∈ (0, 1) of winning one unit and probability
q = 1 − p of losing one unit. Assume that successive plays of the game are independent. If the gambler initially
begins with i units, what is the probability that the gambler’s fortune will reach N units (N < ∞) before reaching
0 units? This problem is often referred to as the Gambler’s Ruin Problem, with state 0 representing bankruptcy
and state N representing the jackpot.

Solution. For n ∈ N, define Xn as the gambler's fortune after the nth play of the game. Therefore {Xn, n ∈ N} is a DTMC with TPM

P =
[ 1  0  0  0  . . .  0  0  0 ]
[ q  0  p  0  . . .  0  0  0 ]
[ 0  q  0  p  . . .  0  0  0 ]
[            ⋮               ]
[ 0  0  0  0  . . .  q  0  p ]
[ 0  0  0  0  . . .  0  0  1 ]
where the TPM has entries P_{i,j} for i, j ∈ {0, 1, . . . , N} (for each i ∈ {1, . . . , N − 1}, we have P_{i,i−1} = q and P_{i,i+1} = p; P_{0,0} = P_{N,N} = 1 and all other entries are 0).
Note: States 0 and N are treated as absorbing states, therefore they are both recurrent.
States {1, 2, . . . , N − 1} belong to the same equivalence class and it is straightforward to see it is a transient class.
Goal: Determine P (i) for i = 0, 1, . . . , N which represents the probability that starting with i units, the gambler’s
fortune will eventually reach N units.


It follows that (for the limiting TPM)

lim_{n→∞} P^{(n)} =
[      1        0  0  . . .  0       0      ]
[ 1 − P(1)      0  0  . . .  0      P(1)    ]
[ 1 − P(2)      0  0  . . .  0      P(2)    ]
[                    ⋮                      ]
[ 1 − P(N−1)    0  0  . . .  0    P(N−1)    ]
[      0        0  0  . . .  0       1      ]

where the (0, 0) entry is 1 (state 0 is absorbing and hence recurrent), the (0, N) entry is 0 (we can never reach state N from state 0), and similarly for the (N, N) and (N, 0) entries.
The (i, N) entry is P(i) by definition, where P(0) = 0 and P(N) = 1. Since the entries in each row must sum to 1, the (i, 0) entry is 1 − P(i).
Conditioning on the outcome of the very first game i = 1, 2 . . . , N − 1, we have

P (i) = p · P (i + 1) + q · P (i − 1)

(if we go up by 1 with probability p, then the probability of winning afterwards is P (i + 1) i.e. P (i + 1) = P (i | win)
and p = P (win). Similarly for q and P (i − 1)).
Note that p + q = 1, so we can write P(i) = pP(i) + qP(i). Substituting this into the previous equation and rearranging,

p(P(i + 1) − P(i)) = q(P(i) − P(i − 1))
P(i + 1) − P(i) = (q/p)(P(i) − P(i − 1))
To determine if an explicit solution exists, consider several values of i:

i = 1 ⇒ P(2) − P(1) = (q/p)(P(1) − P(0)) = (q/p) P(1)
i = 2 ⇒ P(3) − P(2) = (q/p)(P(2) − P(1)) = (q/p)^2 P(1)
i = 3 ⇒ P(4) − P(3) = (q/p)(P(3) − P(2)) = (q/p)^3 P(1)
  ⋮
i = k ⇒ P(k + 1) − P(k) = (q/p)(P(k) − P(k − 1)) = (q/p)^k P(1)

Note: the above k equations are linear in the unknowns P(1), P(2), . . . , P(k + 1). Summing these k equations yields

P(k + 1) − P(1) = Σ_{i=1}^{k} (q/p)^i P(1)
⇒ P(k + 1) = Σ_{i=0}^{k} (q/p)^i P(1),   k = 0, 1, . . . , N − 1
⇒ P(k) = Σ_{i=0}^{k−1} (q/p)^i P(1),   k = 1, 2, . . . , N

Applying the formula for a finite geometric series,

P(k) = [ (1 − (q/p)^k) / (1 − q/p) ] P(1)   if p ≠ 1/2
P(k) = k P(1)                               if p = 1/2

Letting k = N, we obtain for p ≠ 1/2

1 = P(N) = [ (1 − (q/p)^N) / (1 − q/p) ] P(1)  ⇒  P(1) = (1 − q/p) / (1 − (q/p)^N)

Similarly, for p = 1/2 we have P(1) = 1/N.
So for k = 0, 1, . . . , N we have

P(k) = (1 − (q/p)^k) / (1 − (q/p)^N)   if p ≠ 1/2
P(k) = k/N                             if p = 1/2

Think about what happens when N → ∞ for each case.
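As an optional check of the closed form (a sketch; the particular values i = 3, N = 10, p = 0.45 and the number of simulated paths are arbitrary choices):

import numpy as np

def ruin_win_prob(i, N, p):
    """P(fortune reaches N before 0 | start with i units), from the closed form."""
    if p == 0.5:
        return i / N
    q = 1 - p
    return (1 - (q / p) ** i) / (1 - (q / p) ** N)

def simulate(i, N, p, n_paths=20000, rng=np.random.default_rng(2)):
    wins = 0
    for _ in range(n_paths):
        x = i
        while 0 < x < N:
            x += 1 if rng.random() < p else -1
        wins += (x == N)
    return wins / n_paths

print(ruin_win_prob(3, 10, 0.45), simulate(3, 10, 0.45))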

21 Tutorial 7
21.1 DTMCs and Ni (minimum n for Xn = i), transience and recurrence, limit probabilities,
number of transitions
The state space S of DTMC is the set of all non-negative integers, namely S = {0, 1, 2, . . .}. For each i ∈ S, the
one-step transition probabilities have the form
P_{i,i+1} = α/(2 + i)   and   P_{i,0} = (2 + i − α)/(2 + i)


resulting in the TPM

        0          1    2    3    4   . . .
P =
0  [ (2−α)/2    α/2    0    0    0   . . . ]
1  [ (3−α)/3     0    α/3   0    0   . . . ]
2  [ (4−α)/4     0     0   α/4   0   . . . ]
3  [ (5−α)/5     0     0    0   α/5  . . . ]
4  [ (6−α)/6     0     0    0    0   . . . ]
[      ⋮                                   ]

1. Explain why we must restrict α to the interval 0 ≤ α ≤ 2 in order for P to be a TPM when defined as above.

2. Let N0 = min{n ∈ Z+ | Xn = 0} be the time until the first return to state 0. For each value of α ∈ [0, 2], find
an expression for the pmf of N0 , conditional on X0 = 0.

3. For each value of α ∈ [0, 2], discuss the transience and recurrence of each state of S.

4. Letting 0 < α ≤ 2, find the mean recurrent time of state 0 and determine if the states of this DTMC are
positive or null recurrent.

5. For what values of α ∈ [0, 2] does the DTMC have a limiting distribution on S?

6. For the values of α where the Basic Limit Theorem is satisfied, find the limit probabilities {πj }j∈S .

7. Letting 0 < α ≤ 2, determine the expected number of transitions to reach state 2, conditional on X0 = 0.

Solution. 1. Note that every transition probability must satisfy 0 ≤ P_{i,j} ≤ 1.
So for P_{i,i+1} we need

0 ≤ α/(2 + i) ≤ 1  ⇒  0 ≤ α ≤ 2 + i

Since the smallest value i can take is 0, we must have 0 ≤ α ≤ 2.

2. Note, for each value α ∈ [0, 2] and by the structure of the DTMC, we have

f_{0,0}^{(1)} = P_{0,0} = 1 − α/2
f_{0,0}^{(2)} = P_{0,1} P_{1,0} = (α/2)(1 − α/3)
f_{0,0}^{(3)} = P_{0,1} P_{1,2} P_{2,0} = (α/2)(α/3)(1 − α/4)

and in general, for n ∈ Z+,

f_{0,0}^{(n)} = P_{0,1} P_{1,2} · · · P_{n−2,n−1} P_{n−1,0} = (α^{n−1}/n!) (1 − α/(n + 1))

3. When α = 0, it is clear that only state 0 is recurrent (it is absorbing, since P_{0,0} = 1) and all other states are transient (since P_{i,0} = 1 ∀i ∈ S).
When α ≠ 0, all states are in one equivalence class, so we only need to examine the transience/recurrence of one state, e.g. state 0.


We find f_{0,0} when α ≠ 0:

f_{0,0} = Σ_{n=1}^∞ f_{0,0}^{(n)} = Σ_{n=1}^∞ (α^{n−1}/n!)(1 − α/(n + 1)) = Σ_{n=1}^∞ α^{n−1}/n! − Σ_{n=1}^∞ α^n/(n + 1)!
        = (1/α) ( Σ_{n=1}^∞ α^n/n! − Σ_{n=2}^∞ α^n/n! )
        = (1/α) · α
        = 1

So state 0 is recurrent and so are the other states, thus the DTMC is recurrent.

4. We can find the mean recurrent time m_0 of state 0 from

m_0 = E[N_0 | X0 = 0] = Σ_{n=1}^∞ n f_{0,0}^{(n)}
    = Σ_{n=1}^∞ α^{n−1}/(n − 1)! − Σ_{n=1}^∞ n α^n/(n + 1)!
    = e^α − Σ_{n=1}^∞ (n + 1 − 1) α^n/(n + 1)!
    = e^α − Σ_{n=1}^∞ α^n/n! + Σ_{n=1}^∞ α^n/(n + 1)!
    = e^α − (e^α − 1) + (1/α) Σ_{m=2}^∞ α^m/m!
    = 1 + (1/α)(e^α − 1 − α)
    = (e^α − 1)/α < ∞
for 0 < α ≤ 2. This implies state 0 is positive recurrent. Since the DTMC is irreducible, this implies every
state is positive recurrent.

5. From part (3) we have two cases.

If α = 0, then we have

P = P^{(n)} =
[ 1  0  0  0  0  . . . ]
[ 1  0  0  0  0  . . . ]
[ 1  0  0  0  0  . . . ]
[ ⋮                    ]

Therefore the limit of the n-step TPM as n → ∞ has identical rows, implying there is a limiting distribution, namely π = (1, 0, 0, . . .).


If 0 < α ≤ 2, we know the DTMC is irreducible and positive recurrent. Also note that d(0) = gcd{2, 3, 4, . . .} =
1 implying that state 0 is aperiodic. Thus the conditions of the Basic Limit Theorem hold true and a unique
limiting distribution {πj }j∈S exists.
Putting both cases together a limiting distribution exists for all 0 ≤ α ≤ 2.
6. From part (5), consider 0 < α ≤ 2. From part (4) (together with the BLT) we see that π_0 = 1/m_0 = α/(e^α − 1).
We can also derive π_0 directly. From the Basic Limit Theorem, the limiting probabilities satisfy

π_j = Σ_{i=0}^∞ π_i P_{i,j}   for all j ∈ S,   and   Σ_{i=0}^∞ π_i = 1

Note that only P_{i−1,i} > 0 and P_{i,0} > 0; all other P_{i,j} = 0. Thus we have

π_0 = Σ_{i=0}^∞ π_i P_{i,0} = Σ_{i=0}^∞ π_i (2 + i − α)/(2 + i)
π_1 = π_0 P_{0,1} = (α/2) π_0
π_2 = π_1 P_{1,2} = (α^2/3!) π_0
π_3 = π_2 P_{2,3} = (α^3/4!) π_0
  ⋮
π_i = (α^i/(i + 1)!) π_0,   i ∈ S

Along with 1 = Σ_{i∈S} π_i, we have enough equations to solve for each π_i.
X α α2
1= πi = π0 + π0 + π0 + . . .
2! 3!
i∈S

1 X αi
= π0 ·
α i!
i=1

π0 X αi
= ( − 1)
α i!
i=0
π0 α
= (e − 1)
α
α
⇒π0 = α
e −1
1
which we recognize as m0 from (4). Therefore we have

αi+1
πi = ∀i ∈ S
(i + 1)!(eα − 1)


7. Let T_i denote the number of transitions needed to reach state 2 starting from state i. We want to find E[T_0].
Conditioning on the state at step 1, we have

E[T_0] = E[T_0 | X1 = 0] P(X1 = 0 | X0 = 0) + E[T_0 | X1 = 1] P(X1 = 1 | X0 = 0)
       = (1 + E[T_0]) (2 − α)/2 + (1 + E[T_1]) (α/2)
⇒ (α/2) E[T_0] = 1 + (α/2) E[T_1]
⇒ E[T_0] = 2/α + E[T_1]

Likewise,

E[T_1] = E[T_1 | X1 = 0] P(X1 = 0 | X0 = 1) + E[T_1 | X1 = 2] P(X1 = 2 | X0 = 1)
       = (1 + E[T_0]) (3 − α)/3 + (1)(α/3)
       = 1 + (3 − α)/3 · (2/α + E[T_1])
⇒ (α/3) E[T_1] = 1 + 2(3 − α)/(3α)
⇒ E[T_1] = (α + 6)/α^2

Thus

E[T_0] = 2/α + (α + 6)/α^2 = 3(α + 2)/α^2

22 March 6, 2018
22.1 Example 3.11 (continued) solution (showing absorption probabilities equal limiting probabilities)
Recall the earlier DTMC with TPM

P =
[  1    0    0  ]
[ 1/3  1/2  1/6 ]
[  0    0    1  ]

We previously claimed

lim_{n→∞} P^{(n)} =
[  1    0    0  ]
[ 2/3   0   1/3 ]
[  0    0    1  ]

Show that the absorption probabilities from transient state 1 into states 0 and 2 are equal to lim_{n→∞} P_{1,0}^{(n)} and lim_{n→∞} P_{1,2}^{(n)}, respectively.


Solution. First of all, relabel the states of the DTMC is as follows

0∗ = state 1 of the original DTMC


1∗ = state 0 of the original DTMC
2∗ = state 2 of the original DTMC

The new TPM corresponding to states {0*, 1*, 2*} is

        0*    1*    2*
P =
0*  [ 1/2   1/3   1/6 ]
1*  [  0     1     0  ]
2*  [  0     0     1  ]

Based on the notation used in class, we have

Q = [ 1/2 ],   R = [ 1/3  1/6 ],   O = [ 0 ; 0 ],   I = [ 1  0 ; 0  1 ]

Using the definition of absorption probability,

U_{i,k} = R_{i,k} + Σ_{j=0}^{M−1} Q_{i,j} U_{j,k}

we get

U_{0*,1*} = R_{0*,1*} + Q_{0*,0*} U_{0*,1*} = 1/3 + (1/2) U_{0*,1*}
⇒ U_{0*,1*} = 2/3 = lim_{n→∞} P_{1,0}^{(n)}

And similarly,

U_{0*,2*} = 1/3 = lim_{n→∞} P_{1,2}^{(n)}

22.2 Example 3.13 solution (solving absorption probabilities)


Consider the DTMC with TPM

       0     1     2     3
P =
0  [ 0.4   0.3   0.2   0.1 ]
1  [ 0.1   0.3   0.3   0.3 ]
2  [  0     0     1     0  ]
3  [  0     0     0     1  ]

Suppose the DTMC starts off in state 1. What is the probability that the DTMC ultimately ends up in state 3? How would this probability change if the DTMC begins in state 0 with probability 3/4 and in state 1 with probability 1/4?

Solution. First we wish to calculate U_{1,3}. Here,

Q = [ 0.4  0.3 ; 0.1  0.3 ],   R = [ 0.2  0.1 ; 0.3  0.3 ],   U = [ U_{0,2}  U_{0,3} ; U_{1,2}  U_{1,3} ]

Since U = (I − Q)^{−1} R (the matrix form of the absorption probabilities), we need the inverse of

I − Q = [ 1  0 ; 0  1 ] − [ 0.4  0.3 ; 0.1  0.3 ] = [ 0.6  −0.3 ; −0.1  0.7 ]

Recall that if A = [ a  b ; c  d ], then A^{−1} = (1/(ad − bc)) [ d  −b ; −c  a ], provided that ad − bc ≠ 0. Thus we have

(I − Q)^{−1} = (1/(0.42 − 0.03)) [ 0.7  0.3 ; 0.1  0.6 ] = [ 70/39  10/13 ; 10/39  20/13 ]

so we get

U = (I − Q)^{−1} R = [ 0.589744  0.410256 ; 0.512821  0.487179 ]
Now under the alternate set of initial conditions

P (DTMC ultimately in state 3) = P (final state 3 | X0 = 0)P (X0 = 0) + P (final state 3 | X0 = 1)P (X0 = 1)
3 1
= U0,3 + U1,3
4 4
3 1
= (0.410256) + (0, 487179)
4 4
≈ 0.429487

Alternatively, we could calculate U_{1,3} from the defining system of equations

U_{i,k} = R_{i,k} + Σ_{j=0}^{M−1} Q_{i,j} U_{j,k}

Setting i = 1 and k = 3 gives

U_{1,3} = R_{1,3} + Q_{1,0} U_{0,3} + Q_{1,1} U_{1,3} = 0.3 + 0.1 U_{0,3} + 0.3 U_{1,3}

so we also need U_{0,3}:

U_{0,3} = R_{0,3} + Q_{0,0} U_{0,3} + Q_{0,1} U_{1,3} = 0.1 + 0.4 U_{0,3} + 0.3 U_{1,3}


We have to solve two equations in two unknowns:

0.1 U_{0,3} − 0.7 U_{1,3} = −0.3
0.6 U_{0,3} − 0.3 U_{1,3} = 0.1

Multiplying the first equation by 6 and subtracting it from the second, we get

3.9 U_{1,3} = 1.9  ⇒  U_{1,3} ≈ 0.487179
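The same numbers drop out of a few lines of Python (a sketch of the matrix computation U = (I − Q)^{−1} R):

import numpy as np

Q = np.array([[0.4, 0.3],
              [0.1, 0.3]])
R = np.array([[0.2, 0.1],
              [0.3, 0.3]])
U = np.linalg.solve(np.eye(2) - Q, R)    # U = (I - Q)^{-1} R
print(U)                                  # U[1, 1] is U_{1,3} ≈ 0.487179
print(0.75 * U[0, 1] + 0.25 * U[1, 1])    # ≈ 0.429487 under the mixed initial distribution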

22.3 Example 3.14 solution (absorbing states with absorbing recurrent classes)
Consider the DTMC with TPM

       0     1     2     3     4
P =
0  [ 0.4   0.3   0.2   0.1    0  ]
1  [ 0.1   0.3   0.3   0.3    0  ]
2  [  0     0     1     0     0  ]
3  [  0     0     0    0.7   0.3 ]
4  [  0     0     0    0.4   0.6 ]

Suppose the DTMC starts off in state 1. What is the probability that the DTMC ultimately ends up in state 3?

Solution. Goal: find lim_{n→∞} P_{1,3}^{(n)}. We begin by grouping states 3 and 4 together into a single state 3*, resulting in the revised TPM

        0     1     2     3*
P* =
0   [ 0.4   0.3   0.2   0.1 ]
1   [ 0.1   0.3   0.3   0.3 ]
2   [  0     0     1     0  ]
3*  [  0     0     0     1  ]
This is identical to the TPM from Example 3.13, so we have

U_{1,3*} = 0.487179

Once in 3*, the DTMC will remain in the recurrent class {3, 4}, with associated TPM

      3     4
3  [ 0.7   0.3 ]
4  [ 0.4   0.6 ]

Note: for this “smaller” DTMC, the conditions of the BLT are satisfied.
We can solve for the limiting probabilities π3, π4:

(π3, π4) = (π3, π4) [ 0.7  0.3 ; 0.4  0.6 ],   π3 + π4 = 1

We can show that π3 = 4/7 and π4 = 3/7. Thus

lim_{n→∞} P_{1,3}^{(n)} = U_{1,3*} · π3 = (0.487179)(4/7) ≈ 0.278388


Why? Let T denote the (random, finite) time at which the DTMC is absorbed. Then

lim_{n→∞} P_{1,3}^{(n)} = lim_{n→∞} P(Xn = 3 | X0 = 1)
  = lim_{n→∞} { P(Xn = 3, X_T = 2 | X0 = 1) + P(Xn = 3, X_T = 3* | X0 = 1) }
  = lim_{n→∞} P(Xn = 3, X_T = 3* | X0 = 1)          (P(Xn = 3, X_T = 2 | X0 = 1) = 0 since state 2 is absorbing)
  = lim_{n→∞} P(Xn = 3 | X_T = 3*, X0 = 1) P(X_T = 3* | X0 = 1)
  = lim_{n→∞} P_{3*,3}^{(n−T)} U_{1,3*}
  = U_{1,3*} π3

22.4 Aside: n-step TPM (P (n) ) for absorbing DTMCs


Recall from Example 3.14 that we can write

P = [ Q  R ; 0  A ]

where

A =
[  1    0     0  ]
[  0   0.7   0.3 ]
[  0   0.4   0.6 ]

So we have

P^2 = [ Q^2   QR + RA ; 0   A^2 ]  ⇒  P^{(n)} = [ Q^n   ? ; 0   A^n ]

(the ? block can also be solved for, but we do not need it here).
Note that A is block diagonal, so A^n simply raises each block to the nth power; in particular, its lower-right 2 × 2 block is the nth power of the TPM of the recurrent class {3, 4}. Taking the limit as n → ∞, that block converges to the limiting probability matrix

[ π3  π4 ]
[ π3  π4 ]

22.5 Example 3.11 (continued) solution (mean absorption time w_i)

Recall the modified TPM

        0*    1*    2*
P =
0*  [ 1/2   1/3   1/6 ]
1*  [  0     1     0  ]
2*  [  0     0     1  ]

What is the mean absorption time for this DTMC, given that it begins in state 0*?

Solution. Recall that we have (in matrix form)

w = (I − Q)^{−1} e′

where e′ is a column vector of ones. Thus we have (since there is only 1 transient state)

w_{0*} = (1 − 1/2)^{−1} (1) = 2

Looking at this particular TPM, given that the DTMC initially begins in state 0*, each transition either returns to state 0* with probability 1/2 or becomes absorbed into one of the two absorbing states with probability 1/3 + 1/6 = 1/2. Thus

T | (X0 = 0*) ∼ GEO(1/2)  ⇒  E[T | X0 = 0*] = 1/(1/2) = 2

22.6 Example 3.13 (continued) solution (mean absorption time w_i)

Consider the DTMC with TPM

       0     1     2     3
P =
0  [ 0.4   0.3   0.2   0.1 ]
1  [ 0.1   0.3   0.3   0.3 ]
2  [  0     0     1     0  ]
3  [  0     0     0     1  ]

If the DTMC starts off in state 1, how long on average does it take to end up in either state 2 or 3?

Solution. We want to find w_1, where w = (w_0 ; w_1). Note that

w = (I − Q)^{−1} e′ = ( 70/39 + 10/13 ; 10/39 + 20/13 )  ⇒  w_1 = 10/39 + 20/13 = 70/39 ≈ 1.79

22.7 Example 3.13 (continued) solution (average number of visits prior to absorption S_{i,l})
Consider the DTMC with TPM

       0     1     2     3
P =
0  [ 0.4   0.3   0.2   0.1 ]
1  [ 0.1   0.3   0.3   0.3 ]
2  [  0     0     1     0  ]
3  [  0     0     0     1  ]

Given X0 = 1, what is the average number of visits to state 0 prior to absorption? Also, what is the probability that the DTMC ever makes a visit to state 0?

Solution. Goal: find S_{1,0}, where

S = [ S_{0,0}  S_{0,1} ; S_{1,0}  S_{1,1} ]

From our earlier calculations we know

S = (I − Q)^{−1} = [ 70/39  10/13 ; 10/39  20/13 ]  ⇒  S_{1,0} = 10/39 ≈ 0.256


Lastly, we calculate (as derived in class)

f_{1,0} = (S_{1,0} − δ_{1,0}) / S_{0,0} = (10/39 − 0) / (70/39) = 1/7
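For reference, the fundamental-matrix quantities above can be reproduced numerically (a sketch):

import numpy as np

Q = np.array([[0.4, 0.3],
              [0.1, 0.3]])
S = np.linalg.inv(np.eye(2) - Q)     # fundamental matrix (I - Q)^{-1}
w = S @ np.ones(2)                    # mean absorption times (w0, w1)
f_10 = S[1, 0] / S[0, 0]              # P(ever visit state 0 | start in state 1)
print(S, w, f_10)                     # w1 = 70/39 ≈ 1.79, f_10 = 1/7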

23 Tutorial 8
23.1 Transforming problem into absorption problem
Recall the DTMC {Xn, n ∈ N} from Tutorial 7 with state space S = {0, 1, 2, . . .}. For each i ∈ S we had

P_{i,i+1} = α/(2 + i)   and   P_{i,0} = (2 + i − α)/(2 + i)

resulting in the TPM

        0          1    2    3    4   . . .
P =
0  [ (2−α)/2    α/2    0    0    0   . . . ]
1  [ (3−α)/3     0    α/3   0    0   . . . ]
2  [ (4−α)/4     0     0   α/4   0   . . . ]
3  [ (5−α)/5     0     0    0   α/5  . . . ]
4  [ (6−α)/6     0     0    0    0   . . . ]
[      ⋮                                   ]

In part (7) we determined the expected number of transitions to reach state 2, conditional on X0 = 0 (for 0 < α ≤ 2). Show that we can obtain this expected value using absorbing DTMC results.

Solution. We can transform our DTMC into an equivalent sub-DTMC with absorbing state 2 and TPM

       0          1     2
P =
0  [ (2−α)/2    α/2    0  ]
1  [ (3−α)/3     0    α/3 ]
2  [    0        0     1  ]

This sub-DTMC behaves the same as the original up until the first visit to state 2, which is all we are concerned with. Thus the expected number of transitions to reach state 2 starting from state 0 is the mean absorption time w_0 for state 0. That is,

w_0 = 1 + Q_{0,0} w_0 + Q_{0,1} w_1 = 1 + (2 − α)/2 · w_0 + (α/2) w_1  ⇒  w_0 = 2/α + w_1
w_1 = 1 + Q_{1,0} w_0 = 1 + (3 − α)/3 · w_0

Thus we have

w_0 = 2/α + 1 + (3 − α)/3 · w_0  ⇒  (α/3) w_0 = (2 + α)/α  ⇒  w_0 = 3(2 + α)/α^2
as desired.


23.2 Mean recurrent time


Consider the DTMC with TPM

       0     1     2
P =
0  [ 1/3   1/2   1/6 ]
1  [ 1/4   1/4   1/2 ]
2  [ 3/5   1/5   1/5 ]

If the process begins in state 0, then on average how many transitions are required to return to state 0?

Solution. Note that the DTMC is irreducible, aperiodic and positive recurrent (i.e. ergodic), so the BLT applies: the limiting probabilities π = (π0, π1, π2) satisfy π = πP with π0 + π1 + π2 = 1, and the mean recurrent time of state 0 is m_0 = 1/π_0.
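A numerical sketch of this computation (solving the stationary equations and inverting π0; the least-squares set-up mirrors the earlier example):

import numpy as np

P = np.array([[1/3, 1/2, 1/6],
              [1/4, 1/4, 1/2],
              [3/5, 1/5, 1/5]])
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
b = np.zeros(n + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)   # stationary/limiting distribution
print(pi)
print(1 / pi[0])                              # mean recurrent time m0 = 1/pi0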

24 March 13, 2018


24.1 Example 4.1 solution (P (Xj = min{X1 , . . . , Xn }) for Xi ∼ EXP (λi ))
Let {Xi }ni=1 be a sequence of independent random variables where Xi ∼ EXP (λi ), i = 1, 2, . . . , n. For j ∈
{1, 2, . . . , n}, calculate P (Xj = min{X1 , . . . , Xn }).

Solution. Note that

P(Xj = min{X1, . . . , Xn}) = P(Xj < Xi for all i ≠ j)          (an (n − 1)-fold intersection of events)
  = ∫_0^∞ P(Xi > x for all i ≠ j | Xj = x) λj e^{−λj x} dx
  = ∫_0^∞ Π_{i≠j} P(Xi > x) · λj e^{−λj x} dx          (by independence)
  = ∫_0^∞ Π_{i≠j} e^{−λi x} · λj e^{−λj x} dx
  = λj / (Σ_{i=1}^n λi) · ∫_0^∞ (Σ_{i=1}^n λi) e^{−(Σ_{i=1}^n λi) x} dx
  = λj / Σ_{i=1}^n λi

since the last integrand is the pdf of an EXP(Σ_{i=1}^n λi) random variable and therefore integrates to 1.

24.2 Theorem 4.1 (equivalent memoryless property definition)


Theorem 24.1. X is memoryless iff P (X > y + z) = P (X > y)P (X > z), ∀y, z ≥ 0.

Proof. Forwards direction:


Recall a rv is memoryless iff
P (X > y + z | X > y) = P (X > z) ∀y, z ≥ 0

68
Winter 2018 STAT 333 Course Notes 25 MARCH 15, 2018

Thus we have
P (X > y + z, X > y) P (X > y + z)
P (X > y + z | X > y) = =
P (X > y) P (X > y)
If X is memoryless, then P (X > y + z | X > y) = P (X > z).
The result immediately follows.
Backwards direction:
Conversely, if P(X > y + z) = P(X > y)P(X > z) ∀y, z ≥ 0, note that by conditional probability

P(X > y + z | X > y) = P(X > y + z)/P(X > y) = P(X > y)P(X > z)/P(X > y) = P(X > z)

which implies that X is memoryless.

24.3 Theorem 4.2 (exponentials are memoryless)


Theorem 24.2. An exponential distribution is memoryless.

Proof. Suppose that X ∼ EXP (λ). For y, z ≥ 0

P (X > y + z) = e−λ(y+z) = e−λy e−λz = P (X > y)P (X > z)

By Theorem 4.1, X is memoryless.

25 March 15, 2018


25.1 Example 4.2 solution (non-identical exponentials problem)
Suppose that a computer has 3 switches which govern the transfer of electronic impulses. These switches operate
independently of one another and their lifetimes are exponentially distributed with mean lifetimes of 10, 5, and 4
years, respectively.

1. What is the probability that the time until the very first switch breakdown exceeds 6 years?

Solution. Let Xi represent the lifetime of switch i, where i = 1, 2, 3.
We know Xi ∼ EXP(λi), where λ1 = 1/10, λ2 = 1/5, and λ3 = 1/4.
The time until the first breakdown is the rv Y = min{X1, X2, X3}.
Since the lifetimes are independent of each other, Y ∼ EXP(λ) where λ = 1/10 + 1/5 + 1/4 = 11/20 (from Example 4.1).
We wish to calculate P(Y > 6) = e^{−(11/20)(6)} = e^{−3.3} ≈ 0.0369.

2. What is the probability that switch 2 outlives switch 1?

Solution. We simply want to calculate P (X1 < X2 ). We showed in Example 2.11 that this is

P(X1 < X2) = λ1/(λ1 + λ2) = (1/10)/(1/10 + 1/5) = 1/3

3. If switch 3 is known to have lasted 2 years, what is the probability it will last at most 3 more years?

Solution. We wish to compute P(X3 ≤ 5 | X3 > 2), or

1 − P(X3 > 5 | X3 > 2) = 1 − P(X3 > 2 + 3 | X3 > 2)

By the memoryless property of the exponential distribution, this is equivalent to

1 − P(X3 > 3) = 1 − e^{−(1/4)(3)} = 1 − e^{−0.75} ≈ 0.5276

4. Consider only switches 1 and 2, what is the expected amount of time until they have both suffered a breakdown?

Solution. We want to find E[max{X1, X2}]. Consider

E[max{X1, X2} | X2 > X1] = E[X2 | X2 > X1]
                         = E[X1 + (X2 − X1) | X2 > X1]
                         = E[X1 | X2 > X1] + E[X2 − X1 | X2 > X1]
                         = E[min{X1, X2}] + E[X2 − X1 | X2 > X1]          (by Exercise 4.1)
                         = E[min{X1, X2}] + E[X2]          (by the memoryless property; see remark (2) in the notes)

In a completely similar fashion, we have

E[max{X1, X2} | X1 > X2] = E[min{X1, X2}] + E[X1]

Thus, by the laws of total probability and total expectation (we don't have to worry about X1 = X2, since the exponential distribution is continuous and P(X1 = X2) = 0),

E[max{X1, X2}] = P(X1 > X2) E[max{X1, X2} | X1 > X2] + P(X2 > X1) E[max{X1, X2} | X2 > X1]
               = (2/3)[E[min{X1, X2}] + E[X1]] + (1/3)[E[min{X1, X2}] + E[X2]]
               = E[min{X1, X2}] + (2/3)E[X1] + (1/3)E[X2]
               = 1/(1/10 + 1/5) + (2/3)(10) + (1/3)(5)
               = 10/3 + 20/3 + 5/3
               = 35/3 ≈ 11.67 years
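As an optional sanity check by simulation (a sketch; the sample size is arbitrary, and numpy's exponential is parameterized by its mean):

import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x1 = rng.exponential(scale=10, size=n)   # switch 1: mean lifetime 10 years
x2 = rng.exponential(scale=5, size=n)    # switch 2: mean lifetime 5 years
print(np.maximum(x1, x2).mean())          # close to 35/3 ≈ 11.67
print(35 / 3)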

26 March 20, 2018


26.1 Theorem 4.3 (Poisson processes are Poisson distributed)
Theorem 26.1. If {N (t), t ≥ 0} is a Poisson process at rate λ, then N (t) ∼ P OI(λt).

Proof. We will prove this by showing that their MGFs are equivalent (i.e. the MGF of the Poisson process is that
of P OI(λt)).
For t ≥ 0, let φt (u) = E[euN (t) ] the MGF of N (t) (where the MGF is parameterized by u).


For h ≥ 0, consider

φt+h (u) = E[euN (t+h) ]


= E[eu[N (t+h)−N (t)+N (t)] ]
= E[eu[N (t+h)−N (t)] euN (t) ]
= E[eu[N (t+h)−N (t)] ]E[euN (t) ] due to independent increments
= E[euN (h) ]E[euN (t) ] due to stationary increments
= φt (u)φh (u)

Next, note that for j ≥ 2 we have

0 ≤ P(N(h) = j) ≤ P(N(h) ≥ 2)
⇒ 0 ≤ P(N(h) = j)/h ≤ P(N(h) ≥ 2)/h
⇒ 0 ≤ lim_{h→0} P(N(h) = j)/h ≤ lim_{h→0} P(N(h) ≥ 2)/h = 0          (since P(N(h) ≥ 2) ∈ o(h))

Thus lim_{h→0} P(N(h) = j)/h = 0 by the Squeeze Theorem; therefore P(N(h) = j) ∈ o(h) for each j ≥ 2.
Using this result, we get

φ_h(u) = E[e^{uN(h)}]
       = Σ_{j=0}^∞ e^{uj} P(N(h) = j)
       = P(N(h) = 0) + e^u P(N(h) = 1) + Σ_{j=2}^∞ e^{uj} P(N(h) = j)
       = 1 − λh + o(h) + e^u (λh + o(h)) + o(h)          (the last sum is a combination of o(h) terms)
       = 1 + λh(e^u − 1) + o(h)

Returning to φ_{t+h}(u),

φ_{t+h}(u) = φ_t(u) φ_h(u)
           = φ_t(u)[1 + λh(e^u − 1) + o(h)]
           = φ_t(u) + λh(e^u − 1) φ_t(u) + o(h)

so that

[φ_{t+h}(u) − φ_t(u)]/h = λ(e^u − 1) φ_t(u) + o(h)/h

Letting h → 0,

d/dt φ_t(u) = λ(e^u − 1) φ_t(u)


This is a differential equation for φ_t(u). Note that (replacing t with s)

(d/ds φ_s(u)) / φ_s(u) = λ(e^u − 1)
d/ds (ln φ_s(u)) = λ(e^u − 1)

Integrating both sides from 0 to t,

∫_0^t d(ln φ_s(u)) = ∫_0^t λ(e^u − 1) ds
ln φ_t(u) − ln φ_0(u) = λt(e^u − 1)
ln φ_t(u) = λt(e^u − 1)          (φ_0(u) = E[e^{uN(0)}] = 1 since N(0) = 0)
φ_t(u) = e^{λt(e^u − 1)}

We recognize this as the MGF of a POI(λt) r.v. By the uniqueness property of MGFs, N(t) ∼ POI(λt).

26.2 Interarrival times Ti between Poisson events are Exponential distributed


Theorem 26.2. If {N (t), t ≥ 0} is Poisson process at rate λ > 0, then {Ti }∞
i=1 is a sequence of iid EXP (λ) random
variables.

Proof. We begin by considering T1 (first arrival time) for t ≥ 0. Note that the tail probability of T1 is

P (T1 > t) = P (no events before time t)


= P (N (t) = 0)
e−λt (λt)0
=
0!
−λt
=e

We recognize this as the tail probability of an EXP (λ) r.v. thus T1 ∼ EXP (λ).
Next for s > 0 and t ≥ 0 consider

P (T2 > t | T1 = s)
=P (T2 > t | N (w) = 0 ∀w ∈ [0, s) and N (s) = 1)
=P (no events occur in (s, s + t] | N (w) = 0 ∀w ∈ [0, s) and N (s) = 1)
=P (N (s + t) − N (s) = 0 | N (w) = 0 ∀w ∈ [0, s) and N (s) = 1)
since conditional interval [0, s] does not overlap (s, s + t] by independent increments
=P (N (s + t) − N (s) = 0)
=P (N (t) = 0) by stationary increments
=e−λt

Note that e−λt is independent of s, thus T1 and T2 are independent r.v’s so P (T2 > t | T1 = s) = P (T2 > t) = e−λt
thus T2 ∼ EXP (λ). Carrying this out inductively leads to the desired result.


27 March 22, 2018


27.1 Example 4.3 solution
At a local insurance company, suppose that fire damage claims come into the company according to a Poisson
process at rate 3.8 expected claims per year.

1. What is the probability that exactly 5 claims occur in the time interval (3.2, 5] (measured in years)?
Solution. Let N (t) be the number of claims arriving to the company in the interval [0, t].
Since {N (t), t ≥ 0} is a Poisson process with λ = 3.8, we want to find

P(N(5) − N(3.2) = 5) = e^{−3.8(1.8)} (3.8 · 1.8)^5 / 5! ≈ 0.1335

2. What is the probability that the time between the 2nd and 4th claims is between 2 and 5 months?
Solution. Let T be the time between the 2nd and 4th claims, so T = T3 + T4, where T3, T4 are independent EXP(3.8) random variables representing the interarrival times of the 3rd and 4th claims.
We know T ∼ Erlang(2, 3.8), which means

P(T > t) = e^{−3.8t} Σ_{j=0}^{2−1} (3.8t)^j / j! = e^{−3.8t}(1 + 3.8t)

for t ≥ 0.
We wish to calculate (converting from months to years)

P(1/6 < T ≤ 5/12) = P(T > 1/6) − P(T > 5/12) ≈ 0.3367

3. If exactly 12 claims have occurred within the first 5 years, how many claims on average occurred within the first 3.5 years?

Solution. We want to calculate E[N(3.5) | N(5) = 12]. Recall that for s < t we have N(s) | (N(t) = n) ∼ BIN(n, s/t); thus N(3.5) | (N(5) = 12) ∼ BIN(12, 3.5/5), so E[N(3.5) | N(5) = 12] = 12(3.5/5) = 42/5 = 8.4.
Remark 27.1. Without conditioning, N(3.5) ∼ POI(3.8 · 3.5), so E[N(3.5)] = (3.8)(3.5) = 13.3 ≠ 8.4. So conditioning on N(5) does indeed affect the mean of N(3.5).

4. At another competing insurance company, suppose fire damage claims arrive according to a Poisson process at
rate 2.9 claims per year. What is the probability that 3 claims arrive to this company before 2 claims arrive
to the first company? Assume insurance comapnies operate independently of each other.
Solution. Let N1 (t) denote the number of claims arriving to the 1st company by time t and N2 (t) denote
the number of claims arriving at the 2nd company.
We are assuming {N1 (t), t ≥ 0} (i.e. Poisson process at rate λ1 = 3.8) and {N2 (t), t ≥ 0} (i.e. Poisson process
at rate λ2 = 2.9) are independent processes.
In general,

P(S_n^{(1)} < S_m^{(2)}) = Σ_{j=0}^{m−1} \binom{n + j − 1}{n − 1} (λ1/(λ1 + λ2))^n (λ2/(λ1 + λ2))^j

We want the probability of 3 claims to company 2 before 2 claims to company 1, so we have

P(S_3^{(2)} < S_2^{(1)}) = 1 − P(S_2^{(1)} < S_3^{(2)})
                         = 1 − Σ_{j=0}^{3−1} \binom{2 + j − 1}{2 − 1} (3.8/(3.8 + 2.9))^2 (2.9/(3.8 + 2.9))^j
                         ≈ 0.2191

(Alternatively, we could compute P(S_3^{(2)} < S_2^{(1)}) directly from the same formula with the roles of the two processes interchanged; taking the complement is just slightly more convenient here.)

28 March 27, 2018


28.1 Example 4.3 (continued) solution (Poisson process with classifications)
Suppose that fire damage claims can be categorized as being either commercial, business or residential.
At the first insurance company, past history suggests that 15% are commercial, 25% are business, and the remaining
60% are residential. Over the course of the next 4 years, what is the probability that the company experiences
fewer than 5 claims in each of the 3 categories?

Solution. Let N_c(t) be the number of commercial claims by time t. Likewise, let N_b(t) and N_r(t) denote the number of business and residential claims by time t, respectively.
It follows that (recall λ = 3.8 for the whole process)

N_c(4) ∼ POI(3.8 · 0.15 · 4 = 2.28)
N_b(4) ∼ POI(3.8 · 0.25 · 4 = 3.8)
N_r(4) ∼ POI(3.8 · 0.6 · 4 = 9.12)

We wish to calculate the joint probability

P(N_c(4) < 5, N_b(4) < 5, N_r(4) < 5) = P(N_c(4) < 5) · P(N_b(4) < 5) · P(N_r(4) < 5)          (by independence)
  = ( Σ_{i=0}^{4} e^{−2.28}(2.28)^i/i! ) ( Σ_{i=0}^{4} e^{−3.8}(3.8)^i/i! ) ( Σ_{i=0}^{4} e^{−9.12}(9.12)^i/i! )
  = (0.91857)(0.66784)(0.05105)
  ≈ 0.0313
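The three Poisson tail sums can be reproduced with a few lines of Python (a sketch using only the standard library):

from math import exp, factorial

def poisson_cdf(k, mu):
    """P(POI(mu) <= k)."""
    return sum(exp(-mu) * mu**i / factorial(i) for i in range(k + 1))

p = poisson_cdf(4, 2.28) * poisson_cdf(4, 3.8) * poisson_cdf(4, 9.12)
print(p)   # approximately 0.0313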

28.2 Theorem 4.5 (conditional distribution of first arrival time given N (t) = 1 is uniform)
Theorem 28.1. Suppose that {N (t), t ≥ 0} is a Poisson process at rate λ. Given N (t) = 1, the conditional
distribution of the first arrival time is uniformly distributed on [0, t]. That is S1 | (N (t) = 1) ∼ U (0, t).

Proof. To identify the conditional distribution of S1 | (N(t) = 1), we consider its cdf, which we will denote

G(s) = P(S1 ≤ s | N(t) = 1),   0 ≤ s ≤ t
     = P(S1 ≤ s, N(t) = 1) / P(N(t) = 1)
     = P(1 event in [0, s] and 0 events in (s, t]) / P(N(t) = 1)
     = P(N(s) = 1, N(t) − N(s) = 0) / P(N(t) = 1)
     = P(N(s) = 1) P(N(t) − N(s) = 0) / P(N(t) = 1)          (independent increments)
     = [ e^{−λs}(λs)^1/1! · e^{−λ(t−s)}(λ(t−s))^0/0! ] / [ e^{−λt}(λt)^1/1! ]
     = s/t

which is exactly the cdf of a U(0, t) random variable, so S1 | (N(t) = 1) ∼ U(0, t).

28.3 Theorem 4.6 (conditional joint distribution of n arrival times is the n order statistics
with U (0, t))
Theorem 28.2. Let {N (t), t ≥ 0} be a Poisson process at rate λ. Given N (t) = n, the conditional joint distribution
of the n arrival times is identical to the joint distribution of the n order statistics from the U (0, t) distribution.
That is
(S1 , S2 , . . . , Sn ) | (N (t) = n) ∼ (Y(1) , Y(2) , . . . , Y(n) )
where {Yi }ni=1 is an iid sequence of U (0, t) random variables.

Proof. For 0 < s1 < s2 < . . . < sn < t, consider

P(S1 ≤ s1, s1 < S2 ≤ s2, . . . , sn−1 < Sn ≤ sn | N(t) = n) = P(S1 ≤ s1, s1 < S2 ≤ s2, . . . , sn−1 < Sn ≤ sn, N(t) = n) / P(N(t) = n)

Denote the above by Q. Thus we have

Q = P(N(s1) = 1, N(s2) − N(s1) = 1, . . . , N(sn) − N(sn−1) = 1, N(t) − N(sn) = 0) / P(N(t) = n)
  = P(N(s1) = 1) P(N(s2) − N(s1) = 1) · · · P(N(sn) − N(sn−1) = 1) P(N(t) − N(sn) = 0) / P(N(t) = n)          (independent increments)
  = [ e^{−λs1}(λs1) · e^{−λ(s2−s1)}(λ(s2−s1)) · · · e^{−λ(sn−sn−1)}(λ(sn−sn−1)) · e^{−λ(t−sn)} ] · n! / [ e^{−λt}(λt)^n ]
  = (n!/t^n) s1 (s2 − s1) · · · (sn − sn−1),   0 < s1 < s2 < . . . < sn < t

Differentiating, we get the conditional joint density

∂^n Q / (∂s1 ∂s2 · · · ∂sn) = n!/t^n

which is the joint density function of the n order statistics from the U(0, t) distribution.

29 March 29, 2018


29.1 Example 4.4 solution (average waiting time)
Suppose customers arrive to a subway station in accordance with a Poisson process at rate λ. If the subway train
departs promptly at time t (for some time t > 0) what is the expected total waiting time of all customers arriving
during the interval [0, t]?

Solution. Let N(t) count the number of customers arriving to the station by time t, where N(t) ∼ POI(λt).
Letting Si denote the arrival time of the ith customer to the station, Wi = t − Si represents the waiting time of the ith customer.
The total waiting time of all customers is

W = Σ_{i=1}^{N(t)} Wi = Σ_{i=1}^{N(t)} (t − Si)

Goal: calculate E[W].

Remark 29.1. We cannot use the result for a random sum of random variables (E[W] = E[N(t)] E[Wi]), since two of its assumptions are violated here: (1) N(t) and the Wi are not independent, and (2) the Wi are not iid.

Note that

E[W] = E[ Σ_{i=1}^{N(t)} (t − Si) ]
     = Σ_{n=0}^∞ E[ Σ_{i=1}^{N(t)} (t − Si) | N(t) = n ] P(N(t) = n)
     = Σ_{n=1}^∞ E[ Σ_{i=1}^{n} (t − Si) | N(t) = n ] P(N(t) = n)          (W = 0 if N(t) = 0)
     = Σ_{n=1}^∞ E[ nt − Σ_{i=1}^{n} Si | N(t) = n ] P(N(t) = n)
     = Σ_{n=1}^∞ { nt − E[ Σ_{i=1}^{n} Si | N(t) = n ] } P(N(t) = n)

By Theorem 4.6, (S1, . . . , Sn) | (N(t) = n) ∼ (Y_(1), . . . , Y_(n)), where (Y_(1), . . . , Y_(n)) are the n order statistics from the U(0, t) distribution.


Thus

E[W] = Σ_{n=1}^∞ { nt − E[ Σ_{i=1}^{n} Y_(i) ] } P(N(t) = n)
     = Σ_{n=1}^∞ { nt − E[ Σ_{i=1}^{n} Yi ] } P(N(t) = n)          (the order statistics are just the Yi summed in a different order)
     = Σ_{n=1}^∞ { nt − Σ_{i=1}^{n} E[Yi] } P(N(t) = n)
     = Σ_{n=1}^∞ { nt − nt/2 } P(N(t) = n)
     = (t/2) Σ_{n=0}^∞ n P(N(t) = n)
     = (t/2) E[N(t)]
     = λt^2/2

Note that E[Y_(i)] ≠ t/2 in general, since Y_(i) is not just any Yi but the ith order statistic, whose mean depends on i; this is why we first convert the sum of order statistics back into the sum of the unordered Yi in the second line above.
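As an optional simulation check (a sketch; the rate λ = 2 and horizon t = 3 are arbitrary, and the arrivals are generated using the conditional-uniformity property from Theorem 4.6):

import numpy as np

rng = np.random.default_rng(4)
lam, t, n_reps = 2.0, 3.0, 100_000
total = 0.0
for _ in range(n_reps):
    n = rng.poisson(lam * t)                 # number of arrivals in [0, t]
    arrivals = rng.uniform(0, t, size=n)     # given N(t) = n, arrival times are iid U(0, t)
    total += np.sum(t - arrivals)            # total waiting time for this replication
print(total / n_reps)                         # close to lam * t**2 / 2 = 9
print(lam * t**2 / 2)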

29.2 Example 4.5 solution (pdf and joint probability of arrival times)
Satellites are launched at times according to a Poisson process at rate 3 per year.
During the past year, it was observed only two satellites were launched. What is the joint probability that the first
of these two satellites was launched in the first 5 months and the second satellite was launched prior to the last 2
months of the year?

Solution. Let {N(t), t ≥ 0} be the Poisson process at rate λ = 3 governing satellite launches. We are interested in calculating

P(S1 ≤ 5/12, S2 ≤ 5/6 | N(1) = 2)

Given N(1) = 2, we use Theorem 4.6 to obtain the conditional joint pdf of (S1, S2): (S1, S2) | (N(1) = 2) ∼ (Y_(1), Y_(2)), where {Yi}_{i=1}^2 are iid U(0, 1).
Recall that the joint pdf of the 2 order statistics from U(0, 1) is

g(s1, s2) = 2!/1^2 = 2,   0 < s1 < s2 < 1

We use this conditional joint pdf and integrate it over the following region.

Figure 29.1: The support of our pdf is the top left triangle in the diagram. We want to integrate over the shaded
region.

So

P(S1 ≤ 5/12, S2 ≤ 5/6 | N(1) = 2) = ∫_0^{5/12} ∫_{s1}^{5/6} 2 ds2 ds1
                                  = 2 ∫_0^{5/12} (5/6 − s1) ds1
                                  = 2 [ 5s1/6 − s1^2/2 ]_0^{5/12}
                                  = 25/48
                                  ≈ 0.5208

29.3 Example 5.1 solution


A Poisson process {X(t), t ≥ 0} at rate λ is an example of a CTMC. Determine the values of vi , i ∈ N and construct
the TPM of its embedded Markov chain.

Solution. Clearly X(t) ∈ N for all t ≥ 0. Moreover, by the way a Poisson process is constructed, we have
Ti ∼ EXP (vi = λ), i ∈ N.
The corresponding TPM of the embedded Markov chain looks like

0 1 2 3 4 ...
 
0 0 1 0 0 0 ...
1 
 0 0 1 0 0 ... 

P = 2 
 0 0 0 1 0 ... 

3 
 0 0 0 0 1 ... 

.. .. .. .. .. .. ..
. . . . . . .
