0, the graphs of y = P(s) and y = s for 0 ≤ s ≤ 1 have at most two points in common. One of these is s = 1. If P'(1) = m ≤ 1, then in a left neighborhood of 1 the graph of y = P(s) cannot be below that of y = s, and hence by convexity of P(s) the only intersection is s = 1. In the contrary case, if P'(1) = m > 1, then in a left neighborhood of 1 the graph of y = P(s) is below the diagonal, and there must be an additional intersection of the two graphs to the left of 1. See Figure 1.4.1.

Figure 1.4.1

Note in the binomial replacement example that s = P(s) yields the equation

s = q + ps,

whose only solution is s = 1, which agrees with the fact that m = p ≤ 1.

We now give the following complement, which ties in with the continuity theorem of the next section.

Complement. For 0 ≤ s < 1 we have P_n(s) → π as n → ∞. (The point s = 1 must be excluded when m > 1, since P_n(1) = 1 for all n while π < 1.)

For 0 ≤ s ≤ π we have s ≤ P(s) ≤ P(π) = π. Since P(s) is non-decreasing, from the previous inequalities we get P(s) ≤ P_2(s) ≤ P(π) = π. Continuing in this manner, for the general case we obtain

s ≤ P_n(s) ≤ P_{n+1}(s) ≤ ⋯ ≤ π,

so {P_n(s)} is non-decreasing on [0, π]. By the same reasoning applied on π < s < 1, where π = P(π) ≤ P(s) < s, we conclude that {P_n(s)} is non-increasing for π < s < 1. Let P_∞(s) = lim_{n→∞} P_n(s). Suppose for some s_0 ∈ (π, 1) we have P_∞(s_0) = a > π. Then, by continuity of P,

P(a) = lim_{n→∞} P(P_n(s_0)) = lim_{n→∞} P_{n+1}(s_0) = a,

which is impossible since on the domain (π, 1) we have P(s) < s. Hence P_∞(s_0) = π. For 0 ≤ s_0 ≤ π the limit is likewise a root of P(t) = t lying in [0, π], and therefore equals π.

What P_n(s) → π says is that

Σ_{k≥0} P[Z_n = k] s^k = P[Z_n = 0] s^0 + Σ_{k≥1} P[Z_n = k] s^k → π = π · s^0 + Σ_{k≥1} 0 · s^k.

Anticipating the Continuity Theorem for generating functions presented in the next section, this implies that

P[Z_n = 0] → π,   P[Z_n = k] → 0, for k ≥ 1.

In fact, using Markov chain or martingale methods, we can get a stronger result, namely that

P[Z_n → 0 or Z_n → ∞] = 1

and

P[Z_n → 0] = 1 − P[Z_n → ∞] = π.

The simple branching process exhibits an instability: either extinction occurs or the process explodes.

Example. Harry Yearns for a Coffee Break. In order to help some friends, Harry becomes the east coast sales representative of B & D Software. The software has been favorably reviewed and demand is heavy. Harry sets up a sales booth at the local computer show and takes orders. Each order takes three minutes to fill. While each order is being filled, there is probability p_j that j more customers will arrive and join the line. Assume p_0 = .2, p_1 = .2 and p_2 = .6. Harry cannot take a coffee break until service is completed and no one is waiting in line to order the software. If present conditions persist, what is the probability that Harry will ever take a coffee break?

Consider a branching process with offspring distribution given by (p_0, p_1, p_2). Harry can take a coffee break if and only if extinction occurs in the branching process. We have

P(s) = .2 + .2s + .6s^2   and   m = (.2)(1) + (.6)(2) = 1.4 > 1,

so s = P(s) yields the equation

s = .2 + .2s + .6s^2.

Therefore we must solve the quadratic equation

.6s^2 − .8s + .2 = 0,

and the two roots

( .8 ± √( (.8)^2 − 4(.6)(.2) ) ) / ( 2(.6) )

yield the numbers 1 and 1/3, and thus π = 1/3. So the probability that Harry can ever take a coffee break, if present conditions persist, is 1/3.

When P(s) is of degree higher than two, solution by hand of the equation s = P(s) may be difficult, while a numerical solution is easy. The procedure is first to compute m = Σ_{k≥0} k p_k. If m ≤ 1, then π = 1 and we are done. Otherwise we must solve numerically; root finding packages are common. A program such as Mathematica makes short work of finding the solution. Typing

Solve[P(s) - s == 0, s]

will immediately yield all the roots of the equation, and the smallest root in the unit interval can easily be identified. Alternately, π can be found by computing π = lim_{n→∞} P_n(0). The recursion

s_0 = 0,   s_{n+1} = P(s_n)

can be easily programmed on a computer and the solution will converge quickly.
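As a concrete illustration (this sketch is not part of the original text), the recursion takes only a few lines of Python; the function name, tolerance and iteration cap below are illustrative choices. Applied to Harry's offspring distribution p_0 = .2, p_1 = .2, p_2 = .6, the iterates P_n(0) increase quickly to π = 1/3.

```python
def extinction_probability(p, tol=1e-12, max_iter=10_000):
    """Iterate s_{n+1} = P(s_n) starting from s_0 = 0.

    p[k] is the offspring probability p_k, so P(s) = sum_k p[k] * s**k.
    The iterates P_n(0) increase to the extinction probability pi.
    """
    s = 0.0
    for _ in range(max_iter):
        s_next = sum(pk * s**k for k, pk in enumerate(p))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

# Harry's coffee-break example: P(s) = .2 + .2s + .6s^2, so pi should be 1/3.
print(extinction_probability([0.2, 0.2, 0.6]))   # ~0.333333...
```

When m ≤ 1 the same iteration simply converges to 1, so the routine handles both cases.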
Tn fact the convergence will be geometrically fast since for some ee (0,1) O< rm p40. ‘The reason for this inequality is that by the mean value theorem and the monotonicity of PY 19 = Pals) ~ Pal) $ Pal) We need to check that Pile) = (P(e) and PR) <1 Since Prsals) = Py(P(s))P (8) wwe get Psalm) = Pula) P'(m). ‘The difference equation when iterated shows the desired power solution. It remains to check that P'(r) < 1. If this were false and P"(r) > 1, then for s >, by monotonicity and the mean value theorem, we get Pls) Pin) 2 Pils 7) 2 (8-7), | Ps)zrts—r= which for s > 7 is a contradiction since on (7,1) we have P(s) < ¢ (as: suming P(s) is not linear) 1.8. Law Distrisurions aN THE ConTINUITY THEOREM 27. 1.5. Limtr Disterputions anp THE Cowninurry THEOREM. Let {Xn,n > O} be non-negative, integer valued random variables with (n20,k 20) (asa) PIX, wf, Pals) Ee ‘Then X,, converges in distribution to Xo, written Xq = Xo, if (1.5.2) im pf? =f for k=0,1,2,.... As the next result shows, this is equivalent to (1.5.3) Pals) > Fo(s) for0 0) i probability mass function on {02,2} 50 thr _ AWP20, Sa = ‘Then there exists a sequence {pf’,k > O} such that (say Lim of? =p, & 20, iff there exists a function Fy(s),00 we may pick m so large that We have ede? 14+ ior Yial —wP ire Letting n +00, we got Pols) Se and because ¢ is arbitrary we obtain (1.5.5). ‘The proof of the converse is somewhat more involved and is deferred to the appendix at the end of this section, which ean be read by the interested student or skipped by a beginner. 15. Lure Distaupvrions ano THE ConTINUITY THEORES — 29 Example. Harry and the Mushroom Staples.* Bach morning, Harry Duys enough salad ingredients to prepare 200 salads for the lunch crowd at his restaurant. Included in the salad are mushrooms which come in small boxes held shut by relatively large staples. For each salad, there is probability .005 that the person preparing the salad will sloppily drop fa staple into it. During a three week period, Harry's precocious twelfth grade niece, who has just completed a statistics unit in high school, keeps track of the number of staples dropped in salads. (Harry's customers are not reticent about complaining about such things so detection of the sin and collection of the data pose no problem.) After drawing a histogram, the niece decides that the number of salads per day containing a staple is Poisson distributed with parameter (200)(.005) = 1. Harry's niece has empirically rediscovered the Poisson approximation to the binomial distribution: If Xq ~ (K;n, p(n) and lim EX, = € (0,00), (1.58) ta mpl then +X asm — co where Xo ~ p(k). ‘The verification is easy using generating functions, We have sig, PA) ~ in Be = i (plo) + der = lim ( + Sa dnelny using (1.5.8). Appendii Continuation of the Proof of ‘Theorem 1.5.1. We now return to the proof of Theorem 15.1 and show why’ convergence of the generatog Functions implies convergence of the sequences ‘Assume we know the following fact: Aay sequence of mass functions {CF ",5 > 0}, m2 1} has a convergent subsequence {{f",j > 0}} meaning that forall j Jim ‘exists. If {pl”} has two different subsequential limits along {n'} and {n"}, by the fist half of Theorem 15.1 and hypothesis (1.53), we would have Tim, Sop = im. Ps) = Pols) im Im A semi-true story30 PRELIMINARIES and also dim Cpt st = fim Pols) = Pols) ‘Thus any two subsequential limits of {p{")} have the same generating func- tion. 
Since generating functions iniquely determine the sequence, ll sub- sequential limits are equal and thus lim, qo p{” exists for all k. ‘The limit has a generating function Po(s)- Tt remains to verify the claim that a sequence of mass functions {U3 > 0}, m > If has a subsequential limit, Since for each m we have {4.5 20) (0.1, and (0,1) is a compact set (being a produet of the compact sets [0,1]), ‘we have an infinite sequence of elements in a compact set. Hence a subse- ‘quential limit must exist, Ifthe compactness argument is not satisfying, a subsequential Iimit can be manufactured by a diagonalization procedure, (See Billingsley, 1986, page 568.) 1.5.1. THe Law oF Rare EVENTS. ‘A more sophisticated version of the Poisson approximation, sometimes called the Law of Rare Events, is discussed next, yn 1.5.2. Suppose we have a doubly indexed array of random ch that for each n= 1,2,..., {Xnark > 1), is a sequence of independent (but not necessarily identically distributed) Bernoulli random variables satisfying (5.11) Pa == pln) =1- PX ne =O), (15.12) VV. pala) =6(n) +0, neo, as.) Yopuln) = BY Xan A (0,00), +00. Jf PO(A) is a Poisson distributed random variable with mean A then Yo Xen + POW). 15. Liwtr Disreisurions AND THe Conrmeutry Tusonem 31 Remarks. ‘The ellect of (1.5.1.2) is that each Xy4,k = 1)... yn has a uniformly smal] probability of being 1. Think of X,,« 4s being the indicator of the event Ant, viz Xn = LAqy in which case So Xae= Dd = the number of 44,1 < #2, then in order for the random walk to go from 0 to 1 in steps the first step rust be to ~1 (which has probability g). From ~1 the walk must male its way back up t0 0. Say this takes j steps. Then it seems reasonable that the probability of the walk going from —1 to 0 in j steps is gj. From 0 the random walk still must get up to 1. Say this takes & steps. Then this probability should be dy and the constraint on j and kis that 1+j-+= ‘where the 1 is used forthe intial step to ~1. Thus the equation should be SoS wea ‘The argument just given seems plausible and we now make it precise. (These who foud the argument convincing can skip to (1.6.2) For n > 2-we have (162) WWenl= Ubi = 194 n Bes34 PRELIMINARIES where the random wall: makes a fast retura from —1 to O in j steps} ee a the random walk makes a first passage from 0 to 1 in naj 1 steps] sfintin So Xyaee = m5 Ih When n = 2, interpret the right side of (1.6.1) a8 8, the empty set. Note Ay is dotermined by Xz,...,Xj41, and similarly By,-j-1 is determined by Xy42y.o-1%n ‘Thus, the three events Pa =-a), Banta are independent because they depend on disjoint blocks of the X's. Sinez the union in (1.6.1) is union of disjoint events we have on aPC, P(Ba- 5-3) PIN Now (1, Xo, } (XX, Xe} ning the finite dimensional distributions of both sequences are identi cal; ie, for any mand sequence ky... of elements chosen from {1,1} wwe have PI = hays Xm PUK = bay Xen Fel since both sequences are just independent, identically distributed. There- ma] P(A,) = Phin ‘and similarly PByj-1 = Onis 16. THE SnaPte Ranpom Wack 35 from which we get the recursion 40 =0,1=7 (1.6.2) bn = DW jbngars 2B cat ‘This difference equation summarizes the probability structure. To solve, multiply (1.6.2) by s* and snm over n. Set &(s) = Dig dns”. 
We have aes) See (Serer) * “E(Seown)a Reversing the summation order (note n ~2> j > Oimplies n > j+2), we set the above equal to Setting m= n—j~1 yields E(Ear)onm BSR) osshs = 90818) S450" -gs0"(s), ‘The left side of (1.6.3) is and we conclude36 Preuiinanies Solve the quadratic for the unknown (3). We get (0) = (1+ VI 4pas?) /2a8. ‘The solution with the “4 siga is probabilistically inadmissible. We know (0) <1 but the solution with the “+” sign has the property La YT= apg? 11 2 es gs ~ Dgs as ¢ +0 (where a(3) ~ B(s) as s —+ 0 means lim, 9 a(s)/4(s) = 1). So we conclude T= tas (1.6.4) a= Osse2 We can expand this to get an explicit sokution for {n) using the Binomial ‘Theorem. On the one hand we know (3) = D7 dns™ and on the other, by expanding 1.6.4, we also have 6) (: E Q)corer) on ‘The “1” and the *j = 0” terms cancel; taking the minus sign in front of, the sum inside the sum yields SUD by By Bis) = (-1)°? (4pq)4844 /298 =) so (U2 jar 409)! soja EC ow eg jst (s+ Os? +. We conclude ons 2) (ay s#1(4pqy8 (P)cwearaiiaa 524 ‘and, for the even indices, we have for j > 1 16. Tae Simpce Ranbom WALK 37 bys =0. For obtaining qualitative conclusions, it may be easier to extract in- formation from the generating function. For instance, from (1.6.4) PIN 0] > 0 and on the set of positive proba- bility iq > 0. OmolSn <0) the gambler is never ahead ‘When P[V = co] > 0 we have by defaition EN =o. When p > q we compute EN = (1) by differentiating in (1.6.4) 24st = 4oas?)-¥*( Spgs) — (1 VT — 4oas")2a ag?s® (ugly but correct), and letting s 7 1 yields en = (2a( ) = 200 vI= a) Hq. mumerator and denominator by 2q we get Be) = i a ew = (4 (1-19) /20 2p _ (= \p~al) p-al ee38 PRELIMINARIES and so EN {Geos ine An extension of these techniques can be used to analyze the distribu- tion of the first return time to 0. Define Np = infin 1:8, =0), st fo = 0 and fan = PlNo = 2], n> 1. Also . FO) = fans™, OS #51. Now we have At inf{m: Dh Xin eee { 14+ infin: Sh, Keg Yon [Xs 1) on [X Set wr ttte Sax meas SX =} = N and observe that because {X,,i 2 1} 4 {Xi 2 2} we have NW 4 Nt ‘Also N* is determined by (Xi4a,i 2 1) and is therefore independent of X;. Similarly N~ is independent of X;. We have Pls) = Bs" = Bey yaa) + BN SES eyo y+ BS yas, By independence this is = Bs" PIX, = -1]+ Es" PIX, = 1] 6.7) = o0(s\q+ spEs* 16. THE SmaPLe Ranpom Wak 39 Note Wo = inffns So Xiga = 1) Snffa SO = = inf(n SX) =) infin: > XP* = 1} Moreover, the process {Sf Xifsn 2 1) is a simple random walk with step distribution PIX = Pixt 1-%1 = 1] = PX; P. To get P= BN, vee simply use the formula (1.6.4) with p and g reversed. Consequently, from (1.6.7), Fs) 90 (iS = ME) +p (=== ae) gs ps (1.6.8) =~ v1— pas. Ruther, PUI) = Ng < oo] = 1 = T= Apa = 1 [pal 0 1 fpw@ PINs g dp, ifp 2, then X isa random vector. (3) If5 = R®, then X is a stochastic sequence; ie, a stochastic process with a discrete index set. So a stochastie process with index set {0,1,...} sa random element of R®. (4) If $= C, the space of continuous functions on (0,0), then X {is a continuous path stochastic process. A prominent example of such » process is Brownian motion, (6) If S = M,(B), the space of all point measures on some nice space F, then X is a stochastic point process. Some promi- nent examples when £ = (0,00) are the Poisson processes and renewal processes, Other examples abound. 
‘The distribution of the random element X is the probability measure on (5,5) induced by X, namely Po X-!, so that for Be S PoX""(B) = PIX € Bi AAs before, the distribution of X determines the probabilities of all events determined by X, namely X~(5). Usually it is convenient to find a small cass of sets, as we did in the case of ranclom variables, so that Po X~? is determined by its values on this small class. Recall that the relevant technique is the consequence of Dynkin's Theorem (Billingsley, 1986, p. 38) given next. Proposition 1.7.1. IFC is a class of subsots of S which is closed under finite intersections and generates S, i, if o(C) = S, and iftwo probability measures P,, Pp agree on C, thon P, =P, on S, Sequence Space: We now concentrate attention on RO = (x:x = (ey12,...) andy € R, i> 1} since this class is most appropriate for discrete time stochastie processes. Let C be the class of finite dimensional rectangles in R™; ue, AE C if there exist real intervals Jy,... 1 for some & and A= {xe R® neh, a Note that C is closed under intersections, and it is also true that 72, the cralgebra generated by the open sets in R, is generated by the finite dimensional rectangles in C (cf. for example, Billingsley, 1968). So we have the important conclusion that any measure on (R™, R®) is uniquely ‘determined by its values on the finite dimensional rectangles C.a2 PRELIMINARIES Suppose X= (X;, Kay...) is a random element of R defined on the probability spoce (Q,4,P). lis distribution Po X~ is determined by its values on the finite dimensional rectangles C. ‘This can be expressed another way. We say two random elements X and X’ are equal in distribution (written X £ X") if PoX-! = Po (X")-! on R™. X’ is then called a version of X Proposition 1.7.2. IfX and X! are two random elements of R™ then xéx’ if for every KEV: (yy... Xa) 4 (XY. XD € RE Proof. Define the projections My : °° ++ RE by Hye, 22,-+-) (215. 4) Each Tl, is continuous and hence measurable. IfX £ X" then also Ta(X) = Clays Me) SMR) = (Xp... VQ) 1s desired. Conversely if for every BE 1: (Nyy. Xs) £ (Xhyee- AD ERE then the distributions of X and X’ agree on C and hence everywhere, Ml Call the collection of distributions PoX"tolk() = PU(X1,.-- Xa) E41 on RE (k > 1) the finite dimensional distributions of the process X and our proposition may be phrased as the distribution of a process is determined by the finite dimensional distributions. Define a new class €’ as follows: A set \’ isin C’ if i is of the form N= {y eR? ry Sayi ohh for some & > 1,(2y,... 24) € RE Note that C’ is still closed under Intersections and still generates R™; also Pox al) = PIX San... Xe S tah 17. THE DISTRIBUTION OF A PROCESS 43 which is a &-dimensional distribution function. The analogue of Proposition 1.7.1 is that the distribution of the process is determined by the finite dimensional distribution functions. ‘Two random elements X,X’ in R which are equal in distribution will be probabilistically indistinguishable. ‘This last statement is somewhat vague. What is meant is that any probsbility caleulation dane for X yields the same answer when done for X’. (This rephrases the statement Pik €B)= PK eB, WeR™) In succueding chapters we frequently will construct 9 convenient represen- tation of the stochastic process X= (Xe). (This was already done with the branching procass.) We are assured that any othor version X= {Xi} will have the same properties as the constructed X. 
Here is one last bit of information: Define the coordinate map 7 Bes Rby miltytay-- for k > 1. The following shows there is nothing mysterious about a mea- surable map from 9 to R*. Proposition 1.7.3. If X is a random element of f° then for each k > 1 we have that m_(X) i a random variable. Conversely, if Xi,X2,... are random variables defined on ®, then defining X by Xu) = (w), Nal), --) ‘Yields a random element of R. ‘A random element then is just a sequence of random variables. Proof. my is continuous and hence measurable. Therefore if X is a random element of R, meoX 06 RB being the composition of two measurable maps is measurable and hence is 4 random variable For the converse, we must show XURB) CA. However, R® = o(C) and XMo(O)) = o(X"C))44 PRELIMINARIES But for a typiesl A € C. XNA) = eG eLied since X1,..- Xe are random variables. Hence XMOQCA and (XO) CA 1s desired. 1.8. SrorPine Times. * Information in a probability space is organized with the help of o-fields If we have a stochastic prooess (Xq,n > 1} we frequently have to know what information is available if we hypothetically observe the process for n time units, Imagine that you will observe the process next week for m time units, The information at our disposal today from this hypothetical future experiment is the o-algebra generated by X;,-..,%y which we denote as O(Xpy.++yXq)- Another way to think about this i that o(X,...Xn) consists of those events such that when we know the value of Xsy...4Xny wwe can decide whether or not the events occurred. Note for n> 0 (Xin) CO(Kty soy Xnt) and the information from hypothetically observing the whole process is of X,5 21) = of) olXs,.- Xn). In general, suppose we have a probability space (9,4, P) and an in- creasing family of o-Belds F,,n>0; ie, Fy C Fasr C A. Define Vanna) Fee * This section contains advanced material which may be skipped om first reading by beginning readers 1.8, SroPeING Times 45 30 that Fa C Fog CA, Think of (Fa,0 1: Sq =} Js a stopping time. Note the convention: Tie infimum of sn empty set is 490. So [Sn <1, for all n] = [{n 21: Sy = 1} = 0] =[W = oe] We have that, N a stopping time with respect to {F,,m > 0} where Fo= (00), Fa=o(X1,..-)Xn)n 2 1 ‘The reason is that for n> 1 iv ($1 1 frp =n] = Ko € BY... Gun € BG EB] E Fr. In Markov chain models, the state space tay be {0,2,...}- A typical case is that B= {0}. We are interested in 79 where {2 0&4 =O} Back to the general discussion. If « is a stopping time with respect to {Fa}, the information up to time a is contained in the o-feld 7 Fa ={N€ Foo: AN a= nl € Fa for all 1 1}- It is the information available after time a 1.8, STOPPING ThaEs ar 1.8.1, WAto’s IDewriry, Wald’s identity and its generalizations are special cases of martingale stop- ping theorems. They are useful for computing moments of randomly stopped sums, although checking the validity of moment: assumptions nee- essary for the identities to hold ean be tricky. We have already seen an identity lke Wald’s in (1.3.4.3) Ifyou did not read Section 1.7, think of a stopping time a with respect to the soquence {Xn,n > 1} as a random variable such that the sot [a = m] is dotermined only by X1,...,Xwms for any m > 1. Thus a takes on the value m regardless of future (beyond time m) values of the process, Proposition 1.8.1. Suppose {Xq.n > 1} are independent, identically distributed with E|X,| < 00. 
Suppose @ is a stopping time with respect to {Xan > 1} and Ea =E(X)Ea fom Lemma 1.1.1 ‘The rest of the proof justifying the interchange of E and 22, requires f bit of measure theory. A student. who does not have this background should skip the rest of the proof and proceed to the example below. For ‘those who continue, note ED XAtcail = BDO 1XMice48 PRELIMINARIES Since all terms are positive, Fubini or monotone convergence justify the interchange and SF IKMjsai = BUG IBA < 20 by assumption. ‘Therefore the function of two variables é and w Xu) coil) js absolutely integrable with respect to the product of P and counting measure. This justifies a Pubini interchange of the iterated integration. Ml Example. Consider the simple random walk (Sq) with Sp = 0 and set N =inf{n > 1:84 =1} Recall P[X; = 1] = p= 1—P[Xy = —1] 0 that BX; = p—g. On [N < oe] wwe have Sy =1. If EN < oo then Wald’s identity holds and Sy E(X)EN = (p-)EN. If p = q, we get a contradiction: If BN < oo then PIN’ < oo] ‘moreover, on the one hand ESy=EL=1 (since Sy = 1) ‘and on the othe, by Wald Sy —hew=0 Hence BN ~ 00.11 EN < co and p < q then Wald implies 1 = (p-)EN < 0, a contradiction. So we get the weak conclusion EN = oo, whereas we know from (1.6) that in fact PIN’ = oo] > 0. If p > g and EN < co we conclude from Wald EN = (p~ 4), in ageeoment with (2.6.6), but this argument does not prove EN < co, 1.8.2. SpurrrinG AN IID Sequence ar SroppiNe Trust Suppose {Xn,n > 0} are iid random elements and set Fa =a(Xoy... Xa), Fa =a Xngs Xng aioe) “This section contains advanced material which may be skipped on fist reading by beginning readers. However, the main result Proposition 1.8 2s easy ‘to understand in the case that @ i five as. 1a. Sroprine Times 49 ‘Fa sepresents the history up to time n and is the future after n, For jid random clements, these two o-fields are independent. ‘The same is true svien nis replaced by a stopping time a; however care must be taken to handle the case that Pla = co] >0. As before, we must define Fu=V Fa= el Fe) ae tees If a= co it makes no sense to talk about splitting (Xa) into two piwous-—the pre-and posta pieces. Instead we restrict attention to. the Have probability space. If (9,4, ) is the probability space on which {G65} and ar are defined, the trae probability spnce is (O*, A®, P#) = (IA fe < oo), AN [er < oo], Pf-lar < 00}) {assuming Pla < 0] > 0). If Pla < oo] = 1, then there is no essential difference between the original and the trace space. Proposition 1.8.2. Let (Xq,n > 0) be iid and suppose a isa stopping time of the sequence (ie, with respect to {Fq}). In the trace probability space (2#, F#, P#), the pro- and post-a o-fields F,,F., are independent and {Xun 20} £ (Kaen 2 I fn the sense that for BE R™ (as2a) PH[(Xasesk 2 1) € B= Pl(Xqn 20) € Bl Proof. Suppose A © Fa, Then PEA [a < col Mi{Xasask2 1) € Bl (1.82.2) ¥Y PiAnia=nlA [Xnsek 21) € Bl} = From the definition of F, we have Anla=nle Fu Since [(Xqea,k 2 1) € B] € Fy which isindependent of F, we got (18.2.2) equal to SPianie PU Xneesk 2 1} € BL = So P(N [a= al] PX, b> 0} € B) =P In la < ol) PX, ope BI50, PRELIMINARIES By dividing (1.8.2.2) by Pla < oo] we conclude (1828) PH IAM (Xapisk 21) € Bl] = PA(A)PIEX #2 O} € Bl. Let A=. We conclude that (1.8.2.1) is true, Once we know (1.8.21) i true we may rewrite (1.8.23) 88 (18.2.4) PRIANXasask 2 1} € Bl] = PAA)PH Xa k 2M) € Bl Which gives the roquized independence. 
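Before the next example, here is a small Monte Carlo sketch (not part of the original text) of Wald's identity for the first passage time N = inf{n ≥ 1 : S_n = 1} discussed in Section 1.8.1. With p > q, Wald gives E S_N = (p − q) E N = 1, so E N = 1/(p − q). The function name, the choice p = .6 and the number of trials are illustrative assumptions.

```python
import random

def first_passage_time(p, max_steps=10**6):
    """Return N = inf{n >= 1 : S_n = 1} for a simple random walk with
    P[X_i = 1] = p, P[X_i = -1] = 1 - p, started from S_0 = 0."""
    s = 0
    for n in range(1, max_steps + 1):
        s += 1 if random.random() < p else -1
        if s == 1:
            return n
    raise RuntimeError("no passage to 1 within max_steps")

p = 0.6                      # p > q, so E[N] = 1/(p - q) = 5 and Wald applies
trials = 20_000
mean_N = sum(first_passage_time(p) for _ in range(trials)) / trials

# Wald: E[S_N] = E[X_1] * E[N]; here S_N = 1 and E[X_1] = p - q = 0.2,
# so the empirical mean of N should be close to 1 / 0.2 = 5.
print(mean_N, 1 / (p - (1 - p)))
```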
a Example, Let {Xq,n > 1} be iid Bernoulli, and let the associated simple random walk be 8 = 0,8, = Xi to + Xp mB We derive the quadratic equation for @(s) ~ Bs where N inf(n > 1: Sq = 1} without first deriving a recursion for {P[N = k],k > 1}. We have B(6) Ee Yyxjary + B8™ x3) p+ Bs™Yyy, aay Define inf 21: 0&4) =) i and on [Ny < oc] define My =int{n 21:2 Xystey = Ih so that on (M, < 00,X: = —1] Nel+M+% Define P# = Pl-|M < 20}. Now on [Ny < oo}, No is the same functional of {Xnenad 2 Up a8 Mh is of {Xignd 21} So feom the previous results, for any k, PIN = 8) = P*|Ns = 4] 1. EXERCISES bL ‘Thus for 0< 5-< 1 Bs xynna] = BM ye as<0) (Gince on [X1 = -1,Ni = oc] we have NV = 00 ands = 0). Let E¥ be expectation with respect to the measure P*, Then Be iyxyaay HSB Aye, coo} =sqb¥ s™PIN, < 00), and since N,N, are independent with respect to P# (by 1.8.2), this is asqbt EAN PIN, < 00] Using (1.8.2.1) we get =q #31 Ba™ PIN, < oo] 8B, Tp 22, Let {Xq,n 2 1} be iid Bernoulli random variables with PIX, =1)=p=1- PIX, = 0] and let S, = S7ZL, Xe be the number of successes in'n trials. Show Sy, has 4 binomial distribution by the following method: (1) Prove form 20,1 Sé 0} where X(t) i the quantity ordeced by time ¢ (b} Thirty-six points are chosen randomly in Alaska according to some probability distribution. A cirele of random radius is drawn about each point yielding a random set S. Let X(A) be the value of the oil in the ground under region ACS. The proces is {X(B), BC Alaska) (c) Sleeping Beauty sleeps in one of three positions (1) Onher back looking radiant, (2) Curled up in the fetal postion. (3) In the fetal position, sucking her thumnb and looking radiant only to an orthodontist. Let X(t) be Sleeping Beauty's position at time . The process is {X(t 2 0} (d) For n= 0,1,..., et X; be the value in dollars of property damage to West Palm Beach, Florida and Charleston, South Carolina by the nth hurricane to hit the coast of the United States. 1.5. IX is a nonnegative integer valued random variable with X~ {pe}, Pls) = Es*, express the generating functions if possible, in terms of P(s), of (a) P(X < ni}, (b) PLX 1) is independent, identically distributed. Define Sy = Xq=1 and for n> 1 Sn = Xot Xtc Xe For n > 1 the distribution of Xq is specified by PIX. =5-Wery 70,1, where Meat ss) =Vays' oss <1. ca = (he random walk starts at 1; when it moves in the negative direction, it docs so only by jumpe of —1. The walk connot jump over states when roving i the negative direction.) Let N =infln: Sa = 0} If Pls) = Bs, show P(s) — sf(P(s)). (Note what happens at the first step: Either the random walk goes from 1 toO with probability po or ftom 1 to} with probability py.) If f(s) = p/(1—gs) corresponding to a geometric distribution, find the smallest solution, 1.8. In a branching process Pls) =a8" +3 +e where a> 0,b > 0,¢> 0, P(1) = 1. Compute 9. Give a condition for sure extinction. 1.9. In a binomial replacement branching, model T= ini{n: Zq = 0} (1) Find PE =n) for n> 1 (2) Find P(T =n] assuming Z» Pls) = 9 tps, let 0 1.10, Harry lets his health habits stip during a depressed period and discovers spots growing between his toes according to a branching process with generating funetion P(s) = 15 +.058-4 039? 4.0783 + 494 +255? +.059°4 1. Exercises Will the spots survive? With what probability? LAL. A Point Process. Let 1V(A) be the number of points in re- sion A, Assume that for any n, the set A can be decomposed, A GAM, euch that AQ... AM? are disjoint, (4) = TENA) and N(A%),... ,.N(AS™) are independent. 
Assume PIN(A) = 0)= expl-A/n), PENCAI) 22] < As) where 6(2) isa positive function such that 6(z) —+ 0 as:z ~+ 0. Show (A) has a Poisson distribution. 1.12. For a branching process {Zn}, let $= 1+ Dy Zn be the total population ever born, Find a recursion which is satisfied by the generating fonction of 3. Solve this in the case P(s) = q+ ps and P(s) = p/(t ~ 98). What is B(S)? 1.13. Let [2] be the greatest integer <2. Check by integral comparison cof another such method that ise) ia, 2s Let {Xj,j 2 1} be independent random variables with PIX; 1/j=1- PIX; = 0) and set S, = D0, Xi 0 21 (1) What is the generating funetion of wa y x (2) Use the continuity theorem for generating functions to show = pl) = ete Swwa - Sw im, PlSiva~ Sw = (3) Define 11) = infj > 1: Xj = 1} Compute the generating function of {P[L(1) > n,n 2. What is ELC)? (4) Tf {Zq,n > 2) is 8 soquence of fd random variables with a contin- uous distribution, show that Ar,ovestent 2 US (Xp. 21) 1. EXERGIsES 55 1.14. Harry comes from a long line of descendents who do not get along with their parents. Consequently each generation vows to be diflerent from their elders. Assume the offspring distribution for the progenitor has enerating function f(s) and that the offspring distribution governing the umber of children per individual in the first generation has generating fimetion 9(s). The next generation has offspring governed by f(s) and the next has g(6) s0 that the functions alternate from generation to generation. Determine the extinction probability of this process and the mean number of individuals in the nth (assume n is even) generation, 1.18. (a) Suppose X is a non-negative integer valued random variable Conduct & compound experiment. Observe X items and mark each of the X items independently with probability » where 0 n| Ocsel Compute P{F > x 116. Stopping Times. (a) Ifa is a stopping time with respect to the ovfields {Fn} thon prove Fa is a 0-feld (b) Hag, k > 1 ate stopping tines with respect to (Fy), show Vion nd Agar are stopping times, (Note V means “max” and f'means “imin”.) If {a4} is a monotone inereasing family of stopping times thon liza, 's a stopping time (6) Mas $ a2 show Fay C Fay LAAT, For a simple random walk {S,} let up =1 and for n> 1, let tin = PISn = Compute by combinatorics the value of up. Find the generating function U(s} = Dj ugs” in closed form. To get this in elosed form you need the ()-eor(2) 1.18, Happy Harry's Fan Club. Harry’s restaurant is located near Orwell University, a famous institution of higher learning. Because of the ‘rucia} culinary, social and intellectual role played by Harry and the restau ant in the life of the University, a fan club is started consisting of two types of members: students and faculty. Due to the narrow focus of the club, ‘uembership automatically terminates after one year. Student members of56 1. EXERCISES the Happy Harry’s Fan Club are so fanatical they recruit other members when their membership expires. Faculty members never recruit because they are too busy. A student recruiter will recruit two students with prob- ability 1/4, one student and one faculty member with probability 2/3 and 2 faculty with probability 1/12. Assume the club was started by one stu- dent. After n years, what is the probability that no faculty will have yet ‘eon recruited? What is the probability tho club will eventually have no members? 1.19, At 2 AM business is slow and Harry contemplates closing his estab- lishment for the night. 
He starts flipping an unfair coin out of boredom and decides to close when he gets r consecutive heads. Let T' be the nutnber of fips necassary to obtain r consecutive heads. Suppose the probability of a head is p and the probability of a tal is q. Define px = P[T’ = k] so that Pe =O for kr Pa = PID =| = P'Qll — po ~ pa — = Phor-ah (2) Compute the generating function of T and verify P(1 (3) Compute ET. Ifyou are masochistic, try Var(). ‘The next night at midnight, Harry is bored, so again he starts flipping coins. To vary the routines he looks for a pattern of HIT (head then tail) For n> 2, let fu = P| the patter HT first appears at trial number n } ‘Compute the generating function of {fq} and find the mean and variance. 1.20. In a branching process, suppose P(s) = g+ps",0 1: Zq =O} (2) Find the probability of eventual extinction. (2) Suppose the population starts with one individual, Find P{T > n}. 1.21, Let {N(t), 2 0} be » process with independent increments which means that for any & and times 0S th So S te N(ty),N(ba) - Nith),---)N(te) ~ N(¢e-1) are independent random variables. Suppose for each ¢ that .V(t) is non-negative integer valued with Rs) = BX 1, Exercises 8T For 7 0}. Is this a branching process? If so, what is the offspring distribution which generates this process? 1.28. For a branching process with offspring distribution Pa=pa", n>0p+q=10 O} with generating function da(s) = Sop Panst. As before, let Zy be the number ia the nth generation. (1) Construct @ model for this population analogous with the eon- struction of Section 1.58. 1. EXERCISES (2) Express the generating function Jals) = Bs? in terms of ge(s),k 2 O where do(s) = s. (8) xpress my = BZ, in torms of yi > 0 where p= 4(2)- 1.25. Harry and the Management Software. Eager to give Happy Harry's Restaurant every possible competitive advantage, Harry writes in- ventory management software that is supposedly geared to restaurants. Hiarry, sly fox that be is, has designed the software to contain a virus that wipes out all computer memory and results in a restaurant being unable to continue operation. He starts by crossing the street and giving a copy to the trendy sprouts bar. The software is presented with the condition that the recipient must give a copy to two other restaurateurs, thus spreading. the joy of technology. The time it takes a recipient to find someone else for the software is random. Upon receipt of the software, the length of time until it wipes out a restaurant's computer memory {3 also random, Of course, once a resteurant’s computer memory is wiped out, the owner ‘would not continue to disburse the software. Thus a restaurateur may distribute the software to 0, 1 or 2 other restaurants. For j = 0,1,2, define pj ~ P(e restaurateur distributes the software to j other restaurants | Suppose py = .2,p: = Ip = 7. What is the probability that Harry's plans for world domination of the restaurant business will succeed? 1.26.* Suppose Xi, X» are independent, N(0,1) random variables on the space (8,A,P). (a) Prove X 4 (b) Prove 35 Le, prove that Po Xz? = Po(—Xy)"! on R. (1, Xa + Xa) & (Mi, Xa ~ Xa) in RY; ke, prove Po(Xa,X1 + X)7! = Po (Xi, Xa — Xz)" on R® Now suppose (Xi, > 1} is an iid sequence of N(0,1) random variables, * This problem requires some advanced material which shoud be skipped on ‘he frst reading by beginning readers. 1. BXBRCISES 59. (c) Prove (M+ Xa. in Re. 
(a) If X,¥ are random elements of a metric space S, and g: S++ S' is ‘a mapping from S to a second metric space 5”, show that X # ¥ implies oX) £9). 1.27. IX; has o negative binomial distribution with parameters pyr (of. Example 1.36, Section 1.3.3), show that ifr — oo and rq — > 0, then the negative binomial random variable X, converges in distribution to a Poisson random variable with parameter 2 1.28. Consider the simple branching process {Z} with offspring distribu tion {px} and generating function P(s). (a) When is the total number of ofispring "2.0 2, < oo? (b) When the total number of offspring is finite, give the finetional equation satishied by the generating function ¥(s) of [222.9 Zn < 00. (c) Zeke initiates a family line whichis sure to die out. “Lifetime earnings of each individual in Zeko's line of descent (including Zeke) constitute lid random variables which are independont of the branching process and have ‘common distribution function F(z), where F concentrates on (0,00). Thus vo each individual in the line of descent is associated a non-negative random variable. What isthe probability H(z) that no one in Zeke's line earns more than x im his/her lifetime, where of course «> 0. (4) Whea 5 = 35 P(s) find (3). Ifin addition, Fo) 2>0, find H(2).CHAPTER 2 Markov Chains N TRYING to make a realistic stochastic model of any physical situation, one is forced to confront the fact that real lifes full of dependencies. For example, purchases next week at the supermarket may depend on satis- faction with purchases made up to now. Similarly, an hourly reading of pollution concentration at a fixed monitoring station will depend on pre- vious readings; tomorrow's stock inventory will depend on the stock level today, as well as on demand. The number of customers awaiting service at 6 facility depends on the number of waiting eustomers in previous time periods, ‘The dilemma is that dependencies make for realistic models but also for unwieldy or impossible probebility calculations. The more independeace built into @ probability model, the more possibility for explicit calcula- tions, but the more questionable is the realisin of the model. Imagine the absurdity of a probebility model of a nuclear reactor which assumes each component of the complex system fails independently. ‘The independence assumptions would allow for calculations of the probability of a core melt- down, but the model is so unrealistic that no government agency would be so foolish as to base policy on such unreliable numbers—at least not for long. ‘When constructing a stochastic model, the challenge is to have depen- encies which allow for sufficient reatism but which ean be analytically tamed to permit sufficient’ mathematical tractability. Markov processes frequently balance these two demands nicely. A Markov process has the property that, conditional on a history up to the present, the probabilistic, structure of the future does not depend on the whole history but ony on the present. Dependencies are thus manageable since they are conditional ‘on the present state; the future becomes conditionally independent of the past. Markov chains are Markov processes with discrete index set and countable oF finite state space. We start with a construction of a Markov chain process {Xq,n > 0} ‘The process has a discrete state space denoted by S. Usually we take the state space Sto be a subset of integers such as {0,1,..} (infinite state space) or {0,1,...,m} (finite state space). 
When considering stationary Markov chains, it is frequently convenient to let the index set be the integers {…, −1, 0, 1, …}, but for now the non-negative integers suffice for the index set.

How does a Markov chain evolve? To fix ideas, think of the following scenario. During a decadent period of Harry's life he used to visit a bar every night. The bars were chosen according to a random mechanism. Harry's random choice of a bar was dependent only on the bar he had visited the previous night, not on the choices prior to the previous night. What would be the ingredients necessary for the specification of a model of bar selection? We would need an initial distribution {a_k} so that when Harry's decadent period commenced he chose his initial bar to be the kth with probability a_k. We would also need transition probabilities p_ij which would determine the probability of choosing the jth pub if on the prior night the ith was visited. Section 2.1 begins with a construction of a Markov chain and a discussion of elementary properties. The construction also describes how one would simulate a Markov chain.

2.1. CONSTRUCTION AND FIRST PROPERTIES.

Let us first recall how to simulate a random variable with non-negative integer values {0, 1, …}. Suppose X is a random variable with

P[X = k] = a_k,  k ≥ 0,  Σ_k a_k = 1.

Let U be uniformly distributed on (0,1). We may simulate X by observing U: if U falls in the interval (Σ_{i=0}^{k−1} a_i, Σ_{i=0}^{k} a_i], we report the value k. (As a convention here and in what follows, set Σ_{i=0}^{−1} a_i = 0.) Thus if we define

Y = Σ_k k 1_{(Σ_{i=0}^{k−1} a_i, Σ_{i=0}^{k} a_i]}(U),

so that Y = k iff U ∈ (Σ_{i=0}^{k−1} a_i, Σ_{i=0}^{k} a_i], then Y has the same distribution as X, and we have simulated X.

We now construct a Markov chain. For concreteness we assume the state space S is {0, 1, …}. Only minor modifications are necessary if the state space is finite, for example S = {0, 1, …, m}. We need an initial distribution {a_k}, where a_k ≥ 0 and Σ_k a_k = 1, to govern the choice of an initial state. We also need a transition matrix to govern transitions from state to state. A transition matrix is a matrix which in the infinite state space case is P = (p_ij, i ≥ 0, j ≥ 0) or, written out,

P = ( p_00  p_01  p_02  …
      p_10  p_11  p_12  …
       ⋮     ⋮     ⋮      ),

where the entries satisfy

p_ij ≥ 0,   Σ_j p_ij = 1,   for all i.

(In the case where the state space S is finite and equal to {0, 1, …, m}, P is (m+1) × (m+1) dimensional.)

We now construct a Markov chain {X_n, n ≥ 0}. We need a scheme which will choose an initial state k with probability a_k and will generate transitions from i to j with probability p_ij. Let {U_n, n ≥ 0} be iid uniform random variables on (0,1). Define

X_0 = Σ_k k 1_{(Σ_{i=0}^{k−1} a_i, Σ_{i=0}^{k} a_i]}(U_0).

This is the construction given above, which produces a random variable that takes the value k with probability a_k. The rest of the process is defined inductively. Define the function f(i, u) with domain S × [0, 1] by

f(i, u) = Σ_k k 1_{(Σ_{j=0}^{k−1} p_ij, Σ_{j=0}^{k} p_ij]}(u),

so that f(i, u) = k iff u ∈ (Σ_{j=0}^{k−1} p_ij, Σ_{j=0}^{k} p_ij]. Now for n ≥ 0 define

X_{n+1} = f(X_n, U_{n+1}).

Note that if X_n = i, we have constructed X_{n+1} so that it equals k with probability p_ik. Also observe that X_0 is a function of U_0, X_1 is a function of X_0 and U_1 and hence is a function of U_0 and U_1, and so on, so that in general X_{n+1} is a function of U_0, U_1, …, U_{n+1}. Some elementary properties of the construction follow.

1. We have

(2.1.1)   P[X_0 = k] = a_k,

and for any n ≥ 0

(2.1.2)   P[X_{n+1} = j | X_n = i] = p_ij.

This follows since the conditional probability in (2.1.2) is equal to

P[f(X_n, U_{n+1}) = j | X_n = i] = P[f(i, U_{n+1}) = j | X_n = i] = P[f(i, U_{n+1}) = j],

since U_{n+1} and X_n are independent. By the construction at the beginning of Section 2.1, this probability is p_ij.
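The construction just described is also a recipe for simulation. The following Python sketch (not part of the original text) draws X_0 from {a_k} and then X_{n+1} = f(X_n, U_{n+1}) from row X_n of P by inverting cumulative sums, exactly the interval scheme above. The function names and the three-state transition matrix (a hypothetical version of Harry's nightly bar choices) are illustrative assumptions; the state space is taken finite for simplicity.

```python
import bisect
import random
from itertools import accumulate

def sample_from(cumulative, u):
    """Return k such that u lies in (sum_{i<k} a_i, sum_{i<=k} a_i].

    bisect_left on the vector of cumulative sums implements the
    indicator-sum formula for f(i, u); the min() guards against
    floating-point rounding in the last cumulative sum."""
    return min(bisect.bisect_left(cumulative, u), len(cumulative) - 1)

def simulate_chain(a, P, n_steps, rng=random.random):
    """Simulate X_0, ..., X_{n_steps} of a Markov chain with initial
    distribution a = (a_0, ..., a_m) and transition matrix P = (p_ij)."""
    cum_a = list(accumulate(a))
    cum_P = [list(accumulate(row)) for row in P]
    x = sample_from(cum_a, rng())            # X_0 = k with probability a_k
    path = [x]
    for _ in range(n_steps):
        x = sample_from(cum_P[x], rng())     # X_{n+1} = f(X_n, U_{n+1})
        path.append(x)
    return path

# A hypothetical three-bar example of Harry's nightly choices.
a = [1.0, 0.0, 0.0]
P = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
print(simulate_chain(a, P, 10))
```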
As a generalization of (2.1.2) we show we may condition on a history with no change in the conditional probability provided the history ends in state i. More specifically we have (2.1.3) PiXnta = 3X0 = fos: for integers fo,i1, fay Xa = ina Xe As with property 1, this conditional probability is PUG Uns) = J1X0 = t0.-- Xa i and since Xo,...,Xy are independent of Uns, the foregoing probability PLSG, Vasa) = 51 = Pas Processes satisfying (2.1.2) and (2.1.3) possess the Markov property ‘meaning PiXnea = 3X0 = io, fat = int) Xn =i] = Plf(Xa, Unga) = j1Xn =i) 3. As a generalization of (2.1.8), we show that the probability of the future subsequent to time n given the history up to m, is the same as the Probability of the future given only the state at time n; and this conditional Probability is independent of n (but dependent on the state). Precisely, we have for any integer m and any states ky,...Km. PUK tt = bys y Xt = RimiXo = fos Kner = tna Xn = i pose Xam = hin Xn 14) Xin = bi Xo = ie64 Markov Cuamns In shorthand notation, denote the event [Xne1 = his) Xmam = ka] by [(Xj,5 2 n+ 1) € Bl. Note that in the probability of (2.1.4) we can replace X41 by f(i,Unt1), and we can replace Xnya by f(Xntts Una £(F (i, Unga), Una) and 90 on. Thus in the probability of (2.1.4) we can replace (X), j 2 n+1) by something depending only on Uy, 2 n-+1 which is independent of Xo,....Xq- Therefore the conditional probability is PUL, Untads fF lGUnta) Onsa)s---) € Bl Since this also equals PUSG,Us), AFG), Ua),---) € Bl, the result follows, "The three properties above are the essential characteristics of a Markov chain, Definition. Any process {Xn n > 0) satisfying (2.1.2)—(2.1.3) is called ‘a Markov chain with initial distribution {ax} and transition probability matrix P, Sometimes a transition probability matrix is called a Markow or a sto- ‘chastic matrix. ‘The constructed Markev chain has stationary transition probabilities since the conditional probability in (2.1.2) is independent of n. Sometimes Markov chain with stationary transition probabilities is called homo- geneous. ‘Warning. Although the constructed process possesses stationary transition probabilities, the process in general is not stationary. For the process {Xq} to be stationary, the following condition, describing a translation property of the finite dimensional distributions, must hold: For any non-negative Integers m,v and any states ko, kg We have PIX = bo, (Roughly speaking, this says the statistical evolution of the process over an Snterval isthe same as that of the process over a translated interval.) The concept of a Markov chain being a stationary stochastic process and having stationary transition probabilities should not be confused. Conditions for the Markov chain to be stationary are discussed in Section 2.12. ‘The process constructed above will sometimes be referred to as the sim- tlated Markay chain. We will show in Proposition 2.1.1 that any Markov chain {Xf} n > O} satisfying (2.1.1), (21.2) will be indistinguishable from the simulated chain (Xp) in the sense that Xm = Kel PX = Boyes Kot = kr {Xun > 0} £4X#,n > 0}, 21. CoNsTaucTION AND Finst PRoPERriEs 6s that is, the finite dimensional distributions of both processes are the same. ‘Together, the ingredients {ax} and P in fact determine the distribution of the process as shown next. Proposition 2.1.1. 
Given a Markov chain satisfying (2.1.1)-(2.1.3), the finite dimensional distributions are of the form (215) PIXy=i0.% soo Me = i Pits Pia lor ig,» six integers in the state space and k > 0 arbitrary. Conversely siven a density {a4} and a transition matrix P and a process {X,} whose finite dimensional distributions are given by (2.1.5), we have that {Xn} is 4 Markov chain satisfying (2.1.1)-(21.3). So the Markov property, ie. (2.1.2)-(2.1.3), can be recognized by the form of the finite dimensional distributions given in (2.1.5). Proof. Reeall the Chain Rule of conditional probability. If Ao,..- Ax are ‘events then PAD = PALF) A) PUAnal PAs) PCAnlAo) P(A) a a be provided P((Yup Ai) > 0,5 = O,1,...4k = Suppose (2:1.1)-(2.1.8) hold and Set A; = [Xy = i] 90 that if (21.6) P\X> = ij] > 0, =0,..,k—1 then, PIX: and applying (2.1.3) to the right side we got TPs =uix What if (2.1.6) fails for some j? Let inf{j 2 0: PIXo = toy... Xy66 Markov CHAIN If j* = 0 then aj, = 0 and (2.1.5) holds trivially. If j* > 0 then by what was already proved PIX = toy... Xyrat) = O4Pits Pryenntenn > Consequently, Payeaatye = PIX foes Xp ipl PIX = fo, so again (2.1.5) holds. Conversely, suppose forall and elivioes of fo,... i that (2.1.5) holds. For OiPigs **Plyaainn > 0 ‘we have PiXe = ielXp = toy... Xen = fea] Xi = te) PIX0 = fos Xen 16a] BsgPioss ++ Phaainas = Poaaste showing that the Markov property holds. i 2.2, EXAMPLES, Hore are some examples of Markov chains. Some will be used to illustrate ‘concepts discussed later and some show the range of applications of Markov ‘chains Example 2.2.1, Independent ‘Trials. Independence is a spocial cave of Markov dependence. If (X,) are iid with PIXo= a4, K=O1,...5m, then PiX naa = intslXo = to a) and 2.2, EXAMPLES 67 Example 2.2.2. The Simple Branching Process. Consider the ple branching process of Section 1.4. Recall {Zn} are iid with common distribution {p,} and Zp = 1 and Zn ryt bot Dn les PlZn = inl = PLY, Zug = inl oy.) Znnt = int] fog ey Baar = ina] = PILL Zag = inh siving the Markov property since thie depends only on in and in. Thus Plan = 5l2n-1 = 4) = PID. Zana = 5) = Pi i where i denotes /-fold convolution. Example 2.2.3. Random Walks. Let {Xq,n 2 1} be iid with PIXn =k] = ay, ~00 Sk <0. Define the random walk by ‘Then {Sq} is a Markov chain since P(Sn1 = inesiSo = 0,5) = ity... Sn = ta] 1Xn+i + in = insalSo = 0, Xngs = ings — én] [Sasa in] intl Since Xq4y is independent of So,-.. 5 Sw.68 Markov Cuams ‘A common special case is where only increments of £1 and 0 are al- lowed and where 0 and m are absorbing barriers, The transition matrix is ‘then of the form a 1pm 0 o 0 0 @ te He) ee) ‘he trcingonal structure indicative of a random wa with stops 1,0 ote PlSq = 018-1 = 0] = P|Sy = on|Sy-3 =m which models the hypothesized absorbing nature of states 0 and rm and Pl ner = + 1Xy =] =P5 PiXngr = t— %n PX =iXn % for l 1, we have 7,401 = 1, s0 that the process marches detersinistically through the integer towards 00 ‘A comainon method for generating Matkov chains with state space S,
is the following: suppose {Y_n, n ≥ 0} are iid random elements in some space E. For instance, E could be R or R^k or R^∞. Given two functions

g_1 : E ↦ S,   g_2 : S × E ↦ S,

define

X_0 = g_1(Y_0),

and for n ≥ 1

X_n = g_2(X_{n−1}, Y_n).

The branching process and random walks follow this paradigm, as do the following examples.

Example 2.2.6. An Inventory Model. Let I(t) be the inventory level of an item at time t. Stock levels are checked at fixed times T_0, T_1, T_2, …. A commonly used restocking policy is that there be two critical values of inventory, s and S, where 0 < s < S: if at a checking time the stock level is found to be at or below s, it is immediately brought back up to level S; if the stock level is above s, no restocking is done. Let X_n = I(T_n) be the stock level observed at the nth check (before any restocking at that time), and let D_{n+1} denote the total demand during the period (T_n, T_{n+1}]. Assume {D_n, n ≥ 1} is iid and independent of X_0, and suppose X_0 ≤ S. Then

(2.2.1)   X_{n+1} = { (S − D_{n+1})⁺,    if X_n ≤ s,
                      (X_n − D_{n+1})⁺,  if X_n > s,

where as usual

x⁺ = { x, if x > 0,
       0, if x ≤ 0.

This follows the paradigm

X_{n+1} = g_2(X_n, D_{n+1}),   n ≥ 0,

and hence {X_n} is a Markov chain. For this inventory model, descriptive quantities of interest include: