Stochastic Petri Nets, 2nd Edition. Falko Bause, Pieter S. Kritzinger.
Preface
Any developer of discrete event systems knows that the most important quality of
the final system is that it be functionally correct, that is, that it exhibit certain functional,
or qualitative, properties decided upon as being important. Once assured that the
system behaves correctly, it is also important that it be efficient: that its running
cost is minimal, that it executes in optimum time, or whatever other performance
measure is chosen. While functional correctness is taken for granted, these
quantitative properties will often decide the success, or otherwise, of the system.
Ideally the developer must be able to specify, design and implement his system
and test it for both functional correctness and performance using only one for-
malism. No such formalism exists as yet. In recent years the graphical version
of the Specification and Description Language (SDL) has become very popular
for the specification, design and partial implementation of discrete systems. The
ability to test for functional correctness of systems specified in SDL is, however,
limited to time-consuming simulative executions of the specification, and performance
analysis is not directly possible. Petri nets, although graphical in format,
are somewhat tedious for specifying large complex systems but, on the other
hand, were developed exactly to test discrete, distributed systems for functional
correctness. With a Petri net specification one can test, e.g., for deadlock, liveness
and boundedness of the specified system. Petri nets, in their various formats,
have been studied extensively since first proposed by Carl Adam Petri in 1962
[133] and several algorithms exist to determine the functional properties of nets.
Another paradigm which is aimed at testing for functional correctness is that of
process algebras or calculi for communicating systems.
The major drawback of Petri nets, as originally proposed, and of process algebras
(amongst others) is that quantitative analyses are not catered for. As a consequence,
the developer who needs to know about these properties of his system
has to devise a different model of the system which, apart from the overhead concerned,
provides no guarantee of consistency across the different models. Because
of the latter, computer scientists during the last decade added time, in various
forms, to ordinary Petri nets to create Stochastic Petri nets (SPNs) and Generalized
Stochastic Petri nets (GSPNs) for performance modelling, and a great
deal of theory has developed around Stochastic Petri nets, as these are generically
known.
Another aspect which also contributed significantly to the development of Stochastic
Petri nets is the fact that their performance analysis is based upon Markov
theory. Since the description of a Markov process is cumbersome, abstract models
have been devised for their specification. Of these, queueing networks (QNs) were
originally the most popular, especially since the analysis of a large class of QNs
(product-form QNs) can be done very efficiently. QNs cannot, however, describe
system behaviours like blocking and forking, and with the growing importance
of distributed systems this inability to describe synchronisation naturally turned
Contents

Preface
Contents

I STOCHASTIC THEORY

1 Random Variables
1.1 Probability Theory Refresher
1.2 Discrete Random Variables
1.3 Continuous Random Variables
1.4 Moments of a Random Variable
1.5 Joint Distributions of Random Variables
1.6 Stochastic Processes

2 Markov Processes
2.1 Discrete Time Markov Chains
2.1.1 Steady State Distribution
2.1.2 Absorbing Chains and Transient Behaviour
2.2 Semi-Markov Processes
2.2.1 Formal Model of a Semi-Markov Process
2.2.2 Interval Transition Probabilities
2.2.3 Steady State Behaviour
2.3 Continuous Time Markov Chains
2.3.1 Steady State Distribution
2.4 Embedded Markov Chains

4 Further Reading

II PETRI NETS

5 Place-Transition Nets
5.1 Structure of Place-Transition Nets
5.2 Dynamic Behaviour of Place-Transition Nets
5.3 Properties of Place-Transition Nets
5.4 Analysis of Place-Transition Nets
5.4.1 Analysis of the Reachability Set
5.4.2 Invariant Analysis
5.4.3 Analysis of Net Classes
Analysis of State Machines
Analysis of Marked Graphs
Analysis of EFC-nets
5.4.4 Reduction and Synthesis Analysis
5.5 Further Remarks on Petri Nets

Bibliography

Index
Part I
STOCHASTIC THEORY
1 Random Variables
Much of the world around us is not very deterministic, although this may not be
apparent at first glance. Consider a computer, for instance, which, given the same
input values, will always give the same output. While a computer program is
processing incoming data, however, it is often not possible to predict from one
moment to the next what it will do.
Think of the node of a computer network to understand this. Although the set
of messages which may arrive at the node is finite and known, we cannot tell for
certain from instant to instant which messages will arrive from where. Moreover,
the network software is likely to be using the same processor(s) at the node as
the operating system. When the process executing the network software will be
interrupted and by which process cannot be said for certain. All of which makes
it impossible to tell for certain what will happen next. We say the process just
described is stochastic.
The term stochastic has an exact mathematical meaning and there is a vast theory
developed to predict the behaviour of stochastic processes. This part of the book
gives only a basic introduction to that theory. Goodman [86] provides a more
thorough introduction to the subject while an extensive treatment can be found
in Howard [90].
then the two events are said to be mutually exclusive or disjoint. This leads to
the concept of mutually exclusive exhaustive events {A1 ,A2 , . . . ,An } which are
events such that
A_i A_j = A_i ∩ A_j = ∅   for all i ≠ j
A1 ∪ A2 ∪ . . . ∪ An = S (1.1)
The conditional probability of an event A given an event B is defined as
P[A|B] := P[AB] / P[B]
whenever P[B] ≠ 0.
The statistical independence of events can be defined as follows. Two events A
and B are said to be statistically independent iff
P[AB] = P[A] P[B]        (1.2)
For three statistically independent events A, B, C each pair of events must satisfy
Eq. (1.2) as well as
P[ABC] = P[A] P[B] P[C]
and so on for n events, requiring the n-fold factoring of the probability expression
as well as the (n−1)-fold factorings all the way down to all the pairwise factorings.
Moreover, for two independent events A and B
P [A|B] = P [A]
which merely states that the knowledge of the occurrence of an event B does not
affect the probability of the occurrence of the independent event A in any way
and vice-versa.
We also need to know the theorem of total probability for our basic understanding
of probability theory.
Theorem 1.1 Theorem of total probability. Let {A_1, A_2, . . . , A_n} be a set of
mutually exclusive and exhaustive events. Then, for any event B,
P[B] = ∑_{i=1}^{n} P[B|A_i] P[A_i] = ∑_{i=1}^{n} P[A_i B]
The last equation suggests that to find the probability of some complex event
B, one simplifies the event by conditioning it on some event Ai in such a way
that computing the probability of event B given event Ai is less complex and
then to multiply by the probability of the conditional event Ai to yield the joint
probability P [Ai B]. Having done this for a set of mutually exclusive exhaustive
events {Ai } we may then sum these probabilities to find the probability of the
event B. If we need to simplify the analysis even further, we may condition
event B on more than one event and then uncondition each of these events by
multiplying by the probability of the appropriate condition and then sum all
possible forms of all conditions.
The final bit of probability theory that we are certain to come across in our study
of stochastic systems is Bayes’ theorem.
Theorem 1.2 Bayes’ theorem. Let {Ai } be a set of mutually exclusive and ex-
haustive events. Then
P[A_i|B] = P[B|A_i] P[A_i] / ∑_{j=1}^{n} P[B|A_j] P[A_j]        (1.3)
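A small numerical check makes the theorem concrete. The sketch below (Python, not part of the original text) uses a made-up two-hypothesis example; the prior and conditional probabilities are illustrative assumptions, and the posterior is computed directly from Eq. (1.3).

```python
# Minimal sketch of Bayes' theorem (Eq. 1.3) with illustrative numbers.
# The priors and likelihoods are assumptions chosen only for this example.

prior = [0.3, 0.7]          # P[A_1], P[A_2]: mutually exclusive, exhaustive
likelihood = [0.9, 0.2]     # P[B|A_1], P[B|A_2]

# Theorem of total probability: P[B] = sum_j P[B|A_j] P[A_j]
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' theorem: P[A_i|B] = P[B|A_i] P[A_i] / P[B]
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]

print(p_b)         # 0.41
print(posterior)   # [0.658..., 0.341...]; the posteriors sum to 1
```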
Exercise 1.1 If there are n people present in a room, what is the probability
that at least two of them have the same birthday? How large may n be for this
probability to be less than 0.5?
We call a variable random and denote it χ if we cannot tell for certain what its
value will be. Examples of such random variables are the temperature outside on
any particular day, the number of customers in a supermarket checkout line or
the number of messages arriving at a network node.
A random variable is said to be discrete if the set of possible values of χ is
countable (but not necessarily finite). Since we do not know for certain what
value it will have, we say that it will have value x with a probability pχ (x). That
is
pχ (x) = P [χ = x]. (1.4)
In this formula, x can be any real number and 0 ≤ pχ (x) ≤ 1 for all values of x.
pχ (x) is called the probability mass function of χ.
Suppose that χ can take on the values x1 ,x2 ,x3 ,x4 or x5 with probability p1 ,p2 ,p3 ,p4
and p5 respectively. Clearly,
∑_{i=1}^{5} p_i = 1
Another way of describing a random variable χ which takes values from an ordered
set is to give a formula for the probability that it will take on values of xi which
are less than or equal to some value a. This leads to the following important
definition.
Fχ (x) = P [χ ≤ x]
Again, it should be evident that the values of the function Fχ are between 0 and
1. Using Fχ we can calculate
{χ ≤ b} = {χ ≤ a} ∪ {a < χ ≤ b}
so that
Fχ (b) = Fχ (a) + P [a < χ ≤ b]
and the equation in (1.6) follows.
Exercise 1.3 If the probabilities of a male or female offspring are both 0.5, find
the probability of a family of five children being all male.
Exercise 1.5 An Ethernet local network has k stations always ready to transmit.
A station transmits successfully if no other station attempts to transmit at the
same time as itself. If each station attempts to transmit with probability p, what
is the probability that some station will be successful?
F_χ(a) = P[χ ≤ a] = ∫_{−∞}^{a} f_χ(x) dx
The function fχ (x) is called the probability density function of the random vari-
able χ. Again, because we are concerned with probabilities, we must have the
condition
∫_{−∞}^{∞} f_χ(x) dx = 1        (1.8)
Also, analogous to Eq. (1.6) we can calculate the probability that a random
variable χ lies in the interval (a,b) from
P[a < χ < b] = ∫_a^b f_χ(x) dx        (1.9)
The density function fχ (x) does not have to be continuous, but the distribution
function Fχ (x) is automatically continuous. This implies
P [χ = x] = 0 (1.10)
The constant λ is called the parameter of the distribution and the function is
undefined for x < 0.
The corresponding cumulative distribution function is easily calculated to be
F_χ(a) = ∫_0^a λe^{−λx} dx = 1 − e^{−λa}        (1.13)
for a ≥ 0, and F(a) = 0 if a < 0. Note also that lim_{a→∞} F(a) = 1 as it should
be, since it is certain that 0 ≤ x < ∞.
Exercise 1.6 Find the probabilities that a random variable having an exponential
distribution with parameter λ = 10 assumes a value between 0 and 3, a value
greater than 5, and a value between 9 and 13.
Exercise 1.7 The life expectancy of a certain kind of lightbulb is a random vari-
able with an exponential distribution and a mean life of 100 hours. Find the
probability that the lightbulb will exceed its expected lifetime.
Note that we integrate only over the interval t ∈ [0,∞) since the independent
variable will always be time in our discussions.
We will see later on that we will need to know the expectation of the power of
a random variable as well. The expected value of the nth power of a random
variable is referred to as its nth moment. Thus the more general nth moment
(the mean is just the first moment) is given by
E[χ^n] = ∫_0^∞ t^n f_χ(t) dt
The second central moment is used very often and is referred to as the variance,
usually denoted by σχ2 and defined as before by
σ_χ² = E[(χ − χ̄)²] = E[χ²] − (χ̄)²
The square root σχ of the variance is referred to as the standard deviation. The
ratio of the standard deviation to the mean of a random variable is called the
coefficient of variation denoted by
C_χ = σ_χ / χ̄
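As a concrete illustration of these definitions (and a numerical companion to Exercise 1.9 below), the following sketch computes the first two moments of an exponentially distributed random variable by numerical integration and derives the variance and coefficient of variation from them. Analytically one expects χ̄ = 1/λ, σ² = 1/λ² and C = 1; the value of λ is an arbitrary choice for the example.

```python
import math

lam = 2.0  # arbitrary illustrative parameter

def f(t):
    # exponential density f(t) = lambda * exp(-lambda * t)
    return lam * math.exp(-lam * t)

def moment(n, upper=20.0, steps=100000):
    # E[chi^n] = integral_0^inf t^n f(t) dt, truncated at a large upper limit
    h = upper / steps
    return sum(((k + 0.5) * h) ** n * f((k + 0.5) * h) for k in range(steps)) * h

mean = moment(1)
second = moment(2)
variance = second - mean ** 2
coeff_var = math.sqrt(variance) / mean

print(mean, 1 / lam)           # ~0.5  (first moment)
print(variance, 1 / lam ** 2)  # ~0.25 (variance)
print(coeff_var)               # ~1.0  (coefficient of variation of the exponential)
```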
Exercise 1.8 Referring to Ex. 1.5 (page 19), compute the mean number of colli-
sions to be expected by any one of the k stations before a successful transmission.
Exercise 1.9 Compute the mean and coefficient of variation of a random variable
which is exponentially distributed with parameter λ.
Use the symbol Rn to denote the set of all n-tuples of real numbers. Let χ1 ,χ2 , . . . ,χn
be random variables. These random variables are said to have a joint discrete dis-
tribution if there exists a nonnegative function p(x1 ,x2 , . . . ,xn ) of n real variables
that has the value 0 except at a countable set of points in Rn , such that
for all points (x1 ,x2 , . . . ,xn ) in Rn . Obviously we must have that
∑_{x ∈ R^n} p(x) = 1
Similarly we say that the collection χ1 ,χ2 , . . . ,χn of random variables has a
joint continuous distribution if there exists a nonnegative integrable function
f (x1 ,x2 , . . . ,xn ) of n real variables that satisfies
P[χ_1 ≤ a_1, . . . , χ_n ≤ a_n] = ∫_{−∞}^{a_1} · · · ∫_{−∞}^{a_n} f(x_1, x_2, . . . , x_n) dx_1 . . . dx_n
for all choices of upper limits a1 , . . . ,an . The function f is called the joint proba-
bility density function of the random variables χ1 ,χ2 , . . . ,χn and as in the discrete
case, we must have
∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(x_1, x_2, . . . , x_n) dx_1 . . . dx_n = 1
If we know the joint distribution f of χ1 ,χ2 , . . . ,χn , we can obtain the distribution
of any one, say χm , of the random variables by simply integrating over all values
of the remaining random variables. That is,
f_m(x) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(x_1, . . . , x_{m−1}, x, x_{m+1}, . . . , x_n) dx_1 . . . dx_{m−1} dx_{m+1} . . . dx_n
is the probability density function of χm . The same holds true for the discrete
case where we would sum, rather than integrate, over all possible values of the
other variables.
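For the discrete case this is just a sum. The short sketch below uses a small made-up joint probability mass function of two discrete random variables (the numbers are illustrative assumptions) and recovers the marginal of the first one by summing over the second.

```python
# Joint pmf p(x1, x2) of two discrete random variables, stored as a dictionary.
# The probabilities are illustrative and sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

# Marginal pmf of chi_1: sum the joint pmf over all values of chi_2.
marginal_1 = {}
for (x1, x2), p in joint.items():
    marginal_1[x1] = marginal_1.get(x1, 0.0) + p

print(marginal_1)  # {0: 0.30, 1: 0.40, 2: 0.30}
```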
Definition 1.5 If the values x = (x1 ,x2 , . . .) in the state space of χ(t) are finite
or countable, then we have a discrete-state process, also called a chain. The state
space for a chain is usually the set of integers {0,1,2, . . .}. If the permitted values
in the state space may range over a finite or infinite continuous interval, then
we say that we have a continuous-state process. The theory of continuous-state
stochastic processes is not easy and we will only be considering discrete-state
processes in this book.
Definition 1.6 If the times t = (t1 ,t2 , . . . ,tn ) at which we observe the value of
χ(t) are finite or countable, then we say that we have a discrete-time process;
if these times may, however, occur anywhere within a set of finite intervals or
an infinite interval of time, then we say that we have a continuous-time process.
When time is discrete we write χn rather than χ(t) and refer to a stochastic
sequence rather than a stochastic process.
Definition 1.7 Consider the joint distribution function (refer Sec. 1.5) of all the
random variables X = {χ(t1 ),χ(t2 ), . . .} given by
for all x = (x1 ,x2 , . . . ,xn ), t = (t1 ,t2 , . . . ,tn ) and all n. Then the nature of FX (x; t)
is the third quantity which determines the class of a stochastic process.
In this book we will consider only the class of stochastic processes known as
Markov processes.
2 Markov Processes
In the case of a homogeneous Markov process, the particular instant tn in Eq. (2.2)
does not matter either so that the future of the process is completely determined
by the knowledge of the present state. In other words,
In fact, worse than that, an important implication is that the distribution of the
sojourn time in any state must be memoryless. Our surfer does not know how long
he has been at this beach! If you think about it, if the future evolution depends
on the present state only, it cannot depend on the amount of time spent in the
current state either.
When time is continuous, there is only one probability distribution fχ (y) of the
time y spent in a state which satisfies the property
P [χ ≥ y + s|χ ≥ s] = P [χ ≥ y]
In other words, the sojourn times in a Continuous Time Markov Chain (CTMC)
have an exponential probability distribution function. We will prove this fact in
Sec. 2.3 on page 49. Not surprisingly, we will meet the exponential distribution
many times in our discussions.
Similarly, for a Discrete Time Markov Chain (DTMC), the sojourn time η in a
state must be a geometrically distributed random variable (cf. Eq. (1.5))
Note that when a process has an interarrival time distribution given by Fη (n) it
is said to be a Bernoulli arrival process. Moreover, let η = nδ for n an integer
and δ the basic unit of time. Then the mean time is given by
δ ∑_{k=1}^{∞} k p_η(k) = δ / (1 − q)        (2.6)
Exercise 2.1 The weather bureau in a European country decided to improve its
record for weather prediction. This is made a little easier by the fact there are
never two sunny days in a row. If it is a sunny day however, the next day is just
as likely to be rainy as it is likely to be just grey and dull. If it is not a sunny day,
there is an even chance that the weather will be the same the next day. If there
is a change from a rainy or dull day, there is only a 50 percent chance that the
next day will be sunny.
1. Is the stochastic process we have just described Markovian?
2. If it is only approximately Markovian, what can one do to improve the ap-
proximation?
In this section we concern ourselves with the case where the time spent in a
Markov state has a discrete distribution whence we have a Discrete Time Markov
Chain (DTMC).
for n ∈ N.
The expression on the right-hand side of this equation is the one-step transition
probability of the process and it denotes the probability that the process goes
from state xn to state xn+1 when the time (or index) parameter is increased from
n to n + 1. That is, using the indices for notating the states,
The more general form of the sth step transition probabilities is given by
which gives the probability that the system will be in state j at step s, given that
it was in state i at step n where s ≥ n.
Note that the probabilities pij (n,s) must satisfy the following requirements:
The probability of going from state i to state j is the probability of somehow get-
ting from i at time n to some intermediate state k at some time r and from there
to state j. The events {χr = k|χn = i} and {χs = j|χr = k} are independent, so
that using this and the fact that from the Markov property,
for all m ∈ N. From the Markov property we can establish the following recursive
equation for calculating pij (m)
p_ij(m) = ∑_k p_ik(m − 1) p_kj(1),    m = 2,3, . . .        (2.9)
We can write Eq. (2.9) in matrix form by defining the matrix P = [p_ij], where
p_ij := p_ij(1), so that
P^{(m)} = P^{(m−1)} P^{(1)}
where
P^{(0)} = I
P^{(1)} = P^{(0)} P = IP
P^{(2)} = P^{(1)} P = P^2
P^{(3)} = P^{(2)} P = P^3
and in general
P^{(m)} = P^m
This equation enables us to compute the m-step transition probabilities from the
one-step transition probabilities.
Next we consider a very important quantity, the probability π_j^{(m)} of finding our
DTMC in state j at the mth step:
π_j^{(m)} = P[χ_m = j]        (2.12)
Writing
p_ij^{(m)} = P[χ_m = j | χ_0 = i]
for the m-th step transition probability, where we have assumed, without loss of
generality, that we entered state i at time 0, then multiplying both sides of this
equation by π_i^{(0)} = P[χ_0 = i] (cf. definition in Eq. (2.12)), summing over all states
and applying the theorem of total probability (cf. page 16), we obtain
∑_i P[χ_0 = i] p_ij^{(m)} = ∑_i P[χ_0 = i] P[χ_m = j | χ_0 = i]
∑_i π_i^{(0)} p_ij^{(m)} = P[χ_m = j] = π_j^{(m)}
or, alternatively
π_j^{(m)} = ∑_i π_i^{(0)} p_ij^{(m)}        (2.13)
That is, the state probabilities at time m can be determined by multiplying the
multistep transition probabilities by the probability of starting in each of the
states and summing over all states.
The row vector formed by the state probabilities at time m is called the state
probability vector Π(m) . That is,
Π^{(m)} = (π_0^{(m)}, π_1^{(m)}, π_2^{(m)}, . . .)
With this definition, Eq. (2.13) can be written in matrix form as
Π^{(m)} = Π^{(0)} P^{(m)}

[Figure 2.1: State transition diagram of the surfer DTMC, with beaches Clifton (1), Waikiki (2) and Ipanema (3).]
Example 2.1 Consider the simple discrete time MC in Fig. 2.1 which illustrates
the behaviour of our surfer. This diagramme is also called the state transition
diagramme of the DTMC. Each time a unit of time elapses the surfer decides
to do something. When at Clifton, he decides to go to Waikiki with probability
1/2 or may decide to go to Ipanema with the same probability (our surfer happens
to be very affluent). When in Ipanema he may in fact decide to stay there with
probability 1/2 at the end of a time period. With our beaches numbered as shown,
we have

P = [ 0    0.5  0.5
      0.3  0    0.7
      0.2  0.3  0.5 ]
Assume that our surfer starts off at Clifton (beach 1). In other words the initial
distribution is Π^{(0)} = (1,0,0). From Clifton he can go to Ipanema or Waikiki with
equal probability, i.e.,
Π^{(1)} = Π^{(0)} P = (0, 0.5, 0.5)
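The matrix form of Eq. (2.13) is straightforward to evaluate numerically. The following minimal sketch (Python with NumPy, not part of the original text) iterates Π^{(m)} = Π^{(m−1)}P for the surfer chain of this example, starting at Clifton, and shows how quickly the state probabilities settle down.

```python
import numpy as np

# One-step transition probability matrix of the surfer DTMC (Example 2.1).
# States: 1 = Clifton, 2 = Waikiki, 3 = Ipanema.
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.2, 0.3, 0.5]])

pi = np.array([1.0, 0.0, 0.0])   # Pi^(0): the surfer starts at Clifton

for m in range(1, 11):
    pi = pi @ P                  # Pi^(m) = Pi^(m-1) P
    print(m, pi.round(4))

# After a few steps the vector approaches (0.188, 0.260, 0.552),
# the steady state distribution computed in Example 2.5.
```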
As we will see later, the vector Π(m) of state probabilities tends to a limit for
m → ∞. Even more, one can show that for specific DTMCs the effect of Π(0) on
the vector Π(m) completely vanishes. For our surfer that means, e.g., even if we
do not know at which beach he started the probability of finding him at a specific
beach after a long time is nearly constant. This phenomenon does not hold for all
DTMCs. Consider, e.g., the DTMC of Fig. 2.2. If the process starts in state 0 it
stays there forever. But starting in state 3 there is a chance that the process gets
[Figure 2.2: State transition diagram of a DTMC with states 0 to 5; states 0 and 5 are absorbing and states 1 to 4 are transient, with transitions among the transient states taken with probabilities p and q.]
absorbed in state 5. Clearly, the distribution Π^{(m)} is then not independent of the initial
distribution. This effect, or to be more precise the absence of such effects, can be
verified by investigating the structure of the state transition diagramme. E.g.,
from state 0 or 5 of the DTMC given in Fig. 2.2 no other state can be reached,
thus intuitively explaining the described effect.
Next we consider a classification of Markov states based on the structure of the
state transition diagramme.
Consider states i,j ∈ S. If there is a path from i to j, i.e., there exists an integer
n (which may depend on i and j) such that
pij (n) > 0
then we write i → j.
Two states i and j are said to communicate, written i ↔ j, if there is a path from
state i to state j and vice versa.
Let C[i] = {j | i ↔ j; j ∈ S}, ∀i ∈ S. We call C[i] the class of state i.
Example 2.2 Consider the simple MC in Fig. 2.2. In that figure, C[0] = {0},C[5] =
{5},C[1] = {1,2,3,4}.
An irreducible MC clearly has only one class of states, i.e. C[i] = C[j] ∀i,j ∈ S.
The MC of Fig. 2.2 is reducible since 0 ↔ 1, for instance, does not hold.
Let C denote any class of states and C̄ be the set of Markov states not in the class
C.
A state i is called absorbing if p_ii = 1. Since the latter implies p_ij = 0 for i ≠ j, an absorbing state does not communicate
with any other state.
The MC of Fig. 2.2 has two absorbing states, 0 and 5.
Definition 2.5 A class C is said to be transient if there is a path out of C. That
is, if ∃ i ∈ C and k ∈ C̄ such that p_ik > 0. The individual states in a transient
class are themselves said to be transient.
States 1, 2, 3 and 4 in the MC of Fig. 2.2 are all transient.
Definition 2.6 A MC is said to be absorbing if every state in it is either absorb-
ing or transient.
Finally we define an ergodic class.
Definition 2.7 A class C is said to be ergodic if every path which starts in C
remains in C. That is
∑_{j∈C} p_ij = 1,    ∀i ∈ C
Let f_j^{(m)} be the probability that, after leaving state j, the process returns to state j
for the first time in m steps. The probability of ever returning to state j is then
f_j = ∑_{m=1}^{∞} f_j^{(m)}
We now classify the states j of a MC depending on the value f_j of the state. Not
surprisingly, if f_j = 1 the state is said to be recurrent; if a return is not
certain, that is f_j < 1, then state j is said to be transient. Furthermore, if our
MC can return to state j only at steps η,2η,3η, . . ., where η ≥ 2 is the largest
such integer, then state j is said to be periodic with period η. If such an integer
number η does not exist, then the state j is said to be aperiodic.
Knowing the probability f_j^{(m)} of returning to state j in m steps, we can now
define another interesting quantity, the mean recurrence time M_j of state j:
M_j = ∑_{m=1}^{∞} m f_j^{(m)}        (2.15)
The mean recurrence time is thus the average number of steps needed to return
to state j for the first time after leaving it.
We can further describe a state j to be recurrent null if Mj = ∞, whereas it
is recurrent nonnull if Mj < ∞. Note that an irreducible MC can only have
recurrent null states if the number of states is infinite.
With all this in mind, we can now state the following important result[108] with-
out proof:
Theorem 2.1 The states of an irreducible DTMC are all of the same type; thus
they can be either
• all transient,
• all recurrent null, or
• all recurrent nonnull.
Furthermore, if one state is periodic, then all states are periodic with the same period.
Exercise 2.2 Assume that we don’t know for certain where our surfer has started.
An oracle tells us that he might have started at Clifton with a chance of 19%, at
Waikiki with 26% and at Ipanema, the beach he likes most, with a chance of 55%.
What is our vector π^{(0)} now? Calculate π^{(1)}, π^{(2)}, π^{(3)}.
The most interesting DTMCs for performance evaluation are those whose state
probability distribution π_j^{(m)} does not change when m → ∞ or, to put it differently,
a probability distribution π_j defined on the DTMC states j is said to
be stationary (or to have reached a steady state distribution) if π_j^{(m)} = π_j when
π_j^{(0)} = π_j, that is, once a distribution π_j has been attained, it does not change in
the future (with m). We write
π_j = lim_{m→∞} π_j^{(m)}
We are after the steady state probability distribution {πj } of being in state j at
some arbitrary point in the future. Clearly, if we know this, we can say a great
deal about the system modelled by the MC. When the DTMC is irreducible,
aperiodic and homogeneous the following theorem [108] helps us out. It states that
for such a DTMC the limiting probabilities π_j = lim_{m→∞} π_j^{(m)} always exist and are
independent of the initial distribution, and that either
1. all states are transient or all states are recurrent null. In both cases πj = 0 ∀ j
and there exists no steady state distribution, or
2. all states are recurrent nonnull and then πj > 0 ∀ j, in which case the set
{πj } is a steady state probability distribution and
π_j = 1 / M_j        (2.16)
In this case the quantities πj are uniquely determined through the following
equations
∑_i π_i = 1        (2.17)
∑_i π_i p_ij = π_j        (2.18)
τ i = πi T (2.19)
Another quantity which will be useful to us is the average time υij spent by the
DTMC in state i between two successive visits to state j in steady state. This
quantity is also known as the visit ratio or mean number of visits, and can be
computed from
υ_ij = π_i / π_j        (2.20)
Π = ΠP (2.21)
Note that Eq. (2.21) follows directly from the equation Π(m) = Π(m−1) P by
taking the limit as m → ∞. The following example illustrates that no unique
steady state distribution exists for a periodic MC.
[Figure 2.3: State transition diagram of a periodic MC with three states.]
Example 2.3 Consider the periodic MC illustrated in Fig.2.3 and let Π(0) =
(1,0,0). Then
Clearly the limit Π = lim_{m→∞} Π^{(m)} does not exist. Similarly the MC must be
irreducible for a unique solution to exist as the following example illustrates.
Example 2.4 Consider the reducible MC illustrated in Fig. 2.4 and let Π^{(0)} =
(1,0,0,0,0). Then

Π^{(1)} = Π^{(0)} [ 0  0  0.75  0.25  0
                    0  0  1     0     0
                    0  1  0     0     0
                    0  0  0     0     1
                    0  0  0     1     0 ] = (0, 0, 0.75, 0.25, 0)
Example 2.5 Consider again the surfer DTMC of Example 2.1. Using Eq. (2.21) we can write the following set of linear equations:
0.3 π_2 + 0.2 π_3 = π_1
0.5 π_1 + 0.3 π_3 = π_2
0.5 π_1 + 0.7 π_2 + 0.5 π_3 = π_3        (2.22)
[Figure 2.4: State transition diagram of the reducible five-state MC of Example 2.4.]
[Figure: State transition diagram of a three-state MC with transition probabilities p, 1−p, q and 1−q.]
Note that in Eqs. (2.22) above, the last equation is a linear combination of the sec-
ond and the first, indicating that there is a linear dependence among them. There
will always be a linear dependence amongst the set of equations in Eq. (2.21) and
it is the reason why we have to use the additional Eq. (2.17) to derive a solu-
tion. Using the latter equation and any two of the equations in (2.22) we obtain
approximately
π1 = 0.188
π2 = 0.260
π3 = 0.552
So our surfer is most likely to be found on the beach at Ipanema (probability 0.552)
and in fact he returns to that beach every 1/0.552 ≈ 1.81 days.
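The same result can be obtained directly by solving Eqs. (2.17) and (2.18) as a linear system instead of dropping the dependent equation by hand. The minimal sketch below (an illustration, not part of the original text) replaces one of the balance equations by the normalisation condition ∑_i π_i = 1 and solves the resulting system for the surfer chain.

```python
import numpy as np

# Surfer DTMC of Example 2.1 (1 = Clifton, 2 = Waikiki, 3 = Ipanema).
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.2, 0.3, 0.5]])

n = P.shape[0]

# Balance equations pi (P - I) = 0, written row-wise as (P - I)^T pi = 0,
# with the last (linearly dependent) equation replaced by sum(pi) = 1.
A = (P - np.eye(n)).T
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0

pi = np.linalg.solve(A, b)
print(pi.round(3))    # [0.188 0.26  0.552]
print(1.0 / pi[2])    # mean recurrence time of Ipanema, about 1.81
```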
Exercise 2.3 Consider the stochastic process described in Exercise 2.1. Let C,R
and D represent a sunny, rainy and dull day respectively and in this way define a
new stochastic process with 9 states. Determine the new transition probabilities.
Consider this process to be a discrete time MC and find the probability of two dull
days following upon one another.
2. Under what conditions will the chain be irreducible and aperiodic, if at all?
When using MCs to model real systems it is often very useful to know the number
of steps (or, equivalently, the time) spent in the transient states before reaching
an absorbing state. Think of executing a multi-layer network protocol: The time
spent by processes executing the protocol in one layer (transient states) before
going to the next layer (absorbing state) is one example of such an application.
The absorbing MC illustrated in Fig. 2.6 consisting of a set St of nt transient
states and a set Sa of na absorbing states, illustrates what we have in mind.
We begin our analysis by numbering the states in the MC such that the na
absorbing states occur first and writing the transition probability matrix P as
P = [ I  0
      R  Q ]        (2.23)
Once in an absorbing state the process remains there, so I is the identity matrix
with all elements pii = 1, 1 ≤ i ≤ na . R is an nt × na matrix describing the
movement from the transient to the absorbing states, and Q is an nt × nt matrix
describing the movement amongst transient states. Since it is not possible to move
from the absorbing to the transient states 0 is the na × nt zero matrix. Since the
formula for matrix multiplication also applies to matrices written in block form,
we can calculate the powers of P in terms of the matrices R and Q:
P^2 = [ I        0
        R + QR   Q^2 ]
and
P^3 = [ I                0
        R + QR + Q^2 R   Q^3 ]
[Figure 2.6: An absorbing MC consisting of a set St of nt transient states and a set Sa of na absorbing states.]
or in general
P^n = [ I      0
        N_n R  Q^n ]
where N_n = I + Q + Q^2 + . . . + Q^{n−1} = ∑_{i=1}^{n} Q^{i−1}.
We can now state the following fundamental result for an absorbing MC: the
process reaches an absorbing state with probability 1, that is, Q^n → 0 as n → ∞,
and the matrix (I − Q) has an inverse.
We will not prove the theorem (for a proof see [86]) but, from Eq. (2.14) above
and the knowledge that for a transient state the steady state probability π_j = 0,
the first part of the result is easy to accept intuitively.
Write
N = [n_ij] = (I − Q)^{-1}
For absorbing chains, the only interesting starting states are the transient ones.
Assume that we start with an initial state i ∈ St . For each state j ∈ St , define
the random variable υij to be the number of visits to state j before an absorbing
state is reached. Define υij = 1 when i = j.
We know from Th. 2.2 that υij < ∞ for any transient state j, and that υij has
finite expectation. Assuming these properties we can now prove the following
theorem:
E[υij ] = nij
Proof. Suppose that we move from starting state i to state k in the first step.
If k is an absorbing state, we can never get to state j. If k is a transient state,
we are in the same situation as before with starting state k instead. Using the
Markov property,
E[υ_ij] = δ_ij + ∑_{k∈S_t} q_ik E[υ_kj]
The term δij is the Kronecker delta function with value 1 if i = j and 0 otherwise
and it counts the initial visit to state j in case the starting state is j. Denote by
M the matrix whose i,j-entry is E[υij ] for all i,j ∈ St . Then the last equation can
obviously be written
M = I + QM
so that M = (I − Q)^{-1} = N. □
Referring to Fig. 2.6, starting off in some state i ∈ St , the total number of steps
(transitions) before reaching an absorbing state is clearly the sum of times we
visit every state in St before absorption. Denote this random variable by υi and
its expected value E[υi ] by τi .
Theorem 2.5
τ_i = ∑_{j∈S_t} n_ij,    i ∈ S_t        (2.24)
and τ_i < ∞.
Proof. Since the expectation of the sum is the sum of the expectations the latter
result follows from the previous theorem. □
In vector notation, writing τ⃗ = (τ_1, . . . , τ_{n_t}) and e = (1, 1, . . . , 1), this is
τ⃗ = N e^T        (2.25)
Theorem 2.6 Let τ be the mean time before absorption of a DTMC with tran-
sient state set St = {1,2, . . . ,nt } and initial probability distribution ~r = (r1 ,r2 , . . . ,rnt ).
Then
τ = r⃗ N e^T        (2.26)
Furthermore, define σ_i² as the variance of the time υ_i before absorption starting
in state i ∈ S_t and σ⃗² = (σ_1², . . . , σ_{n_t}²) as the vector of the variances of these times.
Let τ⃗_2 = (E[υ_1²], . . . , E[υ_{n_t}²]). Then it can be shown (cf. [103], page 49) that
Theorem 2.7
which reduces to
by substituting for NQ and Ne^T from Eq. (2.26). The result follows. □
In theory we therefore have the formulas in Eqs. (2.26) and (2.27) to compute
the mean and variance respectively for the time before absorption. In practice,
computing the matrix N = (I − Q)^{-1} for a MC with a large state space is no mean
task. Courtois and Semal [56] fortunately have devised a method of computing τ
and σ 2 from P. We next describe their technique without proving the results.
Proofs can be found in [103].
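For a small chain the direct route through the fundamental matrix is of course perfectly feasible and serves as a useful baseline. The sketch below (an illustration, not part of the original text) builds Q for the gamblers' ruin of Exercise 2.5 with C = 3 chips, forms N = (I − Q)^{-1} and evaluates the mean time before absorption as in Eqs. (2.24) to (2.26); the value of p is an illustrative assumption.

```python
import numpy as np

p = 0.6            # illustrative probability of heads (assumption for the example)
q = 1.0 - p

# Gamblers' ruin with C = 3 chips: transient states are 1 and 2 chips
# held by the first gambler; 0 and 3 chips are absorbing.
Q = np.array([[0.0, p],
              [q,   0.0]])         # movement among the transient states

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix N = (I - Q)^-1
tau = N @ np.ones(2)               # mean steps before absorption, tau = N e^T (Eq. 2.25)

print(N.round(4))
print(tau.round(4))                # about [2.105, 1.842] for p = 0.6

# With an initial distribution r over the transient states, Eq. (2.26):
r = np.array([1.0, 0.0])           # start with 1 chip, say
print(float(r @ tau))              # about 2.105
```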
To start off, we define the augmented transition probability matrix
[ Q   (I − Q)e^T
  r⃗   0         ]        (2.29)
Note that (I − Q)e^T is the vector of transition probabilities from the states in S_t
to a new state, say a ∈ S_a, and r⃗ and Q are the same as before.
The clever idea is that, assuming irreducibility and aperiodicity, the Markov
process defined by the matrix in Eq. (2.29), has the same behaviour as a new
process with the state a designated as an absorbing state provided one assumes
that whenever the latter chain reaches the absorbing state a, it is restarted with
the initial vector ~r and this is done infinitely many times. The new, absorbing
MC is described by the matrix
[ Q   (I − Q)e^T
  0   1         ]        (2.30)
Again, the ergodic behaviour of the process described by Eq. (2.29) describes the
behaviour of the absorbing chain of Eq. (2.30) over an infinite number of runs,
each started with the initial distribution vector ~r.
Theorem 2.8 The mean time τ before absorption of the absorbing chain of
Eq. (2.30), started with the initial distribution r⃗, is given by
τ = 1/π_a − 1
where π_a is the last component of the steady state distribution of the DTMC
described by the matrix
[ Q   (I − Q)e^T
  r⃗   0         ]
The proof of this theorem can be found in [56]. Intuitively, 1/πa is the mean
time between two visits to the last state a of the Markov process described by
the matrix in Eq. (2.29) (cf. Th. 2.2) and each time the system is in this last
state, one further step is needed to restart the absorbing chain with the initial
distribution ~r.
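A small numerical check of this relationship against the direct computation via N is easy to set up. The sketch below (illustrative only; the parameters are the same assumed gamblers' ruin values as in the earlier sketch) builds the augmented matrix of Eq. (2.29), finds its steady state distribution, and compares 1/π_a − 1 with r⃗Ne^T.

```python
import numpy as np

p = 0.6
q = 1.0 - p
Q = np.array([[0.0, p],
              [q,   0.0]])
r = np.array([1.0, 0.0])                 # illustrative initial distribution over {1, 2}

# Augmented matrix of Eq. (2.29): the transient states plus the restart state a.
to_a = (np.eye(2) - Q) @ np.ones(2)      # one-step absorption probabilities
top = np.hstack([Q, to_a.reshape(-1, 1)])
bottom = np.hstack([r, [0.0]])
P_aug = np.vstack([top, bottom])

# Steady state of the augmented (irreducible) chain: solve pi = pi P_aug, sum(pi) = 1.
n = P_aug.shape[0]
A = (P_aug - np.eye(n)).T
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)

pi_a = pi[-1]
tau_from_pi = 1.0 / pi_a - 1.0

# Direct computation via the fundamental matrix, for comparison.
N = np.linalg.inv(np.eye(2) - Q)
tau_direct = float(r @ N @ np.ones(2))

print(tau_from_pi, tau_direct)           # both about 2.105 for p = 0.6
```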
A similar result exists for the variance σ 2 of the time before absorption.
Theorem 2.9 If σ² is the variance of the time before absorption of the chain
[ Q   (I − Q)e^T
  0   1         ]
started with the initial distribution r⃗, then
σ² = 2ττ′ − τ − τ²
where τ′ = 1/π_a′ − 1 and π_a′ is the last component of the steady state distribution of the DTMC described
by the matrix
[ Q    (I − Q)e^T
  r⃗′   0         ]
with
r⃗′ = (1/(1 − π_a)) (π_1, π_2, . . . , π_{n_t})
where πi is the steady state distribution of the MC given in Th. 2.8.
This concludes our study of discrete time MCs.
Exercise 2.5 Two gamblers are betting on the outcome of an unlimited sequence
of coin tosses. The first gambler always bets heads, which appears with probability
p, 0 < p < 1 on every toss. The second gambler always bets tails, which appears
with probability q = 1 − p. They start with a total of C chips between them.
Whenever one gambler wins he has to give the other one chip. The game stops
when one gambler runs out of chips (is ruined). Assume the gamblers start with
C = 3 chips between them.
2. How long will the game last if the first gambler starts with 1 coin?
Exercise 2.6 Write a program to simulate the game described in the previous
exercise for a sufficient number of coin tosses. Use values of p = 0.2,0.4, . . . ,0.8.
Compare your simulation results with the theoretical answers in the previous ex-
ercise. Assume p = 0.6.
1. Determine the theoretical mean time of a game. Compare your answer with
your simulation results.
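A possible starting point for the simulation asked for in Exercise 2.6 is sketched below. It plays many independent games with C = 3 chips and estimates the mean game length, which can be compared with the value obtained from the fundamental matrix; the number of runs and the starting position are illustrative assumptions.

```python
import random

def game_length(p, start=1, total=3):
    """Play one game: the first gambler starts with `start` chips out of `total`.
    Heads (probability p) moves one chip to the first gambler."""
    chips = start
    steps = 0
    while 0 < chips < total:
        chips += 1 if random.random() < p else -1
        steps += 1
    return steps

def estimate_mean_length(p, runs=100_000):
    return sum(game_length(p) for _ in range(runs)) / runs

for p in (0.2, 0.4, 0.6, 0.8):
    print(p, round(estimate_mean_length(p), 3))

# For p = 0.6 and a starting holding of 1 chip, the estimate should be close
# to 2.105, the value tau = r N e^T obtained in the absorbing-chain analysis.
```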
In our discussions thus far the Markov process had the property that a transition
was made at every time instant. That transition may well return the process to
the same state, but a transition occurred nevertheless.
We now turn our attention to a more general class of processes where the time
between transitions may be several unit time intervals, and where this time can
depend on the particular transition being made. This process is no longer strictly
Markovian; however, it retains enough of the Markovian properties to deserve the
name semi-Markov process [90].
In our discrete time case, these sojourn times τ_ij, with expected value τ̄_ij, are
positive, integer-valued random variables, each governed by a probability mass
function s_ij(m), m = 1,2, . . ., called the sojourn time mass function for a transition
from state i to state j. That is,
s_ij(m) = P[τ_ij = m]
One distribution sij (m) we are familiar with is the geometric distribution
(1 − a)a^{m−1},    m = 1,2,3, . . .
We assume that the mean of the sojourn time is finite and that the sojourn times
are at least one time unit in duration, i.e.,
s_ij(0) = 0
If s_ij(1) = 1 for all i and j, that is, all sojourn times are exactly one time unit in length,
the semi-Markov process reduces to an ordinary DTMC.
We next define the waiting time τ_i, with expected value τ̄_i, as the time spent
in state i, i = 1,2, . . . ,N, irrespective of the successor state, and we define the
probability mass function of this waiting time as
P[τ_i = m] = ∑_{j=1}^{N} p_ij s_ij(m)
with mean
τ̄_i = ∑_{j=1}^{N} p_ij τ̄_ij
That is, the probability that the system will spend m time units in state i if
we do not know its successor state, is the probability that it will spend m time
units in state i if its successor state is j, multiplied by the probability that its
successor state will indeed be j and summed over all possible successor states.
As in the DTMC case, we next set out to compute the n−step transition prob-
abilities, which we denote φij (n), for the semi-Markov case. That is, how can a
process that started by entering state i at time 0 be in state j at time n?
One way this can happen is for i and j to be the same state and for the process
never to have left state i throughout the period (0,n). This requires that the
process makes its first transition, to any other state, only after time n. That is
δ_ij (1 − ∑_{m=0}^{n} ∑_{j=1}^{N} p_ij s_ij(m))        (2.31)
where
δ_ij = 1 if i = j, and 0 otherwise.
Let W(n) = {δ_ij (1 − ∑_{m=0}^{n} ∑_{j=1}^{N} p_ij s_ij(m))} be the matrix of these elements.
Every other way to get from state i to j in the interval (0,n) would mean the
process made its first transition from state i to some other state k at a time m,
and then by a succession of such transitions to state j at time n. Note that we
have to consider all intermediate times m, 0 < m ≤ n and intermediate states
k ∈ S, S the Markov state space. In other words
∑_{m=0}^{n} ∑_{k=1}^{N} p_ik P[τ_ik = m] φ_kj(n − m)        (2.32)
or, in matrix notation, with S(m) = [s_ij(m)] and ◦ denoting element-by-element multiplication,
∑_{m=0}^{n} (P ◦ S(m)) Φ(n − m),    n = 0,1,2, . . .        (2.33)
and
Φ(n) = W(n) + ∑_{m=0}^{n} (P ◦ S(m)) Φ(n − m),    n = 0,1,2, . . .        (2.34)
Φ(n) is called the interval transition probability matrix for the semi-Markov pro-
cess in the interval (0,n) and clearly
Φ(0) = I
Eq. (2.34) provides a convenient recursive basis for computing Φ(n) for any semi-
Markov process. The quantities P and S(m) come directly from the definition of
the process.
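The recursion in Eq. (2.34) translates almost directly into code. The sketch below (an illustration, not part of the original text) computes Φ(n) for a generic discrete-time semi-Markov process given the embedded matrix P and the sojourn time mass functions s_ij(m); the small two-state example used to exercise it is an illustrative assumption.

```python
import numpy as np

def interval_probs(P, S, n_max):
    """Compute the interval transition probability matrices Phi(0..n_max)
    of a discrete-time semi-Markov process via the recursion of Eq. (2.34).

    P : (N, N) array of embedded transition probabilities p_ij
    S : function S(m) returning the (N, N) array of s_ij(m) for m = 1, 2, ...
    """
    N = P.shape[0]
    Phi = [np.eye(N)]                    # Phi(0) = I
    wait_cdf = np.zeros(N)               # P[tau_i <= n], built up step by step
    for n in range(1, n_max + 1):
        wait_cdf = wait_cdf + (P * S(n)).sum(axis=1)
        W = np.diag(1.0 - wait_cdf)      # probability of never having left state i
        total = W.copy()
        for m in range(1, n + 1):
            total += (P * S(m)) @ Phi[n - m]
        Phi.append(total)
    return Phi

# Illustrative two-state example (not from the text): geometric sojourn times.
P = np.array([[0.0, 1.0],
              [0.4, 0.6]])
a = np.array([[0.0, 0.5],
              [0.25, 0.75]])             # geometric parameter per (i, j) pair

def S(m):
    return (1.0 - a) * a ** (m - 1)

Phi = interval_probs(P, S, 20)
print(Phi[20].round(3))                  # each row sums to 1
```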
Example 2.6 Consider again our typical surfer of Fig. 2.1 on page 30, but let
us now give the example a semi-Markovian flavour. The nature of the surfer now
dictates that the length of time he will stay on a particular beach will depend on
both which beach he is on and where he intends to go next. The (sojourn) time
τij is thus the length of time spent surfing on beach i with the intention of going
to beach j (where j = i is certainly possible). The lifeguards on each beach have
been keeping record of our surfer and have come up with the following probability
mass functions describing the surfer’s behaviour:
s_11(m) = (1/3)(2/3)^{m−1}    s_12(m) = (1/4)(3/4)^{m−1}    s_13(m) = (1/3)(2/3)^{m−1}
s_21(m) = (1/2)(1/2)^{m−1}    s_22(m) = (1/8)(7/8)^{m−1}    s_23(m) = (2/5)(3/5)^{m−1}
s_31(m) = (1/4)(3/4)^{m−1}    s_32(m) = (1/3)(2/3)^{m−1}    s_33(m) = (1/2)(1/2)^{m−1}
for m = 1,2,3, . . .
Solution. Consider the general geometric distribution (cf. Sec. 1.2)
p(n) = (1 − a)a^{n−1}. The first moment or mean n̄ can be calculated in the following way:
n̄ = ∑_{n=0}^{∞} n(1 − a)a^{n−1}
Using the property that ∑ d/da = d/da ∑ we write
n̄ = (1 − a) ∑_{n=0}^{∞} d/da a^n
  = (1 − a) d/da (1/(1 − a))
  = 1/(1 − a)
τ̄_11 = 3;  τ̄_12 = 4;  τ̄_13 = 3;
τ̄_21 = 2;  τ̄_22 = 8;  τ̄_23 = 2.5;
τ̄_31 = 4;  τ̄_32 = 3;  τ̄_33 = 2;
Clearly beach 2 (Waikiki) is the most popular with our surfer, since he spends on
average 8 units of time surfing there whenever he intends to return to it immediately.
The mean time he spends on that beach, irrespective of where he might go next, is
given by
τ̄_2 = p_21 τ̄_21 + p_22 τ̄_22 + p_23 τ̄_23 = 0.3(2) + 0(8) + 0.7(2.5) = 2.35
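The same computation for all three beaches is compact in code. The minimal sketch below (not part of the original text) combines the embedded transition matrix P of Example 2.1 with the sojourn time means derived above to obtain the waiting times τ̄_i = ∑_j p_ij τ̄_ij.

```python
import numpy as np

# Embedded DTMC of the surfer (Example 2.1): 1 = Clifton, 2 = Waikiki, 3 = Ipanema.
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.2, 0.3, 0.5]])

# Mean sojourn times tau_ij obtained from the geometric mass functions above.
tau_ij = np.array([[3.0, 4.0, 3.0],
                   [2.0, 8.0, 2.5],
                   [4.0, 3.0, 2.0]])

tau_i = (P * tau_ij).sum(axis=1)   # waiting time in each state
print(tau_i)                       # [3.5, 2.35, 2.7]
```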
Exercise 2.7 In Ex. 2.6, compute the mean time the surfer spends on Ipanema,
assuming that he has arrived there from anywhere else but Waikiki.
We now set out to find the limiting behaviour of the interval transition proba-
bilities over long intervals. It is important to note that the MC structure of a
semi-Markov process is the same as that of its embedded Markov process. There-
fore the interval transition probabilities of a semi-Markov process can exhibit a
unique limiting behaviour only within the chain of the embedded Markov process.
We begin by defining a limiting interval transition probability matrix Φ for our
process by
Φ = lim_{n→∞} Φ(n)
with elements φij . However, in steady state, the limiting interval transition prob-
abilities φij do not depend on the starting state i and we therefore write φij = φj .
Define a vector ϕ = (φ0 ,φ1 , . . . ,φN ) as the vector of probabilities φj that the semi-
Markov process is in state j as time n → ∞ and let Π = {π0 ,π1 , . . . ,πN } be the
steady state probability vector of the equivalent embedded MC [cf. Eq. (2.21)].
One can prove (see e.g., Howard[90]) that
φ_j = π_j τ̄_j / ∑_{i=1}^{N} π_i τ̄_i        (2.35)
or
ϕ = (1/τ̄) Π M
where we have written
τ̄ = ∑_{j=1}^{N} π_j τ̄_j
where υij , given by Eq. (2.20), is the visit ratio defined on page 34.
Example 2.7 Suppose we want to know the steady state probability of finding
our surfer on Waikiki beach. This we can do by applying Eq. (2.35). In Ex. 2.5
we determined that
Π = (0.188,0.260,0.552)
so that
φ_2 = π_2 τ̄_2 / (π_1 τ̄_1 + π_2 τ̄_2 + π_3 τ̄_3)
    = 0.260 × 2.35 / (0.188 × 3.5 + 0.260 × 2.35 + 0.552 × 2.7)
    = 0.22
There is thus a 22 percent chance of finding him on Waikiki or beach number 2.
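Eq. (2.35) is equally easy to evaluate for all states at once. The short sketch below (illustrative, not part of the original text) reproduces the 22 percent figure and gives the corresponding probabilities for the other two beaches.

```python
import numpy as np

pi = np.array([0.188, 0.260, 0.552])   # embedded-chain steady state (Example 2.5)
tau = np.array([3.5, 2.35, 2.7])       # waiting times computed in the previous sketch

phi = pi * tau / (pi * tau).sum()      # Eq. (2.35)
print(phi.round(3))                    # about [0.238, 0.221, 0.540];
                                       # the middle entry is the 0.22 found above
```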
Exercise 2.8 A car rental company has determined that when a car is rented
in Town 1 there is a 0.8 probability that it will be returned to the same town
and a 0.2 probability that it will be returned to Town 2. When rented in Town
2, there is a 0.7 probability that the car will be returned to Town 2, otherwise it
is returned to Town 1. From its records, the company determined that the rental
period probability mass functions are:
s_11(m) = (1/3)(2/3)^{m−1}    s_12(m) = (1/6)(5/6)^{m−1}    m = 1,2,3, . . .
s_21(m) = (1/4)(3/4)^{m−1}    s_22(m) = (1/12)(11/12)^{m−1}    m = 1,2,3, . . .
What percentage of the time does a car spend in Town 2?
In Sec. 2.1 we described how our surfer had to decide at regular, equal intervals
of time whether to leave or whether to stay on the beach where he is. If we now
allow him to decide at an arbitrary time which beach to go to next, we have the
continuous-time version of that example.
The Continuous Time Markov Chain (CTMC) version of the Markov property,
Eq. (2.1), is given by
for any sequence t0 ,t1 , . . . ,tn such that t0 < t1 < . . . < tn and xk ∈ S where S is
the (discrete) state space of the process.
The right-hand side of the above equation is the transition probability of the
CTMC and we write
to identify the probability that the process will be in state xj at time s, given
that it is in state xi at time t ≤ s. Since we are still considering discrete state
Markov processes (chains) we will continue to use i ∈ N rather than xi to denote
a state of our Markov processes.
Note that we need to define
p_ij(t,t) = 1 if i = j, and 0 otherwise        (2.37)
to establish the fact that the process may not leave immediately to another state.
We already mentioned in Sec. 2 on page 26 that the time a Markov process spends
in any state has to be memoryless. In the case of a DTMC this means that the
chain must have geometrically distributed state sojourn times while a CTMC
must have exponentially distributed sojourn times. This is such an important
property that we include a proof from Kleinrock[108] of it here. The proof is also
instructive in itself.
Let yi be a random variable which describes the time spent in state i. The Markov
property specifies that we may not remember how long we have been in state
i which means that the remaining sojourn time in i may only depend upon
i. Assume that this remaining time t has distribution h(t). Then our Markov
property insists that
for all s,t ≥ 0. All that remains is to show that the only continuous distribution
function which satisfies Eq. (2.39) is the negative exponential distribution. Write
fi (t) for the corresponding density function. Then we can write
d/dt P[y_i > t] = d/dt (1 − P[y_i ≤ t])
                = −f_i(t)        (2.40)
Using the latter result and differentiating Eq. (2.39) with respect to s we obtain
dP[y_i > s + t] / ds = −f_i(s) P[y_i > t]
Dividing both sides by P[y_i > t] and setting s = 0 we obtain the differential
equation
dP[y_i > t] / P[y_i > t] = −f_i(0) dt
for all t ≥ 0. Setting λ = f_i(0) we are back to f_i(x) = λe^{−λx} which we had before
in Eq. (2.4) on page 26. □
In other words, if a stochastic process has the Markovian property, the time spent
in a state will have a negative exponential distribution. This may seem rather
restrictive since many real systems do not have exponential time distributions.