
Stochastic Petri Nets, 2nd Edition
Falko Bause and Pieter S. Kritzinger



Preface
Any developer of discrete event systems knows that the most important quality of
the final system is that it be functionally correct by exhibiting certain functional,
or qualitative properties decided upon as being important. Once assured that the
system behaves correctly, it is also important that it is efficient in that its running
cost is minimal or that it executes in optimum time or whatever performance
measure is chosen. While functional correctness is taken for granted, the latter
quantitative properties will often decide the success, or otherwise, of the system.
Ideally the developer must be able to specify, design and implement his system
and test it for both functional correctness and performance using only one for-
malism. No such formalism exists as yet. In recent years the graphical version
of the Specification and Description Language (SDL) has become very popular
for the specification, design and partial implementation of discrete systems. The
ability to test for functional correctness of systems specified in SDL is, however,
limited to time consuming simulative executions of the specification and perfor-
mance analysis is not directly possible. Petri nets, although graphical in format
are somewhat tedious for specifying large complex systems but, on the other
hand were developed exactly to test discrete, distributed systems for functional
correctness. With a Petri net specification one can test, e.g., for deadlock, live-
ness and boundedness of the specified system. Petri nets in their various formats,
have been studied extensively since first proposed by Carl Adam Petri in 1962
[133] and several algorithms exist to determine the functional properties of nets.
Another paradigm which is aimed at testing for functional correctness is that of
process algebras or calculi for communicating systems.
The major drawback of Petri nets, as originally proposed, and of process algebras
(amongst others) is that quantitative analyses are not catered for. As a conse-
quence, the developer who needs to know about these properties in his system
has to devise a different model of the system which, apart from the overhead con-
cerned, provides no guarantee of consistency across the different models. Because
of the latter, computer scientists during the last decade added time, in various
forms, to ordinary Petri nets to create Stochastic Petri nets (SPNs) and Gen-
eralized Stochastic Petri nets (GSPNs) for performance modelling and a great
deal of theory has developed around Stochastic Petri nets as these are generically
known.
Another aspect which also contributed significantly to the development of Stochas-
tic Petri nets is the fact that their performance analysis is based upon Markov
theory. Since the description of a Markov process is cumbersome, abstract models
have been devised for their specification. Of these, queueing networks (QNs) were
originally the most popular, especially since the analysis of a large class of QNs
(product-form QNs) can be done very efficiently. QNs cannot, however, describe
system behaviours like blocking and forking and with the growing importance
of distributed systems this inability to describe synchronisation naturally turned

the focus to Petri nets as well.


Stochastic Petri nets are therefore a natural development from the original Petri
nets because of
• the advantage of their graphical format for system design and specification
• the possibility and existing rich theory for functional analysis with Petri nets
• the facility to describe synchronisation, and
• the natural way in which time can be added to determine quantitative prop-
erties of the specified system.
The disappointing thing about Stochastic Petri nets is that the integration of time
changes the behaviour of the Petri net significantly. So properties proven for the
Petri net might not hold for the corresponding time-augmented Petri net. E.g., a
live Petri net might become deadlocked or a non-live Petri net might become live.
We will see that the analysis techniques developed for Petri nets are not always
applicable to SPNs. But there are ways around this, as we shall see in this book.
Also, using Stochastic Petri nets to specify the sharing of resources controlled by
specific scheduling strategies is very cumbersome. So the pendulum has swung
back, in the sense that we introduce certain concepts from queueing theory when
presenting Queueing Petri nets (QPNs) which offer the benefits of both worlds,
Petri nets and Queueing networks.
This book itself arose out of a desire by the authors to collect all one needs to
understand Stochastic Petri net theory in one volume. It is in three parts. The
first part is on stochastic theory leading to introductory queueing theory and
simple queues. In Part I we emphasise Markovian theory, because where general
queueing theory fails, Markovian analysis can often still be useful.
Part II is about Petri nets, starting with ordinary Petri nets and ending with
Coloured Petri nets. Ordinary and Coloured Petri nets do not involve time and
were developed to test the functionality of concurrent systems. In this part of
the book we give an overview of the most important analysis techniques paying
particular attention to the validation of those properties which are essential for
Stochastic Petri nets.
Our emphasis in Part III is on those Stochastic Petri net models which can be
analysed by Markovian techniques. The intention of this book is not to give an
overview of several or all Stochastic Petri net models appearing in the literature,
but to stress a combined view of functional and performance analysis in the
context of some Stochastic Petri net models.
We hope that by reading this book, the reader will become as excited as we are
about the subject of Stochastic Petri nets and the many unsolved problems arising
from the increasing demands for correctness and performance when specifying
discrete event systems.
Falko Bause and Pieter Kritzinger
Dortmund, Germany
Cape Town, South Africa
1995.

Preface to the Second Edition


A great deal of progress has been made in the analysis of Petri nets and Stochastic
Petri nets since the first edition of this book appeared over 6 years ago. Amongst
others, partial state space exploration methods have been proposed and state-
based analysis techniques exploiting the structure of the system being modeled
are now known. In the case of Queueing Petri nets and stochastic Petri nets
in general, results and algorithms based on product-form solutions have been
introduced.
The result is that nets with up to 50 million states can now be analysed on
ordinary computing equipment. In order to guide the reader to these results we
have added several links in this edition to the relevant literature and updated the
Further Reading sections at the end of each chapter as starting points for more
detailed information.
Naturally, we were tempted to include the new material mentioned in the book.
That would, however, have detracted from the focus and advantage of this text:
A concise introduction to both the functional and performance aspects of Petri
nets without emphasising the one or the other.
Falko Bause and Pieter Kritzinger
Dortmund, Germany
Cape Town, South Africa
2002.

For Heinz Beilner, our friend and mentor, without whom


this book would never have been written.

Contents

Preface 5

Preface to the Second Edition 7

Contents 9

I STOCHASTIC THEORY 13

1 Random Variables 15
1.1 Probability Theory Refresher . . . . . . . . . . . . . . . . . . . 15
1.2 Discrete Random Variables . . . . . . . . . . . . . . . . . . . . 18
1.3 Continuous Random Variables . . . . . . . . . . . . . . . . . . 20
1.4 Moments of a Random Variable . . . . . . . . . . . . . . . . . . 21
1.5 Joint Distributions of Random Variables . . . . . . . . . . . . . 22
1.6 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . 23

2 Markov Processes 25
2.1 Discrete Time Markov Chains . . . . . . . . . . . . . . . . . . . 27
2.1.1 Steady State Distribution . . . . . . . . . . . . . . . . . . . 33
2.1.2 Absorbing Chains and Transient Behaviour . . . . . . . . . 37
2.2 Semi-Markov Processes . . . . . . . . . . . . . . . . . . . . . . 43
2.2.1 Formal Model of a Semi-Markov Process . . . . . . . . . . . 43
2.2.2 Interval Transition Probabilities . . . . . . . . . . . . . . . 45
2.2.3 Steady State Behaviour . . . . . . . . . . . . . . . . . . . . 47
2.3 Continuous Time Markov Chains . . . . . . . . . . . . . . . . . 49
2.3.1 Steady State Distribution . . . . . . . . . . . . . . . . . . . 54
2.4 Embedded Markov Chains . . . . . . . . . . . . . . . . . . . . . 56

3 General Queueing Systems 58


3.1 Little’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Birth-Death Processes . . . . . . . . . . . . . . . . . . . . . . . 64
3.3 Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 M/M/1 Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 M/M/m Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.6 Queues with Processor Sharing Scheduling Strategy . . . . . . 72
3.7 Queues with Infinite Servers . . . . . . . . . . . . . . . . . . . . 73
3.8 Queues with Priority Service . . . . . . . . . . . . . . . . . . . 73

4 Further Reading 75

II PETRI NETS 77

5 Place-Transition Nets 79
5.1 Structure of Place-Transition Nets . . . . . . . . . . . . . . . . 83
5.2 Dynamic Behaviour of Place-Transition Nets . . . . . . . . . . 86
5.3 Properties of Place-Transition Nets . . . . . . . . . . . . . . . . 88
5.4 Analysis of Place-Transition Nets . . . . . . . . . . . . . . . . . 92
5.4.1 Analysis of the Reachability Set . . . . . . . . . . . . . . . 92
5.4.2 Invariant Analysis . . . . . . . . . . . . . . . . . . . . . . . 97
5.4.3 Analysis of Net Classes . . . . . . . . . . . . . . . . . . . . 103
Analysis of State Machines . . . . . . . . . . . . . . . . . . 104
Analysis of Marked Graphs . . . . . . . . . . . . . . . . . . 106
Analysis of EFC-nets . . . . . . . . . . . . . . . . . . . . . . 108
5.4.4 Reduction and Synthesis Analysis . . . . . . . . . . . . . . 112
5.5 Further Remarks on Petri Nets . . . . . . . . . . . . . . . . . . 115

6 Coloured Petri Nets 119

7 Further Reading 128

III TIME-AUGMENTED PETRI NETS 131

8 Stochastic Petri Nets 135

9 Generalized Stochastic Petri Nets 143


9.1 Quantitative Analysis of GSPNs . . . . . . . . . . . . . . . . . 145
9.2 Qualitative Analysis of GSPNs . . . . . . . . . . . . . . . . . . 152
9.2.1 Qualitative Analysis of EFC-GSPNs . . . . . . . . . . . . . 158
9.3 Further Remarks on GSPNs . . . . . . . . . . . . . . . . . . . . 162

10 Queueing Petri Nets 166


10.1 Quantitative Analysis of QPNs . . . . . . . . . . . . . . . . . . 168
10.2 Qualitative Analysis of QPNs . . . . . . . . . . . . . . . . . . . 173
10.2.1 Qualitative Analysis of EFC-QPNs . . . . . . . . . . . . . . 174
10.3 Some Remarks on Quantitative Analysis . . . . . . . . . . . . . 178

11 Further Reading 180

12 Application Examples 183


12.1 Resource Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . 183
12.2 Node of a DQDB network . . . . . . . . . . . . . . . . . . . . . 184

13 Solutions to Selected Exercises 189

Bibliography 197

Index 213
Part I

STOCHASTIC THEORY

1 Random Variables

Much of the world around us is not very deterministic although it may not be
apparent at first glance. Consider a computer, for instance, which given the same
input values, will always give the same output. While a computer program is
processing incoming data however, it is often not possible to predict from one
moment to the next

– what input values will arrive for processing, or

– the time sequence in which they will arrive.

Think of the node of a computer network to understand this. Although the set
of messages which may arrive at the node is finite and known, we cannot tell for
certain from instant to instant which messages will arrive from where. Moreover,
the network software is likely to be using the same processor(s) at the node as
the operating system. When the process executing the network software will be
interrupted and by which process cannot be said for certain. All of which makes
it impossible to tell for certain what will happen next. We say the process just
described is stochastic.
The term stochastic has an exact mathematical meaning and there is a vast theory
developed to predict the behaviour of stochastic processes. This part of the book
gives only a basic introduction to that theory. Goodman [86] provides a more
thorough introduction to the subject while an extensive treatment can be found
in Howard [90].

1.1 Probability Theory Refresher

In order to understand stochastic theory, one needs to know some fundamental


concepts of probability theory. This section provides such a basic introduction.
For students wishing a more fundamental introduction, there are many good
books on probability theory, such as those of Feller [75] and Ross [152].
The first concept in probability theory that we need to know is that of an exhaus-
tive set of events which is a set of events whose union forms the sample space
S of all possible outcomes of an experiment. The sample space when we roll an
ordinary die, for instance, consists of 6 events.
If two events A and B are such that

A ∩ B = ∅ (the empty set)



then the two events are said to be mutually exclusive or disjoint. This leads to
the concept of mutually exclusive exhaustive events {A1 ,A2 , . . . ,An } which are
events such that

A_i A_j = A_i ∩ A_j = ∅   for all i ≠ j
A_1 ∪ A_2 ∪ … ∪ A_n = S    (1.1)

The next important concept is that of conditional probability. The conditional
probability of the event A, given that the event B occurred (denoted as P[A|B])
is defined as

P[A|B] := \frac{P[AB]}{P[B]}

whenever P[B] ≠ 0.
The statistical independence of events can be defined as follows. Two events A
and B are said to be statistically independent iff

P [AB] = P [A]P [B]. (1.2)

For three statistically independent events A, B, C each pair of events must satisfy
Eq. (1.2) as well as

P [ABC] = P [A]P [B]P [C]

and so on for n events requiring the n-fold factoring of the probability expression
as well as the (n−1)-fold factorings all the way down to all the pairwise factorings.
Moreover, for two independent events A and B

P [A|B] = P [A]

which merely states that the knowledge of the occurrence of an event B does not
affect the probability of the occurrence of the independent event A in any way
and vice-versa.
We also need to know the theorem of total probability for our basic understanding
of probability theory.

Theorem 1.1 Theorem of Total Probability. Consider an event B and a set of
mutually exclusive exhaustive events {A_1, A_2, …, A_n}. If the event B is to occur
it must occur in conjunction with exactly one of the mutually exhaustive events
A_i. That is

P[B] = \sum_{i=1}^{n} P[A_i B]

From the definition of conditional probability we may always write

P[A_i B] = P[A_i|B] P[B] = P[B|A_i] P[A_i]

which leads to the second form of the theorem of total probability

P[B] = \sum_{i=1}^{n} P[B|A_i] P[A_i]

The last equation suggests that to find the probability of some complex event
B, one simplifies the event by conditioning it on some event Ai in such a way
that computing the probability of event B given event Ai is less complex and
then to multiply by the probability of the conditional event Ai to yield the joint
probability P [Ai B]. Having done this for a set of mutually exclusive exhaustive
events {Ai } we may then sum these probabilities to find the probability of the
event B. If we need to simplify the analysis even further, we may condition
event B on more than one event and then uncondition each of these events by
multiplying by the probability of the appropriate condition and then sum all
possible forms of all conditions.
The final bit of probability theory that we are certain to come across in our study
of stochastic systems is Bayes’ theorem.

Theorem 1.2 Bayes’ theorem. Let {Ai } be a set of mutually exclusive and ex-
haustive events. Then
P[A_i|B] = \frac{P[B|A_i] P[A_i]}{\sum_{j=1}^{n} P[B|A_j] P[A_j]}    (1.3)

Bayes’ theorem allows us to compute the probability of one event conditioned on


a second by calculating the probability of the second conditioned on the first and
other terms.
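To make Theorems 1.1 and 1.2 concrete, here is a small Python sketch (our own illustration, not part of the original text) that evaluates the theorem of total probability and Bayes' theorem for a made-up partition of three events:

```python
# Hypothetical three-event partition {A1, A2, A3}; the numbers are invented
# purely for illustration.
prior = [0.5, 0.3, 0.2]        # P[A_i], sums to 1
likelihood = [0.9, 0.5, 0.1]   # P[B | A_i]

# Theorem of total probability (Theorem 1.1): P[B] = sum_i P[B|A_i] P[A_i]
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' theorem (Eq. 1.3): P[A_i | B] = P[B|A_i] P[A_i] / P[B]
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]

print(p_b)        # 0.62
print(posterior)  # [0.7258..., 0.2419..., 0.0322...]
```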

Exercise 1.1 If there are n people present in a room, what is the probability
that at least two of them have the same birthday? How large may n be for this
probability to be less than 0.5?

Exercise 1.2 A student writes a multiple-choice examination where each ques-
tion has exactly m possible answers. Assume that a student knows the correct
answer to a proportion p of all the questions; if he does not know the correct
answer, he makes a random guess. Suppose that the student got the answer to a
particular question wrong. What is the probability that he was guessing?

1.2 Discrete Random Variables

We call a variable random and denote it χ if we cannot tell for certain what its
value will be. Examples of such random variables are the temperature outside on
any particular day, the number of customers in a supermarket checkout line or
the number of messages arriving at a network node.
A random variable is said to be discrete if the set of possible values of χ is
countable (but not necessarily finite). Since we do not know for certain what
value it will have, we say that it will have value x with a probability pχ (x). That
is
pχ (x) = P [χ = x]. (1.4)
In this formula, x can be any real number and 0 ≤ pχ (x) ≤ 1 for all values of x.
pχ (x) is called the probability mass function of χ.
Suppose that χ can take on the values x1 ,x2 ,x3 ,x4 or x5 with probability p1 ,p2 ,p3 ,p4
and p5 respectively. Clearly,
\sum_{i=1}^{5} p_i = 1

The following random variables are important for our studies.

Definition 1.1 Bernoulli variable. A Bernoulli random variable χ takes on val-
ues of 0 and 1. The event {χ = 1} is called a success and occurs with probability
p. The event {χ = 0} is called a failure and occurs with probability q = 1 − p.

Suppose an experiment consists of spinning an unbiased coin. If we spin the coin


n times we say that we have performed n trials and if the outcome is either 0 or
1, true or false, we refer to that as a Bernoulli trial.

Definition 1.2 Binomial variable. The probability mass function of a binomial
random variable χ that yields k successes in n independent Bernoulli trials is
defined by

p_χ(k) = \binom{n}{k} p^k q^{n-k}

Definition 1.3 Geometric variable. Suppose it took N Bernoulli trials to obtain
the first success. The variable N is said to be geometric and its probability mass
function is given by

p_N(k) = q^{k-1} p    (1.5)
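As a quick numerical check of Definitions 1.2 and 1.3 (a sketch we add here, not part of the original text), both mass functions can be evaluated directly in Python:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent Bernoulli trials (Definition 1.2)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (Eq. 1.5)."""
    return (1 - p)**(k - 1) * p

# With an unbiased coin (p = 0.5):
print(binomial_pmf(2, 5, 0.5))                           # 0.3125
print(geometric_pmf(3, 0.5))                             # 0.125
print(sum(binomial_pmf(k, 5, 0.5) for k in range(6)))    # 1.0 (the pmf sums to one)
```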

Another way of describing a random variable χ which takes values from an ordered
set is to give a formula for the probability that it will take on values of xi which
are less than or equal to some value a. This leads to the following important
definition.

Definition 1.4 The cumulative distribution function of a random variable χ is
the function

F_χ(a) = \sum_{x ≤ a} p_χ(x)

defined for all real variables a.

Another way of denoting the cumulative distribution often encountered in the


literature is

Fχ (x) = P [χ ≤ x]

Again, it should be evident that the values of the function Fχ are between 0 and
1. Using Fχ we can calculate

P [a < χ ≤ b] = Fχ (b) − Fχ (a) (1.6)

This follows easily from the fact that

{χ ≤ b} = {χ ≤ a} ∪ {a < χ ≤ b}

so that
Fχ (b) = Fχ (a) + P [a < χ ≤ b]
and the equation in (1.6) follows.

Exercise 1.3 If the probabilities of a male or female offspring are both 0.5, find
the probability of a family of five children being all male.

Exercise 1.4 A person has 18 bank-notes which includes 4 counterfeits in his


purse. If he pays for an item with 2 bank-notes selected randomly from his purse,
what are the probabilities that
1. both notes are genuine;

2. one of the notes is a counterfeit;

3. both notes are counterfeits?

Exercise 1.5 An Ethernet local network has k stations always ready to transmit.
A station transmits successfully if no other station attempts to transmit at the
same time as itself. If each station attempts to transmit with probability p, what
is the probability that some station will be successful?

1.3 Continuous Random Variables

In Sec.1.2 we obtained the distribution function of the discrete random variable


by summing values of the mass function. When we now replace summation by
integration, we obtain the notion of a continuous random variable.
A random variable χ is said to be continuous if there exists a nonnegative function
fχ (x) such that the cumulative distribution function Fχ (x) can be calculated from
F_χ(a) = \int_{-∞}^{a} f_χ(x) dx    (1.7)

and frequently defined by the expression

Fχ (a) = P [χ ≤ a]

The function fχ (x) is called the probability density function of the random vari-
able χ. Again, because we are concerned with probabilities, we must have the
condition
\int_{-∞}^{∞} f_χ(x) dx = 1    (1.8)

Also, analogous to Eq. (1.6) we can calculate the probability that a random
variable χ lies in the interval (a,b) from
P[a < χ < b] = \int_{a}^{b} f_χ(x) dx    (1.9)

The density function fχ (x) does not have to be continuous, but the distribution
function Fχ (x) is automatically continuous. This implies

P [χ = x] = 0 (1.10)

for any value of x, so that the events

{a ≤ χ < b} {a < χ ≤ b} {a < χ < b}

all have the same probability given by the integral in (1.9).


It should be clear that we can compute the density function from the distribution
function by

f_χ(x) = \frac{d}{dx} F_χ(x)    (1.11)
The probability density function we will meet over and over again is the negative
exponential density function given by

f_χ(x) = λe^{-λx}    (1.12)



The constant λ is called the parameter of the distribution and the function is
undefined for x < 0.
The corresponding cumulative distribution function is easily calculated to be
F_χ(a) = \int_{0}^{a} λe^{-λx} dx = 1 − e^{-λa}    (1.13)

for a ≥ 0, and F_χ(a) = 0 if a < 0. Note also that \lim_{a→∞} F_χ(a) = 1 as it should
be, since it is certain that 0 ≤ x < ∞.
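The following short sketch (added here for illustration) uses Eq. (1.13) to compute probabilities of the kind asked for in Exercise 1.6:

```python
from math import exp

def exp_cdf(a, lam):
    """F(a) = 1 - exp(-lam * a) for a >= 0, and 0 otherwise (Eq. 1.13)."""
    return 1.0 - exp(-lam * a) if a >= 0 else 0.0

lam = 10.0
print(exp_cdf(3, lam) - exp_cdf(0, lam))   # P[0 < X <= 3], essentially 1
print(1.0 - exp_cdf(5, lam))               # P[X > 5] = e^{-50}, essentially 0
```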

Exercise 1.6 Find the probabilities that a random variable having an exponential
distribution with parameter λ = 10 assumes a value between 0 and 3, a value
greater than 5, and a value between 9 and 13.

Exercise 1.7 The life expectancy of a certain kind of lightbulb is a random vari-
able with an exponential distribution and a mean life of 100 hours. Find the
probability that the lightbulb will exceed its expected lifetime.

1.4 Moments of a Random Variable

In most cases we are not interested in the specific distribution function of a


random variable, but only in some characteristic values, the moments. The mean
or average value of a real positive random variable χ(t) is used very often and it
is more frequently referred to as the expectation of that variable. We write E[χ]
or χ for that value and it is defined by
Z ∞
E[χ] = tfχ (t)dt
0

Note that we integrate only over the interval t ∈ [0,∞) since the independent
variable will always be time in our discussions.
We will see later on that we will need to know the expectation of the power of
a random variable as well. The expected value of the nth power of a random
variable is referred to as its nth moment. Thus the more general nth moment
(the mean is just the first moment) is given by
E[χ^n] = \int_{0}^{∞} t^n f_χ(t) dt

Furthermore, the nth central moment of a random variable is defined to be

\overline{(χ − \bar{χ})^n} = \int_{0}^{∞} (t − \bar{χ})^n f_χ(t) dt

The second central moment is used very often and is referred to as the variance,
usually denoted by σ_χ^2 and defined as before by

σ_χ^2 = \overline{(χ − \bar{χ})^2} = \overline{χ^2} − (\bar{χ})^2

The square root σχ of the variance is referred to as the standard deviation. The
ratio of the standard deviation to the mean of a random variable is called the
coefficient of variation denoted by

C_χ = \frac{σ_χ}{\bar{χ}}
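A small numerical sketch (our own addition, using a simple Riemann sum) checks these moment definitions for the exponential density of Eq. (1.12), whose exact values are E[χ] = 1/λ, σ² = 1/λ² and C = 1:

```python
from math import exp, sqrt

lam = 2.0
dt = 1e-4
# integrate the density lam * exp(-lam * t) from 0 far into the tail
ts = [i * dt for i in range(int(25 / dt))]

def moment(n):
    """Approximate the nth moment E[X^n] by a Riemann sum."""
    return sum(t**n * lam * exp(-lam * t) * dt for t in ts)

mean = moment(1)                   # first moment
var = moment(2) - mean**2          # second central moment (variance)
cv = sqrt(var) / mean              # coefficient of variation
print(mean, var, cv)               # approximately 0.5, 0.25, 1.0
```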

Exercise 1.8 Referring to Ex. 1.5 (page 19), compute the mean number of colli-
sions to be expected by any one of the k stations before a successful transmission.

Exercise 1.9 Compute the mean and coefficient of variation of a random variable
which is exponentially distributed with parameter λ.

1.5 Joint Distributions of Random Variables

Use the symbol Rn to denote the set of all n-tuples of real numbers. Let χ1 ,χ2 , . . . ,χn
be random variables. These random variables are said to have a joint discrete dis-
tribution if there exists a nonnegative function p(x1 ,x2 , . . . ,xn ) of n real variables
that has the value 0 except at a countable set of points in Rn , such that

P [χ1 = x1 ,χ2 = x2 , . . . ,χn = xn ] = p(x1 ,x2 , . . . ,xn )

for all points (x1 ,x2 , . . . ,xn ) in Rn . Obviously we must have that
\sum_{x ∈ R^n} p(x) = 1

Similarly we say that the collection χ1 ,χ2 , . . . ,χn of random variables has a
joint continuous distribution if there exists a nonnegative integrable function
f (x1 ,x2 , . . . ,xn ) of n real variables that satisfies

P[χ_1 ≤ a_1, …, χ_n ≤ a_n] = \int_{-∞}^{a_1} \cdots \int_{-∞}^{a_n} f(x_1, x_2, …, x_n) dx_1 … dx_n

for all choices of upper limits a1 , . . . ,an . The function f is called the joint proba-
bility density function of the random variables χ1 ,χ2 , . . . ,χn and as in the discrete
case, we must have

\int_{-∞}^{∞} \cdots \int_{-∞}^{∞} f(x_1, x_2, …, x_n) dx_1 … dx_n = 1

If we know the joint distribution f of χ1 ,χ2 , . . . ,χn , we can obtain the distribution
of any one, say χm , of the random variables by simply integrating over all values
of the remaining random variables. That is,

f_m(x) = \int_{-∞}^{∞} \cdots \int_{-∞}^{∞} f(x_1, …, x_{m−1}, x, x_{m+1}, …, x_n) dx_1 … dx_{m−1} dx_{m+1} … dx_n

is the probability density function of χm . The same holds true for the discrete
case where we would sum, rather than integrate, over all possible values of the
other variables.
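The marginalisation just described amounts to summing (or integrating) out the other variables; a tiny discrete sketch (with made-up numbers, added here for illustration) shows the idea:

```python
# Hypothetical joint distribution of two discrete random variables (chi1, chi2).
joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

# Marginal distribution of chi1: sum the joint probabilities over all values of chi2.
marginal = {}
for (x1, x2), p in joint.items():
    marginal[x1] = marginal.get(x1, 0.0) + p

print(marginal)   # roughly {0: 0.4, 1: 0.6} (up to floating point rounding)
```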

1.6 Stochastic Processes

In the previous sections we frequently referred to a random variable χ taking on


a value x. Nowhere did we make any mention of time, or, in other words, when
χ took on what value or how that varied with time. Should we do that, we have
what is known as a stochastic process. Mathematically then, a stochastic process
is a family of random variables {χ(t)} defined over the same probability space.
Put differently, the values (also called states) that members of the family χ(t)
can take on all belong to the same set called the state space of χ(t).
Examples of stochastic processes are the number of persons on the beach as a
function of the time of the day or the number of processes executing on a computer
as a function of time. You will come to suspect already that if we can describe the
latter mathematically we have made great progress at predicting the behaviour
of the computer.
The classification of stochastic processes (some people also call them random pro-
cesses) depends on three things: the state space; the nature of the time parameter
and the statistical dependencies among the random variables χ(t) for different
values of the time parameter.

Definition 1.5 If the values x = (x1 ,x2 , . . .) in the state space of χ(t) are finite
or countable, then we have a discrete-state process, also called a chain. The state
space for a chain is usually the set of integers {0,1,2, . . .}. If the permitted values
in the state space may range over a finite or infinite continuous interval, then
we say that we have a continuous-state process. The theory of continuous-state
stochastic processes is not easy and we will only be considering discrete-state
processes in this book.

Definition 1.6 If the times t = (t1 ,t2 , . . . ,tn ) at which we observe the value of
χ(t) are finite or countable, then we say that we have a discrete-time process;
if these times may, however, occur anywhere within a set of finite intervals or
an infinite interval of time, then we say that we have a continuous-time process.
When time is discrete we write χn rather than χ(t) and refer to a stochastic
sequence rather than a stochastic process.

Definition 1.7 Consider the joint distribution function (refer Sec. 1.5) of all the
random variables X = {χ(t1 ),χ(t2 ), . . .} given by

FX (x; t) = P [χ(t1 ) ≤ x1 , . . . ,χ(tn ) ≤ xn ] (1.14)

for all x = (x1 ,x2 , . . . ,xn ), t = (t1 ,t2 , . . . ,tn ) and all n. Then the nature of FX (x; t)
is the third quantity which determines the class of a stochastic process.

In this book we will consider only the class of stochastic processes known as
Markov processes.

2 Markov Processes

In 1907 a Russian mathematician, A.A. Markov, described a class of stochastic


processes whose conditional probability density function is such that

P[χ(t) = x | χ(t_n) = x_n, χ(t_{n−1}) = x_{n−1}, …, χ(t_0) = x_0]
    = P[χ(t) = x | χ(t_n) = x_n],    t > t_n > t_{n−1} > … > t_0    (2.1)

The above condition is known as the Markov property. A Markov process is a


stochastic process {χ(t), t ∈ T } for which this property holds. We will assume
that T = [0,∞) in our discussions and write S = {xi = i; i ∈ N0 }1 , the state
space of the process.
The intuitive explanation of the Markov property is to say that the future of the
process, from time tn onwards, is determined only by the present state. However
the process may have evolved to its present state χ(tn ), it does not influence
the future. However, even if the history of the system up to the present state
does influence the future behaviour, we may still be able to satisfy the Markov
assumption by a change in the state structure. Let us assume, for example, that
the next state depends on the last two states of our N state Markov chain (MC)2 .
Then we could define a new Markov process with N^2 states, where each state in
the new process would consist of successive pairs of states in the old process. In
this way the Markov property of Eq. (2.1) would still be satisfied, albeit at the
expense of considerable increase in computational complexity. Any dependence
of future behaviour on a finite number of historical steps can, at least in theory,
be treated in the same way.
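The following sketch (our own illustration, with an invented two-step kernel) shows the construction: a process whose next state depends on the last two of N states becomes an ordinary first-order chain on N² pair-states.

```python
import itertools

N = 2  # the original process has states 0 and 1

# Hypothetical second-order kernel: p2[(i, j)][k] = P[next = k | previous two states were (i, j)]
p2 = {
    (0, 0): [0.9, 0.1], (0, 1): [0.4, 0.6],
    (1, 0): [0.7, 0.3], (1, 1): [0.2, 0.8],
}

# First-order chain on pair-states: a transition (i, j) -> (j, k) happens with
# probability p2[(i, j)][k]; every other transition is impossible.
pairs = list(itertools.product(range(N), repeat=2))
P = [[0.0] * len(pairs) for _ in pairs]
for a, (i, j) in enumerate(pairs):
    for b, (j2, k) in enumerate(pairs):
        if j2 == j:
            P[a][b] = p2[(i, j)][k]

for (i, j), row in zip(pairs, P):
    print((i, j), row)   # each row sums to 1, so P is a valid transition matrix
```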
Definition 2.1 Homogeneous Markov Processes. A Markov process {χ(t)} is said
to be homogeneous or stationary if the following condition holds

P [χ(t + s) = x|χ(tn + s) = xn ] = P [χ(t) = x|χ(tn ) = xn ] (2.2)

The equation expresses that a homogeneous Markov process is invariant to shifts


in time.
Throughout our discussions we shall use as an example of a Markov process a
surfer which goes from beach to beach in some random way as surfers tend to
do. We shall describe the state of this Markov process, xi , i = 1,2, . . . ,N by
the number i of the particular beach the surfer is on. In fact, for notational
convenience, we shall use throughout the integer i to denote the state xi of a
Markov process.
1 N denotes the set of positive integers and N_0 additionally includes the 0.
2 A Markov process with a discrete state space is also called a Markov chain.

In the case of a homogeneous Markov process, the particular instant tn in Eq. (2.2)
does not matter either so that the future of the process is completely determined
by the knowledge of the present state. In other words,

pij (t − tn ) := P [χ(t) = j|χ(tn ) = i] (2.3)

In fact, worse than that, an important implication is that the distribution of the
sojourn time in any state must be memoryless. Our surfer does not know how long
he has been at this beach! If you think about it, if the future evolution depends
on the present state only, it cannot depend on the amount of time spent in the
current state either.
When time is continuous, there is only one probability distribution fχ (y) of the
time y spent in a state which satisfies the property

P [χ ≥ y + s|χ ≥ s] = P [χ ≥ y]

and that is the negative exponential function

fχ (y) = λe−λy , y ≥ 0 (2.4)

In other words, the sojourn times in a Continuous Time Markov Chain (CTMC)
have an exponential probability distribution function. We will prove this fact in
Sec. 2.3 on page 49. Not surprisingly, we will meet the exponential distribution
many times in our discussions.
Similarly, for a Discrete Time Markov Chain (DTMC), the sojourn time η in a
state must be a geometrically distributed random variable (cf. Eq. (1.5))

p_η(n) = P[η = n] = q^{n−1}(1 − q),    n = 1, 2, 3, …;  0 ≤ q < 1.    (2.5)

with cumulative distribution function F_η(n)

F_η(n) = \sum_{k=1}^{n} p_η(k)

Note that when a process has an interarrival time distribution given by F_η(n) it
is said to be a Bernoulli arrival process. Moreover, let η = nδ for n an integer
and δ the basic unit of time. Then the mean time is given by

δ \sum_{k=1}^{∞} k p_η(k) = \frac{δ}{1 − q}    (2.6)

from which the mean arrival rate is (1 − q)/δ.
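A one-line numerical check of Eq. (2.6) (added here, not from the book): truncating the sum after many terms reproduces δ/(1 − q).

```python
q, delta = 0.75, 0.1   # arbitrary illustrative values

# delta * sum_{k>=1} k * q^(k-1) * (1-q), truncated where the terms are negligible
mean_sojourn = delta * sum(k * q**(k - 1) * (1 - q) for k in range(1, 10_000))
print(mean_sojourn, delta / (1 - q))   # both approximately 0.4
```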


In order to decide whether a particular process is a Markov process, it suffices to
check whether the distribution of sojourn times is either exponential or geometric
and whether the probabilities of going from one state to another only depend on
the state the process is leaving and on the destination state.

Exercise 2.1 The weather bureau in a European country decided to improve its
record for weather prediction. This is made a little easier by the fact there are
never two sunny days in a row. If it is a sunny day however, the next day is just
as likely to be rainy as it is likely to be just grey and dull. If it is not a sunny day,
there is an even chance that the weather will be the same the next day. If there
is a change from a rainy or dull day, there is only a 50 percent chance that the
next day will be sunny.
1. Is the stochastic process we have just described Markovian?
2. If it is only approximately Markovian, what can one do to improve the ap-
proximation?

2.1 Discrete Time Markov Chains

In this section we concern ourselves with the case where the time spent in a
Markov state has a discrete distribution whence we have a Discrete Time Markov
Chain (DTMC).

Definition 2.2 The stochastic sequence {χ_n | n = 0, 1, 2, …} is a DTMC provided
that

P[χ_{n+1} = x_{n+1} | χ_n = x_n, χ_{n−1} = x_{n−1}, …, χ_0 = x_0]
    = P[χ_{n+1} = x_{n+1} | χ_n = x_n]    (2.7)

for n ∈ N.

The expression on the right-hand side of this equation is the one-step transition
probability of the process and it denotes the probability that the process goes
from state xn to state xn+1 when the time (or index) parameter is increased from
n to n + 1. That is, using the indices for notating the states,

pij (n,n + 1) = P [χn+1 = j|χn = i]

The more general form of the sth step transition probabilities is given by

pij (n,s) = P [χs = j|χn = i]

which gives the probability that the system will be in state j at step s, given that
it was in state i at step n where s ≥ n.
Note that the probabilities pij (n,s) must satisfy the following requirements:

0 < p_{ij}(n,s) ≤ 1,    i,j = 1, 2, …, N;  n,s = 0, 1, 2, …

\sum_{j ∈ S} p_{ij}(n,s) = 1,    i = 1, 2, …, N;  n,s = 0, 1, 2, …

The probability of going from state i to state j is the probability of somehow get-
ting from i at time n to some intermediate state k at some time r and from there
to state j. The events {χr = k|χn = i} and {χs = j|χr = k} are independent, so
that using this and the fact that from the Markov property,

P [χs = j|χr = k,χn = i] = P [χs = j|χr = k]

we can write recursively over all possible intermediate states k


p_{ij}(n,s) = \sum_{k ∈ S} P[χ_r = k | χ_n = i] P[χ_s = j | χ_r = k]
            = \sum_{k} p_{ik}(n,r) p_{kj}(r,s)    (2.8)

for n ≤ r ≤ s. Eq. (2.8) is known as the Chapman-Kolmogorov equation for


DTMC.
If the DTMC is homogeneous (cf. Eq. (2.2)) which will be the case in all of our
discussions, the probability of various states m steps into the future depends only
upon m and not upon the current time n; so that we may simplify the notation
and write

pij (m) = pij (n,n + m) = P [χn+m = j|χn = i]

for all m ∈ N. From the Markov property we can establish the following recursive
equation for calculating pij (m)
p_{ij}(m) = \sum_{k} p_{ik}(m − 1) p_{kj}(1),    m = 2, 3, …    (2.9)

We can write Eq. (2.9) in matrix form by defining matrix P = [pij ], where
pij := pij (1), so that

P^{(m)} = P^{(m−1)} P    (2.10)

where

P^{(0)} = I

the identity matrix. Note that

P^{(1)} = P^{(0)} P = IP
P^{(2)} = P^{(1)} P = P^2
P^{(3)} = P^{(2)} P = P^3

and in general

P^{(m)} = P^m,    m = 0, 1, 2, …    (2.11)



This equation enables us to compute the m-step transition probabilities from the
one-step transition probabilities.
Next we consider a very important quantity, the probability π_j^{(m)} of finding our
DTMC in state j at the mth step:

π_j^{(m)} = P[χ_m = j]    (2.12)

How can we calculate these probabilities?


If we write

p_{ij}^{(m)} = p_{ij}(m) = P[χ_m = j | χ_0 = i]

for the m-th step transition probability where we have assumed, without loss of
generality, that we entered state i at time 0, then multiplying both sides of this
equation by π_i^{(0)} = P[χ_0 = i] (cf. definition in Eq. (2.12)), summing over all states
and applying the theorem of Total Probability (cf. page 16), we obtain

\sum_i P[χ_0 = i] p_{ij}^{(m)} = \sum_i P[χ_0 = i] P[χ_m = j | χ_0 = i]

\sum_i π_i^{(0)} p_{ij}^{(m)} = P[χ_m = j] = π_j^{(m)}

or, alternatively
π_j^{(m)} = \sum_i π_i^{(0)} p_{ij}^{(m)}    (2.13)

That is, the state probabilities at time m can be determined by multiplying the
multistep transition probabilities by the probability of starting in each of the
states and summing over all states.
The row vector formed by the state probabilities at time m is called the state
probability vector Π(m) . That is,
Π^{(m)} = (π_0^{(m)}, π_1^{(m)}, π_2^{(m)}, …)

With this definition, Eq. (2.13) can be written in matrix form as follows

Π^{(m)} = Π^{(0)} P^{(m)},    m = 0, 1, 2, …

or from Eq. (2.11)

Π^{(m)} = Π^{(0)} P^m,    m = 0, 1, 2, …    (2.14)


[Figure 2.1: A Markov chain. State transition diagram of the surfer example with
states Clifton (1), Waikiki (2) and Ipanema (3); the arc labels are the transition
probabilities of the matrix P given below.]

Example 2.1 Consider the simple discrete time MC in Fig.2.1 which illustrates
the behaviour of our surfer. This diagramme is also called the state transition
diagramme of the DTMC. Every instant a unit of time elapses the surfer decides
to do something. When at the Clifton, he decides to go to Waikiki with probability
1/2 or may decide to go to Ipanema with the same probability (our surfer happens
to be very affluent). When in Ipanema he may in fact decide to stay there with
probability 1/2 at the end of a time period. With our beaches numbered as shown,
we have

P = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.3 & 0 & 0.7 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}

Assume that our surfer starts off at Clifton (beach 1). In other words the initial
distribution is Π^{(0)} = (1, 0, 0). From Clifton he can go to Ipanema or Waikiki with
equal probability, i.e.,

Π^{(1)} = (1, 0, 0) \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.3 & 0 & 0.7 \\ 0.2 & 0.3 & 0.5 \end{pmatrix} = (0, 0.5, 0.5)

from Eq. (2.14) and so on.
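The same computation is easy to continue mechanically; the short sketch below (using numpy, our own addition) iterates Eq. (2.14) for the surfer chain and can be re-run with any other initial distribution, e.g. the one of Exercise 2.2.

```python
import numpy as np

# Transition matrix of the surfer chain in Example 2.1
# (states 1 = Clifton, 2 = Waikiki, 3 = Ipanema).
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.2, 0.3, 0.5]])

pi = np.array([1.0, 0.0, 0.0])   # Pi(0): the surfer starts at Clifton
for m in range(1, 4):
    pi = pi @ P                  # Pi(m) = Pi(m-1) P, i.e. Pi(m) = Pi(0) P^m
    print(m, pi)
# 1 [0.    0.5   0.5  ]
# 2 [0.25  0.15  0.6  ]
# 3 [0.165 0.305 0.53 ]
```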

As we will see later, the vector Π(m) of state probabilities tends to a limit for
m → ∞. Even more, one can show that for specific DTMCs the effect of Π(0) on
the vector Π(m) completely vanishes. For our surfer that means, e.g., even if we
do not know at which beach he started the probability of finding him at a specific
beach after a long time is nearly constant. This phenomenon does not hold for all
DTMCs. Consider, e.g., the DTMC of Fig. 2.2. If the process starts in state 0 it
stays there forever. But starting in state 3 there is a chance that the process gets

[Figure 2.2: A simple Markov chain. Six states 0 to 5, with absorbing states 0 and 5,
transient states 1 to 4, and transition probabilities p and q on the arcs.]

absorbed in state 5. Clearly, the probability Π(m) is not independent of the initial
distribution. This effect, or more precisely the absence of such effects, can be
verified by investigating the structure of the state transition diagramme. E.g.,
from state 0 or 5 of the DTMC given in Fig. 2.2 no other state can be reached,
thus intuitively explaining the described effect.
Next we consider a classification of Markov states based on the structure of the
state transition diagramme.
Consider states i,j ∈ S. If there is a path from i to j, i.e., there exists an integer
n (which may depend on i and j) such that
p_{ij}(n) > 0

then we write i → j.
Two states i and j are said to communicate, written i ↔ j, if there is a path from
state i to state j and vice versa.
Let C[i] = {j | i ↔ j; j ∈ S}, ∀i ∈ S. We call C[i] the class of state i.

Example 2.2 Consider the simple MC in Fig. 2.2. In that figure, C[0] = {0},C[5] =
{5},C[1] = {1,2,3,4}.
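The classes can also be computed mechanically from the one-step transition structure by checking mutual reachability; here is a small sketch (our own addition, with a hypothetical four-state chain, not the chain of Fig. 2.2):

```python
# succ[i] is the set of states reachable from i in one step; the chain below is
# invented: states 0 and 3 are absorbing, states 1 and 2 communicate.
succ = {0: {0}, 1: {0, 2}, 2: {1, 3}, 3: {3}}

def reachable(i):
    """All states reachable from i (including i itself) by a simple depth-first search."""
    seen, stack = {i}, [i]
    while stack:
        for k in succ[stack.pop()]:
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return seen

reach = {i: reachable(i) for i in succ}
# C[i] = set of states j with i <-> j (mutual reachability)
classes = {i: {j for j in succ if j in reach[i] and i in reach[j]} for i in succ}
print(classes)   # {0: {0}, 1: {1, 2}, 2: {1, 2}, 3: {3}}
```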

Definition 2.3 A MC is said to be irreducible if every state communicates with


every other state.

An irreducible MC clearly has only one class of states, i.e. C[i] = C[j] ∀i,j ∈ S.
The MC of Fig. 2.2 is reducible since 0 ↔ 1 is, for instance, not true.
Let C denote any class of states and C̄ be the set of Markov states not in the class
C.

Definition 2.4 A class C is said to be closed if no single-step transition is pos-
sible from any state in C to any state in C̄. If C consists of a single state, say i,
then i is called an absorbing state. A necessary and sufficient condition for i to
be an absorbing state is that pii = 1.

Since the latter implies p_{ij} = 0 for i ≠ j, an absorbing state does not communicate
with any other state.
The MC of Fig. 2.2 has two absorbing states, 0 and 5.
Definition 2.5 A class C is said to be transient if there is a path out of C. That
is, if ∃i ∈ C and k ∈ C̄ such that p_{ik} > 0. The individual states in a transient
class are themselves said to be transient.
States 1, 2, 3 and 4 in the MC of Fig. 2.2 are all transient.
Definition 2.6 A MC is said to be absorbing if every state in it is either absorb-
ing or transient.
Finally we define an ergodic class.
Definition 2.7 A class C is said to be ergodic if every path which starts in C
remains in C. That is
\sum_{j ∈ C} p_{ij} = 1,    ∀i ∈ C

The individual states in an ergodic class are called ergodic. An irreducible MC


consists of a single ergodic class, i.e. C[i] = S,∀i ∈ S.
Next write f_j^{(m)} for the probability of a Markov process leaving a state j and first
returning to the same state j in m steps. Clearly the probability of ever returning
to state j is given by

f_j = \sum_{m=1}^{∞} f_j^{(m)}
We now classify the states j of a MC depending on the value fj of the state. Not
surprisingly, if fj = 1 we say the state is said to be recurrent; if a return is not
certain, that is fj < 1, then state j is said to be transient. Furthermore, if our
MC can return to state j only at steps η,2η,3η, . . ., where η ≥ 2 is the largest
such integer, then state j is said to be periodic with period η. If such an integer
number η does not exist, then the state j is said to be aperiodic.
Knowing the probability f_j^{(m)} of returning to state j in m steps, we can now
define another interesting quantity, the mean recurrence time M_j of state j:

M_j = \sum_{m=1}^{∞} m f_j^{(m)}    (2.15)
The mean recurrence time is thus the average number of steps needed to return
to state j for the first time after leaving it.
We can further describe a state j to be recurrent null if Mj = ∞, whereas it
is recurrent nonnull if Mj < ∞. Note that an irreducible MC can only have
recurrent null states if the number of states is infinite.
With all this in mind, we can now state the following important result[108] with-
out proof:

Theorem 2.1 The states of an irreducible DTMC are all of the same type; thus
they can be either

• all transient,

• all recurrent nonnull, or

• all recurrent null.

Moreover, if periodic, then all states have the same period η.

Exercise 2.2 Assume that we don’t know for certain where our surfer has started.
An oracle tells us that he might have started at Clifton with a chance of 19%, at
Waikiki with 26% and at Ipanema, the beach he likes most, with a chance of 55%.
What is our vector Π^{(0)} now? Calculate Π^{(1)}, Π^{(2)}, Π^{(3)}.

2.1.1 Steady State Distribution

The most interesting DTMCs for performance evaluation are those whose state
probability distribution π_j^{(m)} does not change when m → ∞ or, to put it dif-
ferently, a probability distribution π_j defined on the DTMC states j is said to
be stationary (or have reached a steady state distribution) if π_j^{(m)} = π_j when
π_j^{(0)} = π_j, that is, once a distribution π_j has been attained, it does not change in
the future (with m).

Definition 2.8 Define the steady state probability distribution
{π_j; j ∈ S} of a DTMC by

π_j = \lim_{m→∞} π_j^{(m)}

We are after the steady state probability distribution {πj } of being in state j at
some arbitrary point in the future. Clearly, if we know this, we can say a great
deal about the system modelled by the MC. When the DTMC is irreducible,
aperiodic and homogeneous the following theorem [108] helps us out.

Theorem 2.2 In an irreducible and aperiodic homogeneous MC the limiting prob-
abilities π_j always exist and are independent of the initial state probability distri-
bution. Moreover, either

1. all states are transient or all states are recurrent null. In both cases πj = 0 ∀ j
and there exists no steady state distribution, or

2. all states are recurrent nonnull and then πj > 0 ∀ j, in which case the set
{πj } is a steady state probability distribution and
π_j = \frac{1}{M_j}    (2.16)

In this case the quantities π_j are uniquely determined through the following
equations

\sum_i π_i = 1    (2.17)

\sum_i π_i p_{ij} = π_j    (2.18)

where Mj is defined in Eq. (2.15).


A recurrent nonnull DTMC is also referred to as an ergodic MC and all the states
in such a chain are ergodic.
From our definitions in the previous section, we know that a finite MC is ergodic
if from any state it is possible to reach any other state, i.e. all states belong to a
single class.
The limiting probabilities πj of an ergodic DTMC are referred to as equilibrium
or steady state probabilities. It should be clear that, if we observe an ergodic MC
for a fixed time T, the average sojourn time τ_i spent in state i by the DTMC
during T can be computed from

τ_i = π_i T    (2.19)

Another quantity which will be useful to us is the average time υij spent by the
DTMC in state i between two successive visits to state j in steady state. This
quantity is also known as the visit ratio or mean number of visits, and can be
computed from
υ_{ij} = \frac{π_i}{π_j}    (2.20)

Referring to Eq. (2.18), it is more convenient to express that equation in
matrix notation. In order to do this we define the probability vector Π as

Π = [π0 ,π1 ,π2 , . . .]

so that we may now rewrite Eq. (2.18) as

Π = ΠP (2.21)

Note that Eq. (2.21) follows directly from the equation Π(m) = Π(m−1) P by
taking the limit as m → ∞. The following example illustrates that no unique
steady state distribution exists for a periodic MC.
[Figure 2.3: A simple periodic Markov chain. Three states connected in a cycle,
each transition occurring with probability 1.]

Example 2.3 Consider the periodic MC illustrated in Fig.2.3 and let Π(0) =
(1,0,0). Then

Π^{(1)} = Π^{(0)} P = (0, 1, 0)
Π^{(2)} = Π^{(1)} P = (0, 0, 1)
Π^{(3)} = Π^{(2)} P = (1, 0, 0)
Π^{(4)} = Π^{(3)} P = (0, 1, 0)
...

Clearly the limit Π = \lim_{m→∞} Π^{(m)} does not exist. Similarly the MC must be
irreducible for a unique solution to exist as the following example illustrates.

Example 2.4 Consider the reducible MC illustrated in Fig.2.4 and let Π(0) =
(1, 0, 0, 0, 0). Then

Π^{(1)} = Π^{(0)} \begin{pmatrix}
0 & 0 & 0.75 & 0.25 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix} = (0, 0, 0.75, 0.25, 0)

Π^{(2)} = Π^{(1)} P = (0, 0.75, 0, 0, 0.25)
Π^{(3)} = Π^{(2)} P = (0, 0, 0.75, 0.25, 0)
Π^{(4)} = Π^{(3)} P = (0, 0.75, 0, 0, 0.25)
...

Again there is no limit Π = \lim_{m→∞} Π^{(m)}.


So far we have not said anything about the size of the state space of our DTMC.
When the state space in our examples is finite we speak of a finite MC. The
states of a finite aperiodic irreducible DTMC are always ergodic.
In the following example we determine the steady state distribution for our surfer
example in Fig. 2.1.

Example 2.5 Using Eq. (2.21) we can write the following set of linear equations:
[Figure 2.4: A simple reducible Markov chain. Five states with the transition
probabilities given by the matrix of Example 2.4.]

[Figure 2.5: The homogeneous discrete time MC of the exercise. Three states 1, 2
and 3 with transition probabilities p, q, 1−p and 1−q on the arcs.]

π_1 = 0 π_1 + 0.3 π_2 + 0.2 π_3
π_2 = 0.5 π_1 + 0 π_2 + 0.3 π_3    (2.22)
π_3 = 0.5 π_1 + 0.7 π_2 + 0.5 π_3

Note that in Eqs. (2.22) above, the last equation is a linear combination of the sec-
ond and the first, indicating that there is a linear dependence among them. There
will always be a linear dependence amongst the set of equations in Eq. (2.21) and
it is the reason why we have to use the additional Eq. (2.17) to derive a solu-
tion. Using the latter equation and any two of the equations in (2.22) we obtain
approximately

π1 = 0.188
π2 = 0.260
π3 = 0.552

So our surfer is most likely to be found on the beach at Ipanema (probability 0.552)
and in fact he returns every 1/0.552 ≈ 1.81 days to that beach.
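Instead of solving Eqs. (2.22) by hand, the steady state vector can be obtained numerically from Eqs. (2.17) and (2.18); a small sketch (our own addition, using numpy) for the surfer chain:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],     # surfer chain of Example 2.1
              [0.3, 0.0, 0.7],
              [0.2, 0.3, 0.5]])

n = P.shape[0]
# Stack the balance equations (P^T - I) pi = 0 with the normalisation sum(pi) = 1
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)            # approximately [0.188, 0.260, 0.552]
print(1.0 / pi[2])   # mean recurrence time of Ipanema (Eq. 2.16), about 1.81
```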

Exercise 2.3 Consider the stochastic process described in Exercise 2.1. Let C,R
and D represent a sunny, rainy and dull day respectively and in this way define a
new stochastic process with 9 states. Determine the new transition probabilities.
Consider this process to be a discrete time MC and find the probability of two dull
days following upon one another.

Exercise 2.4 Consider the homogeneous MC illustrated in Fig. 2.5.


1. Give the probability matrix P for the chain.

2. Under what conditions will the chain be irreducible and aperiodic, if at all?

3. Solve for the steady state probability vector Π.

4. What is the mean recurrence time of state 3?

5. For what values of p and q will π1 = π2 = π3 ?

2.1.2 Absorbing Chains and Transient Behaviour

When using MCs to model real systems it is often very useful to know the number
of steps (or, equivalently, the time) spent in the transient states before reaching
an absorbing state. Think of executing a multi-layer network protocol: The time
spent by processes executing the protocol in one layer (transient states) before
going to the next layer (absorbing state) is one example of such an application.
The absorbing MC illustrated in Fig. 2.6 consisting of a set St of nt transient
states and a set Sa of na absorbing states, illustrates what we have in mind.
We begin our analysis by numbering the states in the MC such that the na
absorbing states occur first and writing the transition probability matrix P as

P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix}    (2.23)

Once in an absorbing state the process remains there, so I is the identity matrix
with all elements pii = 1, 1 ≤ i ≤ na . R is an nt × na matrix describing the
movement from the transient to the absorbing states, and Q is a nt × nt matrix
describing the movement amongst transient states. Since it is not possible to move
from the absorbing to the transient states 0 is the na × nt zero matrix. Since the
formula for matrix multiplication also applies to matrices written in block form,
we can calculate the powers of P in terms of the matrices R and Q:

P^2 = \begin{pmatrix} I & 0 \\ R + QR & Q^2 \end{pmatrix}

and

P^3 = \begin{pmatrix} I & 0 \\ R + QR + Q^2 R & Q^3 \end{pmatrix}
[Figure 2.6: An absorbing MC. The set S_t of n_t transient states on the left and
the set S_a of n_a absorbing states on the right.]

or in general

P^n = \begin{pmatrix} I & 0 \\ N_n R & Q^n \end{pmatrix}

where N_n = I + Q + Q^2 + … + Q^{n−1} = \sum_{i=1}^{n} Q^{i−1}.

We can now state the following fundamental result for an absorbing MC:

Theorem 2.3 When n → ∞, then Q^n → 0 and N_n → (I − Q)^{−1}. In particular,
the matrix I − Q is invertible.

We will not prove the theorem (for a proof see [86]) but, from Eq. (2.14) above
and the knowledge that for a transient state the steady state probability πj = 0,
the first part of the result is easy to accept intuitively.
Write

N = [n_{ij}] = (I − Q)^{−1}

N is called the fundamental matrix of the MC.


It follows from the last theorem that

\lim_{n→∞} P^n = \begin{pmatrix} I & 0 \\ NR & 0 \end{pmatrix}

For absorbing chains, the only interesting starting states are the transient ones.
Assume that we start with an initial state i ∈ St . For each state j ∈ St , define
the random variable υij to be the number of visits to state j before an absorbing
state is reached. Define υij = 1 when i = j.
We know from Th. 2.2 that υij < ∞ for any transient state j, and that υij has
finite expectation. Assuming these properties we can now prove the following
theorem:

Theorem 2.4 For every pair of transient states i,j

E[υij ] = nij

where N = [nij ] is the fundamental matrix as before.

Proof. Suppose that we move from starting state i to state k in the first step.
If k is an absorbing state, we can never get to state j. If k is a transient state,
we are in the same situation as before with starting state k instead. Using the
Markov property,
    E[υij] = δij + Σ_{k∈St} qik E[υkj]

The term δij is the Kronecker delta function with value 1 if i = j and 0 otherwise
and it counts the initial visit to state j in case the starting state is j. Denote by
M the matrix whose i,j-entry is E[υij ] for all i,j ∈ St . Then the last equation can
obviously be written

M = I + QM

so that M = (I − Q)^{−1} = N. □

Referring to Fig. 2.6, starting off in some state i ∈ St, the total number of steps
(transitions) before reaching an absorbing state is clearly the sum, over all states
in St, of the number of visits made to each of them before absorption. Denote this
random variable by υi and its expected value E[υi] by τi.

Theorem 2.5

    τi = Σ_{j∈St} nij,    i ∈ St        (2.24)

and τi < ∞.

Proof. Since the expectation of a sum is the sum of the expectations, the result
follows from the previous theorem. □

Write ~τ = (τ1 ,τ2 , . . . ,τnt ). Then

    ~τ = N e^T        (2.25)

where e is the row vector with 1’s in every position.


What about the expected number of steps needed to reach an absorbing state,
given that we will start in any of the transient states with probability distribution
~r = (r1 ,r2 , . . . ,rnt )? This quantity we will denote by the scalar value τ and refer
to it simply as the mean time before absorption.

Theorem 2.6 Let τ be the mean time before absorption of a DTMC with tran-
sient state set St = {1,2, . . . ,nt } and initial probability distribution ~r = (r1 ,r2 , . . . ,rnt ).
Then

    τ = ~r N e^T        (2.26)

Proof. Since ri is the probability of a starting state i ∈ St, and τi the expected
value of the number of steps to reach an absorbing state from that state i, the
result follows. □
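To make Eqs. (2.24)–(2.26) concrete, here is a minimal numerical sketch of the fundamental-matrix computation. The matrices Q and R and the initial distribution ~r below are purely illustrative (they do not come from the text), and the code assumes NumPy is available.

```python
import numpy as np

# Hypothetical absorbing DTMC with 3 transient states and 1 absorbing state.
# Q: movement amongst transient states, R: transient -> absorbing
# (each row of Q and R together sums to 1).
Q = np.array([[0.0, 0.5, 0.3],
              [0.2, 0.0, 0.5],
              [0.1, 0.4, 0.0]])
R = np.array([[0.2],
              [0.3],
              [0.5]])
r = np.array([1.0, 0.0, 0.0])        # initial distribution over the transient states

N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix N = (I - Q)^(-1)
tau_vec = N @ np.ones(3)             # ~tau = N e^T, Eq. (2.25): mean steps to absorption per start state
tau = r @ tau_vec                    # tau = ~r N e^T, Eq. (2.26): mean time before absorption

print(N)          # n_ij = expected number of visits to transient state j starting from i (Th. 2.4)
print(tau_vec)
print(tau)
```

For a large state space one would of course solve the linear system (I − Q)x = e^T rather than invert the matrix explicitly.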

Furthermore, define σi² as the variance of the time υi before absorption starting
in state i ∈ St and ~σ² = (σ1², . . . ,σnt²) as the vector of the variances of these times.
Let ~τ2 = (E[υ1²], . . . ,E[υnt²]) be the vector of second moments. Then it can be shown (cf. [103], page 49) that

Theorem 2.7

    ~σ² = (2N − I)~τ − (τ1², . . . ,τnt²)        (2.27)

Proof. From Sec. 1.4 we know that

    σi² = E[υi²] − τi²        (2.28)

Using the same argument as in Th. 2.4, i.e. conditioning on the first step, we write
(with rik denoting the entries of R)

    E[υi²] = Σ_{k∈Sa} rik · 1 + Σ_{k∈St} qik E[(υk + 1)²]
           = 1 + Σ_{k∈St} qik E[υk²] + 2 Σ_{k∈St} qik E[υk]

since the row sums of P are 1. Using vector and matrix notation, we can thus write

    ~τ2 = e^T + Q ~τ2 + 2 Q ~τ

which reduces to

    ~τ2 = N (e^T + 2 Q ~τ)
        = ~τ + 2(N − I)~τ
        = (2N − I)~τ

by substituting N e^T = ~τ from Eq. (2.25) and N Q = N − I. Together with Eq. (2.28)
the result follows. □

In theory we therefore have the formulas in Eqs. (2.26) and (2.27) to compute
the mean and variance, respectively, of the time before absorption. In practice,
computing the matrix N = (I − Q)^{−1} for a MC with a large state space is no mean
task. Fortunately, Courtois and Semal [56] have devised a method of computing τ
and σ² directly from P. We next describe their technique without proving the results.
Proofs can be found in [103].
To start off, we define the augmented transition probability matrix
    ( Q    (I − Q)e^T )
    ( ~r   0          )        (2.29)

Note that (I − Q)e^T is the vector of transition probabilities from the states in St
to a new state say, a ∈ Sa , and ~r and Q are the same as before.
The clever idea is that, assuming irreducibility and aperiodicity, the Markov
process defined by the matrix in Eq. (2.29) behaves exactly like a new process
in which state a is designated as absorbing, provided one assumes that whenever
the latter chain reaches the absorbing state a it is restarted with the initial
vector ~r, and that this is done infinitely many times. The new, absorbing
MC is described by the matrix

    ( Q   (I − Q)e^T )
    ( 0   1          )        (2.30)

Again, the ergodic behaviour of the process described by Eq. (2.29) describes the
behaviour of the absorbing chain of Eq. (2.30) over an infinite number of runs,
each started with the initial distribution vector ~r.

Theorem 2.8 If τ is the mean time before absorption of the chain

    ( Q   (I − Q)e^T )
    ( 0   1          )

when started with the initial distribution ~r, then

    τ = 1/πa − 1,

where πa is the last component of the steady state distribution of the DTMC
described by the matrix

    ( Q    (I − Q)e^T )
    ( ~r   0          )

The proof of this theorem can be found in [56]. Intuitively, 1/πa is the mean
time between two visits to the last state a of the Markov process described by
the matrix in Eq. (2.29) (cf. Th. 2.2) and each time the system is in this last
state, one further step is needed to restart the absorbing chain with the initial
distribution ~r.
A similar result exists for the variance σ 2 of the time before absorption.

Theorem 2.9 If σ² is the variance of the time before absorption of the chain

    ( Q   (I − Q)e^T )
    ( 0   1          )

when started with the initial distribution ~r, then

    σ² = 2ττ′ − τ − τ²

where τ is as before and τ′ is given by

    τ′ = 1/πa′ − 1

and πa′ is the last component of the steady state distribution of the DTMC
described by the matrix

    ( Q    (I − Q)e^T )
    ( ~r′  0          )

with

    ~r′ = (1/(1 − πa)) (π1, π2, . . . ,πnt)

where the πi are the components of the steady state distribution of the MC given in Th. 2.8.
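The following sketch illustrates the Courtois–Semal construction of Theorems 2.8 and 2.9 and, as a sanity check, compares the results with the direct formulas of Eqs. (2.26) and (2.27). It reuses the hypothetical Q and ~r from the earlier sketch; none of the numbers are taken from the text.

```python
import numpy as np

def steady_state(P):
    """Solve pi P = pi, sum(pi) = 1 for an ergodic chain."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = np.array([[0.0, 0.5, 0.3],
              [0.2, 0.0, 0.5],
              [0.1, 0.4, 0.0]])
r = np.array([1.0, 0.0, 0.0])
nt = Q.shape[0]
exit_probs = (np.eye(nt) - Q) @ np.ones(nt)      # (I - Q)e^T, the column towards the new state a

def augmented(restart):
    """Matrix of Eq. (2.29): restart with 'restart' whenever state a is reached."""
    top = np.hstack([Q, exit_probs[:, None]])
    bottom = np.hstack([restart, [0.0]])
    return np.vstack([top, bottom])

pi = steady_state(augmented(r))
tau = 1.0 / pi[-1] - 1.0                          # Th. 2.8

r_prime = pi[:nt] / (1.0 - pi[-1])
tau_prime = 1.0 / steady_state(augmented(r_prime))[-1] - 1.0
sigma2 = 2 * tau * tau_prime - tau - tau**2       # Th. 2.9

# Direct computation via the fundamental matrix, Eqs. (2.26) and (2.27).
N = np.linalg.inv(np.eye(nt) - Q)
tau_vec = N @ np.ones(nt)
tau_direct = r @ tau_vec
sigma2_direct = r @ ((2 * N - np.eye(nt)) @ tau_vec) - tau_direct**2

print(tau, tau_direct)        # the two means should agree
print(sigma2, sigma2_direct)  # and so should the two variances
```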
This concludes our study of discrete time MCs.

Exercise 2.5 Two gamblers are betting on the outcome of an unlimited sequence
of coin tosses. The first gambler always bets heads, which appears with probability
p, 0 < p < 1 on every toss. The second gambler always bets tails, which appears
with probability q = 1 − p. They start with a total of C chips between them.
Whenever a gambler loses a toss he has to give the other one chip. The game stops
when one gambler runs out of chips (is ruined). Assume the gamblers start with
C = 3 chips between them.

1. What is the probability, in terms of p and q, that a gambler is ruined?



2. How long will the game last if the first gambler starts with 1 coin?

Exercise 2.6 Write a program to simulate the game described in the previous
exercise for a sufficient number of coin tosses. Use values of p = 0.2, 0.4, . . . , 0.8 and
compare your simulation results with the theoretical answers in the previous exercise;
one possible simulation skeleton is sketched after the questions below. For the
following questions, assume p = 0.6.

1. Determine the theoretical mean time of a game. Compare your answer with
your simulation results.

2. Determine the theoretical variance of the duration of a game. Compare your
answer with your simulation results.
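A possible simulation skeleton for this exercise is sketched below. It assumes the first gambler starts with 1 of the C = 3 chips (as in part 2 of Exercise 2.5); the number of simulated games is arbitrary.

```python
import random

def play_one_game(p, chips_first=1, chips_total=3):
    """Simulate one game; return whether the first gambler was ruined and the number of tosses."""
    chips, tosses = chips_first, 0
    while 0 < chips < chips_total:
        chips += 1 if random.random() < p else -1   # heads: first gambler wins a chip
        tosses += 1
    return chips == 0, tosses

def simulate(p, games=100_000):
    ruined = total_tosses = 0
    for _ in range(games):
        lost, n = play_one_game(p)
        ruined += lost
        total_tosses += n
    return ruined / games, total_tosses / games     # ruin probability, mean game length

for p in (0.2, 0.4, 0.6, 0.8):
    print(p, simulate(p))
```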

2.2 Semi-Markov Processes

In our discussions thus far the Markov process had the property that a transition
was made at every time instant. That transition may well return the process to
the same state, but a transition occurred nevertheless.
We now turn our attention to a more general class of processes where the time
between transitions may be several unit time intervals, and where this time can
depend on the particular transition being made. This process is no longer strictly
Markovian; however, it retains enough of the Markovian properties to deserve the
name semi-Markov process [90].

2.2.1 Formal Model of a Semi-Markov Process

A semi-Markov process can be thought of as a process whose successive state


occupancies are governed by the transition probabilities of a Markov process, but
which spends time in any state described by an integer-valued random variable
that depends on the state currently occupied and on the state to which the
next transition will be made. At the transition instants, the semi-Markov process
behaves just like a Markov process. We call this process the embedded Markov
process and denote the single step state transitional probabilities by pij , i,j =
1,2, . . . ,N . For simplicity we assume that the process has a finite state space of
size N .
The times at which transitions occur are now, however, governed by a different
probability mechanism. Before proceeding from state i to the next state j, the
process “sojourns”³ for a time τij in state i.

³ Some authors refer to this time as the “holding time”.

In our discrete time case, these sojourn times τij, with expected values τ̄ij, are
positive, integer-valued random variables, each governed by a probability mass
function sij(m), m = 1,2, . . ., called the sojourn time mass function for a transition
from state i to state j. That is,

    P[τij = m] = sij(m),    m = 1,2,3, . . . ;  i,j = 1,2, . . . ,N

    τ̄ij = Σ_{m=1}^{∞} m sij(m)

One distribution sij (m) we are familiar with is the geometric distribution

    (1 − a)a^{m−1},    m = 1,2,3, . . .

We assume that the mean of the sojourn time is finite and that the sojourn times
are at least one time unit in duration, i.e.,

sij (0) = 0

In other words, to fully describe a discrete-time semi-Markov process, we must
specify N² sojourn time mass functions, in addition to the one-step transition
probabilities.
Observe that we can consider the discrete-time Markov process we discussed in
the previous section to be the discrete-time semi-Markov process for which

    sij(m) = { 1, if m = 1
             { 0, if m = 2,3, . . .        ∀ i,j = 1,2, . . . ,N

that is, all sojourn times are exactly one time unit in length.
We next define the waiting time τi, with expected value τ̄i, as the time spent
in state i, i = 1,2, . . . ,N, irrespective of the successor state, and we define the
probability mass function of this waiting time as

    P[τi = m] = Σ_{j=1}^{N} pij sij(m)

    τ̄i = Σ_{j=1}^{N} pij τ̄ij

That is, the probability that the system will spend m time units in state i if
we do not know its successor state, is the probability that it will spend m time
units in state i if its successor state is j, multiplied by the probability that its
successor state will indeed be j, summed over all possible successor states.
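As a small illustration of the last two formulas, the sketch below mixes hypothetical geometric sojourn-time mass functions according to equally hypothetical one-step probabilities p_ij; it also uses the fact, derived in Example 2.6 below, that a geometric sojourn time (1 − a)a^{m−1} has mean 1/(1 − a). None of the numbers come from the text.

```python
import numpy as np

# Hypothetical embedded-chain probabilities p_ij and geometric parameters a_ij,
# so that s_ij(m) = (1 - a_ij) * a_ij**(m - 1) for m = 1, 2, 3, ...
P = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.0, 0.7],
              [0.4, 0.4, 0.2]])
A = np.array([[0.5, 0.6, 0.4],
              [0.3, 0.7, 0.5],
              [0.6, 0.4, 0.5]])

tau_bar = 1.0 / (1.0 - A)                  # mean sojourn times tau_bar_ij

def waiting_time_pmf(i, m):
    """P[tau_i = m] = sum_j p_ij s_ij(m)."""
    return float((P[i] * (1 - A[i]) * A[i] ** (m - 1)).sum())

tau_wait = (P * tau_bar).sum(axis=1)       # tau_bar_i = sum_j p_ij tau_bar_ij
print(tau_wait)
print([waiting_time_pmf(0, m) for m in (1, 2, 3)])
```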

2.2.2 Interval Transition Probabilities

As in the DTMC case, we next set out to compute the n−step transition prob-
abilities, which we denote φij (n), for the semi-Markov case. That is, how can a
process that started by entering state i at time 0 be in state j at time n?
One way this can happen is for i and j to be the same state and for the process
never to have left state i throughout the period (0,n). This requires that the
process makes its first transition, to any other state, only after time n. That is

    δij P[τi > n]

where

    δij = { 1, if i = j
          { 0, otherwise

It is not difficult to see that this term can be written as

    δij [ 1 − Σ_{m=0}^{n} Σ_{k=1}^{N} pik sik(m) ]        (2.31)

Let W(n) = { δij [1 − Σ_{m=0}^{n} Σ_{k=1}^{N} pik sik(m)] } be the matrix of these elements.

Every other way to get from state i to j in the interval (0,n) would mean the
process made its first transition from state i to some other state k at a time m,
and then by a succession of such transitions to state j at time n. Note that we
have to consider all intermediate times m, 0 < m ≤ n and intermediate states
k ∈ S, S the Markov state space. In other words
    Σ_{m=0}^{n} Σ_{k=1}^{N} pik P[τik = m] φkj(n − m)        (2.32)

where i,j = 1,2, . . . ,N; n = 0,1,2, . . ., or

    Σ_{m=0}^{n} Σ_{k=1}^{N} pik sik(m) φkj(n − m)        (2.33)

Define the congruent matrix multiplication C = A ◦ B of two matrices A = {aij}
and B = {bij} by C = {aij bij}, for all i,j. Furthermore, write

    Φ(n) = {φij(n)}

    S(n) = {sij(n)}

With these definitions, Eq. (2.33) becomes



    Σ_{m=0}^{n} (P ◦ S(m)) Φ(n − m),    n = 0,1,2, . . .

and

    Φ(n) = W(n) + Σ_{m=0}^{n} (P ◦ S(m)) Φ(n − m),    n = 0,1,2, . . .        (2.34)

Φ(n) is called the interval transition probability matrix for the semi-Markov process
in the interval (0,n) and clearly

    Φ(0) = I
Eq. (2.34) provides a convenient recursive basis for computing Φ(n) for any semi-
Markov process. The quantities P and S(m) come directly from the definition of
the process.
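A direct transcription of the recursion (2.34) might look as follows. The chain and its geometric sojourn-time parameters are placeholders chosen only to make the sketch runnable; since sij(0) = 0, the m = 0 term contributes nothing and the inner loop starts at m = 1.

```python
import numpy as np

P = np.array([[0.2, 0.5, 0.3],          # hypothetical embedded chain
              [0.3, 0.0, 0.7],
              [0.4, 0.4, 0.2]])
A = np.array([[0.5, 0.6, 0.4],          # hypothetical geometric parameters a_ij
              [0.3, 0.7, 0.5],
              [0.6, 0.4, 0.5]])
NS = P.shape[0]

def S(m):
    """Matrix of sojourn-time mass functions s_ij(m) = (1 - a_ij) a_ij**(m-1); s_ij(0) = 0."""
    if m == 0:
        return np.zeros((NS, NS))
    return (1 - A) * A ** (m - 1)

def W(n):
    """Diagonal matrix of the probabilities of never having left state i by time n."""
    left_by_n = sum((P * S(m)).sum(axis=1) for m in range(n + 1))
    return np.diag(1.0 - left_by_n)

def interval_probs(horizon):
    """Phi(0), ..., Phi(horizon) via the recursion of Eq. (2.34)."""
    Phi = [np.eye(NS)]                   # Phi(0) = I
    for n in range(1, horizon + 1):
        acc = W(n)
        for m in range(1, n + 1):        # the m = 0 term vanishes because S(0) = 0
            acc = acc + (P * S(m)) @ Phi[n - m]
        Phi.append(acc)
    return Phi

Phi = interval_probs(25)
print(Phi[25])                           # each row should sum to 1
```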
Example 2.6 Consider again our typical surfer of Fig. 2.1 on page 30, but let
us now give the example a semi-Markovian flavour. The nature of the surfer now
dictates that the length of time he will stay on a particular beach will depend on
both which beach he is on and where he intends to go next. The (sojourn) time
τij is thus the length of time spent surfing on beach i with the intention of going
to beach j (where j = i is certainly possible). The lifeguards on each beach have
been keeping a record of our surfer and have come up with the following probability
mass functions describing the surfer’s behaviour:
    s11(m) = (1/3)(2/3)^{m−1}    s12(m) = (1/4)(3/4)^{m−1}    s13(m) = (1/3)(2/3)^{m−1}
    s21(m) = (1/2)(1/2)^{m−1}    s22(m) = (1/8)(7/8)^{m−1}    s23(m) = (2/5)(3/5)^{m−1}
    s31(m) = (1/4)(3/4)^{m−1}    s32(m) = (1/3)(2/3)^{m−1}    s33(m) = (1/2)(1/2)^{m−1}

for m = 1,2,3, . . .
Solution. Consider the general geometric distribution function (cf. Sec. 1.2),
p(n) = (1 − a)a^{n−1}. The first moment, or mean, n̄ can be calculated in the
following way:

    n̄ = Σ_{n=0}^{∞} n(1 − a)a^{n−1}

Using the property that Σ d/da = d/da Σ, we write

    n̄ = (1 − a) d/da Σ_{n=0}^{∞} a^n
       = (1 − a) d/da [ 1/(1 − a) ]
       = 1/(1 − a)

so that the first moments τ̄ij in the example are given by

    τ̄11 = 3;    τ̄12 = 4;    τ̄13 = 3;
    τ̄21 = 2;    τ̄22 = 8;    τ̄23 = 2.5;
    τ̄31 = 4;    τ̄32 = 3;    τ̄33 = 2;

Clearly beach 2 (Waikiki) is the most popular with our surfer, since he spends 8
units of time on the average surfing there and then immediately returning to it.
The mean time he spends on that beach, irrespective of where he might go next is
given by

    τ̄2 = p21 τ̄21 + p22 τ̄22 + p23 τ̄23
        = 0.3 × 2 + 0 × 8 + 0.7 × 2.5
        = 2.35

Exercise 2.7 In Ex. 2.6, compute the mean time the surfer spends on Ipanema,
assuming that he has arrived there from anywhere else but Waikiki.

2.2.3 Steady State Behaviour

We now set out to find the limiting behaviour of the interval transition proba-
bilities over long intervals. It is important to note that the MC structure of a
semi-Markov process is the same as that of its embedded Markov process. There-
fore the interval transition probabilities of a semi-Markov process can exhibit a
unique limiting behaviour only within the chain of the embedded Markov process.
We begin by defining a limiting interval transition probability matrix Φ for our
process by

Φ = lim Φ(n)
n→∞

with elements φij . However, in steady state, the limiting interval transition prob-
abilities φij do not depend on the starting state i and we therefore write φij = φj .
Define a vector ϕ = (φ1, φ2, . . . ,φN) as the vector of probabilities φj that the semi-
Markov process is in state j as time n → ∞ and let Π = (π1, π2, . . . ,πN) be the
steady state probability vector of the equivalent embedded MC [cf. Eq. (2.21)].
One can prove (see e.g., Howard [90]) that

    φj = πj τ̄j / Σ_{i=1}^{N} πi τ̄i        (2.35)

or

    ϕ = (1/τ̄) Π M

where we have written

    τ̄ = Σ_{j=1}^{N} πj τ̄j

and M is the diagonal matrix [τ̄j] of mean waiting times.


Although the proof requires transform analyses which are beyond the scope of
this book, Eq. (2.35) is intuitively plausible in the following way: the probability φj
of finding the semi-Markov process in state j is the average time τ̄j spent in that
state, weighted by the probability πj of being in that state in the embedded MC,
and normalised by the mean time τ̄ spent per transition over all of the N possible
states.
Note that Eq. (2.35) can also be written as

    φj = τ̄j / Σ_{i=1}^{N} (πi/πj) τ̄i
       = τ̄j / Σ_{i=1}^{N} υij τ̄i

where υij , given by Eq. (2.20), is the visit ratio defined on page 34.
Example 2.7 Suppose we want to know the steady state probability of finding
our surfer on Waikiki beach. This we can do by applying Eq. (2.35). In Ex. 2.5
we determined that
Π = (0.188,0.260,0.552)
so that

    φ2 = π2 τ̄2 / (π1 τ̄1 + π2 τ̄2 + π3 τ̄3)
       = (0.260 × 2.35) / (0.188 × 3.5 + 0.260 × 2.35 + 0.552 × 2.7)
       = 0.22
There is thus a 22 percent chance of finding him on Waikiki or beach number 2.
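Eq. (2.35) is a one-line computation once Π and the mean waiting times are known. The sketch below simply reproduces the Waikiki figure using the values quoted in Example 2.7.

```python
import numpy as np

Pi = np.array([0.188, 0.260, 0.552])       # steady state of the embedded MC
tau_bar = np.array([3.5, 2.35, 2.7])       # mean waiting times per beach

phi = Pi * tau_bar / (Pi * tau_bar).sum()  # Eq. (2.35)
print(phi)                                 # phi[1] is approximately 0.22 (Waikiki)
```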
Exercise 2.8 A car rental company has determined that when a car is rented
in Town 1 there is a 0.8 probability that it will be returned to the same town
and a 0.2 probability that it will be returned to Town 2. When rented in Town
2, there is a 0.7 probability that the car will be returned to Town 2, otherwise it
is returned to Town 1. From its records, the company determined that the rental
period probability mass functions are:
    s11(m) = (1/3)(2/3)^{m−1}    s12(m) = (1/6)(5/6)^{m−1}        m = 1,2,3, . . .
    s21(m) = (1/4)(3/4)^{m−1}    s22(m) = (1/12)(11/12)^{m−1}     m = 1,2,3, . . .
What percentage of the time does a car spend in Town 2?

2.3 Continuous Time Markov Chains

In Sec. 2.1 we described how our surfer had to decide at regular, equal intervals
of time whether to leave or whether to stay on the beach where he is. If we now
allow him to decide at an arbitrary time which beach to go to next, we have the
continuous-time version of that example.
The Continuous Time Markov Chain (CTMC) version of the Markov property,
Eq. (2.1), is given by

Definition 2.9 The stochastic process {χ(t),t ≥ 0} is a CTMC provided

    P[χ(t) = x | χ(tn) = xn, χ(tn−1) = xn−1, . . . , χ(t0) = x0]
        = P[χ(t) = x | χ(tn) = xn]        (2.36)

for any sequence t0 ,t1 , . . . ,tn such that t0 < t1 < . . . < tn and xk ∈ S where S is
the (discrete) state space of the process.

The right-hand side of the above equation is the transition probability of the
CTMC and we write

pij (t,s) = P [χ(s) = xj |χ(t) = xi ]

to identify the probability that the process will be in state xj at time s, given
that it is in state xi at time t ≤ s. Since we are still considering discrete state
Markov processes (chains) we will continue to use i ∈ N rather than xi to denote
a state of our Markov processes.
Note that we need to define

    pij(t,t) = { 1, if i = j
               { 0, otherwise        (2.37)

to express the fact that the process cannot leave for another state in zero time.
We already mentioned in Sec. 2 on page 26 that the time a Markov process spends
in any state has to be memoryless. In the case of a DTMC this means that the
chain must have geometrically distributed state sojourn times while a CTMC
must have exponentially distributed sojourn times. This is such an important
property that we include a proof from Kleinrock[108] of it here. The proof is also
instructive in itself.
Let yi be a random variable which describes the time spent in state i. The Markov
property specifies that we may not remember how long we have already been in state
i, which means that the remaining sojourn time in i may depend only upon i.
Assume that the probability that this remaining time exceeds t is some function h(t).
Then our Markov property insists that

    P[yi > s + t | yi > s] = h(t)



i.e. h(t) is independent of s. Using conditional probabilities, we may write

    P[yi > s + t | yi > s] = P[yi > s + t ∩ yi > s] / P[yi > s]
                           = P[yi > s + t] / P[yi > s]
where the last step follows from the fact that the event yi > s + t implies that
yi > s. Rewriting the last equation we obtain

P [yi > s + t] = P [yi > s]h(t) (2.38)

Setting s = 0 and since we know P [yi > 0] = 1 we have

P [yi > t] = h(t)

which we then substitute in Eq. (2.38) to obtain the relation

P [yi > s + t] = P [yi > s]P [yi > t] (2.39)

for all s,t ≥ 0. All that remains is to show that the only continuous distribution
function which satisfies Eq. (2.39) is the negative exponential distribution. Write
fi (t) for the corresponding density function. Then we can write
    d/dt P[yi > t] = d/dt (1 − P[yi ≤ t])
                   = −fi(t)        (2.40)

Using the latter result and differentiating Eq. (2.39) with respect to s we obtain

    d/ds P[yi > s + t] = −fi(s) P[yi > t]

Dividing both sides by P[yi > t] and setting s = 0 we obtain the differential
equation

    dP[yi > t] / P[yi > t] = −fi(0) dt

which by using Eq. (2.40) gives the following density function

    fi(t) = fi(0) e^{−fi(0)t}

for all t ≥ 0. Setting λ = fi(0) we are back to fi(x) = λe^{−λx}, which we had before
in Eq. (2.4) on page 26. □
In other words, if a stochastic process has the Markovian property, the time spent
in a state will have a negative exponential distribution. This may seem rather
restrictive since many real systems do not have exponential time distributions.
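As a quick numerical illustration of the argument above, Eq. (2.39) can be checked directly for the exponential distribution just derived; the rate used below is arbitrary.

```python
import numpy as np

lam = 1.7                                  # arbitrary rate, purely for illustration
survivor = lambda x: np.exp(-lam * x)      # P[Y > x] for an exponential sojourn time

for s, t in [(0.5, 0.3), (2.0, 1.0), (0.1, 4.0)]:
    print(survivor(s + t), survivor(s) * survivor(t))   # the two columns agree, Eq. (2.39)
```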