
Escuela Técnica Superior de Ingeniería de Telecomunicación
Course: 2020-2021
Date: 2020-11-26
Teacher: Marius Marinescu

Laboratory: Probability and Statistics
Indications: The laboratory practice has 3 main parts: Random numbers, Statistics, and Markov chains. Each part has an introduction to the topic and theoretical-practical exercises. Hints about useful Matlab functions are given to help you solve the exercises. You must solve a minimum of one exercise from Section 1, one exercise from each of Subsections 2.1, 2.2 and 2.3, and one exercise from Section 3. You may hand in more or all of the exercises; in that case, the mark will be the mean of the 5 best marks, respecting the distribution above. You must hand in a PDF report detailing and explaining your solutions and the code used.
Teacher contact: [email protected]

Contents
1 Pseudo-Random Number Generation 1

2 Statistics 3
2.1 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Confidence interval for the expected value of a random variable . . . . . . . . . . . . . . 5
2.2.2 Batch means method for the mean of a non-Gaussian random variable . . . . . . . . . . 6
2.2.3 Confidence interval for the variance of a Gaussian random variable . . . . . . . . . . . . 6
2.3 Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Markov Chains 8

1 Pseudo-Random Number Generation


Many of the methods of computational statistics require the ability to generate random variables with known probability distributions. This is the core capability needed to perform statistical simulation such as the Monte Carlo method1.
The methods to create random variables start with the generation of random digits that are linked together to form uniformly distributed random numbers in the interval (0, 1). The oldest methods to generate random numbers were done “by hand”: rolling a die, drawing cards from a deck, extracting balls from an urn, etc. During the first half of the 20th century, mechanical and electrical devices such as rotating disks and electrical circuits were created to generate random numbers. In the second half, many new works proposed physical generators of random numbers. Usually, the numbers were published in tables so they could be used for Monte Carlo simulation. The first tables were published by a student of Karl Pearson in 1927. The most widely used table of random numbers ever was the one produced by the RAND Corporation in 1955. One of its advantages was its size: it contained a million random digits.
The most common algorithm to generate random numbers is the linear congruential generator (LCG). Given a seed number x0, the next random number is chosen as xn+1 ≡ a·xn + b (mod m) for some a, b and m chosen conveniently2,3. The numbers are normalized to obtain values in the interval [0, 1) by dividing by m: ui = xi/m. RANDU, a congruential random number generator used by IBM in the 60s, used a = 2¹⁶ + 3 = 65539, b = 0, and m = 2³¹. These random numbers have a reticulated or gridded structure, as shown in figure 1. In this grid structure, a series of lines can be identified on which all the pairs lie. Thus, the generation of random numbers by computers suffers from a deterioration of the notion of randomness. Note also that they are generated by a deterministic algorithm. For these reasons, these random numbers are called pseudo-random. The techniques used to generate uniform random numbers and their defects have been widely studied. For more information about the topic, see [1].
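As an illustration only, here is a minimal Matlab sketch of an LCG (the parameters and seed below are arbitrary example values, not a recommended choice):

% Minimal LCG sketch (illustrative parameters and seed).
a = 37; b = 1; m = 64;                 % multiplier, increment, modulus
x0 = 7;                                % seed
N = 200;                               % number of values to generate
x = zeros(N,1);
x(1) = mod(a*x0 + b, m);
for n = 2:N
    x(n) = mod(a*x(n-1) + b, m);       % x_{n+1} = a*x_n + b (mod m)
end
u = x/m;                               % normalize to [0,1)
scatter(u(1:end-1), u(2:end));         % plot pairs of consecutive values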

1 The method was named after the Casino of Monte Carlo (Monaco). The term was a code name for a secret job in which von Neumann and Ulam used this mathematical technique in the well-known project to build the atomic bomb.
2 a ≡ b (mod m) is read “a is congruent with b modulo m” and means that the remainder of dividing a by m is b. See this informative video for more information about the topic.
3 Visit https://demonstrations.wolfram.com/LinearCongruentialGenerators/ to simulate the LCG algorithm using Wolfram.


Figure 1: Pairs of consecutive pseudo-random numbers plotted in a plane. The pairs fall along a set of lines.

Exercise 1. Generating random numbers


In this exercise, we test the LCG method to generate a sample of uniform random values in the interval (0, 1).

(a) Use the LCG method to generate 200 values using the following parameters: a = 37, b = 1, and m = 64. Fix any seed x0. Use the Matlab function stem() to plot the sequence of values. Do they look random? Now use the Matlab function scatter() to plot the pairs of consecutive random values as in figure 1. Do they follow a pattern? How many unique values did you get?

(b) Now generate 100 values with the parameters a = 7, b = 0, and m = 61. Fix any seed x0. Proceed in the same way as in (a). Now use the function scatter3() to plot the sequence of triplets of consecutive random numbers.

(c) In the 70s, IBM generated random numbers using the following parameters: a = 2¹⁶ + 3 = 65539, b = 0, and m = 2²⁹. Proceed in the same way. Fix a value for the seed. Generate 200 random digits with the LCG method. Plot the pairs of consecutive random numbers and the sequence of triplets of consecutive random numbers. Do you see a pattern? Is this selection of parameters useful for generating random numbers? Finally, make a histogram using the Matlab function histogram(). Do the values look uniformly distributed?

(d) Generate 500 random numbers with the method in (b), and also another 500 values with the function rand() in Matlab. Make a histogram of all 1000 generated numbers. Do they look uniform?

Exercise 2. Generating exponentially distributed random numbers


We are able to generate any probability distribution once we have uniform random numbers, and there are several methods to do so. For example, the inverse transform method says that, for a given random variable X with distribution function FX(x) that admits an inverse, the transformed variable U = FX(X) follows a continuous uniform distribution. That means you can generate the random variable X as X = FX⁻¹(U), where U ∼ U(0, 1). The general procedure is:

1. Derive the expression of the inverse distribution function: X = FX⁻¹(U).

2. Generate a uniform random number U in (0, 1).

3. Obtain X as X = FX⁻¹(U).
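As a minimal sketch of this procedure (assuming, for illustration, an exponential distribution with rate λ, for which FX(x) = 1 − e^(−λx) and FX⁻¹(u) = −ln(1 − u)/λ):

% Inverse transform sketch for an Exp(lambda) variable (lambda chosen arbitrarily).
lambda = 2;                             % example rate parameter
n = 10000;                              % number of simulated values
U = rand(n,1);                          % step 2: uniform numbers in (0,1)
X = -log(1-U)/lambda;                   % step 3: X = F^{-1}(U) = -ln(1-U)/lambda
histogram(X, 'Normalization', 'pdf');   % compare the shape with the exponential pdf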


Generate the exponential random variable with the above method. Fix any value for the parameter λ and generate the uniform random numbers U with the function rand() in Matlab. Make a histogram for n = 100, 1000, and 10000 simulations using the function histogram(...,'Normalization','pdf'). For the case n = 10000, plot the histogram together with the pdf of the exponential. Do they agree?

2 Statistics
The field of statistics plays a key role in bridging probability models to the real world. To apply probability models to real situations, we must perform experiments and gather data. This is why the main object of study in statistics is a collection of data from a population. The data is modelled as being a random sample, X = (X1, X2, ..., Xn), consisting of n independent random variables with the same distribution.
Statistics can be classified into descriptive statistics and inferential statistics. The first deals with describing the data and the second infers properties of the entire population from the sample. At the same time, classical inferential statistics can be subdivided into three branches: parameter estimation, confidence intervals, and hypothesis testing. Statistics also plays an important role in Decision Theory, see [2], [3]. Typically, two classical methodologies or points of view arise when solving an inferential statistical problem: the frequentist approach and the Bayesian approach. This section deals with inferential statistics through the frequentist approach. If you want to know more about statistics in general or about Bayesian statistics, see [4], [5].

2.1 Parameter estimation


We consider the problem of estimating a parameter θ related to a random variable X. We suppose that we
have obtained a random sample X = (X1 , ..., Xn ) consisting of independent, identically distributed (i.i.d.)
versions of X. Our estimator is given by a function of X:

θ̂(X) = g(X1 , X2 , ..., Xn ) (1)

After making the observations, we obtain the n values (x1, x2, ..., xn) and compute the estimate for θ as a single value, g(x1, x2, ..., xn). Because of this, θ̂(X) is called a point estimator.
As an example, consider that θ = σ² = VAR[X]. It is well known that the sample variance,

Ŝ² = (1/n) Σ_{i=1}^{n} (Xi − X̄)²,   where   X̄ = (1/n) Σ_{i=1}^{n} Xi,   (2)

is an estimator of the variance of X. Other parameters of interest could be the expectation, µ, or, typically, the parameter(s) of a family of distributions, such as λ when X ∼ Exp(λ).
Two properties are usually evaluated in an estimator: the bias and the mean square error (MSE). The bias is defined as

B(θ̂) = E[θ̂] − θ.   (3)

An estimator is said to be unbiased if E[θ̂] = θ, that is, if the bias is zero. The MSE is defined as:

E[(θ̂ − θ)²] = · · · = VAR[θ̂] + B(θ̂)².   (4)

If the bias is zero, the MSE is just the variance of the estimator. The smaller the MSE, the greater the accuracy of the estimator.
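As an illustration of how these quantities can be approximated by simulation, the following Matlab sketch estimates the bias and MSE of the sample mean of an Exp(λ) variable (the parameter value, sample size and number of repetitions are arbitrary examples):

% Monte Carlo sketch: empirical bias and MSE of the sample mean (illustrative values).
lambda = 5; theta = 1/lambda;      % true mean of the exponential variable
n = 100; reps = 200;               % sample size and number of repetitions
est = zeros(reps,1);
for r = 1:reps
    X = exprnd(theta, n, 1);       % one sample of size n (exprnd takes the mean)
    est(r) = mean(X);              % point estimate theta_hat
end
bias = mean(est) - theta;          % B(theta_hat) = E[theta_hat] - theta
mse  = mean((est - theta).^2);     % MSE = VAR[theta_hat] + bias^2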

Exercise 3. Emergency centre


At peak hours, the inter-arrival times at an emergency center are exponentially distributed random variables with rate λ = 5 arrivals per minute. Consider the following two estimators for the mean inter-arrival time:

• θ̂1 = (1/n) Σ_{i=1}^{n} Xi


• θ̂2 = n · min(X1 , X2 , ..., Xn )

(a) Peak hours start at 22:00. Simulate the first n = 100 emergency inter-arrival times arriving at the center4. Represent the arrival process using the function stairs() (arrival times on the x-axis and n on the y-axis). The functions duration(), minutes() and cumsum() may be helpful. How long did it take the emergency center to receive 100 arrivals? If each emergency takes a mean of 5 minutes to receive primary attention, how many emergency boxes would you prepare? Explain your answer.

(b) Generate a sample of (X1, X2, ..., X20000) inter-arrival emergency times. Partition the sample into 200 portions of size ni = 100, i = 1, 2, ..., 200. For each portion, estimate the mean inter-arrival time with the estimators θ̂1 and θ̂2.
You can view the results as two samples of estimates, (θ̂1,1, θ̂1,2, ..., θ̂1,200) and (θ̂2,1, θ̂2,2, ..., θ̂2,200). Compute the bias of each estimate and then the (sample) mean bias. Are the estimators unbiased? Estimate the variance5 of each estimator and the mean square error. Which one, θ̂1 or θ̂2, is a better estimator in terms of the MSE?

2.1.1 Maximum likelihood estimation


The maximum likelihood method is a general procedure to find a point estimator for an unknown parameter θ. Given a sample, the method selects the most plausible value of the parameter to explain the observations. In other words, the method maximizes the probability of the observed data. Let x = (x1, x2, ..., xn) be the observed random sample of the random variable X and let θ be the parameter of interest. The likelihood function, a function of θ given the sample, is

L(θ/x) = P(X1 = x1, X2 = x2, ..., Xn = xn /θ),   (5)

if the variables are discrete, and

L(θ/x) = f(x1, x2, ..., xn /θ)   (6)

if the variables are continuous. Thus, evaluating the likelihood function amounts to evaluating the joint probability mass function or the joint density function of (X1, X2, ..., Xn) at the observed values (x1, x2, ..., xn) and letting it depend on the parameter θ. The maximum likelihood method selects the estimator θ̂ to be the parameter value that maximizes the likelihood function:

θ̂ = arg max_θ L(θ/x)   (7)
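As a minimal sketch of the idea (using, for illustration, an i.i.d. exponential sample with unknown rate λ, whose log-likelihood is n·log λ − λ·Σ xi and whose analytical MLE is λ̂ = 1/x̄):

% MLE sketch for an i.i.d. Exp(lambda) sample (illustrative simulated data).
x = exprnd(1/3, 50, 1);                         % observations (true lambda = 3)
logL = @(lambda) sum(log(lambda) - lambda*x);   % log-likelihood of the sample
lams = 0.1:0.01:10;                             % grid of candidate values
[~, k] = max(arrayfun(logL, lams));             % numerical maximization over the grid
lambda_hat_grid = lams(k);
lambda_hat = 1/mean(x);                         % analytical MLE: 1/xbar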

Exercise 4. Laboratory hand-in typos


Laboratory reports submitted by last year's students have been found to have a Poisson-distributed number of typos per page. Although all typos are Poisson distributed, each student has their own performance. As such, the average number of typos per page, λ, may differ from one student to another. The teacher's marks depend slightly on the typos found. But the teacher does not want to check every page of each report, so he takes a random page and analyzes the number of typos. Using this procedure, the teacher found that Darius has 2 typos on a randomly selected page.

(a) Find the Poisson distribution that best fits the observed data, x = (x1) = 2. For that, find the likelihood function of θ = λ, where λ is the Poisson distribution parameter. Then, plot the likelihood function. Does it have a maximum? Compute the maximum likelihood estimator θ̂ analytically. Does it agree with the plot? And does it agree with the observed data? Why or why not?

(b) Two pages of the laboratory reports of Abraham and Mateo were mixed up by mistake. The teacher knows that Abraham has a mean of 1 typo per page and Mateo has a mean of 5 typos per page. Reading one of the mixed pages, the teacher found that it has 3 typos. Who is the more likely author?
4 Generate the random numbers using the inverse transform method from the previous exercise or use exprnd().
5 Use the sample variance Ŝ 2 .


2.2 Confidence intervals


Instead of seeking a single value that we designate to be the estimate of the parameter of interest, confidence intervals attempt to find an interval or set of values that is highly likely to contain the true value of the parameter. In particular, we specify a high probability, say 1 − α, and then find an interval I = (a, b) such that

P(a ≤ θ ≤ b) = 1 − α   (8)

where a = l(X) and b = u(X) are sample statistics. It is important to note that the parameter θ is unknown but deterministic. What is random is the interval I. Concretely, the endpoints of the interval are random since they are driven by the random variables l(X) and u(X)6.

2.2.1 Confidence interval for the expected value, µ = E[X], of a random variable
Suppose that we have an i.i.d. sample of Gaussian variables (X1, X2, ..., Xn), with unknown mean µ and variance σ², and that we are interested in finding a confidence interval for the mean. A well-known formula for a confidence interval for the mean of a Gaussian random variable is

I = ( X̄ − tα/2,n−1 · σ̂/√n ,  X̄ + tα/2,n−1 · σ̂/√n )   (9)

where,

• X̄ is the sample mean.

• σ̂ is the square root of the unbiased sample variance: σ̂ = sqrt( (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² ).

• c = tα/2,n−1 is the critical value of a tn−1 Student's distribution, such that Fn−1(c) = 1 − α/2, where Fn−1 is the cumulative distribution function. This distribution is related to the Gaussian one and is characterized by having fatter tails. The t-distribution has only one parameter, called the degrees of freedom, n − 1. In figure 2, the values ±tα/2,n−1 are represented together with a graphical comparison between the Gaussian and Student t-distribution density functions.
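A minimal Matlab sketch of the interval in (9) follows (the data is simulated here only for illustration; in the exercise you would load the data from a file instead):

% Sketch of a 95% t-based confidence interval for the mean (simulated Gaussian data).
alpha = 0.05;
X = normrnd(2.5, 0.5, 50, 1);          % illustrative sample; replace with the real data
n = numel(X);
xbar = mean(X);
sigma_hat = std(X);                    % square root of the unbiased sample variance
t = tinv(1 - alpha/2, n-1);            % critical value t_{alpha/2, n-1}
I = [xbar - t*sigma_hat/sqrt(n), xbar + t*sigma_hat/sqrt(n)];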

Figure 2: On the left, density function of a t-distribution with critical values. On the right, a comparison of
standard Gaussian density function with the tn -student density function. The Student t-distribution is named
after W.S. Gosset, who published under the pseudonym “A. Student”.

6 Many misunderstandings about confidence intervals, mainly in psychology and the social sciences, by both researchers and students, have persisted for decades and remain rampant. See for example [6].


Exercise 5. Planned obsolescence


The lifetime of a telephone device is known to be Gaussian distributed. An independent research group wants to study whether the devices have planned obsolescence. If the mean lifetime of the devices is less than two and a half years, the telephones will be considered to have planned obsolescence. The researchers tested fifty devices and the lifetime of each of them was recorded.
(a) Load the data file lifetime.txt and plot a histogram. Does the data look normally distributed? What are the minimum and maximum lifetimes observed?
(b) Compute a 95% confidence interval for the mean lifetime of the devices7. What does the interval say about planned obsolescence?

2.2.2 Batch means method for the mean of a non-Gaussian random variable
The previous method can easily be misused, since it is only justified if the sample is (at least approximately) Gaussian distributed. Nevertheless, if the variables are not Gaussian, a method of batch means can be applied, taking advantage of the central limit theorem. This method involves performing a series of independent and identical experiments in which the sample mean X̄ of each experiment is computed. If we assume that in each experiment the sample mean is calculated from a large enough number of i.i.d. observations, the central limit theorem implies that the sample mean of each experiment is approximately Gaussian. We can therefore compute a confidence interval for the mean using the sample of means, (X̄1, X̄2, ..., X̄n), as the sample set.
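A minimal sketch of the batch means procedure is given below (the data, number of batches and confidence level are illustrative choices; the exercise uses the file batch.txt):

% Batch means sketch (illustrative data; 10 batches of 20 samples each).
data = exprnd(1, 200, 1);              % placeholder for the real sample
B = 10; b = numel(data)/B;             % number of batches and batch size
means = mean(reshape(data, b, B))';    % sample mean of each batch
alpha = 0.10;
t = tinv(1 - alpha/2, B-1);            % t critical value with B-1 degrees of freedom
I = [mean(means) - t*std(means)/sqrt(B), mean(means) + t*std(means)/sqrt(B)];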

Exercise 6. Batch means


An experiment is performed and data is collected. A sample of two hundred values of some unknown
i.i.d. random variables is obtained. The investigators want to compute a confidence interval for the mean.
(a) The data is saved in the file batch.txt. Plot a histogram of the obtained data. Does it look Gaussian?
(b) Compute the sample of means. To do so, group the data into 10 batches of 20 samples each. Does the sample of means look Gaussian? Is its histogram similar to the one in (a)? Why or why not? Perform the method of batch means. Give a 90% confidence interval.
(c) Actually, these values were generated from an exponential distribution with parameter λ = 1. Does this statement agree with the confidence interval obtained in (b)?

2.2.3 Confidence interval for the variance, σ 2 = VAR[X], of a Gaussian random variable
As in the case of the mean, it is useful to find a confidence interval for the variance of a random variable. In general, whenever the sampling distribution of an estimator of a parameter θ is known, a confidence interval can be computed. If the variables are Gaussian, it turns out that

(n − 1)σ̂n² / σ² ∼ χ²n−1   (10)
where

• σ̂n² is the unbiased sample variance of a sample of size n.

• θ = σ² is the true value of the parameter.

• χ²n−1 is a chi-square random variable with n − 1 degrees of freedom. This distribution is plotted in figure 3.

Using the relation in the above equation (10), the following confidence interval for the variance can be obtained:

I = ( (n − 1)σ̂n² / χ²1−α/2,n−1 ,  (n − 1)σ̂n² / χ²α/2,n−1 )   (11)
7 You can use the Matlab function tinv() to find the value of tα/2,n−1 .


Figure 3: Chi-square density function with critical values. The area of each highlighted region is α/2.

where χ²α/2,n−1 and χ²1−α/2,n−1 are the critical values depicted in figure 3.
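A minimal Matlab sketch of the interval in (11), using simulated data for illustration only:

% Sketch of a 99% confidence interval for the variance of a Gaussian sample.
alpha = 0.01;
X = normrnd(0, 0.8, 100, 1);                   % illustrative sample; replace with the data
n = numel(X);
s2 = var(X);                                   % unbiased sample variance
I = [(n-1)*s2 / chi2inv(1 - alpha/2, n-1), ...
     (n-1)*s2 / chi2inv(alpha/2, n-1)];        % interval from equation (11)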

Exercise 7. Medical knee implant


A factory produces a piece of a medical knee implant. The length of each fabricated piece must be under control and its variance as low as possible. It is known that if the variance of the piece length is less than 0.75 cm² the process is under control. Knee implant piece length errors in the manufacturing process are known to be Gaussian distributed. A sample of the lengths of 100 medical pieces is saved in the file implant.txt. Import the file in Matlab and compute a 99% confidence interval for the variance8. Is the process under control?

2.3 Hypothesis testing


An important question that arises in scientific experiments is whether a certain hypothesis about the observed values is true or not. In statistics, the problem is stated as follows: an experiment is performed and a random sample X = (X1, X2, ..., Xn) is obtained. We are interested in whether the observed data is significantly different from what would be expected under our hypothesis H0. For that, we specify a decision rule. We partition the observation space into an acceptance region R, where we accept the hypothesis, and a rejection or critical region Rᶜ, where we reject the hypothesis. In summary, the decision rule is:

Accept H0 if X ∈ R   (12)
Reject H0 if X ∈ Rᶜ.   (13)

Two kinds of errors can occur when executing this decision rule:

Type I error : Reject H0 when H0 is true (14)


Type II error : Accept H0 when H0 is false. (15)

The significance level of a test, α, is defined as the probability of Type I error, i.e.

α = P[X ∈ Rᶜ /H0].   (16)

This value represents our tolerance for Type I errors, that is, for rejecting H0 when it is in fact true. Generally, what is wanted is to find the best test so that, for a given threshold α, the Type II error probability is as small as possible.
8 You can use the Matlab function chi2inv() to find the values of χ²α/2,n−1 and χ²1−α/2,n−1.


Unfortunately, in many cases there is no information about the true distribution of the observations X, and hence the probability of Type II error cannot be evaluated9.
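As an illustration of how such a one-sided test on a Gaussian mean can be set up in Matlab (the numerical values below are arbitrary examples, not those of the exercise):

% Sketch of a one-sided test on a Gaussian mean with known standard deviation.
mu0 = 100; sigma = 10; n = 30;                 % hypothesized mean, known std, sample size
alpha = 0.01;                                  % significance level (Type I error tolerance)
c = norminv(1 - alpha, 0, 1) * sigma/sqrt(n);  % threshold so that P[Xbar > mu0 + c | H0] = alpha
X = normrnd(mu0, sigma, n, 1);                 % placeholder sample; replace with real data
if mean(X) > mu0 + c
    disp('Reject H0');
else
    disp('Accept H0');
end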

Exercise 8. Battery performance


A battery manufacturer, Duracelia, claims to have improved the lifetime of their batteries. A sceptical customer, who saw the advertisement on TV, wants to perform a test to see if this is really true. The old batteries are known to have a lifetime that is Gaussian distributed with mean 150 hours and standard deviation 4 hours. A sample of the lifetimes of 25 new batteries is saved in the file batteries.txt. Let H0 = “the batteries' lifetime is unchanged” and consider a significance level of α = 0.01. If H0 is true, then the sample mean X̄25 is Gaussian distributed.

(a) Compute the mean and variance of X̄25 under the hypothesis H0 .

(b) Compute c such that α = 0.01 = P[X̄25 > 150 + c /H0]. Then the rejection region for the sample mean, X̄25, is Rᶜ = (150 + c, ∞). That is, if X̄25 > 150 + c, the null hypothesis is rejected and it is accepted that the lifetime of the batteries has significantly increased.

(c) Import the file batteries.txt and compute the sample mean. Can you reject the null hypothesis H0 ?

3 Markov Chains
This section is about a certain sort of stochastic process, called a Markov process. The characteristic of this sort of process is that it retains no memory of where it has been in the past. This means that only the current state of the process can influence where it goes next. We will be concerned exclusively with the case where the process can assume only a finite or countable set of states. Such processes are called Markov chains.
What makes them important is not only that they model many phenomena of interest, but also that the lack-of-memory property makes it possible to predict how a Markov chain may behave, and to compute probabilities and expected values that quantify this behaviour. Thus, Markov processes are quite useful in modelling many problems found in practice.
A discrete random process {Xn }n∈I is a Markov process if the future of the process given the present
is independent of the past, that is,

P [Xn+1 = xn+1 /Xn = xn , ..., X1 = x1 ] = P [Xn+1 = xn+1 /Xn = xn ]. (17)

In the above expression, we refer to n as the “present”, to n + 1 as the “future”, and to 1, 2, ..., n − 1 as
the “past”. The value of Xn is called the state of the process at instant n. Let us assume that {Xn}n∈I takes values in a finite set of integers S, and that the one-step probabilities, pij = P[X1 = j/X0 = i], are fixed and do not change with the steps, that is:

P [Xn+1 = j/Xn = i] = pij for all n ∈ I. (18)

Let us also assume some distribution for the beginning of the process:

P [X0 = i] = pi for all i ∈ S. (19)

It can easily be seen that the joint pmf of (X0, X1, ..., Xn) is given by

P[X0 = i0, X1 = i1, ..., Xn = in] = pi0 · p(i0,i1) · · · p(in−1,in).   (20)

9 There are more types of hypothesis tests; here we approach just the one mentioned. If you want to know more about statistical hypothesis testing, visit the very nice web page http://www.randomservices.org/random/hypothesis/Introduction.html of the University of Alabama in Huntsville.


Actually, the process can be completely specified in an easy way through a stochastic matrix10,

P = [ p00  p01  ···  p0m
      p10  p11  ···  p1m
      ...  ...  ...  ...
      pm0  pm1  ···  pmm ],   m = |S|,   (21)

called the transition probability matrix, and the initial distribution

p = (p0, p1, ..., pm)ᵗ.   (22)

Finally, consider the n-step transition probability matrix, denoted P(n), whose ij element is

pij(n) = P[Xn+k = j/Xk = i]   for all k, n + k ∈ I and i, j ∈ S.   (23)

Actually, P[Xn+k = j/Xk = i] = P[Xn = j/X0 = i], since we assumed that the transition probabilities do not change with the steps. It can be shown (Chapman-Kolmogorov equations) that this matrix is just

P(n) = Pⁿ.   (24)

As an example, imagine we are in state i and want to know the probability of being in state j after n steps. Then we just have to look at the element ij of the matrix Pⁿ.
Markov chains are often best described by diagrams. For example, the diagram in figure 4 represents a Markov chain with transition probability matrix:

P = [ 1−α   α
       β   1−β ].

Figure 4: Markov chain diagram.
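As a minimal sketch, the two-state chain of figure 4 can be simulated, and its n-step matrix computed, as follows (α and β are arbitrary example values):

% Simulation sketch of the two-state Markov chain of figure 4 (example alpha, beta).
alpha = 0.3; beta = 0.6;
P = [1-alpha, alpha; beta, 1-beta];    % transition probability matrix
p0 = [0.5, 0.5];                       % initial distribution
N = 1000;                              % number of simulated steps
X = zeros(1, N);
X(1) = find(rand < cumsum(p0), 1);     % draw the initial state
for n = 2:N
    X(n) = find(rand < cumsum(P(X(n-1), :)), 1);   % next state from row X(n-1) of P
end
Pn = P^20;                             % n-step transition matrix P(n) = P^n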

Exercise 9. Google PageRank


An internet surfer browses webpages in a five-page web universe, as shown in figure 5. The surfer selects the next page to view by choosing with equal probability among the pages pointed to by the current page. If a page has no outgoing link (like page 2), then the surfer selects any of the pages in the universe with equal probability11.

(a) Compute the transition probability matrix P.


(b) Simulate 5 samples of the process for the first 10 steps and plot them. Which web page is visited most? And which least?
(c) Compute empirically the probability of returning to page 3 if, 2 steps before, you were also on page 3. For that, simulate the process up to step n = 100 and compute the relative frequency. Finally, compute the theoretical probability. Are they similar? Which 2-step transitions are impossible?
10 A matrix P is stochastic if every element is nonnegative and each row sums to one.
11 The random surfer model forms the basis of the PageRank algorithm that was introduced by Google to rank web pages in their search engine results. The rank of a page is given by the steady-state probability of the page in the Markov chain model. The size of the state space in this Markov chain is in the billions of pages!


Figure 5: State-transition diagram.

(d) Let π = (π1 , π2 , ..., π5 ) be the proportions of instants spent in each state in the long term, called steady
state.
• Compute these probabilities empirically. For that, simulate the process several times and compute the relative frequency of being in each state.
• Compute P, P⁵, P¹⁰, P²⁰, and P⁵⁰ using a computer12. Does the matrix converge to a constant matrix? Are the probabilities constant by row? How would you interpret this?
• There is a way to compute the steady-state distribution π theoretically and directly. For that, you have to solve the following (indeterminate) linear system of equations:

π = πP

and use the normalization condition Σ_{i=1}^{n} πi = 1, since the πi are probabilities (a minimal sketch of how to set this up in Matlab is given after this list). Compute π analytically. Are the probabilities similar to the ones you computed empirically? What is the most frequent state?
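As referenced above, here is a minimal sketch of solving π = πP with the normalization condition, shown for the generic two-state chain of figure 4 rather than for the web universe of the exercise:

% Sketch: steady-state distribution by solving pi*P = pi together with sum(pi) = 1.
alpha = 0.3; beta = 0.6;
P = [1-alpha, alpha; beta, 1-beta];    % replace with the transition matrix of interest
m = size(P, 1);
A = [P' - eye(m); ones(1, m)];         % stack (P' - I)*pi = 0 with the normalization row
b = [zeros(m, 1); 1];
pi_ss = A \ b;                         % solve the (consistent) overdetermined system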

(e) As seen in Table 1, the surfer usually spends 10 minutes on the first page (his favourite one), 5 minutes on pages 2, 3, and 4, and 3 minutes on page 5. Find the percentage of time the system spends in the different states in the long run13. On which page does the surfer spend most of the time? Is it the same as the most visited page?

State 1 2 3 4 5
Expected values (minutes) 10 5 5 5 3

Table 1: Expected value of time spent in each state.

References
[1] J. E. Gentle, Random Number Generation and Monte Carlo Methods. Springer-Verlag, 1998.
[2] J. O. Berger, Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, 1985.
[3] T. Valdés Sánchez and L. Pardo Llorente, Decisiones Estratégicas. Ediciones Díaz de Santos, 2000.
[4] S. M. Ross, Introducción a la Estadística. Ediciones Reverté, 2007.
[5] W. M. Bolstad, Introduction to Bayesian Statistics. Wiley-Blackwell, 2007.
[6] R. Hoekstra et al., “Robust misinterpretation of confidence intervals,” Psychonomic Bulletin & Review, vol. 21, pp. 1157–1164, 2014.

12 As a note, it can also be computed by hand using matrix diagonalization methods from Linear Algebra.
13 Hint: you can draw on the probabilities π to do so, or you can perform a simulation and estimate the times.
