Laboratory Probability and Statistics 20 21 Errata Corrected
Laboratory Teacher:
Marius Marinescu
Indications: The laboratory practice has 3 main parts: Random numbers, Statistics, and Markov chains. Each part
has an introduction to the topic and theoretical-practical exercises. Hints about useful Matlab functions are given to help you
solve the exercises. You must solve a minimum of: one exercise from Section 1, one exercise from each of Subsections 2.1, 2.2, and 2.3, and
one exercise from Section 3. You may hand in more or all of the exercises; in that case, the mark will be the mean of the 5 best marks,
following the proportions above. You have to hand in a pdf report detailing and explaining the resolution and the code used.
Teacher contact: [email protected]
Contents
1 Pseudo-Random Number Generation
2 Statistics
2.1 Parameter estimation
2.1.1 Maximum likelihood estimation
2.2 Confidence intervals
2.2.1 Confidence interval for the expected value of a random variable
2.2.2 Batch means method for the mean of a non-Gaussian random variable
2.2.3 Confidence interval for the variance of a Gaussian random variable
2.3 Hypothesis testing
3 Markov Chains
1 The method was named after the Casino of Monte Carlo (Monaco). The term was a code name for a secret job in which von
Neumann and Ulam used this mathematical technique in the well-known project to make the atomic bomb.
2 a ≡ b (mod m) is read “a is congruent to b modulo m” and means that the remainder of dividing a by m is b.
Figure 1: Pairs of consecutive pseudo-random numbers plotted in a plane. The pairs fall along straight lines.
(a) Use the LCG method to generate 200 values using the following parameters: a = 37, b = 1, and m = 64.
Fix any seed x0. Use the Matlab function stem() to plot the sequence of values. Do they look random?
Now use the Matlab function scatter() and plot the pairs of consecutive random values as in figure 1.
Do they follow a pattern? How many unique values did you get?
(b) Now generate 100 values with the parameters a = 7, b = 0, and m = 61. Fix any seed x0. Proceed in
the same way as in (a). Now use the function scatter3() to plot the sequence of triplets of consecutive
random numbers.
(c) In the 70s IBM generated random numbers using the following parameters: a = 2^16 + 3 = 65539, b = 0,
and m = 2^29. Proceed in the same way: fix a value for the seed and generate 200 random numbers with the
LCG method. Plot the pairs of consecutive random numbers and the sequence of triplets of consecutive
random numbers. Do you see a pattern? Is this selection of parameters useful to generate random
numbers? Finally, make a histogram using the Matlab function histogram(). Do the values look uniform?
(d) Generate 500 random numbers with the method in (b), and also another 500 values with the function
rand() in Matlab. Make a histogram of all the 1000 numbers generated. Do they look uniform?
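As a sketch of the recursion behind these exercises, x_{k+1} = (a·x_k + b) mod m, scaled to [0, 1) by dividing by m (shown in Python rather than Matlab; the seed value is an arbitrary choice):

```python
def lcg(a, b, m, seed, n):
    """Generate n pseudo-random numbers in [0, 1) with a linear
    congruential generator: x_{k+1} = (a*x_k + b) mod m."""
    values = []
    x = seed
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x / m)  # scale the integer state to [0, 1)
    return values

# Parameters from part (a); a small modulus exposes the lattice structure.
values = lcg(a=37, b=1, m=64, seed=1, n=200)
print(len(set(values)))
```

Note that with m = 64 the generator can produce at most 64 distinct values, which is why the scatter plots show so few lines.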
Generate the exponential random variable with the above method. Fix any value for the parameter λ and
generate the uniform random numbers U with the function rand() in Matlab. Make a histogram for
n = 100, 1000, and 10000 simulations using the function histogram(...,’Normalization’,’pdf’). For the
case n = 10000 plot the histogram together with the pdf of the exponential. Do they agree?
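The inverse-transform step above can be sketched as follows (Python rather than Matlab; λ = 2 is an arbitrary choice): since F(x) = 1 − e^(−λx), inverting F gives X = −ln(1 − U)/λ with U uniform on (0, 1).

```python
import math
import random

def exponential_inverse_transform(lam, n, rng):
    """Draw n Exponential(lam) samples by inverting the CDF:
    F(x) = 1 - exp(-lam*x)  =>  x = -ln(1 - U)/lam, U ~ Uniform(0, 1)."""
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

samples = exponential_inverse_transform(lam=2.0, n=10000, rng=random.Random(0))
# The sample mean should be close to E[X] = 1/lam = 0.5.
print(sum(samples) / len(samples))
```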
2 Statistics
The field of statistics plays the key role in bridging probability models to the real world. To apply probability
models to real situations, we must perform experiments and gather data. This is why the main object of study
of statistics is a collection of data from a population. The data is modelled as a random
sample, X = (X1, X2, ..., Xn), consisting of n independent random variables with the same distribution.
Statistics can be classified into descriptive statistics and inferential statistics. The first deals with
describing the data and the second infers properties of the entire population from the sample. At the same
time, classical inferential statistics can be subdivided into three branches: parameter estimation, confidence
intervals, and hypothesis testing. Statistics also plays an important role in Decision Theory, see [2], [3]. Typically,
two classical methodologies or points of view arise when solving an inferential statistical problem: the frequentist
approach and the Bayesian approach. This section deals with inferential statistics through the frequentist approach.
If you want to know more about statistics in general or about Bayesian statistics see [4], [5].
After making the observations, we obtain the n values (x1, x2, ..., xn) and compute the estimate for θ as a
single value, g(x1, x2, ..., xn). Because of that, θ̂(X) is called a point estimator.
As an example, consider that θ = σ² = VAR[X]. It is well known that the sample variance,

Ŝ² = (1/n) ∑_{i=1}^{n} (X_i − X̄)²,  where X̄ = (1/n) ∑_{i=1}^{n} X_i,   (2)

is an estimator of the variance of X. Other parameters to estimate could be the expectation, µ, or, typically,
the parameters of a family of distributions, such as λ when X ∼ Exp(λ).
Two typical properties are usually evaluated in an estimator: the bias and the mean square error
(MSE). The bias is defined as

B(θ̂) = E[θ̂] − θ.   (3)

An estimator is said to be unbiased if E[θ̂] = θ, that is, if the bias is zero. The MSE is defined as:

MSE(θ̂) = E[(θ̂ − θ)²] = VAR[θ̂] + B(θ̂)².   (4)

If the bias is zero the MSE is just the variance of the estimator. The smaller the MSE, the more accurate the
estimator.
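Bias and MSE can be approximated by simulation, which is essentially what the exercises below ask for. A minimal sketch (Python rather than Matlab; the Gaussian population with σ² = 1, the sample size n = 100, and the number of repetitions are all arbitrary choices):

```python
import random

def simulate_bias_mse(estimator, true_value, n, reps, rng):
    """Monte Carlo estimate of the bias and MSE of an estimator:
    draw many samples, apply the estimator to each, and average."""
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance = 1
        estimates.append(estimator(sample))
    bias = sum(estimates) / reps - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return bias, mse

def biased_sample_variance(xs):
    """The 1/n sample variance of equation (2); its bias is -sigma^2/n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

bias, mse = simulate_bias_mse(biased_sample_variance, true_value=1.0,
                              n=100, reps=2000, rng=random.Random(0))
print(bias, mse)
```

With n = 100 the estimated bias should come out close to −σ²/n = −0.01, matching the known bias of the 1/n variance estimator.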
(a) Peak hours start at 22:00. Simulate the first n = 100 inter-arrival emergency times arriving at
the center4. Represent the arrival process using the function stairs() (arrival times on the x-axis and n
on the y-axis). The functions duration(), minutes() and cumsum() may be helpful. How long did it take
the emergency center to receive 100 arrivals? If each emergency takes a mean of 5 minutes of primary
attention, how many emergency boxes would you prepare? Explain your answer.
(b) Generate a sample of (X1, X2, ..., X1000, ..., X20000) inter-arrival emergency times. Partition the sample
into portions of ni = 100, i = 1, 2, ..., 200. For each portion estimate the mean inter-arrival time with the
estimators θ̂1 and θ̂2.
You can view them as a sample of estimations, (θ̂1,1, θ̂1,2, ..., θ̂1,200) and (θ̂2,1, θ̂2,2, ..., θ̂2,200). Compute
the bias of each estimation and then the (sample) mean bias. Are the estimators unbiased? Estimate
the variance5 of each estimator and the mean square error. Which one, θ̂1 or θ̂2, is a better estimator in
terms of the MSE?
(a) Find the Poisson distribution that best fits the observed data, x = (x1) = 2. For that, find the
likelihood function of θ = λ, where λ is the Poisson distribution parameter. Then, plot the likelihood
function. Does it have a maximum? Compute the maximum likelihood estimator θ̂ analytically. Does
it agree with the plot? And does it agree with the observed data? Why or why not?
(b) Two pages of the laboratory papers of Abraham and Mateo were mixed by mistake. The teacher knows
that Abraham has a mean of 1 typo per page and Mateo has a mean of 5 typos per page. Reading one
of the mixed pages, the teacher found it has 3 typos. Who is the more likely author?
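Both parts reduce to evaluating the Poisson pmf. For (a), the likelihood of the single observation x = 2 is L(λ) = λ²e^(−λ)/2!, maximized at λ̂ = x = 2; for (b), comparing the pmf at x = 3 for λ = 1 and λ = 5 answers the authorship question. A minimal sketch (Python rather than Matlab; the grid of λ values is an arbitrary choice):

```python
import math

def poisson_pmf(k, lam):
    """P[X = k] for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# (a) Likelihood of the single observation x = 2 on a grid of lambda values.
lams = [i / 10 for i in range(1, 61)]          # 0.1, 0.2, ..., 6.0
likelihoods = [poisson_pmf(2, lam) for lam in lams]
lam_hat = lams[likelihoods.index(max(likelihoods))]
print(lam_hat)  # the grid maximum sits at the analytical MLE, lambda = x = 2

# (b) Which author (lambda = 1 for Abraham, lambda = 5 for Mateo) makes
# observing 3 typos more likely?
print(poisson_pmf(3, 1), poisson_pmf(3, 5))
```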
4 Generate the random numbers using the inverse transform method from the previous exercise or use exprnd().
5 Use the sample variance Ŝ².
2.2.1 Confidence interval for the expected value, µ = E[X], of a random variable
Suppose that we have an i.i.d. sample of Gaussian variables (X1, X2, ..., Xn), with unknown mean µ and
variance σ², and that we are interested in finding a confidence interval for the mean. A well-known formula for
a confidence interval for the mean of Gaussian random variables is

I = ( X̄ − t_{α/2,n−1} · σ̂/√n , X̄ + t_{α/2,n−1} · σ̂/√n )   (9)
where,
• X̄ is the sample mean.
• σ̂ is the square root of the unbiased sample variance: σ̂ = √( (1/(n−1)) ∑_{i=1}^{n} (X_i − X̄)² ).
• c = t_{α/2,n−1} is the critical value of a t_{n−1} Student’s distribution, such that F_{n−1}(c) = 1 − α/2, where
F_{n−1} is the cumulative distribution function. This distribution is related to the Gaussian one and
is characterized by having fatter tails. The t-distribution has only one parameter, called degrees of
freedom, n − 1. In figure 2 the values ±t_{α/2,n−1} are represented together with a graphical comparison
between the Gaussian and Student t-distribution density functions.
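Interval (9) can be computed as in the following sketch (Python rather than Matlab; the data, the confidence level 1 − α = 0.95, and the table-derived critical value are illustrative assumptions):

```python
import math
import statistics

def t_confidence_interval(xs, t_crit):
    """Confidence interval (9) for the mean: X̄ ± t * s/sqrt(n),
    where s is the square root of the unbiased sample variance."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)  # uses the 1/(n-1) variance
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

data = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1, 5.3, 4.7]  # illustrative sample
# For alpha = 0.05 and n - 1 = 9 degrees of freedom, tables (or tinv(0.975, 9)
# in Matlab) give t_{alpha/2, n-1} ≈ 2.262.
lo, hi = t_confidence_interval(data, t_crit=2.262)
print(lo, hi)
```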
Figure 2: On the left, density function of a t-distribution with critical values. On the right, a comparison of
the standard Gaussian density function with the t_n-Student density function. The Student t-distribution is named
after W. S. Gosset, who published under the pseudonym “Student”.
6 Many misunderstandings about confidence intervals, mainly in psychology and the social sciences, by both researchers and students,
have persisted for decades and remain rampant. See for example [6].
2.2.2 Batch means method for the mean of a non-Gaussian random variable
The previous method can easily be misused, since it is only justified if the sample is (at least approxi-
mately) Gaussian distributed. Nevertheless, if the variables are not Gaussian, the method of batch means can
be applied, taking advantage of the central limit theorem. This method involves performing a series of inde-
pendent and identical experiments in which the sample mean X̄ of each experiment is computed. If we assume
that in each experiment each sample mean is calculated from a large enough number of i.i.d. observations, the
central limit theorem implies that the sample mean in each experiment is approximately Gaussian. We can
therefore compute a confidence interval for the mean using as sample set the sample of means, (X̄1, X̄2, ..., X̄n).
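The batch means step can be sketched as follows (Python rather than Matlab; the exponential data with λ = 1, the sample size, and the batch size are arbitrary choices): split the non-Gaussian observations into batches, average each batch, and treat the batch means as an approximately Gaussian sample.

```python
import random
import statistics

def batch_means(xs, batch_size):
    """Split xs into consecutive batches and return the mean of each;
    by the CLT each batch mean is approximately Gaussian."""
    return [statistics.mean(xs[i:i + batch_size])
            for i in range(0, len(xs) - batch_size + 1, batch_size)]

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(10000)]  # non-Gaussian, E[X] = 1
means = batch_means(data, batch_size=100)  # 100 approximately Gaussian values
print(statistics.mean(means))
```

The interval of equation (9) can then be applied to `means` instead of the raw data.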
2.2.3 Confidence interval for the variance, σ 2 = VAR[X], of a Gaussian random variable
As in the case of the mean, it is useful to find a confidence interval for the variance of a random variable. In
general, whenever the sampling distribution of an estimator of a parameter θ is known, a confidence interval
can be computed. If the variables are Gaussian, it turns out that

(n − 1)σ̂²_n / σ² ∼ χ²_{n−1}   (10)

where σ̂²_n is the unbiased sample variance. Using the relation in the above equation (10), the following
confidence interval for the variance can be obtained:

I = [ (n − 1)σ̂²_n / χ²_{1−α/2,n−1} , (n − 1)σ̂²_n / χ²_{α/2,n−1} ]   (11)
7 You can use the Matlab function tinv() to find the value of t_{α/2,n−1}.
Figure 3: Chi-square density function with critical values. The area of each highlighted region is α/2.
where χ²_{α/2,n−1} and χ²_{1−α/2,n−1} are the critical values depicted in figure 3.
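Interval (11) can be computed as in the following sketch (Python rather than Matlab; the data, the confidence level 1 − α = 0.95, and the table-derived chi-square critical values are illustrative assumptions):

```python
import statistics

def variance_confidence_interval(xs, chi2_hi, chi2_lo):
    """Interval (11): [(n-1)*s2/chi2_hi, (n-1)*s2/chi2_lo], where
    chi2_hi = chi2inv(1 - alpha/2, n-1) and chi2_lo = chi2inv(alpha/2, n-1)."""
    n = len(xs)
    s2 = statistics.variance(xs)  # unbiased sample variance
    return (n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo

data = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1, 5.3, 4.7]  # illustrative sample
# For alpha = 0.05 and n - 1 = 9, tables (or chi2inv() in Matlab) give
# chi2_{0.975,9} ≈ 19.02 and chi2_{0.025,9} ≈ 2.70.
lo, hi = variance_confidence_interval(data, chi2_hi=19.02, chi2_lo=2.70)
print(lo, hi)
```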
Accept H0 if X ∈ R,   (12)
Reject H0 if X ∈ R^c.   (13)
Two kinds of errors can occur when executing this decision rule:

Type I error: reject H0 when H0 is true.   (14)
Type II error: accept H0 when H0 is false.   (15)

The significance level of a test, α, is defined as the probability of Type I error, i.e.

α = P[X ∈ R^c / H0].   (16)

This value represents our tolerance for Type I errors, that is, for rejecting H0 when in fact it is true. Generally,
what is wanted is to find the best test such that, for a given threshold α, the Type II error is as small as possible.
8 You can use the Matlab function chi2inv() to find the values of χ²_{α/2,n−1} and χ²_{1−α/2,n−1}.
Unfortunately, in many cases there is no information about the true distribution of the observation X, and
hence it is not possible to evaluate the probability of Type II error9.
(a) Compute the mean and variance of X̄25 under the hypothesis H0 .
(b) Compute c such that α = 0.01 = P[X̄25 > 150 + c / H0]. Then the rejection region for the sample mean,
X̄25, is R^c = (150 + c, ∞). That is, if X̄25 > 150 + c the null hypothesis is rejected and it is accepted
that the lifetime of the batteries has significantly increased.
(c) Import the file batteries.txt and compute the sample mean. Can you reject the null hypothesis H0 ?
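The decision rule in (b) and (c) can be sketched as follows (Python rather than Matlab). The original exercise statement fixes the distribution of the lifetimes under H0; here σ = 10 and the sample mean 156.2 are placeholder values, not the real parameters or the contents of batteries.txt:

```python
from statistics import NormalDist

# Hypothetical setup: under H0 the lifetimes have mean mu0 = 150 and standard
# deviation sigma = 10 (placeholder), so X̄_25 ~ N(mu0, sigma^2/25).
mu0, sigma, n, alpha = 150.0, 10.0, 25, 0.01
se = sigma / n ** 0.5  # standard error of the sample mean

# c such that P[X̄_25 > mu0 + c | H0] = alpha.
c = NormalDist(0.0, se).inv_cdf(1.0 - alpha)
print(c)

sample_mean = 156.2  # placeholder for the mean computed from batteries.txt
reject_h0 = sample_mean > mu0 + c
print(reject_h0)
```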
3 Markov Chains
This section is about a certain sort of stochastic process, called a Markov process. The characteristic of
this sort of process is that it retains no memory of where it has been in the past: only the
current state of the process can influence where it goes next. We will be concerned exclusively with the case
where the process can assume only a finite or countable set of states; such processes are called Markov chains.
What makes them important is not only that they model many phenomena of interest, but also that the lack-of-memory
property makes it possible to predict how a Markov chain may behave, and to compute probabilities
and expected values which quantify that behaviour. Thus, Markov processes are quite useful in modelling
many problems found in practice.
A discrete random process {Xn}n∈I is a Markov process if the future of the process given the present
is independent of the past, that is,

P[X_{n+1} = j / X_n = i, X_{n−1} = i_{n−1}, ..., X_1 = i_1] = P[X_{n+1} = j / X_n = i].   (17)

In the above expression, we refer to n as the “present”, to n + 1 as the “future”, and to 1, 2, ..., n − 1 as
the “past”. The value of Xn is called the state of the process at instant n. Let us assume {Xn}n∈I takes values
in a finite set of integers S, and that the one-step probabilities, pij = P[X1 = j / X0 = i], are fixed and do not
change with the steps, that is:

pij = P[X_{n+1} = j / X_n = i], for all n.   (18)

Also let us assume some distribution for the beginning of the process:

pi(0) = P[X0 = i], for i ∈ S.   (19)

It can easily be seen that the joint pmf of (X0, X1, ..., Xn) is given by

P[X0 = i0, X1 = i1, ..., Xn = in] = p_{i0}(0) · p_{i0 i1} · · · p_{i_{n−1} i_n}.   (20)
9 There are more types of hypothesis testing; here we approach just the one mentioned. If you want to know more about statistical
hypothesis testing, visit the very nice web page http://www.randomservices.org/random/hypothesis/Introduction.html of the
University of Alabama in Huntsville.
Actually, the process can be completely specified in an easy way through a stochastic matrix10

        | p00  p01  ···  p0m |
    P = | p10  p11  ···  p1m |      m = |S|     (21)
        |  ⋮    ⋮    ⋱    ⋮  |
        | pm0  pm1  ···  pmm |
Finally, consider the n-step transition probability matrix, named P(n), whose ij element is

p_ij(n) = P[X_n = j / X_0 = i].   (22)

Actually,

P[X_{n+k} = j / X_k = i] = P[X_n = j / X_0 = i],   (23)

since we supposed that the transition probabilities do not change with the steps. It can be shown
(Chapman-Kolmogorov equations) that this matrix is just

P(n) = P^n.   (24)

As an example, imagine we are in state i and want to know the probability of being in state j after n steps.
Then we just have to look at the element ij of the matrix P^n.
Markov chains are often best described by diagrams. For example, the diagram in figure 4 represents a
Markov chain with probability matrix:

    P = | 1−α    α  |
        |  β   1−β  | .
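Equation (24) can be checked numerically, as the following sketch does for the two-state chain above (Python rather than Matlab; α = 0.3 and β = 0.1 are arbitrary choices): repeated matrix multiplication gives P(n) = Pⁿ, whose rows converge to the steady-state distribution.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, n):
    """n-step transition matrix P(n) = P^n (Chapman-Kolmogorov)."""
    result = [[float(i == j) for j in range(len(p))] for i in range(len(p))]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

alpha, beta = 0.3, 0.1  # arbitrary transition probabilities
P = [[1 - alpha, alpha], [beta, 1 - beta]]
P50 = mat_pow(P, 50)
print(P50)  # both rows approach (beta, alpha) / (alpha + beta)
```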
(d) Let π = (π1, π2, ..., π5) be the proportions of instants spent in each state in the long term, called the
steady state.
• Compute these probabilities empirically. For that, simulate the process several times and compute
the relative frequency of being in each state.
• Compute P, P^5, P^10, P^20, and P^50 using a computer12. Does the matrix converge to a constant
matrix? Are the probabilities constant by row? How would you interpret this?
• There is a way to compute the steady-state distribution π in a theoretical and direct way. For that
you have to solve the following (indeterminate) linear system of equations:

π = πP

and use the normalization condition ∑_{i=1}^{n} πi = 1, since the πi are probabilities. Compute π analytically.
Are the probabilities similar to the ones you computed empirically? What is the most frequent state?
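The two approaches in (d), empirical and analytical, can be sketched as follows (Python rather than Matlab; a two-state chain with arbitrary probabilities stands in for the five-state chain of the exercise):

```python
import random

P = [[0.7, 0.3], [0.1, 0.9]]  # arbitrary 2-state stand-in for the exercise
rng = random.Random(0)

# Empirical: simulate the chain and record the relative frequency of each state.
steps, state = 100000, 0
counts = [0] * len(P)
for _ in range(steps):
    counts[state] += 1
    state = rng.choices(range(len(P)), weights=P[state])[0]
freqs = [c / steps for c in counts]
print(freqs)

# Analytical (2-state case): solving pi = pi*P with pi1 + pi2 = 1 gives
# pi = (beta, alpha) / (alpha + beta), with alpha = P[0][1], beta = P[1][0].
alpha, beta = P[0][1], P[1][0]
pi = [beta / (alpha + beta), alpha / (alpha + beta)]
print(pi)
```

For larger chains the analytical step is a linear solve of π(P − I) = 0 together with the normalization condition, which Matlab can do directly.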
(e) As seen in Table 1, the surfer usually spends 10 minutes on the first page (his favourite one), 5 minutes
on pages 2, 3, and 4, and 3 minutes on page 5. Find the percentage of the time the system spends in the
different states in the long run13. On which page does the surfer spend most of the time? Is it the same as
the most visited page?
State                      1   2   3   4   5
Expected values (minutes)  10  5   5   5   3

Table 1: Expected time spent in each state.
References
[1] J. E. Gentle, Random Number Generation and Monte Carlo Methods. Springer-Verlag, 1998.
[2] J. O. Berger, Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, 1985.
[3] T. Valdés Sánchez and L. Pardo Llorente, Decisiones Estratégicas. Ediciones Díaz de Santos, 2000.
[4] S. M. Ross, Introducción a la Estadística. Ediciones Reverté, 2007.
12 As a note, it can also be computed by hand using matrix diagonalization methods from Linear Algebra.
13 Hint: you can draw on the probabilities π to do so, or you can perform a simulation and estimate the times.