Introduction to Probability and Estimation Theory
Frédéric Lehmann
September, 2024
4 Limit theorems
Chebyshev inequality
Law of large numbers
Central limit theorem
5 Poisson process
7 Classical Estimation
8 Bayesian Estimation
Part I
Introduction to Probability
1 Basic concepts
Sets
Probabilistic models
Conditional probability
Independence
Sets
Definition
A set S is a collection of objects, called the elements of S.
Specification
A set with a finite number of elements can be written:
S = {x1 , . . . , xn }.
Example: the set of possible outcomes of a die roll is S = {1, 2, 3, 4, 5, 6}
A set S is countably infinite if its infinitely many elements can be
enumerated in a list, S = {x1 , x2 , . . . }
Example: the set of non-negative integers, N
Otherwise the set is uncountable
Example: the set of real numbers, R or the interval [0, 1]
Set operations
Definitions
Universal set: Ω, the set containing all possible objects in the
context of interest
Empty set: ∅, the set containing no object
Complement of the set S: S^c = {x ∈ Ω | x ∉ S}
Union of two sets S and T : S ∪ T = {x|x ∈ S or x ∈ T }
Intersection of two sets S and T :
S ∩ T = {x|x ∈ S and x ∈ T }
A collection of disjoint sets: their intersection is ∅
A collection of sets is a partition of S: the sets in the
collection are disjoint and their union is S
Properties
Commutative property: S ∪ T = T ∪ S and S ∩ T = T ∩ S,
Associative property: S ∪ (T ∪ U ) = (S ∪ T ) ∪ U and
S ∩ (T ∩ U ) = (S ∩ T ) ∩ U ,
Distributive law: S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U ) and
S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U )
S ∪ ∅ = S and S ∩ ∅ = ∅
S ∪ Ω = Ω and S ∩ Ω = S
De Morgan's law: (∪n Sn)^c = ∩n Sn^c and (∩n Sn)^c = ∪n Sn^c
Criticism (of the intuitive definitions of probability)
Finite number of equally likely outcomes: not always true
Uses a law of large numbers argument: not yet demonstrated
Probability law
[Figure: an experiment produces events A and B; the probability law assigns them the numbers P(A) and P(B).]
Important properties
Property 1: P (Ac ) = 1 − P (A)
Property 2: If A ⊂ B, then P (A) ≤ P (B)
Property 3: P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Property 4: P (A ∪ B) ≤ P (A) + P (B)
Property 5: Let A = A1 ∪ · · · ∪ An be a union of mutually disjoint
events, P (A) = P (A1 ) + P (A2 ) + · · · + P (An )
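These properties can be checked directly on a finite sample space such as the die roll introduced earlier. A minimal sketch in Python, where the events A and B are illustrative choices (not from the slides):

```python
from fractions import Fraction

# Uniform probability law on a fair die, used to check the listed properties.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of omega) under the uniform law."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}          # "even outcome"
B = {4, 5, 6}          # "outcome at least 4"

# Property 1: P(A^c) = 1 - P(A)
assert P(omega - A) == 1 - P(A)
# Property 3 (inclusion-exclusion): P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)
# Property 4 (union bound): P(A ∪ B) <= P(A) + P(B)
assert P(A | B) <= P(A) + P(B)
```

Using exact `Fraction` arithmetic keeps the identities exact rather than approximate.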
1 Basic concepts
Sets
Probabilistic models
Conditional probability
Independence
Conditional probability
Definition
Consider two events A and B such that: P (B) > 0
Conditional probability of A given B: P(A|B) = P(A ∩ B) / P(B)
Solution
Intuitive approach: |A ∩ B| = 2, |B| = 3, thus
P (A|B) = |A ∩ B|/|B| = 2/3
Rigorous approach: P (B) = |B|/|Ω| = 3/6 = 1/2,
P (A ∩ B) = |A ∩ B|/|Ω| = 2/6 = 1/3, thus
P(A|B) = P(A ∩ B) / P(B) = (1/3)/(1/2) = 2/3
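Both approaches can be replayed in code. The events below are one concrete choice consistent with the counts above (a fair die, with A = "even" and B = "at least 4"); they are an assumed reconstruction, not necessarily the slides' original events:

```python
from fractions import Fraction

# Fair-die sample space; |A ∩ B| = 2 and |B| = 3 as in the example.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even outcome
B = {4, 5, 6}   # outcome >= 4

def P(event):
    return Fraction(len(event), len(omega))

# Rigorous approach: P(A|B) = P(A ∩ B) / P(B)
cond = P(A & B) / P(B)
assert cond == Fraction(2, 3)

# Intuitive approach: count within the reduced sample space B
assert Fraction(len(A & B), len(B)) == cond
```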
Multiplicative rule
Assumption: All conditioning events have positive probability
Probability of an intersection of events:
P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) P(A2|A1) · · · P(An|A1 ∩ · · · ∩ An−1)
Definition
Two events A and B are independent if P (A ∩ B) = P (A)P (B)
If in addition P (B) > 0, this is equivalent to P (A|B) = P (A)
Interpretation
In a probabilistic sense: the occurrence of B does not alter P (A)
Pay attention to the intuitive sense: a common misconception is that A
and B, with P(A) > 0 and P(B) > 0, are independent if they are
disjoint.
In fact the opposite is true: disjoint events with positive probability
are never independent, since P(A ∩ B) = 0 ≠ P(A)P(B)
Definition
A1, A2, . . . , An are independent if
P(∩i∈S Ai) = ∏i∈S P(Ai), for any subset S of {1, 2, . . . , n}
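Both the definition and the disjointness pitfall can be checked on two fair coin tosses. The events here are illustrative examples chosen for the sketch:

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses.
omega = set(product("HT", repeat=2))   # {('H','H'), ('H','T'), ...}

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads

# A and B are independent: P(A ∩ B) = P(A) P(B)
assert P(A & B) == P(A) * P(B)

# Disjoint events with positive probability are never independent:
C = omega - A                            # first toss is tails (disjoint from A)
assert P(A & C) == 0 and P(A) * P(C) > 0
```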
Random variable
Visualization
[Figure: a random variable X maps each element of the sample space to a number on the real line.]
Tabular representation
Sample HH HT TH TT
X 2 1 1 0
Definition
Let X be a discrete random variable
Let x1 ≤ x2 ≤ . . . be the possible outcomes of X in
ascending order
Let P ({X = xk }) = pX (xk ), ∀k
pX (x) is the probability mass function (p.m.f.) of X
[Figure: p.m.f. of X with pX(0) = 1/4, pX(1) = 1/2, pX(2) = 1/4.]
Frédéric Lehmann Introduction to Probability and Estimation Theory 37 / 161
Basic concepts Probability mass function (p.m.f.)
Discrete random variables Expectation and variance
Continuous random variables Joint probability mass function of multiple random variables
Limit theorems Results for some useful distributions
Definition
The c.d.f. is defined as FX(x) = P(X ≤ x) = Σu≤x pX(u)

FX(x) =
  0,                              −∞ < x < x1
  pX(x1),                         x1 ≤ x < x2
  pX(x1) + pX(x2),                x2 ≤ x < x3
  ⋮
  pX(x1) + · · · + pX(xn) = 1,    xn ≤ x < ∞
Properties
FX (x) is a piecewise constant function of x
FX is monotonically nondecreasing
limx→−∞ FX (x) = 0
limx→+∞ FX (x) = 1
P (a < X ≤ b) = FX (b) − FX (a)
[Figure: staircase c.d.f. FX(x) with jumps pX(0) = 1/4 at x = 0, pX(1) = 1/2 at x = 1 and pX(2) = 1/4 at x = 2, reaching 1.]
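The p.m.f. and staircase c.d.f. of the two-coin-toss example can be built by enumeration. A minimal sketch (the helper names are ours):

```python
from fractions import Fraction
from itertools import product

# X = number of heads in two fair coin tosses (the HH/HT/TH/TT example).
omega = list(product("HT", repeat=2))
pmf = {}
for w in omega:
    x = w.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(omega))

assert pmf == {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F_X(x) = sum of p_X(u) over u <= x: a piecewise-constant staircase."""
    return sum(p for u, p in pmf.items() if u <= x)

assert cdf(0.5) == Fraction(1, 4)       # constant between the jumps at 0 and 1
assert cdf(2) == 1                      # total mass
assert cdf(1) - cdf(0) == pmf[1]        # P(0 < X <= 1) = F(1) - F(0)
```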
Change of variable
If Y = g(X), then pY(y) = Σ{x | g(x)=y} pX(x)
Expectation
Definition
The expectation of a r.v. X is defined as E[X] = Σx x pX(x).
Interpretation
E(X), as the center of gravity of the p.m.f., represents the mean
value of X in a probabilistic sense.
Expectation
Properties
E[g(X)] = Σx g(x) pX(x)
Given two scalars a and b, E[aX + b] = aE[X] + b
Variance
Definition
The variance of a r.v. X is defined as
V[X] = E[(X − E[X])²] = Σx (x − E[X])² pX(x).
The square root of the variance, σX, is called the standard
deviation.
Interpretation
The standard deviation, σX , provides a measure of the dispersion
of X around its mean.
Variance
Properties
V [X] = E[X 2 ] − (E[X])2
Given two scalars a and b, V [aX + b] = a2 V [X]
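These identities can be verified exactly on the two-coin-toss p.m.f.; the scalar values a = 3, b = 5 below are illustrative:

```python
from fractions import Fraction

# p.m.f. of the two-coin-toss example: an illustrative discrete r.v.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def E(g=lambda x: x):
    """Expected value E[g(X)] = sum_x g(x) p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = E()
var = E(lambda x: (x - mean) ** 2)

# V[X] = E[X^2] - (E[X])^2
assert var == E(lambda x: x**2) - mean**2

# E[aX + b] = a E[X] + b  and  V[aX + b] = a^2 V[X]
a, b = 3, 5
assert E(lambda x: a * x + b) == a * mean + b
m2 = E(lambda x: a * x + b)
assert E(lambda x: (a * x + b - m2) ** 2) == a**2 * var
```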
Definition
Let X be a random variable with expectation E[X] and standard
deviation σX > 0, the standardized random variable is defined as
X* = (X − E[X]) / σX
Properties
E[X*] = (E[X] − E[X]) / σX = 0
V[X*] = V[X − E[X]] / σX² = V[X] / V[X] = 1
Definition
Joint p.m.f. of two discrete r.v.s X and Y: pX,Y(x, y) = P(X = x, Y = y)
Marginal p.m.f.
pX(x) = P(X = x) = Σy pX,Y(x, y)
pY(y) = P(Y = y) = Σx pX,Y(x, y)
Definition
The joint c.d.f of two random variables X and Y is defined as
FX,Y(x, y) = P(X ≤ x, Y ≤ y) = Σu≤x Σv≤y pX,Y(u, v)
Change of variable
Given the scalars a, b and c, if g is linear, Z = g(X, Y) = aX + bY + c, then
E[Z] = aE[X] + bE[Y] + c
Definition
Two discrete r.v.s are independent if pX,Y(x, y) = pX(x) pY(y), ∀(x, y)
Definitions
Let X and Y be two discrete r.v.s, their covariance is defined as
cov(X, Y) = E[(X − E[X])(Y − E[Y])]
and their correlation coefficient as
ρ(X, Y) = cov(X, Y) / (σX σY)
Properties
X and Y are independent implies cov(X, Y ) = 0
The converse is false
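That the converse is false can be shown with the classic counterexample X uniform on {−1, 0, 1} and Y = X² (an illustrative choice, not from the slides):

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2: cov(X, Y) = 0, yet X and Y are dependent.
support = [-1, 0, 1]
p = Fraction(1, 3)

EX = sum(x * p for x in support)            # 0
EY = sum(x**2 * p for x in support)         # 2/3
EXY = sum(x * x**2 * p for x in support)    # E[X Y] = E[X^3] = 0
cov = EXY - EX * EY
assert cov == 0

# Dependence: P(X = 1, Y = 1) != P(X = 1) P(Y = 1)
pXY = p                                      # only x = 1 gives the pair (1, 1)
pX1, pY1 = p, 2 * p                          # Y = 1 whenever x = ±1
assert pXY != pX1 * pY1
```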
Definition
X ∼ B(1, p) describes the success or failure in a single trial
pX(k) =
  p, if k = 1
  q, if k = 0
where q = 1 − p
Properties
E[X] = p
V [X] = pq
Definition
X ∼ B(n, p) describes the number of successes in n independent
Bernoulli trials
pX(k) = (n choose k) p^k q^(n−k), for k = 0, 1, . . . , n
where q = 1 − p
Properties
E[X] = np
V [X] = npq
Definition
X ∼ P(λ) approximates the binomial p.m.f. when n is large, p is
small and λ = np
pX(k) = e^(−λ) λ^k / k!, for k = 0, 1, . . .
Properties
E[X] = λ
V [X] = λ
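The quality of the Poisson approximation to the binomial can be checked numerically; the values n = 1000 and p = 0.002 below are illustrative:

```python
from math import comb, exp, factorial

# Poisson approximation to the binomial: n large, p small, lambda = n p.
n, p = 1000, 0.002
lam = n * p   # 2.0

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)

# The two p.m.f.s agree closely for small k.
for k in range(10):
    assert abs(binom_pmf(k) - poisson_pmf(k)) < 1e-3
```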
Note that P(X = x) = limδ→0 P(X ∈ [x, x + δ]) = 0
Definition
The c.d.f. is defined as FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt.
By differentiation, fX(x) = dFX/dx (x), ∀x where fX is continuous
Graphical representation
[Figure: a p.d.f. fX(x) and the corresponding c.d.f. FX(x).]
Properties
FX (x) is a continuous function of x
FX is monotonically nondecreasing
limx→−∞ FX (x) = 0
limx→+∞ FX (x) = 1
Since including or excluding an endpoint of an interval has no
effect on an integral,
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
= P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b)
= FX(b) − FX(a)
Change of variable
If Y = g(X) with g strictly monotonic and inverse h = g⁻¹, then
fY(y) = fX(h(y)) |dh/dy (y)|
Expectation
Definition
The expectation of a r.v. X is defined as E[X] = ∫_{−∞}^{+∞} x fX(x) dx.
Interpretation
E(X), as the center of gravity of the p.d.f., represents the mean
value of X in a probabilistic sense.
It can also be seen as the anticipated average value of X in a large
number of repeated independent experiments
Expectation
Properties
E[g(X)] = ∫_{−∞}^{+∞} g(x) fX(x) dx
Given two scalars a and b, E[aX + b] = aE[X] + b
Variance
Definition
The variance of a r.v. X is defined as
V[X] = E[(X − E[X])²] = ∫_{−∞}^{+∞} (x − E[X])² fX(x) dx.
The square root of the variance, σX, is called the standard
deviation.
Interpretation
The standard deviation, σX , provides a measure of the dispersion
of X around its mean.
Variance
Properties
V [X] = E[X 2 ] − (E[X])2
Given two scalars a and b, V [aX + b] = a2 V [X]
Definition
Let X be a random variable with expectation E[X] and standard
deviation σX > 0, the standardized random variable is defined as
X* = (X − E[X]) / σX
Properties
E[X*] = (E[X] − E[X]) / σX = 0
V[X*] = V[X − E[X]] / σX² = V[X] / V[X] = 1
Definition
Two random variables X and Y are jointly continuous if there is a
function fX,Y, called the joint probability density function (joint
p.d.f.) of X and Y, verifying
(Nonnegativity) fX,Y(x, y) ≥ 0, ∀(x, y)
(Normalization) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fX,Y(x, y) dx dy = 1
such that ∀B ⊂ R², P((X, Y) ∈ B) = ∫∫_{(x,y)∈B} fX,Y(x, y) dx dy,
where all integrals are assumed well-defined in the Riemann sense.
Marginal p.d.f.
fX(x) = ∫_{−∞}^{+∞} fX,Y(x, y) dy and fY(y) = ∫_{−∞}^{+∞} fX,Y(x, y) dx
Definition
The joint c.d.f of two random variables X and Y is defined as
FX,Y(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(u, v) du dv
fX,Y(x, y) = ∂²FX,Y/∂x∂y (x, y)
Change of variable
Given the scalars a, b and c, if g is linear, Z = g(X, Y) = aX + bY + c, then
E[Z] = aE[X] + bE[Y] + c
Definition
Two continuous r.v.s are independent if fX,Y(x, y) = fX(x) fY(y), ∀(x, y)
Definitions
Let X and Y be two continuous r.v.s, their covariance is defined as
cov(X, Y) = E[(X − E[X])(Y − E[Y])]
and their correlation coefficient as
ρ(X, Y) = cov(X, Y) / (σX σY)
Properties
X and Y are independent implies cov(X, Y ) = 0
The converse is false
Definition
If X ∼ U([a, b]), then
fX(x) =
  1/(b − a), if a ≤ x ≤ b
  0, otherwise
FX(x) =
  0, if x < a
  (x − a)/(b − a), if a ≤ x ≤ b
  1, if x > b
Properties
E[X] = (a + b)/2
V[X] = (b − a)²/12
Definition
If X ∼ E(λ), then
fX(x) =
  λe^(−λx), if x ≥ 0
  0, otherwise
FX(x) =
  1 − e^(−λx), if x ≥ 0
  0, otherwise
Properties
E[X] = 1/λ
V[X] = 1/λ²
Definition
If X ∼ N (0, 1) (standard normal r.v.), then
fX(x) = (1/√(2π)) e^(−x²/2)
FX(x) = Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^(−t²/2) dt
Properties
E[X] = 0
V [X] = 1
Definition
If X ∼ N (m, σ 2 ), then
fX(x) = (1/(σ√(2π))) e^(−(x−m)²/(2σ²))
FX(x) = Φ((x − m)/σ)
Properties
E[X] = m
V [X] = σ 2
4 Limit theorems
Chebyshev inequality
Law of large numbers
Central limit theorem
Chebyshev inequality
Theorem
Let X be a r.v. with finite mean µ and variance σ 2 , then
P(|X − µ| ≥ ε) ≤ σ²/ε², ∀ε > 0
Proof
σ² = ∫_{−∞}^{+∞} (x − µ)² fX(x) dx ≥ ∫_{|x−µ|≥ε} (x − µ)² fX(x) dx
≥ ε² ∫_{|x−µ|≥ε} fX(x) dx = ε² P(|X − µ| ≥ ε)
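The bound can be checked empirically. The sketch below uses a uniform r.v. on [0, 1] (µ = 1/2, σ² = 1/12); the distribution and ε values are illustrative:

```python
import random
from statistics import mean

# Empirical check of the Chebyshev bound for X ~ U([0, 1]):
# mu = 1/2, sigma^2 = 1/12.
random.seed(0)
mu, var = 0.5, 1 / 12
samples = [random.random() for _ in range(100_000)]

for eps in (0.3, 0.4, 0.45):
    # Empirical P(|X - mu| >= eps)
    freq = mean(abs(x - mu) >= eps for x in samples)
    assert freq <= var / eps**2   # Chebyshev bound holds
```

The bound is loose for the uniform distribution (e.g. for ε = 0.3 the true probability is 0.4 while the bound is about 0.93), which is typical of Chebyshev's inequality.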
Theorem
Let X1, X2, . . . , Xn be independent and identically distributed
(i.i.d.) r.v.s with mean µ and variance σ², then ∀ε > 0
lim_{n→+∞} P(|(X1 + X2 + · · · + Xn)/n − µ| ≥ ε) = 0
[Figure: p.d.f. fX(x) plotted over −5 ≤ x ≤ 5.]
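A short simulation illustrates the weak law of large numbers; the uniform distribution, sample sizes and ε below are illustrative choices:

```python
import random

# Weak law of large numbers: the sample mean of i.i.d. U([0, 1]) r.v.s
# (mu = 1/2) concentrates around mu as n grows.
random.seed(0)
eps = 0.01

def freq_far_from_mu(n, trials=200):
    """Empirical P(|sample mean - mu| >= eps) for sample size n."""
    count = 0
    for _ in range(trials):
        m = sum(random.random() for _ in range(n)) / n
        count += abs(m - 0.5) >= eps
    return count / trials

# The probability of a deviation of at least eps shrinks as n grows.
assert freq_far_from_mu(10_000) < freq_far_from_mu(100)
```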
Theorem
Let X1, X2, . . . , Xn be i.i.d. r.v.s with mean µ and variance σ²,
and define the standardized sum
Zn = (X1 + X2 + · · · + Xn − nµ) / (σ√n), with E[Zn] = 0 and V[Zn] = 1,
then lim_{n→+∞} P(Zn ≤ x) = Φ(x)
Application
[Figure: histogram over [0, 25] illustrating an application of the central limit theorem.]
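The convergence of the standardized sum to Φ can be observed by simulation; the uniform summands and the values of n and the number of trials are illustrative:

```python
import random
from math import erf, sqrt

# Central limit theorem: the standardized sum of i.i.d. U([0, 1]) r.v.s
# (mu = 1/2, sigma^2 = 1/12) is approximately standard normal.
random.seed(0)
n, trials = 50, 20_000
mu, sigma = 0.5, sqrt(1 / 12)

def z_n():
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * sqrt(n))

def phi(x):
    """Standard normal c.d.f. Phi(x)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

zs = [z_n() for _ in range(trials)]
for x in (-1.0, 0.0, 1.0):
    empirical = sum(z <= x for z in zs) / trials
    assert abs(empirical - phi(x)) < 0.02   # P(Z_n <= x) ≈ Φ(x)
```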
Part II
Basic concepts
[Figure: sample paths x2(t), . . . , xn(t) of a random process, each observed at time tk.]
Basic concepts
Definition of a point
A point is a discrete event that occurs in continuous time (or
space).
5 Poisson process
Definition
A temporal point process {Xk , k ≥ 1} is called a Poisson process
with rate λ if it has the following properties
Time homogeneity: The p.m.f. of the number of arrivals Nτ
over any interval of length τ is pNτ(k) = e^(−λτ) (λτ)^k / k!, ∀k ≥ 0
Independence: The number of arrivals during a particular
interval is independent of the history of arrivals outside this
interval
Interpretation
Time homogeneity: Arrivals are equally likely at all times
Independence: The conditional probability of k arrivals during
[t, t′], given the occurrence of n arrivals at times outside [t, t′],
is equal to the unconditional probability
Rate λ: Average number of arrivals per unit time
Interarrival times
Definition
The k-th interarrival time is the r.v. defined by T1 = X1 ,
Tk = Xk − Xk−1 , k = 2, 3, . . .
Note that Xk = T1 + T2 + · · · + Tk
Summary
Interarrival times T1 , T2 . . . , are independent r.v.s
The p.d.f. of each interarrival time is E(λ)
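This characterization suggests a direct simulation: draw i.i.d. E(λ) interarrival times and count the arrivals falling in an interval of length τ. The rate and horizon below are illustrative:

```python
import random
from statistics import mean

# A Poisson process of rate lam simulated through its i.i.d. E(lam)
# interarrival times: T_k ~ E(lam), X_k = T_1 + ... + T_k.
random.seed(0)
lam, tau = 2.0, 5.0

def arrivals_in(tau):
    """Number of arrivals N_tau in an interval of length tau."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)   # next interarrival time ~ E(lam)
        if t > tau:
            return n
        n += 1

counts = [arrivals_in(tau) for _ in range(20_000)]
# N_tau ~ Poisson(lam * tau), so E[N_tau] = lam * tau = 10.
assert abs(mean(counts) - lam * tau) < 0.2
```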
Definition
A random process {Xn, n ∈ N}, taking values in the set of states
S = {s1, s2, . . . , sm}, is a time-homogeneous Markov chain if
(Markov property):
P(Xn+1 = sj | Xn = si, Xn−1 = si_{n−1}, . . . , X0 = si_0) =
P(Xn+1 = sj | Xn = si), ∀si_{n−1}, . . . , si_0 ∈ S
(Time homogeneity): The transition probabilities defined by
pij = P(Xn+1 = sj | Xn = si) ≥ 0 are independent of n
and satisfy Σ_{j=1}^m pij = 1
The transition matrix is P = [pij], 1 ≤ i ≤ m, 1 ≤ j ≤ m
Transition diagram
A time homogeneous Markov chain can be represented by a graph,
whose nodes are the states (i.e. the elements of S)
whose arcs are the allowed state transitions, labeled with the
corresponding transition probabilities
Probability of a path
Definition
Given that we are in state si , the probability that we are in state sj
in two steps from now is
pij^(2) = P(X2 = sj | X0 = si) = Σ_{k=1}^m pik pkj
Derivation
From the total probability theorem
pij^(2) = P(X2 = sj | X0 = si) = Σ_{k=1}^m P(X2 = sj, X1 = sk | X0 = si)
= Σ_{k=1}^m P(X2 = sj | X1 = sk, X0 = si) P(X1 = sk | X0 = si)
= Σ_{k=1}^m pik pkj
Derivation
[Figure: paths from state si at time 0, through each intermediate state sk at time 1 (with probability pik), to state sj at time 2 (with probability pkj).]
Chapman-Kolmogorov equation
Given that we are in state si, the probability that we are in state sj
in n steps from now is by definition
pij^(n) = P(Xn = sj | X0 = si),
and satisfies pij^(n) = Σ_{k=1}^m pik^(n−1) pkj, i.e. pij^(n) = [P^n]ij
Proof
By induction
Result
Let u(n) = [P (Xn = s1 ), . . . , P (Xn = sm )] be the probability
vector at instant n, then
u(n) = u(0) Pn
Proof
Using the total probability theorem
P(Xn = sj) = Σ_{i=1}^m P(Xn = sj, X0 = si)
= Σ_{i=1}^m P(X0 = si) P(Xn = sj | X0 = si)
= Σ_{i=1}^m P(X0 = si) pij^(n)
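The recursion u(n) = u(n−1) P can be iterated directly. The 2-state chain below is an illustrative example, not from the slides:

```python
# Evolution of the state-probability vector of a time-homogeneous Markov
# chain: u(n) = u(0) P^n, computed by repeated vector-matrix products.

def vec_mat(u, P):
    """Row vector times matrix."""
    m = len(P)
    return [sum(u[i] * P[i][j] for i in range(m)) for j in range(m)]

P = [[0.9, 0.1],
     [0.5, 0.5]]           # each row sums to 1
u = [1.0, 0.0]             # u(0): start in state s1

for _ in range(100):       # u(n) = u(n-1) P
    u = vec_mat(u, P)

# Rows of P sum to one, so u(n) remains a probability vector.
assert abs(sum(u) - 1.0) < 1e-9
# This chain is regular, so u(n) converges to the stationary vector pi = pi P.
pi = vec_mat(u, P)
assert all(abs(a - b) < 1e-9 for a, b in zip(u, pi))
```

For this chain the stationary vector is π = (5/6, 1/6), obtained by solving π = πP with π1 + π2 = 1.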
Definition
A Markov chain is said to be regular if some power of the transition
matrix has only (strictly) positive elements.
Interpretation
For some n, it is possible to go from any state to any state in
exactly n steps.
Birth-death processes
Definition
A birth-death process is a particular Markov process, where only
self-transitions or transitions to neighboring states are allowed.
Therefore, the transition probabilities (pij ≠ 0 iff
j ∈ {i − 1, i, i + 1}) are completely parameterized by
bi = P(Xn+1 = si+1 | Xn = si) (birth probability at state si)
di = P(Xn+1 = si−1 | Xn = si) (death probability at state si)
Applications
Modeling of queueing problems in telecommunications
Modeling of disease evolution
Birth-death processes
Transition diagram
[Figure: chain s0 ↔ s1 ↔ · · · ↔ sm with birth probabilities b0, . . . , bm−1, death probabilities d1, . . . , dm, and self-loop probabilities 1 − b0, 1 − bi − di and 1 − dm.]
Birth-death processes
P =
  [ 1−b0   b0        0         · · ·  0             0
    d1     1−b1−d1   b1        · · ·  0             0
    0      d2        1−b2−d2   · · ·  0             0
    ⋮      ⋮         ⋱          ⋱     ⋮             ⋮
    0      0         0         · · ·  1−bm−1−dm−1   bm−1
    0      0         0         · · ·  dm            1−dm ]
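This tridiagonal structure is easy to build and sanity-check in code; the probabilities below are illustrative values:

```python
# Building the tridiagonal transition matrix of a birth-death process from
# birth probabilities b_i and death probabilities d_i.
m = 4
b = [0.3, 0.3, 0.3, 0.3]        # b_0 ... b_{m-1}
d = [0.2, 0.2, 0.2, 0.2]        # d_1 ... d_m

def birth_death_matrix(m, b, d):
    P = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        bi = b[i] if i < m else 0.0           # no birth from the last state
        di = d[i - 1] if i > 0 else 0.0       # no death from the first state
        if i < m:
            P[i][i + 1] = bi                  # birth: s_i -> s_{i+1}
        if i > 0:
            P[i][i - 1] = di                  # death: s_i -> s_{i-1}
        P[i][i] = 1.0 - bi - di               # self-transition
    return P

P = birth_death_matrix(m, b, d)
# Every row of a transition matrix sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
# Only self-transitions and moves to neighboring states are allowed.
assert all(P[i][j] == 0.0 for i in range(m + 1)
           for j in range(m + 1) if abs(i - j) > 1)
```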
Part III
Estimation Theory
Motivation...
General definitions
Examples
Mean estimator: M̂ = Σ_{i=1}^N Xi / N
Variance estimator: V̂ = Σ_{i=1}^N (Xi − M̂)² / N
Estimator’s p.d.f.
Desirable properties
For E[Θ̂]: E[Θ̂] = θ (unbiasedness)
For V [Θ̂]: small (good estimation accuracy)
[Figure: p.d.f. of the estimator Θ̂, centered at E[Θ̂] with spread √V[Θ̂].]
Examples
Mean estimate: m̂ = Σ_{i=1}^N xi / N
Variance estimate: v̂ = Σ_{i=1}^N (xi − m̂)² / N
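These two estimates can be computed on simulated data; the Gaussian distribution N(3, 4) and sample size below are illustrative:

```python
import random

# Mean and variance estimates from an observed sample x_1, ..., x_N,
# computed on simulated N(3, 4) data (m = 3, sigma^2 = 4, sigma = 2).
random.seed(0)
x = [random.gauss(3.0, 2.0) for _ in range(100_000)]
N = len(x)

m_hat = sum(x) / N                              # mean estimate
v_hat = sum((xi - m_hat) ** 2 for xi in x) / N  # variance estimate (1/N form)

assert abs(m_hat - 3.0) < 0.05
assert abs(v_hat - 4.0) < 0.1
```

Note that the 1/N variance estimator above is slightly biased; this is precisely the kind of property the bias and MSE definitions of the next section quantify.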
Classical estimation
Principle: The unknown parameter θ is considered as a
deterministic constant
We know: fX (x; θ), the joint p.d.f. of X, parameterized by
the unknown constant θ
Bayesian estimation
Principle: The unknown parameter θ is modeled as a random
variable Θ
We know: the joint p.d.f. fX,Θ (x, θ) as the product of
1 A prior distribution fΘ (θ) for the unknown random variable Θ
2 A conditional distribution fX|Θ (x|θ) for the random process X
7 Classical Estimation
Performance measures
Definitions
Bias of the estimator: b(Θ̂) = E[Θ̂] − θ
Variance of the estimator: V(Θ̂) = E[(Θ̂ − E[Θ̂])²]
Mean square error (MSE): mse(Θ̂) = E[(Θ̂ − θ)²]
Property
mse(Θ̂) = E[((Θ̂ − E[Θ̂]) + (E[Θ̂] − θ))²] = V(Θ̂) + b(Θ̂)²
Performance criteria
Assume only three unbiased estimators, Θ̂1 , Θ̂2 and Θ̂3 exist
Left-hand side: Θ̂1 is the MVUE
Right-hand side: MVUE does not exist, since no unbiased
estimator has minimum variance ∀θ
[Figure: left, V[Θ̂1] ≤ V[Θ̂2], V[Θ̂3] for every θ, so Θ̂1 is the MVUE; right, no single estimator has minimum variance for every θ, so the MVUE does not exist.]
Efficient Estimator
An unbiased estimator attaining the Cramér-Rao lower bound (CRLB) ∀θ is said to be efficient
Assume only three unbiased estimators, Θ̂1 , Θ̂2 and Θ̂3 exist
Left-hand side: the MVUE Θ̂1 exists, but is not efficient
Right-hand side: the MVUE Θ̂1 exists and is efficient
[Figure: left, Θ̂1 has minimum variance among the three but V[Θ̂1] stays above the CRLB, so the MVUE Θ̂1 is not efficient; right, V[Θ̂1] = CRLB for every θ, so the MVUE Θ̂1 is efficient.]
Motivation
MVUE may not even exist
Even if MVUE exists, it may not be found using the CRLB
theorem
Suboptimal BLUE estimator: find the minimum variance
unbiased estimator within the restricted class of linear estimators
[Figure: left, the BLUE is the best estimator within the class of linear estimators, while the MVUE may lie in the class of nonlinear estimators; right, when the MVUE is itself linear, BLUE = MVUE.]
Motivation
MVUE may be difficult to find or not even exist
BLUE always exists, but sometimes a linear estimator is
totally inappropriate (ex: if we force the estimator of the
variance to be a linear function of X!)
MLE Principle: what is the value of θ under which the
observations we have actually seen are most likely?
Definitions
Likelihood function: the joint p.d.f. fX (x; θ) considered as a
numerical function of θ, by setting x as the actually observed
data
Log-likelihood function: the logarithm of the previous function
Problem
Customer arrival times Yi at a restaurant form a Poisson
process with rate θ
Interarrival times Xi = Yi − Yi−1 ∼ E(θ) (with the convention
Y0 = 0) are known to be independent r.v.s
We collect the set of observations x = (x1 , . . . , xN )T
MLE estimator of θ
Likelihood function: fX(x; θ) = ∏_{i=1}^N θ e^(−θxi)
Log-likelihood function: log fX(x; θ) = N log θ − θ Σ_{i=1}^N xi
Derivative of the log-likelihood function:
∂ log fX(x; θ)/∂θ = N/θ − Σ_{i=1}^N xi
Maximum-likelihood estimate: θ̂ = (Σ_{i=1}^N xi / N)⁻¹ = N / Σ_{i=1}^N xi
(inverse of the sample mean of the interarrival times)
Maximum-likelihood estimator: Θ̂ = (Σ_{i=1}^N Xi / N)⁻¹
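The estimator derived above can be tried on simulated interarrival times; the true rate θ = 2 and sample size are illustrative:

```python
import random

# ML estimate of the rate theta of exponential interarrival times:
# theta_hat is the inverse of the sample mean, as derived above.
random.seed(0)
theta = 2.0
x = [random.expovariate(theta) for _ in range(100_000)]

theta_hat = len(x) / sum(x)   # N / sum(x_i) = 1 / sample mean
assert abs(theta_hat - theta) < 0.05
```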
8 Bayesian Estimation
Bayesian philosophy
Principle
The unknown parameter θ is modeled as a random variable Θ
Assumption
We know the joint p.d.f. fX,Θ(x, θ) as the product of
1 A prior distribution fΘ(θ) for the unknown random variable Θ
2 A conditional distribution fX|Θ(x|θ) for the random process X
Cost functions
[Figure: typical cost functions C(θ − θ̂): quadratic, absolute-value and uniform (hit-or-miss).]
Bayes risk
Definition
R(Θ̂) = E[C(Θ − Θ̂)], which depends on the choice of the
estimator Θ̂
Expression
Since C(Θ − Θ̂) depends on both r.v.s Θ and Θ̂, its
expectation is with respect to the joint p.d.f. fX,Θ (x, θ)
We obtain R(Θ̂) = ∫∫ C(θ − θ̂) fX,Θ(x, θ) dθ dx
Estimator construction
General case
Solve ∂g(θ̂)/∂θ̂ = 0, where g(θ̂) = ∫ C(θ − θ̂) fΘ|X(θ|x) dθ
[Figure: posterior p.d.f. fΘ|X(θ|x) and the estimate minimizing each cost function: the posterior mean (quadratic cost), the median (absolute-value cost) and the mode (uniform cost).]
End