CS115 Probability 2
Review
The contents of this document are taken mainly from the following sources:
John Tsitsiklis. Massachusetts Institute of Technology. Introduction to Probability. https://ocw.mit.edu/resources/res-6-012-introduction-to-probability-spring-2018/index.htm
Marek Rutkowski. University of Sydney. Probability Review. http://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Slides_1_Probability.pdf
https://www.probabilitycourse.com/
Sample Space
The sample space Ω is the list (set) of all possible outcomes (states of the world). The outcomes are also called samples or elementary events.
The list must be:
Mutually exclusive
Collectively exhaustive
At the “right” granularity
The sample space Ω is either countable or uncountable.
Definition (Probability)
A map P : Ω → [0, 1] is called a probability on a discrete sample space Ω = {ω_k, k ∈ I}
if the following conditions are satisfied:
P(ω_k) ≥ 0 for all k ∈ I
∑_{k∈I} P(ω_k) = 1, i.e., P(Ω) = 1
Probability Measure
Theorem
Let P : Ω → [0, 1] be a probability on a discrete sample space Ω. Then the
unique probability measure on (Ω, F) generated by P satisfies, for all A ∈ F,
P(A) = ∑_{ω_k ∈ A} P(ω_k)
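To make the theorem concrete, here is a minimal Python sketch; the fair-die sample space and the event are illustrative choices, not from the slides:

```python
# A minimal sketch: P(A) on a discrete sample space as a sum of
# elementary probabilities. The die example is illustrative.
omega = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}  # P(omega_k), fair die

A = {2, 4, 6}                          # event "the roll is even"
P_A = sum(omega[w] for w in A)         # P(A) = sum over omega_k in A of P(omega_k)
print(P_A)                             # ≈ 0.5
```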
Notation
Random variable: X. Numerical value: x.
Definition
p_X(x) = P(X = x) = P({ω ∈ Ω s.t. X(ω) = x})
Properties
p_X(x) ≥ 0
∑_x p_X(x) = 1
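As an illustration of these properties, the following sketch builds the PMF of the sum of two fair dice directly from the sample space; the two-dice setup is an assumption chosen for illustration:

```python
from collections import Counter
from itertools import product

# Sketch: PMF of X = sum of two fair dice, built from the sample space.
outcomes = list(product(range(1, 7), repeat=2))   # Omega: 36 equally likely samples
counts = Counter(d1 + d2 for d1, d2 in outcomes)  # how many omega map to each x
pmf = {x: c / len(outcomes) for x, c in counts.items()}  # p_X(x) = P(X = x)

assert all(p >= 0 for p in pmf.values())          # p_X(x) >= 0
assert abs(sum(pmf.values()) - 1) < 1e-12         # sum over x of p_X(x) = 1
print(pmf[7])                                     # 6/36 ≈ 0.1667, the most likely sum
```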
[Figure: an example PMF p_X(x).]
"Average" gain:
E[X] = ∑_x x p_X(x)
E[·] is called the expectation operator.
E[X] is the average of X over a large number of independent experiments.
The expectation of a r.v. can be seen as a weighted average.
Since we cannot know in advance which event will occur, the expectation is
useful for making decisions when the probabilities of future outcomes are known.
Any random variable defined on a finite set Ω admits an expectation.
When the set Ω is countable but infinite, we need ∑_x |x| p_X(x) < ∞
so that E[X] is well-defined.
Definition
The expectation (expected value or mean value) of a random variable
X on a discrete sample space Ω is given by
E_P(X) = µ := ∑_{k∈I} X(ω_k) P(ω_k) = ∑_{k∈I} x_k p_k
Definition
The expectation (expected value or mean value) of a discrete random
variable X with range R_X = {x_1, x_2, x_3, . . .} (finite or countably infinite)
is defined as
E(X) = µ := ∑_{x_k ∈ R_X} x_k P(X = x_k) = ∑_{x_k ∈ R_X} x_k P_X(x_k)
If X ≥ 0 then E[X] ≥ 0.
If a ≤ X ≤ b then a ≤ E[X] ≤ b.
If c is a constant, E[c] = c.
Average of X² over x:
E[X²] = ∑_x x² p_X(x)
Caution: In general, E[g(X)] ≠ g(E[X]).
Theorem
E[aX + b] = aE[X] + b
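A short sketch tying these formulas together, again using a fair die as an illustrative PMF: it computes E[X] and E[X²], checks E[aX + b] = aE[X] + b, and shows the caution E[g(X)] ≠ g(E[X]):

```python
# Sketch: expectations from a PMF. The fair-die PMF is illustrative.
pmf = {x: 1/6 for x in range(1, 7)}

E_X  = sum(x * p for x, p in pmf.items())         # E[X] = sum_x x p_X(x) = 3.5
E_X2 = sum(x**2 * p for x, p in pmf.items())      # E[X^2] = sum_x x^2 p_X(x)

a, b = 2, 5
E_Y = sum((a*x + b) * p for x, p in pmf.items())  # E[aX + b] computed directly
assert abs(E_Y - (a*E_X + b)) < 1e-12             # matches a E[X] + b

print(E_X, E_X2, E_Y)  # 3.5, ≈15.17, 12.0
# Note E[X^2] ≈ 15.17 while (E[X])^2 = 12.25: in general E[g(X)] ≠ g(E[X]).
```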
Consider a r.v. X with µ = E[X]. What is the average distance from the mean?
E[X − µ] = E[X] − µ = µ − µ = 0
Definition (Variance)
The variance of a random variable X on a discrete sample space Ω is
defined as
Var(X) = σ² = E_P[(X − µ)²],
where P is a probability measure on Ω.
Theorem
For a random variable X and real numbers a and b,
Var(aX + b) = a² Var(X)
Notation: µ = E[X].
Proof sketch: let Y = X + b; then γ := E[Y] = µ + b and
Var(Y) = E[(Y − γ)²] = E[(X − µ)²] = Var(X), so the shift b drops out.
Scaling X by a multiplies every deviation from the mean by a, hence the variance by a².
Theorem
If X, Y are independent: E[XY] = E[X] E[Y].
Moreover, g(X) and h(Y) are also independent, so E[g(X) h(Y)] = E[g(X)] E[h(Y)].
Theorem
If X, Y are independent: Var (X + Y ) = Var (X) + Var (Y )
Proof.
Assume E[X] = E[Y] = 0; then E[XY] = E[X] E[Y] = 0, and
Var(X + Y) = E[(X + Y)²] = E[X²] + 2 E[XY] + E[Y²] = Var(X) + Var(Y).
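A Monte Carlo sketch of this theorem; the particular distributions chosen for X and Y below are illustrative assumptions:

```python
import random

# Sketch: Var(X + Y) = Var(X) + Var(Y) for independent X and Y.
random.seed(0)
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]    # X: fair die, Var = 35/12
ys = [random.gauss(0.0, 2.0) for _ in range(n)]  # Y: N(0, 4), independent of X

def var(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

print(var(xs) + var(ys))                     # ≈ 35/12 + 4 ≈ 6.92
print(var([x + y for x, y in zip(xs, ys)]))  # ≈ the same, up to sampling noise
```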
Definition
A random variable X is a Bernoulli random variable with parameter
p ∈ [0, 1], written as X ∼ Bernoulli(p) if its PMF is given by
P_X(x) = p,     for x = 1
P_X(x) = 1 − p, for x = 0.
[Figure: Bernoulli PMF, with mass 1 − p at x = 0 and p at x = 1.]
Bernoulli & Indicator Random Variables
A Bernoulli r.v. X with parameter p ∈ [0, 1] can also be described as
X = 1 with probability p
X = 0 with probability 1 − p
A Bernoulli r.v. is associated with a certain event A. If event A
occurs, then X = 1; otherwise, X = 0.
A Bernoulli r.v. is also called the indicator random variable of an event.
Definition
The indicator random variable of an event A is defined by
I_A = 1 if the event A occurs
I_A = 0 otherwise
The indicator r.v. for an event A has a Bernoulli distribution with parameter
p = P(I_A = 1) = P_{I_A}(1) = P(A). We can write I_A ∼ Bernoulli(P(A)).
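A small simulation sketch of the indicator view; the event "a fair die shows 5 or 6" is an illustrative assumption:

```python
import random

# Sketch: the indicator of an event A is Bernoulli(P(A)).
random.seed(0)
n = 100_000
indicators = [1 if random.randint(1, 6) >= 5 else 0 for _ in range(n)]  # I_A

p_hat = sum(indicators) / n  # empirical P(I_A = 1)
print(p_hat)                 # ≈ P(A) = 2/6 ≈ 0.333
```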
Discrete Uniform Random Variables
The PMF is constant on the range {a, a + 1, . . . , b}:
p_X(x) = 1/(b − a + 1) for x = a, a + 1, . . . , b
[Figure: flat PMF of height 1/(b − a + 1) over x = a, a + 1, . . . , b.]
Binomial Random Variables
Parameters: Probability p ∈ [0, 1], positive integer n.
Experiment: e.g., n independent tosses of a coin with P (Head) = p
Sample space: Set of sequences of H and T of length n
Random variable X : number of Heads observed.
Model of: number of successes in a given number of independent
trials.
Example (n = 3)
P_X(2) = P(X = 2)
       = P(HHT) + P(HTH) + P(THH)
       = 3 p² (1 − p)
       = C(3, 2) p² (1 − p)
Binomial Random Variables
[Figure: binomial PMFs for n = 10, p = 0.5 and for n = 100, p = 0.5.]
The PMF is
P_X(k) = C(n, k) p^k (1 − p)^{n−k}, for k = 0, 1, . . . , n,
where
C(n, k) = n! / (k! (n − k)!)
Then
E[X] = np and Var(X) = np(1 − p)
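A standard-library sketch of the binomial PMF that also checks the mean and variance formulas; the parameters n = 10, p = 0.5 are illustrative:

```python
from math import comb

# Sketch: binomial PMF, mean, and variance.
def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # illustrative parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
var  = sum((k - mean)**2 * q for k, q in enumerate(pmf))
print(mean, var)  # 5.0 and 2.5, matching np and np(1-p)
```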
Geometric Random Variables
A geometric r.v. X with parameter p ∈ (0, 1] counts the number of independent
Bernoulli(p) trials up to and including the first success, so
P_X(k) = (1 − p)^{k−1} p for k = 1, 2, . . . Then
E[X] = 1/p and Var(X) = (1 − p)/p²
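A simulation sketch of this parametrization, counting trials up to and including the first success; p = 0.25 is an illustrative choice:

```python
import random

# Monte Carlo sketch: sample mean ≈ 1/p, sample variance ≈ (1-p)/p^2.
random.seed(0)
p, n = 0.25, 100_000

def trials_until_success(p):
    k = 1
    while random.random() >= p:  # failure with probability 1 - p
        k += 1
    return k

samples = [trials_until_success(p) for _ in range(n)]
mean = sum(samples) / n
var = sum((s - mean)**2 for s in samples) / n
print(mean, var)  # ≈ 4.0 and ≈ 12.0 for p = 0.25
```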
Definition
A random variable X on the sample space Ω is said to have a continuous
distribution if there exists a real-valued function f such that
f(x) ≥ 0,
∫_{−∞}^{∞} f(x) dx = 1,
and P(a ≤ X ≤ b) = ∫_a^b f(x) dx for all a ≤ b; the function f is called the
probability density function (PDF) of X.
[Figure: a discrete PMF p_X(x) side by side with a continuous PDF f_X(x).]
Discrete (PMF):
P(a ≤ X ≤ b) = ∑_{x: a≤x≤b} p_X(x)
p_X(x) ≥ 0
∑_x p_X(x) = 1
Continuous (PDF):
P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx
f_X(x) ≥ 0
∫_{−∞}^{∞} f_X(x) dx = 1
[Figure: PDF f_X(x) with a small interval [a, a + δ] highlighted.]
P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx
For small δ > 0:
P(a ≤ X ≤ a + δ) ≈ f_X(a) · δ
P(X = a) = 0
This is just like length: a single point has zero length, but a set of many
points can have positive length.
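A numeric sketch of the approximation P(a ≤ X ≤ a + δ) ≈ f_X(a) · δ, using the standard normal density as an illustrative f_X:

```python
from math import exp, pi, sqrt

# Sketch: for small delta, the integral of f over [a, a + delta]
# is close to f(a) * delta. f is the standard normal density here.
def f(x):
    return exp(-x**2 / 2) / sqrt(2 * pi)

a, delta = 1.0, 1e-3
steps = 1000
# midpoint Riemann sum for the integral of f over [a, a + delta]
integral = sum(f(a + (i + 0.5) * delta / steps) for i in range(steps)) * delta / steps

print(integral)      # ≈ 2.4197e-4
print(f(a) * delta)  # ≈ 2.4197e-4, nearly identical
```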
Standard Normal (Gaussian) Random Variable N (0, 1)
[Figure: building the standard normal PDF in three steps: x²/2, then e^{−x²/2},
then (1/√(2π)) e^{−x²/2}.]
Since
∫_{−∞}^{∞} e^{−x²/2} dx = √(2π),
the normalized function
f_X(x) = (1/√(2π)) e^{−x²/2}
is a valid PDF.
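A quick numeric check of this normalizing constant; the truncation to [−10, 10] and the step size are illustrative choices:

```python
from math import exp, pi, sqrt

# Sketch: Riemann-sum check that the integral of exp(-x^2/2)
# over the real line equals sqrt(2*pi). Tails beyond |x| = 10
# are negligible (on the order of e^{-50}).
dx = 1e-3
total = sum(exp(-(k * dx)**2 / 2) for k in range(-10_000, 10_001)) * dx
print(total, sqrt(2 * pi))  # both ≈ 2.5066
```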
Normal (Gaussian) Random Variable N (µ, σ²)
[Figure: building the N(µ, σ²) PDF in three steps: (x − µ)²/(2σ²),
then e^{−(x−µ)²/(2σ²)}, then (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.]
f_X(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}
E[X] = µ
Var(X) = σ²
[Figure: N(µ, σ²) PDFs for σ = 0.5, 1, 2, 3.]
Smaller σ, narrower PDF.
Let Y = aX + b, where X ∼ N(µ, σ²).
Then E[Y] = aµ + b and Var(Y) = a²σ² (always true, for any X).
But also, Y ∼ N(aµ + b, a²σ²): a linear function of a normal r.v. is again normal.
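A Monte Carlo sketch of this linear-transformation rule; the parameter values below are illustrative assumptions:

```python
import random

# Sketch: if X ~ N(mu, sigma^2) and Y = aX + b, then the sample mean and
# variance of Y approach a*mu + b and a^2 * sigma^2.
random.seed(0)
mu, sigma, a, b, n = 2.0, 3.0, 0.5, 1.0, 200_000

ys = [a * random.gauss(mu, sigma) + b for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean)**2 for y in ys) / n
print(mean, var)  # ≈ a*mu + b = 2.0 and a^2 * sigma^2 = 2.25
```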
Example: likelihood of the observed sample (1, 0, 1, 1) for each candidate value of θ:
θ    P_{X1 X2 X3 X4}(1, 0, 1, 1; θ)
0    0
1    0.0247
2    0.0988
3    0
The observed data is most likely to occur for θ = 2.
We may choose θ̂ = 2 as our estimate of θ.
Maximum Likelihood Estimation (MLE)
Definition
Let X1 , X2 , . . . , Xn be a random sample from a distribution with a
parameter θ.
Given that we have observed X1 = x1 , X2 = x2 , . . . , Xn = xn , a
maximum likelihood estimate of θ, denoted θ̂_ML, is a value of θ that
maximizes the likelihood function
L(x1 , x2 , . . . , xn ; θ)
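A minimal sketch of this definition for i.i.d. Bernoulli(θ) data, maximizing the likelihood over a grid of candidate θ values; the data vector and the grid are illustrative assumptions, and in practice one usually maximizes the log-likelihood or uses the closed form:

```python
from math import prod

# Sketch: brute-force MLE for a Bernoulli(theta) parameter.
data = [1, 0, 1, 1]  # observed x_1, ..., x_n (illustrative)

def likelihood(theta, xs):
    """L(x_1, ..., x_n; theta) = product over i of theta^x_i (1-theta)^(1-x_i)."""
    return prod(theta if x == 1 else 1 - theta for x in xs)

grid = [i / 100 for i in range(101)]                    # candidate theta values
theta_ml = max(grid, key=lambda t: likelihood(t, data)) # argmax of L over the grid
print(theta_ml)  # 0.75 = sample mean, the closed-form Bernoulli MLE
```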