CS115 Probability 2

This document is a review of probability theory for a computer science course at the University of Information Technology, covering key concepts such as probability models, discrete and continuous random variables, and maximum likelihood estimation. It includes definitions, properties, and examples related to probability measures, random variables, expectation, variance, and independence. The content is primarily sourced from established academic references, providing a structured overview of fundamental probability concepts.

Probability Theory

Review

Faculty of Computer Science


University of Information Technology (UIT)
Vietnam National University - Ho Chi Minh City (VNU-HCM)

Maths for Computer Science, Fall 2020

(University of Information Technology (UIT)) Math for Computer Science CS115 1 / 46


References

The contents of this document are taken mainly from the following sources:

John Tsitsiklis. Massachusetts Institute of Technology. Introduction to Probability. https://ocw.mit.edu/resources/res-6-012-introduction-to-probability-spring-2018/index.htm
Marek Rutkowski. University of Sydney. Probability Review. http://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Slides_1_Probability.pdf
https://www.probabilitycourse.com/
Table of Contents

1 Probability models and axioms

2 Discrete Random Variables

3 Examples of Discrete Probability Distributions

4 Continuous Random Variables

5 Intro to Maximum Likelihood Estimation (MLE)





Sample Space

The sample space Ω is the list (set) of all possible outcomes (states of the world). The outcomes are called samples or elementary events.
The list must be:
Mutually exclusive
Collectively exhaustive
At the “right” granularity
The sample space Ω is either countable or uncountable.


Probability

Consider a discrete sample space Ω = (ωk)k∈I, where the index set I is countable.

Definition (Probability)
A map P : Ω → [0, 1] is called a probability on a discrete sample space Ω if the following conditions are satisfied:
P(ωk) ≥ 0 for all k ∈ I
∑_{k∈I} P(ωk) = 1


Probability Measure
Let F = 2^Ω be the set of all subsets of the sample space Ω.
F contains the empty set ∅ and Ω.
Any set A ∈ F is called an event (or a random event).
The set F is called the event space.
Probability is assigned to events.

Definition (Probability Measure)
A map P : F → [0, 1] is called a probability measure on (Ω, F) if
For any sequence Ai ∈ F, i = 1, 2, . . . of events such that Ai ∩ Aj = ∅ for all i ≠ j, we have
P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai)
P(Ω) = 1
Probability Measure

A probability P : Ω → [0, 1] on a discrete sample space Ω uniquely specifies the probability of all events Ak = {ωk}:
P({ωk}) = P(ωk) = pk.

Theorem
Let P : Ω → [0, 1] be a probability on a discrete sample space Ω. Then the unique probability measure on (Ω, F) generated by P satisfies, for all A ∈ F,
P(A) = ∑_{ωk∈A} P(ωk)


Some properties of probability

If A ⊂ B, then P (A) ≤ P (B).


P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
P (A ∪ B) ≤ P (A) + P (B)
P (A ∪ B ∪ C) = P (A) + P (Ac ∩ B) + P (Ac ∩ B c ∩ C)

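The properties above can be checked numerically on a small sample space. The sketch below assumes a fair six-sided die and two hypothetical events A and B chosen for illustration:

```python
from fractions import Fraction

# Toy check of the listed identities on a fair six-sided die,
# with the (hypothetical) events A = {1,2,3} and B = {3,4}.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}  # uniform probability on the die

def prob(event):
    """P(A) = sum of P(w) over the outcomes w in the event."""
    return sum(P[w] for w in event)

A, B = {1, 2, 3}, {3, 4}
# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Union bound: P(A ∪ B) <= P(A) + P(B)
assert prob(A | B) <= prob(A) + prob(B)
# Monotonicity: A ⊂ B implies P(A) <= P(B)
assert prob({1, 2}) <= prob(A)
```

Using exact fractions instead of floats keeps the equalities exact rather than approximate.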




Random Variables

A random variable associates a value (a number) to every possible outcome.
It can take discrete or continuous values.

Definition (Discrete Random Variable)
A real-valued map X : Ω → R on a discrete sample space Ω = (ωk)k∈I, where the set I is countable, is called a discrete random variable.

Notation: an uppercase X denotes the random variable; a lowercase x denotes a numerical value it takes.

Different random variables can be defined on the same sample space.
A function of one or several random variables is also a r.v.


Probability Mass Function (pmf)

The probability mass function (pmf) of a discrete random variable X is the “probability law” or “probability distribution” of X.
If we fix some x, then “X = x” is an event.

Definition
pX(x) = P(X = x) = P({ω ∈ Ω s.t. X(ω) = x})

Properties
pX(x) ≥ 0
∑_x pX(x) = 1


Expectation

Example: Play a game 1000 times. The random gain at each game is described by

X = 1, with probability 2/10 (∼ 200 games)
    2, with probability 5/10 (∼ 500 games)
    4, with probability 3/10 (∼ 300 games)

“Average” gain:
(1 · 200 + 2 · 500 + 4 · 300) / 1000 = 2.4

Definition: E[X] = ∑_x x pX(x)
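The game example can be restated directly from the definition E[X] = ∑ x pX(x); exact fractions keep the arithmetic exact:

```python
from fractions import Fraction

# The pmf of the gain X in the game example above.
pmf = {1: Fraction(2, 10), 2: Fraction(5, 10), 4: Fraction(3, 10)}

# E[X] = sum over x of x * p_X(x)
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 12/5, i.e. 2.4
```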


Expectation

E[X] = ∑_x x pX(x)
E(·) is called the expectation operator.
It is the average in a large number of independent experiments.
The expectation of a r.v. can be seen as a weighted average.
It is impossible to know the exact event that will happen in the future, and thus expectation is useful in making decisions when the probabilities of future outcomes are known.
Any random variable defined on a finite set Ω admits an expectation.
When the set Ω is countable but infinite, we need ∑_x |x| pX(x) < ∞ so that E[X] is well-defined.


Expectation

Definition
The expectation (expected value or mean value) of a random variable X on a discrete sample space Ω is given by

EP(X) = µ := ∑_{k∈I} X(ωk) P(ωk) = ∑_{k∈I} xk pk

where P is a probability measure on Ω.

Definition
The expectation (expected value or mean value) of a discrete random variable X with range RX = {x1, x2, x3, . . .} (finite or countably infinite) is defined as

E(X) = µ := ∑_{xk∈RX} xk P(X = xk) = ∑_{xk∈RX} xk PX(xk)


Elementary Properties of Expectation

If X ≥ 0 then E[X] ≥ 0.
If a ≤ X ≤ b then a ≤ E[X] ≤ b.
If c is a constant, E[c] = c.



Expected value rule, to compute E[g(X)]

If X is a r.v. and Y = g(X), then Y itself is a r.v.

Average over y:
E[Y] = ∑_y y pY(y)

Average over x:

Theorem (Law of the unconscious statistician (LOTUS))
E[Y] = E[g(X)] = ∑_x g(x) pX(x)

For example, E[X²] = ∑_x x² pX(x).
Caution: In general, E[g(X)] ≠ g(E[X]).
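LOTUS can be checked on the game pmf used earlier: E[X²] is computed directly from pX without deriving the pmf of Y = X², and it differs from (E[X])²:

```python
from fractions import Fraction

# LOTUS on a small pmf: E[g(X)] = sum_x g(x) p_X(x), here with g(x) = x^2.
pmf = {1: Fraction(2, 10), 2: Fraction(5, 10), 4: Fraction(3, 10)}

E_X = sum(x * p for x, p in pmf.items())       # E[X] = 12/5
E_X2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 2/10 + 20/10 + 48/10 = 7

assert E_X2 == 7
assert E_X2 != E_X**2   # in general E[g(X)] != g(E[X]): 7 vs (12/5)^2
```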


Linearity of Expectation

Theorem
E[aX + b] = aE[X] + b





Variance

Variance is a measure of the spread of a random variable about its mean, and also a measure of uncertainty.
R.v. X with µ = E[X]. Average distance from the mean?

E[X − µ] = E[X] − µ = µ − µ = 0

So instead we take the average of the squared distance from the mean.

Definition (Variance)
The variance of a random variable X on a discrete sample space Ω is defined as
Var(X) = σ² = EP[(X − µ)²],
where P is a probability measure on Ω.


Variance

Var(X) = σ² = E[(X − µ)²]

To calculate, use the expected value rule, E[g(X)] = ∑_x g(x) pX(x):
Var(X) = E[g(X)] = ∑_x (x − µ)² pX(x)

Variance is non-negative: Var(X) = σ² ≥ 0.
Var(X) = 0 iff X is deterministic.

Definition (Standard Deviation)
The standard deviation of a random variable X is defined as
SD(X) = σX = √Var(X)


Properties of the variance

Theorem
For a random variable X and real numbers a and b,

Var(aX + b) = a² Var(X)

Proof. Write µ = E[X].
Let Y = X + b, so γ = E[Y] = µ + b. Then
Var(Y) = E[(Y − γ)²] = E[(X + b − (µ + b))²] = E[(X − µ)²] = Var(X)

Let Y = aX, so γ = E[Y] = aµ. Then
Var(Y) = E[(aX − aµ)²] = E[a²(X − µ)²] = a² E[(X − µ)²] = a² Var(X)


Properties of the variance

Computational formula for the variance:

Var(X) = E(X²) − [E(X)]²

Var(X) = E[(X − µ)²]
       = E[X² − 2µX + µ²]
       = E[X²] − 2µE[X] + µ²
       = E[X²] − 2µ² + µ²
       = E[X²] − (E[X])²
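Both forms of the variance can be computed on the game pmf from the expectation example and confirmed to agree:

```python
from fractions import Fraction

pmf = {1: Fraction(2, 10), 2: Fraction(5, 10), 4: Fraction(3, 10)}
mu = sum(x * p for x, p in pmf.items())                   # E[X] = 12/5

# Definition: Var(X) = E[(X - mu)^2]
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
# Computational formula: Var(X) = E[X^2] - (E[X])^2
var_alt = sum(x**2 * p for x, p in pmf.items()) - mu**2

assert var_def == var_alt == Fraction(31, 25)   # both give 31/25 = 1.24
```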


Independence and Expectation

In general: E[g(X, Y)] ≠ g(E[X], E[Y])

Exceptions:
E[aX + b] = aE[X] + b
E[X + Y + Z] = E[X] + E[Y] + E[Z]

Theorem
If X, Y are independent, then E[XY] = E[X]E[Y].
Moreover, g(X) and h(Y) are also independent, so E[g(X)h(Y)] = E[g(X)]E[h(Y)].


Independence and Variances

Always true: Var(aX) = a² Var(X) and Var(X + a) = Var(X)
In general: Var(X + Y) ≠ Var(X) + Var(Y)
However:

Theorem
If X, Y are independent: Var(X + Y) = Var(X) + Var(Y)

Proof.
Assume E[X] = E[Y] = 0. By independence, E[XY] = E[X]E[Y] = 0. Then

Var(X + Y) = E[(X + Y)²] = E[X² + 2XY + Y²]
           = E[X²] + 2E[XY] + E[Y²] = Var(X) + Var(Y)
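The theorem can be observed empirically. A minimal Monte Carlo sketch, assuming two independent fair dice as the choice of X and Y:

```python
import random

# Empirical check that Var(X + Y) ≈ Var(X) + Var(Y) for independent X, Y.
random.seed(0)
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]   # X: a fair die
ys = [random.randint(1, 6) for _ in range(n)]   # Y: an independent fair die

def var(sample):
    """Sample variance (biased form), Var = E[(S - mean)^2]."""
    m = sum(sample) / len(sample)
    return sum((s - m) ** 2 for s in sample) / len(sample)

sums = [x + y for x, y in zip(xs, ys)]
# Var of a fair die is 35/12 ≈ 2.917, so Var(X+Y) should be near 35/6.
assert abs(var(sums) - (var(xs) + var(ys))) < 0.1
```

With dependent variables (e.g. Y = X) the same check fails, since Var(2X) = 4 Var(X), not 2 Var(X).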




Bernoulli Random Variables

A Bernoulli r.v. X takes two possible values, usually 0 and 1, modeling random experiments that have two possible outcomes (e.g., “success” and “failure”).
e.g., tossing a coin. The outcome is either Head or Tail.
e.g., taking an exam. The result is either Pass or Fail.
e.g., classifying images. An image is either Cat or Non-cat.


Bernoulli Random Variables

Definition
A random variable X is a Bernoulli random variable with parameter p ∈ [0, 1], written as X ∼ Bernoulli(p), if its PMF is given by

PX(x) = p,      for x = 1
        1 − p,  for x = 0

Figure: PMF of a Bernoulli r.v., with a bar of height p at x = 1 and a bar of height 1 − p at x = 0.
Bernoulli & Indicator Random Variables

A Bernoulli r.v. X with parameter p ∈ [0, 1] can also be described as

X = 1 with probability p
    0 with probability 1 − p

A Bernoulli r.v. is associated with a certain event A. If event A occurs, then X = 1; otherwise, X = 0.
A Bernoulli r.v. is also called the indicator random variable of an event.

Definition
The indicator random variable of an event A is defined by

IA = 1 if the event A occurs
     0 otherwise

The indicator r.v. for an event A has a Bernoulli distribution with parameter p = P(IA = 1) = PIA(1) = P(A). We can write IA ∼ Bernoulli(P(A)).
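A Bernoulli variable can be sketched as the indicator of an event; here the (hypothetical) event is “a uniform draw on [0, 1) falls below p”:

```python
import random

# A Bernoulli(p) draw as the indicator of the event {U < p}, U uniform on [0,1).
def bernoulli(p):
    return 1 if random.random() < p else 0

random.seed(1)
p = 0.3
draws = [bernoulli(p) for _ in range(100_000)]

# For an indicator I_A, E[I_A] = P(A) = p, so the empirical mean is near p.
assert abs(sum(draws) / len(draws) - p) < 0.01
```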
Discrete Uniform Random Variables

Parameters: integers a, b with a ≤ b.
Experiment: Pick one of a, a + 1, . . . , b at random; all equally likely.
Sample space: {a, a + 1, . . . , b}
Random variable X: X(ω) = ω
b − a + 1 possible values, PX(x) = 1/(b − a + 1) for each value.
Model of: complete ignorance.

Figure: uniform PMF with bars of height 1/(b − a + 1) at each of a, a + 1, . . . , b.
Binomial Random Variables

Parameters: Probability p ∈ [0, 1], positive integer n.
Experiment: e.g., n independent tosses of a coin with P(Head) = p
Sample space: Set of sequences of H and T of length n
Random variable X: number of Heads observed.
Model of: number of successes in a given number of independent trials.

Example (n = 3)
PX(2) = P(X = 2)
      = P(HHT) + P(HTH) + P(THH)
      = 3p²(1 − p)
      = C(3, 2) p²(1 − p)
Binomial Random Variables

Figure: binomial PMFs for (n = 10, p = 0.5), (n = 100, p = 0.5), (n = 10, p = 0.1), and (n = 100, p = 0.1).


Binomial Random Variables

Let Ω = {0, 1, 2, . . . , n} be the sample space and let X be the number of successes in n independent trials, where p is the probability of success in a single Bernoulli trial.
The probability measure P is called the binomial distribution if

PX(k) = C(n, k) p^k (1 − p)^(n−k)   for k = 0, 1, . . . , n

where the binomial coefficient is

C(n, k) = n! / (k!(n − k)!)

Then
E[X] = np and Var(X) = np(1 − p)
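The binomial PMF and its moments can be verified directly from the formulas above:

```python
from math import comb

# Binomial pmf: P_X(k) = C(n, k) p^k (1-p)^(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

assert abs(sum(pmf) - 1) < 1e-12                      # pmf sums to 1
mean = sum(k * q for k, q in enumerate(pmf))
assert abs(mean - n * p) < 1e-9                       # E[X] = np
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
assert abs(var - n * p * (1 - p)) < 1e-9              # Var(X) = np(1-p)
```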


Geometric Random Variables

Parameters: Probability p ∈ (0, 1].
Experiment: infinitely many independent tosses of a coin; P(Head) = p.
Sample space: Set of infinite sequences of H and T.
Random variable X: number of tosses until the first Head.
Model of: waiting times, number of trials until a success.

PX(k) = P(X = k) = P(T . . . T H) = (1 − p)^(k−1) p,   with k − 1 Tails before the first Head.

Figure: geometric PMF for p = 0.2, with bars p, (1 − p)p, (1 − p)²p, . . . decaying geometrically over k = 1, 2, . . . , 20.
Geometric Random Variables

Let Ω = {1, 2, 3, . . .} be the sample space and X be the number of independent trials to achieve the first success.
Let p stand for the probability of a success in a single trial.
The probability measure P is called the geometric distribution if

PX(k) = (1 − p)^(k−1) p   for k = 1, 2, 3, . . .

Then
E[X] = 1/p and Var(X) = (1 − p)/p²
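The waiting-time interpretation can be simulated: toss until the first Head and check that the average number of tosses is near 1/p. A minimal sketch, with p = 0.2 chosen to match the figure:

```python
import random

# Draw a Geometric(p) variable by tossing a biased coin until the first Head.
def geometric_draw(p, rng):
    k = 1
    while rng.random() >= p:   # each miss is a Tail, probability 1 - p
        k += 1
    return k

rng = random.Random(0)
p = 0.2
draws = [geometric_draw(p, rng) for _ in range(100_000)]

# E[X] = 1/p = 5 for p = 0.2; the empirical mean should be close.
mean = sum(draws) / len(draws)
assert abs(mean - 1 / p) < 0.1
```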




Continuous Random Variables

Definition
A random variable X on the sample space Ω is said to have a continuous distribution if there exists a real-valued function f such that

f(x) ≥ 0,
∫_{−∞}^{∞} f(x) dx = 1,

and for all real numbers a < b:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

Then f : R → R+ is called the probability density function (PDF) of the continuous random variable X.


Probability Density Function (PDF)

Probability Mass Function (PMF):
P(a ≤ X ≤ b) = ∑_{x: a≤x≤b} pX(x)
pX(x) ≥ 0
∑_x pX(x) = 1

Probability density function (PDF):
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
fX(x) ≥ 0
∫_{−∞}^{∞} fX(x) dx = 1

Figure: a discrete PMF (bars) next to a continuous PDF, with the area under the PDF between a and b shaded.


Probability Density Function (PDF)

P(a ≤ X ≤ b) = ∫_a^b fX(x) dx

For δ > 0, small:
P(a ≤ X ≤ a + δ) ≈ fX(a) · δ
P(X = a) = 0

Just like a single point has zero length, but a set of lots of points has a positive length.
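The δ-approximation and P(X = a) = 0 can be checked numerically. A sketch using the toy density f(x) = 2x on [0, 1] (an assumption made for illustration) and simple midpoint-rule integration:

```python
# Toy density f(x) = 2x on [0, 1]; zero elsewhere.
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

assert abs(integrate(f, 0, 1) - 1) < 1e-6           # density integrates to 1
a, delta = 0.5, 1e-3
# P(a <= X <= a + δ) ≈ f_X(a)·δ for small δ
assert abs(integrate(f, a, a + delta) - f(a) * delta) < 1e-5
assert integrate(f, a, a) == 0                       # P(X = a) = 0
```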
Standard Normal (Gaussian) Random Variable N(0, 1)

∫_{−∞}^{∞} e^(−x²/2) dx = √(2π)

fX(x) = (1/√(2π)) e^(−x²/2)

Figure: the standard normal bell curve, built up from e^(−x²/2) and then normalized by 1/√(2π).


General Normal (Gaussian) Random Variable N(µ, σ²)

fX(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))
E[X] = µ
Var(X) = σ²

Figure: the general normal bell curve centered at µ, built up from (x − µ)²/(2σ²) to the normalized density.


General Normal (Gaussian) Random Variable N(µ, σ²)

Figure: normal PDFs for σ = 0.5, 1, 2, 3. Smaller σ, narrower PDF.

Let Y = aX + b with X ∼ N(µ, σ²).
Then E[Y] = aµ + b and Var(Y) = a²σ² (always true).
But also, Y ∼ N(aµ + b, a²σ²).
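The density formula and the linear-transform fact can be checked together; the constants a = 2, b = 3 below are arbitrary choices for illustration:

```python
import math
import random

# Normal density f(x) = exp(-(x-µ)²/(2σ²)) / (σ·sqrt(2π)).
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# At the peak of N(0,1): f(0) = 1/sqrt(2π) ≈ 0.3989
assert abs(normal_pdf(0) - 1 / math.sqrt(2 * math.pi)) < 1e-12

# Empirical check of Y = aX + b ~ N(aµ + b, a²σ²) with X ~ N(0, 1).
rng = random.Random(0)
a, b = 2.0, 3.0
ys = [a * rng.gauss(0, 1) + b for _ in range(200_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
assert abs(mean - b) < 0.05      # E[Y] = a·0 + b = 3
assert abs(var - a * a) < 0.1    # Var(Y) = a²·1 = 4
```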






Example

A bag contains 3 balls; each ball is either red or blue.
The number of blue balls θ can be 0, 1, 2, 3.
Choose 4 balls randomly with replacement.
Random variables X1, X2, X3, X4 are defined as

Xi = 1, if the i-th chosen ball is blue
     0, if the i-th chosen ball is red

After doing the experiment, the following values for the Xi's are observed: x1 = 1, x2 = 0, x3 = 1, x4 = 1.
Note that the Xi's are i.i.d. (independent and identically distributed) and Xi ∼ Bernoulli(θ/3). For which value of θ is the probability of the observed sample the largest?




Example

PXi(x) = θ/3,      for x = 1
         1 − θ/3,  for x = 0

Since the Xi's are independent, the joint PMF of X1, X2, X3, X4 can be written as

PX1X2X3X4(x1, x2, x3, x4) = PX1(x1) PX2(x2) PX3(x3) PX4(x4)

PX1X2X3X4(1, 0, 1, 1) = (θ/3) · (1 − θ/3) · (θ/3) · (θ/3) = (θ/3)³ (1 − θ/3)

θ    PX1X2X3X4(1, 0, 1, 1; θ)
0    0
1    0.0247
2    0.0988
3    0

The observed data is most likely to occur for θ = 2.
We may choose θ̂ = 2 as our estimate of θ.
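The likelihood table above can be recomputed and maximized in a few lines:

```python
# Likelihood of the observation x = (1, 0, 1, 1): L(θ) = (θ/3)³ (1 - θ/3)
def likelihood(theta):
    p = theta / 3
    return p**3 * (1 - p)

table = {theta: likelihood(theta) for theta in (0, 1, 2, 3)}
best = max(table, key=table.get)

assert best == 2                       # θ̂ = 2 maximizes the likelihood
assert round(table[2], 4) == 0.0988    # 8/81 ≈ 0.0988, matching the table
assert round(table[1], 4) == 0.0247    # 2/81 ≈ 0.0247
```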
Maximum Likelihood Estimation (MLE)

Definition
Let X1, X2, . . . , Xn be a random sample from a distribution with a parameter θ.
Given that we have observed X1 = x1, X2 = x2, . . . , Xn = xn, a maximum likelihood estimate of θ, denoted θ̂ML, is a value of θ that maximizes the likelihood function

L(x1, x2, . . . , xn; θ)

A maximum likelihood estimator (MLE) of the parameter θ, denoted Θ̂ML, is a random variable Θ̂ML = Θ̂(X1, X2, . . . , Xn) whose value, given the observations X1 = x1, X2 = x2, . . . , Xn = xn, is θ̂ML.
