
University of California, Los Angeles

Department of Statistics

Statistics 100B Instructor: Nicolas Christou

Method of maximum likelihood

Suppose x1, x2, . . . , xn is a random sample of size n from a distribution that has parameter θ. The joint probability density of these n random variables is

f(x1, x2, . . . , xn; θ)

We also refer to this function as the likelihood function, and it is denoted by L. In this function the parameter θ is unknown, and it will be estimated with the method of maximum likelihood. In principle, the method of maximum likelihood consists of selecting the value of θ that maximizes the likelihood function (the value of θ that makes the observed data most likely).

Since x1, x2, . . . , xn are independent, the likelihood function can be expressed as the product of the marginal densities:

L = f(x1, x2, . . . , xn; θ) = f(x1; θ) × f(x2; θ) × . . . × f(xn; θ)

We will maximize this function w.r.t. θ. It is often easier to work with the log likelihood function instead. Therefore, we take the derivative of the log likelihood function w.r.t. θ, set it equal to zero, and solve for θ. The result is denoted by θ̂, and we refer to it as the mle of the parameter θ.

Example:
Let X1, X2, . . . , Xn be a random sample of size n from an exponential distribution with parameter λ. Find the mle of λ.
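A sketch of the solution (the full derivation is worked out in class): the likelihood is L = Πⁿᵢ₌₁ λe^(−λxᵢ) = λⁿ e^(−λ Σⁿᵢ₌₁ xᵢ), so ln(L) = n ln(λ) − λ Σⁿᵢ₌₁ xᵢ. Setting ∂ln(L)/∂λ = n/λ − Σⁿᵢ₌₁ xᵢ = 0 and solving gives λ̂ = n/Σⁿᵢ₌₁ xᵢ = 1/x̄.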

Example:
Let X1, X2, . . . , Xn be a random sample of size n from a normal distribution with mean µ and variance σ². Find the mle of µ and σ².
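A sketch of the solution (standard steps, worked out in class): the log likelihood is ln(L) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σⁿᵢ₌₁ (xᵢ − µ)². Setting the partial derivatives w.r.t. µ and σ² equal to zero and solving gives µ̂ = x̄ and σ̂² = Σⁿᵢ₌₁ (xᵢ − x̄)²/n.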

The information in the sample can be computed using the log likelihood function:

In(θ) = −E[∂² ln L/∂θ²]   (instead of In(θ) = n I1(θ)).
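For example, for the exponential distribution above, ln(L) = n ln(λ) − λ Σⁿᵢ₌₁ xᵢ, so ∂² ln L/∂λ² = −n/λ², and therefore In(λ) = −E[−n/λ²] = n/λ².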

Method of maximum likelihood - An empirical investigation

We will estimate the parameter λ of the exponential distribution with the method of maximum
likelihood. Let X ∼ exp(2) (see figure below).
[Figure: density f(x) of X ∼ exp(2), plotted for x from 0 to 10; f(x) ranges from 0 to 2.]

Let’s pretend that λ is unknown. From this distribution we will select a random sample of size n = 100 (see the observations listed below). This sample gave Σ¹⁰⁰ᵢ₌₁ xᵢ = 49.86463 and sample mean x̄ = 0.4986463. Therefore, the maximum likelihood estimate of λ is:

λ̂ = 1/x̄ = 1/0.4986463 = 2.005429.

For different values of the parameter λ we compute the log likelihood function as follows:

ln(L) = n ln(λ) − λ Σ¹⁰⁰ᵢ₌₁ xᵢ

These calculations are shown below. We then plot the values of the log likelihood function against λ and observe that the maximum occurs at the value λ̂ = 2.005429 computed above.
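A minimal R sketch of this computation (assuming the sample below is stored in a vector x; the plotting details are illustrative, not the exact code used here):

# random sample of size n = 100 from the exponential distribution with rate 2
x <- rexp(100, rate = 2)
n <- length(x)

# mle of lambda
lambda.hat <- 1 / mean(x)

# log likelihood ln(L) = n*ln(lambda) - lambda*sum(x) over a grid of lambda values
lambda <- seq(0.1, 4.1, by = 0.1)
lnL <- n * log(lambda) - lambda * sum(x)

# plot ln(L) against lambda; the maximum occurs at lambda.hat
plot(lambda, lnL, xlab = "lambda", ylab = "Log likelihood")
abline(v = lambda.hat)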

Observations of a random sample of size n = 100 from the exponential distribution with λ = 2:

[1] 1.695824351 0.066702402 0.674994950 0.736106579 1.161993229


[6] 0.296223724 0.043937990 0.508988160 0.294233621 0.024084168
[11] 0.150176375 0.396972182 0.095883055 0.387135421 0.248432954
[16] 0.661809923 0.142542189 0.171455182 1.212420122 0.180640216
[21] 0.009212488 0.160395423 0.188922063 0.884223028 0.240872947
[26] 0.033885428 0.080997465 0.318024634 0.410324188 0.502538879
[31] 0.422821270 0.329996007 0.446404769 0.522652992 0.154471200
[36] 0.064116746 0.268321347 0.263458486 0.581443048 1.031375370
[41] 0.203961618 2.562959307 0.073292671 1.025867874 0.173630370
[46] 0.263878938 0.171617840 0.028656404 1.961520632 0.242559879
[51] 0.491987590 0.410541936 0.500918018 0.322782228 1.497851781
[56] 0.157720428 0.629583415 0.652147642 0.135310800 1.936474929
[61] 0.181363227 0.227498170 1.490756486 0.334677184 0.368089615
[66] 0.272378459 0.525470783 0.476837360 0.224213297 0.171204443
[71] 0.119797853 0.716180556 0.111337474 0.376437023 0.588020059
[76] 0.156395280 0.135622347 0.067554610 1.745086826 1.661906995
[81] 0.023611775 0.080141754 0.089054515 0.004390821 1.183269692
[86] 0.199572674 1.043889988 1.136122111 0.545845778 0.234890293
[91] 0.558763671 0.196966494 0.692430989 0.342892071 0.369322342
[96] 0.671608332 0.254633346 0.076204614 0.157962865 2.543944322

Values of the log likelihood function for different λ:

lambda lnL
[1,] 2.00543 -30.41417
[2,] 0.10000 -235.24497
[3,] 0.20000 -170.91672
[4,] 0.30000 -135.35667
[5,] 0.40000 -111.57492
[6,] 0.50000 -94.24703
[7,] 0.60000 -81.00134
[8,] 0.70000 -70.57273
[9,] 0.80000 -62.20606
[10,] 0.90000 -55.41421
[11,] 1.00000 -49.86463
[12,] 1.10000 -45.32007
[13,] 1.20000 -41.60539
[14,] 1.30000 -38.58759
[15,] 1.40000 -36.16325
[16,] 1.50000 -34.25043
[17,] 1.60000 -32.78304
[18,] 1.70000 -31.70704
[19,] 1.80000 -30.97766
[20,] 1.90000 -30.55740
[21,] 2.00000 -30.41453
[22,] 2.10000 -30.52198
[23,] 2.20000 -30.85644
[24,] 2.30000 -31.39773
[25,] 2.40000 -32.12823
[26,] 2.50000 -33.03249
[27,] 2.60000 -34.09688
[28,] 2.70000 -35.30931
[29,] 2.80000 -36.65901
[30,] 2.90000 -38.13634
[31,] 3.00000 -39.73265
[32,] 3.10000 -41.44013
[33,] 3.20000 -43.25172
[34,] 3.30000 -45.16102
[35,] 3.40000 -47.16218
[36,] 3.50000 -49.24989
[37,] 3.60000 -51.41927
[38,] 3.70000 -53.66583
[39,] 3.80000 -55.98547
[40,] 3.90000 -58.37438
[41,] 4.00000 -60.82907
[42,] 4.10000 -63.34627

Plot of the log likelihood function against λ:

[Figure: the log likelihood plotted against λ for 0 < λ ≤ 4.1; the curve rises to a maximum of about −30.4 near λ̂ ≈ 2.005 and falls off on both sides, with values ranging from about −235 to −30.]

Example:
Let X1, X2, . . . , Xn be a random sample of size n from a uniform distribution on the interval (0, θ). Find the mle of θ.
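A sketch of the standard argument: the likelihood is L = 1/θⁿ provided θ ≥ max(x1, . . . , xn), and L = 0 otherwise. Since 1/θⁿ is decreasing in θ, differentiating does not help here; instead, L is maximized by taking θ as small as the data allow, so θ̂ = max(X1, . . . , Xn) = X₍ₙ₎.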

Properties of estimators, method of maximum likelihood - examples

Example 1:
Let X1, X2, . . . , Xn be i.i.d. random variables from a probability distribution with pdf f(x; θ) = θx^(θ−1), 0 < x < 1, 0 < θ < ∞. Find the mle of θ.
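A sketch: ln(L) = n ln(θ) + (θ − 1) Σⁿᵢ₌₁ ln(xᵢ); setting ∂ln(L)/∂θ = n/θ + Σⁿᵢ₌₁ ln(xᵢ) = 0 gives θ̂ = −n/Σⁿᵢ₌₁ ln(xᵢ).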

Example 2:
Let X1, X2, . . . , Xn be i.i.d. random variables from a probability distribution with pdf f(x; θ) = e^−(x−θ), θ < x < ∞. Find the mle of θ.
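A sketch: L = e^(−Σⁿᵢ₌₁ (xᵢ − θ)) = e^(−Σⁿᵢ₌₁ xᵢ) e^(nθ), valid only when θ < xᵢ for every i. This is increasing in θ, so L is maximized by taking θ as large as the constraint allows: θ̂ = min(X1, . . . , Xn) = X₍₁₎.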

Example 3:
Suppose that X1, . . . , Xm, representing yields per acre for corn variety A, is a random sample from N(µ1, σ). Also, Y1, . . . , Yn, representing yields for corn variety B, is a random sample from N(µ2, σ). If the two samples are independent, find the maximum likelihood estimate for the common variance σ². Assume that µ1 and µ2 are unknown.
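A sketch of the answer (the derivation follows the same steps as the normal example above, applied to the joint likelihood of both samples): maximizing gives µ̂1 = x̄, µ̂2 = ȳ, and

σ̂² = [Σᵐᵢ₌₁ (xᵢ − x̄)² + Σⁿⱼ₌₁ (yⱼ − ȳ)²] / (m + n).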

Example 4:
Let X1, . . . , Xn denote a random sample from the probability density function f(x; θ) = (θ + 1)x^θ, 0 < x < 1, θ > −1. Find the mle of θ.
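A sketch: ln(L) = n ln(θ + 1) + θ Σⁿᵢ₌₁ ln(xᵢ); setting n/(θ + 1) + Σⁿᵢ₌₁ ln(xᵢ) = 0 gives θ̂ = −n/Σⁿᵢ₌₁ ln(xᵢ) − 1.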

Example 5:
In a basket there are green and white marbles. You randomly select marbles with replacement until you see a green marble. You find the first green marble on the 10th trial. Then your friend does the same: he randomly selects marbles until he obtains a green marble. His first green marble appears on the 15th trial. Use the method of maximum likelihood to find an estimate of p, the proportion of green marbles in the basket.
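A sketch, assuming each draw is independent with success probability p (the geometric setting): the likelihood of the two observations is L(p) = [(1 − p)⁹ p][(1 − p)¹⁴ p] = p²(1 − p)²³, so ln(L) = 2 ln(p) + 23 ln(1 − p). Setting 2/p − 23/(1 − p) = 0 gives p̂ = 2/25 = 0.08.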

Problem 6:
Let X1, X2, . . . , Xn be i.i.d. random variables from N(µ, σ).
a. Which of the following estimates is unbiased? Show all your work.

σ̂² = Σⁿᵢ₌₁ (Xᵢ − X̄)²/n ,   S² = Σⁿᵢ₌₁ (Xᵢ − X̄)²/(n − 1)

b. Which of the estimates of part (a) has the smaller MSE?


Problem 7:
Let X1, X2, . . . , Xn be an i.i.d. random sample from a normal population with mean zero and unknown variance σ².
a. Find the maximum likelihood estimate of σ².

b. Show that the estimate of part (a) is an unbiased estimator of σ².

c. Find the variance of the estimate of part (a). Is it consistent?

d. Show that the variance of the estimate of part (a) is equal to the Cramér-Rao lower
bound.

Problem 8:
Let X1, X2, . . . , Xn denote an i.i.d. random sample from the exponential distribution with mean 1/λ.

a. Derive the maximum likelihood estimate of λ.

b. Find the Cramér-Rao lower bound of the estimator of λ.

c. What is the asymptotic distribution of λ̂?

Problem 9:
Let X1, X2, . . . , Xn be independent and identically distributed random variables from a Poisson distribution with parameter λ. We know that the maximum likelihood estimate of λ is λ̂ = x̄.

a. Find the variance of λ̂.

b. Is λ̂ an MVUE?

c. Is λ̂ a consistent estimator of λ?

Problem 10:
Suppose that two independent random samples of n1 and n2 observations are selected from two normal populations. Further, assume that the populations possess a common variance σ², which is unknown. Let the sample variances be S1² and S2², for which E(S1²) = σ² and E(S2²) = σ².

a. Show that the pooled estimator of σ² that we derived in class, shown below, is unbiased.

S² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

b. Find the variance of S².

Theorem
Asymptotic efficiency of maximum likelihood estimates.
Why do maximum likelihood estimates have an asymptotic normal distribution? Let X1, X2, . . . , Xn be i.i.d. random variables from a probability density function f(x|θ). Then, if θ̂ is the mle of θ, the theorem states that θ̂ ∼ N(θ, √(1/(nI(θ)))).

Proof
We will use a Taylor series. This says that for a function h,
h(y) ≈ h(y0) + h′(y0)(y − y0).
Start with the likelihood function L = Πⁿᵢ₌₁ f(xᵢ|θ). Then the log likelihood is

ln(L) = Σⁿᵢ₌₁ ln f(xᵢ|θ).
Now obtain the derivative w.r.t. θ:

∂ln(L)/∂θ = Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ).

Now, letting θ̂ be the mle of θ, we expand this as a Taylor series about θ̂:

Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ) ≈ Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ̂) + [Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂)

Now divide both sides by √n to get:

(1/√n) Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ) ≈ (1/√n) Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ̂) + (1/√n) [Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂)
Note: The first term on the right hand side is zero, because θ̂ solves ∂ln(L)/∂θ = 0 (this is exactly how we find θ̂). Therefore, we have reduced the relationship to the following:

(1/√n) Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ) ≈ (1/√n) [Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂)
Examine the left hand side: it is a sum of n independent, identically distributed terms, scaled by 1/√n, so the Central Limit Theorem applies. Each term ∂/∂θ ln f(Xᵢ|θ) has mean zero and variance I(θ); these are the standard properties of the score function, E[∂/∂θ ln f(X|θ)] = 0 and Var[∂/∂θ ln f(X|θ)] = I(θ). Therefore, the left hand side follows approximately N(0, I(θ)).

Therefore, the limiting distribution of the right hand side must also be N(0, I(θ)), i.e.

[(1/√n) Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂) ∼ N(0, I(θ)).
Or write it as (watch the n's and the minus sign!):

[−(1/n) Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)] √n(θ̂ − θ) ∼ N(0, I(θ)).
The expression in the brackets converges to I(θ) (law of large numbers), and therefore we can express the previous expression as

I(θ) √n(θ̂ − θ) ∼ N(0, I(θ)),

or

θ̂ ∼ N(θ, √(1/(nI(θ)))).
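A quick R simulation sketch of the theorem for the exponential case (the setup λ = 2, n = 100, and 10000 replications is assumed for illustration; for the exponential, I(λ) = 1/λ², so the theorem predicts a standard deviation of √(λ²/n) for λ̂):

lambda <- 2
n <- 100
B <- 10000

# B replications of the mle lambda.hat = 1/xbar from samples of size n
lambda.hat <- replicate(B, 1 / mean(rexp(n, rate = lambda)))

sd(lambda.hat)          # simulated standard deviation of the mle
sqrt(lambda^2 / n)      # theoretical value sqrt(1/(n*I(lambda))) = 0.2

The two values should approximately agree, in line with θ̂ ∼ N(θ, √(1/(nI(θ)))).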
