Maximum Likelihood Estimation
It seems reasonable that a good estimate of the unknown parameter θ would be the
value of θ that maximizes the probability, errrr... that is, the likelihood... of getting
the data we observed. (So, do you see from where the name "maximum likelihood"
comes?) So, that is, in a nutshell, the idea behind the method of maximum likelihood
estimation. But how would we implement the method in practice? Well, suppose we
have a random sample X1, X2,..., Xn for which the probability density (or mass)
function of each Xi is f(xi; θ). Then, the joint probability mass (or density) function
of X1, X2,..., Xn, which we'll (not so arbitrarily) call L(θ) is:
$$L(\theta) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1;\theta) \cdot f(x_2;\theta) \cdots f(x_n;\theta) = \prod_{i=1}^{n} f(x_i;\theta)$$
The first equality is of course just the definition of the joint probability mass function.
The second equality comes from the fact that we have a random sample, which
implies by definition that the Xi are independent. And, the last equality just uses the
shorthand mathematical notation of a product of indexed terms. Now, in light of the
basic idea of maximum likelihood estimation, one reasonable way to proceed is to
treat the "likelihood function" L(θ) as a function of θ, and find the value of θ that
maximizes it.
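If it helps to see the idea in code before we do any calculus, here is a minimal Python sketch. The 0/1 data and the Bernoulli-type mass function are just illustrative choices of mine (they are not part of the course example); the point is simply that L(θ) is the product of the individual f(xi; θ) values, and we can hunt for the θ that makes the observed data most likely:

import numpy as np

# Hypothetical 0/1 data, for illustration only.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])

def f(xi, theta):
    # A Bernoulli-type mass function: f(xi; theta) = theta^xi * (1 - theta)^(1 - xi).
    return theta**xi * (1 - theta)**(1 - xi)

def likelihood(theta, data):
    # L(theta) = f(x1; theta) * f(x2; theta) * ... * f(xn; theta).
    return np.prod([f(xi, theta) for xi in data])

# Evaluate L(theta) over a grid of candidate values and report the maximizer.
grid = np.linspace(0.01, 0.99, 99)
L = np.array([likelihood(t, x) for t in grid])
print("theta maximizing L over the grid:", grid[np.argmax(L)])
print("proportion of 1's in the data:   ", x.mean())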
Is this still sounding like too much abstract gibberish? Let's take a look at an example
to see if we can make it a bit more concrete.
Example
Suppose we have a random sample X1, X2, ..., Xn, where each Xi equals 1 if the i-th sampled individual is a "success" and 0 otherwise, and where p denotes the unknown probability of success, so that each Xi is a Bernoulli random variable with probability mass function f(xi; p) = p^xi (1 − p)^(1 − xi). Because the Xi are independent, the likelihood function is:

$$L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{\,n-\sum x_i}$$
Now, in order to implement the method of maximum likelihood, we need to
find the p that maximizes the likelihood L(p). We need to put on our calculus
hats now, since in order to maximize the function, we are going to need to
differentiate the likelihood function with respect to p. In doing so, we'll use a
"trick" that often makes the differentiation a bit easier. Note that the natural
logarithm is an increasing function of x; that is, if x1 < x2, then ln(x1) < ln(x2). That means that the value of p that
maximizes the natural logarithm of the likelihood function ln(L(p)) is also the
value of p that maximizes the likelihood function L(p). So, the "trick" is to take
the derivative of ln(L(p)) (with respect to p) rather than taking the derivative
of L(p). Again, doing so often makes the differentiation much easier. (By the
way, throughout the remainder of this course, I will use either ln(L(p)) or
log(L(p)) to denote the natural logarithm of the likelihood function.)
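You can also convince yourself of the "trick" numerically. In the short sketch below (with made-up 0/1 data again), L(p) and its natural logarithm peak at exactly the same value of p; as a bonus, working on the log scale avoids the numerical underflow that a raw product of many small numbers can suffer:

import numpy as np

# Hypothetical 0/1 data, for illustration only.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
n, s = len(x), x.sum()

grid = np.linspace(0.01, 0.99, 99)
L = grid**s * (1 - grid)**(n - s)                      # likelihood L(p)
logL = s * np.log(grid) + (n - s) * np.log(1 - grid)   # log-likelihood ln L(p)

# Because ln is increasing, both curves are maximized at the same p.
print(grid[np.argmax(L)], grid[np.argmax(logL)])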
In this case, the log of the likelihood function is:

$$\log L(p) = \left(\sum x_i\right)\log p + \left(n - \sum x_i\right)\log(1-p)$$

Taking the derivative with respect to p and setting it equal to 0, we get:

$$\frac{\partial \log L(p)}{\partial p} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0$$

Now, multiplying through by p(1 − p), we get:

$$\left(\sum x_i\right)(1-p) - \left(n - \sum x_i\right)p = 0$$
Upon distributing, we see that two of the resulting terms cancel each other out:

$$\sum x_i - p\sum x_i - np + p\sum x_i = 0$$

leaving us with:

$$\sum x_i - np = 0$$
Now, all we have to do is solve for p. In doing so, you'll want to make sure
that you always put a hat ("^") on the parameter, in this case p, to indicate it
is an estimate:
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$
or, alternatively, an estimator:
$$\hat{p} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Oh, and we should technically verify that we indeed did obtain a maximum.
We can do that by verifying that the second derivative of the log likelihood
with respect to p is negative. It is, but you might want to do the work to
convince yourself!
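If you would rather let the computer do some of that checking, here is a small sketch (made-up data again; the use of scipy's bounded scalar minimizer is my choice, not something from the course) that maximizes the log-likelihood numerically, compares the answer with the closed form p̂ = Σxi / n, and evaluates the second derivative of the log-likelihood at p̂ to confirm it is negative:

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical 0/1 data, for illustration only.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
n, s = len(x), x.sum()

def neg_log_likelihood(p):
    # Negative of ln L(p) = (sum xi) ln p + (n - sum xi) ln(1 - p).
    return -(s * np.log(p) + (n - s) * np.log(1 - p))

# Numerically maximize ln L(p) on (0, 1) by minimizing its negative.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
p_hat = s / n
print("numerical maximizer:", result.x)
print("closed form p-hat:  ", p_hat)

# Second derivative of ln L(p) at p-hat: -s/p^2 - (n - s)/(1 - p)^2, which is negative.
print("second derivative at p-hat:", -s / p_hat**2 - (n - s) / (1 - p_hat)**2)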
Now, with that example behind us, let us take a look at formal definitions of the terms
(1) likelihood function, (2) maximum likelihood estimators, and (3) maximum
likelihood estimates.
Let X1, X2, ..., Xn be a random sample from a distribution that depends on one or more unknown parameters θ1, θ2, ..., θm, with probability density (or mass) function f(xi; θ1, θ2, ..., θm). Then:

(1) When regarded as a function of θ1, θ2, ..., θm, the joint probability density (or mass) function of X1, X2, ..., Xn:

$$L(\theta_1, \theta_2, \ldots, \theta_m) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_m)$$

is called the likelihood function.

(2) If the m-tuple [u1(x1, x2, ..., xn), u2(x1, x2, ..., xn), ..., um(x1, x2, ..., xn)] maximizes the likelihood function, then:

$$\hat{\theta}_i = u_i(X_1, X_2, \ldots, X_n)$$

is the maximum likelihood estimator of θi, for i = 1, 2, ..., m.

(3) The corresponding observed values, ui(x1, x2, ..., xn), are called the maximum likelihood estimates of θi, for i = 1, 2, ..., m.
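In code terms, definition (1) just says: hold the data fixed, multiply the individual density (or mass) values together, and regard the result as a function of the parameters. The little Python sketch below does exactly that; the helper name make_likelihood, the normal density used to illustrate it, and the data are all my own choices rather than course notation:

import math

def make_likelihood(f, data):
    # Return L(theta1, ..., thetam) = product over the data of f(xi; theta1, ..., thetam).
    def L(*thetas):
        value = 1.0
        for xi in data:
            value *= f(xi, *thetas)
        return value
    return L

def normal_pdf(xi, mu, sigma2):
    # The normal density with mean mu and variance sigma2 (as in the next example).
    return math.exp(-(xi - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hypothetical data, for illustration only.
L = make_likelihood(normal_pdf, [4.1, 5.8, 5.2, 6.9])
print(L(5.5, 1.0))   # a plausible (mu, sigma^2) gives a relatively large likelihood
print(L(10.0, 1.0))  # an implausible mu makes the same data far less likely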
Example
Suppose the weights of American female college students are normally distributed with unknown mean μ and standard deviation σ, and that a random sample of n = 10 such weights has been obtained. Based on the definitions given above, identify the likelihood function and the maximum likelihood estimator of μ, the mean weight of all American female college students. Using the given sample, find a maximum likelihood estimate of μ as well.
Solution. The probability density function of Xi is:

$$f(x_i; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]$$
for −∞ < x < ∞. The parameter space is Ω = {(μ, σ): −∞ < μ < ∞ and 0 < σ
< ∞}. Therefore, (you might want to convince yourself that) the likelihood
function is:
$$L(\mu, \sigma) = \sigma^{-n}(2\pi)^{-n/2}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right]$$
for −∞ < μ < ∞ and 0 < σ < ∞. It can be shown (we'll do so in the next
example!), upon maximizing the likelihood function with respect to μ, that the
maximum likelihood estimator of μ is:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
Based on the given sample, a maximum likelihood estimate of μ is:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(115 + \cdots + 180) = 142.2$$
pounds. Note that the only difference between the formulas for the maximum
likelihood estimator and the maximum likelihood estimate is that:
the estimator is defined using capital letters (to denote that its value is
random), and
the estimate is defined using lowercase letters (to denote that its value
is fixed and based on an obtained sample), as the short sketch below illustrates.
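Seen from a programming point of view, the estimator is the rule (a function you could apply to any random sample), while the estimate is the single number you get once a particular sample is plugged in. Here is a tiny sketch, using a made-up sample of 10 weights rather than the course's data set:

import numpy as np

def mu_hat(sample):
    # The maximum likelihood estimator of mu: the rule "average the observations."
    sample = np.asarray(sample, dtype=float)
    return sample.sum() / len(sample)

# A hypothetical observed sample of 10 weights (in pounds), for illustration only.
observed = [120, 135, 128, 150, 142, 138, 160, 131, 145, 155]

# Applying the rule to this particular sample yields the maximum likelihood estimate.
print("maximum likelihood estimate of mu:", mu_hat(observed))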
Okay, so now we have the formal definitions out of the way. The first example on this
page involved a joint probability mass function that depends on only one parameter,
namely p, the proportion of successes. Now, let's take a look at an example that
involves a joint probability density function that depends on two parameters.
Example
Let X1, X2,..., Xn be a random sample from a normal distribution with unknown
mean μ and variance σ2. Find maximum likelihood estimators of mean μ and
variance σ2.
Solution. In finding the estimators, the first thing we'll do is write the
probability density function as a function of θ1 = μ and θ2 = σ2:
$$f(x_i; \theta_1, \theta_2) = \frac{1}{\sqrt{\theta_2}\sqrt{2\pi}}\exp\!\left[-\frac{(x_i - \theta_1)^2}{2\theta_2}\right]$$
for −∞ < θ1 < ∞ and 0 < θ2 < ∞. We do this so as not to cause confusion
when taking the derivative of the likelihood with respect to σ2. Now, that
makes the likelihood function:
$$L(\theta_1, \theta_2) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2) = \theta_2^{-n/2}(2\pi)^{-n/2}\exp\!\left[-\frac{1}{2\theta_2}\sum_{i=1}^{n}(x_i - \theta_1)^2\right]$$

and therefore the log of the likelihood function:

$$\log L(\theta_1, \theta_2) = -\frac{n}{2}\log\theta_2 - \frac{n}{2}\log(2\pi) - \frac{\sum(x_i - \theta_1)^2}{2\theta_2}$$
Now, upon taking the partial derivative of the log likelihood with respect
to θ1, and setting to 0, we see that a few things cancel each other out,
leaving us with:
$$\sum x_i - n\theta_1 = 0$$
Now, solving for θ1, and putting on its hat, we have shown that the maximum
likelihood estimate of θ1 is:
$$\hat{\theta}_1 = \hat{\mu} = \frac{\sum x_i}{n} = \bar{x}$$
Now for θ2. Taking the partial derivative of the log likelihood with respect to θ2, and setting it to 0, we get:

$$\frac{\partial \log L(\theta_1, \theta_2)}{\partial \theta_2} = -\frac{n}{2\theta_2} + \frac{\sum(x_i - \theta_1)^2}{2\theta_2^2} = 0$$

Multiplying through by 2θ2², we get:
$$-n\theta_2 + \sum(x_i - \theta_1)^2 = 0$$
And, solving for θ2, and putting on its hat, we have shown that the maximum likelihood estimate of θ2 is:

$$\hat{\theta}_2 = \hat{\sigma}^2 = \frac{\sum(x_i - \bar{x})^2}{n}$$
(I'll again leave it to you to verify, in each case, that the second partial
derivative of the log likelihood is negative, and therefore that we did indeed
find maxima.) In summary, we have shown that the maximum likelihood
estimators of μ and variance σ2 for the normal model are:
$$\hat{\mu} = \frac{\sum X_i}{n} = \bar{X} \qquad \text{and} \qquad \hat{\sigma}^2 = \frac{\sum(X_i - \bar{X})^2}{n}$$
respectively.
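If you want to check the calculus without doing it by hand, here is a sketch that uses sympy (my choice of tool, with a tiny made-up data set) to confirm that both partial derivatives of the log-likelihood vanish at (x̄, Σ(xi − x̄)²/n) and that both second partial derivatives are negative there:

import sympy as sp

theta1, theta2 = sp.symbols("theta1 theta2", positive=True)

# A tiny hypothetical data set, for illustration only.
data = [sp.Integer(v) for v in (4, 6, 5, 7)]
n = len(data)

logL = (-sp.Rational(n, 2) * sp.log(theta2)
        - sp.Rational(n, 2) * sp.log(2 * sp.pi)
        - sum((xi - theta1) ** 2 for xi in data) / (2 * theta2))

# The claimed maximizers: theta1 = xbar and theta2 = sum((xi - xbar)^2) / n.
xbar = sum(data) / n
vhat = sum((xi - xbar) ** 2 for xi in data) / n
at_hat = {theta1: xbar, theta2: vhat}

print(sp.simplify(sp.diff(logL, theta1).subs(at_hat)))     # 0
print(sp.simplify(sp.diff(logL, theta2).subs(at_hat)))     # 0
print(sp.simplify(sp.diff(logL, theta1, 2).subs(at_hat)))  # negative
print(sp.simplify(sp.diff(logL, theta2, 2).subs(at_hat)))  # negative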
Note that the maximum likelihood estimator of σ2 for the normal model is not the
sample variance S2. They are, in fact, competing estimators. So how do we know
which estimator we should use for σ2? Well, one way is to choose the estimator that
is "unbiased." Let's go learn about unbiased estimators now.