
University of California, Los Angeles

Department of Statistics

Statistics 100B Instructor: Nicolas Christou

Method of maximum likelihood

Suppose x1, x2, . . . , xn is a random sample of size n from a distribution that has parameter θ. The joint probability density of these n random variables is

f(x1, x2, . . . , xn; θ)

We also refer to this function as the likelihood function, and it is denoted by L. In this function the parameter θ is unknown, and it will be estimated with the method of maximum likelihood. In principle, the method of maximum likelihood consists of selecting the value of θ that maximizes the likelihood function (the value of θ that makes the observed data most likely).

Since x1, x2, . . . , xn are independent, the likelihood function can be expressed as the product of the marginal densities:

L = f(x1, x2, . . . , xn; θ) = f(x1; θ) × f(x2; θ) × . . . × f(xn; θ)

We will maximize this function w.r.t. θ. It is often easier to work with the log likelihood function instead. Therefore, we take the derivative of the log likelihood function w.r.t. θ, set it equal to zero, and solve for θ. The result is denoted by θ̂, and we refer to it as the mle of the parameter θ.

Example:
Let X1, X2, . . . , Xn be a random sample of size n from an exponential distribution with parameter λ. Find the mle of λ.
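A sketch of the solution (the full derivation is worked out in class): the likelihood is L = Πⁿᵢ₌₁ λe^(−λxᵢ) = λⁿ e^(−λ Σⁿᵢ₌₁ xᵢ), so ln(L) = n ln(λ) − λ Σⁿᵢ₌₁ xᵢ. Setting ∂ln(L)/∂λ = n/λ − Σⁿᵢ₌₁ xᵢ = 0 and solving gives λ̂ = n/Σⁿᵢ₌₁ xᵢ = 1/x̄.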

Example:
Let X1, X2, . . . , Xn be a random sample of size n from a normal distribution with mean µ and variance σ². Find the mle of µ and σ².
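A sketch of the solution (standard steps, worked out in class): the log likelihood is ln(L) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σⁿᵢ₌₁ (xᵢ − µ)². Setting the partial derivatives w.r.t. µ and σ² equal to zero and solving gives µ̂ = x̄ and σ̂² = Σⁿᵢ₌₁ (xᵢ − x̄)²/n.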

The information in the sample can be computed using the log likelihood function:

In(θ) = −E[∂² ln L/∂θ²]   (instead of In(θ) = n I1(θ)).
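For example, for the exponential distribution above, ln(L) = n ln(λ) − λ Σⁿᵢ₌₁ xᵢ, so ∂² ln L/∂λ² = −n/λ², and therefore In(λ) = −E[−n/λ²] = n/λ².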

Method of maximum likelihood - An empirical investigation

We will estimate the parameter λ of the exponential distribution with the method of maximum
likelihood. Let X ∼ exp(2) (see figure below).
[Figure: density f(x) of X ∼ exp(2), plotted for x from 0 to 10; f(x) ranges from 0 to 2.]

Let’s pretend that λ is unknown. From this distribution we will select a random sample of size n = 100 (see the observations listed below). This sample gave Σ¹⁰⁰ᵢ₌₁ xᵢ = 49.86463 and sample mean x̄ = 0.4986463. Therefore, the maximum likelihood estimate of λ is:

λ̂ = 1/x̄ = 1/0.4986463 = 2.005429.

For different values of the parameter λ we compute the log likelihood function as follows:

ln(L) = n ln(λ) − λ Σ¹⁰⁰ᵢ₌₁ xᵢ

These calculations are shown below. We then plot the values of the log likelihood function against λ and observe that the maximum occurs at the value λ̂ = 2.005429 computed above.
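A minimal R sketch of this computation (assuming the sample below is stored in a vector x; the plotting details are illustrative, not the exact code used here):

# random sample of size n = 100 from the exponential distribution with rate 2
x <- rexp(100, rate = 2)
n <- length(x)

# mle of lambda
lambda.hat <- 1 / mean(x)

# log likelihood ln(L) = n*ln(lambda) - lambda*sum(x) over a grid of lambda values
lambda <- seq(0.1, 4.1, by = 0.1)
lnL <- n * log(lambda) - lambda * sum(x)

# plot ln(L) against lambda; the maximum occurs at lambda.hat
plot(lambda, lnL, xlab = "lambda", ylab = "Log likelihood")
abline(v = lambda.hat)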

Observations of a random sample of size n = 100 from the exponential distribution with λ = 2:

[1] 1.695824351 0.066702402 0.674994950 0.736106579 1.161993229


[6] 0.296223724 0.043937990 0.508988160 0.294233621 0.024084168
[11] 0.150176375 0.396972182 0.095883055 0.387135421 0.248432954
[16] 0.661809923 0.142542189 0.171455182 1.212420122 0.180640216
[21] 0.009212488 0.160395423 0.188922063 0.884223028 0.240872947
[26] 0.033885428 0.080997465 0.318024634 0.410324188 0.502538879
[31] 0.422821270 0.329996007 0.446404769 0.522652992 0.154471200
[36] 0.064116746 0.268321347 0.263458486 0.581443048 1.031375370
[41] 0.203961618 2.562959307 0.073292671 1.025867874 0.173630370
[46] 0.263878938 0.171617840 0.028656404 1.961520632 0.242559879
[51] 0.491987590 0.410541936 0.500918018 0.322782228 1.497851781
[56] 0.157720428 0.629583415 0.652147642 0.135310800 1.936474929
[61] 0.181363227 0.227498170 1.490756486 0.334677184 0.368089615
[66] 0.272378459 0.525470783 0.476837360 0.224213297 0.171204443
[71] 0.119797853 0.716180556 0.111337474 0.376437023 0.588020059
[76] 0.156395280 0.135622347 0.067554610 1.745086826 1.661906995
[81] 0.023611775 0.080141754 0.089054515 0.004390821 1.183269692
[86] 0.199572674 1.043889988 1.136122111 0.545845778 0.234890293
[91] 0.558763671 0.196966494 0.692430989 0.342892071 0.369322342
[96] 0.671608332 0.254633346 0.076204614 0.157962865 2.543944322

Values of the log likelihood function for different λ:

lambda lnL
[1,] 2.00543 -30.41417
[2,] 0.10000 -235.24497
[3,] 0.20000 -170.91672
[4,] 0.30000 -135.35667
[5,] 0.40000 -111.57492
[6,] 0.50000 -94.24703
[7,] 0.60000 -81.00134
[8,] 0.70000 -70.57273
[9,] 0.80000 -62.20606
[10,] 0.90000 -55.41421
[11,] 1.00000 -49.86463
[12,] 1.10000 -45.32007
[13,] 1.20000 -41.60539
[14,] 1.30000 -38.58759
[15,] 1.40000 -36.16325
[16,] 1.50000 -34.25043
[17,] 1.60000 -32.78304
[18,] 1.70000 -31.70704
[19,] 1.80000 -30.97766
[20,] 1.90000 -30.55740
[21,] 2.00000 -30.41453
[22,] 2.10000 -30.52198
[23,] 2.20000 -30.85644
[24,] 2.30000 -31.39773
[25,] 2.40000 -32.12823
[26,] 2.50000 -33.03249
[27,] 2.60000 -34.09688
[28,] 2.70000 -35.30931
[29,] 2.80000 -36.65901
[30,] 2.90000 -38.13634
[31,] 3.00000 -39.73265
[32,] 3.10000 -41.44013
[33,] 3.20000 -43.25172
[34,] 3.30000 -45.16102
[35,] 3.40000 -47.16218
[36,] 3.50000 -49.24989
[37,] 3.60000 -51.41927
[38,] 3.70000 -53.66583
[39,] 3.80000 -55.98547
[40,] 3.90000 -58.37438
[41,] 4.00000 -60.82907
[42,] 4.10000 -63.34627

Plot of the log likelihood function against λ:

[Figure: the log likelihood plotted against λ for 0 < λ ≤ 4.1; the curve rises to a maximum of about −30.4 near λ̂ ≈ 2.005 and falls off on both sides, with values ranging from about −235 to −30.]

Example:
Let X1, X2, . . . , Xn be a random sample of size n from a uniform distribution on the interval (0, θ). Find the mle of θ.
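A sketch of the standard argument: the likelihood is L = 1/θⁿ provided θ ≥ max(x1, . . . , xn), and L = 0 otherwise. Since 1/θⁿ is decreasing in θ, differentiating does not help here; instead, L is maximized by taking θ as small as the data allow, so θ̂ = max(X1, . . . , Xn) = X₍ₙ₎.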

Properties of estimators, method of maximum likelihood - examples

Example 1:
Let X1, X2, . . . , Xn be i.i.d. random variables from a probability distribution with pdf f(x; θ) = θx^(θ−1), 0 < x < 1, 0 < θ < ∞. Find the mle of θ.
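A sketch: ln(L) = n ln(θ) + (θ − 1) Σⁿᵢ₌₁ ln(xᵢ); setting ∂ln(L)/∂θ = n/θ + Σⁿᵢ₌₁ ln(xᵢ) = 0 gives θ̂ = −n/Σⁿᵢ₌₁ ln(xᵢ).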

Example 2:
Let X1, X2, . . . , Xn be i.i.d. random variables from a probability distribution with pdf f(x; θ) = e^−(x−θ), θ < x < ∞. Find the mle of θ.
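A sketch: L = e^(−Σⁿᵢ₌₁ (xᵢ − θ)) = e^(−Σⁿᵢ₌₁ xᵢ) e^(nθ), valid only when θ < xᵢ for every i. This is increasing in θ, so L is maximized by taking θ as large as the constraint allows: θ̂ = min(X1, . . . , Xn) = X₍₁₎.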

Example 3:
Suppose that X1, . . . , Xm, representing yields per acre for corn variety A, is a random sample from N(µ1, σ). Also, Y1, . . . , Yn, representing yields for corn variety B, is a random sample from N(µ2, σ). If the two samples are independent, find the maximum likelihood estimate for the common variance σ². Assume that µ1 and µ2 are unknown.
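A sketch of the answer (the derivation follows the same steps as the normal example above, applied to the joint likelihood of both samples): maximizing gives µ̂1 = x̄, µ̂2 = ȳ, and

σ̂² = [Σᵐᵢ₌₁ (xᵢ − x̄)² + Σⁿⱼ₌₁ (yⱼ − ȳ)²] / (m + n).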

Example 4:
Let X1, . . . , Xn denote a random sample from the probability density function f(x; θ) = (θ + 1)x^θ, 0 < x < 1, θ > −1. Find the mle of θ.
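A sketch: ln(L) = n ln(θ + 1) + θ Σⁿᵢ₌₁ ln(xᵢ); setting n/(θ + 1) + Σⁿᵢ₌₁ ln(xᵢ) = 0 gives θ̂ = −n/Σⁿᵢ₌₁ ln(xᵢ) − 1.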

Example 5:
In a basket there are green and white marbles. You randomly select marbles with replacement until you see a green marble. You find the first green marble on the 10th trial. Then your friend does the same: he randomly selects marbles until he obtains a green marble. His first green marble appears on the 15th trial. Use the method of maximum likelihood to find an estimate of p, the proportion of green marbles in the basket.
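A sketch, assuming each draw is independent with success probability p (the geometric setting): the likelihood of the two observations is L(p) = [(1 − p)⁹ p][(1 − p)¹⁴ p] = p²(1 − p)²³, so ln(L) = 2 ln(p) + 23 ln(1 − p). Setting 2/p − 23/(1 − p) = 0 gives p̂ = 2/25 = 0.08.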

Problem 6:
Let X1, X2, . . . , Xn be i.i.d. random variables from N(µ, σ).
a. Which of the following estimates is unbiased? Show all your work.

σ̂² = Σⁿᵢ₌₁ (Xᵢ − X̄)²/n ,   S² = Σⁿᵢ₌₁ (Xᵢ − X̄)²/(n − 1)

b. Which of the estimates of part (a) has the smaller MSE?


Problem 7:
Let X1, X2, . . . , Xn be an i.i.d. random sample from a normal population with mean zero and unknown variance σ².
a. Find the maximum likelihood estimate of σ².

b. Show that the estimate of part (a) is an unbiased estimator of σ².

c. Find the variance of the estimate of part (a). Is it consistent?

d. Show that the variance of the estimate of part (a) is equal to the Cramér-Rao lower
bound.

Problem 8:
Let X1, X2, . . . , Xn denote an i.i.d. random sample from the exponential distribution with mean 1/λ.

a. Derive the maximum likelihood estimate of λ.

b. Find the Cramér-Rao lower bound of the estimator of λ.

c. What is the asymptotic distribution of λ̂?

Problem 9:
Let X1, X2, . . . , Xn be independent and identically distributed random variables from a Poisson distribution with parameter λ. We know that the maximum likelihood estimate of λ is λ̂ = x̄.

a. Find the variance of λ̂.

b. Is λ̂ an MVUE?

c. Is λ̂ a consistent estimator of λ?

Problem 10:
Suppose that two independent random samples of n1 and n2 observations are selected from two normal populations. Further, assume that the populations possess a common variance σ², which is unknown. Let the sample variances be S1² and S2², for which E(S1²) = σ² and E(S2²) = σ².

a. Show that the pooled estimator of σ² that we derived in class, shown below, is unbiased.

S² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

b. Find the variance of S².

Theorem
Asymptotic efficiency of maximum likelihood estimates.
Why do maximum likelihood estimates have an asymptotic normal distribution? Let X1, X2, . . . , Xn be i.i.d. random variables from a probability density function f(x|θ). Then, if θ̂ is the mle of θ, the theorem states that θ̂ ∼ N(θ, √(1/(nI(θ)))).

Proof
We will use a Taylor series. This says that for a function h,
h(y) ≈ h(y0) + h′(y0)(y − y0).
Start with the likelihood function L = Πⁿᵢ₌₁ f(xᵢ|θ). Then the log likelihood is

ln(L) = Σⁿᵢ₌₁ ln f(xᵢ|θ).
Now obtain the derivative w.r.t. θ:

∂ln(L)/∂θ = Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ).

Now, letting θ̂ be the mle of θ, we expand this as a Taylor series about θ̂:

Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ) ≈ Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ̂) + [Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂)

Now divide both sides by √n to get:

(1/√n) Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ) ≈ (1/√n) Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ̂) + (1/√n) [Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂)
Note: The first term on the right hand side is zero, because θ̂ solves ∂ln(L)/∂θ = 0 (this is exactly how we find θ̂). Therefore, we have reduced the relationship to the following:

(1/√n) Σⁿᵢ₌₁ ∂/∂θ ln f(xᵢ|θ) ≈ (1/√n) [Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂)
Examine the left hand side: it is a sum of n independent, identically distributed terms, scaled by 1/√n, so the Central Limit Theorem applies. Each term ∂/∂θ ln f(Xᵢ|θ) has mean zero and variance I(θ); these are the standard properties of the score function, E[∂/∂θ ln f(X|θ)] = 0 and Var[∂/∂θ ln f(X|θ)] = I(θ). Therefore, the left hand side follows approximately N(0, I(θ)).

Therefore, the limiting distribution of the right hand side must also be N(0, I(θ)), i.e.

[(1/√n) Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)](θ − θ̂) ∼ N(0, I(θ)).
Or write it as (watch the n's and the minus sign!):

[−(1/n) Σⁿᵢ₌₁ ∂²/∂θ² ln f(xᵢ|θ̂)] √n(θ̂ − θ) ∼ N(0, I(θ)).
The expression in the brackets converges to I(θ) (law of large numbers), and therefore we can express the previous expression as

I(θ) √n(θ̂ − θ) ∼ N(0, I(θ)),

or

θ̂ ∼ N(θ, √(1/(nI(θ)))).
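A quick R simulation sketch of the theorem for the exponential case (the setup λ = 2, n = 100, and 10000 replications is assumed for illustration; for the exponential, I(λ) = 1/λ², so the theorem predicts a standard deviation of √(λ²/n) for λ̂):

lambda <- 2
n <- 100
B <- 10000

# B replications of the mle lambda.hat = 1/xbar from samples of size n
lambda.hat <- replicate(B, 1 / mean(rexp(n, rate = lambda)))

sd(lambda.hat)          # simulated standard deviation of the mle
sqrt(lambda^2 / n)      # theoretical value sqrt(1/(n*I(lambda))) = 0.2

The two values should approximately agree, in line with θ̂ ∼ N(θ, √(1/(nI(θ)))).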
