Statistics 512 Notes 12: Maximum Likelihood Estimation
So far, the point estimators we have derived have been based on heuristic principles. We would like to have a general method for finding good point estimators.

Example: A bucket contains 12 marbles. Each marble is red or white, but the numbers of red and white marbles are not known. Let $\theta$ denote the number of red marbles in the bucket. To gain some information on the bucket's composition, you are allowed to select five marbles without replacement and view them. Let $X$ denote the number of red marbles in the sample. Suppose $X = 3$ of the marbles in the sample are red. How should we estimate $\theta$?

Probability model: Imagine that the marbles are numbered 1-12. A sample of five marbles thus consists of five distinct numbers. All $\binom{12}{5} = 792$ samples are equally likely. The distribution of $X$ is hypergeometric:
$$P_{\theta}(X = x) = \frac{\binom{\theta}{x}\binom{12-\theta}{5-x}}{\binom{12}{5}}$$
The following table shows the probability distribution of $X$ for each possible value of $\theta$.
$\theta$ = number of red marbles originally in bucket; $X$ = number of red marbles found in sample.

 θ     X=0     X=1     X=2     X=3     X=4     X=5
 0    1       0       0       0       0       0
 1     .5833   .4167  0       0       0       0
 2     .3182   .5303   .1515  0       0       0
 3     .1591   .4773   .3182   .0454  0       0
 4     .0707   .3535   .4243   .1414   .0101  0
 5     .0265   .2210   .4419   .2652   .0442   .0012
 6     .0076   .1136   .3788   .3788   .1136   .0076
 7     .0012   .0442   .2652   .4419   .2210   .0265
 8    0        .0101   .1414   .4243   .3535   .0707
 9    0       0        .0454   .3182   .4773   .1591
10    0       0       0        .1515   .5303   .3182
11    0       0       0       0        .4167   .5833
12    0       0       0       0       0       1
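These probabilities are easy to compute directly. Here is a minimal sketch using only the Python standard library; the helper name hypergeom_pmf is ours, not from the notes:

```python
from math import comb

def hypergeom_pmf(x, theta, n_pop=12, n_sample=5):
    """P(X = x) when theta of the n_pop marbles are red and
    n_sample marbles are drawn without replacement."""
    if x > theta or n_sample - x > n_pop - theta or x < 0:
        return 0.0
    return comb(theta, x) * comb(n_pop - theta, n_sample - x) / comb(n_pop, n_sample)

# Reproduce two entries of the table:
print(round(hypergeom_pmf(3, theta=7), 4))  # 0.4419
print(round(hypergeom_pmf(3, theta=3), 4))  # 0.0455 (.0454 in the table, which truncates)
```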
Once we obtain the sample $X = 3$, what should we estimate $\theta$ to be? We know that it is impossible that $\theta = 0, 1, 2, 11$, or $12$. The set of possible values for $\theta$ once we observe $X = 3$ is $\theta = 3, 4, 5, 6, 7, 8, 9, 10$. Although both $\theta = 3$ and $\theta = 7$ are possible, the occurrence of $X = 3$ would be more likely if $\theta = 7$ [$P_{\theta=7}(X = 3) = .4419$] than if $\theta = 3$ [$P_{\theta=3}(X = 3) = .0454$].

The likelihood function:
$$L(\theta) = L(\theta; X_1, \ldots, X_n) = f(X_1, \ldots, X_n; \theta)$$
In words, the likelihood function is the probability (density) of observing the actual data if the true parameter is $\theta$. The likelihood function is a function of $\theta$ for fixed values of the sample data $X_1, \ldots, X_n$. In the above table, the columns represent the likelihood function for different sample data.

Maximum Likelihood Estimation: The maximum likelihood estimate (MLE) of $\theta$ based on the sample is the value of $\theta$ that maximizes the likelihood function for the sample:
$$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta; X_1, \ldots, X_n)$$
The maximum likelihood estimate is the value of $\theta$ that makes it most probable to observe what was actually observed. For the bucket example, the MLE of $\theta$ for each possible sample is:
 X    MLE of θ
 0       0
 1       2
 2       5
 3       7
 4      10
 5      12
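Because $\theta$ ranges over the finite set $\{0, 1, \ldots, 12\}$, the MLE can be found by direct search over the likelihood. A minimal sketch, reusing the hypergeom_pmf helper from the sketch above, that reproduces this table:

```python
def mle_theta(x_obs, thetas=range(13)):
    # The value of theta maximizing the likelihood L(theta) = P_theta(X = x_obs)
    return max(thetas, key=lambda theta: hypergeom_pmf(x_obs, theta))

for x in range(6):
    print(x, mle_theta(x))  # reproduces the X -> MLE table above
```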
Computing the MLE: Computing $\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta; X_1, \ldots, X_n)$ can often be done using differential calculus. It is often easier to work with the log likelihood function
$$l(\theta; X_1, \ldots, X_n) = \log L(\theta; X_1, \ldots, X_n),$$
which has the same maximizer because log is a strictly monotonically increasing function. A convenient fact is that if $X_1, \ldots, X_n$ are independent, then
$$\log L(\theta; X_1, \ldots, X_n) = \log \prod_{i=1}^{n} f(X_i; \theta) = \sum_{i=1}^{n} \log f(X_i; \theta).$$
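As a quick numerical illustration of this identity, here is a sketch assuming a Bernoulli($\theta$) model and a made-up sample, chosen purely for concreteness:

```python
import math

theta = 0.3
data = [1, 0, 0, 1, 1]  # hypothetical Bernoulli(theta) sample

def bernoulli_pmf(x, theta):
    return theta if x == 1 else 1 - theta

# The log of the product of the densities ...
log_of_product = math.log(math.prod(bernoulli_pmf(x, theta) for x in data))
# ... equals the sum of the log densities.
sum_of_logs = sum(math.log(bernoulli_pmf(x, theta)) for x in data)

print(log_of_product, sum_of_logs)  # identical up to floating-point error
```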
Example: Suppose $X_1, \ldots, X_n$ are iid Poisson($\theta$), so that
$$f(x; \theta) = \frac{\theta^x e^{-\theta}}{x!}$$
and the log likelihood is
$$l(\theta) = \sum_{i=1}^{n} X_i \log \theta - n\theta - \sum_{i=1}^{n} \log X_i!.$$
To maximize the log likelihood, we set the first derivative of the log likelihood equal to zero:
$$l'(\theta) = \frac{1}{\theta} \sum_{i=1}^{n} X_i - n = 0.$$
$\bar{X}$ is the unique solution to this equation. To confirm that $\bar{X}$ in fact maximizes $l(\theta)$, we can use the second derivative test:
$$l''(\theta) = -\frac{1}{\theta^2} \sum_{i=1}^{n} X_i < 0 \text{ when } \sum_{i=1}^{n} X_i > 0,$$
so that $\bar{X}$ in fact maximizes $l(\theta)$. When $\sum_{i=1}^{n} X_i = 0$, it can be seen by inspection that $l(\theta) = -n\theta$ is strictly decreasing in $\theta$, so the likelihood is maximized at the boundary $\theta = 0 = \bar{X}$.
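A small numerical check of this derivation (a sketch; the sample below is hypothetical), comparing the closed-form MLE $\bar{X}$ against a grid search over the Poisson log likelihood:

```python
import math

data = [2, 0, 3, 1, 4]  # hypothetical Poisson sample
n, s = len(data), sum(data)

def loglik(theta):
    # Poisson log likelihood, dropping the theta-free term -sum(log x_i!)
    return s * math.log(theta) - n * theta

# Closed-form MLE from setting l'(theta) = 0:
mle = s / n  # = X-bar = 2.0 for this sample

# A grid search over theta > 0 should agree (up to grid resolution):
grid = [k / 1000 for k in range(1, 10001)]  # theta in (0, 10]
grid_mle = max(grid, key=loglik)
print(mle, grid_mle)  # 2.0 2.0
```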