ML Estimation
The likelihood and log-likelihood functions are the basis for deriving estimators of parameters, given data. While the shapes of these two functions differ, they attain their maximum at the same value. In fact, the value of p corresponding to this maximum point is defined as the Maximum Likelihood Estimate (MLE) and is denoted $\hat{p}$. This is the value that is "most likely," relative to the other values. This is a simple, compelling concept, and it has a host of good statistical properties.
Thus, in general, we seek $\hat{\theta}$, the value that maximizes the log-likelihood function. In the binomial model, $\log(\mathcal{L})$ is a function of only one parameter, so it is easy to plot and visualize. The maximum likelihood estimate of p (the unknown parameter in the model) is the value that maximizes the log-likelihood, given the data. We denote this as $\hat{p}$. In the binomial model there is an analytical form (termed "closed form") of the MLE, thus numerical maximization of the log-likelihood is not required. In this simple case, $\hat{p} = y/n$.
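Written out, the binomial log-likelihood and its closed-form maximizer are:

$$\log \mathcal{L}(p \mid n, y) = \log\binom{n}{y} + y\,\log(p) + (n - y)\,\log(1 - p), \qquad \hat{p} = \frac{y}{n}.$$

For the coin-flipping example used below ($n = 11$, $y = 7$), this gives $\hat{p} = 7/11 \approx 0.636$.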
The log-likelihood links the data, the unknown model parameters, and the assumptions, and allows rigorous statistical inferences.
Real-world problems have more than one variable or parameter (e.g., p in the example). Computers can find the maximum of a multi-dimensional log-likelihood function, so the biologist need not be terribly concerned with these details.
The actual numerical value of the log-likelihood at its maximum point is of substantial importance. In the binomial coin-flipping example with n = 11 and y = 7, the maximized value is $\max \log(\mathcal{L}) = -1.411$ (see graph).
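As a quick check, this value can be reproduced directly from the binomial log-likelihood (a minimal Python sketch):

```python
from math import comb, log

n, y = 11, 7          # coin-flipping example: 7 "successes" in 11 trials
p_hat = y / n         # closed-form MLE of p

# Binomial log-likelihood evaluated at its maximum point
log_lik_max = log(comb(n, y)) + y * log(p_hat) + (n - y) * log(1 - p_hat)
print(round(log_lik_max, 3))   # approximately -1.411
```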
The log-likelihood function is of fundamental importance in the theory of inference and in all of statistics. It is the basis for the methods explored in FW-663. Students should make every effort to become comfortable with this function in the simple cases; extending the concepts to more complex cases will then come easily.
In particular, the log-likelihood function provides:
1. The basis for deriving estimators or estimates of model parameters (e.g., survival probabilities). These are termed maximum likelihood estimates, "MLEs."
2. Estimates of the precision (or repeatability). This is usually the conditional (on the model) sampling variance-covariance matrix (to be discussed).
Numbers 1-3 (above) require a model to be “given." Number 4, statistical hypothesis testing,
has become less useful in many respects in the past two decades and we do not stress this
approach as much as others might. Likelihood theory is also important in Bayesian statistics.
3. MLEs are asymptotically unbiased (MLEs are often biased, but the bias $\to 0$ as $n \to \infty$).
One-to-one transformations of MLEs are also MLEs. For example, mean life span $\bar{L}$ is defined as $-1/\log_e(S)$. Thus, an estimator of mean life span is $\hat{\bar{L}} = -1/\log_e(\hat{S})$, and $\hat{\bar{L}}$ is also an MLE.
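For instance, if the estimated survival were $\hat{S} = 0.9$ (a hypothetical value), the invariance property gives the MLE of mean life span directly:

$$\hat{\bar{L}} = \frac{-1}{\log_e(0.9)} \approx 9.5$$

(in the time units on which $S$ is defined, e.g., years for an annual survival probability).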
The log-likelihood functions we will see have a single mode or maximum point and no local
optima. These conditions make the use of numerical methods appealing and efficient.
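For illustration, the binomial log-likelihood for the n = 11, y = 7 example can be maximized numerically with a standard bounded optimizer (a minimal sketch using SciPy; the closed form makes this unnecessary here, but the same approach carries over to models without closed forms):

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, y = 11, 7  # coin-flipping example

def neg_log_lik(p):
    """Negative binomial log-likelihood (constant term omitted)."""
    return -(y * np.log(p) + (n - y) * np.log(1 - p))

# A unimodal log-likelihood makes bounded 1-D optimization fast and reliable.
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)   # approximately 0.636 = 7/11, the MLE
```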
Consider first the binomial model with a single unknown parameter, p. Using calculus, one could take the first partial derivative of the log-likelihood function with respect to p, set it to zero, and solve for p. This solution gives $\hat{p}$, the MLE. This value of $\hat{p}$ is the one that maximizes the log-likelihood function; it is the value of the parameter that is most likely, given the data.
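Carried out for a general binomial sample of y successes in n trials, the calculation is short:

$$\frac{\partial \log \mathcal{L}(p)}{\partial p} = \frac{y}{p} - \frac{n-y}{1-p} = 0 \quad\Longrightarrow\quad y(1-p) = (n-y)\,p \quad\Longrightarrow\quad \hat{p} = \frac{y}{n}.$$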
Now, what if 8 people had a single ticket, one had 4 tickets, but the last had 80 tickets.
Surely, the person with 80 tickets is most likely to win (but not with certainty). In this
simple example you have a feeling about the “strength of evidence" about the likely
winner. In the first case, one person has an edge, but not much more. In the second
case, the person with 80 tickets is relatively very likely to win.
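This intuition can be quantified for the case just described (8 single tickets, plus 4, plus 80, so 92 tickets in all):

$$P(\text{80-ticket holder wins}) = \frac{80}{92} \approx 0.87, \qquad P(\text{4-ticket holder wins}) = \frac{4}{92} \approx 0.04.$$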
The shape of the log-likelihood function is important in a conceptual way related to the raffle ticket example. If the log-likelihood function is relatively flat, one can make the interpretation that several (perhaps many) values of p are nearly equally likely; they are relatively alike. This is quantified as the sampling variance or standard error.
If the log-likelihood function is fairly flat, this implies considerable uncertainty and this is
reflected in large sampling variances and standard errors, and wide confidence intervals. On the
other hand, if the log-likelihood function is fairly peaked near its maximum point, this indicates
some values of p are relatively very likely compared to others (like the person with 80 raffle
tickets). A considerable degree of certainty is implied, and this is reflected in small sampling variances and standard errors, and narrow confidence intervals. So, both the value of the log-likelihood function at its maximum point and the shape of the function near this maximum point are important.
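A small numerical sketch makes this concrete. Comparing the n = 11, y = 7 example with a hypothetical sample ten times as large ($n = 110$, $y = 70$, so the same $\hat{p}$), the log-likelihood drops away from its maximum much more steeply and the standard error shrinks:

```python
import numpy as np

def log_lik(p, n, y):
    """Binomial log-likelihood (constant term omitted)."""
    return y * np.log(p) + (n - y) * np.log(1 - p)

for n, y in [(11, 7), (110, 70)]:          # same MLE, ten times the data
    p_hat = y / n
    drop = log_lik(p_hat, n, y) - log_lik(p_hat - 0.1, n, y)
    se = np.sqrt(p_hat * (1 - p_hat) / n)  # binomial SE of p-hat
    print(f"n={n:3d}  drop in log-likelihood at p_hat - 0.1: {drop:5.2f}   SE: {se:.3f}")
```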
The shape of the likelihood function near the maximum point can be measured by the analytical
second partial derivatives and these can be closely approximated numerically by a computer.
Such numerical derivatives are important in complicated problems where the log-likelihood
exists in 20-60 dimensions (i.e., has 20-60 unknown parameters).
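In one dimension this is easy to sketch: a central finite difference approximates the second derivative of the log-likelihood at $\hat{p}$, and the negative inverse of that curvature approximates the sampling variance (shown here for the n = 11, y = 7 example; real problems use full numerical Hessians):

```python
import numpy as np

n, y = 11, 7
p_hat = y / n

def log_lik(p):
    return y * np.log(p) + (n - y) * np.log(1 - p)

# Central finite-difference approximation to the second derivative at the MLE
h = 1e-4
d2 = (log_lik(p_hat + h) - 2 * log_lik(p_hat) + log_lik(p_hat - h)) / h**2

var_hat = -1.0 / d2        # inverse of the negative curvature
print(np.sqrt(var_hat))    # approx 0.145, matching sqrt(p_hat * (1 - p_hat) / n)
```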
[Figure: the likelihood (top panel) and log-likelihood (bottom panel) as functions of theta, plotted over theta = 0.75 to 1.]
The standard, analytical method of finding the MLEs is to take the first partial derivatives of the
log-likelihood with respect to each parameter in the model. For example:
$$\frac{\partial \ln(\mathcal{L}(p))}{\partial p} = \frac{11}{p} - \frac{5}{1-p} \qquad (n = 16)$$

Set to zero:

$$\frac{\partial \ln(\mathcal{L}(p))}{\partial p} = 0, \qquad \frac{11}{p} - \frac{5}{1-p} = 0$$
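Solving gives the MLE for this example:

$$\frac{11}{p} = \frac{5}{1-p} \;\Longrightarrow\; 11(1-p) = 5p \;\Longrightarrow\; \hat{p} = \frac{11}{16} = 0.6875.$$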
More generally, with K unknown parameters $\theta_1, \theta_2, \ldots, \theta_K$, the MLEs are found by solving the system of equations

$$\frac{\partial \log(\mathcal{L})}{\partial \theta_1} = 0, \qquad \frac{\partial \log(\mathcal{L})}{\partial \theta_2} = 0, \qquad \ldots, \qquad \frac{\partial \log(\mathcal{L})}{\partial \theta_K} = 0.$$
The MLEs are almost always unique; in particular this is true of multinomial-based models.
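In practice, when K > 1 such systems are rarely solved by hand; a general-purpose numerical optimizer is used. A minimal sketch, assuming a hypothetical pair of independent binomial samples ($y_1 = 7$ of $n_1 = 11$ and $y_2 = 11$ of $n_2 = 16$) with separate parameters $p_1$ and $p_2$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: two independent binomial samples, (n, y) pairs
data = [(11, 7), (16, 11)]

def neg_log_lik(theta):
    """Negative log-likelihood for two independent binomial parameters."""
    return -sum(y * np.log(p) + (n - y) * np.log(1 - p)
                for (n, y), p in zip(data, theta))

result = minimize(neg_log_lik, x0=[0.5, 0.5], bounds=[(1e-6, 1 - 1e-6)] * 2)
print(result.x)   # approximately [0.636, 0.688] = [7/11, 11/16]
```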
Sampling variances and covariances of the MLEs are computed from the log-likelihood,
based on curvature at the maximum. Actual formulae involve second mixed-partial derivatives of
the log-likelihood, hence quantities like
$$\frac{\partial^2 \log(\mathcal{L})}{\partial \theta_1\,\partial \theta_1} \qquad\text{and}\qquad \frac{\partial^2 \log(\mathcal{L})}{\partial \theta_1\,\partial \theta_2},$$

evaluated at the MLEs. More generally, the quantities involved are the negative second partial derivatives

$$-\frac{\partial^2 \log(\mathcal{L})}{\partial \theta_i\,\partial \theta_i} \qquad\text{and}\qquad -\frac{\partial^2 \ln(\mathcal{L})}{\partial \theta_i\,\partial \theta_j},$$

evaluated at the MLEs.
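For the single-parameter binomial model this curvature can be computed in closed form, and it reproduces the familiar variance estimator:

$$-\frac{\partial^2 \log \mathcal{L}(p)}{\partial p^2}\bigg|_{p=\hat{p}} = \frac{y}{\hat{p}^2} + \frac{n-y}{(1-\hat{p})^2} = \frac{n}{\hat{p}(1-\hat{p})}, \qquad \widehat{\operatorname{var}}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}.$$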
The use of log-likelihood functions (rather than likelihood functions) is deeply rooted in the nature of likelihood theory. Note also that likelihood ratio test (LRT) theory leads to tests which basically always involve taking $-2 \times$ (log-likelihood evaluated at the MLEs).
A related quantity is the deviance, defined as

$$\text{Deviance} = -2\left[\log \mathcal{L}(\hat{\theta}) - \log \mathcal{L}(\hat{\theta}_{\text{sat}})\right],$$

evaluated at the MLEs for some model. Here, the first term is the log-likelihood, evaluated at its maximum point, for the model in question, and the second term is the log-likelihood, evaluated at its maximum point, for the saturated model. The meaning of a saturated model will become clear in the following material; basically, in the multinomial models, it is a model with as many parameters as cells. This final term in the deviance can often be dropped, as it is often a constant across models.
The deviance for the saturated model $\equiv 0$. Deviance, like information, is additive. The deviance is approximately $\chi^2$ distributed with df = (number of cells) $- K$ and is thus useful in examining the goodness-of-fit of a model. There are situations where use of the deviance in this way will not give correct results. MARK outputs the deviance as a measure of model fit, and this is often very useful.
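A minimal sketch of the deviance calculation, assuming hypothetical two-group binomial data ($y_1 = 7$ of $n_1 = 11$, $y_2 = 11$ of $n_2 = 16$); the saturated model fits a separate p to each group (as many parameters as cells), while the reduced model fits a single common p:

```python
import numpy as np

data = [(11, 7), (16, 11)]   # hypothetical (n, y) pairs for two groups

def log_lik(p, n, y):
    return y * np.log(p) + (n - y) * np.log(1 - p)

# Saturated model: one p per group, each at its own MLE y/n
ll_sat = sum(log_lik(y / n, n, y) for n, y in data)

# Reduced model: a single common p, MLE = pooled proportion
p_common = sum(y for _, y in data) / sum(n for n, _ in data)
ll_reduced = sum(log_lik(p_common, n, y) for n, y in data)

deviance = -2 * (ll_reduced - ll_sat)   # ~ chi-square, df = 2 cells - 1 parameter
print(deviance)                          # small value here: the two groups are similar
```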
[Figure: the deviance as a function of theta, plotted over theta = 0.75 to 1.]