Large Sample Theory

Large Sample Theory is a name given to the search for approximations to
the behaviour of statistical procedures which are derived by computing limits
as the sample size, n, tends to infinity. Suppose we have a data set with a
fairly large sample size, say n = 100. We imagine our data set is one in a
sequence of possible data sets — one for each possible value of n. If we have
a sequence of statistics which converges to something as n tends to infinity
we approximate the value of a probability, for instance, when n = 100 by
the corresponding value when n = ∞. In this section we will investigate
approximations of this kind for Maximum Likelihood Estimation.
Most large sample theory uses three main technical tools: the Law of
Large Numbers (LLN), the Central Limit Theorem (CLT) and Taylor ex-
pansion. I assume you have heard of all of these but will state versions of
them as we go.
These tools are generally easier to apply to statistics for which we have
explicit formulas than to statistics, like maximum likelihood estimates, where
we do not usually have explicit formulas. In this situation we study the
equations you solve to find the MLEs instead.
We therefore will study the approximate behaviour of the MLE, θ̂, by
studying the score function U(θ) = ∂ℓ/∂θ, the derivative of the log
likelihood ℓ. Notice first that U is a sum of independent random variables:
U(θ) = ∑ ∂ log f(Xi, θ)/∂θ. This will allow us to apply both the LLN and
the CLT to U.

Theorem 1 (LLN) If Y1, Y2, . . . are iid with mean µ then

    ∑ Yi / n → µ.

This is called the law of large numbers but it comes in two forms: Strong
and Weak.

Theorem 2 (SLLN) If Y1, Y2, . . . are iid with mean µ then

    P( lim n→∞ ∑ Yi / n = µ ) = 1.

The strong law is harder to prove than the weak law of large numbers:

Theorem 3 (WLLN) If Y1, Y2, . . . are iid with mean µ then for each positive ε

    lim n→∞ P( |∑ Yi / n − µ| > ε ) = 0.
For iid Yi the stronger conclusion (the SLLN) holds but for our heuristics
we will ignore the differences between these notions.
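As a quick numerical illustration of the law of large numbers, here is a minimal sketch in Python; the exponential distribution, its mean, and the sample sizes are arbitrary choices, not from the notes.

```python
import random

random.seed(1)

def sample_mean(n, mu=2.0):
    # Average of n iid Exponential draws with mean mu; the LLN says
    # this average converges (in probability) to mu as n grows.
    return sum(random.expovariate(1.0 / mu) for _ in range(n)) / n

# With n = 50,000 the average should sit very close to mu = 2.0;
# with n = 50 it will typically wander further away.
m_small, m_large = sample_mean(50), sample_mean(50_000)
```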
Now suppose that θ0 is the true value of θ. Then

    U(θ)/n → µ(θ)

where

    µ(θ) = Eθ0 [ ∂ log f(Xi, θ)/∂θ ] = ∫ [∂ log f(x, θ)/∂θ] f(x, θ0) dx.


Remark: This convergence is pointwise and might not be uniform. That is,
there might be some θ values where the convergence takes much longer than
others. It could be that for every n there is a θ where U (θ)/n and µ(θ) are
not close together.
Consider as an example the case of N(µ, 1) data where

    U(µ)/n = ∑ (Xi − µ)/n = X̄ − µ.

If the true mean is µ0 then X̄ → µ0 and

    U(µ)/n → µ0 − µ.

If we think of a µ < µ0 we see that the derivative of ℓ(µ) is likely to
be positive, so that ℓ increases as we increase µ. For µ more than µ0 the
derivative is probably negative, and so ℓ tends to be decreasing for µ > µ0.
It follows that ℓ is likely to be maximized close to µ0.
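This sign pattern can be checked by simulation; a minimal sketch where the true mean 1.5 and the sample size are arbitrary choices.

```python
import random
import statistics

random.seed(2)
mu0 = 1.5
xs = [random.gauss(mu0, 1.0) for _ in range(10_000)]
xbar = statistics.fmean(xs)

def score_over_n(mu):
    # For N(mu, 1) data, U(mu)/n = xbar - mu.
    return xbar - mu

# Positive for mu < mu0 and negative for mu > mu0, so the log
# likelihood rises then falls, peaking at its root mu = xbar.
left, right = score_over_n(1.0), score_over_n(2.0)
```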
Now we repeat these ideas for a more general case. We study the random
variable log[f(Xi, θ)/f(Xi, θ0)]. You know the inequality

    (E(X))² ≤ E(X²)

(because the difference is Var(X) ≥ 0). This inequality has the following
generalization, called Jensen's inequality: if g is a convex function (non-
negative second derivative, roughly) then

    g(E(X)) ≤ E(g(X)).

The inequality above has g(x) = x². We use g(x) = − log(x), which is convex
because g′′(x) = x⁻² > 0. We get

    − log( Eθ0 [f(Xi, θ)/f(Xi, θ0)] ) ≤ Eθ0 [− log{f(Xi, θ)/f(Xi, θ0)}].
But

    Eθ0 [f(Xi, θ)/f(Xi, θ0)] = ∫ [f(x, θ)/f(x, θ0)] f(x, θ0) dx
                             = ∫ f(x, θ) dx
                             = 1.
We can reassemble the inequality and this calculation to get
Eθ0 [log{f (Xi , θ)/f (Xi , θ0 )}] ≤ 0
It is possible to prove that the inequality is strict unless the θ and θ0 densities
are actually the same. Let µ(θ) < 0 be this expected value. Then for each θ
we find

    n⁻¹ [ℓ(θ) − ℓ(θ0)] = n⁻¹ ∑ log[f(Xi, θ)/f(Xi, θ0)] → µ(θ).
This proves that the likelihood is probably higher at θ0 than at any other
single θ. This idea can often be stretched to prove that the MLE is
consistent.
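For the N(θ, 1) family this expected log likelihood ratio has the closed form Eθ0 log[f(X, θ)/f(X, θ0)] = −(θ − θ0)²/2, which a Monte Carlo sketch can confirm; the parameter values below are arbitrary choices.

```python
import random

random.seed(3)
theta0, theta = 0.0, 1.0

def log_ratio(x):
    # log[f(x, theta)/f(x, theta0)] for N(theta, 1) densities:
    # the normalizing constants cancel, leaving only the quadratics.
    return -(x - theta) ** 2 / 2 + (x - theta0) ** 2 / 2

n = 100_000
avg = sum(log_ratio(random.gauss(theta0, 1.0)) for _ in range(n)) / n
# Theory: the expectation is -(theta - theta0)**2 / 2 = -0.5 < 0.
```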
Definition: A sequence θ̂n of estimators of θ is consistent if θ̂n converges
to θ in probability (weak consistency) or almost surely (strong consistency).
Proto theorem: In regular problems the MLE θ̂ is consistent.
Now let us study the shape of the log likelihood near the true value θ0,
under the assumption that θ̂ is a root of the likelihood equations close to θ0.
We use Taylor expansion to write, for a 1 dimensional parameter θ,

    0 = U(θ̂) = U(θ0) + U′(θ0)(θ̂ − θ0) + U′′(θ̃)(θ̂ − θ0)²/2

for some θ̃ between θ0 and θ̂. (This form of the remainder in Taylor's theorem
is not valid for multivariate θ.) The derivatives of U are each sums of n terms
and so each should be roughly proportional to n in size. The second derivative
term is multiplied by the square of the small number θ̂ − θ0 and so should be
negligible compared to the first derivative term. If we ignore the second
derivative term we get

    −U′(θ0)(θ̂ − θ0) ≈ U(θ0).
Now let’s look at the terms U and U 0 .
In the normal case

    U(µ0) = ∑ (Xi − µ0)

has a normal distribution with mean 0 and variance n (SD √n). The derivative
is simply

    U′(µ) = −n

and the next derivative U′′ is 0. We will analyze the general case by noticing
that both U and U 0 are sums of iid random variables. Let
    Ui = ∂ log f(Xi, θ0)/∂θ

and

    Vi = −∂² log f(Xi, θ0)/∂θ².

In general, U(θ0) = ∑ Ui has mean 0 and approximately a normal distri-
bution. Here is how we check that:
    Eθ0(U(θ0)) = n Eθ0(U1)
               = n ∫ [∂ log f(x, θ0)/∂θ] f(x, θ0) dx
               = n ∫ [(∂f(x, θ0)/∂θ) / f(x, θ0)] f(x, θ0) dx
               = n ∫ ∂f(x, θ0)/∂θ dx
               = n [ (∂/∂θ) ∫ f(x, θ) dx ] at θ = θ0
               = n (∂/∂θ) 1
               = 0.

Notice that I have interchanged the order of differentiation and integra-
tion at one point. This step is usually justified by applying the dominated
convergence theorem to the definition of the derivative. The same tactic can
be applied by differentiating the identity which we just proved
    ∫ [∂ log f(x, θ)/∂θ] f(x, θ) dx = 0.

Taking the derivative of both sides with respect to θ and pulling the derivative
under the integral sign again gives

    ∫ (∂/∂θ){ [∂ log f(x, θ)/∂θ] f(x, θ) } dx = 0.

Do the derivative and get

    −∫ [∂² log f(x, θ)/∂θ²] f(x, θ) dx = ∫ [∂ log f(x, θ)/∂θ] [∂f(x, θ)/∂θ] dx
                                        = ∫ [∂ log f(x, θ)/∂θ]² f(x, θ) dx.


Definition: The Fisher Information is

    I(θ) = −Eθ(U′(θ)) = n Eθ(V1).

We refer to 𝓘(θ) = Eθ(V1) as the information in 1 observation, so that
I(θ) = n 𝓘(θ).

The idea is that I is a measure of how curved the log likelihood tends
to be at the true value of θ. Big curvature means precise estimates. Our
identity above is

    Varθ(U(θ)) = I(θ) = n 𝓘(θ).
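The identities E(U(θ0)) = 0 and Var(U(θ0)) = n Eθ0(V1) can be checked numerically. A sketch for the Exponential family with mean θ, where log f = −log θ − x/θ, so Ui = Xi/θ² − 1/θ, Vi = 2Xi/θ³ − 1/θ², and the information in one observation is 1/θ²; the parameter value and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(4)
theta0, n = 2.0, 50_000
xs = [random.expovariate(1.0 / theta0) for _ in range(n)]

# Per-observation score and negative second derivative for the
# Exponential(mean theta) model, evaluated at the true theta0.
u = [x / theta0**2 - 1 / theta0 for x in xs]
v = [2 * x / theta0**3 - 1 / theta0**2 for x in xs]

mean_u = statistics.fmean(u)     # identity: E(U_1) = 0
var_u = statistics.pvariance(u)  # identity: Var(U_1) = E(V_1)
mean_v = statistics.fmean(v)     # information per observation, 1/4 here
```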
Now we return to our Taylor expansion approximation

    −U′(θ0)(θ̂ − θ0) ≈ U(θ0)

and study the two appearances of U.
We have shown that U(θ0) = ∑ Ui is a sum of iid mean 0 random variables.
The central limit theorem thus proves that

    U(θ0)/√n ⇒ N(0, σ²)

where σ² = Var(Ui) = E(Vi) = 𝓘(θ0), the information in one observation.
Next observe that

    −U′(θ) = ∑ Vi

where again Vi = −∂Ui/∂θ. The law of large numbers can be applied to show

    −U′(θ0)/n → Eθ0[V1] = 𝓘(θ0).

Now manipulate our Taylor expansion as follows:

    √n (θ̂ − θ0) ≈ ( ∑ Vi / n )⁻¹ ( U(θ0)/√n ).

Apply Slutsky's Theorem to conclude that the right hand side converges in
distribution to N(0, σ²/𝓘(θ0)²), which simplifies, because σ² = 𝓘(θ0), to
N(0, 1/𝓘(θ0)).
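A simulation sketch of this conclusion for the Exponential family with mean θ: the MLE is X̄ and the information in one observation is 1/θ0², so √n(θ̂ − θ0) should be approximately N(0, θ0²). The parameter value, sample size, and number of replications are arbitrary choices.

```python
import random
import statistics

random.seed(5)
theta0, n, reps = 2.0, 200, 2000

# Repeatedly compute the MLE (the sample mean) from Exponential
# samples and look at sqrt(n) * (theta_hat - theta0).
zs = []
for _ in range(reps):
    xbar = statistics.fmean(random.expovariate(1.0 / theta0) for _ in range(n))
    zs.append(n ** 0.5 * (xbar - theta0))

sim_mean = statistics.fmean(zs)     # should be near 0
sim_var = statistics.pvariance(zs)  # should be near theta0**2 = 4
```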
Summary
In regular families:

• Under strong regularity conditions Jensen's inequality can be used to
demonstrate that the θ̂ which maximizes ℓ globally is consistent and that
this θ̂ is a root of the likelihood equations.

• It is generally easier to study ℓ only close to θ0. For instance define A
to be the event that ℓ is concave on the set of θ such that |θ − θ0| < δ
and the likelihood equations have a unique root in that set. Under
weaker conditions than the previous case we can prove that there is a
δ > 0 such that

    P(A) → 1.

In that case we can prove that the root θ̂ of the likelihood equations
mentioned in the definition of A is consistent.

• Sometimes we can only get an even weaker conclusion. Define B to be
the event that ℓ(θ) is concave for √n |θ − θ0| < L and the likelihood
equations have a unique root over this range. Again this root is consistent
but there might be other consistent roots of the likelihood equations.

• Under any of these scenarios there is a consistent root θ̂ of the likelihood
equations which is the closest root to the true value θ0. This root has
the property

    √n (θ̂ − θ0) ⇒ N(0, 1/𝓘(θ0)).

We usually simply say that the MLE is consistent and asymptotically
normal with an asymptotic variance which is the inverse of the Fisher infor-
mation. This assertion is actually valid for vector valued θ, where now I is a
matrix with ij-th entry

    Iij = −E[ ∂²ℓ / ∂θi ∂θj ].
Estimating Equations
The same ideas arise in almost any model where estimates are derived
by solving some equation. As an example I sketch large sample theory for
Generalized Linear Models.
Suppose that for i = 1, . . . , n we have observations of the numbers of
cancer cases Yi in some group of people characterized by values xi of some
covariates. You are supposed to think of xi as containing variables like age,
or a dummy for sex or average income or . . . A parametric regression model
for the Yi might postulate that Yi has a Poisson distribution with mean µi
where the mean µi depends somehow on the covariate values. Typically we
might assume that g(µi) = β0 + xi β where g is a so-called link function,
often for this case g(µ) = log(µ), and xi β is a matrix product with xi written
as a row vector and β a column vector. This is supposed to function as a
"linear regression model with Poisson errors". I will do as a special case
log(µi) = βxi where xi is a scalar.
The log likelihood is simply

    ℓ(β) = ∑ (Yi log(µi) − µi),

ignoring irrelevant factorials. The score function is, since log(µi) = βxi,

    U(β) = ∑ (Yi xi − xi µi) = ∑ xi (Yi − µi).

(Notice again that the score has mean 0 when you plug in the true parameter
value.) The key observation, however, is that it is not necessary to believe

that Yi has a Poisson distribution to make solving the equation U = 0 sensible.
Suppose only that log(E(Yi)) = xi β. Then we have assumed that
Eβ (U (β)) = 0
This was the key condition in proving that there was a root of the likelihood
equations which was consistent and here it is what is needed, roughly, to
prove that the equation U (β) = 0 has a consistent root β̂. Ignoring higher
order terms in a Taylor expansion will give
V (β)(β̂ − β) ≈ U (β)
where V = −U 0 . In the MLE case we had identities relating the expectation
of V to the variance of U. In general here we have

    Var(U) = ∑ xi² Var(Yi).

If Yi is Poisson with mean µi (and so Var(Yi) = µi) this is

    Var(U) = ∑ xi² µi.

Moreover we have

    Vi = xi² µi

and so

    V(β) = ∑ xi² µi.

The central limit theorem (the Lyapunov kind) will show that U(β) has an
approximate normal distribution with variance σU² = ∑ xi² Var(Yi) and so

    β̂ − β ≈ N(0, σU² / (∑ xi² µi)²).

If Var(Yi) = µi, as it is for the Poisson case, the asymptotic variance simpli-
fies to 1/∑ xi² µi.
Notice that other estimating equations are possible. People suggest al-
ternatives very often. If wi is any set of deterministic weights (even possibly
depending on µi) then we could define

    U(β) = ∑ wi (Yi − µi)
and still conclude that U = 0 probably has a consistent root which has an
asymptotic normal distribution. This idea is being used all over the place
these days: see, for example, Zeger and Liang's generalized estimating
equations (GEE), which the econometricians call the Generalized Method of
Moments.
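The steps above can be sketched numerically: simulate Poisson counts with log(µi) = βxi, then solve the estimating equation U(β) = 0 by Newton-Raphson using V = −U′. The true β = 0.3, covariate range, and sample size are arbitrary choices.

```python
import math
import random

random.seed(6)
beta_true, n = 0.3, 500
xs = [random.uniform(0.0, 2.0) for _ in range(n)]

def rpois(mu):
    # Knuth's simple Poisson sampler; adequate for the small means here.
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

ys = [rpois(math.exp(beta_true * x)) for x in xs]

def U(b):  # estimating equation: sum x_i (Y_i - mu_i), mu_i = exp(b x_i)
    return sum(x * (y - math.exp(b * x)) for x, y in zip(xs, ys))

def V(b):  # -U'(b) = sum x_i**2 mu_i
    return sum(x * x * math.exp(b * x) for x in xs)

b = 0.0
for _ in range(25):  # Newton-Raphson: b <- b + U(b)/V(b)
    b += U(b) / V(b)
```

Because ℓ is concave in β here (V(β) > 0 everywhere), these iterations settle on the unique root, which should land near the true value 0.3.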

Problems with maximum likelihood

1. In problems with many parameters the approximations don't work very
well and maximum likelihood estimators can be far from the right answer.
See your homework for the Neyman-Scott example where the MLE is not
consistent.

2. When there are multiple roots of the likelihood equation you must
choose the right root. To do so you might start with a different consistent
estimator and then apply some iterative scheme like Newton-Raphson to
the likelihood equations to find the MLE. It turns out that not many steps
of Newton-Raphson are generally required if the starting point is a
reasonable estimate.
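The Cauchy location family is a standard illustration of this strategy (an example of my choosing, not from the notes): its likelihood equation can have several roots, so one starts Newton-Raphson from the sample median, which is consistent. A sketch with an arbitrary true location 1.0 and sample size:

```python
import math
import random
import statistics

random.seed(7)
theta0, n = 1.0, 4001
# Standard Cauchy around theta0 via the inverse-CDF transform.
xs = [theta0 + math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def U(t):
    # Score of the Cauchy location model.
    return sum(2 * (x - t) / (1 + (x - t) ** 2) for x in xs)

def V(t):
    # -U'(t); its expectation per observation is the information, 1/2.
    return sum(2 * (1 - (x - t) ** 2) / (1 + (x - t) ** 2) ** 2 for x in xs)

t = statistics.median(xs)  # consistent starting value
for _ in range(5):         # a few Newton-Raphson steps suffice
    t += U(t) / V(t)
```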

Finding (good) preliminary Point Estimates

Method of Moments
Basic strategy: set sample moments equal to population moments and
solve for the parameters.
Definition: The rth sample moment (about the origin) is

    n⁻¹ ∑ Xi^r  (sum over i = 1, . . . , n).

The rth population moment is

    E(X^r).

Definition: Central moments are

    n⁻¹ ∑ (Xi − X̄)^r

and

    E[(X − µ)^r].
If we have p parameters we can estimate the parameters θ1, . . . , θp by
solving the system of p equations:

    µ′1 = X̄
    µ′2 = n⁻¹ ∑ Xi²

and so on to

    µ′p = n⁻¹ ∑ Xi^p.

You need to remember that the population moments µ′k will be formulas
involving the parameters.
Gamma Example
The Gamma(α, β) density is

    f(x; α, β) = [1/(β Γ(α))] (x/β)^(α−1) exp(−x/β) 1(x > 0)

and has mean

    µ′1 = αβ

and variance

    µ2 = αβ².

(The second moment about the origin is α(α + 1)β², so it is easier to match
the mean and the central second moment.) This gives the equations

    αβ = X̄
    αβ² = σ̂² ≡ n⁻¹ ∑ (Xi − X̄)².

Divide the second by the first to find the method of moments estimate of β:

    β̃ = σ̂² / X̄.

Then from the first equation get

    α̃ = X̄/β̃ = (X̄)² / σ̂².
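A numerical sketch of method of moments for the Gamma family, matching the sample mean and variance to E(X) = αβ and Var(X) = αβ²; the true parameter values and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(8)
alpha0, beta0, n = 3.0, 2.0, 100_000
# random.gammavariate uses a shape/scale parametrization: mean alpha*beta.
xs = [random.gammavariate(alpha0, beta0) for _ in range(n)]

xbar = statistics.fmean(xs)
s2 = statistics.pvariance(xs)

# Match alpha*beta = xbar and alpha*beta**2 = s2, then solve.
beta_mm = s2 / xbar
alpha_mm = xbar / beta_mm
```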

The equations are much easier to solve than the likelihood equations,
which involve the function

    ψ(α) = (d/dα) log(Γ(α)),

called the digamma function.
