
Stat 701 4/19/02

Proof of Wilks’ Theorem on LRT


This handout supplies full details, and a recap of carefully defined notations, for the argument given in class proving that the likelihood ratio test statistic for general null hypotheses restricting the values of a subset of parameter components converges in distribution to a chi-square with degrees of freedom equal to the number of components restricted. Throughout, we assume that the data X = {X_i}_{i=1}^n constitute an iid sample (of values in some Euclidean data-space) from a density f(x, ϑ) known except for the unknown parameter ϑ ∈ Θ ⊂ R^k. We assume that the density f satisfies all of the regularity conditions previously needed to ensure that maximum likelihood estimators are locally unique, consistent, and asymptotically normal. These conditions include the restriction that Θ contain an open neighborhood of the true value ϑ0 governing the data, and that Θ lie in some sufficiently small neighborhood of ϑ0 not depending upon n. The likelihood for the data X is denoted by L(X, ϑ), and the unrestricted Maximum Likelihood Estimator (MLE) for ϑ is ϑ̂.
Now consider the null hypothesis H0 : ϑ_{0,j} = 0 for 1 ≤ j ≤ r, where 0 < r < k is fixed. Define the restricted MLE ϑ̂res as the maximizer of L(X, ϑ) over parameter vectors ϑ ∈ Θ such that ϑ_j = 0 for 1 ≤ j ≤ r. We require a detailed set of notations designed to partition parameters, estimators, gradients, score statistics, and information matrices into parts respectively reflecting the first r and the last k − r components. Under the null hypothesis, the parameter vector ϑ0 and the restricted MLE have the form

$$ \vartheta_0 = \begin{pmatrix} 0 \\ \vartheta_{0*} \end{pmatrix}, \qquad \hat\vartheta^{\mathrm{res}} = \begin{pmatrix} 0 \\ \hat\vartheta_* \end{pmatrix}, \qquad \vartheta_{0*},\ \hat\vartheta_* \in \mathbb{R}^{k-r} $$

Next, denote by ∇_A, ∇_C respectively the gradient operators with respect to the first r and the last k − r components of ϑ. It is clear that the MLE definitions are equivalent to

$$ \nabla \log L(X, \hat\vartheta) \equiv \begin{pmatrix} \nabla_A \\ \nabla_C \end{pmatrix} \log L(X, \hat\vartheta) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \nabla_C \log L(X, \hat\vartheta^{\mathrm{res}}) = 0 $$

Similarly, we can partition the Fisher Information Matrix

$$ I(\vartheta_0) = -\frac{1}{n}\, E\!\left( \nabla^{\otimes 2} \log L(X, \vartheta_0) \right) = \frac{1}{n}\, E\!\left( (\nabla \log L(X, \vartheta_0))^{\otimes 2} \right) = \begin{pmatrix} A & B \\ B^t & C \end{pmatrix} $$
where A, B, C are respectively r × r, r × (k − r), (k − r) × (k − r) matrices.
The score statistic S is given the partitioned notation

$$ \frac{1}{\sqrt{n}}\, S(\vartheta_0) \equiv \frac{1}{\sqrt{n}}\, \nabla \log L(X, \vartheta_0) \equiv \begin{pmatrix} \xi \\ \eta \end{pmatrix} $$
Now we begin to assemble background results, already derived in class,
related to Taylor series expansion of ∇ log L(X, ϑ) as a vector function of
ϑ around the points ϑ0 , ϑ̂, ϑ̂res , all of which lie with high probability for
large n in a tiny neighborhood of ϑ0 . Throughout, we will not be explicit
about the remainder terms, but rather write ≈ to indicate that the left
and right hand sides differ by a (usually random) quantity which converges
under H0 to 0 in probability as n → ∞. Recall that (under H0 )
$$ \begin{pmatrix} \xi \\ \eta \end{pmatrix} \approx I(\vartheta_0)\, \sqrt{n}\, (\hat\vartheta - \vartheta_0), \qquad \eta \approx C\, \sqrt{n}\, (\hat\vartheta_* - \vartheta_{0*}) $$   (1)
where the second part of (1) is really the same fact as the first, applied to
the parametric Maximum-Likelihood estimation problem (under H0 ) for the
data X and the (k − r)-dimensional parameter ϑ0∗ .
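As a brief reminder of where the first part of (1) comes from (a sketch of the expansion recalled from class): since ∇ log L(X, ϑ̂) = 0, a Taylor expansion of the score around ϑ0 gives

$$ 0 = \frac{1}{\sqrt{n}}\, \nabla \log L(X, \hat\vartheta) \approx \frac{1}{\sqrt{n}}\, \nabla \log L(X, \vartheta_0) + \frac{1}{n}\, \nabla^{\otimes 2} \log L(X, \vartheta_0)\, \sqrt{n}\, (\hat\vartheta - \vartheta_0) \approx \begin{pmatrix} \xi \\ \eta \end{pmatrix} - I(\vartheta_0)\, \sqrt{n}\, (\hat\vartheta - \vartheta_0), $$

which rearranges to the first part of (1).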
The problem considered in Wilks’ Theorem is the asymptotic distribution
of the Likelihood Ratio Statistic
$$ -2 \log \Lambda \equiv 2 \log(T_1 / T_2), \qquad T_1 \equiv \frac{L(X, \hat\vartheta)}{L(X, \vartheta_0)}, \qquad T_2 \equiv \frac{L(X, \hat\vartheta^{\mathrm{res}})}{L(X, \vartheta_0)} $$   (2)
However, Taylor expansion around ϑ0 showed us that
$$ 2 \log(T_1) \approx \sqrt{n}\, (\hat\vartheta - \vartheta_0)^t \left( -\frac{1}{n} \nabla^{\otimes 2} \log L(X, \vartheta_0) \right) \sqrt{n}\, (\hat\vartheta - \vartheta_0) \approx \begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix} $$   (3)
and similarly, using the fact that under H0 the same data-sample X is
governed by a k − r dimensional parameter with associated information
matrix C, we have
$$ 2 \log(T_2) \approx \eta^t C^{-1} \eta $$   (4)
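Spelled out, this is the expansion (3) applied to the restricted (k − r)-dimensional problem, whose information matrix is C, combined with the second part of (1):

$$ 2 \log(T_2) \approx \left[ \sqrt{n}\, (\hat\vartheta_* - \vartheta_{0*}) \right]^t C \left[ \sqrt{n}\, (\hat\vartheta_* - \vartheta_{0*}) \right] \approx (C^{-1} \eta)^t\, C\, (C^{-1} \eta) = \eta^t C^{-1} \eta. $$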

Consider next another Taylor’s expansion around ϑ0 :
$$ \frac{1}{\sqrt{n}}\, \nabla_A \log L(X, \hat\vartheta^{\mathrm{res}}) \approx \frac{1}{\sqrt{n}}\, \nabla_A \log L(X, \vartheta_0) + \frac{1}{n}\, \nabla_A \nabla^t \log L(X, \vartheta_0) \begin{pmatrix} 0 \\ \sqrt{n}\, (\hat\vartheta_* - \vartheta_{0*}) \end{pmatrix} $$
Now we use the Law of Large Numbers result that $n^{-1} \nabla^{\otimes 2} \log L(X, \vartheta_0) \approx -I(\vartheta_0)$, as we have so often done before, to conclude from the last displayed equation that

$$ \frac{1}{\sqrt{n}}\, \nabla_A \log L(X, \hat\vartheta^{\mathrm{res}}) \approx \xi - B\, \sqrt{n}\, (\hat\vartheta_* - \vartheta_{0*}) \approx \xi - B C^{-1} \eta $$   (5)

where in the last step we have substituted the second part of (1).
Finally, we take the difference of the right-hand sides of formulas (3) and (4) to obtain, via (2):

$$ -2 \log \Lambda \approx \begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix} - \eta^t C^{-1} \eta $$   (6)

But the definition of the matrix inverse implies that

$$ \begin{pmatrix} B^t & C \end{pmatrix} I^{-1}(\vartheta_0) = \begin{pmatrix} B^t & C \end{pmatrix} \begin{pmatrix} A & B \\ B^t & C \end{pmatrix}^{-1} = \begin{pmatrix} 0 & \mathrm{Id} \end{pmatrix} $$

and therefore

$$ \begin{pmatrix} B C^{-1} \eta \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) = (C^{-1} \eta)^t \begin{pmatrix} B^t & C \end{pmatrix} I^{-1}(\vartheta_0) = (C^{-1} \eta)^t \begin{pmatrix} 0 & \mathrm{Id} \end{pmatrix} $$

This implies that the cross-terms in the expression

$$ \begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix} = \left[ \begin{pmatrix} \xi - B C^{-1} \eta \\ 0 \end{pmatrix} + \begin{pmatrix} B C^{-1} \eta \\ \eta \end{pmatrix} \right]^t I^{-1}(\vartheta_0) \left[ \begin{pmatrix} \xi - B C^{-1} \eta \\ 0 \end{pmatrix} + \begin{pmatrix} B C^{-1} \eta \\ \eta \end{pmatrix} \right] $$

are 0 (indeed, by the preceding display each cross-term equals $(C^{-1}\eta)^t \begin{pmatrix} 0 & \mathrm{Id} \end{pmatrix} \begin{pmatrix} \xi - B C^{-1} \eta \\ 0 \end{pmatrix} = 0$), implying that

$$ \begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix} = (\xi - B C^{-1} \eta)^t (A - B C^{-1} B^t)^{-1} (\xi - B C^{-1} \eta) + \eta^t C^{-1} \eta $$
Here we have used the fact that the upper-left block of I^{-1}(ϑ0) is equal to (A − B C^{-1} B^t)^{-1}, a linear algebra fact which can readily be proved by solving for u ∈ R^r, v ∈ R^{k−r} in terms of x ∈ R^r, y ∈ R^{k−r} by elimination in the equations:

$$ \begin{pmatrix} A & B \\ B^t & C \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} $$
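Carrying out that elimination (a brief sketch of the step left to the reader): the second block row gives v = C^{-1}(y − B^t u); substituting into the first block row,

$$ A u + B C^{-1} (y - B^t u) = x \quad \Longrightarrow \quad (A - B C^{-1} B^t)\, u = x - B C^{-1} y, $$

so u = (A − B C^{-1} B^t)^{-1}(x − B C^{-1} y), and the coefficient of x here is exactly the upper-left r × r block of I^{-1}(ϑ0).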

Now, substituting into (6), we find

$$ -2 \log \Lambda \approx (\xi - B C^{-1} \eta)^t (A - B C^{-1} B^t)^{-1} (\xi - B C^{-1} \eta) $$   (7)

We conclude that the last expression is distributed asymptotically as χ²_r upon recalling that (ξ^t, η^t)^t is asymptotically N(0, I(ϑ0)), which implies that ξ − B C^{-1} η is also multivariate normal (r-dimensional), nondegenerate, with mean 0 and variance

$$ E\!\left( (\xi - B C^{-1} \eta)(\xi - B C^{-1} \eta)^t \right) \approx A - B C^{-1} B^t $$
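Indeed, since Cov(ξ) ≈ A, Cov(ξ, η) ≈ B, and Cov(η) ≈ C, expanding the left-hand side gives

$$ E\!\left( (\xi - B C^{-1} \eta)(\xi - B C^{-1} \eta)^t \right) \approx A - B C^{-1} B^t - B C^{-1} B^t + B C^{-1} C\, C^{-1} B^t = A - B C^{-1} B^t. $$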

For future reference, we remark that the development given here shows (as in equation (5)), among other things, that the score statistic

$$ \frac{1}{\sqrt{n}}\, \nabla_A \log L(X, \hat\vartheta^{\mathrm{res}}) $$

is asymptotically the same as the 'adjusted' score statistic ξ − B C^{-1} η, which is asymptotically distributed as N(0, A − B C^{-1} B^t).
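As a purely illustrative numerical check of the theorem (not part of the argument above), the following Python sketch uses a toy model: X_i iid N(ϑ, I_2) in R^2 with true ϑ0 = (0, 1)^t and H0 : ϑ_1 = 0, so k = 2 and r = 1. In this model both MLEs have closed forms (sample means, with the first coordinate forced to 0 under H0), and the simulated values of −2 log Λ should match χ²_1 quantiles. The model, sample sizes, and all variable names are illustrative assumptions, not taken from the handout.

import numpy as np
from scipy import stats

# Toy model (assumption for illustration): X_i ~ N(theta, I_2), H0: theta_1 = 0.
rng = np.random.default_rng(0)
n, reps = 200, 5000
lrt = np.empty(reps)
for b in range(reps):
    x = rng.normal(loc=[0.0, 1.0], scale=1.0, size=(n, 2))   # data generated under H0
    theta_hat = x.mean(axis=0)                      # unrestricted MLE (sample mean)
    theta_res = np.array([0.0, theta_hat[1]])       # restricted MLE: first coordinate fixed at 0
    # With unit variances, -2 log Lambda = sum ||x_i - theta_res||^2 - sum ||x_i - theta_hat||^2
    lrt[b] = np.sum((x - theta_res) ** 2) - np.sum((x - theta_hat) ** 2)

# Empirical quantiles of -2 log Lambda versus chi-square_1 quantiles (should be close)
for q in (0.5, 0.9, 0.95, 0.99):
    print(q, round(float(np.quantile(lrt, q)), 3), round(float(stats.chi2.ppf(q, df=1)), 3))

In this particular toy case −2 log Λ equals n (X̄_1)², which is exactly χ²_1 for every n, so the agreement is essentially exact; for models where the MLEs are not linear in the data, the match is only asymptotic, as in the theorem.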
