Proof of Wilks’ Theorem for the Likelihood Ratio Test
Within the setting developed in class, where the k-dimensional parameter ϑ has its first r coordinates fixed under H0 at their values in ϑ0, the information matrix is

$$
I(\vartheta_0) \;=\; \frac{-1}{n}\,E\!\left(\nabla^{\otimes 2}\log L(X,\vartheta_0)\right)
\;=\; \frac{1}{n}\,E\!\left(\big(\nabla\log L(X,\vartheta_0)\big)^{\otimes 2}\right)
\;=\; \begin{pmatrix} A & B \\ B^t & C \end{pmatrix}
$$

where A, B, C are respectively r × r, r × (k − r), (k − r) × (k − r) matrices.
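To make the partition concrete, here is a minimal Monte Carlo sketch (not part of the notes) in a hypothetical Gamma(a, b) model (shape a, rate b), whose information matrix is known in closed form; the model choice, the seed, and all identifiers are assumptions of this sketch.

```python
# Hedged sketch: check the two expressions for I(theta_0) by Monte Carlo
# in an assumed Gamma(a, b) model, and read off the A, B, C partition.
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(0)
a0, b0, n = 2.0, 3.0, 200_000
x = rng.gamma(shape=a0, scale=1.0 / b0, size=n)   # numpy's scale is 1/rate

# per-observation score of log f(x; a, b) = a log b - log Gamma(a) + (a-1) log x - b x
score = np.stack([np.log(b0) - digamma(a0) + np.log(x),   # d/da
                  a0 / b0 - x], axis=1)                   # d/db

I_outer = score.T @ score / n            # (1/n) E (grad log L)^{tensor 2}
I_exact = np.array([[polygamma(1, a0), -1.0 / b0],        # A   B
                    [-1.0 / b0,        a0 / b0 ** 2]])    # B^t C
print(np.round(I_outer, 3))              # approximately equal to I_exact
print(np.round(I_exact, 3))              # A = psi'(a0), B = -1/b0, C = a0/b0^2
```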
The score statistic S is written in the partitioned notation

$$
\frac{1}{\sqrt{n}}\,S(\vartheta_0) \;\equiv\; \frac{1}{\sqrt{n}}\,\nabla\log L(X,\vartheta_0)
\;\equiv\; \begin{pmatrix} \xi \\ \eta \end{pmatrix}
$$

where ξ collects the first r coordinates of the normalized score and η the remaining k − r.
Now we begin to assemble background results, already derived in class, related to the Taylor series expansion of ∇ log L(X, ϑ) as a vector function of ϑ around the points ϑ0, ϑ̂, ϑ̂res, all of which lie with high probability for large n in a tiny neighborhood of ϑ0. Throughout, we will not be explicit about the remainder terms, but rather write ≈ to indicate that the left- and right-hand sides differ by a (usually random) quantity which converges under H0 to 0 in probability as n → ∞. Recall that (under H0)
$$
\begin{pmatrix} \xi \\ \eta \end{pmatrix} \;\approx\; I(\vartheta_0)\,\sqrt{n}\,\big(\hat\vartheta - \vartheta_0\big)\,,
\qquad
\eta \;\approx\; C\,\sqrt{n}\,\big(\hat\vartheta^{*} - \vartheta_0^{*}\big)
\tag{1}
$$
where the second part of (1) is really the same fact as the first, applied to the parametric Maximum-Likelihood estimation problem (under H0) for the data X and the (k − r)-dimensional free parameter ϑ∗, whose true value is ϑ0∗.
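Continuing the same hypothetical Gamma example, a hedged sketch of both parts of (1): the unrestricted MLE ϑ̂ is found numerically, while under H0: a = a0 the restricted MLE of b is b̂res = a0/x̄ (an exact formula in this assumed model).

```python
# Hedged check of (1) in the assumed Gamma(a, b) model, H0: a = a0 (r = 1).
import numpy as np
from scipy.optimize import minimize
from scipy.special import digamma, gammaln, polygamma

rng = np.random.default_rng(1)
a0, b0, n = 2.0, 3.0, 5_000
x = rng.gamma(shape=a0, scale=1.0 / b0, size=n)

def negloglik(log_ab):                            # parametrized by logs for safety
    a, b = np.exp(log_ab)
    return -(n * a * np.log(b) - n * gammaln(a)
             + (a - 1) * np.log(x).sum() - b * x.sum())

theta_hat = np.exp(minimize(negloglik, np.log([a0, b0]), method="Nelder-Mead").x)
b_res = a0 / x.mean()                             # restricted MLE of b under H0

xi  = (np.log(b0) - digamma(a0) + np.log(x)).sum() / np.sqrt(n)
eta = (a0 / b0 - x).sum() / np.sqrt(n)
I0  = np.array([[polygamma(1, a0), -1 / b0], [-1 / b0, a0 / b0 ** 2]])

print([xi, eta], I0 @ (np.sqrt(n) * (theta_hat - [a0, b0])))   # first part of (1)
print(eta, (a0 / b0 ** 2) * np.sqrt(n) * (b_res - b0))         # second part of (1)
```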
The problem considered in Wilks’ Theorem is the asymptotic distribution of the Likelihood Ratio Statistic

$$
-2\log\Lambda \;\equiv\; 2\log(T_1/T_2)\,,
\qquad
T_1 \equiv \frac{L(X,\hat\vartheta)}{L(X,\vartheta_0)}\,,
\qquad
T_2 \equiv \frac{L(X,\hat\vartheta_{\mathrm{res}})}{L(X,\vartheta_0)}
\tag{2}
$$
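Since T1 and T2 share the same reference point ϑ0, it cancels in the ratio; an elementary rearrangement (recorded here for clarity) gives

$$
-2\log\Lambda \;=\; 2\log T_1 \;-\; 2\log T_2
\;=\; 2\left[\log L(X,\hat\vartheta) \;-\; \log L(X,\hat\vartheta_{\mathrm{res}})\right]
$$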
However, Taylor expansion around ϑ0 showed us that

$$
2\log(T_1) \;\approx\; \sqrt{n}\,\big(\hat\vartheta-\vartheta_0\big)^t
\left(\frac{-1}{n}\,\nabla^{\otimes 2}\log L(X,\vartheta_0)\right)
\sqrt{n}\,\big(\hat\vartheta-\vartheta_0\big)
\;\approx\;
\begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix}
\tag{3}
$$

where the second ≈ combines n^{-1} ∇^{⊗2} log L(X, ϑ0) ≈ −I(ϑ0) with the first part of (1),
and similarly, using the fact that under H0 the same data sample X is governed by a (k − r)-dimensional parameter with associated information matrix C, we have

$$
2\log(T_2) \;\approx\; \eta^t\, C^{-1}\,\eta
\tag{4}
$$
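As a hedged numerical sanity check (not from the notes), the sketch below compares both sides of (3) and (4) in the same assumed Gamma(a, b) model with H0: a = a0; every identifier is an assumption of this sketch.

```python
# Hedged check of (3) and (4) in the assumed Gamma(a, b) model.
import numpy as np
from scipy.optimize import minimize
from scipy.special import digamma, gammaln, polygamma

rng = np.random.default_rng(5)
a0, b0, n = 2.0, 3.0, 5_000
x = rng.gamma(shape=a0, scale=1.0 / b0, size=n)

def loglik(a, b):
    return (n * a * np.log(b) - n * gammaln(a)
            + (a - 1) * np.log(x).sum() - b * x.sum())

theta_hat = np.exp(minimize(lambda t: -loglik(*np.exp(t)),
                            np.log([a0, b0]), method="Nelder-Mead").x)
b_res = a0 / x.mean()                    # restricted MLE of b under H0

xi  = (np.log(b0) - digamma(a0) + np.log(x)).sum() / np.sqrt(n)
eta = (a0 / b0 - x).sum() / np.sqrt(n)
s   = np.array([xi, eta])
I0  = np.array([[polygamma(1, a0), -1 / b0], [-1 / b0, a0 / b0 ** 2]])

print(2 * (loglik(*theta_hat) - loglik(a0, b0)), s @ np.linalg.inv(I0) @ s)  # (3)
print(2 * (loglik(a0, b_res) - loglik(a0, b0)), eta ** 2 * b0 ** 2 / a0)     # (4)
```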
Consider next another Taylor expansion around ϑ0, where ∇A denotes the gradient with respect to the first r coordinates of ϑ:

$$
\frac{1}{\sqrt{n}}\,\nabla_A \log L(X,\hat\vartheta_{\mathrm{res}})
\;\approx\;
\frac{1}{\sqrt{n}}\,\nabla_A \log L(X,\vartheta_0)
\;+\;
\frac{1}{n}\,\nabla_A \nabla^t \log L(X,\vartheta_0)
\begin{pmatrix} 0 \\ \sqrt{n}\,\big(\hat\vartheta^{*} - \vartheta_0^{*}\big) \end{pmatrix}
$$
Now we use the Law of Large Numbers result that n^{-1} ∇^{⊗2} log L(X, ϑ0) ≈ −I(ϑ0), as we have so often done before, to conclude from the last displayed equation that

$$
\frac{1}{\sqrt{n}}\,\nabla_A \log L(X,\hat\vartheta_{\mathrm{res}})
\;\approx\; \xi \;-\; B\,\sqrt{n}\,\big(\hat\vartheta^{*} - \vartheta_0^{*}\big)
\;\approx\; \xi \;-\; B\,C^{-1}\eta
\tag{5}
$$
where in the last step we have substituted the second part of (1).
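A hedged numerical check of (5) in the same assumed Gamma setting; here ∇A log L is the derivative in a, evaluated at (a0, b̂res).

```python
# Hedged check of (5): A-block score at the restricted MLE vs xi - B C^{-1} eta.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(4)
a0, b0, n = 2.0, 3.0, 5_000
x = rng.gamma(shape=a0, scale=1.0 / b0, size=n)

B, C = -1.0 / b0, a0 / b0 ** 2
b_res = a0 / x.mean()                     # restricted MLE of b under H0: a = a0
xi  = (np.log(b0) - digamma(a0) + np.log(x)).sum() / np.sqrt(n)
eta = (a0 / b0 - x).sum() / np.sqrt(n)

lhs = (np.log(b_res) - digamma(a0) + np.log(x)).sum() / np.sqrt(n)
print(lhs, xi - B * eta / C)              # the two sides of (5) nearly agree
```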
Finally, by (2), −2 log Λ = 2 log T1 − 2 log T2, so subtracting (4) from (3) gives

$$
-2\log\Lambda \;\approx\;
\begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix}
\;-\; \eta^t\, C^{-1}\,\eta
\tag{6}
$$
We now decompose

$$
\begin{pmatrix} \xi \\ \eta \end{pmatrix}
\;=\;
\begin{pmatrix} \xi - B\,C^{-1}\eta \\ 0 \end{pmatrix}
\;+\;
\begin{pmatrix} B\,C^{-1}\eta \\ \eta \end{pmatrix}
$$

and observe that

$$
\begin{pmatrix} B\,C^{-1}\eta \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0)
\;=\;
(C^{-1}\eta)^t \begin{pmatrix} B^t & C \end{pmatrix} I^{-1}(\vartheta_0)
\;=\;
(C^{-1}\eta)^t \begin{pmatrix} 0 & \mathrm{Id} \end{pmatrix}
$$

and therefore the cross terms in the expanded quadratic form are 0, while the second summand contributes (C^{-1}η)^t η = η^t C^{-1} η, implying that

$$
\begin{pmatrix} \xi \\ \eta \end{pmatrix}^t I^{-1}(\vartheta_0) \begin{pmatrix} \xi \\ \eta \end{pmatrix}
\;=\;
\big(\xi - B C^{-1}\eta\big)^t \big(A - B C^{-1} B^t\big)^{-1} \big(\xi - B C^{-1}\eta\big)
\;+\; \eta^t C^{-1} \eta
$$
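The vanishing cross terms and the Schur-complement quadratic form are pure linear algebra, so they can be checked with a random partitioned positive-definite matrix (everything below is an illustrative assumption):

```python
# Hedged check: w^t I^{-1} w - eta^t C^{-1} eta equals the Schur-complement form.
import numpy as np

rng = np.random.default_rng(2)
r, k = 2, 5
M = rng.normal(size=(k, k)); I0 = M @ M.T + k * np.eye(k)   # random SPD "information"
A, B, C = I0[:r, :r], I0[:r, r:], I0[r:, r:]
w = rng.normal(size=k); xi, eta = w[:r], w[r:]

lhs = w @ np.linalg.inv(I0) @ w - eta @ np.linalg.inv(C) @ eta
u = xi - B @ np.linalg.inv(C) @ eta
rhs = u @ np.linalg.inv(A - B @ np.linalg.inv(C) @ B.T) @ u
print(np.isclose(lhs, rhs))    # True
```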
Here we have used the fact that the upper-left block of I^{-1}(ϑ0) is equal to (A − B C^{-1} B^t)^{-1}, a linear algebra fact which can readily be proved by solving for u ∈ R^r, v ∈ R^{k−r} in terms of x ∈ R^r, y ∈ R^{k−r} by elimination in the equations:

$$
\begin{pmatrix} A & B \\ B^t & C \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix}
\;=\;
\begin{pmatrix} x \\ y \end{pmatrix}
$$
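A hedged two-line verification of the block-inverse fact itself, again with a random positive-definite partitioned matrix:

```python
# Hedged check: upper-left block of inv(I0) equals inv(A - B C^{-1} B^t).
import numpy as np

rng = np.random.default_rng(6)
r, k = 2, 5
M = rng.normal(size=(k, k)); I0 = M @ M.T + k * np.eye(k)   # random SPD matrix
A, B, C = I0[:r, :r], I0[:r, r:], I0[r:, r:]
print(np.allclose(np.linalg.inv(I0)[:r, :r],
                  np.linalg.inv(A - B @ np.linalg.inv(C) @ B.T)))  # True
```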
For future reference, we remark that the development given here shows (as in equation (5)), among other things, that the score statistic

$$
\frac{1}{\sqrt{n}}\,\nabla_A \log L(X,\hat\vartheta_{\mathrm{res}})
$$

agrees with ξ − B C^{-1} η up to a remainder converging to 0 in probability under H0.
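To close, a hedged simulation in the same assumed Gamma model: by the development above (and by Wilks’ Theorem), −2 log Λ is asymptotically the χ²(r) quadratic form in the restricted score, whose limiting variance is A − B C^{-1} B^t; here r = 1.

```python
# Hedged closing simulation: distribution of -2 log Lambda via the restricted
# score, in the assumed Gamma(a, b) model with H0: a = a0 (so r = 1).
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import chi2

rng = np.random.default_rng(3)
a0, b0, n, reps = 2.0, 3.0, 2_000, 2_000
A, B, C = polygamma(1, a0), -1.0 / b0, a0 / b0 ** 2

S = np.empty(reps)
for i in range(reps):
    x = rng.gamma(shape=a0, scale=1.0 / b0, size=n)
    b_res = a0 / x.mean()                       # restricted MLE under H0: a = a0
    S[i] = (np.log(b_res) - digamma(a0) + np.log(x)).sum() / np.sqrt(n)

W = S ** 2 / (A - B ** 2 / C)                   # approx -2 log Lambda, ~ chi2(1)
print(S.var(), A - B ** 2 / C)                  # empirical vs theoretical variance
print(np.mean(W > chi2.ppf(0.95, df=1)))        # rejection rate, ~ 0.05
```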