Three Classical Tests: Wald, LM (Score), and LR Tests (Econ 620)
If the constraint is not binding (i.e., the null hypothesis is true), the Lagrange multiplier associated with the constraint is zero. We can therefore construct a test measuring how far the Lagrange multiplier is from zero: the LM test. Finally, another way to check the validity of the null hypothesis is to check the distance between the two maximized values of the log-likelihood function; this is the LR test:
$$L(\hat\theta) - L(\theta_0) = \log\frac{f(y;\hat\theta)}{f(y;\theta_0)}$$
If the null hypothesis is true, this statistic again should not be far from zero.
Throughout, the log-likelihood function of the sample is
$$L(\theta) = \sum_{i=1}^{n}\log f(y_i \mid x_i;\theta)$$
We assume all the regularity conditions needed for existence, consistency, and asymptotic normality of the MLE, and denote the MLE by $\hat\theta_n$. The hypotheses of interest are
$$H_0: g(\theta_0) = 0 \qquad H_A: g(\theta_0) \neq 0$$
where $g(\cdot)$ is an $(r\times1)$ vector of restrictions on the $(p\times1)$ parameter vector $\theta$.
Wald test
Proposition 1
$$\xi_n^W = n\, g(\hat\theta_n)'\left(\frac{\partial g(\hat\theta_n)}{\partial\theta'}\, I^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right)^{-1} g(\hat\theta_n) \sim \chi^2(r) \text{ under } H_0,$$
where $I(\theta) = E_X E_\theta\left[-\frac{\partial^2 \log f(Y\mid X;\theta)}{\partial\theta\,\partial\theta'}\right]$ and $I^{-1}(\hat\theta_n)$ is the inverse of $I$ evaluated at $\theta = \hat\theta_n$.
⇒ By the asymptotic normality of the MLE,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N\left(0,\; I^{-1}(\theta_0)\right) \qquad (1)$$
Taking a first order Taylor series expansion of $g(\hat\theta_n)$ around the true value $\theta_0$, we have
$$g(\hat\theta_n) = g(\theta_0) + \frac{\partial g(\theta_0)}{\partial\theta'}\,(\hat\theta_n - \theta_0) + o_p(1)$$
$$\sqrt{n}\left(g(\hat\theta_n) - g(\theta_0)\right) = \frac{\partial g(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0) + o_p(1) \qquad (2)$$
Hence, combining (1) and (2) gives
$$\sqrt{n}\left(g(\hat\theta_n) - g(\theta_0)\right) \xrightarrow{d} N\left(0,\; \frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right) \qquad (3)$$
Under the null hypothesis, $g(\theta_0) = 0$. Therefore,
$$\sqrt{n}\, g(\hat\theta_n) \xrightarrow{d} N\left(0,\; \frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right) \qquad (4)$$
By forming the quadratic form of the normal random variables, we can conclude that
$$n\, g(\hat\theta_n)'\left(\frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right)^{-1} g(\hat\theta_n) \sim \chi^2(r) \text{ under } H_0. \qquad (5)$$
The statistic in (5) is infeasible since it depends on the unknown parameter $\theta_0$. However, we can consistently estimate the terms inside the inverted bracket by evaluating them at the MLE $\hat\theta_n$. Therefore,
$$\xi_n^W = n\, g(\hat\theta_n)'\left(\frac{\partial g(\hat\theta_n)}{\partial\theta'}\, I^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right)^{-1} g(\hat\theta_n) \sim \chi^2(r) \text{ under } H_0.$$
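As a concrete illustration (a sketch of my own, not part of the original notes: the exponential model, the restriction $g(\theta)=\theta-1$, and all variable names are illustrative), the following code computes $\xi_n^W$ for a one-parameter MLE and compares it with $\chi^2(1)$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Simulate from an exponential model f(y; theta) = theta * exp(-theta * y).
theta_true, n = 1.0, 500
y = rng.exponential(scale=1.0 / theta_true, size=n)

# Unconstrained MLE and per-observation information for this model:
# theta_hat = 1 / mean(y),  I(theta) = 1 / theta^2.
theta_hat = 1.0 / y.mean()
I_hat = 1.0 / theta_hat**2

# Restriction g(theta) = theta - 1 = 0, so dg/dtheta = 1 and r = 1.
g = theta_hat - 1.0
dg = 1.0

# Wald statistic: n * g' (dg I^{-1} dg')^{-1} g, compared with chi^2(1).
xi_W = n * g**2 / (dg * (1.0 / I_hat) * dg)
print("Wald statistic:", xi_W, " p-value:", chi2.sf(xi_W, df=1))
```

Generating the data with a rate different from one makes $\xi_n^W$ grow with $n$, which is the consistency property discussed in the bullets below.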
• An asymptotic test which rejects the null hypothesis with probability approaching one when the alternative hypothesis is true is called a consistent test. Namely, a consistent test has asymptotic power of 1.
• The Wald test discussed above is a consistent test. A heuristic argument is that if the alternative hypothesis is true instead of the null hypothesis, $g(\hat\theta_n) \xrightarrow{p} g(\theta_0) \neq 0$. Therefore, $g(\hat\theta_n)'\left(\frac{\partial g(\hat\theta_n)}{\partial\theta'}\, I^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right)^{-1} g(\hat\theta_n)$ converges to a positive constant instead of zero. Since this constant is multiplied by $n$, $\xi_n^W \to \infty$ as $n \to \infty$, which implies that we reject the null hypothesis with probability approaching one when the alternative is true.
• Another form of the Wald test statistic is given by (caution: this is quite confusing)
$$\xi_n^W = g(\hat\theta_n)'\left(\frac{\partial g(\hat\theta_n)}{\partial\theta'}\, I_n^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right)^{-1} g(\hat\theta_n) \sim \chi^2(r) \text{ under } H_0,$$
where $I_n(\theta) = E_X E_\theta\left[-\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\right] = E_X E_\theta\left[-\sum_{i=1}^{n}\frac{\partial^2 \log f(y_i\mid x_i;\theta)}{\partial\theta\,\partial\theta'}\right]$ and $I_n^{-1}(\hat\theta_n)$ is the inverse of $I_n$ evaluated at $\theta = \hat\theta_n$. Note that $I_n = nI$.
• A quite common form of the null hypothesis is a zero restriction on a subset of the parameters, i.e.,
$$H_0: \theta_1 = 0 \qquad H_A: \theta_1 \neq 0$$
where $\theta_1$ is a $(q\times1)$ subvector of $\theta$ with $q < p$. Then, the Wald statistic is given by
$$\xi_n^W = n\,\hat\theta_1'\left(I^{11}(\hat\theta_n)\right)^{-1}\hat\theta_1 \sim \chi^2(q) \text{ under } H_0,$$
where $I^{11}(\theta)$ is the upper-left block of the inverse information matrix. Partitioning
$$I(\theta) = \begin{pmatrix} I_{11}(\theta) & I_{12}(\theta) \\ I_{21}(\theta) & I_{22}(\theta) \end{pmatrix},$$
the formula for the partitioned inverse gives $I^{11}(\theta) = \left(I_{11}(\theta) - I_{12}(\theta)\, I_{22}^{-1}(\theta)\, I_{21}(\theta)\right)^{-1}$ (see the numerical check below). $I^{11}(\hat\theta_n)$ is $I^{11}(\theta)$ evaluated at the MLE.
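The partitioned-inverse step is easy to check numerically. A minimal sketch (the information matrix below is a made-up positive definite matrix, not anything from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up positive definite "information matrix" with p = 5, partitioned with q = 2.
A = rng.normal(size=(5, 5))
I_theta = A @ A.T + 5.0 * np.eye(5)

q = 2
I11, I12 = I_theta[:q, :q], I_theta[:q, q:]
I21, I22 = I_theta[q:, :q], I_theta[q:, q:]

# Partitioned-inverse formula for the upper-left block of I(theta)^{-1} ...
I11_block_formula = np.linalg.inv(I11 - I12 @ np.linalg.inv(I22) @ I21)
# ... agrees with the block taken directly from the full inverse.
I11_block_direct = np.linalg.inv(I_theta)[:q, :q]

print(np.allclose(I11_block_formula, I11_block_direct))  # True
```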
LM test (Score test)
If we have an a priori reason or evidence to believe that the parameter vector satisfies some restrictions of the form $g(\theta) = 0$, incorporating this information into the maximization of the likelihood function through constrained optimization will improve the efficiency of the estimator relative to the unconstrained MLE. We solve the problem
$$\max_{\theta}\; L(\theta) + \lambda' g(\theta)$$
whose first order conditions are
$$\frac{\partial L(\tilde\theta_n)}{\partial\theta} + \frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\tilde\lambda_n = 0 \qquad (6)$$
$$g(\tilde\theta_n) = 0 \qquad (7)$$
where $\tilde\theta_n$, the solution of the constrained maximization problem, is called the constrained MLE and $\tilde\lambda_n$ is the vector of Lagrange multipliers. The LM test is based on the idea that the properly scaled Lagrange multiplier has an asymptotically normal distribution.
Proposition 2
$$\xi_n^S = \frac{1}{n}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta'}\, I^{-1}(\tilde\theta_n)\,\frac{\partial L(\tilde\theta_n)}{\partial\theta} = \frac{1}{n}\,\tilde\lambda_n'\,\frac{\partial g(\tilde\theta_n)}{\partial\theta'}\, I^{-1}(\tilde\theta_n)\,\frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\tilde\lambda_n \sim \chi^2(r) \text{ under } H_0.$$
⇒ First order Taylor expansions of $g(\hat\theta_n)$ and $g(\tilde\theta_n)$ around $\theta_0$ give, ignoring $o_p(1)$ terms,
$$\sqrt{n}\, g(\hat\theta_n) = \sqrt{n}\, g(\theta_0) + \frac{\partial g(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0) \qquad (8)$$
$$\sqrt{n}\, g(\tilde\theta_n) = \sqrt{n}\, g(\theta_0) + \frac{\partial g(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\tilde\theta_n - \theta_0) \qquad (9)$$
Noting that $g(\tilde\theta_n) = 0$ from (7) and subtracting (9) from (8), we have
$$\sqrt{n}\, g(\hat\theta_n) = \frac{\partial g(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \tilde\theta_n) \qquad (10)$$
On the other hand, taking first order Taylor series expansions of $\frac{\partial L(\hat\theta_n)}{\partial\theta}$ and $\frac{\partial L(\tilde\theta_n)}{\partial\theta}$ around $\theta_0$ gives, ignoring $o_p(1)$ terms,
$$\frac{\partial L(\hat\theta_n)}{\partial\theta} = \frac{\partial L(\theta_0)}{\partial\theta} + \frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,(\hat\theta_n - \theta_0) \;\Rightarrow\; \frac{1}{\sqrt{n}}\frac{\partial L(\hat\theta_n)}{\partial\theta} = \frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta} + \frac{1}{n}\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0)$$
$$\Rightarrow\; \frac{1}{\sqrt{n}}\frac{\partial L(\hat\theta_n)}{\partial\theta} = \frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta} - I(\theta_0)\,\sqrt{n}\,(\hat\theta_n - \theta_0) \qquad (11)$$
Note that $-\frac{1}{n}\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'} = -\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \log f(y_i\mid x_i;\theta_0)}{\partial\theta\,\partial\theta'} \xrightarrow{p} I(\theta_0)$ by the law of large numbers. Similarly,
$$\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta} = \frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta} - I(\theta_0)\,\sqrt{n}\,(\tilde\theta_n - \theta_0) \qquad (12)$$
Considering the fact that $\frac{\partial L(\hat\theta_n)}{\partial\theta} = 0$ by the FOC of the unconstrained maximization problem, we take the difference between (11) and (12). Then,
$$\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta} = -I(\theta_0)\,\sqrt{n}\,(\tilde\theta_n - \hat\theta_n) = I(\theta_0)\,\sqrt{n}\,(\hat\theta_n - \tilde\theta_n) \qquad (13)$$
Hence,
$$\sqrt{n}\,(\hat\theta_n - \tilde\theta_n) = I^{-1}(\theta_0)\,\frac{1}{\sqrt{n}}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta} \qquad (14)$$
From (10) and (14), we obtain
$$\sqrt{n}\, g(\hat\theta_n) = \frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{1}{\sqrt{n}}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta} \qquad (15)$$
Substituting the constrained FOC (6), $\frac{\partial L(\tilde\theta_n)}{\partial\theta} = -\frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\tilde\lambda_n$, and replacing $\frac{\partial g(\tilde\theta_n)'}{\partial\theta}$ with $\frac{\partial g(\theta_0)'}{\partial\theta}$ (which is valid asymptotically), we can solve for the scaled multiplier:
$$\frac{\tilde\lambda_n}{\sqrt{n}} = -\left(\frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right)^{-1}\sqrt{n}\, g(\hat\theta_n) \qquad (16)$$
From (4), under the null hypothesis, $\sqrt{n}\, g(\hat\theta_n) \xrightarrow{d} N\left(0,\; \frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right)$. Consequently, we have
$$\frac{\tilde\lambda_n}{\sqrt{n}} \xrightarrow{d} N\left(0,\; \left(\frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right)^{-1}\right) \qquad (17)$$
Again, forming the quadratic form of the normal random variables, we obtain
$$\frac{1}{n}\,\tilde\lambda_n'\,\frac{\partial g(\theta_0)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\,\tilde\lambda_n \sim \chi^2(r) \text{ under } H_0. \qquad (18)$$
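As a quick sanity check of (17), here is a small simulation sketch of my own (the normal mean model and all names are illustrative, not from the notes), using the restriction $g(\mu)=\mu=0$ with known unit variance:

```python
import numpy as np

rng = np.random.default_rng(2)

# Normal mean model with known variance 1 and restriction g(mu) = mu = 0.
# With the Lagrangian L(mu) + lam * g(mu), the FOC at mu_tilde = 0 gives
# lam_tilde = -sum(y_i), so lam_tilde / sqrt(n) should be roughly N(0, 1),
# matching (17) because dg/dmu = 1 and I(mu) = 1.
n, reps = 200, 5000
y = rng.normal(0.0, 1.0, size=(reps, n))        # data generated under H0
lam_scaled = -y.sum(axis=1) / np.sqrt(n)        # scaled Lagrange multipliers

print(lam_scaled.mean(), lam_scaled.var())      # close to 0 and 1
```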
The statistic in (18) still involves the unknown $\theta_0$ and $I(\theta_0)$; in practice these are replaced by the constrained MLE $\tilde\theta_n$ and a consistent estimate of the information matrix. If we choose the second approximation, i.e. the average outer product of the individual scores evaluated at $\tilde\theta_n$, the LM test statistic becomes
$$\xi_n^S = \frac{1}{n}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta'}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}\,\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\right)^{-1}\frac{\partial L(\tilde\theta_n)}{\partial\theta}$$
$$= \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}\,\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\right)^{-1}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}$$
$$= \sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\left(\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}\,\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\right)^{-1}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}$$
This expression should look familiar: it has the form of a quadratic form in a projection matrix, and the intuition is correct. The (uncentered) $R_u^2$ from the regression of $\mathbf{1}$ on $\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}$ is given by
$$R_u^2 = \frac{\mathbf{1}'X(X'X)^{-1}X'\,X(X'X)^{-1}X'\mathbf{1}}{\mathbf{1}'\mathbf{1}} = \frac{\mathbf{1}'X(X'X)^{-1}X'\mathbf{1}}{\mathbf{1}'\mathbf{1}}$$
where $X$ is the $(n\times p)$ matrix whose $i$-th row is $\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}$ and $\mathbf{1}$ is the $(n\times1)$ vector of ones.
Then,
$$R_u^2 = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\left(\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}\,\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta'}\right)^{-1}\sum_{i=1}^{n}\frac{\partial \log f(y_i\mid x_i;\tilde\theta_n)}{\partial\theta}$$
Hence,
$$\xi_n^S = n R_u^2$$
This is quite an interesting result, since computing the LM statistic amounts to nothing more than an OLS regression: we regress $\mathbf{1}$ on the scores evaluated at the constrained MLE, compute the uncentered $R^2$, and multiply it by the number of observations to obtain the LM statistic. One thing to be careful about is that most software will by default report the centered $R^2$, which is not available in this case: the dependent variable is a constant, so the denominator of the centered $R^2$ (the centered total sum of squares) is simply zero.
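Here is a rough sketch of the $nR_u^2$ recipe in code (my own toy example, not from the notes: the Poisson model, the restriction $\lambda=\lambda_0$, and all names are illustrative; under this simple restriction the constrained MLE is just $\lambda_0$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

# Poisson model with restriction H0: lam = lam0; the constrained MLE is lam0 itself.
lam_true, lam0, n = 1.2, 1.0, 400
y = rng.poisson(lam_true, size=n)

# Score of each observation evaluated at the constrained MLE:
# d log f(y_i; lam) / d lam = y_i / lam - 1.
scores = (y / lam0 - 1.0).reshape(-1, 1)        # the "X" matrix, here (n x 1)
ones = np.ones((n, 1))

# Regress 1 on the scores, take the uncentered R^2, and scale by n.
b, *_ = np.linalg.lstsq(scores, ones, rcond=None)
fitted = scores @ b
R2_u = (fitted.T @ fitted) / (ones.T @ ones)
xi_LM = n * R2_u.item()

print("LM statistic:", xi_LM, " p-value:", chi2.sf(xi_LM, df=1))
```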
LR test
The LR test is based on the scaled distance between the two maximized values of the log-likelihood, $2\left[L(\hat\theta_n) - L(\tilde\theta_n)\right]$. Taking second order Taylor series expansions of $L(\hat\theta_n)$ and $L(\tilde\theta_n)$ around $\theta_0$ under the null hypothesis, ignoring stochastically dominated terms,
$$L(\hat\theta_n) = L(\theta_0) + \frac{\partial L(\theta_0)}{\partial\theta'}\,(\hat\theta_n - \theta_0) + \frac{1}{2}\,(\hat\theta_n - \theta_0)'\,\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,(\hat\theta_n - \theta_0)$$
$$= L(\theta_0) + \frac{1}{\sqrt{n}}\,\frac{\partial L(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0) + \frac{1}{2}\,\sqrt{n}\,(\hat\theta_n - \theta_0)'\,\frac{1}{n}\,\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0)$$
$$L(\tilde\theta_n) = L(\theta_0) + \frac{\partial L(\theta_0)}{\partial\theta'}\,(\tilde\theta_n - \theta_0) + \frac{1}{2}\,(\tilde\theta_n - \theta_0)'\,\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,(\tilde\theta_n - \theta_0)$$
$$= L(\theta_0) + \frac{1}{\sqrt{n}}\,\frac{\partial L(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\tilde\theta_n - \theta_0) + \frac{1}{2}\,\sqrt{n}\,(\tilde\theta_n - \theta_0)'\,\frac{1}{n}\,\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\tilde\theta_n - \theta_0)$$
Taking differences and multiplying by 2, we obtain
$$2\left[L(\hat\theta_n) - L(\tilde\theta_n)\right] = \frac{2}{\sqrt{n}}\,\frac{\partial L(\theta_0)}{\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \tilde\theta_n) + \sqrt{n}\,(\hat\theta_n - \theta_0)'\,\frac{1}{n}\,\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0)$$
$$\qquad - \sqrt{n}\,(\tilde\theta_n - \theta_0)'\,\frac{1}{n}\,\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\tilde\theta_n - \theta_0)$$
$$\to\; 2n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n) - n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \theta_0) + n\,(\tilde\theta_n - \theta_0)'\, I(\theta_0)\,(\tilde\theta_n - \theta_0)$$
since $\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta} = I(\theta_0)\,\sqrt{n}\,(\hat\theta_n - \theta_0)$ from (11) and $-\frac{1}{n}\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'} \xrightarrow{p} I(\theta_0)$. Continuing the derivation,
$$2\left[L(\hat\theta_n) - L(\tilde\theta_n)\right] = 2n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n) - n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \theta_0)$$
$$\qquad + n\,\left[(\tilde\theta_n - \hat\theta_n) + (\hat\theta_n - \theta_0)\right]'\, I(\theta_0)\,\left[(\tilde\theta_n - \hat\theta_n) + (\hat\theta_n - \theta_0)\right]$$
$$= 2n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n) - n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \theta_0)$$
$$\qquad + n\,(\tilde\theta_n - \hat\theta_n)'\, I(\theta_0)\,(\tilde\theta_n - \hat\theta_n) + n\,(\tilde\theta_n - \hat\theta_n)'\, I(\theta_0)\,(\hat\theta_n - \theta_0)$$
$$\qquad + n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\tilde\theta_n - \hat\theta_n) + n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \theta_0)$$
$$= 2n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n) + n\,(\hat\theta_n - \tilde\theta_n)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n)$$
$$\qquad - n\,(\hat\theta_n - \tilde\theta_n)'\, I(\theta_0)\,(\hat\theta_n - \theta_0) - n\,(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n)$$
$$= n\,(\hat\theta_n - \tilde\theta_n)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n) \qquad (20)$$
Note that $(\hat\theta_n - \theta_0)'\, I(\theta_0)\,(\hat\theta_n - \tilde\theta_n) = (\hat\theta_n - \tilde\theta_n)'\, I(\theta_0)\,(\hat\theta_n - \theta_0)$, since each side is a scalar.
Now, from (13) and (20), we have
$$2\left[L(\hat\theta_n) - L(\tilde\theta_n)\right] = \sqrt{n}\,(\hat\theta_n - \tilde\theta_n)'\, I(\theta_0)\,\sqrt{n}\,(\hat\theta_n - \tilde\theta_n)$$
$$= \frac{1}{\sqrt{n}}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta'}\, I^{-1}(\theta_0)\, I(\theta_0)\, I^{-1}(\theta_0)\,\frac{1}{\sqrt{n}}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta}$$
$$= \frac{1}{n}\,\frac{\partial L(\tilde\theta_n)}{\partial\theta'}\, I^{-1}(\theta_0)\,\frac{\partial L(\tilde\theta_n)}{\partial\theta} = \xi_n^S \sim \chi^2(r) \text{ under } H_0.$$
• Calculating the LR test statistic requires two maximizations of the likelihood function, one with and the other without the constraint.
• The LR test is also a consistent test.
• As shown above, the Wald, LM, and LR tests are asymptotically equivalent, each with a limiting $\chi^2(r)$ distribution under $H_0$ (see the numerical comparison below).
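To see the asymptotic equivalence numerically, the sketch below (again a toy example of my own rather than anything in the notes) computes all three statistics for $H_0:\theta=\theta_0$ in an exponential model; with data generated under $H_0$ and a reasonably large $n$, the three values are close to one another and to a $\chi^2(1)$ draw.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)

# Exponential model f(y; th) = th * exp(-th * y); test H0: th = th0.
th0, n = 1.0, 1000
y = rng.exponential(scale=1.0 / th0, size=n)    # data generated under H0


def loglik(th):
    # Log-likelihood of the exponential sample: n*log(th) - th * sum(y).
    return n * np.log(th) - th * y.sum()


th_hat = 1.0 / y.mean()                          # unconstrained MLE
# Per-observation Fisher information for this model: I(th) = 1 / th^2.

xi_W = n * (th_hat - th0) ** 2 / th_hat**2       # Wald
score0 = n / th0 - y.sum()                       # dL/dth evaluated at th0
xi_LM = score0**2 * th0**2 / n                   # LM (score)
xi_LR = 2.0 * (loglik(th_hat) - loglik(th0))     # LR

for name, stat in [("Wald", xi_W), ("LM", xi_LM), ("LR", xi_LR)]:
    print(f"{name:>4}: {stat:.3f}  (p = {chi2.sf(stat, df=1):.3f})")
```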
Examples of tests in the linear regression model
Suppose the regression model
$$y_i = \beta' x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{i.i.n.}(0, \sigma^2)$$
with the hypotheses
$$H_0: \underset{(r\times p)}{R}\;\underset{(p\times1)}{\beta} = \gamma \qquad H_A: R\beta \neq \gamma$$
On the other hand, the Lagrange multiplier of the constrained maximization problem is
$$\tilde\lambda_n = -\frac{1}{\tilde\sigma_n^2}\left(R(X'X)^{-1}R'\right)^{-1}\left(\gamma - R\hat\beta_n\right)$$
where $\hat\beta_n$ is the unconstrained OLS/ML estimator and $\tilde\sigma_n^2$ is the constrained MLE of $\sigma^2$. Under $H_0$, the distribution of the Lagrange multiplier is
$$\tilde\lambda_n \sim N\left(0,\; \frac{\sigma^2}{\tilde\sigma_n^4}\left(R(X'X)^{-1}R'\right)^{-1}\right)$$
since $\gamma - R\hat\beta_n \sim N\left(0,\; \sigma^2\, R(X'X)^{-1}R'\right)$. Then, the LM test statistic is
$$\xi_n^S = \tilde\sigma_n^2\,\tilde\lambda_n'\, R(X'X)^{-1}R'\,\tilde\lambda_n = \frac{1}{\tilde\sigma_n^2}\left(R\hat\beta_n - \gamma\right)'\left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta_n - \gamma\right)$$
Since $\left(R\hat\beta_n - \gamma\right)'\left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta_n - \gamma\right)$ equals the difference between the restricted and unrestricted sums of squared residuals, i.e. $n(\tilde\sigma_n^2 - \hat\sigma_n^2)$ with $\hat\sigma_n^2$ the unconstrained MLE of $\sigma^2$, we obtain
$$\xi_n^S = n\,\frac{\tilde\sigma_n^2 - \hat\sigma_n^2}{\tilde\sigma_n^2} = \frac{n}{\dfrac{\tilde\sigma_n^2}{\tilde\sigma_n^2 - \hat\sigma_n^2}} = \frac{n}{1 + \dfrac{\hat\sigma_n^2}{\tilde\sigma_n^2 - \hat\sigma_n^2}} = \frac{n}{1 + \dfrac{n-K}{rF}}$$
where $K$ is the number of regressors and $F = \dfrac{n(\tilde\sigma_n^2 - \hat\sigma_n^2)/r}{n\hat\sigma_n^2/(n-K)}$ is the usual $F$ statistic for testing $H_0: R\beta = \gamma$. The LM statistic is therefore a monotone increasing function of $F$, so in the normal linear regression model the LM test is equivalent to the familiar $F$ test.
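Finally, a small sketch (data, dimensions, and the particular restrictions are all made up for illustration) verifying the last chain of equalities: the LM statistic computed from the formula above coincides with $n(\tilde\sigma_n^2-\hat\sigma_n^2)/\tilde\sigma_n^2$ and with $n/\left(1+(n-K)/(rF)\right)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, r = 200, 4, 2

# Regression y = X beta + eps with r = 2 linear restrictions R beta = gamma.
X = rng.normal(size=(n, K))
beta = np.array([1.0, 0.5, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
gamma = np.zeros(r)

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y                        # unconstrained OLS / MLE
sig2_hat = np.sum((y - X @ b_hat) ** 2) / n      # unconstrained MLE of sigma^2

A = np.linalg.inv(R @ XtX_inv @ R.T)             # (R (X'X)^{-1} R')^{-1}
d = R @ b_hat - gamma
b_tilde = b_hat - XtX_inv @ R.T @ A @ d          # restricted estimator
sig2_tilde = np.sum((y - X @ b_tilde) ** 2) / n  # constrained MLE of sigma^2

xi_LM = d @ A @ d / sig2_tilde                   # LM statistic from the text
F = (d @ A @ d / r) / (n * sig2_hat / (n - K))   # usual F statistic
print(xi_LM,
      n * (sig2_tilde - sig2_hat) / sig2_tilde,  # n (sig2~ - sig2^) / sig2~
      n / (1 + (n - K) / (r * F)))               # n / (1 + (n-K)/(rF))
```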