
Econ 620

Three Classical Tests; Wald, LM(Score), and LR tests


Suppose that we have the density $\ell(y;\theta)$ of a model with a null hypothesis of the form $H_0: \theta = \theta_0$. Let $L(\theta)$ be the log-likelihood function of the model and $\hat{\theta}$ be the MLE of $\theta$.

The Wald test is based on the intuitive idea that we are willing to accept the null hypothesis when $\hat{\theta}$ is close to $\theta_0$; the distance between $\hat{\theta}$ and $\theta_0$ is the basis for constructing the test statistic. On the other hand, consider the following constrained maximization problem:

\max_{\theta \in \Theta} L(\theta) \quad \text{s.t.} \quad \theta = \theta_0

If the constraint is not binding (the null hypothesis is true), the Lagrange multiplier associated with the constraint is zero. We can therefore construct a test that measures how far the Lagrange multiplier is from zero; this is the LM test. Finally, another way to check the validity of the null hypothesis is to examine the distance between the two maximized values of the likelihood function,

L(\hat{\theta}) - L(\theta_0) = \log \frac{\ell(y;\hat{\theta})}{\ell(y;\theta_0)}

If the null hypothesis is true, this statistic should again not be far from zero.
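
As a quick illustration (added here for concreteness), consider i.i.d. observations $y_i \sim N(\theta, 1)$, $i = 1, \ldots, n$, with $H_0: \theta = \theta_0$. The MLE is $\hat{\theta} = \bar{y}$, the score at $\theta_0$ is $\sum_{i=1}^{n}(y_i - \theta_0) = n(\bar{y} - \theta_0)$, and the information per observation is $I(\theta) = 1$. In this special case the three statistics coincide exactly:

\xi^W = n(\bar{y} - \theta_0)^2, \qquad \xi^S = \frac{\left[ n(\bar{y} - \theta_0) \right]^2}{n} = n(\bar{y} - \theta_0)^2, \qquad \xi^R = \sum_{i=1}^{n}(y_i - \theta_0)^2 - \sum_{i=1}^{n}(y_i - \bar{y})^2 = n(\bar{y} - \theta_0)^2.

In general the three statistics differ in finite samples, but they share the same limiting $\chi^2$ distribution under the null, as shown below.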

Asymptotic Distributions of the Three Tests


Assume that the observed variables can be partitioned into the endogenous variables $Y$ and the exogenous variables $X$. To simplify the presentation, we assume that the observations $(Y_i, X_i)$ are i.i.d., so that we can work with the conditional distribution of the endogenous variables given the exogenous variables, $f(y_i \mid x_i; \theta)$ with $\theta \in \Theta \subseteq \mathbb{R}^p$. The conditional density is known up to the unknown parameter vector $\theta$. By the i.i.d. assumption, we can write the log-likelihood function of $n$ observations of $(Y_i, X_i)$ as

L(\theta) = \sum_{i=1}^{n} \log f(y_i \mid x_i; \theta)

We assume all the regularity conditions for existence, consistency and asymptotic normality of the MLE, and denote the MLE by $\hat{\theta}_n$. The hypotheses of interest are

H_0: g(\theta_0) = 0 \qquad H_A: g(\theta_0) \neq 0

where $g(\cdot): \mathbb{R}^p \to \mathbb{R}^r$ and the rank of $\partial g / \partial \theta'$ is $r$.

Wald test
Proposition 1
\xi_n^W = n\, g'(\hat{\theta}_n) \left[ \frac{\partial g(\hat{\theta}_n)}{\partial \theta'}\, I^{-1}(\hat{\theta}_n)\, \frac{\partial g'(\hat{\theta}_n)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n) \sim \chi^2(r) \quad \text{under } H_0,

where $I = E_X E_\theta\!\left[ -\frac{\partial^2 \log f(Y \mid X;\theta)}{\partial \theta \partial \theta'} \right]$ and $I^{-1}(\hat{\theta}_n)$ is the inverse of $I$ evaluated at $\theta = \hat{\theta}_n$.

⇒ From the asymptotic properties of the MLE, we know that

\sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{d} N\!\left(0,\, I^{-1}(\theta_0)\right) \qquad (1)

 
The first-order Taylor series expansion of $g(\hat{\theta}_n)$ around the true value $\theta_0$ gives

g(\hat{\theta}_n) = g(\theta_0) + \frac{\partial g(\theta_0)}{\partial \theta'}(\hat{\theta}_n - \theta_0) + o_p(1)

\sqrt{n}\left[ g(\hat{\theta}_n) - g(\theta_0) \right] = \frac{\partial g(\theta_0)}{\partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \theta_0) + o_p(1) \qquad (2)

Hence, combining (1) and (2) gives

\sqrt{n}\left[ g(\hat{\theta}_n) - g(\theta_0) \right] \xrightarrow{d} N\!\left(0,\, \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta}\right) \qquad (3)

Under the null hypothesis, we have $g(\theta_0) = 0$. Therefore,

\sqrt{n}\, g(\hat{\theta}_n) \xrightarrow{d} N\!\left(0,\, \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta}\right) \qquad (4)

By forming the quadratic form of the normal random variables, we can conclude that

n\, g'(\hat{\theta}_n) \left[ \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n) \sim \chi^2(r) \quad \text{under } H_0. \qquad (5)
The statistic in (5) is infeasible since it depends on the unknown parameter $\theta_0$. However, we can consistently estimate the terms inside the inverted bracket by evaluating them at the MLE $\hat{\theta}_n$. Therefore,

\xi_n^W = n\, g'(\hat{\theta}_n) \left[ \frac{\partial g(\hat{\theta}_n)}{\partial \theta'}\, I^{-1}(\hat{\theta}_n)\, \frac{\partial g'(\hat{\theta}_n)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n) \sim \chi^2(r) \quad \text{under } H_0.

• An asymptotic test which rejects the null hypothesis with probability approaching one when the alternative hypothesis is true is called a consistent test. In other words, a consistent test has asymptotic power of 1.

• The Wald test discussed above is a consistent test. A heuristic argument is that if the alternative hypothesis is true instead of the null, then $g(\hat{\theta}_n) \xrightarrow{p} g(\theta_0) \neq 0$. Therefore, $g'(\hat{\theta}_n) \left[ \frac{\partial g(\hat{\theta}_n)}{\partial \theta'} I^{-1}(\hat{\theta}_n) \frac{\partial g'(\hat{\theta}_n)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n)$ converges to a positive constant instead of zero. Multiplying this constant by $n$, $\xi_n^W \to \infty$ as $n \to \infty$, which implies that we reject the null hypothesis with probability approaching one when the alternative is true.
• Another form of the Wald test statistic, written with the total-sample information matrix (a common source of confusion), is

\xi_n^W = g'(\hat{\theta}_n) \left[ \frac{\partial g(\hat{\theta}_n)}{\partial \theta'}\, I_n^{-1}(\hat{\theta}_n)\, \frac{\partial g'(\hat{\theta}_n)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n) \sim \chi^2(r) \quad \text{under } H_0,

where $I_n = E_X E_\theta\!\left[ -\frac{\partial^2 L(\theta)}{\partial \theta \partial \theta'} \right] = E_X E_\theta\!\left[ -\sum_{i=1}^{n} \frac{\partial^2 \log f(y_i \mid x_i;\theta)}{\partial \theta \partial \theta'} \right]$ and $I_n^{-1}(\hat{\theta}_n)$ is the inverse of $I_n$ evaluated at $\theta = \hat{\theta}_n$. Note that $I_n = nI$.
• A quite common form of the null hypothesis is a zero restriction on a subset of the parameters, i.e.,

H_0: \theta_1 = 0 \qquad H_A: \theta_1 \neq 0

where $\theta_1$ is a $(q \times 1)$ subvector of $\theta$ with $q < p$. Then the Wald statistic is given by

\xi_n^W = n\, \hat{\theta}_1' \left[ I^{11}(\hat{\theta}_n) \right]^{-1} \hat{\theta}_1 \sim \chi^2(q) \quad \text{under } H_0,

where $I^{11}(\theta)$ is the upper-left block of the inverse information matrix. Partitioning

I(\theta) = \begin{pmatrix} I_{11}(\theta) & I_{12}(\theta) \\ I_{21}(\theta) & I_{22}(\theta) \end{pmatrix},

we have $I^{11}(\theta) = \left[ I_{11}(\theta) - I_{12}(\theta)\, I_{22}^{-1}(\theta)\, I_{21}(\theta) \right]^{-1}$ by the partitioned-inverse formula, and $I^{11}(\hat{\theta}_n)$ is $I^{11}(\theta)$ evaluated at the MLE.
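
As a concrete numerical illustration (added here, not part of the original derivation), the following minimal Python sketch computes the Wald statistic for an i.i.d. Poisson($\lambda$) sample with $H_0: \lambda = \lambda_0$, where $g(\theta) = \theta - \lambda_0$, the MLE is $\hat{\lambda} = \bar{y}$, and $I(\lambda) = 1/\lambda$. The data-generating values are hypothetical.

import numpy as np
from scipy import stats

# Minimal sketch: Wald test of H0: lambda = lam0 for i.i.d. Poisson(lambda) data.
# Here g(theta) = theta - lam0, dg/dtheta = 1, and I(lambda) = 1/lambda, so
# xi_W = n * (lam_hat - lam0)^2 * I(lam_hat) = n * (lam_hat - lam0)^2 / lam_hat.
rng = np.random.default_rng(0)
y = rng.poisson(lam=1.3, size=500)      # simulated data; true lambda = 1.3 (hypothetical values)
lam0 = 1.0                              # value under the null
n = y.size

lam_hat = y.mean()                      # unconstrained MLE
info_hat = 1.0 / lam_hat                # Fisher information per observation at the MLE
g_hat = lam_hat - lam0

xi_W = n * g_hat**2 * info_hat          # since dg/dtheta = 1, [G I^{-1} G']^{-1} = I(lam_hat)
p_value = stats.chi2.sf(xi_W, df=1)     # r = 1 restriction
print(f"Wald statistic = {xi_W:.3f}, p-value = {p_value:.4f}")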

LM test (Score test)
If we have an a priori reason or evidence to believe that the parameter vector satisfies restrictions of the form $g(\theta) = 0$, incorporating that information into the maximization of the likelihood function through constrained optimization will improve the efficiency of the estimator relative to the unconstrained MLE. We solve the following problem:

\max_{\theta \in \Theta} L(\theta) \quad \text{s.t.} \quad g(\theta) = 0

The first-order conditions are

\frac{\partial L(\tilde{\theta}_n)}{\partial \theta} + \frac{\partial g'(\tilde{\theta}_n)}{\partial \theta}\, \tilde{\lambda} = 0 \qquad (6)

g(\tilde{\theta}_n) = 0 \qquad (7)

where $\tilde{\theta}_n$ is the solution of the constrained maximization problem, called the constrained MLE, and $\tilde{\lambda}$ is the vector of Lagrange multipliers. The LM test is based on the idea that, when properly scaled, $\tilde{\lambda}$ has an asymptotically normal distribution.

Proposition 2

\xi_n^S = \frac{1}{n} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta'}\, I^{-1}(\tilde{\theta}_n)\, \frac{\partial L(\tilde{\theta}_n)}{\partial \theta}
 = \frac{1}{n}\, \tilde{\lambda}'\, \frac{\partial g(\tilde{\theta}_n)}{\partial \theta'}\, I^{-1}(\tilde{\theta}_n)\, \frac{\partial g'(\tilde{\theta}_n)}{\partial \theta}\, \tilde{\lambda} \sim \chi^2(r) \quad \text{under } H_0.
   
⇒ First-order Taylor expansions of $g(\hat{\theta}_n)$ and $g(\tilde{\theta}_n)$ around $\theta_0$ give, ignoring $o_p(1)$ terms,

\sqrt{n}\, g(\hat{\theta}_n) = \sqrt{n}\, g(\theta_0) + \frac{\partial g(\theta_0)}{\partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \theta_0) \qquad (8)

\sqrt{n}\, g(\tilde{\theta}_n) = \sqrt{n}\, g(\theta_0) + \frac{\partial g(\theta_0)}{\partial \theta'} \sqrt{n}\,(\tilde{\theta}_n - \theta_0) \qquad (9)

Note that $g(\tilde{\theta}_n) = 0$ from (7); subtracting (9) from (8), we have

\sqrt{n}\, g(\hat{\theta}_n) = \frac{\partial g(\theta_0)}{\partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \tilde{\theta}_n) \qquad (10)

On the other hand, taking first-order Taylor series expansions of $\partial L(\hat{\theta}_n)/\partial \theta$ and $\partial L(\tilde{\theta}_n)/\partial \theta$ around $\theta_0$ gives, ignoring $o_p(1)$ terms,

\frac{\partial L(\hat{\theta}_n)}{\partial \theta} = \frac{\partial L(\theta_0)}{\partial \theta} + \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'}(\hat{\theta}_n - \theta_0) \;\Rightarrow\;

\frac{1}{\sqrt{n}} \frac{\partial L(\hat{\theta}_n)}{\partial \theta} = \frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta} + \frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \theta_0) \;\Rightarrow\;

\frac{1}{\sqrt{n}} \frac{\partial L(\hat{\theta}_n)}{\partial \theta} = \frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta} - I(\theta_0) \sqrt{n}\,(\hat{\theta}_n - \theta_0) \qquad (11)

Note that $-\frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log f(y_i \mid x_i; \theta_0)}{\partial \theta \partial \theta'} \xrightarrow{p} I(\theta_0)$ by the law of large numbers. Similarly,

\frac{1}{\sqrt{n}} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta} = \frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta} - I(\theta_0) \sqrt{n}\,(\tilde{\theta}_n - \theta_0) \qquad (12)

Considering the fact that $\partial L(\hat{\theta}_n)/\partial \theta = 0$ by the first-order condition of the unconstrained maximization problem, we take the difference between (11) and (12). Then,

\frac{1}{\sqrt{n}} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta} = -I(\theta_0) \sqrt{n}\,(\tilde{\theta}_n - \hat{\theta}_n) = I(\theta_0) \sqrt{n}\,(\hat{\theta}_n - \tilde{\theta}_n) \qquad (13)

Hence,

\sqrt{n}\,(\hat{\theta}_n - \tilde{\theta}_n) = I^{-1}(\theta_0) \frac{1}{\sqrt{n}} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta} \qquad (14)

From (10) and (14), we obtain

\sqrt{n}\, g(\hat{\theta}_n) = \frac{\partial g(\theta_0)}{\partial \theta'}\, I^{-1}(\theta_0)\, \frac{1}{\sqrt{n}} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta}

Using (6), we deduce

\sqrt{n}\, g(\hat{\theta}_n) = -\frac{\partial g(\theta_0)}{\partial \theta'}\, I^{-1}(\theta_0)\, \frac{\partial g'(\tilde{\theta}_n)}{\partial \theta}\, \frac{\tilde{\lambda}}{\sqrt{n}}
 \;\to\; -\frac{\partial g(\theta_0)}{\partial \theta'}\, I^{-1}(\theta_0)\, \frac{\partial g'(\theta_0)}{\partial \theta}\, \frac{\tilde{\lambda}}{\sqrt{n}} \qquad (15)

since $\tilde{\theta}_n \xrightarrow{p} \theta_0$ and hence $\partial g'(\tilde{\theta}_n)/\partial \theta \xrightarrow{p} \partial g'(\theta_0)/\partial \theta$. Therefore,

\frac{\tilde{\lambda}}{\sqrt{n}} = -\left[ \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta} \right]^{-1} \sqrt{n}\, g(\hat{\theta}_n) \qquad (16)

From (4), under the null hypothesis, $\sqrt{n}\, g(\hat{\theta}_n) \xrightarrow{d} N\!\left(0,\, \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta}\right)$. Consequently, we have

\frac{\tilde{\lambda}}{\sqrt{n}} \xrightarrow{d} N\!\left(0,\, \left[ \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta} \right]^{-1}\right) \qquad (17)

Again, forming the quadratic form of the normal random variables, we obtain

\frac{1}{n}\, \tilde{\lambda}'\, \frac{\partial g(\theta_0)}{\partial \theta'}\, I^{-1}(\theta_0)\, \frac{\partial g'(\theta_0)}{\partial \theta}\, \tilde{\lambda} \sim \chi^2(r) \quad \text{under } H_0. \qquad (18)

Alternatively, using (6), another form of the test statistic is

\frac{1}{n} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta'}\, I^{-1}(\theta_0)\, \frac{\partial L(\tilde{\theta}_n)}{\partial \theta} \sim \chi^2(r) \quad \text{under } H_0. \qquad (19)

Note that (18) and (19) are infeasible since they depend on the unknown parameter value $\theta_0$. We can evaluate the terms involving $\theta_0$ at the constrained MLE $\tilde{\theta}_n$ to obtain a usable statistic.
• Again, another form of the LM test is $\xi_n^S = \frac{\partial L(\tilde{\theta}_n)}{\partial \theta'}\, I_n^{-1}(\tilde{\theta}_n)\, \frac{\partial L(\tilde{\theta}_n)}{\partial \theta} = \tilde{\lambda}'\, \frac{\partial g(\tilde{\theta}_n)}{\partial \theta'}\, I_n^{-1}(\tilde{\theta}_n)\, \frac{\partial g'(\tilde{\theta}_n)}{\partial \theta}\, \tilde{\lambda}$.

• We can approximate $I(\theta_0)$ with either $-\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta \partial \theta'}$ or $\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'}$. If we choose the second approximation, the LM test statistic becomes
\xi_n^S = \frac{1}{n} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta'} \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \right]^{-1} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta}

 = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \right]^{-1} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta}

 = \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \left[ \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \right]^{-1} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta}

This expression should look familiar: it involves a projection matrix, and the intuition is correct. The (uncentered) $R_u^2$ from the regression of $1$ on $\partial \log f(y_i \mid x_i; \tilde{\theta}_n)/\partial \theta'$ is given by

R_u^2 = \frac{\mathbf{1}' X (X'X)^{-1} X' X (X'X)^{-1} X' \mathbf{1}}{\mathbf{1}'\mathbf{1}} = \frac{\mathbf{1}' X (X'X)^{-1} X' \mathbf{1}}{\mathbf{1}'\mathbf{1}}

where

X_{(n \times p)} = \begin{pmatrix} \partial \log f(y_1 \mid x_1; \tilde{\theta}_n)/\partial \theta' \\ \partial \log f(y_2 \mid x_2; \tilde{\theta}_n)/\partial \theta' \\ \vdots \\ \partial \log f(y_n \mid x_n; \tilde{\theta}_n)/\partial \theta' \end{pmatrix} \quad \text{and} \quad \mathbf{1}_{(n \times 1)} = (1, 1, \ldots, 1)'.

Then,

R_u^2 = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \left[ \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta'} \right]^{-1} \sum_{i=1}^{n} \frac{\partial \log f(y_i \mid x_i; \tilde{\theta}_n)}{\partial \theta}

Hence,

\xi_n^S = n R_u^2
This is quite an interesting result, since computing the LM statistic amounts to nothing more than an OLS regression: we regress $1$ on the scores evaluated at the constrained MLE, compute the uncentered $R^2$, and multiply it by the number of observations to get the LM statistic (a numerical sketch appears after the remarks below). One thing to be cautious about is that most software reports the centered $R^2$ by default, which is not defined in this case since the denominator of the centered $R^2$ is identically zero.

• The LM test is also an asymptotically consistent test.


• From (16) and (18),

\xi_n^W = n\, g'(\hat{\theta}_n) \left[ \frac{\partial g(\hat{\theta}_n)}{\partial \theta'} I^{-1}(\hat{\theta}_n) \frac{\partial g'(\hat{\theta}_n)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n)
 \;\to\; n\, g'(\hat{\theta}_n) \left[ \frac{\partial g(\theta_0)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial g'(\theta_0)}{\partial \theta} \right]^{-1} g(\hat{\theta}_n) = \xi_n^S

so the Wald and LM statistics are asymptotically equivalent.
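
To make the $nR_u^2$ recipe concrete, here is a minimal sketch (an added illustration with hypothetical simulated data, not part of the original notes) for an i.i.d. Poisson($\lambda$) sample with $H_0: \lambda = \lambda_0$. The constraint pins the parameter down completely, so the constrained MLE is $\tilde{\lambda} = \lambda_0$, and the score of observation $i$ is $\partial \log f(y_i; \lambda)/\partial \lambda = y_i/\lambda - 1$.

import numpy as np
from scipy import stats

# Minimal sketch: LM statistic as n times the uncentered R^2 from regressing 1 on the scores,
# for i.i.d. Poisson(lambda) data with H0: lambda = lam0 (the constrained MLE is lam0 itself).
rng = np.random.default_rng(0)
y = rng.poisson(lam=1.3, size=500)               # simulated data (hypothetical values)
lam0, n = 1.0, y.size

S = (y / lam0 - 1.0).reshape(-1, 1)              # score of each observation at the constrained MLE
ones = np.ones(n)

# Uncentered R^2 of the regression of 1 on the scores: R2_u = 1'S(S'S)^{-1}S'1 / 1'1
R2_u = ones @ S @ np.linalg.solve(S.T @ S, S.T @ ones) / (ones @ ones)

xi_S = n * R2_u                                  # LM statistic = n * R2_u
print(f"LM statistic = {xi_S:.3f}, p-value = {stats.chi2.sf(xi_S, df=1):.4f}")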

Likelihood ratio (LR) test

Proposition 3

\xi_n^R = 2\left[ L(\hat{\theta}_n) - L(\tilde{\theta}_n) \right] \sim \chi^2(r) \quad \text{under } H_0.

⇒ We consider the second-order Taylor expansions of $L(\hat{\theta}_n)$ and $L(\tilde{\theta}_n)$ around $\theta_0$. Under the null hypothesis, ignoring stochastically dominated terms,
L(\hat{\theta}_n) = L(\theta_0) + \frac{\partial L(\theta_0)}{\partial \theta'}(\hat{\theta}_n - \theta_0) + \frac{1}{2}(\hat{\theta}_n - \theta_0)' \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'}(\hat{\theta}_n - \theta_0)
 = L(\theta_0) + \frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \theta_0) + \frac{1}{2} \sqrt{n}\,(\hat{\theta}_n - \theta_0)' \frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \theta_0)

L(\tilde{\theta}_n) = L(\theta_0) + \frac{\partial L(\theta_0)}{\partial \theta'}(\tilde{\theta}_n - \theta_0) + \frac{1}{2}(\tilde{\theta}_n - \theta_0)' \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'}(\tilde{\theta}_n - \theta_0)
 = L(\theta_0) + \frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta'} \sqrt{n}\,(\tilde{\theta}_n - \theta_0) + \frac{1}{2} \sqrt{n}\,(\tilde{\theta}_n - \theta_0)' \frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} \sqrt{n}\,(\tilde{\theta}_n - \theta_0)

Taking the difference and multiplying by 2, we obtain

2\left[ L(\hat{\theta}_n) - L(\tilde{\theta}_n) \right] = \frac{2}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \tilde{\theta}_n) + \sqrt{n}\,(\hat{\theta}_n - \theta_0)' \frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} \sqrt{n}\,(\hat{\theta}_n - \theta_0)
 \quad - \sqrt{n}\,(\tilde{\theta}_n - \theta_0)' \frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} \sqrt{n}\,(\tilde{\theta}_n - \theta_0)
 \;\to\; 2n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n) - n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \theta_0) + n\,(\tilde{\theta}_n - \theta_0)' I(\theta_0) (\tilde{\theta}_n - \theta_0)

since $\frac{1}{\sqrt{n}} \frac{\partial L(\theta_0)}{\partial \theta'} = \sqrt{n}\,(\hat{\theta}_n - \theta_0)' I(\theta_0)$ from (11) and $-\frac{1}{n} \frac{\partial^2 L(\theta_0)}{\partial \theta \partial \theta'} \xrightarrow{p} I(\theta_0)$. Continuing the derivation and writing $\tilde{\theta}_n - \theta_0 = (\hat{\theta}_n - \theta_0) - (\hat{\theta}_n - \tilde{\theta}_n)$,

2\left[ L(\hat{\theta}_n) - L(\tilde{\theta}_n) \right] = 2n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n) - n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \theta_0)
 \quad + n\left[(\hat{\theta}_n - \theta_0) - (\hat{\theta}_n - \tilde{\theta}_n)\right]' I(\theta_0) \left[(\hat{\theta}_n - \theta_0) - (\hat{\theta}_n - \tilde{\theta}_n)\right]
 = 2n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n) - n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \theta_0)
 \quad + n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \theta_0) - n\,(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n)
 \quad - n\,(\hat{\theta}_n - \tilde{\theta}_n)' I(\theta_0) (\hat{\theta}_n - \theta_0) + n\,(\hat{\theta}_n - \tilde{\theta}_n)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n)
 = n\,(\hat{\theta}_n - \tilde{\theta}_n)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n) \qquad (20)

noting that $(\hat{\theta}_n - \theta_0)' I(\theta_0) (\hat{\theta}_n - \tilde{\theta}_n) = (\hat{\theta}_n - \tilde{\theta}_n)' I(\theta_0) (\hat{\theta}_n - \theta_0)$.

Now, from (13) and (20), we have

2\left[ L(\hat{\theta}_n) - L(\tilde{\theta}_n) \right] = \sqrt{n}\,(\hat{\theta}_n - \tilde{\theta}_n)' I(\theta_0) \sqrt{n}\,(\hat{\theta}_n - \tilde{\theta}_n)
 = \frac{1}{\sqrt{n}} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta'} I^{-1}(\theta_0) I(\theta_0) I^{-1}(\theta_0) \frac{1}{\sqrt{n}} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta}
 = \frac{1}{n} \frac{\partial L(\tilde{\theta}_n)}{\partial \theta'} I^{-1}(\theta_0) \frac{\partial L(\tilde{\theta}_n)}{\partial \theta} = \xi_n^S \sim \chi^2(r) \quad \text{under } H_0.
• Calculating the LR test statistic requires two maximizations of the likelihood function, one with and the other without the constraint (a small numerical sketch follows these remarks).

• The LR test is also an asymptotically consistent test.

• As shown above, the Wald, LM and LR tests are asymptotically equivalent, each with a $\chi^2(r)$ limiting distribution under the null.
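
To make the two-maximizations point concrete, here is a minimal sketch (an added illustration with hypothetical simulated data) that computes $\xi_n^R$ for i.i.d. $N(\mu, \sigma^2)$ observations with $H_0: \mu = \mu_0$ by numerically maximizing the log-likelihood with and without the constraint.

import numpy as np
from scipy import optimize, stats

# Minimal sketch: LR test of H0: mu = mu0 for i.i.d. N(mu, sigma^2) data.
# The statistic requires two maximizations of the log-likelihood: unrestricted and restricted.
rng = np.random.default_rng(0)
y = rng.normal(loc=0.2, scale=1.0, size=200)     # simulated data (hypothetical values)
mu0 = 0.0

def negloglik(params, mu_fixed=None):
    # params = (mu, log_sigma2) if unrestricted, or (log_sigma2,) with mu fixed at mu0
    mu, log_s2 = (params if mu_fixed is None else (mu_fixed, params[0]))
    s2 = np.exp(log_s2)
    return 0.5 * y.size * np.log(2 * np.pi * s2) + 0.5 * np.sum((y - mu) ** 2) / s2

res_u = optimize.minimize(negloglik, x0=[0.0, 0.0])           # unrestricted maximization
res_r = optimize.minimize(negloglik, x0=[0.0], args=(mu0,))   # restricted maximization (mu = mu0)

xi_R = 2.0 * (res_r.fun - res_u.fun)    # 2 [ L(theta_hat) - L(theta_tilde) ]
print(f"LR statistic = {xi_R:.3f}, p-value = {stats.chi2.sf(xi_R, df=1):.4f}")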

Examples of tests in the linear regression model
Consider the regression model

y_i = \beta' x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{i.i.n.}(0, \sigma^2)

where i.i.n. denotes independent, identically distributed normal errors. The hypotheses are

H_0: \underset{(r \times p)}{R}\, \underset{(p \times 1)}{\beta} = \gamma \qquad H_A: R\beta \neq \gamma

The log-likelihood function is, up to a constant,

L(\beta, \sigma^2) \approx -\frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)

Then the unconstrained MLE is given by

\hat{\beta}_n = (X'X)^{-1} X'y
\hat{\sigma}_n^2 = \frac{1}{n}\left(y - X\hat{\beta}_n\right)'\left(y - X\hat{\beta}_n\right)

The information matrix is

I_n(\theta_0) = \begin{pmatrix} \frac{1}{\sigma^2}(X'X) & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}

The Wald statistic is, from Proposition 1,

\xi_n^W = n\left(R\hat{\beta}_n - \gamma\right)' \left[ \begin{pmatrix} R & 0 \end{pmatrix} I^{-1}(\hat{\theta}_n) \begin{pmatrix} R' \\ 0 \end{pmatrix} \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)
 = \left(R\hat{\beta}_n - \gamma\right)' \left[ \begin{pmatrix} R & 0 \end{pmatrix} I_n^{-1}(\hat{\theta}_n) \begin{pmatrix} R' \\ 0 \end{pmatrix} \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)
 = \left(R\hat{\beta}_n - \gamma\right)' \left[ \hat{\sigma}_n^2\, R (X'X)^{-1} R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)
 = \frac{1}{\hat{\sigma}_n^2} \left(R\hat{\beta}_n - \gamma\right)' \left[ R (X'X)^{-1} R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right) \sim \chi^2(r) \quad \text{under } H_0.

Denote the constrained MLEs by $\tilde{\beta}_n$ and $\tilde{\sigma}_n^2$. Then,

\tilde{\sigma}_n^2 - \hat{\sigma}_n^2 = \frac{1}{n}\left(y - X\tilde{\beta}_n\right)'\left(y - X\tilde{\beta}_n\right) - \frac{1}{n}\left(y - X\hat{\beta}_n\right)'\left(y - X\hat{\beta}_n\right)
 = \frac{1}{n}\left(X\hat{\beta}_n - X\tilde{\beta}_n\right)'\left(X\hat{\beta}_n - X\tilde{\beta}_n\right)
 = \frac{1}{n}\left(\hat{\beta}_n - \tilde{\beta}_n\right)' X'X \left(\hat{\beta}_n - \tilde{\beta}_n\right) = \frac{1}{n}\left(R\hat{\beta}_n - \gamma\right)' \left[ R(X'X)^{-1}R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)

since $\tilde{\beta}_n = \hat{\beta}_n - (X'X)^{-1} R' \left[ R(X'X)^{-1}R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)$. Therefore,

\xi_n^W = \frac{n\left(\tilde{\sigma}_n^2 - \hat{\sigma}_n^2\right)}{\hat{\sigma}_n^2} = \frac{\left(R\hat{\beta}_n - \gamma\right)' \left[ R(X'X)^{-1}R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)}{\frac{1}{n}\left(y - X\hat{\beta}_n\right)'\left(y - X\hat{\beta}_n\right)}
 = \frac{\left(R\hat{\beta}_n - \gamma\right)' \left[ R(X'X)^{-1}R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)/r}{\left(y - X\hat{\beta}_n\right)'\left(y - X\hat{\beta}_n\right)/(n - K)} \times \frac{nr}{n - K} = \frac{nr}{n - K}\, F

where $F$ is the usual $F$ statistic for $H_0$ with $(r, n - K)$ degrees of freedom and $K$ ($= p$) is the number of regressors.

On the other hand, the Lagrange multiplier of the constrained maximization problem is

\tilde{\lambda}_n = -\frac{1}{\tilde{\sigma}_n^2} \left[ R(X'X)^{-1}R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)

Under $H_0$, the distribution of the Lagrange multiplier is

\tilde{\lambda}_n \sim N\!\left(0,\, \frac{1}{\tilde{\sigma}_n^2} \left[ R(X'X)^{-1}R' \right]^{-1}\right)

since $R\hat{\beta}_n - \gamma \sim N\!\left(0,\, \tilde{\sigma}_n^2\, R(X'X)^{-1}R'\right)$ (with $\sigma^2$ replaced by $\tilde{\sigma}_n^2$). Then the LM test statistic is

\xi_n^S = \tilde{\sigma}_n^2\, \tilde{\lambda}_n'\, R(X'X)^{-1}R'\, \tilde{\lambda}_n
 = \frac{1}{\tilde{\sigma}_n^2} \left(R\hat{\beta}_n - \gamma\right)' \left[ R(X'X)^{-1}R' \right]^{-1} \left(R\hat{\beta}_n - \gamma\right)
 = \frac{n\left(\tilde{\sigma}_n^2 - \hat{\sigma}_n^2\right)}{\tilde{\sigma}_n^2} = \frac{n}{1 + \dfrac{\hat{\sigma}_n^2}{\tilde{\sigma}_n^2 - \hat{\sigma}_n^2}} = \frac{n}{1 + \dfrac{n - K}{rF}}

To obtain the LR test statistic, note that

L(\hat{\theta}_n) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \hat{\sigma}_n^2 - \frac{1}{2\hat{\sigma}_n^2}\left(y - X\hat{\beta}_n\right)'\left(y - X\hat{\beta}_n\right)
 = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \hat{\sigma}_n^2 - \frac{n}{2\hat{\sigma}_n^2} \times \frac{1}{n}\left(y - X\hat{\beta}_n\right)'\left(y - X\hat{\beta}_n\right)
 = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \hat{\sigma}_n^2 - \frac{n}{2\hat{\sigma}_n^2} \times \hat{\sigma}_n^2
 = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \hat{\sigma}_n^2 - \frac{n}{2}

On the other hand,

L(\tilde{\theta}_n) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \tilde{\sigma}_n^2 - \frac{1}{2\tilde{\sigma}_n^2}\left(y - X\tilde{\beta}_n\right)'\left(y - X\tilde{\beta}_n\right)
 = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \tilde{\sigma}_n^2 - \frac{n}{2\tilde{\sigma}_n^2} \times \frac{1}{n}\left(y - X\tilde{\beta}_n\right)'\left(y - X\tilde{\beta}_n\right)
 = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \tilde{\sigma}_n^2 - \frac{n}{2\tilde{\sigma}_n^2} \times \tilde{\sigma}_n^2
 = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \tilde{\sigma}_n^2 - \frac{n}{2}

Hence,

\xi_n^R = 2\left[ L(\hat{\theta}_n) - L(\tilde{\theta}_n) \right] = 2\left[ -\frac{n}{2}\log \hat{\sigma}_n^2 + \frac{n}{2}\log \tilde{\sigma}_n^2 \right]
 = n \log \frac{\tilde{\sigma}_n^2}{\hat{\sigma}_n^2} = n \log\left(1 + \frac{\tilde{\sigma}_n^2 - \hat{\sigma}_n^2}{\hat{\sigma}_n^2}\right) = n \log\left(1 + \frac{rF}{n - K}\right)
An interesting result can be obtained using the following inequalities:

\frac{x}{1 + x} \leq \log(1 + x) \leq x \qquad \forall\, x > -1

Letting $x = \frac{rF}{n - K}$, and noting that $\xi_n^W = nx$, $\xi_n^R = n\log(1 + x)$ and $\xi_n^S = \frac{nx}{1 + x}$, applying the above inequalities we obtain

\xi_n^S \leq \xi_n^R \leq \xi_n^W
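
The closed forms above are easy to verify numerically. The following minimal sketch (an added illustration with hypothetical simulated data) computes $\xi_n^W$, $\xi_n^R$ and $\xi_n^S$ in the linear regression model from the restricted and unrestricted residual variances and checks the ordering.

import numpy as np

# Minimal sketch: Wald, LR and LM statistics in the regression model with H0: R beta = gamma,
# using xi_W = n(s2_r - s2_u)/s2_u, xi_R = n log(s2_r/s2_u), xi_S = n(s2_r - s2_u)/s2_r.
rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # hypothetical DGP; the null below is true

R = np.array([[0.0, 0.0, 1.0]])                          # H0: third coefficient equals 0
gamma = np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                             # unrestricted OLS/MLE
s2_u = np.sum((y - X @ beta_hat) ** 2) / n

# Restricted MLE: beta_tilde = beta_hat - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (R beta_hat - gamma)
A = R @ XtX_inv @ R.T
beta_tilde = beta_hat - XtX_inv @ R.T @ np.linalg.solve(A, R @ beta_hat - gamma)
s2_r = np.sum((y - X @ beta_tilde) ** 2) / n

xi_W = n * (s2_r - s2_u) / s2_u
xi_R = n * np.log(s2_r / s2_u)
xi_S = n * (s2_r - s2_u) / s2_r
print(xi_S <= xi_R <= xi_W, round(xi_S, 4), round(xi_R, 4), round(xi_W, 4))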
