Diagnostic Tests
Violation of the Assumptions of the CLRM
1. E(ut) = 0
2. Var(ut) = σ² < ∞
3. Cov(ui, uj) = 0
4. The X matrix is non-stochastic or fixed in repeated samples
5. ut ∼ N(0, σ²)
Investigating Violations of the Assumptions of the CLRM
• We will now study these assumptions further, and in particular look at:
- How we test for violations
- Causes
- Consequences: in general we could encounter any combination of three problems:
  - the coefficient estimates are wrong
  - the associated standard errors are wrong
  - the distribution that we assumed for the test statistics will be inappropriate
- Solutions:
  - the assumptions are no longer violated
  - we work around the problem by using alternative techniques which are still valid
Statistical Distributions for Diagnostic Tests
• The χ² version is sometimes called an “LM” test, and has only one degree
of freedom parameter: the number of restrictions being tested, m.
• For all diagnostic tests, we cannot observe the disturbances and so perform the tests on
the residuals.
• The mean of the residuals will always be zero provided that there is a constant term in
the regression.
• If the regression did not include an intercept, and the average value of the errors was
nonzero, several undesirable consequences could arise. First, R², defined as ESS/TSS,
can be negative, implying that the sample average, ȳ, ‘explains’ more of the variation
in y than the explanatory variables. Second, and more fundamentally, a regression with
no intercept parameter could lead to potentially severe biases in the slope coefficient
estimates.
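A quick numerical check of the point above, using numpy with made-up simulated data: residuals from a regression that includes a constant average exactly zero, while residuals from a regression through the origin generally do not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)   # true intercept is nonzero

# OLS with an intercept: design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_with_const = y - X @ beta

# OLS without an intercept: design matrix [x] only
beta_nc, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
resid_no_const = y - x * beta_nc[0]

print(resid_with_const.mean())   # essentially zero
print(resid_no_const.mean())     # generally nonzero
```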
Assumption 2: Var(ut) = σ² < ∞
• We have so far assumed that the variance of the errors is constant, σ² - this
is known as homoscedasticity. If the errors do not have a constant
variance, we say that they are heteroscedastic, e.g. say we estimate a
regression and calculate the residuals, ût.

[Figure: scatter plot of the residuals ût against x2t]
Detection of Heteroscedasticity: The GQ Test
• Graphical methods
• Formal tests: There are many of them: we will discuss Goldfeld-Quandt test and
White’s test
1. Split the total sample of length T into two sub-samples of length T1 and T2. The
regression model is estimated on each sub-sample and the two residual
variances are calculated.
2. The null hypothesis is that the variances of the disturbances are equal,
H0: σ1² = σ2²
The GQ Test (Cont’d)
4. The test statistic, denoted GQ, is simply the ratio of the two residual
variances, where the larger of the two variances must be placed in the
numerator:
GQ = s1² / s2²
A problem with the test is that the choice of where to split the sample is
usually arbitrary and may crucially affect the outcome of the test.
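The steps above can be sketched in Python, assuming the statsmodels package is available (its `het_goldfeldquandt` function performs the sub-sample split and variance ratio automatically); the data here are simulated purely for illustration:

```python
import numpy as np
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(1, 10, size=n))
# Simulate heteroscedastic errors: standard deviation grows with x
u = rng.normal(scale=0.5 * x)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
# Split at the midpoint -- the arbitrary choice the slide warns about.
# statsmodels puts the second sub-sample's variance in the numerator
# when testing for variance increasing in x.
gq_stat, p_value, _ = het_goldfeldquandt(y, X, split=0.5)
print(gq_stat, p_value)
```

A small p-value rejects the null of equal variances in the two sub-samples.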
Detection of Heteroscedasticity using White’s Test
2. Then run the auxiliary regression:
ût² = α1 + α2x2t + α3x3t + α4x2t² + α5x3t² + α6x2t x3t + vt
Consequences of Using OLS in the Presence of Heteroscedasticity
• OLS estimation still gives unbiased coefficient estimates, but they are
no longer BLUE.
• Whether the standard errors calculated using the usual formulae are
too big or too small will depend upon the form of the
heteroscedasticity.
How Do we Deal with Heteroscedasticity?
• If the form (i.e. the cause) of the heteroscedasticity is known, then we can
use an estimation method which takes this into account (called generalised
least squares, GLS).
t         yt     yt-1    Δyt
1989M09   0.8    -       -
1989M10   1.3    0.8     1.3-0.8=0.5
1989M11  -0.9    1.3     -0.9-1.3=-2.2
1989M12   0.2   -0.9     0.2-(-0.9)=1.1
1990M01  -1.7    0.2     -1.7-0.2=-1.9
1990M02   2.3   -1.7     2.3-(-1.7)=4.0
1990M03   0.1    2.3     0.1-2.3=-2.2
1990M04   0.0    0.1     0.0-0.1=-0.1
.         .      .       .
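The table above can be reproduced with pandas, where `shift` gives the lagged series and `diff` the first difference:

```python
import pandas as pd

# The y_t values and dates from the table above
y = pd.Series([0.8, 1.3, -0.9, 0.2, -1.7, 2.3, 0.1, 0.0],
              index=["1989M09", "1989M10", "1989M11", "1989M12",
                     "1990M01", "1990M02", "1990M03", "1990M04"])
lagged = y.shift(1)    # y_{t-1}
dy = y.diff()          # Δy_t = y_t - y_{t-1}
print(pd.DataFrame({"y_t": y, "y_{t-1}": lagged, "dy_t": dy}))
```

The first observation of the lag and the difference are lost, exactly as the dashes in the table indicate.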
Autocorrelation
• We assumed of the CLRM’s errors that Cov(ui, uj) = 0 for i ≠ j.
This is essentially the same as saying there is no pattern in the errors.
• If there are patterns in the residuals from a model, we say that they are
autocorrelated.
[Figures: scatter plots of ût against ût−1 and plots of ût against time, illustrating positive and negative autocorrelation in the residuals]
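A simple formal check for such patterns (not shown on these slides) is the Durbin-Watson statistic, which is roughly 2(1 − ρ̂): values near 2 indicate no first-order autocorrelation, values well below 2 indicate positive autocorrelation. A sketch with simulated errors, assuming statsmodels is available:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 500
# Positively autocorrelated errors: u_t = 0.8 u_{t-1} + e_t
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + e[t]

dw_autocorr = durbin_watson(u)   # far below 2
dw_white = durbin_watson(e)      # close to 2
print(dw_autocorr, dw_white)
```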
• The coefficient estimates derived using OLS are still unbiased, but
they are inefficient, i.e. they are not BLUE, even in large sample sizes.
• Thus, if the standard error estimates are inappropriate, there exists the
possibility that we could make the wrong inferences.
• All of the models we have considered so far have been static, e.g.
yt = β1 + β2x2t + ... + βkxkt + ut
• But we can easily extend this analysis to the case where the current
value of yt depends on previous values of y or one of the x’s, e.g.
yt = β1 + β2x2t + ... + βkxkt + γ1yt-1 + γ2x2t-1 + … + γkxkt-1 + ut
• We could extend the model even further by adding extra lags, e.g.
x2t-2 , yt-3 .
Why Might we Want/Need To Include Lags in a Regression?
• However, other problems with the regression could cause the null
hypothesis of no autocorrelation to be rejected:
– Omission of relevant variables, which are themselves autocorrelated.
– If we have committed a “misspecification” error by using an
inappropriate functional form.
– Autocorrelation resulting from unparameterised seasonality.
Models in First Difference Form
• Denote the first difference of yt, i.e. yt − yt−1, as Δyt; similarly for the
x-variables, Δx2t = x2t − x2t−1 etc.
• “Equilibrium” implies that the variables have reached some steady state
and are no longer changing, i.e. if y and x are in equilibrium, we can say
yt = yt+1 = ... = y and xt = xt+1 = ... = x
Consequently, Δyt = yt − yt−1 = y − y = 0 etc.
If our model is
Δyt = β1 + β2Δx2t + β3x2t-1 + β4yt-1 + ut
then, setting Δyt = 0 and Δx2t = 0 in equilibrium,
β4yt-1 = − β1 − β3x2t-1
so the long-run solution is
y = − (β1/β4) − (β3/β4) x2
Problems with Adding Lagged Regressors to “Cure” Autocorrelation
• This problem occurs when the explanatory variables are very highly
correlated with each other.
• Perfect multicollinearity: cannot estimate all the coefficients
- e.g. suppose x3 = 2x2
and the model is yt = β1 + β2x2t + β3x3t + β4x4t + ut
Corr   x2    x3    x4
x2     -     0.2   0.8
x3     0.2   -     0.3
x4     0.8   0.3   -
• But another problem arises if three or more variables are linearly dependent
- e.g. x2t + x3t = x4t
• Note that high correlation between y and one of the x’s is not
multicollinearity.
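The perfect-multicollinearity case can be illustrated numerically: with an exact linear dependence among the regressors, the design matrix loses rank, so (X′X) is singular and not all coefficients can be estimated. A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = x2 + x3                  # exact linear dependence, as on the slide

X = np.column_stack([np.ones(n), x2, x3, x4])
# 4 columns but rank 3: (X'X) cannot be inverted, so OLS is infeasible
print(np.linalg.matrix_rank(X))
print(np.corrcoef([x2, x3, x4]))
```

Note that no pairwise correlation need be close to 1 here, which is why rank (not the correlation matrix alone) reveals the problem.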
Solutions to the Problem of Multicollinearity
• We have previously assumed that the appropriate functional form is linear. This
may not always be true.
• We can formally test this using Ramsey’s RESET test, which is a general test for
mis-specification of functional form.
• Essentially the method works by adding higher order terms of the fitted values
(e.g. ŷt², ŷt³ etc.) into an auxiliary regression: regress ût on powers of the
fitted values:
ût = α0 + α1ŷt² + α2ŷt³ + ... + αp-1ŷtᵖ + vt
Obtain R² from this regression. The test statistic is given by TR² and is
distributed as a χ²(p − 1).
• So if the value of the test statistic is greater than the χ²(p − 1) critical
value, then reject the null hypothesis that the functional form was correct.
But what do we do if this is the case?
yt = Axt^β e^(ut)  ⟺  ln yt = α + β ln xt + ut, where α = ln(A)
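The RESET procedure above can be sketched directly, following the auxiliary regression on the slide; the data are simulated, with a deliberately mis-specified linear fit of a quadratic relationship so that the test should reject:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1.0 + 0.5 * x**2 + rng.normal(size=n)   # true relation is quadratic

# Step 1: fit the (mis-specified) linear model by OLS
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Step 2: auxiliary regression of residuals on powers of fitted values
p = 3
Z = np.column_stack([np.ones(n)] + [fitted**k for k in range(2, p + 1)])
gamma, *_ = np.linalg.lstsq(Z, resid, rcond=None)
aux_resid = resid - Z @ gamma
r2 = 1.0 - (aux_resid @ aux_resid) / (resid @ resid)   # R² of the auxiliary regression

# Step 3: TR² is chi-squared with p-1 degrees of freedom under the null
test_stat = n * r2
p_value = stats.chi2.sf(test_stat, p - 1)
print(test_stat, p_value)
```

With the mis-specified linear fit, TR² is large and the null of correct functional form is rejected.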
RESET in EViews
• Skewness and kurtosis are the (standardised) third and fourth moments
of a distribution.
Normal versus Skewed Distributions
[Figures: probability density functions f(x) comparing a normal distribution with a skewed distribution]
Testing for Normality
• Bera and Jarque formalise this by testing the residuals for normality by
testing whether the coefficient of skewness and the coefficient of excess
kurtosis are jointly zero.
• The Bera-Jarque test statistic is given by
W = T [ b1²/6 + (b2 − 3)²/24 ]
which is asymptotically distributed as a χ²(2), where b1 is the coefficient of
skewness and b2 is the coefficient of kurtosis.
• We estimate b1 and b2 using the residuals from the OLS regression, û.
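The statistic above can be computed directly from the sample skewness and kurtosis; a sketch using scipy, with simulated normal data standing in for the OLS residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
u = rng.normal(size=1000)          # stand-in for OLS residuals

# b1: coefficient of skewness; b2: coefficient of kurtosis (3 for a normal)
b1 = stats.skew(u)
b2 = stats.kurtosis(u, fisher=False)

# Bera-Jarque statistic, chi-squared with 2 df under the null of normality
W = len(u) * (b1**2 / 6 + (b2 - 3)**2 / 24)
p_value = stats.chi2.sf(W, 2)
print(W, p_value)
```

scipy also provides this test directly as `stats.jarque_bera`, which uses the same formula.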
Testing for non-normality in EViews
• We could use an estimation method which does not assume normality, but this
is difficult, and what are its properties?
• Often the case that one or two very extreme residuals causes us to reject
the normality assumption.
[Figure: plot of the residuals ût over time, with a single large outlier in October 1987]