Classical Linear Regression Model Assumptions and Diagnostics
‘Introductory Econometrics for Finance’ © Chris Brooks 2008
Violation of the Assumptions of the CLRM
1. $E(u_t) = 0$
2. $\mathrm{Var}(u_t) = \sigma^2 < \infty$
3. $\mathrm{Cov}(u_i, u_j) = 0$
4. The X matrix is non-stochastic or fixed in repeated samples
5. $u_t \sim N(0, \sigma^2)$
• The $\chi^2$-version is sometimes called an “LM” test, and only has one degree
of freedom parameter: the number of restrictions being tested, m.
• Asymptotically, the two tests are equivalent since the $\chi^2$ is a special case of the
F-distribution:
$$\frac{\chi^2(m)}{m} \to F(m, T-k) \quad \text{as } T-k \to \infty$$
• For small samples, the F-version is preferable.
• The mean of the residuals will always be zero provided that there is a
constant term in the regression.
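A minimal sketch of this property, using made-up data and a hypothetical helper `ols_simple` for a bivariate regression. With a constant term the residuals average to zero (up to floating-point error). Without one they generally do not.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Illustrative data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

# Regression with a constant: residual mean is zero
a, b = ols_simple(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_resid = sum(residuals) / len(residuals)

# Regression through the origin (no constant): residual mean need not be zero
b_no_const = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
resid_no_const = [yi - b_no_const * xi for xi, yi in zip(x, y)]
mean_resid_no_const = sum(resid_no_const) / len(resid_no_const)

print(f"with constant:    {mean_resid:.2e}")
print(f"without constant: {mean_resid_no_const:.4f}")
```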
• We have so far assumed that the variance of the errors is constant, $\sigma^2$; this
is known as homoscedasticity. If the errors do not have a constant
variance, we say that they are heteroscedastic, e.g. say we estimate a
regression and calculate the residuals, $\hat{u}_t$.
[Figure: residuals $\hat{u}_t$ plotted against $x_{2t}$]
Detection of Heteroscedasticity: The GQ Test
• Graphical methods
• Formal tests: There are many of them: we will discuss Goldfeld-Quandt test and
White’s test
1. Split the total sample of length T into two sub-samples of length T1 and T2. The
regression model is estimated on each sub-sample and the two residual variances
are calculated.
2. The null hypothesis is that the variances of the disturbances are equal,
$H_0: \sigma_1^2 = \sigma_2^2$
4. The test statistic, denoted GQ, is simply the ratio of the two residual
variances where the larger of the two variances must be placed in the
numerator.
$$GQ = \frac{s_1^2}{s_2^2}$$
A problem with the test is that the choice of where to split the sample is
usually arbitrary and may crucially affect the outcome of the test.
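The steps of the GQ test can be sketched as follows. This is an illustration under stated assumptions: the data are simulated so that the error spread grows with x, the sample is split at the midpoint, `ols_simple` and `residual_variance` are hypothetical helpers, and each residual variance is taken as RSS divided by (sub-sample size minus k) with k = 2.

```python
import random

def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residual_variance(x, y, k=2):
    """Residual variance s^2 = RSS / (T_i - k) for one sub-sample."""
    a, b = ols_simple(x, y)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - k)

# Simulated heteroscedastic data: error standard deviation proportional to x
random.seed(42)
x = [i / 10 for i in range(1, 61)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.3 * xi) for xi in x]

# Step 1: split the sample and estimate the model on each sub-sample
split = len(x) // 2
s1_sq = residual_variance(x[:split], y[:split])
s2_sq = residual_variance(x[split:], y[split:])

# Test statistic: ratio of residual variances, larger one in the numerator
gq = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
print(f"s1^2 = {s1_sq:.4f}, s2^2 = {s2_sq:.4f}, GQ = {gq:.3f}")
```

Moving `split` changes `gq`, which is the arbitrariness problem noted above.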
• OLS estimation still gives unbiased coefficient estimates, but they are
no longer BLUE.
• Whether the standard errors calculated using the usual formulae are
too big or too small will depend upon the form of the
heteroscedasticity.
• If the form (i.e. the cause) of the heteroscedasticity is known, then we can
use an estimation method which takes this into account (called generalised
least squares, GLS).
t          $y_t$    $y_{t-1}$   $\Delta y_t$
1989M09     0.8     -           -
1989M10     1.3     0.8         1.3 - 0.8 = 0.5
1989M11    -0.9     1.3         -0.9 - 1.3 = -2.2
1989M12     0.2    -0.9         0.2 - (-0.9) = 1.1
1990M01    -1.7     0.2         -1.7 - 0.2 = -1.9
1990M02     2.3    -1.7         2.3 - (-1.7) = 4.0
1990M03     0.1     2.3         0.1 - 2.3 = -2.2
1990M04     0.0     0.1         0.0 - 0.1 = -0.1
.           .       .           .
.           .       .           .
.           .       .           .
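The lag and first-difference columns of the table can be reproduced with a short sketch. The variable names are illustrative, and values are rounded to one decimal place to match the table.

```python
# y_t values from 1989M09 to 1990M04, as listed in the table
y = [0.8, 1.3, -0.9, 0.2, -1.7, 2.3, 0.1, 0.0]

# Lagged series y_{t-1}: undefined for the first observation
y_lag = [None] + y[:-1]

# First difference: y_t - y_{t-1}, also undefined for the first observation
dy = [None] + [round(curr - prev, 1) for prev, curr in zip(y[:-1], y[1:])]

for yt, ylag, d in zip(y, y_lag, dy):
    print(yt, ylag, d)
```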
• We assumed of the CLRM’s errors that $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$.
This is essentially the same as saying there is no pattern in the errors.
• If there are patterns in the residuals from a model, we say that they are
autocorrelated.
[Figures: residuals $\hat{u}_t$ plotted against $\hat{u}_{t-1}$ and against time]
• The coefficient estimates derived using OLS are still unbiased, but
they are inefficient, i.e. they are not BLUE, even in large sample sizes.
• Thus, if the standard error estimates are inappropriate, there exists the
possibility that we could make the wrong inferences.
• All of the models we have considered so far have been static, e.g.
$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$
• But we can easily extend this analysis to the case where the current
value of yt depends on previous values of y or one of the x’s, e.g.
$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + \gamma_1 y_{t-1} + \gamma_2 x_{2t-1} + \dots + \gamma_k x_{kt-1} + u_t$
• We could extend the model even further by adding extra lags, e.g.
$x_{2t-2}$, $y_{t-3}$.
• However, other problems with the regression could cause the null hypothesis
of no autocorrelation to be rejected:
– Omission of relevant variables, which are themselves autocorrelated.
– If we have committed a “misspecification” error by using an inappropriate
functional form.
– Autocorrelation resulting from unparameterised seasonality.
• Denote the first difference of $y_t$, i.e. $y_t - y_{t-1}$, as $\Delta y_t$; similarly for the
x-variables, $\Delta x_{2t} = x_{2t} - x_{2t-1}$ etc.
• “Equilibrium” implies that the variables have reached some steady state and
are no longer changing, i.e. if y and x are in equilibrium, we can say
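$y_t = y_{t+1} = \dots = y$ and $x_t = x_{t+1} = \dots = x$
Consequently, $\Delta y_t = y_t - y_{t-1} = y - y = 0$ etc.
If our model is
$$\Delta y_t = \beta_1 + \beta_2 \Delta x_{2t} + \beta_3 x_{2t-1} + \beta_4 y_{t-1} + u_t$$
then setting the differences and the error term to zero gives
$$\beta_4 y_{t-1} = -\beta_1 - \beta_3 x_{2t-1}$$
$$y = -\frac{\beta_1}{\beta_4} - \frac{\beta_3}{\beta_4} x_2$$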
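As a numerical sketch of the long-run solution: for a model of the form Δy_t = β1 + β2 Δx_2t + β3 x_2t-1 + β4 y_t-1 + u_t, setting the differences and the error to zero gives y = -β1/β4 - (β3/β4) x2. The function name and the coefficient values below are hypothetical, chosen only for illustration.

```python
def long_run_solution(beta1, beta3, beta4):
    """Return (intercept, slope) of the static equilibrium y = intercept + slope * x2."""
    if beta4 == 0:
        raise ValueError("beta4 must be non-zero for a long-run solution to exist")
    return -beta1 / beta4, -beta3 / beta4

# Hypothetical coefficient estimates (not from the slides)
beta1, beta3, beta4 = 0.5, 0.3, -0.6
intercept, slope = long_run_solution(beta1, beta3, beta4)

# Check: at equilibrium the levels part of the model must sum to zero
x2 = 2.0
y_eq = intercept + slope * x2
check = beta1 + beta3 * x2 + beta4 * y_eq

print(f"y = {intercept:.4f} + {slope:.4f} * x2, equilibrium check = {check:.2e}")
```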
• This problem occurs when the explanatory variables are very highly
correlated with each other.
• Perfect multicollinearity
Cannot estimate all the coefficients
- e.g. suppose $x_3 = 2x_2$
and the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
Corr   x2    x3    x4
x2     -     0.2   0.8
x3     0.2   -     0.3
x4     0.8   0.3   -
• But another problem: if 3 or more variables are linearly related
- e.g. $x_{2t} + x_{3t} = x_{4t}$
• Note that high correlation between y and one of the x’s is not
multicollinearity.
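A small sketch of both cases with made-up data. The helper `pearson` is a hypothetical name for a plain correlation coefficient. The first case has x3 = 2·x2, so the correlation is exactly one. The second has x4 = x2 + x3, an exact linear relationship that no single pairwise correlation reveals.

```python
import math

def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Case 1: perfect multicollinearity between two variables, x3 = 2 * x2
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3_double = [2.0 * v for v in x2]
corr_perfect = pearson(x2, x3_double)

# Case 2: three variables linked by x2 + x3 = x4
x3 = [3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
x4 = [u + v for u, v in zip(x2, x3)]
pairwise = {
    ("x2", "x3"): pearson(x2, x3),
    ("x2", "x4"): pearson(x2, x4),
    ("x3", "x4"): pearson(x3, x4),
}

print(f"corr(x2, 2*x2) = {corr_perfect:.3f}")
for pair, value in pairwise.items():
    print(pair, round(value, 3))
```

In the second case every pairwise correlation is below one in absolute value, so inspecting the correlation matrix alone would miss the exact dependence.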
• Essentially the method works by adding higher order terms of the fitted values (e.g.
$\hat{y}_t^2$, $\hat{y}_t^3$ etc.) into an auxiliary regression:
Regress $\hat{u}_t$ on powers of the fitted values:
$$\hat{u}_t = \beta_0 + \beta_1 \hat{y}_t^2 + \beta_2 \hat{y}_t^3 + \dots + \beta_{p-1} \hat{y}_t^p + v_t$$
Obtain $R^2$ from this regression. The test statistic is given by $TR^2$ and is distributed as
a $\chi^2(p-1)$.
• So if the value of the test statistic is greater than a $\chi^2(p-1)$ critical value then reject the null
hypothesis that the functional form was correct.
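A minimal sketch of the auxiliary-regression procedure described above for the simplest case p = 2, so that only the squared fitted value enters and the statistic is compared with a χ²(1) critical value (3.841 at the 5% level). The data are simulated with a quadratic term that the linear model omits, and `ols_simple` is a hypothetical helper.

```python
import random

def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Simulated data: the true relationship is quadratic, the fitted model is linear
random.seed(0)
T = 41
x = [(i - 20) / 10 for i in range(T)]
y = [xi + 0.5 * xi ** 2 + random.gauss(0.0, 0.1) for xi in x]

# Original (misspecified) linear regression
a, b = ols_simple(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Auxiliary regression: residuals on a constant and the squared fitted values
fitted_sq = [f ** 2 for f in fitted]
c0, c1 = ols_simple(fitted_sq, resid)
aux_resid = [u - (c0 + c1 * f2) for u, f2 in zip(resid, fitted_sq)]

# R^2 of the auxiliary regression and the T * R^2 statistic
u_bar = sum(resid) / T
tss = sum((u - u_bar) ** 2 for u in resid)
rss = sum(e ** 2 for e in aux_resid)
r_squared = 1.0 - rss / tss
reset_stat = T * r_squared

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-squared with 1 degree of freedom
reject = reset_stat > CHI2_1_CRIT_5PCT
print(f"R^2 = {r_squared:.3f}, TR^2 = {reset_stat:.2f}, reject H0: {reject}")
```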
$y_t = A x_t^{\beta} e^{u_t} \;\Leftrightarrow\; \ln y_t = \alpha + \beta \ln x_t + u_t$
• Skewness and kurtosis are the (standardised) third and fourth moments
of a distribution.
[Figures: probability density plots, f(x) against x]
• Bera and Jarque formalise this by testing the residuals for normality by
testing whether the coefficient of skewness and the coefficient of excess
kurtosis are jointly zero.
• We estimate $b_1$ and $b_2$ using the residuals from the OLS regression, $\hat{u}$.
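A sketch of how $b_1$ and $b_2$ might be computed from a residual series. The slides do not show the statistic itself, so the standard Bera-Jarque form W = T[b1²/6 + (b2 - 3)²/24] is assumed here, compared with a χ²(2) critical value (5.991 at the 5% level). The function name and the simulated residuals are illustrative.

```python
import random

def bera_jarque(resid):
    """Return (b1, b2, W): skewness, kurtosis and the Bera-Jarque statistic."""
    T = len(resid)
    mean = sum(resid) / T
    dev = [u - mean for u in resid]
    var = sum(d ** 2 for d in dev) / T
    b1 = (sum(d ** 3 for d in dev) / T) / var ** 1.5   # coefficient of skewness
    b2 = (sum(d ** 4 for d in dev) / T) / var ** 2     # coefficient of kurtosis
    w = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)         # excess kurtosis is b2 - 3
    return b1, b2, w

# Simulated "residuals": normal draws, then the same draws plus one extreme value
random.seed(1)
clean = [random.gauss(0.0, 1.0) for _ in range(200)]
with_outlier = clean + [10.0]

_, _, w_clean = bera_jarque(clean)
_, _, w_outlier = bera_jarque(with_outlier)

CHI2_2_CRIT_5PCT = 5.991  # 5% critical value of chi-squared with 2 degrees of freedom
print(f"W (clean) = {w_clean:.2f}, W (one outlier) = {w_outlier:.2f}")
```

A single extreme residual is enough to push the statistic far beyond the critical value.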
• Could use a method which does not assume normality, but difficult and
what are its properties?
• It is often the case that one or two very extreme residuals cause us to reject
the normality assumption.
[Figure: residuals $\hat{u}_t$ plotted against time, with October 1987 marked]
• Create a new variable:
D87M10t = 1 during October 1987 and zero otherwise.
This effectively knocks out that observation. But we need a theoretical reason for
adding dummy variables.
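Constructing such a dummy is mechanical. A brief sketch, assuming monthly observations labelled in the `YYYYMmm` style used elsewhere in these slides. The sample range is hypothetical.

```python
# Hypothetical monthly sample: 1987M01 to 1988M12
months = [f"{year}M{month:02d}" for year in range(1987, 1989) for month in range(1, 13)]

# D87M10_t = 1 during October 1987 and zero otherwise
d87m10 = [1 if label == "1987M10" else 0 for label in months]

print(list(zip(months, d87m10))[8:11])
```

Including `d87m10` as an extra regressor forces the residual for that month to zero, which is the sense in which the observation is knocked out.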
• Coefficient estimates will still be consistent and unbiased, but the estimators
will be inefficient.
• We have implicitly assumed that the parameters ($\beta_1$, $\beta_2$ and $\beta_3$) are constant
for the entire sample period.
• We can test this implicit assumption using parameter stability tests. The idea
is essentially to split the data into sub-periods and then to estimate up to
three models, for each of the sub-parts and for all the data and then to
“compare” the RSS of the models.
2. The restricted regression is now the regression for the whole period while
the “unrestricted regression” comes in two parts: for each of the sub-samples.
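We can thus form an F-test based on the difference between the RSS’s. The statistic is
$$\text{test statistic} = \frac{RSS - (RSS_1 + RSS_2)}{RSS_1 + RSS_2} \times \frac{T - 2k}{k}$$
where: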
RSS = RSS for whole sample
RSS1 = RSS for sub-sample 1
RSS2 = RSS for sub-sample 2
T = number of observations
2k = number of regressors in the “unrestricted” regression (since it comes in
two parts)
k = number of regressors in (each part of the) “unrestricted” regression
3. Perform the test. If the value of the test statistic is greater than the critical
value from the F-distribution, which is an F(k, T-2k), then reject the null
hypothesis that the parameters are stable over time.
• Consider the following regression for the CAPM (again) for the returns on
Glaxo.
• Say that we are interested in estimating Beta for monthly data from 1981-
1992. The model for each sub-period is
• 1981M1 - 1987M10
$0.24 + 1.2 R_{Mt}$     T = 82     $RSS_1$ = 0.03555
• 1987M11 - 1992M12
$0.68 + 1.53 R_{Mt}$    T = 62     $RSS_2$ = 0.00336
• 1981M1 - 1992M12
$0.39 + 1.37 R_{Mt}$    T = 144    RSS = 0.0434
$H_0: \alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$
• The unrestricted model is the model where this restriction is not imposed
$$\text{Test statistic} = \frac{0.0434 - (0.0355 + 0.00336)}{0.0355 + 0.00336} \times \frac{144 - 4}{2} = 7.698$$
• We reject H0 at the 5% level and say that we reject the restriction that the
coefficients are the same in the two periods.
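The calculation can be sketched as a small function. The name `chow_statistic` is hypothetical, and the inputs are the Glaxo figures quoted above with k = 2. Evaluated without intermediate rounding, the statistic comes out slightly above 8. The reported 7.698 appears consistent with rounding RSS1 + RSS2 to 0.0391 before dividing. Either value lies well above the 5% critical value of an F(2, 140), which is roughly 3.06, so the conclusion is unchanged.

```python
def chow_statistic(rss, rss1, rss2, T, k):
    """Chow test: [RSS - (RSS1 + RSS2)] / (RSS1 + RSS2) * (T - 2k) / k."""
    unrestricted = rss1 + rss2
    return (rss - unrestricted) / unrestricted * (T - 2 * k) / k

# Figures as used in the worked example above
stat = chow_statistic(rss=0.0434, rss1=0.0355, rss2=0.00336, T=144, k=2)

# Same calculation with RSS1 + RSS2 rounded to 0.0391 first
stat_rounded = (0.0434 - 0.0391) / 0.0391 * (144 - 2 * 2) / 2

print(f"unrounded: {stat:.3f}, with rounded sum: {stat_rounded:.3f}")
```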
• Problem with the Chow test is that we need to have enough data to do the
regression on both sub-samples, i.e. T1>>k, T2>>k.
• An alternative formulation is the predictive failure test.
• What we do with the predictive failure test is estimate the regression over a “long” sub-
period (i.e. most of the data) and then we predict values for the other period and compare
the two.
[Figure: value of series ($y_t$) plotted against sample period]
- Split the data according to any known important historical events (e.g. stock market crash, new government elected)
- Use all but the last few observations and do a predictive failure test on those.
Our Objective:
• “Specific-to-general” was used almost universally until the mid 1980’s, and
involved starting with the simplest model and gradually adding to it.
• Little, if any, diagnostic testing was undertaken. But this meant that all inferences
were potentially invalid.
• The advantages of this approach are that it is statistically sensible and also the
theory on which the models are based usually has nothing to say about the lag
structure of a model.
• First step is to form a “large” model with lots of variables on the right hand side
• This is known as a GUM (generalised unrestricted model)
• At this stage, we want to make sure that the model satisfies all of the
assumptions of the CLRM
• If the assumptions are violated, we need to take appropriate actions to remedy
this, e.g.
- taking logs
- adding lags
- dummy variables
• We need to do this before testing hypotheses
• Once we have a model which satisfies the assumptions, it could be very big
with lots of lags & independent variables
Financial background:
• What are sovereign credit ratings and why are we interested in them?
• Two ratings agencies (Moody’s and Standard and Poor’s) provide credit ratings
for many governments.
• Data
Quantifying the ratings (dependent variable): Aaa/AAA=16, ... , B3/B-=1
                                  Dependent Variable
Explanatory Variable   Expected   Average     Moody’s     S&P         Moody’s / S&P
                       sign       Rating      Rating      Rating      Difference
Intercept              ?          1.442       3.408       -0.524      3.932**
                                  (0.663)     (1.379)     (-0.223)    (2.521)
Per capita income      +          1.242***    1.027***    1.458***    -0.431***
                                  (5.302)     (4.041)     (6.048)     (-2.688)
GDP growth             +          0.151       0.130       0.171**     -0.040
                                  (1.935)     (1.545)     (2.132)     (0.756)
Inflation              -          -0.611***   -0.630***   -0.591***   -0.039
                                  (-2.839)    (-2.701)    (2.671)     (-0.265)
Fiscal Balance         +          0.073       0.049       0.097*      -0.048
                                  (1.324)     (0.818)     (1.71)      (-1.274)
External Balance       +          0.003       0.006       0.001       0.006
                                  (0.314)     (0.535)     (0.046)     (0.779)
External Debt          -          -0.013***   -0.015***   -0.011***   -0.004***
                                  (-5.088)    (-5.365)    (-4.236)    (-2.133)
Development dummy      +          2.776***    2.957***    2.595***    0.362
                                  (4.25)      (4.175)     (3.861)     (0.81)
Default dummy          -          -2.042***   -1.63**     -2.622***   1.159***
                                  (-3.175)    (-2.097)    (-3.962)    (2.632)
Adjusted $R^2$                    0.924       0.905       0.926       0.836
Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels
respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
• Adjusted $R^2$ is high
0 /1 dummies for
and
• The ratings provide more information on yields than all of the macro
factors put together.
• We cannot determine well what factors influence how the markets will
react to ratings announcements.
• No attempt at reparameterisation