Violation of the Classical Assumptions
REGRESSION DIAGNOSTIC I:
MULTICOLLINEARITY
Damodar Gujarati
Econometrics by Example
MULTICOLLINEARITY
One of the assumptions of the classical linear regression model (CLRM) is that there is no exact linear relationship among the regressors.
If there are one or more such relationships among
the regressors, we call it multicollinearity, or
collinearity for short.
Perfect collinearity: An exact linear relationship exists among two or more regressors.
Imperfect collinearity: The regressors are highly (but not perfectly) collinear.
CONSEQUENCES
If collinearity is not perfect, but high, several
consequences ensue:
The OLS estimators are still BLUE, but one or more regression
coefficients have large standard errors relative to the values of
the coefficients, thereby making the t ratios small.
Even though some regression coefficients are statistically
insignificant, the R2 value may be very high.
Therefore, one may conclude (misleadingly) that the true values
of these coefficients are not different from zero.
Also, the regression coefficients may be very sensitive to small
changes in the data, especially if the sample is relatively small.
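To make these consequences concrete, here is a hedged simulation in Python with statsmodels (all data hypothetical): two nearly collinear regressors produce a high $R^2$ alongside small individual t ratios.

```python
# A minimal, hypothetical simulation of the consequence described above:
# near-perfect collinearity inflates standard errors, shrinking t ratios
# even though the overall fit (R^2) remains high.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)      # x3 is nearly identical to x2
y = 1 + x2 + x3 + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
print("R2:", round(res.rsquared, 3))          # typically high
print("t ratios:", np.round(res.tvalues, 2))  # slope t ratios typically small
```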
VARIANCE INFLATION FACTOR
For the following regression model:
$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i$$
It can be shown that:
$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{2i}^2}\,\mathrm{VIF}$$
and
$$\operatorname{var}(b_3) = \frac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{3i}^2}\,\mathrm{VIF}$$
where $\sigma^2$ is the variance of the error term $u_i$, $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$, and lowercase $x$ denotes deviations from sample means.
VARIANCE INFLATION FACTOR (CONT.)
$$\mathrm{VIF} = \frac{1}{1 - r_{23}^2}$$
is the variance-inflating factor.
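As an illustration, the VIF can be computed directly with statsmodels' variance_inflation_factor; a minimal sketch on hypothetical, deliberately collinear simulated data:

```python
# A minimal sketch: computing VIFs with statsmodels on hypothetical,
# deliberately collinear simulated regressors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = 0.9 * x2 + rng.normal(scale=0.3, size=100)   # x3 nearly collinear with x2
X = sm.add_constant(np.column_stack([x2, x3]))    # columns: const, X2, X3

for i, name in enumerate(["const", "X2", "X3"]):
    print(name, variance_inflation_factor(X, i))  # VIF_i = 1/(1 - R_i^2)
```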
DETECTION OF MULTICOLLINEARITY
1. High R2 but few significant t ratios
2. High pair-wise correlations among explanatory
variables or regressors
3. High partial correlation coefficients
4. Significant F test for auxiliary regressions (regressions of each regressor on the remaining regressors); see the sketch after this list
5. High Variance Inflation Factor (VIF) and low Tolerance Factor (TOL, the reciprocal of VIF)
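A hedged sketch of detection method 4, the auxiliary regression, again on hypothetical simulated data:

```python
# A minimal sketch of an auxiliary regression: regress one regressor on
# the remaining ones and inspect the F test. Data are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x2 = rng.normal(size=100)
x3 = 0.9 * x2 + rng.normal(scale=0.3, size=100)

aux = sm.OLS(x2, sm.add_constant(x3)).fit()   # auxiliary regression of X2 on X3
print("R2:", aux.rsquared)
print("F p-value:", aux.f_pvalue)             # significant F flags collinearity
```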
REMEDIAL MEASURES
What should we do if we detect multicollinearity?
Nothing, for we often have no control over the data.
Redefining the model by excluding variables may attenuate the problem, provided we do not omit relevant variables.
Principal components analysis: Construct artificial
variables from the regressors such that they are orthogonal
to one another.
These principal components become the regressors in the
model.
Yet the interpretation of the coefficients on the principal
components is not as straightforward.
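A hedged sketch of the principal-components remedy, assuming scikit-learn is available; the data and dimensions are hypothetical. Note that the estimated coefficients now refer to the components, which is the interpretive cost noted above.

```python
# A hedged sketch: replace collinear regressors with orthogonal principal
# components, then run OLS on the components. Data are hypothetical.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
x2 = rng.normal(size=100)
x3 = 0.9 * x2 + rng.normal(scale=0.3, size=100)
y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=100)

pcs = PCA(n_components=2).fit_transform(np.column_stack([x2, x3]))
res = sm.OLS(y, sm.add_constant(pcs)).fit()   # components are orthogonal
print(res.params)                             # coefficients on PCs, not on X2/X3
```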
CHAPTER 5
HETEROSCEDASTICITY
One of the assumptions of the classical linear regression model (CLRM) is that the variance of $u_i$, the error term, is constant, or homoscedastic. When this assumption is violated, the errors are heteroscedastic.
Reasons are many, including:
The presence of outliers in the data
Incorrect functional form of the regression model
Incorrect transformation of data
Mixing observations with different measures of scale
(such as mixing high-income households with low-
income households).
CONSEQUENCES
If heteroscedasticity exists, several consequences
ensue:
The OLS estimators are still unbiased and consistent, yet the
estimators are less efficient, making statistical inference less
reliable (i.e., the estimated t values may not be reliable).
Thus, estimators are not best linear unbiased estimators
(BLUE); they are simply linear unbiased estimators (LUE).
In the presence of heteroscedasticity, the BLUE estimators
are provided by the method of weighted least squares
(WLS).
DETECTION OF HETEROSCEDASTICITY
Graph histogram of squared residuals
Graph squared residuals against predicted Y (see the sketch after this list)
Breusch-Pagan (BP) Test
White’s Test of Heteroscedasticity
Other tests such as Park, Glejser, Spearman’s rank
correlation, and Goldfeld-Quandt tests of
heteroscedasticity
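A minimal sketch of the residuals-vs-fitted plot, using hypothetical data whose error variance grows with the regressor:

```python
# A minimal sketch of the graphical check on hypothetical data whose
# error standard deviation grows with the regressor.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 2 + 0.5 * x + rng.normal(scale=x, size=200)  # heteroscedastic errors
res = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(res.fittedvalues, res.resid**2, s=10)
plt.xlabel("fitted Y")
plt.ylabel("squared residuals")  # a fan/funnel shape suggests heteroscedasticity
plt.show()
```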
BREUSCH-PAGAN (BP) TEST
Estimate the OLS regression and obtain the squared OLS residuals from this regression.
Regress the squared residuals on the k regressors included in the model.
You can also choose other regressors that might have some bearing on the error variance.
The null hypothesis here is that the error variance is homoscedastic – that is, all the slope coefficients are simultaneously equal to zero.
Use the F statistic from this regression, with (k − 1) numerator and (n − k) denominator degrees of freedom, to test this hypothesis.
If the computed F statistic is statistically significant, we can reject
the hypothesis of homoscedasticity. If it is not, we may not reject
the null hypothesis.
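A minimal sketch of the BP test with statsmodels' het_breuschpagan, which runs the auxiliary regression internally; the simulated data are hypothetical:

```python
# A minimal sketch of the Breusch-Pagan test. het_breuschpagan returns the
# LM statistic, its p-value, the F statistic, and the F p-value.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 200)
X = sm.add_constant(x)
y = 2 + 0.5 * x + rng.normal(scale=x, size=200)  # hypothetical heteroscedastic data
res = sm.OLS(y, X).fit()

lm, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print("F statistic:", f_stat, "p-value:", f_pval)  # small p rejects homoscedasticity
```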
WHITE’S TEST OF HETEROSCEDASTICITY
Regress the squared residuals on the regressors, the squared terms of these regressors, and the pairwise cross-products of the regressors.
Obtain the $R^2$ value from this regression and multiply it by the number of observations.
Under the null hypothesis of homoscedasticity, this product follows the chi-square distribution, with df equal to the number of regressors (excluding the intercept) in the auxiliary regression.
The White test is more general and more flexible than the
BP test.
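A minimal sketch using statsmodels' het_white, which builds the squares and cross-products internally; the data are hypothetical:

```python
# A minimal sketch of White's test. het_white internally augments the
# regressors with their squares and pairwise cross-products.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
x2 = rng.uniform(1, 10, 200)
x3 = rng.uniform(1, 10, 200)
X = sm.add_constant(np.column_stack([x2, x3]))
y = 1 + x2 + x3 + rng.normal(scale=x2, size=200)  # variance tied to x2
res = sm.OLS(y, X).fit()

lm, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print("n*R2:", lm, "p-value:", lm_pval)  # small p rejects homoscedasticity
```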
REMEDIAL MEASURES
What should we do if we detect heteroscedasticity?
Use method of Weighted Least Squares (WLS)
Divide each observation by the (heteroscedastic) $\sigma_i$ and estimate the transformed model by OLS (yet the true error variance is rarely known).
If the true error variance is proportional to the square of one of the
regressors, we can divide both sides of the equation by that variable
and run the transformed regression.
Take the natural log of the dependent variable, which often compresses the scale and stabilizes the error variance.
Use White's heteroscedasticity-consistent standard errors, also known as robust standard errors.
These are valid in large samples (see the sketch below).
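A hedged sketch of two of these remedies, assuming (hypothetically) that the error variance is proportional to $X^2$; data are simulated:

```python
# A hedged sketch of two remedies, assuming the error variance is
# proportional to x^2: WLS with weights 1/x^2, and OLS with White's
# robust (heteroscedasticity-consistent) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 200)
X = sm.add_constant(x)
y = 2 + 0.5 * x + rng.normal(scale=x, size=200)

wls = sm.WLS(y, X, weights=1.0 / x**2).fit()  # weights = 1/variance
robust = sm.OLS(y, X).fit(cov_type="HC1")     # robust standard errors
print("WLS s.e.:", wls.bse)
print("Robust s.e.:", robust.bse)
```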
CHAPTER 6
AUTOCORRELATION
One of the assumptions of the classical linear regression model (CLRM) is that the covariance between $u_i$, the error term for observation i, and $u_j$, the error term for observation j, is zero.
Reasons for autocorrelation include:
The possible strong correlation between the shock in time t and the shock in time t + 1
More common in time-series data
CONSEQUENCES
If autocorrelation exists, several consequences ensue:
The OLS estimators are still unbiased and consistent.
They are still normally distributed in large samples.
They are no longer efficient, meaning that they are no longer
BLUE.
In most cases standard errors are underestimated.
Thus, the hypothesis-testing procedure becomes suspect, since
the estimated standard errors may not be reliable, even
asymptotically (i.e., in large samples).
DETECTION OF AUTOCORRELATION
Graphical method
Plot the values of the residuals, $e_t$, chronologically
If a discernible pattern exists, autocorrelation is likely a problem
Durbin-Watson test
Breusch-Godfrey (BG) test
DURBIN-WATSON (d) TEST
The Durbin-Watson d statistic is defined as:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
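A minimal sketch computing d with statsmodels' durbin_watson on hypothetical data whose errors follow an AR(1) scheme:

```python
# A minimal sketch: Durbin-Watson d on residuals from a hypothetical
# regression whose errors follow an AR(1) scheme with rho = 0.7.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
n = 100
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()  # AR(1) errors
x = np.arange(n, dtype=float)
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print("d =", durbin_watson(res.resid))    # well below 2: positive autocorrelation
```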
DURBIN-WATSON (d) TEST ASSUMPTIONS
Assumptions are:
1. The regression model includes an intercept term.
2. The regressors are fixed in repeated sampling.
3. The error term follows the first-order autoregressive, AR(1), scheme:
$$u_t = \rho u_{t-1} + v_t$$
where ρ (rho) is the coefficient of autocorrelation, a value between −1 and 1.
4. The error term is normally distributed.
5. The regressors do not include the lagged value(s) of the
dependent variable, Yt.
DURBIN-WATSON (d) TEST (CONT.)
Two critical values of the d statistic, dL and dU, called the lower and upper
limits, are established
The decision rules are as follows:
1. If d < dL, there probably is evidence of positive autocorrelation.
2. If d > dU, there probably is no evidence of positive autocorrelation.
3. If dL < d < dU, no definite conclusion about positive autocorrelation.
4. If dU < d < 4 - dU, probably there is no evidence of positive or negative
autocorrelation.
5. If 4 - dU < d < 4 - dL, no definite conclusion about negative autocorrelation.
6. If 4 - dL < d < 4, there probably is evidence of negative autocorrelation.
The d value always lies between 0 and 4.
The closer it is to zero, the greater the evidence of positive autocorrelation; the closer it is to 4, the greater the evidence of negative autocorrelation. If d is about 2, there is no evidence of positive or negative first-order autocorrelation.
BREUSCH-GODFREY (BG) TEST
This test allows for:
(1) Lagged values of the dependent variable to be included as regressors
(2) Higher-order autoregressive schemes, such as AR(2), AR(3), etc.
(3) Moving average terms of the error term, such as $u_{t-1}$, $u_{t-2}$, etc.
The error term in the main equation follows the AR(p) autoregressive structure:
$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + v_t$$
BREUSCH-GODFREY (BG) TEST (CONT.)
The BG test involves the following steps:
Regress $e_t$, the residuals from our main regression, on the regressors in the model and the p autoregressive terms given in the equation on the previous slide, and obtain $R^2$ from this auxiliary regression.
If the sample size is large, Breusch and Godfrey have shown that $(n - p)R^2 \sim \chi^2_p$
That is, in large samples, (n − p) times $R^2$ follows the chi-square distribution with p degrees of freedom.
Rejection of the null hypothesis implies evidence of autocorrelation.
As an alternative, we can use the F value obtained from the auxiliary
regression.
This F value has p numerator and (n − k − p) denominator degrees of freedom, where k represents the number of parameters in the auxiliary regression (including the intercept term). See the sketch below.
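A minimal sketch of the BG test via statsmodels' acorr_breusch_godfrey, on hypothetical data with AR(1) errors:

```python
# A minimal sketch of the BG test. acorr_breusch_godfrey takes a fitted
# results object and the assumed AR order p (nlags); it returns the LM
# statistic, its p-value, the F statistic, and the F p-value.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(9)
n = 200
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()  # hypothetical AR(1) errors
x = rng.normal(size=n)
y = 1 + 2 * x + u
res = sm.OLS(y, sm.add_constant(x)).fit()

lm, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print("LM p-value:", lm_pval)             # small p rejects "no autocorrelation"
```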
REMEDIAL MEASURES
First-Difference Transformation
If autocorrelation is of the AR(1) type, we have $u_t = \rho u_{t-1} + v_t$
Assume ρ = 1 and run the first-difference model (taking first differences of the dependent variable and all regressors)
Generalized Transformation
Estimate ρ by regressing the residuals on their lagged values, then use this estimate to run the transformed (quasi-differenced) regression
Newey-West Method
Generates HAC (heteroscedasticity- and autocorrelation-consistent) standard errors (see the sketch below)
Model Evaluation
Re-examine the model specification, since autocorrelation can be a symptom of a mis-specified model
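A hedged sketch of two of these remedies on hypothetical AR(1) data: Newey-West HAC standard errors, and statsmodels' GLSAR as an iterative implementation of the generalized transformation:

```python
# A hedged sketch of two remedies on hypothetical AR(1) data: Newey-West
# HAC standard errors, and GLSAR (statsmodels' iterative feasible-GLS fit,
# close in spirit to the generalized transformation above).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 200
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
x = rng.normal(size=n)
X = sm.add_constant(x)
y = 1 + 2 * x + u

hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)  # estimates rho, refits
print("HAC s.e.:", hac.bse)
print("GLSAR rho:", glsar.model.rho)
```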