CHAPTER 6
HETEROSCEDASTICITY
6.1 Introduction
Model: Yi = β1 + β2X2i + β3X3i + . . . + βKXKi + εi
Var(εi) = E(εi²) = σ² ,  i = 1, 2, . . ., n
This assumption tells us that the variance remains constant for all observations. But there are
many situations in which this assumption may not hold. For example, the variance of the error
term may increase or decrease with the dependent variable or one of the independent variables.
Under such circumstances, we have the case of heteroscedasticity. Generally, under
heteroscedasticity we have:
E(εi²) = kiσ² , where the ki are not all equal.
We have said earlier that the assumption of no error autocorrelation is more likely to be violated with time series data than with cross-sectional data. Here the reverse holds: heteroscedasticity occurs more often in cross-sectional data.
Consider the case of data on income and expenditure of individual families. Here the assumption of homoscedasticity is not very plausible since we expect less variation in consumption for low-income families than for high-income families. At low levels of income, the average level of consumption is low and the variation around this level is restricted: consumption cannot fall too far below the average level because this might mean starvation, and it cannot rise too far above the average level because the family's assets do not allow it. These constraints are likely to be less binding at higher income levels.
6.2 Consequences of heteroscedasticity
Consider the simple linear regression model (in deviation form):
yi = βxi + εi ,  i = 1, 2, . . ., n
where εi satisfies all assumptions of the CLRM except that the error terms are heteroscedastic, that is, E(εi²) = kiσ² and the ki are not all equal.
The OLS estimator of β is:

β̂ = Σxiyi / Σxi²

Then we have:

β̂ = Σxi(βxi + εi) / Σxi² = β + Σxiεi / Σxi²

Since E(εi) = 0 for all i, taking expectations gives:

E(β̂) = β + ΣxiE(εi) / Σxi² = β

so β̂ remains unbiased.
Recall that the variance of the OLS estimator β̂ when there is no heteroscedasticity (or under
homoscedasticity) is given by:
Var(β̂) = σ² / Σxi²
Under heteroscedasticity we have:

Var(β̂)HET = E(β̂ − β)² = E(Σxiεi / Σxi²)²

Using E(εiεj) = 0 for i ≠ j and E(εi²) = kiσ²:

Var(β̂)HET = [Σxi²E(εi²) + Σ(i≠j) xixjE(εiεj)] / (Σxi²)²
           = σ²Σkixi² / (Σxi²)²
           = (σ² / Σxi²)(Σkixi² / Σxi²)
           = Var(β̂) × (Σkixi² / Σxi²)
Thus, it can be seen that the two variances, Var(β̂) and Var(β̂)HET, will be equal only if ki = 1 for all i, that is, only if the errors are homoscedastic.
• If Σkixi² / Σxi² > 1, then OLS will underestimate the variance of β̂.
• If Σkixi² / Σxi² < 1, then OLS will overestimate the variance of β̂.
Thus, under heteroscedasticity, the OLS estimators of the regression coefficients are no longer BLUE since they are not efficient (not of minimum variance). Generally, under error heteroscedasticity we have the following:
1. The OLS estimators of the regression coefficients are still unbiased and consistent.
2. The estimated variances of the OLS estimators are biased and the conventionally calculated
confidence intervals and test of significance are invalid.
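These two points can be illustrated with a small simulation (my own sketch, not part of the original notes): generate errors whose variance grows with xi², estimate β many times, and compare the conventional variance formula σ²/Σxi² with the true sampling variance of β̂.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta = 50, 5000, 2.0
x = np.linspace(1.0, 10.0, n)
xd = x - x.mean()                    # regressor in deviation form
sig2 = 0.5 * x**2                    # heteroscedastic: Var(eps_i) grows with x_i^2

eps = rng.normal(0.0, np.sqrt(sig2), size=(reps, n))
betas = (beta * xd + eps) @ xd / (xd @ xd)   # OLS slope, one per replication

conventional = sig2.mean() / (xd @ xd)  # sigma^2 / sum(x_i^2), sigma^2 = average variance
true_var = betas.var()
print(betas.mean())                   # close to beta = 2.0: OLS is still unbiased
print(conventional < true_var)        # True: conventional formula understates Var(beta_hat)
```

Here ki grows with xi², so Σkixi²/Σxi² > 1 and the conventional formula underestimates the variance, exactly as the first bullet above states.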
1. White’s test
This test involves regressing the squared OLS residuals ε̂i² on a constant and p auxiliary variables Z1i, . . ., Zpi (constructed from the regressors, their squares and their cross-products) and computing the test statistic nR², where R² is the coefficient of determination of this auxiliary regression.
Decision rule: Reject H0 (the hypothesis of homoscedasticity) if the test statistic nR² exceeds the value from the Chi-square distribution with p degrees of freedom for a given level of significance α.
If our model has only one independent variable Xi, then p = 2: Z1i = Xi and Z2i = Xi². If we have two independent variables X1i and X2i, then p = 5: Z1i = X1i, Z2i = X2i, Z3i = X1iX2i, Z4i = X1i² and Z5i = X2i². And so on.
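For the single-regressor case, the test can be sketched as follows (an illustrative implementation with my own function name, not code from the notes): regress ε̂i² on a constant, Xi and Xi², then compare nR² with the χ²(2) distribution.

```python
import numpy as np
from scipy import stats

def white_test(x, resid):
    """White's test for a model with one regressor: regress the squared
    residuals on a constant, X and X^2, then compare n*R^2 with chi2(p = 2)."""
    e2 = resid ** 2
    Z = np.column_stack([np.ones_like(x), x, x ** 2])  # auxiliary regressors
    coef, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ coef
    r2 = 1.0 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    lm = len(x) * r2                                   # test statistic n*R^2
    return lm, stats.chi2.sf(lm, df=2)                 # statistic and p-value
```

With strongly heteroscedastic errors the statistic is large and the p-value small, leading to rejection of homoscedasticity.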
2. Goldfeld-Quandt test
Suppose we have a model with one explanatory variable X1 and let Y be the dependent
variable. The steps involved in this test are the following:
a) Arrange the observations (both Y and X1 ) in increasing order of X1 .
b) Divide the observations into three parts: n1 observations in the first part, p observations in the middle part, and n2 observations in the second part (n1 + p + n2 = n). Usually p is taken to be about one-sixth of n.
Note: The variances of the last several disturbances in the first part are likely to be similar to those of the first several disturbances in the second part. To increase the power of the test, it is recommended that the two parts be some distance apart. Thus, we drop the middle p observations altogether.
c) Run separate OLS regressions on the first n1 observations and on the last n2 observations, and compute the residual variances s1² and s2², respectively.
d) Calculate the test statistic: Fcal = s2² / s1²
e) Decision rule: Reject the null hypothesis H 0 : σ12 = σ 22 (and conclude that the errors are
heteroscedastic) if:
Fcal > Fα(n1 − 2, n2 − 2)
where Fα(n1 − 2, n2 − 2) is the critical value from the F-distribution with n1 − 2 and n2 − 2 degrees of freedom in the numerator and denominator, respectively, for a given significance level α.
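The steps above can be sketched in code (an illustrative implementation under the notes' assumptions; the function name and the split logic details are my own):

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(x, y, drop_frac=1/6):
    """Goldfeld-Quandt test for a single-regressor model: sort by X,
    drop the middle observations, fit OLS on each tail and compare
    the residual variances with an F test."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(x)
    p = int(round(n * drop_frac))        # middle block to drop
    n1 = (n - p) // 2
    n2 = n - p - n1
    def resid_var(xs, ys):
        slope, intercept = np.polyfit(xs, ys, 1)
        e = ys - (intercept + slope * xs)
        return (e @ e) / (len(xs) - 2)   # s^2 with n - 2 degrees of freedom
    s1 = resid_var(x[:n1], y[:n1])
    s2 = resid_var(x[-n2:], y[-n2:])
    F = s2 / s1
    return F, stats.f.sf(F, n2 - 2, n1 - 2)
```

When the error variance rises with X, s2² is much larger than s1², F is large, and the null of homoscedasticity is rejected.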
3. Breusch-Pagan test
Suppose the error variances σi² were known. Dividing the i-th observation by σi transforms the model into
Yi/σi = α(1/σi) + β(Xi/σi) + εi/σi     (*)
where the transformed disturbance ui = εi/σi satisfies Var(ui) = Var(εi)/σi² = 1. Thus, the variance of the transformed disturbances is constant, so we can apply OLS to equation (*) to get regression coefficient estimates that are BLUE (Gauss-Markov Theorem). This estimation method is known as weighted least squares (WLS) since each observation is weighted (multiplied) by wi = 1/σi.
The major difficulty with WLS is that σi2 are rarely known. We can overcome this by making
certain assumptions about σi2 or by estimating σi2 from the sample. The information about σi2
is frequently in the form of an assumption that σi2 is associated with some variable, say Zi .
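A minimal sketch of the transformation (my own illustration, assuming the σi are known; the function name is hypothetical):

```python
import numpy as np

def wls_fit(X, y, sigma):
    """Weighted least squares: divide each row of X and element of y
    by sigma_i, then apply OLS to the transformed data."""
    Xw = X / sigma[:, None]   # each regressor (including the constant) scaled by 1/sigma_i
    yw = y / sigma
    coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return coef
```

Here X should contain the constant column of 1's; after weighting, that column becomes the 1/σi regressor, exactly as in the transformed equation.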
Illustrative example: Consider the following data on consumption expenditure (Y) and income (X) for 20 households (both in thousands of dollars).
A plot of the residuals ε̂i against the values of the explanatory variable Xi is shown below.
[Figure: residuals plotted against income (X).]
1. Goldfeld-Quandt test
In order to apply this test, we should first order the observations by the magnitude of the explanatory variable X. We then divide the data into three parts: n1 = 8, p = 4 and n2 = 8. As mentioned earlier, to increase the power of the test we drop the middle p = 4 observations. We then run a separate regression on the first and the second parts, and calculate the residual variance for each of the two parts. The results are: s1² = 0.316 and s2² = 3.383.
Calculate the Goldfeld-Quandt test statistic as:
Fcal = s2²/s1² = 3.383/0.316 = 10.706
For α = 0.05, Fα (n1 − 2, n 2 − 2) = F0.05 (6, 6) = 4.28 .
Decision: Since Fcal = 10.706 is greater than the tabulated value, we reject the null
hypothesis of homoscedasticity at the 5% significance level.
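These two numbers can be reproduced directly (an illustrative check using scipy for the critical value):

```python
from scipy import stats

F_cal = 3.383 / 0.316               # s2^2 / s1^2 from the two sub-regressions
F_crit = stats.f.ppf(0.95, 6, 6)    # upper 5% point of F(6, 6)
print(round(F_cal, 3), round(F_crit, 2))   # 10.706 4.28
```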
2. White’s test
This involves applying OLS to the auxiliary regression ε̂i² = δ0 + δ1Xi + δ2Xi² + ui and computing the coefficient of determination R²W. This yields R²W = 0.878. The White test statistic is:
χ²cal = nR²W = 20(0.878) = 17.56
We compare this value with χ²α(p) for a given level of significance α. For α = 0.05, χ²0.05(2) = 5.991.
Decision: Since χ 2cal = 17.56 is greater than the tabulated value, we reject the null hypothesis
of homoscedasticity at the 5% level of significance.
To correct for heteroscedasticity by weighted least squares, we first apply OLS estimation and obtain the residuals ε̂i. We then order the residuals by the magnitude of the explanatory variable (income). Next we divide the residuals into three parts: the first and second parts consisting of seven residuals each and the third part consisting of six residuals. The variance of each part is computed as:
σ̂i² = (1/ni) Σ ε̂i²
where ni is the number of residuals in the i-th part, i = 1, 2, 3. The results are:
σ̂12 = 0.219597 ⇒ σ̂1 = 0.468612
σ̂ 22 = 1.281444 ⇒ σ̂ 2 = 1.132009
σ̂32 = 3.482903 ⇒ σ̂3 = 1.866254
The next step is to divide the values of the dependent variable, the independent variable and the constant term (a vector of 1’s) in the i-th part by σ̂i, which gives the transformed model:
Yi/σ̂i = α(1/σ̂i) + β(Xi/σ̂i) + ui , where ui = εi/σ̂i.
The transformed data are:

1/σ̂i     Xi/σ̂i     Yi/σ̂i
2.1340 13.2306 13.0172
2.1340 17.2851 17.0717
2.1340 21.9798 21.9798
2.1340 25.8209 25.8209
2.1340 30.0889 27.9549
2.1340 34.9970 31.5826
2.1340 38.8381 38.1979
0.8834 17.7560 17.4910
0.8834 19.6995 17.5794
0.8834 21.2896 19.0811
0.8834 23.0564 22.5263
0.8834 24.9998 22.0846
0.8834 26.5899 25.8832
0.8834 28.5333 27.5616
0.5358 18.4862 17.7361
0.5358 19.6115 17.0395
0.5358 20.3616 17.9504
0.5358 21.5405 20.7903
0.5358 22.6657 21.8084
0.5358 23.9517 20.6831
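The weights 1/σ̂i in the first column of this table can be reproduced directly from the estimated group variances (a quick illustrative check using the numbers above):

```python
import numpy as np

sigma2 = np.array([0.219597, 1.281444, 3.482903])  # group residual variances
weights = 1.0 / np.sqrt(sigma2)                    # 1 / sigma_hat_i for each group
print(np.round(weights, 4))   # approximately 2.1340, 0.8834, 0.5358
```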
The plot of the residuals of the transformed model against the explanatory variable (income) is
shown below. It can be seen that the spread of the residuals has no increasing or decreasing
pattern, i.e., there is no heteroscedasticity.
[Figure: residuals of the transformed model plotted against income; the spread shows no systematic pattern.]
Another method of correcting for heteroscedasticity is based on the assumption that the variance of the disturbances is proportional to the square of income, that is, σi² = σ²Xi².
The model we are going to estimate is then:
Yi/Xi = α(1/Xi) + β + εi/Xi
⇒ Yi/Xi = β + α(1/Xi) + ui , where ui = εi/Xi.
The transformed disturbance is homoscedastic since Var(ui) = Var(εi)/Xi² = σ².
This simply means that we apply OLS by regressing Yi / X i on 1/ Xi . The SPSS output is
shown below:
ANOVA(b)

Model 1       Sum of Squares   df   Mean Square      F      Sig.
Regression    .010              1   .010          5.275   .034(a)
Residual      .033             18   .002
Total         .043             19

a. Predictors: (Constant), oneoverxi
b. Dependent Variable: yioverxi
The model is adequate as judged by the F-test at the 5% level of significance. The estimated
model is shown below:
Econometrics Lecture notes for Masters Program, 2015, Hawassa University
Coefficients(a)

Model 1       Unstandardized B   Std. Error   Standardized Beta        t    Sig.
(Constant)    .910               .017                               52.624  .000
oneoverxi     .612               .266         .476                   2.297  .034

a. Dependent Variable: yioverxi
Note that the estimated constant term and slope from the transformed model correspond to the values of β̂ and α̂, respectively. Thus, the estimated model is:
Ŷi = 0.612 + 0.910Xi
      (2.297)  (52.624)
(t-ratios in parentheses)
A plot of the residuals from the transformed model is shown below. The plot does not indicate any increasing or decreasing pattern in the scatter of the residuals.
[Figure: unstandardized residuals of the transformed model plotted against income.]
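To see why the 1/Xi transformation works, here is a small synthetic check (my own simulation with made-up parameter values, not the household data from these notes): generate errors with standard deviation proportional to Xi, then recover α and β by regressing Yi/Xi on 1/Xi.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta = 5000, 2.0, 0.9
x = rng.uniform(5.0, 50.0, n)
eps = rng.normal(0.0, 0.2, n) * x      # Var(eps_i) = sigma^2 * X_i^2
y = alpha + beta * x + eps

# Transformed model: Y/X = beta + alpha*(1/X) + u, with homoscedastic u
slope, intercept = np.polyfit(1.0 / x, y / x, 1)
print(intercept, slope)   # intercept is close to beta = 0.9, slope close to alpha = 2.0
```

The constant of the transformed regression estimates the slope β of the original model and the slope on 1/Xi estimates the constant α, mirroring the SPSS results above.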