
CHAPTER 6

HETEROSCEDASTICITY

6.1 Introduction

Dependent variable: Y of size n×1

Independent (explanatory) variables: X₂, X₃, ..., X_K, each of size n×1

Model: Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + ... + β_K X_Kᵢ + εᵢ

where i = 1, 2, ..., n. One of the assumptions of the CLRM is:

Var(εᵢ) = E(εᵢ²) = σ²,  i = 1, 2, ..., n

This assumption tells us that the variance remains constant for all observations. But there are
many situations in which this assumption may not hold. For example, the variance of the error
term may increase or decrease with the dependent variable or one of the independent variables.
Under such circumstances, we have the case of heteroscedasticity. Generally, under
heteroscedasticity we have:
E(εᵢ²) = kᵢσ², where the kᵢ are not all equal

Figure 1: Error variance increases with Xᵢ

We said earlier that the assumption of no error autocorrelation is more likely to be violated in time series data than in cross-sectional data. Here the reverse holds: heteroscedasticity occurs most often in cross-sectional data.

Consider data on the income and expenditure of individual families. Here the assumption of homoscedasticity is not very plausible, since we expect less variation in consumption for low-income families than for high-income families. At low levels of income, the average level of consumption is low and the variation around this level is restricted: consumption cannot fall too far below the average level because this might mean starvation, and it cannot rise too far above the average level because the family's assets do not allow it. These constraints are likely to be less binding at higher income levels.

6.2 Consequences of heteroscedasticity
Consider the simple linear regression model (in deviation form):
yᵢ = βxᵢ + εᵢ,  i = 1, 2, ..., n

where εᵢ satisfies all assumptions of the CLRM except that the error terms are heteroscedastic, that is, E(εᵢ²) = kᵢσ² and the kᵢ are not all equal.

The OLS estimator of β is:

β̂ = Σxᵢyᵢ / Σxᵢ²

Then we have:

β̂ = Σxᵢ(βxᵢ + εᵢ) / Σxᵢ² = β + Σxᵢεᵢ / Σxᵢ²

Since E(εᵢ) = 0:

E(β̂) = β + ΣxᵢE(εᵢ) / Σxᵢ² = β

Thus, β̂ is an unbiased estimator of β even in the presence of heteroscedasticity.

Recall that the variance of the OLS estimator β̂ when there is no heteroscedasticity (or under
homoscedasticity) is given by:
Var(β̂) = σ² / Σxᵢ²

Under heteroscedasticity we have:

Var(β̂)_HET = E(β̂ − β)² = E[(Σxᵢεᵢ / Σxᵢ²)²]

            = E[Σxᵢ²εᵢ² + Σᵢ≠ⱼ xᵢxⱼεᵢεⱼ] / (Σxᵢ²)²

Since E(εᵢεⱼ) = 0 for i ≠ j and E(εᵢ²) = kᵢσ², this reduces to:

Var(β̂)_HET = Σxᵢ²E(εᵢ²) / (Σxᵢ²)² = σ²Σkᵢxᵢ² / (Σxᵢ²)² = (σ² / Σxᵢ²)·(Σkᵢxᵢ² / Σxᵢ²) = Var(β̂)·(Σkᵢxᵢ² / Σxᵢ²)

Thus, the two variances Var(β̂) and Var(β̂)_HET will be equal only if kᵢ = 1 for all i, that is, only if the errors are homoscedastic.

Econometrics Lecture Notes for Masters Program, 2015, Hawassa University
• If Σkᵢxᵢ² / Σxᵢ² < 1, then OLS will overestimate the variance of β̂.

• If Σkᵢxᵢ² / Σxᵢ² > 1, then OLS will underestimate the variance of β̂.

Thus, under heteroscedasticity, the OLS estimators of the regression coefficients are no longer BLUE: they are not efficient. Generally, under error heteroscedasticity we have the following:
1. The OLS estimators of the regression coefficients are still unbiased and consistent.
2. The estimated variances of the OLS estimators are biased, and the conventionally calculated confidence intervals and tests of significance are invalid.

6.3 Tests of heteroscedasticity

1. White’s test

This test involves applying OLS to:


ε̂ᵢ² = γ₀ + γ₁Z₁ᵢ + γ₂Z₂ᵢ + ... + γₚZₚᵢ + uᵢ

and calculating the coefficient of determination R²_W, where the ε̂ᵢ are the OLS residuals from the original model. The null hypothesis is:

H₀: γ₁ = γ₂ = ... = γₚ = 0

The test statistic is:

χ²cal = n·R²_W

Decision rule: Reject H 0 (the hypothesis of homoscedasticity) if the above test statistic
exceeds the value from the Chi-square distribution with p degrees of freedom for a given level
of significance α .

If our model has only one independent variable Xᵢ, then p = 2 with Z₁ᵢ = Xᵢ and Z₂ᵢ = Xᵢ². If we have two independent variables X₁ᵢ and X₂ᵢ, then p = 5 with Z₁ᵢ = X₁ᵢ, Z₂ᵢ = X₂ᵢ, Z₃ᵢ = X₁ᵢX₂ᵢ, Z₄ᵢ = X₁ᵢ² and Z₅ᵢ = X₂ᵢ². And so on.

2. Goldfeld-Quandt test

Suppose we have a model with one explanatory variable X1 and let Y be the dependent
variable. The steps involved in this test are the following:
a) Arrange the observations (both Y and X1 ) in increasing order of X1 .
b) Divide the observations into three parts: n₁ observations in the first part, p observations in the middle part, and n₂ observations in the last part (n₁ + p + n₂ = n). Usually p is taken to be one-sixth of n.

c) Run a regression on the first n₁ observations, obtain the residuals ε̂₁ᵢ, and calculate the residual variance s₁² = Σε̂₁ᵢ² / (n₁ − 2). Similarly, run a regression on the last n₂ observations, obtain the residuals ε̂₂ᵢ, and calculate s₂² = Σε̂₂ᵢ² / (n₂ − 2).

Note: The variances of the last few disturbances in the first part are likely to be similar to those of the first few disturbances in the last part. To increase the power of the test, it is recommended that the two parts be some distance apart. Thus, we drop the middle p observations altogether.

d) Calculate the test statistic: Fcal = s₂² / s₁²
e) Decision rule: Reject the null hypothesis H₀: σ₁² = σ₂² (and conclude that the errors are heteroscedastic) if:

Fcal > Fα(n₁ − 2, n₂ − 2)

where Fα(n₁ − 2, n₂ − 2) is the critical value from the F-distribution with n₁ − 2 numerator and n₂ − 2 denominator degrees of freedom, for a given significance level α.

3. Breusch-Pagan test

This involves applying OLS to:

ε̂ᵢ² / σ̂² = γ₀ + γ₁X₁ᵢ + γ₂X₂ᵢ + ... + γ_K X_Kᵢ + uᵢ

where σ̂² = Σε̂ᵢ² / n, and calculating the regression sum of squares (RSS). The test statistic is:

χ²cal = RSS / 2

Decision rule: Reject the null hypothesis of homoscedasticity, H₀: γ₁ = γ₂ = ... = γ_K = 0, if:

χ²cal > χ²α(K)

where χ²α(K) is the critical value from the Chi-square distribution with K degrees of freedom for a given value of α.

6.4 Correction for heteroscedasticity

Consider the model:


Yᵢ = α + βXᵢ + εᵢ ................ (1)

where E(εᵢ²) = σᵢ² is known for i = 1, 2, ..., n. We make the following transformation:

 Yi  α  Xi   εi 
  =   + β  +  
σi  σi 
 σi  σi 
= Yi* = α* = X*i = ε*i

The transformed model can be written as:

Yi* = α* + β X*i + ε*i ………. (2)


We then check whether the disturbances of equation (2) satisfy the OLS assumptions:

• E(εᵢ*) = E(εᵢ/σᵢ) = E(εᵢ)/σᵢ = 0

• Var(εᵢ*) = E(εᵢ*²) = E(εᵢ²/σᵢ²) = E(εᵢ²)/σᵢ² = σᵢ²/σᵢ² = 1

Thus, the variance of the transformed disturbances is constant. So we can apply OLS to equation (2) to get regression coefficient estimates that are BLUE (Gauss-Markov Theorem).
This estimation method is known as weighted least squares (WLS) since each observation is
weighted (multiplied) by w i = 1/ σi .

Specification of the weights

The major difficulty with WLS is that σi2 are rarely known. We can overcome this by making
certain assumptions about σi2 or by estimating σi2 from the sample. The information about σi2
is frequently in the form of an assumption that σi2 is associated with some variable, say Zi .

Illustration 1: In the case of a micro-consumption function, the variance of the disturbances is often assumed to be positively associated with the level of income. So the place of Zᵢ is taken by the explanatory variable Xᵢ = income, that is:

σᵢ² = σ²Zᵢ² = σ²Xᵢ²

We then divide equation (1) throughout by Xᵢ:

Yᵢ/Xᵢ = α(1/Xᵢ) + β(Xᵢ/Xᵢ) + εᵢ/Xᵢ

⇒ Yᵢ/Xᵢ = α(1/Xᵢ) + β + uᵢ ............ (3)
where uᵢ = εᵢ/Xᵢ. Now we have:

Var(uᵢ) = E(uᵢ²) = E(εᵢ²/Xᵢ²) = E(εᵢ²)/Xᵢ² = σ²Xᵢ²/Xᵢ² = σ²
Hence, the variance of the disturbance term in equation (3) is constant, and we can apply OLS
by regressing Yi / X i on 1/ Xi . Note that the estimated constant term and slope from the
transformed model (3) correspond to the values of β̂ and α̂ , respectively.

Illustration 2: In the case of a micro-consumption function, the variance of the disturbances may also be thought to be associated with some 'outside' variable, say family size Zᵢ, that is:

σᵢ² = σ²Zᵢ²

We then divide equation (1) throughout by Zᵢ:

Yᵢ/Zᵢ = α(1/Zᵢ) + β(Xᵢ/Zᵢ) + νᵢ

where νᵢ = εᵢ/Zᵢ. It can easily be shown that Var(νᵢ) = σ². To estimate the regression coefficients, we run a regression of Yᵢ/Zᵢ on 1/Zᵢ and Xᵢ/Zᵢ without a constant term.

Illustrative example: Consider the following data on consumption expenditure (Y) and
income (X) for 20 households (both in thousands of Dollars):

Household   Income   Expenditure       Household   Income   Expenditure
    1        22.3       19.9              11          8.1        8.0
    2        32.3       31.2              12         34.5       33.1
    3        36.6       31.8              13         38.0       33.5
    4        12.1       12.1              14         14.1       13.1
    5        42.3       40.7              15         16.4       14.8
    6         6.2        6.1              16         24.1       21.6
    7        44.7       38.6              17         30.1       29.3
    8        26.1       25.5              18         28.3       25.0
    9        10.3       10.3              19         18.2       17.9
   10        40.2       38.8              20         20.1       19.8

Applying OLS we get the following results (SPSS output):


Unstandardized Standardized
Coefficients Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) .847 .703 1.204 .244
income .899 .025 .993 35.534 .000

R² = 0.986, F = 1262.637 (p-value < 0.001)

A plot of the residuals ε̂i against the values of the explanatory variable X i is shown below.
Figure 2: Plot of the residuals ε̂ᵢ against income (X)

It can clearly be seen that the scatter of the residuals (i.e., the variance of the residuals)
increases with X i . This is an indication of a heteroscedasticity problem. However, we should
not come to a conclusion until we apply formal tests of the hypothesis of homoscedasticity.

1. Goldfeld-Quandt test

In order to apply this test, we first order the observations in increasing order of the explanatory variable X. We then divide the data into three parts: n₁ = 8, p = 4 and n₂ = 8. As mentioned earlier, to increase the power of the test we drop the middle p = 4 observations. We then run a separate regression on the first and last parts and calculate the residual variance for each of the two parts. The results are s₁² = 0.316 and s₂² = 3.383.

The Goldfeld-Quandt test statistic is:

Fcal = s₂² / s₁² = 3.383 / 0.316 = 10.706

For α = 0.05, Fα(n₁ − 2, n₂ − 2) = F₀.₀₅(6, 6) = 4.28.
Decision: Since Fcal = 10.706 is greater than the tabulated value, we reject the null
hypothesis of homoscedasticity at the 5% significance level.

2. Breusch - Pagan test

This involves applying OLS to:

ε̂ᵢ² / σ̂² = γ₀ + γ₁Xᵢ + uᵢ

where σ̂² = 1.726, and computing the regression sum of squares (RSS). The OLS result gives RSS = 12.132. The Breusch-Pagan test statistic is then:

χ²cal = RSS / 2 = 12.132 / 2 = 6.066

For α = 0.05, the critical value is χ²₀.₀₅(1) = 3.841.
Decision: We reject the null hypothesis of homoscedasticity at the 5% significance level.

3. The White test

This involves applying OLS to:

ε̂ᵢ² = δ₀ + δ₁Xᵢ + δ₂Xᵢ² + uᵢ

and computing the coefficient of determination R²_W. This yields R²_W = 0.878. The White test statistic is:

χ²cal = n·R²_W = 20(0.878) = 17.56

We compare this value with χ²α(p) for a given level of significance α. For α = 0.05, χ²₀.₀₅(2) = 5.991.
Decision: Since χ 2cal = 17.56 is greater than the tabulated value, we reject the null hypothesis
of homoscedasticity at the 5% level of significance.

Weighted least squares (WLS)
All of the tests indicate that the disturbances are heteroscedastic. Thus, the regression
coefficients obtained by OLS are not efficient. In such cases, we have to apply weighted least
squares (WLS) estimation. The weights can be obtained from the sample at hand or from some
prior knowledge. In our example we will estimate the weights σi from the sample.

First we apply OLS and obtain the residuals ε̂ᵢ. We then order the residuals in increasing order of the explanatory variable (income). Next we divide the residuals into three parts: the first and second parts consisting of seven residuals each, and the third part consisting of six residuals. The variance of each part is computed as:

σ̂ᵢ² = (1/nᵢ) Σ ε̂ᵢ²

where nᵢ is the number of residuals in the iᵗʰ part, i = 1, 2, 3. The results are:

σ̂₁² = 0.219597 ⇒ σ̂₁ = 0.468612
σ̂₂² = 1.281444 ⇒ σ̂₂ = 1.132009
σ̂₃² = 3.482903 ⇒ σ̂₃ = 1.866254

The next step is to divide the values of the dependent variable, the independent variable and the constant term (a vector of 1's) in the iᵗʰ part by σ̂ᵢ:

Yᵢ/σ̂ᵢ = α(1/σ̂ᵢ) + β(Xᵢ/σ̂ᵢ) + uᵢ,   where uᵢ = εᵢ/σ̂ᵢ

1/σ̂ᵢ      Xᵢ/σ̂ᵢ      Yᵢ/σ̂ᵢ
2.1340 13.2306 13.0172
2.1340 17.2851 17.0717
2.1340 21.9798 21.9798
2.1340 25.8209 25.8209
2.1340 30.0889 27.9549
2.1340 34.9970 31.5826
2.1340 38.8381 38.1979
0.8834 17.7560 17.4910
0.8834 19.6995 17.5794
0.8834 21.2896 19.0811
0.8834 23.0564 22.5263
0.8834 24.9998 22.0846
0.8834 26.5899 25.8832
0.8834 28.5333 27.5616
0.5358 18.4862 17.7361
0.5358 19.6115 17.0395
0.5358 20.3616 17.9504
0.5358 21.5405 20.7903
0.5358 22.6657 21.8084
0.5358 23.9517 20.6831

We then run an OLS regression of Yᵢ/σ̂ᵢ on 1/σ̂ᵢ and Xᵢ/σ̂ᵢ without a constant term. The results are:

Ŷᵢ = 0.652 + 0.910Xᵢ
     (1.812)  (43.936)

R² = 0.998, σ̂² = 1.107 (t-ratios in parentheses)

The plot of the residuals of the transformed model against the explanatory variable (income) is
shown below. It can be seen that the spread of the residuals has no increasing or decreasing
pattern, i.e., there is no heteroscedasticity.

Figure 3: Plot of the residuals from the transformed model against income (X)

Another method of correcting for heteroscedasticity is based on the assumption that the variance of the disturbances is positively associated with the level of income X, that is:

σᵢ² = σ²Xᵢ²

The model we are going to estimate is then:

Yᵢ/Xᵢ = α(1/Xᵢ) + β(Xᵢ/Xᵢ) + εᵢ/Xᵢ

⇒ Yᵢ/Xᵢ = α(1/Xᵢ) + β + uᵢ,   where uᵢ = εᵢ/Xᵢ
This simply means that we apply OLS by regressing Yi / X i on 1/ Xi . The SPSS output is
shown below:
ANOVA

Model          Sum of Squares   df   Mean Square     F      Sig.
1  Regression      .010          1      .010        5.275   .034
   Residual        .033         18      .002
   Total           .043         19

Predictors: (Constant), oneoverxi
Dependent Variable: yioverxi

The model is adequate as judged by the F-test at the 5% level of significance. The estimated
model is shown below:
Coefficients

                   Unstandardized         Standardized
Model              B       Std. Error     Beta        t        Sig.
1  (Constant)    .910        .017                    52.624    .000
   oneoverxi     .612        .266         .476        2.297    .034

Dependent Variable: yioverxi

Note that the estimated constant term and slope from the transformed model correspond to the values of β̂ and α̂, respectively. Thus, the estimated model is:

Ŷᵢ = 0.612 + 0.910Xᵢ
     (2.297)  (52.624)

(t-ratios in parentheses)

A plot of the residuals from the transformed model is shown below. The plot does not indicate any increasing or decreasing pattern in the scatter of the residuals.

Figure 4: Plot of the residuals from the transformed model against income (X)
