Econometrics Chapter Four: PBMs
In the classical model we assumed that the disturbances $u_t$ have mean zero and constant variance, $var(u_t) = \sigma^2$, and that the errors corresponding to different observations are uncorrelated.
Now we address the following 'what if' questions in this chapter. What if the error variance is not constant over all observations? What if the different errors are correlated? What if the explanatory variables are correlated? We need to ask whether and when such violations of the basic classical assumptions are likely to occur. What are the consequences of such violations for the least squares estimators? How do we detect the presence of autocorrelation, heteroscedasticity, or multicollinearity? What are the remedial measures? In the subsequent sections, we attempt to answer these questions.
4.1 Heteroscedasticity
4.1.1 The Nature of Heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability distribution of the disturbance term remains the same over all observations of X; i.e. the variance of each $u_i$ is the same for all values of the explanatory variable. Symbolically,
$var(u_i) = E(u_i^2) = \sigma^2$, a constant for all i.
When this assumption does not hold, so that $var(u_i) = \sigma_i^2$ varies across observations, the disturbances are said to be heteroscedastic.
4.1.3. Reasons for Heteroscedasticity
There are several reasons why the variance of $u_i$ may vary. Some of these are:
1. Error-learning models: as people learn, their errors of behaviour become smaller over time. In this case $\sigma_i^2$ is expected to decrease. Example: as the number of hours of typing practice increases, the average number of typing errors, as well as their variance, decreases.
2. As data collection techniques improve, $\sigma_i^2$ is likely to decrease. Thus banks that have sophisticated data-processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such facilities.
3. Heteroscedasticity can also arise as a result of the presence of outliers. An outlier is an observation that is much different (either very small or very large) in relation to the other observations in the sample.
4.1.4. Consequences of Heteroscedasticity
1. The OLS estimators remain unbiased. For the model $Y_i = \alpha + \beta X_i + U_i$, the intercept estimator is $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$, so that
$E(\hat{\alpha}) = \alpha + \beta\bar{X} + E(\bar{U}) - E(\hat{\beta})\bar{X} = \alpha$, and similarly $E(\hat{\beta}) = \beta$;
i.e., the least squares estimators are unbiased even under the condition of heteroscedasticity. This is because the proof of unbiasedness does not make use of the homoscedasticity assumption.
2. Variance of OLS coefficients will be incorrect
Under homoscedasticity, $var(\hat{\beta}) = \sigma^2\sum k_i^2 = \dfrac{\sigma^2}{\sum x_i^2}$, but under the heteroscedastic assumption we shall have:
$var(\hat{\beta}) = \sum k_i^2\,var(Y_i) = \sum k_i^2\sigma_i^2 \neq \sigma^2\sum k_i^2$
$\sigma_i^2$ is no longer a finite constant; rather, it tends to change with the values of X, and hence cannot be taken outside the summation sign.
3. The OLS estimators will be inefficient: in other words, the OLS estimators do not have the smallest variance in the class of unbiased estimators and are therefore not efficient in either small or large samples. Under the heteroscedastic assumption, therefore:
$var(\hat{\beta})_{het} = \sum k_i^2\,var(Y_i) = \sum \dfrac{x_i^2}{(\sum x_i^2)^2}\,\sigma_i^2 = \dfrac{\sum x_i^2\sigma_i^2}{(\sum x_i^2)^2}$ ..............................(3.11)

Under homoscedasticity, $var(\hat{\beta})_{hom} = \dfrac{\sigma^2}{\sum x_i^2}$ ..............................(3.12)
These two variances are different. This implies that, under the heteroscedastic assumption, although the OLS estimator is unbiased, it is inefficient: its variance is larger than necessary. To see the consequence of using (3.12) instead of (3.11), let us assume that:
$\sigma_i^2 = k_i\sigma^2$
where the $k_i$ are some non-stochastic constant weights. This assumption merely states that the heteroscedastic variances are proportional to a constant $\sigma^2$, with weights $k_i$ that vary across observations. Substituting this value of $\sigma_i^2$ into (3.11), we obtain:
$var(\hat{\beta})_{het} = \dfrac{\sigma^2\sum k_i x_i^2}{(\sum x_i^2)^2} = \dfrac{\sigma^2}{\sum x_i^2}\cdot\dfrac{\sum k_i x_i^2}{\sum x_i^2} = var(\hat{\beta})_{hom}\cdot\dfrac{\sum k_i x_i^2}{\sum x_i^2}$ ..............................(3.13)
That is to say, if $x_i^2$ and $k_i$ are positively correlated, so that the second term of (3.13) is greater than 1, then $var(\hat{\beta})$ under heteroscedasticity will be greater than its variance under homoscedasticity. As a result, the true standard error of $\hat{\beta}$ will be underestimated if the homoscedastic formula is used. The t-value associated with it will accordingly be overestimated, which might lead to the conclusion that in a specific case at hand $\hat{\beta}$ is statistically significant (which in fact may not be true). Moreover, if we proceed with our model under the false belief of homoscedasticity of the error variance, our inference and prediction about the population coefficients will be incorrect.
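The practical import of equations (3.11)–(3.13) can be checked numerically. The sketch below is a minimal Monte Carlo illustration in Python (assuming numpy is available); the data-generating process with $\sigma_i^2 = \sigma^2 x_i^2$ and all parameter values are illustrative choices of ours, not taken from the text. It compares the conventional variance formula (3.12) with the empirical sampling variance of $\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 5000
x = np.linspace(1.0, 10.0, n)          # fixed regressor values
xd = x - x.mean()                       # deviations from the mean
alpha, beta, sigma = 2.0, 0.5, 1.0

beta_hats, conv_vars = [], []
for _ in range(reps):
    # heteroscedastic errors: var(u_i) = sigma^2 * x_i^2 (illustrative assumption)
    u = rng.normal(0.0, sigma * x)
    y = alpha + beta * x + u
    yd = y - y.mean()
    b_hat = (xd @ yd) / (xd @ xd)       # OLS slope
    e = yd - b_hat * xd                 # residuals in deviation form
    s2 = (e @ e) / (n - 2)              # conventional estimate of sigma^2
    beta_hats.append(b_hat)
    conv_vars.append(s2 / (xd @ xd))    # homoscedastic formula (3.12)

print("empirical var of beta_hat :", np.var(beta_hats))
print("avg conventional variance :", np.mean(conv_vars))
# When sigma_i^2 rises with x_i, the conventional formula understates the true
# sampling variance, so t-ratios based on it are too large.
```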
4.1.5. Detecting Heteroscedasticity
We have observed that the consequences of heteroscedasticity for the OLS estimates are serious. As such, it is desirable to examine whether or not the regression model is in fact homoscedastic. There are two groups of methods for testing or detecting heteroscedasticity. These are:
i. informal methods
ii. formal methods
ii. Formal Methods
a. The Goldfeld-Quandt test
This popular method is applicable if one assumes that the heteroscedastic variance $\sigma_i^2$ is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:
$Y_i = \alpha + \beta X_i + U_i$
Suppose $\sigma_i^2$ is positively related to $X_i$ as:
$\sigma_i^2 = \sigma^2 X_i^2$ ..............................(3.15)
If equation (3.15) is appropriate, it would mean that $\sigma_i^2$ is larger, the larger the value of $X_i$. If that turns out to be the case, heteroscedasticity is most likely to be present in the model. To test this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: Order or rank the observations according to the values of $X_i$, beginning with the lowest X value.
Step 2: Omit c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two groups, each of $\frac{n-c}{2}$ observations.
Step 3: Fit separate OLS regressions to the first $\frac{n-c}{2}$ observations and to the last $\frac{n-c}{2}$ observations, and obtain the respective residual sums of squares $RSS_1$ and $RSS_2$, with $RSS_1$ representing the RSS from the regression corresponding to the smaller $X_i$ values (the small-variance group) and $RSS_2$ that from the larger $X_i$ values (the large-variance group). Each of these RSS has $\frac{n-c}{2} - K$, or $\frac{n-c-2K}{2}$, df, where K is the number of parameters to be estimated, including the intercept term, and df is the degrees of freedom.
Step 4: Compute $\lambda = \dfrac{RSS_2/df}{RSS_1/df}$
If the $U_i$ are assumed to be normally distributed (which we usually do), and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ follows the F distribution with numerator and denominator df each equal to $\frac{n-c-2K}{2}$:
$\lambda = \dfrac{RSS_2/[(n-c-2K)/2]}{RSS_1/[(n-c-2K)/2]} \sim F_{\left(\frac{n-c}{2}-K,\;\frac{n-c}{2}-K\right)}$
If in an application the computed $\lambda$ (= F) is greater than the critical F at the chosen level of significance, we can reject the hypothesis of homoscedasticity, i.e. we can say that heteroscedasticity is very likely.
Example: to illustrate the Goldfeld-Quandt test, we present in table 3.1 data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that consumption expenditure is linearly related to income but that heteroscedasticity is present in the data. We further postulate that the nature of the heteroscedasticity is as given in equation (3.15) above. The necessary reordering of the data for the application of the test is also presented in table 3.1.
Table 3.1 Hypothetical data on consumption expenditure Y($) and income X($). (Data ranked by
X values)
Y    X    (original order)        Y    X    (ranked by X values)
55 80 55 80
65 100 70 85
70 85 75 90
80 110 65 100
79 120 74 105
84 115 80 110
98 130 84 115
95 140 79 120
90 125 90 125
75 90 98 130
74 105 95 140
110 160 108 145
113 150 113 150
125 165 110 160
108 145 125 165
115 180 115 180
140 225 130 185
120 200 135 190
145 240 120 200
130 185 140 205
152 220 144 210
144 210 152 220
175 245 140 225
180 260 137 230
135 190 145 240
140 205 175 245
178 265 189 250
191 270 180 260
137 230 178 265
189 250 191 270
Dropping the middle four observations, the OLS regressions based on the first 13 and the last 13 observations and their associated residual sums of squares are shown next (standard errors in parentheses). Regression based on the first 13 observations:
Yi = 3.4094 + 0.6968 X i + ei
(8.7049) (0.0744)
R 2 = 0.8887
RSS1 = 377.17
df = 11
Regression based on the last 13 observations
Yi = −28.0272 + 0.7941 X i + ei
(30.6421) (0.1319)
R 2 = 0.7681
RSS 2 = 1536.8
df = 11
From these results we obtain: $\lambda = \dfrac{RSS_2/df}{RSS_1/df} = \dfrac{1536.8/11}{377.17/11} = 4.07$
The critical F value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since the estimated F (= $\lambda$) value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance. However, if the level of significance is fixed at 1%, we may not reject the assumption of homoscedasticity (why?). Note that the p-value of the observed $\lambda$ is 0.014.
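As a check on the hand computation above, the following sketch (Python, assuming numpy and scipy are installed) carries out the same Goldfeld-Quandt steps on the table 3.1 data: rank by X, drop the 4 central observations, fit OLS to each half and form the F ratio. It should reproduce, up to rounding, the RSS values and the $\lambda$ reported above.

```python
import numpy as np
from scipy import stats

# Table 3.1: consumption Y and income X for 30 families (original order)
Y = np.array([55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
              115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189])
X = np.array([80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
              180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250])

def rss(y, x):
    """Residual sum of squares from a simple OLS regression of y on x."""
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    a = y.mean() - b * x.mean()
    e = y - a - b * x
    return e @ e

order = np.argsort(X)                  # Step 1: rank observations by X
Y_r, X_r = Y[order], X[order]
c = 4                                   # Step 2: omit 4 central observations
half = (len(X) - c) // 2                # 13 observations in each group

rss1 = rss(Y_r[:half], X_r[:half])      # small-X (small variance) group
rss2 = rss(Y_r[-half:], X_r[-half:])    # large-X (large variance) group
df = half - 2                           # (n - c)/2 - K, with K = 2
lam = (rss2 / df) / (rss1 / df)         # Step 4: the F ratio

print(f"RSS1 = {rss1:.2f}, RSS2 = {rss2:.2f}, lambda = {lam:.2f}")
print(f"5% critical F({df},{df}) = {stats.f.ppf(0.95, df, df):.2f}")
print(f"p-value = {stats.f.sf(lam, df, df):.3f}")
```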
There are also other tests for heteroscedasticity, such as Spearman's rank correlation test, the Breusch-Pagan-Godfrey test and White's general heteroscedasticity test. Read about them on your own.
4.1.6. Remedial Measures
If we apply OLS to a model in which $var(u_i)$ is not constant, the result, as shown above, is inefficient parameter estimates.
The remedial measure is to transform the model so that the transformed model satisfies all the assumptions of the classical regression model, including homoscedasticity. Applying OLS to the transformed variables is known as the method of Generalized Least Squares (GLS). In short, GLS is OLS on transformed variables that satisfy the standard least squares assumptions. The estimators thus obtained are known as GLS estimators, and it is these estimators that are BLUE.
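As an illustration of the transformation idea, suppose, as in the Goldfeld-Quandt illustration above, that $var(u_i)$ is proportional to $X_i^2$. Dividing the whole equation by $X_i$ then yields a model whose error $u_i/X_i$ has constant variance, so OLS on the transformed variables is GLS (here a weighted least squares). The sketch below is one way to carry this out in Python with numpy; the data are hypothetical and the proportional-variance assumption is only an example of a known error structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = rng.uniform(1.0, 10.0, n)
u = rng.normal(0.0, 2.0 * X)                    # heteroscedastic: var(u_i) proportional to X_i^2
Y = 5.0 + 1.5 * X + u

# Transformed model: Y/X = alpha*(1/X) + beta + u/X, whose error is homoscedastic.
Y_star = Y / X
Z = np.column_stack([1.0 / X, np.ones(n)])      # regressors: 1/X and a constant
coef, *_ = np.linalg.lstsq(Z, Y_star, rcond=None)
alpha_gls, beta_gls = coef
print(f"GLS (WLS) estimates: alpha = {alpha_gls:.2f}, beta = {beta_gls:.2f}")

# For comparison, plain OLS on the untransformed data:
W = np.column_stack([np.ones(n), X])
(alpha_ols, beta_ols), *_ = np.linalg.lstsq(W, Y, rcond=None)
print(f"OLS estimates:       alpha = {alpha_ols:.2f}, beta = {beta_ols:.2f}")
# Both are unbiased; the GLS estimates are the efficient ones under this error structure.
```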
4.2 Autocorrelation
4.2.1 The Nature of Autocorrelation
In our discussion of the simple and multiple regression models, one of the assumptions of the classicals is that $cov(u_i, u_j) = E(u_i u_j) = 0$ for $i \neq j$, which implies that successive values of the disturbance term U are temporally independent, i.e. a disturbance occurring at one point of observation is not related to any other disturbance. This means that when observations are made over time, the effect of a disturbance occurring at one period does not carry over into another period.
If the above assumption is not satisfied, that is, if the value of U in any particular period is correlated with its own preceding value(s), we say there is autocorrelation of the random variables. Hence, autocorrelation is defined as a 'correlation' between members of a series of observations ordered in time or space.
There are several reasons why serial or autocorrelation arises. Some of these are:
a. Cyclical fluctuations
Time series such as GNP, price index, production, employment and unemployment exhibit
business cycle. Starting at the bottom of recession, when economic recovery starts, most of these
series move upward. In this upswing, the value of a series at one point in time is greater than its
previous value. Thus, there is a momentum built in to them, and it continues until something
happens (e.g. increase in interest rate or tax) to slowdown them. Therefore, regression involving
time series data, successive observations are likely to be interdependent.
b. Specification bias
Let us see, one by one, how specification biases cause autocorrelation.
i. Exclusion of variables: as we have discussed in chapter one, there are several sources of the random disturbance term ($u_i$). One of these is the exclusion of variable(s) from the model. If the excluded variable itself changes systematically, the error term that absorbs it will show a systematic change as this variable changes. For example, suppose the correct demand model is given by:
$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + U_t$ ..............................(3.21)
where y is the consumption of beef, $x_3$ is the price of pork and $x_1$ and $x_2$ are other determinants of beef demand. Suppose, however, that we run the model omitting $x_3$:
$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + V_t$ ..............................(3.22)
Now, if equation 3.21 is the 'correct' model or true relation, running equation 3.22 is tantamount to letting $V_t = \beta_3 x_{3t} + U_t$. And to the extent that the price of pork affects the consumption of beef, the error or disturbance term V will reflect a systematic pattern, thus creating autocorrelation. A simple test of this would be to run both equation 3.21 and equation
3.22 and see whether the autocorrelation, if any, observed in equation 3.22 disappears when equation 3.21 is run. The actual mechanics of detecting autocorrelation will be discussed later.
ii. Incorrect functional form: this is also one source of autocorrelation in the error term. Suppose the 'true' or correct model in a cost-output study is as follows:
$Marginal\ cost_i = \beta_0 + \beta_1\, output_i + \beta_2\, output_i^2 + U_i$ ..............................(3.23)
However, we incorrectly fit the following model:
$Marginal\ cost_i = \alpha_1 + \alpha_2\, output_i + V_i$ ..............................(3.24)
The marginal cost curve corresponding to the ‘true’ model is shown in the figure below along
with the ‘incorrect’ linear cost curve.
As the figure shows, between points A and B the linear marginal cost curve will consistently overestimate the true marginal cost, whereas outside these points it will consistently underestimate the true marginal cost. This result is to be expected because the disturbance term $V_i$ is, in fact, equal to $\beta_2\, output_i^2 + U_i$, and hence will catch the systematic effect of the $output^2$ term on marginal cost. In this case, $V_i$ will reflect autocorrelation because of the use of an incorrect functional form.
iii. Neglecting lagged terms from the model: if the dependent variable of a regression model is affected by the lagged value of itself or of an explanatory variable, and this lagged value is not included in the model, the error term of the incorrect model will reflect a systematic pattern, which indicates autocorrelation in the model. Suppose the correct model for consumption expenditure is:
$C_t = \alpha + \beta_1 y_t + \beta_2 y_{t-1} + U_t$ ..............................(3.25)
but we instead fit
$C_t = \alpha + \beta_1 y_t + V_t$ ..............................(3.26)
Then $V_t = \beta_2 y_{t-1} + U_t$, and the error term of the incorrect model will show a systematic pattern, reflecting autocorrelation.
Autocorrelation, as stated earlier, is a kind of lag correlation between successive values of the same variable. Thus, we treat autocorrelation in the same way as correlation in general. The simplest case is termed here autocorrelation of the first order. In other words, if the value of U in any particular period depends on its own value in the preceding period alone, we say that the U's follow a first order autoregressive scheme AR(1) (or first order Markov scheme), i.e.
$u_t = f(u_{t-1})$ ..............................(3.28)
If $u_t$ depends on the values of the two preceding periods, i.e. $u_t = f(u_{t-1}, u_{t-2})$, this form of autocorrelation is called a second order autoregressive scheme, and so on.
Generally, when autocorrelation is present, we assume the simplest form, first order autocorrelation:
$u_t = \rho u_{t-1} + v_t$ ..............................(3.30)
where $\rho$ is the coefficient of autocorrelation and $v$ is a random variable satisfying all the basic assumptions of ordinary least squares.
$\hat{\rho} = \dfrac{\sum_{t=2}^{n} u_t u_{t-1}}{\sum_{t=2}^{n} u_{t-1}^2}$ ..............................(3.31)
Given that for large samples $\sum u_t^2 \approx \sum u_{t-1}^2$, we observe that the coefficient of autocorrelation $\rho$ represents a simple correlation coefficient r:
$\hat{\rho} = \dfrac{\sum_{t=2}^{n} u_t u_{t-1}}{\sum_{t=2}^{n} u_{t-1}^2} = \dfrac{\sum u_t u_{t-1}}{\sqrt{\sum u_t^2}\sqrt{\sum u_{t-1}^2}} = r_{u_t u_{t-1}}$ ..............................(3.32)
$-1 \leq \hat{\rho} \leq 1$ since $-1 \leq r \leq 1$ ..............................(3.33)
This proves the statement that 'we can treat autocorrelation in the same way as correlation in general': from our statistics background we know that a simple correlation coefficient always lies between −1 and +1.
Our objective here is to obtain the value of $u_t$ in terms of the autocorrelation coefficient $\rho$ and the random variable $v_t$. The complete form of the first order autoregressive scheme may be set out as follows:
$U_t = f(U_{t-1}) = \rho U_{t-1} + v_t$
$U_{t-1} = f(U_{t-2}) = \rho U_{t-2} + v_{t-1}$
$U_{t-2} = f(U_{t-3}) = \rho U_{t-3} + v_{t-2}$
$U_{t-r} = f(U_{t-(r+1)}) = \rho U_{t-(r+1)} + v_{t-r}$
We make use of the above relations to perform continuous substitution in $U_t = \rho U_{t-1} + v_t$, as follows:
$U_t = \rho U_{t-1} + v_t$
$\;\; = \rho(\rho U_{t-2} + v_{t-1}) + v_t$, since $U_{t-1} = \rho U_{t-2} + v_{t-1}$
$\;\; = \rho^2 U_{t-2} + \rho v_{t-1} + v_t$
$\;\; = \rho^2(\rho U_{t-3} + v_{t-2}) + \rho v_{t-1} + v_t$
$\;\; = \rho^3 U_{t-3} + \rho^2 v_{t-2} + \rho v_{t-1} + v_t$
In this way, if we continue the substitution process for r periods (assuming that r is very large), we shall obtain:
$U_t = v_t + \rho v_{t-1} + \rho^2 v_{t-2} + \rho^3 v_{t-3} + \cdots$ ..............................(3.35)
Since $\rho^r \to 0$ as r grows (because $|\rho| < 1$), this can be written compactly as
$u_t = \sum_{r=0}^{\infty}\rho^r v_{t-r}$ ..............................(3.36)
Now, using this value of ut , let’s compute its mean, variance and covariance
1. To obtain mean:
$E(U_t) = E\left(\sum_{r=0}^{\infty}\rho^r v_{t-r}\right) = \sum_{r=0}^{\infty}\rho^r E(v_{t-r}) = 0$, since $E(v_{t-r}) = 0$ ..............................(3.37)
In other words, we found that the mean of autocorrelated U’s turns out to be zero.
2. To obtain the variance:
By the definition of variance (and because the v's are serially uncorrelated, so all cross-product terms vanish),
$var(U_t) = E\left(\sum_{r=0}^{\infty}\rho^r v_{t-r}\right)^2 = \sum_{r=0}^{\infty}(\rho^r)^2 E(v_{t-r}^2) = \sum_{r=0}^{\infty}\rho^{2r}\,var(v_{t-r})$
$\;\; = \sigma_v^2\sum_{r=0}^{\infty}\rho^{2r} = \sigma_v^2(1 + \rho^2 + \rho^4 + \rho^6 + \cdots) = \sigma_v^2\cdot\dfrac{1}{1-\rho^2}$
$var(U_t) = \dfrac{\sigma_v^2}{1-\rho^2}$ ..............................(3.38), since $|\rho| < 1$
Thus, the variance of the autocorrelated $u_i$ is $\dfrac{\sigma_v^2}{1-\rho^2}$, which is a constant value. From the above, the variance of $U_i$ depends on the nature of the variance of $v_i$: if the variance of $v_i$ is homoscedastic, $U_i$ is homoscedastic, and if $v_i$ is heteroscedastic, $U_i$ is heteroscedastic.
3. To obtain covariance:
$cov(U_t, U_{t-1}) = E(U_t U_{t-1})$ ..............................(3.39)
since $U_t = v_t + \rho v_{t-1} + \rho^2 v_{t-2} + \cdots$
and $U_{t-1} = v_{t-1} + \rho v_{t-2} + \rho^2 v_{t-3} + \cdots$
Substituting the above two equations into equation 3.39, we obtain
$cov(U_t, U_{t-1}) = E[(v_t + \rho v_{t-1} + \rho^2 v_{t-2} + \cdots)(v_{t-1} + \rho v_{t-2} + \rho^2 v_{t-3} + \cdots)]$
$\;\; = E[\{v_t + \rho(v_{t-1} + \rho v_{t-2} + \cdots)\}(v_{t-1} + \rho v_{t-2} + \rho^2 v_{t-3} + \cdots)]$
$\;\; = E[v_t(v_{t-1} + \rho v_{t-2} + \cdots)] + \rho E[(v_{t-1} + \rho v_{t-2} + \cdots)^2]$, and since $E(v_t v_{t-r}) = 0$,
$\;\; = 0 + \rho E[(v_{t-1} + \rho v_{t-2} + \cdots)^2]$
$\;\; = \rho E(v_{t-1}^2 + \rho^2 v_{t-2}^2 + \cdots + \text{cross products})$
$\;\; = \rho(\sigma_v^2 + \rho^2\sigma_v^2 + \cdots + 0)$
$\;\; = \rho\sigma_v^2(1 + \rho^2 + \rho^4 + \cdots)$
$\;\; = \dfrac{\rho\sigma_v^2}{1-\rho^2}$, since $|\rho| < 1$ ..............................(3.40)
$cov(U_t, U_{t-1}) = \rho\dfrac{\sigma_v^2}{1-\rho^2} = \rho\sigma_u^2$ ..............................(3.41)
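The moments derived in equations (3.37)–(3.41) can be verified by simulation. The following sketch (Python with numpy; the values of $\rho$ and $\sigma_v$ are arbitrary illustrative choices) generates a long AR(1) series $u_t = \rho u_{t-1} + v_t$ and compares its sample mean, variance and first-order autocovariance with the theoretical values.

```python
import numpy as np

rng = np.random.default_rng(42)
rho, sigma_v, T = 0.7, 1.0, 200_000

v = rng.normal(0.0, sigma_v, T)
u = np.empty(T)
u[0] = v[0]
for t in range(1, T):                      # u_t = rho*u_{t-1} + v_t
    u[t] = rho * u[t - 1] + v[t]

var_theory = sigma_v**2 / (1 - rho**2)     # equation (3.38)
cov_theory = rho * var_theory              # equation (3.41)

print("mean of u       :", u.mean(), "(theory: 0)")
print("var of u        :", u.var(), "(theory:", var_theory, ")")
print("cov(u_t, u_t-1) :", np.mean(u[1:] * u[:-1]), "(theory:", cov_theory, ")")
```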
We have seen that the ordinary least squares technique is based on a set of basic assumptions, some of which concern the mean, variance and covariance of the disturbance term. Naturally, therefore, if these assumptions do not hold, for whatever reason, the
estimators derived by the OLS procedure may not be efficient. We are now in a position to examine the effect of autocorrelation on the OLS estimators. The following are the effects on the estimators when the OLS method is applied in the presence of autocorrelation in the data.
1. The OLS estimators remain unbiased: we know that $\hat{\beta} = \beta + \sum k_i u_i$, and taking expectations still gives $E(\hat{\beta}) = \beta$.
2. The variance of the estimate $\hat{\beta}$ in the simple regression model will be biased downwards (i.e. underestimated) when the u's are autocorrelated.
3. If $var(\hat{\beta})$ is underestimated, $SE(\hat{\beta})$ is also underestimated, and this makes the t-ratio large. Such a large t-ratio may lead us to declare a coefficient statistically significant when in fact it is not.
4. A wrong testing procedure will, in turn, lead to wrong predictions and inferences about the characteristics of the population.
Different econometricians and statisticians suggest different testing methods, but the most frequently and widely used method among researchers is the following.
A. The Durbin-Watson d test: the most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, which is defined as:
$d = \dfrac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$ ..............................(3.47)
Note that in the numerator of the d statistic the number of observations is n − 1, because one observation is lost in taking successive differences.
The use of the d statistic rests on the following assumptions:
1. The regression model includes an intercept term. If such a term is not present, as in the case of regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.
2. The explanatory variables, the X’s, are non-stochastic, or fixed in repeated sampling.
3. The disturbances $U_t$ are generated by the first order autoregressive scheme:
$U_t = \rho U_{t-1} + v_t$
4. The regression model does not include the lagged value of Y, the dependent variable, as one of the explanatory variables. Thus, the test is inapplicable to models of the following type:
$y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \gamma y_{t-1} + U_t$
where $y_{t-1}$ is the one-period lagged value of y; such models are known as autoregressive models. If the d test is mistakenly applied to them, the value of d will often be around 2, which is the value of d in the absence of first order autocorrelation. Durbin developed the so-called h-statistic to test serial correlation in such autoregressive models.
5. There are no missing observations in the data.
In using the Durbin-Watson test, it is therefore important to note that it cannot be applied if any of the above five assumptions is violated.
From equation 3.47, the value of
$d = \dfrac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$
Expanding the squared term in the numerator,
$d = \dfrac{\sum e_t^2 + \sum e_{t-1}^2 - 2\sum e_t e_{t-1}}{\sum e_t^2}$
Thus, since for large n $\sum e_t^2 \approx \sum e_{t-1}^2$,
$d \approx 2\left(1 - \dfrac{\sum e_t e_{t-1}}{\sum e_t^2}\right)$
but $\hat{\rho} = \dfrac{\sum e_t e_{t-1}}{\sum e_t^2}$ from equation (3.31), hence
$d \approx 2(1 - \hat{\rho})$
From the above relation, therefore:
if $\hat{\rho} = 0$, then $d \approx 2$
if $\hat{\rho} = 1$, then $d \approx 0$
if $\hat{\rho} = -1$, then $d \approx 4$
Thus we obtain two important conclusions:
i. The values of d lie between 0 and 4.
ii. If there is no autocorrelation ($\hat{\rho} = 0$), then $d \approx 2$.
Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept the null hypothesis of no autocorrelation, and if it is close to zero or four, we reject that null hypothesis.
However, because the exact critical value of d is never known, there exist ranges of values within which we can either accept or reject the null hypothesis; we do not have a unique critical value of the d statistic. Instead, we have $d_L$ (a lower bound) and $d_U$ (an upper bound) for the critical values of d with which to accept or
reject the null hypothesis. For the two-tailed Durbin-Watson test, we can set out five regions for the values of d (see the graphical presentation in your text).
The mechanics of the D.W. test are as follows, assuming that the assumptions underlying the test are fulfilled:
➢ Obtain the computed value of d using the formula given in equation 3.47.
➢ For the given sample size and given number of explanatory variables, find the critical $d_L$ and $d_U$ values.
➢ Then apply the following decision rules:
1. If d is less than $d_L$ or greater than $(4 - d_L)$, reject the null hypothesis of no autocorrelation.
2. If d lies between $d_U$ and $(4 - d_U)$, accept the null hypothesis of no autocorrelation.
3. If, however, the value of d lies between $d_L$ and $d_U$, or between $(4 - d_U)$ and $(4 - d_L)$, the D.W. test is inconclusive.
Example 1. Suppose that, for a given regression, the critical values at the chosen level of significance are $d_L = 1.37$ and $d_U = 1.50$, so that $(4 - d_L) = 4 - 1.37 = 2.63$ and $(4 - d_U) = 4 - 1.50 = 2.50$. We compare the computed value of d with $d_L$, $d_U$, $(4 - d_L)$ and $(4 - d_U)$. Since the computed d turned out to be less than $d_L$, we reject the null hypothesis of no autocorrelation.
Example 2. Consider the model $Y_t = \alpha + \beta X_t + U_t$ with the following observations on X and Y:
X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
Solution:
1. Regress Y on X, i.e. $Y_t = \alpha + \beta X_t + U_t$:
$\hat{\beta} = \dfrac{\sum xy}{\sum x^2} = \dfrac{255}{280} = 0.91$
$\hat{Y} = -0.29 + 0.91X$
2. Compute the residuals $e_t = Y_t - \hat{Y}_t$ and the Durbin-Watson statistic:
$d = \dfrac{\sum(e_t - e_{t-1})^2}{\sum e_t^2} = \dfrac{60.213}{41.767} = 1.442$
The values of $d_L$ and $d_U$ at the 5% level of significance, with n = 15 and one explanatory variable, are $d_L = 1.08$ and $d_U = 1.36$, so that $(4 - d_U) = 2.64$.
$d_U < d^* < 4 - d_U$, i.e. $1.36 < 1.442 < 2.64$, where $d^* = 1.442$.
Since $d^*$ lies between $d_U$ and $4 - d_U$, we accept $H_0$. This implies that there is no evidence of autocorrelation in the data.
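The arithmetic of Example 2 can be reproduced with a few lines of Python (numpy assumed):

```python
import numpy as np

X = np.arange(1, 16)
Y = np.array([2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11])

# OLS slope and intercept: beta = sum(xy)/sum(x^2) in deviation form
x, y = X - X.mean(), Y - Y.mean()
beta = (x @ y) / (x @ x)                       # 255/280 = 0.91
alpha = Y.mean() - beta * X.mean()             # about -0.29

e = Y - alpha - beta * X                       # residuals
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # Durbin-Watson d, equation (3.47)
rho_hat = 1 - d / 2                            # implied rho, from d = 2(1 - rho)
print(f"beta = {beta:.2f}, alpha = {alpha:.2f}, d = {d:.3f}, rho_hat = {rho_hat:.2f}")
```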
Although the D.W. test is extremely popular, it has one great drawback: if the computed d falls in the inconclusive zone or region, one cannot conclude whether autocorrelation does or does not exist. Several authors have proposed modifications of the D.W. test.
In many situations, however, it has been found that the upper limit $d_U$ is approximately the true significance limit. Thus, the modified D.W. test is based on $d_U$: in case the estimated d value lies in the inconclusive zone, one can use the following modified d test procedure. Given the level of significance $\alpha$:
1. $H_0: \rho = 0$ versus $H_1: \rho > 0$; reject $H_0$ at level $\alpha$ if $d < d_U$ (statistically significant positive autocorrelation).
2. $H_0: \rho = 0$ versus $H_1: \rho < 0$; reject $H_0$ at level $\alpha$ if $(4 - d) < d_U$ (statistically significant negative autocorrelation).
3. $H_0: \rho = 0$ versus $H_1: \rho \neq 0$; reject $H_0$ at level $2\alpha$ if $d < d_U$ or $(4 - d) < d_U$.
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to seek remedial measures. The remedy, however, depends on what knowledge one has about the nature of the interdependence among the disturbances, that is, on whether the coefficient of autocorrelation $\rho$ is known or not known.
A. When $\rho$ is known: when the structure of autocorrelation is known, i.e. $\rho$ is known, the appropriate corrective procedure is to transform the original model or data so that the error term of the transformed model is non-autocorrelated.
B. When $\rho$ is not known: we first estimate the coefficient of autocorrelation and then apply the appropriate measure accordingly.
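When $\rho$ is known (or has been estimated), the usual transformation is quasi-differencing: subtracting $\rho$ times the lagged equation gives $Y_t - \rho Y_{t-1} = \alpha(1-\rho) + \beta(X_t - \rho X_{t-1}) + v_t$, whose error $v_t$ is serially uncorrelated. Below is a minimal sketch of this idea in Python; the data are hypothetical and $\rho$ is treated as known, which is an assumption made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
T, rho = 200, 0.6
X = rng.uniform(0.0, 10.0, T)

# Generate AR(1) errors and the dependent variable
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()
Y = 3.0 + 2.0 * X + u

# Quasi-difference (dropping the first observation): Y* = Y_t - rho*Y_{t-1}, X* likewise
Y_star = Y[1:] - rho * Y[:-1]
X_star = X[1:] - rho * X[:-1]

# OLS on the transformed data; the intercept estimates alpha*(1 - rho)
Z = np.column_stack([np.ones(T - 1), X_star])
(c0, beta_hat), *_ = np.linalg.lstsq(Z, Y_star, rcond=None)
alpha_hat = c0 / (1 - rho)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
# With rho unknown, one could estimate it from the OLS residuals (e.g. rho_hat = 1 - d/2)
# and iterate; this is the idea behind the Cochrane-Orcutt type procedures.
```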
4.3 Multicollinearity
4.3.1 The Nature of Multicollinearity
Originally, multicollinearity meant the existence of a 'perfect', or exact, linear relationship among some or all explanatory variables of a regression model. For the k-variable regression involving explanatory variables $x_1, x_2, \ldots, x_k$, an exact linear relationship is said to exist if the following condition is satisfied:
$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k = 0$ ..............................(1)
where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are constants that are not all zero simultaneously.
Today, however, the term multicollinearity is used in a broader sense to include the case of perfect multicollinearity, as shown by (1), as well as the case where the x-variables are intercorrelated but not perfectly so, as follows:
$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k + v_i = 0$ ..............................(2)
where $v_i$ is a stochastic error term.
4.3.2. Consequences of Multicollinearity
1. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.
Proof: consider a multiple regression model with two explanatory variables, where the dependent and independent variables are expressed in deviation form as follows.
$y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$
Recall the formulas of $\hat{\beta}_1$ and $\hat{\beta}_2$ from our discussion of multiple regression:
$\hat{\beta}_1 = \dfrac{\sum x_1 y\sum x_2^2 - \sum x_2 y\sum x_1 x_2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$
$\hat{\beta}_2 = \dfrac{\sum x_2 y\sum x_1^2 - \sum x_1 y\sum x_1 x_2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$
Assume $x_2 = \lambda x_1$ ..............................(3.32)
where $\lambda$ is a non-zero constant. Substituting (3.32) into the $\hat{\beta}_1$ formula above:
$\hat{\beta}_1 = \dfrac{\sum x_1 y\,\lambda^2\sum x_1^2 - \lambda\sum x_1 y\,\lambda\sum x_1^2}{\sum x_1^2\,\lambda^2\sum x_1^2 - (\lambda\sum x_1^2)^2} = \dfrac{0}{0}$,
which is indeterminate.
Applying the same procedure, we obtain a similar result (an indeterminate value) for $\hat{\beta}_2$. Likewise, from our discussion of the multiple regression model, the variance of $\hat{\beta}_1$ is given by:
$var(\hat{\beta}_1) = \dfrac{\sigma^2\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$
Substituting $x_2 = \lambda x_1$ into the above variance formula, we get:
$var(\hat{\beta}_1) = \dfrac{\sigma^2\lambda^2\sum x_1^2}{\lambda^2(\sum x_1^2)^2 - \lambda^2(\sum x_1^2)^2} = \dfrac{\sigma^2\lambda^2\sum x_1^2}{0}$,
which is infinite.
These are the consequences of perfect multicollinearity. One may ask about the consequences of less than perfect correlation. In cases of near or high multicollinearity, one is likely to encounter the following consequences.
2. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the regression coefficients are determinate.
Proof: Consider the two explanatory variables model above in deviation form.
The assumption $x_2 = \lambda x_1$ indicates perfect correlation between $x_1$ and $x_2$, because the change in $x_2$ is completely due to the change in $x_1$. Instead of exact multicollinearity, we may have:
$x_{2i} = \lambda x_{1i} + v_i$
where $\lambda \neq 0$ and $v_i$ is a stochastic error term such that $\sum x_{1i}v_i = 0$. In this case $x_2$ is not only determined by $x_1$ but is also affected by other factors captured by $v_i$ (the stochastic error term).
Substituting $x_{2i} = \lambda x_{1i} + v_i$ into the formula for $\hat{\beta}_1$ above:
$\hat{\beta}_1 = \dfrac{\sum x_1 y(\lambda^2\sum x_1^2 + \sum v_i^2) - (\lambda\sum x_1 y + \sum y v_i)\,\lambda\sum x_1^2}{\sum x_1^2(\lambda^2\sum x_1^2 + \sum v_i^2) - (\lambda\sum x_1^2)^2} \neq \dfrac{0}{0}$,
i.e. determinate. This proves that if we have less than perfect multicollinearity, the OLS coefficients are determinate.
The implication of the indeterminacy of the regression coefficients in the case of perfect multicollinearity is that it is not possible to observe the separate influences of $x_1$ and $x_2$. But such an extreme case is not very frequent in practical applications. Most data exhibit less than perfect multicollinearity.
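A quick numerical illustration of points 1 and 2 (a toy example of our own in Python, with simulated data) shows that when $x_2$ is an exact multiple of $x_1$ the cross-product matrix of the normal equations is singular, so no unique coefficients exist, while a small random disturbance in $x_2$ restores determinate (if imprecise) estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(0.0, 1.0, n)
y = 1.0 * x1 + rng.normal(0.0, 1.0, n)       # data roughly in deviation form

def normal_matrix(x1, x2):
    """The 2x2 matrix of the normal equations for a two-regressor model."""
    X = np.column_stack([x1, x2])
    return X.T @ X

# Perfect collinearity: x2 = 2*x1 -> determinant of the normal matrix is (numerically) zero
x2_perfect = 2.0 * x1
print("det, perfect collinearity :", np.linalg.det(normal_matrix(x1, x2_perfect)))

# Near collinearity: x2 = 2*x1 + small noise -> determinate but imprecise estimates
x2_near = 2.0 * x1 + rng.normal(0.0, 0.05, n)
X = np.column_stack([x1, x2_near])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("det, near collinearity    :", np.linalg.det(normal_matrix(x1, x2_near)))
print("coefficients (determinate):", b)
```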
3. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the OLS estimators retain the property of BLUE.
Explanation: while we were proving the BLUE property of the OLS estimators in the simple and multiple regression models, we did not make use of the assumption of no multicollinearity. Hence, as long as the basic assumptions needed to prove the BLUE property are not violated, the OLS estimators are BLUE whether multicollinearity exists or not.
4. Although BLUE, the OLS estimators have large variances and covariances.
$var(\hat{\beta}_1) = \dfrac{\sigma^2\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$
Dividing the numerator and the denominator by $\sum x_2^2$,
$var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_1^2 - \dfrac{(\sum x_1 x_2)^2}{\sum x_2^2}} = \dfrac{\sigma^2}{\sum x_1^2(1 - r_{12}^2)}$
where $r_{12}^2 = \dfrac{(\sum x_1 x_2)^2}{\sum x_1^2\sum x_2^2}$ is the square of the correlation coefficient between $x_1$ and $x_2$.
If $x_2 = \lambda x_{1i} + v_i$, what happens to the variance of $\hat{\beta}_1$ as $r_{12}^2$ rises? As $r_{12}$ tends to 1, i.e. as collinearity increases, the variance of the estimator increases, and in the limit, when $r_{12} = 1$, the variance of $\hat{\beta}_1$ becomes infinite.
Similarly, $cov(\hat{\beta}_1, \hat{\beta}_2) = \dfrac{-r_{12}\,\sigma^2}{(1 - r_{12}^2)\sqrt{\sum x_1^2\sum x_2^2}}$ (why?)
As $r_{12}$ increases toward one, the covariance of the two estimators increases in absolute value. The speed with which the variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as:
$VIF = \dfrac{1}{1 - r_{12}^2}$
The VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{12}^2$ approaches 1, the VIF approaches infinity; that is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit the variance becomes infinite. As can be seen, if there is no multicollinearity between $x_1$ and $x_2$, the VIF will be 1.
Using this definition we can express $var(\hat{\beta}_1)$ and $var(\hat{\beta}_2)$ in terms of the VIF:
$var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_1^2}\,VIF$ and $var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_2^2}\,VIF$
which shows that the variances of $\hat{\beta}_1$ and $\hat{\beta}_2$ are directly proportional to the VIF.
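The inflation of the variance can be traced directly with the VIF formula. The sketch below (Python, simulated data of our own) computes $r_{12}^2$ between two correlated regressors and the resulting $VIF = 1/(1 - r_{12}^2)$; strengthening the dependence between the regressors pushes the VIF, and hence the coefficient variances, upward.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500

def vif_for(dependence):
    """Generate two regressors with the given dependence and return (r12^2, VIF)."""
    x1 = rng.normal(0.0, 1.0, n)
    x2 = dependence * x1 + rng.normal(0.0, 1.0, n)
    x1d, x2d = x1 - x1.mean(), x2 - x2.mean()
    r12_sq = (x1d @ x2d) ** 2 / ((x1d @ x1d) * (x2d @ x2d))
    return r12_sq, 1.0 / (1.0 - r12_sq)

for dep in (0.0, 1.0, 5.0, 20.0):
    r2, vif = vif_for(dep)
    print(f"dependence {dep:5.1f}:  r12^2 = {r2:.3f}   VIF = {vif:8.2f}")
# VIF = 1 when the regressors are uncorrelated and grows without bound as r12^2 -> 1.
```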
5. Because of the large variances of the estimators, which mean large standard errors, the confidence intervals tend to be much wider, leading to the acceptance of the 'zero null hypothesis' (i.e. that the true population coefficient is zero) more readily.
6. Because of the large standard errors of the estimators, the computed t-ratios will be very small, leading one or more of the coefficients to appear statistically insignificant when tested individually.
7. Although the t-ratios of one or more of the coefficients are very small (which makes those coefficients statistically insignificant individually), $R^2$, the overall measure of goodness of fit, can be very high.
Example: consider $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + v_i$.
In cases of high collinearity it is possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t-test, yet the $R^2$ in such situations may be so high, say in excess of 0.9, that on the basis of the F-test one can convincingly reject the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_k = 0$. Indeed, this is one of the signals of multicollinearity: insignificant t-values but a high overall $R^2$ (i.e. a significant F-value).
8. The OLS estimators and their standard errors can be sensitive to small changes in the data.
4.3.4 Detection of Multicollinearity
A recognizable set of symptoms of the existence of multicollinearity on which one can rely includes:
a. a high coefficient of determination ($R^2$);
b. high correlation coefficients among the explanatory variables ($r_{x_i x_j}$'s).
Note, however, that a high pairwise correlation coefficient is a sufficient but not a necessary condition for the existence of multicollinearity, because multicollinearity can also exist even if the correlation coefficients are low. The combination of all these criteria should, however, help the detection of multicollinearity.
4.3.4.1. Test of multicollinearity using auxiliary regressions
One way of finding out which X variable is related to the other X variables is to regress each $X_i$ on the remaining X variables and compute the corresponding $R^2$; each such regression is called an auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the relationship between F and $R^2$ established in chapter three under overall significance, the variable
$F_i = \dfrac{R^2_{x_i\cdot x_2 x_3\ldots x_k}/(k-2)}{(1 - R^2_{x_i\cdot x_2 x_3\ldots x_k})/(n-k+1)} \sim F_{(k-2,\; n-k+1)}$
follows the F distribution with (k − 2) and (n − k + 1) df, where n is the sample size, k is the number of parameters including the intercept term, and $R^2_{x_i\cdot x_2 x_3\ldots x_k}$ is the coefficient of determination in the regression of $X_i$ on the remaining X variables.
If the computed F exceeds the critical F at the chosen level of significance, it is taken to mean that the particular $X_i$ is collinear with the other X's; if it does not exceed the critical F, we say that it is not collinear with the other X's, in which case we may retain the variable in the model. If $F_i$ is statistically significant, we will have to decide whether the particular $X_i$ should be dropped from the model. Note also Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the $R^2$ obtained from an auxiliary regression is greater than the overall $R^2$, that is, the $R^2$ obtained from the regression of Y on all the regressors.
4.3.4.2. Test of multicollinearity using eigenvalues and the condition index
Using eigenvalues we can derive a number called the condition number k as follows:
$k = \dfrac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}$
In addition, using these values we can derive the condition index (CI), defined as:
$CI = \sqrt{\dfrac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}} = \sqrt{k}$
Decision rule: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if CI ($= \sqrt{k}$) is between 10 and 30 there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.
Example: if k = 123,864 and CI = 352, this suggests the existence of severe multicollinearity.
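Computationally, the condition number k and the condition index CI come from the eigenvalues of the X'X matrix (commonly after scaling each column). A brief sketch with simulated collinear data follows (Python, numpy; the data and the unit-length scaling convention are illustrative choices, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + rng.normal(0.0, 0.05, n)          # nearly collinear with x1
x3 = rng.normal(0.0, 1.0, n)                # unrelated regressor

X = np.column_stack([np.ones(n), x1, x2, x3])
Xs = X / np.linalg.norm(X, axis=0)          # scale columns to unit length
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)     # eigenvalues of the (scaled) X'X matrix

k = eigvals.max() / eigvals.min()           # condition number
ci = np.sqrt(k)                             # condition index
print(f"condition number k = {k:.1f}, condition index CI = {ci:.1f}")
# By the rule of thumb above, CI between 10 and 30 suggests moderate to strong
# multicollinearity and CI above 30 suggests severe multicollinearity.
```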
4.3.4.4 Test of multicollinearity using Tolerance and variance inflation factor
$var(\hat{\beta}_j) = \dfrac{\sigma^2}{\sum x_j^2}\cdot\dfrac{1}{1 - R_j^2} = \dfrac{\sigma^2}{\sum x_j^2}\,VIF_j$
where $R_j^2$ is the $R^2$ in the auxiliary regression of $X_j$ on the remaining (k − 2) regressors and $VIF_j = 1/(1 - R_j^2)$. The inverse of the VIF is called the tolerance, $TOL_j = 1 - R_j^2$; the tolerance equals one if $X_j$ is not correlated with the other regressors, whereas
it is zero if $X_j$ is perfectly related to the other regressors. The VIF (or the tolerance) as a measure of collinearity is not free of criticism. As we have seen earlier, $var(\hat{\beta}_j) = \dfrac{\sigma^2}{\sum x_j^2}\,VIF_j$ depends on three factors: $\sigma^2$, $\sum x_j^2$ and $VIF_j$. A high VIF can be counterbalanced by a low $\sigma^2$ or a high $\sum x_j^2$. To put it differently, a high VIF is neither necessary nor sufficient for high variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors.
4.3.5. Remedial measures
It is more difficult to deal with models exhibiting multicollinearity than it is to detect the problem. Different remedial measures have been suggested by econometricians, depending on the severity of the problem, the availability of other sources of data and the importance of the variables that are found to be multicollinear in the model.
Some suggest that a minor degree of multicollinearity can be tolerated, although one should be a bit careful when interpreting the model under such conditions. Others suggest removing the variables that show multicollinearity if they are not important in the model; but by doing so, the desired characteristics of the model may be affected. However, the following corrective procedures have been suggested when the problem of multicollinearity is found to be serious.
1. Increase the size of the sample: it is suggested that multicollinearity may be avoided or reduced if the size of the sample is increased, because the variances and covariances of the OLS estimators are inversely related to the sample size. But we should remember that this will help only when the intercorrelation happens to exist in the sample but not in the population of the variables. If the variables are collinear in the population, increasing the size of the sample will not help to reduce multicollinearity.
2. Introduce an additional equation into the model: the problem of multicollinearity may be overcome by expressing explicitly the relationship between the multicollinear variables. Such a relation, in the form of an equation, may then be added to the original model. The addition of the new equation transforms our single-equation (original) model into a simultaneous equation model. The reduced form method (which is usually applied for estimating simultaneous equation models) can then be applied to avoid multicollinearity.