UNIT 10 MULTICOLLINEARITY
Structure
10.0 Objectives
10.1 Introduction
10.2 Types of Multicollinearity
10.2.1 Perfect Multicollinearity
10.2.2 Near or Imperfect Multicollinearity
10.3 Consequences of Multicollinearity
10.4 Detection of Multicollinearity
10.5 Remedial Measures of Multicollinearity
10.5.1 Dropping a Variable from the Model
10.5.2 Acquiring Additional Data or New Sample
10.5.3 Re-Specification of the Model
10.5.4 Prior Information about Certain Parameters
10.5.5 Transformation of Variables
10.5.6 Ridge Regression
10.5.7 Other Remedial Measures
10.6 Let Us Sum Up
10.7 Answers/ Hints to Check Your Progress Exercises
10.0 OBJECTIVES
After going through this unit, you should be able to
explain the concept of multicollinearity in a regression model;
comprehend the difference between near and perfect multicollinearity;
describe the consequences of multicollinearity;
explain how multicollinearity can be detected;
describe the remedial measures of multicollinearity; and
explain the concept of ridge regression.
10.1 INTRODUCTION
The classical linear regression model assumes that there is no perfect
multicollinearity. Multicollinearity means the presence of high correlation
between two or more explanatory variables in a multiple regression model.
Absence of multicollinearity implies that there is no exact linear relationship
among the explanatory variables. The assumption of no perfect multicollinearity
is very crucial to a regression model since the presence of perfect
multicollinearity has serious consequences for the regression model. We discuss
the consequences, detection methods, and remedial measures for
multicollinearity in this Unit.
Let us consider the same demand function of good Y. In this case, however, we
assume that there is imperfect multicollinearity between the explanatory variables
(in order to distinguish it from the earlier case, we have changed the parameter
notations). The following is the population regression function:

$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i$    … (10.5)
Equation (10.5) refers to the case when two or more explanatory variables are not
exactly linearly related. For the above regression model we may obtain an estimated
regression equation from sample data (referred to as equation (10.6) below).
Since the explanatory variables are not exactly related, we can find estimates for
the parameters. In this case the regression can be estimated, unlike the earlier case of
perfect multicollinearity. This does not mean, however, that there is no problem with our
estimators if there is imperfect multicollinearity. We discuss the consequences of
multicollinearity in the next section.
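To see the computational difference between the two cases, here is a minimal sketch (hypothetical Python, assuming numpy is installed; all variable names and numbers are illustrative, not taken from the text). With perfect collinearity the matrix X'X is singular and the normal equations cannot be solved; with near collinearity estimation goes through, although the system is ill-conditioned.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x2 = rng.normal(10, 2, n)                   # e.g. price of good Y (illustrative)
    x3_perfect = 2 * x2                         # exact linear function of x2
    x3_near = 2 * x2 + rng.normal(0, 0.1, n)    # nearly, but not exactly, linear

    def xtx_condition(x2, x3):
        # condition number of X'X; effectively infinite when X'X is singular
        X = np.column_stack([np.ones_like(x2), x2, x3])
        return np.linalg.cond(X.T @ X)

    print(xtx_condition(x2, x3_perfect))  # astronomically large: X'X cannot be inverted
    print(xtx_condition(x2, x3_near))     # large but finite: OLS estimates exist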
Check Your Progress 1
Since the values of the standard errors have increased, the confidence interval
given in expression (10.7) has widened.
(d) Insignificant t-ratios: As pointed out above, the standard errors of the
estimators increase due to multicollinearity. The t-ratio is given as
$t = \dfrac{b_2}{se(b_2)}$. With an inflated standard error the t-ratio becomes
very small. Thus we tend to accept (or not reject) the null hypothesis and
conclude that the variable has no effect on the dependent variable.
(e) A high $R^2$ and few significant t-ratios: In equation (10.6) we notice that
the $R^2$ is very high, about 98 per cent or 0.98. Yet the t-ratios of the
explanatory variables are mostly not statistically significant; only the slope
coefficient of the price variable has a significant t-value. However, using the
F-test for overall significance, $H_0\!: R^2 = 0$, we reject the null
hypothesis. Thus there is some discrepancy between the results of the F-test
and the t-tests.
(f) The OLS estimators, mainly the partial slope coefficients, and their
standard errors become very sensitive to small changes in the data. If
there is a small change in the data, the regression results change substantially.
(g) Wrong signs of regression coefficients: This is a very prominent impact of
the presence of multicollinearity. In the case of the example given at
equation (10.6) we find that the coefficient of the income variable is
negative. The income variable has a 'wrong' sign, as economic theory
suggests that the income effect is positive unless the commodity concerned is
an inferior good.
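The consequences listed above can be reproduced with a small simulated example (hypothetical Python, assuming numpy and statsmodels are available; the data-generating values are invented for illustration and are not the figures behind equation (10.6)).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 30
    price = rng.normal(20, 3, n)
    income = 5 * price + rng.normal(0, 0.2, n)      # income almost collinear with price
    y = 500 - 10 * price + 3 * income + rng.normal(0, 2, n)

    X = sm.add_constant(np.column_stack([price, income]))
    fit = sm.OLS(y, X).fit()
    print(fit.rsquared)   # typically very high despite the collinearity
    print(fit.tvalues)    # slope t-ratios are often small, i.e. individually insignificant
    print(fit.params)     # estimates may swing, and can even take the 'wrong' sign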
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$    … (10.8)

Suppose the explanatory variables are perfectly correlated with each other
as shown in equation (10.9) below:

$X_{4i} = \lambda_2 X_{2i} + \lambda_3 X_{3i}$    … (10.9)

Here $X_4$ is an exact linear combination of $X_2$ and $X_3$.
The coefficient of determination from the regression of $X_4$ on $X_2$ and $X_3$ is

$R^2_{4.23} = \dfrac{r_{42}^2 + r_{43}^2 - 2\,r_{42}\,r_{43}\,r_{23}}{1 - r_{23}^2}$    … (10.10)

Suppose $r_{42} = 0.5$, $r_{43} = 0.5$ and $r_{23} = -0.5$. If we substitute these values in
equation (10.10), we find that $R^2_{4.23} = 1$. An implication of the above is
that even though none of the correlation coefficients (among the explanatory
variables) is very high, there can still be perfect multicollinearity.
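Writing out the substitution explicitly (using $r_{23} = -0.5$ as above):

$R^2_{4.23} = \dfrac{(0.5)^2 + (0.5)^2 - 2(0.5)(0.5)(-0.5)}{1 - (-0.5)^2} = \dfrac{0.25 + 0.25 + 0.25}{0.75} = 1$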
where $\mathrm{VIF} = \dfrac{1}{1 - R_2^2}$

Note that as $R_2^2$ increases, the VIF also increases. This inflates the variances,
and hence the standard errors, of $b_2$ and $b_3$.

If $R_2^2 = 1$, then $\mathrm{VIF} = \infty \Rightarrow \mathrm{var}(b_2) \to \infty$ and $\mathrm{var}(b_3) \to \infty$.

Note that $\mathrm{var}(b_2)$ depends not only on $R_2^2$, but also on $\sigma^2$ and $\sum x_{2i}^2$. It
is possible that $R_2^2$ is high (say, 0.91) and yet $\mathrm{var}(b_2)$ is low because of a
low $\sigma^2$ or a high $\sum x_{2i}^2$, so that the t-value is still high. Thus the $R^2$ obtained
from an auxiliary regression is only a superficial indicator of multicollinearity.
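In practice the auxiliary-regression $R^2$ and the VIF can be computed along the following lines (a sketch in Python, assuming numpy and statsmodels are available; the helper name compute_vif is ours, not from the text).

    import numpy as np
    import statsmodels.api as sm

    def compute_vif(X, j):
        # Regress column j of X on the remaining columns (the auxiliary regression)
        # and return the variance inflation factor 1 / (1 - R_j^2).
        others = np.delete(X, j, axis=1)
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        return 1.0 / (1.0 - r2)

    # usage: X is an n x k matrix of explanatory variables (without a constant column)
    # vif_values = [compute_vif(X, j) for j in range(X.shape[1])]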
Check Your Progress 2
On the other hand, if the objective of the study is not only prediction but also
reliable estimation of the individual parameters of the chosen model, then serious
collinearity may be bad, since multicollinearity results in large standard errors of
the estimators and therefore widens the confidence intervals, leading to acceptance
of the null hypothesis in most cases. If the objective of the study is to estimate a group
of coefficients (i.e., the sum or difference of two coefficients), then this is possible
even in the presence of multicollinearity. In such a case multicollinearity may not be
a problem.
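The point about a group of coefficients can be illustrated with a short simulation (hypothetical Python, assuming numpy; the numbers are invented): across repeated samples the individual slope estimates swing widely, yet their sum is estimated quite precisely.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 40, 500
    sums, slopes = [], []
    for _ in range(reps):
        x2 = rng.normal(0, 1, n)
        x3 = x2 + rng.normal(0, 0.05, n)            # x3 nearly identical to x2
        y = 1 + 0.5 * x2 + 0.5 * x3 + rng.normal(0, 1, n)
        X = np.column_stack([np.ones(n), x2, x3])
        b = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS via least squares
        slopes.append(b[1])
        sums.append(b[1] + b[2])
    print(np.std(slopes))   # large: each individual coefficient is poorly determined
    print(np.std(sums))     # much smaller: the sum b2 + b3 is estimated reliably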
$Y_i = C_1 + C_2 X_{2i} + u_i$    … (10.13)

where $C_1 = A_1 + 300A_3$ and $C_2 = A_2 + 2A_3$.
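The definitions of $C_1$ and $C_2$ are consistent with prior information of the form $X_{3i} = 300 + 2X_{2i}$ (this particular relation is an illustrative assumption on our part, chosen only because it reproduces the coefficients above). Substituting it into a model $Y_i = A_1 + A_2 X_{2i} + A_3 X_{3i} + u_i$ gives

$Y_i = A_1 + A_2 X_{2i} + A_3(300 + 2X_{2i}) + u_i = (A_1 + 300A_3) + (A_2 + 2A_3)X_{2i} + u_i = C_1 + C_2 X_{2i} + u_i$

so that only the combined parameters $C_1$ and $C_2$ need to be estimated from the data.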
$\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 \,(1 - R_2^2)}$
If the cost curves are U-shaped average and marginal cost curves, then the theory
suggests that the coefficients should satisfy the following restrictions:
1) $\beta_1, \beta_2$ and $\beta_4 > 0$
2) $\beta_3 < 0$
3) $\beta_3^2 < 3\beta_2\beta_4$
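These restrictions follow if the underlying specification is the usual cubic total-cost function (an assumption on our part, since the original equation is not reproduced here), $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i$, where $X$ is output. Marginal cost is then

$MC = \dfrac{dY}{dX} = \beta_2 + 2\beta_3 X + 3\beta_4 X^2$

which is U-shaped and everywhere positive only if $\beta_2, \beta_4 > 0$, $\beta_3 < 0$ and the discriminant condition $(2\beta_3)^2 - 4(3\beta_4)\beta_2 < 0$, i.e., $\beta_3^2 < 3\beta_2\beta_4$, holds.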
10.7 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) The case of perfect multicollinearity reflects the situation when
the explanatory variables are perfectly correlated with each other,
implying that the coefficient of correlation between the explanatory variables
is 1.
2) This refers to the case when two or more explanatory variables are not
exactly linearly related. This reinforces the fact that collinearity can be high but not
perfect. "High collinearity" refers to the case of "near" or "imperfect" (high)
multicollinearity. The presence of multicollinearity thus implies "imperfect
multicollinearity".
3) In the case of perfect multicollinearity it is not possible to obtain
estimators for the parameters of the regression model. See Section 10.2
for details.
Check Your Progress 2
1) (i) In the case of imperfect multicollinearity, some of the estimated coefficients are
statistically insignificant. But the OLS estimators still retain their BLUE
property, that is, they remain Best Linear Unbiased Estimators. Imperfect
multicollinearity does not violate any of the classical assumptions, so the OLS
estimators remain BLUE. Being BLUE, with minimum variance among linear
unbiased estimators, does not imply that the numerical value of the variance will be small.
(ii) The $R^2$ value is very high but very few estimated coefficients are significant (t-ratios
are low). The example mentioned in the earlier section, where the demand
function of good Y was estimated using the earnings of individuals,
reflects this situation: $R^2$ is quite high, about 98 per cent or 0.98, but only the
price variable's slope coefficient has a significant t-value. However, using the F-test
for overall significance, $H_0\!: R^2 = 0$, we reject the hypothesis that prices and
earnings together have no effect on the demand for Y.
(iii) The ordinary least squares (OLS) estimators, mainly the partial slope coefficients,
and their standard errors become very sensitive to small changes in the
data, i.e., they tend to be unstable. With a small change in the data the regression
results change quite substantially; in the example of near or imperfect
multicollinearity mentioned above, the standard errors go down and the t-ratios
increase in absolute value.
(iv) Wrong signs of regression coefficients: This is a very prominent impact of
the presence of multicollinearity. In the example where the earnings of
individuals were used in deriving the demand curve for good Y, the earnings
variable has the 'wrong' sign according to economic theory, since the income
effect is usually positive unless the commodity is an inferior good.
2) Examining partial correlations: In the case of three explanatory variables
$X_2$, $X_3$ and $X_4$, examining the partial correlation coefficients can reveal very
high or perfect multicollinearity between $X_4$ and $X_2$, $X_3$.

Subsidiary or auxiliary regressions: Each explanatory variable X is
regressed on the remaining X variables and the corresponding $R^2$
is computed. Each of these regressions is referred to as a subsidiary or
auxiliary regression. Consider a regression of Y on $X_2, X_3, X_4, X_5, X_6$ and $X_7$, i.e., with
six explanatory variables. If the $R^2$ comes out to be very high but there are few
significant t-ratios, or very few X coefficients are individually statistically
significant, then the purpose of the auxiliary regressions is to identify the source of the
multicollinearity, that is, the existence of a perfect or near-perfect linear combination
of the other Xs.
We have

$\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - R_2^2)} = \dfrac{\sigma^2}{\sum x_{2i}^2} \cdot \dfrac{1}{1 - R_2^2} = \dfrac{\sigma^2}{\sum x_{2i}^2} \cdot \mathrm{VIF}$

Similarly, $\mathrm{var}(b_3) = \dfrac{\sigma^2}{\sum x_{3i}^2} \cdot \mathrm{VIF}$

where $\mathrm{VIF} = \dfrac{1}{1 - R_2^2}$ is the variance inflation factor. As $R_2^2$ increases, the VIF increases,
thus inflating the variances and hence the standard errors of $b_2$ and $b_3$.

If $R_2^2 = 0$, then $\mathrm{VIF} = 1$, $\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$ and $\mathrm{var}(b_3) = \dfrac{\sigma^2}{\sum x_{3i}^2}$ (no collinearity).

If $R_2^2 = 1$, then $\mathrm{VIF} = \infty$ and $\mathrm{var}(b_2), \mathrm{var}(b_3) \to \infty$.