
UNIT 10 MULTICOLLINEARITY 

Structure
10.0 Objectives
10.1 Introduction
10.2 Types of Multicollinearity
10.2.1 Perfect Multicollinearity
10.2.2 Near or Imperfect Multicollinearity
10.3 Consequences of Multicollinearity
10.4 Detection of Multicollinearity
10.5 Remedial Measures of Multicollinearity
10.5.1 Dropping a Variable from the Model
10.5.2 Acquiring Additional Data or New Sample
10.5.3 Re-Specification of the Model
10.5.4 Prior Information about Certain Parameters
10.5.5 Transformation of Variables
10.5.6 Ridge Regression
10.5.7 Other Remedial Measures
10.6 Let Us Sum Up
10.7 Answers/ Hints to Check Your Progress Exercises

10.0 OBJECTIVES
After going through this unit, you should be able to:
• explain the concept of multicollinearity in a regression model;
• comprehend the difference between near and perfect multicollinearity;
• describe the consequences of multicollinearity;
• explain how multicollinearity can be detected;
• describe the remedial measures of multicollinearity; and
• explain the concept of ridge regression.

10.1 INTRODUCTION
The classical linear regression model assumes that there is no perfect
multicollinearity. Multicollinearity means the presence of high correlation
between two or more explanatory variables in a multiple regression model.


Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
Absence of multicollinearity implies that there is no exact linear relationship
among the explanatory variables. The assumption of no perfect multicollinearity
is crucial to a regression model, since the presence of perfect multicollinearity
has serious consequences for the regression model. We will discuss the
consequences, detection methods, and remedial measures for multicollinearity in
this Unit.

10.2 TYPES OF MULTICOLLINEARITY


Multicollinearity could be of two types: (i) perfect multicollinearity, and (ii)
imperfect multicollinearity. Remember that the division is according to the
degree or extent of relationship between the explanatory variables. The
distinction is made because of the nature of the problem they pose. We describe
both types of multicollinearity below.
10.2.1 Perfect Multicollinearity
In the case of perfect multicollinearity, the explanatory variables are perfectly
correlated with each other; the coefficient of correlation between the explanatory
variables is 1. For instance, suppose we want to derive the demand curve for a
good Y. We assume that quantity demanded (Y) is a function of price (X2) and
income (X3). In symbols,
Y = f(X2, X3), where X2 is the price of good Y and X3 is the weekly consumer
income.
Let us consider the following regression model (population regression function):
Yi = A1 + A2X2i + A3X3i + ui … (10.1)
In the above equation, suppose
A2 < 0. This implies that price is inversely related to demand.
A3 > 0. This indicates that as income increases, demand for the good increases.
Suppose there is a perfect linear relationship between X2 and X3 such that
X3i = 300 − 2X2i … (10.2)
In this case, if we regress X3 on X2 we obtain a coefficient of
determination R² = 1.
If we substitute the value of X3 from equation (10.2) into equation (10.1), we obtain
Yi = A1 + A2X2i + A3(300 − 2X2i) + ui
   = A1 + A2X2i + 300A3 − 2A3X2i + ui
   = (A1 + 300A3) + (A2 − 2A3)X2i + ui … (10.3)
Let C1 = (A1 + 300A3) and C2 = (A2 − 2A3). Then equation (10.3) can be
written as:
Yi = C1 + C2X2i + ui … (10.4)
Thus, if we estimate the regression model given in (10.4), we obtain estimators for
C1 and C2. We do not obtain unique estimators for A1, A2 and A3.

As a result, in the case of a perfect linear relationship or perfect multicollinearity
among explanatory variables, we cannot obtain unique estimators of all the
parameters. Since we cannot obtain their unique estimates, we cannot draw any
statistical inferences (hypothesis testing) about them. Thus, in the case of perfect
multicollinearity, estimation and hypothesis testing of individual regression
coefficients in a multiple regression are not possible.
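The point can be checked numerically. The following sketch (an added illustration, not part of the original unit, using hypothetical data and Python with NumPy) constructs income as an exact linear function of price, as in equation (10.2), and shows that the design matrix loses rank, so only the combinations C1 and C2 of equation (10.4) are estimable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical data: price X2, and income X3 tied to price by the exact relation (10.2)
X2 = rng.uniform(10, 100, n)            # price of good Y
X3 = 300 - 2 * X2                       # income: exact linear function of price
u = rng.normal(0, 5, n)
Y = 100 - 1.5 * X2 + 0.8 * X3 + u       # assumed "true" A1 = 100, A2 = -1.5, A3 = 0.8

# Design matrix with an intercept, X2 and X3
X = np.column_stack([np.ones(n), X2, X3])

# X'X is singular because X3 is an exact linear combination of the constant and X2,
# so unique estimates of A1, A2 and A3 cannot be computed
print("rank of X:", np.linalg.matrix_rank(X))    # 2 rather than 3

# Only C1 = A1 + 300*A3 and C2 = A2 - 2*A3 are estimable,
# by regressing Y on X2 alone as in equation (10.4)
C_hat, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X2]), Y, rcond=None)
print("C1_hat, C2_hat:", np.round(C_hat, 3))     # close to 340 and -3.1
```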

10.2.2 Near or Imperfect Multicollinearity


In the previous section we saw that, under perfect multicollinearity, we do not
get unique estimators for all the parameters in the model. In practice, however, we
rarely encounter perfect multicollinearity. We usually encounter near or very
high multicollinearity, in which the explanatory variables are approximately
linearly related.

High collinearity refers to the case of "near" or "imperfect" multicollinearity.
Thus, when we refer to the problem of multicollinearity we usually mean
imperfect multicollinearity.

Let us consider the same demand function for good Y. In this case, however, we
assume that there is imperfect multicollinearity between the explanatory variables
(in order to distinguish it from the earlier case, we have changed the parameter
notation). The following is the population regression function:

Yi = B1 + B2X2i + B3X3i + ui … (10.5)

Equation (10.5) refers to the case where the explanatory variables are not
exactly linearly related. For the above regression model, we may obtain an
estimated regression equation such as the following:

Ŷi = 145.37 − 2.7975X2i − 0.3191X3i
se:      (120.06)   (0.8122)   (0.4003)
t-ratio: (1.2107)   (−3.4444)  (−0.7971)
R² = 0.97778 … (10.6)

Since the explanatory variables are not exactly linearly related, we can find
estimates of the parameters. In this case the regression can be estimated, unlike
the earlier case of perfect multicollinearity. This does not mean, however, that
there is no problem with our estimators when there is imperfect multicollinearity.
We discuss the consequences of multicollinearity in the next section.
Check Your Progress 1

1) What is meant by perfect multicollinearity?


...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

2) What do you understand by imperfect multicollinearity?


...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

3) Explain why it is not possible to estimate a multiple regression model in the
presence of perfect multicollinearity.
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

10.3 CONSEQUENCES OF MULTICOLLINEARITY


We know from Unit 4 that the ordinary least squares (OLS) estimators are the
Best Linear Unbiased Estimators (BLUE). It implies they have the minimum
variance in the class of all linear unbiased estimators. In the case of imperfect
multicollinearity, the OLS estimators still remain BLUE. Then what is the
problem? In the presence of multicollinearity, there is an increase in the variance
and standard error of the coefficients. As a result, very few estimators are
statistically significant.
Some more consequences of multicollinearity are given below.
(a) The explanatory variables may not be linearly related in the population
(i.e., in the population regression function), but they could be related in a
particular sample. Thus multicollinearity is a sample problem.
(b) Near or high multicollinearity results in large variances and standard
errors of the OLS estimators. As a result, it becomes difficult to estimate
the true value of the parameter precisely.
(c) Multicollinearity results in wider confidence intervals. The standard
errors associated with the partial slope coefficients are higher; therefore
the confidence intervals are wider.
P[b2 − tα/2 SE(b2) ≤ B2 ≤ b2 + tα/2 SE(b2)] = 1 − α … (10.7)
Since the values of the standard errors have increased, the interval in
expression (10.7) widens.
(d) Insignificant t-ratios: As pointed out above, the standard errors of the
estimators increase due to multicollinearity. The t-ratio is given as
t = b2 / SE(b2). Therefore, the t-ratio tends to be small in absolute value.
Thus we tend to accept (or not reject) the null hypothesis and conclude
that the variable has no effect on the dependent variable.
(e) A high R² and few significant t-ratios: In equation (10.6) we notice that
the R² is very high, about 98% or 0.98, yet only the price variable's slope
coefficient has a significant t-value; the income coefficient is not
statistically significant. However, the F-test of overall significance
(H0: R² = 0) rejects the null hypothesis. Thus there is some discrepancy
between the results of the F-test and the t-tests.
(f) The OLS estimators (the partial slope coefficients) and their standard
errors become very sensitive to small changes in the data. If there is a
small change in the data, the regression results change substantially.
(g) Wrong signs of regression coefficients: This is a very prominent effect of
the presence of multicollinearity. In the example given in equation (10.6)
we find that the coefficient of the income variable is negative. The income
variable has a 'wrong' sign, as economic theory suggests that the income
effect is positive unless the commodity concerned is an inferior good.
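Several of these consequences can be reproduced with a short simulation. The sketch below (an added illustration, not from the original unit) uses hypothetical data and the statsmodels library; income is generated so that it is highly, but not perfectly, correlated with price.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30

# Hypothetical data: income X3 is highly (but not perfectly) correlated with price X2
X2 = rng.normal(50, 10, n)
X3 = 300 - 2 * X2 + rng.normal(0, 2, n)
y = 100 - 1.5 * X2 + 0.8 * X3 + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([X2, X3]))
res = sm.OLS(y, X).fit()

print("R-squared:      ", round(res.rsquared, 4))     # typically very high
print("F-test p-value: ", round(res.f_pvalue, 6))     # overall regression significant
print("coefficients:   ", np.round(res.params, 3))
print("standard errors:", np.round(res.bse, 3))       # inflated by collinearity
print("t-test p-values:", np.round(res.pvalues, 3))   # individual slopes often insignificant
```

Re-running the simulation with a different seed also illustrates consequence (f): with near-collinear regressors the individual coefficient estimates can change substantially from sample to sample.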

10.4 DETECTION OF MULTICOLLINEARITY


In the previous section we pointed out the consequences of multicollinearity.
Now let us discuss how multicollinearity can be detected.

(i) High R² and Few Significant t-ratios

This is the classic symptom of multicollinearity. If R² is high (greater
than 0.8), the null hypothesis that the partial slope coefficients are jointly
equal to zero (H0: β2 = β3 = 0) is rejected in most cases on the basis of
the F-test. But the individual t-tests will show that none, or very few, of
the partial slope coefficients are statistically different from zero; that is,
very few slope coefficients are individually significant.
(ii) High Pair-wise Correlations among Explanatory Variables

Due to high correlation among the independent variables, the estimated
regression coefficients have high standard errors. High pair-wise
correlations, however, are not a reliable guide, as demonstrated below:
even low pair-wise correlations among the independent variables can go
together with severe multicollinearity.
Let r23, r24 and r34 represent the pair-wise correlation coefficients
between X2 and X3, X2 and X4, and X3 and X4 respectively. Suppose
r23 = 0.90, reflecting high collinearity between X2 and X3. Now consider
the partial correlation coefficient r23.4, which measures the correlation
between X2 and X3 while keeping the influence of X4 constant. Suppose
we find that r23.4 = 0.43. This indicates that the partial correlation between
X2 and X3 is low, reflecting the absence of high collinearity. Therefore,
pair-wise correlation coefficients, when replaced by partial correlation
coefficients, may not indicate the presence of multicollinearity.
Suppose the true population regression is given by equation (10.8):

Yi = β1 + β2X2i + β3X3i + β4X4i + ui … (10.8)

Suppose X4 is an exact linear combination of X2 and X3, as shown in
equation (10.9) below:

X4i = λ2X2i + λ3X3i … (10.9)

If we compute the coefficient of determination by regressing X4 on X2
and X3, we find that

R²4.23 = (r²42 + r²43 − 2r42r43r23) / (1 − r²23) … (10.10)

Suppose r42 = 0.5, r43 = 0.5 and r23 = −0.5. If we substitute these values
in equation (10.10), we find that R²4.23 = 1. The implication is that none
of the pair-wise correlation coefficients among the explanatory variables
is very high, yet there is perfect multicollinearity.
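A quick numerical check of equation (10.10), using the values just cited, can be run in Python (this snippet is an added illustration, not part of the original unit):

```python
# Moderate pair-wise correlations can still imply perfect multicollinearity.
r42, r43, r23 = 0.5, 0.5, -0.5

R2_4_23 = (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)
print(R2_4_23)   # 1.0 -> X4 is an exact linear combination of X2 and X3
```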

(iii) Subsidiary or Auxiliary Regressions


Suppose each explanatory variable is regressed on all of the remaining
explanatory variables and the corresponding R² is computed. Each of
these regressions is referred to as a subsidiary or auxiliary regression. For
example, in a regression model with six explanatory variables, we regress
X2 on X3, X4, X5, X6 and X7 and find the corresponding R²; similarly, we
regress X3 on X2, X4, X5, X6 and X7 and find its R²; and so on. By
examining the auxiliary regressions we can identify the likely source of
multicollinearity. A common rule of thumb is that multicollinearity may
be troublesome if an R² obtained from an auxiliary regression is greater
than the overall R² of the regression model.

A limitation of this method is that we have to compute R² several times,
which is cumbersome and time consuming.
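With software the auxiliary regressions are easy to automate. The sketch below (an added illustration, using hypothetical data and the statsmodels library) computes the auxiliary R² for each explanatory variable and compares it with the overall R², in the spirit of the rule of thumb above:

```python
import numpy as np
import statsmodels.api as sm

def auxiliary_r2(X):
    """Regress each column of X on all the other columns and return the R^2 values."""
    r2 = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2.append(sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared)
    return r2

# Hypothetical data with one nearly collinear pair of regressors
rng = np.random.default_rng(2)
n = 100
X2 = rng.normal(size=n)
X3 = 2 * X2 + rng.normal(scale=0.1, size=n)      # nearly collinear with X2
X4 = rng.normal(size=n)
X = np.column_stack([X2, X3, X4])
y = 1 + X2 - X3 + 0.5 * X4 + rng.normal(size=n)

overall = sm.OLS(y, sm.add_constant(X)).fit().rsquared
print("overall R^2:", round(overall, 3))
for j, r2 in enumerate(auxiliary_r2(X), start=2):
    print(f"auxiliary R^2 for X{j}:", round(r2, 3))
# Rule of thumb: collinearity may be troublesome when an auxiliary R^2
# exceeds the overall R^2 of the main regression.
```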

(iv) Variance Inflation Factor (VIF)


Another indicator of multicollinearity is the variance inflation factor
(VIF). The R² obtained from auxiliary regressions on its own may not be a
reliable indicator of collinearity. In the VIF method we write the variance
of the estimators b2 and b3 as follows:

var(b2) = σ² / [Σx2i² (1 − R2²)] = (σ² / Σx2i²) · 1/(1 − R2²) … (10.11)

In equation (10.11), note that R2² is the R² from the auxiliary regression
of X2 on the other explanatory variables, discussed earlier.

Compare the variance of b2 given in equation (10.11) with the usual
formula for the variance of an estimator given in Unit 4. We find that

var(b2) = (σ² / Σx2i²) · VIF … (10.12)

where VIF = 1/(1 − R2²).

Similarly, var(b3) = (σ² / Σx3i²) · VIF.

Note that as R2² increases, the VIF also increases. This inflates the
variances, and hence the standard errors, of b2 and b3.

If R2² = 0, then VIF = 1, so that var(b2) = σ²/Σx2i² and var(b3) = σ²/Σx3i²;
there is no collinearity.

On the other hand, if R2² = 1, then VIF = ∞, so that var(b2) → ∞ and
var(b3) → ∞. As R2² approaches 1, var(b2) tends to ∞.

Note, however, that var(b2) depends not only on R2², but also on σ² and
Σx2i². It is possible that Ri² is high (say, 0.91) but var(b2) is nevertheless
low because σ² is low or Σx2i² is high. In that case var(b2) is still small,
resulting in a high t-value. Thus the R² obtained from an auxiliary
regression is only a superficial indicator of multicollinearity.
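In practice the VIF is rarely computed by hand. The sketch below (an added illustration with hypothetical data) uses the variance_inflation_factor helper from statsmodels, which computes 1/(1 − R²_j) with R²_j taken from the auxiliary regression of the j-th regressor on the others:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 200
X2 = rng.normal(size=n)
X3 = 0.95 * X2 + rng.normal(scale=0.3, size=n)   # highly correlated with X2
X4 = rng.normal(size=n)                          # unrelated to X2 and X3

exog = sm.add_constant(np.column_stack([X2, X3, X4]))
for j, name in [(1, "X2"), (2, "X3"), (3, "X4")]:
    print(name, "VIF =", round(variance_inflation_factor(exog, j), 2))
# X2 and X3 show large VIFs (inflated variances); X4's VIF is close to 1.
```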

Check Your Progress 2

1) Bring out four important consequences of multicollinearity.


...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

2) Explain how multicollinearity can be detected using partial correlations.


...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

3) Describe the method of detection of multicollinearity using the variance
inflation factor (VIF).
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

10.5 REMEDIAL MEASURES OF MULTICOLLINEARITY
Multicollinearity may not necessarily be an "evil" if the goal of the study is to
forecast the mean value of the dependent variable. If the collinearity between the
explanatory variables is expected to continue in the future, then the estimated
regression function can be used to forecast the dependent variable Y from the
collinear explanatory variables. However, if in some other sample the degree of
collinearity between the variables is not that strong, a forecast based on the given
regression is of little use.

On the other hand, if the objective of the study is not only prediction but also
reliable estimation of the individual parameters of the chosen model, then serious
collinearity is harmful, since multicollinearity results in large standard errors of
the estimators and therefore wide confidence intervals, which leads to acceptance
of null hypotheses in most cases. If the objective of the study is to estimate a
group of coefficients (i.e., a sum or difference of two coefficients), this is possible
even in the presence of multicollinearity. In such a case multicollinearity may not
be a problem.

Yi = C1 + C2X2i + ui … (10.13)
where C1 = A1 + 300A3 and C2 = A2 − 2A3.

Running the above regression, as presented earlier in Section 10.2.1, one can
easily estimate C1 and C2 by the OLS method, although neither A2 nor A3 can be
estimated individually. There can also be situations where, in spite of inflated
standard errors, an individual coefficient turns out to be statistically significant,
because the true value itself is so large that even an estimate on the lower side
still shows up as significant in the test.

Certain remedies have been prescribed for reducing the severity of the
collinearity problem. Recall that the OLS estimators retain the BLUE property
despite near collinearity; the difficulty is that one or more regression coefficients
may be individually statistically insignificant or may carry wrong signs. The
remedial measures are discussed below.

10.5.1 Dropping a Variable from the Model


The simplest solution may be to drop one or more of the collinear variables.
However, dropping a variable from the model may lead to model specification
error. In other words, when we estimate the model without the excluded variable,
the estimated parameters of the reduced model may turn out to be biased.
Therefore, the best practical advice is not to drop a variable from a model that is
theoretically sound. In particular, if the absolute t-value of a variable's coefficient
is greater than 1, that variable should not be dropped, since dropping it will lower
the adjusted R² (R̄²).
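The adjusted-R² rule can be seen in a short experiment. The sketch below (an added illustration with hypothetical data and statsmodels) fits a model with two collinear regressors and then refits it with one of them dropped; when the dropped variable's |t| exceeds 1, the adjusted R² of the reduced model is lower.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 60
X2 = rng.normal(size=n)
X3 = X2 + rng.normal(scale=0.2, size=n)              # collinear with X2
y = 2 + 1.0 * X2 + 0.8 * X3 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([X2, X3]))).fit()
reduced = sm.OLS(y, sm.add_constant(X2)).fit()        # X3 dropped

print("t-value of X3 in full model:", round(full.tvalues[2], 2))
print("adjusted R^2, full model:   ", round(full.rsquared_adj, 4))
print("adjusted R^2, X3 dropped:   ", round(reduced.rsquared_adj, 4))
# Dropping X3 may also introduce omitted-variable bias in the X2 coefficient.
```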

10.5.2 Acquiring Additional Data or New Sample


Acquiring additional data implies increasing the sample size. This is likely to
reduce the severity of the multicollinearity problem. As we know from equation
(10.11),

var(b2) = σ² / [Σx2i² (1 − R2²)]

Given σ² and R2², if the sample size increases, Σx2i² increases. This leads to a
decrease in var(b2) and in its standard error.
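The effect of sample size on the standard error is easy to simulate. In the sketch below (an added illustration with hypothetical data and statsmodels), the degree of collinearity is held fixed while the sample grows, and SE(b2) shrinks as Σx2i² grows:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

def se_of_b2(n):
    """Standard error of the X2 slope for a sample of size n (hypothetical data)."""
    X2 = rng.normal(size=n)
    X3 = 0.9 * X2 + rng.normal(scale=0.4, size=n)    # near-collinear pair
    y = 1 + 0.5 * X2 - 0.3 * X3 + rng.normal(size=n)
    res = sm.OLS(y, sm.add_constant(np.column_stack([X2, X3]))).fit()
    return res.bse[1]

for n in (30, 100, 1000):
    print(f"n = {n:4d}   SE(b2) = {se_of_b2(n):.3f}")
# The standard error falls with n even though the degree of collinearity is unchanged.
```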

10.5.3 Re-Specification of the Model


It is possible that some important variables have been omitted from the model, or
that the functional form of the model is incorrect. Therefore, the specification of
the model needs to be re-examined. In many cases, estimating the model in log
form reduces the problem of multicollinearity.
10.5.4 Prior Information about Certain Parameters

Estimated values of certain parameters are available in existing studies. These
values can be used as prior information; they give us some tentative idea of the
plausible values of the parameters.
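One common way of imposing such prior information (a sketch added here, not the unit's own procedure) is to fix a coefficient at its value from an earlier study and estimate only the remaining parameters. Assuming, for illustration, a prior estimate of 0.8 for the income coefficient A3 in the demand example:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 80
X2 = rng.normal(50, 10, n)                           # price
X3 = 300 - 2 * X2 + rng.normal(scale=3, size=n)      # income, near-collinear with price
y = 100 - 1.5 * X2 + 0.8 * X3 + rng.normal(scale=5, size=n)

A3_prior = 0.8            # hypothetical value taken from an earlier study

# Impose the prior value and estimate only A1 and A2:
# regress (Y - A3_prior * X3) on X2.
y_star = y - A3_prior * X3
res = sm.OLS(y_star, sm.add_constant(X2)).fit()
print("estimated A1 and A2:", np.round(res.params, 3))
```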
10.5.5 Transformation of Variables
Transforming the variables, for instance by taking first differences or expressing
them as ratios, may reduce the severity of the collinearity problem.
10.5.6 Ridge Regression
Ridge regression is another method of dealing with the problem of
multicollinearity. In ridge regression, the first step is to standardize all the
variables, both dependent and independent, by subtracting their respective means
and dividing by their standard deviations. The regression is then run on the
standardized values of the dependent and explanatory variables.
In the presence of multicollinearity, the variance inflation factor is substantially
high, mainly because the coefficient of determination of the auxiliary regression
is high. Ridge regression is usually applied when the regression is written in
matrix form and involves a large number of explanatory variables.
Ridge regression proceeds by adding a small constant, k, to the diagonal
elements of the correlation matrix of the standardized regressors. Because the
diagonal of ones in the correlation matrix can be thought of as a ridge, this
procedure is referred to as ridge regression.
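A minimal sketch of this procedure (added here, with hypothetical data) standardizes the variables, forms the correlation matrix R of the regressors and their correlation vector r with the dependent variable, and solves (R + kI)b = r for a few values of k:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X2 = rng.normal(size=n)
X3 = X2 + rng.normal(scale=0.05, size=n)        # severe near-collinearity
y = 1 + 2 * X2 + 3 * X3 + rng.normal(size=n)

def standardize(v):
    return (v - v.mean()) / v.std()

Xs = np.column_stack([standardize(X2), standardize(X3)])
ys = standardize(y)

R = (Xs.T @ Xs) / n        # correlation matrix of the standardized regressors
r = (Xs.T @ ys) / n        # correlations of the regressors with y

for k in (0.0, 0.05, 0.10):
    b_ridge = np.linalg.solve(R + k * np.eye(2), r)
    print(f"k = {k:.2f}   standardized coefficients = {np.round(b_ridge, 3)}")
# k = 0 reproduces OLS on the standardized data; a small positive k shrinks and
# stabilizes the coefficients at the cost of introducing some bias.
```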
10.5.7 Other Remedial Measures
There are several other remedies suggested in the literature, such as combining
time-series and cross-sectional data, factor or principal component analysis, and
ridge regression (discussed above).
Polynomial Regression Models
Let us consider total cost of production (TC) as a function of output (X), from
which marginal cost (MC) and average cost (AC) can be derived:
Yi = β1 + β2Xi + β3Xi² + β4Xi³ + ui … (10.14)
This cubic cost function is a third-degree polynomial in the variable X. The model
in equation (10.14) is linear in the parameters (the βs); it therefore satisfies the
linearity assumption of the classical linear regression model and can be estimated
by the usual OLS method. One might worry about collinearity, since the model is
not linear in the variables; however, X² and X³ are non-linear functions of X, so
they do not violate the assumption of no perfect collinearity, i.e., no perfect linear
relationship between the variables. The estimated results are presented in
equation (10.15).
Ŷi = 141.7667 + 63.4776Xi − 12.9615Xi² + 0.9396Xi³ … (10.15)
se:  (6.3753)   (4.7786)   (0.9857)   (0.0591)
R² = 0.9983

ACi = TCi / Xi = 141.7667/Xi + 63.4776 − 12.9615Xi + 0.9396Xi²

MCi = dTCi/dXi = 63.4776 − 2(12.9615)Xi + 3(0.9396)Xi²

If the average and marginal cost curves are U-shaped, then theory suggests that
the coefficients should satisfy the following:
1) β1, β2 and β4 > 0
2) β3 < 0
3) β3² < 3β2β4
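The following sketch (an added illustration, with hypothetical cost data rather than the data behind equation (10.15)) fits the cubic cost function by OLS and checks the three conditions above:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical output levels and total costs generated from a cubic cost curve
rng = np.random.default_rng(8)
X = np.arange(1.0, 11.0)                                    # output
TC = 140 + 63 * X - 12.9 * X**2 + 0.94 * X**3 + rng.normal(scale=2, size=X.size)

# Regressors X, X^2, X^3 are highly correlated, but not perfectly collinear
design = sm.add_constant(np.column_stack([X, X**2, X**3]))
print("corr(X, X^2):", round(np.corrcoef(X, X**2)[0, 1], 3))

res = sm.OLS(TC, design).fit()
b1, b2, b3, b4 = res.params
print("estimated coefficients:", np.round(res.params, 3))

# Conditions implied by U-shaped AC and MC curves
print("b1, b2, b4 > 0:", b1 > 0 and b2 > 0 and b4 > 0)
print("b3 < 0:        ", b3 < 0)
print("b3^2 < 3*b2*b4:", b3**2 < 3 * b2 * b4)
```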

Check Your Progress 3


1) Describe two important methods of rectifying the problem of multicollinearity.
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

2) Describe the method of ridge regression.


...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................
...........................................................................................................................

10.6 LET US SUM UP


This unit explains the concept of multicollinearity in the regression model and
draws a clear distinction between near and perfect multicollinearity. It then
describes the consequences of the presence of multicollinearity in a regression
model and the methods by which multicollinearity can be detected. Finally,
various remedial measures, including the concept of ridge regression, have been
explained in the unit.

10.7 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) Perfect multicollinearity refers to the situation where the explanatory
variables are perfectly correlated with each other, i.e., the coefficient of
correlation between the explanatory variables is 1.
2) Imperfect multicollinearity refers to the case where two or more
explanatory variables are highly, but not exactly, linearly related;
collinearity can be high without being perfect. "High collinearity" thus
means "near" or "imperfect" multicollinearity, and in practice the problem
of multicollinearity usually means imperfect multicollinearity.
3) In the case of perfect multicollinearity it is not possible to obtain unique
estimators for the individual parameters of the regression model. See
Section 10.2 for details.
Check Your Progress 2
1) (i) In the case of imperfect multicollinearity, some of the estimators are
statistically insignificant, but the OLS estimates still retain their BLUE
property (Best Linear Unbiased Estimators). Imperfect multicollinearity
does not violate any of the classical assumptions, so the OLS estimators
remain BLUE. Being BLUE (minimum variance within the class of linear
unbiased estimators), however, does not imply that the numerical value of
the variance will be small.

(ii) The R² value is very high but very few estimators are significant (the
t-ratios are low). In the example of the demand function for good Y
estimated with price and income, R² is quite high (about 98% or 0.98) but
only the price variable's slope coefficient has a significant t-value.
However, the F-test of overall significance (H0: R² = 0) rejects the
hypothesis that price and income together have no effect on the demand
for Y.

(iii) The OLS estimators (the partial slope coefficients) and their standard
errors become very sensitive to small changes in the data, i.e., they tend to
be unstable. With a small change in the data, the regression results can
change quite substantially: in the example of near or imperfect
multicollinearity mentioned above, the standard errors fall and the
t-ratios increase in absolute value.

(iv) Wrong signs of regression coefficients: this is a very prominent effect
of the presence of multicollinearity. In the example where income was
used in deriving the demand curve for good Y, the income variable has the
'wrong' sign, since economic theory suggests that the income effect is
positive unless the good concerned is an inferior good.
2) Examining partial correlations: Consider three explanatory variables X2,
X3 and X4. Let r23, r24 and r34 represent the pair-wise correlation
coefficients between X2 and X3, X2 and X4, and X3 and X4 respectively.
Suppose r23 = 0.90, reflecting high collinearity between X2 and X3. Now
consider the partial correlation coefficient r23.4, which measures the
correlation between X2 and X3 holding the influence of X4 constant. If
r23.4 = 0.43, the partial correlation between X2 and X3 is low, reflecting a
low degree of collinearity. Thus pair-wise correlation coefficients, when
replaced by partial correlation coefficients, may fail to indicate the
presence (or absence) of multicollinearity.

Subsidiary or auxiliary regressions: Each explanatory variable is regressed
on the remaining explanatory variables and the corresponding R² is
computed; each such regression is referred to as a subsidiary or auxiliary
regression. Consider a regression of Y on X2, X3, X4, X5, X6 and X7 with
six explanatory variables. If R² is very high but few X coefficients are
individually statistically significant, the purpose of the auxiliary
regressions is to identify the source of the multicollinearity, i.e., which X
is an exact or near-exact linear combination of the other Xs. For this we
regress X2 on the remaining Xs and obtain R2² (also written R²2.34567);
we regress X3 on the remaining Xs and obtain R3² (also written
R²3.24567); and so on. Each Ri² obtained lies between 0 and 1, and the
null hypothesis H0: Ri² = 0 can be tested by applying the F-test.

3) Variance Inflation Factor (VIF): The R² obtained from an auxiliary
regression may not, on its own, be a reliable indicator of collinearity. In
this method we write the variances of b2 and b3 as

var(b2) = σ² / [Σx2i² (1 − R2²)] = (σ² / Σx2i²) · 1/(1 − R2²) = (σ² / Σx2i²) · VIF

and similarly var(b3) = (σ² / Σx3i²) · VIF,

where VIF = 1/(1 − R2²) is the variance inflation factor. As R2² increases,
the VIF increases, thus inflating the variances and hence the standard
errors of b2 and b3.

If R2² = 0, then VIF = 1, so that var(b2) = σ²/Σx2i² and var(b3) = σ²/Σx3i²,
i.e., there is no collinearity.

If R2² = 1, then VIF = ∞, so that var(b2) → ∞ and var(b3) → ∞.

However, var(b2) does not depend only on R2² (the auxiliary coefficient of
determination) or on the VIF. It also depends on σ² and Σx2i². It is
possible for Ri² to be high (say, 0.91) while var(b2) is low because σ² is
low or Σx2i² is high; var(b2) may then still be small, resulting in a high
t-value rather than the low t-value one would expect, thus defeating the
indicator of multicollinearity. The R² obtained from an auxiliary
regression is therefore only a superficial indicator of multicollinearity.
Check Your Progress 3
1) (i) Dropping a variable from the model: The simplest solution might seem
to be to drop one or more of the collinear variables. However, dropping a
variable from the model may lead to model specification error; in other
words, when we estimate the model without that variable, the estimated
parameters of the reduced model may turn out to be biased. Therefore, the
best practical advice is not to drop a variable from an economically
meaningful model merely because the collinearity problem is serious. In
particular, a variable whose coefficient has an absolute t-value greater
than 1 should not be dropped, as dropping it will result in a decrease in the
adjusted R².

(ii) Acquiring additional data or a new sample: Acquiring additional data,
i.e., increasing the sample size, can reduce the severity of the collinearity
problem. Since

var(b2) = σ² / [Σx2i² (1 − R2²)],

given σ² and R2², if the sample size increases then Σx2i² will increase; as
a result var(b2) will tend to decrease and so will the standard error of b2.

2) In ridge regression we first standardise all the variables in the model. Go
through Sub-Section 10.5.6 for details.
