UNIT 10 MULTICOLLINEARITY
Structure
10.0 Objectives
10.1 Introduction
10.2 Types of Multicollinearity
10.2.1 Perfect Multicollinearity
10.2.2 Near or Imperfect Multicollinearity
10.3 Consequences of Multicollinearity
10.4 Detection of Multicollinearity
10.5 Remedial Measures of Multicollinearity
10.5.1 Dropping a Variable from the Model
10.5.2 Acquiring Additional Data or New Sample
10.5.3 Re-Specification of the Model
10.5.4 Prior Information about Certain Parameters
10.5.5 Transformation of Variables
10.5.6 Ridge Regression
10.5.7 Other Remedial Measures
10.6 Let Us Sum Up
10.7 Answers/ Hints to Check Your Progress Exercises
10.0 OBJECTIVES
After going through this unit, you should be able to
explain the concept of multicollinearity in a regression model;
comprehend the difference between near and perfect multicollinearity;
describe the consequences of multicollinearity;
explain how multicollinearity can be detected;
describe the remedial measures of multicollinearity; and
explain the concept of ridge regression.
10.1 INTRODUCTION
The classical linear regression model assumes that there is no perfect
multicollinearity. Multicollinearity means the presence of high correlation
between two or more explanatory variables in a multiple regression model.
Absence of multicollinearity implies that there is no exact linear relationship
among the explanatory variables. The assumption of no perfect multicollinearity
is very crucial to a regression model since the presence of perfect
multicollinearity has serious consequences for the regression model. We discuss
the consequences, detection methods, and remedial measures for
multicollinearity in this Unit.
Let us consider the same demand function of good Y. In this case, however, we
assume that there is imperfect multicollinearity between the explanatory variables
(in order to distinguish it from the earlier case, we have changed the parameter
notations). The following is the population regression function:

$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i$    … (10.5)
Equation (10.5) refers to the case when two or more explanatory variables are not
exactly linearly related. For the above regression model we may obtain an estimated
regression equation from sample data (referred to as equation (10.6) below).
Since the explanatory variables are not exactly related, we can find estimates for
the parameters. In this case the regression can be estimated, unlike the earlier case of
perfect multicollinearity. This does not mean, however, that there is no problem with our
estimators if there is imperfect multicollinearity. We discuss the consequences of
multicollinearity in the next section.
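To see the computational difference between the two cases, here is a minimal sketch (hypothetical Python, assuming numpy is installed; all variable names and numbers are illustrative, not taken from the text). With perfect collinearity the matrix X'X is singular and the normal equations cannot be solved; with near collinearity estimation goes through, although the system is ill-conditioned.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x2 = rng.normal(10, 2, n)                   # e.g. price of good Y (illustrative)
    x3_perfect = 2 * x2                         # exact linear function of x2
    x3_near = 2 * x2 + rng.normal(0, 0.1, n)    # nearly, but not exactly, linear

    def xtx_condition(x2, x3):
        # condition number of X'X; effectively infinite when X'X is singular
        X = np.column_stack([np.ones_like(x2), x2, x3])
        return np.linalg.cond(X.T @ X)

    print(xtx_condition(x2, x3_perfect))  # astronomically large: X'X cannot be inverted
    print(xtx_condition(x2, x3_near))     # large but finite: OLS estimates exist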
Check Your Progress 1
Since the values of the standard errors have increased, the confidence interval
given in expression (10.7) has widened.
(d) Insignificant t-ratios: As pointed out above, the standard errors of the
estimators increase due to multicollinearity. The t-ratio is given as
$t = \dfrac{b_2}{se(b_2)}$. With an inflated standard error the t-ratio becomes
very small. Thus we tend to accept (or not reject) the null hypothesis and
conclude that the variable has no effect on the dependent variable.
(e) A high $R^2$ and few significant t-ratios: In equation (10.6) we notice that
the $R^2$ is very high, about 98 per cent or 0.98. Yet the t-ratios of the
explanatory variables are mostly not statistically significant; only the slope
coefficient of the price variable has a significant t-value. However, using the
F-test for overall significance, $H_0\!: R^2 = 0$, we reject the null
hypothesis. Thus there is some discrepancy between the results of the F-test
and the t-tests.
(f) The OLS estimators, mainly the partial slope coefficients, and their
standard errors become very sensitive to small changes in the data. If
there is a small change in the data, the regression results change substantially.
(g) Wrong signs of regression coefficients: This is a very prominent impact of
the presence of multicollinearity. In the case of the example given at
equation (10.6) we find that the coefficient of the income variable is
negative. The income variable has a 'wrong' sign, as economic theory
suggests that the income effect is positive unless the commodity concerned is
an inferior good.
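The consequences listed above can be reproduced with a small simulated example (hypothetical Python, assuming numpy and statsmodels are available; the data-generating values are invented for illustration and are not the figures behind equation (10.6)).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 30
    price = rng.normal(20, 3, n)
    income = 5 * price + rng.normal(0, 0.2, n)      # income almost collinear with price
    y = 500 - 10 * price + 3 * income + rng.normal(0, 2, n)

    X = sm.add_constant(np.column_stack([price, income]))
    fit = sm.OLS(y, X).fit()
    print(fit.rsquared)   # typically very high despite the collinearity
    print(fit.tvalues)    # slope t-ratios are often small, i.e. individually insignificant
    print(fit.params)     # estimates may swing, and can even take the 'wrong' sign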
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$    … (10.8)

Suppose the explanatory variables are perfectly correlated with each other
as shown in equation (10.9) below:

$X_{4i} = \lambda_2 X_{2i} + \lambda_3 X_{3i}$    … (10.9)

Here $X_4$ is an exact linear combination of $X_2$ and $X_3$.
The coefficient of determination from the regression of $X_4$ on $X_2$ and $X_3$ is

$R^2_{4.23} = \dfrac{r_{42}^2 + r_{43}^2 - 2\,r_{42}\,r_{43}\,r_{23}}{1 - r_{23}^2}$    … (10.10)

Suppose $r_{42} = 0.5$, $r_{43} = 0.5$ and $r_{23} = -0.5$. If we substitute these values in
equation (10.10), we find that $R^2_{4.23} = 1$. An implication of the above is
that even though none of the correlation coefficients (among the explanatory
variables) is very high, there can still be perfect multicollinearity.
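Writing out the substitution explicitly (using $r_{23} = -0.5$ as above):

$R^2_{4.23} = \dfrac{(0.5)^2 + (0.5)^2 - 2(0.5)(0.5)(-0.5)}{1 - (-0.5)^2} = \dfrac{0.25 + 0.25 + 0.25}{0.75} = 1$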
where $\mathrm{VIF} = \dfrac{1}{1 - R_2^2}$

Note that as $R_2^2$ increases, the VIF also increases. This inflates the variances,
and hence the standard errors, of $b_2$ and $b_3$.

If $R_2^2 = 1$, then $\mathrm{VIF} = \infty \Rightarrow \mathrm{var}(b_2) \to \infty$ and $\mathrm{var}(b_3) \to \infty$.

Note that $\mathrm{var}(b_2)$ depends not only on $R_2^2$, but also on $\sigma^2$ and $\sum x_{2i}^2$. It
is possible that $R_2^2$ is high (say, 0.91) and yet $\mathrm{var}(b_2)$ is low because of a
low $\sigma^2$ or a high $\sum x_{2i}^2$, so that the t-value is still high. Thus the $R^2$ obtained
from an auxiliary regression is only a superficial indicator of multicollinearity.
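In practice the auxiliary-regression $R^2$ and the VIF can be computed along the following lines (a sketch in Python, assuming numpy and statsmodels are available; the helper name compute_vif is ours, not from the text).

    import numpy as np
    import statsmodels.api as sm

    def compute_vif(X, j):
        # Regress column j of X on the remaining columns (the auxiliary regression)
        # and return the variance inflation factor 1 / (1 - R_j^2).
        others = np.delete(X, j, axis=1)
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        return 1.0 / (1.0 - r2)

    # usage: X is an n x k matrix of explanatory variables (without a constant column)
    # vif_values = [compute_vif(X, j) for j in range(X.shape[1])]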
Check Your Progress 2
On the other hand, if the objective of the study is not only prediction but also
reliable estimation of the individual parameters of the chosen model, then serious
collinearity may be bad, since multicollinearity results in large standard errors of
the estimators and therefore widens the confidence intervals, leading to acceptance
of the null hypothesis in most cases. If the objective of the study is to estimate a group
of coefficients (i.e., the sum or difference of two coefficients), then this is possible
even in the presence of multicollinearity. In such a case multicollinearity may not be
a problem.
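The point about a group of coefficients can be illustrated with a short simulation (hypothetical Python, assuming numpy; the numbers are invented): across repeated samples the individual slope estimates swing widely, yet their sum is estimated quite precisely.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 40, 500
    sums, slopes = [], []
    for _ in range(reps):
        x2 = rng.normal(0, 1, n)
        x3 = x2 + rng.normal(0, 0.05, n)            # x3 nearly identical to x2
        y = 1 + 0.5 * x2 + 0.5 * x3 + rng.normal(0, 1, n)
        X = np.column_stack([np.ones(n), x2, x3])
        b = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS via least squares
        slopes.append(b[1])
        sums.append(b[1] + b[2])
    print(np.std(slopes))   # large: each individual coefficient is poorly determined
    print(np.std(sums))     # much smaller: the sum b2 + b3 is estimated reliably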
$Y_i = C_1 + C_2 X_{2i} + u_i$    … (10.13)

where $C_1 = A_1 + 300A_3$ and $C_2 = A_2 + 2A_3$.
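The definitions of $C_1$ and $C_2$ are consistent with prior information of the form $X_{3i} = 300 + 2X_{2i}$ (this particular relation is an illustrative assumption on our part, chosen only because it reproduces the coefficients above). Substituting it into a model $Y_i = A_1 + A_2 X_{2i} + A_3 X_{3i} + u_i$ gives

$Y_i = A_1 + A_2 X_{2i} + A_3(300 + 2X_{2i}) + u_i = (A_1 + 300A_3) + (A_2 + 2A_3)X_{2i} + u_i = C_1 + C_2 X_{2i} + u_i$

so that only the combined parameters $C_1$ and $C_2$ need to be estimated from the data.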
$\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 \,(1 - R_2^2)}$
If the cost curves are U-shaped average and marginal cost curves, then the theory
suggests that the coefficients should satisfy the following restrictions:
1) $\beta_1, \beta_2$ and $\beta_4 > 0$
2) $\beta_3 < 0$
3) $\beta_3^2 < 3\beta_2\beta_4$
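These restrictions follow if the underlying specification is the usual cubic total-cost function (an assumption on our part, since the original equation is not reproduced here), $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i$, where $X$ is output. Marginal cost is then

$MC = \dfrac{dY}{dX} = \beta_2 + 2\beta_3 X + 3\beta_4 X^2$

which is U-shaped and everywhere positive only if $\beta_2, \beta_4 > 0$, $\beta_3 < 0$ and the discriminant condition $(2\beta_3)^2 - 4(3\beta_4)\beta_2 < 0$, i.e., $\beta_3^2 < 3\beta_2\beta_4$, holds.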
10.7 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) The case of perfect multicollinearity reflects the situation when
the explanatory variables are perfectly correlated with each other,
implying that the coefficient of correlation between the explanatory variables
is 1.
2) This refers to the case when two or more explanatory variables are not
exactly linearly related. This reinforces the fact that collinearity can be high but not
perfect. "High collinearity" refers to the case of "near" or "imperfect" (high)
multicollinearity. The presence of multicollinearity thus implies "imperfect
multicollinearity".
3) In the case of perfect multicollinearity it is not possible to obtain
estimators for the parameters of the regression model. See Section 10.2
for details.
Check Your Progress 2
1) (i) In the case of imperfect multicollinearity, some of the estimated coefficients are
statistically insignificant. But the OLS estimators still retain their BLUE
property, that is, they remain Best Linear Unbiased Estimators. Imperfect
multicollinearity does not violate any of the classical assumptions, so the OLS
estimators remain BLUE. Being BLUE, with minimum variance among linear
unbiased estimators, does not imply that the numerical value of the variance will be small.
(ii) The $R^2$ value is very high but very few estimated coefficients are significant (t-ratios
are low). The example mentioned in the earlier section, where the demand
function of good Y was estimated using the earnings of individuals,
reflects this situation: $R^2$ is quite high, about 98 per cent or 0.98, but only the
price variable's slope coefficient has a significant t-value. However, using the F-test
for overall significance, $H_0\!: R^2 = 0$, we reject the hypothesis that prices and
earnings together have no effect on the demand for Y.
(iii) The ordinary least squares (OLS) estimators, mainly the partial slope coefficients,
and their standard errors become very sensitive to small changes in the
data, i.e., they tend to be unstable. With a small change in the data the regression
results change quite substantially; in the example of near or imperfect
multicollinearity mentioned above, the standard errors go down and the t-ratios
increase in absolute value.
(iv) Wrong signs of regression coefficients: This is a very prominent impact of
the presence of multicollinearity. In the example where the earnings of
individuals were used in deriving the demand curve for good Y, the earnings
variable has the 'wrong' sign according to economic theory, since the income
effect is usually positive unless the commodity is an inferior good.
2) Examining partial correlations: In the case of three explanatory variables
$X_2$, $X_3$ and $X_4$, examining the partial correlation coefficients can reveal very
high or perfect multicollinearity between $X_4$ and $X_2$, $X_3$.

Subsidiary or auxiliary regressions: Each explanatory variable X is
regressed on the remaining X variables and the corresponding $R^2$
is computed. Each of these regressions is referred to as a subsidiary or
auxiliary regression. Consider a regression of Y on $X_2, X_3, X_4, X_5, X_6$ and $X_7$, i.e., with
six explanatory variables. If the $R^2$ comes out to be very high but there are few
significant t-ratios, or very few X coefficients are individually statistically
significant, then the purpose of the auxiliary regressions is to identify the source of the
multicollinearity, that is, the existence of a perfect or near-perfect linear combination
of the other Xs.
We have

$\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - R_2^2)} = \dfrac{\sigma^2}{\sum x_{2i}^2} \cdot \dfrac{1}{1 - R_2^2} = \dfrac{\sigma^2}{\sum x_{2i}^2} \cdot \mathrm{VIF}$

Similarly, $\mathrm{var}(b_3) = \dfrac{\sigma^2}{\sum x_{3i}^2} \cdot \mathrm{VIF}$

where $\mathrm{VIF} = \dfrac{1}{1 - R_2^2}$ is the variance inflation factor. As $R_2^2$ increases, the VIF increases,
thus inflating the variances and hence the standard errors of $b_2$ and $b_3$.

If $R_2^2 = 0$, then $\mathrm{VIF} = 1$, $\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$ and $\mathrm{var}(b_3) = \dfrac{\sigma^2}{\sum x_{3i}^2}$ (no collinearity).

If $R_2^2 = 1$, then $\mathrm{VIF} = \infty$ and $\mathrm{var}(b_2), \mathrm{var}(b_3) \to \infty$.