
UNIT 5

Multicollinearity

Objectives

At the end of this unit students are expected to be able to:

1. define and explain multicollinearity as used in Econometrics,

2. describe common causes of multicollinearity,

3. describe the consequences of multicollinearity for statistical inference,

4. relate the aims and procedures of correlation analysis to the problem of multicollinearity,

5. describe the aims and procedures of Frisch's confluence analysis,

6. state the assumptions of Frisch's confluence analysis,

7. estimate model parameters in the presence of multicollinearity, i.e. conduct appropriate estimation procedures for the cases where the assumption of a non-singular moment matrix $X'X$ is violated; in particular, students must be able to perform Principal Components Analysis,

8. generate accurate forecasts in the presence of multicollinearity, i.e. make forecasts that take into account the effects of multicollinearity.

As indicated in Unit 2, in practice some assumptions of the General Linear Model (GLM) may be violated for a variety of reasons. In this Unit we describe a problem called multicollinearity, its common causes, detection and treatment.

The linear model $Y = X'\beta + u$ is commonly estimated using the method of least squares. A crucial condition for the application of least squares is that the explanatory variables are not perfectly linearly correlated or, equivalently, that the moment matrix $X'X$ is non-singular. A natural question to ask is: what happens if this assumption is violated?

Definition 1.1 In modelling a dependent variable $Y$ using explanatory variables $X = (X_1, \ldots, X_k)'$, the term multicollinearity is used to mean the presence of a linear relationship among the explanatory variables, of the form $a'X = c$, where $a$ and $c$ are a constant vector and scalar respectively.

Recall also that in Econometrics the regressors are often stochastic or random rather than deterministic. Thus, it is meaningful to talk of correlation among explanatory variables (see Exercise 1). Further, it follows that multicollinearity is inherent in most economic data because of the interrelationships that exist among economic variables.

If two explanatory variables $X_i$ and $X_j$ are perfectly correlated, i.e. $\rho_{X_i X_j} = 1$, the normal equations $X'X\beta = X'Y$ become indeterminate: it becomes impossible to obtain numerical estimates of the parameters $\beta_i$, and the least squares method breaks down since the moment matrix $X'X$ is singular (non-invertible).
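As a quick numerical illustration (a minimal numpy sketch with made-up data, not part of the text's examples), take one regressor to be an exact multiple of another; the moment matrix then loses rank and cannot be inverted:

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(X1), X1, 2.0 * X1])  # intercept, X1, X2 = 2*X1

XtX = X.T @ X                        # the moment matrix X'X
print(np.linalg.matrix_rank(XtX))    # 2, not 3: X'X is singular
# np.linalg.inv(XtX) would raise LinAlgError, so the normal equations
# X'X beta = X'Y have no unique solution.
```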

On the other hand, if $\rho_{X_i X_j} = 0$ for all $i \neq j$, the explanatory variables are said to be orthogonal and there is no problem in estimating the parameters $\beta_i$. In practice, however, neither of these extreme cases is often met; there is usually some degree of intercorrelation or interdependence among the explanatory variables. In this case each correlation satisfies $0 < |\rho_{X_i X_j}| < 1$, and multicollinearity may impair the accuracy and stability of the parameter estimates, but the exact effects have not yet been theoretically established.

When any two explanatory variables change in nearly the same way, it becomes extremely difficult to establish the influence of each individual regressor, say $X_i$, on the dependent variable $Y$ separately. For example, suppose that the expenditure of an individual depends on income and liquid (i.e. easily disposable) assets. If, over a period of time, income and liquid assets change by the same proportion, then the influence of one of these explanatory variables on expenditure may be erroneously attributed to the other.

As before, the treatment of the multicollinearity problem requires full knowledge of its common causes. The next section discusses some of these.

1.1 Causes of multicollinearity

As indicated above, multicollinearity can arise in a variety of ways.

1. Co-integration: This is apparently the main cause of multicollinearity. Economic variables tend to move together over time: they are often influenced by the same factors, so that they show the same broad pattern of behaviour over time. For example, economic booms, crashes, etc. affect a number of economic variables, which then tend to increase or decrease together, although some variables may lag behind (or lead) others. Thus, variables such as exchange rates, prices, inflation, income and expenditure tend to show marked relationships in their evolution over time. This phenomenon is referred to as co-integration. If such variables are used as explanatory variables in a linear model, a multicollinearity problem can easily arise. Consider, for example, the well-known Cobb-Douglas production function

$$P = aK^b L^c U.$$

The variables $K$ (capital input) and $L$ (labour input) will frequently be nearly perfectly correlated. This situation leads to serious estimation problems due to the presence of marked multicollinearity, and subsequently to interpretation errors. The problem is complicated further if we introduce a variable for entrepreneurship quality and/or an index variable to capture the level of technical development, as one is likely to find that the new variables are closely correlated with $K$. (A log-linearised form suitable for estimation is shown below.)
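For least squares estimation the Cobb-Douglas function is usually log-linearised (a standard step, stated here for completeness); near-perfect correlation between $\ln K$ and $\ln L$ then enters the moment matrix of the resulting linear model directly:

$$\ln P = \ln a + b \ln K + c \ln L + \ln U.$$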

2. Use of lagged variables: The use of lagged variables such as $Y_{t-1}$ as regressors is now quite common in Econometrics and has generally given satisfactory results in many studies. However, using such lagged variables makes multicollinearity almost certain to exist in such models. Thus, regressions involving lagged variables must be treated with caution.

3. Lack of experimental control: Lack of experimental control, in particular administrative interference, is a fundamental cause of multicollinearity.

4. Data smoothing: Statistical practices such as data smoothing, sampling procedures, etc. can also lead to multicollinearity.

From the foregoing discussion it is clear that some degree of multicollinearity is expected to appear in econometric modelling.

1.2 Consequences of multicollinearity

We have already indicated that if two explanatory variables, say $X_i$ and $X_j$, are perfectly correlated, i.e. $\rho_{X_i X_j} = 1$, then the normal equations $X'X\beta = X'Y$ become indeterminate: it becomes impossible to obtain numerical values for the parameters $\beta_i$, and the least squares method breaks down since the moment matrix $X'X$ is then singular (non-invertible). Consider the two explanatory variable case

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u.$$

If, say, $X_2 = X_1$, substitution gives

$$Y = \beta_0 + (\beta_1 + \beta_2) X_1 + u,$$

and it is clear that we can obtain an efficient estimate of the sum of the coefficients, but it is impossible to obtain efficient estimates of the individual parameters. That is, the sum $\beta_1 + \beta_2$ is identified but the individual parameters $\beta_1$ and $\beta_2$ are unidentified. There is a close relationship between multicollinearity and the identification problem discussed later in this course; the sketch below illustrates the point numerically.
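The following is a minimal numpy sketch (illustrative data, not from the text): with $X_2 = X_1$, least squares pins down only $\beta_1 + \beta_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(size=50)
X2 = X1.copy()                                  # perfectly collinear with X1
y = 1.0 + 2.0 * X1 + 3.0 * X2 + rng.normal(scale=0.1, size=50)

A = np.column_stack([np.ones(50), X1, X2])      # rank-deficient design
b, *_ = np.linalg.lstsq(A, y, rcond=None)       # minimum-norm solution
print(b[1] + b[2])                              # close to 5 = beta1 + beta2
# b[1] and b[2] separately are arbitrary: any split with the same sum
# yields identical fitted values, so the individual parameters are
# unidentified while their sum is.
```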

If the explanatory variables are not perfectly collinear but are correlated to a certain degree, i.e. $0 < |\rho_{X_i X_j}| < 1$, then the effects of multicollinearity are uncertain. Multicollinearity can cause a change in the sign of a parameter estimate, resulting in erroneous interpretation of the results.

1.3 Tests for multicollinearity

Although there is no clear-cut test or indicator of multicollinearity, a combination of the following procedures may help detect it.

1. Inspection of regression results, in particular examination of:

(i) standard errors of parameter estimates,

(ii) high partial correlations,

(iii) a high $R^2$-statistic,

(iv) low $t$-statistic values,

(v) sensitivity of parameter estimates; for example, if a few observations are dropped and re-estimation of the model yields significantly different parameter estimates, this could indicate the presence of multicollinearity.

If most or all of these events occur, there is good reason to suspect multicollinearity, and should all the partial correlations be available, it may be possible to get some indication of the nature of the multicollinearity problem through a study of their patterns.

2. Frisch's Confluence Analysis: This is a somewhat more systematic implementation of the procedure of examining regression results indicated above. In this procedure an explanatory variable is classified as useful, superfluous or detrimental. The procedure is to

(a) regress the dependent variable on each one of the explanatory variables separately and examine the regression results,

(b) choose the elementary regression which appears to give the most plausible results on both criteria used, and then gradually add variables and examine their effects on the individual coefficients, standard errors, $R^2$-statistic, Durbin-Watson statistic, etc.

A new variable is classified as useful, superfluous or detrimental as follows (a code sketch of the procedure is given after this list):

(i) If the new variable improves $R^2$ and does not affect the values of the individual coefficients, it is considered useful and is retained as an explanatory variable.

(ii) If the new variable does not improve $R^2$ and does not affect the values of the individual coefficients, it is considered superfluous and is rejected.

(iii) If the new variable considerably affects the signs or values of the coefficients, it is considered detrimental.
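A minimal sketch of the mechanics in Python, assuming statsmodels is available (the name frisch_confluence and the ordering rule, highest elementary $R^2$ first, are illustrative simplifications; the text's choice uses both $R^2$ and the Durbin-Watson statistic, and the final useful/superfluous/detrimental verdict remains the analyst's):

```python
import numpy as np
import statsmodels.api as sm

def frisch_confluence(y, X, names):
    """Print the diagnostics inspected in Frisch's confluence analysis."""
    k = X.shape[1]
    # Step (a): elementary regressions of y on each regressor alone.
    fits = [sm.OLS(y, sm.add_constant(X[:, [j]])).fit() for j in range(k)]
    for j, f in enumerate(fits):
        print(f"{names[j]} alone: R^2 = {f.rsquared:.3f}")
    # Step (b): start from the best elementary regression, then add the
    # remaining regressors one at a time and watch R^2 and the coefficients.
    order = list(np.argsort([f.rsquared for f in fits])[::-1])
    kept = [order.pop(0)]
    for j in order:
        before = sm.OLS(y, sm.add_constant(X[:, kept])).fit()
        after = sm.OLS(y, sm.add_constant(X[:, kept + [j]])).fit()
        print(f"adding {names[j]}: R^2 {before.rsquared:.3f} -> {after.rsquared:.3f}")
        print("  coefficients before:", np.round(before.params, 3))
        print("  coefficients after: ", np.round(after.params, 3))
        kept.append(j)   # whether to really keep it is the analyst's judgement
```

Applied to data such as that of Example 1.1 below, the printed diagnostics correspond to the stepwise tables used in the solution.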

3. Spectral decomposition of the moment matrix $X'X$: The inverse of the moment matrix $X'X$ has the spectral decomposition

$$(X'X)^{-1} = V\Lambda^{-1}V' = \frac{v_1 v_1'}{\lambda_1} + \frac{v_2 v_2'}{\lambda_2} + \cdots + \frac{v_p v_p'}{\lambda_p},$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ is the diagonal matrix of eigenvalues of $X'X$ and $V = [v_1, v_2, \ldots, v_p]$ is the corresponding matrix of eigenvectors of $X'X$. Thus, under the assumption of spherical disturbance terms, the covariance matrix of $\hat{\beta}$ is

$$\operatorname{cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1} = \sigma^2 \left( \frac{v_1 v_1'}{\lambda_1} + \frac{v_2 v_2'}{\lambda_2} + \cdots + \frac{v_p v_p'}{\lambda_p} \right).$$

The variance of each parameter estimate is then given by

$$\operatorname{var}(\hat{\beta}_j) = \sigma^2 \left( \frac{v_{1j}^2}{\lambda_1} + \frac{v_{2j}^2}{\lambda_2} + \cdots + \frac{v_{pj}^2}{\lambda_p} \right).$$

If the smallest characteristic root (eigenvalue) is very small compared to the largest eigenvalue, there is an indication of linear dependency among the explanatory variables. As a rule of thumb, if the condition number, defined by

$$\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}},$$

is greater than 20, then there is good reason to suspect that multicollinearity is present.
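A minimal numpy sketch of this diagnostic (condition_number is a hypothetical helper; X holds the regressors as columns):

```python
import numpy as np

def condition_number(X):
    """kappa = sqrt(lambda_max / lambda_min) of the moment matrix X'X."""
    eig = np.linalg.eigvalsh(X.T @ X)   # eigenvalues of the symmetric X'X
    return np.sqrt(eig.max() / eig.min())

# Equivalently, np.linalg.cond(X) gives the same number, since the singular
# values of X are the square roots of the eigenvalues of X'X.
# Rule of thumb from the text: a value above 20 suggests multicollinearity.
```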

Example 1.1 The following data show expenditure on clothing ($Y$), disposable income ($X_1$), liquid assets ($X_2$), a price index for clothing items ($X_3$) and a general price index ($X_4$).

expenditure on   disposable      liquid         price index for   general price
clothing (Y)     income (X1)     assets (X2)    clothing (X3)     index (X4)

8.4              82.9            17.1           92                94
9.6              88.0            21.3           93                96
10.4             99.9            25.1           96                97
11.4             105.3           29.0           94                97
12.2             117.7           34.0           100               100
14.2             131.0           40.0           101               101
15.8             148.2           44.0           105               104
17.9             161.8           49.0           112               109
19.3             174.2           51.0           112               111
20.8             184.7           53.0           112               111

(a) Assuming the usual GLM assumptions hold, conduct an overall F-test for the significance of the model with all the explanatory variables included.

(b) Compute the correlation matrix for the explanatory variables and comment on the results.

(c) Use Frisch's confluence analysis to assess the effects, if any, of multicollinearity in a regression of clothing expenditure on the supposed-to-be explanatory variables.

Solution 1.1

(a) The F-test for the overall significance of the general linear model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + u$$

has statistic $F = 15.6$ with $(4, 5)$ degrees of freedom. Since $F_{4,5,0.05} = 5.19$, the test is significant and we conclude that there is a significant relationship between expenditure on clothing and the economic variables suggested as explanatory variables.
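As a check, the overall F-test can be reproduced with statsmodels (a sketch using the tabulated data; small discrepancies from rounding are possible):

```python
import numpy as np
import statsmodels.api as sm

Y  = [8.4, 9.6, 10.4, 11.4, 12.2, 14.2, 15.8, 17.9, 19.3, 20.8]
X1 = [82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7]
X2 = [17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0]
X3 = [92, 93, 96, 94, 100, 101, 105, 112, 112, 112]
X4 = [94, 96, 97, 97, 100, 101, 104, 109, 111, 111]

fit = sm.OLS(Y, sm.add_constant(np.column_stack([X1, X2, X3, X4]))).fit()
print(fit.fvalue, (fit.df_model, fit.df_resid))  # F-statistic with (4, 5) d.f.
```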

(b) The correlation matrix for the explanatory variables $X_1$, $X_2$, $X_3$ and $X_4$, excluding the constant term, is

$$R = \begin{pmatrix}
1.000 & 0.993 & 0.980 & 0.987 \\
0.993 & 1.000 & 0.964 & 0.973 \\
0.980 & 0.964 & 1.000 & 0.991 \\
0.987 & 0.973 & 0.991 & 1.000
\end{pmatrix}.$$

From the correlation matrix we deduce that the explanatory variables appear to be multicollinear, as shown by the high sample correlation coefficients.
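The matrix can be verified in a line of numpy (same data lists as in the sketch under part (a)):

```python
import numpy as np

X1 = [82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7]
X2 = [17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0]
X3 = [92, 93, 96, 94, 100, 101, 105, 112, 112, 112]
X4 = [94, 96, 97, 97, 100, 101, 104, 109, 111, 111]

# Rows are variables, so this returns the 4 x 4 matrix R directly.
print(np.round(np.corrcoef([X1, X2, X3, X4]), 3))
```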

(c) To explore the effects of multicollinearity we compute the elementary (i.e. simple) linear regressions as follows.

regressor   intercept (s.e.)    slope (s.e.)     R^2     Durbin-Watson
X1          -1.240 (0.370)      0.118 (0.002)    0.995   2.6
X2          -38.51 (4.200)      0.516 (0.040)    0.951   2.4
X3          2.110 (0.810)       0.327 (0.020)    0.967   0.4
X4          -53.650 (3.630)     0.663 (0.030)    0.977   2.1

Since the first elementary regression has the highest $R^2$-statistic and its Durbin-Watson statistic is close to 2 (slightly above 2), we conclude that $X_1$ is the most important explanatory variable. The remaining explanatory variables are now brought in one by one, each time making a careful inspection of the relevant regression statistics. The standard errors are printed below the corresponding estimates. The $t$-statistics (not shown) can easily be obtained by dividing each estimate by its corresponding standard error.

regressors            β0       β1      β2      β3      β4     R^2     DW
1, X1                 -1.24    .118                           0.995   2.6
                      (.37)    (.002)
1, X1, X2             1.40     .126    -.036                  0.996   2.5
                      (4.92)   (.010)  (.070)
1, X1, X2, X3         0.94     .138    -.034    -.037         0.996   3.1
                      (5.17)   (.020)  (.060)   (.050)
1, X1, X2, X3, X4     -13.53   .097    -.199    .015    .34   0.998   3.4
                      (7.50)   (.030)  (.090)   (.050)  (.15)

Examination of the standard errors, t-ratios, $R^2$-statistics and Durbin-Watson statistics suggests that the explanatory variables $X_2$, $X_3$ and $X_4$ are superfluous. A parsimonious and still satisfactory model is $Y = \beta_0 + \beta_1 X_1 + u$, estimated as $\hat{Y} = -1.24 + 0.118 X_1$.

It is important to observe that several satisfactory models may be arrived at by fitting several candidate models, computing partial correlations, etc.

In the next section we examine the question of what to do about detected multicollinearity.

1.4 Solutions for multicollinearity

Solutions to the multicollinearity problem vary and in general depend on the severity of the problem, the availability of data (i.e. larger samples), the importance of the explanatory variables which appear to be collinear, and the purpose for which the model is needed. In view of the foregoing discussion, some possible solutions to the problem of multicollinearity are:

1. Use of a different regressor: This approach is only valid as long as misspecification is avoided (otherwise the estimates will be biased).

2. Increasing the sample size: The logic of this procedure lies in the least squares estimator's ability to yield estimates that converge to their true values as the sample size increases to infinity.

3. Principal components analysis: This procedure uses a sub-space of the sample information and thus reduces the dimension of the information set, by excluding all but the most important components from the estimation process. This procedure, and others such as ridge regression, are not described in full here but can be found, if needed, in the references cited in the bibliography; they are mentioned for the interested reader. They too have their merits and demerits: for example, one limitation of the principal components method is the interpretation, if any, of the parameters in the model with transformed variables. The question of what to do if multicollinearity is encountered is addressed, at least partially, in the concluding remarks below. A minimal code sketch of principal components regression follows.
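Purely as an illustration, since the text defers the details to the references, a minimal principal components regression sketch might look as follows (pcr_fit is a hypothetical helper; regressors are standardised and components are selected by eigenvalue size, with the constant excluded from X):

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the leading principal components of the regressors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise columns
    eigval, V = np.linalg.eigh(Z.T @ Z)                # spectral decomposition
    lead = np.argsort(eigval)[::-1][:n_components]     # largest eigenvalues
    P = Z @ V[:, lead]                                 # component scores
    A = np.column_stack([np.ones(len(y)), P])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)      # OLS on the components
    # Coefficients on the standardised regressors can be recovered as
    # V[:, lead] @ gamma[1:]; interpreting them is the limitation noted above.
    return gamma, V[:, lead]
```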

Concluding remarks: what to do about multicollinearity

Quite a few multicollinearity remedies have been listed and discussed above. None of them has turned out to be entirely satisfactory. Some remedies produce variance gains at the cost of increased bias, while yielding non-interpretable estimates. The fact is that we know a lot about multicollinearity and the way various estimators are affected by it; however, so far there is no certain cure for the damage it produces. This is unfortunate, because multicollinearity is a very common regression problem whose impacts can be detected while, all too frequently, little or nothing can effectively be done about them. Some authors have suggested that if multicollinearity does not seriously affect the estimates, one may tolerate it and proceed as though there were no multicollinearity. This implies, for example, that forecasts can be generated in the usual manner. Of course, an obvious alternative is to exclude some collinear explanatory variables. The danger with this method is that the resulting error terms may be auto-correlated or heteroscedastic.

1.5 Summary of the Unit

In this Unit we have learnt that in practice the assumption of a non-singular moment matrix $X'X$ may be violated for a number of reasons. Common causes of multicollinearity include the inclusion of highly correlated explanatory variables, administrative interference, data treatment, etc. The undesirable consequences of multicollinearity include biased variance estimation, and hence inefficient parameter estimation, and low forecasting power of the resulting model. Multicollinearity can be checked or tested for formally by performing correlation analysis of the explanatory variables or Frisch's confluence analysis. Proper estimation and inference which take into account the effects of multicollinearity can be achieved by employing methods such as Principal Components Analysis. Further, multicollinearity and other violations of the GLM assumptions, such as auto-correlation and heteroscedasticity, can occur simultaneously. In addition to the methods commonly used to address the problem of multicollinearity, other slightly more specialised treatments include restricted estimation and/or the introduction of additional equations. We discuss restricted estimation in the next chapter and simultaneous equations in subsequent chapters.

Activity 1.1

1. Explain briefly why, in a linear model $Y = X'\beta + u$ with deterministic explanatory variables, it is not meaningful to speak of correlation among the explanatory variables but it is meaningful to speak of a non-singular moment matrix.

2. Consider the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$ where the explanatory variables are orthogonal. Show that

(a) the $\beta$'s from the multiple regression are identical to the coefficient estimates obtained by simple regressions of $Y$ on $X_1$ and $X_2$ respectively,

(b) the residual sum of squares is the sum of the residual sums of squares from the two simple regressions.

3. The following data show stock price $Y$, short term interest rate $X_1$ and long term interest rate $X_2$.

Y X1 X2
9 9 7
5 4 2
8 8 4
6 7 3
8 8 4
5 4 2
9 9 7
6 7 3

(a) Carry out Frisch's confluence analysis. State your conclusions clearly.

(b) Would you drop an explanatory variable if it were found to cause multicollinearity?

4. The following data show values of five variables $Y$, $X_1$, $X_2$, $X_3$ and $X_4$.



Y     X1     X2     X3    X4

6.0   40.1   5.5    108   63
6.0   40.3   4.7    94    72
6.5   47.5   5.2    108   86
7.1   49.2   6.8    100   100
7.2   52.3   7.3    99    107
7.6   58.0   8.7    99    111
8.0   61.3   10.2   101   114
9.0   62.5   14.1   97    116
9.0   64.7   17.1   93    119
9.3   66.8   21.3   102   121

(a) Test for multicollinearity using the spectral decomposition.

(b) Carry out Frisch's confluence analysis. State your conclusions clearly.

(c) Would you drop an explanatory variable if it were found to cause multicollinearity?

5. The following data show production output $P$, labour input $L$ and capital input $K$ for 20 firms.

P L K

82 15 90
73 39 40
58 99 20
68 12 60
98 42 60
83 95 30
100 45 60
110 36 80
120 40 80
95 65 40
115 30 80
64 60 30
140 100 60
85 95 40
56 75 20
150 90 90
65 25 30
36 80 10
57 12 40
50 65 20

(i) Obtain estimates of a Cobb-Douglas production function using observations 1-15.

(ii) Explore the pattern of multicollinearity and its effects on the estimates.

(iii) Test the hypothesis that the estimates are sensitive to sample size, utilising
the additional information of observations 16-20.
(iv) Comment on your results.

6. A linear model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + u$ was fitted to sample data consisting of 10 observations. The residual sum of squares, the eigenvalues of the moment matrix and the matrix of corresponding eigenvectors were found to be $SSE = 0.12$, $(\lambda_1, \ldots, \lambda_5) = (398380, 4048, 31, 4, 0.001)$ and

$$V = \begin{pmatrix}
0.0049 & 0.0080 & 0.0015 & 0.0151 & 0.9998 \\
0.6676 & -0.6554 & -0.3527 & -0.0184 & 0.0027 \\
0.1896 & -0.3062 & 0.9193 & 0.1591 & -0.0022 \\
0.5090 & 0.4483 & 0.1681 & -0.7154 & 0.0044 \\
0.5093 & 0.5252 & -0.0478 & 0.6799 & -0.0169
\end{pmatrix}.$$

(a) Find the correlation matrix for the explanatory variables. State, with reasons, whether or not multicollinearity may be present.

(b) Find the estimate of the standard error of each parameter estimate.

References

1. Christ, C.F. (1966), Econometric Models and Methods, John Wiley, New York.

2. Koutsoyiannis, A. (1991), Theory of Econometrics: An Introductory Exposition of Econometric Methods, Macmillan, Hong Kong.

3. Matintike, G. (1997), Commerce Vol. 2, College Press, Harare.

4. Stanlake, G.F. (1980), Introductory Economics, Longman, Harare.

5. Statistical Year Book (1987), Central Statistical Office (CSO), Harare.
