HSTS423 - Unit 5: Multicollinearity
Multicollinearity
Objectives
1.1. CAUSES OF MULTICOLLINEARITY
Recall also that in Econometrics the regressors are often stochastic or random rather
than deterministic. Thus, it is meaningful to talk of correlation among explanatory
variables (see Exercise 1). It follows that multicollinearity is inherent in most
economic variables because of the interrelationships that exist among them.
If two explanatory variables Xi and Xj are perfectly correlated, i.e. ρXiXj = 1, the
normal equations X′Xβ = X′Y become indeterminate, i.e. it becomes impossible to
obtain numerical estimates of the parameters βi, and the least squares method breaks
down since the moment matrix X′X is singular (non-invertible).
On the other hand, if ρXiXj = 0 for all i ≠ j, the explanatory variables are said to be
orthogonal and there is no problem in estimating the parameters βi. In practice,
however, neither of these extreme cases is often met. There is usually some degree of
intercorrelation or interdependence among the explanatory variables. In this case each
ρXiXj satisfies 0 < |ρXiXj| < 1, and the multicollinearity problem may impair the
accuracy and stability of the parameter estimates, although the exact
effects have not yet been theoretically established.
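The breakdown under perfect correlation can be seen numerically. The sketch below (Python with NumPy, using a made-up design matrix in which the second regressor is an exact multiple of the first) shows that the moment matrix X′X then has deficient rank and zero determinant, so the normal equations cannot be solved uniquely:

```python
import numpy as np

# Hypothetical design matrix: the third column is exactly twice the
# second, so the two regressors are perfectly correlated (rho = 1).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones(5), x1, 2.0 * x1])

XtX = X.T @ X                       # the moment matrix X'X
rank = np.linalg.matrix_rank(XtX)   # 2, not 3: rank-deficient
det = np.linalg.det(XtX)            # numerically zero: singular

print(rank)
print(abs(det) < 1e-6)
```

With any degree of near-collinearity short of this, X′X remains invertible but becomes ill-conditioned, which is the situation discussed next.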
When any two explanatory variables change in nearly the same way, it becomes
extremely difficult to establish the influence of each regressor, say Xi, on the depen-
dent variable Y separately. For example, suppose that the expenditure of an individual
depends on income and liquid (i.e. easily disposable) assets. If, over a period of time,
income and liquid assets change in the same proportion, then the influence of one of
these explanatory variables on expenditure may be erroneously attributed to the other.
P = aK^b L^c U.
The variables K (capital input) and L (labour input) will frequently be nearly
perfectly correlated. This situation leads to serious estimation problems due to
the presence of marked multicollinearity, and subsequently to interpretation errors.
The multicollinearity problem is complicated further if we introduce a variable for
entrepreneurship quality and/or an index variable to capture the level of technical
development, as one is likely to find that the new variables are closely correlated
with K.
2. Use of lagged variables: The use of lagged variables such as Yt−1 as regres-
sors is now quite common in Econometrics and has generally given satisfactory
results in many studies. However, using such lagged variables makes multi-
collinearity almost certain to exist in such models. Thus, regressions involving
lagged variables must be treated with caution.
We have already indicated that if two explanatory variables, say Xi and Xj, are perfectly
correlated, i.e. ρXiXj = 1, then the normal equations X′Xβ = X′Y become indetermi-
nate, i.e. it becomes impossible to obtain numerical values for the parameters βi, and
the least squares method breaks down since the moment matrix X′X is then singular
(non-invertible). Consider the two-explanatory-variable case
Y = β0 + β1X1 + β2X2 + u.
If the two regressors are perfectly correlated with X2 = X1, this collapses to
Y = β0 + (β1 + β2)X1 + u,
and it is clear that we can obtain an efficient estimate of the sum of the coefficients;
however, it is impossible to obtain efficient estimates of the individual parameters. That is,
the sum β1 + β2 will be identified, but the individual parameters β1 and β2 will be
unidentified. There is a close relationship between multicollinearity and the identification
problem discussed later in this course.
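This identification point is easy to reproduce. In the sketch below (Python/NumPy, simulated data with X2 = X1 and made-up true coefficients 2 and 3), least squares cannot pin down β1 and β2 separately — `lstsq` merely returns one of infinitely many least-squares solutions — but their sum is estimated close to the true combined value of 5:

```python
import numpy as np

# Simulated data: x2 equals x1 exactly, so only beta1 + beta2 is
# identified (true values: beta1 = 2, beta2 = 3, sum = 5).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1.copy()                      # perfect collinearity
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 0.1 * rng.normal(size=50)

X = np.column_stack([np.ones(50), x1, x2])
# For a rank-deficient X, lstsq returns the minimum-norm solution
# among the infinitely many least-squares solutions.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Individual coefficients are arbitrary, but their sum is pinned down.
print(beta[1] + beta[2])            # close to 5
```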
If the explanatory variables are not perfectly collinear but are correlated to a certain
degree, i.e. 0 < |ρXiXj| < 1, then the effects of multicollinearity are uncertain.
Multicollinearity can cause a change in the sign of a parameter, resulting in erroneous
interpretation of the results.
If most or all of these events occur, there is good reason to suspect multicollin-
earity, and if partial correlations at all levels are available, it may be possible
to get some indication of the nature of the multicollinearity problem through
a study of their patterns.
(a) regress the dependent variable on each one of the explanatory variables
separately and examine the regression results;
(b) choose the elementary regression which appears to give the most plausible
results on both criteria used, and then gradually add variables and examine
their effects on the individual coefficients, standard errors, R2-statistic,
Durbin-Watson statistic, etc.
(i) If the new variable improves R2 and does not affect the values of the individual
coefficients, then it is considered useful and is retained as an explanatory
variable.
(ii) If the new variable does not improve R2 and does not affect the values of the
individual coefficients, then it is considered superfluous and is rejected.
(iii) If the new variable considerably affects the signs or values of the coefficients,
it is considered detrimental.
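These decision rules can be mechanised. The sketch below (Python/NumPy, made-up data in which x2 nearly duplicates x1) compares R2 before and after adding the near-collinear regressor; the negligible improvement corresponds to the "superfluous" case of rule (ii):

```python
import numpy as np

def ols_r2(X, y):
    """R-squared of an OLS fit of y on X (X includes the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Illustrative data (made up): x2 nearly duplicates x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # nearly collinear with x1
y = 2.0 + 1.5 * x1 + 0.5 * rng.normal(size=n)

const = np.ones(n)
r2_x1 = ols_r2(np.column_stack([const, x1]), y)
r2_both = ols_r2(np.column_stack([const, x1, x2]), y)

# Adding the nearly collinear x2 barely improves R^2: rule (ii).
print(r2_both - r2_x1 < 0.01)
```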
If λ1, λ2, ..., λp are the eigenvalues of X′X and vij is the j-th element of the
eigenvector corresponding to λi, then

var(β̂j) = σ²( v1j²/λ1 + v2j²/λ2 + ... + vpj²/λp ),

so a very small eigenvalue inflates var(β̂j). If the condition number √(λmax/λmin)
is greater than 20, then there is good reason to suspect that multicollinearity
is present.
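The eigenvalue form of this variance can be checked directly against the familiar σ²(X′X)⁻¹ expression. The sketch below (Python/NumPy, simulated near-collinear data, with σ² set to 1 purely for illustration) confirms that the two forms agree and that the condition number of a near-collinear design easily exceeds the threshold of 20:

```python
import numpy as np

# Verify var(beta_hat_j) = sigma^2 * sum_i v_ij^2 / lambda_i against
# the diagonal of sigma^2 (X'X)^{-1}, on a near-collinear design.
rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)       # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

sigma2 = 1.0
lam, V = np.linalg.eigh(X.T @ X)          # eigen-decomposition of X'X
var_eig = sigma2 * (V**2 / lam).sum(axis=1)
var_inv = sigma2 * np.diag(np.linalg.inv(X.T @ X))

print(np.allclose(var_eig, var_inv))      # True: the two forms agree
cond = np.sqrt(lam.max() / lam.min())
print(cond > 20)                          # flags marked multicollinearity
```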
Example 1.1 The following data show expenditure on clothing (Y), disposable income (X1),
liquid assets (X2), a price index for clothing items (X3) and a general price index (X4).
(a) Assuming the usual GLM assumptions hold, conduct an overall F-test for the
significance of the model with all the explanatory variables included.
(b) Compute the correlation matrix for the explanatory variables and comment on
the results.
(c) Use Frisch’s confluence analysis to assess the effects, if any, of multicollinearity
in a regression of clothing expenditure on the supposed-to-be explanatory variables.
Solution 1.1
(a) The F-test for the overall significance of the general linear model
Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + u
has statistic F = 15.6 with (4, 5) degrees of freedom. Since F4,5,0.05 = 5.19, the
test is significant and we conclude that there is a significant relationship between
expenditure on clothing and the economic variables suggested as the explanatory
variables.
(b) The correlation matrix for the explanatory variables X1, X2, X3 and X4, exclud-
ing the constant term, is (by symmetry)

        1.000  0.993  0.980  0.987
        0.993  1.000  0.964  0.973
    R = 0.980  0.964  1.000  0.991 .
        0.987  0.973  0.991  1.000
From the correlation matrix we deduce that the explanatory variables appear to
be multicollinear, as shown by the high sample correlation coefficients.
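The kind of check in part (b) is a one-liner in practice. The sketch below (Python/NumPy, with illustrative numbers, not the textbook data) builds three regressors that all track a common trend and inspects the off-diagonal entries of their sample correlation matrix:

```python
import numpy as np

# Illustrative regressors (made up): x2 and x3 both track the trending
# series x1, so all pairwise sample correlations are near 1.
rng = np.random.default_rng(3)
n = 10
x1 = np.linspace(100.0, 200.0, n)            # a trending "income" series
x2 = 0.5 * x1 + rng.normal(size=n)           # moves almost with x1
x3 = 0.8 * x1 + 2.0 * rng.normal(size=n)

R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
off_diag = R[np.triu_indices(3, k=1)]        # the pairwise correlations

print((off_diag > 0.9).all())                # all near unity
```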
(c) To explore the effects of multicolinearity we compute the elementary i.e. simple
linear regressions as follows.
Since the first elementary regression has the highest R2-statistic and its Durbin-
Watson statistic is close to 2 (slightly above 2), we conclude that X1 is the most
important explanatory variable. The remaining explanatory variables are now
brought in one by one, each time making a careful inspection of the relevant
regression statistics. The standard errors are printed below the corresponding
estimates. The t-statistics (not shown) can be obtained easily by dividing each
estimate by the corresponding standard error.
regressors     β0        β1       β2    β3    β4    R2      DW
1, X1         -1.24      0.118                      0.995   2.6
              (0.37)    (0.002)
Examination of the standard errors, t-ratios, R2-statistics and the Durbin-Watson
statistics suggests that the explanatory variables X2, X3 and X4 are superfluous. A
parsimonious and still satisfactory model is given by

Y = β0 + β1X1 + u = −1.24 + 0.118X1 + u.
In the next section we examine the question of what to do about detected multicollinearity.
1.4. SOLUTIONS FOR MULTICOLLINEARITY
Solutions to the multicollinearity problem vary and in general depend on the sever-
ity of the problem, the availability of data (i.e. larger samples), the importance of the
explanatory variables which appear to be collinear, and the purpose for which the
model is needed. In view of the foregoing discussion, some possible solutions to the
problem of multicollinearity are:
2. Increasing the sample size: The logic of this procedure lies in the model's
ability to yield estimates that converge to their true values as the sample size
increases to infinity.
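A small simulation illustrates the point (Python/NumPy; the model, its coefficients and the near-collinear pair of regressors are purely illustrative): even though collinearity inflates the standard errors at every sample size, they still shrink roughly like 1/√n as the sample grows.

```python
import numpy as np

def slope_se(n, seed=0):
    """Standard error of beta1 in y on (1, x1, x2), x2 near-collinear
    with x1 (all numbers illustrative)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)       # near-collinear pair
    y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 3)             # residual variance estimate
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

# A larger sample shrinks the standard error even under collinearity.
print(slope_se(2000) < slope_se(50))
```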
Quite a few multicollinearity remedies have been listed and discussed above. None of
these turns out to be entirely satisfactory. Some remedies generate variance gains at
the cost of increased bias, while producing uninterpretable estimates. The fact is
that we know a lot about multicollinearity and the way various estimators are affected;
however, so far there is no certain cure for the damage produced by multicollinearity.
This is unfortunate because multicollinearity is a very common regression
problem, producing impacts which can be detected while, all too frequently, little or
nothing can be effectively done about them. Some authors have suggested that if multicol-
linearity does not seriously affect the estimates, one may tolerate it and proceed as though
there were no multicollinearity. This implies, for example, that forecasts can be gener-
ated in the usual manner. Of course an obvious alternative is to exclude some collinear
explanatory variables. The danger with this method is that the resulting error terms
may be autocorrelated or heteroscedastic.
In this Unit we have learnt that in practice the assumption of a non-singular moment ma-
trix X′X may be violated for a number of reasons. Common causes of multicollinearity
Activity 1.1
(a) the β's from the multiple regression are identical to the coefficient estimates
obtained by simple regressions of Y on X1 and X2 respectively;
(b) the residual sum of squares is the sum of the residual sums of squares from
the two simple regressions.
3. The following data show the stock price Y, the short-term interest rate X1 and
the long-term interest rate X2.
Y X1 X2
9 9 7
5 4 2
8 8 4
6 7 3
8 8 4
5 4 2
9 9 7
6 7 3
(a) Carry out Frisch’s confluence analysis. State your conclusions clearly.
(b) Would you drop an explanatory variable found to cause multicollinearity?
5. The following data show the production output P, labour input L and capital
input K for 20 firms.
P L K
82 15 90
73 39 40
58 99 20
68 12 60
98 42 60
83 95 30
100 45 60
110 36 80
120 40 80
95 65 40
115 30 80
64 60 30
140 100 60
85 95 40
56 75 20
150 90 90
65 25 30
36 80 10
57 12 40
50 65 20
(iii) Test the hypothesis that the estimates are sensitive to sample size, utilising
the additional information of observations 16-20.
(iv) Comment on your results.
(a) Find the correlation matrix for the explanatory variables. State with reasons
whether or not multicollinearity may be present.
(b) Find the estimate of the standard error of each parameter estimate.