
MULTICOLLINEARITY

Y = 2 + 3X2 + X3
X3 = 2X2 −1

X2 X3 Y

10 19 51
11 21 56
12 23 61
13 25 66
14 27 71
15 29 76

Suppose that Y = 2 + 3X2 + X3 and that X3 = 2X2 – 1. There is no disturbance term in the
equation for Y, but that is not important. Suppose that we have the six observations shown.
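The six observations can be reproduced with a few lines of code (a sketch of ours, not part of the original slides; the variable names are our own):

```python
# Generate the six observations directly from the two relationships
# and confirm the X3 and Y columns of the table above.
X2 = [10, 11, 12, 13, 14, 15]
X3 = [2 * x - 1 for x in X2]                      # X3 = 2*X2 - 1
Y = [2 + 3 * x2 + x3 for x2, x3 in zip(X2, X3)]   # Y = 2 + 3*X2 + X3

print(X3)  # [19, 21, 23, 25, 27, 29]
print(Y)   # [51, 56, 61, 66, 71, 76]
```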


[Figure: Y, X3, and X2 plotted as line graphs against observation number (1–6); vertical scale 0–80.]

The three variables are plotted as line graphs above. Looking at the data, it is impossible to
tell whether the changes in Y are caused by changes in X2, by changes in X3, or jointly by
changes in both X2 and X3.


X2   X3    Y     ΔX2  ΔX3  ΔY   (change from previous observation)

10   19   51      —    —    —
11   21   56      1    2    5
12   23   61      1    2    5
13   25   66      1    2    5
14   27   71      1    2    5
15   29   76      1    2    5

Numerically, Y increases by 5 in each observation. X2 changes by 1.


[Figure: line graphs of Y, X3, and X2, annotated "Y = 1 + 5X2 ?"]

Hence the true relationship could have been Y = 1 + 5X2.



However, it can also be seen that X3 increases by 2 in each observation.


[Figure: line graphs of Y, X3, and X2, annotated "Y = 3.5 + 2.5X3 ?"]

Hence the true relationship could have been Y = 3.5 + 2.5X3.


[Figure: line graphs of Y, X3, and X2, annotated "Y = 3.5 – 2.5p + 5pX2 + 2.5(1 – p)X3"]

These two possibilities are special cases of Y = 3.5 – 2.5p + 5pX2 + 2.5(1 – p)X3, which would
fit the relationship for any value of p.
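This can be checked numerically (a sketch of ours, not from the slides): the family Y = 3.5 – 2.5p + 5pX2 + 2.5(1 – p)X3 fits the six observations exactly for any value of p, not just p = 0 or p = 1.

```python
# Verify that the one-parameter family of relationships fits the data
# exactly for several arbitrary values of p.
X2 = [10, 11, 12, 13, 14, 15]
X3 = [2 * x - 1 for x in X2]
Y = [2 + 3 * x2 + x3 for x2, x3 in zip(X2, X3)]

for p in (-2.0, 0.0, 0.5, 1.0, 7.3):
    fitted = [3.5 - 2.5 * p + 5 * p * x2 + 2.5 * (1 - p) * x3
              for x2, x3 in zip(X2, X3)]
    assert all(abs(f - y) < 1e-9 for f, y in zip(fitted, Y)), p

print("exact fit for every value of p tried")
```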


There is no way that regression analysis, or any other technique, could determine the true
relationship from this infinite set of possibilities, given the sample data.


Y = β1 + β2X2 + β3X3 + u,    X3 = λ + μX2

What would happen if you tried to run a regression when there is an exact linear
relationship among the explanatory variables?


We will investigate, using the model with two explanatory variables shown above. [Note: A
disturbance term has now been included in the true model, but it makes no difference to the
analysis.]
$$\hat{\beta}_2 = \frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\sum (X_{3i}-\bar{X}_3)^2 \;-\; \sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum (X_{2i}-\bar{X}_2)^2 \sum (X_{3i}-\bar{X}_3)^2 \;-\; \left(\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}$$

The expression for the multiple regression coefficient b2 is shown above. We will substitute
for X3 using its relationship with X2.

$$\sum \left(X_{3i}-\bar{X}_3\right)^2 = \sum \left([\lambda+\mu X_{2i}]-[\lambda+\mu \bar{X}_2]\right)^2 = \sum \left(\mu X_{2i}-\mu \bar{X}_2\right)^2 = \mu^2 \sum \left(X_{2i}-\bar{X}_2\right)^2$$

First, we will replace the term $\sum (X_{3i}-\bar{X}_3)^2$ in the expression for $\hat{\beta}_2$, using the result above.

$$\hat{\beta}_2 = \frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2 \sum (X_{2i}-\bar{X}_2)^2 \;-\; \sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum (X_{2i}-\bar{X}_2)^2 \,\mu^2 \sum (X_{2i}-\bar{X}_2)^2 \;-\; \left(\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}$$

We have made the replacement.

$$\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3) = \sum (X_{2i}-\bar{X}_2)\left([\lambda+\mu X_{2i}]-[\lambda+\mu \bar{X}_2]\right) = \sum (X_{2i}-\bar{X}_2)(\mu X_{2i}-\mu \bar{X}_2) = \mu \sum (X_{2i}-\bar{X}_2)^2$$

Next, we replace the term $\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)$ in the same way.
$$\hat{\beta}_2 = \frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2 \sum (X_{2i}-\bar{X}_2)^2 \;-\; \sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\mu \sum (X_{2i}-\bar{X}_2)^2}{\sum (X_{2i}-\bar{X}_2)^2 \,\mu^2 \sum (X_{2i}-\bar{X}_2)^2 \;-\; \left(\mu \sum (X_{2i}-\bar{X}_2)^2\right)^2}$$

We have made the replacement.

$$\sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y}) = \sum \left([\lambda+\mu X_{2i}]-[\lambda+\mu \bar{X}_2]\right)(Y_i-\bar{Y}) = \sum (\mu X_{2i}-\mu \bar{X}_2)(Y_i-\bar{Y}) = \mu \sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})$$

Finally, we replace the term $\sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y})$.

$$\hat{\beta}_2 = \frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2 \sum (X_{2i}-\bar{X}_2)^2 \;-\; \mu \sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu \sum (X_{2i}-\bar{X}_2)^2}{\sum (X_{2i}-\bar{X}_2)^2 \,\mu^2 \sum (X_{2i}-\bar{X}_2)^2 \;-\; \left(\mu \sum (X_{2i}-\bar{X}_2)^2\right)^2}$$

Again, we have made the replacement.

$$\hat{\beta}_2 = \frac{\mu^2 \sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\sum (X_{2i}-\bar{X}_2)^2 \;-\; \mu^2 \sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\sum (X_{2i}-\bar{X}_2)^2}{\mu^2 \left(\sum (X_{2i}-\bar{X}_2)^2\right)^2 \;-\; \mu^2 \left(\sum (X_{2i}-\bar{X}_2)^2\right)^2} = \frac{0}{0}$$

It turns out that the numerator and the denominator are both equal to zero. The regression
coefficient is not defined.
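The 0/0 result can be confirmed numerically on the six observations from the example (a sketch of ours, not from the slides; here λ = –1 and μ = 2):

```python
# With X3 an exact linear function of X2, both the numerator and the
# denominator of the OLS expression for the coefficient of X2 vanish.
X2 = [10, 11, 12, 13, 14, 15]
X3 = [2 * x - 1 for x in X2]
Y = [2 + 3 * x2 + x3 for x2, x3 in zip(X2, X3)]

def dev(v):
    """Deviations from the mean."""
    m = sum(v) / len(v)
    return [x - m for x in v]

d2, d3, dy = dev(X2), dev(X3), dev(Y)
S22 = sum(a * a for a in d2)
S33 = sum(a * a for a in d3)
S23 = sum(a * b for a, b in zip(d2, d3))
S2y = sum(a * b for a, b in zip(d2, dy))
S3y = sum(a * b for a, b in zip(d3, dy))

numerator = S2y * S33 - S3y * S23
denominator = S22 * S33 - S23 ** 2
print(numerator, denominator)  # both 0.0: the coefficient is 0/0, undefined
```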


It is unusual for there to be an exact relationship among the explanatory variables in a regression. When this occurs, it is typically because there is a logical error in the specification.

. reg EARNINGS S EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 35.24
Model | 8735.42401 2 4367.712 Prob > F = 0.0000
Residual | 61593.5422 497 123.930668 R-squared = 0.1242
-----------+------------------------------ Adj R-squared = 0.1207
Total | 70328.9662 499 140.939812 Root MSE = 11.132
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

However, it often happens that there is an approximate relationship. We will use the wage
equation as an illustration.


When relating earnings to schooling and work experience, it is often reasonable to suppose that the effect of work experience is subject to diminishing returns.


A standard way of allowing for this is to include EXPSQ, the square of EXP, in the
specification. According to the hypothesis of diminishing returns, the coefficient of EXPSQ
should be negative.

. reg EARNINGS S EXP EXPSQ


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 3, 496) = 23.63
Model | 8793.741 3 2931.247 Prob > F = 0.0000
Residual | 61535.2252 496 124.062954 R-squared = 0.1250
-----------+------------------------------ Adj R-squared = 0.1197
Total | 70328.9662 499 140.939812 Root MSE = 11.138
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.869284 .2241882 8.34 0.000 1.428809 2.30976
EXP | 1.427853 .6814907 2.10 0.037 .0888882 2.766817
EXPSQ | -.0328379 .047896 -0.69 0.493 -.126942 .0612662
_cons | -15.7658 4.57953 -3.44 0.001 -24.76347 -6.76813
----------------------------------------------------------------------------

We fit this specification using Data Set 21.


. reg EARNINGS S EXP EXPSQ


----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.869284 .2241882 8.34 0.000 1.428809 2.30976
EXP | 1.427853 .6814907 2.10 0.037 .0888882 2.766817
EXPSQ | -.0328379 .047896 -0.69 0.493 -.126942 .0612662
_cons | -15.7658 4.57953 -3.44 0.001 -24.76347 -6.76813
----------------------------------------------------------------------------

. reg EARNINGS S EXP


----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

The schooling component of the regression results is little affected by the inclusion of the EXPSQ term. The coefficient of S indicates that an extra year of schooling increases hourly earnings by $1.87. In the specification without EXPSQ it was $1.88.

Likewise, the standard error, 0.22 in the specification without EXPSQ, is also little changed,
and the coefficient remains highly significant.


In the specification without EXPSQ, the coefficient of EXP is significant at the 0.1 percent
level. When EXPSQ is added, it is significant only at the 5 percent level.


This is mostly because the standard error has increased from 0.21 to 0.68, indicating a
substantial loss of precision.
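The mechanism can be illustrated with a small simulation (ours, using made-up data rather than the slides' Data Set 21): when a near-collinear regressor such as the square of EXP is added, the standard error of the EXP coefficient rises sharply.

```python
import numpy as np

# Simulate an earnings-like regression and compare the standard error of
# the EXP coefficient with and without the near-collinear EXP^2 term.
rng = np.random.default_rng(0)
n = 500
exp_ = rng.uniform(0, 30, n)                 # hypothetical experience values
y = 10 + 1.0 * exp_ + rng.normal(0, 10, n)   # hypothetical earnings

def coef_se(X, y):
    """OLS standard errors from the diagonal of s^2 (X'X)^{-1}."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(s2 * np.diag(xtx_inv))

X_small = np.column_stack([np.ones(n), exp_])
X_big = np.column_stack([np.ones(n), exp_, exp_ ** 2])
se_small = coef_se(X_small, y)[1]  # SE of the EXP coefficient, EXP only
se_big = coef_se(X_big, y)[1]      # SE of the EXP coefficient, with EXP^2
print(se_small < se_big)           # True: adding EXP^2 costs precision
```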


In the original specification, the 95 percent confidence interval for the coefficient of EXP
was from 0.57 to 1.40, which is already loose enough. Now it is from 0.09 to 2.76.

. cor EXP EXPSQ
(obs=500)

             |      EXP    EXPSQ
-------------+------------------
         EXP |   1.0000
       EXPSQ |   0.9677   1.0000

The loss of precision is attributable to multicollinearity, the correlation between EXP and
EXPSQ being 0.97. The coefficient of EXPSQ has the anticipated negative sign, but it is not
remotely significant.
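The near-collinearity of EXP and its square is not peculiar to this data set; a quick simulation (ours, with made-up values) shows the same pattern:

```python
import numpy as np

# For experience-like values, EXP and EXP^2 are very highly correlated,
# which is the source of the multicollinearity discussed above.
rng = np.random.default_rng(1)
exp_ = rng.uniform(0, 30, 500)
r = np.corrcoef(exp_, exp_ ** 2)[0, 1]
print(r > 0.9)  # True: the two regressors are near-collinear
```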
Copyright Christopher Dougherty 2016

These slideshows may be downloaded by anyone, anywhere for personal use.


Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.

The content of this slideshow comes from Section 3.4 of C. Dougherty,


Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.

2016.10.29
