MULTICOLLINEARITY
Y = 2 + 3X2 + X3
X3 = 2X2 − 1

  X2   X3    Y
  10   19   51
  11   21   56
  12   23   61
  13   25   66
  14   27   71
  15   29   76
Suppose that Y = 2 + 3X2 + X3 and that X3 = 2X2 – 1. There is no disturbance term in the
equation for Y, but that is not important. Suppose that we have the six observations shown.
1
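As a quick numerical check (added here; not part of the original slides), the short Python sketch below reproduces the six observations and confirms that X2 and X3 are perfectly correlated, so X3 carries no independent variation.

```python
import numpy as np

# Reproduce the six observations from the slide.
X2 = np.array([10, 11, 12, 13, 14, 15], dtype=float)
X3 = 2 * X2 - 1          # exact linear relationship X3 = 2X2 - 1
Y = 2 + 3 * X2 + X3      # true relationship; no disturbance term

print(np.column_stack([X2, X3, Y]))
# The correlation between X2 and X3 is exactly 1: perfect multicollinearity.
print(np.corrcoef(X2, X3)[0, 1])
```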
MULTICOLLINEARITY
[Figure: Y, X3, and X2 plotted as line graphs against the observation number (1 to 6); the vertical axis runs from 0 to 80.]
The three variables are plotted as line graphs above. Looking at the data, it is impossible to
tell whether the changes in Y are caused by changes in X2, by changes in X3, or jointly by
changes in both X2 and X3.
2
MULTICOLLINEARITY
Y = 2 + 3X2 + X3
X3 = 2X2 − 1

                          change from previous observation
  X2   X3    Y             X2   X3    Y
  10   19   51
  11   21   56              1    2    5
  12   23   61              1    2    5
  13   25   66              1    2    5
  14   27   71              1    2    5
  15   29   76              1    2    5
3
MULTICOLLINEARITY
[Figure: the same line graphs of Y, X3, and X2, annotated with the candidate relationship Y = 1 + 5X2 ?]
4
MULTICOLLINEARITY
Y = 2 + 3X2 + X3
X3 = 2X2 − 1

[Table of observations and changes repeated from slide 3.]
5
MULTICOLLINEARITY
[Figure: the same line graphs of Y, X3, and X2, annotated with the candidate relationship Y = 3.5 + 2.5X3 ?]
6
MULTICOLLINEARITY
[Figure: the same line graphs of Y, X3, and X2.]
These two possibilities are special cases of Y = 3.5 − 2.5p + 5pX2 + 2.5(1 − p)X3, which fits the data exactly for any value of p.
7
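The claim is easy to verify: substituting X3 = 2X2 − 1 into the family shows that every value of p gives exactly the same line, which is the algebraic reason why the data cannot discriminate between the possibilities.

```latex
\begin{aligned}
Y &= 3.5 - 2.5p + 5pX_2 + 2.5(1-p)X_3 \\
  &= 3.5 - 2.5p + 5pX_2 + 2.5(1-p)(2X_2 - 1) \\
  &= \bigl(3.5 - 2.5p - 2.5(1-p)\bigr) + \bigl(5p + 5(1-p)\bigr)X_2 \\
  &= 1 + 5X_2 \qquad \text{for every value of } p.
\end{aligned}
```

Setting p = 1 gives Y = 1 + 5X2, and p = 0 gives Y = 3.5 + 2.5X3, the two special cases shown earlier.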
MULTICOLLINEARITY
[Figure: the same line graphs of Y, X3, and X2.]
There is no way that regression analysis, or any other technique, could determine the true
relationship from this infinite set of possibilities, given the sample data.
8
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2
What would happen if you tried to run a regression when there is an exact linear
relationship among the explanatory variables?
9
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2
We will investigate, using the model with two explanatory variables shown above. [Note: A
disturbance term has now been included in the true model, but it makes no difference to the
analysis.]
10
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\sum(X_{3i}-\bar{X}_3)^2
  - \sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}
 {\sum(X_{2i}-\bar{X}_2)^2\,\sum(X_{3i}-\bar{X}_3)^2
  - \left(\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}
$$
The expression for the multiple regression coefficient β̂2 is shown above. We will substitute
for X3 using its relationship with X2.
11
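As an illustration (not from the original slides), the sketch below codes this expression directly and checks it against a matrix least squares solve on simulated data with no exact relationship between X2 and X3; the data-generating values are arbitrary and chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 10, n)                 # no exact relationship with X2 here
Y = 1.0 + 2.0 * X2 + 0.5 * X3 + rng.normal(0, 1, n)

def beta2_hat(X2, X3, Y):
    """OLS slope for X2 in a regression of Y on X2 and X3 (with intercept)."""
    x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()
    num = (x2 @ y) * (x3 @ x3) - (x3 @ y) * (x2 @ x3)
    den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
    return num / den

# Compare with the coefficient from a standard least squares solve.
X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
print(beta2_hat(X2, X3, Y), b[1])          # the two estimates coincide
```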
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\sum(X_{3i}-\bar{X}_3)^2
  - \sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}
 {\sum(X_{2i}-\bar{X}_2)^2\,\sum(X_{3i}-\bar{X}_3)^2
  - \left(\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}
$$

$$
\sum(X_{3i}-\bar{X}_3)^2
  = \sum\bigl([\lambda+\mu X_{2i}]-[\lambda+\mu\bar{X}_2]\bigr)^2
  = \sum(\mu X_{2i}-\mu\bar{X}_2)^2
  = \mu^2\sum(X_{2i}-\bar{X}_2)^2
$$
12
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}
 {\sum(X_{2i}-\bar{X}_2)^2\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \left(\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}
$$

$$
\sum(X_{3i}-\bar{X}_3)^2
  = \sum\bigl([\lambda+\mu X_{2i}]-[\lambda+\mu\bar{X}_2]\bigr)^2
  = \sum(\mu X_{2i}-\mu\bar{X}_2)^2
  = \mu^2\sum(X_{2i}-\bar{X}_2)^2
$$
13
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}
 {\sum(X_{2i}-\bar{X}_2)^2\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \left(\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}
$$

$$
\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)
  = \sum(X_{2i}-\bar{X}_2)\bigl([\lambda+\mu X_{2i}]-[\lambda+\mu\bar{X}_2]\bigr)
  = \sum(X_{2i}-\bar{X}_2)(\mu X_{2i}-\mu\bar{X}_2)
  = \mu\sum(X_{2i}-\bar{X}_2)^2
$$
14
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\mu\sum(X_{2i}-\bar{X}_2)^2}
 {\sum(X_{2i}-\bar{X}_2)^2\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \left(\mu\sum(X_{2i}-\bar{X}_2)^2\right)^2}
$$

$$
\sum(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)
  = \sum(X_{2i}-\bar{X}_2)\bigl([\lambda+\mu X_{2i}]-[\lambda+\mu\bar{X}_2]\bigr)
  = \sum(X_{2i}-\bar{X}_2)(\mu X_{2i}-\mu\bar{X}_2)
  = \mu\sum(X_{2i}-\bar{X}_2)^2
$$
15
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\mu\sum(X_{2i}-\bar{X}_2)^2}
 {\sum(X_{2i}-\bar{X}_2)^2\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \left(\mu\sum(X_{2i}-\bar{X}_2)^2\right)^2}
$$

$$
\sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})
  = \sum\bigl([\lambda+\mu X_{2i}]-[\lambda+\mu\bar{X}_2]\bigr)(Y_i-\bar{Y})
  = \mu\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})
$$
16
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \mu\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu\sum(X_{2i}-\bar{X}_2)^2}
 {\sum(X_{2i}-\bar{X}_2)^2\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \left(\mu\sum(X_{2i}-\bar{X}_2)^2\right)^2}
$$

$$
\sum(X_{3i}-\bar{X}_3)(Y_i-\bar{Y}) = \mu\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})
$$
17
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

$$
\hat{\beta}_2 = \frac{\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \mu\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\mu\sum(X_{2i}-\bar{X}_2)^2}
 {\sum(X_{2i}-\bar{X}_2)^2\,\mu^2\sum(X_{2i}-\bar{X}_2)^2
  - \left(\mu\sum(X_{2i}-\bar{X}_2)^2\right)^2}
  = \frac{0}{0}
$$
It turns out that the numerator and the denominator are both equal to zero. The regression
coefficient is not defined.
18
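The same conclusion can be reached numerically. In the sketch below (an added illustration, with arbitrary values for λ and μ), X3 is constructed as an exact linear function of X2; the numerator and denominator of the expression for β̂2 are both zero apart from rounding error, and the design matrix has rank 2 rather than 3, so X'X cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
lam, mu = -1.0, 2.0                        # arbitrary lambda and mu
X2 = rng.uniform(0, 10, n)
X3 = lam + mu * X2                         # exact linear relationship
Y = 2.0 + 3.0 * X2 + X3 + rng.normal(0, 1, n)

x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()
num = (x2 @ y) * (x3 @ x3) - (x3 @ y) * (x2 @ x3)
den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
print(num, den)          # both ~0 relative to the size of the individual terms

# The design matrix has rank 2 instead of 3: X'X is singular.
X = np.column_stack([np.ones(n), X2, X3])
print(np.linalg.matrix_rank(X))            # -> 2
```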
MULTICOLLINEARITY
Y = β1 + β2X2 + β3X3 + u,   X3 = λ + μX2

[Formula from the previous slide repeated: β̂2 = 0/0.]
However, it often happens that there is an approximate relationship. We will use the wage
equation as an illustration.
20
MULTICOLLINEARITY
When relating earnings to schooling and work experience, it is often reasonable to suppose
that the effect of work experience is subject to diminishing returns.
21
MULTICOLLINEARITY
A standard way of allowing for this is to include EXPSQ, the square of EXP, in the
specification. According to the hypothesis of diminishing returns, the coefficient of EXPSQ
should be negative.
22
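For concreteness, a specification of this kind might be set up as in the sketch below. This is only an illustration: the data frame is a simulated stand-in for an earnings data set (the slides' actual data and estimates are not reproduced), the variable name EARNINGS is a placeholder for hourly earnings, and statsmodels is just one convenient way of running the two regressions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for an earnings data set: S = years of schooling,
# EXP = years of work experience. All parameter values are made up.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "S": rng.integers(8, 21, n),
    "EXP": rng.uniform(0, 40, n),
})
df["EXPSQ"] = df["EXP"] ** 2   # squared experience, to allow diminishing returns
df["EARNINGS"] = (-10 + 1.9 * df["S"] + 1.0 * df["EXP"]
                  - 0.015 * df["EXPSQ"] + rng.normal(0, 8, n))

# Specification without and with the quadratic term.
fit1 = smf.ols("EARNINGS ~ S + EXP", data=df).fit()
fit2 = smf.ols("EARNINGS ~ S + EXP + EXPSQ", data=df).fit()
print(fit2.summary())                 # in this simulation, EXPSQ comes out negative
print(df["EXP"].corr(df["EXPSQ"]))    # EXP and EXPSQ are highly correlated
```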
MULTICOLLINEARITY
[Regression output for the earnings equation with S, EXP, and EXPSQ: not reproduced in this extract.]
23
MULTICOLLINEARITY
The schooling component of the regression results is little affected by the inclusion of the
EXPSQ term. The coefficient of S indicates that an extra year of schooling increases hourly
earnings by $1.88. In the specification without EXPSQ it was $1.87.
24
MULTICOLLINEARITY
Likewise, the standard error, 0.22 in the specification without EXPSQ, is also little changed,
and the coefficient remains highly significant.
25
MULTICOLLINEARITY
In the specification without EXPSQ, the coefficient of EXP is significant at the 0.1 percent
level. When EXPSQ is added, it is significant only at the 5 percent level.
26
MULTICOLLINEARITY
This is mostly because the standard error has increased from 0.21 to 0.68, indicating a
substantial loss of precision.
27
MULTICOLLINEARITY
In the original specification, the 95 percent confidence interval for the coefficient of EXP
was from 0.57 to 1.40, which is already quite wide. Now it is from 0.09 to 2.76.
28
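These intervals are consistent with the usual construction, coefficient ± critical value × standard error. Taking the interval midpoints as the implied point estimates and using a large-sample critical value of about 1.96 (an approximation added here, not stated in the slides):

```latex
\begin{aligned}
\text{without } EXPSQ &: \quad 0.985 \pm 1.96 \times 0.21 \;\approx\; (0.57,\ 1.40)\\
\text{with } EXPSQ    &: \quad 1.425 \pm 1.96 \times 0.68 \;\approx\; (0.09,\ 2.76)
\end{aligned}
```

The tripling of the standard error, not a change in the point estimate, is what stretches the interval.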
MULTICOLLINEARITY
The loss of precision is attributable to multicollinearity, the correlation between EXP and
EXPSQ being 0.97. The coefficient of EXPSQ has the anticipated negative sign, but it is not
remotely significant.
29
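The high correlation is not peculiar to this sample: over a typical range of positive values, a variable and its square move together almost linearly. A small check with an illustrative, evenly spread range of experience from 0 to 40 years (not the slide data) gives a correlation close to the figure quoted above.

```python
import numpy as np

# Illustrative experience values, evenly spread from 0 to 40 years.
exp = np.linspace(0, 40, 500)
expsq = exp ** 2

# Correlation between EXP and EXPSQ is about 0.97 for this spread,
# similar to the value reported for the sample in the slides.
print(np.corrcoef(exp, expsq)[0, 1])
```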
Copyright Christopher Dougherty 2016
Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.
2016.10.29