MLRM

Chapter 3 of 'Introduction to Econometrics' by Christopher Dougherty focuses on multiple regression analysis with two explanatory variables. It explains the derivation of regression coefficients using the least squares principle, the calculation of residuals, and the minimization of the sum of squares of residuals (RSS). The chapter also presents a regression output example demonstrating how hourly earnings are influenced by years of schooling and work experience.


Dougherty

Introduction to Econometrics,
5th edition
Chapter 3: Multiple Regression Analysis

© Christopher Dougherty, 2016. All rights reserved.


MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$

The regression coefficients are derived using the same least squares principle used in
simple regression analysis. The fitted value of Y in observation i depends on our choice of
b1, b2, and b3.
11
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}$

The residual $\hat{u}_i$ in observation i is the difference between the actual and fitted values of Y.

12
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}$

$\text{RSS} = \sum \hat{u}_i^2 = \sum \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right)^2$

We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to
minimize it.
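
As a quick numerical illustration, RSS can be computed for any trial set of coefficient values. The following Stata sketch is an illustration, not part of the original slides: it assumes Data Set 21 is loaded, uses the variable names EARNINGS, S, and EXP introduced later in this sequence, and the trial values are arbitrary.

* compute RSS for arbitrary trial values of b1, b2, b3
scalar b1 = -14
scalar b2 = 1.9
scalar b3 = 1.0
* residual for each observation at the trial values
generate double utrial = EARNINGS - b1 - b2*S - b3*EXP
* square the residuals and sum over the sample
generate double utrial2 = utrial^2
quietly summarize utrial2
display "RSS at trial values = " r(sum)

Any choice of values other than the least squares estimates produces a larger value of this sum.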

13
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}$

$\text{RSS} = \sum \hat{u}_i^2 = \sum \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right)^2$

$= \sum \left( Y_i^2 + b_1^2 + b_2^2 X_{2i}^2 + b_3^2 X_{3i}^2 - 2b_1 Y_i - 2b_2 X_{2i}Y_i - 2b_3 X_{3i}Y_i + 2b_1 b_2 X_{2i} + 2b_1 b_3 X_{3i} + 2b_2 b_3 X_{2i}X_{3i} \right)$

$= \sum Y_i^2 + nb_1^2 + b_2^2 \sum X_{2i}^2 + b_3^2 \sum X_{3i}^2 - 2b_1 \sum Y_i - 2b_2 \sum X_{2i}Y_i - 2b_3 \sum X_{3i}Y_i + 2b_1 b_2 \sum X_{2i} + 2b_1 b_3 \sum X_{3i} + 2b_2 b_3 \sum X_{2i}X_{3i}$

First, we expand RSS as shown.

14
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}$

$\text{RSS} = \sum Y_i^2 + nb_1^2 + b_2^2 \sum X_{2i}^2 + b_3^2 \sum X_{3i}^2 - 2b_1 \sum Y_i - 2b_2 \sum X_{2i}Y_i - 2b_3 \sum X_{3i}Y_i + 2b_1 b_2 \sum X_{2i} + 2b_1 b_3 \sum X_{3i} + 2b_2 b_3 \sum X_{2i}X_{3i}$

$\dfrac{\partial\,\text{RSS}}{\partial b_1} = 0 \qquad \dfrac{\partial\,\text{RSS}}{\partial b_2} = 0 \qquad \dfrac{\partial\,\text{RSS}}{\partial b_3} = 0$

Then we use the first order conditions for minimizing it.
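
Writing the three derivatives out explicitly (a routine step, shown here for completeness) gives the normal equations:

$\dfrac{\partial\,\text{RSS}}{\partial b_1} = 2nb_1 - 2\sum Y_i + 2b_2 \sum X_{2i} + 2b_3 \sum X_{3i} = 0$

$\dfrac{\partial\,\text{RSS}}{\partial b_2} = 2b_2 \sum X_{2i}^2 - 2\sum X_{2i}Y_i + 2b_1 \sum X_{2i} + 2b_3 \sum X_{2i}X_{3i} = 0$

$\dfrac{\partial\,\text{RSS}}{\partial b_3} = 2b_3 \sum X_{3i}^2 - 2\sum X_{3i}Y_i + 2b_1 \sum X_{3i} + 2b_2 \sum X_{2i}X_{3i} = 0$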

15
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\hat{\beta}_2 = \dfrac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{3i} - \bar{X}_3)^2 - \sum (X_{3i} - \bar{X}_3)(Y_i - \bar{Y}) \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2 \sum (X_{3i} - \bar{X}_3)^2 - \left( \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) \right)^2}$

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$

We thus obtain three equations in three unknowns. Solving these equations, we obtain
expressions for the specific values that satisfy the OLS criterion. (The expression for $\hat{\beta}_3$ is
the same as that for $\hat{\beta}_2$, with the subscripts 2 and 3 interchanged everywhere.)
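
The formula can be checked numerically. The Stata sketch below is an illustration, not part of the original slides: it assumes Data Set 21 is loaded, with Y = EARNINGS, X2 = S, and X3 = EXP, and should reproduce the coefficient of S reported by the regression output later in this sequence.

* means of the three variables
egen double mY = mean(EARNINGS)
egen double m2 = mean(S)
egen double m3 = mean(EXP)
* sums of squares and cross products in deviation form
egen double S2Y = total((S - m2)*(EARNINGS - mY))
egen double S3Y = total((EXP - m3)*(EARNINGS - mY))
egen double S22 = total((S - m2)^2)
egen double S33 = total((EXP - m3)^2)
egen double S23 = total((S - m2)*(EXP - m3))
* evaluate the expression for beta2-hat
display (S2Y[1]*S33[1] - S3Y[1]*S23[1]) / (S22[1]*S33[1] - S23[1]^2)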
16
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\hat{\beta}_2 = \dfrac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{3i} - \bar{X}_3)^2 - \sum (X_{3i} - \bar{X}_3)(Y_i - \bar{Y}) \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2 \sum (X_{3i} - \bar{X}_3)^2 - \left( \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) \right)^2}$

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$

The expression for $\hat{\beta}_1$ is a straightforward extension of the expression for it in simple
regression analysis.

17
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\hat{\beta}_2 = \dfrac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{3i} - \bar{X}_3)^2 - \sum (X_{3i} - \bar{X}_3)(Y_i - \bar{Y}) \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2 \sum (X_{3i} - \bar{X}_3)^2 - \left( \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) \right)^2}$

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$

However, the expressions for the slope coefficients are considerably more complex than
that for the slope coefficient in simple regression analysis.

18
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\hat{\beta}_2 = \dfrac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{3i} - \bar{X}_3)^2 - \sum (X_{3i} - \bar{X}_3)(Y_i - \bar{Y}) \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2 \sum (X_{3i} - \bar{X}_3)^2 - \left( \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) \right)^2}$

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$

For the general case when there are many explanatory variables, ordinary algebra is
inadequate. It is necessary to switch to matrix algebra.
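
For reference, the matrix result is compact. Stacking the observations into an $n \times k$ matrix $\mathbf{X}$ (whose first column is a column of ones, for the intercept) and an $n \times 1$ vector $\mathbf{y}$, the least squares coefficients for any number of explanatory variables are

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$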

19
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

. reg EARNINGS S EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 35.24
Model | 8735.42401 2 4367.712 Prob > F = 0.0000
Residual | 61593.5422 497 123.930668 R-squared = 0.1242
-----------+------------------------------ Adj R-squared = 0.1207
Total | 70328.9662 499 140.939812 Root MSE = 11.132
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

$\widehat{\text{EARNINGS}} = -14.67 + 1.88\,S + 0.98\,\text{EXP}$

Here is the regression output for the wage equation using Data Set 21.

20
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

. reg EARNINGS S EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 35.24
Model | 8735.42401 2 4367.712 Prob > F = 0.0000
Residual | 61593.5422 497 123.930668 R-squared = 0.1242
-----------+------------------------------ Adj R-squared = 0.1207
Total | 70328.9662 499 140.939812 Root MSE = 11.132
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

$\widehat{\text{EARNINGS}} = -14.67 + 1.88\,S + 0.98\,\text{EXP}$

It indicates that hourly earnings increase by $1.88 for every extra year of schooling and by
$0.98 for every extra year of work experience.

21
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

. reg EARNINGS S EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 35.24
Model | 8735.42401 2 4367.712 Prob > F = 0.0000
Residual | 61593.5422 497 123.930668 R-squared = 0.1242
-----------+------------------------------ Adj R-squared = 0.1207
Total | 70328.9662 499 140.939812 Root MSE = 11.132
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

$\widehat{\text{EARNINGS}} = -14.67 + 1.88\,S + 0.98\,\text{EXP}$

Literally, the intercept indicates that an individual who had no schooling or work experience
would have hourly earnings of –$14.67.

22
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

. reg EARNINGS S EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 35.24
Model | 8735.42401 2 4367.712 Prob > F = 0.0000
Residual | 61593.5422 497 123.930668 R-squared = 0.1242
-----------+------------------------------ Adj R-squared = 0.1207
Total | 70328.9662 499 140.939812 Root MSE = 11.132
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

$\widehat{\text{EARNINGS}} = -14.67 + 1.88\,S + 0.98\,\text{EXP}$

Obviously, this is impossible. The lowest value of S in the sample was 8. We have obtained
a nonsense estimate because we have extrapolated too far from the data range.
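
A quick calculation, added here for illustration, makes the point. Even at the sample minimum of schooling, with work experience set to zero, the fitted value is barely positive:

$\widehat{\text{EARNINGS}} = -14.67 + 1.88 \times 8 + 0.98 \times 0 = 0.37$

The point S = 0, EXP = 0, at which the intercept is literally interpreted, lies far outside the observed range of the data.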

23
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EARNINGS S EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 35.24
Model | 8735.42401 2 4367.712 Prob > F = 0.0000
Residual | 61593.5422 497 123.930668 R-squared = 0.1242
-----------+------------------------------ Adj R-squared = 0.1207
Total | 70328.9662 499 140.939812 Root MSE = 11.132
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

$\widehat{\text{EARNINGS}} = -14.67 + 1.88\,S + 0.98\,\text{EXP}$

The output above shows the result of regressing EARNINGS, hourly earnings in dollars, on
S, years of schooling, and EXP, years of work experience.

1
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

Suppose that you were particularly interested in the relationship between EARNINGS and S
and wished to represent it graphically, using the sample data.

2
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

A simple plot would be misleading.

3
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. cor S EXP
(obs=500)
        |      S    EXP
--------+------------------
      S |  1.0000
    EXP | -0.5836  1.0000

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed), with fitted regression line]

Schooling is negatively correlated with work experience. The plot fails to take account of
this, and as a consequence the regression line underestimates the impact of schooling on
earnings.
4
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. cor S EXP
(obs=500)
        |      S    EXP
--------+------------------
      S |  1.0000
    EXP | -0.5836  1.0000

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed), with fitted regression line]

We will investigate the distortion mathematically when we come to omitted variable bias.

5
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. cor S EXP
(obs=500)
        |      S    EXP
--------+------------------
      S |  1.0000
    EXP | -0.5836  1.0000

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed), with fitted regression line]

To eliminate the distortion, you purge both EARNINGS and S of their components related to
EXP and then draw a scatter diagram using the purged variables.

6
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EARNINGS EXP


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 0.06
Model | 8.36885807 1 8.36885807 Prob > F = 0.8078
Residual | 70320.5974 498 141.206019 R-squared = 0.0001
-----------+------------------------------ Adj R-squared = -0.0019
Total | 70328.9662 499 140.939812 Root MSE = 11.883
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
EXP | -.0442828 .1818981 -0.24 0.808 -.4016651 .3130996
_cons | 19.86614 1.287089 15.43 0.000 17.33735 22.39494
----------------------------------------------------------------------------

. predict EEARN, resid

We start by regressing EARNINGS on EXP, as shown above. The residuals are the part of
EARNINGS which is not related to EXP. The ‘predict’ command is the Stata command for
saving the residuals from the most recent regression. We name them EEARN.
7
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg S EXP
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 257.18
Model | 1278.43322 1 1278.43322 Prob > F = 0.0000
Residual | 2475.58878 498 4.9710618 R-squared = 0.3406
-----------+------------------------------ Adj R-squared = 0.3392
Total | 3754.022 499 7.52309018 Root MSE = 2.2296
----------------------------------------------------------------------------
S | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
EXP | -.5473191 .0341292 -16.04 0.000 -.6143741 -.4802641
_cons | 18.39324 .241494 76.16 0.000 17.91877 18.86771
----------------------------------------------------------------------------

. predict ES, resid

We do the same with S. We regress it on EXP and save the residuals as ES.

8
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

[Figure: scatter diagram of EEARN against ES, with the fitted trend line (solid)]

Now we plot EEARN on ES and the scatter is a faithful representation of the relationship,
both in terms of the slope of the trend line (the solid line) and in terms of the variation about
that line.
9
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

[Figure: scatter diagram of EEARN against ES, with the fitted trend line (solid) and the trend line from the uncontrolled scatter diagram (dashed)]

As you would expect, the trend line is steeper than that in the scatter diagram which did not
control for EXP (reproduced here as the dashed line).

10
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EEARN ES
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 70.56
Model | 8727.05507 1 8727.05507 Prob > F = 0.0000
Residual | 61593.5414 498 123.68181 R-squared = 0.1241
-----------+------------------------------ Adj R-squared = 0.1223
Total | 70320.5965 499 140.923039 Root MSE = 11.121
----------------------------------------------------------------------------
EEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ES | 1.877563 .2235186 8.40 0.000 1.438408 2.316719
_cons | -1.32e-08 .4973566 -0.00 1.000 -.977176 .977176
----------------------------------------------------------------------------

Here is the regression of EEARN on ES.

11
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EEARN ES
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 70.56
Model | 8727.05507 1 8727.05507 Prob > F = 0.0000
Residual | 61593.5414 498 123.68181 R-squared = 0.1241
-----------+------------------------------ Adj R-squared = 0.1223
Total | 70320.5965 499 140.923039 Root MSE = 11.121
----------------------------------------------------------------------------
EEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ES | 1.877563 .2235186 8.40 0.000 1.438408 2.316719
_cons | -1.32e-08 .4973566 -0.00 1.000 -.977176 .977176
----------------------------------------------------------------------------

From multiple regression:


. reg EARNINGS S EXP
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

A mathematical proof that the technique works requires matrix algebra. We will content
ourselves with verifying that the estimate of the slope coefficient is the same as in the
multiple regression.
12
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EEARN ES
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 70.56
Model | 8727.05507 1 8727.05507 Prob > F = 0.0000
Residual | 61593.5414 498 123.68181 R-squared = 0.1241
-----------+------------------------------ Adj R-squared = 0.1223
Total | 70320.5965 499 140.923039 Root MSE = 11.121
----------------------------------------------------------------------------
EEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ES | 1.877563 .2235186 8.40 0.000 1.438408 2.316719
_cons | -1.32e-08 .4973566 -0.00 1.000 -.977176 .977176
----------------------------------------------------------------------------

From multiple regression:


. reg EARNINGS S EXP
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

Finally, a small and not very important technical point. You may have noticed that the
standard error and t statistic do not quite match. The reason for this is that the number of
degrees of freedom is overstated by 1 in the residuals regression.
13
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EEARN ES
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 70.56
Model | 8727.05507 1 8727.05507 Prob > F = 0.0000
Residual | 61593.5414 498 123.68181 R-squared = 0.1241
-----------+------------------------------ Adj R-squared = 0.1223
Total | 70320.5965 499 140.923039 Root MSE = 11.121
----------------------------------------------------------------------------
EEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ES | 1.877563 .2235186 8.40 0.000 1.438408 2.316719
_cons | -1.32e-08 .4973566 -0.00 1.000 -.977176 .977176
----------------------------------------------------------------------------

From multiple regression:


. reg EARNINGS S EXP
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.877563 .2237434 8.39 0.000 1.437964 2.317163
EXP | .9833436 .2098457 4.69 0.000 .5710495 1.395638
_cons | -14.66833 4.288375 -3.42 0.001 -23.09391 -6.242752
----------------------------------------------------------------------------

That regression has not made allowance for the fact that we have already used up 1 degree
of freedom in removing EXP from the model.
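
The size of the effect is easily checked (a calculation added here for illustration). The residuals regression uses 498 degrees of freedom where 497 would be correct, so rescaling its standard error by $\sqrt{498/497}$ recovers the multiple regression figure exactly:

$0.2235186 \times \sqrt{498/497} = 0.2237434$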

14
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

This sequence investigates the variances and standard errors of the slope coefficients in a
model with two explanatory variables.

1
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

The expression for the variance of $\hat{\beta}_2$ is shown above. The expression for the variance of $\hat{\beta}_3$
is the same, with the subscripts 2 and 3 interchanged.

2
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

The first factor in the expression is identical to that for the variance of the slope coefficient
in a simple regression model.

3
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

The variance of $\hat{\beta}_2$ depends on the variance of the disturbance term, the number of
observations, and the mean square deviation of $X_2$, for exactly the same reasons as in a
simple regression model.
4
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

The difference is that in multiple regression analysis the expression is multiplied by a
factor which depends on the correlation between $X_2$ and $X_3$.

5
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

The higher is the correlation between the explanatory variables, positive or negative, the
greater will be the variance.
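
For the present data, the size of this factor can be worked out from the correlation of –0.5836 between S and EXP reported earlier (an illustrative calculation):

$\dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{1}{1 - (-0.5836)^2} = \dfrac{1}{0.6594} \approx 1.52$

so the variances of the slope coefficients are about 50 percent larger than they would be if S and EXP were uncorrelated.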

6
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

This is easy to understand intuitively. The greater the correlation, the harder it is to
discriminate between the effects of the explanatory variables on Y, and the less accurate
will be the regression estimates.
7
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

Note that the variance expression above is valid only for a model with two explanatory
variables. When there are more than two, the expression becomes much more complex
and it is sensible to switch to matrix algebra.
8
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

$\text{s.d.}(\hat{\beta}_2) = \sqrt{\dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}}}$

The standard deviation of the distribution of $\hat{\beta}_2$ is of course given by the square root of its
variance.

9
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

$\text{s.d.}(\hat{\beta}_2) = \sqrt{\dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}}}$

With the exception of the variance of u, we can calculate the components of the standard
deviation from the sample data.

10
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

$E\!\left[ \dfrac{1}{n} \sum \hat{u}_i^2 \right] = \dfrac{n - k}{n} \, \sigma_u^2$

The variance of u has to be estimated. The mean square of the residuals provides a
consistent estimator, but in a finite sample it is biased downwards by a factor of (n – k) / n,
where k is the number of parameters.
11
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

$E\!\left[ \dfrac{1}{n} \sum \hat{u}_i^2 \right] = \dfrac{n - k}{n} \, \sigma_u^2 \qquad\qquad \hat{\sigma}_u^2 = \dfrac{1}{n - k} \sum \hat{u}_i^2$

Obviously we can obtain an unbiased estimator by dividing the sum of the squares of the
residuals by n – k instead of n. We denote this unbiased estimator $\hat{\sigma}_u^2$.

12
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}} = \dfrac{\sigma_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}$

$E\!\left[ \dfrac{1}{n} \sum \hat{u}_i^2 \right] = \dfrac{n - k}{n} \, \sigma_u^2 \qquad\qquad \hat{\sigma}_u^2 = \dfrac{1}{n - k} \sum \hat{u}_i^2$

$\text{s.e.}(\hat{\beta}_2) = \sqrt{\dfrac{\hat{\sigma}_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}}} = \sqrt{\dfrac{\hat{\sigma}_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}}$

Thus the estimate of the standard deviation of the probability distribution of $\hat{\beta}_2$, known as
the standard error of $\hat{\beta}_2$ for short, is given by the expression above.

13
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

We will use this expression to analyze why the standard error of S is larger for the union
subsample than for the non-union subsample in wage equation regressions using Data Set
21.
14
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

To select a subsample in Stata, you add an ‘if’ statement to a command. The COLLBARG
variable is equal to 1 for respondents whose rates of pay are determined by collective
bargaining, and it is 0 for the others.
15
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

Note that in tests for equality, Stata requires the = sign to be duplicated.

16
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

In the case of the union subsample, the standard error of S is 0.5365.

17
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==0


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 425
-----------+------------------------------ F( 2, 422) = 29.48
Model | 7270.82789 2 3635.41394 Prob > F = 0.0000
Residual | 52043.2371 422 123.325206 R-squared = 0.1226
-----------+------------------------------ Adj R-squared = 0.1184
Total | 59314.065 424 139.891663 Root MSE = 11.105
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.866279 .2438803 7.65 0.000 1.386907 2.34565
EXP | 1.100186 .2223238 4.95 0.000 .6631858 1.537186
_cons | -15.9847 4.623791 -3.46 0.001 -25.07323 -6.896172
----------------------------------------------------------------------------

In the case of the non-union subsample, the standard error of S is 0.2439, less than half as
large.

18
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

$\text{s.e.}(\hat{\beta}_2) = \sqrt{\dfrac{\hat{\sigma}_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \dfrac{1}{1 - r^2_{X_2,X_3}}} = \sqrt{\dfrac{\hat{\sigma}_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}}$

We will explain the difference by looking at the components of the standard error. It is
convenient to start by rearranging the expression for the standard error as the product of
the four factors.
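
The intermediate step, for completeness: substituting $\sum (X_{2i} - \bar{X}_2)^2 = n \, \text{MSD}(X_2)$ and distributing the square root over the product,

$\text{s.e.}(\hat{\beta}_2) = \sqrt{\dfrac{\hat{\sigma}_u^2}{n \, \text{MSD}(X_2)} \times \dfrac{1}{1 - r^2_{X_2,X_3}}} = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$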
19
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union                                                 0.5365
Non-union                                             0.2439

Factor product
Union
Non-union

We will arrange the components of the standard error as a table.

20
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

$\hat{\sigma}_u^2 = \dfrac{1}{n - k} \, \text{RSS}$

We will start with $\hat{\sigma}_u$. Here is RSS for the union subsample.

21
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

$\hat{\sigma}_u^2 = \dfrac{1}{n - k} \, \text{RSS}$

There are 75 observations in the union subsample. k is equal to 3. Thus n – k is equal
to 72.

22
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==1


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 75
-----------+------------------------------ F( 2, 72) = 4.72
Model | 1027.91667 2 513.958336 Prob > F = 0.0119
Residual | 7841.35558 72 108.907716 R-squared = 0.1159
-----------+------------------------------ Adj R-squared = 0.0913
Total | 8869.27225 74 119.85503 Root MSE = 10.436
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.42955 .536452 2.66 0.010 .3601522 2.498947
EXP | .1918676 .5901747 0.33 0.746 -.9846242 1.368359
_cons | 1.01708 10.84695 0.09 0.926 -20.60593 22.64009
----------------------------------------------------------------------------

$\hat{\sigma}_u^2 = \dfrac{1}{n - k} \, \text{RSS}$

RSS / (n – k) is equal to 108.908. To obtain $\hat{\sigma}_u$, we take the square root. This is 10.436.

23
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75                          0.5365
Non-union                                             0.2439

Factor product
Union
Non-union

We place this in the table, along with the number of observations.

24
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. reg EARNINGS S EXP if COLLBARG==0


----------------------------------------------------------------------------
Source | SS df MS Number of obs = 425
-----------+------------------------------ F( 2, 422) = 29.48
Model | 7270.82789 2 3635.41394 Prob > F = 0.0000
Residual | 52043.2371 422 123.325206 R-squared = 0.1226
-----------+------------------------------ Adj R-squared = 0.1184
Total | 59314.065 424 139.891663 Root MSE = 11.105
----------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | 1.866279 .2438803 7.65 0.000 1.386907 2.34565
EXP | 1.100186 .2223238 4.95 0.000 .6631858 1.537186
_cons | -15.9847 4.623791 -3.46 0.001 -25.07323 -6.896172
----------------------------------------------------------------------------

Similarly, in the case of the non-union subsample, $\hat{\sigma}_u$ is the square root of 123.325, which is
11.105. We also note that the number of observations in that subsample is 425.

25
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75                          0.5365
Non-union     11.105     425                          0.2439

Factor product
Union
Non-union

We place these in the table.

26
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932                0.5365
Non-union     11.105     425    7.3467                0.2439

Factor product
Union
Non-union

We calculate the mean square deviation of S for the two subsamples from the sample data.
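
MSD(S) is the sample variance rescaled by (n – 1)/n. A Stata sketch of how the two figures could be obtained (an illustration, not part of the original slides):

* MSD is the variance with divisor n rather than n - 1
quietly summarize S if COLLBARG==1
display "MSD(S), union: " r(Var)*(r(N) - 1)/r(N)
quietly summarize S if COLLBARG==0
display "MSD(S), non-union: " r(Var)*(r(N) - 1)/r(N)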

27
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

. cor S EXP if COLLBARG==1


(obs=75)
| S EXP
--------+------------------
S | 1.0000
EXP | -0.5866 1.0000

. cor S EXP if COLLBARG==0


(obs=425)
| S EXP
--------+------------------
S | 1.0000
EXP | -0.5796 1.0000

The correlation coefficients for S and EXP are –0.5866 and –0.5796 for the union and non-
union subsamples, respectively. (Note that "cor" is the Stata command for computing
correlations.)
28
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union
Non-union

These entries complete the top half of the table. We will now look at the impact of each
item on the standard error, using the mathematical expression at the top.

29
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436
Non-union     11.105

The ˆ u components need no modification. It is a little larger for the non-union subsample,
and so has an adverse effect on the standard error.

30
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436   0.1155
Non-union     11.105   0.0485

The number of observations is much larger for the non-union subsample, so the second
factor is much smaller than that for the union subsample.
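
Explicitly, $1/\sqrt{75} = 0.1155$ and $1/\sqrt{425} = 0.0485$.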

31
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436   0.1155   0.3605
Non-union     11.105   0.0485   0.3689

Perhaps surprisingly, the mean square deviation of schooling is similar for the two subsamples.

32
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436   0.1155   0.3605     1.2348
Non-union     11.105   0.0485   0.3689     1.2271

The correlation between schooling and work experience is also similar for the two
subsamples. Note that the sign of the correlation makes no difference since it is squared.

33
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436   0.1155   0.3605     1.2348     0.5366
Non-union     11.105   0.0485   0.3689     1.2271     0.2439

Multiplying the four factors together, we obtain the standard errors. (The discrepancy in
the last digit of the union standard error has been caused by rounding error.)
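
Written out, the products are $10.436 \times 0.1155 \times 0.3605 \times 1.2348 \approx 0.5366$ for the union subsample and $11.105 \times 0.0485 \times 0.3689 \times 1.2271 \approx 0.2439$ for the non-union subsample, the final digits being sensitive to the rounding of the individual factors.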

34
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436   0.1155   0.3605     1.2348     0.5366
Non-union     11.105   0.0485   0.3689     1.2271     0.2439

We see that the reason that the standard error is smaller for the non-union subsample is
that there are far more observations than in the union subsample. Otherwise the
standard errors would have been about the same.
35
PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

$\text{s.e.}(\hat{\beta}_2) = \hat{\sigma}_u \times \dfrac{1}{\sqrt{n}} \times \dfrac{1}{\sqrt{\text{MSD}(X_2)}} \times \dfrac{1}{\sqrt{1 - r^2_{X_2,X_3}}}$

Decomposition of the standard error of S

Component      σ̂u        n     MSD(S)    r(S,EXP)     s.e.
Union         10.436      75    7.6932    –0.5866     0.5365
Non-union     11.105     425    7.3467    –0.5796     0.2439

Factor product
Union         10.436   0.1155   0.3605     1.2348     0.5366
Non-union     11.105   0.0485   0.3689     1.2271     0.2439

The goodness of fit, as measured by $\hat{\sigma}_u$, is slightly inferior for the non-union subsample, and
this has a marginal offsetting effect. The other two factors are very similar for the two
subsamples.
36
Copyright Christopher Dougherty 2016.

These slideshows may be downloaded by anyone, anywhere for personal use.


Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.

The content of this slideshow comes from Section 3.2 of C. Dougherty,


Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
www.oxfordtextbooks.co.uk/orc/dougherty5e/.

Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.

2016.04.28
