Econometrics 1
Lecture Note 6
Multiple Linear Regression - Estimation
Summary of Key Concepts
Building our Econometric Toolkit
Student Test Scores and Class Size
[Scatter plot: VCE Economics Class Average Test Score (0–100) against Number of Students in the Class (0–35)]
Student Test Scores and Household Income
[Scatter plot: VCE Economics Class Average Test Score (0–100) against Average Household Income ($1000's, 20–40)]
Class Size and Household Income
[Scatter plot: Number of Students in the Class (0–35) against Average Household Income ($1000's, 20–40)]
Econometrically Modelling Test Scores
TestScorei = β0 + β1 ClassSizei + ui
▶ The OLS coefficient will fail to isolate the direct link between
ClassSizei and TestScorei
▶ Why?
▶ Because ↓ Incomei → (↑ ClassSizei , ↓ TestScorei )
▶ So the OLS estimate β̂1 from the single linear regression will
be driven by two forces:
1. (↑ ClassSizei , ↓ TestScorei ), the negative direct relationship we
want to determine empirically
2. ↓ Incomei → (↑ ClassSizei , ↓ TestScorei ), a separate negative
indirect correlation between ClassSizei and TestScorei due to
differences in Incomei across classes
▶ Should we expect the β̂1 estimate to be bigger or smaller than
the population value of β1 ?
▶ Conceptually, when interpreting what our OLS estimate β̂1 means, we can think of it as containing two parts:

β̂1 = β1 (direct) + γ (indirect)
where:
▶ β1 : (↑ ClassSizei , ↓ TestScorei ), the true negative direct class
size – test score relationship we want to determine empirically
▶ γ: ↓ Incomei → (↑ ClassSizei , ↓ TestScorei ), the negative
indirect class size – test score relationship being driven by
differences in income across classes
▶ Given we expect γ < 0, this means that we can expect our
single linear regression estimate to yield β̂1 < β1 , which
means that it gives a biased estimate of the direct class size –
test score relationship
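A minimal numpy simulation makes this decomposition visible. Everything here is an illustrative assumption (the income distribution, coefficients, and noise levels are invented for the sketch, not taken from the lecture's data), but it reproduces the mechanism above: income drives both class size and test scores, so the short regression's β̂1 comes out below the true β1 = −1.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative):
# lower income -> larger classes and lower test scores
income = rng.normal(30, 5, n)                        # $1000's
class_size = 40 - 0.5 * income + rng.normal(0, 2, n)
beta1 = -1.0                                         # true direct effect
test_score = 60 + beta1 * class_size + 1.5 * income + rng.normal(0, 5, n)

# Short regression: TestScore on ClassSize only, with Income omitted
X = np.column_stack([np.ones(n), class_size])
b0_hat, b1_hat = np.linalg.lstsq(X, test_score, rcond=None)[0]

print(f"true beta1:          {beta1:.2f}")
print(f"short-regression b1: {b1_hat:.2f}")   # comes out below -1.0

The gap between the two printed numbers is the γ term: the indirect income channel loaded onto the class-size coefficient.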
Omitted Variable Bias
Omitted Variable Bias and OLS Assumption #1
Formula for Omitted Variable Bias
Implications of Omitted Variable Bias
β̂1 → β1 + ρXu (σu /σX )
▶ With omitted variable bias, as n gets large, β̂1 does not get
close to β1 with high probability
▶ The bias term ρXu (σu /σX ) exists even if n is very large
▶ The size of the bias depends on the magnitude of ρXu
▶ The direction of the bias in β̂1 depends on the sign of ρXu
(whether it’s positive or negative)
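The formula can be checked numerically. In this sketch (same illustrative data-generating process as above; the numbers are assumptions, not the lecture's data), the error term u of the short regression contains income, and the OLS slope converges to β1 + ρXu (σu /σX ):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000   # large n, since the formula describes a large-sample limit

# Illustrative DGP: income lives inside the short regression's error term u
income = rng.normal(30, 5, n)
class_size = 40 - 0.5 * income + rng.normal(0, 2, n)
beta1 = -1.0
u = 1.5 * income + rng.normal(0, 5, n)
test_score = 60 + beta1 * class_size + u

# OLS slope from the short regression of TestScore on ClassSize
X = np.column_stack([np.ones(n), class_size])
b1_hat = np.linalg.lstsq(X, test_score, rcond=None)[0][1]

# Limit predicted by the formula: beta1 + rho_Xu * sigma_u / sigma_X
rho_Xu = np.corrcoef(class_size, u)[0, 1]
limit = beta1 + rho_Xu * u.std() / class_size.std()

print(f"b1_hat       = {b1_hat:.3f}")
print(f"beta1 + bias = {limit:.3f}")   # the two agree closely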
Signing Omitted Variable Bias
▶ In our example, we had a positive relationship between our
omitted variable Incomei and our outcome variable Y which
was TestScorei .
▶ This means Incomei enters ui in the single linear regression
with a positive sign (+).
▶ Further, there was a negative (-) relationship between our
omitted variable Incomei and our independent variable X
which was ClassSizei
▶ Therefore, the sign of the correlation between X and u is
given by sign[ρXu ]=sign[(+) × (-)]=(-)
▶ Given that

β̂1 → β1 + ρXu (σu /σX )

with ρXu negative (−), the bias term is negative, so we expect β̂1 < β1
Fixing Omitted Variable Bias
Source of the bias: variation in income
[Scatter plot: Number of Students in the Class (0–35) against Average Household Income ($1000's, 20–40)]
Fixing the problem: taking a sub-sample with similar income
1. Take a sub-sample of schools with average household income between $29,000 and $31,000
[Scatter plot: Number of Students in the Class against Average Household Income ($1000's)]
Test score – class size relationship based on the sub-sample
[Scatter plot: VCE Economics Class Average Test Score (0–100) against Number of Students in the Class (0–35), for the sub-sample]
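A small simulation illustrates why the sub-sampling fix works; the $29,000–$31,000 window follows the slides, while the specific coefficients and distributions are assumptions made for the sketch:

import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Illustrative data in the spirit of the scatter plots above
income = rng.uniform(20, 40, n)                      # $1000's
class_size = 40 - 0.5 * income + rng.normal(0, 2, n)
test_score = 60 - 1.0 * class_size + 1.5 * income + rng.normal(0, 5, n)

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Full sample: the slope is contaminated by income variation
print(f"full sample: {ols_slope(class_size, test_score):.2f}")

# Sub-sample with income between $29,000 and $31,000: little income
# variation remains, so the slope is close to the true -1.0
keep = (income >= 29) & (income <= 31)
print(f"sub-sample:  {ols_slope(class_size[keep], test_score[keep]):.2f}")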
Multiple Linear Regression
Population Regression Model
▶ The population regression model with k regressors is defined as:

Yi = β0 + β1 X1i + β2 X2i + . . . + βk Xki + ui , i = 1, . . . , n
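In practice this model is estimated in one call; a minimal statsmodels sketch with two invented regressors (the variable names and numbers are placeholders, not the lecture's data):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1_000

# Two illustrative regressors and an outcome built from known coefficients
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # prepends the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)   # estimates of beta0, beta1, beta2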
Control Variables
Coefficient Interpretation
Yi = β0 + β1 X1i + β2 X2i + ui
▶ The intercept β0 is called the constant term and it is
interpreted as the average value of Yi when X1i = 0 and
X2i = 0
▶ We can equivalently write the regression including a third regressor X0i which is a dummy variable that equals one for all observations:

Yi = β0 X0i + β1 X1i + β2 X2i + ui , where X0i = 1 for all i
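A quick numpy check of this equivalence on illustrative data: regressing on an explicit column of ones returns β̂0 as the coefficient on X0i.

import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# X0 is a "regressor" equal to one for every observation, so its
# coefficient plays exactly the role of the constant term beta0
X0 = np.ones(n)
X = np.column_stack([X0, x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)   # first entry is beta0_hat, the constant term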
Heteroskedasticity
Student Test Score Example
OLS Estimation with Multiple Linear Regression
▶ Just like with the single linear regression, we use the Ordinary
Least Squares (OLS) estimator to estimate the regression
coefficients of a multiple linear regression model
▶ Recall that the OLS estimator aims to find the regression
coefficients that together minimise the mistakes the model
makes in predicting the dependent variable Yi given the k
regressors X1i , X2i , . . . , Xki
▶ For a given set of regression coefficients, b0 , b1 , b2 , . . . , bk ,
the model’s mistake in predicting Yi is:
Yi − b0 − b1 X1i − b2 X2i − . . . − bk Xki
▶ The sum of squared prediction mistakes across all i = 1, . . . , n
observations is:
Σⁿᵢ₌₁ (Yi − b0 − b1 X1i − b2 X2i − . . . − bk Xki )², where each term is a squared prediction mistake
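A minimal numpy sketch of the minimisation on illustrative data: the OLS estimator solves the normal equations, and any other coefficient vector produces a larger sum of squared prediction mistakes.

import numpy as np

rng = np.random.default_rng(4)
n, k = 1_000, 3

# Illustrative design: a constant plus k regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

def ssr(b):
    """Sum of squared prediction mistakes for candidate coefficients b."""
    e = y - X @ b
    return e @ e

# OLS solves the normal equations (X'X) b = X'y, the first-order
# conditions of minimising ssr(b)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat   # fitted values from the OLS regression function
u_hat = y - y_hat      # residuals: u_i-hat = Y_i - Y_i-hat

print(f"SSR at the OLS solution: {ssr(beta_hat):.1f}")
print(f"SSR a small step away:   {ssr(beta_hat + 0.05):.1f}")   # larger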
▶ The OLS estimators of β0 , β1 , β2 , . . . , βk correspond to the
b0 , b1 , b2 , . . . , bk values that together minimise the sum of
squared prediction mistakes
▶ As usual, the OLS estimators are denoted by β̂0 , β̂1 , β̂2 , . . . , β̂k
▶ The OLS regression function is the (k-dimensional) line constructed using the OLS estimators:

Ŷi = β̂0 + β̂1 X1i + β̂2 X2i + . . . + β̂k Xki

and the OLS residual is the prediction mistake made for observation i:

ûi = Yi − Ŷi
Test Scores and Class Size Example
Measures of Fit in Multiple Linear Regression
R² = ESS/TSS = 1 − SSR/TSS

where the explained sum of squares ESS = Σⁿᵢ₌₁ (Ŷi − Ȳ)² and the total sum of squares TSS = Σⁿᵢ₌₁ (Yi − Ȳ)²
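Both expressions give the same number, because TSS = ESS + SSR for OLS with an intercept. A short numpy check on illustrative data:

import numpy as np

rng = np.random.default_rng(5)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
u_hat = y - y_hat

tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2) # explained sum of squares
ssr = np.sum(u_hat ** 2)              # sum of squared residuals

print(f"ESS/TSS     = {ess / tss:.4f}")
print(f"1 - SSR/TSS = {1 - ssr / tss:.4f}")   # identical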
▶ The adjusted R², written R̄², adjusts R² for the number of regressors:

R̄² = 1 − [(n − 1)/(n − k − 1)] (SSR/TSS) = 1 − s²û /s²Y
▶ R̄² is always less than R² and therefore always less than 1
▶ Adding a regressor to the regression has two effects on R̄²:
▶ SSR falls, which causes R̄² to rise
▶ (n − 1)/(n − k − 1) rises (because k goes up), which causes R̄² to fall
▶ R̄² can actually be negative if all the regressors together do not decrease SSR enough to offset the (n − 1)/(n − k − 1) factor
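These effects are easy to see by regressing pure noise on progressively more irrelevant regressors; a sketch on invented data (sample size and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
n = 50
y = rng.normal(size=n)   # a pure-noise outcome: no regressor truly matters

def fit_stats(X, y):
    """Return (R2, adjusted R2); X must include the constant column."""
    k = X.shape[1] - 1                       # regressors excluding constant
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ b) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ssr / tss
    adj = 1 - (n - 1) / (n - k - 1) * ssr / tss
    return r2, adj

for k in (1, 5, 10, 20):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    r2, adj = fit_stats(X, y)
    print(f"k = {k:2d}:  R2 = {r2:.3f}   adj R2 = {adj:.3f}")
# R2 mechanically climbs with k; adjusted R2 hovers near zero
# and can dip below it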
Beware Interpretations of R 2 and R̄ 2
The Least Squares Assumptions in Multiple Linear
Regression
Perfect Multicollinearity
▶ A regressor exhibits perfect multicollinearity if it is a perfect linear combination of the other regressors
▶ Assumption 4 requires that no regressors exhibit perfect
multicollinearity
▶ Example: suppose you tried to run this regression by accident:
Perfect Multicollinearity Example: Huge Classes
▶ Suppose we created a dummy variable HugeClassi which equals one if a class has more than 35 students and zero otherwise. Here is the regression:
Dummy Variable Trap
TestScorei = β0 + β1 ClassSizei + β2 Incomei + β3 Urbani + β4 Regionali + ui
▶ The situation when a group of dummy variables add up to
always equal another dummy variable (or the constant
regressor) is called the dummy variable trap
▶ You can avoid the dummy variable trap by dropping one of
the dummy variables (or dropping the constant):
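A numpy sketch of the trap and the fix, assuming every class is either urban or regional so that Urbani + Regionali equals the constant regressor for every observation:

import numpy as np

rng = np.random.default_rng(9)
n = 200

urban = (rng.random(n) < 0.5).astype(float)
regional = 1.0 - urban      # each class is either urban or regional

# Trap: constant + both dummies, where urban + regional = constant
X_trap = np.column_stack([np.ones(n), urban, regional])
print(np.linalg.matrix_rank(X_trap))   # 2 < 3: perfect multicollinearity

# Fix: drop one dummy; "regional" becomes the omitted base category
X_fixed = np.column_stack([np.ones(n), urban])
print(np.linalg.matrix_rank(X_fixed))  # 2 = full column rank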
Multicollinearity
Distribution of OLS Estimators in Multiple Linear
Regression
▶ Because random samples vary from one sample to the next, different samples produce different values for the OLS estimators, β̂0 , β̂1 , β̂2 , . . . , β̂k
▶ That is, these estimators are random variables with a
distribution
▶ Under the 4 least squares assumptions, the OLS estimators
β̂0 , β̂1 , β̂2 , . . . , β̂k are unbiased and consistent estimators of
their population true values β0 , β1 , β2 , . . . , βk
▶ In large samples, the sampling distribution of β̂0 , β̂1 , β̂2 , . . . , β̂k is well approximated by a multivariate normal distribution, with each β̂j having a marginal distribution that is N(βj , σ²β̂j ) for j = 0, 1, 2, . . . , k
▶ We can use these results to conduct hypothesis tests with
multiple linear regression models using t-statistics and p-values
similar to what we did with single linear regression models
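A minimal statsmodels sketch of such a test on invented data (the regressors, coefficients, and the heteroskedasticity-robust HC1 covariance choice are illustrative assumptions): x2 has a true coefficient of zero, and its t-statistic and p-value reflect that.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 is truly irrelevant

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

print(fit.tvalues)   # t-statistics for beta0_hat, beta1_hat, beta2_hat
print(fit.pvalues)   # x1's p-value is ~0; x2's is large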