Chapter 6
Outline
1. Omitted variable bias
2. Using regression to estimate causal effects
3. Multiple regression and OLS
4. Measures of fit
5. Sampling distribution of the OLS estimator with multiple regressors
6. Control variables
CHAPTER 6: LINEAR REGRESSION WITH MULTIPLE REGRESSORS
In the class size example, $\beta_1$ is the causal effect on test scores of a change in the STR by one student per teacher.
When $\beta_1$ is a causal effect, the first least squares assumption must hold: $E(u \mid X) = 0$.
The error 𝑢 arises because of the factors or variables that influence 𝑌 but are not included
in the regression function. There are always omitted variables!
If the omission of those variables results in $E(u \mid X) \neq 0$, then the OLS estimator will be biased.
The bias in the OLS estimator that occurs as a result of an omitted factor, or variable, is
called omitted variable bias.
For omitted variable bias to occur, the omitted variable $Z$ must satisfy two conditions: (1) $Z$ is a determinant of $Y$, and (2) $Z$ is correlated with the regressor $X$. Both conditions must hold for the omission of $Z$ to result in omitted variable bias.
In the class size example, let $Z$ be English language ability:
1. English language ability (whether the student has English as a second language) plausibly affects standardized test scores: $Z$ is a determinant of $Y$.
2. Immigrant communities tend to be less affluent and thus have smaller school budgets and higher STR: $Z$ is correlated with $X$.
A formula for omitted variable bias: recall the equation

$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar X)u_i}{\sum_{i=1}^{n}(X_i - \bar X)^2} = \frac{\frac{1}{n}\sum_{i=1}^{n}\nu_i}{\left(\frac{n-1}{n}\right)s_X^2}$$

where $\nu_i = (X_i - \bar X)u_i \approx (X_i - \mu_X)u_i$.
Let $\beta_1$ be the causal effect. Under LSA #2 and LSA #3 (that is, even if LSA #1 does not hold),

$$\hat\beta_1 - \beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2} \;\xrightarrow{p}\; \frac{\sigma_{Xu}}{\sigma_X^2} = \frac{\sigma_u}{\sigma_X} \times \frac{\sigma_{Xu}}{\sigma_X \sigma_u} = \frac{\sigma_u}{\sigma_X}\,\rho_{Xu}$$

where $\rho_{Xu} = \mathrm{corr}(X, u)$. If LSA #1 is correct, then $\rho_{Xu} = 0$. If not,

$$\hat\beta_1 \;\xrightarrow{p}\; \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$$

→ the OLS estimator $\hat\beta_1$ is biased and is not consistent. The strength and direction of the bias are determined by $\rho_{Xu}$, the correlation between the error term and the regressor.
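A minimal simulation sketch of this result, with made-up parameter values (b0, b1, b2 and the X–Z correlation below are hypothetical): the short regression of Y on X alone converges to b1 + ρ_Xu·σ_u/σ_X rather than to b1.

```python
import numpy as np

# Simulation of omitted variable bias (all parameter values hypothetical).
# True model: Y = b0 + b1*X + b2*Z + e, with Z omitted and corr(X, Z) > 0,
# so the short-regression error u = b2*Z + e is correlated with X.
rng = np.random.default_rng(0)
n = 100_000
b0, b1, b2 = 5.0, -2.0, 1.5

z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)      # X correlated with the omitted Z
e = rng.normal(size=n)
y = b0 + b1 * x + b2 * z + e

u = b2 * z + e                         # error term of the short regression
beta1_hat = np.polyfit(x, y, 1)[0]     # OLS slope of Y on X alone

rho_xu = np.corrcoef(x, u)[0, 1]
plim = b1 + rho_xu * u.std() / x.std() # predicted probability limit
print(f"beta1_hat = {beta1_hat:.3f}, predicted plim = {plim:.3f}, true b1 = {b1}")
```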
In our example, let us think about the bias induced by omitting the share of English learners (PctEL). When the estimated regression model does not include PctEL as a regressor, the true model is

$$Y_i = \beta_0 + \beta_1 STR_i + \beta_2 PctEL_i + u_i$$
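A sketch of the short vs. long regression on simulated data (the data-generating coefficients below are hypothetical, chosen only to mimic the example): omitting PctEL shifts the STR coefficient away from its true value.

```python
import numpy as np

# Short vs. long regression on simulated data (all numbers hypothetical).
# True model: Y = b0 + b1*STR + b2*PctEL + u, with STR and PctEL correlated.
rng = np.random.default_rng(1)
n = 5_000
pctel = rng.uniform(0, 50, size=n)
stratio = 18 + 0.05 * pctel + rng.normal(scale=1.5, size=n)  # corr(STR, PctEL) > 0
y = 700 - 1.0 * stratio - 0.65 * pctel + rng.normal(scale=10, size=n)

ones = np.ones(n)
# Short regression: omit PctEL -> biased STR coefficient
X_short = np.column_stack([ones, stratio])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
# Long regression: include PctEL -> STR coefficient near its true value (-1.0)
X_long = np.column_stack([ones, stratio, pctel])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)
print("STR coefficient, short:", round(b_short[1], 2), "long:", round(b_long[1], 2))
```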
We can look at this another way. Districts with few ESL students: (1) do better on
standardized tests and (2) have smaller classes (bigger budgets). So ignoring the effect of
having many ESL students would result in overstating the class size effect. Is this going
on in the data?
The test score / STR / fraction-of-English-learners example shows that, if an omitted variable satisfies the two conditions for omitted variable bias, then the OLS estimator in the regression omitting that variable is biased and inconsistent. So, even if $n$ is large, $\hat\beta_1$ will not be close to $\beta_1$.
In this example, we are clearly interested in the causal effect: what do we expect to
happen to test scores if the superintendent reduces the class size?
What would an ideal randomized controlled experiment look like here? Unpack the phrase word by word:
• Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in reporting, etc.
• Randomized: subjects from the population of interest are randomly assigned to a
treatment or control group (so there are no confounding factors)
• Controlled: having a control group permits measuring the differential effect of the
treatment
• Experiment: the treatment is assigned as part of the experiment: the subjects have
no choice, so there is no reverse causality in which subjects choose the treatment
they think will work best
Randomization implies that any differences between treatment and control groups are
random – not systematically related to the treatment.
We can eliminate the difference in percent English learners between large (control) and
small (treatment) class groups by examining the effect of class size among districts with
the same percent of English learners.
- If the only systematic difference between the large and small class size groups is the
percent English learners, then we are back to the randomized controlled experiment
- This is one way to control for the effect of percent English learners when estimating
the effect of 𝑆𝑇𝑅
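A sketch of this stratification idea on simulated data (the bins and all data-generating numbers below are hypothetical): compare mean test scores in small vs. large classes within bins of percent English learners.

```python
import numpy as np
import pandas as pd

# Stratification sketch on simulated data (hypothetical numbers): compare
# small vs. large classes *within* bins of percent English learners.
rng = np.random.default_rng(2)
n = 5_000
pctel = rng.uniform(0, 50, size=n)
stratio = 18 + 0.05 * pctel + rng.normal(scale=1.5, size=n)
score = 700 - 1.0 * stratio - 0.65 * pctel + rng.normal(scale=10, size=n)

df = pd.DataFrame({"score": score,
                   "small_class": stratio < 20,
                   "el_bin": pd.cut(pctel, bins=[0, 10, 20, 30, 50])})
# Mean score by PctEL bin and class size, then the small-minus-large difference
means = (df.groupby(["el_bin", "small_class"], observed=True)["score"].mean()
           .unstack())
print(means[True] - means[False])   # class-size effect within each PctEL bin
```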
• the OLS estimator minimizes the average squared difference between the actual values
of 𝑌5 and the prediction based on the estimated line
• this minimization problem is solved using calculus
• this yields the OLS estimators of $\beta_0$, $\beta_1$, $\beta_2$; a minimal numerical sketch follows below
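Concretely, the calculus yields the normal equations $X'(Y - X\hat\beta) = 0$, so $\hat\beta = (X'X)^{-1}X'Y$. A minimal numerical sketch with made-up data:

```python
import numpy as np

# The OLS estimator for multiple regressors solves the normal equations
# X'X b = X'y (from setting the gradient of the sum of squared residuals to zero).
def ols(X, y):
    # Solve the linear system rather than inverting X'X, for numerical stability
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny check on made-up data: an intercept plus two regressors
rng = np.random.default_rng(3)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + rng.normal(size=n)
print(ols(X, y))   # close to [1, 2, -3]
```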
Include the percent English learners: $\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.65 \times PctEL$
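A sketch of how such a regression is typically run in practice; the file name and the column names (testscr, str, el_pct) are assumptions about how the California school data are stored, not something stated on these slides.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical local copy of the California school data; column names assumed.
caschool = pd.read_csv("caschool.csv")
fit = smf.ols("testscr ~ str + el_pct", data=caschool).fit()
print(fit.params)   # should be close to: Intercept 686.0, str -1.10, el_pct -0.65
```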
As in regression with a single regressor, the SER and the RMSE are measures of the spread of the $Y$'s around the regression line:

$$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat u_i^2}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat u_i^2}$$
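A minimal sketch computing both measures from the residuals of a fitted regression (the data and the number of slope coefficients k below are made up):

```python
import numpy as np

# SER uses the degrees-of-freedom correction n - k - 1; RMSE divides by n.
def ser(resid, k):
    n = resid.size
    return np.sqrt(np.sum(resid**2) / (n - k - 1))

def rmse(resid):
    return np.sqrt(np.mean(resid**2))

rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b
print(f"SER = {ser(u_hat, k):.3f}, RMSE = {rmse(u_hat):.3f}")
```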
The $R^2$ is the fraction of the variance explained – same definition as in regression with a single regressor:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

where $ESS = \sum_{i=1}^{n}(\hat Y_i - \bar Y)^2$, $SSR = \sum_{i=1}^{n}\hat u_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar Y)^2$.
• The $R^2$ always increases when you add another regressor – a bit of a problem for a measure of fit
• The adjusted $R^2$ ($\bar R^2$) corrects this problem by penalizing you for including another regressor – the $\bar R^2$ does not necessarily increase when you add another regressor:

$$\bar R^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\frac{SSR}{TSS}$$
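A sketch, on simulated data, of both measures and of the point above: adding an irrelevant regressor raises $R^2$ but can lower $\bar R^2$.

```python
import numpy as np

# R^2 and adjusted R^2 from a design matrix X (including the constant) and y.
def r2_stats(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    ssr = np.sum(u**2)
    tss = np.sum((y - y.mean())**2)
    n, kp1 = X.shape                           # kp1 = k + 1 (slopes + intercept)
    return 1 - ssr / tss, 1 - (n - 1) / (n - kp1) * ssr / tss

rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                      # irrelevant regressor
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x1, noise])
print("without noise (R2, adj R2):", r2_stats(X1, y))
print("with noise    (R2, adj R2):", r2_stats(X2, y))
```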
The least squares assumptions for the multiple regression model:
Assumption #1: The conditional mean of $u$ given the $X$'s is zero: $E(u_i \mid X_{1i}, \dots, X_{ki}) = 0$
Assumption #2: $(X_{1i}, \dots, X_{ki}, Y_i)$, $i = 1, \dots, n$, are i.i.d.
Assumption #3: Large outliers are unlikely
Assumption #4: There is no perfect multicollinearity
Back to Multicollinearity: Perfect and Imperfect
Perfect multicollinearity is when one of the regressors is an exact linear function of the
other regressors.
Some examples:
• the example from before – you include 𝑆𝑇𝑅 twice
• regress $TestScore$ on a constant, $D$, and $B$, where $D_i = 1$ if $STR_i \le 20$ and $= 0$ otherwise, and $B_i = 1$ if $STR_i > 20$ and $= 0$ otherwise. So $B_i = 1 - D_i$, an exact linear function of the constant and $D$, and there is perfect multicollinearity. This is an example of the dummy variable trap.
Let's look at an example. Suppose we would like to relate wages to gender, and we create two dummy variables:
$Male_i = 1$ if the individual is male, $0$ if the individual is female;
$Female_i = 1$ if the individual is female, $0$ if the individual is male.
(iii) $Wage_i = \delta_0 + \delta_1 Male_i + \delta_2 Female_i + \varepsilon_i$
For men ($Male_i = 1$, $Female_i = 0$): $Wage_i = \delta_0 + \delta_1 + \varepsilon_i$
For women ($Female_i = 1$, $Male_i = 0$): $Wage_i = \delta_0 + \delta_2 + \varepsilon_i$
- $\delta_0 + \delta_1$ is the population mean wage for men
- $\delta_0 + \delta_2$ is the population mean wage for women
→ $\delta_0$ cannot be estimated: $Male_i + Female_i = 1$ for every individual, so the two dummies are an exact linear function of the constant regressor
→ the model suffers from perfect multicollinearity – you are including the same information twice
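A minimal numerical sketch of the trap (made-up data): the column of ones, Male, and Female are linearly dependent, so $X'X$ is singular and the normal equations have no unique solution.

```python
import numpy as np

# Dummy variable trap: constant + Male + Female makes X rank-deficient,
# because Male + Female equals the constant column exactly.
male = np.array([1, 0, 1, 1, 0, 0, 1, 0])
female = 1 - male
X = np.column_stack([np.ones(8), male, female])

print("rank of X:", np.linalg.matrix_rank(X))   # 2, not 3
print("det(X'X):", np.linalg.det(X.T @ X))      # ~0: X'X is singular
# Fix: drop one of the dummies (or drop the constant) before running OLS.
```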
Imperfect multicollinearity occurs when two or more regressors are very highly
correlated.
Imperfect multicollinearity implies that one or more of the regression coefficients will be
imprecisely estimated.
• The idea: the coefficient on $X_1$ is the effect of $X_1$ holding $X_2$ constant; but if $X_1$ and $X_2$ are highly correlated, there is very little variation in $X_1$ once $X_2$ is held constant – so the data don't contain much information about what happens when $X_1$ changes but $X_2$ doesn't. This means that the variance of the OLS estimator of the coefficient on $X_1$ will be large.
• Imperfect multicollinearity results in large standard errors for one or more of the OLS coefficients, as the sketch below illustrates.
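A simulation sketch (all numbers hypothetical): the same model is estimated with corr($X_1$, $X_2$) = 0 and corr($X_1$, $X_2$) = 0.99, and the homoskedasticity-based standard error on $X_1$ is compared.

```python
import numpy as np

# Imperfect multicollinearity inflates the variance of the OLS estimates.
rng = np.random.default_rng(7)
n = 500

def se_beta1(rho):
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) = rho
    y = 1 + x1 + x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    s2 = np.sum(u**2) / (n - 3)                 # homoskedastic error variance
    cov_b = s2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix
    return np.sqrt(cov_b[1, 1])                 # SE of the coefficient on x1

print("SE(beta1), corr = 0.0 :", round(se_beta1(0.0), 3))
print("SE(beta1), corr = 0.99:", round(se_beta1(0.99), 3))   # much larger
```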
A control variable W is a regressor included to hold constant factors that, if neglected,
could lead the estimated causal effect of interest to suffer from omitted variable bias.
1. An effective control variable is one which, when included in the regression, makes
the error term uncorrelated with the variable of interest.
2. Holding constant the control variable(s), the variable of interest is as if randomly
assigned.
3. Among individuals/observations with the same value of the control variable(s), the variable of interest is uncorrelated with the omitted determinants of $Y$.
Control variables need not be causal and their coefficients generally do not have a causal
interpretation.
The math of control variables: conditional mean independence
Let $X_i$ denote the variable of interest and $W_i$ denote the control variable(s). $W$ is an effective control variable if conditional mean independence holds:

$$E(u_i \mid X_i, W_i) = E(u_i \mid W_i)$$

That is, once $W$ is held constant, the mean of the error does not depend on $X$.
Consider the regression model
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$$
where $X$ is the variable of interest, $\beta_1$ is its causal effect, and $W$ is an effective control variable, so that conditional mean independence holds:
$$E(u_i \mid X_i, W_i) = E(u_i \mid W_i)$$
Under conditional mean independence:

1. $\beta_1$ retains its causal interpretation. The math: the expected change in $Y$ resulting from a change in $X$, holding (a single) $W$ constant, is

$$E(Y \mid X = x + \Delta x, W = w) - E(Y \mid X = x, W = w)$$
$$= \beta_0 + \beta_1(x + \Delta x) + \beta_2 w + E(u \mid X = x + \Delta x, W = w) - \left[\beta_0 + \beta_1 x + \beta_2 w + E(u \mid X = x, W = w)\right]$$
$$= \beta_1 \Delta x + \left[E(u \mid X = x + \Delta x, W = w) - E(u \mid X = x, W = w)\right] = \beta_1 \Delta x$$

where the final equality follows from conditional mean independence:

$$E(u \mid X = x + \Delta x, W = w) = E(u \mid X = x, W = w) = E(u \mid W = w)$$

2. $\hat\beta_1$ is unbiased.
3. $\hat\beta_2$ does not in general estimate a causal effect.

The math: consider the regression model
$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$
where $u$ satisfies the conditional mean independence assumption and where $\beta_1$ and $\beta_2$ are causal effects. Suppose the conditional mean of $u$ given $W$ is linear:
$$E(u \mid X, W) = E(u \mid W) = \gamma_0 + \gamma_2 W \quad (*)$$
Let
$$\nu = u - E(u \mid X, W) \quad (**)$$
so that $u = \gamma_0 + \gamma_2 W + \nu$. Then
$$Y = \beta_0 + \beta_1 X + \beta_2 W + u = \beta_0 + \beta_1 X + \beta_2 W + \gamma_0 + \gamma_2 W + \nu$$
$$= (\beta_0 + \gamma_0) + \beta_1 X + (\beta_2 + \gamma_2) W + \nu = \delta_0 + \beta_1 X + \delta_2 W + \nu$$
where $\delta_0 = \beta_0 + \gamma_0$ and $\delta_2 = \beta_2 + \gamma_2$. OLS consistently estimates $\delta_2 = \beta_2 + \gamma_2$, not the causal coefficient $\beta_2$ (unless $\gamma_2 = 0$).
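A simulation sketch of points 2 and 3 (all parameter values hypothetical): under conditional mean independence, the OLS coefficient on $X$ recovers the causal $\beta_1$, while the coefficient on $W$ estimates $\beta_2 + \gamma_2$.

```python
import numpy as np

# Conditional mean independence: E(u|X,W) = E(u|W) = g0 + g2*W.
# OLS then recovers the causal b1, but the W coefficient estimates b2 + g2.
rng = np.random.default_rng(8)
n = 200_000
b0, b1, b2 = 1.0, 2.0, 0.5     # causal coefficients (hypothetical)
g0, g2 = 0.3, 1.5              # E(u|W) = g0 + g2*W (hypothetical)

w = rng.normal(size=n)
x = 0.7 * w + rng.normal(size=n)             # X correlated with W
u = g0 + g2 * w + rng.normal(size=n)         # E(u|X,W) = E(u|W): CMI holds
y = b0 + b1 * x + b2 * w + u

X = np.column_stack([np.ones(n), x, w])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coef on X:", round(beta_hat[1], 3), "(true causal b1 =", b1, ")")
print("coef on W:", round(beta_hat[2], 3), "(b2 + g2 =", b2 + g2, ")")
```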
In summary, if $W$ is such that conditional mean independence is satisfied, then the OLS estimator of the coefficient on $X$ can be interpreted as (and is an unbiased estimator of) the causal effect $\beta_1$, but the coefficient(s) on the control variable(s) $W$ do not in general have a causal interpretation.