
CHAPTER 6: LINEAR REGRESSION WITH MULTIPLE REGRESSORS

Outline
1. Omitted variable bias
2. Using regression to estimate causal effects
3. Multiple regression and OLS
4. Measures of fit
5. Sampling distribution of the OLS estimator with multiple regressors
6. Control variables


Omitted Variable Bias

In the class size example, $\beta_1$ is the causal effect on test scores of a change in the student-teacher ratio ($STR$) by one student per teacher.

When $\beta_1$ is a causal effect, the first least squares assumption must hold: $E(u_i \mid X_i) = 0$.

The error $u_i$ arises because of factors or variables that influence $Y$ but are not included in the regression function. There are always omitted variables!

If the omission of those variables results in $E(u_i \mid X_i) \neq 0$, then the OLS estimator will be biased.

The bias in the OLS estimator that occurs as a result of an omitted factor, or variable, is called omitted variable bias.


For omitted variable bias to occur, the omitted variable, $Z$, must satisfy two conditions:

1. $Z$ is a determinant of $Y$ (i.e. $Z$ is part of $u$); and

2. $Z$ is correlated with the regressor $X$ (i.e. $\mathrm{corr}(Z, X) \neq 0$)

Both conditions must hold for the omission of $Z$ to result in omitted variable bias.

In the test score example:

1. English language ability (whether the student has English as a second language) plausibly affects standardized test scores: $Z$ is a determinant of $Y$
2. Immigrant communities tend to be less affluent and thus have smaller school budgets and higher $STR$: $Z$ is correlated with $X$

Accordingly, $\hat{\beta}_1$ is biased. What is the direction of this bias?

• What does common sense suggest?
• If common sense fails you, there is a formula …


A formula for omitted variable bias: recall the equation

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\frac{1}{n}\sum_{i=1}^{n}\nu_i}{\frac{n-1}{n}\,s_X^2}$$

where $\nu_i = (X_i - \bar{X})u_i \approx (X_i - \mu_X)u_i$.

Under Least Squares Assumption #1: $E[(X_i - \mu_X)u_i] = \mathrm{cov}(X_i, u_i) = 0$.

But what if $E[(X_i - \mu_X)u_i] = \mathrm{cov}(X_i, u_i) \neq 0$?


Let $\beta_1$ be the causal effect. Under LSA #2 and LSA #3 (that is, even if LSA #1 does not hold),

$$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} \;\xrightarrow{\;p\;}\; \frac{\sigma_{Xu}}{\sigma_X^2} = \frac{\sigma_u}{\sigma_X}\times\frac{\sigma_{Xu}}{\sigma_X \sigma_u} = \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$

where $\rho_{Xu} = \mathrm{corr}(X, u)$. If LSA #1 is correct, then $\rho_{Xu} = 0$. If not, we have

$$\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$

→ the OLS estimator $\hat{\beta}_1$ is biased and is not consistent. The strength and direction of the bias are determined by $\rho_{Xu}$, the correlation between the error term and the regressor.
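To make the bias formula concrete, here is a small simulation sketch (the data-generating process and all numbers are invented for illustration): it generates data in which the error is correlated with the regressor and checks that the OLS slope is pulled away from the true $\beta_1$ by approximately $\rho_{Xu}\,\sigma_u/\sigma_X$.

# Simulation sketch of omitted variable bias (illustrative values only)
set.seed(1)
n <- 100000
z <- rnorm(n)                      # omitted variable
x <- 0.7 * z + rnorm(n)            # regressor, correlated with z
u <- 2.0 * z + rnorm(n)            # error term contains z, so corr(x, u) != 0
y <- 1 + 0.5 * x + u               # true beta1 = 0.5

coef(lm(y ~ x))["x"]               # OLS slope: well above 0.5 (biased)
0.5 + cor(x, u) * sd(u) / sd(x)    # true beta1 plus the bias term: approximately the same number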


In our example, let us think about the bias induced by omitting the share of English learners (PctEL). When the estimated regression model does not include $PctEL$ as a regressor, the true model is

$$Y_i = \beta_0 + \beta_1 STR_i + \beta_2 PctEL_i + u_i$$

Because $STR$ and $PctEL$ are correlated, we have $\rho_{STR,\,PctEL} \neq 0$.

Omitting $PctEL$ leads to a negatively biased estimate $\hat{\beta}_1$. As a consequence, we expect $\hat{\beta}_1$, the coefficient on $STR$, to be too large in absolute value. Put differently, the OLS estimate of $\beta_1$ suggests that small classes improve test scores, but the effect of small classes is overestimated because it also captures the effect of having fewer English learners.


We can look at this another way. Districts with few ESL students: (1) do better on standardized tests and (2) have smaller classes (bigger budgets). So ignoring the effect of having many ESL students would overstate the class size effect. Is this going on in the data?

• Districts with fewer English learners have higher test scores
• Districts with fewer English learners have smaller class sizes
• Among districts with comparable percentages of English learners, the effect of class size is small (while the overall test score gap between small- and large-class districts is 7.4 points)


Using regression to estimate causal effects

The test score / STR / fraction of English learners example shows that, if an omitted variable satisfies the two conditions for omitted variable bias, then the OLS estimator in the regression omitting that variable is biased and inconsistent. So, even if $n$ is large, $\hat{\beta}_1$ will not be close to $\beta_1$.

In this example, we are clearly interested in the causal effect: what do we expect to
happen to test scores if the superintendent reduces the class size?

What precisely is a causal effect?


• Causality is a complex concept
• In this course, we take a practical approach to causality: A causal effect is defined to
be the effect measured in an ideal randomized controlled experiment


Ideal Randomized Controlled Experiment

• Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in
reporting, etc.
• Randomized: subjects from the population of interest are randomly assigned to a
treatment or control group (so there are no confounding factors)
• Controlled: having a control group permits measuring the differential effect of the
treatment
• Experiment: the treatment is assigned as part of the experiment: the subjects have
no choice, so there is no reverse causality in which subjects choose the treatment
they think will work best


In our class size example:

Imagine an ideal randomized controlled experiment for measuring the effect on test scores of reducing $STR$.
• In that experiment, students would be randomly assigned to classes, which would have different sizes
• Because they are randomly assigned, all student characteristics (and thus $u_i$) would be distributed independently of $STR_i$
• Thus $E(u_i \mid STR_i) = 0$. That is, LSA #1 holds in a randomized controlled experiment.

How does our observational data differ from this ideal?

• The treatment is not randomly assigned
• Consider the percent English learners, $Z$, in the district. It plausibly satisfies the two criteria for omitted variable bias:
  o A determinant of $Y$; and
  o Correlated with the regressor $X$
→ Thus the control and treatment groups differ in a systematic way, so $\mathrm{corr}(Z, X) \neq 0$


Randomization implies that any differences between treatment and control groups are
random – not systematically related to the treatment.

We can eliminate the difference in percent English learners between large (control) and
small (treatment) class groups by examining the effect of class size among districts with
the same percent of English learners.
- If the only systematic difference between the large and small class size groups is the
percent English learners, then we are back to the randomized controlled experiment
- This is one way to control for the effect of percent English learners when estimating
the effect of 𝑆𝑇𝑅


Let’s return to omitted variable bias.

Three ways to overcome omitted variable bias:


1. Run a randomized controlled experiment in which treatment ($STR$) is randomly assigned; then the percent of English learners is still a determinant of test scores, but it is uncorrelated with $STR$ (→ this solution is rarely feasible).
2. Adopt the “cross tabulation” approach with finer gradations of $STR$ and percent of English learners, so we control for the percent of English learners (→ but we will soon run out of data, and what about other determinants like family income and parental education?).
3. Use a regression in which the omitted variable is no longer omitted: include the percent of English learners as an additional regressor in a multiple regression.


The Population Multiple Regression Model

Consider the case of two regressors:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

• $Y$ is the dependent variable
• $X_1$, $X_2$ are the two independent variables (regressors)
• $(Y_i, X_{1i}, X_{2i})$ denote the $i$th observation on $Y$, $X_1$, $X_2$
• $\beta_0$ = unknown population intercept = predicted value of $Y$ when $X_1 = X_2 = 0$
• $\beta_1$ = effect on $Y$ of a change in $X_1$, holding $X_2$ constant = $\partial Y / \partial X_1$, holding $X_2$ constant
• $\beta_2$ = effect on $Y$ of a change in $X_2$, holding $X_1$ constant = $\partial Y / \partial X_2$, holding $X_1$ constant
• $u_i$ = the regression error (omitted factors)


With two regressors, the OLS estimator solves:

$$\min_{b_0, b_1, b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$

• the OLS estimator minimizes the average squared difference between the actual values of $Y_i$ and the prediction based on the estimated line
• this minimization problem is solved using calculus
• this yields the OLS estimators of $\beta_0$, $\beta_1$, $\beta_2$
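As a sketch of what “solved using calculus” delivers, the same coefficients can also be obtained by minimizing the sum of squared residuals numerically; the data below are simulated, so all names and values are purely illustrative.

# Minimize the sum of squared residuals numerically and compare with lm()
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 5 + 2 * x1 - 3 * x2 + rnorm(n)

ssr <- function(b) sum((y - b[1] - b[2] * x1 - b[3] * x2)^2)   # objective: SSR(b0, b1, b2)
optim(c(0, 0, 0), ssr)$par                                     # numerical minimizer
coef(lm(y ~ x1 + x2))                                          # OLS: very close to the same values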


Example using Test Score data

Regression of test scores on $STR$: $\widehat{TestScore} = 698.9 - 2.28 \times STR$

Include the percent English learners: $\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.65 \times PctEL$

• What happens to the coefficient on $STR$?
• Note: $\mathrm{corr}(STR, PctEL) = 0.10$

> model<-lm(testscr~str+el_pct,data=caschool)
> coeftest(model, vcov = vcovHC(model, type = "HC1"))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 686.032249 8.728224 78.5993 < 2e-16 ***
str -1.101296 0.432847 -2.5443 0.01131 *
el_pct -0.649777 0.031032 -20.9391 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Measures of Fit for Multiple Regression

Actual = predicted + residual ($Y_i = \hat{Y}_i + \hat{u}_i$)

$SER$ = standard deviation of $\hat{u}_i$ (with degrees-of-freedom correction)
$RMSE$ = standard deviation of $\hat{u}_i$ (without degrees-of-freedom correction)
$R^2$ = fraction of the variance of $Y$ explained by the $X$s
$\bar{R}^2$ = adjusted $R^2$ = $R^2$ with a degrees-of-freedom correction that adjusts for estimation uncertainty ($\bar{R}^2 < R^2$)


As in regression with a single regressor, the $SER$ and $RMSE$ are measures of the spread of the $Y_i$ around the regression line:

$$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$$


The $R^2$ is the fraction of the variance explained – same definition as in regression with a single regressor:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

where $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$.

• The $R^2$ always increases when you add another regressor – a bit of a problem for a measure of fit
• The $\bar{R}^2$ corrects this problem by penalizing you for including another regressor – the $\bar{R}^2$ does not necessarily increase when you add another regressor

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS}$$

• $\bar{R}^2 < R^2$; however, if $n$ is large, the two will be very close
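As a check on these formulas, here is a sketch that computes $SER$, $RMSE$, $R^2$, and $\bar{R}^2$ by hand for the model fitted earlier; it assumes the `model` object and the `caschool` data from the regression above are still in memory.

u_hat <- residuals(model)                         # model <- lm(testscr ~ str + el_pct, data = caschool)
n <- length(u_hat)
k <- length(coef(model)) - 1                      # number of regressors, excluding the intercept

SER  <- sqrt(sum(u_hat^2) / (n - k - 1))
RMSE <- sqrt(sum(u_hat^2) / n)

TSS <- sum((caschool$testscr - mean(caschool$testscr))^2)
SSR <- sum(u_hat^2)
R2     <- 1 - SSR / TSS
R2_adj <- 1 - ((n - 1) / (n - k - 1)) * (SSR / TSS)

c(SER = SER, RMSE = RMSE, R2 = R2, R2_adj = R2_adj)
# Compare with summary(model)$sigma, summary(model)$r.squared, summary(model)$adj.r.squared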


The Least Squares Assumptions for Causal Inference in Multiple Regression:

Let $\beta_1, \beta_2, \ldots, \beta_k$ be causal effects.

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n$$

1. The conditional distribution of $u$ given the $X$s has mean zero, i.e. $E(u_i \mid X_{1i} = x_1, \ldots, X_{ki} = x_k) = 0$
2. $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
3. Large outliers are unlikely
4. There is no perfect multicollinearity


Assumption #1: The conditional mean of $u$ given the $X$s is zero

$$E(u_i \mid X_{1i} = x_1, \ldots, X_{ki} = x_k) = 0$$

• This has the same interpretation as in regression with a single regressor
• Failure of this condition leads to omitted variable bias; specifically, omitted variable bias arises if an omitted variable:
  o Belongs in the equation (so, is in $u_i$); and
  o Is correlated with an included $X$
• The best solution, if possible, is to include the omitted variable in the regression
• A second, related solution is to include a variable that controls for the omitted variable (discussed shortly)

Assumption #2: $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are i.i.d.

This is satisfied automatically if the data are collected by simple random sampling.

Assumption #3: Large outliers are rare

This is the same assumption as we had before for a single regressor. As in the case of a single regressor, OLS can be sensitive to large outliers, so you need to check your data (scatterplots) to make sure there are no crazy values (typos or coding errors).

Assumption #4: There is no perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an exact linear function of the
other regressors

Example: Suppose you accidentally include 𝑆𝑇𝑅 twice


> model<-lm(testscr~str+str+el_pct,data=caschool)
> coeftest(model, vcov = vcovHC(model, type = "HC1"))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 686.032249 8.728224 78.5993 < 2e-16 ***
str -1.101296 0.432847 -2.5443 0.01131 *
el_pct -0.649777 0.031032 -20.9391 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

→ R automatically drops one of the $STR$s


The Sampling Distribution of the Least Squares Estimator

Under the four Least Squares Assumptions:
• The sampling distribution of $\hat{\beta}_1$ has mean $\beta_1$
• $\mathrm{var}(\hat{\beta}_1)$ is inversely proportional to $n$
• Other than its mean and variance, the exact (finite-$n$) distribution of $\hat{\beta}_1$ is very complicated; but for large $n$:
  o $\hat{\beta}_1$ is consistent: $\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1$ (law of large numbers)
  o $\dfrac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\mathrm{var}(\hat{\beta}_1)}}$ is approximately distributed as $N(0,1)$ (CLT)
  o these statements hold for $\hat{\beta}_1, \ldots, \hat{\beta}_k$

Conceptually, there is nothing new here.
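A small Monte Carlo sketch (simulated data, invented coefficients) illustrates these large-sample results: across repeated samples, $\hat{\beta}_1$ centers on $\beta_1$ and its standardized distribution looks approximately normal.

# Monte Carlo sketch of the sampling distribution of beta1-hat (illustrative values only)
set.seed(42)
R <- 2000; n <- 500
beta1_hat <- replicate(R, {
  x1 <- rnorm(n); x2 <- 0.5 * x1 + rnorm(n)       # two correlated regressors
  y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)          # true beta1 = 2
  coef(lm(y ~ x1 + x2))["x1"]
})
mean(beta1_hat)                                    # close to 2
hist(scale(beta1_hat), freq = FALSE, main = "Standardized beta1-hat")
curve(dnorm(x), add = TRUE)                        # approximately N(0, 1)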


Back to Multicollinearity: Perfect and Imperfect

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors.

Some examples:
• the example from before – you include $STR$ twice
• regress test scores on a constant, $D$, and $B$, where $D_i = 1$ if $STR_i \le 20$ and $D_i = 0$ otherwise, and $B_i = 1$ if $STR_i > 20$ and $B_i = 0$ otherwise. Then $B_i = 1 - D_i$ (an exact linear function), so there is perfect multicollinearity. This is an example of the dummy variable trap.


The Dummy Variable Trap:


Suppose you have a set of multiple binary (dummy) variables which are mutually
exclusive and exhaustive – that is, there are multiple categories and every observation
falls in one, and only one, category. If you include all these dummy variables and a
constant, you will have perfect multicollinearity – this is sometimes called the dummy
variable trap.

• Solutions to the dummy variable trap:


§ Omit one of the groups or
§ Omit the intercept
• What are the implications for the interpretation of the coefficients?
• If you have perfect multicollinearity, your statistical software will let you know –
either by crashing or giving an error message or by dropping one of the variables
arbitrarily


Let’s look at an example:
Suppose we would like to relate wages to gender. Suppose we create two dummy variables:
$Male_i = 1$ if the individual is male, 0 if the individual is female;
$Female_i = 1$ if the individual is female, 0 if the individual is male

Regression models relating wages to gender:

(i) $Wage_i = \beta_0 + \beta_1 Male_i + u_i$
For women, $Male_i = 0$: $Wage_i = \beta_0 + u_i$
- $\beta_0$ is the population mean wage for women
- $\beta_0 + \beta_1$ is the population mean wage for men

(ii) $Wage_i = \gamma_0 + \gamma_1 Female_i + v_i$
For men, $Female_i = 0$: $Wage_i = \gamma_0 + v_i$
- $\gamma_0$ is the population mean wage for men
- $\gamma_0 + \gamma_1$ is the population mean wage for women

Relationship among the coefficients: $\beta_0 = \gamma_0 + \gamma_1$ and $\beta_0 + \beta_1 = \gamma_0$



(iii) $Wage_i = \delta_0 + \delta_1 Male_i + \delta_2 Female_i + \varepsilon_i$
For men, $Male_i = 1$, $Female_i = 0$: $Wage_i = \delta_0 + \delta_1 + \varepsilon_i$
For women, $Female_i = 1$, $Male_i = 0$: $Wage_i = \delta_0 + \delta_2 + \varepsilon_i$
- $\delta_0 + \delta_1$ is the population mean wage for men
- $\delta_0 + \delta_2$ is the population mean wage for women
→ $\delta_0$ cannot be estimated
→ the model suffers from perfect multicollinearity – you are including the same information twice

(iv) $Wage_i = \alpha_0 Male_i + \alpha_1 Female_i + \eta_i$
For men, $Male_i = 1$, $Female_i = 0$: $Wage_i = \alpha_0 + \eta_i$
For women, $Female_i = 1$, $Male_i = 0$: $Wage_i = \alpha_1 + \eta_i$
- $\alpha_0$ is the population mean wage for men
- $\alpha_1$ is the population mean wage for women

Relationship among the coefficients: $\beta_0 = \gamma_0 + \gamma_1 = \alpha_1$ and $\beta_0 + \beta_1 = \gamma_0 = \alpha_0$
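A sketch of models (i)-(iv) with simulated wage data (the variable names and the population means of 20 and 25 are invented) shows how R handles the dummy variable trap and how the two workarounds recover the group means:

set.seed(7)
n <- 1000
male   <- rbinom(n, 1, 0.5)
female <- 1 - male
wage   <- 20 + 5 * male + rnorm(n)      # population mean wage: 20 for women, 25 for men

lm(wage ~ male)                          # (i): intercept ~ mean for women, slope ~ difference
lm(wage ~ female)                        # (ii): intercept ~ mean for men
lm(wage ~ male + female)                 # (iii): perfect multicollinearity; R drops one dummy (NA)
lm(wage ~ male + female - 1)             # (iv): no intercept; coefficients are the two group means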


Imperfect multicollinearity occurs when two or more regressors are very highly correlated.

Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated.

• The idea: the coefficient on $X_1$ is the effect of $X_1$ holding $X_2$ constant; but if $X_1$ and $X_2$ are highly correlated, there is very little variation in $X_1$ once $X_2$ is held constant – so the data don’t contain much information about what happens when $X_1$ changes but $X_2$ doesn’t. This means that the variance of the OLS estimator of the coefficient on $X_1$ will be large.
• Imperfect multicollinearity results in large standard errors for one or more of the OLS coefficients
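A simulation sketch (all numbers invented) makes the point about imprecision: as the correlation between $X_1$ and $X_2$ rises, the standard error of the coefficient on $X_1$ grows.

set.seed(123)
n <- 200
se_x1 <- sapply(c(0, 0.90, 0.99), function(rho) {
  x1 <- rnorm(n)
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)     # corr(x1, x2) is approximately rho
  y  <- 1 + 2 * x1 + 2 * x2 + rnorm(n)
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
})
names(se_x1) <- c("rho = 0", "rho = 0.90", "rho = 0.99")
se_x1                                             # the standard error on x1 rises sharply with rho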


Control Variables and Conditional Mean Independence

We want to get an unbiased estimate of the effect on test scores of changing class size, holding constant factors outside the school committee’s control – such as outside learning opportunities (museums, etc.), parental involvement in education (reading with mom at home), etc.

If we could run an experiment, we would randomly assign students (and teachers) to different sized classes. Then $STR_i$ would be independent of all the other factors that go into $u_i$, so $E(u_i \mid STR_i) = 0$ and the OLS slope estimator in the regression of test scores on $STR$ would be an unbiased estimator of the desired causal effect.

But with observational data, $u_i$ depends on additional factors (parental involvement, knowledge of English, access in the community to learning opportunities outside school, etc.).
• If you can observe those factors (e.g. PctEL), then include them in the regression
• But usually you can’t observe all these omitted causal factors (e.g. parental involvement in homework). In this case, you can include control variables which are correlated with these omitted causal factors, but which themselves are not causal.


A control variable $W$ is a regressor included to hold constant factors that, if neglected, could lead the estimated causal effect of interest to suffer from omitted variable bias.

For our test scores example:

$$\widehat{TestScore} = 700.2 - 1.00\,STR - 0.122\,PctEL - 0.547\,LchPct, \qquad \bar{R}^2 = 0.773$$
(standard errors: 5.6, 0.27, 0.033, 0.024)

$PctEL$ = percent English learners in the school district
$LchPct$ = percent of students receiving a free/subsidized lunch (only students from low-income families are eligible)

• $STR$ is the variable of interest
• $PctEL$ probably has a direct causal effect (school is tougher if you are learning English). But it is also a control variable: immigrant communities tend to be less affluent and often have fewer outside learning opportunities, and $PctEL$ is correlated with those omitted causal variables. $PctEL$ is both a possible causal variable and a control variable.
• $LchPct$ might have a causal effect (eating lunch helps learning); it is also correlated with, and controls for, income-related outside learning opportunities. $LchPct$ is both a possible causal variable and a control variable.
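For reference, here is a sketch of how this regression could be refit in R with the caschool data used earlier; the free/subsidized-lunch variable is assumed here to be called `meal_pct`, so adjust the name if your copy of the data labels it differently.

# Sketch: refit the three-regressor model (variable name meal_pct is an assumption)
model3 <- lm(testscr ~ str + el_pct + meal_pct, data = caschool)
coeftest(model3, vcov = vcovHC(model3, type = "HC1"))   # heteroskedasticity-robust standard errors
summary(model3)$adj.r.squared                            # adjusted R^2, roughly 0.77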

Three interchangeable statements about what makes for an effective control variable:

1. An effective control variable is one which, when included in the regression, makes
the error term uncorrelated with the variable of interest.
2. Holding constant the control variable(s), the variable of interest is as if randomly
assigned.
3. Among individuals/observations with the same value of the control variable(s), the
variable of interest is uncorrelated with the omitted determinants of 𝑌

Control variables need not be causal and their coefficients generally do not have a causal
interpretation.


The math of control variables: conditional mean independence

• Because a control variable is correlated with an omitted causal factor, LSA #1 does not hold. For example, $LchPct$ is correlated with unmeasured determinants of test scores such as outside learning opportunities, so its coefficient is subject to omitted variable bias. But the fact that $LchPct$ is correlated with these omitted variables is precisely what makes it a good control variable!
• If LSA #1 does not hold, then what does?
• We need a mathematical condition for what makes an effective control variable. This condition is conditional mean independence: given the control variable, the mean of $u_i$ doesn’t depend on the variable of interest.

Let $X_i$ denote the variable of interest and $W_i$ denote the control variable(s). $W$ is an effective control variable if conditional mean independence holds:

$$E(u_i \mid X_i, W_i) = E(u_i \mid W_i) \quad \text{(conditional mean independence)}$$

If $W$ is a control variable, then conditional mean independence replaces LSA #1 – it is the version of LSA #1 that is relevant for control variables.


Consider the regression model

$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$

where $X$ is the variable of interest, $\beta_1$ is its causal effect, and $W$ is an effective control variable so that conditional mean independence holds:

$$E(u_i \mid X_i, W_i) = E(u_i \mid W_i)$$

In addition, suppose that LSA #2, #3, and #4 hold. Then:

1. $\beta_1$ has a causal interpretation
2. $\hat{\beta}_1$ is unbiased
3. The coefficient on the control variable, $\hat{\beta}_2$, does not in general estimate a causal effect
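A simulation sketch of these three results (the data-generating process is invented): $W$ is correlated with an omitted causal factor $Z$, and $X$ depends only on $W$, so conditional mean independence holds. Including $W$ recovers the causal effect of $X$, while the coefficient on $W$ absorbs the effect of $Z$ and is not causal.

set.seed(99)
n <- 5000
w <- rnorm(n)                        # control variable (think LchPct)
z <- 0.8 * w + rnorm(n)              # omitted causal factor, correlated with w
x <- -1.0 * w + rnorm(n)             # variable of interest; given w, x is as-if randomly assigned
y <- 10 + 2 * x + 3 * z + rnorm(n)   # true causal effect of x is 2; z ends up in the error term

coef(lm(y ~ x))["x"]                  # omitting w: the slope on x is biased (well below 2 here)
coef(lm(y ~ x + w))                   # including w: slope on x is close to 2, but the coefficient
                                      # on w is about 2.4, which is not a causal effect of w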


Under conditional mean independence:

1. $\beta_1$ has a causal interpretation.

The math: the expected change in $Y$ resulting from a change in $X$, holding (a single) $W$ constant, is:

$$E(Y \mid X = x + \Delta x, W = w) - E(Y \mid X = x, W = w)$$
$$= \beta_0 + \beta_1 (x + \Delta x) + \beta_2 w + E(u \mid X = x + \Delta x, W = w) - [\beta_0 + \beta_1 x + \beta_2 w + E(u \mid X = x, W = w)]$$
$$= \beta_1 \Delta x + [E(u \mid X = x + \Delta x, W = w) - E(u \mid X = x, W = w)] = \beta_1 \Delta x$$

where the final equality follows from conditional mean independence:

$$E(u \mid X = x + \Delta x, W = w) = E(u \mid X = x, W = w) = E(u \mid W = w)$$

2. $\hat{\beta}_1$ is unbiased.


3. $\hat{\beta}_2$ does not in general estimate a causal effect.

The math:
Consider the regression model

$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$

where $u$ satisfies the conditional mean independence assumption and where $\beta_1$ and $\beta_2$ are causal effects.

Suppose that $E(u \mid W) = \gamma_0 + \gamma_2 W$ (that is, $E(u \mid W)$ is linear in $W$). Then under conditional mean independence,

$$E(u \mid X, W) = E(u \mid W) = \gamma_0 + \gamma_2 W \quad (*)$$

Let
$$v = u - E(u \mid X, W) \quad (**)$$

so that $E(v \mid X, W) = 0$. Combining $(*)$ and $(**)$ yields

$$u = E(u \mid X, W) + v = \gamma_0 + \gamma_2 W + v, \quad \text{where } E(v \mid X, W) = 0$$

We now substitute $u = \gamma_0 + \gamma_2 W + v$ into the regression:

$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$
$$= \beta_0 + \beta_1 X + \beta_2 W + \gamma_0 + \gamma_2 W + v$$
$$= (\beta_0 + \gamma_0) + \beta_1 X + (\beta_2 + \gamma_2) W + v$$
$$= \delta_0 + \beta_1 X + \delta_2 W + v, \quad \text{where } \delta_0 = \beta_0 + \gamma_0,\ \delta_2 = \beta_2 + \gamma_2$$

Since $E(v \mid X, W) = 0$, the OLS estimators of $\delta_0$, $\beta_1$, and $\delta_2$ are unbiased.

Notice that $E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = \delta_2 = \beta_2 + \gamma_2$.

→ $\hat{\beta}_1$ is an unbiased estimator of the causal effect $\beta_1$, but $\hat{\beta}_2$ is not an unbiased estimator of $\beta_2$.

Under conditional mean independence,

$$E(\hat{\beta}_1) = \beta_1, \quad \text{and} \quad E(\hat{\beta}_2) = \delta_2 = \beta_2 + \gamma_2 \neq \beta_2$$


In summary, if $W$ is such that conditional mean independence is satisfied, then:

• The OLS estimator of the effect of interest, $\hat{\beta}_1$, is unbiased
• The OLS estimator of the coefficient on the control variable, $\hat{\beta}_2$, does not have a causal interpretation. The reason is that the control variable is correlated with omitted variables in the error term, so that $\hat{\beta}_2$ is subject to omitted variable bias

Coming up next: Hypothesis Testing and Confidence Intervals
