Lecture 2: Regression with Multiple Regressors
Preview
• OLS with one regressor can give a biased β̂1 if E(u|Xi) = 0 does not hold
• OLS with multiple regressors can solve the omitted variable bias and recover causal effects
• Ceteris paribus condition
• OLS with multiple regressors can improve predictions
Example - California schools
Characteristics that are important drivers of the final score are also likely to be correlated
with STR.
For example, because of the large immigration into California, the % of students who are still learning English is important for test results and may also be related to class size.
Example - The role of non-native English speakers
• Students who are still learning English might perform worse on standardized tests
than native speakers. Thus, districts with a higher % of non-native speakers might
have, on average, lower scores.
• Districts with many migrants could have larger classes (why?)
• Then, OLS could erroneously produce a large estimate of β1: it mixes the impact of class size with that of migration, comparing small classes with few non-native speakers (high performers) against large classes with many non-native speakers (low performers)
• The effect of STR is biased!
• What if we know the % of non-English speakers in each district (elpcti = English learners percent)?
Example - California schools
Omitted Variable Bias
When both conditions are realized (the omitted variable is a determinant of Y and is correlated with the included regressor X), assumption A1 is violated and the OLS estimator is biased. Neither changing the sample nor increasing the number of observations would solve the problem.
What is the size of the bias?
Formula for the OVB
Let us suppose that assumptions A2 and A3 hold, and let us define ρXu = corr(Xi, ui).
Then the OLS estimator satisfies:

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \underbrace{\rho_{Xu}\,\frac{\sigma_u}{\sigma_X}}_{\text{bias}} \tag{1}$$
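A minimal R simulation of equation (1), on made-up data in which the error contains an omitted variable correlated with the regressor:

set.seed(1)
n <- 1e5
z <- rnorm(n)                            # omitted variable
x <- 0.8 * z + rnorm(n)                  # regressor, correlated with z
u <- 2 * z + rnorm(n)                    # error contains z, so corr(x, u) != 0
y <- 1 + 0.5 * x + u                     # true beta1 = 0.5

beta1_hat <- coef(lm(y ~ x))["x"]
bias_pred <- cor(x, u) * sd(u) / sd(x)   # rho_Xu * sigma_u / sigma_X
c(beta1_hat, 0.5 + bias_pred)            # the two values roughly coincide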
Can I cancel/reduce the bias?
To cancel the bias, I should include all the omitted variables in my model.
A first method to reduce it is to split the sample into groups such that within each group the omitted variable is kept constant (e.g. districts for which the % of non-English-speaking students is similar), but the regressor of interest still has sufficient variability (e.g. the students-to-teachers ratio).
Using this grouping strategy (here a quartile split), we can compute the difference in average score between large and small classes within groups of schools with similar elpct, and use a simple t-test to check whether the difference is significant.
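A sketch of this grouping strategy in R, assuming a data frame ca with (hypothetical) columns score, str and elpct, and an assumed small/large cutoff of 20 students per teacher:

ca$elpct_q <- cut(ca$elpct,
                  breaks = quantile(ca$elpct, probs = seq(0, 1, 0.25)),
                  include.lowest = TRUE)        # quartiles of elpct
for (q in levels(ca$elpct_q)) {
  sub   <- ca[ca$elpct_q == q, ]
  small <- sub$score[sub$str <  20]             # "small" classes (cutoff assumed)
  large <- sub$score[sub$str >= 20]             # "large" classes
  print(t.test(small, large))                   # within-group difference in means
}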
Sample splitting
Linear model with multiple regressors
The sample-splitting approach:
• does not provide a precise causal effect of class size, holding constant the fraction of English learners
• becomes complicated if one includes more than one omitted variable
• becomes impractical as the number of comparison cells increases and the samples within each cell shrink
One solution is to extend the single-variable OLS model to a multiple regression model. This allows us to estimate the causal effect on Yi of changing X1i while holding constant the other regressors (X2i, X3i, etc.), which are the confounding factors causing OVB in the univariate OLS.
For prediction, the multiple regression model can improve accuracy.
The Multiple Regression Model
$$\underset{n\times 1}{Y} \;=\; \underset{n\times(k+1)}{X}\;\underset{(k+1)\times 1}{\beta} \;+\; \underset{n\times 1}{u} \tag{4}$$

where:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{1,1} & \dots & X_{k,1} \\ 1 & X_{1,2} & \dots & X_{k,2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1,n} & \dots & X_{k,n} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$
The OLS estimator
$$\hat\beta = \arg\min_{b}\;\sum_{i=1}^{n}\left[Y_i - b_0 - b_1 X_{1,i} - \dots - b_k X_{k,i}\right]^2 \tag{5}$$

$$\hat\beta = (X'X)^{-1}X'Y \tag{7}$$
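A quick check of equation (7) in R on simulated data: the closed form (X'X)^{-1}X'Y reproduces the lm() coefficients.

set.seed(1)
n <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X        <- cbind(1, x1, x2)                # n x (k+1) design matrix
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # (X'X)^{-1} X'Y
cbind(beta_hat, coef(lm(y ~ x1 + x2)))      # identical columns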
• After including elpct, the parameter on str changes (it is roughly halved).
• Why such a drastic change in the estimate?
• In the univariate model, β1 was underestimated (negative OVB)
• Now, OVB is attenuated. Completely removed?
Assumptions of the Multiple Regression Model
Perfect multicollinearity
Formally, we have perfect multicollinearity if a regressor j can be written as an exact linear combination of the other regressors:

$$X_{j,i} = \sum_{h\neq j} \alpha_h X_{h,i} \qquad \forall\, i = 1,\dots,n$$
Note: most modern software packages automatically check for this and drop one of the redundant regressors.
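For instance, in R, lm() reports an NA coefficient for a perfectly collinear regressor (simulated data):

set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
x3 <- 2 * x1 + 3 * x2            # exact linear combination of x1 and x2
y  <- 1 + x1 + x2 + rnorm(100)
coef(lm(y ~ x1 + x2 + x3))       # the coefficient on x3 is NA (dropped)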
Example - California schools
The dummy variable trap
Suppose we partition the school districts into three categories (rural, suburban, urban) and create three dummy variables (i.e. Xrural, Xsuburban, Xurban) with value 1 if district i is of that specific category, and value 0 if not.
Imagine we want to estimate:

$$Y_i = \beta_0 + \beta_1\,\mathrm{rural}_i + \beta_2\,\mathrm{suburban}_i + \beta_3\,\mathrm{urban}_i + u_i$$

However, because every district belongs to exactly one of the three categories, we have:

$$\mathrm{rural}_i + \mathrm{suburban}_i + \mathrm{urban}_i = 1 \qquad \forall\, i$$

but the vector of ones is a regressor already included in the model (associated with the constant). Thus, to estimate this model we need to drop either one of the three dummy variables (which becomes the reference category) or the constant; for example, dropping the rural dummy:

$$Y_i = \beta_0 + \beta_2\,\mathrm{suburban}_i + \beta_3\,\mathrm{urban}_i + u_i$$
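A small R illustration with simulated districts: entering all three dummies alongside the constant triggers the trap, while factor() picks a reference category automatically.

set.seed(1)
region <- factor(sample(c("rural", "suburban", "urban"), 200, replace = TRUE))
score  <- 650 + 5 * (region == "urban") + rnorm(200)

rural    <- as.numeric(region == "rural")
suburban <- as.numeric(region == "suburban")
urban    <- as.numeric(region == "urban")
coef(lm(score ~ rural + suburban + urban))  # one dummy comes back NA
coef(lm(score ~ region))                    # "rural" is the reference category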
When two (or more) of the regressors are highly correlated, imperfect multicollinearity arises.
Imperfect multicollinearity does not pose any problem for the theory of the OLS estimator. However, if the regressors are imperfectly multicollinear, then the coefficient on at least one individual regressor will be imprecisely estimated; in particular, it will have a large sampling variance.
Control variables and causality
In the multiple regression we are not interested in the causal effects of all the variables. Some of them might be there only to avoid OVB in the causal interpretation of the variables of interest. Thus we have:

A1-bis (conditional mean independence): E(ui | Xi, Wi) = E(ui | Wi)

The conditional mean of u given W does not change even after taking the knowledge of X into account. Thus, when controlling for W, X becomes uncorrelated with u (as if it were randomly assigned).
If A1-bis holds, the coefficients on the variables of interest (X) have a causal interpretation, while those on the controls (W) can be biased.
Goodness of Fit in the Multiple Regression
Similarly to the single regressor case, we can measure the quality of the model by means
of the SER and the R 2 .
The standard error of the regression is:

$$SER = s_{\hat u} = \sqrt{s_{\hat u}^2} = \sqrt{\frac{\sum_{i=1}^{n}\hat u_i^2}{n-k-1}} = \sqrt{\frac{SSR}{n-k-1}} \tag{8}$$

The denominator adjusts for the degrees of freedom lost in estimating the k + 1 parameters. In large samples, this adjustment is negligible.
The R² is defined as in the univariate case:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} \tag{9}$$

However, the R² increases (by construction) every time we add a new variable to our model, since each additional regressor weakly decreases the SSR.
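Equations (8) and (9) computed by hand in R, on simulated data, and checked against summary():

set.seed(1)
n <- 200; k <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
fit   <- lm(y ~ x1 + x2)
u_hat <- resid(fit)
SSR <- sum(u_hat^2); TSS <- sum((y - mean(y))^2)
SER <- sqrt(SSR / (n - k - 1))        # equation (8)
R2  <- 1 - SSR / TSS                  # equation (9)
c(SER, summary(fit)$sigma)            # matches summary(fit)$sigma
c(R2,  summary(fit)$r.squared)        # matches summary(fit)$r.squared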
Adjusted R 2
To correct for this issue, it is better to use the adjusted R² (often denoted R̄²), defined as:

$$\bar R^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS} \tag{10}$$

When adding a new regressor (k increases), the formula for the R̄² entails a trade-off: SSR decreases, but the correction factor (n − 1)/(n − k − 1) increases. So the decision (to add the regressor or not) depends on which effect dominates.
Notes: R̄² is always less than R² and can take negative values.
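The trade-off in action in R, on simulated data: adding a pure-noise regressor mechanically raises R² but can lower R̄².

set.seed(2)
n <- 50
x <- rnorm(n); y <- 1 + x + rnorm(n)
junk <- rnorm(n)                          # regressor unrelated to y
s1 <- summary(lm(y ~ x))
s2 <- summary(lm(y ~ x + junk))
c(s1$r.squared,     s2$r.squared)         # R^2 never decreases
c(s1$adj.r.squared, s2$adj.r.squared)     # adjusted R^2 can fall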
Goodness of Fit and Model Selection
A Note of Caution
When choosing the most appropriate model (among a set of models), the R² or the R̄² should not be the sole criterion.
A high value of the R² only means that your regression model explains the variability in Y.
It does not imply that:
• you have an unbiased estimator for the causal effect (and that you have removed all possible OVB);
• the variables in the model are statistically significant.
The sampling distribution of β̂
Properties of the OLS estimator
As for the single-regressor model, under A1-A4 the OLS estimator is unbiased and consistent. Its sampling variance involves the error covariance matrix:

$$\Sigma_u = E(uu') \tag{14}$$

which (being unobservable) can be estimated as $\hat\Sigma_u = \frac{1}{n-k}\,\hat u\hat u'$. In most applications, we drop A4 (homoskedasticity) and compute heteroskedasticity-robust SEs (the software does it!)
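In R, one common route uses the sandwich and lmtest packages, here on simulated heteroskedastic data:

library(sandwich); library(lmtest)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = exp(x / 2))     # error variance depends on x
fit <- lm(y ~ x)
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))  # HC1-robust t-tests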
In large samples, thanks to the CLT, the OLS estimator is distributed as a multivariate Normal, and for each coefficient:

$$\hat\beta_j \;\xrightarrow{\;d\;}\; N\!\left(\beta_j,\, \sigma^2_{\hat\beta_j}\right) \qquad \forall\, j = 0, 1, \dots, k \tag{15}$$
Hypothesis testing in the multiple regression model
We can rewrite:

$$\frac{\hat\beta_j - E[\hat\beta_j]}{SE(\hat\beta_j)} \sim N(0,1) \qquad \forall\, j = 0, 1, \dots, k$$

• hypothesis testing on a single element βj of the vector β can be carried out using the usual t-test;
• a 95% confidence interval for a single element βj of the vector β can be computed as β̂j ± 1.96 SE(β̂j).
Note: because Var(β̂) also contains the covariances between the different estimates, the t-tests on single elements of the vector β are not independent. Therefore, including/omitting a regressor will change the outcome of every single t-test.
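A manual 95% confidence interval for one coefficient in R, compared with confint() (simulated data):

set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(200)
fit <- lm(y ~ x1 + x2)
b  <- coef(summary(fit))["x1", "Estimate"]
se <- coef(summary(fit))["x1", "Std. Error"]
c(b - 1.96 * se, b + 1.96 * se)   # normal-approximation interval
confint(fit, "x1")                # close: confint() uses the t quantile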
Hypothesis testing in the multiple regression model
Example - California schools
What if we add expenditure per pupil as a further control?
The coefficient on STR becomes −0.29 (SE = 0.48): it turns non-significant and flips the policy implication with respect to the beginning. However, STR and PPexpenditure are correlated (imperfect multicollinearity), hence one may test jointly that both β1 = 0 and β2 = 0.
Testing joint hypothesis
Suppose we test each of the two restrictions separately with a t-test at the 5% level. If the two t-statistics were independent, then:

$$\Pr_{H_0}\left(|t_1| \le 1.96 \ \text{and}\ |t_2| \le 1.96\right) = 0.95^2 = 90.25\%$$

and the size of the test (the probability of rejecting H0 when it is true) is 9.75%, not 5%.
Conclusion: you make many more type-I errors than you would expect.
Definition for q = 2
If q = 2, we can define the F-statistic as:

$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat\rho_{t_1,t_2}\,t_1 t_2}{1 - \hat\rho_{t_1,t_2}^2}\right)$$

where ρ̂t1,t2 is the correlation between t1 and t2. Therefore, the F-stat takes into account the correlation between the different t-stats.
If the single t-stats are uncorrelated (ρ̂t1,t2 = 0), the F-stat is simply the average of the two squared t-statistics:

$$F = \frac{1}{2}\left(t_1^2 + t_2^2\right)$$
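A numerical check in R (homoskedastic case, simulated data): the F-statistic built from t1, t2 and ρ̂ matches the joint test from car::linearHypothesis.

library(car)
set.seed(1)
n <- 500
x1 <- rnorm(n); x2 <- 0.6 * x1 + rnorm(n)    # correlated regressors
y  <- 1 + rnorm(n)                           # H0: beta1 = beta2 = 0 is true
fit <- lm(y ~ x1 + x2)
V   <- vcov(fit)                             # covariance matrix of estimates
t1  <- coef(fit)["x1"] / sqrt(V["x1", "x1"])
t2  <- coef(fit)["x2"] / sqrt(V["x2", "x2"])
rho <- V["x1", "x2"] / sqrt(V["x1", "x1"] * V["x2", "x2"])
0.5 * (t1^2 + t2^2 - 2 * rho * t1 * t2) / (1 - rho^2)  # manual F
linearHypothesis(fit, c("x1 = 0", "x2 = 0"))           # same F value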
F-statistic
• Reject H0 if F > Fα, where Fα is the critical value for a given significance level α
Testing multiple coefficients
Sometimes a single restriction might involve several parameters. For example, economic theory might suggest a restriction under which two parameters have the same value (e.g. β1 = β2, or equivalently β1 − β2 = 0).
In this case a single restriction (q = 1) involves more than one estimated parameter.
To test such a restriction, we can transform the regression model into a form in which the t-test refers to a single parameter.
Example - equality restriction
Let's suppose our model is

$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + u_i \tag{17}$$

To test β1 = β2, define γ1 = β1 − β2 and rewrite the model as Yi = β0 + γ1 X1,i + β2 (X1,i + X2,i) + ui: the restriction then reduces to the single t-test of H0: γ1 = 0.
In R (linearHypothesis comes from the car package):

library(car)
# heteroskedasticity-robust F-test of the joint hypothesis
linearHypothesis(model, c("STR = 0", "expenditureK = 0"), white.adjust = "hc1")
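The same function can also test the single equality restriction directly, or one can run the transformed regression sketched above (variable names as in the joint test; the data frame ca and the column score are hypothetical):

linearHypothesis(model, "STR = expenditureK", white.adjust = "hc1")
# equivalent transformed regression: a t-test on the STR coefficient
# tests gamma1 = beta1 - beta2 = 0
# lm(score ~ STR + I(STR + expenditureK), data = ca)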
Model specification