MULTIPLE REGRESSION ANALYSIS: STATISTICAL
PROPERTIES
Introductory Econometrics: A Modern Approach, 5e
Haoming Liu
National University of Singapore
August 21, 2022
The Expected Value of the OLS Estimators
The Variance of the OLS Estimators
Efficiency of OLS: The Gauss-Markov Theorem
Recap
We motivated OLS estimation using the population regression function
E(y|x1, x2, ..., xk) = β0 + β1x1 + β2x2 + ... + βkxk
Given n observations, we obtained the sample regression function,
ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk,
by choosing the β̂j to minimize the sum of squared residuals.
The β̂j have a ceteris paribus interpretation. For example,
ŷ = β̂1∆x1, if ∆x2 = ... = ∆xk = 0
Recap
We discussed algebraic properties of fitted values and residuals. These
hold for any sample. Also, features of R2, the goodness-of-fit
measure.
The only assumption we needed to discuss the algebraic properties for
a given sample is that we can actually compute the estimates.
Now we turn to statistical properties and study the sampling
distributions of the estimators. We go further than in the
simple regression case, eventually covering statistical inference.
MLR
As with simple regression, there is a set of assumptions under which
OLS is unbiased. We also explicitly consider the bias caused by
omitting a variable that appears in the population model.
Now we label the assumptions with “MLR” (multiple linear
regression).
Assumption MLR.1 (Linear in Parameters)
The model in the population can be written as
y = β0 + β1x1 + β2x2 + ... + βkxk + u
where the βj are the population parameters and u is the unobserved error.
We have seen examples already where y and the xj can be nonlinear
functions of underlying variables, and so the model is flexible.
Assumption MLR.2 (Random Sampling)
We have a random sample of size n from the population,
{(xi1, xi2, ..., xik, yi ) : i = 1, ..., n}
As with SLR, this assumption introduces the data and implies the
data are a representative sample from the population.
Sometimes we will plug a random draw into the population equation:
yi = β0 + β1xi1 + β2xi2 + ... + βkxik + ui ,
which emphasizes that, along with the observed variables, we
effectively draw an unobserved error, ui , for each unit i.
As an example,
log(wagei) = β0 + β1educi + β2IQi + β3experi + β4experi² + ui
lwagei = β0 + β1educi + β2IQi + β3experi + β4expersqi + ui
Assumption MLR.3 (No Perfect Collinearity)
In the sample (and, therefore, in the population), none of the explanatory
variables is constant, and there are no exact linear relationships among
them.
The need to rule out cases where {xij : i = 1, ..., n} has no variation
for each j is clear from simple regression.
There is a new part to the assumption because we have more than
one explanatory variable. We must rule out the (extreme) case that
one (or more) of the explanatory variables is an exact linear function
of the others.
If, say, xi1 is an exact linear function of xi2, ..., xik in the sample, we
say the model suffers from perfect collinearity.
Under perfect collinearity, there are no unique OLS estimators. Stata
and other regression packages will indicate a problem.
Examples of Perfect Collinearity
x1 = 2 ∗ x2
x1 = x3 + 3 ∗ x4
yi = β0 + β1x1i + β2x2i + ui = β0 + 0.5 ∗ β1x1i + 2 ∗ β2x2i + ui
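To see the first example in action, here is a minimal Stata sketch with simulated data (all variable names and numbers here are made up for illustration). Generating x1 as an exact multiple of x2 and regressing y on both produces a collinearity note and one of the two regressors is dropped, just as in the VOTE1 output later in these slides:
. clear
. set obs 100
. set seed 123
. gen x2 = rnormal()
. gen x1 = 2*x2                  // x1 is an exact linear function of x2
. gen y = 1 + 0.5*x1 + rnormal()
. reg y x1 x2                    // Stata drops one regressor and notes the collinearity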
Assumption MLR.3 (No Perfect Collinearity)
Usually perfect collinearity arises from a bad specification of the
population model. A small sample size or bad luck in drawing the
sample can also be the culprit.
Assumption MLR.3 can only hold if n ≥ k + 1, that is, we must have
at least as many observations as we have parameters to estimate.
Suppose that k = 2 and x1 = educ, x2 = exper. If we draw our
sample so that
educi = 2experi
for every i, then MLR.3 is violated. This is very unlikely unless the
sample is small. (In any realistic population there are plenty of people
whose education level is not twice their years of workforce experience.)
Assumption MLR.3 (No Perfect Collinearity)
With the samples we have looked at (n = 680, n = 759, even
n = 173), the presence of perfect collinearity is usually a result of
poor model specification, or defining variables inappropriately.
Such problems can almost always be detected by remembering the
ceteris paribus nature of multiple regression.
EXAMPLE: Do not include the same variable in an equation that is
measured in different units. For example, in a CEO salary equation, it
would make no sense to include firm sales measured in dollars along with
sales measured in millions of dollars. There is no new information once we
include one of these.
The return on equity should be included as a percent or proportion, but
not both.
Assumption MLR.3 (No Perfect Collinearity)
EXAMPLE: Be careful with functional forms! Suppose we start with a
constant elasticity model of family consumption:
log(cons) = β0 + β1 log(inc) + u
How might we allow the elasticity to be nonconstant, but include the
above as a special case? The following does not work:
log(cons) = β0 + β1 log(inc) + β2 log(inc²) + u
because log(inc²) = 2 log(inc), that is, x2 = 2x1, where x1 = log(inc).
Assumption MLR.3 (No Perfect Collinearity)
Instead, we probably mean something like
log(cons) = β0 + β1 log(inc) + β2[log(inc)]² + u
which means x2 = x1². With this choice, x2 is an exact nonlinear
function of x1, but this (fortunately) is allowed in MLR.3.
Tracking down perfect collinearity can be harder when it involves
more than two variables.
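In Stata terms, the two functional forms behave very differently (a hedged sketch; inc and lcons stand in for family income and log consumption in some hypothetical data set):
. gen linc   = log(inc)
. gen lincsq = linc^2          // [log(inc)]^2: a nonlinear function of linc, allowed by MLR.3
. gen linc2  = log(inc^2)      // = 2*log(inc): perfectly collinear with linc
. reg lcons linc lincsq        // runs fine
. reg lcons linc linc2         // one regressor is dropped because of collinearity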
Assumption MLR.3 (No Perfect Collinearity)
EXAMPLE: In VOTE1.DTA, consider regressing voteA on expendA, expendB,
and totexpend = expendA + expendB. Because totexpend is an exact linear
function of the two spending variables, MLR.3 fails.
Assumption MLR.3 (No Perfect Collinearity)
One of the three variables has to be dropped. (Stata does this
automatically, but we should rely on ourselves to properly construct a
model and interpret it.)
The model makes no sense from a ceteris paribus perspective. For
example, β1 is supposed to measure the effect of changing expendA on
voteA, holding fixed expendB and totexpend. But if expendB and
totexpend are held fixed, expendA cannot change!
We would probably drop totexpend and just use the two separate
spending variables.
Assumption MLR.3 (No Perfect Collinearity)
. gen totexpend = expendA + expendB
. reg voteA expendA expendB totexpend
note: expendA omitted because of collinearity
Source | SS df MS Number of obs = 173
-------------+------------------------------ F( 2, 170) = 95.83
Model | 25679.8879 2 12839.944 Prob > F = 0.0000
Residual | 22777.3606 170 133.984474 R-squared = 0.5299
-------------+------------------------------ Adj R-squared = 0.5244
Total | 48457.2486 172 281.728189 Root MSE = 11.575
------------------------------------------------------------------------------
voteA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
expendA | (omitted)
expendB | -.0744583 .0053848 -13.83 0.000 -.0850879 -.0638287
totexpend | .0383308 .0033868 11.32 0.000 .0316452 .0450165
_cons | 49.619 1.426147 34.79 0.000 46.80376 52.43423
------------------------------------------------------------------------------
Assumption MLR.3 (No Perfect Collinearity)
. reg voteA expendA expendB
Source | SS df MS Number of obs = 173
-------------+------------------------------ F( 2, 170) = 95.83
Model | 25679.8879 2 12839.9439 Prob > F = 0.0000
Residual | 22777.3607 170 133.984474 R-squared = 0.5299
-------------+------------------------------ Adj R-squared = 0.5244
Total | 48457.2486 172 281.728189 Root MSE = 11.575
------------------------------------------------------------------------------
voteA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
expendA | .0383308 .0033868 11.32 0.000 .0316452 .0450165
expendB | -.0361275 .0031071 -11.63 0.000 -.042261 -.0299939
_cons | 49.619 1.426147 34.79 0.000 46.80376 52.43423
------------------------------------------------------------------------------
Assumption MLR.3 (No Perfect Collinearity)
The results of the previous regression seem sensible: spending by
candidate A has a positive effect on the share of the vote received by
A, and spending by B has essentially the opposite effect. (If expendA
increases by 10, that is, by $10,000, and expendB is held fixed, voteA is
estimated to increase by about .38 percentage points.)
Note that shareA, which is a nonlinear function of expendA and
expendB,
shareA = 100 · expendA/(expendA + expendB),
can be included along with the two expenditure variables. It allows for
the relative size of spending to matter.
Assumption MLR.3 (No Perfect Collinearity)
. reg voteA expendA expendB shareA
Source | SS df MS Number of obs = 173
-------------+------------------------------ F( 3, 169) = 346.87
Model | 41687.0627 3 13895.6876 Prob > F = 0.0000
Residual | 6770.18584 169 40.0602712 R-squared = 0.8603
-------------+------------------------------ Adj R-squared = 0.8578
Total | 48457.2486 172 281.728189 Root MSE = 6.3293
------------------------------------------------------------------------------
voteA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
expendA | -.0064488 .0029065 -2.22 0.028 -.0121866 -.000711
expendB | .0049463 .0026662 1.86 0.065 -.000317 .0102097
shareA | .5096844 .0254977 19.99 0.000 .4593494 .5600194
_cons | 24.96397 1.459247 17.11 0.000 22.08327 27.84467
------------------------------------------------------------------------------
Assumption MLR.3 (No Perfect Collinearity)
A Key Point
Assumption MLR.3 does not say the explanatory variables have to be
uncorrelated – in the population or sample. Nor does it say they cannot be
“highly” correlated. MLR.3 rules out perfect correlation in the sample,
that is, correlations of ±1.
Again, in practice violations of MLR.3 are rare unless a mistake has
been made in specifying the model.
Multiple regression would be useless if we had to insist x1, ..., xk were
uncorrelated in the sample (or population)!
If the xj were all pairwise uncorrelated, we could just use a bunch of
simple regressions.
Assumption MLR.3 (No Perfect Collinearity)
In an equation like
lwage = β0 + β1educ + β2IQ + β3exper + u,
we fully expect correlation among educ, IQ, and exper. (Already saw
educ and IQ are positively correlated; educ and exper tend to be
negatively correlated (why?).)
If educ were uncorrelated with all other variables that affect lwage, we
could stick with simple regression of lwage on educ to estimate β1.
Multiple regression allows us to estimate ceteris paribus effects
precisely when there is correlation among the xj
MLR.1 to MLR.3
1 y = β0 + β1x1 + β2x2 + ... + βkxk + u
2 random sampling from the population
3 no perfect collinearity in the sample
The last assumption ensures that the OLS estimators are unique and can be
obtained from the first-order conditions (minimizing the sum of squared
residuals).
We need a final assumption for unbiasedness.
Assumption MLR.4 (Zero Conditional Mean)
E(u|x1, x2, ..., xk) = 0 for all (x1, ..., xk)
Remember, the real assumption is E(u|x1, x2, ..., xk) = E(u): the
average value of the error does not change across different slices of
the population defined by x1, ..., xk. Setting E(u) = 0 essentially
defines β0.
If u is correlated with any of the xj , MLR.4 is violated. This is usually
a good way to think about the problem.
Assumption MLR.4 (Zero Conditional Mean)
When Assumption MLR.4 holds, we say x1, ..., xk are exogenous
explanatory variables. If xj is correlated with u, we often say xj is an
endogenous explanatory variable.
Assumption MLR.4 (Zero Conditional Mean)
EXAMPLE: Effects of Class Size on Student Performance
Suppose, for a standardized test score,
score = β0 + β1classize + β2income + u
Even at the same income level, families differ in their interest and
concern about their children’s education. Family support and student
motivation are in u. Are these correlated with class size even though
we have included income? Probably.
Unbiasedness of OLS
Theorem
Under Assumptions MLR.1 through MLR.4, and conditional on
{(xi1, ..., xik) : i = 1, ..., n}, the OLS estimators are unbiased:
E(β̂j ) = βj , j = 0, 1, 2, ..., k
for any values of the βj .
The easiest proof requires matrix algebra. See Appendix 3A for a
proof based on summations.
Often the hope is that if our focus is on, say, x1, we can include
enough other variables in x2, ..., xk to make MLR.4 true, or close to
true.
Inclusion of Irrelevant Variables
It is important to see that the unbiasedness result allows for the βj to
be any value, including zero.
Suppose, then, that we specify the model
lwage = β0 + β1educ + β2exper + β3motheduc + u,
where MLR.1 through MLR.4 hold. Suppose that β3 = 0, but we do
not know that. We estimate the full model by OLS, obtaining the fitted equation
lwage = β̂0 + β̂1educ + β̂2exper + β̂3motheduc
Inclusion of Irrelevant Variables
We automatically know from the unbiasedness result that
E(β̂j ) = βj , j = 0, 1, 2
E(β̂3) = 0
The result that including an irrelevant variable, or overspecifying
the model, does not cause bias in any coefficients follows directly from
the general unbiasedness result. Including an irrelevant variable does,
however, typically increase the standard errors of the other estimates.
Omitted Variable Bias
Leaving a variable out when it should be included in a multiple
regression is a serious problem. This is called excluding a relevant
variable or underspecifying the model.
We can perform a misspecification analysis in this case. The
general case is more complicated.
Consider the case where the correct model has two explanatory
variables:
y = β0 + β1x1 + β2x2 + u
satisfies MLR.1 through MLR.4.
Omitted Variable Bias
If we regress y on x1 and x2, we know the resulting estimators will be
unbiased. But suppose we leave out x2 and use simple regression of y
on x1:
ỹ = β̃0 + β̃1x1
In most cases, we omit x2 because we cannot collect data on it.
We can easily derive the bias in β̃1 (conditional on the sample
outcomes {(xi1, xi2) : i = 1, ..., n}).
Omitted Variable Bias
We already have a relationship between β̃1 and the multiple regression
estimator, β̂1:
β̃1 = β̂1 + β̂2δ̃1
where β̂2 is the multiple regression estimator of β2 and δ̃1 is the slope
coefficient in the auxiliary regression
xi2 on xi1, i = 1, ..., n
Omitted Variable Bias
Now just use the fact that β̂1 and β̂2 are unbiased (or would be if we could
compute them): Conditional on the sample values of x1 and x2,
E(β̃1) = E(β̂1) + E(β̂2)δ̃1
= β1 + β2δ̃1
Therefore,
Bias(β̃1) = β2δ̃1
Recall that δ̃1 has the same sign as the sample correlation Corr(xi1, xi2).
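The algebraic relationship β̃1 = β̂1 + β̂2δ̃1 can be verified in any data set. For example, a quick check in WAGE2.DTA (the data set used later in these slides) might look like the sketch below, provided all three regressions use the same observations:
. use WAGE2, clear
. reg lwage educ IQ            // "long" regression: bhat1 (on educ) and bhat2 (on IQ)
. reg IQ educ                  // auxiliary regression: delta1 is the slope on educ
. reg lwage educ               // "short" regression: btilde1
In the sample, the coefficient on educ from the short regression equals bhat1 + bhat2 × delta1 exactly, which is the identity used in the bias derivation.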
Omitted Variable Bias
The simple regression estimator is unbiased (for the given outcomes
{(xi1, xi2)}) in two cases.
1. β2 = 0. But this means that x2 does not appear in the population
model, so simple regression is the right thing to do.
2. Corr(xi1, xi2) = 0 (in the sample). Then the simple and multiple
regression estimators are identical because δ̃1 = 0.
If β2 ≠ 0 and Corr(xi1, xi2) ≠ 0, then β̃1 is generally biased. We do not
know β2 and might only have a vague idea about the size of δ̃1. But we
often can guess at the signs.
Omitted Variable Bias
Technically, the bias computed holds for a particular “sample” on (x1, x2).
But acting as if what matters is correlation between x1 and x2 in the
population gives us the correct answer when we turn to asymptotic
analysis.
In what follows, we do not make the distinction between the sample and
population correlation between x1 and x2.
Bias in the Simple Regression Estimator of β1
           Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0     Positive Bias       Negative Bias
β2 < 0     Negative Bias       Positive Bias
EXAMPLE: Omitted Ability Bias
lwage = β0 + β1educ + β2abil + u
where abil is “ability.” Essentially by definition, β2 > 0. We also think
Corr(educ, abil) > 0
so that higher ability people get more education, on average.
EXAMPLE: Omitted Ability Bias
In this scenario,
E(β̃1) > β1
so there is an upward bias in simple regression. Our failure to control for
ability leads to (on average) overestimating the return to education. We
attribute some of the effect of ability to education because ability and
education are positively correlated.
Remember, for a particular sample, we can never know whether β̃1 > β1.
But we should be very hesitant to trust a procedure that produces too
large an estimate on average.
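A simulation can make the direction of the bias visible (a minimal Stata sketch with made-up data; the coefficients .08 on educ and .10 on abil and the rest of the design are purely illustrative):
. clear
. set obs 1000
. set seed 42
. gen abil  = rnormal()
. gen educ  = 12 + 2*abil + rnormal()           // educ positively correlated with abil
. gen lwage = 1 + .08*educ + .10*abil + rnormal(0, .3)
. reg lwage educ abil          // "long" regression: coefficient on educ near .08
. reg lwage educ               // "short" regression: coefficient on educ biased upward
In this design the auxiliary slope of abil on educ is about .4, so the short-regression coefficient on educ tends to come out near .08 + .10 × .4 = .12 rather than .08.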
EXAMPLE: Effects of a Tutoring Program on Student
Performance
GPA = β0 + β1tutor + β2abil + u
where tutor is hours spent in tutoring. Again, β2 > 0. Suppose that
students with lower ability tend to use more tutoring:
Corr(tutor, abil) < 0
Then
E(β̃1) = β1 + β2δ̃1 = β1 + (+)(−) < β1
so that our failure to account for ability leads us to underestimate the
effect of tutoring. In fact, it could happen that β1 > 0 but E(β̃1) ≤ 0, so
we tend to find no effect or even a negative effect.
Omitted Variable Bias
Suppose now that the true model is
lwage = β0 + β1educ + β2exper + β3abil + u
and we omit abil. If, as an approximation, we assume educ and exper are
uncorrelated, and exper and abil are uncorrelated, then the bias analysis
of β̃1 from the two-regressor case carries through, but it is now β3 that
matters. So, as a rough guide, β̃1 will have an upward bias because β3 > 0
and Corr(educ, abil) > 0.
In the general case, it should be remembered that correlation of any
xj with an omitted variable generally causes bias in all of the OLS
estimators, not just in β̃j.
See Appendix 3A in Wooldridge for a more detailed treatment.
The Variance of the OLS Estimators
So far, we have assumed
1 y = β0 + β1x1 + β2x2 + . . . + βkxk + u
2 random sampling from the population
3 no perfect collinearity in the sample
4 E(u|x1, x2, ..., xk) = 0
Under MLR.3 we can compute the OLS estimates in our sample.
The other assumptions then ensure that OLS is unbiased (conditional
on the outcomes of the explanatory variables).
Assumption MLR.5 (Homoskedasticity)
We have seen that when an important variable is omitted, OLS is
biased, and we have shown how to obtain the sign of the bias in
simple cases.
As in the simple regression case, to obtain Var(β̂j ) we add a
simplifying assumption: homoskedasticity (constant variance).
The variance of the error, u, does not change with any of x1, x2, ..., xk:
Var(u|x1, x2, ..., xk) = Var(u) = σ2
This assumption can never be guaranteed. We make it for now to get
simple formulas, and to be able to discuss efficiency of OLS.
Assumption MLR.5 (Homoskedasticity)
Assumptions MLR.1 through MLR.4 imply that E(β̂j) = βj for each j.
Adding MLR.5 gives
Var(y|x1, x2, ..., xk) = Var(u|x1, x2, ..., xk) = σ²
Assumptions MLR.1 through MLR.5 are called the Gauss Markov
assumptions.
Assumption MLR.5 (Homoskedasticity)
If we have a savings equation,
sav = β0 + β1inc + β2famsize + β3pareduc + u
where famsize is size of the family and pareduc is total parents'
education, MLR.5 means that the variance in sav cannot depend on
income, family size, or parents' education.
Later we will show how to relax MLR.5, and how to test whether it is
true.
Assumption MLR.5 (Homoskedasticity)
To set up the following theorem, we focus only on the slope
coefficients. (A different formula is needed for Var(β̂0).)
As before, we are computing the variance conditional on the values of
the explanatory variables in the sample.
We need to define two quantities associated with each xj . The first is
the total variation in xj in the sample:
SSTj = ∑ᵢ₌₁ⁿ (xij − x̄j)²
(SSTj/n is the sample variance of xj.)
Assumption MLR.5 (Homoskedasticity)
The second is a measure of correlation between xj and the other
explanatory variables, in the sample. This is the R-squared from the
regression
xij on xi1, xi2, ..., xi,j−1, xi,j+1, ..., xik
That is, we regress xj on all of the other explanatory variables. (y plays no
role here.) Call this R-squared R²j, j = 1, ..., k.
Assumption MLR.5 (Homoskedasticity)
Important: R²j = 1 is ruled out by Assumption MLR.3 because
R²j = 1 means that, in the sample, xj is an exact linear function of
the other explanatory variables.
Any value 0 ≤ R²j < 1 is permitted. As R²j gets closer to one, xj is
more linearly related to the other independent variables.
Theorem (Sampling Variances of OLS Slope Estimators)
Under Assumptions MLR.1 to MLR.5, and conditional on the values of the
explanatory variables in the sample,
Var(β̂j) = σ² / [SSTj(1 − R²j)], j = 1, 2, ..., k.
All five Gauss-Markov assumptions are needed to ensure this formula is
correct.
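For instance (a made-up numerical illustration of the formula, not from the original slides), suppose σ² = 1 and SSTj = 100. Then Var(β̂j) = .01 when R²j = 0, Var(β̂j) = .10 when R²j = .9, and Var(β̂j) = 1.0 when R²j = .99: moving R²j from 0 to .99 inflates the variance a hundredfold, even though none of the assumptions is violated.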
Assumption MLR.5 (Homoskedasticity)
Suppose k = 3,
y = β0 + β1x1 + β2x2 + β3x3 + u
E(u|x1, x2, x3) = 0
Var(u|x1, x2, x3) = γ0 + γ1x1
where x1 ≥ 0 (as are γ0 and γ1). This violates MLR.5, and the standard
variance formula is generally incorrect for all OLS estimators, not just
Var(β̂1).
Assumption MLR.5 (Homoskedasticity)
The variance
Var(β̂j) = σ² / [SSTj(1 − R²j)]
has three components. σ² and SSTj are familiar from simple regression.
The third, 1 − R²j, is new to multiple regression.
Factors Affecting Var(β̂j)
1 As the error variance (in the population), σ2, decreases, Var(β̂j )
decreases. One way to reduce the error variance is to take more stuff
out of the error. That is, add more explanatory variables.
2 As the total sample variation in xj , SSTj , increases, Var(β̂j )
decreases. As in the simple regression case, it is easier to estimate
how xj affects y if we see more variation in xj
Factors Affecting Var(β̂j)
As we mentioned earlier, SSTj/n [or SSTj/(n − 1) – the difference is
unimportant here] is the sample variance of {xij : i = 1, 2, ..., n}. So
we can write, approximately,
SSTj ≈ nσ²j
where σ²j > 0 is the population variance of xj.
We can increase SSTj by increasing the sample size; SSTj is roughly
a linear function of n. [Of the three components in Var(β̂j), this is
the only one that depends systematically on n.]
Factors Affecting Var(β̂j)
Var(β̂j) = σ² / [SSTj(1 − R²j)]
As R²j → 1, Var(β̂j) → ∞. R²j measures how linearly related xj is to the
other explanatory variables.
We get the smallest variance for β̂j when R²j = 0:
Var(β̂j) = σ² / SSTj,
which looks just like the simple regression formula.
Factors Affecting Var(β̂j)
If xj is unrelated to all other independent variables, it is easier to
estimate its ceteris paribus effect on y.
R²j = 0 is very rare. Even small values are not especially common.
In fact, R²j ≈ 1 is somewhat common, and this can cause problems
for getting a sufficiently precise estimate of βj.
Assumption MLR.5 (Homoskedasticity)
Below is a graph of Var(β̂1) as a function of R²1: [figure omitted]
Loosely, R²j “close” to one is called the “problem” of
multicollinearity. Unfortunately, we cannot define what we mean by
“close” in a way that is relevant for all situations. We have ruled out the
case of perfect collinearity, R²j = 1.
Here is an important point: One often hears discussions of
multicollinearity as if high correlation among two or more of the xj is
a violation of an assumption we have made. But it does not violate
any of the Gauss-Markov assumptions, including MLR.3.
We know that if the zero conditional mean assumption is violated,
OLS is not unbiased. If MLR.1 through MLR.4 hold, but
homoskedasticity (constant variance) does not, then
Var(β̂j) = σ² / [SSTj(1 − R²j)]
is not the correct formula.
But multicollinearity does not cause the OLS estimators to be biased.
We still have E(β̂j ) = βj .
Further, any claim that the OLS variance formula is “biased” in the
presence of multicollinearity is also wrong. The formula is correct
under MLR.1 through MLR.5.
In fact, the formula is doing its job: It shows that if R²j is “close” to
one, Var(β̂j) might be very large. If R²j is “close” to one, xj does not
have much sample variation separate from the other explanatory
variables. We are trying to estimate the effect of xj on y, holding
x1, ..., xj−1, xj+1, ..., xk fixed, but the data might not allow us
to do that very precisely.
Because multicollinearity violates none of our assumptions, it is
essentially impossible to state hard rules about when it is a
“problem.” This has not stopped some from trying.
Other than just looking at the R²j, a common “measure” of
multicollinearity is called the variance inflation factor (VIF):
VIFj = 1 / (1 − R²j).
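As an aside (a minimal Stata sketch; the variable names y, x1, x2, x3 are hypothetical), the VIFs can be obtained after regress with estat vif, or computed by hand from the auxiliary regression:
. reg y x1 x2 x3
. estat vif             // one VIFj = 1/(1 - R2_j) per regressor
. reg x1 x2 x3          // auxiliary regression for x1; e(r2) is R2_1
. di 1/(1 - e(r2))      // VIF for x1, computed by hand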
Because
Var(β̂j) = (σ² / SSTj) · VIFj,
the VIFj tells us how many times larger the variance is than in the
“ideal” case of no correlation of xij with xi1, ..., xi,j−1, xi,j+1, ..., xik.
This sometimes leads to silly rules of thumb. For example, one should
be “concerned” if VIFj > 10 (equivalently, R²j > .9).
Is R²j > .9 “large”? Yes, in the sense that it would be better to have
R²j smaller.
But, if we want to control for, say, x2, ..., xk to get a good ceteris
paribus effect of x1 on y, we often have no choice.
A large VIFj can be offset by a large SSTj:
Var(β̂j) = (σ² / SSTj) · VIFj
Remember, SSTj grows roughly linearly with the sample size, n. A
large VIFj can be offset by a large sample size. The value of VIFj per
se is irrelevant. Ultimately, it is Var(β̂j ) that is important.
Even so, at this point, we have no way of knowing whether Var(β̂j ) is
“too large” for the estimate β̂j to be useful. Only when we discuss
confidence intervals and hypothesis testing will this be apparent.
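To make the offsetting argument concrete (a rough back-of-the-envelope calculation using SSTj ≈ nσ²j): with VIFj = 10, Var(β̂j) ≈ 10σ²/(nσ²j), which is the same variance one would get with VIFj = 1 and a sample only one-tenth as large. So a regressor with VIFj = 10 in a sample of 10,000 observations is estimated about as precisely as an uncorrelated regressor in a sample of 1,000.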
Be wary of work that reports a set of multicollinearity “diagnostics”
and concludes nothing useful can be learned because multicollinearity
is “too severe.” Sometimes a VIF of about 10 is used to make such a
claim.
Other “diagnostics” are even more difficult to interpret. Using them
indiscriminately is often a mistake.
Consider an example:
y = β0 + β1x1 + β2x2 + β3x3 + u,
where β1 is the coefficient of interest. In fact, assume x2 and x3 act as
controls, included so that we hope to get a good ceteris paribus estimate
of the effect of x1.
Such controls are often highly correlated. (For example, x2 and x3
could be different standardized test scores.)
The key is that the correlation between x2 and x3 has nothing to do
with Var(β̂1). It is only correlation of x1 with (x2, x3) that matters.
In an example to determine whether communities with larger minority
populations are discriminated against in lending, we might have
percapproved = β0 + β1percminority + β2avginc + β3avghouseval + u,
where β1 is the key coefficient. We might expect avginc and avghouseval
to be highly correlated across communities. But we do not really care
whether we can precisely estimate β2 or β3.
Variances in Misspecified Models
As with bias calculations, we can study the variances of the OLS
estimators in misspecified models.
Consider the same case with (at most) two explanatory variables:
y = β0 + β1x1 + β2x2 + u
which we assume satisfies the Gauss-Markov assumptions.
We run the “short” regression, y on x1, and also the “long”
regression, y on x1, x2:
ỹ = β̃0 + β̃1x1
ŷ = β̂0 + β̂1x1 + β̂2x2
We know from the previous analysis that
Var(β̂1) = σ² / [SST1(1 − R²1)]
conditional on the values xi1 and xi2 in the sample.
What about the simple regression estimator? One can show
Var(β̃1) = σ² / SST1
which is again conditional on {(xi1, xi2) : i = 1, ..., n}.
Whenever xi1 and xi2 are correlated, R²1 > 0, and
Var(β̃1) = σ² / SST1 < σ² / [SST1(1 − R²1)] = Var(β̂1)
So, by omitting x2, we can in fact get an estimator with a smaller
variance, even though it is biased. When we look at both bias and
variance, there is a tradeoff between simple and multiple regression.
In the case R²1 > 0, we can draw two conclusions.
y = β0 + β1x1 + β2x2 + u
1 If β2 ̸= 0, β̃1 is biased, β̂1 is unbiased, but Var(β̃1) < Var(β̂1).
2 If β2 = 0, β̃1 and β̂1 are both unbiased, but Var(β̃1) < Var(β̂1).
Case 2 is clear cut. If β2 = 0, x2 has no (partial) effect on y. When x2 is
correlated with x1, including it along with x1 in the regression makes it
more difficult to estimate the partial effect of x1.
Simple regression is clearly preferred.
Case 1 is more difficult, but there are reasons to prefer the unbiased
estimator, β̂1.
First, the bias in β̃1 does not systematically change with the sample
size. We should assume the bias is as large when n = 1, 000 as when
n = 10.
By contrast, the variances Var(β̃1) and Var(β̂1) both shrink at the
rate 1/n. With a large sample size, the difference between Var(β̃1)
and Var(β̂1) is less important, especially considering the bias in β̃1 is
not shrinking.
The second reason for preferring β̂1 is more subtle. The formulas
Var(β̃1) = σ² / SST1
Var(β̂1) = σ² / [SST1(1 − R²1)]
because they condition on the same explanatory variables, act as if the
error variance does not change when we add x2. But if β2 ≠ 0, the error
variance is in fact smaller once x2 is included (and so is its estimate σ̂²).
In a more advanced course, we would be making a comparison between
Var(β̃1) = η² / SST1
Var(β̂1) = σ² / [SST1(1 − R²1)]
where η² > σ² reflects the larger error variance in the simple regression
analysis.
Estimating the Error Variance
We still need to estimate σ2. With n observations and k + 1 parameters,
we only have
df = n − (k + 1)
degrees of freedom. Recall we lose the k + 1 df due to k + 1 restrictions
on the OLS residuals:
∑ᵢ₌₁ⁿ ûi = 0
∑ᵢ₌₁ⁿ xij ûi = 0, j = 1, 2, ..., k
Unbiased Estimation of σ2
Under the Gauss-Markov assumptions (MLR.1 through MLR.5)
σ̂² = (n − k − 1)⁻¹ ∑ᵢ₌₁ⁿ ûᵢ² = SSR/df
is an unbiased estimator of σ².
This means that, if we divide by n rather than n − k − 1, the bias is
−σ²(k + 1)/n,
which means the estimated variance would be too small, on average.
Unbiased Estimation of σ2
The bias disappears as n increases.
The square root of σ̂², σ̂, is reported by all regression packages (it is
called the standard error of the regression, or root mean squared error).
Note that SSR falls when a new explanatory variable is added, but df
falls, too. So σ̂ can increase or decrease when a new variable is added
in multiple regression.
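In Stata, σ̂² can be recovered from the stored results after regress (a minimal sketch; the regression is the WAGE2.DTA example shown on the next slide):
. use WAGE2, clear
. reg lwage educ IQ exper
. di e(rmse)^2            // sigma-hat squared (Root MSE squared)
. di e(rss)/e(df_r)       // SSR/(n - k - 1): the same number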
The standard error of each β̂j is computed (for the slopes) as
se(β̂j) = σ̂ / √[SSTj(1 − R²j)]
and it will be critical to report these along with the coefficient
estimates.
We have discussed the three factors that affect se(β̂j) already.
Using WAGE2.DTA:
. reg lwage educ IQ exper
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 3, 755) = 69.78
Model | 57.0352742 3 19.0117581 Prob > F = 0.0000
Residual | 205.71337 755 .27246804 R-squared = 0.2171
-------------+------------------------------ Adj R-squared = 0.2140
Total | 262.748644 758 .346634095 Root MSE = .52198
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1069849 .0116513 9.18 0.000 .084112 .1298578
IQ | .0080269 .0015893 5.05 0.000 .0049068 .0111469
exper | .0435405 .0084242 5.17 0.000 .0270028 .0600783
_cons | -.228922 .2299876 -1.00 0.320 -.6804132 .2225692
------------------------------------------------------------------------------
The estimated equation, with standard errors in parentheses:
lwage = −.229 + .107 educ + .0080 IQ + .0435 exper
        (.230)  (.012)      (.0016)    (.0084)
n = 759, R² = .217
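As a check on the variance formula, the reported standard error of educ can be rebuilt by hand from σ̂, SST1, and R²1 (a hedged sketch using the same WAGE2 regression; insamp is a hypothetical marker variable created here):
. use WAGE2, clear
. quietly reg lwage educ IQ exper
. scalar sighat = e(rmse)               // sigma-hat
. gen byte insamp = e(sample)           // flag the estimation sample
. quietly summarize educ if insamp
. scalar SST1 = (r(N) - 1)*r(Var)       // total sample variation in educ
. quietly reg educ IQ exper if insamp   // auxiliary regression for educ
. scalar R2_1 = e(r2)
. di sighat/sqrt(SST1*(1 - R2_1))       // should reproduce se(educ) of about .0117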
Efficiency of OLS: The Gauss-Markov Theorem
How come we use OLS, rather than some other estimation method?
Consider simple regression:
y = β0 + β1x + u
and write, for each i,
yi = β0 + β1xi + ui .
If we average across the n observations we get
ȳ = β0 + β1x̄ + ū
For any i with xi ≠ x̄, subtract and rearrange:
(yi − ȳ)/(xi − x̄) = β1 + (ui − ū)/(xi − x̄)
The last term has a zero expected value under random sampling and
E(u|x) = 0. If xi ≠ x̄ for all i, we could use an estimator
β̆1 = n⁻¹ ∑ᵢ₌₁ⁿ (yi − ȳ)/(xi − x̄)
β̆1 is not the same as the OLS estimator,
β̂1 = ∑ᵢ₌₁ⁿ (xi − x̄)yi / ∑ᵢ₌₁ⁿ (xi − x̄)²
How do we know OLS is better than this new estimator, β̆1?
Generally, we do not. Under MLR.1 to MLR.4, both estimators are
unbiased.
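A small Monte Carlo makes the comparison concrete (a hedged do-file sketch with simulated data; the design – n = 50, β0 = 1, β1 = 2, standard normal x and u, 1,000 replications – is made up for illustration):
capture program drop gmsim
program define gmsim, rclass
    drop _all
    set obs 50
    gen x = rnormal()                    // explanatory variable
    gen y = 1 + 2*x + rnormal()          // true beta1 = 2
    quietly regress y x
    return scalar b_ols = _b[x]          // OLS slope
    egen xbar = mean(x)
    egen ybar = mean(y)
    gen ratio = (y - ybar)/(x - xbar)
    quietly summarize ratio
    return scalar b_alt = r(mean)        // the alternative estimator
end
set seed 1
simulate b_ols = r(b_ols) b_alt = r(b_alt), reps(1000): gmsim
summarize b_ols b_alt                    // both centered near 2; b_alt far more variable
Both estimators should average out close to 2, but the simulated standard deviation of b_alt is much larger, which is what the Gauss-Markov theorem below leads us to expect.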
Under the Gauss-Markov assumptions, however, it can be shown that
Var(β̂1) ≤ Var(β̆1). This means β̂1 has a sampling distribution that is
less spread out around β1 than that of β̆1. When comparing unbiased
estimators, we prefer the one with the smaller variance.
We can make very general statements for the multiple regression case,
provided the 5 Gauss-Markov assumptions hold.
However, we must also limit the class of estimators that we can
compare with OLS.
THEOREM (Gauss-Markov)
Under Assumptions MLR.1 through MLR.5, the OLS estimators β̂0, β̂1, ...,
β̂k are the best linear unbiased estimators (BLUEs)
Start from the end of “BLUE” and work backwards:
E (estimator) We must be able to compute an estimate from the
observable data, using a fixed rule.
L (linear) The estimator is a linear function of {yi : i = 1, 2, ..., n}. It
can be a nonlinear function of the explanatory variables.
These estimators have the general form
β̃j = ∑ᵢ₌₁ⁿ wij yi
where the {wij : i = 1, ..., n} can be any functions of
{(xi1, ..., xik) : i = 1, ..., n}.
The OLS estimators can be written in this way.
U (unbiased)
We must impose enough restrictions on the wij – we omit those here – so
that
E(β̃j) = βj, j = 0, 1, ..., k
(conditional on {(xi1, ..., xik) : i = 1, ..., n}).
We know the OLS estimators are unbiased under MLR.1 through MLR.4.
So are a lot of other linear estimators.
B (best)
This means smallest variance (which makes sense once we impose
unbiasedness). In other words, what can be shown is that, under MLR.1
through MLR.5, and conditional on the explanatory variables in the
sample,
Var(β̂j) ≤ Var(β̃j) for all j
(and usually the inequality is strict).
If we do not impose unbiasedness, then we can use silly rules – such as
β̃j = 1 always – to get estimators with zero variance.
How do we use the GM Theorem? If the Gauss-Markov assumptions
hold, and we insist on unbiased estimators that are also linear
functions of {yi : i = 1, 2, ..., n}, then we need look no further than
OLS: it delivers the smallest possible variances.
It might be possible (but even so, not practical) to find unbiased
estimators that are nonlinear functions of {yi : i = 1, 2, ..., n} that
have smaller variances than OLS. The GM Theorem only allows linear
estimators in the comparison group.
Appendix 3A contains a proof of the GM Theorem.
If MLR.5 fails, that is, Var(u|x1, ..., xk) depends on one or more xj ,
the GM conclusions do not hold. There may be linear, unbiased
estimators of the βj with smaller variance.
Remember: Failure of MLR.5 does not cause bias in the β̂j , but it
does have two consequences:
1. The usual formulas for Var(β̂j), and therefore for se(β̂j), are
wrong.
2. The β̂j are no longer BLUE.
The first of these is more serious, as it will directly affect statistical
inference (next). The second consequence means we may want to
search for estimators other than OLS. This is not so easy. And with a
large sample, it may not be very important.
More Related Content

PDF
Is lmanalysis-131124184049-phpapp02
caselyndelacruz
 
PPTX
Patinkin real balance effect
senthamizh veena
 
PPT
Macroeconomics chapter 18
MDevSNPT
 
PPTX
General equilibrium ppt
DeepinderKaur38
 
PPSX
Welfare economics
Prabha Panth
 
PPSX
Market failure
Prabha Panth
 
PPTX
Partial equilibrium, reference pricing and price distortion
Devegowda S R
 
PPTX
Bergson social welfare function(1).pptx
jaheermuktharkp
 
Is lmanalysis-131124184049-phpapp02
caselyndelacruz
 
Patinkin real balance effect
senthamizh veena
 
Macroeconomics chapter 18
MDevSNPT
 
General equilibrium ppt
DeepinderKaur38
 
Welfare economics
Prabha Panth
 
Market failure
Prabha Panth
 
Partial equilibrium, reference pricing and price distortion
Devegowda S R
 
Bergson social welfare function(1).pptx
jaheermuktharkp
 

What's hot (20)

PPTX
Liquidity Preference Theory
efinancemanagement.com
 
PPTX
Arrows Impossibility Theorem.pptx
jaheermuktharkp
 
PPTX
stackelberg Duopoly model
athira thayyil
 
PPSX
Patinkin's Real Balance Effect
Prabha Panth
 
PDF
The adding up problem product exhaustion theorem yohannes mengesha
Yohannes Mengesha, PhD Fellow
 
PPTX
Ramsey-Cass-Koopmans model.pptx
TesfahunGetachew1
 
PPTX
Leontief Paradox.pptx
SuzOn3
 
PPTX
Offer curve
Dr. Sunil Chandanshive
 
PPT
Boumals theory of sales maximisation
Manish Kumar
 
PPT
BAUMOL THEORY.ppt
SebaMohanty1
 
PPTX
Bertrand competition presentation
Robin McKinnie
 
PDF
Chapter 6 - Romer Model
Ryan Herzog
 
PPTX
arrowsimpossibilitytheorem-220603081823-4733ec7f (1).pptx
PriyadharshanBobby
 
PDF
Devaluation Marshall Learner Approach
Indore Management Institute & Research Centre
 
PPSX
Sylos labini’s model of limit pricing
Prabha Panth
 
PPSX
Tobin's Portfolio demand for money
Prabha Panth
 
PPT
Krugman
Marcus Markus
 
PPTX
Exchange Control & Exchange Rate
Dhina Karan
 
PPTX
Tobin’s q theory
Sana Hassan Afridi
 
PPSX
6. joan robinson's model
Prabha Panth
 
Liquidity Preference Theory
efinancemanagement.com
 
Arrows Impossibility Theorem.pptx
jaheermuktharkp
 
stackelberg Duopoly model
athira thayyil
 
Patinkin's Real Balance Effect
Prabha Panth
 
The adding up problem product exhaustion theorem yohannes mengesha
Yohannes Mengesha, PhD Fellow
 
Ramsey-Cass-Koopmans model.pptx
TesfahunGetachew1
 
Leontief Paradox.pptx
SuzOn3
 
Boumals theory of sales maximisation
Manish Kumar
 
BAUMOL THEORY.ppt
SebaMohanty1
 
Bertrand competition presentation
Robin McKinnie
 
Chapter 6 - Romer Model
Ryan Herzog
 
arrowsimpossibilitytheorem-220603081823-4733ec7f (1).pptx
PriyadharshanBobby
 
Devaluation Marshall Learner Approach
Indore Management Institute & Research Centre
 
Sylos labini’s model of limit pricing
Prabha Panth
 
Tobin's Portfolio demand for money
Prabha Panth
 
Krugman
Marcus Markus
 
Exchange Control & Exchange Rate
Dhina Karan
 
Tobin’s q theory
Sana Hassan Afridi
 
6. joan robinson's model
Prabha Panth
 
Ad

Similar to Chapter5.pdf.pdf (20)

PDF
Chapter6.pdf.pdf
ROBERTOENRIQUEGARCAA1
 
PPTX
Multicolinearity
Pawan Kawan
 
PPTX
Ch_03_Wooldridge_5e_PPT Econometrics.pptx
zhihanyu3
 
PDF
Chapter-3.pdf
Abebaw31
 
PPTX
Multiple Linear Regression.pptx
BHUSHANKPATEL
 
PDF
Chapter4
Vu Vo
 
PPT
Chapter 3 Multiple linear regression.ppt
aschalew shiferaw
 
PPT
Econometrics ch11
Baterdene Batchuluun
 
PPT
Econometrics_ch11.ppt
MewdedDelelegn
 
PDF
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Maninda Edirisooriya
 
PDF
Multiple Regression Model.pdf
UsamaIqbal83
 
PDF
Chapter3 econometrics
Vu Vo
 
PDF
Multicollinearity1
Muhammad Ali
 
PPTX
Multicollinearity PPT
GunjanKhandelwal13
 
PDF
Chapter4_Multi_Reg_Estim.pdf.pdf
ROBERTOENRIQUEGARCAA1
 
PDF
Ch5_OLSasymptotic.pdf
ROBERTOENRIQUEGARCAA1
 
PDF
pdf (9).pdf
ROBERTOENRIQUEGARCAA1
 
PPTX
Lecture - 8 MLR.pptx
iris765749
 
PPTX
Multiple regression (1)
Shakeel Nouman
 
Chapter6.pdf.pdf
ROBERTOENRIQUEGARCAA1
 
Multicolinearity
Pawan Kawan
 
Ch_03_Wooldridge_5e_PPT Econometrics.pptx
zhihanyu3
 
Chapter-3.pdf
Abebaw31
 
Multiple Linear Regression.pptx
BHUSHANKPATEL
 
Chapter4
Vu Vo
 
Chapter 3 Multiple linear regression.ppt
aschalew shiferaw
 
Econometrics ch11
Baterdene Batchuluun
 
Econometrics_ch11.ppt
MewdedDelelegn
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Maninda Edirisooriya
 
Multiple Regression Model.pdf
UsamaIqbal83
 
Chapter3 econometrics
Vu Vo
 
Multicollinearity1
Muhammad Ali
 
Multicollinearity PPT
GunjanKhandelwal13
 
Chapter4_Multi_Reg_Estim.pdf.pdf
ROBERTOENRIQUEGARCAA1
 
Ch5_OLSasymptotic.pdf
ROBERTOENRIQUEGARCAA1
 
Lecture - 8 MLR.pptx
iris765749
 
Multiple regression (1)
Shakeel Nouman
 
Ad

More from ROBERTOENRIQUEGARCAA1 (20)

PDF
Incetidumbre y susenso en el cine curso de cfg
ROBERTOENRIQUEGARCAA1
 
PDF
psicologia y Narrativa en el cine curso de cfg
ROBERTOENRIQUEGARCAA1
 
PDF
incertidumbre, suspenso y psicologia en el cine cfg
ROBERTOENRIQUEGARCAA1
 
PDF
metaforas cognitivas en el cine curso cfg
ROBERTOENRIQUEGARCAA1
 
PDF
Metaforas sonoras en el cine clase de cfg de cine y neurociencias
ROBERTOENRIQUEGARCAA1
 
PPT
Memory Lecture Psychology Introduction part 1
ROBERTOENRIQUEGARCAA1
 
PDF
Sherlock.pdf
ROBERTOENRIQUEGARCAA1
 
PDF
Cognicion Social clase
ROBERTOENRIQUEGARCAA1
 
PPT
surveys non experimental
ROBERTOENRIQUEGARCAA1
 
PPT
experimental research
ROBERTOENRIQUEGARCAA1
 
PPT
non experimental
ROBERTOENRIQUEGARCAA1
 
PPT
quasi experimental research
ROBERTOENRIQUEGARCAA1
 
PPT
variables cont
ROBERTOENRIQUEGARCAA1
 
PPT
sampling experimental
ROBERTOENRIQUEGARCAA1
 
PPT
experimental designs
ROBERTOENRIQUEGARCAA1
 
PPT
experimental control
ROBERTOENRIQUEGARCAA1
 
PPT
validity reliability
ROBERTOENRIQUEGARCAA1
 
PPT
Experiment basics
ROBERTOENRIQUEGARCAA1
 
PPTX
Week 11.pptx
ROBERTOENRIQUEGARCAA1
 
Incetidumbre y susenso en el cine curso de cfg
ROBERTOENRIQUEGARCAA1
 
psicologia y Narrativa en el cine curso de cfg
ROBERTOENRIQUEGARCAA1
 
incertidumbre, suspenso y psicologia en el cine cfg
ROBERTOENRIQUEGARCAA1
 
metaforas cognitivas en el cine curso cfg
ROBERTOENRIQUEGARCAA1
 
Metaforas sonoras en el cine clase de cfg de cine y neurociencias
ROBERTOENRIQUEGARCAA1
 
Memory Lecture Psychology Introduction part 1
ROBERTOENRIQUEGARCAA1
 
Sherlock.pdf
ROBERTOENRIQUEGARCAA1
 
Cognicion Social clase
ROBERTOENRIQUEGARCAA1
 
surveys non experimental
ROBERTOENRIQUEGARCAA1
 
experimental research
ROBERTOENRIQUEGARCAA1
 
non experimental
ROBERTOENRIQUEGARCAA1
 
quasi experimental research
ROBERTOENRIQUEGARCAA1
 
variables cont
ROBERTOENRIQUEGARCAA1
 
sampling experimental
ROBERTOENRIQUEGARCAA1
 
experimental designs
ROBERTOENRIQUEGARCAA1
 
experimental control
ROBERTOENRIQUEGARCAA1
 
validity reliability
ROBERTOENRIQUEGARCAA1
 
Experiment basics
ROBERTOENRIQUEGARCAA1
 
Week 11.pptx
ROBERTOENRIQUEGARCAA1
 

Recently uploaded (20)

PDF
[Cameron] Robust Inference for Regression with Clustered Data - slides (2015)...
soarnagi1
 
PPTX
Accounting for liabilities stockholderss
Adugna37
 
PPTX
Session 1 FTP 2023 25th June 25 TRADE FINANCE
NarinderKumarBhasin
 
PDF
LM Curve Deri IS-LM Framework sess 10.pdf
mrigankjain19
 
PPTX
Principles of Management buisness sti.pptx
CarToonMaNia5
 
PDF
Joseph Patrick Roop - Roth IRAs: Weighing the Pros and Cons
Joseph Roop
 
PPT
Time Value of Money_Fundamentals of Financial Management
nafisa791613
 
PDF
Tran Quoc Bao named in Fortune - Asia Healthcare Leadership Index 2025
Gorman Bain Capital
 
PDF
Torex to Acquire Prime Mining - July 2025
Adnet Communications
 
PPTX
Maintenance_of_Genetic_Purity_of_Seed.pptx
prasadbishnu190
 
PDF
Illuminating the Future: Universal Electrification in South Africa by Matthew...
Matthews Bantsijang
 
PPTX
办理加利福尼亚大学圣芭芭拉分校文凭|购买UCSB毕业证录取通知书学位证书
1cz3lou8
 
PDF
Why Most People Misunderstand Risk in Personal Finance.
Harsh Mishra
 
PPT
financial system chapter 1 overview of FS
kumlachewTegegn1
 
PPTX
Accounting for Managers and businesses .pptx
Nikita Bhardwaj
 
PDF
2025 Mid-year Budget Review_SPEECH_FINAL_23ndJuly2025_v5.pdf
JeorgeWilsonKingson1
 
PPTX
Unit1_Managerial_Economics_SEM 1-PPT.pptx
RISHIRISHI87
 
PPTX
Judaism-group-1.pptx for reporting grade 11
ayselprettysomuch
 
PPT
The reporting entity and financial statements
Adugna37
 
PPTX
d and f block elements chapter 4 in class 12
dynamicplays04
 
[Cameron] Robust Inference for Regression with Clustered Data - slides (2015)...
soarnagi1
 
Accounting for liabilities stockholderss
Adugna37
 
Session 1 FTP 2023 25th June 25 TRADE FINANCE
NarinderKumarBhasin
 
LM Curve Deri IS-LM Framework sess 10.pdf
mrigankjain19
 
Principles of Management buisness sti.pptx
CarToonMaNia5
 
Joseph Patrick Roop - Roth IRAs: Weighing the Pros and Cons
Joseph Roop
 
Time Value of Money_Fundamentals of Financial Management
nafisa791613
 
Tran Quoc Bao named in Fortune - Asia Healthcare Leadership Index 2025
Gorman Bain Capital
 
Torex to Acquire Prime Mining - July 2025
Adnet Communications
 
Maintenance_of_Genetic_Purity_of_Seed.pptx
prasadbishnu190
 
Illuminating the Future: Universal Electrification in South Africa by Matthew...
Matthews Bantsijang
 
办理加利福尼亚大学圣芭芭拉分校文凭|购买UCSB毕业证录取通知书学位证书
1cz3lou8
 
Why Most People Misunderstand Risk in Personal Finance.
Harsh Mishra
 
financial system chapter 1 overview of FS
kumlachewTegegn1
 
Accounting for Managers and businesses .pptx
Nikita Bhardwaj
 
2025 Mid-year Budget Review_SPEECH_FINAL_23ndJuly2025_v5.pdf
JeorgeWilsonKingson1
 
Unit1_Managerial_Economics_SEM 1-PPT.pptx
RISHIRISHI87
 
Judaism-group-1.pptx for reporting grade 11
ayselprettysomuch
 
The reporting entity and financial statements
Adugna37
 
d and f block elements chapter 4 in class 12
dynamicplays04
 

Chapter5.pdf.pdf

  • 1. MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES Introductory Econometrics: A Modern Approach, 5e Haoming Liu National University of Singapore August 21, 2022 The Expected Value of the OLS Estimators The Variance of the OLS Estimators Efficiency of OLS: The Gauss-Markov Theorem Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 1 / 82
  • 2. Recap We motived OLS estimation using the population regression function E(y|x1, x2, ..., xk) = β0 + β1x1 + β2x2 + ... + βkxk Given n observations, we obtained the sample regression function, ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk, by choosing the β̂j to minimize the sum of squared residuals. The β̂j have a ceteris paribus interpretation. For example, ŷ = β̂1∆x1, if ∆x2 = ... = ∆xk = 0 Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 2 / 82
  • 3. Recap We discussed algebraic properties of fitted values and residuals. These hold for any sample. Also, features of R2, the goodness-of-fit measure. The only assumption we needed to discuss the algebraic properties for a given sample is that we can actually compute the estimates. Now we turn to statistical properties and features of the study the sampling distributions of the estimators. We go further than in the simple regression case, eventually covering statistical inference. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 3 / 82
  • 4. MLR As with simple regression, there is a set of assumptions under which OLS is unbiased. We also explicitly consider the bias caused by omitting a variable that appears in the population model. Now we label the assumptions with “MLR” (multiple linear regression). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 4 / 82
  • 5. Assumption MLR.1 (Linear in Parameters) The model in the population can be written as y = β0 + β1x1 + β2x2 + ... + βkxk + u where the βj are the population parameters and u is the unobserved error. We have seen examples already where y and the xj can be nonlinear functions of underlying variables, and so the model is flexible. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 5 / 82
  • 6. Assumption MLR.2 (Random Sampling) We have a random sample of size n from the population, {(xi1, xi2, ..., xik, yi ) : i = 1, ..., n} As with SLR, this assumption introduces the data and implies the data are a representative sample from the population. Sometimes we will plug a random draw into the population equation: yi = β0 + β1xi1 + β2xi2 + ... + βkxik + ui , which emphasizes that, along with the observed variables, we effectively draw an unobserved error, ui , for each unit i. As an example, log(wagei ) = β0 + β1educi + β2IQi + β3experi + β4exper2 i + ui lwagei = β0 + β1educi + β2IQi + β3experi + β4expersqi + ui Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 6 / 82
  • 7. Assumption MLR.3 (No Perfect Collinearity) In the sample (and, therefore, in the population), none of the explanatory variables is constant, and there are no exact linear relationships among them. The need to rule out cases where {xij : i = 1, ..., n} has no variation for each j is clear from simple regression. There is a new part to the assumption because we have more than one explanatory variable. We must rule out the (extreme) case that one (or more) of the explanatory variables is an exact linear function of the others. If, say, xi1 is an exact linear function of xi2, ..., xik in the sample, we say the model suffers from perfect collinearity. Under perfect collinearity, there are no unique OLS estimators. Stata and other regression packages will indicate a problem. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 7 / 82
  • 8. Examples of Perfect Collinearity x1 = 2 ∗ x2 x1 = x3 + 3 ∗ x4 yi = β0 + β1x1i + β2x2i + ui = β0 + 0.5 ∗ β1x1i + 2 ∗ β2x2i + ui Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 8 / 82
  • 9. Assumption MLR.3 (No Perfect Collinearity) Usually perfect collinearity arises from a bad specification of the population model. A small sample size or bad luck in drawing the sample can also be the culprit. Assumption MLR.3 can only hold if n ≥ k + 1, that is, we must have at least as many observations as we have parameters to estimate. Suppose that k = 2 and x1 = educ, x2 = exper. If we draw our sample so that educi = 2experi for every i, then MLR.3 is violated. This is very unlikely unless the sample is small. (In any realistic population there are plenty of people whose education level is not twice their years of workforce experience.) Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 9 / 82
  • 10. Assumption MLR.3 (No Perfect Collinearity) With the samples we have looked at (n = 680, n = 759, even n = 173), the presence of perfect collinearity is usually a result of poor model specification, or defining variables inappropriately. Such problems can almost always be detected by remembering the ceteris paribus nature of multiple regression. EXAMPLE: Do not include the same variable in an equation that is measured in different units. For example, in a CEO salary equation, it would make no sense to include firm sales measured in dollars along with sales measured in millions of dollars. There is no new information once we include one of these. The return on equity should be included as a percent or proportion, but not both. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 10 / 82
  • 11. Assumption MLR.3 (No Perfect Collinearity) EXAMPLE: Be careful with functional forms! Suppose we start with a constant elasticity model of family consumption: log(cons) = β0 + β1 log(inc) + u How might we allow the elasticity to be nonconstant, but include the above as a special case? The following does not work: log(cons) = β0 + β1 log(inc) + β2 log(inc²) + u because log(inc²) = 2 log(inc), that is, x2 = 2x1, where x1 = log(inc). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 11 / 82
  • 12. Assumption MLR.3 (No Perfect Collinearity) Instead, we probably mean something like log(cons) = β0 + β1 log(inc) + β2[log(inc)]² + u which means x2 = x1², where x1 = log(inc). With this choice, x2 is an exact nonlinear function of x1, but this (fortunately) is allowed in MLR.3. Tracking down perfect collinearity can be harder when it involves more than two variables. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 12 / 82
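As a quick check of the two specifications, here is a hedged Stata sketch on simulated data (hypothetical names and values): the regressor log(inc²) is perfectly collinear with log(inc) and gets dropped, while [log(inc)]² is a nonlinear function of log(inc) and is estimated without difficulty.

clear
set seed 456
set obs 500
gen inc      = exp(rnormal(3, 0.5))              // positive, roughly log-normal income
gen consump  = exp(1 + 0.8*log(inc) + rnormal(0, 0.2))
gen lconsump = log(consump)
gen linc     = log(inc)
gen linc_bad = log(inc^2)                        // equals 2*log(inc): perfect collinearity
gen linc_sq  = log(inc)^2                        // [log(inc)]^2: allowed under MLR.3
reg lconsump linc linc_bad                       // one regressor is dropped
reg lconsump linc linc_sq                        // both slopes are estimated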
  • 13. Assumption MLR.3 (No Perfect Collinearity) EXAMPLE: In VOTE1.DTA, consider regressing voteA (candidate A's share of the vote) on expendA, expendB, and totexpend = expendA + expendB, the total of the two campaign expenditures. (Go to Poll Everywhere.) Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 13 / 82
  • 14. Assumption MLR.3 (No Perfect Collinearity) One of the three variables has to be dropped. (Stata does this automatically, but we should rely on ourselves to properly construct a model and interpret it.) The model makes no sense from a ceteris paribus perspective. For example, β1 is supposed to measure the effect of changing expendA on voteA, holding fixed expendB and totexpend. But if expendB and totexpend are held fixed, expendA cannot change! We would probably drop totexpend and just use the two separate spending variables. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 14 / 82
  • 15. Assumption MLR.3 (No Perfect Collinearity)
. gen totexpend = expendA + expendB
. reg voteA expendA expendB totexpend
note: expendA omitted because of collinearity

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  2,   170) =   95.83
       Model |  25679.8879     2   12839.944           Prob > F      =  0.0000
    Residual |  22777.3606   170  133.984474           R-squared     =  0.5299
-------------+------------------------------           Adj R-squared =  0.5244
       Total |  48457.2486   172  281.728189           Root MSE      =  11.575

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expendA |  (omitted)
     expendB |  -.0744583   .0053848   -13.83   0.000    -.0850879   -.0638287
   totexpend |   .0383308   .0033868    11.32   0.000     .0316452    .0450165
       _cons |     49.619   1.426147    34.79   0.000     46.80376    52.43423
------------------------------------------------------------------------------
Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 15 / 82
  • 16. Assumption MLR.3 (No Perfect Collinearity)
. reg voteA expendA expendB

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  2,   170) =   95.83
       Model |  25679.8879     2  12839.9439           Prob > F      =  0.0000
    Residual |  22777.3607   170  133.984474           R-squared     =  0.5299
-------------+------------------------------           Adj R-squared =  0.5244
       Total |  48457.2486   172  281.728189           Root MSE      =  11.575

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expendA |   .0383308   .0033868    11.32   0.000     .0316452    .0450165
     expendB |  -.0361275   .0031071   -11.63   0.000     -.042261   -.0299939
       _cons |     49.619   1.426147    34.79   0.000     46.80376    52.43423
------------------------------------------------------------------------------
Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 16 / 82
  • 17. Assumption MLR.3 (No Perfect Collinearity) The results of the previous regression seem sensible: spending by candidate A has a positive effect on the share of the vote received by A, and spending by B has essentially the opposite effect. (If expendA increases by 10, so $10,000, and expendB is held fixed, voteA is estimated to increase by about .38 percentage points.) Note that shareA, which is a nonlinear function of expendA and expendB, shareA = 100 · expendA/(expendA + expendB), can be included along with the two expenditure variables. It allows for the relative size of spending to matter. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 17 / 82
  • 18. Assumption MLR.3 (No Perfect Collinearity)
. reg voteA expendA expendB shareA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  346.87
       Model |  41687.0627     3  13895.6876           Prob > F      =  0.0000
    Residual |  6770.18584   169  40.0602712           R-squared     =  0.8603
-------------+------------------------------           Adj R-squared =  0.8578
       Total |  48457.2486   172  281.728189           Root MSE      =  6.3293

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expendA |  -.0064488   .0029065    -2.22   0.028    -.0121866    -.000711
     expendB |   .0049463   .0026662     1.86   0.065     -.000317    .0102097
      shareA |   .5096844   .0254977    19.99   0.000     .4593494    .5600194
       _cons |   24.96397   1.459247    17.11   0.000     22.08327    27.84467
------------------------------------------------------------------------------
Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 18 / 82
  • 19. Assumption MLR.3 (No Perfect Collinearity) A Key Point Assumption MLR.3 does not say the explanatory variables have to be uncorrelated – in the population or sample. Nor does it say they cannot be “highly” correlated. MLR.3 rules out perfect correlation in the sample, that is, correlations of ±1. Again, in practice violations of MLR.3 are rare unless a mistake has been made in specifying the model. Multiple regression would be useless if we had to insist x1, ..., xk were uncorrelated in the sample (or population)! If the xj were all pairwise uncorrelated, we could just use a bunch of simple regressions. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 19 / 82
  • 20. Assumption MLR.3 (No Perfect Collinearity) In an equation like lwage = β0 + β1educ + β2IQ + β3exper + u, we fully expect correlation among educ, IQ, and exper. (Already saw educ and IQ are positively correlated; educ and exper tend to be negatively correlated (why?).) If educ were uncorrelated with all other variables that affect lwage, we could stick with simple regression of lwage on educ to estimate β1. Multiple regression allows us to estimate ceteris paribus effects precisely when there is correlation among the xj Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 20 / 82
  • 21. MLR.1 to MLR.3 1 y = β0 + β1x1 + β2x2 + ... + βkxk + u 2 random sampling from the population 3 no perfect collinearity in the sample The last assumption ensures that the OLS estimators are unique and can be obtained from the first order conditions (minimizing the sum of squared residuals). We need a final assumption for unbiasedness. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 21 / 82
  • 22. Assumption MLR.4 (Zero Conditional Mean) E(u|x1, x2, ..., xk) = 0 for all (x1, ..., xk) Remember, the real assumption is E(u|x1, x2, ..., xk) = E(u): the average value of the error does not change across different slices of the population defined by x1, ..., xk. Setting E(u) = 0 essentially defines β0. If u is correlated with any of the xj , MLR.4 is violated. This is usually a good way to think about the problem. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 22 / 82
  • 23. Assumption MLR.4 (Zero Conditional Mean) When Assumption MLR.4 holds, we say x1, ..., xk are exogenous explanatory variables. If xj is correlated with u, we often say xj is an endogenous explanatory variable. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 23 / 82
  • 24. Assumption MLR.4 (Zero Conditional Mean) EXAMPLE: Effects of Class Size on Student Performance Suppose, for a standardized test score, score = β0 + β1classize + β2income + u Even at the same income level, families differ in their interest and concern about their children’s education. Family support and student motivation are in u. Are these correlated with class size even though we have included income? Probably. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 24 / 82
  • 25. Unbiasedness of OLS Theorem Under Assumptions MLR.1 through MLR.4, and conditional on {(xi1, ..., xik) : i = 1, ..., n}, the OLS estimators are unbiased: E(β̂j ) = βj , j = 0, 1, 2, ..., k for any values of the βj . The easiest proof requires matrix algebra. See Appendix 3A for a proof based on summations. Often the hope is that if our focus is on, say, x1, we can include enough other variables in x2, ..., xk to make MLR.4 true, or close to true. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 25 / 82
  • 26. Inclusion of Irrelevant Variables It is important to see that the unbiasedness result allows for the βj to be any value, including zero. Suppose, then, that we specify the model lwage = β0 + β1educ + β2exper + β3motheduc + u, where MLR.1 through MLR.4 hold. Suppose that β3 = 0, but we do not know that. We estimate the full model by OLS: lwage = β̂0 + β̂1educ + β̂2exper + β̂3motheduc Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 26 / 82
  • 27. Inclusion of Irrelevant Variables We automatically know from the unbiasedness result that E(β̂j ) = βj , j = 0, 1, 2 E(β̂3) = 0 The result that including an irrelevant variable, or overspecifying the model, does not cause bias in any of the coefficients is sometimes presented as needing a separate argument; in fact, it follows directly from the general unbiasedness result. Including an irrelevant variable does, however, generally increase the variances (and so the standard errors) of the other estimates. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 27 / 82
  • 28. Omitted Variable Bias Leaving a variable out of a multiple regression when it should be included is a serious problem. This is called excluding a relevant variable or underspecifying the model. We can perform a misspecification analysis in this case. The general case is more complicated, so consider the case where the correct model has two explanatory variables: y = β0 + β1x1 + β2x2 + u satisfies MLR.1 through MLR.4. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 28 / 82
  • 29. Omitted Variable Bias If we regress y on x1 and x2, we know the resulting estimators will be unbiased. But suppose we leave out x2 and use simple regression of y on x1: ỹ = β̃0 + β̃1x1 In most cases, we omit x2 because we cannot collect data on it. We can easily derive the bias in β̃1 (conditional on the sample outcomes {(xi1, xi2) : i = 1, ..., n}). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 29 / 82
  • 30. Omitted Variable Bias We already have a relationship between β̃1 and the multiple regression estimator, β̂1: β̃1 = β̂1 + β̂2δ̃1 where β̂2 is the multiple regression estimator of β2 and δ̃1 is the slope coefficient in the auxiliary regression xi2 on xi1, i = 1, ..., n Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 30 / 82
  • 31. Omitted Variable Bias Now just use the fact that β̂1 and β̂2 are unbiased (or would be if we could compute them): Conditional on the sample values of x1 and x2, E(β̃1) = E(β̂1) + E(β̂2)δ̃1 = β1 + β2δ̃1 Therefore, Bias(β̃1) = β2δ̃1 Recall that δ̃1 has the same sign as the sample correlation Corr(xi1, xi2). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 31 / 82
  • 32. Omitted Variable Bias The simple regression estimator is unbiased (for the given outcomes {(xi1, xi2)}) in two cases. 1. β2 = 0. But this means that x2 does not appear in the population model, so simple regression is the right thing to do. 2. Corr(xi1, xi2) = 0 (in the sample). Then the simple and multiple regression estimators are identical because δ̃1 = 0. If β2 ̸= 0 and Corr(xi1, xi2) ̸= 0 then β̃1 is generally biased. We do not know β2 and might only have a vague idea about the size of δ̃1. But we often can guess at the signs. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 32 / 82
  • 33. Omitted Variable Bias Technically, the bias computed holds for a particular “sample” on (x1, x2). But acting as if what matters is correlation between x1 and x2 in the population gives us the correct answer when we turn to asymptotic analysis. In what follows, we do not make the distinction between the sample and population correlation between x1 and x2.

Bias in the Simple Regression Estimator of β1

                 Corr(x1, x2) > 0    Corr(x1, x2) < 0
    β2 > 0       Positive Bias       Negative Bias
    β2 < 0       Negative Bias       Positive Bias

Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 33 / 82
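The exact algebraic relationship β̃1 = β̂1 + β̂2δ̃1, and the sign pattern in the table above, can be checked with a small Stata simulation. This is a minimal sketch on made-up data with hypothetical parameter values (β1 = 2, β2 = 3, and x1, x2 positively correlated), so the short-regression slope should come out too large and should match the decomposition exactly.

clear
set seed 789
set obs 1000
gen x2 = rnormal()
gen x1 = 0.5*x2 + rnormal()            // x1 and x2 positively correlated
gen u  = rnormal()
gen y  = 1 + 2*x1 + 3*x2 + u           // true beta1 = 2, beta2 = 3 (both positive)
quietly reg y x1 x2                    // long regression
scalar b1_hat = _b[x1]
scalar b2_hat = _b[x2]
quietly reg x2 x1                      // auxiliary regression: delta1-tilde
scalar d1 = _b[x1]
quietly reg y x1                       // short regression: slope is too large on average
display "short-regression slope:         " _b[x1]
display "b1_hat + b2_hat*delta1_tilde:   " b1_hat + b2_hat*d1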
  • 34. EXAMPLE: Omitted Ability Bias lwage = β0 + β1educ + β2abil + u where abil is “ability.” Essentially by definition, β2 > 0. We also think Corr(educ, abil) > 0 so that higher ability people get more education, on average. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 34 / 82
  • 35. EXAMPLE: Omitted Ability Bias In this scenario, E(β̃1) > β1 so there is an upward bias in simple regression. Our failure to control for ability leads to (on average) overestimating the return to education. We attribute some of the effect of ability to education because ability and education are positively correlated. Remember, for a particular sample, we can never know whether β̃1 > β1. But we should be very hesitant to trust a procedure that produces too large an estimate on average. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 35 / 82
  • 36. EXAMPLE: Effects of a Tutoring Program on Student Performance GPA = β0 + β1tutor + β2abil + u where tutor is hours spent in tutoring. Again, β2 > 0. Suppose that students with lower ability tend to use more tutoring: Corr(tutor, abil) < 0 Then E(β̃1) = β1 + β2δ̃1 = β1 + (+)(−) < β1 so that our failure to account for ability leads us to underestimate the effect of tutoring. In fact, it could happen that β1 > 0 but E(β̃1) ≤ 0, so we tend to find no effect or even a negative effect. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 36 / 82
  • 37. Omitted Variable Bias Now suppose the population model is lwage = β0 + β1educ + β2exper + β3abil + u, but abil is omitted and we regress lwage on educ and exper only. If, as an approximation, we assume educ and exper are uncorrelated, and exper and abil are uncorrelated, then the bias analysis of β̃1 for the two-regressor case carries through, but it is now β3 that matters. So, as a rough guide, β̃1 will have an upward bias because β3 > 0 and Corr(educ, abil) > 0. In the general case, it should be remembered that correlation of any xj with an omitted variable generally causes bias in all of the OLS estimators, not just in the coefficient on that xj . See Appendix 3A in Wooldridge for a more detailed treatment. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 37 / 82
  • 38. The Variance of the OLS Estimators So far, we have assumed 1 y = β0 + β1x1 + β2x2 + . . . + βkxk + u 2 random sampling from the population 3 no perfect collinearity in the sample 4 E(u|x1, x2, ..., xk) = 0 Under MLR.3 we can compute the OLS estimates in our sample. The other assumptions then ensure that OLS is unbiased (conditional on the outcomes of the explanatory variables). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 38 / 82
  • 39. Assumption MLR.5 (Homoskedasticity) We have now seen that OLS is biased when an important variable has been omitted, and we have shown how to obtain the sign of the bias in simple cases. As in the simple regression case, to obtain Var(β̂j ) we add a simplifying assumption: homoskedasticity (constant variance). The variance of the error, u, does not change with any of x1, x2, ..., xk: Var(u|x1, x2, ..., xk) = Var(u) = σ² This assumption can never be guaranteed. We make it for now to get simple formulas, and to be able to discuss efficiency of OLS. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 39 / 82
  • 40. Assumption MLR.5 (Homoskedasticity) Assumptions MLR.1 through MLR.4 imply that the OLS estimators are unbiased, E(β̂j ) = βj . Under MLR.5, Var(y|x1, x2, ..., xk) = Var(u|x1, x2, ..., xk) = σ² Assumptions MLR.1 through MLR.5 are called the Gauss-Markov assumptions. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 40 / 82
  • 41. Assumption MLR.5 (Homoskedasticity) If we have a savings equation, sav = β0 + β1inc + β2famsize + β3pareduc + u where famsize is size of the family and pareduc is total parents' education, MLR.5 means that the variance in sav cannot depend on income, family size, or parents' education. Later we will show how to relax MLR.5, and how to test whether it is true. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 41 / 82
  • 42. Assumption MLR.5 (Homoskedasticity) To set up the following theorem, we focus only on the slope coefficients. (A different formula is needed for Var(β̂0).) As before, we are computing the variance conditional on the values of the explanatory variables in the sample. We need to define two quantities associated with each xj . The first is the total variation in xj in the sample: SSTj = Σ_{i=1}^{n} (xij − x̄j )² (SSTj /n is the sample variance of xj .) Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 42 / 82
  • 43. Assumption MLR.5 (Homoskedasticity) The second is a measure of correlation between xj and the other explanatory variables, in the sample. This is the R-squared from the regression xij on xi1, xi2, ..., xi,j−1, xi,j+1, ..., xik That is, we regress xj on all of the other explanatory variables (y plays no role here). Call this R-squared R²j , j = 1, ..., k. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 43 / 82
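As a sketch (using the simulated y, x1, x2 from the omitted-variable example above, so the names are hypothetical), both quantities are easy to recover in Stata: SSTj from the sample variance of xj, and R²j from the auxiliary regression of xj on the other regressors.

quietly summarize x1
scalar SST1 = r(Var)*(r(N) - 1)        // total sample variation in x1
quietly reg x1 x2                      // auxiliary regression of x1 on the other x's
scalar R2_1 = e(r2)
display "SST1 = " SST1 "    R-squared_1 = " R2_1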
  • 44. Assumption MLR.5 (Homoskedasticity) Important: R²j = 1 is ruled out by Assumption MLR.3 because R²j = 1 means that, in the sample, xj is an exact linear function of the other explanatory variables. Any value 0 ≤ R²j < 1 is permitted. As R²j gets closer to one, xj is more linearly related to the other independent variables. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 44 / 82
  • 45. Theorem (Sampling Variances of OLS Slope Estimators) Under Assumptions MLR.1 to MLR.5, and conditional on the values of the explanatory variables in the sample, Var(β̂j ) = σ²/[SSTj (1 − R²j )], j = 1, 2, ..., k. All five Gauss-Markov assumptions are needed to ensure this formula is correct. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 45 / 82
  • 46. Assumption MLR.5 (Homoskedasticity) Suppose k = 3, y = β0 + β1x1 + β2x2 + β3x3 + u E(u|x1, x2, x3) = 0 Var(u|x1, x2, x3) = γ0 + γ1x1 where x1 ≥ 0 (as are γ0 and γ1). This violates MLR.5, and the standard variance formula is generally incorrect for all OLS estimators, not just Var(β̂1). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 46 / 82
  • 47. Assumption MLR.5 (Homoskedasticity) The variance Var(β̂j ) = σ²/[SSTj (1 − R²j )] has three components. σ² and SSTj are familiar from simple regression. The third, 1 − R²j , is new to multiple regression. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 47 / 82
  • 48. Factors Affecting Var(β̂j) 1 As the error variance (in the population), σ², decreases, Var(β̂j ) decreases. One way to reduce the error variance is to take more stuff out of the error. That is, add more explanatory variables. 2 As the total sample variation in xj , SSTj , increases, Var(β̂j ) decreases. As in the simple regression case, it is easier to estimate how xj affects y if we see more variation in xj . Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 48 / 82
  • 49. Factors Affecting Var(β̂j) As we mentioned earlier, SSTj /n [or SSTj /(n − 1) – the difference is unimportant here] is the sample variance of {xij : i = 1, 2, ..., n}. So we can assume SSTj ≈ nσ²j where σ²j > 0 is the population variance of xj . We can increase SSTj by increasing the sample size: SSTj is roughly a linear function of n. [Of the three components in Var(β̂j ), this is the only one that depends systematically on n.] Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 49 / 82
  • 50. Factors Affecting Var(β̂j) Var(β̂j ) = σ²/[SSTj (1 − R²j )] As R²j → 1, Var(β̂j ) → ∞. R²j measures how linearly related xj is to the other explanatory variables. We get the smallest variance for β̂j when R²j = 0: Var(β̂j ) = σ²/SSTj , which looks just like the simple regression formula. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 50 / 82
  • 51. Factors Affecting Var(β̂j) If xj is unrelated to all other independent variables, it is easier to estimate its ceteris paribus effect on y. But R²j = 0 is very rare; even small values are not especially common. In fact, R²j ≈ 1 is somewhat common, and this can cause problems for getting a sufficiently precise estimate of βj . Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 51 / 82
  • 52. Assumption MLR.5 (Homoskedasticity) Below is a graph of Var(β̂1) as a function of R²1: [figure not reproduced; Var(β̂1) rises slowly at first and then without bound as R²1 approaches one] Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 52 / 82
  • 53. Loosely, R²j “close” to one is called the “problem” of multicollinearity. Unfortunately, we cannot define what we mean by “close” in a way that is relevant for all situations. We have ruled out the case of perfect collinearity, R²j = 1. Here is an important point: One often hears discussions of multicollinearity as if high correlation among two or more of the xj is a violation of an assumption we have made. But it does not violate any of the Gauss-Markov assumptions, including MLR.3. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 53 / 82
  • 54. We know that if the zero conditional mean assumption is violated, OLS is not unbiased. If MLR.1 through MLR.4 hold, but homoskedasticity (constant variance) does not, then Var(β̂j ) = σ²/[SSTj (1 − R²j )] is not the correct formula. But multicollinearity does not cause the OLS estimators to be biased. We still have E(β̂j ) = βj . Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 54 / 82
  • 55. Further, any claim that the OLS variance formula is “biased” in the presence of multicollinearity is also wrong. The formula is correct under MLR.1 through MLR.5. In fact, the formula is doing its job: It shows that if R²j is “close” to one, Var(β̂j ) might be very large. If R²j is “close” to one, xj does not have much sample variation separate from the other explanatory variables. We are trying to estimate the effect of xj on y, holding x1, ..., xj−1, xj+1, ..., xk fixed, but the data might not be allowing us to do that very precisely. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 55 / 82
  • 56. Because multicollinearity violates none of our assumptions, it is essentially impossible to state hard rules about when it is a “problem.” This has not stopped some from trying. Other than just looking at the R²j , a common “measure” of multicollinearity is called the variance inflation factor (VIF): VIFj = 1/(1 − R²j ). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 56 / 82
  • 57. Because Var(β̂j ) = (σ²/SSTj ) · VIFj , the VIFj tells us how many times larger the variance is than if we had the “ideal” case of no correlation of xij with xi1, ..., xi,j−1, xi,j+1, ..., xik. This sometimes leads to silly rules-of-thumb. For example, one should be “concerned” if VIFj > 10 (equivalently, R²j > .9). Is R²j > .9 “large”? Yes, in the sense that it would be better to have R²j smaller. But, if we want to control for, say, x2, ..., xk to get a good ceteris paribus effect of x1 on y, we often have no choice. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 57 / 82
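After a regression, Stata reports the VIFs directly with estat vif; the sketch below (continuing the simulated data from the earlier sketches, so the names are hypothetical) also recomputes VIF1 = 1/(1 − R²1) by hand so the two can be compared.

quietly reg y x1 x2
estat vif                              // variance inflation factors for x1 and x2
quietly reg x1 x2                      // auxiliary regression for x1
display "VIF for x1 computed by hand: " 1/(1 - e(r2))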
  • 58. A large VIFj can be offset by a large SSTj : Var(β̂j ) = (σ²/SSTj ) · VIFj Remember, SSTj grows roughly linearly with the sample size, n. A large VIFj can be offset by a large sample size. The value of VIFj per se is irrelevant. Ultimately, it is Var(β̂j ) that is important. Even so, at this point, we have no way of knowing whether Var(β̂j ) is “too large” for the estimate β̂j to be useful. Only when we discuss confidence intervals and hypothesis testing will this be apparent. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 58 / 82
  • 59. Be wary of work that reports a set of multicollinearity “diagnostics” and concludes nothing useful can be learned because multicollinearity is “too severe.” Sometimes a VIF of about 10 is used to make such a claim. Other “diagnostics” are even more difficult to interpret. Using them indiscriminately is often a mistake. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 59 / 82
  • 60. Consider an example: y = β0 + β1x1 + β2x2 + β3x3 + u, where β1 is the coefficient of interest. In fact, assume x2 and x3 act as controls, included so that we can hope to get a good ceteris paribus estimate of the effect of x1. Such controls are often highly correlated. (For example, x2 and x3 could be different standardized test scores.) The key is that the correlation between x2 and x3 has nothing to do with Var(β̂1). It is only the correlation of x1 with (x2, x3) that matters. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 60 / 82
  • 61. In an example to determine whether communities with larger minority populations are discriminated against in lending, we might have percapproved = β0 + β1percminority + β2avginc + β3avghouseval + u, where β1 is the key coefficient. We might expect avginc and avghouseval to be highly correlated across communities. But we do not really care whether we can precisely estimate β2 or β3. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 61 / 82
  • 62. Variances in Misspecified Models As with bias calculations, we can study the variances of the OLS estimators in misspecified models. Consider the same case with (at most) two explanatory variables: y = β0 + β1x1 + β2x2 + u which we assume satisfies the Gauss-Markov assumptions. We run the “short” regression, y on x1, and also the “long” regression, y on x1, x2: ỹ = β̃0 + β̃1x1 ŷ = β̂0 + β̂1x1 + β̂2x2 Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 62 / 82
  • 63. We know from the previous analysis that Var(β̂1) = σ²/[SST1(1 − R²1)] conditional on the values xi1 and xi2 in the sample. What about the simple regression estimator? We can show Var(β̃1) = σ²/SST1 which is again conditional on {(xi1, xi2) : i = 1, ..., n}. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 63 / 82
  • 64. Whenever xi1 and xi2 are correlated, R²1 > 0, and Var(β̃1) = σ²/SST1 < σ²/[SST1(1 − R²1)] = Var(β̂1) So, by omitting x2, we can in fact get an estimator with a smaller variance, even though it is biased. When we look at bias and variance together, we have a tradeoff between simple and multiple regression. In the case R²1 > 0, we can draw two conclusions. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 64 / 82
  • 65. y = β0 + β1x1 + β2x2 + u 1 If β2 ̸= 0, β̃1 is biased, β̂1 is unbiased, but Var(β̃1) < Var(β̂1). 2 If β2 = 0, β̃1 and β̂1 are both unbiased and Var(β̃1) < Var(β̂1). Case 2 is clear cut. If β2 = 0, x2 has no (partial) effect on y. When x2 is correlated with x1, including it along with x1 in the regression makes it more difficult to estimate the partial effect of x1. Simple regression is clearly preferred. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 65 / 82
  • 66. Case 1 is more difficult, but there are reasons to prefer the unbiased estimator, β̂1. First, the bias in β̃1 does not systematically change with the sample size. We should assume the bias is as large when n = 1, 000 as when n = 10. By contrast, the variances Var(β̃1) and Var(β̂1) both shrink at the rate 1/n. With a large sample size, the difference between Var(β̃1) and Var(β̂1) is less important, especially considering the bias in β̃1 is not shrinking. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 66 / 82
  • 67. The second reason for preferring β̂1 is more subtle. The formulas Var(β̃1) = σ²/SST1 and Var(β̂1) = σ²/[SST1(1 − R²1)], because they condition on the same explanatory variables and use the same σ², act as if the error variance does not change when we add x2. But if β2 ̸= 0, the error variance does shrink when x2 is added (and the estimate σ̂² reflects this). Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 67 / 82
  • 68. In a more advanced course, we would be making a comparison between Var(β̃1) = η²/SST1 and Var(β̂1) = σ²/[SST1(1 − R²1)], where η² > σ² reflects the larger error variance in the simple regression analysis. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 68 / 82
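A quick check in the simulated data from the earlier sketch (hypothetical values) illustrates this point: because β2 ≠ 0 there, the residual variance in the short regression reflects η² rather than σ², and the reported standard error on x1 can end up larger in the short regression despite the formula comparison above.

quietly reg y x1                       // short regression: Root MSE estimates eta, not sigma
display "Root MSE, short: " e(rmse) "   se(b1), short: " _se[x1]
quietly reg y x1 x2                    // long regression: Root MSE estimates sigma
display "Root MSE, long:  " e(rmse) "   se(b1), long:  " _se[x1]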
  • 69. Estimating the Error Variance We still need to estimate σ². With n observations and k + 1 parameters, we only have df = n − (k + 1) degrees of freedom. Recall we lose the k + 1 df due to the k + 1 restrictions on the OLS residuals: Σ_{i=1}^{n} ûi = 0 and Σ_{i=1}^{n} xij ûi = 0, j = 1, 2, ..., k Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 69 / 82
  • 70. Unbiased Estimation of σ² Under the Gauss-Markov assumptions (MLR.1 through MLR.5) σ̂² = (n − k − 1)⁻¹ Σ_{i=1}^{n} ûi² = SSR/df is an unbiased estimator of σ². This means that, if we divide by n rather than n − k − 1, the bias is −σ²(k + 1)/n, which means the estimated variance would be too small, on average. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 70 / 82
  • 71. Unbiased Estimation of σ² The bias disappears as n increases. The square root of σ̂², σ̂, is reported by all regression packages (the standard error of the regression, or root mean squared error). Note that SSR falls when a new explanatory variable is added, but df falls, too. So σ̂ can increase or decrease when a new variable is added in multiple regression. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 71 / 82
  • 72. The standard error of each β̂j is computed (for the slopes) as se(β̂j ) = σ̂ / √[SSTj (1 − R²j )] and it will be critical to report these along with the coefficient estimates. We have discussed the three factors that affect se(β̂j ) already. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 72 / 82
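As a sketch (again on the simulated data from earlier, so all names are hypothetical), the standard error Stata reports for a slope can be reproduced from σ̂, SSTj, and R²j using exactly this formula.

quietly reg y x1 x2
scalar sighat = e(rmse)                // sigma-hat, the standard error of the regression
scalar se_rep = _se[x1]                // standard error Stata reports for x1
quietly summarize x1
scalar SST1 = r(Var)*(r(N) - 1)
quietly reg x1 x2
scalar R2_1 = e(r2)
display "se by formula: " sighat/sqrt(SST1*(1 - R2_1)) "    se reported: " se_rep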
  • 73. Using WAGE2.DTA:
. reg lwage educ IQ exper

      Source |       SS       df       MS              Number of obs =     759
-------------+------------------------------           F(  3,   755) =   69.78
       Model |  57.0352742     3  19.0117581           Prob > F      =  0.0000
    Residual |   205.71337   755   .27246804           R-squared     =  0.2171
-------------+------------------------------           Adj R-squared =  0.2140
       Total |  262.748644   758  .346634095           Root MSE      =  .52198

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1069849   .0116513     9.18   0.000      .084112    .1298578
          IQ |   .0080269   .0015893     5.05   0.000     .0049068    .0111469
       exper |   .0435405   .0084242     5.17   0.000     .0270028    .0600783
       _cons |   -.228922   .2299876    -1.00   0.320    -.6804132    .2225692
------------------------------------------------------------------------------

lwage = −.229 (.230) + .107 (.012) educ + .0080 (.0016) IQ + .0435 (.0084) exper, n = 759, R² = .217 (standard errors in parentheses)
Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 73 / 82
  • 74. Efficiency of OLS: The Gauss-Markov Theorem How come we use OLS, rather than some other estimation method? Consider simple regression: y = β0 + β1x + u and write, for each i, yi = β0 + β1xi + ui . If we average across the n observations we get ȳ = β0 + β1x̄ + ū Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 74 / 82
  • 75. For any i with xi ̸= x̄, subtract and rearrange: β1 = (yi − ȳ)/(xi − x̄) − (ui − ū)/(xi − x̄) The last term has a zero expected value under random sampling and E(u|x) = 0. If xi ̸= x̄ for all i, we could use the estimator β̆1 = n⁻¹ Σ_{i=1}^{n} (yi − ȳ)/(xi − x̄) β̆1 is not the same as the OLS estimator, β̂1 = Σ_{i=1}^{n} (xi − x̄)yi / Σ_{i=1}^{n} (xi − x̄)² How do we know OLS is better than this new estimator, β̆1? Generally, we do not. Under MLR.1 to MLR.4, both estimators are unbiased. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 75 / 82
  • 76. It turns out that, under the Gauss-Markov assumptions, Var(β̂1) ≤ Var(β̆1): β̂1 has a sampling distribution that is less spread out around β1 than that of β̆1. When comparing unbiased estimators, we prefer an estimator with smaller variance. We can make very general statements for the multiple regression case, provided the 5 Gauss-Markov assumptions hold. However, we must also limit the class of estimators that we can compare with OLS. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 76 / 82
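A small Monte Carlo sketch (simulated data, hypothetical design with true slope 2) makes the comparison concrete. Both estimators are unbiased under MLR.1 to MLR.4, but across repeated samples the OLS estimates cluster tightly around the true slope while the alternative estimator β̆1 is far more dispersed (occasionally wildly so, when some xi falls close to x̄), which is the sense in which OLS is preferred.

capture program drop compare
program define compare, rclass
    clear
    set obs 100
    gen x = runiform(1, 10)
    gen y = 1 + 2*x + rnormal()        // true slope is 2
    quietly reg y x
    return scalar b_ols = _b[x]        // the OLS slope
    quietly summarize y
    scalar ybar = r(mean)
    quietly summarize x
    scalar xbar = r(mean)
    gen ratio = (y - ybar)/(x - xbar)  // slope of the line through (x-bar, y-bar) and (xi, yi)
    quietly summarize ratio
    return scalar b_alt = r(mean)      // the alternative estimator from the previous slide
end
simulate b_ols = r(b_ols) b_alt = r(b_alt), reps(1000) seed(2022) nodots: compare
summarize b_ols b_alt                  // b_ols clusters tightly around 2; b_alt is far more dispersed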
  • 77. THEOREM (Gauss-Markov) Under Assumptions MLR.1 through MLR.5, the OLS estimators β̂0, β̂1, ..., β̂k are the best linear unbiased estimators (BLUEs). Start from the end of “BLUE” and work backwards: E (estimator) We must be able to compute an estimate from the observable data, using a fixed rule. L (linear) The estimator is a linear function of {yi : i = 1, 2, ..., n}. It can be a nonlinear function of the explanatory variables. These estimators have the general form β̃j = Σ_{i=1}^{n} wij yi where the {wij : i = 1, ..., n} are any functions of {(xi1, ..., xik) : i = 1, ..., n}. The OLS estimators can be written in this way. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 77 / 82
  • 78. U (unbiased) We must impose enough restrictions on the wij – we omit those here – so that E(β̃j ) = βj , j = 0, 1..., k (conditional on {(xi1, ..., xik) : i = 1, ..., n}). We know the OLS estimators are unbiased under MLR.1 through MLR.4. So are a lot of other linear estimators. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 78 / 82
  • 79. B (best) This means smallest variance (which makes sense once we impose unbiasedness). In other words, what can be shown is that, under MLR.1 through MLR.5, and conditional on the explanatory variables in the sample, Var(β̂j ) ≤ Var(β̃j ) all j (and usually the inequality is strict). If we do not impose unbiasedness, then we can use silly rules – such as β̃j = 1 always – to get estimators with zero variance. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 79 / 82
  • 80. How do we use the GM Theorem? If the Gauss-Markov assumptions hold, and we insist on unbiased estimators that are also linear functions of {yi : i = 1, 2, ..., n}, then we need look no further than OLS: it delivers the smallest possible variances. It might be possible (but even so, not practical) to find unbiased estimators that are nonlinear functions of {yi : i = 1, 2, ..., n} that have smaller variances than OLS. The GM Theorem only allows linear estimators in the comparison group. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 80 / 82
  • 81. Appendix 3A contains a proof of the GM Theorem. If MLR.5 fails, that is, Var(u|x1, ..., xk) depends on one or more xj , the GM conclusions do not hold. There may be linear, unbiased estimators of the βj with smaller variance. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 81 / 82
  • 82. Remember: Failure of MLR.5 does not cause bias in the β̂j , but it does have two consequences: 1. The usual formulas for Var(β̂j ), and therefore for se(β̂j ), are wrong. 2. The β̂j are no longer BLUE. The first of these is more serious, as it will directly affect statistical inference (next). The second consequence means we may want to search for estimators other than OLS. This is not so easy. And with a large sample, it may not be very important. Liu, H (NUS) MULTIPLE REGRESSION ANALYSIS: STATISTICAL PROPERTIES August 21, 2022 82 / 82