
5 - 7: Multiple Regression Analysis - Estimation (Reference: WR, Chapter 3)

Learning Objectives

• Multiple Regression Analysis


• Mechanics & Interpretation of OLS
• Expected Value of the OLS Estimators
• Variance of OLS Estimators
• Efficiency of OLS

Multiple Regression (MR) Analysis

• Allows us to explicitly control for many factors that affect the dependent variable.
• Hence, more amenable to ceteris paribus analysis.
• Naturally, if we add more factors to our model, then more of the variation in y can be
explained.
• It can also incorporate fairly general functional form relationships.

MR: Model with Two Independent Variables

• Wage Example.
– wage = β0 + β1 educ + β2 exper + u.
– Wage is determined by the two independent variables, educ and exper.
– Other unobserved factors are contained in u.
– MR effectively takes exper out of u and puts it explicitly in the equation.
– We still have to make an assumption about how u is related to educ and exper (zero
conditional mean).
– However, we can be confident of one thing.
– Because the equation contains exper explicitly, we will be able to measure the effect of educ
on wage, holding exper fixed (a numerical sketch follows this list).
– In a simple regression analysis - which puts exper in the error term - we would have to
assume that exper is uncorrelated with educ, a tenuous assumption.
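As a concrete illustration (not taken from WR), the sketch below fits such a two-variable wage equation by OLS on simulated data; the variable names educ and exper follow the example, while the data-generating coefficients are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data; the data-generating coefficients are made up for illustration.
educ = rng.uniform(8, 20, n)
exper = rng.uniform(0, 30, n)
u = rng.normal(0, 2, n)
wage = 1.0 + 0.8 * educ + 0.2 * exper + u

# OLS regression of wage on a constant, educ, and exper.
X = np.column_stack([np.ones(n), educ, exper])
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
print(beta_hat)  # roughly [1.0, 0.8, 0.2]
```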

MR: Model with Two Independent Variables

• Average Test Score Example.


– Effect of per-student spending (expend) on the average standardized test score (avgscore)
at the high school level.
– avgscore = β0 + β1 expend + β2 avginc + u.
– avginc: average family income.
– By including avginc explicitly in the model, we are able to control for its effect on
avgscore.
– In simple regression, avginc would be included in u, which would likely be correlated
with expend, causing the OLS estimator of β1 to be biased.

MR: Model with Two Independent Variables

• General model with two independent variables


– y = β0 + β1 x1 + β2 x2 + u.
– The key assumption about how u is related to x1 and x2 .
– E(u|x1 , x2 ) = 0.

MR & Functional Forms

• Family Consumption Example.


– Suppose family consumption (cons) is a quadratic function of family income (inc).
– cons = β0 + β1 inc + β2 inc2 + u.
– Mechanically, there will be no difference in using the method of OLS.
– However, an important difference in how one interprets the parameters.
– It makes no sense to measure the effect of inc on cons while holding inc2 fixed.
– Instead, the change in consumption with respect to the change in income, the marginal
propensity to consume (MPC), is approximated by
– ∆cons/∆inc ≈ β1 + 2β2 inc (a numerical illustration follows below).
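To make this interpretation concrete, the short sketch below evaluates the approximate MPC at a few income levels; the coefficient values are hypothetical, chosen only for illustration.

```python
# Hypothetical estimates for cons = b0 + b1*inc + b2*inc^2 + u (values are illustrative only).
b1, b2 = 0.75, -0.002

def mpc(inc):
    """Approximate marginal propensity to consume at income level inc: b1 + 2*b2*inc."""
    return b1 + 2 * b2 * inc

for inc in (10, 50, 100):
    print(f"inc = {inc}: MPC = {mpc(inc):.3f}")
```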

MR with k Independent Variables

• Multiple regression analysis allows many observed factors to affect y.


• The general multiple linear regression (MLR) model
– y = β0 + β1 x1 + β2 x2 + ... + βk xk + u.
– The terminology is similar to that for simple regression.
– There will always be factors we cannot include, and these are contained in u.
– The key assumption in terms of a conditional expectation:
– E(u|x1 , x2 , ..., xk ) = 0.
– At a minimum, this requires that all factors in u be uncorrelated with the explanatory
variables.
– It also means that we have correctly accounted for the functional relationships.

Mechanics & Interpretation of OLS

• OLS for two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– The method of OLS chooses the estimates β̂0 , β̂1 , β̂2 to minimize the sum of squared residuals:
– ∑_{i=1}^n ûi² = ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2 )².

Mechanics & Interpretation of OLS

• Case with k independent variables.


– OLS: ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk .

– The method of OLS chooses the estimates to minimize the sum of squared residuals (a sketch
of the closed-form solution follows below):
– ∑_{i=1}^n ûi² = ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − ... − β̂k xik )².
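The minimization problem has the familiar closed-form solution β̂ = (X'X)⁻¹X'y. As a sketch (simulated data, arbitrary true coefficients), the code below checks that solving the normal equations gives the same answer as numpy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3

# Simulated regressors (plus a constant) and outcome; true coefficients are arbitrary.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([2.0, 1.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Solve the normal equations (X'X) beta = X'y.
beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)

# Compare with the library least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_normal_eq, beta_lstsq))  # True
```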

Mechanics & Interpretation of OLS: Interpretation

• Two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– β̂1 and β̂2 have partial effect interpretations. We can obtain the predicted change in ŷ given the
changes in x1 and x2 .
– ∆ŷ = β̂1 ∆x1 + β̂2 ∆x2 .
– In particular, when x2 is held fixed.
– ∆ŷ = β̂1 ∆x1 .

Mechanics & Interpretation of OLS: Interpretation

• Two independent variables: Determinants of College GPA.


– OLS line to predict college GPA from high school GPA and achievement test score.
– \widehat{colGPA} = 1.29 + 0.453 hsGPA + 0.009 ACT.
– Because no one has either a zero on hsGPA or ACT, the intercept in this equation is
not, by itself, meaningful.
– Positive partial relationship between colGPA and hsGPA.
– Holding ACT fixed, another point on hsGPA is associated with .453 of a point on
colGPA.

Mechanics & Interpretation of OLS: Interpretation

• More than two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk .
– ∆ŷ = β̂1 ∆x1 + β̂2 ∆x2 + ... + β̂k ∆xk .
– β̂1 measures the change in ŷ due to a one-unit increase in x1 , holding all other variables
fixed.
– ∆ŷ = β̂1 ∆x1 .

Mechanics & Interpretation of OLS: Interpretation

• More than two independent variables: Hourly Wage Example.


– \widehat{log(wage)} = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure.
– Holding exper and tenure fixed, another year of education is predicted to increase the
wage by 9.2%.

MR: Holding other Factors Fixed

• College GPA Example.


– \widehat{colGPA} = 1.29 + 0.453 hsGPA + 0.009 ACT.
– The coefficient on ACT measures the predicted difference in colGPA, holding hsGPA
fixed.
– MR provides this ceteris paribus interpretation even though the data have not been
collected in a ceteris paribus fashion.

– It is as if we had actually gone out and sampled people with the same high school
GPA but possibly with different ACT scores.
– If we could collect a sample of individuals with the same high school GPA, then we
could perform a simple regression analysis relating colGP A to ACT .
– MR effectively allows us to mimic this situation without restricting the values of any
independent variables.
– MR allows us to keep other factors fixed in nonexperimental environments.

MR: Changing More Than One x Simultaneously

• Hourly Wage Example


– \widehat{log(wage)} = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure.
– Estimated effect on wage when exper and tenure both increase by 1 year (holding educ
fixed).
– ∆\widehat{log(wage)} = 0.0041 ∆exper + 0.022 ∆tenure = 0.0041 + 0.022 = 0.0261, i.e. about 2.61%.

MR: Fitted Values & Residuals

• Fitted values
– ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + ... + β̂k xik .
• Residuals
– ûi = yi − ŷi .
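A quick sketch (simulated data, all values illustrative) that computes fitted values and residuals and checks two algebraic properties of OLS: the residuals average to zero and have zero sample covariance with each regressor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Simulated data; the coefficients are arbitrary.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat   # fitted values
u_hat = y - y_fit      # residuals

print(np.isclose(u_hat.mean(), 0))             # residuals average to zero
print(np.isclose(np.cov(u_hat, x1)[0, 1], 0))  # zero sample covariance with x1
print(np.isclose(np.cov(u_hat, x2)[0, 1], 0))  # zero sample covariance with x2
```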

MR: “Partialling Out” Interpretation

• Consider the case of two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– One way to express β̂1 is: β̂1 = (∑_{i=1}^n r̂i1 yi ) / (∑_{i=1}^n r̂i1²).
– r̂i1 are the OLS residuals from a simple regression of x1 on x2 .
– We regress x1 on x2 , and then obtain the residuals.
– Then do a simple regression of y on r̂i1 to obtain β̂1 (see the sketch after this list).
• Partial effect interpretation.
– r̂i1 is xi1 after the effect of xi2 has been partialled out or netted out.
– Thus, β̂1 measures the sample relationship between y and x1 after x2 has been partialled
out.
– In the general model with k explanatory variables, r̂i1 come from the regression of x1 on
x2 , ..., xk .
– Thus, β̂1 measures the relationship between y and x1 after x2 , ..., xk have been partialled
out.
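This "partialling out" result can be verified numerically. The sketch below (simulated, correlated regressors; all values illustrative) regresses x1 on x2, keeps the residuals, and shows that a simple regression of y on those residuals reproduces the multiple-regression coefficient β̂1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Simulated, correlated regressors (illustrative only).
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2).
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 1: regress x1 on (1, x2) and keep the residuals r1.
Z = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ gamma

# Step 2: simple regression of y on r1 (r1 has mean zero, so no intercept is needed).
beta1_partial = (r1 @ y) / (r1 @ r1)

print(np.isclose(beta_hat[1], beta1_partial))  # True: same estimate of beta_1
```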

Comparison of SR & MR

• Equations
– SR: ỹ = β̃0 + β̃1 x1 .
– MR: ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
• Relationship between β̃1 and β̂1 .
– β̃1 = β̂1 + β̂2 δ̃1 .
– δ̃1 is the slope coefficient from the simple regression of x2 on x1 .

• Two cases when they are equal.
– The partial effect of x2 on ŷ is zero in the sample. That is, β̂2 = 0.
– x1 and x2 are uncorrelated in the sample. That is, δ̃1 = 0.
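The decomposition β̃1 = β̂1 + β̂2 δ̃1 holds exactly in any sample; the sketch below checks it on simulated data (all numbers illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Simulated data with correlated x1 and x2 (illustrative values).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients from regressing y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(n)
b_mr = ols(np.column_stack([ones, x1, x2]), y)  # multiple regression: beta0_hat, beta1_hat, beta2_hat
b_sr = ols(np.column_stack([ones, x1]), y)      # simple regression: beta0_tilde, beta1_tilde
delta = ols(np.column_stack([ones, x1]), x2)    # regression of x2 on x1: delta0_tilde, delta1_tilde

# Check: beta1_tilde = beta1_hat + beta2_hat * delta1_tilde.
print(np.isclose(b_sr[1], b_mr[1] + b_mr[2] * delta[1]))  # True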

Comparison of SR & MR

• Example: Participation in 401(k) pension plans.


– The effect of a plan’s match rate (mrate) on the participation rate (prate) in its 401(k)
pension plan.
– OLS: \widehat{prate} = 80.12 + 5.52 mrate + .243 age.
– age is the age of the 401(k) plan.
– What happens if we do not control for age?
– OLS: \widehat{prate} = 83.08 + 5.86 mrate.
– There is a difference in the coefficient on mrate, but it is not big.
– This can be explained by the fact that the sample correlation between mrate and age is
only .12.

Comparison of SR & MR

• In case with k independent variables.


• SR of y on x1 and MR of y on x1 , x2 , ..., xk produce an identical estimate of the coefficient
on x1 when either
– the OLS coefficients on x2 through xk are all zero, or
– x1 is uncorrelated with each of x2 , ..., xk .

MR: Goodness of Fit

• A way of measuring how well the independent variables explain the dependent variable.
• For a sample, we can define the following:
– Total Sum of Squares (SST) = ∑_{i=1}^n (yi − ȳ)².
– Explained Sum of Squares (SSE) = ∑_{i=1}^n (ŷi − ȳ)².
– Residual Sum of Squares (SSR) = ∑_{i=1}^n ûi².

MR: Goodness of Fit

• R-squared of the regression (also called coefficient of determination).


– R2 = SSE SSR
SST = 1 − SST
– Fraction of the sample variation in y that is explained by x1 , .., xk .
– The value of R2 is always between zero and one, because SSE can be no greater than
SST.
– R2 can also be shown to equal the squared correlation coefficient between the actual yi
Pn values yˆi . ¯ 2
and the fitted
2 ( (yi −ȳ)(ŷi −ŷ))
– R = Pn i=1 2 Pn
( (yi −ȳ) )( ¯ 2 (ŷi −ŷ) )
i=1 i=1
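As a sketch (simulated data, arbitrary coefficients), the code below computes R² as SSE/SST, as 1 − SSR/SST, and as the squared correlation between actual and fitted values, and confirms that the three agree.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400

# Simulated data (illustrative only).
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, -0.4]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ beta_hat
u_hat = y - y_fit

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_fit - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)

r2_a = sse / sst
r2_b = 1 - ssr / sst
r2_c = np.corrcoef(y, y_fit)[0, 1] ** 2  # squared correlation between actual and fitted

print(np.allclose([r2_a, r2_b], r2_c))   # True: the three definitions agree
```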

MR: Goodness of Fit

• Example: Determinants of College GPA.


– \widehat{colGPA} = 1.29 + 0.453 hsGPA + 0.009 ACT.
– n = 141, R2 = 0.176.

– This means that hsGPA and ACT together explain about 17.6% of the variation in
colGPA.

MR: Goodness of Fit

• Important fact about R2 .


– It never decreases, and it usually increases, when another independent variable is added
to a regression and the same set of observations is used for both regressions.
– This is an algebraic fact: SSR never increases when additional regressors are added to
the model.
– However, it assumes we do not have missing data on the explanatory variables.
– If two regressions use different sets of observations, then, in general, we cannot tell how
the R2 will compare.

MR: Regression through Origin

• OLS estimation when the intercept is zero.


– ỹ = β̃1 x1 + β̃2 x2 + ... + β̃k xk .
– The OLS estimates minimize the sum of squared residuals, but with the intercept set at
zero.
– However, several properties of OLS no longer hold for regression through the origin.
– In particular, the OLS residuals no longer have a zero sample average.
– One serious drawback: If β0 in the population model is different from 0, then the slope
estimators β̃j will be biased.

Expected Value of the OLS Estimators

• Assumptions, under which the OLS estimators are unbiased for the population parameters.
– Assumption MLR 1: Linear in Parameters.
– Assumption MLR 2: Random Sampling.
– Assumption MLR 3: No Perfect Collinearity.
– Assumption MLR 4: Zero Conditional Mean.

Expected Value of the OLS Estimators

• Assumption MLR 1: Linear in Parameters.


– Simply defines the MLR model.
– The population model can be written as.
– y = β0 + β1 x1 + β2 x2 + ... + βk xk + u.

Expected Value of the OLS Estimators

• Assumption MLR 2: Random Sampling.


– We have a random sample following the population model: {(xi1 , xi2 , ..., xik , yi ) : i =
1, 2, .., n}.
– Equation of a particular observation i, in terms of population model.
– yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui .
– The term ui contains the unobserved factors for the i-th observation.
– OLS chooses the estimates β̂0 , β̂1 , ..., β̂k so that the residuals average to zero and the sample
correlation between each independent variable and the residuals is zero.

Expected Value of the OLS Estimators

• Assumption MLR 3: No Perfect Collinearity.


– None of the x is constant, and there are no exact linear relationships among the
independent variables.
– If an independent variable is an exact linear combination of the other independent
variables, then we say the model suffers from perfect collinearity, and it cannot be
estimated by OLS.
– MLR.3 does allow the independent variables to be correlated; they just cannot be
perfectly correlated.
– The simplest way that two independent variables can be perfectly correlated is when
one variable is a constant multiple of another.
– e.g., the same variable measured in different units included in the same regression equation.

Expected Value of the OLS Estimators

• Assumption MLR 3: No Perfect Collinearity.


– Nonlinear functions of the same variable can appear among the regressors. e.g. inc and
inc2 .
– Some caution in log models.
– log(cons) = β0 + β1 log(inc) + β2 log(inc2 ) + u.
– x1 = log(inc) and x2 = log(inc2 )
– Using basic properties of natural log: log(inc2 ) = 2log(inc).
– This means, x2 = 2x1 . Hence perfect collinearity.
– Rather include [log(inc)]2 in the equation.

Expected Value of the OLS Estimators

• Assumption MLR 3: No Perfect Collinearity.


– The solution to the perfect collinearity is simple: drop one of the linearly related
variables.
– MLR.3 also fails if the sample size is too small in relation to the number of parameters
being estimated.
– If the model is carefully specified and n ≥ k + 1, Assumption MLR.3 can fail in rare
cases.

Expected Value of the OLS Estimators

• Assumption MLR 4: Zero Conditional Mean.


– E(u|x1 , x2 , .., xk ) = 0.
– One way that MLR.4 can fail is if the functional relationship between y and the explanatory
variables is misspecified.
– e.g., failing to include a needed quadratic or log term.
– Omitting an important factor that is correlated with any of x1 , x2 , ..., xk causes MLR.4
to fail also.
– Problem of measurement error in an explanatory variable: Failure of MLR.4.
– When Assumption MLR.4 holds, we say that we have exogenous explanatory variables.
– If xj is correlated with u for any reason, then xj is said to be an endogenous explanatory
variable.

Expected Value of the OLS Estimators

• Unbiasedness of OLS: Under Assumptions MLR.1 through MLR.4.


– E(β̂j ) = βj , j = 0, 1, 2, ..., k.
– The OLS estimators are unbiased estimators of the population parameters.
– Exact meaning: Procedure by which the OLS estimates are obtained is unbiased when
we view the procedure as being applied across all possible random samples.

Expected Value of the OLS Estimators

• Including Irrelevant Variables in a Regression Model.


– Overspecifying the model.
– Population model: y = β0 + β1 x1 + β2 x2 + β3 x3 + u.
– This model satisfies Assumptions MLR.1 through MLR.4.
– However, x3 has no effect on y after x1 and x2 have been controlled for.
– E(y|x1 , x2 , x3 ) = E(y|x1 , x2 ) = β0 + β1 x1 + β2 x2
– The variable x3 may or may not be correlated with x1 and x2 .
– Including one or more irrelevant variables does not affect the unbiasedness of the OLS
estimators.
– However, it can have undesirable effects on the variances of the OLS estimators.

Expected Value of the OLS Estimators

• Omitted Variable Bias: The Simple Case.


– The problem of excluding a relevant variable, i.e., underspecifying the model (misspecification
analysis).
– Suppose, the true population model is: y = β0 + β1 x1 + β2 x2 + u.
– However, due to our ignorance or data unavailability, we estimate the model by excluding
x2 .
– ỹ = β̃0 + β̃1 x1 .
– Deriving expected value of β̃1 .
– E(β̃1 ) = β1 + β2 δ̃1 .
– δ̃1 is the slope from the simple regression of x2 on x1 .
– Hence, the bias in β̃1 is:
– Bias(β̃1 ) = β2 δ̃1 (also called the omitted variable bias; the simulation sketch below
illustrates it).
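A small Monte Carlo sketch (simulated data; the true parameters and the correlation between x1 and x2 are made up) showing that the average of β̃1 across repeated samples is approximately β1 + β2 δ1.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 2000
beta0, beta1, beta2 = 1.0, 2.0, 3.0  # true parameters (made up for illustration)

def slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

b1_tilde = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)   # x2 is correlated with x1
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    b1_tilde[r] = slope(x1, y)           # omit x2: simple regression of y on x1

delta1 = 0.5                             # population slope from regressing x2 on x1 in this design
print(b1_tilde.mean(), beta1 + beta2 * delta1)  # both close to 3.5
```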

Expected Value of the OLS Estimators

• Omitted Variable Bias: The Simple Case.


– Two cases when β̃1 is unbiased:
– β2 = 0.
– δ̃1 = 0: x1 and x2 are uncorrelated in the sample.
– However, in reality we do not observe x2 .
– Hence, we can usually only guess at the sign of β2 and the sign of the correlation between
x1 and x2 , which together give the direction of the bias.

Expected Value of the OLS Estimators

• Omitted Variable Bias: The Simple Case.


– Summary of Bias.

8
Expected Value of the OLS Estimators

• Omitted Variable Bias: Hourly Wage Example


– Suppose true model is: log(wage) = β0 + β1 educ + β2 abil + u.
– However, we do not have data for abil: ability.
– Hence we obtain: \widehat{log(wage)} = 0.584 + 0.083 educ.
– This is the result from only a single sample, so we cannot say that .083 is greater than
β1 .
– Nevertheless, the average of the estimates across all random samples would be too large,
because abil is likely positively correlated with educ and β2 > 0 (upward bias).

Expected Value of the OLS Estimators

• Omitted Variable Bias.


– Upward Bias: E(β̃1 ) > β1 .
– Downward Bias: E(β̃1 ) < β1 .
– Biased towards zero: Cases where E(β̃1 ) is closer to 0 than is β1 .
– If β1 is positive, then β̃1 is biased toward zero if it has a downward bias.
– If β1 is negative, then β̃1 is biased toward zero if it has an upward bias.

Expected Value of the OLS Estimators

• Omitted Variable Bias: General Case.


– Deriving the sign of bias when there are multiple regressors is more difficult.
– Correlation between a single x and u generally results in all OLS estimators being biased.
• Wage Example.
– Suppose, true model: wage = β0 + β1 educ + β2 exper + β3 abil + u.
– If abil is omitted, the estimators of both β1 and β2 are generally biased, even if we assume
exper is uncorrelated with abil.
– We can get an idea of the direction of the bias in β̃1 only if we assume that exper is
uncorrelated with both educ and abil.
– Then, because β3 > 0 and educ and abil are positively correlated, β̃1 would have an
upward bias.

Variance of OLS Estimators

• Assumption MLR 5: Homoscedasticity.
– V ar(u|x1 , x2 , ..., xk ) = σ 2 .
– Formulas are simplified.
– OLS has an important efficiency property.
– Assumptions MLR.1 through MLR.5 are collectively known as the Gauss - Markov
assumptions.
– x (bold x) is used to denote all independent variables.
– V ar(y|x) = σ 2 .

Variance of OLS Estimators

• Sampling variances of the OLS slope estimators.


– Under Assumptions MLR.1 through MLR.5,
– V ar(β̂j ) = σ² / [SSTj (1 − Rj²)].
– SSTj = ∑_{i=1}^n (xij − x̄j )².
– Rj² is the R-squared from regressing xj on all other independent variables.
– A large variance means a less precise estimator: larger confidence intervals and less
accurate hypothesis tests (a numerical check of the formula follows below).
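The sketch below (simulated, correlated regressors; the error variance is treated as known just to check the algebra) computes Var(β̂1) from the formula σ²/[SST1(1 − R1²)] and confirms that it matches the corresponding diagonal element of σ²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Simulated, correlated regressors (illustrative only).
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

sigma2 = 1.5  # pretend the error variance is known, purely to check the algebra

# Var(beta1_hat) via sigma^2 / (SST_1 * (1 - R_1^2)).
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])        # regress x1 on the other regressors
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ g
r1_sq = 1 - np.sum(resid ** 2) / sst1        # R_1^2
var_formula = sigma2 / (sst1 * (1 - r1_sq))

# The same quantity from the matrix expression sigma^2 * (X'X)^{-1}.
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

print(np.isclose(var_formula, var_matrix))  # True
```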

Variance: Components of OLS Variances

• The Error Variance, σ 2 .


– Larger σ 2 means larger sampling variances.
– More “noise” in the equation (larger σ 2 ) makes it difficult to estimate the partial effect.
– However, it is unknown and we need to estimate it.
• The total sample variance in xj , SSTj .
– The larger the total variation in xj , the smaller the variance of the estimator.
– One way to increase sample variation in xj : Increase the sample size.

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– Rj2 is the R-squared from regressing xj on all other independent variables.
– Two independent variables: y = β0 + β1 x1 + β2 x2 + u.
– V ar(β̂1 ) = σ² / [SST1 (1 − R1²)].
– R1² is the R-squared from regressing x1 on x2 .
– R1² close to 1: Much of the variation in x1 is explained by x2 . High correlation.

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– General Case: Rj2 is the proportion of variation in xj that can be explained by the other
independent variables.
– Rj2 close to 0: when xj has near 0 sample correlation with every other independent
variable.

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– Multicollinearity: High (but not perfect) correlation between two or more independent
variables.
– Not a violation of Assumption MLR.3.
– Way to deal with the problem of multicollinearity: Increase the sample size.
– In addition, a high correlation between certain variables might be irrelevant.
– If we are interested in β1 , then a high correlation between x2 and x3 has no direct
effect on V ar(β̂1 ).

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– A statistic to determine the severity of multicollinearity (however, it is easy to misuse):
the Variance Inflation Factor,
– V IFj = 1 / (1 − Rj²).
– Hence the variance can be written as:
– V ar(β̂j ) = (σ² / SSTj ) · V IFj .

Variance: Variances in Misspecified models

• The choice of whether to include a particular variable in a regression model.


– The tradeoff between bias and variance.
• Suppose the True Model is: y = β0 + β1 x1 + β2 x2 + u.
• Two estimators of β1 .
– β̂1 from multiple regression: ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– β̃1 from simple regression: ỹ = β̃0 + β̃1 x1 .
– If β2 ≠ 0, β̃1 is biased, unless x1 and x2 are uncorrelated.
– However, β̂1 is unbiased for any value of β2 , including β2 = 0.
– Hence, if bias were the only criterion, β̂1 would be preferred.

Variance: Variances in Misspecified models

• However, variance is also important.


• V ar(β̃1 ) is always smaller than V ar(β̂1 ), unless x1 and x2 are uncorrelated in the sample, in
which case the two are the same.
• Assuming x1 and x2 are correlated, the following conclusions can be drawn.
– When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and V ar(β̃1 ) < V ar(β̂1 ).
– When β2 = 0, β̃1 and β̂1 are both unbiased, and V ar(β̃1 ) < V ar(β̂1 ).
• If β2 = 0, then including x2 will increase the variance of the estimator of β1 .
• If β2 ≠ 0, then excluding x2 will result in a biased estimator of β1 .
– Comparing the likely size of the omitted variable bias with the reduction in the variance.

Variance: Variances in Misspecified models

• However, when β2 ≠ 0, there are two favorable reasons for including x2 in the model.
– The bias in β̃1 does not shrink as the sample size increases, whereas the variances of both
estimators shrink toward zero as n grows.
– The error variance σ 2 effectively increases when x2 is dropped from the equation, since x2
then ends up in the error term.

Variance: Estimating σ 2

• Unbiased estimator of σ 2 in general multiple regression.


• Under the Gauss-Markov assumptions MLR.1 through MLR.5, E(σ̂ 2 ) = σ 2 .
• σ̂² = (∑_{i=1}^n ûi²) / (n − k − 1) = SSR / (n − k − 1).
• σ̂ is called the standard error of the regression (also the standard error of the estimate or
the root mean squared error).
• se(β̂j ) = σ̂ / [SSTj (1 − Rj²)]^{1/2}. This formula is invalid in the presence of heteroscedasticity.
• Thus, heteroscedasticity does not cause bias in β̂j , but it leads to bias in the usual formula for
V ar(β̂j ), which then invalidates the standard errors.
• The standard errors can also be written as:
– se(β̂j ) = σ̂ / [√n · sd(xj ) · √(1 − Rj²)] (a numerical check follows below).
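The sketch below (simulated data, arbitrary coefficients) computes σ̂² = SSR/(n − k − 1) and the standard error of β̂1 via the formula above, and checks that it matches the usual matrix expression based on σ̂²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400

# Simulated data (illustrative only).
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 0.5, -0.8]) + rng.normal(size=n)
k = X.shape[1] - 1  # number of slope parameters

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# Unbiased estimator of the error variance.
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)

# se(beta1_hat) via sigma_hat / sqrt(SST_1 * (1 - R_1^2)).
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
r = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r1_sq = 1 - (r @ r) / sst1
se_formula = np.sqrt(sigma2_hat / (sst1 * (1 - r1_sq)))

# se(beta1_hat) from the matrix expression sigma2_hat * (X'X)^{-1}.
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])

print(np.isclose(se_formula, se_matrix))  # True
```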

Efficiency of OLS: The Gauss-Markov Theorem

• Justifies the use of the OLS method rather than using a variety of competing estimators.
• Gauss-Markov Theorem.
– Under assumptions MLR 1 through MLR 5, the OLS estimator β̂j for βj is the best
linear unbiased estimator (Efficient).
– Best: Having the smallest variance.
– Linear: If it can be expressed as a linear function of the dependent variable.
– Unbiased: E(β̂j ) = βj .
– Estimator: It is a rule that can be applied to any sample of data to produce an estimate.

