Multiple Linear Regression Model
0 / 59
Econometrics: why do we do it?
•We love playing with data and are interested in relationships
•Ultimately: IDENTIFICATION
1
Empirical Economics
• Using data and statistical methods to examine the evidence – a
supplement/complement to theory
• Distinguishing between correlations and causal relationships is the key
task in empirical economics
• Correlated: Two economic variables move together.
• Causal: Movement in one variable causes movement in the other.
• Variables A and B move together: either (1) by chance, (2) A causes B, (3) B causes A, or
(4) another variable C causes movement in both A and B.
2
Ice Cream and Cancer, Nicolas Cage and Drowning
3
School enrollment rates are lower among Progresa beneficiaries.
Why?
What are the characteristics of grant beneficiaries?
• (1) kids with an existing education deficit; (2) those needing to leave school to work; (3) those needing to care for someone in a large household; (4) they may live in households with individuals who don't value education highly, having never had it or having had poor-quality education; (5) they live with family who also can't help their kids with their education; and finally (6) Progresa kids probably live in areas with low-quality schools.
•The probability of enrolment is way lower even before Progresa
•Long story short: low income (variable C) causes grant receipt (variable A), low income (variable C)
causes poor enrolment (variable B), thus we observe a correlation between grant receipt (A) and
poor enrolment (B). A does not cause B!
•Maybe the grant was not enough to get them through school, given their unfortunate circumstances,
or it came too late.
•We expect the coefficient on grant receipt to be negative.
•How can you really identify the grant effect?
4
Thoughts
•Identifying causal effects is exceptionally difficult – we don’t expect it from
you in your project, but we do expect you to think and talk about it
•We can use a randomised controlled trial to establish causality but it has its
own downsides logistically and ethically.
•What we can do right now, is to control for all those many important factors
which confound identification
5
Multiple Regression Analysis: Estimation
Chapter 3 Roadmap
1. Motivation for Multiple Regression
2. Mechanics and Interpretation of Ordinary Least Squares
3. The Expected Value of the OLS Estimators
4. The Variance of the OLS Estimators
5. Efficiency of OLS: The Gauss-Markov Theorem
6. The Language of MLR
7. Scenarios for MLR
7
8
Stata Output Interpretation
[Annotated Stata regression output: labels point to n, SST, SSE, SSR, R², σ̂², σ̂, and β̂1.]
𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢
• What’s missing?
• There is a lot which might be left in u
• What variable might be related to both of them?
10
• Interpretation? An additional year of education is associated with a R694 increase in the monthly wage, ceteris paribus.
11
The Model with Two Independent Variables
•Let’s add in age of the individual (in years)
𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝛽2 𝑎𝑔𝑒 + 𝑢
•We take age out of the error term and put it explicitly in the equation
Note:
◦ Education tends to be correlated with age
◦ 𝛽2 measures the ceteris paribus effect of age on wage
12
• 1 more year of age is associated with a R148 increase in monthly wage
• What happened to the coefficient on education?
13
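A minimal Python sketch of why the education coefficient changes when age is added. The data are simulated (not the course dataset) and every number is invented for illustration: if age affects wages and is correlated with education, the simple regression coefficient on educ absorbs part of the age effect, while the multiple regression recovers something close to the "true" partial effect.

import numpy as np

rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(18, 60, n)
educ = np.clip(16 - 0.1 * (age - 18) + rng.normal(0, 2, n), 0, 20)   # older cohorts have less education in this fake sample
wage = -9000 + 800 * educ + 150 * age + rng.normal(0, 2000, n)       # "true" effects chosen purely for illustration

X_simple = np.column_stack([np.ones(n), educ])        # regress wage on educ only
X_multi = np.column_stack([np.ones(n), educ, age])    # regress wage on educ and age
b_simple = np.linalg.lstsq(X_simple, wage, rcond=None)[0]
b_multi = np.linalg.lstsq(X_multi, wage, rcond=None)[0]

print(b_simple[1])   # educ coefficient absorbs part of the omitted age effect
print(b_multi[1])    # close to the "true" 800, holding age fixed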
MLR Advantage: Flexibility in Functional Form
•NB! Be super careful when interpreting the coefficients in a model with both age and age² (here the model is wage = β0 + β1 educ + β2 age + β3 age² + u).
•By how much does wage increase if age is increased by one year? It depends on how old the person already is:
∂wage/∂age = β2 + 2·β3·age
14
What is the impact of a change in age on wages? For a 22-year-old or a 60-year-old?
∂wage/∂age = 107.3 + 2·0.51·age
For a 22-year-old: ∂wage/∂age = 107.3 + 2·0.51·22 ≈ R129.7
For a 60-year-old: ∂wage/∂age = 107.3 + 2·0.51·60 ≈ R168.5
15
Was this a Linear Regression Model?
•Yes
•It does assume a quadratic relationship between wage and age BUT
•The model must be linear in the coefficients (𝛽𝑗 ), not the variables (linear
regression definition)
16
Multiple Linear Regression: Terminology
Explain variable y in terms of variables x1, x2, …, xk:
y = β0 + β1x1 + β2x2 + … + βkxk + u
β0 is the intercept; β1, …, βk are the partial effects (coefficients, slope parameters); u is the error term (disturbance, unobservables).
• NB! Holding other factors fixed = ceteris paribus = all things being equal
• For those with the same levels of x2, what is the impact of
changing x1 by one unit, on y?
• The coefficients are also called partial/marginal effects
18
Conditional Expected Value
Often we forget what the conditional expectation is:
An unconditional expectation, E(X), is just a number that we calculate.
Here is a conditional expectation: E(X | Y1, Y2, Y3)
This often confuses. Remember, E(X | Y1, Y2, Y3) means: calculate the expected value of X when the values of Y1, Y2, Y3 are given, or set at particular values. For example:
E(X | Y1, Y2, Y3) = E(X | Y1 = 5, Y2 = 1, Y3 = 18)
Calculate the expected value of X, but first set the values of Y1, Y2, Y3 as above.
What if the conditional mean equals the unconditional one? E.g. E(X | Y1, Y2, Y3) = E(X)
This implies the expected value of X doesn't change, even if you set Y1, Y2, Y3 to different values.
Thus X is mean-independent of Y1, Y2, Y3, which implies Corr(X, Y1) = 0, Corr(X, Y2) = 0, Corr(X, Y3) = 0.
19
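A small simulation of this idea, with made-up variables purely for illustration: when X depends on Y, E(X | Y = y) changes with y; when it does not, E(X | Y = y) is roughly the same as E(X) for every y.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Y = rng.integers(1, 4, n)               # Y takes the values 1, 2, 3
X_dep = 2 * Y + rng.normal(0, 1, n)     # X depends on Y
X_ind = 5 + rng.normal(0, 1, n)         # X does not depend on Y

print("E(X_dep)       =", X_dep.mean().round(2))
for y in (1, 2, 3):
    print(f"E(X_dep | Y={y}) =", X_dep[Y == y].mean().round(2))   # changes with y

print("E(X_ind)       =", X_ind.mean().round(2))
for y in (1, 2, 3):
    print(f"E(X_ind | Y={y}) =", X_ind[Y == y].mean().round(2))   # roughly constant, equal to E(X_ind)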
Key Assumption: Zero Conditional Mean
E(u | x1, x2, …, xk) = 0
•This is the same assumption as previously – SLR.4: E(u | x) = 0
•All the factors in u, the unobserved term, are uncorrelated with the x's.
•The other way to say it: E(u | x1, x2, …, xk) = E(u) = 0
•This will not hold if any of the x’s are correlated with anything in u
•Remember u is just a variable, whose mean we can calculate.
We will return to this concept.
20
Chapter 3 Roadmap
21
Random Sampling
•A random sample says every person in the sample of size n had an equal chance
of being selected
•Or, every household had an equal probability of being selected.
•The sample is then representative of the population
•Start with a random sample (n individuals, k independent variables)
•The sample of n individuals, each observed with the variables x1, …, xk and y, is represented as: {(xi1, xi2, …, xik, yi) : i = 1, …, n}
22
How do we obtain the OLS estimates?
• Start with a random sample (n individuals, k independent variables)
{(xi1, xi2, …, xik, yi) : i = 1, …, n}
23
24
Obtaining the OLS Estimates: β̂k
25
Obtaining the OLS Estimates
•So how do we minimize the sum of squared residuals?
26
How do we differentiate the SSR?
27
Given the SSR:
•We need ∂SSR/∂βi, for i = 0, 1, 2, …, k
•Set the derivatives equal to zero, to minimize the SSR
28
We obtain: [the k + 1 first-order conditions]
CLEAN UP: We divide through by −2 to get rid of the 2s and the minus signs.
29
Finally:
- We can also solve using the method of moments: it uses E(u) = 0 and E(xj·u) = 0, j = 1, 2, …, k
- Clearly we don't do this by hand.
- We would solve this using linear algebra (see the sketch below). How are we guaranteed unique solutions?
30
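A minimal sketch, with simulated data, of what "solve it with linear algebra" means: stacking the k + 1 first-order conditions gives the normal equations X'Xb = X'y, and the solution is unique exactly when X'X is invertible, i.e. when there is no perfect collinearity (MLR.3).

import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
x = rng.normal(size=(n, k))
y = 1.0 + x @ np.array([2.0, -0.5]) + rng.normal(0, 1, n)   # illustrative "true" coefficients

X = np.column_stack([np.ones(n), x])           # add the intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves the k + 1 normal equations
print(beta_hat)                                # approximately [1.0, 2.0, -0.5]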
Interpretation of the OLS Regression Equation
◦ MLR holds the other x variables fixed even if, in reality, they are
correlated with the x variable under consideration - "Ceteris paribus“
◦ NB! We still assume that unobserved factors in u do not change if the explanatory variables change: E(u | x1, …, xk) = E(u) = 0
31
Determinants of Wages: Interpretations
ŵage = −9557 + 865·educ + 147·age
• Holding age fixed, an additional year of education is associated with a R865 higher predicted monthly wage; holding education fixed, an additional year of age is associated with a R147 higher predicted monthly wage.
32
Properties of OLS in any Sample
• Fitted values and residuals
◦ Deviations from the regression line sum to zero
◦ Correlations between deviations and regressors are zero
◦ Sample averages of y and of the regressors lie on the regression line
33
Correlation and Covariance: Corr(X, Y) = Cov(X, Y) / (σX·σY)
Note:
E(ûi) = (1/n)·Σi ûi = 0, thus Σi ûi = 0
In addition: why does Σi xij·ûi = 0 tell us anything about Corr(xij, ûi)?
We know:
Cov(X, Y) = E(XY) − E(X)E(Y)
Cov(xij, ûi) = E(xij·ûi) − E(xij)·E(ûi) = E(xij·ûi)
E(xij·ûi) = (1/n)·Σi xij·ûi = 0
Implying Corr(xij, ûi) = 0
(Equivalently, Cov(X, Y) = E[(X − E(X))(Y − E(Y))]; and σX, σY > 0, so they don't change the sign of Cov(X, Y).)
34
OLS Facts
(1) We know the sample average of the residuals is zero and thus:
Given ûi = yi − ŷi, we average both sides; the LHS averages to 0, and we get ȳ = the sample average of the ŷi.
(2) The sample covariance between each xj and û is zero and thus:
Cov(ŷi, ûi) = 0
ŷi is a function of all the x's, and hence is also uncorrelated with ûi.
(3) The point (x̄1, x̄2, …, x̄k, ȳ) always lies on the regression line (page 74):
ȳ = β̂0 + β̂1·x̄1 + β̂2·x̄2 + ⋯ + β̂k·x̄k
Residuals: ûi = yi − ŷi
•A positive residual (ûi > 0) means our actual yi > our predicted ŷi, i.e. the model underpredicted the person's y value.
36
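A quick numerical check of these three OLS facts, on simulated data for illustration only: the residuals average to zero, they are uncorrelated with each regressor and with the fitted values, and the point of sample means lies on the regression line.

import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3 + 1.5 * x1 - 2 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
u_hat = y - y_hat

print(u_hat.mean())                                           # (1) ~0: residuals average to zero
print(np.cov(x1, u_hat)[0, 1], np.cov(y_hat, u_hat)[0, 1])    # (2) ~0: residuals uncorrelated with x's and fitted values
print(y.mean(), b @ X.mean(axis=0))                           # (3) the point of sample means lies on the regression line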
Simple and Multiple Regression Compared
• Simple regression of y on x1:
ỹ = β̃0 + β̃1·x1
• We can write β̃1 = β̂1 + β̂2·δ̃1 (Note: we didn't prove this),
where δ̃1 is the slope coefficient from the regression of x2 on x1
37
Given β̃1 = β̂1 + β̂2·δ̃1, when is β̃1 = β̂1?
1. β̂2 = 0
◦ This is when the partial effect of x2 on y is zero
2. δ̃1 = 0
◦ This is when x1 and x2 are uncorrelated
• We can use the formula to compare β̃1 and β̂1 (see the numerical check below).
NB! You should know how to use the relationship SST = SSE + SSR to derive
the R-squared.
39
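A numerical check, on simulated data, of the relationship β̃1 = β̂1 + β̂2·δ̃1: it is an exact algebraic identity in any sample, not just an approximation.

import numpy as np

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)              # x2 correlated with x1
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)    # illustrative "true" model

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

b_tilde = ols(np.column_stack([np.ones(n), x1]), y)        # simple regression: y on x1
b_hat = ols(np.column_stack([np.ones(n), x1, x2]), y)      # multiple regression: y on x1 and x2
delta = ols(np.column_stack([np.ones(n), x1]), x2)         # regression of x2 on x1

print(b_tilde[1], b_hat[1] + b_hat[2] * delta[1])          # the two numbers match exactly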
Goodness-of-Fit
•Decomposition of total variation: SST = SSE + SSR
•Hence R² = SSE/SST = 1 − SSR/SST (see the quick check below)
41
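A quick check, on simulated data, that SST = SSE + SSR and that R² can be computed either as SSE/SST or as 1 − SSR/SST.

import numpy as np

rng = np.random.default_rng(9)
n = 300
x = rng.normal(size=(n, 2))
y = 1 + x @ np.array([1.0, -2.0]) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)

sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)      # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)             # residual sum of squares

print(sst, sse + ssr)                      # equal
print(sse / sst, 1 - ssr / sst)            # both equal R-squared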
•R-squared never decreases (and usually increases) when an additional explanatory variable is added to the regression
45
Standard Assumptions for Multiple Regression
46
Standard Assumptions: MLR.3
•Assumption MLR.3 (no perfect collinearity)
"In the sample (and therefore in the population), none of the independent
variables is constant and there are no exact linear relationships
among the independent variables“
47
Violations of Assumption MLR.3
• One variable is a constant multiple of the other
◦ E.g. Regress wage on education in years, and education in decades
Educ_decades = educyrs/10
• A variable can be expressed as an exact linear function of others
◦ E.g. Wage on number of children in hh, number of adults in hh, and hh size
HHsize = numchildrenHH + numadultHH (don’t include all 3!)
• Too small sample size - MLR.3 fails if n < k + 1 or extreme bad luck
• Question: Could we include log(inc) and log(inc2) in a wage equation?
48
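A sketch, on simulated data, of what a perfect-collinearity violation looks like numerically. Note that log(inc²) = 2·log(inc), so the answer to the question above is no: the two variables are exactly linearly dependent and MLR.3 fails.

import numpy as np

rng = np.random.default_rng(5)
n = 100
inc = rng.uniform(1_000, 50_000, n)
log_inc = np.log(inc)
X = np.column_stack([np.ones(n), log_inc, 2 * log_inc])   # constant, log(inc), log(inc^2) = 2*log(inc)
wage = 500 + 300 * log_inc + rng.normal(0, 100, n)        # illustrative outcome

print(np.linalg.matrix_rank(X))                    # 2, not 3: the columns are linearly dependent
b, _, rank, _ = np.linalg.lstsq(X, wage, rcond=None)
print(rank)                                        # also 2: there is no unique coefficient vector
# A package like Stata typically detects this and drops one of the offending variables.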
MLR.4: Zero Conditional Mean
E(ui | x1i, x2i, …, xki) = E(u) = 0
The values of the explanatory variables should not contain any information about the mean of the unobserved factors.
If age was not included in the regression, it would end up in the error
term; it would be hard to defend that educ is uncorrelated with u
49
Zero Conditional Mean Misconceptions
Which is a better way of putting MLR.4?
E(ui | xi1, xi2, …, xik) = 0
Or
E(ui | xi1, xi2, …, xik) = E(u) = 0?
The 2nd line: the part that E(u|x) = E(u) is most important!
• That E(u) = 0 is immaterial (we could have made it any constant value)
• It is most important to know that u and x must be uncorrelated
50
When does MLR.4 fail?
E(ui | xi1, xi2, …, xik) = 0
This could fail with a mis-specified functional form:
• E.g. leave out age² in a wage equation, or use wage instead of log(wage).
• It also fails if we omit an important determinant of y which is correlated
with any of the xs.
→ This might happen if we are lacking data, or don‘t know what to include
•If MLR.4 is violated, the OLS estimators are biased
51
Endogenous vs Exogenous Variables
◦ Endogenous variables
◦ Explanatory variables that are correlated with the error term
◦ Endogeneity is a violation of assumption MLR.4
◦ Exogenous variables
◦ Explanatory variables that are uncorrelated with the error term
◦ MLR.4 holds if all explanatory variables are exogenous
◦ Exogeneity
◦ key assumption for a causal interpretation of the regression
◦ and for unbiasedness of the OLS estimators
52
When is an x variable endogenous?
• If we have omitted variables
→ E.g. no data on IQ in a wage equation
• If the x suffers from measurement error
→ E.g. household income
• If explanatory variables are determined jointly with y
→E.g. price and quantity
53
NB! How do MLR.3 and MLR.4 differ?
•MLR.3 – no perfect collinearity
◦ We can tell immediately if MLR.3 holds or doesn't
◦ Stata will just drop one of the offending variables for us – it is smarter than us
•MLR.4 – zero conditional mean
◦ We can never verify it directly, because u is unobserved
56
Does this regression satisfy MLR.1 – MLR.4?
57
Including Irrelevant Variables in a Regression
• Irrelevant variable: its coefficient = 0 in the population
• True model (contains x1 and x2)
• Estimated model (x2 is omitted): ỹ = β̃0 + β̃1·x1
• We use "~" rather than "^" for the underspecified model
59
60
What is β̃1 if we leave out x2?
• If x1 and x2 are correlated, assume a linear regression relationship between them
• It will look as if people with many years of education earn very high wages, BUT this is partly because people with more education are also more able on average.
62
When is there no OVB? Is β̃1 biased?
• There is no bias if the omitted variable is irrelevant, or uncorrelated with x1. If not:
β̃1 = β̂1 + β̂2·δ̃1
We must find E(β̃1):
E(β̃1) = E(β̂1 + β̂2·δ̃1) = E(β̂1) + E(β̂2·δ̃1) = E(β̂1) + E(β̂2)·δ̃1 = β1 + β2·δ̃1
(δ̃1 depends only on the x's, so it is treated as fixed when taking expectations conditional on the sample x values.)
The bias is the difference between the two: Bias(β̃1) = E(β̃1) − β1 = β2·δ̃1
64
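A Monte Carlo sketch of this bias formula, with illustrative parameter values: averaging the short-regression slope over many repeated samples gives roughly β1 + β2·δ1, so the bias is β2·δ1.

import numpy as np

rng = np.random.default_rng(6)
beta1, beta2, delta1 = 2.0, 3.0, 0.5      # delta1: population slope of x2 on x1
n, reps = 200, 2000
b_tilde = np.empty(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)
    y = 1 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])                 # x2 omitted
    b_tilde[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]

print(b_tilde.mean())            # roughly 3.5 = beta1 + beta2*delta1
print(beta1 + beta2 * delta1)    # so the bias is beta2*delta1 = 1.5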
OVB Terminology: Bias(β̃1) = E(β̃1) − β1 = β2·δ̃1
65
What happened here? Is there OVB?
66
δ̃1 < 0 (?) Why might this be the case?
67
OVB: More General Cases
Estimated model
(x3 is omitted)
•We add exper to the wage equation used previously, with omitted variable abil
•What happens to the coefficient on exper if we omit abil?
•The coefficients on both educ and exper will generally be biased, even if Corr(abil, exper) = 0
•Will the coefficient on educ be biased?
•We can treat this like the simple two-variable case if (1) Corr(abil, exper) = 0 and (2) Corr(educ, exper) = 0, i.e. abil is only correlated with educ (not exper), and educ and exper are not correlated.
•This is quite unlikely
•We usually just ignore the other variables and focus on our variable of interest (e.g. x1), and
discuss whether its coefficient is biased – but this is only ok if x2 to xk are uncorrelated with x1.
69
Discussion
70
Chapter 3 Roadmap
1. Motivation for Multiple Regression
2. Mechanics and Interpretation of Ordinary Least Squares
3. The Expected Value of the OLS Estimators
4. The Variance of the OLS Estimators
5. Efficiency of OLS: The Gauss-Markov Theorem
6. The Language of MLR
7. Scenarios for MLR
71
Standard MLR Assumptions: Continued
72
Wait. Why is Var(u | x) = Var(y | x) = σ²?
Given: y = β0 + β1x1 + β2x2 + ... + βkxk + u
Let A = β0 + β1x1 + β2x2 + ... + βkxk.
Then y = A + u, and Var(y | x) = Var(A + u | x).
Then Var(y | x) = Var(A | x) + Var(u | x), because A and u are independent: Cov(A, u) = 0 (if E(u | x) = E(u) = 0).
Var(A | x) = 0 because once the values of x are given, A does not vary.
Therefore Var(y | x) = 0 + Var(u | x) = 0 + σ² = σ².
73
The Canonical Heteroskedasticity E.g.
74
Gauss-Markov Assumptions
• MLR.1 through MLR.5 are the Gauss-Markov assumptions for cross-sectional regression
• Assumptions MLR.1 and MLR.4 summarized:
E(y | x) = β0 + β1x1 + β2x2 + ⋯ + βkxk
75
Theorem 3.2:
Sampling Variances of the OLS Slope Estimators
NB! this is tattoo worthy, t shirt worthy, write on the wall worthy
76
The Components of Variance: Var(β̂j) = σ² / [SSTj·(1 − Rj²)]
•NB! We want Var(β̂j) to be small. WHY?
•The error variance (𝜎 2 )
◦ A high 𝜎 2 increases the sampling variance due to more "noise“ in the equation
◦ A large error variance makes estimates imprecise
◦ 𝜎 2 does not decrease with sample size. WHY? It is a population level parameter
• The total sample variation in the explanatory variable (SSTj)
◦ More sample variation leads to more precise estimates
◦ Total sample variation automatically increases with the sample size
◦ Increasing the sample size is a way to get more precise estimates
77
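A numerical illustration, on simulated data with a known σ², that the components formula Var(β̂j) = σ²/[SSTj(1 − Rj²)] is the same quantity as the j-th diagonal element of σ²(X'X)⁻¹.

import numpy as np

rng = np.random.default_rng(7)
n, sigma2 = 1000, 4.0
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)            # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

# Var(beta1_hat) from the components formula
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])         # regress x1 on the other regressors
g = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r2_1 = 1 - np.sum((x1 - Z @ g) ** 2) / sst1
var_formula = sigma2 / (sst1 * (1 - r2_1))

# Var(beta1_hat) from sigma^2 * (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

print(var_formula, var_matrix)                # the two agree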
OLS Variance Components: Multicollinearity
Var(β̂j) = σ² / [SSTj·(1 − Rj²)]
Investigating linear relationships among the x's (Rj²):
Regress xj on ALL other independent variables (including a constant)
◦ Var(β̂j) will be higher the better xj can be linearly explained by the other x's
(Why? Look to see where Rj² is in the variance formula)
◦ The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. Rj² → 1 for some xj)
78
Multicollinearity: Var(β̂j) = σ² / [SSTj·(1 − Rj²)]
•Rj² = 1
◦ Ruled out by MLR.3
◦ xj is a perfect linear combination of some of the other x's
•Rj² close to 1
◦ High correlation between 2 or more x variables - but not perfect!
◦ Multicollinearity, but NOT a violation of MLR.3
79
Example: Test Marks
𝑡𝑒𝑠𝑡𝑚𝑎𝑟𝑘 = 𝛽0 + 𝛽1 𝑛𝑢𝑚_𝑙𝑒𝑐𝑡𝑢𝑟𝑒𝑠 + 𝛽2 𝑚𝑎𝑡𝑟𝑖𝑐_𝑚𝑎𝑟𝑘 + 𝛽3 𝐺𝑃𝐴 + 𝑢
80
Multicollinearity Can be Irrelevant
• Consider this model:
𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢
81
Multicollinearity Discussion
◦ Dropping some independent variables may reduce multicollinearity (but
might lead to omitted variable bias)
◦ NB: Only the sampling variance of the variables involved in
multicollinearity will be inflated; the estimates of other effects may be
very precise – thank goodness!
◦ NB: that multicollinearity is NOT a violation of MLR.3 in the strict sense
◦ Multicollinearity may be detected through variance inflation factors (limited usefulness): VIFj = 1 / (1 − Rj²). Arbitrary rule of thumb often used: the VIF should not be larger than 10.
82
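A short sketch, on simulated data, of computing VIFj = 1/(1 − Rj²) by regressing each xj on the other regressors; only the variables involved in the near-linear relationship get large VIFs, and the VIF > 10 cutoff is just a heuristic.

import numpy as np

rng = np.random.default_rng(8)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)            # x2 highly correlated with x1
x3 = rng.normal(size=n)                       # x3 unrelated to the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # R^2 from regressing column j on the remaining columns plus a constant
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])   # large VIFs for x1 and x2, roughly 1 for x3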
Variances in Misspecified Models
◦ We decide to include a particular variable in a regression by analyzing
the tradeoff between bias and variance
Estimated Model 1 (includes x1 and x2): ŷ = β̂0 + β̂1·x1 + β̂2·x2
Estimated Model 2 (x2 omitted): ỹ = β̃0 + β̃1·x1
•What if x2 is relevant?
•Conclusion: it's weird, but from a variance perspective Model 2 is always preferred, since Var(β̃1) ≤ Var(β̂1)
84
Variances in Misspecified Models
• Things to note:
88
Estimating the OLS Sampling Variances
•The true sampling variance of the estimated coefficient: Var(β̂j) = σ² / [SSTj·(1 − Rj²)]
•The estimated sampling variance of the estimated coefficient: replace σ² with its estimate σ̂²
•Note that these formulae are only valid under MLR.1-MLR.5 (in particular, there has to be homoskedasticity)
89
Standard errors: se(β̂j) = σ̂ / √[SSTj·(1 − Rj²)] – the square root of the estimated sampling variance
90
Heteroskedasticity: Var(u | x) = Var(y | x) ≠ σ²
91
Chapter 3 Roadmap
93
Theorem 3.4: Gauss-Markov Theorem
◦ Under assumptions MLR.1 - MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e. no other linear unbiased estimator has a smaller sampling variance
•Estimator
◦ Rule that can be applied to any sample of data to produce an estimate
95
Chapter 3 Roadmap
1. Motivation for Multiple Regression
2. Mechanics and Interpretation of Ordinary Least Squares
3. The Expected Value of the OLS Estimators
4. The Variance of the OLS Estimators
5. Efficiency of OLS: The Gauss-Markov Theorem
6. The Language of MLR
7. Scenarios for MLR
96
The Language of Multiple Linear Regression
•NB: OLS is an estimation method, not a model (like the linear model below)
97
Interpreting logs in regressions
Note also that when we have a level-log or a log-level model, β1 has a semi-elasticity interpretation.
When we have a log-log model, β1 has an elasticity interpretation.
98
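For example (with made-up numbers, not estimates from the course data): in the log-level model log(wage) = β0 + β1·educ + u, an estimate β̂1 = 0.08 says that one more year of education is associated with roughly an 8% (≈ 100·β̂1 %) higher wage, ceteris paribus; in the log-log model log(wage) = β0 + β1·log(hours) + u, an estimate β̂1 = 0.25 says that a 1% increase in hours is associated with about a 0.25% increase in wage.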