
MFIN 514: Multiple Regression, Violations of OLS Assumptions

Dr. Ryan Ratcliff

Multiple Regression Model
Consider the case of two (or more) regressors:
Yi = β0 + β1X1i + β2X2i + ei, i = 1,…,n

Y is the dependent variable
X1, X2 are the two independent variables (regressors)
(Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
β0 = intercept
β1 = effect on Y of a change in X1, holding X2 constant
β2 = effect on Y of a change in X2, holding X1 constant
ei = the regression error (omitted factors)

With a few exceptions, most of what you know about simple regression will generalize to this case with multiple regressors.

Omitted Variable Bias
• In our test scores example, we found that test scores were negatively correlated with the student–teacher ratio (STR).

• Everything that might affect test scores that's not STR is in the error term.

• OLS assumes that the error is uncorrelated with STR. If there is something in the error term that's correlated with STR, our estimate of β will be biased.

Omitted Variable Bias

• Districts with a lower % of English learners (PCT_EL) have higher test scores AND lower STR (smaller classes).
• Do we find a negative correlation between STR and test scores because STR is just a proxy for PCT_EL?

Omitted Variable Bias: Math
TESTSCRi = a + b(STRi) + d(EL_PCTi) + ei

This is the "correct" regression that accounts for both variables, and the b and d coefficients have the usual "holding all else constant" interpretation.

EL_PCTi = c + g(STRi) + ui   (STR and EL_PCT are correlated)

Substituting the second equation into the first:

TESTSCRi = a + b(STRi) + d[c + g(STRi) + ui] + ei
TESTSCRi = (a + dc) + (b + dg)STRi + (dui + ei)

If we regress on STR alone, the estimated coefficient is actually (b + dg): a mix of the coefficient we're trying to estimate (b) and an indirect effect that arises because EL_PCT matters for TESTSCR (d) and STR is correlated with this omitted variable (g). The dg term is called "omitted variable bias."
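The algebra can be checked with a quick simulation. Below is a minimal Stata sketch with made-up parameters (b = -1, d = -0.6, g = 1.5; none of these values come from the actual CA data):

* Simulate omitted variable bias; all parameter values are illustrative
clear
set obs 10000
set seed 42
generate str = rnormal(20, 2)
generate el_pct = 10 + 1.5*str + rnormal(0, 5)                 // g = 1.5 > 0
generate testscr = 700 - 1*str - 0.6*el_pct + rnormal(0, 10)   // b = -1, d = -0.6
regress testscr str            // short regression: slope near b + dg = -1.9
regress testscr str el_pct     // long regression: slope near b = -1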

Omitted Variable Bias: Intuition
TESTSCRi = a + b(STRi) + d(EL_PCTi) + ei
EL_PCTi = c + g(STRi) + ui   (STR and EL_PCT are correlated)
TESTSCRi = (a + dc) + (b + dg)STRi + (dui + ei)

Our regression shows a negative coefficient on STR: bigger classes predict lower test scores. However, if
1) big classes tend to have high shares of non-native speakers (g > 0), AND
2) a high EL_PCT also predicts lower test scores (d < 0), then

dg < 0 will make our estimated coefficient in the STR-only regression look more negative than the base effect, b – our estimate is biased. In some sense, STR "gets credit" for some of EL_PCT's negative effect on test scores.

Cures for Omitted Variable Bias
Three ways to overcome omitted variable bias:

1. Run a controlled experiment in which treatment (STR) is randomly assigned: then PctEL is still a determinant of TestScore, but PctEL is uncorrelated with STR (g = 0). (This solution to OV bias is often infeasible.)

2. Adopt the "cross tabulation" approach, with finer gradations of STR and PctEL – within each group, all classes have the same PctEL, so we "control for PctEL." (Common in finance.)

3. Use a regression in which the omitted variable (PctEL) is no longer omitted: include PctEL as an additional regressor in a multiple regression.

CA Test Scores: Mult. Regression
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------

------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------

• The t-test for the significance of an individual coefficient is the same as before…
• Compare these two regressions, and relate these results to our previous discussion of omitted variable bias.

Coef. Tests, Predictions Same as Before

t-statistic: t = (β̂ – β_H0) / SE(β̂); for the usual H0: β = 0, this is t = β̂ / SE(β̂)

Conf. interval: β̂ ± (5% critical value) × SE(β̂)

Predicted Values:
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------

TESTSCR Prediction = 686.03 – 1.10*STR – 0.6498 * PCTEL
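For example, plugging in the illustrative values STR = 20 and PCTEL = 10 (not any particular district):

display 686.03 - 1.10*20 - 0.6498*10    // predicted testscr = 657.532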

N – K: Degrees of Freedom
reg testscr str pctel, robust;

Regression with robust standard errors Number of obs = 420


F( 2, 417) = 223.82
Prob > F = 0.0000
R-squared = 0.4264
Root MSE = 14.464

------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------

• A number of tests (esp. ANOVA) will require you to know the sample size (N) and the number of regressors beyond the constant (K).

• Here, N = 420 and K = 2. Several formulae (e.g., the F-stat) will contain a "degrees of freedom" correction N – K – 1 (the –1 is for the constant). Here N – K – 1 = 417, matching the F(2, 417) in the output above.

N – K: SER and RMSE
As in regression with a single regressor, the Std. Error of the Regression and the Root Mean-Sq. Error are measures of the spread of the Ys around the regression line (the std. dev. of the errors):

SER = √[ (1/(n – k – 1)) Σ ûi² ]   (degrees-of-freedom correction)

RMSE = √[ (1/n) Σ ûi² ]   (no correction)
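Both can be computed by hand from the residuals; a minimal Stata sketch for the test-score regression above (k = 2):

quietly regress testscr str pctel
predict uhat, residuals
generate uhat2 = uhat^2
quietly summarize uhat2
display "RMSE = " sqrt(r(mean))                   // divide by n: no correction
display "SER  = " sqrt(r(sum)/(r(N) - 2 - 1))     // divide by n - k - 1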

ANOVA: Same, except N - k
Source of Variation      df          Sum of Squares   Mean Square
Regression (explained)   k           RSS              MSR = RSS/k
Error (unexplained)      n – k – 1   SSE              MSE = SSE/(n – k – 1)
Total                    n – 1       SST

R2 = explained variation / total variation = RSS / SST

F = (RSS/k) / (SSE/(n – k – 1)) = MSR / MSE
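As a numerical illustration, using the ANOVA table from the CFA example later in this deck (k = 2, n – k – 1 = 194):

display (0.103/2) / (0.559/194)    // F = MSR/MSE, about 17.9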

R2 vs. Adjusted R2
Recall R2 = RSS / SST = 1 – SSE / SST.

This formulation has the annoying feature that R2 always increases when we add variables to the regression, even if they are insignificant.

Adj. R2 uses the N – K logic to apply a penalty for including another variable:

Adj. R2 = 1 – [(n – 1)/(n – k – 1)] × (SSE/SST) = 1 – [(n – 1)/(n – k – 1)] × (1 – R2)

If SSE doesn't go down enough, the benefit of the new variables does not exceed the cost, so Adj. R2 won't increase.
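For example, with the CFA ANOVA table that appears below (n = 197, k = 2):

display 1 - (196/194)*(0.559/0.662)    // Adj. R2 about 0.147, versus plain R2 about 0.156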

CFA 2 Questions on Mult. Regression
Variable              Coefficient   Standard Error of the Coefficient   t-statistic   p-value
Intercept             0.043         0.01159                             3.71          < 0.001
Ln(No. of Analysts)   −0.027        0.00466                             −5.80         < 0.001
Ln(Market Value)      0.006         0.00271                             2.21          0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression   2                    0.103            0.051
Residual     194                  0.559            0.003
Total        196                  0.662

Dave Turner is a security analyst who is using regression analysis to determine how well two
factors explain returns for common stocks. The independent variables are the natural logarithm
of the number of analysts following the companies, Ln(no. of analysts), and the natural
logarithm of the market value of the companies, Ln(market value). The regression output
generated from a statistical program is given in the following tables. Each p-value corresponds
to a two-tail test.

Turner plans to use the result in the analysis of two investments. WLK Corp. has twelve analysts
following it and a market capitalization of $2.33 billion. NGR Corp. has two analysts following it
and a market capitalization of $47 million.

CFA 2 Questions on Mult. Regression
Variable              Coefficient   Standard Error of the Coefficient   t-statistic   p-value
Intercept             0.043         0.01159                             3.71          < 0.001
Ln(No. of Analysts)   −0.027        0.00466                             −5.80         < 0.001
Ln(Market Value)      0.006         0.00271                             2.21          0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression   2                    0.103            0.051
Residual     194                  0.559            0.003
Total        196                  0.662

The 95% confidence interval (use a t-stat of 1.96 for this question only) of the
estimated coefficient for the independent variable Ln(Market Value) is closest to:

A) 0.011 to 0.001
B) 0.014 to -0.009
C) -0.018 to -0.036
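One way to check, using only the coefficient and standard error from the table:

display 0.006 - 1.96*0.00271    // lower bound, about 0.001
display 0.006 + 1.96*0.00271    // upper bound, about 0.011 -> closest to A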

CFA 2 Questions on Mult. Regression
Variable              Coefficient   Standard Error of the Coefficient   t-statistic   p-value
Intercept             0.043         0.01159                             3.71          < 0.001
Ln(No. of Analysts)   −0.027        0.00466                             −5.80         < 0.001
Ln(Market Value)      0.006         0.00271                             2.21          0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression   2                    0.103            0.051
Residual     194                  0.559            0.003
Total        196                  0.662

NGR Corp. has two analysts following it and a market capitalization of $47 million. If the number of analysts on NGR Corp. were to double to 4, the change in the forecast of NGR would be closest to:

A) −0.019.
B) −0.035.
C) −0.055.
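A quick check: doubling the analyst count raises Ln(No. of Analysts) by ln(2), regardless of the starting level:

display -0.027*ln(2)    // about -0.019 -> closest to A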

CFA 2 Questions on Mult. Regression
Variable              Coefficient   Standard Error of the Coefficient   t-statistic   p-value
Intercept             0.043         0.01159                             3.71          < 0.001
Ln(No. of Analysts)   −0.027        0.00466                             −5.80         < 0.001
Ln(Market Value)      0.006         0.00271                             2.21          0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression   2                    0.103            0.051
Residual     194                  0.559            0.003
Total        196                  0.662

Based on the R2 calculated from the information in Table 2, the analyst should conclude that the number of analysts and ln(market value) of the firm explain:

A) 84.4% of the variation in returns.
B) 18.4% of the variation in returns.
C) 15.6% of the variation in returns.
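Again, simple arithmetic from the ANOVA table:

display 0.103/0.662    // R2 = RSS/SST, about 0.156 -> choice C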

Model Interpretation
Big picture, across-model comments:

• Across all variations, STR has a statistically significant, negative relationship with test scores.

• Which model seems best? Both STR and PCT_EL change magnitudes depending on the presence of other controls. Model 3 has the highest Adj. R2 / lowest SER and seems to best address omitted variable bias.

• Econ. interpretation / significance of coefficients: In Model 3, a 1-student increase in average class size predicts a 1-point drop in test scores. Since average class sizes vary by more than 20 students across the sample, these differences matter on an 800-point test.

Specification Tricks: Dummy Vars.

A dummy variable is a 0 or 1 variable that groups the data into categories:

• ACTION = 1 if this was an action movie
• SEQUEL = 1 if this movie is a sequel

Specification Tricks: Dummy Vars.

To interpret the dummy, write out the prediction equation by category. Overall, it's

BOX = $5,672,516 + 236,527*BUDGET – 2,807,283*ACTION + …

For a new comedy (ACTION, SEQUEL, HORROR = 0), our prediction is BOX = $5,672,516 + 236,527*BUDGET – 2,807,283*(0)

For a new action movie: BOX = $5,672,516 + 236,527*BUDGET – 2,807,283*(1) = $2,865,233 + 236,527*BUDGET

Specification Tricks: Dummy Vars.

For a new action movie: BOX = $5,672,516 + 236,527*BUDGET – 2,807,283*(1) = $2,865,233 + 236,527*BUDGET

When the dummy appears alone, it is an intercept shifter: the prediction is that box office will be lower for an action movie than for a non-action movie with the same budget (?!)

However, significance matters here: the weak t-stat (p-value 53%) says that we can't reject the null that box office for action movies is no different from other movies.

Specification Tricks: Dummy Vars.
What if we had the idea that action movies generate more box office per dollar of budget – a different slope?

By multiplying the variable by the appropriate dummy, we can model this difference in slope:

BOX = a + c*ACTION + b*BUDGET + d*(BUDGET*ACTION) + …

Not action (ACTION = 0):
BOX = a + c*0 + b*BUDGET + d*(BUDGET*0) + … = a + b*BUDGET + …

Action movie (ACTION = 1):
BOX = a + c*1 + b*BUDGET + d*(BUDGET*1) + … = (a + c) + (b + d)*BUDGET + …

d is the difference in slope for action movies; c is the difference in intercept.
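In Stata, factor-variable notation builds this specification in one step. A minimal sketch, assuming variables named box, budget, and a 0/1 action dummy as on these slides:

* c.budget##i.action expands to budget, action, and their interaction
regress box c.budget##i.action, robust
* The coefficient on 1.action is c (the intercept shift);
* the coefficient on 1.action#c.budget is d (the slope shift)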

Specification Tricks: Dummy Vars.

[Regression output figure not reproduced here.] How do you interpret these results?
Specification Tricks: Logs
Lots of regression specifications use logs:

I. linear-log Yi = β0 + β1ln(Xi) + ui
II. log-linear ln(Yi) = β0 + β1Xi + ui
III. log-log ln(Yi) = β0 + β1ln(Xi) + ui

There are two main reasons to use logs:

1) It's an easy cure for skewed/heteroskedastic data
2) Coefficients on logs give percentage changes (elasticities)

Logs: Skew / Heteroskedasticity
Many size-type variables in finance are very skewed, which can distort OLS. Taking the log of data like this gives a more normal distribution.
[Histograms: Current Assets (strongly right-skewed) vs. ln(Current Assets) (roughly symmetric, bell-shaped).]

Specification Tricks: Logs
I. linear-log: Yi = β0 + β1ln(Xi) + ui
   A 1% change in X → a 0.01·β1 unit change in Y

II. log-linear: ln(Yi) = β0 + β1Xi + ui
   A 1 unit change in X → a 100·β1 % change in Y

III. log-log: ln(Yi) = β0 + β1ln(Xi) + ui
   A 1% change in X → a β1 % change in Y (an elasticity)

Important: You can't compare SER, R2, etc. across a model of Y vs. ln(Y) – different units.
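A minimal sketch of case III in Stata, reusing the hypothetical movie variables from earlier slides:

generate l_box = ln(box)
generate l_budget = ln(budget)
regress l_box l_budget, robust    // slope is an elasticity: 1% more budget predicts about β1% more box office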

Violations of Regression Assumptions
Regression Assumption                               Condition if Violated
Error term has constant variance.                   Heteroskedasticity
Error terms are not correlated with each other.     Serial correlation (autocorrelation)
No exact linear relationship among "X" variables.   Multicollinearity

Define it, Explain its effect on OLS

Detect it, Correct for it

Heteroskedasticity
Type 1: Unconditional heteroskedasticity – doesn’t matter

Type 2: Conditional heteroskedasticity


• Related to independent variables (next slide)
• This IS a problem
• Impact: t-stats are usually artificially high

• Coefficient estimates: not affected; OLS standard errors: too small
• Standard error too low = t-stat too high; Type I errors

Conditional Heteroskedasticity
[Scatter diagram: residual variance grows with X – low residual variance at small X, high residual variance at large X.]
Detection: Scatter diagrams can show when error
variance changes systematically with an X variable.

Conditional Heteroskedasticity
Breusch-Pagan test: Regress squared
residuals on “X” variables.

• Point: Test the significance of the resulting R2 (do the independent variables explain a significant part of the variation in the squared residuals?)
• H0: No heteroskedasticity
• Chi-square test: BP = n × R2 of the residual regression (with k df)

Name Drop: B-P Test detects heteroskedasticity
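A minimal sketch of the B-P mechanics in Stata, using the test-score regression from earlier (k = 2):

quietly regress testscr str pctel
predict u, residuals
generate u2 = u^2
quietly regress u2 str pctel                 // squared residuals on the X's
display "BP = " e(N)*e(r2)                   // n times R-squared
display "p  = " chi2tail(2, e(N)*e(r2))      // compare to chi-square with k = 2 df
* Built-in shortcut: estat hettest, rhs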

Correcting for Heteroskedasticity
First Method: Use STATA "robust" standard errors (Huber-White standard errors). Result: Relative to OLS, standard errors are higher, t-stats are lower, and conclusions are more accurate.

Second Method: Use generalized least squares, modifying the original equation to eliminate heteroskedasticity (not on CFA 2).

Serial Correlation: Definition
Positive autocorrelation: Each error term is
positively correlated w/ previous error.
• Common in financial time series data; not as common for
cross-sectional data.

Same problems as heteroskedasticity


• OLS t-stats are too high (Type I errors): coefficients are not affected, but OLS standard errors are too small – again, false significance

Serial Correlation: Detection
Residual Plots – clusters of +/- errors

Durbin-Watson statistic: DW ≅ 2(1 – ρ)

Three cases: no correlation, positive correlation, and negative correlation.
• No autocorrelation (ρ = 0): DW ≅ 2(1 – 0) = 2
• Positive serial correlation (ρ = 1): DW ≅ 2(1 – 1) = 0
• Negative serial correlation (ρ = –1): DW ≅ 2(1 – (–1)) = 4
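A minimal sketch, assuming hypothetical time-series variables y, x, and a date variable:

tsset date                // declare the data as a time series
quietly regress y x
estat dwatson             // near 2: no autocorrelation; near 0: positive; near 4: negative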

Serial Correlation: Correction
Preferred method: Use HAC Std. Errors
• Hansen or Newey-West Heteroskedasticity and Autocorrelation Consistent (HAC) std. errors are bigger than OLS errors, which offsets OLS's tendency to over-reject H0.
• Some gymnastics required to implement in STATA.

Alternative: Quasi-Differencing
• Old school: transform the data with an estimate of the correlation between errors so that the new data are not serially correlated.
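For reference, Stata's built-in Newey-West command; the variable names and lag choice below are purely illustrative:

tsset date
newey y x, lag(4)         // HAC standard errors with 4 lags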

Multicollinearity
Define: Two or more "X" variables are strongly correlated with each other.
Intuition: When X1 and X2 are strongly correlated, it is hard to estimate the effect of changing X1 while X2 is held constant.
Effects:
• Inflates OLS SEs; reduces OLS t-stats; increases the chance of Type II "should reject but don't" errors
• Point: t-stats are artificially small, so variables falsely look unimportant

Multicollinearity: Detection & Correction
Observation 1: Significant F-stat (and high R2), but all t-stats insignificant

Observation 2: High correlation between "X" variables (more complicated for k > 2)

Correction:
• Omit one or more of the correlated "X" variables

Perfect Multicollinearity: Dummy Trap
Suppose your data can be perfectly sorted into 2 (or more)
categories: e.g. USD Alums and others

If you include a USD Alum and Not USD Alum dummy together,
then summing the Alum + Not variables will equal 1 for every
observation, which is identical to the constant – “perfect
multicollinearity”

Correction
• Estimate the constant and leave out one dummy: constant is
the intercept for the omitted dummy, other dummies are
deviations from that intercept.
• No constant and all the dummies: each dummy is an intercept.
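A minimal sketch of the trap and its fix, assuming a 0/1 variable alum and outcome y (hypothetical names):

generate not_alum = 1 - alum
regress y alum not_alum    // Stata drops one term: perfect collinearity with the constant
regress y alum             // correct: constant = non-alum intercept; alum = deviation from it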

Summarizing Problems & Fixes

Violation      Conditional Heteroskedasticity        Serial Correlation                    Multicollinearity
What is it?    Residual variance related to          Residuals are correlated              Two or more X's are correlated
               the level of the X's
Effect?        Type I errors (over-reject)           Type I errors                         Type II errors
Detection?     Breusch-Pagan chi-square test         Durbin-Watson test                    Conflicting t and F statistics
Correction?    White-corrected standard errors       Hansen / Newey-West standard errors   Drop one of the correlated variables

Key Concepts for CFA 2
• Regression: Output, Hypo Test, Conf. Int.

• ANOVA table

• OLS Problems: Detect, Effects, Cures

