
ECON2280 Introductory Econometrics

First Term, 2024-2025

Multiple Regression Analysis: Further Issues

Fall, 2024

Effects of Data Scaling on OLS Statistics

Example

▶ Consider the model relating infant birth weight to cigarette smoking
and family income:

\widehat{bwght} = β̂0 + β̂1 cigs + β̂2 faminc

– bwght: child birth weight, in ounces


– cigs: number of cigarettes smoked by the mother while
pregnant, per day.
– faminc: annual family income, in thousands of dollars.

▶ The t statistic on cigs is −5.06, so the variable is very statistically
significant.
▶ Now, we show how the coefficients, standard errors, confidence
intervals, t statistics, and F statistics change when the dependent
and independent variables are rescaled.

Example
Table 6.1: Effects of Data Scaling (standard errors in parentheses)

Dependent Variable       (1) bwght      (2) bwghtlbs    (3) bwght
Independent Variables
cigs                     −.4634         −.0289          —
                         (.0916)        (.0057)
packs                    —              —               −9.268
                                                        (1.832)
faminc                   .0927          .0058           .0927
                         (.0292)        (.0018)         (.0292)
intercept                116.974        7.3109          116.974
                         (1.049)        (.0656)         (1.049)
Observations             1,388          1,388           1,388
R-squared                .0298          .0298           .0298
SSR                      557,485.51     2,177.6778      557,485.51
SER                      20.063         1.2539          20.063

▶ The estimates, obtained using the data in BWGHT.RAW, are given in
column (1) of Table 6.1, with standard errors in parentheses. The
estimate on cigs says that if a woman smoked 5 more cigarettes per
day, birth weight is predicted to be about .4634(5) = 2.317 ounces less.
▶ What happens when we measure birth weight in pounds rather than
in ounces? Let bwghtlbs = bwght/16, and divide the entire fitted
equation by 16.
▶ All the coefficients, standard errors, CIs, and the standard error of
the regression (SER) are rescaled by 1/16; the SSR is rescaled by
(1/16)²; the t statistics, F statistics, and R-squared remain unchanged.
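▶ A minimal numpy sketch of these rescaling rules, on simulated data
(the coefficient values and variable construction below are made up for
illustration; the BWGHT data themselves are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1388
cigs = rng.poisson(2.0, n).astype(float)   # hypothetical cigarettes/day
faminc = rng.gamma(4.0, 8.0, n)            # hypothetical income, $1000s
bwght = 117 - 0.5 * cigs + 0.09 * faminc + rng.normal(0, 20, n)  # ounces

def ols(y, *xs):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_oz = ols(bwght, cigs, faminc)        # birth weight in ounces
b_lb = ols(bwght / 16, cigs, faminc)   # in pounds: everything scaled by 1/16
b_pk = ols(bwght, cigs / 20, faminc)   # packs: cigs coefficient scaled by 20

print(np.allclose(b_lb, b_oz / 16))        # True
print(np.allclose(b_pk[1], 20 * b_oz[1]))  # True
print(np.allclose(b_pk[2], b_oz[2]))       # True: faminc slope unaffected
```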
Example
▶ Now let us change cigs to packs. In particular, packs = cigs/20
(column (3) of Table 6.1).
▶ All the statistics remain unchanged, except that the coefficient on
the smoking variable, its standard error, and its CI are rescaled by 20:
β̂1 goes from −.4634 to −9.268 = 20 × (−.4634).
Beta Coefficients

▶ To interpret the OLS estimates and gauge economic magnitudes, we
often standardize the dependent and independent variables.
Example:
– How to compare the estimated effects of SAT and high school
GPA on college GPA? They are in different units, and hence
not directly comparable.
– But the standardized scores are comparable.
– Interpretation: how much does college GPA change, on average, when
the SAT score is one standard deviation higher? How does this compare
with the change when high school GPA is one standard deviation
higher?

▶ The original OLS equation:

yi = β̂0 + β̂1 xi1 + β̂2 xi2 + · · · + β̂k xik + ûi .

Beta Coefficients
▶ Simple algebra gives the equation:

yi − ȳ = β̂1 (xi1 − x̄1 ) + β̂2 (xi2 − x̄2 ) + · · · + β̂k (xik − x̄k ) + ûi

Then, dividing through by σ̂y,

(yi − ȳ)/σ̂y = (σ̂1/σ̂y)β̂1[(xi1 − x̄1)/σ̂1] + · · · + (σ̂k/σ̂y)β̂k[(xik − x̄k)/σ̂k] + (ûi/σ̂y).

▶ It is useful to rewrite the equation (dropping the i subscript) as

zy = b̂1 z1 + b̂2 z2 + · · · + b̂k zk + ε,

where zy denotes the z-score of y , z1 is the z-score of x1 , and so on.


The new coefficients are

b̂j = (σ̂j /σ̂y )β̂j for j = 1, · · · , k.

These b̂j are called standardized coefficients or beta coefficients.
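▶ A short numpy sketch (simulated, illustrative data) confirming that
regressing z-scores on z-scores reproduces b̂j = (σ̂j/σ̂y)β̂j:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=500)

# Route 1: regress z-scores on z-scores (no intercept needed, since
# all z-scores have mean zero).
zy = (y - y.mean()) / y.std(ddof=1)
zX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
b_std, *_ = np.linalg.lstsq(zX, zy, rcond=None)

# Route 2: rescale the raw-regression slopes by sd(x_j)/sd(y).
Xc = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
b_scaled = X.std(axis=0, ddof=1) / y.std(ddof=1) * beta[1:]

print(np.allclose(b_std, b_scaled))  # True
```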

Unit Change in Logarithmic Form

▶ Changing the unit of measurement of y or xj , when they appear in
logarithmic form, does not affect any of the slope estimates, but may
affect the intercept estimate. This follows from the simple fact that

ln(c1 yi) = ln(c1) + ln(yi) ∀ c1 > 0.

The new intercept will be ln(c1) + β̂0. A similar derivation applies when
changing the unit of measurement of xj.
▶ In log-log models, the OLS coefficients of interest are estimated
elasticities, and hence are easily interpretable. But there is still room
for computing beta coefficients, especially when the variations of the
independent variables are very different.
E.g., log household expenditure on entertainment has a much wider
variation than log household expenditure on food.
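▶ A quick simulated check of the slope invariance and the ln(c1)
intercept shift (variable names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(size=400)
y = np.exp(0.3 + 0.8 * np.log(x) + rng.normal(0, 0.1, 400))

def ols2(y, x):
    """Simple regression with intercept; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

b0, b1 = ols2(np.log(y), np.log(x))
c0, c1 = ols2(np.log(16 * y), np.log(x))   # rescale y by c1 = 16

print(np.isclose(c1, b1))                  # slope (elasticity) unchanged
print(np.isclose(c0, b0 + np.log(16)))     # intercept shifts by ln(16)
```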

More on Functional Form

More on Using Logarithmic Functional Forms
▶ Logarithmic transformations have the convenient elasticity
interpretation.
▶ Slope coefficients of logged variables are invariant to rescaling.
▶ Taking logs often eliminates/mitigates problems with outliers.

Figure: graph of log y — note that log y < y, and log y → −∞ as y → 0⁺.
More on Using Logarithmic Functional Forms

▶ Taking logs often helps to secure normality (e.g., ln(wage) vs. wage)
and homoskedasticity.
▶ Variables measured in units such as years (e.g., education,
experience, tenure, age, etc) usually should not be logged.
▶ Variables measured in percentage points (e.g., unemployment rate,
participation rate of a pension plan, etc.) usually should also not be
logged.
▶ Logs must not be used if variables take on zero or negative values
(e.g., hours of work during a month).
▶ It is hard to reverse the log-operation when constructing predictions.
(We will discuss more on this point later.)

Models with Quadratics

▶ Example: Suppose the fitted regression line for the wage equation is

\widehat{wage} = 3.73 + .298 exper − .0061 exper²
                (.35)   (.041)       (.0009)
n = 526, R² = .093

▶ The predicted wage is a concave function of exper .

▶ The marginal effect of exper on wage is

∂\widehat{wage}/∂exper = β̂1 + 2β̂2 exper = .298 − 2(.0061) exper.

▶ The first year of experience increases the predicted wage by about
$.298, the second year by .298 − 2(.0061)(1) ≈ $.286 < $.298, and so on.
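▶ The marginal-effect arithmetic as a tiny script, plugging in the
slide's estimates:

```python
b1, b2 = 0.298, -0.0061   # estimates from the fitted wage equation above

def marginal_effect(exper):
    """d(wage)/d(exper) = b1 + 2*b2*exper, in dollars per year."""
    return b1 + 2 * b2 * exper

print(marginal_effect(0))   # 0.298: the first year of experience
print(marginal_effect(1))   # about 0.286: the second year
print(-b1 / (2 * b2))       # turning point: about 24.4 years
```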

Models with Quadratics
Figure 6.1: Quadratic relationship between \widehat{wage} and exper —
the fitted curve rises from 3.73 at exper = 0 to a maximum of 7.37 at
exper ≈ 24.4.

▶ x∗ = |β̂1/(2β̂2)| = .298/[2(.0061)] ≈ 24.4

▶ What should we make of the fact that the return to experience turns
negative after 24.4 years? A quadratic must eventually turn around; if
the turning point were beyond all but a small share of the sample, this
would not be much of a concern. So it depends on (i) how many
observations in the sample lie to the right of the turning point (28% in
this sample); (ii) whether there is a specification problem (e.g., omitted
variables).
Example: Effects of Pollution on Housing Prices

▶ The fitted regression line is

\widehat{ln(price)} = 13.39 − .902 ln(nox) − .087 ln(dist)
                     (.57)    (.115)         (.043)
                    − .545 rooms + .062 rooms² − .048 stratio
                      (.165)        (.013)        (.006)
n = 506, R² = .603

– nox : nitrogen oxide in air


– dist: distance from employment centers, in miles
– stratio: student/teacher ratio

▶ The predicted ln(price) is a convex function of rooms.

▶ The coefficient on rooms is negative. Does this mean that, at a low
number of rooms, more rooms are associated with lower prices?

Example: Effects of Pollution on Housing Prices

▶ The marginal effect of rooms on ln(price) is

∂ln(price)/∂rooms = (∂price/price)/∂rooms = −.545 + 2(.062) rooms.

▶ Turning point: x∗ = .545/[2(.062)] ≈ 4.4. (Figure 6.2 plots log(price)
as a quadratic function of rooms, with its minimum at rooms = 4.4.)
▶ The share of observations with rooms less than 4.4 is only about 1%.
▶ Increasing rooms from 5 to 6: −.545 + .124(5) = .075, i.e., about +7.5%.
▶ Increasing rooms from 6 to 7: −.545 + .124(6) = .199, i.e., about +19.9%.
▶ More generally,

%Δprice ≈ 100[−.545 + 2(.062) rooms]Δrooms = (−54.5 + 12.4 rooms)Δrooms.
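▶ The same arithmetic as a short script, plugging in the slide's
estimates:

```python
b_rooms, b_rooms2 = -0.545, 0.062   # estimates from the fitted equation above

def pct_change_per_room(rooms):
    """Approximate % change in price from one more room, at a given level."""
    return 100 * (b_rooms + 2 * b_rooms2 * rooms)

print(pct_change_per_room(5))      # about +7.5%: going from 5 to 6 rooms
print(pct_change_per_room(6))      # about +19.9%: going from 6 to 7 rooms
print(-b_rooms / (2 * b_rooms2))   # turning point: about 4.4 rooms
```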
Other Possibilities

▶ Using quadratics along with logarithms:

ln(price) = β0 + β1 ln(nox) + β2 [ln(nox)]²
          + β3 crime + β4 rooms + β5 rooms² + β6 stratio + u,

which implies

∂ln(price)/∂ln(nox) = %Δprice/%Δnox = β1 + 2β2 ln(nox).

▶ Higher-order polynomials: it is often assumed that total cost takes
the form

cost = β0 + β1 quantity + β2 quantity² + β3 quantity³ + u,

which implies a U-shaped marginal cost (MC); β0 is the total fixed
cost.

Other Possibilities

Figure: Quadratic MC implies cubic TC (TC = TFC + TVC); q is the
inflection point.


Models with Interaction Terms

▶ In the model

price = β0 + β1 sqrft + β2 bdrms + β3 sqrft · bdrms + β4 bthrms + u,

sqrft · bdrms is the interaction term.


▶ The marginal effect of bdrms on price is

∂price/∂bdrms = β2 + β3 sqrft.
▶ The effect of the number of bedrooms depends on the level of
square footage.
▶ Interaction effects complicate the interpretation of parameters: β2 is
the effect of an additional bedroom, but only when sqrft = 0.
▶ How to avoid this interpretation difficulty?

Models with Interaction Terms

▶ The model
y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + u
can be rewritten as

y = α0 + δ1 x1 + δ2 x2 + β3 (x1 − µ1 )(x2 − µ2 ) + u,

where µ1 = E[x1] and µ2 = E[x2] are the population means of x1 and
x2; in practice they can be replaced by their sample means.
▶ It is straightforward to show that

α̂0 = β̂0 − β̂3 µ1 µ2

δ̂1 = β̂1 + β̂3 µ2


δ̂2 = β̂2 + β̂3 µ1
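▶ A simulated numpy check of this reparametrization (made-up
coefficients; sample means stand in for µ1 and µ2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(2.0, 1.0, n)
x2 = rng.normal(-1.0, 2.0, n)
y = 1 + 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

m1, m2 = x1.mean(), x2.mean()
b = ols(y, x1, x2, x1 * x2)                  # original parametrization
d = ols(y, x1, x2, (x1 - m1) * (x2 - m2))    # centered interaction

print(np.isclose(d[0], b[0] - b[3] * m1 * m2))  # alpha0 = beta0 - beta3*mu1*mu2
print(np.isclose(d[1], b[1] + b[3] * m2))       # delta1 = beta1 + beta3*mu2
print(np.isclose(d[2], b[2] + b[3] * m1))       # delta2 = beta2 + beta3*mu1
print(np.isclose(d[3], b[3]))                   # beta3 is unchanged
```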

Models with Interaction Terms

▶ Now

∂y/∂x2 = δ2 + β3(x1 − µ1),

i.e., δ2 is the effect of x2 when all other variables take on their mean
values.
▶ Advantages of reparametrization:

– It is easy to interpret all parameters. In particular, the coefficients
on the original variables have a useful interpretation.
– Standard errors for partial effects at the mean values are
available.
– If necessary, interaction may be centered at other interesting
values.

More on Goodness-of-Fit and Selection of Regressors

Adjusted R-Squared

▶ General remarks on R-squared:

– A high R-squared does not imply that the estimates have a causal
interpretation.
– A low R-squared does not preclude precise estimation of partial
effects.

▶ Recall that

R² = 1 − (SSR/n)/(SST/n) = 1 − σ̃u²/σ̃y²,

so R² is an estimator of the population R-squared, ρ² = 1 − σu²/σy²,
which is the proportion of the variation in y in the population that is
explained by the independent variables.
▶ However, σ̃u² and σ̃y² are biased estimators of σu² and σy².

Adjusted R-Squared

▶ Adjusted R-squared:

R̄² = 1 − [SSR/(n − k − 1)]/[SST/(n − 1)] = 1 − σ̂u²/σ̂y²,

where σ̂u² and σ̂y² are unbiased estimators of σu² and σy², thanks to
the degrees-of-freedom correction.
▶ R̄² imposes a penalty for adding new regressors: other things equal,
k ↑ =⇒ R̄² ↓

▶ R̄² increases if and only if the t statistic of a newly added regressor
is greater than one in absolute value. [proof not required]
Compared with y = β0 + β1 x1 + u, the regression
y = β0 + β1 x1 + β2 x2 + u has a larger R̄² if and only if

|tβ̂2| > 1.
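▶ The formula as a one-line helper, checked against the birth-weight
example from Table 6.1 (n = 1,388, k = 2, R² = .0298):

```python
def adjusted_r2(r2, n, k):
    """R-bar^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); k excludes the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.0298, 1388, 2))   # about 0.0284
```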

Adjusted R-Squared

▶ Relationship between R² and R̄²:

1 − R² = SSR/SST = [(n − k − 1)/(n − 1)] · [SSR/(n − k − 1)]/[SST/(n − 1)]
       = [(n − k − 1)/(n − 1)] (1 − R̄²).

Therefore, we have

R̄² = 1 − [(n − 1)/(n − k − 1)] (1 − R²) < R²

unless k = 0 or R² = 1.
▶ Note that R̄² even becomes negative if R² < k/(n − 1).

Relationship Between R̄² and R²

Figure: Relationship between R̄² and R² — R̄² lies below R² for
0 ≤ R² < 1, and the two coincide at R² = 1.
Using Adjusted R 2 to Choose between Nonnested Models

▶ Models are nonnested if neither model is a special case of the other.

▶ For example, to incorporate a diminishing effect of sales on R&D
intensity, we consider two models:
(1) rdintens = β0 + β1 ln(sales) + u
(2) rdintens = β0 + β1 sales + β2 sales² + u,
where rdintens denotes R&D intensity.


▶ R 2 = 0.061 and R̄ 2 = 0.030 in model (1), and R 2 = 0.148 and
R̄ 2 = 0.090 in model (2).
▶ A comparison between the R-squared of both models is unfair to the
first model because the first model contains fewer parameters.
▶ In the given example, even after adjusting for the difference in
degrees of freedom, the quadratic model is preferred.

Comparing Models with Different Dependent Variables

▶ R-squared or adjusted R-squared can NOT be used to compare
models which differ in their definition of the dependent variable.
▶ Example (CEO Compensation and Firm Performance):

\widehat{salary} = 830.63 + .0163 sales + 19.63 roe
                  (223.90)  (.0089)       (11.08)
n = 209, R² = .029, R̄² = .020, SST = 391,732,982

and

\widehat{lsalary} = 4.36 + .275 lsales + .0179 roe
                   (0.29)  (.033)        (.0040)
n = 209, R² = .282, R̄² = .275, SST = 66.72

▶ There is much less variation in lsalary to be explained than in salary,
so it is not fair to compare the R² and R̄² of the two models.

Controlling for Too Many Factors in Regression Analysis

▶ In some cases, certain variables should not be held fixed:

– In a regression of traffic fatalities on state beer taxes (and other
factors), one should not directly control for beer consumption.
– Why? Beer taxes influence traffic fatalities only through beer
consumption; if beer consumption is controlled for, the coefficient on
beer taxes only picks up the effect not operating through consumption,
which is hardly interesting.

– In a regression of family health expenditures on pesticide usage
among farmers, one should not control for doctor visits.
– Why? Health expenditures include doctor visits, and we would like to
pick up all effects of pesticide use on health expenditures.

Controlling for Too Many Factors in Regression Analysis

▶ Different regressions may serve different purposes:

▶ For a regression that relates house prices to house characteristics:
– If the purpose of the regression is to study the validity of
assessments, one should include price assessments along with the
housing attributes.
– If the purpose is to estimate a hedonic price model, which measures
the marginal values of various housing attributes, one should not
include price assessments.

Adding Regressors to Reduce the Error Variance

▶ Recall that

Var(β̂j) = σ² / [SSTj (1 − Rj²)].

– On the one hand, adding regressors may exacerbate multicollinearity
problems (Rj² ↑).
– On the other hand, adding regressors reduces the error variance
(σ² ↓).

▶ Variables that are uncorrelated with the other regressors should be
added, because they reduce the error variance (σ² ↓) without increasing
multicollinearity (Rj² remains the same).
▶ Example (Individual Beer Consumption and Beer Prices): Including
individual characteristics in a regression of beer consumption on
beer prices leads to more precise estimates of the price elasticity if
individual characteristics are uncorrelated with beer prices.
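▶ A simulated sketch of this precision gain (the variable "taste" is a
hypothetical individual characteristic, constructed to be uncorrelated
with price):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
price = rng.normal(size=n)
taste = rng.normal(size=n)   # individual characteristic, uncorrelated with price
y = 1.0 - 0.8 * price + 1.5 * taste + rng.normal(size=n)   # beer consumption

def ols_se(y, *cols):
    """Classical OLS standard errors from a regression with an intercept."""
    X = np.column_stack([np.ones(len(y)), *cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(ols_se(y, price)[1])          # SE on price without the control
print(ols_se(y, price, taste)[1])   # noticeably smaller SE with it
```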

Predicting y When ln(y ) is the Dependent Variable
▶ Note that
ln(y ) = β0 + β1 x1 + · · · + βk xk + u
=⇒ y = exp(β0 + β1 x1 + · · · + βk xk ) exp(u) = m(x) exp(u)
▶ Under the additional assumption that u is independent of
(x1, · · · , xk), we have

E[y|x] = exp(β0 + β1 x1 + · · · + βk xk) E[exp(u)|x]
       = exp(β0 + β1 x1 + · · · + βk xk) E[exp(u)]
       = m(x) α0,

where the second equality is due to the independence between u and
x. The predicted y is then

ŷ = m̂(x) α̂0,

where m̂(x) = exp(β̂0 + β̂1 x1 + · · · + β̂k xk) and α̂0 = (1/n) ∑ᵢ exp(ûᵢ).
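▶ A minimal simulated implementation of this prediction rule (and,
anticipating the next two slides, of the comparable R̃²):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
logy = 0.5 + 0.8 * x + rng.normal(0, 0.6, n)
y = np.exp(logy)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, logy, rcond=None)   # OLS on log(y)
uhat = logy - X @ b                            # residuals

m_hat = np.exp(X @ b)          # m(x) evaluated at the estimates
alpha0 = np.exp(uhat).mean()   # alpha0-hat = (1/n) sum of exp(u_i-hat)
y_hat = alpha0 * m_hat         # corrected prediction of y

print(m_hat.mean(), y_hat.mean(), y.mean())   # naive exp() underpredicts
r2_tilde = np.corrcoef(y, m_hat)[0, 1] ** 2   # R-tilde^2; invariant to alpha0
print(r2_tilde)
```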

Comparing R-Squared of a Logged and an Unlogged Specification

▶ Reconsider the CEO salary problem:

\widehat{salary} = 613.43 + .0190 sales + .0234 mktval + 12.70 ceoten
                  (65.23)   (.0100)       (.0095)        (5.61)
n = 177, R² = .201

and

\widehat{lsalary} = 4.504 + .163 lsales + .0109 lmktval + .0117 ceoten
                   (.257)   (.039)        (.050)          (.0053)
n = 177, R̃² = .318

▶ R² and R̃² are the R-squareds for the predictions of the unlogged
salary variable (although the second regression is originally for logged
salaries), so they can be directly compared.

About R̃²

▶ Recall that

R² = \widehat{Corr}(y, ŷ)²,

where ŷ is the predicted value of y.
▶ When lsalary is the dependent variable, the predicted value of y is
m̂(x)α̂0 = α̂0 ỹ, where ỹ = m̂(x).
▶ Since α̂0 > 0,

\widehat{Corr}(y, ŷ) = \widehat{Corr}(y, α̂0 ỹ) = \widehat{Corr}(y, ỹ),

which is invariant to α̂0.
Recall that for a > 0, Corr(X, aY) = Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)).

▶ As a result,

R̃² = \widehat{Corr}(y, ỹ)².

