Exercise 1 solution
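1. Which of the following can cause OLS estimators to be biased?
• (i) Heteroskedasticity.
• (ii) Omitting an important variable.
• (iii) A sample correlation coefficient of .95 between two independent variables both included in the model.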
Only (ii), omitting an important variable, can cause bias, and this is true only when the omitted variable is correlated with the included explanatory variables. The homoskedasticity assumption, MLR.5, played no role in showing that the OLS estimators are unbiased. (Homoskedasticity was used to obtain the usual variance formulas for the β̂j.) Further, the degree of collinearity between the explanatory variables in the sample, even if it is reflected in a correlation as high as .95, does not affect the Gauss-Markov assumptions. Only if there is a perfect linear relationship among two or more explanatory variables is MLR.3 violated.
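To see why a correlation of .95 leaves OLS unbiased but less precise, here is a small Monte Carlo sketch. The data-generating process (true slopes 2 and 3, standard normal regressors and errors, n = 200) is purely hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_slope_estimates(rho, n=200, reps=2000, seed=0):
    """Monte Carlo draws of the OLS slope on x1 in y = 1 + 2*x1 + 3*x2 + u.

    x1 and x2 are standard normal with correlation rho; u is independent
    standard normal noise. The true slope on x1 is 2 (hypothetical values).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.multivariate_normal(np.zeros(2), cov, size=n)
        u = rng.standard_normal(n)
        y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1] + u
        X = np.column_stack([np.ones(n), x])  # intercept, x1, x2
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[r] = coef[1]  # slope on x1
    return estimates

low = simulate_slope_estimates(rho=0.0)
high = simulate_slope_estimates(rho=0.95)
print(f"rho = 0.00: mean {low.mean():.3f}, sd {low.std():.3f}")
print(f"rho = 0.95: mean {high.mean():.3f}, sd {high.std():.3f}")
```

In runs like this both sets of estimates should center near the true value 2, while the spread under ρ = .95 is roughly 1/√(1 − .95²) ≈ 3.2 times larger: collinearity affects the variance, not the bias.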
2. The following equation describes the median housing price in a community in terms of the amount of pollution (nox, for nitrous oxide) and the average number of rooms in houses in the community (rooms):
$$\log(price) = \beta_0 + \beta_1 \log(nox) + \beta_2\, rooms + u$$
• (i) What are the probable signs of β1 and β2? What is the interpretation of β1? Explain.
We expect β1 < 0 because more pollution can be expected to lower housing values. Note that β1 is the elasticity of price with respect to nox: holding rooms fixed, a 1% increase in nox changes price by about β1 percent. β2 is probably positive because rooms roughly measures the size of a house. (However, it does not allow us to distinguish homes where each room is large from homes where each room is small.)
• (ii) Why might nox [or, more precisely, log(nox)] and rooms be negatively correlated? If this is the case, does the simple regression of log(price) on log(nox) produce an upward or a downward biased estimator of β1?
If we assume that rooms increases with the quality of the home, then log(nox) and rooms are negatively correlated when poorer neighborhoods have more pollution, something that is often true. We can use Table 3.2 to determine the direction of the bias. With x1 = log(nox) and x2 = rooms, if β2 > 0 and Corr(x1, x2) < 0, the simple regression estimator β̃1 has a downward bias. But because β1 < 0, this means that the simple regression, on average, overstates the importance of pollution. [E(β̃1) is more negative than β1.]
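The sign logic from Table 3.2 fits in a tiny helper. It assumes the standard omitted-variable result E(β̃1) = β1 + β2·δ̃1, where δ̃1 (the slope from regressing x2 on x1) has the same sign as Corr(x1, x2). The function name and the example inputs 0.3 and −0.4 are hypothetical; only their signs matter.

```python
def omitted_variable_bias_direction(beta2, corr_x1_x2):
    """Direction of the bias in the simple-regression slope on x1 when x2 is omitted.

    The bias is beta2 * delta1, where delta1 (slope from regressing x2 on x1)
    has the same sign as Corr(x1, x2). Only the signs of the inputs matter.
    """
    if beta2 == 0 or corr_x1_x2 == 0:
        return "none"
    return "upward" if beta2 * corr_x1_x2 > 0 else "downward"

# Housing example: beta2 > 0 (rooms raises price), Corr(log(nox), rooms) < 0
print(omitted_variable_bias_direction(beta2=0.3, corr_x1_x2=-0.4))  # downward
```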
• (iii) Using the data in HPRICE2.RAW, the following equations were estimated:
$$\widehat{\log(price)} = 11.71 - 1.043\,\log(nox), \quad n = 506,\ R^2 = 0.264$$
• (iv) Is the relationship between the simple and multiple regression estimates of the elasticity of price with respect to nox what you would have predicted, given your answer in part (ii)? Does this mean that −0.718 is definitely closer to the true elasticity than −1.043?
This is what we expect from the typical sample based on our analysis in part (ii). The simple regression estimate, −1.043, is more negative (larger in magnitude) than the multiple regression estimate, −.718. Because these estimates come from only one sample, we can never know which is closer to β1. However, if this is a “typical” sample, the true β1 is closer to −.718.
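The notion of a “typical” sample can be illustrated with a simulation. Everything below is hypothetical: the true values β1 = −0.7 and β2 = 0.3, the distribution of log(nox), and the negative dependence of rooms on log(nox) are made up for illustration, not estimated from HPRICE2.RAW.

```python
import numpy as np

BETA1, BETA2 = -0.7, 0.3  # hypothetical true elasticity and rooms effect

def one_sample(rng, n=506):
    """Draw one hypothetical sample; return (simple, multiple) estimates of beta1."""
    log_nox = rng.normal(1.7, 0.2, n)
    # rooms falls with pollution by construction, so Corr(log_nox, rooms) < 0
    rooms = 6.3 - 0.5 * (log_nox - 1.7) + rng.normal(0.0, 0.6, n)
    log_price = 9.0 + BETA1 * log_nox + BETA2 * rooms + rng.normal(0.0, 0.3, n)
    ones = np.ones(n)
    simple, *_ = np.linalg.lstsq(np.column_stack([ones, log_nox]), log_price, rcond=None)
    multiple, *_ = np.linalg.lstsq(
        np.column_stack([ones, log_nox, rooms]), log_price, rcond=None)
    return simple[1], multiple[1]

rng = np.random.default_rng(42)
draws = np.array([one_sample(rng) for _ in range(2000)])
simple_est, multiple_est = draws[:, 0], draws[:, 1]
closer = np.abs(multiple_est - BETA1) < np.abs(simple_est - BETA1)

print(f"average simple estimate:   {simple_est.mean():.3f}")
print(f"average multiple estimate: {multiple_est.mean():.3f}")
print(f"share of samples where the multiple estimate is closer: {closer.mean():.2f}")
```

Under these assumptions the simple estimates should center below β1 and the multiple estimates near β1, yet the share of samples in which the multiple estimate is closer need not be 100%. That is the sense in which no single sample can settle which estimate is closer.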
3. The file CEO.dat contains data on 447 chief executive officers and can be used to
examine the effects of firm performance on CEO salary.
• (i) Estimate a model relating annual salary to firm sales and assets value. Make the
model of the constant elasticity variety for both independent variables. Write the
results out in equation form.
$$\widehat{\log(salary)} = 4.254 + 0.193\,\log(sales) + 0.156\,\log(assets), \quad n = 447,\ R^2 = 0.245$$
The coefficient on log(sales) implies that, holding assets fixed, a 1% increase in sales raises predicted salary by about 0.193%; equivalently, a 10% increase in sales raises predicted salary by about 1.9%. The coefficient on log(assets) implies that, holding sales fixed, a 1% increase in assets raises predicted salary by about 0.156%.
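Because the elasticity reading is a linear approximation, it helps to compare it with the exact change implied by the log-log form, (1 + Δ%/100)^β − 1. A minimal sketch using the estimated coefficient 0.193; the function names are hypothetical.

```python
def approx_pct_change(elasticity, pct_increase):
    """Usual approximation: %change in y is about elasticity * %change in x."""
    return elasticity * pct_increase

def exact_pct_change(elasticity, pct_increase):
    """Exact %change in y implied by log(y) = ... + elasticity*log(x) + ...,
    when x rises by pct_increase percent and other regressors are fixed."""
    return ((1 + pct_increase / 100) ** elasticity - 1) * 100

for pct in (1, 10, 100):
    print(f"sales +{pct}%: approx {approx_pct_change(0.193, pct):.2f}%, "
          f"exact {exact_pct_change(0.193, pct):.2f}%")
```

For small changes the two agree closely; for a 100% increase in sales the exact effect is about 14.3%, noticeably below the 19.3% the linear approximation gives.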
• (ii) Add profits to the model from part (i). Why can this variable not be included
in logarithmic form? Would you say that these firm performance variables explain
most of the variation in CEO salaries?
$$\widehat{\log(salary)} = 4.674 + 0.151\,\log(sales) + 0.146\,\log(assets) + 0.0000436\,profits, \quad n = 447,\ R^2 = 0.252$$
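The coefficient on profits is very small. Here profits are measured in millions of dollars, so if profits increase by $1 billion (Δprofits = 1,000, a huge change), predicted salary increases by only about 4.36%, holding sales and assets fixed. Profits cannot be included in logarithmic form because profits can be zero or negative, and the log of such values is undefined. With R² = 0.252, these firm performance variables explain only about a quarter of the variation in log(salary), so they do not explain most of the variation in CEO salaries.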
• (iii) Add the variable tenure to the model in part (ii). What is the estimated percent-
age return for another year of CEO tenure, holding other factors fixed?
$$\widehat{\log(salary)} = 4.491 + 0.159\,\log(sales) + 0.149\,\log(assets) + 0.000041\,profits + 0.0128846\,tenure, \quad n = 447,\ R^2 = 0.280$$
The coefficient on tenure means that each extra year as CEO increases predicted salary by about 100 × 0.0129 ≈ 1.29%, holding the other factors fixed.
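For regressors that enter in levels (profits, tenure), 100·β·Δx is again an approximation to the exact 100·[exp(β·Δx) − 1]. The sketch below applies both formulas to the coefficients reported above; the helper name is hypothetical.

```python
import math

def pct_effect_level_regressor(coef, change):
    """Approximate and exact % change in y for log(y) = ... + coef*x + ...,
    when x increases by `change` units and other regressors are fixed."""
    approx = 100 * coef * change
    exact = 100 * (math.exp(coef * change) - 1)
    return approx, exact

# profits are in millions, so a $1 billion increase is change = 1,000 (model in (ii))
profits_approx, profits_exact = pct_effect_level_regressor(0.0000436, 1000)
# one more year as CEO (model in (iii))
tenure_approx, tenure_exact = pct_effect_level_regressor(0.0128846, 1)

print(f"profits +$1bn: approx {profits_approx:.2f}%, exact {profits_exact:.2f}%")
print(f"tenure +1 yr:  approx {tenure_approx:.2f}%, exact {tenure_exact:.2f}%")
```

The exact effects, about 4.46% and about 1.30%, are close to the approximations because the implied changes in log(salary) are small.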
• (iv) Find the sample correlation coefficient between the variables log(sales) and
profits. Are these variables highly correlated? What does this say about the OLS
estimators?
The sample correlation between log(sales) and profits is about .582, which is fairly high. As we know, this causes no bias in the OLS estimators, although it can make their variances large. Given the fairly substantial correlation between log(sales) and firm profits, it is not too surprising that profits add little to explaining CEO salaries (R² rises only from 0.245 to 0.252).
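One way to gauge how much a correlation of about .582 can inflate the variance of the coefficient on log(sales) is the variance inflation factor 1/(1 − R_j²). With several regressors, R_j² is at least the squared pairwise correlation, so 1/(1 − r²) is only a lower bound. The function name is hypothetical.

```python
import math

def vif_lower_bound(r):
    """Lower bound on the variance inflation factor 1 / (1 - R_j^2).

    R_j^2 comes from regressing x_j on all other regressors and is at least
    the squared correlation r**2 between x_j and any single other regressor.
    """
    return 1.0 / (1.0 - r ** 2)

vif = vif_lower_bound(0.582)     # correlation of log(sales) with profits
se_factor = math.sqrt(vif)       # corresponding bound for the standard error
print(f"VIF >= {vif:.2f}, standard error inflated by a factor >= {se_factor:.2f}")
```

Here the bound is about 1.51: the variance is at least about 51% larger (the standard error roughly 23% larger) than with R_j² = 0. This speaks to precision only, not to bias.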
4. The file Kc house data.dat contains house sale prices for King County, which includes Seattle, for homes sold between May 2014 and May 2015.
• (i) Confirm the partialling out interpretation of the OLS estimates by explicitly doing the partialling out. Regress log(price) on log(sqft_living), log(sqft_lot) and floors.
• (ii) Regress log(sqft_living) on log(sqft_lot) and floors, and save the residual, which we can call $\widetilde{\log(sqft\_living)}$.
• (iii) Now regress log(price) on $\widetilde{\log(sqft\_living)}$. Can you confirm that the coefficient on $\widetilde{\log(sqft\_living)}$ is the same as the coefficient on log(sqft_living) in the regression in (i)? Are the standard errors also the same?
• (iv) Run a regression that also reproduces the standard errors. (Hint: you also need to remove the part of the variance of the dependent variable that comes from log(sqft_lot) and floors.)
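The partialling-out (Frisch-Waugh-Lovell) steps in (i)-(iv) can be sketched with numpy. Simulated data are used because the column layout of Kc house data.dat is not shown here; the names lot, floors, living and price are hypothetical stand-ins for log(sqft_lot), floors, log(sqft_living) and log(price).

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and conventional standard errors (df = n - k)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, resid, se

# Simulated, hypothetical stand-ins for log(sqft_lot), floors, log(sqft_living), log(price)
rng = np.random.default_rng(1)
n = 500
lot = rng.normal(9.0, 0.9, n)
floors = rng.integers(1, 4, n).astype(float)
living = 5.0 + 0.2 * lot + 0.3 * floors + rng.normal(0.0, 0.4, n)
price = 7.0 + 0.8 * living - 0.05 * lot + 0.02 * floors + rng.normal(0.0, 0.3, n)

ones = np.ones(n)
Z = np.column_stack([ones, lot, floors])  # intercept plus the other regressors

# (i) full regression; index 1 is the coefficient on living
full_coef, _, full_se = ols(np.column_stack([ones, living, lot, floors]), price)

# (ii) partial lot and floors out of living, keep the residual
_, living_tilde, _ = ols(Z, living)

# (iii) regress price on the residual: same slope, different standard error
step3_coef, _, step3_se = ols(np.column_stack([ones, living_tilde]), price)

# (iv) also partial lot and floors out of price, then regress residual on residual
_, price_tilde, _ = ols(Z, price)
fwl_coef = (living_tilde @ price_tilde) / (living_tilde @ living_tilde)
fwl_resid = price_tilde - fwl_coef * living_tilde
# use the full model's degrees of freedom, n - 4, for the error variance
fwl_se = np.sqrt((fwl_resid @ fwl_resid) / (n - 4) / (living_tilde @ living_tilde))

print("coefficients:", full_coef[1], step3_coef[1], fwl_coef)
print("std. errors: ", full_se[1], step3_se[1], fwl_se)
```

The slope matches the full regression in both (iii) and (iv), but only the residual-on-residual regression reproduces the standard error, and only when the error variance uses the full model's degrees of freedom (n − 4 here). Running step (iv) as an ordinary regression would use n − 2 (or n − 1 without an intercept), giving a slightly different value.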
5. Use Kc house data.dat again; this time we look at omitted variable bias.
• (i) Run a simple regression of log(sqft_above) on log(sqft_living) to obtain the slope coefficient, δ̃1.
• (ii) Run a simple regression of log(price) on log(sqft_living) to obtain the slope coefficient, β̃1.
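Estimating the multiple regression of log(price) on both log(sqft_living) and log(sqft_above) gives slopes β̂1 and β̂2, and the simple regression slope satisfies β̃1 = β̂1 + β̂2 δ̃1. We can verify that 0.8367 ≈ 0.8271 + 0.0110 × 0.8596 = 0.8366; the small difference is due to rounding.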
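The check above rests on the sample identity β̃1 = β̂1 + β̂2·δ̃1, which holds exactly (not just on average) when δ̃1 comes from regressing the omitted variable on the included one. The sketch verifies it on simulated data; x1, x2 and y are hypothetical stand-ins for log(sqft_living), log(sqft_above) and log(price).

```python
import numpy as np

def slopes(columns, y):
    """OLS slopes (intercept dropped) from regressing y on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Simulated, hypothetical stand-ins: x1 ~ log(sqft_living), x2 ~ log(sqft_above), y ~ log(price)
rng = np.random.default_rng(7)
n = 1000
x1 = rng.normal(7.5, 0.4, n)
x2 = 0.9 * x1 + rng.normal(0.0, 0.15, n)
y = 1.0 + 0.8 * x1 + 0.05 * x2 + rng.normal(0.0, 0.3, n)

(beta1_tilde,) = slopes([x1], y)             # simple regression of y on x1
(delta1_tilde,) = slopes([x1], x2)           # regression of the omitted x2 on x1
beta1_hat, beta2_hat = slopes([x1, x2], y)   # multiple regression of y on x1 and x2

print(beta1_tilde, beta1_hat + beta2_hat * delta1_tilde)  # identical up to floating point
print(0.8271 + 0.0110 * 0.8596)  # rounded figures from the text, close to 0.8367
```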
6. A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account. For example, higher income generally results in access to better prenatal care, as well as better nutrition for the mother. An equation that recognizes this is:
$$bwght = \beta_0 + \beta_1\, cigs + \beta_2\, faminc + u$$
• (i) What is the most likely sign for β2?
Probably β2 > 0, as more income typically means better nutrition for the mother and better
prenatal care.
• (ii) Do you think cigs and faminc are likely to be correlated? Explain why the
correlation might be positive or negative.
On the one hand, an increase in income generally increases the consumption of a normal good, so if cigarettes are a normal good, cigs and faminc could be positively correlated. On the other hand, family incomes are also higher for families with more education, and more education and cigarette smoking tend to be negatively correlated. The sample correlation between cigs and faminc is about −0.173, indicating a negative correlation.
• (iii) Now, estimate the equation with and without faminc, using the data in BWGHT.RAW. Report the results in equation form, including the sample size and R-squared. Discuss your results, focusing on whether adding faminc substantially changes the estimated effect of cigs on bwght.
The effect of cigarette smoking is slightly smaller in magnitude when faminc is added to the regression, but the difference is not great. This is because cigs and faminc are not very correlated and the coefficient on faminc is small in practical terms. (The variable faminc is measured in thousands of dollars, so $10,000 more in 1988 family income increases predicted birth weight by only about 0.93 ounces.)