PS4 Intro to Econometrics
Problem Set 4
Please include the names of the people who worked with you on this assignment.
Make sure that your answers are clear and easily readable.
1. Two researchers independently collect samples of size $N$ on an outcome $y$ and a regressor $x$ from the same
population. They suggest that averaging their estimates will give a more precise estimate than either researcher's
estimate alone for the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Let researcher A's
estimate be $\dot{\beta}_1$ and researcher B's estimate be $\ddot{\beta}_1$.
We learned in class that the variance of the OLS slope estimator is given by:
$$\mathrm{Var}(\dot{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
$$\mathrm{Var}(\ddot{\beta}_1) = \frac{\sigma^2}{\sum_{j=1}^{N} (x_j - \bar{x})^2}$$
Notice that each denominator depends on the particular sample of the regressor $x$ that the researcher collected.
The “averaged” estimator they suggested is:
$$\tilde{\beta}_1 = \frac{\dot{\beta}_1 + \ddot{\beta}_1}{2}$$
i) Find the variance of this estimator. Hint: the two researchers used independent samples, so $\dot{\beta}_1$ and
$\ddot{\beta}_1$ are also independent of each other.
ii) Suppose that, instead of using the “averaged” estimator, the two researchers combine their data into
one sample of size $2N$ and obtain the combined estimator $\beta_1^*$. Show that the variance of $\tilde{\beta}_1$ is
at least as large as the variance of $\beta_1^*$.
Hint 1: Notice that $\sum_{k=1}^{2N} (x_k - \bar{x})^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2 + \sum_{j=1}^{N} (x_j - \bar{x})^2$.
Hint 2: You will need the result that $4zy \le (z + y)^2$.
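A numerical sanity check can be useful to compare against your derivation in parts i) and ii). The sketch below is only an illustration under assumed settings: the regressor values `x_a` and `x_b`, the parameter values, and the helper names `ols_slope` and `simulate` are all hypothetical and not part of the problem. It holds the two regressor samples fixed (both with mean zero), redraws the errors many times, and compares the simulated variance of the averaged slope with that of the pooled slope.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate(x_a, x_b, beta0=1.0, beta1=2.0, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo variances of the averaged and the pooled slope estimators.

    The regressors are held fixed; only the errors are redrawn in each replication.
    """
    rng = random.Random(seed)

    def draw_y(x):
        return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

    averaged, pooled = [], []
    for _ in range(reps):
        y_a, y_b = draw_y(x_a), draw_y(x_b)
        averaged.append((ols_slope(x_a, y_a) + ols_slope(x_b, y_b)) / 2)
        pooled.append(ols_slope(x_a + x_b, y_a + y_b))  # list concatenation = pooled sample
    return statistics.variance(averaged), statistics.variance(pooled)


# Assumed designs: both samples are centered at zero, but B's regressor is more spread out.
n = 20
x_a = [-1 + 2 * i / (n - 1) for i in range(n)]
x_b = [3 * xi for xi in x_a]

var_avg, var_pooled = simulate(x_a, x_b)
print(f"averaged: {var_avg:.4f}   pooled: {var_pooled:.4f}")
```

Changing the spread of `x_b` relative to `x_a` shows how the gap between the two variances moves with the two denominators.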
2. Suppose you estimate the population model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ by OLS, where the outcome is the
house price and the regressor is household income. Assume you obtain an estimate $\hat{\beta}_1 = 50$ and that Stata
reports $s.e.(\hat{\beta}_1) = 20$, calculated assuming homoskedasticity (that is, $\mathrm{Var}(\epsilon_i \mid x_i) = \sigma^2$ for all $x_i$).
a) If the true value of $\beta_1$ is assumed to be 0, what is the associated (two-sided) p-value? What is the p-value if
the true value of this parameter is assumed to be 40? Explain in your own words what the p-value means.
(You may assume that the sample size is large, so you can use the normal approximation.)
c) Now you ask Stata to take heteroskedasticity into account in the estimation and get a new $s.e.(\hat{\beta}_1) = 50$.
If the true value of $\beta_1$ is assumed to be zero, what is the associated (two-sided) p-value? Are you more or
less confident of an association between house price and household income? Explain.
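If you want to check your table lookups, the large-sample calculation can be sketched in Python as below. The helper name `two_sided_p_value` is hypothetical, and the numbers in the example call are illustrative and deliberately differ from those in the problem. The sketch assumes the normal approximation mentioned above.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, null_value, std_error):
    """Two-sided p-value for H0: beta = null_value, using the normal approximation."""
    z = (estimate - null_value) / std_error
    # Probability of a standard normal draw at least as far from zero as |z|.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative numbers only (not the ones used in the problem).
p = two_sided_p_value(estimate=1.96, null_value=0.0, std_error=1.0)
print(round(p, 3))
```

The same function covers a change in the reported standard error: only `std_error` changes, while the estimate and the null value stay the same.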
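3. Suppose you are estimating the following model for the population of data scientists in Chicago:
$$\mathit{reports}_i = \beta_0 + \beta_1 \mathit{senior}_i + \beta_2 \mathit{hours}_i + \epsilon_i$$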
where $\mathit{reports}_i$ measures the number of reports produced in a month by employee $i$; $\mathit{senior}_i$ is a dummy
variable equal to 1 if worker $i$ has been in the company for 10 years or more and zero otherwise; and $\mathit{hours}_i$
denotes the weekly hours worked in the company. Assume $\beta_1 > 0$, $\beta_2 > 0$, and $\mathrm{cov}(\mathit{senior}_i, \mathit{hours}_i) < 0$.
4. Prove the omitted variable bias (OVB) formula presented in lecture. Instead of simply copying the lecture notes,
reflect on the assumptions used in every step of the proof. Also, explain in your own words why we have positive
(or negative) OVB depending on the signs of $\beta_2$ and $\mathrm{cov}(x_{1i}, x_{2i})$. Give an intuition for the sign of the OVB.
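To build intuition for the sign of the OVB before writing the proof, a small simulation may help. The sketch below is an illustration under an assumed data-generating process, not the lecture's derivation: $x_1$ and $x_2$ are standard normal with correlation `rho`, the true $\beta_1$ is set to 1, and the helper names are hypothetical. It regresses $y$ on $x_1$ alone and reports how far the slope lands from the true $\beta_1$ for each combination of signs.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def short_regression_slope(beta2, rho, n=20000, beta1=1.0, seed=0):
    """Slope from regressing y on x1 only, when x2 is wrongly omitted.

    Assumed DGP: y = beta1 * x1 + beta2 * x2 + eps, with corr(x1, x2) = rho.
    """
    rng = random.Random(seed)
    x1, y = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        # Build x2 so that it has unit variance and correlation rho with x1.
        b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        x1.append(a)
        y.append(beta1 * a + beta2 * b + rng.gauss(0.0, 1.0))
    return ols_slope(x1, y)


for beta2, rho in [(1.0, 0.5), (1.0, -0.5), (-1.0, 0.5), (-1.0, -0.5)]:
    bias = short_regression_slope(beta2, rho) - 1.0  # true beta1 is 1.0
    print(f"beta2={beta2:+.1f}, corr(x1,x2)={rho:+.1f}: bias = {bias:+.3f}")
```

Comparing the four printed lines with the sign pattern predicted by the formula is a quick way to check your explanation.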
Standard Normal table
Student's t table, part 1 (panels: one tail, one tail, two tails)