Problem Set 2 Answer PDF
1. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + u_i$ and let z be a binary
instrumental variable for x. Use (15.10) in the book to show that the IV estimator $\hat{\beta}_1$
can be written as
$$\hat{\beta}_1 = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0},$$
where $\bar{y}_1$ and $\bar{x}_1$ are the sample averages of $y_i$ and $x_i$ over the part of
the sample where z = 1, and $\bar{y}_0$ and $\bar{x}_0$ are the sample averages of $y_i$ and $x_i$ over the
part of the sample where z = 0.
This estimator, known as the grouping estimator, was first suggested by Wald (1940).
In the next problem and in the empirical part of the problem set below, we will refer
to this Wald estimator.
Step 1: Rewrite the numerator in the formula for $\hat{\beta}_1$, dropping the $\bar{z}$.
Remember, this is allowed because
$$\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x}) = \sum_{i=1}^{n} z_i (x_i - \bar{x}),$$
and similarly when we replace x with y. (If you need to verify that statement, crank through the algebra to
show it.)
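For readers who want the algebra spelled out, one way to verify the identity is sketched below. It relies only on the fact that deviations from a sample mean sum to zero; the same steps apply with y in place of x.

```latex
\begin{align*}
\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})
  &= \sum_{i=1}^{n} z_i (x_i - \bar{x}) - \bar{z} \sum_{i=1}^{n} (x_i - \bar{x}) \\
  &= \sum_{i=1}^{n} z_i (x_i - \bar{x}) - \bar{z} \cdot 0 \\
  &= \sum_{i=1}^{n} z_i (x_i - \bar{x}).
\end{align*}
```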
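$$\sum_{i=1}^{n} z_i (y_i - \bar{y}) = \sum_{i=1}^{n} z_i y_i - \Big(\sum_{i=1}^{n} z_i\Big)\bar{y} = n_1 \bar{y}_1 - n_1 \bar{y},$$
where $n_1 = \sum_{i=1}^{n} z_i$ is the number of observations with $z_i = 1$, and we have used the fact
that $\big(\sum_{i=1}^{n} z_i y_i\big)/n_1 = \bar{y}_1$, the average of the $y_i$ over the i with $z_i = 1$. So far, we have shown
that the numerator in $\hat{\beta}_1$ is $n_1(\bar{y}_1 - \bar{y})$.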
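Step 2: Write $\bar{y}$ as a weighted average of the averages over the two subgroups:
$$\bar{y} = (n_0/n)\,\bar{y}_0 + (n_1/n)\,\bar{y}_1,$$
where $n_0 = n - n_1$. Therefore,
$$\bar{y}_1 - \bar{y} = [(n - n_1)/n]\,\bar{y}_1 - (n_0/n)\,\bar{y}_0 = (n_0/n)(\bar{y}_1 - \bar{y}_0),$$
so the numerator of $\hat{\beta}_1$ is $(n_0 n_1/n)(\bar{y}_1 - \bar{y}_0)$. By the same argument, the denominator is
$(n_0 n_1/n)(\bar{x}_1 - \bar{x}_0)$. Taking the ratio, the factor $n_0 n_1/n$ cancels and we obtain
$\hat{\beta}_1 = (\bar{y}_1 - \bar{y}_0)/(\bar{x}_1 - \bar{x}_0)$.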
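The grouping result can also be checked numerically. The sketch below uses synthetic data (the data-generating process, seed, sample size, and variable names are all assumptions for illustration, not part of the problem set) and compares the covariance-ratio form of the IV estimator with the ratio of differences in group means.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility
n = 200
z = rng.integers(0, 2, size=n)            # binary instrument (0 or 1)
x = 1.0 + 2.0 * z + rng.normal(size=n)    # hypothetical first stage
y = 3.0 + 1.5 * x + rng.normal(size=n)    # hypothetical structural equation

# IV estimator in covariance-ratio form:
# sum (z_i - zbar)(y_i - ybar) / sum (z_i - zbar)(x_i - xbar)
beta_iv = (np.sum((z - z.mean()) * (y - y.mean()))
           / np.sum((z - z.mean()) * (x - x.mean())))

# Wald (grouping) estimator: ratio of differences in group means
beta_wald = ((y[z == 1].mean() - y[z == 0].mean())
             / (x[z == 1].mean() - x[z == 0].mean()))

print(beta_iv, beta_wald)                 # the two agree up to rounding error
```

The two numbers coincide for any sample in which both groups are non-empty and the group means of x differ, which is what the derivation above establishes.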
2. Take the model $y_i = \beta_0 + \beta_1 x_i + u_i$ and data $(y_i, x_i, z_i)$, i = 1, ..., n, where i denotes
entities, y is the dependent variable, x is an explanatory variable for each entity,
and z is an instrument that takes on the value of either 0 or 1 (a dummy variable).
Assume that both x and y are continuous. Note that the 2SLS estimator will be the
Wald estimator discussed above.
Sample
y         x         z
20        3         0
20        [hidden]  0
30        3         0
[hidden]  6         0
50        3         0
40        4         0
65        2         0
70        [hidden]  0
45        8         0
30        9         0
[hidden]  8         1
75        9         1
60        8         1
60        [hidden]  1
55        7         1
[hidden]  8         1
90        7         1
85        9         1
75        4         1
90        7         1
Note: In the table, I blacked out some of the values of the data, but these were included
in the regressions that follow. The idea is that you cannot calculate $\hat{\beta}_1$ using a computer
package (or by hand doing averages).
Given the information provided below, what is $\hat{\beta}_1$? (Note: not all of the following
information may be relevant.)
Sample Summary Statistics:
$\bar{y} = 54.25$, $\bar{x} = 6.1$, $\bar{z} = 0.5$
stdev(y) = 22.37, stdev(x) = 2.31, stdev(z) = 0.51
Answer:
Following the previous problem,
$$\hat{\beta}_1 = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0} = \frac{24.5}{2.4} \approx 10.208.$$
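As a quick arithmetic check, the ratio can be evaluated directly. The sketch below takes the denominator to be 2.4, the value consistent with the stated result of 10.208. It also backs out the group means of y from the Step 2 weighted-average identity, using $\bar{z} = n_1/n = 0.5$; the variable names are illustrative only.

```python
# Differences in group means, taken from the answer above
diff_y = 24.5    # ybar_1 - ybar_0
diff_x = 2.4     # xbar_1 - xbar_0 (value consistent with the result 10.208)

beta_hat = diff_y / diff_x
print(round(beta_hat, 3))     # 10.208

# Consistency check using ybar = (1 - zbar) * ybar_0 + zbar * ybar_1
ybar, zbar = 54.25, 0.5       # from the sample summary statistics
ybar_0 = ybar - zbar * diff_y
ybar_1 = ybar_0 + diff_y
print(ybar_0, ybar_1)         # 42.0 66.5
```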
3. (From 15.7) The following is a simple model to measure the effect of a school choice
program on standardized test performance (see Rouse [1998]):
$$\text{score} = \beta_0 + \beta_1\,\text{choice} + \beta_2\,\text{faminc} + u_1,$$
where score is the score on a statewide test, choice is a binary variable indicating
whether a student attended a choice school in the last year, and faminc is family
income. The IV for choice is grant, the dollar amount granted by the government to
students to use for tuition at choice schools. The grant amount differed by family
income level, which is why we control for faminc in the equation.
(a) Even with faminc in the equation, why might choice be correlated with $u_1$?
Even at a given income level, some students are more motivated and more able than
others, and their families are more supportive (say, in terms of providing transportation)
and enthusiastic about education. Therefore, there is likely to be a self-selection
problem: students who would do better anyway are also more likely to attend a choice
school.
(b) If within each income class, the grant amounts were assigned randomly, is grant
uncorrelated with $u_1$?
Assuming we have the functional form for faminc correct, the answer is yes. Since $u_1$
does not contain income, random assignment of grants within income class means that
grant designation is not correlated with unobservables such as student ability,
motivation, and family support.
(c) What other condition needs to be satisfied for grant to be a good instrument for
choice?
Grant needs to be correlated with choice: it seems plausible here that larger grants make
it more likely that families will send their child to a choice school.
(d) Write the reduced form equation for choice (that is, choice as a function of all
exogenous variables). What is needed for grant to be partially correlated with
choice?
The reduced form is
$$\text{choice} = \pi_0 + \pi_1\,\text{faminc} + \pi_2\,\text{grant} + v_2,$$
and we need $\pi_2 \neq 0$. In other words, after accounting for income, the grant amount must
have some effect on choice. This seems reasonable, provided the grant amounts differ
within each income class.
(e) Write the reduced form equation for score (that is, score as a function of all
exogenous variables). Explain why this equation is useful. How do you interpret
the coefficient on grant?
The reduced form for score is just a linear function of the exogenous variables:
$$\text{score} = \alpha_0 + \alpha_1\,\text{faminc} + \alpha_2\,\text{grant} + v_1.$$
This equation allows us to directly estimate the effect of increasing the grant amount on
the test score, holding family income fixed. From a policy perspective this is itself of
some interest.
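The two reduced forms are linked. With a single instrument for a single endogenous regressor, the IV estimate of the coefficient on choice equals the grant coefficient in the score reduced form divided by the grant coefficient in the choice reduced form. The sketch below illustrates this on simulated data. Every number in it (sample size, coefficients, the way choice is generated, the unobserved "ability" term) is a hypothetical assumption, not an estimate from Rouse (1998) or the textbook.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
n = 5000
faminc = rng.normal(50.0, 10.0, size=n)        # hypothetical family income
grant = rng.uniform(0.0, 5.0, size=n)          # hypothetical, randomly assigned grant
ability = rng.normal(size=n)                   # unobserved; ends up in the error term

# Self-selection: choice depends on grant, income, and unobserved ability
latent = 0.3 * grant + 0.01 * faminc + ability + rng.normal(size=n)
choice = (latent > 1.0).astype(float)

# Hypothetical structural equation for score (true coefficient on choice is 4)
score = 60.0 + 4.0 * choice + 0.2 * faminc + 3.0 * ability + rng.normal(size=n)

# Reduced forms: regress choice and score on all exogenous variables by OLS
Z = np.column_stack([np.ones(n), faminc, grant])
pi_hat = np.linalg.lstsq(Z, choice, rcond=None)[0]     # (pi_0, pi_1, pi_2)
alpha_hat = np.linalg.lstsq(Z, score, rcond=None)[0]   # (alpha_0, alpha_1, alpha_2)
ratio = alpha_hat[2] / pi_hat[2]

# Direct IV estimate: solve Z'(score - W b) = 0 with W = [1, faminc, choice]
W = np.column_stack([np.ones(n), faminc, choice])
b_iv = np.linalg.solve(Z.T @ W, Z.T @ score)

print(ratio, b_iv[2])   # identical up to rounding error
```

Because choice is binary, its reduced form here is a linear probability model; the equality between the coefficient ratio and the IV estimate is purely algebraic and does not depend on that model being correctly specified.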