Problem Set 2 Answer PDF

The document discusses using an instrumental variable approach to estimate the effect of school choice on standardized test scores. [1] It provides an example model and uses data to calculate the Wald estimator, obtaining an estimate of 10.208 for the effect of choice on scores. [2] It then discusses conditions for an instrument - government grants assigned randomly within income levels - to be valid, namely that grants be correlated with choice but uncorrelated with unobserved student ability. [3] The document derives the relevant reduced form equations, explaining they allow estimating the direct effect of the instrument on the outcome variable.

Homework on Instrumental Variables: Chapter 15

1. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + u_i$ and let z be a binary instrumental variable for x. Use (15.10) in the book to show that the IV estimator $\hat{\beta}_1$ can be written as

$$\hat{\beta}_1 = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0},$$

where $\bar{y}_1$ and $\bar{x}_1$ are the sample averages of $y_i$ and $x_i$ over the part of the sample where z = 1, and $\bar{y}_0$ and $\bar{x}_0$ are the sample averages of $y_i$ and $x_i$ over the part of the sample where z = 0.

This estimator, known as the grouping estimator, was first suggested by Wald (1940).
In the next problem and in the empirical part of the problem set below, we will refer
to this Wald estimator.

Step 1: Rewrite the numerator in the formula for $\hat{\beta}_1$, dropping the $\bar{z}$. Remember, this is allowed because

$$\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x}) = \sum_{i=1}^{n} z_i (x_i - \bar{x}),$$

and similarly when we replace x with y. (If you need to verify that statement, crank through the algebra to show it.)
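For readers who want that algebra spelled out, the identity follows in two lines, because deviations from a sample mean sum to zero:

```latex
\begin{aligned}
\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})
  &= \sum_{i=1}^{n} z_i (x_i - \bar{x}) - \bar{z} \sum_{i=1}^{n} (x_i - \bar{x}) \\
  &= \sum_{i=1}^{n} z_i (x_i - \bar{x}),
  \qquad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0 .
\end{aligned}
```

The same argument applies with y in place of x.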

$$\sum_{i=1}^{n} z_i (y_i - \bar{y}) = \sum_{i=1}^{n} z_i y_i - \Big(\sum_{i=1}^{n} z_i\Big)\bar{y} = n_1 \bar{y}_1 - n_1 \bar{y},$$

where $n_1 = \sum_{i=1}^{n} z_i$ is the number of observations with $z_i = 1$, and we have used the fact that $\big(\sum_{i=1}^{n} z_i y_i\big)/n_1 = \bar{y}_1$, the average of the $y_i$ over the i with $z_i = 1$. So far, we have shown that the numerator in $\hat{\beta}_1$ is $n_1(\bar{y}_1 - \bar{y})$.

Step 2: Write $\bar{y}$ as a weighted average of the averages over the two subgroups:

$$\bar{y} = (n_0/n)\,\bar{y}_0 + (n_1/n)\,\bar{y}_1,$$

where $n_0 = n - n_1$. Therefore,

$$\bar{y}_1 - \bar{y} = [(n - n_1)/n]\,\bar{y}_1 - (n_0/n)\,\bar{y}_0 = (n_0/n)(\bar{y}_1 - \bar{y}_0).$$

Therefore, the numerator of $\hat{\beta}_1$ can be written as

$$(n_0 n_1/n)(\bar{y}_1 - \bar{y}_0).$$

Step 3: By simply replacing y with x, the denominator in $\hat{\beta}_1$ can be expressed as $(n_0 n_1/n)(\bar{x}_1 - \bar{x}_0)$. When we take the ratio of these, the terms involving $n_0$, $n_1$, and n cancel, leaving

$$\hat{\beta}_1 = (\bar{y}_1 - \bar{y}_0)/(\bar{x}_1 - \bar{x}_0).$$
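The result of Steps 1 to 3 can also be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up sample (the values of `y`, `x`, and `z` are hypothetical and are not the problem set's data, and the function names are illustrative). It computes the IV estimator in the ratio-of-sums form used in Step 1 and compares it with the difference-in-means form.

```python
def avg(values):
    """Plain sample average."""
    return sum(values) / len(values)


def iv_estimator(y, x, z):
    """Ratio form: sum (z_i - zbar)(y_i - ybar) / sum (z_i - zbar)(x_i - xbar)."""
    y_bar, x_bar, z_bar = avg(y), avg(x), avg(z)
    num = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    den = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return num / den


def wald_estimator(y, x, z):
    """Grouping form: (ybar_1 - ybar_0) / (xbar_1 - xbar_0), groups defined by z."""
    y1 = avg([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = avg([yi for yi, zi in zip(y, z) if zi == 0])
    x1 = avg([xi for xi, zi in zip(x, z) if zi == 1])
    x0 = avg([xi for xi, zi in zip(x, z) if zi == 0])
    return (y1 - y0) / (x1 - x0)


# Hypothetical illustrative sample (NOT the data from the problem set).
y = [12.0, 15.0, 11.0, 20.0, 24.0, 19.0, 27.0]
x = [1.0, 2.0, 1.5, 3.0, 4.0, 2.5, 5.0]
z = [0, 0, 0, 1, 1, 0, 1]

iv = iv_estimator(y, x, z)
wald = wald_estimator(y, x, z)
print(f"IV (ratio form): {iv:.6f}")
print(f"Wald (grouping): {wald:.6f}")
```

The two printed values should agree up to floating-point rounding for any sample in which z is binary and the two groups have different averages of x.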

2. Take the model $y_i = \beta_0 + \beta_1 x_i + u_i$ and data $(y_i, x_i, z_i)$, $i = 1, \dots, n$, where i denotes entities, y is the dependent variable, x is an explanatory variable for each entity, and z is an instrument that takes on the value of either 0 or 1 (a dummy variable). Assume that both x and y are continuous. Note that the 2SLS estimator will be the Wald estimator discussed above.

The following is some data to make this more concrete.

Sample (blacked-out values are shown as --)
 y     x     z
20     3     0
20    --     0
30     3     0
--     6     0
50     3     0
40     4     0
65     2     0
70    --     0
45     8     0
30     9     0
--     8     1
75     9     1
60     8     1
60    --     1
55     7     1
--     8     1
90     7     1
85     9     1
75     4     1
90     7     1
Note: In the table, I blacked out some of the values of the data, but these were included in the regressions that follow. The idea is that you cannot calculate $\hat{\beta}_1$ using a computer package (or by hand doing averages).

Given the information provided below, what is $\hat{\beta}_1$? (Note: not all of the following information may be relevant.)

Sample Summary Statistics:
$\bar{y} = 54.25$, $\bar{x} = 6.1$, $\bar{z} = 0.5$
stdev(y) = 22.37, stdev(x) = 2.31, stdev(z) = 0.51

Regression #1
Dependent Variable: X
Method: Least Squares
Included observations: 20

Variable      Coefficient    Std. Error    t-Statistic    Prob.
Constant      4.900000       0.636832      7.694332       0.0000
Z             2.400000       0.900617      2.664840       0.0158

R-squared             0.282908     Mean dependent var       6.100000
Adjusted R-squared    0.243069     S.D. dependent var       2.314713
S.E. of regression    2.013841     Akaike info criterion    4.332604
Sum squared resid     73.00000     Schwarz criterion        4.432177
Log likelihood        -41.32604    F-statistic              7.101370
Durbin-Watson stat    1.514521     Prob(F-statistic)        0.015786

Regression #2
Dependent Variable: Y
Method: Least Squares
Included observations: 20

Variable      Coefficient    Std. Error    t-Statistic    Prob.
Constant      36.48330       14.13098      2.581795       0.0188
X             2.912574       2.172712      1.340524       0.1967

R-squared             0.090772     Mean dependent var       54.25000
Adjusted R-squared    0.040259     S.D. dependent var       22.37686
S.E. of regression    21.92180     Akaike info criterion    9.107479
Sum squared resid     8650.172     Schwarz criterion        9.207052
Log likelihood        -89.07479    F-statistic              1.797005
Durbin-Watson stat    0.836087     Prob(F-statistic)        0.196750

Regression #3
Dependent Variable: Y
Method: Least Squares
Included observations: 20

Variable      Coefficient    Std. Error    t-Statistic    Prob.
Constant      42.00000       6.015027      6.982512       0.0000
Z             24.50000       8.506533      2.880139       0.0100

R-squared             0.315464     Mean dependent var       54.25000
Adjusted R-squared    0.277435     S.D. dependent var       22.37686
S.E. of regression    19.02119     Akaike info criterion    8.823623
Sum squared resid     6512.500     Schwarz criterion        8.923197
Log likelihood        -86.23623    F-statistic              8.295202
Durbin-Watson stat    1.181612     Prob(F-statistic)        0.009963

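Answer:
Following the previous problem, the coefficient on Z in Regression #3 is $\bar{y}_1 - \bar{y}_0 = 24.5$ and the coefficient on Z in Regression #1 is $\bar{x}_1 - \bar{x}_0 = 2.4$, so

$$\hat{\beta}_1 = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0} = \frac{24.5}{2.4} \approx 10.208.$$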
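The arithmetic can be reproduced from the regression output alone. The following is a minimal sketch in Python, assuming only the coefficients reported in Regressions #1 and #3 (the variable names are illustrative). It relies on a standard fact about regressing on a dummy variable: the intercept is the mean of the dependent variable in the z = 0 group, and the slope is the difference between the two group means.

```python
# Coefficients copied from the regression output in the problem.
x_on_z_const, x_on_z_slope = 4.9, 2.4     # Regression #1: X on Z
y_on_z_const, y_on_z_slope = 42.0, 24.5   # Regression #3: Y on Z

# With a dummy regressor: intercept = mean for z = 0,
# intercept + slope = mean for z = 1.
x0_bar, x1_bar = x_on_z_const, x_on_z_const + x_on_z_slope
y0_bar, y1_bar = y_on_z_const, y_on_z_const + y_on_z_slope

# Wald estimator: ratio of the two slopes on Z
# (equivalently, the ratio of the differences in group means).
wald = y_on_z_slope / x_on_z_slope

print(f"group means of x: z=0 -> {x0_bar:.2f}, z=1 -> {x1_bar:.2f}")
print(f"group means of y: z=0 -> {y0_bar:.2f}, z=1 -> {y1_bar:.2f}")
print(f"Wald / IV estimate of beta_1: {wald:.3f}")
```

For comparison, the OLS slope on X in Regression #2 is 2.913, well below the IV estimate.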
3. (From 15.7) The following is a simple model to measure the effect of a school choice program on standardized test performance (see Rouse [1998]):

score = β0 + β1choice + β2faminc + u

where score is the score on a statewide test, choice is a binary variable indicating
whether a student attended a choice school in the last year, and faminc is family
income. The IV for choice is grant, the dollar amount granted by the government to
students to use for tuition at choice schools. The grant amount differed by family
income level, which is why we control for faminc in the equation.

(a) Even with faminc in the equation, why might choice be correlated with u?

Even at a given income level, some students are more motivated and more able than
others, and their families are more supportive (say, in terms of providing transportation)
and enthusiastic about education. Therefore, there is likely to be a self-selection
problem: students that would do better anyway are also more likely to attend a choice
school.

(b) If within each income class, the grant amounts were assigned randomly, is grant
uncorrelated with u?
Assuming we have the functional form for faminc correct, the answer is yes. Since u
does not contain income, random assignment of grants within income class means that
grant designation is not correlated with unobservables such as student ability,
motivation, and family support.

(c) What other condition needs to be satisfied for grant to be a good instrument for
choice?
Grant needs to be correlated with choice: it seems plausible here that larger grants make
it more likely that families will send their child to a choice school.

(d) Write the reduced form equation for choice (that is, choice as a function of all
exogenous variables). What is needed for grant to be partially correlated with
choice?
The reduced form is

$$choice = \pi_0 + \pi_1\, faminc + \pi_2\, grant + v_2,$$

and we need $\pi_2 \neq 0$. In other words, after accounting for income, the grant amount must have some effect on choice. This seems reasonable, provided the grant amounts differ within each income class.

(e) Write the reduced form equation for score (that is, score as a function of all
exogenous variables). Explain why this equation is useful. How do you interpret
the coefficient on grant?
The reduced form for score is just a linear function of the exogenous variables:

$$score = \alpha_0 + \alpha_1\, faminc + \alpha_2\, grant + v_1.$$

This equation allows us to directly estimate the effect of increasing the grant amount on
the test score, holding family income fixed. From a policy perspective this is itself of
some interest.
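The link between this reduced form and the structural equation can be made explicit by substituting the reduced form for choice into the model for score. As a sketch, write the reduced-form coefficients as $\pi_j$ (for choice) and $\alpha_j$ (for score); these symbols are a notational assumption made here for illustration:

```latex
\begin{aligned}
score &= \beta_0 + \beta_1 (\pi_0 + \pi_1\, faminc + \pi_2\, grant + v_2) + \beta_2\, faminc + u \\
      &= (\beta_0 + \beta_1 \pi_0) + (\beta_2 + \beta_1 \pi_1)\, faminc
         + \beta_1 \pi_2\, grant + (u + \beta_1 v_2).
\end{aligned}
```

Matching terms gives $\alpha_2 = \beta_1 \pi_2$. The coefficient on grant in the reduced form for score is therefore the effect of choice on score scaled by the effect of grant on choice, and $\beta_1 = \alpha_2 / \pi_2$ whenever $\pi_2 \neq 0$.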
