Econometric Analysis of Cross Section and Panel Data, 2e: Models For Fractional Responses
1. Introduction
2. Possible Approaches to Fractional Responses
3. Fractional Logit and Probit
4. Endogenous Explanatory Variables
5. Two-Part Models
6. Panel Data
1. INTRODUCTION
∙ Suppose y is a fractional response, that is, 0 ≤ y ≤ 1.
∙ Allow the possibility that y is a corner solution at zero, one, or both. It
could also be an essentially continuous variable strictly between zero
and one in the population.
∙ y can be a proportion computed from the fraction of events occurring
in a given number of trials. [For example, it could be the fraction of
workers participating in a 401(k) pension plan.] But it could also be
fundamentally continuous, such as the proportion of county land zoned
for agriculture.
∙ For now, no problem of missing data, so we avoid the phrase
“censored at zero” or “censored at one.”
∙ If a variable is initially measured as a percentage, divide it by 100 to
turn it into a proportion.
∙ Makes sense to start with linear models; at a minimum, estimated
partial effects can be compared with those from more complicated
nonlinear models.
∙ Remember a general rule: issues such as endogenous explanatory
variables and unobserved heterogeneity are more easily handled with
linear models. To allow nonlinear functional forms, we will impose
extra assumptions.
2. POSSIBLE APPROACHES TO FRACTIONAL RESPONSES
∙ In the case where y has corners at zero and one, a two-limit Tobit
model is logically consistent. But it uses a full set of distributional
assumptions [which, of course, has the benefit of allowing us to
estimate any feature of D(y|x)]. Plus, it is logically inconsistent if we
have only one corner.
∙ Can use other logically consistent distributions. If y_i is continuous
on (0, 1), a conditional Beta distribution makes sense.
∙ The Beta distribution is not in the LEF, so, like the Tobit approach,
MLE based on the Beta distribution is generally inconsistent for the
parameters of a correctly specified conditional mean if other features
of the conditional distribution are misspecified.
∙ We focus mainly on models for estimating the conditional mean.
Later, discuss two-part models.
∙ Linear model has essentially the same drawbacks as for binary response:
E(y|x) = xβ = β_1 + β_2 x_2 +. . . +β_K x_K
can hold over all potential values of x only in rare circumstances (such
as mutually exclusive and exhaustive dummy variables).
∙ As with other limited dependent variables, we should view the linear
model as the best linear approximation to E(y|x) (which we can
potentially improve by using quadratics, interactions, and other
functional forms such as logarithms).
∙ As always, the OLS estimators are consistent for the linear projection
parameters, which approximate (we hope) average partial effects.
∙ A common approach when 0 < y < 1 is to use the so-called log-odds
transformation of y, log[y/(1 − y)], in a linear regression. Define
w = log[y/(1 − y)] and assume
E(w|x) = xβ.
∙ The log-odds approach is simple and, because w can range over all
real values, the linear conditional mean is attractive.
∙ Drawbacks to the log-odds approach: First, it cannot be applied to
corner solution responses unless we make some arbitrary adjustments.
Because log[y/(1 − y)] → −∞ as y → 0 and log[y/(1 − y)] → ∞ as
y → 1, our estimates might be sensitive to the adjustments at the
endpoints.
∙ Second, even if y is strictly in the unit interval, β is difficult to
interpret: without further assumptions, it is not possible to estimate
E(y|x) from a model for E(log[y/(1 − y)]|x).
∙ One possibility is to assume the log-odds transformation yields a
linear model with an additive error independent of x:
log[y/(1 − y)] = xβ + e,
where we take E(e) = 0 (and assume that x_1 = 1). Then we can write
y = exp(xβ + e)/[1 + exp(xβ + e)].
∙ If e and x are independent,
E(y|x) = ∫ exp(xβ + e)/[1 + exp(xβ + e)] dF(e),
where F is the distribution function of e.
∙ Duan’s (JASA, 1983) “smearing estimate” can be used without
specifying D(e):
Ê(y|x) = N^{−1} Σ_{i=1}^N exp(xβ̂ + ê_i)/[1 + exp(xβ̂ + ê_i)],
where β̂ is the OLS estimator and the ê_i are the OLS residuals.
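To make the smearing idea concrete, here is a minimal numpy sketch; the data-generating process, variable names, and evaluation point are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
e = rng.normal(scale=0.5, size=N)
w = 0.3 + 0.8 * x + e                 # log-odds of y, additive error
y = 1.0 / (1.0 + np.exp(-w))          # y strictly in (0, 1)

# OLS of w = log(y/(1-y)) on x; keep the residuals
wobs = np.log(y / (1.0 - y))
bhat, *_ = np.linalg.lstsq(X, wobs, rcond=None)
ehat = wobs - X @ bhat

# smearing estimate of E(y|x) at a chosen point, here x = 1
x0 = np.array([1.0, 1.0])
Ey_x0 = np.mean(1.0 / (1.0 + np.exp(-(x0 @ bhat + ehat))))
```

Averaging the logistic transform over the empirical distribution of the residuals is exactly the sample analogue of the integral above; no distribution for e is specified.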
∙ Estimated partial effects are obtained by taking derivatives with
respect to the x j , or discrete differences.
∙ A similar analysis applies if we replace the log-odds transformation
with Φ^{−1}(y), where Φ^{−1}(·) is the inverse of the standard
normal cdf, in which case we average Φ(xβ̂ + ê_i) across i to estimate
E(y|x).
∙ Can use the delta method for standard errors, or the bootstrap.
∙ Question: If we are mainly interested in Ey|x, why not just model it
directly?
3. FRACTIONAL LOGIT AND PROBIT
∙ Let y be a response in [0, 1], possibly including the endpoints. We can
model its mean as a logit function,
E(y|x) = exp(xβ)/[1 + exp(xβ)],
or as a probit function,
E(y|x) = Φ(xβ).
∙ In each case the fitted values will be in (0, 1) and each allows y to
take on any values in [0, 1], including the endpoints zero and one.
∙ Partial effects are obtained just as in standard logit and probit, but
these are on the mean and not the response probability.
∙ The above functional forms do not, of course, exhaust the
possibilities. For example, E(y|x) = exp(−exp(xβ)), the “loglog”
link discussed below.
∙ Generally, let the mean function be G(xβ). We could estimate β by
nonlinear least squares. NLS is consistent and inference is
straightforward, provided we use the fully robust sandwich variance
matrix estimator that does not restrict Var(y|x).
∙ As in estimating models of conditional means for unbounded,
nonnegative responses, NLS is unlikely to be efficient for fractional
responses because common distributions for a fractional response imply
heteroskedasticity.
∙ Could use a two-step weighted NLS if we model Var(y|x).
∙ A simpler, one-step strategy is to use a QMLE approach. We know
the Bernoulli log likelihood is in the linear exponential family.
Therefore, the QMLE that solves
max_b Σ_{i=1}^N {(1 − y_i)log[1 − G(x_i b)] + y_i log[G(x_i b)]}
is consistent for β whenever E(y|x) = G(xβ) is correctly specified.
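A minimal numpy sketch of this Bernoulli QMLE with a logit mean ("fractional logit"), fit by Fisher scoring; the Beta data-generating process below is invented for illustration and matters only through its conditional mean:

```python
import numpy as np

def fractional_logit(y, X, iters=50):
    """Bernoulli QMLE: maximize sum (1-y)*log[1-G(xb)] + y*log[G(xb)]
    with G logistic. Fisher scoring = exact Newton here, because the
    logit Hessian -X'diag(p(1-p))X does not depend on y."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        H = (X * (p * (1.0 - p))[:, None]).T @ X
        b += np.linalg.solve(H, X.T @ (y - p))   # score is X'(y - p)
    return b

rng = np.random.default_rng(0)
N = 2000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
mean = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))    # true E(y|x)
y = rng.beta(5.0 * mean, 5.0 * (1.0 - mean))     # fractional y with that mean
bhat = fractional_logit(y, X)
```

The estimator recovers the mean parameters without the Beta distribution being part of the assumptions; any conditional distribution with this mean would do.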
∙ Call the QMLE fractional logit regression or fractional probit
regression.
∙ These are just as robust as the NLS estimators.
∙ Fully robust inference is straightforward for QMLE. When the mean
is correctly specified, estimate the asymptotic variance of β̂ as
[Σ_{i=1}^N ĝ_i² x_i′x_i/{Ĝ_i(1 − Ĝ_i)}]^{−1} [Σ_{i=1}^N û_i² ĝ_i² x_i′x_i/{Ĝ_i(1 − Ĝ_i)}²] [Σ_{i=1}^N ĝ_i² x_i′x_i/{Ĝ_i(1 − Ĝ_i)}]^{−1},
where Ĝ_i = G(x_iβ̂), ĝ_i = g(x_iβ̂) for g(·) the derivative of G(·), and
û_i = y_i − G(x_iβ̂).
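A numpy sketch of this fully robust sandwich estimator; for the logit link g = G(1 − G), so the outer matrices reduce to the expected Hessian and the middle matrix to Σ û_i² x_i′x_i. The simulated data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
m = 1.0 / (1.0 + np.exp(-(0.5 + x)))
y = rng.beta(4.0 * m, 4.0 * (1.0 - m))

# fractional-logit QMLE by Newton iterations
b = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))

# sandwich A^{-1} B A^{-1}: with a logit mean,
# g_i^2/[G_i(1-G_i)] = G_i(1-G_i), so A is the expected Hessian
p = 1.0 / (1.0 + np.exp(-X @ b))
u = y - p                                      # u_i = y_i - G(x_i b)
A = (X * (p * (1 - p))[:, None]).T @ X
B = (X * (u ** 2)[:, None]).T @ X
Ainv = np.linalg.inv(A)
robust_se = np.sqrt(np.diag(Ainv @ B @ Ainv))
```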
∙ If we allow the mean to be misspecified, we replace the outer part of
the sandwich with the estimated Hessian, not the expected Hessian
conditional on x_i.
∙ The Bernoulli GLM variance assumption is
Var(y|x) = σ²E(y|x)[1 − E(y|x)].
∙ As we discussed in the general case, the Bernoulli QMLE has an
attractive efficiency property. If it turns out that the GLM variance
assumption holds, then the QMLE is efficient in the class of all
estimators that use only E(y|x) = G(xβ) for consistency. In particular,
the QMLE is more efficient than NLS.
∙ Of course, if, say, a Beta distribution were correct, and we use the
correct MLE, this would be more efficient than the QMLE. But the
MLE uses more assumptions for consistency.
∙ If the GLM variance assumption holds, the asymptotic variance
matrix estimator simplifies to
σ̂² [Σ_{i=1}^N g(x_iβ̂)² x_i′x_i/{G(x_iβ̂)[1 − G(x_iβ̂)]}]^{−1}
with
σ̂² = (N − K)^{−1} Σ_{i=1}^N û_i²/v̂_i,
where v̂_i ≡ G(x_iβ̂)[1 − G(x_iβ̂)].
∙ One case where the Bernoulli GLM assumption holds: suppose
y_i = s_i/n_i, where s_i is the number of “successes” in n_i Bernoulli draws.
Suppose that s_i given (n_i, x_i) follows a Binomial(n_i, G(x_iβ))
distribution.
∙ Then E(y_i|n_i, x_i) = G(x_iβ) and
Var(y_i|n_i, x_i) = n_i^{−1}G(x_iβ)[1 − G(x_iβ)]. If n_i is independent of x_i,
Var(y_i|x_i) = σ²G(x_iβ)[1 − G(x_iβ)],
where σ² ≡ E(n_i^{−1}) ≤ 1 (with strict inequality unless n_i = 1 with
probability one).
∙ In practice, it is unlikely that n i and x i are independent, so fully
robust inference should be used.
∙ Further, within-group correlation – that is, if we write
s_i = Σ_{r=1}^{n_i} w_ir
for binary responses w_ir, the {w_ir : r = 1, . . . , n_i} are correlated
conditional on (n_i, x_i) – generally invalidates the GLM variance
assumption that holds in the independent Binomial case.
∙ If we are given data on proportions but do not know n_i, it makes
sense to use a fractional logit or probit analysis. If we observe the n_i,
we might use binomial regression instead (which is fully robust
provided E(s_i|n_i, x_i) = n_iG(x_iβ)).
∙ If we maintain E(s_i|n_i, x_i) = n_iG(x_iβ) and y_i = s_i/n_i, then
E(y_i|n_i, x_i) = G(x_iβ).
This means that binomial regression using the counts s_i and fractional
regression using y_i should yield similar estimates of β.
∙ If the binomial distributional assumption is true, MLE using s i , n i is
asymptotically more efficient than fractional regression. But the
variance in binomial regression often has overdispersion. The fractional
regression can actually be more efficient. (And, it is often more
resilient to outliers.)
∙ Can compare the APEs to OLS estimates of a linear model. For a
continuous variable x_j,
APE_j = N^{−1} Σ_{i=1}^N g(x_iβ̂)β̂_j.
∙ If x_j is binary,
APE_j = N^{−1} Σ_{i=1}^N [G(x_i^{(1)}β̂) − G(x_i^{(0)}β̂)],
where x_i^{(1)} has x_ij = 1 and x_i^{(0)} has x_ij = 0 (other covariates
at their observed values).
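Both APE formulas can be computed directly from the fitted index; a numpy sketch with one continuous and one binary covariate (DGP invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
x = rng.normal(size=N)
d = (rng.uniform(size=N) < 0.5).astype(float)
X = np.column_stack([np.ones(N), x, d])
m = 1.0 / (1.0 + np.exp(-(-0.2 + 0.7 * x + 0.5 * d)))
y = rng.beta(4.0 * m, 4.0 * (1.0 - m))

# fractional-logit fit by Newton iterations
b = np.zeros(3)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))

# continuous variable: APE_j = N^{-1} sum g(x_i b)*b_j, with g = G(1-G)
p = 1.0 / (1.0 + np.exp(-X @ b))
ape_x = np.mean(p * (1.0 - p)) * b[1]

# binary variable: average G with d set to 1 minus G with d set to 0
X1, X0 = X.copy(), X.copy()
X1[:, 2], X0[:, 2] = 1.0, 0.0
ape_d = np.mean(1/(1 + np.exp(-X1 @ b)) - 1/(1 + np.exp(-X0 @ b)))
```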
∙ Whether (say) x_K is discrete or continuous, we can obtain an estimate
of APE_K when x_K changes from, say, a_K^{(0)} to a_K^{(1)}, without using a
calculus approximation, as in the previous equation but with x_iK set
to a_K^{(1)} and a_K^{(0)} in turn.
∙ If, say, x_K is the key variable, and it is continuous, might plot the
response as a function of x_K, inserting mean values (say) of the other
variables:
ASF(x_K) = G(β̂_1 + β̂_2 x̄_2 +. . . +β̂_{K−1} x̄_{K−1} + β̂_K x_K),
or averaging them out:
ASF(x_K) = N^{−1} Σ_{i=1}^N G(β̂_1 + β̂_2 x_i2 +. . . +β̂_{K−1} x_{i,K−1} + β̂_K x_K).
∙ Simple functional form test: after obtaining β̂, add powers of the
index x_iβ̂, such as (x_iβ̂)² and (x_iβ̂)³, use fractional QMLE on the
expanded “model,” and use a robust Wald test of joint significance of
their coefficients, γ_1 and γ_2. (This test, an extension of RESET for
linear models, can be applied to any index context, including count
regression with an exponential mean.)
∙ This is an example of a variable addition test, which is essentially a
score test but slightly easier to implement.
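A numpy sketch of the variable-addition RESET: refit with squared and cubed fitted indices added, then form the robust Wald statistic for their joint significance. The DGP is invented; since the mean is correctly specified here, the statistic should behave like a chi-square with 2 df:

```python
import numpy as np

def fit_flogit(y, X, iters=60):
    # Bernoulli QMLE with a logit mean, Newton iterations
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))
    return b

rng = np.random.default_rng(5)
N = 1500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
m = 1.0 / (1.0 + np.exp(-(0.4 + 0.8 * x)))
y = rng.beta(4.0 * m, 4.0 * (1.0 - m))

z = X @ fit_flogit(y, X)                  # fitted index x_i b-hat
Xa = np.column_stack([X, z**2, z**3])     # expanded "model"
ba = fit_flogit(y, Xa)

# robust Wald test of H0: gamma_1 = gamma_2 = 0
p = 1.0 / (1.0 + np.exp(-Xa @ ba))
u = y - p
A = (Xa * (p * (1 - p))[:, None]).T @ Xa
B = (Xa * (u ** 2)[:, None]).T @ Xa
V = np.linalg.inv(A) @ B @ np.linalg.inv(A)   # robust sandwich
wald = ba[2:] @ np.linalg.solve(V[2:, 2:], ba[2:])
```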
∙ For goodness-of-fit of the mean, can compute an R-squared as
R² = 1 − Σ_{i=1}^N (y_i − ŷ_i)² / Σ_{i=1}^N (y_i − ȳ)²,
where ŷ_i = G(x_iβ̂).
∙ Another possibility is the squared correlation between y_i and ŷ_i.
∙ Unlike OLS estimation of a linear model, these are not algebraically
the same.
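Both goodness-of-fit measures are one line each after the QMLE fit (DGP invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
m = 1.0 / (1.0 + np.exp(-(0.3 + 0.9 * x)))
y = rng.beta(3.0 * m, 3.0 * (1.0 - m))

# fractional-logit fit by Newton iterations
b = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))

yhat = 1.0 / (1.0 + np.exp(-X @ b))
r2_ssr = 1.0 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)
r2_corr = np.corrcoef(y, yhat)[0, 1]**2    # squared-correlation version
```

The two numbers are close but not identical, unlike in OLS.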
∙ GLM in Stata:
glm y x1 ... xK, fam(bin) link(logit) robust
glm y x1 ... xK, fam(bin) link(probit) sca(x2)
glm y x1 ... xK, fam(bin) link(loglog) robust
∙ Best to make inference fully robust, but the GLM variance
assumption often gives similar standard errors.
∙ The usual MLE standard errors are too conservative, often very
conservative.
∙ The “loglog” link implements the model E(y|x) = exp(−exp(xβ)).
∙ After any of the commands, fitted values are easy to get:
predict yhat
∙ To get the estimated indices, x_iβ̂, and powers of them:
predict xbhat, xb
gen xbhatsq = xbhat^2
gen xbhatcu = xbhat^3
EXAMPLE: Participation rates in 401(k) pension plans.
. use 401k
. des
. sum
. count if mrate > 1
292
. count if mrate > 2
101
. reg prate mrate age ltotemp sole, robust
------------------------------------------------------------------------------
| Robust
prate | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | .0485354 .0043289 11.21 0.000 .0400443 .0570266
age | .0031704 .0004032 7.86 0.000 .0023795 .0039613
ltotemp | -.0240487 .0031777 -7.57 0.000 -.0302818 -.0178156
sole | .0217378 .0086932 2.50 0.013 .004686 .0387896
_cons | .9465254 .0218303 43.36 0.000 .9037049 .9893458
------------------------------------------------------------------------------
. glm prate mrate age ltotemp sole, fam(bin) link(logit) robust
note: prate has non-integer values
AIC = .5589556
Log pseudolikelihood = -423.7189416 BIC = -10901.66
------------------------------------------------------------------------------
| Robust
prate | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | .9167158 .134119 6.84 0.000 .6538474 1.179584
age | .0322364 .0049561 6.50 0.000 .0225226 .0419502
ltotemp | -.2080024 .0258256 -8.05 0.000 -.2586195 -.1573852
sole | .1676861 .0846774 1.98 0.048 .0017215 .3336507
_cons | 2.370495 .1921688 12.34 0.000 1.993851 2.747139
------------------------------------------------------------------------------
. margeff
------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | .0969144 .0140539 6.90 0.000 .0693694 .1244595
age | .0034081 .0005304 6.43 0.000 .0023686 .0044477
ltotemp | -.0219898 .0027723 -7.93 0.000 -.0274233 -.0165562
sole | .0176176 .0083374 2.11 0.035 .0012766 .0339586
------------------------------------------------------------------------------
. * The APE for mrate is about double the linear model estimate.
. predict prateh_l
(option mu assumed; predicted mean prate)
. corr prate prateh_l
         | prate prateh_l
-------------------------------
   prate | 1.0000
prateh_l | 0.4263 1.0000
. di .4263^2
.18173169
. * The nonrobust standard errors are too large:
AIC = .5589556
Log likelihood = -423.7189416 BIC = -10901.66
------------------------------------------------------------------------------
| OIM
prate | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | .9167158 .2059862 4.45 0.000 .5129902 1.320441
age | .0322364 .010257 3.14 0.002 .012133 .0523398
ltotemp | -.2080024 .0551219 -3.77 0.000 -.3160393 -.0999654
sole | .1676861 .1716409 0.98 0.329 -.1687239 .5040961
_cons | 2.370495 .4263752 5.56 0.000 1.534815 3.206175
------------------------------------------------------------------------------
. gen mratesq = mrate^2
. gen agesq = age^2
. gen ltotempsq = ltotemp^2
. reg prate mrate mratesq age agesq ltotemp ltotempsq sole, robust
------------------------------------------------------------------------------
| Robust
prate | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | .137551 .0124891 11.01 0.000 .1130534 .1620487
mratesq | -.0255695 .0029956 -8.54 0.000 -.0314454 -.0196936
age | .0076809 .0015391 4.99 0.000 .0046619 .0106999
agesq | -.000129 .0000371 -3.48 0.001 -.0002017 -.0000563
ltotemp | -.113806 .0218575 -5.21 0.000 -.1566799 -.070932
ltotempsq | .0061188 .0014904 4.11 0.000 .0031953 .0090423
sole | .0119101 .0087466 1.36 0.173 -.0052465 .0290667
_cons | 1.2029 .0788964 15.25 0.000 1.048143 1.357657
------------------------------------------------------------------------------
. * Now compute the APE for mrate:
. sum mrate_me_lin
. predict xbh_sq_lin
(option xb assumed; fitted values)
. reg prate mrate mratesq age agesq ltotemp ltotempsq sole xbh_sq_linsq
xbh_sq_lincu, robust
------------------------------------------------------------------------------
| Robust
prate | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | 5.612875 2.25341 2.49 0.013 1.192761 10.03299
mratesq | -1.042382 .4188927 -2.49 0.013 -1.864049 -.2207152
age | .3121919 .125702 2.48 0.013 .0656248 .5587591
agesq | -.0052452 .0021126 -2.48 0.013 -.0093891 -.0011012
ltotemp | -4.634524 1.8645 -2.49 0.013 -8.291782 -.9772653
ltotempsq | .2492836 .1002737 2.49 0.013 .0525946 .4459725
sole | .4824868 .1956819 2.47 0.014 .0986524 .8663211
xbh_sq_linsq | -40.86979 17.93412 -2.28 0.023 -76.04795 -5.691631
xbh_sq_lincu | 13.81167 6.515698 2.12 0.034 1.030992 26.59236
_cons | 36.28292 14.74269 2.46 0.014 7.364812 65.20104
------------------------------------------------------------------------------
( 1) xbh_sq_linsq = 0
( 2) xbh_sq_lincu = 0
F( 2, 1524) = 21.68
Prob > F = 0.0000
. glm prate mrate mratesq age agesq ltotemp ltotempsq sole, fam(bin)
link(logit) robust
note: prate has non-integer values
AIC = .5544207
Log pseudolikelihood = -417.2406567 BIC = -10892.61
------------------------------------------------------------------------------
| Robust
prate | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | 1.614381 .1675732 9.63 0.000 1.285943 1.942818
mratesq | -.2753789 .0435835 -6.32 0.000 -.360801 -.1899567
age | .0764414 .0158978 4.81 0.000 .0452823 .1076006
agesq | -.0012815 .000386 -3.32 0.001 -.002038 -.000525
ltotemp | -1.199122 .2209129 -5.43 0.000 -1.632103 -.7661407
ltotempsq | .0650906 .0145924 4.46 0.000 .03649 .0936912
sole | .1015973 .0837603 1.21 0.225 -.0625698 .2657644
_cons | 5.535748 .833326 6.64 0.000 3.902459 7.169037
------------------------------------------------------------------------------
. predict prateh_l2
. corr prate prateh_l2
          | prate prateh~2
-------------------------------
    prate | 1.0000
prateh_l2 | 0.4602 1.0000
. di .4602^2
.21178404
. di 1.614/(2*.275)
2.9345455
. * Using margeff with the quadratics doesn’t make much sense.
. * Compute APE "by hand."
. predict xbh_l2, xb
. sum scale
. sum mrate_me
. * About 40% higher than the linear model estimated APE, .100.
. predict xbh_sq_log, xb
. glm prate mrate mratesq age agesq ltotemp ltotempsq sole xbh_sq_logsq
xbh_sq_logcu, fam(bin) link(logit) robust
note: prate has non-integer values
------------------------------------------------------------------------------
| Robust
prate | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
mrate | 4.065386 1.356546 3.00 0.003 1.406605 6.724167
mratesq | -.697588 .2330727 -2.99 0.003 -1.154402 -.2407738
age | .1928297 .0648958 2.97 0.003 .0656361 .3200232
agesq | -.0032323 .0011257 -2.87 0.004 -.0054387 -.0010259
ltotemp | -3.050577 1.057474 -2.88 0.004 -5.123189 -.9779661
ltotempsq | .1659176 .0585484 2.83 0.005 .0511647 .2806704
sole | .2623277 .1261936 2.08 0.038 .0149928 .5096626
xbh_sq_logsq | -.8103757 .4117121 -1.97 0.049 -1.617317 -.0034348
xbh_sq_logcu | .129453 .0625514 2.07 0.038 .0068545 .2520515
_cons | 13.20982 4.311299 3.06 0.002 4.759834 21.65981
------------------------------------------------------------------------------
. test xbh_sq_logsq xbh_sq_logcu
( 1) [prate]xbh_sq_logsq = 0
( 2) [prate]xbh_sq_logcu = 0
chi2( 2) = 4.51
Prob > chi2 = 0.1048
4. ENDOGENOUS EXPLANATORY VARIABLES
∙ The fractional probit model can easily handle certain kinds of
continuous endogenous explanatory variables.
∙ As before, model endogeneity as an omitted variable:
E(y_1|z, y_2, c_1) = E(y_1|z_1, y_2, c_1) = Φ(z_1δ_1 + α_1y_2 + c_1)
y_2 = zδ_2 + v_2 = z_1δ_21 + z_2δ_22 + v_2.
∙ Ideally, could assume the linear equation for y_2 represents a linear
projection. But we need to assume more.
∙ Sufficient is
c_1 = η_1v_2 + e_1, e_1|z, v_2 ~ Normal(0, σ²_e1).
∙ Then
E(y_1|z, y_2) = E(y_1|z, y_2, v_2) = Φ(z_1δ_e1 + α_e1y_2 + η_e1v_2),
where the “e1” subscript denotes the original coefficients scaled by
(1 + σ²_e1)^{−1/2}.
∙ A simple test of the null hypothesis that y_2 is exogenous is the fully
robust t statistic on v̂_i2; the first-step estimation can be ignored under
the null.
∙ If η_1 ≠ 0, then the robust sandwich variance matrix estimator of the
scaled coefficients is not valid because it does not account for the
first-step estimation. Can adjust for the two-step M-estimation results
or use the bootstrap.
∙ The average structural function is consistently estimated as
ASF(z_1, y_2) = N^{−1} Σ_{i=1}^N Φ(z_1δ̂_e1 + α̂_e1y_2 + η̂_e1v̂_i2),
where the v̂_i2 are the first-stage residuals.
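The two-step control-function procedure can be sketched as follows. To keep the sketch dependency-free it uses a logit mean rather than the probit in the text, but the mechanics are identical: first-stage OLS residuals enter the second-stage fractional regression, and the ASF averages out v̂_2. All names and the DGP are invented:

```python
import numpy as np

def fit_flogit(y, X, iters=60):
    # Bernoulli QMLE with a logit mean, Newton iterations
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))
    return b

rng = np.random.default_rng(7)
N = 3000
z1 = rng.normal(size=N)                    # exogenous regressor
z2 = rng.normal(size=N)                    # excluded instrument
v2 = rng.normal(size=N)
y2 = 0.5 * z1 + 0.8 * z2 + v2              # endogenous regressor
idx = 0.2 + 0.4 * z1 + 0.5 * y2 + 0.6 * v2 # v2 is the omitted variable
m = 1.0 / (1.0 + np.exp(-idx))
y1 = rng.beta(4.0 * m, 4.0 * (1.0 - m))

# step 1: reduced-form OLS for y2; keep the residuals
Z = np.column_stack([np.ones(N), z1, z2])
d2, *_ = np.linalg.lstsq(Z, y2, rcond=None)
v2hat = y2 - Z @ d2

# step 2: fractional QMLE of y1 on (1, z1, y2, v2hat)
X = np.column_stack([np.ones(N), z1, y2, v2hat])
b = fit_flogit(y1, X)

# ASF at (z1, y2) = (0, 1): average out the control function v2hat
asf = np.mean(1.0 / (1.0 + np.exp(-(b[0] + b[2] * 1.0 + b[3] * v2hat))))
```

The fully robust t statistic on v2hat would be the simple exogeneity test; for valid standard errors on everything else, bootstrap both steps.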
∙ Basic model can be extended in many ways. For example, can replace
y_2 = zδ_2 + v_2 with
h(y_2) = zδ_2 + v_2,
where h is strictly monotonic. (This is for the case where we want y_2
in the structural model yet it is unlikely to have a linear reduced form
with additive, independent error.)
∙ If y_2 > 0 then h_2(y_2) = log(y_2) is natural; if 0 < y_2 < 1, might use
the log-odds transformation, h_2(y_2) = log[y_2/(1 − y_2)].
∙ Unfortunately, if y 2 has a mass point – such as a binary response, or
corner response, or count variable – a transformation yielding an
additive, independent error probably does not exist.
∙ Allowing flexible functional forms for y_2 is easy. For example, if the
structural model contains y_2² and interactions, say y_2z_1, the estimating
equation could look like
E(y_1|z, y_2, v_2) = Φ(z_1δ_e1 + α_e1y_2 + γ_e1y_2² + (y_2z_1)λ_e1 + η_e1v_2).
∙ After the two-step QMLE, the ASF is estimated as
ASF(z_1, y_2) = N^{−1} Σ_{i=1}^N Φ(z_1δ̂_e1 + α̂_e1y_2 + γ̂_e1y_2² + (y_2z_1)λ̂_e1 + η̂_e1v̂_i2).
∙ In the second stage, we could also add a cubic in v̂_2 to the fractional
probit. In the model just above,
ASF(z_1, y_2) = N^{−1} Σ_{i=1}^N Φ(z_1δ̂_e1 + α̂_e1y_2 + γ̂_e1y_2² + (y_2z_1)λ̂_e1 + η̂_1v̂_i2 + η̂_2v̂_i2² + η̂_3v̂_i2³);
that is, we again just average out the control function. The bootstrap
would be very convenient for standard errors.
∙ Recent work by Blundell and Powell (2004, Review of Economic
Studies) goes even further. Just allow E(y_1|z_1, y_2, v_2) to be a flexible
function of its arguments, say
E(y_1|z_1, y_2, v_2) = g_1(z_1, y_2, v_2).
∙ The ASF is consistently estimated as
ASF(z_1, y_2) = N^{−1} Σ_{i=1}^N ĝ_1(z_1, y_2, v̂_i2).
∙ We can accommodate multiple continuous endogenous variables. Let
x_1 = k_1(z_1, y_2) for a vector of functions k_1(·), and allow a set of
reduced forms for strictly monotonic functions h_2g(y_2g), g = 1, . . . , G_1,
where G_1 is the dimension of y_2.
∙ See Wooldridge (2005, “Unobserved Heterogeneity and Estimation of
Average Partial Effects,” Rothenberg Festschrift).
∙ Recent (unpublished) work: If y_2 is binary and follows a probit
model, can have y_1 fractional with a probit conditional mean and apply
“bivariate probit” to (y_1, y_2), even though y_1 is not binary. (Not
currently allowed in Stata.)
5. TWO-PART MODELS
∙ If we have corners at y = 0 or y = 1 (or, occasionally, at both values),
might want to use a two-part (hurdle) model.
∙ For concreteness, assume P(y = 0) > 0 but y is continuous in (0, 1),
so there is no pile-up at one.
∙ In addition to modeling P(y = 0|x), could model D(y|x, y > 0). But
more robust to model E(y|x, y > 0) using a fractional response.
∙ Let
P(y = 0|x) = 1 − F(xγ)
E(y|x, y > 0) = G(xβ).
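A numpy sketch of the two parts; logit functional forms are used for both F and G to keep the sketch self-contained, and the DGP and names are invented:

```python
import numpy as np

def fit_logit(y, X, iters=60):
    # works for binary y (logit MLE) and fractional y (QMLE) alike
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))
    return b

rng = np.random.default_rng(8)
N = 4000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

Fx = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))    # P(y > 0 | x)
w = (rng.uniform(size=N) < Fx).astype(float)   # participation indicator
Gx = 1.0 / (1.0 + np.exp(-(-0.3 + 0.6 * x)))   # E(y | x, y > 0)
y = np.where(w > 0, rng.beta(4.0 * Gx, 4.0 * (1.0 - Gx)), 0.0)

gam = fit_logit(w, X)                  # part 1: binary logit for P(y>0|x)
beta = fit_logit(y[w > 0], X[w > 0])   # part 2: fractional logit on y > 0

# overall mean combines the two parts: E(y|x) = F(x gam) * G(x beta)
Ey = (1/(1 + np.exp(-X @ gam))) * (1/(1 + np.exp(-X @ beta)))
```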
∙ Estimation is straightforward. (1) Estimate γ by binary response (say,
logit or probit) of w_i on x_i, where w_i = 1[y_i > 0]. (2) Use QMLE
(fractional logit or probit, or some other functional form) of y_i on x_i
using the data with y_i > 0 to estimate β.
∙ Can compute an R-squared for the overall mean (and the mean
conditional on y > 0) to compare with one-part models. Can test the
functional forms of the two parts, too, using RESET and other tests
(such as for “heteroskedasticity”).
∙ Open (?) question: How to combine two-part models and
endogeneity?
6. PANEL DATA METHODS
∙ If no interest in explicitly including unobserved heterogeneity, can
use pooled versions of methods discussed. Of course, should allow for
arbitrary serial dependence in inference as well as variance
misspecification in the LEF distribution.
∙ Might have dynamic completeness in the mean if lagged dependent
variables have been included. (What is the best functional form for
doing so?) As usual, if E(y_it|z_it, y_{i,t−1}, z_{i,t−1}, . . .) has been
properly specified, then serial correlation cannot be an issue.
∙ In Stata, we just use the “glm” command with a clustering option:
glm y x1 ... xK, fam(bin) link(logit)
cluster(id)
∙ With complete dynamics in the mean:
glm y x1 ... xK, fam(bin) link(logit) robust
or replace the logit link with another.
Models and Partial Effects with Heterogeneity
∙ Consider, for 0 ≤ y_it ≤ 1,
E(y_it|x_it, c_i) = Φ(x_itβ + c_i), t = 1, . . . , T.
For discrete changes, we compute
Φ(x_t^{(1)}β + c) − Φ(x_t^{(0)}β + c)
for two different settings of the covariates, x_t^{(1)} and x_t^{(0)}.
∙ Partial effects depend on x_t and c. What should we plug in for c?
∙ Instead, focus on the average partial effects (APEs): for continuous x_tj,
E_{c_i}[β_j φ(x_tβ + c_i)] = β_j E_{c_i}[φ(x_tβ + c_i)].
∙ When are the APEs identified? More generally than the parameters,
but more assumptions are needed.
∙ Strict exogeneity conditional on c_i: if x_i ≡ (x_i1, x_i2, . . . , x_iT),
E(y_it|x_i, c_i) = E(y_it|x_it, c_i), t = 1, . . . , T.
∙ Altonji and Matzkin (2005, Econometrica) use general
exchangeability. Could allow the distribution to depend on other
features of {x_it : t = 1, . . . , T}, such as time trends or average growth
rates.
∙ We assume more:
c_i|(x_i1, x_i2, . . . , x_iT) ~ Normal(ψ + x̄_iξ, σ²_a).
∙ β is identified up to a positive scale factor, and the APEs are
identified:
E(y_it|x_i, a_i) = Φ(ψ + x_itβ + x̄_iξ + a_i)
and so
E(y_it|x_i) = Φ[(ψ + x_itβ + x̄_iξ)/(1 + σ²_a)^{1/2}] ≡ Φ(ψ_a + x_itβ_a + x̄_iξ_a).
∙ ψ_a, β_a, and ξ_a are identified if there is time variation in x_it.
Chamberlain device: replace x̄_i with x_i = (x_i1, . . . , x_iT).
∙ Conveniently, the APEs can be obtained by differentiating or
differencing
ASF(x_t) = E_{x̄_i}[Φ(ψ_a + x_tβ_a + x̄_iξ_a)].
∙ As usual, APEs for continuous and discrete variables can be obtained
from ASF(x_t).
∙ In practice, we would have time dummies, which we could just
indicate with ψ̂_at.
∙ We can always include time-constant variables, say r_i, along with x̄_i.
It is then up to us to interpret the partial effects with respect to r_i.
Estimation Methods Under Strict Exogeneity
∙ Many consistent estimators of the scaled parameters. Define
w_it ≡ (1, x_it, x̄_i) (or with time dummies and time-constant variables)
and θ ≡ (ψ_a, β_a′, ξ_a′)′. Then θ can be estimated using pooled nonlinear
least squares (NLS), with regression function Φ(w_itθ).
∙ Pooled NLS estimator is consistent and √N-asymptotically normal
(with fixed T), but is likely to be inefficient.
∙ First, it ignores the serial dependence in the y_it, which is likely to be
substantial even after conditioning on x_i. Second, Var(y_it|x_i) is very
unlikely to be homoskedastic. Could ignore serial correlation, model
Var(y_it|x_i), and use weighted least squares.
∙ We already know that we can use pooled fractional probit (or logit,
for that matter), with explanatory variables (1, x_it, x̄_i) (and, likely, year
dummies).
∙ The “working variance” assumption for pooled FP is
Var(y_it|x_i) = σ²Φ(w_itθ)[1 − Φ(w_itθ)],
where 0 < σ² ≤ 1.
∙ Still need to cluster to obtain standard errors robust to serial
correlation, even if the variance function is correct.
∙ In Stata, with year dummies explicit:
glm y x1 ... xK x1bar ... xKbar d2 ... dT,
fam(bin) link(probit) cluster(id)
margeff
∙ The “margeff” command gives APEs and appropriate standard errors.
∙ Can add time-constant variables to the list of explanatory variables.
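A numpy sketch of the pooled approach with the Mundlak device and cluster-robust standard errors; a logit mean is used for self-containment, and the names and DGP are invented:

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 400, 5
a = rng.normal(size=N)                       # unit-level component of x
x = a[:, None] + rng.normal(size=(N, T))     # x_it correlated within i
xbar = x.mean(axis=1, keepdims=True) * np.ones((N, T))

# pooled mean with the Mundlak term x-bar built in
idx = 0.2 + 0.8 * x - 0.4 * xbar
m = 1.0 / (1.0 + np.exp(-idx))
y = rng.beta(3.0 * m, 3.0 * (1.0 - m))

# stack to N*T rows: columns (1, x_it, xbar_i); pooled fractional logit
X = np.column_stack([np.ones(N * T), x.ravel(), xbar.ravel()])
yv = y.ravel()
b = np.zeros(3)
for _ in range(60):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (yv - p))

# cluster-robust sandwich: sum the score over t within each unit i
p = 1.0 / (1.0 + np.exp(-X @ b))
u = yv - p
S = (X * u[:, None]).reshape(N, T, 3).sum(axis=1)    # per-cluster scores
A = (X * (p * (1 - p))[:, None]).T @ X
V = np.linalg.inv(A) @ (S.T @ S) @ np.linalg.inv(A)
cluster_se = np.sqrt(np.diag(V))
```

Summing scores within a unit before forming the middle matrix is what makes the standard errors robust to arbitrary serial dependence, the analogue of cluster(id) in the glm call above.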
∙ Random effects approaches – that is, approaches that attempt to obtain
a joint distribution D(y_i1, . . . , y_iT|x_i) by modeling and then integrating
out unobserved heterogeneity – would require additional distributional
assumptions while being computationally demanding. A nice middle
ground is the GEE approach. We already have the working variance
assumption for fractional probit.
∙ We need to specify a “working” correlation matrix, too.
∙ Define the residuals u_it ≡ y_it − Φ(w_itθ) and the standardized
(Pearson) residuals
e_it ≡ u_it/{Φ(w_itθ)[1 − Φ(w_itθ)]}^{1/2}.
The working variance matrix is V(x_i, θ)^{1/2} R(ρ) V(x_i, θ)^{1/2},
where R(ρ) is the working correlation matrix and V(x_i, θ) is the T × T
diagonal matrix with Φ(w_itθ)[1 − Φ(w_itθ)] down its diagonal.
∙ Now apply multivariate WNLS, which is asymptotically the same as
GEE. Naturally, use a fully robust variance matrix estimator.
∙ Can allow an “unstructured” correlation matrix, too, but the
correlations never depend on x i .
xtgee y x1 ... xK x1bar ... xKbar, fam(bin)
link(probit) corr(exch) robust
xtgee y x1 ... xK x1bar ... xKbar, fam(bin)
link(probit) corr(uns) robust
∙ Can apply the “margeff” command in Stata to get APEs averaged
across the cross section and time. For continuous explanatory variables,
the common scale factor is
(NT)^{−1} Σ_{i=1}^N Σ_{t=1}^T φ(ψ̂_a + x_itβ̂_a + x̄_iξ̂_a).
Models with Endogenous Explanatory Variables
∙ Represent endogeneity as an omitted, time-varying variable, in
addition to unobserved heterogeneity:
E(y_it1|z_i, y_it2, c_i1, v_it1) = Φ(z_it1δ_1 + α_1y_it2 + c_i1 + v_it1).
∙ Use a Chamberlain-Mundlak approach, but only relating the
heterogeneity to all strictly exogenous variables:
c_i1 = ψ_1 + z̄_iξ_1 + a_i1, D(a_i1|z_i) = D(a_i1).
∙ Need to obtain an estimating equation. First, note that
E(y_it1|z_i, y_it2, a_i1, v_it1) = Φ(z_it1δ_1 + α_1y_it2 + ψ_1 + z̄_iξ_1 + a_i1 + v_it1).
∙ Rather than assume (a_i1, v_it2) is independent of z_i, we can get by
with a weaker assumption, but here we impose normality. Let
r_it1 ≡ a_i1 + v_it1.
∙ Write
r_it1 = θ_1v_it2 + e_it1,
where e_it1 is independent of (z_i, v_it2) and normally distributed.
∙ Two-step procedure:
(1) Estimate the reduced form for y_it2 (pooled or for each t
separately). Obtain the residuals, v̂_it2.
(2) Use the probit QMLE to estimate the scaled coefficients δ_e1, α_e1,
ψ_e1, ξ_e1, and θ_e1.
(GEE would require strict exogeneity of v_it2!)
∙ How do we interpret the scaled estimates? They give directions of
effects. Conveniently, they also index the APEs. For given z_t1 and y_t2,
average out z̄_i and v̂_it2 (for each t):
ASF(z_t1, y_t2) = N^{−1} Σ_{i=1}^N Φ(z_t1δ̂_e1 + α̂_e1y_t2 + ψ̂_e1 + z̄_iξ̂_e1 + θ̂_e1v̂_it2).
∙ Applying “margeff” in the second stage consistently estimates the
APEs averaged across t, but the standard errors do not account for the
two-step estimation. Use panel bootstrap for standard errors to allow
for serial dependence and the two-step estimation.
∙ Of course, we can also compute discrete changes for any of the
elements of (z_t1, y_t2).
EXAMPLE: Effects of Spending on Test Pass Rates
∙ Reform occurs between 1993/94 and 1994/95 school year; its passage
was a surprise to almost everyone.
∙ Since 1994/95, each district receives a foundation allowance, based
on revenues in 1993/94.
∙ Initially, all districts were brought up to a minimum allowance –
$4,200 in the first year. The goal was to eventually give each district a
basic allowance ($5,000 in the first year).
∙ Districts divided into three groups in 1994/95 for purposes of initial
foundation allowance. Subsequent grants determined by statewide
School Aid Fund.
∙ Catch-up formula for districts receiving below the basic. Initially,
more than half of the districts received less than the basic allowance.
By 1998/99, it was down to about 36%. In 1999/00, all districts began
receiving the basic allowance, which was then $5,700. Two-thirds of all
districts now receive the basic allowance.
∙ From 1991/92 to 2003/04, in the 10th percentile, expenditures rose
from $4,616 (2004 dollars) to $7,125, a 54 percent increase. In the 50th
percentile, it was a 48 percent increase. In the 90th percentile, per pupil
expenditures rose from $7,132 in 1992/93 to $9,529, a 34 percent
increase.
∙ Response variable: math4, the fraction of fourth graders passing the
MEAP math test at a school.
∙ Spending variable is logavgrexppp, where the average is over the
current and previous three years.
∙ The linear model is
math4_it = ψ_t + β_1 log(avgrexp_it) + β_2 lunch_it + β_3 log(enroll_it) + c_i1 + u_it1.
∙ Allowing spending to be endogenous. Controlling for 1993/94
spending, foundation grant should be exogenous. Exploit
nonsmoothness in the grant as a function of initial spending.
. use meap92_01
. xtreg math4 lavgrexp lunch lenroll y96-y01, fe cluster(distid)
F(9,500) = 171.93
corr(u_i, Xb) = -0.1787 Prob > F = 0.0000
-----------------------------------------------------------------------------
sigma_u | .1130256
sigma_e | .08314135
rho | .64888558 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. reg math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01,
cluster(distid)
. * Now use fractional probit.
. margeff
------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | .2968496 .0695326 4.27 0.000 .1605682 .433131
alavgrexp | -.1959097 .0750686 -2.61 0.009 -.3430414 -.0487781
lunch | -.0737791 .0698318 -1.06 0.291 -.2106469 .0630887
alunch | -.3358104 .0723725 -4.64 0.000 -.4776579 -.1939629
lenroll | .0299132 .0465622 0.64 0.521 -.061347 .1211734
alenroll | -.0301089 .0467477 -0.64 0.520 -.1217326 .0615149
y96 | -.0122924 .0061107 -2.01 0.044 -.0242692 -.0003156
y97 | -.0508008 .0097646 -5.20 0.000 -.069939 -.0316625
y98 | .0809879 .0100272 8.08 0.000 .0613349 .1006408
y99 | .0696954 .0111375 6.26 0.000 .0478662 .0915245
y00 | .0970224 .0115066 8.43 0.000 .0744698 .119575
y01 | .0729829 .0132849 5.49 0.000 .046945 .0990208
------------------------------------------------------------------------------
. xtgee math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01,
fa(bin) link(probit) corr(exch) robust
. margeff
------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | .2979576 .0692519 4.30 0.000 .1622263 .4336889
alavgrexp | -.1965515 .0752801 -2.61 0.009 -.3440978 -.0490052
lunch | -.0799305 .0704803 -1.13 0.257 -.2180693 .0582082
alunch | -.3285784 .0728656 -4.51 0.000 -.4713924 -.1857644
lenroll | .0294948 .0467283 0.63 0.528 -.0620909 .1210805
alenroll | -.0276313 .0469381 -0.59 0.556 -.1196283 .0643656
y96 | -.012373 .0061106 -2.02 0.043 -.0243497 -.0003964
y97 | -.0509306 .0097618 -5.22 0.000 -.0700633 -.0317979
y98 | .0808226 .010009 8.08 0.000 .0612054 .1004399
y99 | .0695541 .0111192 6.26 0.000 .0477609 .0913472
y00 | .0968972 .0115004 8.43 0.000 .0743568 .1194376
y01 | .0729416 .0132624 5.50 0.000 .0469478 .0989353
------------------------------------------------------------------------------
. * Now allow spending to be endogenous. Use foundation allowance, and
. * interactions, as IVs.
. * First, linear model:
le94y01 | .0865874 .0816732 1.06 0.290 -.0738776 .2470524
_cons | -.334823 .2593105 -1.29 0.197 -.8442955 .1746496
------------------------------------------------------------------------------
Instrumented: lavgrexp
Instruments: lunch alunch lenroll alenroll y96 y97 y98 y99 y00 y01
lexppp94 le94y96 le94y97 le94y98 le94y99 le94y00 le94y01
lfound lfndy96 lfndy97 lfndy98 lfndy99 lfndy00 lfndy01
------------------------------------------------------------------------------
. * Get reduced form residuals for fractional probit:
. glm math4 lavgrexp v2hat lunch alunch lenroll alenroll y96-y01 lexppp94
le94y96-le94y01, fa(bin) link(probit) cluster(distid)
note: math4 has non-integer values
. margeff
------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | .5830163 .2203345 2.65 0.008 .1511686 1.014864
v2hat | -.4641533 .242971 -1.91 0.056 -.9403678 .0120611
lunch | -.1003741 .0716361 -1.40 0.161 -.2407782 .04003
alunch | -.3754579 .0734083 -5.11 0.000 -.5193355 -.2315803
lenroll | .0962161 .0665257 1.45 0.148 -.0341719 .2266041
alenroll | -.0980059 .0669786 -1.46 0.143 -.2292817 .0332698
...
------------------------------------------------------------------------------
. * These standard errors do not account for the first-stage estimation. Should
. * use the panel bootstrap accounting for both stages.
. * Only marginal evidence that spending is endogenous, but the negative sign
. * fits the story that districts increase spending when performance is
. * (expected to be) worse, based on unobservables (to us).