Econometric Analysis of Cross Section and Panel Data, 2e: Models For Fractional Responses

This document discusses models for fractional response variables, which are variables bounded between 0 and 1. It presents several possible approaches, including linear models, log-odds transformations, fractional logit and probit models, and two-part models. Fractional logit and probit models directly model the conditional mean of the response as a function of covariates using a logit or probit link, providing a simple and robust approach for estimating fractional responses.
MODELS FOR FRACTIONAL RESPONSES

Econometric Analysis of Cross Section and Panel Data, 2e


MIT Press
Jeffrey M. Wooldridge

1. Introduction
2. Possible Approaches to Fractional Responses
3. Fractional Logit and Probit
4. Endogenous Explanatory Variables
5. Two-Part Models
6. Panel Data

1. INTRODUCTION
∙ Suppose y is a fractional response, that is, 0 ≤ y ≤ 1.
∙ Allow the possibility that y is a corner solution at zero, one, or both. It
could also be an essentially continuous variable strictly between zero
and one in the population.
∙ y can be a proportion computed from the fraction of events occurring
in a given number of trials. [For example, it could be the fraction of
workers participating in a 401(k) pension plan.] But it could also be
fundamentally continuous, such as the proportion of county land zoned
for agriculture.

∙ For now, no problem of missing data, so we avoid the phrase
“censored at zero” or “censored at one.”
∙ If a variable is initially measured as a percentage, divide it by 100 to
turn it into a proportion.
∙ Makes sense to start with linear models; at a minimum, estimated
partial effects can be compared with those from more complicated
nonlinear models.
∙ Remember a general rule: issues such as endogenous explanatory
variables and unobserved heterogeneity are more easily handled with
linear models. To allow nonlinear functional forms, we will impose
extra assumptions.

2. POSSIBLE APPROACHES TO FRACTIONAL RESPONSES
∙ In the case where y has corners at zero and one, a two-limit Tobit
model is logically consistent. But it uses a full set of distributional
assumptions [which, of course, has the benefit of allowing us to
estimate any feature of $D(y|x)$]. Plus, it is logically inconsistent if we
have only one corner.
∙ Can use other logically consistent distributions. If $y_i$ is continuous
on $(0,1)$, a conditional Beta distribution makes sense.
∙ The Beta distribution is not in the linear exponential family (LEF), so,
like the Tobit approach, MLE using the Beta distribution is generally
inconsistent for the parameters in a correctly specified conditional mean
when other features of the distribution are misspecified.

∙ We focus mainly on models for estimating the conditional mean.
Later, discuss two-part models.
∙ Linear model has essentially same drawbacks as for binary response:

$$E(y|x) = x\beta = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K$$

can hold over all potential values of x only in rare circumstances (such
as mutually exclusive and exhaustive dummy variables).

∙ As with other limited dependent variables, we should view the linear
model as the best linear approximation to $E(y|x)$ (which we can
potentially improve by using quadratics, interactions, and other
functional forms such as logarithms).
∙ As always, the OLS estimators are consistent for the linear projection
parameters, which approximate (we hope) average partial effects.

∙ A common approach when $0 < y < 1$ is to use the so-called log-odds
transformation of y, $\log[y/(1-y)]$, in a linear regression. Define
$w = \log[y/(1-y)]$ and assume

$$E(w|x) = x\beta.$$

∙ The log-odds approach is simple and, because w can range over all
real values, the linear conditional mean is attractive.

∙ Drawbacks to the log-odds approach: First, it cannot be applied to
corner solution responses unless we make some arbitrary adjustments.
Because $\log[y/(1-y)] \to -\infty$ as $y \to 0$ and $\log[y/(1-y)] \to \infty$ as
$y \to 1$, our estimates might be sensitive to the adjustments at the
endpoints.
∙ Second, even if y is strictly in the unit interval, $\beta$ is difficult to
interpret: without further assumptions, it is not possible to estimate
$E(y|x)$ from a model for $E\{\log[y/(1-y)]|x\}$.

∙ One possibility is to assume the log-odds transformation yields a
linear model with an additive error independent of x:

$$\log[y/(1-y)] = x\beta + e, \quad D(e|x) = D(e),$$

where we take $E(e) = 0$ (and assume that $x_1 = 1$). Then, we can write

$$y = \exp(x\beta + e)/[1 + \exp(x\beta + e)].$$

∙ If e and x are independent,

$$E(y|x) = \int \exp(x\beta + e)/[1 + \exp(x\beta + e)]\,dF(e),$$

where F is the distribution function of e.
∙ Duan’s (JASA, 1983) “smearing estimate” can be used without
specifying $D(e)$:

$$\hat{E}(y|x) = N^{-1} \sum_{i=1}^{N} \exp(x\hat\beta + \hat e_i)/[1 + \exp(x\hat\beta + \hat e_i)],$$

where $\hat\beta$ is the OLS estimator from $w_i$ on $x_i$ and $\hat e_i = w_i - x_i\hat\beta$ are the
OLS residuals.
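
The smearing estimate can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: a single regressor plus an intercept, made-up data, and hypothetical helper names (`ols_simple`, `smearing_mean`) that do not come from the slides.

```python
import math

def ols_simple(x, w):
    """OLS of w on an intercept and one regressor x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    wbar = sum(w) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxw = sum((xi - xbar) * (wi - wbar) for xi, wi in zip(x, w))
    b1 = sxw / sxx
    return wbar - b1 * xbar, b1

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def smearing_mean(x0, x, y):
    """Smearing estimate of E(y | x = x0) from a log-odds regression."""
    w = [math.log(yi / (1.0 - yi)) for yi in y]   # log-odds; requires 0 < y < 1
    b0, b1 = ols_simple(x, w)
    resid = [wi - (b0 + b1 * xi) for xi, wi in zip(x, w)]
    # Average the logistic function over the empirical distribution of residuals.
    return sum(logistic(b0 + b1 * x0 + e) for e in resid) / len(resid)

# Made-up illustrative data (not the 401(k) data).
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0.30, 0.55, 0.45, 0.70, 0.85, 0.80]
est = smearing_mean(1.0, x, y)
print(est)
```

Note that simply plugging the fitted index into the logistic function would ignore the error term; averaging over the residuals is what distinguishes the smearing estimate.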

∙ Estimated partial effects are obtained by taking derivatives with
respect to the $x_j$, or discrete differences.
∙ A similar analysis applies if we replace the log-odds transformation
with $\Phi^{-1}(y)$, where $\Phi^{-1}(\cdot)$ is the inverse function of the standard
normal cdf, in which case we average $\Phi(x\hat\beta + \hat e_i)$ across i to estimate
$E(y|x)$.
∙ Can use the delta method for standard errors, or the bootstrap.
∙ Question: If we are mainly interested in $E(y|x)$, why not just model it
directly?

3. FRACTIONAL LOGIT AND PROBIT
∙ Let y be a response in $[0,1]$, possibly including the endpoints. We can
model its mean as

$$E(y|x) = \exp(x\beta)/[1 + \exp(x\beta)],$$

or as a probit function,

$$E(y|x) = \Phi(x\beta).$$

∙ In each case the fitted values will be in $(0,1)$ and each allows y to
take on any values in $[0,1]$, including the endpoints zero and one.

∙ Partial effects are obtained just as in standard logit and probit, but
these are on the mean and not the response probability.
∙ The above functional forms do not, of course, exhaust the
possibilities. For example,

$$E(y|x) = \exp[-\exp(-x\beta)]$$

allows a different shape. (The function $G(z) = \exp[-\exp(-z)]$ is the
cumulative distribution function of an asymmetric random variable.)

∙ Generally, let the mean function be $G(x\beta)$. We could estimate $\beta$ by
nonlinear least squares. NLS is consistent and inference is
straightforward, provided we use the fully robust sandwich variance
matrix estimator that does not restrict $\mathrm{Var}(y|x)$.
∙ As in estimating models of conditional means for unbounded,
nonnegative responses, NLS is unlikely to be efficient for fractional
responses because common distributions for a fractional response imply
heteroskedasticity.
∙ Could use a two-step weighted NLS if we model $\mathrm{Var}(y|x)$.

∙ A simpler, one-step strategy is to use a QMLE approach. We know
the Bernoulli log likelihood is in the linear exponential family.
Therefore, the QMLE that solves

$$\max_b \sum_{i=1}^{N} \{(1 - y_i)\log[1 - G(x_i b)] + y_i \log[G(x_i b)]\}$$

is consistent for $\beta$ whenever the conditional mean is correctly specified.
∙ Notice that the quasi-LLF is well defined for any $y_i$ in $[0,1]$ and
functions $0 < G(\cdot) < 1$. Plus, it is a standard estimation problem
because it is identical to estimating binary response models.
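
To make the point concrete, the Bernoulli quasi-log-likelihood can be maximized with the same Newton-Raphson iterations used for an ordinary logit, even when y is fractional or sits at a corner. The sketch below is a minimal illustration, assuming a logit mean with an intercept and one regressor and using made-up data; the function name `fractional_logit` is hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fractional_logit(x, y, iters=50, tol=1e-12):
    """Bernoulli QMLE with a logit mean: intercept plus one regressor.

    Newton-Raphson on sum_i (1 - y_i) log[1 - G_i] + y_i log G_i,
    where G_i = logistic(b0 + b1 * x_i) and y_i may be anywhere in [0, 1].
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            g = logistic(b0 + b1 * xi)
            u = yi - g                 # residual; the logit score is u * (1, x_i)
            v = g * (1.0 - g)          # weight in the negative Hessian
            s0 += u
            s1 += u * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * s0 - h01 * s1) / det   # Newton step: inverse(H) times score
        d1 = (h00 * s1 - h01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up data with corner values at zero and one.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0.0, 0.55, 0.45, 0.70, 1.0, 0.80]
b0, b1 = fractional_logit(x, y)
fitted = [logistic(b0 + b1 * xi) for xi in x]
print(b0, b1)
```

The fitted means stay strictly inside the unit interval even though two of the observed responses are at the endpoints.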

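∙ Call the QMLE fractional logit regression or fractional probit
regression.
∙ These are just as robust as the NLS estimators.
∙ Fully robust inference is straightforward for QMLE. When the mean
is correctly specified, estimate the asymptotic variance of $\hat\beta$ as

$$\left(\sum_{i=1}^{N} \frac{\hat g_i^2\, x_i' x_i}{\hat G_i(1 - \hat G_i)}\right)^{-1}
\left(\sum_{i=1}^{N} \frac{\hat u_i^2\, \hat g_i^2\, x_i' x_i}{[\hat G_i(1 - \hat G_i)]^2}\right)
\left(\sum_{i=1}^{N} \frac{\hat g_i^2\, x_i' x_i}{\hat G_i(1 - \hat G_i)}\right)^{-1}$$

where

$$\hat u_i = y_i - G(x_i\hat\beta),$$

$\hat G_i = G(x_i\hat\beta)$, and $\hat g_i = g(x_i\hat\beta)$, with $g(\cdot)$ the derivative of $G(\cdot)$.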

∙ If we allow the mean to be misspecified, we replace the outer part of
the sandwich with the estimated Hessian, not expected Hessian
conditional on $x_i$.
∙ The Bernoulli GLM variance assumption is

$$\mathrm{Var}(y|x) = \sigma^2 E(y|x)[1 - E(y|x)].$$

∙ When this assumption holds it is often with $\sigma^2 < 1$; in this case
inference based on the usual binary response statistics will be too
conservative – often, much too conservative.

∙ As we discussed in the general case, the Bernoulli QMLE has an
attractive efficiency property. If it turns out that the GLM variance
assumption holds, then the QMLE is efficient in the class of all
estimators that use only $E(y|x) = G(x\beta)$ for consistency. In particular,
the QMLE is more efficient than NLS.
∙ Of course, if, say, a Beta distribution were correct, and we use the
correct MLE, this would be more efficient than the QMLE. But the
MLE uses more assumptions for consistency.

∙ If the GLM variance assumption holds, the asymptotic variance
matrix estimator simplifies to

$$\hat\sigma^2 \left(\sum_{i=1}^{N} \frac{g(x_i\hat\beta)^2\, x_i' x_i}{G(x_i\hat\beta)[1 - G(x_i\hat\beta)]}\right)^{-1}$$

with

$$\hat\sigma^2 = (N - K)^{-1} \sum_{i=1}^{N} \hat u_i^2/\hat v_i, \quad
\hat v_i = G(x_i\hat\beta)[1 - G(x_i\hat\beta)].$$

∙ One case where Bernoulli GLM assumption holds. Suppose
$y_i = s_i/n_i$, where $s_i$ is the number of “successes” in $n_i$ Bernoulli draws.
Suppose that $s_i$ given $(n_i, x_i)$ follows a $\mathrm{Binomial}[n_i, G(x_i\beta)]$
distribution.
∙ Then $E(y_i|n_i, x_i) = G(x_i\beta)$ and
$\mathrm{Var}(y_i|n_i, x_i) = n_i^{-1} G(x_i\beta)[1 - G(x_i\beta)]$. If $n_i$ is independent of $x_i$,

$$\begin{aligned}
\mathrm{Var}(y_i|x_i) &= \mathrm{Var}[E(y_i|n_i, x_i)|x_i] + E[\mathrm{Var}(y_i|n_i, x_i)|x_i] \\
&= 0 + E(n_i^{-1}|x_i)\, G(x_i\beta)[1 - G(x_i\beta)] \\
&\equiv \sigma^2 G(x_i\beta)[1 - G(x_i\beta)],
\end{aligned}$$

where $\sigma^2 \equiv E(n_i^{-1}) \le 1$ (with strict inequality unless $n_i = 1$ with
probability one).
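
The variance decomposition above can be checked numerically by exact enumeration. The sketch below assumes a hypothetical three-point distribution for the number of trials (independent of x) and an illustrative value of the mean; none of these numbers come from the slides.

```python
from math import comb

def var_fraction_given_n(n, p):
    """Exact Var(s/n) for s ~ Binomial(n, p), by enumerating the pmf."""
    pmf = [comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(n + 1)]
    mean = sum((s / n) * q for s, q in enumerate(pmf))
    second = sum((s / n) ** 2 * q for s, q in enumerate(pmf))
    return second - mean**2

p = 0.3                              # illustrative value of G(x beta) at a fixed x
n_dist = {1: 0.2, 4: 0.5, 10: 0.3}   # hypothetical distribution of n, independent of x

# Law of total variance: the first term is zero because E(y | n, x) = p for every n.
var_y = sum(prob * var_fraction_given_n(n, p) for n, prob in n_dist.items())
sigma2 = sum(prob / n for n, prob in n_dist.items())   # E(1/n)
print(var_y, sigma2 * p * (1 - p))
```

Here $\sigma^2 = E(1/n) = 0.355$, well below one, which is why the usual binary-response standard errors would be too large in this setting.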

∙ In practice, it is unlikely that $n_i$ and $x_i$ are independent, so fully
robust inference should be used.
∙ Further, within-group correlation – that is, if we write $s_i = \sum_{r=1}^{n_i} w_{ir}$
for binary responses $w_{ir}$, the $\{w_{ir}: r = 1, \dots, n_i\}$ are correlated
conditional on $(n_i, x_i)$ – generally invalidates the GLM variance
assumption, as in the Binomial case.

∙ If we are given data on proportions but do not know $n_i$, it makes
sense to use a fractional logit or probit analysis. If we observe the $n_i$,
we might use binomial regression instead (which is fully robust
provided $E(s_i|n_i, x_i) = n_i G(x_i\beta)$).
∙ If we maintain $E(s_i|n_i, x_i) = n_i G(x_i\beta)$ and $y_i = s_i/n_i$,

$$E(y_i|n_i, x_i) = E(s_i|n_i, x_i)/n_i = G(x_i\beta) = E(y_i|x_i).$$

This means that binomial regression using the counts $s_i$ and fractional
regression using $y_i$ should yield similar estimates of $\beta$.

∙ If the binomial distributional assumption is true, MLE using $(s_i, n_i)$ is
asymptotically more efficient than fractional regression. But the
variance in binomial regression often has overdispersion. The fractional
regression can actually be more efficient. (And, it is often more
resilient to outliers.)

∙ Can compare the APEs to OLS estimates of a linear model. For a
continuous variable $x_j$,

$$\widehat{APE}_j = N^{-1} \sum_{i=1}^{N} g(x_i\hat\beta)\,\hat\beta_j$$

∙ If $x_j$ is binary,

$$\widehat{APE}_j = N^{-1} \sum_{i=1}^{N} [G(x_i^{(1)}\hat\beta) - G(x_i^{(0)}\hat\beta)]$$

where $x_i^{(1)}$ has $x_{ij} = 1$ and $x_i^{(0)}$ has $x_{ij} = 0$.

∙ Whether (say) $x_K$ is discrete or continuous, we can obtain an estimate
of $APE_K$ when $x_K$ changes from, say, $a_K^{(0)}$ to $a_K^{(1)}$, without using a
calculus approximation, as in the previous equation but where

$$x_i^{(0)}\hat\beta = \hat\beta_1 + \hat\beta_2 x_{i2} + \dots + \hat\beta_{K-1} x_{i,K-1} + \hat\beta_K a_K^{(0)}$$

$$x_i^{(1)}\hat\beta = \hat\beta_1 + \hat\beta_2 x_{i2} + \dots + \hat\beta_{K-1} x_{i,K-1} + \hat\beta_K a_K^{(1)}$$

∙ If, say, $x_K$ is the key variable, and it is continuous, might plot the
response as a function of $x_K$, inserting mean values (say) of the other
variables or averaging them out:

$$G(\hat\beta_1 + \hat\beta_2 \bar x_2 + \dots + \hat\beta_{K-1} \bar x_{K-1} + \hat\beta_K x_K)$$

or

$$\widehat{ASF}_K(x_K) = N^{-1} \sum_{i=1}^{N} G(\hat\beta_1 + \hat\beta_2 x_{i2} + \dots + \hat\beta_{K-1} x_{i,K-1} + \hat\beta_K x_K)$$

∙ Can compare these response functions with linear model.


∙ Can put the usual functional forms in the index; makes partial effects
more difficult to compute.

∙ Simple functional form test: After estimation of $\hat\beta$, add powers of $x_i\hat\beta$,
such as $(x_i\hat\beta)^2$, $(x_i\hat\beta)^3$, use fractional QMLE on the expanded “model”

$$G[x_i\beta + \gamma_1 (x_i\hat\beta)^2 + \gamma_2 (x_i\hat\beta)^3],$$

and use a robust Wald test of joint significance for $\gamma_1, \gamma_2$. (This test, an
extension of RESET for linear models, can be applied to any index
context, including count regression with an exponential mean.)
∙ This is an example of a variable addition test, which is essentially a
score test but slightly easier to implement.

∙ For goodness-of-fit of the mean, can compute an R-squared as

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat y_i)^2}{\sum_{i=1}^{N} (y_i - \bar y)^2}$$

where $\hat y_i = G(x_i\hat\beta)$.
∙ Another possibility is the squared correlation between y i and ŷ i .
∙ Unlike OLS estimation of a linear model, these are not algebraically
the same.
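
A short sketch of the two goodness-of-fit measures, using made-up responses and hypothetical fitted means, shows that they generally differ for a nonlinear mean. The squared correlation can never be smaller than the R-squared above, because it equals the R-squared from an OLS regression of the response on the fitted values and a constant.

```python
def r_squared(y, yhat):
    """1 - SSR/SST using the fitted means directly."""
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ssr / sst

def squared_correlation(y, yhat):
    """Squared sample correlation between y and the fitted means."""
    n = len(y)
    ybar = sum(y) / n
    fbar = sum(yhat) / n
    cov = sum((yi - ybar) * (fi - fbar) for yi, fi in zip(y, yhat))
    vy = sum((yi - ybar) ** 2 for yi in y)
    vf = sum((fi - fbar) ** 2 for fi in yhat)
    return cov * cov / (vy * vf)

# Made-up responses and hypothetical fitted means from some fractional model.
y = [0.10, 0.40, 0.35, 0.80, 1.00, 0.90]
yhat = [0.20, 0.35, 0.50, 0.70, 0.85, 0.95]

r2 = r_squared(y, yhat)
rho2 = squared_correlation(y, yhat)
print(r2, rho2)
```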

∙ GLM in Stata:
glm y x1 ... xK, fam(bin) link(logit) robust
glm y x1 ... xK, fam(bin) link(probit) sca(x2)
glm y x1 ... xK, fam(bin) link(loglog) robust
∙ Best to make inference fully robust, but the GLM variance
assumption often gives similar standard errors.
∙ The usual MLE standard errors are too conservative, often very
conservative.
∙ The “loglog” link implements the model $E(y|x) = \exp[-\exp(-x\beta)]$.

∙ After any of the commands, fitted values are easy to get:
predict yhat
∙ To get the estimated indices, $x_i\hat\beta$, and powers of them:
predict xbhat, xb
gen xbhatsq = xbhat^2
gen xbhatcu = xbhat^3

EXAMPLE: Participation rates in 401(k) pension plans.
. use 401k

. des

Contains data from \swbook1_4e\statafiles\401k.dta
  obs:         1,534
 vars:             8                          9 Jun 1998 08:20
 size:        46,020 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
prate           float  %7.0g                  participation rate, percent
mrate           float  %7.0g                  401k plan match rate
totpart         float  %7.0g                  total 401k participants
totelg          float  %7.0g                  total eligible for 401k plan
age             byte   %7.0g                  age of 401k plan
totemp          float  %7.0g                  total number of firm employees
sole            byte   %7.0g                  = 1 if 401k is firm’s sole plan
ltotemp         float  %9.0g                  log of totemp
-------------------------------------------------------------------------------
Sorted by:

. sum

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       prate |      1534    87.36291    16.71654          3        100
       mrate |      1534    .7315124    .7795393        .01       4.91
     totpart |      1534    1354.231    4629.265         50      58811
      totelg |      1534    1628.535    5370.719         51      70429
         age |      1534    13.18123    9.171114          4         51
-------------+--------------------------------------------------------
      totemp |      1534    3568.495    11217.94         58     144387
        sole |      1534    .4876141    .5000096          0          1
     ltotemp |      1534    6.686034    1.453375   4.060443   11.88025

. count if mrate > 1
292

. count if mrate > 2
101

. replace prate = prate/100
(1534 real changes made)

. reg prate mrate age ltotemp sole, robust

Linear regression                                      Number of obs =    1534
                                                       F(  4,  1529) =   73.36
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1474
                                                       Root MSE      =  .15456

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   .0485354   .0043289    11.21   0.000     .0400443    .0570266
         age |   .0031704   .0004032     7.86   0.000     .0023795    .0039613
     ltotemp |  -.0240487   .0031777    -7.57   0.000    -.0302818   -.0178156
        sole |   .0217378   .0086932     2.50   0.013      .004686    .0387896
       _cons |   .9465254   .0218303    43.36   0.000     .9037049    .9893458
------------------------------------------------------------------------------

. * The nonrobust standard errors are similar, actually slightly larger.

. glm prate mrate age ltotemp sole, fam(bin) link(logit) robust
note: prate has non-integer values

Generalized linear models                          No. of obs      =      1534
Optimization     : ML                              Residual df     =      1529
                                                   Scale parameter =         1
Deviance         =  314.528326                     (1/df) Deviance =  .2057085
Pearson          =  367.9839977                    (1/df) Pearson  =  .2406697

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .5589556
Log pseudolikelihood = -423.7189416                BIC             = -10901.66

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   .9167158    .134119     6.84   0.000     .6538474    1.179584
         age |   .0322364   .0049561     6.50   0.000     .0225226    .0419502
     ltotemp |  -.2080024   .0258256    -8.05   0.000    -.2586195   -.1573852
        sole |   .1676861   .0846774     1.98   0.048     .0017215    .3336507
       _cons |   2.370495   .1921688    12.34   0.000     1.993851    2.747139
------------------------------------------------------------------------------

. margeff

Average partial effects after glm
      y = Pr(prate)

------------------------------------------------------------------------------
    variable |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   .0969144   .0140539     6.90   0.000     .0693694    .1244595
         age |   .0034081   .0005304     6.43   0.000     .0023686    .0044477
     ltotemp |  -.0219898   .0027723    -7.93   0.000    -.0274233   -.0165562
        sole |   .0176176   .0083374     2.11   0.035     .0012766    .0339586
------------------------------------------------------------------------------

. * The APE for mrate is about double the linear model estimate.

. predict prateh_l
(option mu assumed; predicted mean prate)

. corr prate prateh_l
(obs=1534)

             |    prate prateh_l
-------------+------------------
       prate |   1.0000
    prateh_l |   0.4263   1.0000

. di .4263^2
.18173169

. * The nonrobust standard errors are too large:

. glm prate mrate age ltotemp sole, fam(bin) link(logit)
note: prate has non-integer values

Generalized linear models                          No. of obs      =      1534
Optimization     : ML                              Residual df     =      1529
                                                   Scale parameter =         1
Deviance         =  314.528326                     (1/df) Deviance =  .2057085
Pearson          =  367.9839977                    (1/df) Pearson  =  .2406697

                                                   AIC             =  .5589556
Log likelihood   = -423.7189416                    BIC             = -10901.66

------------------------------------------------------------------------------
             |                 OIM
       prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   .9167158   .2059862     4.45   0.000     .5129902    1.320441
         age |   .0322364    .010257     3.14   0.002      .012133    .0523398
     ltotemp |  -.2080024   .0551219    -3.77   0.000    -.3160393   -.0999654
        sole |   .1676861   .1716409     0.98   0.329    -.1687239    .5040961
       _cons |   2.370495   .4263752     5.56   0.000     1.534815    3.206175
------------------------------------------------------------------------------
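
As a rough check on how much the nonrobust standard errors overstate the sampling variation, the OIM standard error can be rescaled by the square root of the Pearson dispersion in the header, which is the estimate of $\sigma^2$ under the GLM variance assumption. The arithmetic below uses only numbers from the two glm outputs; that it reproduces what the sca(x2) option would report is an assumption, not something verified here.

```latex
\hat\sigma = \sqrt{0.2406697} \approx 0.4906, \qquad
\mathrm{se}_{GLM}(\hat\beta_{mrate}) \approx 0.4906 \times 0.2059862 \approx 0.101
```

This is much closer to the fully robust standard error for mrate, .134, than the unscaled value of .206.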

. gen mratesq = mrate^2
. gen agesq = age^2
. gen ltotempsq = ltotemp^2

. reg prate mrate mratesq age agesq ltotemp ltotempsq sole, robust

Linear regression                                      Number of obs =    1534
                                                       F(  7,  1526) =   56.14
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1883
                                                       Root MSE      =  .15095

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |    .137551   .0124891    11.01   0.000     .1130534    .1620487
     mratesq |  -.0255695   .0029956    -8.54   0.000    -.0314454   -.0196936
         age |   .0076809   .0015391     4.99   0.000     .0046619    .0106999
       agesq |   -.000129   .0000371    -3.48   0.001    -.0002017   -.0000563
     ltotemp |   -.113806   .0218575    -5.21   0.000    -.1566799    -.070932
   ltotempsq |   .0061188   .0014904     4.11   0.000     .0031953    .0090423
        sole |   .0119101   .0087466     1.36   0.173    -.0052465    .0290667
       _cons |     1.2029   .0788964    15.25   0.000     1.048143    1.357657
------------------------------------------------------------------------------

. * Now compute the APE for mrate:

. gen mrate_me_lin = _b[mrate] + 2*_b[mratesq]*mrate

. sum mrate_me_lin

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
mrate_me_lin |      1534    .1001422    .0398649  -.1135418   .1370397

. * Obtain RESET using the square and cube:

. predict xbh_sq_lin
(option xb assumed; fitted values)

. gen xbh_sq_linsq = xbh_sq_lin^2

. gen xbh_sq_lincu = xbh_sq_lin^3

. reg prate mrate mratesq age agesq ltotemp ltotempsq sole xbh_sq_linsq xbh_sq_lincu, robust

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   5.612875    2.25341     2.49   0.013     1.192761    10.03299
     mratesq |  -1.042382   .4188927    -2.49   0.013    -1.864049   -.2207152
         age |   .3121919    .125702     2.48   0.013     .0656248    .5587591
       agesq |  -.0052452   .0021126    -2.48   0.013    -.0093891   -.0011012
     ltotemp |  -4.634524     1.8645    -2.49   0.013    -8.291782   -.9772653
   ltotempsq |   .2492836   .1002737     2.49   0.013     .0525946    .4459725
        sole |   .4824868   .1956819     2.47   0.014     .0986524    .8663211
xbh_sq_linsq |  -40.86979   17.93412    -2.28   0.023    -76.04795   -5.691631
xbh_sq_lincu |   13.81167   6.515698     2.12   0.034     1.030992    26.59236
       _cons |   36.28292   14.74269     2.46   0.014     7.364812    65.20104
------------------------------------------------------------------------------

. test xbh_sq_linsq xbh_sq_lincu

 ( 1)  xbh_sq_linsq = 0
 ( 2)  xbh_sq_lincu = 0

       F(  2,  1524) =   21.68
            Prob > F =    0.0000

. * A strong statistical rejection of the linear model even with quadratics.

. glm prate mrate mratesq age agesq ltotemp ltotempsq sole, fam(bin) link(logit) robust
note: prate has non-integer values

Generalized linear models                          No. of obs      =      1534
Optimization     : ML                              Residual df     =      1526
                                                   Scale parameter =         1
Deviance         =  301.5717563                    (1/df) Deviance =  .1976224
Pearson          =  318.2190643                    (1/df) Pearson  =  .2085315

                                                   AIC             =  .5544207
Log pseudolikelihood = -417.2406567                BIC             = -10892.61

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   1.614381   .1675732     9.63   0.000     1.285943    1.942818
     mratesq |  -.2753789   .0435835    -6.32   0.000     -.360801   -.1899567
         age |   .0764414   .0158978     4.81   0.000     .0452823    .1076006
       agesq |  -.0012815    .000386    -3.32   0.001     -.002038    -.000525
     ltotemp |  -1.199122   .2209129    -5.43   0.000    -1.632103   -.7661407
   ltotempsq |   .0650906   .0145924     4.46   0.000       .03649    .0936912
        sole |   .1015973   .0837603     1.21   0.225    -.0625698    .2657644
       _cons |   5.535748    .833326     6.64   0.000     3.902459    7.169037
------------------------------------------------------------------------------

. predict prateh_l2

. corr prate prateh_l2
(obs=1534)

             |    prate prateh~2
-------------+------------------
       prate |   1.0000
   prateh_l2 |   0.4602   1.0000

. di .4602^2
.21178404

. * Fits better than linear model with quadratics (R-squared = .188).

. di 1.614/(2*.275)
2.9345455

. count if mrate > 2.93
52

. * So only 52 out of 1,534 observations are to the right of the turning
. * point.
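
The turning point computed with the display command comes from setting the derivative of the quadratic in mrate inside the index to zero. A short worked version, using the rounded coefficients from the output:

```latex
\frac{\partial}{\partial\, mrate}\left(\hat\beta_{mrate}\, mrate + \hat\beta_{mratesq}\, mrate^2\right) = 0
\;\Longrightarrow\;
mrate^{*} = -\frac{\hat\beta_{mrate}}{2\hat\beta_{mratesq}} = \frac{1.614}{2(0.275)} \approx 2.93
```

Because the logistic function is strictly increasing, the turning point of the index is also the turning point of the estimated mean.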

. * Using margeff with the quadratics doesn’t make much sense.
. * Compute APE "by hand."

. predict xbh_l2, xb

. gen scale = exp(xbh_l2)/(1 + exp(xbh_l2))^2

. sum scale

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       scale |      1534     .104565    .0540043   .0071288   .2364579

. gen mrate_me = (_b[mrate] + 2*_b[mratesq]*mrate)*scale

. sum mrate_me

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    mrate_me |      1534    .1414986    .0902254  -.0718435   .3778262

. * About 40% higher than the linear model estimated APE, .100.

. predict xbh_sq_log, xb

. gen xbh_sq_logsq = xbh_sq_log^2

. gen xbh_sq_logcu = xbh_sq_log^3

. glm prate mrate mratesq age agesq ltotemp ltotempsq sole xbh_sq_logsq xbh_sq_logcu, fam(bin) link(logit) robust
note: prate has non-integer values

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   4.065386   1.356546     3.00   0.003     1.406605    6.724167
     mratesq |   -.697588   .2330727    -2.99   0.003    -1.154402   -.2407738
         age |   .1928297   .0648958     2.97   0.003     .0656361    .3200232
       agesq |  -.0032323   .0011257    -2.87   0.004    -.0054387   -.0010259
     ltotemp |  -3.050577   1.057474    -2.88   0.004    -5.123189   -.9779661
   ltotempsq |   .1659176   .0585484     2.83   0.005     .0511647    .2806704
        sole |   .2623277   .1261936     2.08   0.038     .0149928    .5096626
xbh_sq_logsq |  -.8103757   .4117121    -1.97   0.049    -1.617317   -.0034348
xbh_sq_logcu |    .129453   .0625514     2.07   0.038     .0068545    .2520515
       _cons |   13.20982   4.311299     3.06   0.002     4.759834    21.65981
------------------------------------------------------------------------------

. test xbh_sq_logsq xbh_sq_logcu

 ( 1)  [prate]xbh_sq_logsq = 0
 ( 2)  [prate]xbh_sq_logcu = 0

           chi2(  2) =    4.51
         Prob > chi2 =    0.1048

. * So we do not reject the fractional logit at the 10% significance level.

. * Can plot the mean function as a function of mrate, with other
. * variables fixed at specific values.

4. ENDOGENOUS EXPLANATORY VARIABLES
∙ The fractional probit model can easily handle certain kinds of
continuous endogenous explanatory variables.
∙ As before, model endogeneity as an omitted variable:

$$E(y_1|z, y_2, c_1) = E(y_1|z_1, y_2, c_1) = \Phi(z_1\delta_1 + \alpha_1 y_2 + c_1)$$

$$y_2 = z\pi_2 + v_2 = z_1\pi_{21} + z_2\pi_{22} + v_2,$$

where $c_1$ is an omitted factor thought to be correlated with $y_2$ but
independent of the exogenous variables z.

∙ Ideally, could assume the linear equation for $y_2$ represents a linear
projection. But we need to assume more.
∙ Sufficient is

$$c_1 = \eta_1 v_2 + e_1, \quad e_1|z, v_2 \sim \mathrm{Normal}(0, \sigma_{e_1}^2),$$

where a sufficient, though not necessary, condition is that $(c_1, v_2)$ is
bivariate normal and independent of z.

∙ Then

$$E(y_1|z, y_2) = E(y_1|z, y_2, v_2) = \Phi(z_1\delta_{e1} + \alpha_{e1} y_2 + \eta_{e1} v_2),$$

where the “e” subscript denotes multiplication by the scale factor
$1/(1 + \sigma_{e_1}^2)^{1/2}$.
∙ Fortunately, the scaled coefficients index the average partial effects.
∙ Two-step method: (1) Obtain the OLS residuals $\hat v_{i2}$ from the
regression $y_{i2}$ on $z_i$. (2) Use fractional probit of $y_{i1}$ on $z_{i1}, y_{i2}, \hat v_{i2}$ to
estimate the scaled coefficients.
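
The scale factor comes from integrating the normal error $e_1$ out of the probit mean. A sketch of the step follows, where $a$ abbreviates the unscaled index (written here as $z_1\delta_1 + \alpha_1 y_2 + \eta_1 v_2$, with the coefficient labels assumed for this illustration) and $r$ is an auxiliary standard normal variable, independent of $e_1$, introduced only for the derivation:

```latex
\begin{aligned}
E[\Phi(a + e_1) \mid z, y_2, v_2]
  &= P(r \le a + e_1 \mid z, y_2, v_2) \\
  &= P(r - e_1 \le a \mid z, y_2, v_2),
     \qquad r - e_1 \sim \mathrm{Normal}(0,\, 1 + \sigma_{e_1}^2) \\
  &= \Phi\!\left( a / (1 + \sigma_{e_1}^2)^{1/2} \right)
\end{aligned}
```

Every coefficient in the index is therefore divided by the same factor, which is why only the scaled coefficients are identified from the second-step fractional probit.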

∙ Simple test of the null hypothesis that $y_2$ is exogenous is the fully
robust t statistic on $\hat v_{i2}$; the first-step estimation can be ignored under
the null.
∙ If $\eta_1 \neq 0$, then the robust sandwich variance matrix estimator of the
scaled coefficients is not valid because it does not account for the first
step estimation. Can adjust using the results for two-step M-estimation or use
the bootstrap.

∙ The average structural function is consistently estimated as

$$\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(z_1\hat\delta_{e1} + \hat\alpha_{e1} y_2 + \hat\eta_{e1} \hat v_{i2}),$$

and this can be used to obtain APEs with respect to $y_2$ or $z_1$.
∙ Bootstrapping the standard errors and test statistics is a sensible way
to proceed with inference.

∙ Basic model can be extended in many ways. For example, can replace
$y_2 = z\pi_2 + v_2$ with

$$h_2(y_2) = z\pi_2 + v_2$$

where $h_2(\cdot)$ is strictly monotonic. (This is for the case where we want $y_2$
in the structural model yet it is unlikely to have a linear reduced form
with additive, independent error.)
∙ If $y_2 > 0$ then $h_2(y_2) = \log(y_2)$ is natural; if $0 < y_2 < 1$, might use
the log-odds transformation, $h_2(y_2) = \log[y_2/(1 - y_2)]$.

∙ Unfortunately, if $y_2$ has a mass point – such as a binary response, or
corner response, or count variable – a transformation yielding an
additive, independent error probably does not exist.
∙ Allowing flexible functional forms for $y_2$ is easy. For example, if the
structural model contains $y_2^2$ and interactions, say $y_2 z_1$, the estimating
equation could look like

$$E(y_1|z, y_2, v_2) = \Phi(z_1\delta_{e1} + \alpha_{e1} y_2 + \gamma_{e1} y_2^2 + y_2 z_1 \xi_{e1} + \eta_{e1} v_2),$$

so that a single control function, $v_2$, corrects the endogeneity of $y_2$.

∙ After the two-step QMLE, the ASF is estimated as

$$\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(z_1\hat\delta_{e1} + \hat\alpha_{e1} y_2 + \hat\gamma_{e1} y_2^2 + y_2 z_1 \hat\xi_{e1} + \hat\eta_{e1} \hat v_{i2}),$$

and now derivatives or changes with respect to $(z_1, y_2)$ can be obtained.
∙ Further, we might allow $D(c_1|v_2)$ to be more flexible, such as

$$c_1 = \eta_{11} v_2 + \eta_{12} v_2^2 + \eta_{13} v_2^3 + e_1, \quad e_1|z, v_2 \sim \mathrm{Normal}(0, \sigma_{e_1}^2).$$

∙ Notice that $c_1$ cannot have an unconditional normal distribution,
in particular if $v_2$ is normal. This bothers some people.

∙ In the second stage, we would add a cubic in $\hat v_2$ to the fractional
probit. In the model just above,

$$\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(z_1\hat\delta_{e1} + \hat\alpha_{e1} y_2 + \hat\gamma_{e1} y_2^2 + y_2 z_1 \hat\xi_{e1} + \hat\eta_{e11} \hat v_{i2} + \hat\eta_{e12} \hat v_{i2}^2 + \hat\eta_{e13} \hat v_{i2}^3),$$

that is, we again just average out the control function. The bootstrap
would be very convenient for standard errors.

∙ Recent work by Blundell and Powell (2004, Review of Economic
Studies) goes even further. Just allow $E(y_1|z_1, y_2, v_2)$ to be a flexible
function of its arguments, say

$$E(y_1|z_1, y_2, v_2) = g_1(z_1, y_2, v_2).$$

∙ To obtain the control function, assume

$$y_2 = g_2(z) + v_2, \quad v_2 \text{ independent of } z,$$

where $g_2(\cdot)$ is only assumed to be a smooth function. Estimate $g_2(\cdot)$
nonparametrically, obtain $\hat v_{i2} = y_{i2} - \hat g_2(z_i)$. Then use nonparametric
regression of $y_{i1}$ on $(z_{i1}, y_{i2}, \hat v_{i2})$.

∙ The ASF is consistently estimated as

$$\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \hat g_1(z_1, y_2, \hat v_{i2}).$$

∙ Can approximate this approach by using flexible parametric models.

∙ We can accommodate multiple continuous endogenous variables. Let
$x_1 = k_1(z_1, y_2)$ for a vector of functions $k_1(\cdot, \cdot)$, and allow a set of
reduced forms for strictly monotonic functions $h_{2g}(y_{2g})$, $g = 1, \dots, G_1$,
where $G_1$ is the dimension of $y_2$.
∙ See Wooldridge (2005, “Unobserved Heterogeneity and Estimation of
Average Partial Effects,” Rothenberg Festschrift)

∙ Recent (unpublished) work: If $y_2$ is binary and follows a probit
model, can have $y_1$ fractional with a probit conditional mean and apply
“bivariate probit” to $(y_1, y_2)$, even though $y_1$ is not binary. (Not
currently allowed in Stata.)

5. TWO-PART MODELS
∙ If have corners at $y = 0$ or $y = 1$ (or, occasionally at both values),
might want to use a two-part (hurdle) model.
∙ For concreteness, assume $P(y = 0) > 0$ but y is continuous in $(0,1)$,
so there is no pile-up at one.
∙ In addition to modeling $P(y = 0|x)$, could model $D(y|x, y > 0)$. But
more robust to model $E(y|x, y > 0)$ using fractional response.

∙ Let

$$P(y = 0|x) = 1 - F(x\gamma)$$

$$E(y|x, y > 0) = G(x\beta)$$

∙ Let $w = 1[y > 0]$, so that $P(w = 1|x) = F(x\gamma)$.
∙ Now the “unconditional” expectation is

$$E(y|x) = F(x\gamma)G(x\beta),$$

which complicates partial effects.
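
To see why the product form complicates partial effects, differentiate with respect to a continuous $x_j$. In the sketch below, $f$ and $g$ denote the derivatives of $F$ and $G$, and $\gamma$ and $\beta$ label the coefficients in the participation and amount parts (labels assumed for this illustration):

```latex
\frac{\partial E(y|x)}{\partial x_j}
  = \gamma_j\, f(x\gamma)\, G(x\beta) + \beta_j\, F(x\gamma)\, g(x\beta)
```

The two terms can have opposite signs, so neither coefficient alone determines the direction of the effect on the unconditional mean.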

∙ Estimation is straightforward. (1) Estimate $\gamma$ by binary response (say,
logit or probit) of $w_i$ on $x_i$. (2) Use QMLE (fractional logit or probit, or
some other functional form) of $y_i$ on $x_i$ using data for $y_i > 0$ to estimate
$\beta$.
∙ Can compute an R-squared for the overall mean (and the mean
conditional on $y > 0$) to compare with one-part models. Can test the
functional forms of the two parts, too, using RESET and other tests
(such as for “heteroskedasticity”).
∙ Open (?) question: How to combine two-part models and
endogeneity?

6. PANEL DATA METHODS
∙ If no interest in explicitly including unobserved heterogeneity, can
use pooled versions of methods discussed. Of course, should allow for
arbitrary serial dependence in inference as well as variance
misspecification in the LEF distribution.
∙ Might have dynamic completeness in the mean if lagged dependent
variables have been included. (What is the best functional form for
doing so?) As usual, if $E(y_{it}|z_{it}, y_{i,t-1}, z_{i,t-1}, \dots)$ has been properly
specified, then serial correlation cannot be an issue.

∙ In Stata, we just use the “glm” command with a clustering option:
glm y x1 ... xK, fam(bin) link(logit) cluster(id)
∙ With complete dynamics in the mean:
glm y x1 ... xK, fam(bin) link(logit) robust
or replace the logit link with another.

Models and Partial Effects with Heterogeneity
∙ Consider, for $0 \le y_{it} \le 1$,
$$E(y_{it}|\mathbf{x}_{it}, c_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta} + c_i), \quad t = 1, \ldots, T.$$
∙ As with an endogenous explanatory variable in a cross section setting,
with unobserved heterogeneity the fractional probit approach has
advantages over other functional forms.
∙ Elements of $\boldsymbol{\beta}$ give the directions of the partial effects. For example, if
$x_{tj}$ is continuous, then
$$\frac{\partial E(y_t|\mathbf{x}_t, c)}{\partial x_{tj}} = \beta_j \phi(\mathbf{x}_t\boldsymbol{\beta} + c).$$

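For discrete changes, we compute
$$\Phi(\mathbf{x}_t^{(1)}\boldsymbol{\beta} + c) - \Phi(\mathbf{x}_t^{(0)}\boldsymbol{\beta} + c)$$
for two different settings of the covariates, $\mathbf{x}_t^{(1)}$ and $\mathbf{x}_t^{(0)}$.
∙ Partial effects depend on $\mathbf{x}_t$ and $c$. What should we plug in for $c$?
∙ Instead, focus on the average partial effects (APEs):
$$E_{c_i}[\beta_j \phi(\mathbf{x}_t\boldsymbol{\beta} + c_i)] = \beta_j E_{c_i}[\phi(\mathbf{x}_t\boldsymbol{\beta} + c_i)],$$
which depends on $\mathbf{x}_t$ (and $\boldsymbol{\beta}$) but not on $c$. (Or discrete differences.) As
before, essentially the same as the “average structural function,”
$$\mathrm{ASF}(\mathbf{x}_t) = E_{c_i}[\Phi(\mathbf{x}_t\boldsymbol{\beta} + c_i)].$$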

∙ When are the APEs identified? More generally than the parameters,
but more assumptions are needed.
∙ Strict exogeneity conditional on $c_i$: if $\mathbf{x}_i \equiv (\mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT})$,
$$E(y_{it}|\mathbf{x}_i, c_i) = E(y_{it}|\mathbf{x}_{it}, c_i), \quad t = 1, \ldots, T.$$
As always, rules out lagged dependent variables, feedback, and
contemporaneous endogeneity.
∙ Need to restrict $D(c_i|\mathbf{x}_i)$. Enough would be, say,
$$D(c_i|\mathbf{x}_i) = D(c_i|\bar{\mathbf{x}}_i),$$
where $\bar{\mathbf{x}}_i$ is the time average.

∙ Altonji and Matzkin (2005, Econometrica) use general
exchangeability. Could allow the distribution to depend on other
features of $\{\mathbf{x}_{it} : t = 1, \ldots, T\}$, such as time trends or average growth
rates.
∙ We assume more:
$$c_i|(\mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT}) \sim \mathrm{Normal}(\psi + \bar{\mathbf{x}}_i\boldsymbol{\xi}, \sigma_a^2).$$
Write $c_i = \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i$ where $a_i|\mathbf{x}_i \sim \mathrm{Normal}(0, \sigma_a^2)$.
∙ Do not impose additional distributional assumptions on $D(y_{it}|\mathbf{x}_i, c_i)$.
Leave the serial dependence in $\{y_{it}\}$ across time unrestricted.

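∙ $\boldsymbol{\beta}$ is identified up to a positive scale factor, and the APEs are
identified:
$$E(y_{it}|\mathbf{x}_i, a_i) = \Phi(\psi + \mathbf{x}_{it}\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i)$$
and so
$$E(y_{it}|\mathbf{x}_i) = E[\Phi(\psi + \mathbf{x}_{it}\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i)|\mathbf{x}_i] = \Phi[(\psi + \mathbf{x}_{it}\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi})/(1 + \sigma_a^2)^{1/2}] \equiv \Phi(\psi_a + \mathbf{x}_{it}\boldsymbol{\beta}_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a),$$
where the “a” subscript denotes division by $(1 + \sigma_a^2)^{1/2}$.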
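The step from the conditional mean given $a_i$ to the mean given only $\mathbf{x}_i$ uses the normal mixing result: if $a \sim \mathrm{Normal}(0, \sigma_a^2)$ is independent of a fixed index $k$, then $E[\Phi(k + a)] = \Phi(k/\sqrt{1 + \sigma_a^2})$. A minimal Python sketch (standard library only; the values of `k` and `sigma_a` are hypothetical) checks this numerically by integrating over the density of $a$ with the trapezoid rule.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mixed_probit_mean(k, sigma_a, n_grid=4001, width=8.0):
    """Approximate E[Phi(k + a)], a ~ Normal(0, sigma_a^2), by the trapezoid rule."""
    a_dist = NormalDist(0.0, sigma_a)
    lo, hi = -width * sigma_a, width * sigma_a
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for m in range(n_grid):
        a = lo + m * step
        weight = 0.5 if m in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * STD_NORMAL.cdf(k + a) * a_dist.pdf(a)
    return total * step


def scaled_probit_mean(k, sigma_a):
    """Closed form: Phi(k / sqrt(1 + sigma_a^2))."""
    return STD_NORMAL.cdf(k / math.sqrt(1.0 + sigma_a ** 2))


# Hypothetical index value and heterogeneity standard deviation.
k, sigma_a = 0.7, 1.5
numeric = mixed_probit_mean(k, sigma_a)
closed = scaled_probit_mean(k, sigma_a)
```

The two numbers agree up to integration error, which is why only the scaled coefficients (the original ones divided by $(1 + \sigma_a^2)^{1/2}$) can be recovered from the mean conditional on $\mathbf{x}_i$.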

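∙ $\psi_a$, $\boldsymbol{\beta}_a$, and $\boldsymbol{\xi}_a$ are identified if there is time variation in $\mathbf{x}_{it}$.
Chamberlain device: replace $\bar{\mathbf{x}}_i$ with $\mathbf{x}_i$.
∙ Conveniently, the APEs can be obtained by differentiating or
differencing
$$E_{\bar{\mathbf{x}}_i}[\Phi(\psi_a + \mathbf{x}_t\boldsymbol{\beta}_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a)]$$
with respect to the elements of $\mathbf{x}_t$. The average structural function is
consistently estimated by
$$\widehat{\mathrm{ASF}}(\mathbf{x}_t) = N^{-1} \sum_{i=1}^{N} \Phi(\hat{\psi}_a + \mathbf{x}_t\hat{\boldsymbol{\beta}}_a + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}_a),$$
where $\hat{\psi}_a$, $\hat{\boldsymbol{\beta}}_a$, $\hat{\boldsymbol{\xi}}_a$ are consistent estimators.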
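As an illustration of this averaging, the following Python sketch computes the estimated ASF, the APE of a continuous covariate (the derivative of the ASF), and a discrete-change APE (a difference of the ASF at two covariate settings). All scaled coefficients and time averages below are made up for the example; in practice they would come from a consistent estimator of the scaled parameters.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def asf_hat(x_t, xbars, psi_a, beta_a, xi_a):
    """N^{-1} sum_i Phi(psi_a + x_t*beta_a + xbar_i*xi_a)."""
    fixed = psi_a + dot(x_t, beta_a)
    return sum(STD_NORMAL.cdf(fixed + dot(xb, xi_a)) for xb in xbars) / len(xbars)


def ape_hat(x_t, xbars, psi_a, beta_a, xi_a, j):
    """beta_a[j] * N^{-1} sum_i phi(psi_a + x_t*beta_a + xbar_i*xi_a)."""
    fixed = psi_a + dot(x_t, beta_a)
    scale = sum(STD_NORMAL.pdf(fixed + dot(xb, xi_a)) for xb in xbars) / len(xbars)
    return beta_a[j] * scale


# Hypothetical scaled estimates and unit-specific time averages (N = 4).
psi_a, beta_a, xi_a = -0.4, [0.9, -0.2], [-0.5, 0.1]
xbars = [[0.2, 1.0], [0.8, 0.0], [0.5, 1.0], [1.1, 0.0]]
x_t = [0.6, 1.0]

asf = asf_hat(x_t, xbars, psi_a, beta_a, xi_a)
ape_continuous = ape_hat(x_t, xbars, psi_a, beta_a, xi_a, j=0)

# Discrete change: switch the second covariate from 0 to 1, holding the first fixed.
ape_discrete = (asf_hat([0.6, 1.0], xbars, psi_a, beta_a, xi_a)
                - asf_hat([0.6, 0.0], xbars, psi_a, beta_a, xi_a))
```

Note that only the time averages are averaged out; the covariate values in `x_t` are held fixed at the point of interest.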

∙ As usual, APEs for continuous and discrete variables can be obtained
from $\mathrm{ASF}(\mathbf{x}_t)$.
∙ In practice, we would have time dummies, which we could just
indicate with $\hat{\psi}_{at}$.
∙ We can always include time constant variables, say $\mathbf{r}_i$, along with $\bar{\mathbf{x}}_i$.
It is then up to us to interpret the partial effects with respect to $\mathbf{r}_i$.

Estimation Methods Under Strict Exogeneity
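∙ Many consistent estimators of the scaled parameters. Define
$\mathbf{w}_{it} \equiv (1, \mathbf{x}_{it}, \bar{\mathbf{x}}_i)$ (or with time dummies and time constant variables)
and $\boldsymbol{\theta} \equiv (\psi_a, \boldsymbol{\beta}_a', \boldsymbol{\xi}_a')'$. Then $\boldsymbol{\theta}$ can be estimated using pooled nonlinear
least squares (NLS), with regression function $\Phi(\mathbf{w}_{it}\boldsymbol{\theta})$.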

∙ Pooled NLS estimator is consistent and $\sqrt{N}$-asymptotically normal
(with fixed $T$), but is likely to be inefficient.
∙ First, it ignores the serial dependence in the $y_{it}$, which is likely to be
substantial even after conditioning on $\mathbf{x}_i$. Second, $\mathrm{Var}(y_{it}|\mathbf{x}_i)$ is very
unlikely to be homoskedastic. Could ignore serial correlation, model
$\mathrm{Var}(y_{it}|\mathbf{x}_i)$, and use weighted least squares.

∙ We already know that we can use pooled fractional probit (or logit,
for that matter), with explanatory variables $(1, \mathbf{x}_{it}, \bar{\mathbf{x}}_i)$ (and, likely, year
dummies).
∙ The “working variance” assumption for pooled FP is
$$\mathrm{Var}(y_{it}|\mathbf{x}_i) = \sigma^2 \Phi(\mathbf{w}_{it}\boldsymbol{\theta})[1 - \Phi(\mathbf{w}_{it}\boldsymbol{\theta})],$$
where $0 < \sigma^2 \le 1$.
∙ Still need to cluster to obtain standard errors robust to serial
correlation, even if the variance function is correct.

∙ In Stata, with year dummies explicit:
glm y x1 ... xK x1bar ... xKbar d2 ... dT, fam(bin) link(probit) cluster(id)
margeff
∙ The “margeff” command gives APEs and appropriate standard errors.
∙ Can add time-constant variables to the list of explanatory variables.

∙ Random effects approaches, that is, approaches that attempt to obtain a joint
distribution $D(y_{i1}, \ldots, y_{iT}|\mathbf{x}_i)$ by modeling and then integrating out
unobserved heterogeneity, would require additional distributional
assumptions while being computationally demanding. A nice middle
ground is the GEE approach. We already have the working variance
assumption for fractional probit.
∙ We need to specify a “working” correlation matrix, too. Define the
errors as
$$u_{it} \equiv y_{it} - E(y_{it}|\mathbf{x}_i) = y_{it} - \Phi(\mathbf{w}_{it}\boldsymbol{\theta}) = y_{it} - \Phi(\psi_a + \mathbf{x}_{it}\boldsymbol{\beta}_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a), \quad t = 1, \ldots, T.$$

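∙ Define standardized errors as
$$e_{it} \equiv u_{it} / \sqrt{\Phi(\mathbf{w}_{it}\boldsymbol{\theta})[1 - \Phi(\mathbf{w}_{it}\boldsymbol{\theta})]};$$
under the working variance assumption, $\mathrm{Var}(e_{it}|\mathbf{x}_i) = \sigma^2$. Exchangeability means that the pairwise
correlations between standardized errors are constant, say $\rho$.
∙ To estimate a common correlation parameter, let $\check{\boldsymbol{\theta}}$ be a preliminary
estimator of $\boldsymbol{\theta}$ (probably the pooled Bernoulli QMLE).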

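∙ Define residuals $\check{u}_{it} \equiv y_{it} - \Phi(\mathbf{w}_{it}\check{\boldsymbol{\theta}})$ and the standardized (Pearson)
residuals $\check{e}_{it} \equiv \check{u}_{it} / \sqrt{\Phi(\mathbf{w}_{it}\check{\boldsymbol{\theta}})[1 - \Phi(\mathbf{w}_{it}\check{\boldsymbol{\theta}})]}$. Then,
$$\hat{\rho} = [NT(T-1)]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{s \ne t} \check{e}_{it}\check{e}_{is}.$$
∙ Given the estimated $T \times T$ working correlation matrix, $\hat{\mathbf{R}}$, which
has unity down its diagonal and $\hat{\rho}$ everywhere else, we can construct
the estimated “working” variance matrix:
$$\mathbf{V}(\mathbf{x}_i, \check{\boldsymbol{\theta}})^{1/2}\, \hat{\mathbf{R}}\, \mathbf{V}(\mathbf{x}_i, \check{\boldsymbol{\theta}})^{1/2},$$
where $\mathbf{V}(\mathbf{x}_i, \check{\boldsymbol{\theta}})$ is the $T \times T$ diagonal matrix with $\Phi(\mathbf{w}_{it}\check{\boldsymbol{\theta}})[1 - \Phi(\mathbf{w}_{it}\check{\boldsymbol{\theta}})]$
down its diagonal.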
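A minimal Python sketch of these two formulas, assuming a balanced panel and taking the first-stage fitted indices as given. The data and index values are made up for illustration, and the correlation estimator is implemented exactly as the triple sum stated on the slide.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson_residuals(y_i, index_i):
    """Standardized residuals (y - Phi) / sqrt(Phi * (1 - Phi)) for one unit."""
    out = []
    for y, idx in zip(y_i, index_i):
        p = STD_NORMAL.cdf(idx)
        out.append((y - p) / math.sqrt(p * (1.0 - p)))
    return out


def exchangeable_rho(resid):
    """[N T (T-1)]^{-1} sum_i sum_t sum_{s != t} e_it * e_is (balanced panel)."""
    n, t_len = len(resid), len(resid[0])
    total = 0.0
    for e_i in resid:
        s = sum(e_i)
        # sum over t and s != t of e_it * e_is = (sum_t e_it)^2 - sum_t e_it^2
        total += s * s - sum(e * e for e in e_i)
    return total / (n * t_len * (t_len - 1))


def working_variance(index_i, rho):
    """V^{1/2} R V^{1/2} with R having ones on the diagonal and rho elsewhere."""
    sd = [math.sqrt(STD_NORMAL.cdf(i) * (1.0 - STD_NORMAL.cdf(i))) for i in index_i]
    t_len = len(sd)
    return [[sd[t] * sd[s] * (1.0 if t == s else rho) for s in range(t_len)]
            for t in range(t_len)]


# Hypothetical data: N = 3 units, T = 3 periods; index holds the fitted w_it * theta.
y = [[0.60, 0.70, 0.65], [0.30, 0.25, 0.40], [0.80, 0.85, 0.90]]
index = [[0.1, 0.3, 0.2], [-0.4, -0.5, -0.3], [0.9, 1.0, 1.1]]

resid = [pearson_residuals(y_i, idx_i) for y_i, idx_i in zip(y, index)]
rho_hat = exchangeable_rho(resid)
W_1 = working_variance(index[0], rho_hat)  # working variance matrix for unit 1
```

The working matrix differs across units only through the diagonal variance terms; the correlation structure is common to all units.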

∙ Now apply multivariate WNLS, which is asymptotically the same as
GEE. Naturally, use a fully robust variance matrix estimator.
∙ Can allow an “unstructured” correlation matrix, too, but the
correlations never depend on $\mathbf{x}_i$.
xtgee y x1 ... xK x1bar ... xKbar, fam(bin) link(probit) corr(exch) robust
xtgee y x1 ... xK x1bar ... xKbar, fam(bin) link(probit) corr(uns) robust

∙ Can apply the “margeff” command in Stata to get APEs averaged
across the cross section and time. For continuous explanatory variables,
the common scale factor is
$$(NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \phi(\hat{\psi}_a + \mathbf{x}_{it}\hat{\boldsymbol{\beta}}_a + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}_a).$$
∙ Can compare APEs with linear model estimated by fixed effects.
∙ As with previous models, can add $\mathbf{x}_{i,t+1}$ (or a subset of variables) as a
test of strict exogeneity. Estimation can be pooled QMLE or GEE.

Models with Endogenous Explanatory Variables
∙ Represent endogeneity as an omitted, time-varying variable, in
addition to unobserved heterogeneity:
$$E(y_{it1}|\mathbf{z}_i, y_{it2}, c_{i1}, v_{it1}) = E(y_{it1}|\mathbf{z}_{it1}, y_{it2}, c_{i1}, v_{it1}) = \Phi(\mathbf{z}_{it1}\boldsymbol{\delta}_1 + \alpha_1 y_{it2} + c_{i1} + v_{it1}),$$
where $c_{i1}$ is the time-constant unobserved effect and $v_{it1}$ is a
time-varying omitted factor that can be correlated with $y_{it2}$.
∙ Elements of $\mathbf{z}_{it}$ are assumed strictly exogenous, and we have at least
one exclusion restriction: $\mathbf{z}_{it} = (\mathbf{z}_{it1}, \mathbf{z}_{it2})$.

∙ Use a Chamberlain-Mundlak approach, but only relating the
heterogeneity to all strictly exogenous variables:
$$c_{i1} = \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + a_{i1}, \quad D(a_{i1}|\mathbf{z}_i) = D(a_{i1}).$$
∙ Even before we specify $D(a_{i1})$, this is restrictive because it assumes,
in particular, that $E(c_{i1}|\mathbf{z}_i)$ is linear in $\bar{\mathbf{z}}_i$ and that $\mathrm{Var}(c_{i1}|\mathbf{z}_i)$ is constant. More
recent work has shown how we can get by with less, such as
$D(c_{i1}|\mathbf{z}_i) = D(c_{i1}|\bar{\mathbf{z}}_i)$.

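∙ Need to obtain an estimating equation. First, note that
$$E(y_{it1}|\mathbf{z}_i, y_{it2}, a_{i1}, v_{it1}) = \Phi(\mathbf{z}_{it1}\boldsymbol{\delta}_1 + \alpha_1 y_{it2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + a_{i1} + v_{it1}) \equiv \Phi(\mathbf{z}_{it1}\boldsymbol{\delta}_1 + \alpha_1 y_{it2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + r_{it1}).$$
∙ Assume a linear reduced form for $y_{it2}$:
$$y_{it2} = \psi_2 + \mathbf{z}_{it}\boldsymbol{\delta}_2 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_2 + v_{it2}, \quad t = 1, \ldots, T$$
$$D(v_{it2}|\mathbf{z}_i) = D(v_{it2})$$
(and we might allow for time-varying coefficients).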

∙ Rather than assume $(a_{i1}, v_{it2})$ is independent of $\mathbf{z}_i$, we can get by with
a weaker assumption, but we impose normality:
$$r_{it1}|(\mathbf{z}_i, v_{it2}) \sim \mathrm{Normal}(\eta_1 v_{it2}, \tau_1^2), \quad t = 1, \ldots, T.$$
[Easy to allow $\eta_1$ to change over time.]
∙ Either way, the assumptions effectively rule out discreteness in $y_{it2}$.

∙ Write
$$r_{it1} = \eta_1 v_{it2} + e_{it1},$$
where $e_{it1}$ is independent of $(\mathbf{z}_i, v_{it2})$ (and, therefore, of $y_{it2}$) and
normally distributed. Again, using a standard mixing property of the
normal distribution,
$$E(y_{it1}|\mathbf{z}_i, y_{it2}, v_{it2}) = \Phi(\mathbf{z}_{it1}\boldsymbol{\delta}_{\tau 1} + \alpha_{\tau 1} y_{it2} + \psi_{\tau 1} + \bar{\mathbf{z}}_i\boldsymbol{\xi}_{\tau 1} + \eta_{\tau 1} v_{it2}),$$
where the “$\tau$” denotes division by $(1 + \tau_1^2)^{1/2}$.
∙ Identification comes off of the exclusion of the time-varying
exogenous variables $\mathbf{z}_{it2}$.

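∙ Two step procedure:
(1) Estimate the reduced form for $y_{it2}$ (pooled or for each $t$
separately). Obtain the residuals, $\hat{v}_{it2}$.
(2) Use the probit QMLE to estimate $\boldsymbol{\delta}_{\tau 1}$, $\alpha_{\tau 1}$, $\psi_{\tau 1}$, $\boldsymbol{\xi}_{\tau 1}$ and $\eta_{\tau 1}$.
(GEE would require strict exogeneity of $v_{it2}$!)
∙ How do we interpret the scaled estimates? They give directions of
effects. Conveniently, they also index the APEs. For given $\mathbf{z}_1$ and $y_2$,
average out $\bar{\mathbf{z}}_i$ and $\hat{v}_{it2}$ (for each $t$):
$$\hat{\alpha}_{\tau 1} \cdot N^{-1} \sum_{i=1}^{N} \phi(\mathbf{z}_{t1}\hat{\boldsymbol{\delta}}_{\tau 1} + \hat{\alpha}_{\tau 1} y_{t2} + \hat{\psi}_{\tau 1} + \bar{\mathbf{z}}_i\hat{\boldsymbol{\xi}}_{\tau 1} + \hat{\eta}_{\tau 1}\hat{v}_{it2}).$$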
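The two steps can be sketched in Python for a single time period. This is a deliberately simplified illustration: the reduced form uses one hypothetical instrument `z` and an intercept (the slides' reduced form also includes the time averages and other exogenous variables), and the second-step scaled coefficients `alpha` and `eta` are made-up numbers standing in for probit QMLE output rather than being estimated here.

```python
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def ols_residuals(y2, z):
    """Step (1): residuals from an OLS regression of y2 on an intercept and z."""
    zbar, ybar = mean(z), mean(y2)
    slope = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y2))
             / sum((zi - zbar) ** 2 for zi in z))
    intercept = ybar - slope * zbar
    return [yi - intercept - slope * zi for zi, yi in zip(z, y2)]


def ape_y2(y_t2, base_index, alpha, eta, v2hat):
    """alpha * N^{-1} sum_i phi(base_index_i + alpha * y_t2 + eta * v2hat_i).

    base_index_i collects the terms that do not involve y_t2 or the residual:
    the exogenous covariates at their chosen values, the intercept, and the
    time-average terms for unit i.
    """
    terms = [STD_NORMAL.pdf(b + alpha * y_t2 + eta * v)
             for b, v in zip(base_index, v2hat)]
    return alpha * mean(terms)


# Hypothetical data for N = 5 units in one period.
z = [0.1, 0.4, 0.3, 0.9, 0.7]              # excluded instrument
y2 = [1.0, 1.3, 1.1, 1.9, 1.4]             # endogenous explanatory variable
base_index = [-0.2, 0.1, 0.0, 0.3, -0.1]   # remaining index terms, unit by unit
alpha, eta = 0.9, -0.5                     # stand-ins for second-step estimates

v2hat = ols_residuals(y2, z)
ape = ape_y2(1.2, base_index, alpha, eta, v2hat)
```

The residuals enter only as a control function: they are averaged out, together with the unit-specific terms, while the value of the endogenous variable is held fixed at the point of interest.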

∙ Applying “margeff” in the second stage consistently estimates the
APEs averaged across t, but the standard errors do not account for the
two-step estimation. Use panel bootstrap for standard errors to allow
for serial dependence and the two-step estimation.
∙ Of course, we can also compute discrete changes for any of the
elements of $(\mathbf{z}_{t1}, y_{t2})$.

EXAMPLE: Effects of Spending on Test Pass Rates
∙ Reform occurs between 1993/94 and 1994/95 school year; its passage
was a surprise to almost everyone.
∙ Since 1994/95, each district receives a foundation allowance, based
on revenues in 1993/94.
∙ Initially, all districts were brought up to a minimum allowance
($4,200 in the first year). The goal was to eventually give each district a
basic allowance ($5,000 in the first year).
∙ Districts divided into three groups in 1994/95 for purposes of initial
foundation allowance. Subsequent grants determined by statewide
School Aid Fund.

∙ Catch-up formula for districts receiving below the basic. Initially,
more than half of the districts received less than the basic allowance.
By 1998/99, it was down to about 36%. In 1999/00, all districts began
receiving the basic allowance, which was then $5,700. Two-thirds of all
districts now receive the basic allowance.
∙ From 1991/92 to 2003/04, in the 10th percentile, expenditures rose
from $4,616 (2004 dollars) to $7,125, a 54 percent increase. In the 50th
percentile, it was a 48 percent increase. In the 90th percentile, per pupil
expenditures rose from $7,132 in 1992/93 to $9,529, a 34 percent
increase.

∙ Response variable: math4, the fraction of fourth graders passing the
MEAP math test at a school.
∙ Spending variable is $\log(\mathit{avgrexppp})$, where the average is over the
current and previous three years.
∙ The linear model is
$$\mathit{math4}_{it} = \theta_t + \beta_1 \log(\mathit{avgrexp}_{it}) + \beta_2 \mathit{lunch}_{it} + \beta_3 \log(\mathit{enroll}_{it}) + c_{i1} + u_{it1}.$$
Estimating this model by fixed effects is identical to adding the time
averages of the three explanatory variables and using pooled OLS.
∙ The “fractional probit” model:
$$E(\mathit{math4}_{it}|\mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT}) = \Phi(\psi_{at} + \mathbf{x}_{it}\boldsymbol{\beta}_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a).$$

∙ Allowing spending to be endogenous. Controlling for 1993/94
spending, foundation grant should be exogenous. Exploit
nonsmoothness in the grant as a function of initial spending.
$$\mathit{math4}_{it} = \theta_t + \beta_1 \log(\mathit{avgrexp}_{it}) + \beta_2 \mathit{lunch}_{it} + \beta_3 \log(\mathit{enroll}_{it}) + \beta_{4t} \log(\mathit{rexppp}_{i,1994}) + \xi_1 \overline{\mathit{lunch}}_i + \xi_2 \overline{\log(\mathit{enroll})}_i + v_{it1}$$
∙ And, fractional probit version of this.

. use meap92_01

. xtset distid year


panel variable: distid (strongly balanced)
time variable: year, 1992 to 2001
delta: 1 unit

. des math4 avgrexp lunch enroll found

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
math4           double  %9.0g                 fraction satisfactory, 4th grade math
avgrexp         float   %9.0g                 (rexppp + rexppp_1 + rexppp_2 + rexppp_3)/4
lunch           float   %9.0g                 fraction eligible for free lunch
enroll          float   %9.0g                 district enrollment
found           int     %9.0g                 foundation grant, $: 1995-2001

. sum math4 rexppp lunch

Variable | Obs Mean Std. Dev. Min Max
---------------------------------------------------------------------
math4 | 5010 .6149834 .1912023 .059 1
rexppp | 5010 6331.99 1168.198 3553.361 15191.49
lunch | 5010 .2802852 .1571325 .0087 .9126999

. xtreg math4 lavgrexp lunch lenroll y96-y01, fe cluster(distid)

Fixed-effects (within) regression        Number of obs      = 3507
Group variable: distid                   Number of groups   = 501

R-sq:  within  = 0.4713                  Obs per group: min = 7
       between = 0.0219                                 avg = 7.0
       overall = 0.2049                                 max = 7

                                         F(9,500)           = 171.93
corr(u_i, Xb)  = -0.1787                 Prob > F           = 0.0000

(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .3770929 .0705668 5.34 0.000 .2384489 .5157369
lunch | -.0419467 .0731611 -0.57 0.567 -.1856877 .1017944
lenroll | .0020568 .0488107 0.04 0.966 -.0938426 .0979561
y96 | -.0155968 .0063937 -2.44 0.015 -.0281587 -.003035
y97 | -.0589732 .0095232 -6.19 0.000 -.0776837 -.0402628
y98 | .0781686 .0112949 6.92 0.000 .0559772 .1003599
y99 | .0642748 .0123103 5.22 0.000 .0400884 .0884612
y00 | .0895688 .0133223 6.72 0.000 .0633942 .1157434
y01 | .0630091 .014717 4.28 0.000 .0340943 .0919239
_cons | -2.640402 .8161357 -3.24 0.001 -4.24388 -1.036924

-----------------------------------------------------------------------------
sigma_u | .1130256
sigma_e | .08314135
rho | .64888558 (fraction of variance due to u_i)
------------------------------------------------------------------------------

. des alavgrexp alunch alenroll

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
alavgrexp       float   %9.0g                 time average lavgrexp, 1995-2001
alunch          float   %9.0g                 time average lunch, 1995-2001
alenroll        float   %9.0g                 time average lenroll, 1995-2001

. reg math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01, cluster(distid)

Linear regression                        Number of obs = 3507
                                         F( 12, 500)   = 161.09
                                         Prob > F      = 0.0000
                                         R-squared     = 0.4218
                                         Root MSE      = .11542

(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .377092 .0705971 5.34 0.000 .2383884 .5157956
alavgrexp | -.286541 .0731797 -3.92 0.000 -.4303185 -.1427635
lunch | -.0419466 .0731925 -0.57 0.567 -.1857494 .1018562
alunch | -.3770088 .0766141 -4.92 0.000 -.5275341 -.2264835
lenroll | .0020566 .0488317 0.04 0.966 -.093884 .0979972
alenroll | -.0031646 .0491534 -0.06 0.949 -.0997373 .0934082
y96 | -.0155968 .0063965 -2.44 0.015 -.0281641 -.0030295
y97 | -.0589731 .0095273 -6.19 0.000 -.0776916 -.0402546
y98 | .0781687 .0112998 6.92 0.000 .0559678 .1003696
y99 | .064275 .0123156 5.22 0.000 .0400782 .0884717
y00 | .089569 .013328 6.72 0.000 .0633831 .1157548
y01 | .0630093 .0147233 4.28 0.000 .0340821 .0919365
_cons | -.0006233 .2450239 -0.00 0.998 -.4820268 .4807801
------------------------------------------------------------------------------

. * Now use fractional probit.

. glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01, fa(bin) link(probit) cluster(distid)
note: math4 has non-integer values

Generalized linear models                No. of obs      = 3507
Optimization : ML                        Residual df     = 3494
                                         Scale parameter = 1
Deviance = 237.643665                    (1/df) Deviance = .0680148
Pearson  = 225.1094075                   (1/df) Pearson  = .0644274

(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .8810302 .2068026 4.26 0.000 .4757045 1.286356
alavgrexp | -.5814474 .2229411 -2.61 0.009 -1.018404 -.1444909
lunch | -.2189714 .2071544 -1.06 0.290 -.6249865 .1870437
alunch | -.9966635 .2155739 -4.62 0.000 -1.419181 -.5741465
lenroll | .0887804 .1382077 0.64 0.521 -.1821017 .3596626
alenroll | -.0893612 .1387674 -0.64 0.520 -.3613404 .1826181
y96 | -.0362309 .0178481 -2.03 0.042 -.0712125 -.0012493
y97 | -.1467327 .0273205 -5.37 0.000 -.20028 -.0931855
y98 | .2520084 .0337706 7.46 0.000 .1858192 .3181975
y99 | .2152507 .0367226 5.86 0.000 .1432757 .2872257
y00 | .3049632 .0399409 7.64 0.000 .2266805 .3832459
y01 | .2257321 .0439608 5.13 0.000 .1395705 .3118938
_cons | -1.855832 .7556621 -2.46 0.014 -3.336902 -.3747616
------------------------------------------------------------------------------

. margeff

Average partial effects after glm
y = Pr(math4)

------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .2968496 .0695326 4.27 0.000 .1605682 .433131
alavgrexp | -.1959097 .0750686 -2.61 0.009 -.3430414 -.0487781
lunch | -.0737791 .0698318 -1.06 0.291 -.2106469 .0630887
alunch | -.3358104 .0723725 -4.64 0.000 -.4776579 -.1939629
lenroll | .0299132 .0465622 0.64 0.521 -.061347 .1211734
alenroll | -.0301089 .0467477 -0.64 0.520 -.1217326 .0615149
y96 | -.0122924 .0061107 -2.01 0.044 -.0242692 -.0003156
y97 | -.0508008 .0097646 -5.20 0.000 -.069939 -.0316625
y98 | .0809879 .0100272 8.08 0.000 .0613349 .1006408
y99 | .0696954 .0111375 6.26 0.000 .0478662 .0915245
y00 | .0970224 .0115066 8.43 0.000 .0744698 .119575
y01 | .0729829 .0132849 5.49 0.000 .046945 .0990208
------------------------------------------------------------------------------

. * These standard errors are very close to bootstrapped standard errors.

. xtgee math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01, fa(bin) link(probit) corr(exch) robust

GEE population-averaged model            Number of obs      = 3507
Group variable: distid                   Number of groups   = 501
Link: probit                             Obs per group: min = 7
Family: binomial                                        avg = 7.0
Correlation: exchangeable                               max = 7
                                         Wald chi2(12)      = 1815.43
Scale parameter: 1                       Prob > chi2        = 0.0000

(Std. Err. adjusted for clustering on distid)
------------------------------------------------------------------------------
| Semi-robust
math4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .884564 .2060662 4.29 0.000 .4806817 1.288446
alavgrexp | -.5835138 .2236705 -2.61 0.009 -1.0219 -.1451277
lunch | -.2372942 .2091221 -1.13 0.256 -.6471659 .1725775
alunch | -.9754696 .2170624 -4.49 0.000 -1.400904 -.5500351
lenroll | .0875629 .1387427 0.63 0.528 -.1843677 .3594935
alenroll | -.0820307 .1393712 -0.59 0.556 -.3551933 .1911318
y96 | -.0364771 .0178529 -2.04 0.041 -.0714681 -.001486
y97 | -.1471389 .0273264 -5.38 0.000 -.2006976 -.0935801
y98 | .2515377 .0337018 7.46 0.000 .1854833 .317592
y99 | .2148552 .0366599 5.86 0.000 .143003 .2867073
y00 | .3046286 .0399143 7.63 0.000 .2263981 .3828591
y01 | .2256619 .0438877 5.14 0.000 .1396437 .3116801
_cons | -1.914975 .7528262 -2.54 0.011 -3.390487 -.4394628
------------------------------------------------------------------------------

. margeff

Average partial effects after xtgee
y = Pr(math4)

------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .2979576 .0692519 4.30 0.000 .1622263 .4336889
alavgrexp | -.1965515 .0752801 -2.61 0.009 -.3440978 -.0490052
lunch | -.0799305 .0704803 -1.13 0.257 -.2180693 .0582082
alunch | -.3285784 .0728656 -4.51 0.000 -.4713924 -.1857644
lenroll | .0294948 .0467283 0.63 0.528 -.0620909 .1210805
alenroll | -.0276313 .0469381 -0.59 0.556 -.1196283 .0643656
y96 | -.012373 .0061106 -2.02 0.043 -.0243497 -.0003964
y97 | -.0509306 .0097618 -5.22 0.000 -.0700633 -.0317979
y98 | .0808226 .010009 8.08 0.000 .0612054 .1004399
y99 | .0695541 .0111192 6.26 0.000 .0477609 .0913472
y00 | .0968972 .0115004 8.43 0.000 .0743568 .1194376
y01 | .0729416 .0132624 5.50 0.000 .0469478 .0989353
------------------------------------------------------------------------------

. * Now allow spending to be endogenous. Use foundation allowance, and
. * interactions, as IVs.
. * First, linear model:

. ivreg math4 lunch alunch lenroll alenroll y96-y01 lexppp94 le94y96-le94y01 (lavgrexp = lfound lfndy96-lfndy01), cluster(distid)

Instrumental variables (2SLS) regression   Number of obs = 3507
                                           F( 18, 500)   = 107.05
                                           Prob > F      = 0.0000
                                           R-squared     = 0.4134
                                           Root MSE      = .11635

(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .5545247 .2205466 2.51 0.012 .1212123 .987837
lunch | -.0621991 .0742948 -0.84 0.403 -.2081675 .0837693
alunch | -.4207815 .0758344 -5.55 0.000 -.5697749 -.2717882
lenroll | .0463616 .0696215 0.67 0.506 -.0904253 .1831484
alenroll | -.049052 .070249 -0.70 0.485 -.1870716 .0889676
y96 | -1.085453 .2736479 -3.97 0.000 -1.623095 -.5478119
y97 | -1.049922 .376541 -2.79 0.005 -1.78972 -.3101244
y98 | -.4548311 .4958826 -0.92 0.359 -1.429102 .5194394
y99 | -.4360973 .5893671 -0.74 0.460 -1.594038 .7218439
y00 | -.3559283 .6509999 -0.55 0.585 -1.634961 .923104
y01 | -.704579 .7310773 -0.96 0.336 -2.140941 .7317831
lexppp94 | -.4343213 .2189488 -1.98 0.048 -.8644944 -.0041482
le94y96 | .1253255 .0318181 3.94 0.000 .0628119 .1878392
le94y97 | .11487 .0425422 2.70 0.007 .0312865 .1984534
le94y98 | .0599439 .0554377 1.08 0.280 -.0489757 .1688636
le94y99 | .0557854 .0661784 0.84 0.400 -.0742367 .1858075
le94y00 | .048899 .0727172 0.67 0.502 -.0939699 .1917678

le94y01 | .0865874 .0816732 1.06 0.290 -.0738776 .2470524
_cons | -.334823 .2593105 -1.29 0.197 -.8442955 .1746496
------------------------------------------------------------------------------
Instrumented: lavgrexp
Instruments: lunch alunch lenroll alenroll y96 y97 y98 y99 y00 y01
lexppp94 le94y96 le94y97 le94y98 le94y99 le94y00 le94y01
lfound lfndy96 lfndy97 lfndy98 lfndy99 lfndy00 lfndy01
------------------------------------------------------------------------------

. * Estimate is substantially larger than when spending is treated as exogenous.

. * Get reduced form residuals for fractional probit:

. reg lavgrexp lfound lfndy96-lfndy01 lunch alunch lenroll alenroll y96-y01 lexppp94 le94y96-le94y01, cluster(distid)

Linear regression                        Number of obs = 3507
                                         F( 24, 500)   = 1174.57
                                         Prob > F      = 0.0000
                                         R-squared     = 0.9327
                                         Root MSE      = .03987

(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
lavgrexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
lfound | .2447063 .0417034 5.87 0.000 .1627709 .3266417
lfndy96 | .0053951 .0254713 0.21 0.832 -.044649 .0554391
lfndy97 | -.0059551 .0401705 -0.15 0.882 -.0848789 .0729687
lfndy98 | .0045356 .0510673 0.09 0.929 -.0957972 .1048685
lfndy99 | .0920788 .0493854 1.86 0.063 -.0049497 .1891074
lfndy00 | .1364484 .0490355 2.78 0.006 .0401074 .2327894
lfndy01 | .2364039 .0555885 4.25 0.000 .127188 .3456198
...
_cons | .1632959 .0996687 1.64 0.102 -.0325251 .359117
------------------------------------------------------------------------------

. predict v2hat, resid


(1503 missing values generated)

. glm math4 lavgrexp v2hat lunch alunch lenroll alenroll y96-y01 lexppp94 le94y96-le94y01, fa(bin) link(probit) cluster(distid)
note: math4 has non-integer values

Generalized linear models                No. of obs      = 3507
Optimization : ML                        Residual df     = 3487
                                         Scale parameter = 1
Deviance = 236.0659249                   (1/df) Deviance = .0676989
Pearson  = 223.3709371                   (1/df) Pearson  = .0640582

Variance function: V(u) = u*(1-u/1)      [Binomial]
Link function    : g(u) = invnorm(u)     [Probit]

(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | 1.731039 .6541194 2.65 0.008 .4489886 3.013089
v2hat | -1.378126 .720843 -1.91 0.056 -2.790952 .0347007
lunch | -.2980214 .2125498 -1.40 0.161 -.7146114 .1185686
alunch | -1.114775 .2188037 -5.09 0.000 -1.543623 -.685928
lenroll | .2856761 .197511 1.45 0.148 -.1014383 .6727905
alenroll | -.2909903 .1988745 -1.46 0.143 -.6807771 .0987966
...
_cons | -2.455592 .7329693 -3.35 0.001 -3.892185 -1.018998
------------------------------------------------------------------------------

. margeff

Average partial effects after glm
y = Pr(math4)

------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------------------------------------------------
lavgrexp | .5830163 .2203345 2.65 0.008 .1511686 1.014864
v2hat | -.4641533 .242971 -1.91 0.056 -.9403678 .0120611
lunch | -.1003741 .0716361 -1.40 0.161 -.2407782 .04003
alunch | -.3754579 .0734083 -5.11 0.000 -.5193355 -.2315803
lenroll | .0962161 .0665257 1.45 0.148 -.0341719 .2266041
alenroll | -.0980059 .0669786 -1.46 0.143 -.2292817 .0332698
...
------------------------------------------------------------------------------

. * These standard errors do not account for the first-stage estimation. Should
. * use the panel bootstrap accounting for both stages.
. * Only marginal evidence that spending is endogenous, but the negative sign
. * fits the story that districts increase spending when performance is
. * (expected to be) worse, based on unobservables (to us).
