Loksh in 2011
Abstract. In this article, we describe the switch_probit command, which implements
the maximum likelihood method to fit a binary choice model with
binary endogenous regressors.
Keywords: st0233, switch_probit, endogenous variables, maximum likelihood,
limited-dependent variables, binary choice models, impact evaluation, marginal
treatment effect
1 Introduction
In this article, we describe the implementation of a maximum likelihood (ML) estimator
of the parameters of binary choice models with endogenous regressors. In these models,
a switching equation sorts individuals over two different states (with only one regime or
outcome observed for each individual). The econometric problem of fitting a model with
endogenous switching and binary endogenous regressors arises in a variety of settings,
including modeling the effects of fertility and migration on female labor force participation
(LFP), modeling housing demand, and modeling markets in disequilibrium.
For example,
c 2011 StataCorp LP st0233
M. Lokshin and Z. Sajaia 369
• Carrasco (2001) estimates the causal effect of fertility on LFP of females in the
United States using the 1986–1989 rounds of the University of Michigan Panel
Study of Income Dynamics (PSID). The paper finds that the probability of LFP in
women falls more in the model that accounts for endogenous fertility than in the
model with exogenous fertility. The paper points to a downward bias induced by
the exogeneity assumption of children variables that introduces a spurious positive
correlation between fertility and LFP decisions.
• Lokshin and Glinskaya (2009) assess the impact of male migration on the labor
market behavior of females in Nepal. The results indicate that male migration has
a negative impact on the level of LFP by the women left behind. The paper finds
evidence of substantial heterogeneity (based both on observable and unobservable
characteristics) in the impact of male migration.
Models with endogenous switching can be fit one branch (selection equation and
outcome equation) at a time (two heckprob estimations) or by simultaneous ML
estimation (see [R] biprobit and [R] heckprob). However, both of these methods are
inefficient, and the biprobit command is restrictive in that it assumes equality of the
coefficients in the outcome equations for both treatment regimes. In addition, these
approaches require potentially cumbersome adjustments to derive consistent standard
errors. The switch_probit command, on the other hand, implements the full-information
ML method to simultaneously estimate the binary selection and the binary outcome
parts of the model and to yield consistent standard errors of the estimates. This approach
relies on the assumption of joint normality of the error terms in the selection and outcome
equations. The switch_probit command also derives the average treatment effects,
including the average effects of treatment on the treated and on the untreated, and the
marginal treatment effects.
2 Model
Consider a model that describes the behavior of an agent with two binary outcome
equations and a criterion function Ti that determines which regime the agent faces.
Ti can be interpreted as a treatment. A motivating example is the effect of husband
migration on wife’s LFP. Here the treatment (migration of a husband) and the outcome
(whether the wife works outside the home) can take one of the two potential values:
Observed $y_i$ is defined as
$$
y_i = \begin{cases} y_{1i} & \text{if } T_i = 1 \\ y_{0i} & \text{if } T_i = 0 \end{cases}
$$
where $y^*_{1i}$ and $y^*_{0i}$ are the latent variables (wife’s propensity for LFP) that determine
the observed binary outcomes $y_1$ and $y_0$ (whether the wife works or not); $X_1$ and $X_0$
are vectors of weakly exogenous variables; $Z$ is a vector of variables that determines a
switch between the regimes; $\beta_1$, $\beta_0$, and $\gamma$ are vectors of parameters; and $\mu_i$, $\varepsilon_{1i}$, and
$\varepsilon_{0i}$ are the error terms. Assume that $\mu_i$, $\varepsilon_{1i}$, and $\varepsilon_{0i}$ are jointly normally distributed,
with a mean-zero vector and correlation matrix
$$
\Omega = \begin{pmatrix} 1 & \rho_0 & \rho_1 \\ & 1 & \rho_{10} \\ & & 1 \end{pmatrix} \tag{4}
$$
where $\rho_0$ and $\rho_1$ are the correlations between $\varepsilon_0$, $\mu$ and $\varepsilon_1$, $\mu$, and $\rho_{10}$ is the correlation
between $\varepsilon_0$ and $\varepsilon_1$.
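For reference, a system consistent with the definitions above can be written as follows. This is a reconstruction from the surrounding text rather than a verbatim restatement; the numbering assumes, as in the later sections, that (1) is the selection equation and that (2) and (3) are the outcome equations in regimes 1 and 0, and $I(\cdot)$ denotes an indicator function.

```latex
\begin{align}
T_i &= 1 \ \text{if } Z_i\gamma + \mu_i > 0, \qquad T_i = 0 \ \text{otherwise} \tag{1}\\
y^*_{1i} &= X_{1i}\beta_1 + \varepsilon_{1i}, \qquad y_{1i} = I(y^*_{1i} > 0) \tag{2}\\
y^*_{0i} &= X_{0i}\beta_0 + \varepsilon_{0i}, \qquad y_{0i} = I(y^*_{0i} > 0) \tag{3}
\end{align}
```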
Because y1i and y0i are never observed simultaneously, the joint distribution of
(ε0 , ε1 ) is not identified, and consequently, ρ10 cannot be estimated. We assume that
ρ10 = 1 (γ is estimable only up to a scalar factor). This model is identified by nonlin-
earities of its functional form. The log-likelihood function for the simultaneous system
of equations [(1)–(3)] is
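$$
\begin{aligned}
\ln L ={}& \sum_{T_i \neq 0,\, y_i \neq 0} w_i \ln\{\Phi_2(X_{1i}\beta_1,\, Z_i\gamma,\, \rho_1)\} \\
&+ \sum_{T_i \neq 0,\, y_i = 0} w_i \ln\{\Phi_2(-X_{1i}\beta_1,\, Z_i\gamma,\, -\rho_1)\} \\
&+ \sum_{T_i = 0,\, y_i \neq 0} w_i \ln\{\Phi_2(X_{0i}\beta_0,\, -Z_i\gamma,\, -\rho_0)\} \\
&+ \sum_{T_i = 0,\, y_i = 0} w_i \ln\{\Phi_2(-X_{0i}\beta_0,\, -Z_i\gamma,\, \rho_0)\}
\end{aligned}
$$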
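The four terms of the log likelihood can be evaluated directly with Stata's binormal() function, which returns the bivariate normal cumulative distribution. The do-file below is a minimal sketch, not the code used by the command: it simulates data from a hypothetical data-generating process (all variable names, coefficients, and correlations are assumptions made for illustration, and weights are set to 1) and sums the unweighted log-likelihood contributions at the true parameter values.

```stata
* Sketch only: hypothetical data-generating process and parameter values
clear all
set seed 20110
set obs 5000
matrix C = (1, .4, .3 \ .4, 1, .5 \ .3, .5, 1)   // correlations of (mu, e1, e0)
drawnorm mu e1 e0, corr(C)
generate x   = rnormal()
generate z   = rnormal()
generate zg  = 0.3*x + 1.0*z          // Z*gamma
generate xb1 = -0.2 + 0.8*x           // X1*beta1
generate xb0 =  0.1 + 0.5*x           // X0*beta0
generate byte T = (zg + mu > 0)
generate byte y = cond(T, xb1 + e1 > 0, xb0 + e0 > 0)
scalar rho1 = 0.4                     // corr(e1, mu), matches C
scalar rho0 = 0.3                     // corr(e0, mu), matches C
* One term per (T, y) cell; the signs flip with the observed outcome
generate double lnf = .
replace lnf = ln(binormal( xb1,  zg,  scalar(rho1))) if T == 1 & y == 1
replace lnf = ln(binormal(-xb1,  zg, -scalar(rho1))) if T == 1 & y == 0
replace lnf = ln(binormal( xb0, -zg, -scalar(rho0))) if T == 0 & y == 1
replace lnf = ln(binormal(-xb0, -zg,  scalar(rho0))) if T == 0 & y == 0
summarize lnf, meanonly
display "log likelihood at the true parameters = " r(sum)
```

Each observation contributes exactly one of the four terms, so lnf is nonmissing for every observation once the four replace statements have run.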
After estimating the model’s parameters, the following statistics can be calculated
(Aakvik, Heckman, and Vytlacil 2000):
• The effect of the treatment on the treated, or the expected effect of the treatment
on individuals with observed characteristics x who participated in the program
(TT):1
• The effect of the treatment on the untreated (TU), which is the expected effect of
the treatment on individuals with observed characteristics x who did not partici-
pate in the program:
• The treatment effect (TE), which is the expected effect of the treatment for the
person with observed characteristics x randomly drawn from the population:
• The marginal treatment effect (MTE), which is the effect of the treatment on
individuals with observed characteristics x and unobserved characteristics μ:
• The average treatment effects (ATT, ATU, and ATE) for the corresponding sub-
groups of the population, which can be calculated by averaging (6) through (8)
over the observations in the subgroups. For example, the average treatment effect
on the treated (ATT), which is the mean effect of the treatment on those who
actually participated in the program, is
1. The treatment effect statistics are defined only for the cases when the exogenous variables in (2)
and (3) are the same, in other words, when $X_0 = X_1$. When $X_0 \neq X_1$, the treatment effect
statistics are calculated based on the vector of explanatory variables that is the union of the variables
in $X_0$ and $X_1$. The coefficients corresponding to the variables that were not initially included in
the set of explanatory variables for either of the equations are set to zero.
372 Impact of interventions on discrete outcomes
$$
\mathrm{ATT} = \frac{1}{N_T} \sum_{i=1}^{N_T} \mathrm{TT}(x_i)
$$
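The treatment effect statistics listed above follow from the joint normality assumption. The expressions below are a reconstruction under that assumption, written to agree term by term with the log-likelihood function and with (15); the numbering (6) through (9) follows the references in the postestimation section, $F$ denotes the standard normal cumulative distribution function as in (15), and $\bar{\mu}$ is a notational assumption for a given value of the unobserved characteristic.

```latex
\begin{align}
\mathrm{TT}(x) &= \frac{\Phi_2(X_1\beta_1, Z\gamma, \rho_1) - \Phi_2(X_0\beta_0, Z\gamma, \rho_0)}{F(Z\gamma)} \tag{6}\\
\mathrm{TU}(x) &= \frac{\Phi_2(X_1\beta_1, -Z\gamma, -\rho_1) - \Phi_2(X_0\beta_0, -Z\gamma, -\rho_0)}{F(-Z\gamma)} \tag{7}\\
\mathrm{TE}(x) &= F(X_1\beta_1) - F(X_0\beta_0) \tag{8}\\
\mathrm{MTE}(x, \bar{\mu}) &= F\!\left(\frac{X_1\beta_1 + \rho_1\bar{\mu}}{\sqrt{1-\rho_1^2}}\right)
  - F\!\left(\frac{X_0\beta_0 + \rho_0\bar{\mu}}{\sqrt{1-\rho_0^2}}\right) \tag{9}
\end{align}
```

The last line uses the fact that, conditional on $\mu = \bar{\mu}$, each outcome error is normal with mean $\rho_j\bar{\mu}$ and variance $1-\rho_j^2$.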
The probability of being treated and having a positive outcome, that is, the probability for
a husband to migrate and for his wife to work:
The probability of being treated and having a zero outcome, that is, the probability for a
husband to migrate and for his wife not to work:
The probability of not being treated and having a positive outcome, that is, the probability
for a husband not to migrate and for his wife to work:
The probability of not being treated and having a zero outcome, that is, the probability
for a husband not to migrate and for his wife not to work:
$$
\Pr(y = 1 \mid T = 1, X = x) = \frac{\Phi_2(Z\gamma, X_1\beta_1, \rho_1)}{F(Z\gamma)} \tag{15}
$$
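The joint and conditional probabilities described above can be checked numerically at a single point. The sketch below assumes that the four joint probabilities are the four bivariate normal terms that appear in the log-likelihood function and that the conditional probabilities divide them by the probability of the corresponding regime, as in (15). The index values and correlations are hypothetical.

```stata
* Sketch only: hypothetical index values and correlations for one individual
clear all
scalar xb1  = -0.2     // X1*beta1
scalar xb0  =  0.4     // X0*beta0
scalar zg   =  0.3     // Z*gamma
scalar rho1 = -0.3
scalar rho0 = -0.5
scalar p11 = binormal( xb1,  zg,  rho1)   // treated, positive outcome
scalar p10 = binormal(-xb1,  zg, -rho1)   // treated, zero outcome
scalar p01 = binormal( xb0, -zg, -rho0)   // untreated, positive outcome
scalar p00 = binormal(-xb0, -zg,  rho0)   // untreated, zero outcome
scalar psum   = p11 + p10 + p01 + p00
scalar psel   = normal(zg)                // probability of being treated
scalar pcond1 = p11/normal(zg)            // as in (15)
scalar pcond0 = p01/normal(-zg)
display "sum of the four joint probabilities = " psum
display "psel = " psel "   pcond1 = " pcond1 "   pcond0 = " pcond0
```

Because p11 and p10 add up to the probability of treatment and p01 and p00 add up to its complement, the four joint probabilities sum to 1 up to numerical precision.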
pweights, fweights, and iweights are allowed; see [U] 11.1.6 weight.
depvar1 is the binary outcome variable in regime 1, and varlist1 is the set of explanatory
variables in the equation explaining the outcome in regime 1 [equation (2)]. depvar0 and
varlist0 are, correspondingly, the binary outcome variable and the set of explanatory
variables in regime 0 [equation (3)]. depvar_s is the binary dependent variable in the selection
equation (1), and varlist_s is the set of explanatory variables in the selection equation (1).
When the explanatory variables in the binary outcome equations are the same
and there is only one dependent variable, only one equation needs to be specified.
Alternatively, when the exogenous variables differ between the outcome equations [(2) and
(3)] and the dependent variables differ between the two outcome equations, both
equations must be specified.
3.2 Options
select(depvar_s varlist_s) specifies the switching equation (1) for $T_i$. It is an integral
part of the switch_probit estimation and is required. varlist_s must give the full
specification of explanatory variables for the selection equation (1); in other words,
both the instruments and the exogenous variables must be specified in varlist_s.
The instruments help identify the model. If there are no instrumental
variables in the model, the model will be identified by nonlinearities.
noconstant suppresses the constant terms.
offset1(varname), offset0(varname), and offset_s(varname) include varname in
the corresponding equation with its coefficient constrained to 1.
For more information, see [R] estimation options.
constraints(numlist | matname) applies linear constraints to the fitted model.
collinear keeps collinear variables in the equations. By default, only noncollinear
explanatory variables are used.
2. The syntax of switch_probit is similar to the syntax of the movestay command (Lokshin and Sajaia
2004).
4 Postestimation
The predict command can follow switch_probit to calculate the predictive statistics.
The statistics can be calculated both in and out of sample; type “predict ... if e(sample)
...” to generate statistics for observations in the estimation sample only.
predict [type] newvar [if] [in] [, statistic]
One of the following statistics may be specified with the predict command after
switch_probit:
p11, the default, calculates the probability of being treated and having a positive out-
come [equation (11)].
p10 calculates the probability of being treated and having a zero outcome [equation
(12)].
p01 calculates the probability of not being treated and having a positive outcome [equa-
tion (13)].
p00 calculates the probability of not being treated and having a zero outcome [equation
(14)].
psel calculates the probability of being treated [equation (10)].
pcond1 calculates the probability of a positive outcome conditional on being treated
[equation (15)].
pcond0 calculates the probability of a positive outcome conditional on not being treated
[equation (16)].
zb calculates the probit linear prediction for the selection equation.
xb1 calculates the linear prediction based on the coefficients of the outcome equation in
regime 1.
xb0 calculates the linear prediction based on the coefficients of the outcome equation in
regime 0.
stdpsel calculates the standard error of the linear prediction of the selection equation.
stdp1 calculates the standard error of the linear prediction of regime 1.
stdp0 calculates the standard error of the linear prediction of regime 0.
tt calculates the treatment effect on the treated [equation (6)].
tu calculates the treatment effect on the untreated [equation (7)].
te calculates the treatment effect [equation (8)].
mte calculates the marginal treatment effect [equation (9)].
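As an illustration of the statistics listed above, the following sketch assumes that switch_probit is installed and that the model from the example in section 5 has just been fit, so that the treatment indicator is migrant. The new variable names are hypothetical; the statistic names are those listed above.

```stata
* Sketch only: run after fitting the model with switch_probit
predict p_sel  if e(sample), psel      // probability of being treated
predict p_c1   if e(sample), pcond1    // Pr(positive outcome | treated)
predict p_c0   if e(sample), pcond0    // Pr(positive outcome | not treated)
predict te_hat if e(sample), te        // treatment effect
predict tt_hat if e(sample), tt        // effect on the treated
predict tu_hat if e(sample), tu        // effect on the untreated
* Averaging over the relevant subgroup gives ATE, ATT, and ATU
summarize te_hat
summarize tt_hat if migrant == 1
summarize tu_hat if migrant == 0
```

Restricting each prediction to e(sample) keeps the averages on the estimation sample, which matters when the dataset contains observations that were dropped during estimation.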
5 Example
We illustrate the use of the switch_probit command by looking at the problem of
estimating the impact of a husband’s migration on his wife’s LFP. A typical empirical
specification for such a model might be the following:
Here $M_i^*$ is a latent continuous variable that determines the propensity of a husband
to migrate; $\mathrm{LFP}^*_{i0}$ and $\mathrm{LFP}^*_{i1}$ are the latent continuous propensities of a wife to work outside
the home if her husband migrates (subscript 1) or stays home (subscript 0); $Z_i$ is
a vector of characteristics that influences the migration decision; and $X_i$ is a vector of
characteristics that is thought to influence the wife’s LFP decision. $\beta_1$, $\beta_0$, and $\gamma$ are
vectors of parameters, and $\mu_i$, $\varepsilon_{1i}$, and $\varepsilon_{0i}$ are the disturbance terms. The observed wife’s
LFP, $W_i$, is a dichotomous realization of the latent variable $\mathrm{LFP}^*_{i1}$ if a husband migrates and
of the latent variable $\mathrm{LFP}^*_{i0}$ if he does not migrate.
The assumption that is often made in this type of model is that the wife’s decision
to participate in the labor market is endogenous to her husband’s migration decision.
Some unobserved characteristics that influence the probability of a husband to migrate
could also influence the decision of his wife to work or not. Neglecting these selectivity
effects is likely to produce biased estimates of the impact of the husband’s migration on
the wife’s LFP. The simultaneous ML estimation of (17), (18), and (19) with the proper
instrumentation of the migration decision might correct such a bias.
The data for this example are a nonrandom subsample of the data from the
2004 round of the Nepal Living Standards Survey (see, for example, Lokshin and Glinskaya
[2009]). The migration indicator migrant takes the value 1 if the husband migrates and
0 if he stays in the native country. The dependent variables in the wife’s LFP equations,
(19), are binary indicators of whether a wife works if her husband migrates (works_1)
or whether she works if her husband stays (works_0). The set of exogenous variables in
the LFP regressions (19) includes such characteristics of the wife as her age, age squared,
educational dummies (wedu_2–wedu_5), and regional dummies (reg_2–reg_6). The omitted
category for the educational dummies is “illiterate”, and higher-index dummies correspond
to higher levels of the wife’s education. In addition to these variables, the migration
equation (17) includes an instrument, pmigrants, to improve identification. The proportion
of migrants in a ward is believed to influence the husband’s migration decision but not
to affect the wife’s LFP decision.
The ML estimation of this specification using the switch_probit command on
switch_probit_example.dta is shown below:
. use switch_probit_example
. switch_probit works age age2 wedu_2-wedu_5 hhsize hhsize2 reg_*,
> select(migrant age age2 wedu_2-wedu_5 hhsize hhsize2 reg_* pmigrants)
             Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
migrant
age -.053828 .0101656 -5.30 0.000 -.0737521 -.0339039
age2 .0738756 .0133015 5.55 0.000 .0478051 .0999461
wedu_2 .0451867 .0639763 0.71 0.480 -.0802046 .170578
wedu_3 .1416557 .0674213 2.10 0.036 .0095124 .2737991
wedu_4 .197108 .0611319 3.22 0.001 .0772916 .3169243
wedu_5 .0142019 .0994006 0.14 0.886 -.1806197 .2090234
hhsize -.0556154 .0140021 -3.97 0.000 -.0830591 -.0281718
hhsize2 .0013728 .000646 2.13 0.034 .0001068 .0026389
reg_2 .7102527 .082358 8.62 0.000 .548834 .8716713
reg_3 .873601 .0915923 9.54 0.000 .6940835 1.053119
reg_4 .6790904 .0865635 7.84 0.000 .509429 .8487518
reg_5 .7782603 .0931983 8.35 0.000 .5955949 .9609257
reg_6 .9390181 .0849033 11.06 0.000 .7726107 1.105425
pmigrants .9877826 .1358246 7.27 0.000 .7215712 1.253994
_cons -.2400678 .2142056 -1.12 0.262 -.659903 .1797675
works_1
age .1251623 .0247011 5.07 0.000 .0767491 .1735755
age2 -.1696758 .0327141 -5.19 0.000 -.2337943 -.1055572
wedu_2 -.2971512 .154475 -1.92 0.054 -.5999166 .0056142
wedu_3 -.0435156 .1531251 -0.28 0.776 -.3436352 .2566041
wedu_4 -.0806875 .141102 -0.57 0.567 -.3572423 .1958673
wedu_5 .4507575 .2118544 2.13 0.033 .0355305 .8659846
hhsize -.1056211 .0539037 -1.96 0.050 -.2112704 .0000283
hhsize2 .0022055 .0033594 0.66 0.511 -.0043789 .0087898
reg_2 -.2725182 .2955016 -0.92 0.356 -.8516906 .3066542
reg_3 -.7467022 .3581774 -2.08 0.037 -1.448717 -.0446874
reg_4 -.4249299 .2932972 -1.45 0.147 -.9997818 .1499221
reg_5 -.3665319 .3253127 -1.13 0.260 -1.004133 .2710693
reg_6 -.1816081 .3456906 -0.53 0.599 -.8591491 .495933
_cons -2.041429 .7257811 -2.81 0.005 -3.463934 -.618924
works_0
age .0888403 .0130782 6.79 0.000 .0632076 .1144731
age2 -.1230907 .0174413 -7.06 0.000 -.157275 -.0889063
wedu_2 .1174843 .077252 1.52 0.128 -.0339267 .2688954
wedu_3 .0595546 .0853855 0.70 0.486 -.1077979 .2269072
wedu_4 .0320592 .0765806 0.42 0.675 -.1180359 .1821544
wedu_5 .2318881 .0982554 2.36 0.018 .039311 .4244652
hhsize -.0530284 .0204413 -2.59 0.009 -.0930925 -.0129642
hhsize2 .0011186 .0009241 1.21 0.226 -.0006926 .0029297
The results of the husband’s migration equation are reported in the section of the
output headed “migrant”. The results of the wife’s LFP equation in the regime where
her husband migrates are reported in the “works_1” section, and those of the wife’s LFP equation
in the regime where her husband stays are reported in the “works_0” section.
The parameters /athrho1 and /athrho0 are ancillary parameters used in the ML
procedure. They are the transformations of the correlation coefficients
as in (5).
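The back-transformation from an ancillary parameter to a correlation coefficient can be sketched as follows. The sketch assumes that (5) is the inverse hyperbolic tangent transformation, athrho = (1/2) ln{(1 + ρ)/(1 − ρ)}, which is the parameterization used by Stata commands such as biprobit and heckprob; the numeric value is hypothetical and is not taken from the output above.

```stata
* Sketch only: hypothetical value of an estimated ancillary parameter
scalar athrho = -0.45
scalar rho    = tanh(athrho)          // maps the real line into (-1, 1)
display "rho = " rho
display "check: atanh(rho) = " atanh(rho)
```

Estimating the transformed parameter keeps the implied correlation strictly inside (−1, 1) without imposing bounds on the optimizer.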
The correlation coefficients rho1 and rho0 are both negative, but only rho0, the
correlation between the error terms in the equation determining the husband’s
migration and the wife’s LFP equation if her husband stays home, is significant.
The likelihood-ratio test for joint independence of the equations is reported in the
last line of the output. The test rejects the H0 that $\rho_0 = \rho_1 = 0$ at the 10% level: Prob > $\chi^2$ = 0.08.
We can now derive the effect of a husband’s migration on his wife’s LFP by inter-
preting migration as a treatment, (6), using the predict command:
. predict tt, tt
. summarize tt if (migrant == 1)
Variable Obs Mean Std. Dev. Min Max
The baseline data-generating process of (1) through (4) has the following form:
We conduct Monte Carlo simulations for four scenarios on samples of 10,000 ob-
servations with 1,000 repetitions. In all simulations, we show the ratio of γi to βi
(i = 1, . . . , 3) to ensure the comparability of the estimation results across different
model specifications.
In the first scenario, shocks μi , ε1i , and ε0i are generated as standard trivariate
normal; the instrument z is excluded from the outcome equations. The estimates of the
ratios correspond well to the true coefficients.
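A data-generating process of the kind used in the first scenario can be sketched in a few lines. The coefficients and correlations below are hypothetical stand-ins and are not the values in (20). Because both potential outcomes are generated, the simulated average treatment effects are available for comparison with estimated ones.

```stata
* Sketch only: hypothetical coefficients and correlations
clear all
set seed 2011
set obs 10000
matrix C = (1, .4, .3 \ .4, 1, .5 \ .3, .5, 1)   // correlations of (mu, e1, e0)
drawnorm mu e1 e0, corr(C)                       // standard trivariate normal shocks
generate x = rnormal()
generate z = rnormal()                           // instrument: selection equation only
generate byte T  = (0.3*x + 1.0*z + mu > 0)      // selection
generate byte y1 = (-0.2 + 0.8*x + e1 > 0)       // potential outcome in regime 1
generate byte y0 = ( 0.1 + 0.5*x + e0 > 0)       // potential outcome in regime 0
generate byte y  = cond(T, y1, y0)               // only one regime is observed
generate d = y1 - y0                             // individual treatment effect
summarize d                                      // simulated ATE
summarize d if T == 1                            // simulated ATT
```

In an estimation step, only y, T, x, and z would be passed to the estimator; y1, y0, and d exist only because the data are simulated.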
In the second scenario, with the same error distribution, we add instrument z into the
outcome equations. In this specification, the model is identified through nonlinearities
of the functional form. The estimated ratios of coefficients are still close to the true
ratios. The coefficient on the instrumental variable is insignificant in both outcome
equations. The standard errors of the estimates in the outcome equations are larger
compared with the instrument-identified specification. Note that Wald tests at the 5%
level always reject the true null hypothesis for 5% of the parameter estimates. The weak
identification offered by the functional form alone makes the large-sample properties
of the estimator worthless in this case.
In the third scenario, the errors are $\chi^2$ distributed and instrument z is excluded from
the outcome equations. The estimated coefficients are now farther from the true
coefficients than in the first scenario.
Comparable results are observed in the fourth scenario with nonnormal errors, al-
though the precision of the estimates deteriorates significantly in this case. The simu-
lation based on different functional forms for the nonnormal distribution of the shocks
in (20) produces similar estimates.
The results of our simulations indicate that the estimator described in this paper is
relatively robust in terms of identification of the model. These findings are consistent
with conclusions of Wilde (2000) that “in recursive multiple-equation probit models with
endogenous dummy regressors no exclusion restrictions for the exogenous variables are
needed if there is sufficient variation in the data”.
We also evaluate the performance of our estimator in terms of predicting the ATE
and ATT effects. The data-generating process in these simulations is similar to the data-
generating process described in (20), but in addition to presenting the results based on
10,000 observations, we generate ATT and ATE for the sample sizes ranging from 200 to
30,000 observations.
Figure 1 shows the results of Monte Carlo simulations of ATE and ATT for the spec-
ification with normally distributed error terms and 1,000 repetitions. The simulations
demonstrate a good performance of the ML algorithm described in this paper when the
errors in (1), (2), and (3) are jointly normally distributed. Even for smaller sample
sizes, the method produces efficient and unbiased estimates of ATE and ATT effects.
[Figure 1: two panels; series: true effect, mean estimated effect, and 95% confidence interval; horizontal axis: observations (000’s)]
Figure 1. The results of Monte Carlo simulations (1,000 repetitions) of ATE and ATT
effects; specification with normally distributed errors
Figure 2 presents the results of Monte Carlo simulations of ATE and ATT for the
specification where the error terms are nonnormally distributed. The violation of the
normality assumption results in biased estimates for both ATE and ATT effects. The
bias is larger for estimations based on smaller sample sizes.
[Figure 2: two panels; series: true effect, mean estimated effect, and 95% confidence interval; horizontal axis: observations (000’s)]
Figure 2. The results of Monte Carlo simulations (1,000 repetitions) of ATE and ATT
effects; specification with nonnormally distributed errors
Our final simulation results examine the validity of confidence intervals depending
on the assumptions about the joint distribution of the error terms in (20). Figure 3
shows the coverage rates for ATE and ATT effects. The coverage rates are constructed
for the specifications with normally (solid line) and nonnormally distributed error terms
(dotted line) in (20). The simulations are based on bootstrap estimates of the
confidence intervals for ATE and ATT effects, with 1,000 bootstrap replications for each of
1,000 Monte Carlo repetitions (that is, 1,000,000 ML model estimations for each sample size). The
$1 - \alpha$ confidence interval is reported as the interval between the $\alpha/2$ and $1 - \alpha/2$ quantiles of
the simulated draws of ATE and ATT.
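The percentile construction of the interval can be sketched with _pctile. The draws below are random numbers standing in for 1,000 bootstrap estimates of ATE; their mean and spread are hypothetical and carry no empirical meaning.

```stata
* Sketch only: simulated stand-in for 1,000 bootstrap draws of an ATE estimate
clear all
set seed 42
set obs 1000
generate ate_draw = -0.15 + 0.02*rnormal()
* For alpha = 0.05, take the 2.5th and 97.5th percentiles of the draws
_pctile ate_draw, percentiles(2.5 97.5)
display "95% percentile interval: [" r(r1) ", " r(r2) "]"
```

A coverage rate then follows by repeating this over the Monte Carlo repetitions and recording how often the interval contains the true effect.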
[Figure 3: coverage rate for 95% confidence interval against the number of observations; panels: ATE (left) and ATT (right); lines: normal and nonnormal errors]
The left panel of figure 3 shows the coverage rates for 95% confidence intervals of
ATE for samples of different sizes. The coverage rates for the confidence intervals
estimated with normally distributed errors are close to the nominal 95%. The
coverage rates for the nonnormal specification show undercoverage that increases
with the sample size. For example, while the coverage rates for samples of up to 2,000
observations are close to the nominal 95%, the coverage rates drop to about 54% for
the simulations based on the sample with 30,000 observations. This decline in the
coverage rates for the nonnormal specification is consistent with the estimation bias and
the narrower confidence intervals shown in figure 2.
The right panel of figure 3 presents the coverage rates for 95% confidence intervals of
ATT estimates. Similarly to ATE, the coverage rates for ATT estimated under normality
assumptions are close to nominal. For the simulations based on nonnormal errors, the
undercoverage is more severe compared with ATE: the 95% coverage rates decline rapidly
from about 90% for the small samples to 0% for the samples of 15,000 observations
and larger. Again, these results are consistent with the bias and pattern of confidence
intervals for ATT shown in the right panel of figure 2.
7 Conclusion
This article describes a Stata implementation of an ML estimator for the parameters
of a binary response model with endogenous switching. The switch_probit command
extends the set of Stata ML algorithms for estimation of the models with endogenous
switching (for example, movestay by Lokshin and Sajaia [2004]). We think that the
ability of the new command to produce estimates of the treatment impact for different
population subgroups could be useful in applied studies of impact evaluation.
The results of our Monte Carlo simulations indicate that while the estimator per-
forms well under the assumption of normally distributed error terms, it produces biased
estimates if the normality assumptions are violated. Researchers who suspect that the
normality assumptions are not likely to hold might want to use other, semiparametric
or nonparametric, methods of estimation for such models.
8 References
Aakvik, A., J. J. Heckman, and E. J. Vytlacil. 2000. Treatment effects for discrete
outcomes when responses to treatment vary among observationally identical persons:
An application to Norwegian vocational rehabilitation programs. NBER Technical
Working Paper No. 262. http://www.nber.org/papers/t0262.
Carrasco, R. 2001. Binary choice with binary endogenous regressors in panel data:
Estimating the effect of fertility on female labor participation. Journal of Business
and Economic Statistics 19: 385–394.
Lokshin, M., and E. Glinskaya. 2009. The effect of male migration on employment
patterns of women in Nepal. World Bank Economic Review 23: 481–507.
Lokshin, M., and Z. Sajaia. 2004. Maximum likelihood estimation of endogenous switch-
ing regression models. Stata Journal 4: 282–289.