SML and Probit in STATA
1 Introduction
The parameters of discrete-choice models are typically estimated by maximum like-
lihood (ML) after imposing assumptions on the distribution of the underlying error
terms. If the distributional assumptions are correctly specified, then parametric ML
estimators are known to be consistent and asymptotically efficient. However, as dis-
cussed at length in the semiparametric literature, departures from the distributional
assumptions may lead to inconsistent estimation. This problem has motivated the de-
velopment of several semiparametric estimation procedures which consistently estimate
the model parameters under less restrictive distributional assumptions. Semiparamet-
ric estimation of binary-choice models has been considered by Manski (1975); Cosslett
(1983); Gallant and Nychka (1987); Powell, Stock, and Stoker (1989); Horowitz (1992);
Ichimura (1993); and Klein and Spady (1993), among others.
In this article, I discuss the semi-nonparametric (SNP) approach of Gallant and Nychka (1987), the semiparametric maximum likelihood (SML) approach of Klein and Spady (1993), and a set of new Stata commands for semiparametric estimation of univariate and bivariate binary-choice models. The SNP approach of Gallant and Nychka (1987), originally proposed for estimation of density functions, was adapted to estimation of univariate and bivariate binary-choice models by Gabler, Laisney, and Lechner (1993) and De Luca and Peracchi (2007), respectively.1 A generalization of the SML estimator of Klein and Spady (1993) for semiparametric estimation of bivariate binary-choice models with sample selection was provided by Lee (1995). The SNP and SML approaches differ from the parametric approach because they can handle a broader class of error distributions. They differ from each other in how they approximate the unknown distributions: the SNP approach uses a flexible functional form, while the SML approach uses kernel functions.
© 2008 StataCorp LP st0144
G. De Luca 191
The remainder of the article is organized as follows. In section 2, I briefly review
parametric specification and ML estimation of three binary-choice models of interest.
SNP and SML estimation procedures, the underlying identifiability restrictions, and the
asymptotic properties of the corresponding estimators are discussed in sections 3 and 4,
respectively. Section 5 describes the syntax and the options of the Stata commands,
while section 6 provides some examples. Monte Carlo evidence on the small-sample
performance of the SNP and SML estimators relative to the parametric probit estimator
is presented in section 7.
2 Parametric ML estimation
A univariate binary-choice model is a model for the conditional probability of a binary
indicator. This model is typically represented by the following threshold crossing model,
$$Y^* = \alpha + \beta^\top X + U \qquad (1)$$
$$Y = 1(Y^* \ge 0) \qquad (2)$$
where Y* is a latent continuous random variable, X is a k vector of exogenous variables,
θ = (α, β) is a (k + 1) vector of unknown parameters, and U is a latent regression error.
The latent variable Y* is related to its observable counterpart Y through the observation
rule (2), where 1(A) is the indicator function of the event A. If the latent regression
error U is assumed to follow a standardized Gaussian distribution, then model (1)–(2)
is known as a probit model.2 In this case, the log-likelihood function for a random
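sample of n observations (Y1, X1), ..., (Yn, Xn) is of the form
$$L(\theta) = \sum_{i=1}^{n} \left[ Y_i \ln \pi_i(\theta) + (1 - Y_i) \ln\{1 - \pi_i(\theta)\} \right] \qquad (3)$$
where $\pi_i(\theta) = \Pr(Y_i = 1 \mid X_i) = \Phi(\alpha + \beta^\top X_i)$ and Φ(·) is the standardized Gaussian distribution function.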
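To make (3) concrete, the following Python sketch evaluates the probit log-likelihood for given parameter values, using π_i(θ) = Φ(α + β'X_i). The function names are hypothetical and chosen for illustration; in Stata the model itself is fitted by the probit command.

```python
import math


def norm_cdf(z):
    """Standardized Gaussian distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_loglik(alpha, beta, y, X):
    """Probit log-likelihood (3).

    alpha: intercept; beta: list of k slopes;
    y: list of 0/1 outcomes; X: list of covariate lists (one per observation).
    No numerical safeguards: each pi_i must stay strictly inside (0, 1).
    """
    total = 0.0
    for yi, xi in zip(y, X):
        mu = alpha + sum(b * x for b, x in zip(beta, xi))
        pi = norm_cdf(mu)  # pi_i(theta) = Pr(Y_i = 1 | X_i)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


y = [1, 0, 1]
X = [[0.5], [-1.0], [2.0]]
print(probit_loglik(0.0, [0.0], y, X))  # every pi_i = 0.5, so this is 3*ln(0.5)
print(probit_loglik(0.0, [1.0], y, X))  # a better fit gives a larger value
```

An ML estimator maximizes this function over (α, β); any numerical optimizer could be wrapped around it.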
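A bivariate binary-choice model is a model for the joint conditional probabilities of two binary indicators Y1 and Y2. It is typically represented by the pair of threshold crossing equations
$$Y_j^* = \alpha_j + \beta_j^\top X_j + U_j, \quad j = 1, 2 \qquad (4)$$
$$Y_j = 1(Y_j^* \ge 0), \quad j = 1, 2 \qquad (5)$$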
where the Yj* are latent variables for which only the binary indicators Yj can be observed,
the Xj are kj vectors of (not necessarily distinct) exogenous variables, the θj = (αj, βj)
are (kj + 1) vectors of unknown parameters, and the Uj are latent regression errors.
When U1 and U2 have a bivariate Gaussian distribution with zero means, unit vari-
ances, and correlation coefficient ρ, model (4)–(5) is known as a bivariate probit model.
Because Y1 and Y2 are fully observable, the vectors of parameters θ 1 and θ 2 can always
be estimated consistently by separate estimation of two univariate probit models, one
for Y1 and one for Y2 . However, when the correlation coefficient ρ is different from zero,
it is more efficient to estimate the two equations jointly by maximizing the log-likelihood
function
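$$L(\theta) = \sum_{i=1}^{n} \big[ Y_{i1} Y_{i2} \ln \pi_{i11}(\theta) + Y_{i1}(1 - Y_{i2}) \ln \pi_{i10}(\theta) + (1 - Y_{i1}) Y_{i2} \ln \pi_{i01}(\theta) + (1 - Y_{i1})(1 - Y_{i2}) \ln \pi_{i00}(\theta) \big] \qquad (6)$$
where θ = (θ1, θ2, ρ), and the probabilities underlying the four possible realizations of the two binary indicators Y1 and Y2 are given by3
$$\pi_{11}(\theta) = \Phi_2(\mu_1, \mu_2; \rho), \qquad \pi_{10}(\theta) = \Phi_2(\mu_1, -\mu_2; -\rho)$$
$$\pi_{01}(\theta) = \Phi_2(-\mu_1, \mu_2; -\rho), \qquad \pi_{00}(\theta) = \Phi_2(-\mu_1, -\mu_2; \rho)$$
where Φ2(·, ·; ρ) is the bivariate Gaussian distribution function with zero means, unit variances, and correlation coefficient ρ, and $\mu_j = \alpha_j + \beta_j^\top X_j$. An ML estimator $\hat\theta$ maximizes the log-likelihood function (6) over the parameter space $\Theta = \mathbb{R}^{k_1 + k_2 + 2} \times (-1, 1)$.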
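The bivariate probit cell probabilities can be checked numerically. The sketch below is a minimal illustration, assuming the standard identities π11 = Φ2(μ1, μ2; ρ), π10 = Φ2(μ1, −μ2; −ρ), π01 = Φ2(−μ1, μ2; −ρ), and π00 = Φ2(−μ1, −μ2; ρ). It evaluates Φ2 by one-dimensional Simpson quadrature of Φ2(a, b; ρ) = ∫ from −∞ to a of φ(x) Φ{(b − ρx)/√(1 − ρ²)} dx, a choice made here only to keep the code self-contained (it is not the method used by biprobit); all function names are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def biv_norm_cdf(a, b, rho, n=2000, lower=-10.0):
    """Phi2(a, b; rho) for |rho| < 1 via Simpson's rule (n must be even).

    Uses Pr(U1 <= a, U2 <= b) = int_{-inf}^{a} phi(x) Pr(U2 <= b | U1 = x) dx,
    where U2 | U1 = x is Gaussian with mean rho*x and variance 1 - rho^2.
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def biprobit_cell_probs(mu1, mu2, rho):
    """Probabilities of (Y1, Y2) = (1,1), (1,0), (0,1), (0,0)."""
    p11 = biv_norm_cdf(mu1, mu2, rho)
    p10 = biv_norm_cdf(mu1, -mu2, -rho)
    p01 = biv_norm_cdf(-mu1, mu2, -rho)
    p00 = biv_norm_cdf(-mu1, -mu2, rho)
    return p11, p10, p01, p00


print(biprobit_cell_probs(0.3, -0.7, 0.5))  # four probabilities summing to 1
```

The check Φ2(0, 0; ρ) = 1/4 + arcsin(ρ)/(2π) is a convenient way to validate any implementation of Φ2.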
Consider a bivariate binary-choice model with sample selection where the indicator
Y1 is always observed, while the indicator Y2 is assumed to be observed only for the
subsample of n1 observations (with n1 < n) for which Y1 = 1. The model can be written
as
$$Y_j^* = \alpha_j + \beta_j^\top X_j + U_j, \quad j = 1, 2 \qquad (7)$$
$$Y_1 = 1(Y_1^* \ge 0) \qquad (8)$$
$$Y_2 = 1(Y_2^* \ge 0) \quad \text{if } Y_1 = 1 \qquad (9)$$
3. In the following, the suffix i and the explicit conditioning on the vectors of covariates X1 and X2
are suppressed to simplify notation.
When the latent regression errors U1 and U2 have a bivariate Gaussian distribution
with zero means, unit variances, and correlation coefficient ρ, model (7)–(9) is known
as a bivariate probit model with sample selection. Unlike the case of full observability,
the presence of sample selection has two important implications. First, ignoring the
potential correlation between the two latent regression errors may lead to inconsistent
estimates of θ 2 = (α2 , β 2 ) and inefficient estimates of θ 1 = (α1 , β 1 ). Second, identifi-
ability of the model parameters requires imposing at least one exclusion restriction on
the two sets of exogenous covariates X 1 and X 2 (Meng and Schmidt 1985). Construc-
tion of the log-likelihood function for joint estimation of the overall vector of model
parameters θ = (θ 1 , θ 2 , ρ) is straightforward after noticing that the data identify only
three possible events: (Y1 = 1, Y2 = 1), (Y1 = 1, Y2 = 0), and (Y1 = 0). Thus the
log-likelihood function for a random sample of n observations is
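$$L(\theta) = \sum_{i=1}^{n} \big[ Y_{i1} Y_{i2} \ln \pi_{i11}(\theta) + Y_{i1}(1 - Y_{i2}) \ln \pi_{i10}(\theta) + (1 - Y_{i1}) \ln \pi_{i0}(\theta_1) \big] \qquad (10)$$
where π_{i11}(θ) and π_{i10}(θ) are defined as in the bivariate probit model and $\pi_{i0}(\theta_1) = \Phi(-\mu_{i1}) = 1 - \Phi(\mu_{i1})$ is the probability that Y1 = 0.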
3 SNP estimation
The basic idea of SNP estimation is to approximate the unknown densities of the latent
regression errors by Hermite polynomial expansions and use the approximations to
derive a pseudo-ML estimator for the model parameters. Once we relax the Gaussian
distributional assumption, a semiparametric specification of the likelihood function is
needed. For the three binary-choice models considered in this article, semiparametric
specifications of the log-likelihood functions have the same form as (3), (6), and (10),
respectively, with the probability functions replaced by4
$$\pi_{11}(\theta_1, \theta_2) = 1 - F_1(-\mu_1) - F_2(-\mu_2) + F(-\mu_1, -\mu_2)$$
$$\pi_{10}(\theta_1, \theta_2) = F_2(-\mu_2) - F(-\mu_1, -\mu_2)$$
$$\pi_{01}(\theta_1, \theta_2) = F_1(-\mu_1) - F(-\mu_1, -\mu_2)$$
$$\pi_{00}(\theta_1, \theta_2) = F(-\mu_1, -\mu_2)$$
where Fj is the unknown marginal distribution function of the latent regression error
Uj , j = 1, 2, and F is the unknown joint distribution function of (U1 , U2 ).5
4. The marginal probability function is defined by π1 (θ1 ) = 1 − F1 (−μ1 ).
5. The probability functions underlying the probit specifications can be easily obtained from these
general expressions by exploiting the symmetry of the Gaussian distribution.
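The general expressions above hold for any continuous error distribution. As a hedged illustration, the sketch below codes them for arbitrary distribution functions F1, F2, and F (note that π01 involves the marginal of U1, π01 = F1(−μ1) − F(−μ1, −μ2)) and checks them on a hypothetical case with independent logistic errors, where F(a, b) = F1(a)F2(b). Function names are hypothetical.

```python
import math


def cell_probs(F1, F2, F, mu1, mu2):
    """Cell probabilities (pi11, pi10, pi01, pi00) of (Y1, Y2).

    F1, F2: marginal distribution functions of U1 and U2;
    F: joint distribution function of (U1, U2).
    Y_j = 1 exactly when U_j >= -mu_j (continuous errors assumed).
    """
    a, b = -mu1, -mu2
    p11 = 1.0 - F1(a) - F2(b) + F(a, b)
    p10 = F2(b) - F(a, b)
    p01 = F1(a) - F(a, b)
    p00 = F(a, b)
    return p11, p10, p01, p00


def logistic_cdf(u):
    return 1.0 / (1.0 + math.exp(-u))


def independent_joint(a, b):
    # Hypothetical example: independent errors, so F(a, b) = F1(a) * F2(b)
    return logistic_cdf(a) * logistic_cdf(b)


probs = cell_probs(logistic_cdf, logistic_cdf, independent_joint, 0.5, -1.0)
print(probs)
```

With independent errors, π11 factors into Pr(Y1 = 1)Pr(Y2 = 1), which gives a simple correctness check.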
194 SNP and SML estimation
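Following Gallant and Nychka (1987), we approximate the unknown joint density, f, of the latent regression errors by a Hermite polynomial expansion of the form
$$f^*(u_1, u_2) = \frac{1}{\psi_R}\, \tau_R(u_1, u_2)^2\, \phi(u_1)\, \phi(u_2) \qquad (11)$$
where φ(·) is the standardized Gaussian density, $\tau_R(u_1, u_2) = \sum_{h=0}^{R_1} \sum_{k=0}^{R_2} \tau_{hk}\, u_1^h u_2^k$ is a polynomial in u1 and u2 of order R = (R1, R2), and
$$\psi_R = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \tau_R(u_1, u_2)^2\, \phi(u_1)\, \phi(u_2)\, du_1\, du_2$$
is the normalization factor that makes f* integrate to one. The coefficients of the squared polynomial, $\tau_R(u_1, u_2)^2 = \sum_{h=0}^{2R_1} \sum_{k=0}^{2R_2} \tau^*_{hk}\, u_1^h u_2^k$, are $\tau^*_{hk} = \sum_{r=a_h}^{b_h} \sum_{s=a_k}^{b_k} \tau_{rs}\, \tau_{h-r,k-s}$, where a_h = max(0, h − R1), a_k = max(0, k − R2), b_h = min(h, R1), and b_k = min(k, R2). Integrating f*(u1, u2) alternatively with respect to u2 and u1 gives the following approximations to the marginal densities f1 and f2
$$f_1^*(u_1) = \int_{-\infty}^{\infty} f^*(u_1, u_2)\, du_2 = \frac{1}{\psi_R} \sum_{h=0}^{2R_1} \sum_{k=0}^{2R_2} \tau^*_{hk}\, m_k\, u_1^h \phi(u_1) = \frac{1}{\psi_R} \sum_{h=0}^{2R_1} \gamma_{1h}\, u_1^h \phi(u_1) \qquad (12)$$
$$f_2^*(u_2) = \int_{-\infty}^{\infty} f^*(u_1, u_2)\, du_1 = \frac{1}{\psi_R} \sum_{h=0}^{2R_1} \sum_{k=0}^{2R_2} \tau^*_{hk}\, m_h\, u_2^k \phi(u_2) = \frac{1}{\psi_R} \sum_{k=0}^{2R_2} \gamma_{2k}\, u_2^k \phi(u_2) \qquad (13)$$
where m_k denotes the kth moment of the standardized Gaussian distribution, $\gamma_{1h} = \sum_{k} \tau^*_{hk} m_k$, and $\gamma_{2k} = \sum_{h} \tau^*_{hk} m_h$.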
6. Further details on the smoothness conditions defining this class of densities can be found in
Gallant and Nychka (1987, 369).
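The construction in (11)–(13) can be sketched in a few lines of Python. Assuming the coefficients τ_hk are stored in an (R1 + 1) × (R2 + 1) nested list, the code forms τ*_hk as the coefficients of τ_R(u1, u2)² (a two-dimensional convolution), computes ψ_R = Σ_h Σ_k τ*_hk m_h m_k, and returns the marginal density f1*. The coefficient values are hypothetical, and no identification restrictions on the τ_hk are imposed. The final lines check numerically that f1* integrates to one.

```python
import math


def gauss_moment(k):
    """k-th moment m_k of the standardized Gaussian: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0.0
    m = 1.0
    for j in range(k - 1, 0, -2):
        m *= j
    return m


def tau_star(tau):
    """Coefficients tau*_{hk} of tau_R(u1, u2)^2 as a (2R1+1) x (2R2+1) array.

    Squaring the polynomial is a 2-D convolution of tau with itself.
    """
    R1, R2 = len(tau) - 1, len(tau[0]) - 1
    ts = [[0.0] * (2 * R2 + 1) for _ in range(2 * R1 + 1)]
    for r in range(R1 + 1):
        for s in range(R2 + 1):
            for p in range(R1 + 1):
                for q in range(R2 + 1):
                    ts[r + p][s + q] += tau[r][s] * tau[p][q]
    return ts


def snp_marginal_density_u1(tau):
    """Return the function f1*(u1) of (12) for a given coefficient array tau."""
    ts = tau_star(tau)
    H, K = len(ts), len(ts[0])
    psi = sum(ts[h][k] * gauss_moment(h) * gauss_moment(k)
              for h in range(H) for k in range(K))
    gamma1 = [sum(ts[h][k] * gauss_moment(k) for k in range(K)) for h in range(H)]

    def f1(u):
        phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        return sum(g * u ** h for h, g in enumerate(gamma1)) * phi / psi

    return f1


# Hypothetical coefficients with R1 = R2 = 1.
f1 = snp_marginal_density_u1([[1.0, 0.2], [0.3, -0.1]])
step = 0.01
area = sum(f1(-12.0 + i * step) for i in range(2401)) * step
print(area)  # numerically close to 1
```

With τ = [[1]] (R1 = R2 = 0), the expansion collapses to the Gaussian density, which is why the probit model is the natural starting point.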
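Integrating f*(u1, u2) with respect to both arguments gives the following approximation to the joint distribution function F,
$$F^*(u_1, u_2) = \Phi(u_1)\Phi(u_2) + \frac{1}{\psi_R} A_1^*(u_1, u_2)\phi(u_1)\phi(u_2) - \frac{1}{\psi_R} A_2^*(u_2)\Phi(u_1)\phi(u_2) - \frac{1}{\psi_R} A_3^*(u_1)\phi(u_1)\Phi(u_2)$$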
Similarly, integrating the marginal densities (12) and (13) gives the following approximations to the marginal distribution functions F1 and F2,
$$F_1^*(u_1) = \Phi(u_1) - \frac{1}{\psi_R} A_3^*(u_1)\phi(u_1)$$
$$F_2^*(u_2) = \Phi(u_2) - \frac{1}{\psi_R} A_2^*(u_2)\phi(u_2)$$
where
$$A_1^*(u_1, u_2) = \sum_{h=0}^{2R_1} \sum_{k=0}^{2R_2} \tau^*_{hk}\, A_h(u_1)\, A_k(u_2)$$
$$A_2^*(u_2) = \sum_{h=0}^{2R_1} \sum_{k=0}^{2R_2} \tau^*_{hk}\, m_h\, A_k(u_2)$$
$$A_3^*(u_1) = \sum_{h=0}^{2R_1} \sum_{k=0}^{2R_2} \tau^*_{hk}\, m_k\, A_h(u_1)$$
and A_h(·) is the polynomial satisfying $\int_{-\infty}^{u} x^h \phi(x)\, dx = m_h \Phi(u) - A_h(u)\phi(u)$.
7. For instance, a univariate SNP model with a single categorical variable X is identified only if X
can take at least (1 + R1 ) different values. A bivariate SNP model with X is identified only if X
can take at least (2 + R1 R2 )/3 different values. A bivariate SNP model with sample selection, in
which X1 and X2 are two distinct categorical variables with p1 and p2 different values, is identified
only if (2 + R1 R2 ) ≤ 2 p1 p2 .
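The distribution functions above rest on the truncated Gaussian moments ∫ from −∞ to u of x^h φ(x) dx = m_h Φ(u) − A_h(u)φ(u). Integration by parts gives the recursion A_0(u) = 0, A_1(u) = 1, and A_h(u) = u^(h−1) + (h − 1)A_(h−2)(u); this recursion is our own derivation, stated as an assumption rather than taken from the snp code. The Python sketch below compares the closed form with Simpson quadrature (function names are hypothetical).

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gauss_moment(k):
    """k-th moment of the standardized Gaussian."""
    if k % 2:
        return 0.0
    m = 1.0
    for j in range(k - 1, 0, -2):
        m *= j
    return m


def A_poly(h, u):
    """A_h(u): A_0 = 0, A_1 = 1, A_h(u) = u**(h-1) + (h-1) * A_{h-2}(u) for h >= 2."""
    if h == 0:
        return 0.0
    if h == 1:
        return 1.0
    return u ** (h - 1) + (h - 1) * A_poly(h - 2, u)


def truncated_moment_closed(h, u):
    """Closed form of int_{-inf}^{u} x**h phi(x) dx = m_h Phi(u) - A_h(u) phi(u)."""
    return gauss_moment(h) * norm_cdf(u) - A_poly(h, u) * norm_pdf(u)


def truncated_moment_numeric(h, u, lower=-12.0, n=4000):
    """Simpson's rule approximation of the same integral (n must be even)."""
    step = (u - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * step
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * x ** h * norm_pdf(x)
    return total * step / 3.0


for h in range(5):
    print(h, truncated_moment_closed(h, 0.8), truncated_moment_numeric(h, 0.8))
```

For example, A_2(u) = u recovers the familiar identity ∫ x²φ(x)dx = Φ(u) − uφ(u) over (−∞, u].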
4 SML estimation
The basic idea of the SML estimation procedure is that of maximizing a pseudo–log-
likelihood function in which the unknown probability functions are locally approximated
by nonparametric kernel estimators.
Consider first SML estimation of a univariate binary-choice model. Before describing
the estimation procedure in detail, we discuss nonparametric identification of the vector
of parameters θ = (α, β). As for the SNP estimation procedure, the intercept coefficient α can be absorbed into the unknown distribution function of the error term and is not separately identified. Furthermore, the slope coefficients β can only be identified up to a scale parameter. In this case, however, the scale normalization must be based on a continuous variable with a nonzero coefficient, and it must be directly imposed on the estimation process.8 Following Pagan and Ullah (1999), these location-scale normalizations can be obtained by imposing the linear index restriction
$$\pi(\theta) = \Pr(Y = 1 \mid X; \theta) = \Pr\{Y = 1 \mid \upsilon(X; \delta)\} = \pi(\delta)$$
where $\upsilon(X; \delta) = X_1 + \delta^\top X_2$, X1 is a continuous variable with a nonzero coefficient, X2 are the other covariates, and δ = (δ2, ..., δk) is the vector of identifiable parameters with δj = βj/β1. The index restriction also reduces the dimension of the covariate space, thereby avoiding the curse of dimensionality.
Under the index restriction, one can use Bayes' theorem to write
$$\pi(\delta) = \frac{P\, g\{\upsilon(X; \delta) \mid Y = 1\}}{P\, g\{\upsilon(X; \delta) \mid Y = 1\} + (1 - P)\, g\{\upsilon(X; \delta) \mid Y = 0\}} \qquad (14)$$
where P = Pr{Y = 1} is the unconditional probability of observing a positive outcome and g(·) is the conditional density of υ(X; δ) given Y. As in Klein and Spady (1993), a nonparametric estimator of g_{1υ}{υ(X; δ)} = P g{υ(X; δ) | Y = 1} in the numerator of (14) is given by
$$\hat g_{1\upsilon}(\upsilon_i; h_n) = \{(n - 1) h_n\}^{-1} \sum_{j \ne i} y_j\, K\!\left(\frac{\upsilon_i - \upsilon_j}{h_n}\right)$$
where the $\upsilon(X_j; \delta_j) = X_{j1} + \delta_j^\top X_{j2}$, j = 1, 2, are linear indexes, the Xj1 are continuous variables with nonzero coefficients, the Xj2 are the remaining covariates, and the δj = (δj2, ..., δjkj) are vectors of identifiable parameters with δjh = βjh/βj1. As argued by Ichimura and Lee (1991), nonparametric identification of a double-index model requires the existence of a distinct continuous variable for each index. Thus, unlike the parametric or the SNP specification of the model, the exclusion restrictions should now include some continuous variables.
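For the univariate case, a minimal Python sketch of the leave-one-out estimator behind (14) follows. It assumes the index values υ_i = X_i1 + δ'X_i2 have already been computed for a trial value of δ and uses a Gaussian kernel K. Because the factor {(n − 1)h_n}^(−1) is common to the estimators of the numerator and the denominator, it cancels in the ratio. The pseudo–log-likelihood simply bounds the estimated probabilities away from 0 and 1; this is a crude placeholder, not the trimming scheme of Klein and Spady (1993) nor the procedure implemented in sml. Function names are hypothetical.

```python
import math


def gauss_kernel(z):
    """Standardized Gaussian kernel."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def loo_probabilities(index, y, h):
    """Leave-one-out kernel estimates of pi(delta) in (14) at each observation.

    index: values of the linear index v_i; y: 0/1 outcomes; h: bandwidth h_n.
    g1 estimates P*g(v | Y=1) and g0 estimates (1-P)*g(v | Y=0).
    Assumes g1 + g0 > 0, i.e. no observation is isolated beyond kernel underflow.
    """
    n = len(index)
    probs = []
    for i in range(n):
        g1 = g0 = 0.0
        for j in range(n):
            if j == i:
                continue  # leave observation i out
            k = gauss_kernel((index[i] - index[j]) / h)
            g1 += y[j] * k
            g0 += (1 - y[j]) * k
        probs.append(g1 / (g1 + g0))
    return probs


def sml_pseudo_loglik(index, y, h, eps=1e-10):
    """Plug the kernel estimates into the form of (3); eps keeps logs finite."""
    total = 0.0
    for yi, p in zip(y, loo_probabilities(index, y, h)):
        p = min(max(p, eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


print(loo_probabilities([0.0, 0.1, 5.0, 5.1], [0, 0, 1, 1], 0.5))
```

An SML estimator would maximize this pseudo–log-likelihood over δ, recomputing the index and the kernel estimates at each trial value, which is why each iteration is costly.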
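Subject to these identifiability restrictions, Bayes' theorem implies that
$$\pi_{1|1}(\delta) = \frac{P_{1|1}\, g(\upsilon \mid Y_1 = 1, Y_2 = 1)}{P_{1|1}\, g(\upsilon \mid Y_1 = 1, Y_2 = 1) + (1 - P_{1|1})\, g(\upsilon \mid Y_1 = 1, Y_2 = 0)} \qquad (16)$$
where n1 is the number of observations in the subsample for which Y1 = 1, and K2(·) is the product of two univariate Gaussian kernels with the same bandwidth h_{n1}. A nonparametric estimator of the density $g_{0\upsilon|1}(\upsilon) = (1 - P_{1|1})\, g(\upsilon \mid Y_1 = 1, Y_2 = 0)$ in the denominator of (16) can be defined in a similar way by replacing y_{2j} with (1 − y_{2j}). As before, these nonparametric estimators differ from those adopted by Lee (1995) only because we use Gaussian kernels instead of bias-reducing kernels. Thus the conditional probability π_{1|1}(δ) is estimated by
$$\hat\pi_{1|1}(\delta) = \frac{\hat g_{1\upsilon|1}(\upsilon)}{\hat g_{1\upsilon|1}(\upsilon) + \hat g_{0\upsilon|1}(\upsilon)}$$
where $\hat g_{1\upsilon|1}$ and $\hat g_{0\upsilon|1}$ are the kernel estimators of P_{1|1} g(υ | Y1 = 1, Y2 = 1) and (1 − P_{1|1}) g(υ | Y1 = 1, Y2 = 0), respectively.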
Lee (1995) shows that, under mild regularity conditions, the resulting SML estimator is
√n-consistent and asymptotically normal. Furthermore, its asymptotic variance is very
close to the efficiency bound of semiparametric estimators for this type of model.
5 Stata commands
5.1 Syntax of SNP commands
The new Stata commands snp, snp2, and snp2s estimate the parameters of the SNP
binary-choice models considered in this article. In particular, snp fits a univariate
binary-choice model, snp2 fits a bivariate binary-choice model, while snp2s fits a bivari-
ate binary-choice model with sample selection. The general syntax of these commands
is as follows:
snp depvar varlist [if] [in] [weight] [, noconstant offset(varname)
    order(#) robust from(matname) dplot(filename) level(#)
    maximize_options]

snp2 equation1 equation2 [if] [in] [weight]
    [, order1(#) order2(#) robust
    from(matname) dplot(filename) level(#) maximize_options]

snp2s depvar varlist [if] [in] [weight],
    select(depvar_s = varlist_s [, offset(varname) noconstant])
    [order1(#) order2(#) robust from(matname) dplot(filename) level(#)
    maximize_options]
snp, snp2, and snp2s are implemented for Stata 9 by using ml model lf. These commands
share the features of all Stata estimation commands, including access to
the estimation results and the options for the maximization process (see [R] maximize).
fweights, pweights, and iweights are allowed (see [U] 14.1.6 weight). Most of the
options are similar to those of other Stata estimation commands. A description of the
options that are specific to our SNP commands is provided below.
order(#) specifies the order R to be used in the univariate Hermite polynomial ex-
pansion. The default is order(3).
order1(#) specifies the order R1 to be used in the bivariate Hermite polynomial ex-
pansion. The default is order1(3).
order2(#) specifies the order R2 to be used in the bivariate Hermite polynomial ex-
pansion. The default is order2(3).
robust specifies that the Huber/White/sandwich estimator of the covariance matrix is
to be used in place of the traditional calculation (see [U] 23.11 Obtaining robust
variance estimates).11
from(matname) specifies the name of the matrix to be used as starting values. By
default, starting values are the estimates of the corresponding probit specification,
namely, the probit estimates for snp, the biprobit estimates for snp2, and the
heckprob estimates for snp2s.
dplot(filename) plots the estimated marginal densities of the error terms. A Gaussian
density with the same estimated mean and variance is added to each density plot.
For the snp command, filename specifies the name of the density plot to be created.
For snp2 and snp2s, three new graphs are created. The first is a plot of the estimated
marginal density of U1 and is stored as filename_1. The second is a plot of
the estimated marginal density of U2 and is stored as filename_2. The third is a
combination of the two density plots in a single graph and is stored as filename.
sml2s depvar varlist [if] [in] [weight],
    select(depvar_s = varlist_s [, offset(varname) noconstant])
    [bwidth1(#) bwidth2(#) from(matname) level(#) maximize_options]
sml and sml2s are implemented for Stata 9 by using ml model d2 and ml model d0, respectively. In this case, ml model lf cannot be used because SML estimators violate the linear-form restriction.12 Unlike the SNP commands, pweights and the robust option are not allowed with the sml and sml2s commands. Although this may be a drawback of our SML routines, it is important to mention that SML estimators impose weaker distributional assumptions than the SNP estimators, and they are also robust to the presence of heteroskedasticity of a general but known form and to heteroskedasticity of an unknown form if it depends on the underlying indexes (see Klein and Spady [1993]). A description of the options that are specific to our SML commands is provided below.
11. As pointed out by an anonymous referee, for a finite R, the SNP model can be misspecified, and the robust option accounts for this misspecification in estimating the covariance matrix of the SNP estimator.
12. An extensive discussion of the alternative Stata ML models can be found in Gould, Pitblado, and Sribney (2006).
Options
2. Asymptotic properties of the SNP estimators require that the degree R of the Hermite polynomial expansion increases with the sample size. In particular, snp generalizes the probit model only if R ≥ 3 (see Gabler, Laisney, and Lechner [1993]). For snp2 and snp2s, the error terms may have skewness and kurtosis different from those of a Gaussian distribution only if R1 ≥ 2 or R2 ≥ 2. In practice, the values of R, R1, and R2 may be selected either through a sequence of likelihood-ratio tests or by model-selection criteria such as the Akaike information criterion or the Bayesian information criterion (see the lrtest command).
3. SML estimation uses Gaussian kernels with a fixed bandwidth. Asymptotic properties of the SML estimators require the bandwidth parameters to satisfy the restrictions $n^{-1/6} < h_n < n^{-1/8}$ and $n_1^{-1/6} < h_{n_1} < n_1^{-1/8}$. In practice, one may either experiment with alternative values of h_n and h_{n1} in the above range or use a more sophisticated method like generalized cross-validation (see Gerfin [1996]).
4. The proposed estimators are more computationally demanding than the corresponding parametric estimators because of both the greater complexity of the likelihood functions and the fact that they are written as ado-files. The number of iterations required by SNP estimators typically increases with the order of the Hermite polynomial expansion. SML estimators usually converge in fewer iterations, but each iteration is more computationally demanding because kernel regression is conducted at each step of the maximization process. For both types of estimators, estimation time further depends on the number of observations and the number of covariates.
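The bandwidth restrictions in remark 3 are easy to check for the rule-of-thumb choice h_n = n^(−1/δ): because n^x is increasing in x for n > 1, any δ strictly between 6 and 8 gives an admissible bandwidth. The Python sketch below (hypothetical function names) checks the values δ = 6.02, 6.25, 6.5 and the sample sizes 500, 1000, 2000 used in the Monte Carlo study.

```python
def bandwidth(n, delta):
    """Rule-of-thumb bandwidth h = n**(-1/delta)."""
    return n ** (-1.0 / delta)


def admissible(n, h):
    """True if n**(-1/6) < h < n**(-1/8), the range stated in remark 3."""
    return n ** (-1.0 / 6.0) < h < n ** (-1.0 / 8.0)


for n in (500, 1000, 2000):
    for delta in (6.02, 6.25, 6.5):
        h = bandwidth(n, delta)
        print(f"n={n} delta={delta} h={h:.4f} admissible={admissible(n, h)}")
```

The same check applies to h_n1 with n replaced by the size n1 of the selected subsample.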
6 Examples
This section provides illustrations of the SNP and SML commands using simulated data,
which allows us to have a benchmark for the estimation results. The Stata code for our
data-generating process is
Error terms are generated from a bivariate Gaussian distribution with zero means, unit
variances, and a correlation coefficient equal to 0.5. The set of covariates includes four
variables: X1 and X2 are independently drawn from a uniform distribution
on (−1, 1), X3 is drawn from a chi-squared distribution with 1 degree of freedom, and
X4 is drawn from a Bernoulli distribution with a probability of success equal to 0.5. To
guarantee identifiability of the model parameters, our data-generating process imposes
one exclusion restriction in each equation, namely, X1 only enters the equation of Y1 ,
while X2 only enters the equation of Y2 .
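The data-generating process described above can be sketched in Python as follows. The coefficient values are hypothetical placeholders, not those used to produce the estimates reported below; the correlated errors are built as U2 = ρU1 + √(1 − ρ²)E with E independent of U1, which gives unit variances and corr(U1, U2) = ρ.

```python
import math
import random


def simulate(n, rho=0.5, seed=12345):
    """Simulate rows (y1, y2, x1, x2, x3, x4) from the design described above.

    Coefficients are HYPOTHETICAL placeholders (intercepts set to zero).
    x1 enters only the Y1 equation and x2 only the Y2 equation.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        x3 = rng.gauss(0.0, 1.0) ** 2            # chi-squared, 1 degree of freedom
        x4 = 1 if rng.random() < 0.5 else 0      # Bernoulli(0.5)
        u1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        u2 = rho * u1 + math.sqrt(1.0 - rho ** 2) * e  # unit variance, corr rho
        y1 = int(1.0 * x1 - 1.0 * x3 + 2.0 * x4 + u1 >= 0)   # hypothetical slopes
        y2 = int(1.0 * x2 + 0.5 * x3 - 1.5 * x4 + u2 >= 0)   # hypothetical slopes
        rows.append((y1, y2, x1, x2, x3, x4))
    return rows


data = simulate(2000)
print(sum(r[0] for r in data) / len(data), sum(r[1] for r in data) / len(data))
```

For the sample-selection example later in this section, Y2 would additionally be treated as missing whenever Y1 = 0.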
Because of the different scale normalization, estimated coefficients of the probit model
are not directly comparable with those of the SNP and SML models. Here we compare
ratios of the estimated coefficients, computed with the nlcom command.
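nlcom computes standard errors of nonlinear combinations of estimates by the delta method. For a single ratio, such as the coefficient of x3 divided by that of x1, the calculation reduces to the Python sketch below. The estimates and covariances in the example call are hypothetical; in Stata they would come from e(b) and e(V).

```python
import math


def ratio_delta(b_num, b_den, var_num, var_den, cov):
    """Ratio r = b_num / b_den and its delta-method standard error.

    Gradient of r: dr/db_num = 1/b_den, dr/db_den = -b_num/b_den**2, so
    Var(r) is approximated by g' V g with V the covariance of (b_num, b_den).
    """
    r = b_num / b_den
    g_num = 1.0 / b_den
    g_den = -b_num / b_den ** 2
    var_r = g_num ** 2 * var_num + g_den ** 2 * var_den + 2.0 * g_num * g_den * cov
    return r, math.sqrt(var_r)


# Hypothetical estimates and (co)variances, for illustration only.
r, se = ratio_delta(-1.15, 1.10, 0.0035, 0.0028, -0.0010)
print(round(r, 4), round(se, 4))
```

Ratios computed this way are invariant to the scale of the error term, which is what makes them comparable across probit, SNP, and SML estimates.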
The SNP estimates of the same model, with degree of the univariate Hermite poly-
nomial expansion R = 4, are given by
              Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
y1
  x1       1.600913    .2670701    5.99   0.000     1.077465    2.124361
  x3      -1.688176    .2840946   -5.94   0.000    -2.244991    -1.13136
  x4       2.990631    .4982394    6.00   0.000       2.0141    3.967163
. matrix b0=e(b)
Estimated coefficients and standard errors are very close to the corresponding probit
estimates. SNP coefficients are not significantly different from zero, and a likelihood-ratio
test of the probit model against the SNP model does not reject the Gaussianity
assumption. Estimates of skewness and kurtosis are also close to the Gaussian values of
0 and 3, respectively. In general, however, a very large sample size, say 10,000
observations, is necessary to obtain accurate estimates of these higher-order moments.13 The
post option of nlcom causes this command to behave like a Stata estimation command.
Below we use these normalized estimates of the snp command as starting values for the
sml command:
13. Simulation also indicates that while the skewness and kurtosis converge to those of the true error
distribution, the reported variance differs by a scale factor from the variance of the true error
distribution.
              Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
y1
  x1       1.101044    .0526561   20.91   0.000       .99784    1.204248
  x3      -1.151508    .0589227  -19.54   0.000    -1.266994   -1.036021
  x4       2.056867    .0982408   20.94   0.000     1.864318    2.249415
  _cons    .1432707    .0589297    2.43   0.015     .0277707    .2587708
y2
  x2       1.045511    .0457917   22.83   0.000     .9557608    1.135261
  x3       .4806406    .0356907   13.47   0.000     .4106882     .550593
  x4      -1.551646    .0813526  -19.07   0.000    -1.711094   -1.392198
  _cons    .0184006    .0544944    0.34   0.736    -.0884065    .1252077
              Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
y1
  x1       1.554566    .1293488   12.02   0.000     1.301047    1.808085
  x3      -1.617294    .1431821  -11.30   0.000    -1.897926   -1.336662
  x4       2.971796    .2476522   12.00   0.000     2.486407    3.457186
y2
  x2       1.743243    .1550252   11.24   0.000     1.439399    2.047087
  x3       .7931422    .0845287    9.38   0.000      .627469    .9588155
  x4      -2.603726      .22436  -11.61   0.000    -3.043464   -2.163989
Intercepts:
  _cons1          0   Fixed
  _cons2          0   Fixed
SNP coefs:
  g_1_1    -.467446     .337281   -1.39   0.166    -1.128505    .1936125
  g_1_2   -.0437985    .0702888   -0.62   0.533     -.181562    .0939651
  g_1_3    .2417127    .0936064    2.58   0.010     .0582476    .4251778
  g_2_1    .0275117    .0667043    0.41   0.680    -.1032263    .1582497
  g_2_2    .1097933    .0351317    3.13   0.002     .0409364    .1786502
  g_2_3   -.0127886    .0201542   -0.63   0.526    -.0522901     .026713
  g_3_1    .1368238     .082591    1.66   0.098    -.0250516    .2986991
  g_3_2    .0312873    .0186252    1.68   0.093    -.0052175     .067792
  g_3_3   -.0309619    .0215084   -1.44   0.150    -.0731175    .0111938
By specifying the noconstant option, the intercept coefficients are normalized to zero
and starting values are set to the estimates of the bivariate probit model with no
intercept. Once differences in the scale of the error terms are taken into account, the
estimated coefficients of biprobit and snp2 seem to be very close. As explained in
section 3, the bivariate probit model is nested in the bivariate SNP model only if the
correlation coefficient ρ is equal to zero. Accordingly, a likelihood-ratio test for the
Gaussianity of the error terms cannot be used. Furthermore, it is important to notice
that snp2 and snp2s do not provide standard errors and confidence intervals for
the estimated correlation coefficient. If this is a parameter of interest, inference can be
carried out via the bootstrap, although this alternative can be computationally demanding.
The point estimate of the correlation coefficient is nevertheless saved in
e(rho). Figure 1 shows the plots of the two estimated marginal densities obtained by
specifying the dplot option.
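A percentile bootstrap for ρ can be organized as below. In an actual application, each replication would refit snp2 on a resampled data set and record e(rho); to keep the sketch self-contained and runnable, the statistic here is the sample correlation of hypothetical simulated pairs, so the code illustrates only the resampling logic. Function names are hypothetical.

```python
import math
import random


def pearson(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def percentile_bootstrap(pairs, statistic, reps=999, level=0.95, seed=2008):
    """Percentile bootstrap confidence interval for statistic(pairs)."""
    rng = random.Random(seed)
    n = len(pairs)
    draws = sorted(
        statistic([pairs[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    alpha = (1.0 - level) / 2.0
    lower = draws[int(math.floor(alpha * reps))]
    upper = draws[int(math.ceil((1.0 - alpha) * reps)) - 1]
    return lower, upper


# Hypothetical data: correlated Gaussian pairs with correlation 0.5.
rng = random.Random(1)
data = []
for _ in range(300):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append((e1, 0.5 * e1 + math.sqrt(0.75) * e2))

print(pearson(data), percentile_bootstrap(data, pearson, reps=199))
```

Because each replication of the real procedure requires a full SNP fit, the number of replications is the main driver of the computational cost mentioned above.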
[Figure 1. Estimated marginal densities of the error terms (panels: Eq. 1 and Eq. 2; vertical axis: Density)]
In the next example, we introduce selectivity in the equation for Y2 and present
parametric ML estimates of the resulting bivariate binary-choice model with sample
selection.
              Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
y2
  x2       1.093629    .0681056   16.06   0.000     .9601448    1.227114
  x3       .4040404    .0848008    4.76   0.000     .2378339    .5702469
  x4      -1.609261    .1514939  -10.62   0.000    -1.906183   -1.312338
  _cons     .046692    .1134511    0.41   0.681     -.175668     .269052
y1
  x1       1.089935    .0536583   20.31   0.000     .9847668    1.195103
  x3      -1.149565    .0601002  -19.13   0.000    -1.267359   -1.031771
  x4       2.044188    .0983742   20.78   0.000     1.851378    2.236998
  _cons    .1475302    .0592512    2.49   0.013     .0313999    .2636604
LR test of indep. eqns. (rho = 0): chi2(1) = 19.62   Prob > chi2 = 0.0000
The snp2s estimates of the same model with (R1 , R2 ) = (4, 3) are given by
              Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
y2
  x2       1.837312    .1807611   10.16   0.000     1.483027    2.191597
  x3        .685451    .1467516    4.67   0.000     .3978231    .9730789
  x4      -2.681849    .2982197   -8.99   0.000    -3.266349   -2.097349
y1
  x1       1.582025    .1916435    8.26   0.000      1.20641    1.957639
  x3      -1.695678    .1991834   -8.51   0.000     -2.08607   -1.305286
  x4       3.011402    .3429733    8.78   0.000     2.339186    3.683617
Intercepts:
  _cons1   .1475302   Fixed
  _cons2    .046692   Fixed
SNP coefs:
  g_1_1   -.4411874    .4443872   -0.99   0.321     -1.31217    .4297954
  g_1_2   -.0775538    .1175507   -0.66   0.509     -.307949    .1528413
  g_1_3    .2447965     .099868    2.45   0.014     .0490589    .4405341
  g_2_1    .2157904    .2817236    0.77   0.444    -.3363777    .7679585
  g_2_2    .1268945    .0864948    1.47   0.142    -.0426323    .2964212
  g_2_3   -.0950307    .0726311   -1.31   0.191    -.2373851    .0473237
  g_3_1     .113475    .0886566    1.28   0.201    -.0602888    .2872388
  g_3_2    .0453493    .0369828    1.23   0.220    -.0271357    .1178343
  g_3_3   -.0294287    .0219723   -1.34   0.180    -.0724937    .0136362
  g_4_1   -.0449806    .0489556   -0.92   0.358    -.1409318    .0509705
  g_4_2    -.005844    .0185705   -0.31   0.753    -.0422415    .0305535
  g_4_3    .0187026    .0113096    1.65   0.098    -.0034638     .040869
As a final example, we provide estimates obtained from the sml2s command by setting $h_n = n^{-1/6.5}$ and $h_{n_1} = n_1^{-1/6.02}$.
. quietly summarize y2
. local bw2=1/(r(N)^(1/6.02))
. sml2s y2 x3 x4, select(y1=x3 x4, offset(x1)) offset(x2) bwidth2(`bw2') nolog
Two-stage SML estimator - Lee (1995)          Number of obs   =       2000
                                              Wald chi2(2)    =     154.17
Log likelihood = -1044.401                    Prob > chi2     =     0.0000

              Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
y2
  x3       .3691167    .0789709    4.67   0.000     .2143366    .5238969
  x4      -1.463612    .1178916  -12.41   0.000    -1.694675   -1.232549
  x2       (offset)
y1
  x3      -1.063704    .0488601  -21.77   0.000    -1.159468   -.9679396
  x4       1.889661    .0845938   22.34   0.000      1.72386    2.055462
  x1       (offset)
The first two lines provide a simple way to specify alternative values for the bandwidth
parameters. Here we are implicitly assuming that the number of nonmissing observations
on Y2 is equal to n1, the size of the selected estimation sample. If this is not the case, for
instance because of missing data on the covariates, the summarize command on the first
line should be restricted to the relevant estimation sample.
In Design 1, the error terms were generated from a bivariate Gaussian distribution
with zero means, unit variances, and correlation coefficient ρ = −0.5. In Designs 2–4,
the error terms were generated from a mixture of two bivariate Gaussian distributions
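with equal covariance matrices,
$$f(u_1, u_2) = \pi\, f_1(u_1, u_2; m_1, \Omega) + (1 - \pi)\, f_2(u_1, u_2; m_2, \Omega)$$
where π is the mixing probability, and the fj(·, ·; mj, Ω), j = 1, 2, are bivariate Gaussian densities with mean mj = (mj1, mj2) and covariance matrix
$$\Omega = \begin{pmatrix} \omega_{11}^2 & \omega_{12} \\ \omega_{12} & \omega_{22}^2 \end{pmatrix}$$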
By varying the mixing probability π and the parameters of the two Gaussian compo-
nents f1 (U1 , U2 ; m1 , Ω) and f2 (U1 , U2 ; m2 , Ω), one can then define a family of bivariate
mixtures with given skewness, kurtosis, and correlation coefficient.14 Table 1 gives the
skewness and kurtosis used in each design. The latent regression errors were gener-
ated from an asymmetric and mesokurtic distribution in Design 2, a symmetric and
platykurtic distribution in Design 3, and an asymmetric and leptokurtic distribution in
Design 4.15 Error terms were then standardized to have zero means, unit variances,
and correlation coefficient ρ = −0.5 in each design.
14. Although bivariate mixture distributions allow us to control the level of skewness, kurtosis, and
correlation coefficient in each design, it is difficult to assess whether these error structures
are nested in the SNP model for a finite value of R. For this reason, our simulation design may
be biased against the SNP estimator.
15. To investigate the small-sample behavior of the three estimators under different levels of skewness
and kurtosis, error terms were always generated with stronger departures from Gaussianity in the
distribution of U2.
Stata code for the data-generating process of the non-Gaussian designs is
where the mixing probability pi and the set of parameters (mu11, mu12, mu21, mu22,
v1, v2, cov) are chosen to obtain the selected levels of skewness, kurtosis, and corre-
lation coefficient (see Preston [1953]).
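A minimal Python sketch of sampling from the two-component mixture and standardizing the draws follows. Parameter values are hypothetical and are not chosen to match Designs 2–4; the covariance matrix is passed as (variance of U1, covariance, variance of U2), and the draws are standardized with sample rather than population moments, which may differ from the procedure used for the simulations. Function names are hypothetical.

```python
import math
import random


def sample_mixture(n, p, m1, m2, omega, seed=1953):
    """Draw n pairs from p*N(m1, Omega) + (1 - p)*N(m2, Omega).

    omega = (var1, cov12, var2) holds the entries of the common covariance matrix.
    """
    rng = random.Random(seed)
    var1, cov12, var2 = omega
    # Cholesky factor L of Omega, so that L z ~ N(0, Omega) for z ~ N(0, I)
    l11 = math.sqrt(var1)
    l21 = cov12 / l11
    l22 = math.sqrt(var2 - l21 ** 2)
    draws = []
    for _ in range(n):
        m = m1 if rng.random() < p else m2  # pick the mixture component
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2))
    return draws


def standardize(xs):
    """Rescale to zero sample mean and unit sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def skewness_kurtosis(zs):
    """Sample skewness and kurtosis of standardized data."""
    n = len(zs)
    return sum(z ** 3 for z in zs) / n, sum(z ** 4 for z in zs) / n


# Hypothetical parameter values, not those of the article's designs.
draws = sample_mixture(20000, 0.3, (0.0, 0.0), (2.0, 1.0), (1.0, -0.5, 1.0))
u1 = standardize([d[0] for d in draws])
u2 = standardize([d[1] for d in draws])
print(skewness_kurtosis(u1), skewness_kurtosis(u2))
```

Standardizing each margin leaves the correlation unchanged, so the target correlation must be obtained through the choice of the mixture parameters themselves.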
Throughout the study, comparability of the probit, SNP, and SML estimators is
obtained by imposing the scale normalization β11 = β21 = 1. For the parametric probit
and the SNP estimators the normalization is imposed on the estimation results by taking
the ratio of the estimated coefficients β12 /β11 and β23 /β21 , while for the SML estimator
the normalization is directly imposed on the estimation process by constraining the
coefficients of X11 and X21 to one. We always used the default starting values for the
SNP and SML estimators. Furthermore, SNP and SML estimation were performed with
prespecified values of R and hn , respectively. To save computational time, no check was
undertaken to investigate convergence to the global maximum rather than a local one,
and we used rule-of-thumb values for R and hn .
Tables 2–4 focus on the univariate binary-choice model for Y2 and present summary
statistics for the simulation results from 1000 replications with sample sizes 500, 1000,
and 2000, respectively.16 The normalization restrictions imply that there is only one
free parameter in the model whose true value is 1. SNP estimation was performed under
three alternative choices of R (with R = 3, 4, 5) as degree of the univariate Hermite
polynomial expansion, while SML estimation was performed under three alternative
values of the bandwidth parameter $h_n = n^{-1/\delta}$ (with δ = 6.02, 6.25, 6.5). According
to our simulation results, efficiency losses of the SNP and SML estimators in the
Gaussian design (Design 1) are rather small. In particular, the efficiency of the
SNP estimator relative to the probit estimator ranges between 74% and 89%, while the
efficiency of the SML estimator relative to the probit estimator ranges between
78% and 83%.17
A comparison of the three estimators in the non-Gaussian designs further suggests
that SNP and SML estimators substantially dominate the probit estimator, especially in
Designs 2 and 4, where error terms are generated from asymmetric distributions. First,
the bias of the probit estimator is about 10% in Design 2 and about 6.5% in Design 4,
while the bias of the SNP and SML estimators never exceeds 1.5%. Second, the ratios between
the mean squared error (MSE) of the probit estimator and the MSEs of the two
semiparametric estimators range between 1.7 and 5.3 in Design 2 and between 1.2 and
3.3 in Design 4. As expected, efficiency gains of the SNP and SML estimators relative to
the probit estimator always increase as the sample size becomes larger. Third, the actual
rejection rate of the Wald test for the probit estimate being equal to the true value of
the parameter is quite far from the nominal value of 5%, while the actual rejection rates
of the Wald tests for the SNP and SML estimates converge to their nominal values as
the sample size becomes larger.
16. For each simulation design and selected sample size, we provide average and standard deviation of
the estimates, mean square error of each comparable estimator, and rejection rate of the Wald test
for each estimated coefficient being equal to its true value.
17. Results on the SML estimator are consistent with the simulation results of Klein and Spady (1993),
who find a relative efficiency of 78% on different simulation designs.
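The summary statistics listed in footnote 16 can be computed from the replications as in the Python sketch below. It assumes a two-sided Wald test at the 5% level with critical value 1.96; the function name and the toy inputs are hypothetical.

```python
import math


def mc_summary(estimates, std_errors, true_value, crit=1.96):
    """Monte Carlo summary for one coefficient across replications."""
    R = len(estimates)
    mean = sum(estimates) / R
    sd = math.sqrt(sum((b - mean) ** 2 for b in estimates) / (R - 1))
    mse = sum((b - true_value) ** 2 for b in estimates) / R
    # Wald test of H0: coefficient = true_value, rejected when |z| > crit
    rejections = sum(
        abs((b - true_value) / se) > crit for b, se in zip(estimates, std_errors)
    )
    return {"mean": mean, "sd": sd, "mse": mse, "rejection_rate": rejections / R}


# Toy inputs: four hypothetical replications with true value 1.
summary = mc_summary([0.9, 1.1, 1.0, 1.2], [0.05] * 4, 1.0)
print(summary)
```

Ratios of MSEs across estimators, as reported in the text, follow directly from the "mse" entries computed for each estimator on the same replications.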
Tables 5–7 provide simulation results of the bivariate binary-choice model for $Y_1$
and $Y_2$. The normalization restrictions now imply that there are two free parameters in
the model, one in equation 1 whose true value is −1 and one in equation 2 whose true
value is 1. In this set of simulations, we compare the performance of the bivariate probit
estimator with that of the SNP estimator with $R_1 = R_2 = 4$. As for the univariate
model, we find that the efficiency losses of the SNP estimator in the Gaussian cases are
very small. Unlike in the univariate model, however, a larger sample size is usually needed
to obtain substantial reductions in the MSE. Most of the efficiency gains typically occur
for the coefficients of the second equation, where departures from Gaussianity are stronger
(see table 1). Although the rejection rates of the Wald tests for the SNP estimates are
better than those for the bivariate probit estimates, they are still far from their nominal
values even with a sample size of n = 2000. This poor coverage of the SNP estimator is
likely due to an incorrect choice of $R_1$ and $R_2$. In other words, the bivariate
distribution of the latent regression errors may not be nested in the SNP model for the
selected values of $R_1$ and $R_2$. Under this kind of model misspecification, the coverage
of the SNP estimator could be improved by using the Huber/White/sandwich estimator of the
covariance matrix. Here, however, our Monte Carlo simulations are based on the conventional
calculation of the covariance matrix, so that the results of the SNP estimator are
comparable with those of the SML estimators.18
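The notion of a distribution being "nested" in the SNP model can be made concrete with a univariate analogue. The sketch below evaluates a Gallant–Nychka-type density $f(u) = P_R(u)^2 \phi(u) / \psi$, where $P_R(u) = \sum_{k=0}^{R} \gamma_k u^k$ and $\psi = E[P_R(U)^2]$ for $U \sim N(0,1)$. It assumes this textbook form. The article's own parameterization, coefficient normalization, and standardization of the errors are not reproduced, and the names `snp_density`, `normal_moment`, and `gamma` are hypothetical.

```python
import math
from typing import Sequence

def normal_moment(m: int) -> float:
    """E[U**m] for U ~ N(0, 1): zero for odd m and (m - 1)!! for even m."""
    if m % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(m - 1, 0, -2):
        result *= k
    return result

def snp_density(u: float, gamma: Sequence[float]) -> float:
    """Univariate SNP density P_R(u)^2 * phi(u) / psi (hypothetical sketch).

    P_R(u) = sum_k gamma[k] * u**k has degree R = len(gamma) - 1, and
    psi = E[P_R(U)^2] for U ~ N(0, 1) makes the density integrate to one.
    """
    psi = sum(gj * gk * normal_moment(j + k)
              for j, gj in enumerate(gamma)
              for k, gk in enumerate(gamma))
    poly = sum(g * u ** k for k, g in enumerate(gamma))
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return poly * poly * phi / psi
```

With `gamma = [1.0]` (or any vector whose higher-order terms are zero), the density reduces to the standard normal, so the Gaussian model is nested for every $R$. Non-Gaussian shapes are captured only if $R$ is large enough, which is the source of the misspecification discussed above.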
18. As explained in section 5.2, the SML commands do not support the robust option for the
Huber/White/sandwich estimator of the covariance matrix.
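The difference between the conventional and the Huber/White/sandwich covariance estimators can be illustrated in a minimal scalar case. The sketch below uses a probit model with one regressor and no intercept, fitted by maximum likelihood in pure Python. The conventional variance is $-H^{-1}$ and the sandwich variance is $H^{-1} B H^{-1}$, where $H$ is the observed Hessian and $B$ is the sum of squared scores. This is an illustration of the general technique, not the implementation used by the SNP or SML commands. The helper names and the simulated data (true coefficient 1, standard normal errors) are hypothetical.

```python
import math
import random

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_scores(beta, x, y):
    """Per-observation scores d log L_i / d beta (one regressor, no intercept)."""
    out = []
    for xi, yi in zip(x, y):
        # Clip the probability away from 0 and 1 to avoid division by zero.
        p = min(max(norm_cdf(xi * beta), 1e-12), 1.0 - 1e-12)
        out.append(xi * norm_pdf(xi * beta) * (yi - p) / (p * (1.0 - p)))
    return out

def fit_probit(x, y, lo=-10.0, hi=10.0, tol=1e-10):
    """ML estimate by bisection: the log-likelihood is concave, so the total score decreases in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(probit_scores(mid, x, y)) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def probit_variances(x, y, step=1e-5):
    """Return (beta_hat, conventional variance, sandwich variance)."""
    beta = fit_probit(x, y)
    # Observed Hessian by a central difference of the total score.
    hess = (sum(probit_scores(beta + step, x, y))
            - sum(probit_scores(beta - step, x, y))) / (2.0 * step)
    meat = sum(s * s for s in probit_scores(beta, x, y))  # outer product of scores
    v_conventional = -1.0 / hess       # inverse of minus the Hessian
    v_sandwich = meat / (hess * hess)  # H^-1 B H^-1 in the scalar case
    return beta, v_conventional, v_sandwich

# Illustrative, correctly specified data: true beta = 1, standard normal errors.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [int(xi * 1.0 + rng.gauss(0.0, 1.0) > 0.0) for xi in x]
beta_hat, v_conv, v_sand = probit_variances(x, y)
```

Under correct specification, the two variances should be close. Under misspecification (for example, skewed errors), they generally diverge, and only the sandwich form remains a valid basis for Wald tests of the pseudo-true parameter.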
Finally, tables 8–10 provide simulation results of the bivariate binary-choice model
with sample selection for $Y_1$ and $Y_2$. In this case, selectivity was introduced by setting
$Y_2$ to missing whenever $Y_1 = 0$. As for the bivariate model without sample selection,
the normalization restrictions imply that there are two free parameters in the model,
one in the selection equation whose true value is −1 and one in the main equation whose
true value is 1. In this case, we compare the performance of the bivariate probit estimator
with sample selection, the SNP estimator with $R_1 = 4$ and $R_2 = 3$, and the SML
estimator with $h_n = n^{-1/6.5}$ and $h_{n_1} = n_1^{-1/6.5}$. Our simulation results again
suggest that the efficiency losses of the SNP and SML estimators with respect to a correctly
specified probit estimator are rather small in both equations (relative efficiencies of 87%
and 80%, respectively, in the first equation, and 86% and 70% in the second equation). In
the non-Gaussian cases, the probit estimator is instead markedly biased and less efficient
than the SNP and SML estimators, especially in the presence of asymmetric distributions and
relatively large sample sizes. As before, the actual rejection rates of the Wald tests for
the SNP and SML estimates are better than those for the parametric probit estimates, but
they are still far from their nominal value of 5%. These coverage problems are likely to be
due to an incorrect choice of the degree of the Hermite polynomial expansion (for the SNP
estimator) and of the bandwidth parameters (for the SML estimator). Given the computational
burden of our Monte Carlo simulations, investigating the optimal choice of these parameters
is beyond the scope of this article. We leave this topic for future research.
Table 8. Simulation results for the bivariate binary-choice model with sample selection (n = 500)
Table 9. Simulation results for the bivariate binary-choice model with sample selection (n = 1000)
Table 10. Simulation results for the bivariate binary-choice model with sample selection (n = 2000)
8 Acknowledgments
I thank Franco Peracchi, David Drukker, Vince Wiggins, and two anonymous referees
for their valuable comments and suggestions. I also thank Claudio Rossetti for help in
programming the univariate SML estimator.
9 References
Chamberlain, G. 1986. Asymptotic efficiency in semiparametric models with censoring.
Journal of Econometrics 32: 189–218.
———. 1987. Efficiency bounds for distribution-free estimators of the binary choice and
censored regression models. Econometrica 55: 559–586.
De Luca, G., and F. Peracchi. 2007. A sample selection model for unit and item nonresponse
in cross-sectional surveys. CEIS Tor Vergata—Research Paper Series 33: 1–44.
Gould, W., J. Pitblado, and W. Sribney. 2006. Maximum Likelihood Estimation with
Stata. 3rd ed. College Station, TX: Stata Press.
Horowitz, J. L. 1992. A smoothed maximum score estimator for the binary response
model. Econometrica 60: 505–531.
Ichimura, H. 1993. Semiparametric least squares (SLS) and weighted SLS estimation of
single-index models. Journal of Econometrics 58: 71–120.
Ichimura, H., and L. F. Lee. 1991. Semiparametric least squares of multiple index models:
single equation estimation. In Nonparametric and Semiparametric Methods in Econometrics
and Statistics, ed. W. A. Barnett, J. Powell, and G. E. Tauchen, 350–351.
Cambridge: Cambridge University Press.
Klein, R. W., and R. H. Spady. 1993. An efficient semiparametric estimator for binary
response models. Econometrica 61: 387–421.
Manski, C. 1975. Maximum score estimation of the stochastic utility model of choice.
Journal of Econometrics 3: 205–228.
Melenberg, B., and A. van Soest. 1996. Measuring the costs of children: Parametric
and semiparametric estimators. Statistica Neerlandica 50: 171–192.
Meng, C. L., and P. Schmidt. 1985. On the cost of partial observability in the bivariate
probit model. International Economic Review 26: 71–85.
Preston, E. J. 1953. A graphical method for the analysis of statistical distributions into
two normal components. Biometrika 40: 460–464.