Christopher F Baum
CF Baum (BC / DIW) Implementing new econometric tools in Stata MXSUG, May 2013 1 / 73
Introduction
Modeling a binary outcome with an endogenous binary regressor
Acknowledgement
This presentation is based on the work of Lewbel, Dong & Yang,
“Viewpoint: Comparing features of Convenient Estimators for Binary
Choice Models With Endogenous Regressors”, Canadian Journal of
Economics, 45:3, 2012. Baum’s contribution is the review and
enhancement of the software developed in this research project.
Motivation
Binary choice models
Linear probability models
$D = X\beta + \varepsilon$
The other well-recognized flaw in the LPM is that its fitted values are
not constrained to lie in the unit interval, so predicted probabilities
below zero or above one are commonly encountered. Any regressor with a
nonzero coefficient that takes on a sufficiently wide range of values
will inevitably push the LPM's predictions outside these bounds.
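A toy calculation makes the point concrete. The plain-Python sketch below fits an LPM by OLS to hypothetical data in which a binary outcome switches from 0 to 1 halfway along a regressor running from 0 to 10; the helper name ols_simple is illustrative only. The fitted "probabilities" at the two ends of the range fall below zero and above one.

```python
# Minimal LPM sketch on hypothetical toy data (not from the presentation).

def ols_simple(x, d):
    """OLS of d on a constant and x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    d_bar = sum(d) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    slope = sxd / sxx
    return d_bar - slope * x_bar, slope

x = list(range(11))                    # regressor takes the values 0, 1, ..., 10
d = [1 if xi >= 5 else 0 for xi in x]  # binary outcome switches on at x = 5
b0, b1 = ols_simple(x, d)
fitted = [b0 + b1 * xi for xi in x]    # LPM "probabilities"

print(round(min(fitted), 3), round(max(fitted), 3))  # -0.136 1.227
```

The slope here is 3/22 and the intercept -3/22, so the prediction at x = 0 is negative and the prediction at x = 10 is 27/22, above one.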
Examples of support for the LPM approach
Example 1
Example 2
This is surely a red herring for Stata users, as the margins command
in Stata 11 or 12 computes those standard errors via the delta method.
They also discuss the difficulty of computing marginal effects for a
binary regressor: again, this is not an issue for Stata 12 users, who
have the new contrast command.
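The delta-method calculation that margins automates can be sketched outside Stata. The plain-Python example below assumes a probit with one continuous regressor x and one binary regressor w; the coefficient vector b, covariance matrix V and evaluation point are made up, and the helper names are hypothetical. It computes the marginal effect of x and the discrete change in Pr(D = 1) when w moves from 0 to 1, each with a delta-method standard error built from a numerical gradient. It illustrates the method only and is not a description of how margins or contrast work internally.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def delta_method_se(func, b, V, h=1e-6):
    """Delta-method s.e. of func(b), given Var(b) = V, using a
    central-difference numerical gradient of func at b."""
    k = len(b)
    grad = []
    for j in range(k):
        up, dn = list(b), list(b)
        up[j] += h
        dn[j] -= h
        grad.append((func(up) - func(dn)) / (2.0 * h))
    var = sum(grad[i] * V[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)

# Hypothetical probit estimates for Pr(D = 1) = Phi(b0 + b1*x + b2*w),
# with x continuous and w binary. All numbers are made up.
b = [-0.5, 0.8, 0.4]
V = [[0.010, 0.001, -0.002],
     [0.001, 0.004, 0.000],
     [-0.002, 0.000, 0.009]]
x0, w0 = 0.5, 0  # evaluation point

def me_x(beta):
    """Marginal effect dPr(D=1)/dx at (x0, w0)."""
    return norm_pdf(beta[0] + beta[1] * x0 + beta[2] * w0) * beta[1]

def dc_w(beta):
    """Discrete change Pr(D=1 | w=1) - Pr(D=1 | w=0), holding x at x0."""
    base = beta[0] + beta[1] * x0
    return norm_cdf(base + beta[2]) - norm_cdf(base)

print(f"dPr/dx: {me_x(b):.4f} (s.e. {delta_method_se(me_x, b, V):.4f})")
print(f"Pr change for w: {dc_w(b):.4f} (s.e. {delta_method_se(dc_w, b, V):.4f})")
```

A numerical gradient keeps the sketch generic: the same delta_method_se works for any smooth function of the coefficients, such as the discrete change for the binary regressor.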
Maximum Likelihood approach
$D = I(X^e\beta_e + X^0\beta_0 + \varepsilon \ge 0)$
$X^e = G(Z, \theta, e)$
For a single binary endogenous regressor, with G(·) a probit and ε and
e jointly normal, this is the model estimated by Stata's biprobit
command.
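A small simulation shows what this model implies. The plain-Python sketch below assumes one instrument Z, one exogenous regressor X^0 and hypothetical parameter values; it draws (ε, e) from a bivariate normal with correlation ρ, generates the binary X^e from a probit first equation, and compares the naive difference in outcome rates between X^e = 1 and X^e = 0 with the true average effect of X^e, which can be computed here only because ε is known in the simulation. It does not estimate the bivariate probit itself.

```python
import math
import random

def simulate(n, theta=(0.0, 1.0), beta_e=0.5, beta_0=1.0, rho=0.5, seed=12345):
    """Simulate D = I(beta_e*Xe + beta_0*X0 + eps >= 0) with
    Xe = I(theta0 + theta1*Z + e >= 0) and corr(eps, e) = rho."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z, x0, e = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # eps and e are jointly normal with unit variances and correlation rho
        eps = rho * e + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xe = 1 if theta[0] + theta[1] * z + e >= 0 else 0   # probit G(.)
        d = 1 if beta_e * xe + beta_0 * x0 + eps >= 0 else 0
        # outcomes with Xe set to 1 and to 0, holding x0 and eps fixed
        d1 = 1 if beta_e + beta_0 * x0 + eps >= 0 else 0
        d0 = 1 if beta_0 * x0 + eps >= 0 else 0
        rows.append((d, xe, d1, d0))
    return rows

rows = simulate(20000)
treated = [d for d, xe, _, _ in rows if xe == 1]
untreated = [d for d, xe, _, _ in rows if xe == 0]
naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
ate = sum(d1 - d0 for _, _, d1, d0 in rows) / len(rows)
print(f"naive difference {naive:.3f} vs. true average effect {ate:.3f}")
```

With ρ > 0, units that select X^e = 1 also tend to have larger ε, so the naive comparison overstates the effect of X^e.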
Control Function approach
$D = I(X^e\beta_e + X^0\beta_0 + \varepsilon \ge 0)$
$X^e = Z\alpha + e$
$D = I(X^e\beta_e + X^0\beta_0 + \lambda e + U \ge 0)$
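The two steps of the control function estimator can be sketched in plain Python. The example assumes a hypothetical data generating process in which ε = λe + U, with U standard normal and independent of the regressors, so the second-stage probit recovers the parameters on their original scale (in general they are identified only up to the scale of U). Step 1 regresses X^e on all exogenous variables; step 2 adds the residual ê to a probit, here fitted by a small Newton-Raphson routine rather than Stata's probit command. A probit that omits ê is shown for comparison. Standard errors, which would have to account for the generated regressor, are not computed.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(X, d, tol=1e-8, max_iter=50):
    """Probit MLE by Newton-Raphson with the analytic gradient and Hessian."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(max_iter):
        g = [0.0] * k
        negH = [[0.0] * k for _ in range(k)]       # minus the Hessian
        for row, di in zip(X, d):
            s = 2 * di - 1
            z = s * sum(bj * xj for bj, xj in zip(b, row))
            z = max(min(z, 30.0), -30.0)           # keep the cdf away from underflow
            r = norm_pdf(z) / norm_cdf(z)
            w = r * (r + z)
            for i in range(k):
                g[i] += s * r * row[i]
                for j in range(k):
                    negH[i][j] += w * row[i] * row[j]
        step = solve(negH, g)                     # Newton step (-H)^{-1} g
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    return b

# Hypothetical DGP: eps = lam*e + U with U ~ N(0, 1).
rng = random.Random(2013)
const, beta_e, beta_0, lam = 0.2, 1.0, -1.0, 0.7
Z, X0, XE, D = [], [], [], []
for _ in range(10000):
    z, x0, e, u = (rng.gauss(0.0, 1.0) for _ in range(4))
    xe = z + 0.5 * x0 + e                         # Xe = Z alpha + e
    D.append(1 if const + beta_e * xe + beta_0 * x0 + lam * e + u >= 0 else 0)
    Z.append(z); X0.append(x0); XE.append(xe)

# Step 1: OLS of Xe on all exogenous variables; residuals estimate e.
W = [[1.0, z, x0] for z, x0 in zip(Z, X0)]
a = ols(W, XE)
e_hat = [xe - sum(ai * wi for ai, wi in zip(a, w)) for xe, w in zip(XE, W)]

# Step 2: probit of D on Xe, X0 and the control function e_hat.
b_cf = probit([[1.0, xe, x0, eh] for xe, x0, eh in zip(XE, X0, e_hat)], D)
b_naive = probit([[1.0, xe, x0] for xe, x0 in zip(XE, X0)], D)
print("control function:", [round(v, 3) for v in b_cf])
print("naive probit:    ", [round(v, 3) for v in b_naive])
```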
Special Regressor approach
$D = I(X^e\beta_e + X^0\beta_0 + V + \varepsilon \ge 0)$
or, equivalently,
$D = I(X\beta + V + \varepsilon \ge 0)$
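This is the same basic form for D as in the ML and control function (CF)
approaches. Note, however, that the special regressor V has been
separated from the other exogenous regressors and its coefficient
normalized to unity. This is a harmless normalization, since the scale of
the latent index is not identified in a binary choice model (it does
require V's coefficient to be positive).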
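One way to see why a special regressor helps is the transformation on which special regressor estimators are built: define T = [D − I(V ≥ 0)]/f(V), where f is the density of V. In the simplest case, where V is independent of X and ε and has large enough support, E(T | X, ε) = Xβ + ε, so β can be estimated by a linear IV regression of T on X using instruments Z. The plain-Python sketch below assumes a uniform V with known density, one endogenous regressor and one instrument, all hypothetical; in applications f must be estimated, and the sketch does not reproduce sspecialreg.

```python
import random

def special_regressor_iv(n=40000, a=12.0, beta_0=0.5, beta_e=1.0, seed=7):
    """Simulate a special regressor model and estimate (beta_0, beta_e)."""
    rng = random.Random(seed)
    Z, XE, T = [], [], []
    for _ in range(n):
        z, e = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 0.75 ** 0.5)
        xe = z + e                          # endogenous regressor
        eps = 0.5 * e + u                   # Var(eps) = 1, correlated with xe
        v = rng.uniform(-a, a)              # special regressor, independent of (z, e, u)
        d = 1 if beta_0 + beta_e * xe + v + eps >= 0 else 0
        f_v = 1.0 / (2.0 * a)               # density of V, known here by construction
        T.append((d - (1 if v >= 0 else 0)) / f_v)
        Z.append(z)
        XE.append(xe)
    # Just-identified linear IV of T on (1, Xe) with instruments (1, Z)
    zb, xb, tb = sum(Z) / n, sum(XE) / n, sum(T) / n
    cov_zt = sum((zi - zb) * (ti - tb) for zi, ti in zip(Z, T))
    cov_zx = sum((zi - zb) * (xi - xb) for zi, xi in zip(Z, XE))
    slope = cov_zt / cov_zx
    return tb - slope * xb, slope

b0_hat, be_hat = special_regressor_iv()
print(f"intercept {b0_hat:.3f}, coefficient on Xe {be_hat:.3f}")
```

Because T takes only the values 0 and ±1/f(V), it is very noisy, which is why the sketch uses a large simulated sample.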
The main drawback of this method is that the special regressor V must
be conditionally independent of ε. Even if it is exogenous, it could fail
to satisfy this assumption because of the way in which V might affect
other endogenous regressors. Also, V must be continuously
distributed after conditioning on the other regressors, so that a term
like V² could not be included as an additional regressor.
The average index function (AIF)
$D = I(X\beta + \varepsilon \ge 0)$
Blundell and Powell (Rev. Ec. Stud., 2004) propose using the average
structural function (ASF) to summarize choice probabilities: $F_{-\varepsilon}(X\beta)$,
even though ε is no longer independent of X. In this case,
$F_{-\varepsilon|X}(X\beta \mid X)$ should be computed: a formidable task.
Lewbel, Dong and Yang (Can. J. Econ., 2012) propose using the
measure E(D|X β), which they call the average index function (AIF), to
summarize choice probabilities.
Like the ASF, the AIF is based on the estimated index, and equals the
propensity score when ε ⊥ X . However, when this assumption is
violated (by endogeneity or heteroskedasticity), the AIF is usually
easier to estimate, via a unidimensional nonparametric regression of D
on X β.
For the LPM, the ASF and AIF both equal the fitted values of the linear
2SLS regression of D on X. For the other methods, the AIF choice
probabilities can be estimated using a standard unidimensional kernel
regression of D on $X\hat{\beta}$: for instance, using the lpoly command in
Stata, with the at() option specifying the observed data points. This
produces the AIF for each observation i, $\hat{M}_i$.
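The kernel step can be sketched in plain Python. The example below evaluates a local-constant (Nadaraya-Watson) regression of D on a hypothetical estimated index at every observed index value, which plays the same role as evaluating lpoly at the data points; the Gaussian kernel, the fixed bandwidth of 0.3 and the helper name aif_kernel are assumptions and need not match lpoly's defaults. Because ε is drawn independently of the index here, the estimated AIF should track the standard normal cdf of the index.

```python
import math
import random

def aif_kernel(index, d, bandwidth):
    """Local-constant (Nadaraya-Watson) regression of d on the estimated
    index, evaluated at each observed index value, Gaussian kernel."""
    m_hat = []
    for x0 in index:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in index]
        m_hat.append(sum(wi * di for wi, di in zip(w, d)) / sum(w))
    return m_hat

# Hypothetical data: the index stands in for X*beta_hat from a first step;
# here eps is independent of X, so the AIF should track Phi(index).
rng = random.Random(42)
index = [rng.gauss(0.0, 1.0) for _ in range(800)]
d = [1 if xb + rng.gauss(0.0, 1.0) >= 0 else 0 for xb in index]
m_hat = aif_kernel(index, d, bandwidth=0.3)
```

Each fitted value is a weighted average of the observed D, so every estimated AIF lies in the unit interval, unlike LPM fitted values.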
The Stata implementation
Empirical illustration 1
In this example of the special regressor method, taken from Dong and
Lewbel (BC WP 604), the binary dependent variable is an indicator
that individual i migrates from one US state to another. The objective
is to estimate the probability of interstate migration.
Empirical illustration 2
The special regressor $V_{it}$ in this context is the firm’s return on assets,
or ROA. We also include the lagged value of income over total assets
and a set of year dummies as exogenous factors. The instruments Z
also include the lagged values of two ratios: capital expenditures to
total assets and acquisitions to total assets.
Summary remarks on sspecialreg
IV methods with generated instruments
Acknowledgement
This presentation is based on the work of Arthur Lewbel, “Using
Heteroskedasticity to Identify and Estimate Mismeasured and
Endogenous Regressor Models,” Journal of Business & Economic
Statistics, 2012. Baum and Mark E Schaffer developed the Stata software
that implements Lewbel’s methodology.
Motivation
Challenges in employing IV methods
Lewbel’s approach
The basic framework
$Y_1 = X'\beta_1 + Y_2\gamma_1 + \varepsilon_1$
$Y_2 = X'\beta_2 + Y_1\gamma_2 + \varepsilon_2$
In many applied contexts, the third assumption made for the validity of
an instrument, that it affects the response variable only indirectly, is
difficult to establish. The zero restriction on its coefficient may not be
plausible. The assumption can be tested only when the model is
overidentified, via a test of overidentifying restrictions, and if it does
not hold, IV estimates will be inconsistent.
Single-equation estimation
$Z_j = (X_j - \bar{X}) \cdot \hat{\varepsilon}$
where $\hat{\varepsilon}$ is the vector of residuals from the ‘first-stage regression’ of
each endogenous regressor on all exogenous regressors, including a
constant vector.
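The construction of the generated instrument can be sketched for one endogenous regressor and one exogenous regressor. The plain-Python example below simulates hypothetical data in which a common factor makes Y2 endogenous and the variance of ε2 depends on X, the kind of heteroskedasticity the method relies on. It regresses Y2 on a constant and X, forms Z = (X − X̄)·ε̂ from the residuals, and computes the just-identified IV estimate of γ1 alongside OLS. The helper names are hypothetical, and ivreg2h also handles multiple regressors, external instruments, GMM and robust covariance estimation, none of which is reproduced here.

```python
import math
import random

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def cross(A, B):
    """Return A'B for two lists of equal-length rows."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]

rng = random.Random(2012)
n = 20000
X, Y1, Y2 = [], [], []
for _ in range(n):
    x, u, v1, v2 = (rng.gauss(0.0, 1.0) for _ in range(4))
    e1 = u + v1                          # common factor u makes Y2 endogenous
    e2 = u + math.exp(0.5 * x) * v2      # Var(e2 | x) depends on x
    y2 = 1.0 + 0.5 * x + e2
    y1 = 2.0 + 1.0 * x + 1.0 * y2 + e1   # true gamma_1 = 1
    X.append(x); Y1.append(y1); Y2.append(y2)

# First stage: OLS of Y2 on (1, X); residuals e2_hat.
xbar, y2bar = sum(X) / n, sum(Y2) / n
slope = (sum((x - xbar) * (y - y2bar) for x, y in zip(X, Y2))
         / sum((x - xbar) ** 2 for x in X))
e2_hat = [y - y2bar - slope * (x - xbar) for x, y in zip(X, Y2)]

# Generated instrument: (X - mean(X)) * e2_hat.
Zgen = [(x - xbar) * e for x, e in zip(X, e2_hat)]

# Regressors (1, X, Y2); instruments (1, X, Zgen): just-identified IV.
R = [[1.0, x, y2] for x, y2 in zip(X, Y2)]
Q = [[1.0, x, zg] for x, zg in zip(X, Zgen)]
y_col = [[y] for y in Y1]
b_iv = solve(cross(Q, R), [row[0] for row in cross(Q, y_col)])
b_ols = solve(cross(R, R), [row[0] for row in cross(R, y_col)])
print(f"gamma_1: OLS {b_ols[2]:.3f}, generated-instrument IV {b_iv[2]:.3f}")
```

In this design the product of the two errors is uncorrelated with X while the variance of ε2 moves with X, which is what makes the generated instrument informative.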
Stata implementation: ivreg2h
Empirical illustration 1
In Lewbel’s 2012 JBES paper, he illustrates the use of his method with
an Engel curve for food expenditures. An Engel curve describes how
household expenditure on a particular good or service varies with
household income (Ernst Engel, 1857, 1895).¹ Engel’s research gave
rise to Engel’s Law: while food expenditures are an increasing function
of income and family size, food budget shares decrease with income
(Lewbel, New Palgrave Dictionary of Economics, 2d ed. 2007).
¹ Not to be confused with Friedrich Engels, Karl Marx’s coauthor.
The data are 854 households, all married couples without children,
from the UK Family Expenditure Survey, 1980–1982, as studied by
Banks, Blundell and Lewbel (Review of Economics and Statistics,
1997). The dependent variable is the food budget share, with a sample
mean of 0.285. The key explanatory variable is log real total
expenditures, with a sample mean of 0.599. A number of additional
regressors (age, spouse’s age, ages², and a number of indicators) are
available as controls. The coefficients of interest in this model are
those of log real total expenditures and the constant term.
[Figure: x-axis, Log Real Total Expenditure]
We first estimate the model with OLS regression, ignoring any issue of
mismeasurement. We then re-estimate the model by two-stage least
squares, using log total income as an instrument. Because this model is
exactly identified, these are also the IV-GMM estimates.
In the following table, these estimates are labeled OLS and TSLS,ExactID.
A Durbin–Wu–Hausman test for the endogeneity of log real total
expenditures in the exactly identified TSLS model rejects exogeneity
(p-value = 0.0203), indicating that OLS is inappropriate.
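A regression-based form of the Durbin–Wu–Hausman test can be sketched in plain Python: add the first-stage residual to the OLS regression and test its coefficient. The data below are simulated and hypothetical, not the Family Expenditure Survey sample; the test uses conventional standard errors and a normal approximation for the p-value, and it is not necessarily the statistic reported by ivreg2 or ivregress. As a check, the coefficient on x in the augmented regression reproduces the just-identified 2SLS slope.

```python
import math
import random

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = len(X), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    # j-th diagonal element of (X'X)^{-1}, from solving X'X c = unit vector
    se = [math.sqrt(s2 * solve(XtX, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return b, se

# Hypothetical data: x is endogenous because its first-stage error v
# also enters the structural error; z is a valid instrument.
rng = random.Random(1997)
n = 2000
Z, X, Y = [], [], []
for _ in range(n):
    z, v, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = z + v
    Z.append(z); X.append(x); Y.append(1.0 + 2.0 * x + 0.5 * v + w)

# First stage: x on (1, z); keep the residuals v_hat.
a, _ = ols_with_se([[1.0, z] for z in Z], X)
v_hat = [x - a[0] - a[1] * z for x, z in zip(X, Z)]

# Augmented regression: y on (1, x, v_hat); test the coefficient on v_hat.
b, se = ols_with_se([[1.0, x, vh] for x, vh in zip(X, v_hat)], Y)
t_stat = b[2] / se[2]
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))   # two-sided, normal approximation

# For comparison: the just-identified 2SLS slope cov(z, y) / cov(z, x).
zb, xb, yb = sum(Z) / n, sum(X) / n, sum(Y) / n
iv_slope = (sum((z - zb) * (y - yb) for z, y in zip(Z, Y))
            / sum((z - zb) * (x - xb) for z, x in zip(Z, X)))
print(f"t = {t_stat:.2f}, p = {p_value:.4g}; x coefficient {b[1]:.4f}, 2SLS {iv_slope:.4f}")
```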
            (1)            (2)
            OLS            TSLS,ExactID
lrtotexp    -0.127         -0.0859
            (0.00838)      (0.0198)
            (1)            (2)
            TSLS,GenInst   GMM,GenInst
lrtotexp    -0.0554        -0.0521
            (0.0589)       (0.0546)
² The GMM results do not agree with those labeled GMM2 in the JBES
article. However, it appears that the published GMM2 results are not the true
optimum.
            (1)            (2)
            TSLS,AugInst   GMM,AugInst
lrtotexp    -0.0862        -0.0867
            (0.0186)       (0.0182)
            (1)            (2)
            GMM,ExactID    GMM,AugInst
lrtotexp    -0.0859        -0.0867
            (0.0198)       (0.0182)
Empirical illustration 2
For purposes of illustration, we first fit the model treating the level of
cash holdings as endogenous, but maintaining that we have no
available external instruments. In this context, ivreg2h produces
three generated instruments: one from each included exogenous
regressor. We employ IV-GMM with a cluster-robust VCE, clustered by
firm.
dE      0.0301***
        (7.24)
dNA    -0.0115***
        (-6.00)
Lev    -0.0447***
        (-18.45)
N       117036
jdf     2
jp      0.245
t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
The results show only minor differences between the point estimates
produced by standard IV and those from the augmented equation. However,
the latter are more efficient, with smaller standard errors for each
coefficient. The model is now overidentified by three degrees of
freedom, allowing us to conduct a test of overidentifying restrictions.
The p-value of that test, jp, gives no evidence against those restrictions.
Summary remarks on ivreg2h
Concluding remarks