Limited Dependent Variable Models
1 Truncation
The effect of truncation occurs when the observed data in the sample are only drawn from a subset of a larger population. The sampling of the subset is based on the value of the dependent variable.

An example: a study of the determinants of incomes of the poor. Only households with income below a certain poverty line are part of the sample.
In the truncated regression model the latent variable yi∗ = xi β + εi with εi ∼ N(0, σ²) is only observed, yi = yi∗, if it exceeds the truncation point a.

Note that the expected value of the observed variable is not linear in xi (try to derive the equation below)

E(yi | xi) = E(yi∗ | yi∗ > a, xi) = xi β + σ φ[(xi β − a)/σ] / Φ[(xi β − a)/σ] = xi β + σλi

where λi ≡ φ(αi)/Φ(αi) and αi = (xi β − a)/σ. Figure 1 visualizes the truncated regression model in an example with N = 30, K = 2 (a constant and one independent variable), lower truncation point a = 0, β = (−2, 0.5) and σ = 1.

Figure 1: The truncated regression model.

1.2 Interpretation of Parameters

The interpretation of the parameters depends very much on the research question. If the researcher is interested in the underlying linear relationship in the whole population, the slope coefficients β can simply be interpreted as marginal effects. However, if the researcher is only interested in the effect on the observed subpopulation, the marginal effect of a regressor xik is ∂E(yi | yi∗ > a, xi)/∂xik = βk [1 − λi (αi + λi)], which is smaller in absolute value than βk.

1.3 Estimation

The simple linear regression of the observed variable yi on xi

yi = xi β + ui

will yield biased estimates of β, as the error term ui = (εi | yi∗ > a) is correlated with xi and E(ui) = E(εi | yi∗ > a) = σλi > 0.

The truncated regression is therefore usually estimated by maximum likelihood (ML). The log likelihood function is

ln L = Σ_{i=1}^{N} ln[ σ^{−1} φ((yi − xi β)/σ) ] − Σ_{i=1}^{N} ln[ 1 − Φ((a − xi β)/σ) ]

where φ(.) is the pdf and Φ(.) the cumulative normal distribution, and allows one to estimate both β and σ by an iterative numerical procedure. The usual ML properties (consistency, asymptotic efficiency and normality, etc.) apply.

1.4 Implementation in STATA

Stata estimates the truncated regression model by the command

truncreg depvar [indepvars], ll(#)

where ll(#) defines the lower truncation point a. You can also estimate a more general model with a lower and an upper truncation point

truncreg depvar [indepvars], ll(varname) ul(varname)

where the lower ll and upper ul thresholds can be observation specific and their values are defined by varname.
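The bias from estimating the truncated sample by OLS, and its correction by truncreg, can be illustrated with a small simulation. The following sketch is not part of the notes; it uses hypothetical variable names and mimics the setup of Figure 1 (a = 0, β = (−2, 0.5), σ = 1) with a larger sample so that the bias is clearly visible.

* simulate a latent linear model and truncate the sample at a = 0
clear
set seed 12345
set obs 1000
generate x = 10*runiform()
generate ystar = -2 + 0.5*x + rnormal()   // latent y* = -2 + 0.5x + e, e ~ N(0,1)
keep if ystar > 0                         // truncated sampling: only y* > a enters the sample
rename ystar y
regress y x                               // OLS on the truncated sample: slope biased towards zero
truncreg y x, ll(0)                       // ML truncated regression: consistent for beta and sigma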
You can use the post-estimation commands predict and mfx to request predictions and marginal effects. For example

truncreg inc age edu, ul(36000)
predict inc_hat, e(.,36000)
mfx compute, predict(e(.,36000)) at(age=45,edu=12)

fits a truncated regression model to incomes under CHF 36,000, predicts the expected income E(yi | xi) = E(yi∗ | yi∗ < 36000, xi) in this subpopulation and calculates the marginal effects of age and education on expected observed income E(yi | xi) for a 45 year old person with 12 years of education.

2 Censoring

Censoring occurs when the values of the dependent variable are restricted to a range of values. As in the case of truncation, the dependent variable is only observed for a subsample. However, there is information (the independent variables) about the whole sample.

Some examples:

• Income data are often top-coded in survey data. For example, all incomes above CHF 200,000 may be reported as CHF 200,000. However, households with high incomes are part of the sample and their characteristics are reported.
• Tickets sold for soccer matches cannot exceed the stadium's capacity.
• Expenditures for durable goods are either positive or zero. (This is the example used in Tobin's (1958) original paper.)
• The number of extramarital affairs is nonnegative. (Note that although Fair's (1978) famous article uses a Tobit model, count data models may be more appropriate.)
In the censored regression (tobit) model the latent variable yi∗ = xi β + εi with εi ∼ N(0, σ²) is observed as yi = max(0, yi∗). The observed variable therefore has a probability mass at 0, P(yi = 0 | xi) = 1 − Φ(xi β/σ), and is continuously distributed over the range of values above 0 with density f(yi | xi) = σ^{−1} φ[(yi − xi β)/σ].

The expected value of the observed variable is

E(yi | xi) = 0 · P(yi∗ ≤ 0 | xi) + E(yi∗ | yi∗ > 0, xi) · P(yi∗ > 0 | xi) = [ xi β + σ φ(xi β/σ)/Φ(xi β/σ) ] · Φ(xi β/σ)

The OLS regression of the observed variable yi on xi

yi = xi β + ui

will yield biased estimates of β, as E(yi | xi) = xi β Φ(αi) + σφ(αi), with αi = xi β/σ, is not a linear function of xi. Note that restricting the sample to fully observed observations, i.e. where yi > 0, does not solve the problem, as can be seen in the truncated regression model above.

The censored regression model is usually estimated by maximum likelihood (ML). Assuming independence across observations, the log likelihood function is

ln L = Σ_{i | yi > 0} ln[ σ^{−1} φ((yi − xi β)/σ) ] + Σ_{i | yi = 0} ln[ 1 − Φ(xi β/σ) ]

and allows one to estimate both β and σ by an iterative numerical procedure. The above likelihood function is a (strange) mixture of discrete and continuous components and standard ML proofs do not apply. However, it can be shown that the Tobit estimator has the usual ML properties. Although the log-likelihood function of the Tobit model is not globally concave, it has a unique maximum. The ML estimator is inconsistent in the presence of heteroscedasticity. Greene (2004, section 22.3.3) shows how to test for heteroscedasticity.

The ML estimation of censored regression models rests heavily on the strong assumption that the error term is normally distributed. Several semi-parametric estimation strategies have been proposed that relax the distributional assumption about the error term. See Chay and Powell (2001) for an introduction.

Stata estimates the standard (type 1) tobit model by the command

tobit depvar [indepvars], ll(0)

You can also estimate more general models with censoring from below ll(#) and from above ul(#)

tobit depvar [indepvars], ll(#) ul(#)

You can use the post-estimation commands predict and mfx to request predictions and marginal effects. For example

tobit housing inc age edu, ll(0)
predict housing_hat, ystar(0,.)
mfx compute, predict(ystar(0,.)) at(inc=50000,age=45,edu=12)

fits a tobit model to housing expenditures, predicts E(yi | xi) = E(yi∗ | yi∗ > 0, xi) · P(yi∗ > 0 | xi) and calculates the marginal effects of income, age and education on expected observed housing expenditures E(yi | xi) for a 45 year old person with income 50,000 and 12 years of education.
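As in the truncation case, a small simulation sketch (again not part of the notes, with hypothetical variable names) illustrates that OLS on the censored sample is biased, that dropping the censored observations does not help, and that the tobit ML estimator recovers the parameters.

* simulate a latent linear model and censor it at 0
clear
set seed 12345
set obs 1000
generate x = 10*runiform()
generate ystar = -2 + 0.5*x + rnormal()   // latent y* = -2 + 0.5x + e, e ~ N(0,1)
generate y = max(0, ystar)                // censoring: negative values are recorded as 0
regress y x                               // biased: E(y|x) is not linear in x
regress y x if y > 0                      // also biased: this is the truncated case of Section 1
tobit y x, ll(0)                          // tobit ML: consistent for beta and sigma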
3 Selection

The sample selection problem occurs when the observed sample is not a random sample but systematically chosen from the population. Truncation and censoring are special cases of sample selection or incidental truncation.

The classical example: income is only observed for employed persons but not for the ones that decide to stay at home (historically mainly women).

3.1 The Model (Heckman Selection Model, Tobit Type 2)

Consider a model with two latent variables yi∗ and d∗i which linearly depend on observable independent variables xi and zi, respectively

d∗i = zi γ + νi
yi∗ = xi β + εi

with

(νi, εi) ∼ N( 0, Σ ),   Σ = [ 1, ρσε ; ρσε, σε² ]

where the variance of νi is normalized to unity. The outcome yi = yi∗ is only observed if d∗i > 0 (di = 1); otherwise yi is missing. The error correlation explains why, for given xi and zi, points yi∗ above the expected value (e.g. point 6) are more likely to be observed.
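The role of the error correlation can be made precise by deriving the expected value of yi in the observed (selected) sample. The following short derivation is a sketch using only the assumptions above; it is not spelled out in the notes at this point:

E(yi | d∗i > 0, xi, zi) = xi β + E(εi | νi > −zi γ) = xi β + ρσε E(νi | νi > −zi γ) = xi β + ρσε φ(zi γ)/Φ(zi γ)

since joint normality with Var(νi) = 1 implies E(εi | νi) = ρσε νi, and E(νi | νi > c) = φ(c)/[1 − Φ(c)], which equals φ(zi γ)/Φ(zi γ) for c = −zi γ by the symmetry of φ. The term ρσε φ(zi γ)/Φ(zi γ) reappears below as the omitted factor in the OLS regression on the selected sample.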
3.2 Interpretation of Parameters

In most cases, we are interested in the effect of the independent variables in the whole population. Therefore we would like to obtain an unbiased and consistent estimator of β, which can be directly interpreted as a marginal effect. In some cases, however, the researcher is interested in the effect on the observed population. For regressors that appear in both the equation for yi∗ and the equation for d∗i, the marginal effect depends not only on β but also on γ through the probability of being in the sample. See Greene (2003, section 22.4.2).

3.3 Estimation

The OLS regression of the observed variable yi on xi

yi = xi β + ui

will yield biased estimates of β, as the factor ρσε φ(zi γ)/Φ(zi γ) is omitted and becomes part of the error term. The error term ui is therefore correlated with xi if ρ ≠ 0 and zi is correlated with xi. The resulting bias is called selection bias or sample selectivity bias.

Note that there is no bias if the unobservable components are uncorrelated (ρ = 0), even when the observed sample is highly selective, i.e. even when x and z are correlated and thus some values of x are more likely to be observed than others. Figure 4 shows this situation. Needless to say, there is no bias if both the observable and the unobservable characteristics of the decision and the regression equation are uncorrelated. This case of a pure random sample is sketched in Figure 5.

3.3.1 Estimation with Maximum Likelihood

The decision and regression equations can be simultaneously estimated by maximum likelihood under the distributional assumptions made. The log-likelihood function consists of two parts: (1) the likelihood contribution from observations with di = 0, i.e. the probability of not being observed, and (2) the likelihood contribution from observations with di = 1.
Figure 5: The selection model with both uncorrelated observable and unobservable characteristics, i.e. random sampling.

Note that this likelihood function identifies β, γ, ρ and σε but not the variance of ν, which was set to unity. In the case of ρ = 0, the log likelihood separates into a probit likelihood for the selection equation and a standard linear regression likelihood for the observed outcomes, so the two parts can be maximized separately.

The log likelihood is maximized numerically, and good starting values are very important. Therefore, estimates from the two-step procedure in the following section are often used as starting values. The ML estimation is only necessary when the hypothesis ρ = 0 is rejected in the two-step estimation.

The ML estimation of the Heckman selection model rests heavily on the assumption that the error terms are jointly normally distributed. This is a very strong and often unrealistic assumption. Several semi-parametric estimation strategies have been proposed that relax the distributional assumption about the error term. See Vella (1998) for an introduction.

3.3.2 Two-Step Estimation

The two-step procedure first estimates γ by a probit of di on zi using all observations. We can use this to consistently estimate the inverse Mills ratio λ̂i = φ(zi γ̂)/Φ(zi γ̂) for all observations.
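In Stata the two steps can be carried out by hand. The following minimal sketch is not part of the notes and uses hypothetical variable names: wage is observed only for employed persons, and edu, age and kids enter the selection equation.

probit employed edu age kids                // step 1: selection equation, estimated on all observations
predict zg, xb                              // fitted index zi*gamma-hat
generate mills = normalden(zg)/normal(zg)   // inverse Mills ratio lambda-hat for all observations
regress wage edu age mills                  // step 2: outcome equation on the selected sample

The coefficient on mills estimates ρσε. Note that the standard errors reported by regress in the second step ignore the fact that mills is itself an estimate and therefore need to be corrected.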
Figure 6: The inverse Mills ratio λi = φ(zi γ)/Φ(zi γ).

If z contains no variables that are excluded from x, β is only identified by the non-linearity of the inverse Mills ratio λ(.). However, as can be seen in Figure 6, λ(.) is almost linear for a large range of values of zi γ. It is therefore strongly advised to include variables in z that are not included in x, although it is often difficult to find such variables.
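For reference, Stata also estimates the selection model directly with the heckman command. In the hypothetical example above, kids is a variable included in z but excluded from the wage equation, i.e. exactly the kind of exclusion restriction recommended here.

heckman wage edu age, select(employed = edu age kids)           // full ML
heckman wage edu age, select(employed = edu age kids) twostep   // Heckman two-step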