0% found this document useful (0 votes)

8 views

Notes 13

This document discusses limited dependent variables and models for analyzing them. It covers binary dependent variables modeled by probit and logit regression. These non-linear models estimate the probability of an outcome using maximum likelihood estimation to account for the binary nature of the dependent variable, unlike the linear probability model.

Uploaded by

zenith6505

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views

Notes 13

Uploaded by

zenith6505

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

Section 13 Limited Dependent Variables

What is a limited dependent variable?

 Our standard assumption of an error term that is normally distributed conditional on the
regressors implies that the dependent variable can be (with positive probability) any real
number, positive or negative.
 Limited dependent variables are dependent variables that have limited ranges: usually
either discontinuous or range bounded. There are many models of LDVs based on what
the limitations are:
o 0/1 dependent variables (dummies) by probit and logit
o Ordered dependent variables by ordered probit and logit
o Categorical dependent variables (with more than two categories) by multinomial
logit
o Truncated dependent variables by Heckman’s procedure
o Censored dependent variables by tobit
o Count (integer) dependent variables by Poisson regression
o Hazard (length) dependent variables by hazard models
 Because of the limited ranges of the dependent variable, the standard additive normal
error is not tenable for these models. Instead we must model the probability of various
discrete outcomes.
 LDV models are usually estimated by maximum likelihood, given the assumed
distribution of the conditional probabilities of various outcomes.

Binary dependent variables

 For binary dependent variable: E  yi | x i   Pr  yi  1| x i .
 Linear probability model: using OLS with a binary dependent variable
o We can model, as usual in OLS, E  yi | x i   0  1 x1i    k x ki .
o Show graph of Pr  yi  1| x i  as linear function of x.
o However, we can’t just stick a normal error term onto this function. If we write
yi  E  yi | x i   ei   0  1 x1i     k x ki  ui , then since yi is either zero or one, ei
can only take on the values    0  1 x1i     k x ki  and
    0  1 x1i    k x ki  .
o If E  yi | x i   Pr  yi  1  0  1 x1i    k x ki  x i , then the error term must

have a conditional Bernoulli distribution with Pr ui   xi    xi , and

Pr ui    xi    1   xi .

~ 154 ~
o Sums of random variables with the Bernoulli distribution do converge to normal,
so the coefficient estimates will still be asymptotically normal.
o However, the immediate problem with this is that the linear function
0  1 x1i    k x ki will not lie in the range [0, 1] that is required for
probabilities for all values of x.
o This problem is mirrored by the fact that the predicted values of y for some
observations is likely to be outside [0, 1], which does not make sense as a
prediction of Pr[yi = 1| x].
 Show diagram of straight-line prediction of probability and possibility of
predictions outside of [0, 1].
o Finally, there is heteroskedasticity in the model as the variance of the Bernoulli

  xi   1  xi , which varies with x.

error term is 
 This is easily accommodated with robust standard errors.
o Bottom line on linear probability model:
 Simple
 Probably OK as long as x is close to sample means, so that predicted Pr[yi
= 1 | x] stays in [0, 1].
 Not the best model when the dependent variable is binary.
 Logit (logistic) and probit regression
o These are the standard models when the dependent variable is binary. They differ
only in the assumed distribution of the error term and are in practice virtually
equivalent.
o Structure of the models: E  yi | x   Pr[ yi  1| x ]  G 0  1 x1i     k x ki ,
where G is a cumulative probability function that is either
1 ez
 Logistic: G  z     z    for the logit model or
1  e z 1  e z
z z
1  12 2
 Normal: G  z     z     d    e d  for probit.
  2
o Draw graph of cumulative distribution function and show interpretation of z and
implied probability of y = 1.
 Compare to linear probability model’s assumption of linear relationship
between z and probability.
 Show how actual data points would look on this graph.
o Estimation of probit and logit models:
 These models are always estimated by (nonlinear) maximum likelihood.
 The (discrete) density function of yi conditional on xi is
1 yi 
f  yi | x i ,    G  x i    1  G  x i   
yi
, yi  0, 1, which can be

~ 155 ~
rewritten less compactly (but more intuitively) as
Pr  yi  1| x i ,    G  x i   ,
Pr  yi  0| x i ,   1  G  x i   .
 The likelihood function, assuming that all observations in the sample are
N
1 yi 
IID, is L   y , x    G  x i    i 1  G  x i   
y
.
i 1

 The likelihood maximization is always done in terms of the log-

likelihood function:
N
ln L   y, x     yi ln G  xi     1  yi  ln 1  G  x i   .
i 1

 This function can be evaluated for any choice of . By searching over the
parameter space for the value of  that maximizes this value, we can
calculate the logit or probit coefficient estimator as the  that leads to the
highest value of the likelihood function.
 Maximum likelihood estimators are known to be consistent,
asymptotically normal, and asymptotically efficient under broadly
applicable conditions.
 Calculating the standard errors of the coefficient estimators is
complicated, but is handled by Stata. The asymptotic covariance matrix
of any MLE is the inverse of the “information matrix”:
1
   2 ln L  ;Y , X   
 1
cov ˆ   I        E 
  
  . The information matrix
 
involves the expected values of the matrix of second partial derivatives of
the log-likelihood function with respect to the  parameters. It can be
approximated for the sample numerically to get an estimated covariance
matrix for the parameter vector.
 Hypothesis tests in this, as in any ML model, are easiest as likelihood
ratio tests: 2 ln Lu  ln Lr  ~ q . Stata test command also works and does
2

ˆ j  c
a Wald test: t  ~ t N  K  where the t distribution is asymptotic.
 
se ˆj

 Goodness of fit:
 Fraction predicted correctly:
o If you take the prediction of yi to be 1 if G x i ˆ  0.5 and  
zero otherwise, then you get a prediction of zero or one
for each yi. The fraction predicted correctly is just what it
sounds like.
 Pseudo-R2:
~ 156 ~
o In the spirit of the usual R2, this is

1

ln L ˆ ; x , y  ,  Z    y , 0, 0, , 0 .
ln L   Z ; x , y 
o [Note: This formula is very strange and looks upside
down, but it’s not. The reason it looks weird is because
we are taking the ratio of logs (we usually subtract them).
Because (with a discrete dependent variable) the
likelihood function is a product of probabilities, it is
always less than one. This means that the logs are
negative, with the denominator being more negative than
the numerator. This, an improvement in fit increases the
likelihood in the numerator by decreasing its absolute
value, making the ratio smaller and the R2 value closer to
one.]
o This ratio is the likelihood function with the best
parameter estimate divided by the likelihood function if
we just predict each y by the sample proportion of y
values that are one.
o Interpretation of  in probit and logit regressions:
E  yi | x i 
 In the usual OLS model,  j  , which is what we are interested
x j
in knowing.
z
 In probit or logit model,  j  is not in useful units because z has no
x j
direct interpretation.
 Use graph to demonstrate  as horizontal movement
 What we’re interested in knowing (for a continuous regressor x) is
 Pr  y  1 d Pr  y  1 z
  G   z   j  g  z   j , where g is the
x j dz x j
probability density function associated with the cumulative distribution
function G.
 Graphical interpretation:  measures horizontal movement due to
unit change in x; G (z) measures effect of unit horizontal
movement on probability of y = 1.
 They have the same sign, so tests of  = 0 are equivalent to tests
 Pr  y  1
of 0.
x j

~ 157 ~
ez
 For logit, g  z      z  1    z   G  z  1  G  z  .
1  e z 
2

o The results of the logit model is often expressed in terms

of odds ratios:
e xi 
Pr  yi  1| x i     x i   
1  e xi 
1  e    x   e
xi 
i
xi 

  x i    e xi     x i   e xi   1    x i    e xi 
  xi  Pr  yi  1| x i 
e xi     "odds ratio"
1    xi  Pr  yi  0| x i 
o j is the effect of xj on the “log odds ratio”
1
1  z2
 For probit, g  z     z   e 2
.
2
 Because they are density functions, g(z)  0 for all z, so the
 Pr  y  1
“partial effects” have the same sign as j.
x j
 For dummy regressors, we are interested in
Pr  y  1| x j  1  Pr  y  1|x j  0 .
 In Stata: probit reports the coefficients and dprobit (which is still
supported but no longer official) reports the partial effects. The
regression is identical for each.
o Note that the partial effects depend on z and thus on x.
You can specify the values at which to evaluate the partial
effects in dprobit with the default being at the means.
o Partial effects of dummy variables are reported (by
default) as difference in probabilities above, with other
variables at means.
 In Stata: logit reports coefficients and logistic reports the “odds-
ˆ
ratio” e j . (This is really the proportional effect of the variable on
the odds ratio, not the odds ratio itself.)
xi  j 
o If xji increases by one, e x  increases to e i
 e xi  e j , so
ˆ j
e measures the estimated proportion by which a one-
unit change in xji changes the odds ratio.
o Interpretation can be tricky:
 All e values are positive.
 A zero effect means that  = 0 and e = 1.

~ 158 ~
 A variable that reduces the odds ratio has a  < 1.
 A variable that increases the odds ratio has a
 > 1.
 Example: If ej = 2 and the initial probability p of
y = 1 for this observation is .2, (so the initial odds-
ratio p/(1 – p) is (.2) / (.8) = 0.25), then a one-unit
increase in xj multiplies the odds ratio by ej = 2,
making it 0.5, which means that the probability of
y = 1 has increased from 0.2 to 0.333 = 0.5/(1 +
0.5).
 If we do the same example for an observation
with an initial p = 0.5, then the initial odds ratio is
1, the unit increase in xj multiplies it by 2, making
the new odds ratio 2, and thus the probability has
increased from 0.5 to 2/(1 + 2) = 0.667.
 Post-estimation commands after probit and logit are very useful
to get predictions (predict gives probability y = 1 by default, not
xb, and margins can be used to get predictions at specific values
of x)
o See probit/logistic postestimation help file for details
o Important for project
o Reliability of probit and logit estimators
 Omitted-variable bias
 This is more of a problem in probit and logit models because a
coefficient of an included variable can be inconsistent even when
it is uncorrelated with the omitted variable
 Heteroskedasticity
 Again, more of a problem in probit and logit because the standard
MLE based on an assumption of homoskedasticity is
inconsistent.
 You can use the White robust estimator for the covariance
(“robust standard errors”), but you are calculating a valid
standard error for a coefficient that does not converge to the true
parameter value, so it is of less utility than in OLS,
 How to deal with these issues?
 Be careful about omitted variables
 Try to specify the model in a scaled way that makes variance as
constant as possible

~ 159 ~
Discrete-choice dependent variables
 What if there are more than two choices?
o Instead of the dependent variable being whether someone attends Reed or not, it
could be whether someone attends Reed (y = 3), attends another private college
(2), attends a public college (1), or doesn’t attend college at all (y = 0).
o This would be four choices rather than two.
o This is an “unordered-choice model:” There is no obvious order to these choices.
If we define y as above and say that changes in characteristics of the individual
(not of the choices) x (say, higher SAT) that make y more likely to move from 0
to 1, we can’t also be confident that these changes in x are more likely to make y
move from 1 to 2 or from 2 to 3.
 Multinomial (polytomous) logit model (Greene 6/e, section 23.11)
xi  j
e
o Pr  yi  j | x i   M
, where there are M distinct choices. This model has
e
m 1
x i m

M(k + 1)  parameters, but only (M – 1)(k + 1) of them are unique because the
sum of the probabilities must be one. (If an increase in family income raises the
probabilities that you will choose y = 2, 3, and 4, it must lower the probability of
choosing y = 1 by an equivalent amount. Thus, i, 1 can be determined from i, 2,
i, 3, and i, 4. Where the second subscript refers to the choice and the first to the
independent variable.) We usually normalize by setting the vector 1 = 0, which
makes the numerator of the probability fraction 1 for choice 1.
 Pr  yi  j | x i  
In the multinomial logit model, ln   x i  j . The coefficients thus
 Pr  y  1| x  
o
 i i 
can be interpreted as the effect of x on the log odds ratio.
o Independence of irrelevant alternatives assumption is implicit in multinomial
logit model
 It shouldn’t matter for the coefficients of the attending-Reed equation
whether one adds attending Lewis & Clark as a special case of attending
a private college (making 5 alternatives) or not
 This assumption may not be reasonable in some cases, making the model
inappropriate.
o Multinomial logit models can be estimated by maximum likelihood methods. In
Stata, use mlogit.
 Related models:
o Conditional logit model: The x variables relate to properties of the choices
instead of or in addition to the individual. (Not clogit in Stata; that’s something
else.)

~ 160 ~
o Nested logit model: Decisions are nested. For example, decision whether to
attend college, then if attending whether to attend Reed, another private college,
or a public. In Stata, use nlogit.
o Multinomial probit: Same thing with normal rather than logistic function. Very
time-consuming to estimate, so it’s not used often.

Ordered dependent variables

 Many variables are ordinal in nature: we know that 4 is bigger than 3 and that 3 is bigger
than 2, but we don’t know the 4 is the same amount bigger than 3 as 3 is bigger than 2.
o Examples would include bond ratings, opinion-survey responses, academic
actions, and perhaps grades and SAT scores.
 We can think of the ordinal dependent variable y as representing levels of the outcomes
of some underlying latent variable y*.
o We assume that yi*  x i   ei , and that we observe the ordinal choice yi:
1 if yi*  1 ,

2 if 1  yi   2 ,
*


yi  3 if  2  yi*  3 ,
 

 M if  M 1  yi* .

o If the error term is normal, then we can use ordered probit to estimate the 
vector and the thresholds  corresponding to the different levels of the variables.
o Ordered logit is used when the error term follows the logistic distribution.
o Ordered probit/logit involves estimating the  vector and the threshold values 1
through M – 1 by maximum likelihood.
o If we normalize the model to give the error term unit variance (divide y and x by
the standard deviation of error), then we have
Pr  yi  1| x i     1  x i  
Pr  yi  2| x i      2  x i      1  x i  
Pr  yi  3| x i     3  x i      2  x i  

Pr  yi  M | x i   1     M 1  x i   .
n
M 
o The likelihood function is L  , ; y, x      I  yi  m  Pr  yi  m | x i ,  ,
i 1  m 1 
where I(yi = m) is an indicator function that is one if the condition is true and the
probability is given by the formulas above. The likelihood function is maximized
by searching over alternative values of  and  to find those that maximize.

~ 161 ~
o Show Greene (6/e) Figure 23.4 from p. 833.

o Partial effects: what does j mean?

 As in the standard probit and logit models, j is the derivative of the
unobserved y* with respect to xj.
 We can derive marginal effects of xj on the probabilities of y being each
value as
 Pr  yi  1| x i 
   1  x i    j ,
x j ,i
 Pr  yi  2| x i 
   1  x i       2  xi     j ,
x j ,i

 Pr  yi  M | x i 
    M 1  x i    j .
x j ,i
 In Stata, you can use the margins command after oprobit to get marginal
effects.
 Margins, dydx(*) predict (outcome(#1)) will calculate the effect of a
one-unit change in each regressor (*) on the probability of
outcome #1. Leave off the predict () to get all.
 Predict gives predicted probabilities of each level for each observation.
 Predict probs* creates new variables starting with probs with
probabilities of each outcome. Can also restrict to get only some
with outcomes() option.

~ 162 ~
Count dependent variables
 Count dependent variables can only take on non-negative integer values.
o Normal distribution is not a plausible choice.
o Poisson distribution is often used for count models:
e i im
 Pr  yi  m| xi   .
m!
 Poisson distribution has mean and variance both equal to i, so
E  yi | x i   i
 In Poisson regression we model  i  e X i  .
n
The log-likelihood function is ln L    e i  yi x i   ln  yi !  and we
x

i 1

estimate as usual by maximizing this function.

 Interpretation of coefficients:
E  yi | x i 
  i  j  e x i  j .
x j
o Poisson regression is implemented in Stata by the poisson command.
 Limitation of Poisson regression
o The fact that the conditional mean and conditional variance of the Poisson
distribution are the same is restrictive.
o If it doesn’t fit your data well, then a more general model might be appropriate.

~ 163 ~
o The most common alternative is the negative binomial regression model, which
is implemented as nbreg in Stata.

Tobit, censored, and truncated regression models

 These three models are very easy to confuse!
o All involve situations where we have no observations from some region of the
(usually normal) distribution.
o Example: Sometimes we have corner solutions in economic decisions: many
people choose to consume zero of many commodities. (This is the tobit model.)
o Example: Sometimes surveys are “top-coded” with the maximum response being
50,001 or something like that. (This is censored regression.)
o Example: If the dependent variable is duration until death of patients after
treatment, some patients will not yet have died. (Another censored regression.)
o Example: Some events sell out, meaning that the observed values of demand are
censored at the site capacity. (Yet another censored regression.)
o Example: Sample consists only of people with values of y below a limit c. (This is
truncated regression model.)
 Tobit estimator for corner solutions
o Suppose that some finite fraction  of observations choose zero, but those
choosing positive quantities follow the remainder of the normal distribution
(lopping off the left-end  of probability).
o Why can’t we just use OLS?
 Like linear probability model, we ignore restrictions on the distribution of
u and we predict values < 0.
o Why can’t we just use the observations with yi > 0?
 This would imply selection on u because we’d be more likely to eliminate
observations with u < 0.
o Why can’t we use ln(y)?
 Observations with y = 0 would have ln(y) = –∞.
o We can model this with a latent variable yi*  x i  ui as a latent underlying

 y , if yi  0,
* *
variable with a normal distribution and yi   i as the observed
0, otherwise
outcome.
 This variable has a censored distribution with finite probability of a zero
outcome but otherwise normally distributed over the positive values.

~ 164 ~
 The conditional density of y is
1  yi  x i  
f  yi | x i     for y > 0, and
   
Pr  yi  0| x i   1    x i  /  .
 This density is the basis for the tobit estimator of the vector .
 Tobit maximizes (over , ) the log-likelihood function:
  x i    1  yi  x i  
ln L  , ; y, x    ln 1        ln   
   i : yi 0      
i : yi  0 
 The limit value (zero here, but it could be some value c) must be
specified.
 Can also have distributions that are censored above, or both above and
below (perhaps the share of merlot in total wine consumption), where
some people choose zero and some choose one).
o Interpreting tobit coefficients
 There are two expected values of interest in the tobit model:
 “Conditional (on yi > 0) expectations”
E  yi |xi | yi  0   E  yi | yi  0, x i 
o Draw graph showing censorship at 0 and density function
  yi 
over yi > 0 = f  yi | yi  0   .
1    yi 
o Remarkable and useful property of standard normal
c 
distribution: E  z | z  c   .
1   c 
o yi > 0 iff ui > –xi  and ui is (by assumption) distributed
normally with mean 0 and variance 2. Thus ui/ is
u u  c 
standard normal and E  i i  c   .
   1   c 
o Conditional on x, E(x) = x, so
E  yi | yi  0, x i   x i   E  ui |ui   x i x i 
u u x 
 x i   E  i i   i x i 
   
 x 
 i 
 
 xi     ,
 xi  
 
  
where we use the properties that (–z) = (z) and 1 – (–z)
= (z).
~ 165 ~
c 
o We define the inverse Mills ratio as   c   .
 c 
 x 
o Then E  yi | yi  0, x i   x i     i  is the “conditional
  
expectation” of y given that y is positive.
 “Unconditional (on y > 0) expectation” (which is still
conditional on x) E  yi | x i  :
E  yi | x i   0  Pr  yi  0| x i   E  yi | yi  0, x i   Pr  yi  0| x i 
 E  yi | yi  0, x i   Pr  yi  0| x i 
  x    u x  
o   x i     i    Pr  i  i 
      
  x  
  i  
  x
  xi        i 
  x    
  i  
   
 x   x 
   i  x i     i  .
     
 Interpretation of j?
E  yi | xi 
 In the usual OLS model,  j  .
x j
 Here,
E  yi | yi  0, x i    x i  /     x i  /  
 j  
x j   xi /  xj
  x i  /  
 j  j .
  xi  /  
  c    c    c     c    c 
o By quotient rule,  .
  c  
c 
2

o But (c) = (c) by definition and using the definition of

the normal density function, (c) = –c (c), so
c   c    c     c  
2
  c 

  c  
c 
2

 c   c   
  c  
2

   c   c    c   .

~ 166 ~
o Therefore,
E  yi | yi  0, x i    x  x   x   
  j    i   i    i   .
x j         
 The expression in braces is between 0 and 1, so
the effect of xj on the conditional expectation of y
is of the same sign as j but smaller magnitude.
 Testing j = 0 is a valid test for the partial effect
being zero.
 Given that E  yi | x i   E  yi | yi  0, x i   Pr  yi  0| x i ,
E  yi | x i  E  yi | yi  0, x i 
  Pr  yi  0| x i 
x j x j
 Pr  yi  0| x i 
  E  yi | yi  0, x i .
x j

 x 
  i 
 Pr  yi  0| x i      j  x 
o    i .
x j  j    
E  yi | yi  0, x i    x  x   x   
o   j    i   i    i   .
x j         
o So (with all , , and  functions evaluated at xi/)
E  yi | x i   xi    j
 j                 x i     
x j     
 xi  xi  
  j         
             
      
 x 
  j   i .
  
o Doing tobit estimation in Stata
 tobit depvar indvars , ll(0) does tobit with zero lower censorship
 ul( ) option specifies possible upper point of censorship
 After estimation, can use the predict command to generate some useful
series:
 predict , pr(0, .) gives the predicted probability that each observation
x
is not censored, Pr  yi  0| x i     i  .
  
 predict , e(0, .) gives the predicted value of each observation
conditional on not being censored, E  yi | yi  0, x i 

~ 167 ~
 predict , ystar(0, .) gives the unconditional predicted value of each
observation E  yi | x i  .
 margins is used after tobit to get partial effects
 The options correspond to those of predict:
o Margins dydx(*) , pr(0, ) gives effect of unit change in all
 Pr  y  0 
variables (*) on probability that y > 0:
x
o Margins dydx(*) , e(0, ) gives effect of unit change in all
variables on expected value condition on not being
E  y | y  0, x 
censored
x
o Margins dydx(*) , ystar(0, ) gives effect of unit change in
all variables on unconditional expected value
E  y *  E  max  y,0  

x x
 Censored regression (top-coding problems, unexpired duration models, etc.)
o We have data on the x variables for all observations, but have no observations on
y for those at one end (or both ends) of the distribution.
 If y > c, then we observe c.
o Let yi  xi  ui , where ui is homoskedastic normal. We don’t observe y but
instead observe wi  min  yi , c i  , where ci is a known constant that can vary with
i.
o Note difference from tobit model: In tobit a finite fraction of people chose the
limit value. Here they chose something continuous outside the limit but we
simply do not observe it.
 This means that we don’t have to model the censorship as part of the
choice, rather only account for it in the estimation based on our flawed
data.
o For uncensored observations, we have the usual distribution of y:
1  yi  x i  
f  w i | x i   f  yi | x i    .
   
o For censored observations,
Pr wi  c i | x i   Pr  yi  c i | x i 
 Pr  ei  c i  x i | x i 
 c  xi  
1  i .
  
o So likelihood function is same as the tobit model, as is estimation.

~ 168 ~
o However, in the censored regression case we don’t need to worry about people
choosing the limit value, we only worry about observing it. Thus, j is the effect of
xj on y, period. We don’t need to hassle with the marginal effects calculations as
in the tobit model. Consequently, we can use the Stata tobit command and just
neglect the margins command afterward.
 Truncated regression models
o Truncated regression differs from censored regression in that neither y nor x is
observed for observations beyond the limit point. Thus, we cannot use these data
points at all, making the tobit estimator impossible to calculate.
 This is a sample problem again, but truncation of the sample (all
variables) is more severe than censorship of a single variable because we
have less (no) information about the missing observations.
 In the censored model, we can use the x values of the censored
observations to determine what kinds of observations will be in the
censored range. In the truncated model, we don’t have that information.
o Truncated regression model
yi  0  x i   i ,

ui | x ~ N (0, 2 ).
 IID assumption is violated:
 We observe (xi, yi) only if yi  ci, where the truncation threshold
can vary with i and can depend on xi.
 The conditional density function of yi given that it is in the sample (> ci)
 y  xi  
 i
f  yi | x i , ,  
2
 e  ,

is g  yi | x i , c i  
e
 yi  c i .
1  F  c i | x i , ,  e2   c  xi  
  i 
 e 
 The  function in the denominator is the probability that observation i is
not censored, given xi and ci. We divide by this to redistribute the
truncated amount of probability over the remaining density function.
 The log-likelihood function is just the log of this density summed over all
the observations in the sample.
 OLS in this case would give slope estimates that are biased toward zero.
 Incidental truncation and sample selection
o Sample selection does not bias OLS estimators unless the selection criterion is
related to u. So selection based exclusively on x or on something outside the
model that is uncorrelated with u does not present a problem.
o “Incidental truncation” occurs when we observe y for only a subset of the
population that depends not on y but on another variable, but the other variable
is correlated with u.

~ 169 ~
 The primary (only?) example in the literature is y = ln(wage offer), which
is observed only for people who work.
 But people who have unusually low wage offers (given their other
characteristics) are less likely to work and therefore more likely to be
truncated, so the variable determining truncation (work status) is
correlated with the error term of the wage equation.
yi  x i  ui
o 1, if z i   vi  0,
si  
0, otherwise.
 si is a sample indicator that is one for observations for which we observe y
and zero otherwise.
 We assume that E  ui | x i , z i   0 and xi is a strict subset of zi.
 We also assume that v is a standard normal that is independent of z, but
that it may be correlated with u.
 E  yi |z i , vi   x i  E  ui |z i , vi   x i   E  ui |vi  .
 Let E  ui |vi   vi with  being a parameter of their joint normal
distribution (related to the correlation).
 This means that
E  yi |zi , vi   x i  vi ,
E  yi |zi , si   x i   E  vi |zi , si  .
 Since our sample is the set of observations for which s = 1, we need the
expected value of y conditional on s = 1, and by logic similar to that used
in the tobit model, E  vi | zi , si  1    z i   , where  is the inverse Mills
ratio /.
 Thus, E  yi | z i , si  1  x i     z i   .
o We can’t observe the  term unless we know . The Heckit estimator is a two-
step estimation procedure for estimating first , then .
 The selection variable s follows a probit model:
Pr  si  1  Pr  zi   vi    Pr vi  zi  
 Pr vi  v i      z i   .
 Thus, we estimate the sample-selection equation as a probit of s on z,
using all of the observations (because we don’t need to observe y for this
equation and we observe z for all observations).
 We then compute the estimated inverse Mills ratio for each observation
  zi ˆ 
as ˆ i    zi ˆ   .
  zi ˆ 

~ 170 ~
 We can then estimate  by running OLS on yi  x i  ˆ i  ui using only
the observations for which y is observed. The inclusion of the estimated
inverse Mills ratio on the right-hand side corrects the bias due to sample
selection and makes the  estimates consistent and approximately
normal.
 Testing  = 0 with a standard t test is a valid test for whether there was
sample selection.
o Note that the regular OLS standard errors are incorrect because they assume that
 is exactly known. There will be error in estimating  by ̂ , so this error needs
to be taken into account in calculating the reliability of ̂ .
 Stata command heckman computes heckit estimator either by full
maximum likelihood or by the two-step estimation method. This will
correct the standard errors.
o In order to apply this model reliably, there must be at least one variable that
determines sample selection that does not affect y.
 In the wage equation, it is usually assumed that family variables such as
number of children would not affect the wage offer but would affect a
person’s choice of whether or not to accept it and work.

~ 171 ~

Regresi Logistik
No ratings yet
Regresi Logistik
34 pages
Section 9 Limited Dependent Variables
No ratings yet
Section 9 Limited Dependent Variables
17 pages
Regression With A Binary Dependent Variable
No ratings yet
Regression With A Binary Dependent Variable
63 pages
Limited Dependent Variables - Binary Dependent Variables
No ratings yet
Limited Dependent Variables - Binary Dependent Variables
24 pages
Msfe Week9
No ratings yet
Msfe Week9
5 pages
Chapter 5 Mgt
No ratings yet
Chapter 5 Mgt
60 pages
411 Note LDV
No ratings yet
411 Note LDV
12 pages
Chapter 5-LDVM-2024
No ratings yet
Chapter 5-LDVM-2024
27 pages
Binary Logistic Regression
No ratings yet
Binary Logistic Regression
8 pages
Logit and Probit Models
No ratings yet
Logit and Probit Models
44 pages
Limited Dependent Variables
No ratings yet
Limited Dependent Variables
34 pages
Seminar Econometrie
No ratings yet
Seminar Econometrie
15 pages
Logreg
No ratings yet
Logreg
26 pages
Lecture15 Binary Dependent Variables
No ratings yet
Lecture15 Binary Dependent Variables
38 pages
CH 5. Discrete Choice Model
No ratings yet
CH 5. Discrete Choice Model
38 pages
Econometrics Eviews 6
No ratings yet
Econometrics Eviews 6
12 pages
Lecture 7 Probit
No ratings yet
Lecture 7 Probit
24 pages
Logit and Probit: Models With Discrete Dependent Variables
No ratings yet
Logit and Probit: Models With Discrete Dependent Variables
30 pages
Probit and Logit-Madesh
No ratings yet
Probit and Logit-Madesh
22 pages
Assignment On Probit Model
No ratings yet
Assignment On Probit Model
17 pages
Econometrics Lecture Notes
No ratings yet
Econometrics Lecture Notes
16 pages
Logit To Probit To LPM Example
No ratings yet
Logit To Probit To LPM Example
21 pages
LPM, Logit and Probit Models
No ratings yet
LPM, Logit and Probit Models
21 pages
Presentation Last
No ratings yet
Presentation Last
20 pages
metrikaq
No ratings yet
metrikaq
11 pages
Binary Dependent Var
100% (1)
Binary Dependent Var
5 pages
Probit_Logit_Models
No ratings yet
Probit_Logit_Models
26 pages
Regression With A Binary Dependent Variable: Michael Ash
No ratings yet
Regression With A Binary Dependent Variable: Michael Ash
18 pages
Binary Logistic Regression - 6.2
No ratings yet
Binary Logistic Regression - 6.2
34 pages
Econometria Avanzada: Generalized Linear Models
No ratings yet
Econometria Avanzada: Generalized Linear Models
30 pages
A Simple But Effective Logistic Regression Derivation
No ratings yet
A Simple But Effective Logistic Regression Derivation
6 pages
09-Limited Dependent Variable Models
No ratings yet
09-Limited Dependent Variable Models
71 pages
Lecture 8
No ratings yet
Lecture 8
39 pages
Logistic Regression
No ratings yet
Logistic Regression
8 pages
Logistic Regression
No ratings yet
Logistic Regression
54 pages
Cap1_Slides
No ratings yet
Cap1_Slides
30 pages
STAT3301 - Term Exam 2 - CH11 Study Package
No ratings yet
STAT3301 - Term Exam 2 - CH11 Study Package
6 pages
Econometrics - Qualitative Response Models
No ratings yet
Econometrics - Qualitative Response Models
17 pages
An Introduction To Logistic Regression
No ratings yet
An Introduction To Logistic Regression
48 pages
Chapter 6. Limited dependent variable models FINAL
No ratings yet
Chapter 6. Limited dependent variable models FINAL
16 pages
Econometrics - Exercise set 2 (solution)
No ratings yet
Econometrics - Exercise set 2 (solution)
12 pages
Qualitative Response Regression Model - Probabilistic Models
No ratings yet
Qualitative Response Regression Model - Probabilistic Models
34 pages
Unitb - II - Linear Probability, Logit and Probit
No ratings yet
Unitb - II - Linear Probability, Logit and Probit
34 pages
L9 Logistical Regression Models Updated
No ratings yet
L9 Logistical Regression Models Updated
10 pages
PD2004 9
No ratings yet
PD2004 9
26 pages
Unit - II Regression-LogisticRegressionModels
No ratings yet
Unit - II Regression-LogisticRegressionModels
7 pages
CQF ML Lab Estimating Default Probability With Logistic Regression
No ratings yet
CQF ML Lab Estimating Default Probability With Logistic Regression
7 pages
BSC Intermediate Econometrics: Please Do Not Distribute
No ratings yet
BSC Intermediate Econometrics: Please Do Not Distribute
25 pages
Chapter 15 Qualitative Response Regression Models Part 2
No ratings yet
Chapter 15 Qualitative Response Regression Models Part 2
31 pages
MLE Lecture Note For Econometrician
No ratings yet
MLE Lecture Note For Econometrician
13 pages
26GeneralizedLinearModelBernoulliAnnotated PDF
No ratings yet
26GeneralizedLinearModelBernoulliAnnotated PDF
46 pages
(Discrete Choice Model Soderbom)
No ratings yet
(Discrete Choice Model Soderbom)
43 pages
Microeconometrie Chapitre1 BinaryOutcomeModels
No ratings yet
Microeconometrie Chapitre1 BinaryOutcomeModels
42 pages
Probit Model
No ratings yet
Probit Model
29 pages
Logit Probit and Tobit Models For Catego PDF
No ratings yet
Logit Probit and Tobit Models For Catego PDF
19 pages
Fernando, Logit Tobit Probit March 2011
No ratings yet
Fernando, Logit Tobit Probit March 2011
19 pages
Basic R Programming: Exercises
No ratings yet
Basic R Programming: Exercises
7 pages
Exercises of Complex Analysis
From Everand
Exercises of Complex Analysis
Simone Malacrida
No ratings yet
Exercises of Logarithms and Exponentials
From Everand
Exercises of Logarithms and Exponentials
Simone Malacrida
No ratings yet
Introduction to Logarithms and Exponentials
From Everand
Introduction to Logarithms and Exponentials
Simone Malacrida
No ratings yet
Romer 5e Solutions Manual 01
No ratings yet
Romer 5e Solutions Manual 01
22 pages
ECO2020ILNUncertainty
No ratings yet
ECO2020ILNUncertainty
37 pages
Linear Rational Expectations Models -- Charles H_ Whiteman -- University of Minnesota Press, Minneapolis, 1983
No ratings yet
Linear Rational Expectations Models -- Charles H_ Whiteman -- University of Minnesota Press, Minneapolis, 1983
160 pages
Notes On The Identification of VARs Using External Instruments - Michele Piffer
No ratings yet
Notes On The Identification of VARs Using External Instruments - Michele Piffer
21 pages
Soal Olimpiade SMP
No ratings yet
Soal Olimpiade SMP
9 pages
Manual Dew Point Sensor - Extreme Dry Air
No ratings yet
Manual Dew Point Sensor - Extreme Dry Air
32 pages
Thermodynamics Midterm Review
No ratings yet
Thermodynamics Midterm Review
8 pages
Radical Institutionalism - Basic Concepts
No ratings yet
Radical Institutionalism - Basic Concepts
22 pages
SAT Practice Test 3 With Answer Key and Scoring Info
No ratings yet
SAT Practice Test 3 With Answer Key and Scoring Info
112 pages
S3 - Sanghoon Lee
No ratings yet
S3 - Sanghoon Lee
12 pages
Assignment 2 Cognitive Psych
No ratings yet
Assignment 2 Cognitive Psych
2 pages
Jenway Catalogue PDF
No ratings yet
Jenway Catalogue PDF
104 pages
De Thi Chon Hoc Sinh Gioi Mon Tieng Anh Lop 12 THPT Tinh Tay Ninh Nam 2014 2015
No ratings yet
De Thi Chon Hoc Sinh Gioi Mon Tieng Anh Lop 12 THPT Tinh Tay Ninh Nam 2014 2015
18 pages
320 and 364 Mapleview W 664 674 and 692 Essa Preliminary Geotechnical Report
No ratings yet
320 and 364 Mapleview W 664 674 and 692 Essa Preliminary Geotechnical Report
61 pages
MNSC 3RD CHAPTERimportant Managerial Issues
No ratings yet
MNSC 3RD CHAPTERimportant Managerial Issues
4 pages
Per 3, Eng Nelson PDF
100% (1)
Per 3, Eng Nelson PDF
36 pages
DHRD-form-6
No ratings yet
DHRD-form-6
3 pages
Presentation of DILG MC No. 2021-074: Lgoo Vi Alibaer M. Pangandaman
No ratings yet
Presentation of DILG MC No. 2021-074: Lgoo Vi Alibaer M. Pangandaman
31 pages
Introduction To Demand Planning & Forecasting: Introducción A La Planificación y Previsión de La Demanda
No ratings yet
Introduction To Demand Planning & Forecasting: Introducción A La Planificación y Previsión de La Demanda
6 pages
Core-11 (Honours) : by Regd. /speed Post/E-Mail Berhampur University
No ratings yet
Core-11 (Honours) : by Regd. /speed Post/E-Mail Berhampur University
4 pages
Type 1 Settling: Column Test: Environmental Engineering II Prof. S. Chakraborty 08/09/2020
No ratings yet
Type 1 Settling: Column Test: Environmental Engineering II Prof. S. Chakraborty 08/09/2020
11 pages
Chapter I Pr1 11 Humss J Group 1
No ratings yet
Chapter I Pr1 11 Humss J Group 1
6 pages
ROS Navigation Concepts and Tutorial
No ratings yet
ROS Navigation Concepts and Tutorial
28 pages
Mechanics: Expansion of A Helical Spring
No ratings yet
Mechanics: Expansion of A Helical Spring
2 pages
Thesis Topics in Child Psychology
100% (3)
Thesis Topics in Child Psychology
8 pages
Focal Length of Convex Lens
No ratings yet
Focal Length of Convex Lens
5 pages
Tundra Biome: By: Esther Kim & Mae Lesko AP Biology 7/8 B
No ratings yet
Tundra Biome: By: Esther Kim & Mae Lesko AP Biology 7/8 B
36 pages
10 worksheet Unit 2 Grammar 2
No ratings yet
10 worksheet Unit 2 Grammar 2
2 pages
Lit Quiz No. 2
No ratings yet
Lit Quiz No. 2
3 pages
Relationship of Sociology and Other Social Science Disciplines
80% (5)
Relationship of Sociology and Other Social Science Disciplines
15 pages
STM Course Outline Fall 2024
No ratings yet
STM Course Outline Fall 2024
16 pages
MODULE 6 - (Analysis of Tension Members)
No ratings yet
MODULE 6 - (Analysis of Tension Members)
34 pages
Floridi Et Al. (2021)
No ratings yet
Floridi Et Al. (2021)
26 pages
Hist. PP1
No ratings yet
Hist. PP1
20 pages