Chapter Four
Discrete Choice and Limited Dependent Variable Models
4.1 Introduction
A limited dependent variable (LDV) is broadly defined as a dependent variable whose range
of values is substantively restricted. A binary dependent variable is an example of an LDV:
a binary response (or choice) variable takes on only two values, zero and one. Regression
models whose dependent variable is a yes/no or present/absent type of response are known as
dichotomous or dummy dependent variable regression models; they identify the determinants
of an event happening or not happening. Such models are applicable in a wide variety of
fields and are commonly used with survey or census-type data. The methods used to estimate
them include the linear probability model (LPM), the logit model, and the probit model.
These methods approximate the mathematical relationship between the explanatory variables
and a dummy dependent variable, whose values encode a qualitative outcome. This chapter
therefore discusses binary choice models, multinomial and ordered choice models, and
censored and truncated models.
Example
Suppose we want to test the relationship between household consumption (C) and income (Y)
over the period 1960-1990. Assume that the relationship between consumption and income is
also affected by dummy variables: whether the household has children or not; whether the
household head is over 70 years of age or not; and the presence of war during the 1977-1980
period. The regression model would then be specified as
$$C_t = \alpha + \beta_1 Y_t + \beta_2 D_{1t} + \beta_3 D_{2t} + \beta_4 D_{3t} + u_t$$
where $D_{1t}$, $D_{2t}$ and $D_{3t}$ are the dummies for children, age of the household head,
and war, respectively, and $u_t$ is the error term.
Because of the limitations of the LPM, an appropriate model is needed in which the
relationship between the probability that an event occurs and the explanatory variables is
non-linear. The most common probability models that fill these gaps in the LPM are the logit
and probit models, both of which have an S-shaped cumulative distribution function (CDF).
The logit model is based on the logistic CDF, whereas the probit model is based on the normal
CDF. Both models guarantee that the estimated probabilities lie in the 0-1 range and that
they are non-linearly related to the explanatory variables. The logistic and probit
formulations are quite comparable; the chief difference is that the logistic distribution has
slightly flatter tails, that is, the normal curve approaches the axes more quickly than the
logistic curve. The choice between the two is therefore largely one of mathematical
convenience, a matter of choosing between the two cumulative distribution functions.
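The comparison of the two CDFs with the LPM can be sketched numerically. The following is a minimal illustration, not an estimation routine: the LPM intercept and slope (0.5 and 0.25) are hypothetical values chosen only to show fitted "probabilities" leaving the 0-1 range, and the logistic index is rescaled so that both distributions have unit variance before their tails are compared.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, the link behind the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, the link behind the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lpm_probability(z, intercept=0.5, slope=0.25):
    """Hypothetical linear probability model: p = intercept + slope * z."""
    return intercept + slope * z

# The standard logistic distribution has variance pi^2 / 3, so multiplying
# z by this factor compares it with the normal on a unit-variance scale.
LOGISTIC_SCALE = math.pi / math.sqrt(3.0)

for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(
        f"z={z:+.1f}  LPM={lpm_probability(z):+.3f}  "
        f"logit={logistic_cdf(z * LOGISTIC_SCALE):.4f}  "
        f"probit={normal_cdf(z):.4f}"
    )
```

At z = 3 and z = -3 the hypothetical LPM returns values above one and below zero, while both CDFs stay strictly inside the unit interval; in the tails the logistic CDF is further from 0 and 1 than the normal CDF.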
Example 1: Suppose we want to analyze how business constraints affect MSE operators'
perception of the growth potential of enterprise income. Operators are asked about the income
situation of their enterprises (that is, whether income increased, remained the same, or
declined). To measure the perception of respondents, a dummy variable ($Z_i$) is constructed
that equals one if an enterprise experiences growth in income and zero otherwise. The model
therefore gives the probability that an enterprise will experience growth in income given the
constraints and control variables. The logit model for income growth (incgrow) given
constraints (const) and control variables (contrv) can be specified as:
$$\text{incgrow} = \beta_0 + \beta_1\,\text{const} + \beta_2\,\text{contrv} + u_i$$
Note that the binary logit model can be written in different forms, and the interpretation
differs accordingly: in terms of probabilities, odds, or logits (log-odds). Let us now assume
a continuous X. The logit model has three equivalent forms:
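A sketch of these three forms in standard textbook notation, assuming a single continuous regressor $X$ and writing $p = P(Y = 1 \mid X)$; the symbols $\beta_0$ and $\beta_1$ are illustrative and are not taken from the notes themselves:

```latex
\begin{align*}
\text{Probability:}\quad & p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \\
\text{Odds:}\quad & \frac{p}{1 - p} = e^{\beta_0 + \beta_1 X} \\
\text{Logit (log-odds):}\quad & \ln\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X
\end{align*}
```

In the logit form, $\beta_1$ is the change in the log-odds for a one-unit increase in $X$; in the odds form, $e^{\beta_1}$ is the factor by which the odds are multiplied; in the probability form, the effect of $X$ on $p$ depends on the level of $X$.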
Example 2: To examine the effects of social capital on access to credit, the logit model is
employed. The underlying equation for the binary logit model, which examines the likelihood
of a household having access to credit, is
$$Y_i^* = \beta' X_i + U_i$$
where $Y_i^*$ is an unobservable latent variable for having access to credit, $X_i$ is a
vector of explanatory variables, $\beta$ is a vector of parameters to be estimated, and $U_i$
is the error term.
To examine whether or not a household has access to credit, a dummy variable is constructed
that equals one if the household has access to credit and zero otherwise. The observed binary
outcome is assumed to be determined from the latent variable as in the usual logit model:
$$Y_i = \begin{cases} 1 & \text{if the household has access to credit, } Y_i^* > 0 \\ 0 & \text{otherwise, } Y_i^* \le 0 \end{cases}$$
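The link between the latent variable and the observed dummy can be checked with a small simulation. This is a sketch under stated assumptions: the error term is drawn from the standard logistic distribution (which is what makes the model a logit), and the value 0.8 used for the index $\beta' X_i$ is hypothetical. The share of simulated households with $Y_i = 1$ should then be close to the logistic CDF evaluated at the index.

```python
import math
import random

def logistic_cdf(z):
    """Logistic CDF: P(Y = 1) in the logit model when beta'X = z."""
    return 1.0 / (1.0 + math.exp(-z))

def draw_logistic_error(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    p = rng.random()
    while p == 0.0:  # avoid log(0); random() never returns 1.0
        p = rng.random()
    return math.log(p / (1.0 - p))

def simulate_access_share(index, n_draws=20000, seed=0):
    """Share of simulated households with Y = 1 when beta'X equals `index`."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_draws):
        y_star = index + draw_logistic_error(rng)  # latent Y* = beta'X + U
        if y_star > 0:                             # observed Y = 1 if Y* > 0
            ones += 1
    return ones / n_draws

index = 0.8  # hypothetical value of beta'X for one type of household
print(simulate_access_share(index), logistic_cdf(index))
```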
Graphically, the logistic distribution has flatter tails than the normal distribution, so its
curve approaches the axes more slowly. This is the main difference between the two models.
An example: To analyze how business constraints affect MSE operators' perception of the
growth potential of enterprise income, the three responses of operators can be used as they
are, that is, whether income increased, remained the same, or declined.
Another example is the analysis of the effects of social capital on the source of off-farm
income: off-farm employment, trade, gift and remittance, and welfare programs. To interpret
the results of this model, we take one category as the reference and interpret the remaining
three categories relative to the reference category. The off-farm income equation, which
relates the off-farm income sources to social capital while controlling for other explanatory
variables, can be written as:
$$Y_{ij} = X_i \beta_j + e_{ij}$$
where $Y_{ij}$ is a four-category response variable (the off-farm income source), $X_i$ is a
set of explanatory variables, $\beta_j$ is a vector of parameters to be estimated, and
$e_{ij}$ is the disturbance term.
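The role of the reference category can be made concrete with a short sketch. It assumes the multinomial logit form, in which the index of the reference category is normalized to zero and each remaining category has its own index $X_i \beta_j$; the three index values below are hypothetical and are not estimates from any data.

```python
import math

def multinomial_logit_probs(indices):
    """Category probabilities from the linear indices X_i * beta_j.

    `indices` holds one value per non-reference category; the reference
    category is placed first and its index is normalized to zero.
    """
    all_indices = [0.0] + list(indices)
    largest = max(all_indices)  # subtract the max for numerical stability
    weights = [math.exp(v - largest) for v in all_indices]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical indices for trade, gift and remittance, and welfare programs,
# with off-farm employment taken as the reference category.
probs = multinomial_logit_probs([0.4, -0.2, -1.0])
print(probs)
```

Under this form, the ratio of any category's probability to that of the reference category equals the exponential of that category's index, which is why the coefficients are read relative to the reference category.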
b) Ordered probit model: This is a model with multiple categories, as in the case of the
multinomial model, but the categories have a natural order. Models for ordinal dependent
variables can be formulated as a threshold model with a latent dependent variable.
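A minimal sketch of the threshold formulation, assuming a latent variable equal to a linear index plus a standard normal error, with the observed category determined by which pair of thresholds the latent variable falls between. The index value and the two cutpoints are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(index, cutpoints):
    """P(Y = k), k = 0..K, for latent Y* = index + e with e ~ N(0, 1).

    Y = k when cutpoints[k-1] < Y* <= cutpoints[k], with minus and plus
    infinity as the outer thresholds. `cutpoints` must be increasing.
    """
    cdf_values = [0.0] + [normal_cdf(c - index) for c in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]

# Hypothetical index and thresholds for three ordered outcomes:
# income declined, remained the same, increased.
probs = ordered_probit_probs(index=0.3, cutpoints=[-0.5, 0.5])
print(probs)
```

Raising the index shifts probability mass from the lowest category toward the highest one, while the thresholds themselves stay fixed.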
In (a), the data are censored at a: for the censored cases one knows only that the true value
is a or less. The regression line fitted to such data would be less steep (dashed line).
Truncation means that cases below a are completely missing. Truncation also biases OLS
estimates. Case (b) is incidental truncation, or sample selection: because of a non-random
selection mechanism, information on Y is missing for some cases. This also biases OLS
estimates. Special estimation methods therefore exist for such data. In this regard, censored
data are analyzed with the tobit model:
Where:
In a censored regression, the dependent variable may contain some zero values. With these
zero values for the dependent variable, using ordinary least squares (OLS) to estimate the
model would lead to biased and inconsistent estimates. Proper estimation of the model
requires a censored tobit regression, which is given as:
$$Y_i = X_i \beta + \varepsilon_i \quad (i = 1, \ldots, n)$$
where $Y_i$ is the dependent variable with some zero values, $X_i$ refers to the explanatory
variables, $\beta$ is a vector of parameters, and $\varepsilon_i$ is the error term.
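The bias of OLS on censored data can be illustrated with a small simulation. This is a sketch under stated assumptions: a single regressor, normal errors, and hypothetical true coefficients (intercept -1, slope 2). OLS is fitted once to the latent outcome and once to the outcome censored at zero.

```python
import random

def ols_slope(xs, ys):
    """Slope of the simple OLS regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n=5000, beta0=-1.0, beta1=2.0, seed=0):
    """Return the OLS slopes on the latent and on the censored outcome."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 2.0) for _ in range(n)]
    latent = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]
    censored = [max(0.0, y) for y in latent]  # observed outcome, censored at zero
    return ols_slope(xs, latent), ols_slope(xs, censored)

latent_slope, censored_slope = simulate_slopes()
print(latent_slope, censored_slope)
```

The slope on the latent outcome is close to the true value of 2, whereas the slope on the censored outcome is markedly smaller, which is the flatter regression line described for censored data.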
An example: if you want to examine the effects of social capital on income diversification,
that is, to analyze the implications of households' engagement in social institutions for
diversifying their income sources, you can use the censored tobit model. Since not all
households earn income from sources other than their main source, the dependent variable may
take zero values. For this, we use the censored tobit analysis, which is given as:
Econometrics Lecture Notes; 2016 By Addisu Molla (PhD)
Wollo University, College of Business and Economics, Department of Economics
$$y_i = x_i \beta + \varepsilon_i \quad (i = 1, \ldots, n)$$
where $y_i$ is the share of income from other sources in the total income of the household,
which is censored at zero; $x_i$ is a vector of determinants of income diversification,
including variables related to social capital and household characteristics; $\beta$ is a
vector of parameters; and $\varepsilon_i$ is the error term.
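How the tobit model treats zero and positive observations differently can be seen from its log-likelihood. The sketch below assumes the standard tobit setup with normally distributed errors and censoring from below at zero; the function name, the argument layout, and the sample values are hypothetical, and in practice the likelihood would be maximized with a numerical optimizer.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_log_pdf(z):
    """Log of the standard normal density."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def tobit_log_likelihood(ys, xs, beta, sigma):
    """Log-likelihood of a tobit model censored from below at zero.

    ys: observed outcomes (values of zero are treated as censored)
    xs: list of regressor vectors, one per observation
    beta: coefficient vector; sigma: error standard deviation (> 0)
    """
    total = 0.0
    for y, x in zip(ys, xs):
        index = sum(b * v for b, v in zip(beta, x))
        if y <= 0.0:
            # Censored observation: P(y* <= 0) = Phi(-x'beta / sigma)
            total += math.log(normal_cdf(-index / sigma))
        else:
            # Uncensored observation: normal density at the observed value
            total += normal_log_pdf((y - index) / sigma) - math.log(sigma)
    return total

# Hypothetical data: income shares with a constant and one regressor.
shares = [0.0, 0.0, 0.2, 0.5]
regressors = [[1.0, 0.1], [1.0, 0.3], [1.0, 0.6], [1.0, 0.9]]
print(tobit_log_likelihood(shares, regressors, beta=[-0.1, 0.6], sigma=0.2))
```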