Wollo University, College of Business and Economics, Department of Economics

Chapter Four
Discrete Choice and Limited Dependent Variable Model
4.1 Introduction
A Limited Dependent Variable (LDV) is broadly defined as a dependent variable whose range of values is substantively restricted. A binary dependent variable is an example of an LDV: a binary response/choice variable takes on only two values, zero and one. For example, regression models whose dependent variable is a yes/no or present/absent type of response are known as dichotomous or dummy dependent variable regression models; they identify the determinants of an event happening or not happening. Such models are applicable in a wide variety of fields and are commonly used with survey or census-type data. Among the methods used to estimate them are the linear probability model (LPM), the logit model, and the probit model. These methods approximate the mathematical relationship between the explanatory variables and a dummy dependent variable, whose values (0 and 1) are codes for qualitative outcomes. This chapter therefore discusses binary choice models, multivariate choice models, and censored and truncated models.

4.2 The Concept of Dummy Variables


Dummy variables are artificial variables used to represent qualitative attributes, mainly as proxies for variables that either cannot be measured quantitatively or represent values over some continuous range. For example, sex, profession and religion are qualitative variables that can be represented by dummy variables. A dummy variable therefore indicates the presence or absence of an attribute: it takes the value 1 for the presence of the attribute and 0 for its absence.

Example
Suppose we want to test the relationship between household consumption (C) and income (Y) over the period 1960-1990. Assume that this relationship is also affected by qualitative factors captured by dummy variables: whether the household has children, whether the household head is over 70 years of age, and the presence of war during 1977-1980. The regression model would then be specified as
$$C_t = \alpha + \beta_1 Y_t + \beta_2 D_{1t} + \beta_3 D_{2t} + \beta_4 D_{3t} + u_t$$

Econometrics Lecture Notes; 2016 By Addisu Molla (PhD)


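where

$$D_{1t} = \begin{cases} 1 & \text{if the household has children} \\ 0 & \text{otherwise (no children)} \end{cases}$$

$$D_{2t} = \begin{cases} 1 & \text{if the household head is over 70 years of age} \\ 0 & \text{otherwise} \end{cases}$$

$$D_{3t} = \begin{cases} 1 & \text{if war is present } (t = 1977\text{-}1980) \\ 0 & \text{if war is not present } (t = 1960\text{-}1976 \text{ and } 1981\text{-}1990) \end{cases}$$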
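To make the role of the dummies concrete, the sketch below builds $D_{1t}$, $D_{2t}$ and $D_{3t}$ from raw household-year information and plugs them into the consumption equation. The coefficient values are hypothetical, chosen only for illustration and not estimated from any data; the point is that switching a dummy from 0 to 1 shifts the intercept by that dummy's coefficient, holding income fixed.

```python
# Hypothetical coefficients (alpha, beta1..beta4), for illustration only;
# they are not estimates from any data set.
ALPHA, B1, B2, B3, B4 = 50.0, 0.8, 20.0, -10.0, -15.0


def make_dummies(year, has_children, head_age):
    """Return (D1, D2, D3) for one household-year observation."""
    d1 = 1 if has_children else 0            # D1t: household has children
    d2 = 1 if head_age > 70 else 0           # D2t: household head older than 70
    d3 = 1 if 1977 <= year <= 1980 else 0    # D3t: war period 1977-1980
    return d1, d2, d3


def predicted_consumption(income, year, has_children, head_age):
    d1, d2, d3 = make_dummies(year, has_children, head_age)
    return ALPHA + B1 * income + B2 * d1 + B3 * d2 + B4 * d3


# Same household and income, one year in peace and one in war
peace = predicted_consumption(1000.0, 1975, True, 45)
war = predicted_consumption(1000.0, 1978, True, 45)
print(war - peace)  # prints -15.0, i.e. the war dummy's coefficient B4
```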

4.3 Binary Choice Models


a) Linear Probability Model (LPM): The LPM is the simplest of the limited dependent variable models to use, but it has several limitations. It assumes that the conditional probability increases linearly with the values of the explanatory variables. As a result, the estimated probabilities can lie outside the 0-1 bounds. The fundamental problem with the LPM is that it assumes that the marginal or incremental effects of the explanatory variables remain constant throughout, which is patently unrealistic. In addition, because the dependent variable takes only the values 0 and 1, the error term can take only two values for given values of the explanatory variables, so it is not normally distributed.
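The out-of-bounds problem is easy to reproduce. The sketch below simulates hypothetical data whose true probability is S-shaped in $x$, fits the LPM by OLS (closed form for one regressor), and counts fitted "probabilities" that fall below 0 or above 1. The data-generating process and seed are assumptions made only for the illustration.

```python
import math
import random

random.seed(1)
# Hypothetical data: the true P(Y=1 | x) is logistic in x, so a straight line
# cannot track it over the whole range of x.
xs = [random.uniform(-4, 4) for _ in range(500)]
ys = [1 if random.random() < 1 / (1 + math.exp(-2 * x)) else 0 for x in xs]

# The LPM is OLS of the 0/1 outcome on x (closed form for one regressor)
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
outside = sum(1 for p in fitted if p < 0 or p > 1)
print(f"constant marginal effect (slope): {b1:.3f}")
print(f"fitted 'probabilities' outside [0, 1]: {outside} of {n}")
```

The single slope `b1` is the LPM's constant marginal effect, the same at every value of $x$, which is exactly the assumption criticized above.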

Thus, because of these limitations of the LPM, we need a model in which the relationship between the probability that an event will occur and the explanatory variables is non-linear. The most common probability models that fill these gaps are the logit and probit models, which use an S-shaped cumulative distribution function (CDF). The logit model is based on the logistic CDF, whereas the probit model is based on the normal CDF; both guarantee that the estimated probabilities lie in the 0-1 range and that they are non-linearly related to the explanatory variables. The logit and probit formulations are quite comparable, the chief difference being that the logistic distribution has slightly flatter (heavier) tails: the normal curve approaches the axes more quickly than the logistic curve. Therefore, the choice between the two is largely a matter of mathematical convenience, that is, of choosing between the two cumulative distribution functions.
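The tail difference can be checked numerically. The sketch below compares the standard normal CDF with a logistic CDF rescaled to unit variance (the standard logistic has variance $\pi^2/3$, so a scale of $s = \sqrt{3}/\pi$ gives variance 1). The evaluation points are arbitrary choices for illustration.

```python
import math


def normal_cdf(z):
    # Standard normal CDF via the complementary error function
    return 0.5 * math.erfc(-z / math.sqrt(2))


def logistic_cdf(z, scale=1.0):
    return 1 / (1 + math.exp(-z / scale))


# Rescale the logistic so that it also has unit variance: var = (pi * s)^2 / 3
s = math.sqrt(3) / math.pi
for z in (-1.0, -2.0, -3.0, -4.0):
    print(f"z={z:4.1f}  normal={normal_cdf(z):.5f}  logistic={logistic_cdf(z, s):.5f}")
```

Near the centre the two curves are close (the logistic can even be slightly below the normal), but far in the tails, for example at $z = -3$ and $z = -4$, the logistic assigns noticeably more probability, which is what "flatter tails" means in practice.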

b) Binary Logit model


The binary logit model is also a non-linear model, with a non-metric dependent variable that forms only two groups (for example, yes/no) and with metric and/or non-metric independent variables. Its index is linear in the explanatory variables:
$$Z_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_n x_{ni} + U_i$$
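In the logit model the probability of the "yes" outcome is the logistic CDF of the index, $P_i = 1/(1 + e^{-Z_i})$, and the coefficients are usually estimated by maximum likelihood. The sketch below is a minimal Newton-Raphson maximum-likelihood fit for the one-regressor case, written without external libraries; the simulated data, the true coefficients (-0.5, 1.2) and the fixed iteration count are assumptions for illustration, not a production estimator.

```python
import math
import random


def fit_logit(xs, ys, iters=25):
    """Newton-Raphson MLE for P(Y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += y - p          # score with respect to b0
            g1 += (y - p) * x    # score with respect to b1
            h00 += w             # entries of X'WX (minus the Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step (X'WX)^{-1} X'(y - p), written out for the 2x2 case
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


random.seed(0)
true_b0, true_b1 = -0.5, 1.2   # hypothetical true coefficients
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [1 if random.random() < 1 / (1 + math.exp(-(true_b0 + true_b1 * x))) else 0 for x in xs]

estimates = fit_logit(xs, ys)
print(estimates)  # should be close to (-0.5, 1.2)
```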

Example 1: Suppose we want to analyze how business constraints affect MSE operators' perceptions of the income growth potential of their enterprises, by asking operators about the income situation of their enterprises (that is, whether income increased, remained the same, or declined). To measure respondents' perceptions of enterprise income, the dummy variable ($Z_i$) is constructed as one if an enterprise experienced growth in income and zero otherwise. The model therefore gives the probability that an enterprise experiences growth in income given the constraints and control variables. Thus, the logit model of income growth (incgrow) given constraints (const) and control variables (contrv) can be specified as:
$$\text{incgrow}_i = \beta_0 + \beta_1\,\text{const}_i + \beta_2\,\text{contrv}_i + u_i$$

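Note that the binary logit can be written in different forms, and the interpretation differs accordingly: in terms of probabilities, odds, or logits (log-odds). Assume now a continuous X. The logit model has three equivalent forms:

$$P_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}} \qquad \text{(probability)}$$

$$\frac{P_i}{1 - P_i} = e^{\beta_0 + \beta_1 X_i} \qquad \text{(odds)}$$

$$\ln\!\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_i \qquad \text{(logit)}$$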
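The three forms can be checked against each other numerically. In the sketch below the coefficients are hypothetical; it shows that the log of the odds recovers the linear index, that a one-unit rise in X multiplies the odds by $e^{\beta_1}$ (the odds-ratio interpretation), and that the marginal effect of X on the probability, $\beta_1 P_i (1 - P_i)$, depends on where it is evaluated.

```python
import math

B0, B1 = -1.0, 0.5   # hypothetical logit coefficients


def probability(x):
    return 1 / (1 + math.exp(-(B0 + B1 * x)))


def odds(x):
    p = probability(x)
    return p / (1 - p)


def logit(x):
    return math.log(odds(x))


x = 2.0
print(logit(x))                                      # 0.0, i.e. B0 + B1*x
print(odds(x + 1) / odds(x), math.exp(B1))           # odds ratio equals exp(B1)
print(B1 * probability(x) * (1 - probability(x)))    # marginal effect on P at this x
```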

Example 2: To examine the effects of social capital on access to credit, a logit model is employed. The underlying equation for the binary logit model, which examines the likelihood of a household having access to credit, is
$$Y_i^* = \beta' X_i + U_i$$

where $Y_i^*$ is an unobservable latent variable for having access to credit, $X_i$ is a vector of explanatory variables, $\beta'$ is a vector of parameters to be estimated, $U_i$ is the error term, and the subscript $i$ indexes households.

To capture whether or not a household has access to credit, the dummy variable access to credit is constructed as one if a household has access to credit and zero otherwise. The observed binary outcome is assumed to be determined, as in the usual logit model, by

$$Y_i = \begin{cases} 1 & \text{if the household has access to credit } (Y_i^* > 0) \\ 0 & \text{otherwise } (Y_i^* \le 0) \end{cases}$$
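The latent-variable formulation can be checked by simulation: if $U_i$ follows a standard logistic distribution, the share of draws with $Y_i^* > 0$ should match the logistic CDF evaluated at $\beta' X_i$. A minimal sketch, with hypothetical values for $\beta$ and for a single household's $X_i$:

```python
import math
import random

random.seed(42)
b0, b1 = 0.3, 0.9      # hypothetical beta' for illustration
x = 1.0                # a single hypothetical value of X_i
n = 200_000


def logistic_draw():
    # Inverse-CDF draw from the standard logistic distribution (u must be > 0)
    u = 0.0
    while u == 0.0:
        u = random.random()
    return math.log(u / (1 - u))


# Observed Y_i = 1 whenever the latent Y*_i = beta'X_i + U_i is positive
share = sum(1 for _ in range(n) if b0 + b1 * x + logistic_draw() > 0) / n
implied = 1 / (1 + math.exp(-(b0 + b1 * x)))
print(f"simulated P(Y=1) = {share:.4f}, logistic CDF = {implied:.4f}")
```

Because the logistic distribution is symmetric, $\Pr(U_i > -\beta' X_i) = \Lambda(\beta' X_i)$, which is why the threshold rule reproduces the logit probability.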

c) Binary Probit Model


Binary probit model: all procedures from the binary logit model above can be applied analogously (only the odds interpretation does not carry over). Since the logistic and normal distributions are very similar, results are, in most situations, identical for all practical purposes. Coefficients can be converted by a scaling factor (multiply probit coefficients by about 1.6-1.8 to approximate the corresponding logit coefficients). Results may differ only in the tails.
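The scaling factor can be justified numerically: if a logit coefficient is roughly $k$ times the corresponding probit coefficient, the two models imply probabilities $\Lambda(kz)$ and $\Phi(z)$ for the same probit index $z$. The sketch below measures the largest gap between these over a grid; the grid range and step are arbitrary choices for illustration.

```python
import math


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))


def logistic_cdf(z):
    return 1 / (1 + math.exp(-z))


grid = [i / 100 for i in range(-600, 601)]
gaps = {}
for k in (1.6, 1.7, 1.8):
    # Largest difference between the logit and probit probabilities
    # when logit coefficients are k times the probit coefficients
    gaps[k] = max(abs(logistic_cdf(k * z) - normal_cdf(z)) for z in grid)
    print(f"k = {k}: largest probability gap = {gaps[k]:.4f}")
```

For every factor in the 1.6-1.8 range the implied probabilities stay within a few hundredths of each other, with $k \approx 1.7$ giving the closest match, which is why logit and probit results are usually interchangeable in practice.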

An example: absolute poverty status of sample households

This reflects an analysis of the determinants of absolute poverty in which households are classified as poor or non-poor.

Probit and logit have an S-shaped probability function. As X increases (with a positive coefficient), the probability that Y = 1 increases but never leaves the 0-1 interval: it approaches zero at slower and slower rates as X gets small, and it approaches one at slower and slower rates as X gets large.

Graphically, the logit distribution has flatter tails; that is, it approaches the axes more slowly. This is the main visible difference between the two.

[Figure: What do the CDFs look like graphically? Shape of the logit and probit CDFs]

4.4 Multivariate Choice Models


a) Multinomial logit model: When the dependent variable is classified into more than two (unordered) groups, the multinomial logit model is used. In other words, this model extends the binary logit model to the case where the response has more than two outcomes. Multinomial logit analysis thus makes it possible to estimate the effect of the explanatory variables on each of several categories of the dependent variable.

An example: to analyze how business constraints affect MSE operators' perceptions of the income growth potential of their enterprises, using the operators' three responses as they are, that is, whether income increased, remained the same, or declined.

Another example: to analyze the effects of social capital on off-farm income sources, namely off-farm employment, trade, gifts and remittances, and welfare programs. To interpret the results of this model, one category is taken as the reference, and the remaining three categories are interpreted relative to it. The off-farm income equation, which shows the relationship between off-farm income sources and social capital while controlling for other explanatory variables, can be written as:
$$Y_{ij} = X_i \beta_j + e_{ij}$$

where $Y_{ij}$ is the four-category response variable (off-farm income source), $X_i$ is a set of explanatory variables, $\beta_j$ is the vector of parameters to be estimated for category $j$, and $e_{ij}$ is the disturbance term.

b) Ordered probit model: Like the multinomial model, this model has multiple categories, but here the categories have a natural order. Models for ordinal dependent variables can be formulated as threshold models with a latent dependent variable.

An Example: Extreme poverty status of sample households


This reflects an analysis of the determinants of extreme poverty in which households are classified as extreme (hard-core) poor, poor, or non-poor. This can be generalized by letting the underlying response model be described as:

$$Y_i^* = \beta' x_i + u_i, \qquad i = 1, 2, \dots, n$$

where $Y_i^*$ is the underlying (latent) response variable (extreme poverty status), $x_i$ is a set of explanatory variables (demographic and socio-economic variables), and $u_i$ is the error term.
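In the threshold formulation, the observed category is determined by where $Y_i^*$ falls relative to cut points $\mu_1 < \mu_2$, and with a standard normal error each category probability is a difference of normal CDFs. The sketch below computes these probabilities for one household; the index value $\beta' x_i$, the cut points, and the mapping of categories to poverty levels are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))


def ordered_probit_probs(index, cutpoints):
    """Category probabilities when Y* = index + u, u ~ N(0, 1), and the observed
    Y = j if cutpoints[j-1] < Y* <= cutpoints[j] (with -inf and +inf at the ends)."""
    cdf = [0.0] + [normal_cdf(c - index) for c in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cutpoints) + 1)]


# Hypothetical: index = beta'x for one household and two cut points, giving three
# ordered categories (0 = non-poor, 1 = poor, 2 = extreme/hard-core poor, if a
# larger Y* means deeper poverty)
probs = ordered_probit_probs(index=0.4, cutpoints=[0.0, 1.2])
print([round(p, 3) for p in probs])
```

Unlike the multinomial logit, a single coefficient vector $\beta$ shifts the whole latent index, and the estimated cut points translate that index into the ordered categories.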

4.5 Censored or Truncated Models


Tobit Regression: Censored and Truncated Data
Censoring occurs when some observations of the dependent variable report not the true value but a cut point. Truncation means that observations beyond a cut point are missing entirely. OLS estimates with censored or truncated data are biased.

In (a), the data are censored at a: for the censored observations one knows only that the true value is a or less. The OLS regression line would then be less steep (dashed line). Truncation means that cases below a are missing completely; truncation also biases OLS estimates. (b) is the case of incidental truncation, or sample selection: because of a non-random selection mechanism, information on Y is missing for some cases, which also biases OLS estimates. Special estimation methods therefore exist for such data. In this regard, censored data are analyzed with the tobit model:

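$$Y_i^* = X_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

where $Y^*$ is the latent, uncensored dependent variable and $\beta$ is the vector of effects of the explanatory variables on the latent, uncensored variable.

What we observe is

$$Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* > 0 \\ 0 & \text{if } Y_i^* \le 0 \end{cases}$$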

In censored regression, the dependent variable may contain some zero values. With these zero values, using ordinary least squares (OLS) to estimate the model would lead to biased and inconsistent results. Proper estimation of the model requires a censored tobit regression, which is given as:

$$Y_i^* = X_i\beta + \varepsilon_i, \qquad Y_i = \max(0, Y_i^*), \qquad i = 1, \dots, n$$

where $Y_i$ is the observed dependent variable with some zero values, $X_i$ refers to the explanatory variables, $\beta$ is the vector of parameters, and $\varepsilon_i$ is the error term.

Note: The differences between truncated and censored models are as follows.

Truncated model: we do not know how many observations are truncated. For instance, in a telephone survey, those who have no phones are truncated (they never enter the sample), so there is not enough information to obtain a correct regression.
Censored model: we know how many observations are censored. For instance, in a survey, those who have no cars are censored, but we know their number.
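The distinction is about what the data set contains. The sketch below starts from hypothetical true values and builds a censored sample (all observations kept, values below the cut point recorded at the cut point) and a truncated sample (observations below the cut point dropped). The cut point of zero and the distribution are assumptions for illustration.

```python
import random

random.seed(3)
latent = [random.gauss(0.5, 1.0) for _ in range(1000)]   # hypothetical true values

# Censored sample: every observation is kept, values at or below the cut point
# are recorded as the cut point, and we can count how many were censored.
censored = [max(0.0, y) for y in latent]
n_censored = sum(1 for y in latent if y <= 0.0)

# Truncated sample: observations at or below the cut point are simply missing.
truncated = [y for y in latent if y > 0.0]

print(len(censored), n_censored)   # sample size and censored count are both known
print(len(truncated))              # only the survivors; the data alone do not reveal how many were lost
```

Here the number of lost observations can be computed only because the simulation keeps `latent`; a researcher holding just the truncated sample would not have that information.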

An example: to examine the effects of social capital on income diversification, that is, the implications of households' engagement in social institutions for diversifying their income sources, one can use the censored tobit model. Since not all households earn income from sources other than their main source, the dependent variable takes zero values for some households. For this reason, censored tobit analysis is used, which is given as:
$$Y_i^* = X_i\beta + \varepsilon_i, \qquad Y_i = \max(0, Y_i^*), \qquad i = 1, \dots, n$$

where $Y_i$ is the share of income from other sources in the household's total income, censored at zero; $X_i$ is a vector of determinants of income diversification, including variables related to social capital and household characteristics; $\beta$ is a vector of parameters; and $\varepsilon_i$ is the error term.
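The tobit model is normally estimated by maximum likelihood: censored observations contribute $\ln \Phi(-X_i\beta/\sigma)$ to the log-likelihood, and uncensored observations contribute the log of the normal density of $Y_i$. The sketch below codes this log-likelihood for one regressor and, for brevity, maximizes it over $\beta_1$ only by a crude grid search with $\beta_0$ and $\sigma$ held at their true values. The simulated data and all parameter values are hypothetical, and real software maximizes over all parameters jointly.

```python
import math
import random


def normal_logcdf(z):
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def tobit_loglik(b0, b1, sigma, xs, ys):
    """Log-likelihood of a tobit model censored at zero: censored observations
    contribute log P(Y* <= 0), uncensored ones the log normal density of Y."""
    ll = 0.0
    for x, y in zip(xs, ys):
        mu = b0 + b1 * x
        if y <= 0.0:
            ll += normal_logcdf(-mu / sigma)
        else:
            z = (y - mu) / sigma
            ll += -0.5 * math.log(2 * math.pi) - 0.5 * z * z - math.log(sigma)
    return ll


random.seed(11)
TRUE_B0, TRUE_B1, TRUE_SIGMA = -0.2, 0.5, 0.3   # hypothetical parameters
xs = [random.uniform(0, 1) for _ in range(3000)]
ys = [max(0.0, TRUE_B0 + TRUE_B1 * x + random.gauss(0, TRUE_SIGMA)) for x in xs]

# Crude profile search over b1 (b0 and sigma fixed at their true values)
grid = [i / 100 for i in range(0, 101)]
best_b1 = max(grid, key=lambda b1: tobit_loglik(TRUE_B0, b1, TRUE_SIGMA, xs, ys))
print(f"b1 maximizing the tobit likelihood: {best_b1:.2f}")
```

Because the likelihood treats the zeros as censored draws rather than as genuine values of Y, the maximizing $\beta_1$ lands near the true latent slope, unlike OLS on the same data.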
