Chapter 10: Binary Choice and Limited Dependent Variable Models, and Maximum Likelihood Estimation

The document discusses binary choice models and maximum likelihood estimation. It introduces the linear probability model, logit analysis, and probit analysis for modeling qualitative dependent variables. It also covers tobit analysis, sample selection models, and Heckman analysis. The document then discusses maximum likelihood estimation and its properties for fitting these types of models.
Chapter 10: Binary choice and limited dependent variable models, and maximum likelihood estimation

Overview
The first part of this chapter describes the linear probability model,
logit analysis, and probit analysis, three techniques for fitting regression
models where the dependent variable is a qualitative characteristic. Next
it discusses tobit analysis, a censored regression model fitted using a
combination of linear regression analysis and probit analysis. This leads
to sample selection models and Heckman analysis. The second part of the
chapter introduces maximum likelihood estimation, the method used to fit
all of these models except the linear probability model.

Learning outcomes
After working through the corresponding chapter in the textbook, studying
the corresponding slideshows, and doing the starred exercises in the
textbook and the additional exercises in this guide, you should be able to:
• describe the linear probability model and explain its defects
• describe logit analysis, giving the mathematical specification
• describe probit analysis, including the mathematical specification
• calculate marginal effects in logit and probit analysis
• explain why OLS yields biased estimates when applied to a sample
with censored observations, even when the censored observations are
deleted
• explain the problem of sample selection bias and describe how the
Heckman procedure may provide a solution to it (in general terms,
without mathematical detail)
• explain the principle underlying maximum likelihood estimation
• apply maximum likelihood estimation from first principles in simple
models.

Further material
Limiting distributions and the properties of maximum likelihood
estimators
Provided that weak regularity conditions involving the differentiability of
the likelihood function are satisfied, maximum likelihood (ML) estimators
have the following attractive properties in large samples:
(1) They are consistent.
(2) They are asymptotically normally distributed.
(3) They are asymptotically efficient.
The meaning of the first property is familiar. It implies that the probability
density function of the estimator collapses to a spike at the true value. This
being the case, what can the other assertions mean? If the distribution
becomes degenerate as the sample size becomes very large, how can it be
described as having a normal distribution? And how can it be described as
being efficient, when its variance, and the variance of any other consistent
estimator, tend to zero?
To discuss the last two properties, we consider what is known as the
limiting distribution of an estimator. This is the distribution of the
estimator when the divergence between it and its population mean is
multiplied by √n. If we do this, the distribution of a typical estimator
remains nondegenerate as n becomes large, and this enables us to say
meaningful things about its shape and to make comparisons with the
distributions of other estimators (also multiplied by √n).
To put this mathematically, suppose that there is one parameter of interest,
θ, and that θ̂ is its ML estimator. Then (2) says that

$$\sqrt{n}\,(\hat{\theta} - \theta) \sim N(0, \sigma^2)$$

for some variance σ². (3) says that, given any other consistent estimator
θ̃, √n(θ̃ − θ) cannot have a smaller variance.
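The idea of a limiting distribution can be illustrated with a small simulation. The sketch below is not from the guide: it assumes a normal population with hypothetical values θ = 5 and σ = 2, uses the sample mean as the ML estimator, and compares the variance of the estimator itself with the variance of √n(θ̂ − θ) as n grows.

```python
import random
import statistics


def simulate(n, reps=500, theta=5.0, sigma=2.0, seed=0):
    """Return (variance of the estimator, variance of sqrt(n) * (estimator - theta)).

    theta, sigma, reps and seed are hypothetical choices made for illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(statistics.fmean(sample))  # ML estimator of the mean
    scaled = [n ** 0.5 * (est - theta) for est in estimates]
    return statistics.pvariance(estimates), statistics.pvariance(scaled)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        var_raw, var_scaled = simulate(n)
        print(f"n={n:5d}  var(estimator)={var_raw:.4f}  var(scaled)={var_scaled:.2f}")
```

Under these assumptions the first variance should shrink towards zero as n increases, while the second should stay close to σ² = 4, which is what allows the shape of the distribution to be discussed.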

Test procedures for maximum likelihood estimation


This section on ML tests contains material that is a little advanced for an
introductory econometrics course. It is provided because likelihood ratio
tests are encountered in the sections on binary choice models and because
a brief introduction may be of help to those who proceed to a more
advanced course.
There are three main approaches to testing hypotheses in maximum
likelihood estimation: likelihood ratio (LR) tests, Wald tests, and Lagrange
multiplier (LM) tests. Since the theory behind Lagrange multiplier tests is
relatively complex, the present discussion will be confined to the first two
types. We will start by assuming that the probability density function of
a random variable X is a known function of a single unknown parameter
θ and that the likelihood function for θ given a sample of n observations
on X, L(θ | X1, ..., Xn), satisfies weak regularity conditions involving its
differentiability. In particular, we assume that θ is determined by the
first-order condition dL/dθ = 0. (This rules out estimators such as that
in Exercise A10.7.) The null hypothesis is H0: θ = θ0, the alternative
hypothesis is H1: θ ≠ θ0, and the maximum likelihood estimate of θ is θ̂.

Likelihood ratio tests


A likelihood ratio test compares the value of the likelihood function at
θ = θ̂ with its value at θ = θ0. In view of the definition of θ̂, L(θ̂) ≥ L(θ0)
for all θ0. However, if the null hypothesis is true, the ratio L(θ̂)/L(θ0)
should not be significantly greater than 1. As a consequence, the logarithm
of the ratio,

$$\log\left(\frac{L(\hat{\theta})}{L(\theta_0)}\right) = \log L(\hat{\theta}) - \log L(\theta_0)$$

should not be significantly different from zero. In that it involves a
comparison of the measures of goodness of fit for unrestricted and
restricted versions of the model, the LR test is similar to an F test.
Under the null hypothesis, it can be shown that in large samples the test
statistic

$$LR = 2\left(\log L(\hat{\theta}) - \log L(\theta_0)\right)$$

has a chi-squared distribution with one degree of freedom. If there are
multiple parameters of interest, and multiple restrictions, the number of
degrees of freedom is equal to the number of restrictions.
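Examples
We will return to the example in Section 10.6 in the textbook, where we
have a normally-distributed random variable X with unknown population
mean μ and known standard deviation equal to 1. Given a sample of n
observations, the likelihood function is

$$L(\hat{\mu} \mid X_1,\dots,X_n) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-\hat{\mu})^2}\right) \times \dots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-\hat{\mu})^2}\right).$$

The log-likelihood is

$$\log L(\hat{\mu} \mid X_1,\dots,X_n) = n\log\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^{n}(X_i-\hat{\mu})^2$$

and the unrestricted ML estimate is μ̂ = X̄. The LR statistic for the null
hypothesis H0: μ = μ0 is therefore

$$LR = 2\left(\log L(\bar{X} \mid X_1,\dots,X_n) - \log L(\mu_0 \mid X_1,\dots,X_n)\right) = \sum_{i=1}^{n}(X_i-\mu_0)^2 - \sum_{i=1}^{n}(X_i-\bar{X})^2 = n(\bar{X}-\mu_0)^2.$$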
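As a numerical check on this example, the following sketch evaluates the LR statistic directly from the log-likelihood with σ = 1 and compares it with the algebraically equivalent expression n(X̄ − μ0)². The sample values and the null value μ0 = 0 are hypothetical, chosen only for illustration.

```python
import math


def log_likelihood(mu, xs):
    """Log-likelihood of mu for a sample assumed to be N(mu, 1)."""
    n = len(xs)
    return n * math.log(1 / math.sqrt(2 * math.pi)) - 0.5 * sum((x - mu) ** 2 for x in xs)


def lr_statistic(xs, mu0):
    """2 * (unrestricted log-likelihood - restricted log-likelihood)."""
    xbar = sum(xs) / len(xs)  # unrestricted ML estimate of mu
    return 2 * (log_likelihood(xbar, xs) - log_likelihood(mu0, xs))


xs = [0.3, 1.9, 1.1, -0.4, 2.2, 0.8, 1.5, 0.6]  # hypothetical sample
mu0 = 0.0                                       # hypothetical null value
xbar = sum(xs) / len(xs)

lr = lr_statistic(xs, mu0)
shortcut = len(xs) * (xbar - mu0) ** 2  # n * (Xbar - mu0)^2

CRITICAL_5PCT = 3.84  # chi-squared(1) critical value at the 5 per cent level
print(lr, shortcut, lr > CRITICAL_5PCT)
```

The two expressions should agree up to rounding error, and the null hypothesis is rejected at the 5 per cent level when the statistic exceeds 3.84.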

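If we relax the assumption σ = 1, the unrestricted likelihood function is

$$L(\hat{\mu}, \hat{\sigma} \mid X_1,\dots,X_n) = \left(\frac{1}{\hat{\sigma}\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-\hat{\mu}}{\hat{\sigma}}\right)^2}\right) \times \dots \times \left(\frac{1}{\hat{\sigma}\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-\hat{\mu}}{\hat{\sigma}}\right)^2}\right)$$

and the log-likelihood is

$$\log L(\hat{\mu}, \hat{\sigma} \mid X_1,\dots,X_n) = n\log\left(\frac{1}{\sqrt{2\pi}}\right) - n\log\hat{\sigma} - \frac{1}{2\hat{\sigma}^2}\sum_{i=1}^{n}(X_i-\hat{\mu})^2.$$

The first-order condition obtained by differentiating by σ is

$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(X_i-\mu)^2 = 0$$

from which we obtain

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\hat{\mu})^2.$$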
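Substituting back into the log-likelihood function, the latter now becomes
a function of μ only (and is known as the concentrated log-likelihood
function or, sometimes, the profile log-likelihood function):

$$\log L(\mu \mid X_1,\dots,X_n) = n\log\left(\frac{1}{\sqrt{2\pi}}\right) - n\log\left[\left(\frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2\right)^{1/2}\right] - \frac{n}{2}.$$

As before, the ML estimator of μ is X̄. Hence the LR statistic is

$$LR = 2\left(\log L(\bar{X} \mid X_1,\dots,X_n) - \log L(\mu_0 \mid X_1,\dots,X_n)\right) = n\log\left(\frac{\sum_{i=1}^{n}(X_i-\mu_0)^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right).$$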


It is worth noting that this is closely related to the F statistic obtained
when one fits the least squares model

$$X_i = \mu + u_i.$$

The least squares estimator of μ is X̄ and $RSS = \sum_{i=1}^{n}(X_i-\bar{X})^2$.
If one imposes the restriction μ = μ0, we have $RSS_R = \sum_{i=1}^{n}(X_i-\mu_0)^2$ and
the F statistic

$$F(1, n-1) = \frac{\sum_{i=1}^{n}(X_i-\mu_0)^2 - \sum_{i=1}^{n}(X_i-\bar{X})^2}{\left(\sum_{i=1}^{n}(X_i-\bar{X})^2\right)/(n-1)}.$$

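Returning to the LR statistic, we have

$$LR = n\log\left(\frac{\sum_{i=1}^{n}(X_i-\mu_0)^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right) = n\log\left(1 + \frac{\sum_{i=1}^{n}(X_i-\mu_0)^2 - \sum_{i=1}^{n}(X_i-\bar{X})^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right) \approx n\,\frac{\sum_{i=1}^{n}(X_i-\mu_0)^2 - \sum_{i=1}^{n}(X_i-\bar{X})^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{n}{n-1}F \approx F.$$

Note that we have used the approximation log(1 + a) ≈ a, which is valid
when a is small enough for higher powers to be neglected.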
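The closeness of the LR statistic to the F statistic can be seen numerically. In the sketch below the data are simulated under hypothetical values (a population mean of 10.2, standard deviation 2, n = 200, and null value μ0 = 10), so that a is small. The LR statistic is computed as n log(RSS_R/RSS), which is the form implied by the concentrated log-likelihood.

```python
import math
import random


def lr_and_f(xs, mu0):
    """Return (LR statistic, F statistic) for H0: mu = mu0 with sigma unknown."""
    n = len(xs)
    xbar = sum(xs) / n
    rss_u = sum((x - xbar) ** 2 for x in xs)  # unrestricted RSS
    rss_r = sum((x - mu0) ** 2 for x in xs)   # restricted RSS
    lr = n * math.log(rss_r / rss_u)
    f = (rss_r - rss_u) / (rss_u / (n - 1))
    return lr, f


rng = random.Random(1)                           # hypothetical seed
xs = [rng.gauss(10.2, 2.0) for _ in range(200)]  # hypothetical data
lr, f = lr_and_f(xs, mu0=10.0)
print(round(lr, 3), round(f, 3))
```

Because log(1 + a) ≤ a, the LR statistic can never exceed nF/(n − 1), and the two statistics converge as the sample size grows.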

Wald tests
Wald tests are based on the same principle as t tests in that they evaluate
whether the discrepancy between the maximum likelihood estimate θ̂ and
the hypothetical value θ0 is significant, taking account of the variance in
the estimate. The test statistic for the null hypothesis H0: θ − θ0 = 0 is

$$\frac{(\hat{\theta}-\theta_0)^2}{\hat{\sigma}^2_{\hat{\theta}}}$$

where $\hat{\sigma}^2_{\hat{\theta}}$ is the estimate of the variance of θ̂ evaluated at the maximum
likelihood value. $\hat{\sigma}^2_{\hat{\theta}}$ can be estimated in various ways that are
asymptotically equivalent if the likelihood function has been specified
correctly. A common estimator is that obtained as minus the inverse
of the second differential of the log-likelihood function evaluated at
the maximum likelihood estimate. Under the null hypothesis that the
restriction is valid, the test statistic has a chi-squared distribution with one
degree of freedom. When there are multiple restrictions, the test statistic
becomes more complex and the number of degrees of freedom is equal to
the number of restrictions.


Examples
We will use the same examples as for the LR test, first, assuming that σ = 1
and then assuming that it has to be estimated along with μ. In the first case
the log-likelihood function is

$$\log L(\mu \mid X_1,\dots,X_n) = n\log\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^{n}(X_i-\mu)^2.$$

The first differential is $\sum_{i=1}^{n}(X_i-\mu)$ and the second is −n, so the estimate of
the variance is 1/n. The Wald test statistic is therefore $n(\bar{X}-\mu_0)^2$.
In the second example, where σ was unknown, the concentrated log-likelihood
function is

$$\log L(\mu \mid X_1,\dots,X_n) = n\log\left(\frac{1}{\sqrt{2\pi}}\right) - n\log\left[\left(\frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2\right)^{1/2}\right] - \frac{n}{2}$$

$$= n\log\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{n}{2}\log\frac{1}{n} - \frac{n}{2}\log\left(\sum_{i=1}^{n}(X_i-\mu)^2\right) - \frac{n}{2}.$$

The first derivative with respect to μ is

$$\frac{d\log L}{d\mu} = n\,\frac{\sum_{i=1}^{n}(X_i-\mu)}{\sum_{i=1}^{n}(X_i-\mu)^2}.$$

The second derivative is

$$\frac{d^2\log L}{d\mu^2} = n\,\frac{(-n)\left(\sum_{i=1}^{n}(X_i-\mu)^2\right) - \left(\sum_{i=1}^{n}(X_i-\mu)\right)\left(-2\sum_{i=1}^{n}(X_i-\mu)\right)}{\left(\sum_{i=1}^{n}(X_i-\mu)^2\right)^2}.$$

Evaluated at the ML estimator μ̂ = X̄, $\sum_{i=1}^{n}(X_i-\mu) = 0$ and hence

$$\frac{d^2\log L}{d\mu^2} = -\frac{n^2}{\sum_{i=1}^{n}(X_i-\mu)^2}$$

giving an estimated variance σ̂²/n, given

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2.$$

Hence the Wald test statistic is

$$\frac{(\bar{X}-\mu_0)^2}{\hat{\sigma}^2/n}.$$

Under the null hypothesis, this is distributed as a chi-squared statistic
with one degree of freedom.
When there is just one restriction, as in the present case, the Wald statistic
is the square of the corresponding asymptotic t statistic (asymptotic because
the variance has been estimated asymptotically). The chi-squared test and
the t test are equivalent, given that, when there is one degree of freedom,
the critical value of the chi-squared statistic for any significance level is the
square of the critical value of the normal distribution.
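Both Wald examples can be checked numerically. The sketch below uses a hypothetical sample and null value μ0 = 0. It computes the two Wald statistics and, as a check on the variance estimate in the second example, approximates the second derivative of the concentrated log-likelihood at X̄ by a central finite difference (a numerical shortcut assumed here for illustration, not part of the guide's derivation).

```python
import math


def concentrated_loglik(mu, xs):
    """Concentrated log-likelihood of mu when sigma has been substituted out."""
    n = len(xs)
    mean_sq_dev = sum((x - mu) ** 2 for x in xs) / n
    return n * math.log(1 / math.sqrt(2 * math.pi)) - 0.5 * n * math.log(mean_sq_dev) - n / 2


def second_derivative(func, point, step=1e-4):
    """Central finite-difference approximation to the second derivative."""
    return (func(point + step) - 2 * func(point) + func(point - step)) / step ** 2


xs = [0.3, 1.9, 1.1, -0.4, 2.2, 0.8, 1.5, 0.6]  # hypothetical sample
mu0 = 0.0                                       # hypothetical null value
n = len(xs)
xbar = sum(xs) / n
sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n  # ML estimate of sigma^2

wald_known = n * (xbar - mu0) ** 2                   # sigma = 1 known
wald_unknown = (xbar - mu0) ** 2 / (sigma2_hat / n)  # sigma estimated

# Minus the inverse of the second derivative, evaluated at the ML estimate
numeric_var = -1 / second_derivative(lambda m: concentrated_loglik(m, xs), xbar)
print(wald_known, wald_unknown, sigma2_hat / n, numeric_var)
```

The numerically obtained variance should be very close to σ̂²/n, and each Wald statistic is compared with the chi-squared critical value with one degree of freedom.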


LR test of restrictions in a regression model


Given the regression model

$$Y_i = \beta_1 + \sum_{j=2}^{k}\beta_j X_{ij} + u_i$$

with u assumed to be iid N(0, σ²), the log-likelihood function for the
parameters is

$$\log L(\beta_1,\dots,\beta_k,\sigma \mid Y_i, X_i, i=1,\dots,n) = n\log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i-\beta_1-\sum_{j=2}^{k}\beta_j X_{ij}\right)^2.$$

This is a straightforward generalisation of the expression for a simple
regression derived in Section 10.6 in the textbook. Hence

$$\log L(\beta_1,\dots,\beta_k,\sigma \mid Y_i, X_i, i=1,\dots,n) = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}Z$$

where

$$Z = \sum_{i=1}^{n}\left(Y_i-\beta_1-\sum_{j=2}^{k}\beta_j X_{ij}\right)^2.$$
The estimates of the β parameters affect only Z. To maximise the
log-likelihood, they should be chosen so as to minimise Z, and of course this
is exactly what one is doing when one is fitting a least squares regression.
Hence, at the maximum, Z = RSS. It remains to determine the ML estimate of σ.
Taking the partial differential with respect to σ, we obtain one of the
first-order conditions for a maximum:

$$\frac{\partial \log L(\beta_1,\dots,\beta_k,\sigma)}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}RSS = 0.$$

From this we obtain

$$\hat{\sigma}^2 = \frac{RSS}{n}.$$
Hence the ML estimator of σ² is the sum of the squares of the residuals divided
by n. This is different from the least squares estimator, which is the sum of
the squares of the residuals divided by n – k, but the difference disappears
as the sample size becomes large. Substituting for σ̂² in the log-likelihood
function, we obtain the concentrated log-likelihood function

$$\log L(\beta_1,\dots,\beta_k \mid Y_i, X_i, i=1,\dots,n) = -n\log\left[\left(\frac{RSS}{n}\right)^{1/2}\right] - \frac{n}{2}\log 2\pi - \frac{1}{2RSS/n}RSS$$

$$= -\frac{n}{2}\log\frac{RSS}{n} - \frac{n}{2}\log 2\pi - \frac{n}{2}$$

$$= -\frac{n}{2}\left(\log RSS + 1 + \log 2\pi - \log n\right).$$

We will re-write this as

$$\log L_U = -\frac{n}{2}\left(\log RSS_U + 1 + \log 2\pi - \log n\right)$$

the subscript U emphasising that this is the unrestricted log-likelihood. If
we now impose a restriction on the parameters and maximise the
log-likelihood function subject to the restriction, it will be

$$\log L_R = -\frac{n}{2}\left(\log RSS_R + 1 + \log 2\pi - \log n\right)$$

where RSS_R ≥ RSS_U and hence log L_R ≤ log L_U. The LR statistic for a test
of the restriction is therefore

$$2\left(\log L_U - \log L_R\right) = n\left(\log RSS_R - \log RSS_U\right) = n\log\frac{RSS_R}{RSS_U}.$$

It is distributed as a chi-squared statistic with one degree of freedom
under the null hypothesis that the restriction is valid. If there is more than
one restriction, the test statistic is the same but the number of degrees
of freedom under the null hypothesis that all the restrictions are valid is
equal to the number of restrictions.
An example of its use is the common factor test in Section 12.3 in the
textbook. As with all maximum likelihood tests, it is valid only for large
samples. Thus for testing linear restrictions we should prefer the F test
approach because it is valid for finite samples.
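To make the regression version of the LR test concrete, the sketch below fits a simple regression by least squares on simulated data and tests the restriction that the slope is zero. The data-generating values (intercept 2.0, slope 0.3, standard deviation 1.5, n = 100) and the helper names are hypothetical. It confirms that twice the difference in the concentrated log-likelihoods equals n log(RSS_R/RSS_U), and prints the F statistic for comparison.

```python
import math
import random


def ols_rss(xs, ys):
    """Residual sum of squares from regressing ys on a constant and xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))


def concentrated_loglik(rss, n):
    """-(n/2) * (log RSS + 1 + log 2*pi - log n)."""
    return -0.5 * n * (math.log(rss) + 1 + math.log(2 * math.pi) - math.log(n))


rng = random.Random(42)  # hypothetical seed
n = 100
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 0.3 * x + rng.gauss(0, 1.5) for x in xs]  # hypothetical model

rss_u = ols_rss(xs, ys)                   # unrestricted
ybar = sum(ys) / n
rss_r = sum((y - ybar) ** 2 for y in ys)  # restricted: slope = 0

lr = n * math.log(rss_r / rss_u)
lr_from_loglik = 2 * (concentrated_loglik(rss_u, n) - concentrated_loglik(rss_r, n))
f_stat = (rss_r - rss_u) / (rss_u / (n - 2))
print(round(lr, 3), round(lr_from_loglik, 3), round(f_stat, 3))
```

The LR statistic is referred to the chi-squared distribution with one degree of freedom, whereas the F statistic is referred to F(1, n − 2) and remains valid in finite samples.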

Additional exercises
A10.1
What factors affect the decision to make a purchase of your category of
expenditure in the CES data set?
Define a new variable CATBUY that is equal to 1 if the household makes
any purchase of your category and 0 if it makes no purchase at all. Regress
CATBUY on EXPPC, SIZE, REFAGE, and COLLEGE (as defined in Exercise
A5.6) using: (1) the linear probability model, (2) the logit model, and (3)
the probit model. Calculate the marginal effects at the mean of EXPPC,
SIZE, REFAGE, and COLLEGE for the logit and probit models and compare
them with the coefficients of the linear probability model.

A10.2
Logit analysis was used to relate the event of a respondent working
(WORKING, defined to be 1 if the respondent was working, and 0
otherwise) to the respondent’s educational attainment (S, defined as
the highest grade completed) using 1994 data from the US National
Longitudinal Survey of Youth. In this year the respondents were aged
29–36 and a substantial number of females had given up work to raise a
family. The analysis was undertaken for females and males separately, with
the output shown below (first females, then males, with iteration messages
deleted):

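[Stata output, females: logit regression of WORKING on S (command restricted by MALE), reporting number of obs, chi2, Prob > chi2, log likelihood, and pseudo R2, with coefficient rows for S and _cons under the columns Coef., Std. Err., z, P>|z|, and Conf. Interval. The numerical values are missing from this copy.]

[Stata output, males: logit regression of WORKING on S (command restricted by MALE), with the same layout. The numerical values are missing from this copy.]
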

95 per cent of the respondents had S in the range 9–18 years and
the mean value of S was 13.3 and 13.2 years for females and males,
respectively.
From the logit analysis, the marginal effect of S on the probability of
working at the mean was estimated to be 0.030 and 0.020 for females
and males, respectively. Ordinary least squares regressions of WORKING
on S yielded slope coefficients of 0.029 and 0.020 for females and males,
respectively.
As can be seen from the second figure below, the marginal effect of
educational attainment was lower for males than for females over most of
the range S ≥ 9. Discuss the plausibility of this finding.
As can also be seen from the second figure, the marginal effect of
educational attainment decreases with educational attainment for both
males and females over the range S ≥ 9. Discuss the plausibility of this
finding.
Compare the estimates of the marginal effect of educational attainment
using logit analysis with those obtained using ordinary least squares.




[Figure: probability (vertical axis) plotted against S (horizontal axis), with separate curves for males and females]
Figure 10.1 Probability of working, as a function of S

[Figure: marginal effect (vertical axis) plotted against S (horizontal axis), with separate curves for males and females]

Figure 10.2 Marginal effect of S on the probability of working

A10.3
A researcher has data on weight, height, and schooling for 540
respondents in the US National Longitudinal Survey of Youth for the year
2002. Using the data on weight and height, he computes the body mass
index for each individual. If the body mass index is 30 or greater, the
individual is defined to be obese. He defines a binary variable, OBESE,
that is equal to 1 for the 164 obese individuals and 0 for the other 376.
He wishes to investigate whether obesity is related to schooling and fits an
ordinary least squares (OLS) regression of OBESE on S, years of schooling,
with the following result (t statistics in parentheses):
$$\widehat{OBESE} = \underset{(5.30)}{0.595} - \underset{(2.63)}{0.021}\,S \qquad (1)$$

This is described as the linear probability model (LPM). He also fits
the logit model

$$F(Z) = \frac{1}{1+e^{-Z}},$$

where F(Z) is the probability of being obese and Z = β1 + β2S, with the
following result (again, t statistics in parentheses):

$$\hat{Z} = \underset{(1.07)}{0.588} - \underset{(2.60)}{0.105}\,S \qquad (2)$$
The figure below shows the probability of being obese and the marginal
effect of schooling as a function of S, given the logit regression. Most
(492 out of 540) of the individuals in the sample had 12 to 18 years of
schooling.

[Figure: probability of being obese (left axis, labelled probability) and marginal effect (right axis) plotted against years of schooling]

Figure 10.3
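The two curves in the figure can be reproduced from the fitted logit equation (2). The sketch below is an illustration using the reported coefficients 0.588 and −0.105: it evaluates the probability F(Z) and the marginal effect of schooling, which for the logit model is f(Z)β2 with f(Z) = e^(−Z)/(1 + e^(−Z))² = F(Z)(1 − F(Z)). The schooling values in the loop are arbitrary choices.

```python
import math

B1, B2 = 0.588, -0.105  # logit coefficients reported in equation (2)
LPM_SLOPE = -0.021      # slope coefficient of the LPM in equation (1)


def probability(s):
    """Logit probability of being obese for s years of schooling."""
    z = B1 + B2 * s
    return 1 / (1 + math.exp(-z))


def marginal_effect(s):
    """dp/dS = f(Z) * b2, where f(Z) = p * (1 - p) for the logit model."""
    p = probability(s)
    return p * (1 - p) * B2


for s in (8, 12, 16, 20):  # arbitrary schooling values
    print(s, round(probability(s), 3), round(marginal_effect(s), 4), LPM_SLOPE)
```

Printing the LPM slope alongside makes it easy to see how far the logit marginal effect departs from a constant over the range where most of the sample lies.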
• Discuss whether the relationships indicated by the probability and
marginal effect curves appear to be plausible.
• Add the probability function and the marginal effect function for the
LPM to the diagram. Explain why you drew them the way you did.
• The logit model is considered to have several advantages over the LPM.
Explain what these advantages are. Evaluate the importance of the
advantages of the logit model in this particular case.
• The LPM is fitted using OLS. Explain how, instead, it might be fitted
using maximum likelihood estimation:
Write down the probability of being obese for any obese individual,
given Si for that individual, and write down the probability of not being
obese for any non-obese individual, again given Si for that individual.
Write down the likelihood function for this sample of 164 obese
individuals and 376 non-obese individuals.
Explain how one would use this function to estimate the
parameters. [Note: You are not expected to attempt to derive the
estimators of the parameters.]
Explain whether your maximum likelihood estimators will be the
same or different from those obtained using least squares.

A10.4
A researcher interested in the relationship between parenting, age and
schooling has data for the year 2000 for a sample of 1,167 married males and
870 married females aged 35 to 42 in the National Longitudinal Survey of
Youth. In particular, she is interested in how the presence of young children
in the household is related to the age and education of the respondent.
She defines CHILDL6 to be 1 if there is a child less than 6 years old in the
household and 0 otherwise and regresses it on AGE, age, and S, years of
schooling, for males and females separately using probit analysis. Defining the
probability of having a child less than 6 in the household to be p = F(Z) where
Z = β1 + β2AGE + β3S
she obtains the results shown in the table below (asymptotic standard
errors in parentheses).

             males      females
AGE         –0.137      –0.154
            (0.018)     (0.023)
S            0.132       0.094
            (0.015)     (0.020)
constant     0.194       0.547
            (0.358)     (0.492)
Z           –0.399      –0.874
f(Z)         0.368       0.272

For males and females separately, she calculates

$$Z = b_1 + b_2\,\overline{AGE} + b_3\,\overline{S}$$

where $\overline{AGE}$ and $\overline{S}$ are the mean values of AGE and S and b1, b2, and b3
are the probit coefficients in the corresponding regression, and she further
calculates

$$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^2}$$

where f = dF/dZ. The values of Z and f(Z) are shown in the table.
• Explain how one may derive the marginal effects of the explanatory
variables on the probability of having a child less than 6 in the
household, and calculate for both males and females the marginal
effects at the means of AGE and S.
• Explain whether the signs of the marginal effects are plausible. Explain
whether you would expect the marginal effect of schooling to be higher
for males or for females.
• At a seminar someone asks the researcher whether the marginal effect
of S is significantly different for males and females. The researcher
does not know how to test whether the difference is significant and
asks you for advice. What would you say?

A10.5
A health economist investigating the relationship between smoking,
schooling, and age, defines a dummy variable D to be equal to 1 for
smokers and 0 for nonsmokers. She hypothesises that the effects of
schooling and age are not independent of each other and defines an
interactive term schooling*age. She includes this as an explanatory
variable in the probit regression. Explain how this would affect the
estimation of the marginal effects of schooling and age.

A10.6
A researcher has data on the following variables for 5,061 respondents in
the US National Longitudinal Survey of Youth:
• MARRIED, marital status in 1994, defined to be 1 if the respondent was
married with spouse present and 0 otherwise;
• MALE, defined to be 1 if the respondent was male and 0 if female;
• AGE in 1994 (the range being 29–37);
• S, years of schooling, defined as highest grade completed, and
• ASVABC, score on a test of cognitive ability, scaled so as to have mean
50 and standard deviation 10.

She uses probit analysis to regress MARRIED on the other variables, with
the output shown:

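[Stata output: probit MARRIED MALE AGE S ASVABC, reporting number of obs, LR chi2, Prob > chi2, log likelihood, and pseudo R2, with coefficient rows for MALE, AGE, S, ASVABC, and _cons under the columns Coef., Std. Err., z, P>|z|, and Conf. Interval. The numerical values are missing from this copy.]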

Variable    Mean       Marginal effect
MALE         0.4841    –0.0467
AGE         32.52       0.0110
S           13.31      –0.0007
ASVABC      48.94       0.0097

The means of the explanatory variables, and their marginal effects
evaluated at the means, are shown in the table.
• Discuss the conclusions one may reach, given the probit output and the
table, commenting on their plausibility.
• The researcher considers including CHILD, a dummy variable defined
to be 1 if the respondent had children, and 0 otherwise, as an
explanatory variable. When she does this, its z-statistic is 33.65 and its
marginal effect 0.5685. Discuss these findings.

A10.7
Suppose that the time, t, required to complete a certain process has
probability density function
f(t) = αe^(−α(t − β)) with t > β > 0
and you have a sample of n observations with times T1, ..., Tn.
Determine the maximum likelihood estimate of α, assuming that β is
known.

A10.8
In Exercise 10.14 in the textbook, an event could occur with probability
p. Given that the event occurred m times in a sample of n observations,
the exercise required demonstrating that m/n was the ML estimator of p.
Derive the LR statistic for the null hypothesis p = p0. If m = 40 and n =
100, test the null hypothesis p = 0.5.

A10.9
For the variable in Exercise A10.8, derive the Wald statistic and test the
null hypothesis p = 0.5.


Answers to the starred exercises in the textbook


10.1
[This exercise does not have a star in the textbook, but an answer to it is
needed for comparison with the answer to Exercise 10.3.]
The output shows the result of an investigation of how the probability of a
respondent obtaining a bachelor’s degree from a four-year college is
related to the score on ASVABC, using EAEF Data Set 21. BACH is a dummy
variable equal to 1 for those with bachelor’s degrees (years of schooling at
least 16) and 0 otherwise. ASVABC ranged from 22 to 65, with mean value
50.2, and most scores were in the range 40 to 60. Provide an
interpretation of the coefficients. Explain why OLS is not a satisfactory
estimation method for this kind of model.

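[Stata output: reg BACH ASVABC, with the analysis of variance table (Source, SS, df, MS; rows Model, Residual, Total), number of obs, F, Prob > F, R-squared, Adj R-squared, and Root MSE, and coefficient rows for ASVABC and _cons under the columns Coef., Std. Err., t, P>|t|, and Conf. Interval. The numerical values are missing from this copy.]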


Answer:
The slope coefficient indicates that the probability of earning a bachelor’s
degree rises by 2.4 percentage points for every additional point on the ASVABC
score. While this may be realistic for a range of values of ASVABC, it is
not for very low ones. Very few of those with scores in the low end of
the spectrum earned bachelor’s degrees and variations in the ASVABC
score would be unlikely to have an effect on the probability. The intercept
literally indicates that an individual with a 0 score would have a minus
92.3 per cent probability of earning a bachelor’s degree. Given the way
that ASVABC was constructed, a score of 0 was in fact impossible. However
the linear probability model predicts nonsense negative probabilities for all
those with scores of 39 or less, of whom there were many in the sample.
The linear probability model also suffers from the problem that the
standard errors and t and F tests are invalid because the disturbance
term does not have a normal distribution. Its distribution is not even
continuous, consisting of only two possible values for each value of
ASVABC.

10.3
The output shows the results of fitting a logit regression to the data set
described in Exercise 10.1 (with four of the iteration messages deleted).
26.7 per cent of the respondents earned bachelor’s degrees.

[Stata output: logit BACH ASVABC, with the remaining iteration log-likelihood messages, number of obs, LR chi2, Prob > chi2, log likelihood, and pseudo R2, and coefficient rows for ASVABC and _cons under the columns Coef., Std. Err., z, P>|z|, and Conf. Interval. The numerical values are missing from this copy.]



The figure shows the probability of earning a bachelor’s degree as a
function of ASVABC. It also shows the marginal effect function.

[Figure: cumulative effect and marginal effect plotted against ASVABC]

Figure 10.4
• With reference to the figure, discuss the variation of the marginal effect
of the ASVABC score implicit in the logit regression.
• Sketch the probability and marginal effect diagrams for the OLS
regression in Exercise 10.1 and compare them with those for the logit
regression.
Answer:
In Exercise 10.1 we were told that the mean value of ASVABC in the
sample was 50.2. From the curve for the cumulative probability in the
figure it can be seen that the probability of graduating from college for
respondents with that score is only about 20 per cent. The question states
that most respondents had scores in the range 40–60. It can be seen that
at the top of that range the probability has increased substantially, being
about 60 per cent. Looking at the curve for the marginal probability,
it can be seen that the marginal effect is greatest in the range 50–65,
and of course this is the range with the steepest slope of the cumulative
probability. Exercise 10.1 states that the highest score was 65, where the
probability would be about 90 per cent.
For the linear probability model in Exercise 10.1, the counterpart to the
cumulative probability curve in the figure is a straight line using the
regression result. At the ASVABC mean it predicts that there is a 29%
chance of the respondent graduating from college, considerably more than
the logit figure, but for a score of 65 it predicts a probability of only 63%.
It is particularly unsatisfactory for low ASVABC scores since it predicts
negative probabilities for all scores lower than 38. The OLS counterpart
to the marginal probability curve is a horizontal straight line at 0.023,
showing that the marginal effect is underestimated for ASVABC scores
above 50 and overestimated below that figure. (The maximum ASVABC
score was 65.)

[Figure: cumulative effect and marginal effect plotted against ASVABC]

Figure 10.5

10.7
The following probit regression, with iteration messages deleted, was
fitted using 2,726 observations on females in the National Longitudinal
Survey of Youth using the LFP data set described in Appendix B. The data
are for 1994, when the respondents were aged 29 to 36 and many of them
were raising young families.
[Stata output: probit regression of WORKING on S, AGE, CHILDL06, CHILDL16, MARRIED, ETHBLACK, and ETHHISP (command restricted by MALE), with iteration log-likelihood messages, number of obs, LR chi2, Prob > chi2, log likelihood, and pseudo R2, and coefficient rows for each variable and _cons under the columns Coef., Std. Err., z, P>|z|, and Conf. Interval. The numerical values are missing from this copy.]

WORKING is a binary variable equal to 1 if the respondent was working
in 1994, 0 otherwise. CHILDL06 is a dummy variable equal to 1 if there
was a child aged less than 6 in the household, 0 otherwise. CHILDL16 is
a dummy variable equal to 1 if there was a child aged less than 16, but
no child less than 6, in the household, 0 otherwise. MARRIED is equal to 1 if
the respondent was married with spouse present, 0 otherwise. The remaining
variables are as described in EAEF Regression Exercises. The mean values of the
variables are given in the output below:
[Stata output: sum of WORKING, S, AGE, CHILDL06, CHILDL16, MARRIED, ETHBLACK, and ETHHISP (command restricted by MALE), with the columns Variable, Obs, Mean, Std. Dev., Min, and Max. The numerical values are missing from this copy.]

Calculate the marginal effects and discuss whether they are plausible. [The
data set and a description are posted on the website.]
Answer:
The marginal effects are calculated in the table below. As might be expected,
having a child aged less than 6 has a large adverse effect, very highly
significant. Schooling also has a very significant effect, more educated
mothers making use of their investment by tending to stay in the labour force.
Age has a significant negative effect, the reason for which is not obvious (the
respondents were aged 29 – 36 in 1994). Being black also has an adverse
effect, the reason for which is likewise not obvious. (The WORKING variable
is defined to be 1 if the individual has recorded hourly earnings of at least
$3. If the definition is tightened to also include the requirement that the
employment status is employed, the latter effect is smaller, but still significant
at the 5 per cent level.)

Variable     Mean       b         Mean × b    f(Z)      bf(Z)
S            13.3100    0.0893     1.1886     0.2969     0.0265
AGE          17.6464   –0.0439    –0.7747     0.2969    –0.0130
CHILDL06      0.3991   –0.5842    –0.2332     0.2969    –0.1735
CHILDL16      0.3180   –0.1359    –0.0432     0.2969    –0.0404
MARRIED       0.6229   –0.0077    –0.0048     0.2969    –0.0023
ETHBLACK      0.1306   –0.2781    –0.0363     0.2969    –0.0826
ETHHISP       0.0723   –0.0192    –0.0014     0.2969    –0.0057
constant      1.0000    0.6735     0.6735
Total                              0.7685
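The arithmetic behind the table can be retraced in a few lines. The sketch below takes the means and probit coefficients from the table, sums the products to obtain Z, evaluates the standard normal density f(Z), and multiplies each coefficient by f(Z) to obtain the marginal effect at the means. The variable names in the code are illustrative only.

```python
import math

# (variable, mean, probit coefficient) as reported in the table
rows = [
    ("S", 13.3100, 0.0893),
    ("AGE", 17.6464, -0.0439),
    ("CHILDL06", 0.3991, -0.5842),
    ("CHILDL16", 0.3180, -0.1359),
    ("MARRIED", 0.6229, -0.0077),
    ("ETHBLACK", 0.1306, -0.2781),
    ("ETHHISP", 0.0723, -0.0192),
    ("constant", 1.0000, 0.6735),
]

# Z evaluated at the sample means of the explanatory variables
z = sum(mean * b for _, mean, b in rows)

# Standard normal density at Z
f_z = math.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)

# Marginal effect of each explanatory variable: b * f(Z)
marginal_effects = {name: b * f_z for name, _, b in rows if name != "constant"}

print(f"Z = {z:.4f}, f(Z) = {f_z:.4f}")
for name, effect in marginal_effects.items():
    print(f"{name:10s} {effect: .4f}")
```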

10.9
Using the CES data set, perform a tobit regression of expenditure on your
commodity on total household expenditure per capita and household size, and
compare the slope coefficients with those obtained in OLS regressions including
and excluding observations with 0 expenditure on your commodity.
Answer:
The table gives the number of unconstrained observations for each category
of expenditure and the slope coefficients and standard errors from an OLS
regression using the unconstrained observations only, the OLS regression
using all the observations, and the tobit regression. As may be expected, the
discrepancies between the tobit estimates and the OLS estimates are greatest
for those categories with the largest numbers of constrained observations.
In the case of categories such as FDHO, HOUS, TELE, and CLOT, there is very
little difference between the tobit and the OLS estimates.

Comparison of OLS and tobit regressions
(n = number of unconstrained observations; standard errors in parentheses)

               OLS, all cases          OLS, no 0 cases         tobit
        n      EXPPC      SIZE         EXPPC      SIZE         EXPPC      SIZE
FDHO   868     0.0317    –132.221      0.0317    –133.775      0.0317    –132.054
              (0.0027)   (15.230)     (0.0027)   (15.181)     (0.0027)   (15.220)
FDAW   827     0.0488       5.342      0.0476      –2.842      0.0515      16.887
              (0.0025)   (14.279)     (0.0027)   (15.084)     (0.0026)   (14.862)
HOUS   867     0.2020    –113.011      0.2017    –113.677      0.2024    –112.636
              (0.0075)   (42.240)     (0.0075)   (42.383)     (0.0075)   (42.256)
TELE   858     0.0147     –40.958      0.0145     –43.073      0.0149     –40.215
              (0.0014)    (7.795)     (0.0014)    (7.833)     (0.0014)    (7.862)
DOM    454     0.0178      16.917      0.0243      –1.325      0.0344      71.555
              (0.0034)   (19.250)     (0.0060)   (35.584)     (0.0055)   (31.739)
TEXT   482     0.0078       7.896      0.0115       5.007      0.0121      28.819
              (0.0006)    (3.630)     (0.0011)    (6.431)     (0.0010)    (5.744)
FURN   329     0.0135       5.030      0.0198     –43.117      0.0294      62.492
              (0.0015)    (8.535)     (0.0033)   (21.227)     (0.0033)   (19.530)
MAPP   244     0.0066       3.668      0.0124     –25.961      0.0165      46.113
              (0.0008)    (4.649)     (0.0022)   (13.976)     (0.0024)   (14.248)
SAPP   467     0.0015      –1.195      0.0017      –7.757      0.0028       4.247
              (0.0002)    (1.197)     (0.0004)    (2.009)     (0.0004)    (2.039)
CLOT   847     0.0421      25.575      0.0414      21.831      0.0433      30.945
              (0.0021)   (11.708)     (0.0021)   (12.061)     (0.0021)   (11.946)
FOOT   686     0.0035       0.612      0.0034      –3.875      0.0041       3.768
              (0.0003)    (1.601)     (0.0003)    (1.893)     (0.0003)    (1.977)
GASO   797     0.0212     –16.361      0.0183     –42.594      0.0229      –7.883
              (0.0015)    (8.368)     (0.0015)    (8.732)     (0.0016)    (9.096)
TRIP   309     0.0210      15.862      0.0263     –13.063      0.0516      92.173
              (0.0018)   (10.239)     (0.0044)   (27.150)     (0.0042)   (24.352)
LOCT   172    –0.0007      –6.073     –0.0005     –23.839     –0.0039      –9.797
              (0.0004)    (2.484)     (0.0018)    (9.161)     (0.0019)    (9.904)
HEAL   821     0.0205    –162.500      0.0181    –178.197      0.0220    –160.342
              (0.0036)   (20.355)     (0.0036)   (20.804)     (0.0037)   (21.123)
ENT    824     0.0754      58.943      0.0743      48.403      0.0806      87.513
              (0.0044)   (24.522)     (0.0046)   (26.222)     (0.0045)   (25.611)
FEES   676     0.0329      33.642      0.0337      23.969      0.0452      91.199
              (0.0025)   (14.295)     (0.0032)   (19.334)     (0.0031)   (17.532)
TOYS   592     0.0081       9.680      0.0095      –5.894      0.0117      35.529
              (0.0008)    (4.599)     (0.0011)    (6.205)     (0.0011)    (6.381)
READ   764     0.0054      –8.202      0.0050     –12.491      0.0061      –6.743
              (0.0004)    (1.998)     (0.0004)    (2.212)     (0.0004)    (2.233)
EDUC   288     0.0114      17.678      0.0235    –108.177      0.0396     329.243
              (0.0029)   (16.152)     (0.0088)   (47.449)     (0.0072)   (42.401)
TOB    368     0.0013     –17.895      0.0057     –48.865      0.0007     –13.939
              (0.0009)    (4.903)     (0.0016)    (8.011)     (0.0019)   (10.736)
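The pattern in the table, where OLS departs most from tobit when many observations are constrained, can be illustrated with a small simulation. The sketch below is an assumption-laden illustration and not the CES analysis: it generates a latent variable with a known slope, censors it at zero, and compares the OLS slope using all cases with the OLS slope using only the non-zero cases. It does not fit a tobit model. All parameter values and names are hypothetical.

```python
import random

random.seed(12345)


def ols_slope(xs, ys):
    """Slope coefficient from a simple OLS regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


TRUE_SLOPE = 1.0   # hypothetical slope of the latent relationship
n = 20000
x = [random.uniform(0.0, 10.0) for _ in range(n)]

# Latent variable Y* = -5 + 1.0 X + u, with u ~ N(0, 3^2).
y_star = [-5.0 + TRUE_SLOPE * xi + random.gauss(0.0, 3.0) for xi in x]
# Observed variable: censored at zero, as with expenditure on a commodity.
y = [max(0.0, v) for v in y_star]

slope_latent = ols_slope(x, y_star)          # infeasible benchmark (Y* is unobserved)
slope_all = ols_slope(x, y)                  # OLS, all cases
positive = [(xi, yi) for xi, yi in zip(x, y) if yi > 0.0]
slope_no_zero = ols_slope([p[0] for p in positive],
                          [p[1] for p in positive])   # OLS, no 0 cases
share_censored = 1.0 - len(positive) / n

print(f"share of constrained observations: {share_censored:.2f}")
print(f"OLS on latent Y*: {slope_latent:.3f}")
print(f"OLS, all cases:   {slope_all:.3f}")
print(f"OLS, no 0 cases:  {slope_no_zero:.3f}")
```

With roughly half the observations constrained in this design, both feasible OLS slopes fall well short of the latent slope. Raising the intercept so that few observations are constrained should shrink the gap, which mirrors the comparison across categories in the table.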


10.12
Show that the tobit model may be regarded as a special case of a selection
bias model.
Answer:
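The selection bias model may be written

\[
Y_i = Y_i^* \ \text{for } B_i^* > 0, \qquad Y_i \ \text{is not observed for } B_i^* \le 0,
\]

where the latent variable B* depends on the Q variables, which determine
selection, and the latent variable Y* depends on the X variables. The tobit
model is the special case where the Q variables are identical to the X
variables and B* is the same as Y*.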

10.14
An event is hypothesised to occur with probability p. In a sample of
n observations, it occurred m times. Demonstrate that the maximum
likelihood estimator of p is m/n.
Answer:
In each observation where the event did occur, the probability was p. In
each observation where it did not occur, the probability was (1 – p). Since
there were m of the former and n – m of the latter, the joint probability
was $p^m (1 - p)^{n - m}$. Reinterpreting this as a function of p, given m and n, the
log-likelihood function for p is

\[
\log L(p) = m \log p + (n - m) \log(1 - p).
\]

Differentiating with respect to p, we obtain the first-order condition for a
maximum:

\[
\frac{d \log L(p)}{dp} = \frac{m}{p} - \frac{n - m}{1 - p} = 0.
\]

This yields p = m/n. We should check that the second differential is
negative and that we have therefore found a maximum. The second
differential is

\[
\frac{d^2 \log L(p)}{dp^2} = -\frac{m}{p^2} - \frac{n - m}{(1 - p)^2}.
\]

Evaluated at p = m/n,

\[
\frac{d^2 \log L(p)}{dp^2} = -\frac{n^2}{m} - \frac{n - m}{\left(1 - \frac{m}{n}\right)^2}
= -n^2 \left( \frac{1}{m} + \frac{1}{n - m} \right).
\]

This is negative, so we have indeed chosen the value of p that maximises
the probability of the outcome.
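The result can be checked numerically. The following minimal Python sketch evaluates the log-likelihood on a grid of values of p and confirms that the largest value occurs at m/n, with a negative second derivative there. The values m = 40 and n = 100 are borrowed from Exercise A10.8 purely for illustration, and a grid search is used only for transparency.

```python
import math


def log_likelihood(p, m, n):
    """log L(p) = m log p + (n - m) log(1 - p)."""
    return m * math.log(p) + (n - m) * math.log(1.0 - p)


m, n = 40, 100   # illustrative values

# Evaluate log L on a grid 0.001, 0.002, ..., 0.999 and pick the best point.
grid = [i / 1000 for i in range(1, 1000)]
p_best = max(grid, key=lambda p: log_likelihood(p, m, n))

# Second derivative at the maximum: -m/p^2 - (n - m)/(1 - p)^2.
second_derivative = -m / p_best ** 2 - (n - m) / (1.0 - p_best) ** 2

print(f"grid maximum at p = {p_best}, m/n = {m / n}")
print(f"second derivative there = {second_derivative:.2f}")
```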


10.18
Returning to the example of the random variable X with unknown mean
μ and variance σ2, the log-likelihood for a sample of n observations was
given by equation (10.34):

\[
\log L = -\frac{n}{2} \log 2\pi - \frac{n}{2} \log \sigma^2
+ \frac{1}{\sigma^2} \left( -\frac{1}{2}(X_1 - \mu)^2 - \dots - \frac{1}{2}(X_n - \mu)^2 \right).
\]

The first-order condition for μ produced the ML estimator of μ and the first
order condition for σ then yielded the ML estimator for σ. Often, the variance
is treated as the primary dispersion parameter, rather than the standard
deviation. Show that such a treatment yields the same results in the present
case. Treat σ 2 as a parameter, differentiate log L with respect to it, and solve.
Answer:

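\[
\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2}
- \frac{1}{\sigma^4} \left( -\frac{1}{2}(X_1 - \mu)^2 - \dots - \frac{1}{2}(X_n - \mu)^2 \right) = 0.
\]

Hence

\[
\sigma^2 = \frac{1}{n} \left( (X_1 - \mu)^2 + \dots + (X_n - \mu)^2 \right)
\]

as before. The ML estimator of μ is $\bar{X}$ as before.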
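As a numerical check on the derivation, the sketch below evaluates the derivative of log L with respect to σ² at the proposed estimator and confirms that it is zero and that nearby values of σ² give a lower log-likelihood. The sample is made up for illustration and the function names are hypothetical.

```python
import math


def log_likelihood(mu, sigma2, xs):
    """log L for n independent N(mu, sigma2) observations."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - 0.5 * ss / sigma2)


def score_sigma2(mu, sigma2, xs):
    """Derivative of log L with respect to sigma2."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n / (2.0 * sigma2) + ss / (2.0 * sigma2 ** 2)


xs = [2.1, 3.4, 1.9, 4.2, 3.0, 2.7]   # made-up sample
n = len(xs)
mu_hat = sum(xs) / n                                   # sample mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n    # divisor n, not n - 1

print(f"mu_hat = {mu_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}")
print(f"score at the estimates = {score_sigma2(mu_hat, sigma2_hat, xs):.2e}")
```

Note that the divisor is n rather than n – 1, so the ML estimator of the variance is not the usual unbiased estimator.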

10.19
In Exercise 10.4, log L0 is –1485.62. Compute the pseudo-R2 and confirm
that it is equal to that reported in the output.
Answer:
As defined in equation (10.43),
\[
\text{pseudo-}R^2 = 1 - \frac{\log L}{\log L_0} = 1 - \frac{-1403.0835}{-1485.6248} = 0.0556,
\]
as appears in the output.

10.20
In Exercise 10.4, compute the likelihood ratio statistic 2(log L – log L0),
confirm that it is equal to that reported in the output, and perform the
likelihood ratio test.
Answer:
The likelihood ratio statistic is 2(–1403.0835 + 1485.6248) = 165.08,
as printed in the output. Under the null hypothesis that the coefficients
of the explanatory variables are all jointly equal to 0, this is distributed
as a chi-squared statistic with degrees of freedom equal to the number of
explanatory variables, in this case 7. The critical value of chi-squared at
the 0.1 per cent significance level with 7 degrees of freedom is 24.32, and
so we reject the null hypothesis at that level.
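Both the pseudo-R2 of Exercise 10.19 and the likelihood ratio statistic of Exercise 10.20 follow from the same two log-likelihoods, so the arithmetic can be checked together. This is a minimal Python sketch; the critical value is the one quoted in the text rather than one computed by the script, and the variable names are illustrative.

```python
log_l = -1403.0835    # log-likelihood of the fitted model
log_l0 = -1485.6248   # log-likelihood with the intercept only

pseudo_r2 = 1.0 - log_l / log_l0
lr_stat = 2.0 * (log_l - log_l0)

# Critical value of chi-squared(7) at the 0.1 per cent level, as quoted in the text.
critical_value = 24.32
reject_null = lr_stat > critical_value

print(f"pseudo-R2    = {pseudo_r2:.4f}")
print(f"LR statistic = {lr_stat:.2f}")
print(f"reject H0 at the 0.1 per cent level: {reject_null}")
```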

Answers to the additional exercises


A10.1
In the case of FDHO and HOUS there were too few non-purchasing
households to undertake the analysis sensibly (one and two, respectively).
The results for the logit analysis and the probit analysis were very similar.
The linear probability model also yielded similar results for most of
the commodities, the coefficients being similar to the logit and probit
marginal effects and the t statistics being of the same order of magnitude
as the z statistics for the logit and probit. However, for those categories
of expenditure where most households made purchases, and the sample
was therefore greatly imbalanced, the linear probability model gave very
different results, as might be expected.
The total expenditure of the household and the size of the household were
both highly significant factors in the decision to make a purchase for all
the categories of expenditure except TELE, LOCT and TOB. In the case of
TELE, only 11 households did not make a purchase, the reasons apparently
being non-economic. LOCT is on the verge of being an inferior good and
for that reason is not sensitive to total expenditure. TOB is well-known not
to be sensitive to total expenditure.
Age was a positive influence in the case of TRIP, HEAL, and READ and a
negative one for FDAW, FURN, FOOT, TOYS, EDUC, and TOB.
A college education was a positive influence for TRIP, HEAL, READ and
EDUC and a negative one for TOB.
Most of these effects seem plausible with simple explanations.

Linear probability model, dependent variable CATBUY


                                                          cases with probability
       EXPPC × 10⁻⁴     SIZE         REFAGE       COLLEGE
  n    b2     t         b3     t     b4     t     b5     t      <0   >1
FDHO 868 –0.0002 –0.16 0.0005 0.52 –0.0001 –0.90 0.0017 0.68 0 288
FDAW 827 0.0518 5.63 0.0181 3.37 –0.0018 –3.98 0.0101 0.68 0 173
HOUS 867 0.0025 1.19 0.0000 0.02 –0.0000 –0.43 0.0029 0.83 0 181
TELE 858 0.0092 1.85 0.0060 2.06 0.0004 1.66 0.0123 1.53 0 136
DOM 454 0.0926 4.22 0.0433 3.39 0.0019 1.84 0.0850 2.40 0 0
TEXT 482 0.1179 5.51 0.0690 5.52 –0.0019 –1.80 0.0227 0.66 0 5
FURN 329 0.1202 5.75 0.0419 3.43 –0.0036 –3.61 –0.0050 –0.15 0 0
MAPP 244 0.0930 4.71 0.0540 4.69 0.0012 1.25 0.0049 0.15 0 0
SAPP 467 0.1206 5.59 0.0655 5.20 –0.0012 –1.18 0.0174 0.50 0 4
CLOT 847 0.0316 4.60 0.0121 3.02 –0.0008 –2.30 0.0028 0.25 0 176
FOOT 686 0.0838 4.75 0.0444 4.31 –0.0028 –3.29 –0.0283 –0.99 0 12
GASO 797 0.0658 5.56 0.0374 5.42 –0.0013 –2.25 0.0222 1.16 0 119
TRIP 309 0.2073 10.65 0.0599 5.27 0.0027 2.89 0.1608 5.11 0 5
LOCT 172 –0.0411 –2.32 –0.0040 –0.39 –0.0011 –1.29 0.0109 0.38 1 0
HEAL 821 0.0375 3.79 0.0162 2.81 0.0030 6.39 0.0466 2.91 0 137
ENT 824 0.0495 5.26 0.0255 4.64 –0.0017 –3.75 0.0350 2.30 0 207
FEES 676 0.1348 8.20 0.0615 6.41 –0.0029 –3.61 0.1901 7.15 0 121
TOYS 592 0.0908 4.78 0.0854 7.70 –0.0055 –5.96 0.0549 1.79 0 32
READ 764 0.0922 6.64 0.0347 4.28 0.0018 2.67 0.1006 4.48 0 105
EDUC 288 0.0523 2.82 0.1137 10.51 –0.0041 –4.61 0.1310 4.37 57 2
TOB 368 –0.0036 –0.17 0.0153 1.21 –0.0033 –3.12 –0.1721 –4.92 0 0


Logit model, dependent variable CATBUY


EXPPC × 10⁻⁴ SIZE REFAGE COLLEGE
n b2 z b3 z b4 z b5 z
FDHO – – – – – – – – –
FDAW 827 3.6456 5.63 0.6211 3.43 –0.0338 –2.90 0.0141 0.03
HOUS – – – – – – – – –
TELE 858 1.2314 1.83 0.6355 2.10 0.0330 1.83 1.1932 1.46
DOM 454 0.3983 4.09 0.1817 3.37 0.0081 1.85 0.3447 2.34
TEXT 482 0.5406 5.22 0.3071 5.32 –0.0075 –1.68 0.0823 0.55
FURN 329 0.5428 5.44 0.1904 3.46 –0.0173 –3.68 –0.0227 –0.15
MAPP 244 0.4491 4.54 0.2648 4.57 0.0059 1.17 0.0300 0.18
SAPP 467 0.5439 5.30 0.2855 5.05 –0.0049 –1.11 0.0597 0.40
CLOT 847 4.7446 4.68 0.8642 3.16 –0.0213 –1.38 –0.1084 –0.19
FOOT 686 0.6281 4.49 0.3162 4.18 –0.0152 –2.86 –0.2277 –1.22
GASO 797 1.5214 5.18 0.7604 5.20 –0.0084 –1.07 0.2414 0.79
TRIP 309 1.0768 9.02 0.3137 5.22 0.0143 2.80 0.7728 4.74
LOCT 172 –0.2953 –2.31 –0.0294 –0.46 –0.0069 –1.28 0.0788 0.43
HEAL 821 1.1577 3.49 0.3510 2.83 0.0620 5.65 0.9372 2.64
ENT 824 2.6092 4.96 0.9863 4.45 –0.0209 –1.89 1.0246 2.01
FEES 676 1.5529 7.55 0.5275 6.10 –0.0140 –2.43 1.4393 6.24
TOYS 592 0.5087 4.38 0.5351 7.02 –0.0240 –4.85 0.2645 1.54
READ 764 1.8601 6.59 0.4632 4.78 0.0202 2.99 1.1033 3.97
EDUC 288 0.3311 3.21 0.6053 9.17 –0.0283 –5.05 0.7442 4.34
TOB 368 –0.0163 –0.18 0.0637 1.19 –0.0139 –3.09 –0.7260 –4.82


Probit model, dependent variable CATBUY


EXPPC × 10⁻⁴ SIZE REFAGE COLLEGE
n b2 z b3 z b4 z b5 z
FDHO – – – – – – – – –
FDAW 827 1.6988 5.72 0.2951 3.55 –0.0172 –3.03 0.0182 0.09
HOUS – – – – – – – – –
TELE 858 0.5129 1.93 0.2630 2.14 0.0135 1.79 0.5130 1.59
DOM 454 0.2467 4.16 0.1135 3.42 0.0051 1.86 0.2160 2.36
TEXT 482 0.3257 5.32 0.1841 5.44 –0.0046 –1.69 0.0606 0.65
FURN 329 0.3348 5.54 0.1168 3.46 –0.0103 –3.64 –0.0145 –0.15
MAPP 244 0.2770 4.62 0.1628 4.65 0.0035 1.19 0.0174 0.18
SAPP 467 0.3252 5.43 0.1733 5.12 –0.0031 –1.11 0.0423 0.46
CLOT 847 2.0167 4.63 0.4036 3.31 –0.0086 –1.21 0.0428 0.16
FOOT 686 0.3296 4.48 0.1635 4.17 –0.0088 –2.89 –0.1105 –1.04
GASO 797 0.6842 5.25 0.2998 5.08 –0.0065 –1.62 0.1452 0.96
TRIP 309 0.6121 9.63 0.1791 5.05 0.0082 2.73 0.4806 4.98
LOCT 172 –0.1556 –2.31 –0.0141 –0.39 –0.0039 –1.28 0.0448 0.43
HEAL 821 0.4869 3.65 0.1506 2.69 0.0301 5.67 0.4195 2.54
ENT 824 1.3386 5.10 0.4519 4.53 –0.0116 –2.10 0.4932 2.09
FEES 676 0.8299 7.82 0.2806 6.36 –0.0088 –2.66 0.8151 6.59
TOYS 592 0.2849 4.48 0.3091 7.35 –0.0149 –5.08 0.1694 1.67
READ 764 0.7905 6.67 0.2188 4.58 0.0107 2.92 0.5887 4.20
EDUC 288 0.1917 3.11 0.3535 9.51 –0.0168 –5.12 0.4417 4.37
TOB 368 –0.0106 –0.18 0.0391 1.18 –0.0086 –3.10 –0.4477 –4.84


Comparison of marginal effects


n EXPPC4 SIZE REFAGE COLLEGE
FDAW 827 LPM 0.0518** 0.0181** –0.0018** 0.0101
Logit 0.0240** 0.0041** –0.0002** 0.0001
Probit 0.0260** 0.0045** –0.0003** 0.0003
TELE 858 LPM 0.0092 0.0060* 0.0004 0.0123
Logit 0.0078 0.0040* 0.0002 0.0076
Probit 0.0087 0.0044* 0.0002 0.0087
DOM 454 LPM 0.0926** 0.0433** 0.0019 0.0850*
Logit 0.0993** 0.0453** 0.0020 0.0860*
Probit 0.0982** 0.0452** 0.0020 0.0860*
TEXT 482 LPM 0.1179** 0.0690** –0.0019 0.0227
Logit 0.1332** 0.0757** –0.0018 0.0203
Probit 0.1286** 0.0727** –0.0018 0.0239
FURN 329 LPM 0.1202** 0.0419** –0.0036** –0.0050
Logit 0.1265** 0.0444** –0.0040** –0.0053
Probit 0.1266** 0.0441** –0.0039** –0.0055
MAPP 244 LPM 0.0930** 0.0540** 0.0012 0.0049
Logit 0.0893** 0.0526** 0.0012 0.0060
Probit 0.0923** 0.0543** 0.0012 0.0058
SAPP 467 LPM 0.1206** 0.0655** –0.0012 0.0174
Logit 0.1350** 0.0709** –0.0012 0.0148
Probit 0.1291** 0.0688** –0.0012 0.0168
CLOT 847 LPM 0.0316** 0.0121** –0.0008* 0.0028
Logit 0.0071** 0.0013** 0.0000 –0.0002
Probit 0.0063** 0.0013** 0.0000 0.0001
FOOT 686 LPM 0.0838** 0.0444** –0.0028** –0.0283
Logit 0.0969** 0.0488** –0.0023** –0.0351
Probit 0.0913** 0.0453** –0.0024** –0.0306
* significant at 5 per cent level, ** at 1 per cent level, two-tailed tests


Comparison of marginal effects (continued)


n EXPPC4 SIZE REFAGE COLLEGE
GASO 797 LPM 0.0658** 0.0374** –0.0013* 0.0222
Logit 0.0622** 0.0311** –0.0003 0.0099
Probit 0.0707** 0.0310** –0.0007 0.0150
TRIP 309 LPM 0.2073** 0.0599** 0.0027** 0.1608**
Logit 0.2408** 0.0702** 0.0032** 0.1728**
Probit 0.2243** 0.0656** 0.0030** 0.1761**
LOCT 172 LPM –0.0411* –0.0040 –0.0011 0.0109
Logit –0.0463* –0.0046 –0.0011 0.0124
Probit –0.0430* –0.0039 –0.0011 0.0124
HEAL 821 LPM 0.0375** 0.0162** 0.0030** 0.0466**
Logit 0.0318** 0.0096** 0.0017** 0.0257**
Probit 0.0339** 0.0105** 0.0021** 0.0292*
ENT 824 LPM 0.0495** 0.0255** –0.0017** 0.0350*
Logit 0.0229** 0.0086** –0.0002 0.0090*
Probit 0.0251** 0.0085** –0.0002* 0.0092*
FEES 676 LPM 0.1348** 0.0615** –0.0029** 0.1901**
Logit 0.1765** 0.0600** –0.0016* 0.1636**
Probit 0.1878** 0.0635** –0.0020** 0.1845**
TOYS 592 LPM 0.0908** 0.0854** –0.0055** 0.0549
Logit 0.1029** 0.1083** –0.0049** 0.0535
Probit 0.0974** 0.1057** –0.0051** 0.0579
READ 764 LPM 0.0922** 0.0347** 0.0018** 0.1006**
Logit 0.1084** 0.0270** 0.0012** 0.0643**
Probit 0.1124** 0.0311** 0.0015** 0.0837**
EDUC 288 LPM 0.0523** 0.1137** –0.0041** 0.1310**
Logit 0.0673** 0.1230** –0.0058** 0.1512**
Probit 0.0654** 0.1206** –0.0057** 0.1508**
TOB 368 LPM –0.0036 0.0153 –0.0033** –0.1721**
Logit –0.0040 0.0155 –0.0034** –0.1769**
Probit –0.0042 0.0153 –0.0034** –0.1751**

* significant at 5 per cent level, ** at 1 per cent level, two-tailed tests

A10.2
The finding that the marginal effect of educational attainment was lower
for males than for females over most of the range S ≥ 9 is plausible
because the probability of working is much closer to 1 for males than for
females for S ≥ 9, and hence the possible sensitivity of the participation
rate to S is smaller.
The explanation of the finding that the marginal effect of educational
attainment decreases with educational attainment for both males and
females over the range S ≥ 9 is similar. For both sexes, the greater is S, the
greater is the participation rate, and hence the smaller is the scope for it
being increased by further education.


The OLS estimates of the marginal effect of educational attainment


are given by the slope coefficients and they are very similar to the logit
estimates at the mean, the reason being that most of the observations on S
are confined to the middle part of the sigmoid curve where it is relatively
linear.

A10.3
• Discuss whether the relationships indicated by the probability and
marginal effect curves appear to be plausible.
The probability curve indicates an inverse relationship between
schooling and the probability of being obese. This seems entirely
plausible. The more educated tend to have healthier lifestyles,
including eating habits. Over the relevant range, the marginal effect
falls a little in absolute terms (is less negative) as schooling increases.
This is in keeping with the idea that further schooling may have less
effect on the highly educated than on the less educated (but the
difference is not large).
• Add the probability function and the marginal effect function for the LPM
to the diagram. Explain why you drew them the way you did.

 

 
[Figure: probability of being obese and marginal effect, plotted against years of schooling]
Figure 10.6
The estimated probability function for the LPM is just the regression
equation and the marginal effect is the coefficient of S. They are shown
as the dashed lines in the diagram.
• The logit model is considered to have several advantages over the LPM.
Explain what these advantages are. Evaluate the importance of the
advantages of the logit model in this particular case.
The disadvantages of the LPM are (1) that it can give nonsense fitted
values (predicted probabilities greater than 1 or less than 0); (2) the
disturbance term in observation i must be equal to either 1 – F(Zi)
(if the dependent variable is equal to 1) or – F(Zi) (if the dependent
variable is equal to 0) and so it violates the usual assumption that
the disturbance term is normally distributed, although this may not
matter asymptotically; (3) the disturbance term will be heteroscedastic
because Zi is different for different observations; (4) the LPM implicitly
assumes that the marginal effect of each explanatory variable is
constant over its entire range, which is often intuitively unappealing.


In this case, nonsense predictions are clearly not an issue. The


assumption of a constant marginal effect does not seem to be a
problem either, given the approximate linearity of the logit F(Z).
• The LPM is fitted using OLS. Explain how, instead, it might be fitted using
maximum likelihood estimation:
• Write down the probability of being obese for any obese individual, given
Si for that individual, and write down the probability of not being obese
for any non-obese individual, again given Si for that individual.
Obese: $p_i^{O} = \beta_1 + \beta_2 S_i$; not obese: $p_i^{NO} = 1 - \beta_1 - \beta_2 S_i$
• Write down the likelihood function for this sample of 164 obese
individuals and 376 non-obese individuals.

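\[
L(\beta_1, \beta_2 \mid \text{data})
= \prod_{\text{OBESE}} p_i^{O} \prod_{\text{NOT OBESE}} p_i^{NO}
= \prod_{\text{OBESE}} (\beta_1 + \beta_2 S_i) \prod_{\text{NOT OBESE}} (1 - \beta_1 - \beta_2 S_i)
\]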

• Explain how one would use this function to estimate the parameters.
[Note: You are not expected to attempt to derive the estimators of the
parameters.]
You would use some algorithm to find the values of β1 and β2 that
maximise the function.
• Explain whether your maximum likelihood estimators will be the same or
different from those obtained using least squares.
Least squares involves finding the extremum of a completely different
expression and will therefore lead to different estimators.
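The idea of fitting the LPM by maximum likelihood can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data are made up (they are not the 164 obese and 376 non-obese individuals of the exercise), a crude grid search stands in for "some algorithm", and all names are hypothetical. The OLS estimates are computed alongside so that the two sets of estimates can be compared.

```python
import math

# Made-up data: years of schooling and an obesity indicator (1 = obese).
S = [8, 9, 10, 11, 12, 12, 12, 13, 14, 15, 16, 16, 17, 18, 12, 10]
OBESE = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]


def log_likelihood(b1, b2):
    """Log of prod_OBESE (b1 + b2*S) * prod_NOT_OBESE (1 - b1 - b2*S)."""
    total = 0.0
    for s, y in zip(S, OBESE):
        p = b1 + b2 * s
        if p <= 0.0 or p >= 1.0:
            return -math.inf   # these parameter values imply an invalid probability
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Crude grid search over the intercept (0 to 2) and the slope (-0.15 to 0).
candidates = [(i / 100, j / 1000) for i in range(0, 201) for j in range(-150, 1)]
b1_ml, b2_ml = max(candidates, key=lambda b: log_likelihood(b[0], b[1]))

# OLS estimates of the same linear probability model, for comparison.
n = len(S)
s_bar = sum(S) / n
y_bar = sum(OBESE) / n
b2_ols = (sum((s - s_bar) * (y - y_bar) for s, y in zip(S, OBESE))
          / sum((s - s_bar) ** 2 for s in S))
b1_ols = y_bar - b2_ols * s_bar

print(f"ML  (grid search): b1 = {b1_ml:.3f}, b2 = {b2_ml:.4f}")
print(f"OLS:               b1 = {b1_ols:.3f}, b2 = {b2_ols:.4f}")
```

One design point worth noting is that the likelihood is defined only when every fitted probability lies strictly between 0 and 1, so the search has to discard parameter values that violate this. OLS imposes no such restriction.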

A10.4
• Explain how one may derive the marginal effects of the explanatory
variables on the probability of having a child less than 6 in the household,
and calculate for both males and females the marginal effects at the
means of AGE and S.
Since p is a function of Z, and Z is a linear function of the X variables,
the marginal effect of Xj is

\[
\frac{\partial p}{\partial X_j} = \frac{dp}{dZ} \frac{\partial Z}{\partial X_j} = \frac{dp}{dZ} \beta_j
\]

where βj is the coefficient of Xj in the expression for Z. In the case
of probit analysis, p = F(Z) is the cumulative standardised normal
distribution. Hence dp/dZ is just the standardised normal density function, f(Z).
For males, this is 0.368 when evaluated at the means. Hence the
marginal effect of AGE is 0.368 × (–0.137) = –0.050 and that of S is
0.368 × 0.132 = 0.049. For females the corresponding figures are
0.272 × (–0.154) = –0.042 and 0.272 × 0.094 = 0.026, respectively. So for
every extra year of age, the probability is reduced by 5.0 percentage points for
males and 4.2 percentage points for females. For every extra year of schooling,
the probability increases by 4.9 percentage points for males and 2.6 percentage
points for females.
• Explain whether the signs of the marginal effects are plausible. Explain
whether you would expect the marginal effect of schooling to be higher for
males or for females.


Yes. Given that the cohort is aged 35–42, the respondents have passed
the age at which most adults start families, and the older they are, the
less likely they are to have small children in the household. At the same
time, the more educated the respondent, the more likely he or she is
to have started having a family relatively late, so the positive effect
of schooling is also plausible. However, given the age of the cohort,
it is likely to be weaker for females than for males, given that most
females intending to have families will have started them by this time,
irrespective of their education.
• At a seminar someone asks the researcher whether the marginal effect of
S is significantly different for males and females. The researcher does not
know how to test whether the difference is significant and asks you for
advice. What would you say?
Fit a probit regression for the combined sample, adding a male
intercept dummy and male slope dummies for AGE and S. Test the
coefficient of the slope dummy for S.

A10.5
The Z function will be of the form
Z = β1 + β2A + β3S + β4AS
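so the marginal effects are

\[
\frac{\partial p}{\partial A} = \frac{dp}{dZ} \frac{\partial Z}{\partial A} = f(Z)(\beta_2 + \beta_4 S)
\]

and

\[
\frac{\partial p}{\partial S} = \frac{dp}{dZ} \frac{\partial Z}{\partial S} = f(Z)(\beta_3 + \beta_4 A).
\]

Both factors depend on the values of A
and/or S, but the marginal effects could be evaluated for a representative
individual using the mean values of A and S in the sample.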
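The formulas with the interactive term can be verified by comparing them with numerical derivatives. The sketch below assumes a probit specification, so that F is the cumulative standard normal distribution and f its density. The coefficients and the evaluation point are hypothetical values chosen for illustration and are not taken from the exercise.

```python
import math


def normal_cdf(z):
    """Cumulative standard normal distribution, F(Z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density, f(Z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical coefficients of Z = b1 + b2*A + b3*S + b4*A*S.
b1, b2, b3, b4 = -1.0, 0.05, 0.10, -0.002
A, S = 30.0, 12.0   # hypothetical "representative individual"


def z_value(a, s):
    return b1 + b2 * a + b3 * s + b4 * a * s


def prob(a, s):
    return normal_cdf(z_value(a, s))


# Analytical marginal effects from the formulas above.
f_z = normal_pdf(z_value(A, S))
me_A = f_z * (b2 + b4 * S)
me_S = f_z * (b3 + b4 * A)

# Central finite differences as an independent check.
h = 1e-5
num_A = (prob(A + h, S) - prob(A - h, S)) / (2.0 * h)
num_S = (prob(A, S + h) - prob(A, S - h)) / (2.0 * h)

print(f"marginal effect of A: analytical {me_A:.6f}, numerical {num_A:.6f}")
print(f"marginal effect of S: analytical {me_S:.6f}, numerical {num_S:.6f}")
```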

A10.6
• Discuss the conclusions one may reach, given the probit output and the
table, commenting on their plausibility.
Being male has a small but highly significant negative effect. This
is plausible because males tend to marry later than females and the
cohort is still relatively young.
Age has a highly significant positive effect, again plausible because
older people are more likely to have married than younger people.
Schooling has no apparent effect at all. It is not obvious whether this is
plausible.
Cognitive ability has a highly significant positive effect. Again, it is not
obvious whether this is plausible.
• The researcher considers including CHILD, a dummy variable defined to
be 1 if the respondent had children, and 0 otherwise, as an explanatory
variable. When she does this, its z-statistic is 33.65 and its marginal effect
0.5685. Discuss these findings.
Obviously one would expect a high positive correlation between being
married and having children and this would account for the huge and
highly significant coefficient. However getting married and having
children are often a joint decision, and accordingly it is simplistic
to suppose that one characteristic is a determinant of the other. The
finding should not be taken at face value.


A10.7
Determine the maximum likelihood estimate of α, assuming that β is known.
The loglikelihood function is

\[
\log L(\alpha \mid \beta, T_1, \dots, T_n) = n \log \alpha - \alpha \sum (T_i - \beta).
\]

Setting the first derivative with respect to α equal to zero, we have

\[
\frac{n}{\hat{\alpha}} - \sum (T_i - \beta) = 0
\]

and hence

\[
\hat{\alpha} = \frac{1}{\bar{T} - \beta}.
\]

The second derivative is $-n / \hat{\alpha}^2$, which is negative, confirming we have
maximised the loglikelihood function.
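A quick numerical check of the estimator is sketched below. The value of β and the observations are made up for illustration. The script confirms that the first derivative of the loglikelihood is zero at the estimate and that nearby values of α give a lower loglikelihood.

```python
import math

beta = 2.0   # treated as known (hypothetical value)
T = [2.4, 3.1, 2.2, 5.0, 2.9, 3.6, 2.05, 4.3]   # made-up observations, all above beta

n = len(T)
t_bar = sum(T) / n
alpha_hat = 1.0 / (t_bar - beta)   # ML estimator derived above


def log_likelihood(alpha):
    """log L = n log(alpha) - alpha * sum(T_i - beta)."""
    return n * math.log(alpha) - alpha * sum(t - beta for t in T)


# First derivative of log L evaluated at alpha_hat: n/alpha - sum(T_i - beta).
score = n / alpha_hat - sum(t - beta for t in T)

print(f"alpha_hat = {alpha_hat:.4f}")
print(f"first derivative at alpha_hat = {score:.2e}")
```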

A10.8
From the solution to Exercise 10.14, the log-likelihood function for p is
log L( p ) = m log p + (n − m ) log(1 − p ) .

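Thus the LR statistic is

\[
LR = 2 \left[ \log L\!\left(\frac{m}{n}\right) - \log L(p_0) \right]
= 2 \left[ m \log \frac{m/n}{p_0} + (n - m) \log \frac{1 - m/n}{1 - p_0} \right].
\]

If m = 40 and n = 100, the LR statistic for H0: p = 0.5 is

\[
LR = 2 \left[ 40 \log \frac{0.4}{0.5} + 60 \log \frac{0.6}{0.5} \right] = 4.03.
\]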

We would reject the null hypothesis at the 5 per cent level (critical value
of chi-squared with one degree of freedom 3.84) but not at the 1 per cent
level (critical value 6.64).
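The LR statistic can be computed directly from the log-likelihood function. This is a minimal Python sketch using the values of m, n and p under the null hypothesis given in the exercise, with the critical values quoted in the text. The function and variable names are illustrative.

```python
import math


def log_likelihood(p, m, n):
    """log L(p) = m log p + (n - m) log(1 - p)."""
    return m * math.log(p) + (n - m) * math.log(1.0 - p)


m, n, p0 = 40, 100, 0.5
p_hat = m / n   # unrestricted ML estimate

# LR = 2 * (unrestricted log-likelihood - restricted log-likelihood).
lr_stat = 2.0 * (log_likelihood(p_hat, m, n) - log_likelihood(p0, m, n))

print(f"LR statistic = {lr_stat:.2f}")
print(f"reject at 5 per cent (3.84): {lr_stat > 3.84}")
print(f"reject at 1 per cent (6.64): {lr_stat > 6.64}")
```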

A10.9
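The first derivative of the log-likelihood function is

\[
\frac{d \log L(p)}{dp} = \frac{m}{p} - \frac{n - m}{1 - p}
\]

and the second differential is

\[
\frac{d^2 \log L(p)}{dp^2} = -\frac{m}{p^2} - \frac{n - m}{(1 - p)^2}.
\]

Evaluated at p = m/n,

\[
\frac{d^2 \log L(p)}{dp^2} = -\frac{n^2}{m} - \frac{n - m}{\left(1 - \frac{m}{n}\right)^2}
= -n^2 \left( \frac{1}{m} + \frac{1}{n - m} \right) = -\frac{n^3}{m(n - m)}.
\]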


The variance of the ML estimate is given by

\[
\left( -\frac{d^2 \log L(p)}{dp^2} \right)^{-1} = \left( \frac{n^3}{m(n - m)} \right)^{-1} = \frac{m(n - m)}{n^3}.
\]

The Wald statistic is therefore

\[
\frac{\left( \frac{m}{n} - p_0 \right)^2}{\frac{m(n - m)}{n^3}}
= \frac{\left( \frac{m}{n} - p_0 \right)^2}{\frac{1}{n} \, \frac{m}{n} \, \frac{n - m}{n}}.
\]

Given the data, this is equal to

\[
\frac{(0.4 - 0.5)^2}{\frac{1}{100} \times 0.4 \times 0.6} = 4.17.
\]

Under the null hypothesis this has a chi-squared distribution with one
degree of freedom, and so the conclusion is the same as in Exercise A10.8.
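The Wald statistic is equally easy to compute. The sketch below uses m = 40, n = 100 and the null value p = 0.5 from Exercise A10.8, with the variance formula derived above. The names are illustrative.

```python
m, n, p0 = 40, 100, 0.5

p_hat = m / n                     # ML estimate of p
variance = m * (n - m) / n ** 3   # estimated variance of the ML estimate

# Wald statistic: squared discrepancy divided by the estimated variance.
wald_stat = (p_hat - p0) ** 2 / variance

print(f"Wald statistic = {wald_stat:.2f}")
print(f"reject at 5 per cent (3.84): {wald_stat > 3.84}")
print(f"reject at 1 per cent (6.64): {wald_stat > 6.64}")
```

The Wald statistic uses the variance evaluated at the unrestricted estimate, which is one reason why it need not coincide numerically with the LR statistic in a finite sample.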
