Chapter 10: Binary Choice and Limited Dependent Variable Models, and Maximum Likelihood Estimation
Overview
The first part of this chapter describes the linear probability model,
logit analysis, and probit analysis, three techniques for fitting regression
models where the dependent variable is a qualitative characteristic. Next
it discusses tobit analysis, a censored regression model fitted using a
combination of linear regression analysis and probit analysis. This leads
to sample selection models and Heckman analysis. The second part of the
chapter introduces maximum likelihood estimation, the method used to fit
all of these models except the linear probability model.
Learning outcomes
After working through the corresponding chapter in the textbook, studying
the corresponding slideshows, and doing the starred exercises in the
textbook and the additional exercises in this guide, you should be able to:
• describe the linear probability model and explain its defects
• describe logit analysis, giving the mathematical specification
• describe probit analysis, including the mathematical specification
• calculate marginal effects in logit and probit analysis
• explain why OLS yields biased estimates when applied to a sample
with censored observations, even when the censored observations are
deleted
• explain the problem of sample selection bias and describe how the
Heckman procedure may provide a solution to it (in general terms,
without mathematical detail)
• explain the principle underlying maximum likelihood estimation
• apply maximum likelihood estimation from first principles in simple
models.
Further material
Limiting distributions and the properties of maximum likelihood
estimators
Provided that weak regularity conditions involving the differentiability of
the likelihood function are satisfied, maximum likelihood (ML) estimators
have the following attractive properties in large samples:
(1) They are consistent.
(2) They are asymptotically normally distributed.
(3) They are asymptotically efficient.
The meaning of the first property is familiar. It implies that the probability
density function of the estimator collapses to a spike at the true value. This
being the case, what can the other assertions mean? If the distribution
20 Elements of econometrics
becomes degenerate as the sample size becomes very large, how can it be
described as having a normal distribution? And how can it be described as
being efficient, when its variance, and the variance of any other consistent
estimator, tend to zero?
To discuss the last two properties, we consider what is known as the
limiting distribution of an estimator. This is the distribution of the
estimator when the divergence between it and its population mean is
multiplied by √n. If we do this, the distribution of a typical estimator
remains nondegenerate as n becomes large, and this enables us to say
meaningful things about its shape and to make comparisons with the
distributions of other estimators (also multiplied by √n).
To put this mathematically, suppose that there is one parameter of interest,
θ, and that $\hat{\theta}$ is its ML estimator. Then (2) says that
$$\sqrt{n}\,(\hat{\theta} - \theta) \sim N(0, \sigma^2)$$
for some variance $\sigma^2$. (3) says that, given any other consistent estimator
$\tilde{\theta}$, $\sqrt{n}\,(\tilde{\theta} - \theta)$ cannot have a smaller variance.
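The effect of the √n scaling can be illustrated with a small simulation. The sketch below is an illustration only, under assumptions that are not in the text: the estimator is the sample mean of draws from N(0, 1) (the ML estimator of the mean in that model), and the function names, sample sizes, and number of replications are hypothetical choices.

```python
import random
import statistics


def sample_mean(n, rng):
    """Sample mean of n draws from N(0, 1); the true mean is 0."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


def variances(n, reps=1000, seed=0):
    """Return (variance of the estimator, variance of sqrt(n) * estimator)."""
    rng = random.Random(seed)
    means = [sample_mean(n, rng) for _ in range(reps)]
    raw = statistics.pvariance(means)
    scaled = statistics.pvariance([n ** 0.5 * m for m in means])
    return raw, scaled


if __name__ == "__main__":
    for n in (10, 100, 400):
        raw, scaled = variances(n)
        # raw shrinks roughly like 1/n; scaled stays close to sigma^2 = 1
        print(n, round(raw, 4), round(scaled, 3))
```

The unscaled variance collapses towards zero as n grows, which is the degenerate behaviour described above, while the variance of the scaled quantity stays close to 1, so its distribution can still be compared with that of other estimators.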
$$\log \frac{L(\hat{\theta})}{L(\theta_0)} = \log L(\hat{\theta}) - \log L(\theta_0)$$
should not be significantly different from zero. In that it involves a
comparison of the measures of goodness of fit for unrestricted and
restricted versions of the model, the LR test is similar to an F test.
Under the null hypothesis, it can be shown that in large samples the test
statistic
$$LR = 2\left(\log L(\hat{\theta}) - \log L(\theta_0)\right)$$
The log-likelihood is
$$\log L(\hat{\mu} \mid X_1, \dots, X_n) = n \log \frac{1}{\sqrt{2\pi}} - \frac{1}{2} \sum_{i=1}^{n} (X_i - \hat{\mu})^2$$
$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (X_i - \mu)^2 = 0$$
If one imposes the restriction μ = μ0, we have $RSS_R = \sum_{i=1}^{n} (X_i - \mu_0)^2$ and
the F statistic
$$F(1, n-1) = \frac{\sum_{i=1}^{n} (X_i - \mu_0)^2 - \sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2 \,/\, (n-1)}.$$
Wald tests
Wald tests are based on the same principle as t tests in that they evaluate
whether the discrepancy between the maximum likelihood estimate $\hat{\theta}$ and
the hypothetical value $\theta_0$ is significant, taking account of the variance in
the estimate. The test statistic for the null hypothesis $H_0: \theta - \theta_0 = 0$ is
$$\frac{(\hat{\theta} - \theta_0)^2}{\hat{\sigma}^2_{\hat{\theta}}}$$
Examples
We will use the same examples as for the LR test, first assuming that σ = 1
and then assuming that it has to be estimated along with μ. In the first case
the log-likelihood function is
$$\log L(\mu \mid X_1, \dots, X_n) = n \log \frac{1}{\sqrt{2\pi}} - \frac{1}{2} \sum_{i=1}^{n} (X_i - \mu)^2.$$
The first differential is $\sum_{i=1}^{n} (X_i - \mu)$ and the second is $-n$, so the estimate of
the variance is $\frac{1}{n}$. The Wald test statistic is therefore $n(\bar{X} - \mu_0)^2$.
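As a check on the first example, the following minimal sketch computes the Wald statistic for a known σ = 1 and compares it with the LR statistic for the same model. The data values and function names are hypothetical and are not taken from the guide. In this particular model the two statistics coincide, because $\sum (X_i - \mu_0)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2$.

```python
def wald_sigma_known(xs, mu0):
    """Wald statistic n * (xbar - mu0)^2 when sigma is known to be 1."""
    n = len(xs)
    xbar = sum(xs) / n
    variance = 1.0 / n  # minus the inverse of the second derivative, -n
    return (xbar - mu0) ** 2 / variance


def lr_sigma_known(xs, mu0):
    """LR statistic 2 * (log L(xbar) - log L(mu0)) for the same model."""
    xbar = sum(xs) / len(xs)
    ss_restricted = sum((x - mu0) ** 2 for x in xs)
    ss_unrestricted = sum((x - xbar) ** 2 for x in xs)
    # the n * log(1 / sqrt(2 * pi)) terms cancel in the difference
    return ss_restricted - ss_unrestricted


sample = [0.3, 1.1, -0.4, 0.9, 0.6]  # hypothetical data, not from the guide
print(wald_sigma_known(sample, 0.0))
print(lr_sigma_known(sample, 0.0))
```

Either value would be compared with the critical value of chi-squared with one degree of freedom.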
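In the second example, where σ was unknown, the concentrated log-
likelihood function is
$$\log L(\mu \mid X_1, \dots, X_n) = n \log \frac{1}{\sqrt{2\pi}} - n \log \left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 \right)^{\frac{1}{2}} - \frac{n}{2}$$
$$= n \log \frac{1}{\sqrt{2\pi}} - \frac{n}{2} \log \frac{1}{n} - \frac{n}{2} \log \sum_{i=1}^{n} (X_i - \mu)^2 - \frac{n}{2}.$$
The first derivative with respect to μ is
$$\frac{d \log L}{d\mu} = n \, \frac{\sum_{i=1}^{n} (X_i - \mu)}{\sum_{i=1}^{n} (X_i - \mu)^2}.$$
The second derivative is
$$\frac{d^2 \log L}{d\mu^2} = n \, \frac{-n \sum_{i=1}^{n} (X_i - \mu)^2 + 2 \left( \sum_{i=1}^{n} (X_i - \mu) \right)^2}{\left( \sum_{i=1}^{n} (X_i - \mu)^2 \right)^2}.$$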
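Evaluated at the ML estimator $\hat{\mu} = \bar{X}$, $\sum_{i=1}^{n} (X_i - \mu) = 0$ and hence
$$\frac{d^2 \log L}{d\mu^2} = -\frac{n^2}{\sum_{i=1}^{n} (X_i - \mu)^2}$$
giving an estimated variance $\frac{\hat{\sigma}^2}{n}$, given
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$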
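The second example can be checked numerically. The sketch below uses hypothetical data and hypothetical function names. It compares the analytic second derivative of the concentrated log-likelihood at $\hat{\mu} = \bar{X}$ with a finite-difference approximation, and then forms the Wald statistic using the estimated variance $\hat{\sigma}^2 / n$.

```python
import math


def concentrated_loglik(mu, xs):
    """Concentrated log-likelihood: sigma^2 replaced by (1/n) * sum((x - mu)^2)."""
    n = len(xs)
    s = sum((x - mu) ** 2 for x in xs)
    return n * math.log(1 / math.sqrt(2 * math.pi)) - 0.5 * n * math.log(s / n) - 0.5 * n


def wald_sigma_unknown(xs, mu0):
    """Wald statistic (xbar - mu0)^2 / (sigma_hat^2 / n)."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n
    return (xbar - mu0) ** 2 / (sigma2_hat / n)


def second_derivative(f, mu, xs, h=1e-4):
    """Central finite-difference approximation to d^2 f / d mu^2."""
    return (f(mu + h, xs) - 2 * f(mu, xs) + f(mu - h, xs)) / h ** 2


sample = [0.3, 1.1, -0.4, 0.9, 0.6]  # hypothetical data, not from the guide
n = len(sample)
xbar = sum(sample) / n
analytic = -n ** 2 / sum((x - xbar) ** 2 for x in sample)
numeric = second_derivative(concentrated_loglik, xbar, sample)
print(analytic, numeric)  # the two should agree closely
print(wald_sigma_unknown(sample, 0.0))
```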
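with u assumed to be iid N(0, σ²), the log-likelihood function for the
parameters is
$$\log L(\beta_1, \dots, \beta_k, \sigma \mid Y_i, X_i, i = 1, \dots, n) = n \log \frac{1}{\sigma \sqrt{2\pi}} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( Y_i - \beta_1 - \sum_{j=2}^{k} \beta_j X_{ij} \right)^2.$$
where
$$Z = \sum_{i=1}^{n} \left( Y_i - \beta_1 - \sum_{j=2}^{k} \beta_j X_{ij} \right)^2.$$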
The estimates of the β parameters affect only Z. To maximise the log-
likelihood, they should be chosen so as to minimise Z, and of course this
is exactly what one is doing when one is fitting a least squares regression.
Hence Z = RSS. It remains to determine the ML estimate of σ. Taking
the partial differential with respect to σ, we obtain one of the first-order
conditions for a maximum:
$$\frac{\partial \log L(b_1, \dots, b_k, \sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} RSS = 0.$$
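The first-order condition for σ can be verified on a small example. This is a minimal sketch with hypothetical data and a single explanatory variable; the helper names are illustrative assumptions. It fits the regression by least squares, sets $\hat{\sigma} = \sqrt{RSS/n}$, and confirms that the first-order condition is satisfied at that value.

```python
import math


def ols_simple(xs, ys):
    """OLS intercept and slope for a regression with one explanatory variable."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2


def loglik(b1, b2, sigma, xs, ys):
    """Log-likelihood with iid N(0, sigma^2) disturbances."""
    n = len(xs)
    z = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
    return n * math.log(1 / (sigma * math.sqrt(2 * math.pi))) - z / (2 * sigma ** 2)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
n = len(xs)
b1, b2 = ols_simple(xs, ys)
rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
sigma_ml = math.sqrt(rss / n)                # solves -n/sigma + RSS/sigma^3 = 0
score = -n / sigma_ml + rss / sigma_ml ** 3  # first-order condition, close to 0
print(sigma_ml, score)
```

Note that the ML estimate divides RSS by n, not by n − k, so it differs from the usual unbiased estimator of the variance of the disturbance term in finite samples.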
$$\log L_R = -\frac{n}{2} \left( \log RSS_R + 1 + \log 2\pi - \log n \right)$$
where $RSS_R \ge RSS_U$ and hence $\log L_R \le \log L_U$. The LR statistic for a test
of the restriction is therefore
$$2(\log L_U - \log L_R) = n(\log RSS_R - \log RSS_U) = n \log \frac{RSS_R}{RSS_U}.$$
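The equivalence of the two expressions for the LR statistic can be confirmed numerically. The values of n, RSS_R, and RSS_U below are hypothetical, chosen only for illustration, and the function names are assumptions.

```python
import math


def loglik_at_ml(n, rss):
    """Maximised log-likelihood: -(n/2) * (log RSS + 1 + log 2*pi - log n)."""
    return -0.5 * n * (math.log(rss) + 1 + math.log(2 * math.pi) - math.log(n))


def lr_statistic(n, rss_r, rss_u):
    """LR statistic n * log(RSS_R / RSS_U)."""
    return n * math.log(rss_r / rss_u)


n, rss_r, rss_u = 50, 120.0, 100.0  # hypothetical values
direct = 2 * (loglik_at_ml(n, rss_u) - loglik_at_ml(n, rss_r))
print(direct, lr_statistic(n, rss_r, rss_u))  # the two expressions agree
```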
Additional exercises
A10.1
What factors affect the decision to make a purchase of your category of
expenditure in the CES data set?
Define a new variable CATBUY that is equal to 1 if the household makes
any purchase of your category and 0 if it makes no purchase at all. Regress
CATBUY on EXPPC, SIZE, REFAGE, and COLLEGE (as defined in Exercise
A5.6) using: (1) the linear probability model, (2) the logit model, and (3)
the probit model. Calculate the marginal effects at the mean of EXPPC,
SIZE, REFAGE, and COLLEGE for the logit and probit models and compare
them with the coefficients of the linear probability model.
A10.2
Logit analysis was used to relate the event of a respondent working
(WORKING, defined to be 1 if the respondent was working, and 0
otherwise) to the respondent’s educational attainment (S, defined as
the highest grade completed) using 1994 data from the US National
Longitudinal Survey of Youth. In this year the respondents were aged
29–36 and a substantial number of females had given up work to raise a
family. The analysis was undertaken for females and males separately, with
the output shown below (first females, then males, with iteration messages
deleted):
logit WORKING S if MALE==0
Logit Estimates                              Number of obs
                                             chi2
                                             Prob > chi2
Log Likelihood                               Pseudo R2
 WORKING |  Coef.  Std. Err.  z  P>|z|  [Conf. Interval]
       S |
   _cons |

logit WORKING S if MALE==1
Logit Estimates                              Number of obs
                                             chi2
                                             Prob > chi2
Log Likelihood                               Pseudo R2
 WORKING |  Coef.  Std. Err.  z  P>|z|  [Conf. Interval]
       S |
   _cons |
95 per cent of the respondents had S in the range 9–18 years and
the mean value of S was 13.3 and 13.2 years for females and males,
respectively.
From the logit analysis, the marginal effect of S on the probability of
working at the mean was estimated to be 0.030 and 0.020 for females
and males, respectively. Ordinary least squares regressions of WORKING
on S yielded slope coefficients of 0.029 and 0.020 for females and males,
respectively.
As can be seen from the second figure below, the marginal effect of
educational attainment was lower for males than for females over most of
the range S ≥ 9. Discuss the plausibility of this finding.
As can also be seen from the second figure, the marginal effect of
educational attainment decreases with educational attainment for both
males and females over the range S ≥ 9. Discuss the plausibility of this
finding.
Compare the estimates of the marginal effect of educational attainment
using logit analysis with those obtained using ordinary least squares.
[Figure: probability (vertical axis) against S (horizontal axis), with separate curves for males and females]
Figure 10.1 Probability of working, as a function of S
[Figure: marginal effect (vertical axis) against S (horizontal axis), with separate curves for males and females]
A10.3
A researcher has data on weight, height, and schooling for 540
respondents in the US National Longitudinal Survey of Youth for the year
2002. Using the data on weight and height, he computes the body mass
index for each individual. If the body mass index is 30 or greater, the
individual is defined to be obese. He defines a binary variable, OBESE,
that is equal to 1 for the 164 obese individuals and 0 for the other 376.
He wishes to investigate whether obesity is related to schooling and fits an
ordinary least squares (OLS) regression of OBESE on S, years of schooling,
with the following result (t statistics in parentheses):
$$\widehat{OBESE} = \underset{(5.30)}{0.595} - \underset{(2.63)}{0.021}\,S \qquad (1)$$
This is described as the linear probability model (LPM). He also fits
the logit model $F(Z) = \frac{1}{1 + e^{-Z}}$, where F(Z) is the probability of being
obese and $Z = \beta_1 + \beta_2 S$, with the following result (again, t statistics in
parentheses):
$$\hat{Z} = \underset{(1.07)}{0.588} - \underset{(2.60)}{0.105}\,S \qquad (2)$$
The figure below shows the probability of being obese and the marginal
effect of schooling as a function of S, given the logit regression. Most
(492 out of 540) of the individuals in the sample had 12 to 18 years of
schooling.
[Figure: probability of being obese and marginal effect (vertical axes: probability, marginal effect) against years of schooling (horizontal axis)]
Figure 10.3
• Discuss whether the relationships indicated by the probability and
marginal effect curves appear to be plausible.
• Add the probability function and the marginal effect function for the
LPM to the diagram. Explain why you drew them the way you did.
• The logit model is considered to have several advantages over the LPM.
Explain what these advantages are. Evaluate the importance of the
advantages of the logit model in this particular case.
• The LPM is fitted using OLS. Explain how, instead, it might be fitted
using maximum likelihood estimation:
Write down the probability of being obese for any obese individual,
given Si for that individual, and write down the probability of not being
obese for any non-obese individual, again given Si for that individual.
Write down the likelihood function for this sample of 164 obese
individuals and 376 non-obese individuals.
Explain how one would use this function to estimate the
parameters. [Note: You are not expected to attempt to derive the
estimators of the parameters.]
Explain whether your maximum likelihood estimators will be the
same or different from those obtained using least squares.
A10.4
A researcher interested in the relationship between parenting, age and
schooling has data for the year 2000 for a sample of 1,167 married males and
870 married females aged 35 to 42 in the National Longitudinal Survey of
Youth. In particular, she is interested in how the presence of young children
in the household is related to the age and education of the respondent.
She defines CHILDL6 to be 1 if there is a child less than 6 years old in the
household and 0 otherwise and regresses it on AGE, age, and S, years of
schooling, for males and females separately using probit analysis. Defining the
probability of having a child less than 6 in the household to be p = F(Z) where
Z = β1 + β2AGE + β3S
she obtains the results shown in the table below (asymptotic standard
errors in parentheses).
              males      females
AGE          –0.137     –0.154
             (0.018)    (0.023)
S             0.132      0.094
             (0.015)    (0.020)
constant      0.194      0.547
             (0.358)    (0.492)
Z            –0.399     –0.874
f(Z)          0.368      0.272
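She calculates $Z = b_1 + b_2\,\overline{AGE} + b_3\,\bar{S}$,
where $\overline{AGE}$ and $\bar{S}$ are the mean values of AGE and S and b1, b2, and b3
are the probit coefficients in the corresponding regression, and she further
calculates
$$f(Z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} Z^2}$$
where $f(Z) = \frac{dF}{dZ}$. The values of Z and f(Z) are shown in the table.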
• Explain how one may derive the marginal effects of the explanatory
variables on the probability of having a child less than 6 in the
household, and calculate for both males and females the marginal
effects at the means of AGE and S.
• Explain whether the signs of the marginal effects are plausible. Explain
whether you would expect the marginal effect of schooling to be higher
for males or for females.
• At a seminar someone asks the researcher whether the marginal effect
of S is significantly different for males and females. The researcher
does not know how to test whether the difference is significant and
asks you for advice. What would you say?
A10.5
A health economist investigating the relationship between smoking,
schooling, and age, defines a dummy variable D to be equal to 1 for
smokers and 0 for nonsmokers. She hypothesises that the effects of
schooling and age are not independent of each other and defines an
interactive term schooling*age. She includes this as an explanatory
variable in the probit regression. Explain how this would affect the
estimation of the marginal effects of schooling and age.
A10.6
A researcher has data on the following variables for 5,061 respondents in
the US National Longitudinal Survey of Youth:
• MARRIED, marital status in 1994, defined to be 1 if the respondent was
married with spouse present and 0 otherwise;
• MALE, defined to be 1 if the respondent was male and 0 if female;
• AGE in 1994 (the range being 29–37);
• S, years of schooling, defined as highest grade completed, and
• ASVABC, score on a test of cognitive ability, scaled so as to have mean
50 and standard deviation 10.
She uses probit analysis to regress MARRIED on the other variables, with
the output shown:
probit MARRIED MALE AGE S ASVABC
Probit estimates                             Number of obs
                                             LR chi2
                                             Prob > chi2
Log likelihood                               Pseudo R2
 MARRIED |  Coef.  Std. Err.  z  P>|z|  [Conf. Interval]
    MALE |
     AGE |
       S |
  ASVABC |
   _cons |
Variable Mean Marginal effect
MALE 0.4841 -0.0467
AGE 32.52 0.0110
S 13.31 -0.0007
ASVABC 48.94 0.0097
A10.7
Suppose that the time, t, required to complete a certain process has
probability density function
$f(t) = \alpha e^{-\alpha(t - \beta)}$ with t > β > 0
and you have a sample of n observations with times T1, ..., Tn.
Determine the maximum likelihood estimate of α, assuming that β is
known.
A10.8
In Exercise 10.14 in the textbook, an event could occur with probability
p. Given that the event occurred m times in a sample of n observations,
the exercise required demonstrating that m/n was the ML estimator of p.
Derive the LR statistic for the null hypothesis p = p0. If m = 40 and n =
100, test the null hypothesis p = 0.5.
A10.9
For the variable in Exercise A10.8, derive the Wald statistic and test the
null hypothesis p = 0.5.
reg BACH ASVABC
  Source |  SS  df  MS                       Number of obs
                                             F
   Model |                                   Prob > F
Residual |                                   R-squared
                                             Adj R-squared
   Total |                                   Root MSE
    BACH |  Coef.  Std. Err.  t  P>|t|  [Conf. Interval]
  ASVABC |
   _cons |
Answer:
The slope coefficient indicates that the probability of earning a bachelor’s
degree rises by 2.4 per cent for every additional point on the ASVABC
score. While this may be realistic for a range of values of ASVABC, it is
not for very low ones. Very few of those with scores in the low end of
the spectrum earned bachelor’s degrees and variations in the ASVABC
score would be unlikely to have an effect on the probability. The intercept
literally indicates that an individual with a 0 score would have a minus
92.3 per cent probability of earning a bachelor’s degree. Given the way
that ASVABC was constructed, a score of 0 was in fact impossible. However
the linear probability model predicts nonsense negative probabilities for all
those with scores of 39 or less, of whom there were many in the sample.
The linear probability model also suffers from the problem that the
standard errors and t and F tests are invalid because the disturbance
term does not have a normal distribution. Its distribution is not even
continuous, consisting of only two possible values for each value of
ASVABC.
10.3
The output shows the results of fitting a logit regression to the data set
described in Exercise 10.1 (with four of the iteration messages deleted).
26.7 per cent of the respondents earned bachelor’s degrees.
logit BACH ASVABC
Iteration   log likelihood
Iteration   log likelihood
Logistic regression                          Number of obs
                                             LR chi2
                                             Prob > chi2
Log likelihood                               Pseudo R2
    BACH |  Coef.  Std. Err.  z  P>|z|  [Conf. Interval]
  ASVABC |
   _cons |
[Figure: cumulative effect and marginal effect (vertical axes) against ASVABC (horizontal axis)]
Figure 10.4
• With reference to the figure, discuss the variation of the marginal effect
of the ASVABC score implicit in the logit regression.
• Sketch the probability and marginal effect diagrams for the OLS
regression in Exercise 10.1 and compare them with those for the logit
regression.
Answer:
In Exercise 10.1 we were told that the mean value of ASVABC in the
sample was 50.2. From the curve for the cumulative probability in the
figure it can be seen that the probability of graduating from college for
respondents with that score is only about 20 per cent. The question states
that most respondents had scores in the range 40–60. It can be seen that
at the top of that range the probability has increased substantially, being
about 60 per cent. Looking at the curve for the marginal probability,
it can be seen that the marginal effect is greatest in the range 50–65,
and of course this is the range with the steepest slope of the cumulative
probability. Exercise 10.1 states that the highest score was 65, where the
probability would be about 90 per cent.
For the linear probability model in Exercise 10.1, the counterpart to the
cumulative probability curve in the figure is a straight line using the
[Figure: cumulative effect and marginal effect (vertical axes) against ASVABC (horizontal axis)]
Figure 10.5
10.7
The following probit regression, with iteration messages deleted, was
fitted using 2,726 observations on females in the National Longitudinal
Survey of Youth using the LFP data set described in Appendix B. The data
are for 1994, when the respondents were aged 29 to 36 and many of them
were raising young families.
probit WORKING S AGE CHILDL06 CHILDL16 MARRIED ETHBLACK ETHHISP if MALE==0
Iteration   log likelihood
Iteration   log likelihood
Iteration   log likelihood
Iteration   log likelihood
Probit estimates                             Number of obs
                                             LR chi2
                                             Prob > chi2
Log likelihood                               Pseudo R2
 WORKING |  Coef.  Std. Err.  z  P>|z|  [Conf. Interval]
       S |
     AGE |
CHILDL06 |
CHILDL16 |
 MARRIED |
ETHBLACK |
 ETHHISP |
   _cons |
WORKING is a binary variable equal to 1 if the respondent was working
in 1994, 0 otherwise. CHILDL06 is a dummy variable equal to 1 if there
was a child aged less than 6 in the household, 0 otherwise. CHILDL16 is
a dummy variable equal to 1 if there was a child aged less than 16, but
10.9
Using the CES data set, perform a tobit regression of expenditure on your
commodity on total household expenditure per capita and household size, and
compare the slope coefficients with those obtained in OLS regressions including
and excluding observations with 0 expenditure on your commodity.
Answer:
The table gives the number of unconstrained observations for each category
of expenditure and the slope coefficients and standard errors from an OLS
regression using the unconstrained observations only, the OLS regression
using all the observations, and the tobit regression. As may be expected, the
discrepancies between the tobit estimates and the OLS estimates are greatest
for those categories with the largest numbers of constrained observations.
In the case of categories such as FDHO, SHEL, TELE, and CLOT, there is very
little difference between the tobit and the OLS estimates.
10.12
Show that the tobit model may be regarded as a special case of a selection
bias model.
Answer:
The selection bias model may be written
10.14
An event is hypothesised to occur with probability p. In a sample of
n observations, it occurred m times. Demonstrate that the maximum
likelihood estimator of p is m/n.
Answer:
In each observation where the event did occur, the probability was p. In
each observation where it did not occur, the probability was (1 – p). Since
there were m of the former and n – m of the latter, the joint probability
was $p^m (1 - p)^{n - m}$. Reinterpreting this as a function of p, given m and n, the
log-likelihood function for p is
log L( p ) = m log p + (n − m ) log(1 − p ) .
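The result can be checked numerically before working through the calculus. The sketch below evaluates the log-likelihood on a grid of values of p and picks the maximiser; the counts m = 40 and n = 100, the grid spacing, and the function names are illustrative assumptions.

```python
import math


def loglik(p, m, n):
    """Log-likelihood m * log(p) + (n - m) * log(1 - p)."""
    return m * math.log(p) + (n - m) * math.log(1 - p)


def grid_mle(m, n, steps=1000):
    """Value of p on the grid {1/steps, ..., (steps - 1)/steps} that maximises loglik."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: loglik(p, m, n))


m, n = 40, 100  # illustrative counts
print(grid_mle(m, n), m / n)
```

A grid search is used here only because it is transparent; it is not a substitute for the analytical demonstration that the exercise asks for.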
10.18
Returning to the example of the random variable X with unknown mean
μ and variance σ², the log-likelihood for a sample of n observations was
given by equation (10.34):
$$\log L = -\frac{n}{2} \log 2\pi - \frac{n}{2} \log \sigma^2 + \frac{1}{\sigma^2} \left( -\frac{1}{2} (X_1 - \mu)^2 - \dots - \frac{1}{2} (X_n - \mu)^2 \right).$$
The first-order condition for μ produced the ML estimator of μ and the first
order condition for σ then yielded the ML estimator for σ. Often, the variance
is treated as the primary dispersion parameter, rather than the standard
deviation. Show that such a treatment yields the same results in the present
case. Treat σ² as a parameter, differentiate log L with respect to it, and solve.
Answer:
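$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} - \frac{1}{\sigma^4} \left( -\frac{1}{2} (X_1 - \mu)^2 - \dots - \frac{1}{2} (X_n - \mu)^2 \right) = 0.$$
Hence
$$\sigma^2 = \frac{1}{n} \left( (X_1 - \mu)^2 + \dots + (X_n - \mu)^2 \right)$$
as before. The ML estimator of μ is $\bar{X}$ as before.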
10.19
In Exercise 10.4, log L0 is –1485.62. Compute the pseudo-R2 and confirm
that it is equal to that reported in the output.
Answer:
As defined in equation (10.43),
$$\text{pseudo-}R^2 = 1 - \frac{\log L}{\log L_0} = 1 - \frac{-1403.0835}{-1485.6248} = 0.0556,$$
as appears in the output.
10.20
In Exercise 10.4, compute the likelihood ratio statistic 2(log L – log L0),
confirm that it is equal to that reported in the output, and perform the
likelihood ratio test.
Answer:
The likelihood ratio statistic is 2(–1403.0835 + 1485.6248) = 165.08,
as printed in the output. Under the null hypothesis that the coefficients
of the explanatory variables are all jointly equal to 0, this is distributed
as a chi-squared statistic with degrees of freedom equal to the number of
explanatory variables, in this case 7. The critical value of chi-squared at
the 0.1 per cent significance level with 7 degrees of freedom is 24.32, and
so we reject the null hypothesis at that level.
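The arithmetic in Exercises 10.19 and 10.20 can be reproduced in a few lines. The two log-likelihoods and the critical value are those quoted in the answers above; the variable names are illustrative.

```python
log_l = -1403.0835   # log-likelihood of the fitted model
log_l0 = -1485.6248  # log-likelihood with the intercept only

pseudo_r2 = 1 - log_l / log_l0
lr = 2 * (log_l - log_l0)
critical_value = 24.32  # chi-squared(7), 0.1 per cent level, as quoted in the text

print(round(pseudo_r2, 4))  # 0.0556
print(round(lr, 2))         # 165.08
print(lr > critical_value)  # True, so the null hypothesis is rejected
```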
the commodities, the coefficients being similar to the logit and probit
marginal effects and the t statistics being of the same order of magnitude
as the z statistics for the logit and probit. However for those categories
of expenditure where most households made purchases, and the sample
was therefore greatly imbalanced, the linear probability model gave very
different results, as might be expected.
The total expenditure of the household and the size of the household were
both highly significant factors in the decision to make a purchase for all
the categories of expenditure except TELE, LOCT and TOB. In the case of
TELE, only 11 households did not make a purchase, the reasons apparently
being non-economic. LOCT is on the verge of being an inferior good and
for that reason is not sensitive to total expenditure. TOB is well-known not
to be sensitive to total expenditure.
Age was a positive influence in the case of TRIP, HEAL, and READ and a
negative one for FDAW, FURN, FOOT, TOYS, EDUC, and TOB.
A college education was a positive influence for TRIP, HEAL, READ and
EDUC and a negative one for TOB.
Most of these effects seem plausible with simple explanations.
A10.2
The finding that the marginal effect of educational attainment was lower
for males than for females over most of the range S ≥ 9 is plausible
because the probability of working is much closer to 1 for males than for
females for S ≥ 9, and hence the possible sensitivity of the participation
rate to S is smaller.
The explanation of the finding that the marginal effect of educational
attainment decreases with educational attainment for both males and
females over the range S ≥ 9 is similar. For both sexes, the greater is S, the
greater is the participation rate, and hence the smaller is the scope for it
being increased by further education.
A10.3
• Discuss whether the relationships indicated by the probability and
marginal effect curves appear to be plausible.
The probability curve indicates an inverse relationship between
schooling and the probability of being obese. This seems entirely
plausible. The more educated tend to have healthier lifestyles,
including eating habits. Over the relevant range, the marginal effect
falls a little in absolute terms (is less negative) as schooling increases.
This is in keeping with the idea that further schooling may have less
effect on the highly educated than on the less educated (but the
difference is not large).
• Add the probability function and the marginal effect function for the LPM
to the diagram. Explain why you drew them the way you did.
[Figure: probability of being obese and marginal effect (vertical axes: probability, marginal effect) against years of schooling (horizontal axis)]
Figure 10.6
The estimated probability function for the LPM is just the regression
equation and the marginal effect is the coefficient of S. They are shown
as the dashed lines in the diagram.
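The LPM line and the logit curve can be compared numerically using the coefficients reported in equations (1) and (2) of the question. The sketch below is illustrative; the function names and the values of S at which the functions are evaluated are assumptions.

```python
import math

B1_LPM, B2_LPM = 0.595, -0.021      # equation (1)
B1_LOGIT, B2_LOGIT = 0.588, -0.105  # equation (2)


def lpm_probability(s):
    """Fitted probability from the LPM; its marginal effect is the constant B2_LPM."""
    return B1_LPM + B2_LPM * s


def logit_probability(s):
    z = B1_LOGIT + B2_LOGIT * s
    return 1 / (1 + math.exp(-z))


def logit_marginal_effect(s):
    """dp/dS = f(Z) * b2, where f(Z) = F(Z) * (1 - F(Z)) for the logit."""
    p = logit_probability(s)
    return p * (1 - p) * B2_LOGIT


for s in (12, 15, 18):
    print(s, round(lpm_probability(s), 3), round(logit_probability(s), 3),
          round(logit_marginal_effect(s), 4))
```

At S = 12, for example, the two fitted probabilities are both close to 0.34, and the logit marginal effect is about –0.023 compared with the constant –0.021 of the LPM.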
• The logit model is considered to have several advantages over the LPM.
Explain what these advantages are. Evaluate the importance of the
advantages of the logit model in this particular case.
The disadvantages of the LPM are (1) that it can give nonsense fitted
values (predicted probabilities greater than 1 or less than 0); (2) the
disturbance term in observation i must be equal to either 1 – F(Zi)
(if the dependent variable is equal to 1) or – F(Zi) (if the dependent
variable is equal to 0) and so it violates the usual assumption that
the disturbance term is normally distributed, although this may not
matter asymptotically; (3) the disturbance term will be heteroscedastic
because Zi is different for different observations; (4) the LPM implicitly
assumes that the marginal effect of each explanatory variable is
constant over its entire range, which is often intuitively unappealing.
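$$L(\beta_1, \beta_2 \mid \text{data}) = \prod_{\text{OBESE}} p_i \prod_{\text{NOT OBESE}} (1 - p_i) = \prod_{\text{OBESE}} (\beta_1 + \beta_2 S_i) \prod_{\text{NOT OBESE}} (1 - \beta_1 - \beta_2 S_i)$$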
• Explain how one would use this function to estimate the parameters.
[Note: You are not expected to attempt to derive the estimators of the
parameters.]
You would use some algorithm to find the values of β1 and β2 that
maximise the function.
• Explain whether your maximum likelihood estimators will be the same or
different from those obtained using least squares.
Least squares involves finding the extremum of a completely different
expression and will therefore lead to different estimators.
A10.4
• Explain how one may derive the marginal effects of the explanatory
variables on the probability of having a child less than 6 in the household,
and calculate for both males and females the marginal effects at the
means of AGE and S.
Since p is a function of Z, and Z is a linear function of the X variables,
the marginal effect of Xj is
$$\frac{\partial p}{\partial X_j} = \frac{dp}{dZ} \frac{\partial Z}{\partial X_j} = \frac{dp}{dZ}\, b_j$$
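Since dp/dZ is the standard normal density f(Z) in the probit model, the marginal effects at the means follow directly from the coefficients and the values of Z given in the question. The sketch below is a minimal illustration; the dictionary layout and names are assumptions. It first confirms the tabulated values of f(Z) and then multiplies them by the coefficients.

```python
import math


def normal_density(z):
    """Standard normal density f(Z) = exp(-Z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)


# Probit coefficients and Z at the sample means, taken from the table in the question
results = {
    "males": {"AGE": -0.137, "S": 0.132, "Z": -0.399},
    "females": {"AGE": -0.154, "S": 0.094, "Z": -0.874},
}

marginal_effects = {}
for group, r in results.items():
    f_z = normal_density(r["Z"])
    marginal_effects[group] = {v: f_z * r[v] for v in ("AGE", "S")}
    rounded = {v: round(me, 3) for v, me in marginal_effects[group].items()}
    print(group, round(f_z, 3), rounded)
```

Under these figures the marginal effect of AGE is roughly –0.050 for males and –0.042 for females, and that of S roughly 0.049 for males and 0.026 for females.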
Yes. Given that the cohort is aged 35–42, the respondents have passed
the age at which most adults start families, and the older they are, the
less likely they are to have small children in the household. At the same
time, the more educated the respondent, the more likely he or she is
to have started having a family relatively late, so the positive effect
of schooling is also plausible. However, given the age of the cohort,
it is likely to be weaker for females than for males, given that most
females intending to have families will have started them by this time,
irrespective of their education.
• At a seminar someone asks the researcher whether the marginal effect of
S is significantly different for males and females. The researcher does not
know how to test whether the difference is significant and asks you for
advice. What would you say?
Fit a probit regression for the combined sample, adding a male
intercept dummy and male slope dummies for AGE and S. Test the
coefficient of the slope dummy for S.
A10.5
The Z function will be of the form
Z = β1 + β2A + β3S + β4AS
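so the marginal effects are
$$\frac{\partial p}{\partial A} = \frac{dp}{dZ} \frac{\partial Z}{\partial A} = f(Z)(\beta_2 + \beta_4 S)$$
and
$$\frac{\partial p}{\partial S} = \frac{dp}{dZ} \frac{\partial Z}{\partial S} = f(Z)(\beta_3 + \beta_4 A).$$
Both factors depend on the values of A
and/or S, but the marginal effects could be evaluated for a representative
individual using the mean values of A and S in the sample.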
A10.6
• Discuss the conclusions one may reach, given the probit output and the
table, commenting on their plausibility.
Being male has a small but highly significant negative effect. This
is plausible because males tend to marry later than females and the
cohort is still relatively young.
Age has a highly significant positive effect, again plausible because
older people are more likely to have married than younger people.
Schooling has no apparent effect at all. It is not obvious whether this is
plausible.
Cognitive ability has a highly significant positive effect. Again, it is not
obvious whether this is plausible.
• The researcher considers including CHILD, a dummy variable defined to
be 1 if the respondent had children, and 0 otherwise, as an explanatory
variable. When she does this, its z-statistic is 33.65 and its marginal effect
0.5685. Discuss these findings.
Obviously one would expect a high positive correlation between being
married and having children and this would account for the huge and
highly significant coefficient. However getting married and having
children are often a joint decision, and accordingly it is simplistic
to suppose that one characteristic is a determinant of the other. The
finding should not be taken at face value.
A10.7
Determine the maximum likelihood estimate of α, assuming that β is known.
The loglikelihood function is
A10.8
From the solution to Exercise 10.14, the log-likelihood function for p is
log L( p ) = m log p + (n − m ) log(1 − p ) .
We would reject the null hypothesis at the 5 per cent level (critical value
of chi-squared with one degree of freedom 3.84) but not at the 1 per cent
level (critical value 6.64).
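The test can be reproduced numerically. The sketch below computes the LR statistic as twice the difference between the log-likelihood at the ML estimate m/n and at the hypothetical value p0, using the log-likelihood function given above; the function names are illustrative.

```python
import math


def loglik(p, m, n):
    """Log-likelihood m * log(p) + (n - m) * log(1 - p)."""
    return m * math.log(p) + (n - m) * math.log(1 - p)


def lr_statistic(m, n, p0):
    """2 * (log L at the ML estimate m/n minus log L at p0)."""
    return 2 * (loglik(m / n, m, n) - loglik(p0, m, n))


lr = lr_statistic(40, 100, 0.5)
print(round(lr, 2))      # about 4.03
print(3.84 < lr < 6.64)  # True: reject at 5 per cent, not at 1 per cent
```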
A10.9
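The first derivative of the log-likelihood function is
$$\frac{d \log L(p)}{dp} = \frac{m}{p} - \frac{n - m}{1 - p} = 0$$
Evaluated at p = m/n,
$$\frac{d^2 \log L(p)}{dp^2} = -\frac{n^2}{m} - \frac{n - m}{\left( 1 - \frac{m}{n} \right)^2} = -n^2 \left( \frac{1}{m} + \frac{1}{n - m} \right) = -\frac{n^3}{m(n - m)}.$$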
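$$\frac{\left( \frac{m}{n} - p_0 \right)^2}{\frac{m(n - m)}{n^3}} = \frac{\left( \frac{m}{n} - p_0 \right)^2}{\frac{1}{n} \, \frac{m}{n} \, \frac{n - m}{n}}.$$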
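For the data in Exercise A10.8 (m = 40, n = 100, p0 = 0.5), the Wald statistic can be evaluated as follows. This is a sketch under the assumption that the variance of the estimator is estimated as minus the inverse of the second derivative derived above, that is, m(n − m)/n³; the function name is illustrative.

```python
def wald_statistic(m, n, p0):
    """Wald statistic (m/n - p0)^2 divided by the estimated variance of m/n."""
    p_hat = m / n
    variance = m * (n - m) / n ** 3  # -1 / second derivative at p = m/n
    return (p_hat - p0) ** 2 / variance


w = wald_statistic(40, 100, 0.5)
print(round(w, 2))      # 4.17
print(3.84 < w < 6.64)  # True: same conclusion as the LR test
```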