
Part IIA Paper 3 Econometrics

Lecture 20:
Limited Dependent Variables: Probit model

Oleg I. Kitov
[email protected]

Faculty of Economics and Selwyn College

Lent Term 2022

Lecture outline

I Binary outcome variables.
I Latent variable model.
I Linear probability model.
I Probit model.
I Marginal effect in probit model.
I Maximum likelihood estimator of probit model.
I Testing hypotheses in probit model.

Binary outcome variable
I Previously, we worked with continuous outcome variables Yi ∈ R.
I Now, we would like to model a binary outcome variable Yi ∈ {0, 1}.
I Yi is a Bernoulli random variable: an event either happens or it doesn’t.
I Yi ∼ Ber (pi ) where pi = P (Yi = 1) and 1 − pi = P (Yi = 0).
I Note that pi can vary across individuals i = 1, . . . , n.
I Recall that E [Yi ] = pi and Var (Yi ) = pi (1 − pi ): since pi varies across i, the variance does too, a built-in form of heteroskedasticity.
I Example: suppose Yi is a vote by individual i in the Brexit referendum:

Yi = 1 if i voted Brexit,
     0 if i voted Remain

I Want to model Yi , or in fact P (Yi = 1), using explanatory variables:


- inci : yearly gross income in thousands of pounds;
- educi : educational attainment in years;
- femi : dummy variable equal to one if i is female.
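A minimal numpy sketch of this setup (the probabilities pi below are made up for illustration): each Yi is a single Bernoulli draw, and the variance pi (1 − pi ) differs across individuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical success probabilities p_i that vary across individuals i.
p = np.array([0.2, 0.5, 0.8])
Y = rng.binomial(1, p)        # one Bernoulli(p_i) draw per individual

# E[Y_i] = p_i and Var(Y_i) = p_i (1 - p_i): the variance changes with p_i,
# which is the heteroskedasticity noted above.
var = p * (1 - p)
```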
Latent variable model
I Want to model probability of Yi = 1 conditional on explanatory variables:

pi = P (Yi = 1 | inci , educi , femi )

I Suppose there is a latent (unobserved) continuous variable Yi∗
(intensity of preference for Brexit) which gives rise to the observed vote Yi :

Yi = 1 if Yi∗ > 0,
     0 if Yi∗ ≤ 0

I Suppose that the data generating process for Yi∗ is linear:

Yi∗ = β0 + β1 inci + β2 educi + β3 femi + ui = X|i β + ui

where Xi = (1, inci , educi , femi )| and β = (β0 , β1 , β2 , β3 )|
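A short simulation of this latent-variable mechanism, under made-up coefficients and regressor distributions (nothing below comes from the slides); the observed vote is just the indicator of Yi∗ being positive.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Illustrative regressors and coefficients (all values are assumptions).
inc = rng.normal(35, 10, n)               # income, thousands of pounds
educ = rng.integers(10, 21, n)            # years of education
fem = rng.integers(0, 2, n)               # female dummy
beta = np.array([-1.0, 0.03, -0.05, 0.2])

X = np.column_stack([np.ones(n), inc, educ, fem])
u = rng.standard_normal(n)                # error term of the latent equation

Y_star = X @ beta + u                     # latent intensity of preference
Y = (Y_star > 0).astype(int)              # observed binary vote
```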
Model for conditional Yi
I It does not make much sense to model a binary outcome Yi directly.
I Instead, we want to model conditional probability pi = P (Yi = 1 | Xi ).
I We know the linear model for the latent variable Yi∗ = X|i β + ui .
I We know that the error term ui is a random variable.
I Link between conditional probability pi and the linear model for Yi∗ :

P (Yi = 1 | Xi ) = P (Yi∗ > 0 | Xi )


= P (X|i β + ui > 0 | Xi )
= P (ui > −X|i β | Xi )
= 1 − P (ui ≤ −X|i β | Xi )

I This is a conditional probabilistic model for the binary variable Yi .
I It is equivalent to the probability model for the error term ui .
I If we know the distribution of ui , we will know P (Yi = 1 | Xi ).
Link function

I To compute P (Yi = 1 | Xi ) we need the cumulative distribution of ui .
I Denote the cdf of ui by Fu (x ); no particular assumptions on Fu yet.
I Write the binary response model as

P (Yi = 1 | Xi ) = 1 − P (ui ≤ −X|i β | Xi ) = 1 − Fu (−X|i β) .

I We call Fu (x ) the link function, as it literally links the probability of Yi
with the probability of ui , conditional on explanatory variables Xi .
I We observe a sample of realizations {(Yi , Xi )}ⁿᵢ₌₁ and want to estimate
β0 , β1 , β2 , β3 that determine the dependence of P (Yi = 1 | Xi ) on Xi .
I We can only do that once we assume a particular distribution for ui !
I Assumptions about the distribution of ui give rise to specific models:
- probit model assumes ui are drawn from a standard normal distribution Φ (·);
- logit model assumes ui are drawn from a logistic distribution Λ (·).
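Numerically, both candidate link functions map any real index into [0, 1]; a quick scipy check (the grid of values is illustrative only):

```python
import numpy as np
from scipy.stats import logistic, norm

x = np.linspace(-3.0, 3.0, 7)

probit_link = norm.cdf(x)       # Phi(x): standard normal cdf
logit_link = logistic.cdf(x)    # Lambda(x) = 1 / (1 + exp(-x))

# Either link turns the linear index into a valid probability.
```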
Linear probability model
I Before considering non-linear binary response models, let’s look at

Yi = X|i β + ui = β0 + β1 inci + β2 educi + β3 femi + ui

I This is a linear probability model, although the left-hand side is the outcome Yi , not a probability.
I The coefficients β0 , β1 , β2 , β3 can be estimated with OLS in the usual way.
I Predictions are not bounded between zero and one, so interpretation of the
estimation results and predicted values is potentially meaningless:

Ŷi = β̂0 + β̂1 inci + β̂2 educi + β̂3 femi ∉ [0, 1] in general

I The marginal (partial) effect of the regressor inc is given by:

MEinc = ∂Yi / ∂inci = β1

I The marginal effect is equal to β1 and is constant for all individuals i.
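A sketch of the linear probability model estimated by OLS on simulated data (all numbers are invented); the point is that fitted values Ŷi are not constrained to [0, 1].

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated regressors and binary outcomes (illustrative only).
inc = rng.normal(35, 10, n)
educ = rng.integers(10, 21, n).astype(float)
fem = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), inc, educ, fem])
p_true = 1 / (1 + np.exp(-(-2.0 + 0.06 * inc)))
Y = rng.binomial(1, p_true)

# OLS estimates of beta0, ..., beta3 in the usual way.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat   # fitted "probabilities", not bounded to [0, 1]
```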
Probit model
I Assume the errors are standard normal, ui ∼ iid N (0, 1), with pdf and cdf:
φ (x ) = (1/√(2π)) e^(−x²/2) ,    Φ (x ) = ∫₋∞ˣ (1/√(2π)) e^(−s²/2) ds

I The link function is given by the standard normal cdf, Fu (x ) = Φ (x ):

P (Yi = 1 | Xi ) = 1 − Fu (−X|i β) = 1 − Φ (−X|i β) = Φ (X|i β)

I Probit model: the model for P (Yi = 1 | Xi ) when ui ∼ iidN (0, 1):

P (Yi = 1 | inci , educi , femi ) = Φ (β0 + β1 inci + β2 educi + β3 femi )

I Since Φ (x ) ∈ [0, 1] for any x ∈ R, it must be the case that for any values
of explanatory variables inci , educi , femi , model predictions are in [0, 1]:
 
P̂ (Yi = 1 | inci , educi , femi ) = Φ (β̂0 + β̂1 inci + β̂2 educi + β̂3 femi )
Probit model: computing predicted probabilities

I Suppose we estimated model parameters β̂0 , β̂1 , β̂2 , β̂3 .
I Later we will talk about how to estimate the parameters with maximum likelihood.
I Probabilities vary across individuals and depend on explanatory variables.

i    Yi   inci   educi   femi   P̂ (Yi = 1 | Xi )
1    1    40     15      1      Φ (β̂0 + 40β̂1 + 15β̂2 + β̂3 )
2    0    30     12      0      Φ (β̂0 + 30β̂1 + 12β̂2 )
3    0    35     18      1      Φ (β̂0 + 35β̂1 + 18β̂2 + β̂3 )
4    1    50     18      0      Φ (β̂0 + 50β̂1 + 18β̂2 )
...
n    1    20     12      0      Φ (β̂0 + 20β̂1 + 12β̂2 )
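The last column of the table can be computed directly once numeric estimates are available; the coefficient values below are invented purely to make the sketch runnable.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical estimates beta0_hat, ..., beta3_hat (not from the slides).
b = np.array([-3.0, 0.04, 0.08, -0.25])

# Rows X_i = (1, inc_i, educ_i, fem_i) matching the table above.
X = np.array([
    [1, 40, 15, 1],
    [1, 30, 12, 0],
    [1, 35, 18, 1],
    [1, 50, 18, 0],
])

p_hat = norm.cdf(X @ b)   # P_hat(Y_i = 1 | X_i) = Phi(X_i' beta_hat)
```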
Probit model: marginal (partial) effects [1/5]
I Probit population model for pi = P (Yi = 1 | Xi ) is

P (Yi = 1 | inci , educi , femi ) = Φ (β0 + β1 inci + β2 educi + β3 femi )

I Marginal effect of a continuous regressor, e.g. inci , on P (Yi = 1 | Xi ):

MEinc (Xi ) = ∂ P (Yi = 1 | Xi ) / ∂inci
            = ∂ Φ (β0 + β1 inci + β2 educi + β3 femi ) / ∂inci
            = β1 φ (β0 + β1 inci + β2 educi + β3 femi )

I Marginal effects in probit are not equal to β1 , so the coefficients do not
have the same interpretation as in the linear model. To get the marginal
effect, we also need to evaluate the standard normal pdf at particular values:

φ (β0 + β1 inci + β2 educi + β3 femi )


Probit model: marginal (partial) effects [2/5]

I MEinc (Xi ) is the marginal (partial) effect of income on the probability of
voting for Brexit, keeping education and gender fixed:

MEinc (Xi ) = β1 φ (β0 + β1 inci + β2 educi + β3 femi )

I Notice that marginal effect of income now depends on the level of income.
I If the coefficients are estimated with ML, the predicted marginal effect is
 
MEinc (Xi ) = β̂1 φ (β̂0 + β̂1 inci + β̂2 educi + β̂3 femi )

I Unlike in the linear model, where the marginal effect of income is just β1 ,
in the probit model MEinc (Xi ) ≠ β1 , but it is proportional to β1 .
I If β1 > 0, the probability increases with income, and a larger β1 implies a steeper rise.
I In probit, MEinc (Xi ) is a function of all of the explanatory variables.
I Marginal effects vary across individuals i = 1, . . . , n.
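A sketch of this marginal effect with hypothetical coefficients: the effect is β̂1 times the normal pdf evaluated at the individual's own index, so it changes as income changes.

```python
import numpy as np
from scipy.stats import norm

b = np.array([-3.0, 0.04, 0.08, -0.25])   # hypothetical estimates

def me_inc(inc, educ, fem, b=b):
    """Marginal effect of income: b1 * phi(X'b), a function of all regressors."""
    index = b[0] + b[1] * inc + b[2] * educ + b[3] * fem
    return b[1] * norm.pdf(index)

low = me_inc(inc=20, educ=12, fem=0)
high = me_inc(inc=60, educ=12, fem=0)   # same person, higher income
```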
Probit model: marginal (partial) effects [3/5]

I Want to compute the marginal effect of a categorical regressor, such as the
dummy femi , on P (Yi = 1 | Xi ).
I Since femi ∈ {0, 1} we cannot differentiate with respect to femi .
I We can compute the difference in the corresponding conditional probabilities for
individuals with femi = 1 and femi = 0, other things (income and
education) being equal:

MEfem = P (Yi = 1 | inci , educi , femi = 1) − P (Yi = 1 | inci , educi , femi = 0)
      = Φ (β0 + β1 inci + β2 educi + β3 ) − Φ (β0 + β1 inci + β2 educi )

I The marginal effect will differ depending on inci and educi .
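The dummy's effect can be sketched the same way (hypothetical coefficients again): a difference of two Φ terms, which depends on inci and educi.

```python
import numpy as np
from scipy.stats import norm

b = np.array([-3.0, 0.04, 0.08, -0.25])   # hypothetical estimates

def me_fem(inc, educ, b=b):
    """ME of the dummy: Phi(index at fem=1) - Phi(index at fem=0)."""
    base = b[0] + b[1] * inc + b[2] * educ
    return norm.cdf(base + b[3]) - norm.cdf(base)

me_low = me_fem(inc=30, educ=12)
me_high = me_fem(inc=50, educ=18)   # differs with income and education
```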

Probit model: marginal (partial) effects [4/5]

I Want to evaluate the average marginal effect of income on the probability
of voting for Brexit. Since the marginal effects in probit depend on
explanatory variables and vary across individuals, we can do it in two ways.
I Average marginal (partial) effect (AME): compute marginal effects of
income for each i = 1, . . . , n in the sample and average over them:

AMEinc = (1/n) Σᵢ₌₁ⁿ MEinc (Xi ) = (1/n) Σᵢ₌₁ⁿ β1 φ (X|i β)

I Note that in order to compute the AME we need to know the observations of the
explanatory variables Xi = (1, inci , educi , femi )| for all i = 1, . . . , n.
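A sketch of the AME on simulated data (regressors and coefficients are invented): compute β̂1 φ (X|i β̂) for every i and average.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 200
b = np.array([-3.0, 0.04, 0.08, -0.25])   # hypothetical estimates

# Illustrative sample: rows are X_i = (1, inc_i, educ_i, fem_i).
X = np.column_stack([
    np.ones(n),
    rng.normal(35, 10, n),
    rng.integers(10, 21, n),
    rng.integers(0, 2, n),
])

ame_inc = np.mean(b[1] * norm.pdf(X @ b))   # average of the n marginal effects
```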

Probit model: marginal (partial) effects [5/5]
I Marginal (partial) effect (evaluated) at the averages (MEA): the
marginal effect of income evaluated at the average values of all
explanatory variables, denoted by X̄ = (1, inc, educ, fem)| :

MEAinc (X̄) = β1 φ (X̄| β) = β1 φ (β0 + β1 inc + β2 educ + β3 fem)

I The MEA for a categorical variable femi is the difference in probabilities:

MEAfem (X̄) = Φ (β0 + β1 inc + β2 educ + β3 ) − Φ (β0 + β1 inc + β2 educ)

I Sample means of explanatory variables are computed in the usual way:

inc = (1/n) Σᵢ₌₁ⁿ inci ,    educ = (1/n) Σᵢ₌₁ⁿ educi ,    fem = (1/n) Σᵢ₌₁ⁿ femi

I Note that for a categorical variable such as fem, no individual can have
femi = fem : since fem ∈ (0, 1), it is the proportion of females in the sample.
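The MEA uses the same formulas but plugs in sample means (the averages below are made up); note that fem-bar is a proportion, not any individual's value of the dummy.

```python
import numpy as np
from scipy.stats import norm

b = np.array([-3.0, 0.04, 0.08, -0.25])   # hypothetical estimates

# Hypothetical sample averages (1, inc-bar, educ-bar, fem-bar).
x_bar = np.array([1.0, 35.0, 14.0, 0.52])

# MEA of income: beta1 * phi(x_bar' beta).
mea_inc = b[1] * norm.pdf(x_bar @ b)

# MEA of the dummy: Phi at fem = 1 minus Phi at fem = 0, other means fixed.
base = b[0] + b[1] * x_bar[1] + b[2] * x_bar[2]
mea_fem = norm.cdf(base + b[3]) - norm.cdf(base)
```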
Estimating probit model [1/5]

I Use maximum likelihood to estimate the probit parameters β0 , β1 , β2 , β3 .
I Consider a simple case with one explanatory variable and no intercept:

pi = P (Yi = 1 | Xi ) = Φ (βXi )

I The outcome can be described by a Bernoulli random variable:

Yi = 1 with prob pi = P (Yi = 1 | Xi ) = Φ (βXi ) ,
     0 with prob 1 − pi = P (Yi = 0 | Xi ) = 1 − Φ (βXi )

I The probability mass function of Yi conditional on Xi is given by:

f (Yi | Xi ) = pi^Yi (1 − pi )^(1−Yi) = [Φ (βXi )]^Yi [1 − Φ (βXi )]^(1−Yi)

I We want to estimate β using maximum likelihood.


Estimating probit model [2/5]

I Yi is a conditional Bernoulli variable with probability mass:

f (Yi | Xi ; β) = pi^Yi (1 − pi )^(1−Yi) = [Φ (βXi )]^Yi [1 − Φ (βXi )]^(1−Yi) .

I The logarithm of the mass function, ln f (Yi | Xi ; β), is then:

ln f (Yi | Xi ; β) = Yi ln Φ (βXi ) + (1 − Yi ) ln (1 − Φ (βXi ))

I The log-likelihood function in a sample of i = 1, . . . , n individuals is:

l (β; Y|X) = Σᵢ₌₁ⁿ ln f (Yi | Xi ; β) = Σᵢ₌₁ⁿ [ Yi ln Φ (βXi ) + (1 − Yi ) ln (1 − Φ (βXi )) ]
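This log-likelihood can be coded directly; the tiny dataset below is invented just to exercise the function.

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(beta, Y, X):
    """l(beta) = sum_i [ Y_i ln Phi(beta X_i) + (1 - Y_i) ln(1 - Phi(beta X_i)) ]."""
    p = norm.cdf(beta * X)
    return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

# Tiny illustrative sample for the one-regressor, no-intercept model.
X = np.array([-1.0, -0.5, 0.5, 1.0, 2.0])
Y = np.array([0, 1, 0, 1, 1])

ll = probit_loglik(0.8, Y, X)
```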

Estimating probit model [3/5]

I Maximize the log-likelihood with respect to β to get the MLE:

∂ l (β; Y|X) / ∂β = Σᵢ₌₁ⁿ [ Yi Φ′ (βXi ) Xi / Φ (βXi ) − (1 − Yi ) Φ′ (βXi ) Xi / (1 − Φ (βXi )) ] = 0.

I Recall that the derivative of the cdf is the pdf, Φ′ (βXi ) = φ (βXi ).
I The maximum likelihood estimator β̂ satisfies the first order condition:

Σᵢ₌₁ⁿ [ φ (β̂Xi ) / ( Φ (β̂Xi ) (1 − Φ (β̂Xi )) ) ] (Yi − Φ (β̂Xi )) Xi = 0.

I This equation cannot be solved analytically to give the MLE β̂.
I Instead, numerical iterative methods can be used.
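One numerical route (a sketch, not the slides' specific method) is to minimize the negative log-likelihood with scipy; at the optimum the weighted-residual first-order condition holds approximately.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Illustrative data: one regressor, no intercept (values are made up).
X = np.array([-1.0, -0.5, 0.5, 1.0, 2.0])
Y = np.array([0, 1, 0, 1, 1])

def neg_loglik(beta):
    p = np.clip(norm.cdf(beta * X), 1e-12, 1 - 1e-12)   # guard log(0)
    return -np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(-10.0, 10.0), method="bounded")
beta_hat = res.x

# Check the first-order condition: sum_i w_i * u_hat_i * X_i should be ~ 0.
p = norm.cdf(beta_hat * X)
foc = np.sum(norm.pdf(beta_hat * X) / (p * (1 - p)) * (Y - p) * X)
```

The probit log-likelihood is globally concave in β, so the bounded scalar minimizer finds the unique optimum here.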
Estimating probit model [4/5]
I Note that Φ (β̂Xi ) is the predicted probability P̂ (Yi = 1 | Xi ).
I In the first-order condition the following term is the predicted residual:

ûi = Yi − Φ (β̂Xi ) = Yi − P̂ (Yi = 1 | Xi )

I Note that the first term looks like some sort of a weight:

ωi = φ (β̂Xi ) / [ Φ (β̂Xi ) (1 − Φ (β̂Xi )) ] .

I The first order condition of the MLE can be described as the weighted
sum of the products of the prediction errors and the explanatory variable:

Σᵢ₌₁ⁿ [ φ (β̂Xi ) / ( Φ (β̂Xi ) (1 − Φ (β̂Xi )) ) ] (Yi − Φ (β̂Xi )) Xi = Σᵢ₌₁ⁿ ωi ûi Xi = 0.
Estimating probit model [5/5]

I Maximum likelihood gives unique values of the coefficient estimates.
I We cannot write explicit formulas for the estimators; we can only solve for them numerically.
I Under some general conditions maximum likelihood estimators are
- consistent;
- asymptotically normal;
- asymptotically efficient.
I In general, MLE are not unbiased.
I The formula for the standard errors is complicated, but STATA can compute it.

Testing hypotheses about probit coefficients [1/2]
I Consider the estimated probit model for the Brexit vote:
 
P̂ (Yi = 1 | inci , educi , femi ) = Φ (β̂0 + β̂1 inci + β̂2 educi + β̂3 femi )

I Want to test statistical significance of income, H0 : β1 = 0 vs H1 : β1 ≠ 0.
I Since probit is estimated using ML, use the likelihood ratio test.
I Estimate the unrestricted model and the restricted model:
- unrestricted: ML estimators β̂0 , β̂1 , β̂2 , β̂3 , with lu = l (β̂0 , β̂1 , β̂2 , β̂3 );
- restricted: β1 = 0 and ML estimators β̃0 , β̃2 , β̃3 , with lr = l (β̃0 , 0, β̃2 , β̃3 ).
I The restricted model imposes β1 = 0 (income excluded from the model),
but the other parameters still have to be estimated using maximum likelihood.
I The likelihood ratio test statistic with one restriction follows χ²₁ asymptotically:

LR = −2 (lr − lu ) ∼ᵃ χ²₁

I Reject H0 at level α if the sample test statistic exceeds the critical value χ²₁,₁₋α .
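A sketch of this decision rule with made-up maximized log-likelihoods (the slides report no real estimates):

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods for illustration.
l_u = -412.7   # unrestricted model
l_r = -415.9   # restricted model with beta1 = 0 imposed

LR = -2 * (l_r - l_u)          # likelihood ratio statistic
crit = chi2.ppf(0.95, df=1)    # chi-squared(1) critical value at alpha = 0.05
reject = LR > crit             # reject H0: beta1 = 0 at the 5% level
```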


Testing hypotheses about probit coefficients [2/2]

I Testing joint significance of income and education, H0 : β1 = β2 = 0.
I We have two restrictions, q = 2.
I Estimate the unrestricted model and the restricted model:
- unrestricted: ML estimators β̂0 , β̂1 , β̂2 , β̂3 , with lu = l (β̂0 , β̂1 , β̂2 , β̂3 );
- restricted: β1 = β2 = 0 and ML estimators β̃0 , β̃3 , with lr = l (β̃0 , 0, 0, β̃3 ).
I The restricted model imposes β1 = β2 = 0 (income and education excluded
from the model); the other parameters are estimated using maximum likelihood.
I The likelihood ratio test statistic with two restrictions follows χ²₂ asymptotically:

LR = −2 (lr − lu ) ∼ᵃ χ²₂

I In general, for q restrictions, the likelihood ratio test statistic follows χ²q asymptotically:

LR = −2 (lr − lu ) ∼ᵃ χ²q
