Microeconometrics_S1_AZ
M1 APE
Adam Zylbersztejn
I. Introduction
Roadmap
Binary information
Multiple examples of economic variables
represented as indicator variables:
Active (Act=1) / Inactive (Act=0)
Higher education (Educ=1) / High school or less (Educ=0)
High quality product (Qual=1) / Low quality (Qual=0)
Married (Mar=1) / Unmarried (Mar=0); etc.
Encoding binary information
Binary information → Indicator variable (0/1)
Active (Act=1)
Inactive (Act=0)
Multinomial variable
Another common type of discrete variable is the
multinomial variable:
Inactive (Emp=0) / Unemployed (Emp=1) / Student
(Emp=3) / Employed (Emp=4) / Retired (Emp=5)
Degree: MA/MSc or more (Ed=1) / BA/BSc (Ed=2) / High
school (Ed=3) / None (Ed=4)
Encoding multinomial
information
Different encodings, same information:
Inactive (Emp=0) / Unemployed (Emp=1) / Student
(Emp=3) / Employed (Emp=4) / Retired (Emp=5)
Inactive (Emp=66) / Unemployed (Emp=23) / Student
(Emp=54) / Employed (Emp=12) / Retired (Emp=47)
Inactive (Emp=« inact ») / Unemployed
(Emp=« unem ») / Student (Emp=« stud ») / Employed
(Emp=« empl ») / Retired (Emp=« reti »)
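A minimal Python sketch of why the choice of codes does not matter: each of the three encodings above splits the observations into the same groups, so the indicator variables built from them are identical. The five-person sample and the helper name `to_dummies` are illustrative assumptions, not part of the course material.

```python
# Three encodings of the same employment status (codes taken from the slide).
NUMERIC_A = {"Inactive": 0, "Unemployed": 1, "Student": 3, "Employed": 4, "Retired": 5}
NUMERIC_B = {"Inactive": 66, "Unemployed": 23, "Student": 54, "Employed": 12, "Retired": 47}
LABELS = {"Inactive": "inact", "Unemployed": "unem", "Student": "stud",
          "Employed": "empl", "Retired": "reti"}

# Hypothetical sample of five individuals.
sample = ["Employed", "Student", "Employed", "Retired", "Inactive"]


def to_dummies(codes, code_for_category):
    """Return one 0/1 indicator list per category, whatever the codes are."""
    return {
        category: [1 if c == code else 0 for c in codes]
        for category, code in code_for_category.items()
    }


dummies_a = to_dummies([NUMERIC_A[s] for s in sample], NUMERIC_A)
dummies_b = to_dummies([NUMERIC_B[s] for s in sample], NUMERIC_B)
dummies_c = to_dummies([LABELS[s] for s in sample], LABELS)

print(dummies_a["Employed"])               # indicator for "Employed"
print(dummies_a == dummies_b == dummies_c)  # same information in all three
```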
Ordinal variable
There are cases in which absolute values have no
interpretation, but their relative values do:
Full time job (FT=1) / Part time job, at least half time
(FT=2) / Less than half time job (FT=3)
→ working time: « 1 » > « 2 » > « 3 »
II.1. Indicator explanatory
variable
a.k.a. « dummy » variable
Dummy variables
An indicator variable (dummy) can take
only one of two values: 0 or 1
Dummy explanatory variable
Consider a simple model with one continuous
variable (x) and one dummy variable (d)
y = α0 + α1 x + d0·d + u
Example (assuming d0 > 0)
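[Figure: two parallel lines in the (x, y) plane, both with slope α1]
d = 1: y = (α0 + d0) + α1 x
d = 0: y = α0 + α1 x
The d = 0 line has intercept α0; the vertical gap between the two lines is d0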
Example
wage = β0 + β1 male + β2 educ + u
Mean wage, males: β0 + β1 + β2 educ
Mean wage, females: β0 + β2 educ
Mean wage (males) − mean wage (females) = β1
Example
wage = β0 + β1 male + β2 educ + u
Note that this model may also be estimated with a
dummy variable « female » (i.e. taking males as the
reference category):
wage = α0 + α1 female + α2 educ + u
This yields:
α0 = β0 − α1    α1 = −β1    α2 = β2
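These relations can be checked numerically. The sketch below fits both specifications by OLS on a small synthetic dataset (the numbers are invented purely for illustration) using a hand-written solver for the normal equations; the helper name `ols` is an assumption of this example, not a library function.

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting, then clear the column in every other row.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic data, invented purely for illustration.
male = [1, 0, 1, 0, 1, 0, 1, 0]
educ = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [5.1, 4.0, 7.9, 5.5, 9.2, 7.8, 11.0, 8.9]
female = [1 - m for m in male]

# Columns: intercept, dummy, educ.
b0, b1, b2 = ols([(1, m, e) for m, e in zip(male, educ)], wage)
a0, a1, a2 = ols([(1, f, e) for f, e in zip(female, educ)], wage)

print(f"male dummy:   b0={b0:.4f}  b1={b1:.4f}  b2={b2:.4f}")
print(f"female dummy: a0={a0:.4f}  a1={a1:.4f}  a2={a2:.4f}")
```

Both regressions produce the same fitted values; only the labelling of the reference category changes.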
Example
Males (male = 1): wage = β0 + β1 + β2 educ = β0 + 2.27 + β2 educ
Females (male = 0): wage = β0 + β2 educ
[Figure: two parallel lines, with intercepts β0 + β1 = β0 + 2.27 (males) and β0 (females)]
Example
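wage = α0 + α1 female + α2 educ
NB: here α1 < 0
[Figure: the same two parallel lines, with intercepts α0 = β0 + β1 (males, female = 0) and α0 + α1 = β0 (females, female = 1)]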
Example
Caution: one cannot jointly include an intercept
and both dummies female and male in a regression model, since
female + male = 1
for every observation (perfect collinearity with the intercept)
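A small Python sketch of this collinearity problem: with an intercept and both dummies, the matrix X'X is singular (determinant zero), so the OLS normal equations have no unique solution. The six observations below are hypothetical, and `det` and `cross_product` are helper names introduced for this example.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def cross_product(rows):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


# Hypothetical sample: 3 males and 3 females.
male = [1, 0, 1, 1, 0, 0]
female = [1 - m for m in male]

# Intercept + male + female: the last two columns sum to the first.
det_full = det(cross_product([(1, m, f) for m, f in zip(male, female)]))

# Intercept + male only: female becomes the reference category.
det_reduced = det(cross_product([(1, m) for m in male]))

print(det_full)     # 0: X'X cannot be inverted
print(det_reduced)  # non-zero: the model is estimable
```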
Model 18: OLS, using observations 1-526
Dependent variable: wage
β1 = 2.27
→ At a given level of education, the mean difference in hourly wage
between males and females equals $2.27
Alternative specification
Wage and income equations are generally
estimated in a semilogarithmic form:
ln(w)=β0+ β1 male+ β2 educ+ u
If the dependent variable is in logarithms, the
dummy variable coefficient is interpreted as the
relative mean difference between the two groups (i.e.,
in percentage terms)
Example: β1 = 0.2 → mean difference of about 20% (exactly exp(0.2) − 1 ≈ 22%)
β1 = ln(w_Male) − ln(w_Female) = ln(w_Male / w_Female)
(w_Male − w_Female) / w_Female = w_Male / w_Female − 1 = exp(β1) − 1 ≈ β1 (for β1 ≈ 0)
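A short Python check of how good the approximation exp(β1) − 1 ≈ β1 is. The coefficient values in the loop are arbitrary illustrative choices; the function name `exact_relative_gap` is introduced for this example.

```python
import math


def exact_relative_gap(beta1):
    """Exact relative difference implied by a log-wage dummy coefficient."""
    return math.exp(beta1) - 1.0


for beta1 in (0.05, 0.2, 0.5):
    print(f"beta1 = {beta1:.2f}: approximation {beta1:.1%}, "
          f"exact {exact_relative_gap(beta1):.1%}")
```

The gap between the two columns widens as β1 moves away from zero, which is why the coefficient should be read directly as a percentage only when it is small.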
Semi-logarithmic specification
Model 15: OLS, using observations 1-526
Dependent variable: lwage
II.2. Indicator variables and
complex information
Dummy variables and multiple
categories
One can use dummy variables to model discrete
information with multiple categories
Example: a geographical variable may take 4 values
(North, South, East and West)
Create three dummies to capture this
information:
North = 1 if the region is North and 0 otherwise
South = 1 if the region is South and 0 otherwise
West = 1 if the region is West and 0 otherwise
Reference category: East
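The construction above can be sketched in Python as follows. The function name `region_dummies` and the lower-case dummy names are choices made for this example.

```python
REGIONS = ["North", "South", "East", "West"]
REFERENCE = "East"  # omitted category, absorbed by the intercept


def region_dummies(region):
    """Map a region to its n - 1 = 3 indicator variables."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return {r.lower(): int(region == r) for r in REGIONS if r != REFERENCE}


for region in REGIONS:
    print(region, region_dummies(region))
```

An observation from the East has all three dummies equal to 0, which is exactly what makes it the reference category.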
Dummy variables and multiple
categories
Every multinomial variable can be transformed
into a set of dummies
The reference category is captured by the
intercept
For n categories, we only need n − 1
dummy variables
Risk of model saturation: the number of variables
may quickly become substantial (e.g., socio-economic
information), which requires redefining
the categories of interest
Dummy variables and multiple
categories
With multiple sets of dummies in the model (for
instance, one set for sex and one set for
geography), the reference group consists of the
observations belonging to all the omitted
categories (e.g., females living in the East)
Example
Geographical information with 4 categories: East,
North (+ Centre), South and West:
→ 3 dummies: east, northcen and south (reference category: West)
Logarithmic specification
Model 24: OLS, using observations 1-526
Dependent variable: lwage
Dummy variables and ordinal
information
If qualitative information is ordinal, a one-unit increase
in X may not have a constant (linear) effect on y
→ use a set of dummy variables
→ the dummies impose no ordinal structure per se, but their
coefficients can be interpreted ordinally
Example: classify insurance buyers according to risk
1: negligible risk
2: low risk
3: medium risk
4: high risk
5: substantial risk
Dummy variables and ordinal
information
Sometimes, one can recover categorical information
from quantitative data in order to pin down non-linearity
Examples:
Age category
Income category
Education level derived from the number of years of
schooling
Trade-off: flexibility vs. greater number of
coefficients to estimate
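As an illustration of recovering categories from a quantitative variable, the Python sketch below bins age into four classes and builds the corresponding dummies. The cut-off ages, the labels and the choice of reference category are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut-offs: each value is the lower bound of the next category.
AGE_CUTOFFS = [25, 35, 50]
AGE_LABELS = ["under 25", "25-34", "35-49", "50 and over"]
REFERENCE = "under 25"


def age_category(age):
    """Return the label of the age class that contains `age`."""
    return AGE_LABELS[bisect_right(AGE_CUTOFFS, age)]


def age_dummies(age):
    """One dummy per age class, omitting the reference class."""
    category = age_category(age)
    return {label: int(label == category) for label in AGE_LABELS if label != REFERENCE}


for age in (19, 25, 40, 63):
    print(age, age_category(age), age_dummies(age))
```

Four classes cost three coefficients instead of the single slope of a linear age term, which is the flexibility trade-off mentioned above.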
II.3. Dummy variables and
interactions
Interactions between dummy
variables
Interactions make it possible to measure how one qualitative
variable changes the effect that another qualitative variable
has on the dependent variable
Example: the effect of marital status on wage can
vary between males and females
If we are only interested in the difference between
males and females, or between married and unmarried,
we only need to include two variables in the model:
Model 28: OLS, using observations 1-526
Dependent variable: lwage
[Figure, case β4 > 0: y against exp, two lines with different slopes]
Males: y = β0 + β1 + (β3 + β4)·exp
Females: y = β0 + β3·exp
II.4. Dummy variables and
program evaluation
Program evaluation
Dummy variables are particularly handy when it
comes to program evaluation
Example: Income of people participating in a
training program vs. income of non-participants
Important caveat:
Some individual determinants of selection into the
program (e.g., motivation) may be unobserved and
correlated with the unobserved determinants of
wage
→ The participation dummy is not exogenous
→ The OLS estimator is biased
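A small simulation sketch of this bias, under an entirely hypothetical data-generating process: motivation is unobserved, raises both the chance of participating and income, and the true program effect is set to 2. With a single dummy regressor, the OLS coefficient equals the difference in group means, so that difference is computed directly.

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0  # assumed true effect of the program on income
N = 20000

treated, untreated = [], []
for _ in range(N):
    motivation = random.gauss(0, 1)  # unobserved by the econometrician
    # More motivated individuals are more likely to participate.
    participates = motivation + random.gauss(0, 1) > 0
    income = 10 + TRUE_EFFECT * participates + 3 * motivation + random.gauss(0, 1)
    (treated if participates else untreated).append(income)

# OLS of income on a constant and the participation dummy.
naive_estimate = sum(treated) / len(treated) - sum(untreated) / len(untreated)

print(f"true effect:    {TRUE_EFFECT:.2f}")
print(f"naive estimate: {naive_estimate:.2f}")
```

The naive estimate attributes to the program part of the income gap that is really due to motivation.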
III. Qualitative information as
dependent variable
III.1. Introduction
A little bit of history
Relatively recent models
First contributions date back to the 1940s
Berkson (Logit and Probit)
Economic applications are even more recent (starting
in the 1970s)
McFadden and Heckman (Nobel Prize 2000)
Areas of application
In various economic settings (and beyond:
psychology, sociology, etc.) the variable of interest
is binary:
Choosing to pursue higher education (e.g., works
Linear Probability Model (LPM)
Formalization:
→ yi = Xi β + εi
Interpretation of coefficients
yi = Xi β + εi
The model contains:
yi: qualitative information
Xi β + εi: quantitative information
Encoding of the qualitative information is arbitrary
Different ways of encoding yield different values of β
→ β is generally not interpretable
Interpretation of coefficients
Binary encoding 1/0 allows for modeling the
probability that dummy yi equals 1 since:
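A sketch of the standard argument, assuming E(εi | Xi) = 0 (an assumption not stated on the slide): with a 0/1 variable, the conditional expectation and the conditional probability of yi = 1 coincide.

```latex
\begin{aligned}
E(y_i \mid X_i) &= 1 \cdot P(y_i = 1 \mid X_i) + 0 \cdot P(y_i = 0 \mid X_i) \\
                &= P(y_i = 1 \mid X_i) \\
                &= X_i \beta \qquad \text{since } E(\varepsilon_i \mid X_i) = 0
\end{aligned}
```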
Linear approximation not
suitable
Simple case: univariate model
yi = β0 + β1 x1 + εi
Non-normality of residuals
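yi = Xi β + εi → εi = yi − Xi β
With binary yi:
εi = 1 − Xi β (if yi = 1)
or
εi = 0 − Xi β = −Xi β (if yi = 0)
→ for a given Xi, εi takes only two values, so it cannot be normally distributed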
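The two-point distribution of the residual can be written out explicitly. In the Python sketch below, p stands for Xi β = P(yi = 1 | Xi); the value p = 0.3 and the function names are illustrative assumptions. It also shows that the residual variance, p(1 − p), depends on Xi.

```python
def residual_distribution(p):
    """Return the (value, probability) pairs of the LPM residual when P(y = 1) = p."""
    return [(1 - p, p), (-p, 1 - p)]


def mean_and_variance(distribution):
    """Mean and variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(value * prob for value, prob in distribution)
    variance = sum((value - mean) ** 2 * prob for value, prob in distribution)
    return mean, variance


p = 0.3  # hypothetical value of X_i * beta
mean, variance = mean_and_variance(residual_distribution(p))
print(residual_distribution(p))
print(f"mean = {mean:.4f}, variance = {variance:.4f}, p(1 - p) = {p * (1 - p):.4f}")
```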
------------------------------------------------------------------------------
card | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .0188458 .003214 5.86 0.000 .0124678 .0252238
_cons | -.1594495 .089584 -1.78 0.078 -.3372261 .018327
------------------------------------------------------------------------------
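Using the coefficients reported in the table above, a short Python sketch shows why the linear approximation is not suitable: fitted "probabilities" fall below 0 or above 1 for some income levels. The units of income are not given here, and the two income values used below are arbitrary illustrative inputs.

```python
# Coefficients copied from the Stata output above.
B_CONS = -0.1594495
B_INCOME = 0.0188458


def predicted_probability(income):
    """Fitted value of the linear probability model for holding a credit card."""
    return B_CONS + B_INCOME * income


# Income levels at which the fitted line crosses 0 and 1.
zero_crossing = -B_CONS / B_INCOME
one_crossing = (1 - B_CONS) / B_INCOME

print(f"fitted value at income = 5:  {predicted_probability(5):.3f}")
print(f"fitted value at income = 70: {predicted_probability(70):.3f}")
print(f"fitted value is negative below income = {zero_crossing:.2f}")
print(f"fitted value exceeds 1 above income = {one_crossing:.2f}")
```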
Illustration: credit card and
income
Linear model for multinomial
dependent variable
Employed / Self-employed / Retired / Inactive
1/2/3/4?
2/4/1/3?
4/1/3/2?
NB: OLS treats the codes as quantities, so that 3 = 3×1 = 1.5×2 and 4 = 4×1 = 2×2!
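A minimal Python sketch of the problem: the same four observations are coded in two of the ways listed above, and the OLS slope on a regressor changes sign. The regressor values x are hypothetical and `ols_slope` is a helper written for this example.

```python
def ols_slope(x, y):
    """Slope of the univariate OLS regression of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variance = sum((xi - x_bar) ** 2 for xi in x)
    return covariance / variance


statuses = ["Employed", "Self-employed", "Retired", "Inactive"]
x = [1, 2, 3, 4]  # hypothetical regressor, one value per observation

coding_a = {"Employed": 1, "Self-employed": 2, "Retired": 3, "Inactive": 4}
coding_b = {"Employed": 4, "Self-employed": 1, "Retired": 3, "Inactive": 2}

slope_a = ols_slope(x, [coding_a[s] for s in statuses])
slope_b = ols_slope(x, [coding_b[s] for s in statuses])

print(f"slope with coding 1/2/3/4: {slope_a:.2f}")
print(f"slope with coding 4/1/3/2: {slope_b:.2f}")
```

Since the codes are arbitrary labels, neither slope has an economic interpretation.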
III.3. Solutions
More suitable estimation
methods
Binary variables
Logit / Probit
Multinomial variables
Multinomial Logit / Probit
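For the binary case, both models replace the linear index Xi β by F(Xi β), where F is a cumulative distribution function: the logistic one for Logit and the standard normal one for Probit. The Python sketch below uses these standard definitions; the index values in the loop are arbitrary illustrative inputs.

```python
import math


def logistic_cdf(z):
    """Logit link: P(y = 1 | X) = 1 / (1 + exp(-X*beta))."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Probit link: P(y = 1 | X) = Phi(X*beta), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for index in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"index = {index:+.1f}: logit {logistic_cdf(index):.4f}, "
          f"probit {normal_cdf(index):.4f}")
```

Unlike the linear probability model, both functions keep the predicted probability strictly between 0 and 1 for the index values shown.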
LPM (OLS - Stata)
Logit (GRETL)
Model 2: Logit, using observations 1-32
Dependent variable: GRADE
Standard errors based on the Hessian matrix
              Predicted
               0    1
Actual    0   18    3
          1    3    8
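The prediction table can be summarised by the share of correctly classified observations. A short Python sketch using the four counts reported above (the dictionary layout is a choice made for this example):

```python
# (actual, predicted): number of observations, from the table above.
confusion = {(0, 0): 18, (0, 1): 3, (1, 0): 3, (1, 1): 8}

total = sum(confusion.values())
correct = confusion[(0, 0)] + confusion[(1, 1)]
accuracy = correct / total

print(f"{correct} of {total} observations correctly predicted ({accuracy:.1%})")
```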
Logit (Stata)
Probit (GRETL)
Model 3: Probit, using observations 1-32
Dependent variable: GRADE
Standard errors based on the Hessian matrix
              Predicted
               0    1
Actual    0   18    3
          1    3    8
Probit (Stata)