Econometria Avanzada: Generalized Linear Models

The document discusses generalized linear models for analyzing binary and polytomous dependent variables. It introduces binary choice models where a dependent variable can take the value of 0 or 1. It describes limitations of the linear probability model for such discrete dependent variables, including heteroskedasticity and probabilities outside the 0-1 range. The document then presents generalized linear models as an alternative, specifying the distribution of the dependent variable, the link function relating its mean to predictors, and examples of typical predictor formulations. It focuses on the probit model where the standard normal distribution is used to transform predicted probabilities into the 0-1 range.

Uploaded by

Alejandro Rojas


Econometria Avanzada

Generalized Linear Models

Carlos Castro

Universidad del Rosario

1 / 30
Motivation: Binary Choice Models

Binary variables take the value 0 or 1, depending on the outcome of the observed event.
So far you have used variables of this type (dummy variables) on the right-hand side of the equation, as exogenous variables. The idea now is to use these same variables as endogenous, or dependent, variables in the models.
Discrete variables are often used to overcome problems in measuring the underlying event or to control for qualitative characteristics.

2 / 30
Example: Biometrics Literature

A chemical company runs a test to assess the effectiveness of a new pesticide through the observed tolerance of the insects. Let $y_i^*$ be the tolerance of insect $i$, with $y_i^* \sim N(0, \sigma^2)$. Tolerance is measured relative to the specific dose of pesticide, $x_i$, applied to the insect.
The observed variable $y_i$ is dichotomous: it records whether the insect is dead (0) or alive (1) after being exposed to the dose $x_i$, given its tolerance.

$$\text{prob}(y_i = 1) = \text{prob}(y_i^* > x_i)$$

$$y_i = \begin{cases} 1, & \text{if } y_i^* > x_i \\ 0, & \text{otherwise} \end{cases}$$

$y^*$ is known as a latent variable because it is not observable.
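The latent-variable mechanism can be illustrated with a small simulation. This is only a sketch: the dose and $\sigma$ values are made up for illustration, and the function names are hypothetical. It draws latent tolerances, records the observed 0/1 outcome, and compares the share of surviving insects with the theoretical probability $1 - \Phi(x/\sigma)$.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_survival(dose, sigma, n, seed=0):
    """Draw latent tolerances y* ~ N(0, sigma^2); return observed outcomes (1 = alive)."""
    rng = random.Random(seed)
    return [1 if rng.gauss(0.0, sigma) > dose else 0 for _ in range(n)]

dose, sigma = 0.5, 1.0   # illustrative values, not taken from the slides
y = simulate_survival(dose, sigma, n=100_000)

share_alive = sum(y) / len(y)                  # empirical prob(y = 1)
theoretical = 1.0 - normal_cdf(dose / sigma)   # prob(y* > x)
print(round(share_alive, 3), round(theoretical, 3))
```

Only `y` and the dose would be available to the analyst in practice; the tolerances themselves are never observed.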

4 / 30
Motivation: Polychotomous

These variables can take more than two values.

Unordered: the values have no natural order.

$$y_i = \begin{cases} 1, & \text{person } i \text{ travels by auto} \\ 2, & \text{person } i \text{ travels by carpool} \\ 3, & \text{person } i \text{ travels by subway} \\ 4, & \text{person } i \text{ travels by bus} \\ 5, & \text{person } i \text{ travels by other means} \end{cases}$$

5 / 30
Motivation: Polychotomous.

Ordered: the values have a specific (sequential) ranking.

$$y_i = \begin{cases} 1, & \text{person } i \text{ has a high school education} \\ 2, & \text{person } i \text{ has an incomplete college education} \\ 3, & \text{person } i \text{ has a complete college education} \\ 4, & \text{person } i \text{ has a graduate education} \end{cases}$$

6 / 30
Linear Models.

Consider the example of organized labor participation.

$$y_i = \begin{cases} 1, & \text{if person } i \text{ is a member of organized labor (a union)} \\ 0, & \text{otherwise} \end{cases}$$

The problem at hand is the effect of collective bargaining on wages.

In a standard Mincer equation this would not be a problem, since the dichotomous variable is on the right-hand side of the equation. However, if we are interested in the determinants of union membership, the dichotomous variable must be on the left-hand side of the equation.

7 / 30
Linear Models.

$$y_i = x_i \beta + v_i$$

where $x_i$ is a vector that collects socio-economic characteristics of the employee, such as education and work experience.
In general, the probabilities of the two outcomes are modeled as

$$\text{prob}(y_i = 1) = F(x_i, \beta)$$
$$\text{prob}(y_i = 0) = 1 - F(x_i, \beta)$$

where $\beta$ captures the effect that changes in the variables in $x_i$ have on the probability of the event. The linear probability model, which sets $F(x_i, \beta) = x_i \beta$, has a number of shortcomings.

8 / 30
Linear Model.

The linear probability model can be expressed in the following form, with $F(x, \beta) = \beta' x$:

$$E[y|x] = F(x, \beta)$$
$$y = E[y|x] + (y - E[y|x]) = \beta' x + \epsilon$$

This is the usual regression form applied in estimation when $y$ is a continuous variable.

9 / 30
Linear Models.

However, when $y$ is a discrete variable, substantial problems emerge. The first is heteroskedasticity in the error, whose variance depends on $\beta' x$. Since $\beta' x + \epsilon$ must equal 0 or 1,

$$\epsilon = -\beta' x \quad \text{with probability } 1 - F,$$
$$\epsilon = 1 - \beta' x \quad \text{with probability } F,$$

so that, with $F = \beta' x$,

$$\text{var}[\epsilon|x] = \beta' x (1 - \beta' x)$$
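The variance expression follows in two short steps from the two-point distribution of the error, using $F = \beta' x$ in the linear probability model. A worked version of the calculation:

```latex
\begin{aligned}
E[\epsilon \mid x] &= (1-F)(-\beta'x) + F(1-\beta'x) = F - \beta'x = 0, \\
\mathrm{var}[\epsilon \mid x] &= (1-F)(\beta'x)^2 + F(1-\beta'x)^2 \\
&= \beta'x\,(1-\beta'x)\left[\beta'x + (1-\beta'x)\right] = \beta'x\,(1-\beta'x).
\end{aligned}
```

For instance, an observation with $\beta' x = 0.5$ has error variance 0.25, while one with $\beta' x = 0.9$ has error variance 0.09, so the variance changes from one observation to the next.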

10 / 30
Linear Models.

The variance of $\epsilon$ is therefore not constant, because it depends on $x_i \beta$. However serious this problem is, it can be overcome using feasible generalized least squares (FGLS).
A second drawback of the linear approximation is that the predicted values of the model are not guaranteed to be valid probabilities. This is because $\beta' x$ is not necessarily in $[0, 1]$, so the model can predict uninterpretable probability values (such as negative values).
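The out-of-range problem is easy to reproduce. The sketch below fits a linear probability model by OLS to simulated data and counts the fitted values outside $[0, 1]$. The data-generating process, the sample size and the helper name `ols_simple` are assumptions made purely for illustration.

```python
import random

def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x + error; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

rng = random.Random(1)
x = [rng.uniform(-3.0, 3.0) for _ in range(2000)]
# Hypothetical process: y = 1 when x plus standard normal noise is positive.
y = [1 if xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]

a, b = ols_simple(x, y)
fitted = [a + b * xi for xi in x]   # linear "probabilities"
outside = sum(1 for p in fitted if p < 0.0 or p > 1.0)
print(f"intercept={a:.3f}, slope={b:.3f}, fitted values outside [0, 1]: {outside}")
```

Because the fitted line is unbounded, observations with large positive or negative `x` receive "probabilities" above one or below zero.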

11 / 30
Generalized Linear Models.

Generalized Linear Model

$$E(y_i) = \varphi(x_i \beta)$$

Building such a model involves three decisions:
1. What is the distribution of the data $y$ (for fixed values of the predictors, i.e. the exogenous variables, and possibly after a transformation)?
2. What function of the mean will be modeled as linear in the predictors?
3. What will the predictors be?

12 / 30
GLM: Distribution y.

Typically the vector $y$ is assumed to consist of independent measurements from a distribution in the exponential family, or similar to the exponential family:

$$l(y, \theta) = c(\theta)\, h(y) \exp\left\{ \sum_{j=1}^{r} Q_j(\theta) T_j(y) \right\}$$

where $Q_j$ and $T_j$ take values in $\mathbb{R}$.

13 / 30
GLM: Link function.
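We typically want to relate the parameters of the distribution to various predictors. We do so by modelling a transformation of the mean $\mu_i$, which will be some function of $Q(\theta)$:

$$E(y_i) = \mu_i$$
$$\varphi^{-1}(\mu_i) = x_i \beta$$

where $\varphi$ is a known function. Its inverse $\varphi^{-1}$ is called the link function, since it links the mean of $y_i$ to the linear form of the predictors. Common choices, written as $\mu_i = \varphi(x_i \beta)$:
Identity: $\mu_i = x_i \beta$
Probit link: $\mu_i = \Phi(x_i \beta)$, that is, $\Phi^{-1}(\mu_i) = x_i \beta$
Logistic link: $\mu_i = \dfrac{\exp(x_i \beta)}{1 + \exp(x_i \beta)}$
Poisson link (count): $\mu_i = \exp(x_i \beta)$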
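The four choices listed above can be written as functions that map the linear predictor $x_i \beta$ (called `eta` here, a name chosen for this sketch) into the mean $\mu_i$. The function names are hypothetical, and the probit case uses the error function from the standard library to evaluate the standard normal CDF.

```python
import math

def identity_mean(eta):
    """Identity: the mean equals the linear predictor."""
    return eta

def probit_mean(eta):
    """Probit: standard normal CDF evaluated at the linear predictor."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def logistic_mean(eta):
    """Logistic: exp(eta) / (1 + exp(eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def poisson_mean(eta):
    """Poisson (count data): exponential of the linear predictor."""
    return math.exp(eta)

for eta in (-2.0, 0.0, 2.0):
    print(eta, identity_mean(eta), probit_mean(eta),
          logistic_mean(eta), poisson_mean(eta))
```

Only the probit and logistic functions keep the mean inside $[0, 1]$, which is what a binary outcome requires; the exponential keeps it positive, as a count requires.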

14 / 30
Predictors.

Generally, predictors are considered to have a linear form $x_i \beta$. The same considerations apply as in the linear model with respect to interactions, treatments (dummies), non-linear relationships, and the incorporation of random factors (generalized linear mixed models, GLMM).

15 / 30
Model Speciﬁcation.

To overcome the shortcomings of the linear specification (mentioned in the introduction), it is necessary to guarantee that the predicted probability, $\text{prob}(y_i = 1) = F(x_i \beta)$, lies in $[0, 1]$. This requires a transformation through a function $F$ such that $F: \mathbb{R} \to [0, 1]$. In the linear model there is no transformation, since $F$ is the identity function:

$$\text{prob}(y_i = 1) = x_i \beta$$

Any continuous cumulative distribution function defined on $\mathbb{R}$ can be used.

16 / 30
Probit Model: Speciﬁcation.

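The function used for the transformation is the standard normal cumulative distribution function $\Phi$:

$$\text{prob}(y_i = 1) = \Phi(x_i \beta) = \int_{-\infty}^{x_i \beta} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{z^2}{2} \right\} dz$$

Using the standard normal distribution function $\Phi(x_i \beta)$ restricts the range of values to $[0, 1]$, since

$$\lim_{x \to +\infty} \Phi(x) = 1 \quad \text{and} \quad \lim_{x \to -\infty} \Phi(x) = 0$$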

17 / 30
Probit: Speciﬁcation.

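With the latent variable $y_i^* = x_i \beta + \epsilon_i$ and $\epsilon_i \sim N(0, \sigma^2)$:

$$\begin{aligned} \text{prob}(y_i = 1) &= \text{prob}(y_i^* > 0) \\ &= \text{prob}(x_i \beta + \epsilon_i > 0) \\ &= \text{prob}(\epsilon_i > -x_i \beta) \\ &= \text{prob}\left( \frac{\epsilon_i}{\sigma} > \frac{-x_i \beta}{\sigma} \right) \end{aligned}$$

Dividing by $\sigma$ in the last expression guarantees that we are working with the standard normal distribution, which is symmetric around zero, so

$$\begin{aligned} \text{prob}(y_i = 1) &= \text{prob}\left( \frac{\epsilon_i}{\sigma} > \frac{-x_i \beta}{\sigma} \right) \\ &= \text{prob}\left( \frac{\epsilon_i}{\sigma} < \frac{x_i \beta}{\sigma} \right) \\ &= \Phi\left( \frac{x_i \beta}{\sigma} \right) \end{aligned}$$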

19 / 30
Probit: Estimation.

Given a sample of $n$ independent observations, where each observation may be on a different individual, the likelihood function can be obtained in the following way:

$$\text{prob}(y_i = 1) = \Phi\left( \frac{x_i \beta}{\sigma} \right)$$
$$\text{prob}(y_i = 0) = 1 - \text{prob}(y_i = 1) = 1 - \Phi\left( \frac{x_i \beta}{\sigma} \right)$$

Since the $y_i$ are independent, the likelihood function is the product of the probabilities of the individual observations. Order the sample so that observations $1, \ldots, m$ have $y_i = 0$ and the remaining $n - m$ observations, $m + 1, \ldots, n$, have $y_i = 1$.

20 / 30
Probit: Estimation.

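$$\begin{aligned} L &= p(y_1 = 0)\, p(y_2 = 0) \cdots p(y_m = 0)\, p(y_{m+1} = 1) \cdots p(y_n = 1) \\ &= \prod_{i=1}^{m} \left[ 1 - \Phi\left( \frac{x_i \beta}{\sigma} \right) \right] \prod_{i=m+1}^{n} \Phi\left( \frac{x_i \beta}{\sigma} \right) \\ &= \prod_{i=1}^{n} \Phi\left( \frac{x_i \beta}{\sigma} \right)^{y_i} \left[ 1 - \Phi\left( \frac{x_i \beta}{\sigma} \right) \right]^{1 - y_i} \end{aligned}$$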

21 / 30
Probit: Estimation.

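The log-likelihood function has the following form:

$$\ln L = \sum_{i=1}^{n} \left\{ y_i \ln F(x_i \beta) + (1 - y_i) \ln[1 - F(x_i \beta)] \right\}$$

First-order condition for the log-likelihood:

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} y_i \frac{f(x_i \beta)}{F(x_i \beta)} x_i - \sum_{i=1}^{n} (1 - y_i) \frac{f(x_i \beta)}{1 - F(x_i \beta)} x_i = 0$$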
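The log-likelihood and its first-order condition can be checked numerically. The sketch below assumes the probit case with the normalization $\sigma = 1$, uses a tiny made-up data set, and compares the analytical score with a central finite-difference gradient. The names `loglik` and `score` are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik(beta, X, y):
    """ln L = sum of y*ln F(x'b) + (1 - y)*ln[1 - F(x'b)], with F = normal CDF."""
    total = 0.0
    for xi, yi in zip(X, y):
        F = norm_cdf(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(F) + (1 - yi) * math.log(1.0 - F)
    return total

def score(beta, X, y):
    """Gradient of ln L: sum of [y*f/F - (1 - y)*f/(1 - F)] * x."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        F, f = norm_cdf(eta), norm_pdf(eta)
        weight = yi * f / F - (1 - yi) * f / (1.0 - F)
        for k, v in enumerate(xi):
            grad[k] += weight * v
    return grad

# Made-up data: a constant and one regressor.
X = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 1.5)]
y = [0, 0, 1, 0, 1, 1]
beta = [0.1, 0.3]   # arbitrary evaluation point, not the maximizer

h = 1e-6
numeric = []
for k in range(len(beta)):
    up, down = list(beta), list(beta)
    up[k] += h
    down[k] -= h
    numeric.append((loglik(up, X, y) - loglik(down, X, y)) / (2 * h))

analytic = score(beta, X, y)
print(analytic, numeric)
```

The maximum likelihood estimate is the value of `beta` at which `score` returns a vector of zeros.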

22 / 30
Probit: Estimation.

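This function must be maximized using numerical methods, since the first-order conditions are nonlinear in $\beta$. The most common numerical method is the Newton-Raphson algorithm:

$$\hat{\beta}_{t+1} = \hat{\beta}_t - \left[ \left. \frac{\partial^2 \ln L}{\partial \beta \partial \beta'} \right|_{\hat{\beta}_t} \right]^{-1} \left[ \left. \frac{\partial \ln L}{\partial \beta} \right|_{\hat{\beta}_t} \right]$$

At convergence, the resulting maximum likelihood estimator $\hat{\beta}$ is consistent, asymptotically efficient and asymptotically normally distributed.
The asymptotic covariance matrix is estimated by:

$$-\left[ \left. \frac{\partial^2 \ln L}{\partial \beta \partial \beta'} \right|_{\hat{\beta}} \right]^{-1}$$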

23 / 30
Logit: Speciﬁcation.

The function used for the transformation is the logistic cumulative distribution function:

$$\text{prob}(y_i = 1) = \Lambda(x_i \beta) = \frac{\exp\{x_i \beta\}}{1 + \exp\{x_i \beta\}}$$

The probit and logit models are the two most commonly used. The choice between them is sometimes based on the data used. Both distributions are symmetric. However, the tails of the logistic density function are heavier (wider) than those of the standard normal, so the two models are most likely to differ in the probabilities they assign to extreme values.
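The difference in the tails can be made concrete by comparing the two distribution functions a few standard deviations below zero. As an assumption of this sketch, the logistic argument is rescaled by $\pi / \sqrt{3}$, the standard deviation of the standard logistic distribution, so that both distributions are compared with unit variance.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

# Standard deviation of the standard logistic distribution.
SCALE = math.pi / math.sqrt(3.0)

def tail_probs(z):
    """Lower-tail probability z standard deviations from zero: (normal, logistic)."""
    return norm_cdf(z), logistic_cdf(z * SCALE)

for z in (-2.0, -3.0, -4.0):
    p_normal, p_logistic = tail_probs(z)
    print(f"z={z}: normal={p_normal:.6f}, logistic={p_logistic:.6f}")
```

Far from zero the logistic assigns noticeably more probability than the normal, which is where predictions from the two models tend to diverge.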

24 / 30
Logit: Estimation.

The logit model is also estimated by maximum likelihood.
First-order condition for the log-likelihood:

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} (y_i - \Lambda_i) x_i = 0$$

where $\Lambda_i = \Lambda(x_i \beta)$.

25 / 30
Logit: Estimation.

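This equation must be solved using numerical methods, since it is nonlinear in $\beta$. The most common numerical method is the Newton-Raphson algorithm:

$$\hat{\beta}_{t+1} = \hat{\beta}_t - \left[ \left. \frac{\partial^2 \ln L}{\partial \beta \partial \beta'} \right|_{\hat{\beta}_t} \right]^{-1} \left[ \left. \frac{\partial \ln L}{\partial \beta} \right|_{\hat{\beta}_t} \right]$$

At convergence, the resulting maximum likelihood estimator $\hat{\beta}$ is consistent, asymptotically efficient and asymptotically normally distributed.
The asymptotic covariance matrix is estimated by:

$$-\left[ \left. \frac{\partial^2 \ln L}{\partial \beta \partial \beta'} \right|_{\hat{\beta}} \right]^{-1}$$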
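A minimal sketch of the Newton-Raphson iteration for a logit with an intercept and a single regressor is given below. It relies on the standard logit Hessian, $-\sum_i \Lambda_i (1 - \Lambda_i) x_i x_i'$, which does not appear in the slides, and writes out the $2 \times 2$ inverse by hand. The simulated data, the true coefficients $(-0.5, 1.0)$ and the function name `newton_logit` are assumptions for illustration.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logit(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for prob(y=1) = Lambda(b0 + b1*x); returns (b0, b1), covariance."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # score: sum of (y - Lambda) * (1, x)
        h00 = h01 = h11 = 0.0   # Hessian: -sum of Lambda*(1 - Lambda) * (1, x)(1, x)'
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 -= w
            h01 -= w * xi
            h11 -= w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step H^{-1} g, using the explicit inverse of a symmetric 2x2 matrix.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 - s0, b1 - s1
        if abs(s0) + abs(s1) < tol:
            break
    # Minus the inverse Hessian from the last iteration (negligibly different
    # from the Hessian at the final estimate once the step is below tol).
    cov = [[-h11 / det, h01 / det], [h01 / det, -h00 / det]]
    return (b0, b1), cov

rng = random.Random(2)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if rng.random() < logistic(-0.5 + 1.0 * xi) else 0 for xi in x]

(b0_hat, b1_hat), cov = newton_logit(x, y)
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print(b0_hat, b1_hat, std_errors)
```

Because the logit log-likelihood is globally concave, the iteration typically converges in a handful of steps from a zero starting value; the square roots of the diagonal of `cov` give the asymptotic standard errors.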

26 / 30
Marginal Eﬀects.

The parameters of models with discrete dependent variables, like those of any nonlinear regression model, are not necessarily the marginal effects that one is accustomed to analyzing. In general,

$$\frac{\partial E[y|x]}{\partial x} = \left\{ \frac{dF(\beta' x)}{d(\beta' x)} \right\} \beta = f(\beta' x) \beta$$

where $f(\cdot)$ is the density function that corresponds to the cumulative distribution $F(\cdot)$.

27 / 30
Marginal Eﬀects.

For the normal distribution, this result is:

$$\frac{\partial E[y|x]}{\partial x} = \phi(\beta' x) \beta$$

where $\phi(t)$ is the standard normal density.

28 / 30
Marginal Eﬀects.

For the logistic distribution,

$$\frac{d\Lambda[\beta' x]}{d(\beta' x)} = \frac{\exp\{\beta' x\}}{(1 + \exp\{\beta' x\})^2} = \Lambda(\beta' x)[1 - \Lambda(\beta' x)]$$

Then for the logit model the marginal effect is:

$$\frac{\partial E[y|x]}{\partial x} = \Lambda(\beta' x)[1 - \Lambda(\beta' x)] \beta$$

These values vary with the values of $x$. In interpreting the estimated model, it is useful to calculate them at, say, the means of the regressors.
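The two marginal-effect formulas can be evaluated at the regressor means in a few lines. The coefficient vector and the means below are hypothetical numbers chosen for illustration, and the function names are not from the slides.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def probit_marginal_effects(beta, x):
    """phi(beta'x) * beta, evaluated at the point x."""
    eta = sum(b * v for b, v in zip(beta, x))
    return [norm_pdf(eta) * b for b in beta]

def logit_marginal_effects(beta, x):
    """Lambda(beta'x) * [1 - Lambda(beta'x)] * beta, evaluated at the point x."""
    lam = logistic_cdf(sum(b * v for b, v in zip(beta, x)))
    return [lam * (1.0 - lam) * b for b in beta]

beta = [0.2, 0.5, -0.3]   # hypothetical estimates; the first entry is the intercept
x_bar = [1.0, 2.0, 1.5]   # hypothetical regressor means (1.0 for the constant)

print(probit_marginal_effects(beta, x_bar))
print(logit_marginal_effects(beta, x_bar))
```

The first element of each list corresponds to the constant and has no marginal-effect interpretation. The scale factor is largest at $\beta' x = 0$, where it equals $1/\sqrt{2\pi} \approx 0.4$ for the probit and $1/4$ for the logit.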

29 / 30
Goodness-of-ﬁt.

Goodness-of-fit measures are often based, implicitly or explicitly, on a comparison between a model with only a constant as explanatory variable and a model with other covariates. Let $L_1$ be the maximum log-likelihood value of the model of interest and $L_0$ the maximum log-likelihood value when all of the parameters except the intercept are set to zero. We know that $L_1 \geq L_0$. The intuition is that the larger the difference, the more the extended model adds to the restricted model.
Pseudo-$R^2$, where $N$ is the number of observations:

$$1 - \frac{1}{1 + \dfrac{2(L_1 - L_0)}{N}}$$

McFadden $R^2$:

$$1 - \frac{L_1}{L_0}$$
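Both measures are simple functions of the two maximized log-likelihood values. A short sketch, with made-up log-likelihood values and hypothetical function names:

```python
def pseudo_r2(l1, l0, n):
    """1 - 1 / (1 + 2*(L1 - L0)/N), with L1 and L0 maximized log-likelihoods."""
    return 1.0 - 1.0 / (1.0 + 2.0 * (l1 - l0) / n)

def mcfadden_r2(l1, l0):
    """1 - L1/L0."""
    return 1.0 - l1 / l0

# Illustrative values only: log-likelihoods are negative and L1 >= L0.
l0, l1, n = -120.0, -90.0, 200
print(pseudo_r2(l1, l0, n), mcfadden_r2(l1, l0))
```

Both measures equal zero when the covariates add nothing ($L_1 = L_0$) and increase as the gap between $L_1$ and $L_0$ widens.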

30 / 30
