Ordered Probit Model
Nathan Carroll
Universität Regensburg
Abstract
Ordered discrete dependent variable models such as ordered probit and ordered logit
are frequently used across the social sciences to study outcomes including health status,
happiness, wealth and educational attainment. Unlike in the case of OLS, unaccounted
for heteroskedasticity in these models can lead to biased parameter estimates. This paper
introduces the oglmx package for the R statistical environment that permits estimation
of generalized models that allow the user to model the form of the heteroskedasticity.
1. Introduction
Ordered discrete dependent variable models are common across the social sciences. Examples
of outcomes that require such models include categorical measures of health status (Case,
Lubotsky, and Paxson 2002), happiness or life-satisfaction (Gerdtham and Johannesson 2001),
wealth (Hartog and Oosterbeek 1998) and educational attainment (Dearden, Meghir, and
Ferri 2002). Standard models such as ordered probit and ordered logit assume that error
variances are constant across observations, or homoskedastic. When ordinary least
squares is used to estimate a linear relationship in the presence of heteroskedasticity in the error
term, parameter estimates remain consistent, though standard errors need to be adjusted via
a variance-covariance estimator that takes account of the heteroskedasticity. However, in the
case of models such as ordered probit and ordered logit, failure to account for heteroskedasticity
can lead to biased parameter estimates in addition to misspecified standard errors. This paper
introduces the oglmx package developed for the R statistical environment (R Core Team 2015)
that allows the user to model the form of the heteroskedasticity in various ordered discrete
dependent variable models.
The simplest discrete dependent variable models, those for the case of binary outcomes, can
be estimated using the glm function available in the core distribution of R, while ordered
dependent variable models beyond the binary case are included in the MASS package (Venables
and Ripley 2002) under the polr function. The standard ordered probit and logit models
include a normalization of the error variance, which implies that the scale of the estimated
parameters is of little relevance to the researcher. Instead, researchers are often interested
in the marginal effects of particular variables on the probabilities of each observable value of
the dependent variable. R packages returning marginal effects include erer (Sun 2014) and
2 oglmx: A Package for Estimation of Ordered Generalized Linear Models.
mfx (Fernihough 2014). The oglmx package includes a margins function that returns marginal
effects (and their standard errors) for all models estimated by the oglmx function. The package
includes link functions for probit, logit, cauchit, complementary log-log and log-log while
allowing the user to specify the functional form used to model the variance of the error
term. The function is written so that it is sufficiently flexible to allow estimation of interval
regression with fixed boundaries across observations in addition to the ordered models
provided by function polr.[1] The oglmx package makes use of the maxLik package (Henningsen and
Toomet 2011) to maximise the likelihood for the user-specified model.
The paper is organised as follows: Section 2 describes the models estimated by the oglmx
function, section 3 gives an outline of how the core functions of the package work, section 4
provides a working example and section 5 concludes.
\[ y^{*} = x\beta + \sigma\varepsilon \]
where x is a 1 × K vector of explanatory variables that may or may not contain a constant
depending on the particular model to be estimated, β is a K × 1 vector of parameters, ε is a
mean zero random error term and σ is a parameter that allows the variance of the error term
to be shifted up or down. Let α1 < α2 < ... < αJ−1 be threshold parameters that determine
the observed outcome as follows:
\[
y = \begin{cases}
0 & \text{if } y^{*} \le \alpha_1 \\
1 & \text{if } \alpha_1 < y^{*} \le \alpha_2 \\
\vdots & \\
J-1 & \text{if } y^{*} > \alpha_{J-1}.
\end{cases}
\]
Given a distribution function for the error term ε, a vector of parameters β and the set of
threshold parameters, we can obtain the probabilities for each of the outcomes:
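\[
\begin{aligned}
P(y = 0) &= P(y^{*} \le \alpha_1) = P\left(\varepsilon \le \frac{\alpha_1 - x\beta}{\sigma}\right) \\
P(y = 1) &= P(y^{*} \le \alpha_2) - P(y^{*} \le \alpha_1) = P\left(\varepsilon \le \frac{\alpha_2 - x\beta}{\sigma}\right) - P\left(\varepsilon \le \frac{\alpha_1 - x\beta}{\sigma}\right) \\
&\;\;\vdots \\
P(y = J-1) &= P(y^{*} > \alpha_{J-1}) = P\left(\varepsilon > \frac{\alpha_{J-1} - x\beta}{\sigma}\right).
\end{aligned}
\]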
[1] The oglm command in Stata (?) offers a similar set of models to the oglmx function but omits interval
regression and does not allow flexibility in the function used to model the variance of the error term.
In general, with the conventions α0 = −∞ and αJ = +∞, the probability of outcome j is:
\[
\begin{aligned}
P(y = j) &= P\left(\varepsilon \le \frac{\alpha_{j+1} - x\beta}{\sigma}\right) - P\left(\varepsilon \le \frac{\alpha_j - x\beta}{\sigma}\right) \\
&= F\left(\frac{\alpha_{j+1} - x\beta}{\sigma}\right) - F\left(\frac{\alpha_j - x\beta}{\sigma}\right)
\end{aligned}
\]
where F is the assumed cumulative distribution function (cdf) for the error term ε. The
models estimated by the oglmx function differ in two respects. The first is the assumed distribution
of the error term, e.g. the logistic distribution for ordered logit and the standard normal distribution
for ordered probit. The second is which parameters are known and which are estimated. For example,
under interval regression the threshold parameters are known while the constant in the latent
variable equation and the variance of the error term are estimated, whereas with ordered probit
the levels of the constant and the variance of the error term are imposed while the threshold
parameters are estimated.
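The outcome probabilities above can be sketched in a few lines of base R. This is only an illustration: the function name `outcome_probs` and the numerical values are assumptions chosen for the example and are not part of the oglmx package.

```r
# Outcome probabilities for an ordered model with J outcomes (illustrative sketch).
# xb:    latent mean x %*% beta for one observation (scalar)
# alpha: increasing threshold parameters, length J - 1
# sigma: scale of the error term
# cdf:   distribution function F of the error term
outcome_probs <- function(xb, alpha, sigma = 1, cdf = pnorm) {
  cuts  <- c(-Inf, alpha, Inf)                  # alpha_0 = -Inf, alpha_J = +Inf
  upper <- cdf((cuts[-1] - xb) / sigma)
  lower <- cdf((cuts[-length(cuts)] - xb) / sigma)
  upper - lower                                 # P(y = 0), ..., P(y = J - 1)
}

# Hypothetical values: same thresholds, normal versus logistic errors.
p_probit <- outcome_probs(xb = 0.5, alpha = c(-0.5, 0.5, 1.5))
p_logit  <- outcome_probs(xb = 0.5, alpha = c(-0.5, 0.5, 1.5), cdf = plogis)
print(round(rbind(probit = p_probit, logit = p_logit), 4))
```

Swapping the `cdf` argument is all that distinguishes the ordered probit from the ordered logit in this sketch, which mirrors the way the models differ only in the assumed F.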
To allow for heteroskedasticity the variance of the error term is permitted to vary by allowing
σ to be determined by the following equation:
\[ \sigma = g(z\delta) \]
where z is a 1 × L vector of variables that explain the level of the variance and δ is an
L × 1 vector of parameters. As was the case for the vector x, z may or may not include a constant. The
function g(.) should ideally return a positive value for all observed levels of the variables in z;
with this in mind, the default option of the package is to use the exponential function. There
is no restriction regarding the choice of variables in z: it may contain the same variables as x
or be entirely different.
The oglmx function obtains estimates of the parameters of the model by maximising the
log-likelihood function, which for a sample consisting of n observations is given by:
\[
L(\beta, \delta, \alpha) = \sum_{i=1}^{n} \sum_{j=0}^{J-1} I(y_i = j)\, \log\left[ F\left(\frac{\alpha_{j+1} - x_i\beta}{g(z_i\delta)}\right) - F\left(\frac{\alpha_j - x_i\beta}{g(z_i\delta)}\right) \right].
\]
where I (.) is the indicator function. Following the usual properties of maximum likelihood
estimators the parameter estimates obtained from maximising the likelihood are consistent
and asymptotically normal and the asymptotic variance of the estimated parameters can
be estimated straightforwardly (Wooldridge 2002). The main body of code in the oglmx
function calculates the above log-likelihood, score vector and Hessian matrix given a vector
of parameter values. This function is passed as an argument to a Newton-Raphson type
algorithm via the maxLik package.
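To make the log-likelihood concrete, the following is a minimal base R sketch for a heteroskedastic ordered probit with three outcomes, one variable in the mean equation and one in the variance equation. It is not the package's implementation: the data are simulated, all names are illustrative, and the general-purpose `optim` function with numerical derivatives stands in for the Newton-Raphson routine with analytic derivatives described above.

```r
set.seed(1)
n <- 400
x <- rnorm(n)                                   # enters the mean equation
z <- rnorm(n)                                   # enters the variance equation
ystar <- 1.0 * x + exp(0.4 * z) * rnorm(n)      # latent variable, no constants
alpha_true <- c(-0.5, 0.5)
y <- findInterval(ystar, alpha_true, left.open = TRUE)  # outcomes 0, 1, 2

# Negative log-likelihood with F = standard normal cdf and g(.) = exp(.).
# par = (beta, delta, alpha_1, log(alpha_2 - alpha_1)); the last element is a
# reparameterisation (an assumption of this sketch) that keeps alpha_1 < alpha_2.
negloglik <- function(par) {
  beta  <- par[1]
  delta <- par[2]
  cuts  <- c(-Inf, par[3], par[3] + exp(par[4]), Inf)
  sigma <- exp(delta * z)
  upper <- pnorm((cuts[y + 2] - beta * x) / sigma)   # alpha_{j+1} term
  lower <- pnorm((cuts[y + 1] - beta * x) / sigma)   # alpha_j term
  -sum(log(pmax(upper - lower, 1e-300)))             # guard against log(0)
}

start <- c(beta = 0, delta = 0, alpha1 = -1, log_gap = 0)
fit <- optim(start, negloglik, method = "BFGS")
est <- c(fit$par[1:3], alpha2 = unname(fit$par[3] + exp(fit$par[4])))
print(round(est, 3))
```

The constant in the mean equation and the constant in the variance equation are both omitted here, which corresponds to the identifying restrictions of the ordered probit in which the thresholds are estimated.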
Many popular models are included under the above framework as the distribution function
F (.) and the parameters that are estimated or imposed are varied. Table 1 lists some of the
models and the parameter restrictions imposed in each case.
the variance of the error term to unity and the constant equal to zero. Instead researchers
are interested in the marginal effect of a variable on the probability of each outcome. In
homoskedastic models the signs of regression coefficients are informative of the sign of the
marginal effects for outcomes at the extreme of the distribution, but not for intermediate
outcomes. In contrast, in a heteroskedastic model the sign of a variable's coefficient(s) is
on its own uninformative about the sign of any marginal effect when the variable also enters the equation for
the variance. For a continuous variable v that is the kth element of vector x and the
lth element of vector z, the marginal effect of that variable on the probability of outcome j,
denoted MEj(x, z), is given by:
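\[
\begin{aligned}
ME_j(x, z) = \frac{\partial P(y = j)}{\partial v}
= &-\frac{\beta_k}{g(z\delta)} \left[ f\left(\frac{\alpha_{j+1} - x\beta}{g(z\delta)}\right) - f\left(\frac{\alpha_j - x\beta}{g(z\delta)}\right) \right] \\
&-\frac{\delta_l\, g'(z\delta)}{g(z\delta)} \left[ \frac{\alpha_{j+1} - x\beta}{g(z\delta)}\, f\left(\frac{\alpha_{j+1} - x\beta}{g(z\delta)}\right) - \frac{\alpha_j - x\beta}{g(z\delta)}\, f\left(\frac{\alpha_j - x\beta}{g(z\delta)}\right) \right]
\end{aligned}
\tag{1}
\]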
where f(.) is the probability density function of the error term ε. In general the sign of marginal
effects depends on the sign of the relevant coefficients and the relative value of the mean of
the latent variable (xβ) and the relevant threshold parameters αj and αj+1. Equation 1 is the
correct formula when the variable under consideration is continuous. For a binary
variable, however, it may be preferable to consider the full change of the variable from zero to one rather
than a change at the margin. In this case the marginal effect of a binary variable contained
in x and/or z is calculated using:
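\[
ME_j(x, z) = \left[ F\left(\frac{\alpha_{j+1} - x_1\beta}{g(z_1\delta)}\right) - F\left(\frac{\alpha_j - x_1\beta}{g(z_1\delta)}\right) \right] - \left[ F\left(\frac{\alpha_{j+1} - x_0\beta}{g(z_0\delta)}\right) - F\left(\frac{\alpha_j - x_0\beta}{g(z_0\delta)}\right) \right]
\tag{2}
\]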
where x1 and z1 denote vectors with the variable of interest set equal to one while x0 and z0
set the variable equal to zero.
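Equations 1 and 2 can be checked numerically. The sketch below assumes a normal error term, g(.) = exp(.) (so that g'(zδ)/g(zδ) = 1) and, purely to keep the code short, a single variable v that enters both the mean and the variance equation. The function names and parameter values are illustrative and not taken from the package.

```r
# Outcome probabilities when the same variable v drives mean and variance.
outcome_probs <- function(v, beta, delta, alpha) {
  cuts  <- c(-Inf, alpha, Inf)
  sigma <- exp(delta * v)
  diff(pnorm((cuts - beta * v) / sigma))
}

# Equation 1: analytic marginal effect of a continuous v on each P(y = j).
me_continuous <- function(v, beta, delta, alpha) {
  sigma <- exp(delta * v)
  u     <- (c(-Inf, alpha, Inf) - beta * v) / sigma
  fu    <- dnorm(u)
  ufu   <- ifelse(is.finite(u), u * fu, 0)      # u * f(u) is zero at +/- Inf
  -(beta / sigma) * diff(fu) - delta * diff(ufu)
}

# Equation 2: discrete change when a binary v moves from 0 to 1.
me_binary <- function(beta, delta, alpha) {
  outcome_probs(1, beta, delta, alpha) - outcome_probs(0, beta, delta, alpha)
}

beta <- 1; delta <- 0.5; alpha <- c(-0.5, 0.5, 1.5)   # hypothetical values
v0 <- 0.3; h <- 1e-6
numeric_me <- (outcome_probs(v0 + h, beta, delta, alpha) -
               outcome_probs(v0 - h, beta, delta, alpha)) / (2 * h)

print(round(rbind(analytic = me_continuous(v0, beta, delta, alpha),
                  numeric  = numeric_me,
                  binary   = me_binary(beta, delta, alpha)), 4))
```

Because the outcome probabilities always sum to one, the marginal effects across the J outcomes sum to zero under either formula, which gives a quick sanity check on any implementation.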
Equations 1 and 2 tell us how to calculate the marginal effect given a particular set of values
of the components of the vectors x and z. Two main methods are used to summarize the
marginal effects for a sample of data, the marginal effect at mean (MEM) and the average
marginal effect (AME). The MEM for a particular variable calculates the marginal effect
supposing that all variables were at the means for the sample, that is:
\[ MEM_j = ME_j(\bar{x}, \bar{z}) \]
where x̄ and z̄ denote vectors of means. The AME calculates the marginal effect for each
observed set of variables xi and zi and averages the marginal effects across the sample, that
is:
\[ AME_j = \frac{1}{n} \sum_{i=1}^{n} ME_j(x_i, z_i). \]
The margins.oglmx function included in the oglmx package can calculate either of these two
measures of the marginal effect. Because the marginal effects are a non-linear function of the estimated
parameters, their standard errors are approximated via the delta method, which is the
approach used by margins.oglmx.
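The difference between the MEM and the AME, and the delta-method standard errors, can be illustrated with base R alone. The sketch below deliberately uses a simplification: a binary probit fitted with `glm` (the J = 2 case without heteroskedasticity) on simulated data, with a numerical gradient in place of analytic derivatives. It is not how margins.oglmx is implemented, and all names are illustrative.

```r
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- as.numeric(0.2 + 0.8 * x1 - 0.5 * x2 + rnorm(n) > 0)

fit  <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
X    <- model.matrix(fit)
bhat <- coef(fit)
V    <- vcov(fit)                               # estimated parameter covariance

# Marginal effect of x1 on P(y = 1), written as a function of the parameters.
mem_fun <- function(b) dnorm(sum(colMeans(X) * b)) * b[["x1"]]   # at the means
ame_fun <- function(b) mean(dnorm(drop(X %*% b))) * b[["x1"]]    # sample average

# Central-difference gradient of a scalar function of the parameter vector.
num_grad <- function(f, b, h = 1e-6) {
  sapply(seq_along(b), function(k) {
    e <- replace(numeric(length(b)), k, h)
    (f(b + e) - f(b - e)) / (2 * h)
  })
}

# Delta method: Var(m(b)) is approximated by grad' V grad.
delta_se <- function(f, b, V) {
  g <- num_grad(f, b)
  sqrt(drop(t(g) %*% V %*% g))
}

print(c(MEM = mem_fun(bhat), se = delta_se(mem_fun, bhat, V)))
print(c(AME = ame_fun(bhat), se = delta_se(ame_fun, bhat, V)))
```

The same recipe carries over to the ordered, heteroskedastic case: only the function mapping parameters to the marginal effect changes, while the gradient and covariance steps stay the same.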
3. Package implementation
The current implementation of the package follows the standard for model estimation in R:
a user-facing function oglmx takes as input a formula for the empirical model being
estimated and the data frame from which the data are to be sourced, and calls an oglmx.fit
function that estimates the model. The oglmx.fit function contains further functions
that return the analytic log-likelihood, score vector and Hessian for a given parameter vector.
These are used by the likelihood maximization procedure of the maxLik package, which,
given the analytic Hessian, performs the optimization with a Newton-Raphson algorithm by
default.
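The role of the analytic score and Hessian in a Newton-Raphson iteration can be seen in a small stand-alone example. The sketch below is an assumption-laden simplification: it hand-codes the update for a binary probit (the J = 2 special case) on simulated data rather than calling maxLik, and it cross-checks the result against `glm`. It does not reproduce the internals of oglmx.fit.

```r
set.seed(7)
n <- 500
X <- cbind(1, rnorm(n))                         # constant and one regressor
y <- as.numeric(drop(X %*% c(0.3, 0.8)) + rnorm(n) > 0)
q <- 2 * y - 1                                  # +1 if y = 1, -1 if y = 0

beta <- c(0, 0)                                 # starting values
for (iter in 1:50) {
  xb    <- drop(X %*% beta)
  lam   <- q * dnorm(q * xb) / pnorm(q * xb)    # derivative of log-lik_i w.r.t. x_i beta
  score <- colSums(lam * X)                     # analytic score vector
  hess  <- -crossprod(X, (lam * (lam + xb)) * X) # analytic Hessian
  step  <- solve(hess, score)
  beta  <- beta - step                          # Newton-Raphson update
  if (max(abs(step)) < 1e-10) break
}

# Cross-check against the iteratively reweighted least squares fit from glm.
check <- coef(glm(y ~ X[, 2], family = binomial(link = "probit")))
print(rbind(newton = beta, glm = unname(check)))
```

Supplying exact derivatives, as the package does, avoids the noise of finite-difference approximations and typically lets the iteration converge in a handful of steps.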
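\[ y^{*} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \sigma\varepsilon \]
where ε will have a standard normal distribution. The scale σ of the error term (its standard
deviation) will be given by:
\[ \sigma = \exp(\delta_0 + \delta_1 x_1 + \delta_2 x_2) \]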
Variable x1 will be binary and x2 will be continuous. Specifically x1 will take on value 1 with
probability 0.75 and x2 will be drawn from a standard normal distribution.
> set.seed(242)
> n<-250
> x1<-rbinom(n,1,0.75) # binary variable
> x2<-rnorm(n) # continuous variable
> sampledata<-cbind(rep(1,n),x1,x2)
> # set true parameter values
> meanparams<-c(0.5,1,-0.5)
> varparams<- c(0,0.5,-0.5)
> # generate latent variable
> ystar<-sampledata%*%meanparams+rnorm(n)*exp(sampledata%*%varparams)
> # generate outcome variable
> threshparams<-c(-0.5,0.5,1.5)
> outcomes<-c(-1,0,1,2)
> setvalue<-function(x){
+ locate<-outcomes[1:(length(outcomes)-1)][x<threshparams]
+ if (length(locate)==0){
+ return(outcomes[length(outcomes)])
+ } else {
+ return(locate[1])
+ }
+ }
> y<-sapply(ystar,setvalue)
> sampleframe<-data.frame(y,x1,x2)
The parameter values chosen for the example are β0 = 0.5, β1 = 1 and β2 = −0.5 for the
mean equation and δ0 = 0, δ1 = 0.5 and δ2 = −0.5 for the variance equation. Given the parameters
and knowing the source distribution of the variables we can calculate the expected marginal
effects at means for the two variables. Table 2 displays the implied marginal effects.
Outcome      x1        x2
y = −1     -0.017    -0.05
y = 0      -0.133     0.08
y = 1      -0.154     0.139
y = 2       0.304    -0.169
Note that the marginal effect of the variable x2 is negative for the two extreme outcomes
(y = −1 and y = 2). This possibility cannot be captured in a standard ordered probit model
unless the variable enters the regression equation non-linearly, for example through
the square of the variable, which generates additional complications in calculating marginal
effects.
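The entries of Table 2 can be recomputed from the true parameter values with Equation 1, evaluated at the population means E[x1] = 0.75 and E[x2] = 0. In the sketch below the x1 column is obtained by applying Equation 1 to x1 at its mean rather than the discrete change of Equation 2. That choice is an inference from recomputing the values, not something stated in the text, and the variable names are illustrative.

```r
beta  <- c(0.5, 1, -0.5)        # mean equation: constant, x1, x2
delta <- c(0, 0.5, -0.5)        # variance equation: constant, x1, x2
alpha <- c(-0.5, 0.5, 1.5)      # threshold parameters
xbar  <- c(1, 0.75, 0)          # constant, E[x1], E[x2]

sigma <- exp(sum(xbar * delta))                 # g(z delta) at the means
u     <- (c(-Inf, alpha, Inf) - sum(xbar * beta)) / sigma
fu    <- dnorm(u)
ufu   <- ifelse(is.finite(u), u * fu, 0)        # u * f(u) is zero at +/- Inf

# Equation 1 with g = exp, so delta_l * g'(z delta) / g(z delta) = delta_l.
mem <- function(k) -(beta[k] / sigma) * diff(fu) - delta[k] * diff(ufu)

table2 <- cbind(x1 = mem(2), x2 = mem(3))
rownames(table2) <- paste("y =", c(-1, 0, 1, 2))
print(round(table2, 3))
```

Up to rounding, the printed values agree with Table 2, including the negative effect of x2 on both extreme outcomes.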
The standard ordered probit can be estimated with the oglmx function with the correct
specification of optional arguments, or by using the oprobit.reg function included with the
package.
> library("oglmx")
> results.oprob<-oglmx(y ~ x1 + x2, data=sampleframe, link="probit",
+ constantMEAN = FALSE, constantSD = FALSE,
+ delta=0,threshparam = NULL)
> results.oprob1<-oprobit.reg(y ~ x1 + x2, data=sampleframe)
> summary(results.oprob)
> summary(results.oprob1)
In a typical ordered probit model, whether standard or heteroskedastic, the scale of the
parameter vectors is not identified, so identifying assumptions are necessary. For the standard
ordered probit the constant in the latent variable equation is set equal to zero while the
variance of the error term is set equal to one. These assumptions are somewhat arbitrary
and may be replaced by alternative assumptions. For example, when the threshold
values are meaningful, two (or more) of them can be imposed and the intercept and variance of
the error term can be estimated, which leads to parameter estimates with a meaningful scale.
The code below estimates a heteroskedastic probit model, first with the standard assumption of
no constant in either the latent variable mean equation or the variance equation, and second
using the fact that we know the threshold values and fixing two of them at their true values.
> summary(results.oprobhet1)
> margins.oglmx(results.oprobhet)
> margins.oglmx(results.oprobhet1)
Considering the results for the parameter estimates, the sizes of the parameters in the mean
equation differ but the signs are the same and the conclusions reached from t-tests
of significance are the same. More importantly, the estimated marginal effects from these
two procedures are identical. The oglm command in Stata that permits estimation of heteroskedastic
ordered models imposes the no-constant assumption used in the first estimation
above. Given the results and knowing the threshold values it is possible to transform the
results to obtain the correctly scaled parameter estimates; however, by allowing a flexible
specification the oglmx package avoids the need for this transformation.
Using the lmtest package we can test whether the variables x1 and x2 are jointly significant in the
variance equation using a likelihood ratio test.
> library("lmtest")
> lrtest(results.oprob,results.oprobhet)
Model 1: y ~ x1 + x2
Model 2: y ~ (x1 + x2 | x1 + x2)
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   5 -307.32
2   7 -275.20  2 64.231  1.128e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
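The likelihood ratio statistic reported by lrtest can be recomputed by hand from the two log-likelihoods. The sketch below uses the rounded values printed in the output above, so the statistic differs slightly from the reported 64.231. The variable names are illustrative.

```r
ll_restricted   <- -307.32    # Model 1: homoskedastic ordered probit, 5 parameters
ll_unrestricted <- -275.20    # Model 2: heteroskedastic ordered probit, 7 parameters

lr_stat <- 2 * (ll_unrestricted - ll_restricted)  # likelihood ratio statistic
df      <- 7 - 5                                  # number of restrictions tested
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = df, p.value = p_value))
```

Under the null hypothesis that x1 and x2 do not belong in the variance equation, the statistic is asymptotically chi-squared with two degrees of freedom, so a p-value this small rejects the homoskedastic model.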
This paper introduces the oglmx package for the R programming language. In linear models
heteroskedasticity in the error term does not affect the consistency of parameter estimates and,
provided suitable standard error corrections are used, inferences drawn from OLS estimates
are reliable. For non-linear models such as probit and ordered probit this is no longer the
case: heteroskedasticity can lead to substantially biased estimates of marginal effects. The
oglmx package permits the user to model the heteroskedasticity in order to obtain consistent
estimates of marginal effects and reliable statistical tests.
Ai and Norton (2003) identified an error committed by many applied researchers when interpreting
results from non-linear models such as probit and logit that include interaction
terms. Further development of the package is expected to enable the margins
function to identify functions of variables that enter the equations for the mean and standard
deviation, i.e. interaction terms and polynomials of variables, and to estimate the true
marginal effects desired by researchers.
Inclusion of fixed effects in non-linear models such as probit, logit and their ordered equivalents
can lead to biased estimates due to the incidental parameters problem recognised by Neyman
and Scott (1948). Further development of the package will add methods to reduce this bias,
for example that suggested by Carro (2007) and adapted to the ordered outcome case with
two types of fixed effects in Carro and Traferri (2014).
References
Carro JM (2007). “Estimating dynamic panel data discrete choice models with fixed effects.”
Journal of Econometrics, 140(2), 503–528. ISSN 03044076. doi:10.1016/j.jeconom.2006.07.023.
Carro JM, Traferri A (2014). “State Dependence and Heterogeneity in Health Using a Bias-
Corrected Fixed-Effects Estimator.” Journal of Applied Econometrics, 29, 181–207. ISSN
01451707. doi:10.1002/jae.
Case A, Lubotsky D, Paxson C (2002). “Economic Status and Health in Childhood: The
Origins of the Gradient.” American Economic Review, 92(5), 1308–1334.
Dearden L, Meghir C, Ferri J (2002). “The effect of school quality on educational attainment
and wages.” The Review of Economics and Statistics, 84(February), 1–20.
doi:10.1162/003465302317331883. URL http://dx.doi.org/10.1162/003465302317331883.
Fernihough A (2014). mfx: Marginal Effects, Odds Ratios and Incidence Rate Ratios for
GLMs. URL http://cran.r-project.org/package=mfx.
Gerdtham UG, Johannesson M (2001). “The relationship between happiness, health, and
socio-economic factors: Results based on Swedish microdata.” Journal of Socio-Economics,
30(6), 553–557. ISSN 10535357. doi:10.1016/S1053-5357(01)00118-4.
Hartog J, Oosterbeek H (1998). “Health, wealth and happiness: why pursue a higher
education?” Economics of Education Review, 17(3), 245–256. ISSN 02727757.
doi:10.1016/S0272-7757(97)00064-2.
R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. URL http://www.r-project.org/.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. Fourth edition. Springer,
New York. URL http://www.stats.ox.ac.uk/pub/MASS4.
Wooldridge JM (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press
Books. The MIT Press. URL http://ideas.repec.org/b/mtp/titles/0262232197.html.
Affiliation:
Nathan Carroll
Institut für Volkswirtschaftslehre und Ökonometrie
Faculty of Business, Economics and Management Information Systems
Universität Regensburg
Universitätsstrasse 31
93053 Regensburg, Germany
E-mail: [email protected]