xtmlogit — Fixed-effects and random-effects multinomial logit models
Description
xtmlogit fits random-effects and conditional fixed-effects multinomial logit models for a categorical
dependent variable with unordered outcomes. The actual values taken by the dependent variable are
irrelevant.
Quick start
Random-effects model of y as a function of x1, x2, and indicators for levels of categorical variable
a using xtset data
xtmlogit y x1 x2 i.a
Same as above, but report relative-risk ratios
xtmlogit y x1 x2 i.a, rrr
Same as above, but with all variances and covariances distinctly estimated
xtmlogit y x1 x2 i.a, rrr covariance(unstructured)
Conditional fixed-effects model
xtmlogit y x1 x2 i.a, fe
Random-effects model with cluster–robust standard errors for panels nested within cvar
xtmlogit y x1 x2 i.a, vce(cluster cvar)
Menu
Statistics > Longitudinal/panel data > Categorical outcomes > Multinomial logistic regression (FE, RE)
Syntax
Random-effects model
xtmlogit depvar [indepvars] [if] [in] [weight] [, re RE_options]
RE_options                   Description
Model
  noconstant                 suppress constant term
  re                         use random-effects estimator; the default
  baseoutcome(#)             value of depvar that will be the base outcome
  constraints(constraints)   apply specified linear constraints
  covariance(vartype)        variance–covariance structure of the random effects;
                               default is covariance(independent)
SE/Robust
  vce(vcetype)               vcetype may be oim, robust, cluster clustvar, bootstrap,
                               or jackknife
Reporting
  level(#)                   set confidence level; default is level(95)
  rrr                        report relative-risk ratios
  lrmodel                    perform the likelihood-ratio model test instead of the
                               default Wald test
  nocnsreport                do not display constraints
  display_options            control columns and column formats, row spacing, line
                               width, display of omitted variables and base and
                               empty cells, and factor-variable labeling
Integration
  intmethod(intmethod)       integration method; intmethod may be mvaghermite (the
                               default) or ghermite
  intpoints(#)               use # quadrature points; default is intpoints(7)
Maximization
  maximize_options           control the maximization process; seldom used
  startgrid(numlist)         improve starting values of the random-effects variance
                               parameters by performing a grid search
  collinear                  keep collinear variables
  coeflegend                 display legend instead of statistics
vartype                      Description
  independent                distinct variances for each random effect and all
                               covariances 0; the default
  shared                     one common random effect
  identity                   equal variances for random effects and all covariances 0
  exchangeable               equal variances for random effects and one common
                               pairwise covariance
  unstructured               all variances and covariances to be distinctly estimated
FE_options                   Description
Model
  fe                         use fixed-effects estimator
  baseoutcome(#)             value of depvar that will be the base outcome
  constraints(constraints)   apply specified linear constraints
SE/Robust
  vce(vcetype)               vcetype may be oim, robust, cluster clustvar, bootstrap,
                               or jackknife
Reporting
  level(#)                   set confidence level; default is level(95)
  rrr                        report relative-risk ratios
  nodots                     suppress display of progress bar
  nocnsreport                do not display constraints
  display_options            control columns and column formats, row spacing, line
                               width, display of omitted variables and base and
                               empty cells, and factor-variable labeling
Permutations
  rsample(#[, rseed(#s)])    draw sample of permuted outcome sequences at percentage #
  favor(speed|space)         favor speed or space when generating permutations of
                               outcome sequences; default is favor(speed)
  force                      force estimation to proceed even if the number of
                               permutations exceeds 50 million
Maximization
  maximize_options           control the maximization process; seldom used
  collinear                  keep collinear variables
  coeflegend                 display legend instead of statistics
Integration
intmethod(intmethod), intpoints(#); see [R] Estimation options.
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] Maximize. These options are
seldom used.
The following options are available with xtmlogit but are not shown in the dialog box:
startgrid(numlist) performs a grid search to improve starting values of the random-effects
parameters. By default, xtmlogit performs a grid search on startgrid(0.2 1).
collinear, coeflegend; see [R] Estimation options.
Permutations
rsample(#[, rseed(#s)]) specifies that a random subset be drawn from the set of all permutations
of the observed sequence of outcomes for each panel. Optionally, a random-number seed, #s, can
be specified to ensure reproducibility.
The size of the random subset is given as a percentage # of $K_i$, where $K_i$ is the total number
of permutations of the outcome sequence in the ith panel. The resulting subset is of size
$L_i = \mathrm{ceil}\{(\#/100)K_i\}$. The observed outcome sequence is also included for a total of
$L_i + 1$ sequences. If rsample() is not specified, xtmlogit uses all $K_i$ permutations in the
conditional likelihood calculation.
Specifying rsample() requires setting a time variable with xtset so that the order of the observed
outcome sequence is known.
If rsample() is specified, the default standard error type is vce(robust) rather than vce(oim).
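As a small illustration of the subset-size formula above, the following sketch computes the number of sampled sequences for one panel. The values K_i = 1,255 and rsample(10) are hypothetical and chosen only so that the ceiling matters; nothing here calls xtmlogit itself.

```stata
* Hypothetical panel: K_i = 1,255 permutations, rsample(10)
scalar K   = 1255
scalar pct = 10
* Multiply before dividing so that 12550/100 = 125.5 is exact in floating point
scalar L = ceil(pct*K/100)
display "sampled permutations L_i = " scalar(L)
display "sequences used, L_i + 1  = " scalar(L)+1
```

Here 10% of 1,255 is 125.5, so 126 permutations are sampled and 127 sequences enter the conditional likelihood once the observed sequence is added.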
favor(speed|space) instructs xtmlogit to favor either speed or space when generating the
permutations of the outcome sequences. favor(speed) is the default. When favoring speed,
the permuted sequences are generated once and stored in memory, thus increasing the speed of
evaluating the likelihood. This speed increase can be seen when the number of observations per
panel is relatively high. When favoring space, the permutations are generated repeatedly with each
likelihood evaluation.
force forces estimation to proceed even if the total number of permutations ($\sum_i K_i$) exceeds 50
million. Without specification of force, the fixed-effects estimator issues an error message if the
number of permutations exceeds 50 million. Estimation with this many permutations requires a
considerable amount of memory and is computationally intensive.
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] Maximize. These options are
seldom used.
The following options are available with xtmlogit but are not shown in the dialog box:
collinear, coeflegend; see [R] Estimation options.
Introduction
xtmlogit fits random-effects and conditional fixed-effects multinomial logit (MNL) models.
Whenever we refer to a fixed-effects model, we mean the conditional fixed-effects model.
Both the conditional fixed-effects and the random-effects estimators produce valid estimates in the
presence of unobserved heterogeneity at the panel level. The fixed-effects estimator is described in
Chamberlain (1980) and Pforr (2014). For a description of the random-effects estimator, see Hartzel,
Agresti, and Caffo (2001). For an application of the fixed-effects estimator, see Börsch-Supan (1990);
for an application of the random-effects estimator, see Grilli and Rampichini (2007).
The MNL model is a popular method for modeling categorical outcome variables where the categories
have no natural ordering. The MNL model is often used in the context of a random utility framework
to analyze choices made by individuals. However, the MNL model is also used without
an underlying utility theory, and the units of analysis do not necessarily have to be individuals or
other decision-making entities. In what follows, however, we will refer to individuals for the sake of
simplicity, and the set of choices each individual makes as a “panel”.
Unlike in cross-sectional applications of the MNL model, in the context of panel and longitudinal
data, we observe a sequence of outcomes for each individual in the dataset rather than just a single
observation. Each individual sequence can be thought of as a process that depends on individual
characteristics.
For example, if we were to analyze restaurant choices, vegetarians would consistently choose
restaurants that offer vegetarian dishes, or health-oriented people would consistently avoid fast-food
restaurants. In other words, the choices made by individuals are not independent over time because of
underlying individual preferences or characteristics, which often remain unobserved in the data. The
fixed- and random-effects MNL estimators discussed here offer a way to explicitly account for this
unobserved heterogeneity by including an additional error term at the panel level. This panel-level
error term is also known as a heterogeneity term and enters the model in addition to the error term
that accounts for heterogeneity at the observation (time) level.
The unobserved-heterogeneity model for both the conditional fixed-effects and the
random-effects estimators can be written in utility-maximization form as
$$U_{ijt} = x_{it}\beta_j + u_{ij} + \epsilon_{ijt}$$
Assuming we have a panel dataset with repeated observations from individuals, $U_{ijt}$ is the utility of
the ith individual toward outcome j at time t, with $i = 1, \ldots, N$, $j = 1, \ldots, J$, and $t = 1, \ldots, T_i$.
The observed component of utility is $x_{it}\beta_j$, where $x_{it}$ is a row vector of covariates and $\beta_j$ is a
column vector of coefficients for outcome j. The unobserved part consists of error components $u_{ij}$
and $\epsilon_{ijt}$, where $u_{ij}$ is the panel-level heterogeneity term and $\epsilon_{ijt}$ is an observation-level error term.
Assuming a type-1 extreme value distribution for $\epsilon_{ijt}$, also known as a standard Gumbel distribution,
gives rise to the MNL model
$$\Pr(y_{it} = m \mid x_{it}, \beta_j, u_{ij}) = \frac{\exp(x_{it}\beta_m + u_{im})}{\sum_{j=1}^{J} \exp(x_{it}\beta_j + u_{ij})}$$
For model identification, the above equation must be normalized with respect to a base category by
setting both the elements in βj as well as uij to zero for one of the categories of the outcome
variable. If—without loss of generality—we let the base outcome be outcome 1, the probability that
the ith individual chooses outcome m at time t is
$$\Pr(y_{it} = m \mid x_{it}, \beta_j, u_{ij}) = F(y_{it} = m, x_{it}\beta_j + u_{ij}) =
\begin{cases}
\dfrac{1}{1 + \sum_{j=2}^{J} \exp(x_{it}\beta_j + u_{ij})} & \text{if } m = 1 \\[2ex]
\dfrac{\exp(x_{it}\beta_m + u_{im})}{1 + \sum_{j=2}^{J} \exp(x_{it}\beta_j + u_{ij})} & \text{if } m > 1
\end{cases}$$
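To make the normalization concrete, here is a minimal sketch that evaluates the outcome probabilities for J = 3 with outcome 1 as the base. The two linear predictors (eta2 and eta3, standing for x_it*beta_j + u_ij) are made-up numbers, not estimates from any model.

```stata
* Hypothetical linear predictors x_it*b_j + u_ij for outcomes 2 and 3;
* outcome 1 is the base, so its predictor is normalized to 0
scalar eta2 = 0.5
scalar eta3 = -0.2
scalar denom = 1 + exp(eta2) + exp(eta3)
scalar p1 = 1/denom
scalar p2 = exp(eta2)/denom
scalar p3 = exp(eta3)/denom
scalar psum = p1 + p2 + p3
display "Pr(y=1) = " %6.4f p1
display "Pr(y=2) = " %6.4f p2
display "Pr(y=3) = " %6.4f p3
display "sum     = " %6.4f psum
* The log of the risk relative to the base recovers the linear predictor
display "ln(p2/p1) = " %6.4f ln(p2/p1)
```

The last line shows why coefficients are read as log relative risks with respect to the base outcome: ln(p2/p1) equals eta2 exactly.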
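For the random-effects estimator, the heterogeneity terms are integrated out of the likelihood of
panel i,
$$l_i = \int \left\{ \prod_{t=1}^{T_i} F(y_{it}, x_{it}\beta_j + u_{ij}) \right\} \phi(u_i, \Sigma_u)\, du_i \tag{1}$$
where $\phi(u_i, \Sigma_u)$ is the probability density function of the normal distribution $u_i \sim N(0, \Sigma_u)$. This
integral of dimension J − 1 has no closed-form solution and must be approximated numerically. By
default, xtmlogit uses adaptive Gauss–Hermite quadrature to approximate this integral.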
xtmlogit allows for imposing a variety of structures on Σu . By default, xtmlogit estimates
separate, independent variance components for each of the J − 1 outcome equations. The option
covariance(shared) estimates a single shared variance component for all J − 1 outcome equations.
The most general case is specified by the option covariance(unstructured), which freely estimates
all variances and covariances among the random effects instead of treating them as independent. Not
imposing any structure on Σu can potentially yield more accurate results. However, this is also more
computationally intensive, resulting in longer computation times.
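The following sketch tallies how many variance and covariance parameters each covariance() structure implies for J = 3 outcomes. It assumes each structure estimates only the parameters named in its description (for example, one variance for shared and identity), which is an interpretation of the option table rather than something stated explicitly in the text.

```stata
* J outcomes give q = J - 1 random effects; J = 3 here
scalar nout = 3
scalar q = nout - 1
display "independent : " scalar(q)            // q distinct variances
display "shared      : " 1                    // one common random effect
display "identity    : " 1                    // one common variance
display "exchangeable: " 2                    // one variance, one covariance
display "unstructured: " scalar(q)*(scalar(q)+1)/2
```

With three outcomes, the independent structure has two parameters and the unstructured one has three, so the two differ by a single covariance.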
Let $Y_i = (Y_{i1}, \ldots, Y_{iT_i})$ be the sequence of outcomes of the ith panel, and let
$Y_{it} = (Y_{i1t}, \ldots, Y_{iJt})$ be a vector with elements $Y_{ijt} = 1(i \text{ chooses } j \text{ at } t)$ that indicate the chosen
outcome of the ith panel at time t. The distribution of times that panel i chose each of the J
alternatives over the $T_i$ time points is then the sufficient statistic
$\Theta_i = \sum_{t=1}^{T_i} Y_{it} = c_i = (c_{i1}, \ldots, c_{iJ})$.
In other words, the elements in $c_i$ are sums of occurrences of each of the outcomes over time for
the ith panel.
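As a quick illustration of the sufficient statistic, this Mata sketch counts how often each outcome occurs in one hypothetical outcome sequence with three time points and four possible outcomes. The names Y, nout, and counts are illustrative only.

```stata
mata:
// Hypothetical outcome sequence for one panel: T_i = 3, J = 4
Y = (3, 2, 3)
nout = 4
// counts[j] accumulates how many times outcome j was chosen
counts = J(1, nout, 0)
for (t = 1; t <= cols(Y); t++) {
    counts[Y[t]] = counts[Y[t]] + 1
}
counts
end
```

For this sequence, the count vector is (0, 1, 2, 0): outcome 2 occurs once, outcome 3 occurs twice, and outcomes 1 and 4 never occur.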
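Conditioning on the sufficient statistic $\Theta_i$, the probability of panel i having a sequence $Y_i = s_i$
that is consistent with $c_i$ is
$$\Pr(Y_i = s_i \mid \Theta_i, u_i, x_i, \beta) = \Pr\{Y_{i1}, \ldots, Y_{iT_i} \mid \Psi(c_i), u_i, x_i, \beta\}
= \frac{\exp\left( \sum_{t=1}^{T_i} \sum_{j=2}^{J} Y_{ijt}\, x_{it}\beta_j \right)}
{\sum_{\widetilde{Y}_{ijt} \in \Psi(c_i)} \exp\left( \sum_{t=1}^{T_i} \sum_{j=2}^{J} \widetilde{Y}_{ijt}\, x_{it}\beta_j \right)}$$
where $\Psi(c_i)$ is the set of all permutations of individual i's observed sequence of outcomes that satisfy
the condition $\sum_{t=1}^{T_i} \widetilde{Y}_{it} = c_i$. That is,
$$\Psi(c_i) = \left\{ \widetilde{Y}_i = (\widetilde{Y}_{i1}, \ldots, \widetilde{Y}_{iT_i}) \;\middle|\; \sum_{t=1}^{T_i} \widetilde{Y}_{it} = c_i \right\}$$
and $\widetilde{Y}_{it} = (\widetilde{Y}_{i1t}, \ldots, \widetilde{Y}_{iJt})$ is a vector of indicators with respect to the permutations of the observed
outcome sequence $Y_i$. The log likelihood of panel i is then the natural logarithm of the above
probability
$$\log l_i = \sum_{t=1}^{T_i} \sum_{j=2}^{J} Y_{ijt}\, x_{it}\beta_j
- \log \sum_{\widetilde{Y}_{ijt} \in \Psi(c_i)} \exp\left( \sum_{t=1}^{T_i} \sum_{j=2}^{J} \widetilde{Y}_{ijt}\, x_{it}\beta_j \right)$$
and the overall log likelihood is $\sum_{i=1}^{N} \log l_i$.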
To illustrate the concept of permutations in this context, let us suppose we had a panel dataset
with three observations per individual and an outcome variable with four categories, j = 1, 2, 3, 4.
Let us further assume that for some individual in the dataset we observe the sequence Yi = (3, 2, 3).
This sequence has a total of three permutations, so the set of all permutations (which includes the
original sequence) for this individual consists of (2, 3, 3), (3, 2, 3), and (3, 3, 2). Notice that in all
three permutations, outcome 3 occurs twice, and outcome 2 occurs once, just as in the original
sequence.
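The sequence (3, 2, 3) can also be used to sketch the conditional probability numerically. The Mata code below uses one made-up covariate x, made-up coefficients b (with b[1] = 0 for the base outcome), and made-up panel-level effects u; all of these values are assumptions for illustration. It computes the probability of each of the three permutations with and without the panel-level effects.

```stata
mata:
// Hypothetical covariate values at t = 1, 2, 3 and coefficients for
// outcomes 1-4 (outcome 1 is the base, so b[1] = 0)
x = (1, 2, 0.5)
b = (0, 0.3, -0.4, 0.1)
// Hypothetical panel-level effects u_ij for outcomes 1-4
u = (0, 1.5, -2, 0.7)
// All permutations of the observed sequence; row 2 is the observed one
S = (2, 3, 3 \ 3, 2, 3 \ 3, 3, 2)
score  = J(rows(S), 1, 0)
scoreu = J(rows(S), 1, 0)
for (r = 1; r <= rows(S); r++) {
    for (t = 1; t <= cols(S); t++) {
        score[r]  = score[r]  + x[t]*b[S[r, t]]
        scoreu[r] = scoreu[r] + x[t]*b[S[r, t]] + u[S[r, t]]
    }
}
// Conditional probabilities without (p) and with (pu) the effects u
p  = exp(score)  :/ sum(exp(score))
pu = exp(scoreu) :/ sum(exp(scoreu))
(p, pu)
end
```

The two columns agree because every permutation contains the same outcomes the same number of times, so the sum of the u terms is identical across rows and cancels in the ratio. This is why the conditional estimator does not need to estimate the panel-level effects.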
Curse of dimensionality
Both the random-effects and fixed-effects estimators suffer from the curse of dimensionality. For
the random-effects estimator, the curse is rooted in J , the number of outcomes, because the integral
in (1) is a J − 1 dimensional integral unless one uses a common heterogeneity component for all
outcomes. This means that the computation time can be high for more than just three or four outcomes.
For example, if we had a dataset with six outcomes, we would have to approximate a five-dimensional
integral. With the default of seven quadrature points per dimension, we would end up with a total
of 7^5 = 16,807 integration points, resulting in substantial
computation time. If computation time becomes infeasible, one might consider using a single, shared
variance component, if appropriate.
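The growth in integration points can be tabulated directly. This short sketch assumes the default of seven quadrature points per dimension and a separate random effect for each nonbase outcome.

```stata
* Total quadrature points = (points per dimension)^(J - 1)
forvalues J = 3/6 {
    display "J = `J': " %8.0fc 7^(`J' - 1) " integration points"
}
```

Each additional outcome multiplies the number of evaluation points by seven, which is why a shared variance component, requiring only a one-dimensional integral, can be much faster.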
For the fixed-effects estimator, the curse of dimensionality is rooted mainly in $T_i$, the number of
repeated observations, and potentially in $J$. The problem is that the number of permutations in $\Psi(c_i)$
grows exponentially with $T_i$ and can become infeasibly large. The number of permutations of panel
i's observed vector of outcomes is
$$K_i = \frac{T_i!}{c_{i1}! \cdots c_{ij}! \cdots c_{iJ}!}$$
For instance, suppose we observed an individual with 15 repeated observations in a dataset with 6
outcomes, $j = 1, 2, \ldots, 6$, with the sequence of outcomes $Y_i = (3, 3, 3, 2, 4, 1, 1, 5, 4, 6, 6, 1, 1, 2, 4)$.
Here $\sum_{t=1}^{T_i} Y_{i1t} = 4$, which is to say that outcome $Y_{it} = 1$ is observed 4 times, $\sum_{t=1}^{T_i} Y_{i2t} = 2$, and
so on. The size of the set of permutations of this outcome vector is
$$K_i = \frac{15!}{4!\, 2!\, 3!\, 3!\, 1!\, 2!} = 378{,}378{,}000$$
Notice that this number in the hundreds of millions is the size of the permutation set of just a single
panel in the dataset, and clearly this number can quickly become infeasibly large.
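The permutation count can be checked with Stata's lnfactorial() function. This is only an arithmetic sketch of the formula for K_i, not a reproduction of how xtmlogit enumerates permutations internally.

```stata
* Outcomes 1-6 occur 4, 2, 3, 3, 1, and 2 times in the 15-period sequence
scalar lnK = lnfactorial(15) - lnfactorial(4) - 2*lnfactorial(2) - 2*lnfactorial(3) - lnfactorial(1)
* round() guards against floating-point error after exponentiating
scalar K = round(exp(lnK), 1)
display %15.0fc scalar(K)
* Same formula for the three-period sequence (3, 2, 3): 3!/(2! 1!)
scalar K3 = round(exp(lnfactorial(3) - lnfactorial(2) - lnfactorial(1)), 1)
display scalar(K3)
```

Working on the log scale and rounding at the end avoids overflow and keeps the result an exact integer for sequences of this length.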
A potential solution that can alleviate this problem to some degree is to use a random subset
of permutations (D’Haultfœuille and Iaria 2016). The rsample() option can be used to specify the
size of the random subset as a percentage of the full set of permutations. Realistically, however,
the fixed-effects estimator is feasible only with shorter panels where the number of repeated
observations does not exceed Ti = 9 or Ti = 10, depending on J , the size of the dataset, and possibly
other features of the data.
Examples
. use https://www.stata-press.com/data/r18/estatus
(Fictional employment status data)
. list id year estatus hhchild age in 22/41, sepby(id) noobs
The first person shown in the above excerpt (id==5) was observed between years 2002 and 2014.
The variable estatus records the employment history over these years. In this case, the person has
been employed between 2002 and 2008, was out of the labor force between 2010 and 2012, and was
unemployed prior to the interview in 2014.
The variable hhchild records whether at least one child under the age of 18 was living with the
surveyee in the same household at the time of the interview. Looking at the data of the first person
in the above excerpt, we see that there was one or more children in the household in 2002, but no
children in the household between 2004 and 2014. The variable age records the age of the women
at each interview. In this case, the woman was observed between 38 and 50 years of age.
To inspect the distribution of employment status over the entire sample, we can use the tabulate
command:
. tabulate estatus
(output omitted )
We can see that in 35% of all observations, the interviewed women reported being out of the
labor force, 15% of the time the women were unemployed, and 50% of the time the women were
employed.
As with other panel-data estimators, we first need to declare our dataset to be panel data by using
the xtset command. Here we do not plan to use any lagged covariates, so it is sufficient to xtset
our dataset with just the person identifier id and without a variable for time:
. xtset id
Panel variable: id (unbalanced)
We can now go ahead and fit our model using xtmlogit. We will also include a number of control
variables: age, a person’s annual household income at the time of interview (hhincome), whether a
significant other was also living in the household at the time of interview (hhsigno), and whether the
surveyee was the sole or primary breadwinner in her household at the time of interview (bwinner).
We use the variable estatus as our dependent variable, and hhchild is our independent variable
of interest. Because hhchild, hhsigno, and bwinner are binary variables, we specify them as factor
variables.
estatus         Coefficient  Std. err.       z   P>|z|   [95% conf. interval]
Out_of_lab~e
  hhchild
    Yes            .4628125   .0962758    4.81   0.000    .2741154   .6515096
  age              -.004825   .0066428   -0.73   0.468   -.0178446   .0081946
  hhincome        -.0046922    .001839   -2.55   0.011   -.0082965  -.0010879
  hhsigno
    Yes            .4967056   .0946442    5.25   0.000    .3112063   .6822049
  bwinner
    Yes           -.4740919   .0727992   -6.51   0.000   -.6167756  -.3314082
  _cons           -.4787579   .2845139   -1.68   0.092   -1.036395   .0788792
Unemployed
  hhchild
    Yes           -.0401989    .119596   -0.34   0.737   -.2746027   .1942049
  age              .0042644   .0081818    0.52   0.602   -.0117716   .0203004
  hhincome        -.0308468   .0026529  -11.63   0.000   -.0360463  -.0256473
  hhsigno
    Yes               .0968   .1192659    0.81   0.417   -.1369568   .3305568
  bwinner
    Yes           -.2252587   .0951984   -2.37   0.018   -.4118441  -.0386733
  _cons           -.0953821   .3508736   -0.27   0.786   -.7830817   .5923175
LR test vs. multinomial logit: chi2(2) = 225.31 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
Looking at the table header, we can find some useful information about the model we just fit. For
example, we can see that the estimation sample consists of 4,761 observations from 800 groups (800
individuals in this case), with between 5 and 7 observations per group. The model test right above
the table on the right is a joint test of all model coefficients except the constants.
Looking at the output table itself, we see the results for all J − 1 equations. Because we have
three outcome categories, we see the coefficient estimates for two of the outcomes, while employment
is our base outcome. Here using employment as the base makes sense given our research question,
and we would have chosen this as a base if we had to specify it explicitly. In this case, however,
employment was chosen automatically because it is the most frequent category in our dataset, which
is what xtmlogit defaults to. If we had wanted to specify a different category as the base, we would
have used the baseoutcome() option.
Below the model coefficient estimates, we find the estimated variances of the random effects. In
this case, we have two estimates that correspond to the nonbase equations. By default, xtmlogit
assumes that the random effects are uncorrelated across the equations. We will see in the next example
how to use the covariance() option to specify a different covariance structure. Here we can see that
there is considerable variance of the panel-level unobservables. The lower bounds of the 95%
confidence intervals are not close to zero relative to the estimated standard errors. This observation is
confirmed by the likelihood-ratio test shown beneath the table, which is a test of our model against
the MNL model without random effects.
Let us get back to our initial research question: what is the effect of having children under the
age of 18 in the household on employment status? The interpretation of the coefficients is the same
as in a conventional cross-sectional MNL model, except that, in the random-effects case, they are to
be interpreted as conditional on the random effects, while they naturally have a population-average
interpretation in the cross-sectional case. Either way, the coefficients are difficult to interpret. They
can be thought of as the natural logarithm of a double ratio: the logarithm of the relative risk, relative
to the base category. Realistically, only the sign of these coefficients can be interpreted usefully.
Looking at the results, we can see that the coefficient of hhchild in the first equation (out of labor
force) is around 0.46. Thus, we can say that women with children under 18 in the household are
more likely not to participate in the labor force than women with no young children in the household,
relative to being employed full time.
A more informative way to interpret the results would be to use relative-risk ratios (RRRs) instead
of log relative-risk ratios by exponentiating the coefficients. That is, instead of βj , we use exp(βj )
to interpret the results. With xtmlogit, we can use the rrr option for that purpose. This option can
be used at the time of estimation or when replaying results. Here we use it as a replay option:
. xtmlogit, rrr
Random-effects multinomial logistic regression      Number of obs    =  4,761
Group variable: id                                  Number of groups =    800
Random effects u_i ~ Gaussian                       Obs per group:
                                                                 min =      5
                                                                 avg =    6.0
                                                                 max =      7
Integration method: mvaghermite                     Integration pts. =      7
                                                    Wald chi2(10)    = 239.26
Log likelihood = -4468.8413                         Prob > chi2      = 0.0000
estatus                 RRR  Std. err.       z   P>|z|   [95% conf. interval]
Out_of_lab~e
  hhchild
    Yes            1.588535   .1529375    4.81   0.000    1.315367   1.918435
  age              .9951866   .0066108   -0.73   0.468    .9823137   1.008228
  hhincome         .9953188   .0018303   -2.55   0.011    .9917379   .9989127
  hhsigno
    Yes            1.643299   .1555288    5.25   0.000    1.365071   1.978235
  bwinner
    Yes            .6224501   .0453138   -6.51   0.000    .5396818   .7179121
  _cons            .6195525   .1762713   -1.68   0.092    .3547312   1.082074
Unemployed
  hhchild
    Yes            .9605983   .1148837   -0.34   0.737    .7598739   1.214345
  age              1.004274   .0082168    0.52   0.602    .9882974   1.020508
  hhincome         .9696241   .0025723  -11.63   0.000    .9645956   .9746788
  hhsigno
    Yes             1.10164   .1313881    0.81   0.417    .8720079   1.391743
  bwinner
    Yes            .7983097   .0759978   -2.37   0.018    .6624275   .9620649
  _cons            .9090255   .3189531   -0.27   0.786    .4569955   1.808174
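The link between the two tables can be verified by hand. The sketch below exponentiates the hhchild coefficient and its confidence limits from the out-of-labor-force equation, with the numbers copied from the coefficient table; the delta-method line assumes the usual first-order approximation se(exp(b)) = exp(b) × se(b).

```stata
* Coefficient and standard error of 1.hhchild, out-of-labor-force equation
scalar b  = .4628125
scalar se = .0962758
display "RRR       = " %9.6f exp(scalar(b))
* Delta-method standard error of the exponentiated coefficient
display "Std. err. = " %9.6f exp(scalar(b))*scalar(se)
* Confidence limits are the exponentiated limits of the coefficient's CI
display "95% CI    = [" %9.6f exp(.2741154) ", " %9.6f exp(.6515096) "]"
```

The z statistic and p-value are unchanged between the two tables because the test is still carried out on the coefficient scale.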
Looking at the RRR of hhchild in the out-of-labor-force equation, which is around 1.6, we can
say that the risk of being out of the labor force relative to the risk of being employed is about
1.6 times as large for women with at least one child under the age of 18 in the household as it is
for women with no children under 18 in the household. While this provides a little bit more information,
it still does not provide a very intuitive way to interpret our results. It would be easier if we could
just see the actual risks for each of the outcomes with respect to the hhchild variable and then also
the risk differences rather than risk ratios. To that end, we can use margins:
. margins hhchild
Predictive margins Number of obs = 4,761
Model VCE: OIM
1._predict: Pr(estatus==Out_of_labor_force), predict(pr outcome(1))
2._predict: Pr(estatus==Unemployed), predict(pr outcome(2))
3._predict: Pr(estatus==Employed), predict(pr outcome(3))
Delta-method
Margin std. err. z P>|z| [95% conf. interval]
_predict#
hhchild
1#No .3021986 .0131047 23.06 0.000 .2765138 .3278834
1#Yes .3912783 .0119865 32.64 0.000 .3677852 .4147714
2#No .1630791 .0101239 16.11 0.000 .1432367 .1829216
2#Yes .139782 .0079417 17.60 0.000 .1242167 .1553474
3#No .5347223 .0136504 39.17 0.000 .507968 .5614766
3#Yes .4689397 .0116018 40.42 0.000 .4462006 .4916787
By default, margins uses predicted probabilities that account for the random effects. The prob-
abilities are obtained by integrating out the random effects such that their averages can be used to
make population-average inferences. Starting with our third outcome, employment, we can see that
the averaged probability of being employed full time is around 0.47 in the presence of children under
the age of 18 in the household, whereas this probability is around 0.53 in the absence of young
children. Thus, women have a higher chance of being employed full time if they have no young
children living with them in the same household.
We can further quantify the difference in chance by calculating the risk difference, which here is
around 0.07. Using a percentage scale rather than probability scale, we can say that the chance of
being employed is higher by about 7 percentage points if no young child is in the household. Looking
at the other outcome of interest, we can see that the chance of being out of the labor force is about
39% in the presence of young children in the household and around 30% otherwise, resulting in a
risk difference of around 9 percentage points.
We could also compute these risk differences directly by using the contrast operator r.:
. margins r.hhchild
Contrasts of predictive margins Number of obs = 4,761
Model VCE: OIM
1._predict: Pr(estatus==Out_of_labor_force), predict(pr outcome(1))
2._predict: Pr(estatus==Unemployed), predict(pr outcome(2))
3._predict: Pr(estatus==Employed), predict(pr outcome(3))
df chi2 P>chi2
hhchild@_predict
(Yes vs No) 1 1 26.36 0.0000
(Yes vs No) 2 1 3.28 0.0700
(Yes vs No) 3 1 13.33 0.0003
Joint 2 26.40 0.0000
Delta-method
Contrast std. err. [95% conf. interval]
hhchild@_predict
(Yes vs No) 1 .0890797 .0173496 .0550752 .1230843
(Yes vs No) 2 -.0232971 .0128562 -.0484948 .0019005
(Yes vs No) 3 -.0657826 .0180195 -.1011001 -.0304651
We can see that the results match the differences from the previous margins call. The predicted
probabilities underlying the margins analysis are also the default predictions of predict after
xtmlogit, re.
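As a check on the match between the two margins calls, the contrasts can be recomputed by subtracting the predictive margins reported earlier (Yes minus No for each outcome). The numbers are copied from the margins hhchild output.

```stata
* Risk differences (Yes minus No) from the predictive margins
display "Out of labor force: " %10.7f .3912783 - .3021986
display "Unemployed:         " %10.7f .139782 - .1630791
display "Employed:           " %10.7f .4689397 - .5347223
```

These differences reproduce the three contrasts reported by margins r.hhchild; what the contrast operator adds is the delta-method standard errors, confidence intervals, and tests.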
estatus         Coefficient  Std. err.       z   P>|z|   [95% conf. interval]
Out_of_lab~e
  hhchild
    Yes            .4924799   .1002988    4.91   0.000     .295898   .6890619
  age              -.004219   .0070064   -0.60   0.547   -.0179513   .0095133
  hhincome         -.006046    .001992   -3.04   0.002   -.0099503  -.0021417
  hhsigno
    Yes            .5036976   .0966982    5.21   0.000    .3141726   .6932225
  bwinner
    Yes            -.489057   .0745454   -6.56   0.000   -.6351632  -.3429507
  _cons           -.3930378    .298386   -1.32   0.188   -.9778636    .191788
Unemployed
  hhchild
    Yes            .0399687   .1238417    0.32   0.747   -.2027565   .2826939
  age              .0045538   .0085081    0.54   0.592   -.0121219   .0212294
  hhincome        -.0315377   .0027426  -11.50   0.000   -.0369131  -.0261624
  hhsigno
    Yes            .1495817   .1214242    1.23   0.218   -.0884053   .3875687
  bwinner
    Yes           -.2552257   .0968165   -2.64   0.008   -.4449826  -.0654689
  _cons           -.0417024   .3633406   -0.11   0.909   -.7538368    .670432
LR test vs. multinomial logit: chi2(3) = 286.41 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
At the bottom of the table, we can see the additional estimate for the covariance among the random
effects. When we look at the estimate relative to its standard error, or at the corresponding test result,
it looks as though the random effects are correlated considerably. To get a better idea of how strongly
the random effects are correlated, we might want to look at standard deviations and correlations, rather
than variances and covariances. We can do that by using the estat sd postestimation command:
. estat sd
The results of estat sd show that the correlation between the random effects, u1 and u2, is around
0.7, which appears rather substantial. If we had more than one estimated covariance and wanted to
test the inclusion of covariance estimates as a whole, we could perform a joint test on the covariances
against zero using the test command. Testing covariances against zero is straightforward because
they are not bounded, unlike the variances. Because here we have only a single covariance estimate,
we can simply take the test result reported by xtmlogit. The results show that we can reject the
hypothesis of the covariance being zero.
Alternatively, we could perform a likelihood-ratio test here because the model with independent
covariance structure is a special case of the model with no structure imposed. We will fit the model
with uncorrelated random effects again, store the results, and use the lrtest command to perform
the likelihood-ratio test:
. estimates store unstr
. xtmlogit estatus i.hhchild age hhincome i.hhsigno i.bwinner, baseoutcome(3)
(output omitted )
. estimates store indep
. lrtest unstr indep
Likelihood-ratio test
Assumption: indep nested within unstr
LR chi2(1) = 61.11
Prob > chi2 = 0.0000
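The likelihood-ratio statistic can be cross-checked against the two "LR test vs. multinomial logit" lines reported with the independent and unstructured models. Because both of those statistics compare against the same model without random effects, their difference should equal the statistic from lrtest up to rounding. The sketch below uses only numbers printed in the outputs.

```stata
* LR statistics against the MNL model without random effects
scalar lr_unstr = 286.41
scalar lr_indep = 225.31
display "difference         = " %6.2f scalar(lr_unstr) - scalar(lr_indep)
* p-value of the reported statistic with 1 degree of freedom
display "p-value, chi2(1)   = " chi2tail(1, 61.11)
```

The difference of 61.10 agrees with the reported 61.11 once rounding in the displayed statistics is taken into account, and the single degree of freedom corresponds to the one covariance that the unstructured model adds.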
The conclusion here is the same as before: the model with no structure imposed on the random
effects covariance matrix appears to be preferable. However, if we compare the results with those
from our previous model, we can see that the model with unstructured covariance matrix would not
necessarily lead to substantially different conclusions, judging by the differences in relative-risk ratios
between the two models. This becomes even more apparent if we were to look at the differences in
the averaged marginal probabilities. For example, the difference between having and not having a
child in the household with respect to not participating in the labor force was 0.089 on the probability
scale in the previous example with independent covariance structure. If we were to compute this
risk difference again for the unstructured model, we would find a difference of 0.092 with a similar
standard error.
As an aside, notice that when we refit the model with uncorrelated random effects, we specified
the option baseoutcome(3). We did not have to do this because we already knew that xtmlogit
would choose the third outcome as base, but we did so anyway to point out that it is good practice to be
explicit about this in this context. It is important that the models that are compared with a likelihood-
ratio test use the same base outcome. This is because, unlike in a conventional cross-sectional MNL
model, the likelihood solution differs with different base outcomes because the modeling of random
effects depends on what category is selected as the reference category.
estatus                 RRR  Std. err.       z   P>|z|   [95% conf. interval]
Out_of_lab~e
  hhchild
    Yes            1.800717   .2266555    4.67   0.000    1.407036   2.304549
  age              .9996159   .0147684   -0.03   0.979    .9710854   1.028985
  hhincome         .9878698   .0087391   -1.38   0.168    .9708891   1.005148
  hhsigno
    Yes            1.663632    .166548    5.08   0.000    1.367233   2.024287
  bwinner
    Yes            .6277743   .0491447   -5.95   0.000    .5384781   .7318786
Unemployed
  hhchild
    Yes            1.177757   .1930267    1.00   0.318    .8541801   1.623911
  age              1.006356   .0195273    0.33   0.744    .9688014   1.045366
  hhincome         .9706959   .0116513   -2.48   0.013    .9481262   .9938029
  hhsigno
    Yes            1.124478   .1463356    0.90   0.367    .8713222   1.451187
  bwinner
    Yes            .7795833   .0802992   -2.42   0.016     .637069   .9539784
Starting with the table header, we can see that our estimation sample consists of 4,310 observations
from 720 women. We saw earlier that we had 800 women in our dataset, so why do we now have
only 720? The answer is given in the note near the top of the output, which tells us
that 451 observations from 80 groups were dropped from the analysis because, in these cases,
there is no variation in the outcome variable over time. Technically, these observations could have
been kept in the estimation sample, but with no variation in the outcome, they would
not contribute anything to the likelihood, so they may as well be excluded. Looking at the
relative-risk ratios, we see that the results are fairly similar to our random-effects estimates.
We observe an RRR of around 1.8 for our variable of interest, hhchild, in the out-of-labor-force
category. The interpretation of the RRRs here is the same as with the random-effects model from
the earlier examples.
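The rule that panels without any change in the outcome drop out of the conditional likelihood can be illustrated outside Stata. The following is only a minimal Python sketch of that filtering step, not the code xtmlogit runs; the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def drop_invariant_panels(rows):
    """Keep only panels whose outcome varies over time.

    rows: iterable of (panel_id, outcome) pairs, one per observation.
    Returns (kept_rows, n_dropped_groups, n_dropped_obs).
    """
    rows = list(rows)
    outcomes = defaultdict(set)
    for pid, y in rows:
        outcomes[pid].add(y)
    # A panel contributes to the conditional likelihood only if it
    # shows at least two distinct outcomes.
    varying = {pid for pid, ys in outcomes.items() if len(ys) > 1}
    kept = [(pid, y) for pid, y in rows if pid in varying]
    n_dropped_groups = len(outcomes) - len(varying)
    n_dropped_obs = len(rows) - len(kept)
    return kept, n_dropped_groups, n_dropped_obs

# Toy data (hypothetical): panel 1 switches outcome, panel 2 never does.
toy = [(1, 1), (1, 3), (1, 3), (2, 2), (2, 2)]
kept, g, n = drop_invariant_panels(toy)
print(g, n)  # 1 2
```

Applied to the example data, the same rule would account for the 80 groups (451 observations) reported in the note.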
A drawback of the fixed-effects estimator is that we cannot make predictions that
account for the panel-level unobservables, because we do not estimate the unobservables
explicitly. For the same reason, we cannot perform useful marginal analyses using the
margins command.
. xtset id year
Panel variable: id (unbalanced)
Time variable: year, 2002 to 2014, but with gaps
Delta: 1 unit
. xtmlogit estatus i.hhchild age hhincome i.hhsigno i.bwinner, fe rrr
> rsample(10, rseed(123))
note: option vce() set to vce(robust) because of permutation sampling.
note: 80 groups (451 obs) omitted because of no variation in the outcome
variable over time.
Computing initial values ...
Setting up 3,495 permutations:
....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Fitting full model:
Iteration 0: Log pseudolikelihood = -908.26163
Iteration 1: Log pseudolikelihood = -906.4585
Iteration 2: Log pseudolikelihood = -906.45801
Iteration 3: Log pseudolikelihood = -906.45801
Fixed-effects multinomial logistic regression        Number of obs    =  4,310
Group variable: id                                   Number of groups =    720

                                                     Obs per group:
                                                                  min =      5
                                                                  avg =    6.0
                                                                  max =      7

                                                     Wald chi2(10)    =  72.91
Log pseudolikelihood = -906.45801                    Prob > chi2      = 0.0000

                                   (Std. err. adjusted for 720 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     estatus |        RRR   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Out_of_lab~e |
     hhchild |
        Yes  |   1.790876   .2663706     3.92   0.000     1.338011    2.397017
         age |    .994506   .0167663    -0.33   0.744     .9621816    1.027916
    hhincome |   .9858517   .0099036    -1.42   0.156     .9666309    1.005455
     hhsigno |
        Yes  |   1.559166   .1891864     3.66   0.000     1.229162    1.977769
     bwinner |
        Yes  |   .6304536   .0616622    -4.72   0.000     .5204757    .7636702
-------------+----------------------------------------------------------------
Unemployed   |
     hhchild |
        Yes  |   1.186982   .2173595     0.94   0.349     .8290349    1.699479
         age |   .9953453   .0215995    -0.21   0.830     .9538986    1.038593
    hhincome |   .9661192   .0127244    -2.62   0.009     .9414989    .9913833
     hhsigno |
        Yes  |   .9267669   .1294269    -0.54   0.586     .7048498    1.218553
     bwinner |
        Yes  |   .7490293    .088281    -2.45   0.014     .5945326    .9436738
Looking at the output, we can see that the results are very close to the results from the previous
example, with standard errors being slightly larger, as we would expect. Notice also that, by default,
xtmlogit computes cluster–robust standard errors in this case. It does so because the function being
maximized is not the true likelihood: a term that is a sum over all permutations is replaced by a
sum over a sample of the permutations.
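             |      (b)            (B)           (b-B)     sqrt(diag(V_b-V_B))
             |  fixed effects  random effects  Difference       Std. err.
-------------+----------------------------------------------------------------
Out_of_lab~e |
   1.hhchild |    .5881852       .4628125       .1253727        .0810809
         age |   -.0003842       -.004825       .0044408        .0131965
    hhincome |   -.0122043      -.0046922      -.0075122        .0086532
   1.hhsigno |    .5090034       .4967056       .0122977        .0326296
   1.bwinner |   -.4655745      -.4740919       .0085173        .0287868
-------------+----------------------------------------------------------------
Unemployed   |
   1.hhchild |     .163612      -.0401989        .203811        .1120618
         age |    .0063355       .0042644       .0020711        .0175947
    hhincome |    -.029742      -.0308468       .0011048        .0117062
   1.hhsigno |    .1173192          .0968       .0205192        .0520686
   1.bwinner |   -.2489958      -.2252587      -.0237371        .0393297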
Notice that we specified hausman such that we gave it the results of the estimator that is consistent
under both H0 and Ha first (the fixed-effects estimator) and the results of the estimator that is
efficient under H0 second (the random-effects estimator). We also specified the alleqs option to
apply the test to all equations present in both models. The result of the test, χ² = 8.05 with
df = 10, yielding p = 0.62, means that we do not reject H0. In other words, here we may proceed
with the random-effects estimator.
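The reported p-value can be checked by hand from the test statistic and its degrees of freedom. The sketch below is a minimal Python illustration, not part of Stata; it relies on the closed-form upper tail of the chi-squared distribution that holds for even degrees of freedom, and the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^k / k! for k = 0, ..., df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(8.05, 10)
print(round(p_value, 2))  # 0.62
```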
Technical note
The random-effects model is calculated using quadrature, which is an approximation whose accuracy
depends partially on the number of integration points used. We can use the quadchk command to see
if changing the number of integration points affects the results. If the results change, the quadrature
approximation is not accurate given the number of integration points. Try increasing the number
of integration points using the intpoints() option, and run quadchk again. Do not attempt to
interpret the results of estimates when the coefficients reported by quadchk differ substantially. See
[XT] quadchk for details and [XT] xtprobit for an example.
Because the xtmlogit likelihood function is calculated by Gauss–Hermite quadrature, on large
problems computations can be slow. Computation time is roughly proportional to the number of points
used for the quadrature.
Stored results
xtmlogit, re stores the following in e():
Scalars
e(N)              number of observations
e(N_g)            number of groups
e(k)              number of parameters
e(k_out)          number of outcomes
e(k_eq)           number of equations in e(b)
e(k_eq_model)     number of equations in overall model test
e(k_eq_base)      equation number of the base outcome
e(baseout)        the value of depvar to be treated as the base outcome
e(ibaseout)       index of the base outcome
e(k_dv)           number of dependent variables
e(df_m)           model degrees of freedom
e(ll)             log likelihood
e(ll_0)           log likelihood, constant-only model
e(ll_c)           log likelihood, comparison model
e(chi2)           χ²
e(chi2_c)         χ² for comparison test
e(p)              p-value for model test
e(p_c)            p-value for comparison test
e(df_c)           comparison test degrees of freedom
e(N_clust)        number of clusters
e(n_quad)         number of quadrature points
e(g_min)          smallest group size
e(g_avg)          average group size
e(g_max)          largest group size
e(rank)           rank of e(V)
e(rank0)          rank of e(V) for constant-only model
e(ic)             number of iterations
e(rc)             return code
e(converged)      1 if converged, 0 otherwise
Macros
e(cmd) gsem
e(cmd2) xtmlogit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(covariance) random-effects covariance structure
e(ivar) variable denoting groups
e(model) re
e(title) title in estimation output
e(distrib) Gaussian; the distribution of the random effect
e(clustvar) name of cluster variable
e(eqnames) names of equations
e(baselab) value label corresponding to base outcome
e(chi2type) Wald or LR; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. err.
e(intmethod) integration method
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(estat_cmd) program used to implement estat
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
Note that results stored in r() are updated when the command is replayed and will be replaced when
any r-class command is run after the estimation command.
The variable $U_{ijt}$ measures the utility of the $i$th panel toward outcome $j$ at time $t$ and is the sum
of observed and unobserved components. The observed part of $U_{ijt}$ consists of $x_{it}$, a row vector of
observed covariates of the $i$th panel at time $t$, and $\beta_j$, a column vector of coefficients for the $j$th
outcome. The vector of covariates is the same for each outcome, and the covariates do not vary over
the outcomes for a given panel at a given time point. The unobserved part of $U_{ijt}$ consists of $u_{ij}$
and $\epsilon_{ijt}$, where $u_{ij}$ is a panel-level unobserved heterogeneity term and $\epsilon_{ijt}$ is the
observation-level error term. For model identification, (2) must be normalized with respect to a base category.
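Written out, the decomposition described in this paragraph can be sketched as follows. The display restates the prose and is not a quotation of equation (2); the symbol $\epsilon_{ijt}$ for the observation-level error term is an assumed notation.

```latex
U_{ijt} \;=\; \underbrace{x_{it}\beta_j}_{\text{observed}}
        \;+\; \underbrace{u_{ij} + \epsilon_{ijt}}_{\text{unobserved}},
\qquad j = 1, \dots, J
```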
Assuming a type-1 extreme value distribution for $\epsilon_{ijt}$ gives rise to the MNL model

$$
\Pr(y_{it} = m \mid x_{it}, \beta, u_{ij}) = F(y_{it} = m,\, x_{it}\beta_j + u_{ij})
= \frac{\exp(x_{it}\beta_m + u_{im})}{\sum_{j=1}^{J} \exp(x_{it}\beta_j + u_{ij})}
$$

For normalization, $\beta_j$ and $u_{ij}$ are set to zero for $j = b$, where $b$ is the base outcome.
The random-effects estimator of xtmlogit assumes that $u_i$ is distributed $u_i \sim N(0, \Sigma_u)$. The
likelihood for the $i$th panel is

$$
\begin{aligned}
l_i &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
       \left\{ \prod_{t=1}^{T_i} F(y_{it} = m,\, x_{it}\beta_j + u_{ij}) \right\}
       \phi(u_i, \Sigma_u)\, du_i \\
    &\equiv \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(y_{it} = m,\, \eta_{ijt})\, du_i
\end{aligned}
$$

where $\phi$ is the probability density function of the normal distribution and $\eta_{ijt} = x_{it}\beta_j + u_{ij}$. This
integral of dimension $J - 1$ must be approximated numerically because it has no closed-form solution.
In the univariate case, the integral of a function multiplied by the kernel of the standard normal
distribution can be approximated using Gauss–Hermite quadrature. For $q$-point Gauss–Hermite
quadrature, let the abscissa and weight pairs be denoted by $(a_k^*, w_k^*)$, $k = 1, \dots, q$. The
Gauss–Hermite quadrature approximation is then

$$
\int_{-\infty}^{\infty} f(x) \exp(-x^2)\, dx \approx \sum_{k=1}^{q} w_k^* f(a_k^*)
$$

Using the standard normal distribution yields the approximation

$$
\int_{-\infty}^{\infty} f(x) \phi(x)\, dx \approx \sum_{k=1}^{q} w_k f(a_k)
$$

where $a_k = \sqrt{2}\, a_k^*$ and $w_k = w_k^* / \sqrt{\pi}$.
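The rescaling from the classical rule to the standard normal rule can be checked numerically. The following Python sketch uses the tabulated 3-point Gauss–Hermite rule and verifies that the rescaled rule reproduces low-order moments of the standard normal distribution. It is an illustration of the univariate formula only, not of the adaptive multivariate routine that xtmlogit uses, and the function name is hypothetical.

```python
import math

# Classical 3-point Gauss-Hermite rule for the weight function exp(-x^2):
# abscissas a*_k and weights w*_k (standard tabulated values).
A_STAR = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
W_STAR = [math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6]

# Rescale to the standard normal density:
# a_k = sqrt(2) * a*_k and w_k = w*_k / sqrt(pi).
A = [math.sqrt(2) * a for a in A_STAR]
W = [w / math.sqrt(math.pi) for w in W_STAR]

def normal_expectation(f):
    """Approximate the integral of f(x) * phi(x) dx with the rescaled rule."""
    return sum(w * f(a) for a, w in zip(A, W))

print(normal_expectation(lambda x: 1.0))     # total mass, close to 1
print(normal_expectation(lambda x: x ** 2))  # variance, close to 1
print(normal_expectation(lambda x: x ** 4))  # fourth moment, close to 3
```

A $q$-point rule is exact for polynomials up to degree $2q - 1$, which is why three points already recover these moments; smooth but nonpolynomial integrands such as the MNL likelihood generally need more points.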
We can use a change-of-variables technique to transform the multivariate integral into a set of nested
univariate integrals. Each univariate integral can then be evaluated using Gauss–Hermite quadrature.
Let $v$ be a random vector whose elements are independently standard normal, and let $L$ be the
Cholesky decomposition of $\Sigma_u$; that is, $\Sigma_u = LL'$. In distribution, we have that $u_i \approx Lv$, and
the linear predictions vector as a function of $v$ is

$$
\tilde{\eta}_{ijt} = x_{it}\beta_j + Lv
$$

so the likelihood for a given panel is

$$
l_i = (2\pi)^{-r/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
      \exp\left\{ \log f(y_i, \eta_i) - \frac{1}{2} \sum_{k=1}^{r} v_k^2 \right\} dv_1 \cdots dv_r
$$

where

$$
\tilde{\eta}_{ijtk} = x_{it}\beta_j + L a_k
$$

With adaptive quadrature, the approximation to the panel likelihood is

$$
\ddot{l}_i = \sum_{k_1=1}^{q} \cdots \sum_{k_r=1}^{q}
  \left[ \exp\left\{ \sum_{t=1}^{T_i} \log f(y_{it} = m,\, \check{\eta}_{ijtk}) \right\}
  \prod_{s=1}^{r} \omega_{ks} \right]
$$

where

$$
\check{\eta}_{ijtk} = x_{it}\beta_j + L \alpha_k
$$

and $\alpha_k$ and the $\omega_{ks}$ are the adaptive versions of the abscissas and weights after an orthogonalizing
transformation, which eliminates posterior covariances between the latent variables. $\alpha_k$ and the $\omega_{ks}$
are functions of $a_k$ and $w_k$ as well as the posterior mean and the posterior variance of $v$.
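The fixed-effects estimator follows the derivations in Chamberlain (1980) and Pforr (2014). Let
$Y_i = (Y_{i1}, \dots, Y_{iT_i})$ be the sequence of outcomes of the $i$th panel, and let $Y_{it} = (Y_{i1t}, \dots, Y_{iJt})$
be a vector with elements $Y_{ijt} = 1(i \text{ chooses } j \text{ at } t)$ that indicate the chosen outcome of the $i$th panel
at time $t$.

The distribution of times that panel $i$ chose each of the $J$ alternatives over time points $T_i$ is the
sufficient statistic $\Theta_i = \sum_{t=1}^{T_i} Y_{it} = c_i = (c_{i1}, \dots, c_{iJ})$. Conditioning on the sufficient statistic $\Theta_i$,
we find the probability of panel $i$ choosing a sequence $Y_i = s_i$ that is consistent with $c_i$ is

$$
\Pr(Y_i = s_i \mid \Theta_i = c_i) =
\frac{\exp\left( \sum_{t=1}^{T_i} \sum_{j=1, j \neq b}^{J} Y_{ijt}\, x_{it}\beta_j \right)}
     {\sum_{\widetilde{Y}_{ijt} \in \Psi(c_i)} \exp\left( \sum_{t=1}^{T_i} \sum_{j=1, j \neq b}^{J} \widetilde{Y}_{ijt}\, x_{it}\beta_j \right)}
$$

where $\Psi(c_i)$ is the set of all permutations of individual $i$'s observed sequence of outcomes that satisfy
the condition $\sum_{t=1}^{T_i} \widetilde{Y}_{it} = c_i$. That is,

$$
\Psi(c_i) = \left\{ \widetilde{Y}_i = (\widetilde{Y}_{i1}, \dots, \widetilde{Y}_{iT_i}) \;\middle|\; \sum_{t=1}^{T_i} \widetilde{Y}_{it} = c_i \right\}
$$

and $\widetilde{Y}_{it} = (\widetilde{Y}_{i1t}, \dots, \widetilde{Y}_{iJt})$ is a vector of indicators with respect to the permutations of the observed
outcome sequence $Y_i$. The log likelihood of panel $i$ is then the natural logarithm of the above
probability

$$
\log l_i = \sum_{t=1}^{T_i} \sum_{j=1, j \neq b}^{J} Y_{ijt}\, x_{it}\beta_j
 - \log \sum_{\widetilde{Y}_{ijt} \in \Psi(c_i)} \exp\left( \sum_{t=1}^{T_i} \sum_{j=1, j \neq b}^{J} \widetilde{Y}_{ijt}\, x_{it}\beta_j \right)
$$

and the overall log likelihood is $\sum_{i=1}^{N} \log l_i$.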
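For a short panel, the conditional log likelihood can be evaluated by brute force, which makes the role of the permutation set concrete. The Python sketch below enumerates every distinct reordering of an observed sequence and computes the panel's contribution. It is a toy illustration under assumed inputs (the function name and the linear predictors in `xb` are hypothetical) and is not how xtmlogit evaluates the likelihood.

```python
import itertools
import math

def conditional_loglik(seq, xb):
    """Conditional log likelihood of one panel by brute-force enumeration.

    seq : observed outcome at each time point, e.g. (0, 1, 1)
    xb  : xb[t][j] is the linear predictor x_it * beta_j at time t for
          outcome j; the base outcome's column must be zero.
    Returns (log likelihood, set of all distinct reorderings of seq).
    """
    def score(s):
        # Sum over t of the linear predictor of the outcome chosen at t.
        return sum(xb[t][j] for t, j in enumerate(s))

    # Psi(c_i): all distinct reorderings of the observed sequence.
    psi = set(itertools.permutations(seq))
    log_denom = math.log(sum(math.exp(score(s)) for s in psi))
    return score(seq) - log_denom, psi

# Hypothetical panel: 3 time points, 3 outcomes, outcome 0 is the base.
xb = [[0.0, 0.2, -0.1],
      [0.0, 0.5, 0.3],
      [0.0, -0.4, 0.1]]
loglik, psi = conditional_loglik((0, 1, 1), xb)
print(len(psi))  # 3 distinct orderings of one 0 and two 1s
```

Because the probabilities of all sequences in the permutation set sum to one, exponentiating the log likelihood of each element and adding them up is a quick sanity check on any implementation.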
Consistent, albeit less efficient, estimates of the parameters in $\beta_j$ can be obtained by taking a
random sample of the elements in $\Psi(c_i)$. The total number of permutations in $\Psi(c_i)$ is

$$
K_i = \frac{T_i!}{c_{i1}! \cdots c_{ij}! \cdots c_{iJ}!}
$$
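The size of the permutation set grows quickly with the panel length, which is what motivates sampling. As a rough illustration, the count can be computed directly from an observed sequence; the Python function below is a hypothetical helper, not part of Stata.

```python
import math
from collections import Counter

def n_permutations(seq):
    """Number of distinct reorderings K_i of an observed outcome sequence."""
    counts = Counter(seq)            # c_ij: how often each outcome occurs
    k = math.factorial(len(seq))     # T_i!
    for c in counts.values():
        k //= math.factorial(c)      # divide by c_i1! ... c_iJ!
    return k

# A panel observed 7 times with outcomes 1, 2, 3 chosen 3, 1, and 3 times.
print(n_permutations((1, 1, 3, 3, 3, 2, 1)))  # 7!/(3! 1! 3!) = 140
```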
Let $\ddot{\Psi}(c_i)$ be a random subset of $\Psi(c_i)$. $\ddot{\Psi}(c_i)$ consists of $L_i + 1$ elements, where $L_i$ elements are
randomly drawn without replacement and with equal probability from the set $\Psi(c_i)$ that has the observed
sequence of outcomes removed, and then the observed sequence is added such that $\ddot{\Psi}(c_i)$ always
contains the observed outcome sequence. The log likelihood with sampled permutations is

$$
\log l_i = \sum_{t=1}^{T_i} \sum_{j=1, j \neq b}^{J} Y_{ijt}\, x_{it}\beta_j
 - \log \sum_{\widetilde{Y}_{ijt} \in \ddot{\Psi}(c_i)} \exp\left( \sum_{t=1}^{T_i} \sum_{j=1, j \neq b}^{J} \widetilde{Y}_{ijt}\, x_{it}\beta_j \right)
$$

Maximizing the above yields a consistent estimator of $\beta_j$ that is less efficient than the one based on
the full set of permutations $\Psi(c_i)$ because of the added Monte Carlo variance. The smaller the size of
the sample relative to $K_i$, the number of all permutations, the less efficient it becomes. The permutation
sampling implemented in xtmlogit follows the approach outlined in D’Haultfœuille and Iaria (2016).
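The construction of the sampled set can be sketched in a few lines of Python. This is a toy version under stated assumptions: it enumerates all permutations first, which is feasible only for short sequences, and the function name and the `n_draws` and `seed` arguments are hypothetical rather than a description of how the rsample() option works internally.

```python
import itertools
import random

def sample_permutation_set(seq, n_draws, seed=None):
    """Build a sampled set: the observed sequence plus up to n_draws others."""
    rng = random.Random(seed)
    observed = tuple(seq)
    # All distinct reorderings except the observed one, in a reproducible order.
    others = sorted(set(itertools.permutations(observed)) - {observed})
    n_draws = min(n_draws, len(others))
    # Draw without replacement and with equal probability.
    drawn = rng.sample(others, n_draws)
    # The observed sequence is always part of the sampled set.
    return [observed] + drawn

subset = sample_permutation_set((0, 1, 1, 2), n_draws=5, seed=123)
print(len(subset))  # 6 elements: the observed sequence plus 5 draws
```

Keeping the observed sequence in the set guarantees that the numerator of the conditional probability also appears in the sampled denominator.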
References
Andersen, E. B. 1970. Asymptotic properties of conditional maximum likelihood estimators. Journal of the Royal
Statistical Society, Series B 32: 283–301. https://doi.org/10.1111/j.2517-6161.1970.tb00842.x.
Börsch-Supan, A. 1990. Panel data analysis of housing choices. Regional Science and Urban Economics 20: 65–82.
https://doi.org/10.1016/0166-0462(90)90025-X.
Chamberlain, G. 1980. Analysis of covariance with qualitative data. Review of Economic Studies 47: 225–238.
https://doi.org/10.2307/2297110.
D’Haultfœuille, X., and A. Iaria. 2016. A convenient method for the estimation of the multinomial logit model with
fixed effects. Economics Letters 141: 77–79. https://doi.org/10.1016/j.econlet.2016.02.002.
Grilli, L., and C. Rampichini. 2007. A multilevel multinomial logit model for the analysis of graduates’ skills.
Statistical Methods and Applications 16: 381–393. https://doi.org/10.1007/s10260-006-0039-z.
Hartzel, J., A. Agresti, and B. S. Caffo. 2001. Multinomial logit random effects models. Statistical Modelling 1:
81–102. https://doi.org/10.1177/1471082X0100100201.
Lancaster, T. 2000. The incidental parameter problem since 1948. Journal of Econometrics 95: 391–413.
https://doi.org/10.1016/S0304-4076(99)00044-5.
Pforr, K. 2014. femlogit—Implementation of the multinomial logit model with fixed effects. Stata Journal 14: 847–862.
Also see
[XT] xtmlogit postestimation — Postestimation tools for xtmlogit
[XT] quadchk — Check sensitivity of quadrature approximation
[XT] xtlogit — Fixed-effects, random-effects, and population-averaged logit models
[XT] xtset — Declare data to be panel data
[BAYES] bayes: xtmlogit — Bayesian random-effects multinomial logit model
[R] clogit — Conditional (fixed-effects) logistic regression
[R] mlogit — Multinomial (polytomous) logistic regression
[R] mprobit — Multinomial probit regression
[SVY] svy estimation — Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and
Stata Press are registered trademarks with the World Intellectual Property Organization
of the United Nations. Other brand and product names are registered trademarks or
trademarks of their respective companies. Copyright © 1985–2023 StataCorp LLC,
College Station, TX, USA. All rights reserved.