Fit Cmclogit
Example of application
Conclusions
What is the problem?
In 1992, Stata 3 introduced the clogit command to estimate the
conditional (fixed-effects) logistic regression model; it reports the
McFadden Pseudo R²
In 2007, Stata 10 introduced the asclogit command to estimate the
alternative-specific conditional logit model
In 2019, Stata 16 introduced the choice models (cm) commands
But neither asclogit nor the cm commands report a likelihood-ratio
chi² test statistic or any Pseudo R² to assess the fit of the model!
What is the solution in Stata?
For McFadden’s conditional logit choice model, my fit_cmclogit.ado
calculates the following test statistic and Pseudo R²s, which were
tested in Monte Carlo simulation studies in the 1990s and 2000s:
– Likelihood-ratio chi² test statistic using a zero model with alternative-specific constants
– McFadden Pseudo R² (likelihood-ratio index) (1974)
– Adjusted McFadden Pseudo R² (1985)
– Maddala Pseudo R² (1983)
– Cragg & Uhler Pseudo R² (1970)
– Aldrich & Nelson Pseudo R² (1984)
– Aldrich & Nelson Pseudo R² with Veall & Zimmermann correction (1994)
Example of application
North Rhine-Westphalia Election Study of 1995
Wald chi2(9) = 263.97
Prob > chi2  = 0.0000
--------------------------------------------------------------------------------------
               vote | Coefficient  Std. err.        z   P>|z|     [95% conf. interval]
--------------------+-----------------------------------------------------------------
party               |
           gprefall |
                yes |    2.193726   .1401447    15.65   0.000     1.919048    2.468405
--------------------+-----------------------------------------------------------------
SPD                 |
         confession |
                yes |   -.9000949   .3084457    -2.92   0.004    -1.504637   -.2955524
                    |
          education |
        sec.modern+ |   -.1846034   .3412324    -0.54   0.589    -.8534067    .4841998
     grammar school |    -.645902   .5506053    -1.17   0.241    -1.725069    .4332646
 college/university |    -1.03819   .6887728    -1.51   0.132     -2.38816    .3117801
                    |
              _cons |    .4353825   .2737489     1.59   0.112    -.1011554    .9719205
--------------------+-----------------------------------------------------------------
FDP                 |
         confession |
                yes |   -.6455168   .3947333    -1.64   0.102     -1.41918    .1281462
                    |
          education |
        sec.modern+ |    1.393966   .4604399     3.03   0.002     .4915205    2.296412
     grammar school |    2.076665   .6434303     3.23   0.001     .8155643    3.337765
 college/university |    3.160799   .5990928     5.28   0.000     1.986598    4.334999
                    |
              _cons |   -1.956077   .3772011    -5.19   0.000    -2.695377   -1.216776
--------------------+-----------------------------------------------------------------
CDU                 | (base alternative)
--------------------------------------------------------------------------------------
Output of my fit_cmclogit.ado
. fit_cmclogit
Aldrich & Nelson Pseudo R2 with Veall & Zimmermann correction = 0.7246
My ado returns the following r() results
. return list
scalars:
r(logl_m0) = -491.3376899127339
r(logl_ma) = -259.6791267683948
r(an_pr2_vz) = .7246287335654139
r(an_pr2) = .4789712842842913
r(cu_pr2) = .7009448689123344
r(ml_pr2) = .6011939268610912
r(rho2_bar) = .4531680913464735
r(rho2) = .4714854323214728
r(lr_p) = 0
r(lr_df) = 9
r(lr_chi2) = 463.3171262886782
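The likelihood-ratio test statistic and the two McFadden measures can be cross-checked by hand from the two returned log-likelihoods. The Python sketch below is not part of fit_cmclogit.ado; it only recomputes three of the scalars listed above. It assumes that K, the number of estimated slopes, equals r(lr_df) = 9.

```python
# Values copied from the `return list` output above.
logl_m0 = -491.3376899127339  # r(logl_m0): zero model with alternative-specific constants
logl_ma = -259.6791267683948  # r(logl_ma): fitted model
k_slopes = 9                  # assumption: K equals r(lr_df)

# Likelihood-ratio chi2 test statistic
lr_chi2 = 2 * (logl_ma - logl_m0)

# McFadden Pseudo R2 (likelihood-ratio index)
rho2 = 1 - logl_ma / logl_m0

# Adjusted McFadden Pseudo R2 (penalised by the number of slopes)
rho2_bar = 1 - (logl_ma - k_slopes) / logl_m0

print(f"LR chi2({k_slopes})      = {lr_chi2:.4f}")
print(f"McFadden R2      = {rho2:.4f}")
print(f"Adj. McFadden R2 = {rho2_bar:.4f}")
```

The three results should agree with r(lr_chi2), r(rho2) and r(rho2_bar) up to rounding.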
Conclusions
What have I shown?
What’s in progress?
Contact
Affiliation
– Email: [email protected]
– URL: https://langer.soziologie.uni-halle.de
References
– Aldrich, J.H. & Nelson, F.D. (1984):
Linear probability, logit, and probit models. Newbury Park: Sage
(Quantitative Applications in the Social Sciences, 45)
– Amemiya, T. (1981):
Qualitative response models: a survey. Journal of Economic Literature, 19, pp. 1483-1536
– Ben-Akiva, M. & Lerman, S.R. (1985; 1991, 4th printing):
Discrete choice analysis. Theory and application to travel demand. Cambridge, Mass.:
MIT Press
– Cox, D.R. & Snell, E.J. (1989):
The analysis of binary data. London: Chapman & Hall
– Cragg, J.G. & Uhler, R. (1970):
The demand for automobiles. Canadian Journal of Economics, 3, pp. 386-406
– DeMaris, A. (2002):
Explained variance in logistic regression. A Monte Carlo study of proposed
measures. Sociological Methods & Research, 31, 1, pp. 27-74
– Domencich, T.A. & McFadden, D.L. (1975):
Urban travel demand. A behavioral analysis. Amsterdam and Oxford: North-Holland
Publishing Company
– Efron, B. (1978):
Regression and ANOVA with zero-one data. Measures of residual variation. Journal of
the American Statistical Association, 73, pp. 113-121
– Hagle, T.M. & Mitchell II, G.E. (1992):
Goodness-of-fit measures for probit and logit. American Journal of Political Science, 36,
3, pp. 762-784
References 2
– Hensher, D.A., Rose, J.M. & Greene, W.H. (2005):
Applied choice analysis. A primer. Cambridge: Cambridge University Press
– Long, J.S. (1997):
Regression models for categorical and limited dependent variables. Thousand Oaks,
CA: Sage
– Long, J.S. & Freese, J. (2000):
Scalar measures of fit for regression models. Bloomington, IN: Indiana University
– Long, J.S. & Freese, J. (2003, 2nd ed.):
Regression models for categorical dependent variables using Stata. College Station,
TX: Stata Press
– Maddala, G.S. (1983):
Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge
University Press
– McFadden, D. (1974):
Conditional logit analysis of qualitative choice behavior. In: P. Zarembka (ed.),
Frontiers in econometrics. New York: Academic Press, pp. 105-142
– McFadden, D. (1978):
Quantitative methods for analysing travel behaviour of individuals: some recent
developments. In: D.A. Hensher & P.R. Stopher (eds), Behavioural travel
modelling. London: Croom Helm, pp. 279-318
– McKelvey, R. & Zavoina, W. (1975):
A statistical model for the analysis of ordinal level dependent variables. Journal of
Mathematical Sociology, 4, pp. 103-120
– Nagelkerke, N.J.D. (1991):
A note on a general definition of the coefficient of determination. Biometrika, 78, 3,
pp. 691-693
References 3
– Veall, M.R. & Zimmermann, K.F. (1992):
Pseudo-R2 in the ordinal probit model. Journal of Mathematical Sociology, 16, 4,
pp. 333-342
– Veall, M.R. & Zimmermann, K.F. (1994):
Evaluating Pseudo-R2's for binary probit models. Quality & Quantity, 28, pp. 151-164
– Windmeijer, F.A.G. (1995):
Goodness-of-fit measures in binary choice models. Econometric Reviews, 14, 1,
pp. 101-116
– Zimmermann, K.F. (1993):
Goodness of fit in qualitative choice models: review and evaluation. In: H. Schneeweiß &
K. Zimmermann (eds), Studies in applied econometrics. Heidelberg: Physica, pp. 25-74
Appendix
What is the solution?
Short review of the Monte Carlo studies in which econometricians
systematically tested the most common Pseudo R²s for binary and
ordinal probit / logit models:
– Hagle & Mitchell 1992
– Veall & Zimmermann 1992, 1993, 1994
– Windmeijer 1995
– DeMaris 2002
Which Pseudo-R²s were tested in the MC studies?
Likelihood-based measures:
– Maddala / Cox & Snell Pseudo R² (1983 / 1989)
– Cragg & Uhler / Nagelkerke Pseudo R² (1970 / 1991)
Log-likelihood-based measures:
– McFadden Pseudo R² (1974)
– Aldrich & Nelson Pseudo R² (1984)
– Aldrich & Nelson Pseudo R² with the Veall & Zimmermann correction (1992)
Based on the estimated probabilities:
– Lave / Efron Pseudo R² (1970 / 1978)
Based on the variance decomposition of the estimated probits / logits:
– McKelvey & Zavoina Pseudo R² (1975)
Results of the Monte Carlo studies for
binary / ordinal logits or probits
The McKelvey & Zavoina Pseudo R² is the best estimator of the “true R²” of the OLS regression
The Aldrich & Nelson Pseudo R² with the Veall & Zimmermann correction is the best
approximation of the McKelvey & Zavoina Pseudo R²
The Lave / Efron, Aldrich & Nelson, McFadden and Cragg & Uhler Pseudo R²s
underestimate the “true R²” of the OLS regression
My personal advice: use the McKelvey & Zavoina Pseudo R² or the Aldrich & Nelson
Pseudo R² with Veall & Zimmermann correction to assess the fit of
binary and ordinal logit models
Log-Likelihood-based measures 1
McFadden Pseudo R² (1974), provided by Stata
\[
\text{McFadden Pseudo } R^2 = 1 - \frac{\log L_A}{\log L_0}
\]
Theoretical range: 0 ≤ McFadden Pseudo R² ≤ 1
Log-Likelihood-based measures 2
Adjusted McFadden Pseudo R² (1985)
\[
\text{McFadden Pseudo } R^2_{\text{adjusted}} = 1 - \frac{\log L_A - K}{\log L_0}
\]
Correction of the McFadden Pseudo R² by the total number
of estimated logistic slopes (K), proposed by Ben-Akiva
& Lerman (1985: 167)
Likelihood-based measures 1
Maddala Pseudo R² (1983) or Cox & Snell Pseudo R² (1989):
\[
\text{Maddala Pseudo } R^2 \; (R^2_{ML})
= 1 - \left(\frac{L_0}{L_A}\right)^{2/n}
= 1 - \exp\left(-\frac{\text{L.R.}\chi^2}{n}\right)
= 1 - \exp\left(-\frac{2\,(\log L_A - \log L_0)}{n}\right)
\]
Range: \( 0 \le \text{Maddala Pseudo } R^2 \le 1 - L_0^{2/n} \)
Legend:
L0 : Likelihood of zero model (constant only)
LA : Likelihood of alternative model
n : number of cases
Likelihood-based measures 2
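Cragg & Uhler Pseudo R² (1970) or Nagelkerke Pseudo R² (1991):
\[
\text{Cragg \& Uhler Pseudo } R^2
= \frac{R^2_{ML}}{\max R^2_{ML}}
= \frac{1 - \left(L_0 / L_A\right)^{2/n}}{1 - L_0^{2/n}}
= \frac{1 - \exp\left(-\text{L.R.}\chi^2 / n\right)}{1 - \exp\left(\frac{2}{n}\log L_0\right)}
\]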
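As a hedged numerical check, the Maddala and Cragg & Uhler formulas can be applied to the log-likelihoods returned in the application example. The number of cases n is not printed anywhere in these slides, so the sketch below infers it by inverting the Maddala formula with the reported r(ml_pr2). That step is an assumption made only for this illustration, not output of fit_cmclogit.ado.

```python
import math

# Values copied from the `return list` output of the application example.
logl_m0 = -491.3376899127339      # log-likelihood of the zero model
logl_ma = -259.6791267683948      # log-likelihood of the fitted model
ml_reported = 0.6011939268610912  # r(ml_pr2)

lr_chi2 = 2 * (logl_ma - logl_m0)

# Assumption: n is not shown in the slides, so solve
# R2_ML = 1 - exp(-LR/n) for n and round to the nearest integer.
n = round(-lr_chi2 / math.log(1 - ml_reported))

# Maddala / Cox & Snell Pseudo R2
r2_ml = 1 - math.exp(-lr_chi2 / n)

# Upper limit of the Maddala measure: 1 - L0^(2/n), written with log L0
r2_ml_max = 1 - math.exp(2 * logl_m0 / n)

# Cragg & Uhler / Nagelkerke Pseudo R2: Maddala rescaled by its maximum
r2_cu = r2_ml / r2_ml_max

print(n, round(r2_ml, 4), round(r2_ml_max, 4), round(r2_cu, 4))
```

With the inferred n, the rescaled value should reproduce r(cu_pr2) up to rounding. This illustrates that the Cragg & Uhler measure is simply the Maddala measure divided by its attainable maximum.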
2
L .R . 2
A ldrich & N elson P seudo R 2
L .R . 2 n
2 log LA log L0
2 log LA log L0 n
Veall & Zimmermann Correction
Veall & Zimmermann (1994) propose a correction
of the Aldrich & Nelson Pseudo R² by its upper
limit
– Range of the A&N Pseudo R²:
\[
0 \le \text{A\&N Pseudo } R^2 \le \frac{-2 \log L_0}{n - 2 \log L_0}
\]
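The correction can be sketched numerically as dividing the Aldrich & Nelson Pseudo R² by the upper limit given above. The Python lines below use the log-likelihoods returned in the application example. The value n = 504 is an assumption: it is not printed in the slides, but it is the number of cases implied by the reported r(ml_pr2).

```python
# Values copied from the `return list` output of the application example.
logl_m0 = -491.3376899127339  # log-likelihood of the zero model
logl_ma = -259.6791267683948  # log-likelihood of the fitted model
n = 504                       # assumption: number of cases implied by r(ml_pr2)

lr_chi2 = 2 * (logl_ma - logl_m0)

# Aldrich & Nelson Pseudo R2
an_r2 = lr_chi2 / (lr_chi2 + n)

# Upper limit of the A&N measure
an_max = -2 * logl_m0 / (n - 2 * logl_m0)

# Veall & Zimmermann correction: rescale by the upper limit
an_vz = an_r2 / an_max

print(round(an_r2, 4), round(an_max, 4), round(an_vz, 4))
```

Under this assumption the results should match r(an_pr2) and r(an_pr2_vz) up to rounding. Because the upper limit is below 1, the corrected measure is always larger than the uncorrected one.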
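\[
\text{Lave / Efron Pseudo } R^2
= 1 - \frac{\sum_{i=1}^{n} \left(Y_i - \hat{p}_i\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}
\quad \text{with} \quad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i
\]
Legend:
Yi : Value of the dependent variable for case i (1 or 0)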
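A minimal Python sketch of the Lave / Efron measure follows. The six outcomes and predicted probabilities are hypothetical values invented for illustration; they do not come from the election study.

```python
def efron_r2(y, p_hat):
    """Lave / Efron Pseudo R2: 1 - SS(residual) / SS(total)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: observed 0/1 outcomes and estimated probabilities.
y = [1, 0, 1, 1, 0, 0]
p_hat = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]

print(round(efron_r2(y, p_hat), 4))
```

Predicting the sample mean for every case gives 0, and perfect predictions give 1, so the measure behaves like the R² of a linear regression of Y on the estimated probabilities.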
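\[
\text{M\&Z Pseudo } R^2
= \frac{\mathrm{Var}(\hat{y}^*)}{\mathrm{Var}(\hat{y}^*) + \pi^2 / 3}
\quad \text{with} \quad
\mathrm{Var}(\hat{y}^*) = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}^*_i - \bar{\hat{y}}^*\right)^2
\]
Range: 0 ≤ M&Z Pseudo R² ≤ 1
Legend:
Var(ŷ*) : Variance of the estimated logits (latent variable Y*)
ŷi* : Estimated logit of case i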
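The McKelvey & Zavoina measure can be sketched in a few lines of Python. The sketch assumes a logit model, so the error variance of the latent variable is fixed at π²/3, and the five estimated logits are hypothetical values chosen for illustration.

```python
import math


def mz_r2(logits):
    """McKelvey & Zavoina Pseudo R2 for a logit model.

    Explained variance of the estimated logits divided by the
    explained variance plus the logistic error variance pi^2 / 3.
    """
    n = len(logits)
    mean = sum(logits) / n
    var_hat = sum((v - mean) ** 2 for v in logits) / n
    return var_hat / (var_hat + math.pi ** 2 / 3)


# Hypothetical estimated logits for five cases.
logits = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(round(mz_r2(logits), 4))
```

If all estimated logits are identical, the explained variance is zero and so is the measure. The more the logits spread out, the closer it gets to 1.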
Case-specific variables:
Religious affiliation (confession):
1) Yes
0) No
Degree of education (education):
1) Secondary modern school
2) Secondary modern school +
3) Grammar school
4) College/University
Intention to vote for party SPD, FDP or CDU (vote):
1) Yes
0) No
\[
\ln \frac{P_i(Y = j)}{P_i(Y = J)}
= \sum_{k=1}^{K} \gamma_k \left(z_{ijk} - z_{iJk}\right)
+ \alpha_j + \sum_{l=1}^{L} \beta_{jl} X_{il}
\]
– First sum: rational-choice part with alternative-specific γ logit slopes for the differences of Zk
– Constant plus second sum: multinomial logit model to estimate the effects of case-specific exogenous variables
– β: logistic slope of the effect of Xl on the comparison j vs. J
– α: logistic constant for the comparison j vs. J
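To illustrate how this log-odds equation translates into choice probabilities, the following Python sketch plugs in coefficients from the estimation output of the application example. The voter profile is hypothetical, and coding gprefall as a 0/1 indicator per party is an assumption, because the slides do not document that variable. Only the college/university education contrast is included to keep the example short.

```python
import math

# Coefficients copied from the estimation output (base alternative: CDU).
GAMMA_GPREFALL = 2.193726  # slope of the alternative-specific variable
CASE_COEFS = {
    "SPD": {"_cons": 0.4353825, "confession": -0.9000949, "college": -1.03819},
    "FDP": {"_cons": -1.956077, "confession": -0.6455168, "college": 3.160799},
    "CDU": {"_cons": 0.0, "confession": 0.0, "college": 0.0},  # base alternative
}


def choice_probabilities(gprefall, confession, college):
    """Return P(vote = party) for one voter as a dict keyed by party."""
    utilities = {}
    for party, b in CASE_COEFS.items():
        utilities[party] = (
            GAMMA_GPREFALL * gprefall[party]  # alternative-specific part
            + b["_cons"]                      # constant vs. the base alternative
            + b["confession"] * confession    # case-specific part
            + b["college"] * college
        )
    denom = sum(math.exp(v) for v in utilities.values())
    return {party: math.exp(v) / denom for party, v in utilities.items()}


# Hypothetical voter: religious affiliation, college/university degree,
# gprefall = yes for the FDP only (coding assumed for this sketch).
probs = choice_probabilities({"SPD": 0, "FDP": 1, "CDU": 0}, confession=1, college=1)
for party, p in probs.items():
    print(f"{party}: {p:.4f}")
```

Taking the log of the ratio of any party's probability to the CDU probability recovers that party's linear predictor, which is the left-hand side of the equation above.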
Estimated effects of exogenous variables