Ordinal logistic regression: Stata commands


So far in this course we have analyzed data in which the response variable has exactly two levels, but what about situations in which there are more than two levels? In this chapter of Logistic Regression with Stata, we cover the commands used for multinomial and ordered logistic regression, which allow for more than two categories. Multinomial response models have much in common with the logistic regression models that we have covered so far. However, you will find that there are differences in some of the assumptions, in the analyses, and in the interpretation of these models.

4.2 Ordered Logistic Regression

4.2.1 Example 1

Let's begin our discussion of ordered logistic regression with an example that has a binary outcome variable, honcomp, which indicates whether a student is enrolled in an "honors composition" course. We begin with an ordinary logistic regression.

use clear
logit honcomp female

Logit estimates                                   Number of obs =        200
                                                  LR chi2(1)    =       3.94
                                                  Prob > chi2   =     0.0473
Log likelihood = -113.6769                        Pseudo R2     =     0.0170

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .6513707   .3336752     1.95   0.051    -.0026207    1.305362
       _cons |  -1.400088   .2631619    -5.32   0.000    -1.915876   -.8842998
------------------------------------------------------------------------------

Next, we will run an ordered logistic regression for the same model using Stata's ologit command.

ologit honcomp female

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(1)    =       3.94
                                                  Prob > chi2   =     0.0473
Log likelihood = -113.6769                        Pseudo R2     =     0.0170

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .6513707   .3336752     1.95   0.051    -.0026207    1.305362
-------------+----------------------------------------------------------------
       _cut1 |   1.400088   .2631619          (Ancillary parameter)
------------------------------------------------------------------------------

As you can see, the values of the coefficients and the standard errors are the same, except that the sign of _cut1 is reversed from that of _cons. We will explain shortly what _cut1 is, although it is already clear that it is related to the constant found in the logistic regression model.

4.2.2 Example 2

For our next example we will select ses as the response variable.
It has three ordered categories. Here are the frequencies for each of the categories.
tabulate ses

        ses |      Freq.     Percent        Cum.
------------+-----------------------------------
        low |         47       23.50       23.50
     middle |         95       47.50       71.00
       high |         58       29.00      100.00
------------+-----------------------------------
      Total |        200      100.00

We can also obtain much of the same information using the codebook command.

codebook ses

ses ------------------------------------------------------------- (unlabeled)

                 type:  numeric (float)
                label:  sl

                range:  [1,3]                       units:  1
        unique values:  3                  coded missing:  0 / 200

           tabulation:  Freq.   Numeric  Label
                           47         1  low
                           95         2  middle
                           58         3  high

For a predictor variable we will use academic, a dummy variable indicating whether or not students are in an academic program. Here is the ordered logistic model predicting ses using academic.

ologit ses academic

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(1)    =      11.83
                                                  Prob > chi2   =     0.0006
Log likelihood = -204.66504                       Pseudo R2     =     0.0281

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    academic |   .9299309   .2745004     3.39   0.001       .39192    1.467942
-------------+----------------------------------------------------------------
       _cut1 |  -.7643189   .2042487          (Ancillary parameters)
       _cut2 |    1.41461    .225507
------------------------------------------------------------------------------

The format of these results may seem confusing at first. What isn't clear from the output is that ordered logistic regression is a multi-equation model.

In this example, there are two equations, each with the same coefficients. This is known as the proportional odds model. Other logistic regression models, which do not assume proportional odds, have a separate equation, with its own constant and coefficients, for each of the k-1 equations. In our example, the results are formatted like a single-equation model when, in fact, this is a two-equation model because there are three levels of ses.
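To make the two-equation structure concrete, here is a short Python sketch (standard library only, our own illustration rather than Stata output) that reproduces the category probabilities implied by the estimates above (b = .9299309, _cut1 = -.7643189, _cut2 = 1.41461). Under Stata's parameterization, P(y <= k) = invlogit(cut_k - xb); the helper names are ours.

```python
import math

def invlogit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from the ologit output above.
b_academic = 0.9299309
cut1, cut2 = -0.7643189, 1.41461

def ses_probs(academic):
    """P(low), P(middle), P(high) for a given value of academic,
    using Stata's cut-point parameterization."""
    xb = b_academic * academic
    p_low = invlogit(cut1 - xb)            # P(y <= 1)
    p_mid = invlogit(cut2 - xb) - p_low    # P(y <= 2) - P(y <= 1)
    p_high = 1.0 - invlogit(cut2 - xb)     # P(y > 2)
    return p_low, p_mid, p_high

# One slope, two cuts: both cumulative equations share b_academic,
# which is exactly the proportional odds restriction.
for academic in (0, 1):
    print(academic, [round(p, 4) for p in ses_probs(academic)])

# Odds ratio implied by the coefficient on academic.
print(round(math.exp(b_academic), 4))
```

Note that the single coefficient shifts both cumulative splits by the same amount on the log-odds scale.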
In ordered logistic regression, Stata sets the constant to zero and estimates the cut points separating the various levels of the response variable. Other programs may parameterize the model differently, estimating the constant and setting the first cut point to zero. In order to show the multi-equation nature of this model, we will redisplay the results in a different format.

/* output showing the multi-equation nature of ordered logistic regression */

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
low          |
    academic |   .9299306   .2745004     3.39   0.001     .3919197    1.467941
       _cons |   .7643188   .2042487     3.74   0.000     .3639987    1.164639
-------------+----------------------------------------------------------------
middle       |
    academic |   .9299306   .2745004     3.39   0.001     .3919197    1.467941
       _cons |  -1.414609    .225507    -6.27   0.000    -1.856595   -.9726238
------------------------------------------------------------------------------

With ordered logistic regression there are other possible methods that do not involve the proportional odds assumption. There is a program, omodel (available from the Stata website), which can be used to test the proportional odds assumption. You can download omodel from within Stata by typing search omodel (see How can I use the search command to search for programs and get additional help? for more information about using search).

omodel logit ses academic

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(1)    =      11.83
                                                  Prob > chi2   =     0.0006
Log likelihood = -204.66504                       Pseudo R2     =     0.0281

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    academic |   .9299309   .2745004     3.39   0.001       .39192    1.467942
-------------+----------------------------------------------------------------
       _cut1 |  -.7643189   .2042487          (Ancillary parameters)
       _cut2 |    1.41461    .225507
------------------------------------------------------------------------------

Approximate likelihood-ratio test of proportionality of odds across
response categories:
        chi2(1) =        2.01
        Prob > chi2 =  0.1563

These results suggest that the proportional odds approach is reasonable since the chi-square test is not significant.
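As a quick sanity check on the reported p-value (our own arithmetic, not part of the Stata output): for a chi-square statistic with 1 degree of freedom the survival function is erfc(sqrt(x/2)), and for 2 degrees of freedom it is exp(-x/2), both available in the Python standard library. Small discrepancies reflect rounding in the displayed chi-square.

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) for a chi-square variable,
    for the two small df used in this chapter."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 handled here")

# omodel's approximate LR test of proportionality: chi2(1) = 2.01
print(round(chi2_sf(2.01, 1), 4))   # close to the reported 0.1563
```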
If the test of proportionality had been significant, we could have tried the gologit2 program by Richard Williams of the University of Notre Dame. You can download gologit2 from within Stata by typing search gologit2 (see How can I use the search command to search for programs and get additional help? for more information about using search). gologit2 with the npl option does not assume proportional odds; let's try it just for "fun."

gologit2 ses academic, npl

Generalized Ordered Logit Estimates               Number of obs =        200
                                                  LR chi2(2)    =      13.83
                                                  Prob > chi2   =     0.0010
Log likelihood = -203.66708                       Pseudo R2     =     0.0328

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
low          |
    academic |   .6374202   .3389678     1.88   0.060    -.0269444    1.301785
       _cons |   .8724881   .2250326     3.88   0.000     .4314324    1.313544
-------------+----------------------------------------------------------------
middle       |
    academic |   1.191394   .3388816     3.52   0.000     .5271982     1.85559
       _cons |  -1.596859     .27415    -5.82   0.000    -2.134183   -1.059535
------------------------------------------------------------------------------

These results clearly show the multiple-equation nature of ordered logistic regression, with different constants, coefficients and standard errors. The gologit2 command also provides us with an alternative method for testing the proportionality assumption. If the assumption of proportional odds is tenable, there should not be a significant difference between the coefficients for academic in the two equations. The test command computes a Wald test across the two equations.

test [low=middle]

 ( 1)  [low]academic - [middle]academic = 0

           chi2(  1) =    1.98
         Prob > chi2 =    0.1595

The results of
this Wald test of proportionality are very similar to those found using the omodel command. Let's rerun the ologit command, followed by the listcoef and fitstat commands.

ologit ses academic

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(1)    =      11.83
                                                  Prob > chi2   =     0.0006
Log likelihood = -204.66504                       Pseudo R2     =     0.0281

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    academic |   .9299309   .2745004     3.39   0.001       .39192    1.467942
-------------+----------------------------------------------------------------
       _cut1 |  -.7643189   .2042487          (Ancillary parameters)
       _cut2 |    1.41461    .225507
------------------------------------------------------------------------------

listcoef

ologit (N=200): Factor Change in Odds

Odds of: >m vs <=m

----------------------------------------------------------------------
         ses |        b        z    P>|z|      e^b   e^bStdX    SDofX
-------------+--------------------------------------------------------
    academic |  0.92993    3.388   0.001    2.5343    1.5929   0.5006
----------------------------------------------------------------------

fitstat

Measures of Fit for ologit of ses

Log-Lik Intercept Only:      -210.583   Log-Lik Full Model:       -204.665
D(197):                       409.330   LR(1):                      11.835
                                        Prob > LR:                   0.000
McFadden's R2:                  0.028   McFadden's Adj R2:           0.014
Maximum Likelihood R2:          0.057   Cragg & Uhler's R2:          0.065
McKelvey and Zavoina's R2:      0.062
Variance of y*:                 3.507   Variance of error:           3.290
Count R2:                       0.475   Adj Count R2:                0.000
AIC:                            2.077   AIC*n:                     415.330
BIC:                         -634.438   BIC':                       -6.537

From the listcoef output, we see that the odds ratio for academic is approximately 2.5, which means that the odds of being in high ses versus medium or low ses are 2.5 times greater for students in the academic program. The same odds ratio also applies to the comparison of medium or high ses versus low ses.

4.2.3 Example 3

The variable academic that we used in the previous example is a dichotomization of the three-category variable prog (program type). Let's look at the frequencies for each of the levels of prog and create dummy-coded variables at the same time using the tabulate command.

tabulate prog, generate(prog)

    type of |
    program |      Freq.     Percent        Cum.
------------+-----------------------------------
    general |         45       22.50       22.50
   academic |        105       52.50       75.00
   vocation |         50       25.00      100.00
------------+-----------------------------------
      Total |        200      100.00

Now we can use prog1 and prog3 in an ordered logistic regression so that the academic group will be our comparison group.

ologit ses prog1 prog3

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(2)    =      12.06
                                                  Prob > chi2   =     0.0024
Log likelihood = -204.55398                       Pseudo R2     =     0.0286

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       prog1 |  -1.030315   .3479667    -2.96   0.003    -1.712317   -.3483126
       prog3 |  -.8500258   .3223129    -2.64   0.008    -1.481747   -.2183042
-------------+----------------------------------------------------------------
       _cut1 |  -1.695676   .2334022          (Ancillary parameters)
       _cut2 |   .4852592    .195606
------------------------------------------------------------------------------

Individually, prog1 and prog3 are statistically significant, and we can determine from the likelihood-ratio chi-square (chi2(2) = 12.06) that they are jointly significant, i.e., that the variable prog is significant. We will follow this analysis with the omodel command to check the proportional odds assumption.

omodel logit ses prog1 prog3

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(2)    =      12.06
                                                  Prob > chi2   =     0.0024
Log likelihood = -204.55398                       Pseudo R2     =     0.0286

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       prog1 |  -1.030315   .3479667    -2.96   0.003    -1.712317   -.3483126
       prog3 |  -.8500258   .3223129    -2.64   0.008    -1.481747   -.2183042
-------------+----------------------------------------------------------------
       _cut1 |  -1.695676   .2334022          (Ancillary parameters)
       _cut2 |   .4852592    .195606
------------------------------------------------------------------------------

Approximate likelihood-ratio test of proportionality of odds across
response categories:
        chi2(2) =        4.74
        Prob > chi2 =  0.0933

The test of proportionality is not significant, so we can continue looking at the results for the ologit command by following up with listcoef and fitstat.

listcoef

ologit (N=200): Factor Change in Odds

Odds of: >m vs <=m

----------------------------------------------------------------------
         ses |        b        z    P>|z|      e^b   e^bStdX    SDofX
-------------+--------------------------------------------------------
       prog1 | -1.03031   -2.961   0.003    0.3569    0.6497   0.4186
       prog3 | -0.85003   -2.637   0.008    0.4274    0.6914   0.4341
----------------------------------------------------------------------

fitstat

Measures of Fit for ologit of ses

Log-Lik Intercept Only:      -210.583   Log-Lik Full Model:       -204.554
D(196):                       409.108   LR(2):                      12.057
                                        Prob > LR:                   0.002
McFadden's R2:                  0.029   McFadden's Adj R2:           0.010
Maximum Likelihood R2:          0.059   Cragg & Uhler's R2:          0.067
McKelvey and Zavoina's R2:      0.064
Variance of y*:                 3.513   Variance of error:           3.290
Count R2:                       0.475   Adj Count R2:                0.000
AIC:                            2.086   AIC*n:                     417.108
BIC:                         -629.362   BIC':                       -1.460

Note that if the ones and zeros were reversed in both prog1 and prog3, the odds ratio for prog1 would be 1/.3569 = 2.80 and for prog3 would be 1/.4274 = 2.34. The fitstat output gives a deviance of 409.108, which is lower than the deviance of 409.330 for the model that used the dichotomous variable academic. This is not a very big change in the deviance.
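The deviance and AIC values that fitstat reports can be reproduced directly from the log likelihoods: D = -2*lnL, AIC*n = -2*lnL + 2k (where k counts the slope coefficients plus the cut points), and fitstat's AIC divides that by n. A quick check in Python (our own arithmetic) for the two models compared here:

```python
# Log likelihoods and parameter counts (slopes + cut points) for the
# two models fitted above: ses ~ academic and ses ~ prog1 prog3.
models = {
    "academic":    {"ll": -204.66504, "k": 1 + 2},
    "prog1 prog3": {"ll": -204.55398, "k": 2 + 2},
}
n = 200

for name, m in models.items():
    deviance = -2.0 * m["ll"]            # fitstat's D(...)
    aic_n = deviance + 2 * m["k"]        # fitstat's AIC*n
    print(name, round(deviance, 3), round(aic_n, 3), round(aic_n / n, 3))
```

The extra parameter in the prog model is why its AIC is slightly larger even though its deviance is slightly smaller.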
If you look at the AIC, you will see that the value for the current model (2.086) is actually larger than that of the model with academic (2.077). Again, this is a very small change, which suggests that the three-category predictor, prog, is not really any better than the dichotomous predictor academic.

4.2.4 Example 4

Next we will look at a model that has both categorical and continuous predictor variables and their interaction.

generate mathacad = math*academic
ologit ses academic math mathacad

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(3)    =      19.02
                                                  Prob > chi2   =     0.0003
Log likelihood = -201.07214                       Pseudo R2     =     0.0452

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    academic |   .4449579    1.73113     0.26   0.797    -2.947995    3.837911
        math |   .0423708   .0243203     1.74   0.081     -.005296    .0900376
    mathacad |   .0025625   .0327299     0.08   0.938     -.061587    .0667119
-------------+----------------------------------------------------------------
       _cut1 |   1.255304   1.181954          (Ancillary parameters)
       _cut2 |     3.4974    1.21058
------------------------------------------------------------------------------

We can tell from the tests of the individual coefficients that the interaction term is not significant, but let's run a likelihood-ratio test anyway, just to confirm what we already know.

lrtest, saving(0)
ologit ses academic math

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(2)    =      19.01
                                                  Prob > chi2   =     0.0001
Log likelihood = -201.07521                       Pseudo R2     =     0.0451

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    academic |    .578395   .3035933     1.91   0.057    -.0166369    1.173427
        math |   .0437666   .0165564     2.64   0.008     .0113166    .0762166
-------------+----------------------------------------------------------------
       _cut1 |   1.322609   .8117558          (Ancillary parameters)
       _cut2 |   3.564826    .851694
------------------------------------------------------------------------------

lrtest

Ologit:  likelihood-ratio test                    chi2(1)     =       0.01
                                                  Prob > chi2 =     0.9376

Now we see that both math and academic are significant. However, the coefficient for math is for a one-point change in the math test score, which is not very meaningful. Let's create a new variable, math10, which is the math test score divided by ten.
A change of ten points on the math test will be more meaningful than a one point change.
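Dividing a predictor by ten multiplies its coefficient by ten but changes nothing else about the model, so the odds ratio for a ten-point change in math can be computed either way. A quick standard-library check of that equivalence (our own arithmetic), using the coefficient for math from the model above:

```python
import math

b_math = 0.0437666      # coefficient for math (per 1 point)
b_math10 = 10 * b_math  # coefficient for math10 (per 10 points)

# Odds ratio for a ten-point increase in the math score; exponentiating
# the rescaled coefficient equals raising the per-point odds ratio to
# the tenth power.
or_ten_points = math.exp(b_math10)
print(round(or_ten_points, 4))
print(abs(math.exp(b_math) ** 10 - or_ten_points) < 1e-12)
```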
The ologit will be followed by listcoef and fitstat.

generate math10 = math/10
ologit ses academic math10

Ordered logit estimates                           Number of obs =        200
                                                  LR chi2(2)    =      19.01
                                                  Prob > chi2   =     0.0001
Log likelihood = -201.07521                       Pseudo R2     =     0.0451

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    academic |    .578395   .3035933     1.91   0.057    -.0166369    1.173427
      math10 |   .4376661   .1655641     2.64   0.008     .1131664    .7621657
-------------+----------------------------------------------------------------
       _cut1 |   1.322609   .8117558          (Ancillary parameters)
       _cut2 |   3.564826    .851694
------------------------------------------------------------------------------

listcoef

ologit (N=200): Factor Change in Odds

Odds of: >m vs <=m

----------------------------------------------------------------------
         ses |        b        z    P>|z|      e^b   e^bStdX    SDofX
-------------+--------------------------------------------------------
    academic |  0.57840    1.905   0.057    1.7832    1.3358   0.5006
      math10 |  0.43767    2.643   0.008    1.5491    1.5069   0.9368
----------------------------------------------------------------------

fitstat

Measures of Fit for ologit of ses

Log-Lik Intercept Only:      -210.583   Log-Lik Full Model:       -201.075
D(196):                       402.150   LR(2):                      19.015
                                        Prob > LR:                   0.000
McFadden's R2:                  0.045   McFadden's Adj R2:           0.026
Maximum Likelihood R2:          0.091   Cragg & Uhler's R2:          0.103
McKelvey and Zavoina's R2:      0.099
Variance of y*:                 3.651   Variance of error:           3.290
Count R2:                       0.480   Adj Count R2:                0.010
AIC:                            2.051   AIC*n:                     410.150
BIC:                         -636.320   BIC':                       -8.418

From the listcoef results we see that for every ten-point increase in math, the odds of being in high ses versus medium or low ses are about 1.5 times greater. The same is true for the odds of medium or high ses versus low ses. The odds ratio for math10 is less than that of academic, which indicates that the odds are about 1.8 times greater for students in the academic program. From the fitstat results we can see that the deviance has dropped to 402.2 and the AIC is down to 2.05, both of which indicate that this model fits better than the model without math.

Stata supports
all aspects of logistic regression. Stata's logistic fits maximum-likelihood dichotomous logistic models:

. webuse lbw
(Hosmer & Lemeshow data)

. logistic low age lwt i.race smoke ptl ht ui

Logistic regression                               Number of obs =        189
                                                  LR chi2(8)    =      33.22
                                                  Prob > chi2   =     0.0001
Log likelihood = -100.724                         Pseudo R2     =     0.1416

------------------------------------------------------------------------------
         low | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .9732636   .0354759    -0.74   0.457     .9061578    1.045339
         lwt |   .9849634   .0068217    -2.19   0.029     .9716834    .9984249
        race |
      Black  |   3.534767   1.860737     2.40   0.016     1.259736    9.918406
      Other  |   2.368079   1.039949     1.96   0.050     1.001356    5.600207
       smoke |   2.517698    1.00916     2.30   0.021     1.147676    5.523162
         ptl |   1.719161   .5952579     1.56   0.118     .8721455    3.388787
          ht |   6.249602   4.322408     2.65   0.008     1.611152    24.24199
          ui |     2.1351   .9808153     1.65   0.099     .8677528      5.2534
       _cons |   1.586014   1.910496     0.38   0.702     .1496092     16.8134
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
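Two small arithmetic checks on this output (our own Python, not part of Stata's display): McFadden's pseudo R2 is 1 - lnL_full/lnL_null, where the null log likelihood can be recovered from the LR chi-square, and each displayed confidence interval is approximately exp(b +/- 1.96*se_b), with b = ln(odds ratio) and se_b = se_OR/OR by the delta method.

```python
import math

ll_full = -100.724
lr_chi2 = 33.22
ll_null = ll_full - lr_chi2 / 2.0   # since LR chi2 = 2*(ll_full - ll_null)

pseudo_r2 = 1.0 - ll_full / ll_null
print(round(pseudo_r2, 4))          # matches the reported 0.1416

# Confidence interval for smoke, reconstructed from the odds-ratio row.
or_smoke, se_or = 2.517698, 1.00916
b = math.log(or_smoke)              # coefficient on the logit scale
se_b = se_or / or_smoke             # delta-method standard error of b
lo, hi = math.exp(b - 1.96 * se_b), math.exp(b + 1.96 * se_b)
print(round(lo, 4), round(hi, 4))   # close to the reported [1.147676, 5.523162]
```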
The syntax of all estimation commands is the same: the name of the dependent variable is followed by the names of the independent variables. In this case, the dependent variable low (containing 1 if a newborn had a birthweight of less than 2,500 grams and 0 otherwise) was modeled as a function of a number of explanatory variables. By default, logistic reports odds ratios; the alternative command logit reports coefficients if you prefer. Once a model has been fitted, you can use Stata's predict to obtain the predicted probabilities of a positive outcome, the value of the logit index, or the standard error of the logit index. You can also obtain Pearson residuals, standardized Pearson residuals, leverage (the diagonal elements of the hat matrix), delta chi-squared, delta D, and Pregibon's delta-beta influence measures by typing a single command. All statistics are adjusted for the number of covariate patterns in the data (m-asymptotic rather than n-asymptotic in Hosmer and Lemeshow (2000) jargon).
Every diagnostic graph suggested by Hosmer and Lemeshow can be drawn by Stata. Also available are the goodness-of-fit test, using either cells defined by the covariate patterns or grouping, as suggested by Hosmer and Lemeshow; classification statistics and the classification table; and a graph and area under the ROC curve. Stata’s mlogit
performs maximum likelihood estimation of models with categorical dependent variables. It is intended for use when the dependent variable takes on more than two outcomes and the outcomes have no natural ordering. Uniquely, linear constraints on the coefficients can be specified both within and across equations using algebraic syntax. Much
thought has gone into making mlogit truly usable. For instance, there are no artificial constraints placed on the nature of the dependent variable.
The dependent variable is not required to take on integer, contiguous values such as 1, 2, and 3, although such a coding would be acceptable. Equally acceptable would be 1, 3, and 4, or even 1.2, 3.7, and 4.8. Stata’s clogit performs maximum likelihood estimation with a dichotomous dependent variable; conditional logistic analysis differs from
regular logistic regression in that the data are stratified and the likelihoods are computed relative to each stratum.
The form of the likelihood function is similar but not identical to that of multinomial logistic regression.
Conditional logistic analysis is known in epidemiology circles as the matched case–control model and in econometrics as McFadden's choice model. The form of the data, as well as the nature of the sampling, differs across the two settings, but clogit handles both. clogit allows both 1:1 and 1:k matching, and there may even be more than one positive
outcome per stratum (which is handled using the exact solution). Stata's ologit performs maximum likelihood estimation to fit models with an ordinal dependent variable, meaning a variable that is categorical and in which the categories can be ordered from low to high, such as "poor", "good", and "excellent". Unlike mlogit, ologit can exploit the
ordering in the estimation process. (Stata also provides oprobit for fitting ordered probit models.) As with mlogit the categorical dependent variable may take on any values whatsoever.
See Greene (2012) for a straightforward description of the models fitted by clogit, mlogit, ologit, and oprobit.
References

Breslow, N. E. 1974. Covariance analysis of censored survival data. Biometrics 30: 89-99.

Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.

Hosmer, D. W., Jr., S. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. New York: Wiley.

McFadden, D. 1974. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, ed. P. Zarembka, 105-142. New York: Academic Press.
