Logistic Regression (2022)


Uploaded by

Nadzmi Nadzri


BINARY LOGISTIC REGRESSION
SHAHRUL AIMAN BIN SOELAR
Clinical Research Centre, Hospital Sultanah Bahiyah
CHOOSING THE CORRECT STATISTICAL TEST

Dependent (outcome) variable: categorical data with two (2) categories

Number of independent (predictor) variables:
• One → Simple Logistic Regression
  – Numerical (e.g. age)
  – Categorical (e.g. Malay, Chinese and Indian)
• ≥ Two → Variable Selection → Multiple Logistic Regression
  – Categorical or numerical or a mix (e.g. gender and age)
Logistic Regression

• Purpose: to study the association between risk factors (numerical or categorical data) and an outcome with two categories (categorical data).

The model: the factors X (Factor 1, Factor 2, Factor 3, Factor 4, Factor 5) are used to predict the outcome Y through the logistic function.
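For reference, the logistic function can be written as below. The notation (k factors X1, ..., Xk with coefficients β0, ..., βk) is our own assumption and is not taken from the slides; the second line shows why exponentiating a coefficient gives an odds ratio.

$$P(Y=1 \mid X_1,\dots,X_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}$$

$$\ln\!\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k, \qquad \text{OR}_j = e^{\beta_j}$$

Here OR_j is the odds ratio for a one-unit increase in X_j with the other factors held fixed.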
H0: Y does not depend on any of the Xi’s.
Ha: Y depends on at least one of the Xi’s.
2 × 2 table (exposure by cancer status):

                  Cancer: Yes   Cancer: No
Exposure: Yes          a             b
Exposure: No           c             d

$$\text{Odds Ratio} = \frac{a/b}{c/d} = \frac{ad}{bc} \qquad \text{Risk Ratio} = \frac{a/(a+b)}{c/(c+d)}$$

• Odds Ratio
  ✓ Cohort study
  ✓ Case-control study
  ✓ Cross-sectional study
  ✓ Reported as odds (times)
  ✓ Diagnostic test
• Risk Ratio
  ✓ Cohort study
  ✓ Randomised controlled trials
  ✓ Reported as probability (%)
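The two formulas above can be checked with a small R helper. This is a minimal sketch; the function name `or_rr` and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Odds ratio and risk ratio from a 2 x 2 table laid out as
#                 Outcome: Yes   Outcome: No
# Exposure: Yes        a              b
# Exposure: No         c              d
or_rr <- function(a, b, c, d) {
  list(
    OR = (a / b) / (c / d),             # odds in exposed / odds in unexposed (= ad / bc)
    RR = (a / (a + b)) / (c / (c + d))  # risk in exposed / risk in unexposed
  )
}

# Hypothetical counts, for illustration only
str(or_rr(a = 20, b = 80, c = 10, d = 90))
```

With these counts the OR (2.25) is larger than the RR (2.00); the two are close only when the outcome is rare.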
            Drug users   Non-users   Total
Male            120          102       222
Female           85          106       191
Total           205          208       413

• Odds: the probability of belonging to a group (or of an event occurring) divided by the probability of not belonging to that group (or of the event not occurring).
  ▪ The odds of a male using drugs are 120/102 = 1.18.
  ▪ The odds of a female using drugs are 85/106 = 0.80.
  ▪ For males, there are 1.18 drug users for every non-user, i.e. a male is 1.18 times as likely to use drugs as not to use them.
  ▪ For females, there are 0.80 drug users for every non-user, i.e. a female is less likely to use drugs than not to use them.
• Odds Ratio: an important estimate in logistic regression, used to answer the research question.
  ▪ For the table above, the research question is whether there is a gender difference in drug use, i.e. whether the probability of drug use is the same for males and females.
  ▪ It is the ratio of the odds for the two groups.
  ▪ It is always the odds for the response group (males) divided by the odds for the referent group (females).
  ▪ Odds ratio = 1.18/0.80 ≈ 1.47 (from the unrounded odds, (120 × 106)/(102 × 85) = 1.47; dividing the rounded odds gives 1.48).
  ▪ Males in this example had 1.47 times the odds of drug use compared with females.
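The odds and the odds ratio above can be reproduced in R. This is a minimal sketch that types the counts from the drug-use table in by hand; the object names are illustrative only.

```r
# Drug use by gender (counts from the table above)
drug <- matrix(c(120, 85, 102, 106), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Use = c("User", "NonUser")))

# Odds of drug use within each gender: users / non-users
odds <- drug[, "User"] / drug[, "NonUser"]
round(odds, 2)

# Odds ratio: response group (males) over referent group (females)
or_male_vs_female <- odds[["Male"]] / odds[["Female"]]
round(or_male_vs_female, 2)
```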
Odds ratio scale: 0 (lower risk) ... 1 (no association) ... ∞ (higher risk)
• To interpret an odds ratio, compare its value with 1:
  ▪ If OR < 1, group A has lower odds of the event than group B (the reference category).
    ➢ A negative (protective) association between the factor and the outcome.
  ▪ If OR = 1, groups A and B have the same odds of the event.
    ➢ No association between the factor and the outcome.
  ▪ If OR > 1, group A has higher odds of the event than group B (the reference category).
    ➢ A positive association between the factor and the outcome.
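The compare-with-1 rule above can be written as a small R helper. This is a sketch under our own assumptions: the function `describe_or` is hypothetical, and it also reports whether the 95% CI excludes 1, which usually agrees with the p-value being below 0.05. The example values are the DM odds ratios from the simple model later in this document.

```r
# Hypothetical helper: describe an odds ratio relative to 1 and
# report whether its confidence interval excludes 1
describe_or <- function(or, lower, upper) {
  direction <- if (or < 1) {
    "negative (protective) association"
  } else if (or > 1) {
    "positive association"
  } else {
    "no association"
  }
  excludes_one <- lower > 1 || upper < 1
  paste0(direction, if (excludes_one) "; CI excludes 1" else "; CI includes 1")
}

describe_or(1.87, 0.91, 3.93)   # controlled DM, simple model
describe_or(5.17, 2.17, 12.72)  # uncontrolled DM, simple model
```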
• Four assumptions
  ✓ There must be at least two cases in each category of the dependent variable.
  ✓ Overall model fitness (multiple logistic regression only):
    a) STEP 1: Checking multicollinearity (Variance Inflation Factor)
    b) STEP 2: Checking outliers (Cook’s Distance)
    c) STEP 3: Checking model fit (Hosmer-Lemeshow goodness-of-fit test)

Objective:
• To identify risk factors (age, sex, DM, HPT and exercise) associated with hypercholesterolemia.
  – Import the file Hypercholesterol(Logistic).xlsx
SIMPLE LOGISTIC REGRESSION: NUMERICAL PREDICTOR (age)
01 Data Exploration

## Compute summary statistics by groups

library(psych)  # describeBy() and describe() come from the psych package
desc01 <- describeBy(hptR$age, hptR$hyperchol, IQR=TRUE)
desc01
Descriptive statistics by group
group: No
vars n mean sd median trimmed mad min max range skew kurtosis se IQR
X1 1 144 38.31 4.83 38 38.26 4.45 25 52 27 0.09 0.09 0.4 6
--------------------------------------------------------------------------------------------------------
group: Yes
vars n mean sd median trimmed mad min max range skew kurtosis se IQR
X1 1 56 42.59 4.69 43 42.85 4.45 30 52 22 -0.51 -0.2 0.63 6.25

Caution! Skewnessᵃ between -1 and 1 and kurtosisᵇ between -3 and 3 are taken as the acceptable range for normality.

REFERENCES:
ᵃ Bulmer, M. G. (1979). Principles of Statistics. New York: Dover Books on Mathematics.
ᵇ Balanda, K. P. and MacGillivray, H. L. (1988). Kurtosis: A Critical Review. The American Statistician, 42(2), 111–119.
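The rule of thumb above can be applied mechanically. A minimal R sketch, with the skewness and kurtosis values copied by hand from the `describeBy()` output; the helper name `within_normal_range` is hypothetical.

```r
# Rule of thumb: skewness within [-1, 1] and kurtosis within [-3, 3]
within_normal_range <- function(skew, kurtosis) {
  abs(skew) <= 1 & abs(kurtosis) <= 3
}

# Age by hypercholesterolemia group (values from the describeBy() output)
age_shape <- data.frame(group    = c("No", "Yes"),
                        skew     = c(0.09, -0.51),
                        kurtosis = c(0.09, -0.2))
age_shape$acceptable <- with(age_shape, within_normal_range(skew, kurtosis))
age_shape
```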
01 Data Exploration

COMBINE
## Combine specific results into one table
desc99 <- cbind("Variable"=c("Age(year)"),
"Less.Mean"=c(desc01$No$mean),
"Less.SD"=c(desc01$No$sd),"n"=c(""),"(%)"=c(""),
"More.Mean"=c(desc01$Yes$mean),
"More.SD"=c(desc01$Yes$sd),"n"=c(""),"(%)"=c(""))
desc99
Variable Less.Mean Less.SD n (%) More.Mean More.SD n (%)
[1,] "Age(year)" "38.3125" "4.83354934811876" "" "" "42.5892857142857" "4.68567172137779" "" ""
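The combined table above stores full-precision strings such as "4.83354934811876". For a publication table, a mean (SD) string rounded to two decimals is usually wanted. A minimal sketch; `fmt_mean_sd` is a hypothetical helper, and in practice it would be called as `fmt_mean_sd(desc01$No$mean, desc01$No$sd)`.

```r
# Format "mean (SD)" with a fixed number of decimals
fmt_mean_sd <- function(mean, sd, digits = 2) {
  sprintf("%s (%s)",
          formatC(mean, format = "f", digits = digits),
          formatC(sd,   format = "f", digits = digits))
}

fmt_mean_sd(38.3125, 4.83354934811876)           # "No" group
fmt_mean_sd(42.5892857142857, 4.68567172137779)  # "Yes" group
```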
02 Test statistic using R

## Simple Logistic Regression (NUMERICAL DATA)

model01 <- glm(hyperchol ~ age, family=binomial, data=hptR)
model01
Call: glm(formula = hyperchol ~ age, family = binomial, data = hptR)

Coefficients:
(Intercept) age
-8.5878 0.1888

Degrees of Freedom: 199 Total (i.e. Null); 198 Residual

Null Deviance: 237.2
Residual Deviance: 207.1 AIC: 211.1
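The printed coefficients can be turned into odds ratios and predicted probabilities by hand. A minimal R sketch, assuming the coefficients rounded to four decimals as printed above, so results differ slightly from `exp(coef(model01))`; with the real model object, `predict(model01, newdata = data.frame(age = 40), type = "response")` gives the probability directly.

```r
# Coefficients of model01 as printed above (rounded)
b0 <- -8.5878   # intercept
b1 <- 0.1888    # age (per year)

# Odds ratio for a one-year and a ten-year increase in age
or_1yr  <- exp(b1)       # about 1.21, matching exp(coef(model01))
or_10yr <- exp(10 * b1)  # equals or_1yr^10

# Predicted probability of hypercholesterolemia at a given age,
# using the logistic function p = 1 / (1 + exp(-(b0 + b1 * age)))
prob_hc <- function(age) plogis(b0 + b1 * age)

round(c(or_1yr = or_1yr, or_10yr = or_10yr), 2)
round(prob_hc(c(30, 40, 50)), 3)
```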

## extract the coefficients from the model and exponentiate

OR <- exp(coef(model01))
OR
(Intercept) age
0.0001863672 1.2078084592

CI <- as.data.frame(exp(confint(model01)))
CI
2.5 % 97.5 %
(Intercept) 6.831754e-06 0.003609845
age 1.124103e+00 1.307493754
02 Test statistic using R

## 2-tailed Wald z tests to test significance of coefficients

PV <- summary(model01)$coefficients[,4]
PV
(Intercept) age
6.947367e-08 8.676167e-07

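## Combine OR, 95% CI and p-value for age (excluding the intercept)
model99 <- cbind("OR"=c(OR[-1]),
                 "CI"=c(paste0("(",format(round(CI$`2.5 %`[-1],2),nsmall = 2),",",
                               format(round(CI$`97.5 %`[-1],2),nsmall = 2),")")),
                 "pvalue"=c(PV[-1]))
outN1 <- cbind(desc99,model99)
outN1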
03 Interpretation

Odds Ratio
• Age (p-value < 0.001) was significantly associated with hypercholesterolemia (HC).
• Each one-year increase in age was associated with 1.21 times the odds of having HC (95% CI 1.12 to 1.31).
• For example, a 31-year-old has 1.21 times the odds of having HC compared with a 30-year-old.
04 Data Presentation
SIMPLE LOGISTIC REGRESSION: CATEGORICAL PREDICTOR (DM)

01 Data Exploration

## Compute summary statistics by groups

desc01 <- table(hptR$dm,hptR$hyperchol)
desc01
No Yes
No DM 73 16
Controlled DM 56 23
Uncontrolled DM 15 17

## Row Percentages
prop01 <- desc01/rowSums(desc01)*100
prop01
No Yes
No DM 82.02247 17.97753
Controlled DM 70.88608 29.11392
Uncontrolled DM 46.87500 53.12500

## Column Percentages
prop01 <- t(t(desc01)/colSums(desc01)*100)
prop01
No Yes
No DM 50.69444 28.57143
Controlled DM 38.88889 41.07143
Uncontrolled DM 10.41667 30.35714
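Base R's `prop.table()` computes the same row and column percentages without the manual `rowSums`/`t()` arithmetic. A minimal sketch with the counts copied from the cross-tabulation above; with the real data it would be `prop.table(table(hptR$dm, hptR$hyperchol), margin = 1) * 100`.

```r
# DM by hypercholesterolemia counts, copied from the table above
dm_tab <- as.table(matrix(c(73, 56, 15, 16, 23, 17), nrow = 3,
                          dimnames = list(DM = c("No DM", "Controlled DM", "Uncontrolled DM"),
                                          HC = c("No", "Yes"))))

# margin = 1 gives row percentages, margin = 2 gives column percentages
row_pct <- prop.table(dm_tab, margin = 1) * 100
col_pct <- prop.table(dm_tab, margin = 2) * 100
round(row_pct, 2)
round(col_pct, 2)
```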
01 Data Exploration

COMBINE
## Combine specific results into one table
desc99 <- cbind("Variable"=c(dimnames(desc01)[[1]]),
"Less.Mean"=c(""),"Less.SD"=c(""),
"n"=c(desc01[,1]),"(%)"=c(prop01[,1]),
"More.Mean"=c(""),"More.SD"=c(""),
"n"=c(desc01[,2]),"(%)"=c(prop01[,2]))
01 Data Exploration

The assumption was not met

02 Test statistic using R

## Simple Logistic Regression (CATEGORICAL DATA)

model01 <- glm(hyperchol ~ dm, family=binomial, data=hptR)
model01
Call: glm(formula = hyperchol ~ dm, family = binomial, data = hptR)

Coefficients:
(Intercept) dmControlled DM dmUncontrolled DM
-1.518 0.628 1.643

Degrees of Freedom: 199 Total (i.e. Null); 197 Residual

Null Deviance: 237.2
Residual Deviance: 223.4 AIC: 229.4

## extract the coefficients from the model and exponentiate

OR <- exp(coef(model01))
OR
(Intercept) dmControlled DM dmUncontrolled DM
0.2191781 1.8738839 5.1708333

CI <- as.data.frame(exp(confint(model01)))
CI
2.5 % 97.5 %
(Intercept) 0.1231741 0.3662076
dmControlled DM 0.9115646 3.9310545
dmUncontrolled DM 2.1662191 12.7155635
02 Test statistic using R

## 2-tailed Wald z tests to test significance of coefficients

PV <- summary(model01)$coefficients[,4]
PV
(Intercept) dmControlled DM dmUncontrolled DM
3.825688e-08 9.037618e-02 2.536746e-04

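## Combine OR, 95% CI and p-value for each DM category (excluding the intercept)
model99 <- cbind("OR"=c(OR[-1]),
                 "CI"=c(paste0("(",format(round(CI$`2.5 %`[-1],2),nsmall = 2),",",
                               format(round(CI$`97.5 %`[-1],2),nsmall = 2),")")),
                 "pvalue"=c(PV[-1]))
outC1 <- rbind(cbind("Diabetes Mellitus","","","","","","","","","","",""),
               cbind(desc99,
                     rbind(cbind("1.00","(ref.)",""),model99)))
outC1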
03 Interpretation

Odds Ratio
• The controlled DM group had 1.87 times the odds of having HC compared with the non-DM group, but this was not statistically significant (95% CI 0.91 to 3.93, p = 0.090).
• The uncontrolled DM group had 5.17 times the odds of having HC compared with the non-DM group (95% CI 2.17 to 12.72, p < 0.001).
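With a single categorical predictor, the odds ratios from `glm()` are exactly the crude odds ratios of the cross-tabulation. A minimal R check using the DM counts from the table in 01 Data Exploration; the object names are illustrative.

```r
# Counts of HC (No/Yes) by DM group, from the cross-tabulation above
hc_no  <- c(noDM = 73, controlled = 56, uncontrolled = 15)
hc_yes <- c(noDM = 16, controlled = 23, uncontrolled = 17)

# Odds of HC in each DM group
odds <- hc_yes / hc_no

# Odds ratios against the reference group (no DM)
crude_or <- odds[c("controlled", "uncontrolled")] / odds[["noDM"]]
round(odds[["noDM"]], 4)  # compare with exp(Intercept) = 0.2192
round(crude_or, 4)        # compare with 1.8739 and 5.1708
```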
04 Data Presentation
MULTIPLE LOGISTIC REGRESSION

Enter all the variables in the model.

There are three variable-selection methods for multiple logistic regression:

(1) ENTER | (2) FORWARD | (3) BACKWARD

An independent variable with a p-value less than 0.05 contributes significantly to the prediction of the dependent variable.
>> Thus, the variable can be included in the multiple logistic regression model.
01 Data Exploration

DOES NOT CONTRIBUTE SIGNIFICANTLY TO THE MODEL

Assumption
✓ There must be at least two cases in each category of the dependent variable.
02 Test statistic using R

## Multiple Logistic Regression (ENTER METHOD)

model0E <- glm(hyperchol ~ age + dm + hpt + exercise, family=binomial, data=hptR)
summary(model0E)
Call:
glm(formula = hyperchol ~ age + dm + hpt + exercise, family = binomial, data = hptR)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.2956 -0.6510 -0.3542 0.5696 2.4039

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)       -10.77358    1.98224  -5.435 5.48e-08 ***
age                 0.24480    0.04727   5.179 2.23e-07 ***
dmControlled DM     0.95302    0.44096   2.161  0.03068 *
dmUncontrolled DM   1.58343    0.53820   2.942  0.00326 **
hptYes             -0.03546    0.38861  -0.091  0.92729
exerciseYes        -1.96524    0.43911  -4.476 7.62e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 237.18 on 199 degrees of freedom
Residual deviance: 167.39 on 194 degrees of freedom
AIC: 179.39

Number of Fisher Scoring iterations: 5

hpt does not contribute significantly to the model because its p-value (0.927) is higher than 0.05. We therefore removed hpt from the model.
02 Test statistic using R

## Multiple Logistic Regression (ENTER METHOD)

model0E <- glm(hyperchol ~ age + dm + exercise, family=binomial, data=hptR)
summary(model0E)
Call:
glm(formula = hyperchol ~ age + dm + exercise, family = binomial, data = hptR)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.2865 -0.6462 -0.3551 0.5684 2.4120

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)       -10.77819    1.98047  -5.442 5.26e-08 ***
age                 0.24455    0.04715   5.187 2.14e-07 ***
dmControlled DM     0.95204    0.44090   2.159  0.03082 *
dmUncontrolled DM   1.57796    0.53469   2.951  0.00317 **
exerciseYes        -1.96726    0.43868  -4.485 7.31e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 237.18 on 199 degrees of freedom
Residual deviance: 167.40 on 195 degrees of freedom
AIC: 177.4

Number of Fisher Scoring iterations: 5

All factors contribute significantly to the model because their p-values are less than 0.05. This is therefore the final model using the ENTER method.
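The z value and Pr(>|z|) columns are Wald tests: z is the estimate divided by its standard error, and the two-tailed p-value comes from the standard normal distribution. A minimal R sketch reproducing two rows of the table above from the printed (rounded) estimates and standard errors.

```r
# Estimates and standard errors copied from summary(model0E) above
est <- c(age = 0.24455, exerciseYes = -1.96726)
se  <- c(age = 0.04715, exerciseYes = 0.43868)

# Wald z statistic and two-tailed p-value: p = 2 * P(Z > |z|)
z <- est / se
p <- 2 * pnorm(-abs(z))
signif(cbind(z = z, p = p), 3)
```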
02 Test statistic using R

## Multiple Logistic Regression (INITIAL MODEL)

## only intercept
model01 <- glm(hyperchol ~ 1, family=binomial, data=hptR)
model01
Call: glm(formula = hyperchol ~ 1, family = binomial, data = hptR)

Coefficients:
(Intercept)
-0.9445

Degrees of Freedom: 199 Total (i.e. Null); 199 Residual

Null Deviance: 237.2
Residual Deviance: 237.2 AIC: 239.2

## Multiple Logistic Regression (FULL MODEL)

model02 <- glm(hyperchol ~ age + dm + hpt + exercise, family=binomial, data=hptR)
model02
Call: glm(formula = hyperchol ~ age + dm + hpt + exercise, family = binomial, data = hptR)

Coefficients:
(Intercept) age dmControlled DM dmUncontrolled DM hptYes exerciseYes
-10.77358 0.24480 0.95302 1.58343 -0.03546 -1.96524

Degrees of Freedom: 199 Total (i.e. Null); 194 Residual

Null Deviance: 237.2
Residual Deviance: 167.4 AIC: 179.4
02 Test statistic using R

## Multiple Logistic Regression (FORWARD METHOD)

library(MASS)  # stepAIC() comes from the MASS package
model0F <- stepAIC(model01, direction = "forward", scope = formula(model02))

Initial Model
• hyperchol ~ 1 with an AIC of 239.18.
• A smaller AIC indicates a better model, so at each step the variable whose addition lowers the AIC the most is added.
• Adding age gives the lowest new AIC (211.08), so age is included in the next step.

Model 1
• hyperchol ~ age with an AIC of 211.08.
• exercise is included in the next step, giving a new AIC of 183.67.

Model 2
• hyperchol ~ age + exercise with an AIC of 183.67.
• dm is included in the next step, giving a new AIC of 177.40.

Model 3
• hyperchol ~ age + exercise + dm with an AIC of 177.40.
• hpt is not included because adding it would make the AIC larger than the current AIC.
02 Test statistic using R

## Multiple Logistic Regression (BACKWARD METHOD)

model0B <- stepAIC(model02, direction = "backward")

Initial Model
• hyperchol ~ age + dm + hpt + exercise with an AIC of 179.39.
• Removing hpt lowers the AIC to 177.40, so hpt is excluded in the next step.

Model 1
• hyperchol ~ age + exercise + dm with an AIC of 177.40.
• dm, exercise and age cannot be excluded because removing any of them would make the AIC larger than the current AIC.
02 Test statistic using R

## Multiple Logistic Regression (FORWARD METHOD)

model0F$anova
Stepwise Model Path
Analysis of Deviance Table

Initial Model:
hyperchol ~ 1

Final Model:
hyperchol ~ age + exercise + dm

        Step Df Deviance Resid. Df Resid. Dev      AIC
1                              199   237.1813 239.1813
2      + age  1 30.10322       198   207.0781 211.0781
3 + exercise  1 29.40760       197   177.6705 183.6705
4       + dm  2 10.26952       195   167.4010 177.4010

“+” means included in the model. This is the final model using the FORWARD method.
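For a 0/1 response, −2 × log-likelihood equals the residual deviance, so AIC = residual deviance + 2k, where k is the number of estimated coefficients including the intercept. A minimal R check against the forward-selection path above (dm contributes two dummy coefficients).

```r
# Residual deviances from the forward-selection path above
resid_dev <- c("hyperchol ~ 1"                   = 237.1813,
               "hyperchol ~ age"                 = 207.0781,
               "hyperchol ~ age + exercise"      = 177.6705,
               "hyperchol ~ age + exercise + dm" = 167.4010)
# Number of estimated coefficients, including the intercept
n_coef <- c(1, 2, 3, 5)

# AIC = residual deviance + 2k
aic <- resid_dev + 2 * n_coef
aic
```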

02 Test statistic using R

## Multiple Logistic Regression (BACKWARD METHOD)

model0B$anova
Stepwise Model Path
Analysis of Deviance Table

Initial Model:
hyperchol ~ age + dm + hpt + exercise

Final Model:
hyperchol ~ age + dm + exercise

   Step Df    Deviance Resid. Df Resid. Dev      AIC
1                            194   167.3927 179.3927
2 - hpt  1 0.008329567       195   167.4010 177.4010

“-” means excluded from the model. This is the final model using the BACKWARD method.
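The Deviance column for dropping hpt can also be read as a likelihood-ratio test on 1 degree of freedom. A minimal R sketch using the residual deviances from the table above; with the model objects, `anova(model0B, model02, test = "Chisq")` performs the same comparison.

```r
# Residual deviances before and after dropping hpt (backward path above)
dev_full    <- 167.3927  # age + dm + hpt + exercise
dev_reduced <- 167.4010  # age + dm + exercise

# Likelihood-ratio (deviance) test for removing hpt, 1 df
lr_stat <- dev_reduced - dev_full
p_lr <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
round(c(LR = lr_stat, p = p_lr), 4)
```

The p-value (about 0.927) agrees with the Wald p-value for hptYes in the ENTER-method output.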

02 Test statistic using R

We use the model from the BACKWARD method as the final multiple logistic regression (MLR) model.
## extract the coefficients from the model and exponentiate
OR <- exp(coef(model0B))
OR
(Intercept) age dmControlled DM dmUncontrolled DM exerciseYes
0.0000208493 1.2770416691 2.5909915916 4.8450792070 0.1398396242

Change the reference group from the non-exercise group to the exercise group

## Multiple Logistic Regression (BACKWARD METHOD)

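hptR$exercise <- relevel(hptR$exercise, ref = "Yes")
## refit the full model so that it uses the new reference level
model02 <- glm(hyperchol ~ age + dm + hpt + exercise, family=binomial, data=hptR)
model0B <- stepAIC(model02, direction = "backward")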

## extract the coefficients from the model and exponentiate

OR <- exp(coef(model0B))
OR
(Intercept) age dmControlled DM dmUncontrolled DM exerciseNo
2.915558e-06 1.277042e+00 2.590992e+00 4.845079e+00 7.151049e+00
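Changing the reference level only flips the sign of the exercise coefficient, so the new odds ratio is the reciprocal of the old one. A minimal R check using the two printed odds ratios.

```r
# Odds ratios for exercise under the two reference levels (outputs above)
or_exercise_yes <- 0.1398396242  # exerciseYes, reference = "No"
or_exercise_no  <- 7.151049      # exerciseNo,  reference = "Yes"

# Swapping the reference level inverts the odds ratio
1 / or_exercise_yes
```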

02 Test statistic using R
CI <- as.data.frame(exp(confint(model0B)))
CI
2.5 % 97.5 %
(Intercept) 3.204919e-08 1.462969e-04
age 1.170306e+00 1.409334e+00
dmControlled DM 1.106978e+00 6.294374e+00
dmUncontrolled DM 1.730003e+00 1.425207e+01
exerciseNo 3.148796e+00 1.779667e+01
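`confint()` on a glm reports profile-likelihood intervals; `confint.default()` gives the simpler Wald interval exp(b ± 1.96 × SE). A minimal sketch of the Wald calculation by hand, using the age estimate and standard error from the ENTER-method summary (the same model as model0B).

```r
# Estimate and standard error for age (ENTER-method summary above)
b_age  <- 0.24455
se_age <- 0.04715

# Wald 95% CI on the odds-ratio scale: exp(b +/- z_0.975 * SE)
z975 <- qnorm(0.975)
wald_ci <- exp(b_age + c(-1, 1) * z975 * se_age)
round(wald_ci, 3)
```

This gives roughly (1.16, 1.40), close to but not identical with the profile interval (1.17, 1.41) above.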

## 2-tailed Wald z tests to test significance of coefficients

PV <- summary(model0B)$coefficients[,4]
PV
(Intercept) age dmControlled DM dmUncontrolled DM exerciseNo
2.516469e-09 2.138787e-07 3.082457e-02 3.165608e-03 7.308127e-06
02 Test statistic using R

COMBINE
## Combine specific results into one table
model99 <- cbind("OR"=c(OR[-1]),
"CI"=c(paste0("(",format(round(CI$`2.5 %`[-1],2),nsmall = 2),",",
format(round(CI$`97.5 %`[-1],2),nsmall = 2),")")),
"pvalue"=c(PV[-1]))
model99
OR CI pvalue
age "1.27704166910185" "(1.17, 1.41)" "2.13878673212258e-07"
dmControlled DM "2.59099159159642" "(1.11, 6.29)" "0.0308245651537937"
dmUncontrolled DM "4.84507920704878" "(1.73,14.25)" "0.00316560843115176"
exerciseNo "7.15104896668564" "(3.15,17.80)" "7.30812725789492e-06"
02 Test statistic using R

COMBINE
## Combine specific results into one table
outFULL <- rbind(cbind("Age in years",t(model99[1,])), #Age in years
cbind("Diabetes Mellitus","","",""), #Diabetes Mellitus
cbind(model0B$xlevels$dm,
rbind(cbind("1.00","(ref.)",""),model99[2:3,])),
cbind("Exercise","","",""), #Exercise
cbind(model0B$xlevels$exercise,
rbind(cbind("1.00","(ref.)",""),model99[4,])))
rownames(outFULL) <- NULL
outFULL
OR CI pvalue
[1,] "Age in years" "1.27704166910185" "(1.17, 1.41)" "2.13878673212258e-07"
[2,] "Diabetes Mellitus" "" "" ""
[3,] "No DM" "1.00" "(ref.)" ""
[4,] "Controlled DM" "2.59099159159642" "(1.11, 6.29)" "0.0308245651537937"
[5,] "Uncontrolled DM" "4.84507920704878" "(1.73,14.25)" "0.00316560843115176"
[6,] "Exercise" "" "" ""
[7,] "Yes" "1.00" "(ref.)" ""
[8,] "No" "7.15104896668564" "(3.15,17.80)" "7.30812725789492e-06"
03 Checking assumptions using R

## STEP 1: Checking multicollinearity (Variance Inflation Factor)

library(car)
round(vif(model0B),2)
         GVIF Df GVIF^(1/(2*Df))
age      1.18  1            1.09
dm       1.05  2            1.01
exercise 1.13  1            1.06

None of the VIFs is more than 10. Thus, there is no multicollinearity (MC) problem.
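Because dm enters the model as two dummy variables, `car::vif()` reports a generalized VIF (GVIF) and the scaled value GVIF^(1/(2*Df)), which makes multi-df terms comparable with single-df terms. A minimal R sketch reproducing the last column from the first two.

```r
# GVIF values and degrees of freedom from the vif() output above
gvif    <- c(age = 1.18, dm = 1.05, exercise = 1.13)
term_df <- c(age = 1,    dm = 2,    exercise = 1)

# Scaled GVIF, as printed in the GVIF^(1/(2*Df)) column
adj <- gvif^(1 / (2 * term_df))
round(adj, 2)

# Rule of thumb used above: no VIF greater than 10
all(gvif < 10)
```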

## STEP 2: Checking outliers (Cook’s Distance)

describe(cooks.distance(model0B))
   vars   n mean   sd median trimmed mad min  max range skew kurtosis se
X1    1 200 0.01 0.01      0       0   0   0 0.08  0.08 3.58    15.85  0

The maximum Cook’s distance is 0.08, which is not more than 1.0. Thus, there is no influential outlier.

## STEP 3: Checking model fit (Hosmer-Lemeshow goodness-of-fit test)

library(ResourceSelection)
hoslem.test(model0B$y, fitted(model0B), g=10)

Hosmer and Lemeshow goodness of fit (GOF) test
data: model0B$y, fitted(model0B)
X-squared = 5.4143, df = 8, p-value = 0.7125

The p-value is 0.713, which is more than 0.05. This means there is no evidence of poor fit: the logistic model fits the data adequately.

Overall, this final model met all assumptions for multiple logistic regression.
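To show what the Hosmer-Lemeshow statistic measures, here is a minimal hand-rolled version: observations are grouped by quantiles of the fitted probabilities, and observed and expected counts of events and non-events are compared in each group, with g − 2 degrees of freedom. This is an illustrative sketch, not the ResourceSelection implementation; `hl_stat` is a hypothetical name, and the built-in `mtcars` data set stands in for the hypercholesterolemia data, which is not available here.

```r
# Hand-rolled Hosmer-Lemeshow statistic, for illustration only
hl_stat <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs1 <- tapply(y, grp, sum)      # observed events per group
  e1   <- tapply(p, grp, sum)      # expected events per group
  obs0 <- tapply(1 - y, grp, sum)  # observed non-events per group
  e0   <- tapply(1 - p, grp, sum)  # expected non-events per group
  x2  <- sum((obs1 - e1)^2 / e1 + (obs0 - e0)^2 / e0)
  dof <- nlevels(grp) - 2
  c(X2 = x2, df = dof, p = pchisq(x2, dof, lower.tail = FALSE))
}

# Stand-in data: built-in mtcars (not the hypercholesterolemia data)
fit <- glm(vs ~ mpg, family = binomial, data = mtcars)
hl_stat(fit$y, fitted(fit), g = 5)
```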
04 Interpretation

outFULL
OR CI pvalue
[1,] "Age in years" "1.27704166910185" "(1.17, 1.41)" "2.13878673212258e-07"
[2,] "Diabetes Mellitus" "" "" ""
[3,] "No DM" "1.00" "(ref.)" ""
[4,] "Controlled DM" "2.59099159159642" "(1.11, 6.29)" "0.0308245651537937"
[5,] "Uncontrolled DM" "4.84507920704878" "(1.73,14.25)" "0.00316560843115176"
[6,] "Exercise" "" "" ""
[7,] "Yes" "1.00" "(ref.)" ""
[8,] "No" "7.15104896668564" "(3.15,17.80)" "7.30812725789492e-06"

Odds Ratio
➢ AGE
• Each one-year increase in age was associated with 1.28 times the odds of having HC (95% CI 1.17 to 1.41).
➢ DIABETES MELLITUS
• The controlled DM group had 2.59 times the odds of having HC compared with the non-DM group.
• The uncontrolled DM group had 4.85 times the odds of having HC compared with the non-DM group.
➢ EXERCISE
• The non-exercise group had 7.15 times the odds of having HC compared with the exercise group.
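The final model can also be used to estimate the probability of HC for a given profile. A minimal R sketch for a hypothetical patient (45 years old, uncontrolled DM, no exercise), using the coefficients from the ENTER-method output, where the references are no DM and no exercise; with the model object, `predict(model0B, newdata = ..., type = "response")` does the same, provided the factor levels in `newdata` match the data.

```r
# Coefficients of the final model (ENTER-method output above)
b <- c(intercept = -10.77819, age = 0.24455,
       dm_controlled = 0.95204, dm_uncontrolled = 1.57796,
       exercise_yes = -1.96726)

# Hypothetical patient: 45 years old, uncontrolled DM, does not exercise
eta  <- b[["intercept"]] + b[["age"]] * 45 + b[["dm_uncontrolled"]]
p_hc <- plogis(eta)  # 1 / (1 + exp(-eta))
round(p_hc, 3)

# Same patient if they exercised: the odds are multiplied by exp(-1.96726)
round(plogis(eta + b[["exercise_yes"]]), 3)
```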
05 Data Presentation

“A logistic regression was performed to study the effects of age, diabetes mellitus and exercise on the likelihood that patients have hypercholesterolemia. Results indicated that age (p<0.001), controlled diabetes mellitus (p=0.031), uncontrolled diabetes mellitus (p=0.003) and non-exercise (p<0.001) were statistically significant factors for hypercholesterolemia. Each one-year increase in age was associated with 1.28 times the odds of having hypercholesterolemia. The controlled and uncontrolled diabetes mellitus groups had 2.59 and 4.85 times the odds of having hypercholesterolemia, respectively, compared with the non-diabetic group. The non-exercise group had 7.15 times the odds of having hypercholesterolemia compared with the exercise group.”
05 Data Presentation

“A logistic regression was performed to study the effects of age, diabetes mellitus and exercise on the likelihood that patients have hypercholesterolemia. Results indicated that controlled diabetes mellitus (p=0.031), uncontrolled diabetes mellitus (p=0.003) and non-exercise (p<0.001) were statistically significant factors for hypercholesterolemia after adjusting for age (p<0.001). The controlled and uncontrolled diabetes mellitus groups had 2.59 and 4.85 times the odds of having hypercholesterolemia, respectively, compared with the non-diabetic group. The non-exercise group had 7.15 times the odds of having hypercholesterolemia compared with the exercise group.”
05 Data Presentation

EXAMPLE: Site comparison of mortality (2009)

[Figure: crude odds ratio with lower and upper limits for each site; outcome Dead vs Alive; y-axis 0.0 to 4.0; annotated “p-value > 0.05”]

Site   Crude Odds Ratio   Lower Limit   Upper Limit
33     0.47               0.30          0.72
4      0.58               0.36          0.92
16     0.59               0.41          0.86
30     0.63               0.41          0.95
22     0.63               0.44          0.90
29     0.71               0.50          1.02
13     0.72               0.47          1.09
3      0.75               0.54          1.04
23     0.83               0.60          1.15
12     0.84               0.54          1.29
17     0.85               0.60          1.19
7      0.88               0.66          1.17
19     0.90               0.64          1.26
10     0.92               0.64          1.32
32     0.93               0.65          1.32
21     0.95               0.58          1.54
15     0.97               0.68          1.39
2      1.00               –             –
9      1.11               0.79          1.54
27     1.12               0.74          1.71
31     1.12               0.79          1.60
14     1.16               0.72          1.88
18     1.25               0.79          1.96
5      1.31               0.99          1.74
11     1.33               0.81          2.19
24     1.34               0.99          1.81
8      1.36               1.01          1.84
25     1.46               1.06          2.02
26     1.50               1.10          2.03
20     1.73               1.16          2.59
6      1.79               1.33          2.40
28     2.51               1.66          3.80
THANK YOU
SHAHRUL AIMAN BIN SOELAR
Clinical Research Centre, Hospital Sultanah Bahiyah
[email protected]

Logistic Regression
0% (1)
Logistic Regression
49 pages
Multiple Logistic Regression
No ratings yet
Multiple Logistic Regression
71 pages
L5 Logistic Regression (2011)
100% (1)
L5 Logistic Regression (2011)
55 pages
Lect7 Math231
No ratings yet
Lect7 Math231
29 pages
Lecture 5. Part 1 - Regression Analysis
No ratings yet
Lecture 5. Part 1 - Regression Analysis
28 pages
Heart Disease App With Code
No ratings yet
Heart Disease App With Code
22 pages
Logistic Regression Playbook
No ratings yet
Logistic Regression Playbook
19 pages
Regression Logistic Regression
100% (1)
Regression Logistic Regression
37 pages
Multiple Logistic Regression (SPSS) 2021
No ratings yet
Multiple Logistic Regression (SPSS) 2021
79 pages
Introduction To Logistic Regression: Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein
No ratings yet
Introduction To Logistic Regression: Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein
36 pages
Logistic Regression Analysis
No ratings yet
Logistic Regression Analysis
48 pages
Sas Notes Module 4-Categorical Data Analysis Testing Association Between Categorical Variables
100% (1)
Sas Notes Module 4-Categorical Data Analysis Testing Association Between Categorical Variables
16 pages
Sestrada Logistic Regression in R 02172023
No ratings yet
Sestrada Logistic Regression in R 02172023
25 pages
18Logistic regression yilma
No ratings yet
18Logistic regression yilma
88 pages
Assignment 2
No ratings yet
Assignment 2
11 pages
Log Reg
No ratings yet
Log Reg
32 pages
1734438351389
No ratings yet
1734438351389
45 pages
Logistic Regression
100% (1)
Logistic Regression
34 pages
Lecture 3-Logistic Reg Model-II
No ratings yet
Lecture 3-Logistic Reg Model-II
37 pages
Final Cc01 Group05-1
No ratings yet
Final Cc01 Group05-1
26 pages
Logistic Regression-1
No ratings yet
Logistic Regression-1
27 pages
Practical - 592 MA SOCIOLOGY SPSS Fourth Sem
No ratings yet
Practical - 592 MA SOCIOLOGY SPSS Fourth Sem
45 pages
Predictive Modeling: Logistic Regression
No ratings yet
Predictive Modeling: Logistic Regression
13 pages
Logistic Regression & Practice
100% (1)
Logistic Regression & Practice
51 pages
Logistic Regression
100% (1)
Logistic Regression
37 pages
Binary Logistic Regression - 6.2
No ratings yet
Binary Logistic Regression - 6.2
34 pages
T3. Logistic Regressions
No ratings yet
T3. Logistic Regressions
3 pages
Home Lesson 15: Logistic, Poisson & Nonlinear Regression
No ratings yet
Home Lesson 15: Logistic, Poisson & Nonlinear Regression
32 pages
Full Download Solutions Manual to Advanced Regression Models with SAS and R 1st Edition Olga Korosteleva PDF DOCX
100% (4)
Full Download Solutions Manual to Advanced Regression Models with SAS and R 1st Edition Olga Korosteleva PDF DOCX
55 pages
lineare regrassion and correlation for mph
No ratings yet
lineare regrassion and correlation for mph
119 pages
Logistic Reg
No ratings yet
Logistic Reg
87 pages
Thesis Using Logistic Regression
100% (2)
Thesis Using Logistic Regression
7 pages
Week 8 - Logistic Regression
No ratings yet
Week 8 - Logistic Regression
67 pages
Lab 4: Logistic Regression: PSTAT 131/231, Winter 2019
No ratings yet
Lab 4: Logistic Regression: PSTAT 131/231, Winter 2019
10 pages
A1
No ratings yet
A1
8 pages
Binary Logistic Regression Concept
No ratings yet
Binary Logistic Regression Concept
10 pages
Lec-4 Logistic Regression
No ratings yet
Lec-4 Logistic Regression
54 pages
Bivariate Logistic Regression
No ratings yet
Bivariate Logistic Regression
12 pages
ProbList5-24-Sln
No ratings yet
ProbList5-24-Sln
9 pages
Dissertation Using Logistic Regression
100% (2)
Dissertation Using Logistic Regression
6 pages
Appendix: Answers To Selected Exercises: /user
No ratings yet
Appendix: Answers To Selected Exercises: /user
8 pages
Assignment_STAT5002
No ratings yet
Assignment_STAT5002
5 pages
Part 3 - Binary Outcome Variables
No ratings yet
Part 3 - Binary Outcome Variables
104 pages
Solutions Manual to Advanced Regression Models with SAS and R 1st Edition Olga Korosteleva download
100% (4)
Solutions Manual to Advanced Regression Models with SAS and R 1st Edition Olga Korosteleva download
65 pages
Logistic Regression
No ratings yet
Logistic Regression
11 pages
Regresion Logistica
No ratings yet
Regresion Logistica
71 pages
Bio2 Module 5 - Logistic Regression
No ratings yet
Bio2 Module 5 - Logistic Regression
19 pages
Guide To SPSS
No ratings yet
Guide To SPSS
15 pages
5.1) Binary logistic regression
No ratings yet
5.1) Binary logistic regression
32 pages
Regression Categorical Variables 2024-2
No ratings yet
Regression Categorical Variables 2024-2
61 pages
Logistic regression_2021 ch-8
No ratings yet
Logistic regression_2021 ch-8
52 pages
Logistic Regression Mini Tab
100% (3)
Logistic Regression Mini Tab
20 pages
PDF Solutions Manual to Advanced Regression Models with SAS and R 1st Edition Olga Korosteleva download
100% (3)
PDF Solutions Manual to Advanced Regression Models with SAS and R 1st Edition Olga Korosteleva download
33 pages
Minitab Tip Sheet 15
No ratings yet
Minitab Tip Sheet 15
5 pages
Data Analytics Using R
No ratings yet
Data Analytics Using R
23 pages
Econometrics II CH 1
No ratings yet
Econometrics II CH 1
48 pages
Cheat Sheet Statistics
No ratings yet
Cheat Sheet Statistics
3 pages
Hypothesis Testing: Six Sigma Thinking, #6
From Everand
Hypothesis Testing: Six Sigma Thinking, #6
Sumeet Savant
No ratings yet
Interflex Co LTD
No ratings yet
Interflex Co LTD
1 page
Week 5-1S1 Oil and Gas E - P
No ratings yet
Week 5-1S1 Oil and Gas E - P
43 pages
Belt Tracking Guide
No ratings yet
Belt Tracking Guide
13 pages
BINARY LOGISTIC REGRESSION

• To study the association between risk factors (numerical or categorical data) and two outcome categories (categorical data).

The model: the logistic function
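In logistic regression the predictors enter through a linear predictor on the log-odds scale. The logistic function then converts that linear predictor into a probability between 0 and 1. The minimal R sketch below shows this. The coefficients `b0` and `b1` are arbitrary illustrative values, not estimates from the slides.

```r
# Logistic (inverse-logit) function: log-odds z -> probability in (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

# Logit function: probability p -> log-odds, the inverse of logistic()
logit <- function(p) log(p / (1 - p))

# Hypothetical one-predictor model: log-odds = b0 + b1 * x
b0 <- -3
b1 <- 0.05
x  <- c(20, 40, 60, 80)

lp <- b0 + b1 * x          # linear predictor: -2, -1, 0, 1
p  <- logistic(lp)
print(round(p, 3))         # [1] 0.119 0.269 0.500 0.731

# Base R ships the same pair of functions as plogis() and qlogis()
stopifnot(isTRUE(all.equal(p, plogis(lp))),
          isTRUE(all.equal(logit(p), qlogis(p))))
```

A log-odds of 0 corresponds to a probability of exactly 0.5. Each unit increase in the linear predictor multiplies the odds by exp(1).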

Drug users Non-users Total

• Odds: the probability of belonging to one group (or of an event occurring) divided by the probability of not belonging to that group (or of the event not occurring).
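Here is a worked example with made-up counts; the figures in the slide's drug-user table are not recoverable from this excerpt. The odds are the probability of the event divided by the probability of no event, which reduces to events divided by non-events.

```r
# Hypothetical counts: 30 of 100 people experience the event
events     <- 30
non_events <- 70

p_event <- events / (events + non_events)   # probability = 0.3
odds    <- p_event / (1 - p_event)          # odds = 0.3 / 0.7

# Shortcut: the same odds are events divided by non-events
stopifnot(isTRUE(all.equal(odds, events / non_events)))
print(round(odds, 3))   # [1] 0.429
```

A probability of 0.3 thus corresponds to odds of about 0.43, that is, 3 events for every 7 non-events.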

• Odds Ratio: an important estimate in logistic regression and used to
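For a 2 x 2 table, the odds ratio is the odds of the outcome in the exposed group divided by the odds in the unexposed group. This equals the cross-product (a x d) / (b x c). The sketch below uses hypothetical counts. It also checks that exponentiating the exposure coefficient of a logistic regression fitted to the same table reproduces that OR.

```r
# Hypothetical 2 x 2 counts: rows = exposure, columns = outcome
tab <- matrix(c(40, 60,
                20, 80),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("Yes", "No"),
                              outcome  = c("Yes", "No")))

# Cross-product odds ratio: (a * d) / (b * c) = (40 * 80) / (60 * 20) = 2.667
or_table <- (tab["Yes", "Yes"] * tab["No", "No"]) /
            (tab["Yes", "No"]  * tab["No", "Yes"])

# Same table in grouped form; "No" exposure is the reference level
grouped <- data.frame(
  exposure = factor(c("No", "Yes"), levels = c("No", "Yes")),
  yes      = c(tab["No", "Yes"], tab["Yes", "Yes"]),
  no       = c(tab["No", "No"],  tab["Yes", "No"])
)
fit <- glm(cbind(yes, no) ~ exposure, family = binomial, data = grouped)

# exp(coefficient) of the exposure dummy is the odds ratio
or_glm <- unname(exp(coef(fit))["exposureYes"])
print(round(c(table = or_table, glm = or_glm), 3))
```

An OR above 1 means higher odds of the outcome among the exposed. An OR of exactly 1 means no association.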

## Compute summary statistics by groups

Caution!! Acceptable range for normality is skewnessᵃ
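Summary statistics by outcome group, including a moment-based skewness, can be computed in base R. The sketch below uses simulated data, and the variable names (`chol` for the binary outcome, `age`) are hypothetical. The skewness formula here, g1 = m3 / m2^(3/2), is only one of several definitions, so values may differ slightly from the software used on the slide. The slide's acceptable range is not recoverable here, so no cut-off is hard-coded.

```r
set.seed(1)
n <- 200
dat <- data.frame(chol = rbinom(n, 1, 0.3),                   # hypothetical 0/1 outcome
                  age  = round(rnorm(n, mean = 50, sd = 10)))  # hypothetical predictor

# Moment-based sample skewness g1 = m3 / m2^(3/2)
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Descriptive statistics for one numeric vector
describe <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), IQR = IQR(x), skewness = skewness(x))
}

# One row per outcome group ("0" and "1")
by_group <- t(sapply(split(dat$age, dat$chol), describe))
print(round(by_group, 2))
```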

## Simple Logistic Regression (NUMERICAL DATA)

Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
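In R, a simple logistic regression with one numerical predictor is a `glm()` call with `family = binomial`. The sketch below simulates a hypothetical data set of 200 patients, so that, as on the slide, the model has 199 null and 198 residual degrees of freedom. The names `chol` and `age` are illustrative, not the slide's variables.

```r
set.seed(123)
n <- 200
dat <- data.frame(age = round(rnorm(n, mean = 50, sd = 10)))
# Hypothetical outcome: 1 = hypercholesterolemia, risk rising with age
dat$chol <- rbinom(n, 1, plogis(-4 + 0.07 * dat$age))

# Simple logistic regression: one numerical predictor
fit_age <- glm(chol ~ age, family = binomial(link = "logit"), data = dat)
print(fit_age)   # reports 199 null and 198 residual degrees of freedom
```

Only one parameter (the age slope) is estimated beyond the intercept, so the residual df are 199 - 1 = 198.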

## extract the coefficients from the model and exponentiate

## 2-tailed Wald z tests to test significance of coefficients

outN1 <- cbind(desc99,model99)
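The two commented steps above can be reproduced by hand in base R:

- exponentiate the coefficients;
- run two-tailed Wald z tests.

The objects `desc99` and `model99` that the slide combines with `cbind()` are not shown in this excerpt. This sketch therefore builds its own table from a model fitted to simulated, hypothetical data.

```r
set.seed(123)
n <- 200
dat <- data.frame(age = round(rnorm(n, mean = 50, sd = 10)))
dat$chol <- rbinom(n, 1, plogis(-4 + 0.07 * dat$age))   # hypothetical outcome
fit_age <- glm(chol ~ age, family = binomial, data = dat)

est <- coef(fit_age)                 # estimates on the log-odds scale
se  <- sqrt(diag(vcov(fit_age)))     # standard errors
z   <- est / se                      # Wald z statistic
p   <- 2 * pnorm(-abs(z))            # two-tailed p-value
zc  <- qnorm(0.975)                  # about 1.96 for a 95% interval

# Exponentiate: odds ratio with Wald 95% confidence limits
out <- cbind(OR    = exp(est),
             lower = exp(est - zc * se),
             upper = exp(est + zc * se),
             z = z, p = p)
print(round(out, 3))
```

The z and p columns match the ones `summary(fit_age)` prints. The limits match `exp(confint.default(fit_age))`, which also computes Wald intervals.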

## Compute summary statistics by groups

The assumption was not met
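For a categorical predictor, the group summary is a cross-tabulation of counts and row percentages. The sketch below uses simulated data with a hypothetical three-level diabetes factor `dm`. The excerpt does not show which assumption the slide reports as not met, so the sketch stops at the descriptive table.

```r
set.seed(42)
n <- 200
dat <- data.frame(
  dm   = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),  # hypothetical
  chol = rbinom(n, 1, 0.3)                                               # hypothetical 0/1
)

# Counts of the outcome within each diabetes category
counts <- table(DM = dat$dm, Chol = dat$chol)

# Row percentages: each category (row) sums to 100
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

print(addmargins(counts))   # counts with row and column totals
print(row_pct)
```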

## Simple Logistic Regression (CATEGORICAL DATA)

Degrees of Freedom: 199 Total (i.e. Null); 197 Residual
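With a categorical predictor, R's `glm()` turns a factor into dummy variables and uses the first level as the reference. A three-level factor therefore costs two degrees of freedom. The slide's 199 - 2 = 197 residual df is consistent with that. The sketch below uses a simulated, hypothetical `dm` factor.

```r
set.seed(42)
n <- 200
dat <- data.frame(dm = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)))
dat$chol <- rbinom(n, 1, plogis(-1 + 0.8 * (dat$dm == "Type 2")))  # hypothetical outcome

# "No" (first level alphabetically) becomes the reference category
fit_dm <- glm(chol ~ dm, family = binomial, data = dat)
print(fit_dm)   # reports 199 null and 197 residual degrees of freedom
```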

## extract the coefficients from the model and exponentiate

## 2-tailed Wald z tests to test significance of coefficients

outC1 <- rbind(cbind("Diabetes Mellitus","","","","","","","","","","",""),
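The slide assembles its categorical results with `rbind()`/`cbind()` into a printable table whose first row is a "Diabetes Mellitus" heading; the rest of that call is cut off in this excerpt. Below is a base-R sketch of the same idea on hypothetical simulated data. The reference category is shown as OR = 1.00, and the CI is the Wald 95% interval.

```r
set.seed(42)
n <- 200
dat <- data.frame(dm = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)))
dat$chol <- rbinom(n, 1, plogis(-1 + 0.8 * (dat$dm == "Type 2")))  # hypothetical outcome
fit_dm <- glm(chol ~ dm, family = binomial, data = dat)

ct <- coef(summary(fit_dm))[-1, , drop = FALSE]          # drop the intercept row
ci <- exp(confint.default(fit_dm))[-1, , drop = FALSE]   # Wald 95% CI on the OR scale

# Heading row, reference row, then one row per non-reference level
res <- data.frame(
  Variable = c("Diabetes Mellitus", paste0("  ", levels(dat$dm))),
  OR       = c("", "1.00 (ref)", sprintf("%.2f", exp(ct[, "Estimate"]))),
  CI95     = c("", "", sprintf("%.2f, %.2f", ci[, 1], ci[, 2])),
  p        = c("", "", sprintf("%.3f", ct[, "Pr(>|z|)"]))
)
print(res, right = FALSE, row.names = FALSE)
```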

Enter all the variables in the model.

(1) ENTER | (2) FORWARD | (3) BACKWARD

DOES NOT CONTRIBUTE SIGNIFICANTLY TO THE MODEL

## Multiple Logistic Regression (ENTER METHOD)

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 237.18 on 199 degrees of freedom
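The ENTER method puts every candidate predictor into the model in a single step. Calling `summary()` on the fitted `glm` prints the coefficient table, then the dispersion line and the null and residual deviances quoted on the slide. The sketch below uses simulated, hypothetical predictors (`age`, `sex`, a three-level `dm`, `bmi`). The slide's actual variables and deviance values are not reproduced.

```r
set.seed(2022)
n <- 200
dat <- data.frame(
  age = round(rnorm(n, 50, 10)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  dm  = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),
  bmi = round(rnorm(n, 27, 4), 1)
)
# Hypothetical outcome (1 = hypercholesterolemia)
dat$chol <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 0.1 * dat$bmi +
                                  0.7 * (dat$dm == "Type 2")))

# ENTER method: all variables entered at once
fit_enter <- glm(chol ~ age + sex + dm + bmi, family = binomial, data = dat)
print(summary(fit_enter))
```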

## Multiple Logistic Regression (INITIAL MODEL)

Degrees of Freedom: 199 Total (i.e. Null); 199 Residual

## Multiple Logistic Regression (FULL MODEL)

Degrees of Freedom: 199 Total (i.e. Null); 194 Residual
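Stepwise selection in R needs two anchor models:

- an initial, intercept-only model;
- a full model containing every candidate predictor.

In this sketch the hypothetical predictors `age`, `sex`, a three-level `dm` and `bmi` use five parameters, so the residual df come out at 199 and 194. These are the same counts as the slide's print-outs, although the slide's own predictors are not shown in this excerpt. The data are simulated.

```r
set.seed(2022)
n <- 200
dat <- data.frame(
  age = round(rnorm(n, 50, 10)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  dm  = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),
  bmi = round(rnorm(n, 27, 4), 1)
)
dat$chol <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 0.1 * dat$bmi +
                                  0.7 * (dat$dm == "Type 2")))   # hypothetical outcome

fit_null <- glm(chol ~ 1, family = binomial, data = dat)                    # initial model
fit_full <- glm(chol ~ age + sex + dm + bmi, family = binomial, data = dat)  # full model

print(c(initial = df.residual(fit_null), full = df.residual(fit_full)))
```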

## Multiple Logistic Regression (FORWARD METHOD)

Final Model: The final model using

Step Df Deviance Resid. Df Resid. Dev AIC

“+” means included in the model
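Forward selection can be run with base R's `step()`. It starts from the intercept-only model and may add terms up to the full model. `step()` ranks candidates by AIC, so this is a sketch of AIC-based forward selection, which may not match the slide's exact procedure. The `$anova` component holds the Step / Df / Deviance / Resid. Df / Resid. Dev / AIC table, with added terms prefixed by "+". The data are simulated and hypothetical.

```r
set.seed(2022)
n <- 200
dat <- data.frame(
  age = round(rnorm(n, 50, 10)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  dm  = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),
  bmi = round(rnorm(n, 27, 4), 1)
)
dat$chol <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 0.1 * dat$bmi +
                                  0.7 * (dat$dm == "Type 2")))   # hypothetical outcome

fit_null <- glm(chol ~ 1, family = binomial, data = dat)

# Forward: add one term at a time while AIC keeps decreasing
fwd <- step(fit_null,
            scope = list(lower = ~ 1, upper = ~ age + sex + dm + bmi),
            direction = "forward", trace = 0)

print(formula(fwd))   # final model chosen by forward selection
print(fwd$anova)      # "+ term" rows = terms added at each step
```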

## Multiple Logistic Regression (BACKWARD METHOD)

Final Model: The final model using

Step Df Deviance Resid. Df Resid. Dev AIC

“-” means excluded from the model
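Backward elimination works the other way round: `step()` starts from the full model and drops one term at a time while AIC improves. Terms removed are prefixed by "-" in `$anova`. As with forward selection, this AIC-based sketch on simulated, hypothetical data may not match the slide's exact procedure.

```r
set.seed(2022)
n <- 200
dat <- data.frame(
  age = round(rnorm(n, 50, 10)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  dm  = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),
  bmi = round(rnorm(n, 27, 4), 1)
)
dat$chol <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 0.1 * dat$bmi +
                                  0.7 * (dat$dm == "Type 2")))   # hypothetical outcome

fit_full <- glm(chol ~ age + sex + dm + bmi, family = binomial, data = dat)

# Backward: start from the full model and drop terms while AIC keeps decreasing
bwd <- step(fit_full, direction = "backward", trace = 0)

print(formula(bwd))   # final model after backward elimination
print(bwd$anova)      # "- term" rows = terms removed at each step
```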

Let’s decide MLR using

Change the reference group from
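In R, the reference group of a factor is its first level (alphabetical by default), and `relevel()` changes it. The slide's original and new reference groups are cut off in this excerpt. The sketch below uses a hypothetical `dm` factor and moves the reference to "Type 2" purely for illustration.

```r
dm <- factor(c("No", "Type 1", "Type 2", "No", "Type 2"))
print(levels(dm))        # "No" "Type 1" "Type 2": "No" is the default reference

# Make "Type 2" the reference group; the other levels keep their order
dm2 <- relevel(dm, ref = "Type 2")
print(levels(dm2))       # "Type 2" "No" "Type 1"

# The dummy variables a model would use now contrast each level with "Type 2"
print(colnames(model.matrix(~ dm2)))   # "(Intercept)" "dm2No" "dm2Type 1"
```

For a factor stored in a data frame, the equivalent step is `dat$dm <- relevel(dat$dm, ref = "Type 2")` before refitting; here `dat` is a hypothetical data frame. Each odds ratio is then read relative to the new reference group.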

## Multiple Logistic Regression (BACKWARD METHOD)

## extract the coefficients from the model and exponentiate

## 2-tailed Wald z tests to test significance of coefficients
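For the final multiple model, the same two steps give adjusted odds ratios (AOR):

- exponentiate the coefficients and their Wald limits;
- attach the Wald z and two-tailed p-values from `summary()`.

The final model below (`age + dm + bmi`, on simulated data) is hypothetical. In practice it would be whatever the backward selection retained.

```r
set.seed(2022)
n <- 200
dat <- data.frame(
  age = round(rnorm(n, 50, 10)),
  dm  = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),
  bmi = round(rnorm(n, 27, 4), 1)
)
dat$chol <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 0.1 * dat$bmi +
                                  0.7 * (dat$dm == "Type 2")))   # hypothetical outcome

fit_final <- glm(chol ~ age + dm + bmi, family = binomial, data = dat)  # hypothetical final model

ct  <- coef(summary(fit_final))
# Adjusted OR with Wald 95% limits, then the Wald z and two-tailed p-value
adj <- cbind(exp(cbind(AOR = coef(fit_final), confint.default(fit_final))),
             z = ct[, "z value"], p = ct[, "Pr(>|z|)"])
print(round(adj[-1, ], 3))   # intercept row omitted
```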

## STEP 1: Checking multicollinearity (Variance Inflation Factor)
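The variance inflation factor for predictor j is VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the other predictors. It depends only on the predictors, not on the binary outcome.

Below is a base-R sketch for numeric predictors on simulated data. `sbp` is deliberately built from `age`, and all names are hypothetical. The `car` package's `vif()` is a common ready-made alternative that also reports generalized VIFs for factors, which this sketch does not handle. The slide's cut-off value is not recoverable here.

```r
set.seed(7)
n   <- 200
age <- rnorm(n, 50, 10)
bmi <- 20 + 0.1 * age + rnorm(n, 0, 3)    # weakly related to age
sbp <- 90 + 0.8 * age + rnorm(n, 0, 5)    # strongly related to age
X   <- data.frame(age, bmi, sbp)

# VIF_j = 1 / (1 - R^2_j) from regressing predictor j on the remaining predictors
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others. Larger values mean its coefficient's variance is inflated by that overlap.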

## STEP 2: Checking outliers (Cook’s Distance)
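Cook's distance measures roughly how much the fitted model would shift if one observation were left out. For a `glm`, R's `cooks.distance()` computes an approximation from Pearson residuals and leverages. The flagging threshold below, 4/n, is one common rule of thumb and an assumption here, not necessarily the slide's cut-off. The data and variable names are simulated and hypothetical.

```r
set.seed(2022)
n <- 200
dat <- data.frame(age = round(rnorm(n, 50, 10)), bmi = round(rnorm(n, 27, 4), 1))
dat$chol <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 0.1 * dat$bmi))  # hypothetical outcome
fit <- glm(chol ~ age + bmi, family = binomial, data = dat)

cd     <- cooks.distance(fit)
cutoff <- 4 / n                               # rule-of-thumb threshold (assumption)

influential <- which(cd > cutoff)             # row indices of flagged observations
print(round(head(sort(cd, decreasing = TRUE), 5), 4))   # five largest distances
print(dat[influential, ])
```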

EXAMPLE: Site comparison of mortality (2009)
