Wan Nor Arifin. Multiple logistic regression by Wan Nor Arifin is licensed under the Creative Commons Attribution-
ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/.
IBM SPSS Statistics Version 22 screenshots are copyrighted to IBM Corp.
Multiple Logistic Regression
Outline
● Introduction
● Steps in Multiple Logistic Regression
  1. Descriptive Statistics
  2. Variable Selection
  3. Model Fit Assessment
  4. Final Model Interpretation & Presentation
Objectives
Introduction
● Logistic regression is used when:
  – Dependent Variable, DV: a binary categorical variable [Yes/No], [Disease/No disease], i.e. the outcome.
● Simple logistic regression – univariable:
  – Independent Variable, IV: a categorical/numerical variable.
● Multiple logistic regression – multivariable:
  – IVs: categorical & numerical variables.
● Recall – Multiple Linear Regression?
Introduction
● Multiple Linear Regression
  – y = a + b1x1 + b2x2 + … + bnxn
● Multiple Logistic Regression
  – log(odds) = a + b1x1 + b2x2 + … + bnxn
  – That's why it is called "logistic" regression.
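The two equations differ only in what sits on the left-hand side. A minimal Python sketch (the coefficients are hypothetical, for illustration only) of how the logistic model's linear predictor, the log(odds), converts back into a probability:

```python
import math

def predicted_probability(intercept, coefs, values):
    """Compute p from the logistic model's linear predictor:
    log(odds) = a + b1*x1 + ... + bn*xn, then
    p = 1 / (1 + exp(-log(odds)))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical model: a = -3.0, b(sbp) = 0.02, b(male) = 0.7
p = predicted_probability(-3.0, [0.02, 0.7], [120, 1])  # sbp = 120, male
```

When the linear predictor is zero the probability is exactly 0.5, which is why log(odds) = 0 corresponds to even odds.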
Introduction
● Binary outcome: concerned with the Odds Ratio.
  – Odds is a measure of chance, like probability.
  – Odds = n(disease)/n(no disease) in a group.
  – Odds Ratio, OR = Odds(factor)/Odds(no factor).
  – Applicable to all observational study designs.
● Relative Risk, RR
  – Applicable to cohort studies only.
● OR ≈ RR for a rare disease, so OR is useful to estimate risk.
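These definitions are easy to check numerically. A sketch with made-up 2×2 counts (not from the workshop data) illustrating why OR ≈ RR when the disease is rare:

```python
def odds_ratio(a, b, c, d):
    # 2x2 table: a = diseased with factor, b = healthy with factor,
    #            c = diseased without factor, d = healthy without factor
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    # Risk = diseased / group total within each exposure group (cohort design)
    return (a / (a + b)) / (c / (c + d))

# Rare disease: 10/1000 exposed vs 5/1000 unexposed (illustrative counts)
OR = odds_ratio(10, 990, 5, 995)
RR = relative_risk(10, 990, 5, 995)
# Both are close to 2, so OR approximates RR here
```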
Steps in Multiple Logistic Regression
● Dataset: slog.sav
● Sample size: n = 200
● DV: cad (1: Yes, 0: No)
● IVs:
  – Numerical: sbp (systolic blood pressure), dbp (diastolic blood pressure), chol (serum cholesterol in mmol/L), age (age in years), bmi (Body Mass Index).
  – Categorical: race (0: Malay, 1: Chinese, 2: Indian), gender (0: Female, 1: Male).
Steps in Multiple Logistic Regression
1. Descriptive statistics.
2. Variable selection.
   a. Univariable analysis.
   b. Multivariable analysis.
   c. Multicollinearity.
   d. Interactions.
3. Model fit assessment.
4. Final model interpretation & presentation.
1. Descriptive statistics
● Split the outputs by CAD status.
  – Data → Split File → select Compare groups
  – Set Groups Based on: cad → OK
1. Descriptive statistics
● Obtain mean(SD) and n(%) by CAD group.
  – Analyze → Descriptive Statistics → Frequencies
  – Include the relevant variables in Variables
1. Descriptive statistics
● Cont...
  – Statistics → tick the required options → Continue
1. Descriptive statistics
● Cont...
  – Charts → tick the required options → Continue → OK
1. Descriptive statistics
● Results (SPSS frequency output screenshots)
1. Descriptive statistics
● Results
  – Look at the histograms to judge normality of the numerical variables. Remember your Basic Stats!
● Caution! Reset the data afterwards.
  – Data → Split File → select Analyze all cases → OK
1. Descriptive statistics
● Present the results in a table.

Factors                            CAD, n=37     No CAD, n=163
                                   mean(SD)      mean(SD)
Systolic Blood Pressure            143.8(25.61)  129.3(22.26)
Diastolic Blood Pressure           89.0(12.17)   80.8(12.61)
Cholesterol                        6.6(1.17)     6.1(1.17)
Age                                47.4(8.80)    45.2(8.41)
BMI                                36.4(3.99)    36.9(3.77)
Race*     Malay                    13(35.1%)     60(36.8%)
          Chinese                  12(32.4%)     52(31.9%)
          Indian                   12(32.4%)     51(31.3%)
Gender*   Male                     24(64.9%)     76(46.6%)
          Female                   13(35.1%)     87(53.4%)
* n(%)
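The mean(SD) and n(%) figures in such a table can of course be computed outside SPSS as well. A small Python sketch with hypothetical values (not the slog.sav data):

```python
from statistics import mean, stdev

def mean_sd(values):
    # mean(SD), as presented for the numerical factors
    return round(mean(values), 1), round(stdev(values), 2)

def n_pct(count, total):
    # n(%), as presented for the categorical factors
    return count, round(100.0 * count / total, 1)

# Hypothetical SBP readings for a handful of CAD subjects
sbp_cad = [150.0, 132.0, 160.0, 141.0]
summary = mean_sd(sbp_cad)

# 24 of the 37 CAD subjects in the table above are male
males = n_pct(24, 37)
```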
2. Variable selection
● To select the best variables to predict the outcome.
● Sub-steps:
  a. Univariable analysis.
  b. Multivariable analysis.
  c. Checking multicollinearity & interactions.
2a. Univariable analysis
● Perform Simple Logistic Regression on each IV.
● Select IVs which fulfill:
  – P-value < 0.25 → statistical significance.
  – Clinically significant IVs → you decide.
2a. Univariable analysis
● Analyze numerical variables:
  – Analyze → Regression → Binary Logistic
  – Dependent: cad; Covariates: sbp
  – Click Options → tick Iteration history, CI for exp(B) → Continue → OK
  – Repeat for dbp, chol, age, bmi
2a. Univariable analysis
● Results
  – Model (SBP): P-value = 0.001 by Likelihood Ratio (LR) test.
  – SBP: P-value = 0.001 by Wald test.
  – Exp(B) is the OR. OR (1 unit increase in SBP) = 1.02 (95% CI: 1.01, 1.04). This is the unadjusted/crude OR.
  – Interpretation: each 1 mmHg increase in SBP increases the odds of CAD by 1.02 times.
  – In the variable selection context, we are less concerned about the OR & its interpretation.
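Exp(B) in the SPSS output can be reproduced directly from the coefficient B. A sketch (the coefficient value is hypothetical) showing the OR for 1-unit and 10-unit increases in a numerical IV:

```python
import math

# Hypothetical coefficient for SBP on the log-odds scale
b_sbp = 0.0203

or_per_1 = math.exp(b_sbp)        # Exp(B): OR per 1 mmHg increase
or_per_10 = math.exp(10 * b_sbp)  # OR per 10 mmHg = Exp(B) ** 10
```

Because the model is linear on the log-odds scale, a k-unit increase multiplies the odds by exp(k·B), not by k·exp(B).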
2a. Univariable analysis
● Analyze categorical variables:
  – Dependent: cad; Covariates: gender
  – Click Categorical → Categorical Covariates: gender → under Change Contrast, set Reference Category: First → Change → Continue.
  – Repeat for race
2a. Univariable analysis
● Results
  – Women (gender = 0) become the reference group.
  – Model (Gender): P-value = 0.044 by LR test.
  – Gender: P-value = 0.048 by Wald test.
  – OR (male) = 2.11 (95% CI: 1.01, 4.44). This is the unadjusted/crude OR.
  – Interpretation: men have 2.11 times the odds of CAD compared to women.
2a. Univariable analysis
● P-values of the IVs – select those with P-value < 0.25.
  Factors    P-value (Wald test)    P-value (LR test)
● Selected variables:
  – sbp, dbp, chol, age, gender
● Perform multiple logistic regression on the selected variables (multivariable) in one go.
● Variable selection now proceeds at the multivariable level.
● Some variables may remain significant; some become insignificant.
2b. Multivariable analysis
● Variable Selection Methods:
  – Automatic.
    ● Forward (Conditional, LR, Wald): enters variables.
    ● Backward (Conditional, LR, Wald): removes variables.
  – Manual.
    ● Enter: entry & removal of variables done manually. (Recommended, but leave it to experts/statisticians.)
2b. Multivariable analysis
● Variable selection in this workshop:
  – Automatic, by Forward & Backward LR.
  – Selection of variables by P-values based on the LR test.
2b. Multivariable analysis
● Enter all selected variables.
● Perform the analysis twice – once with Forward LR, once with Backward LR.
2b. Multivariable analysis
● Results (Forward LR and Backward LR outputs)
  – Both methods keep the same IVs: dbp & gender.
  – P-values by Wald test.
2b. Multivariable analysis
● Results (Forward LR and Backward LR outputs)
  – Both methods keep the same IVs: dbp & gender.
  – P-values by LR test.
2c. Multicollinearity
● Indicates redundant variables – highly correlated IVs.
● Perform the Enter method with dbp & gender.
● Look at the coefficients (B) & standard errors (SE), or the ORs (95% CIs), to see if any are suspiciously large.
● Results
  – SEs are quite small relative to the Bs.
  – 95% CIs are not too wide.
  – No multicollinearity.
2d. Interactions
● A combination of IVs that requires the regression to be interpreted separately by the levels of an IV → makes things complicated.
● Perform the Enter method with dbp, gender & dbp × gender.
  – Select both dbp & gender (hold Ctrl on the keyboard) → click >a*b>
2d. Interactions
● Results (SPSS output screenshot)
2. Variable selection
● At the end of the variable selection step → Preliminary Final Model.
  – P-values by Wald test per variable, from the Enter method.
  – P-values by LR test per variable, obtained with the Forward LR method.
  – Take the adjusted OR from this output.
3. Model fit assessment
● By these 3 goodness-of-fit assessment methods:
  a. Hosmer-Lemeshow test.
  b. Classification table.
  c. Area under the Receiver Operating Characteristic (ROC) curve.
● At the end → Final Model.
3. Model fit assessment
● Perform the Enter method with dbp & gender.
● Additionally:
  – Click Options... → tick Hosmer-Lemeshow goodness-of-fit
  – Click Save... → tick Probabilities under Predicted Values
  – A new variable PRE_1 will be created.
3a. Hosmer-Lemeshow test
● Indicates the fit of the Preliminary Final Model to the data.
● Results
  – P-value 0.09 > 0.05 → good model fit to the data.
  – The table shows expected/predicted counts by the model.
  – The smaller the differences between Observed vs Expected → the better the model fits the data.
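The idea behind the test can be sketched in a few lines: sort subjects by predicted probability, split them into groups (SPSS uses 10), and compare observed vs expected event counts in each group. This is a simplified illustration, not SPSS's exact implementation:

```python
def hosmer_lemeshow_chi2(y, p, groups=10):
    """Simplified Hosmer-Lemeshow statistic: bin subjects by
    predicted probability and sum (O - E)^2 / (E * (1 - E/n_g))
    over the bins; SPSS compares the result to a chi-square
    distribution with groups - 2 degrees of freedom."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(yi for _, yi in chunk)   # O: observed events in bin
        expected = sum(pi for pi, _ in chunk)   # E: sum of predicted probabilities
        n_g = len(chunk)
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return chi2
```

Small differences between observed and expected counts give a small statistic and hence a large p-value, matching the "good fit" reading above.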
3b. Classification table
● CAD & no-CAD subjects as observed vs predicted/classified by the Preliminary Final Model.
● % correctly classified > 70% is expected for good model fit.
● Results
  – 80% of subjects are correctly classified by the model.
  – Good model fit to the data.
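The classification table's headline figure is just the accuracy at a 0.5 cutoff on the saved predicted probabilities (PRE_1). A sketch with made-up outcomes and probabilities:

```python
def percent_correct(y, p, cutoff=0.5):
    # Classify as CAD when predicted probability >= cutoff,
    # then count how many classifications match the observed outcome
    correct = sum((pi >= cutoff) == bool(yi) for yi, pi in zip(y, p))
    return 100.0 * correct / len(y)

# Illustrative outcomes and predicted probabilities (not the workshop data)
acc = percent_correct([1, 0, 1, 0, 1], [0.9, 0.2, 0.7, 0.6, 0.4])
```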
3c. Area under ROC curve (AUC)
● A measure of the ability of the model to discriminate CAD vs non-CAD subjects.
● AUC > 0.7 is an acceptable fit.
● AUC ≤ 0.5 means no discrimination at all; not acceptable.
● Steps:
  – Analyze → Classify → ROC Curve... → assign Test Variable: Predicted probability (PRE_1); State Variable: cad; Value of State Variable: 1.
  – Under Display, tick ROC Curve, With diagonal reference line, and Standard Error and confidence interval.
3c. Area under ROC curve (AUC)
● Results
  – AUC = 0.73 > 0.7.
  – 95% CI: 0.64, 0.82.
  – The lower limit is slightly < 0.7, but still acceptable (> 0.5).
  – Good model fit to the data.
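AUC has a useful probabilistic reading: the chance that a randomly chosen CAD subject receives a higher predicted probability than a randomly chosen non-CAD subject. A sketch computing it by comparing all such pairs (values are illustrative):

```python
def auc(y, p):
    # Probability that a random event subject outranks a random
    # non-event subject; ties between probabilities count as half
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Illustrative example: 3 of the 4 CAD/non-CAD pairs are ranked correctly
a = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

An AUC of 0.5 means the model ranks pairs no better than a coin flip, which is why AUC ≤ 0.5 is not acceptable.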
3. Model fit assessment
● All 3 methods indicate good model fit of the Preliminary Final Model.
● We can conclude the model with dbp & gender → Final Model.
4. Final Model interpretation & presentation
● The Final Model.
  – P-values by Wald test per variable, from the Enter method.
  – P-values by LR test per variable, obtained with the Forward LR method.
  – Take the adjusted OR from this output.
4. Final Model interpretation & presentation
● Associated factors of coronary artery disease.
  Factors    b    Adjusted OR (95% CI)    P-value^a
Q&A