Multiple Logistic Regression

Dr. Wan Nor Arifin


Unit of Biostatistics and Research Methodology,
Universiti Sains Malaysia.
wnarifin.github.io

Wan Nor Arifin. Multiple logistic regression by Wan Nor Arifin is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/.
IBM SPSS Statistics Version 22 screenshots are copyrighted to IBM Corp.
Outlines


Introduction

Steps in Multiple Logistic Regression
1. Descriptive Statistics
2. Variable Selection
3. Model Fit Assessment
4. Final Model Interpretation & Presentation

Objectives

1. Understand the reasons behind the use of logistic regression.
2. Perform multiple logistic regression in SPSS.
3. Identify and interpret the relevant SPSS outputs.
4. Summarize important results in a table.

Introduction


Logistic regression is used when:
– Dependent Variable, DV: A binary categorical variable, e.g. [Yes/No], [Disease/No disease], i.e. the outcome.

Simple logistic regression – Univariable:
– Independent Variable, IV: A categorical/numerical variable.

Multiple logistic regression – Multivariable:
– IVs: Categorical & numerical variables.

Recall – Multiple Linear Regression?
Introduction


Multiple Linear Regression
– y = a + b1x1 + b2x2 + … + bnxn

Multiple Logistic Regression
– log(odds) = a + b1x1 + b2x2 + … + bnxn
– log(odds) is the logit, the inverse of the logistic function. That's why it is called “logistic” regression.
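As a minimal sketch of how log(odds) relates to probability (Python is used here only for illustration, since the slides themselves use SPSS; the function names `logit` and `inv_logit` are hypothetical helpers):

```python
import math

def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def inv_logit(log_odds):
    """Logistic function: turns log(odds) = a + b1x1 + ... + bnxn back into a probability."""
    return 1 / (1 + math.exp(-log_odds))

# A probability of 0.5 means odds of 1, i.e. log(odds) of 0.
print(logit(0.5))      # 0.0
print(inv_logit(0.0))  # 0.5
```

The right-hand side of the logistic model can take any value, while `inv_logit` always returns a value between 0 and 1, which suits a binary outcome.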

Introduction


Binary outcome: Concerned with Odds Ratio.
– Odds is a measure of chance like probability.
– Odds = n(Disease)/n(no Disease) among a group.
– Odds Ratio, OR = Odds(Factor)/Odds(No factor)
– Applicable to all observational study designs.

Relative Risk, RR
– Only cohort study.

OR ≈ RR for a rare disease, so the OR is then useful for estimating risk.
Introduction

Factor vs CAD              CAD       No CAD
Man                        24 [a]    76 [b]
Woman (i.e. not Man)       13 [c]    87 [d]

Odds(man) = a/b = 24/76 = 0.32

Odds(woman) = c/d = 13/87 = 0.15

OR(man/woman) = 0.32/0.15 = 2.13 (2.11 if the odds are not rounded first)

Shortcut, OR = ad/bc = (24x87)/(76x13) = 2.11
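The same arithmetic can be sketched in a few lines of Python (an illustration only; the variable names are arbitrary, the counts are those in the 2x2 table):

```python
# 2x2 table from the slide: rows are Man / Woman, columns are CAD / No CAD
a, b = 24, 76   # men: CAD, no CAD
c, d = 13, 87   # women: CAD, no CAD

odds_man = a / b
odds_woman = c / d
or_ratio = odds_man / odds_woman     # ratio of the two odds
or_shortcut = (a * d) / (b * c)      # cross-product shortcut

print(round(odds_man, 2), round(odds_woman, 2))    # 0.32 0.15
print(round(or_ratio, 2), round(or_shortcut, 2))   # 2.11 2.11
```

Without intermediate rounding, both routes give the same OR.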
Introduction

Factor vs CAD              CAD       No CAD
Man                        24 [a]    76 [b]
Woman (i.e. not Man)       13 [c]    87 [d]

Risk(man) = Proportion CAD = a/(a+b) = 24/100 = 0.24

Risk(woman) = Proportion CAD = c/(c+d) = 13/100 = 0.13

RR(man/woman) = 0.24/0.13 = 1.85 ≈ OR, 2.11
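A short Python sketch of the risk and RR calculation, placed next to the OR for comparison (illustrative only; the counts are those in the 2x2 table):

```python
a, b = 24, 76   # men: CAD, no CAD
c, d = 13, 87   # women: CAD, no CAD

risk_man = a / (a + b)       # proportion of men with CAD
risk_woman = c / (c + d)     # proportion of women with CAD
rr = risk_man / risk_woman
odds_ratio = (a * d) / (b * c)

print(round(risk_man, 2), round(risk_woman, 2))   # 0.24 0.13
print(round(rr, 2), round(odds_ratio, 2))         # 1.85 2.11
```

CAD is not especially rare in this table (37 of 200 subjects), so the OR is somewhat larger than the RR. The two get closer as the outcome becomes rarer.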

Steps in Multiple Logistic Regression


Dataset: slog.sav

Sample size, n=200

DV: cad (1: Yes, 0: No)

IVs:
– Numerical: sbp (systolic blood pressure), dbp (diastolic blood pressure), chol (serum cholesterol in mmol/L), age (age in years), bmi (Body Mass Index).
– Categorical: race (0: Malay, 1: Chinese, 2: Indian), gender (0: Female, 1: Male)
Steps in Multiple Logistic Regression

1. Descriptive statistics.
2. Variable selection.
a. Univariable analysis.
b. Multivariable analysis.
c. Multicollinearity.
d. Interactions.
3. Model fit assessment.
4. Final model interpretation & presentation.
1. Descriptive statistics


Set outputs by CAD status.
– Data → Split File → Select Compare groups
– Set Groups Based on: cad, OK

1. Descriptive statistics


Obtain mean(SD) and n(%) by CAD group.
– Analyze → Descriptive Statistics → Frequencies
– Include relevant variables in Variables

1. Descriptive statistics


Cont...
– Statistics → tick →
Continue

1. Descriptive statistics


Cont...
– Charts → tick →
Continue → OK

1. Descriptive statistics


Results

1. Descriptive statistics


Results

1. Descriptive statistics


Results
– Look at histograms to decide data normality for numerical variables. Remember your Basic Stats!

Caution! Reset the Split File setting afterwards.
– Data → Split File → Select Analyze all cases
– OK
1. Descriptive statistics


Present the results in a table.

Factors                      CAD, n=37       No CAD, n=163
                             mean(SD)        mean(SD)
Systolic Blood Pressure      143.8(25.61)    129.3(22.26)
Diastolic Blood Pressure     89.0(12.17)     80.8(12.61)
Cholesterol                  6.6(1.17)       6.1(1.17)
Age                          47.4(8.80)      45.2(8.41)
BMI                          36.4(3.99)      36.9(3.77)
Race*      Malay             13(35.1%)       60(36.8%)
           Chinese           12(32.4%)       52(31.9%)
           Indian            12(32.4%)       51(31.3%)
Gender*    Male              24(64.9%)       76(46.6%)
           Female            13(35.1%)       87(53.4%)

*n(%)
2. Variable selection


To select the best variables to predict the outcome.

Sub-steps:
a. Univariable analysis.
b. Multivariable analysis.
c. Checking multicollinearity & interactions.

2a. Univariable analysis


Perform Simple Logistic Regression on each IV.

Select IVs which fulfil:
– P-value < 0.25 → Statistical significance.
– Clinically significant IVs → You decide.

2a. Univariable analysis


Analyze numerical variables:
– Analyze → Regression → Binary Logistic
– Dependent: cad, Covariates: sbp
– Click Options → Tick Iteration history, CI for exp(B) → Continue → OK
– Repeat for dbp, chol, age, bmi
2a. Univariable analysis

Results

Model: SBP P-value=0.001 by Likelihood Ratio (LR) test

SBP P-value=0.001 by Wald test

Exp(B) is OR.

OR(1 unit ↑ in SBP) = 1.02 (95% CI: 1.01, 1.04). Unadjusted/Crude OR.

Interpretation: 1mmHg increase in SBP increases the odds of CAD by 1.02 times.

In the variable selection context, there is less concern about the OR & its interpretation.

2a. Univariable analysis


Analyze categorical variables:
– Dependent: cad, Covariates: gender
– Click Categorical → Categorical Covariates: gender → Change Contrast → Reference Category: First → Change → Continue.
– Repeat for race

2a. Univariable analysis


Results

Model: Gender P-value=0.044 by LR test

Gender P-value=0.048 by Wald test

Women=0 becomes the reference group.

OR(male) = 2.11 (95% CI: 1.01, 4.44). Unadjusted/Crude OR.

Interpretation: Man has 2.11 times odds of CAD as compared to woman.
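With a single binary IV, the fitted logistic model reproduces the 2x2 table of gender vs CAD, so the figures reported here can be checked by hand. The sketch below is an illustration in Python, not the SPSS procedure itself. It assumes the cell counts from the gender table (24/76 for men, 13/87 for women), a multiplier of 1.96 for the 95% CI, and a hypothetical helper `loglik`.

```python
import math

a, b = 24, 76   # men: CAD, no CAD
c, d = 13, 87   # women (reference group): CAD, no CAD

# Coefficient B for gender is the log odds ratio; its SE comes from the cell counts.
B = math.log((a * d) / (b * c))
se = math.sqrt(1/a + 1/b + 1/c + 1/d)

# Wald 95% CI and two-sided P-value (normal approximation)
ci = (math.exp(B - 1.96 * se), math.exp(B + 1.96 * se))
p_wald = math.erfc(abs(B / se) / math.sqrt(2))

def loglik(events, nonevents):
    """Maximised binomial log-likelihood for one group."""
    n = events + nonevents
    return events * math.log(events / n) + nonevents * math.log(nonevents / n)

# LR test: 2 x (log-likelihood with gender - log-likelihood without gender)
lr_stat = 2 * (loglik(a, b) + loglik(c, d) - loglik(a + c, b + d))
p_lr = math.erfc(math.sqrt(lr_stat / 2))   # chi-square upper tail, 1 df

print(round(math.exp(B), 2), [round(x, 2) for x in ci])   # 2.11 [1.01, 4.44]
print(round(p_wald, 3), round(p_lr, 3))                   # 0.048 0.044
```

The Wald test works from B and its SE, while the LR test compares the models with and without gender, which is why the two P-values differ slightly.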

2a. Univariable analysis


P-values of IVs – select P-value < 0.25

Factors                            P-value (Wald test)    P-value (LR test)
Systolic Blood Pressure            0.001                  0.001
Diastolic Blood Pressure           0.001                  0.001
Cholesterol                        0.012                  0.011
Age                                0.143                  0.141
BMI                                0.505                  0.511
Race      Chinese-vs-Malay         0.887                  0.981*
          Indian-vs-Malay          0.852
Gender    Man-vs-Woman             0.048                  0.044

*For both variables
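The P-value < 0.25 screen can be written as a simple filter. This is a minimal Python sketch for illustration; the dictionary holds the LR test P-values from the table, keyed by the dataset's variable names.

```python
# P-values (LR test) copied from the table of univariable results
p_lr = {
    "sbp": 0.001,
    "dbp": 0.001,
    "chol": 0.011,
    "age": 0.141,
    "bmi": 0.511,
    "race": 0.981,
    "gender": 0.044,
}

CUTOFF = 0.25
selected = [name for name, p in p_lr.items() if p < CUTOFF]
print(selected)   # ['sbp', 'dbp', 'chol', 'age', 'gender']
```

The filter covers only the statistical criterion. Clinically significant IVs would still be added by judgement.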


2b. Multivariable analysis


Selected variables:
– sbp, dbp, chol, age, gender

Perform multiple logistic regression on the selected variables (multivariable) in one go.

Variable selection now proceeds at the multivariable level.

Some variables may remain significant, while others become insignificant.
2b. Multivariable analysis


Variable Selection Methods:
– Automatic.
  Forward: Conditional, LR, Wald. Enters variables.
  Backward: Conditional, LR, Wald. Removes variables.
– Manual.
  Enter. Entry & removal of variables done manually. (Recommended, but leave to experts/statisticians).
2b. Multivariable analysis


Variable Selection in this workshop:
– Automatic by Forward & Backward LR.
– Selection of variables by P-values based on LR test.

2b. Multivariable analysis


Enter all selected variables.

Run the analysis twice: once with Forward LR, once with Backward LR.

Options: Just leave at the default values.

2b. Multivariable analysis


Results (Forward LR and Backward LR)

Both methods keep the same IVs: dbp & gender.

P-values by Wald test.

2b. Multivariable analysis


Results (Forward LR and Backward LR)

Both methods keep the same IVs: dbp & gender.

P-values by LR test.

2c. Multicollinearity


Indicates redundant variables – highly correlated IVs.

Perform Enter method with dbp & gender.

Check whether the coefficients (B) & std errors (SE), or the ORs (95% CIs), are suspiciously large.

Results

SEs are quite small relative to Bs.

95% CIs are not too wide.

No multicollinearity.
2d. Interactions


An interaction is a combination of IVs that requires the regression to be interpreted separately for each level of an IV → making things complicated.

Perform Enter method with dbp, gender & dbp x gender. Select both dbp & gender (hold Ctrl on keyboard) → Click >a*b>
2d. Interactions


Results

Wald test for dbp by gender (dbp*gender) is not significant. The interaction term can be removed from the model.
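To see why a significant interaction would complicate interpretation, here is a minimal Python sketch. The coefficients are hypothetical, chosen only for illustration; they are not taken from the SPSS output. It shows that with a dbp x gender term the OR for dbp is no longer a single number.

```python
import math

# Hypothetical coefficients, for illustration only (not from the SPSS output)
b_dbp = 0.04           # dbp main effect
b_interaction = 0.02   # dbp x gender term (gender coded 1: Male, 0: Female)

def or_per_mmhg_dbp(gender):
    """OR for a 1mmHg increase in dbp at a given gender level."""
    return math.exp(b_dbp + b_interaction * gender)

print(round(or_per_mmhg_dbp(0), 3))   # OR among women
print(round(or_per_mmhg_dbp(1), 3))   # OR among men
```

When the interaction term is dropped, as in this analysis, one OR for dbp applies to both genders.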

2. Variable selection


At the end of Variable Selection Step → Preliminary Final Model.

P-values by Wald test per variable by Enter method.

Take this adjusted OR.

P-values by LR test for both dbp & gender by Enter method.

P-values by LR per variable. Obtained with Forward LR method.

3. Model fit assessment


By these 3 goodness-of-fit assessment methods:
a. Hosmer-Lemeshow test
b. Classification table.
c. Area under Receiver Operating Characteristics (ROC)
curve.

At the end → Final Model.

3. Model fit assessment


Perform Enter method with dbp & gender.

Additionally
– Click Options... → Tick Hosmer-Lemeshow goodness-of-fit
– Click Save… → Tick Probabilities under Predicted Values
– A new variable PRE_1 will be created.
3a. Hosmer-Lemeshow test


Indicates fit of Preliminary Final Model to data.

Results

P-value 0.09 > 0.05 → Good model fit to the data.

Observed counts in data.

Expected/predicted counts by model.

The smaller the differences between Observed vs Expected → Better model fit to data.
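The observed-vs-expected comparison can be sketched as follows. This is an illustrative Python sketch, not SPSS's implementation. The function names are hypothetical, the group counts are made up for the demonstration, and the P-value helper only handles an even number of degrees of freedom (df is assumed to be the number of groups minus 2).

```python
import math

def hosmer_lemeshow(obs_events, exp_events, group_sizes):
    """Hosmer-Lemeshow chi-square from per-group observed and expected event counts."""
    stat = 0.0
    for o1, e1, n in zip(obs_events, exp_events, group_sizes):
        o0, e0 = n - o1, n - e1   # observed and expected non-events
        stat += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    df = len(group_sizes) - 2
    return stat, df

def chi2_sf_even_df(x, df):
    """Upper-tail P-value of a chi-square variable; closed form valid for even df only."""
    assert df > 0 and df % 2 == 0
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Made-up counts for 4 risk groups of 50 subjects each (illustration only)
stat, df = hosmer_lemeshow([3, 7, 10, 17], [4.0, 6.5, 10.5, 16.0], [50, 50, 50, 50])
print(stat, df, chi2_sf_even_df(stat, df))
```

Small gaps between observed and expected counts give a small statistic and a large P-value, which is read as good fit.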

3b. Classification table


CAD & No CAD subjects observed vs predicted/classified by Preliminary Final Model.

% correctly classified > 70% is expected for good model fit.

Results

80% of subjects are correctly classified by the model.

Good model fit to the data.
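The percentage correctly classified can be sketched as below. This is illustrative Python on toy data, assuming subjects are classified as CAD when the predicted probability is at least 0.5; the function name is a hypothetical helper.

```python
def percent_correct(observed, predicted_prob, cutoff=0.5):
    """Share of subjects whose predicted class (prob >= cutoff) matches the observed class."""
    predicted = [1 if p >= cutoff else 0 for p in predicted_prob]
    hits = sum(1 for o, c in zip(observed, predicted) if o == c)
    return 100 * hits / len(observed)

# Toy data, for illustration only (not the workshop dataset)
cad = [1, 0, 0, 1, 0]
pre_1 = [0.70, 0.20, 0.60, 0.40, 0.10]
print(percent_correct(cad, pre_1))   # 60.0
```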

3c. Area under ROC curve (AUC)


A measure of the ability of the model to discriminate CAD vs Non CAD subjects.

AUC > 0.7 is acceptable fit.

AUC ≤ 0.5 no discrimination at all, not acceptable.

Steps
– Analyze → Classify → ROC curve... → Assign Test Variable: Predicted probability (PRE_1), State Variable: cad, Value of State Variable: 1.
– Under Display tick ROC Curve, With diagonal reference line and Standard Error and confidence interval.
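One way to read the AUC is as the proportion of (CAD, Non CAD) pairs in which the CAD subject has the higher predicted probability. The Python sketch below illustrates that pair-counting definition on toy data. It is not how SPSS computes the curve, and the function name is hypothetical.

```python
def auc(state, score):
    """AUC as the share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    cases = [s for y, s in zip(state, score) if y == 1]
    noncases = [s for y, s in zip(state, score) if y == 0]
    wins = 0.0
    for sc in cases:
        for sn in noncases:
            if sc > sn:
                wins += 1
            elif sc == sn:
                wins += 0.5
    return wins / (len(cases) * len(noncases))

# Toy data, for illustration only (not the workshop dataset)
cad = [1, 0, 0, 1, 0]
pre_1 = [0.70, 0.20, 0.60, 0.40, 0.10]
print(round(auc(cad, pre_1), 2))   # 0.83
```

A model that cannot separate the two groups wins about half of the pairs, which is where the 0.5 reference value comes from.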
3c. Area under ROC curve (AUC)


Results


AUC=0.73 > 0.7.

95% CI: 0.64, 0.82.

The lower limit (0.64) is slightly < 0.7, but still acceptable as it is > 0.5.

Good model fit to the data.

3. Model fit assessment


All 3 methods indicate good model fit of Preliminary Final Model.

Can conclude the model with dbp & gender → Final Model.

4. Final Model interpretation & presentation


The Final Model.

P-values by Wald test per variable by Enter method.

Take this adjusted OR.

P-values by LR test for both dbp & gender by Enter method.

P-values by LR per variable. Obtained with Forward LR method.

4. Final Model interpretation & presentation


Associated factors of coronary artery disease.

Factors                        b       Adjusted OR (95% CI)    P-value^a
Diastolic Blood Pressure       0.05    1.05 (1.02, 1.08)       < 0.001
Gender    Man vs Woman         0.81    2.24 (1.04, 4.82)       0.036

^a LR test

1mmHg increase in DBP increases the odds of CAD by 1.05 times, while controlling for gender.

Man has 2.24 times odds of CAD as compared to woman, while controlling for DBP.

To obtain the OR for a 10mmHg increase in DBP:
OR = exp(c x b) = exp(10 x 0.05) = exp(0.5) = 1.65 times.
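The exp(c x b) rule can be sketched in Python as follows. This is for illustration only: the coefficient is the rounded b for DBP from the table, and the function name is a hypothetical helper.

```python
import math

b_dbp = 0.05   # coefficient for DBP from the table (rounded to 2 decimals)

def or_for_increase(b, c):
    """OR for a c-unit increase in a numerical IV: exp(c x b)."""
    return math.exp(c * b)

print(round(or_for_increase(b_dbp, 1), 2))    # 1.05
print(round(or_for_increase(b_dbp, 10), 2))   # 1.65
```

Raising the rounded OR to the 10th power (1.05^10 ≈ 1.63) gives a slightly different figure because of rounding, so it is safer to work from b.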

Q&A
