Chapter 8 Logistic Regression (Compatibility Mode)

University of Gondar

College of medicine and health science

Department of Epidemiology and
Biostatistics

Logistic Regression

Lemma Derseh (BSc. MPH)

Logistic regression
• In linear regression, we fit a model with a continuous dependent variable and independent variable(s) of any measurement scale (categorical or numeric).

• What can we do if the dependent variable is dichotomous? (Outcomes with more than two categories are handled by multinomial or ordinal logistic regression.)
Logistic regression cont…
The above question refers to problems of the following type:
The relationship between coronary heart disease (CHD, a binary outcome variable: +ve or -ve) and age (a continuous variable).
Note: CHD = 0 means negative for CHD, and CHD = 1 means positive for CHD.

Age CHD Age CHD Age CHD

22 0 40 0 54 0
23 0 41 1 55 1
24 0 46 0 58 1
27 0 47 0 60 1
28 0 48 0 60 0
30 0 49 1 62 1
30 0 49 0 65 1
32 0 50 1 67 1
33 0 51 0 71 1
35 1 51 1 77 1
38 0 52 0 81 1
Logistic regression cont…
• One possible statistical method is to compare the mean ages of the two groups with a t-test or Z-test, or with ANOVA (even though the outcome has only two levels).
• In this case we do get a statistically significant age difference between the two groups (CHD +ve versus -ve), p < 0.0001.
• However, these tests only tell us that age differs significantly between the two groups; they do not give the magnitude of the effect of age on CHD.
• So what if our research goal is to estimate the probability of being CHD +ve (i.e. to predict the outcome status of each individual)? Or
• What if there are several covariates that we believe contribute to CHD?
Shall we use linear regression?
Logistic regression cont…
First, draw a scatter plot of CHD status versus age.

[Figure: scatter plot of CHD status (probability of CHD) against age]

Second, add the candidate linear regression line of probability on age.

Problem! If we fit an ordinary linear regression, it will predict probabilities greater than 1 or less than 0, which is impossible.
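The problem can be checked directly on the CHD data. The following is a minimal sketch, not part of the original slides: it fits an ordinary least-squares line by hand (the helper name `ols_fit` is hypothetical) and evaluates it at the youngest and oldest ages in the table.

```python
# CHD data from the table above: (age, chd), chd = 1 for +ve and 0 for -ve.
DATA = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


ages = [age for age, _ in DATA]
chd = [status for _, status in DATA]
intercept, slope = ols_fit(ages, chd)

# "Probabilities" predicted by the straight line at the two ends of the age range.
pred_youngest = intercept + slope * min(ages)
pred_oldest = intercept + slope * max(ages)
print(f"predicted at age {min(ages)}: {pred_youngest:.3f}")
print(f"predicted at age {max(ages)}: {pred_oldest:.3f}")
```

On this data the line dips below 0 at the youngest age and rises above 1 at the oldest, which is exactly the impossibility described above.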
Logistic regression cont…
So what shall we do?

Rather than working with individual ages and their binary outcomes, let us group the ages so that we can compute the proportion (probability) of successes (1s) in each age group.

Doing so gives intermediate proportions between 0 and 1.

Age group    # in group    # diseased    Probability
20 - 29      5             0             0.00
30 - 39      6             1             0.17
40 - 49      7             2             0.29
50 - 59      7             4             0.57
60 - 69      5             4             0.80
70 - 79      2             2             1.00
80 - 89      1             1             1.00
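The grouping step can be reproduced in a few lines. This is an illustrative sketch (the function name `grouped_proportions` and the 10-year bin width are choices made here to match the table, not something the slides prescribe):

```python
from collections import defaultdict

# CHD data from the earlier table: (age, chd), chd = 1 for +ve and 0 for -ve.
DATA = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]


def grouped_proportions(data, width=10):
    """Map each age-group lower bound to (n in group, n diseased, proportion)."""
    counts = defaultdict(lambda: [0, 0])  # lower bound -> [n, diseased]
    for age, chd in data:
        lower = (age // width) * width
        counts[lower][0] += 1
        counts[lower][1] += chd
    return {lo: (n, d, d / n) for lo, (n, d) in sorted(counts.items())}


table = grouped_proportions(DATA)
for lo, (n, d, p) in table.items():
    print(f"{lo} - {lo + 9}: n={n}, diseased={d}, proportion={p:.2f}")
```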
Logistic regression cont…
The probabilities in the above table are the same as the
proportions of individuals with CHD in each age category.

[Figure: probability of CHD (0 = No, 1 = Yes sign of coronary disease) plotted against age group, with an S-shaped curve drawn in red]

A scatter plot of these proportions against age group gives the S-shaped curve shown above (in red).
Logistic regression cont…
• Again, such an S-shaped (sigmoidal) curve is difficult to describe with a linear equation, for two reasons.
• First, although it looks linear near the center of the curve, the extremes do not follow a linear trend.
• Second, the errors are neither normally distributed nor of constant variance across the range of the data.

Question! So what do we do with this S-shaped curve?

• Answer:
• First: find a function that fits (can be linked to) this S-shaped graph.
• Second: find another function that transforms the S-shaped graph into a linear function.
(I) Finding a function that best fits the S-shaped graph of probability

[Figure: S-shaped curve of P (0.0 to 1.0) against x (20 to 100)]

$$P = P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

Here P = P(success given x) = P(a person is CHD +ve given that his or her age is x).

We call the above mathematical expression a logistic function.

It always gives an S-shaped curve within the range 0 to 1 for any x.
That is why we link it with p (probability), which has the same S-shape in the same range of 0 to 1.
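As a quick illustration of the bounded S-shape, here is a small sketch that evaluates the logistic function over a range of ages. The values of `ALPHA` and `BETA` are the age-only estimates quoted from the SPSS output in this chapter; the function itself is the standard logistic form and is written here in a numerically stable way.

```python
import math

# Intercept and slope quoted from the age-only SPSS output in this chapter.
ALPHA, BETA = -6.708, 0.132


def logistic(x, alpha=ALPHA, beta=BETA):
    """P = e^(alpha + beta*x) / (1 + e^(alpha + beta*x)), computed stably."""
    z = alpha + beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


ages = list(range(20, 101, 10))
probs = [logistic(age) for age in ages]
for age, p in zip(ages, probs):
    print(f"age {age:3d}: P(CHD +ve) = {p:.3f}")
```

Whatever age is supplied, the result stays strictly between 0 and 1 and rises with age when the slope is positive.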
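(II) Transforming to a linear function using the logit function

This is the logistic function:
$$P = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

The odds of an event:
$$\frac{P}{1 - P} = e^{\alpha + \beta x}$$

The logit of P:
$$\text{logit}(P) = \ln\left(\frac{P}{1 - P}\right) = \alpha + \beta x$$

α = log odds of the event in the unexposed
β = log odds ratio associated with being exposed
$e^{\beta}$ = odds ratio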
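The two claims on this slide, that the logit undoes the logistic function and that e raised to the slope is the odds ratio for a one-unit change in x, can be checked numerically. This is a sketch under the assumption that the intercept and slope are the age-only estimates from the SPSS output in this chapter; any other values would work equally well.

```python
import math

ALPHA, BETA = -6.708, 0.132  # illustrative values taken from the age-only SPSS output


def logistic(z):
    """Map a linear predictor z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


x = 50
p_x = logistic(ALPHA + BETA * x)
p_next = logistic(ALPHA + BETA * (x + 1))

# The logit transform recovers the linear predictor alpha + beta*x.
recovered = logit(p_x)

# Ratio of the odds at x + 1 to the odds at x equals e^beta.
odds_ratio = (p_next / (1.0 - p_next)) / (p_x / (1.0 - p_x))

print(f"logit(P) = {recovered:.4f}, alpha + beta*x = {ALPHA + BETA * x:.4f}")
print(f"odds ratio per unit of x = {odds_ratio:.4f}, e^beta = {math.exp(BETA):.4f}")
```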
The linking and transformation process

[Figure: flow diagram from Start to End with the stages: outcome (Yes/No) against a single predictor; proportions Pi against the grouped predictor; logistic function (link function); logit transform function; linear function]

Characteristics of the logistic function
The S-shaped curve of the logistic function has the following characteristics:

Function: $p = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

If β is the slope of the linear function after the logit transformation, then:

• The S-shaped curve has slope βp(1 - p), where p is the probability at X = x.

• As we move towards either extreme of x (p near 0 or 1), the slope approaches 0.

• The slope reaches its peak at the (x, p) coordinate (-α/β, 0.5). The value of x at this point is called the median effective level, denoted EL50.
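These properties can be verified numerically. The sketch below compares the analytic slope βp(1 - p) with a finite-difference slope and computes EL50; it assumes the age-only estimates quoted from the SPSS output in this chapter, and the helper names are hypothetical.

```python
import math

ALPHA, BETA = -6.708, 0.132  # estimates quoted from the age-only SPSS output


def logistic(x):
    return 1.0 / (1.0 + math.exp(-(ALPHA + BETA * x)))


def analytic_slope(x):
    """Slope of the logistic curve at x: beta * p * (1 - p)."""
    p = logistic(x)
    return BETA * p * (1.0 - p)


def numeric_slope(x, h=1e-5):
    """Central finite-difference approximation of the same slope."""
    return (logistic(x + h) - logistic(x - h)) / (2.0 * h)


el50 = -ALPHA / BETA  # x at which p = 0.5 and the slope is largest

for x in (25, 40, el50, 60, 80):
    print(f"x = {x:6.2f}: p = {logistic(x):.3f}, slope = {analytic_slope(x):.5f}")
print(f"EL50 = {el50:.2f}")
```

With these estimates EL50 is about 50.8, i.e. the age at which the fitted probability of CHD is one half; at that point the slope equals β/4.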
Logistic regression cont…
Example on the data given above:
The analysis of logistic regression is computer intensive
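To give a feel for why the computation is iterative, here is a minimal sketch of maximum-likelihood fitting by Newton-Raphson in plain Python. This is an illustration only: the slides do not say which algorithm SPSS uses, and the function name `fit_logistic`, the zero starting values and the stopping rule are all assumptions made for this example.

```python
import math

# CHD data from the earlier table: (age, chd), chd = 1 for +ve and 0 for -ve.
DATA = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, max_iter=50, tol=1e-10):
    """Newton-Raphson for logit(p) = a + b*x; returns (a, b)."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += x * (y - p)    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


a_hat, b_hat = fit_logistic(DATA)
print(f"intercept = {a_hat:.3f}, slope for age = {b_hat:.3f}")
```

Each pass recomputes the fitted probabilities for every observation and solves a small linear system, which is why software is normally used. The estimates should come out close to the values SPSS reports for this data.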

After entering the above data using SPSS and running it for
binary logistic regression, the following result has been obtained.

Variable    β         S.E.     Wald     Sig.     Exp(β)    95.0% C.I. for Exp(β)
                                                           Lower     Upper
age         0.132     0.046    8.053    0.005    1.141     1.042     1.249
Constant    -6.708    2.354    8.121    0.004    0.001

For a one-unit increase in a person's age, the odds of being positive for CHD increase by a factor of 1.141.

The 95% CI for this estimate (i.e. the odds ratio) is (1.04, 1.25).
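The odds ratio and its confidence interval can be recovered from the coefficient and its standard error. The sketch below uses the usual Wald interval, exp(β ± 1.96 × S.E.); the slides do not state how SPSS computes the interval, so treat the formula as an assumption that happens to reproduce the tabulated values up to rounding.

```python
import math

BETA_AGE = 0.132  # coefficient for age from the SPSS table
SE_AGE = 0.046    # its standard error
Z_95 = 1.96       # normal quantile for a two-sided 95% interval

odds_ratio = math.exp(BETA_AGE)
ci_lower = math.exp(BETA_AGE - Z_95 * SE_AGE)
ci_upper = math.exp(BETA_AGE + Z_95 * SE_AGE)

print(f"OR = {odds_ratio:.3f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

Small differences in the third decimal place from the SPSS table are expected, because the inputs here are already rounded.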
Logistic Regression cont…
Example on a categorical variable (residence) vs patient satisfaction with service

             Patient satisfaction
Residence    Unsatisfied    Satisfied    Total
Rural        98             17           115
Urban        205            154          359
Total        303            171          474

Here p is the proportion unsatisfied in each residence group.

• Odds for rural: $\frac{p}{1-p} = \frac{0.852}{0.148} = 5.76$; ln odds: $\ln(5.76) = 1.75$

• Odds for urban: $\frac{p}{1-p} = \frac{0.571}{0.429} = 1.33$; ln odds: $\ln(1.33) = 0.285$

• Odds ratio = $\frac{5.76}{1.33} = 4.33$; Δ ln odds = ln OR = β = 1.47

• The OR is the same by the two methods: OR = $e^{1.47}$ = 4.33
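The arithmetic above can be reproduced straight from the cell counts. A minimal sketch, with variable names chosen here for illustration:

```python
import math

# Cell counts from the residence table: (unsatisfied, satisfied).
RURAL = (98, 17)
URBAN = (205, 154)


def odds(unsatisfied, satisfied):
    """Odds of being unsatisfied: p / (1 - p), which reduces to a ratio of counts."""
    return unsatisfied / satisfied


odds_rural = odds(*RURAL)
odds_urban = odds(*URBAN)
odds_ratio = odds_rural / odds_urban

# Difference of the log odds equals the log of the odds ratio.
log_or = math.log(odds_rural) - math.log(odds_urban)

print(f"odds rural = {odds_rural:.2f}, ln odds = {math.log(odds_rural):.3f}")
print(f"odds urban = {odds_urban:.2f}, ln odds = {math.log(odds_urban):.3f}")
print(f"OR = {odds_ratio:.2f}, ln OR = {log_or:.3f}, exp(ln OR) = {math.exp(log_or):.2f}")
```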

Interpreting the Logistic Regression Model
• The model for this example is: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1$

• For urban ($x_1 = 0$) we have: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(0) = \beta_0$
(We always code the unexposed category as 0.)

• Thus the intercept β0 is the log odds for urban (unexposed), and its estimate is:
$\ln\left(\frac{p_0}{1-p_0}\right) = 0.285$
Interpreting the Logistic Regression Model cont…
• The estimate of the slope is the difference between the log odds for rural (exposed) and the log odds for urban (unexposed):

$b_1 = \ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_0}{1-p_0}\right) = 1.75 - 0.285 = 1.465$

• The fitted model is: log(odds) = 0.285 + 1.465X

Meaning of the Odds Ratio
• The odds ratio is: $\frac{\text{Odds}_{\text{rural}}}{\text{Odds}_{\text{urban}}} = \frac{e^{0.285 + 1.465}}{e^{0.285}} = e^{1.465} = 4.33$
Or, odds ratio = exp(β1) = exp(1.465) = $e^{1.465}$ = 4.33

SPSS output
Variable     B        S.E.     Wald     P-value    Exp(B)    95.0% C.I. for Exp(B)
                                                             Lower     Upper
Residence    1.47     0.284    26.72    <0.001     4.33      2.48      7.55
Constant     0.286    0.107    7.196    0.007      1.33

• Interpretation: the odds of being unsatisfied with the service received are 4.33 times as high for rural patients as for urban patients.
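For a single binary predictor, the standard error and confidence interval in this output can be reproduced from the four cell counts. The sketch below uses the standard large-sample (Woolf) formula for the standard error of a log odds ratio, which is an assumption brought in here and is not stated on the slides; it agrees with the tabulated values to the precision shown.

```python
import math

# Cell counts from the residence table.
RURAL_UNSAT, RURAL_SAT = 98, 17
URBAN_UNSAT, URBAN_SAT = 205, 154
Z_95 = 1.96  # normal quantile for a two-sided 95% interval

log_or = math.log((RURAL_UNSAT / RURAL_SAT) / (URBAN_UNSAT / URBAN_SAT))

# Woolf standard error: square root of the sum of reciprocal cell counts.
se_log_or = math.sqrt(
    1 / RURAL_UNSAT + 1 / RURAL_SAT + 1 / URBAN_UNSAT + 1 / URBAN_SAT
)

wald = (log_or / se_log_or) ** 2
ci_lower = math.exp(log_or - Z_95 * se_log_or)
ci_upper = math.exp(log_or + Z_95 * se_log_or)

print(f"B = {log_or:.3f}, S.E. = {se_log_or:.3f}, Wald = {wald:.2f}")
print(f"Exp(B) = {math.exp(log_or):.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```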
Multiple logistic regression
• This model includes more than one independent variable.

• The independent variables may be dichotomous, ordinal, nominal, or continuous.

$\text{logit}(p) = \ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

• Interpretation of βi:

• It is the increase in the log odds for a one-unit increase in xi, with all the other x's held constant.

• It measures the association between xi and the log odds, adjusted for all the other x's.
Multiple logistic regression cont..
Example-1: Assume a second variable, 'sex', is added to the existing CHD data, as indicated in the SPSS data view in the exercise.

Variable    B         S.E.     Wald     Sig.     Exp(B)    95.0% C.I. for Exp(B)
                                                           Lower     Upper
age         0.114     0.053    4.733    0.030    1.121     1.011     1.243
sex(1)      2.952     1.276    5.356    0.021    19.153    1.571     233.459
Constant    -7.787    2.689    7.367    0.007    0.000

Interpretation: For females, the odds of developing CHD are 19.15 times those of males, adjusting for age. (Males are taken as the reference.)

Note that the 95% CI is very wide because the sample used in the analysis is small. (There should be at least 10 'yes's and 10 'no's, preferably 20, for each category of each variable.)
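To see what "adjusted" means in practice, the fitted coefficients can be plugged into the model. The sketch below assumes sex(1) is coded 1 for females and 0 for males, which is inferred from the interpretation above rather than shown in the output; the function names are hypothetical.

```python
import math

# Coefficients from the SPSS table above.
B_AGE, B_SEX, CONSTANT = 0.114, 2.952, -7.787


def log_odds(age, female):
    """Linear predictor; female = 1 for females, 0 for males (assumed coding)."""
    return CONSTANT + B_AGE * age + B_SEX * female


def odds(age, female):
    return math.exp(log_odds(age, female))


def probability(age, female):
    return 1.0 / (1.0 + math.exp(-log_odds(age, female)))


# The female-to-male odds ratio is the same at every age: exp(B_SEX).
sex_or_at_40 = odds(40, 1) / odds(40, 0)
sex_or_at_70 = odds(70, 1) / odds(70, 0)

# Ten extra years multiply the odds by exp(10 * B_AGE), whatever the sex.
age_or_10_years = odds(60, 0) / odds(50, 0)

print(f"sex OR at age 40 = {sex_or_at_40:.2f}, at age 70 = {sex_or_at_70:.2f}")
print(f"OR for 10 more years of age = {age_or_10_years:.2f}")
print(f"P(CHD) at 50: male = {probability(50, 0):.3f}, female = {probability(50, 1):.3f}")
```

The odds ratio for sex does not change with age because the model has no interaction term; the predicted probabilities, however, do differ by age.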
Multiple Logistic Regression output

                       Unsatisfied
Characteristics        Yes    No     Crude OR (95% CI)       Adjusted OR (95% CI)     P-value
Cost of treatment                                                                     0.522 (overall)
  Very cheap           59     69     1.0                     1.0
  Cheap                70     42     1.95 (1.16, 3.27)       1.36 (0.66, 2.81)        0.400
  Moderate             30     8      4.39 (1.87, 10.30)      2.12 (0.64, 7.05)        0.220
  Expensive            97     41     2.77 (1.67, 4.58)       1.54 (0.73, 3.26)        0.255
  Highly expensive     47     11     5.00 (2.38, 10.50)      2.35 (0.74, 7.53)        0.150
Residence
  Urban                205    154    1.0                     1.0
  Rural                98     17     4.33 (2.48, 7.55)       2.71 (1.19, 6.16)        0.017*
Extra job                                                                             <0.001** (overall)
  Gov't worker         19     13     1.0                     1.0
  Needs part-time      210    58     2.48 (1.16, 5.31)       3.18 (1.09, 9.06)        0.034
  Part-timer           60     96     0.43 (0.20, 0.93)       1.04 (0.35, 3.15)        0.94
  Has his own firm     14     4      2.40 (0.64, 8.93)       6.26 (3.21, 32.27)       0.028
Diagnosis type
  Complex              108    154    1.0                     1.0
  Simple               195    17     16.36 (9.41, 28.44)     13.55 (6.96, 26.37)      <0.001**
Total                  303    171

α = 0.05; * significant; ** highly significant. P-values on the variable-name rows (underlined in the original slide) are overall p-values.
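The crude odds ratios in this table come directly from the frequencies, each category being compared with the reference category. A short sketch for the cost-of-treatment rows (the adjusted odds ratios cannot be reproduced this way, since they require the full multivariable model and the individual-level data):

```python
# (unsatisfied yes, unsatisfied no) for each cost-of-treatment category.
COST = {
    "Very cheap": (59, 69),        # reference category
    "Cheap": (70, 42),
    "Moderate": (30, 8),
    "Expensive": (97, 41),
    "Highly expensive": (47, 11),
}


def crude_or(yes, no, ref_yes, ref_no):
    """Odds of being unsatisfied in a category relative to the reference."""
    return (yes / no) / (ref_yes / ref_no)


ref_yes, ref_no = COST["Very cheap"]
crude = {
    name: crude_or(yes, no, ref_yes, ref_no) for name, (yes, no) in COST.items()
}

for name, value in crude.items():
    print(f"{name:17s} crude OR = {value:.2f}")
```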
Take Notice of:
• How the variables (characteristics) and their corresponding categories should be laid out.
• How the frequencies are arranged in relation to how the categories of the dependent variable are defined in the SPSS variable view (because SPSS always interprets ORs in terms of the larger code given in the 'value' column, e.g. here being unsatisfied).
• Overall p-values are important for variables with more than two categories, especially when a variable has both significant and non-significant categories.
• Overall p-values are written on the same row as the variable name, and specific p-values on the rows of the respective categories.
• Specific p-values may look redundant alongside the confidence intervals; however, they indicate the level (degree) of significance (e.g. strong, marginal, or weak association).
