Logistic Regression (Advanced Biostatistics)
Simple Binary Logistic Regression
• Simple logistic regression is a statistical method used to model the relationship between a binary dependent variable and a single independent variable.
• The independent variable used to predict or explain the outcome variable can be either continuous or categorical.
Cont…
• Unlike standard linear regression models, the logistic regression model does not require the assumptions of:
  normality of the response variable distribution, and
  equal/homogeneous variance of the error term at every level of the independent variable.
• Logistic regression is widely used in the health sciences to predict the likelihood of an event occurring, such as whether or not a patient has a certain disease, based on predictor variables (e.g. age, sex, smoking status).
Cont…
Goals of logistic regression:
• To assess whether the probability that the outcome/dependent variable takes a particular value is associated with the independent/predictor variable, and/or
• To predict the probability that the outcome/dependent variable takes a particular value, given the value of a continuous predictor or a particular category/level of a categorical predictor.
Assumptions of Logistic regression
1. The dependent variable should be binary: the category with the desired outcome is coded 1 and the category without it is coded 0.
2. The log odds of the dependent variable taking the desired value (the event occurring) are linearly related to any continuous independent variables.
3. The error terms need to be independent: observations should be independent of each other.
Logistic Regression Model
• The response variable ($Y_i$) is binary and assumes only two values that, for convenience, can be coded as 0 and 1:

$$Y_i = \begin{cases} 1, & \text{if the } i\text{th subject has the desired attribute} \\ 0, & \text{if the } i\text{th subject does not have the desired attribute} \end{cases}$$

• $Y_i$ is a random variable assuming values 1 or 0 with probabilities $\pi_i$ and $1 - \pi_i$, respectively.
• We can define a model in which the probability that an individual has the attribute, $\pi_i$, depends on one potential explanatory variable (simple logistic regression).
Cont…
• Given the potential predictor, the probability that an individual has the desired event/attribute is given by:

$$\pi_i = P(Y_i = 1 \mid X_i = x_i) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$$

Where:
$\pi_i$ is the probability that $Y_i = 1$ for a given $X_i$
Cont…
• $\beta_0$ is a constant/intercept (corresponds to the baseline group), and
• $\beta_1$ is the regression coefficient (slope); it measures the rate of change in $\pi_i$ for a given change in a continuous predictor variable, or for a given category/level of a categorical predictor.
• $e$ is the base of the natural logarithm.
• This equation is useful to determine the predicted probability of the occurrence of an event given the values of the predictor variable.
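As an illustration, here is a minimal Python sketch of this equation; the coefficient values are hypothetical, chosen only to show the computation:

```python
import numpy as np

def predicted_probability(b0, b1, x):
    """P(Y = 1 | X = x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    eta = b0 + b1 * np.asarray(x, dtype=float)  # linear predictor
    return np.exp(eta) / (1.0 + np.exp(eta))

# Hypothetical coefficients: intercept -2.0, slope 0.05 per unit of x
print(predicted_probability(-2.0, 0.05, [30, 50, 70]))  # rising probabilities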
Cont…
• First, we form the odds that $Y_i = 1$, the ratio of the probability that the subject has the attribute to the probability that it does not:

$$\frac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 X_i}$$
Cont…
• Second, we take the natural log of the odds that $Y_i = 1$:

$$\ln\left(\frac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)}\right) = \ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• This is the natural log of the odds of $Y_i = 1$ versus $Y_i = 0$, i.e. the log of the odds that an individual has the attribute/event relative to not having it.
Cont…
• $\pi_i$ is the probability that the $i$th observation ($i = 1, \dots, n$) takes value 1 ($Y_i = 1$).
• $1 - \pi_i$ is the probability that the $i$th observation takes value 0 ($Y_i = 0$).
• $\beta_0$ and $\beta_1$ are the two unknown parameters to be estimated, including the intercept term.
• $X_i$ is the explanatory variable.
• $\varepsilon_i$ is the random error term, which is binomially distributed.
Estimation method
• Since the error terms have a binomial distribution, ordinary least squares (OLS) estimation is not appropriate.
• Thus, we use the maximum likelihood (ML) method to estimate the model parameters instead.
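To make ML estimation concrete, here is a minimal Python sketch that maximizes the Bernoulli log-likelihood numerically; the simulated data and true coefficients (0.5 and 1.2) are hypothetical, and a real analysis would use a fitted routine such as statsmodels' Logit:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    """Negative Bernoulli log-likelihood; X has a leading column of ones."""
    eta = X @ beta
    # log L = sum_i [ y_i*eta_i - log(1 + exp(eta_i)) ]
    return -np.sum(y * eta - np.log1p(np.exp(eta)))

# Simulate toy data from a known model (hypothetical values)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))
y = (rng.random(500) < p).astype(float)
X = np.column_stack([np.ones_like(x), x])

fit = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y))
print(fit.x)  # ML estimates, close to the true (0.5, 1.2)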
Confidence Intervals and Significance Tests for Logistic Regression Parameters

• A $100(1-\alpha)\%$ confidence interval for the log-odds coefficients ($\beta_0$ and $\beta_1$) is given by:

$$\hat{\beta}_0 \pm z_{\alpha/2} \times SE(\hat{\beta}_0) \quad \text{and} \quad \hat{\beta}_1 \pm z_{\alpha/2} \times SE(\hat{\beta}_1)$$

• A $100(1-\alpha)\%$ confidence interval for the odds ratio is obtained by exponentiating the confidence interval for the intercept or the slope:
• For the constant:

$$e^{\hat{\beta}_0 \pm z_{\alpha/2} \times SE(\hat{\beta}_0)} = \left( e^{\hat{\beta}_0 - z_{\alpha/2} \times SE(\hat{\beta}_0)},\; e^{\hat{\beta}_0 + z_{\alpha/2} \times SE(\hat{\beta}_0)} \right)$$

• For the slope:

$$e^{\hat{\beta}_1 \pm z_{\alpha/2} \times SE(\hat{\beta}_1)} = \left( e^{\hat{\beta}_1 - z_{\alpha/2} \times SE(\hat{\beta}_1)},\; e^{\hat{\beta}_1 + z_{\alpha/2} \times SE(\hat{\beta}_1)} \right)$$
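The calculation takes a few lines of Python; as a check, this sketch reproduces the interval for the treatment example that appears later in these slides (B = 0.981, SE = 0.456 are taken from that output):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(beta_hat, se, alpha=0.05):
    """100(1-alpha)% CI for a log-odds coefficient and for its odds ratio."""
    z = norm.ppf(1 - alpha / 2)             # 1.96 for a 95% interval
    lo, hi = beta_hat - z * se, beta_hat + z * se
    return (lo, hi), (np.exp(lo), np.exp(hi))

ci_beta, ci_or = wald_ci(0.981, 0.456)
print(ci_beta)  # approx (0.087, 1.875) on the log-odds scale
print(ci_or)    # approx (1.09, 6.52), matching the slides' OR interval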
Example
• The objective of this study is to determine whether cure status is associated with the treatment alternative (treatment A or treatment B). In other words, we test whether the probability of cure under treatment A is equal to, or different from, the probability of cure under treatment B.
• To conduct the study, suppose that we have performed an experiment on a random sample of 100 patients, randomly divided into two groups of 50 patients, each of which is given one treatment (A or B).
Cont…
• The results obtained in the experiment are shown in the following table:

Treatment alternative * Cure Status Crosstabulation (count)

Treatment alternative    Cure   Non-cure   Total
  Treatment A             40       10        50
  Treatment B             30       20        50
  Total                   70       30       100

• OR = 2.67
Cont…
• Logistic regression output:

Dependent Variable Encoding

Treatment alternative    Frequency   Parameter coding (1)
  Treatment B                50            .000
  Treatment A                50           1.000

• P(cure) = 70/100 = 0.7, P(non-cure) = 30/100 = 0.3
• Odds(cure) = P(cure)/P(non-cure) = 0.7/0.3 = 2.333
• For the constant-only (null) model: B = ln(odds of cure) = ln(2.333) = 0.847
• OR = Exp(B) = exp(0.847) = 2.333
• OR > 1 (OR = 2.333) shows that cure is more likely than non-cure overall.
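These quantities follow directly from the 2x2 table; a quick NumPy check (a sketch, not SPSS output):

```python
import numpy as np

# Rows: treatment A, treatment B; columns: cure, non-cure
table = np.array([[40, 10],
                  [30, 20]])

odds_cure_overall = table[:, 0].sum() / table[:, 1].sum()  # 70/30 = 2.333
odds_a = table[0, 0] / table[0, 1]                         # 40/10 = 4.0
odds_b = table[1, 0] / table[1, 1]                         # 30/20 = 1.5
print(odds_cure_overall, np.log(odds_cure_overall))        # 2.333, 0.847
print(odds_a / odds_b)                                     # crude OR = 2.67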
Cont…
• Simple binary logistic regression model
• Does inclusion of one independent variable (treatment alternative) improve the model over the null model?

Omnibus Tests of Model Coefficients

Step 1           Chi-square   df   Sig.
  Step              4.831      1   .028
  Block             4.831      1   .028
  Model             4.831      1   .028
Cont…
• Under the Model Summary of the SPSS output, the -2 Log Likelihood statistic for the model with one independent variable is 117.341.

Model Summary: [SPSS output table not shown]

• Treatment alternative(1) = treatment A
• Treatment B is set as the reference group
Cont…
• Based on the model estimates:
• The probability that a patient is cured of the disease, given the treatment alternative taken, is:

$$\hat{\pi} = P(Y = 1 \mid X = x) = \frac{e^{0.405 + 0.981x}}{1 + e^{0.405 + 0.981x}}$$

• The log odds of cure for a patient on a particular treatment alternative is:

$$\ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = \hat{\beta}_0 + \hat{\beta}_1 x = 0.405 + 0.981x$$
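A short Python check confirms that the fitted equation reproduces the observed cure proportions in each treatment group (the intercept 0.405 = ln(1.5) is implied by the reference group's odds):

```python
import numpy as np

b0, b1 = 0.405, 0.981  # intercept and slope from the fitted model

def p_cure(x):
    """Predicted P(cure); x = 1 for treatment A, x = 0 for treatment B."""
    eta = b0 + b1 * x
    return np.exp(eta) / (1.0 + np.exp(eta))

print(p_cure(1))  # approx 0.80, the observed 40/50 cured on treatment A
print(p_cure(0))  # approx 0.60, the observed 30/50 cured on treatment B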
Cont…
Interpretation of the logistic regression outputs
• Looking first at the results for Treatmentalternative(1), there is a significant overall effect of treatment alternative on the likelihood of cure (B = 0.981, SE = 0.456, Wald = 4.618, df = 1, p = 0.032).
• The B coefficient for Treatmentalternative(1) is significant and positive, indicating that treatment A is associated with increased odds of cure from the disease.
• The Exp(B) column (the odds ratio) tells us that the odds of cure for a patient who took treatment A were 2.67 times the odds of cure for a patient who took treatment B (our reference category) (COR = 2.67; 95% CI: [1.09, 6.52]).
Exercise
1. Find the probability that a patient was cured from the disease
given that he/she took treatment A.
2. Find the probability that a patient was not cured from the disease
given that he/she took treatment A.
3. Find the odds of cure for a patient who took treatment A
4. Find the probability that a patient was cured from the disease
given that he/she took treatment B.
5. Find the probability that a patient was not cured from the disease
given that he/she took treatment B.
6. Find the odds of cure for a patient who took treatment B
7. Calculate the ratio of the odds of cure for a patient who took
treatment A to the odds of cure for a patient who took treatment
B
8. Find the 95% CI for the odds ratio for patients who took treatment A compared to those who took treatment B
Data for the example

Definitions of variables:
• Dependent variable:
  Status of cure: 1 = cure, 0 = non-cure
• Independent variable:
  Treatment alternative: 1 = Treatment A, 0 = Treatment B

Data (one row per group of subjects with identical values):

Subjects    Status of cure   Treatment alternative
1–40              1                   1     (cured, treatment A)
41–70             1                   0     (cured, treatment B)
71–80             0                   1     (not cured, treatment A)
81–100            0                   0     (not cured, treatment B)
Multiple Binary Logistic Regression
• The multiple binary logistic regression model is appropriate when a response variable that takes only two possible values, representing the presence or absence of an attribute of interest (examples: infected/not infected, effective/not effective), is to be related to more than one independent variable.
• The independent variables used to predict the outcome variable can be dichotomous, categorical with more than two levels, continuous, or a combination of categorical and continuous variables.
Assumptions of Logistic regression
• Unlike standard linear regression models, the logistic regression model does not require the assumptions of:
  normality of the response variable distribution, and
  equal/homogeneous variance of the error term at every level of the independent variables.
• However, there are important assumptions that must be met to use multiple binary logistic regression:
Cont…
1. The dependent variable should be binary: the category with the desired outcome is coded 1 and the category without it is coded 0.
2. The log odds of the dependent variable taking the desired value (the event occurring) are linearly related to any continuous independent variables.
3. The model should be specified correctly: only potential predictor variables should be included.
4. The error terms need to be independent: observations should be independent of each other.
Multiple Binary Logistic Regression Model
• The response variable ($Y_i$) is binary and assumes only two values that, for convenience, can be coded as 0 or 1:

$$Y_i = \begin{cases} 1, & \text{if the } i\text{th subject has the desired attribute} \\ 0, & \text{if the } i\text{th subject does not have the desired attribute} \end{cases}$$

• $Y_i$ is a random variable assuming values 1 or 0 with probabilities $\pi_i$ and $1 - \pi_i$, respectively.
• We can define a model in which the probability that an individual has the attribute/event, $\pi_i$, depends on $k$ potential explanatory variables.
Cont…
• Given the potential predictors, the probability that an individual has the desired attribute is given by:

$$\pi_i = P(Y_i = 1 \mid X_1 = x_1, \dots, X_k = x_k) = \frac{e^{\beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik}}}{1 + e^{\beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik}}} \quad (1)$$

Where:
$\pi_i$ is the probability that $Y_i = 1$ for given $X_j = x_j$; $j = 1, 2, 3, \dots, k$
Cont…
• $\beta_0$ is a constant/intercept (corresponds to the baseline group), and
• $\beta_1, \beta_2, \dots, \beta_k$ are regression coefficients (slopes); each measures the rate of change in $\pi_i$ for a given change in a continuous predictor variable, or for a given category/level of a categorical predictor.
• $e$ is the base of the natural logarithm.
• This equation is useful to determine the predicted probability of the occurrence of an event given the values of the predictor variables.
Cont…
• First, we form the odds that $Y_i = 1$:

$$\frac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}} \quad (2)$$
Cont…
• Second, we take the natural log of the odds that $Y_i = 1$:

$$\text{logit}(\pi_i) = \ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \ln\left(\frac{P(Y_i = 1)}{P(Y_i = 0)}\right) = \ln\left(e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i \quad (3)$$
Cont…
• $\pi_i$ is the probability that the $i$th observation ($i = 1, \dots, n$) takes value 1 ($Y_i = 1$).
• $1 - \pi_i$ is the probability that the $i$th observation takes value 0 ($Y_i = 0$).
• $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are $(k+1)$ unknown parameters to be estimated, including the intercept term.
• $x_{i1}, \dots, x_{ik}$ are the $k$ explanatory variables.
• $\varepsilon_i$ is the random error term, which is binomially distributed.
Estimation method
• Since the error terms have a binomial distribution, ordinary least squares (OLS) estimation is not appropriate.
• Thus, estimation of the model parameters is based on the maximum likelihood (ML) method.
Confidence Intervals and Significance Tests for Logistic Regression Parameters

• A $100(1-\alpha)\%$ confidence interval for the constant ($\beta_0$) and the slopes ($\beta_j$; $j = 1, 2, 3, \dots, k$) is given by:

i. $\hat{\beta}_0 \pm z_{\alpha/2}\, SE(\hat{\beta}_0)$
ii. $\hat{\beta}_j \pm z_{\alpha/2}\, SE(\hat{\beta}_j)$

• Confidence intervals for the odds ratios of the constant ($e^{\beta_0}$) and the slopes ($e^{\beta_j}$) are obtained by exponentiating the confidence intervals for the constant and slopes:

i. $e^{\hat{\beta}_0 \pm z_{\alpha/2}\, SE(\hat{\beta}_0)} = \left(e^{\hat{\beta}_0 - z_{\alpha/2}\, SE(\hat{\beta}_0)},\; e^{\hat{\beta}_0 + z_{\alpha/2}\, SE(\hat{\beta}_0)}\right)$
ii. $e^{\hat{\beta}_j \pm z_{\alpha/2}\, SE(\hat{\beta}_j)} = \left(e^{\hat{\beta}_j - z_{\alpha/2}\, SE(\hat{\beta}_j)},\; e^{\hat{\beta}_j + z_{\alpha/2}\, SE(\hat{\beta}_j)}\right)$
Model-Building Strategies for Logistic Regression

i. Backward elimination
• Starts with a model that contains all of the available explanatory variables.
• Each variable is examined, and the variable whose removal would cause the smallest change in the overall model fit (i.e., the one with the largest P-value) is removed.
• This continues until all variables remaining in the model are significant.
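Below is a minimal Python sketch of backward elimination, assuming statsmodels is available; it uses Wald p-values as the drop criterion, whereas packages may instead use the change in the likelihood-ratio statistic:

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(y, X, names, alpha=0.05):
    """Drop the predictor with the largest p-value until all are significant."""
    keep = list(range(X.shape[1]))
    while True:
        fit = sm.Logit(y, sm.add_constant(X[:, keep])).fit(disp=0)
        pvals = np.asarray(fit.pvalues)[1:]   # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha or len(keep) == 1:
            return fit, [names[i] for i in keep]
        del keep[worst]                       # remove the weakest predictor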
Cont…
ii. Forward selection
• Looks at each explanatory variable individually and selects the single explanatory variable that best fits the data on its own as the first variable included in the model.
• Given the first variable, the other variables are examined to see whether they add significantly to the overall fit of the model.
• Among the remaining variables, the one that adds the most is included.
• Examining the remaining variables in light of those already in the model, and adding those that contribute significantly to the overall fit, is repeated until none of the remaining variables would add significantly or no variables remain.
Cont…
iii. Stepwise selection
• A stepwise selection procedure is a combination of forward selection and backward elimination.
• Start with a forward selection procedure, but after every step check whether variables added earlier have been made redundant (and so can be dropped) by variables or combinations of variables added later.
• Similarly, one can start with a backward elimination procedure, but after every step check whether a variable that has been dropped should be added back into the model.
Hypothesis testing
i. Overall (global) test

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0, \quad \text{i.e.} \quad \text{logit}(\pi_i) = \beta_0$$

$$H_1: \text{at least one } \beta_j \neq 0$$
Cont…
• We need to test the hypothesis:

$$H_0: \beta_j = 0 \quad \text{versus} \quad H_1: \beta_j \neq 0; \quad j = 1, 2, \dots, k$$
Cont…
i. Wald test
• Under the null hypothesis of a zero slope, and based on asymptotic theory, the Wald statistic follows a chi-square distribution with 1 degree of freedom.
• The Wald statistic is computed as:

$$W^2 = \left(\frac{\hat{\beta}_j}{s.e.(\hat{\beta}_j)}\right)^2$$

• The model being tested is $\ln\left(\dfrac{P(Y_i = 1)}{P(Y_i = 0)}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$
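In code the Wald test is a one-liner; the sketch below checks it against the age-category coefficient reported in the CHD example later in these slides (B = 2.288, SE = 0.938):

```python
from scipy.stats import chi2

def wald_test(beta_hat, se):
    """Wald chi-square statistic and p-value for H0: beta_j = 0 (df = 1)."""
    w = (beta_hat / se) ** 2
    return w, chi2.sf(w, df=1)

print(wald_test(2.288, 0.938))  # approx (5.95, 0.015), matching the slides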
Cont…
• To test $H_0$, we calculate the likelihood ratio test statistic:

$$G = -2\ln\left(\frac{L_R}{L_F}\right) = -2\left(\ln L_R - \ln L_F\right)$$

where $L_R$ is the likelihood of the reduced model and $L_F$ is the likelihood of the full model.

• If $H_0: \beta_1 = 0$ is true, the sampling distribution of $G$ is very close to a $\chi^2$ distribution with 1 degree of freedom (in general, df equals the number of extra parameters (regression coefficients) included in the full model but not in the reduced model).
Cont…
• If the test is significant ($G > \chi^2(1, \alpha)$), the inclusion of $X_1$ as a predictor variable makes the full model a better fit to our data than the reduced model, and therefore $H_0$ is rejected.
• We can carry out a similar model-comparison test for each of the other predictor variables, $X_j$; $j = 2, 3, 4, \dots, k$.
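A small Python sketch of this test, illustrated with the -2LL values from the CHD example later in these slides (44.987 for the null model, 30.430 for the full model, two extra parameters):

```python
from scipy.stats import chi2

def likelihood_ratio_test(ln_l_reduced, ln_l_full, df):
    """G = -2(ln L_R - ln L_F), referred to chi-square with the given df."""
    g = -2.0 * (ln_l_reduced - ln_l_full)
    return g, chi2.sf(g, df)

# log-likelihoods are the -2LL values divided by -2
print(likelihood_ratio_test(-44.987 / 2, -30.430 / 2, df=2))
# approx (14.557, 0.001): the full model fits significantly better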
Assessing the Fit of the Model
Goodness of fit
• Goodness of fit is an important diagnostic tool for checking whether the model is adequate, by determining how similar the observed Y-values are to the expected or predicted Y-values.
• Among the well-known statistics for assessing the goodness of fit of a logistic regression model, the Hosmer-Lemeshow statistic is the most common.
Cont…
i. Hosmer-Lemeshow goodness-of-fit test
• The probability of the outcome event is estimated for each subject, using the estimated regression coefficients and the subject's explanatory variables.
• These estimated probabilities are then classified into g groups (usually 10 categories defined by deciles).
• The 10% of subjects with the lowest estimated probabilities form the first category, the next lowest 10% form the second category, and so on, until the last category is made up of the individuals with the highest estimated probabilities.
Cont…
• The hypothesis is stated as:

$$H_0: E[Y] = \frac{e^{b_0 + b_1 X_1 + \dots + b_K X_K}}{1 + e^{b_0 + b_1 X_1 + \dots + b_K X_K}} \qquad H_1: E[Y] \neq \frac{e^{b_0 + b_1 X_1 + \dots + b_K X_K}}{1 + e^{b_0 + b_1 X_1 + \dots + b_K X_K}}$$

Or:
H0: The model is a good fit to the data
H1: The model is not a good fit to the data

• The null hypothesis says that the model is "correct" (or that the only reason the observed frequencies differ from the expected ones is random variation).
Cont…
Hosmer-Lemeshow test statistic $X^2$
• Obtained by computing the Pearson chi-square statistic from the observed and expected counts in a g-by-2 table:

$$X^2 = \sum_{j=1}^{g} \sum_{k=0}^{1} \frac{(O_{jk} - E_{jk})^2}{E_{jk}} \sim \chi^2_{g-2}$$

• If the statistic is unusually large ($X^2$ exceeds the $\chi^2_{g-2}$ critical value, or equivalently the P-value $= P(\chi^2_{g-2} > X^2)$ is small), then the differences between the observed and expected values are greater than we would expect by chance.
• This suggests that the model is not adequate (lack of fit).
Cont…
• If the test statistic is small ($X^2$ is below the $\chi^2_{g-2}$ critical value, or the P-value $= P(\chi^2_{g-2} > X^2)$ exceeds $\alpha$), do not reject $H_0$.
• A non-significant Hosmer-Lemeshow test indicates that the model predictions do not differ significantly from the observed outcome values.
• Thus, a non-significant test (P-value greater than 0.05) suggests that the logistic regression model fits the data adequately.
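A minimal Python sketch of the computation (groups of near-equal size are formed from the sorted fitted probabilities; real packages differ slightly in how ties and group boundaries are handled):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """H-L statistic from observed vs. expected counts in g probability groups."""
    order = np.argsort(p_hat)
    stat = 0.0
    for idx in np.array_split(order, g):        # near-equal-size groups
        obs1, exp1 = y[idx].sum(), p_hat[idx].sum()
        obs0, exp0 = len(idx) - obs1, len(idx) - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, chi2.sf(stat, df=g - 2)        # df = g - 2, as above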
Cont…
ii. Analogues of the coefficient of determination
• These give an idea of the percentage of variation in the response variable that is 'explained' by the model.
• Some commonly used pseudo coefficients of determination (pseudo-R²) for logistic regression are:

A. The log-likelihood ratio R² (aka McFadden's R²)

$$R^2_L = 1 - \frac{LL_F}{LL_0}$$

where $LL_0$ is the log-likelihood for the model with only the intercept and $LL_F$ is the log-likelihood for the model with all predictors.
Cont…
B. Cox and Snell's R²

$$R^2_{CS} = 1 - \left(\frac{L_0}{L_F}\right)^{2/n} = 1 - e^{\frac{2}{n}(LL_0 - LL_F)}$$

Where:
$L_0$ is the likelihood of the intercept-only model and $L_F$ is the likelihood of the full model.
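Both measures can be computed directly from the two log-likelihoods; the sketch below uses the -2LL values from the CHD example later in these slides (44.987 and 30.430, with n = 33):

```python
import numpy as np

def pseudo_r2(ll_null, ll_full, n):
    """McFadden's and Cox-Snell's pseudo R-squared."""
    mcfadden = 1.0 - ll_full / ll_null
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_full) / n)
    return mcfadden, cox_snell

print(pseudo_r2(-44.987 / 2, -30.430 / 2, n=33))
# approx (0.324, 0.357); SPSS reports Cox & Snell R Square = .357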
Cont…
iii. Information criteria
• In addition to the deviance ($G^2$) statistic and the pseudo-$R^2$ measures, information criteria are also used to assess the goodness of fit of different models.

A. The Akaike Information Criterion (AIC)
• The AIC adjusts ('penalizes') the residual deviance (model fit) for the number of predictors, thus favoring parsimonious models:

$$AIC = -2LL_F + 2p$$
Cont…
B. Bayesian Information Criterion (BIC)

$$BIC = -2LL_F + p\ln(n)$$

Where:
$LL_F$ is the log-likelihood of the full/fitted model,
$p$ is the number of parameters in the model,
$n$ is the sample size.

• Smaller values of the information criteria indicate a better-fitting model, and
• if many models have similarly low AICs and BICs, we choose the one with the fewest model terms (parameters).
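A quick sketch of the two criteria, again using the CHD example's full-model -2LL (30.430) with p = 3 parameters and n = 33; the AIC/BIC values themselves are an illustrative check, not part of the slides' output:

```python
import numpy as np

def aic_bic(ll_full, p, n):
    """AIC = -2*lnL + 2p;  BIC = -2*lnL + p*ln(n)."""
    dev = -2.0 * ll_full
    return dev + 2 * p, dev + p * np.log(n)

print(aic_bic(-30.430 / 2, p=3, n=33))  # approx (36.43, 40.92)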
Interpretation of Logistic Regression parameters
Directionality and Magnitude of the Relationship
• A positive relationship means an increase in the independent variable is associated with an increase in the predicted probability of the response variable taking the event of interest, and vice versa.
• A negative relationship means an increase in the independent variable is associated with a decrease in the predicted probability of the response variable taking the event of interest, and vice versa.
• The direction of the relationship is reflected differently in the original coefficients (B) and the exponentiated logistic regression coefficients (Exp(B)).
Cont…
A. Original coefficients (B)
• The sign of the original coefficient indicates the direction of the relationship.
• A positive sign is associated with an increased probability that the response variable takes the event of interest.
• A negative sign is associated with a decreased probability that the response variable takes the event of interest.
• Therefore, for a unit change in a continuous variable, or for a specified level of a categorical variable, the log odds of the response variable taking the event of interest increase or decrease by B.
Cont…
B. Exponentiated coefficients (Exp(B))
• Exponentiated coefficients are interpreted differently, since they are the exponentiated values of the original coefficients and cannot be negative.
• An exponentiated coefficient (OR) above 1 represents a positive relationship, and
• OR values less than 1 represent negative relationships.
• Accordingly, for a unit change in a continuous variable, or for a specified level of a categorical variable, the odds of the response variable taking the event of interest are multiplied by Exp(B).
Cont…
• An exponentiated coefficient (OR) of value 1 shows that a unit change in the continuous variable, or the specified level of the categorical variable, has no effect on the odds of the response variable taking the event of interest.
• In this case, the 95% confidence interval of the odds ratio includes the value 1, confirming that the predictor variable is not statistically significant.
• The exponentiated coefficients can also be expressed as the percentage change in the expected odds of the dependent variable taking the desired event for a one-unit change in the independent variable, holding the other independent variables constant (adjusted/controlled).
Cont…
Odds Ratio
• Odds ratio: the ratio of two odds; a comparative measure between two levels of a categorical variable or for a unit change in a continuous variable.
Cont…
• The odds are given by:

$$\frac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik}} = \left(e^{\beta_0}\right)\left(e^{\beta_1 X_{i1}}\right)\cdots\left(e^{\beta_k X_{ik}}\right)$$

where

$$\pi_i = P(Y_i = 1 \mid X_j) = \frac{e^{\beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik}}}{1 + e^{\beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik}}}$$

• The odds ratio for a one-unit increase in $X_j$, holding the other predictors fixed, is therefore $OR_j = e^{\beta_j}$.
Example
• The aim is to study the relationship between coronary heart disease (CHD) and risk factors (age and sex).
• We want to answer the research question:
  Is there an association between coronary heart disease and age and sex?
• Absence of coronary heart disease is coded as 0 and presence of coronary heart disease is coded as 1.
• Table 1.2 in the next slide presents the CHD status of 33 individuals, with their respective age and sex.
Cont…
• Table 1.2. CHD status for 33 individuals, with their respective age and sex (male = 1, female = 0)

CHD  Age  Sex    CHD  Age  Sex    CHD  Age  Sex
 0    22   1      0    40   1      0    54   0
 0    23   0      1    41   1      1    55   1
 0    24   1      0    46   0      1    58   1
 0    27   0      0    47   0      1    60   1
 0    28   1      0    48   0      0    60   0
 0    30   0      1    49   0      1    62   1
 0    30   1      0    49   1      1    65   1
 0    32   0      1    50   0      1    67   1
 0    33   0      0    51   0      1    71   1
 1    35   1      1    51   1      1    77   0
 0    38   0      0    52   0      1    81   1
Cont…
To facilitate modelling, we categorize age into two categories (≤ 50 years and > 50 years).

Variables description:
• Response variable:
  CHD: 1 = presence, 0 = absence
• Independent variables:
  agecateg: 1 = ≤ 50 years (reference category), 2 = > 50 years
  sex: 0 = Female (reference category), 1 = Male
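The whole CHD analysis can be reproduced with a short Python script; this is a sketch assuming statsmodels is installed, with the data transcribed from Table 1.2 above:

```python
import numpy as np
import statsmodels.api as sm

# (CHD, age, sex) triples from Table 1.2; sex: 1 = male, 0 = female
data = [(0,22,1),(0,23,0),(0,24,1),(0,27,0),(0,28,1),(0,30,0),(0,30,1),
        (0,32,0),(0,33,0),(1,35,1),(0,38,0),(0,40,1),(1,41,1),(0,46,0),
        (0,47,0),(0,48,0),(1,49,0),(0,49,1),(1,50,0),(0,51,0),(1,51,1),
        (0,52,0),(0,54,0),(1,55,1),(1,58,1),(1,60,1),(0,60,0),(1,62,1),
        (1,65,1),(1,67,1),(1,71,1),(1,77,0),(1,81,1)]
chd, age, sex = (np.array(col, dtype=float) for col in zip(*data))

agecat = (age > 50).astype(float)  # 1 if age > 50 years, else 0 (reference)
X = sm.add_constant(np.column_stack([agecat, sex]))

fit = sm.Logit(chd, X).fit(disp=0)
print(fit.params)              # approx [-2.536, 2.288, 2.126], as on the slides
print(np.exp(fit.params[1:]))  # adjusted ORs, approx 9.86 and 8.38
print(-2 * fit.llf)            # -2 log-likelihood, approx 30.43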
Cont…
• The recoded data (age in two groups) are not reproduced here.
• Since more than 50% of the people in the sample did not develop CHD, the best prediction for each case (if we have no additional information) is that they did not develop CHD.
• We would be correct for 57.6% of the cases, because 19/33 (57.6%) did not develop CHD.
Cont…
Null model (constant only): Variables in the Equation [SPSS output table not shown]
Cont…
Under the Model Summary of the SPSS output:
• the -2 Log Likelihood statistic for the full model is 30.430.
• Although SPSS does not give us this statistic for the null model (the model with only the intercept), from the Omnibus Tests of Model Coefficients output we know that the -2LL of the null model is reduced by 14.557.
• Since the -2LL of the full model is 30.430, the -2LL of the null model equals 44.987 (30.430 + 14.557).
• Adding age and sex reduced the -2 Log Likelihood statistic by 14.557 (44.987 - 30.430).
Cont…
• The reduction is significant (Chi-square = 14.557, df = 2, p-value = 0.001).
• df = number of parameters estimated in the full model (constant, age, sex) minus the number of parameters estimated in the reduced model (constant only). Thus, df = 3 - 1 = 2.

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
  1         30.430                 .357                  .479
Cont…
• The pseudo-R² values tell us approximately how much of the variation in the outcome is explained by the model.
• The Nagelkerke R² (0.479) suggests that the model explains roughly 47.9% of the variation in the outcome variable.
• The Hosmer & Lemeshow goodness-of-fit test is non-significant (Chi-square = 5.172, df = 2, p-value = 0.075), suggesting that the model is a good fit to the data.

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
  1       5.172      2   .075
Cont…
• Another useful output is the Classification Table for the full model.
• The model that includes the explanatory variables (age and sex) correctly classifies the outcome for 84.8% of the cases, compared to 57.6% correct classification by the null model.
• The full model shows a marked improvement over the null model.

Classification Table (a)

Observed                         Predicted
                                 Absence of CHD   Presence of CHD   Percentage correct
Step 1   Absence of CHD                19                0                100.0
         Presence of CHD                5                9                 64.3
         Overall percentage                                                84.8

a. The cut value is .500
Cont…
• The output in the Variables in the Equation table provides the regression coefficients (B), the Wald statistic (to test statistical significance), and the odds ratio (Exp(B)) for each variable category.
• Logistic regression coefficients:

Variables in the Equation

              B        S.E.    Wald    df   Sig.   Exp(B)
agecateg(1)   2.288   0.938   5.956    1    .015    9.86
sex(1)        2.126   0.953   4.975    1    .026    8.38
Constant     -2.536
Cont…
• Looking first at the results for agecateg(1), there is a significant overall effect of age category (B = 2.288, SE = 0.938, Wald = 5.956, df = 1, p = 0.015).
• The B coefficient for agecateg(1) is significant and positive, indicating that the higher age category is associated with increased odds of developing CHD.
• The Exp(B) column (the odds ratio) tells us that the odds of developing CHD for an individual aged over 50 years were 9.86 times the odds for an individual aged 50 years or below (our reference category), controlling for sex (AOR = 9.86; 95% CI: [1.57, 61.93]).
Cont…
• The effect of sex is also significant and positive (B = 2.126, SE = 0.953, Wald = 4.975, df = 1, p = 0.026), indicating that men are more likely to develop CHD than women.
• The OR estimate shows that the odds of developing CHD for males were 8.38 times the odds for females, adjusting for age (AOR = 8.38; 95% CI: [1.29, 54.30]).
Summary
Logistic regression equation:

Log(odds of CHD) = -2.536 + 2.288·age(>50 years) + 2.126·sex(male)

• Both independent variables (age and sex) were significant predictors of CHD (age: B = 2.29, SE = 0.94, Wald = 5.956, df = 1, p-value = 0.015; sex: B = 2.13, SE = 0.95, Wald = 4.975, df = 1, p-value = 0.026).
• The odds of developing coronary heart disease for an individual in the age category greater than 50 years are 9.9 times the odds for an individual in the age category of 50 years or below, controlling for sex (AOR = 9.86; 95% CI: [1.57, 61.93]).
• The odds of developing coronary heart disease for males were 8.4 times the odds for females, controlling for age (AOR = 8.38; 95% CI: [1.29, 54.30]).
• In conclusion, age above 50 years and male sex were statistically significant risk factors for CHD.