
Advanced Biostatistics

Course Topic 2: Simple Binary Logistic Regression

1
Simple Binary Logistic Regression
• Simple logistic regression is a statistical method used to
model the relationship between a binary dependent
variable and one independent variable.
• The independent variable used to predict or
explain the outcome variable can be either
continuous or categorical.

2
Cont…
• Unlike standard linear regression models, the logistic
regression model does not require the assumptions of:
 normality of the response variable distribution, and
 equal/homogeneous variance of the error term at
every level of the independent variable.
• Logistic regression is widely used in the health
sciences to predict the likelihood of an event
occurring, such as whether or not a patient has a certain
disease, based on predictor variables (e.g. age,
sex, smoking status).

3
Cont…
Goals of logistic regression:
• To assess whether the probability of obtaining a particular
value of the outcome/dependent variable is
associated with the independent/predictor variable,
and/or
• To predict the probability that the outcome/dependent
variable takes a particular value, given the value of a
continuous independent/predictor variable or a
particular category/level of a categorical
independent/predictor variable.

4
Assumptions of Logistic regression
1. The dependent variable should be binary.
 An observation with the desired outcome takes the
value 1 and one without the desired outcome takes the
value 0.
2. The log odds of the dependent variable taking the desired
value (the event occurring) is linearly related to
continuous independent variables.
3. The error terms need to be independent.
 Observations should be independent of each other.

5
Cont…

4. The model requires a large sample size.
5. The continuous predictors should not have
influential outliers.

6
Logistic Regression Model
• The response variable ($Y_i$) is binary and assumes
only two values that, for convenience, can be
coded as 0 and 1:

$Y_i = \begin{cases} 1, & \text{if the } i\text{th subject has the desired attribute} \\ 0, & \text{if the } i\text{th subject does not have the desired attribute} \end{cases}$

• $Y_i$ is a random variable assuming values 1 or 0 with
probabilities $\pi_i$ and $1 - \pi_i$, respectively.
• We can define a model in which the probability of an
individual having the attribute, $\pi_i$, depends on one
potential explanatory variable (simple logistic regression).
7
Cont…
• Given the potential predictor, the probability of an individual
having the desired event/attribute is given by:

$\pi_i = P(Y_i = 1 \mid X_i) = \dfrac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$

Where:
 $\pi_i$ is the probability that $Y_i = 1$ for a given $X_i$,
 $\beta_0$ and $\beta_1$ are parameters to be estimated,

8
Cont…
• $\beta_0$ is a constant/intercept (corresponds to the
baseline group), and
• $\beta_1$ is the regression coefficient (slope), which measures the rate of
change in $\pi_i$ for a given change in a continuous predictor
variable or for a given category/level of a categorical
predictor.
• $e$ is the base of the natural logarithm.
• This equation is used to determine the predicted
probability of the occurrence of an event given the
values of the predictor variable.

9
Cont…

• We need to transform $\pi_i$ so that the logistic
regression model closely resembles a familiar
linear regression model:
 First, we calculate the odds that an event occurs
(i.e. $Y_i = 1$), which is the probability that the event
occurs relative to its converse (that it does not occur),
i.e. the probability that $Y_i = 1$ relative to the
probability that $Y_i = 0$:

$\text{odds} = \dfrac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)} = \dfrac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 X_i}$

10
Cont…
• Second, we take the natural log of the odds that $Y_i = 1$:

$\ln\left(\dfrac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i + \varepsilon_i$

• This is the natural log of the odds of $Y_i = 1$ versus $Y_i = 0$,
i.e. the log of the odds that an individual has the
attribute/event relative to not having it.

11
Cont…
• $\pi_i$ is the probability that the $i$th observation (i = 1, …, n)
takes the value 1 ($Y_i = 1$).
• $1 - \pi_i$ is the probability that the $i$th observation (i = 1, …, n)
takes the value 0 ($Y_i = 0$).
• $\beta_0$ and $\beta_1$ are the two unknown parameters to be
estimated, including the intercept term.
• $X_i$ is the explanatory variable.
• $\varepsilon_i$ is the random error term, which is binomially
distributed.

12
Estimation method
• Since the error terms have a binomial distribution,
ordinary least squares (OLS) estimation is not
appropriate.
• Thus, we use the maximum likelihood (ML) method
to estimate the model parameters instead.

13
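To make the estimation step concrete, the following is a minimal Python sketch (assuming the numpy and statsmodels packages are available) of fitting a simple binary logistic regression by maximum likelihood; the data are reconstructed from the cure/treatment example introduced in the slides that follow.

```python
# A minimal sketch of ML estimation for simple binary logistic regression,
# assuming numpy and statsmodels are installed.
import numpy as np
import statsmodels.api as sm

# Reconstruct the cure/treatment example data used later in this topic:
# treatment A (x=1): 40 cured, 10 not; treatment B (x=0): 30 cured, 20 not.
y = np.array([1] * 40 + [0] * 10 + [1] * 30 + [0] * 20)  # 1 = cure, 0 = non-cure
x = np.array([1] * 50 + [0] * 50)                        # 1 = treatment A, 0 = B

X = sm.add_constant(x)            # adds the intercept column
model = sm.Logit(y, X).fit()      # maximum likelihood estimation
print(model.params)               # expected: intercept ~0.405, slope ~0.981
print(model.bse)                  # standard errors ~0.289 and ~0.456
```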
Confidence Intervals and Significance Tests for Logistic
Regression Parameters
• A $100(1-\alpha)\%$ confidence interval for the log-odds
coefficients ($\beta_0$, $\beta_1$) is given by:

$\hat\beta_0 \pm Z_{\alpha/2} \times SE(\hat\beta_0)$ and $\hat\beta_1 \pm Z_{\alpha/2} \times SE(\hat\beta_1)$

• A $100(1-\alpha)\%$ confidence interval for the odds ratio
is obtained by transforming the confidence interval
for the intercept and the slope as:
• For the constant:

$e^{\hat\beta_0 \pm Z_{\alpha/2} \times SE(\hat\beta_0)} = \left( e^{\hat\beta_0 - Z_{\alpha/2} \times SE(\hat\beta_0)},\; e^{\hat\beta_0 + Z_{\alpha/2} \times SE(\hat\beta_0)} \right)$

• For the slope:

$e^{\hat\beta_1 \pm Z_{\alpha/2} \times SE(\hat\beta_1)} = \left( e^{\hat\beta_1 - Z_{\alpha/2} \times SE(\hat\beta_1)},\; e^{\hat\beta_1 + Z_{\alpha/2} \times SE(\hat\beta_1)} \right)$

14
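A quick numeric check of these formulas (the slope estimate and standard error are those reported in the SPSS output later in this topic; scipy is assumed for the normal quantile):

```python
import numpy as np
from scipy.stats import norm

b1, se1 = 0.981, 0.456            # slope and its SE from the example output below
z = norm.ppf(0.975)               # ~1.96 for a 95% CI

ci_log_odds = (b1 - z * se1, b1 + z * se1)
ci_or = tuple(np.exp(ci_log_odds))  # transform to the odds-ratio scale
print(ci_or)                        # roughly (1.09, 6.52)
```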
Example
• The objective of this study is to determine whether the cure/non-cure
outcome is associated with the treatment
alternative (treatment A or treatment B). In other
words, we are testing whether the probability of cure
under treatment A is equal to, or different from,
the probability of cure under treatment B.
• To conduct the study, suppose that we have
performed an experiment on a random sample of
100 patients, randomly divided into two groups of 50
patients, each of which is given one treatment (A or B).

15
Cont…
• The results obtained in the experiment are
shown in the following table:
Treatment alternative * Cure Status Crosstabulation (Count)

                            Cure Status
                          Cure   Non-cure   Total
Treatment    Treatment A    40         10      50
alternative  Treatment B    30         20      50
Total                       70         30     100

• OR = 2.67

16
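The odds ratio reported above can be verified directly from the table; a minimal sketch:

```python
# Odds ratio from the 2x2 table: (cure odds under A) / (cure odds under B)
odds_A = 40 / 10        # odds of cure with treatment A
odds_B = 30 / 20        # odds of cure with treatment B
print(odds_A / odds_B)  # 2.666..., i.e. OR = 2.67
```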
Cont…
• Logistic regression output

Dependent Variable Encoding
Original Value    Internal Value
Non-cure                0
Cure                    1

• Categorical independent variable coding:

Categorical Variables Codings
                                      Frequency   Parameter coding (1)
Treatment alternative   Treatment B        50             .000
                        Treatment A        50            1.000

• Treatment B is set as the reference category.


17
Cont…

• Null model (constant only)

Variables in the Equation
                    B     S.E.    Wald   df   Sig.   Exp(B)
Step 0  Constant  .847   .218   15.076    1   .000    2.333

• P(cure) = 70/100 = 0.7, P(non-cure) = 30/100 = 0.3
• Odds(cure) = P(cure)/P(non-cure) = 0.7/0.3 = 2.333
• B = ln(odds of cure) = ln(2.333) = 0.847
• OR = EXP(B) = EXP(0.847) = 2.333
• OR > 1 (OR = 2.333) shows that cure is more likely
than non-cure.
18
cont…
• Simple binary logistic regression model
• Does the inclusion of one independent variable (treatment
alternative) improve the model over the null model?

Omnibus Tests of Model Coefficients
                Chi-square   df   Sig.
Step 1  Step         4.831    1   .028
        Block        4.831    1   .028
        Model        4.831    1   .028

• The statistic measures the amount by which -2LL (the deviance,
analogous to the residual sum of squares in linear regression)
is reduced using the model with one explanatory variable compared to
the null model.
19
Cont…
• The Omnibus Tests of Model Coefficients table is used to check
whether the model with one explanatory variable is an
improvement over the null model.
• It uses chi-square tests to see if there is a significant
difference between the log-likelihoods (-2LLs) of the null
model and the model with one independent variable.
• If the model with one independent variable has a
significantly reduced -2LL compared to the null model,
this suggests that the model with the independent
variable explains more of the variance in the
response variable (status of cure) than the null model.

20
Cont…
• Under Model Summary of the SPSS output,
the -2 Log Likelihood statistic for the model with one
independent variable is 117.341.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      117.341 a                 .047                   .067
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

• Although SPSS does not give us this statistic for the null
model (the model that had only the intercept), from the
Omnibus Tests of Model Coefficients output, we know
that the -2LL of the null model is reduced by 4.831.
21
Cont…
• Since the -2LL of the model with one independent variable is
117.341, the -2LL of the null model equals 122.172
(117.341 + 4.831).
• Adding the treatment alternative as a predictor variable
reduced the -2 Log Likelihood statistic by 4.831 (122.172 -
117.341).
• The reduction is significant (Chi-square = 4.831, df = 1; p-
value = 0.028).
• df = number of parameters estimated in the model with
one independent variable (constant,
treatment alternative) minus number of parameters
estimated in the null model (constant only).
Thus, df = 2 - 1 = 1.
22
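The p-value of this likelihood-ratio test can be reproduced from the chi-square distribution; a small sketch assuming scipy:

```python
from scipy.stats import chi2

lr_stat = 122.172 - 117.341       # reduction in -2LL (the omnibus chi-square)
p_value = chi2.sf(lr_stat, df=1)  # upper-tail probability, df = 2 - 1 = 1
print(lr_stat, p_value)           # ~4.831 and ~0.028
```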
Cont…
• Logistic regression model with the independent variable output:
• The output in the Variables in the Equation table provides
the regression coefficients (B), the Wald statistic (to test the
statistical significance) and
• the Odds Ratio (Exp(B)) for each variable category.

Variables in the Equation
                                                                       95% C.I. for EXP(B)
                                   B     S.E.   Wald   df   Sig.   Exp(B)   Lower   Upper
Step 1a  Treatment alternative(1)  .981  .456   4.618   1   .032    2.667   1.090   6.524
         Constant                  .405  .289   1.973   1   .160    1.500
a. Variable(s) entered on step 1: Treatment alternative.

• Treatment alternative(1) = treatment A
• Treatment B is set as the reference group

23
Cont…
• Based on the model estimate:
• The probability that a patient is cured of the disease, given
the treatment alternative taken, is given by:

$\hat P(Y = 1 \mid X) = \hat\pi = \dfrac{e^{0.405 + 0.981X}}{1 + e^{0.405 + 0.981X}}$

• The odds of cure from the disease for a patient, given a
particular treatment alternative taken, are given by:

$\dfrac{\hat\pi}{1 - \hat\pi} = e^{0.405 + 0.981X}$

• The log odds of cure from the disease for a patient, given a
particular treatment alternative taken, are given by:

$\ln\left(\dfrac{\hat\pi}{1 - \hat\pi}\right) = \hat\beta_0 + \hat\beta_1 X = 0.405 + 0.981X$

24
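Plugging the two treatment codes into the fitted equation gives the predicted probabilities; a minimal sketch (these reproduce the observed cure proportions, 40/50 and 30/50):

```python
import numpy as np

b0, b1 = 0.405, 0.981
for x, label in [(1, "treatment A"), (0, "treatment B")]:
    log_odds = b0 + b1 * x
    p_cure = np.exp(log_odds) / (1 + np.exp(log_odds))  # inverse-logit
    print(label, round(p_cure, 3))   # ~0.800 for A, ~0.600 for B
```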
Cont…
Interpretation of the logistic regression outputs
• Looking first at the results for Treatment alternative(1),
there is a significant overall effect of treatment alternative
on the likelihood of cure (B = 0.981, SE = 0.456, Wald = 4.618,
df = 1, p = 0.032).
• The B coefficient for Treatment alternative(1) is significant
and positive, indicating that treatment A is associated
with increased odds of cure from the disease.
• The Exp(B) column (the Odds Ratio) tells us that the odds
of cure from the disease for a patient who took treatment A
were 2.67 times the odds of cure for a patient who took
treatment B (our reference category) (COR = 2.67; 95% CI:
[1.09, 6.52]).

25
Exercise
1. Find the probability that a patient was cured from the disease
given that he/she took treatment A.
2. Find the probability that a patient was not cured from the disease
given that he/she took treatment A.
3. Find the odds of cure for a patient who took treatment A
4. Find the probability that a patient was cured from the disease
given that he/she took treatment B.
5. Find the probability that a patient was not cured from the disease
given that he/she took treatment B.
6. Find the odds of cure for a patient who took treatment B
7. Calculate the ratio of the odds of cure for a patient who took
treatment A to the odds of cure for a patient who took treatment
B
8. Find the 95% CI for the odds ratio for patients who took treatment A
compared to those who took treatment B.

26
Data for the example

Definitions of variables:
Dependent variable:
 Status of cure: 1 = cure, 0 = non-cure
Independent variable:
 Treatment alternative: 1 = Treatment A, 0 = Treatment B

The 100 records (subject, status of cure, treatment alternative) fall into four blocks:
Subjects 1-40:    cure = 1, treatment = 1 (treatment A, cured)
Subjects 41-70:   cure = 1, treatment = 0 (treatment B, cured)
Subjects 71-80:   cure = 0, treatment = 1 (treatment A, not cured)
Subjects 81-100:  cure = 0, treatment = 0 (treatment B, not cured)

29
Advanced Biostatistics

Topic 3: Multiple Binary Logistic Regression

30
Multiple Binary Logistic Regression
• The multiple binary logistic regression model is a type of
regression model that is appropriate when a
response variable taking only two possible values,
representing the presence or absence of an attribute of
interest (examples: infected/not infected, effective/not
effective), is associated with more than one
independent variable.
• The independent variables used to predict the outcome
variable can be dichotomous, categorical with
more than two levels, or continuous, or combinations of
categorical and continuous variables.

31
Example

• Clinical and epidemiological studies of binary outcome
variables typically focus on the potential effects of
multiple predictors.
• For example, investigators believe that total serum
cholesterol, diastolic and systolic blood pressure,
smoking, age, BMI, and behavior pattern are potential
predictors of coronary heart disease (CHD).
• Thus, to study the relationship between CHD and these
potential predictors, a multiple binary logistic regression
model is appropriate.

32
Assumptions of Logistic regression
• Unlike standard linear regression models, the logistic
regression model does not require the assumptions of:
 normality of the response variable distribution, and
 equal/homogeneous variance of the error term at
every level of the independent variables.
• However, there are important assumptions that must be
met to use multiple binary logistic regression:

33
Cont…
1. The dependent variable should be binary.
 An observation with the desired outcome takes the
value 1 and one without the desired outcome takes the
value 0.
2. The log odds of the dependent variable taking the desired
value (the event occurring) is linearly related to
continuous independent variables.
3. The model should be specified correctly.
 Only potential predictor variables should be included.
4. The error terms need to be independent.
 Observations should be independent of each other.
34
Cont…

5. There should be no strong multicollinearity among the
independent variables.
6. The model requires a large sample size.
7. The continuous predictors should not have
influential outliers.

35
Multiple Binary Logistic Regression Model
• The response variable ($Y_i$) is binary and assumes
only two values that, for convenience, can be
coded as 0 or 1:

$Y_i = \begin{cases} 1, & \text{if the } i\text{th subject has the desired attribute} \\ 0, & \text{if the } i\text{th subject does not have the desired attribute} \end{cases}$

• $Y_i$ is a random variable assuming values 1 or 0 with
probabilities $\pi_i$ and $1 - \pi_i$, respectively.
• We can define a model in which the probability of an
individual having the attribute/event, $\pi_i$, depends on k
potential explanatory variables.
36
Cont…
• Given the potential predictors, the probability of an individual
having the desired attribute is given by:

$\pi_i = P(Y = 1 \mid X_1 = x_1, \ldots, X_k = x_k) = \dfrac{e^{\beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}}}{1 + e^{\beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}}} \quad \ldots (1)$

Where:
 $\pi_i$ is the probability that $Y_i = 1$ for given $X_j = x_j$; $j = 1, 2, 3, \ldots, k$
 $\beta_0, \beta_1, \ldots, \beta_k$ are parameters to be estimated, and

37
Cont…
• $\beta_0$ is a constant/intercept (corresponds to the
baseline group), and
• $\beta_1, \beta_2, \ldots, \beta_k$ are regression coefficients (slopes), which
measure the rate of change in $\pi_i$ for a given change in a
continuous predictor variable or for a given
category/level of a categorical predictor.
• $e$ is the base of the natural logarithm.
• This equation is used to determine the predicted
probability of the occurrence of an event given the
values of the predictor variables.

38
Cont…

• We need to transform $\pi_i$ so that the logistic
regression model closely resembles a familiar
linear regression model:
 First, we calculate the odds that an event occurs
(i.e. $Y_i = 1$), which is the probability that the event
occurs relative to its converse (that it does not occur),
i.e. the probability that $Y_i = 1$ relative to the
probability that $Y_i = 0$:

$\text{odds} = \dfrac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}} \quad \ldots (2)$

39
Cont…
• Second, we take the natural log of the odds that $Y_i = 1$:

$\text{logit}(\pi_i) = \ln\left(\dfrac{\pi_i}{1 - \pi_i}\right) = \ln\left(\dfrac{P(Y_i = 1)}{P(Y_i = 0)}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i \quad \ldots (3)$

• Equation (3) is the natural log of the odds of
$Y_i = 1$ versus $Y_i = 0$, i.e. the log of the odds that an
individual has the attribute/event relative to not
having it.

40
Cont…
• $\pi_i$ is the probability that the $i$th observation (i = 1, …, n)
takes the value 1 ($Y_i = 1$).
• $1 - \pi_i$ is the probability that the $i$th observation (i = 1, …, n)
takes the value 0 ($Y_i = 0$).
• $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are (k+1) unknown parameters to be
estimated, including the intercept term.
• $x_{i1}, \ldots, x_{ik}$ are the k explanatory variables.
• $\varepsilon_i$ is the random error term, which is binomially
distributed.

41
Estimation method
• Since the error terms have a binomial distribution,
ordinary least squares (OLS) estimation is not
appropriate.
• Thus, the model's parameters are estimated by the
maximum likelihood (ML) method.

42
Confidence Intervals and Significance Tests for Logistic
Regression Parameters
• A $100(1-\alpha)\%$ confidence interval for the constant ($\beta_0$)
and the slopes ($\beta_j$; $j = 1, 2, 3, \ldots, k$) is given by:

i. $\hat\beta_0 \pm Z_{\alpha/2}\, SE(\hat\beta_0)$   ii. $\hat\beta_j \pm Z_{\alpha/2}\, SE(\hat\beta_j)$

• A confidence interval for the odds ratio of the constant
($e^{\beta_0}$) and the slopes ($e^{\beta_j}$), obtained by transforming
the confidence intervals for the constant and slopes, is
given by:

i. $e^{\hat\beta_0 \pm Z_{\alpha/2} SE(\hat\beta_0)} = \left( e^{\hat\beta_0 - Z_{\alpha/2} SE(\hat\beta_0)},\; e^{\hat\beta_0 + Z_{\alpha/2} SE(\hat\beta_0)} \right)$

ii. $e^{\hat\beta_j \pm Z_{\alpha/2} SE(\hat\beta_j)} = \left( e^{\hat\beta_j - Z_{\alpha/2} SE(\hat\beta_j)},\; e^{\hat\beta_j + Z_{\alpha/2} SE(\hat\beta_j)} \right)$

43
Model-Building Strategies for Logistic
Regression
i. Backward elimination
• Starts with a model that contains all of the available
explanatory variables.
• Each variable is examined, and the variable whose
removal would cause the smallest change in the overall
model fit (the largest p-value) is removed.
• This continues until all variables remaining in the model
are significant.

44
Cont…
ii. Forward selection
• Looks at each explanatory variable individually and
selects the single explanatory variable that best fits the data
on its own as the first variable included in the
model.
• Given the first variable, the other variables are examined to
see if they would add significantly to the overall fit of the
model.
• Among the remaining variables, the one that adds the
most is included.
• Examining the remaining variables in light of those already in
the model, and adding those that contribute significantly to the
overall fit, is repeated until none of the remaining
variables would add significantly or there are no variables
remaining.
45
Cont…
iii. Stepwise selection
• A stepwise selection procedure is a combination of
forward selection and backward elimination.
• Start with a forward selection procedure, but after
every step check whether variables added early
on have been made redundant (and so can be
dropped) by variables or combinations of variables
added later.
• Similarly, one can start with a backward elimination
procedure, but after every step check whether a
variable that has been dropped should be added
back into the model.
46
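As an illustration of how backward elimination might be automated, here is a minimal Python sketch; the helper name backward_eliminate, the 0.05 threshold, and the use of Wald p-values (rather than likelihood-ratio tests at each step) are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.05):
    """Drop the worst (largest Wald p-value) predictor until all are significant.
    X is a 2-D numpy array of predictors; an intercept is added and always kept."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=0)
        pvals = fit.pvalues[1:]        # skip the intercept's p-value
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:      # all remaining predictors significant
            return cols, fit
        del cols[worst]                # remove the least significant predictor
    # fall back to the intercept-only model if everything is dropped
    return cols, sm.Logit(y, np.ones_like(y, dtype=float)).fit(disp=0)
```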
Hypothesis testing
i. Overall (global) test

$H_0: \beta_1 = \cdots = \beta_k = 0 \;\Rightarrow\; \text{logit}(\pi_i) = \beta_0$

$H_1:$ at least one of the $\beta_j \neq 0$; $j = 1, 2, \ldots, k$

Test statistic
• The test is based on the statistic $G^2$ under the null
hypothesis that the beta coefficients for the explanatory
variables in the model are equal to 0.
• The test statistic is given by:

$G^2 = -2 \ln\left(\dfrac{L_0}{L_F}\right)$

• where $L_0$ is the likelihood of the model with only an intercept
and $L_F$ is the likelihood of the full model (the model with all
explanatory variables).
47
Cont…
• A significant p-value (or $G^2 > \chi^2_{(k,\alpha)}$) provides evidence that at
least one of the regression coefficients for the explanatory
variables is non-zero (but it does not tell us which ones!)
• where k is the number of explanatory variables in the
logistic regression equation.

ii. Individual (partial) test
• The individual test examines whether a regression
coefficient in a logistic regression model is
significantly different from 0.

48
Cont…
• We need to test the hypothesis:

$H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$; $j = 1, 2, \ldots, k$

• If $\beta_j = 0$, the potential predictor variable ($X_j$; $j = 1, 2, 3, \ldots, k$)
included in the model has no effect on the odds of the
response variable taking the event of interest (Y = 1).
• The two commonly used individual test methods are:
i. the Wald test and
ii. the log-likelihood (likelihood-ratio) test
49
Cont…

i. Wald test
• Under the null hypothesis of zero slopes, and based on
asymptotic theory, the Wald statistic follows a chi-square
distribution with 1 degree of freedom.
• The Wald statistic is computed as:

$W^2 = \left( \dfrac{\hat\beta_j}{s.e.(\hat\beta_j)} \right)^2$

• Where $\hat\beta_j$ represents the estimated coefficient of $\beta_j$
and $s.e.(\hat\beta_j)$ is its standard error.
• We reject $H_0$ if $W^2 > \chi^2_{(1,\alpha)}$
50
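For instance, the Wald statistic reported in the earlier treatment example can be reproduced from B and its standard error; a small sketch assuming scipy:

```python
from scipy.stats import chi2

b, se = 0.981, 0.456                 # slope and SE from the treatment example
wald = (b / se) ** 2                 # Wald chi-square statistic
p = chi2.sf(wald, df=1)
print(round(wald, 3), round(p, 3))   # ~4.63 and ~0.031 (SPSS: 4.618, 0.032,
                                     # the small gap is rounding in b and se)
```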
Cont…
ii. Log-likelihood (likelihood-ratio) test
• To test $H_0: \beta_1 = 0$, we compare the fit (the log-likelihood)
of the full model:

$\ln\left(\dfrac{P(Y_i = 1)}{P(Y_i = 0)}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$

to the fit of the reduced model:

$\ln\left(\dfrac{P(Y_i = 1)}{P(Y_i = 0)}\right) = \beta_0 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$

51
Cont…
• To test $H_0$, we calculate the test statistic:

$G^2 = -2\ln\left(\dfrac{L_R}{L_F}\right) = -2(\ln L_R - \ln L_F)$

• where $L_R$ is the likelihood of the reduced model and $L_F$ is the
likelihood of the full model.
• If $H_0: \beta_1 = 0$ is true, the sampling distribution of $G^2$
is very close to a $\chi^2$ distribution with 1 degree
of freedom (df equal to the number of extra
parameters (regression coefficients) included in the
full model but not in the reduced model).
52
Cont…
• If the test is significant ($G^2 > \chi^2_{(1,\alpha)}$), the inclusion
of $X_1$ as a predictor variable makes the full model
a better fit to our data than the reduced model,
and therefore $H_0$ is rejected.
• We can do a similar model-comparison test for
the other predictor variables, $X_j$; $j = 2, 3, 4, \ldots, k$

53
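This full-versus-reduced comparison can be carried out directly from the fitted log-likelihoods; a minimal sketch, assuming statsmodels is available and that the helper name lr_test is illustrative:

```python
import statsmodels.api as sm
from scipy.stats import chi2

# Assumed setup: y is the binary outcome, X_full contains all k predictors and
# X_reduced drops the single column being tested (both without intercepts).
def lr_test(y, X_full, X_reduced):
    ll_full = sm.Logit(y, sm.add_constant(X_full)).fit(disp=0).llf
    ll_red = sm.Logit(y, sm.add_constant(X_reduced)).fit(disp=0).llf
    g2 = -2 * (ll_red - ll_full)      # G^2 = -2(ln L_R - ln L_F)
    return g2, chi2.sf(g2, df=1)      # df = 1 extra parameter in the full model
```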
Assessing the Fit of the Model
Goodness of fit
• Goodness of fit is an important diagnostic tool
for checking whether the model is adequate,
by determining how similar the observed Y-values
are to the expected or predicted Y-values.
• Among the well-known statistics for assessing the
goodness of fit of a logistic regression model, the Hosmer
and Lemeshow statistic is common.

54
Cont…
i. Hosmer-Lemeshow goodness-of-fit test
• The probability of the outcome event is estimated for
each subject using the estimated regression
coefficients, given the explanatory variables.
• These estimated subject probabilities are then classified
into g groups (usually 10 categories, defined by deciles).
• The 10% of subjects with the lowest estimated
probabilities form the first category, the next lowest
10% form the second category, and so on until the last
category is made up of those individuals with the
highest estimated probabilities.
55
Cont…
• The hypothesis is stated as:

$H_0: E[Y] = \dfrac{e^{b_0 + b_1 X_1 + \cdots + b_k X_k}}{1 + e^{b_0 + b_1 X_1 + \cdots + b_k X_k}}$

$H_1: E[Y] \neq \dfrac{e^{b_0 + b_1 X_1 + \cdots + b_k X_k}}{1 + e^{b_0 + b_1 X_1 + \cdots + b_k X_k}}$

Or
$H_0$: The model is a good fit to the data
$H_1$: The model is not a good fit to the data
• The null hypothesis says that the model is
"correct" (i.e. the only reason the observed
frequencies differ from the expected ones is
random variation).
56
Cont…
The Hosmer and Lemeshow test statistic ($X^2$)
• Obtained by computing the Pearson chi-square statistic
based on the observed and expected values from a g-by-2 table:

$X^2 = \sum_{j=1}^{g} \sum_{k=0}^{1} \dfrac{(O_{jk} - E_{jk})^2}{E_{jk}} \sim \chi^2_{g-2}$

• If the statistic is unusually large ($X^2 > \chi^2_{g-2}$ or
$P\text{-value} = P(\chi^2_{g-2} > X^2) < \alpha$),
then the differences between the observed and
expected values are greater than we would expect by
chance.
• This suggests that the model is not adequate (lack of
fit).

57
Cont…
• If the test statistic is small ($X^2 < \chi^2_{g-2}$ or
$P\text{-value} = P(\chi^2_{g-2} > X^2) > \alpha$), do not reject $H_0$.
• Non-significance of the Hosmer and Lemeshow test
indicates that the model's predictions do not differ
significantly from the observed outcome values.
• Thus, a non-significant test (p-value greater than 0.05)
suggests that the logistic regression model is an
adequate model for the data.

58
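Below is a minimal Python sketch of the Hosmer-Lemeshow computation (the helper name hosmer_lemeshow is illustrative; numpy and scipy are assumed). Note that packages differ in how tied probabilities are split across deciles, so results can differ slightly from SPSS.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """y: 0/1 outcomes; p_hat: fitted probabilities; g: number of groups."""
    order = np.argsort(p_hat)
    groups = np.array_split(order, g)    # ~equal-sized decile groups
    x2 = 0.0
    for idx in groups:
        obs1 = y[idx].sum()              # observed events in the group
        exp1 = p_hat[idx].sum()          # expected events
        obs0 = len(idx) - obs1           # observed non-events
        exp0 = len(idx) - exp1           # expected non-events
        x2 += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return x2, chi2.sf(x2, df=g - 2)     # df = g - 2
```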
Cont…
ii. Analogues of the coefficient of determination
• These give an idea of the percentage of variation in the
response variable that is 'explained' by the model.
• Some of the commonly used pseudo coefficients of
determination (pseudo-R²s) for logistic regression are:
A. The log-likelihood ratio R² (aka McFadden's R²)

$R_L^2 = 1 - \dfrac{LL_F}{LL_0}$

Where $LL_0$ is the log-likelihood for the model with only the
intercept and
$LL_F$ is the log-likelihood for the model with all predictors.
59
Cont…
B. Cox and Snell's R²

$\text{Cox-Snell } R^2 = 1 - \left(\dfrac{L_0}{L_F}\right)^{2/n} = 1 - e^{-\frac{LL_0 - LL_F}{n}}$

Where:
$L_0$ is the likelihood of the intercept-only model
$L_F$ is the likelihood of the specified model
$LL_0$ is the -2 log-likelihood of the intercept-only/null model
$LL_F$ is the -2 log-likelihood of the specified model
n = number of observations/sample size


60
Cont…
C. Nagelkerke's R²
• The Nagelkerke R² adjusts the Cox-Snell R² so that the
range of possible values extends to 1:

$\text{Nagelkerke } R^2 = \dfrac{1 - \left(\dfrac{L_0}{L_F}\right)^{2/n}}{1 - L_0^{2/n}} = \dfrac{\text{Cox-Snell } R^2}{1 - e^{-\frac{LL_0}{n}}}$

 Pseudo coefficients of determination usually tend to be
smaller than the linear-regression R-square, and values of
0.2 to 0.4 are considered satisfactory.

61
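These quantities are easy to compute from fitted log-likelihoods; a minimal sketch using statsmodels conventions (llf and llnull are ordinary log-likelihoods, so the -2LL quantities above correspond to -2*llf and -2*llnull; the helper name pseudo_r2 is illustrative):

```python
import numpy as np

def pseudo_r2(llf, llnull, n):
    """llf, llnull: log-likelihoods of the fitted and intercept-only models."""
    mcfadden = 1 - llf / llnull
    cox_snell = 1 - np.exp(2 * (llnull - llf) / n)
    nagelkerke = cox_snell / (1 - np.exp(2 * llnull / n))
    return mcfadden, cox_snell, nagelkerke

# e.g. for the CHD example below: -2LL_F = 30.43, -2LL_0 = 44.987, n = 33
print(pseudo_r2(-30.43 / 2, -44.987 / 2, 33))  # Cox-Snell ~0.357, Nagelkerke ~0.479
```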
Cont…
iii. Information criteria
• In addition to the deviance ($G^2$) statistic and pseudo-$R^2$,
information criteria are also used to assess the goodness
of fit of different models.
A. The Akaike Information Criterion (AIC)
• The Akaike Information Criterion adjusts ('penalizes') the
residual deviance (model fit) for the number of
predictors, thus favoring parsimonious models:

$AIC = -2LL_F + 2p$

62
Cont…
B. Bayesian Information Criterion (BIC)

$BIC = -2LL_F + p\,\ln(n)$

Where:
$LL_F$ = log-likelihood of the full/fitted model
p is the number of parameters in the model
n is the sample size
• Smaller values of the information
criteria indicate a better-fitting model, and
• if many models have similarly low AICs and BICs, we
choose the one with the fewest model terms
(parameters).
63
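A small sketch of these formulas (statsmodels exposes them directly as result.aic and result.bic for a fitted Logit model, so this hand computation is just for comparison; the helper name aic_bic is illustrative):

```python
import numpy as np

def aic_bic(llf, p, n):
    """llf: log-likelihood of the fitted model; p: #parameters; n: sample size."""
    aic = -2 * llf + 2 * p
    bic = -2 * llf + p * np.log(n)
    return aic, bic
```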
Interpretation of logistic regression parameters
Directionality and magnitude of the relationship
• A positive relationship means an increase in the
independent variable is associated with an increase in
the predicted probability of the response variable taking
the event of interest, and vice versa.
• A negative relationship means an increase in the
independent variable is associated with a decrease in the
predicted probability of the response variable taking the
event of interest, and vice versa.
• The direction of the relationship is reflected differently
by the original coefficients (B) and the exponentiated logistic
regression coefficients (Exp(B)).
64
Cont…
A. Original coefficient (B)
• The sign of the original coefficient indicates the direction of the
relationship.
• A positive sign is associated with an increased probability
that the response variable will assume the event of
interest.
• A negative sign is associated with a decreased probability
that the response variable will assume the event of
interest.
• Therefore, for a unit change in a continuous variable,
or for a specified level of a categorical variable, the log odds
of the response variable taking the event of interest
increases or decreases by B.
65
Cont…
B. Exponentiated coefficients (Exp(B))
• Exponentiated coefficients are interpreted differently,
since they are exponentiated values of the original
coefficients and cannot be negative.
• An exponentiated coefficient (OR) above 1 represents a
positive relationship, and
• OR values less than 1 represent negative relationships.
• Accordingly, for a unit change in a continuous
variable, or for a specified level of a categorical variable, the
odds of the response variable taking the event of interest
are multiplied by Exp(B).

66
Cont…
• An exponentiated coefficient (OR) with the value 1 shows that a
unit change in the continuous variable, or a specified
level of the categorical variable, has no effect on the
odds of the response variable taking the event of interest.
• In this case, a 95% confidence interval of the odds ratio
includes the value 1, confirming that the predictor
variable is not statistically significant.
• The exponentiated coefficients can be expressed in
terms of the percentage change in the expected odds of
the dependent variable taking the desired event for a one-
unit change in the independent variable, holding the
other independent variables
constant/fixed/adjusted/controlled.
67
Cont…

• This is achieved by subtracting 1 from the
exponentiated coefficient (OR) and multiplying the
result by 100%:
 100(OR - 1)% = 100(exp(B) - 1)%
• For example, an OR of 2.67 corresponds to a
100(2.67 - 1)% = 167% increase in the expected odds.

Odds Ratio
• The odds ratio is the ratio of two odds and is a comparative
measure between two levels of a categorical variable or
for a unit change in a continuous variable.

68
Cont…
• The odds are given by:

$\dfrac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}} = e^{\beta_0}\,(e^{\beta_1})^{X_{i1}} \cdots (e^{\beta_k})^{X_{ik}}$

• Thus, $e^{\beta_j}$ is the multiplicative effect on the odds
of the response variable assuming the event of
interest when $X_j$ increases by 1 unit, holding the
other X's constant.
Probability
• The logistic regression function can be
interpreted as the probability of the outcome
variable taking the value 1 (Y = 1).
69
Cont…

• The probability that the response variable will assume
the desired event for given values of the predictor
variables is predicted by:

$P(Y_i = 1 \mid X_j) = \hat\pi_i = \dfrac{e^{\hat\beta_0 + \hat\beta_1 X_{i1} + \cdots + \hat\beta_k X_{ik}}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_{i1} + \cdots + \hat\beta_k X_{ik}}}$

70
Example
• The aim is to study the relationship between coronary
heart disease (CHD) and risk factors (age and sex).
• We want to answer the research question:
 Is there an association between coronary heart disease
and age and sex?
• Absence of coronary heart disease is coded as 0 and
presence of coronary heart disease is coded as 1.
• Table 1.2 on the next slide presents the coronary
heart disease (CHD) status of 33 individuals with
their respective age and sex.

71
Cont…
• Table 1.2. Status of coronary heart disease for 33
individuals with their respective age and sex (male = 1,
female = 0)

CHD  Age  Sex   CHD  Age  Sex   CHD  Age  Sex
  0   22    1     0   40    1     0   54    0
  0   23    0     1   41    1     1   55    1
  0   24    1     0   46    0     1   58    1
  0   27    0     0   47    0     1   60    1
  0   28    1     0   48    0     0   60    0
  0   30    0     1   49    0     1   62    1
  0   30    1     0   49    1     1   65    1
  0   32    0     1   50    0     1   67    1
  0   33    0     0   51    0     1   71    1
  1   35    1     1   51    1     1   77    0
  0   38    0     0   52    0     1   81    1

72
Cont…
To facilitate modelling, we categorize age into two
categories (<= 50 years and > 50 years):
Variable descriptions:
• Response variable:
 CHD: 1 = presence / 0 = absence
• Independent variables:
 Age category:
 agecateg: 1 = <= 50 years (reference category), 2 = > 50 years
 sex: 0 = female (reference category), 1 = male
73
Cont…
• The data with age categorized into two groups:

CHD  Age  Age cat.  Sex   CHD  Age  Age cat.  Sex   CHD  Age  Age cat.  Sex
  0   22      1      1      0   40      1      1      0   54      2      0
  0   23      1      0      1   41      1      1      1   55      2      1
  0   24      1      1      0   46      1      0      1   58      2      1
  0   27      1      0      0   47      1      0      1   60      2      1
  0   28      1      1      0   48      1      0      0   60      2      0
  0   30      1      0      1   49      1      0      1   62      2      1
  0   30      1      1      0   49      1      1      1   65      2      1
  0   32      1      0      1   50      1      0      1   67      2      1
  0   33      1      0      0   51      2      0      1   71      2      1
  1   35      1      1      1   51      2      1      1   77      2      0
  0   38      1      0      0   52      2      0      1   81      2      1
74
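As a cross-check on the SPSS output that follows, here is a minimal statsmodels sketch fitting this model; this sketch assumes that coding age > 50 as 1 (age <= 50 as 0) and male sex as 1 reproduces SPSS's reference coding.

```python
import numpy as np
import statsmodels.api as sm

# (CHD, age, sex) triples transcribed from Table 1.2
data = [(0,22,1),(0,23,0),(0,24,1),(0,27,0),(0,28,1),(0,30,0),(0,30,1),
        (0,32,0),(0,33,0),(1,35,1),(0,38,0),(0,40,1),(1,41,1),(0,46,0),
        (0,47,0),(0,48,0),(1,49,0),(0,49,1),(1,50,0),(0,51,0),(1,51,1),
        (0,52,0),(0,54,0),(1,55,1),(1,58,1),(1,60,1),(0,60,0),(1,62,1),
        (1,65,1),(1,67,1),(1,71,1),(1,77,0),(1,81,1)]
arr = np.array(data)
y = arr[:, 0]
agecat = (arr[:, 1] > 50).astype(int)   # 1 if age > 50, else 0 (reference)
sex = arr[:, 2]                         # 1 = male, 0 = female (reference)

X = sm.add_constant(np.column_stack([agecat, sex]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)                # expected roughly: -2.536, 2.288, 2.126
print(np.exp(fit.params[1:]))    # odds ratios ~9.86 and ~8.38
```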
SPSS output of logistic regression analysis:
• Table 1.3. Classification table

Classification Table a,b
                                               Predicted
                                     Sign of coronary heart disease   Percentage
Observed                             absence of CHD  presence of CHD     Correct
Step 0  Sign of coronary   absence of CHD    19              0            100.0
        heart disease      presence of CHD   14              0               .0
        Overall Percentage                                                 57.6
a. Constant is included in the model.
b. The cut value is .500

• Since more than 50% of the people in the sample did not
develop CHD, the best prediction for each case (if we have no
additional information) is that they did not develop CHD.
• We would be correct for 57.6% of the cases, because
19/33 (57.6%) actually did not develop CHD.
75
Cont…
Null model (constant only)

Variables in the Equation
                     B     S.E.   Wald   df   Sig.   Exp(B)
Step 0  Constant  -.305   .352    .752    1   .386    .737

• P(CHD present) = 14/33 = 0.424,
• P(CHD absent) = 19/33 = 0.576, then
• Odds of CHD = 0.424/0.576 = 0.737 (odds of developing
CHD).
• B = ln(odds of CHD) = ln(0.737) = -0.305
• EXP(B) = EXP(-0.305) = 0.737
76
Log-Likelihood ratio test
• The statistic measures the amount by which -2LL (the deviance,
analogous to the residual sum of squares in linear regression)
is reduced using the full model compared to the null model.
• The Omnibus Tests of Model Coefficients table is used to check
whether the full model (with all explanatory variables
included) is an improvement over the null model.
• It uses chi-square tests to see if there is a significant
difference between the log-likelihoods (-2LLs) of the null
model and the full model.
• If the full model has a significantly reduced -2LL
compared to the null model, this suggests that the full
model explains more of the variance in the response
variable (CHD) than the null model.
77
Cont…
• Here, the chi-square is highly significant (chi-
square = 14.557, df = 2, p = 0.001).
• So our full model (the model with both independent
variables: age and sex) is significantly better than the null
model (the model without independent variables).

Omnibus Tests of Model Coefficients
                Chi-square   df   Sig.
Step 1  Step        14.557    2   .001
        Block       14.557    2   .001
        Model       14.557    2   .001

78
Cont…
Under Model Summary of the SPSS output,
• the -2 Log Likelihood statistic for the full model is 30.43.
• Although SPSS does not give us this statistic for the null
model (the model that had only the intercept), from the
Omnibus Tests of Model Coefficients output, we know
that the -2LL of the null model is reduced by 14.557.
• Since the -2LL of the full model is 30.43, the -2LL of the null
model equals 44.987 (30.43 + 14.557).
• Adding age and sex reduced the -2 Log Likelihood
statistic by 14.557 (44.987 - 30.43).

79
Cont…
• The reduction is significant (Chi-square = 14.557,
df = 2; p-value = 0.001).
• df = number of parameters estimated in the full
model (constant, age, sex) minus number of
parameters estimated in the reduced model
(constant only). Thus, df = 3 - 1 = 2.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      30.430 a                  .357                   .479
a. Estimation terminated at iteration number 5 because
parameter estimates changed by less than .001.

80
Cont…
• The pseudo-R² values tell us approximately how much of the
variation in the outcome is explained by the model.
• The Nagelkerke R² (0.479) suggests that the model
explains roughly 47.9% of the variation in the outcome
variable.
• The Hosmer & Lemeshow goodness-of-fit test is
non-significant (Chi-square = 5.172, df = 2, p-value = 0.075),
suggesting that the model is a good fit to the data.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      5.172         2   .075

81
Cont…
• Another useful output is the Classification Table for the full model.
• The model that includes the explanatory variables (age and sex)
correctly classifies the outcome variable for 84.8% of the cases,
compared to 57.6% correct classification of the outcome variable
by the null model.
• The full model shows a marked improvement over the null model.

Classification Table a
                                               Predicted
                                     Sign of coronary heart disease   Percentage
Observed                             absence of CHD  presence of CHD     Correct
Step 1  Sign of coronary   absence of CHD    19              0            100.0
        heart disease      presence of CHD    5              9             64.3
        Overall Percentage                                                 84.8
a. The cut value is .500
82
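To see how this table is produced from the fitted probabilities, here is a minimal sketch continuing the statsmodels fit above (the .500 cut value matches the table footnote):

```python
import numpy as np

p_hat = fit.predict()                  # fitted probabilities from the model above
pred = (p_hat >= 0.5).astype(int)      # classify using the .500 cut value

accuracy = (pred == y).mean()          # overall percentage correct, ~0.848
sens = pred[y == 1].mean()             # fraction of CHD cases detected, ~0.643
spec = 1 - pred[y == 0].mean()         # fraction of non-cases detected, ~1.0
print(accuracy, sens, spec)
```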
Cont…
• The output in the Variables in the Equation table provides
the regression coefficients (B), the Wald statistic (to test the
statistical significance) and
• the Odds Ratio (Exp(B)) for each variable category.
• Logistic regression coefficients:

Variables in the Equation
                                                              95% C.I. for EXP(B)
                          B      S.E.   Wald   df   Sig.   Exp(B)   Lower    Upper
Step 1a   agecateg(1)   2.288    .938   5.956   1   .015    9.858   1.569   61.932
          sex(1)        2.126    .953   4.975   1   .026    8.383   1.294   54.304
          Constant     -2.536    .919   7.605   1   .006     .079
a. Variable(s) entered on step 1: agecateg, sex.

83
Cont…
• Looking first at the results for agecateg(1), there is a
significant overall effect of age category (B = 2.288, SE = 0.938,
Wald = 5.956, df = 1, p = 0.015).
• The B coefficient for agecateg(1) is significant and positive,
indicating that the higher age category is associated with
increased odds of developing CHD.
• The Exp(B) column (the Odds Ratio) tells us that the
odds of developing CHD for an individual aged over
50 years were 9.86 times the odds for an
individual aged 50 years or below (our
reference category), controlling for sex (AOR = 9.86; 95%
CI: [1.57, 61.93]).
84
Cont…
• The effect of sex is also significant and positive
(B = 2.126, SE = 0.953, Wald = 4.975, df = 1,
p = 0.026), indicating that men are more likely to
develop CHD than women.
• The OR estimate shows that the odds of
developing CHD for a male were 8.38 times the
odds for a female, adjusting for age (AOR = 8.38; 95%
CI: [1.29, 54.30]).

85
Summary
Logistic regression equation:
Log(odds of CHD) = -2.536 + 2.288·age(>50 years) + 2.126·sex(male)
• Both independent variables (age and sex) were significant predictors of
CHD (age: B = 2.29, SE = 0.94, Wald = 5.956, df = 1, p-value = 0.015; sex:
B = 2.13, SE = 0.95, Wald = 4.975, df = 1, p-value = 0.026).
• The odds of developing coronary heart disease for an individual in the
age category above 50 years are 9.9 times the odds for an
individual in the age category of 50 years or below, controlling for sex
(AOR = 9.86; 95% CI: [1.57, 61.93]).
• The odds of developing coronary heart disease for a male were 8.4 times
the odds for a female, controlling for age (AOR = 8.38; 95% CI: [1.29, 54.30]).
• In conclusion,
 age above 50 years and male sex were statistically significant risk
factors for CHD.

86
