
JTMS-03 Applied Statistics with R

Binary logistic regression

Dr. Georgi Dragolov

10.05.2021

Jacobs University Bremen


Regression for limited dependent variables

• Regression techniques for limited (categorical) dependent outcomes


– Nominal scale
• Binary (e.g. event occurs, event does not occur) → Binary logistic regression
• Multinomial (more than two categories that cannot be ordered in a meaningful way: e.g. students’ choice among several majors, political party preferences) → Multinomial logistic regression
– Ordinal scale
• More than two ordered categories → Ordered logistic regression

10.05.2021 ASwR 2
Probability, odds, odds ratio

• Probability
– the likelihood that an event occurs, given all possible outcomes

• Odds
– a function of the probability of a favored outcome, expressed as
the ratio of the probability that it will happen to the probability
that it will not happen

• Odds ratio
– The ratio of two odds. Generally, the ratio of the odds of outcome
A in the presence of condition B to the odds of outcome A in the
absence of condition B.


• Example 1: Surviving Titanic by sex


                       Survived
                   No           Yes          Total
Sex   Man      n11 = 1364   n12 = 367    n1. = 1731
      Woman    n21 = 126    n22 = 344    n2. = 470
      Total    n.1 = 1490   n.2 = 711    N   = 2201

The cross-tabulation above shows the number of Titanic survivors with respect to their biological sex.


– Probability of survival for any passenger:

P(survival) = n.2 / N = 711 / 2201 = 0.323

The probability of survival was 0.323, or 32.3 %.
The “risk” of survival was 0.323.
32.3 % of all passengers survived.
The proportion of survivors was 0.323.


– Probability of survival for men:

P(survival | man) = n12 / n1. = 367 / 1731 = 0.212

– Probability of survival for women:

P(survival | woman) = n22 / n2. = 344 / 470 = 0.732


– Odds of survival for any passenger:

odds(survival) = n.2 / n.1 = 711 / 1490 = 0.477

The odds of survival are about 1 to 2: one survives, whereas 2 do not.

– Odds of survival for men:

odds(survival | man) = n12 / n11 = 367 / 1364 = 0.269

– Odds of survival for women:

odds(survival | woman) = n22 / n21 = 344 / 126 = 2.730


– Odds ratio of survival for women:

OR = odds(survival | woman) / odds(survival | man) = 2.730 / 0.269 ≈ 10.15

The odds of women to have survived the Titanic were about ten times higher than the odds of men.
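These figures can be reproduced in R directly from the cell counts of the cross-tabulation (a minimal sketch using only base R):

```r
# Cell counts from the Titanic cross-tabulation
tab <- matrix(c(1364, 367,
                 126, 344),
              nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Man", "Woman"),
                              Survived = c("No", "Yes")))

p_all  <- sum(tab[, "Yes"]) / sum(tab)              # 711 / 2201
odds_m <- tab["Man", "Yes"] / tab["Man", "No"]      # 367 / 1364
odds_w <- tab["Woman", "Yes"] / tab["Woman", "No"]  # 344 / 126
or_w   <- odds_w / odds_m                           # odds ratio: women vs men

round(c(probability = p_all, odds_men = odds_m,
        odds_women = odds_w, OR = or_w), 3)
# probability 0.323, odds_men 0.269, odds_women 2.730, OR 10.147
```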

Binary logistic regression

• Dependent variable Y: dichotomous


• Predictors X: continuous and/or categorical
• Logic of the method
– the original binary dependent outcome Y (event occurs or not) is
transformed into a probability of occurrence on the basis of a
linear prediction that applies a so-called link function
– the linear prediction produces a latent (unobserved) variable (Ŷ)
that underlies the manifest (observed) binary measurement
Schematically: Latent Ŷ → Probability of Y → Binary Y


• Link function: logit

– based on the standard logistic function (see next slide)
– Latent Ŷ as log-odds: Ŷ = ln( P(Y) / (1 − P(Y)) ) = b0 + b1·X1 + … + bn·Xn
– Probability of Y: P(Y) = 1 / (1 + e^(−Ŷ))

Standard logistic function

• Standard Logistic function


– S-shaped (sigmoid): σ(t) = 1 / (1 + e^(−t))
– takes any real number t and outputs values between 0 and 1
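In R the standard logistic function can be written by hand or taken from base R as plogis():

```r
sigma <- function(t) 1 / (1 + exp(-t))  # standard logistic function

sigma(0)                                # 0.5: midpoint of the S-curve
round(c(sigma(-10), sigma(10)), 4)      # ~0 and ~1 at the extremes
isTRUE(all.equal(sigma(2), plogis(2)))  # TRUE: matches base R
```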


• Link function: probit

– based on the cumulative distribution function of the normal distribution (see next slide)
– Latent Ŷ as a z-score: Ŷ = b0 + b1·X1 + … + bn·Xn
– Probability of Y: P(Y) = Φ(Ŷ)

Cumulative normal distribution function

• For any value x from a distribution with a mean μ and a standard deviation σ, Φ ∈ [0, 1]

• In fact, a z-score is precisely: z = (x − μ) / σ

[Figure: S-shaped cumulative normal distribution function, mapping z-scores (−4 to 4) to probabilities between 0 and 1.]
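Φ is available in base R as pnorm(); the μ and σ below are hypothetical illustration values:

```r
pnorm(0)       # 0.5: half the distribution lies below the mean
pnorm(1.96)    # ~0.975
pnorm(-1.96)   # ~0.025

# z-score by hand for x = 130 from a distribution with mu = 100, sigma = 15
z <- (130 - 100) / 15   # 2
pnorm(z)                # ~0.977: probability of a value below x
```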


• Logit or Probit?
– estimated regression coefficients (constant and b’s) and their
significance are not identical
– In substantive terms the conclusions reached will be similar,
particularly with sufficiently large sample sizes
– Mostly a matter of choice and discipline
• Logit is more popular in the health sciences
• Probit is preferred in econometrics and political science
– Logit has the advantage of transforming the regression
coefficients from the linear prediction to odds ratios via
exponentiation
– Regression coefficients as estimated with the probit link cannot
be transformed to odds ratios


• Parameter estimation
– the goal is to find the parameters b that best fit:

Y = 1 if b0 + b1·X1 + … + bn·Xn + ε > 0
Y = 0 else

where ε is an error following:
• the standard logistic distribution (logit link), or
• the standard normal distribution (probit link).

– This is done using maximum likelihood estimation (MLE)
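The logic can be sketched by hand: write the log-likelihood of a logit model and let a general-purpose optimiser find the b's that maximize it; the result essentially coincides with glm(). The data below are simulated purely for illustration.

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))  # true parameters: -0.5, 1.2

# negative log-likelihood of the logit model (optim() minimises)
negLL <- function(b) {
  p <- plogis(b[1] + b[2] * x)               # predicted probabilities
  -sum(y * log(p) + (1 - y) * log(1 - p))
}

fit <- optim(c(0, 0), negLL)
round(fit$par, 2)                                       # ML estimates
round(unname(coef(glm(y ~ x, family = binomial))), 2)   # very close, via glm()
```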

Maximum likelihood

• Maximum likelihood (ML) is an iterative parameter-estimation method that seeks the parameter values under which the observed data are most likely, given the specified model.
• Here we will not delve into the formal technicalities of ML.
• The logic behind the estimation method can be illustrated using the
metaphor of a tourist aiming to reach the highest peak of a
mountain.


• Log-likelihood
– Recall that linear regression minimizes the squared differences
between the observed and the predicted values (OLS). The
smaller these differences (residuals), the closer the predictions
to the actual observations, hence also the better the model fit,
and the higher the amount of explained variation (R2) in the
dependent variable by the predictors.
– In logistic regression, the analogous measure to the sum of squared residuals from linear regression is the log-likelihood (LL). It compares the observed outcomes with the predicted probabilities of the outcome:

LL = Σi [ Yi · ln(P(Yi)) + (1 − Yi) · ln(1 − P(Yi)) ]

– LL is negative; values further from zero (larger in magnitude) indicate a more poorly fitting model


• Log-likelihood
– In order to assess model fit, we compare the LL of the baseline model (only constant, no predictors) to that of the specified one:

χ² = 2 · [ LL(specified) − LL(baseline) ]

– The resulting difference between the LLs of the two models follows the chi-square (χ²) distribution
– The degrees of freedom (df) are based on the number of parameters k to be estimated:
• Baseline model: k = 1 (only the constant)
• Specified model: k = 1 + number of predictors

• Pseudo-R2
– As there is no ‘genuine’ analogue to R2 in logistic regression, we have to use so-called pseudo-R2 measures, e.g.:
• Hosmer and Lemeshow’s (R2L)

• Cox and Snell’s (R2CS)

• Nagelkerke’s (R2N)

– There is no consensus as to which is ‘the’ accurate one



• Testing predictors’ (X) significance

– Null hypothesis: Independent variable X has no effect on the


probability of occurrence of Y, i.e. Y is independent of X
– using a Wald test:

Wald = b / SEb

where b is the regression coefficient and SEb is its standard error.

– SPSS assesses the significance of the Wald statistic in its squared form (Wald²), thereby assuming that it follows the chi-square distribution
– R and Stata assume the Wald statistic to be z-distributed
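As a quick illustration with made-up numbers (not from a fitted model):

```r
b   <- 0.150   # hypothetical coefficient
SEb <- 0.063   # hypothetical standard error
z   <- b / SEb             # Wald statistic, treated as z-distributed in R
p   <- 2 * pnorm(-abs(z))  # two-sided p-value
round(c(z = z, p = p), 3)  # z = 2.381, p = 0.017
```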

• Interpretation of regression coefficient bn


– If bn > 0: all other predictors constant, higher values of predictor Xn increase the probability that Y occurs.

– If bn < 0: all other predictors constant, higher values of predictor Xn decrease the probability that Y occurs.


• Odds
– Note that this applies only to the logit link
– Probability that the event occurs divided by the probability that it does not occur:

odds(Y) = P(Y) / (1 − P(Y))

– If we plug in the linear prediction and rearrange:

odds(Y) = e^(b0 + b1·X1 + … + bn·Xn)


• Odds ratio (OR)

– Change in the odds resulting from a unit change (x+1) in the predictor:

OR = odds(X + 1) / odds(X) = e^b

– For a one-unit increase in X, all other predictors constant, the odds of Y change by a factor of exp(b), i.e. e^b
– If OR = 1: equal odds, no relationship between predictor and outcome
– If OR > 1: a one-unit increase in X raises the odds of Y by a factor of OR
– If OR < 1: a one-unit increase in X lowers the odds of Y by a factor of OR
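For instance, for an assumed coefficient b = 0.15:

```r
b  <- 0.15
OR <- exp(b)              # odds ratio per one-unit increase in X
round(OR, 2)              # 1.16
round((OR - 1) * 100, 1)  # 16.2: percentage change in the odds
```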


• Example 2: Hours of studying and passing an exam


We have data from 20 students on the hours they spent preparing for
an exam (variable hrsstudy in the dataset) and whether they passed it
or not (variable pass).

Hours   5     7.5   10    12.5  15    17.5  17.5  20    22.5  25
Pass    0     0     0     0     0     0     1     0     1     0

Hours   27.5  30    32.5  35    40    42.5  45    47.5  50    55
Pass    1     0     1     0     1     1     1     1     1     1

Test the hypothesis that studying more hours increases the probability
of passing the exam.
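The 20 observations can be entered directly in R (variable names as described for the dataset):

```r
hrsstudy <- c(5, 7.5, 10, 12.5, 15, 17.5, 17.5, 20, 22.5, 25,
              27.5, 30, 32.5, 35, 40, 42.5, 45, 47.5, 50, 55)
pass     <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
              1, 0, 1, 0, 1, 1, 1, 1, 1, 1)

length(hrsstudy)   # 20 students
sum(pass)          # 10 passed
```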

Linear regression is not appropriate here: its core assumptions are violated, as the dependent variable is binary rather than continuous and normally distributed.

[Figure: scatter plot of the observed pass/fail outcomes (0/1) against hours of studying, with a fitted linear regression line.]

The scatter plot shows that linear regression will result in considerable over- and underpredictions.


The appropriate method is binary logistic regression. A logistic curve
fitted to the data looks like:
[Figure: S-shaped logistic curve fitted to the observed pass/fail data; the predicted probability of passing rises from near 0 for few hours of studying to near 1 for many hours.]


– Binary logistic regression model with a logit link in R

ex2.logit <- glm(pass ~ hrsstudy,
                 family = binomial(link = "logit"))

– Accessor function:
summary() model summary

– Some important elements:


model$coefficients vector with model coefficients
model$fitted.values vector with predicted probabilities



– Evidence from R: logit link
The summary presents, among others, the regression estimates. These
include the regression coefficients from the linear prediction, their
standard errors, and the corresponding Wald test of their significance.

The relationship between the hours of studying and the probability of passing the exam is positive and significant: b = 0.150, p = .017 (two-sided). A one-unit increase in the independent variable (i.e. one more hour of studying) thus significantly increases the probability of passing the exam. The data support the tested one-tailed hypothesis. Yet, is the effect strong or weak?


– Evidence from R: logit link
Substantive information on the strength of the effect of the predictor is
offered by its odds ratio (exponentiated regression coefficient).

The odds ratio (OR = 1.16) indicates that a one-unit increase in the predictor (one more hour of studying) increases the odds of passing the exam by a factor of 1.16. Equivalently: one additional hour of studying increases the odds of passing the exam 1.16 times, or by 16 % ((1.16 − 1) · 100).
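The odds ratio is simply the exponentiated coefficient. A self-contained sketch, refitting the model on the 20 observations listed earlier:

```r
hrsstudy <- c(5, 7.5, 10, 12.5, 15, 17.5, 17.5, 20, 22.5, 25,
              27.5, 30, 32.5, 35, 40, 42.5, 45, 47.5, 50, 55)
pass     <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
              1, 0, 1, 0, 1, 1, 1, 1, 1, 1)
ex2.logit <- glm(pass ~ hrsstudy, family = binomial(link = "logit"))

round(coef(ex2.logit), 3)                  # b for hrsstudy ~ 0.150
round(exp(coef(ex2.logit)), 2)             # odds ratio ~ 1.16 for hrsstudy
round(exp(confint.default(ex2.logit)), 2)  # Wald CI on the OR scale
```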



– Evidence from R: logit link

The model summary further displays the deviances (-2*log-likelihood)


of the baseline (null) model and of the specified (residual) model.
The baseline model includes only the constant. When no further
information is available, this model would be the ‘best guess’ about the
probability of the outcome.
An overall test of model fit can be performed by contrasting the
deviance of the specified (residual) model to that of the baseline (null)
model. The resulting test statistic is the difference of the two deviances.
It is chi-square distributed with degrees of freedom equal to the
difference between the degrees of freedom of both models.


– Evidence from R: logit link
dev.chisq <- with(ex2.logit, null.deviance - deviance)
dev.df <- with(ex2.logit, df.null - df.residual)
dev.p <- with(ex2.logit, pchisq(null.deviance - deviance, df.null - df.residual,
lower.tail= FALSE))
print(c(round(dev.chisq, 3), dev.df, round(dev.p, 3)))
## 11.666 1.000 0.001

The overall test of model fit, i.e. the comparison between the deviances
of the specified (residual) and the baseline (null) models, informs that
the specified model fits the data significantly better than the baseline
model: 2(1) = 11.67, p ≤ .01.
In other words, the included predictor explains a significant share of the
differences (passing the exam or not) among the 20 students.


– Evidence from R: logit link
library(DescTools)
PseudoR2(ex2.logit, which= c("CoxSnell", "Nagelkerke", "McFadden"))
## CoxSnell Nagelkerke McFadden
## 0.4419499 0.5892665 0.4207667

The share of explained variation in the dependent variable is indicated


by the reported pseudo-R2 measures. It ranges from 42 % (according to
McFadden’s approach) up to 59 % (according to Nagelkerke’s R2).
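McFadden's pseudo-R2 can also be computed by hand from the two deviances in the model summary, which shows where the ~42 % figure comes from (model refitted on the data listed earlier):

```r
hrsstudy <- c(5, 7.5, 10, 12.5, 15, 17.5, 17.5, 20, 22.5, 25,
              27.5, 30, 32.5, 35, 40, 42.5, 45, 47.5, 50, 55)
pass     <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
              1, 0, 1, 0, 1, 1, 1, 1, 1, 1)
m <- glm(pass ~ hrsstudy, family = binomial(link = "logit"))

# McFadden: 1 - deviance(model) / deviance(null model)
r2_mcfadden <- 1 - m$deviance / m$null.deviance
round(r2_mcfadden, 3)   # ~0.421
```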



– Binary logistic regression model with a probit link in R

ex2.probit <- glm(pass ~ hrsstudy,
                  family = binomial(link = "probit"))

– Accessor function:
summary() model summary

– Some important elements:


model$coefficients vector with model coefficients
model$fitted.values vector with predicted probabilities



– Evidence from R: probit link

The regression coefficients, as estimated with the probit link, are


different from those of the logit model, but lead to the same conclusion.
There is a positive and significant relationship between the hours of
studying and the probability of passing the exam: b = 0.091, p = 0.007
(two-sided). The tested one-tailed hypothesis can be supported with the
data.
One drawback of the probit link is that its regression coefficients from the linear prediction cannot be transformed into odds ratios.



– Evidence from R: probit link
dev.chisq <- with(ex2.probit, null.deviance - deviance)
dev.df <- with(ex2.probit, df.null - df.residual)
dev.p <- with(ex2.probit, pchisq(null.deviance - deviance, df.null - df.residual,
lower.tail= FALSE))

print(c(round(dev.chisq, 3), dev.df, round(dev.p, 3)))


## 11.930 1.000 0.001

The overall test of model fit informs that the specified model fits the data significantly better than the baseline model: χ²(1) = 11.93, p ≤ .01.
The included predictor explains a significant share of the differences
(passing the exam or not) among the 20 students.
The estimates of the probit model differ slightly from those of the logit
model, but – in essence – lead to the same conclusion.


– Evidence from R: probit link
library(DescTools)
PseudoR2(ex2.probit, which= c("CoxSnell", "Nagelkerke", "McFadden"))
## CoxSnell Nagelkerke McFadden
## 0.4492756 0.5990342 0.430298

The share of explained variation in the dependent variable is indicated


by the reported pseudo-R2 measures. It ranges from 43 % (according to
McFadden’s approach) up to 60 % (according to Nagelkerke’s R2).

As already mentioned, there is no agreement among statisticians and


practitioners as to which of them is the ‘best’.



– Logit vs. probit estimates
                     Logit             Probit
                     b       p         b       p
Hours of studying    0.15    0.017     0.09    0.007
Constant            -4.08    0.021    -2.47    0.009

– Linear prediction of latent Ŷ


• Logit: Ŷ = -4.08 + 0.15 * Hours of studying
• Probit: Ŷ = -2.47 + 0.09 * Hours of studying



– Probability of passing the exam at given hours of studying

Hours of    Logit              Probit
studying    Latent Ŷ    P      Latent Ŷ    P
10          -2.57       0.07   -1.56       0.06
20          -1.07       0.26   -0.65       0.26
30           0.44       0.61    0.27       0.60
40           1.94       0.87    1.18       0.88
50           3.45       0.97    2.09       0.98
60           4.95       0.99    3.00       0.99

Logit: P(Y) = 1 / (1 + exp(−(−4.08 + 0.15 · Hours)))
Probit: P(Y) = Φ(Ŷ), the cumulative normal probability of the latent Ŷ (a z-score)
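The two P columns can be reproduced with plogis() and pnorm(), the inverse link functions, applied to the latent scores from the table (small rounding differences to the table are possible, since the latent scores are rounded to two decimals):

```r
yhat_logit  <- c(-2.57, -1.07, 0.44, 1.94, 3.45, 4.95)  # logit latent scores
yhat_probit <- c(-1.56, -0.65, 0.27, 1.18, 2.09, 3.00)  # probit latent scores

round(plogis(yhat_logit), 2)  # probabilities under the logit link
round(pnorm(yhat_probit), 2)  # probabilities under the probit link
```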



• Logit vs. Probit


As exemplified by the comparison of the estimated probabilities with the
logit and probit links, the two methods lead to substantively similar
conclusions.

• Multiple predictors possible


Example 2 demonstrated the logic and application of binary logistic
regression with the logit and probit links using only one predictor. Like
multiple linear regression, binary logistic regression can accommodate
multiple categorical and continuous predictors. Categorical predictors
should be entered as dummy variables.
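A multiple-predictor call might look like this; the data and variable names are invented for illustration, and the factor() predictor is automatically expanded into dummies by R:

```r
set.seed(42)
# hypothetical data: passing by hours studied and seminar attendance
d <- data.frame(
  hrsstudy = runif(100, 0, 60),
  attend   = factor(sample(c("low", "high"), 100, replace = TRUE))
)
d$pass <- rbinom(100, 1,
                 plogis(-3 + 0.1 * d$hrsstudy + 1.0 * (d$attend == "high")))

m <- glm(pass ~ hrsstudy + attend, data = d,
         family = binomial(link = "logit"))
summary(m)$coefficients  # Wald tests per predictor
exp(coef(m))             # odds ratios, incl. the dummy for attend
```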

