Binary Logistic Regression: JTMS-03 Applied Statistics With R
10.05.2021
Probability, odds, odds ratio
• Probability
– how likely an event is to occur, relative to all possible outcomes
• Odds
– a function of the probability of a favored outcome, expressed as
the ratio of the probability that it will happen to the probability
that it will not happen
• Odds ratio
– The ratio of two odds. Generally, the ratio of the odds of outcome
A in the presence of condition B to the odds of outcome A in the
absence of condition B.
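The three definitions above can be traced with a few lines of R. This is a minimal sketch: the helper `odds()` and both probabilities are hypothetical, chosen only to keep the arithmetic easy to follow.

```r
# Convert a probability into odds: p / (1 - p)
odds <- function(p) p / (1 - p)

# Hypothetical probabilities of outcome A (illustrative values only)
p_with_b    <- 0.75  # condition B present
p_without_b <- 0.50  # condition B absent

odds_with    <- odds(p_with_b)     # 0.75 / 0.25
odds_without <- odds(p_without_b)  # 0.50 / 0.50

# Odds ratio: odds of A with B divided by odds of A without B
odds_ratio <- odds_with / odds_without

print(c(odds_with = odds_with,
        odds_without = odds_without,
        odds_ratio = odds_ratio))
```

An odds ratio above 1 means the odds of A are higher when B is present; a value below 1 means they are lower.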
The odds of surviving the Titanic were about ten times higher for women
than for men.
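A statement of this kind can be checked against R's built-in `Titanic` table. This is a sketch under one assumption: that the built-in table is comparable to the data behind the slide, which the source does not state.

```r
# Built-in 4-way contingency table: Class x Sex x Age x Survived
# Collapse over Class and Age to get a Sex x Survived table
sex_by_survival <- margin.table(Titanic, margin = c(2, 4))

# Odds of survival = survivors / non-survivors, separately by sex
odds_female <- sex_by_survival["Female", "Yes"] / sex_by_survival["Female", "No"]
odds_male   <- sex_by_survival["Male", "Yes"]   / sex_by_survival["Male", "No"]

# Odds ratio: women relative to men
odds_ratio <- odds_female / odds_male

print(round(c(female = odds_female, male = odds_male, ratio = odds_ratio), 2))
```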
Binary logistic regression
Diagram: binary Y, latent Ŷ (as log-odds), probability of Y
Standard logistic function
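As a minimal sketch, the standard logistic function 1 / (1 + e^(-x)) can be written out and compared with R's built-in `plogis()`. The helper name `logistic` and the grid of input values are illustrative choices, not taken from the slides.

```r
# Standard logistic function: maps any real number (log-odds) to (0, 1)
logistic <- function(x) 1 / (1 + exp(-x))

x <- c(-4, -2, 0, 2, 4)   # illustrative log-odds values
p <- logistic(x)

print(round(p, 3))

# plogis() is R's built-in logistic CDF; qlogis() is its inverse (the logit)
print(all.equal(p, plogis(x)))
print(all.equal(qlogis(p), x))
```

A log-odds value of 0 corresponds to a probability of .5; the curve is symmetric around that point.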
Binary logistic regression
$\hat{Y} = b_0 + b_1 X_1 + \dots + b_n X_n$
Latent Ŷ as a z-score
Probability of Y: $P(Y) = \Phi(\hat{Y})$
Cumulative normal distribution function
Figure: probability of Y = Φ(Y') (0 to 1) plotted against the z-score (-4 to 4)
• In fact, a z-score is precisely:
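The mapping between z-scores and probabilities under the probit link can be sketched with `pnorm()` (the cumulative standard normal, Φ) and its inverse `qnorm()`. The grid of z-scores simply mirrors the axis range of the figure.

```r
# z-scores spanning the range shown on the figure's x-axis
z <- c(-4, -2, 0, 2, 4)

# Cumulative standard normal: probability associated with each z-score
p <- pnorm(z)
print(round(p, 4))

# qnorm() inverts the mapping: from probability back to z-score
z_back <- qnorm(p)
print(round(z_back, 4))
```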
Binary logistic regression
• Logit or Probit?
– The estimated regression coefficients (constant and b’s) and their
significance are not identical across the two links
– In substantive terms the conclusions reached will be similar,
particularly with sufficiently large sample sizes
– Mostly a matter of choice and discipline:
• Logit is more popular in health sciences
• Probit is preferred in econometrics and political science
– Logit has the advantage that its regression coefficients can be
transformed from the linear prediction to odds ratios via
exponentiation
– Regression coefficients estimated with the probit link cannot
be transformed to odds ratios
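Why exponentiation yields an odds ratio under the logit link, and only there, can be seen from a short derivation. The sketch below assumes a single predictor $X_1$ for readability; the notation follows the b’s used above.

```latex
\begin{aligned}
\text{Logit link:}\quad
  \ln\frac{P(Y)}{1 - P(Y)} &= b_0 + b_1 X_1
  \;\Longrightarrow\;
  \mathrm{odds}(Y) = e^{\,b_0 + b_1 X_1} \\[4pt]
\frac{\mathrm{odds}(Y \mid X_1 = x + 1)}{\mathrm{odds}(Y \mid X_1 = x)}
  &= \frac{e^{\,b_0 + b_1 (x + 1)}}{e^{\,b_0 + b_1 x}} = e^{\,b_1} \\[4pt]
\text{Probit link:}\quad
  \Phi^{-1}\bigl(P(Y)\bigr) &= b_0 + b_1 X_1
\end{aligned}
```

Under the logit link the ratio of odds for a one-unit increase is the constant $e^{b_1}$, whatever the starting value $x$. Under the probit link the odds are not an exponential function of the linear prediction, so no such constant ratio exists.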
Binary logistic regression
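• Parameter estimation
– goal is to find the parameters b that best fit:
\[ Y = \begin{cases} 1 & \text{if } b_0 + b_1 X_1 + \dots + b_n X_n + \varepsilon > 0 \\ 0 & \text{else} \end{cases} \]
where ε is an error following:
• the standard logistic distribution (logit link)
or
• the standard normal distribution (probit link).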
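The latent-variable formulation above can be illustrated by simulation: draw a standard logistic error, set Y to 1 whenever the latent value exceeds 0, and check that `glm()` with the logit link approximately recovers the parameters. All values here (`b0`, `b1`, the sample size, the seed, the distribution of `x`) are hypothetical.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

n  <- 5000    # hypothetical sample size
b0 <- -1      # hypothetical true constant
b1 <- 0.5     # hypothetical true slope
x  <- rnorm(n, mean = 2, sd = 2)

# Standard logistic error corresponds to the logit link
eps <- rlogis(n)

# Y = 1 if the latent value exceeds 0, else 0
y <- as.integer(b0 + b1 * x + eps > 0)

# Maximum likelihood estimates of b0 and b1
fit <- glm(y ~ x, family = binomial(link = "logit"))
print(coef(fit))
```

Replacing `rlogis(n)` with `rnorm(n)` and the link with `"probit"` would give the probit counterpart of the same sketch.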
Maximum likelihood
Binary logistic regression
• Log-likelihood
– Recall that linear regression minimizes the squared differences
between the observed and the predicted values (OLS). The
smaller these differences (residuals), the closer the predictions
to the actual observations, hence also the better the model fit,
and the higher the amount of variation in the dependent variable
explained by the predictors (R²).
– In logistic regression, the analogous measure to the sum of
squared residuals from linear regression is the log-likelihood (LL).
It compares the observed and predicted probabilities of the
outcome:
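The formula shown at this point did not survive extraction. The standard log-likelihood for a binary outcome, which is presumably what was displayed, reads as follows, with $Y_i$ the observed outcome (0 or 1) and $P(Y_i)$ the predicted probability for case $i$ out of $N$:

```latex
LL = \sum_{i=1}^{N} \Bigl[ Y_i \ln P(Y_i) + (1 - Y_i) \ln\bigl(1 - P(Y_i)\bigr) \Bigr]
```

Each case contributes the log of the probability the model assigned to what was actually observed. LL is therefore never positive, and values closer to 0 indicate a better fit.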
– In order to assess model fit, we compare the LL of the baseline
model (only constant, no predictors) to that of the specified one:
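The comparison formula itself is missing from the extracted text. A sketch of the usual likelihood ratio statistic, assumed here to match the slide, with $k$ denoting the number of estimated parameters in each model:

```latex
\begin{aligned}
\chi^2 &= 2\,\bigl[ LL(\text{specified}) - LL(\text{baseline}) \bigr]
        = \bigl(-2LL(\text{baseline})\bigr) - \bigl(-2LL(\text{specified})\bigr) \\
df &= k_{\text{specified}} - k_{\text{baseline}}
\end{aligned}
```

The quantity $-2LL$ is the deviance, so the statistic is the difference between the null and the residual deviance.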
Binary logistic regression
• Pseudo-R²
– As there is no ‘genuine’ analogue to R² in logistic regression, we
have to use so-called pseudo-R² measures, e.g.:
• Hosmer and Lemeshow’s ($R^2_L$)
• Nagelkerke’s ($R^2_N$)
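The definitions of the two measures are not preserved in the extracted text. Their commonly used forms are sketched below, where $n$ is the sample size and $R^2_{CS}$ (Cox and Snell) is an intermediate quantity introduced here only to define Nagelkerke's measure:

```latex
\begin{aligned}
R^2_L &= \frac{\bigl(-2LL(\text{baseline})\bigr) - \bigl(-2LL(\text{specified})\bigr)}
              {-2LL(\text{baseline})} \\[6pt]
R^2_{CS} &= 1 - \exp\!\left( \frac{\bigl(-2LL(\text{specified})\bigr) - \bigl(-2LL(\text{baseline})\bigr)}{n} \right) \\[6pt]
R^2_N &= \frac{R^2_{CS}}{1 - \exp\!\left( \dfrac{2LL(\text{baseline})}{n} \right)}
\end{aligned}
```

$R^2_L$ is the proportional reduction in deviance relative to the baseline model; $R^2_N$ rescales $R^2_{CS}$ so that its maximum is 1.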
– If $b_n < 0$:
Holding all other predictors constant, higher values of predictor $X_n$
decrease the probability that Y occurs.
Binary logistic regression
• Odds
– Note that this applies only to the logit link
– Probability that the event occurs divided by the probability that
it does not occur
\[ \mathrm{odds}(Y) = \frac{P(Y)}{1 - P(Y)} = e^{\hat{Y}} \]
Binary logistic regression
Test the hypothesis that studying more hours increases the probability
of passing the exam.
Binary logistic regression
Figure: probability of passing (y-axis, up to 1) against hours studying (x-axis, 0 to 60); series: observed values and linear fit
Binary logistic regression
Figure: x-axis hours studying (0 to 60); y-axis from 0 to .6
Binary logistic regression
– Accessor function:
summary(): model summary
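The model call itself is not preserved in the extracted text. A minimal sketch of how a logit model of this kind is typically fitted and summarized in R follows. The data are simulated and entirely hypothetical: the names `exam`, `hours`, and `passed`, the generating coefficients, and the sample size of 200 (larger than the 20 students of the example, to keep the simulated fit stable) are all assumptions.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical data: hours studied (0 to 60) and pass/fail outcome
hours  <- round(runif(200, min = 0, max = 60))
passed <- rbinom(200, size = 1, prob = plogis(-4 + 0.15 * hours))
exam   <- data.frame(hours, passed)

# Binary logistic regression with the logit link
logit_fit <- glm(passed ~ hours, data = exam,
                 family = binomial(link = "logit"))

# Accessor function: coefficients, standard errors, deviances
print(summary(logit_fit))

# Exponentiated coefficients are odds ratios (logit link only)
print(exp(coef(logit_fit)))
```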
Binary logistic regression
The odds ratio (OR = 1.16) indicates that a one-unit increase in the
predictor (one more hour of studying) changes the odds of passing the
exam by a factor of 1.16. This can also be stated as: one additional
hour of studying multiplies the odds of passing the exam by 1.16.
Another correct formulation is: one additional hour of studying
increases the odds of passing the exam by 16 % ((1.16 – 1) × 100).
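The arithmetic behind these formulations can be retraced in R. Only the odds ratio of 1.16 comes from the text; the ten-hour extension is an added illustration.

```r
or <- 1.16                      # odds ratio reported in the text

# Implied logit coefficient: the odds ratio is exp(b), so b = log(OR)
b <- log(or)

# Percentage change in the odds per additional hour
pct_change <- (or - 1) * 100

# Odds ratios multiply: ten additional hours scale the odds by OR^10
or_10_hours <- or^10

print(c(b = b, pct_change = pct_change, or_10_hours = or_10_hours))
```

Note that the percentage applies to the odds, not to the probability of passing.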
Binary logistic regression
The overall test of model fit, i.e. the comparison between the deviances
of the specified (residual) and the baseline (null) models, indicates that
the specified model fits the data significantly better than the baseline
model: χ²(1) = 11.67, p ≤ .01.
In other words, the included predictor explains a significant share of the
differences (passing the exam or not) among the 20 students.
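The p-value for the reported test statistic can be reproduced from the chi-square distribution. The sketch uses only the two numbers given in the text (statistic 11.67, 1 degree of freedom).

```r
chi_sq <- 11.67   # null deviance minus residual deviance, as reported
df     <- 1       # one predictor added to the baseline model

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
print(p_value)

# For a fitted glm object (hypothetical name `fit`) the same inputs would be:
#   chi_sq <- fit$null.deviance - fit$deviance
#   df     <- fit$df.null - fit$df.residual
```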
Binary logistic regression
– Accessor function:
summary(): model summary
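For the probit counterpart, only the link changes. The sketch below again uses simulated, hypothetical data: the names `exam`, `hours`, and `passed`, the generating coefficients, and the sample size of 200 are all assumptions, not values from the slides.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical data: hours studied (0 to 60) and pass/fail outcome
hours  <- round(runif(200, min = 0, max = 60))
passed <- rbinom(200, size = 1, prob = pnorm(-2.4 + 0.09 * hours))
exam   <- data.frame(hours, passed)

# Binary regression with the probit link
probit_fit <- glm(passed ~ hours, data = exam,
                  family = binomial(link = "probit"))
print(summary(probit_fit))

# The linear prediction is a z-score; pnorm() turns it into a probability
z_30 <- predict(probit_fit, newdata = data.frame(hours = 30), type = "link")
p_30 <- pnorm(z_30)
print(c(z = unname(z_30), probability = unname(p_30)))
```

Probit coefficients are read on the z-score scale, so `exp(coef(probit_fit))` would not give odds ratios.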
Binary logistic regression
The overall test of model fit indicates that the specified model fits the
data significantly better than the baseline model: χ²(1) = 11.93, p ≤ .01.
The included predictor explains a significant share of the differences
(passing the exam or not) among the 20 students.
The estimates of the probit model differ slightly from those of the logit
model but, in essence, lead to the same conclusion.