Logistic Regression 223110099
Introduction
• Logistic regression tells you how well a set of predictor variables predicts or explains a
categorical dependent variable.
• Logistic regression is used when y is categorical with only two outcomes, i.e.
dichotomous/binary.
• If you have only one x, it is called “simple” logistic regression, and if you have more than
one x, it is called “multiple” logistic regression.
• A logistic regression is thus based on the fact that the outcome has only two possible
values: 0 (no) or 1 (yes).
• Logistic regression is used to predict the “odds” of being a “case” (y = 1) based on the values of
the x-variable(s). The odds ratio is interpreted in the following way: “for every one-unit
increase in x, the odds of y = 1 are multiplied by [the odds ratio]”. The odds increase if the
odds ratio is above 1 and decrease if it is below 1.
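The relationship between odds, the model, and the odds ratio can be written compactly. This is a sketch in standard notation that the slides do not spell out: p is assumed to denote the probability that y = 1, and the β symbols are assumed names for the regression coefficients.

```latex
\text{odds} = \frac{p}{1-p}, \qquad
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \qquad
\mathrm{OR}_j = e^{\beta_j}
```

With k = 1 this is the “simple” model; with k > 1 it is the “multiple” model.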
Formula
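Odds ratio from a 2×2 table:

              Cancer   No cancer
Smoking         a         c
Not smoking     b         d

OR = (a/c) / (b/d) = (a × d) / (b × c)

Example:

              Cancer   No cancer
Smoking         40        20
Not smoking     10        30

OR = (40 × 30) / (20 × 10) = 1200 / 200 = 6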
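The calculation above can be sketched in a few lines of Python. The function name `odds_ratio` and its argument order are assumptions made for illustration; the counts are the ones from the smoking example.

```python
def odds_ratio(a, c, b, d):
    """Odds ratio for a 2x2 table.

    a = exposed with the outcome,   c = exposed without the outcome
    b = unexposed with the outcome, d = unexposed without the outcome
    """
    # (a/c) / (b/d) rewritten as a cross-product to avoid rounding noise
    return (a * d) / (b * c)


# Smoking and cancer counts from the example
or_smoking = odds_ratio(a=40, c=20, b=10, d=30)
print(or_smoking)  # 6.0
```

Read this as: the odds of cancer among smokers are six times the odds among non-smokers.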
Simple versus multiple regression models
• The difference between simple and multiple regression models is
that in a multiple regression each x-variable’s effect on y is estimated
while taking into account the other x-variables’ effects on y.
Diagram: drinking coffee (x) and lung cancer (y), with smoking (z) as the confounder.
A confounder is a variable (z) that influences both the x-variable and the y-variable and thus makes it look as if there is
an actual relationship between x and y when the association is in fact due to z.
In data analysis, we commonly want to remove confounding effects; in that context, we often talk about
“controlling” or “adjusting” for confounders.
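A small numeric sketch can make confounding concrete. The counts below are hypothetical, invented only for illustration, and stratifying by the confounder is used here as a simple stand-in for adjustment in a multiple regression. Coffee drinking appears related to lung cancer in the pooled table, but the relationship disappears once smokers and non-smokers are examined separately.

```python
def odds_ratio(a, c, b, d):
    """Cross-product odds ratio: rows are (outcome, no outcome)."""
    return (a * d) / (b * c)


# HYPOTHETICAL counts: (lung cancer, no lung cancer) per group
strata = {
    "smokers":     {"coffee": (30, 70), "no_coffee": (6, 14)},
    "non_smokers": {"coffee": (1, 19),  "no_coffee": (5, 95)},
}


def table_or(table):
    a, c = table["coffee"]
    b, d = table["no_coffee"]
    return odds_ratio(a, c, b, d)


# Crude table: ignore smoking and pool both strata
crude = {
    group: tuple(sum(strata[s][group][i] for s in strata) for i in range(2))
    for group in ("coffee", "no_coffee")
}

crude_or = table_or(crude)                                   # about 3.45
stratum_ors = {name: table_or(t) for name, t in strata.items()}  # both 1.0

print(round(crude_or, 2), stratum_ors)
```

In these made-up data, smokers drink more coffee and get lung cancer more often, which is exactly the pattern that produces a misleading crude odds ratio.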
Simple logistic regression
For every additional day of unemployment, the odds of dying increase by a factor of 1.67.
Output
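An odds ratio below 1 means the odds decrease: for every one-year increase in age, the odds of having an active lifestyle change by a factor of 0.972.
Alternatively, take the inverse (1/value) to describe a one-unit decrease in x: 1/0.972 ≈ 1.03.
For every one-year decrease in age, the odds of having an active lifestyle increase by a factor of 1.03.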
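The inversion of an odds ratio below 1 can be checked with a short sketch. The helper names are hypothetical, and the link between a coefficient and its odds ratio (OR = exp(coefficient)) is the standard one rather than something shown in the output itself.

```python
import math


def invert_odds_ratio(or_value):
    """Odds ratio for a one-unit DECREASE in x, given the OR for an increase."""
    return 1 / or_value


def percent_change_in_odds(or_value):
    """Percentage change in the odds per one-unit increase in x."""
    return (or_value - 1) * 100


or_age = 0.972                      # odds ratio per one-year increase in age
or_age_decrease = invert_odds_ratio(or_age)
beta_age = math.log(or_age)         # the underlying coefficient (negative)

print(round(or_age_decrease, 2))               # 1.03
print(round(percent_change_in_odds(or_age), 1))  # -2.8
```

Either wording describes the same association; the inverse is often easier to read when the odds ratio is below 1.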
Having small children increases the odds of having a pet by a factor of 1.49.
Families with small children had 1.49 times the odds of having a pet compared with families without small children.
People with an active lifestyle had 0.987 times the odds of being married compared with people who aren’t active.
Model diagnostics
• Goodness of fit: whether the estimated model (i.e. the model with one or
more x-variables) predicts the outcome better than the null model
(i.e. a model without any x-variables).
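One common way to compare an estimated model with the null model is a likelihood-ratio test. The slides do not name a specific test, so treat this as an assumed illustration. The sketch reuses the smoking and cancer counts from the odds ratio example and relies on the fact that, with a single binary x-variable, the fitted probabilities equal the observed proportions in each group.

```python
import math


def log_likelihood(groups):
    """Bernoulli log-likelihood when each group gets its own fitted probability.

    groups: list of (cases, non_cases); every group needs both counts > 0.
    """
    ll = 0.0
    for cases, non_cases in groups:
        p = cases / (cases + non_cases)
        ll += cases * math.log(p) + non_cases * math.log(1 - p)
    return ll


smokers, non_smokers = (40, 20), (10, 30)   # (cancer, no cancer)
pooled = (smokers[0] + non_smokers[0], smokers[1] + non_smokers[1])

ll_null = log_likelihood([pooled])                   # no x-variables
ll_model = log_likelihood([smokers, non_smokers])    # x = smoking

lr_stat = 2 * (ll_model - ll_null)
# Chi-square survival function with 1 degree of freedom
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(round(lr_stat, 2), p_value < 0.001)
```

A small p-value suggests that the model with smoking fits these data better than the null model.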
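• Specificity: the proportion of actual non-cases (y = 0) that the model classifies as non-cases.
• Sensitivity: the proportion of actual cases (y = 1) that the model classifies as cases.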
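Both measures come from cross-tabulating predicted against observed outcomes. The sketch below assumes a classifier that predicts "cancer" for every smoker and "no cancer" for every non-smoker, applied to the counts from the odds ratio example. That classification rule is an assumption made for illustration, not something taken from the slides.

```python
def sensitivity(tp, fn):
    """Share of actual cases that are predicted to be cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual non-cases that are predicted to be non-cases."""
    return tn / (tn + fp)


# Assumed rule: predict cancer if the person smokes
tp = 40  # smokers with cancer (correctly predicted cases)
fn = 10  # non-smokers with cancer (missed cases)
fp = 20  # smokers without cancer (false alarms)
tn = 30  # non-smokers without cancer (correctly predicted non-cases)

print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.6
```

Changing the classification rule or cutoff usually trades one measure against the other.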
Exercise
• Dependent variable = sleeping problem (yes/no)
• Independent variables = sex (male/female), age (continuous), caffeine
(yes/no)
Questions?
Twitter: @kbarbahar
Instagram: @akbarbahar
Email: [email protected]