Logistic Regression
Logistic Function
The function g(z) = 1 / (1 + e^(-z)) is the logistic function, also known
as the sigmoid function.
It has horizontal asymptotes at 0 and 1, and it crosses the y-axis at
g(0) = 0.5.
Figure: Logistic function.
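The behaviour described above can be checked numerically. The following is a minimal sketch; the helper name sigmoid and the sample inputs are chosen here for illustration and do not come from the slides.

```r
# Logistic (sigmoid) function: maps any real z into the open interval (0, 1).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Illustrative inputs: large negative, moderate, zero, and large positive.
z <- c(-10, -1, 0, 1, 10)
g <- sigmoid(z)
print(round(g, 4))

# Base R's plogis() computes the same function.
print(isTRUE(all.equal(g, plogis(z))))
```

Values approach 0 for large negative z and 1 for large positive z, and sigmoid(0) is exactly 0.5.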
The result is the logistic regression hypothesis:
\[
\operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 \tag{3}
\]
Therefore
\[
\pi = \text{Probability}(Y \mid X_1 = x_1, X_2 = x_2) = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2}} \tag{4}
\]
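The step from equation (3) to equation (4) can be spelled out as follows. The shorthand \eta for the linear predictor is introduced here only to keep the lines short; it is not notation from the slides.

```latex
\begin{align*}
\eta &= \alpha + \beta_1 X_1 + \beta_2 X_2 \\
\ln\left(\frac{\pi}{1-\pi}\right) = \eta
  &\;\Longrightarrow\; \frac{\pi}{1-\pi} = e^{\eta} \\
  &\;\Longrightarrow\; \pi = e^{\eta}\,(1-\pi) \\
  &\;\Longrightarrow\; \pi\,(1 + e^{\eta}) = e^{\eta} \\
  &\;\Longrightarrow\; \pi = \frac{e^{\eta}}{1 + e^{\eta}}
\end{align*}
```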
Predictor                    β        SE β     Wald's χ²   df   p        e^β (odds ratio)
Constant                     0.5340   0.8109   0.4337      1    0.5102   NA
Reading                     -0.0261   0.0122   4.5648      1    0.0326   0.9742
Gender (1=boys, 0=girls)     0.6477   0.3248   3.9759      1    0.0462   1.9111
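As a sketch of how the table's columns relate, the odds ratios can be recomputed from the β column, and equation (4) can be applied to the coefficients. The reading score of 60 below is a made-up illustrative value, and the helper name predict_prob is hypothetical.

```r
# Coefficients copied from the table above.
beta <- c(constant = 0.5340, reading = -0.0261, gender = 0.6477)

# Odds ratios are exp(beta); the table reports them for Reading and Gender.
odds_ratio <- exp(beta[c("reading", "gender")])
print(round(odds_ratio, 4))

# Predicted probability from equation (4) for given predictor values.
predict_prob <- function(reading, gender) {
  eta <- beta[["constant"]] + beta[["reading"]] * reading +
    beta[["gender"]] * gender
  exp(eta) / (1 + exp(eta))
}

# Reading score of 60 is an illustrative value, not taken from the source.
p_boy  <- predict_prob(reading = 60, gender = 1)
p_girl <- predict_prob(reading = 60, gender = 0)
print(c(boy = p_boy, girl = p_girl))
```

Because the Gender coefficient is positive, the predicted probability is higher for boys than for girls at the same reading score, and the ratio of their odds equals the Gender odds ratio.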
Question
A multiple regression analysis was used to assess effect modification
by relating the outcome of interest (Y) to independent variables
representing the treatment assignment, sex, and the product of the two
(the treatment-by-sex interaction variable). Here T is the treatment
assignment (1 = new drug, 0 = placebo), M is male gender (1 = yes,
0 = no), and TM (i.e., T × M) is the product of treatment and male
gender. The analysis gave the following results:
Independent Variable            Regression Coefficient   T       P-value
Intercept                       39.24                    65.89   0.0001
T (Treatment)                   -0.36                    -0.43   0.6711
M (Male Gender)                 -0.18                    -0.13   0.8991
TM (Treatment x Male Gender)     6.55                     3.37   0.0011
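Deduce the multiple regression model, and hence comment on the results.
Reading the coefficients from the table, the fitted model is:
𝑌̂ = 39.24 − 0.36𝑇 − 0.18𝑀 + 6.55(𝑇 × 𝑀)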
The expected HDL for men (M=1) assigned to the placebo (T=0) is:
𝑌̂ = 39.24 − 0.36(0) − 0.18(1) + 6.55(0)(1) = 39.06
Similarly, the expected HDL for women (M=0) assigned to the new
drug (T=1) is:
𝑌̂ = 39.24 − 0.36(1) − 0.18(0) + 6.55(1)(0) = 38.88
The expected HDL for women (M=0) assigned to the placebo (T=0)
is:
𝑌̂ = 39.24 − 0.36(0) − 0.18(0) + 6.55(0)(0) = 39.24
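The calculations above can be repeated for all four treatment-by-sex combinations, including men assigned to the new drug. The following is a minimal sketch using the coefficients from the regression table; the function and variable names are chosen here for illustration.

```r
# Coefficients copied from the regression table above.
b0 <- 39.24
b_t <- -0.36
b_m <- -0.18
b_tm <- 6.55

# Expected HDL from the model with a treatment-by-sex interaction term.
expected_hdl <- function(t, m) b0 + b_t * t + b_m * m + b_tm * t * m

# All four combinations of treatment (T) and male gender (M).
cells <- expand.grid(T = c(0, 1), M = c(0, 1))
cells$HDL <- expected_hdl(cells$T, cells$M)
print(cells)

# Treatment effect (new drug minus placebo) within each sex.
effect_women <- expected_hdl(1, 0) - expected_hdl(0, 0)
effect_men   <- expected_hdl(1, 1) - expected_hdl(0, 1)
print(c(women = effect_women, men = effect_men))
```

The estimated treatment effect is -0.36 for women and -0.36 + 6.55 = 6.19 for men. That difference between the sexes is the effect modification captured by the significant TM coefficient (P = 0.0011), while the T and M main effects on their own are not significant.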
Logistic Regression in R
The Dataset
mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10
aspects of automobile design and performance for 32 automobiles.
It ships with base R in the datasets package, so no extra package is
needed to load it.
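A quick look at the data, as a sketch: the response used later in this section is vs, which the R documentation for mtcars describes as engine shape (0 = V-shaped, 1 = straight).

```r
# mtcars is available in a default R session.
print(dim(mtcars))    # rows (cars) and columns (variables)
print(names(mtcars))  # variable names

# Class balance of the binary response vs.
vs_counts <- table(mtcars$vs)
print(vs_counts)
```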
R
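# Loading packages
library(dplyr)    # loaded in the original; not required by the code below
library(caTools)  # sample.split() for the train/test split
library(ROCR)     # ROC curve and AUC
# Splitting dataset into training and test sets
set.seed(123)  # arbitrary seed; the split is random, so results may differ from the output shown
split <- sample.split(mtcars$vs, SplitRatio = 0.8)
train_reg <- subset(mtcars, split == TRUE)
test_reg <- subset(mtcars, split == FALSE)
# Training model
logistic_model <- glm(vs ~ wt + disp,
                      data = train_reg,
                      family = "binomial")
logistic_model
# Summary
summary(logistic_model)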
Output:
Call:
glm(formula = vs ~ wt + disp, family = "binomial", data = train_reg)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6552  -0.4051   0.4446   0.6180   1.9191

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.58781    2.60087   0.610   0.5415
wt           1.36958    1.60524   0.853   0.3936
disp        -0.02969    0.01577  -1.882   0.0598 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Call: The function call used to fit the logistic regression model is
displayed, along with information on the family, formula, and data.
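The coefficient table in the summary can also be pulled out programmatically and turned into odds ratios. This is a minimal sketch that, as an assumption made here for self-containment, fits the model on the full mtcars data rather than on train_reg, so its numbers will differ from the output above.

```r
# Assumption: fit on the full built-in mtcars data, not the train/test split.
fit <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")

# Coefficient table: estimate, standard error, z value, p-value.
coef_table <- coef(summary(fit))
print(round(coef_table, 4))

# The z value is the estimate divided by its standard error.
z_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Exponentiated coefficients are odds ratios per one-unit increase.
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 4))
```

Applied to the output shown above, the disp estimate of -0.02969 corresponds to an odds ratio of exp(-0.02969), roughly 0.971, per unit increase in displacement.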
R
# Predicted probabilities for the test set
predict_reg <- predict(logistic_model,
                       newdata = test_reg, type = "response")
predict_reg
Output:
Hornet Sportabout         Merc 280C        Merc 450SE Chrysler Imperial
       0.01226166        0.78972164        0.26380531        0.01544309
      AMC Javelin        Camaro Z28    Ford Pantera L
       0.06104267        0.02807992        0.01107943
R
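# Changing probabilities to class labels at the 0.5 cutoff
predict_class <- ifelse(predict_reg > 0.5, 1, 0)
# ROC-AUC Curve (built from the probabilities, not the 0/1 labels,
# so that the cutoffs along the curve are meaningful)
ROCPred <- prediction(predict_reg, test_reg$vs)
ROCPer <- performance(ROCPred, measure = "tpr",
                      x.measure = "fpr")
# Area under the curve
auc <- performance(ROCPred, measure = "auc")
auc <- auc@y.values[[1]]
# Plotting curve
plot(ROCPer, colorize = TRUE,
     print.cutoffs.at = seq(0.1, 1, by = 0.1),
     main = "ROC CURVE")
abline(a = 0, b = 1)
auc <- round(auc, 4)
legend(.6, .4, auc, title = "AUC", cex = 1)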
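Once probabilities are converted to class labels, a confusion matrix and accuracy are a common complement to the ROC curve. The following base-R sketch assumes, for self-containment, that the model is fitted and evaluated on the full mtcars data; a real evaluation would use held-out data such as test_reg.

```r
# Assumption: fit and evaluate on the full mtcars data for a self-contained
# sketch; this overstates performance compared with a held-out test set.
fit <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")
prob <- predict(fit, type = "response")

# Convert probabilities to class labels at the 0.5 cutoff.
pred_class <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: rows are actual classes, columns are predicted classes.
conf_mat <- table(actual = mtcars$vs, predicted = pred_class)
print(conf_mat)

# Accuracy is the share of correct predictions.
accuracy <- mean(pred_class == mtcars$vs)
print(accuracy)
```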