
STAT 3888

Semester 2 Statistical Machine Learning 2022

Tutorial Exercise 3.

1. Consider the following data collected by Erickson (1987) as part of a study on the
measurement of anaesthetic depth. The potency of an anaesthetic agent is measured
in terms of the minimum alveolar concentration (MAC) of the agent at which 50% of
patients exhibit no response to stimulation (i.e. do not move - moving means jerking
or twisting, not twitching or grimacing - in response to a surgical incision). Thirty
patients were administered an anaesthetic agent which was maintained at a predetermined
alveolar concentration (actually, anaesthetists refer to concentration when they mean
partial pressure, hence alveolar concentration is measured as a percentage of one
atmosphere) for 15 minutes before a single incision was made in each patient. For each
patient, the alveolar concentration of the anaesthetic agent and the patient’s response
to incision was recorded.
Consider the following R code:

> x
[1] 0.8 0.8 0.8 0.8 0.8 0.8 0.8 1.0 1.0 1.0 1.0 1.0 1.2 1.2 1.2 1.2 1.2
1.2 1.4 1.4 1.4 1.4 1.4 1.4 1.6 1.6 1.6 1.6 2.5 2.5
> y
[1] 1 1 1 1 1 1 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
> res <- glm(y~x, family = "binomial")
> res
Call: glm(formula = y ~ x, family = "binomial")
Coefficients:
(Intercept) x
6.469 -5.567
Degrees of Freedom: 29 Total (i.e. Null); 28 Residual
Null Deviance: 41.46
Residual Deviance: 27.75 AIC: 31.75

The alveolar concentration for each patient is stored in the R vector x above. If the patient
“responds to stimulation” then the corresponding element of y is 1, and 0 otherwise.
A summary of the fitted model is given below.

> summary(res)
Call:
glm(formula = y ~ x, family = "binomial")
Deviance Residuals:
Min 1Q Median 3Q Max
-2.06900 -0.68666 -0.03413 0.74407 1.76666
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.469 2.418 2.675 0.00748 **
x -5.567 2.044 -2.724 0.00645 **
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 41.455 on 29 degrees of freedom
Residual deviance: 27.754 on 28 degrees of freedom
AIC: 31.754
Number of Fisher Scoring iterations: 5

Use the above R code and output to answer the following questions.

(a) State the fitted model.


Solution: The fitted model is

logit (P(y = 1)) = 6.469 − 5.567x

where logit(x) = log(x/(1 − x)). Alternatively,

P(y = 1) = [1 + exp(−(6.469 − 5.567x))]^(-1) = expit(6.469 − 5.567x)

where expit(x) = logit^(-1)(x) = 1/(1 + exp(−x)).
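
As a quick check of the fitted model in R, the estimated coefficients and fitted probabilities can be recovered directly; a minimal sketch, assuming the objects x and res created above:

beta <- coef(res)                        # (Intercept) = 6.469, x = -5.567
p.hat <- plogis(beta[1] + beta[2] * x)   # plogis() is the expit function
all.equal(unname(p.hat), unname(fitted(res)))   # should return TRUE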

(b) Estimate the MAC value corresponding to P(y = 1) = 0.5.


Solution: If P(y = 1) = 0.5 then

logit (P(y = 1)) = 0

so we need to solve for x the equation

0 = 6.469 − 5.567x

which gives us x = 6.469/5.567 ≈ 1.162 to 3 d.p.
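
In R this estimate is simply minus the ratio of the fitted coefficients; a minimal sketch, again assuming the object res from above:

-coef(res)[1] / coef(res)[2]    # = 6.469/5.567 ≈ 1.162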

(c) If x = 1.1, predict whether or not the subject responds to stimulation.
Solution: If x = 1.1 then

P(y = 1) = [1 + exp (−(6.469 − 5.567 × 1.1))]−1 = [1 + exp (−0.3453)]−1 ≈ 0.585

to 3 d.p. Since the predicted probability is greater than 0.5 we would predict
y = 1, that is, the subject responds to stimulation.
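
The same prediction can be obtained with predict(); a minimal sketch, assuming the object res from above:

predict(res, newdata = data.frame(x = 1.1), type = "response")   # approximately 0.585
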
(d) Interpret the effect of increasing x by one unit on response.
Solution: Increasing the value of x by 1 decreases the log-odds of y = 1 by
5.567. To see this, the log-odds at x = 1.1 is

log(P(y = 1)/P(y = 0)) = 6.469 − 5.567 × 1.1 = 0.3453

If we increase x to 2.1 the log-odds is


 
log(P(y = 1)/P(y = 0)) = 6.469 − 5.567 × 2.1 = −5.2217

The difference between these is a decrease of 5.567, the estimated coefficient for x.
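
Equivalently, on the odds scale a one-unit increase in x multiplies the odds of response by exp(−5.567); a minimal sketch, assuming the object res from above:

exp(coef(res)["x"])    # exp(-5.567) ≈ 0.0038, so the odds are multiplied by about 0.004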

(e) Let β1 be the coefficient corresponding to x in the logistic regression model. What
is the Wald statistic for testing the hypothesis H0 : β1 = 0? What is the cor-
responding p-value? Is the coefficient β1 statistically significantly different from
0?
Solution: The Wald Statistic for the hypotheses

H0 : β1 = 0 versus H1 : β1 ≠ 0

is W1 = β̂1/se(β̂1) = −5.567/2.044 = −2.724 to 3 d.p. The p-value is
2*pnorm(abs(-2.724), lower.tail=FALSE) ≈ 0.00645. Both values can be read from
the summary R output. Since the p-value is smaller than 0.05, we reject the null
hypothesis H0 : β1 = 0 and conclude that β1 is significantly different from 0.
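
The Wald statistic and its p-value can also be extracted from the coefficient table; a minimal sketch, assuming the object res from above:

coef(summary(res))["x", c("z value", "Pr(>|z|)")]                       # -2.724 and 0.00645
2 * pnorm(abs(coef(summary(res))["x", "z value"]), lower.tail = FALSE)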

2. A retrospective sample of males in a coronary heart disease (CHD) high-risk region
of the Western Cape, South Africa was collected. Samples were divided into cases (of
CHD) and controls. Many of the CHD positive men have undergone blood pressure
reduction treatment and other programs to reduce their risk factors after their CHD
event. The variables in the study are:

Variable   Description
sbp        systolic blood pressure
tobacco    cumulative tobacco (kg)
ldl        low density lipoprotein cholesterol
famhist    family history of heart disease (Present, Absent)
bmi        body mass index
alcohol    current alcohol consumption
age        age at onset
chd        response (CHD status)

Assume that the variables sbp, tobacco, ldl, famhist, bmi, alcohol, age and chd
have already been entered as columns of the R data frame dat.
Consider the R code:

res <- glm(chd ~ ., data = dat, family = binomial)


summary(res)
Call:
glm(formula = chd ~ ., family = "binomial", data = dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7781 -0.8213 -0.4387 0.8889 2.5435
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.1507209 1.3082600 -4.701 2.58e-06 ***
sbp 0.0065040 0.0057304 1.135 0.256374
tobacco 0.0793764 0.0266028 2.984 0.002847 **
ldl 0.1739239 0.0596617 2.915 0.003555 **
adiposity 0.0185866 0.0292894 0.635 0.525700
famhistPresent 0.9253704 0.2278940 4.061 4.90e-05 ***
typea 0.0395950 0.0123202 3.214 0.001310 **
obesity -0.0629099 0.0442477 -1.422 0.155095
alcohol 0.0001217 0.0044832 0.027 0.978350
age 0.0452253 0.0121298 3.728 0.000193 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 596.11 on 461 degrees of freedom
Residual deviance: 472.14 on 452 degrees of freedom
AIC: 492.14
Number of Fisher Scoring iterations: 5

Use the above R code and output to answer the following questions.

(a) State the model corresponding to the R object res.
Solution: The model can be written as
 
log(P(chd = 1)/(1 − P(chd = 1))) = −6.1507209 + 0.0065040 × sbp + 0.0793764 × tobacco
    + 0.1739239 × ldl + 0.0185866 × adiposity
    + 0.9253704 × I(famhist = “Present”) + 0.0395950 × typea
    − 0.0629099 × obesity + 0.0001217 × alcohol
    + 0.0452253 × age

where I(famhist = “Present”) equals 1 if famhist is “Present” and 0 otherwise.

(b) Using model res, what is the probability of chd=1 for a patient with measurements
sbp=160, tobacco=12.00, ldl=5.73, adiposity=23.11, famhist="Present", typea=49,
obesity=25.30, alcohol=97.20, and age=52?

 
Solution: Substituting these values into the fitted model gives

log(P(chd = 1)/(1 − P(chd = 1))) = −6.1507209 + 0.0065040 × 160 + 0.0793764 × 12.00
    + 0.1739239 × 5.73 + 0.0185866 × 23.11
    + 0.9253704 + 0.0395950 × 49
    − 0.0629099 × 25.30 + 0.0001217 × 97.20
    + 0.0452253 × 52
    = 0.9060059

Hence, P(chd = 1) = 1/(1 + exp(−0.9060059)) = 0.71 to 2 d.p.


(c) Write the R command to make this prediction.
Solution:
newdata <- data.frame(
sbp=160,
tobacco=12.00,
ldl=5.73,
adiposity=23.11,
famhist="Present",
typea=49,
obesity=25.30,
alcohol=97.20,
age=52
)
predict(res, newdata, type="response")
which returns 0.7121829.
(d) Which variables are significantly different from zero?
Solution: The variables whose coefficients are significantly different from 0 (at the
5% level) are tobacco, ldl, famhist, typea and age.
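
These can also be read off programmatically from the coefficient table; a minimal sketch, assuming the object res fitted in this question (note the intercept is also flagged):

p.values <- coef(summary(res))[, "Pr(>|z|)"]
names(p.values)[p.values < 0.05]   # (Intercept), tobacco, ldl, famhistPresent, typea, age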

These questions assume knowledge of vector calculus
3. (Harder - not examinable) Suppose that y ∈ R^n, X ∈ R^{n×p}, β ∈ R^p, and λ > 0 is a tuning
parameter. Consider the penalized likelihood

pℓ(β, λ) = −(1/2)∥y − Xβ∥_2^2 − λ∥β∥_2^2 + constants

Show that the maximizer of pℓ(β, λ) for fixed λ is given by

β̂ = (X^T X + λI)^{-1} X^T y

Solution:

∂pℓ(β, λ)/∂β = −X^T Xβ + X^T y − λIβ

Setting the above to zero and rearranging we get

(X^T X + λI)β = X^T y

Premultiplying both sides by (X^T X + λI)^{-1}, which exists even when p > n because
λ > 0 makes X^T X + λI positive definite, gives the desired result.
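
A quick numerical check of the closed form in R; a minimal sketch on simulated data (all object names are illustrative, and the penalty is coded as (λ/2)∥β∥_2^2 so that its gradient is −λβ, matching the derivative used in the solution above):

set.seed(1)
n <- 50; p <- 5; lambda <- 2
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# closed-form ridge solution (X^T X + lambda I)^{-1} X^T y
beta.ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))

# direct numerical maximiser of the penalized likelihood
pl <- function(b) -0.5 * sum((y - X %*% b)^2) - 0.5 * lambda * sum(b^2)
beta.opt <- optim(rep(0, p), function(b) -pl(b), method = "BFGS")$par

max(abs(beta.ridge - beta.opt))   # should be close to zero
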
4. (Harder - not examinable) Suppose that y ∈ R^n, X ∈ R^{n×p}, β ∈ R^p, and λ > 0 is a tuning
parameter. Consider the penalized likelihood

pℓ(β, λ) = −(1/2)∥y − Xβ∥_2^2 − λ∥β∥_1 + constants

(a) For a single coefficient βj find the maximizer of pℓ(β, λ) holding all other param-
eters fixed.
Solution: We can rewrite pℓ(β, λ) as
pℓ(β, λ) = −(1/2)∥r_j − X_j β_j∥_2^2 − λ|β_j| + constants in β_j
where r_j = y − X_{−j} β_{−j}, X_{−j} is the matrix X with the jth column removed,
and β_{−j} is the vector β with the jth element removed. Then

∂pℓ(β, λ)/∂β_j = −β_j + r_j^T X_j − λ ∂|β_j|


assuming ∥X_j∥ = 1. This is the same problem as described in the lectures, whose
solution is

β̂_j = S_λ(r_j^T X_j)

where S_λ(x) = sign(x)(|x| − λ)_+ is the soft-thresholding operator.
(b) Use the above result to suggest an algorithm to maximize pℓ(β, λ) for fixed λ > 0.
Solution: Initialise β̂ (for example at 0). Then cycle over j = 1, . . . , p, applying the
following 2 steps (see the R sketch below):
• Set r_j = y − X_{−j} β̂_{−j}
• Set β̂_j = S_λ(r_j^T X_j)
Repeat until the change in the β̂_j values is sufficiently small.
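
A minimal R sketch of this coordinate-wise algorithm (coordinate descent), assuming the columns of X have been rescaled to satisfy ∥X_j∥ = 1; the function and object names are illustrative:

# soft-thresholding operator S_lambda(x) = sign(x) * (|x| - lambda)_+
soft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

lasso.cd <- function(X, y, lambda, n.iter = 100, tol = 1e-8) {
  p <- ncol(X)
  beta <- rep(0, p)                                    # initialise at zero
  for (it in 1:n.iter) {
    beta.old <- beta
    for (j in 1:p) {
      r.j <- y - X[, -j, drop = FALSE] %*% beta[-j]    # partial residual r_j
      beta[j] <- soft(sum(r.j * X[, j]), lambda)       # update beta_j by soft-thresholding
    }
    if (max(abs(beta - beta.old)) < tol) break         # stop when the changes are small
  }
  beta
}

# usage (columns rescaled to unit norm first):
# X.s <- sweep(X, 2, sqrt(colSums(X^2)), "/")
# beta.hat <- lasso.cd(X.s, y, lambda = 0.5)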
