MLS - Logistic Regression

The document discusses Logistic Regression, highlighting its importance for predicting categorical outcomes as opposed to Linear Regression, which predicts continuous variables. It covers performance metrics such as accuracy, precision, recall, and the F1 score, emphasizing the need for careful threshold selection to optimize model performance. Additionally, it explains the relationship between the logit function, sigmoid function, and Logistic Regression, along with the use of ROC curves to evaluate model effectiveness.

Uploaded by

shaam solanki

Logistic Regression

This file is meant for personal use by [email protected] only.

Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Discussion Questions
1. Why do we need Logistic Regression?
a. What is the difference between Regression and Classification?
b. Can we use Linear Regression for a classification problem?
c. What is the difference between Linear and Logistic Regression?
d. What is the output of a Logistic Regression model?
2. How do we measure the performance of a Logistic Regression model?
3. Why is accuracy not always a good performance measure?
4. How do we choose the optimal threshold value?
5. What are some other ways to look at the model performance?

Why do we need Logistic Regression?
● Linear Regression uses a set of independent variables to predict a continuous dependent variable, whereas Logistic Regression is used when the dependent variable is categorical/discrete.
● Linear Regression cannot be used to predict probabilities because its output is not restricted to the range 0 to 1.
● Logistic Regression is a supervised learning algorithm that estimates the log-odds of an event, which can then be converted into the probability of that event occurring.
● The predicted probabilities can be converted to classes based on a threshold (the default threshold is 0.5).
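To illustrate why a straight-line fit is a poor source of probabilities, here is a minimal sketch that fits ordinary least squares to 0/1 labels. The data (hours studied vs. pass/fail) and the helper name fit_line are made up for illustration and are not part of the original slides.

```python
# Hypothetical toy data: hours studied (x) and pass = 1 / fail = 0 (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(xs, ys)

# Evaluate the fitted line at a small, a middle, and a large x value.
linear_outputs = [intercept + slope * x for x in [0.0, 4.5, 12.0]]
for x, out in zip([0.0, 4.5, 12.0], linear_outputs):
    print(f"x = {x:>4}: linear output = {out:.3f}")
```

On this toy data the line gives a negative value at x = 0 and a value above 1 at x = 12, neither of which can be read as a probability.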
Source: https://www.javatpoint.com/linear-regression-vs-logistic-regression-in-machine-learning
Confusion Matrix
It is a tabular representation of the predicted vs actual classes. We can use it to assess the performance of a Logistic Regression model.
● True Positives (TP): The model predicted the class as positive, and it is actually positive.
● True Negatives (TN): The model predicted the class as negative, and it is actually negative.
● False Positives (FP): The model predicted the class as positive, but it is actually negative. (Also known as a "Type I error".)
● False Negatives (FN): The model predicted the class as negative, but it is actually positive. (Also known as a "Type II error".)
                          Actual Positive (1)   Actual Negative (0)
Predicted Positive (1)    TP                    FP
Predicted Negative (0)    FN                    TN

Metrics that are often computed from a confusion matrix for a binary classifier:
• Accuracy = (TP + TN) / (TP + FP + FN + TN)
• Precision = TP / (TP + FP)
• Recall or Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
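The four metrics can be computed directly from the confusion matrix counts. The sketch below uses made-up counts, and the function name confusion_metrics is hypothetical. Returning 0.0 when a denominator is zero is an assumption made here for convenience, not a rule from the slides.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and specificity from raw counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Assumption: report 0.0 when a metric's denominator is zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical counts for 100 predictions.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```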

Why is accuracy not always a good performance measure?
Accuracy is simply the overall percentage of correct predictions, and it can be high even for a useless model.

Example: Total patients - 100 → Cancer rate - 2% → Patients having cancer - 2 → Model predicts that no one has cancer → Accuracy - 98% → Misses the critical patients having cancer

● Here Accuracy will be 98%, even if we predict that none of the patients have cancer.
● In this case, Recall should be used as the measure of model performance. High Recall implies fewer False Negatives.
● Fewer False Negatives imply fewer chances of predicting a patient having cancer as a patient not having cancer.
● This is where we need other metrics to evaluate model performance.
● The other measures are Recall and Precision
○ Recall - What % of actual 1s did we capture in our prediction?
○ Precision - What % of our predicted 1s are actually 1?
● There is a tradeoff - as you try to increase Recall, Precision will reduce and vice versa
● This tradeoff can be used to figure out the right threshold to use for the model
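The cancer example can be reproduced in a few lines. This sketch builds the 100 patients described above (2 with cancer) and scores a model that predicts "no cancer" for everyone. The variable names are illustrative only.

```python
# 100 patients, 2 of whom actually have cancer (label 1).
actual = [1] * 2 + [0] * 98
# A model that predicts "no cancer" (0) for every patient.
predicted = [0] * 100

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # high, despite the model being useless
print(f"Recall:   {recall:.2f}")    # no cancer patient is caught
```

Accuracy comes out at 98% while Recall is 0, since both cancer patients are False Negatives. Precision is undefined here because the model never predicts the positive class.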

How do we choose the optimal threshold value?
● The Precision-Recall curve is a useful measure of prediction success when the classes are imbalanced.
● The Precision-Recall curve shows the tradeoff between Precision and Recall for different thresholds.
● It can be used to select the threshold that best improves model performance for the problem at hand.
● In the example plot, Precision and Recall are the same when the threshold is 0.4.
● If we want higher Precision, we can increase the threshold.
● If we want higher Recall, we can decrease the threshold.

Choosing a threshold can completely change the model performance assessment. It is important to think about what the 'sweet spot' is.
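A rough sketch of the threshold tradeoff follows. The predicted probabilities and labels are invented for illustration, and precision_recall_at is a hypothetical helper. Treating Precision as 1.0 when nothing is predicted positive is an assumption made here.

```python
# Hypothetical predicted probabilities and true labels (5 positives, 5 negatives).
probs = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]


def precision_recall_at(threshold, probs, labels):
    """Classify as positive when probability >= threshold, then score."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    # Assumption: precision is 1.0 when no sample is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


results = {t: precision_recall_at(t, probs, labels) for t in [0.2, 0.5, 0.8]}
for t, (precision, recall) in results.items():
    print(f"threshold = {t}: precision = {precision:.3f}, recall = {recall:.3f}")
```

On this data, raising the threshold from 0.2 to 0.8 moves Precision up from 0.625 to 1.0 while Recall falls from 1.0 to 0.4. Real data need not change this smoothly at every step.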


● F1 Score is a measure that takes into account both Precision and Recall.
● F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both False Positives and False Negatives into account.
● The highest possible value of an F1 Score is 1, indicating perfect Precision and Recall, and the lowest possible value is 0.
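Written out, the harmonic mean of Precision and Recall is F1 = 2 * Precision * Recall / (Precision + Recall). A small sketch follows. The function name f1_score is chosen here for illustration, and returning 0.0 when both inputs are zero is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumption: define F1 as 0.0 when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


perfect = f1_score(1.0, 1.0)      # perfect precision and recall
balanced = f1_score(0.75, 0.6)    # moderately good on both
lopsided = f1_score(1.0, 0.1)     # high precision, very low recall

print(f"perfect:  {perfect:.3f}")
print(f"balanced: {balanced:.3f}")
print(f"lopsided: {lopsided:.3f}")
```

The lopsided case shows why the harmonic mean is used: the arithmetic mean of 1.0 and 0.1 would be 0.55, but the F1 Score is about 0.18, so a model cannot score well by excelling at only one of the two.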


AUC = Area under the ROC Curve

● AUC generally varies from 0.5 to 1. If it is less than 0.5, then there is something terribly wrong with the model, as it is doing worse than random/baseline.
● We can choose a threshold based on the point where the vertical distance between the plot and the baseline is maximum.

Appendix
How does Logistic Regression work?
To understand Logistic Regression, it is important to understand the concepts of Odds, the Logit function, and the Sigmoid function (or Logistic function).

Odds:

● The odds in favor of an event are the probability that it happens divided by the probability that it does not. (Strictly speaking, an Odds Ratio compares the odds of two groups; the quantity used here is the odds of a single event.)
● Let P be the probability of subjects affected, then
Odds = P/(1-P)

Logit Function:
● The logit function is the logarithm of the odds (log-odds). It takes input values strictly between 0 and 1 and transforms them to values over the entire real number range.
● If P is the probability, then
Logit(P) = Log(P/(1-P))

Sigmoid function:
● The inverse of the logit function is the sigmoid function.
● The Sigmoid function can take any real value and map it to a value between 0 and 1.
● It is also called the Logistic function and gives an S-shaped curve.
Sigmoid(x) = 1 / (1 + e^(-x))
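The three formulas above translate directly into code. This minimal sketch uses only the standard library and checks numerically that the sigmoid undoes the logit. The function names are chosen here for illustration.

```python
import math


def odds(p):
    """Odds in favor of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def logit(p):
    """Log-odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))


def sigmoid(x):
    """Inverse of logit: maps any real number to a value between 0 and 1."""
    return 1 / (1 + math.exp(-x))


p = 0.8
print(f"odds({p})           = {odds(p):.3f}")
print(f"logit({p})          = {logit(p):.3f}")
print(f"sigmoid(logit({p})) = {sigmoid(logit(p)):.3f}")  # recovers p
```

A probability of 0.5 has odds of 1 and a logit of 0, which is why a logit of 0 corresponds to the default threshold of 0.5.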
What is the relationship between Logit, Sigmoid and Logistic Regression?
● Linear Regression Equation
○ Y = a1 + a2*x + error
● Logistic Regression models the logit of P with the same kind of linear function of x
○ Logit(P) = a1 + a2*x
where P = the probability of a sample belonging to a class
○ log(P/(1-P)) = a1 + a2*x
○ There is no additive error term here: the randomness lies in the class label itself, which occurs with probability P.
● Apply the sigmoid function to both sides to get probabilities,
○ sigmoid(log(P/(1-P))) = sigmoid(a1 + a2*x)
● So, we get,
○ P = 1 / (1 + e^-(a1 + a2*x))
○ This 'P' is the output of the Logistic Regression model, i.e. we get the probability of a sample belonging to a class.
● Usually if P > 0.5, we mark it as positive; otherwise, we mark it as negative.
● This cut-off point, known as the Threshold, can be set anywhere between 0 and 1, depending on the context of the problem.
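As a sketch of how the model turns an input into a class, the code below computes P = 1 / (1 + e^-(a1 + a2*x)) and applies a threshold. The coefficient values A1 and A2 are hypothetical, picked only so the numbers are easy to follow. A real model would estimate them from data.

```python
import math

# Hypothetical coefficients (not fitted to any data).
A1 = -4.0  # intercept
A2 = 1.0   # slope


def predict_proba(x, a1=A1, a2=A2):
    """Probability of the positive class: sigmoid of the linear score."""
    return 1.0 / (1.0 + math.exp(-(a1 + a2 * x)))


def predict_class(x, threshold=0.5):
    """Mark as positive (1) when the probability exceeds the threshold."""
    return 1 if predict_proba(x) > threshold else 0


for x in [2.0, 4.0, 6.0]:
    print(f"x = {x}: P = {predict_proba(x):.3f}, "
          f"class @0.5 = {predict_class(x)}, "
          f"class @0.1 = {predict_class(x, threshold=0.1)}")
```

With these coefficients, x = 2 gives P of about 0.12, so it is marked negative at the default threshold but positive once the threshold is lowered to 0.1. This shows how the same probabilities can yield different classes.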
ROC Curve

1. It is a plot of TPR (True Positive Rate, or Sensitivity) against FPR (False Positive Rate, or 1 - Specificity) for varying thresholds of the same model.
2. The area under the ROC curve (AUC) is a measure of how good a model is: the higher the AUC, the better the model is at distinguishing between classes.
3. The red diagonal represents a model whose predictions are as good as random.
4. The further the ROC curve lies above the diagonal line, the better the model is at distinguishing between positive and negative classes.
5. We can use this curve to get a better threshold value as per our requirement.

We can choose the threshold based on the point where the vertical distance between the plot and the baseline is maximum.
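The following sketch traces a ROC curve by hand, estimates the AUC with the trapezoid rule, and picks the threshold where the vertical distance between the curve and the diagonal (TPR - FPR) is largest. The probabilities and labels are made up, and the helper names roc_points and auc are hypothetical.

```python
# Hypothetical predicted probabilities and true labels (4 positives, 6 negatives).
probs = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]


def roc_points(probs, labels):
    """Return (FPR, TPR, threshold) triples, from strictest threshold to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0, float("inf"))]  # nothing is predicted positive
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        points.append((fp / neg, tp / pos, t))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(probs, labels)
auc_value = auc(points)

# Threshold with the largest vertical distance above the diagonal.
best_fpr, best_tpr, best_threshold = max(points, key=lambda pt: pt[1] - pt[0])

print(f"AUC = {auc_value:.3f}")
print(f"Best threshold = {best_threshold} (TPR = {best_tpr:.2f}, FPR = {best_fpr:.2f})")
```

This distance criterion is only one way to choose a threshold. If False Negatives are much costlier than False Positives, a lower threshold than the one selected here may be preferable.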
