
Machine Learning

Lecture 11: Evaluation Metrics for Classification


COURSE CODE: CSE451
2023
Course Teacher
Dr. Mrinal Kanti Baowaly
Associate Professor
Department of Computer Science and
Engineering, Bangabandhu Sheikh
Mujibur Rahman Science and
Technology University, Bangladesh.

Email: [email protected]
Common Evaluation Metrics for
Classification
1. Confusion Matrix
2. Accuracy
3. Precision
4. Recall/Sensitivity
5. Specificity
6. F1 Score
7. ROC (Receiver Operating Characteristics) Curve
8. AUC (Area Under the ROC curve) Score
Confusion Matrix
 A confusion matrix is a table that describes the performance of a classification model on the test data.
 It is an N x N matrix, where N is the number of classes being predicted.
 Each row of the matrix represents the instances of a predicted class while each column represents the instances of an actual class (or vice versa, depending on the convention).
Terms associated with Confusion matrix
 True Positives: The cases in which the model predicted 1 (True) and the actual output was also 1 (True).
 True Negatives: The cases in which the model predicted 0 (False) and the actual output was also 0 (False).
 False Positives: The cases in which the model predicted 1 (True) but the actual output was 0 (False).
 False Negatives: The cases in which the model predicted 0 (False) but the actual output was 1 (True).
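As a quick illustration (added for clarity, not part of the original slide), the four terms can be tallied directly from paired actual/predicted labels; the toy label vectors below are the ones used in the sklearn example later in this lecture.

# Tally TP, TN, FP, FN by hand, treating 1 as the positive class
actual    = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
print(TP, TN, FP, FN)  # 3 4 1 2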
Accuracy
 It is the ratio of the number of correct predictions to the total number of predictions (input samples).

Accuracy = No. of correct predictions / Total no. of predictions
         = (TP + TN) / (TP + FP + FN + TN)

 It is the most commonly used metric to judge a model and is a good measure when the target variable classes in the data are nearly balanced.
 It should NEVER be used as a measure when the target classes are imbalanced.

[Example confusion matrix on the slide: Accuracy = 93%, Error = 7%]
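A minimal sketch (added for clarity, not from the slide) that plugs the counts implied by the slide's example figure (TP=55, FN=5, FP=2, TN=38, i.e. 100 test samples) into the accuracy formula:

# Accuracy from the confusion-matrix counts implied by the slide's example
TP, FN, FP, TN = 55, 5, 2, 38
accuracy = (TP + TN) / (TP + FP + FN + TN)
print(accuracy)  # 0.93, i.e. 93% accuracy and 7% error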
Precision
 Out of all the positive classes we have predicted, how many are actually positive.

Precision = TP / (TP + FP) = 55 / 57 = 0.9649

[Example confusion matrix on the slide: Accuracy = 93%, Error = 7%]
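A short sketch of the same calculation, plus scikit-learn's precision_score on the toy label vectors used later in the lecture (the 55/2 counts are the ones implied by the slide's example figure):

from sklearn.metrics import precision_score

# Precision from the slide's example counts
TP, FP = 55, 2
print(TP / (TP + FP))  # 0.9649...

# Equivalent sklearn call on toy label vectors
actual    = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(precision_score(actual, predicted))  # 3 / (3 + 1) = 0.75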
Recall/Sensitivity
 Out of all the positive classes, how many are predicted correctly.

Recall = TP / (TP + FN) = 55 / 60 = 0.9166

[Example confusion matrix on the slide: Accuracy = 93%, Error = 7%]
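A corresponding sketch for recall, again using the counts implied by the slide (TP=55, FN=5) and scikit-learn's recall_score on the same toy vectors:

from sklearn.metrics import recall_score

# Recall from the slide's example counts
TP, FN = 55, 5
print(TP / (TP + FN))  # 0.9166...

# sklearn equivalent on toy label vectors
actual    = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(recall_score(actual, predicted))  # 3 / (3 + 2) = 0.6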
Specificity
 Out of all the negative classes, how many are predicted correctly.

Specificity = TN / (FP + TN) = 38 / 40 = 0.95

[Example confusion matrix on the slide: Accuracy = 93%, Error = 7%]
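Scikit-learn has no dedicated specificity function, but since specificity is simply the recall of the negative class, recall_score with pos_label=0 can be used. The sketch below (an illustration, not from the slide) also checks the slide's 38/40 example:

from sklearn.metrics import recall_score

# Specificity from the slide's example counts
TN, FP = 38, 2
print(TN / (FP + TN))  # 0.95

# Specificity as recall of the negative class
actual    = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(recall_score(actual, predicted, pos_label=0))  # 4 / (4 + 1) = 0.8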
F1 Score
 Harmonic mean of the Precision and Recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall) = 0.94

 It makes a balance between Precision and Recall.
 Rather than reporting recall and precision every time, it is easier to use a single F1 score.
 It is a better choice when the target classes are imbalanced.

[Example confusion matrix on the slide: Accuracy = 93%, Error = 7%]
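A small sketch showing the harmonic-mean formula with the slide's precision and recall values, and the equivalent f1_score call on the toy vectors (added for illustration):

from sklearn.metrics import f1_score

# F1 from the slide's example precision and recall
precision, recall = 55 / 57, 55 / 60
print(2 * precision * recall / (precision + recall))  # ≈ 0.94

# sklearn equivalent on toy label vectors
actual    = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(f1_score(actual, predicted))  # ≈ 0.667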
HW: Why is classification accuracy not enough?
Hints:
 Suppose you have the problem of detecting cancer. You have two classes for that:
1. Having cancer, the positive class, denoted by 1
2. No cancer, the negative class, denoted by 0
Let's assume that you have 1000 patient records. The confusion matrix of a predictive model is shown on the right side of the slide (Accuracy = 0.994, Error = 0.006, F1 Score = 0.249).
It yields a very high accuracy (99.4%) but fails to detect the patients with cancer (4 out of 5). The F1 score can be a proper metric in this case of imbalanced target classes.
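The imbalance effect can be reproduced with a hypothetical split consistent with the slide's figures (1000 patients, only 5 with cancer, the model detecting just 1 of them plus raising 2 false alarms); the exact counts on the slide may differ slightly:

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced data: TP=1, FN=4, FP=2, TN=993
actual    = [1] * 5 + [0] * 995
predicted = [1, 0, 0, 0, 0] + [1, 1] + [0] * 993

print(accuracy_score(actual, predicted))  # 0.994 -> looks excellent
print(f1_score(actual, predicted))        # 0.25  -> exposes the poor cancer detection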
ROC (Receiver Operating Characteristics)
Curve
 A ROC curve is a graphical plot that is used as a performance measurement for classification problems.
 The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

TPR = Recall = Sensitivity = TP / (TP + FN)
FPR = 1 - Specificity = 1 - TN / (FP + TN) = FP / (FP + TN)

 It tells how well the model is capable of distinguishing between classes (i.e., its separability/discrimination capacity).
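A minimal plotting sketch, assuming predicted probabilities for the positive class are available (the scores below are made up for illustration); sklearn's roc_curve computes the TPR/FPR pairs over all thresholds:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Made-up true labels and predicted probabilities of the positive class
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, marker='o', label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='no-skill baseline')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Recall/Sensitivity)')
plt.legend()
plt.show()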
AUC (Area Under the ROC curve) Score
 The AUC is the area under the ROC curve.
 This score gives us a good idea of how well the model performs.
 The AUC score ranges from 0 to 1.
 An ideal model has an AUC near 1, which means it has excellent discrimination capacity.
 A poor model has an AUC near 0.5, which means it has no discrimination capacity.
 When the AUC is approximately 0, the model is actually reciprocating the classes: it predicts the negative class as positive and vice versa (worst model).
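The corresponding score comes from sklearn's roc_auc_score, shown here on the same made-up probabilities as in the ROC sketch above:

from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
print(roc_auc_score(y_true, y_score))  # 0.875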
Example: Confusion Matrix
# import confusion matrix
from sklearn.metrics import confusion_matrix
# actual values
actual = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
# predicted values
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
# confusion matrix (labels=[1,0] puts the positive class first)
matrix = confusion_matrix(actual, predicted, labels=[1, 0])
print('Confusion matrix :\n', matrix)
# with labels=[1,0], rows are actual [1,0] and columns are predicted [1,0],
# so flattening the matrix yields TP, FN, FP, TN in that order
TP, FN, FP, TN = matrix.reshape(-1)
print('Outcome values :\n', TP, FN, FP, TN)
Assignment: How to Use Various Metrics
in Classification Problems?
1. Let us investigate the Lung Cancer Dataset from here:
https://www.kaggle.com/datasets/thedevastator/cancer-patients-and-air-
pollution-a-new-link
2. There are 1000 items (patients) and 24 predictor variables (age, gender, air pollution exposure, alcohol use, dust allergy, etc.), excluding the index and ID columns. The target variable (level) for the risk of lung cancer is encoded as 0 and 1, where 0 means low risk of lung cancer and 1 means medium or high risk of lung cancer.
3. Build a binary classification model to predict the risk of lung cancer (0, 1) for the patients. Estimate and compare Accuracy, Precision, Recall, Specificity, F1 Score and AUC Score to evaluate the performance of the model, and also plot the ROC curve.
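One possible workflow for the assignment is sketched below. The CSV file name, the dropped ID columns, and the 'Level' target column are assumptions about the Kaggle dataset and should be adjusted to the actual data; the classifier choice (logistic regression) is likewise only an example.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# assumed local file name and column names; adjust to the downloaded dataset
df = pd.read_csv('cancer_patients.csv')
X = df.drop(columns=['index', 'Patient Id', 'Level'], errors='ignore')
# encode the target as 0/1 (assumes raw 'Low'/'Medium'/'High' labels; skip if already 0/1)
y = (df['Level'] != 'Low').astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# sklearn's default label order [0, 1] flattens to TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print('Accuracy   :', accuracy_score(y_test, y_pred))
print('Precision  :', precision_score(y_test, y_pred))
print('Recall     :', recall_score(y_test, y_pred))
print('Specificity:', tn / (tn + fp))
print('F1 Score   :', f1_score(y_test, y_pred))
print('AUC Score  :', roc_auc_score(y_test, y_prob))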
Evaluation Metrics for Multi-class
Classification
 Micro-averaged Precision is calculated globally from the total counts of true positives and false positives across all classes.
 Macro-averaged Precision is calculated as the unweighted average of the Precisions of all classes.
 Weighted-averaged Precision is also calculated from the Precision of each class, but takes into account the number of samples of each class in the data.
 HW: Find out which type of averaging is preferable.
Source link: Maria Khalusova
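An illustrative toy example (not from the source slide) contrasting the three averaging schemes with scikit-learn's precision_score:

from sklearn.metrics import precision_score

# Toy 3-class labels chosen so the three averages differ
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 2, 2, 2, 1]

print(precision_score(y_true, y_pred, average='micro'))     # 0.70: global TP / (TP + FP)
print(precision_score(y_true, y_pred, average='macro'))     # 0.75: unweighted mean of per-class precision
print(precision_score(y_true, y_pred, average='weighted'))  # 0.80: per-class precision weighted by class support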
Some Learning Materials
AnalyticsVidhya: How to Choose Evaluation Metrics for Classification
Models
RitchieNg: Evaluating a Classification Model
TowardsDatascience: Various ways to evaluate a machine learning
model’s performance
Understanding Micro, Macro, and Weighted Averages for Scikit-Learn
metrics in multi-class classification with example
