Binary Classification Metrics Cheatsheet
Blog post: 22 Evaluation Metrics for Binary Classification (And When to Use Them)
False positive rate | Type-I error
Usually, it is not used alone but rather with some other metric.
How many false alerts your model raises.
If the cost of dealing with an alert is high, you should consider increasing the threshold to get fewer alerts.
How to calculate
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_positive_rate = fp / (fp + tn)
False negative rate | Type-II error
Usually, it is not used alone but rather with some other metric.
How often your model misses truly fraudulent transactions.
If the cost of letting fraudulent transactions through is high and the value you get from the users isn't, you can consider focusing on this number.
How to calculate
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_negative_rate = fn / (tp + fn)
False discovery rate
Reversed precision (1 - precision).
How to calculate
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_discovery_rate = fp / (tp + fp)
True negative rate | Specificity
Usually, you don't use it alone but rather as an auxiliary metric.
Think of it as recall for the negative class.
When you really want to be sure that you are right when you say something is safe. A typical example would be a doctor telling a patient "you are healthy". Making a mistake here and telling a sick person they are safe and can go home is something you may want to avoid.
Negative predictive value
Usually, you don't use it alone but rather as an auxiliary metric.
Think of it as precision for the negative class.
When we care about high precision on negative predictions. For example, imagine we really don't want to have any additional process for screening the transactions predicted as clean. In that case, we may want to make sure that our negative predictive value is high.
How to calculate
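Both can be read off the confusion matrix; a minimal sketch, assuming the same y_true and y_pred_class arrays used in the other cells:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
true_negative_rate = tn / (tn + fp)  # recall for the negative class (specificity)
negative_predictive_value = tn / (tn + fn)  # precision for the negative class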
Log loss
Usually used as an objective, not a metric.
Averaged difference between the ground truth and the logarithm of the predicted score for every observation. Heavily penalizes the model when it is confident about something yet wrong.
Brier score
When you care about calibrated probabilities.
Mean squared error between the ground truth and the predicted score.
How to calculate
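A minimal sketch with scikit-learn, assuming y_pred holds predicted probabilities for both classes (as in the KS statistic cell below), so y_pred[:, 1] is the positive-class score:
from sklearn.metrics import log_loss, brier_score_loss
logloss = log_loss(y_true, y_pred[:, 1])  # heavily penalizes confident wrong predictions
brier_score = brier_score_loss(y_true, y_pred[:, 1])  # mean squared error of the predicted score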
ROC AUC
You should use it when you ultimately care about ranking predictions.
You should not use it when your data is heavily imbalanced.
You should use it when you care equally about positive and negative class.
Rank correlation between predictions and target. Tells you how good your model is at ranking predictions (positive over negative).
PR AUC | Average precision
When you want to communicate the precision/recall decision to other stakeholders and want to choose the threshold that fits the business problem.
When your data is heavily imbalanced.
When you care more about the positive than the negative class.
The average precision over all recall values.
How to calculate
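A minimal sketch with scikit-learn, again assuming y_pred[:, 1] holds the predicted positive-class probabilities:
from sklearn.metrics import roc_auc_score, average_precision_score
roc_auc = roc_auc_score(y_true, y_pred[:, 1])
pr_auc = average_precision_score(y_true, y_pred[:, 1])  # average precision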
Kolmogorov-Smirnov statistic
When your problem is about sorting/prioritizing the most relevant observations and you care equally about positive and negative class.
It helps to assess the separation between prediction distributions for positive and negative class.
How to calculate
from scikitplot.helpers import binary_ks_curve
# y_pred holds predicted probabilities for both classes (e.g. the output of predict_proba)
res = binary_ks_curve(y_true, y_pred[:, 1])
ks_stat = res[3]
F-beta score
When you want to combine precision and recall in one metric and would like to be able to adjust how much focus you put on one or the other.
Weighted harmonic mean of precision and recall with a weight beta (0 < beta < 1 favours precision; beta > 1 favours recall).
How to calculate
from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred_class, beta=beta)  # set beta to control the precision/recall trade-off
F1 score
Pretty much in every binary classification problem. It is my go-to metric when working on those problems.
Harmonic mean of precision and recall.
How to calculate
from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred_class, beta=1)
F2 score
When recalling positive observations (fraudulent transactions) is more important than being precise about it, but you still want a nice and simple metric that combines precision and recall.
Harmonic mean of precision and recall with twice as much weight on recall.
How to calculate
from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred_class, beta=2)
Cohen Kappa
How much better your model is over the random classifier that predicts based on class frequencies.
How to calculate
from sklearn.metrics import cohen_kappa_score
cohen_kappa_score(y_true, y_pred_class)
Matthews correlation coefficient
Pretty much in every binary classification problem. It is my go-to metric when working on those problems.
Correlation between predicted classes and ground truth.
How to calculate
from sklearn.metrics import matthews_corrcoef
matthews_corrcoef(y_true, y_pred_class)
Recall
Usually, you will not use it alone but rather coupled with other metrics like precision.
That being said, recall is a go-to metric when you really care about catching all fraudulent transactions, even at the cost of false alerts. Potentially it is cheap for you to process those alerts and very expensive when a transaction goes unseen.
Put all guilty in prison.
Precision
Again, it usually doesn't make sense to use it alone but rather coupled with other metrics like recall.
When raising false alerts is costly and you really want all the positive predictions to be worth looking at, you should optimize for precision.
Make sure that people who go to prison are guilty.
How to calculate
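A minimal sketch with scikit-learn, using the same y_true and y_pred_class as the other cells (equivalently tp / (tp + fn) and tp / (tp + fp) from the confusion matrix):
from sklearn.metrics import recall_score, precision_score
recall = recall_score(y_true, y_pred_class)
precision = precision_score(y_true, y_pred_class)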
Accuracy
The fraction of predictions your model got right.
How to calculate
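A minimal sketch with scikit-learn, equivalently (tp + tn) / (tp + fp + fn + tn) from the confusion matrix:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred_class)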
Confusion Matrix
Pretty much always. Get the gist of model imbalance and where the predictions fall.
Table that contains true negative (tn), false positive (fp), false negative (fn), and true positive (tp) predictions.
How to calculate
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred_class)
tn, fp, fn, tp = cm.ravel()
Cumulative gain chart
Whenever you want to use your model to choose the best customers/transactions to target by sorting all predictions, you should consider using cumulative gain charts.
In simple words, it helps you gauge how much you gain by using your model over a random model for a given fraction of top scored predictions.
How to calculate
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_cumulative_gain
# y_pred holds predicted probabilities for both classes (e.g. the output of predict_proba)
fig, ax = plt.subplots()
plot_cumulative_gain(y_true, y_pred, ax=ax)
Lift curve
Whenever you want to use your model to choose the best customers/transactions to target by sorting all predictions, you should consider using a lift curve.
In simple words, it helps you gauge how much you gain by using your model over a random model for a given fraction of top scored predictions.
It tells you how much better your model is than a random model for the given percentile of top scored predictions.
How to calculate
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_lift_curve
fig, ax = plt.subplots()
plot_lift_curve(y_true, y_pred, ax=ax)
Kolmogorov-Smirnov chart
When your problem is about sorting/prioritizing the most relevant observations and you care equally about positive and negative class.
It helps to assess the separation between prediction distributions for positive and negative class.
So it works similarly to the cumulative gain chart, but instead of just looking at the positive class, it looks at the separation between the positive and negative classes.
How to calculate
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_ks_statistic
fig, ax = plt.subplots()
plot_ks_statistic(y_true, y_pred, ax=ax)
ROC curve
You should use it when you ultimately care about ranking predictions.
You should not use it when your data is heavily imbalanced.
You should use it when you care equally about positive and negative class.
It is a chart that visualizes the tradeoff between true positive rate (TPR) and false positive rate (FPR). Basically, for every threshold, we calculate TPR and FPR and plot them on one chart.
How to calculate
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_roc
fig, ax = plt.subplots()
plot_roc(y_true, y_pred, ax=ax)
Precision-Recall curve
When you want to communicate the precision/recall decision to other stakeholders and want to choose the threshold that fits the business problem.
When your data is heavily imbalanced.
When you care more about the positive than the negative class.
It is a curve that combines precision (PPV) and recall (TPR) in a single visualization. For every threshold, you calculate PPV and TPR and plot them. The higher your curve is on the y-axis, the better your model performance.
How to calculate
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_precision_recall
fig, ax = plt.subplots()
plot_precision_recall(y_true, y_pred, ax=ax)