Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on?
Based on your project, which performance metric to improve on?

Salma Ghoneim · Apr 2 · 5 min read

I will use a basic example to explain each performance metric, so that you really understand the difference between them and, in your next ML project, can choose the performance metric that best suits your problem.

Here we go
A school is running a machine learning primary diabetes scan on all of
its students.
The output is either diabetic (+ve) or healthy (-ve).

There are only 4 cases any student X could end up with.
We'll be using the following as a reference later, so don't hesitate to re-read it if you get confused.

• True positive (TP): Prediction is +ve and X is diabetic; we want that.

• True negative (TN): Prediction is -ve and X is healthy; we want that too.

• False positive (FP): Prediction is +ve and X is healthy; a false alarm, bad.

• False negative (FN): Prediction is -ve and X is diabetic; the worst.
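
To make the four cases concrete, here is a minimal Python counting sketch; the y_true and y_pred lists are made up for illustration (1 = diabetic, 0 = healthy):

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what each student actually is (hypothetical)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # what the scan predicted (hypothetical)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted +ve, actually diabetic
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted -ve, actually healthy
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarm
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # the worst case here

print(tp, tn, fp, fn)  # 3 3 1 1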


To remember that, there are 2 tricks:

- If it starts with true, then the prediction was correct, whether diabetic or not: a true positive is a diabetic person correctly predicted, and a true negative is a healthy person correctly predicted. Oppositely, if it starts with false, then the prediction was incorrect: a false positive is a healthy person incorrectly predicted as diabetic (+ve), and a false negative is a diabetic person incorrectly predicted as healthy (-ve).
- Positive or negative indicates the output of our program, while true or false judges whether this output is correct or incorrect.

Before I continue: true positives & true negatives are always good; we love the news the word true brings. Which leaves false positives and false negatives.
In our example, false positives are just a false alarm; a 2nd, more detailed scan will correct them. But a false negative label means they think they're healthy when they're not, which is, in our problem, the worst case of the 4.
Whether FP & FN are equally bad, or one of them is worse than the other, depends on your problem. This piece of information has a great impact on your choice of performance metric, so give it a thought before you continue.

. . .

Which performance metric to choose?


Accuracy
It’s the ratio of the correctly labeled subjects to the whole pool of
subjects.
Accuracy is the most intuitive one.
Accuracy answers the following question: How many students did
we correctly label out of all the students?
Accuracy = (TP+TN)/(TP+FP+FN+TN)
numerator: all correctly labeled subjects (all trues)
denominator: all subjects
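
As a minimal sketch (the counts are hypothetical, picked to give 90%):

def accuracy(tp, tn, fp, fn):
    # correctly labeled subjects over all subjects
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=5, tn=4, fp=1, fn=0))  # 0.9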

Precision
Precision is the ratio of subjects correctly labeled +ve by our program to all subjects labeled +ve.
Precision answers the following: How many of those who we
labeled as diabetic are actually diabetic?
Precision = TP/(TP+FP)
numerator: +ve labeled diabetic people.
denominator: all +ve labeled by our program (whether they’re diabetic
or not in reality).
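
In code, a minimal sketch with hypothetical counts:

def precision(tp, fp):
    # correctly +ve labeled over everything our program labeled +ve
    return tp / (tp + fp)

print(precision(tp=8, fp=2))  # 0.8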

Recall (aka Sensitivity)


Recall is the ratio of subjects correctly labeled +ve by our program to all who are diabetic in reality.
Recall answers the following question: Of all the people who are diabetic, how many did we correctly predict?
Recall = TP/(TP+FN)
numerator: +ve labeled diabetic people.


denominator: all people who are diabetic (whether detected by our program or not)
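
Again as a minimal sketch with hypothetical counts:

def recall(tp, fn):
    # correctly +ve labeled over everyone who is diabetic in reality
    return tp / (tp + fn)

print(recall(tp=7, fn=3))  # 0.7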

F1-score (aka F-Score / F-Measure)


F1 Score considers both precision and recall.
It is the harmonic mean (average) of precision and recall.
F1 Score is highest when there is some sort of balance between precision (p) & recall (r) in the system; it isn't so high if one measure is improved at the expense of the other.
For example, if P is 1 & R is 0, F1 score is 0.
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
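
A minimal sketch of the formula, including the P=1, R=0 edge case mentioned above:

def f1_score(p, r):
    # harmonic mean of precision p and recall r; defined as 0 when both are 0
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

print(f1_score(0.8, 0.7))  # ~0.747
print(f1_score(1.0, 0.0))  # 0.0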

Specificity
Specificity is the ratio of subjects correctly labeled -ve by the program to all who are healthy in reality.
Specificity answers the following question: Of all the people who are healthy, how many did we correctly predict?
Specificity = TN/(TN+FP)
numerator: -ve labeled healthy people.
denominator: all people who are healthy in reality (whether +ve or -ve
labeled)
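
And a minimal sketch with hypothetical counts:

def specificity(tn, fp):
    # correctly -ve labeled over everyone who is healthy in reality
    return tn / (tn + fp)

print(specificity(tn=6, fp=4))  # 0.6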

. . .

General Notes


Yes, accuracy is a great measure, but only when you have symmetric datasets (the false negative & false positive counts are close) and false negatives & false positives have similar costs. If the costs of false positives and false negatives are different, then F1 is your savior. F1 is best if you have an uneven class distribution.
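
A quick sketch of why, on a made-up imbalanced dataset (95 healthy students, 5 diabetic) where a lazy model predicts healthy for everyone:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 95 + [1] * 5  # 95 healthy, 5 diabetic (hypothetical)
y_pred = [0] * 100           # model predicts healthy for everyone

print(accuracy_score(y_true, y_pred))  # 0.95, looks great
print(f1_score(y_true, y_pred))        # 0.0 (sklearn warns precision is ill-defined): no diabetic is caught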

Precision is how sure you are of your true positives whilst recall is how
sure you are that you are not missing any positives.
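
One way to see this tension is to sweep the decision threshold of a classifier: raising it generally trades recall for precision. A sketch with made-up scores, using scikit-learn's precision_recall_curve:

from sklearn.metrics import precision_recall_curve

y_true  = [0, 0, 1, 0, 1, 1, 0, 1]                    # hypothetical true labels
y_score = [0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.9]  # hypothetical predicted probabilities

precisions, recalls, thresholds = precision_recall_curve(y_true, y_score)
for p, r, t in zip(precisions, recalls, thresholds):
    print(f"threshold {t:.2f}: precision {p:.2f}, recall {r:.2f}")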

Choose Recall if the idea of false positives is far better than false negatives; in other words, if the occurrence of false negatives is unacceptable/intolerable, so that you'd rather get some extra false positives (false alarms) than let some false negatives slip through, like in our diabetes example.
You'd rather get some healthy people labeled diabetic than leave a diabetic person labeled healthy.

Choose precision if you want to be more confident of your true positives. For example, spam emails: you'd rather have some spam emails in your inbox than some regular emails in your spam box. So the email company wants to be extra sure that email Y is spam before they put it in the spam box and you never get to see it.

Choose Specificity if you want to cover all true negatives, meaning you don't want any false alarms, you don't want any false positives. For example, suppose you're running a drug test in which all people who test positive will immediately go to jail; you don't want anyone drug-free going to jail. False positives here are intolerable.

. . .


Bottom Line is
— An accuracy value of 90% means that 1 of every 10 labels is incorrect, and 9 are correct.
— A precision value of 80% means that, on average, 2 of every 10 students our program labels as diabetic are healthy, and 8 are diabetic.
— A recall value of 70% means that 3 of every 10 diabetic people in reality are missed by our program, and 7 are labeled as diabetic.
— A specificity value of 60% means that 4 of every 10 healthy people in reality are mislabeled as diabetic, and 6 are correctly labeled as healthy.

. . .

Confusion Matrix
Wikipedia will explain it better than me

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).


A nice & easy how-to of calculating a confusion matrix is here.

from sklearn.metrics import confusion_matrix

# true negatives, false positives, false negatives, true positives
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
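
From those four counts you can compute every metric in this article by hand; a small sketch (the guards avoid dividing by zero on degenerate inputs):

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp) if tp + fp else 0.0
recall      = tp / (tp + fn) if tp + fn else 0.0
specificity = tn / (tn + fp) if tn + fp else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy, precision, recall, specificity, f1)  # 0.25 0.333... 0.5 0.0 0.4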

