Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on?
Based on your project, which performance metric to improve on?

Salma Ghoneim · Apr 2 · 5 min read

I will use a basic example to explain each performance metric, so that you really understand the difference between them and, in your next ML project, can choose the performance metric that best suits your problem.

Here we go
A school is running a machine learning primary diabetes scan on all of
its students.
The output is either diabetic (+ve) or healthy (-ve).

There are only 4 cases any student X could end up with.
We'll be using the following as a reference later, so don't hesitate to re-read it if you get confused.

• True positive (TP): Prediction is +ve and X is diabetic; we want that.

• True negative (TN): Prediction is -ve and X is healthy; we want that too.

• False positive (FP): Prediction is +ve and X is healthy; a false alarm, bad.

• False negative (FN): Prediction is -ve and X is diabetic; the worst.
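
To make the four cases concrete, here is a minimal Python counting sketch; the y_true and y_pred lists are made up for illustration (1 = diabetic, 0 = healthy):

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what each student actually is (hypothetical)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # what the scan predicted (hypothetical)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted +ve, actually diabetic
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted -ve, actually healthy
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarm
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # the worst case here

print(tp, tn, fp, fn)  # 3 3 1 1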


To remember that, there are 2 tricks:

- If it starts with true, then the prediction was correct, whether diabetic or not: a true positive is a diabetic person correctly predicted, and a true negative is a healthy person correctly predicted. Oppositely, if it starts with false, then the prediction was incorrect: a false positive is a healthy person incorrectly predicted as diabetic (+ve), and a false negative is a diabetic person incorrectly predicted as healthy (-ve).
- Positive or negative indicates the output of our program, while true or false judges whether this output is correct or incorrect.

Before I continue: true positives & true negatives are always good; we love the news the word true brings. Which leaves false positives and false negatives.
In our example, false positives are just a false alarm; a 2nd, more detailed scan will correct them. But a false negative label means they think they're healthy when they're not, which is, in our problem, the worst case of the 4.
Whether FP & FN are equally bad, or one of them is worse than the other, depends on your problem. This piece of information has a great impact on your choice of performance metric, so give it a thought before you continue.

. . .

Which performance metric to choose?


Accuracy
It’s the ratio of the correctly labeled subjects to the whole pool of
subjects.
Accuracy is the most intuitive one.
Accuracy answers the following question: How many students did
we correctly label out of all the students?
Accuracy = (TP+TN)/(TP+FP+FN+TN)
numerator: all correctly labeled subjects (all trues)
denominator: all subjects
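
As a minimal sketch (the counts are hypothetical, picked to give 90%):

def accuracy(tp, tn, fp, fn):
    # correctly labeled subjects over all subjects
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=5, tn=4, fp=1, fn=0))  # 0.9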

Precision
Precision is the ratio of subjects correctly labeled +ve by our program to all subjects labeled +ve.
Precision answers the following: How many of those who we
labeled as diabetic are actually diabetic?
Precision = TP/(TP+FP)
numerator: +ve labeled diabetic people.
denominator: all +ve labeled by our program (whether they’re diabetic
or not in reality).
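
In code, a minimal sketch with hypothetical counts:

def precision(tp, fp):
    # correctly +ve labeled over everything our program labeled +ve
    return tp / (tp + fp)

print(precision(tp=8, fp=2))  # 0.8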

Recall (aka Sensitivity)


Recall is the ratio of subjects correctly labeled +ve by our program to all who are diabetic in reality.
Recall answers the following question: Of all the people who are diabetic, how many did we correctly predict?
Recall = TP/(TP+FN)
numerator: +ve labeled diabetic people.


denominator: all people who are diabetic (whether detected by our program or not)
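
Again as a minimal sketch with hypothetical counts:

def recall(tp, fn):
    # correctly +ve labeled over everyone who is diabetic in reality
    return tp / (tp + fn)

print(recall(tp=7, fn=3))  # 0.7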

F1-score (aka F-Score / F-Measure)


F1 Score considers both precision and recall.
It is the harmonic mean (average) of precision and recall.
F1 Score is highest when there is some sort of balance between precision (p) & recall (r) in the system; it isn't so high if one measure is improved at the expense of the other.
For example, if P is 1 & R is 0, F1 score is 0.
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
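
A minimal sketch of the formula, including the P=1, R=0 edge case mentioned above:

def f1_score(p, r):
    # harmonic mean of precision p and recall r; defined as 0 when both are 0
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

print(f1_score(0.8, 0.7))  # ~0.747
print(f1_score(1.0, 0.0))  # 0.0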

Specificity
Specificity is the ratio of subjects correctly labeled -ve by the program to all who are healthy in reality.
Specificity answers the following question: Of all the people who are healthy, how many did we correctly predict?
Specificity = TN/(TN+FP)
numerator: -ve labeled healthy people.
denominator: all people who are healthy in reality (whether +ve or -ve
labeled)
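
And a minimal sketch with hypothetical counts:

def specificity(tn, fp):
    # correctly -ve labeled over everyone who is healthy in reality
    return tn / (tn + fp)

print(specificity(tn=6, fp=4))  # 0.6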

. . .

General Notes


Yes, accuracy is a great measure, but only when you have symmetric datasets (the false negative & false positive counts are close) and false negatives & false positives have similar costs. If the costs of false positives and false negatives are different, then F1 is your savior. F1 is best if you have an uneven class distribution.
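
A quick sketch of why, on a made-up imbalanced dataset (95 healthy students, 5 diabetic) where a lazy model predicts healthy for everyone:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 95 + [1] * 5  # 95 healthy, 5 diabetic (hypothetical)
y_pred = [0] * 100           # model predicts healthy for everyone

print(accuracy_score(y_true, y_pred))  # 0.95, looks great
print(f1_score(y_true, y_pred))        # 0.0 (sklearn warns precision is ill-defined): no diabetic is caught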

Precision is how sure you are of your true positives whilst recall is how
sure you are that you are not missing any positives.
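
One way to see this tension is to sweep the decision threshold of a classifier: raising it generally trades recall for precision. A sketch with made-up scores, using scikit-learn's precision_recall_curve:

from sklearn.metrics import precision_recall_curve

y_true  = [0, 0, 1, 0, 1, 1, 0, 1]                    # hypothetical true labels
y_score = [0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.9]  # hypothetical predicted probabilities

precisions, recalls, thresholds = precision_recall_curve(y_true, y_score)
for p, r, t in zip(precisions, recalls, thresholds):
    print(f"threshold {t:.2f}: precision {p:.2f}, recall {r:.2f}")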

Choose Recall if the idea of false positives is far better than false negatives; in other words, if the occurrence of false negatives is unacceptable/intolerable, so that you'd rather get some extra false positives (false alarms) than let some false negatives slip through, like in our diabetes example.
You'd rather get some healthy people labeled diabetic than leave a diabetic person labeled healthy.

Choose precision if you want to be more confident of your true positives. For example, spam emails: you'd rather have some spam emails in your inbox than some regular emails in your spam box. So the email company wants to be extra sure that email Y is spam before they put it in the spam box and you never get to see it.

Choose Specificity if you want to cover all true negatives, meaning you don't want any false alarms, you don't want any false positives. For example, suppose you're running a drug test in which all people who test positive will immediately go to jail; you don't want anyone drug-free going to jail. False positives here are intolerable.

. . .


Bottom Line is
— An accuracy value of 90% means that 1 of every 10 labels is incorrect, and 9 are correct.
— A precision value of 80% means that, on average, 2 of every 10 students our program labels as diabetic are healthy, and 8 are diabetic.
— A recall value of 70% means that 3 of every 10 diabetic people in reality are missed by our program, and 7 are labeled as diabetic.
— A specificity value of 60% means that 4 of every 10 healthy people in reality are mislabeled as diabetic, and 6 are correctly labeled as healthy.

. . .

Confusion Matrix
Wikipedia will explain it better than me

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).


A nice & easy how-to of calculating a confusion matrix is here.

from sklearn.metrics import confusion_matrix

# true negatives, false positives, false negatives, true positives
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
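
From those four counts you can compute every metric in this article by hand; a small sketch (the guards avoid dividing by zero on degenerate inputs):

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp) if tp + fp else 0.0
recall      = tp / (tp + fn) if tp + fn else 0.0
specificity = tn / (tn + fp) if tn + fp else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy, precision, recall, specificity, f1)  # 0.25 0.333... 0.5 0.0 0.4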

