Accuracy, Recall, Precision, F-Score & Specificity, Which To Optimize On
https://towardsdatascience.com/accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124 1/10
03/05/2019 Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on?
Here we go
A school is running a machine learning primary diabetes scan on all of
its students.
The output is either diabetic (+ve) or healthy (-ve).
Before I continue: true positives and true negatives are always good. We
love the news the word "true" brings. That leaves false positives and
false negatives.
In our example, a false positive is just a false alarm; a second, more
detailed scan will correct it. But a false negative label means the
student thinks they're healthy when they're not, which, in our problem,
is the worst of the 4 cases.
Whether FP and FN are equally bad, or one of them is worse than the
other, depends on your problem. This piece of information has a great
impact on your choice of performance metric, so give it a thought
before you continue.
. . .
Accuracy
It’s the ratio of the correctly labeled subjects to the whole pool of
subjects.
Accuracy is the most intuitive one.
Accuracy answers the following question: How many students did
we correctly label out of all the students?
Accuracy = (TP+TN)/(TP+FP+FN+TN)
numerator: all correctly labeled subjects (all trues)
denominator: all subjects
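As a minimal sketch, the formula can be computed directly from the four confusion-matrix counts. The counts below are hypothetical, not from the article:

```python
# Hypothetical counts from the school's scan (illustrative only)
tp, tn, fp, fn = 45, 920, 25, 10

# Accuracy = (TP + TN) / (TP + FP + FN + TN)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 3))  # 0.965
```

Note how the large number of true negatives (healthy students) dominates the score, which hints at why accuracy can mislead on imbalanced data.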
Precision
Precision is the ratio of the correctly +ve labeled by our program to all
+ve labeled.
Precision answers the following: How many of those who we
labeled as diabetic are actually diabetic?
Precision = TP/(TP+FP)
numerator: +ve labeled diabetic people.
denominator: all +ve labeled by our program (whether they’re diabetic
or not in reality).
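Using the same style of hypothetical counts, precision only looks at the subjects our program flagged as +ve:

```python
# Hypothetical counts: correct diabetic labels vs. false alarms
tp, fp = 45, 25

# Precision = TP / (TP + FP)
precision = tp / (tp + fp)
print(round(precision, 3))  # 0.643
```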
Specificity
Specificity is the ratio of the correctly -ve labeled by our program to
all who are healthy in reality.
Specificity answers the following question: Of all the people who are
healthy, how many of those did we correctly predict?
Specificity = TN/(TN+FP)
numerator: -ve labeled healthy people.
denominator: all people who are healthy in reality (whether +ve or -ve
labeled)
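Specificity mirrors precision but from the healthy side, as this sketch with hypothetical counts shows:

```python
# Hypothetical counts: correct healthy labels vs. false alarms
tn, fp = 920, 25

# Specificity = TN / (TN + FP)
specificity = tn / (tn + fp)
print(round(specificity, 3))  # 0.974
```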
. . .
General Notes
Yes, accuracy is a great measure, but only when you have symmetric
datasets (false negative and false positive counts are close) and false
negatives and false positives have similar costs.
If the costs of false positives and false negatives are different, then
F1 is your savior. F1 is best if you have an uneven class distribution.
Precision is how sure you are of your true positives whilst recall is how
sure you are that you are not missing any positives.
Choose recall if a false positive is far more tolerable than a false
negative, in other words, if the occurrence of false negatives is
unacceptable/intolerable and you'd rather get some extra false
positives (false alarms) than miss any real positives, like in our
diabetes example.
You'd rather have some healthy people labeled diabetic than leave a
diabetic person labeled healthy.
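This excerpt refers to recall and F1 without spelling out their formulas, so here is a standard sketch with the same hypothetical counts. Recall is TP/(TP+FN), and F1 is the harmonic mean of precision and recall:

```python
# Hypothetical counts (illustrative only)
tp, fp, fn = 45, 25, 10

# Recall = TP / (TP + FN): how many real diabetics did we catch?
recall = tp / (tp + fn)

# F1 = harmonic mean of precision and recall
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
print(round(recall, 3), round(f1, 3))  # 0.818 0.72
```

Because F1 combines both error types, it punishes a model that trades one away too aggressively for the other.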
. . .
Bottom Line is
— An accuracy value of 90% means that 1 of every 10 labels is
incorrect, and 9 are correct.
— A precision value of 80% means that, on average, 2 of every 10
students labeled diabetic by our program are actually healthy, and 8
are diabetic.
— A recall value of 70% means that 3 of every 10 diabetic people in
reality are missed by our program, and 7 are labeled as diabetic.
— A specificity value of 60% means that 4 of every 10 healthy people
in reality are mislabeled as diabetic, and 6 are correctly labeled as
healthy.
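The recall statement above can be sanity-checked with a tiny sketch (the 10 diabetic students are a hypothetical cohort):

```python
# Hypothetical: 10 truly diabetic students, recall of 70%
diabetic_in_reality = 10
recall = 0.7

caught = round(diabetic_in_reality * recall)  # labeled diabetic (TP)
missed = diabetic_in_reality - caught         # false negatives (FN)
print(caught, missed)  # 7 3
```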
. . .
Confusion Matrix
Wikipedia will explain it better than me
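In the meantime, a confusion matrix is just the four counts arranged in a 2x2 grid. Conventions differ on which axis is "actual" vs. "predicted"; this sketch (hypothetical counts again) puts actual classes in rows:

```python
# Hypothetical counts (illustrative only)
tp, fp, fn, tn = 45, 25, 10, 920

# Rows = actual class, columns = predicted class.
confusion = [
    [tp, fn],  # actually diabetic: predicted +ve, predicted -ve
    [fp, tn],  # actually healthy:  predicted +ve, predicted -ve
]
total = sum(sum(row) for row in confusion)
print(total)  # 1000
```

All four metrics in this article are ratios of cells (or cell sums) of this matrix, which is why it is the usual starting point.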