2. Model Performance Metrics
Model performance metrics are measurements used to evaluate the effectiveness and efficiency of a predictive model or machine learning algorithm. Common metrics for evaluating a predictive model include:
Accuracy
Precision
Recall (Sensitivity)
F1-Score
Confusion Matrix
ROC Curve and AUC
Please check the description box for the link to Machine Learning videos.
3. TP, TN, FP, FN
A true positive (TP) is an outcome where the model correctly predicts the positive class; similarly, a true negative (TN) is an outcome where the model correctly predicts the negative class.
A false positive (FP) is an outcome where the model incorrectly predicts the positive class, and a false negative (FN) is an outcome where the model incorrectly predicts the negative class.

                    Predicted Positive     Predicted Negative
Actual Positive     True Positive (TP)     False Negative (FN)
Actual Negative     False Positive (FP)    True Negative (TN)
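To make these four outcomes concrete, here is a minimal Python sketch (not part of the original deck; the label lists are made up for illustration) that counts TP, TN, FP, and FN for a binary classifier:

```python
# Count TP, TN, FP, FN for binary labels (illustrative sketch, not from the slides).
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```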
4. Accuracy
Accuracy: Accuracy is the ratio of the number of correct predictions to the total number of predictions. It is calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy is useful in binary classification with balanced classes, and it can also be used to evaluate multiclass classification models when the classes are balanced.
When the classes in the dataset are highly imbalanced, meaning there is a significant disparity in the number of instances between classes, accuracy can be misleading. A model may achieve high accuracy simply by predicting the majority class for every instance, ignoring the minority class entirely.
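Continuing the illustrative sketch above, accuracy follows directly from the four counts:

```python
# Accuracy from the four confusion counts (continuing the sketch above).
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=3, tn=3, fp=1, fn=1))  # 0.75 for the toy labels used earlier
```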
5. Example
Let's consider a medical diagnosis scenario where we are developing a model to predict whether a patient has a rare disease. Suppose we have a dataset of 100 patients, of which only 2 have the disease. This dataset represents a highly imbalanced scenario.
Now suppose we build a simple classifier that always predicts that a patient does not have the disease. It classifies the 98 disease-free patients correctly and misses both patients who have the disease, so its accuracy is 98/100 = 98%. Despite this high accuracy, the classifier is not useful, because it fails to identify any patient with the disease.
In such cases, evaluation metrics like precision, recall, or the F1-score provide more insightful information about the model's performance, especially concerning its ability to correctly identify the minority class (patients with the disease).
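A short sketch reproducing this scenario (using the slide's numbers: 2 diseased patients out of 100, and a classifier that always predicts "no disease") shows how accuracy hides the failure while recall exposes it:

```python
# Rare-disease example: 2 positives out of 100, classifier always predicts "no disease".
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))   # 0
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))   # 98
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))   # 2

print((tp + tn) / len(y_true))  # accuracy = 0.98, looks impressive
print(tp / (tp + fn))           # recall = 0.0, not a single diseased patient is found
```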
6. Precision
Precision: Precision is a measure of a model's performance that tells us how many of the positive predictions made by the model are actually correct. It is calculated as:
Precision = TP / (TP + FP)
Precision is particularly useful in scenarios where the cost of false positives is high.
Precision matters in music or video recommendation systems, e-commerce websites, and similar applications, where wrong results can lead to customer churn and harm the business.
It gives us insight into the model's ability to avoid false positives: a higher precision indicates fewer false positives.
7. Example
• Suppose we have a dataset of 1000 emails, of which 200 are spam (positive class) and 800 are not spam (negative class). After training our spam detection model, it predicts that 250 emails are spam.
• True Positives (TP): 150 (correctly identified spam emails)
• False Positives (FP): 100 (non-spam emails incorrectly classified as spam)
• Using these numbers, let's calculate precision:
• Precision = 150 / (150 + 100) = 150/250 = 0.6
• So, the precision of the model is 0.6, or 60%. This means that out of all the emails predicted as spam, 60% are actually spam.
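As a quick check of this arithmetic, a one-function sketch (a hypothetical helper, not from the deck) computes the same value from the spam-filter counts:

```python
# Precision from confusion counts, using the spam-filter numbers above.
def precision(tp, fp):
    return tp / (tp + fp)

print(precision(tp=150, fp=100))  # 0.6 -> 60% of the predicted spam is actually spam
```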
8. Recall (Sensitivity)
Recall: Also known as sensitivity or true positive rate, recall measures the proportion of true positive predictions among all actual positive instances in the dataset. It is calculated as:
Recall = TP / (TP + FN)
Recall is particularly useful in scenarios where capturing all positive instances is crucial, even if it means accepting a higher rate of false positives.
In medical diagnosis, missing a positive instance (a false negative) can have severe consequences for the patient's health or even lead to loss of life. High recall ensures that the model identifies as many positive cases as possible, reducing the likelihood of missing critical diagnoses.
It gives us insight into the model's ability to avoid false negatives, which are cases where patients with the disease are incorrectly diagnosed as not having it.
9. Example
• Suppose we have a dataset of 100 patients who were tested for a specific disease, where 20 patients actually have the disease (positive class) and 80 patients do not (negative class).
• After training our diagnostic model, we obtain:
• True Positives (TP): 15 (patients correctly diagnosed with the disease)
• False Positives (FP): 5 (patients incorrectly diagnosed with the disease)
• False Negatives (FN): 5 (patients with the disease incorrectly diagnosed as not having it)
• True Negatives (TN): 75 (patients correctly diagnosed as not having the disease)
• Recall = 15 / (15 + 5) = 15/20 = 0.75
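And the same kind of check for recall, using the counts from this diagnostic example (again an illustrative sketch rather than the deck's own code):

```python
# Recall from confusion counts, using the diagnostic numbers above.
def recall(tp, fn):
    return tp / (tp + fn)

print(recall(tp=15, fn=5))  # 0.75 -> 75% of the diseased patients are identified
```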
10. Precision vs Recall
• Precision can be seen as a measure of quality.
• Higher precision means that an algorithm returns more relevant results than irrelevant ones.
• Precision measures the accuracy of positive predictions.
• Precision is important when the cost of false positives is high (e.g. spam detection).
• Recall can be seen as a measure of quantity.
• Higher recall means that an algorithm returns most of the relevant results (whether or not irrelevant ones are also returned).
• Recall measures the completeness of positive predictions.
• Recall is important when the cost of false negatives is high (e.g. disease diagnosis).
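The deck's metric list also names the F1-score; as a closing sketch (not covered in detail in the original slides), the F1-score is the harmonic mean of precision and recall and gives a single number that balances the two concerns compared above:

```python
# F1-score: harmonic mean of precision and recall (balances the two columns above).
def f1_score(tp, fp, fn):
    p = tp / (tp + fp)   # precision
    r = tp / (tp + fn)   # recall
    return 2 * p * r / (p + r)

# With the diagnostic example (TP=15, FP=5, FN=5): precision = recall = 0.75, so F1 = 0.75.
print(f1_score(tp=15, fp=5, fn=5))  # 0.75
```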