Performance Measures - Session 2
3. Regression algorithms are used with continuous data; classification algorithms are used with discrete data.
4. In regression, we try to find the best-fit line, which can predict the output more accurately; in classification, we try to find the decision boundary, which can divide the dataset into different classes.
5. Regression algorithms can be used to solve regression problems such as weather prediction and house price prediction; classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, and identification of cancer cells.
6. Regression algorithms can be further divided into Linear and Non-linear Regression; classification algorithms can be divided into Binary and Multi-class classifiers.
Confusion Matrix
• Need for a Confusion Matrix
• What is a Confusion Matrix?
• Confusion Matrix example
• Metrics in a Confusion Matrix
• Confusion Matrix for multiclass classification
• Key points in a Confusion Matrix
Confusion Matrix: Health Care Sector Example
What is a Confusion Matrix?
Metrics in a Confusion Matrix
Performance Metrics
• Accuracy
• Precision
• Recall
• Specificity
• AUC
• F1 Score

Performance Charts
• ROC Curve
• Precision/Recall Curve
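As a small sketch of how a performance chart is built, the points of an ROC curve can be generated by sweeping a decision threshold over the model's scores and computing the true-positive and false-positive rates at each threshold. The scores and labels below are made-up illustrative data, not from the slides:

```python
# Sketch: ROC curve points by sweeping a decision threshold (illustrative data).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]   # model scores (assumed)
labels = [1,   1,   0,   1,   0,    1,   0,   0]     # true classes (assumed)

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting TPR against FPR gives the ROC curve; the area under it is the AUC from the metrics list above.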
Accuracy
Accuracy is the most common performance metric for classification algorithms. It is defined as the number of correct predictions made as a ratio of all predictions made:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision
Precision = TP / (TP + FP): of all instances predicted as positive, the fraction that are actually positive.

Recall
Recall = TP / (TP + FN): of all actually positive instances, the fraction the model correctly predicts as positive.

F1 Score
F1 = 2TP / (2TP + FP + FN)
OR
F1 = 2 * (precision * recall) / (precision + recall)
F-measure:
• If one model has low precision and high recall and another has the reverse, it is difficult to compare the two models.
• For this purpose, we can use the F-score, which evaluates recall and precision at the same time. The F-score is maximum when recall equals precision.
Misclassification Rate
Error rate = (FP + FN) / (TP + TN + FP + FN) = 1 - Accuracy
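The binary-classification metrics above can be sketched in a few lines of Python; the function and the example counts below are illustrative, not from the slides:

```python
# Sketch: binary-classification metrics from confusion-matrix counts.
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and error rate from TP/FP/FN/TN."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    error_rate = (fp + fn) / total          # equals 1 - accuracy
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "error_rate": error_rate}

# Hypothetical counts, for illustration only:
print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
```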
Confusion Matrix for Multiclass Classification
(Example: a multiclass confusion matrix for penguin species such as Adelie.)
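A multiclass confusion matrix can be built by counting (actual, predicted) pairs; rows are the actual classes and columns the predicted ones. A minimal sketch, assuming penguin-species labels (only "Adelie" appears in the slides; "Chinstrap" and "Gentoo" are assumed here for illustration):

```python
# Sketch: building a multiclass confusion matrix (rows = actual, cols = predicted).
def confusion_matrix(actual, predicted, classes):
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[idx[a]][idx[p]] += 1
    return m

# Illustrative labels; "Chinstrap" and "Gentoo" are assumed class names.
classes = ["Adelie", "Chinstrap", "Gentoo"]
actual    = ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"]
predicted = ["Adelie", "Gentoo", "Gentoo", "Chinstrap", "Adelie"]
for row in confusion_matrix(actual, predicted, classes):
    print(row)
```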
Problem 1
Problem 2
1. Suppose we train a model to predict whether an email is Spam or Not Spam. After training the model, we apply it to a test set of 500 new email messages (also labeled), and the model produces the contingency table below.

                         True Class
                         Spam    Not Spam
Predicted    Spam          70          30
Class        Not Spam      70         330
[A] Compute the precision of this model with respect to the Spam class.
[B] Compute the recall of this model with respect to the Not Spam class.
[C] Suppose we have two users (Emily and Simon) with the following preferences.
Emily hates seeing spam emails in her inbox! However, she doesn’t mind periodically checking the “Junk” directory for
genuine emails incorrectly marked as spam.
Simon doesn’t even know where the “Junk” directory is. He would much prefer to see spam emails in his inbox than to miss
genuine emails without knowing!
Which user is more likely to be satisfied with this classifier? Why?
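Parts [A] and [B] reduce to reading counts off the table; a minimal sketch of the arithmetic (variable names are mine, the counts are from the table above):

```python
# Counts from the contingency table (rows = predicted class, cols = true class).
pred_spam_true_spam = 70     # predicted Spam, actually Spam
pred_spam_true_not  = 30     # predicted Spam, actually Not Spam
pred_not_true_spam  = 70     # predicted Not Spam, actually Spam
pred_not_true_not   = 330    # predicted Not Spam, actually Not Spam

# [A] Precision w.r.t. Spam: of emails predicted Spam, how many are truly Spam?
precision_spam = pred_spam_true_spam / (pred_spam_true_spam + pred_spam_true_not)

# [B] Recall w.r.t. Not Spam: of truly Not Spam emails, how many are predicted Not Spam?
recall_not_spam = pred_not_true_not / (pred_not_true_not + pred_spam_true_not)

print(f"precision(Spam) = {precision_spam:.3f}")      # 70/100
print(f"recall(Not Spam) = {recall_not_spam:.3f}")    # 330/360
```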
Problem 3
Problem 4
• Consider the following 3-class confusion matrix. Calculate precision and recall per class. Also calculate the weighted-average precision and recall for the classifier.
                    Predicted
               A     B     C   | Total
Actual   A    15     2     3   |  20
         B     7    15     8   |  30
         C     2     3    45   |  50
Total         24    20    56   | 100
Precision
• Precision = TP / (TP + FP) (per class: diagonal entry / predicted-column total)
• Class A: 15 / 24 = 0.625
• Class B: 15 / 20 = 0.75
• Class C: 45 / 56 ≈ 0.80

Recall
• Recall = TP / (TP + FN) (per class: diagonal entry / actual-row total)
• Class A: 15 / 20 = 0.75
• Class B: 15 / 30 = 0.50
• Class C: 45 / 50 = 0.90
Accuracy
• Accuracy = (15 + 15 + 45) / 100 = 0.75

Weighted Averages
• Weighted precision = Σ (class support / total) × class precision = 0.20 × 0.625 + 0.30 × 0.75 + 0.50 × (45/56) ≈ 0.75
• Weighted recall = Σ (class support / total) × class recall = 0.20 × 0.75 + 0.30 × 0.50 + 0.50 × 0.90 = 0.75
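The per-class and weighted-average calculations for this 3-class matrix can be checked with a short sketch (the matrix is the one given in the problem; the loop structure is mine):

```python
# Sketch: per-class precision/recall and weighted averages for the 3-class matrix.
matrix = [[15, 2, 3],    # actual A
          [7, 15, 8],    # actual B
          [2, 3, 45]]    # actual C
n = sum(sum(row) for row in matrix)                    # 100 samples

precision, recall, support = [], [], []
for i in range(3):
    tp = matrix[i][i]
    col_total = sum(matrix[r][i] for r in range(3))    # predicted as class i
    row_total = sum(matrix[i])                         # actually class i
    precision.append(tp / col_total)
    recall.append(tp / row_total)
    support.append(row_total)

weighted_precision = sum(s / n * p for s, p in zip(support, precision))
weighted_recall = sum(s / n * r for s, r in zip(support, recall))
accuracy = sum(matrix[i][i] for i in range(3)) / n

print("precision per class:", precision)
print("recall per class:", recall)
print("weighted precision:", weighted_precision)
print("weighted recall:", weighted_recall, "accuracy:", accuracy)
```

Note that the weighted-average recall equals the overall accuracy, since both sum the diagonal counts over the total.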