
Introduction to Data Science

DSA1101

Semester 1, 2019/2020
Week 4
Diagnostics of Classifiers

1 / 38
Diagnostics of Classifiers

We have studied the k-nearest neighbor algorithm as an example of a classifier.
However, there is a need to evaluate the performance of such classifiers.

2 / 38
Diagnostics of Classifiers

k-nearest neighbor is often used as a classifier to assign class labels to a person, item, or transaction.
In general, for two class labels, C and ¬C, where ¬C denotes “not C,” some working definitions follow (a small counting sketch is given after the list):
- True Positive: Predict C, when actually C
- True Negative: Predict ¬C, when actually ¬C
- False Positive: Predict C, when actually ¬C
- False Negative: Predict ¬C, when actually C
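As a minimal illustration (not from the slides), the following Python sketch counts these four outcomes from a pair of hypothetical label vectors, with "spam" playing the role of the positive class C:

    actual    = ["spam", "spam", "non-spam", "non-spam", "spam"]      # hypothetical true labels
    predicted = ["spam", "non-spam", "non-spam", "spam", "spam"]      # hypothetical classifier output

    tp = sum(a == "spam"     and p == "spam"     for a, p in zip(actual, predicted))
    tn = sum(a == "non-spam" and p == "non-spam" for a, p in zip(actual, predicted))
    fp = sum(a == "non-spam" and p == "spam"     for a, p in zip(actual, predicted))
    fn = sum(a == "spam"     and p == "non-spam" for a, p in zip(actual, predicted))

    print(tp, tn, fp, fn)  # 2 1 1 1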

3 / 38
Diagnostics of Classifiers

We will study the confusion matrix, which is a specific table layout that allows visualization of the performance of a classifier.
In a two-class classification, a preset threshold may be used to separate positives from negatives (e.g. the majority rule in the k-nearest neighbor example, which predicts the positive class when the neighborhood proportion Ŷ ≥ 0.5).

                              Predicted Class
                              Positive                 Negative
Actual Class   Positive       True Positives (TP)      False Negatives (FN)
               Negative       False Positives (FP)     True Negatives (TN)
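A minimal sketch (assumed details, not shown on the slides) of the majority-rule thresholding mentioned above, where Ŷ is taken to be the proportion of positive labels among the k nearest neighbors:

    def majority_rule(y_hat, threshold=0.5):
        # y_hat: proportion of the k nearest neighbors that carry the positive label
        return "positive" if y_hat >= threshold else "negative"

    print(majority_rule(0.6))  # positive
    print(majority_rule(0.2))  # negative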

4 / 38
Diagnostics of Classifiers

TP and TN are the correct guesses.
A good classifier should have large TP and TN and small (ideally zero) numbers for FP and FN.

                              Predicted Class
                              Positive                 Negative
Actual Class   Positive       True Positives (TP)      False Negatives (FN)
               Negative       False Positives (FP)     True Negatives (TN)

5 / 38
Diagnostics of Classifiers: example

Consider a testing set of 100 emails (with their spam or non-spam labels known).
Below is an example confusion matrix of a k-nearest neighbor classifier used to predict whether each email is spam or not.

                              Predicted Class
                              Spam      Non-Spam      Total
Actual Class   Spam           3         8             11
               Non-Spam       2         87            89
               Total          5         95            100
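For later reference, the same counts can be written down directly in Python (spam is treated as the positive class):

    TP, FN = 3, 8    # actual spam:     predicted spam / predicted non-spam
    FP, TN = 2, 87   # actual non-spam: predicted spam / predicted non-spam

    print(TP + FN)            # 11 actual spam emails
    print(FP + TN)            # 89 actual non-spam emails
    print(TP + TN + FP + FN)  # 100 emails in the testing set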

6 / 38
Diagnostics of Classifiers

The accuracy (or the overall success rate) is a metric defining the rate at which a model has classified the records correctly.
It is defined as the sum of TP and TN divided by the total number of instances:

    Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
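A minimal sketch of this definition in Python, checked against the spam confusion matrix above:

    def accuracy(tp, tn, fp, fn):
        # overall success rate, as a percentage
        return (tp + tn) / (tp + tn + fp + fn) * 100

    print(accuracy(3, 87, 2, 8))  # 90.0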

7 / 38
Diagnostics of Classifiers

A good model should have a high accuracy score, but a high accuracy score alone does not guarantee that the model performs well.
We will introduce more fine-grained measures to better evaluate the performance of a classifier.

8 / 38
Diagnostics of Classifiers

The true positive rate (TPR) shows the proportion of positive instances the classifier correctly identified:

    TPR = TP / (TP + FN)

                              Predicted Class
                              Positive                 Negative
Actual Class   Positive       True Positives (TP)      False Negatives (FN)
               Negative       False Positives (FP)     True Negatives (TN)
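A short sketch of the TPR, evaluated on the spam example from the earlier slide:

    def tpr(tp, fn):
        # proportion of actual positives that were predicted positive
        return tp / (tp + fn)

    print(tpr(3, 8))  # 0.2727...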

9 / 38
Diagnostics of Classifiers

The false positive rate (FPR) shows what percent of negatives the classifier marked as positive.
The FPR is also called the false alarm rate or the type I error rate.

    FPR = FP / (FP + TN)
                              Predicted Class
                              Positive                 Negative
Actual Class   Positive       True Positives (TP)      False Negatives (FN)
               Negative       False Positives (FP)     True Negatives (TN)
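A short sketch of the FPR, again using the spam example counts:

    def fpr(fp, tn):
        # proportion of actual negatives that were (wrongly) predicted positive
        return fp / (fp + tn)

    print(fpr(2, 87))  # 0.0224...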

10 / 38
Diagnostics of Classifiers

The false negative rate (FNR) shows what percent of positives the classifier marked as negative.
It is also known as the miss rate or type II error rate.

    FNR = FN / (TP + FN)

                              Predicted Class
                              Positive                 Negative
Actual Class   Positive       True Positives (TP)      False Negatives (FN)
               Negative       False Positives (FP)     True Negatives (TN)
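A short sketch of the FNR for the spam example counts:

    def fnr(tp, fn):
        # proportion of actual positives that were (wrongly) predicted negative
        return fn / (tp + fn)

    print(fnr(3, 8))  # 0.7272...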

11 / 38
Diagnostics of Classifiers

Precision is the percentage of instances marked positive that really are positive:

    Precision = TP / (TP + FP)

                              Predicted Class
                              Positive                 Negative
Actual Class   Positive       True Positives (TP)      False Negatives (FN)
               Negative       False Positives (FP)     True Negatives (TN)
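A short sketch of precision for the spam example counts:

    def precision(tp, fp):
        # proportion of predicted positives that are actually positive
        return tp / (tp + fp)

    print(precision(3, 2))  # 0.6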

12 / 38
Diagnostics of Classifiers

A well-performing model should have a high TPR (ideally 1) and a low FPR and FNR (ideally 0).
In reality, it is rare to have TPR = 1, FPR = 0, and FNR = 0, but these measures are useful for comparing the performance of multiple models designed to solve the same problem.
Note that, in general, which model is preferable may depend on the business situation.

13 / 38
Diagnostics of Classifiers

During the discovery phase of the data analytics lifecycle, the team should have learned from the business what kind of errors can be tolerated.
Some business situations are more tolerant of type I errors, whereas others may be more tolerant of type II errors.

14 / 38
Diagnostics of Classifiers

Consider the example of e-mail spam filtering.
Some people (such as busy executives) only want important e-mail in their inbox and are tolerant of having some less important e-mail end up in their spam folder, as long as no spam is in their inbox.
In this case, a higher false positive rate (FPR) or type I error can be tolerated.

15 / 38
Diagnostics of Classifiers

Other people may not want any important or less important e-mail to be classified as spam, and are willing to have some spam in their inboxes as long as no important e-mail makes it into the spam folder.
In this case, a higher false negative rate (FNR) or type II error can be tolerated.

16 / 38
Diagnostics of Classifiers

Another example involves medical screening during an infectious disease outbreak.
The cost of diagnosing a person who has the disease as disease-free is extremely high, since the disease may be highly contagious.
Therefore, the false negative rate (FNR) or type II error needs to be low.
A higher false positive rate (FPR) or type I error can be tolerated.

17 / 38
Diagnostics of Classifiers

A third example involves security screening at the airport.
The cost of a false negative in this scenario is extremely high (not detecting a bomb being brought onto a plane could result in hundreds of deaths), whilst the cost of a false positive is relatively low (a reasonably simple further inspection).
Therefore, a higher false positive rate (FPR) or type I error can be tolerated, in order to keep the false negative rate (FNR) or type II error low.

18 / 38
Diagnostics of Classifiers: example

    Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
             = (3 + 87) / (3 + 87 + 2 + 8) × 100% = 90%
                              Predicted Class
                              Spam      Non-Spam      Total
Actual Class   Spam           3         8             11
               Non-Spam       2         87            89
               Total          5         95            100

19 / 38
Diagnostics of Classifiers: example

    TPR = TP / (TP + FN) = 3 / (3 + 8) ≈ 0.273
                              Predicted Class
                              Spam      Non-Spam      Total
Actual Class   Spam           3         8             11
               Non-Spam       2         87            89
               Total          5         95            100

20 / 38
Diagnostics of Classifiers: example

    FPR = FP / (FP + TN) = 2 / (2 + 87) ≈ 0.022
                              Predicted Class
                              Spam      Non-Spam      Total
Actual Class   Spam           3         8             11
               Non-Spam       2         87            89
               Total          5         95            100

21 / 38
Diagnostics of Classifiers: example

    FNR = FN / (TP + FN) = 8 / (3 + 8) ≈ 0.727
                              Predicted Class
                              Spam      Non-Spam      Total
Actual Class   Spam           3         8             11
               Non-Spam       2         87            89
               Total          5         95            100

22 / 38
Diagnostics of Classifiers: example

    Precision = TP / (TP + FP) = 3 / (3 + 2) = 0.6
                              Predicted Class
                              Spam      Non-Spam      Total
Actual Class   Spam           3         8             11
               Non-Spam       2         87            89
               Total          5         95            100
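Pulling the worked example together, a short self-contained check of the five numbers computed above:

    TP, FN, FP, TN = 3, 8, 2, 87

    print((TP + TN) / (TP + TN + FP + FN) * 100)   # accuracy:  90.0
    print(TP / (TP + FN))                          # TPR:       0.2727...
    print(FP / (FP + TN))                          # FPR:       0.0224...
    print(FN / (TP + FN))                          # FNR:       0.7272...
    print(TP / (TP + FP))                          # precision: 0.6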

23 / 38
Diagnostics of Classifiers

We have studied a number of measures that can be used to evaluate the performance of a classifier.
In practice, when we are presented with a dataset, how should we go about estimating these performance measures?
A common practice is to perform N-Fold Cross-Validation.

24 / 38
Diagnostics of Classifiers

The entire dataset is randomly split into N datasets of approximately equal size.
N − 1 of these datasets are treated as the training dataset, while the remaining one is the test dataset; a measure of the model error is obtained.
This process is repeated across the various combinations of N datasets taken N − 1 at a time.
The N observed model errors are averaged across the N folds.
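A minimal, self-contained Python sketch of this procedure for a 1-nearest neighbor classifier; the data below are made-up placeholders rather than the 10-point dataset used on the later slides:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 2))        # 10 made-up points with 2 features each
    y = rng.integers(0, 2, size=10)     # made-up binary labels (1 = spam, 0 = non-spam)

    def one_nn_predict(X_train, y_train, X_test):
        # For each test point, copy the label of its nearest training point.
        dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
        return y_train[np.argmin(dists, axis=1)]

    def n_fold_cv_accuracy(X, y, n_folds):
        indices = rng.permutation(len(y))                 # random split into N folds
        folds = np.array_split(indices, n_folds)
        fold_accuracies = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            y_pred = one_nn_predict(X[train_idx], y[train_idx], X[test_idx])
            fold_accuracies.append(np.mean(y_pred == y[test_idx]))
        return np.mean(fold_accuracies)                   # average accuracy over the N folds

    print(n_fold_cv_accuracy(X, y, n_folds=2))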

25 / 38
Diagnostics of Classifiers

26 / 38
Example: Anti-spam techniques

Let us illustrate N-Fold Cross-Validation with an example using the k-nearest neighbor classifier for spam, where we specify k = 1.
Suppose our dataset consists of 10 data points.

27 / 38
Diagnostics of Classifiers

For 2-fold cross-validation, we randomly split the whole dataset of 10 points into two datasets of 5 points each.

28 / 38
Example: Anti-spam techniques

29 / 38
Diagnostics of Classifiers

For the first iteration, we use the first dataset as the training
set and the second dataset as the testing set.

30 / 38
Example: Anti-spam techniques

31 / 38
Example: Anti-spam techniques

32 / 38
Diagnostics of Classifiers

In this iteration, we estimate the accuracy of the 1-nearest neighbor algorithm to be equal to 4/5.

33 / 38
Diagnostics of Classifiers

For the second iteration, we use the second dataset as the training set and the first dataset as the testing set.

34 / 38
Example: Anti-spam techniques

35 / 38
Example: Anti-spam techniques

36 / 38
Diagnostics of Classifiers

In this iteration, we estimate the accuracy of the 1-nearest neighbor algorithm to be equal to 3/5.

37 / 38
Diagnostics of Classifiers

Therefore, based on 2-fold cross-validation, the accuracy of the 1-nearest neighbor algorithm is estimated to be

    (4/5 + 3/5) / 2 = 7/10.

We will continue with more examples next week to ground these ideas.

38 / 38
