KNN Evaluation
Machine Intelligence
Lecture # 2
Spring 2024
1
Tentative Course Topics
2
Other Names
Non-parametric classification algorithm; also commonly described as an instance-based or lazy learning method.
4
Classification revisited
5
Classification revisited
Figure: a training set of labeled records with attributes Tid, Refund, Marital Status, Taxable Income and class label Cheat (e.g., Tid 3: No, Single, 70K, No; Tid 5: No, Divorced, 95K, Yes; Tid 10: No, Single, 90K, Yes) is fed to a learning algorithm to induce a model (classifier). The classifier is then applied to a test set of records whose Cheat label is unknown ("?").
6
Before K-NN vs. After K-NN
Figure: data points from Category 1 and Category 2 plotted against attributes x1 and x2; before K-NN the new point is unlabeled, after K-NN it is assigned to one of the two categories.
1. A positive integer k is specified (e.g., k = 1, 3, 5), along with a new sample.
2. We select the k entries in our training data set which are closest to the new sample.
3. We take the most common class among these k entries and assign it to the new sample.
Example
Two classes, two attributes.
How do we classify the new point (9.1, 11)?
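A minimal sketch of this example using scikit-learn's KNeighborsClassifier. Only the query point (9.1, 11) comes from the slide; the training points below are made up for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class, two-attribute training data (made-up points).
X_train = [[7.0, 7.0], [7.5, 8.0], [8.0, 7.5],        # class 0
           [10.0, 12.0], [10.5, 11.5], [11.0, 12.5]]  # class 1
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3
knn.fit(X_train, y_train)

print(knn.predict([[9.1, 11.0]]))           # predicted class for the new point
print(knn.kneighbors([[9.1, 11.0]]))        # distances and indices of the 3 nearest neighbors
```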
10
STEP 1: Choose the number K of neighbors: K = 5
Figure: the new point plotted among Category 1 and Category 2 points in the (x1, x2) plane.
STEP 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance.
For two points P1(x1, y1) and P2(x2, y2):
d(P1, P2) = sqrt((x2 - x1)^2 + (y2 - y1)^2)
STEP 3: Among these K = 5 neighbors, count the number of data points in each category:
Category 1: 3 neighbors; Category 2: 2 neighbors.
STEP 4: Assign the new data point to the category with the most neighbors: here, Category 1.
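A minimal from-scratch sketch of these four steps (pure Python, no libraries; the point coordinates are illustrative, chosen so that 3 of the 5 nearest neighbors belong to Category 1):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # STEP 1: k is chosen by the caller (here k = 5 by default).
    # STEP 2: compute the Euclidean distance from the query to every training point.
    distances = [(math.dist(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    distances.sort(key=lambda t: t[0])       # nearest first
    nearest = distances[:k]                  # the K nearest neighbors
    # STEP 3: count how many neighbors fall in each category.
    votes = Counter(label for _, label in nearest)
    # STEP 4: assign the query to the category with the most neighbors.
    return votes.most_common(1)[0][0]

points = [(1, 2), (2, 1), (2, 3), (4, 4), (5, 3), (9, 9), (8, 8)]
labels = ["Cat1", "Cat1", "Cat1", "Cat2", "Cat2", "Cat2", "Cat2"]
print(knn_classify(points, labels, query=(3, 3), k=5))   # -> "Cat1" (3 votes vs. 2)
```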
k-Nearest Neighbors (kNN): Classifying with Distance Measurements
16
Example
19
Components of a K-NN classifier
• Distance metric
− How do we measure distance between instances?
− Determines the layout of the example space; different metrics suit categorical vs. continuous variables (see the sketch below).
• The k hyper-parameter
− How large a neighborhood should we consider?
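A small sketch of two common choices, assuming Euclidean distance for continuous attributes and Hamming (mismatch count) distance for categorical ones:

```python
import math

def euclidean(a, b):
    """Distance between two instances with continuous attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Distance between two instances with categorical attributes:
    the number of attribute positions that differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean([75.0, 1.2], [60.0, 0.8]))           # continuous example
print(hamming(["No", "Single"], ["No", "Married"]))  # categorical example -> 1
```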
26
Decision Boundaries
28
Decision Boundary of a kNN
30
Variations on kNN
Weighted voting
• Default: all neighbors have equal weight.
• Extension: weight the votes of neighbors by (inverse) distance.
• The intuition behind weighted kNN is to give more weight to points which are nearby and less weight to points which are farther away.
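A minimal sketch of inverse-distance weighting (the 1/d weight is one common choice; the small epsilon guarding against division by zero is an implementation detail assumed here):

```python
from collections import defaultdict

def weighted_vote(neighbors, eps=1e-9):
    """neighbors: list of (distance, label) pairs for the k nearest points.
    Each neighbor votes with weight 1/distance, so closer points count more."""
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + eps)
    return max(scores, key=scores.get)

# Two far Category-2 neighbors are outvoted by one very close Category-1 neighbor.
print(weighted_vote([(0.2, "Cat1"), (3.0, "Cat2"), (4.0, "Cat2")]))  # -> "Cat1"
```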
31
https://www.cs.umd.edu/
Issues with KNN – Effect of K
32
Evaluation
Supervised Learning: Training and Testing
In supervised learning, the task is to learn a mapping from inputs to outputs given a
training dataset D = {(x1, y1), ..., (xn, yn)} of n input-output pairs.
35
Classification Loss Function
Our goal is to minimize the loss function, i.e., to find a set of parameters θ that makes
the misclassification rate as close to zero as possible.
Remember that for continuous labels or response variables, a common loss function is the
Mean Squared Error (MSE).
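A small sketch of the two loss functions mentioned above, computed on made-up labels:

```python
def misclassification_rate(y_true, y_pred):
    """0-1 loss: fraction of examples whose predicted class is wrong."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error, for continuous labels / response variables."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # -> 0.5
print(mse([2.0, 3.0, 5.0], [2.5, 3.0, 4.0]))               # -> (0.25 + 0 + 1) / 3
```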
36
Performance Measurement
37
Performance Measurement
• Squared loss: (y_pred - y)^2
• Linear (absolute) loss: |y_pred - y|
39
Accuracy: what fraction of the examples are classified correctly?
Figure: points from classes C1 and C2 with a decision boundary.
Acc = 9/10
5
Figure: points from classes C1 and C2 with two candidate models, M1 and M2.
• Acc(M1) = ?
• Acc(M2) = ?
6
Shortcomings of Accuracy
Let's delve into the possible classification cases. Either the classifier correctly labels a positive
example as positive (a true positive, TP), or it makes a mistake and marks it as negative (a false
negative, FN). Conversely, a negative example may be mislabeled as positive (a false positive, FP)
or correctly labeled negative (a true negative, TN).
Based on these four cases, we define the following metrics:
42
Example – Spam Classifier
In this case, accuracy = (10 + 100)/(10 + 100 + 25 + 15) = 73.3%. We may be tempted to think our
classifier is pretty decent, since it classified about 73% of all messages correctly.
However, look what happens when we switch it for a dumb classifier that always says "no spam":
43
A new dumb Spam Classifier
It labels every message as "not spam": its accuracy comes out even higher than 73.3%, yet it catches zero spam.
44
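A quick check of the arithmetic. The split of the 25 and 15 errors into false positives and false negatives is assumed here for illustration; the comparison holds either way:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Original classifier: 10 spam caught, 100 ham kept, 25 + 15 mistakes (assumed split).
print(accuracy(tp=10, tn=100, fp=15, fn=25))          # -> 0.733...

# "Dumb" classifier that always says "no spam": every actual spam becomes a
# false negative, every actual ham becomes a true negative.
print(accuracy(tp=0, tn=100 + 15, fp=0, fn=10 + 25))  # higher accuracy, zero spam caught
```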
• Imbalanced data (class distribution)!
• Some errors matter more than others …
− Given a medical record, predict whether the patient has COVID or not
− Given an email, detect whether it is spam
7
Accuracy paradox
This is called the accuracy paradox. When TP < FP, accuracy will always increase if we change
the classification rule to always output the "negative" category. Conversely, when TN < FN, the
same happens if we change the rule to always output "positive".
So what can we do so that we are not tricked into thinking one classifier model is better
than another when it really isn't?
46
Confusion matrix layout (rows = Actual, columns = Predicted):
                     Predicted positive    Predicted negative
Actual positive              TP                    FN
Actual negative              FP                    TN
8
Derive the confusion matrices for models M1 and M2 (class C1 = positive, class C2 = negative):

M1:                  Predicted positive    Predicted negative
Actual positive               ?                     ?
Actual negative               ?                     ?

M2:                  Predicted positive    Predicted negative
Actual positive               ?                     ?
Actual negative               ?                     ?
9
M1:                  Predicted positive    Predicted negative
Actual positive            TP = 1                FN = 1
Actual negative            FP = 2                TN = 6

M2:                  Predicted positive    Predicted negative
Actual positive            TP = 0                FN = 2
Actual negative            FP = 1                TN = 7
10
Precision: % of positive predictions that are correct, i.e., the number of correct positive
predictions divided by the total number of examples predicted positive.
Precision = TP / (TP + FP)
• Precision(M1) = 1 / (1 + 2) = 1/3
• Precision(M2) = 0 / (0 + 1) = 0
Recall: % of gold positive examples that are found, i.e., the number of correct positive
predictions divided by the total number of actual positives.
Recall = TP / (TP + FN)
• Recall(M1) = 1 / (1 + 1) = 1/2
• Recall(M2) = 0 / (0 + 2) = 0
12
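A minimal sketch reproducing these numbers from the two confusion matrices:

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of actual positives that the classifier finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# M1: TP=1, FN=1, FP=2, TN=6     M2: TP=0, FN=2, FP=1, TN=7
print(precision(tp=1, fp=2), recall(tp=1, fn=1))   # M1 -> 0.333..., 0.5
print(precision(tp=0, fp=1), recall(tp=0, fn=2))   # M2 -> 0.0, 0.0
```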
Which is better: high precision with low recall, or vice versa?
Figure: precision (vertical axis, 0 to 1) plotted against recall (horizontal axis, 0 to 1). A
high-precision, low-recall classifier detects few positive examples but misses many others; a
classifier that predicts everything as positive has high recall but low precision; the ideal
classifier sits at precision = 1, recall = 1.
15
False Positive or False Negative?
• Medical example (sick vs. healthy): what is worse, a False Positive or a False Negative?
• Spam detector example (spam vs. not spam): what is worse, a False Positive or a False Negative?
55
There should be a metric that combines both: the F-measure.
F1 = 2PR / (P + R) = 2 / (1/P + 1/R)
− Harmonic mean of precision and recall.
• Weighted F-measure (F_beta): gives different weightage to recall and precision:
F_beta = (1 + beta^2) · P · R / (beta^2 · P + R)
Beta represents how many times recall is more important than precision.
If the recall is twice as important as precision, the value of Beta is 2.
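A small sketch of both measures, applied to M1's precision and recall from the previous slides:

```python
def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta > 1 weights recall more heavily; beta = 1 gives the usual F1."""
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

p_m1, r_m1 = 1/3, 1/2
print(f_beta(p_m1, r_m1))            # F1 -> 0.4
print(f_beta(p_m1, r_m1, beta=2.0))  # F2: recall counted as twice as important
```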
16
             M1    M2
Precision     ?     ?
Recall        ?     ?
F1            ?     ?
17
             M1    M2
Precision   1/3     0
Recall      1/2     0
F1          0.4     0
18
Multi-class example (classes C1, C2, C3)
• Accuracy = ?
  = (3 + 3 + 1)/10 = 0.7
19
The multi-class results arranged as a confusion matrix: rows = Actual, columns = Predicted, over classes C1, C2, C3.
20
            Predicted C1   Predicted C2   Predicted C3
Actual C1         3              0              1
Actual C2         0              3              1
Actual C3         1              0              1

Per class:   C1   C2   C3
P             ?    ?    ?
R             ?    ?    ?
F1            ?    ?    ?
21
            Predicted C1   Predicted C2   Predicted C3
Actual C1         3              0              1
Actual C2         0              3              1
Actual C3         1              0              1

Per class:    C1     C2      C3
P            0.75     1      0.333
R            0.75    0.75    0.5
F1           0.75    0.86    0.4
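A minimal sketch that recovers these per-class values directly from the 3x3 confusion matrix:

```python
conf = [  # rows = actual C1..C3, columns = predicted C1..C3
    [3, 0, 1],
    [0, 3, 1],
    [1, 0, 1],
]

for i, name in enumerate(["C1", "C2", "C3"]):
    tp = conf[i][i]
    predicted_i = sum(row[i] for row in conf)   # column sum: everything predicted as class i
    actual_i = sum(conf[i])                     # row sum: everything actually in class i
    p = tp / predicted_i
    r = tp / actual_i
    f1 = 2 * p * r / (p + r)
    print(name, round(p, 3), round(r, 3), round(f1, 2))
# C1 0.75 0.75 0.75    C2 1.0 0.75 0.86    C3 0.333 0.5 0.4
```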
Overfitting:
Occurs when the model pays too much attention to the
specifics of the training data and is not able to
generalize well.
• Often this means the model is fitting noise, rather than whatever it is supposed to fit.
• Or there was not sufficient data to learn from.
Underfitting:
The learning algorithm had the opportunity to learn more
from the training data, but didn't (the model is too simple).
• Or there was not sufficient data to learn from.
66
67
CSCI417
Machine Intelligence
Thank you
Spring 2024
68