S3 K Nearest Neighbor LKW 15jan2025
Basic Idea of KNN
• Data-driven, not model-driven: no model is fitted in advance; a new record is classified directly from the stored training records (see the sketch below).
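To make this concrete, here is a minimal from-scratch sketch (not from the slides): classifying a record amounts to measuring distances to the stored records and taking a majority vote among the nearest ones; nothing is estimated beforehand. The arrays and the value of k are illustrative placeholders.

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x_new, k=3):
        """Classify x_new by majority vote among its k nearest training records."""
        # Euclidean distance from the new record to every stored record
        dists = np.linalg.norm(X_train - x_new, axis=1)
        # Indices of the k closest records
        nearest = np.argsort(dists)[:k]
        # Majority vote among the neighbors' class labels
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Illustrative placeholder records (two predictors, two classes)
    X_train = np.array([[60.0, 18.4], [85.5, 16.8], [64.8, 21.6], [51.0, 14.0]])
    y_train = np.array(["owner", "owner", "owner", "non-owner"])
    print(knn_classify(X_train, y_train, np.array([60.0, 20.0]), k=3))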
Choosing k
• k is the number of nearest neighbors used to classify the new record
• k = 1 means use the single nearest record
• Note: the extreme case of k = n (i.e., the entire data set) is the same as the “naïve rule” (classify all records according to the majority class); see the sketch below
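In practice, k is usually chosen by scoring a range of candidate values on a validation partition and keeping the k with the lowest validation error. A hedged sketch with synthetic placeholder data (the partitions and the k grid are assumptions, not from the slides); note how k = n reproduces the naïve rule:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic placeholder data standing in for the training partition
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    # Score each candidate k on the validation partition; k = n is the naive rule
    for k in [1, 3, 5, 7, len(X_train)]:
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        print(f"k={k:3d}  validation accuracy={knn.score(X_valid, y_valid):.3f}")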
Using KNN to classify
Example: Riding Mowers
• Data: 24 households classified as owning or not owning riding
mowers
KNN Classification – Prediction of validation data
Validation: Classification Summary
Confusion Matrix
Actual \ Predicted   non-owner   owner
non-owner                2         1
owner                    0         3
Error Report
Class       # Cases   # Errors   % Error
non-owner      3         1        33.33
owner          3         0         0.00
Overall        6         1        16.67
Metrics
Metric                  Value
Accuracy (# correct)    5
Accuracy (% correct)    83.33
Specificity             0.667
Sensitivity (Recall)    1.000
Precision               0.750
F1 score                0.857
Success Class           owner
Success Probability     0.5
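All of these metrics follow directly from the 2×2 confusion matrix above, with “owner” as the success class. A short check, using only the counts from the table:

    # Counts from the confusion matrix, with "owner" as the success class
    TP, FN = 3, 0   # actual owners predicted owner / non-owner
    TN, FP = 2, 1   # actual non-owners predicted non-owner / owner

    accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 5/6 ≈ 0.833
    sensitivity = TP / (TP + FN)                    # recall: 3/3 = 1.000
    specificity = TN / (TN + FP)                    # 2/3 ≈ 0.667
    precision   = TP / (TP + FP)                    # 3/4 = 0.750
    f1 = 2 * precision * sensitivity / (precision + sensitivity)   # ≈ 0.857
    print(accuracy, sensitivity, specificity, precision, f1)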
Classifying a new record (“x”): find neighbors (try k = 3)
New record: lot size = 20, income = 60. If k = 3, the three closest neighbors are records 4, 9, and 14. Two of the three are owners, so by majority vote we classify the new record as “owner.”
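In scikit-learn terms, this classification might look like the sketch below. The file name and column names are assumptions (the 24-household data set is not reproduced in the slides), and the predictors are standardized first so that income and lot size contribute comparably to the distance:

    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler

    # Assumed file and column names; the actual data set is not shown here
    df = pd.read_csv("RidingMowers.csv")   # columns: Income, Lot_Size, Ownership
    scaler = StandardScaler().fit(df[["Income", "Lot_Size"]])
    X = scaler.transform(df[["Income", "Lot_Size"]])

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, df["Ownership"])

    # New record from the slide: income = 60, lot size = 20, scaled the same way
    new = scaler.transform(pd.DataFrame({"Income": [60], "Lot_Size": [20]}))
    print(knn.predict(new))                # expected: "owner" by majority vote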
Using KNN for Prediction
(KNN for Numerical Outcome)
● Step 1: Determine neighbors by computing distances, just as in KNN classification.
● Step 2: Take the average of the neighbors’ outcome values as the prediction.
● Instead of the overall error used in KNN classification, use RMS error (or another prediction error metric) on the validation data to determine the best k (see the sketch below).
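Putting this together, a hedged sketch of KNN prediction with synthetic placeholder data (the data set and the k grid are assumptions): each prediction is the average outcome of the k nearest neighbors, and the k with the lowest validation RMS error wins.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.metrics import mean_squared_error

    # Synthetic placeholder data with a numerical outcome
    rng = np.random.default_rng(1)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

    # Pick the k with the lowest RMS error on the validation partition
    for k in [1, 3, 5, 10, 20]:
        knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        rmse = mean_squared_error(y_valid, knn.predict(X_valid)) ** 0.5
        print(f"k={k:2d}  validation RMSE={rmse:.3f}")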
Advantages of KNN algorithm
● Simple: no parametric model to fit and no assumptions about the form of the relationship between predictors and outcome.
Shortcomings of KNN algorithm
● Classifying a new record requires computing distances to all training records, so prediction is slow for large data sets.
● Performance degrades as the number of predictors grows (the “curse of dimensionality”), so more records are needed.