S3 K Nearest Neighbor LKW 15jan2025

The document provides an overview of the K-Nearest Neighbour (KNN) algorithm, explaining its basic concept, how to measure proximity, and the process of choosing the number of neighbors (k) for classification and prediction. It discusses the advantages and disadvantages of KNN, including its simplicity and the challenges posed by the curse of dimensionality. Additionally, it presents an example of KNN applied to classify households based on ownership of riding mowers, along with performance metrics and strategies to mitigate dimensionality issues.


K-Nearest-Neighbour [KNN]

• Basic Idea of K-Nearest-Neighbour [KNN]


• Measuring “Near”
• Choosing K
• Example of KNN for classification
• Using KNN for prediction
• Pros and Cons of KNN
• Reference
SBDP, Chapter 7 – K-Nearest-Neighbour

Basic Idea of KNN
• Data-driven, not model-driven.

• Makes no assumptions about the data.

• For a given record to be classified, identify nearby records.

• “Near” means records with similar predictor values X1, X2, … Xp

• Classify the record as whatever the predominant class is among the nearby records (the “neighbours”).
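A minimal from-scratch sketch of this idea in Python (the function names and example values below are illustrative, not taken from SBDP):

import math
from collections import Counter

def euclidean(a, b):
    # Distance between two records a = (x1, ..., xp) and b = (y1, ..., yp)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(new_record, train_X, train_y, k=3):
    # Indices of training records, sorted by distance to the record to be classified
    order = sorted(range(len(train_X)), key=lambda i: euclidean(new_record, train_X[i]))
    # The predominant class among the k nearest neighbours decides the classification
    nearest_labels = [train_y[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny illustrative call with made-up predictor values (Income, Lot Size)
train_X = [(60, 18.4), (64.8, 21.6), (85.5, 16.8), (75, 19.6), (52.8, 20.8)]
train_y = ["owner", "owner", "owner", "non-owner", "non-owner"]
print(knn_classify((60, 20), train_X, train_y, k=3))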
How to measure “nearby”?

The most popular distance measure is Euclidean distance
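For two records u = (u1, u2, …, up) and v = (v1, v2, …, vp), the Euclidean distance is

d(u, v) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_p - v_p)^2}

In practice the predictors are typically normalised (e.g., converted to z-scores) before distances are computed, so that variables measured on large scales do not dominate the distance.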

Choosing k
• k is the number of nearby neighbors used to classify the new record:
  – k = 1 means use the single nearest record
  – k = 3 means use the 3 nearest records
  – k = 5 means use the 5 nearest records

• Typically, choose the value of k that gives the lowest error rate on the validation data.
Low k vs. High k

• Low values of k (1, 3, …) capture local structure in data (but also noise).

• High values of k provide more smoothing, less noise, but may miss local structure.

• Note: the extreme case of k = n (i.e., the entire data set) is the same as the “naïve rule” (classify all records according to the majority class).
Using KNN to classify
Example: Riding Mowers
• Data: 24 households classified as owning or not owning riding mowers.

• Predictors: Income, Lot Size.

• SBDP page 173: SBDP manually partitions the data into training and validation sets by including a train variable set to either t or v. See footnote 3: the validation set comprises households 2, 7, 8, 13, 22, and 24.
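A minimal pandas sketch of this manual partition, assuming the table on the next slide has been saved to a CSV file (the file name RidingMowers.csv and the column names are assumptions):

import pandas as pd

# Load the riding-mowers data (hypothetical file name)
df = pd.read_csv("RidingMowers.csv")

# Split on the manual train/validation flag: 't' = training, 'v' = validation
train_df = df[df["train"] == "t"]
valid_df = df[df["train"] == "v"]

print(len(train_df), "training records,", len(valid_df), "validation records")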
Using KNN to classify - Example: Riding Mowers
obs   Income   Lot Size   Ownership    train
1       60      18.4      owner        t
2       85.5    16.8      owner        v
3       64.8    21.6      owner        t
4       61.5    20.8      owner        t
5       87      23.6      owner        t
6      110.1    19.2      owner        t
7      108      17.6      owner        v
8       82.8    22.4      owner        v
9       69      20        owner        t
10      93      20.8      owner        t
11      51      22        owner        t
12      81      20        owner        t
13      75      19.6      non-owner    v
14      52.8    20.8      non-owner    t
15      64.8    17.2      non-owner    t
16      43.2    20.4      non-owner    t
17      84      17.6      non-owner    t
18      49.2    17.6      non-owner    t
19      59.4    16        non-owner    t
20      66      18.4      non-owner    t
21      47.4    16.4      non-owner    t
22      33      18.8      non-owner    v
23      51      14        non-owner    t
24      63      14.8      non-owner    v
KNN Performance for Different k’s
Riding Mowers
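A minimal scikit-learn sketch of how validation performance for different k can be computed from the data on the previous slide (standardising the predictors is an assumption about the preprocessing, so the exact figures may differ from those in SBDP):

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Riding-mowers data, copied from the table on the previous slide
cols = ["obs", "Income", "Lot_Size", "Ownership", "train"]
rows = [
    (1, 60, 18.4, "owner", "t"), (2, 85.5, 16.8, "owner", "v"),
    (3, 64.8, 21.6, "owner", "t"), (4, 61.5, 20.8, "owner", "t"),
    (5, 87, 23.6, "owner", "t"), (6, 110.1, 19.2, "owner", "t"),
    (7, 108, 17.6, "owner", "v"), (8, 82.8, 22.4, "owner", "v"),
    (9, 69, 20, "owner", "t"), (10, 93, 20.8, "owner", "t"),
    (11, 51, 22, "owner", "t"), (12, 81, 20, "owner", "t"),
    (13, 75, 19.6, "non-owner", "v"), (14, 52.8, 20.8, "non-owner", "t"),
    (15, 64.8, 17.2, "non-owner", "t"), (16, 43.2, 20.4, "non-owner", "t"),
    (17, 84, 17.6, "non-owner", "t"), (18, 49.2, 17.6, "non-owner", "t"),
    (19, 59.4, 16, "non-owner", "t"), (20, 66, 18.4, "non-owner", "t"),
    (21, 47.4, 16.4, "non-owner", "t"), (22, 33, 18.8, "non-owner", "v"),
    (23, 51, 14, "non-owner", "t"), (24, 63, 14.8, "non-owner", "v"),
]
df = pd.DataFrame(rows, columns=cols)

train = df[df["train"] == "t"]
valid = df[df["train"] == "v"]

# Standardise predictors using training statistics (preprocessing choice is an assumption)
scaler = StandardScaler().fit(train[["Income", "Lot_Size"]])
X_train = scaler.transform(train[["Income", "Lot_Size"]])
X_valid = scaler.transform(valid[["Income", "Lot_Size"]])

# Validation accuracy for a range of k values
for k in range(1, 15, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, train["Ownership"])
    acc = knn.score(X_valid, valid["Ownership"])
    print(f"k = {k:2d}: validation accuracy = {acc:.3f}")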

KNN Classification – Prediction of validation data
Validation: Classification Summary

Confusion Matrix
Actual \ Predicted   non-owner   owner
non-owner            2           1
owner                0           3

Error Report
Class        # Cases   # Errors   % Error
non-owner    3         1          33.33
owner        3         0          0.00
Overall      6         1          16.67

Metrics
Metric                   Value
Accuracy (# correct)     5
Accuracy (% correct)     83.33
Specificity              0.667
Sensitivity (Recall)     1.000
Precision                0.750
F1 score                 0.857
Success Class            owner
Success Probability      0.5

Validation: Classification Details

Record ID   Actual Ownership   Predicted Ownership   PostProb: non-owner   PostProb: owner
Record 2    owner              owner                 0.500                 0.500
Record 7    owner              owner                 0.375                 0.625
Record 8    owner              owner                 0.125                 0.875
Record 13   non-owner          owner                 0.375                 0.625
Record 22   non-owner          non-owner             0.625                 0.375
Record 24   non-owner          non-owner             0.875                 0.125
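As a cross-check, the metrics above can be recomputed directly from the confusion matrix; a minimal sketch with “owner” as the success class:

# Confusion matrix counts from the validation summary (success class = owner)
TP = 3   # actual owner, predicted owner
TN = 2   # actual non-owner, predicted non-owner
FP = 1   # actual non-owner, predicted owner
FN = 0   # actual owner, predicted non-owner

accuracy    = (TP + TN) / (TP + TN + FP + FN)                          # 5/6 ≈ 0.833
sensitivity = TP / (TP + FN)                                           # recall = 1.0
specificity = TN / (TN + FP)                                           # ≈ 0.667
precision   = TP / (TP + FP)                                           # 0.75
f1          = 2 * precision * sensitivity / (precision + sensitivity)  # ≈ 0.857

print(accuracy, sensitivity, specificity, precision, f1)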

Classifying a new record (“x”):
find neighbors (try k = 3)

New record: lot size = 20, income = 60. If k = 3, the three closest neighbors are households 4, 9, and 14. Two of the three are owners, so by majority vote we classify the new record as “owner.”
Using K-NN for Prediction
(kNN for Numerical Outcome)
● Step 1: Determine the neighbours by computing distances (same as for classification).

● Step 2: Instead of “majority vote determines class,” use the average of the response values of the k nearest neighbours as the prediction.

● This average may be a weighted average, with weight decreasing with distance from the point at which the prediction is made.

● Instead of the overall error rate used in KNN classification, use RMS error (or another prediction error metric) to determine the best k for KNN prediction.
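A minimal scikit-learn sketch of KNN prediction for a numerical outcome, using a distance-weighted average and RMS error on validation data (the toy data and the weights="distance" choice are illustrative, not from SBDP):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy data: one predictor and a noisy numerical outcome (illustrative only)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=200)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)

# Try several k; the prediction is a distance-weighted average of the k nearest outcomes
for k in (1, 3, 5, 7):
    knn = KNeighborsRegressor(n_neighbors=k, weights="distance").fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_valid, knn.predict(X_valid)))
    print(f"k = {k}: validation RMSE = {rmse:.3f}")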
Advantages of KNN algorithm
● Simple

● No distributional assumptions (e.g., normality) required.

● Effective at capturing complex interactions among variables without having to define a statistical model.
Shortcomings of KNN algorithm

● The required size of the training set increases exponentially with the number of predictors, p.
  This is because the expected distance to the nearest neighbor increases with p (with a large vector of predictors, all records end up “far away” from each other).
● In a large training set, it takes a long time to compute the distances to all the records and then identify the nearest one(s).
● Together, these problems constitute the “curse of dimensionality.”
Dealing with the Curse

● Reduce the dimension of the predictors (e.g., with Principal Components Analysis (PCA)); see the sketch below.

● Use computational shortcuts that settle for “almost nearest” neighbors, e.g. bucketing, where records are grouped into buckets so that the records in each bucket are close to each other.
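A minimal scikit-learn sketch of the first idea, reducing the predictors with PCA before running KNN (the toy data and the choice of five components are illustrative):

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with many (partly redundant) predictors
X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

# Standardise, reduce 30 predictors to a few principal components, then run KNN
model = make_pipeline(StandardScaler(), PCA(n_components=5), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_valid, y_valid))

For the second idea, tree-based neighbour searches (e.g., scikit-learn's KDTree or BallTree) and approximate-nearest-neighbour libraries speed up the neighbour search on large training sets.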
Summary
⚫ Find the distance between the record to be classified and all other records.

⚫ Select the k nearest records.
  – Classify the record according to the majority vote of the nearest neighbors.
  – Or, for prediction, take the average of the response values of the nearest neighbors.

⚫ “Curse of dimensionality” – need to limit # of predictors.
