S3 K Nearest Neighbor LKW 15jan2025

The document provides an overview of the K-Nearest Neighbour (KNN) algorithm, explaining its basic concept, how to measure proximity, and the process of choosing the number of neighbors (k) for classification and prediction. It discusses the advantages and disadvantages of KNN, including its simplicity and the challenges posed by the curse of dimensionality. Additionally, it presents an example of KNN applied to classify households based on ownership of riding mowers, along with performance metrics and strategies to mitigate dimensionality issues.


K-Nearest-Neighbour [KNN]

• Basic Idea of K-Nearest-Neighbour [KNN]


• Measuring “Near”
• Choosing K
• Example of KNN for classification
• Using KNN for prediction
• Pros and Cons of KNN
• Reference
SBDP, Chapter 7 – K-Nearest-Neighbour

Basic Idea of KNN
• Data-driven, not model-driven.

• Makes no assumptions about the data.

• For a given record to be classified, identify nearby records.

• “Near” means records with similar predictor values X1, X2, … Xp

• Classify the record as whatever the predominant class is among the nearby records (the “neighbours”).
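A minimal from-scratch sketch of this idea in Python (the function names and example values below are illustrative, not taken from SBDP):

import math
from collections import Counter

def euclidean(a, b):
    # Distance between two records a = (x1, ..., xp) and b = (y1, ..., yp)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(new_record, train_X, train_y, k=3):
    # Indices of training records, sorted by distance to the record to be classified
    order = sorted(range(len(train_X)), key=lambda i: euclidean(new_record, train_X[i]))
    # The predominant class among the k nearest neighbours decides the classification
    nearest_labels = [train_y[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny illustrative call with made-up predictor values (Income, Lot Size)
train_X = [(60, 18.4), (64.8, 21.6), (85.5, 16.8), (75, 19.6), (52.8, 20.8)]
train_y = ["owner", "owner", "owner", "non-owner", "non-owner"]
print(knn_classify((60, 20), train_X, train_y, k=3))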
How to measure “nearby”?

The most popular distance measure is Euclidean distance
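For two records u = (u1, u2, …, up) and v = (v1, v2, …, vp), the Euclidean distance is

d(u, v) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_p - v_p)^2}

In practice the predictors are typically normalised (e.g., converted to z-scores) before distances are computed, so that variables measured on large scales do not dominate the distance.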

Choosing k
• k is the number of nearby neighbors used to classify the new record:
  – k = 1 means use the single nearest record
  – k = 3 means use the 3 nearest records
  – k = 5 means use the 5 nearest records

• Typically, choose the value of k that gives the lowest error rate on the validation data.
Low k vs. High k

• Low values of k (1, 3, …) capture local structure in data (but also noise).

• High values of k provide more smoothing, less noise, but may miss local structure.

• Note: the extreme case of k = n (i.e., the entire data set) is the same as the “naïve rule” (classify all records according to the majority class).
Using KNN to classify
Example: Riding Mowers
• Data: 24 households classified as owning or not owning riding mowers.

• Predictors: Income, Lot Size.

• SBDP page 173: SBDP manually partitions the data into training and validation sets by including a train variable set to either t or v. See footnote 3: the validation set comprises households 2, 7, 8, 13, 22, and 24.
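A minimal pandas sketch of this manual partition, assuming the table on the next slide has been saved to a CSV file (the file name RidingMowers.csv and the column names are assumptions):

import pandas as pd

# Load the riding-mowers data (hypothetical file name)
df = pd.read_csv("RidingMowers.csv")

# Split on the manual train/validation flag: 't' = training, 'v' = validation
train_df = df[df["train"] == "t"]
valid_df = df[df["train"] == "v"]

print(len(train_df), "training records,", len(valid_df), "validation records")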
Using KNN to classify - Example: Riding Mowers
obs   Income   Lot Size   Ownership    train
1       60      18.4      owner        t
2       85.5    16.8      owner        v
3       64.8    21.6      owner        t
4       61.5    20.8      owner        t
5       87      23.6      owner        t
6      110.1    19.2      owner        t
7      108      17.6      owner        v
8       82.8    22.4      owner        v
9       69      20        owner        t
10      93      20.8      owner        t
11      51      22        owner        t
12      81      20        owner        t
13      75      19.6      non-owner    v
14      52.8    20.8      non-owner    t
15      64.8    17.2      non-owner    t
16      43.2    20.4      non-owner    t
17      84      17.6      non-owner    t
18      49.2    17.6      non-owner    t
19      59.4    16        non-owner    t
20      66      18.4      non-owner    t
21      47.4    16.4      non-owner    t
22      33      18.8      non-owner    v
23      51      14        non-owner    t
24      63      14.8      non-owner    v
KNN Performance for Different k’s
Riding Mowers
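A minimal scikit-learn sketch of how validation performance for different k can be computed from the data on the previous slide (standardising the predictors is an assumption about the preprocessing, so the exact figures may differ from those in SBDP):

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Riding-mowers data, copied from the table on the previous slide
cols = ["obs", "Income", "Lot_Size", "Ownership", "train"]
rows = [
    (1, 60, 18.4, "owner", "t"), (2, 85.5, 16.8, "owner", "v"),
    (3, 64.8, 21.6, "owner", "t"), (4, 61.5, 20.8, "owner", "t"),
    (5, 87, 23.6, "owner", "t"), (6, 110.1, 19.2, "owner", "t"),
    (7, 108, 17.6, "owner", "v"), (8, 82.8, 22.4, "owner", "v"),
    (9, 69, 20, "owner", "t"), (10, 93, 20.8, "owner", "t"),
    (11, 51, 22, "owner", "t"), (12, 81, 20, "owner", "t"),
    (13, 75, 19.6, "non-owner", "v"), (14, 52.8, 20.8, "non-owner", "t"),
    (15, 64.8, 17.2, "non-owner", "t"), (16, 43.2, 20.4, "non-owner", "t"),
    (17, 84, 17.6, "non-owner", "t"), (18, 49.2, 17.6, "non-owner", "t"),
    (19, 59.4, 16, "non-owner", "t"), (20, 66, 18.4, "non-owner", "t"),
    (21, 47.4, 16.4, "non-owner", "t"), (22, 33, 18.8, "non-owner", "v"),
    (23, 51, 14, "non-owner", "t"), (24, 63, 14.8, "non-owner", "v"),
]
df = pd.DataFrame(rows, columns=cols)

train = df[df["train"] == "t"]
valid = df[df["train"] == "v"]

# Standardise predictors using training statistics (preprocessing choice is an assumption)
scaler = StandardScaler().fit(train[["Income", "Lot_Size"]])
X_train = scaler.transform(train[["Income", "Lot_Size"]])
X_valid = scaler.transform(valid[["Income", "Lot_Size"]])

# Validation accuracy for a range of k values
for k in range(1, 15, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, train["Ownership"])
    acc = knn.score(X_valid, valid["Ownership"])
    print(f"k = {k:2d}: validation accuracy = {acc:.3f}")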

KNN Classification – Prediction of validation data
Validation: Classification Summary

Confusion Matrix
Actual \ Predicted   non-owner   owner
non-owner            2           1
owner                0           3

Error Report
Class        # Cases   # Errors   % Error
non-owner    3         1          33.33
owner        3         0          0.00
Overall      6         1          16.67

Metrics
Metric                   Value
Accuracy (# correct)     5
Accuracy (% correct)     83.33
Specificity              0.667
Sensitivity (Recall)     1.000
Precision                0.750
F1 score                 0.857
Success Class            owner
Success Probability      0.5

Validation: Classification Details

Record ID   Actual Ownership   Predicted Ownership   PostProb: non-owner   PostProb: owner
Record 2    owner              owner                 0.500                 0.500
Record 7    owner              owner                 0.375                 0.625
Record 8    owner              owner                 0.125                 0.875
Record 13   non-owner          owner                 0.375                 0.625
Record 22   non-owner          non-owner             0.625                 0.375
Record 24   non-owner          non-owner             0.875                 0.125
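As a cross-check, the metrics above can be recomputed directly from the confusion matrix; a minimal sketch with “owner” as the success class:

# Confusion matrix counts from the validation summary (success class = owner)
TP = 3   # actual owner, predicted owner
TN = 2   # actual non-owner, predicted non-owner
FP = 1   # actual non-owner, predicted owner
FN = 0   # actual owner, predicted non-owner

accuracy    = (TP + TN) / (TP + TN + FP + FN)                          # 5/6 ≈ 0.833
sensitivity = TP / (TP + FN)                                           # recall = 1.0
specificity = TN / (TN + FP)                                           # ≈ 0.667
precision   = TP / (TP + FP)                                           # 0.75
f1          = 2 * precision * sensitivity / (precision + sensitivity)  # ≈ 0.857

print(accuracy, sensitivity, specificity, precision, f1)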

Classifying a new record (“x”):
find neighbors (try k = 3)

New record: lot size = 20, income = 60. If k = 3, the three closest neighbors are households 4, 9, and 14. Two of the three are owners, so by majority vote we classify the new record as “owner.”
Using K-NN for Prediction
(kNN for Numerical Outcome)
● Step 1: Determine the neighbours by computing distances (same as for classification).

● Step 2: Instead of “majority vote determines class,” use the average of the response values of the k nearest neighbours as the prediction.

● This average may be a weighted average, with weight decreasing with distance from the point at which the prediction is made.

● Instead of the overall error rate used in KNN classification, use RMS error (or another prediction error metric) to determine the best k for KNN prediction.
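A minimal scikit-learn sketch of KNN prediction for a numerical outcome, using a distance-weighted average and RMS error on validation data (the toy data and the weights="distance" choice are illustrative, not from SBDP):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy data: one predictor and a noisy numerical outcome (illustrative only)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=200)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)

# Try several k; the prediction is a distance-weighted average of the k nearest outcomes
for k in (1, 3, 5, 7):
    knn = KNeighborsRegressor(n_neighbors=k, weights="distance").fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_valid, knn.predict(X_valid)))
    print(f"k = {k}: validation RMSE = {rmse:.3f}")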
Advantages of KNN algorithm
● Simple

● No distributional assumptions (e.g., normality) required.

● Effective at capturing complex interactions among variables without having to define a statistical model.
Shortcomings of KNN algorithm

● The required size of the training set increases exponentially with the number of predictors, p.
  This is because the expected distance to the nearest neighbor increases with p (with a large vector of predictors, all records end up “far away” from each other).
● In a large training set, it takes a long time to compute the distances to all the records and then identify the nearest one(s).
● Together, these problems constitute the “curse of dimensionality.”
Dealing with the Curse

● Reduce the dimension of the predictors (e.g., with Principal Components Analysis (PCA)); see the sketch below.

● Use computational shortcuts that settle for “almost nearest” neighbors, e.g. bucketing, where records are grouped into buckets so that the records in each bucket are close to each other.
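A minimal scikit-learn sketch of the first idea, reducing the predictors with PCA before running KNN (the toy data and the choice of five components are illustrative):

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with many (partly redundant) predictors
X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

# Standardise, reduce 30 predictors to a few principal components, then run KNN
model = make_pipeline(StandardScaler(), PCA(n_components=5), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_valid, y_valid))

For the second idea, tree-based neighbour searches (e.g., scikit-learn's KDTree or BallTree) and approximate-nearest-neighbour libraries speed up the neighbour search on large training sets.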
Summary
⚫ Find the distance between the record to be classified and all other records.

⚫ Select the k nearest records.
  – Classify the record according to the majority vote of the nearest neighbors.
  – Or, for prediction, take the average of the response values of the nearest neighbors.

⚫ “Curse of dimensionality” – need to limit # of predictors.
