
K NEAREST NEIGHBOR

Presented By:
Mahnoor Farooq - 2021-CS-403
Ayesha Nadeem - 2021-CS-413
Saria Irshad - 2021-CS-425
Saba Shahzadi - 2021-CS-411
AGENDA

• Introduction
• Distance Metrics
• Choice of K
• Feature Scaling
• KNN for Classification
• KNN for Regression
• Search Algorithms
• Challenges & Limitations

WHAT IS KNN?

• A powerful supervised learning algorithm used for both classification and
regression problems, but mainly used for classification.
• K-Nearest Neighbors stores all available cases and classifies new cases based
on a similarity measure.
• It is a lazy learning algorithm because it does not learn from the training
set immediately; instead, it stores the dataset and performs the computation
only at classification time.
• K-NN is a non-parametric algorithm, which means it makes no
assumptions about the underlying data.
KNN - DIFFERENT NAMES

• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
KNN - PRINCIPLE

• This technique performs classification by taking a majority vote
among the "k" closest points to the unlabeled data point.
• K here is the number of neighboring points that are
taken into consideration.
KNN - EXAMPLE

The green circle is the unlabeled data point.
• k = 3 in this problem
• The 3 closest points are taken
• 2 are red triangles, 1 is a blue square
• Votes: 2 Red > 1 Blue
• The green circle is classified as a red triangle
KNN - EXAMPLE

The green circle is the unlabeled data point.
• k = 5 in this problem
• The 5 closest points are taken
• 2 are red triangles, 3 are blue squares
• Votes: 3 Blue > 2 Red
• The green circle is classified as a blue square
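A minimal sketch of this majority vote in plain Python. The points and labels below are hypothetical, chosen only so that k=3 and k=5 give the two outcomes shown in the examples above:

```python
from collections import Counter
import math

# Hypothetical labelled points: (x, y, label), chosen to mirror the slides.
training_points = [(1, 1, "red"), (2, 2, "red"), (3, 1, "blue"),
                   (5, 5, "blue"), (6, 5, "blue")]
query = (2, 1)  # the "green circle"

def knn_vote(points, query, k):
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(points, key=lambda p: math.dist(query, (p[0], p[1])))
    # Count the labels of the k closest points and return the majority class.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_vote(training_points, query, k=3))  # "red"  (2 red vs 1 blue)
print(knn_vote(training_points, query, k=5))  # "blue" (3 blue vs 2 red)
```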
DISTANCE METRICS

• Euclidean Distance: calculated as the square root of the sum of the squared
differences between a new point (x) and an existing point (y).

• Manhattan Distance: the sum of the absolute differences between the
coordinates of the starting point and the destination point.

• Hamming Distance: the number of positions at which two vectors differ.
EUCLIDEAN DISTANCE

Suppose the vectors X1 and X2 are in 2-D, with coordinates
X1(x1, y1) = X1(3, 4)
X2(x2, y2) = X2(4, 7), as you can see in the image.
Since these are 2-D vectors, the Euclidean distance between
X1 and X2 is:
distance = sqrt( (x2-x1)^2 + (y2-y1)^2 )
Putting our coordinates into the equation:
distance = sqrt( (4-3)^2 + (7-4)^2 )
distance = sqrt( 1 + 9 )
distance = sqrt( 10 )
distance = 3.16 (approx)
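The same calculation in Python, using NumPy and the points from this slide (a quick sketch, not part of the original deck):

```python
import numpy as np

X1 = np.array([3, 4])
X2 = np.array([4, 7])

# Euclidean distance: square root of the sum of squared coordinate differences.
print(np.sqrt(np.sum((X2 - X1) ** 2)))  # ~3.16, i.e. sqrt(10)

# Equivalent one-liner using NumPy's norm.
print(np.linalg.norm(X2 - X1))
```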
MANHATTAN DISTANCE

Imagine a cab starting from point X1 that has to reach its
destination point X2. Instead of the straight-line (shortest)
path, we measure the full grid path the cab travels along
the axes, i.e. the sum of the absolute coordinate differences.
distance = sum of |xi - yi|
distance = 7 + 4
distance = 11
So, the absolute path the cab covers is 11.
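A short Python sketch of the same idea. The coordinates below are hypothetical (the slide's figure is not reproduced here); they are chosen so the absolute differences are 7 and 4, matching the total of 11 above:

```python
import numpy as np

# Hypothetical start and destination points with coordinate differences of 7 and 4.
X1 = np.array([1, 2])
X2 = np.array([8, 6])

# Manhattan distance: sum of absolute coordinate differences.
print(np.sum(np.abs(X1 - X2)))  # |1-8| + |2-6| = 7 + 4 = 11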
HAMMING DISTANCE

Hamming distance: this metric is typically used with
Boolean or string vectors, counting the positions where
the vectors do not match.

Suppose we have two points X1 and X2, both of which
are Boolean vectors, represented as:
X1 = [0,0,0,1,0,1,1,0,1,1,1,0,0,0,1]
X2 = [0,1,0,1,0,1,0,0,1,0,1,0,1,0,1]
So, simply put, the Hamming distance of (X1, X2) is the
number of positions where the binary vectors differ.
Hamming distance(X1, X2) = 4
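A quick check of this count in Python (these vectors differ at positions 2, 7, 10, and 13):

```python
import numpy as np

X1 = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1])
X2 = np.array([0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1])

# Hamming distance: number of positions where the two vectors differ.
print(np.sum(X1 != X2))  # 4
```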
PERFORMANCE OF KNN

The performance of K-NN heavily depends on two key factors:
1. Choice of K: the number of neighbors considered during prediction.
2. Hyperparameter Tuning: tuning parameters like the distance metric and K itself to optimize performance.
CHOOSING THE K

• K is a critical hyperparameter that affects the
bias-variance tradeoff:
• Small K (e.g., K=1): leads to overfitting, as the
model is too sensitive to noise in the data.
• Large K (e.g., K=20): leads to underfitting, as
the model becomes too generalized and
smooths out details.
• The ideal K balances noise sensitivity and
model complexity.
HYPERPARAMETER TUNING

Cross-validation is essential to tune
hyperparameters like K. It helps ensure the model
generalizes well to unseen data.
K-Fold Cross-Validation (the number of folds here is a
separate parameter from the K neighbors in K-NN):
1. Split the dataset into K equal-sized subsets (folds).
2. For each fold, train the model on K-1 folds and test on the remaining fold.
3. Calculate the error for each fold.
4. Average the errors across all K folds to get the final model performance.
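A minimal sketch of tuning K with 5-fold cross-validation in scikit-learn. The dataset (Iris) and the candidate K values are illustrative choices, not part of the original deck:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of K with 5-fold cross-validation
# and keep the one with the best average accuracy.
scores = {}
for k in [1, 3, 5, 7, 9, 15, 21]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores)
print("Best K:", best_k)
```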
PREPROCESSING FOR KNN

Key Preprocessing Steps for K-NN:
• Feature Scaling
• Handling Missing Data: missing data can distort distance calculations.
• Imputation: replace missing values with the mean, median, or mode, depending on the feature type.
• Handling Noisy Data and Outliers: noisy data can lead to incorrect predictions.
• Outlier Detection: use methods like Z-score or IQR to identify and remove outliers.
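A brief sketch of these two steps, assuming scikit-learn's SimpleImputer for mean imputation and a simple Z-score rule for outliers. The data and the Z-score threshold are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical (Age, Income) rows with a missing value and an obvious outlier.
X = np.array([[25, 40000.0],
              [30, np.nan],      # missing income -> to be imputed
              [28, 42000.0],
              [27, 900000.0]])   # outlier income

# Imputation: replace missing values with the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Outlier detection: drop rows whose income Z-score exceeds 1.5
# (the threshold is illustrative for this tiny sample).
income = X_imputed[:, 1]
z = (income - income.mean()) / income.std()
X_clean = X_imputed[np.abs(z) <= 1.5]
print(X_clean)
```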
FEATURE SCALING

• K-NN is sensitive to feature scales because it calculates
distances between data points.
• Features with larger ranges (e.g., Income in USD) will
dominate the distance metric, making features with smaller
ranges (e.g., Age in years) less impactful.
Scaling methods:
1. Min-Max Scaling: scales features to a fixed range, typically [0, 1].
2. Standardization: scales features to have a mean of 0 and a
standard deviation of 1.
MIN-MAX SCALING

Formula: X' = (X - Xmin) / (Xmax - Xmin)
• Where X' is the scaled value, X is the original value, and Xmin and Xmax are
the minimum and maximum values of the feature.
• Min-Max scaling rescales the feature values to the range [0, 1].
• Example: If Age values are between 20 and 80, a value of 50 is rescaled to
X' = (50 - 20) / (80 - 20) = 0.5.
STANDARDIZATION

Standardization Formula: X' = (X - μ) / σ
• Where X' is the standardized value, X is the original value, μ is the feature mean,
and σ is its standard deviation.
• Standardization converts features to have a mean of 0 and a standard deviation of 1.
• Example: If the feature "Salary" has a mean of 50,000 and a standard deviation of
10,000, a Salary value of 60,000 is standardized to X' = (60,000 - 50,000) / 10,000 = 1.
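Both worked examples above can be checked with a few lines of Python (a sketch; in practice the same transforms come from scikit-learn's MinMaxScaler and StandardScaler, fitted on the training data only):

```python
# Min-Max scaling: X' = (X - Xmin) / (Xmax - Xmin)
age, age_min, age_max = 50, 20, 80
print((age - age_min) / (age_max - age_min))  # (50 - 20) / (80 - 20) = 0.5

# Standardization: X' = (X - mean) / std
salary, mean, std = 60_000, 50_000, 10_000
print((salary - mean) / std)  # (60000 - 50000) / 10000 = 1.0
```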
KNN FOR CLASSIFICATION

Distance Calculation:
• The distance between a new sample x and an existing
sample x_i is calculated using a distance metric like
Euclidean distance.
• Euclidean Distance Formula: d(x, x_i) = sqrt( sum over j of (x_j - x_ij)^2 )
• x_j is the j-th feature of the new data point x,
• x_ij is the j-th feature of the training point x_i,
• n is the total number of features (j runs from 1 to n).
• Majority Vote: once the distances are calculated, the K
nearest neighbors are identified, and the majority class of
the neighbors determines the predicted class for the new
point.
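Putting distance calculation, scaling, and majority voting together with scikit-learn (a sketch on the Iris dataset; the train/test split and K=5 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Scale features, then classify each test point by the majority class
# of its 5 nearest (Euclidean) neighbors in the training set.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # test-set accuracy
```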
KNN FOR REGRESSION

• K-NN for regression predicts continuous values rather
than discrete classes.
• Instead of majority voting, the predicted value for a new
point is the average of the values of its K nearest
neighbors.
KNN FOR REGRESSION

Distance Calculation: same as for classification, using Euclidean distance or another distance
metric.
• Prediction Rule: the predicted value for regression is the average of the values of the K
nearest neighbors.
Formula: y_pred = (1/K) * sum of y_i over the K nearest neighbors
• Where y_i is the target value for the i-th nearest neighbor.
• Weighted Average (Optional): a weighted average can be used, where closer neighbors have
more influence: y_pred = sum of (w_i * y_i) / sum of w_i
Where w_i is the weight based on distance (e.g., w_i = 1 / d_i).
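A minimal regression sketch with scikit-learn, including the optional distance-weighted average (the synthetic 1-D data and K=5 are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression data: y is a noisy function of x.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Plain average of the 5 nearest neighbors' target values.
uniform = KNeighborsRegressor(n_neighbors=5, weights="uniform").fit(X, y)
# Weighted average: closer neighbors (weight 1/distance) count more.
weighted = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)

x_new = [[2.5]]
print(uniform.predict(x_new), weighted.predict(x_new))
```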
CHOOSING K FOR CLASSIFICATION & REGRESSION

Choosing K:
• Small K: small K values (e.g., K=1) are sensitive to noise and can lead to overfitting.
• Large K: large K values (e.g., K=20) may smooth out the model too much, leading to
underfitting.
• The optimal value of K is typically found by cross-validation.
For Classification:
K should be large enough to avoid noise but small enough to preserve local patterns in
the data.
For Regression:
The optimal K balances the local behavior of the data against generalizing trends.
KNN FOR CLASSIFICATION VS REGRESSION

Classification: K-NN assigns a class label based on the majority vote of the K
nearest neighbors.
Regression: K-NN predicts a continuous value based on the average (or weighted
average) of the K nearest neighbors' target values.
Mathematical Difference:
In classification, the output is a class label.
In regression, the output is a continuous value (mean or weighted mean of
neighbors’ target values).
CHALLENGES & LIMITATIONS
1. Computational Cost (Slow Prediction Time):
Problem: K-NN calculates the distance between the test point and all points in the training
dataset at prediction time, making it computationally expensive.
Example: In a dataset with 1 million data points, predicting the class of a new data point
requires calculating the distance between this point and all 1 million points, which can be
slow.
2. Curse of Dimensionality:
Problem: As the number of features (dimensions) increases, the concept of distance
becomes less meaningful, leading to poor performance in high-dimensional spaces.
Example: When classifying high-dimensional data (e.g., pixel values of an image with 100
features), the notion of 'closeness' becomes distorted.
CHALLENGES & LIMITATIONS
3. Memory and Storage Requirements:
Problem: K-NN stores the entire training dataset, which can be inefficient if the
dataset is very large.
Example: For a large image dataset, storing millions of images can consume a lot of
memory.
4. Sensitivity to Noisy Data and Outliers:
Problem: K-NN relies on the proximity of data points to make predictions. If there are
noisy data points or outliers, they can influence the classification of the test point.
Example: In a classification task with most data points labeled as 'cat' but one outlier
labeled 'dog', the test point may be incorrectly classified as 'dog'.
OPTIMIZING KNN
KD-Tree Overview:
A KD-Tree is a hierarchical binary tree used to organize data points
in a k-dimensional space.
How it Works: KD-Trees recursively split the data into two halves
along the median of each dimension, creating a binary tree
structure.
Benefit: Searching for nearest neighbors becomes more efficient,
as the tree structure allows for pruning of large portions of the
data, reducing the search space during prediction.
Example: When a test point is provided, the KD-Tree quickly
narrows down the region of interest by traversing the tree
structure, making the search faster than a brute-force search.
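A small sketch of a KD-Tree nearest-neighbor query, here using scikit-learn's KDTree (the random low-dimensional data is illustrative):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))   # 10,000 points in 3-D space

# Build the KD-Tree once; queries can then prune large regions of the space.
tree = KDTree(X)

query = rng.random((1, 3))
dist, ind = tree.query(query, k=5)   # distances and indices of the 5 nearest points
print(ind, dist)
```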
BALL TREE
Ball Tree Overview:
A Ball Tree is another hierarchical data structure designed for
high-dimensional spaces, much like a KD-Tree, but optimized
for cases where data has more than 2-3 dimensions.
How it Works: The Ball Tree organizes data points into
hierarchical clusters (balls) based on distance from centroids,
recursively subdividing the dataset into smaller balls.
Benefit: Ball Trees are especially efficient when dealing with
high-dimensional data (e.g., image data with hundreds of
features).
How Ball Tree Works: The tree recursively divides the data
into smaller clusters, calculating the distance from the
centroid of each cluster.
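The equivalent sketch with scikit-learn's BallTree, which tends to hold up better as the number of dimensions grows (again on random data chosen for illustration):

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(1)
X = rng.random((10_000, 50))   # higher-dimensional data, where KD-Trees degrade

# Build the Ball Tree: points are grouped into nested "balls" around centroids.
tree = BallTree(X)

dist, ind = tree.query(rng.random((1, 50)), k=5)
print(ind, dist)

# In scikit-learn's KNeighborsClassifier/Regressor the same structures are
# selected via algorithm="kd_tree", "ball_tree", or "auto".
```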
CONCLUSION
In summary, K-NN is a good choice for problems where simplicity and flexibility
are crucial, but for larger datasets or high-dimensional data, optimizations like
KD-Tree or Ball Tree are essential for ensuring performance.
THANK YOU
