
K NEAREST NEIGHBOR

Presented By:
Mahnoor Farooq - 2021-CS-403
Ayesha Nadeem - 2021-CS-413
Saria Irshad - 2021-CS-425
Saba Shahzadi - 2021-CS-411
AGENDA

• Introduction
• Distance Metrics
• Choice of K
• Feature Scaling
• KNN for Classification
• KNN for Regression
• Search Algorithms
• Challenges & Limitations

WHAT IS KNN?

• A powerful supervised learning algorithm used for both classification and
regression problems, but mainly used for classification.
• K-Nearest Neighbors stores all available cases and classifies new cases based
on a similarity measure.
• It is a lazy learning algorithm because it does not learn from the training
set immediately; instead, it stores the dataset and performs the computation
only at classification time.
• K-NN is a non-parametric algorithm, which means it makes no
assumptions about the underlying data.
KNN - DIFFERENT NAMES

• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
KNN - PRINCIPLE

• This technique performs classification by taking a majority vote
among the "k" closest points to the unlabeled data point.
• K here is the number of neighboring points that are
taken into consideration.
KNN - EXAMPLE

The green circle is the unlabeled data point.
• k = 3 in this problem
• The 3 closest points are taken
• 2 are red triangles, 1 is a blue square
• Votes: 2 Red > 1 Blue
• The green circle is classified as a red triangle
KNN - EXAMPLE

The green circle is the unlabeled data point.
• k = 5 in this problem
• The 5 closest points are taken
• 2 are red triangles, 3 are blue squares
• Votes: 3 Blue > 2 Red
• The green circle is classified as a blue square
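A minimal sketch of this majority vote in plain Python. The points and labels below are hypothetical, chosen only so that k=3 and k=5 give the two outcomes shown in the examples above:

```python
from collections import Counter
import math

# Hypothetical labelled points: (x, y, label), chosen to mirror the slides.
training_points = [(1, 1, "red"), (2, 2, "red"), (3, 1, "blue"),
                   (5, 5, "blue"), (6, 5, "blue")]
query = (2, 1)  # the "green circle"

def knn_vote(points, query, k):
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(points, key=lambda p: math.dist(query, (p[0], p[1])))
    # Count the labels of the k closest points and return the majority class.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_vote(training_points, query, k=3))  # "red"  (2 red vs 1 blue)
print(knn_vote(training_points, query, k=5))  # "blue" (3 blue vs 2 red)
```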
DISTANCE METRICS

• Euclidean Distance: calculated as the square root of the sum of the squared
differences between a new point (x) and an existing point (y).

• Manhattan Distance: the sum of the absolute differences between the
coordinates of the starting point and the destination point.

• Hamming Distance: the number of positions at which two vectors differ.
EUCLIDEAN DISTANCE

Suppose the vectors X1 and X2 are in 2-D, with coordinates
X1(x1, y1) = X1(3, 4)
X2(x2, y2) = X2(4, 7), as you can see in the image.
Since these are 2-D vectors, the Euclidean distance between
X1 and X2 is:
distance = sqrt( (x2-x1)^2 + (y2-y1)^2 )
Putting our coordinates into the equation:
distance = sqrt( (4-3)^2 + (7-4)^2 )
distance = sqrt( 1 + 9 )
distance = sqrt( 10 )
distance = 3.16 (approx)
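The same calculation in Python, using NumPy and the points from this slide (a quick sketch, not part of the original deck):

```python
import numpy as np

X1 = np.array([3, 4])
X2 = np.array([4, 7])

# Euclidean distance: square root of the sum of squared coordinate differences.
print(np.sqrt(np.sum((X2 - X1) ** 2)))  # ~3.16, i.e. sqrt(10)

# Equivalent one-liner using NumPy's norm.
print(np.linalg.norm(X2 - X1))
```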
MANHATTAN DISTANCE

Imagine a cab starting from point X1 that has to reach its
destination point X2. Instead of the straight-line (shortest)
path, we measure the full grid path the cab travels along
the axes, i.e. the sum of the absolute coordinate differences.
distance = sum of |xi - yi|
distance = 7 + 4
distance = 11
So, the absolute path the cab covers is 11.
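A short Python sketch of the same idea. The coordinates below are hypothetical (the slide's figure is not reproduced here); they are chosen so the absolute differences are 7 and 4, matching the total of 11 above:

```python
import numpy as np

# Hypothetical start and destination points with coordinate differences of 7 and 4.
X1 = np.array([1, 2])
X2 = np.array([8, 6])

# Manhattan distance: sum of absolute coordinate differences.
print(np.sum(np.abs(X1 - X2)))  # |1-8| + |2-6| = 7 + 4 = 11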
HAMMING DISTANCE

Hamming distance: this metric is typically used with
Boolean or string vectors, counting the positions where
the vectors do not match.

Suppose we have two points X1 and X2, both of which
are Boolean vectors, represented as:
X1 = [0,0,0,1,0,1,1,0,1,1,1,0,0,0,1]
X2 = [0,1,0,1,0,1,0,0,1,0,1,0,1,0,1]
So, simply put, the Hamming distance of (X1, X2) is the
number of positions where the binary vectors differ.
Hamming distance(X1, X2) = 4
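A quick check of this count in Python (these vectors differ at positions 2, 7, 10, and 13):

```python
import numpy as np

X1 = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1])
X2 = np.array([0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1])

# Hamming distance: number of positions where the two vectors differ.
print(np.sum(X1 != X2))  # 4
```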
PERFORMANCE OF KNN

The performance of K-NN heavily depends on two key factors:
1. Choice of K: the number of neighbors considered during prediction.
2. Hyperparameter Tuning: tuning parameters like the distance metric and K itself to optimize performance.
CHOOSING THE K

• K is a critical hyperparameter that affects the
bias-variance tradeoff:
• Small K (e.g., K=1): leads to overfitting, as the
model is too sensitive to noise in the data.
• Large K (e.g., K=20): leads to underfitting, as
the model becomes too generalized and
smooths out details.
• The ideal K balances noise sensitivity and
model complexity.
HYPERPARAMETER TUNING

Cross-validation is essential to tune
hyperparameters like K. It helps ensure the model
generalizes well to unseen data.
K-Fold Cross-Validation (the number of folds here is a
separate parameter from the K neighbors in K-NN):
1. Split the dataset into K equal-sized subsets (folds).
2. For each fold, train the model on K-1 folds and test on the remaining fold.
3. Calculate the error for each fold.
4. Average the errors across all K folds to get the final model performance.
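A minimal sketch of tuning K with 5-fold cross-validation in scikit-learn. The dataset (Iris) and the candidate K values are illustrative choices, not part of the original deck:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of K with 5-fold cross-validation
# and keep the one with the best average accuracy.
scores = {}
for k in [1, 3, 5, 7, 9, 15, 21]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores)
print("Best K:", best_k)
```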
PREPROCESSING FOR KNN

Key Preprocessing Steps for K-NN:
• Feature Scaling
• Handling Missing Data: missing data can distort distance calculations.
• Imputation: replace missing values with the mean, median, or mode, depending on the feature type.
• Handling Noisy Data and Outliers: noisy data can lead to incorrect predictions.
• Outlier Detection: use methods like Z-score or IQR to identify and remove outliers.
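A brief sketch of these two steps, assuming scikit-learn's SimpleImputer for mean imputation and a simple Z-score rule for outliers. The data and the Z-score threshold are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical (Age, Income) rows with a missing value and an obvious outlier.
X = np.array([[25, 40000.0],
              [30, np.nan],      # missing income -> to be imputed
              [28, 42000.0],
              [27, 900000.0]])   # outlier income

# Imputation: replace missing values with the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Outlier detection: drop rows whose income Z-score exceeds 1.5
# (the threshold is illustrative for this tiny sample).
income = X_imputed[:, 1]
z = (income - income.mean()) / income.std()
X_clean = X_imputed[np.abs(z) <= 1.5]
print(X_clean)
```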
FEATURE SCALING

• K-NN is sensitive to feature scales because it calculates
distances between data points.
• Features with larger ranges (e.g., Income in USD) will
dominate the distance metric, making features with smaller
ranges (e.g., Age in years) less impactful.
Scaling methods:
1. Min-Max Scaling: scales features to a fixed range, typically [0, 1].
2. Standardization: scales features to have a mean of 0 and a
standard deviation of 1.
MIN-MAX SCALING

Formula: X' = (X - Xmin) / (Xmax - Xmin)
• Where X' is the scaled value, X is the original value, and Xmin and Xmax are
the minimum and maximum values of the feature.
• Min-Max scaling rescales the feature values to the range [0, 1].
• Example: If Age values are between 20 and 80, a value of 50 is rescaled to
X' = (50 - 20) / (80 - 20) = 0.5.
STANDARDIZATION

Standardization Formula: X' = (X - μ) / σ
• Where X' is the standardized value, X is the original value, μ is the feature mean,
and σ is its standard deviation.
• Standardization converts features to have a mean of 0 and a standard deviation of 1.
• Example: If the feature "Salary" has a mean of 50,000 and a standard deviation of
10,000, a Salary value of 60,000 is standardized to X' = (60,000 - 50,000) / 10,000 = 1.
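Both worked examples above can be checked with a few lines of Python (a sketch; in practice the same transforms come from scikit-learn's MinMaxScaler and StandardScaler, fitted on the training data only):

```python
# Min-Max scaling: X' = (X - Xmin) / (Xmax - Xmin)
age, age_min, age_max = 50, 20, 80
print((age - age_min) / (age_max - age_min))  # (50 - 20) / (80 - 20) = 0.5

# Standardization: X' = (X - mean) / std
salary, mean, std = 60_000, 50_000, 10_000
print((salary - mean) / std)  # (60000 - 50000) / 10000 = 1.0
```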
KNN FOR CLASSIFICATION

Distance Calculation:
• The distance between a new sample x and an existing
sample x_i is calculated using a distance metric like
Euclidean distance.
• Euclidean Distance Formula: d(x, x_i) = sqrt( sum over j of (x_j - x_ij)^2 )
• x_j is the j-th feature of the new data point x,
• x_ij is the j-th feature of the training point x_i,
• n is the total number of features (j runs from 1 to n).
• Majority Vote: once the distances are calculated, the K
nearest neighbors are identified, and the majority class of
the neighbors determines the predicted class for the new
point.
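Putting distance calculation, scaling, and majority voting together with scikit-learn (a sketch on the Iris dataset; the train/test split and K=5 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Scale features, then classify each test point by the majority class
# of its 5 nearest (Euclidean) neighbors in the training set.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # test-set accuracy
```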
KNN FOR REGRESSION

• K-NN for regression predicts continuous values rather
than discrete classes.
• Instead of majority voting, the predicted value for a new
point is the average of the values of its K nearest
neighbors.
KNN FOR REGRESSION

Distance Calculation: same as for classification, using Euclidean distance or another distance
metric.
• Prediction Rule: the predicted value for regression is the average of the values of the K
nearest neighbors.
Formula: y_pred = (1/K) * sum of y_i over the K nearest neighbors
• Where y_i is the target value for the i-th nearest neighbor.
• Weighted Average (Optional): a weighted average can be used, where closer neighbors have
more influence: y_pred = sum of (w_i * y_i) / sum of w_i
Where w_i is the weight based on distance (e.g., w_i = 1 / d_i).
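A minimal regression sketch with scikit-learn, including the optional distance-weighted average (the synthetic 1-D data and K=5 are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression data: y is a noisy function of x.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Plain average of the 5 nearest neighbors' target values.
uniform = KNeighborsRegressor(n_neighbors=5, weights="uniform").fit(X, y)
# Weighted average: closer neighbors (weight 1/distance) count more.
weighted = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)

x_new = [[2.5]]
print(uniform.predict(x_new), weighted.predict(x_new))
```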
CHOOSING K FOR CLASSIFICATION & REGRESSION

Choosing K:
• Small K: small K values (e.g., K=1) are sensitive to noise and can lead to overfitting.
• Large K: large K values (e.g., K=20) may smooth out the model too much, leading to
underfitting.
• The optimal value of K is typically found by cross-validation.
For Classification:
K should be large enough to avoid noise but small enough to preserve local patterns in
the data.
For Regression:
The optimal K balances the local behavior of the data against generalizing trends.
KNN FOR CLASSIFICATION VS REGRESSION

Classification: K-NN assigns a class label based on the majority vote of the K
nearest neighbors.
Regression: K-NN predicts a continuous value based on the average (or weighted
average) of the K nearest neighbors' target values.
Mathematical Difference:
In classification, the output is a class label.
In regression, the output is a continuous value (mean or weighted mean of
neighbors’ target values).
CHALLENGES & LIMITATIONS
1. Computational Cost (Slow Prediction Time):
Problem: K-NN calculates the distance between the test point and all points in the training
dataset at prediction time, making it computationally expensive.
Example: In a dataset with 1 million data points, predicting the class of a new data point
requires calculating the distance between this point and all 1 million points, which can be
slow.
2. Curse of Dimensionality:
Problem: As the number of features (dimensions) increases, the concept of distance
becomes less meaningful, leading to poor performance in high-dimensional spaces.
Example: When classifying high-dimensional data (e.g., pixel values of an image with 100
features), the notion of 'closeness' becomes distorted.
CHALLENGES & LIMITATIONS
3. Memory and Storage Requirements:
Problem: K-NN stores the entire training dataset, which can be inefficient if the
dataset is very large.
Example: For a large image dataset, storing millions of images can consume a lot of
memory.
4. Sensitivity to Noisy Data and Outliers:
Problem: K-NN relies on the proximity of data points to make predictions. If there are
noisy data points or outliers, they can influence the classification of the test point.
Example: In a classification task with most data points labeled as 'cat' but one outlier
labeled 'dog', the test point may be incorrectly classified as 'dog'.
OPTIMIZING KNN
KD-Tree Overview:
A KD-Tree is a hierarchical binary tree used to organize data points
in a k-dimensional space.
How it Works: KD-Trees recursively split the data into two halves
along the median of each dimension, creating a binary tree
structure.
Benefit: Searching for nearest neighbors becomes more efficient,
as the tree structure allows for pruning of large portions of the
data, reducing the search space during prediction.
Example: When a test point is provided, the KD-Tree quickly
narrows down the region of interest by traversing the tree
structure, making the search faster than a brute-force search.
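A small sketch of a KD-Tree nearest-neighbor query, here using scikit-learn's KDTree (the random low-dimensional data is illustrative):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))   # 10,000 points in 3-D space

# Build the KD-Tree once; queries can then prune large regions of the space.
tree = KDTree(X)

query = rng.random((1, 3))
dist, ind = tree.query(query, k=5)   # distances and indices of the 5 nearest points
print(ind, dist)
```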
BALL TREE
Ball Tree Overview:
A Ball Tree is another hierarchical data structure designed for
high-dimensional spaces, much like a KD-Tree, but optimized
for cases where data has more than 2-3 dimensions.
How it Works: The Ball Tree organizes data points into
hierarchical clusters (balls) based on distance from centroids,
recursively subdividing the dataset into smaller balls.
Benefit: Ball Trees are especially efficient when dealing with
high-dimensional data (e.g., image data with hundreds of
features).
How Ball Tree Works: The tree recursively divides the data
into smaller clusters, calculating the distance from the
centroid of each cluster.
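The equivalent sketch with scikit-learn's BallTree, which tends to hold up better as the number of dimensions grows (again on random data chosen for illustration):

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(1)
X = rng.random((10_000, 50))   # higher-dimensional data, where KD-Trees degrade

# Build the Ball Tree: points are grouped into nested "balls" around centroids.
tree = BallTree(X)

dist, ind = tree.query(rng.random((1, 50)), k=5)
print(ind, dist)

# In scikit-learn's KNeighborsClassifier/Regressor the same structures are
# selected via algorithm="kd_tree", "ball_tree", or "auto".
```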
CONCLUSION
In summary, K-NN is a good choice for problems where simplicity and flexibility
are crucial, but for larger datasets or high-dimensional data, optimizations like
KD-Tree or Ball Tree are essential for ensuring performance.
THANK YOU
