
K-Nearest Neighbors

Simple Analogy
KNN – Different names
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
What is KNN?
• A powerful classification algorithm used in pattern recognition.
• KNN stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
• One of the top data mining algorithms in use today.
• A non-parametric, lazy learning algorithm (an instance-based learning method).
KNN: Classification Approach
• A new instance is classified by a majority vote of its neighbors' classes.
• The object is assigned to the most common class among its K nearest neighbors, as measured by a distance function.
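
A minimal sketch of the majority-vote step in Python (not from the original slides), assuming the labels of the K nearest neighbors have already been collected:

    from collections import Counter

    def majority_vote(neighbor_labels):
        """Return the most common class label among the K nearest neighbors."""
        return Counter(neighbor_labels).most_common(1)[0][0]

    print(majority_vote(['spam', 'spam', 'ham']))  # 'spam'
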
Distance Measure
Distance measure for Continuous
Variables
Distance Between Neighbors
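
A minimal sketch of the standard distance measures for continuous variables (Euclidean, Manhattan, and the general Minkowski form), assuming numeric feature vectors of equal length:

    import math

    def euclidean(x, y):
        """Square root of the sum of squared differences: straight-line distance."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def manhattan(x, y):
        """Sum of absolute differences: city-block distance."""
        return sum(abs(a - b) for a, b in zip(x, y))

    def minkowski(x, y, p=2):
        """General form: p=1 gives Manhattan, p=2 gives Euclidean."""
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)
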
K-Nearest Neighbor Algorithm
• All instances correspond to points in an n-dimensional feature space.
• Each instance is represented by a set of numerical attributes.
• Each training example consists of a feature vector and an associated class label.
• Classification compares the feature vector of a new example E against the training points (see the sketch after this list):
• Select the K examples nearest to E in the training set.
• Assign E to the most common class among its K nearest neighbors.
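
A minimal sketch of this procedure, assuming the euclidean helper from the distance sketch above and a training set given as (feature_vector, class_label) pairs:

    from collections import Counter

    def knn_classify(E, training_set, k=3, distance=euclidean):
        """Classify E by majority vote among its K nearest training examples."""
        # Rank all training examples by distance to E and keep the K nearest.
        neighbors = sorted(training_set, key=lambda pair: distance(E, pair[0]))[:k]
        labels = [label for _, label in neighbors]
        return Counter(labels).most_common(1)[0][0]
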
3-KNN: Example
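
A hypothetical toy run of knn_classify with K = 3 (the points and labels below are made up for illustration; the slide's own example is not reproduced here):

    # Hypothetical 2-D points labeled 'A' or 'B'.
    training = [((1.0, 1.0), 'A'), ((1.5, 2.0), 'A'),
                ((5.0, 5.0), 'B'), ((6.0, 5.5), 'B'), ((5.5, 4.5), 'B')]

    print(knn_classify((1.2, 1.5), training, k=3))  # -> 'A' (neighbors: A, A, B)
    print(knn_classify((5.2, 5.0), training, k=3))  # -> 'B' (neighbors: B, B, B)
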
How to choose K?
• If K is too small, the result is sensitive to noise points.
• A larger K works well, but too large a K may include majority points from other classes.
• A rule of thumb is K < sqrt(n), where n is the number of examples.
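
One practical way to choose K, beyond the rule of thumb on the slide, is to compare candidate values by leave-one-out cross-validation; a minimal sketch, assuming knn_classify from above:

    def choose_k(dataset, candidate_ks):
        """Pick K by leave-one-out accuracy on a list of (vector, label) pairs."""
        best_k, best_acc = None, -1.0
        for k in candidate_ks:
            hits = sum(
                knn_classify(x, dataset[:i] + dataset[i + 1:], k=k) == y
                for i, (x, y) in enumerate(dataset)
            )
            acc = hits / len(dataset)
            if acc > best_acc:
                best_k, best_acc = k, acc
        return best_k
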
KNN Feature Weighting
• Scale each feature by its importance for classification.
• Weights can come from prior knowledge about which features are more important.
• Weights w_k can also be learned using cross-validation.
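
A minimal sketch of a weighted Euclidean distance; the weights are passed in explicitly here and would in practice come from prior knowledge or cross-validation:

    import math

    def weighted_euclidean(x, y, weights):
        """Euclidean distance with per-feature importance weights w_k."""
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, x, y)))
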
Feature Normalization
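
One common normalization scheme is min-max scaling, which rescales each feature to [0, 1] so that attributes with large ranges do not dominate the distance; a minimal sketch (one possible choice; other schemes such as z-scores also work):

    def min_max_normalize(vectors):
        """Rescale each feature column to [0, 1]: x' = (x - min) / (max - min)."""
        cols = list(zip(*vectors))
        mins = [min(c) for c in cols]
        maxs = [max(c) for c in cols]
        return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                      for v, lo, hi in zip(vec, mins, maxs))
                for vec in vectors]

    print(min_max_normalize([(1, 10), (2, 20), (3, 30)]))
    # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
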
Nominal/Categorical Data
• Distance works naturally with numerical attributes.
• Binary categorical attributes can be encoded as 1 or 0.
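
A minimal sketch of this encoding, mapping a binary categorical value to 1 or 0 so the numeric distance functions above apply directly:

    def encode_binary(value, positive):
        """Return 1 if value equals the designated `positive` category, else 0."""
        return 1 if value == positive else 0

    print(encode_binary('yes', positive='yes'))  # 1
    print(encode_binary('no', positive='yes'))   # 0
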
KNN Classification – Distance
KNN Classification – Standardized Distance
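
The standardized variant computes the same distance after rescaling each attribute; assuming the common min-max convention (cf. min_max_normalize above):

    X_s = (X - X_min) / (X_max - X_min)
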
Example
• Dataset
• Classify the new example:
1. Compute the distance to each training example.
2. Find the rank of each training example by distance and take the K nearest.
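
A minimal sketch of these two steps with hypothetical numbers (the slide's actual dataset is not reproduced here), assuming the euclidean helper from above:

    # Step 1: distance from the new example to each training example.
    # Step 2: rank the training examples by that distance.
    new_example = (3.0, 7.0)
    training = [((7.0, 7.0), 'Bad'), ((7.0, 4.0), 'Bad'),
                ((3.0, 4.0), 'Good'), ((1.0, 4.0), 'Good')]

    ranked = sorted(training, key=lambda pair: euclidean(new_example, pair[0]))
    for rank, (vec, label) in enumerate(ranked, start=1):
        print(rank, vec, label, round(euclidean(new_example, vec), 2))
    # With K = 3 the top ranks are Good, Good, Bad -> classify as 'Good'.
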
Strengths of KNN
• Very simple and intuitive.
• Can be applied to data from any distribution.
• Gives good classification if the number of samples is large enough.
Weaknesses of KNN
• Takes more time to classify a new example.
• The distance from the new example to all stored examples must be calculated and compared.
• Choosing K may be tricky.
• Needs a large number of samples for accuracy.
