
K-Nearest Neighbour Classifier

Prerequisites-

- Linear algebra and the Cartesian plane
- Evaluation metrics for classification accuracy

Objectives (the learner will be able to understand and explain):

- What the KNN classifier is
- Different types of distance measures
- How to choose the K value
- Pros and cons of the KNN classifier

KNN Algorithm-

The KNN algorithm, also known as the K-Nearest Neighbours algorithm, is used to solve both classification and regression problems. Its working principle is based on feature similarity in both cases. The KNN classifier differs from probabilistic classifiers, where the model comprises a learning step of computing probabilities from a training sample and using them for future prediction on test samples. In a probability-based model, once the model is trained the training sample can be thrown away and classification is done using the computed probabilities.

In KNN there is no learning step; instead the dataset is stored in memory and used to classify the test query on the fly. KNN is also known as a lazy learner because it does not build a model from the training set in advance. It is one of the simplest methods of classification. In KNN, the term 'k' is a parameter that refers to the number of nearest neighbours. The classification procedure for a query point q works in two steps:

1. Find the K neighbours in the dataset which are closest to q based on the similarity measure.
2. Use these K neighbours to determine the class of q using majority voting.
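
The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under the definitions above, not a production implementation; the function name knn_predict and the NumPy-based layout are choices made for this note.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Step 1: Euclidean distances from the query to every training point
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Step 2: take the k closest points and vote on the most common label
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]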

Distance Measure-

The KNN classifier needs to compute the similarity or distance of the test query from each sample point in the training dataset. Several methods are used to compute these distances, and the choice depends entirely on the types of features in the dataset. The popular distance measures are as follows:

Euclidean Distance-

It is the most commonly used distance metric, defined as the square root of the sum of squared differences between two points. Let the two points be P(x_1, x_2) and Q(y_1, y_2); the Euclidean distance is given by:

$PQ_{Euclidean} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$

In general,

$PQ_{Euclidean} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Manhattan Distance-

It is also known as the city block distance or absolute distance. The measure is inspired by the grid structure of Manhattan, where the distance between two points is measured along the city road grid. The distance is defined as the sum of absolute differences between the two points' coordinates.

$PQ_{Manhattan} = |x_1 - y_1| + |x_2 - y_2|$

or

$PQ_{Manhattan} = \sum_{i=1}^{n} |x_i - y_i|$

Chebyshev Distance-

This distance is also known as the maximum value distance or chessboard distance. It is based on the maximum absolute difference between the coordinates of a pair of points. This distance is equally applicable to quantitative and ordinal variables.

$PQ_{Chebyshev} = \max(|x_1 - y_1|, |x_2 - y_2|)$

or

$PQ_{Chebyshev} = \max_{i}(|x_i - y_i|)$

Example: Let P(1, 2) and Q(3, 5).

$PQ_{Euclidean} = \sqrt{(1 - 3)^2 + (2 - 5)^2} = \sqrt{13}$

$PQ_{Manhattan} = |1 - 3| + |2 - 5| = 5$

$PQ_{Chebyshev} = \max(|1 - 3|, |2 - 5|) = 3$
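
As a quick check, these three distances can be reproduced in Python (a small sketch, assuming NumPy is available):

import numpy as np

P, Q = np.array([1, 2]), np.array([3, 5])
diff = np.abs(P - Q)

print(np.sqrt((diff ** 2).sum()))  # Euclidean: sqrt(13) = 3.6056
print(diff.sum())                  # Manhattan: 5
print(diff.max())                  # Chebyshev: 3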

Minkowski Distance-

The Minkowski distance is a generalized distance measure: by manipulating its parameter p, the distance measures stated above can be obtained as special cases.

$PQ_{Minkowski} = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$

When p = 1, the Minkowski distance becomes the Manhattan distance.
When p = 2, the Minkowski distance becomes the Euclidean distance.
When p = ∞, the Minkowski distance becomes the Chebyshev distance.
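
A small NumPy sketch makes the limiting behaviour concrete; a large finite p stands in for p = ∞:

import numpy as np

def minkowski(x, y, p):
    # p = 1: Manhattan, p = 2: Euclidean, p -> infinity: Chebyshev
    return (np.abs(x - y) ** p).sum() ** (1.0 / p)

P, Q = np.array([1, 2]), np.array([3, 5])
print(minkowski(P, Q, 1))    # 5.0 (Manhattan)
print(minkowski(P, Q, 2))    # 3.6056 (Euclidean)
print(minkowski(P, Q, 100))  # ~3.0, approaching the Chebyshev distance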

Mahalanobis Distance-

This distance measure is used to calculate the distance between two points in multivariate space. The idea is to calculate the distance of a point P from a distribution D in terms of the standard deviation and mean of D. The main advantage of the Mahalanobis distance is that it incorporates the covariance of the distribution when measuring the similarity between two points. The distance equation is given by:

$PQ_{Mahalanobis} = \sqrt{(P - Q)^{T} S^{-1} (P - Q)}$

where P and Q are two random vectors from the same distribution and S is the covariance matrix.
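
A minimal NumPy sketch of the formula; the sample points used here to estimate the covariance matrix S are hypothetical and serve only to make the example runnable:

import numpy as np

# Hypothetical sample from the distribution, used to estimate S
data = np.array([[6.3, 5.1], [6.0, 4.8], [6.8, 4.8], [5.7, 3.7], [5.7, 4.1]])
S_inv = np.linalg.inv(np.cov(data, rowvar=False))

P, Q = np.array([5.6, 3.8]), np.array([6.0, 4.8])
d = P - Q
print(np.sqrt(d @ S_inv @ d))  # (P-Q)^T S^-1 (P-Q), then the square root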

NOTE: The most widely used distance measure is the Euclidean distance, but every distance has its own purpose and importance. One cannot claim that a single distance measure is always the most accurate.

How to choose the value of K-

The choice of the K value in the KNN classifier is critical. A small value of K makes the result more sensitive to noise, whereas a large value makes the algorithm computationally expensive. Some heuristics suggest choosing K = sqrt(N)/2, where N is the size of the training dataset. Apart from this, an odd value of K (3, 5, 7, ...) helps to avoid ties between predicted classes. In practice, K is often tuned by cross-validation, as sketched below.
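
A sketch of tuning K by cross-validation with scikit-learn; the toy dataset from make_classification stands in for a real training set:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Toy data standing in for a real training set
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Try odd K values to avoid ties; keep the one with the best CV accuracy
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print(search.best_params_)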

Other Variants-

1. Radius Neighbour Classifier- The radius neighbour classifier implements learning by voting among the neighbours within a fixed radius R of the query point. It is a good choice when the sampling of the data is not uniform. However, if the dataset has many attributes and is sparse, this method becomes ineffective due to the curse of dimensionality.

2. KD Tree Nearest Neighbour- This method uses a KD-tree data structure to implement the classifier. It reduces the overall computation time of the KNN classifier and is effective when the training set contains a large number of samples with few dimensions.
3. KNN Regression- The general principle of KNN regression is very similar to the KNN classifier, except that the target is a continuous real value instead of a discrete class and is predicted by averaging the values of the neighbours (all three variants are sketched after this list).
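
All three variants are available in scikit-learn; a brief sketch of how each is instantiated (class and parameter names follow scikit-learn's API):

from sklearn.neighbors import (KNeighborsClassifier, KNeighborsRegressor,
                               RadiusNeighborsClassifier)

# 1. Fixed-radius variant: vote among all training points within radius R
radius_clf = RadiusNeighborsClassifier(radius=1.0)

# 2. KD-tree backed KNN: faster queries for many samples with few dimensions
kd_clf = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")

# 3. KNN regression: predict the average of the neighbours' target values
knn_reg = KNeighborsRegressor(n_neighbors=3)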

Example- Consider the training data given in the table below, and suppose we are asked to predict the label of a query point q(5.6, 3.8). Let us use the Euclidean distance to measure the similarity between the query and the training sample points, with K = 3.

X1 X2 Label
6.3 5.1 0
6.0 4.8 0
6.8 4.8 1
5.7 3.7 1
5.7 4.1 1
5.5 4.0 1
6.5 5.1 0
5.5 3.7 1
4.9 4.5 0
[email protected]
QN3DYCG1XF
Step 1- Compute the distance between the query and each sample point

$d_1 = \sqrt{(5.6 - 6.3)^2 + (3.8 - 5.1)^2} = 1.476$

$d_2 = \sqrt{(5.6 - 6.0)^2 + (3.8 - 4.8)^2} = 1.077$

$d_3 = \sqrt{(5.6 - 6.8)^2 + (3.8 - 4.8)^2} = 1.562$

$d_4 = \sqrt{(5.6 - 5.7)^2 + (3.8 - 3.7)^2} = 0.141$

$d_5 = \sqrt{(5.6 - 5.7)^2 + (3.8 - 4.1)^2} = 0.316$

$d_6 = \sqrt{(5.6 - 5.5)^2 + (3.8 - 4.0)^2} = 0.224$

$d_7 = \sqrt{(5.6 - 6.5)^2 + (3.8 - 5.1)^2} = 1.581$

$d_8 = \sqrt{(5.6 - 5.5)^2 + (3.8 - 3.7)^2} = 0.141$

$d_9 = \sqrt{(5.6 - 4.9)^2 + (3.8 - 4.5)^2} = 0.990$

Step 2- Select the top K = 3 neighbours based on similarity

The top 3 neighbours are P4, P8 and P6, with $d_4 = 0.141$, $d_8 = 0.141$ and $d_6 = 0.224$.

Step 3- Determine the class of q by majority voting.

Point X1 X2 Label

Point 4 5.7 3.7 1

Point 8 5.5 3.7 1

Point 6 5.5 4.0 1


Query q 5.6 3.8 Predicted Class- 1

ONE MORE TRY

Select the top K = 5 neighbours based on similarity

The top 5 neighbours are P4, P8, P6, P5 and P9, with $d_4 = 0.141$, $d_8 = 0.141$, $d_6 = 0.224$, $d_5 = 0.316$ and $d_9 = 0.990$.

Point X1 X2 Label

Point 4 5.7 3.7 1

Point 8 5.5 3.7 1

Point 6 5.5 4.0 1

Point 5 5.7 4.1 1

Point 9 4.9 4.5 0

Query q 5.6 3.8 Predicted Class- 1
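
The whole worked example can be reproduced with a short standalone script; the arrays below are the training table and the query point from above:

import numpy as np
from collections import Counter

X = np.array([[6.3, 5.1], [6.0, 4.8], [6.8, 4.8], [5.7, 3.7], [5.7, 4.1],
              [5.5, 4.0], [6.5, 5.1], [5.5, 3.7], [4.9, 4.5]])
y = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0])
q = np.array([5.6, 3.8])

distances = np.sqrt(((X - q) ** 2).sum(axis=1))
for k in (3, 5):
    nearest = np.argsort(distances)[:k]
    vote = Counter(y[nearest]).most_common(1)[0][0]
    print(f"K={k}: points {nearest + 1}, predicted class {vote}")  # class 1 both times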

Pros and Cons-

Pros- The K-nearest neighbour algorithm is very simple to implement, and it is robust when the noise in the training data is small. It also makes no assumptions about the distribution of the classes and can work with multiple classes simultaneously.

Cons- It calculates distances to every training point for each new query, so it becomes computationally expensive (lazy learner). The method is not effective when the class distributions overlap with each other. Fixing an optimal value of K is a key challenge in the KNN method.

******
