5. K-Nearest Neighbors Classifiers 2025

The document provides an overview of the K-Nearest Neighbors (K-NN) algorithm, which is primarily used for classification problems by comparing new data entries to existing data based on proximity. It outlines the procedure for implementing K-NN, including choosing the value of K, calculating distances using metrics like Euclidean and Manhattan distances, and classifying new entries based on majority voting among neighbors. Additionally, it includes a practical example of classifying iris species using the K-NN algorithm with the Iris dataset from scikit-learn.

Classification Algorithms
Pushparaj, Amrita Univ, Cbe

K-Nearest Neighbors Classifier (KNN)

K-Nearest Neighbors (K-NN) algorithm

• Used mostly for solving classification problems
• Compares a new data entry to the existing entries in a data set
• Based on its closeness or similarity to its K nearest neighbors, the algorithm assigns the new entry to a class or category in the training data


K-NN Procedure
• Step #1 - Assign a value to K.
• Step #2 - Calculate the distance between the new data entry and all existing data entries, then arrange the distances in ascending order.
• Step #3 - Find the K nearest neighbors to the new entry based on the calculated distances.
• Step #4 - Assign the new data entry to the majority class among those K neighbors.
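These steps translate directly into code. Below is a minimal from-scratch sketch in Python (not the document's own code; the knn_predict name and the toy data are illustrative only), using Euclidean distance and a simple majority vote:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step #2: distance from the new entry to every existing entry
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step #3: indices of the K smallest distances (ascending order)
    nearest = np.argsort(distances)[:k]
    # Step #4: majority vote among the K nearest labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical toy data: two features, two classes
X_train = np.array([[1.0, 2.0], [2.0, 1.5], [8.0, 8.5], [9.0, 9.0]])
y_train = np.array(["red", "red", "blue", "blue"])
print(knn_predict(X_train, y_train, np.array([1.5, 1.8]), k=3))  # -> red

scikit-learn's KNeighborsClassifier, used later in this document, implements the same idea with more options and much better performance.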


[Figure: scatter plot of a data set with two classes, red and blue; a new data entry, the green point, has been introduced.]

• Assign a value to K, which denotes the number of neighbors to consider before classifying the new data entry.
• Let's assume the value of K is 3.
• Since K is 3, the algorithm will only consider the 3 nearest neighbors to the green point (the new entry).
• Out of the 3 nearest neighbors in the diagram, the majority class is red, so the new entry will be assigned to that class.
K-Nearest Neighbors Classifier and Model Example With Data Set
• We have two columns: Brightness and Saturation.
• Each row in the table has a class of either Red or Blue.
• Let's assume the value of K is 5.
• The new data entry:
[Table of Brightness/Saturation values not reproduced in this copy.]


Distance Metrics Used in the KNN Algorithm
• Euclidean Distance
• Measures the straight-line distance between the query point and the other point being measured
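The slide's formula image is not reproduced in this copy; for reference, the standard Euclidean distance between two n-dimensional points p and q is:

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}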


• Manhattan Distance
• Measures the sum of the absolute differences between the coordinates of two points (a city-block path rather than a straight line)
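Likewise, the standard Manhattan distance formula (supplied here for reference) is:

d(p, q) = \sum_{i=1}^{n} |p_i - q_i|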


How to Calculate Euclidean Distance

• Here's the new data entry:
[New entry values not reproduced in this copy.]
• To know its class, we have to calculate the distance from the new entry to the other entries in the data set using the Euclidean distance formula.

• The majority class within the 5 nearest neighbors to the new entry is Red. Therefore, we'll classify the new entry as Red.
How to Choose the Value of K in the K-NN Algorithm

• There is no single rule for choosing the value of K, but here are some common conventions to keep in mind:
• Choosing a very low value will most likely lead to inaccurate predictions, since the model becomes sensitive to noise.
• A commonly used value of K is 5.
• Use an odd number as the value of K in binary classification, so that majority votes cannot tie.
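In practice, K is usually chosen empirically rather than by convention alone. The following is a minimal sketch (not from the document) that uses scikit-learn's cross_val_score to compare candidate values of K by 5-fold cross-validation on the Iris data:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# compare odd values of K by 5-fold cross-validation
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print("K={}: mean accuracy {:.3f}".format(k, scores.mean()))

The K with the highest mean accuracy is a reasonable choice for the final model.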


Classifying Iris Species
❖ An ML model for distinguishing the species of some iris flowers
❖ Data: iris measurements in cm:
❖ the length and width of the petals
❖ the length and width of the sepals
❖ Each iris instance has a class label giving its species (setosa, versicolor, or virginica)
❖ The goal is to build a machine learning model that can learn from the measurements of irises whose species is known, so that we can predict the species of a new iris.
❖ A three-class classification problem
❖ Data: the Iris dataset
[Figure: parts of the iris flower.]
Data: Iris dataset
Included in scikit-learn in the datasets module.
Load it by calling the load_iris function:

from sklearn.datasets import load_iris

iris_dataset = load_iris()

The iris object returned by load_iris is a Bunch object, which is very similar to a dictionary. It contains keys and values:

print("Keys of iris_dataset:\n{}".format(iris_dataset.keys()))

Keys of iris_dataset:
dict_keys(['target_names', 'feature_names', 'DESCR', 'data', 'target'])
The value of the key DESCR is a short description of the dataset:

print(iris_dataset['DESCR'][:193] + "\n...")

The value of the key target_names is an array of strings containing the species of flower:

print("Target names: {}".format(iris_dataset['target_names']))
Target names: ['setosa' 'versicolor' 'virginica']

print("Feature names:\n{}".format(iris_dataset['feature_names']))
Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print("Type of data: {}".format(type(iris_dataset['data'])))
Type of data: <class 'numpy.ndarray'>
print("Shape of data: {}".format(iris_dataset['data'].shape))
Shape of data: (150, 4)
First five rows of data:
[[ 5.1 3.5 1.4 0.2]
print("First five rows of data:\n{}".format(iris_dataset['data'][:5])) [ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]
[ 5. 3.6 1.4 0.2]]

print("Type of target: {}".format(type(iris_dataset['target'])))

Type of target: <class 'numpy.ndarray'>

Pushparaj, Amrita Univ, Cbe


print("Shape of target: {}".format(iris_dataset['target'].shape))
Shape of target: (150,)

print("Target:\n{}".format(iris_dataset['target']))

Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0000000000000111111111111111111111111
1111111111111111111111111122222222222
2222222222222222222222222222222222222
2 2]

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data’],
iris_dataset['target'], random_state=0)
Pushparaj, Amrita Univ, Cbe
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))

X_train shape: (112, 4)


y_train shape: (112,)

print("X_test shape: {}".format(X_test.shape))


print("y_test shape: {}".format(y_test.shape))

X_test shape: (38, 4)


y_test shape: (38,)
Pushparaj, Amrita Univ, Cbe
Inspecting the data
To find abnormalities (e.g., some measurements may be in inches, not in centimeters).
Best way: visualization, such as a scatter plot or pair plot.
A pair plot shows all possible pairs of features.
To create the plot, we first convert the NumPy array into a pandas DataFrame.
pandas has a function to create pair plots called scatter_matrix.
The diagonal of this matrix is filled with histograms of each feature.

import pandas as pd
import mglearn

# create dataframe from data in X_train
# label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
# create a scatter matrix from the dataframe, color by y_train
grr = pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15),
                                 marker='o', hist_kwds={'bins': 20}, s=60,
                                 alpha=.8, cmap=mglearn.cm3)

[Figure: pair plot of the Iris training data, colored by class label.]
Building the Model: k-Nearest Neighbors
To set the parameters of the k-NN model:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

To build the model on the training set, we call the fit method of the knn object, which takes as arguments the NumPy array X_train containing the training data and the NumPy array y_train of the corresponding training labels:

knn.fit(X_train, y_train)
Making Predictions
Imagine we found an iris in the wild with a sepal length of 5 cm, a sepal width of 2.9 cm, a petal length of 1 cm, and a petal width of 0.2 cm. What species of iris would this be?

import numpy as np

X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape: {}".format(X_new.shape))
X_new.shape: (1, 4)

To make a prediction, we call the predict method of the knn object:

prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(
    iris_dataset['target_names'][prediction]))
Evaluating the Model

y_pred = knn.predict(X_test)
print("Test set predictions:\n{}".format(y_pred))

Test set predictions:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0 2]

print("Test set score: {:.2f}".format(np.mean(y_pred == y_test)))
Test set score: 0.97

We can also use the score method of the knn object, which computes the same mean accuracy:

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))
