Group Assignment: Understanding the K-Nearest Neighbors (KNN) Algorithm


Group Members

Nguyễn Ngọc Thanh Lam
Lê Ngọc Hân
Đỗ Nguyễn Quỳnh Như
Phan Ngọc Diễm Trinh
Mai Thị Huỳnh Như
Lê Thị Thúy Ngọc

1. What is the K-Nearest Neighbors Algorithm?


- K-Nearest Neighbors (KNN) is a supervised machine learning model that can be
used for both regression and classification tasks. The algorithm is non-parametric,
meaning it makes no assumptions about the underlying distribution of the data.
- KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and is widely applied in
pattern recognition, data mining, and intrusion detection.
- It is also called a lazy learner algorithm because it does not learn from the training
set immediately; instead, it stores the dataset and only performs computation at
classification time.
- During the training phase, KNN simply stores the dataset; when it receives new
data, it classifies that data into the category it is most similar to.
Example: Suppose we have an image of a creature that looks similar to both a cat and
a dog, and we want to know which it is. We can use the KNN algorithm for this
identification, since it works on a similarity measure. The KNN model compares the
features of the new image with those of the cat and dog images and, based on the most
similar features, places it in either the cat or the dog category.
2. How does K-Nearest Neighbors Classification work?

The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity: it
predicts the label or value of a new data point by considering the labels or values of its K
nearest neighbors in the training dataset.

A step-by-step explanation of how KNN works is given below:


Step 1: Selecting the optimal value of K
K represents the number of nearest neighbors that need to be considered while making a
prediction.
Step 2: Calculating distance
To measure the similarity between the target point and the training data points, Euclidean
distance is commonly used. The distance is calculated between the target point and each
data point in the dataset.
Step 3: Finding Nearest Neighbors
The K data points with the smallest distances to the target point are its nearest neighbors.
Step 4: Voting for Classification or Taking the Average for Regression
In a classification problem, the class label is determined by majority voting among the
K nearest neighbors. The class with the most occurrences among the neighbors
becomes the predicted class for the target data point.
In a regression problem, the predicted value is calculated by taking the average of the target
values of the K nearest neighbors. This average becomes the predicted output for the
target data point.
Let X be the training dataset with n data points, where each data point Xi is represented by a
d-dimensional feature vector, and let Y be the corresponding labels or values for each data
point in X. Given a new data point x, the algorithm calculates the distance between x and
each data point Xi in X using a distance metric, such as the Euclidean distance:

d(x, Xi) = sqrt( (x1 − Xi1)² + (x2 − Xi2)² + … + (xd − Xid)² )

The algorithm then selects the K data points from X that have the shortest distances to x. For
classification tasks, it assigns to x the label y that is most frequent among the K
nearest neighbors. For regression tasks, it calculates the average or weighted average of the
values y of the K nearest neighbors and assigns it as the predicted value for x.
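To make the four steps concrete, here is a minimal from-scratch sketch of the classification
case in Python. The function name, toy data, and labels are purely illustrative and not taken
from the text above:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Predict the class of x_new by majority vote among its k nearest neighbors."""
    # Step 2: Euclidean distance between x_new and every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the labels of the k nearest neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy usage with two 2-D classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.2]])
y_train = ["cat", "cat", "dog", "dog"]
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # prints "cat"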

3. Distance Metrics Used in KNN Algorithm


The distance between the new data point and all the points in the training set is
calculated. Common methods of calculating the distance include:

● Euclidean Distance

This is the Cartesian distance between two points in the plane or hyperplane. Euclidean
distance can also be visualized as the length of the straight line joining the two points
under consideration. This metric corresponds to the net displacement between two
states of an object.

● Manhattan Distance

The Manhattan distance metric is generally used when we are interested in the total
distance traveled by the object rather than its displacement. It is calculated by summing
the absolute differences between the coordinates of the points across the n dimensions.

● Minkowski Distance

Both the Euclidean and the Manhattan distance are special cases of the Minkowski
distance:

D(x, y) = ( Σ |xi − yi|^p )^(1/p), summed over the d coordinates.

From this formula we can see that when p = 2 it is the same as the formula for the
Euclidean distance, and when p = 1 we obtain the formula for the Manhattan distance.
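Because Euclidean and Manhattan distance are just Minkowski with p = 2 and p = 1, a single
helper covers all three. The snippet below is a small illustrative sketch assuming NumPy is
available; the sample vectors are made up for demonstration:

import numpy as np

def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return (np.abs(a - b) ** p).sum() ** (1.0 / p)

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(minkowski(a, b, p=1))  # Manhattan: |1-4| + |2-6| + |3-3| = 7.0
print(minkowski(a, b, p=2))  # Euclidean: sqrt(9 + 16 + 0)  = 5.0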
4. How to choose the value of K for the KNN Algorithm?
There is no fixed rule for determining the best value of K, so we need to try several
values and pick the one that works best. A commonly preferred value for K is 5.
A very low value of K, such as K = 1 or K = 2, can be noisy and makes the model
sensitive to outliers.
Larger values of K smooth out noise, but if K is too large the model can have difficulty
distinguishing between classes and becomes more expensive to compute.
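In practice, a common way to "try some values" is cross-validation. The sketch below
assumes scikit-learn is available and uses its bundled Iris dataset purely as an example;
the range of K values tested is an arbitrary choice:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd values of K with 5-fold cross-validation and keep the best one
scores = {}
for k in range(1, 22, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("Best K:", best_k, "with accuracy", round(scores[best_k], 3))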
5. K-nearest neighbors for handwriting recognition

One of the classic applications of the K-Nearest Neighbors (KNN) algorithm is
handwriting recognition, particularly for recognizing handwritten digits. This
application demonstrates the versatility and effectiveness of KNN in dealing with
high-dimensional data. Here is how KNN is used for handwriting recognition:

How It Works:

- Dataset: The algorithm uses datasets like the MNIST (Modified National
Institute of Standards and Technology) database, which contains thousands of
labeled images of handwritten digits (0–9). Each image is represented as a
high-dimensional vector of pixel intensities (e.g., a 28x28 pixel image flattened
into a 784-dimensional vector).
- Feature Extraction: In this task, each pixel in the image is treated as a feature.
The goal is to compare the pixel intensity values between the test image and the
training images.
- Distance Calculation: To classify a new handwritten digit, KNN computes the
distance between the feature vector of the test image and the feature vectors of
all the training images. The Euclidean distance is commonly used, though other
distance metrics can also be applied.
- Finding Neighbors: The algorithm identifies the K nearest neighbors—training
images with the smallest distances to the test image.
- Classification: KNN assigns the most frequently occurring digit among the
K nearest neighbors as the predicted label for the test image.
Advantages in Handwriting Recognition

- Simplicity: KNN is straightforward and requires no complex training phase,
making it ideal for rapid prototyping.
- High Accuracy for Small Datasets: When paired with a robust distance metric,
KNN performs well on relatively small handwriting datasets.

Challenges

- High Dimensionality: Each image contributes hundreds of features, leading to
increased computation time and memory usage during classification.
- Scalability: KNN becomes computationally expensive with large datasets since
it involves computing distances to every training sample during inference.

Example:

Suppose you want to classify the following digit:

- Input: A 28x28 image of a handwritten number "7".
- Steps:
○ Flatten the image into a 784-dimensional vector.
○ Calculate distances between this vector and all labeled images in the
training set.
○ Select the K nearest neighbors (e.g., K=5).
○ Perform majority voting. If the nearest neighbors have labels [7, 7, 1, 7,
3], the predicted label is "7".
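The same steps can be sketched with scikit-learn. Note that this sketch uses scikit-learn's
bundled 8x8 digits dataset as a small stand-in for MNIST, so each image flattens to a
64-dimensional vector instead of 784; the train/test split and K = 5 are illustrative choices:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load 8x8 grayscale digit images, already flattened into 64-dimensional vectors
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# K=5 nearest neighbors with Euclidean distance (the default metric)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

print("Predicted label for first test image:", clf.predict(X_test[:1])[0])
print("Test accuracy:", round(clf.score(X_test, y_test), 3))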

Optimization Techniques

- Dimensionality Reduction: Techniques like Principal Component Analysis
(PCA) can reduce the number of features, improving computational efficiency.
- Efficient Data Structures: Data structures like KD-trees or Ball-trees can speed
up nearest-neighbor search.
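Both optimizations can be combined in a single pipeline. The following sketch (again using
scikit-learn's digits dataset; the number of PCA components and the KD-tree setting are
illustrative assumptions, not tuned values) shows the idea:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# Project the 64 pixel features down to 20 principal components,
# then search for neighbors with a KD-tree instead of brute force.
pipeline = make_pipeline(
    PCA(n_components=20),
    KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree"),
)
print("Cross-validated accuracy:", round(cross_val_score(pipeline, X, y, cv=5).mean(), 3))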

By leveraging KNN, handwriting recognition systems can effectively classify digits
and characters, forming the basis for many real-world applications, such as postal
code recognition and form processing.
