KNN Classification Algorithm
What’s in it for you?
Why do we need KNN?
What is KNN?
How do we choose the factor ‘K’?
When do we use KNN?
How does KNN Algorithm work?
Use Case: Predict whether a person will have diabetes or not
Why KNN?
By now, we all know that machine learning models make predictions by learning from the available past data.
Input value → Machine Learning Model → Predicted Output
Is that a dog?
No, dear. You can differentiate between a cat and a dog based on their characteristics.
CATS: sharp claws (used to climb), shorter ears, meows and purrs, doesn't love to play around
DOGS: dull claws, longer ears, barks, loves to run around
[Figure: cats and dogs plotted by length of ears and sharpness of claws]
Now tell me: is it a cat or a dog?
Its features are more like a cat's, so it must be a cat!
Because KNN is based on feature similarity, we can do classification using a KNN classifier!
Input value → KNN → Predicted Output
What is KNN?
KNN (K Nearest Neighbors) is one of the simplest supervised machine learning algorithms, mostly used for classification.
It classifies a data point based on how its neighbors are classified.
KNN stores all available cases and classifies new cases based on a similarity measure.
[Figure: samples plotted by chloride level and sulphur dioxide level; is the new point RED or WHITE?]
But what is k?
k in KNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process.
With k = 5, a data point is classified by a majority vote of its 5 nearest neighbors.
Here, the unknown point would be classified as red, since 4 out of 5 neighbors are red.
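The voting step can be sketched in a few lines of Python. This is a minimal illustration, assuming the labels of the five nearest neighbors are already known; the function name `majority_vote` is hypothetical and chosen only for this example.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label among the k nearest neighbors."""
    counts = Counter(neighbor_labels)
    label, _ = counts.most_common(1)[0]
    return label

# Labels of the k = 5 nearest neighbors from the example: 4 red, 1 white.
neighbors = ["red", "red", "white", "red", "red"]
predicted = majority_vote(neighbors)
print(predicted)  # red
```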
How do we choose ‘k’?
KNN is based on feature similarity. Choosing the right value of k is a process called parameter tuning, and it is important for better accuracy.
[Figure: a new, unlabeled point '?' between two classes, with neighborhoods drawn for k=3 and k=7]
The class assigned to the unknown data point at k=3 changed at k=7, so which k should we choose?
To choose a value of k:
• Start from sqrt(n), where n is the total number of data points.
• Select an odd value of k to avoid ties between two classes of data.
A larger k makes the vote less sensitive to noisy neighbors, but a k that is too large can blur the boundary between classes.
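The two rules of thumb above can be combined into a small helper. This is only a sketch: the function name `suggest_k` is hypothetical, and rounding sqrt(n) to the nearest integer before bumping even values up to the next odd number is an assumption made for this example, not something the slides specify.

```python
import math

def suggest_k(n_points):
    """Heuristic starting point for k: sqrt(n), forced to an odd value."""
    k = max(1, round(math.sqrt(n_points)))
    if k % 2 == 0:
        k += 1  # bump even values to the next odd number to avoid two-class ties
    return k

print(suggest_k(10))   # 3
print(suggest_k(768))  # 29
```

A value obtained this way is a starting point only. In practice it is common to try several values of k and compare their accuracy on held-out data.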
When do we use KNN?
We can use KNN when:
• Data is labeled (e.g. 'Dog').
• Data is noise free.
• Dataset is small, because KNN is a 'lazy learner', i.e. it doesn't learn a discriminative function from the training set.

Example of noise in the Class column:

Weight (x2, kg)   Height (y2, cm)   Class
51                167               Underweight
62                182               one-fourty
69                176               23
64                173               hello kitty
65                172               Normal
How does KNN Algorithm work?
Consider a dataset with two variables, height (cm) and weight (kg), where each point is classified as Normal or Underweight:

Weight (x2, kg)   Height (y2, cm)   Class
51                167               Underweight
62                182               Normal
69                176               Normal
64                173               Normal
65                172               Normal
56                174               Underweight
58                169               Normal
57                173               Normal
55                170               Normal
On the basis of the given data, we have to classify the point below as Normal or Underweight using KNN (assuming we don't know how to calculate BMI!):
57 kg, 170 cm: ?
To find the nearest neighbors, we will calculate the Euclidean distance.
But what is Euclidean distance?
According to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is given by:
$$d = \sqrt{(x - a)^2 + (y - b)^2}$$
Let's calculate it to understand clearly. For the unknown data point (57 kg, 170 cm) and the first three points in the dataset:
$$d_1 = \sqrt{(170 - 167)^2 + (57 - 51)^2} \approx 6.7$$
$$d_2 = \sqrt{(170 - 182)^2 + (57 - 62)^2} = 13$$
$$d_3 = \sqrt{(170 - 176)^2 + (57 - 69)^2} \approx 13.4$$
Similarly, we will calculate the Euclidean distance of the unknown data point from all the points in the dataset.
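The three distances can be checked with a short Python sketch. The helper name `euclidean` is hypothetical, and points are assumed to be given as (weight, height) tuples.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as (weight, height)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

unknown = (57, 170)
d1 = euclidean(unknown, (51, 167))
d2 = euclidean(unknown, (62, 182))
d3 = euclidean(unknown, (69, 176))
print(round(d1, 1), round(d2, 1), round(d3, 1))  # 6.7 13.0 13.4
```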
Hence, we have calculated the Euclidean distance of the unknown data point from all the points, where (x1, y1) = (57, 170) is the point whose class we have to determine:

Weight (x2, kg)   Height (y2, cm)   Class         Euclidean Distance
51                167               Underweight   6.7
62                182               Normal        13
69                176               Normal        13.4
64                173               Normal        7.6
65                172               Normal        8.2
56                174               Underweight   4.1
58                169               Normal        1.4
57                173               Normal        3
55                170               Normal        2
Now, let's find the nearest neighbors of the point (57 kg, 170 cm) at k = 3.
We have n = 10, and sqrt(10) ≈ 3.16. Hence, we have taken k = 3.
The three nearest neighbors, at distances 1.4, 2, and 3, are all 'Normal', so the majority of the neighbors point towards 'Normal'.
Hence, as per the KNN algorithm, the class of (57, 170) should be 'Normal'.
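The whole worked example can be reproduced with a small from-scratch sketch. It assumes the nine labeled rows from the example dataset; the function name `knn_classify` is hypothetical, and this is an illustration of the idea, not an optimized implementation.

```python
import math
from collections import Counter

# (weight kg, height cm, class) rows from the example dataset
data = [
    (51, 167, "Underweight"), (62, 182, "Normal"), (69, 176, "Normal"),
    (64, 173, "Normal"), (65, 172, "Normal"), (56, 174, "Underweight"),
    (58, 169, "Normal"), (57, 173, "Normal"), (55, 170, "Normal"),
]

def knn_classify(rows, query, k):
    """Rank rows by Euclidean distance to query and vote among the k closest."""
    ranked = sorted(rows, key=lambda r: math.dist(r[:2], query))
    top_k = ranked[:k]
    votes = Counter(label for _, _, label in top_k)
    return votes.most_common(1)[0][0], top_k

label, neighbors = knn_classify(data, (57, 170), k=3)
print(label)  # Normal
```

Here `neighbors` holds the three closest rows, (58, 169), (55, 170) and (57, 173), which are the ones at distances 1.4, 2 and 3.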
Recap of KNN
• A positive integer k is specified, along with a new sample.
• We select the k entries in our database which are closest to the new sample.
• We find the most common classification of these entries.
• This is the classification we give to the new sample.
USE CASE: Predict Diabetes
KNN - Predict diabetes
Objective: Predict whether a person will be diagnosed with diabetes or not.
We have a dataset of 768 people who were or were not diagnosed with diabetes.
Import the required Scikit-learn libraries, then load the dataset and have a look.
Values in columns like 'Glucose' and 'BloodPressure' cannot be accepted as zeroes because they will affect the outcome.
We can replace such values with the mean of the respective column.
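A minimal sketch of this cleaning step in plain Python follows. It assumes that a zero means "missing" and that the replacement mean is taken over the non-zero entries only. The function name and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean

def replace_zeros_with_mean(values):
    """Treat 0 as 'missing' and replace it with the mean of the non-zero values."""
    observed = [v for v in values if v != 0]
    if not observed:  # nothing to average; leave the column unchanged
        return list(values)
    fill = mean(observed)
    return [fill if v == 0 else v for v in values]

glucose = [148, 85, 0, 89, 137, 0]  # hypothetical values, not from the dataset
cleaned = replace_zeros_with_mean(glucose)
print(cleaned)  # [148, 85, 114.75, 89, 137, 114.75]
```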
Before proceeding further, let's split the dataset into train and test sets.
Feature scaling: as a rule of thumb, scale your features for any algorithm that computes distance or assumes normality.
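One common way to scale a feature is standardization: subtract the mean and divide by the standard deviation. The sketch below assumes the population standard deviation; the function name `standardize` is hypothetical, and the input is the weight column of the earlier toy example.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:  # constant column: every value maps to 0
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

weights = [51, 62, 69, 64, 65]  # weight column (kg) from the toy example
scaled = standardize(weights)
print([round(v, 2) for v in scaled])
```

Without scaling, a feature measured in large units would dominate the Euclidean distance. When a separate test set is used, the mean and standard deviation are normally computed on the training set and reused for the test set.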
Then define the model using KNeighborsClassifier and fit the training data to the model.
n_neighbors here is 'k'. p is the power parameter of the Minkowski metric; p = 2 gives the Euclidean distance, which is the metric used in our case.
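The split, scale, and fit steps might look as follows with scikit-learn. This is a sketch under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (not the diabetes dataset), and `n_neighbors=11`, `test_size=0.2`, and `random_state=0` are illustrative choices rather than values confirmed by the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the diabetes dataset): 768 rows, 8 numeric features.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std on the training split only
X_test = scaler.transform(X_test)        # reuse the same statistics for the test split

# n_neighbors is k; metric="minkowski" with p=2 is the Euclidean distance.
model = KNeighborsClassifier(n_neighbors=11, p=2, metric="minkowski")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(y_pred.shape)  # (154,)
```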
There are also other metrics to evaluate the distance, such as Manhattan distance and Minkowski distance.
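These metrics are related: the Minkowski distance with power p reduces to the Manhattan distance at p = 1 and to the Euclidean distance at p = 2. A minimal sketch, with a hypothetical function name and the first pair of points from the toy example:

```python
def minkowski(point_a, point_b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(point_a, point_b)) ** (1 / p)

a, b = (57, 170), (51, 167)
manhattan = minkowski(a, b, 1)  # |57-51| + |170-167| = 9
euclid = minkowski(a, b, 2)     # sqrt(36 + 9), about 6.7
print(manhattan, round(euclid, 1))  # 9.0 6.7
```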
Let's predict the test results.
It's important to evaluate the model, so let's use a confusion matrix to do that, and then calculate the accuracy of the model.
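The evaluation quantities can be computed by hand to see what they mean. The sketch below uses hypothetical true and predicted labels (1 = diabetic, 0 = not diabetic), not results from the diabetes model, and the function name `confusion_counts` is an assumption of this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn)  # 3 1 1 5
print(accuracy, f1)    # 0.8 0.75
```

Accuracy counts all correct predictions, while the F1 score combines precision and recall and is more informative when one class is much rarer than the other.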
So, we have created a model using KNN which can predict whether a person will have diabetes or not.
An accuracy of 80% tells us that the model is a pretty fair fit!
Summary
• Why do we need KNN?
• Euclidean distance
• Choosing the value of k
• How does KNN work?
• KNN classifier for diabetes prediction
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Beginners | Simplilearn



Editor's Notes

  • #29 to #31: We use a small dataset because KNN is a lazy learner, so it becomes slow with large datasets.
  • #49: Most of the time, your dataset will contain features that vary widely in magnitude, units, and range. We need to bring all features to the same level of magnitude. This can be achieved by scaling.
  • #53: The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0.