K-Nearest Neighbour
Tushar B. Kute,
http://tusharkute.com
What sort of Machine Learning?
• One idea that can be used for machine learning is captured by
a maxim involving poultry: "birds of a feather flock
together."
• In other words, things that are alike are likely to
have properties that are alike.
• We can use this principle to classify data by
placing it in the category with the most similar,
or "nearest" neighbors.
Nearest Neighbor Classification
• In a single sentence, nearest neighbor classifiers are defined
by their characteristic of classifying unlabeled examples by
assigning them the class of the most similar labeled examples.
Despite the simplicity of this idea, nearest neighbor methods
are extremely powerful. They have been used successfully for:
– Computer vision applications, including optical character
recognition and facial recognition in both still images and
video
– Predicting whether a person will enjoy a movie that has been
recommended to them (as in the Netflix challenge)
– Identifying patterns in genetic data, for use in detecting
specific proteins or diseases
The kNN Algorithm
• The kNN algorithm begins with a training dataset
made up of examples that are classified into several
categories, as labeled by a nominal variable.
• Assume that we have a test dataset containing
unlabeled examples that otherwise have the same
features as the training data.
• For each record in the test dataset, kNN identifies k
records in the training data that are the "nearest" in
similarity, where k is an integer specified in advance.
• The unlabeled test instance is assigned the class held by
the majority of its k nearest neighbors.
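The steps above can be sketched in plain Python. This is a minimal sketch rather than the deck's own code: the function names `euclidean` and `knn_classify` and the toy points are hypothetical, and a tied vote is resolved in favour of the class whose member is nearest.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(train_X, train_y, x, k):
    """Assign x the majority class among its k nearest labeled examples."""
    # Distance from x to every training record, paired with that record's label.
    scored = [(euclidean(row, x), label) for row, label in zip(train_X, train_y)]
    # Keep the k closest records (sorted() is stable, so equal distances keep input order).
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Majority vote. Counter.most_common lists equal counts in first-seen order,
    # so a tie goes to the class whose member appears earliest, i.e. nearest.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-feature points.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_X, train_y, (2, 2), k=3))  # a
print(knn_classify(train_X, train_y, (7, 8), k=3))  # b
```

Note that nothing happens until `knn_classify` is called: every query recomputes all distances, which is the "lazy" behaviour described later in the deck.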
Example:
Reference: Machine Learning with R, Brett Lantz, Packt Publishing
Example: Classify me now!
Calculating Distance
• Locating the tomato's nearest neighbors requires
a distance function, or a formula that measures
the similarity between two instances.
• There are many different ways to calculate
distance.
• Traditionally, the kNN algorithm uses Euclidean
distance, which is the distance one would measure
by connecting two points with a ruler, as
illustrated in the previous figure by the dotted
lines connecting the tomato to its neighbors.
Calculating Distance
Reference: Super Data Science
Distance
• Euclidean distance is specified by the following formula, where p
and q are the examples to be compared, each having n features. The
term p1 refers to the value of the first feature of example p, while
q1 refers to the value of the first feature of example q:
$$\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}$$
• The distance formula involves comparing the values of each
feature. For example, to calculate the distance between the
tomato (sweetness = 6, crunchiness = 4) and the green bean
(sweetness = 3, crunchiness = 7), we can use the formula as follows:
$$\mathrm{dist}(\text{tomato}, \text{green bean}) = \sqrt{(6 - 3)^2 + (4 - 7)^2} = \sqrt{18} \approx 4.2$$
(Figures: Manhattan Distance; Euclidean Distance; Closest Neighbors)
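The two distance measures shown in the figures can be compared on the tomato and green bean from the worked example. A minimal sketch; the function names are illustrative only.

```python
import math


def euclidean(p, q):
    """Square root of the summed squared feature differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute feature differences ("city block" distance)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


tomato = (6, 4)      # (sweetness, crunchiness)
green_bean = (3, 7)

print(round(euclidean(tomato, green_bean), 1))  # 4.2
print(manhattan(tomato, green_bean))            # 6
```

For the same pair of points, Manhattan distance is never smaller than Euclidean distance, because it follows the grid instead of the straight line.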
Choosing appropriate k
• Deciding how many neighbors to use for kNN
determines how well the model will generalize to
future data.
• The balance between overfitting and underfitting
the training data is a problem known as the
bias-variance tradeoff.
• Choosing a large k reduces the impact of variance
caused by noisy data, but can bias the learner such
that it runs the risk of ignoring small but important
patterns.
• Conversely, a very small k (such as k = 1) allows
noisy data or outliers to unduly influence the
classification.
Choosing appropriate k
• In practice, choosing k depends on the difficulty
of the concept to be learned and the number of
records in the training data.
• Typically, k is set somewhere between 3 and 10.
One common practice is to set k equal to the
square root of the number of training examples.
• In the tomato classifier, for example, we might set
k = 4, because there were 15 example ingredients in
the training data and the square root of 15 is about 3.87.
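The square-root rule of thumb fits in one line. This sketch (the function name is hypothetical) rounds to the nearest integer and never returns less than 1; it reproduces the k = 4 above.

```python
import math


def sqrt_rule_k(n_train):
    """Rule-of-thumb k: square root of the training-set size, rounded, at least 1."""
    return max(1, round(math.sqrt(n_train)))


print(sqrt_rule_k(15))   # 4
print(sqrt_rule_k(150))  # 12
```

The result is only a starting point; checking the error rate for several values of k, as in the "Finding error with changed k" slide, is the empirical check.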
Min-Max normalization
• The traditional method of rescaling features for kNN is
min-max normalization.
• This process transforms a feature such that all of its values fall
in a range between 0 and 1. The formula for normalizing a
feature is as follows. Essentially, the formula subtracts the
minimum of feature X from each value and divides by the range
of X:
$$X_{\text{new}} = \frac{X - \min(X)}{\max(X) - \min(X)}$$
• Normalized feature values can be interpreted as indicating how
far, from 0 percent to 100 percent, the original value fell along
the range between the original minimum and maximum.
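The formula translates directly into code. A minimal sketch; mapping a constant feature (where max equals min) to 0 is an assumption added here to avoid dividing by zero.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no distance information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```

When classifying new examples, it is usually better to reuse the training set's minimum and maximum than to recompute them; scikit-learn's `MinMaxScaler` does this by learning them in `fit` and applying them in `transform`.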
Lazy Learning
• Using the strict definition of learning, a lazy learner
is not really learning anything.
• Instead, it merely stores the training data. This
allows the training phase to occur very rapidly, with
the potential downside that making predictions
tends to be relatively slow.
• Due to the heavy reliance on the training instances,
lazy learning is also known as instance-based
learning or rote learning.
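To make "fast training, slow prediction" concrete, here is a hypothetical 1-nearest-neighbor class (not from the deck) whose `fit` step only stores the data, while every prediction scans all stored examples.

```python
import math


class LazyOneNN:
    """Hypothetical 1-nearest-neighbor classifier illustrating lazy learning."""

    def fit(self, X, y):
        # "Training" is just memorizing the instances: no model is built.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        # All the work happens here: compare x with every stored instance (O(n) per query).
        best = min(range(len(self.X)), key=lambda i: math.dist(self.X[i], x))
        return self.y[best]


model = LazyOneNN().fit([(0, 0), (10, 10)], ["low", "high"])
print(model.predict((1, 2)))  # low
print(model.predict((9, 7)))  # high
```

`math.dist` (Python 3.8+) computes the Euclidean distance between two points.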
Some Lazy Learning Algorithms
• K Nearest Neighbors
• Local Regression
• Lazy Naive Bayes
Python Packages needed
• pandas
– Data Analytics
• numpy
– Numerical Computing
• matplotlib.pyplot
– Plotting graphs
• sklearn (scikit-learn)
– Classification and Regression Classes
Sample Application
KNN – Classification : Dataset
• The best small project to start with on a new tool is the
classification of iris flowers (i.e., the iris dataset).
• This is a good project because it is so well understood.
– Attributes are numeric, so you have to figure out how to load and
handle data.
– It is a classification problem, allowing you to practice with perhaps
an easier type of supervised learning algorithm.
– It is a multi-class (multinomial) classification problem that may
require some specialized handling.
– It has only 4 attributes and 150 rows, meaning it is small and easily
fits into memory (and on a screen or A4 page).
– All of the numeric attributes are in the same units and on the same
scale, not requiring any special scaling or transforms to get started.
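The properties listed above can be checked quickly, assuming scikit-learn is installed (it ships the iris data through `sklearn.datasets.load_iris`). This is a sketch, not the deck's own loading code, which may read the data with pandas instead.

```python
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)               # (150, 4): 150 rows, 4 attributes
print(iris.feature_names)    # all four measurements are in centimeters
print(iris.target_names)     # three classes, so a multi-class problem
print(Counter(y.tolist()))   # 50 examples per class
```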
Dataset
Pre-processing
Characterize
Finding error with changed k
Error rate
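The "Finding error with changed k" step can be sketched with scikit-learn: hold out part of the iris data, fit a `KNeighborsClassifier` for each k, and record the fraction of misclassified test examples. The 80/20 split, `random_state=42`, and the range k = 1..30 are assumptions rather than the deck's settings, so the exact error values will differ from its plot.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Assumed split: 80% train / 20% test, stratified so each class keeps its share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

ks = list(range(1, 31))
error_rates = []
for k in ks:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Error rate = fraction of test examples whose predicted class is wrong.
    error_rates.append(float((pred != y_test).mean()))

# Smallest k with the lowest error (min() returns the first minimum it finds).
best_k = min(ks, key=lambda k: error_rates[k - 1])
print(best_k, error_rates[best_k - 1])
```

To draw an error-rate chart, `ks` and `error_rates` can be passed to `matplotlib.pyplot.plot`. With only 30 test examples a single split is noisy; `sklearn.model_selection.cross_val_score` gives a steadier estimate.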
Useful resources
• www.mitu.co.in
• www.pythonprogramminglanguage.com
• www.scikit-learn.org
• www.towardsdatascience.com
• www.medium.com
• www.analyticsvidhya.com
• www.kaggle.com
• www.stephacking.com
• www.github.com
tushar@tusharkute.com
Thank you
This presentation is created using LibreOffice Impress 5.1.6.2, can be used freely as per GNU General Public License
Web Resources
https://mitu.co.in
http://tusharkute.com
/mITuSkillologies @mitu_group
contact@mitu.co.in
/company/mitu-skillologies
MITUSkillologies