Lesson 18
Classification Techniques – K-Nearest Neighbors
Kush Kulshrestha
kNN Algorithm Summed up
kNN Algorithm
The KNN algorithm assumes that similar things exist in close proximity.
In other words, similar things are near to each other.
Notice in the image above that most of the time, similar data points are close to each other. The KNN algorithm
hinges on this assumption being true enough for the algorithm to be useful.
Which data points are near to me?
KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) by calculating the distance
between points on a graph.
How could you find the distance between two points?
One way is to use the Pythagorean theorem and draw some perpendicular lines.
By the Pythagorean theorem, c² = a² + b², where a and b are the lengths of the perpendicular lines drawn between A and B. Solving for c gives a value that can be used as the distance between A and B.
This distance is called the Euclidean distance. Other distance metrics can also be
used to compute a number representing the distance between two points.
Check here:
https://towardsdatascience.com/importance-of-distance-metrics-in-machine-learning-modelling-e51395ffe60d
For now, we will be going forward with Euclidean distance.
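As a small illustration of the Euclidean distance described above, here is a minimal Python sketch. The function name `euclidean_distance` and the points `A` and `B` are made up for this example; they were chosen so that the two perpendicular sides have lengths 3 and 4.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared differences per coordinate, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical points: the horizontal gap is 3 and the vertical gap is 4.
A = (1.0, 2.0)
B = (4.0, 6.0)
print(euclidean_distance(A, B))  # 5.0
```

The same function works for points with more than two coordinates, which is how it is typically used when each data point has several features.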
The Algorithm
1. Load the data
2. Initialize K to your chosen number of neighbors
3. For each example in the data:
   1. Calculate the distance between the query example and the current example from the data.
   2. Add the distance and the index of the example to an ordered collection.
4. Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the distances
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
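The steps above can be sketched in plain Python as follows. This is only an illustrative sketch, not the author's implementation: the function name `knn_predict`, its `task` parameter, and the toy data are assumptions made for this example, and Euclidean distance is computed with the standard library's `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k, task="classification"):
    """Predict a label for `query` from its k nearest training examples."""
    # Step 3: distance from the query to every training example, stored with its index.
    distances = [(math.dist(point, query), index)
                 for index, point in enumerate(train_points)]
    # Step 4: sort ascending by distance.
    distances.sort()
    # Steps 5-6: labels of the first K entries.
    nearest_labels = [train_labels[index] for _, index in distances[:k]]
    if task == "regression":
        # Step 7: mean of the K labels.
        return sum(nearest_labels) / len(nearest_labels)
    # Step 8: mode of the K labels.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
classes = ["red", "red", "red", "blue", "blue", "blue"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_predict(points, classes, (2, 2), k=3))                     # red
print(knn_predict(points, targets, (2, 2), k=3, task="regression"))  # 2.0
```

Note that nothing is "trained" here: all of the work happens at prediction time, when the query is compared with every stored example.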
Choosing the right k
To select the K that’s right for your data, we run the KNN algorithm several times with different values of K and
choose the K that reduces the number of errors we encounter while maintaining the algorithm’s ability to accurately
make predictions when it’s given data it hasn’t seen before.
Things to care about:
a. As we decrease the value of K toward 1, our predictions become less stable, because each one depends on only a few, possibly noisy, neighbors.
b. Conversely, as we increase the value of K, our predictions become more stable due to majority voting / averaging,
and thus more likely to be accurate (up to a certain point). Eventually, we begin to see
an increasing number of errors. At that point we know we have pushed the value of K too far.
c. In cases where we are taking a majority vote (e.g. picking the mode in a classification problem) among labels, we
usually make K an odd number so that a two-class vote cannot end in a tie.
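To make the tiebreaker point concrete, the sketch below counts the votes for a single query with K = 3, 4 and 5. The one-dimensional data set, the query value and the helper names `vote_counts` and `is_tied` are hypothetical, chosen only so that the even K produces a split vote.

```python
from collections import Counter

# Hypothetical 1-D training set: (feature value, class label).
training = [(1.0, "A"), (2.0, "A"), (4.0, "B"), (5.0, "B"), (9.0, "B")]
query = 3.1

def vote_counts(k):
    """Count the class labels among the k training points nearest to the query."""
    nearest = sorted(training, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest)

def is_tied(counts):
    """True when the two most common labels received the same number of votes."""
    top = counts.most_common(2)
    return len(top) == 2 and top[0][1] == top[1][1]

for k in (3, 4, 5):
    counts = vote_counts(k)
    print(k, dict(counts), "tie" if is_tied(counts) else "clear winner")
```

With K=4 the vote splits two against two, while K=3 and K=5 both give a clear winner. An odd K only rules out ties when there are exactly two classes; with three or more classes a tie is still possible and needs some other rule, such as preferring the closest neighbor.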
Underfitting versus overfitting: a K that is too large underfits, and a K that is too small overfits.
Sound familiar?
Trend in training error:
With K=1, the training error is 0 (as long as no two identical points carry different labels), because each training point is its own nearest neighbor. As K
increases, the training error generally rises, until K is so large that every example is assigned to the same (majority) class.
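The trend in training error can be checked on a tiny example. The sketch below is illustrative only: the one-dimensional data set and the helper names `predict` and `training_error` are invented for this purpose, and each training point is classified using the training set itself, so it counts as its own neighbor.

```python
from collections import Counter

# Hypothetical 1-D training set with some class overlap: (feature value, label).
training = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "B"), (6, "B"), (7, "B")]

def predict(query, k):
    """Majority label among the k training points nearest to the query."""
    nearest = sorted(training, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def training_error(k):
    """Fraction of training points that are misclassified when predicted with this k."""
    wrong = sum(1 for x, label in training if predict(x, k) != label)
    return wrong / len(training)

for k in (1, 3, 5, 7):
    print(f"K={k}: training error = {training_error(k):.2f}")
```

On this toy data the error is 0 at K=1 and reaches 3/7 at K=7, where every point is assigned to the majority class "B" and all three "A" points are misclassified.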
Trend in testing error:
At K=1, we are overfitting the decision boundaries. As K grows, the error rate therefore decreases at first and reaches a minimum. Past that
minimum, it increases again with increasing K. To find the optimal value of K, split the initial dataset into training and
validation sets, then plot the validation error against K and pick the K at the minimum of the curve. This value of K
should be used for all predictions.
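One way to carry out this selection is sketched below, under several assumptions that are not in the slides: synthetic two-class data drawn with a fixed random seed, a single 80/40 hold-out split, odd candidate values of K only, and the helper names `make_points`, `predict` and `validation_error`. The error values are printed rather than plotted to keep the sketch free of third-party libraries.

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

def make_points(center, label, n):
    """Draw n noisy 2-D points around a center (synthetic data, for illustration only)."""
    return [((random.gauss(center[0], 1.5), random.gauss(center[1], 1.5)), label)
            for _ in range(n)]

data = make_points((0.0, 0.0), "A", 60) + make_points((3.0, 3.0), "B", 60)
random.shuffle(data)
train, validation = data[:80], data[80:]  # hold out 40 points for validation

def predict(query, k):
    """Majority label among the k training points nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_error(k):
    """Fraction of held-out points that are misclassified with this k."""
    wrong = sum(1 for point, label in validation if predict(point, k) != label)
    return wrong / len(validation)

candidate_ks = range(1, 40, 2)  # odd values only, to avoid ties with two classes
errors = {k: validation_error(k) for k in candidate_ks}
best_k = min(errors, key=errors.get)  # smallest K among those with the lowest error

print("validation error per K:", errors)
print("chosen K:", best_k)
```

A single hold-out split can be noisy on small data sets; averaging the error over several splits (cross-validation) is a common refinement of the same idea.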
Pros and Cons
Advantages:
The algorithm is simple and easy to implement.
There’s no need to build a model, tune several parameters, or make additional assumptions.
Disadvantages:
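The algorithm gets significantly slower as the number of examples and/or predictors (independent variables)
increases, because each prediction compares the query with every stored example. With many predictors, distances between points also become less informative, a problem known as the curse of dimensionality.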
Reference: https://en.wikipedia.org/wiki/Curse_of_dimensionality
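A rough way to see the curse of dimensionality is to measure how much farther the farthest point is than the nearest one as the number of dimensions grows. The sketch below is an illustration under assumptions of its own: points drawn uniformly at random from the unit cube, a fixed seed, and a hypothetical helper named `relative_contrast`.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def relative_contrast(dimensions, n_points=200):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    query = [random.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [random.random() for _ in range(dimensions)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

for d in (2, 10, 100, 1000):
    print(f"{d:>4} dimensions: relative contrast = {relative_contrast(d):.2f}")
```

As the number of dimensions grows, the contrast shrinks: the nearest and farthest points end up at nearly the same distance, so "nearest neighbor" carries less information. The loop also shows the cost side of the disadvantage, since every distance computation touches every coordinate of every stored point.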
Machine Learning Algorithm - KNN
