
Classification in Machine Learning
Lecture 08

Discussion

Just because you are young and healthy does not mean death will not come for
you. Death has no age restrictions. We should always be prepared.
K-Nearest Neighbor
Agenda:
• A Quick Recap (Important Concepts)
• What is the KNN algorithm?
• Industrial use cases of the KNN algorithm
• How prediction is done using KNN
• How to choose the value of K?
• KNN algorithm using Python

Is that a dog?

No dear, you can differentiate between a cat and a dog based on their
characteristics:

CATS                               DOGS
Sharp claws, used to climb         Dull claws
Smaller length of ears             Bigger length of ears
Meows and purrs                    Barks
Doesn't love to play around        Loves to run around
We measure two features: sharpness of claws and length of ears.

Now tell me, is it a cat or a dog?
Its features (sharpness of claws and length of ears) are more like the cats',
so it must be a cat!
Because KNN is based on feature similarity, we can do classification using the
KNN classifier!

KNN: input value → predicted output
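To make this concrete, here is a minimal sketch with scikit-learn's KNeighborsClassifier. The feature scores (sharpness of claws and length of ears on a 0-10 scale) are made-up illustrative values, not data from the lecture.

```python
from sklearn.neighbors import KNeighborsClassifier

# Each row: [sharpness_of_claws, length_of_ears] on a 0-10 scale (hypothetical values)
X_train = [
    [9, 3], [8, 2], [7, 3], [9, 4],   # cats: sharp claws, smaller ears
    [2, 8], [3, 9], [1, 7], [2, 8],   # dogs: dull claws, bigger ears
]
y_train = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 nearest neighbours
knn.fit(X_train, y_train)                   # "training" is just storing the data

new_animal = [[8, 3]]                       # sharp claws, smaller ears
print(knn.predict(new_animal))              # -> ['cat']
```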


Applications of KNN

● Recommender systems
  ● Movie recommender
  ● Shopping recommender
  ● Browser recommendation
● Concept search
  ● Searching documents with similar content
  ● Grouping documents with similar content

Small Example:
[The example itself was shown as a figure on the slide.]
Classification Algorithms - Lazy Learners
● Lazy learners simply store the training data and wait until test data
  appears.
● No generalization is performed.
● No explicit model is generated.
● Classification is conducted based on the most related data in the stored
  training data.
● Compared to eager learners, lazy learners need less time for training but
  more time for prediction.
● Examples
  ● KNN

Classification Algorithms - Eager Learners
● Eager learners construct a classification model from the given training
  data before receiving data for classification.
● The learner must commit to a single hypothesis that covers the entire
  instance space.
● Generalization is performed.
● Because a model is constructed, eager learners take a long time to train
  but less time to predict.
● Examples
  ● Decision Trees, ANN
Characteristics of KNN:

• Lazy classifier
• Doesn't abstract from the training data
• Data-driven, not model-driven
• Makes no assumptions about the data

Basic Idea:
● Idea:
  ● Similar examples have similar labels.
  ● Classify new examples like similar training examples.
● Algorithm:
  ● Given some new example x for which we need to predict its class y:
    ● Find the most similar training examples.
    ● Classify x "like" these most similar examples.
● Questions:
  ● How do we determine similarity?
  ● How many similar training examples should we consider?
  ● How do we resolve inconsistencies among the training examples?
1-Nearest Neighbor
● One of the simplest of all machine learning
classifiers
● Simple idea: label a new point the same as the
closest known point

[In the figure, the closest known point is red, so we label the new point red.]

1-Nearest Neighbor
● A type of instance-based learning
● Also known as “memory-based” learning
● Looks for a neighbor nearest to this instance
● Assigns the class of the neighbor to the new instance
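As a minimal sketch of this idea in plain Python (the toy points and labels below are made up for illustration):

```python
import math

def nearest_neighbor_label(x, X_train, y_train):
    """Return the label of the single training point closest to x (1-NN)."""
    distances = [math.dist(x, xi) for xi in X_train]     # Euclidean distance to each point
    return y_train[distances.index(min(distances))]      # label of the closest point

# Toy training data (hypothetical): two red points and one blue point
X_train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0)]
y_train = ["red", "red", "blue"]

print(nearest_neighbor_label((0.9, 1.1), X_train, y_train))   # -> 'red'
```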

Distance Metrics:
● Different metrics can change the neighbourhood.
● Standard Euclidean distance metric:
  ● Two-dimensional: Dist(a, b) = sqrt((a1 − b1)² + (a2 − b2)²)
  ● Multivariate: Dist(a, b) = sqrt(Σi (ai − bi)²)

● Re-weighting a dimension changes which points count as "nearest":
  Dist(a, b) = (a1 − b1)² + (a2 − b2)²
  Dist(a, b) = (a1 − b1)² + (3a2 − 3b2)²
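As a small sketch, the multivariate formula above translates directly into plain Python:

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((1, 2), (4, 6)))   # sqrt(3**2 + 4**2) = 5.0
```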
k – Nearest Neighbor
● Generalizes 1-NN to smooth away noise in the labels
● A new point is now assigned the most frequent label
of its k nearest neighbors

In the example figure: label it red when k = 3, but label it rust when k = 7
(see the illustration below).
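A tiny illustration of the majority vote: with made-up neighbour labels ordered from closest to farthest, the prediction can flip as k grows, matching the red-versus-rust example above.

```python
from collections import Counter

# Labels of the 7 nearest neighbours, closest first (made-up example)
neighbour_labels = ["red", "red", "rust", "rust", "rust", "rust", "red"]

for k in (3, 7):
    vote = Counter(neighbour_labels[:k]).most_common(1)[0][0]   # most frequent label among the k closest
    print(f"k = {k}: label it {vote}")   # k = 3 -> red, k = 7 -> rust
```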

Selecting the Number of Neighbors
● Increase k:
  ● Makes KNN less sensitive to noise
● Decrease k:
  ● Allows capturing finer structure of the space
● Pick k not too large, but not too small (it depends on the data; see the
  cross-validation sketch below)
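One common way to pick k is cross-validation: try several values and keep the one with the best validation accuracy. A sketch with scikit-learn, using a bundled dataset purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several odd values of k (odd values help avoid tied votes)
for k in (1, 3, 5, 7, 9, 11):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k = {k:2d}  mean accuracy = {scores.mean():.3f}")
```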

Curse of Dimensionality
● Prediction accuracy can degrade quickly when the number of attributes
  grows.
● Irrelevant attributes easily "swamp" the information from relevant
  attributes.
● When there are many irrelevant attributes, the similarity/distance measure
  becomes less reliable.

● Remedies (see the sketch below):
  ● Try to remove irrelevant attributes in a pre-processing step
  ● Weight attributes differently
  ● Increase k (but not too much)
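A sketch of the first two remedies with scikit-learn: standardise the attributes so none dominates the distance, and keep only the most informative ones before running KNN. The specific choices below (f_classif scoring, keeping 2 features, k = 5) are illustrative assumptions.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

knn_pipeline = make_pipeline(
    StandardScaler(),                 # put all attributes on the same scale
    SelectKBest(f_classif, k=2),      # drop the least relevant attributes
    KNeighborsClassifier(n_neighbors=5),
)
# Usage (with your own data): knn_pipeline.fit(X_train, y_train); knn_pipeline.predict(X_test)
```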
Distance Metrics and Worked Examples
[These slides presented a summary table of distance metrics and two worked KNN examples as figures.]
Advantages/Disadvantages of KNN
● Advantages
  ● Very simple and easy-to-understand algorithm
  ● Very little (almost no) time required for training
  ● Quite effective in certain applications, such as recommender systems
● Disadvantages/Weaknesses
  ● Doesn't actually learn anything (no model is built)
  ● Heavy reliance on the training data
  ● The classification procedure is slow
  ● A large amount of memory is required to load the training data for
    classification
Using K-NN for a Numerical Outcome

● Instead of "majority vote determines class", use the average of the
  neighbours' response values.
● This may be a weighted average, with the weight decreasing with distance
  (see the sketch below).
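A sketch with scikit-learn's KNeighborsRegressor; weights='distance' gives the distance-weighted average mentioned above. The toy data is made up.

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy data (hypothetical): predict house price (thousands) from size (square metres)
X_train = [[50], [70], [75], [90], [120], [150]]
y_train = [100, 140, 150, 180, 240, 300]

knn_reg = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn_reg.fit(X_train, y_train)
print(knn_reg.predict([[80]]))   # weighted average of the 3 nearest prices
```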

KNN Step by Step:

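A minimal sketch of the usual steps (compute the distances, keep the k closest, take a majority vote) in plain Python:

```python
import math
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Step 1: distance from x to every training point
    distances = [(math.dist(x, xi), yi) for xi, yi in zip(X_train, y_train)]
    # Step 2: sort by distance and keep the k closest
    k_nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote among the k nearest labels
    return Counter(label for _, label in k_nearest).most_common(1)[0][0]

# Reusing the hypothetical cat/dog feature scores from earlier
X_train = [(9, 3), (8, 2), (7, 3), (2, 8), (3, 9), (1, 7)]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]
print(knn_predict((8, 3), X_train, y_train, k=3))   # -> 'cat'
```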
Thank you
Any questions?

