Lecture-8 Classification Using K-NN
Machine Learning
Lecture 08
Discussion
“Is that a dog?”
“No dear, you can differentiate between a cat and a dog based on their characteristics.”
(Scatter plot: CATS and DOGS plotted by length of ears vs. sharpness of claws.)
“Now tell me if it’s a cat or a dog?”
“Its features are more like the cats’, so it must be a cat!”
Because KNN is based on feature similarity, we can do classification using the KNN classifier!
KNN Applications
● Recommender systems
● Movie recommender
● Shopping recommender
● Browser recommendation
● Concept search
● Searching documents with similar content
● Grouping documents with similar content
Small Example:
Classification Algorithms – Lazy Learners
● Lazy learners simply store the training data and wait until test data appears.
● No generalization is performed.
● No explicit model is generated.
● Classification is conducted based on the most related data in the stored training data.
● Compared to eager learners, lazy learners need less training time but more time for prediction.
● Examples
● K-NN, case-based reasoning
Classification Algorithms- Eager Learners
● Eager learners construct a classification model based on
the given training data before receiving data for
classification.
● It must be able to commit to a single hypothesis that
covers the entire instance space.
● Generalization
● Due to the model construction, eager learners take a long time to train but less time to predict.
● Examples
● Decision Trees, ANN
Characteristics:
• Lazy Classifier
• Doesn’t abstract from the training data
• Data-driven, not model-driven
• Makes no assumptions about the data
Basic Idea:
● Idea:
● Similar examples have similar labels.
● Classify new examples like similar training examples.
● Algorithm:
● Given some new example x for which we need to predict its
class y
● Find most similar training examples
● Classify x “like” these most similar examples
● Questions:
● How to determine similarity?
● How many similar training examples to consider?
● How to resolve inconsistencies among the training examples?
1-Nearest Neighbor
● One of the simplest of all machine learning
classifiers
● Simple idea: label a new point the same as the
closest known point
Label it red.
1-Nearest Neighbor
● A type of instance-based learning
● Also known as “memory-based” learning
● Looks for a neighbor nearest to this instance
● Assigns the class of the neighbor to the new instance
Distance Metrics:
● Different metrics define the neighbourhood differently.
● Standard Euclidean distance metric:
● Two-dimensional: Dist(a, b) = sqrt((a1 − b1)² + (a2 − b2)²)
● Multivariate: Dist(a, b) = sqrt(∑i (ai − bi)²)
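As a concrete sketch, the multivariate formula can be written in plain Python (the function name is illustrative):

```python
import math

def euclidean(a, b):
    """Multivariate Euclidean distance: square root of summed squared differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Two-dimensional case: the classic 3-4-5 triangle
print(euclidean((0, 0), (3, 4)))  # → 5.0
```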
k – Nearest Neighbor
● Generalizes 1-NN to smooth away noise in the labels
● A new point is now assigned the most frequent label
of its k nearest neighbors
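The rule above (find the k closest training points, then take the most frequent label among them) can be sketched in plain Python; `knn_classify` and the toy cat/dog data are illustrative:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (features, label) pairs; predict majority label of k nearest."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Sort training points by distance to the query and keep the k closest
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Majority vote over the neighbors' labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Toy data: (length of ears, sharpness of claws) -> label
train = [((1, 9), "cat"), ((2, 8), "cat"), ((3, 9), "cat"),
         ((8, 2), "dog"), ((9, 3), "dog"), ((7, 1), "dog")]
print(knn_classify(train, (2, 7), k=3))  # → cat
```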
Selecting the Number of Neighbors
● Increase k:
● Makes KNN less sensitive to noise
● Decrease k:
● Allows capturing finer structure of space
Curse-of-Dimensionality
● Prediction accuracy can quickly degrade when the number of attributes grows.
● Irrelevant attributes easily “swamp” information from relevant attributes.
● With many irrelevant attributes, the similarity/distance measure becomes less reliable.
● Remedy
● Try to remove irrelevant attributes in pre-processing step
● Weight attributes differently
● Increase k (but not too much)
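One of the remedies above, weighting attributes differently, amounts to a weighted Euclidean distance; the weights below are made up for illustration (in practice they would come from some measure of attribute relevance):

```python
import math

def weighted_euclidean(a, b, weights):
    """Euclidean distance with per-attribute weights to down-weight irrelevant attributes."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Suppose the third attribute is irrelevant noise: give it zero weight,
# so its wild values no longer swamp the distance.
a, b = (1.0, 2.0, 50.0), (1.5, 2.5, -40.0)
print(weighted_euclidean(a, b, (1.0, 1.0, 0.0)))  # ≈ 0.707 instead of ≈ 90
```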
Example
Advantages/Disadvantages of KNN
● Advantages
● Very simple and easy to understand algorithm
● Very little, almost no time required for training
● Quite effective in certain applications such as recommender
systems
● Disadvantages/Weaknesses
● Doesn’t actually learn anything
● Heavy reliance on training data
● Classification procedure is slow
● Large computational space is required to load the training data for classification.
Using K-NN (for Numerical Outcome)
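For a numerical outcome, K-NN predicts by averaging the target values of the k nearest neighbors instead of taking a majority vote; a minimal sketch with made-up data (`knn_regress` is an illustrative name):

```python
import math

def knn_regress(train, query, k=3):
    """train: list of (features, value) pairs; predict the mean value of the k nearest."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return sum(value for _, value in neighbors) / k

# Toy data: (feature,) -> numeric target
train = [((1,), 10.0), ((2,), 12.0), ((3,), 14.0), ((10,), 40.0)]
print(knn_regress(train, (2.5,), k=3))  # mean of 10, 12, 14 → 12.0
```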
KNN Step by Step:
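The slide's worked example is not reproduced here, but the steps themselves (compute all distances, sort, take the k nearest, vote) can be traced in code; the data and labels below are a made-up illustration:

```python
import math
from collections import Counter

# Step 0: training data (features, label) and a query point (illustrative values)
train = [((7, 7), "bad"), ((7, 4), "bad"), ((3, 4), "good"), ((1, 4), "good")]
query, k = (3, 7), 3

# Step 1: compute the distance from the query to every training point
dists = [(math.dist(p, query), label) for p, label in train]

# Step 2: sort by distance and keep the k nearest neighbors
nearest = sorted(dists)[:k]

# Step 3: majority vote among the k neighbors' labels
prediction = Counter(label for _, label in nearest).most_common(1)[0][0]
print(prediction)  # → good
```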
Thank you
Any Questions?