cs4302-lecture2

The document discusses K-nearest neighbors (KNN) as a method for inductive learning in supervised learning contexts, focusing on classification and regression problems. It explains the concept of hypothesis space, generalization, and various distance metrics used in KNN, such as Euclidean and Minkowski distances. Additionally, it addresses challenges like overfitting, underfitting, and the importance of choosing the right value of k, along with techniques like cross-validation to optimize model performance.

Lecture 2: K-nearest neighbours

2024

Xin Li
School of Computer Science,
Beijing Institute of Technology
Inductive Learning (recap)

Induction
  Given a training set of examples of the form (x, f(x))
    x is the input, f(x) is the output
  Return a function ℎ that approximates f
    ℎ is called the hypothesis
Supervised Learning
Two types of problems
1. Classification

2. Regression

NB: The nature (categorical or continuous) of the domain (input space) of f does not matter; only the range (output space) determines whether the problem is classification or regression
Classification Example
Problem: Will you enjoy an outdoor sport based on the
weather?
Training set:

  Sky    Humidity  Wind    Water  Forecast  EnjoySport
  Sunny  Normal    Strong  Warm   Same      yes
  Sunny  High      Strong  Warm   Same      yes
  Sunny  High      Strong  Warm   Change    no
  Sunny  High      Strong  Cool   Change    yes

The first five columns are the input x; EnjoySport is the output f(x).

Possible hypotheses:
  h1: Sky = sunny → EnjoySport = yes
  h2: Water = cool or Forecast = same → EnjoySport = yes
Regression Example
Find a function ℎ that fits f at the instances x
More Examples

  Problem                  Domain    Range    Classification / Regression
  Spam detection
  Stock price prediction
  Speech recognition
  Digit recognition
  Housing valuation
  Weather prediction
Hypothesis Space
Hypothesis space H
  Set of all hypotheses ℎ that the learner may consider
Learning is a search through the hypothesis space
  Objective: find the ℎ that minimizes misclassification (or, more generally, some error function) with respect to the training examples
Generalization
A good hypothesis will generalize well
i.e., predict unseen examples correctly

Usually …
Any hypothesis ℎ found to approximate the target function f well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.
Inductive Learning
Goal: find an ℎ that agrees with f on training set
ℎ is consistent if it agrees with f on all examples

Finding a consistent hypothesis is not always possible


Insufficient hypothesis space
  E.g., it is not possible to learn exactly f(x) = ax + b + x·sin(x) when H = the space of polynomials of finite degree
Noisy data
  E.g., in weather prediction, identical conditions may lead to both rainy and sunny days
Inductive Learning
A learning problem is realizable if the hypothesis space contains the true function; otherwise it is unrealizable.
  It is difficult to determine whether a learning problem is realizable since the true function is not known.
It is possible to use a very large hypothesis space
  For example: H = the class of all Turing machines
But there is a tradeoff between the expressiveness of a hypothesis class and the complexity of finding a good hypothesis within it.
Nearest Neighbor Classifiers
Basic idea: If it walks like a duck, quacks like a duck, then it’s probably a duck

[Diagram: given the training records and a test record, compute the distance from the test record to every training record, then choose the k "nearest" records.]
Nearest Neighbour Classification
Classification function: ℎ(x) = y(x*)
  where x* is the training example nearest to x and y(x*) is its label
Distance measure: d(x, xꞌ)
Euclidean Distance

• Euclidean Distance

  dist(p, q) = sqrt( Σ_{k=1..n} (p_k − q_k)^2 )

  where n is the number of dimensions (attributes) and p_k and q_k are, respectively, the kth attributes (components) of data objects p and q.

• Standardization is necessary if scales differ (a small sketch follows below).
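
As a minimal sketch of what the standardization step can look like in practice (plain Python; the function name and example data are illustrative, not from the slides):

    # Z-score standardization: rescale each attribute to zero mean and unit
    # standard deviation so that no single attribute dominates the distance.
    from statistics import mean, stdev

    def standardize(points):
        """Return a copy of `points` with each column z-scored."""
        cols = list(zip(*points))
        mus = [mean(c) for c in cols]
        sigmas = [stdev(c) or 1.0 for c in cols]   # guard against zero spread
        return [[(v - mu) / s for v, mu, s in zip(p, mus, sigmas)]
                for p in points]

    # Example: height (cm) and income (dollars) live on very different scales.
    data = [[170, 30000], [180, 45000], [165, 52000]]
    print(standardize(data))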


Euclidean Distance

  point  x  y
  p1     0  2
  p2     2  0
  p3     3  1
  p4     5  1

        p1     p2     p3     p4
  p1    0      2.828  3.162  5.099
  p2    2.828  0      1.414  3.162
  p3    3.162  1.414  0      2
  p4    5.099  3.162  2      0

Distance Matrix
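
A short Python sketch (names are illustrative) that reproduces the distance matrix above from the four points:

    import math

    def euclidean(p, q):
        """dist(p, q) = sqrt(sum_k (p_k - q_k)^2)"""
        return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))

    points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}

    # Print the pairwise distance matrix; it matches the table above
    # (e.g. dist(p1, p2) = sqrt(8) = 2.828).
    for name, pa in points.items():
        print(name, [f"{euclidean(pa, pb):.3f}" for pb in points.values()])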
Minkowski Distance
Minkowski Distance is a generalization of Euclidean Distance

  dist(p, q) = ( Σ_{k=1..n} |p_k − q_k|^r )^(1/r)

  where r is a parameter, n is the number of dimensions (attributes), and p_k and q_k are, respectively, the kth attributes (components) of data objects p and q.
Minkowski Distance: Examples

r = 1. City block (Manhattan, taxicab, L1 norm) distance.
  A common example of this is the Hamming distance, which is just the number of bits that differ between two binary vectors.

r = 2. Euclidean distance.

r → ∞. "Supremum" (L_max norm, L_∞ norm) distance.
  This is the maximum difference between any component of the two vectors.

Do not confuse r with n; all of these distances are defined for any number of dimensions.
Minkowski Distance

  point  x  y
  p1     0  2
  p2     2  0
  p3     3  1
  p4     5  1

  L1    p1     p2     p3     p4
  p1    0      4      4      6
  p2    4      0      2      4
  p3    4      2      0      2
  p4    6      4      2      0

  L2    p1     p2     p3     p4
  p1    0      2.828  3.162  5.099
  p2    2.828  0      1.414  3.162
  p3    3.162  1.414  0      2
  p4    5.099  3.162  2      0

  L∞    p1     p2     p3     p4
  p1    0      2      3      5
  p2    2      0      1      3
  p3    3      1      0      2
  p4    5      3      2      0

Distance Matrix
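
A possible Python sketch (illustrative names, not from the slides) that reproduces the L1, L2 and L∞ matrices above:

    import math

    def minkowski(p, q, r):
        """dist(p, q) = (sum_k |p_k - q_k|^r)^(1/r); r = math.inf gives the max norm."""
        diffs = [abs(pk - qk) for pk, qk in zip(p, q)]
        if math.isinf(r):
            return max(diffs)               # supremum (L-infinity) distance
        return sum(d ** r for d in diffs) ** (1.0 / r)

    points = [(0, 2), (2, 0), (3, 1), (5, 1)]     # p1..p4 from the table

    for r in (1, 2, math.inf):                    # L1, L2 (Euclidean), L-infinity
        print(f"r = {r}")
        for p in points:
            print([round(minkowski(p, q, r), 3) for q in points])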
Mahalanobis Distance

  mahalanobis(p, q) = (p − q)^T Σ^(−1) (p − q)

  Σ is the covariance matrix of the input data X:

  Σ_{j,k} = (1/(n−1)) · sum_{i=1..n} (X_{ij} − X̄_j)(X_{ik} − X̄_k)

For the red points in the figure, the Euclidean distance is 14.7 and the Mahalanobis distance is 6.


Mahalanobis Distance
Covariance Matrix:

  Σ = [ 0.3  0.2 ]
      [ 0.2  0.3 ]

  A = (0.5, 0.5)
  B = (0, 1)
  C = (1.5, 1.5)

  Mahal(A, B) = 5
  Mahal(A, C) = 4
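
A small Python sketch of the slide's (squared) Mahalanobis formula, with a hand-rolled 2×2 inverse so it stays dependency-free; it reproduces Mahal(A, B) = 5 and Mahal(A, C) = 4 for the points above (the function name is mine):

    # mahalanobis(p, q) = (p - q)^T Sigma^{-1} (p - q), as written on the slide
    # (no square root).

    def mahalanobis2d(p, q, cov):
        (a, b), (c, d) = cov
        det = a * d - b * c
        inv = [[d / det, -b / det], [-c / det, a / det]]   # 2x2 matrix inverse
        dx, dy = p[0] - q[0], p[1] - q[1]
        # (p - q)^T * inv * (p - q)
        return (dx * (inv[0][0] * dx + inv[0][1] * dy)
                + dy * (inv[1][0] * dx + inv[1][1] * dy))

    cov = [[0.3, 0.2], [0.2, 0.3]]
    A, B, C = (0.5, 0.5), (0.0, 1.0), (1.5, 1.5)
    print(mahalanobis2d(A, B, cov))   # ≈ 5.0, as on the slide
    print(mahalanobis2d(A, C, cov))   # ≈ 4.0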
Voronoi Diagram

Partition of the input space implied by the nearest-neighbour function ℎ
Assuming Euclidean distance
K-Nearest Neighbour

Nearest neighbour is often unstable (sensitive to noise)

Idea: assign the most frequent label among the k nearest neighbours
  Let knn(x) be the k nearest neighbours of x according to distance d
  Label: ℎ(x) = the most frequent label y among the examples in knn(x)
Nearest-Neighbor Classifiers
Requires three things
  – The set of stored records
  – A distance metric to compute the distance between records
  – The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
  – Compute its distance to the training records
  – Identify the k nearest neighbors
  – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote), as in the sketch below
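
A minimal Python sketch of this three-step procedure (function names and toy data are illustrative, not from the slides):

    import math
    from collections import Counter

    def euclidean(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def knn_classify(train, x, k=3, dist=euclidean):
        """train: list of (feature_vector, label); return the majority label
        among the k training records nearest to x."""
        # 1. Compute the distance from x to every training record
        # 2. Keep the k nearest
        neighbours = sorted(train, key=lambda rec: dist(rec[0], x))[:k]
        # 3. Majority vote over their class labels
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

    # Toy usage with made-up 2-D points
    train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
    print(knn_classify(train, (1.5, 1.5), k=3))   # -> "A"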
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a test record x]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Nearest Neighbor Classification
Compute the distance between two points:
  Euclidean distance: d(p, q) = sqrt( Σ_i (p_i − q_i)^2 )

Determine the class from the nearest-neighbor list
  Take the majority vote of class labels among the k nearest neighbors
  Optionally, weigh each vote according to distance
    e.g., weight factor w = 1/d^2
Nearest Neighbor Classification…
Choosing the value of k:
  If k is too small, the classifier is sensitive to noise points
  If k is too large, the neighborhood may include points from other classes
Effect of K
K controls the degree of smoothing.
[Figure: partitions produced with k = 1, k = 3, and k = 31]
Which partition do you prefer? Why?
Performance of a learning algorithm
A learning algorithm is good if it produces a hypothesis that does a good job of predicting the classifications of unseen examples.

Verify performance with a test set:
  1. Collect a large set of examples
  2. Divide it into two disjoint sets: a training set and a test set
  3. Learn hypothesis ℎ on the training set
  4. Measure the percentage of test-set examples correctly classified by ℎ (see the sketch below)
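
A minimal sketch of steps 2–4, assuming a classifier such as the knn_classify sketch above; the function and parameter names are illustrative:

    import random

    def evaluate(examples, learner, test_fraction=0.2, seed=0):
        """Split into disjoint training/test sets, learn a hypothesis h on the
        training set, and report test-set accuracy."""
        rng = random.Random(seed)
        data = examples[:]
        rng.shuffle(data)
        n_test = int(len(data) * test_fraction)
        test, train = data[:n_test], data[n_test:]
        h = learner(train)                              # learn hypothesis h
        correct = sum(1 for x, y in test if h(x) == y)  # accuracy on the test set
        return correct / len(test)

    # e.g. learner = lambda train: (lambda x: knn_classify(train, x, k=3))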
The effect of K
The best K depends on
  the problem
  the amount of training data
Underfitting

Definition: underfitting occurs when an algorithm finds a hypothesis ℎ whose training accuracy is lower than the future accuracy of some other hypothesis ℎ'.

Amount of underfitting of ℎ:

Common cause:
  The classifier is not expressive enough
Overfitting

Definition: overfitting occurs when an algorithm finds a hypothesis ℎ with higher training accuracy than its future accuracy.

Amount of overfitting of ℎ:

Common causes:
  The classifier is too expressive
  Noisy data
  Lack of data
Choosing K
How should we choose K?
Ideally: select K with highest future accuracy
Alternative: select K with highest test accuracy

Problem: since we are choosing K based on the test set, the test set effectively becomes part of the training set when optimizing K. Hence, we can no longer trust the test-set accuracy to be representative of future accuracy.

Solution: split the data into training, validation and test sets
  Training set: compute the nearest neighbours
  Validation set: optimize hyperparameters such as K (see the sketch below)
  Test set: measure performance
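
A possible sketch of choosing K on a validation set, reusing the knn_classify sketch from earlier (the candidate values of K and the names are illustrative):

    def accuracy(train, eval_set, k):
        """Fraction of (x, y) pairs in eval_set that k-NN on `train` classifies correctly."""
        return sum(knn_classify(train, x, k) == y for x, y in eval_set) / len(eval_set)

    def choose_k(train, validation, candidate_ks=(1, 3, 5, 7, 9)):
        """Pick the K with the highest validation accuracy; the test set is
        kept aside and only used once, after K has been fixed."""
        return max(candidate_ks, key=lambda k: accuracy(train, validation, k))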
Choosing K based on the Validation Set
Robust validation

How can we ensure that the validation accuracy is representative of future accuracy?
  Validation accuracy becomes more reliable as we increase the size of the validation set
  However, this reduces the amount of data left for training
Cross-Validation
Repeatedly split the training data into two parts, one for training and one for validation, and report the average validation accuracy.

k-fold cross-validation: split the training data into k equal-sized subsets. Run k experiments, each time validating on one subset and training on the remaining subsets. Compute the average validation accuracy over the k experiments (see the sketch below).
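
A minimal sketch of k-fold cross-validation for selecting the number of neighbours, again assuming the knn_classify sketch from earlier (names are illustrative):

    def cv_accuracy(data, k_neighbours, n_folds=5):
        """Split the training data into n_folds parts, validate on each part in
        turn while training on the rest, and return the average validation accuracy."""
        folds = [data[i::n_folds] for i in range(n_folds)]
        accuracies = []
        for i in range(n_folds):
            validation = folds[i]
            train = [rec for j, fold in enumerate(folds) if j != i for rec in fold]
            correct = sum(knn_classify(train, x, k_neighbours) == y
                          for x, y in validation)
            accuracies.append(correct / len(validation))
        return sum(accuracies) / len(accuracies)

    # Selecting the number of neighbours:
    # best_k = max((1, 3, 5, 7, 9), key=lambda k: cv_accuracy(training_data, k))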
Selecting the Number of Neighbours by Cross-Validation

Selecting the Hyperparameters by Cross-Validation

Weighted K-Nearest Neighbour

We can often improve K-nearest neighbours by weighting each neighbour based on some distance measure, as in the sketch below.
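
One common weighting choice is 1/d²; a self-contained Python sketch (the names are mine, not from the slides):

    import math
    from collections import defaultdict

    def weighted_knn_classify(train, x, k=3, dist=None):
        """Like plain k-NN, but each neighbour votes with weight 1/d^2,
        so closer neighbours count for more."""
        if dist is None:
            dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        neighbours = sorted(train, key=lambda rec: dist(rec[0], x))[:k]
        votes = defaultdict(float)
        for features, label in neighbours:
            d = dist(features, x)
            votes[label] += 1.0 / (d ** 2 + 1e-9)   # small epsilon avoids 1/0
        return max(votes, key=votes.get)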
K-Nearest Neighbour Regression
We can also use KNN for regression
  Let y(x) be a real value instead of a categorical label
K-nearest neighbour regression: predict the average of the k nearest neighbours' values
Weighted K-nearest neighbour regression: predict a distance-weighted average of the k nearest neighbours' values (see the sketch below)
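
A minimal Python sketch of both variants, using 1/d² weighting as an illustrative choice (names and toy data are mine):

    import math

    def euclidean(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def knn_regress(train, x, k=3, weighted=False):
        """train: list of (feature_vector, real_value).
        Plain version: average of the k nearest targets.
        Weighted version: 1/d^2-weighted average of the k nearest targets."""
        neighbours = sorted(train, key=lambda rec: euclidean(rec[0], x))[:k]
        if not weighted:
            return sum(y for _, y in neighbours) / k
        weights = [1.0 / (euclidean(f, x) ** 2 + 1e-9) for f, _ in neighbours]
        return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)

    train = [((1,), 2.0), ((2,), 4.0), ((3,), 6.0), ((10,), 20.0)]
    print(knn_regress(train, (2.5,), k=3))                # -> 4.0
    print(knn_regress(train, (2.5,), k=3, weighted=True))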
Nearest neighbor Classification…
Problems with the Euclidean measure:
  High-dimensional data: the curse of dimensionality
  It can produce counter-intuitive results, e.g.

    111111111110 vs 011111111111   d = 1.4142
    100000000000 vs 000000000001   d = 1.4142
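
A quick check of the example above (the helper name is mine): both pairs differ in exactly two bit positions, so Euclidean distance rates them as equally similar even though the first pair shares many 1s and the second pair shares none.

    import math

    def euclidean_bits(a, b):
        return math.sqrt(sum((int(x) - int(y)) ** 2 for x, y in zip(a, b)))

    print(euclidean_bits("111111111110", "011111111111"))   # 1.4142...
    print(euclidean_bits("100000000000", "000000000001"))   # 1.4142...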
Nearest neighbor Classification…
k-NN classifiers are lazy learners
  They do not build a model explicitly
  Unlike eager learners such as decision tree induction and rule-based systems
  Classifying unknown records is therefore relatively expensive