ML-Lec7
Overview
• Introduction to instance-based learning
• K-Nearest Neighbor

Instance-based Learning - Overview
• In contrast to learning methods that construct a general, explicit description of the target function when training examples are provided, instance-based learning methods simply store the training examples.

Instance-based Learning - Overview
1. Instance-based learning methods are sometimes referred to as delayed/lazy learning methods because they delay processing until a new instance must be classified.
2. A key advantage of delayed/lazy learning is that instead of estimating the target function once for the entire instance space, these methods can estimate it locally and differently for each new instance to be classified.

Instance-based Learning - Overview
• Instance-based learning methods such as Nearest Neighbor are conceptually straightforward approaches to approximating real-valued or discrete-valued target functions.
1. Learning in these algorithms consists of simply storing the presented training data.
2. When a new query instance is encountered, a set of similar related instances is retrieved from memory and used to classify the new query instance.
Instance-based Learning - Overview
• Many techniques construct only a local approximation to the target function that applies in the neighborhood of the new query instance, and never construct an approximation designed to perform well over the entire instance space.
• This has significant advantages when the target function is very complex, but can still be described by a collection of less complex local approximations.

Instance-based Learning - Disadvantages
1. One disadvantage of instance-based approaches is that the cost of classifying new instances can be high.
• This is because nearly all computation takes place at classification time rather than when the training examples are first encountered.
K-nearest Neighbor Learning (Euclidean distance)
• Let an arbitrary instance x be described by the feature vector ⟨a1(x), a2(x), ..., an(x)⟩, where ar(x) denotes the value of the r-th attribute of instance x.
• The distance between two instances xi and xj is then defined to be

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl(a_r(x_i) - a_r(x_j)\bigr)^2}$$

K-nearest Neighbor Learning (output type)
• In nearest-neighbor learning the target function may be either discrete-valued or real-valued.
• The algorithm for approximating a discrete-valued target function is given on the next page ->
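To make the distance measure concrete, here is a minimal Python sketch of the Euclidean distance above; the function name euclidean_distance and the tuple representation of instances are illustrative choices, not part of the lecture.

```python
import math

def euclidean_distance(xi, xj):
    # d(xi, xj) = sqrt of the sum over r of (a_r(xi) - a_r(xj))^2,
    # where each instance is a tuple of its n attribute values.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Example: sqrt((1-4)^2 + (2-6)^2 + (3-3)^2) = 5.0
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))
```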
K-nearest Neighbor Algorithm for approximating a discrete-valued function f : ℝⁿ → V
• Training algorithm:
  • For each training example ⟨x, f(x)⟩, add the example to the list training_examples.
• Classification algorithm:
  • Given a query instance xq to be classified,
  • Let x1, ..., xk denote the k instances from training_examples that are nearest to xq.
  • Return

$$\hat{f}(x_q) \leftarrow \underset{v \in V}{\operatorname{argmax}} \sum_{i=1}^{k} \delta(v, f(x_i))$$

where δ(a, b) = 1 if a = b and δ(a, b) = 0 otherwise.

K-nearest Neighbor Algorithm for approximating a discrete-valued function f : ℝⁿ → V
• As shown there, the value f̂(xq) returned by this algorithm as its estimate of f(xq) is simply the most common value of f among the k training examples nearest to xq.
• If we choose k = 1, the 1-NEAREST NEIGHBOR algorithm assigns to f̂(xq) the value f(xi), where xi is the training instance nearest to xq.
• For larger values of k, the algorithm assigns the most common value among the k nearest training examples.
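A minimal sketch of the discrete-valued algorithm above, reusing the euclidean_distance helper from the earlier sketch. Here training_examples is assumed to be a list of ⟨x, f(x)⟩ pairs, and ties in the vote are broken arbitrarily.

```python
from collections import Counter

def knn_classify(training_examples, xq, k):
    # Keep the k training examples <x, f(x)> nearest to the query xq.
    neighbors = sorted(training_examples,
                       key=lambda ex: euclidean_distance(ex[0], xq))[:k]
    # argmax over v in V of sum_i delta(v, f(xi)) is a majority vote.
    votes = Counter(fx for _, fx in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters labeled "+" and "-".
data = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
        ((4.0, 4.0), "-"), ((4.2, 3.9), "-")]
print(knn_classify(data, (1.1, 1.0), k=3))  # -> "+"
```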
K-nearest Neighbor Algorithm for approximating a real-valued function
• The algorithm is easily adapted to approximating a continuous-valued target function: we replace the final step above by the mean of the f values of the k nearest training examples:

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$$
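A sketch of this real-valued variant, again assuming the euclidean_distance helper and the ⟨x, f(x)⟩ pair format from the previous sketches:

```python
def knn_regress(training_examples, xq, k):
    # f_hat(xq) = mean of f(xi) over the k nearest neighbors.
    neighbors = sorted(training_examples,
                       key=lambda ex: euclidean_distance(ex[0], xq))[:k]
    return sum(fx for _, fx in neighbors) / k

# Hypothetical 1-D data sampled from f(x) = x^2.
data = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0)]
print(knn_regress(data, (1.4,), k=2))  # mean of f at x=1 and x=2 -> 2.5
```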
Distance-Weighted Nearest Neighbor
• One refinement of the k-NEAREST NEIGHBOR algorithm is to weight the contribution of each of the k neighbors according to their distance to the query point xq, giving greater weight to closer neighbors.
• For a discrete-valued target function:

$$\hat{f}(x_q) \leftarrow \underset{v \in V}{\operatorname{argmax}} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)), \quad \text{where } w_i = \frac{1}{d(x_q, x_i)^2}$$

• We can distance-weight the instances for real-valued target functions in a similar way:

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}, \quad \text{where } w_i = \frac{1}{d(x_q, x_i)^2}$$

• If xq exactly matches one of the training instances xi, the denominator d(xq, xi)² is zero; in this case we assign f̂(xq) to be f(xi).
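The two distance-weighted formulas translate directly into code. A sketch under the same assumptions as before (euclidean_distance helper, ⟨x, f(x)⟩ pairs); the exact-match branches implement the zero-denominator rule from the slide:

```python
from collections import defaultdict

def dw_knn_classify(training_examples, xq, k):
    # f_hat(xq) = argmax_v sum_i w_i * delta(v, f(xi)), w_i = 1/d(xq, xi)^2.
    neighbors = sorted(training_examples,
                       key=lambda ex: euclidean_distance(ex[0], xq))[:k]
    scores = defaultdict(float)
    for x, fx in neighbors:
        d2 = euclidean_distance(x, xq) ** 2
        if d2 == 0.0:
            return fx          # xq matches xi exactly: f_hat(xq) = f(xi)
        scores[fx] += 1.0 / d2
    return max(scores, key=scores.get)

def dw_knn_regress(training_examples, xq, k):
    # f_hat(xq) = sum_i w_i * f(xi) / sum_i w_i, w_i = 1/d(xq, xi)^2.
    neighbors = sorted(training_examples,
                       key=lambda ex: euclidean_distance(ex[0], xq))[:k]
    num = den = 0.0
    for x, fx in neighbors:
        d2 = euclidean_distance(x, xq) ** 2
        if d2 == 0.0:
            return fx          # zero denominator: return f(xi) directly
        w = 1.0 / d2
        num += w * fx
        den += w
    return num / den
```

Because distant examples receive vanishing weights, the same code would also work if the neighbor cutoff were dropped and all training examples were considered, at extra cost; this is the global variant discussed in the remark below.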
Remark
• Note that all of the above variants of the k-NEAREST NEIGHBOR algorithm consider only the k nearest neighbors to classify the query point.
• Once we add distance weighting, there is no harm in allowing all training examples to have an influence, because very distant examples will have very little effect on f̂(xq).
• The only disadvantage of considering all examples is that the classifier will run more slowly.
• If all training examples are considered when classifying a new query instance, we call the algorithm a global method.
• If only the nearest training examples are considered, we call it a local method.

Remark
• The distance-weighted k-NEAREST NEIGHBOR algorithm is a highly effective inductive inference method for many practical problems.
• It is robust to noisy training data and quite effective when it is provided a sufficiently large set of training data.
• Note that by taking the weighted average of the k neighbors nearest to the query point, it can smooth out the impact of isolated noisy training examples.