
Instance Based Learning

Bùi Tiến Lên

2022
Contents

1. Classification

2. Metric Learning

3. Regression

4. Clustering
Notation

symbol           meaning
a, b, c, N ...   scalar number
w, v, x, y ...   column vector
X, Y ...         matrix
R                set of real numbers
Z                set of integer numbers
N                set of natural numbers
R^D              set of vectors
X, Y, ...        set
A                algorithm

operator         meaning
w^T              transpose
XY               matrix multiplication
X^{-1}           inverse
Parametric vs Non-parametric Models

Parametric Models
• In the models that we have seen, we select a hypothesis space H and adjust a fixed set of parameters w with the training data D
• We assume that the parameters w summarize the training data D and we can forget about it

    y = f(x; w)    (1)

Non-parametric Models
• A non-parametric model is one that cannot be characterized by a fixed set of parameters
• A family of non-parametric models is Instance Based Learning. The function is based on the training data D = {x_1, x_2, ..., x_n}

    y = f(x; x_1, x_2, ..., x_n)    (2)
Inductive Bias

Concept 1
In nonparametric models, we assume that similar inputs have similar outputs.

• This is a reasonable assumption: the world is smooth, and functions, whether they are densities, discriminants, or regression functions, change slowly. Similar instances mean similar things.
Classification
• k-Nearest Neighbor (k-NN)
• Effects of Hyper-parameters
When To Consider Nearest Neighbor

• Data points x ∈ R^D
• A small number of attributes (D < 20)
• Lots of training data D
Nearest Neighbor

Learning mode
• Store all training examples D = {(x_i, y_i) | i = 1, ..., N}

Running mode
• Nearest neighbor: Given a query instance x_q, first locate the nearest neighbor x^(1), then estimate

    h(x_q) = y^(1)    (3)

• k-Nearest neighbor: Given x_q, take a vote among its k nearest neighbors {x^(1), x^(2), ..., x^(k)}

    h(x_q) = majority vote{y^(1), y^(2), ..., y^(k)}    (4)
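As an illustration, here is a minimal k-NN classifier sketch in Python/NumPy following equations (3) and (4); the function and variable names are ours, not from the lecture.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest neighbors."""
    # Euclidean distances from the query to every stored training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels (k = 1 reduces to the nearest-neighbor rule)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```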
Distance

Some common distances in the space R^D

• The Minkowski distance of order p > 0

    d(x, y) = L_p(x, y) = ( Σ_{i=1}^{D} |x_i − y_i|^p )^{1/p}    (5)

• Euclidean distance (popular)

    d(x, y) = L_2(x, y) = sqrt( Σ_{i=1}^{D} (x_i − y_i)^2 )    (6)
Distance (cont.)

• Manhattan distance

    d(x, y) = L_1(x, y) = Σ_{i=1}^{D} |x_i − y_i|    (7)

Figure 1: Contours of the distance from the origin O for various values of the parameter p (p = 0.5, p = 1, p = 2, p = 4)
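A small NumPy sketch of the distances in equations (5)-(7); `minkowski` is our own helper name, not a function defined in the lecture.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p > 0 (eq. 5)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

print(minkowski(x, y, 1))   # Manhattan distance (eq. 7): 5.0
print(minkowski(x, y, 2))   # Euclidean distance (eq. 6): ~3.606
print(minkowski(x, y, 4))   # higher-order Minkowski distance
```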
The Curse of Dimensionality

• The more dimensions we have, the more examples we need
• The number of examples that we have in a given volume of space decreases exponentially with the number of dimensions
• If the number of dimensions is very high, the nearest neighbours can be very far away
Analysis

Advantages
• No training, just store data
• Learn complex target functions
• Don't lose information

Disadvantages
• Slow at query time
• Easily fooled by irrelevant attributes
Parameter k

• if k = 1, the cross point x should be classified into the square class
• if k = 3 ?
• if k = 5 ?

Figure: the query point x with neighbors from the square class and the circle class
Parameter k (cont.)

• Data set D with 500 samples belonging to two classes {blue, orange}
Parameter k (cont.)

• Decision regions for various values of k

Figure: decision regions for k = 1, 2, 3, 4, 5, 6, 10, 20, 50
Metric Learning
• Motivation
• Metric Learning
• Loss Function
Motivation

• Nearest neighbor classification

Motivation (cont.)

• Clustering

Motivation (cont.)

• Information retrieval

Figure: a query image and its most similar retrieved images

Motivation (cont.)

• Data visualization
Metric Learning

• Given a set of data points X and their corresponding labels Y
• Select a parametric distance or similarity function

    d_W(x, x') = L( f_W(x), f_W(x') )    (8)

• An embedding function (parametric function)

    f_W(x) : X → R^n    (9)

• A distance function (which is usually fixed beforehand)

    L(x, x') : R^n × R^n → R    (10)

• The goal is to train the parametric distance, so that the combination d_W(x, x') produces small values if the labels y, y' ∈ Y of the samples x, x' ∈ X are equal, and larger values if they aren't.
Metric Learning (cont.)

• Collect similarity judgements on data pairs/triplets

    S = {(x_i, x_j) : x_i and x_j should be similar},
    D = {(x_i, x_j) : x_i and x_j should be dissimilar},    (11)
    R = {(x_i, x_j, x_k) : x_i should be more similar to x_j than to x_k}.

• Estimate parameters so that the metric best agrees with the judgements

    Ŵ = arg min_W [ ℓ(d_W, S, D, R) + λ R(W) ]    (12)

  where ℓ is the loss function and λ R(W) is the regularization term.
Contrastive Approaches

• The embedding function is usually a neural network
• The distance function is the L2 distance
• A loss function (contrastive or triplet, defined on the following slides)
Contrastive Loss

Contrastive Loss (Chopra et al. 2005)
• Let x_1, x_2 be samples from the dataset, and let y_1, y_2 be their corresponding labels. Also, for a condition A, denote by I_A the indicator function that is equal to 1 if A is true, and 0 otherwise. The loss function is then defined as follows:

    ℓ_contrast = I_{y_1 = y_2} d_W(x_1, x_2) + I_{y_1 ≠ y_2} max(0, α − d_W(x_1, x_2))    (13)

  where α is the margin.
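A minimal NumPy sketch of equation (13) for a single pair, assuming the embeddings f_W(x_1), f_W(x_2) have already been computed; the names are illustrative, not from the lecture.

```python
import numpy as np

def contrastive_loss(z1, z2, y1, y2, alpha=1.0):
    """Contrastive loss (eq. 13) for one pair of embeddings z1 = f_W(x1), z2 = f_W(x2)."""
    d = np.linalg.norm(z1 - z2)          # d_W(x1, x2) with an L2 distance in embedding space
    if y1 == y2:
        return d                          # pull similar pairs together
    return max(0.0, alpha - d)            # push dissimilar pairs at least alpha apart

# Toy usage with 2-D embeddings
print(contrastive_loss(np.array([0.1, 0.2]), np.array([0.15, 0.1]), 0, 0))  # same label: small loss
print(contrastive_loss(np.array([0.1, 0.2]), np.array([0.15, 0.1]), 0, 1))  # different labels but close: large penalty
```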
Triplet Loss

Triplet Loss (Schroff et al. 2015)
• Let x_a, x_p, x_n be samples from the dataset and y_a, y_p, y_n be their corresponding labels, such that y_a = y_p and y_a ≠ y_n. Usually, x_a is called the anchor sample, x_p is called the positive sample because it has the same label as x_a, and x_n is called the negative sample because it has a different label. The loss is defined as:

    ℓ_triplet = max(0, d_W(x_a, x_p) − d_W(x_a, x_n) + α)    (14)

  where α is the margin.
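Likewise, a sketch of the triplet loss in equation (14) on precomputed embeddings, hedged in the same way as the contrastive example above.

```python
import numpy as np

def triplet_loss(z_a, z_p, z_n, alpha=1.0):
    """Triplet loss (eq. 14): anchor z_a, positive z_p, negative z_n in embedding space."""
    d_ap = np.linalg.norm(z_a - z_p)      # distance anchor-positive
    d_an = np.linalg.norm(z_a - z_n)      # distance anchor-negative
    return max(0.0, d_ap - d_an + alpha)  # zero once the negative is alpha farther than the positive

print(triplet_loss(np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([2.0, 2.0])))  # -> 0.0
```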
Contrastive Loss vs. Triplet Loss

Figure: contrastive loss vs. triplet loss
Regression
• Kernel Function
• Kernel Regression
• k-NN Regression
• Nadaraya-Watson Model
• Nadaraya-Watson Parametric Model
Feature Space

Project the data into a higher dimensional space (feature space) F
• Transformation function

    φ : R^D → F
    x_i → φ(x_i)    (15)

• Work with φ(x_i) instead of working with x_i.
The Kernel Function

Concept 2
A kernel is a function k(x, z) which represents a dot product in a "hidden" feature space of φ.

    k(x, z) = φ(x) · φ(z)    (16)

• Note that we only have dot products φ(x_i) · φ(x_j) to compute; however, this could be very expensive in a high dimensional space.
• Kernel trick: instead of computing φ(x) = φ([x_1, x_2]^T) = [x_1^2, √2 x_1 x_2, x_2^2]^T explicitly, use k(x, z) = (x · z)^2
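A quick numerical check of the kernel trick for this 2-D example: the explicit feature map φ and the kernel (x · z)^2 give the same value. A sketch with our own names.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D example: [x1^2, sqrt(2) x1 x2, x2^2]."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    """Kernel trick: squared dot product in the original space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))  # 1.0, the dot product in the feature space ...
print(k(x, z))                 # ... equals the kernel value, without ever forming phi
```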
Common Kernels

• Polynomial:

    k(x, z) = (u x · z + v)^p    (u ∈ R, v ∈ R, p ∈ N)    (17)

• Gaussian:

    k(x, z) = exp( −‖x − z‖^2 / σ^2 ),    σ ∈ R^+    (18)

  Note: its feature space is infinite-dimensional
Techniques for Construction of Kernels

In all the following, k_1, k_2, ..., k_j are assumed to be valid kernel functions.
1. Scalar multiplication: The validity of a kernel is conserved after multiplication by a positive scalar, i.e., for any α > 0, the function

    k(x, z) = α k_1(x, z)    (19)

   is a valid kernel function.
2. Adding a positive constant: For any positive constant α > 0, the function

    k(x, z) = α + k_1(x, z)    (20)

   is a valid kernel function.
Techniques for Construction of Kernels (cont.)

3. Linear combination: A linear combination of kernel functions involving only positive weights, i.e.,

    k(x, z) = Σ_{j=1}^{m} α_j k_j(x, z),  with α_j > 0    (21)

   is a valid kernel function.
4. Product: The product of two kernel functions, i.e.,

    k(x, z) = k_1(x, z) k_2(x, z)    (22)

   is a valid kernel function.
Techniques for Construction of Kernels (cont.)

5. Polynomial function of a kernel output: Given a polynomial f : R → R with positive coefficients, the function

    k(x, z) = f(k_1(x, z))    (23)

   is a valid kernel function.
6. Exponential function of a kernel output: The function

    k(x, z) = exp(k_1(x, z))    (24)

   is a valid kernel function.
7. Product of matrix and vectors: The function

    k(x, z) = x^T A z    (25)

   where A is a symmetric positive semidefinite matrix, is a valid kernel function.
Linear Regression Revisited

Problem: Given a dataset of input-output pairs D = {(x_1, y_1), ..., (x_N, y_N)}, find the best linear regression
• Primal form

    ŷ = f(x) = Σ_{i=1}^{D} w_i x_i    (26)

  where

    w = (X^T X + λ I_D)^{−1} X^T y    (27)

• Dual form

    ŷ = f(x) = Σ_{i=1}^{N} α_i x_i^T x    (28)

  where

    α = (X X^T + λ I_N)^{−1} y    (29)
The Kernel Trick

• Question: How to introduce nonlinearity into

    ŷ = f(x) = Σ_{i=1}^{N} α_i x_i^T x

• Solution: Replace the inner product x_i^T x by k(x, x_i), which gives

    ŷ = f(x) = Σ_{i=1}^{N} α_i k(x, x_i)    (30)
Kernel Method

1. Select a kernel function k(·, ·)
2. Construct a kernel matrix K ∈ R^{N×N} where

    [K]_{ij} = k(x_i, x_j)    (31)

3. Compute the coefficients α ∈ R^N, with

    α = (K + λ I_N)^{−1} y    (32)

4. Estimate the predicted value for a new sample x

    ŷ = Σ_{i=1}^{N} α_i k(x, x_i)    (33)
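The four steps above map directly to a few lines of NumPy. A minimal sketch of kernel (ridge) regression with the Gaussian kernel of equation (18); the function names and the choices of σ and λ are ours.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel (eq. 18)."""
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

def kernel_regression_fit(X, y, lam=0.1, sigma=1.0):
    """Steps 1-3: build the kernel matrix K (eq. 31) and solve for alpha (eq. 32)."""
    N = len(X)
    K = np.array([[gaussian_kernel(X[i], X[j], sigma) for j in range(N)] for i in range(N)])
    return np.linalg.solve(K + lam * np.eye(N), y)

def kernel_regression_predict(X, alpha, x_new, sigma=1.0):
    """Step 4: prediction for a new sample (eq. 33)."""
    return sum(a * gaussian_kernel(x_new, x_i, sigma) for a, x_i in zip(alpha, X))

# Toy usage on 1-D inputs
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.8, 0.9, 0.1])
alpha = kernel_regression_fit(X, y)
print(kernel_regression_predict(X, alpha, np.array([1.5])))
```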
Linear Regression vs. Kernel Method

Linear regression                          Kernel method
pick a global model, best fit globally     pick a local model, best fit locally
based on the columns (features)            based on the rows (samples)
handles linearity                          handles nonlinearity
k-NN Regression

• Problem: Given a dataset of input-output pairs D = {(x_1, y_1), ..., (x_N, y_N)}, how to learn f to predict the output ŷ = f(x) for any new input x?
• Solution: Take the mean of the values of the k nearest neighbors {x^(1), x^(2), ..., x^(k)}

    ŷ = ( Σ_{i=1}^{k} y^(i) ) / k    (34)
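A one-function sketch of equation (34), in the same NumPy style as the classification example earlier; the names are ours.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """k-NN regression (eq. 34): average the targets of the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 0.8, 0.9, 0.1])
print(knn_regress(X_train, y_train, np.array([1.4]), k=2))  # mean of the two closest targets
```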
Nadaraya-Watson Model

• Problem: Given a dataset of input-output pairs D = {(x_1, y_1), ..., (x_N, y_N)}, how to learn f to predict the output ŷ = f(x) for any new input x?
• Solution: Consider each (x_i, y_i) as a key-value pair and x as a query

    key    value
    x_1    y_1
    ...    ...
    x_N    y_N

    ŷ = Σ_{i=1}^{N} α(x, x_i) y_i    (35)
Nadaraya-Watson Model (cont.)

• We define α using a Gaussian kernel

    α(x, x_i) = exp(−½ ‖x − x_i‖²) / Σ_{j=1}^{N} exp(−½ ‖x − x_j‖²)    (36)

  and plug it into equation (35)

    ŷ = Σ_{i=1}^{N} α(x, x_i) y_i
      = Σ_{i=1}^{N} [ exp(−½ ‖x − x_i‖²) / Σ_{j=1}^{N} exp(−½ ‖x − x_j‖²) ] y_i    (37)
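A NumPy sketch of the Nadaraya-Watson estimator in equations (36)-(37); the softmax-style weighting runs over all training keys, and the names are ours.

```python
import numpy as np

def nadaraya_watson(X_train, y_train, x_query):
    """Nadaraya-Watson prediction (eq. 37) with a Gaussian kernel of unit bandwidth."""
    # Unnormalized attention scores exp(-1/2 ||x - x_i||^2) for every key x_i
    scores = np.exp(-0.5 * np.sum((X_train - x_query) ** 2, axis=1))
    weights = scores / scores.sum()       # attention weights alpha(x, x_i), eq. (36)
    return np.dot(weights, y_train)       # weighted average of the values y_i

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 0.8, 0.9, 0.1])
print(nadaraya_watson(X_train, y_train, np.array([1.5])))
```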
Nadaraya-Watson Model (cont.)

• A key x_i that is closer to the given query x will get more attention via a larger attention weight assigned to the key's corresponding value y_i.
Example 1

• Generate an artificial dataset including 50 training examples and 50 testing examples according to the following nonlinear function with the noise term ε ∼ N(0, 0.5)

    y = 2 sin(x) + x^0.8 + ε    (38)

• Find the kernel regression
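One possible way to generate this dataset and run the non-parametric Nadaraya-Watson regression of equation (37) on it. The sample sizes and noise term come from the slide; the input range [0, 5] and the sorting of the inputs are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=50):
    """Artificial data from eq. (38): y = 2 sin(x) + x^0.8 + eps, eps ~ N(0, 0.5)."""
    x = np.sort(rng.uniform(0, 5, n))          # the input range is our assumption
    y = 2 * np.sin(x) + x ** 0.8 + rng.normal(0, 0.5, n)
    return x, y

x_train, y_train = make_data(50)
x_test, y_test = make_data(50)

def nw_predict(x_query):
    """Nadaraya-Watson prediction (eq. 37) for a single scalar query."""
    scores = np.exp(-0.5 * (x_train - x_query) ** 2)
    return np.dot(scores / scores.sum(), y_train)

y_hat = np.array([nw_predict(x) for x in x_test])
print("test MSE:", np.mean((y_hat - y_test) ** 2))
```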
Nadaraya-Watson Parametric Model

• Kernel regression enjoys a consistency benefit: given enough data this model converges to the optimal solution.
• Nonetheless, we can easily integrate learnable parameters.
• In the following, the distance between the query x and the key x_i is multiplied by a learnable parameter w:

    ŷ = Σ_{i=1}^{N} [ exp(−½ (‖x − x_i‖ w)²) / Σ_{j=1}^{N} exp(−½ (‖x − x_j‖ w)²) ] y_i    (39)
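A sketch of how the scalar w in equation (39) could be fitted by gradient descent on the training squared error. The learning rate, the number of steps, the numerical gradient, and the leave-one-out style loss (each query's own pair is excluded so that w is not driven to infinity) are our assumptions, not specified on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 5, 50))
y_train = 2 * np.sin(x_train) + x_train ** 0.8 + rng.normal(0, 0.5, 50)

def nw_param_predict(x_query, w, exclude=None):
    """Parametric Nadaraya-Watson prediction (eq. 39) with learnable bandwidth w."""
    scores = np.exp(-0.5 * ((x_train - x_query) * w) ** 2)
    if exclude is not None:
        scores[exclude] = 0.0              # leave the query's own pair out during training
    return np.dot(scores / scores.sum(), y_train)

def loss(w):
    preds = np.array([nw_param_predict(x_train[i], w, exclude=i) for i in range(len(x_train))])
    return np.mean((preds - y_train) ** 2)

w, lr, eps = 1.0, 0.1, 1e-4
for _ in range(200):                        # plain gradient descent with a numerical gradient
    grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    w -= lr * grad
print("learned w:", w, "training loss:", loss(w))
```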
Example 2

• Generate an artificial dataset including 50 training examples and 50 testing examples according to the following nonlinear function with the noise term ε ∼ N(0, 0.5)

    y = 2 sin(x) + x^0.8 + ε    (40)

• Find the parametric kernel regression
Clustering
• k-Means
• Hierarchical Clustering
• k-d Tree
Clustering

Concept 3
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
k-Means

Concept 4
Given a set of observations D = {x_1, ..., x_N}, k-means clustering aims to partition the N observations into k (≤ N) sets S = {S_1, S_2, ..., S_k} so as to minimize the within-cluster sum of squares.

• The objective is to find

    arg min_S Σ_{i=1}^{k} Σ_{x ∈ S_i} ‖x − µ_i‖²    (41)

  where µ_i is the mean of S_i
Naive k-Means Algorithm

1. Initialise a set of k means m_1^(0), ..., m_k^(0)
2. For t = 1, 2, 3, ... do
   • Assignment step: Assign each observation to the cluster with the nearest mean, i.e. the one with the least squared Euclidean distance

       S_i^(t) = { x | L_2(x, m_i^(t)) < L_2(x, m_j^(t)), ∀j ≠ i }    (42)

   • Update step: Recalculate the means (centroids) of the observations assigned to each cluster

       m_i^(t+1) = (1 / |S_i^(t)|) Σ_{x ∈ S_i^(t)} x    (43)

The algorithm has converged when the assignments no longer change.
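The two alternating steps translate almost line by line into NumPy. A minimal sketch of the naive algorithm; the random initialisation from the data points is our choice, and the sketch assumes no cluster becomes empty.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Naive k-means: alternate the assignment step (eq. 42) and the update step (eq. 43)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=False)]    # initialise the k means from the data
    for _ in range(n_iter):
        # Assignment step: index of the nearest mean for every observation
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its assigned points
        new_means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_means, means):              # assignments (hence means) stopped changing
            break
        means = new_means
    return means, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
means, labels = kmeans(X, k=2)
print(means)
```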
Hierarchical Clustering

Concept 5
Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters.

Figure: a hierarchical clustering dendrogram; each leaf label shows the number of points in the node (or the index of the point if there is no parenthesis)
Linkage Function

Concept 6
A linkage function L is used to calculate the distance (similarity/dissimilarity) between arbitrary subsets of the instance space, given a distance metric d.

• Single linkage: defines the distance between two clusters as the smallest pairwise distance between elements from each cluster.

    L_single(A, B) = min{ d(x, y) | x ∈ A, y ∈ B }    (44)

• Complete linkage: defines the distance between two clusters as the largest pairwise distance.

    L_complete(A, B) = max{ d(x, y) | x ∈ A, y ∈ B }    (45)
Agglomerative algorithm

• Given a set of observations D = {x_1, ..., x_n}

    Initialise clusters to singleton data points
    Create a leaf node for every singleton cluster
    Repeat
        find the pair of clusters X, Y with the lowest linkage
        merge X, Y into Z
        create a node for Z (parent node of X, Y)
    Until all data points are in one cluster
    Return the constructed binary tree
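In practice this bottom-up procedure is available in SciPy. A small sketch that builds the hierarchy with single linkage (eq. 44) and reads off flat clusters; the two-blob data and the cut into two clusters are our choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated blobs of points
X = np.vstack([np.random.randn(10, 2), np.random.randn(10, 2) + 5])

# Agglomerative clustering with the single-linkage criterion (eq. 44);
# Z encodes the sequence of merges, i.e. the binary tree built by the algorithm
Z = linkage(X, method="single")

# Cut the tree into two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```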
k-d Tree

• The fundamental problem of k-NN is that distance computation is costly, and the total cost is unavoidably linear in the number of points compared.
• To increase the processing speed, it is possible to partition the data space and reduce this number significantly using a k-d tree.

Concept 7
A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
Algorithm

Construct k-d tree
• Given a D-dimensional dataset D = {x_1, x_2, ..., x_N}
• Cut the data with a plane at the median value along a chosen dimension
• Recurse this procedure to create a balanced binary tree (the k-d tree)

Nearest neighbor search
• To locate the NN of a query vector x, determine which leaf cell it lies within
• Perform an exhaustive search within this cell
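A compact sketch of the construction step: split on the median, cycling through the dimensions, to get a balanced binary tree. The dictionary-based node representation and the choice to cycle dimensions by depth are our assumptions; the search step with backtracking is omitted for brevity.

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree: cut at the median along one dimension per level."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]                # cycle through the dimensions
    points = points[points[:, axis].argsort()]    # sort along the splitting dimension
    median = len(points) // 2                     # the median split keeps the tree balanced
    return {
        "point": points[median],
        "left": build_kdtree(points[:median], depth + 1),
        "right": build_kdtree(points[median + 1:], depth + 1),
    }

D = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]], dtype=float)
tree = build_kdtree(D)
print(tree["point"])   # -> [7. 2.], the median along the first axis becomes the root
```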
Example

Given a dataset D = {(x_1, x_2)} = {(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)}
• Construct the k-d tree

Figure: the resulting k-d tree, with root (7, 2), children (5, 4) and (9, 6), and leaves (2, 3), (4, 7), (8, 1), together with the corresponding partition of the plane
Example (cont.)

• Nearest neighbor search

Figure: nearest neighbor search on the partitioned plane