Lect05 Instance ML
2022
Contents
1. Classification
2. Metric Learning
3. Regression
4. Clustering
Notation
symbol              meaning
a, b, c, N, ...     scalar number
w, v, x, y, ...     column vector
X, Y, ...           matrix
R                   set of real numbers
Z                   set of integer numbers
N                   set of natural numbers
R^D                 set of D-dimensional real vectors
𝒳, 𝒴, ...           set
𝒜                   algorithm

operator            meaning
w^T                 transpose
XY                  matrix multiplication
X^{-1}              inverse
Parametric vs Non-parametric Models
Parametric Models
• In the models that we have seen so far, we select a hypothesis space H and adjust a fixed set of parameters w with the training data D
• We assume that the parameters w summarize the training data D

Non-parametric Models
• A non-parametric model is one that cannot be characterized by a fixed set of parameters
• A family of non-parametric models is Instance-Based Learning: the function is based directly on the training data D = {x_1, x_2, ..., x_n}
Inductive Bias
Concept 1
In a nonparametric model, we assume that similar inputs have similar outputs.

• This is a reasonable assumption: the world is smooth, and functions, whether they are densities, discriminants, or regression functions, change slowly. Similar instances mean similar things.
Classification
• k-Nearest Neighbor (k-NN)
• Effects of Hyper-parameters
When To Consider Nearest Neighbor
• Data points x ∈ R^D
• Relatively few attributes (D < 20)
• Lots of training data D
Nearest Neighbor
Learning mode
• Store all training examples D = {(x_i, y_i) | i = 1, ..., N}

Running mode
• Given a query point x, find the stored example(s) nearest to x and predict its label from theirs
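To make the two modes concrete, here is a minimal NumPy sketch (not from the slides; the function name and the Euclidean metric are my own choices): learning just stores the data, and classification happens entirely at query time by majority vote among the k nearest stored examples.

```python
import numpy as np

def knn_classify(X_train, y_train, x_query, k=1):
    """Classify x_query by majority vote among its k nearest training points."""
    # "Learning" is just storing X_train, y_train; all work happens at query time.
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                     # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                    # majority label

# toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([4.8, 5.1]), k=3))  # -> 1
```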
Distance
Some common distances in the space R^D

• The Minkowski distance of order p > 0

  d(x, y) = L_p(x, y) = ( Σ_{i=1}^{D} |x_i − y_i|^p )^{1/p}    (5)
Distance (cont.)
• Manhattan distance

  d(x, y) = L_1(x, y) = Σ_{i=1}^{D} |x_i − y_i|    (7)

Figure 1: Contours of the distance from the origin O for various values of the parameter p (p = 0.5, 1, 2, 4)
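The Minkowski family of Eq. (5) is easy to implement directly; the following small sketch (mine, not from the slides) shows how p = 1 and p = 2 recover the Manhattan and Euclidean distances.

```python
import numpy as np

def minkowski(x, y, p=2.0):
    """L_p distance of Eq. (5); p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, y, p=1))  # 7.0  (Manhattan)
print(minkowski(x, y, p=2))  # 5.0  (Euclidean)
```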
The Curse of Dimensionality
• The more dimensions we have, the more examples we need
• The number of examples that we have in a given volume of space decreases exponentially with the number of dimensions
• If the number of dimensions is very high, the nearest neighbours can be very far away
Analysis
Advantages
• No training, just store the data
• Learn complex target functions
• Don't lose information

Disadvantages
• Slow at query time
• Easily fooled by irrelevant attributes
Parameter k
• If k = 1, the cross point x is classified into the square class
• If k = 3?
• If k = 5?

[Figure: a query point (cross) surrounded by examples of the square class and the circle class]
Parameter k (cont.)
• Data set D with 500 samples belonging to two classes {blue, orange}
Parameter k (cont.)
• Decision regions for various values of k

[Figure: decision regions for k = 1, 2, 3, 4, 5, 6, 10, 20, 50]
Metric Learning
• Motivation
• Metric Learning
• Loss Function
Motivation
• Nearest neighbor classification
Motivation (cont.)
• Clustering
Motivation (cont.)
• Information retrieval

[Figure: a query image and its retrieved nearest neighbors]
Motivation (cont.)
• Data visualization
Metric Learning
• Given a set of data points X and their corresponding labels Y
• Select a parametric distance or similarity function

  f_W(x) : X → R^n    (9)

• A distance function (which is usually fixed beforehand)

  L(x, x′) : R^n × R^n → R    (10)

• The goal is to train the parametric part so that the combination d_W(x, x′) = L(f_W(x), f_W(x′)) produces small values if the labels y, y′ ∈ Y of the samples x, x′ ∈ X are equal, and larger values if they are not.
Metric Learning (cont.)
• Collect similarity judgements on data pairs/triplets
Contrastive Approaches
• The embedding function is usually a neural network
• The distance function is typically the L2 distance
• A loss function (e.g., the contrastive or triplet loss below)
Contrastive Loss
Contrastive Loss (Chopra et al. 2005)
• Let x_1, x_2 be two samples in the dataset and y_1, y_2 their corresponding labels. For a condition A, let I_A denote the indicator function, which equals 1 if A is true and 0 otherwise. The loss is then defined in terms of the learned distance d_W(x_1, x_2), the indicators I_{y_1 = y_2} and I_{y_1 ≠ y_2}, and the margin α.
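A common formulation of this loss (one of several variants: squared distance for similar pairs, squared hinge with margin α for dissimilar pairs) can be sketched as follows; treat the exact form as an assumption rather than the definition used above.

```python
def contrastive_loss(d, same_label, alpha=1.0):
    """One common variant of the contrastive loss.

    d          : learned distance d_W(x1, x2) between a pair of samples
    same_label : True if y1 == y2 (the indicator I_{y1=y2})
    alpha      : margin
    """
    if same_label:
        return d ** 2                      # pull similar pairs together
    return max(0.0, alpha - d) ** 2        # push dissimilar pairs at least alpha apart

print(contrastive_loss(0.3, True))    # small loss for a close, similar pair
print(contrastive_loss(0.3, False))   # large loss for a close, dissimilar pair
```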
Triplet Loss
Triplet Loss (Schroff et al. 2015)
• Let x_a, x_p, x_n be samples from the dataset and y_a, y_p, y_n their corresponding labels, such that y_a = y_p and y_a ≠ y_n. Usually, x_a is called the anchor sample, x_p the positive sample because it has the same label as x_a, and x_n the negative sample because it has a different label. The loss is defined as:

  ℓ_triplet = max(0, d_W(x_a, x_p) − d_W(x_a, x_n) + α)    (14)

where α is the margin.
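Eq. (14) translates directly into code; in this sketch d_W is taken to be the Euclidean distance between already-computed embeddings (an assumption, since any learned embedding f_W could be plugged in).

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Eq. (14): max(0, d_W(a, p) - d_W(a, n) + alpha) with d_W the Euclidean distance."""
    d_ap = np.linalg.norm(f_a - f_p)   # anchor-positive distance
    d_an = np.linalg.norm(f_a - f_n)   # anchor-negative distance
    return max(0.0, d_ap - d_an + alpha)

# toy embeddings: anchor close to the positive, far from the negative -> zero loss
f_a, f_p, f_n = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([2.0, 0.0])
print(triplet_loss(f_a, f_p, f_n))  # 0.0
```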
Contrastive Loss vs. Triplet Loss
Regression
• Kernel Function
• Kernel Regression
• k-NN Regression
• Nadaraya-Watson Model
• Nadaraya-Watson Parametric Model
Feature Space
Project the data into a higher-dimensional space (feature space) F

• Transformation function

  φ : R^D → F,  x_i ↦ φ(x_i)    (15)

• Work with φ(x_i) instead of working with x_i.
The Kernel Function
Concept 2
A kernel is a function k(x, z) which represents a dot product in a "hidden" feature space given by φ:

  k(x, z) = φ(x) · φ(z)    (16)

• Note that we only ever need the dot products φ(x_i) · φ(x_j); however, computing them explicitly can be very expensive in a high-dimensional feature space.
• Kernel trick: instead of computing

  φ(x) = φ((x_1, x_2)^T) = (x_1^2, √2 x_1 x_2, x_2^2)^T,

  use k(x, z) = (x · z)^2.
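A quick numerical check of the kernel trick above (my own example values): the explicit feature map φ(x) = (x_1^2, √2 x_1 x_2, x_2^2)^T and the kernel k(x, z) = (x · z)^2 give the same dot product.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D polynomial kernel of degree 2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def k(x, z):
    """Kernel trick: the same dot product without building phi explicitly."""
    return float(np.dot(x, z)) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(phi(x), phi(z)))  # 121.0
print(k(x, z))                 # 121.0
```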
Common Kernels
• Polynomial
Techniques for Construction of Kernels
In all of the following, k_1, k_2, ..., k_j are assumed to be valid kernel functions.

1. Scalar multiplication: the validity of a kernel is preserved under multiplication by a positive scalar, i.e., for any α > 0, the function

  k(x, z) = α k_1(x, z)    (19)

is a valid kernel.

2. Adding a positive constant: for any positive constant α > 0, the function

  k(x, z) = α + k_1(x, z)    (20)

is a valid kernel.
Techniques for Construction of Kernels (cont.)
3. Linear combination: a linear combination of kernel functions involving only positive weights is a valid kernel, i.e.,

  k(x, z) = Σ_{j=1}^{m} α_j k_j(x, z), with α_j > 0    (21)

4. Product: the product of two kernel functions is a valid kernel, i.e.,

  k(x, z) = k_1(x, z) k_2(x, z)    (22)
Techniques for Construction of Kernels (cont.)
5. Polynomial function of a kernel output: given a polynomial f : R → R with positive coefficients, the function

  k(x, z) = f(k_1(x, z))    (23)

is a valid kernel function.

6. Exponential function of a kernel output: the function

  k(x, z) = exp(k_1(x, z))    (24)

is a valid kernel.

Finally, for a symmetric positive semidefinite matrix A, the function

  k(x, z) = x^T A z    (25)

is a valid kernel.
Linear Regression (Primal and Dual Form)

Problem: Given a dataset of input-output pairs D = {(x_1, y_1), ..., (x_N, y_N)}, find the best linear regression.

• Primal form

  ŷ = f(x) = Σ_{i=1}^{D} w_i x_i    (26)

  where

  w = (X^T X + λ I_D)^{-1} X^T y    (27)

• Dual form

  ŷ = f(x) = Σ_{i=1}^{N} α_i x_i^T x    (28)

  where

  α = (X X^T + λ I_N)^{-1} y    (29)
The Kernel Trick
• Question: how do we introduce nonlinearity into

  ŷ = f(x) = Σ_{i=1}^{N} α_i x_i^T x

• Solution: replacing the inner product x_i^T x by k(x, x_i), we have

  ŷ = f(x) = Σ_{i=1}^{N} α_i k(x, x_i)    (30)
Kernel Method
1. Select a kernel function k(·, ·)
2. Construct a kernel matrix K ∈ R^{N×N} where

  [K]_{ij} = k(x_i, x_j)    (31)

3. Compute the dual coefficients

  α = (K + λ I_N)^{-1} y    (32)

4. Predict a new output with ŷ = f(x) = Σ_{i=1}^{N} α_i k(x, x_i)
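A minimal sketch of these steps (kernel choice, kernel matrix of Eq. (31), dual coefficients of Eq. (32), prediction via Eq. (30)); the Gaussian kernel and the toy data are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def fit_kernel_regression(X, y, lam=0.1, kernel=gaussian_kernel):
    # Step 2: kernel matrix K_ij = k(x_i, x_j)   (Eq. 31)
    N = X.shape[0]
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    # Step 3: dual coefficients alpha = (K + lam*I)^(-1) y   (Eq. 32)
    alpha = np.linalg.solve(K + lam * np.eye(N), y)
    return alpha

def predict(x, X, alpha, kernel=gaussian_kernel):
    # Prediction: y_hat = sum_i alpha_i k(x, x_i)   (Eq. 30)
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.8, 0.9, 0.1])
alpha = fit_kernel_regression(X, y)
print(predict(np.array([1.5]), X, alpha))
```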
Linear Regression vs. Kernel Method
Linear regression                           Kernel method
pick a global model, best fit globally      pick a local model, best fit locally
based on the columns (features)             based on the rows (samples)
handles linearity                           handles nonlinearity
k-NN Regression
• Problem: Given a dataset of input-output pairs D = {(x_1, y_1), ..., (x_N, y_N)}, how to learn f to predict the output ŷ = f(x) for any new input x?
• Solution: take the mean of the values of the k nearest neighbors {x_(1), x_(2), ..., x_(k)}

  ŷ = (1/k) Σ_{i=1}^{k} y_(i)    (34)
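Eq. (34) as a short sketch (helper name and toy data are my own): average the targets of the k nearest neighbours of the query.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Eq. (34): average the targets of the k nearest neighbours of x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
print(knn_regress(X, y, np.array([2.2]), k=3))  # mean of the targets at x = 1, 2, 3
```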
Nadaraya-Watson Model
• Problem: Given a dataset of input-output pairs D = {(x_1, y_1), ..., (x_N, y_N)}, how to learn f to predict the output ŷ = f(x) for any new input x?
• Solution: consider each (x_i, y_i) as a key-value pair and x as a query

  key    value
  x_1    y_1
  ...    ...
  x_N    y_N

  ŷ = Σ_{i=1}^{N} α(x, x_i) y_i    (35)
Nadaraya-Watson Model (cont.)
• We define α using a Gaussian kernel

  α(x, x_i) = exp(−½ ‖x − x_i‖²) / Σ_{j=1}^{N} exp(−½ ‖x − x_j‖²)    (36)

  ŷ = Σ_{i=1}^{N} α(x, x_i) y_i
    = Σ_{i=1}^{N} [ exp(−½ ‖x − x_i‖²) / Σ_{j=1}^{N} exp(−½ ‖x − x_j‖²) ] y_i    (37)
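Eqs. (36)-(37) in NumPy (a sketch with my own function name); subtracting the maximum logit before exponentiating does not change the weights but keeps the softmax numerically stable.

```python
import numpy as np

def nadaraya_watson(x, X_train, y_train):
    """Eq. (37): attention-weighted average with a Gaussian kernel of bandwidth 1."""
    logits = -0.5 * np.sum((X_train - x) ** 2, axis=1)   # -1/2 ||x - x_i||^2
    w = np.exp(logits - logits.max())                    # softmax weights alpha(x, x_i)
    w /= w.sum()
    return np.dot(w, y_train)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 0.5, -0.5])
print(nadaraya_watson(np.array([1.2]), X, y))  # dominated by the targets at x = 1 and x = 2
```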
Nadaraya-Watson Model (cont.)
• A key x_i that is closer to the given query x will get more attention via a larger attention weight assigned to the key's corresponding value y_i.
Example 1
• Generate an artificial dataset including 50 training examples and 50 testing examples according to a nonlinear function with a noise term
Nadaraya-Watson Parametric Model
• Kernel regression enjoys a consistency benefit: given enough data, this model converges to the optimal solution.
• Nonetheless, we can easily integrate learnable parameters.
• In the following, the distance between the query x and the key x_i is multiplied by a learnable parameter w:

  ŷ = Σ_{i=1}^{N} [ exp(−½ (‖x − x_i‖ w)²) / Σ_{j=1}^{N} exp(−½ (‖x − x_j‖ w)²) ] y_i    (39)
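A sketch of Eq. (39) in which the scalar w is fitted by a few steps of gradient descent on the training squared error; the finite-difference gradient, learning rate, and step count are arbitrary choices made to keep the example short, not the lecture's training procedure.

```python
import numpy as np

def nw_parametric(x, X_train, y_train, w):
    """Eq. (39): Gaussian attention weights on (||x - x_i|| * w)^2."""
    d = np.linalg.norm(X_train - x, axis=1)
    logits = -0.5 * (d * w) ** 2
    a = np.exp(logits - logits.max())
    a /= a.sum()
    return np.dot(a, y_train)

def fit_w(X_train, y_train, steps=200, lr=0.1, eps=1e-4):
    """Fit the scalar w by gradient descent on the training squared error,
    using a finite-difference gradient to keep the sketch short."""
    def loss(w):
        preds = np.array([nw_parametric(x, X_train, y_train, w) for x in X_train])
        return np.mean((preds - y_train) ** 2)
    w = 1.0
    for _ in range(steps):
        g = (loss(w + eps) - loss(w - eps)) / (2 * eps)
        w -= lr * g
    return w

X = np.random.rand(50, 1) * 5
y = 2 * np.sin(X[:, 0]) + X[:, 0] ** 0.8 + np.random.normal(0, 0.5, 50)
print(fit_w(X, y))
```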
Example 2
Generate an artificial dataset including 50 training examples and 50 testing examples according to the following nonlinear function with the noise term ε ∼ N(0, 0.5)

  y = 2 sin(x) + x^0.8 + ε    (40)

• Find the parametric kernel regression
Clustering
• k-Means
• Hierarchical Clustering
• k-d Tree
Clustering
Concept 3
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
k-Means
Concept 4
Given a set of observations D = {x_1, ..., x_N}, k-means clustering aims to partition the N observations into k (≤ N) sets S = {S_1, S_2, ..., S_k} so as to minimize the within-cluster sum of squares.

• The objective is to find

  arg min_S Σ_{i=1}^{k} Σ_{x ∈ S_i} ‖x − μ_i‖²    (41)

  where μ_i is the mean of the points in S_i.
Naive k-Means Algorithm
• Assignment step: Assign each observation to the cluster whose current mean (centroid) is nearest.
• Update step: Recalculate the means (centroids) of the observations assigned to each cluster:

  m_i^(t+1) = (1 / |S_i^(t)|) Σ_{x ∈ S_i^(t)} x    (43)
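A minimal sketch of the naive algorithm (random initialisation is an assumption; in practice smarter seeding such as k-means++ is common): alternate the assignment and update steps until the centroids stop moving.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Naive k-means: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # initial means
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step (Eq. 43): recompute each centroid as the mean of its points
        new_centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                  else centroids[i] for i in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
print(centroids)
```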
Hierarchical Clustering
Concept 5
Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters.

[Figure: hierarchical clustering dendrogram; each leaf label gives the number of points in the node (or the index of the point if there is no parenthesis)]
Linkage Function
Concept 6
A linkage function L is used to calculate the distance (similarity/dissimilarity) between arbitrary subsets of the instance space, given a distance metric d.

• Single linkage: defines the distance between two clusters as the smallest pairwise distance between elements from each cluster.
• Complete linkage: defines the distance between two clusters as the largest pairwise distance.
Agglomerative algorithm
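A minimal sketch of the usual agglomerative procedure (merge order and stopping rule are my own choices, not necessarily the slide's pseudocode): start with singleton clusters and repeatedly merge the two clusters that are closest under the chosen linkage from Concept 6.

```python
import numpy as np
from itertools import combinations

def linkage_distance(A, B, d, mode="single"):
    """Concept 6: distance between clusters A and B under a linkage rule."""
    pair_dists = [d(a, b) for a in A for b in B]
    return min(pair_dists) if mode == "single" else max(pair_dists)

def agglomerative(points, n_clusters=2, mode="single"):
    """Sketch of agglomerative clustering: start with singletons, repeatedly merge
    the two closest clusters until n_clusters remain."""
    d = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], d, mode))
        clusters[i] += clusters[j]   # merge cluster j into cluster i
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(points, n_clusters=3, mode="single"))
```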
k-d Tree
• The fundamental problem of k-NN is that distance computation is costly, and the total cost is unavoidably linear in the number of points compared.
• To increase the processing speed, it is possible to partition the data space and reduce this number significantly using a k-d tree.

Concept 7
A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
Algorithm
Construct the k-d tree
• Given a D-dimensional dataset D = {x_1, x_2, ..., x_N}
• Cut the data with a hyperplane at the median value along a chosen dimension
• Recurse this procedure on each half to create a balanced binary k-d tree

• To locate the nearest neighbor of a query vector x, determine which leaf cell it lies within
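A compact sketch of both steps (construction by median splits and nearest-neighbour search with backtracking); cycling through the dimensions by depth is an assumed splitting rule. With this rule, the root of the example dataset on the next slide comes out as (7, 2).

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Build a k-d tree: cut along one dimension at the median, then recurse."""
    if not points:
        return None
    axis = depth % len(points[0])                 # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                        # median element along this axis
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Descend to the leaf cell containing the query, then backtrack,
    visiting the other side of a split only if it could hold a closer point."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or np.linalg.norm(np.subtract(query, point)) < np.linalg.norm(np.subtract(query, best)):
        best = point
    near, far = (node["left"], node["right"]) if query[axis] < point[axis] else (node["right"], node["left"])
    best = nearest(near, query, best)
    if abs(query[axis] - point[axis]) < np.linalg.norm(np.subtract(query, best)):
        best = nearest(far, query, best)
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["point"])          # (7, 2), the root of the tree
print(nearest(tree, (9, 2)))  # (8, 1), the stored point closest to the query
```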
Example
Given a dataset D = {(x_1, x_2)} = {(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)}

• Construct the k-d tree

[Figure: the resulting partition of the plane, with splitting points (7, 2), (5, 4), and (9, 6) labelled]
Example (cont.)
• Nearest neighbor search

[Figure: nearest neighbor search on the k-d tree partition]
References