Hadsell et al. - Dimensionality Reduction by Learning an Invariant Mapping
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)
A contrastive loss function is employed to learn the parameters W of a parameterized function G_W, in such a way that neighbors are pulled together and non-neighbors are pushed apart. Prior knowledge can be used to identify the neighbors for each training data point.

The method uses an energy based model that uses the given neighborhood relationships to learn the mapping function. For a family of functions G, parameterized by W, the objective is to find a value of W that maps a set of high dimensional inputs to the manifold such that the euclidean distance between points on the manifold, D_W(X_1, X_2) = ||G_W(X_1) − G_W(X_2)||_2, approximates the "semantic similarity" of the inputs in input space, as provided by a set of neighborhood relationships. No assumption is made about G_W except that it is differentiable with respect to W.

1.1 Previous Work

The problem of mapping a set of high dimensional points onto a low dimensional manifold has a long history. The two classical methods for the problem are Principal Component Analysis (PCA) [7] and Multi-Dimensional Scaling (MDS) [6]. PCA involves the projection of inputs to a low dimensional subspace that maximizes the variance. In MDS, one computes the projection that best preserves the pairwise distances between input points. However both methods - PCA in general and MDS in the classical scaling case (when the distances are euclidean distances) - generate a linear embedding.

In recent years there has been a lot of activity in designing non-linear spectral methods for the problem. These methods involve solving the eigenvalue problem for a particular matrix. Recently proposed algorithms include ISOMAP (2000) by Tenenbaum et al. [1], Local Linear Embedding - LLE (2000) by Roweis and Saul [13], Laplacian Eigenmaps (2003) due to Belkin and Niyogi [2], and Hessian LLE (2003) by Donoho and Grimes [8]. All the above methods have three main steps. The first is to identify a list of neighbors of each point. Second, a gram matrix is computed using this information. Third, the eigenvalue problem is solved for this matrix. None of these methods attempt to compute a function that could map a new, unknown data point without recomputing the entire embedding and without knowing its relationships to the training points. Out-of-sample extensions to the above methods have been proposed by Bengio et al. in [3], but they too rely on a predetermined computable distance metric.

Along a somewhat different line, Schölkopf et al. in 1998 [11] proposed a non-linear extension of PCA, called Kernel PCA. The idea is to non-linearly map the inputs to a high dimensional feature space and then extract the principal components. The algorithm first expresses the PCA computation solely in terms of dot products and then exploits the kernel trick to implicitly compute the high dimensional mapping. In recent work, Weinberger et al. in [10] attempt to learn the kernel matrix when the high dimensional input lies on a low dimensional manifold by formulating the problem as a semidefinite program. There are also related algorithms for clustering due to Shi and Malik [12] and Ng et al. [15].

The proposed approach is different from these methods; it learns a function that is capable of consistently mapping new points unseen during training. In addition, this function is not constrained by simple distance measures in the input space. The learning architecture is somewhat similar to the one discussed in [4, 5].

Section 2 describes the general framework, the loss function, and draws an analogy with a mechanical spring system. The ideas in this section are made concrete in section 3, where various experimental results are given.

2 Learning the Low Dimensional Mapping

The problem is to find a function that maps high dimensional input patterns to lower dimensional outputs, given neighborhood relationships between samples in input space. The graph of neighborhood relationships may come from an information source that may not be available for test points, such as prior knowledge, manual labeling, etc. More precisely, given a set of input vectors I = {X_1, . . . , X_P}, where X_i ∈ R^D for all i = 1, . . . , P, find a parametric function G_W : R^D → R^d with d ≪ D, such that it has the following properties:

1. Simple distance measures in the output space (such as euclidean distance) should approximate the neighborhood relationships in the input space.

2. The mapping should not be constrained to implementing simple distance measures in the input space and should be able to learn invariances to complex transformations.

3. It should be faithful even for samples whose neighborhood relationships are unknown.

2.1 The Contrastive Loss Function

Consider the set I of high dimensional training vectors X_i. Assume that for each X_i ∈ I there is a set S_{X_i} of training vectors that are deemed similar to X_i. This set can be computed by some prior knowledge - invariance to distortions or temporal proximity, for instance - which does not depend on a simple distance. A meaningful mapping from high to low dimensional space maps similar input vectors to nearby points on the output manifold and dissimilar vectors to distant points. A new loss function whose minimization can produce such a function is now introduced.
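Before turning to the loss function, the setup of this section can be sketched in a few lines of NumPy. This is only an illustration of the definitions above: the linear form chosen for G_W, the dimensions, and the random weights are placeholders, not the architecture used in the experiments.

```python
import numpy as np

# Illustrative parametric mapping G_W : R^D -> R^d (here a plain linear map;
# the framework only requires G_W to be differentiable with respect to W).
D_in, d_out = 784, 2                       # placeholder dimensions (e.g. image -> 2D)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(d_out, D_in))

def G(W, x):
    """Map one high dimensional input vector to the low dimensional output space."""
    return W @ x

def D_W(W, x1, x2):
    """Euclidean distance between the two mapped outputs, ||G_W(x1) - G_W(x2)||_2."""
    return np.linalg.norm(G(W, x1) - G(W, x2))

# Property 3 in action: a new, unseen sample is embedded with a single forward
# pass through G_W; the training set does not have to be re-embedded.
x_new = rng.normal(size=D_in)
print(G(W, x_new), D_W(W, x_new, rng.normal(size=D_in)))
```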
Unlike loss functions that sum over samples, this loss function runs over pairs of samples. Let X_1, X_2 ∈ I be a pair of input vectors shown to the system. Let Y be a binary label assigned to this pair. Y = 0 if X_1 and X_2 are deemed similar, and Y = 1 if they are deemed dissimilar. Define the parameterized distance function to be learned, D_W, between X_1, X_2 as the euclidean distance between the outputs of G_W. That is,

D_W(X_1, X_2) = ||G_W(X_1) − G_W(X_2)||_2    (1)

To shorten notation, D_W(X_1, X_2) is written D_W. Then the loss function in its most general form is

L(W) = Σ_{i=1}^{P} L(W, (Y, X_1, X_2)^i)    (2)

L(W, (Y, X_1, X_2)^i) = (1 − Y) L_S(D_W^i) + Y L_D(D_W^i)    (3)

where (Y, X_1, X_2)^i is the i-th labeled sample pair, L_S is the partial loss function for a pair of similar points, L_D the partial loss function for a pair of dissimilar points, and P the number of training pairs (which may be as large as the square of the number of samples).

L_S and L_D must be designed such that minimizing L with respect to W would result in low values of D_W for similar pairs and high values of D_W for dissimilar pairs. The exact loss function used here is

L(W, Y, X_1, X_2) = (1 − Y) (1/2) (D_W)^2 + Y (1/2) {max(0, m − D_W)}^2    (4)

where m > 0 is a margin. The margin defines a radius around G_W(X); dissimilar pairs contribute to the loss function only if their distance is within this radius (see figure 1).

[Figure 1: plot of the loss L against the distance D_W for similar and dissimilar pairs.]

The contrastive term involving dissimilar pairs, L_D, is crucial. Simply minimizing D_W(X_1, X_2) over the set of all similar pairs will usually lead to a collapsed solution, since D_W and the loss L could then be made zero by setting G_W to a constant. Most energy-based models require the use of an explicit contrastive term in the loss function.

2.2 Spring Model Analogy

An analogy to a particular mechanical spring system is given to provide an intuition of what is happening when the loss function is minimized. The outputs of G_W can be thought of as masses attracting and repelling each other with springs. Consider the equation of a spring

F = −K X    (5)

where F is the force, K is the spring constant and X is the displacement of the spring from its rest length. A spring is attract-only if its rest length is equal to zero. Thus any positive displacement X will result in an attractive force between its ends. A spring is said to be m-repulse-only if its rest length is equal to m. Thus two points that are connected with an m-repulse-only spring will be pushed apart if X is less than m. However this spring has a special property: if the spring is stretched by a length X > m, then no attractive force brings it back to rest length. Each point is connected to other points using these two kinds of springs. Seen in the light of the loss function, each point is connected by attract-only springs to similar points, and is connected by m-repulse-only springs to dissimilar points. See figure 2.

Consider the loss function L_S(W, X_1, X_2) associated with similar pairs.

L_S(W, X_1, X_2) = (1/2) (D_W)^2    (6)
By minimizing the global loss function L over all springs, one would ultimately drive the system to its equilibrium state.
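Equations (1)-(4) and the two kinds of springs can be summarized in a short sketch; the margin value below is an arbitrary placeholder and the function name is illustrative.

```python
import numpy as np

def contrastive_loss(y, g1, g2, m=1.0):
    """Per-pair contrastive loss of equations (3)-(4).

    y  : 0 if the pair is similar, 1 if it is dissimilar
    g1 : G_W(X1), the mapped output of the first sample
    g2 : G_W(X2), the mapped output of the second sample
    m  : margin; dissimilar pairs stop contributing once D_W >= m
    """
    d = np.linalg.norm(g1 - g2)                    # D_W, equation (1)
    l_similar = 0.5 * d ** 2                       # L_S: the attract-only spring
    l_dissimilar = 0.5 * max(0.0, m - d) ** 2      # L_D: the m-repulse-only spring
    return (1 - y) * l_similar + y * l_dissimilar

# The global loss L(W) of equation (2) is the sum of this quantity over all
# P labeled training pairs; minimizing it by gradient descent plays the role
# of letting the spring system relax toward its equilibrium.
```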
The experiments involving airplane images from the NORB dataset [9] use a 2-layer fully connected neural network as G_W. The number of hidden and output units used was 20 and 3 respectively. Experiments on the MNIST dataset used a convolutional network as G_W (figure 3). Convolutional networks are trainable, non-linear learning machines that operate at pixel level and learn low-level features and high-level representations in an integrated manner. They are trained end-to-end to map images to outputs. Because of a structure of shared weights and multiple layers, they can learn optimal shift-invariant local feature detectors while maintaining invariance to geometric distortions of the input image.
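A minimal sketch of such a 2-layer fully connected G_W is shown below. Only the unit counts (20 hidden, 3 output) come from the text; the input size, tanh non-linearity and weight initialization are assumptions made for illustration.

```python
import numpy as np

class TwoLayerGW:
    """2-layer fully connected mapping: input -> 20 hidden units -> 3 outputs."""

    def __init__(self, input_dim, hidden_dim=20, output_dim=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
        self.b2 = np.zeros(output_dim)

    def forward(self, x):
        # tanh hidden layer is an assumption; the paper does not specify it here.
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2

# Both members of a training pair are passed through the *same* network
# (shared weights W), and the contrastive loss is applied to the two outputs.
net = TwoLayerGW(input_dim=96 * 96)        # placeholder input size
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=96 * 96), rng.normal(size=96 * 96)
d = np.linalg.norm(net.forward(x1) - net.forward(x2))
```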
3.2 Learned Mapping of MNIST samples

The first experiment is designed to establish the basic functionality of the DrLIM approach. The neighborhood graph is generated with euclidean distances and no prior knowledge.

The training set is built from 3000 images of the handwritten digit 4 and 3000 images of the handwritten digit 9 chosen randomly from the MNIST dataset. Approximately 1000 images of each digit comprised the test set. These images were shuffled, paired, and labeled according to a simple euclidean distance measure: each sample X_i was paired with its 5 nearest neighbors, producing the set S_{X_i}. All other possible pairs were labeled dissimilar.

The mapping of the test set to a 2D manifold is shown in figure 4. The lighter-colored blue dots are 9's and the darker-colored red dots are 4's. Several input test samples are shown next to their manifold positions. The 4's and 9's are in two somewhat overlapping regions, with an overall organization that is primarily determined by the slant angle of the samples. The samples are spread rather uniformly in the populated region.

3.3 Learning a Shift-Invariant Mapping of MNIST samples

In this experiment, the DrLIM approach is evaluated using 2 categories of MNIST, distorted by adding samples that have been horizontally translated. The objective is to learn a 2D mapping that is invariant to horizontal translations.

In the distorted set, 3000 images of 4's and 3000 images of 9's are horizontally translated by -6, -3, 3, and 6 pixels and combined with the originals, producing a total of 30,000 samples. The 2000 samples in the test set were distorted in the same way.

First the system was trained using pairs from a euclidean distance neighborhood graph (5 nearest neighbors per sample), as in experiment 1. The large distances between translated samples create a disjoint neighborhood relationship graph, and the resulting mapping is disjoint as well. The output points are clustered according to the translated position of the input sample (figure 5). Within each cluster, however, the samples are well organized and evenly distributed.

For comparison, the LLE algorithm was used to map the distorted MNIST using the same euclidean distance neighborhood graph.
In order to make the mapping function invariant to trans-
lation, the euclidean nearest neighbors were supplemented
with pairs created using prior knowledge. Each sample was
paired with (a) its 5 nearest neighbors, (b) its 4 translations,
and (c) the 4 translations of each of its 5 nearest neighbors.
Additionally, each of the sample’s 4 translations was paired
with (d) all the above nearest neighbors and translated sam-
ples. All other possible pairs are labeled as dissimilar.
The mapping of the test set samples is shown in figure 7. The lighter-colored blue dots are 4's and the darker-colored red dots are 9's. As desired, there is no organization on the basis of translation; in fact, translated versions of a given character are all tightly packed in small regions on the manifold.
Figure 5. This experiment shows the effect of a simple distance-
based mapping on MNIST data with horizontal translations added
(-6, -3, +3, and +6 pixels). Since translated samples are far apart,
the manifold has 5 distinct clusters of samples corresponding to
the 5 translations. Note that the clusters are individually well-
organized, however. Results are on test samples, unseen during
training.
3.4 Mapping Learned with Temporal Neighborhoods and Lighting Invariance

The final experiment uses images of a single airplane instance from the NORB dataset [9], which span a range of camera elevations (sampled every 5 degrees) and azimuths, and 6 lighting conditions (4 lights in various on-off combinations). The objective is to learn a globally coherent mapping to a 3D manifold that is invariant to lighting conditions. A pattern based on temporal continuity of the camera was used to construct a neighborhood graph; images are similar if they were taken from contiguous elevation or azimuth regardless of lighting. Images may be neighbors even if they are very distant in terms of euclidean distance in pixel space, due to different lighting.

The dataset was split into 660 training images and 312 test images. The result of training on all 10989 similar pairs and 206481 dissimilar pairs is a 3-dimensional manifold in the shape of a cylinder (see figure 8). The circumference of the cylinder corresponds to change in azimuth in input space, while the height of the cylinder corresponds to elevation in input space. The mapping is completely invariant to lighting. This outcome is quite remarkable. Using only local neighborhood relationships, the learned manifold corresponds globally to the positions of the camera as it produced the dataset.

Viewing the weights of the network helps explain how the mapping learned illumination invariance (see figure 9). The concentric rings match edges on the airplanes to a particular azimuth and elevation, and the rest of the weights are close to 0. The dark edges and shadow of the wings, for example, are relatively consistent regardless of lighting.

For comparison, the same neighborhood relationships were used to create an embedding using LLE. Although arbitrary neighborhoods can be used in the LLE algorithm, the algorithm computes linear reconstruction weights to embed the samples, which severely limits the desired effect of using distant neighbors. The embedding produced by LLE is shown in figure 10. Clearly, the 3D embedding is not invariant to lighting, and the organization of azimuth and elevation does not reflect the real topology of the neighborhood graph.
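The temporal-continuity labeling rule for this experiment can be sketched as follows; the attribute names and the grid step sizes are assumptions, since the text only states that images taken from contiguous elevation or azimuth are similar regardless of lighting.

```python
def norb_pair_label(cam_a, cam_b, az_step=20, el_step=5):
    """Return Y = 0 (similar) for images taken from contiguous camera positions,
    ignoring the lighting condition entirely; return Y = 1 (dissimilar) otherwise.

    cam_a, cam_b : dicts with 'azimuth' and 'elevation' in degrees (illustrative).
    az_step, el_step : assumed spacing of the camera sampling grid.
    """
    d_az = abs(cam_a["azimuth"] - cam_b["azimuth"])
    d_az = min(d_az, 360 - d_az)                       # azimuth wraps around
    d_el = abs(cam_a["elevation"] - cam_b["elevation"])
    contiguous_azimuth = d_el == 0 and d_az <= az_step
    contiguous_elevation = d_az == 0 and d_el <= el_step
    return 0 if (contiguous_azimuth or contiguous_elevation) else 1

# Example: same viewpoint under two different lighting conditions -> similar (Y = 0).
print(norb_pair_label({"azimuth": 40, "elevation": 35}, {"azimuth": 40, "elevation": 35}))
```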
Figure 8. Test set results: the DrLIM approach learned a mapping to 3D space for images of a single airplane (extracted from the NORB dataset). The output manifold is shown under five different viewing angles. The manifold is roughly cylindrical, with a systematic organization: the azimuth of the camera in the viewing half-sphere varies along the circumference, and the camera elevation in the viewing sphere varies along the height. The mapping is invariant to the lighting condition, thanks to the prior knowledge built into the neighborhood relationships.
The invariances that can be learned are only limited by the power of the parameterized function G_W. The function maps inputs that evenly cover a manifold, as can be seen by the experimental results. It also faithfully maps new, unseen samples to meaningful locations on the manifold.

The strength of DrLIM lies in the contrastive loss function. By using a separate loss function for similar and dissimilar pairs, the system avoids collapse to a constant function and maintains an equilibrium in output space, much as a mechanical system of interconnected springs does.

The experiments with LLE show that LLE is most useful where the input samples are locally very similar and well-registered. If this is not the case, then LLE may give degenerate results. Although it is possible to run LLE with arbitrary neighborhood relationships, the linear reconstruction of the samples negates the effect of very distant neighbors. Other dimensionality reduction methods have avoided this limitation, but none produces a function that can accept new samples without recomputation or prior knowledge.

Creating a dimensionality reduction mapping using prior knowledge has other uses. Given the success of the NORB experiment, in which the positions of the camera were learned from prior knowledge of the temporal connections between images, it may be feasible to learn a robot's position and heading from image sequences.

References

[1] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[2] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 15(6):1373–1396, 2003.
[3] Y. Bengio, J. F. Paiement, and P. Vincent. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems, 16. MIT Press, Cambridge, MA, 2004.
[4] J. Bromley, I. Guyon, Y. LeCun, E. Sackinger, and R. Shah. Signature verification using a siamese time delay neural network. In J. Cowan and G. Tesauro, editors, Advances in Neural Information Processing Systems, 1993.
[5] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with applications to face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-05), 1:539–546, 2005.
[6] T. Cox and M. Cox. Multidimensional Scaling. Chapman and Hall, London, 1994.
[7] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[8] D. L. Donoho and C. E. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high dimensional data. Proceedings of the National Academy of Sciences, 100:5591–5596, 2003.
[9] Y. LeCun, F. J. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-04), 2:97–104, 2004.
[10] K. Q. Weinberger, F. Sha, and L. K. Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04), pages 839–846, 2004.
[11] B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[12] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pages 888–905, 2000.
[13] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[14] P. Vincent and Y. Bengio. A neural support vector network architecture with adaptive kernels. In Proceedings of the International Joint Conference on Neural Networks, 5, July 2000.
[15] A. Y. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14:849–856, 2002.