
Neighbourhood components analysis

Neighbourhood components analysis is a supervised learning method for classifying multivariate data
into distinct classes according to a given distance metric over the data. Functionally, it serves the same
purposes as the K-nearest neighbors algorithm, and makes direct use of a related concept termed stochastic
nearest neighbours.

Definition
Neighbourhood components analysis aims at "learning" a distance metric by finding a linear transformation of input data such that the average leave-one-out (LOO) classification performance is maximized in the transformed space. The key insight to the algorithm is that a matrix $A$ corresponding to the transformation can be found by defining a differentiable objective function for $A$, followed by the use of an iterative solver such as conjugate gradient descent. One of the benefits of this algorithm is that the number of classes $k$ can be determined as a function of $A$, up to a scalar constant. This use of the algorithm therefore addresses the issue of model selection.

Explanation
In order to define $A$, we define an objective function describing classification accuracy in the transformed space and try to determine $A^*$ such that this objective function is maximized:

$$A^* = \operatorname*{arg\,max}_A f(A)$$

Leave-one-out (LOO) classification

Consider predicting the class label of a single data point by consensus of its $k$-nearest neighbours with a given distance metric. This is known as leave-one-out classification. However, the set of nearest neighbours can be quite different after passing all the points through a linear transformation. Specifically, the set of neighbours for a point can undergo discrete changes in response to smooth changes in the elements of $A$, implying that any objective function $f(\cdot)$ based on the neighbours of a point will be piecewise-constant, and hence not differentiable.

Solution

We can resolve this difficulty by using an approach inspired by stochastic gradient descent. Rather than considering the $k$-nearest neighbours at each transformed point in LOO-classification, we'll consider the entire transformed data set as stochastic nearest neighbours. We define these using a softmax function of the squared Euclidean distance between a given LOO-classification point and each other point in the transformed space:

$$p_{ij} = \begin{cases} \dfrac{e^{-\|Ax_i - Ax_j\|^2}}{\sum_{k \neq i} e^{-\|Ax_i - Ax_k\|^2}}, & j \neq i \\ 0, & j = i \end{cases}$$

The probability of correctly classifying data point $i$ is the probability of classifying the points of each of its neighbours $C_i = \{ j \mid c_j = c_i \}$, the set of points sharing the same class $c_i$:

$$p_i = \sum_{j \in C_i} p_{ij}$$

where $p_{ij}$ is the probability of classifying neighbour $j$ of point $i$.
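Concretely, these quantities can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' reference code; the names (neighbour_probabilities, p_correct, the transformation A, data X, labels y) are assumptions of this sketch:

import numpy as np

def neighbour_probabilities(A, X):
    # Transform all points: z_i = A x_i  (A: (d_out, d), X: (n, d)).
    Z = X @ A.T
    # Squared Euclidean distances between every pair of transformed points.
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    # A point is never its own stochastic neighbour: force p_ii = 0.
    np.fill_diagonal(d2, np.inf)
    # Softmax over all other points (a robust version would subtract the
    # row-wise minimum distance before exponentiating to avoid underflow).
    e = np.exp(-d2)
    return e / np.sum(e, axis=1, keepdims=True)

def p_correct(A, X, y):
    # p_i: total probability mass on neighbours sharing point i's class,
    # i.e. the sum of p_ij over j in C_i.
    p = neighbour_probabilities(A, X)
    same_class = y[:, None] == y[None, :]
    return np.sum(p * same_class, axis=1)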

Define the objective function using LOO classification, this time using the entire data set as stochastic nearest neighbours:

$$f(A) = \sum_i \sum_{j \in C_i} p_{ij} = \sum_i p_i$$

Note that under stochastic nearest neighbours, the consensus class for a single point $i$ is the expected value of a point's class in the limit of an infinite number of samples drawn from the distribution over its neighbours $j \in C_i$, i.e.: $P(\mathrm{Class}(X_i) = \mathrm{Class}(X_j)) = p_{ij}$. Thus the predicted class is an affine combination of the classes of every other point, weighted by the softmax function for each $j$, where $C_i$ is now the entire transformed data set.

This choice of objective function is preferable as it is differentiable with respect to $A$ (denote $x_{ij} = x_i - x_j$):

$$\frac{\partial f}{\partial A} = -2A \sum_i \sum_{j \in C_i} p_{ij} \left( x_{ij} x_{ij}^\top - \sum_k p_{ik} x_{ik} x_{ik}^\top \right) = 2A \sum_i \left( p_i \sum_k p_{ik} x_{ik} x_{ik}^\top - \sum_{j \in C_i} p_{ij} x_{ij} x_{ij}^\top \right)$$
Obtaining a gradient for $f$ means that it can be found with an iterative solver such as conjugate gradient descent. Note that in practice, most of the innermost terms of the gradient contribute insignificantly, because the influence of points distant from the point of interest diminishes rapidly. This means that the inner sum of the gradient can be truncated, resulting in reasonable computation times even for large data sets.

Alternative formulation

"Maximizing is equivalent to minimizing the -distance between the predicted class distribution and
the true class distribution (ie: where the induced by are all equal to 1). A natural alternative is the KL-
divergence, which induces the following objective function and gradient:" (Goldberger 2005)
In practice, optimization of using this function tends to give similar performance results as with the
original.
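Under the same assumptions as the earlier sketches (reusing p_correct and NumPy as np), this variant only swaps the objective; the guard against log(0) is an implementation detail of the sketch, not of the paper:

def nca_log_objective(a_flat, X, y, d_out):
    # g(A) = sum_i log p_i, negated for a minimizer; without an analytic
    # gradient, a solver would fall back to numerical differentiation.
    A = a_flat.reshape(d_out, X.shape[1])
    p_i = p_correct(A, X, y)
    return -np.sum(np.log(p_i + 1e-12))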

History and background


Neighbourhood components analysis was developed by Jacob Goldberger, Sam Roweis, Ruslan Salakhutdinov, and Geoff Hinton at the University of Toronto's department of computer science in 2004.

See also
Spectral clustering
Large margin nearest neighbor

References
J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov (2005). "Neighbourhood Components Analysis" (http://www.csri.utoronto.ca/~roweis/papers/ncanips.pdf). Advances in Neural Information Processing Systems 17, 513–520.

External links

Software
The MLPACK library contains a C++ implementation
nca (https://github.com/vomjom/nca) (C++)
scikit-learn's "NeighborhoodComponentsAnalysis" implementation (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NeighborhoodComponentsAnalysis.html) (Python)
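For practical use, the scikit-learn estimator can be placed ahead of a k-nearest-neighbours classifier in a pipeline. A minimal usage sketch (the data set and hyperparameters here are arbitrary choices, not recommendations):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(n_components=2, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
pipe.fit(X_train, y_train)          # learns the transformation, then fits k-NN in the new space
print(pipe.score(X_test, y_test))   # k-NN accuracy in the learned space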
