ISOMAP in ML
DSC 3601
OCTOBER 11
20DSC216
M. VISHWA
MACHINE LEARNING
Predictive Modelling
Predictive modelling is a probabilistic process that allows us to forecast
outcomes on the basis of a set of predictors. These predictors are the features
that come into play when deciding the final result, i.e. the outcome of the model.
Dimensionality Reduction
In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done. These factors are
basically variables called features. The higher the number of features, the harder
it gets to visualize the training set and then work on it. Sometimes, most of these
features are correlated, and hence redundant. This is where dimensionality
reduction algorithms come into play. Dimensionality reduction is the process of
reducing the number of random variables under consideration, by obtaining a set
of principal variables. It can be divided into feature selection and feature
extraction.
A dimensionality reduction technique can be defined as "a way of
converting a higher-dimensional dataset into a lower-dimensional dataset while
ensuring that it provides similar information." These techniques are widely
used in machine learning for obtaining a better-fitting predictive model while
solving classification and regression problems. They are commonly used in fields
that deal with high-dimensional data, such as speech recognition, signal
processing, bioinformatics, etc. They can also be used for data visualization,
noise reduction, cluster analysis, etc.
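As a minimal illustration of feature extraction, the short sketch below uses
PCA from scikit-learn (a linear dimensionality reduction method, shown here
only to make the definition concrete; the rest of this report focuses on the
non-linear Isomap). It compresses 64-pixel digit images into 10 principal
variables and reports how much of the original variance is retained.

# A minimal feature-extraction sketch using PCA (illustration only; the rest
# of this report uses the non-linear Isomap instead)
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1797 digit images, 64 pixel features each

pca = PCA(n_components=10)            # obtain 10 principal variables
X_reduced = pca.fit_transform(X)      # 64 dimensions -> 10 dimensions

print('Original shape:', X.shape)         # (1797, 64)
print('Reduced shape:', X_reduced.shape)  # (1797, 10)
# Fraction of the original variance the 10 components still carry
print('Variance retained:', pca.explained_variance_ratio_.sum())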
Importance of Dimensionality Reduction in Machine
Learning & Predictive Modelling
An intuitive example of dimensionality reduction can be discussed through
a simple e-mail classification problem, where we need to classify whether the e-
mail is spam or not. This can involve a large number of features, such as whether
or not the e-mail has a generic title, the content of the e-mail, whether the e-mail
uses a template, etc. However, some of these features may overlap. Similarly, a
classification problem that relies on both humidity and rainfall can often be
collapsed into a single underlying feature, since the two are highly correlated.
Hence, we can reduce the number of features in such problems. A 3-D
classification problem can be hard to visualize, whereas a 2-D one can be
mapped to a simple two-dimensional plane and a 1-D problem to a simple line.
A 3-D feature space can, for instance, be split into two 2-D feature spaces, and
if the remaining features turn out to be correlated, the number of features can
be reduced even further.
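To make the humidity and rainfall example concrete, the sketch below uses
synthetic numbers (generated purely for illustration) to show that two highly
correlated features can be collapsed into a single underlying feature with
almost no loss of information.

# Hypothetical illustration: two highly correlated weather features collapsed
# into one. The humidity/rainfall numbers are synthetic, made up for this sketch.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
humidity = rng.uniform(40, 90, size=200)            # synthetic humidity (%)
rainfall = 0.8 * humidity + rng.normal(0, 2, 200)   # strongly tied to humidity

X_weather = np.column_stack([humidity, rainfall])
print('Correlation:', np.corrcoef(humidity, rainfall)[0, 1])  # close to 1

# One principal variable captures almost all the information in both features
pca = PCA(n_components=1)
wetness = pca.fit_transform(X_weather)
print('Variance explained by 1 feature:', pca.explained_variance_ratio_[0])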
Isomap Embedding:
Isomap is a nonlinear dimensionality reduction method. It is one of several
widely used low-dimensional embedding methods. Isomap is used for computing
a quasi-isometric, low-dimensional embedding of a set of high-dimensional data
points. The algorithm provides a simple method for estimating the intrinsic
geometry of a data manifold based on a rough estimate of each data point’s
neighbors on the manifold. Isomap is highly efficient and generally applicable to
a broad range of data sources and dimensionalities. Isomap combines several
different algorithms, enabling it to reduce dimensions in a non-linear way
while preserving local structure. The algorithm proceeds in four steps:
1. Use a KNN approach to find the k nearest neighbors of every data point.
Here, "k" is an arbitrary number of neighbors that you can specify within
the model hyperparameters.
2. Once the neighbors are found, construct the neighborhood graph, where
points are connected to each other if they are each other's neighbors. Data
points that are not neighbors remain unconnected.
3. Compute the shortest path between each pair of data points (nodes).
Typically, either the Floyd-Warshall or Dijkstra's algorithm is used for this
task. Note that this step is also commonly described as finding the geodesic
distance between points.
4. Use multidimensional scaling (MDS) to compute the lower-dimensional
embedding. Given that the distances between each pair of points are known,
MDS places each object into the N-dimensional space (N is specified as a
hyperparameter) such that the between-point distances are preserved as well
as possible. A from-scratch sketch of these four steps is given below.
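To make these four steps concrete, the sketch below implements them from
scratch with NumPy and SciPy (the function name isomap_sketch and the spiral
data are illustrative only; in practice you would use sklearn.manifold.Isomap,
as in the next section).

# A from-scratch sketch of the four Isomap steps above (illustrative only;
# use sklearn.manifold.Isomap in practice)
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, n_neighbors=5, n_components=2):
    # Steps 1 & 2: k-nearest-neighbor graph with Euclidean edge weights
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')
    # Step 3: geodesic distances = shortest paths through the graph
    D = shortest_path(graph, method='auto', directed=False)
    # Step 4: classical MDS on the geodesic distances.
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Embed using the top eigenvectors scaled by sqrt(eigenvalue)
    eigvals, eigvecs = np.linalg.eigh(B)            # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]  # pick the largest
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Usage example: unroll points lying along a 3-D spiral into 2-D
t = np.linspace(0, 3 * np.pi, 100)
X_demo = np.column_stack([np.cos(t), np.sin(t), t])
print(isomap_sketch(X_demo).shape)  # (100, 2)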
# Visualization
import plotly.express as px # for data visualization
import matplotlib.pyplot as plt # for showing handwritten digits
# Sklearn
from sklearn.datasets import load_digits # for the scikit-learn digits data
from sklearn.manifold import Isomap # for Isomap reduction
digits = load_digits()
# Load arrays containing digit data (64 pixels per image) and their true labels
X, y = load_digits(return_X_y=True)
# Some stats
print('Shape of digit images: ', digits.images.shape)
print('Shape of X (training data): ', X.shape)
print('Shape of y (true labels): ', y.shape)
# Let's display the first 10 handwritten digits, so we have a better idea of
# what we are working with
fig, axs = plt.subplots(2, 5, sharey=False, tight_layout=True, figsize=(12, 6),
                        facecolor='white')
n = 0
plt.gray()
for i in range(0, 2):
    for j in range(0, 5):
        axs[i, j].matshow(digits.images[n])
        axs[i, j].set(title=y[n])
        n = n + 1
plt.show()
Isometric Mapping
We will now apply Isomap to reduce the number of dimensions for each record
in the X array from 64 to 3.
### Step 1 - Configure the Isomap function; note we use default hyperparameter values in this example
embed3 = Isomap(
    n_neighbors=5,               # default=5, the algorithm finds local structures based on this many nearest neighbors
    n_components=3,              # number of dimensions of the embedding
    eigen_solver='auto',         # {'auto', 'arpack', 'dense'}, default='auto'
    tol=0,                       # default=0, convergence tolerance passed to arpack or lobpcg; not used if eigen_solver == 'dense'
    max_iter=None,               # default=None, maximum number of iterations for the arpack solver; not used if eigen_solver == 'dense'
    path_method='auto',          # {'auto', 'FW', 'D'}, default='auto', method to use in finding shortest paths
    neighbors_algorithm='auto',  # {'auto', 'brute', 'kd_tree', 'ball_tree'}, default='auto'
    n_jobs=-1,                   # int or None, default=None, the number of parallel jobs to run; -1 means using all processors
    metric='minkowski',          # str or callable, default='minkowski'
    p=2,                         # default=2, parameter for the Minkowski metric; p=1 is equivalent to manhattan_distance (l1), p=2 to euclidean_distance (l2)
    metric_params=None           # default=None, additional keyword arguments for the metric function
)
### Step 2 - Fit the data and transform it, so we have 3 dimensions instead of 64
X_trans3 = embed3.fit_transform(X)
# Finally, let's plot a 3D scatter plot to see what the data looks like after
# reducing dimensions down to 3 (plotting call assumed: a plotly express
# 3-D scatter of the embedding, coloured by the true digit label)
fig = px.scatter_3d(x=X_trans3[:, 0], y=X_trans3[:, 1], z=X_trans3[:, 2],
                    color=y.astype(str), opacity=0.7)
fig.show()
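As a quick optional sanity check, we can confirm the shape of the embedding
and query the fitted model for its reconstruction error, using the standard
scikit-learn Isomap API.

# Sanity checks on the fitted model (standard scikit-learn Isomap API)
print('Embedded shape:', X_trans3.shape)  # expected: (1797, 3)
print('Reconstruction error:', embed3.reconstruction_error())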
Conclusions
Isomap is a powerful tool for dimensionality reduction because, unlike linear
techniques, it preserves the non-linear relationships between data points.