ISOMAP in ML
DSC 3601
OCTOBER 11
20DSC216
M. VISHWA
MACHINE LEARNING
Predictive Modelling
Predictive modelling is a probabilistic process that allows us to forecast
outcomes on the basis of a set of predictors. These predictors are the features
that come into play when deciding the final result, i.e. the outcome of the model.
Dimensionality Reduction
In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done. These factors are
basically variables called features. The higher the number of features, the harder
it gets to visualize the training set and then work on it. Sometimes, most of these
features are correlated, and hence redundant. This is where dimensionality
reduction algorithms come into play. Dimensionality reduction is the process of
reducing the number of random variables under consideration, by obtaining a set
of principal variables. It can be divided into feature selection and feature
extraction.
A dimensionality reduction technique can be defined as "a way of
converting a higher-dimensional dataset into a lower-dimensional dataset while
ensuring that it provides similar information." These techniques are widely
used in machine learning for obtaining a better-fitting predictive model while
solving classification and regression problems. They are commonly used in fields
that deal with high-dimensional data, such as speech recognition, signal
processing, bioinformatics, etc. They can also be used for data visualization,
noise reduction, cluster analysis, etc.
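As a minimal illustration of feature extraction, the short sketch below uses
PCA from scikit-learn (a linear dimensionality reduction method, shown here
only to make the definition concrete; the rest of this report focuses on the
non-linear Isomap). It compresses 64-pixel digit images into 10 principal
variables and reports how much of the original variance is retained.

# A minimal feature-extraction sketch using PCA (illustration only; the rest
# of this report uses the non-linear Isomap instead)
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1797 digit images, 64 pixel features each

pca = PCA(n_components=10)            # obtain 10 principal variables
X_reduced = pca.fit_transform(X)      # 64 dimensions -> 10 dimensions

print('Original shape:', X.shape)         # (1797, 64)
print('Reduced shape:', X_reduced.shape)  # (1797, 10)
# Fraction of the original variance the 10 components still carry
print('Variance retained:', pca.explained_variance_ratio_.sum())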
Importance of Dimensionality Reduction in Machine
Learning & Predictive Modelling
An intuitive example of dimensionality reduction can be discussed through
a simple e-mail classification problem, where we need to classify whether the e-
mail is spam or not. This can involve a large number of features, such as whether
or not the e-mail has a generic title, the content of the e-mail, whether the e-mail
uses a template, etc. However, some of these features may overlap. Similarly, a
classification problem that relies on both humidity and rainfall can often be
collapsed into a single underlying feature, since the two are highly correlated.
Hence, we can reduce the number of features in such problems. A 3-D
classification problem can be hard to visualize, whereas a 2-D one can be
mapped to a simple two-dimensional plane and a 1-D problem to a simple line.
A 3-D feature space can, for instance, be split into two 2-D feature spaces, and
if the remaining features turn out to be correlated, the number of features can
be reduced even further.
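To make the humidity and rainfall example concrete, the sketch below uses
synthetic numbers (generated purely for illustration) to show that two highly
correlated features can be collapsed into a single underlying feature with
almost no loss of information.

# Hypothetical illustration: two highly correlated weather features collapsed
# into one. The humidity/rainfall numbers are synthetic, made up for this sketch.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
humidity = rng.uniform(40, 90, size=200)            # synthetic humidity (%)
rainfall = 0.8 * humidity + rng.normal(0, 2, 200)   # strongly tied to humidity

X_weather = np.column_stack([humidity, rainfall])
print('Correlation:', np.corrcoef(humidity, rainfall)[0, 1])  # close to 1

# One principal variable captures almost all the information in both features
pca = PCA(n_components=1)
wetness = pca.fit_transform(X_weather)
print('Variance explained by 1 feature:', pca.explained_variance_ratio_[0])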
Isomap Embedding:
Isomap is a nonlinear dimensionality reduction method. It is one of several
widely used low-dimensional embedding methods. Isomap is used for computing
a quasi-isometric, low-dimensional embedding of a set of high-dimensional data
points. The algorithm provides a simple method for estimating the intrinsic
geometry of a data manifold based on a rough estimate of each data point’s
neighbors on the manifold. Isomap is highly efficient and generally applicable to
a broad range of data sources and dimensionalities. Isomap combines several
different algorithms, enabling it to reduce dimensions in a non-linear way
while preserving local structure. The algorithm proceeds in four steps:
1. Use a KNN approach to find the k nearest neighbors of every data point.
Here, "k" is an arbitrary number of neighbors that you can specify within
the model hyperparameters.
2. Once the neighbors are found, construct the neighborhood graph, where
points are connected to each other if they are each other's neighbors. Data
points that are not neighbors remain unconnected.
3. Compute the shortest path between each pair of data points (nodes).
Typically, either the Floyd-Warshall or Dijkstra's algorithm is used for this
task. Note that this step is also commonly described as finding the geodesic
distance between points.
4. Use multidimensional scaling (MDS) to compute the lower-dimensional
embedding. Given that the distances between each pair of points are known,
MDS places each object into the N-dimensional space (N is specified as a
hyperparameter) such that the between-point distances are preserved as well
as possible. A from-scratch sketch of these four steps is given below.
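To make these four steps concrete, the sketch below implements them from
scratch with NumPy and SciPy (the function name isomap_sketch and the spiral
data are illustrative only; in practice you would use sklearn.manifold.Isomap,
as in the next section).

# A from-scratch sketch of the four Isomap steps above (illustrative only;
# use sklearn.manifold.Isomap in practice)
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, n_neighbors=5, n_components=2):
    # Steps 1 & 2: k-nearest-neighbor graph with Euclidean edge weights
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')
    # Step 3: geodesic distances = shortest paths through the graph
    D = shortest_path(graph, method='auto', directed=False)
    # Step 4: classical MDS on the geodesic distances.
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Embed using the top eigenvectors scaled by sqrt(eigenvalue)
    eigvals, eigvecs = np.linalg.eigh(B)            # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]  # pick the largest
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Usage example: unroll points lying along a 3-D spiral into 2-D
t = np.linspace(0, 3 * np.pi, 100)
X_demo = np.column_stack([np.cos(t), np.sin(t), t])
print(isomap_sketch(X_demo).shape)  # (100, 2)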
# Visualization
import plotly.express as px # for data visualization
import matplotlib.pyplot as plt # for showing handwritten digits
# Sklearn
from sklearn.datasets import load_digits # for the scikit-learn digits data
from sklearn.manifold import Isomap # for Isomap reduction
digits = load_digits()
# Load arrays containing digit data (64 pixels per image) and their true labels
X, y = load_digits(return_X_y=True)
# Some stats
print('Shape of digit images: ', digits.images.shape)
print('Shape of X (training data): ', X.shape)
print('Shape of y (true labels): ', y.shape)
# Let's display the first 10 handwritten digits, so we have a better idea of
# what we are working with
fig, axs = plt.subplots(2, 5, sharey=False, tight_layout=True, figsize=(12, 6),
                        facecolor='white')
n = 0
plt.gray()
for i in range(0, 2):
    for j in range(0, 5):
        axs[i, j].matshow(digits.images[n])
        axs[i, j].set(title=y[n])
        n = n + 1
plt.show()
Isometric Mapping
We will now apply Isomap to reduce the number of dimensions for each record
in the X array from 64 to 3.
### Step 1 - Configure the Isomap function; note we use default hyperparameter values in this example
embed3 = Isomap(
    n_neighbors=5,               # default=5, the algorithm finds local structures based on this many nearest neighbors
    n_components=3,              # number of dimensions of the embedding
    eigen_solver='auto',         # {'auto', 'arpack', 'dense'}, default='auto'
    tol=0,                       # default=0, convergence tolerance passed to arpack or lobpcg; not used if eigen_solver == 'dense'
    max_iter=None,               # default=None, maximum number of iterations for the arpack solver; not used if eigen_solver == 'dense'
    path_method='auto',          # {'auto', 'FW', 'D'}, default='auto', method to use in finding shortest paths
    neighbors_algorithm='auto',  # {'auto', 'brute', 'kd_tree', 'ball_tree'}, default='auto'
    n_jobs=-1,                   # int or None, default=None, the number of parallel jobs to run; -1 means using all processors
    metric='minkowski',          # str or callable, default='minkowski'
    p=2,                         # default=2, parameter for the Minkowski metric; p=1 is equivalent to manhattan_distance (l1), p=2 to euclidean_distance (l2)
    metric_params=None           # default=None, additional keyword arguments for the metric function
)
### Step 2 - Fit the data and transform it, so we have 3 dimensions instead of 64
X_trans3 = embed3.fit_transform(X)
# Finally, let's plot a 3D scatter plot to see what the data looks like after
# reducing dimensions down to 3 (plotting call assumed: a plotly express
# 3-D scatter of the embedding, coloured by the true digit label)
fig = px.scatter_3d(x=X_trans3[:, 0], y=X_trans3[:, 1], z=X_trans3[:, 2],
                    color=y.astype(str), opacity=0.7)
fig.show()
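As a quick optional sanity check, we can confirm the shape of the embedding
and query the fitted model for its reconstruction error, using the standard
scikit-learn Isomap API.

# Sanity checks on the fitted model (standard scikit-learn Isomap API)
print('Embedded shape:', X_trans3.shape)  # expected: (1797, 3)
print('Reconstruction error:', embed3.reconstruction_error())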
Conclusions
Isomap is a powerful tool for dimensionality reduction because, unlike linear
techniques, it preserves the non-linear relationships between data points.