
Dimensionality Reduction Algorithms

Explain Dimensionality Reduction Algorithms.

Dimensionality Reduction Algorithms

Dimensionality reduction refers to the process of reducing the number of features (or dimensions) in a
dataset while retaining as much information as possible. It is used to overcome the curse of
dimensionality, simplify models, reduce computational cost, and make data more interpretable.

Types of Dimensionality Reduction

1. Feature Selection: Selects a subset of the original features based on their importance or
relevance.
Techniques: Filter methods (e.g., correlation), Wrapper methods (e.g., recursive feature
elimination), and Embedded methods (e.g., LASSO).
2. Feature Extraction: Transforms the data into a new set of features that capture the most critical
information.
Techniques: Principal Component Analysis (PCA), t-SNE, and Autoencoders.
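For concreteness, here is a minimal scikit-learn sketch showing one example from each feature-selection family (filter, wrapper, embedded). The synthetic dataset, the choice of k = 5 features, and the LASSO alpha are illustrative assumptions, not recommendations.

```python
# One example of each feature-selection family on synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression, Lasso

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Filter: keep the 5 features with the highest ANOVA F-score.
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper: recursive feature elimination around a logistic regression model.
X_wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit_transform(X, y)

# Embedded: LASSO zeroes out coefficients of unhelpful features.
X_embedded = SelectFromModel(Lasso(alpha=0.05)).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```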

Popular Dimensionality Reduction Algorithms

1. Principal Component Analysis (PCA)

Type: Linear, unsupervised.


Concept: Projects data onto a lower-dimensional subspace while maximizing variance. It does this
by finding new axes (principal components) that are linear combinations of the original features.
Use Case: When features are highly correlated.
Steps:
1. Standardize the data.
2. Compute the covariance matrix.
3. Perform eigen decomposition to find eigenvalues and eigenvectors.
4. Select top eigenvectors (principal components) corresponding to the largest eigenvalues.
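These steps are what a library implementation performs internally; a minimal sketch with scikit-learn, using the Iris dataset purely for illustration, might look like this:

```python
# A minimal PCA sketch with scikit-learn (dataset choice is illustrative).
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                       # 4 correlated features
X_std = StandardScaler().fit_transform(X)  # step 1: standardize

pca = PCA(n_components=2)                  # steps 2-4 happen inside fit()
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                         # (150, 2)
print(pca.explained_variance_ratio_)       # variance captured by each component
```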

2. t-Distributed Stochastic Neighbor Embedding (t-SNE)

Type: Non-linear, unsupervised.


Concept: Maps high-dimensional data to a lower-dimensional space (usually 2D or 3D) by
preserving local similarity. It minimizes the divergence between probability distributions of points
in the original and reduced spaces.
Use Case: Visualizing high-dimensional data (e.g., in clustering or classification tasks).

Limitations: Computationally expensive and may not preserve global structure.
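A minimal t-SNE sketch with scikit-learn, assuming the digits dataset and a perplexity of 30 purely for illustration:

```python
# Embed 64-dimensional digit images into 2D for visualization.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# perplexity balances local vs. global neighborhoods; tune it per dataset
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_2d.shape)                           # (1797, 2), ready for a scatter plot
```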

3. Linear Discriminant Analysis (LDA)

Type: Linear, supervised.


Concept: Projects data to a lower-dimensional space that maximizes class separability. Unlike PCA,
LDA considers class labels.
Use Case: Classification problems.
Steps:
1. Compute within-class and between-class scatter matrices.
2. Compute eigenvalues and eigenvectors for the scatter matrices.
3. Choose eigenvectors corresponding to the largest eigenvalues.
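A minimal supervised LDA sketch with scikit-learn, assuming the wine dataset purely for illustration:

```python
# Project 13 features onto at most (classes - 1) = 2 discriminants.
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)           # 13 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)             # supervised: labels are required

print(X_lda.shape)                           # (178, 2)
```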

4. Autoencoders

Type: Non-linear, unsupervised, deep learning-based.


Concept: Neural networks that learn a compressed representation of the input by reconstructing
it. The hidden layer acts as the reduced-dimensional representation.
Use Case: Feature extraction in complex datasets like images and time-series data.
Architecture:
Encoder: Compresses the input.
Bottleneck: Stores the reduced representation.
Decoder: Reconstructs the input from the compressed form.
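A minimal sketch of this encoder-bottleneck-decoder architecture in PyTorch; the layer sizes and the 784-dimensional input (e.g., flattened 28×28 images) are illustrative assumptions:

```python
# A tiny fully-connected autoencoder trained to reconstruct its input.
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(          # compresses the input
            nn.Linear(n_inputs, 128), nn.ReLU(),
            nn.Linear(128, n_bottleneck))      # bottleneck: reduced representation
        self.decoder = nn.Sequential(          # reconstructs the input
            nn.Linear(n_bottleneck, 128), nn.ReLU(),
            nn.Linear(128, n_inputs))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)                        # a dummy batch
loss = nn.MSELoss()(model(x), x)               # reconstruction error to minimize
loss.backward()
```

After training, `model.encoder(x)` yields the reduced-dimensional representation.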

5. Singular Value Decomposition (SVD)

Type: Linear, unsupervised.


Concept: Factorizes a matrix into three components: U, Σ, and Vᵀ, where Σ contains singular values that determine the importance of each dimension.
Use Case: Document-term matrices in text processing (e.g., Latent Semantic Analysis).
Mathematical Form: A = UΣVᵀ
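A minimal truncated-SVD sketch with NumPy; the random matrix and the choice of k = 5 retained singular values are illustrative:

```python
# Rank-k approximation of a matrix via SVD.
import numpy as np

A = np.random.rand(100, 30)                   # e.g., a document-term matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5                                         # keep the 5 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of A

print(np.linalg.norm(A - A_k))                # reconstruction error
```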

6. Kernel PCA

Type: Non-linear, unsupervised.


Concept: Extends PCA to handle non-linear data by applying the kernel trick, which implicitly maps the data into a higher-dimensional feature space where standard (linear) PCA can be applied.
Use Case: Datasets with complex, non-linear relationships.
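A minimal kernel PCA sketch with scikit-learn on concentric circles, a classic non-linear example; the RBF kernel and gamma value are assumptions to tune per dataset:

```python
# Linear PCA cannot separate concentric circles; an RBF kernel can.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
print(X_kpca.shape)                           # (400, 2)
```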

7. Uniform Manifold Approximation and Projection (UMAP)

Type: Non-linear, unsupervised.


Concept: Constructs a weighted nearest-neighbor graph of the data and optimizes a low-dimensional layout that preserves local neighborhoods while retaining more of the global structure than t-SNE typically does.
Use Case: Data visualization and clustering.
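A minimal UMAP sketch, assuming the third-party umap-learn package is installed and using the digits dataset purely for illustration:

```python
# 2D embedding of 64-dimensional digit images with UMAP.
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

# n_neighbors trades off local vs. global structure; min_dist controls crowding
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=0)
X_2d = reducer.fit_transform(X)

print(X_2d.shape)                             # (1797, 2)
```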

Benefits of Dimensionality Reduction

Improved Performance: Reduces noise and simplifies models.
Visualization: Enables plotting high-dimensional data.
Faster Computation: Reduces the complexity of training machine learning algorithms.
Avoids Overfitting: Reduces the risk of fitting noise by eliminating irrelevant features.

Challenges

Loss of information.
Choosing the right number of dimensions.
Computational expense for large datasets (non-linear techniques).

Would you like an example or a deeper dive into any specific algorithm?

Explain PCA.

Principal Component Analysis (PCA)

PCA is a popular linear dimensionality reduction technique used to transform a dataset with many
correlated features into a smaller set of uncorrelated features while retaining as much of the original
variance as possible. These new features are called principal components.

Key Ideas Behind PCA

1. Maximize Variance: The principal components are computed to capture the maximum variance in
the data.
2. Orthogonality: The principal components are mutually orthogonal (uncorrelated).
3. Feature Transformation: PCA transforms the original features into a new set of axes ranked by
their significance (variance explained).

How PCA Works

The process of PCA can be broken into the following steps:

1. Standardization of Data

Standardize the dataset to have a mean of 0 and a standard deviation of 1 for each feature.
This ensures that all features contribute equally, especially if they have different scales.

z_ij = (x_ij − μ_j) / σ_j

2. Compute the Covariance Matrix

Calculate the covariance matrix to measure the relationship between features:


Covariance Matrix = (1 / (n − 1)) · XᵀX

The covariance matrix is a square, symmetric matrix where each element represents the
covariance between two features.

3. Eigen Decomposition

Perform eigen decomposition on the covariance matrix to compute eigenvalues and eigenvectors:
Eigenvalues: Measure the variance explained by each principal component.
Eigenvectors: Represent the direction of the principal components.

4. Select Principal Components

Rank eigenvalues in descending order.
Select the top k eigenvectors (based on the largest eigenvalues) to form the transformation matrix.

5. Transform Data

Project the original data onto the new k-dimensional space:

Z = X · W

where W is the matrix of eigenvectors corresponding to the top k eigenvalues.
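A from-scratch sketch of these five steps in NumPy, using randomly generated data purely for illustration:

```python
# PCA step by step: standardize, covariance, eigen decomposition, select, project.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # 200 samples, 5 features

# 1. Standardize
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix
C = (Z.T @ Z) / (Z.shape[0] - 1)

# 3. Eigen decomposition (eigh: C is symmetric)
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Sort by decreasing eigenvalue and keep the top k eigenvectors
order = np.argsort(eigvals)[::-1]
k = 2
W = eigvecs[:, order[:k]]

# 5. Project the data
X_reduced = Z @ W

explained_ratio = eigvals[order[:k]].sum() / eigvals.sum()
print(X_reduced.shape, explained_ratio)
```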

Interpreting PCA Results

1. Explained Variance Ratio:
Indicates the proportion of the dataset's variance captured by each principal component.
Helps decide the number of components to retain.

Explained Variance Ratio for k components = (Sum of top k eigenvalues) / (Sum of all eigenvalues)

2. Principal Components:
Represent linear combinations of the original features and are uncorrelated.

Advantages of PCA

1. Dimensionality Reduction: Reduces complexity while retaining most of the variance.
2. De-correlation: Eliminates multicollinearity between features.
3. Visualization: Enables plotting high-dimensional data in 2D or 3D.

Limitations of PCA

1. Linear Assumption: Assumes linear relationships between features, which may not always hold.
2. Interpretability: Principal components are combinations of original features, which can make
interpretation difficult.
3. Data Scaling Dependency: Sensitive to the scale of features; standardization is necessary.

4. Variance Loss: Some variance is always lost, especially if too few components are retained.

Applications of PCA

1. Image Compression: Reduces the dimensionality of image data while preserving visual quality.
2. Preprocessing for Machine Learning: Simplifies datasets, especially for algorithms sensitive to
multicollinearity.
3. Data Visualization: Projects high-dimensional datasets into 2D or 3D for visual inspection.
4. Genomics: Reduces dimensions in large datasets like gene expression profiles.

Would you like a practical example, like using Python to perform PCA, or an explanation of how PCA
compares with other dimensionality reduction techniques?

Explain LDA

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction technique used in classification problems. Unlike PCA, which focuses on capturing the maximum variance in the data, LDA aims to maximize class separability by finding a lower-dimensional space where the distinction between different classes is preserved.

Key Objectives of LDA

1. Maximize Between-Class Variance: Ensures classes are as distinct as possible.
2. Minimize Within-Class Variance: Ensures data points within a class are tightly clustered.
3. Linear Projection: Projects the data into a new space defined by linear discriminants.

How LDA Works

Step 1: Compute the Mean Vectors

Calculate the mean vector for each class in the dataset:

μ_k = (1 / N_k) ∑_{i ∈ C_k} x_i

where μ_k is the mean of class k, N_k is the number of samples in class k, and x_i are the feature vectors.

Compute the overall mean of the dataset:

μ = (1 / N) ∑_{i=1}^{N} x_i
Step 2: Compute the Scatter Matrices

Within-Class Scatter Matrix S_W: Measures the spread of data points within each class:

S_W = ∑_{k=1}^{K} ∑_{i ∈ C_k} (x_i − μ_k)(x_i − μ_k)ᵀ

Between-Class Scatter Matrix S_B: Measures the separation between class means:

S_B = ∑_{k=1}^{K} N_k (μ_k − μ)(μ_k − μ)ᵀ

Step 3: Compute the Linear Discriminants

Solve the generalized eigenvalue problem:

S_W⁻¹ S_B v = λv

where v is the eigenvector and λ is the eigenvalue.

Select the eigenvectors corresponding to the largest eigenvalues to form the transformation
matrix.

Step 4: Project the Data

Project the data onto the new lower-dimensional space:

Z = X · W

where W contains the top eigenvectors (linear discriminants).
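A from-scratch sketch of these four steps in NumPy, using randomly generated two-class data purely for illustration:

```python
# LDA step by step: class means, scatter matrices, eigen problem, projection.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

d = X.shape[1]
mu = X.mean(axis=0)                               # overall mean
S_W = np.zeros((d, d))
S_B = np.zeros((d, d))

# Steps 1-2: class means and scatter matrices
for k in np.unique(y):
    X_k = X[y == k]
    mu_k = X_k.mean(axis=0)
    S_W += (X_k - mu_k).T @ (X_k - mu_k)
    diff = (mu_k - mu).reshape(-1, 1)
    S_B += len(X_k) * diff @ diff.T

# Step 3: eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:1]].real                    # K - 1 = 1 discriminant here

# Step 4: project
Z = X @ W
print(Z.shape)                                    # (100, 1)
```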

Key Differences Between LDA and PCA

Aspect      | PCA                               | LDA
Supervision | Unsupervised                      | Supervised
Objective   | Maximize variance                 | Maximize class separability
Components  | Eigenvectors of covariance matrix | Eigenvectors of S_W⁻¹ S_B
Output      | Uncorrelated components           | Components that enhance class separability

Advantages of LDA

1. Class Separability: Optimized for classification tasks by focusing on inter-class distinction.
2. Noise Reduction: Reduces irrelevant variations in data while preserving class-specific information.
3. Dimensionality Reduction: Projects data to min(K − 1, d) dimensions, where K is the number of classes and d is the original number of features.

Limitations of LDA

1. Linear Assumption: Assumes features and class distributions are linearly separable.
2. Normality Assumption: Assumes features within each class follow a Gaussian distribution.
3. Overfitting: May overfit when the number of features exceeds the number of samples.

Applications of LDA

1. Classification: Enhances performance in classification tasks, such as face recognition and text
categorization.
2. Preprocessing for Supervised Learning: Reduces dimensionality before applying models like SVM
or logistic regression.
3. Biomedical Data Analysis: Distinguishes between groups in gene expression or medical diagnosis
datasets.

Would you like a code example or a comparison with other algorithms?

