Unit_5(Dimensionality_Reduction)


• The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction.
• In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated; in such cases, dimensionality reduction techniques are required.
Wrapper Method
Embedded Method:

• Feature selection using the embedded method combines the advantages of both filter and wrapper methods.
• In this approach, feature selection occurs as part of the model training process.
• The model itself selects the most relevant features during training, based on their contribution to model performance.
• This method is particularly efficient and effective when working with regularization techniques.
Feature Selection Methods: Useful Tricks & Tips

Here are some useful tricks and tips for feature selection:
• Understand Your Data: Before selecting features, thoroughly understand your
dataset. Know the domain and the relationships between different features.
• Filter Methods: Use statistical measures like correlation, chi-square, or mutual
information to rank features based on their relevance to the target variable.
• Wrapper Methods: Employ algorithms like Recursive Feature Elimination (RFE) or
Forward/Backward Selection, which select subsets of features based on the
performance of a specific machine learning algorithm.
• Embedded Methods: Some machine learning algorithms inherently perform feature selection during training. Examples include LASSO (L1 regularization) and tree-based methods like Random Forests (a combined code sketch of the filter, wrapper, and embedded approaches follows this list).
*LASSO (Least Absolute Shrinkage and Selection Operator) regression is a type of linear regression that
incorporates regularization to prevent overfitting and enhance model interpretability.
• Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) or
t-distributed Stochastic Neighbor Embedding (t-SNE) can reduce the dimensionality of
your data while retaining most of the information.
• Feature Importance: For tree-based algorithms like Random Forest or Gradient
Boosting Machines (GBM), you can use the built-in feature importance attribute to
select the most important features.
• Domain Knowledge: Leverage domain expertise to identify features that are likely to
be important. Sometimes, features that seem irrelevant on the surface might be crucial
when considering domain-specific insights.
• Regularization: Regularization techniques like LASSO (L1 regularization) penalize
the absolute size of the coefficients, effectively performing feature selection by driving
some coefficients to zero.
• Cross-Validation: Perform feature selection within each fold of cross-validation to
ensure that your feature selection process is not biased by the specific dataset splits.
• Ensemble Methods: Combine the results of multiple feature selection methods to get a
more robust set of selected features.
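
To make the filter, wrapper, and embedded approaches concrete, here is a minimal scikit-learn sketch; the dataset, models, and hyperparameters are illustrative assumptions rather than recommendations from these notes:

# Illustrative sketch: filter, wrapper, and embedded feature selection with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import numpy as np

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)          # scale features before RFE / L1 models

# Filter: rank features by mutual information with the target
filter_sel = SelectKBest(mutual_info_classif, k=10).fit(X, y)

# Wrapper: Recursive Feature Elimination around a logistic regression model
wrapper_sel = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded: L1 regularization drives some coefficients exactly to zero
embedded = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
embedded_mask = embedded.coef_.ravel() != 0

print("filter keeps:  ", np.flatnonzero(filter_sel.get_support()))
print("wrapper keeps: ", np.flatnonzero(wrapper_sel.get_support()))
print("embedded keeps:", np.flatnonzero(embedded_mask))

The three selected subsets usually overlap but rarely coincide, which is why combining methods (as suggested under Ensemble Methods above) tends to give a more robust result.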
Principal Component Analysis (PCA)

• This feature extraction method reduces the dimensionality of large data sets while preserving the maximum amount of information.
• Principal Component Analysis emphasizes variation and captures important patterns and relationships between variables in the dataset.
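
As a quick illustration of this idea, here is a minimal scikit-learn sketch; the dataset and the 95% variance threshold are assumptions, not part of these notes:

# Illustrative sketch: PCA with scikit-learn, keeping enough components
# to explain roughly 95% of the variance.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)     # PCA is sensitive to feature scale

pca = PCA(n_components=0.95)              # keep components covering ~95% of variance
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_)      # variance captured by each component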
How Does Principal Component Analysis (PCA)
Work?
• In general, not all features are equally important; certain features account for a large percentage of the variance in the dataset.
• The motivation behind the PCA algorithm is that there are certain features
that capture a large percentage of variance in the original dataset.
• So it's important to find the directions of maximum variance in the
dataset.
• These directions are called principal components.
• And PCA is essentially a projection of the dataset onto the principal
components.
• So how do we find the principal components?
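
The following sections answer this through the covariance matrix and its eigenvectors; as a preview, here is a minimal from-scratch NumPy sketch of those steps (the random data is an assumption):

# Illustrative sketch: principal components from the covariance matrix with NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 samples, 5 features (assumed data)

X_centered = X - X.mean(axis=0)            # center each feature
cov = np.cov(X_centered, rowvar=False)     # 5 x 5 covariance matrix

# Eigenvectors of the covariance matrix are the principal components
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by variance, descending
components = eigvecs[:, order[:2]]         # keep the top-2 directions

X_projected = X_centered @ components      # project the data onto the components
print(X_projected.shape)                   # (100, 2)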
What is Covariance Matrix?

• The variance-covariance matrix is a square matrix with diagonal elements that represent
the variance and the non-diagonal components that express covariance.
• The covariance between two variables can take any real value: positive, negative, or zero.
• A positive covariance suggests that the two variables tend to increase or decrease together, whereas a negative covariance indicates that one tends to increase as the other decreases.
• If two variables do not vary together, their covariance is zero.
Example: Find the covariance matrix
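
The data for this example is not reproduced in these notes. As an illustration only, a covariance matrix for some assumed observations can be computed with NumPy:

# Illustrative sketch: building a variance-covariance matrix with NumPy.
# The observations below are made up for demonstration.
import numpy as np

X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

# rowvar=False: each column is a variable, each row an observation
cov = np.cov(X, rowvar=False)
print(cov)            # diagonal = variances, off-diagonal = covariances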
Example 2: Find the eigenvalues and eigenvectors of the 3 x 3 matrix

    A = | 2   1   3  |
        | 1   2   3  |
        | 3   3   20 |

Sol:
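
The worked solution is not shown here, but the result can be checked numerically with a minimal NumPy sketch:

# Numerical check of Example 2: eigenvalues/eigenvectors of A with NumPy.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, 2.0, 3.0],
              [3.0, 3.0, 20.0]])

# A is symmetric, so eigh returns real eigenvalues in ascending order
vals, vecs = np.linalg.eigh(A)
print(vals)           # approximately [1., 2., 21.]
print(vecs)           # columns are the corresponding eigenvectors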
Singular Value Decomposition (SVD)

• Singular Value Decomposition is a matrix factorization technique widely used in various applications, including linear algebra, signal processing, and machine learning.
• It decomposes a matrix into three other matrices, allowing for the representation of the original matrix in a reduced form.
Decomposition of Matrix:

Given a matrix M of size m x n (or a data frame with m rows and n columns), SVD decomposes it into three matrices:

M = U * Σ * Vᵗ

where, in the reduced (rank-r) form, U is an m x r matrix with orthonormal columns, Σ is an r x r diagonal matrix, and V is an n x r matrix with orthonormal columns (so Vᵗ is r x n).
Here r is the rank of the matrix M.
• The diagonal elements of Σ are the singular values of the original matrix M,
and they are arranged in descending order.
• The columns of U are the left singular vectors of M. These vectors form an
orthogonal basis for the column space of M.
• The columns of V are the right singular vectors of M. These vectors form an orthogonal basis for the row space of M.
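
A minimal NumPy sketch of this decomposition and a low-rank reconstruction (the matrix values are assumptions):

# Illustrative sketch: SVD of a small matrix with NumPy and a rank-k reconstruction.
import numpy as np

M = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])                    # a 2 x 3 example matrix (assumed)

U, s, Vt = np.linalg.svd(M, full_matrices=False)    # reduced (thin) SVD
Sigma = np.diag(s)                                  # singular values, descending order

print(np.allclose(M, U @ Sigma @ Vt))               # True: M = U * Sigma * V^T

# Keep only the largest singular value for a rank-1 approximation of M
k = 1
M_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(M_approx)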
In summary,
• PCA is suitable for unsupervised dimensionality reduction,
• LDA is effective for supervised problems with a focus on class separability, and
• SVD is versatile, catering to various applications including collaborative filtering and matrix
factorization.
The choice depends on the nature of your data and the goals of your analysis.
