
Dimensionality Reduction

- Principal Component Analysis

Jayaraj P B
Outline
1. Dimension/Features in ML
2. The Curse of Dimensionality
3. Dimensionality Reduction
4. Methods for DR
5. PCA – Overview
6. Steps of PCA
7. Problem Solving
Features
• In machine learning, a feature, also known as a predictor, attribute, or input variable, refers to an individual measurable property or characteristic of the data that is used as input to a machine learning model.
• Features represent the various dimensions or aspects of the data that the model will consider when making predictions or classifications.
• Feature engineering is the process of selecting, transforming, and creating new features from the raw data to improve the performance of the machine learning model.
• In summary, features in machine learning represent the input variables or attributes that the model learns from to make predictions or classifications.
Dimension
• The term "dimension" typically refers to the number of features or variables used to represent each data point in a dataset. It essentially represents the number of columns or attributes in the dataset.
• The number of input features, variables, or columns present in a given dataset is known as its dimensionality.
• In many machine learning applications, datasets can have a large number of features, resulting in high-dimensional data.
• High-dimensional data can present challenges such as increased computational complexity, the curse of dimensionality, and difficulties in visualization and interpretation.
The Curse of Dimensionality
• Handling high-dimensional data is very difficult in practice; this difficulty is commonly known as the curse of dimensionality.
• As the dimensionality of the input dataset increases, any machine learning algorithm and model becomes more complex.
• As the number of features increases, the number of samples needed to cover the feature space grows rapidly, so with a fixed amount of data the chance of overfitting increases.
• A machine learning model trained on high-dimensional data is therefore prone to overfitting and poor performance on new data.
Dimensionality reduction
• Dimensionality reduction is the process of reducing the number of features (or dimensions) in a dataset while retaining as much information as possible.
• In other words, it is the process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data.
• This can be done for a variety of reasons, such as
  - to reduce the complexity of a model,
  - to improve the performance of a learning algorithm, or
  - to make it easier to visualize the data.
• In addition, high-dimensional data can also lead to overfitting, where the model fits the training data too closely and does not generalize well to new data.
• Dimensionality reduction can help to mitigate these problems by reducing the complexity of the model and improving its generalization performance.
• There are two main approaches to dimensionality reduction:
  - feature selection and
  - feature extraction.
Feature Selection
• Feature selection involves selecting a subset of the original features that are most relevant to the problem at hand.
• The goal is to reduce the dimensionality of the dataset while retaining the most important features.
• There are several methods for feature selection (a code sketch of forward/backward selection follows this list):
  - Forward Selection
  - Backward Selection
  - Bi-directional Elimination
  - Filters
  - Wrappers
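For illustration only (not part of the original slides), a hedged sketch of forward and backward selection using scikit-learn's SequentialFeatureSelector; the dataset, estimator, and number of features to keep are arbitrary choices:

# Hypothetical sketch of forward/backward feature selection with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Forward selection: start from no features and greedily add the best one.
forward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000), n_features_to_select=2, direction="forward"
)
forward.fit(X, y)
print("Forward selection kept features:", forward.get_support())

# Backward selection: start from all features and greedily remove the worst one.
backward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000), n_features_to_select=2, direction="backward"
)
backward.fit(X, y)
print("Backward selection kept features:", backward.get_support())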
Feature Extraction
• Feature extraction involves creating new features by combining or transforming the original features.
• The goal is to create a set of features that captures the essence of the original data in a lower-dimensional space.
• There are several methods for feature extraction, including principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE).
• PCA is a popular technique that projects the original features onto a lower-dimensional space while preserving as much of the variance as possible.
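As a quick illustration (a sketch added here, not from the slides; the dataset and number of components are arbitrary), PCA as a feature-extraction step in scikit-learn:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)          # 150 samples, 4 features
X_std = StandardScaler().fit_transform(X)  # standardize first

pca = PCA(n_components=2)                  # keep 2 principal components
X_2d = pca.fit_transform(X_std)            # 150 x 2 projected data
print("Explained variance ratio:", pca.explained_variance_ratio_)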
Data Compression
[Figure: reducing data from 2D to 1D. The two axes measure length in inches and in centimetres; the nearly redundant pair of features can be replaced by a single coordinate along the fitted direction. Slide adapted from Andrew Ng.]
Principal Component Analysis
• This method was introduced by Karl Pearson.

• It works on the principle that when data in a higher-dimensional space is mapped to a lower-dimensional space, the mapping should be chosen so that the variance of the data in the lower-dimensional space is maximized.
Principal Component Analysis: one attribute first

Consider a single attribute, Temperature, with sample values:
42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30

Question: how much spread is in the data along the axis (distance to the mean)?

Variance = (standard deviation)^2:

s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
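A quick numerical check (a sketch added for illustration, not in the original slides), using NumPy's ddof=1 to match the (n - 1) denominator above:

import numpy as np

temperature = np.array([42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30])

mean = temperature.mean()
variance = temperature.var(ddof=1)   # sample variance: divide by (n - 1)
std_dev = temperature.std(ddof=1)    # sample standard deviation

print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")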
Covariance
Measure of the "spread" of a set of points around their center of mass (mean).

Variance:
Measure of the deviation from the mean for points in one dimension.

Covariance:
Measure of how much each of the dimensions varies from the mean with respect to the others.

• Covariance is measured between two dimensions
• Covariance shows whether there is a relation between two dimensions
• The covariance of a dimension with itself is simply its variance
Covariance
Used to find relationships between dimensions in high-dimensional data sets.

The sample mean: \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

Now consider two dimensions, X = Temperature and Y = Humidity:

  X (Temperature)   Y (Humidity)
        40               90
        40               90
        40               90
        30               90
        15               70
        15               70
        15               70
        30               90
        15               70
        30               70
        30               70
        30               90
        40               70
        30               90

Covariance measures the correlation between X and Y:

cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
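The same calculation in NumPy (an illustrative sketch, not in the slides, using the (Temperature, Humidity) pairs from the table above; np.cov uses the (n - 1) denominator by default):

import numpy as np

temperature = np.array([40, 40, 40, 30, 15, 15, 15, 30, 15, 30, 30, 30, 40, 30])
humidity    = np.array([90, 90, 90, 90, 70, 70, 70, 90, 70, 70, 70, 90, 70, 90])

# Direct implementation of the formula
n = len(temperature)
cov_xy = np.sum((temperature - temperature.mean()) * (humidity - humidity.mean())) / (n - 1)

# Same value from np.cov (off-diagonal entry of the 2 x 2 covariance matrix)
print(cov_xy, np.cov(temperature, humidity)[0, 1])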
More than two attributes: covariance matrix

Contains covariance values between all possible pairs of dimensions (= attributes):

C_{n \times n} = (c_{ij}), \quad c_{ij} = \mathrm{cov}(Dim_i, Dim_j)

Example for three attributes (x, y, z):

C = \begin{pmatrix}
      \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\
      \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\
      \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z)
    \end{pmatrix}
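An illustrative sketch (variable names and data are assumed, not from the slides): for a data matrix with one column per attribute, NumPy builds this matrix in a single call:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))     # 100 samples, 3 attributes x, y, z

# rowvar=False tells np.cov that columns (not rows) are the variables,
# so C[i, j] = cov(Dim_i, Dim_j); C is symmetric with variances on the diagonal.
C = np.cov(data, rowvar=False)
print(C.shape)   # (3, 3)
print(C)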
Eigenvalues & eigenvectors

Vectors x having the same direction as Ax are called eigenvectors of A (where A is an n-by-n matrix).
In the equation Ax = \lambda x, \lambda is called an eigenvalue of A.

Example:
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
= \begin{pmatrix} 12 \\ 8 \end{pmatrix}
= 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}
Eigenvalues & eigenvectors

Ax=x  (A-I)x=0

How to calculate x and :


• Calculate det(A-I), yields a polynomial (degree n)
• Determine roots to det(A-I)=0, roots are
eigenvalues 
• Solve (A- I) x=0 for each  to obtain eigenvectors x

19
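A small sketch (added for illustration, not in the slides) of this procedure applied to the 2 x 2 matrix from the previous slide; np.poly gives the characteristic-polynomial coefficients and np.linalg.eig returns the eigenpairs directly:

import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Coefficients of the characteristic polynomial det(A - lambda*I).
coeffs = np.poly(A)                   # [1, -3, -4]  ->  lambda^2 - 3*lambda - 4
print("Roots of det(A - lambda I) = 0:", np.roots(coeffs))   # 4 and -1

# Eigenvalues and eigenvectors directly.
eigvals, eigvecs = np.linalg.eig(A)
print("Eigenvalues:", eigvals)
print("Check A @ v = lambda * v:",
      np.allclose(A @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))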
Eigenvector and Eigenvalue
Ax = λx
Ax - λx = 0
(A - λI)x = 0

If we define a new matrix B:
B = A - λI
Bx = 0

x will be a (nonzero) eigenvector of A if and only if B does not have an inverse, or equivalently det(B) = 0:

det(A - λI) = 0
Eigenvector and Eigenvalue
Example 1: Find the eigenvalues of
A = \begin{pmatrix} 2 & -12 \\ 1 & -5 \end{pmatrix}

det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & 12 \\ -1 & \lambda + 5 \end{vmatrix}
= (\lambda - 2)(\lambda + 5) + 12 = \lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2)

Two eigenvalues: \lambda_1 = -1, \lambda_2 = -2.

Note: The roots of the characteristic equation can be repeated. That is, \lambda_1 = \lambda_2 = ... = \lambda_k. If that happens, the eigenvalue is said to be of multiplicity k.

Example 2: Find the eigenvalues of
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}

det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & -1 & 0 \\ 0 & \lambda - 2 & 0 \\ 0 & 0 & \lambda - 2 \end{vmatrix} = (\lambda - 2)^3 = 0

\lambda = 2 is an eigenvalue of multiplicity 3.
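Both examples can be verified numerically (an illustrative check, not part of the slides):

import numpy as np

A1 = np.array([[2.0, -12.0],
               [1.0,  -5.0]])
print(np.linalg.eigvals(A1))         # approximately [-1., -2.]

A2 = np.array([[2.0, 1.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 2.0]])
print(np.linalg.eigvals(A2))         # [2., 2., 2.] -> eigenvalue 2 with multiplicity 3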
Principal Component Analysis

[Figure: a scatter of data points in the (X1, X2) plane. Y1, the first eigenvector, points along the direction in which the variance of the data is largest; Y2, the second eigenvector, is orthogonal to Y1 and is ignorable. Key observation: variance along Y1 = largest.]
Principal components
First principal component (PC1):
The eigenvector whose eigenvalue has the largest absolute value indicates the direction along which the data have the largest variance, i.e. the direction of greatest variation.

Second principal component (PC2):
The direction with the maximum variation left in the data, orthogonal to PC1.

In general, only a few directions manage to capture most of the variability in the data.
How PCA Works
1. Standardize the Data: If the features of your dataset are on
different scales, it’s essential to standardize them (subtract the
mean and divide by the standard deviation).
2. Compute the Covariance Matrix: Calculate the covariance matrix
for the standardized dataset.
3. Compute Eigenvectors and Eigenvalues: The eigenvectors
represent the directions of maximum variance, and the
corresponding eigenvalues indicate the magnitude of variance along
those directions.
4. Sort Eigenvectors by Eigenvalues: in descending order
5. Choose Principal Components: Select the top k eigenvectors
(principal components) where k is the desired dimensionality of the
reduced dataset.
6. Transform the Data: Multiply the original standardized data by
the selected principal components to obtain the new, lower-
dimensional representation of the data
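The six steps above can be combined into a short NumPy sketch (an illustrative implementation under simple assumptions, not the definitive method):

import numpy as np

def pca(X, k):
    """Reduce the n_samples x n_features matrix X to k dimensions."""
    # 1. Standardize the data (zero mean, unit variance per feature).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the covariance matrix of the standardized data.
    cov = np.cov(X_std, rowvar=False)
    # 3. Compute eigenvalues and eigenvectors (eigh: the covariance matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by eigenvalue in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Choose the top k principal components.
    components = eigvecs[:, :k]
    # 6. Project the standardized data onto the principal components.
    return X_std @ components, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_reduced, eigvals = pca(X, k=2)
print(X_reduced.shape)   # (200, 2)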
Transformed Data
• The eigenvalue \lambda_j corresponds to the variance along component j
• Thus, sort the eigenvectors by \lambda_j
• Take the first p eigenvectors e_i, where p is the number of top eigenvalues
• These are the directions with the largest variances

\begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix}
= \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_p \end{pmatrix}
  \begin{pmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{in} - \bar{x}_n \end{pmatrix}

(where each e_k is written as a row vector, so the middle matrix is p \times n)
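Since each eigenvalue \lambda_j is the variance captured along its component, the ratios \lambda_j / \sum_j \lambda_j can guide how many components p to keep. A standalone sketch (the random data and the 95% threshold are arbitrary choices, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalues of the covariance matrix, sorted from largest to smallest.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X_std, rowvar=False)))[::-1]

# Fraction of total variance captured by each component and cumulatively.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
print("per-component:", np.round(explained, 3))
print("cumulative:   ", np.round(cumulative, 3))

# Keep the smallest p that explains at least 95% of the variance.
p = int(np.searchsorted(cumulative, 0.95) + 1)
print("p =", p)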
Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduced storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D
or 3D
• Overfitting Prevention: High dimensional data may lead to overfitting in
machine learning models, which can lead to poor generalization
performance.
• Feature Extraction: Dimensionality reduction can help in extracting
important features from high dimensional data, which can be useful in
feature selection for machine learning models.
• Data Pre-processing: Dimensionality reduction can be used as a pre-
processing step before applying machine learning algorithms
• Improved Performance: It reduces the complexity of the data, and hence reduces the noise and irrelevant information in the data.
Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is
sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to
define datasets.
• Interpretability: The reduced dimensions may not be easily
interpretable, and it may be difficult to understand the
relationship between the original features and the reduced
dimensions.
• Overfitting: In some cases, dimensionality reduction may lead to
overfitting, especially when the number of components is chosen
based on the training data.
• Sensitivity to outliers: Some dimensionality reduction techniques
are sensitive to outliers, which can result in a biased
representation of the data.
Important points:
• Dimensionality reduction is the process of reducing the number
of features in a dataset while retaining as much information as
possible.
• This can be done to reduce the complexity of a model, improve the performance
of a learning algorithm, or make it easier to visualize the data.

• Techniques for dimensionality reduction include principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA).
• Each technique projects the data onto a lower-dimensional space
while preserving important information.
• Dimensionality reduction is performed during the pre-processing stage, before building a model, to improve performance.
• It is important to note that dimensionality reduction can also
discard useful information, so care must be taken when applying
these techniques.
