
Cheat Sheet – PCA Dimensionality Reduction

What is PCA?
• Finds a new set of orthogonal feature vectors (directions) for the dataset such that the
data spread is maximized along each feature vector (or dimension)
• Ranks the feature vectors in decreasing order of data spread (or variance)
• The datapoints have maximum variance along the first feature vector and minimum variance
along the last feature vector
• The variance of the datapoints along a feature vector can be treated as a measure of the
information carried in that direction.
Steps
1. Standardize the datapoints
2. Compute the covariance matrix of the standardized datapoints
3. Carry out eigenvalue decomposition of the covariance matrix
4. Sort the eigenvalues (and their corresponding eigenvectors) in decreasing order
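The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original cheat sheet; the variable names (e.g. X) and the random example data are assumptions:

```python
import numpy as np

# X: data matrix of shape (n_samples, n_features); random data as a stand-in
X = np.random.default_rng(0).normal(size=(100, 3))

# 1. Standardize the datapoints (zero mean, unit variance per feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized datapoints (n_features x n_features)
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvalue decomposition (eigh is appropriate: the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvalues and matching eigenvectors in decreasing order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```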

Dimensionality Reduction with PCA


• Keep the first m of the n feature vectors ranked by PCA. These m vectors preserve the
maximum information (variance) that any m vectors could preserve on the given dataset
Steps:
1. Carry out steps 1-4 from above
2. Keep the first m feature vectors from the sorted eigenvector matrix
3. Transform the data to the new basis (feature vectors)
4. Note that the importance of each feature vector is proportional to the magnitude of its eigenvalue
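Continuing from the arrays in the sketch above, the reduction step keeps the first m eigenvectors and projects the data onto them. The choice m = 2 below is purely illustrative:

```python
# Keep the first m feature vectors (columns of the sorted eigenvector matrix)
m = 2
W = eigvecs[:, :m]          # projection matrix, shape (n_features, m)

# Transform the data to the new basis
X_reduced = X_std @ W       # shape (n_samples, m)

# Importance of each kept feature vector, proportional to its eigenvalue
explained_ratio = eigvals[:m] / eigvals.sum()
print(explained_ratio)
```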

[Figures 1–3: scatter plots of the datapoints showing the variance along the original axes (Feature # 1, Feature # 2) and the rotated axes (New Feature # 1, New Feature # 2)]

Figure 1: Datapoints with the original feature vectors as x- and y-axis
Figure 2: The cartesian coordinate system is rotated to maximize the standard deviation along any one axis (new feature # 2)
Figure 3: Remove the feature vector with minimum standard deviation of datapoints (new feature # 1) and project the data on new feature # 2

Source: https://www.cheatsheets.aqeel-anwar.com
