PCA - Feb 8
Jayaraj P B
Outline
1. Dimension/Features in ML
2. The Curse of Dimensionality
3. Dimensionality Reduction
4. Methods for DR
5. PCA – Overview
6. Steps of PCA
7. Problem Solving
Features
• In machine learning, a feature, also known as a predictor, attribute, or input variable, refers to an individual measurable property or characteristic of the data that is used as input to a machine learning model.
• Features represent the various dimensions or aspects of the data that the model will consider when making predictions or classifications.
Feature Selection
• Forward Selection
• Backward Selection
• Bi-directional Elimination
• Filters
• Wrappers
Feature Extraction
• Feature extraction involves creating new features by combining or
transforming the original features.
Data Compression: 2D to 1D
[Figure: the same quantity measured in centimetres and in inches plotted against each other; the two redundant features can be compressed into a single dimension. Slide adapted from Andrew Ng.]
Principal Component Analysis
• This method was introduced by Karl Pearson.
Variance (single attribute):
s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
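As a quick check, the same formula can be evaluated with NumPy. This is a minimal sketch; the four values below are taken from the digits visible in the slide but may not be the complete worked example.

```python
import numpy as np

# Illustrative observations of a single attribute X.
x = np.array([35.0, 30.0, 40.0, 30.0])

# Sample variance with the (n - 1) denominator, exactly as in the formula above.
s2_manual = np.sum((x - x.mean()) ** 2) / (len(x) - 1)

# np.var with ddof=1 uses the same (n - 1) denominator.
print(s2_manual, np.var(x, ddof=1))   # both ~22.92
```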
Covariance
Measure of the “spread” of a set of points around their center of mass (mean).
Variance: measure of the deviation from the mean for points in one dimension.
Covariance: measure of how much each of the dimensions varies from the mean with respect to the others.
cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
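A matching NumPy sketch for the covariance formula, again with purely illustrative values; np.cov uses the same (n − 1) denominator by default.

```python
import numpy as np

# Illustrative paired observations of two attributes X and Y.
x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])

n = len(x)
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is cov(X, Y).
print(cov_manual, np.cov(x, y)[0, 1])
```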
More than two attributes: covariance matrix
For n attributes X_1, ..., X_n, the covariance matrix is the n × n matrix whose (i, j) entry is cov(X_i, X_j); its diagonal entries are the variances of the individual attributes.
Eigenvalues & eigenvectors
A nonzero vector x is an eigenvector of a square matrix A, with eigenvalue λ, if Ax = λx, i.e. (A − λI)x = 0.
Example: \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}, so (3, 2)^T is an eigenvector of this matrix with eigenvalue 4.
Eigenvector and Eigenvalue
Ax = λx  ⇒  Ax − λx = 0  ⇒  (A − λI)x = 0
A nonzero solution x exists only when det(A − λI) = 0; this is the characteristic equation, and its roots are the eigenvalues.
Eigenvector and Eigenvalue
Example 1: Find the eigenvalues of
A = \begin{pmatrix} 2 & -12 \\ 1 & -5 \end{pmatrix}
|\lambda I - A| = \begin{vmatrix} \lambda - 2 & 12 \\ -1 & \lambda + 5 \end{vmatrix} = (\lambda - 2)(\lambda + 5) + 12 = \lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2) = 0
Two eigenvalues: λ₁ = −1, λ₂ = −2.
Note: The roots of the characteristic equation can be repeated. That is, λ1 = λ2 =…= λk. If
that happens, the eigenvalue is said to be of multiplicity k.
Example 2: Find the eigenvalues of
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
|\lambda I - A| = (\lambda - 2)^3 = 0
λ = 2 is an eigenvalue of multiplicity 3.
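Assuming the two matrices as reconstructed above, np.linalg.eig confirms the eigenvalues (the order NumPy returns them in is not guaranteed):

```python
import numpy as np

# Example 1: eigenvalues should be -1 and -2.
A1 = np.array([[2.0, -12.0],
               [1.0,  -5.0]])
print(np.linalg.eig(A1)[0])    # approx. [-1., -2.] (order may differ)

# Example 2: triangular matrix, eigenvalue 2 repeated three times (multiplicity 3).
A2 = np.array([[2.0, 1.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 2.0]])
print(np.linalg.eig(A2)[0])    # [2., 2., 2.]
```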
Principal Component Analysis
[Figure: scatter plot of data points in the (X1, X2) plane, with the two eigenvector directions Y1 and Y2 drawn through the cloud.]
Note: Y1 is the first eigenvector, Y2 is the second; Y2 is ignorable.
Key observation: the variance along Y1 is the largest.
Principal components
1. First principal component (PC1): the eigenvector whose eigenvalue has the largest absolute value; the data have the largest variance along this eigenvector, the direction of greatest variation.
2. Second principal component (PC2): the direction with the maximum variation remaining in the data, orthogonal to PC1.
How PCA Works
1. Standardize the Data: If the features of your dataset are on
different scales, it’s essential to standardize them (subtract the
mean and divide by the standard deviation).
2. Compute the Covariance Matrix: Calculate the covariance matrix
for the standardized dataset.
3. Compute Eigenvectors and Eigenvalues: The eigenvectors
represent the directions of maximum variance, and the
corresponding eigenvalues indicate the magnitude of variance along
those directions.
4. Sort Eigenvectors by Eigenvalues: in descending order
5. Choose Principal Components: Select the top k eigenvectors
(principal components) where k is the desired dimensionality of the
reduced dataset.
6. Transform the Data: Multiply the original standardized data by the selected principal components to obtain the new, lower-dimensional representation of the data (see the NumPy sketch after this list).
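The sketch below walks through the six steps with plain NumPy. It is a minimal illustration rather than a production implementation; the function name pca_sketch and the random demo data are assumptions made for the example.

```python
import numpy as np

def pca_sketch(X, k):
    """Reduce X (n_samples, n_features) to k dimensions following the six steps above."""
    # 1. Standardize: subtract the mean and divide by the standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # 2. Covariance matrix of the standardized data.
    C = np.cov(Z, rowvar=False)

    # 3. Eigenvectors and eigenvalues (eigh is suited to symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(C)

    # 4. Sort eigenvectors by eigenvalue in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 5. Keep the top-k eigenvectors (principal components).
    W = eigvecs[:, :k]

    # 6. Project the standardized data onto the selected components.
    return Z @ W

# Usage on random data: reduce 5 features to 2.
X = np.random.default_rng(0).normal(size=(100, 5))
Y = pca_sketch(X, k=2)
print(Y.shape)   # (100, 2)
```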
Transformed Data
• Eigenvalue λ_j corresponds to the variance along component j.
• Thus, sort the eigenvalues λ_j in descending order.
• Take the first p eigenvectors e_j, where p is the number of top eigenvalues kept.
• These are the directions with the largest variances.
\begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix} = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_p^T \end{pmatrix} \begin{pmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{in} - \bar{x}_n \end{pmatrix}
where x̄ = (x̄_1, ..., x̄_n) is the mean of the original data.
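Written directly from the equation above, projecting a single sample x_i onto the top-p eigenvectors can be sketched in NumPy as follows (made-up data; the samples are only centred, not standardized, to mirror the equation):

```python
import numpy as np

# Illustrative data: 50 samples, n = 4 attributes.
X = np.random.default_rng(1).normal(size=(50, 4))
x_bar = X.mean(axis=0)                               # mean vector (x̄_1, ..., x̄_n)

# Eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
E = eigvecs[:, order]                                # columns are e_1, e_2, ...

# y_i = (e_1^T (x_i - x̄), ..., e_p^T (x_i - x̄)) for the first sample, with p = 2.
p = 2
y_i = E[:, :p].T @ (X[0] - x_bar)
print(y_i)                                           # the p transformed coordinates
```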
Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduced storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D
or 3D
• Overfitting Prevention: High-dimensional data may lead to overfitting in machine learning models and hence poor generalization performance; reducing the number of features mitigates this.
• Feature Extraction: Dimensionality reduction can help in extracting
important features from high dimensional data, which can be useful in
feature selection for machine learning models.
• Data Pre-processing: Dimensionality reduction can be used as a pre-
processing step before applying machine learning algorithms
• Improved Performance: It reduces the complexity of the data, and hence reduces the noise and irrelevant information in the data.
Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is
sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to
define datasets.
• Interpretability: The reduced dimensions may not be easily
interpretable, and it may be difficult to understand the
relationship between the original features and the reduced
dimensions.
• Overfitting: In some cases, dimensionality reduction may lead to
overfitting, especially when the number of components is chosen
based on the training data.
• Sensitivity to outliers: Some dimensionality reduction techniques
are sensitive to outliers, which can result in a biased
representation of the data.
Important points:
• Dimensionality reduction is the process of reducing the number
of features in a dataset while retaining as much information as
possible.
• This can be done to reduce the complexity of a model, improve the performance
of a learning algorithm, or make it easier to visualize the data.