
Dimensionality Reduction

- Principal Component Analysis

Jayaraj P B
Outline
1. Dimension/Features in ML
2. The Curse of Dimensionality
3. Dimensionality Reduction
4. Methods for DR
5. PCA – Overview
6. Steps of PCA
7. Problem Solving
Features
• In machine learning, a feature, also known as a predictor, attribute, or input variable, refers to an individual measurable property or characteristic of the data that is used as input to a machine learning model.
• Features represent the various dimensions or aspects of the data that the model will consider when making predictions or classifications.
• Feature engineering is the process of selecting, transforming, and creating new features from the raw data to improve the performance of the machine learning model.
• In summary, features in machine learning represent the input variables or attributes that the model learns from to make predictions or classifications.
Dimension
• The term "dimension" typically refers to the number of features or variables used to represent each data point in a dataset. It essentially represents the number of columns or attributes in the dataset.
• The number of input features, variables, or columns present in a given dataset is known as its dimensionality.
• In many machine learning applications, datasets can have a large number of features, resulting in high-dimensional data.
• High-dimensional data can present challenges such as increased computational complexity, the curse of dimensionality, and difficulties in visualization and interpretation.
The Curse of Dimensionality
• Handling high-dimensional data is very difficult in practice; this difficulty is commonly known as the curse of dimensionality.
• As the dimensionality of the input dataset increases, any machine learning algorithm and model becomes more complex.
• As the number of features increases, the number of samples needed to cover the feature space grows rapidly, so with a fixed amount of data the chance of overfitting increases.
• A machine learning model trained on high-dimensional data is therefore prone to overfitting and poor performance on new data.
Dimensionality reduction
• Dimensionality reduction is the process of reducing the number of features (or dimensions) in a dataset while retaining as much information as possible.
• In other words, it is the process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data.
• This can be done for a variety of reasons, such as
  - to reduce the complexity of a model,
  - to improve the performance of a learning algorithm, or
  - to make it easier to visualize the data.
• In addition, high-dimensional data can also lead to overfitting, where the model fits the training data too closely and does not generalize well to new data.
• Dimensionality reduction can help to mitigate these problems by reducing the complexity of the model and improving its generalization performance.
• There are two main approaches to dimensionality reduction:
  - feature selection and
  - feature extraction.
Feature Selection
• Feature selection involves selecting a subset of the original features that are most relevant to the problem at hand.
• The goal is to reduce the dimensionality of the dataset while retaining the most important features.
• There are several methods for feature selection (a code sketch of forward/backward selection follows this list):
  - Forward Selection
  - Backward Selection
  - Bi-directional Elimination
  - Filters
  - Wrappers
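For illustration only (not part of the original slides), a hedged sketch of forward and backward selection using scikit-learn's SequentialFeatureSelector; the dataset, estimator, and number of features to keep are arbitrary choices:

# Hypothetical sketch of forward/backward feature selection with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Forward selection: start from no features and greedily add the best one.
forward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000), n_features_to_select=2, direction="forward"
)
forward.fit(X, y)
print("Forward selection kept features:", forward.get_support())

# Backward selection: start from all features and greedily remove the worst one.
backward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000), n_features_to_select=2, direction="backward"
)
backward.fit(X, y)
print("Backward selection kept features:", backward.get_support())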
Feature Extraction
• Feature extraction involves creating new features by combining or transforming the original features.
• The goal is to create a set of features that captures the essence of the original data in a lower-dimensional space.
• There are several methods for feature extraction, including principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE).
• PCA is a popular technique that projects the original features onto a lower-dimensional space while preserving as much of the variance as possible.
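As a quick illustration (a sketch added here, not from the slides; the dataset and number of components are arbitrary), PCA as a feature-extraction step in scikit-learn:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)          # 150 samples, 4 features
X_std = StandardScaler().fit_transform(X)  # standardize first

pca = PCA(n_components=2)                  # keep 2 principal components
X_2d = pca.fit_transform(X_std)            # 150 x 2 projected data
print("Explained variance ratio:", pca.explained_variance_ratio_)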
Data Compression
[Figure: reducing data from 2D to 1D. The two axes measure length in inches and in centimetres; the nearly redundant pair of features can be replaced by a single coordinate along the fitted direction. Slide adapted from Andrew Ng.]
Principal Component Analysis
• This method was introduced by Karl Pearson.

• It works on the principle that when data in a higher-dimensional space is mapped to a lower-dimensional space, the mapping should be chosen so that the variance of the data in the lower-dimensional space is maximized.
Principal Component Analysis: one attribute first

Consider a single attribute, Temperature, with sample values:
42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30

Question: how much spread is in the data along the axis (distance to the mean)?

Variance = (standard deviation)^2:

s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
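A quick numerical check (a sketch added for illustration, not in the original slides), using NumPy's ddof=1 to match the (n - 1) denominator above:

import numpy as np

temperature = np.array([42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30])

mean = temperature.mean()
variance = temperature.var(ddof=1)   # sample variance: divide by (n - 1)
std_dev = temperature.std(ddof=1)    # sample standard deviation

print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")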
Covariance
Measure of the "spread" of a set of points around their center of mass (mean).

Variance:
Measure of the deviation from the mean for points in one dimension.

Covariance:
Measure of how much each of the dimensions varies from the mean with respect to the others.

• Covariance is measured between two dimensions
• Covariance shows whether there is a relation between two dimensions
• The covariance of a dimension with itself is simply its variance
Covariance
Used to find relationships between dimensions in high-dimensional data sets.

The sample mean: \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

Now consider two dimensions, X = Temperature and Y = Humidity:

  X (Temperature)   Y (Humidity)
        40               90
        40               90
        40               90
        30               90
        15               70
        15               70
        15               70
        30               90
        15               70
        30               70
        30               70
        30               90
        40               70
        30               90

Covariance measures the correlation between X and Y:

cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
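The same calculation in NumPy (an illustrative sketch, not in the slides, using the (Temperature, Humidity) pairs from the table above; np.cov uses the (n - 1) denominator by default):

import numpy as np

temperature = np.array([40, 40, 40, 30, 15, 15, 15, 30, 15, 30, 30, 30, 40, 30])
humidity    = np.array([90, 90, 90, 90, 70, 70, 70, 90, 70, 70, 70, 90, 70, 90])

# Direct implementation of the formula
n = len(temperature)
cov_xy = np.sum((temperature - temperature.mean()) * (humidity - humidity.mean())) / (n - 1)

# Same value from np.cov (off-diagonal entry of the 2 x 2 covariance matrix)
print(cov_xy, np.cov(temperature, humidity)[0, 1])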
More than two attributes: covariance matrix

Contains covariance values between all possible pairs of dimensions (= attributes):

C_{n \times n} = (c_{ij}), \quad c_{ij} = \mathrm{cov}(Dim_i, Dim_j)

Example for three attributes (x, y, z):

C = \begin{pmatrix}
      \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\
      \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\
      \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z)
    \end{pmatrix}
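An illustrative sketch (variable names and data are assumed, not from the slides): for a data matrix with one column per attribute, NumPy builds this matrix in a single call:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))     # 100 samples, 3 attributes x, y, z

# rowvar=False tells np.cov that columns (not rows) are the variables,
# so C[i, j] = cov(Dim_i, Dim_j); C is symmetric with variances on the diagonal.
C = np.cov(data, rowvar=False)
print(C.shape)   # (3, 3)
print(C)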
Eigenvalues & eigenvectors

Vectors x having the same direction as Ax are called eigenvectors of A (where A is an n-by-n matrix).
In the equation Ax = \lambda x, \lambda is called an eigenvalue of A.

Example:
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
= \begin{pmatrix} 12 \\ 8 \end{pmatrix}
= 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}
Eigenvalues & eigenvectors

Ax=x  (A-I)x=0

How to calculate x and :


• Calculate det(A-I), yields a polynomial (degree n)
• Determine roots to det(A-I)=0, roots are
eigenvalues 
• Solve (A- I) x=0 for each  to obtain eigenvectors x

19
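A small sketch (added for illustration, not in the slides) of this procedure applied to the 2 x 2 matrix from the previous slide; np.poly gives the characteristic-polynomial coefficients and np.linalg.eig returns the eigenpairs directly:

import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Coefficients of the characteristic polynomial det(A - lambda*I).
coeffs = np.poly(A)                   # [1, -3, -4]  ->  lambda^2 - 3*lambda - 4
print("Roots of det(A - lambda I) = 0:", np.roots(coeffs))   # 4 and -1

# Eigenvalues and eigenvectors directly.
eigvals, eigvecs = np.linalg.eig(A)
print("Eigenvalues:", eigvals)
print("Check A @ v = lambda * v:",
      np.allclose(A @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))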
Eigenvector and Eigenvalue
Ax = λx
Ax - λx = 0
(A - λI)x = 0

If we define a new matrix B:
B = A - λI
Bx = 0

x will be a (nonzero) eigenvector of A if and only if B does not have an inverse, or equivalently det(B) = 0:

det(A - λI) = 0
Eigenvector and Eigenvalue
Example 1: Find the eigenvalues of
A = \begin{pmatrix} 2 & -12 \\ 1 & -5 \end{pmatrix}

det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & 12 \\ -1 & \lambda + 5 \end{vmatrix}
= (\lambda - 2)(\lambda + 5) + 12 = \lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2)

Two eigenvalues: \lambda_1 = -1, \lambda_2 = -2.

Note: The roots of the characteristic equation can be repeated. That is, \lambda_1 = \lambda_2 = ... = \lambda_k. If that happens, the eigenvalue is said to be of multiplicity k.

Example 2: Find the eigenvalues of
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}

det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & -1 & 0 \\ 0 & \lambda - 2 & 0 \\ 0 & 0 & \lambda - 2 \end{vmatrix} = (\lambda - 2)^3 = 0

\lambda = 2 is an eigenvalue of multiplicity 3.
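Both examples can be verified numerically (an illustrative check, not part of the slides):

import numpy as np

A1 = np.array([[2.0, -12.0],
               [1.0,  -5.0]])
print(np.linalg.eigvals(A1))         # approximately [-1., -2.]

A2 = np.array([[2.0, 1.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 2.0]])
print(np.linalg.eigvals(A2))         # [2., 2., 2.] -> eigenvalue 2 with multiplicity 3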
Principal Component Analysis

[Figure: a scatter of data points in the (X1, X2) plane. Y1, the first eigenvector, points along the direction in which the variance of the data is largest; Y2, the second eigenvector, is orthogonal to Y1 and is ignorable. Key observation: variance along Y1 = largest.]
Principal components
First principal component (PC1):
The eigenvector whose eigenvalue has the largest absolute value indicates the direction along which the data have the largest variance, i.e. the direction of greatest variation.

Second principal component (PC2):
The direction with the maximum variation left in the data, orthogonal to PC1.

In general, only a few directions manage to capture most of the variability in the data.
How PCA Works
1. Standardize the Data: If the features of your dataset are on
different scales, it’s essential to standardize them (subtract the
mean and divide by the standard deviation).
2. Compute the Covariance Matrix: Calculate the covariance matrix
for the standardized dataset.
3. Compute Eigenvectors and Eigenvalues: The eigenvectors
represent the directions of maximum variance, and the
corresponding eigenvalues indicate the magnitude of variance along
those directions.
4. Sort Eigenvectors by Eigenvalues: in descending order
5. Choose Principal Components: Select the top k eigenvectors
(principal components) where k is the desired dimensionality of the
reduced dataset.
6. Transform the Data: Multiply the original standardized data by
the selected principal components to obtain the new, lower-
dimensional representation of the data
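The six steps above can be combined into a short NumPy sketch (an illustrative implementation under simple assumptions, not the definitive method):

import numpy as np

def pca(X, k):
    """Reduce the n_samples x n_features matrix X to k dimensions."""
    # 1. Standardize the data (zero mean, unit variance per feature).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the covariance matrix of the standardized data.
    cov = np.cov(X_std, rowvar=False)
    # 3. Compute eigenvalues and eigenvectors (eigh: the covariance matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by eigenvalue in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Choose the top k principal components.
    components = eigvecs[:, :k]
    # 6. Project the standardized data onto the principal components.
    return X_std @ components, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_reduced, eigvals = pca(X, k=2)
print(X_reduced.shape)   # (200, 2)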
Transformed Data
• The eigenvalue \lambda_j corresponds to the variance along component j
• Thus, sort the eigenvectors by \lambda_j
• Take the first p eigenvectors e_i, where p is the number of top eigenvalues
• These are the directions with the largest variances

\begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix}
= \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_p \end{pmatrix}
  \begin{pmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{in} - \bar{x}_n \end{pmatrix}

(where each e_k is written as a row vector, so the middle matrix is p \times n)
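Since each eigenvalue \lambda_j is the variance captured along its component, the ratios \lambda_j / \sum_j \lambda_j can guide how many components p to keep. A standalone sketch (the random data and the 95% threshold are arbitrary choices, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalues of the covariance matrix, sorted from largest to smallest.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X_std, rowvar=False)))[::-1]

# Fraction of total variance captured by each component and cumulatively.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
print("per-component:", np.round(explained, 3))
print("cumulative:   ", np.round(cumulative, 3))

# Keep the smallest p that explains at least 95% of the variance.
p = int(np.searchsorted(cumulative, 0.95) + 1)
print("p =", p)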
Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduced storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D
or 3D
• Overfitting Prevention: High dimensional data may lead to overfitting in
machine learning models, which can lead to poor generalization
performance.
• Feature Extraction: Dimensionality reduction can help in extracting
important features from high dimensional data, which can be useful in
feature selection for machine learning models.
• Data Pre-processing: Dimensionality reduction can be used as a pre-
processing step before applying machine learning algorithms
• Improved Performance: It reduces the complexity of the data, and hence reduces the noise and irrelevant information in the data.
Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is
sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to
define datasets.
• Interpretability: The reduced dimensions may not be easily
interpretable, and it may be difficult to understand the
relationship between the original features and the reduced
dimensions.
• Overfitting: In some cases, dimensionality reduction may lead to
overfitting, especially when the number of components is chosen
based on the training data.
• Sensitivity to outliers: Some dimensionality reduction techniques
are sensitive to outliers, which can result in a biased
representation of the data.
Important points:
• Dimensionality reduction is the process of reducing the number
of features in a dataset while retaining as much information as
possible.
• This can be done to reduce the complexity of a model, improve the performance
of a learning algorithm, or make it easier to visualize the data.

• Techniques for dimensionality reduction include principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA).
• Each technique projects the data onto a lower-dimensional space
while preserving important information.
• Dimensionality reduction is performed during the pre-processing stage, before building a model, to improve performance.
• It is important to note that dimensionality reduction can also
discard useful information, so care must be taken when applying
these techniques.
