
Principal Component Analysis (PCA)
Anisha M. Lal
Dimensionality Reduction and Feature Construction
• Principal Component Analysis (PCA) is an unsupervised linear transformation technique that is widely used across different fields, most prominently for feature extraction and dimensionality reduction.
• PCA is used to reduce the dimensionality of data without much loss of information.
• Dimensionality reduction is a process through which we can visualize high-dimensional data by reducing the number of dimensions.
• It is used in machine learning, signal processing, and image compression (among other things).
PCA
PCA is "an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (the first principal component), the second greatest variance lies on the second coordinate (the second principal component), and so on."
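As a quick illustration (a minimal sketch, assuming NumPy and scikit-learn are available; the dataset and variable names are made up), fitting PCA to correlated 2-D data shows the first component capturing almost all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative 2-D data with strong correlation between the two attributes
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.3, size=200)])

pca = PCA(n_components=2).fit(data)

# Fraction of total variance captured by each component, highest first
print(pca.explained_variance_ratio_)
```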
Background for PCA
• Suppose the attributes are A1 and A2, and we have n training examples. x's denote values of A1 and y's denote values of A2 over the training examples.
• Variance of an attribute:
  $\mathrm{var}(A_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Covariance of two attributes:
  $\mathrm{cov}(A_1, A_2) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

• If the covariance is positive, both dimensions increase together. If it is negative, as one increases, the other decreases. If it is zero, the two dimensions are linearly uncorrelated (no linear relationship).
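A small NumPy sketch of these two formulas (the attribute values below are illustrative, not from the slides):

```python
import numpy as np

# Illustrative values of attributes A1 (x) and A2 (y) over n training examples
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])
n = len(x)

# Sample variance and covariance with the (n - 1) denominator from the formulas above
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# These agree with NumPy's built-ins, which also divide by n - 1 here
assert np.isclose(var_x, np.var(x, ddof=1))
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
```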
Covariance matrix
• Suppose we have n attributes, A1, ..., An.
• Covariance matrix:
  $C_{n \times n} = (c_{i,j})$, where $c_{i,j} = \mathrm{cov}(A_i, A_j)$
• Example for two attributes H and M (the matrix is symmetric since cov(H, M) = cov(M, H)):
  $\begin{pmatrix} \mathrm{cov}(H,H) & \mathrm{cov}(H,M) \\ \mathrm{cov}(M,H) & \mathrm{cov}(M,M) \end{pmatrix} = \begin{pmatrix} \mathrm{var}(H) & 104.5 \\ 104.5 & \mathrm{var}(M) \end{pmatrix} = \begin{pmatrix} 47.7 & 104.5 \\ 104.5 & 370 \end{pmatrix}$
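A sketch of building such a covariance matrix with NumPy; the H and M values here are hypothetical and are not the data behind the 47.7/104.5/370 example:

```python
import numpy as np

# Hypothetical values for two attributes H and M
H = np.array([ 9.0, 15.0, 25.0, 14.0, 10.0, 18.0,  0.0, 16.0,  5.0, 19.0, 16.0, 20.0])
M = np.array([39.0, 56.0, 93.0, 61.0, 50.0, 75.0, 32.0, 85.0, 42.0, 70.0, 66.0, 80.0])

# np.cov takes the attributes as rows and uses the (n - 1) denominator,
# so C[i, j] = cov(A_i, A_j) and the matrix is symmetric
C = np.cov(np.vstack([H, M]))
print(C)   # [[var(H), cov(H, M)], [cov(M, H), var(M)]]
```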
PCA Algorithm
1. Given the original data set S = {x1, ..., xk}, produce a new set by subtracting the mean of attribute Ai from each xi.
2. Calculate the covariance matrix.
3. Calculate the (unit) eigenvectors and eigenvalues of the covariance matrix.
4. Order the eigenvectors by eigenvalue, highest to lowest, and construct the new feature vector.
5. Derive the new data set:
   TransformedData = RowFeatureVector × RowDataAdjust
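A minimal from-scratch sketch of these five steps in NumPy (the function and variable names are mine, not from the slides):

```python
import numpy as np

def pca_transform(data, p):
    """Project `data` (rows = examples, columns = attributes) onto the top p principal components."""
    # Step 1: subtract the mean of each attribute
    mean = data.mean(axis=0)
    adjusted = data - mean

    # Step 2: covariance matrix of the attributes
    cov = np.cov(adjusted, rowvar=False)

    # Step 3: (unit) eigenvectors and eigenvalues; eigh is appropriate for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: order eigenvectors by eigenvalue, highest to lowest, keep the top p as the feature vector
    order = np.argsort(eigvals)[::-1][:p]
    feature_vector = eigvecs[:, order]

    # Step 5: derive the new data set (one row per example, one column per component)
    transformed = adjusted @ feature_vector
    return transformed, feature_vector, mean
```

Note that this sketch keeps one example per row, whereas the slides' TransformedData = RowFeatureVector × RowDataAdjust convention keeps one example per column; the two differ only by a transpose.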
PCA: Worked Example
1. Given the original data set S = {x1, ..., xk}, produce a new set by subtracting the mean of attribute Ai from each xi.
   Mean of x: 1.81, mean of y: 1.91; after subtraction both means are 0.
2. Calculate the covariance matrix of the attributes x and y (a 2 × 2 matrix).

3. Calculate the (unit) eigenvectors and eigenvalues of the covariance matrix.
4. Order the eigenvectors by eigenvalue, highest to lowest:
   $v_1 = \begin{pmatrix} -0.677873399 \\ -0.735178956 \end{pmatrix}$, eigenvalue $\lambda_1 = 1.28402771$
   $v_2 = \begin{pmatrix} -0.735178956 \\ 0.677873399 \end{pmatrix}$, eigenvalue $\lambda_2 = 0.0490833989$
   In general, you get n components. To reduce dimensionality to p, ignore the n - p components at the bottom of the list.
Construct the new feature vector: FeatureVector = (v1, v2, ..., vp)
   $\mathit{FeatureVector}_1 = \begin{pmatrix} -0.677873399 & -0.735178956 \\ -0.735178956 & 0.677873399 \end{pmatrix}$
   or the reduced-dimension feature vector:
   $\mathit{FeatureVector}_2 = \begin{pmatrix} -0.677873399 \\ -0.735178956 \end{pmatrix}$
5. Derive the new data set.
   TransformedData = RowFeatureVector × RowDataAdjust
   where RowFeatureVector is the feature vector transposed, so the chosen eigenvectors form its rows, and RowDataAdjust holds the mean-adjusted data with one example per column:
   $\mathit{RowFeatureVector}_1 = \begin{pmatrix} -0.677873399 & -0.735178956 \\ -0.735178956 & 0.677873399 \end{pmatrix}$
   $\mathit{RowFeatureVector}_2 = \begin{pmatrix} -0.677873399 & -0.735178956 \end{pmatrix}$
   $\mathit{RowDataAdjust} = \begin{pmatrix} 0.69 & -1.31 & 0.39 & 0.09 & 1.29 & 0.49 & 0.19 & -0.81 & -0.31 & -0.71 \\ 0.49 & -1.21 & 0.99 & 0.29 & 1.09 & 0.79 & -0.31 & -0.81 & -0.31 & -1.01 \end{pmatrix}$
   This gives the original data in terms of the chosen components (eigenvectors), that is, along these axes.
Reconstructing the original data
We did:
   TransformedData = RowFeatureVector × TransformedData's source, RowDataAdjust
so we can do
   RowDataAdjust = RowFeatureVector^{-1} × TransformedData
                 = RowFeatureVector^T × TransformedData
(the inverse equals the transpose because the rows of RowFeatureVector are orthonormal unit eigenvectors)
and
   RowDataOriginal = RowDataAdjust + OriginalMean
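A short NumPy sketch of this reconstruction, reusing the numbers from the worked example (variable names are illustrative):

```python
import numpy as np

# Projection from the worked example: unit eigenvectors as the rows of RowFeatureVector
row_feature_vector = np.array([[-0.677873399, -0.735178956],
                               [-0.735178956,  0.677873399]])
row_data_adjust = np.array([
    [0.69, -1.31, 0.39, 0.09, 1.29, 0.49,  0.19, -0.81, -0.31, -0.71],
    [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
])
transformed_data = row_feature_vector @ row_data_adjust

# The rows of row_feature_vector are orthonormal, so its inverse is its transpose
recovered_adjust = row_feature_vector.T @ transformed_data

# Add the original per-attribute means (1.81 for x, 1.91 for y) back
row_data_original = recovered_adjust + np.array([[1.81], [1.91]])
print(np.allclose(recovered_adjust, row_data_adjust))   # True when all components are kept
```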
Advantages of PCA
• Removes correlated features.
• Improves algorithm performance by reducing the number of dimensions: the training time of algorithms drops significantly with fewer features.
• Reduces overfitting: overfitting mainly occurs when there are too many variables in the dataset, so PCA helps overcome it by reducing the number of features.
• Improves visualization: it is very hard to visualize and understand data in high dimensions. PCA transforms high-dimensional data into low-dimensional data (e.g., 2 dimensions) so that it can be visualized easily.
Disadvantages of PCA
• The independent variables become less interpretable: each principal component is a linear combination of the original features.
• Data standardization is a must for PCA: principal components will be biased towards features with high variance, leading to misleading results. PCA is affected by scale, so you need to scale the features in your data before applying PCA (see the sketch after this list).
• Categorical features require encoding, since PCA works only on numerical data.
• Information is lost when the data is spread across different structures/shapes: although principal components try to cover the maximum variance among the features in a dataset, if we do not select the number of principal components with care, we may miss some information compared to the original list of features.
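A minimal sketch of the standardize-then-project workflow mentioned above, assuming scikit-learn is available (the dataset and parameter choices are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Illustrative data: 100 examples, 5 features on wildly different scales
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 1000.0])

# Standardize first so no single high-variance feature dominates the principal components
X_scaled = StandardScaler().fit_transform(X)

# Keep 2 components, e.g. for visualization
X_reduced = PCA(n_components=2).fit_transform(X_scaled)
print(X_reduced.shape)   # (100, 2)
```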
