Principal Component Analysis

Ujjwal Maulik
Computer Sc. & Engg. Department
Jadavpur University
Topics
• Introduction
• Mathematical Concepts
• How PCA Works
• Application Areas
• References
Introduction
• PCA is a mathematical technique widely used in
pattern recognition
• It is used when the dimension of the data points is
so high that they cannot be visualized or
recognized directly
• PCA maps high-dimensional data points to a
lower-dimensional space
Mathematical Concepts
• Mean
• Standard Deviation
• Variance
• Covariance
• Eigen Values
• Eigen Vectors
Mean
• The mean is the arithmetic average:
  mean(x) = (x1 + x2 + ... + xn) / n
• Consider two data sets that both have mean 10
• The mean is 10 for both data sets, but the data
sets are not the same
Standard Deviation
• The two data sets differ in how the data are
spread around the mean.
• To measure this spread we use the standard
deviation:
  s = sqrt( sum_i (xi - mean)^2 / (n - 1) )
Variance
• Variance is the square of the standard deviation:
  var(x) = s^2
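As a small illustration (the two data sets here are invented for demonstration, not taken from the slides), the following Python sketch shows two data sets with the same mean but different spread:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    # sample standard deviation (divide by n - 1)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

a = [8, 9, 10, 11, 12]   # tightly clustered around 10
b = [0, 5, 10, 15, 20]   # widely spread around 10

print(mean(a), mean(b))          # both means are 10.0
print(std(a), std(b))            # spreads differ
print(std(a) ** 2, std(b) ** 2)  # variance = std squared
```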
Covariance
• Standard deviation and variance tell you about
each individual dimension of the data points
• They cannot tell you how different dimensions of
the data points depend on each other
• Covariance is a measure of the dependence between
two dimensions of the data points:
  cov(x, y) = sum_i (xi - mean(x)) (yi - mean(y)) / (n - 1)
Covariance
• If we have 3 variables x, y, z, we can calculate
the covariances cov(x, y), cov(y, z) and cov(x, z)
Covariance
• cov(x, y) = cov(y, x)
• cov(x, x) = var(x)
• If the covariance is positive, the two variables
tend to increase together
• If it is zero, the variables are uncorrelated
(there is no linear dependence between them)
• If it is negative, one variable tends to decrease
as the other increases
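A direct implementation of the covariance formula makes these properties concrete (the data pairs below are invented for illustration):

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # sample covariance (divide by n - 1)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]     # increases with x -> positive covariance
down = [10, 8, 6, 4, 2]   # decreases with x -> negative covariance

print(cov(x, up), cov(x, down))  # positive vs negative
print(cov(x, up) == cov(up, x))  # covariance is symmetric
print(cov(x, x))                 # cov(x, x) equals var(x)
```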
Covariance Matrix
• The covariance matrix contains all the possible
covariances among the different dimensions
• If we have three dimensions x, y, z, the
covariance matrix looks as follows:
  C = | cov(x,x)  cov(x,y)  cov(x,z) |
      | cov(y,x)  cov(y,y)  cov(y,z) |
      | cov(z,x)  cov(z,y)  cov(z,z) |
Covariance Matrix
• For n dimensions this matrix has n * n entries
and is symmetric, since cov(x, y) = cov(y, x)
• This matrix is what PCA operates on
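With NumPy (a library choice of mine, not mentioned in the slides), np.cov builds this n * n matrix directly; the rows below are the first five points of the slides' data set, one row per variable:

```python
import numpy as np

# three dimensions: rows are the variables x, y, z
data = np.array([
    [2.5, 0.5, 2.2, 1.9, 3.1],   # x
    [2.4, 0.7, 2.9, 2.2, 3.0],   # y
    [2.3, 0.8, 2.3, 1.7, 2.9],   # z
])

C = np.cov(data)            # np.cov treats each row as one variable
print(C.shape)              # (3, 3)
print(np.allclose(C, C.T))  # True: the covariance matrix is symmetric
```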
Eigen Values and Vectors
• For a square matrix A, there may exist a nonzero
vector X and a scalar C such that
  A X = C X
• X is called an eigenvector and C is the
corresponding eigenvalue
• Only square matrices can have eigenvalues and
eigenvectors
• Not every square matrix has real eigenvalues and
eigenvectors
Eigen Values and Vectors
• For a square matrix of size n that is symmetric
(such as a covariance matrix), there are n
eigenvalues with n corresponding eigenvectors
• Eigenvalues and eigenvectors come in pairs
Unit Length Eigen Vector
• In PCA we will use unit-length eigenvectors, i.e.
eigenvectors scaled so that their length is 1
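A NumPy sketch (the 2 x 2 matrix is invented for illustration): np.linalg.eig returns the eigenvalues and unit-length eigenvectors, which we can check against A X = C X:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(A)

# each column of `vecs` is a unit-length eigenvector
for i in range(len(vals)):
    x = vecs[:, i]
    assert np.allclose(A @ x, vals[i] * x)     # A X = C X
    assert np.isclose(np.linalg.norm(x), 1.0)  # unit length

print(sorted(vals))   # eigenvalues of this matrix are 1 and 3
```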
How PCA Works
• Step 1: Get n data points of dimension d. Here we have 10 data
points of dimension 3
x y z
2.5 2.4 2.3
0.5 0.7 0.8
2.2 2.9 2.3
1.9 2.2 1.7
3.1 3.0 2.9
2.3 2.7 2.1
2 1.6 1.8
1 1.1 0.9
1.5 1.6 1.7
1.1 0.9 1.2
How PCA Works
• Step 2: calculate the mean of each dimension and
subtract it from each data item.
• After subtraction, the mean of each dimension is zero.
• For the above data, mean(x) = 1.81, mean(y) = 1.91
and mean(z) = 1.77
• The data points after subtracting the means are as
follows
How PCA Works
x y z
.69 .49 0.5
-1.31 -1.21 -1.0
.39 .99 0.5
.09 .29 -0.1
1.29 1.09 1.1
.49 .79 0.3
.19 -.31 0.03
-.81 -.81 -0.87
-.31 -.31 -0.07
-.71 -1.01 -0.57
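Steps 1 and 2 can be sketched in NumPy (the data set is the one from the slides):

```python
import numpy as np

# 10 data points of dimension 3 (from the slides)
X = np.array([
    [2.5, 2.4, 2.3], [0.5, 0.7, 0.8], [2.2, 2.9, 2.3], [1.9, 2.2, 1.7],
    [3.1, 3.0, 2.9], [2.3, 2.7, 2.1], [2.0, 1.6, 1.8], [1.0, 1.1, 0.9],
    [1.5, 1.6, 1.7], [1.1, 0.9, 1.2],
])

mu = X.mean(axis=0)   # per-dimension means
Xc = X - mu           # subtract the mean from each data item

print(mu)                               # [1.81 1.91 1.77]
print(np.allclose(Xc.mean(axis=0), 0))  # True: means are now zero
```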
How PCA Works
• Step 3: calculate the covariance matrix
  Cov = | .616555556  .615444444  .348266666 |
        | .615444444  .716555556  .360044444 |
        | .348266666  .360044444  .433066666 |
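Step 3 in NumPy, computed from the raw data of step 1 (the x and y entries match the slide; entries involving z can differ slightly from the slide's printed matrix, which appears to use rounded intermediate values):

```python
import numpy as np

X = np.array([
    [2.5, 2.4, 2.3], [0.5, 0.7, 0.8], [2.2, 2.9, 2.3], [1.9, 2.2, 1.7],
    [3.1, 3.0, 2.9], [2.3, 2.7, 2.1], [2.0, 1.6, 1.8], [1.0, 1.1, 0.9],
    [1.5, 1.6, 1.7], [1.1, 0.9, 1.2],
])

# rowvar=False: rows are data points, columns are dimensions;
# np.cov divides by n - 1 by default (sample covariance)
C = np.cov(X, rowvar=False)

print(C.shape)   # (3, 3)
print(C[0, 0])   # var(x)    ~ 0.6166
print(C[0, 1])   # cov(x, y) ~ 0.6154
```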
How PCA Works
• Step 4: calculate the eigenvalues and eigenvectors
of the covariance matrix
• Eigenvalues: 1.516, 0.048, 0.202
• Eigenvectors (one per column, in the same order as
the eigenvalues above):
( 0.618) ( 0.751) (-0.232)
( 0.665) (-0.657) (-0.356)
( 0.420) (-0.066) ( 0.905)
• These eigenvectors are of unit length
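Step 4 as code, using the covariance matrix exactly as printed on the slide (np.linalg.eigh is appropriate since the matrix is symmetric; the signs of the returned eigenvectors may be flipped relative to the slide):

```python
import numpy as np

C = np.array([
    [0.616555556, 0.615444444, 0.348266666],
    [0.615444444, 0.716555556, 0.360044444],
    [0.348266666, 0.360044444, 0.433066666],
])

# eigh is for symmetric matrices; it returns eigenvalues in ascending order
vals, vecs = np.linalg.eigh(C)
print(vals)   # ~ [0.048, 0.202, 1.516]

# columns of `vecs` are unit-length eigenvectors
print(np.allclose(np.linalg.norm(vecs, axis=0), 1.0))   # True
```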
How PCA Works
• Step 5: sort the eigenvalues in descending order
and choose the top k eigenvalues and their
corresponding eigenvectors.
• The value of k is application specific.
• Placing these k eigenvectors side by side as
columns gives a matrix called W
• Let's say k = 2
How PCA Works
• Eigen value selected are 1.516 and 0.202
• Eigen Vectors selected are:
( 0.618) (-0.232)
( 0.665) (-0.356)
( 0.420) ( 0.905)
How PCA Works
• Matrix W is : d * k
( 0.618 -0.232)
( 0.665 -0.356)
( 0.420 0.905)
• Transpose of W is WT: k * d
( 0.618 0.665 0.420)
(-0.232 -0.356 0.905)
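Step 5 as a NumPy sketch (the covariance matrix literal is the one printed earlier in the slides; eigenvector signs may be flipped relative to the slide):

```python
import numpy as np

C = np.array([
    [0.616555556, 0.615444444, 0.348266666],
    [0.615444444, 0.716555556, 0.360044444],
    [0.348266666, 0.360044444, 0.433066666],
])
vals, vecs = np.linalg.eigh(C)

order = np.argsort(vals)[::-1]   # indices of eigenvalues, descending
k = 2
W = vecs[:, order[:k]]           # d x k matrix of the top-k eigenvectors

print(W.shape)     # (3, 2): W is d * k
print(W.T.shape)   # (2, 3): WT is k * d
```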
How PCA Works
• Step 6: TransformedData = WT * D, where D is a
data point written as a d * 1 column vector
• So now we have n data points of dimension k
( 0.618 0.665 0.420) (2.5) (4.107)
(-0.232 -0.356 0.905) * (2.4) = (0.647)
(2.3)
k*d d*1 k*1
How PCA Works
• New data set is as follows:
PCA1 PCA2
4.107 0.647
1.111 0.359
4.254 0.539
3.351 0.315
5.129 0.837
4.099 0.406
3.056 0.595
1.728 0.191
2.705 0.621
1.782 0.510
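The whole procedure (steps 1-6) can be sketched end to end. Note that the slides multiply WT by the original (not mean-centered) data points, and this sketch follows that choice, though many PCA formulations project the centered data instead. Exact values may differ from the slide's table in later decimals because the slide's covariance matrix used rounded intermediates, and eigenvector signs may be flipped:

```python
import numpy as np

X = np.array([
    [2.5, 2.4, 2.3], [0.5, 0.7, 0.8], [2.2, 2.9, 2.3], [1.9, 2.2, 1.7],
    [3.1, 3.0, 2.9], [2.3, 2.7, 2.1], [2.0, 1.6, 1.8], [1.0, 1.1, 0.9],
    [1.5, 1.6, 1.7], [1.1, 0.9, 1.2],
])                               # step 1: n = 10 points, d = 3

Xc = X - X.mean(axis=0)          # step 2: subtract the mean
C = np.cov(Xc, rowvar=False)     # step 3: covariance matrix
vals, vecs = np.linalg.eigh(C)   # step 4: eigenvalues and eigenvectors
order = np.argsort(vals)[::-1]   # step 5: sort descending, pick top k
k = 2
W = vecs[:, order[:k]]           # d x k
Y = X @ W                        # step 6: project (slides project raw X)

print(Y.shape)                   # (10, 2): n points of dimension k
```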
Application Areas
• Pattern recognition: PCA can be used to reduce
the dimension of the data, and the transformed
data can then be used for pattern recognition.
• Pattern recognition algorithms run more
efficiently on the transformed data
Application Areas
• Let's say we have 100 students
• Each student is described by 50 features
• We want to classify these students
• Pattern recognition/classification algorithms
such as KNN, K-Means and GA run very slowly
when the dimension of the data points is large
• So we need to select a smaller set of features
that describes the students about as well as the
original 50
Application Areas
• We need to choose the most important features
• Here we can use PCA to determine the most
important features
• Since we have 50 dimensions, PCA gives 50
eigenvalues and eigenvectors
• We can choose the top 10 of them and transform
every data point from 50 dimensions to 10
Application Areas
• Using the transformed data we can build a
classifier, which can then be used on test data
• When a new data point arrives, it is first
transformed using WT, and the reduced-dimension
values are used as input to the classifier
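A minimal sketch of this pipeline, with synthetic data (100 "students" with 50 features each and random labels, all invented for illustration) and a simple 1-nearest-neighbour rule standing in for a full classifier; unlike the slides' worked example, this sketch projects the centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))    # 100 students, 50 features each
y = rng.integers(0, 2, size=100)  # two classes (invented labels)

# "fit" PCA on the training data: top k = 10 eigenvectors of the covariance
mu = X.mean(axis=0)
C = np.cov(X - mu, rowvar=False)
vals, vecs = np.linalg.eigh(C)
W = vecs[:, np.argsort(vals)[::-1][:10]]   # d x k, k = 10

Z = (X - mu) @ W   # training data reduced to 10 dimensions

def classify(x_new):
    # transform the new point with W, then 1-nearest-neighbour in reduced space
    z = (x_new - mu) @ W
    return y[np.argmin(np.linalg.norm(Z - z, axis=1))]

print(Z.shape)                  # (100, 10)
print(classify(X[0]) == y[0])   # a training point recovers its own label
```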
Application Areas
• The following is a typical PCA-based classifier
model:
  data (d*1) -> PCA -> (k*1) -> Classifier -> Class
References
• http://sebastianraschka.com/Articles/2014_pca_step_by_step.html
• http://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf
• http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
