09_PCA
PCA
Introduction
• Principal component analysis (PCA) is a popular technique for analyzing large datasets with many dimensions/features per observation.
• The greater the variance in a direction, the more information it carries, and vice versa.
PCs
• The first few principal components (PCs) account for most of the information contained in the data; the remaining PCs can be discarded.
• The first PC points along the direction of maximum variation in the data.
• Discarding some of the PCs will reduce the dimensionality of the data.
Dimensionality reduction
• Benefits of dimensionality reduction:
• Reduction of computational overhead of subsequent processing.
• Noise reduction because only the most relevant information will be captured
and kept.
• A projection into a subspace of low dimension is useful for visualizing the
data.
PCA algorithm
• Step 1: Remove (subtract) the mean from the data points (the data is centered around the
origin point).
• Step 2: Calculate the covariance matrix for the features in the dataset.
• Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
• Step 4: Sort eigenvalues and their corresponding eigenvectors.
• Step 5: Pick the k largest eigenvalues (i.e. the best PCs) and form a matrix from their eigenvectors.
• Step 6: Transform the centered data matrix into the new lower-dimensional data.
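A minimal NumPy sketch of these six steps (the function and variable names `pca`, `X`, and `k` are illustrative, not from the slides):

```python
import numpy as np

def pca(X, k):
    """Project the (n_samples, n_features) matrix X onto its first k principal components."""
    # Step 1: subtract the mean so the data is centered at the origin
    Xc = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features (divide by N, as in the slides)
    C = np.cov(Xc, rowvar=False, bias=True)
    # Step 3: eigenvalues and eigenvectors (eigh, since C is symmetric)
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 4: sort eigenvalues (and their eigenvectors) in decreasing order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: keep the k best eigenvectors as the transformation matrix
    U = eigvecs[:, :k]
    # Step 6: transform the centered data to the new k-dimensional form
    return Xc @ U, eigvals
```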
Variance and covariance
• Variance refers to the spread of a data set around its mean value.
$\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2}{N}$
• Covariance provides insight into how two variables vary from the mean with respect to
each other.
$\mathrm{cov}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{N}$
• The covariance matrix contains the covariance values between all possible pairs of dimensions; the matrix is always symmetric. Example of a three-dimensional covariance matrix:
$C = \begin{bmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{bmatrix}$
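A small check of these properties with NumPy (the data values below are a hypothetical example, not from the slides):

```python
import numpy as np

# Hypothetical data with three features x, y, z; rows are observations.
data = np.array([[2.0, 4.0, 1.0],
                 [3.0, 5.0, 0.0],
                 [5.0, 9.0, 2.0],
                 [6.0, 8.0, 3.0]])
C = np.cov(data, rowvar=False, bias=True)  # bias=True divides by N, matching the formulas above
print(C.shape)                 # (3, 3)
print(np.allclose(C, C.T))     # True: cov(x, y) == cov(y, x), so C is symmetric
```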
Eigenvalues and eigenvectors
• Eigenvalues measure the amount of the variation explained by each PC.
• The eigenvalue is largest for the first PC and smaller for each subsequent PC.
• Eigenvectors provide the directions in which the data cloud is stretched most.
• Steps of calculating eigenvalues and eigenvectors of matrix 𝐴:
• Roots of $|A - \lambda I| = 0$ are the eigenvalues ($\lambda_1, \lambda_2, \dots$).
• Solve $Au = \lambda u$ for each $\lambda$ to obtain the corresponding eigenvectors ($u_1, u_2, \dots$).
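NumPy performs both steps at once; a short sketch (the matrix below is an arbitrary symmetric example, not from the slides):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)   # roots of |A - lambda*I| = 0 and solutions of A u = lambda u
print(eigvals)                        # 3 and 1 (order not guaranteed)
print(eigvecs)                        # columns are the corresponding eigenvectors
# verify A u = lambda u for the first eigenpair
print(np.allclose(A @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))   # True
```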
Example
• We want to find a transformed dataset of the shown one so that it contains only one feature instead of two.

point   x     y
1       126   78
2       128   80
3       128   82
4       130   82
5       130   84
6       132   86
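To follow along in code, the six points can be set up and centered like this (a verification sketch that reproduces the centering step of the next slide):

```python
import numpy as np

D = np.array([[126, 78],
              [128, 80],
              [128, 82],
              [130, 82],
              [130, 84],
              [132, 86]], dtype=float)
print(D.mean(axis=0))        # [129.  82.] -- the feature means
Dc = D - D.mean(axis=0)      # centered data used in the following slides
print(Dc)                    # rows: (-3,-4), (-1,-2), (-1,0), (1,0), (1,2), (3,4)
```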
Solution – centering data & covariance matrix
• The original data and the centered data (original minus the feature means $\bar{x} = 129$, $\bar{y} = 82$) are used for calculating the covariance matrix. The centered points are $(-3,-4), (-1,-2), (-1,0), (1,0), (1,2), (3,4)$.
• Covariance matrix: $C = \begin{bmatrix} 3.6 & 4.6 \\ 4.6 & 6.6 \end{bmatrix}$
Solution – original and centered data points
[Figure: plot of the original data points and the same points after centering at the origin]
Solution – eigenvalues and eigenvectors

$|A - \lambda I| = 0$

$\left|\begin{bmatrix} 3.6 & 4.6 \\ 4.6 & 6.6 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}\right| = 0$

$\begin{vmatrix} 3.6 - \lambda & 4.6 \\ 4.6 & 6.6 - \lambda \end{vmatrix} = 0$

$\lambda^2 - 10.2\lambda + 2.6 = 0$

• Solving gives the eigenvalues $\lambda_1 = 9.94$ and $\lambda_2 = 0.26$.
• Substituting $\lambda_1 = 9.94$ into $(A - \lambda_1 I)u_1 = 0$ and normalizing gives the first eigenvector $u_1 = \begin{bmatrix} 0.59 \\ 0.81 \end{bmatrix}$.
• By the same procedure, the eigenvector of the second eigenvalue ($\lambda_2 = 0.26$) is $u_2 = \begin{bmatrix} -0.81 \\ 0.59 \end{bmatrix}$.
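The same eigenvalues and eigenvectors can be checked numerically using the rounded covariance matrix computed above (a verification sketch; eigenvector signs returned by `eigh` are arbitrary):

```python
import numpy as np

C = np.array([[3.6, 4.6],
              [4.6, 6.6]])
eigvals, eigvecs = np.linalg.eigh(C)                  # ascending order for symmetric matrices
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # put the largest eigenvalue first
print(np.round(eigvals, 2))    # [9.94 0.26]
print(np.round(eigvecs, 2))    # columns ~ (0.59, 0.81) and (-0.81, 0.59), up to sign
```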
Solution – variations and eigenvectors
• The covariance matrix is symmetric.
• Therefore, the eigenvectors are orthogonal to each
other.
• The angle between the eigenvectors is 90 degrees.
• The eigenvector related to the largest eigenvalue points in the direction of the most variation of the data.
[Figure: the orthogonal eigenvectors $u_1$ and $u_2$ plotted over the centered data]
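A quick numerical check of the orthogonality claim, using the two eigenvectors found above:

```python
import numpy as np

u1 = np.array([0.59, 0.81])
u2 = np.array([-0.81, 0.59])
print(np.dot(u1, u2))   # ~0: the eigenvectors are orthogonal (90 degrees apart)
```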
Solution – PCs matrix
• The eigenvectors are then arranged based on the eigenvalues.
• Since $\lambda_1 = 9.94 > \lambda_2 = 0.26$, PC1 comes first and PC2 next.
• The eigenvector matrix is constructed. The first column is for the eigenvector related to the largest
eigenvalue.
$u = \begin{bmatrix} 0.59 & -0.81 \\ 0.81 & 0.59 \end{bmatrix}$
• The first eigenvector represents PC1 while the second eigenvector represents PC2.
• This matrix is the transformation matrix; it is used to change the original data to a new form.
Solution – data transform
• The matrix ($u$) can be used to transform the centered data ($D$) such that its variables are uncorrelated.
• The transformed data is found by $Du$:
$\begin{bmatrix} -3 & -4 \\ -1 & -2 \\ -1 & 0 \\ 1 & 0 \\ 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0.59 & -0.81 \\ 0.81 & 0.59 \end{bmatrix} = \begin{bmatrix} -5 & 0.1 \\ -2.2 & -0.4 \\ -0.6 & 0.8 \\ 0.6 & -0.8 \\ 2.2 & 0.4 \\ 5 & -0.1 \end{bmatrix}$

• The covariance matrix $C'$ of the transformed data is $C' = \begin{bmatrix} 9.94 & 0 \\ 0 & 0.26 \end{bmatrix}$.
• Note that the sum of variances is equal for both: $C = \begin{bmatrix} 3.6 & 4.6 \\ 4.6 & 6.6 \end{bmatrix}$ gives $3.6 + 6.6 = 10.2$, and $C'$ gives $9.94 + 0.26 = 10.2$.
• The eigenvalues represent the variances of PC1 and PC2.
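A small NumPy check of this transform (assuming the rounded covariance matrix above; the overall sign of each eigenvector, and hence of each transformed column, is arbitrary):

```python
import numpy as np

Dc = np.array([[-3., -4.], [-1., -2.], [-1., 0.],
               [ 1.,  0.], [ 1.,  2.], [ 3.,  4.]])
C = np.array([[3.6, 4.6],
              [4.6, 6.6]])
vals, vecs = np.linalg.eigh(C)
u = vecs[:, ::-1]                  # eigenvector matrix, largest eigenvalue first (~[[0.59, -0.81], [0.81, 0.59]])
T = Dc @ u                         # transformed data
print(np.round(T, 1))              # first column ~ +/-[-5, -2.2, -0.6, 0.6, 2.2, 5]
print(np.round(u.T @ C @ u, 2))    # ~ [[9.94, 0], [0, 0.26]] -- the PCs are uncorrelated
```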
Solution – data transform
• The transformed data is the same as the original centered data rotated (clockwise here) until the eigenvectors point along the original axes.
$\mathrm{var}(\mathrm{PC1}) = \frac{9.94}{9.94 + 0.26} = 97.4\%$

$\mathrm{var}(\mathrm{PC2}) = \frac{0.26}{9.94 + 0.26} = 2.5\%$
Solution – new form of data
• Since PC1 accounts for most of the variance in the data (i.e. most of the information in the data is in PC1), PC2 can simply be ignored because it contains almost no information.
• Now, PC1 represents the new form of the original data.
$\begin{bmatrix} -5 \\ -2.2 \\ -0.6 \\ 0.6 \\ 2.2 \\ 5 \end{bmatrix}$

• We changed the data from 2D to 1D.
Number of PCs to retain
• One of several methods can be used to decide on the number of PCs to keep:
• Select the fewest PCs that together hold a specified amount of the total variance (e.g. 90% of the total variance).
• Select the PCs with variance (eigenvalue) greater than the average of all the eigenvalues.
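Both rules are easy to express over the eigenvalues; a sketch (assuming `eigvals` is already sorted in decreasing order, as produced by the PCA steps earlier):

```python
import numpy as np

def pcs_for_variance(eigvals, threshold=0.90):
    """Smallest number of leading PCs explaining at least `threshold` of the total variance."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratio, threshold)) + 1

def pcs_above_average(eigvals):
    """Number of PCs whose eigenvalue is greater than the average eigenvalue."""
    return int(np.sum(eigvals > np.mean(eigvals)))
```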
Example – selecting PCs
• Suppose we have the shown matrix of PC scores (the data projected onto four PCs, one row per observation), together with the variance (eigenvalue) and variance percentage of each PC.

  PC1      PC2      PC3      PC4
 -0.62    -1.14    -0.10    -0.14
  1.73    -1.32    -0.16     0.14
  0.26    -1.35     0.65     0
 -1.05     0.06    -0.09     0.46
  2.35     1.68    -0.28     0
 -0.59     1.43     0.23     0.16
 -2.47    -0.06    -0.40    -0.17
  0.41     1.51     0.32    -0.24
 -1.58     0.19     0.03    -0.09
  1.58    -1.00    -0.20    -0.13
Variance     2.41     1.45     0.10     0.04
Variance %   60.2     36.2      2.5      1.1

• First method:
• If we want to keep the least number of PCs that account for at least 90% of the total variance, we select only PC1 and PC2, because the sum of their variances is 60.2% + 36.2% = 96.4%.
• Second method:
• PC1 and PC2 will also be selected because their variances are greater than the average of the eigenvalues.
• Average(variance) = (2.41 + 1.45 + 0.1 + 0.04) / 4 = 1
• 2.41 > 1 and 1.45 > 1
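Plugging the variances from this table into both rules (a quick numeric check; the values are taken directly from the table):

```python
import numpy as np

eigvals = np.array([2.41, 1.45, 0.10, 0.04])      # variances of PC1..PC4
ratio = np.cumsum(eigvals) / eigvals.sum()
print(np.round(100 * ratio, 1))                   # ~ [60.2, 96.5, 99.0, 100.0] cumulative variance (%)
print(int(np.searchsorted(ratio, 0.90)) + 1)      # 2 -> PC1 and PC2 reach 90% of the total variance
print(int((eigvals > eigvals.mean()).sum()))      # 2 -> two eigenvalues exceed the average (1.0)
```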