2 - 4 Principal Component Analysis (PCA)
Outline
• Motivations
• Basic concepts
• Preprocessing
• Mathematical background
• Dimension reduction
• Geometrical interpretation
Motivations
• Concerns when dealing with "huge" amounts of data:
• The size of the data:
• The useful information is often "hidden" among hundreds/thousands of variables
• The measurements are often highly correlated with one another (multicollinearity)
• The number of independent variables (degrees of freedom) is much smaller than the number of measurements on hand
• Noise in the measurements:
• Difficulties in distinguishing the noise from the deterministic variations induced by external sources
Motivations
• PCA is a multivariate data analysis method for:
• Exploratory data analysis
• Outlier detection
• Rank reduction
• Graphical clustering
• Classification
PCA: Basic concepts
• Aim of the PCA: transform the original variables (high dimension, strongly correlated) into a set of artificial variables (much lower dimension, independent)
Preprocessing of the data
• Matrix X can be visualized in a coordinate system made up of J orthogonal axes, each representing one of the original J variables
• Each i-th sample is a J-dimensional row vector
• Two-dimensional example with two highly correlated variables:
• First step: translate the data to the center ("mean centering")
(Figure: scatter of the data in the (x1, x2) plane, before and after mean centering)
$x_{ij}^{*} = x_{ij} - \bar{x}_j$, where $\bar{x}_j = \frac{1}{I}\sum_{i=1}^{I} x_{ij}$ is the mean of the j-th variable
• Covariance matrix: $C = \frac{1}{I-1}\, X^{*T} X^{*}$
• The diagonal elements of C are the dispersion related to the j-th variable:
$c_{jj} = s_j^{2} = \frac{1}{I-1}\sum_{i=1}^{I} \left( x_{ij} - \bar{x}_j \right)^{2}$
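As a minimal numerical sketch of this preprocessing step (the array name X, the toy data and the use of NumPy are illustrative assumptions; X holds the I samples as rows and the J variables as columns):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))            # hypothetical data: I = 50 samples, J = 3 variables

    x_mean = X.mean(axis=0)                 # mean of each variable (x_bar_j)
    Xc = X - x_mean                         # mean-centered data X*
    C = Xc.T @ Xc / (X.shape[0] - 1)        # J x J covariance matrix
    # sanity check: np.allclose(C, np.cov(X, rowvar=False)) is True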
PCA – Basic concepts
• Principal Component Analysis is based on the decomposition
of the dataset matrix X
$X = T\, P^{T}$
(I×J) (I×J)(J×J)
• Since the columns of X are mean-centered, $\bar{x}_j = 0 \;\forall\, j = 1, \dots, J \;\Rightarrow\; \bar{t}_m = 0 \;\forall\, m = 1, \dots, J$ (the scores are mean-centered as well)
Mathematical background
• PCA scores and loadings can be related to the computation of
the eigenvalues and eigenvectors of the J×J covariance
matrix
• Remark
• C is a square, symmetric (positive semi-definite) matrix; this leads to the following properties:
• All the eigenvalues are real and non-negative
• All the eigenvectors are orthogonal to each other
Mathematical background
• Starting from the definition $X = T P^{T}$, one can obtain the following relationships
$C = \frac{1}{I-1}\, X^{T} X = \frac{1}{I-1}\, P\, T^{T} T\, P^{T} = P \left( \frac{T^{T} T}{I-1} \right) P^{T} = P\, \Lambda\, P^{T}$
• The latter equation corresponds to the eigendecomposition of the square matrix C
• $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of C
• The m-th element $\lambda_m = \frac{t_m^{T} t_m}{I-1}$ is the variance explained by the m-th score
• P is the J×J square matrix whose m-th column is the eigenvector $p_m$ of C
• it is an orthogonal (rotation) matrix
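A sketch of this step in the same hypothetical NumPy setting (it reuses the covariance matrix C from the preprocessing sketch; np.linalg.eigh is chosen because C is symmetric):

    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest eigenvalue comes first
    Lam = np.diag(eigvals[order])           # Lambda: diagonal matrix of eigenvalues
    P = eigvecs[:, order]                   # loading matrix: eigenvectors of C as columns
    # C is recovered exactly by P @ Lam @ P.T (eigendecomposition of C)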
Mathematical background
• Once the eigenvectors $p_m$ are computed, the corresponding scores can be derived
$X = T P^{T} \;\Rightarrow\; X P = T P^{T} P \;\Rightarrow\; T = X P$
(recall that P is orthogonal, so $P^{T} P = I$)
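In code, the scores then follow directly (Xc and P are the hypothetical centered data and loading matrix from the sketches above):

    T = Xc @ P                              # score matrix: T = X* P
    # the centered data are recovered exactly by T @ P.T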
Mathematical background
• The eigenvalues of the covariance matrix are related to the variance of the
scores
$\lambda_m = \mathrm{var}(t_m) = \frac{t_m^{T} t_m}{I-1}, \qquad m = 1, \dots, J$
• Thus the m-th eigenvalue is the dispersion captured by the m-th score
• The total variance in the original data set is preserved in the T matrix
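A quick numerical check of both statements, still within the hypothetical example above:

    score_var = T.var(axis=0, ddof=1)                    # variance of each score vector
    print(np.allclose(score_var, np.diag(Lam)))          # eigenvalue = variance of the m-th score
    print(np.allclose(score_var.sum(), np.trace(C)))     # total variance is preserved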
Mathematical background
• In summary, one ends up with two matrices
$T = \left[\, t_1 \;\; t_2 \;\; \dots \;\; t_J \,\right], \qquad P = \left[\, p_1 \;\; p_2 \;\; \dots \;\; p_J \,\right]$
(T is I×J with columns $t_m$ of size I×1; P is J×J with columns $p_m$ of size J×1)
• Partitioning both matrices after the first A components,
$X = T P^{T} = \begin{bmatrix} T_A & T_{J-A} \end{bmatrix} \begin{bmatrix} P_A^{T} \\ P_{J-A}^{T} \end{bmatrix} \approx T_A\, P_A^{T}$
(I×J) (I×A)(A×J)
• The last J − A columns of T and of P carry information considered negligible
PCA – Dimension reduction
• Qualitative interpretation of the PCA
$X = T\, P^{T} \approx T_A\, P_A^{T}$
(I×J) (I×J)(J×J) (I×A)(A×J)
• In general: $A \ll J$
• Only part of the information collected in the X matrix is relevant
• Only the first A columns of T (the first A scores) account for most of the data variance
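A sketch of the truncated decomposition for the toy example (the choice A = 2 is purely illustrative):

    A = 2                                   # number of retained principal components (illustrative)
    T_A = T[:, :A]                          # first A scores
    P_A = P[:, :A]                          # first A loadings
    X_hat = T_A @ P_A.T                     # rank-A approximation of the centered data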
PCA – A geometrical interpretation
• Orthogonal projection onto a specific PC results in a score for each sample
• The loading is the unit vector which defines this direction
(Figure: data in the (x1, x2) plane with PC1 drawn through the point cloud; the unit vector "loading 1" lies along PC1, with its first component along x1 and its second component along x2)
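A sketch of this geometric reading with the hypothetical arrays defined earlier (sample index 0 is arbitrary):

    p1 = P[:, 0]                            # loading 1: unit vector along PC1
    sample = Xc[0]                          # one mean-centered sample
    score_on_pc1 = sample @ p1              # orthogonal projection onto PC1 = score of this sample
    # the loading is a unit vector: np.linalg.norm(p1) equals 1 (up to rounding)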
PCA – Working principle – Reduction to 1D
• PCA projects matrix X onto:
• a score vector t1
• a loading vector p1
$X \approx t_1\, p_1^{T}$
(I×J) (I×1)(1×J)
• t1 and p1 are the first principal components
PCA – Working principle – Reduction to 2D
• If two principal components are required, matrix X is approximated by the sum of the outer products of $t_1$ and $p_1$, and of $t_2$ and $p_2$
$X = t_1 p_1^{T} + t_2 p_2^{T} + E$
$X = t_1 p_1^{T} + t_2 p_2^{T} + \dots + t_A p_A^{T} + E$
PCA – Working principle
• The master equation for PCA is then
$X = t_1 p_1^{T} + t_2 p_2^{T} + \dots + t_A p_A^{T} + E$
• or
$X = T_A\, P_A^{T} + E$
(I×J) (I×A)(A×J) (I×J)
• The residual matrix E collects the part of the data not captured by the first A components: $E = X - T_A \cdot P_A^{T} = X - \hat{X}$
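A sketch of the residual computation for the hypothetical truncated model built above:

    E = Xc - T_A @ P_A.T                    # residual matrix: part of the data left out of the model
    # its total variance equals the sum of the discarded eigenvalues:
    # np.allclose(E.var(axis=0, ddof=1).sum(), np.diag(Lam)[A:].sum())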
Estimation of the components
• How many principal components are needed?
• Possible criterion: cumulative variance explained by the first A principal components (see the sketch after this list)
• The retained principal components should explain most of the variance in the data (e.g., 95%)
• Alternative possibilities will be discussed in the case studies
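A minimal sketch of the cumulative-variance criterion for the hypothetical eigenvalues computed earlier (the 95% threshold is the illustrative value quoted above):

    explained = np.diag(Lam) / np.trace(Lam)      # fraction of the total variance per component
    cumulative = np.cumsum(explained)             # cumulative explained variance
    A = int(np.argmax(cumulative >= 0.95)) + 1    # smallest A whose cumulative variance reaches 95%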
PCA – Summary
• PCA projects the original data onto an orthogonal eigenspace of smaller dimension
• The space is described by the first A eigenvectors of the
covariance matrix
• The scores (i.e. the data projections onto the first eigenvectors)
represent a set of independent variables
• New data can be projected onto the PCA model (see the sketch below)
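A sketch of projecting new data onto the fitted PCA model (X_new is a hypothetical array of new samples measured on the same J variables; x_mean and P_A come from the earlier sketches):

    X_new = rng.normal(size=(5, 3))         # hypothetical new samples (same J variables)
    T_new = (X_new - x_mean) @ P_A          # center with the training mean, then project onto the loadings
    X_new_hat = T_new @ P_A.T + x_mean      # back-projection into the original variable space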