2 - 4 Principal Component Analysis (PCA)

Principal component analysis (PCA) is used to reduce the dimensionality of large datasets by transforming correlated variables into a smaller number of uncorrelated variables called principal components. PCA works by computing the eigenvalues and eigenvectors of the covariance matrix of the dataset and using them to change the basis of the data to a new set of orthogonal variables ordered by variability. This transformation projects the data onto a new set of axes such that the first axis captures the largest variability in the data, with each successive axis capturing the next highest variability.


2.1 Data analytics for dimensionality reduction:
Principal Component Analysis (PCA)

Prof. Massimiliano Grosso
University of Cagliari, Italy
[email protected]

GRICU PhD School 2021
Digitalization Tools for the Chemical and Process Industries
March 12, 2021

Outline
• Motivations
• Basic concepts
• Preprocessing
• Mathematical background
• Dimension reduction
• Geometrical interpretation

Motivations
• Concerns when dealing with “huge” amounts of data:
  • The size of the data:
    • The useful information is often «hidden» amongst hundreds/thousands of variables
    • The measurements are often highly correlated with one another (multicollinearity)
    • The number of independent variables (degrees of freedom) is much smaller than the number of measurements on hand
  • Noise in the measurements:
    • Difficulty in distinguishing the noise from the deterministic variations induced by external sources


Motivations
• PCA is a multivariate data analysis method for
  • Exploratory data analysis
  • Outlier detection
  • Rank reduction
  • Graphical clustering
  • Classification
• PCA allows interpretation based on all variables simultaneously, leading to a deeper understanding than is possible by looking at the individual variables alone
• It is typically the first multivariate analysis to be carried out

PCA: Basic concepts
• Aim of the PCA: projection of the original variables onto the principal components (PCs)
  • Original variables: high dimension, strongly correlated
  • Principal components (PCs): artificial variables, much lower dimension, independent

PCA: Basic concepts
• Data must be collected in a matrix X of size I×J (see the sketch below)
  • Column vectors represent the variables (j = 1, …, J): attributes, wavelengths, physical/chemical parameters, etc.
  • Row vectors represent the samples (i = 1, …, I) collected during the experiments
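
The following is a minimal NumPy sketch, not part of the original slides, of what such a matrix X can look like in practice; the variable names and the synthetic data are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    I, J = 50, 4                                  # 50 samples (rows), 4 variables (columns)
    latent = rng.normal(size=(I, 1))              # one hidden factor driving every variable
    weights = rng.normal(size=(1, J))
    X = latent @ weights + 0.1 * rng.normal(size=(I, J))   # strongly correlated columns plus noise
    print(X.shape)                                # (50, 4): row i = sample, column j = variable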

Preprocessing of the data
• Matrix X can be visualized in a coordinate system made up of J orthogonal axes, each representing one of the original J variables
• Each i-th sample is a J-dimensional row vector
• Two-dimensional example with two highly correlated variables (x1, x2)
• First step: translate the data to the center («mean centering»), as in the sketch below

  x*_ij = x_ij − x̄_j ,   where   x̄_j = (1/I) Σ_i x_ij
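
A minimal sketch of the mean-centering step, assuming a NumPy array X of shape (I, J) like the hypothetical one above:

    import numpy as np

    X = np.random.default_rng(0).normal(size=(50, 4))   # placeholder data matrix (I=50, J=4)
    x_bar = X.mean(axis=0)                               # column means x̄_j = (1/I) Σ_i x_ij
    X_star = X - x_bar                                    # x*_ij = x_ij − x̄_j
    print(np.allclose(X_star.mean(axis=0), 0.0))          # True: every column now has zero mean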

Preprocessing of the data
• Mean centering allows one to consider the covariance matrix

  C = (1/(I−1)) X*ᵀ X*

• Indeed, for the element kl:

  c_kl = (1/(I−1)) Σ_i x*_ik x*_il = (1/(I−1)) Σ_i (x_ik − x̄_k)(x_il − x̄_l) = cov(x_k, x_l)

• The diagonal elements of C are the dispersion related to the j-th variable:

  c_jj = (1/(I−1)) Σ_i (x_ij − x̄_j)² = var(x_j)
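
The covariance matrix can then be computed directly from the centered data. The sketch below assumes the 1/(I−1) sample normalization and checks the result against NumPy's np.cov:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))                           # placeholder data matrix
    X_star = X - X.mean(axis=0)                            # mean-centered data
    I = X_star.shape[0]
    C = X_star.T @ X_star / (I - 1)                        # J×J covariance matrix
    print(np.allclose(C, np.cov(X, rowvar=False)))         # True: matches NumPy's covariance
    print(np.allclose(np.diag(C), X.var(axis=0, ddof=1)))  # True: diagonal = variances of the variables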

PCA – Basic concepts
• Principal Component Analysis is based on the decomposition of the dataset matrix X:

  X  =  T Pᵀ
  (I×J)  (I×J)(J×J)

• T: scores matrix, the artificial variables generated by PCA
• P: loadings matrix, the rotation matrix relating the artificial variables to the original ones

PCA – Basic concepts
• Important properties:
  1. The scores are also mean centered:
     x̄*_j = 0 ∀ j = 1, …, J  ⇒  t̄_j = 0 ∀ j = 1, …, J
  2. The column vectors of the score matrix T are orthogonal:
     t_mᵀ t_n = 0 ∀ m ≠ n
     • consequently, the matrix TᵀT is diagonal
  3. The loadings matrix P is orthonormal:
     PᵀP = I  ⇒  P⁻¹ = Pᵀ

Mathematical background
• PCA scores and loadings can be related to the computation of the eigenvalues and eigenvectors of the J×J covariance matrix C
• Remark: C is a square, symmetric matrix; this leads to the following properties:
  • All the eigenvalues are real and non-negative
  • All the eigenvectors are orthogonal to each other


Mathematical background
• Starting from the definition X = T Pᵀ, one can obtain the following relationship:

  C = (1/(I−1)) XᵀX = (1/(I−1)) P TᵀT Pᵀ = P Λ Pᵀ

• The latter equation corresponds to the eigendecomposition of the square matrix C
  • Λ is a diagonal matrix whose diagonal elements are the eigenvalues of C
  • The m-th element λ_m = var(t_m) is the variance explained by the m-th score
  • P is the J×J square matrix whose m-th column is the eigenvector p_m of C
    • it is a rotation matrix
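
A sketch of this eigendecomposition in NumPy (np.linalg.eigh applies because C is symmetric); the data are synthetic, and the eigenvalues are sorted in decreasing order, as prescribed later in the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # placeholder correlated data
    X_star = X - X.mean(axis=0)
    C = X_star.T @ X_star / (X.shape[0] - 1)                  # J×J covariance matrix

    lam, P = np.linalg.eigh(C)            # eigenvalues and eigenvectors (columns of P)
    order = np.argsort(lam)[::-1]         # sort by decreasing eigenvalue (decreasing variance)
    lam, P = lam[order], P[:, order]

    print(np.allclose(C, P @ np.diag(lam) @ P.T))    # C = P Λ Pᵀ
    print(np.allclose(P.T @ P, np.eye(P.shape[1])))  # PᵀP = I: P is orthonormal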

Mathematical background
• Once the eigenvectors p_m are computed, the corresponding scores can be derived:

  X = T Pᵀ  ⇒  X P = T PᵀP  ⇒  T = X P

• In practice, the original variables are projected onto the orthogonal eigenspace defined by the eigenvectors/loadings
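
Continuing the same hypothetical example, the scores follow as T = X P; the check below also confirms that the score columns are orthogonal (TᵀT diagonal), as stated among the basic properties:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # placeholder correlated data
    X_star = X - X.mean(axis=0)
    C = X_star.T @ X_star / (X.shape[0] - 1)
    lam, P = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam, P = lam[order], P[:, order]

    T = X_star @ P                                    # scores: projection onto the eigenvectors
    print(np.allclose(X_star, T @ P.T))               # the decomposition X = T Pᵀ is recovered
    TtT = T.T @ T
    print(np.allclose(TtT, np.diag(np.diag(TtT))))    # TᵀT is diagonal: score columns are orthogonal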


Mathematical background
• The eigenvalues of the covariance matrix are related to the variance of the scores:

  var(t_j) = (1/(I−1)) t_jᵀ t_j = λ_j ,   j = 1, …, J

• Thus the j-th eigenvalue is the dispersion captured by the j-th score
• The total variance in the original data set is preserved in the T matrix:

  Σ_j var(x_j) = Σ_j λ_j
  (sum of the variances of the original variables = sum of the variances of the scores)
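
The sketch below verifies both statements numerically on the same synthetic data: each score variance equals the corresponding eigenvalue, and the total variance is preserved by the rotation:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # placeholder correlated data
    X_star = X - X.mean(axis=0)
    C = X_star.T @ X_star / (X.shape[0] - 1)
    lam, P = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam, P = lam[order], P[:, order]
    T = X_star @ P

    print(np.allclose(T.var(axis=0, ddof=1), lam))              # var(t_j) = λ_j
    print(np.isclose(X.var(axis=0, ddof=1).sum(), lam.sum()))   # Σ_j var(x_j) = Σ_j λ_j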

Mathematical background
• In summary, one ends up with two matrices:

  T = [t_1  t_2  …  t_J]   (I×J, columns of size I×1)
  P = [p_1  p_2  …  p_J]   (J×J, columns of size J×1)

• Scores matrix T: the j-th column represents an independent variable obtained by projecting the data onto the j-th eigenvector
• Loadings matrix P: each column is an eigenvector of the covariance matrix
• Reminder: sort the eigenvectors according to their eigenvalue size (that is, their variance)


PCA – Dimension reduction
• The scores and loadings matrices can be approximated by considering only the first A principal components:

  T ≈ T_A (the first A columns, I×A; the remaining I×(J−A) block is discarded)
  P ≈ P_A (the first A columns, J×A; the remaining J×(J−A) block is discarded)

• The retained columns carry the information considered relevant; the discarded columns carry information considered negligible

PCA – Dimension reduction
• Qualitative interpretation of the PCA:

  X  ≈  T_A P_Aᵀ
  (I×J)  (I×A)(A×J),   in general A ≪ J

• Only part of the information collected in the X matrix is relevant
• Only the first A columns of T (the first scores) take into account most of the data variance (see the sketch below)
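
A sketch of the truncation on synthetic data that is essentially two-dimensional plus noise; keeping A = 2 components (an arbitrary choice for this example) reproduces almost all of the centered data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(50, 5))
    X_star = X - X.mean(axis=0)
    C = X_star.T @ X_star / (X.shape[0] - 1)
    lam, P = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam, P = lam[order], P[:, order]

    A = 2                                     # number of retained principal components
    T_A, P_A = X_star @ P[:, :A], P[:, :A]    # keep only the first A scores and loadings
    X_hat = T_A @ P_A.T                       # rank-A approximation of the centered data
    print(np.linalg.norm(X_star - X_hat) / np.linalg.norm(X_star))   # small: little information lost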


PCA – A geometrical interpretation
• 2D example - Reduction to 1D
• Samples are strongly correlated
• The first principal component PC1 is the eigenvector direction corresponding to maximum variance (largest eigenvalue) in the coordinate space
• The second principal component PC2 is the orthogonal direction capturing the second-largest variance
[Figure: scatter of correlated samples in the (x1, x2) plane with the PC1 and PC2 directions]

PCA – A geometrical interpretation
• Orthogonal projection onto a specific PC results in a score for each sample
• The loading is the unit vector which defines this direction
[Figure: samples in the (x1, x2) plane; loading 1 is the unit vector along PC1, with its first and second components measured along x1 and x2]


PCA – A geometrical interpretation
• The score is the projection of the point onto the first principal component:

  x_i ≈ x̂_i = t_i1 p_1ᵀ

[Figure: point in the (x1, x2) plane and its score t1 along PC1]

PCA – Working principle – Reduction to 1D
• PCA projects matrix X (I×J) into:
  • a score vector t1 (I×1)
  • a loading vector p1 (J×1), written as p1ᵀ (1×J) in the decomposition
• t1 and p1 are the first components


PCA – A geometrical interpretation
• 3D example (a little bit more complicated)
• Points are mostly aligned along the 2D plane defined by the PC1 and PC2 directions
[Figure: 3D scatter with the PC1, PC2 and PC3 directions; the data lie close to the PC1-PC2 plane]

PCA – Working principle – Reduction to 2D
• If two principal components are required, the matrix is formed by the outer products of t1 and p1, and of t2 and p2:

  X = t1 p1ᵀ + t2 p2ᵀ + E

• Matrix X is decomposed into two rank-1 outer products (2 terms) and the residual matrix E


PCA – Working principle
• Successive components are formed by the outer products of t_a and p_a:

  X = t1 p1ᵀ + t2 p2ᵀ + … + tA pAᵀ + E

• Matrix X is decomposed into a set of rank-1 outer products (A terms) and the residual matrix E, as in the sketch below
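
The same approximation written as an explicit sum of rank-1 outer products; this sketch (np.outer builds each t_a p_aᵀ term) reuses the synthetic data of the previous examples:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(50, 5))
    X_star = X - X.mean(axis=0)
    lam, P = np.linalg.eigh(X_star.T @ X_star / (X.shape[0] - 1))
    P = P[:, np.argsort(lam)[::-1]]
    T = X_star @ P

    A = 2
    X_hat = sum(np.outer(T[:, a], P[:, a]) for a in range(A))   # t_1 p_1ᵀ + … + t_A p_Aᵀ
    E = X_star - X_hat                                           # residual matrix
    print(np.allclose(X_star, X_hat + E))                        # X = Σ_a t_a p_aᵀ + E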

PCA – Working principle
• The master equation for PCA is then

  X = t1 p1ᵀ + t2 p2ᵀ + … + tA pAᵀ + E

• or

  X   =   T_A P_Aᵀ   +   E
  (I×J)   (I×A)(A×J)   (I×J)

  original data matrix = score matrix × loading matrix + residual matrix


Estimation of the residuals
• When considering a PCA model with A principal components, one can evaluate the residual E:

  E = X − T_A · P_Aᵀ = X − X̂

Estimation of the components
• How many principal components are needed?
• Possible criterion: cumulative variance explained by the first A principal components
  • The number of principal components is chosen so that it explains most of the variance in the data (e.g., 95%), as in the sketch below
• Alternative possibilities will be discussed in the case studies
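
A sketch of this criterion on synthetic data: the eigenvalues give the explained-variance fractions, and A is taken as the smallest number of components whose cumulative fraction reaches 95%:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(50, 8))
    X_star = X - X.mean(axis=0)
    lam = np.linalg.eigvalsh(X_star.T @ X_star / (X.shape[0] - 1))[::-1]   # eigenvalues, descending

    explained = lam / lam.sum()                    # fraction of total variance per component
    cumulative = np.cumsum(explained)
    A = int(np.argmax(cumulative >= 0.95)) + 1     # smallest A explaining at least 95% of the variance
    print(cumulative.round(3), A)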


PCA to predict new data – Projection of the data onto the principal component space
• Single observations (e.g., new data x_new) can be projected onto the space defined by the PCA model:

  t_new = x_new P_A              x̂_new = x_new P_A P_Aᵀ
  (1×A) = (1×J)(J×A)             (1×J) = (1×J)(J×A)(A×J)
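
A sketch of the projection of a new observation; here x_new is centered with the training means before applying P_A, consistent with the preprocessing step (this centering is an assumption, since the slide formula does not write it explicitly):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(50, 5))
    x_bar = X.mean(axis=0)
    X_star = X - x_bar
    lam, P = np.linalg.eigh(X_star.T @ X_star / (X.shape[0] - 1))
    P_A = P[:, np.argsort(lam)[::-1]][:, :2]        # loadings of the first A = 2 components

    x_new = rng.normal(size=(1, 5))                 # a hypothetical new observation (1×J)
    t_new = (x_new - x_bar) @ P_A                   # scores of the new sample (1×A)
    x_hat_new = t_new @ P_A.T + x_bar               # back-projection onto the PCA model (1×J)
    print(t_new.shape, x_hat_new.shape)             # (1, 2) (1, 5)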

PCA – Summary
• PCA projects the original data onto an orthogonal eigenspace of smaller dimension
• The space is described by the first A eigenvectors of the covariance matrix
• The scores (i.e. the data projections onto the first eigenvectors) represent a set of independent variables
• New data can be projected onto the PCA model


References
1. Brereton, R.G. Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley, 2003.
2. Brereton, R.G. Chemometrics for Pattern Recognition. Wiley, 2009.
3. Jackson, J.E. A User's Guide to Principal Components. Wiley, New York, 1991.
4. Jolliffe, I.T. Principal Component Analysis. Second Edition. Springer, 2002.
5. Jolliffe, I.T., Cadima, J. (2016). Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. A 374: 20150202.
6. Wold, S., Esbensen, K., Geladi, P. (1987). Principal Component Analysis – A tutorial. Chemom. Intell. Lab. Syst. 2, 37-52.
