Chapter6_MV
Analysis (PCA)
Haile M. 1
PCA: Introduction
• It is concerned with explaining the variance-covariance
structure of the data through a few linear combinations
of the original variables.
Its general objectives are:
• data reduction
• interpretation
Cont.
• A member of the general linear model (GLM) family, in which all
analyses are correlational
• The term is often used interchangeably with “factor analysis”;
however, the two methods differ slightly
Why is it needed?
• Too many variables: even scatter plots and the scatter plot
matrix are useful only for a relatively small number of
variables.
Intro. Cont.
• Probably the most widely used and well-known
of the “standard” multivariate methods
• invented by Pearson (1901) and Hotelling
(1933)
• Introductory treatments: Hesketh and Everitt; van Belle, Fisher,
Heagerty, and Lumley (2004); Afifi, May, and Clark (2012).
• More advanced treatments: Mardia, Kent, and Bibby (1979, chap. 8),
and Rencher
Intro. Cont.
• How is the aim achieved?
❖ By transforming to a new set of variables, the principal components,
that are linear combinations of the original variables
Intro. Cont.
The basic goal of PCA is to describe the variation in a set of correlated variables,
X^T = (X1, X2, ..., Xp), in terms of a new set of uncorrelated variables, Y^T = (Y1, Y2, ..., Yp).
❖ The new variables defined by this process, Y1, Y2, ..., Yp, are the principal
components.
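As a minimal sketch of this idea (in Python with NumPy, on synthetic data; the function name, variable names and data are assumptions made for illustration and are not part of the slides), the components can be obtained from the eigenvectors of the sample covariance matrix, and the new variables checked to be uncorrelated:

```python
import numpy as np

def principal_components(X):
    """Return (eigenvalues, eigenvectors, scores) for a data matrix X (n rows, p columns)."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = np.cov(Xc, rowvar=False)            # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh handles symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # column j holds the values of Yj
    return eigvals, eigvecs, scores

# Synthetic correlated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])

eigvals, eigvecs, scores = principal_components(X)
score_cov = np.cov(scores, rowvar=False)
# The off-diagonal entries of score_cov are numerically zero (uncorrelated components)
# and its diagonal equals the eigenvalues (the component variances).
print(np.round(score_cov, 6))
```

The covariance matrix of the scores comes out diagonal, which is exactly the "uncorrelated new variables" property stated above.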
PCA: Transformation
From p original variables: x1,x2,...,xp:
Produce p new variables: y1,y2,...,yp:
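Written out in the standard form (the coefficient symbols a_ij are notation assumed here, since the slide's own formula is not available in this text), each new variable is a linear combination of the original ones:

```latex
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p \\
    &\;\;\vdots \\
y_p &= a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pp}x_p
\end{aligned}
```

Each coefficient vector (a_j1, ..., a_jp) is scaled to unit length. y1 is chosen to have the largest possible variance, and each later yj has the largest variance among combinations that are uncorrelated with the earlier ones.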
Conclusion: In this case, the components Y1 and Y2 could replace the original three
variables with little loss of information.
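A conclusion of this kind is usually backed by the proportion of total variance carried by the leading components. A minimal sketch in Python with NumPy, using a hypothetical 3 x 3 covariance matrix (an assumption for illustration; the matrix used on the slide is not reproduced in this text):

```python
import numpy as np

# Hypothetical covariance matrix of three variables (illustrative values only)
S = np.array([[ 1.0, -2.0, 0.0],
              [-2.0,  5.0, 0.0],
              [ 0.0,  0.0, 2.0]])

eigvals = np.linalg.eigvalsh(S)[::-1]    # component variances, largest first
proportion = eigvals / eigvals.sum()     # share of total variance per component
cumulative = np.cumsum(proportion)       # running share for the first k components

for k, (lam, cum) in enumerate(zip(eigvals, cumulative), start=1):
    print(f"Y{k}: variance = {lam:.3f}, cumulative share = {cum:.3f}")
```

For this matrix the first two components carry about 98% of the total variance, which is the kind of evidence that justifies dropping the third component.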
Example 2: from the textbook (Johnson), page 465
Unlike factor analysis, PCA is not scale invariant.
The eigenvalues and eigenvectors of a covariance matrix
differ from those of the associated correlation matrix.
Usually, a PCA of a covariance matrix is meaningful only if
the variables are expressed in the same units.
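The lack of scale invariance can be seen in a small numerical sketch (Python with NumPy). The 2 x 2 covariance matrix below is hypothetical, chosen so that the second variable has a much larger variance than the first; the helper name leading_component is also an assumption of this sketch:

```python
import numpy as np

# Hypothetical covariance matrix: the second variable is on a much larger scale
S = np.array([[1.0,   4.0],
              [4.0, 100.0]])

d = np.sqrt(np.diag(S))        # standard deviations of the two variables
R = S / np.outer(d, d)         # associated correlation matrix

def leading_component(M):
    """Largest eigenvalue of a symmetric matrix M and its unit eigenvector."""
    vals, vecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]

lam_S, e_S = leading_component(S)
lam_R, e_R = leading_component(R)

# Absolute values are printed because the sign of an eigenvector is arbitrary
print("covariance :", round(lam_S, 2), np.round(np.abs(e_S), 3))
print("correlation:", round(lam_R, 2), np.round(np.abs(e_R), 3))
```

With the covariance matrix, the first component is almost entirely the large-variance variable; with the correlation matrix, both variables receive equal weight.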
Interpretation of the Principal Components
(covariance vs. correlation)
Data Analysis: Practice on climate data with RStudio
[Figure: two scree plots showing the variance of each principal component (PC 1 to 8)]
Step 2: Next, we will compute the principal component scores.
For example, the first principal component can be computed using
the elements of the first eigenvector:
Y1 = 0.53(Rainfall) + 0.23(Max T) - 0.26(Min T) - 0.45(RH) + 0.06(WR)
     + 0.13(ExtMax T) - 0.31(ExtMin T) + 0.53(SS)
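The score calculation is a weighted sum, as in the following minimal Python sketch. The coefficients are the ones quoted above; the dictionary keys are shorthand chosen here for the slide's variable labels, and the observation values are invented for illustration. Whether the inputs should be raw or standardized depends on whether the eigenvector came from the covariance or the correlation matrix, which is not stated here, so standardized inputs are an assumption.

```python
# Coefficients of the first eigenvector, as quoted on the slide.
# The keys are shorthand chosen here for the slide's variable labels.
PC1_LOADINGS = {
    "rainfall": 0.53,
    "max_t": 0.23,
    "min_t": -0.26,
    "rh": -0.45,
    "wr": 0.06,
    "ext_max_t": 0.13,
    "ext_min_t": -0.31,
    "ss": 0.53,
}

def pc1_score(observation):
    """Weighted sum of one observation's values using the PC1 coefficients."""
    return sum(PC1_LOADINGS[name] * observation[name] for name in PC1_LOADINGS)

# Hypothetical standardized values for one observation (made up for illustration)
example = {"rainfall": 1.2, "max_t": -0.5, "min_t": 0.3, "rh": 0.8,
           "wr": -0.1, "ext_max_t": 0.0, "ext_min_t": 0.4, "ss": -1.0}

print(round(pc1_score(example), 3))
```

Repeating this for every observation gives the column of first-component scores; the other components use the remaining eigenvectors in the same way.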
Interpretation of the Principal Components
First Principal Component Analysis - PCA1
Second Principal Component Analysis - PCA2:
Interpreted in the same way as the first component above…