
Chapter6_MV

Chapter Five/Six discusses Principal Components Analysis (PCA), a multivariate statistical technique aimed at reducing the dimensionality of data while preserving variance. It outlines the method's objectives, appropriate conditions for use, and the process of transforming original variables into uncorrelated principal components. Additionally, it emphasizes the importance of eigenvalues and eigenvectors in determining the number of principal components and interpreting their significance in relation to the original variables.

Uploaded by

Tewachew Guadie
Copyright
© All Rights Reserved

Chapter Five/Six: Principal Components Analysis (PCA)

Haile M. 1
PCA: Introduction
• PCA is concerned with explaining the variance-covariance structure of the data through a few linear combinations of the original variables. Its general objectives are:

• data reduction
• interpretation

Cont.
• A member of the general linear model (GLM) family, in which all analyses are correlational
• The term is often used interchangeably with “factor analysis”; however, there are slight differences
• A method of reducing large data sets to a more manageable number of “factors” or “components”
• A method of identifying the most useful variables in a dataset
• A method of identifying and classifying variables according to the common themes, or constructs, that they represent (FA)

How does it arise?
• Too many variables? Even scatter plots and scatterplot matrices are useful only for a relatively small number of variables.

• This brings the scientist to Principal Components Analysis (PCA):
• PCA is a multivariate statistical technique that is often useful for reducing the dimensionality of a collection of unstructured random variables for analysis and interpretation.

• It is a MV technique whose central aim is to reduce the dimensionality of a MV data set while accounting for as much as possible of the original variation present in the data set.
When is PCA appropriate?
• When the data are at the interval or ratio level

• When trying to reduce the number of variables to be used in another GLM technique (e.g., regression, MANOVA)

• When attempting to identify latent constructs that are being measured by observed variables in the absence of a priori theory.

Intro. Cont.
• Probably the most widely used and best-known of the “standard” multivariate methods
• Invented by Pearson (1901) and Hotelling (1933)
• Further reading: Hesketh and Everitt; van Belle, Fisher, Heagerty, and Lumley (2004); Afifi, May, and Clark (2012). More advanced treatments are Mardia, Kent, and Bibby (1979, chap. 8), and Rencher
Intro. Cont.
• How is the aim achieved?
❖ By transforming to a new set of variables, the principal components, which are linear combinations of the original variables.

❖ The new variables are uncorrelated and are ordered so that the first few of them account for most of the variation in all the original variables.

❖ The first principal component of the observations is the linear combination of the original variables whose sample variance is the greatest among all possible such combinations (the vector of coefficients is constrained to have unit length; otherwise the variance could be made arbitrarily large).
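The maximisation in the last point can be illustrated numerically. The sketch below is plain Python with no libraries; the function names and the five-observation data set are hypothetical, made up for illustration. It builds a sample covariance matrix S and finds the unit-length coefficient vector a that maximises the sample variance a'Sa by power iteration, one standard way of finding that vector, used here as an illustration and assuming the largest eigenvalue of S is distinct.

```python
import math


def sample_covariance(rows):
    """Sample covariance matrix S (divisor n - 1) of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]


def combination_variance(a, S):
    """Sample variance of the linear combination y = a'x, i.e. a'Sa."""
    p = len(a)
    return sum(a[j] * S[j][k] * a[k] for j in range(p) for k in range(p))


def first_component(S, max_iter=1000, tol=1e-12):
    """Unit-length a maximising a'Sa, found by power iteration.

    Assumes the largest eigenvalue of S is strictly larger than the others and
    that the equal-weights starting vector is not orthogonal to its eigenvector.
    """
    p = len(S)
    a = [1.0 / math.sqrt(p)] * p
    for _ in range(max_iter):
        b = [sum(S[j][k] * a[k] for k in range(p)) for j in range(p)]   # b = S a
        norm = math.sqrt(sum(x * x for x in b))
        b = [x / norm for x in b]                                       # rescale to unit length
        if max(abs(x - y) for x, y in zip(a, b)) < tol:
            return b
        a = b
    return a


# Hypothetical data: five observations on two variables.
data = [[2.0, 1.0], [4.0, 3.0], [6.0, 4.0], [8.0, 8.0], [10.0, 9.0]]
S = sample_covariance(data)            # [[10.0, 10.5], [10.5, 11.5]]
a1 = first_component(S)
print(a1, combination_variance(a1, S))
```

For this S the maximum variance is about 21.28, compared with 10 for a = (1, 0), which simply picks out the first variable on its own.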
Intro. Cont.
The basic goal of PCA is to describe the variation in a set of correlated variables, X^T = (X1, X2, ..., Xp), in terms of a set of uncorrelated variables, Y^T = (Y1, Y2, ..., Yp).

Notice: the new variables are derived in decreasing order of “importance”.

❖ That is, y1 accounts for as much as possible of the variation in the original data among all linear combinations of x.

❖ Then y2 is chosen to account for as much as possible of the remaining variation, subject to being uncorrelated with y1, and so on.

❖ The new variables defined by this process, y1, y2, ..., yp, are the principal components.

PCA: Transformation
From p original variables x1, x2, ..., xp,
produce p new variables y1, y2, ..., yp, the principal components:

$$
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p\\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p\\
&\;\;\vdots\\
y_p &= a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pp}x_p
\end{aligned}
$$

such that:
• the yi's are uncorrelated;
• y1 explains as much as possible of the original variance in the data set;
• y2 explains as much as possible of the remaining variance;
• etc.
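Stacking the p equations gives a compact matrix form that makes the requirement "the yi's are uncorrelated" precise. This is a standard restatement rather than something taken from the slides: Σ denotes the covariance matrix of x, ai is the ith row of A, and the unit-length condition on each ai is the usual normalisation that keeps the variances finite.

$$
\mathbf{y} = A\mathbf{x},\qquad
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
\vdots & \vdots & & \vdots\\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{pmatrix},\qquad
\operatorname{Cov}(\mathbf{y}) = A\,\Sigma\,A^{T}
$$

$$
\operatorname{Var}(y_i) = \mathbf{a}_i^{T}\Sigma\,\mathbf{a}_i,\qquad
\operatorname{Cov}(y_i, y_k) = \mathbf{a}_i^{T}\Sigma\,\mathbf{a}_k = 0 \;\; (i \neq k),\qquad
\mathbf{a}_i^{T}\mathbf{a}_i = 1
$$

So "uncorrelated" means that A Σ A^T is a diagonal matrix.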
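How do we find the coefficients eij of a principal component?
The solution involves the eigenvalues and eigenvectors of the variance-covariance matrix Σ.
Solution:
Let λ1 through λp denote the eigenvalues of the variance-covariance matrix Σ, ordered so that λ1 is the largest eigenvalue and λp is the smallest. The coefficients ei1, ei2, ..., eip of the ith principal component are the elements of the unit-length eigenvector ei associated with λi, and the variance of the ith principal component equals λi.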

Conclusion: In this case, the components Y1 and Y2 could replace the original three
variables with little loss of information.
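The eigen-solution above can be reproduced without a statistics package. The sketch below uses the cyclic Jacobi rotation method for symmetric matrices, swapped in here purely for illustration (it is not necessarily what R or any other package uses); the function name and the 3 × 3 covariance matrix are hypothetical examples. It returns the eigenvalues λ1 ≥ ... ≥ λp and, for each, the unit eigenvector whose elements are the coefficients of that principal component.

```python
import math


def jacobi_eigen(S, tol=1e-20, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) sorted largest eigenvalue first;
    vectors[i] is the eigenvector belonging to values[i].
    """
    p = len(S)
    A = [list(map(float, row)) for row in S]            # working copy, driven to diagonal form
    V = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    scale = sum(x * x for row in A for x in row)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= tol * scale:
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                if A[k][l] == 0.0:
                    continue
                # Rotation angle chosen so that the (k, l) entry becomes zero.
                theta = 0.5 * math.atan2(2.0 * A[k][l], A[l][l] - A[k][k])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(p):                     # A <- A J (columns k and l)
                    aik, ail = A[i][k], A[i][l]
                    A[i][k], A[i][l] = c * aik - s * ail, s * aik + c * ail
                for j in range(p):                     # A <- J^T A (rows k and l)
                    akj, alj = A[k][j], A[l][j]
                    A[k][j], A[l][j] = c * akj - s * alj, s * akj + c * alj
                for i in range(p):                     # V <- V J accumulates eigenvectors
                    vik, vil = V[i][k], V[i][l]
                    V[i][k], V[i][l] = c * vik - s * vil, s * vik + c * vil
    order = sorted(range(p), key=lambda i: A[i][i], reverse=True)
    values = [A[i][i] for i in order]
    vectors = [[V[r][i] for r in range(p)] for i in order]   # column i of V
    return values, vectors


# Hypothetical 3 x 3 covariance matrix.
Sigma = [[1.0, -2.0, 0.0], [-2.0, 5.0, 0.0], [0.0, 0.0, 2.0]]
values, vectors = jacobi_eigen(Sigma)
total = sum(values)                                   # equals the trace, 1 + 5 + 2 = 8
for lam, e in zip(values, vectors):
    print(f"lambda = {lam:.3f}  share = {lam / total:.3f}  e = {[round(x, 3) for x in e]}")
```

For this matrix the eigenvalues are 3 + 2√2 ≈ 5.83, 2 and 3 − 2√2 ≈ 0.17, so the first two components carry about 98% of the total variance of 8.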
Example 2: from the textbook, page 465 (look at Johnson, page 465/5).

A scree plot displays the variance explained by each component:
• it plots the eigenvalues on the Y axis and the component number on the X axis;
• the recommendation is to retain all components in the descent before the first one on the line where the plot levels off (Cattell, 1966; as cited by Stevens, 2002).
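Outside RStudio, a rough scree plot can also be printed as text. A minimal sketch; the function name and the bar width are arbitrary choices made here, and the input is the set of correlation-matrix eigenvalues from the climate example in this chapter. Deciding where the plot levels off remains a visual judgement.

```python
def text_scree(eigenvalues, width=40):
    """Text scree plot: one row per component, bar length proportional to its eigenvalue."""
    top = max(eigenvalues)
    lines = []
    for i, lam in enumerate(eigenvalues, start=1):
        bar = "#" * round(width * lam / top)
        lines.append(f"{i:>2} | {bar} {lam:g}")
    return "\n".join(lines)


# Correlation-matrix eigenvalues from the climate data example in this chapter.
print(text_scree([2.81, 1.74, 1.07, 0.89, 0.60, 0.40, 0.26, 0.23]))
```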

Unlike factor analysis, PCA is not scale invariant: the eigenvalues and eigenvectors of a covariance matrix differ from those of the associated correlation matrix. Usually, a PCA of a covariance matrix is meaningful only if the variables are expressed in the same units.

You tend to use the covariance matrix when the variable scales are similar and the correlation matrix when the variables are on different scales.
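The scale dependence can be checked on a small case. A sketch under stated assumptions: the 2 × 2 covariance matrix for "rainfall in mm" and "temperature in °C" is hypothetical, and the helper names are made up. Changing rainfall from mm to cm (multiplying it by 0.1) changes the covariance eigenvalues, while the eigenvalues of the correlation matrix stay the same.

```python
import math


def cov_to_corr(S):
    """Correlation matrix R with R[j][k] = S[j][k] / sqrt(S[j][j] * S[k][k])."""
    p = len(S)
    sd = [math.sqrt(S[j][j]) for j in range(p)]
    return [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]


def rescale(S, factors):
    """Covariance matrix after multiplying variable j by factors[j] (a change of units)."""
    p = len(S)
    return [[factors[j] * factors[k] * S[j][k] for k in range(p)] for j in range(p)]


def eig2(S):
    """Eigenvalues of a symmetric 2 x 2 matrix, largest first (closed form)."""
    half_tr = (S[0][0] + S[1][1]) / 2
    disc = math.hypot((S[0][0] - S[1][1]) / 2, S[0][1])
    return half_tr + disc, half_tr - disc


# Hypothetical example: rainfall in mm and temperature in degrees C.
S_mm = [[400.0, 30.0], [30.0, 4.0]]
S_cm = rescale(S_mm, [0.1, 1.0])                              # same data, rainfall in cm

print(eig2(S_mm), eig2(S_cm))                                 # covariance eigenvalues change
print(eig2(cov_to_corr(S_mm)), eig2(cov_to_corr(S_cm)))       # correlation eigenvalues do not
```

Here the first covariance eigenvalue takes about 99.6% of the total variance in mm units but only 87.5% in cm units; the correlation eigenvalues are 1.75 and 0.25 in both cases.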

Interpretation of the Principal Components
(covariance vs. correlation)

Data Analysis: Practice on climate data with RStudio

• Step 1: Use the eigenvalues to determine how many principal components should be considered (the eigenvalues and the proportion of variation explained by each principal component), based on the covariance matrix.

Component  Eigenvalue  Proportion  Cumulative
1            18125.91      0.9841      0.9841
2              213.46      0.0116      0.9957
3               46.63      0.0025      0.9982
4               19.12      0.0010      0.9993
5                7.63      0.0004      0.9997
6                2.85      0.0002      0.9998
7                1.70      0.0001      0.9999
8                1.07      0.0001      1.0000
Total        18418.37      1.0000

Adding all of these eigenvalues gives the total variance, 18418.37. The proportion of variation explained by each eigenvalue is given in the third column. Therefore, about 99.6% of the variation is explained by the first two principal components together.
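The Proportion and Cumulative columns are simple arithmetic on the eigenvalues: each eigenvalue divided by their total, and the running sum of those shares. A minimal Python sketch (the function name is hypothetical) that rebuilds the table from the eigenvalues listed above:

```python
def variance_table(eigenvalues):
    """Total variance and rows of (component, eigenvalue, proportion, cumulative proportion)."""
    total = sum(eigenvalues)
    rows, running = [], 0.0
    for i, lam in enumerate(eigenvalues, start=1):
        running += lam
        rows.append((i, lam, lam / total, running / total))
    return total, rows


# Covariance-matrix eigenvalues from the climate table above.
total, rows = variance_table([18125.91, 213.46, 46.63, 19.12, 7.63, 2.85, 1.70, 1.07])
print(f"total variance = {total:.2f}")
for i, lam, prop, cum in rows:
    print(f"{i}  {lam:10.2f}  {prop:.4f}  {cum:.4f}")
```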
Data Analysis: Practice on climate data with RStudio (continued)

• Step 1 (repeated): the eigenvalues and the proportion of variation explained by each principal component, now based on the correlation matrix.

Component  Eigenvalue  Proportion  Cumulative
1                2.81     0.35125     0.35125
2                1.74     0.21750     0.56875
3                1.07     0.13375     0.70250
4                0.89     0.11125     0.81375
5                0.60     0.07500     0.88875
6                0.40     0.05000     0.93875
7                0.26     0.03250     0.97125
8                0.23     0.02875     1.00000
Total            8.00     1.00000

Adding all of these eigenvalues gives the total variance, 8; for a correlation matrix this always equals the number of variables, because each standardized variable has variance 1. The proportion of variation explained by each eigenvalue is given in the third column. Therefore, about 57% of the variation is explained by the first two principal components together, and about 81% by the first four.
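One common way to turn these tables into a decision is to keep the smallest number of components whose cumulative proportion reaches some target. The sketch below assumes an 80% target purely for illustration (the cut-off is a choice, not a rule from this chapter); the function name is hypothetical, and it is applied to both sets of eigenvalues above.

```python
def components_needed(eigenvalues, target):
    """Smallest k such that the k largest eigenvalues explain at least `target` of the total."""
    lam = sorted(eigenvalues, reverse=True)
    total, running = sum(lam), 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= target:
            return k
    return len(lam)


cov_eigs = [18125.91, 213.46, 46.63, 19.12, 7.63, 2.85, 1.70, 1.07]
corr_eigs = [2.81, 1.74, 1.07, 0.89, 0.60, 0.40, 0.26, 0.23]
print(components_needed(cov_eigs, 0.80), components_needed(corr_eigs, 0.80))
```

With the covariance matrix a single component already passes 80%, whereas the correlation matrix needs four; this is the scale effect discussed earlier in the chapter.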
An Alternative Method
• An alternative way to determine the number of principal components is to look at a scree plot.

[Figure: scree plots for the climate data, “PCA of Covariance Matrix” (left) and “PCA of Correlation Matrix” (right); x axis: component number (1 to 8); y axis: variance of PC.]
Step 2: Next, we compute the principal component scores.
For example, the first principal component can be computed using the elements of the first eigenvector:

$$
Y_1 = 0.53\,\text{Rainfall} + 0.23\,\text{MaxT} - 0.26\,\text{MinT} - 0.45\,\text{RH} + 0.06\,\text{WR} + 0.13\,\text{ExtMaxT} - 0.31\,\text{ExtMinT} + 0.53\,\text{SS}
$$

The magnitudes of the coefficients give the contribution of each variable to that component. However, the magnitudes of the coefficients also depend on the variances of the corresponding variables.
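Computing the scores is a weighted sum of the centred (here also standardized) observations. A sketch under two loudly stated assumptions: the coefficients are those of Y1 above, and they are applied to standardized variables, which is appropriate if the eigenvector came from the correlation matrix (pass standardize=False for a covariance-based eigenvector, so the variables are only centred). The function name and the three observations are made up for illustration; they are not the chapter's climate data.

```python
import math

# Coefficients of Y1 copied from the slide above.
E1 = {"Rainfall": 0.53, "MaxT": 0.23, "MinT": -0.26, "RH": -0.45,
      "WR": 0.06, "ExtMaxT": 0.13, "ExtMinT": -0.31, "SS": 0.53}


def pc_scores(rows, coef, standardize=True):
    """Score sum_j coef[j] * z_j for each row, where z_j is variable j centred at its
    sample mean and, if standardize is True, divided by its sample standard deviation."""
    n = len(rows)
    mean = {v: sum(r[v] for r in rows) / n for v in coef}
    sd = {v: math.sqrt(sum((r[v] - mean[v]) ** 2 for r in rows) / (n - 1)) for v in coef}
    scores = []
    for r in rows:
        z = {v: (r[v] - mean[v]) / (sd[v] if standardize else 1.0) for v in coef}
        scores.append(sum(coef[v] * z[v] for v in coef))
    return scores


# Made-up observations, only to exercise the function.
rows = [
    {"Rainfall": 120.0, "MaxT": 24.1, "MinT": 11.2, "RH": 62.0, "WR": 1.8, "ExtMaxT": 29.0, "ExtMinT": 6.5, "SS": 7.9},
    {"Rainfall": 10.0, "MaxT": 27.3, "MinT": 9.8, "RH": 41.0, "WR": 2.4, "ExtMaxT": 31.2, "ExtMinT": 4.1, "SS": 9.6},
    {"Rainfall": 260.0, "MaxT": 21.0, "MinT": 12.5, "RH": 78.0, "WR": 1.2, "ExtMaxT": 26.4, "ExtMinT": 8.0, "SS": 5.1},
]
print([round(y, 3) for y in pc_scores(rows, E1)])
```

Because every variable is centred first, the scores average to zero.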

Interpretation of the Principal Components

• Step 3: To interpret each component, we must compute the correlations between the original data for each variable and each principal component.

           Principal components
Variables    1    2    3
Rainfall
MaxT
MinT
RH
WR
ExtMaxT
ExtMinT
SS
Total

You will also note that, if you look at the principal components themselves, there is zero correlation between the components.
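The correlations in the Step 3 table do not require computing the scores first: for a component with eigenvalue λi and eigenvector ei, the correlation with variable Xk is eik·√λi / √σkk (with σkk = 1 for a correlation-matrix PCA). A sketch of that formula in plain Python; the 2 × 2 covariance matrix and the helper names are hypothetical examples, not the climate data, and the closed-form 2 × 2 eigen-solver assumes a positive semi-definite input.

```python
import math


def eigen_2x2(S):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    if b == 0:                                     # already diagonal
        pairs = sorted([(a, [1.0, 0.0]), (d, [0.0, 1.0])], key=lambda t: t[0], reverse=True)
        return [t[0] for t in pairs], [t[1] for t in pairs]
    half_tr, disc = (a + d) / 2, math.hypot((a - d) / 2, b)
    values = [half_tr + disc, half_tr - disc]
    vectors = []
    for lam in values:
        v = (b, lam - a)                           # solves (a - lam) v1 + b v2 = 0
        norm = math.hypot(*v)
        vectors.append([v[0] / norm, v[1] / norm])
    return values, vectors


def component_variable_corr(values, vectors, variances):
    """r[i][k] = corr(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)."""
    return [[e[k] * math.sqrt(lam) / math.sqrt(variances[k]) for k in range(len(variances))]
            for lam, e in zip(values, vectors)]


S = [[10.0, 10.5], [10.5, 11.5]]                 # hypothetical covariance matrix
values, vectors = eigen_2x2(S)
r = component_variable_corr(values, vectors, [S[0][0], S[1][1]])
for i, row in enumerate(r, start=1):
    print(f"PC{i}:", [round(x, 3) for x in row])
```

A useful check: for each variable, the squared correlations summed over all components equal 1, because the components together reproduce all of that variable's variance.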
Interpretation of the principal components
• is based on finding which variables are most strongly correlated with each component, i.e., which of these numbers are large in magnitude (farthest from zero in either the positive or the negative direction).
• Which numbers we consider large or small is, of course, a subjective decision.
• We need to determine at what level a correlation value becomes important.
• Here a correlation above 0.5 in absolute value is deemed important. These larger correlations are shown in boldface in the table above.

• We will now interpret the principal component results with respect to the values that we have deemed significant.
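Applying the 0.5 cut-off can be automated once the component-variable correlations are available. A small sketch; the function name is hypothetical, and the correlations and variable names X1 to X3 in the usage lines are made up rather than taken from the climate analysis.

```python
def important_variables(corr, variables, threshold=0.5):
    """For each component, the (variable, r) pairs with |r| > threshold, strongest first.

    corr[i][k] is the correlation between component i + 1 and variables[k].
    """
    result = {}
    for i, row in enumerate(corr, start=1):
        hits = [(v, r) for v, r in zip(variables, row) if abs(r) > threshold]
        result[f"PC{i}"] = sorted(hits, key=lambda t: abs(t[1]), reverse=True)
    return result


# Made-up correlations for three variables and two components.
corr = [[0.81, -0.62, 0.35],
        [0.12, 0.55, -0.71]]
print(important_variables(corr, ["X1", "X2", "X3"]))
```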

First Principal Component (PC1)

• The first principal component is strongly correlated with five of the original variables. It increases with increasing Arts, Health, Transportation, Housing and Recreation scores, which suggests that these five criteria vary together.

• If one increases, the remaining ones tend to increase as well. Furthermore, the first principal component correlates most strongly with Arts.

• In fact, based on the correlation of 0.985, we could state that this principal component is primarily a measure of the Arts.

Second Principal Component (PC2):
Interpret it in the same way as the first component above.

• Exercise: repeat the analysis for the standardized variables, decide how many principal components to retain, and compare the results with the original analysis given above.
