Chapter6_MV

Chapter Five/Six discusses Principal Components Analysis (PCA), a multivariate statistical technique aimed at reducing the dimensionality of data while preserving variance. It outlines the method's objectives, appropriate conditions for use, and the process of transforming original variables into uncorrelated principal components. Additionally, it emphasizes the importance of eigenvalues and eigenvectors in determining the number of principal components and interpreting their significance in relation to the original variables.

Uploaded by

Tewachew Guadie

Chapter Five/Six: Principal Components Analysis (PCA)

Haile M.

PCA: Introduction
• PCA is concerned with explaining the variance-covariance structure of the data through a few linear combinations of the original variables.
Its general objectives are:
• data reduction
• interpretation

PCA: Introduction (cont.)
• A member of the general linear model (GLM) family, in which all analyses are correlational
• The term is often used interchangeably with “factor analysis” (FA); however, there are slight differences
• A method of reducing large data sets into more manageable “factors” or “components”
• A method of identifying the most useful variables in a dataset
• A method of identifying and classifying variables across the common themes, or constructs, that they represent (FA)

How does the need for PCA arise?
• Too many variables: scatter plots and scatter plot matrices are useful only for a relatively small number of variables.
• This brings the scientist to Principal Components Analysis (PCA):
• PCA is a multivariate statistical technique that is often useful in reducing the dimensionality of a collection of unstructured random variables for analysis and interpretation.
• It is a multivariate (MV) technique whose central aim is to reduce the dimensionality of an MV data set while accounting for as much as possible of the original variation present in the data set.

When is PCA appropriate?
• When the data are at the interval or ratio level
• When trying to reduce the number of variables to be used in another GLM technique (e.g., regression, MANOVA)
• When attempting to identify latent constructs that are being measured by observed variables in the absence of a priori theory

Introduction (cont.)
• PCA is probably the most widely used and best known of the “standard” multivariate methods
• Introduced by Pearson (1901) and, independently, by Hotelling (1933)
• Further reading: Hesketh and Everitt; van Belle, Fisher, Heagerty, and Lumley (2004); Afifi, May, and Clark (2012). More advanced treatments are Mardia, Kent, and Bibby (1979, chap. 8), and Rencher

Introduction (cont.)
• How is the aim achieved?
❖ By transforming the original variables into a new set of variables, the principal components (PCs), that are linear combinations of the original variables
❖ The new variables are uncorrelated and are ordered so that the first few of them account for most of the variation in all the original variables.
❖ The first principal component of the observations is the linear combination of the original variables (with a coefficient vector of unit length) whose sample variance is the greatest amongst all possible such combinations

Introduction (cont.)
The basic goal of PCA is to describe the variation in a set of correlated variables, X^T = (X1, X2, ..., Xp), in terms of a new set of uncorrelated variables, Y^T = (Y1, Y2, ..., Yp).

Notice: the new variables are derived in decreasing order of “importance”.
❖ That is, y1 accounts for as much as possible of the variation in the original data amongst all linear combinations of x.
❖ Then y2 is chosen to account for as much as possible of the remaining variation, subject to being uncorrelated with y1, and so on.
❖ The new variables defined by this process, y1, y2, ..., yp, are the principal components.

PCA: Transformation
From p original variables x1, x2, ..., xp, produce p new variables y1, y2, ..., yp:

y1 = a11 x1 + a12 x2 + ... + a1p xp
y2 = a21 x1 + a22 x2 + ... + a2p xp
...
yp = ap1 x1 + ap2 x2 + ... + app xp

The yi's are the principal components, such that:
• the yi's are uncorrelated
• y1 explains as much as possible of the original variance in the data set
• y2 explains as much as possible of the remaining variance
• etc.
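
The same transformation can be written compactly in vector form. The block below is a sketch under the usual textbook conventions: the coefficient vector $\mathbf{a}_i = (a_{i1}, \dots, a_{ip})^{\top}$ and the variance-covariance matrix $\Sigma$ of $\mathbf{x}$ are notation added here, and the unit-length constraint is the standard normalisation rather than something stated on the slide.

```latex
\begin{aligned}
y_i &= \mathbf{a}_i^{\top}\mathbf{x} = a_{i1}x_1 + a_{i2}x_2 + \dots + a_{ip}x_p, \qquad i = 1, \dots, p, \\
\operatorname{Var}(y_i) &= \mathbf{a}_i^{\top}\Sigma\,\mathbf{a}_i, \qquad
\operatorname{Cov}(y_i, y_j) = \mathbf{a}_i^{\top}\Sigma\,\mathbf{a}_j = 0 \quad (i \neq j), \\
\mathbf{a}_1 &= \arg\max_{\mathbf{a}^{\top}\mathbf{a} = 1} \; \mathbf{a}^{\top}\Sigma\,\mathbf{a}.
\end{aligned}
```

Without the constraint $\mathbf{a}^{\top}\mathbf{a} = 1$ the variance could be made arbitrarily large simply by rescaling the coefficients, so "as much variance as possible" would have no maximum.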
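How do we find the coefficients eij for a principal component?
The solution involves the eigenvalues and eigenvectors of the variance-covariance matrix Σ.
Solution:
Let λ1 through λp denote the eigenvalues of the variance-covariance matrix Σ, ordered so that λ1 is the largest and λp is the smallest. The coefficients ei1, ei2, ..., eip of the i-th principal component (written ai1, ..., aip in the transformation above) are the elements of the unit-length eigenvector of Σ that corresponds to λi, and the variance of the i-th component equals λi.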
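
To make the eigenvalue step concrete, here is a minimal sketch for the two-variable case, where the eigenvalues and eigenvectors of a symmetric matrix have a closed form. The function name `eig_sym_2x2` and the covariance matrix used are hypothetical illustrations, not values from the slides; for larger p one would normally rely on a numerical library instead.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigen-decomposition of the symmetric 2x2 matrix [[a, b], [b, c]].

    Returns [(lambda1, e1), (lambda2, e2)] with lambda1 >= lambda2 and
    each eigenvector scaled to unit length.
    """
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(float(a), (1.0, 0.0)), (float(c), (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    result = []
    for lam in (mean + radius, mean - radius):
        v1, v2 = b, lam - a              # solves (a - lam)*v1 + b*v2 = 0
        norm = math.hypot(v1, v2)
        result.append((lam, (v1 / norm, v2 / norm)))
    return result

# Hypothetical covariance matrix: Var(x1) = 5, Var(x2) = 2, Cov(x1, x2) = 2.
(lam1, e1), (lam2, e2) = eig_sym_2x2(5.0, 2.0, 2.0)
print("lambda1 =", lam1, "coefficients of y1:", e1)
print("lambda2 =", lam2, "coefficients of y2:", e2)
```

In this sketch the two eigenvalues add up to the total variance 5 + 2 = 7, and the two coefficient vectors are orthogonal, which is what makes the resulting components uncorrelated.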

Conclusion: In this case, the components Y1 and Y2 could replace the original three variables with little loss of information.
Example 2: see the textbook (Johnson), page 465/5.

A scree plot displays the variance explained by each component:
• It plots the eigenvalues on the Y axis and the component number on the X axis.
• The recommendation is to retain all components in the sharp descent, before the first one on the line where it levels off (Cattell, 1966; as cited by Stevens, 2002).
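
As a rough illustration of what a scree plot shows, the sketch below draws a text-only version with one bar per component. The helper `text_scree` is a hypothetical name, and the eigenvalues are the ones listed in the correlation-matrix table of Step 1 in these slides; a real analysis would use a plotting tool rather than character bars.

```python
def text_scree(eigenvalues, width=40):
    """Return a text scree plot: one bar per component, scaled to the largest eigenvalue."""
    top = max(eigenvalues)
    lines = []
    for k, lam in enumerate(eigenvalues, start=1):
        bar = "#" * max(1, round(width * lam / top))
        lines.append(f"{k:>2} | {bar} {lam:.2f}")
    return "\n".join(lines)

# Eigenvalues of the correlation matrix, taken from the Step 1 table.
correlation_eigenvalues = [2.81, 1.74, 1.07, 0.89, 0.60, 0.40, 0.26, 0.23]
print(text_scree(correlation_eigenvalues))
```

Reading down the bars, the point where their length stops dropping sharply plays the role of the "levelling off" in the recommendation above; where exactly that happens remains a judgment call.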

Unlike factor analysis, PCA is not scale invariant.
• The eigenvalues and eigenvectors of a covariance matrix differ from those of the associated correlation matrix.
• Usually, a PCA of a covariance matrix is meaningful only if the variables are expressed in the same units.
• As a rule of thumb, use the covariance matrix when the variable scales are similar and the correlation matrix when the variables are on different scales.
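
The effect of scale can be seen in a small two-variable sketch. The variances and covariance below are hypothetical numbers chosen so that one variable has a much larger variance than the other; the helper names are illustrative as well.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius

def first_pc_share(a, b, c):
    """Proportion of total variance explained by the first principal component."""
    lam1, lam2 = eigenvalues_2x2(a, b, c)
    return lam1 / (lam1 + lam2)

# Hypothetical data summary: Var(x1) = 100, Var(x2) = 1, Cov(x1, x2) = 6.
var1, var2, cov12 = 100.0, 1.0, 6.0
r = cov12 / math.sqrt(var1 * var2)                 # correlation between x1 and x2

share_cov = first_pc_share(var1, cov12, var2)      # PCA of the covariance matrix
share_cor = first_pc_share(1.0, r, 1.0)            # PCA of the correlation matrix

print(f"covariance matrix:  PC1 explains {share_cov:.1%}")
print(f"correlation matrix: PC1 explains {share_cor:.1%}")
```

With the covariance matrix the first component is dominated by the high-variance variable, while after standardisation both variables contribute equally, so the two analyses can lead to different conclusions about how many components to keep.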

Interpretation of the Principal Components (covariance vs. correlation)

Data Analysis: Practice on climate data with RStudio

• Step 1: Use the eigenvalues to determine how many principal components should be considered (eigenvalues and the proportion of variation explained by the principal components), based on the covariance matrix.

Component   Eigenvalue   Proportion   Cumulative
1           18125.91     0.984        0.9841
2             213.46     0.012        0.996
3              46.63     0.003        0.998
4              19.12     0.001        0.9993
5               7.63     0.0004       0.9997
6               2.85     0.0002       0.9998
7               1.70     0.0001       0.9999
8               1.07     0.0001       1.0000
Total       18418.37     1.000

Adding up all of these eigenvalues gives the total variance, 18418.37. The proportion of variation explained by each eigenvalue is given in the third column. Therefore, about 99.6% of the variation is explained by the first two eigenvalues together.
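
The proportion and cumulative columns follow directly from the eigenvalues. The sketch below redoes that arithmetic for the covariance-matrix eigenvalues listed in the table; only the variable names are additions.

```python
from itertools import accumulate

# Eigenvalues of the covariance matrix, as listed in the Step 1 table.
eigenvalues = [18125.91, 213.46, 46.63, 19.12, 7.63, 2.85, 1.70, 1.07]

total = sum(eigenvalues)                              # total variance
proportion = [lam / total for lam in eigenvalues]     # share of each component
cumulative = list(accumulate(proportion))             # running share

print(f"total variance = {total:.2f}")
for k, (lam, p, c) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"{k}  {lam:10.2f}  {p:.4f}  {c:.4f}")
```

The second entry of `cumulative` is the share explained by the first two components together, which is the 99.6% quoted in the text.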
Data Analysis: Practice on climate data with RStudio (cont.)

• Step 1: Use the eigenvalues to determine how many principal components should be considered (eigenvalues and the proportion of variation explained by the principal components), based on the correlation matrix.

Component   Eigenvalue   Proportion   Cumulative
1           2.81         0.35125      0.35125
2           1.74         0.21750      0.56875
3           1.07         0.13375      0.70250
4           0.89         0.11125      0.81375
5           0.60         0.07500      0.88875
6           0.40         0.05000      0.93875
7           0.26         0.03250      0.97125
8           0.23         0.02875      1.00000
Total       8.00         1.00000

Adding up all of these eigenvalues gives the total variance, 8 (the number of standardized variables). The proportion of variation explained by each eigenvalue is given in the third column. Therefore, about 57% of the variation is explained by the first two eigenvalues together; the first four are needed to explain about 81%.
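
One common way to turn the cumulative column into a decision is to keep the smallest number of components whose cumulative share reaches a chosen target. The sketch below applies this to the correlation-matrix eigenvalues from the table; the function name `components_needed` and the 80% and 50% targets are assumptions made for illustration, not a rule given in the slides.

```python
def components_needed(eigenvalues, threshold):
    """Smallest k such that the first k eigenvalues explain at least `threshold`
    of the total variance. Assumes the eigenvalues are sorted in decreasing order."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)

# Eigenvalues of the correlation matrix, as listed in the table.
correlation_eigenvalues = [2.81, 1.74, 1.07, 0.89, 0.60, 0.40, 0.26, 0.23]

k80 = components_needed(correlation_eigenvalues, 0.80)   # hypothetical 80% target
k50 = components_needed(correlation_eigenvalues, 0.50)   # hypothetical 50% target
print("components for 80% of the variance:", k80)
print("components for 50% of the variance:", k50)
```

Compared with the covariance-matrix analysis, where two components already cover almost all of the variance, the standardized analysis spreads the variance over more components.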
An Alternative Method
• An alternative way to determine the number of principal components is to look at a scree plot.
[Figure: scree plots for the PCA of the covariance matrix and the PCA of the correlation matrix. X axis: component number (1 to 8); Y axis: variance of PC.]

• Step 2: Next, we compute the principal component scores. For example, the first principal component can be computed using the elements of the first eigenvector:

Y1 = 0.53(Rainfall) + 0.23(MaxT) - 0.26(MinT) - 0.45(RH) + 0.06(WR) + 0.13(ExtMaxT) - 0.31(ExtMinT) + 0.53(SS)

The magnitudes of the coefficients give the contributions of each variable to that component. However, the magnitudes of the coefficients also depend on the variances of the corresponding variables.
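
Computing a score amounts to a weighted sum of one observation's values. In the sketch below the coefficients are those of Y1 above, with the variable names from the slides' variable list; the observation itself is hypothetical (made-up standardized values for a single time point), as is the function name `pc_score`.

```python
# Coefficients of the first principal component, as given for Y1.
first_eigenvector = {
    "Rainfall": 0.53, "MaxT": 0.23, "MinT": -0.26, "RH": -0.45,
    "WR": 0.06, "ExtMaxT": 0.13, "ExtMinT": -0.31, "SS": 0.53,
}

# Hypothetical standardized observation (NOT taken from the climate data).
observation = {
    "Rainfall": 1.0, "MaxT": 0.5, "MinT": -0.5, "RH": -1.0,
    "WR": 0.0, "ExtMaxT": 0.0, "ExtMinT": -1.0, "SS": 1.0,
}

def pc_score(coefficients, values):
    """Score of one observation on a component: sum of coefficient * value."""
    return sum(coefficients[name] * values[name] for name in coefficients)

score = pc_score(first_eigenvector, observation)
squared_length = sum(e * e for e in first_eigenvector.values())

print(f"Y1 score = {score:.3f}")
print(f"squared length of the coefficient vector = {squared_length:.4f}")
```

The squared length of the coefficient vector comes out very close to 1, as expected for an eigenvector reported to two decimals.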

Interpretation of the Principal Components

• Step 3: To interpret each component, we must compute the correlations between the original data for each variable and each principal component.
Principal components
Variables 1 2 3
Rainfall
MaxT
MinT
RH
WR
ExtMaxT
ExtMinT
SS
Total
Note also that the principal components themselves are uncorrelated: the correlation between any two components is zero.
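
The correlations needed in Step 3 do not have to be computed from the raw data; they follow from the eigenvalues and eigenvectors. The block below states the standard textbook result. The symbols are assumptions about notation: $e_{ik}$ is the coefficient of variable $X_k$ in component $Y_i$, $\lambda_i$ is the eigenvalue of that component, and $\sigma_{kk}$ is the variance of $X_k$.

```latex
\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}},
\qquad i, k = 1, \dots, p,
\qquad\text{and, for a PCA of the correlation matrix } (\sigma_{kk} = 1),\quad
\rho_{Y_i, X_k} = e_{ik}\sqrt{\lambda_i}.
```

This also explains why a large coefficient alone does not guarantee a strong correlation when the covariance matrix is used: the coefficient is scaled by the standard deviation of the variable.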
Interpretation of the principal components
• Interpretation is based on finding which variables are most strongly correlated with each component, i.e., which of these correlations are large in magnitude (farthest from zero in either the positive or the negative direction).
• Which numbers we consider large or small is, of course, a subjective decision.
• We need to decide at what level a correlation is important.
• Here, a correlation above 0.5 in absolute value is deemed important. These larger correlations are in boldface in the table above.
• We will now interpret the principal component results with respect to the values that we have deemed important.

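
The 0.5 rule can be applied mechanically once the correlations are available. The sketch below is illustrative only: it assumes that the Y1 coefficients reported in Step 2 belong to the correlation-matrix PCA and pairs them with its first eigenvalue (2.81), which the slides do not state explicitly. Under that assumption the correlation between a variable and the component is the coefficient times the square root of the eigenvalue.

```python
import math

# ASSUMPTION: the Y1 coefficients come from the correlation-matrix PCA,
# whose first eigenvalue is 2.81. The slides do not confirm this pairing.
first_eigenvalue = 2.81
first_eigenvector = {
    "Rainfall": 0.53, "MaxT": 0.23, "MinT": -0.26, "RH": -0.45,
    "WR": 0.06, "ExtMaxT": 0.13, "ExtMinT": -0.31, "SS": 0.53,
}
THRESHOLD = 0.5   # cut-off used in the slides

# Correlation between each standardized variable and the first component.
correlations = {
    name: e * math.sqrt(first_eigenvalue)
    for name, e in first_eigenvector.items()
}
important = [name for name, r in correlations.items() if abs(r) > THRESHOLD]

for name, r in correlations.items():
    flag = "*" if name in important else " "
    print(f"{flag} {name:<8} {r:+.3f}")
```

Variables marked with `*` are the ones that would be set in boldface under the 0.5 rule; a different cut-off, or the covariance-matrix version of the formula, could change that list.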
First Principal Component (PC1)

• The first principal component is strongly correlated with five of the original variables. It increases with increasing Arts, Health, Transportation, Housing and Recreation scores, which suggests that these five criteria vary together.
• If one increases, then the remaining ones tend to increase as well.
• Furthermore, the first principal component correlates most strongly with the Arts.
• In fact, based on the correlation of 0.985, we could state that this principal component is primarily a measure of the Arts.

Second Principal Component (PC2)
• Interpreted in the same way as the first.

• Exercise: do the same for the standardized variables, decide how many principal components to retain, and compare the results with the original analysis given above.
