Principal_Components

The document discusses principal component analysis (PCA), focusing on its objectives of data reduction and interpretation through linear combinations of variables. It explains the mathematical foundations of PCA, including the derivation of principal components from covariance matrices, and provides examples of applying PCA to real datasets. Additionally, it covers the standardization of variables and the implications of sample variation in PCA.

1 Introduction

• A principal component analysis is concerned with explaining the variance-covariance
structure of a set of variables through a few linear combinations of these variables.
• Although all p components are required to reproduce the total system variability, much of this
variability can often be accounted for by a small number k of the principal components.
• The original data set, consisting of n measurements on p variables, is then reduced to
a data set consisting of n measurements on k principal components.
• Its general objectives are (1) data reduction and (2) interpretation.

2 Population Principal Components


• Let the random vector X′ = [X1, X2, · · · , Xp] have the covariance matrix Σ
with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λp ≥ 0.

• Consider the linear combinations

    Y1 = a′1 X = a11 X1 + a12 X2 + · · · + a1p Xp
    Y2 = a′2 X = a21 X1 + a22 X2 + · · · + a2p Xp
    ⋮
    Yp = a′p X = ap1 X1 + ap2 X2 + · · · + app Xp

• We have

    Var(Yi) = a′i Σ ai ,          i = 1, 2, · · · , p
    Cov(Yi, Yk) = a′i Σ ak ,      i, k = 1, 2, · · · , p

• The principal components are those uncorrelated linear combinations Y1, Y2, · · · , Yp
whose variances are as large as possible.
• We define
– First principal component: the linear combination a′1 X that maximizes Var(a′1 X)
subject to a′1 a1 = 1.
– Second principal component: the linear combination a′2 X that maximizes
Var(a′2 X) subject to a′2 a2 = 1 and Cov(a′1 X, a′2 X) = 0.
· · ·
– At the ith step, the ith principal component is the linear combination a′i X that
maximizes Var(a′i X) subject to a′i ai = 1 and Cov(a′i X, a′k X) = 0 for k < i.
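This constrained maximization can be checked numerically: among unit vectors a, the leading eigenvector of Σ attains the largest value of a′Σa. A minimal NumPy sketch (the matrix Σ below is an arbitrary illustrative choice, not one from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive semi-definite matrix, used as Sigma for illustration
A = rng.normal(size=(3, 3))
Sigma = A @ A.T

# eigh returns eigenvalues in ascending order; the leading pair is last
eigvals, eigvecs = np.linalg.eigh(Sigma)
e1 = eigvecs[:, -1]
lam1 = eigvals[-1]

# For random unit vectors a, Var(a'X) = a' Sigma a never exceeds lambda_1
for _ in range(1000):
    a = rng.normal(size=3)
    a /= np.linalg.norm(a)          # enforce the constraint a'a = 1
    assert a @ Sigma @ a <= lam1 + 1e-10

# The maximum lambda_1 is attained at a = e1
print(np.isclose(e1 @ Sigma @ e1, lam1))  # → True
```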

Result 1. Let Σ be the covariance matrix associated with the random vector X′ = [X1, · · · , Xp].
Let Σ have the eigenvalue-eigenvector pairs (λ1, e1), · · · , (λp, ep) where λ1 ≥ · · · ≥ λp ≥ 0.
Then the ith principal component is given by

    Yi = e′i X = ei1 X1 + · · · + eip Xp ,   i = 1, · · · , p

With these choices,

    Var(Yi) = e′i Σ ei = λi ,        i = 1, · · · , p
    Cov(Yi, Yk) = e′i Σ ek = 0 ,     i ̸= k

If some λi are equal, the choices of the corresponding coefficient vectors ei, and hence
the Yi, are not unique.
Result 2. Let X′ = [X1, · · · , Xp] have covariance matrix Σ, with eigenvalue-eigenvector
pairs (λ1, e1), · · · , (λp, ep) where λ1 ≥ · · · ≥ λp ≥ 0. Let Y1 = e′1 X, · · · , Yp = e′p X be
the principal components. Then

    σ11 + · · · + σpp = ∑_{i=1}^p Var(Xi) = λ1 + · · · + λp = ∑_{i=1}^p Var(Yi)

Result 3. If Y1 = e′1 X, · · · , Yp = e′p X are the principal components obtained from the
covariance matrix Σ, then

    ρ(Yi, Xk) = eik √λi / √σkk ,   i, k = 1, 2, · · · , p

are the correlation coefficients between the components Yi and the variables Xk. Here
(λ1, e1), · · · , (λp, ep) are the eigenvalue-eigenvector pairs for Σ.
Suppose the random variables X1, X2 and X3 have the covariance matrix

        ⎡  1  −2  0 ⎤
    Σ = ⎢ −2   5  0 ⎥
        ⎣  0   0  2 ⎦

• Find the principal components for the random vector X′ = [X1, X2, X3].
• Determine how many principal components should be used to replace the original
three variables.
• Calculate the correlation coefficients between the principal components and the original
variables.
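The eigen-decomposition for this Σ can be obtained numerically; a minimal NumPy sketch:

```python
import numpy as np

Sigma = np.array([[ 1., -2., 0.],
                  [-2.,  5., 0.],
                  [ 0.,  0., 2.]])

# eigh returns eigenvalues in ascending order; reverse so lambda_1 >= lambda_2 >= lambda_3
eigvals, eigvecs = np.linalg.eigh(Sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(eigvals)                  # lambda_1 = 3 + 2*sqrt(2) ≈ 5.83, lambda_2 = 2, lambda_3 ≈ 0.17
print(eigvals / eigvals.sum())  # proportion of the total variance per component
```

Since λ1 + λ2 = 3 + 2√2 + 2 ≈ 7.83 out of a total variance of 8, the first two components account for roughly 98% of the total variance, so two components could reasonably replace the three original variables.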
Comments
• Suppose X is distributed as Np(µ, Σ).
• The principal components y1 = e′1 x, · · · , yp = e′p x lie in the directions of the
axes of a constant-density ellipsoid.
3 Principal Components Obtained from Standardized Variables

• Principal components may also be obtained for the standardized variables: Z = (V1/2)−1 (X − µ),
where V1/2 is the diagonal standard deviation matrix.

• We have E(Z) = 0 and Cov(Z) = (V1/2)−1 Σ (V1/2)−1 = ρ.

Result 4. The ith principal component of the standardized variables Z′ = [Z1, · · · , Zp] with Cov(Z) = ρ is given by

    Yi = e′i Z = e′i (V1/2)−1 (X − µ) ,   i = 1, 2, · · · , p

Moreover,

    ∑_{i=1}^p Var(Yi) = ∑_{i=1}^p Var(Zi) = p

and

    ρ(Yi, Zk) = eik √λi ,   i, k = 1, 2, · · · , p.

In this case, (λ1, e1), · · · , (λp, ep) are the eigenvalue-eigenvector pairs for ρ, with λ1 ≥ · · · ≥ λp ≥ 0.
Consider the covariance matrix

        ⎡ 1    4  ⎤
    Σ = ⎣ 4   100 ⎦

and the derived correlation matrix

        ⎡  1   0.4 ⎤
    ρ = ⎣ 0.4   1  ⎦

• Find the principal components for Σ and ρ.
• Discussion
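The contrast between the two decompositions can be seen numerically; a short NumPy sketch:

```python
import numpy as np

Sigma = np.array([[1., 4.], [4., 100.]])
rho = np.array([[1., 0.4], [0.4, 1.]])

# Eigenvalues of each matrix, in descending order
lam_S = np.linalg.eigvalsh(Sigma)[::-1]
lam_r = np.linalg.eigvalsh(rho)[::-1]

# Proportion of total variance explained by the first component
print(lam_S[0] / lam_S.sum())   # ≈ 0.992: X2's large variance dominates Sigma
print(lam_r[0] / lam_r.sum())   # 0.7: after standardizing, the first PC explains 70%
```

Working with Σ, the first component is essentially X2 alone because of its much larger variance; working with ρ, the two variables contribute evenly. This is why standardization matters when the measurement scales differ greatly.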

4 Principal Components for Covariance Matrices with Special Structures

• Diagonal covariance matrix:

        ⎡ σ11   0   · · ·   0  ⎤
    Σ = ⎢  0   σ22  · · ·   0  ⎥
        ⎢  ⋮    ⋮     ⋱     ⋮  ⎥
        ⎣  0    0   · · ·  σpp ⎦

• Equal variances and equal covariances:

        ⎡ σ²   ρσ²  · · ·  ρσ² ⎤
    Σ = ⎢ ρσ²  σ²   · · ·  ρσ² ⎥
        ⎢  ⋮    ⋮     ⋱     ⋮  ⎥
        ⎣ ρσ²  ρσ²  · · ·  σ²  ⎦
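For the second structure, a standard result (not derived above) gives λ1 = σ²[1 + (p − 1)ρ] with e′1 = [1/√p, · · · , 1/√p], while the remaining p − 1 eigenvalues all equal σ²(1 − ρ). A quick numerical check, with illustrative values of p, σ², and ρ:

```python
import numpy as np

p, sigma2, r = 5, 2.0, 0.3   # illustrative values, not from the notes

# Equicorrelation matrix: sigma^2 on the diagonal, rho*sigma^2 elsewhere
Sigma = sigma2 * ((1 - r) * np.eye(p) + r * np.ones((p, p)))

lam = np.linalg.eigvalsh(Sigma)[::-1]
print(lam[0])    # sigma^2 * (1 + (p-1)*rho) = 2 * 2.2 = 4.4
print(lam[1:])   # the remaining eigenvalues all equal sigma^2 * (1 - rho) = 1.4
```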

5 Summarizing Sample Variation By Principal Components

• Suppose the data x1, x2, · · · , xn represent n independent drawings from some p-dimensional
population with mean vector µ and covariance matrix Σ. These data yield the sample
mean vector x̄, the sample covariance matrix S, and the sample correlation matrix R.
• The sample principal components are defined as those linear combinations which have
maximum sample variance:
– First sample principal component: the linear combination a′1 xj that maximizes the
sample variance of a′1 xj subject to a′1 a1 = 1.
– Second sample principal component: the linear combination a′2 xj that maximizes the
sample variance of a′2 xj subject to a′2 a2 = 1 and zero sample covariance for the
pairs (a′1 xj, a′2 xj).
– ith sample principal component: the linear combination a′i xj that maximizes the
sample variance of a′i xj subject to a′i ai = 1 and zero sample covariance for the
pairs (a′i xj, a′k xj), k < i.
• If S = {sik} is the p × p sample covariance matrix with eigenvalue-eigenvector pairs
(λ̂1, ê1), · · · , (λ̂p, êp), the ith sample principal component is given by

    ŷi = ê′i x = êi1 x1 + · · · + êip xp ,   i = 1, 2, · · · , p

where λ̂1 ≥ · · · ≥ λ̂p ≥ 0 and x is any observation on the variables X1, · · · , Xp.

• Sample variance(ŷk) = λ̂k ,   k = 1, 2, · · · , p
  Sample covariance(ŷi, ŷk) = 0 ,   i ̸= k

• Total sample variance = ∑_{i=1}^p sii = λ̂1 + · · · + λ̂p

• r(ŷi, xk) = êik √λ̂i / √skk ,   i, k = 1, 2, · · · , p
A census provided information, by tract, on five socioeconomic variables for the Madison,
Wisconsin, area. The data from 61 tracts are listed as follows:

Tract  Total population  Professional degree  Employed age over 16  Government employment  Median home value
       (thousands)       (percent)            (percent)             (percent)              ($100,000)
1      2.67              5.71                 69.02                 30.3                   1.48
2      2.25              4.37                 72.98                 43.3                   1.44
⋮
61     6.48              4.93                 74.23                 20.9                   1.98

These data produced the following summary statistics:

    x̄′ = [4.47, 3.96, 71.42, 26.91, 1.64]

        ⎡  3.397  −1.102    4.306   −2.078   0.027 ⎤
        ⎢ −1.102   9.673   −1.513   10.953   1.203 ⎥
    S = ⎢  4.306  −1.513   55.626  −28.937  −0.044 ⎥
        ⎢ −2.078  10.953  −28.937   89.067   0.957 ⎥
        ⎣  0.027   1.203   −0.044    0.957   0.319 ⎦

Can the sample variation be summarized by one or two principal components?
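The question can be answered by decomposing S; a NumPy sketch (the matrix entries are transcribed from the summary statistics above):

```python
import numpy as np

S = np.array([
    [ 3.397,  -1.102,   4.306,  -2.078,  0.027],
    [-1.102,   9.673,  -1.513,  10.953,  1.203],
    [ 4.306,  -1.513,  55.626, -28.937, -0.044],
    [-2.078,  10.953, -28.937,  89.067,  0.957],
    [ 0.027,   1.203,  -0.044,   0.957,  0.319],
])

lam = np.linalg.eigvalsh(S)[::-1]    # sample eigenvalues, descending
prop = np.cumsum(lam) / lam.sum()    # cumulative proportion of total sample variance

print(lam)
print(prop)   # the first one or two entries answer the question
```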

6 Standardizing the Sample Principal Components

• Standardization is accomplished by

    zj = D−1/2 (xj − x̄) ,   j = 1, 2, · · · , n

where D1/2 is the diagonal matrix of sample standard deviations.
• Sample mean of the standardized data: z̄ = 0; sample covariance matrix:
Sz = (1/(n − 1)) Z′Z = R, where Z = [z1, · · · , zn]′ has the standardized observations as rows.
• The ith sample principal component is

    ŷi = ê′i z = êi1 z1 + êi2 z2 + · · · + êip zp ,   i = 1, 2, · · · , p

where (λ̂i, êi) is the ith eigenvalue-eigenvector pair of R with λ̂1 ≥ · · · ≥ λ̂p ≥ 0.

• Sample variance(ŷi) = λ̂i ,   i = 1, 2, · · · , p
  Sample covariance(ŷi, ŷk) = 0 ,   i ̸= k
• Total (standardized) sample variance = tr(R) = p = λ̂1 + · · · + λ̂p

• r(ŷi, zk) = êik √λ̂i ,   i, k = 1, 2, · · · , p
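The identity Sz = R can be verified on any data set; a small sketch with synthetic data (the data themselves are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 10.0, 100.0])  # very different scales

# Standardize: z_j = D^{-1/2} (x_j - xbar)
xbar = X.mean(axis=0)
d = X.std(axis=0, ddof=1)                 # sample standard deviations
Z = (X - xbar) / d

Sz = Z.T @ Z / (n - 1)                    # sample covariance of the standardized data
R = np.corrcoef(X, rowvar=False)          # sample correlation matrix of X

print(np.allclose(Sz, R))                 # → True

lam = np.linalg.eigvalsh(R)[::-1]
print(np.isclose(lam.sum(), p))           # → True: total standardized variance = p
```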

The weekly rates of return for five stocks (JP Morgan, Citibank, Wells Fargo, Royal Dutch
Shell, and ExxonMobil) listed on the New York Stock Exchange were determined for the period
January 2004 through December 2005. The weekly rate of return is defined as (current Friday
closing price − previous Friday closing price)/(previous Friday closing price), adjusted for stock
splits and dividends. The data are listed in the following table:

week   JP Morgan   Citibank   Wells Fargo   Royal Dutch Shell   Exxon Mobil
1       0.01303    -0.00784    -0.00319         -0.04477          0.00522
2       0.00849     0.01669    -0.00621          0.01196          0.01349
⋮
103    -0.01279    -0.01437    -0.01874         -0.00498         -0.01637

Find the sample principal components for these data and try to interpret them.

7 Large Sample Inferences

• Let Λ be the diagonal matrix of eigenvalues λ1, · · · , λp of Σ. Then, approximately,

    √n (λ̂ − λ) ∼ Np(0, 2Λ²)

• Let êi be the eigenvector associated with λ̂i. Then, approximately,

    √n (êi − ei) ∼ Np(0, Ei)

where

    Ei = λi ∑_{k=1, k̸=i}^p [ λk / (λk − λi)² ] ek e′k

• Each λ̂i is distributed independently of the elements of the associated êi.
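One immediate use of the first result, not spelled out above, is an approximate large-sample confidence interval for an individual λi, since λ̂i is approximately N(λi, 2λi²/n). A sketch (the values of n and λ̂1 below are hypothetical):

```python
import math

# Hypothetical sample quantities for illustration
n = 100            # sample size
lam_hat = 5.2      # observed first sample eigenvalue
z = 1.96           # standard normal quantile for a 95% interval

# From sqrt(n)(lam_hat - lam) ~ N(0, 2*lam^2), an approximate 95% CI for lambda_1 is
#   lam_hat / (1 + z*sqrt(2/n)) <= lambda_1 <= lam_hat / (1 - z*sqrt(2/n))
half = z * math.sqrt(2.0 / n)
lower = lam_hat / (1 + half)
upper = lam_hat / (1 - half)
print(lower, upper)
```

Note the interval is not symmetric about λ̂1, because the asymptotic standard deviation of λ̂1 depends on the unknown λ1 itself.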
