PCA
Chris Ding
Department of Computer Science and Engineering
University of Texas at Arlington
PCA is the procedure of finding intrinsic
dimensions of the data
1. Data analysis
2. Data reduction
3. Data visualization
• Face recognition
• Handwritten digit recognition
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification
Use PCA to approximate an image (a data matrix)
[Figure: a 112 x 92 original image and its PCA reconstructions with k = 1, 2, 4, 6]
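A minimal sketch of this kind of approximation, assuming NumPy and a random 112 x 92 array standing in for the face image (any grayscale image array would do); columns are treated as samples and the reconstruction uses the first k principal directions:

```python
import numpy as np

# Hypothetical stand-in for the 112 x 92 image; substitute a real grayscale array.
rng = np.random.default_rng(0)
A = rng.random((112, 92))

def pca_approx(X, k):
    """Rank-k PCA approximation of X, treating columns as samples."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                               # center the columns
    # Left singular vectors of Xc are the eigenvectors of S = Xc Xc^T / n.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    G = U[:, :k]                                # first k principal directions
    return mean + G @ (G.T @ Xc)                # project, then reconstruct

for k in (1, 2, 4, 6):
    err = np.linalg.norm(A - pca_approx(A, k)) / np.linalg.norm(A)
    print(f"k={k}: relative reconstruction error {err:.3f}")
```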
Use PCA to approximate a set of images
[Figure: original images and their PCA reconstructions with k = 1, 2, 4, 6]
Display the characters in 2-dim space
$x \;\rightarrow\; \tilde x = G^T x = (a_1^T x,\; a_2^T x)^T$,  where $G = (a_1, a_2)$
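A sketch of producing such a 2-dim display, assuming NumPy and synthetic 64-dimensional vectors in place of the character images; the two coordinates of each point are $a_1^T x$ and $a_2^T x$ for the top two eigenvectors of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 200))          # 200 samples in 64-dim space (hypothetical)

Xc = X - X.mean(axis=1, keepdims=True)      # center the data
S = Xc @ Xc.T / Xc.shape[1]                 # covariance matrix
evals, evecs = np.linalg.eigh(S)            # ascending eigenvalues
G = evecs[:, ::-1][:, :2]                   # G = (a1, a2): top-2 eigenvectors
Z = G.T @ Xc                                # 2 x n coordinates (a1^T x, a2^T x)
print(Z.shape)                              # (2, 200) -- ready for a scatter plot
```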
Application of feature reduction
Intrinsic dimensions of the data
Samples of children: hours of study, hours on internet, vs. their age
[Figure: scatter plot of hours on internet vs. children's age]
PCA is the procedure of finding intrinsic dimensions of the data
Find lines that best represent the data
PCA is a rotation of space to proper directions (principal directions)
Geometric picture of principal components (PCs)
[Figure: the first principal direction z1 in the (children's age, hours on internet) plane]
PCA Step 1: find a line that best represents the data
[Figure: candidate lines through the (children's age, hours on internet) scatter; the best line is the one with the smallest projection errors from the points to the line]
PCA from maximum variance
[Figure: two projection directions for the same data, one giving smaller variance and one giving larger variance of the projected points]
Larger spread-out = larger variance
What is Principal Component Analysis?
$z_1 = a_1^T x$ along the first principal direction: $\mathrm{var}[z_1]$ is maximized
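A quick numerical check of this maximum-variance property, as a sketch assuming NumPy and synthetic data: no random unit direction gives a larger projected variance than the top eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 500)) * np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T / Xc.shape[1]

a1 = np.linalg.eigh(S)[1][:, -1]            # top eigenvector of S
var_a1 = a1 @ S @ a1                        # variance of z1 = a1^T x

# Compare against many random unit directions: none should beat a1.
dirs = rng.standard_normal((5, 1000))
dirs /= np.linalg.norm(dirs, axis=0)
print(var_a1 >= (dirs * (S @ dirs)).sum(axis=0).max())   # True
```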
Principal Component as maximum variance
Because
$\mathrm{var}[z_1] = E[(z_1 - \bar z_1)^2] = \frac{1}{n}\sum_{i=1}^n \left(a_1^T x_i - a_1^T \bar x\right)^2$
$= \frac{1}{n}\sum_{i=1}^n a_1^T (x_i - \bar x)(x_i - \bar x)^T a_1 = a_1^T S a_1$
where
$S = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)^T$
is the covariance matrix and $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$ is the mean.
In the following, we assume the data is centered: $\bar x = 0$.
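A numerical check of the identity $\mathrm{var}[z_1] = a_1^T S a_1$ for an arbitrary unit direction, assuming NumPy and synthetic data (both the sample variance and $S$ use the $1/n$ normalization above):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 1000))              # columns x_i: 4-dim samples (hypothetical)
xbar = x.mean(axis=1, keepdims=True)
S = (x - xbar) @ (x - xbar).T / x.shape[1]      # S = (1/n) sum_i (x_i - xbar)(x_i - xbar)^T

a1 = rng.standard_normal(4)
a1 /= np.linalg.norm(a1)                        # any unit direction a1
z1 = a1 @ x                                     # z1_i = a1^T x_i
print(np.allclose(z1.var(), a1 @ S @ a1))       # var[z1] = a1^T S a1 -> True
```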
Principal Component as maximum variance
To find $a_1$ that maximizes $\mathrm{var}[z_1] = a_1^T S a_1$ subject to $a_1^T a_1 = 1$, introduce a Lagrange multiplier $\lambda$:
$L = a_1^T S a_1 - \lambda (a_1^T a_1 - 1), \qquad \frac{\partial L}{\partial a_1} = 2 S a_1 - 2 \lambda a_1 = 0$
Therefore $a_1$ is an eigenvector of $S$: $S a_1 = \lambda a_1$, and $\mathrm{var}[z_1] = a_1^T S a_1 = \lambda$ is largest for the top eigenvalue $\lambda_1$.
Algebraic derivation of PCs
To find the second principal direction $a_2$: maximize $a_2^T S a_2$ subject to $a_2^T a_2 = 1$ and $a_2^T a_1 = 0$.
$L = a_2^T S a_2 - \lambda (a_2^T a_2 - 1) - \phi\, a_2^T a_1$
$\frac{\partial L}{\partial a_2} = 2 S a_2 - 2 \lambda a_2 - \phi a_1 = 0 \;\Rightarrow\; \phi = 0$
$S a_2 = \lambda a_2 \quad\text{and}\quad a_2^T S a_2 = \lambda_2$
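A sketch (NumPy, synthetic data) confirming that the eigenvectors of $S$ satisfy the stationarity conditions derived above: $S a_1 = \lambda_1 a_1$, $S a_2 = \lambda_2 a_2$, $a_1^T a_2 = 0$, and $a_2^T S a_2 = \lambda_2$.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 300))
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T / Xc.shape[1]

lam, A = np.linalg.eigh(S)                  # ascending eigenvalues, orthonormal columns
a1, lam1 = A[:, -1], lam[-1]                # largest eigenpair
a2, lam2 = A[:, -2], lam[-2]                # second-largest eigenpair

print(np.allclose(S @ a1, lam1 * a1))       # S a1 = lambda_1 a1
print(np.allclose(S @ a2, lam2 * a2))       # S a2 = lambda_2 a2
print(np.isclose(a1 @ a2, 0.0))             # a2 is orthogonal to a1 (phi = 0)
print(np.isclose(a2 @ S @ a2, lam2))        # a2^T S a2 = lambda_2
```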
Algebraic derivation of PCs
$U = (u_1, u_2, \ldots, u_k)$: the matrix of the first $k$ eigenvectors of $S$
$x \in \mathbb{R}^p \;\rightarrow\; \tilde x = G^T x = (a_1^T x, \ldots, a_d^T x)^T$ in the PCA subspace
Algebraic derivation of PCs
Assume $\bar x = 0$ (centered data).
Form the matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{p \times n}$
then $S = \frac{1}{n} X X^T$
After you
1. Compute the covariance matrix S
2. Obtain the first k eigenvectors of S as (u_1, ..., u_k)
show that you can obtain (v_1, ..., v_k), the first k eigenvectors of the kernel (Gram) matrix $X^T X$, by doing matrix-vector multiplications. There is no need to compute eigenvectors of the kernel (Gram) matrix directly.
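A sketch of this shortcut, assuming NumPy and a centered data matrix X: if $u_i$ is an eigenvector of $X X^T$ (equivalently of $S$, since the $1/n$ factor only rescales eigenvalues) with a nonzero eigenvalue, then $v_i = X^T u_i$, after normalization, is an eigenvector of the Gram matrix $X^T X$, so one matrix-vector product per component is enough.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((10, 40))           # hypothetical data matrix, p=10, n=40
X -= X.mean(axis=1, keepdims=True)          # center

S = X @ X.T / X.shape[1]                    # covariance matrix     (p x p)
K = X.T @ X                                 # kernel (Gram) matrix  (n x n)

k = 3
u = np.linalg.eigh(S)[1][:, ::-1][:, :k]    # first k eigenvectors of S

v = X.T @ u                                 # matrix-vector products: v_i = X^T u_i
v /= np.linalg.norm(v, axis=0)              # normalize each column

Kv = K @ v
mu = np.sum(v * Kv, axis=0)                 # Rayleigh quotients = eigenvalues of K
print(np.allclose(Kv, v * mu))              # K v_i = mu_i v_i, no eigensolve on K needed
```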
Reduction and Reconstruction
Dimension reduction:  $X \in \mathbb{R}^{p \times n} \;\rightarrow\; Y = G^T X \in \mathbb{R}^{d \times n}$
Reconstruction:  $Y = G^T X \in \mathbb{R}^{d \times n} \;\rightarrow\; \tilde X = G (G^T X) \in \mathbb{R}^{p \times n}$
where $G \in \mathbb{R}^{p \times d}$ and $G^T \in \mathbb{R}^{d \times p}$
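These two maps transcribe directly into NumPy; a sketch with synthetic centered data, where $G$ holds the first $d$ eigenvectors of $S$:

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, d = 8, 100, 3
X = rng.standard_normal((p, n))
X -= X.mean(axis=1, keepdims=True)          # centered data, p x n

S = X @ X.T / n
G = np.linalg.eigh(S)[1][:, ::-1][:, :d]    # G in R^{p x d}: first d eigenvectors

Y = G.T @ X                                 # dimension reduction:  d x n
X_rec = G @ Y                               # reconstruction:       p x n
print(Y.shape, X_rec.shape)                 # (3, 100) (8, 100)
```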
Optimality property of PCA
Main theoretical result:
The matrix G consisting of the first d eigenvectors of the
covariance matrix S solves the following min problem:
$\min_{G \in \mathbb{R}^{p \times d}} \; \| X - G (G^T X) \|_F^2 \quad \text{subject to} \quad G^T G = I_d$
$\| X - \tilde X \|_F^2$ : reconstruction error
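A small numerical illustration of this optimality, as a sketch (NumPy, synthetic data, and a random orthonormal $G$ as the competitor): the PCA choice of $G$ never gives a larger reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, d = 8, 200, 3
X = rng.standard_normal((p, n)) * np.linspace(3.0, 0.2, p)[:, None]
X -= X.mean(axis=1, keepdims=True)

def recon_error(G):
    return np.linalg.norm(X - G @ (G.T @ X)) ** 2   # squared Frobenius norm

S = X @ X.T / n
G_pca = np.linalg.eigh(S)[1][:, ::-1][:, :d]        # first d eigenvectors of S

G_rand = np.linalg.qr(rng.standard_normal((p, d)))[0]   # some other orthonormal G
print(recon_error(G_pca) <= recon_error(G_rand))        # True
```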
Linear projections will not detect the pattern.
Nonlinear PCA using Kernels
$\Phi: x \rightarrow \Phi(x)$
• Computational efficiency: apply the kernel trick.
  – Requires that PCA can be rewritten in terms of dot products.
$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$   (more on kernels later)
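One common choice of kernel is the Gaussian (RBF) kernel; the sketch below (NumPy, hypothetical 2-D points, an arbitrary gamma) builds the Gram matrix $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ without ever forming $\Phi$ explicitly.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the columns x_i of X."""
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    return np.exp(-gamma * d2)

rng = np.random.default_rng(8)
X = rng.standard_normal((2, 50))             # 50 two-dimensional points (hypothetical)
K = rbf_kernel(X, gamma=0.5)
print(K.shape, np.allclose(K, K.T))          # (50, 50) True -- symmetric Gram matrix
```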
Nonlinear PCA using Kernels
A principal direction in feature space is a linear combination of the mapped data points:
$v = \sum_{i=1}^n \alpha_i \Phi(x_i) = X \alpha$,  where $X = [\Phi(x_1), \ldots, \Phi(x_n)]$ and $v$ is an eigenvector of $\frac{1}{n} X X^T$
The projection of a point onto $v$ requires only kernel evaluations:
$\Phi(x)^T v = \sum_i \alpha_i \Phi(x)^T \Phi(x_i) = \sum_i \alpha_i K(x, x_i)$
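A bare-bones sketch of this projection rule (NumPy, hypothetical data, an RBF kernel; centering of the kernel matrix in feature space is omitted for brevity): the coefficients $\alpha$ come from an eigenvector of $K$, rescaled so that $\|v\| = 1$, and a new point is projected using only kernel evaluations.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

rng = np.random.default_rng(9)
X = rng.standard_normal((2, 60))             # training points x_1..x_60 as columns (hypothetical)

# Gram matrix K_ij = K(x_i, x_j)
K = np.array([[rbf(X[:, i], X[:, j]) for j in range(60)] for i in range(60)])
lam, alpha = np.linalg.eigh(K)               # eigenvectors of K give the coefficients alpha
alpha_1 = alpha[:, -1] / np.sqrt(lam[-1])    # v = sum_i alpha_i Phi(x_i) with ||v|| = 1

# Project a new point: Phi(x) . v = sum_i alpha_i K(x, x_i)
x_new = rng.standard_normal(2)
z = sum(alpha_1[i] * rbf(x_new, X[:, i]) for i in range(60))
print(z)
```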
Collapse the data points of each class onto their class center (assuming each class is roughly spherical).
These K class centers span the subspace of principal component analysis!
(This can be rigorously proved mathematically.)
An early major advance connecting PCA and K-means (Zha, He, Ding, et al., NIPS 2000)
(Ding & He, ICML 2004)
Solution of K-means is represented by the cluster indicator matrix
$H = (h_1, \ldots, h_K), \qquad h_k = (0, \ldots, 0, \underbrace{1, \ldots, 1}_{n_k}, 0, \ldots, 0)^T / \sqrt{n_k}$
with cluster sizes $n_1, n_2, \ldots, n_K$, so that the columns are orthonormal: $H^T H = I$ (and $Q^T Q = I$ for a rotated indicator matrix $Q$).
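A small sketch (NumPy, a hypothetical cluster assignment) of this indicator matrix and its orthonormality:

```python
import numpy as np

labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])   # hypothetical assignment of n=9 points to K=3 clusters
n, K = len(labels), labels.max() + 1

H = np.zeros((n, K))
for k in range(K):
    members = labels == k
    H[members, k] = 1.0 / np.sqrt(members.sum())  # column k: 1/sqrt(n_k) on cluster k's points

print(np.allclose(H.T @ H, np.eye(K)))            # orthonormal columns: H^T H = I
```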