
PCA vs PLS

Maya Hristakeva

University of California, Santa Cruz

May 13, 2009



Outline

1 Linear Regression

2 Principal Component Analysis

3 Partial Least Squares




Setup

Data matrix (instances as columns):

X = [x_1 \ldots x_T] \in \mathbb{R}^{N \times T}

Reference values:

y = [y_1 \ldots y_T]^T \in \mathbb{R}^{T \times 1}

Goal: minimize square loss

\min_w \frac{1}{2} \sum_{i=1}^{T} (x_i^T w - y_i)^2 \equiv \min_w \frac{1}{2} \|X^T w - y\|^2




Variance and Covariance


Expectation of X = [x_1 \ldots x_T]: E[X] = \frac{1}{T} \sum_{i=1}^{T} x_i

Variance of X:

\mathrm{var}(X) = \mathrm{cov}(X, X) = \frac{1}{T} \sum_{i=1}^{T} (x_i - E[X])(x_i - E[X])^T

Covariance of X = [x_1 \ldots x_T] and Z = [z_1 \ldots z_T]:

\mathrm{cov}(X, Z) = \frac{1}{T} \sum_{i=1}^{T} (x_i - E[X])(z_i - E[Z])^T

In this presentation, we assume that X and y are mean-centered:

E[X] = 0 and E[y] = 0
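As a concrete reference for these definitions, below is a minimal NumPy sketch (not part of the original slides) that builds a mean-centered synthetic data set and evaluates the variance and covariance; the dimensions, random data, and variable names are illustrative and are reused in the later sketches.

```python
import numpy as np

# Illustrative setup: X is N x T with instances as columns, y has length T.
N, T = 5, 100
rng = np.random.default_rng(0)
X = rng.standard_normal((N, T))
X -= X.mean(axis=1, keepdims=True)        # mean-center X so that E[X] = 0
y = X.T @ rng.standard_normal(N) + 0.1 * rng.standard_normal(T)
y -= y.mean()                             # mean-center y so that E[y] = 0

var_X = (X @ X.T) / T                     # var(X) = cov(X, X), an N x N matrix
cov_Xy = (X @ y) / T                      # covariance between X and y, length N
```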






Linear Regression
Least Squares optimization problem:

\min_w L(w) = \min_w \frac{1}{2} \sum_{i=1}^{T} (x_i^T w - y_i)^2 \equiv \min_w \frac{1}{2} \|X^T w - y\|^2

Differentiate w.r.t. w and set to zero:

\nabla_w L(w) = X(X^T w - y) = 0
\Rightarrow XX^T w = Xy

Exact solution:

w^\star = (XX^T)^{-1} Xy

Note: XX^T is not always invertible
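Continuing the sketch above, the closed-form solution can be evaluated directly; a linear solve is used instead of an explicit inverse, since XX^T may be singular or ill-conditioned.

```python
# Least squares closed form: w* = (X X^T)^{-1} X y   (X, y from the sketch above)
w_star = np.linalg.solve(X @ X.T, X @ y)
residual = X.T @ w_star - y               # the quantity whose squared norm is minimized
```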




Ridge Regression

Regularization penalizes large values of \|w\|_2^2:

\min_w L(w) = \min_w \frac{1}{2} \|X^T w - y\|^2 + \frac{\lambda}{2} \|w\|^2

Differentiate w.r.t. w and set to zero:

\nabla_w L(w) = X(X^T w - y) + \lambda w = 0
\Rightarrow (XX^T + \lambda I) w = Xy

Exact solution:

w^\star = (XX^T + \lambda I)^{-1} Xy

Note: XX^T + \lambda I is always invertible for \lambda > 0
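The ridge solution is a one-line change to the sketch above; the regularization strength lam is chosen arbitrarily here for illustration.

```python
# Ridge regression: w* = (X X^T + lambda I)^{-1} X y; always solvable for lam > 0
lam = 1.0
w_ridge = np.linalg.solve(X @ X.T + lam * np.eye(N), X @ y)
```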





Principal Component Analysis

Compression Loss Minimization

Find a rank-k projection matrix P for which the compression loss is minimized:

\min_P \sum_{i=1}^{T} \|P x_i - x_i\|^2 \equiv \min_P \|PX - X\|^2
= \min_P \mathrm{tr}\big((I - P) XX^T\big)
\equiv \max_P \mathrm{tr}\big(P XX^T\big)
\equiv \max_P \mathrm{tr}\,\mathrm{var}\big(\tilde{P}^T x_i\big)




Projection Matrix Properties

Properties of P:

P^2 = P \in \mathbb{R}^{N \times N}

P = \sum_{i=1}^{k} p_i p_i^T = \tilde{P}\tilde{P}^T \quad \text{for } \tilde{P} = [p_1 \ldots p_k] \in \mathbb{R}^{N \times k}

p_i^T p_i = 1 (i.e., p_i has unit length)

p_i^T p_j = 0 for i \neq j (i.e., p_i and p_j are orthogonal)




Variance Maximization
Find k projection directions \tilde{P} = [p_1 \ldots p_k] for which the variance of the compressed data \tilde{P}^T X is maximized:

\max_{\tilde{P}} \mathrm{tr}\,\mathrm{var}(\tilde{P}^T x_i) \equiv \max_{\tilde{P}} \mathrm{tr}\,\frac{1}{T} \sum_{i=1}^{T} (\tilde{P}^T x_i)(\tilde{P}^T x_i)^T
= \max_{\tilde{P}} \frac{1}{T} \sum_{i=1}^{T} x_i^T \underbrace{\tilde{P}\tilde{P}^T}_{P} x_i
= \max_{P} \frac{1}{T} \sum_{i=1}^{T} \mathrm{tr}\big(P\, x_i x_i^T\big)
= \max_{P} \mathrm{tr}\big(P \underbrace{\tfrac{1}{T} XX^T}_{C}\big)




PCA Solution
Let C = XX^T (the covariance matrix of X, up to the 1/T factor), with eigendecomposition C = \sum_i \gamma_i c_i c_i^T:

\max_P \mathrm{tr}(PC) = \max_P \mathrm{tr}\Big(P \sum_i \gamma_i c_i c_i^T\Big)
= \max_P \sum_i \gamma_i \, c_i^T P c_i \quad \text{where } 0 \le c_i^T P c_i \le 1 \text{ and } \sum_i c_i^T P c_i = k
\le \max_{0 \le \delta_i \le 1,\ \sum_i \delta_i = k} \sum_i \gamma_i \delta_i
= \max_{1 \le i_1 < i_2 < \cdots < i_k \le N} \sum_{j=1}^{k} \gamma_{i_j} = \text{sum of the } k \text{ largest eigenvalues of } C

Hence, P is built from the eigenvectors corresponding to the k largest eigenvalues of C, i.e. P = \tilde{P}\tilde{P}^T with \tilde{P} = [c_{i_1} \ldots c_{i_k}].
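A short sketch of this result, continuing the NumPy example: the projection directions are the top-k eigenvectors of C (taken here as XX^T; the 1/T factor rescales eigenvalues but does not change the eigenvectors), and k is chosen arbitrarily.

```python
# PCA by eigendecomposition of C = X X^T
k = 2
gamma, V = np.linalg.eigh(X @ X.T)            # eigh returns eigenvalues in ascending order
P_tilde = V[:, np.argsort(gamma)[::-1][:k]]   # N x k: eigenvectors of the k largest eigenvalues
P = P_tilde @ P_tilde.T                       # rank-k projection matrix, P @ P == P
scores = P_tilde.T @ X                        # k x T compressed data
```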

Principal Component Regression


Principal Component Regression ≡ PCA + Linear Regression

Use PCA to find a rank-k projection matrix P = \tilde{P}\tilde{P}^T:

\min_P \|PX - X\|^2

Minimize the square loss in the projected space:

\arg\min_w \frac{1}{2} \|(\tilde{P}^T X)^T w - y\|^2

Solution:

w^\star = (\tilde{P}^T XX^T \tilde{P})^{-1} \tilde{P}^T X y \in \mathbb{R}^{k \times 1}
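Continuing the sketch, principal component regression is the same closed-form least squares applied to the projected data \tilde{P}^T X.

```python
# Principal Component Regression: regress y on the k PCA scores from above
Z = P_tilde.T @ X                             # k x T projected data
w_pcr = np.linalg.solve(Z @ Z.T, Z @ y)       # (P~^T X X^T P~)^{-1} P~^T X y, a k-vector
y_hat_pcr = Z.T @ w_pcr
```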




Summary of PCA

Finds a set of k orthogonal directions

Directions of maximum variance of X (the top eigenvectors of XX^T)

Minimizes the compression error (i.e., the best rank-k approximation of X)

Ignores all information about y while constructing the projection matrix P






Partial Least Squares (PLS)


Finds components from X that are also relevant to y

PLS finds projection directions for which the covariance between X and y is maximized (constant 1/T factors are dropped, since they do not change the arg max):

\arg\max_{p_i} \big(\mathrm{cov}(X^T p_i, y)\big)^2 = \arg\max_{p_i} \Big(\sum_{j=1}^{T} (x_j^T p_i) y_j\Big)^2
= \arg\max_{p_i} \Big(\mathrm{tr}\Big(p_i^T \sum_{j=1}^{T} x_j y_j\Big)\Big)^2
= \arg\max_{p_i} \big(\mathrm{tr}(p_i^T X y)\big)^2
= \arg\max_{p_i} (p_i^T X y)(p_i^T X y)^T
= \arg\max_{p_i} p_i^T X y y^T X^T p_i

Finding the First PLS Direction p1

\arg\max_{p_1} p_1^T X y y^T X^T p_1 \quad \text{s.t. } p_1^T p_1 = 1

Lagrangian:

L(p_1, \lambda) = p_1^T X y y^T X^T p_1 - \lambda (p_1^T p_1 - 1)

\nabla_{p_1} L = X y y^T X^T p_1 - \lambda p_1 = 0
\Rightarrow X y y^T X^T p_1 = \lambda p_1

Hence, p_1 is the eigenvector of X y y^T X^T corresponding to the largest eigenvalue (equivalently, p_1 \propto Xy).
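In the running NumPy sketch, the first PLS direction can be obtained either from the eigendecomposition or, since X y y^T X^T is rank-1, simply by normalizing Xy; both give the same direction up to sign.

```python
# First PLS direction: leading eigenvector of X y y^T X^T
M = np.outer(X @ y, X @ y)                    # X y y^T X^T, a rank-1 N x N matrix
eigvals, eigvecs = np.linalg.eigh(M)
p1 = eigvecs[:, np.argmax(eigvals)]
p1_direct = (X @ y) / np.linalg.norm(X @ y)   # equivalent up to sign
```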




Finding the remaining k − 1 PLS directions

Since X y y^T X^T is a rank-1 matrix, additional orthogonality constraints are used to find the remaining k − 1 PLS projection directions:

\arg\max_{p_i} p_i^T X y y^T X^T p_i

\text{s.t. } p_i^T p_i = 1 \text{ and } p_i^T XX^T p_j = 0 \text{ for } 1 \le j < i
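One standard way to obtain a sequence of PLS directions in practice is a NIPALS-style deflation scheme, sketched below under the same assumptions as the running example; this is an illustrative procedure, not necessarily the exact construction implied by the constrained problem above.

```python
def pls_directions(X, y, k):
    """NIPALS-style PLS1 sketch: X is N x T (features x instances), y length T, both centered."""
    Xd, yd = X.T.copy(), y.copy()         # work with samples as rows for convenience
    W = []
    for _ in range(k):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)            # unit-length weight vector
        t = Xd @ w                        # scores of this component
        p = Xd.T @ t / (t @ t)            # loading used for deflation
        Xd = Xd - np.outer(t, p)          # remove the part of X explained by t
        yd = yd - (t @ yd) / (t @ t) * t  # and the part of y explained by t
        W.append(w)
    return np.column_stack(W)             # N x k matrix of directions

P_tilde_pls = pls_directions(X, y, k)
```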




PLS Regression
PLS Regression ≡ PLS Decomposition + Linear Regression

Use PLS to find the projection directions p_i:

\max_{p_i} \big(\mathrm{cov}(X^T p_i, y)\big)^2

\text{s.t. } p_i^T p_i = 1 \text{ and } p_i^T XX^T p_j = 0 \text{ for } 1 \le j < i

Minimize the square loss in the projected space:

\arg\min_w \frac{1}{2} \|(\tilde{P}^T X)^T w - y\|^2

Solution:

w^\star = (\tilde{P}^T XX^T \tilde{P})^{-1} \tilde{P}^T X y \quad \text{for } \tilde{P} = [p_1 \ldots p_k]
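Continuing the sketch, the regression step is identical to the one used for PCR, just with the PLS directions (here the NIPALS-style P_tilde_pls from the previous sketch) in place of the PCA directions.

```python
# PLS regression: least squares on the data projected onto the PLS directions
Z_pls = P_tilde_pls.T @ X
w_pls = np.linalg.solve(Z_pls @ Z_pls.T, Z_pls @ y)
y_hat_pls = Z_pls.T @ w_pls
```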

Summary

PCA and PLS:

Differ in the optimization problem they solve to find a projection matrix P

Are both linear decomposition techniques

Can be combined with loss functions other than the square loss
