Appendix II: Derivation of Principal Component Analysis
2.1 Principal Component Analysis
An n-dimensional observation vector x may be approximated by a linear expansion over m basis vectors u_j:

\hat{x} = y_1 u_1 + y_2 u_2 + \cdots + y_m u_m    [154]
where m < n and y_j is the weighting coefficient of basis vector u_j, formed by taking the inner product of x with u_j:

y_j = x^T u_j.    [155]
By forming a k × n observation matrix X, the rows of which are the observations x_k, we can express Equation 154 and Equation 155 respectively in the following form:

\hat{X} = Y_m U_m^T    [156]

Y_m = X U_m    [157]
where Y_m is a k × m matrix whose columns are the coefficients for each column vector in U_m. By the orthogonality of U_m and by the imposition of an additional constraint that the columns of U_m have unit norm, i.e. \|u_j\| = 1, it follows that:

U_m^T U_m = I_m    [158]
where I_m is the m × m identity matrix. We refer to Equation 156 as the PCA-feature re-synthesis equation and Equation 157 as the PCA-feature projection equation, where the columns of Y_m correspond to the estimated features and the columns of U_m are the projection vectors which perform the linear transformation of the input to the new uncorrelated basis.
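As an illustration of Equations 156 through 158, the following sketch (NumPy is assumed; the dimensions k, n, m and the random orthonormal basis are illustrative choices, not part of the derivation) carries out the projection and re-synthesis steps:

    import numpy as np

    rng = np.random.default_rng(0)
    k, n, m = 100, 8, 3                          # k observations of dimension n; keep m basis vectors

    X = rng.standard_normal((k, n))              # rows are the observations x_k
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    U_m = U[:, :m]                               # n x m matrix with orthonormal columns

    assert np.allclose(U_m.T @ U_m, np.eye(m))   # Equation 158: U_m^T U_m = I_m

    Y_m = X @ U_m                                # Equation 157: projection onto the basis (k x m)
    X_hat = Y_m @ U_m.T                          # Equation 156: re-synthesis of the approximation (k x n)

Any orthonormal basis can be used here; the derivation below determines which basis makes the approximation X_hat optimal in the mean-square sense.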
The problem, then, for deriving a PCA is to obtain the matrix U_m such that the residual error in approximating x with \hat{x} is minimized:

\epsilon = x - \hat{x} = \sum_{j=m+1}^{n} y_j u_j    [159]
that is, the expansion of all the unused features results in a minimal signal, which is the residual error \epsilon. A suitable criterion is the minimization of the expectation of the mean-square residual error:

\xi = E[\|\epsilon\|^2] = E[\|x - \hat{x}\|^2]    [160]
E[\epsilon(x)] = \int \epsilon(x)\, p_x(x)\, dx    [161]
where p_x(x) is the probability density function of the random variable x. Since the expectation operator is linear, and due to the condition that the columns of U_m are orthonormal, it follows that:

\xi = E[\epsilon^T \epsilon] = E\Big[\Big(\sum_{i=m+1}^{n} y_i u_i\Big)^T \Big(\sum_{j=m+1}^{n} y_j u_j\Big)\Big] = \sum_{j=m+1}^{n} E[y_j^2]    [162]

E[y_j^2] = E[(u_j^T x)(x^T u_j)] = u_j^T E[x x^T] u_j = u_j^T R u_j    [163]
where R is the correlation matrix for x. Now, by substitution of Equation 163 into Equation 162, we arrive at the quantity to be minimized:

\xi = \sum_{j=m+1}^{n} u_j^T R u_j    [164]
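A quick numerical check of Equation 164 (again only a sketch; the synthetic data and the arbitrary orthonormal basis are assumptions) is that the empirical mean-square residual of Equation 160 matches the sum of the quadratic forms u_j^T R u_j over the unused basis vectors, for any orthonormal basis:

    import numpy as np

    rng = np.random.default_rng(1)
    k, n, m = 5000, 6, 2
    X = rng.standard_normal((k, n)) @ rng.standard_normal((n, n))   # correlated synthetic data
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))                # any orthonormal basis

    R = (X.T @ X) / k                                # empirical correlation matrix E[x x^T]
    X_hat = X @ U[:, :m] @ U[:, :m].T                # reconstruction from the first m basis vectors

    mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))          # empirical E[||x - x_hat||^2]
    xi = sum(U[:, j] @ R @ U[:, j] for j in range(m, n))     # Equation 164
    print(np.isclose(mse, xi))                               # True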
Minimizing this quantity subject to the unit-norm constraint on each u_j introduces Lagrange multipliers \lambda_j; setting the derivative of the constrained criterion \xi' with respect to u_j to zero yields:

\frac{\partial \xi'}{\partial u_j} = 2(R u_j - \lambda_j u_j) = 0, \quad j = m+1, \ldots, n    [165]
It is well known that the solutions to this equation constitute the eigenvectors of the correlation matrix R. It is also worth noting that the correlation matrix is related to the covariance matrix by the following expression:

Q_x = R_x - m m^T    [166]

where m here denotes the mean vector of x.
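Equation 166 can be verified directly on sample data; in the sketch below (illustrative synthetic data, NumPy assumed), the empirical correlation matrix minus the outer product of the sample mean reproduces the biased sample covariance:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((1000, 4)) + np.array([1.0, -2.0, 0.5, 3.0])   # data with a non-zero mean

    R_x = (X.T @ X) / len(X)                      # empirical correlation matrix E[x x^T]
    mean = X.mean(axis=0)                         # empirical mean vector
    Q_x = np.cov(X, rowvar=False, bias=True)      # biased sample covariance

    print(np.allclose(Q_x, R_x - np.outer(mean, mean)))   # True: Equation 166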
Thus, for zero-mean or centered data, the problem is equivalent to finding the eigenvectors of the covariance matrix Q_x. Since the columns of U are now determined to be the eigenvectors, we can re-express the residual error as the sum of the eigenvalues of the unused portion of the basis:

\xi = \sum_{j=m+1}^{n} \lambda_j    [167]
and the solution to the minimization reduces to ordering the basis vectors u_j such that those with the smallest eigenvalues occur in the unused portion of the basis, which in turn implies that the m columns of U_m used for reconstruction should comprise the eigenvectors with the m largest eigenvalues.
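Putting the pieces together, the following sketch (synthetic centered data; all names and dimensions are illustrative assumptions) orders the eigenvectors of the covariance matrix by decreasing eigenvalue, keeps the m largest as U_m, and confirms that the resulting mean-square residual equals the sum of the discarded eigenvalues, as in Equation 167:

    import numpy as np

    rng = np.random.default_rng(3)
    k, n, m = 2000, 5, 2
    X = rng.standard_normal((k, n)) @ rng.standard_normal((n, n))
    X = X - X.mean(axis=0)                        # center the data

    Q_x = (X.T @ X) / k                           # covariance matrix of the centered data
    eigvals, eigvecs = np.linalg.eigh(Q_x)        # symmetric eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]             # reorder by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    U_m = eigvecs[:, :m]                          # eigenvectors with the m largest eigenvalues
    X_hat = (X @ U_m) @ U_m.T                     # Equations 157 and 156

    mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    print(np.isclose(mse, eigvals[m:].sum()))     # True: Equation 167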
Now that we have arrived at the form of the solution for optimal orthonormal basis reconstruction (in the square-error sense), we must find a general form for representing the solution. Since the eigenvalues form the diagonal elements of the covariance of the transformed data, with all other elements equal to zero, we can express the solution to the eigenvalue decomposition as a diagonalization of the input covariance matrix Q_x.
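Concretely, this diagonalization can be written Q_x = U \Lambda U^T, or equivalently U^T Q_x U = \Lambda, where \Lambda denotes the diagonal matrix of eigenvalues (a symbol not used above, introduced here for illustration); the covariance of the projected features Y = X U is then diagonal, confirming that the new basis decorrelates the input. A brief sketch, under the same illustrative assumptions as before:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 5))
    X = X - X.mean(axis=0)                        # centered data

    Q_x = (X.T @ X) / len(X)                      # input covariance matrix
    eigvals, U = np.linalg.eigh(Q_x)              # Q_x = U diag(eigvals) U^T

    print(np.allclose(U.T @ Q_x @ U, np.diag(eigvals)))       # diagonalization of Q_x

    Y = X @ U                                     # features in the new basis
    print(np.allclose((Y.T @ Y) / len(Y), np.diag(eigvals)))  # projected features are uncorrelated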