\[ 11x_1 + x_2 = \lambda x_1 \]
\[ x_1 + 11x_2 = \lambda x_2 \]
and rearrange to get
\[ (11 - \lambda)x_1 + x_2 = 0 \]
\[ x_1 + (11 - \lambda)x_2 = 0 \]
Setting the determinant of the coefficient matrix to zero gives \((11-\lambda)^2 - 1 = 0\), so \(\lambda = 12\) or \(\lambda = 10\). Substituting \(\lambda = 10\) into the first equation,
\[ (11 - 10)x_1 + x_2 = 0 \]
\[ x_1 = -x_2 \]
which is true for lots of values, so we'll pick \(x_1 = 1\) and \(x_2 = -1\) since those are small and easy to work with. Thus, we have the eigenvector \([1, -1]\) corresponding to the eigenvalue \(\lambda = 10\). For \(\lambda = 12\) we have
\[ (11 - 12)x_1 + x_2 = 0 \]
\[ x_1 = x_2 \]
and for the same reason as before we'll take \(x_1 = 1\) and \(x_2 = 1\). Now, for \(\lambda = 12\) we have the
eigenvector \([1, 1]\). These eigenvectors become column vectors in a matrix ordered by the size
of the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalue
is column one, the eigenvector of the next largest eigenvalue is column two, and so forth
and so on until we have the eigenvector of the smallest eigenvalue as the last column of our
matrix. In the matrix below, the eigenvector for \(\lambda = 12\) is column one, and the eigenvector
for \(\lambda = 10\) is column two.
\[ \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]
Finally, we have to convert this matrix into an orthogonal matrix, which we do by applying
the Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizing
\(\vec{v}_1\):
\[ \vec{u}_1 = \frac{\vec{v}_1}{|\vec{v}_1|} = \frac{[1,1]}{\sqrt{1^2 + 1^2}} = \frac{[1,1]}{\sqrt{2}} = \left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] \]
Compute
\[ \vec{w}_2 = \vec{v}_2 - (\vec{u}_1 \cdot \vec{v}_2)\,\vec{u}_1 = [1,-1] - \left(\left[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right] \cdot [1,-1]\right)\left[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right] \]
\[ = [1,-1] - 0 \cdot \left[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right] = [1,-1] - [0,0] = [1,-1] \]
and normalize
\[ \vec{u}_2 = \frac{\vec{w}_2}{|\vec{w}_2|} = \left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right] \]
to give
\[ U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \]
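This small example can also be checked mechanically. The following is a minimal numpy sketch (assuming numpy is available); numpy returns eigenvalues in ascending order and chooses its own eigenvector signs, so its output matches U only up to column order and sign.

```python
import numpy as np

# The symmetric matrix whose eigenvectors we just computed by hand.
M = np.array([[11.0, 1.0],
              [1.0, 11.0]])

# eigh is appropriate for symmetric matrices; it returns eigenvalues
# in ascending order with orthonormal eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(M)

# Reverse so the largest eigenvalue (and its eigenvector) comes first,
# matching the column ordering used in the text.
order = np.argsort(eigenvalues)[::-1]
print(eigenvalues[order])        # [12. 10.]
print(eigenvectors[:, order])    # columns proportional to [1,1] and [1,-1],
                                 # i.e. the columns of U up to sign
```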
Now consider the matrix
\[ \begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix} \]
whose eigenvalue equation expands to
\[ 10x_1 + 2x_3 = \lambda x_1 \]
\[ 10x_2 + 4x_3 = \lambda x_2 \]
\[ 2x_1 + 4x_2 + 2x_3 = \lambda x_3 \]
which we rewrite as
\[ (10 - \lambda)x_1 + 2x_3 = 0 \]
\[ (10 - \lambda)x_2 + 4x_3 = 0 \]
\[ 2x_1 + 4x_2 + (2 - \lambda)x_3 = 0 \]
which are solved by setting
\[ \begin{vmatrix} 10-\lambda & 0 & 2 \\ 0 & 10-\lambda & 4 \\ 2 & 4 & 2-\lambda \end{vmatrix} = 0 \]
Expanding the determinant gives \((10-\lambda)\,\lambda\,(\lambda - 12) = 0\), so the eigenvalues are \(\lambda = 12\), \(\lambda = 10\), and \(\lambda = 0\). For \(\lambda = 12\), the first equation becomes \((10-12)x_1 + 2x_3 = -2x_1 + 2x_3 = 0\), so \(x_1 = x_3\); take
\[ x_1 = 1, \quad x_3 = 1 \]
The second equation then gives
\[ (10 - 12)x_2 + 4x_3 = -2x_2 + 4x_3 = 0 \]
\[ x_2 = 2x_3 = 2 \]
So for \(\lambda = 12\), \(\vec{v}_1 = [1, 2, 1]\). For \(\lambda = 10\) we have
\[ x_3 = 0 \]
\[ 2x_1 + 4x_2 = 0 \]
\[ x_1 = -2x_2 \]
\[ x_1 = 2, \quad x_2 = -1 \]
which means for \(\lambda = 10\), \(\vec{v}_2 = [2, -1, 0]\). For \(\lambda = 0\) we have
\[ 10x_1 + 2x_3 = 0 \]
so \(x_3 = -5x_1\); take \(x_3 = -5\). The second equation gives
\[ 10x_2 + 4x_3 = 10x_2 - 20 = 0 \]
\[ x_2 = 2 \]
and the third gives
\[ 2x_1 + 4x_2 + 2x_3 = 2x_1 + 8 - 10 = 0 \]
\[ x_1 = 1 \]
which means for \(\lambda = 0\), \(\vec{v}_3 = [1, 2, -5]\). Order \(\vec{v}_1\), \(\vec{v}_2\), and \(\vec{v}_3\) as column vectors in a matrix
according to the size of the corresponding eigenvalue to get
\[ \begin{bmatrix} 1 & 2 & 1 \\ 2 & -1 & 2 \\ 1 & 0 & -5 \end{bmatrix} \]
Applying the Gram-Schmidt orthonormalization process to the columns is straightforward here: because these eigenvectors belong to distinct eigenvalues of a symmetric matrix, they are already mutually orthogonal, so the process reduces to normalizing each column. With \(|\vec{v}_1| = \sqrt{6}\), \(|\vec{v}_2| = \sqrt{5}\), and \(|\vec{v}_3| = \sqrt{30}\), the resulting orthogonal matrix is
\[ \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{30}} \\ \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{6}} & 0 & -\frac{5}{\sqrt{30}} \end{bmatrix} \]
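As before, a short numpy sketch (numpy assumed available) confirms both the eigenvalues and the normalized eigenvectors:

```python
import numpy as np

# The 3x3 symmetric matrix from this example.
M = np.array([[10.0, 0.0, 2.0],
              [0.0, 10.0, 4.0],
              [2.0, 4.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(M)  # ascending eigenvalue order
order = np.argsort(eigenvalues)[::-1]          # largest first, as in the text

print(eigenvalues[order])  # [12. 10.  0.] (up to floating-point error)
# Columns are unit vectors proportional to [1,2,1], [2,-1,0], [1,2,-5],
# possibly with flipped signs, since numpy chooses its own sign convention.
print(eigenvectors[:, order])
```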
Remember that to compute the SVD of a matrix A we want the product of three matrices
such that
\[ A = USV^T \]
where U and V are orthogonal and S is diagonal. The column vectors of U are taken from
the orthonormal eigenvectors of \(AA^T\), ordered left to right from largest corresponding
eigenvalue to the least. Notice that
\[ AA^T = \begin{bmatrix} 2 & 0 & 8 & 6 & 0 \\ 1 & 6 & 0 & 1 & 7 \\ 5 & 0 & 7 & 4 & 0 \\ 7 & 0 & 8 & 5 & 0 \\ 0 & 10 & 0 & 0 & 7 \end{bmatrix} \begin{bmatrix} 2 & 1 & 5 & 7 & 0 \\ 0 & 6 & 0 & 0 & 10 \\ 8 & 0 & 7 & 8 & 0 \\ 6 & 1 & 4 & 5 & 0 \\ 0 & 7 & 0 & 0 & 7 \end{bmatrix} = \begin{bmatrix} 104 & 8 & 90 & 108 & 0 \\ 8 & 87 & 9 & 12 & 109 \\ 90 & 9 & 90 & 111 & 0 \\ 108 & 12 & 111 & 138 & 0 \\ 0 & 109 & 0 & 0 & 149 \end{bmatrix} \]
is a matrix whose entries are the dot products of all the term vectors, so it is a kind of dispersion
matrix of terms throughout all the documents.
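In code, this term-by-term matrix is a single matrix product (a minimal numpy sketch; A is the term-document matrix shown above, with rows as terms and columns as documents):

```python
import numpy as np

# Term-document matrix: rows are terms, columns are documents.
A = np.array([[2, 0, 8, 6, 0],
              [1, 6, 0, 1, 7],
              [5, 0, 7, 4, 0],
              [7, 0, 8, 5, 0],
              [0, 10, 0, 0, 7]])

# Entry (i, j) is the dot product of term i's and term j's document vectors.
print(A @ A.T)
```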
The eigenvalues of \(AA^T\), ordered from greatest to least, are used to compute and order the corresponding orthonormal eigenvectors that form the columns of U.
\[ U = \begin{bmatrix} -0.54 & 0.07 & 0.82 & -0.11 & 0.12 \\ -0.10 & -0.59 & -0.11 & -0.79 & -0.06 \\ -0.53 & 0.06 & -0.21 & 0.12 & -0.81 \\ -0.65 & 0.07 & -0.51 & 0.06 & 0.56 \\ -0.06 & -0.80 & 0.09 & 0.59 & 0.04 \end{bmatrix} \]
This essentially gives a matrix in which words are represented as row vectors containing
linearly independent components. Some word co-occurrence patterns in these documents are
indicated by the signs of the coefficients in U. For example, the signs in the first column
vector are all negative, indicating the general co-occurrence of words and documents. There
are two groups visible in the second column vector of U: car and wheel have negative
coefficients, while doctor, nurse, and hospital are all positive, indicating a grouping in which
wheel only co-occurs with car. The third dimension indicates a grouping in which car, nurse,
and hospital occur only with each other. The fourth dimension points out a pattern in
which nurse and hospital occur in the absence of wheel, and the fifth dimension indicates a
grouping in which doctor and hospital occur in the absence of wheel.
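The full decomposition can be checked directly with numpy's built-in SVD routine (a minimal sketch; note that numpy may flip the signs of matched singular-vector pairs relative to the matrices printed here, which leaves the product \(USV^T\) unchanged):

```python
import numpy as np

A = np.array([[2, 0, 8, 6, 0],
              [1, 6, 0, 1, 7],
              [5, 0, 7, 4, 0],
              [7, 0, 8, 5, 0],
              [0, 10, 0, 0, 7]], dtype=float)

# s holds the singular values, largest first; U and Vt are orthogonal.
U, s, Vt = np.linalg.svd(A)

print(np.round(U, 2))   # columns ordered by decreasing singular value
print(np.round(s, 2))   # [17.92 15.17  3.56 ...]
print(np.round(Vt, 2))  # rows of V^T

# Sanity check: the three factors reproduce A.
assert np.allclose(U @ np.diag(s) @ Vt, A)
```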
Computing \(V^T\) is similar. Since its values come from the orthonormal eigenvectors of
\(A^TA\), arranged left to right from largest corresponding eigenvalue to the least, we have
\[ A^TA = \begin{bmatrix} 79 & 6 & 107 & 68 & 7 \\ 6 & 136 & 0 & 6 & 112 \\ 107 & 0 & 177 & 116 & 0 \\ 68 & 6 & 116 & 78 & 7 \\ 7 & 112 & 0 & 7 & 98 \end{bmatrix} \]
which contains the dot products of all the document vectors. Applying the Gram-Schmidt orthonor-
malization process and taking the transpose yields
\[ V^T = \begin{bmatrix} -0.46 & -0.07 & -0.74 & -0.48 & -0.07 \\ 0.02 & -0.76 & 0.10 & 0.03 & -0.64 \\ -0.87 & 0.06 & 0.28 & 0.40 & -0.04 \\ 0.00 & 0.60 & 0.22 & -0.33 & -0.69 \\ 0.17 & 0.23 & -0.56 & 0.70 & -0.32 \end{bmatrix} \]
S contains the singular values, the square roots of the eigenvalues of \(AA^T\), ordered from greatest to least along its
diagonal. These values indicate the variance of the linearly independent components along
each dimension. In order to illustrate the effect of dimensionality reduction on this data set,
we'll restrict S to the three largest singular values to get
\[ S = \begin{bmatrix} 17.92 & 0 & 0 \\ 0 & 15.17 & 0 \\ 0 & 0 & 3.56 \end{bmatrix} \]
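The relationship between these diagonal entries and the eigenvalues of \(AA^T\) is easy to verify (a minimal numpy sketch):

```python
import numpy as np

A = np.array([[2, 0, 8, 6, 0],
              [1, 6, 0, 1, 7],
              [5, 0, 7, 4, 0],
              [7, 0, 8, 5, 0],
              [0, 10, 0, 0, 7]], dtype=float)

# Eigenvalues of A A^T sorted descending; eigvalsh suits symmetric matrices.
eigenvalues = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

# Their square roots are the singular values of A.
print(np.round(np.sqrt(eigenvalues[:3]), 2))                # [17.92 15.17  3.56]
print(np.round(np.linalg.svd(A, compute_uv=False)[:3], 2))  # same values
```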
In order for the matrix multiplication to go through, we have to eliminate the corresponding
column vectors of U and corresponding row vectors of \(V^T\) to give us an approximation of A
using 3 dimensions instead of the original 5. The result looks like this.
\[ \hat{A} = \begin{bmatrix} -0.54 & 0.07 & 0.82 \\ -0.10 & -0.59 & -0.11 \\ -0.53 & 0.06 & -0.21 \\ -0.65 & 0.07 & -0.51 \\ -0.06 & -0.80 & 0.09 \end{bmatrix} \begin{bmatrix} 17.92 & 0 & 0 \\ 0 & 15.17 & 0 \\ 0 & 0 & 3.56 \end{bmatrix} \begin{bmatrix} -0.46 & -0.07 & -0.74 & -0.48 & -0.07 \\ 0.02 & -0.76 & 0.10 & 0.03 & -0.64 \\ -0.87 & 0.06 & 0.28 & 0.40 & -0.04 \end{bmatrix} \]
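The same rank-3 truncation can be written compactly in code (a minimal numpy sketch of the reduced product shown above):

```python
import numpy as np

A = np.array([[2, 0, 8, 6, 0],
              [1, 6, 0, 1, 7],
              [5, 0, 7, 4, 0],
              [7, 0, 8, 5, 0],
              [0, 10, 0, 0, 7]], dtype=float)

U, s, Vt = np.linalg.svd(A)

k = 3  # number of dimensions to keep
# Keep the first k columns of U, the k largest singular values,
# and the first k rows of V^T.
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A_hat, 2))
```

Multiplying the truncated factors back together gives a rank-3 matrix close to the original A; this is the best rank-3 approximation of A in the least-squares sense.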