Singular-Value Decomposition and its Applications
Zecheng Kuang¹*
¹ Department of Mathematics, University of California San Diego, 9450 Gilman Dr, La Jolla, CA 92092
Abstract: The singular-value decomposition (SVD) is a factorization of a real or complex matrix. During my study in the linear algebra courses, I found it interesting that a matrix can be decomposed, and considered that there might be many useful applications of the SVD in real life. In this thesis, we discuss several applications of the SVD, such as recommendation systems, image compression and handwritten digit classification, together with the mathematical theory behind them.
Singular-value decomposition (SVD) allows an exact representation of any matrix and makes it easy to eliminate the less important data in the matrix to produce a low-dimensional approximation. This is useful in applications such as image compression and recommendation systems. Moreover, the nature of the SVD allows one to form a subspace spanned by the columns of the matrix, which is useful in applications such as digit recognition and destroyed-image reconstruction.
Let M be an m × n matrix, and let r be the rank of M. Then there exists a matrix factorization called the Singular-Value Decomposition¹ of M,

M = UΣV^T,

where:
1. U is an m × r column-orthonormal matrix; that is, each of its columns is a unit vector and the dot product of any two columns is 0;
2. Σ is an r × r diagonal matrix whose diagonal entries are the singular values of M, all positive and listed in decreasing order;
3. V^T is an r × n row-orthonormal matrix; that is, each of its rows is a unit vector and the dot product of any two rows is 0.
* E-mail: [email protected]
¹ The proof of the existence of the SVD can be found in Applied Numerical Linear Algebra by James W. Demmel [2].
The SVD of a matrix M has strong connections to the eigenvectors of the matrices M^T M and M M^T.
Proposition 1.1. For any matrix M, M^T M and M M^T have non-negative eigenvalues.

Proof. Let λ be an eigenvalue of M^T M with eigenvector v ≠ 0, so that

M^T M v = λv.

Multiplying both sides by v^T gives

v^T M^T M v = λ v^T v.

This is equivalent to

||Mv||₂² = λ ||v||₂²,

so λ = ||Mv||₂² / ||v||₂² ≥ 0. The same argument applies to M M^T. ∎
Proposition 1.2. For any matrix M, rank(M) = rank(M^T M) = rank(M M^T).

Proof. Let Null(M) be the null space of the matrix M. For x ∈ Null(M) we have

M x = 0 ⇒ M^T M x = 0.

Conversely,

M^T M x = 0 ⇒ x^T M^T M x = 0 ⇒ ||Mx||₂² = 0 ⇒ Mx = 0.

Hence Null(M) = Null(M^T M); since M and M^T M have the same number of columns, the rank-nullity theorem gives rank(M) = rank(M^T M). Applying this result to M^T,

rank((M^T)^T) = rank((M^T)^T M^T).

Equivalently,

rank(M) = rank(M M^T). ∎
Fact 1.1. The rows of V^T are the eigenvectors corresponding to the positive eigenvalues of M^T M, and the squares of the diagonal entries of Σ are the positive eigenvalues of M^T M. More specifically, the ith row of V^T is the eigenvector of M^T M whose corresponding eigenvalue is the square of the ith diagonal entry of Σ.
Proof. Suppose M = UΣV^T. Then

M^T = (UΣV^T)^T = VΣ^T U^T.

Since Σ is diagonal, Σ^T = Σ, so M^T = VΣU^T. Using U^T U = I (U is column-orthonormal),

M^T M = (VΣU^T)(UΣV^T) = VΣ²V^T.

Multiplying on the right by V and using V^T V = I,

M^T M V = VΣ².

It is not hard to see that the diagonal entries of Σ² are eigenvalues of M^T M and that V is the matrix whose columns are the corresponding eigenvectors, which indicates that the rows of V^T are the corresponding eigenvectors of M^T M.

Moreover, since M^T M is a symmetric matrix, the number of non-zero eigenvalues equals its rank, which by Proposition 1.2 equals the rank of M, namely r. By Proposition 1.1, all eigenvalues of M^T M are non-negative, so M^T M has exactly r positive eigenvalues: the squares of the r diagonal entries of Σ. ∎
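As a quick numerical check of Facts 1.1 and 1.2 (a sketch; the matrix below is an arbitrary example of ours, not taken from the text), we can compare the squared singular values of a matrix with the eigenvalues of M^T M in Matlab:

% Check: squared singular values of M equal the positive eigenvalues of M'*M.
M = [3 1 0; 1 2 1];                     % any 2x3 matrix of rank 2
s = svd(M);                             % singular values, in decreasing order
lambda = sort(eig(M' * M), 'descend');  % eigenvalues of M'M (one of them ~0)
disp([s.^2, lambda(1:length(s))])       % the two columns agree up to rounding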
Fact 1.2. The columns of U are the eigenvectors corresponding to the nonzero eigenvalues of M M^T.

Proof. Similarly, using V^T V = I,

M M^T = (UΣV^T)(VΣU^T) = UΣ²U^T,

so

M M^T U = UΣ²U^T U = UΣ².

Hence U is the matrix whose columns are the corresponding eigenvectors of M M^T. ∎
Proposition 1.3. For any matrix M, M^T M and M M^T have the same nonzero eigenvalues.

Proof. Suppose λ ≠ 0 is an eigenvalue of M^T M with eigenvector v, i.e.

M^T M v = λv.

Multiplying both sides by M gives

M M^T (Mv) = M(λv) = λ(Mv).

Letting x = Mv, which is nonzero because M^T M v = λv ≠ 0, we can write

M M^T x = λx,

so λ is also an eigenvalue of M M^T. Conversely, suppose λ ≠ 0 is an eigenvalue of M M^T with eigenvector v:

M M^T v = λv.

Multiplying both sides by M^T gives

M^T M (M^T v) = λ(M^T v).

Letting x = M^T v, we get

M^T M x = λx. ∎
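This, too, is easy to see numerically (a sketch with an arbitrary example matrix of ours):

% M'*M (3x3) and M*M' (2x2) share their nonzero eigenvalues.
M = [1 2 0; 0 1 1];
disp(sort(eig(M' * M), 'descend'))   % two nonzero eigenvalues, plus a zero
disp(sort(eig(M * M'), 'descend'))   % the same two nonzero eigenvalues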
Remark: M^T M is an n × n matrix and M M^T is an m × m matrix, where n and m don't necessarily equal r. In fact, n and m are at least as large as r, which indicates that M^T M and M M^T have an additional n − r and m − r eigenpairs, respectively, with eigenvalue zero. In this paper we use the reduced SVD [11], which gets rid of the zero eigenvalues and the corresponding eigenvectors, forming U as an m × r matrix, V^T as an r × n matrix and Σ as an r × r matrix. In the full SVD, by contrast, U is an m × m matrix which contains all the eigenvectors of M M^T, including those corresponding to the zero eigenvalues, and V^T is an n × n matrix which contains all the eigenvectors of M^T M, including those corresponding to the zero eigenvalues.
Example 1.1. Let M = [1 0 1 0; 0 1 0 1]. The reduced SVD of M is given by

[1 0 1 0]   [1 0] [√2   0] [1/√2    0   1/√2    0 ]
[0 1 0 1] = [0 1] [ 0  √2] [  0   1/√2    0   1/√2]
    M         U      Σ                V^T
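This example is easy to verify in Matlab (a sketch; the 'econ' option asks svd for the economy-size, i.e. reduced, factors):

% Reduced SVD of the 2x4 matrix from Example 1.1.
M = [1 0 1 0; 0 1 0 1];
[U, S, V] = svd(M, 'econ');         % U is 2x2, S is 2x2, V is 4x2
disp(S)                             % diagonal entries are sqrt(2), sqrt(2)
disp(norm(M - U * S * V', 'fro'))   % ~0: the factors reproduce M

Note that svd may return the factors with some columns negated; such a factorization is equally valid.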
In real life, there might be times when we want to find a "best-fitting" curve for a set of given points. For example, people might want to find the relationship between the lean body mass of human bodies and their muscle strength, or analyze how housing prices change over the years to obtain an estimate of future prices. Most of the time we cannot find a linear equation satisfied by all of those points, so finding a "closest" solution is the best we can do. We call these solutions least-squares solutions. How could we actually find such a "best-fitting" curve?
Figure 2. Least-squares solution for the relation between human lean body mass and muscle strength. [1]
Question 2.1.
Suppose we denote all the points on the graph as (x_i, y_i) pairs, where x_i and y_i are the x-coordinate and y-coordinate of point i. We want to find a line ax + by = c such that

Σ_i (a x_i + b y_i − c)²

is as small as possible. Equivalently, we want

min || a (x₁, x₂, …)^T + b (y₁, y₂, …)^T − c (1, 1, …)^T ||₂²

over every a, b, c ∈ R.

If we let

A = [x₁ y₁ 1
     x₂ y₂ 1
     ⋮  ⋮  ⋮],    x = (a, b, c)^T,

the question then becomes to find x ∈ R³ such that ||Ax||₂ is smallest.
In general, we could have a least-squares problem as follows:

Question 2.2. Suppose we are given A and b, where A is an m × n matrix and b ∈ Rᵐ; we want to find x such that Ax is closest to b. In other words, find x attaining

min_{x∈Rⁿ} ||Ax − b||₂.
The tool for solving this problem is the pseudoinverse

M⁺ = VΣ⁺U^T,

where Σ⁺ is obtained by taking the transpose of Σ and taking the numerical inverse of each non-zero entry in the ith row and ith column, for every i.
Example 2.1. Suppose

Σ = [3 0 0 0
     0 2 0 0
     0 0 0 0],

then

Σ⁺ = [1/3  0   0
       0  1/2  0
       0   0   0
       0   0   0].
Theorem 2.1. Suppose we are given A and b, where A is an m × n matrix and b ∈ Rᵐ. Then x = A⁺b attains min_{x∈Rⁿ} ||Ax − b||₂.
Proof. In this proof, we need to utilize the full version of the SVD, where U is m × m, V^T is n × n and Σ is m × n. Writing A = UΣV^T and using the fact that multiplication by the orthogonal matrix U preserves the norm, we have ||Ax − b||₂ = ||U(ΣV^T x − U^T b)||₂. Hence finding x such that ||U(ΣV^T x − U^T b)||₂ is smallest is the same as finding x such that ||ΣV^T x − U^T b||₂ is smallest. Let y = V^T x and c = U^T b. Since the columns of Σ are orthogonal, they span a subspace of Rᵐ, and the only way to make ||Σy − c||₂ smallest is to make Σy the projection of c onto the column space of Σ.
Let C(Σ) = span{ω₁, ω₂, …, ω_r} be the column space of Σ, where r is the rank of A and ωᵢ denotes the ith column of Σ. For i ≤ r, ωᵢ has the ith diagonal entry σᵢ of Σ as its ith entry and zeros elsewhere. So Σy must equal

proj_{C(Σ)} c = Σᵢ₌₁ʳ (⟨c, ωᵢ⟩ / ⟨ωᵢ, ωᵢ⟩) ωᵢ.

Since ⟨ωᵢ, ωᵢ⟩ = σᵢ², we can rewrite proj_{C(Σ)} c as

(⟨c, ω₁⟩/σ₁²) ω₁ + (⟨c, ω₂⟩/σ₂²) ω₂ + … + (⟨c, ω_r⟩/σ_r²) ω_r.

Keep in mind that there are n columns in Σ, of which the last n − r have all zero entries. Taking the coefficient ⟨c, ωᵢ⟩/σᵢ² to be 0 for r < i ≤ n, we may formally extend the sum over all n columns.
In matrix form,

proj_{C(Σ)} c = [ω₁ ω₂ … ωₙ] (⟨c, ω₁⟩/σ₁², ⟨c, ω₂⟩/σ₂², …, ⟨c, ωₙ⟩/σₙ²)^T
             = [ω₁ ω₂ … ωₙ] [ω₁^T/σ₁²
                             ω₂^T/σ₂²
                                ⋮
                             ωₙ^T/σₙ²] c.

It turns out that

Σy = proj_{C(Σ)} c = ΣΣ⁺c,

so y = Σ⁺c is a solution; that is,

V^T x = Σ⁺U^T b.

Multiplying both sides by V,

V V^T x = x = VΣ⁺U^T b.

We conclude that

x = A⁺b. ∎
Remark: In this case, UΣV^T is the full SVD of the matrix A and A⁺ = VΣ⁺U^T. However, if we let the reduced SVD of A be U′Σ′V′^T, then A⁺ is also equal to V′Σ′⁺U′^T, because there are m − r all-zero columns and n − r all-zero rows in Σ⁺. When computing the product VΣ⁺, the last n − r columns of V don't have any effect on the product, and the product is an n × m matrix whose last m − r columns are all zeros. When multiplying by the matrix U^T, again the last m − r rows don't have any effect on the result. So the product VΣ⁺U^T equals V′Σ′⁺U′^T.
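As a sketch of Theorem 2.1 in Matlab (the data points here are made up for illustration), we can build A⁺ from the SVD and check it against Matlab's built-in pinv and backslash solvers:

% Least-squares fit y ~ p(1)*x + p(2) via the pseudoinverse.
x = [1; 2; 3; 4];  y = [2.1; 3.9; 6.2; 7.8];   % illustrative data
A = [x, ones(size(x))];
[U, S, V] = svd(A);                    % full SVD: U is 4x4, S is 4x2, V is 2x2
Splus = S';                            % transpose, then invert nonzero entries
Splus(Splus > 0) = 1 ./ Splus(Splus > 0);
xhat = V * Splus * (U' * y);           % x = A+ b
disp([xhat, pinv(A) * y, A \ y])       % all three columns agree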
There are cases where compression of files or images is necessary during transmission between devices. For example, when we browse images on the internet with our cellphones, images might be resized to better fit on the screen.

We know an image is composed of pixels and the color of each pixel is composed of three primary colors: red, green and blue. Any image can thus be represented as three matrices R, G and B whose entries are the color levels at the corresponding positions.

To better explain the idea, we use a black-and-white image² for our example, represented by a single matrix of grey values.
Let M be the matrix whose entries are the grey values at the corresponding positions of this image. We claim that M′ is the k-rank approximation of M, where

M′ = U′Σ′V′^T,

U′ only has the first k columns of U, Σ′ only has the first k singular values of Σ, and V′^T only has the first k rows of V^T.

If we set a value k and take the k-rank approximation of the image matrix, we obtain an approximate image matrix corresponding to the resulting compressed image. Letting k = 160, 40, 10 respectively and using Matlab code 2.2 below, we obtain the compressed versions of the image.
² Downloaded from http://ucsdnews.ucsd.edu/slideshow/page/welcome week/2011
Since we drop a few singular values, when we recompute the approximated matrix we only need to deal with the columns of U and the rows of V^T that actually get used. In other words, only the first k columns of U, the first k singular values and the first k rows of V^T need to be stored: about k(m + n + 1) numbers instead of mn for an m × n image, and thus we reduce memory usage.
To see how much "information" such an approximation keeps, we show that ||M||², the square of the Frobenius norm of M, equals the sum of the squares of the singular values.

Proof. Suppose M = UΣV^T. Let m_ij, u_ij, s_ij, v_ij be the entries in the ith row, jth column of the matrices M, U, Σ, V^T respectively. Then

m_ij = Σ_k Σ_l u_ik s_kl v_lj,

so

||M||² = Σ_i Σ_j (m_ij)² = Σ_i Σ_j (Σ_k Σ_l u_ik s_kl v_lj)².   (1)
As we square a sum of terms, we can create two copies of the sum and multiply each term of the first copy by each term of the second copy:

(Σ_k Σ_l u_ik s_kl v_lj)² = (Σ_k Σ_l u_ik s_kl v_lj)(Σ_m Σ_n u_im s_mn v_nj) = Σ_k Σ_l Σ_m Σ_n u_ik s_kl v_lj u_im s_mn v_nj.

Therefore,

||M||² = Σ_i Σ_j Σ_k Σ_l Σ_m Σ_n u_ik s_kl v_lj u_im s_mn v_nj.   (2)

Since Σ is diagonal, s_kl = 0 unless k = l, and s_mn = 0 unless m = n, so the sums over l and n collapse:

||M||² = Σ_i Σ_j Σ_k Σ_m u_ik s_kk v_kj u_im s_mm v_mj.

Since U is column-orthonormal, Σ_i u_ik u_im = 1 if k = m and 0 otherwise. Then

||M||² = Σ_j Σ_k s_kk v_kj s_kk v_kj.

Since V^T is row-orthonormal, Σ_j v_kj v_kj = 1. Therefore,

||M||² = Σ_k (s_kk)².   (3)
Suppose we have M′ = U′Σ′V′^T, which preserves the first d singular values of Σ. Since U′Σ′V′^T = UΣ′V^T, where Σ′ here denotes Σ with the dropped singular values replaced by zeros,

M − M′ = U(Σ − Σ′)V^T,

and by the same computation as above,

||M − M′||² = Σ_k (s_kk − s′_kk)².   (4)

Since the only k for which s_kk differs from s′_kk are those entries set to zero in Σ′, ||M − M′||² is the sum of the squares of the dropped singular values.

In practice, people want to retain 90% of the information in the approximation. More specifically, we want the sum of the squares of the singular values in Σ′ to be at least 90% of the sum of the squares of the singular values in Σ.
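This rule is straightforward to apply in Matlab (a sketch, assuming the image matrix M is already loaded as a double array):

% Choose the smallest d retaining 90% of the sum of squared singular values.
s = svd(M);                          % singular values of the image matrix
energy = cumsum(s.^2) / sum(s.^2);   % fraction retained by the first d values
d = find(energy >= 0.90, 1);         % smallest such d
fprintf('keep d = %d of %d singular values\n', d, length(s));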
Matlab code 2.2 compresses an image with 7 different k-rank approximations (k = 320, 160, 80, 40, 20, 10, 5):
function CompressImage(filename)
% Compress a grayscale image with a sequence of k-rank SVD approximations.
% open an image
img = rgb2gray(imread(filename));
img = im2double(imresize(img, 0.5));

% display the original image
figure; imshow(img); title('original');

% get the SVD of the image
[U, S, V] = svd(img);

% get the diagonal entries of sigma
diagonals = diag(S);
n = length(diagonals);

ranks = [320 160 80 40 20 10 5];
for i = 1:length(ranks)
    % get the new sigma: zero out singular values from position ranks(i) on
    compressed_sigmas = diagonals;
    compressed_sigmas(ranks(i):end) = 0;

    compressed_S = S;
    compressed_S(1:n, 1:n) = diag(compressed_sigmas);

    % calculate the compressed image
    approx_image = U * compressed_S * V';

    % display the compressed image
    figure; imshow(approx_image); title(sprintf('rank %d', ranks(i)));
end
end
Companies such as Amazon and Netflix collect tons of data, for example users' browsing or purchasing records, and provide recommendations according to analyses of these data. Knowing the user better and recommending the products they truly like has become a crucial problem that needs an effective solution. In this chapter, we will discuss the application of singular-value decomposition to recommendation systems.

Suppose we are given the ratings of seven people for five different movies:
            Matrix   Inception   Star Wars   Casablanca   Titanic
John          1          1           1            0           0
Billy         3          3           3            0           0
Charlie       4          4           4            0           0
Bella         5          5           5            0           0
Jack          0          0           0            4           4
Alex          0          0           0            5           5
Harry         0          0           0            2           2
We can then represent these ratings as a matrix M whose rows are the ratings of each user respectively. If we compute the SVD of M, we get

    [1 1 1 0 0]   [0.14   0  ]
    [3 3 3 0 0]   [0.42   0  ]
    [4 4 4 0 0]   [0.56   0  ]  [12.4   0 ]  [0.58 0.58 0.58   0     0  ]
M = [5 5 5 0 0] = [0.70   0  ]  [  0   9.5]  [  0    0    0   0.71  0.71]
    [0 0 0 4 4]   [  0   0.60]
    [0 0 0 5 5]   [  0   0.75]       Σ                    V^T
    [0 0 0 2 2]   [  0   0.30]

                        U
    [0.14   0  ]
    [0.42   0  ]
    [0.56   0  ]
U = [0.70   0  ]
    [  0   0.60]
    [  0   0.75]
    [  0   0.30]
U: Connects users to movie genres. Each row represents how much the corresponding user likes each movie genre. For example, the entries 0.70 and 0 in the fourth row of U indicate that Bella prefers genre 1 to genre 2, where genre 1 represents "Sci-fi" and genre 2 represents "Romance".
Σ = [12.4   0
      0    9.5]
Σ: Each diagonal entry represents the strength of a genre. The strength of "Sci-fi" is greater than the strength of "Romance" because the data provide more information about the "Sci-fi" genre and the users who like "Sci-fi."
V^T = [0.58  0.58  0.58   0     0
        0     0     0    0.71  0.71]
V^T: Connects genres to movies. Each row represents how strongly each movie partakes of the corresponding genre. As we can see, Matrix, Inception and Star Wars all have the positive value 0.58 for the "Sci-fi" genre, while Casablanca and Titanic have 0's, as they don't partake of that genre at all.
Suppose there is a new user, Jane, who has only watched Star Wars and rates it 4. How could we use this system to make recommendations for her? First of all, we can represent her ratings as a vector u = [0, 0, 4, 0, 0]. By computing uV, we map Jane's ratings into the "genre space". Since uV = [2.32, 0], Jane is interested in "Sci-fi" movies but not in "Romance" movies. One useful thing we can do now is to map her representation back into "movie space" by computing
[2.32, 0]V^T, because the matrix V^T interprets the genres each movie partakes of. Computing [2.32, 0]V^T gives [1.35, 1.35, 1.35, 0, 0], which indicates that Jane would like Matrix, Inception and Star Wars, but not Casablanca or Titanic.
Another approach is to find users who are similar to Jane and provide recommendations to Jane according to the preferences of those similar users. We can use V to map all the users into "genre space". For example, John's ratings map to [1.74, 0], while Jack's map to [0, 5.68]. We can then measure the similarities between users by their cosine distances³ in "genre space". It is clear that Jane is more similar to John than to Jack: [2.32, 0] and [1.74, 0] have the same direction, which gives cosine distance 1, while [2.32, 0] and [0, 5.68] have a dot product of 0, which gives cosine distance 0, indicating that the angle between them is 90 degrees. By looking up John's preferences, we could then provide recommendations to Jane accordingly.
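These computations are easy to reproduce in Matlab (a sketch using the ratings matrix above; note that svd may return V with flipped signs, which doesn't change the geometry):

% Map users into "genre space" and compare them by cosine similarity.
M = [1 1 1 0 0; 3 3 3 0 0; 4 4 4 0 0; 5 5 5 0 0; ...
     0 0 0 4 4; 0 0 0 5 5; 0 0 0 2 2];
[U, S, V] = svd(M, 'econ');
V2 = V(:, 1:2);                      % keep the two genre directions
jane = [0 0 4 0 0] * V2;             % ~[2.32 0], up to sign
john = [1 1 1 0 0] * V2;             % ~[1.74 0], up to sign
jack = [0 0 0 4 4] * V2;             % ~[0 5.68], up to sign
cosJJ = dot(jane, john) / (norm(jane) * norm(john));   % = 1
disp(jane * V2')                     % back to movie space: ~[1.35 1.35 1.35 0 0]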
Remark: In the example explained above, all the users rate either "Sci-fi" or "Romance" movies, but not both. In this case the rank of the matrix M equals the number of "genres", which is 2, and the decomposition of M gives exactly the desired number of columns for U and V. However, in real life we might not have such a simple case. In fact, the rank of M may be greater than the number of columns we want for U, Σ, V. In that case we need to eliminate the smallest singular values and the corresponding columns of U and V to get an approximation of M, as long as the approximation retains "enough" information from the original one, as with the k-rank approximation discussed above.
Handwritten digits are very common in our lives, and there are cases where people want computers to recognize handwritten digits or letters. For example, post offices require an automated and precise way to recognize handwritten ZIP codes; this is now automated by Optical Character Recognition (OCR) [10]. In this chapter, we will discuss the application of the SVD to handwritten digit classification.

Suppose we have a set of k training images which have been classified as digit i; we want to use these images to construct a subspace representing digit i.
³ The cosine distance between two vectors u and v is given by cos θ = ⟨u, v⟩ / (||u||₂ ||v||₂).
Firstly, we have to cut the frames and resize these k images to size 16 × 16. Then we construct k matrices whose entries are the grey levels of the pixels at the corresponding locations of each image respectively. We vectorize each of these k matrices into a vector in R²⁵⁶ and use these vectors as columns to construct a matrix M.

Figure 8. Vectorization [12]
Proposition 2.2. Suppose M = UΣV^T. Then

{u₁, u₂, …, u_r}

form an orthonormal basis of the column space of M, where r is the rank of M and uᵢ is the ith column of U.

Proof. We can expand the SVD as a sum of rank-one matrices,

M = UΣV^T = σ₁u₁v₁^T + σ₂u₂v₂^T + … + σ_r u_r v_r^T,

where σᵢ, uᵢ, vᵢ^T are the ith singular value, the ith column of U and the ith row of V^T. Each σᵢuᵢvᵢ^T can be written as

σᵢuᵢvᵢ^T = σᵢ uᵢ [vᵢ(1) vᵢ(2) … vᵢ(n)] = [σᵢvᵢ(1)uᵢ   σᵢvᵢ(2)uᵢ   …   σᵢvᵢ(n)uᵢ],

where vᵢ(j) denotes the jth entry of the vector vᵢ^T and n is the number of entries in vᵢ^T. Then the jth column of M is

σ₁v₁(j)u₁ + σ₂v₂(j)u₂ + … + σ_r v_r(j) u_r,

and thus each column vector of M can be written as a linear combination of {u₁, u₂, …, u_r}. Since these r vectors are orthonormal and rank(M) = r, they form an orthonormal basis of the column space of M. ∎
If we compute the SVD of this matrix M, the columns of U form an orthonormal basis of the column space of matrix M. In other words, u₁, u₂, …, u_r form a "digit space" for one specific digit. Keep in mind that we have now only computed the subspace of one digit; we have to repeat the same process to obtain a subspace for each of the ten digits.

Now, given any unknown image of a digit, we do the pre-processing on this image as explained before and get a vector q. We want to find the "digit space" that q is closest to. The way to do that is to find the smallest residual between q and each of the orthonormal spaces. Since u₁, u₂, …, u_r is an orthonormal basis for the column space of matrix M, denoting this column space W, the projection of q onto W is Σᵢ ⟨q, uᵢ⟩uᵢ, and the residual is ||q − Σᵢ ⟨q, uᵢ⟩uᵢ||₂.
Since there are 10 digits in total, we compute the residual between q and each of these 10 orthonormal bases and classify the unknown digit as d, where the residual between q and the orthonormal basis for digit d is smallest among all ten digits:

min_{0≤d≤9} || q − Σᵢ₌₁^{r_d} ⟨q, u_{d,i}⟩ u_{d,i} ||₂.
Handwritten digit classification is actually a classic problem in machine learning. There are many different approaches to solving this problem, such as principal component analysis (PCA), the nearest-neighbor method, statistical modelling and neural networks. It is always meaningful to implement the algorithm we use for the problem, as we can then analyze its efficiency and accuracy. We try to use the idea of singular-value decomposition described above to solve this problem.

But first we need some pre-classified training images. Thanks to the MNIST database, we can get a huge set of training images and their classifications. The MNIST database, which is available at [8], contains 60,000 training examples with corresponding classifying labels and 10,000 examples for testing.

The function below takes as inputs the number of training images to use for each digit and the number of testing samples we want to classify. The function outputs the percentage of correct classifications.
function [rate] = digitRecognition(numTrain, numTest)
% Classify MNIST test digits by the smallest residual against each digit
% space. Several lines were lost in extraction; this reconstruction assumes
% mnist.mat holds trainX (60000x784), trainY (1x60000), testX (10000x784)
% and testY (1x10000).
d = load('mnist.mat');
X = d.trainX;  Y = d.trainY;
A = d.testX;   B = d.testY;

digitSpace = zeros([256, numTrain, 10]);  % training vectors, one slice per digit
position = ones([1, 10]);                 % next free column for each digit

i = 0;
while i < (numTrain * 10)
    pos = randi(60000);
    digit = Y(1, pos);
    if position(1, digit+1) <= numTrain
        % crop the frame, resize to 16x16 and vectorize (28x28 input assumed)
        imageVec = double(ImageCrop2(reshape(X(pos, :), [28 28])'));
        digitSpace(:, position(1, digit+1), digit+1) = imageVec;
        position(1, digit+1) = position(1, digit+1) + 1;
        i = i + 1;
    end
end

% orthonormal basis ("digit space") for each digit
k = min(256, numTrain);
bases = zeros([256, k, 10]);
for j = 1:10
    [U, ~, ~] = svd(digitSpace(:, :, j), 'econ');
    bases(:, :, j) = U(:, 1:k);
end

err = 0;
for i = 1:numTest
    pos = randi(10000);
    unknown = reshape(A(pos, :), [28 28])';
    vv = double(ImageCrop2(unknown));

    res = zeros([1, 10]);
    for j = 1:10
        res(j) = residual(bases(:, :, j), k, vv);
    end

    value = find(res == min(res), 1);   % digit with the smallest residual
    value = value - 1;

    if (value ~= B(1, pos))
        err = err + 1;
    end
end

rate = double((numTest - err) / numTest);
end
Helper functions include "ImageCrop2", which takes an image as input in the form of a matrix and returns the pre-processed image (with the frame cut off and resized) as a vector.
function [vector] = ImageCrop2(matrix)
% Cut the empty frame around a digit and return a 256x1 vector.
[row, col] = size(matrix);

% leftmost nonzero column
sumCol = sum(matrix, 1);
Left = 0;
for i = 1:col
    if sumCol(i) > 0
        Left = i;
        break;
    end
end

% topmost nonzero row
sumRow = sum(matrix, 2);
Up = 0;
for i = 1:row
    if sumRow(i) > 0
        Up = i;
        break;
    end
end

% rightmost nonzero column
Right = 0;
for i = col:-1:1
    if sumCol(i) > 0
        Right = i;
        break;
    end
end

% bottom nonzero row
Bottom = 0;
for i = row:-1:1
    if sumRow(i) > 0
        Bottom = i;
        break;
    end
end

I = matrix;
% crop to the bounding box, resize to 16x16 and vectorize
I2 = I(Up:Bottom, Left:Right);
I3 = imresize(I2, [16 16]);
vector = I3(:);
end
The "residual" function returns the residual between a vector and the subspace spanned by an orthonormal basis.

Matlab code for residual:

function [res] = residual(basis, rank, vector)
% Distance from vector to the span of the first `rank` basis columns.
proj = 0;
for i = 1:rank
    proj = proj + (dot(vector, basis(:, i))) * basis(:, i);
end
res = norm(vector - proj);
end
When we use 100 training images for each of the 10 digits to form our "digit spaces", we get the following "mean digits". These averaged images are called centroids [9]. We are treating each image as a vector in R²⁵⁶, vectorized from its 16 × 16 matrix, and then taking the average of all images in each digit classification individually.
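A minimal sketch of computing these centroids, assuming digitSpace has been filled as in the digitRecognition code above:

% Compute and display the ten "mean digits" (centroids).
for j = 1:10
    centroid = mean(digitSpace(:, :, j), 2);   % average of the training vectors
    subplot(2, 5, j);
    imshow(reshape(centroid, [16 16]), []);    % back to a 16x16 image
    title(sprintf('%d', j-1));
end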
If we fix the number of testing samples at 1000 and use 5, 10, 15, 20, …, 100 training images for each digit, we get the corresponding classification percentages.
Figure 12.
Figure 13 shows the correct-classification rate as a function of the number of training images used for each digit. The percentage approaches 90% with 20 training digits and reaches a stable level of about 92% after 40 training digits.
Figure 13.
The accuracy of classifying 1000 handwritten digits increases by only about 2% with these extra 20 training digits, which gives about 20 more correct classifications. However, the time used to compute the "digit space" with 40 training digits is much longer than with 20. So it might not be necessary to use more than 20 training images per digit when computation time matters.
It is common for accuracy loss to happen when files or images are transmitted between devices. Most people have experienced a downloaded file or image being corrupted and inaccessible. Though this is rare in our lives, what could we actually do if we couldn't get access to the original file, or if the original file doesn't exist anymore? Could we obtain a best approximation of the original file? We adapt the idea from [4][5].
⁵ Image from https://www.math.vt.edu/ugresearch/Lassiter 2012 2013.pdf
Using a similar idea to the "Handwritten Digit Recognition" example, we can construct a "human face space" using different training images of a person's face (centered and of the same size). Viewing each image of the person as one vector and constructing a matrix M using these vectors as columns, we can obtain the "face space", which is the column space of matrix M, or more specifically the space spanned by {u₁, u₂, …, u_r}, where uᵢ is the ith column of the matrix U given by the SVD of M.
However, unlike images of digits, which are easily distinguishable two-dimensional pictures, images of human faces are much more complicated, in terms of the variations in facial expression, the angle of the face towards the camera, the different shadows cast by different light-source directions (known as illumination [6]) and the background of the photo. All these variations will cause variance between different images of the same individual. So it is necessary to normalize a face in order to minimize this variance [7]. One way to do that is to subtract the "mean human face" from each image.
More specifically, given k human-face vectors {v₁, v₂, …, v_k}, we define the mean face vector as

Mean = (1/k) Σᵢ₌₁ᵏ vᵢ.
We then subtract the mean from each face vector,

Φᵢ = vᵢ − Mean.

Then we construct a matrix M whose columns are all the Φᵢ's. If we compute the SVD of M and get UΣV^T, then by Proposition 2.2 we get a space spanned by the column vectors of M, and we call this the "face space".

Now, since we have a destroyed image of a human face, if we vectorize it and subtract the Mean, we get a vector, called Φ_des. The projection of Φ_des onto the face space is the best approximation of the destroyed image by a human-face image, so the reconstruction is done by taking this projection and adding the Mean back. We then re-frame the vector of the reconstruction into a square matrix; this matrix represents the recovered image.
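A minimal sketch of this reconstruction in Matlab, assuming the columns of F are the vectorized training faces and 'destroyed' is the vectorized damaged image (these names, and the image size imgSize, are ours for illustration):

% Project the damaged face onto the "face space" and add the mean back.
meanFace = mean(F, 2);                  % mean human face vector
Phi = F - meanFace;                     % subtract the mean from each column
[U, ~, ~] = svd(Phi, 'econ');           % columns of U span the face space
phiDes = destroyed - meanFace;
recon = meanFace + U * (U' * phiDes);   % projection onto the face space
imshow(reshape(recon, imgSize), [])     % reshape back into an image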
The recovered image, as given by [4], is shown below.
Acknowledgements
First of all, I would like to thank my adviser Thang Huynh for his patient guidance and encouragement during the two quarters of my studying and researching. I am deeply thankful to him for giving me this opportunity and for being my adviser.
I would also like to thank all the professors whose courses I have taken in the Math Department at UCSD. Without their advice and enthusiasm for mathematics, I wouldn't have the passion for mathematics needed to finish this paper. Special thanks to Professor Thang Huynh for his advice in the Honors Program and in my pursuit of mathematics.
References
[2] J. W. Demmel, Applied Numerical Linear Algebra, Soc. for Industrial and Applied Math, 1997, 109-117.
[3] J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, Cambridge University Press, 2011,
418-427.
[4] M. Mazack, Algorithms for Handwritten Digit Recognition, Master's colloquium, Mathematics Department,
[5] M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp.
71-86, 1991.
[6] T. Chen, W. Yin, X.-S. Zhou, D. Comaniciu, T. S. Huang, Illumination Normalization for Face Recognition and Uneven Background Correction Using Total Variation Based Image Models, CVPR, 2005.
[7] T. Jebara, 3D Pose Estimation and Normalization for Face Recognition, Center for Intelligent Machines,
McGill University, 1996, 61-73.