Undergraduate Honor Thesis

Singular-Value Decomposition and its Applications

Zecheng Kuang¹*

¹ Department of Mathematics, University of California San Diego, 9450 Gilman Dr, La Jolla, CA 92092

Abstract: The singular-value decomposition (SVD) is a factorization of a real or complex matrix. During my study in linear algebra courses, I found it interesting that a matrix can be decomposed in this way and suspected that the SVD might have many useful applications in real life. In this thesis, we discuss several applications of the SVD, such as recommendation systems, image compression and handwritten digit classification, together with the mathematical theory behind them.

1. Introduction to Singular-Value Decomposition

Singular-value decomposition (SVD) gives an exact representation of any matrix and makes it easy to eliminate the less important information in the matrix in order to produce a low-dimensional approximation. This is valuable in applications such as image compression and recommendation systems. Moreover, the nature of the SVD allows us to form an orthonormal basis for the subspace spanned by the columns of a matrix, which is useful in applications such as digit recognition and the reconstruction of destroyed images.

Let’s first discuss what Singular-value decomposition actually is.

1.1. The Definition of Singular-Value Decomposition

Let M be an m × n matrix, and let r be the rank of M. Then there exists a matrix factorization of M, called the singular-value decomposition (SVD)¹, of the form M = UΣV^T, where

1. U is an m × r column-orthonormal matrix; that is, each of its columns is a unit vector and the dot product of any two distinct columns is 0.

2. Σ is a diagonal matrix whose diagonal entries are called the singular values of M.

3. V^T is an r × n row-orthonormal matrix; that is, each of its rows is a unit vector and the dot product of any two distinct rows is 0.


E-mail: [email protected]
¹ A proof of the existence of the SVD can be found in Applied Numerical Linear Algebra by James W. Demmel [2].


Figure 1. Singular-Value Decomposition

The SVD of a matrix M has strong connections to the eigenvectors of the matrices M^T M and MM^T.

Proposition 1.1.
For any matrix M, M^T M and MM^T have non-negative eigenvalues.

Proof. Suppose v is an eigenvector of M^T M whose corresponding eigenvalue is λ; then we have

M^T M v = λv.

Multiplying both sides by v^T, we get

v^T M^T M v = λ v^T v.

This is equivalent to

||Mv||₂² = λ ||v||₂².

Since ||Mv||₂² ≥ 0 and ||v||₂² > 0 (an eigenvector is nonzero), we conclude λ ≥ 0.

Similarly, we can show MM^T has non-negative eigenvalues.

Proposition 1.2.
For any matrix M, rank(M) = rank(M^T M) = rank(MM^T).

Proof. Let Null(M) denote the null space of the matrix M. For x ∈ Null(M), we have

Mx = 0 ⇒ M^T Mx = 0.

So x ∈ Null(M^T M), which shows Null(M) ⊂ Null(M^T M).

Now suppose x ∈ Null(M^T M); we then have

M^T Mx = 0 ⇒ x^T M^T Mx = 0 ⇒ ||Mx||₂² = 0 ⇒ Mx = 0.

So x ∈ Null(M), which shows Null(M^T M) ⊂ Null(M).

Therefore Null(M) = Null(M^T M), so dim(Null(M)) = dim(Null(M^T M)) and, by the rank-nullity theorem, rank(M) = rank(M^T M).

Since rank(M) = rank(M^T), we also have rank(M^T) = rank(M^T M).

Now let's substitute M^T for M; we obtain

rank((M^T)^T) = rank((M^T)^T M^T).

Equivalently,

rank(M) = rank(MM^T).

Fact 1.1.
The rows of V^T are eigenvectors corresponding to the positive eigenvalues of M^T M, and the squares of the diagonal entries of Σ are the positive eigenvalues of M^T M. More specifically, the i-th row of V^T is the eigenvector of M^T M whose corresponding eigenvalue is the square of the i-th diagonal entry of Σ.

Proof. Let's begin with the transpose of M.

Suppose M = UΣV^T; then

M^T = (UΣV^T)^T = V Σ^T U^T.

Since Σ is a diagonal matrix, its transpose is itself. Thus we have

M^T = V Σ U^T.

Multiplying M^T by M, we have

M^T M = V Σ U^T U Σ V^T = V Σ² V^T.

Multiplying both sides on the right by V (and using V^T V = I), we get

M^T M V = V Σ².

It is not hard to see that the diagonal entries of Σ² are eigenvalues of M^T M and that V is the matrix whose columns are the corresponding eigenvectors of M^T M, which means the rows of V^T are the corresponding eigenvectors of M^T M.

Moreover, since M^T M is a symmetric matrix, the number of its non-zero eigenvalues equals its rank, which by Proposition 1.2 equals the rank of M, namely r. By Proposition 1.1, these r non-zero eigenvalues are positive. So the diagonal entries of Σ are the square roots of the positive eigenvalues of M^T M.
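As a quick numerical check of Fact 1.1, the following minimal MATLAB sketch compares the squared singular values with the eigenvalues of M^T M; the 3 × 2 matrix below is an arbitrary example chosen only for illustration:

% arbitrary example matrix (illustration only)
M = [1 2; 3 4; 5 6];

s = svd(M);                            % singular values, in descending order
lambda = sort(eig(M'*M), 'descend');   % eigenvalues of M'M, sorted the same way

disp([s.^2 lambda])                    % the two columns should agree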


Fact 1.2.
The columns of U are eigenvectors corresponding to the nonzero eigenvalues of MM^T.

Proof. Since we have

MM^T = (UΣV^T)(V Σ U^T) = U Σ² U^T,

multiplying both sides on the right by U, we get

MM^T U = U Σ² U^T U = U Σ².

We discover that U is the matrix whose columns are the corresponding eigenvectors of MM^T.

Moreover, there is a relationship between the eigenvalues of M^T M and the eigenvalues of MM^T.

Proposition 1.3.
For any matrix M, M^T M and MM^T have the same nonzero eigenvalues.

Proof. Suppose (v, λ) is an eigenpair of M^T M with λ ≠ 0. We can then write

M^T M v = λv.

Multiplying both sides by M, we have

M M^T (Mv) = M(λv) = λ Mv.

Let x = Mv (note that x ≠ 0, since otherwise λv = M^T Mv = 0 would force λ = 0); then

MM^T x = λx.

It follows that λ is also an eigenvalue of MM^T.

Now suppose (v, λ) is an eigenpair of MM^T with λ ≠ 0. We can write

MM^T v = λv.

Multiplying both sides by M^T, we have

M^T M (M^T v) = M^T (λv) = λ M^T v.

Let x = M^T v; then

M^T M x = λx.

It follows that λ is also an eigenvalue of M^T M.

So we conclude that the nonzero eigenvalues of M^T M and MM^T are the same.


Remark:

M^T M is an n × n matrix and MM^T is an m × m matrix, where n and m do not necessarily equal r. In fact, n and m are at least as large as r, which means that M^T M and MM^T have an additional n − r and m − r eigenpairs, respectively, with eigenvalue zero. In this paper, we use the reduced SVD [11], which discards the zero eigenvalues and the corresponding eigenvectors, so that U is an m × r matrix, V^T is an r × n matrix and Σ is r × r (the diagonal entries of Σ are the non-zero singular values of M).

The full SVD of an m × n matrix is defined as M = UΣV^T, where

U is an m × m matrix containing all the eigenvectors of MM^T, including the eigenvectors corresponding to the possible zero eigenvalues;

Σ is an m × n matrix whose entry in the i-th row and i-th column is the square root of the i-th eigenvalue (possibly zero) of M^T M;

V^T is an n × n matrix containing all the eigenvectors of M^T M, including the eigenvectors corresponding to the possible zero eigenvalues.

Example 1.1.
1 0 1 0
Let M = , the reduced SVD of M is given by
0 1 0 1

 p " #
p1 0 p1 0
1 0 2 p0 2 2
0 1 0 2 0 p1 0 p1
2 2 .
U ⌃ VT

The full SVD is given by 2 3


p1 0 p1 0
2 2
6 0 p1 0 p1 7
 p 6 2 27
1 0 2 p0 0 0 6 p1 0 p1
7
0 5
4 2 2
0 1 0 2 0 0 p1 p1 .
0 2
0 2
U ⌃
VT
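This example can be checked numerically with the following minimal MATLAB sketch (note that svd may return singular vectors with opposite signs, which is still a valid SVD):

M = [1 0 1 0; 0 1 0 1];

% reduced SVD: U is 2x2, S is 2x2, V is 4x2
[U, S, V] = svd(M, 'econ');

% full SVD: S is 2x4 and V is 4x4, including the null-space directions
[Uf, Sf, Vf] = svd(M);

% both factorizations reproduce M (up to rounding error)
disp(norm(M - U*S*V'));
disp(norm(M - Uf*Sf*Vf'));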

2. Applications of Singular-Value Decomposition


2.1. Least Squares Problems

In real life, there are times when we want to find a "best-fitting" curve for a set of given points. For example, people might want to find the relationship between the lean body mass of human bodies and their muscle strength, or analyze how housing prices change over the years to obtain an estimate of future prices.

Most of the time, we will not be able to find a linear equation that is satisfied by all of those points, so finding a "closest" solution is the best we can do. We call these solutions least-squares solutions.

How could we actually find such a "best-fitting" curve?

First, let's turn it into a mathematically defined problem.


Figure 2. Least square solution of the relation between human lean body mass and muscle strength. [1]

Question 2.1.
Suppose we denote the points on the graph as (x_i, y_i) pairs, where x_i and y_i are the x-coordinate and y-coordinate of point i. We want to find a line ax + by = c such that

∑_i (a x_i + b y_i − c)²

is as close to 0 as possible, which is the same as finding

min_{a, b, c ∈ R}  || a (x_1, x_2, ...)^T + b (y_1, y_2, ...)^T − c (1, 1, ...)^T ||₂.

If we let

A = [ x_1  y_1  −1
      x_2  y_2  −1
      ...  ...  ... ],     x = (a, b, c)^T,

the question then becomes to find x ∈ R³ such that ||Ax||₂ is smallest.

In general, a least-squares problem looks as follows:

Question 2.2.
Given A and b, where A is an m × n matrix and b ∈ R^m, we want to find x such that Ax is closest to b. In other words, find x such that

min_{x ∈ R^n} ||Ax − b||₂

is attained. Such an x is called a least-squares solution to the equation Ax = b.


How could we get the solution?

Firstly, let’s introduce the pseudoinverse of a matrix.

Definition 2.1 (Pseudoinverse of a Matrix [2]).

Suppose M is an m × n matrix and a singular-value decomposition of M is given by UΣV^T (either the full or the reduced SVD). Then the pseudoinverse of M is given by

M⁺ = V Σ⁺ U^T,

where Σ⁺ is obtained by transposing Σ and replacing each non-zero diagonal entry by its reciprocal.

Example 2.1.
Suppose

Σ = [ 3 0 0 0
      0 2 0 0
      0 0 0 0 ],

then

Σ⁺ = [ 1/3   0    0
        0   1/2   0
        0    0    0
        0    0    0 ].
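This can be verified with MATLAB's built-in pinv, a minimal sketch using the Σ from Example 2.1:

Sigma = [3 0 0 0; 0 2 0 0; 0 0 0 0];   % the Sigma from Example 2.1
disp(pinv(Sigma))                      % transposes Sigma and inverts the non-zero diagonal entries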

Theorem 2.1.
Given A and b, where A is an m × n matrix and b ∈ R^m,

x = A⁺ b

is a least-squares solution to Ax = b.

Proof. In this proof, we use the full version of the SVD, where U is m × m, V^T is n × n and Σ is m × n.

Let A = UΣV^T be the full SVD of the matrix A. Since U is column-orthonormal, U^T U = I_m, and since U is a square matrix, U^T is the inverse of U, so UU^T = I_m as well.

Then for any x ∈ R^n,

||Ax − b||₂ = ||UΣV^T x − UU^T b||₂ = ||U(ΣV^T x − U^T b)||₂.

Because U is orthogonal, multiplying by U preserves the 2-norm, so ||U(ΣV^T x − U^T b)||₂ = ||ΣV^T x − U^T b||₂; hence finding x that minimizes ||Ax − b||₂ is the same as finding x that minimizes ||ΣV^T x − U^T b||₂.

For convenience, write y = V^T x and c = U^T b. If we can find y that makes ||Σy − c||₂ smallest, then we can recover such an x accordingly.

Since the columns of Σ are orthogonal, they span a subspace of R^m, and the only way to make ||Σy − c||₂ smallest is to make Σy the projection of c onto the column space of Σ.

Definition 2.2 (Projection of a vector onto a subspace).

Let W be a subspace of R^n and let {u_1, ..., u_m} be an orthogonal basis for W. If v is a vector in R^n, the projection of v onto W is denoted proj_W v, where

proj_W v = (<v, u_1>/<u_1, u_1>) u_1 + (<v, u_2>/<u_2, u_2>) u_2 + ... + (<v, u_m>/<u_m, u_m>) u_m.

If {u_1, ..., u_m} is an orthonormal basis for W, then

proj_W v = <v, u_1> u_1 + <v, u_2> u_2 + ... + <v, u_m> u_m.
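As a small numerical illustration of this definition, here is a minimal MATLAB sketch; the two vectors spanning W are arbitrary and chosen only for this example:

% two arbitrary vectors spanning a subspace W of R^3 (illustration only)
W = [1 0; 1 1; 0 1];

Q = orth(W);                    % orthonormal basis for W
v = [3; 1; 2];

projWv = Q * (Q' * v);          % projection of v onto W
disp(projWv);
disp(norm(W' * (v - projWv)));  % the residual is orthogonal to W (should be ~0)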

Figure 3. Projection of a vector onto a subspace [13]

Let C(Σ) = span{ (σ_1, 0, ..., 0)^T, (0, σ_2, 0, ..., 0)^T, ..., (0, ..., 0, σ_r, 0, ..., 0)^T } be the column space of Σ, where r is the rank of A, and denote by ω_i the i-th column of Σ. Then Σy must equal

proj_{C(Σ)} c = (<c, ω_1>/<ω_1, ω_1>) ω_1 + (<c, ω_2>/<ω_2, ω_2>) ω_2 + ... + (<c, ω_r>/<ω_r, ω_r>) ω_r.

Since <ω_i, ω_i> = σ_i², we can rewrite proj_{C(Σ)} c as

<c, ω_1/σ_1²> ω_1 + <c, ω_2/σ_2²> ω_2 + ... + <c, ω_r/σ_r²> ω_r.

Keep in mind that Σ has n columns, of which the last n − r have all zero entries. The terms corresponding to those columns contribute nothing for r < i ≤ n, so the sum can formally be extended over all n columns (the corresponding rows of Σ⁺ are simply zero).

Now we can write

proj_{C(Σ)} c = ∑_{i=1}^{n} <c, ω_i/σ_i²> ω_i,

which is the same as the matrix product

[ ω_1  ω_2  ...  ω_n ] · [ ω_1^T / σ_1²
                           ω_2^T / σ_2²
                           ...
                           ω_n^T / σ_n² ] · c.

It turns out that

Σy = proj_{C(Σ)} c = Σ Σ⁺ c,

which we can achieve by taking

y = V^T x = Σ⁺ U^T b.

Multiplying both sides by V, we have

V V^T x = x = V Σ⁺ U^T b.

We conclude that

x = A⁺ b

is a least-squares solution to Ax = b.

Remark: In this proof, UΣV^T is the full SVD of the matrix A and A⁺ = VΣ⁺U^T. However, if we let the reduced SVD of A be U′Σ′V′^T, then A⁺ also equals V′Σ′⁺U′^T, because Σ⁺ has n − r all-zero rows and m − r all-zero columns. When computing the product VΣ⁺, the last n − r columns of V have no effect on the product, and the product is an n × m matrix whose last m − r columns are all zeros. When multiplying by the matrix U^T, again the last m − r rows of U^T have no effect on the result. So the product VΣ⁺U^T must be the same as the product V′Σ′⁺U′^T.
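To illustrate Theorem 2.1 numerically, the following minimal MATLAB sketch solves a small made-up least-squares problem three ways; the data A and b are arbitrary and chosen only for illustration:

% arbitrary over-determined system (illustration only)
A = [1 1; 1 2; 1 3; 1 4];
b = [2; 3; 5; 7];

% least-squares solution via the pseudoinverse, x = A^+ b
[U, S, V] = svd(A);
Splus = zeros(size(A'));                    % Sigma^+ has the transposed shape
r = rank(A);
Splus(1:r, 1:r) = diag(1 ./ diag(S(1:r, 1:r)));
x_svd = V * Splus * U' * b;

% reference solutions
x_pinv = pinv(A) * b;                       % built-in pseudoinverse
x_backslash = A \ b;                        % QR-based least squares

disp([x_svd, x_pinv, x_backslash]);         % all three columns should agree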

2.2. Image Compression

There are cases where compression of files or images is necessary during transmission between devices. For example, when we browse images on the internet with our cellphones, images might be resized to better fit the screen or compressed to increase the efficiency of transmission.

We know that an image is composed of pixels and that the color of each pixel is composed of three primary colors: red, green and blue. Any image can therefore be represented by three matrices R, G and B whose entries are the color levels at the corresponding positions.

To better explain the idea, we use a black-and-white image for our example, represented by a single matrix whose entries are the grey levels at each pixel.


Suppose we have an image of Triton2 .

Figure 4. Original Image of Triton

Figure 5. Black and White Image of Triton

Let M be the matrix whose entries are the grey values at the corresponding positions of this image. The k-rank approximation of M is

M′ = U′ Σ′ V′^T,

where

U′ consists of the first k columns of U;

Σ′ contains only the first k singular values of Σ;

V′^T consists of the first k rows of V^T.

If we fix a value of k and take the best k-rank approximation of the image matrix, we obtain an approximate image matrix corresponding to the resulting compressed image. Letting k = 160, 40, 10, respectively, and using Matlab code 2.2, we get the 3 compressed images shown in Figure 6.

2
Downloaded from http://ucsdnews.ucsd.edu/slideshow/page/welcome week/2011


Figure 6. Compressed Image of Triton

Since we drop some of the singular values, when we recompute the approximated matrix we only need the columns of U and the rows of V^T that are actually used. In other words, only the first k columns of U, the first k singular values and the first k rows of V^T need to be stored, and thus we reduce memory usage.
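For instance (a rough back-of-the-envelope count, assuming one stored number per matrix entry), a rank-k approximation of an m × n image needs only k(m + n + 1) numbers instead of mn; a minimal MATLAB sketch with illustrative sizes:

m = 512; n = 512; k = 40;         % illustrative image size and rank

full_storage = m * n;             % numbers needed for the original matrix
svd_storage  = k * (m + n + 1);   % first k columns of U, k singular values, first k rows of V'

fprintf('compression ratio: %.2f\n', full_storage / svd_storage);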

But why would this work?

Proposition 2.1 ([3]).

Let UΣV^T be the SVD of M, and let M′ be the product UΣ′V^T, where Σ′ is constructed by setting the last r − k singular values of Σ to zero. We say that M′ is the best k-rank approximation of M; then ||M − M′||² equals the sum of the squares of the singular values that were set to zero.

Proof. Suppose M = UΣV^T.
Let m_ij, u_ij, s_ij and v^T_ij denote the entries in the i-th row and j-th column of the matrices M, U, Σ and V^T respectively.

By the definition of matrix multiplication,

m_ij = ∑_k ∑_l u_ik s_kl v^T_lj.

Then the square of the Frobenius norm of M is

||M||² = ∑_i ∑_j (m_ij)² = ∑_i ∑_j ( ∑_k ∑_l u_ik s_kl v^T_lj )².    (1)

When we square a sum of terms, we create two copies of the sum and multiply each term of the first copy by each term of the second copy:

( ∑_k ∑_l u_ik s_kl v^T_lj )² = ( ∑_k ∑_l u_ik s_kl v^T_lj )( ∑_m ∑_n u_im s_mn v^T_nj ) = ∑_k ∑_l ∑_m ∑_n u_ik s_kl v^T_lj u_im s_mn v^T_nj.

So equation (1) can be written as

||M||² = ∑_i ∑_j ∑_k ∑_l ∑_m ∑_n u_ik s_kl v^T_lj u_im s_mn v^T_nj.    (2)

Since Σ is a diagonal matrix, s_kl = 0 unless k = l, and s_mn = 0 unless m = n. So we can rewrite equation (2) as

||M||² = ∑_i ∑_j ∑_k ∑_m u_ik s_kk v^T_kj u_im s_mm v^T_mj.

Since U is column-orthonormal, ∑_i u_ik u_im = 1 if k = m and 0 otherwise. Then

||M||² = ∑_j ∑_k s_kk v^T_kj s_kk v^T_kj.

Since V^T is row-orthonormal, ∑_j v^T_kj v^T_kj = 1. Therefore,

||M||² = ∑_k (s_kk)².    (3)

Now suppose M′ = U′Σ′V′^T, where Σ′ keeps some of the singular values of Σ and sets the rest to zero. Since U′Σ′V′^T = UΣ′V^T (the dropped columns of U and rows of V^T multiply only zero entries of Σ′), we have

M − M′ = U(Σ − Σ′)V^T.

Applying equation (3) to this difference, the squared Frobenius norm of M − M′ is

||M − M′||² = ∑_k (s_kk − s′_kk)²,    (4)

where s′_ij denotes the entry in the i-th row and j-th column of Σ′.

The only indices k for which s_kk differs from s′_kk are those whose entries were set to zero in Σ′. That is, ||M − M′||² is the sum of the squares of the elements of Σ that were set to 0.

So to minimize ||M − M′||², we should set the smallest elements of Σ to zero.

In practice, people want to retain 90% of the information in the approximation. More specifically, we want the sum of the squares of the singular values kept in Σ′ to be at least 90% of the sum of the squares of the singular values of Σ.
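A small MATLAB sketch of this rule (the 0.90 threshold is the retention level mentioned above; 'triton.jpg' is a placeholder filename for the image being compressed):

% singular values of the grayscale image matrix, sorted descending
s = svd(im2double(rgb2gray(imread('triton.jpg'))));

energy = cumsum(s.^2) / sum(s.^2);   % fraction of squared-singular-value "energy" retained
k = find(energy >= 0.90, 1);         % smallest rank retaining at least 90%
fprintf('Keep k = %d singular values\n', k);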

Matlab code 2.2 to compress an image with 7 different k-rank approximations (320, 160, 80, 40, 20, 10, 5):


function [output_args] = CompressImage(filename)
% open an image and convert it to a half-size grayscale matrix
image = rgb2gray(imread(filename));
image = im2double(imresize(image, 0.5));

% display the original image
subplot(4, 2, 1), imshow(image), title(sprintf('Original Image'));

% get the SVD of the image
[U, S, V] = svd(image);

% get the diagonal entries of sigma
diagonals = diag(S);

ranks = [320, 160, 80, 40, 20, 10, 5];
for i = 1:length(ranks)
    % get the new sigma: keep only the leading singular values
    compressed_sigmas = diagonals;
    compressed_sigmas(ranks(i):end) = 0;

    compressed_S = S;
    n = length(diagonals);
    compressed_S(1:n, 1:n) = diag(compressed_sigmas);

    % calculate the compressed image
    approx_image = U * compressed_S * V';

    % display the compressed image
    subplot(4, 2, i+1), imshow(approx_image), title(sprintf('Rank %d Image', ranks(i)));
end
end


2.3. Recommendation System [3]

Companies such as Amazon and Netflix collect tons of data, for example users' browsing or purchasing records, and provide recommendations to them based on analyses of that data. How to understand users better and recommend the products they truly like becomes a crucial problem that needs an effective solution. In this chapter, we discuss the application of singular-value decomposition in recommendation systems and explain how it works.

Suppose we are given the ratings of seven people for five different movies:

User \ Movie   Matrix   Inception   Star Wars   Casablanca   Titanic
John              1          1           1           0           0
Billy             3          3           3           0           0
Charlie           4          4           4           0           0
Bella             5          5           5           0           0
Jack              0          0           0           4           4
Alex              0          0           0           5           5
Harry             0          0           0           2           2

We can then represent these ratings as a matrix M whose rows are the ratings of each user. If we compute the SVD of this matrix M, we get M = UΣV^T with

M = [ 1 1 1 0 0
      3 3 3 0 0
      4 4 4 0 0
      5 5 5 0 0
      0 0 0 4 4
      0 0 0 5 5
      0 0 0 2 2 ],

U = [ 0.14  0
      0.42  0
      0.56  0
      0.70  0
      0     0.60
      0     0.75
      0     0.30 ],

Σ = [ 12.4  0
      0     9.5 ],

V^T = [ 0.58 0.58 0.58 0    0
        0    0    0    0.71 0.71 ].


There is a natural way to interpret U, Σ and V^T.

U = [ 0.14  0
      0.42  0
      0.56  0
      0.70  0
      0     0.60
      0     0.75
      0     0.30 ]

U connects users to movie genres. Each row represents how much each user likes each genre. For example, the numbers 0.70 and 0 in the fourth row of U indicate that Bella prefers genre 1 to genre 2, where genre 1 represents "Sci-fi" and genre 2 represents "Romance" in this case.

Σ = [ 12.4  0
      0     9.5 ]

Σ: each diagonal entry represents the strength of a genre. The strength of "Sci-fi" is greater than the strength of "Romance" because the data provides more information about the "Sci-fi" genre and about the users who like "Sci-fi".

V^T = [ 0.58 0.58 0.58 0    0
        0    0    0    0.71 0.71 ]

V^T connects genres to movies. Each row represents how strongly each movie partakes of a genre. As we can see, Matrix, Inception and Star Wars all have the positive value 0.58 in the "Sci-fi" row, while Casablanca and Titanic have 0's, as they don't involve any concept of "Sci-fi".

Suppose there is a new user, Jane, who has only watched Star Wars and rates it 4. How could we use this system to recommend movies to her?

First of all, we can represent her ratings as a vector u = [0, 0, 4, 0, 0]. By computing uV, we map Jane's ratings into the "genre space". Since uV = [2.32, 0], Jane is interested in Sci-fi movies but not interested in "Romance" at all. We now have a representation of Jane's ratings in "genre space".

One useful thing we can do now is to map her representation back into "movie space" by computing [2.32, 0]V^T, because the matrix V^T describes how much of each genre each movie partakes. Computing [2.32, 0]V^T gives [1.35, 1.35, 1.35, 0, 0], which indicates that Jane would like Matrix, Inception and Star Wars, but not Casablanca or Titanic.

Another approach is to find users who are similar to Jane and provide recommendations to Jane according to the preferences of those similar users. We can use V to map all the users into "genre space". For example, John's ratings map to [1.74, 0], while Jack's map to [0, 5.68]. We can then measure the similarity between users by their cosine distances³ in "genre space". It is clear that Jane is more similar to John than to Jack, because [2.32, 0] and [1.74, 0] point in the same direction, which gives cosine distance 1, while [2.32, 0] and [0, 5.68] have a dot product of 0, which gives cosine distance 0, indicating that the angle between them is 90 degrees. By looking up John's preferences, we can then provide recommendations to Jane accordingly.
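The whole procedure can be sketched in MATLAB as follows (a minimal sketch using the ratings matrix from the example; the variable names are my own, and svd may flip the signs of the singular vectors, which does not affect the cosine comparison):

M = [1 1 1 0 0; 3 3 3 0 0; 4 4 4 0 0; 5 5 5 0 0; ...
     0 0 0 4 4; 0 0 0 5 5; 0 0 0 2 2];
[U, S, V] = svd(M, 'econ');
V2 = V(:, 1:2);                      % keep the two dominant "genres"

jane = [0 0 4 0 0];                  % Jane only rates Star Wars a 4
jane_genre = jane * V2;              % Jane mapped into genre space
jane_movies = jane_genre * V2';      % mapped back into movie space
disp(jane_movies);

john = [1 1 1 0 0] * V2;             % John in genre space
jack = [0 0 0 4 4] * V2;             % Jack in genre space

% cosine similarity (called "cosine distance" in the text) between users
cos_john = dot(jane_genre, john) / (norm(jane_genre) * norm(john));
cos_jack = dot(jane_genre, jack) / (norm(jane_genre) * norm(jack));
disp([cos_john, cos_jack]);          % close to 1 for John, 0 for Jack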

Remark:

In the example above, every user rates either "Sci-fi" or "Romance" movies, but not both. In this case, the rank of the matrix M equals the number of "genres", which is 2, and the decomposition of M gives exactly the desired number of columns of U and V. However, in real life we might not have such a simple case. In fact, the rank of M may be greater than the number of columns we want for U, Σ and V. In that case, we need to eliminate the smallest singular values and the corresponding columns of U and V to obtain an approximation of M, as long as the approximation retains "enough" information from the original (see Proposition 2.1).

2.4. Classification of Handwritten Digits [4]

Handwritten digits are very common in our lives, and there are cases where people want computers to recognize handwritten digits or letters. For example, post offices require an automated and precise way to recognize handwritten zip codes. This is now automated by Optical Character Recognition (OCR) [10]. In this chapter, we discuss the application of the SVD to handwritten digit classification.

Suppose we have a set of k training images which have been classified as digit i; we want to use these images to form our "recognition system".

³ The cosine distance between two vectors u and v is given by cos θ = <u, v> / (||u||₂ ||v||₂).


Figure 7. 8 training images of digit 6

First, we have to cut away the frames and resize these k images to size 16 × 16. Then we construct k matrices whose entries are the grey levels of the pixels at the corresponding locations of each image. We vectorize each of these k matrices into a vector in R²⁵⁶ and use these vectors as columns to construct a matrix M, so M is 256 by k. [8][9]

Figure 8. Vectorization[12]

Figure 9. Construction of matrix M


Proposition 2.2.
Suppose M = UΣV^T. Then

{u_1, u_2, ..., u_r}

forms an orthonormal basis of the column space of M, where r is the rank of M and u_i is the i-th column of U.

Proof. Suppose M has rank r. We can write

M = UΣV^T = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_r u_r v_r^T,

where σ_i, u_i and v_i^T are the i-th singular value of Σ, the i-th column of U and the i-th row of V^T respectively.

Each term σ_i u_i v_i^T can be written as

σ_i u_i [ v_i1  v_i2  ...  v_in ] = [ σ_i v_i1 u_i   σ_i v_i2 u_i   ...   σ_i v_in u_i ],

where v_ij denotes the j-th entry of the row vector v_i^T and n is the number of its entries. Then we have

M = [ σ_1 v_11 u_1 + σ_2 v_21 u_2 + ... + σ_r v_r1 u_r    ......    σ_1 v_1n u_1 + σ_2 v_2n u_2 + ... + σ_r v_rn u_r ].

In general, the j-th column of M can be written as

σ_1 v_1j u_1 + σ_2 v_2j u_2 + ... + σ_r v_rj u_r,

and thus each column vector of M is a linear combination of {u_1, u_2, ..., u_r}.

So if we compute the SVD of this matrix M, the columns of U form an orthonormal basis of the column space of the matrix M. In other words, u_1, u_2, ..., u_r form a "digit space" for a specific digit.

Keep in mind that we have now only computed the subspace for one digit. We have to repeat the same process to compute the subspaces of all ten digits.

Now, given any unknown image of a digit, we apply the same pre-processing to this image as explained before and get a vector q. We want to find the "digit space" that q is closest to. The way to do that is to find the smallest residual between q and each of the orthonormal bases.

Since u_1, u_2, ..., u_r is an orthonormal basis for the column space of the matrix M, denoting this column space by W, we define the residual between q and the orthonormal basis as the distance between q and proj_W q [2.2], which is given by

||q − ∑_{i=1}^{r} <q, u_i> u_i||₂.


Since there are 10 digits in total, we compute the residual between q and each of these 10 orthonormal bases and classify the unknown digit as d, where the residual between q and the orthonormal basis for digit d is the smallest among all ten digits.

In other words, we want to find d such that

||q − ∑_{i=1}^{r_d} <q, u_{d,i}> u_{d,i}||₂

is minimized over 0 ≤ d ≤ 9, and we conclude that this d is the classification of the unknown digit.
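Since the columns of the basis matrix are orthonormal, this residual can also be computed in a single vectorized step; a minimal MATLAB sketch, where Ud and q stand for the basis matrix for digit d and the preprocessed test vector described above:

% Ud: 256-by-r orthonormal basis of the digit space for digit d
% q:  256-by-1 preprocessed (cropped, resized, vectorized) test image
residual_d = norm(q - Ud * (Ud' * q));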

Handwritten digit classification is actually a classic problem in machine learning. There are many different approaches to solving it, such as principal component analysis (PCA), the nearest-neighbor method, statistical modelling and neural networks. It is always worthwhile to implement the algorithm we use for the problem, as we can then analyze its efficiency and accuracy. We use the idea of singular-value decomposition discussed above to write Matlab code that classifies handwritten digits.

But first we need some pre-classified training images. Thanks to the MNIST database, we can get a huge set of training images and their classifications. The MNIST database, which is available at [8], contains 60,000 training examples with corresponding classification labels and 10,000 examples for testing.

Figure 10. Sample images from MNIST test dataset4

The Matlab code is given below.

The "digitRecognition" function takes two inputs: the number of training images used for each digit and the number of testing samples we want to classify. The function outputs the percentage of correct classifications.

Matlab code for digitRecognition:

function [rate] = digitRecognition(numTrain, numTest)
% load the MNIST training and test sets
d = load('mnist.mat');
X = d.trainX;
Y = d.trainY;
A = d.testX;
B = d.testY;

% digitSpace(:, j, d+1) holds the j-th training vector for digit d
digitSpace = zeros([256, numTrain, 10]);

position = ones([1, 10]);

% fill each digit space with randomly chosen training images
i = 0;
while i < (numTrain * 10)
    pos = randi(60000);
    digit = Y(1, pos);
    image = reshape(X(pos, :), 28, 28)';
    imageVec = ImageCrop2(image);
    if position(1, digit+1) <= numTrain
        digitSpace(:, position(1, digit+1), digit+1) = imageVec;
        position(1, digit+1) = position(1, digit+1) + 1;
        i = i + 1;
    end
end

% classify numTest randomly chosen test images
err = 0;
for i = 1:numTest
    pos = randi(10000);
    unknown = reshape(A(pos, :), 28, 28)';
    vv = double(ImageCrop2(unknown));

    % residual between the test vector and each of the 10 digit spaces
    res = zeros([1, 10]);
    for digit = 0:9
        [U, S, V] = svd(double(digitSpace(:, :, digit+1)), 'econ');
        rank = length(diag(S));
        res(1, digit+1) = residual(U, rank, vv);
    end

    % classify as the digit with the smallest residual
    value = find(res == min(res));
    value = value - 1;

    if (value ~= B(1, pos))
        err = err + 1;
    end
end

% fraction of correct classifications
rate = double((numTest - err) / numTest);
end

Helper functions include "ImageCrop2", which takes an image as input in the form of a matrix and returns the pre-processed image (with the frame cut away) as a vector.

Matlab code for ImageCrop2:

function [vector] = ImageCrop2(matrix)
[row, col] = size(matrix);

% leftmost non-empty column
sumCol = sum(matrix, 1);
Left = 0;
for i = 1:col
    if sumCol(i) > 0
        Left = i;
        break;
    end
end

% topmost non-empty row
sumRow = sum(matrix, 2);
Up = 0;
for i = 1:row
    if sumRow(i) > 0
        Up = i;
        break;
    end
end

% rightmost non-empty column
sumCol = sum(matrix, 1);
Right = 0;
for i = col:-1:1
    if sumCol(i) > 0
        Right = i;
        break;
    end
end

% bottommost non-empty row
sumRow = sum(matrix, 2);
Bottom = 0;
for i = row:-1:1
    if sumRow(i) > 0
        Bottom = i;
        break;
    end
end

% crop to the bounding box of the digit, resize to 16x16 and vectorize
I = matrix;
I2 = imcrop(I, [Left Up Right-Left Bottom-Up]);

I3 = imresize(I2, [16 16]);
vector = I3(:);

end

The "residual" function returns the residual between a vector and a subspace.

Matlab code for residual:

function [res] = residual(basis, rank, vector)
% project "vector" onto the space spanned by the first "rank" columns of "basis"
proj = 0;
for i = 1:rank
    proj = proj + (dot(vector, basis(:, i))) * basis(:, i);
end
% the residual is the distance between the vector and its projection
res = norm(vector - proj);

end

When we use 100 training images for each of the 10 digits to form our “digit space”, we get the following “mean
digits”.

Figure 11. 10 Mean Digits when using 100 training images

These averaged images are called centroids [9]. We treat each image as a vector in R²⁵⁶, vectorized from a matrix of dimension 16 × 16, and then take the average of all the images in each digit class individually.

But how many training images do we actually need in practice?

If we fix the number of testing samples at 1000 and use 5, 10, 15, 20, ..., 100 training images for each digit, we can obtain the corresponding classification percentages.


Figure 12.

Figure 13 shows the classification accuracy as a function of the number of training images used for each digit. The percentage approaches 90% with 20 training images per digit and reaches a stable level of about 92% after 40 training images per digit.


Figure 13.

Going from 20 to 40 training images per digit increases the accuracy of classifying 1000 handwritten digits by only about 2%, which gives about 20 more correct classifications. However, the time needed to compute the "digit spaces" using 40 training images is much longer than with 20. So, depending on the accuracy required, it might not be necessary to use more than 40 training images per digit.

2.5. Reconstruction of Destroyed Images of Human Faces

It is common for accuracy loss to occur during the transmission of files or images between devices. Most people have experienced the situation where a downloaded file or image is corrupted and can't be accessed. Though this is rare in our lives, what could we actually do if we could not get access to the original file, or if the original file no longer exists? Could we obtain a best approximation of the original file? We adapt the idea from [4][5].

Suppose we have a destroyed image of a human face, as shown below⁵.

5
Image from https://www.math.vt.edu/ugresearch/Lassiter 2012 2013.pdf


Figure 14. Destroyed Image of Human Face

Using a similar idea to the handwritten digit recognition example, we can construct a "human face space" using different training images of the person's face (centered and of the same size). Viewing each image of the person as one vector and constructing a matrix M with these vectors as columns, we obtain the "face space", which is the column space of the matrix M, or more specifically, the space spanned by {u_1, u_2, ..., u_r}, where u_i is the i-th column of the matrix U given by the SVD of M.

Figure 15. Face Space [5]

However, unlike images of digits, which are easily distinguishable 2-dimensional pictures, images of human faces are much more complicated because of the variations in facial expression, the angle of the face towards the camera, the different shadows cast by different light source directions (known as illumination [6]) and the background of the photo. An individual's identity is captured along with all these variations, which cause variance between different images of the same individual. So it is necessary to normalize a face in order to minimize this variance [7]. One way to do that is to subtract the "mean human face" from each image.

More specifically, given k human face vectors {v_1, v_2, ..., v_k}, we define the mean face vector as

Mean = (1/k) ∑_{i=1}^{k} v_i.


For each v_i we subtract the mean face,

φ_i = v_i − Mean,

to get the normalized face vector φ_i.

Then we construct a matrix M whose columns are the φ_i's. If we compute the SVD of M and get UΣV^T, then by Proposition 2.2 we get a space spanned by the column vectors of M; we call this the "face space".

Now, given a destroyed image of a human face, if we vectorize it and subtract Mean, we get a vector, called φ_des.

The reconstruction is then done by finding the best human-face approximation to this image, which is the projection of φ_des onto the "face space":

reconstruction = proj_{Span{φ_1, φ_2, ..., φ_k}} φ_des.

We now reshape the reconstruction vector back into a square matrix. This matrix then represents the recovered image of the human face.


To summarize: we normalize all the face vectors, that is, for every v_i we compute v_i − (1/k) ∑_{n=1}^{k} v_n. The projection of (the vectorized destroyed image − Mean) onto the face space is then the best approximation of the destroyed image by a human face image, and we obtain the recovered image by taking this projection.
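A minimal MATLAB sketch of the whole reconstruction pipeline (the matrix F of vectorized training faces and the vectorized destroyed image des are assumed to be given; adding the mean face back at the end so the result can be displayed is my own detail, not spelled out in the text):

% F:   p-by-k matrix whose columns are k vectorized training face images
% des: p-by-1 vectorized destroyed face image (same size and alignment)
meanFace = mean(F, 2);                  % mean face vector
Phi = F - meanFace;                     % mean-subtracted (normalized) faces (implicit expansion, R2016b+)
[U, ~, ~] = svd(Phi, 'econ');           % columns of U span the "face space" (Proposition 2.2)

phi_des = des - meanFace;               % normalize the destroyed image
recon = U * (U' * phi_des) + meanFace;  % project onto the face space and add the mean back

n = sqrt(size(F, 1));                   % assuming square n-by-n images, p = n^2
imshow(reshape(recon, n, n), []);       % display the recovered face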
The recovered image, as given by [4], is shown below.

Figure 16. Recovered Image of Human Face

Acknowledgements

First of all, I would like to thank my adviser Thang Huynh for his patient guidance and encouragement during the two quarters of my study and research. I am deeply thankful to him for giving me this opportunity and for being my adviser.

I would also like to thank all the professors whose courses I have taken in the Math Department at UCSD. Without their advice and enthusiasm for mathematics, I would not have the passion for mathematics needed to finish this paper. Special thanks to Professor Thang Huynh for his advice in the Honors Program and in my pursuit of the beauty of mathematics.

References

[1] G. Dallal, Introduction to Simple Linear Regression, available at http://www.jerrydallal.com/lhsp/slr.htm.

[2] J. W. Demmel, Applied Numerical Linear Algebra, Soc. for Industrial and Applied Math, 1997, 109-117.

[3] J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, Cambridge University Press, 2011,
418-427.

[4] M. Mazack, Algorithms for Handwritten Digit Recognition, Master's colloquium, Mathematics Department, Western Washington University, 2009.

[5] M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp.
71-86, 1991.

[6] T. Chen, W. Yin, X.-S. Zhou, D. Comaniciu, T. S. Huang, Illumination Normalization for Face Recognition

and Uneven Background Correction Using Total Variation Based Image Models CVPR, 2005.

[7] T. Jebara, 3D Pose Estimation and Normalization for Face Recognition, Center for Intelligent Machines,
McGill University, 1996, 61-73.

[8] Y. LeCun, C. Cortes, C. Burges, The MNIST Database, http://yann.lecun.com/exdb/mnist/.

[9] Wikipedia., Centroids, https://en.wikipedia.org/wiki/Centroid.

[10] Wikipedia, Optical character recognition, https://en.wikipedia.org/wiki/Optical_character_recognition.


[11] Wikipedia, Reduced SVDs, https://en.wikipedia.org/wiki/Singular_value_decomposition#Reduced_SVDs.

[12] Image downloaded from https://www.vision.jhu.edu/teaching/vision08/Handouts/case study pca1.pdf.

[13] Image downloaded from http://slideplayer.com/slide/7531421/.
