A Gentle Introduction To Singular-Value Decomposition For Machine Learning
Perhaps the best known and most widely used matrix decomposition method is the Singular-Value Decomposition, or SVD. All matrices have an SVD, which makes it more stable than other methods, such as the eigendecomposition. As such, it is often used in a wide array of applications, including compression, denoising, and data reduction.
In this tutorial, you will discover the Singular-Value Decomposition method for
decomposing a matrix into its constituent elements.
Update Mar/2018: Fixed typo in reconstruction. Changed V in code to VT for clarity. Fixed typo
in the pseudoinverse equation.
Photo by Chris Heald, some rights reserved.
Tutorial Overview
This tutorial is divided into 5 parts; they are:
1. Singular-Value Decomposition
2. Calculate Singular-Value Decomposition
3. Reconstruct Matrix from SVD
4. SVD for Pseudoinverse
5. SVD for Dimensionality Reduction
The SVD of an m x n matrix A is a factorization of the form:

A = U . Sigma . V^T

Where U is an m x m matrix, Sigma is an m x n diagonal matrix of singular values, and V^T is the transpose of an n x n matrix V.
The SVD is calculated via iterative numerical methods. We will not go into the details of
these methods. Every rectangular matrix has a singular value decomposition, although
the resulting matrices may contain complex numbers and the limitations of floating point
arithmetic may cause some matrices to fail to decompose neatly.
The singular value decomposition (SVD) provides another way to factorize a matrix, into
singular vectors and singular values. The SVD allows us to discover some of the same
kind of information as the eigendecomposition. However, the SVD is more generally
applicable.
The SVD can be calculated by calling the svd() function from scipy.linalg. The function takes a matrix and returns the U, Sigma, and V^T elements. The Sigma diagonal matrix is returned as a vector of singular values. The V matrix is returned in a transposed form, e.g. V.T.
The example below defines a 3×2 matrix and calculates the Singular-value
decomposition.
# Singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# SVD
U, s, VT = svd(A)
print(U)
print(s)
print(VT)
Running the example first prints the defined 3×2 matrix, then the 3×3 U matrix, the 2-element Sigma vector, and the 2×2 V^T matrix calculated from the decomposition.
[[1 2]
 [3 4]
 [5 6]]

[[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]

[ 9.52551809  0.51430058]

[[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]
The U, s, and V^T elements returned by svd() cannot be multiplied together directly. The s vector must first be converted into a diagonal matrix using the diag() function. By default, this function creates a square matrix that is n x n, relative to our original matrix. This causes a problem, as the sizes of the matrices do not fit the rules of matrix multiplication, where the number of columns in a matrix must match the number of rows in the subsequent matrix.
After creating the square Sigma diagonal matrix, the sizes of the matrices relative to the original m x n matrix that we are decomposing are as follows:

U (m x m) . Sigma (n x n) . V^T (n x n)

Whereas, in fact, we require:

U (m x m) . Sigma (m x n) . V^T (n x n)

We can achieve this by creating a new Sigma matrix of all zero values that is m x n (e.g. more rows) and populating the first n x n part of the matrix with the square diagonal matrix calculated via diag().
# Reconstruct SVD
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# Singular-value decomposition
U, s, VT = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(VT))
print(B)
Running the example first prints the original matrix, then the matrix reconstructed from
the SVD elements.
[[1 2]
 [3 4]
 [5 6]]

[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]]
The above complication with the Sigma diagonal only exists in the case where m and n are not equal. The diagonal matrix can be used directly when reconstructing a square matrix, as follows.
# Reconstruct SVD
from numpy import array
from numpy import diag
from scipy.linalg import svd
# define a matrix
A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A)
# Singular-value decomposition
U, s, VT = svd(A)
# create n x n Sigma matrix
Sigma = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(VT))
print(B)
Running the example prints the original 3×3 matrix and the version reconstructed directly
from the SVD elements.
[[1 2 3]
 [4 5 6]
 [7 8 9]]

[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]
The pseudoinverse is a generalization of the matrix inverse to rectangular matrices. It is also called the Moore-Penrose Inverse, after two independent discoverers of the method, or the Generalized Inverse.
Matrix inversion is not defined for matrices that are not square. […] When A has more
columns than rows, then solving a linear equation using the pseudoinverse provides one
of the many possible solutions.
The pseudoinverse can be calculated from the SVD as:

A^+ = V . D^+ . U^T

Where A^+ is the pseudoinverse, D^+ is the pseudoinverse of the diagonal matrix Sigma, and U^T is the transpose of U.
U, Sigma, and V can be obtained from the SVD of A:

A = U . Sigma . V^T
The D^+ can be calculated by creating a diagonal matrix from Sigma, calculating the
reciprocal of each non-zero element in Sigma, and taking the transpose if the original
matrix was rectangular.
        (s11,   0,   0)
Sigma = (  0, s22,   0)
        (  0,   0, s33)

        (1/s11,     0,     0)
D^+   = (    0, 1/s22,     0)
        (    0,     0, 1/s33)
The pseudoinverse provides one way of solving the linear regression equation,
specifically when there are more rows than there are columns, which is often the case.
NumPy provides the function pinv() for calculating the pseudoinverse of a rectangular
matrix.
The example below defines a 4×2 matrix and calculates the pseudoinverse.
# Pseudoinverse
from numpy import array
from numpy.linalg import pinv
# define matrix
A = array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]])
print(A)
# calculate pseudoinverse
B = pinv(A)
print(B)
Running the example first prints the defined matrix, and then the calculated
pseudoinverse.
[[ 0.1  0.2]
 [ 0.3  0.4]
 [ 0.5  0.6]
 [ 0.7  0.8]]

[[ -1.00000000e+01  -5.00000000e+00   9.04289323e-15   5.00000000e+00]
 [  8.50000000e+00   4.50000000e+00   5.00000000e-01  -3.50000000e+00]]
We can calculate the pseudoinverse manually via the SVD and compare the results to
the pinv() function.
First we must calculate the SVD. Next we must calculate the reciprocal of each value in
the s array. Then the s array can be transformed into a diagonal matrix with an added
row of zeros to make it rectangular. Finally, we can calculate the pseudoinverse from the
elements.
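The listing below is a minimal sketch of those steps, reconstructed here for illustration; it assumes the same scipy.linalg.svd call used in the earlier examples.

# Pseudoinverse via SVD
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define matrix
A = array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]])
print(A)
# calculate the SVD
U, s, VT = svd(A)
# reciprocals of the singular values
d = 1.0 / s
# create m x n D matrix of zeros
D = zeros(A.shape)
# populate D with the n x n diagonal matrix of reciprocals
D[:A.shape[1], :A.shape[1]] = diag(d)
# calculate the pseudoinverse: A^+ = V . D^+ . U^T
B = VT.T.dot(D.T).dot(U.T)
print(B)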
Running the example first prints the defined rectangular matrix and the pseudoinverse
that matches the above results from the pinv() function.
[[ 0.1  0.2]
 [ 0.3  0.4]
 [ 0.5  0.6]
 [ 0.7  0.8]]

[[ -1.00000000e+01  -5.00000000e+00   9.04831765e-15   5.00000000e+00]
 [  8.50000000e+00   4.50000000e+00   5.00000000e-01  -3.50000000e+00]]
Data with a large number of features, such as more features (columns) than observations (rows), may be reduced to a smaller subset of features that are most relevant to the prediction problem.
The result is a matrix with a lower rank that is said to approximate the original matrix.
To do this we can perform an SVD operation on the original data and select the top k largest singular values in Sigma. These columns can be selected from Sigma and the rows selected from V^T. An approximation B of the original matrix can then be reconstructed:

B = U . Sigmak . V^Tk

A dense, lower-dimensional summary of the data, called T, can be computed as:

T = U . Sigmak

Further, this transform can be calculated and applied to the original matrix A, as well as to other similar matrices, by multiplying by Vk, the transpose of V^Tk:

T = A . Vk
First, a 3×10 matrix is defined, with more columns than rows. The SVD is calculated and only the first two features are selected. The elements are recombined to give an accurate reproduction of the original matrix. Finally, the transform is calculated in two different ways.
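The listing below is a sketch of that example, reconstructed here for illustration using the same scipy.linalg.svd approach as the earlier examples.

# SVD for dimensionality reduction
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define a 3x10 matrix with more columns than rows
A = array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
print(A)
# Singular-value decomposition
U, s, VT = svd(A)
# create m x n Sigma matrix and populate its diagonal with the singular values
Sigma = zeros((A.shape[0], A.shape[1]))
Sigma[:A.shape[0], :A.shape[0]] = diag(s)
# select the top k = 2 largest singular values
n_elements = 2
Sigma = Sigma[:, :n_elements]
VT = VT[:n_elements, :]
# reconstruct the approximation B = U . Sigmak . V^Tk
B = U.dot(Sigma.dot(VT))
print(B)
# transform T = U . Sigmak
T = U.dot(Sigma)
print(T)
# equivalent transform T = A . Vk
T = A.dot(VT.T)
print(T)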
Running the example first prints the defined matrix then the reconstructed
approximation, followed by two equivalent transforms of the original matrix.
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]

[[  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
 [ 11.  12.  13.  14.  15.  16.  17.  18.  19.  20.]
 [ 21.  22.  23.  24.  25.  26.  27.  28.  29.  30.]]

[[-18.52157747   6.47697214]
 [-49.81310011   1.91182038]
 [-81.10462276  -2.65333138]]

[[-18.52157747   6.47697214]
 [-49.81310011   1.91182038]
 [-81.10462276  -2.65333138]]
The scikit-learn library provides a TruncatedSVD class that implements this capability directly. A TruncatedSVD instance can be created, specifying the number of desirable features or components to select, e.g. 2. Once created, you can fit the transform (e.g. calculate V^Tk) by calling the fit() function, then apply it to the original matrix by calling the transform() function. The result is the transform of A called T above.
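A minimal sketch of that usage, assuming the same 3×10 matrix as above, might look as follows.

# Dimensionality reduction with scikit-learn TruncatedSVD
from numpy import array
from sklearn.decomposition import TruncatedSVD
# define a 3x10 matrix
A = array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
print(A)
# create the transform, keeping 2 components
svd = TruncatedSVD(n_components=2)
# fit the transform on the data
svd.fit(A)
# apply the transform to the data
result = svd.transform(A)
print(result)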
Running the example first prints the defined matrix, followed by the transformed version
of the matrix.
We can see that the values match those calculated manually above, except for the sign
on some values. We can expect there to be some instability when it comes to the sign
given the nature of the calculations involved and the differences in the underlying
libraries and methods used. This instability of sign should not be a problem in practice as
long as the transform is trained for reuse.
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]

[[ 18.52157747   6.47697214]
 [ 49.81310011   1.91182038]
 [ 81.10462276  -2.65333138]]
Extensions
This section lists some ideas for extending the tutorial that you may wish to explore.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
Chapter 12, Singular-Value and Jordan Decompositions, Linear Algebra and Matrix Analysis for
Statistics, 2014.
Chapter 4, The Singular Value Decomposition and Chapter 5, More on the SVD, Numerical
Linear Algebra, 1997.
Section 2.4 The Singular Value Decomposition, Matrix Computations, 2012.
Chapter 7 The Singular Value Decomposition (SVD), Introduction to Linear Algebra, Fifth
Edition, 2016.
Section 2.8 Singular Value Decomposition, Deep Learning, 2016.
Section 7.D Polar Decomposition and Singular Value Decomposition, Linear Algebra Done
Right, Third Edition, 2015.
Lecture 3 The Singular Value Decomposition, Numerical Linear Algebra, 1997.
Section 2.6 Singular Value Decomposition, Numerical Recipes: The Art of Scientific Computing,
Third Edition, 2007.
Section 2.9 The Moore-Penrose Pseudoinverse, Deep Learning, 2016.
API
numpy.linalg.svd() API
numpy.matrix.H API
numpy.diag() API
numpy.linalg.pinv() API
sklearn.decomposition.TruncatedSVD API
Articles
Matrix decomposition on Wikipedia
Singular-value decomposition on Wikipedia
Singular value on Wikipedia
Moore-Penrose inverse on Wikipedia
Latent semantic analysis on Wikipedia
Summary
In this tutorial, you discovered the Singular-Value Decomposition method for decomposing a matrix into its constituent elements.