A.0. Introduction 1
A.0.1. History
The singular value decomposition was originally developed by differential geometers, who
wished to determine whether a real bilinear form could be made equal to another by
independent orthogonal transformations of the two spaces it acts on. Eugenio Beltrami and
Camille Jordan discovered independently, in 1873 and 1874 respectively, that the singular values of a bilinear form, represented as a matrix, form a complete set of invariants for
bilinear forms under orthogonal substitutions. James Joseph Sylvester also arrived at the
singular value decomposition for real square matrices in 1889, apparently independent of
both Beltrami and Jordan. Sylvester called the singular values the canonical multipliers of
matrix A. The fourth mathematician to discover the singular value decomposition independently was Autonne, in 1915, who arrived at it via the polar decomposition. The first
proof of the singular value decomposition for rectangular and complex matrices seems to be
by Carl Eckart and Gale Young in 1936; they saw it as a generalization of the principal axis
transformation for Hermitian matrices.
In 1907, Erhard Schmidt defined an analog of singular values for integral operators (which
are compact, under some weak technical assumptions); it seems he was unaware of the
parallel work on singular values of finite matrices. This theory was further developed by
Émile Picard in 1910, who is the first to call the numbers σk singular values (or rather,
valeurs singulières).
Practical methods for computing the SVD date back to Kogbetliantz in 1954 and 1955 and to Hestenes in 1958, closely resembling the Jacobi eigenvalue algorithm, which uses plane
rotations or Givens rotations. However, these were replaced by the method of Gene Golub
and William Kahan published in 1965, which uses Householder transformations or
reflections. In 1970, Golub and Christian Reinsch published a variant of the Golub/Kahan
algorithm that is still the one most-used today.
Theorem A.0.1: Suppose A is an m×n matrix whose entries come from the field K (R or C).
Then there exists a factorization, called a singular value decomposition of A, of the form
A = UΣV*,
where U is an m×m unitary (i.e. U*U = UU* = I) matrix over K, Σ is an m×n ‘diagonal’
matrix with nonnegative real numbers on the main diagonal, and V* denotes the conjugate
transpose of V, an n×n unitary matrix over K.
Proof:
Exercise
1 Wikipedia, the free encyclopedia
Remarks A.0.1: (a) A common convention is to order the diagonal entries Σii in descending
order. In this case, the matrix Σ is uniquely determined by A (though the matrices U and V
are not). The diagonal entries of Σ are known as the singular values of A.
(b) (Intuitive explanation) In A = UΣV*, the columns of V form a set of orthonormal “input”
or “analyzing” basis vector directions for A (these are the eigenvectors of A*A), the columns
of U form a set of orthonormal “output” basis vector directions for A (these are the
eigenvectors of AA*) and the diagonal values in matrix Σ are the singular values, which can
be thought of as scalar “gain controls” by which each corresponding input is multiplied to
give a corresponding output (these are the square roots of the eigenvalues that correspond
with the same columns in U and V).
(c) (Geometric meaning) Because U and V are unitary, we know that the columns u1, ..., um
of U yield an orthonormal basis of Km and the columns v1, ..., vn of V yield an orthonormal
basis of Kn (with respect to the standard scalar products on these spaces). The linear
transformation T:Kn→Km that takes a vector x to Ax has a particularly simple description
with respect to these orthonormal bases: we have T(vi) = σiui, for i = 1, ..., min(m, n), where σi
is the i-th diagonal entry of Σ, and T(vi) = 0 for i > min(m, n). The geometric content of the
SVD theorem can thus be summarized as follows:
“For every linear map T:Kn→Km one can find orthonormal bases of Kn and Km such that T
maps the i-th basis vector of Kn to a non-negative multiple of the i-th basis vector of Km, and
sends the left-over basis vectors to zero. With respect to these bases, the map T is therefore
represented by a diagonal matrix with non-negative real diagonal entries.”
To get a more visual flavor of singular values and SVD decomposition –at least when
working on real vector spaces– consider the sphere S of radius one in Rn. The linear map T
maps this sphere onto an ellipsoid in Rm. Non-zero singular values are simply the lengths of
the semi-axes of this ellipsoid.
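To make remarks (b) and (c) concrete, the following short numerical check may help (a sketch assuming NumPy is available; the 4×2 example matrix is an arbitrary choice, not one from the text):

```python
# Sketch: compute an SVD numerically and check A = U Sigma V* and T(v_i) = sigma_i u_i.
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 3.0], [0.0, 0.0], [0.0, 0.0]])  # arbitrary 4x2 example

U, s, Vh = np.linalg.svd(A)           # full SVD: U is 4x4, Vh = V*, s holds the singular values
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)  # embed the singular values in an m x n 'diagonal' matrix

print(np.allclose(A, U @ Sigma @ Vh))            # A = U Sigma V*
for i, sigma in enumerate(s):
    v_i = Vh[i, :]                                # i-th right singular vector (row of V*)
    u_i = U[:, i]                                 # i-th left singular vector
    print(np.allclose(A @ v_i, sigma * u_i))      # T(v_i) = sigma_i u_i
```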
A.1.1. Pseudoinverse
The singular value decomposition can be used for computing the pseudoinverse of a matrix.
Indeed, the pseudoinverse of the matrix A with singular value decomposition A = UΣV* is
A+ = VΣ+U*,
where Σ+ is the pseudoinverse of Σ, which is formed by replacing every non-zero diagonal entry by its reciprocal and transposing the resulting matrix. The pseudoinverse is one way to solve linear least squares problems.
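A minimal sketch of this construction, assuming NumPy (the matrix A, the right-hand side b and the tolerance 1e-12 are made-up illustrations):

```python
# Sketch: form A+ = V Sigma+ U* from an SVD and use it for a least squares problem.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # arbitrary 3x2 example, full column rank
b = np.array([1.0, 0.0, 1.0])

U, s, Vh = np.linalg.svd(A, full_matrices=False)      # thin SVD
s_plus = np.where(s > 1e-12, 1.0 / s, 0.0)            # reciprocal of the non-zero singular values
A_plus = Vh.conj().T @ np.diag(s_plus) @ U.conj().T   # A+ = V Sigma+ U*

x_ls = A_plus @ b                                     # least squares solution of Ax ~ b
print(np.allclose(A_plus, np.linalg.pinv(A)))         # agrees with NumPy's built-in pseudoinverse
```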
A set of homogeneous linear equations can be written as Ax = 0 for a matrix A and vector x.
A typical situation is that A is known and a non-zero x, which satisfies the equation, is to be
determined. Such an x belongs to A’s null space and is sometimes called a (right) null vector
of A. Vector x can be characterized as a right singular vector corresponding to a singular
value of A that is zero. This observation means that if A has no vanishing singular value the
equation has no non-zero x as a solution. It also means that if there are several vanishing
singular values, any linear combination of the corresponding right singular vectors is a valid
solution. Analogously to the definition of a (right) null vector, a non-zero x satisfying x*A = 0, with x* denoting the conjugate transpose of x, is called a left null vector of A.
A total least squares problem refers to determining the vector x that minimizes the 2-norm of
a vector Ax under the constraint ||x|| = 1. The solution turns out to be the right singular vector
of A corresponding to the smallest singular value.
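A small sketch of this, assuming NumPy (the nearly rank-deficient matrix is a made-up example):

```python
# Sketch: minimize ||Ax|| subject to ||x|| = 1 via the right singular vector
# belonging to the smallest singular value.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1],
              [1.0, 1.0, 1.0]])      # nearly rank-deficient example

U, s, Vh = np.linalg.svd(A)
x = Vh[-1, :]                        # right singular vector for the smallest singular value
print(np.linalg.norm(A @ x), s[-1])  # ||Ax|| equals the smallest singular value
```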
Another application of the SVD is that it provides an explicit representation of the range and
the null space of a matrix A. The right singular vectors corresponding to vanishing singular
values of A span the null space of A. The left singular vectors corresponding to the non-zero
singular values of A span the range of A. As a consequence, the rank of A equals the number
of non-zero singular values which is the same as the number of non-zero elements in Σ. In
Numerical Linear Algebra the singular values can be used to determine the effective rank of a
matrix, as rounding error may lead to small but non-zero singular values in a rank deficient
matrix.
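A sketch of reading off rank, range and null space from the SVD, assuming NumPy (the example matrix and the effective-rank tolerance are illustrative choices):

```python
# Sketch: effective rank, range basis and null-space basis from the singular values/vectors.
import numpy as np

A = np.array([[1.0, 0.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 4.0, 0.0, 0.0, 0.0]])

U, s, Vh = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]   # a common effective-rank tolerance
r = int(np.sum(s > tol))                          # effective rank = number of non-zero singular values

range_basis = U[:, :r]          # left singular vectors of non-zero singular values span the range
null_basis = Vh[r:, :].T        # remaining right singular vectors span the null space
print(r, np.allclose(A @ null_basis, 0))
```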
Some practical applications need to solve the problem of approximating a matrix A with another matrix Ã that has a specific rank r. In the case where the approximation is based on minimizing the Frobenius norm of the difference between A and Ã under the constraint that rank(Ã) = r, it turns out that the solution is given by the SVD, namely Ã = UΣ̃V*, where Σ̃ is the same matrix as Σ except that it contains only the r largest singular values (the other singular values are replaced by zero). This is known as the Eckart-Young theorem (as it was proved by Eckart and Young in 1936, albeit it was later found to have been proven before). In addition to being the best approximation of a given rank to the matrix under the Frobenius norm, Ã as defined above is also the best approximation of that rank under the L2 norm.
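A minimal sketch of this rank-r truncation, assuming NumPy (the helper name best_rank_r and the random test matrix are our own illustrations):

```python
# Sketch: best rank-r approximation by keeping the r largest singular values.
import numpy as np

def best_rank_r(A, r):
    """Rank-r approximation of A obtained by truncating its SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))        # arbitrary test matrix
A2 = best_rank_r(A, 2)
print(np.linalg.matrix_rank(A2))       # 2
print(np.linalg.norm(A - A2, 'fro'))   # error = sqrt of the sum of the discarded squared singular values
```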
The SVD can be thought of as decomposing a matrix into a weighted, ordered sum of separable matrices. By separable, we mean that a matrix A can be written as an outer product of two vectors, A = uv*. In particular, the matrix A can be decomposed as
A = ∑i Ai = ∑i σi Ui Vi*,
where Ui and Vi are the ith columns of the corresponding SVD matrices, σi are the ordered
singular values, and each Ai is separable. The SVD can be used to find the decomposition of
an image processing filter into separable horizontal and vertical filters. Note that the number
of non-zero σi is exactly the rank of the matrix.
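As an illustration of splitting a separable filter into one-dimensional factors, here is a sketch assuming NumPy (the Gaussian kernel is just an illustrative separable filter):

```python
# Sketch: decompose a separable 2D filter into vertical and horizontal 1D filters via the SVD.
import numpy as np

g = np.exp(-0.5 * (np.arange(-3, 4) / 1.5) ** 2)
K = np.outer(g, g)                     # separable 2D filter: K = g g^T, rank 1

U, s, Vh = np.linalg.svd(K)
v_filter = np.sqrt(s[0]) * U[:, 0]     # vertical 1D filter
h_filter = np.sqrt(s[0]) * Vh[0, :]    # horizontal 1D filter
print(np.allclose(K, np.outer(v_filter, h_filter)))   # K is recovered from one separable term
print(np.sum(s > 1e-12))                               # number of non-zero singular values = rank = 1
```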
Separable models often arise in biological systems, and the SVD decomposition is useful to
analyze such systems. For example, the receptive fields of some simple cells in visual area V1 can be
well described by a Gabor filter in the space domain multiplied by a modulation function in
the time domain. Thus, given a linear filter evaluated through, for example, reverse
correlation, one can rearrange the two spatial dimensions into one dimension, thus yielding a
two dimensional filter (space, time) that can be decomposed through SVD. The first column
of U in the SVD decomposition is then a Gabor filter while the first column of V represents
the time modulation (or vice-versa). One may then define an index of separability, α = σ1² / ∑i σi², which is the fraction of the power in the matrix A that is accounted for by the first separable matrix in the decomposition.
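A one-line computation of this index, assuming NumPy (the filter matrix F and its dimensions are made up):

```python
# Sketch: separability index alpha = sigma_1^2 / sum_i sigma_i^2 of a (space, time) filter matrix.
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((32, 20))          # made-up (space, time) filter matrix
s = np.linalg.svd(F, compute_uv=False)     # singular values only
alpha = s[0]**2 / np.sum(s**2)             # fraction of power captured by the first separable term
print(alpha)
```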
The SVD is also applied extensively to the study of linear inverse problems, and is useful in
the analysis of regularization methods such as that of Tikhonov. It is widely used in Statistics
where it is related to principal component analysis and in Signal processing and Pattern
recognition. It is also used in output-only modal analysis, where the non-scaled mode shapes
can be determined from the singular vectors. Yet another usage is latent semantic indexing in
natural language text processing.
The SVD also plays a crucial role in the field of Quantum information, in a form often
referred to as the Schmidt decomposition. Through it, states of two quantum systems are
naturally decomposed, providing a necessary and sufficient condition for them to be entangled: the two systems are entangled if and only if the rank of the Σ matrix is larger than one.
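A minimal sketch of this, assuming NumPy: for a pure state of two qubits, the Schmidt coefficients are the singular values of the 2×2 matrix of amplitudes (the Bell state below is just an illustrative example):

```python
# Sketch: Schmidt decomposition of a pure two-qubit state via the SVD of its coefficient matrix.
import numpy as np

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # Bell state (|00> + |11>)/sqrt(2)
C = psi.reshape(2, 2)                               # coefficient matrix c_ij of |i>|j>
schmidt_coeffs = np.linalg.svd(C, compute_uv=False)
print(schmidt_coeffs)                               # [0.707..., 0.707...]
print(np.sum(schmidt_coeffs > 1e-12) > 1)           # True: Schmidt rank > 1, so the state is entangled
```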
One application of SVD to rather large matrices is in numerical weather prediction, where
Lanczos methods are used to estimate the few perturbations to the central numerical weather prediction that grow most rapidly under the linearized dynamics over a given initial forward time period, i.e. the
singular vectors corresponding to the largest singular values of the linearized propagator for
the global weather over that time interval. The output singular vectors in this case are entire
weather systems. These perturbations are then run through the full nonlinear model to
generate an ensemble forecast, giving a handle on some of the uncertainty that should be
allowed for around the current central prediction.
Definition A.2.1: A non-negative real number σ is a singular value of matrix A if and only if
there exist unit-length vectors u in Km and v in Kn such that
Av = σu and A*u = σv.
The vectors u and v are called left-singular and right-singular vectors for σ, respectively.
Remark A.2.1: (a) In any singular value decomposition A = UΣV*, the diagonal entries of Σ
are necessarily equal to the singular values of A. The columns of U and V are, respectively,
left- and right-singular vectors for the corresponding singular values. Consequently, an m×n
matrix A has at least one and at most p = min(m, n) distinct singular values.
(b) It is always possible to find a unitary basis for Km consisting of left-singular vectors of A
and a unitary basis for Kn consisting of right-singular vectors of A.
(c) A singular value for which we can find two left (or right) singular vectors that are linearly
independent is called degenerate. Non-degenerate singular values always have unique left
and right singular vectors, up to multiplication by a unit phase factor eiφ (for the real case up
to sign). Consequently, if all singular values of A are non-degenerate and non-zero, then its
singular value decomposition is unique, up to multiplication of a column of U by a unit phase
factor and simultaneous multiplication of the corresponding column of V by the same unit
phase factor. Degenerate singular values, by definition, have non-unique singular vectors.
Furthermore, if u1 and u2 are two left-singular vectors which both correspond to the singular
value σ, then any normalized linear combination of the two vectors is also a left singular
vector corresponding to the singular value σ. An analogous statement holds for right singular
vectors. Consequently, if A has degenerate singular values, then its singular value
decomposition is not unique.
Calculating the SVD consists of finding the eigenvalues and eigenvectors of AA* and A*A.
The eigenvectors of A*A make up the columns of V, the eigenvectors of AA* make up the
columns of U. The singular values are the square roots of the eigenvalues of AA* or A*A,
and are the diagonal entries of Σ arranged in descending order. The singular values are
always real numbers. If the matrix A is a real matrix, then U and V are also real.
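A sketch of this procedure for a small real matrix, assuming NumPy (forming A^T A explicitly is fine for illustration; robust SVD routines avoid it for numerical reasons):

```python
# Sketch: SVD of a real matrix via the eigendecomposition of A^T A,
# with u_i = A v_i / sigma_i giving the left singular vectors of the non-zero singular values.
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 3.0], [0.0, 0.0], [0.0, 0.0]])

evals, V = np.linalg.eigh(A.T @ A)         # eigenpairs of A^T A (ascending order)
order = np.argsort(evals)[::-1]            # reorder into descending order
evals, V = evals[order], V[:, order]
s = np.sqrt(np.clip(evals, 0.0, None))     # singular values

U_cols = [(A @ V[:, i]) / s[i] for i in range(len(s)) if s[i] > 1e-12]
U_r = np.column_stack(U_cols)              # left singular vectors for the non-zero singular values
print(s)                                   # approximately [5.47, 0.37]
print(np.allclose(A, U_r @ np.diag(s) @ V.T))
```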
Example A.2.1: Find the SVD of the matrix
A =
[ 2  4 ]
[ 1  3 ]
[ 0  0 ]
[ 0  0 ].
Solution:
Since A ∈ R^{4×2}, it follows that A^T = A*.
STEP 1: Compute AA^T and A^TA:
AA^T =
[ 20  14  0  0 ]
[ 14  10  0  0 ]
[  0   0  0  0 ]
[  0   0  0  0 ]
and
A^TA =
[  5  11 ]
[ 11  25 ].
STEP 2: The non-zero eigenvalues of A^TA (and of AA^T) are λ1 = 15 + √221 ≈ 29.87 and λ2 = 15 - √221 ≈ 0.13.
STEP 3: The eigenvectors corresponding to the eigenvalues above are the columns of matrices U and V respectively (with U*U = V*V = I):
U =
[ 0.82   0.58  0  0 ]
[ 0.58  -0.82  0  0 ]
[ 0      0     1  0 ]
[ 0      0     0  1 ]
and
V =
[ 0.40   0.91 ]
[ 0.91  -0.40 ]   (exercise).
STEP 4: Finally, as mentioned above, the square roots of the eigenvalues (in descending order) of AA^T or A^TA constitute the entries of the matrix Σ as follows:
Σ =
[ 5.47  0    ]
[ 0     0.37 ]
[ 0     0    ]
[ 0     0    ].
Exercise A.2.1: Verify that indeed UΣV^T = A for Example A.2.1 above.
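A quick numerical check of Exercise A.2.1, assuming NumPy (the two-decimal entries quoted in Example A.2.1 are rounded, so the product only matches A approximately):

```python
# Sketch: multiply out the rounded factors of Example A.2.1.
import numpy as np

U = np.array([[0.82,  0.58, 0.0, 0.0],
              [0.58, -0.82, 0.0, 0.0],
              [0.0,   0.0,  1.0, 0.0],
              [0.0,   0.0,  0.0, 1.0]])
S = np.array([[5.47, 0.0],
              [0.0,  0.37],
              [0.0,  0.0],
              [0.0,  0.0]])
V = np.array([[0.40,  0.91],
              [0.91, -0.40]])

print(U @ S @ V.T)   # approximately [[2, 4], [1, 3], [0, 0], [0, 0]]
```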
Example A.2.2: Find the SVD of the matrix
A =
[ 2   2 ]
[ 1  -1 ].
Solution:
Since A ∈ R^{2×2}, it follows that A^T = A*.
STEP 1: Compute AA^T and A^TA:
AA^T =
[ 8  0 ]
[ 0  2 ]
and
A^TA =
[ 5  3 ]
[ 3  5 ].
STEP 2: The eigenvalues of A^TA (and of AA^T) are λ1 = 8 and λ2 = 2.
STEP 3: The eigenvectors corresponding to the eigenvalues above are the columns of matrices U and V respectively (with U*U = V*V = I):
U =
[ 1  0 ]
[ 0  1 ]
and
V =
[ 1/√2   1/√2 ]
[ 1/√2  -1/√2 ]   (exercise).
STEP 4: Finally, as mentioned above, the square roots of the eigenvalues (in descending order) of AA^T or A^TA constitute the entries of the matrix Σ as follows:
Σ =
[ 2√2  0  ]
[ 0    √2 ].
Hence
A =
[ 1  0 ] [ 2√2  0  ] [ 1/√2   1/√2 ]
[ 0  1 ] [ 0    √2 ] [ 1/√2  -1/√2 ].
Example A.2.3: Find the SVD of the matrix
A =
[ 1  0  0  0  2 ]
[ 0  0  3  0  0 ]
[ 0  0  0  0  0 ]
[ 0  4  0  0  0 ].
Solution:
Since A ∈ R^{4×5}, it follows that A^T = A*.
STEP 1: Compute AA^T and A^TA:
AA^T =
[ 5  0  0   0 ]
[ 0  9  0   0 ]
[ 0  0  0   0 ]
[ 0  0  0  16 ]
and
A^TA =
[ 1   0  0  0  2 ]
[ 0  16  0  0  0 ]
[ 0   0  9  0  0 ]
[ 0   0  0  0  0 ]
[ 2   0  0  0  4 ].
STEP 2: The eigenvalues of AA^T are 16, 9, 5 and 0 (A^TA has the same non-zero eigenvalues, together with zero of multiplicity two).
STEP 3: The eigenvectors corresponding to the eigenvalues above are the columns of matrices U and V respectively (with U*U = V*V = I):
U =
[ 0  0  1  0 ]
[ 0  1  0  0 ]
[ 0  0  0  1 ]
[ 1  0  0  0 ]
and
V =
[ 0  0  √0.2  0   √0.8 ]
[ 1  0  0     0   0    ]
[ 0  1  0     0   0    ]
[ 0  0  0     1   0    ]
[ 0  0  √0.8  0  -√0.2 ]   (exercise).
STEP 4: Finally, as mentioned above, the square roots of the eigenvalues (in descending order) of AA^T or A^TA constitute the entries of the matrix Σ as follows:
Σ =
[ 4  0  0   0  0 ]
[ 0  3  0   0  0 ]
[ 0  0  √5  0  0 ]
[ 0  0  0   0  0 ].
Hence A = UΣV^T, with U, Σ and V as above.
Note: It should also be noted that this particular singular value decomposition is not unique (why? See Remark A.2.1c). For instance, keeping the same U and Σ, the following V^T also gives a valid SVD of A (exercise):
V^T =
[  0     1  0  0     0    ]
[  0     0  1  0     0    ]
[  √0.2  0  0  0     √0.8 ]
[  √0.4  0  0  √0.5  -√0.1 ]
[ -√0.4  0  0  √0.5   √0.1 ].
Remark A.2.3: In an analogous manner to the SVD (which is very general in the sense that it can be applied to any m×n matrix), for certain classes of square (m×m) matrices A one can apply the so-called eigenvalue decomposition, where one factorizes the matrix A as
A = PDP^{-1},
with P being an m×m matrix consisting of the eigenvectors of A and D an m×m diagonal matrix with the (corresponding) eigenvalues of A on the main diagonal.
Nevertheless, the two decompositions are related. Given an SVD of A, as described above,
the following two relations hold true:
A*A = VΣ*U*UΣV* = V(Σ*Σ)V* and AA* = UΣV*VΣ*U* = U(ΣΣ*)U*.
The right hand sides of these relations describe the eigenvalue decompositions of the left
hand sides. Consequently, the squares of the non-zero singular values of A are equal to the
non-zero eigenvalues of either A*A or AA*. Furthermore, the columns of U (left singular
vectors) are eigenvectors of AA* and the columns of V (right singular vectors) are
eigenvectors of A*A.
In the special case that A is a normal matrix, which by definition must be square, the spectral theorem says that it can be unitarily diagonalized using a basis of eigenvectors, so that it can be written A = UDU* for a unitary matrix U and a diagonal matrix D.
However, the eigenvalue decomposition and the SVD differ for all other matrices A. While the eigenvalue decomposition is A = PDP^{-1}, where P is not necessarily unitary and D is not necessarily positive semi-definite, the SVD is A = UΣV*, where Σ is diagonal and positive semi-definite, and U and V are unitary matrices that are not necessarily related except through the matrix A.
EXERCISES A.2.
(1) Find the SVD of each of the following matrices and state whether they are unique.
(a)
[ 2  2 ]
[ 1  1 ]
(b)
[ 3  1 ]
[ 1  3 ]
(c)
[ 3  3 ]
[ 2  2 ]
(d)
[ 2  4 ]
[ 0  0 ]
[ 0  0 ]
[ 0  0 ]
(e)
[ 0  0  1  2 ]
[ 0  0  3  4 ]
[ 0  0  0  0 ]
(f)
[ 2  0  0  0  2 ]
[ 0  0  0  0  0 ]
[ 0  1  0  0  0 ]
[ 0  0  3  0  0 ]
(g)
[ 2  0  0  0 ]
[ 0  3  0  0 ]
[ 0  0  0  4 ]
[ 1  0  0  0 ]