
Chapter 7 Special Matrices

In this chapter we will introduce some special matrices such as unitary, normal and Hermitian matrices, along with their interesting and special properties. For simplicity, we shall use ⟨·, ·⟩ and ‖·‖ to denote the usual inner product and the Euclidean norm, respectively, on F^n.

7.1 Unitary and Orthogonal Matrices

For a matrix A ∈ ℂ^{n×n}, its Conjugate Transpose (or Hermitian Transpose) is defined by

A^H = (Ā)^T    (7.1)

For example,

[2+i 3−i; 4+i 7−2i]^H = [2−i 4−i; 3+i 7+2i]

The Conjugate Transpose (or Hermitian Transpose) of a matrix has the following properties:

(1) (A^H)^H = A
(2) (αA + βB)^H = ᾱA^H + β̄B^H, α, β ∈ ℂ
(3) (AC)^H = C^H A^H
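As a quick numerical check (a minimal MATLAB sketch; in MATLAB the operator ' is the conjugate transpose, while .' is the plain transpose):

A = [2+1i 3-1i; 4+1i 7-2i];
disp(A')                    % [2-1i 4-1i; 3+1i 7+2i], the example above
% Property (3): (A*C)^H equals C^H * A^H
C = [1 1i; 0 2];
norm((A*C)' - C'*A')        % ~0 up to round-off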

Definition 7.1 A matrix U ∈ ℂ^{n×n} is said to be Unitary if it satisfies

U^H U = U U^H = I_n    (7.2)

Equivalently, U ∈ ℂ^{n×n} is unitary iff U^{-1} = U^H. In addition, if U is real then it is also called an orthogonal matrix.

When we say a matrix is orthogonal, we always mean it is a real orthogonal matrix. Hence U is orthogonal iff it is real and satisfies U^T U = U U^T = I. In general, if U ∈ ℂ^{n×n} satisfies U^T U = U U^T = I, then we call it a complex orthogonal matrix.

Lemma 7.1 If U is unitary (or orthogonal, respectively), then so are U^H, U^T and Ū.

Proof:
U is unitary ⇒ U^H U = U U^H = I ⇒ (U^H)^H U^H = U U^H = I ⇒ U^H is unitary.

Write U = A + iB with A, B real; then Ū = A − iB, U^T = A^T + iB^T, U^H = A^T − iB^T.

U is unitary ⇒ U^H U = (A^T − iB^T)(A + iB) = (A^T A + B^T B) + i(A^T B − B^T A) = I
⇒ A^T A + B^T B = I, A^T B − B^T A = 0

Hence, for Ū we have

(Ū)^H Ū = U^T Ū = (A^T + iB^T)(A − iB) = (A^T A + B^T B) − i(A^T B − B^T A) = I − i·0 = I
⇒ Ū is unitary

Finally, U^T(U^T)^H = U^T Ū = I ⇒ U^T is unitary.

Theorem 7.2 Let U ∈ F^{n×n}. Then the following are equivalent:
(i) U is unitary (U is orthogonal if F = ℝ).
(ii) The columns of U form an orthonormal basis of F^n.
(iii) The rows of U form an orthonormal basis of F^n.
(iv) ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ F^n.
(v) ‖Ux‖ = ‖x‖ for all x ∈ F^n.

Proof:
(i) ⇔ (ii) Let u1, …, un denote the columns of U. Then

U^H U = I iff ⟨ui, uj⟩ = δij for all i, j,

where δij is the Kronecker delta. Hence U is unitary iff the columns of U are orthonormal. As any n orthonormal vectors in F^n must form a basis of F^n, the above can be restated as: U is unitary iff the columns of U form an orthonormal basis of F^n.

(i) ⇔ (iii) By Lemma 7.1 and the equivalence just proved,

U is unitary ⇔ U^T is unitary ⇔ columns of U^T form an orthonormal basis for F^n ⇔ rows of U form an orthonormal basis for F^n

(i) ⇒ (v) Suppose U is unitary. Then

‖Ux‖² = ⟨Ux, Ux⟩ = (Ux)^H(Ux) = x^H U^H U x = x^H x = ‖x‖²

Therefore (v) holds.

(v) ⇒ (iv) Suppose (v) holds. Then for all x, y ∈ F^n

‖U(x + y)‖² = ‖x + y‖²
⇒ ⟨U x + U y, U x + U y⟩ = ⟨x + y, x + y⟩
⇒ ‖Ux‖² + ‖Uy‖² + ⟨Ux, Uy⟩ + ⟨Uy, Ux⟩ = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩
⇒ Re⟨Ux, Uy⟩ = Re⟨x, y⟩    (7.3)

where Re(z) denotes the real part of the scalar z. If F = ℝ, then ⟨x, y⟩ is always real and (7.3) implies that (iv) holds. If F = ℂ, then we can deduce from the identity

‖U(x + √−1 y)‖² = ‖x + √−1 y‖²

that Im⟨Ux, Uy⟩ = Im⟨x, y⟩ for all x, y, where Im(z) denotes the imaginary part of z. This and (7.3) together give (iv).

(iv) ⇒ (i) Suppose U is NOT unitary. Then U^H U − I ≠ 0, and there exists x ∈ F^n such that y := (U^H U − I)x ≠ 0. As a result,

0 < y^H y = y^H(U^H U − I)x = ⟨Uy, Ux⟩ − ⟨y, x⟩,

and (iv) does not hold.

Statements (iv) and (v) of the above theorem say that a square matrix is unitary iff (1) it preserves the usual inner product of vectors, or (2) it preserves the Euclidean norm of vectors.

Let 1 ≤ m < n, and let u1, …, um be an orthonormal set of vectors in F^n. As u1, …, um are linearly independent, one can always append vectors to form a basis B = {u1, …, um, v_{m+1}, …, vn} of F^n. One may then apply the Gram-Schmidt process to B to obtain an orthonormal basis of F^n. It is easy to see that, since u1, …, um are already orthonormal, the orthonormal basis obtained this way will contain u1, …, um as the first m vectors. Thus, any orthonormal set in F^n can be extended to an orthonormal basis for F^n. Equivalently, given any V ∈ F^{n×m} (where m < n) whose columns are orthonormal, one can always find W ∈ F^{n×(n−m)} such that U = (V W) ∈ F^{n×n} is unitary (or orthogonal, if F = ℝ).
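A minimal MATLAB sketch of this completion (the sizes n = 5, m = 2 are arbitrary illustrative choices):

n = 5; m = 2;
[V,~] = qr(randn(n,m),0);   % economy QR gives a random V with orthonormal columns
W = null(V');               % orthonormal basis of the orthogonal complement of R(V)
U = [V W];
norm(U'*U - eye(n))         % ~0, so U is orthogonal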

7.2 Schur’s Triangularization Theorem

We shall introduce the Householder matrix before deriving Schur's triangularization theorem. In linear algebra, a Householder transformation (also known as a Householder reflection or elementary reflector) is a linear transformation that describes a reflection about a plane or hyperplane containing the origin. Householder transformations are widely used in numerical linear algebra to perform QR decompositions and in the first step of the QR algorithm. The Householder transformation was introduced in 1958 by Alston Scott Householder.

The reflection hyperplane can be defined by a unit vector u which is orthogonal to the hyperplane. The reflection of a point x about this hyperplane is:

x − 2⟨u, x⟩u = x − 2u(u^H x)    (7.4)

or x′ = Qx = (I − 2uu^H)x.

This is a linear transformation given by the Householder matrix Q:

Q = I − 2uu^H    (7.5)

The Householder matrix has the following properties:
(1) Q is Hermitian: Q = Q^H.
(2) Q is unitary: Q^{-1} = Q^H.
(3) Hence it is involutory: Q² = I.

Figure 7.1 gives the geometric interpretation of the Householder transformation: given x, find u such that Hx = (I − 2uu^T)x = αe1, where |α| = ‖αe1‖ = ‖Hx‖ = ‖x‖ (since H is unitary). Therefore

u = (x − ‖x‖e1)/‖x − ‖x‖e1‖    (7.6)

(Fig. 7.1: x is reflected about the hyperplane orthogonal to u onto αe1.)
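A minimal MATLAB sketch of the construction (7.6), with an illustrative vector x:

x = [3; 4; 0]; e1 = [1; 0; 0];
u = x - norm(x)*e1;
u = u/norm(u);
H = eye(3) - 2*(u*u');      % Householder matrix: symmetric, orthogonal, involutory
H*x                         % returns [5; 0; 0] = norm(x)*e1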

Theorem 7.3 (Schur's Triangularization Theorem)
(i) Let A ∈ ℂ^{n×n}. Then there exists a unitary matrix U such that T := U^H A U is in upper triangular form. Moreover, the n eigenvalues (counting algebraic multiplicities) of A can be put in any order as the diagonal entries of T.
(ii) Let A ∈ ℝ^{n×n}, and let the n eigenvalues (counting algebraic multiplicities) of A be real. Then there exists an orthogonal matrix U such that T := U^T A U is in upper triangular form. Moreover, the n eigenvalues of A can be put in any order as the diagonal entries of T.

Proof: We shall prove (i) by mathematical induction on n. The proof of (ii) is similar.
Let (λ1, w1) be an eigenpair of A with ‖w1‖ = 1. Choose w2, …, wn such that U1 = (w1, …, wn) is unitary. Then

U1^H A U1 = [w1^H; ⋮; wn^H](Aw1 ⋯ Awn) = [λ1 *1; 0_{(n−1)×1} *2]

Choose a unitary V2 ∈ ℂ^{(n−1)×(n−1)} such that

V2^H *2 V2 = [λ2 *3; 0_{(n−2)×1} *4]

and set

U2 = [1 0_{1×(n−1)}; 0_{(n−1)×1} V2] ⇒ U2^H(U1^H A U1)U2 = [λ1 *1V2; 0_{(n−1)×1} V2^H *2 V2] = [λ1 * *; 0 λ2 *; 0_{(n−2)×2} *]

Continuing this process, we prove the theorem.

Remark 7.1
(i) Recall that a square matrix may not be diagonalizable, i.e., it may not be similar to any diagonal matrix. The above theorem tells us that every matrix A ∈ ℂ^{n×n} is similar to some upper triangular matrix; moreover, the triangularizing matrix can be taken to be unitary.
(ii) In general, if A, B are square matrices such that B = U^H A U for some unitary matrix (or orthogonal matrix, respectively), then we say B is unitarily similar (or orthogonally similar, respectively) to A. Hence the Schur triangularization theorem says that every complex square matrix is unitarily similar to an upper triangular matrix. Also, every real square matrix that has no nonreal eigenvalues is orthogonally similar to a real upper triangular matrix.
(iii) The relation of being unitarily similar (or orthogonally similar, respectively) is an equivalence relation.
(iv) A real square matrix having some nonreal eigenvalues cannot be triangularized by a real orthogonal matrix (why?). But it can always be triangularized (to become a nonreal triangular matrix) by some nonreal unitary matrix.
(v) In the above theorem we triangularized A to an upper triangular matrix. By applying the theorem to A^T instead, we can show easily that the results of the theorem also hold with "upper triangular" replaced by "lower triangular."
We will give an example to demonstrate the Schur triangularization process.

Example 7.1 Find a Schur triangularization of the matrix

A = [−1 −1 −2; 8 −11 −8; −10 11 7]

Solution:
The eigenvalues of A are λ1 = 1, λ2 = λ3 = −3. Arbitrarily choose an eigenvalue, say λ1 = 1; then

A − I = [−2 −1 −2; 8 −12 −8; −10 11 6] → [1 0 1/2; 0 1 1; 0 0 0]

and x = (1/3)(−1, −2, 2)^T is an associated unit eigenvector. Let

u = (x − e1)/‖x − e1‖ = √(3/8)·(−4/3, −2/3, 2/3)^T

and let Q be the associated Householder matrix, i.e.,

Q = I − 2uu^T = (1/3)[−1 −2 2; −2 2 1; 2 1 2] = [x V]

where

V = (1/3)[−2 2; 2 1; 1 2]

Then

QAQ = (1/3)[3 64 13; 0 −13 −1; 0 16 −5] and V^T A V = (1/3)[−13 −1; 16 −5]

Now triangularize the 2×2 matrix V^T A V, which has the single eigenvalue −3. An associated unit eigenvector is x = (1/√17)(1, −4)^T. Let u = (x − e1)/‖x − e1‖ ≈ (−0.6154, −0.7882)^T and let P be the Householder matrix associated with u, i.e.,

P = [0.24254 −0.97014; −0.97014 −0.24254]

Then

P V^T A V P = [−3 17/3; 0 −3]

is a Schur triangularization of V^T A V. Finally, let

U = Q[1 0; 0 P]

Then

U^T A U = [1 0.9701 −21.747; 0 −3.000 5.667; 0 0 −3.000]

is a (numerically approximate) Schur triangularization of A.
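In practice a Schur form is computed numerically; a minimal MATLAB sketch using the matrix of Example 7.1 (the diagonal of T carries the eigenvalues, possibly in a different order than above):

A = [-1 -1 -2; 8 -11 -8; -10 11 7];
[U,T] = schur(A);        % U orthogonal, T upper triangular (all eigenvalues of A are real)
norm(U'*A*U - T)         % ~0
diag(T)                  % the eigenvalues 1, -3, -3 in some order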

In summary, every square matrix is triangularizable but only non-defective matrices are
diagonalizable.

The Schur triangularization theorem is fundamental, as many nice and important results
follow from it. We list some of these results here.

Theorem 7.4 Let A ∈ F^{n×n} have eigenvalues λ1, …, λn (counting algebraic multiplicities). Then

tr(A) = Σ_{i=1}^n λi and det(A) = λ1λ2⋯λn.

Proof: By the Schur triangularization theorem, there exists unitary U such that

U^H A U = [λ1 … *; ⋱; 0 λn] = T    (7.7)

is upper triangular. Since trace and determinant are invariant under similarity, it is then easy to deduce that tr(A) = Σ_{i=1}^n λi and det(A) = λ1λ2⋯λn (exercise!).
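A quick numerical illustration of Theorem 7.4 (a minimal MATLAB sketch with a random complex matrix):

A = randn(4) + 1i*randn(4);
lam = eig(A);
abs(trace(A) - sum(lam))    % ~0
abs(det(A) - prod(lam))     % ~0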

Theorem 7.5 Let A ∈ F^{n×n} have eigenvalues λ1, …, λn (counting algebraic multiplicities) and let m be a positive integer. Then the eigenvalues of A^m are λ1^m, …, λn^m. If A is nonsingular (i.e., λi ≠ 0 for all i), then the eigenvalues of A^{−m} are λ1^{−m}, …, λn^{−m}.

Proof: Let U be unitary such that (7.7) holds. Note that

T^m = [λ1^m … *; ⋱; 0 λn^m]

has eigenvalues λ1^m, …, λn^m. As A^m = (U T U^H)^m = U T^m U^H is similar to T^m, its eigenvalues are exactly those of T^m. Now suppose A is nonsingular. Then T is also nonsingular, and its inverse is also in upper triangular form

T^{-1} = [λ1^{-1} … *; ⋱; 0 λn^{-1}]

and has eigenvalues λ1^{-1}, …, λn^{-1}. Since T^{-1} = (U^H A U)^{-1} = U^H A^{-1} U, by the result just proved, the eigenvalues of A^{−m} are λ1^{−m}, …, λn^{−m}.

Corollary 7.6 Let A ∈ F^{n×n} have eigenvalues λ1, …, λn (counting algebraic multiplicities) and let q(x) = α0 + α1x + ⋯ + αkx^k ∈ F[x] be a polynomial. Then the eigenvalues of q(A) are q(λ1), …, q(λn).

Proof: Again let U be unitary such that (7.7) holds. Then q(A) = q(U T U^H) = U q(T) U^H. As in the proof of the last theorem, it is easy to see that q(T) is upper triangular and has eigenvalues q(λ1), …, q(λn). Since q(A) is similar to q(T), the result follows.

Theorem 7.7 (Cayley-Hamilton Theorem) Let A ∈ F^{n×n} and let det(λI − A) = λ^n + γ1λ^{n−1} + ⋯ + γn be its characteristic polynomial. Then

A^n + γ1A^{n−1} + ⋯ + γnI = 0    (7.8)

Proof: By Schur's triangularization theorem, there exists unitary U such that T := U^H A U is in the upper triangular form given in (7.7), where λ1, …, λn are the eigenvalues of A. Let p(λ) be the characteristic polynomial of A. Then p(λ) = (λ − λ1)⋯(λ − λn), and p(A) = U p(T) U^H. The result will follow if we can show that

(T − λ1I)⋯(T − λnI) = p(T) = 0    (7.9)

We shall prove (7.9) by mathematical induction on n.

If n = 1 then T is the 1×1 matrix (λ1), and p(T) = (λ1 − λ1) = 0. Assume (7.9) holds for any upper triangular T given in (7.7) with n = n0. Now suppose T is in the form given in (7.7) with n = n0 + 1. Write T in block matrix form

T = [T′ *; 0 λ_{n0+1}]

where T′ is in upper triangular form with eigenvalues λ1, …, λ_{n0}. By the induction assumption, (T′ − λ1I)⋯(T′ − λ_{n0}I) = 0. As a result,

p(T) = (T − λ1I)⋯(T − λ_{n0}I)(T − λ_{n0+1}I)
     = ([T′ − λ1I *; 0 *]⋯[T′ − λ_{n0}I *; 0 *])[* *; 0 λ_{n0+1} − λ_{n0+1}]
     = [(T′ − λ1I)⋯(T′ − λ_{n0}I) *; 0 *][* *; 0 0] = [0 *; 0 *][* *; 0 0] = 0

The proof is now completed.

Corollary 7.8 Let A ∈ F^{n×n}. If A is nonsingular, then A^{-1} can be written as a linear combination of I, A, A², …, A^{n−1}.

Proof: Let p(λ) = λ^n + γ1λ^{n−1} + ⋯ + γn be the characteristic polynomial of A. By the Cayley-Hamilton theorem, we have A^n + γ1A^{n−1} + ⋯ + γnI = 0. This implies

(A^{n−1} + γ1A^{n−2} + ⋯ + γ_{n−1}I)A = −γnI

Since A is nonsingular (so that γn = ±det(A) ≠ 0), we have

A^{-1} = −γn^{-1}(A^{n−1} + γ1A^{n−2} + ⋯ + γ_{n−1}I)    (7.10)
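A minimal MATLAB sketch of formula (7.10), using the positive definite matrix of Example 7.3 below as an illustrative nonsingular A:

A = [4 2 -2; 2 10 2; -2 2 5];
c = poly(A);                % coefficients [1, gamma_1, ..., gamma_n] of det(lambda*I - A)
n = size(A,1);
B = zeros(n);
for k = 0:n-1
    B = B + c(k+1)*A^(n-1-k);   % A^(n-1) + gamma_1*A^(n-2) + ... + gamma_(n-1)*I
end
Ainv = -B/c(n+1);           % = A^(-1) by (7.10)
norm(Ainv - inv(A))         % ~0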

7.3 Normal Matrices

Definition 7.2 A matrix A ∈ F^{n×n} is said to be normal if AA^H = A^H A.

Lemma 7.9 An upper triangular matrix T ∈ F^{n×n} is normal iff it is diagonal.

Proof: Let T = (tij) ∈ F^{n×n} be upper triangular. If T is diagonal then clearly

T^H T = T T^H = diag(|t11|², …, |tnn|²)

and hence T is normal.
Conversely, suppose T is not diagonal. Then there exists a smallest i, say i0, such that t_{i0,j} ≠ 0 for some j with i0 < j ≤ n. Observe that the (i0, i0) entry of T^H T is |t_{i0,i0}|², but the (i0, i0) entry of T T^H is Σ_{j=i0}^n |t_{i0,j}|² > |t_{i0,i0}|². Since T^H T ≠ T T^H, T is not normal.
j = i0

Theorem 7.10 (Spectral decomposition of normal matrices)
(i) Let A ∈ ℂ^{n×n}. If A is normal then there exists unitary U such that U^H A U is diagonal and has the eigenvalues λ1, …, λn of A, in any given order, as diagonal entries. Conversely, if D := U^H A U is diagonal for some unitary U, then A is normal and its eigenvalues are the diagonal entries of D.
(ii) Let A ∈ ℝ^{n×n}. If A is normal and all its eigenvalues λ1, …, λn are real, then there exists orthogonal U such that U^T A U is diagonal and has λ1, …, λn, in any given order, as diagonal entries. Conversely, if D := U^T A U is diagonal for some orthogonal U, then A is normal and all its eigenvalues are real and are the diagonal entries of D.

Proof: We shall prove (i) only; the proof of (ii) is similar. Let A ∈ ℂ^{n×n}. By the Schur triangularization theorem, there always exists unitary U such that T := U^H A U is upper triangular and has the eigenvalues of A, in any given order, as diagonal entries. Note that AA^H = A^H A iff TT^H = T^H T. Hence, by Lemma 7.9, if A is normal then T is diagonal. Conversely, suppose U^H A U = D is diagonal for some unitary U. Then A is similar to D and hence its eigenvalues are exactly those of D, which are also the diagonal entries of D. Also, it is straightforward to check that AA^H = A^H A.
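A minimal MATLAB sketch: a real normal (but not symmetric) matrix is unitarily diagonalizable with complex eigenvalues. Here the eigenvalues ±i are distinct, so the unit eigenvectors returned by eig are automatically orthogonal:

A = [0 -1; 1 0];            % normal: A*A' equals A'*A
[V,D] = eig(A);             % D = diag(i, -i)
norm(V'*V - eye(2))         % ~0: V is unitary
norm(A - V*D*V')            % ~0: the spectral decomposition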

The above theorem says that a complex matrix is normal iff it is unitarily similar to a
diagonal matrix. Or, equivalently, a matrix is normal iff it is diagonalizable by a unitary
matrix. As a result, a real normal matrix whose eigenvalues are not all real is unitarily (but
not orthogonally) similar to a complex (but not real) diagonal matrix. The proof of the
following result is left as an exercise.

Corollary 7.11
(i) Let A ∈ ℂ^{n×n}. Then A is normal iff it has an orthonormal set of eigenvectors which spans ℂ^n.
(ii) Let A ∈ ℝ^{n×n}. Then A is normal and all its eigenvalues are real iff it has an orthonormal set of eigenvectors which spans ℝ^n.

Theorem 7.12 (Spectral decomposition of unitary matrices) Let A ∈ F^{n×n}. Then the following are equivalent:
(i) A is unitary.
(ii) A is normal and all eigenvalues of A have modulus one.
(iii) U^H A U = diag(λ1, …, λn) for some unitary U, with |λi| = 1 for all i.

Proof:
(i) ⇒ (ii) Suppose A is unitary; then AA^H = A^H A = I and hence A is normal. Moreover, if λ is any eigenvalue of A and x is a corresponding eigenvector, then, by Theorem 7.2,

‖x‖ = ‖Ax‖ = ‖λx‖ = |λ|‖x‖ ⇒ |λ| = 1

(ii) ⇒ (iii) This follows from Theorem 7.10 (why?).

(iii) ⇒ (i) If (iii) holds then one easily verifies that A^H A = I. Therefore A^{-1} = A^H, and A is unitary.

7.4 Hermitian Matrices

Definition 7.3 Let A ∈ F^{n×n}.
(i) If A = A^H, then we call A a Hermitian matrix.
(ii) If A = −A^H, then we call A a skew-Hermitian matrix.
(iii) A real Hermitian matrix (i.e., a real matrix A which satisfies A = A^T) is also called a real symmetric matrix.
(iv) A real skew-Hermitian matrix (i.e., a real matrix A which satisfies A = −A^T) is also called a real skew-symmetric matrix.

Remark 7.13 The following are easy to verify (exercise).

(i) A matrix A is Hermitian iff √−1·A is skew-Hermitian.
(ii) If A and B are Hermitian (or skew-Hermitian, real symmetric, real skew-symmetric, respectively) matrices of the same order, then so is αA + βB, where α, β are real scalars.
(iii) Every A ∈ ℝ^{n×n} can be written uniquely as the sum of a real symmetric matrix and a real skew-symmetric matrix: A = (A + A^T)/2 + (A − A^T)/2.
(iv) Every A ∈ ℂ^{n×n} can be written uniquely as the sum of a Hermitian matrix and a skew-Hermitian matrix: A = (A + A^H)/2 + (A − A^H)/2.
(v) Hermitian, skew-Hermitian and unitary matrices are all normal.

Theorem 7.14 (Spectral decomposition of Hermitian matrices) Let A ∈ F^{n×n}. Then the following are equivalent:
(i) A is Hermitian (A is real symmetric if F = ℝ).
(ii) A is normal, all eigenvalues of A are real, and eigenvectors belonging to distinct eigenvalues are orthogonal.
(iii) U^H A U = diag(λ1, …, λn) for some unitary (orthogonal if F = ℝ) matrix U, with λi ∈ ℝ for all i.

Proof:
(i) ⇒ (ii) Suppose A is Hermitian; then AA^H = A^H A = A² and hence A is normal. Moreover, if λ is any eigenvalue of A and x is a corresponding eigenvector, then since A = A^H,

λ̄‖x‖² = (x^H A x)^H = x^H A^H x = x^H A x = λ‖x‖²

This implies that λ̄ = λ, so λ must be real.

Let (λ1, x1) and (λ2, x2) be two eigenpairs of A with λ1 ≠ λ2. Then

λ1 x1^H x2 = (λ1 x1)^H x2 = (A x1)^H x2 = x1^H A^H x2 = x1^H A x2 = λ2 x1^H x2

(using that λ1 is real). Since λ1 ≠ λ2,

⟨x1, x2⟩ = x1^H x2 = 0 ⇒ x1 ⊥ x2

(ii) ⇒ (iii) This follows from Theorem 7.10.

(iii) ⇒ (i) If (iii) holds then one easily verifies that A^H = A if F = ℂ, and that A^T = A if F = ℝ.

Theorem 7.15 Let A ∈ F^{n×n}. If A = −A^H, then all eigenvalues of A are purely imaginary: λ(A) ⊆ iℝ.

Proof: Let (λ, x) be an eigenpair of A. Then

λ̄ x^H x = (λx)^H x = (Ax)^H x = x^H A^H x

Since A^H = −A,

λ̄ x^H x = −x^H A x = −λ x^H x ⇒ λ̄ = −λ ⇒ λ is purely imaginary.

Example 7.2 Find an orthogonal matrix U that diagonalizes A = [0 2 −1; 2 3 −2; −1 −2 0].

Solution:
(1) Calculate the eigenvalues of A:

p(λ) = det(λI − A) = (λ + 1)²(5 − λ) ⇒ λ = −1, −1, 5

(2) Calculate the respective eigenspaces:

N(5I − A) = Span{(−1/√6, −2/√6, 1/√6)^T}
N(−I − A) = Span{(1, 0, 1)^T, (−2, 1, 0)^T} = Span{v1, v2}

(3) Use the Gram-Schmidt process to transform the vectors in N(−I − A) into orthonormal ones:

u1 = v1/‖v1‖ = (1/√2, 0, 1/√2)^T
h1 = ⟨v2, u1⟩u1 = −√2·u1 = (−1, 0, −1)^T
v2 − h1 = (−2, 1, 0)^T − (−1, 0, −1)^T = (−1, 1, 1)^T
u2 = (v2 − h1)/‖v2 − h1‖ = (−1/√3, 1/√3, 1/√3)^T

N(−I − A) = Span{(1/√2, 0, 1/√2)^T, (−1/√3, 1/√3, 1/√3)^T}

(4) The columns of U form an orthonormal eigenbasis (WHY?):

U = [−1/√6 1/√2 −1/√3; −2/√6 0 1/√3; 1/√6 1/√2 1/√3]

⇒ U^T A U = diag(5, −1, −1)

If A is a real square matrix then obviously x^H A x = x^T A x is real for any real vector x. If A is Hermitian, then it is easy to see that x^H A x is also real for any complex vector x. The following is a classification of Hermitian and real symmetric matrices A according to the sign of the real scalars x^H A x, where x runs over all nonzero vectors in F^n.

Definition 7.4 Let A be either Hermitian or real symmetric. Then
(i) If x^H A x > 0 for all x ∈ F^n with x ≠ 0, then we call A a positive definite matrix and write A > 0.
(ii) If x^H A x < 0 for all x ∈ F^n with x ≠ 0, then we call A a negative definite matrix and write A < 0.
(iii) If x^H A x ≥ 0 for all x ∈ F^n, then we call A a positive semidefinite matrix and write A ≥ 0.
(iv) If x^H A x ≤ 0 for all x ∈ F^n, then we call A a negative semidefinite matrix and write A ≤ 0.
(v) If x^H A x > 0 and y^H A y < 0 for some x, y ∈ F^n, we call A indefinite.

For example,

A = [2 0; 0 4] > 0,  A = [−2 0; 0 −4] < 0,  A = [2 0; 0 −4] is indefinite.

Question: Given a real symmetric matrix, how to determine its definiteness efficiently?
We have the following theorem.

Theorem 7.16 Let A be either Hermitian or real symmetric, with eigenvalues λi. Then:

(i) A > 0 ⇔ λi > 0 for all i.
(ii) A < 0 ⇔ λi < 0 for all i.
(iii) A ≥ 0 ⇔ λi ≥ 0 for all i.
(iv) A ≤ 0 ⇔ λi ≤ 0 for all i.
(v) A is indefinite ⇔ A has some positive and some negative eigenvalues.

Proof: We shall prove (i) and leave the others to the reader as exercises.
"⇒" Let A > 0 and let (λ, x) be an eigenpair. Then

x^H A x = x^H(λx) = λ‖x‖² ⇒ λ = x^H A x/‖x‖² > 0

"⇐" Suppose λ(A) ⊆ (0, ∞). Let {x1, …, xn} be an orthonormal eigenbasis of A (why can we assume this?). Every x ∈ F^n can be written x = Σ_{i=1}^n αi xi for some α1, …, αn, so for x ≠ 0

x^H A x = (Σ_{i=1}^n αi xi)^H(Σ_{i=1}^n αi λi xi) = Σ_{i=1}^n λi|αi|² > 0 (why?)

⇒ A > 0
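Theorem 7.16 gives a practical test: compute the eigenvalues, or attempt a Cholesky factorization (see Property VII below). A minimal MATLAB sketch:

A = [4 2 -2; 2 10 2; -2 2 5];    % symmetric
min(eig(A))                      % > 0, so A > 0 by Theorem 7.16(i)
[~,p] = chol(A);                 % p == 0 also certifies positive definiteness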

Let's take a look at the properties of positive definite matrices.

Property I: A = A^T ∈ ℝ^{n×n} and A > 0 ⇒ A is nonsingular and det(A) > 0 (∵ det(A) = ∏λi).

Property II: A = A^T ∈ ℝ^{n×n} and A > 0 ⇒ aii > 0 for all i, and all the leading principal submatrices of A are positive definite.

Proof: Pick xi = (0, …, 0, 1, 0, …, 0)^T (1 in the i-th position), for each i:

xi^T A xi = aii > 0

Pick x = (x1^T, 0)^T and partition A = [A11 A12; A21 A22]:

x^T A x = x1^T A11 x1 > 0 ⇒ A11 is positive definite.

Property III: A = A^T ∈ ℝ^{n×n} and A > 0 ⇒ A can be reduced to upper triangular form using only row operations of type III (adding a multiple of one row to another), and the pivot elements will all be positive.

Sketch of the proof: a11 > 0 by Property II, so it can serve as the first pivot:

A2 = [a11 a12; a21 a22] → [a11 a12; 0 a22^(1)]

Since A2 > 0 and the determinant is invariant under row operations of type III, a22^(1) = det(A2)/a11 > 0. Continuing this process proves the property.

Property IV: Let A = A^T ∈ ℝ^{n×n} and A > 0. Then
(i) A can be decomposed as A = LU, where L is lower triangular and U is upper triangular.
(ii) A can be decomposed as A = LDU, where L is lower triangular and U is upper triangular, each with all diagonal elements equal to 1, and D is a diagonal matrix.

Proof: By Gaussian elimination with type III row operations only, together with the fact that the product of two lower (upper) triangular matrices is lower (upper) triangular.

Example 7.3

A = [4 2 −2; 2 10 2; −2 2 5] → [4 2 −2; 0 9 3; 0 3 4] → [4 2 −2; 0 9 3; 0 0 3] = U

(first R2 ← R2 − ½R1 and R3 ← R3 + ½R1, then R3 ← R3 − ⅓R2). Thus A = LU, where

L = [1 0 0; 1/2 1 0; −1/2 1/3 1]

Also A = LDU with

L = [1 0 0; 1/2 1 0; −1/2 1/3 1], D = [4 0 0; 0 9 0; 0 0 3], U = [1 1/2 −1/2; 0 1 1/3; 0 0 1]

Property V: Let A = A^T ∈ ℝ^{n×n} and A > 0. If A = L1D1U1 = L2D2U2, then

L1 = L2, D1 = D2 and U1 = U2

Proof:
From L1D1U1 = L2D2U2 we get D2^{-1}L2^{-1}L1D1 = U2U1^{-1}.
The LHS is lower triangular and the RHS is upper triangular with diagonal elements 1,
⇒ U2U1^{-1} = I ⇒ U2 = U1
⇒ D2^{-1}L2^{-1}L1D1 = I
⇒ L2^{-1}L1 = D2D1^{-1} ⇒ L2^{-1}L1 = I (why?)
⇒ L1 = L2 ⇒ D2D1^{-1} = I ⇒ D1 = D2

Property VI: Let A = A^T ∈ ℝ^{n×n} and A > 0. Then A can be factorized as

A = LDL^T

where D is a diagonal matrix and L is lower triangular with 1's along the diagonal.

Proof:
Let A = LDU. Since A = A^T, LDU = U^T D L^T. Since the LDU representation is unique (Property V), U = L^T.

Property VII: (Cholesky Decomposition) Let A = A^T ∈ ℝ^{n×n} and A > 0. Then A can be factorized as

A = LL^T

where L is lower triangular with positive diagonal.

Hint: A = LDL^T = (LD^{1/2})(LD^{1/2})^T.

Example 7.4 In Example 7.3, we have seen that

A = [4 2 −2; 2 10 2; −2 2 5] = [1 0 0; 1/2 1 0; −1/2 1/3 1][4 2 −2; 0 9 3; 0 0 3] = LU

Also

A = [1 0 0; 1/2 1 0; −1/2 1/3 1][4 0 0; 0 9 0; 0 0 3][1 1/2 −1/2; 0 1 1/3; 0 0 1] = LDU

Note that since A = A^T, U = L^T and hence A = LDL^T. Defining L1 = LD^{1/2} = [2 0 0; 1 3 0; −1 1 √3], we obtain the Cholesky decomposition A = L1L1^T.
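In MATLAB, chol returns the upper triangular factor R with A = R'R, so L1 = R' (a minimal sketch):

A = [4 2 -2; 2 10 2; -2 2 5];
R = chol(A);
L1 = R'                     % [2 0 0; 1 3 0; -1 1 sqrt(3)]
norm(A - L1*L1')            % ~0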

Exercises

1. Let A ∈ F^{n×n} and B ∈ F^{m×m} have no eigenvalue in common, and let the characteristic polynomial of A be p(λ) = (λ − λ1)⋯(λ − λn). Prove that p(A) = 0 and p(B) is nonsingular.

2. Let A ∈ F^{n×n}, B ∈ F^{m×m} and X ∈ F^{m×n}, and let the characteristic polynomial of A be p(λ) = (λ − λ1)⋯(λ − λn). Prove that if XA = BX, then Xp(A) = p(B)X.

3. Compute the LU factorization of A = [1 3 −4; −1 5 −3; 4 −8 23].

4. Find the LU decomposition of the matrix A = [1 1 0; 1 1 0; 0 0 1].

5. Compute the LU factorization of A = [2 1 3; 2 5 7; 4 2 7].

6. Let A ∈ ℂ^{n×n}.
(a) If A is normal, show that A² is also normal. Find a matrix A ∈ ℂ^{2×2} which is not normal, but A² is normal.
(b) If A is unitary, show that A² is also unitary. Find a matrix A ∈ ℂ^{2×2} which is not unitary, but A² is unitary.
(c) If A is Hermitian, show that A² is also Hermitian. Find a matrix A ∈ ℂ^{2×2} which is not Hermitian, but A² is Hermitian.

7. If A = (aij), B = (bij) ∈ F^{n×n} are unitarily similar, prove that Σ_{i,j=1}^n |aij|² = Σ_{i,j=1}^n |bij|². (Hint: consider the Frobenius norm ‖·‖_F.)

8. Let A = [a b; −b a], where a, b ∈ ℝ with b ≠ 0.
(a) Show that A is normal.
(b) Find the eigenvalues of A and find two orthonormal eigenvectors.
(c) Find a unitary matrix U such that U^H A U is diagonal.

9. Let A = −A^H be skew-Hermitian.
(a) Show that the eigenvalues of A are purely imaginary.
(b) Show that I − A and I + A are both nonsingular.

10. Let A = [1 0 0; 0 1 1; −2 −2 1]. Using the Cayley-Hamilton theorem, compute A^{-1} and A^5.

11. Let A1 = [1 0 1; 1 0 −1] and A2 = [1 1 1; 0 0 0]. Apply the Gram-Schmidt process to find matrices B1, B2 ∈ ℝ^{2×3} such that span(A1) = span(B1), span(A1, A2) = span(B1, B2), and B1, B2 are orthonormal with respect to the usual inner product.

12. Show that the transition matrix from one orthonormal basis of a finite dimensional inner product space to another is unitary.

13. Find an orthonormal basis for ℝ³ consisting of eigenvectors of [2 0 1; 0 2 −1; 1 −1 1].

14. Show that the eigenvalues of a unitary matrix all have absolute value (complex modulus) 1.

15. Find the Cholesky decomposition of the following matrices:
(a) [2 1 1; 1 2 1; 1 1 2]  (b) [4 2 −6; 2 10 9; −6 9 26]  (c) [4 1 −1; 1 2 1; −1 1 2].

16. Following the proof of Schur's Triangularization Theorem, find an orthogonal matrix P such that P^T A P is upper triangular:
(a) [1 −1; 1 3]  (b) [2 −1; −1 2]  (c) [13 −9; 16 −11].
Chapter 8 Singular Value Decomposition

Recall that if A ∈ ℂ^{n×n} is a normal matrix (i.e., if A satisfies AA^H = A^H A) then A can be decomposed as A = UDU^H, where U is unitary and D is diagonal with the eigenvalues of A as diagonal entries. If A is square but not normal, then A does not have such a nice decomposition. But still, according to the Schur triangularization theorem, we have A = UTU^H for some unitary U and upper (or lower) triangular T with the eigenvalues of A as diagonal entries. We may reformulate these results as:

Proposition 8.1 Any square matrix A ∈ ℂ^{n×n} can be decomposed as A = UXV^H, where U = V is unitary and X is upper (or lower) triangular; moreover, if A is normal then X is diagonal.

Note that if U = V then A = UXV^H must be square. Hence the result of Proposition 8.1 does not hold for nonsquare matrices A. However, if we do not require U = V, then any matrix A, square or not, can be decomposed as A = UΣV^H for some unitary U, V (here U and V may not be equal) and a "diagonal" matrix Σ (see Theorem 8.2). This is the important singular value decomposition, the main subject of this chapter.

8.1 Singular Value Decomposition (SVD)

Throughout this chapter we shall assume A ∈ F^{m×n} (again we let F = ℝ or ℂ). The matrix A^H A is an n×n positive semidefinite matrix. Therefore all n eigenvalues of A^H A are nonnegative, and their nonnegative square roots are well-defined. We have the following definition.

Definition 8.1 Let A ∈ F^{m×n}. Then the n nonnegative square roots of the eigenvalues of A^H A are called the singular values of A. We shall order them as σ1(A) ≥ σ2(A) ≥ ⋯ ≥ σn(A) ≥ 0.

In other words, σ1(A) ≥ σ2(A) ≥ ⋯ ≥ σn(A) ≥ 0 are the singular values of A iff σ1² ≥ σ2² ≥ ⋯ ≥ σn² are the eigenvalues of A^H A.

Suppose rank(A) = r. By the spectral decomposition of Hermitian matrices, we have

A^H A = VDV^H

where V is unitary and D = diag(σ1²(A), …, σn²(A)). Since rank(A^H A) = rank(A) = r, we have rank(D) = r also, and hence only r of the singular values of A are nonzero. Accordingly, we have σ1(A) ≥ ⋯ ≥ σr(A) > σ_{r+1}(A) = ⋯ = σn(A) = 0.

Theorem 8.2 (Singular Value Decomposition (SVD)) Let A ∈ ℂ^{m×n} with rank(A) = r, and let its singular values σ1(A) ≥ ⋯ ≥ σr(A) > 0 be the r nonzero square roots of the eigenvalues of A^H A. Then there exist Σ+ := diag(σ1, …, σr) ∈ ℝ^{r×r} and unitary matrices U ∈ ℂ^{m×m} and V ∈ ℂ^{n×n} such that

A = UΣV^H = [U1 U2] [Σ+ 0; 0 0]_{m×n} [V1 V2]^H = Σ_{i=1}^r σi ui vi^H    (8.1)

where

U1 = [u1 ⋯ ur], U2 = [u_{r+1} ⋯ um], V1 = [v1 ⋯ vr], V2 = [v_{r+1} ⋯ vn]

Moreover, if A ∈ ℝ^{m×n} then U and V can be chosen to be real orthogonal.

Proof: Note that A^H A ∈ ℂ^{n×n} is Hermitian and positive semidefinite with rank(A^H A) = rank(A) = r. Hence there exists a unitary matrix V such that

A^H A = V diag(σ1², …, σr², 0, …, 0) V^H

where σ1 ≥ σ2 ≥ ⋯ ≥ σr > 0. Define

Σ+ = diag(σ1, …, σr) and V = [(V1)_{n×r} (V2)_{n×(n−r)}]

⇒ A^H A = V1(Σ+)²V1^H    (8.2)

Since A^H A V2 = 0, we get V2^H A^H A V2 = 0 and hence

AV2 = 0    (8.3)

Define

U1 = AV1(Σ+)^{-1} ∈ ℂ^{m×r}    (8.4)

From (8.2) we have U1^H U1 = I_r, i.e., the columns of U1 are orthonormal. Choose U2 such that U = [U1 U2] ∈ ℂ^{m×m} is unitary. Then from (8.3) and (8.4),

A[V1 V2] = [U1 U2][Σ+ 0; 0 0]

⇒ A = UΣV^H, Σ = [Σ+ 0; 0 0]

Corollary 8.3 Let A ∈ ℂ^{m×n}. Then
(i) A^H A and AA^H have the same collection of nonzero eigenvalues.
(ii) A^H and A have the same collection of nonzero singular values.

Proof:
(i) We need only consider nonzero A. By Theorem 8.2, we may write A = UΣV^H where U, V are unitary and Σ ∈ ℝ^{m×n} has its only nonzero entries σ1, …, σr > 0 at the (1,1), (2,2), …, (r,r) positions. Then

A^H A = V diag(σ1², …, σr², 0, …, 0) V^H

and

AA^H = U diag(σ1², …, σr², 0, …, 0) U^H

have the same collection of nonzero eigenvalues σ1², …, σr².

(ii) By definition, the square roots of the eigenvalues of AA^H are the singular values of A^H. Hence (ii) follows from (i).

Suppose A = UΣV^H is the singular value decomposition of A, where U, Σ, V are as described in Theorem 8.2. Then A^H = (UΣV^H)^H = VΣ^H U^H, and this gives the singular value decomposition of A^H. Since Σ and Σ^H have the same collection of "diagonal" elements (Σ has real entries, so Σ^H = Σ^T), we see that A and A^H have the same collection of nonzero singular values. This gives another proof of Corollary 8.3(ii).

Let A = UΣV^H as in the statement of Theorem 8.2, i.e., U, V are unitary (or real orthogonal if A is real), and Σ is the m×n matrix whose (i,i)-th entry is σi (i = 1, …, r = rank(A)) and all other entries zero. Let u1, …, um and v1, …, vn be the columns of U and V respectively. Since AV = UΣ, by considering the j-th column of AV, we have

Avj = σj uj for j = 1, …, r;  Avj = 0 for j = r + 1, …, n    (8.5)

Similarly, we have

A^H uj = σj vj for j = 1, …, r;  A^H uj = 0 for j = r + 1, …, m    (8.6)

Definition 8.2 For nonzero vectors u1, …, um and v1, …, vn which satisfy equations (8.5) and (8.6), we call v1, …, vn the right singular vectors of A (corresponding to the singular values σ1, …, σn), and u1, …, um the left singular vectors of A (corresponding to the singular values σ1, …, σm). These should be compared with the definition of the eigenvectors xi of A, which satisfy

Axi = λxi

The above proves the following result.

Corollary 8.4 Let A ∈ ℂ^{m×n} with rank(A) = r, and let σ1 ≥ ⋯ ≥ σr be the r nonzero singular values of A. Then there exist orthonormal left singular vectors u1, …, um in F^m and orthonormal right singular vectors v1, …, vn in F^n such that

Avj = σj uj for j = 1, …, r;  Avj = 0 for j = r + 1, …, n
A^H uj = σj vj for j = 1, …, r;  A^H uj = 0 for j = r + 1, …, m

Remark: In the SVD A = UΣV^H:

(1) The singular values σ1(A) ≥ ⋯ ≥ σr(A) > 0 of A are unique, while U and V are not unique.
(2) The columns of V form an orthonormal eigenbasis for A^H A ∈ ℂ^{n×n}.
(3) The columns of U form an orthonormal eigenbasis for AA^H ∈ ℂ^{m×m}, and

AV = UΣ = U[Σ+ 0; 0 0], A^H U = VΣ^T

(4) {u1, …, ur} is an orthonormal basis for R(A).
(5) {v_{r+1}, …, vn} is an orthonormal basis for N(A).
(6) {v1, …, vr} is an orthonormal basis for R(A^H).
(7) {u_{r+1}, …, um} is an orthonormal basis for N(A^H).
(8) rank(A) = number of nonzero singular values, but in general rank(A) ≠ number of nonzero eigenvalues. For example,

A = [0 1; 0 0] ⇒ rank(A) = 1 but λ(A) = 0, 0

Lemma 8.5 Let A ∈ ℝ^{m×n} and let Q ∈ ℝ^{m×m} be orthogonal. Then

‖QA‖_F = ‖A‖_F

Proof: Writing A = [A1 ⋯ An] by columns,

‖QA‖_F² = ‖[QA1 ⋯ QAn]‖_F² = Σ_{i=1}^n ‖QAi‖₂² = Σ_{i=1}^n ‖Ai‖₂² = ‖A‖_F²

Corollary 8.6 Let A = UΣV^H be the SVD of A. Then

‖A‖_F = ‖Σ‖_F = (Σ_{i=1}^n σi²)^{1/2}

Example 8.1 Find an SVD of the matrix A = [1 1; 1 1; 0 0].

Solution:
A^H A = [2 2; 2 2], so λ(A^H A) = 4, 0 ⇒ σ1 = 2, σ2 = 0.

R(A^H) = span{(1, 1)^T} ⇒ v1 = (1/√2)(1, 1)^T;  N(A) = span{(1, −1)^T} ⇒ v2 = (1/√2)(1, −1)^T

R(A) = span{(1, 1, 0)^T} ⇒ u1 = (1/√2)(1, 1, 0)^T

N(A^H) = span{(1, −1, 0)^T, (0, 0, 1)^T} ⇒ u2 = (1/√2)(1, −1, 0)^T, u3 = (0, 0, 1)^T

Thus

U = [1/√2 1/√2 0; 1/√2 −1/√2 0; 0 0 1], V = (1/√2)[1 1; 1 −1]

and A = UΣV^H with Σ = [2 0; 0 0; 0 0].
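A minimal MATLAB sketch verifying Example 8.1:

A = [1 1; 1 1; 0 0];
[U,S,V] = svd(A);
diag(S)                     % singular values 2, 0
norm(A - U*S*V')            % ~0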

In linear algebra, the singular value decomposition (SVD) is an important factorization of


a rectangular real or complex matrix, with many applications in signal processing and
statistics. Applications which employ the SVD include computing the pseudoinverse, least
squares fitting of data, matrix approximation, and determining the rank, range and null
space of a matrix.

Application 8.1: Image Compression

Given an original image (here 359 × 371 pixels) shown in Fig. 8.1.

Fig. 8.1 Detail from Durer's Melancolia, dated 1514 (359×371 image). Fig. 8.2 Spectrum of singular values of A.
We can write it as a 359 × 371 matrix A which can then be decomposed via the singular
value decomposition as
A = U ΣV T
where U is 359 × 359 , Σ is 359 × 371 and V is 371× 371 .

The matrix A, however, can also be written as a sum of rank-1 matrices

A = σ1u1v1^T + σ2u2v2^T + ⋯ + σnunvn^T

where each rank-1 matrix ui vi^T is the size of the original matrix. Each one of these matrices is a mode. Because the singular values σi are ordered σ1 ≥ σ2 ≥ ⋯ ≥ σn ≥ 0, significant compression of the image is possible if the spectrum of singular values has only a few very strong entries.

We can therefore reconstruct the image from just a subset of modes. For example in
MATLAB we can write just the first mode as
[U,S,V]=svd(A)
B=U(:,1)*S(1,1)*V(:,1)'

Fig. 8.3 EOF reconstruction with the first mode


which only uses about 5% of the storage (storing the first k modes takes k×359 + k×371 + k numbers; for k = 10 that is 7310 numbers vs 359 × 371 = 133189 pixels). Adding modes just adds resolution:
B=U(:,1:30)*S(1:30,1:30)*V(:,1:30)'

Fig. 8.4 EOF reconstruction with 30 modes

Fig. 8.5 EOF reconstruction with 50 modes

Fig. 8.6 EOF reconstruction with 100 modes

Fig. 8.7 EOF reconstruction with 200 modes

8.2 Polar Decomposition

Another application of the SVD is the polar decomposition of a matrix. It is well known that every complex number z can be written in the so-called polar form

z = up = pu

where p = |z| is the modulus of z (hence p is nonnegative), and u is a complex number with modulus 1 (therefore uū = ūu = 1). It turns out that this can be generalized to matrices as well, if we replace p by some positive semidefinite matrix, and replace u by some matrix U with either orthonormal columns or orthonormal rows (i.e., U satisfies U^H U = I or UU^H = I). We state the result below.

Theorem 8.7 Let A ∈ F^{m×n}.
(i) Suppose m ≥ n. Then

A = UP    (8.7)

for some U ∈ F^{m×n} whose columns are orthonormal (i.e., U^H U = I_n) and some positive semidefinite P ∈ F^{n×n} with rank(P) = rank(A).
(ii) Suppose m ≤ n. Then

A = QV^H    (8.8)

for some V ∈ F^{n×m} whose columns are orthonormal and some positive semidefinite Q ∈ F^{m×m} with rank(Q) = rank(A).
(iii) Suppose m = n. Then

A = UP = QV^H    (8.9)

for some n×n unitary (or real orthogonal if F = ℝ) U and V, and some positive semidefinite P, Q ∈ F^{n×n} with rank(P) = rank(Q) = rank(A).

Proof: We shall prove only (i); (ii) and (iii) then follow. Suppose m ≥ n. By Theorem 8.2 (in its "economy" form, keeping only the first n columns of U and the leading n×n block of Σ), A = UΣV^H where U ∈ F^{m×n} has orthonormal columns, Σ = diag(σ1, …, σn) has the singular values of A as diagonal entries, and V ∈ F^{n×n} is unitary (or real orthogonal if F = ℝ). Then

A = (UV^H)(VΣV^H)

It is clear that the n×n matrix VΣV^H is positive semidefinite and has rank equal to rank(A). Also, by direct checking, (UV^H)^H(UV^H) = VU^H U V^H = VV^H = I_n, which means the columns of UV^H are orthonormal. Renaming UV^H and VΣV^H as U and P respectively, we have the result of (i).

Note that in Theorem 8.7(iii), the two unitary matrices U and V^H may not be equal, and the two positive semidefinite matrices P and Q may not be equal. However, it is not hard to see that P and Q must be unitarily similar, since their eigenvalues are exactly the singular values of A (exercise). In fact, it is easy to show that, in Theorem 8.7, P is exactly (A^H A)^{1/2}, the unique positive semidefinite square root of A^H A, and Q is (AA^H)^{1/2} (exercise).
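A minimal MATLAB sketch of the square case (8.9), computing the polar factors from the SVD as in the proof:

A = randn(3);
[U,S,V] = svd(A);
R = U*V';                   % orthogonal factor
P = V*S*V';                 % positive semidefinite factor, P = (A'*A)^(1/2)
norm(A - R*P)               % ~0
norm(R'*R - eye(3))         % ~0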

Application 8.2 Logarithmic Strain in Continuum Mechanics

Polar decomposition is frequently used in continuum mechanics. A deformation gradient tensor F ∈ ℝ^{3×3} is defined to describe the deformation of an elastic body,

F = ∂x/∂X, i.e., (Fij) = (∂xi/∂Xj) ∈ ℝ^{3×3}    (8.10)

where X is the position vector of a particle in the undeformed configuration Ω0 of the elastic body, and x is the position vector of the particle in the deformed configuration Ω (see Fig. 8.1).

Fig 8.1 Deformation of an elastic body in continuum mechanics

The polar decomposition of the deformation gradient tensor F is


F = RU (8.11)
where U is called the right stretch tensor and R is an orthogonal matrix, which, in
continuum mechanics, is called the rigid body rotation matrix, and it reflects the rigid
body rotation of the particle during deformation. (8.11) is called the right polar
decomposition of F. Similarly, we have the left polar decomposition of F as
F = VR (8.12)
where V is called the left stretch tensor and R is the same R as in (8.11).

To filter out the rigid body rotation included in the deformation gradient tensor F, a right Cauchy-Green strain tensor C is defined as

C = F^T F    (8.13)

Obviously C is positive definite. The Green-Lagrange strain tensor is defined as

E = (1/2)(C − I) = (1/2)(F^T F − I)    (8.14)

From (8.14) it is clear that under a pure rigid body rotation (F = R), E = 0: no strain is produced in the body.

Substituting (8.11) into (8.13) yields

C = F^T F = (RU)^T(RU) = U^T R^T R U = U^T U = U²

Hence the stretch tensor is related to the right Cauchy-Green strain tensor C by

U = C^{1/2} = (F^T F)^{1/2}

Using the spectral decomposition of U, we have

U = PΛP^T    (8.15)

with

Λ = diag(λ1, λ2, λ3), P = [p1 p2 p3]    (8.16)

where λ1, λ2, λ3 are the square roots of the eigenvalues of C, p1, p2, p3 are the ordered corresponding normalized eigenvectors, and P is orthogonal.

Another strain measure for large deformation often used in continuum mechanics is the logarithmic strain tensor:

ε = ln(U) = P diag(ln λ1, ln λ2, ln λ3) P^T    (8.17)

The rigid body rotation matrix R in the polar decomposition (8.11) is given by

R = FU^{-1}    (8.18)

with the inverse of U easily obtained from (8.15), by using Theorem 7.5, as

U^{-1} = P diag(1/λ1, 1/λ2, 1/λ3) P^T    (8.19)

Similar results can be obtained for the left polar decomposition (8.12) by defining a left Cauchy-Green strain tensor B:

B = FF^T    (8.20)

Example 8.2 Find the polar decomposition of the matrix

F = [√3 −1/√3 0; 0 √(5/3) −√(3/5); 0 0 √(12/5)]

Solution:

C = F^T F = [3 −1 0; −1 2 −1; 0 −1 3]

Since C is positive definite, we may write its eigenvalues as λ² with λ > 0. Then

C − λ²I = [3−λ² −1 0; −1 2−λ² −1; 0 −1 3−λ²]

det(C − λ²I) = 0 ⇔ (λ² − 1)(λ² − 3)(λ² − 4) = 0

The eigenvalues of C are λ1² = 1, λ2² = 3, λ3² = 4.

For λ1² = 1: N(C − I) = Span{(1, 2, 1)^T} = Span{(1/√6, 2/√6, 1/√6)^T}

For λ2² = 3: N(C − 3I) = Span{(−1, 0, 1)^T} = Span{(−1/√2, 0, 1/√2)^T}

For λ3² = 4: N(C − 4I) = Span{(−1, 1, −1)^T} = Span{(−1/√3, 1/√3, −1/√3)^T}

Hence

Λ = diag(λ1, λ2, λ3) = diag(1, √3, 2)

P = [p1 p2 p3] = [1/√6 −1/√2 −1/√3; 2/√6 0 1/√3; 1/√6 1/√2 −1/√3]

U = PΛP^T = [5/6 + √3/2, −1/3, 5/6 − √3/2; −1/3, 4/3, −1/3; 5/6 − √3/2, −1/3, 5/6 + √3/2]

U^{-1} = PΛ^{-1}P^T = [1/3 + 1/(2√3), 1/6, 1/3 − 1/(2√3); 1/6, 5/6, 1/6; 1/3 − 1/(2√3), 1/6, 1/3 + 1/(2√3)]

R = FU^{-1} = [1/2 + 5√3/18, −√3/9, 5√3/18 − 1/2; √15/18 − √15/15 + 1/(2√5), 11√15/45, √15/18 − √15/15 − 1/(2√5); 2/√15 − 1/√5, 1/√15, 2/√15 + 1/√5]

Check: R^T R = I
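A minimal MATLAB sketch checking Example 8.2 numerically:

F = [sqrt(3) -1/sqrt(3) 0; 0 sqrt(5/3) -sqrt(3/5); 0 0 sqrt(12/5)];
C = F'*F;                   % = [3 -1 0; -1 2 -1; 0 -1 3]
[P,L2] = eig(C);            % eigenvalues 1, 3, 4; P orthogonal since C is symmetric
U = P*sqrt(L2)*P';          % right stretch tensor U = C^(1/2)
R = F/U;                    % rotation R = F*inv(U)
norm(R'*R - eye(3))         % ~0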

8.3 Pseudo Inverse

A pseudo inverse or generalized inverse of a matrix A is a matrix that has some properties
of the inverse matrix of A but not necessarily all of them. The term “the pseudo inverse”
commonly means the Moore–Penrose pseudo inverse.

The purpose of constructing a pseudo inverse is to obtain a matrix that can serve as the
inverse in some sense for a wider class of matrices than invertible ones. Typically, the
pseudo inverse exists for an arbitrary matrix, and when a matrix has an inverse, then its
inverse and the generalized inverse are the same.

Recall the definition of inverse matrix. If A ∈ F n×n is nonsingular, then it has a unique
inverse A−1 of the same order which satisfies
A−1 A = AA−1 = I n (8.21)

It is straightforward to deduce from (8.21) that


A−1 AA−1 = A−1 , AA−1 A = A, A−1 A and AA−1 are Hermitian (8.22)
If A is a square but singular matrix, or if A is a nonsquare matrix, then there does not exist
any matrix A−1 that satisfies (8.21). However, there always exists a matrix, denoted by
A− , that satisfies (8.22) when A−1 is replaced by A− . We state the result below.

Theorem 8.8 For any A ∈ F^{m×n}, there exists a unique A^− ∈ F^{n×m} which satisfies the following:

(a) A^−AA^− = A^−
(b) AA^−A = A
(c) (A^−A)^H = A^−A
(d) (AA^−)^H = AA^−    (8.23)

Proof: We prove existence first. Let A = UΣV^H be the singular value decomposition of A, where U ∈ F^{m×m} and V ∈ F^{n×n} are unitary, and Σ ∈ ℝ^{m×n} has (i,i)-th entry σi (i = 1, …, r = rank(A)) and zeros elsewhere, with σ1, …, σr being the r nonzero singular values of A. Let Σ^− ∈ ℝ^{n×m} have (i,i)-th entry 1/σi (i = 1, …, r) and zeros elsewhere. Define

A^− = VΣ^−U^H    (8.24)

Then

A^−A = V diag(1, …, 1, 0, …, 0)V^H (r ones, n − r zeros),
AA^− = U diag(1, …, 1, 0, …, 0)U^H (r ones, m − r zeros)    (8.25)

By direct verification, all conditions of (8.23) hold. Hence the existence of A^− is proved.

For uniqueness, let A^− and A^# both satisfy all conditions of (8.23). By (8.23)(b), we have

AA^# = (AA^−A)A^# = (AA^−)(AA^#)

Then, by (8.23)(d), AA^# and AA^− are Hermitian and hence

AA^# = (AA^#)^H = ((AA^−)(AA^#))^H = (AA^#)^H(AA^−)^H = (AA^#)(AA^−) = (AA^#A)A^− = AA^−

where the last equality follows from (8.23)(b). Similarly, one can show that A^#A = A^−A (exercise). It then follows that

A^− = A^−AA^− = A^−(AA^#) = (A^−A)A^# = (A^#A)A^# = A^#AA^# = A^#

where the first and the last equalities follow from (8.23)(a). The proof is now complete.
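A minimal MATLAB sketch of formula (8.24), compared against the built-in pinv:

A = [1 1; 1 1; 0 0];        % rank 1 (Example 8.1)
[U,S,V] = svd(A);
Sminus = zeros(size(A'));   % n-by-m
Sminus(1,1) = 1/S(1,1);     % invert only the nonzero singular value
Aminus = V*Sminus*U';       % A^- as in (8.24)
norm(Aminus - pinv(A))      % ~0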

The matrix A^− in Theorem 8.8 is called the pseudo-inverse, or the Moore-Penrose inverse, or the generalized inverse of A. Its explicit formula, in terms of the singular value decomposition, is given in (8.24). Clearly, if A is a square, nonsingular matrix then A^{-1} = A^−. Also, if A ∈ F^{m×n} has full row (or column, respectively) rank then A^− is the unique right (or left, respectively) inverse of A that satisfies both AA^− = I_m and A^−A Hermitian (or A^−A = I_n and AA^− Hermitian, respectively) (exercise).

Now consider the linear system Ax = b, where A ∈ F^{m×n} and b ∈ F^m. Let ‖·‖_E denote the Euclidean norm. In Chapter 6, we have shown that:

(a) (Theorem 6.11) if Ax = b is a consistent system and A has full row rank, then x̂ := A^H(AA^H)^{-1}b is the unique vector that satisfies (i) Ax̂ = b and (ii) ‖x̂‖_E < ‖x‖_E for all x which satisfy Ax = b and x ≠ x̂;

(b) (Theorem 6.13) if A has full column rank, then x̂ := (A^H A)^{-1}A^H b is the unique vector that satisfies ‖Ax̂ − b‖_E < ‖Ax − b‖_E for all x ∈ F^n such that x ≠ x̂.

However, these results are somewhat restrictive because they cannot be applied if
rank ( A) < min(m, n) . It turns out that, by using the pseudo-inverse, a stronger result

(Theorem 8.10) can be derived and it covers both results of (a), (b) above. The following
lemma is needed to prove this stronger result.

Lemma 8.9 Let R(A) denote the column space of any matrix A. Then:

R(A^−) = R(A^H)    (8.26)

Proof: For any A ∈ F^{m×n}, we have

A^− = A^−AA^− = (A^−A)^H A^− = A^H(A^−)^H A^−

Therefore, for any x ∈ F^m,

A^−x = A^H((A^−)^H A^−x) ∈ R(A^H) ⇒ R(A^−) ⊆ R(A^H)

On the other hand, since

A^H = (AA^−A)^H = (A^−A)^H A^H = A^−AA^H

we have, for any x ∈ F^m,

A^H x = A^−(AA^H x) ∈ R(A^−) ⇒ R(A^H) ⊆ R(A^−)

Combining, we have the result.

Theorem 8.10 Let A ∈ F^{m×n} and b ∈ F^m. Then the vector x̂ := A^−b is the unique vector in F^n that satisfies
(i) ‖Ax̂ − b‖_E ≤ ‖Ax − b‖_E for all x ∈ F^n, and
(ii) ‖x̂‖_E < ‖x‖_E for all x ∈ F^n which satisfy ‖Ax − b‖_E = ‖Ax̂ − b‖_E and x ≠ x̂.

(In other words, x̂ := A^−b is the unique least squares solution of the system Ax = b that has the smallest Euclidean norm.)

Proof: For any x ∈ F^n, write

Ax − b = (Ax − Ax̂) + (Ax̂ − b)

Notice that

A^H(Ax̂ − b) = A^H AA^−b − A^H b = A^H(AA^−)^H b − A^H b = (AA^−A)^H b − A^H b = A^H b − A^H b = 0

Hence Ax̂ − b ∈ N(A^H), the null space of A^H. Since

Ax − Ax̂ = A(x − x̂) ∈ R(A) = N(A^H)^⊥

we see that

‖Ax − b‖²_E = ‖Ax − Ax̂‖²_E + ‖Ax̂ − b‖²_E

Thus

‖Ax̂ − b‖_E ≤ ‖Ax − b‖_E

with equality iff Ax − Ax̂ = 0, i.e., x − x̂ ∈ N(A). This proves (i).

Now suppose x ∈ F^n is such that ‖Ax − b‖_E = ‖Ax̂ − b‖_E and x ≠ x̂. Then, from the above, 0 ≠ x − x̂ ∈ N(A). Since, by the preceding Lemma 8.9,

x̂ = A^−b ∈ R(A^−) = R(A^H) = N(A)^⊥

and x = (x − x̂) + x̂, we have

‖x‖²_E = ‖x − x̂‖²_E + ‖x̂‖²_E > ‖x̂‖²_E

This proves (ii).
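A minimal MATLAB sketch of Theorem 8.10 (b chosen arbitrarily for illustration):

A = [1 1; 1 1; 0 0]; b = [1; 0; 1];
x = pinv(A)*b;              % = A^- b = [0.25; 0.25], the minimum-norm least squares solution
norm(A*x - b)               % minimal residual
% Note: A\b also minimizes the residual, but for rank-deficient A it returns
% a basic solution that need not have minimum norm.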

Example 8.3 According to Fig. 8.2 (a least squares problem), the position of the uppermost joint of the model is

(x, y) = (cos a + cos b + cos c, sin a + sin b + sin c)

By first order approximation, we have

Δx ≈ −sin a Δa − sin b Δb − sin c Δc
Δy ≈ cos a Δa + cos b Δb + cos c Δc

Supposing at some instant a = 45°, b = 30°, c = 45°, Δx = 0.02, Δy = −0.01, we obtain

0.02 ≈ −0.707Δa − 0.500Δb − 0.707Δc
−0.01 ≈ 0.707Δa + 0.866Δb + 0.707Δc

and the uppermost joint is now moving from (x, y) to (x + 0.02, y − 0.01).

Alternatively, we can represent the equations as

Δx = AΔθ

where

Δx = [0.02; −0.01] and A = [−0.707 −0.500 −0.707; 0.707 0.866 0.707]

Now, we want to find a pseudoinverse A^− such that Δθ = A^−Δx. Since A has full row rank, A^− = A^T(AA^T)^{-1}, and the minimum-norm least squares solution Δφ of this underdetermined system is

Δφ = A^T(AA^T)^{-1}Δx = A^−Δx

Δφ ≈ [−0.707 0.707; −0.500 0.866; −0.707 0.707] ([−0.707 −0.500 −0.707; 0.707 0.866 0.707][−0.707 0.707; −0.500 0.866; −0.707 0.707])^{-1} [0.02; −0.01]

= [−0.707 0.707; −0.500 0.866; −0.707 0.707] [1.250 −1.433; −1.433 1.750]^{-1} [0.02; −0.01]

= [−0.707 0.707; −0.500 0.866; −0.707 0.707] [13.06 10.69; 10.69 9.328] [0.02; −0.01]

= [−0.707 0.707; −0.500 0.866; −0.707 0.707] [0.1543; 0.1205] = [−0.0239; 0.0272; −0.0239] (in radians)

Hence the least squares solution is

(a + Δa, b + Δb, c + Δc) ≈ (43.63°, 31.56°, 43.63°).

8.4 The Jordan Form

Recall that for a square matrix A which is similar to another matrix B, we have

A = SBS^{-1}    (8.27)

for some invertible S. Then A^k = SB^kS^{-1} for all nonnegative integers k, and hence p(A) = Sp(B)S^{-1} for any polynomial p(z) = α0 + α1z + ⋯ + αkz^k in the indeterminate z. If B has a simple form, so that B^k is easy to compute, then p(A) can be obtained easily. This is the case when A is diagonalizable, i.e., when the n×n matrix A has n linearly independent eigenvectors, so that we may choose S to have these linearly independent eigenvectors as columns, and B to be the diagonal matrix having the corresponding eigenvalues as diagonal entries. However, not all square matrices are diagonalizable. For example, consider the following matrix

A = [5 4 2 1; 0 1 −1 −1; −1 −1 3 0; 1 1 −1 2]

Including multiplicity, the eigenvalues of A are λ = 1, 2, 4, 4. The dimension of the kernel of (A − 4I) is 1, so A is not diagonalizable. Still, we would like to find a B in (8.27) which is "simple" enough, even if it is not diagonal. For this case,

B = J = [1 0 0 0; 0 2 0 0; 0 0 4 1; 0 0 0 4]

which together with an invertible matrix S makes A = SJS^{-1}. The matrix J is almost diagonal and is the Jordan normal form of A. The major task of this section is to fill in the theoretical and computational details of this example.
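If the Symbolic Math Toolbox is available, MATLAB's jordan computes this decomposition (a minimal sketch):

A = [5 4 2 1; 0 1 -1 -1; -1 -1 3 0; 1 1 -1 2];
[S,J] = jordan(A);          % J has blocks for 1, 2, and [4 1; 0 4], up to block ordering
norm(A - S*J/S)             % ~0, since A = S*J*inv(S)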

Recall that, by the Schur triangularization theorem, any complex square matrix A is unitarily similar to an upper triangular matrix, i.e.,

A = UTU^H

for some unitary U and upper triangular T, and the eigenvalues of A can be placed in any order on the diagonal of T. In addition, if A and all its eigenvalues are real then U can be chosen to be real orthogonal and T to be real. Although upper triangular matrices have some nice structure (e.g., if T has diagonal entries λi, then T^k is also upper triangular and has diagonal entries λi^k), it is still difficult to compute their k-th powers. Moreover, the upper triangular matrix T in this factorization may not be unique for the same matrix A. Nevertheless, the factorization can be considered a first step towards finding a "simple" B that is similar to A. In fact, our proof of the Jordan form theorem starts with the result of the Schur triangularization theorem. We first prove an auxiliary result.

Lemma 8.11 Let A ∈ F^{n×n} and B ∈ F^{m×m} have no eigenvalue in common. If X ∈ F^{m×n} satisfies XA = BX, then X = 0.

Proof: Assume λ1, …, λn are the eigenvalues of A and η1, …, ηm are the eigenvalues of B, with λi ≠ ηj for all i = 1, …, n and j = 1, …, m. The characteristic polynomial of A is p(λ) = (λ − λ1)⋯(λ − λn), so by the Cayley-Hamilton theorem

p(A) = (A − λ1I)⋯(A − λnI) = A^n + γ1A^{n−1} + ⋯ + γnI = 0    (8.28)

From XA = BX, we have

Xp(A) = XA^n + γ1XA^{n−1} + ⋯ + γnX = (XA)A^{n−1} + γ1(XA)A^{n−2} + ⋯ + γnX
      = BXA^{n−1} + γ1BXA^{n−2} + ⋯ + γnX = ⋯ = (B^n + γ1B^{n−1} + ⋯ + γnI)X = p(B)X    (8.29)

or

0 = Xp(A) = p(B)X    (8.30)

On the other hand, by Schur's triangularization theorem, there exists unitary U such that T := U^H B U is in upper triangular form,

T = U^H B U = [η1 … *; ⋱; 0 ηm]    (8.31)

and

p(B) = Up(T)U^H    (8.32)

with

p(T) = (T − λ1I)⋯(T − λnI) = [η1−λ1 … *; ⋱; 0 ηm−λ1] ⋯ [η1−λn … *; ⋱; 0 ηm−λn]    (8.33)

Since λi ≠ ηj for all i, j, we see from (8.33) that every factor T − λiI (i = 1, …, n) is nonsingular. Hence p(T), and thus p(B), is nonsingular. Then from (8.30),

0 = Xp(A) = p(B)X with p(B) nonsingular ⇒ X = 0

Lemma 8.12 Let A ∈ ℂ^{n×n} have eigenvalues λ1, …, λk with algebraic multiplicities n1, …, nk respectively, where λ1, …, λk are distinct and n1 + ⋯ + nk = n. Then

A = S diag(T1, T2, …, Tk) S^{-1}    (8.34)

for some nonsingular S ∈ ℂ^{n×n} and upper triangular Ti ∈ ℂ^{ni×ni}, where all diagonal entries of Ti are λi (i = 1, …, k), and diag(T1, …, Tk) denotes the block diagonal matrix with blocks Ti. Moreover, if A and all its eigenvalues are real, then S, T1, …, Tk can be chosen to be real.

Proof: We prove by using mathematical induction on k. If k = 1 the result follows from


the Schur triangularization theorem. Suppose the result holds for any matrix with k0

distinct eigenvalues, and A is an n × n matrix with k0 + 1 distinct eigenvalues

λ1 , , λk +1 of algebraic multiplicities n1 , , nk +1 , respectively. We first triangularize A by


0 0

using a unitary matrix U so that

A = UTU H
with T being an upper triangular matrix with diagonal entries
n1 n2 nk0 +1

λ1 , , λ1 , λ2 , , λ2 , , λk0 +1 , , λk0 +1 .

Write p = n1 + + nk0 , q = nk0 +1 , and

⎡T (1) Y ⎤
T =⎢ (2) ⎥
⎣ 0 T ⎦

where T^(1), T^(2), Y have orders p × p, q × q, p × q, respectively. Note that T^(1), T^(2) are upper triangular, and all eigenvalues of T^(2) (which are equal to λ_{k_0+1}) are distinct from those of T^(1). We claim that there exists X ∈ ℂ^{p×q} such that

⎡ I_p  X  ⎤ ⎡ T^(1)   0    ⎤   ⎡ T^(1)   Y    ⎤ ⎡ I_p  X  ⎤
⎢         ⎥ ⎢              ⎥ = ⎢              ⎥ ⎢         ⎥        (8.35)
⎣  0  I_q ⎦ ⎣   0    T^(2) ⎦   ⎣   0    T^(2) ⎦ ⎣  0  I_q ⎦

If this is true, then

        ⎡ T^(1)   0    ⎤
T = S_a ⎢              ⎥ S_a^{−1}
        ⎣   0    T^(2) ⎦

where S_a is the invertible matrix on the left of (8.35). As T^(1) has only k_0 distinct eigenvalues λ_1, …, λ_{k_0}, by the induction assumption there exists a p × p invertible S_b such that

            ⎡ T_1              ⎤
T^(1) = S_b ⎢       ⋱         ⎥ S_b^{−1}
            ⎣          T_{k_0} ⎦

where the T_i are n_i × n_i upper triangular matrices with all diagonal entries equal to λ_i (i = 1, …, k_0). It then follows that (8.34) holds for

          ⎡ S_b   0  ⎤
S = U S_a ⎢          ⎥   and   T_{k_0+1} = T^(2).
          ⎣  0   I_q ⎦

To prove our claim, first observe that (8.35) holds iff

X T^(2) = T^(1) X + Y

which is equivalent to

X T^(2) − T^(1) X = Y        (8.36)

Define φ : ℂ^{p×q} → ℂ^{p×q} by φ(X) = X T^(2) − T^(1) X. It is easy to check that φ is a linear transformation (exercise). If φ(X) = 0, then X T^(2) = T^(1) X and, by Lemma 8.11 and the fact that T^(1) and T^(2) have no eigenvalue in common, X = 0. This proves that the kernel of φ is {0}. As a result,

dim φ(ℂ^{p×q}) = dim(ℂ^{p×q}) − dim(ker φ) = dim(ℂ^{p×q})

and hence φ(ℂ^{p×q}) = ℂ^{p×q}, i.e., φ is surjective (onto). Accordingly, there exists X ∈ ℂ^{p×q} such that Y = φ(X) = X T^(2) − T^(1) X, and thus (8.36) holds for this X. Our claim is proved.

Finally, if A and all its eigenvalues are real, then the matrices U, S_a, S_b above can be chosen to be real, so that the matrices S, T_1, …, T_k in (8.34) are real.
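Equation (8.36) is a Sylvester equation, and the block-diagonalization step (8.35) can also be carried out numerically. A minimal sketch (ours), assuming scipy is available: scipy.linalg.solve_sylvester solves AX + XB = Q, so we pass −T^(1) and T^(2); the block values below are illustrative only.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Illustrative upper triangular blocks with disjoint spectra.
T1 = np.array([[1., 5.],
               [0., 2.]])      # eigenvalues 1, 2
T2 = np.array([[4.]])          # eigenvalue 4
Y  = np.array([[3.],
               [7.]])

# Solve X T2 - T1 X = Y, written as (-T1) X + X T2 = Y.
X = solve_sylvester(-T1, T2, Y)

# Verify the similarity (8.35): Sa^{-1} T Sa is block diagonal.
Sa = np.block([[np.eye(2), X], [np.zeros((1, 2)), np.eye(1)]])
T  = np.block([[T1, Y], [np.zeros((1, 2)), T2]])
print(np.round(np.linalg.inv(Sa) @ T @ Sa, 10))   # diag(T1, T2)
```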

Definition 8.3 Let λ ∈ ℂ and n be a positive integer. The Jordan block J_n(λ) is the n × n matrix

         ⎛ λ  1         ⎞
         ⎜    λ  ⋱      ⎟
J_n(λ) = ⎜       ⋱   1  ⎟
         ⎝           λ  ⎠

with λ on the diagonal, 1 on the superdiagonal, and zero elsewhere. A Jordan matrix is a direct sum of Jordan blocks, of the form J_{n_1}(λ_1) ⊕ ⋯ ⊕ J_{n_k}(λ_k) (here λ_1, …, λ_k need not be distinct).

Notice that J_1(λ) is just the 1 × 1 matrix (λ), and a diagonal matrix is just a direct sum of 1 × 1 Jordan blocks. We shall prove later that every complex square matrix is similar to a Jordan matrix (i.e., to a direct sum of Jordan blocks). The Jordan matrix, which has a much simpler structure than a general upper triangular matrix, is then the "simple" matrix B we set out to find at the beginning of this chapter. We need the following lemmas to prove this theorem. Recall that a permutation matrix is a square matrix in which every row and every column has exactly one nonzero entry, whose value is 1.
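Jordan blocks and Jordan matrices are easy to build and experiment with. A small sketch (ours, assuming numpy and scipy; the helper name jordan_block is our own):

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(lam, n):
    """n x n Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)

# The Jordan matrix J_2(3) + J_1(3) + J_2(-1) (direct sum);
# note that the eigenvalues of the blocks need not be distinct.
J = block_diag(jordan_block(3, 2), jordan_block(3, 1), jordan_block(-1, 2))
print(J)
```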

Lemma 8.13 Let A_i ∈ ℂ^{n_i×n_i} for i = 1, …, k and let π be a permutation on {1, …, k}. Then A_1 ⊕ A_2 ⊕ ⋯ ⊕ A_k is permutationally similar to A_{π(1)} ⊕ A_{π(2)} ⊕ ⋯ ⊕ A_{π(k)}, i.e., there exists a permutation matrix P such that

A_1 ⊕ A_2 ⊕ ⋯ ⊕ A_k = P (A_{π(1)} ⊕ A_{π(2)} ⊕ ⋯ ⊕ A_{π(k)}) P^{−1}        (8.37)

Proof: Exercise.

Lemma 8.14
(i) Let k, l be positive integers. Then

nullity of J_k(0)^l = min(k, l)

In particular, J_k(0)^l = 0 if l ≥ k.
(ii) Let k ≥ 2 and {e_1, …, e_k} be the standard basis of F^k. Then

J_k(0) e_{i+1} = e_i   for i = 1, …, k − 1

Proof: Exercise.
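Part (i) is easy to check numerically. A quick sketch (ours, assuming numpy; jordan_block is the hypothetical helper defined above, repeated here so the snippet is self-contained):

```python
import numpy as np

def jordan_block(lam, n):
    return lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)

k = 5
N = jordan_block(0, k)                      # the nilpotent block J_5(0)
for l in range(1, 8):
    Nl = np.linalg.matrix_power(N, l)
    nullity = k - np.linalg.matrix_rank(Nl)
    print(l, nullity, min(k, l))            # nullity equals min(k, l)
```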

Lemma 8.15 Let A ∈ ℂ^{n×n} be such that all its eigenvalues are λ. Then there exist a nonsingular matrix S ∈ ℂ^{n×n} and positive integers n_1, …, n_k such that n_1 + ⋯ + n_k = n and

      ⎡ J_{n_1}(λ)                           ⎤
      ⎢            J_{n_2}(λ)                ⎥
A = S ⎢                        ⋱            ⎥ S^{−1}        (8.38)
      ⎣                           J_{n_k}(λ) ⎦

Moreover, if A and λ are real then S can be chosen to be real.

Proof: We may assume that λ = 0. For if λ ≠ 0 then we may consider Ã = A − λI, the eigenvalues of which are all zero. If

Ã = S (J_{n_1}(0) ⊕ J_{n_2}(0) ⊕ ⋯ ⊕ J_{n_k}(0)) S^{−1}

then

A = Ã + λI = S (J_{n_1}(λ) ⊕ J_{n_2}(λ) ⊕ ⋯ ⊕ J_{n_k}(λ)) S^{−1}

and the result follows. We shall prove the case λ = 0 by mathematical induction on n.

If n = 1 then A = (0) = J_1(0) and the result is trivial.

Suppose the lemma holds for all cases n < N, where N > 1 is a fixed integer, and let A ∈ ℂ^{N×N} be such that all its eigenvalues are zero. If A = 0 (the zero matrix), then

A = J_1(0) ⊕ J_1(0) ⊕ ⋯ ⊕ J_1(0)

and the result follows. Therefore we assume A ≠ 0, so that dim R(A) ≥ 1. Since 0 is an eigenvalue of A, we have dim ker(A) ≥ 1, and hence dim R(A) < N. As it is always true that A(R(A)) ⊆ R(A), the function φ : R(A) → R(A), defined by φ(x) = Ax for all x ∈ R(A), is a linear transformation on the subspace R(A) with 1 ≤ dim R(A) < N. Observe that any eigenvalue of φ (i.e., a scalar λ satisfying φ(x) = λx for some nonzero x ∈ R(A)) is also an eigenvalue of A, and hence must be zero. Thus, by the induction assumption, there exists an ordered basis B of R(A) with respect to which the matrix representation of φ is of the form

J_{m_1}(0) ⊕ ⋯ ⊕ J_{m_l}(0)

If we denote the m_1-th vector of B by w_1 then, from the m_1-th column of the above matrix representation, we see that Aw_1 is the (m_1 − 1)-th vector in B. With similar arguments, we conclude that

B = { A^{m_1−1}w_1, A^{m_1−2}w_1, …, Aw_1, w_1 (m_1 terms),  A^{m_2−1}w_2, …, Aw_2, w_2 (m_2 terms),  …,  A^{m_l−1}w_l, …, Aw_l, w_l (m_l terms) }

for some w_1, …, w_l ∈ R(A), with A^{m_i} w_i = 0 for all i = 1, …, l. Let v_i ∈ ℂ^N be such that Av_i = w_i (thus A^j w_i = A^{j+1} v_i for i = 1, …, l; we use the convention that A^0 = I_N). Consider the set

E := { A^j v_i : i = 1, …, l; j = 0, 1, …, m_i } = B ∪ {v_1, …, v_l}

We claim that E is linearly independent. For if

z := ∑_{i=1}^{l} ∑_{j=0}^{m_i} α_{i,j} A^j v_i

is the zero vector, then

0 = Az = ∑_{i=1}^{l} ∑_{j=0}^{m_i−1} α_{i,j} A^j w_i

(the terms with j = m_i vanish because A^{m_i} w_i = 0). Since these A^j w_i are precisely the vectors of the basis B, this implies that

α_{i,j} = 0 for all 1 ≤ i ≤ l and 0 ≤ j ≤ m_i − 1

Thus

0 = z = ∑_{i=1}^{l} α_{i,m_i} A^{m_i} v_i = ∑_{i=1}^{l} α_{i,m_i} A^{m_i−1} w_i

which in turn implies that

α_{i,m_i} = 0 for all 1 ≤ i ≤ l

Hence E is linearly independent.

Now for any x ∈ ℂ^N: since Ax ∈ R(A) and B = { A^{j+1} v_i : i = 1, …, l; j = 0, 1, …, m_i − 1 } is a basis for R(A), we have

Ax = ∑_{i=1}^{l} ∑_{j=0}^{m_i−1} α_{i,j} A^{j+1} v_i = Ay

where

y = ∑_{i=1}^{l} ∑_{j=0}^{m_i−1} α_{i,j} A^j v_i ∈ span(E)

and hence

A(x − y) = 0,   i.e.,   (x − y) ∈ ker(A)

From the identity x = y + (x − y), we conclude that

ℂ^N ⊆ span(E) + ker(A)

(note that this may not be a direct sum). Since

E ⊆ ℂ^N ⊆ span(E) + ker(A)

it follows that (why?) one can find linearly independent vectors u_1, …, u_h ∈ ker(A) such that

ℂ^N = span(E) ⊕ span{u_1, …, u_h}

As E is linearly independent, E ∪ {u_1, …, u_h} is a basis of ℂ^N. Arranging the vectors of this basis in the order

A^{m_1}v_1, A^{m_1−1}v_1, …, Av_1, v_1 (m_1 + 1 terms),  A^{m_2}v_2, …, Av_2, v_2 (m_2 + 1 terms),  …,  A^{m_l}v_l, …, Av_l, v_l (m_l + 1 terms),  u_1, …, u_h

and letting these vectors form the columns of a (necessarily nonsingular) matrix S ∈ ℂ^{N×N}, we have

S^{−1}AS = J_{m_1+1}(0) ⊕ ⋯ ⊕ J_{m_l+1}(0) ⊕ J_1(0) ⊕ ⋯ ⊕ J_1(0)   (h terms of J_1(0))

If A and λ are real in the first place, then we can replace every occurrence of ℂ by ℝ in the above, and hence S is real also. This completes the proof.

Theorem 8.16 (Jordan Form) For A ∈ ℂ^{n×n}, there exist a nonsingular matrix S ∈ ℂ^{n×n} and positive integers m_1, …, m_r such that m_1 + ⋯ + m_r = n and A = SJS^{−1}, where

J = J_{m_1}(λ_1) ⊕ ⋯ ⊕ J_{m_r}(λ_r)        (8.39)

Here λ_1, …, λ_r (not necessarily distinct) are the eigenvalues of A. The Jordan matrix J is unique up to permutations of the diagonal Jordan blocks. Moreover, if A and all its eigenvalues are real, then S can be chosen to be real.

Proof: By Lemma 8.12, there exists a nonsingular S_0 such that

A = S_0 (T_1 ⊕ ⋯ ⊕ T_k) S_0^{−1}

where the T_i are upper triangular matrices whose diagonal entries are all equal to an eigenvalue λ_i of A (i = 1, …, k). For each i = 1, …, k, by Lemma 8.15, there exists an invertible S_i such that

T_i = S_i (J_{n_{i1}}(λ_i) ⊕ ⋯ ⊕ J_{n_{i,r_i}}(λ_i)) S_i^{−1}

By taking

S = S_0 (S_1 ⊕ ⋯ ⊕ S_k)

we have A = SJS^{−1} where

J = (J_{n_{11}}(λ_1) ⊕ ⋯ ⊕ J_{n_{1,r_1}}(λ_1)) ⊕ (J_{n_{21}}(λ_2) ⊕ ⋯ ⊕ J_{n_{2,r_2}}(λ_2)) ⊕ ⋯ ⊕ (J_{n_{k1}}(λ_k) ⊕ ⋯ ⊕ J_{n_{k,r_k}}(λ_k))
If A and all λ_1, …, λ_r are real, then S_0, S_1, …, S_k can be chosen to be real, so that S is real. It remains to prove that the matrix J is unique up to permutations of its diagonal Jordan blocks. Let J be the Jordan matrix J_{m_1}(λ_1) ⊕ ⋯ ⊕ J_{m_r}(λ_r), and let λ ∈ ℂ. Then for any positive integer l,

(J − λI)^l = J_{m_1}(λ_1 − λ)^l ⊕ ⋯ ⊕ J_{m_r}(λ_r − λ)^l

and hence, since the blocks with λ_i ≠ λ are nonsingular while the blocks with λ_i = λ are covered by Lemma 8.14,

nullity of (J − λI)^l = ∑_{i=1}^{r} nullity of J_{m_i}(λ_i − λ)^l = ∑_{i ∈ {1,…,r}, λ_i = λ} nullity of J_{m_i}(0)^l = ∑_{i ∈ {1,…,r}, λ_i = λ} min(m_i, l)

Now suppose A is similar to J and to another Jordan matrix

J̃ = J_{k_1}(μ_1) ⊕ ⋯ ⊕ J_{k_s}(μ_s)

Then J and J̃ are similar to each other, and hence (J − λI)^l and (J̃ − λI)^l are similar to each other for any λ ∈ ℂ and any positive integer l. Consequently, (J − λI)^l and (J̃ − λI)^l have the same nullity, which means

∑_{i ∈ {1,…,r}, λ_i = λ} min(m_i, l) = ∑_{j ∈ {1,…,s}, μ_j = λ} min(k_j, l)

for any λ ∈ ℂ and any positive integer l. From this it can be deduced easily that the diagonal Jordan blocks of J̃ are a permutation of those of J.

The matrix J in the above theorem is called the Jordan form of A. Suppose we choose some ordering of the distinct eigenvalues of A, say λ_1, …, λ_t. The diagonal Jordan blocks of J can then be ordered so that the first blocks are those having λ_1 as eigenvalue, arranged in decreasing block size; the next blocks are those having λ_2 as eigenvalue, again arranged in decreasing block size; and so on. With this convention, the representation of J is unique once the ordering of the distinct eigenvalues of A is fixed.

From Theorem 8.16, we can deduce the following properties (a computational sketch follows the list):

(1) Counting multiplicity, the eigenvalues of J, and therefore of A, are its diagonal entries.
(2) Given an eigenvalue λ_i, its geometric multiplicity is the dimension of ker(A − λ_i I), and it equals the number of Jordan blocks corresponding to λ_i.
(3) The sum of the sizes of all Jordan blocks corresponding to an eigenvalue λ_i is its algebraic multiplicity.
(4) A is diagonalizable if and only if, for every eigenvalue λ of A, its geometric and algebraic multiplicities coincide.
(5) Notice that the Jordan block corresponding to λ is of the form λI + N, where N is the nilpotent matrix defined by N_{ij} = δ_{i,j−1} (δ being the Kronecker delta).
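As in the uniqueness proof, the whole block structure at an eigenvalue λ can be recovered from the nullities of powers of A − λI: the number of blocks of size ≥ l equals nullity((A − λI)^l) − nullity((A − λI)^{l−1}), by the min(m_i, l) formula. A didactic sketch (ours, assuming numpy; floating-point rank decisions rest on a tolerance, so this is illustrative only):

```python
import numpy as np

def jordan_block_sizes(A, lam, tol=1e-8):
    """Counts of Jordan blocks of each size for the eigenvalue lam of A.

    Uses #(blocks of size >= l) = nullity((A - lam*I)^l) - nullity((A - lam*I)^(l-1)).
    Returns a list c with c[l-1] = number of blocks of size exactly l.
    """
    n = A.shape[0]
    M = A - lam * np.eye(n)
    nullity = lambda B: n - np.linalg.matrix_rank(B, tol=tol)
    ge, prev, P = [], 0, np.eye(n)
    for l in range(1, n + 1):
        P = P @ M
        nl = nullity(P)
        if nl == prev:          # no blocks of size >= l remain
            break
        ge.append(nl - prev)    # number of blocks of size >= l
        prev = nl
    ge.append(0)
    return [ge[i] - ge[i + 1] for i in range(len(ge) - 1)]

# For the matrix A of Example 8.4 below, jordan_block_sizes(A, 4) gives [0, 1]:
# no 1x1 block and exactly one 2x2 block at the eigenvalue 4.
```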

Note: The Jordan form of a matrix is a conceptual tool and is never used in numerical computation, because there is no numerically stable method to compute the Jordan form of a matrix. To demonstrate this, consider the matrices

       ⎛ 0  ε ⎞
A(ε) = ⎜      ⎟
       ⎝ 0  0 ⎠

where ε ∈ ℝ. Clearly, A(ε) → A(0) as ε → 0. The Jordan form of A(ε) is

⎛ 0  1 ⎞
⎜      ⎟
⎝ 0  0 ⎠

for ε ≠ 0 (via the similarity matrix S = diag(ε, 1)), while the Jordan form of A(0) is the zero matrix. Hence the Jordan form of A(ε) does not converge to the Jordan form of A(0) as ε approaches 0. Nevertheless, the Jordan form is still a very useful tool for analyzing properties of square matrices.
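The discontinuity is easy to reproduce in exact arithmetic. A sketch (ours, assuming sympy, whose Matrix.jordan_form() returns a pair (P, J) with A = PJP^{−1}):

```python
import sympy as sp

eps = sp.Rational(1, 10**12)            # tiny but nonzero
A_eps = sp.Matrix([[0, eps], [0, 0]])
A_0   = sp.Matrix([[0, 0], [0, 0]])

P1, J1 = A_eps.jordan_form()
P2, J2 = A_0.jordan_form()
print(J1)   # Matrix([[0, 1], [0, 0]]) -- one 2x2 block, however small eps is
print(J2)   # Matrix([[0, 0], [0, 0]]) -- two 1x1 blocks
```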

Example 8.4 We return to the example given at the beginning of this section. For the matrix

    ⎛ 5   4   2   1 ⎞
    ⎜ 0   1  −1  −1 ⎟
A = ⎜ −1 −1   3   0 ⎟
    ⎝ 1   1  −1   2 ⎠

find its Jordan form and S.

Solution: The characteristic polynomial of A is

det(A − λI) = (λ − 1)(λ − 2)(λ − 4)^2

so the eigenvalues of A are λ_1 = 1, λ_2 = 2, λ_3 = λ_4 = 4. Let us first check the geometric multiplicities.

For λ_1 = 1:   N(A − I) = span{ (1, −1, 0, 0)^T }

For λ_2 = 2:   N(A − 2I) = span{ (1, −1, 0, 1)^T }

For λ_3 = λ_4 = 4:   N(A − 4I) = span{ (1, 0, −1, 1)^T }

Since the nullity of (A − 4I) is 1 while the algebraic multiplicity of the eigenvalue 4 is 2, the eigenvalue 4 carries a single 2 × 2 Jordan block, and the Jordan form of A is

    ⎛ 1  0  0  0 ⎞
    ⎜ 0  2  0  0 ⎟
J = ⎜ 0  0  4  1 ⎟
    ⎝ 0  0  0  4 ⎠

Suppose S = [v_1  v_2  v_3  v_4] and AS = SJ. Since

AS = [Av_1  Av_2  Av_3  Av_4]

and

SJ = [v_1  2v_2  4v_3  v_3 + 4v_4]

this is equivalent to

Av_1 = v_1          ⇒ (A − I)v_1 = 0
Av_2 = 2v_2         ⇒ (A − 2I)v_2 = 0
Av_3 = 4v_3         ⇒ (A − 4I)v_3 = 0
Av_4 = v_3 + 4v_4   ⇒ (A − 4I)v_4 = v_3

For i = 1, 2, 3, we have v_i ∈ N(A − λ_i I), i.e., v_i is an eigenvector of A corresponding to the eigenvalue λ_i. Notice that for i = 4, multiplying both sides by A − 4I gives

(A − 4I)^2 v_4 = (A − 4I)v_3

But (A − 4I)v_3 = 0, so

(A − 4I)^2 v_4 = 0

Thus v_4 ∈ N[(A − 4I)^2] but v_4 ∉ N[(A − 4I)], and we can pick v_4 = (1, 0, 0, 0)^T, which indeed satisfies (A − 4I)v_4 = v_3. Vectors such as v_4 are called generalized eigenvectors of A. Hence,

                           ⎡ 1   1   1   1 ⎤
                           ⎢ −1 −1   0   0 ⎥
S = [v_1  v_2  v_3  v_4] = ⎢ 0   0  −1   0 ⎥
                           ⎣ 0   1   1   0 ⎦

Check: A = SJS^{−1}.
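As a sanity check (ours, assuming sympy), one can verify A = SJS^{−1} and let sympy recompute the Jordan form directly:

```python
import sympy as sp

A = sp.Matrix([[5, 4, 2, 1],
               [0, 1, -1, -1],
               [-1, -1, 3, 0],
               [1, 1, -1, 2]])
S = sp.Matrix([[1, 1, 1, 1],
               [-1, -1, 0, 0],
               [0, 0, -1, 0],
               [0, 1, 1, 0]])
J = sp.diag(1, 2, sp.Matrix([[4, 1], [0, 4]]))

print(A == S * J * S.inv())   # True: the S and J found above satisfy A = S J S^{-1}

P, J2 = A.jordan_form()       # sympy's transition matrix P may differ from S,
print(J2)                     # but J2 equals J up to the ordering of the blocks
```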

Exercises

1. Prove the following:
   (a) (A^−)^− = A
   (b) (A^−)^H = (A^H)^−
2. Let A ∈ ℂ^{n×n} and suppose A^2 = I.
   (a) Find all the possible eigenvalues of A.
   (b) Show that A is diagonalizable. (Hint: consider the Jordan form of A.)
   (c) If A has size 2 × 2, find all the possible Jordan forms of A.
3. Show that
   (a) A^− = A^H (AA^H)^{−1} if A has full row rank.
   (b) A^− = (A^H A)^{−1} A^H if A has full column rank.

4. Given

   A = ⎛ 1  0  0  0  2 ⎞
       ⎜ 0  0  3  0  0 ⎟
       ⎜ 0  0  0  0  0 ⎟
       ⎝ 0  4  0  0  0 ⎠

   find its SVD.

5. Find the SVD of

   A = ⎛ 0  1 ⎞
       ⎜ 1  0 ⎟
       ⎝ 1  1 ⎠
6. Prove that the rank of a matrix is equal to the number of nonzero singular values of
the matrix.
7. Given

   A = ⎛ 1   2  −1 ⎞
       ⎜ 0   2   0 ⎟
       ⎝ 1  −2   3 ⎠

   find its Jordan form and S.
8. Let A ∈ ℂ^{n×n}. Show that ‖A^k‖_2 ≤ ‖A‖_2^k for all k ≥ 2. If A is invertible, show that ‖A^{−1}‖_2 ≥ 1/‖A‖_2.
9. Suppose that A ∈ F^{n×n} has singular values σ_1, …, σ_n and eigenvalues λ_1, …, λ_n. Show that ∏_{i=1}^{n} σ_i = ∏_{i=1}^{n} |λ_i|.

10. Let A ∈ F^{n×n}.
    (a) Consider the polar form A = UP, where U is unitary and P is positive semidefinite. If PU = UP, show that A is normal.
    (b) If A is diagonalizable, show that there is a positive semidefinite Q such that Q^{−1}AQ is normal.

11. Given

    A = ⎛ 0   1 ⎞
        ⎝ −1 −2 ⎠

    find its Jordan form and S.
12. Let A be a square matrix. If A is diagonalizable, show that all Jordan blocks in its Jordan form have size 1 × 1.
13. Let A_1, …, A_k be square matrices. If A_1 ⊕ ⋯ ⊕ A_k is diagonalizable, show that all of A_1, …, A_k are diagonalizable.

14. Suppose that A ∈ F^{n×n} has singular values σ_1, …, σ_n.
    (a) Show that A is nonsingular if and only if σ_i ≠ 0 for i = 1, …, n.
    (b) Show that A^H = A^− if and only if σ_i = 0 or σ_i = 1 for i = 1, …, n.

15. Let

    A = ⎛ 1   1 ⎞
        ⎜ 2  −2 ⎟
        ⎝ −2  2 ⎠

    (a) Find the singular values of A, and find the matrices U, Σ, V in the SVD A = UΣV^H.
    (b) Find the matrices U, P in the polar form A = UP.
    (c) Find the pseudo-inverse A^− of A.
16. Let J_k(λ) denote the k × k Jordan block with eigenvalue λ, and let m be a positive integer.
    (a) If m < k, show that

                   ⎛ 0  ⋯  0  1       ⎞
                   ⎜           ⋱      ⎟
        J_k(0)^m = ⎜              1   ⎟
                   ⎜                  ⎟
                   ⎝ 0              0 ⎠

        where the "1" on the first row is located at the (m + 1)-th position and all other unspecified entries are 0. If m ≥ k, show that J_k(0)^m = 0.
    (b) Show that

                   ⎛ λ^m  C(m,1)λ^{m−1}  C(m,2)λ^{m−2}  ⋯  C(m,k−1)λ^{m−k+1} ⎞
                   ⎜       λ^m            C(m,1)λ^{m−1}  ⋱        ⋮          ⎟
        J_k(λ)^m = ⎜              ⋱              ⋱       C(m,2)λ^{m−2}      ⎟
                   ⎜                            λ^m      C(m,1)λ^{m−1}      ⎟
                   ⎝                                      λ^m               ⎠

        where C(m, r) denotes the binomial coefficient

        C(m, r) = m!/(r!(m − r)!) if 0 ≤ r ≤ m, and C(m, r) = 0 otherwise.

        (Hint: consider J_k(λ)^m = (λI + J_k(0))^m and expand by the binomial theorem.)
    (c) If λ ≠ 0 and m ≥ 2, show that the Jordan form of J_k(λ)^m is J_k(λ^m).
        (Hint: consider the possible Jordan forms of J_k(λ)^m and rank(J_k(λ)^m − λ^m I_k).)
    (d) Suppose that A is a nonsingular matrix with Jordan form J_{n_1}(λ_1) ⊕ ⋯ ⊕ J_{n_k}(λ_k); show that the Jordan form of A^m is J_{n_1}(λ_1^m) ⊕ ⋯ ⊕ J_{n_k}(λ_k^m).

17. Let A ∈ F^{m×n}. Prove the following:
    (a) rank(A) = tr(AA^−).
    (b) R(AA^−) = R(A).
    (c) If B is a matrix such that R(B) ⊆ R(A), then AA^− B = B.

18. Given

    A = ⎛ 4   0  1  0 ⎞
        ⎜ 2   2  3  0 ⎟
        ⎜ −1  0  2  0 ⎟
        ⎝ 4   0  1  2 ⎠

    find its Jordan form and S.
19. Find all possible Jordan forms of a transformation with characteristic polynomial (x − 1)^2 (x + 2)^2.
20. Find all possible Jordan forms of a transformation with characteristic polynomial (x − 1)^3 (x + 2).

21. Diagonalize these:

    (a) ⎛ 1  1 ⎞      (b) ⎛ 0  1 ⎞
        ⎝ 0  0 ⎠          ⎝ 1  0 ⎠

22. Decide if these two are similar:

    (a) ⎛ 1  −1 ⎞     (b) ⎛ −1   0 ⎞
        ⎝ 4  −3 ⎠         ⎝  1  −1 ⎠
23. If x, y ∈ ℝ^n, show that [xy^T]^− = [x^T x]^− [y^T y]^− yx^T.

24. For A ∈ ℝ^{m×n}, prove that R(A) = R(AA^T) using only definitions and elementary properties of the Moore–Penrose pseudoinverse.
25. For A ∈ ℝ^{m×n}, prove that R(A^−) = R(A^T).
26. For A ∈ ℝ^{p×n} and B ∈ ℝ^{m×n}, show that N(A) ⊆ N(B) if and only if BA^− A = B.

27. Find the Jordan form of this matrix:

    ⎛ 0  −1 ⎞
    ⎝ 1   0 ⎠

28. Compute the pseudoinverse of

    ⎛ 1  1 ⎞
    ⎝ 2  2 ⎠

29. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, D ∈ ℝ^{m×m}, and suppose further that D is nonsingular.
    (a) Prove or disprove that

        ⎡ A  AB ⎤^−    ⎡ A^−   −A^− AB D^− ⎤
        ⎢       ⎥    = ⎢                   ⎥
        ⎣ 0   D ⎦      ⎣  0        D^−     ⎦

    (b) Prove or disprove that

        ⎡ A  B ⎤^−    ⎡ A^−   −A^− B D^− ⎤
        ⎢      ⎥    = ⎢                  ⎥
        ⎣ 0  D ⎦      ⎣  0        D^−    ⎦
30. Solve the least squares problem Ax = b, where

    A = ⎛ 2  −4  5 ⎞        ⎛ 1  ⎞
        ⎜ 6   0  3 ⎟ ,  b = ⎜ 3  ⎟
        ⎜ 2  −4  5 ⎟        ⎜ −1 ⎟
        ⎝ 6   0  3 ⎠        ⎝ 3  ⎠

31. Find the Jordan form from the given data.
    (a) The matrix T is 5 × 5 with the single eigenvalue 3. The nullities of the powers are: T − 3I has nullity two, (T − 3I)^2 has nullity three, (T − 3I)^3 has nullity four, and (T − 3I)^4 has nullity five.
    (b) The matrix S is 5 × 5 with two eigenvalues. For the eigenvalue 2 the nullities are: S − 2I has nullity two, and (S − 2I)^2 has nullity four. For the eigenvalue −1 the nullities are: S + I has nullity one.
32. Find the Jordan form, a Jordan basis, and S for each matrix.

    (a) ⎛ −10   4 ⎞     (b) ⎛ 5  −4 ⎞
        ⎝ −25  10 ⎠         ⎝ 9  −7 ⎠

    (c) ⎛ 4  0  0 ⎞   (d) ⎛ 5   4   3 ⎞   (e) ⎛ 9   7   3 ⎞   (f) ⎛ 2   2  −1 ⎞
        ⎜ 2  1  3 ⎟       ⎜ −1  0  −3 ⎟       ⎜ −9 −7  −4 ⎟       ⎜ −1 −1   1 ⎟
        ⎝ 5  0  4 ⎠       ⎝ 1  −2   1 ⎠       ⎝ 4   4   4 ⎠       ⎝ −1 −2   2 ⎠

    (g) ⎛ 7   1   2   2 ⎞
        ⎜ 1   4  −1  −1 ⎟
        ⎜ −2  1   5  −1 ⎟
        ⎝ 1   1   2   8 ⎠
